ICS 2015: INFORMS COMPUTING SOCIETY CONFERENCE 2015
PROGRAM FOR TUESDAY, JANUARY 13TH

08:30-10:00 Session 24A: Stochastic Optimization VIII
Location: Salon A
08:30
Multi-Period Stock Allocation Via Robust Optimization
SPEAKER: Peter Jackson

ABSTRACT. We propose a robust optimization formulation of the one-warehouse, multi-retailer, multi-period stock allocation problem. We consider a pure formulation which ignores holding cost in order to focus on the risk-pooling motive for centralizing safety stock. The formulation leads to an insightful closed-form solution for the two-period identical retailer case. Furthermore, it is computationally tractable using Benders' Decomposition in the general case, although the number of constraints used to represent the risk pooling phenomenon increases with the square of the number of retailers considered. Moreover, it appears to capture a large fraction of the potential risk-pooling benefit available. This research suggests that large-scale optimization problems involving risk pooling strategies are now within reach of solution.

08:50
A Two-Stage Stochastic Program for Closed-loop Supply Chain Network Design under Demand and Return Uncertainty
SPEAKER: Halit Uster

ABSTRACT. We provide a two-stage stochastic mixed integer linear programming model to determine the optimal locations of (re)manufacturing and intermediate processing facilities along with their capacity levels and forward and reverse product flows in a CLSC network under uncertainty in new product demand and return quantities. Our solution approach utilizes the Benders decomposition framework, which we enhance by developing surrogate constraints, strengthened Benders cuts, and multiple Benders cuts, as well as mean-value-scenario based lower-bounding inequalities obtained via dual subproblem disaggregation. Our results illustrate that the enhancements we suggest provide substantial computational improvements in terms of both solution times and quality.

Using our approach in a sample average approximation (SAA) framework, our analysis indicates that parameters such as product type and reasons for return, expected recovery rates, inspection costs, and transportation costs are instrumental in determining return product inspection locations and the structure of the CLSC network.

09:10
Aerial Coffee Delivery: Preference Appraisal Reinforcement Learning in the Presence of Stochastic Input Disturbances for Control-affine Systems

ABSTRACT. The aerial coffee delivery task requires a quadrotor to deliver a freely suspended coffee cup to a moving ground robot, in the presence of external stochastic disturbances. The quadrotor-load system is an inherently unstable, high-dimensional, non-linear system, which makes the task challenging for both traditional control methods and human demonstration. Stochastic disturbances, such as atmospheric changes, add unpredictability to the motion, and pose an additional challenge. Yet, this task is an instance of a larger class of preference-balancing tasks, tasks described with a set of opposing preferences. This talk presents PrEference Appraisal Reinforcement Learning (PEARL), which solves preference-balancing tasks in two phases, learning and planning. To learn a task with PEARL, reinforcement learning appraises the preferences offline on small problems assuming no disturbances. PEARL's online planning performs larger tasks in the presence of stochastic disturbances using Least Squares Axial Sum Policy Approximation. We model the robot as a stochastic control-affine system with unknown dynamics impacted by a Gaussian process, and the task as a continuous Markov Decision Process. Results show that PEARL performs an order of magnitude faster than comparative methods, rejecting a range of disturbances. Experiments on a quadrotor demonstrate that the method is suitable for physical systems.

09:30
An Approximate Dynamic Programming Algorithm for United States Air Force Officer Sustainment

ABSTRACT. We propose an approximate dynamic programming (ADP) algorithm for making accession and promotion decisions in the United States Air Force (USAF) officer sustainment system. Accession decisions determine how many officers should be hired into the system at the lowest grade for each career specialty. Promotion decisions determine how many officers should be promoted to the next highest grade. We formulate an infinite horizon, discrete time stochastic Markov decision process model to examine this military workforce planning problem. The large size of the problem instance motivating this research suggests that conventional exact dynamic programming algorithms are inappropriate. As such, we employ least-squares approximate policy iteration with instrumental variables Bellman error minimization to determine approximate policies. Within our least-squares temporal differences policy evaluation step, we use a modified version of the Bellman equation that is based on the post-decision state variable. Approximating the value function using a post-decision state variable allows us to solve the inner maximization problem using an integer programming formulation. The ADP algorithm parameters are tuned on smaller problem instances prior to the ADP algorithm being applied to our large problem instance of interest. Computational results demonstrate the performance of the algorithm on the USAF officer sustainment system.

08:30-10:00 Session 24B: Health Applications: Healthcare Operations
Location: Salon B
08:30
Hospital-wide Physical Therapy Scheduling and Therapist Routing

ABSTRACT. We address the problem of scheduling and routing physical therapists hospital-wide. At the beginning of a day, therapy jobs are known to a hospital's physical therapy scheduler, who decides for each job when, where and by which therapist it is performed. Besides treatment durations, the scheduler has to take into account the time that therapists need to walk between the rooms where consecutive jobs are executed. We model the problem as a Mixed-Integer Program (MIP). As the problem turns out to be NP-complete, which leads to high computing times for large hospitals, we also propose two heuristic algorithms. Using real-world data, we compare the performance of the three approaches on a variety of objectives. Our computational analysis reveals that the MIP can solve test instances with more than 50 jobs to optimality in an acceptable solution time. The advantage of the heuristic approach is that large instances can be solved immediately with a good trade-off regarding the hospital's primary objectives.

08:50
Analyzing the Hospital-Related Flow of Congestive Heart Failure (CHF) Patients and Readmission Risk Factors Identification
SPEAKER: Lior Turgeman

ABSTRACT. The hospital length of stay (LOS), and the time between a discharge and the next admission, are important measures of healthcare utilization, and are generally positively skewed. We model the flow of CHF patients, using data from the VA hospital system, by fitting a Coxian phase-type distribution to their LOS data, and extract the associated states in the latent Markov process. Selecting an appropriate number of phases helps to account for some heterogeneity among different LOS groups within the hospital, and provides a way to interpret each added covariate. By analyzing the strength of the connections among patient social, clinical, and historical characteristics within each group, the associated readmission risk could be estimated. For example, we found that groups with a greater LOS tended to have a greater proportion of patients from nursing home care. Nursing home care patients, who belong to the greater LOS group, tended to have a decreased readmission risk. Thus, by increasing the LOS of CHF patients whose characteristics lead to their inclusion into a nursing home group, or who enter the hospital from a nursing home, we might be able to reduce their risk for readmissions.
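
The Coxian model in the abstract above can be illustrated with a small sampler; the rates and continuation probabilities below are invented for illustration, not fitted to the VA data:

```python
import random

def sample_coxian(rates, cont_probs, rng=random):
    """Draw one length-of-stay (LOS) sample from a Coxian phase-type
    distribution: the stay passes through exponential phases with the
    given rates, moving from phase i to phase i+1 with probability
    cont_probs[i] and otherwise ending (discharge)."""
    los = 0.0
    for i, rate in enumerate(rates):
        los += rng.expovariate(rate)  # time spent in phase i
        if i == len(rates) - 1 or rng.random() >= cont_probs[i]:
            break                     # absorbed: the patient is discharged
    return los

# Hypothetical 3-phase model: most stays end early, a few linger.
rng = random.Random(0)
samples = [sample_coxian([1.0, 0.5, 0.1], [0.3, 0.2], rng) for _ in range(10000)]
mean_los = sum(samples) / len(samples)
```

The resulting sample is positively skewed (its median falls below its mean), matching the LOS behavior described in the abstract; the phases play the role of the latent Markov states.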

09:10
Inference from Structured and Unstructured Electronic Medical Data for Early Dementia Detection

ABSTRACT. The prevalence of Alzheimer's disease (AD) and other forms of dementia is increasing with the aging population, both in the United States and around the globe. The inability to cure these conditions results in prolonged and expensive medical care. Early detection is critical to potentially postpone symptoms and to prepare both healthcare providers and families for subjects' future needs. Current detection methods are typically costly or unreliable, and much stands to benefit from improved recognition of early AD markers. Electronic patient records provide the potential for computational analysis and prediction of complex diseases like AD. Whereas prior work on this problem has focused on structured data (e.g. test results) alone, this study integrates structured and unstructured data (e.g. clinical notes) from the Alzheimer's Disease Neuroimaging Initiative (ADNI) for classification of subjects' dementia status. Prediction based on unstructured data alone performs with accuracy similar to that of structured data, and integration of the two provides performance improvements over either in isolation. In addition, we provide insights into which structured features were more useful for classification of AD, supporting previously observed trends, while also highlighting the potential for computational methods to discover new early markers.

08:30-10:00 Session 24C: Integer Programming: Modern Trends for Cutting Planes
Location: Salon D
08:30
On Maximal S-Free Sets with the Covering Property
SPEAKER: Joseph Paat

ABSTRACT. Current algorithms for solving mixed integer programs use cutting planes to approximate the solution set. A collection of these cutting planes can be generated by objects called S-free sets. Certain S-free sets exhibit the 'covering property', a tiling-like feature that yields computational tools for creating cutting planes. In this talk, we discuss these ideas, in addition to methods of constructing S-free sets with the covering property.

08:50
Cut-Generating Functions for Integer Variables
SPEAKER: Sercan Yildiz

ABSTRACT. We study a generalization of Gomory and Johnson's infinite group relaxation in which we allow constraints on the basic variables. Including this additional information in the model provides a tighter relaxation and has the potential to lead to more efficient cutting-planes for mixed integer programs. We present some interesting properties of strong valid inequalities in this model.

09:10
New Formulation and Valid Inequalities for the Optimal Power Flow Problem
SPEAKER: Burak Kocuk

ABSTRACT. The optimal power flow (OPF) problem is a nonconvex quadratically constrained continuous optimization problem that is the fundamental problem in the area of electrical power systems operations. The theme of our work is to design algorithmic techniques for finding globally optimal solutions of OPF. We begin by presenting a new formulation for the OPF problem. We prove that the McCormick relaxation of the classical (rectangular) formulation of OPF is weaker than the McCormick relaxation of the new formulation. We present results on the quality of other relaxations, such as second-order conic (SOCP) and semi-definite (SDP) relaxations of the new formulation. Then, we present a class of valid inequalities for OPF in the context of the new formulation. Finally, we present extensive computational results to compare the performance of the new formulation and valid inequalities against the performance of the classical formulation.

08:30-10:00 Session 24D: Data Mining: Text Mining and Other Applications
Location: Salon E
08:30
Cooperative Data Analysis in Supply Chains Using Selective Information Disclosure
SPEAKER: Joerg Laessig

ABSTRACT. Many modern products (e.g., consumer electronics) consist of hundreds of complex parts sourced from a large number of suppliers. In such a setting, finding the source of certain properties, e.g., the source of defects in the final product, becomes more and more difficult. Data analysis methods can be used on information shared in modern supply chains. However, some information might be confidential since it touches proprietary production processes or trade secrets. Important principles of confidentiality are data minimization and that each participant has control over how much information is communicated to others, which makes data analysis more difficult.

In this work we investigate the effectiveness of strategies of selective information disclosure in order to perform cooperative data analysis in the supply chain context. The goal is to minimize information exchange by only exchanging information which is needed for the analysis tasks at hand. The work is motivated by the growing demand for cross company data analysis, while addressing confidentiality concerns at the same time. As an example, we apply a newly developed protocol with association mining techniques in an empirical simulation study to compare it with complete information disclosure. The results show that the predictive performance is comparable.

08:50
Text mining for bird strikes and geocoding
SPEAKER: Rex Kincaid

ABSTRACT. Two text mining applications are presented. The first determines whether or not a flight safety incident report is a birdstrike. The second seeks to automate geocoding. AidData builds its geographical foreign development database through geocoding, a human-based data extraction process. Faced with a boom of development finance information, AidData needs automated geocoding to increase its project evaluation capacity.

08:30-10:00 Session 24E: Heuristics IV
Location: Salon G
08:30
Assembly Flowshop Scheduling Problem with Total Tardiness Performance Measure

ABSTRACT. The two-stage assembly flowshop scheduling problem has many real-life applications. To the best of our knowledge, the two-stage assembly flowshop scheduling problem with total tardiness performance measure and separate setup times has not been addressed so far, and hence, it is addressed in this paper. Two dominance relations are developed and several algorithms are proposed. Extensive computational experiments are conducted for the evaluation of the algorithms. The computational experiments have shown that one of the algorithms performs much better than the others. Moreover, the experiments have shown that the best performing algorithm performs much better than the best existing algorithm in the literature for the case of zero setup times. Therefore, the proposed best performing algorithm can be used not only for problems with separate setup times but also for the case of zero setup times.
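
For a fixed job sequence, the total-tardiness objective in this setting can be evaluated with a short recursion. The sketch below assumes dedicated first-stage machines and omits the separate setup times that the paper treats; the instance data are invented for illustration:

```python
def total_tardiness(sequence, stage1, assembly, due):
    """Total tardiness of `sequence` in a two-stage assembly flowshop:
    stage1[j] lists job j's component times on the dedicated first-stage
    machines, assembly[j] is its second-stage assembly time, and due[j]
    its due date. Setup times are omitted for brevity."""
    m = len(stage1[sequence[0]])
    machine_free = [0.0] * m  # availability of each first-stage machine
    asm_free = 0.0            # availability of the assembly machine
    tardiness = 0.0
    for j in sequence:
        for i in range(m):
            machine_free[i] += stage1[j][i]       # components built in order
        start = max(max(machine_free), asm_free)  # wait for all components
        asm_free = start + assembly[j]            # job j's completion time
        tardiness += max(0.0, asm_free - due[j])
    return tardiness

# Tiny two-job, two-machine instance: putting the tighter due date first
# eliminates all tardiness.
stage1, assembly, due = [[2, 3], [1, 1]], [2, 1], [6, 3]
```

With these data, `total_tardiness([0, 1], stage1, assembly, due)` evaluates to 3.0 while the reversed sequence gives 0.0, showing how strongly the sequence drives the objective.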

08:50
Appointment Scheduling with Multiple Providers and Stochastic Service Times

ABSTRACT. We consider a multi-server appointment scheduling problem in which patients may not show up, and those who show up follow stochastic service times. Because this problem is NP-hard, it would be impractical to find optimal solutions by complete enumeration even for small instances. To reduce the size of the solution space, we prove some optimality conditions, which we subsequently use to design an efficient search procedure.

09:10
An Analytical Study in Connectivity of Neighborhoods for Single Round Robin Tournaments

ABSTRACT. Sport scheduling is a trending research topic in operations research. Local search heuristics are among the most effective methods to construct schedules. Prior studies have investigated the connectivity of existing neighborhoods in the scheduling of single round robin tournaments and determined that it is strongly affected by how the initial schedule is constructed.

In this paper we prove that, when constructing the initial schedule with the most popular method (the circle method) and for specific numbers of participating teams, the neighborhoods under consideration are not connected and the search stays trapped in a tiny portion of the search space. In consequence, search procedures on those neighborhoods are not likely to find good schedules for the tournament.

We also introduce a constructive method based on the faro shuffling of playing cards and show its equivalence with the circle method. The faro method is used in the analysis of connectivity of one of the neighborhoods under consideration. We conjecture that, when the faro shuffle permutation has an (n-2)-cycle, the analyzed neighborhood is not connected and does not allow the search to escape from schedules isomorphic to the initial one. The non-connectivity of the other neighborhood is a consequence of a classic result in graph theory.
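
For reference, the circle method mentioned above can be sketched in a few lines (here team 0 is the fixed pivot; the pairing and rotation conventions vary across descriptions):

```python
def circle_method(n):
    """Single round robin schedule for n teams (n assumed even) via the
    circle method: team 0 stays fixed while the other n-1 teams rotate
    one position per round."""
    teams = list(range(n))
    rounds = []
    for _ in range(n - 1):
        # Pair the current arrangement front-to-back.
        rounds.append([(teams[i], teams[n - 1 - i]) for i in range(n // 2)])
        teams = [teams[0], teams[-1]] + teams[1:-1]  # rotate around the pivot
    return rounds
```

For n = 6 this produces 5 rounds in which each team plays once per round and every pair of teams meets exactly once.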

08:30-10:00 Session 24F: Network Applications III
Chair: Cole Smith
Location: Salon H
08:30
A Game Theory Model for a Differentiated Service-Oriented Internet with Duration-Based Contracts
SPEAKER: Anna Nagurney

ABSTRACT. This paper presents a game theory model of a service-oriented Internet in which the network providers compete in usage service rates, quality levels, and duration-based contracts. We formulate the network-based Nash equilibrium conditions as a variational inequality problem, provide qualitative properties of existence and uniqueness, and describe an algorithm, which yields closed-form expressions at each iteration. The numerical examples include sensitivity analysis for price functions at the demand markets as well as variations in the upper bounds on the quality levels for the services.

08:50
Dynamic Shortest Path Interdiction
SPEAKER: Cole Smith

ABSTRACT. The two-stage shortest path interdiction problem involves an attacker and a traveler. The attacker acts first to damage some (cardinality-constrained) set of arcs (increasing their traversal cost), in order to maximize the traveler's shortest path on the resulting network. This framework assumes that the attacker reveals its attacks before the traveler acts, putting the attacker at a disadvantage. The opposite philosophy applies in the robust optimization setting: here, the attacker acts after the traveler has committed to its path. The traveler cannot alter its path, and so the attacker has an advantage in terms of revealed information. In this talk, we explore a dynamic shortest path interdiction problem, in which the attacker can attack arcs at any time during the traveler's trip. Accordingly, the traveler can adjust its path dynamically in response to these attacks. This game reveals rich optimal strategies, in which it may be optimal for the traveler to reuse certain arcs, and in which the attacker does not exhaust its attack budget at optimality. We present an exponential-state dynamic programming algorithm to solve this problem optimally, and then explore several relaxations and restrictions of the problem that produce lower and upper bounds on the optimal objective.

09:10
Network-Based Models for Optimization in Reliability

ABSTRACT. Reliability of a system can be increased by adding redundant components. We seek to maximize the reliability of such systems subject to restrictions on the number and types of redundant components that can be installed. Representing the system as a binary decision diagram (a type of network) yields an exact mixed integer linear program for a general form of this problem. Computational results demonstrate that this approach is effective for structured (e.g., series-parallel) systems.

10:00-10:30 Session 25: Coffee Break
Location: James River Prefunction Area
10:30-12:00 Session 26: Plenary Talk III
Location: Salon C
10:30
Using an Exponential Potential Function Method to Optimize Video-On-Demand Content Placement

ABSTRACT. For a large-scale Video-on-Demand service, as the library size grows, it becomes important to balance the disk space necessary to store content locally at the requesting node with the bandwidth required to serve requests from remote nodes.  This gives rise to the problem of deciding which content to place at which serving nodes, taking into account the resource constraints (disk and bandwidth) and content attributes (request patterns, size, and bandwidth).

We model this optimization problem as a mixed-integer program.  However, even for moderately large instances (20,000 videos, 50 serving nodes), the linear relaxation becomes intractable for off-the-shelf linear programming solvers, both in terms of time and memory use.  Instead, we approximately solve the linear relaxation by using a Lagrangian decomposition approach based on exponential potential functions, and then round that solution to an integer solution.

Computational experiments on a testbed of synthetic and real-world instances show that this decomposition approach typically reduces the running time by several orders of magnitude, while achieving solutions within 2% of optimal with no constraint violated by more than 1%. I will outline the approach, and then focus on several implementation choices that were critical to the performance of the approach.

13:00-14:30 Session 28A: Stochastic Optimization: General Methodology
Location: Salon A
13:00
Doubly Stochastic Sequential Assignment Problem
SPEAKER: Arash Khatibi

ABSTRACT. The Doubly Stochastic Sequential Assignment Problem (DSSAP) is an extension of the Sequential Stochastic Assignment Problem (SSAP), where sequentially arriving tasks are assigned to workers with random success rates. A given number of tasks arrive sequentially, each with a random value coming from a known distribution. Upon a task arrival, it must be assigned to one of the available workers, each with a random success rate revealed upon the task arrival. The reward of each assignment is the product of the task value and the worker's success rate. The objective is to maximize the total expected reward. The optimal assignment algorithm for the general case of DSSAP, where workers have distinct success rate distributions, is presented and proven to have an exponential running time. An approximation algorithm with a polynomial running time is proposed that achieves a fraction of the maximum total expected reward.

13:20
Semi-Infinite Relaxations for the Dynamic Knapsack Problem with Stochastic Item Sizes

ABSTRACT. We consider a knapsack problem in which item sizes are stochastic and realized after an attempted insertion. The decision maker chooses an item to insert dynamically based on remaining capacity, and the process continues until no items remain or a realized size is larger than remaining capacity. We derive LP relaxations of polynomial and pseudo-polynomial size based on different approximations of the value function; these LPs are semi-infinite in the most general case, but reduce to simple finite LP models for many item size distributions. We relate the relaxations to previous work and compare them both theoretically and computationally. In both the polynomial and the pseudo-polynomial case, our relaxations dominate other known LP relaxations for this model.

13:00-14:30 Session 28B: Health Applications: Optimization Applications
Location: Salon B
13:00
Exact and Heuristic Approaches for Order Set Optimization

ABSTRACT. Recent research has highlighted the effectiveness of using order sets as a part of hospital information systems. Order sets represent time interval-clustered orders to which inpatients are assigned during the hospital stay. In this paper, we develop two Mixed-Integer Programming (MIP) models of the problem: The first model assigns orders to fixed time intervals and clusters them as order sets. The model minimizes the costs of i) assigning patients to order sets, ii) deselecting orders that are not required from order sets, iii) selecting additional required orders into order sets, and iv) not assigning patients to order sets. In our second model formulation, we cluster orders into flexible instead of fixed time intervals, with the same objective function as in the first model. Dominance properties of the problem allow us to fix variables and to bound the number of clusters and intervals. We present a two-stage decomposition algorithm that breaks the problem down into the interval clustering and the patient order assignment problem. Using real patient data from a major academic medical center, we demonstrate the effectiveness of our exact and heuristic approaches.

13:20
Testing Assumptions for Causal Inference with Observational Data Using the BOSS Model
SPEAKER: Jason Sauppe

ABSTRACT. Estimating causal effects within observational (non-experimental) data is a challenging problem that has received significant attention in the statistics literature. Most proposed methods for causal inference focus first on matching treatment units with similar (under some metric) control units, and then assessing the quality of the set of matched controls using additional covariate balance measures. In contrast, the Balance Optimization Subset Selection (BOSS) model bypasses the matching stage and optimizes these balance measures directly. Which balance measure to use with BOSS is an important practical consideration, both from an optimization perspective and from an efficacy perspective. That is, which balance measure is most effective at reducing potential sources of bias in the estimate of the treatment effect? Building on previous work that establishes the conditions under which treatment effect estimates are unbiased, this presentation discusses computational strategies and statistical tests that can help corroborate the assumptions that need to be made. These strategies are implemented using the BOSS model and demonstrated on both a simulated dataset and a well-studied dataset from the literature.

13:40
Assessing the Effect of Lipid Profile on Quality of Life: An Inverse MILP Approach

ABSTRACT. The health-related quality of life (HRQOL) for diabetes patients is strongly related to their cardiovascular risk factors. Surveys prevail as the most common tool for assessing patients' HRQOL but may lack logical consistency and involve the risk of bias due to challenges in sampling. In this study, we consider national lipid management guidelines from several countries and employ an inverse mixed-integer linear programming framework which seeks quality adjustment factors with varying lipid levels so as to make them as close as possible to being cost-effective among implementable policies. We calibrate our model using clinical data and present computational results to compare the guidelines' perceptions of lipid profile-related quality of life for Type 2 diabetes patients.

13:00-14:30 Session 28C: Integer Programming: Computational Integer Programming
Chair: Bo Zeng
Location: Salon D
13:00
Recent Progress on the DIP Decomposition-based MILP Solver
SPEAKER: Jiadong Wang

ABSTRACT. We discuss recent progress on the DIP decomposition-based MILP solver. The solver takes a generalized approach to exploiting problem decomposition and provides a framework for implementing algorithms based on both the addition of valid inequalities (cutting plane algorithms) and the generation of columns (Dantzig-Wolfe). Computational results illustrating recently added features, including approaches to parallelization and warm-starting of the subproblem solution process, will be presented.

13:20
A Branch-and-price Algorithm for the Capacitated Mobile Facility Location Problem
SPEAKER: Mustafa Sahin

ABSTRACT. The capacitated mobile facility location problem (CMFLP), which is a generalization of the mobile facility location problem (MFLP), is a combinatorial optimization problem with applications in supply chain operations and disaster relief. It consists of determining destination vertices for facilities and clients that are initially located at vertices in the graph such that each client has to share its destination vertex with a facility and the total demand assigned to a facility cannot exceed its capacity. The objective is to minimize the total weighted distance traveled by facilities and clients. We propose two Mixed Integer Programming (MIP) formulations for the CMFLP and discuss a branch-and-price algorithm where the variables of the first formulation are used for branching and the second formulation, which is a set partitioning formulation, is used to obtain lower bounds via a column generation procedure. We demonstrate the advantages of using the set partitioning formulation and the efficiency of the column generation procedure on instances adapted from existing instances in the literature. Our findings indicate that the column generation approach is particularly beneficial, both in terms of computational time and solution quality, when the ratio of the number of clients to the number of facilities is small and the facility capacities are tight.

13:40
Computing Bilevel Mixed Integer Quadratic Programs
SPEAKER: Bo Zeng

ABSTRACT. In this study, we analyze bilevel mixed integer quadratic programs where the lower level problem involves discrete variables. Through reformulations, we present a single level equivalent form. We also show that, under some sufficient conditions, the equivalent form can be solved efficiently. Through a decomposition algorithm, along with a new type of cutting plane strategy, we demonstrate that our approach clearly improves solution capability for this challenging problem.

13:00-14:30 Session 28D: Data Mining: Graphs and Integer Programming
Location: Salon E
13:00
Discovery of Markov Network Graph Structure

ABSTRACT. A weighted hypergraph is a tuple $H = (V,E,\omega)$ where $V$ is the vertex set, $|V| = n$, and $E$ is a set of non-empty subsets of $V$ ($\cup_j e_j = V$), called hyperedges, each hyperedge $e_j$ being assigned a positive weight $\omega(e_j)$. Two or more vertices are adjacent in $H=(V,E)$ if there is a hyperedge $e_j$ that contains all of them. When represented by a finite $n \times m$ incidence matrix $I(H) = V \times E$, in which each of the $n$ rows is associated with a vertex and each of the $m$ columns with a hyperedge such that $\sum_{i|v_i \in e_j} i(v_i,e_j) = \omega(e_j)$, $H$ denotes the probabilistic hypergraph on which the Markov random field (MRF) is defined. An MRF provides an undirected graphical representation of a probability distribution in which each vertex of the graph represents a random variable and the edges identify conditional independencies. By the Hammersley-Clifford theorem, every Markov random field can be specified via potential functions $\phi_C$, each corresponding to a maximal clique $C$ in the graph. In this paper, we propose to discover the structure of the Markov network from the data, i.e., to discover conditional independencies in multivariate data, by mining the underlying hypergraph and exploiting vertex popularity (instead of trying to remove edges from an initial, fully connected graph). Knowing the undirected model structure (hypergraph), learning the Markov network then consists of i) obtaining from the data the feature functions $f_{X_C}$ for each clique $C$, and ii) given the feature functions and the structure, estimating the parameters, i.e., the weights of the feature functions, from the data and obtaining the potential functions $\phi_C$ for each clique. One can then perform inference tasks such as event probability estimation and computing the most probable configuration.
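
The abstract leaves "vertex popularity" informal; one plausible reading is the weighted degree of a vertex in the hypergraph, sketched below (the dictionary encoding and the example edges are invented for illustration, not taken from the talk):

```python
from collections import defaultdict

def vertex_popularity(hyperedges):
    """Weighted degree ('popularity') of each vertex in a weighted
    hypergraph encoded as {edge_name: (vertex_set, weight)}: the sum
    of the weights omega(e_j) of the hyperedges containing the vertex."""
    popularity = defaultdict(float)
    for vertices, weight in hyperedges.values():
        for v in vertices:
            popularity[v] += weight
    return dict(popularity)

# Hypothetical weighted hypergraph on vertices a..d.
H = {"e1": ({"a", "b"}, 2.0),
     "e2": ({"b", "c", "d"}, 1.0),
     "e3": ({"a", "d"}, 0.5)}
popularity = vertex_popularity(H)
```

Vertices covered by many heavy hyperedges score highest, which is the signal a popularity-driven structure search would exploit.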

13:20
Comparison of similarity graphs for spectral clustering algorithms

ABSTRACT. Spectral clustering forms groups of data points based on spectral analysis of a similarity graph. It is easy to implement, and it outperforms traditional clustering methods such as the k-means algorithm. However, there is no systematic way to determine the similarity graph and its parameters. To address this issue, we analyze the impact of the similarity graph and its parameters on spectral clustering algorithms. We compare k-nearest neighbor (KNN), mutual KNN, ε-neighborhood, and fully connected graphs, and a similarity graph based on the Gabriel graph. We also propose a parameter-free similarity graph based on density and connectivity. In the comparisons, we use artificial and real data sets with various properties, including arbitrary shapes and density variations. We show that the performance of spectral clustering is sensitive to the choice of the similarity graph and its parameters. The proposed similarity graph outperforms the competing approaches in most of the data sets.
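
Two of the compared constructions can be sketched with NumPy (Euclidean distances and 0/1 edge weights here; the talk's actual weighting scheme may differ):

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbor similarity graph: W[i, j] = 1 if
    i is among j's k nearest neighbors or vice versa (Euclidean)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)            # exclude self-neighbors
    nn = np.argsort(D, axis=1)[:, :k]      # indices of the k nearest points
    W = np.zeros_like(D)
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, nn.ravel()] = 1.0
    return np.maximum(W, W.T)              # symmetrize: KNN graph

def eps_graph(X, eps):
    """Epsilon-neighborhood graph: connect points within distance eps."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (D <= eps).astype(float)
    np.fill_diagonal(W, 0.0)               # no self-loops
    return W
```

A mutual-KNN variant would replace `np.maximum` with `np.minimum`, keeping an edge only when each point is among the other's k nearest, which is one reason the two graphs can behave very differently under density variations.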

13:40
New Exact Formulations to Solve Generalizations and Extensions of the OPSM Problem
SPEAKER: Andrew Trapp

ABSTRACT. The order-preserving submatrix (OPSM) problem was introduced for the purpose of analyzing DNA microarray data (Ben-Dor et al. 2003). For a given data matrix A, it seeks to identify the largest submatrix obeying a common linear ordering over all included rows, across all included columns. Beyond the context of gene expression, the OPSM problem has been extended to other applications such as target marketing and recommender systems. In recent years the OPSM problem has seen significant advances with respect to solution approaches, including exact approaches and approximation algorithms. Building upon these recent developments, we provide two exact formulations, maxGOPSM and minGOPSM, that can solve a generalization of the OPSM pattern that also admits the reverse linear ordering (known as "GOPSM", that is, General OPSM). Perhaps more importantly, we further incorporate the identification of statistically significant (G)OPSM patterns into both formulations, and provide algorithms to find not just a single largest submatrix in A, but all (G)OPSMs that are both statistically significant and maximal with respect to the included rows and columns. We discuss the computational performance of our approaches on both simulated and real data instances.
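The OPSM pattern itself is easy to state in code. The sketch below, assuming a toy matrix of the author's problem class (not the paper's exact formulations), checks the order-preserving property and brute-forces column orderings, which is feasible only for tiny instances and motivates the exact formulations discussed in the talk.

```python
import numpy as np
from itertools import permutations

def is_opsm(A, rows, cols):
    """True if every selected row of A is strictly increasing when its
    selected columns are read in the order given by `cols`."""
    sub = A[np.ix_(rows, cols)]
    return bool(np.all(np.diff(sub, axis=1) > 0))

def opsm_rows(A, cols):
    """For a fixed column ordering, the largest OPSM consists of exactly
    the rows that are increasing along that ordering."""
    return [i for i in range(A.shape[0]) if is_opsm(A, [i], cols)]

A = np.array([[1, 3, 5],
              [2, 9, 4],
              [0, 6, 7]])
# Enumerate column orderings (tractable only for tiny m) and keep the
# ordering supported by the most rows.
best = max(((list(p), opsm_rows(A, list(p)))
            for p in permutations(range(A.shape[1]))),
           key=lambda t: len(t[1]))
print(best)  # → ([0, 1, 2], [0, 2]): rows 0 and 2 increase left-to-right
```

The GOPSM generalization would additionally accept rows obeying the reverse ordering, i.e., strictly decreasing along `cols`.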

13:00-14:30 Session 28E: Modeling Systems: Algebraic Modeling Systems and Real-World Applications
Location: Salon G
13:00
Scheduling Foundry Production without Mold Inventory
SPEAKER: Brent Austgen

ABSTRACT. A collection of mathematical programs is used to schedule production at a foundry that casts products without an inventory of sand molds. The problem of sequencing products is divided into two sequential sub-problems. The first distributes weekly demands to hours, and the second optimally sequences products per hour. Both phases are accomplished by solving mixed integer linear programs, and no specialized heuristic is needed. The resulting schedules detail the order in which casting molds should be created and how they should be split between two pouring lines so that products can be cast consecutively down each line. The two phases combine to reduce other production delays such as those caused by waiting for the molten metal to cool. Software development and implementation are discussed in addition to results on real data.
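The two-phase decomposition described above can be illustrated with a toy sketch; the demand dictionary, greedy hourly fill, and brute-force sequencing here are illustrative stand-ins for the paper's mixed integer linear programs, not the authors' formulations.

```python
from itertools import permutations

def distribute(demand, hours, cap):
    """Phase-1 sketch: spread weekly unit demands over hours, respecting
    an hourly molding capacity `cap` (the paper solves this as a MILP)."""
    units = [p for p, q in demand.items() for _ in range(q)]
    if len(units) > hours * cap:
        raise ValueError("weekly demand exceeds weekly capacity")
    schedule = {h: [] for h in range(hours)}
    for i, u in enumerate(units):
        schedule[i // cap].append(u)  # fill hours in order
    return schedule

def best_sequence(products):
    """Phase-2 sketch: brute-force the within-hour order minimizing mold
    changeovers (adjacent placements of different products)."""
    def changeovers(seq):
        return sum(a != b for a, b in zip(seq, seq[1:]))
    return min(permutations(products), key=changeovers)

sched = distribute({"A": 3, "B": 2}, hours=3, cap=2)
print([best_sequence(sched[h]) for h in range(3)])
# → [('A', 'A'), ('A', 'B'), ('B',)]
```

Grouping identical products consecutively within each hour mirrors the goal of casting products consecutively down each pouring line.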

13:20
Toward Reusable Models: System Development for Process Analytics Formalism (PAF)

ABSTRACT. Decision optimization has been broadly used in many areas, including economics, finance, manufacturing, logistics, and engineering. However, optimization modeling for decision optimization presents two major challenges. First, it requires expertise in operations research and mathematical programming, which most users, including software developers, engineers, business analysts, and end-users, typically do not have. Second, optimization models are usually highly customized, not modular, and not extensible. The Sustainable Process Analytics Formalism (SPAF) was recently proposed at the National Institute of Standards and Technology (NIST) to address these challenges by transitioning from redeveloping optimization solutions from scratch to maintaining an extensible library of analytical models, against which declarative optimization queries can be posed. However, the development of practical algorithms and a system for SPAF presents a number of technical challenges that have not yet been addressed. In this paper we (1) propose an object-oriented extension of SPAF; (2) present a system implementation based on methods that reduce PAF models and queries to a formal mathematical programming formulation; (3) showcase PAF through a simple case study in the manufacturing domain; and (4) conduct a preliminary experimental study to assess the overhead introduced.

13:40
Temporal Manufacturing Query Language (tMQL) for Domain Specific Composition, What-if Analysis, and Optimization of Manufacturing Processes With Inventories

ABSTRACT. Smart manufacturing requires streamlining operations and optimizing processes at both global and local levels. This paper considers manufacturing processes that involve physical or virtual inventories of products, parts, and materials that move from machine to machine. The inventory levels vary with time and are a function of the configuration settings of the machines involved in the process. These environments require analysis, e.g., answering what-if questions, and optimization to determine optimal operating settings for the entire process. The modeling complexities of performing these tasks are not always within the grasp of production engineers. To address this problem, the paper proposes the temporal Manufacturing Query Language (tMQL), which allows the composition of modular process models for what-if analysis and decision optimization queries. tMQL supports an extensible and reusable model knowledge base against which declarative queries can be posed. Additionally, the paper describes the steps to translate a tMQL model and a query into a formal mathematical programming model used by a commercial optimization solver.

14:00
Decision Guidance Analytics Language (DGAL): Toward Reusable Knowledge Base Centric Modeling

ABSTRACT. Decision guidance systems are a class of decision support systems that are geared toward producing actionable recommendations, typically based on formal analytical models and techniques. This paper proposes the Decision Guidance Analytics Language (DGAL) for easy iterative development of decision guidance systems. DGAL allows the creation of modular, reusable, and composable models that are stored in an analytical knowledge base independently of the tasks and tools that use them. Based on these unified models, DGAL supports declarative queries for (1) data manipulation, (2) what-if prediction analysis, (3) decision optimization based on mathematical and constraint programming, and (4) machine learning, all through formal reduction to specialized models and tools, and in the presence of uncertainty.

13:00-14:30 Session 28F: Homeland Security: Information Retrieval and Text Mining
Location: Salon H
13:00
Text Mining in Dynamic Blog Networks
SPEAKER: David Banks

ABSTRACT. Many applications (the Internet, Wikipedia) concern networks of documents. We mine the corpus consisting of all U.S. political blog posts in 2012. The intent is to use recent advances in dynamic network modeling to improve topic discovery, and recent research on text mining to improve the network modeling. We describe an analysis based on the subset of blog posts that concern the shooting of Trayvon Martin, and then a full analysis of the entire corpus at a lower level of resolution.

13:20
Prediction in the Space of Narratives
SPEAKER: David Roberts

ABSTRACT. In this talk, after defining narrative spaces, I will highlight features of those spaces that both complicate and simplify event prediction tasks. I will describe both similarities and differences between more traditional event-based prediction tasks and narrative prediction tasks. I will show how narrative structure can be encoded in a data representation that enables simulation-based prediction to leverage narrative understanding and constrain the space of predictions. Lastly, I will emphasize open challenges and hidden complexity that results from the formalism I present.

13:40
Topological Data Analysis of Text
SPEAKER: John Harer

ABSTRACT. This talk describes a new methodology, topological data analysis, and applies it to text data from blog posts and other documents.