IOS2020: INFORMS OPTIMIZATION SOCIETY CONFERENCE 2020
PROGRAM FOR TUESDAY, MARCH 17TH

08:00-09:30 Session TA1: Dynamic Optimization
08:00
Partially observable multistage stochastic programming
PRESENTER: David Morton

ABSTRACT. We propose a class of partially observable multistage stochastic programs and describe an algorithm for solving this class of problems. We provide a Bayesian update of a belief-state vector, extend the stochastic programming formulation to incorporate the belief state, and characterize saddle-function properties of the corresponding cost-to-go function. Our algorithm is a derivative of the stochastic dual dynamic programming method.
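
The belief-state recursion at the heart of this approach is a standard Bayesian filter. A minimal sketch of one such update, with illustrative names and a made-up two-state example rather than the authors' implementation, is:

```python
import numpy as np

def belief_update(belief, obs_lik, transition):
    """One Bayesian update of a belief-state vector over hidden states.

    belief     : (n,) prior probability over hidden states
    obs_lik    : (n,) likelihood of the new observation under each state
    transition : (n, n) row-stochastic matrix, transition[i, j] = P(j | i)
    """
    predicted = transition.T @ belief      # prior for the next stage
    posterior = obs_lik * predicted        # Bayes rule, unnormalized
    return posterior / posterior.sum()     # renormalize to a distribution

# toy example: two hidden states, a mildly informative observation
b = belief_update(np.array([0.5, 0.5]),
                  np.array([0.9, 0.2]),
                  np.array([[0.8, 0.2], [0.3, 0.7]]))
```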

08:22
Inverse Markov decision process with unknown transition probabilities
PRESENTER: Zahra Ghatrani

ABSTRACT. Inverse optimization is a framework for inferring parameters of an optimization model, given observed values of decisions. We study the problem of inverse Markov decision processes (MDPs) with unknown transition probabilities under two different scenarios. In the first case, an optimal value function is given, and we employ a variable transformation method that reduces the inverse MDP problem to a linear program. In the second case, an optimal policy is given, and the resulting inverse problem is bilinear. We explore different exact and inexact numerical methods for solving this bilinear problem.

08:44
Personalized Dose Finding Trials
PRESENTER: Saeid Delshad

ABSTRACT. Motivated by extending the results in Bayesian adaptive design for dose-finding clinical trials to incorporate patient covariates, we study a learning problem with unique features. In particular, given each set of covariates, we estimate the dose-response curve to identify the personalized target dose, achieved by minimizing the expected (over covariates) variance of the personalized target dose. We provide a framework to address the unique challenges in this setting, formulate the problem, and establish structural properties of the learning problem. We propose two policies by adapting the one-step look-ahead and posterior sampling approaches to our setting. We show that the proposed policies asymptotically learn the true personalized target doses for all patient types. We derive a closed-form formula for the proposed one-step look-ahead policy with two doses and study its unique structure. We also analyze the rate of sampling for this policy to understand its asymptotic allocation behavior. Additionally, our results elucidate the connection between the performance of the proposed policies and the structure of dose-response curves. Finally, we apply our framework to a clinical trial with real data to assess the improvement achievable by applying the proposed framework and policies in terms of different performance measures.

08:00-09:30 Session TA2: Nonconvexity in Stochastic Programming
Location: Gold Ballroom
08:00
Optimal Crashing of an Activity Network with Multiple Disruptions

ABSTRACT. A stochastic disruption is a type of infrequent event whose timing and magnitude are random. We introduce the concept of stochastic disruptions and propose a stochastic optimization framework for such problems. In this talk, we present a special example: a project crashing problem under disruptions, potentially more than one. When a disruption occurs, the duration of an activity that has not yet started can change. Both the magnitude of the change in an activity's duration and the timing of the disruption can be random. We prove that it is NP-hard to optimize the expected project span for this crashing problem even if we limit the number of disruptions to one. We formulate a multi-stage stochastic mixed integer program (SMIP) with mixed integer variables in each stage. This SMIP is computationally challenging to solve using existing techniques. We propose an adaptive branch-and-cut algorithm to solve the SMIP and evaluate the computational performance of our approach against the current SDDiP approach.

08:22
Bi-objective multistage stochastic linear programming
PRESENTER: Oscar Dowson

ABSTRACT. We propose an algorithm for solving a class of bi-objective multistage stochastic linear programs. We characterize the non-convex saddle-function properties of the cost-to-go functions and establish convergence results which also apply to bi-objective linear programs solved using Benders decomposition. Our algorithm is a derivative of the stochastic dual dynamic programming method and is implemented in the open-source stochastic programming solver SDDP.jl. The utility of our method is demonstrated on a hydro-thermal scheduling problem using data from the Brazilian Interconnected Power System.

08:44
Coupled Learning Enabled Stochastic Programming with Endogenous Uncertainty
PRESENTER: Junyi Liu

ABSTRACT. Predictive analytics, empowered by machine learning, is usually followed by decision-making problems in prescriptive analytics. We extend the above sequential prediction-optimization paradigm to a coupled scheme such that the prediction model can guide the decision problem to produce coordinated decisions yielding higher levels of performance. Specifically, for stochastic programming (SP) models with latently decision-dependent uncertainty, we develop a coupled learning enabled optimization (CLEO) algorithm in which the learning step of predicting the latent dependency and the optimization step of computing a candidate decision are conducted interactively. The CLEO algorithm automatically balances the trade-off between the accuracy of learning models and the complexity of the composite decision-making problem. Under certain assumptions, we show that the sequence of solutions provided by CLEO converges to a first-order stationary point of the original SP problem with probability 1. In addition, we present preliminary computations which demonstrate the computational potential of this data-driven approach.

08:00-09:30 Session TA3: Recent Advances in IPCO II
Location: Card Room
08:00
Developing an Efficient Exact Solution Approach for Scheduling Problems
PRESENTER: Masoud Barah

ABSTRACT. We develop a novel mathematical model for the job shop scheduling problem, enhanced with valid inequalities. Specifically, our model has an interesting decomposable structure that allows for solving large-scale problems. We then develop two linear-programming-based heuristics for the problem: the first heuristic finds a solution that can be passed to a commercial solver as an initial solution; the second heuristic enhances incumbent solutions found by the solver during the Branch and Bound (B&B) procedure. The latter can also be used for early pruning of B&B tree nodes, resulting in significant computational savings. Our computational experiments show that our approach solves some benchmark job shop and flow shop problems to optimality for the first time.

08:22
A Study on the Block Relocation Problem: Lower Bound Derivations and Strong Formulations
PRESENTER: Bo Zeng

ABSTRACT. The block relocation problem (BRP) is a fundamental operational issue in modern warehouse and yard management, which, however, is very challenging to solve. In this paper, to advance our understanding of this problem and to provide substantial assistance to practice, we (i) introduce a classification scheme and present a rather comprehensive review of all 16 BRP variants; (ii) develop a general framework to derive lower bounds on the number of necessary relocations and demonstrate its connection to existing lower bounds for the unrestricted BRP variants; (iii) propose and employ a couple of new critical-substructure concepts to analyze the BRP and obtain a lower bound that dominates all existing ones; (iv) build a new and strong mixed integer programming (MIP) formulation that is adaptable to compute 8 BRP variants, and design a novel MIP-formulation-based iterative procedure to compute exact BRP solutions; (v) extend the MIP formulation to address four typical industrial considerations. Computational results on standard test instances show that the new lower bound is significantly stronger, and our new MIP computational methods have superior performance over a state-of-the-art formulation.

08:44
A Model of the Online Independent Set Problem
PRESENTER: Christopher Muir

ABSTRACT. This work studies an online extension of the stable set problem. In the proposed model, the topology of the graph is known probabilistically at the onset and is revealed dynamically as vertices are added to the stable set. The objective is to determine an optimal policy for constructing a stable set with the highest expected value. We present complexity results demonstrating that even for restrictive graph classes (e.g. stars) the problem of determining the optimal policy is NP-Hard. For the general case, we present the results of a polyhedral study and motivate families of valid inequalities arising from paths and cliques. Finally, we use the results of the polyhedral study to demonstrate the computational performance of various heuristic methods for this problem.

08:00-09:30 Session TA4: Online Optimization in Healthcare
08:00
Causal Inference with Selectively-Deconfounded Data
PRESENTER: Kyra Gan

ABSTRACT. Given only data generated by a standard confounding graph with an unobserved confounder, the average treatment effect (ATE) is not identifiable. To estimate the ATE, a practitioner must then either (a) collect deconfounded data; (b) run a clinical trial; or (c) elucidate further properties of the causal graph that might render the ATE identifiable. In this paper, we focus on the benefit of incorporating a (large) confounded dataset alongside a (small) deconfounded dataset in estimating the ATE. Our theoretical results show that the inclusion of confounded data can reduce the quantity of deconfounded data required to estimate the ATE to within a desired accuracy level. Moreover, in some cases (say, genetics), we could imagine retrospectively selecting samples to deconfound. We demonstrate that by carefully choosing these examples based upon the (already) observed treatment and outcome, we can reduce our data dependence further. Further, the worst-case performance of our approach (vs. a natural benchmark) is bounded, while our best-case gains are unbounded. Both theoretical and empirical results support our conclusions.

08:22
Optimal Decision Tree with Noisy Outcomes
PRESENTER: Su Jia

ABSTRACT. A fundamental task in active learning involves performing a sequence of tests to identify an unknown hypothesis that is drawn from a known distribution. This problem, known as optimal decision tree induction, has been widely studied for decades, and the asymptotically best-possible approximation algorithm has been devised for it. We study a generalization where certain test outcomes are noisy, even in the more general case when the noise is persistent, i.e., repeating a test gives the same noisy output, disallowing simple repetition as a way to gain confidence. We design new approximation algorithms for both the non-adaptive setting, where the test sequence must be fixed a priori, and the adaptive setting, where the test sequence depends on the outcomes of prior tests. Previous work in the area assumed at most a logarithmic number of noisy outcomes per hypothesis and provided approximation ratios that depended on parameters such as the minimum probability of a hypothesis. Our new approximation algorithms provide guarantees that are nearly best-possible and work for the general case of a large number of noisy outcomes per test or per hypothesis, where the performance degrades smoothly with this number. Our results adapt and generalize methods used for submodular ranking and stochastic set cover. We evaluate the performance of our algorithms on two natural applications with noise: toxic chemical identification and active learning of linear classifiers. Despite our theoretical logarithmic approximation guarantees, our methods give solutions with cost very close to the information-theoretic minimum, demonstrating the effectiveness of our methods.

08:44
Meta Dynamic Pricing: Learning Across Experiments
PRESENTER: Ruihao Zhu

ABSTRACT. We study the problem of learning shared structure across a sequence of dynamic pricing experiments for related products. We consider a practical formulation where the unknown demand parameters for each product come from an unknown distribution (prior) that is shared across products. We then propose a meta dynamic pricing algorithm that learns this prior online while solving a sequence of Thompson sampling pricing experiments (each with horizon $T$) for $N$ different products. Our algorithm addresses two challenges: (i) balancing the need to learn the prior (meta-exploration) with the need to leverage the estimated prior to achieve good performance (meta-exploitation), and (ii) accounting for uncertainty in the estimated prior by appropriately "widening" the prior as a function of its estimation error, thereby ensuring convergence of each price experiment. Unlike prior-independent approaches, our algorithm's meta regret grows sublinearly in $N$; an immediate consequence of our analysis is that the price of an unknown prior in Thompson sampling is negligible in experiment-rich environments with shared structure (large $N$). Numerical experiments on synthetic and real auto loan data demonstrate that our algorithm significantly speeds up learning compared to prior-independent algorithms or a naive approach of greedily using the updated prior across products.
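
The "prior widening" device can be illustrated with a small sketch. The Gaussian prior, the inflation rule, and all names below are assumptions for illustration, not the authors' algorithm:

```python
import numpy as np

def widened_prior(mu_hat, Sigma_hat, est_error):
    """Widen an estimated Gaussian prior by its estimation error,
    so Thompson sampling remains exploratory even if the estimated
    prior is off. `est_error` is an assumed error bound, shrinking
    as more products (experiments) are observed."""
    return mu_hat, Sigma_hat + est_error * np.eye(len(mu_hat))

def thompson_price(mu, Sigma, prices, expected_demand):
    """Sample a demand parameter, then price greedily against the sample.
    e.g. linear demand: expected_demand = lambda th, p: th[0] - th[1] * p"""
    theta = np.random.multivariate_normal(mu, Sigma)
    revenue = [p * expected_demand(theta, p) for p in prices]
    return prices[int(np.argmax(revenue))]
```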

09:06
Safely Learning to Personalize from Sequential Observational Data
PRESENTER: Angela Zhou

ABSTRACT. We study the problem of learning personalized decision policies from observational data, in the one-stage/batch setting, while accounting for possible unobserved confounding. We prove generalization guarantees that ensure our policy will be safe when applied in practice and will in fact obtain the best-possible uniform control on the range of all possible population regrets that agree with the possible extent of confounding. We develop efficient algorithmic solutions to compute this confounding-robust policy. Since our approach considers robustness over an uncertainty set of weights that does not shrink as the data increases, our generalization guarantees crucially leverage the optimization structure in order to study uniform convergence over a solution space of restricted functional complexity. Finally, we assess and compare our methods on synthetic and semi-synthetic data. In particular, we consider a case study on personalizing hormone replacement therapy based on observational data, where we validate our results on a randomized experiment. We also develop extensions of the framework for the sequential decision-making setting. We consider the case of off-policy evaluation in infinite-horizon MDPs, where historical decisions may have been made with additional unobserved confounders. Our approach optimizes a linear objective function over the set of partially identified solutions to an estimating equation used in the unconfounded off-policy evaluation setting.

08:00-09:30 Session TA5: Interplay between Optimization and Statistics II
Location: Furman Room
08:00
Nonsmooth Optimization over Stiefel Manifold: Riemannian Subgradient Methods
PRESENTER: Zhihui Zhu

ABSTRACT. We consider a class of nonsmooth optimization problems over the Stiefel manifold, which are ubiquitous in engineering applications but still largely unexplored. We study this type of nonconvex optimization problem under the setting that the function is weakly convex in the Euclidean space, and address it using a family of Riemannian subgradient methods. First, we show an iteration complexity of $O(\epsilon^{-4})$ for these algorithms to drive a natural stationarity measure below $\epsilon$. Moreover, local linear convergence can be achieved for the Riemannian subgradient and incremental subgradient methods if the optimization problem further satisfies the sharpness property and the algorithms are initialized close to the set of weak sharp minima. As a result, we provide the first convergence rate guarantees for a family of Riemannian subgradient methods used to optimize nonsmooth functions over the Stiefel manifold, under reasonable regularity of the functions. The fundamental ingredient for establishing the aforementioned convergence results is that any weakly convex function in Euclidean space admits an important property holding uniformly over the Stiefel manifold, which we name the Riemannian subgradient inequality. We then extend our convergence results to a broader class of compact Riemannian manifolds embedded in Euclidean space. Finally, we will discuss the applications of these results in the context of robust subspace recovery and orthogonal dictionary learning. This is joint work with Xiao Li, Shixiang Chen, Zengde Deng, Qing Qu, and Anthony Man-Cho So.
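
A single step of a generic Riemannian subgradient method on the Stiefel manifold can be sketched as follows; this is a standard tangent-space projection followed by a polar retraction, and the paper's exact update rule may differ:

```python
import numpy as np

def polar_retraction(X):
    """Retract a matrix back onto the Stiefel manifold {X : X^T X = I}."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

def riemannian_subgradient_step(X, G, step):
    """One Riemannian subgradient step at X on the Stiefel manifold.
    G is a Euclidean subgradient of the (possibly nonsmooth) objective."""
    sym = (X.T @ G + G.T @ X) / 2        # symmetric part of X^T G
    riem_grad = G - X @ sym              # project G onto the tangent space at X
    return polar_retraction(X - step * riem_grad)
```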

08:30
Near-optimal bounds for phase synchronization
PRESENTER: Yiqiao Zhong

ABSTRACT. The problem of phase synchronization is to estimate the phases (angles) of a complex unit-modulus vector from their noisy pairwise relative measurements. It is motivated by applications such as cryo-EM, camera orientation estimation, time synchronization of distributed networks, etc. This problem is usually formulated as a nonconvex optimization problem, and many methods, including semidefinite programming (SDP) and generalized power method (GPM), have been proposed to solve it. A lack of understanding of the properties of the nonconvex optimization problem has led to suboptimal results in analysis. In this talk, we bridge this gap, by proving that SDP is tight, under a near-optimal SNR (up to a log factor); and that GPM converges to the global optimum under the same regime. A novel leave-one-out analysis is used to analyze the solution of nonconvex problems.
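
The generalized power method mentioned above admits a compact sketch; the iteration below is its textbook form for phase synchronization, with illustrative names:

```python
import numpy as np

def generalized_power_method(C, z0, iters=100):
    """GPM for phase synchronization: estimate unit-modulus phases z
    from a noisy pairwise measurement matrix C (ideally C ~ z z^* + noise)."""
    z = z0 / np.abs(z0)
    for _ in range(iters):
        w = C @ z
        z = w / np.abs(w)    # project each entry back onto the unit circle
    return z
```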

08:52
Provable Generalizability of Deep Neural Networks under Sparsity-Inducing Regularization

ABSTRACT. Neural networks (NNs) have been frequently discussed and widely applied in the recent literature. Despite the promising and frequent advancements in NN-related algorithms, models, and applications, the development of their theoretical underpinnings seems to lag behind. In particular, one open question looms large in the recent literature: how can one theoretically ensure the generalization performance of an NN trained with finitely many samples? Most existing theories addressing this question indicate that the generalization error deteriorates polynomially in the dimensionality (the number of fitting parameters) of the NN. Based on those results, it would seem necessary for the sample size to always exceed the number of fitting parameters in order for the model to generalize properly. This is, however, inconsistent with the frequent success of (much) over-parameterized NNs in many practical applications. Towards resolving this inconsistency, our research provides a high-dimensional statistical learning analysis of NNs and introduces a nonconvex regularization that induces sparsity in the network connections. As a result, we show that the generalization error of an over-parameterized NN can be bounded by a poly-logarithmic function of the dimensionality. Such a bound is established for all stationary points of the regularized in-sample NN training formulation.

09:14
Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution
PRESENTER: Cong Ma

ABSTRACT. Recent years have seen astounding progress in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting or completely lacks performance guarantees.

This paper uncovers a striking phenomenon in nonconvex optimization: even in the absence of explicit regularization, gradient descent enforces proper regularization implicitly under various statistical models. In fact, gradient descent follows a trajectory staying within a basin that enjoys nice geometry, consisting of points incoherent with the sampling mechanism. This “implicit regularization” feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on three fundamental statistical estimation problems, i.e. phase retrieval, low-rank matrix completion, and blind deconvolution, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. In particular, by marrying statistical modeling with generic optimization theory, we develop a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument.
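
A minimal sketch of the vanilla procedure studied here is gradient descent on the (real-valued) quadratic phase-retrieval loss with no explicit regularization; the step size and names are illustrative assumptions:

```python
import numpy as np

def phase_retrieval_gd(A, y, x0, step=0.1, iters=500):
    """Vanilla gradient descent on the quadratic phase-retrieval loss
    f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2, without any explicit
    regularization (no trimming, no projection)."""
    m = len(y)
    x = x0.copy()
    for _ in range(iters):
        r = (A @ x) ** 2 - y                 # residuals of squared measurements
        grad = (A.T @ (r * (A @ x))) / m     # gradient of the quartic loss
        x -= step * grad
    return x
```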

08:00-09:30 Session TA6: Network Optimization Models and Applications II
Location: Palmetto Room
08:00
A multi-vehicle covering tour problem with speed optimization
PRESENTER: Joshua Margolis

ABSTRACT. The multi-vehicle covering tour problem with speed optimization aims to construct a set of maximal coverage routes for a fleet of vehicles that serve (observe) a secondary set of sites given a fixed time schedule and coverage requirements. We develop a labeling algorithm to solve its Lagrangian relaxation problem, which exploits the combinatorial structure exhibited by an optimal solution to the Lagrangian relaxation. Further, we consider extensions by incorporating risk thresholds, energy capacities, and time windows. The problem is formulated as a mixed-integer second-order cone programming (MISOCP) model and is an extension of both the multi-covering tour problem and the vehicle routing problem with time windows under energy constraints.

08:22
Network Alignment by Propagating Reliable Similarities
PRESENTER: Zirou Qiu

ABSTRACT. Networks model a variety of complex phenomena across different domains. In many applications, one of the central questions is how to align two or more networks to infer the similarities between nodes and discover potential correspondence. In this paper, we propose RuleAlign, a network alignment algorithm that relies exclusively on the underlying graph structure. Under the guidance of rules which we define, we compute the similarity between a pair of cross-network vertices iteratively based on their neighborhoods. The resulting cross-network similarity matrix is then used to infer a permutation matrix that encodes the alignment. We improve the performance of a commonly used post-processing step (local search) by introducing a novel selection rule based on the levels of mismatching of vertices. Through extensive numerical experiments, we show that our alignment algorithm outperforms state-of-the-art alignment methods in terms of alignment accuracy at the cost of an extra logarithmic factor. Reproducibility: our source code, documentation, and data sets are available at https://tinyurl.com/y6qlsyh2. Link to the full paper: https://arxiv.org/abs/1911.05486

08:44
A Team Orienteering Problem for Fixed-Wing Drones

ABSTRACT. We formulate a team orienteering problem with multiple fixed-wing drones and develop an exact branch-and-price algorithm for it. Fixed-wing drones, unlike rotary drones, have kinematic constraints associated with them, preventing them from making on-the-spot turns and restricting them to a minimum turn radius. We present the implications of these constraints on the drone routing problem formulation and propose a systematic technique to address them in the context of the team orienteering problem. Further, we propose a novel branch-and-price algorithm with acceleration schemes and branching techniques specific to the constraints imposed by fixed-wing drones. We also present computational results corroborating the effectiveness of our algorithm on benchmark instances.

08:00-09:30 Session TA7: Game Theory, Agriculture and Ecology
Location: Alexander Room
08:00
A bilevel programming model for coordinating product transitions

ABSTRACT. We formulate the hierarchical and decentralized decision-making process during product transitions using bilevel programming. Unlike typical bilevel programming formulations, the proposed model has two interdependent followers. We address the interdependency between followers by redirecting information through the leader, and propose an exact solution approach based on cut-and-column generation.

08:22
Incentivizing Homeowners for Tree Management using a Principal-Agent Game-Theoretical Approach
PRESENTER: Chen Chen

ABSTRACT. This paper presents a principal-agent game-theoretical framework where the principal delegates an action to each type of agent through a take-it-or-leave-it contract, which optimizes the expected value of the principal. The approach considers the possible rational and irrational actions of agents who hold private information, and it proposes a mechanism that optimizes the resource allocation of the principal. We apply our model to the control of the emerald ash borer (EAB), a destructive pest species of ash trees native to Asia, through collaboration between the principal (a city) and an agent (a homeowner) for the treatment or removal of ash trees. We classify each homeowner's type based on the number of ash trees he/she owns. The city offers a menu of contracts to each homeowner to induce the homeowner to treat infested trees. We also extend the model to a nonlinear stochastic formulation to consider multiple choices based on the preference of each type of agent. We find that if both inspection and treatment are sufficiently effective, treating only the infested trees can significantly limit the negative impact of EAB. Otherwise, homeowners may need to treat all trees regardless of their infestation status.

08:44
Global Supply Chain Networks and Tariff Rate Quotas: Equilibrium Analysis with Application to Agricultural Products
PRESENTER: Deniz Besik

ABSTRACT. In this paper, we develop a global supply chain network model in which profit-maximizing firms engage in competition in the production and distribution of products in the presence of quantitative trade policy instruments in the form of tariff rate quotas. Tariff rate quotas are two-tiered tariffs, in which a lower in-quota tariff is applied to the units of imports until a quota or upper bound is attained and then a higher over-quota tariff is applied to all subsequent imports. They are utilized to protect domestic producers in the case of a wide range of products, from agricultural ones to fabrics and even steel, and can be challenging to formulate. We construct the governing set of novel equilibrium conditions associated with the product flows and Lagrange multipliers, which correspond to quota rent equivalents, and derive the variational inequality formulation. Qualitative properties are presented along with an effective algorithm, which is then applied to compute solutions to numerical examples comprising an agricultural product case study on avocados and global trade. This work is the first to model and solve general, competitive supply chain network problems consisting of oligopolistic firms with multiple production sites and demand markets in multiple countries subject to tariff rate quotas.

09:45-10:45 Session Plenary IV: Constrained Optimization in Machine Learning (Speaker: Juan Pablo Vielma)

Abstract: 

While the machine learning world is dominated by unconstrained optimization models and methods, the operations research world often (or even mostly) considers constrained optimization models and methods. In this talk we consider the potential impact of computational methods and software tools for constrained optimization in machine learning through two specific applications. The first application considers the use of mixed integer programming to verify the robustness of trained deep neural networks. The second application considers the use of advanced interior point solvers for shape-constrained regression.
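
For the first application, a common mixed-integer encoding of a ReLU unit $y = \max(0, w^\top x + b)$ introduces a binary activation indicator $a$ together with valid pre-activation bounds $l \le w^\top x + b \le u$; this is the standard big-M formulation, not necessarily the exact one used in the talk:

$$
\begin{aligned}
& y \ge w^\top x + b, \qquad y \ge 0, \\
& y \le w^\top x + b - l\,(1 - a), \qquad y \le u\,a, \qquad a \in \{0, 1\}.
\end{aligned}
$$

Setting $a = 1$ forces $y = w^\top x + b$ (the active branch), while $a = 0$ forces $y = 0$; the tighter the bounds $l, u$, the stronger the relaxation.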

Bio:

Juan Pablo Vielma is the Richard S. Leghorn (1939) Career Development Associate Professor at MIT Sloan School of Management. Dr. Vielma has a B.S. in Mathematical Engineering from University of Chile and a Ph.D. in Industrial Engineering from the Georgia Institute of Technology. His current research interests include the theory and practice of mixed-integer programming and applications in energy, natural resource management, marketing and statistics. He has received the Presidential Early Career Award for Scientists and Engineers (PECASE), the NSF CAREER Award, the INFORMS Computing Society Prize and the INFORMS Optimization Society Student Paper Prize. Dr. Vielma has served as chair of the INFORMS Section on Energy, Natural Resources, and the Environment and as vice-chair for Integer and Discrete Optimization of the INFORMS Optimization Society. He is currently an associate editor for Operations Research and Operations Research Letters, a member of the board of directors of the INFORMS Computing Society, and a member of the NumFocus steering committee for JuMP.

Location: Gold Ballroom
11:00-12:30 Session TB1: Robust Optimization
11:00
Data-Driven Two-Stage Conic Optimization with Rare High-Impact Zero-One Uncertainties

ABSTRACT. We address high dimensional zero-one random parameters in two-stage convex conic optimization problems. Such parameters typically represent failures of network elements and constitute rare, high-impact random events in several applications. Given a sparse training dataset of the parameters, we motivate and study a distributionally robust formulation of the problem using a Wasserstein ambiguity set centered at the empirical distribution. We present a simple, tractable and conservative approximation of this problem that can be efficiently computed and iteratively improved. Our method relies on a reformulation that optimizes over the convex hull of a mixed-integer conic programming representable set, followed by an approximation of this convex hull using lift-and-project techniques. We illustrate the practical viability and strong out-of-sample performance of our method on nonlinear optimal power flow problems affected by random contingencies, and report improvements of up to 20% over existing methods.
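
The Wasserstein ambiguity set referenced here typically takes the following standard form (a generic statement of the model class, with $\widehat{\mathbb{P}}_N$ the empirical distribution of the $N$ training samples, $\varepsilon$ the ball radius, and $Q(x,\xi)$ the recourse cost):

$$
\min_{x} \; \sup_{\mathbb{Q} \,:\, W(\mathbb{Q},\, \widehat{\mathbb{P}}_N) \le \varepsilon} \; \mathbb{E}_{\mathbb{Q}}\!\left[ Q(x, \xi) \right],
$$

where $W$ denotes the Wasserstein distance between distributions.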

11:22
Data-Driven Robust MDPs Using Distance-Based Ambiguity Sets

ABSTRACT. We study a robust version of finite-horizon MDPs, where the uncertainty in the transition probability for each state-action pair is only observable through historical data. Using the Wasserstein metric, we construct an ambiguity set for each state-action pair from this historical data. We show that this robust MDP admits a tractable convex formulation and derive the structure of the worst-case distribution. We also prove the convergence of the robust model to the nominal model as the sample size of the historical data increases. Furthermore, using a concentration-of-measure inequality for the Wasserstein metric, we derive finite-sample performance guarantees for solutions of the robust MDP. Extensions to other distance-based ambiguity sets will be discussed if time permits.

11:44
Robust Optimization with Order Statistic Uncertainty Set
PRESENTER: Pengfei Zhang

ABSTRACT. We propose a new uncertainty set based on order statistics for robust optimization problems. We develop a tractable formulation for the robust optimization model with our new uncertainty set. We highlight the advantages of the new uncertainty set and show that the interval uncertainty set, the budget uncertainty set, and the demand uncertainty set can be obtained as special cases of the new uncertainty set. Finally, we report computational results on portfolio optimization problems with shortfall constraints and show that the new uncertainty set has superior performance relative to common alternatives.

11:00-12:30 Session TB2: Optimization of Service Systems Under Uncertainty
Location: Gold Ballroom
11:00
Nurse Staffing under Absenteeism: A Distributionally Robust Optimization Approach
PRESENTER: Minseok Ryu

ABSTRACT. This paper describes a data-driven approach for nurse staffing decisions under uncertain demand and absenteeism. We propose a distributionally robust nurse staffing (DRNS) model with both exogenous uncertainty (stemming from demand) and endogenous uncertainty (stemming from nurse absenteeism). We provide a separation approach to solve the DRNS model with general nurse pool structures. We also identify several classes of nurse pool structures that often arise in practice and show how the DRNS model under each of these structures can be reformulated as a monolithic mixed-integer linear program that can be solved with off-the-shelf commercial software. Finally, we propose an optimal nurse pool design model.

11:22
A Distributionally Robust Optimization Approach for Stochastic Elective Surgery Scheduling with Limited ICU Capacity
PRESENTER: Karmel Shehadeh

ABSTRACT. We study the decision process of assigning elective surgery patients to available surgery blocks under random surgery durations, random patient lengths-of-stay in the intensive care unit (ICU), and limited ICU capacity. The probability distribution of the random parameters is assumed ambiguous, and only the means and ranges are known. Therefore, we propose a distributionally robust elective surgery scheduling (DRESS) model that seeks surgery scheduling decisions minimizing patient-related costs and the worst-case expected cost of operating room (OR) overtime, OR idle time, and insufficient ICU capacity. We evaluate the worst case over a family of distributions characterized by the known means and ranges of the random parameters. We leverage the separability of DRESS in deriving an exact mixed-integer nonlinear programming reformulation. We linearize and derive a family of symmetry-breaking inequalities to improve the solvability of the reformulation using an adapted column-and-constraint generation algorithm. Finally, we conduct extensive numerical experiments. The results demonstrate the superior performance of our DR approach compared to the existing stochastic programming approach and provide new insights for elective surgery scheduling.

11:44
On the Values of Vehicle-to-Grid Electricity Selling in Electric Vehicle Sharing
PRESENTER: Yiling Zhang

ABSTRACT.
(1) Problem definition: We study electric vehicle (EV) sharing systems and explore the opportunity of incorporating vehicle-to-grid (V2G) electricity selling in EV sharing.
(2) Academic/practical relevance: The problem involves complex planning and operational decisions, as well as multiple sources of uncertainty. The related optimization models impose significant computational challenges, and the potential value of V2G integration may have far-reaching impacts on EV sharing and sustainability.
(3) Methodology: We formulate the problem as a two-stage stochastic integer linear program. In the first stage, we optimize decisions related to service planning, the capacity of parking and charging facilities, EV battery capacities, and EV allocation in each zone under uncertain time-dependent trip demand and electricity prices. In the second stage, for a realized demand-price scenario, we construct a time-and-charging-status-expanded transportation network and optimize operations of the shared vehicle fleet, EV battery charging, and V2G selling. We develop Benders decomposition and scenario decomposition approaches to improve computational efficiency. A linear-decision-rule-based approximation approach is also provided to model dynamic operations.
(4) Results: Via testing instances based on real-world and synthetic data, we demonstrate the computational efficacy of our approaches and study the benefits of integrating V2G in EV sharing from the service provider, consumer, and socio-environmental perspectives.
(5) Managerial implications: V2G integration can significantly increase the profitability of EV sharing and the quality of service. It favors larger EV fleets and battery capacities, which further leads to various socio-environmental benefits. The benefit of V2G can prevail even with more severe battery degradation and can be more significant when combined with (i) a more stringent service level, (ii) more traffic congestion, or (iii) an urban spatial structure with concentrated business/residential areas. V2G integration (complemented by fast charging technology) can also benefit carshare users through improvements in the quality of service.

11:00-12:30 Session TB3: Discrete Optimization Methods and Applications
Location: Card Room
11:00
Optimal location of last-mile consolidation areas to reduce environmental impact of urban deliveries of goods: a heuristic approach

ABSTRACT. Urban deliveries of goods have become a rising challenge due to the increase in deliveries to households from e-commerce in addition to traditional business-to-business (B2B) dynamics. To maintain a satisfactory level of service, carriers must conduct delivery operations in restrictive environments, creating undesirable externalities such as congestion and pollution. Last-mile consolidation areas (LMCAs) are defined as on-street or off-street spaces where all the deliveries within a certain radius are dropped and reconsolidated to be delivered by more sustainable last-leg modes. The creation of provisional LMCAs might help mitigate the impact of those negative externalities. This work formulates the last-mile delivery problem assisted by LMCAs as a mixed integer quadratically constrained program (MIQCP) and presents a greedy heuristic-based algorithm to solve it, inspired by decomposition methods for large-scale linear optimization problems with complicating variables. The simulation results and a case study in Manhattan show that the heuristic provides reasonable results with respect to the exact solution while also shedding light on how these results can be used by policy planners to locate LMCAs in order to reduce the externalities of last-mile operations without compromising the level of service to customers.

11:22
University timetabling problem with requested instructor workload using constraint programming
PRESENTER: Nipitta Burana

ABSTRACT. We focus on a university teaching assignment and course scheduling problem with requested instructor workloads. Since instructors should not teach much more than their requested workload, we assume that some specific course sections may need to be divided into two parts: before midterm and after midterm. More than one instructor can then be assigned to these course sections, which reduces the overload of those instructors. To build the timetable, we divide the calculation into two steps, both using constraint programming. The first step solves the problem without dividing course sections. The second step is more complicated, since some course sections require the same instructor in both parts while others may allow different instructors in each part. We extract the solution obtained in the first step to determine the updated workload of each instructor before rerunning the constraint program.

11:44
On Modeling Local Search with Special-Purpose Combinatorial Optimization Hardware
PRESENTER: Xiaoyuan Liu

ABSTRACT. As we approach the physical limits predicted by Moore's law, a variety of specialized hardware is emerging to tackle specialized tasks in different domains. Within combinatorial optimization, adiabatic quantum computers, CMOS annealers, and optical parametric oscillators are a few of the emerging specialized hardware technologies aimed at solving optimization problems. In terms of the mathematical framework, the Ising optimization model unifies all of these emerging special-purpose devices; in other words, they are all designed to solve optimization problems expressed in the Ising model. Due to a variety of constraints specific to each type of hardware, these devices usually suffer from two main challenges: (i) the number of variables that the hardware can manage to solve, and (ii) precision limitations. Given that large-scale real-world problems, including problems in combinatorial scientific computing, data science, and network science, require significantly more variables than these devices provide, we are likely to see cloud-based deployments of these devices made available for parallel and shared access. Thus, hybrid techniques combining hardware and software must be developed to utilize these technologies. Local search as a meta-heuristic seems like a natural approach to tackle this challenge. However, a general optimization step within local search is not traditionally formulated in the Ising model, so straightforward modeling of local search can lead to under-utilization of the hardware. In this work, we propose a new approach to modeling local search on these special-purpose hardware devices that takes the limitations of the Ising model and current hardware into account. We demonstrate that the proposed approach utilizes a given hardware more efficiently than previous approaches.
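
The Ising/QUBO correspondence at the center of this line of work is a simple change of variables, x = (1 + s)/2. A small sketch of the conversion (names illustrative) is:

```python
import numpy as np

def qubo_to_ising(Q):
    """Convert a QUBO  min_x x^T Q x, x in {0,1}^n  to the Ising form
    min_s s^T J s + h^T s + c, s in {-1,+1}^n, via x = (1 + s) / 2."""
    J = Q / 4.0
    h = (Q.sum(axis=0) + Q.sum(axis=1)) / 4.0   # row sums + column sums
    c = Q.sum() / 4.0
    # s_i^2 = 1, so diagonal couplings are constants: fold them into c
    c += np.trace(J)
    np.fill_diagonal(J, 0.0)
    return J, h, c
```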

11:00-12:30 Session TB4: Advances in Nonlinear Optimization and Applications to Machine Learning
11:00
Scaling Up Quasi-Newton Algorithms: Communication Efficient Distributed SR1
PRESENTER: Majid Jahani

ABSTRACT. In this paper, we present a scalable distributed implementation of the sampled LSR1 (S-LSR1) algorithm. First, we show that a naive distributed implementation of S-LSR1 requires multiple rounds of expensive communications at every iteration and thus is inefficient. We then propose DS-LSR1, a communication-efficient variant of the S-LSR1 method, that drastically reduces the amount of data communicated at every iteration, that has favorable work-load balancing across nodes and that is matrix-free and inverse-free. The proposed method scales well in terms of both the dimension of the problem and the number of data points. Finally, we illustrate the performance of DS-LSR1 on standard neural network training tasks.
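
For reference, the (sampled or classical) LSR1 method is built on the SR1 secant update; a minimal dense sketch of that update with the standard skipping safeguard (the textbook rule, not the paper's distributed variant) is:

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """Classical SR1 update of a Hessian approximation B.
    s = x_{k+1} - x_k,  y = grad_{k+1} - grad_k."""
    v = y - B @ s
    denom = v @ s
    # standard safeguard: skip the update when the denominator is tiny
    if abs(denom) < r * np.linalg.norm(s) * np.linalg.norm(v):
        return B
    return B + np.outer(v, v) / denom
```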

11:22
A Line-Search Descent Algorithm for Strict Saddle Functions with Complexity Guarantees
PRESENTER: Michael O'Neill

ABSTRACT. Machine learning has given rise to many nonconvex optimization problems with geometry that is amenable to first-order methods. Of particular interest are "strict saddle" functions, whose critical points are either local (and often global) minimizers or saddle points with a direction of negative curvature for the Hessian. In this work, we develop a method which adaptively estimates the strict saddle parameters that define the optimization landscape. These estimates are used at each iteration to determine whether a gradient descent or negative curvature detection step is taken. The resulting method has a global linear convergence rate, without prior knowledge of the strict saddle parameters or a specialized initialization procedure.

11:44
Acceleration of Primal-Dual Methods by Preconditioning and Simple Subproblem Procedures
PRESENTER: Yanli Liu

ABSTRACT. Primal-Dual Hybrid Gradient (PDHG) and the Alternating Direction Method of Multipliers (ADMM) are popular first-order optimization methods. They reduce a difficult problem to simple subproblems, so they are easy to implement and have diverse applications. As first-order methods, however, they are sensitive to problem conditioning and can struggle to reach the desired accuracy. To improve their performance, researchers have proposed techniques such as diagonal preconditioning and inexact subproblems. This paper realizes an additional speedup of about one order of magnitude.

Specifically, we choose general (non-diagonal) preconditioners that are much more effective at reducing the total number of PDHG/ADMM iterations than diagonal ones. As a result, in many applications the subproblems lose their closed-form solutions. We show, however, that it suffices to solve each subproblem approximately with a fixed small number of proximal-gradient iterations or a fixed small number of epochs of proximal block-coordinate descent, which are simple and have closed-form steps. Global convergence of this approach is proved.

Our method opens the choices of preconditioners and maintains both low per-iteration cost and global convergence. Consequently, on several typical applications of primal-dual first-order methods, we obtain 4--95$\times$ speedup over the existing state-of-the-art.
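
For context, the vanilla PDHG iteration that the preconditioners accelerate can be sketched as below; the prox callables and scalar step sizes are placeholders, and the non-diagonal preconditioning discussed in the talk replaces the scalars tau and sigma with positive definite matrices:

```python
import numpy as np

def pdhg(K, prox_f, prox_gstar, x0, y0, tau, sigma, iters=1000):
    """Vanilla (diagonally stepped) PDHG for min_x f(x) + g(Kx).
    prox_f(v, tau) and prox_gstar(v, sigma) are the proximal operators
    of f and of the convex conjugate g*, supplied by the caller."""
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        x_new = prox_f(x - tau * (K.T @ y), tau)            # primal step
        y = prox_gstar(y + sigma * (K @ (2 * x_new - x)), sigma)  # dual step
        x = x_new
    return x, y
```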

12:06
Decentralized Computation of Effective Resistances for Distributed Learning
PRESENTER: Bugra Can

ABSTRACT. The effective resistance between a pair of nodes in a weighted undirected graph is defined as the potential difference induced between them when a unit current is injected at the first node and extracted at the second node, treating edge weights as the conductance values of edges. The effective resistance is a key quantity of interest in many applications and fields including solving linear systems, Markov Chains and continuous-time averaging networks. We develop an efficient linearly convergent distributed algorithm for computing effective resistances and demonstrate its performance through numerical studies. We also apply our algorithm to the consensus problem where the aim is to compute the average of node values in a distributed manner based on weighted averaging with immediate neighbors. We show that the distributed algorithm we developed for effective resistances can be used to design a weight matrix which can help pass the information among neighbors more effectively, accelerating the convergence of the classical consensus iterations considerably by a factor depending on the network structure. We also present an application of our effective resistance-based framework to accelerate distributed optimization algorithms including the EXTRA and DPGA-W.
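
The quantity computed here in a distributed manner has a simple centralized definition via the Laplacian pseudoinverse; a reference sketch of that definition (not the decentralized algorithm of the talk) is:

```python
import numpy as np

def effective_resistance(L, i, j):
    """Effective resistance between nodes i and j of a graph with
    weighted Laplacian L, via the Moore-Penrose pseudoinverse:
    R_ij = (e_i - e_j)^T L^+ (e_i - e_j)."""
    n = L.shape[0]
    e = np.zeros(n)
    e[i], e[j] = 1.0, -1.0
    return float(e @ np.linalg.pinv(L) @ e)
```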

11:00-12:30 Session TB5: Convex and Discrete Optimization for Machine Learning
Location: Furman Room
11:00
An $O(s^r)$-Resolution ODE Framework for Discrete-Time Optimization Algorithms and Applications to Convex-Concave Saddle-Point Problems

ABSTRACT. There has been a long history of using Ordinary Differential Equations (ODEs) to understand the dynamics of discrete-time optimization algorithms. However, one major difficulty in applying this approach is that multiple ODEs can correspond to the same discrete-time algorithm, depending on how the continuous limit is taken, which makes it unclear how to obtain the suitable ODE from a discrete-time optimization algorithm. Inspired by a recent paper of Shi et al. (2018), we propose the $r$-th degree ODE expansion of a discrete-time optimization algorithm, which provides a principled approach to construct the unique $O(s^r)$-resolution ODE system for a given discrete-time algorithm, where $s$ is the step-size of the algorithm. We utilize this machinery to study three classic algorithms -- the gradient method (GM), the proximal point method (PPM), and the extra-gradient method (EGM) -- for finding the solution to the unconstrained convex-concave saddle-point problem $\min_{x\in\mathbb{R}^n} \max_{y\in \mathbb{R}^m} L(x,y)$, which explains their puzzling convergent/divergent behaviors when $L(x,y)$ is a bilinear function. Moreover, their $O(s)$-resolution ODEs inspire us to define the $O(s)$-linear-convergence condition on $L(x,y)$, under which PPM and EGM exhibit linear convergence. This condition not only unifies the known linear convergence rates of PPM and EGM, but also shows that these two algorithms exhibit linear convergence in broader contexts.

11:22
Learning Positive-Valued Functions with Pseudo Mirror Descent
PRESENTER: Yingxiang Yang

ABSTRACT. The nonparametric learning of positive-valued functions appears widely in machine learning, especially in the context of estimating intensity functions of point processes. Yet, existing approaches either require computing expensive projections or semidefinite relaxations, or lack convexity and theoretical guarantees after introducing nonlinear link functions. In this talk, we introduce the pseudo mirror descent algorithm for performing efficient estimation of positive functions within a Hilbert space without expensive projections. The algorithm guarantees positivity by performing stochastic mirror descent with an appropriately selected Bregman divergence, and a pseudo-gradient is adopted to speed up the gradient evaluation procedure in practice. We analyze both asymptotic and nonasymptotic convergence of the algorithm. Through simulations, we show that pseudo mirror descent outperforms the state-of-the-art benchmarks for learning intensities of Poisson and multivariate Hawkes processes, in terms of both computational efficiency and accuracy.

11:44
Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives
PRESENTER: Hussein Hazimeh

ABSTRACT. We consider a discrete optimization based approach for learning sparse classifiers. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) L0-regularized problems at scales much larger than what was considered possible in the statistics and machine learning communities. Despite their usefulness, MIP-based approaches are significantly slower compared to the relatively mature algorithms based on L1-regularization and relatives. We aim to bridge this computational gap by developing new MIP-based algorithms for L0-regularized classification. We propose two classes of scalable algorithms: an exact algorithm that can handle p ~ 50,000 features in a few minutes, and approximate algorithms that can address instances with p ~ 10^6 in times comparable to the fast L1-based algorithms. Our exact algorithm is based on the novel idea of integrality generation, which solves the original problem (with p binary variables) via a sequence of mixed integer programs that involve a small number of binary variables. Our approximate algorithms are based on coordinate descent and local combinatorial search. In addition, we present new estimation error bounds for a class of L0-regularized estimators. Experiments on real and synthetic data demonstrate that our approach leads to models with considerably improved statistical performance (especially, variable selection) when compared to competing toolkits. This is a joint work with Antoine Dedieu and Rahul Mazumder.

11:00-12:30 Session TB6: Recent Advances in Structured Nonlinear Optimization, and Applications in Machine Learning
Location: Palmetto Room
11:00
Wasserstein Distributionally Robust Inverse Multiobjective Optimization
PRESENTER: Bo Zeng

ABSTRACT. We propose a distributionally robust approach to inverse multiobjective optimization. Specifically, we study the problem of learning the objective functions or constraints of a multiobjective decision making model, based on a set of observed decisions. In particular, these decisions might not be exact and possibly carry measurement noises or are generated with the bounded rationality of decision makers. We use the Wasserstein metric to construct the uncertainty set centered at the empirical distribution of these decisions. We show that this framework has statistical performance guarantees. We also develop an algorithm to solve the resulting minmax problem and prove its finite convergence. Numerical results demonstrate the performance of the distributionally robust approach to inverse multiobjective optimization.

11:22
Katyusha Acceleration for Convex Finite-Sum Compositional Optimization
PRESENTER: Yibo Xu

ABSTRACT. We focus on convex problems with a finite-sum compositional structure. One popularly used numerical approach for finite-sum problems is the stochastic gradient method (SGM). However, the additional compositional structure prohibits easy access to unbiased stochastic approximation of the gradient, so directly applying the SGM to a finite-sum compositional optimization problem (COP) is often inefficient.

We design new algorithms for solving strongly-convex and also convex two-level finite-sum COPs. Our design incorporates the Katyusha acceleration technique and adopts the mini-batch sampling from both outer-level and inner-level finite-sum. For strongly-convex finite-sum COPs, we obtain linear convergence rate in terms of the expected objective error; and from the convergence rate result, we then establish complexity results of the algorithm to produce an ε-solution. Our complexity results have the same dependence on the number of component functions as existing works. However, due to the use of Katyusha acceleration, our results have better dependence on the condition number κ and improve to κ^{2.5} from the best-known κ^3. For convex finite-sum COPs, we use as a subroutine the algorithm for strongly-convex finite-sum COPs. Again, we obtain better complexity results than existing works in terms of the dependence on ε, improving to ε^{-2.5} from the best-known ε^{-3}.

11:44
Near-Optimal Algorithms for Smooth Convex-Concave Minimax Optimization and Beyond
PRESENTER: Tianyi Lin

ABSTRACT. In this paper, we propose a class of near-optimal algorithms for solving the minimax optimization problem $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y)$ where $f(\cdot, \cdot)$ is $\ell$-smooth and $f(\cdot, y)$ is strongly convex for each $y \in \mathcal{Y}$ with a constant condition number $\kappa>0$. For the case when $f(x, \cdot)$ is strongly concave for each $x \in \mathcal{X}$ with a condition number $\bar{\kappa}>0$, our algorithm achieves the complexity bound of $\tilde{O}(\kappa^{1/2}\bar{\kappa}^{1/2})$, which improves upon the previous best bound $\tilde{O}(\max\{\kappa, \bar{\kappa}\})$ achieved by the extragradient algorithm and the optimistic gradient descent ascent algorithm. For the case when $f(x, \cdot)$ is concave, our algorithm achieves the complexity bound of $\tilde{O}(\kappa^{1/2}\epsilon^{-1/2})$. This improves upon the previous best bound $\tilde{O}(\kappa\epsilon^{-1/2})$ achieved by the dual implicit accelerated gradient algorithm.

Beyond the convex-concave setting, we propose new algorithms for smooth nonconvex-(strongly)-concave minimax optimization problems where $f(\cdot, y)$ is nonconvex for each $y \in \mathcal{Y}$. More specifically, for the case when $f(x, \cdot)$ is strongly concave for each $x \in \mathcal{X}$ with a condition number $\bar{\kappa} > 0$, our algorithm achieves the complexity bound of $\tilde{O}(\bar{\kappa}^{1/2}\epsilon^{-2})$, which is the best to our knowledge. For the case when $f(x, \cdot)$ is concave, our algorithm achieves the complexity bound of $\tilde{O}(\epsilon^{-3})$, which matches the current state-of-the-art result.

12:06
Sliding Linearized ADMM For Convex Smooth Optimization With Affine Constraints
PRESENTER: Trevor Squires

ABSTRACT. In this talk, we present a sliding linearized alternating direction method of multipliers (SL-ADMM) for solving a class of convex smooth optimization problems with affine constraints. We prove that the number of algorithm iterations for computing an ε-approximate solution is bounded by O(1/ε), while the number of evaluations of the gradient of the objective function is bounded by O(1/√ε). Preliminary numerical experiments will also be presented.

11:00-12:30 Session TB7: General Session
Location: Alexander Room
11:00
M-natural-convexity and its applications in operations
PRESENTER: Menglong Li

ABSTRACT. Substitution, where one item can be used to replace another, is a common scenario in practice. However, finding the optimal policy of a dynamic operations model with a substitution structure is usually a challenging problem. Unlike operations models with complementarity structures, which can adopt powerful tools from lattice programming, there is no general theoretical tool suited to those with substitution structures. The purpose of this paper is to provide such a tool by developing fundamental properties of M-natural-convexity. We show that in a parametric maximization model, if the objective function is M-natural-concave in the decision variables and parameters, and the constraint is a box, then the optimal solution is nonincreasing in the parameters. We also provide some preservation properties of M-natural-convexity and a characterization of twice continuously differentiable M-natural-convex functions. We then utilize them to analyze two important operations models: a classical dynamic stochastic inventory model with substitutable products and a portfolio contract model. We illustrate that looking through the lens of M-natural-convexity simplifies the complicated analysis in the literature and extends the results to more general settings. We also discuss the extension of our theoretical results to a broader function class referred to as S-convex functions.

11:22
Earned Value Measurement in Project Buffers Management

ABSTRACT. This research attempts to model and optimize the time-cost trade-offs in the project planning problem. The problem description is based on both safe and realistic estimates of time and cost, together with an influence-factor matrix. We also assume that performance improvements can be priced. Finally, we combine this method with Earned Value Analysis (EVA), one of the most popular project audit methods.

11:44
Developing a Mixed Integer Linear Program (MILP) to maximize material utilization and switch the setup from internal to external

ABSTRACT. A nesting program that optimizes material utilization while limiting marking-die switches to one per part is developed. The program ensures that material utilization is maximized and produces a list of how the parts will be cut from each steel bar, allowing the operator to prepare the marking dies while the machine is running (i.e., changing the setup from internal to external).

12:06
Strong contraction mapping and topological non-convex optimization

ABSTRACT. The strong contraction mapping, a surjective self-mapping that its range always belongs to a subset of its domain, admits a unique fixed-point that can be pinned down by the iteration of the mapping. As an application of strong contraction mapping, we design a topological non-convex optimization method to achieve global minimum convergence. The strength of the approach is its robustness to local minima and initial point position.