View: session overviewtalk overview
08:30 | A Decision Rule Approach for Two-Stage Data-Driven Distributionally Robust Optimization Problems with Random Recourse PRESENTER: Xiangyi Fan ABSTRACT. We study two-stage stochastic optimization problems with random recourse, where the adaptive decisions are multiplied with the uncertain parameters in both the objective function and the constraints. To mitigate the computational intractability of infinite-dimensional optimization, we propose a scalable approximation scheme via piecewise linear and piecewise quadratic decision rules. We then develop a data-driven distributionally robust framework with two layers of robustness to address distributionally uncertainty. The emerging optimization problem can be reformulated as an exact copositive program, which admits tractable approximations in semidefinite programming. We design a decomposition algorithm where smaller-size semidefinite programs can be solved in parallel, which further reduces the runtime. Lastly, we establish the performance guarantees of the proposed scheme and demonstrate its effectiveness through numerical examples. |
08:52 | DFO: A Robust Framework for Data-driven Decision-making with Outliers PRESENTER: Nan Jiang ABSTRACT. This work studies Distributionally Favorable Optimization (DFO), which seeks the best decision of a data-driven stochastic program under the most favorable distribution from a distributional family. When the problem involves outliers, the commonly-used Distributionally Robust Optimization (DRO) models tend to over-emphasize the unrealistic outliers or scenarios and cause non-robust misleading or even infeasible decisions. On the contrary, DFO can significantly mitigate the effects of outliers and has been not yet well studied. Thus, this paper fills the gap and shows that DFO recovers many robust statistics and can be truly ``robust" in the presence of outliers. While being NP-hard in general, a DFO model can be mixed-integer convex programming representable in many cases. We further propose a notion of decision outlier robustness to properly select a DFO framework for the data-driven optimization with outliers and extend the proposed DFO frameworks to solve two-stage stochastic programs without relatively complete recourse. |
09:14 | Learning and Decision-Making with Data : Optimal Formulations and Phase Transitions PRESENTER: Amine Bennouna ABSTRACT. We study the problem of designing optimal learning and decision-making formulations when only historical data is available. Prior work typically commits to a particular class of data-driven formulation and subsequently tries to establish out-of-sample performance guarantees. We take here the opposite approach. We define first a sensible yard stick with which to measure the quality of any data-driven formulation and subsequently seek to find an “optimal” such formulation. Informally, any data-driven formulation can be seen to balance a measure of proximity of the estimated cost to the actual cost while guaranteeing a level of out-of-sample performance. Given an acceptable level of out-of-sample performance, we construct explicitly a data-driven formulation that is uniformly closer to the true cost than any other formulation enjoying the same out-of-sample performance. We show the existence of three distinct out-of-sample performance regimes (a superexponential regime, an exponential regime and a subexponential regime) between which the nature of the optimal data-driven formulation experiences a phase transition. The optimal data-driven formulations can be interpreted as a classically robust formulation in the superexponential regime, an entropic distributionally robust formulation in the exponential regime and finally a variance penalized formulation in the subexponential regime. |
09:36 | Decision Rule Approaches for Distributionally Robust Stochastic Bilevel Linear Programs PRESENTER: Yiling Zhang ABSTRACT. We study a distributionally robust two-stage stochastic bilevel program (DRBP) in the context of sequential two-player games, where the leader makes a binary here-and-now decision, and the follower responds a continuous wait-and-see decision after observing the leader’s action and revelation of uncertainty. We show that the DRBP is equivalent to generic two-stage distributionally robust stochastic linear programs (TSDRLPs) under two conditions: (i) when the objective coefficients of the follower's decision in the leader's and the follower's problems coincide and (ii) when a pessimistic version of the leader's problem is considered. Under moment-based ambiguity sets of continuous distributions, since the resulting TSDRLP involves random objective and right-hand side, which is NP-hard in general, we approach it by linear decision rule techniques. We provide upper-bounding 0-1 semidefinite programming (SDP) and 0-1 copositive programming (COP) approximations. When distributions are supported on a finite discrete set, we provide exact 0-1 SDP reformulation of the resulting TSDRLP. Extensive computations are conducted to demonstrate the effectiveness of the proposed approaches on a facility location problem. |
08:30 | SQP Methods for Nonlinear Equality Constrained Stochastic Optimization PRESENTER: Albert Berahas ABSTRACT. We propose sequential quadratic optimization algorithms (with inexact solves) for minimizing a stochastic objective function subject to deterministic equality constraints. Algorithms that allow inexact subproblem solutions to be used are important in large-scale applications when the matrices used by the subproblem solver are too expensive to form or factorize. Convergence results (in expectation) are established for our proposed method (under common assumptions), and numerical experiments demonstrate the outperformance of our methods in terms of key efficiency measures. |
08:52 | Accelerating Stochastic Sequential Quadratic Programming for Equality Constrained Stochastic Optimization using Predictive Variance Reduction PRESENTER: Jiahao Shi ABSTRACT. We propose a variance reduction method for solving equality constrained stochastic optimization problems. Specifically, we develop a method based on the sequential quadratic programming paradigm that utilizes variance reduction in the gradient approximation via stochastic variance reduction gradient (SVRG). We prove exact convergence in expectation to first order stationarities with non-diminishing stepsize sequences. Finally, we demonstrate the practical performance of our proposed algorithm of standard constrained machine learning problems. |
09:14 | High Probability Complexity Bounds for Line Search Based on Stochastic Oracles PRESENTER: Miaolan Xie ABSTRACT. We consider a line-search method for continuous optimization under a stochastic setting where the function values and gradients are available only through inexact probabilistic zeroth and first-order oracles. These oracles capture multiple standard settings including expected loss minimization and zeroth-order optimization. Moreover, our framework is very general and allows the function and gradient estimates to be biased. The proposed algorithm is simple to describe, easy to implement and uses these oracles in a similar way as the standard deterministic line search uses exact function and gradient values. Under fairly general conditions on the oracles, we derive a high probability tail bound on the iteration complexity of the algorithm when applied to non-convex smooth functions. These results are stronger than those for other existing stochastic line search methods and apply in more general settings. |
09:36 | Worst-Case Complexity of an SQP Method for Nonlinear Equality Constrained Stochastic Optimization PRESENTER: Michael O'Neill ABSTRACT. A worst-case complexity bound is proved for a sequential quadratic optimization (commonly known as SQP) algorithm that has been designed for solving optimization problems involving a stochastic objective function and deterministic nonlinear equality constraints. Barring additional terms that arise due to the adaptivity of the monotonically nonincreasing merit parameter sequence, the proved complexity bound is comparable to that known for the stochastic gradient algorithm for unconstrained nonconvex optimization. The overall complexity bound, which accounts for the adaptivity of the merit parameter sequence, shows that a result comparable to the unconstrained setting (with additional logarithmic factors) holds with high probability. |
08:30 | On the Complexity of a Practical Primal-Dual Coordinate Method PRESENTER: Ahmet Alacaoglu ABSTRACT. We prove complexity bounds for the primal-dual algorithm with random extrapolation and coordinate descent (PURE-CD), which has been shown to obtain good practical performance for solving convex- concave min-max problems with bilinear coupling. Our complexity bounds either match or improve the best-known results in the literature for both dense and sparse (strongly)-convex-(strongly)-concave problems. |
08:52 | Regularized Step Directions in Conjugate Gradient Minimization ABSTRACT. Conjugate gradient minimization methods (CGM) and their accelerated variants are widely used in machine learning applications. We focus on the use of cubic regularization to improve the CGM direction independent of the steplength (learning rate) computation. Using Shanno's reformulation of CGM as a memoryless BFGS method, we derive new formulas for the regularized step direction, which can be evaluated without additional computational effort. The new step directions are shown to improve iteration counts and runtimes and reduce the need to restart the CGM. |
09:14 | p-Norm Flow Diffusion for Local Graph Clustering PRESENTER: Shenghao Yang ABSTRACT. Local graph clustering and the seed set expansion problem are primitives on graphs that are central to a wide range of analytic and learning tasks such as local clustering, community detection, semi-supervised learning, nodes ranking and feature inference. Prior work mostly falls into two categories with numerical and combinatorial roots respectively. In this work, we draw inspiration from both fields and pro- pose a family of convex optimization formulations based on the idea of diffusion with p-norm network flow. In the context of local clustering, we characterize the optimal solutions for these optimization problems and show their usefulness in finding low conductance cuts around input seed set. In particular, we achieve quadratic approximation of conductance in the case of p = 2 similar to the Cheeger-type bounds of spectral methods, constant factor approximation when p → ∞ similar to max-flow based methods, and a smooth transition for general p values in between. Our optimization formulation can be viewed as bridging the numerical and combinatorial approaches, and we can achieve the best of both worlds in terms of speed and noise robustness. We show that the proposed problem can be solved in strongly local running time for p ≥ 2. |
09:36 | Survey Descent: A Multipoint Generalization of Gradient Descent for Nonsmooth Optimization PRESENTER: X.Y. Han ABSTRACT. For strongly convex objectives that are smooth, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations. An analogous nonsmooth theory is challenging: even when the objective is smooth at every iterate, the corresponding local models are unstable, and traditional remedies need unpredictably many cutting planes. We instead propose a multipoint generalization of the gradient descent iteration for local optimization. While designed with general objectives in mind, we are motivated by a "max-of-smooth" model that captures subdifferential dimension at optimality. We prove linear convergence when the objective is itself max-of-smooth, and experiments suggest a more general phenomenon. |
08:30 | On Polytopes with Linear Rank with respect to Generalizations of the Split Closure PRESENTER: Yatharth Dubey ABSTRACT. Here we study the rank of polytopes in the 0-1 cube with respect to $t$-branch split cuts and $t$-dimensional lattice cuts for a fixed positive integer $t$. These inequalities are the same as split cuts when $t=1$ and generalize split cuts when $t > 1$. Known examples with high split rank -- i.e., at least $cn$ for some positive constant $c < 1$ -- are defined by exponentially many (as a function of $n$) linear inequalities. For any fixed integer $t > 0$, we give a family of polytopes contained in $[0,1]^n$ for sufficiently large $n$ such that each polytope has empty integer hull, is defined by $O(n)$ inequalities, and has rank $\Omega(n)$ with respect to $t$-dimensional lattice cuts. Therefore the split rank of these polytopes is $\Omega(n)$. It was shown earlier that there exist generalized branch-and-bound proofs, with logarithmic depth, of the nonexistence of integer points in these polytopes. Therefore, our lower bound results on split rank show an exponential separation between the depth of branch-and-bound proofs and split rank. |
08:52 | A reinterpretation of cutting planes PRESENTER: John Hooker ABSTRACT. Cutting planes accelerate branch-and-bound search by cutting off fractional solutions of the linear programming relaxation. We reinterpret cutting planes as a means to reduce backtracking by excluding inconsistent partial assignments -- that is, partial assignments that cannot be extended to a full feasible assignment. The constraint programming community has studied consistency extensively and uses it to reduce backtracking. We extend this approach to integer programming. We propose a cutting plane algorithm for achieving partial consistency and show that it can substantially reduce the search tree. More broadly, we suggest that consistency concepts offer a new perspective on IP that can lead to a better understanding of what makes branching methods work. |
09:14 | Extended Relaxation and Cutting-planes for Linear Programs with Complementarity Constraints PRESENTER: Haoran Zhu ABSTRACT. We study linear relaxation for linear programs with complementarity constraints, where the embedding conflict graph can be a general graph and complementarity constraints can have overlapping variables. Our technique is based on the graph theoretic concept of vertex cover, and generalizes the extended reformulation-linearization technique of Nguyen et al. introduced in (J. Global Optim., 80(2): 249-286, 2021) to problems with much more general complementarity constraints. We then discuss how to obtain strong cutting-planes for our formulation, one major part of them arise from the boolean quadric polytope associated with a complete bipartite graph. In an extensive computational study, we demonstrate the significant improvement of the linear relaxation in terms of the optimality gap being closed. |
09:36 | Modeling Disjunctive Constraints via Junction Trees PRESENTER: Bochuan Lyu ABSTRACT. A disjunctive constraint, which consists of fixed and finite alternative collections of constraints where at least one collection is satisfied, has soon become prevalent in mixed-integer programming (MIP) with various real-world applications since the late 1970s. In this work, we study the independent-branching (IB) framework of disjunctive constraints and identify a class of pairwise IB-representable disjunctive constraints: disjunctive constraints with junction trees. For this class of constraints, the existence of junction trees can be recognized in polynomial time. We also present a polynomial-time heuristic algorithm for the minimum biclique cover problem on the associated conflict graphs to build small and strong MIP formulations. Additionally, we apply the heuristic to find smaller MIP formulations of generalized special ordered sets with fewer variables and constraints than Huchette and Vielma [2019]. |
08:30 | Resiliency of on-demand multimodal transit systems during a pandemic PRESENTER: Kevin Dalmeijer ABSTRACT. During the COVID-19 pandemic, the collapse of the public transit ridership led to significant budget deficits due to dramatic decreases in fare revenues. Additionally, public transit agencies are facing challenges of reduced vehicle capacity due to social distancing requirements, additional costs of cleaning and protective equipment, and increased downtime for vehicle cleaning. This paper studies the resiliency during a pandemic of On-Demand Multimodal Transit Systems (ODMTS), a new generation of transit systems that combine a network of high-frequency trains and buses with on-demand shuttles to serve the first and last miles. It presents a case study for the city of Atlanta and evaluates ODMTS for multiple scenarios of depressed demand and social distancing representing various stages of the pandemic. The case study relies on an optimization pipeline that provides an end-to-end ODMTS solution by bringing together methods for demand estimation, network design, fleet sizing, and real-time dispatching. Real data from the Metropolitan Atlanta Rapid Transit Authority (MARTA) is used to conduct the case study, and the results are evaluated with a high-fidelity simulation. The case study demonstrates how ODMTS provide a resilient solution in terms of cost, convenience, and accessibility for this wide range of scenarios. |
08:52 | Iterative Approaches for Integrating Rider Behavior into the Design of Large-Scale On-Demand Multimodal Transit Systems PRESENTER: Beste Basciftci ABSTRACT. Capturing latent demand has a pivotal role in designing public transit services as omissions of these riders can lead to inefficiencies and unfairness. This paper explores this topic by combining mode preferences of riders constituting latent demand with a novel transit system---On-Demand Multimodal Transit Systems (ODMTS). In a previous study, a bilevel optimization framework between the transit agency and riders was employed to formulate this problem with an exact solution procedure. Nevertheless, due to the complexity and combinatorial nature of this problem, applying the exact algorithm on large-scale instances becomes computationally challenging. Based on the bilevel optimization framework, this paper derives certain properties of the optimal solutions to be looked for in the desirable network designs and proposes five iterative algorithms to efficiently approximate the optimal solution of this problem. The advantages of the proposed algorithms are demonstrated with two test cases constructed from real data---a normal-sized case study conducted in the Ann Arbor and Ypsilanti region in the state of Michigan and a large-scale case study conducted in the Atlanta metropolitan in the state of Georgia. |
09:14 | Plan Your System and Price for Free: Fast Algorithms for Multimodal Transit Operations PRESENTER: Qi Luo ABSTRACT. We study the problem of jointly pricing and designing a smart transit system, where the transit agency controls a fleet of demand-responsive vehicles and a fixed-line service. The platform offers commuters a menu of options to travel between origin and destination, and commuters make the utility-maximizing choice within this menu, given the price of each trip option. The goal of the platform is to determine the optimal set of trip options to display to commuters, prices for these modes, and the design of the transit network in order to maximize the social welfare of the system. In this work, we tackle the commuter choice aspect of this problem, traditionally approached via computationally intensive bi-level programming techniques. In particular, we develop a framework that efficiently decouples the pricing and network design problem: given an efficient (approximation) algorithm for centralized network design without prices, there exists an efficient (approximation) for decentralized network design with prices and commuter choice. We demonstrate the practicality of our framework via extensive numerical experiments on a real-world dataset, in which we show the efficiency of pricing algorithm. |
09:36 | Pricing in On-Demand (and One-Way) Vehicle Sharing Networks PRESENTER: Xiaobing Shen ABSTRACT. We consider the dynamic pricing problem that arises in the context of an on-demand vehicle sharing system with one-way trips. We provide an alternative approach to bounding the performance of static pricing policies. The approach is startlingly simple, producing a bound on the performance ratio of the static pricing policy upon the application of a well-known recursive relationship that relates system availability in a system with K vehicles to one with K-1 vehicles. The approach yields a parameterized family of static pricing policies that are asymptotically optimal and that generalize static pricing policies previously proposed in the literature. The bound on the performance ratio of the static policy is given by $K/(N + K - 1 + \Lambda/\mu)$ for systems with N locations and K vehicles, where $\Lambda$ is the total demand (sum of all trip requests) rate and $1/\mu$ is the average trip travel time. Moreover, the best static pricing policy this approach produces is optimal among those that require a demand balance constraint (e.g., as in Waserhole and Jost (2016) and Banerjee et al. (2021) ) with a performance that can be significantly higher. |
08:30 | A Generalized Primal-Dual Algorithm Framework for Saddle Point Problems PRESENTER: Xiaochun Niu ABSTRACT. Many important distributed and centralized optimization problems can be formulated as convex-concave saddle point problems. Most existing literature focuses on gradient-type primal-dual methods for this class of problems. In this work, we propose a generalized primal-dual algorithm framework, which includes gradient-type, Newton-type, mixture of gradient and Newton updates as well as other general descent/ascent directions. Theoretically, we derive a global linear convergence rate of generalized primal-dual methods in the case that objective function is smooth and strongly convex. We also provide a necessary and sufficient condition for the method to achieve the local quadratic convergence rate. Moreover, we consider applications of such framework in distributed optimization. We develop both Newton-type and hybrid (with heterogeneous gradient-type and Newton-type local updates) primal-dual implementations for consensus optimization problems and network flow problems. We also conduct numerical experiments on both synthetic data and real-life data to illustrate the performance of the methods. Our method is the first generalized framework with provable guarantees. |
08:52 | Mixed Integer Linear Optimization Formulations for Learning Optimal Binary Classification Trees PRESENTER: Brandon Alston ABSTRACT. Decision trees are powerful tools for classification and regression that attract many researchers working in the burgeoning area of machine learning. A decision tree has two types of nodes: (i) branching nodes at which datapoints are assessed on a set of features, and (ii) leaf nodes at which datapoints are given a prediction. In classification trees, branching and leaf nodes use discrete features and discrete predictions, respectively. A binary classification tree is a specific type of classification tree in which each branching node has exactly two children. An optimal binary classification tree can be obtained by solving a biobjective optimization problem that seeks to (i) maximize the number of correctly classified datapoints and (ii) minimize the number of branching nodes. In this paper, we propose four mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees: two flow-based formulations and two cut-based formulations. Furthermore, we provide theoretical and computational comparisons between our formulations and the most recent flow-based MILO formulation proposed by Aghaei et al. (2021). |
09:14 | Investigating the Impact of Preprocessing on Quantum Algorithms PRESENTER: Moises Ponce ABSTRACT. Computational speed and processing power are not improving in rates as seen in prior decades. With the slower growth in computer capabilities, there is a greater call in optimizing how algorithms solve problems. Even with the advancement of quantum computing, limitations on the number of qubits restrict the size of problems to examine. One of the most prominent combinatorial problems in the field of Quantum Computing is the Max Cut problem. The Max Cut problem is NP-hard and it may demonstrate Quantum Supremacy; however, its complexity can be reduced. By removing nodes and edges incident to that node, storing information on its effect on the objective value is a method that will ultimately reduce the size of the graph while maintaining that the solution of the Max Cut problem of the original graph and the reduced graph are the same. Utilizing this method of preprocessing, we will examine its effect on leading Quantum Algorithms. |
13:15 | On Generating Lagrangian Cuts for Two-Stage Stochastic Integer Programs PRESENTER: Jim Luedtke ABSTRACT. We investigate new methods for generating Lagrangian cuts to solve two-stage stochastic integer programs. Lagrangian cuts can be added to a Benders reformulation, and are derived from solving single scenario integer programming subproblems identical to those used in the nonanticipative Lagrangian dual of a stochastic integer program. While Lagrangian cuts have the potential to significantly strengthen the Benders relaxation, generating Lagrangian cuts can be computationally demanding. We investigate new techniques for generating Lagrangian cuts with the goal of obtaining methods that provide significant improvements to the Benders relaxation quickly. Computational results demonstrate that our proposed method improves the Benders relaxation significantly faster than previous methods for generating Lagrangian cuts and, when used within a branch-and-cut algorithm, significantly reduces the size of the search tree for two classes of test problems. |
13:37 | Projection Cuts for Two-Stage Stochastic Mixed-Integer Programs PRESENTER: Ruiwei Jiang ABSTRACT. We revisit the Benders decomposition algorithm for solving two-stage stochastic mixed-integer programs with \emph{in}complete recourse. Inspired by principal component analysis, we propose a family of valid inequalities, called projection cuts. We design a two-phase variant of Benders decomposition with projection cuts to strengthen the initial relaxation. Extensive numerical experiments on stochastic unit commitment, multi-commodity network design, and stochastic production routing demonstrate that our approach obtains better root-node gaps, reduces the needed number of feasibility cuts to half, and shortens the overall solving time, in comparison with the classic Benders decomposition. |
13:59 | Dual Dynamic Programming for Data Driven Distributionally Robust Multistage Convex Optimization PRESENTER: Shixuan Zhang ABSTRACT. In this talk, we consider distributionally robust multistage convex optimization (DR-MCO) with Wasserstein ambiguity sets constructed from stagewise independent empirical distributions. We show that the DR-MCO models have favorable out-of-sample performance guarantee and adjustable level of in-sample conservatism. Then we extend the dual dynamic programming algorithm for the data driven DR-MCO with complexity analysis based on single stage subproblem oracles, for which we provide two possible implementations exploiting convexity or concavity of the uncertain cost functions. Numerical experiments on inventory control problems and hydrothermal energy system planning problems are conducted to show the effectiveness of our DR-MCO, in comparison with the widely used risk-neutral and risk-averse multistage stochastic optimization approaches. |
14:21 | Computation Efficient Stochastic Optimization for Variable Efficiency Energy Storage PRESENTER: Bolun Xu ABSTRACT. This paper presents a computation-efficient stochastic dynamic programming algorithm for solving energy storage price arbitrage considering variable charge and discharge efficiencies. We formulate the price arbitrage problem using stochastic dynamic programming and model real-time prices as a Markov process. Then we propose an analytical solution algorithm using a piecewise linear approximation of the value-to-go function. Our solution algorithm achieves extreme computation performance and solves the proposed arbitrage problem for one operating day in less than one second on a personal computer. We demonstrate our approach using historical price data from four price zones in New York Independent System Operator, with case studies comparing the performance of different stochastic models and storage settings. Our results show that the proposed method captures 50% to 90% of arbitrage profit compared to perfect price forecasts. In particular, our method captures more than 80% of arbitrage profit in three out of the four price zones when considering batteries with more than two-hour duration and realistic degradation cost. |
13:15 | A Deep Reinforcement Learning Approach to Solving the Multidimensional Knapsack Problem PRESENTER: Sabah Bushaj ABSTRACT. In this study, we address the difficulty of solving the multidimensional knapsack problem (MKP) by presenting a novel deep reinforcement learning (DRL) framework. We perform a combination of dimension reduction techniques and develop a worthiness value formula for each knapsack item. We then propose a Deep Reinforcement Learning (DRL) framework where we aim to train different agents compatible with a discrete action space for sequential decision making while still satisfying any resource constraint of the MKP. To our knowledge, this is the first DRL model of its kind where a 2D environment is formulated, and each element of the DRL solution matrix represents an item of the MKP. Our DRL framework shows that it can solve medium-sized instances at least 45 times faster in CPU solution time and at least 10 times faster for large instances with a maximum solution gap of 0.28% compared to the performance of CPLEX. Furthermore, at least 95% of the items are predicted in accordance with the CPLEX solution. |
13:37 | A Top-Down Approach Toward Understanding Deep Learning ABSTRACT. The remarkable development of deep learning over the past decade relies heavily on sophisticated heuristics and tricks. To better exploit its potential in the coming decade, perhaps a rigorous framework for reasoning deep learning is needed, which however is not easy to build due to the intricate details of modern neural networks. For near-term purposes, a practical alternative is to develop a mathematically tractable surrogate model that yet maintains many characteristics of deep learning models. This talk introduces a model of this kind as a tool toward understanding deep learning. The effectiveness of this model, which we term the Layer-Peeled Model, is evidenced by two use cases. First, we use this model to explain an empirical pattern of deep learning recently discovered by David Donoho and his students. Moreover, this model predicts a hitherto unknown phenomenon that we term Minority Collapse in deep learning training. This is based on joint work with Cong Fang, Hangfeng He, and Qi Long. |
13:59 | Private Federated Learning Without a Trusted Server PRESENTER: Andrew Lowy ABSTRACT. We discuss federated learning (FL) in the absence of a trustworthy server/clients. In this setting, each client needs to ensure the privacy of its own data without relying on the server or other clients. We study local differential privacy (LDP) at the client level and provide nearly tight upper and lower bounds that establish the optimal rates for LDP convex/strongly convex federated stochastic optimization with homogeneous client data. Our rates match the optimal statistical rates in certain practical parameter regimes. Remarkably, we show that similar rates are attainable for smooth losses with arbitrary heterogeneous client data, with a linear time LDP algorithm. We also establish optimal rates for empirical losses and improve communication complexity compared to existing works. Next, with a secure shuffler to anonymize client reports (but no trusted server), our algorithm attains the optimal central DP rates for stochastic convex/strongly convex optimization, thereby achieving optimality in the local and central models of private FL simultaneously. Lastly, we discuss new results for LDP/SDP FL with non-convex loss functions. For instance, the strongly convex rates can nearly be attained without convexity if the loss satisfies the proximal Polyak-Lojasiewicz inequality. |
14:21 | Private optimization in the interpolation regime: faster rates and hardness results PRESENTER: Hilal Asi ABSTRACT. In non-private stochastic convex optimization, stochastic gradient methods converge much faster on interpolation problems---namely, problems where there exists a solution that simultaneously minimizes all of the sample losses---than on non-interpolating ones; similar improvements are not known in the private setting. In this paper, we investigate differentially private stochastic optimization in the interpolation regime. First, we show that without additional assumptions, interpolation problems do not exhibit an improved convergence rates with differential privacy. However, when the functions exhibit quadratic growth around the optimum, we show (near) exponential improvements in the private sample complexity. In particular, we propose an adaptive algorithm that improves the sample complexity to achieve expected error $\alpha$ from $\frac{d}{\diffp \sqrt{\alpha}}$ to $\frac{1}{\alpha^\rho} + \frac{d}{\diffp} \log\paren{\frac{1}{\alpha}}$ for any fixed $\rho >0$, while retaining the standard minimax-optimal sample complexity for non-interpolation problems. We prove a lower bound that shows the dimension-dependent term is tight. Furthermore, we provide a superefficiency result which demonstrates the necessity of the polynomial term for adaptive algorithms: any algorithm that has a polylogarithmic sample complexity for interpolation problems cannot achieve the minimax-optimal rates for the family of non-interpolation problems. |
13:15 | Optimal Methods for Risk Averse Distributed Optimization PRESENTER: Zhe Zhang ABSTRACT. We study the communication complexity of risk averse optimization over a star-shape network. The problem generalizes the well-studied risk-neutral finite-sum distributed optimization problem and its importance stems from the need to handle risk in an uncertain environment. For algorithms in the literature, there exists a gap in communication complexities for solving risk-averse and risk-neutral problems. We propose two distributed algorithms, namely distributed risk averse optimization (DRAO) method and distributed risk averse optimization with sliding (DRAO-S) method, to close the gap. Specifically, the DRAO method achieves the optimal communication complexity by assuming a certain saddle point subproblem can be easily solved in the server node. The DRAO-S method removes the strong assumption by introducing a novel saddle point sliding subroutine which only requires the projection over the ambiguity set $P$. We observe that the number of $P$-projections performed by DRAO-S is optimal. Moreover, matching lower complexity bounds are developed to illustrate that communication complexities of both DRAO and DRAO-S are not improvable. Numerical experiments are conducted to showcase the encouraging empirical performance of the DRAO-S method. |
13:37 | Katyusha Acceleration for Convex Finite-Sum Compositional Optimization PRESENTER: Yibo Xu ABSTRACT. We focus on convex problems with a finite-sum compositional structure. One popularly used numerical approach for finite-sum problems is the stochastic gradient method (SGM). However, the additional compositional structure prohibits easy access to unbiased stochastic approximation of the gradient, so directly applying the SGM to a finite-sum compositional optimization problem (COP) is often inefficient. We design new algorithms for solving strongly-convex and also convex two-level finite-sum COPs. Our design incorporates the Katyusha acceleration technique and adopts the mini-batch sampling from both outer-level and inner-level finite-sum. For strongly-convex finite-sum COPs, we obtain linear convergence rate in terms of the expected objective error; and from the convergence rate result, we then establish complexity results of the algorithm to produce an ε-solution. Our complexity results have the same dependence on the number of component functions as existing works. However, due to the use of Katyusha acceleration, our results have better dependence on the condition number κ and improve to κ^{2.5} from the best-known κ^3. For convex finite-sum COPs, we use as a subroutine the algorithm for strongly-convex finite-sum COPs. Again, we obtain better complexity results than existing works in terms of the dependence on ε, improving to ε^{-2.5} from the best-known ε^{-3}. |
13:59 | First and zeroth-order methods for constrained nonconvex optimization PRESENTER: Zichong Li ABSTRACT. First and zeroth-order methods have been widely used for solving large-scale problems due to their cheap per-iteration costs. A majority of existing works focus on problems without constraint, with simple constraints, or with complicated convex constraints. However, existing works on nonconvex constrained problems are rare, and almost nonexistent if the objective or constraints are black-box (i.e. gradients are inaccessible). In this talk, I will present a first-order inexact augmented Lagrangian method (iALM) on nonconvex constrained problems, involving a composite (i.e., smooth+nonsmooth) objective and functional constraints. I will also present a zeroth-order iALM on black-box nonconvex constrained problems. Under a certain regularity condition (also assumed by several existing works), both proposed methods have better iteration complexities and numerical performances compared to existing methods. The applications of our method span from classical optimization problems to practical machine learning examples. |
13:15 | Tight Guarantees for Multi-unit Prophet Inequalities and Online Stochastic Knapsack PRESENTER: Jiashuo Jiang ABSTRACT. Prophet inequalities are a useful tool for designing online allocation procedures and comparing their performance to the optimal offline allocation. In the basic setting of k-unit prophet inequalities, the magical procedure of Alaei (2011) with its celebrated performance guarantee has found widespread adoption in mechanism design and other online allocation problems in online advertising, healthcare scheduling, and revenue management. However, the tightness of Alaei's procedure for a given k has remained unknown. In this paper we resolve this question, characterizing the tight bound by identifying the structure of the optimal online implementation, and consequently improving the best-known guarantee for k-unit prophet inequalities for all k>1. We also consider a more general online stochastic knapsack problem where each individual allocation can consume an arbitrary fraction of the initial capacity. We introduce a new "best-fit" procedure for implementing a fractionally-feasible knapsack solution online, with a performance guarantee of 13+e−2≈0.319, which we also show is tight. This improves the previously best-known guarantee of 0.2 for online knapsack. Finally, we refine our technique for the unit-density special case of knapsack, and improve the guarantee from 0.321 to 0.3557. |
13:37 | Robust online selection with uncertain offer acceptance ABSTRACT. Motivated by the problem of displayed online advertisement where customers may fail to click on an ad, we consider an online selection problem where a candidate may or may not accept an offer according to a known probability. Candidates arrive online, in random order. Upon arrival, a decision-maker observes the candidate's partial rank compared to previously observed candidates and either extends an offer to the candidate or moves to the next candidate, without the possibility of recalling previously observed candidates. If an offer is extended, the corresponding candidate accepts the offer with probability $p$, in which case the process ends, or rejects it, in which case the decision-maker moves to the next candidate. Because the decision-maker does not know the top candidate willing to accept an offer, the goal is to maximize a robust objective defined as the minimum over integers $k$ of the probability of choosing one of the top $k$ candidates given that one of these candidates will accept an offer. We derive an exact linear program for this max-min objective. We further relax this linear program into an infinite counterpart, which we use to provide asymptotic upper bounds for the objective and closed-form policies. |
13:15 | Joint Routing of Conventional and Range-Extended Electric Vehicles in a Large Metropolitan Network PRESENTER: Anirudh Subramanyam ABSTRACT. Range-extended electric vehicles combine the higher efficiency and environmental benefits of battery- powered electric motors with the longer mileage and autonomy of conventional internal combustion engines. This is particularly advantageous for time-constrained delivery routing in dense urban areas, where battery recharging along routes can be too time-consuming to economically justify the use of all-electric vehicles. However, switching from electric to conventional fossil fuel modes also results in higher costs and emissions and lower efficiency. This paper analyzes this heteroge- neous vehicle routing problem for describes two solution methods: an exact branch-price-and-cut algorithm as well as an iterated tabu search metaheuristic. From a methodological perspective, we find that the exact algorithm consistently obtains tight lower bounds that also serve to certify the metaheuristic solutions as near-optimal. From a policy standpoint, we examine a large-scale real-world case study concerning parcel deliveries in the Chicago metropolitan area and quantify various operational metrics including energy costs and vehicle miles traveled in electric and fossil fuel modes. Our study reveals key areas of improvement to which vehicle manufacturers, distributors, and policy-makers can focus their attention. |
13:37 | Incorporating Intuition in Last-Mile Routing PRESENTER: Roshan Mahes ABSTRACT. In last-mile routing, the optimization is often framed as a Traveling Salesman Problem to minimize travel time and associated cost. However, solutions stemming from this approach do not match the realized paths as drivers deviate due to navigational considerations and preferences. To prescribe routes that incorporate this tacit knowledge, a data-driven model is proposed that aligns well with the hierarchical structure of delivery data wherein each stop belongs to a zone - a geographical area. First, on the global level, a zone sequence is found as a result of a minimization over a cost matrix which balances historical information and distances between zones. Second, within zones, a sequence of stops is determined, such that integrated with the predetermined zone sequence a full solution is obtained. From a cognitive perspective, the concurrence between prescribed and realized routes reveals that delivery drivers adopt a hierarchical breakdown of the full problem combining intuition and locally optimal decisions when navigating. Furthermore, experimenting with the trade-off between historical information and travel times exposes that drivers rely more on their intuition at the start, whereas at the end, when returning to the station, only travel time is the concern. |
13:59 | Stochastic Optimization Models for a Home Service Routing and Appointment Scheduling Problem with Random Travel and Service Times PRESENTER: Karmel Shehadeh ABSTRACT. We study a routing and appointment scheduling problem with uncertain service and travel times arising from home service practice. Specifically, given a set of customers within a service region that an operator needs to serve, we seek to find the operator’s route and time schedule. The quality of routing and scheduling decisions is a function of the total operational cost, consisting of customers’ waiting time, and the operator’s travel time, idle time, and overtime. We propose and rigorously analyze a stochastic programming model and two distributionally robust optimization (DRO) models to model and solve the problem, assuming a known and unknown service and travel time distribution, respectively. We consider two popular types of ambiguity sets for the DRO models: moment-based and Wasserstein distance-based ambiguity sets. We derive equivalent mixed-integer linear programming (MILP) reformulations of both DRO models that can be implemented and efficiently solved using off-the-shelf optimization software, thereby enabling practitioners to use these models. In an extensive numerical experiment, we investigate the proposed models’ computational and operational performance, demonstrating where significant performance improvements can be gained with each and derive insights into the problem. |
14:21 | Efficient Algorithms for Stochastic Ridepooling Assignment with Mixed Fleets PRESENTER: Qi Luo ABSTRACT. Ridepooling plays a pivotal role in improving fleet utilization in shared mobility platforms by serving multiple passenger requests simultaneously. The ridepooling assignment problem is solved repeatedly on a shareability graph, a hypergraph representing the matching compatibility between available vehicles and pending requests, to find optimal co-riders. With a mixed fleet of two or more types of vehicles, fleet sizing is pre-determined before the requests are revealed. Due to the immense size of the underlying shareability graph and demand uncertainty, it is impractical to use exact methods to calculate the optimal trip assignments. Two approximation algorithms for mid-capacity and high-capacity vehicles are proposed in this paper; The respective approximation ratios are 1/p^2 and (e−1)/(2e+o(1))plogp, where p is the maximum vehicle capacity plus one. A data-driven shared mobility simulator with mixed autonomy traffic validates the performance of these algorithms. These efficient algorithms serve as a stepping stone for a variety of multimodal and multiclass on-demand mobility applications. |
13:15 | Stochastic approximate cutting plane algorithms for continuous DR-submodular Maximization PRESENTER: Hugh Medal ABSTRACT. We present a stochastic approximate cutting plane algorithms for maximizing continuous DR-submodular functions subject to constraints. We derive cuts based on properties of the epigraph of such functions and then identify the concave envelop of these cuts. We exploit methods for stochastic gradient decent to speed up the approximation algorithm and present results for non-decreasing functions and computational results for a variety of test problems including maximizing the multilinear extension of the maximum covering location problem and the influence maximization problem. We benchmark our results against the existing continuous greedy algorithm and approximate cutting plane algorithms for both computational time and final solution generated. |
13:37 | High-Dimensional Simulation Optimization via Brownian Fields and Sparse Grids PRESENTER: Liang Ding ABSTRACT. High-dimensional simulation optimization is notoriously challenging. We propose a new sampling algorithm that converges to a global optimal solution and suffers minimally from the curse of dimensionality. The algorithm consists of two stages. First, we take samples following a sparse grid experimental design and approximate the response surface via kernel ridge regression with a Brownian field kernel. Second, we follow the expected improvement strategy---with critical modifications that boost the algorithm's sample efficiency---to iteratively sample from the next level of the sparse grid. Under mild conditions on the smoothness of the response surface and the simulation noise, we establish upper bounds on the convergence rate for both noise-free and noisy simulation samples. These upper bounds deteriorate only slightly in the dimension of the feasible set, and they can be improved if the objective function is known to be of a higher-order smoothness. Extensive numerical experiments demonstrate that the proposed algorithm dramatically outperforms typical alternatives in practice. |
13:59 | Algorithms for Computing Diverse Sets of Near-Optimal Solutions to Mixed-Integer Optimization Problems PRESENTER: Izuwa Ahanor ABSTRACT. While most methods for solving optimization problems find a single optimal solution, finding a set of near-optimal solutions can often be more useful. Further, such near-optimal set would be more useful if the solutions in the set are diverse, rather than all very similar. This project focused on generating such diverse near-optimal solutions for mixed-integer optimization (MIO) problems. In the context of MIOs, methods of generating diverse near optimal solutions usually treat the search for diversity separately from the search for a set of near-optimal solutions. In contrast, we present a new method of finding a pool of diverse solutions by emphasizing diversity within the search for near-optimal solutions. When compared with popular node selection rules such as best-first search, our new method increases diversity approximately 5 times more on most of the data sets we tested. Compared with existing methods for finding diverse near-optimal sets (like the one-tree/sequential algorithms), ours increases diversity approximately 3 times more when the requested number of solutions is small and about 9% when we request for large number of solutions (10,000). Our method runs with slightly more run-time as regular node selection methods when the requested number of solutions is small (below 100). |
14:21 | A global optimization framework with machine learning models to inform early termination PRESENTER: Guyi Chen ABSTRACT. We consider global optimization with a limited computational budget with the goal of carefully managing the balance between exploration and exploitation. Restarts allows exploring the domain space more thoroughly and a local search algorithm helps find local or global optimums. To better allocate computational budgets, we propose a framework where we strategically explore the domain space with restarts guided by iteratively updated Voronoi diagrams. These Voronoi restarts select potential promising restart points by investigating unexplored regions. We also use a logistic regression model to estimate the probability that a sequence of local search iterates will lead to a new local optimum, using only a few function evaluations. We present empirical, computational results to show the effectiveness of the framework on a set of test functions. |
16:30 | Optimal Robust Policy for Feature-Based Newsvendor PRESENTER: Luhao Zhang ABSTRACT. We study policy optimization for the feature-based newsvendor, which seeks an end-to-end policy that renders an explicit mapping from features to ordering decisions. Unlike existing works that restrict the policies to some parametric class which may suffer from sub-optimality (such as affine class) or lack of interpretability (such as neural networks), we aim to optimize over all measurable functions of features. In this case, the classical empirical risk minimization yields a policy that are not well-defined on unseen features. To avoid such degeneracy, we consider a distributionally robust framework. This leads to an adjustable robust optimization, whose optimal solutions are notoriously difficult to obtain except for a few notable cases. Perhaps surprisingly, we identify a new class of policies that are proven to be exactly optimal and can be computed efficiently. The optimal robust policy is obtained by extending an optimal robust in-sample policy to unobserved features in a particular way and can be interpreted as a regularized critical fractile of the empirical conditional demand distribution. We compare our method with several benchmarks using synthetic and real data and demonstrate its superior empirical performance. |
16:52 | Data-driven Multistage Distributionally Robust Optimization PRESENTER: Rui Gao ABSTRACT. We study sequential decision-making under uncertainty in which the underlying dynamics are subject to model ambiguity. Specifically, we consider a multistage distributionally robust optimization that hedges against a family of stochastic processes that are close to an empirical stochastic process formed by sample paths in terms of the nested distance. Due to the non-convexity of the distributional uncertainty set, this problem is notoriously difficult to solve. In this paper, we develop strong duality and derive computationally tractable dynamic program reformulations under various settings. Our results are exemplified for the dynamic portfolio selection problem. |
17:14 | Adjustability in Robust Linear Optimization PRESENTER: Ningji Wei ABSTRACT. Dynamic robust optimization involves sequential decisions over multiple stages against worst-case uncertainty realizations. At each stage, the decision-maker observes the uncertainty realization before committing to decisions, known as adjustable decisions. We focus on adjustability --- the difference between objective values of two problems: a static robust optimization problem where all decisions have to be made before uncertainty realization, and a fully adjustable robust optimization problem where all decisions are made after uncertainty realization. In this work, we develop a theoretical framework to quantify adjustability based on the input data of a robust optimization problem with linear objective, linear constraints, and fixed recourse. We make very few additional assumptions. In particular, we do not assume constraint-wise separability or parameter nonnegativity. Based on the discovery of an interesting connection between the reformulations of the static and fully adjustable problems, our analysis gives a necessary and sufficient criterion for adjustability to be zero when the uncertainty set is polyhedral. Then, we develop a constructive approach to quantify adjustability when the uncertainty set is general. We also develop an efficient algorithm to bound adjustability. We exemplify the value of our theoretical framework by applying it to interdiction, supply chain design, and inventory control problems. |
17:36 | Rockafellian Relaxation in Optimization under Uncertainty: Consistent Alternatives to Robust Formulations PRESENTER: Louis Chen ABSTRACT. It is often the case that optimization problem formulations involve uncertain parameters. Motivated by those settings in which such unknown parameters are learned over time, we investigate the "consistency", or limiting behaviors (of value, solutions, multipliers, etc.) to the optimization problems with respect to the convergence of these parameters. Towards this aim, we will make use of Rockafellians, a concept both old and new, so that we may evaluate the merits and shortcomings of various optimization formulations/practices. |
16:30 | Conjugate gradient methods for nonconvex optimization ABSTRACT. Nonconvex optimization formulations have recently gained popularity due to their prevalence in deep learning applications, but they also arise in numerous problems from data analysis and robust statistics. In this setting, second-order stationary points often correspond to global optima, while saddle points can be escaped from using negative curvature directions. These considerations motivate the development of second-order methods that can provably escape the vicinity of saddle points, either by design or with some probability. In this talk, we will investigate the properties of conjugate gradient methods on nonconvex instances, with a focus on second-order convergence and saddle point avoidance. Our goal will be to develop algorithms close to the textbook schemes while being endowed with second-order guarantees. We will then discuss how those guarantees connect to the practical behavior, through a numerical illustration from nonconvex regression. |
16:52 | Retrospective Approximation For Smooth Stochastic Approximation PRESENTER: Raghu Bollapragada ABSTRACT. We present Retrospective Approximation as a universal sequential sample-average approximation paradigm where during each iteration, a sample-path approximation problem is implicitly generated using an adapted sample size, and solved to an adapted error tolerance, using a ``deterministic method" such as the line search quasi-Newton method. The principal advantage of RA is that decouples optimization from stochastic approximation, allowing the direct adoption of existing deterministic algorithms without modification, thus mitigating the need to redesign algorithms for the stochastic context. |
17:14 | Implicit Regularization of Sub-gradient Method in Robust Matrix Recovery PRESENTER: Jianhao Ma ABSTRACT. It is well-known that simple short-sighted algorithms, such as gradient descent, generalize well in the over-parameterized learning tasks, due to their implicit regularization. However, it is unknown whether the implicit regularization of these algorithms can be extended to robust learning tasks. In this work, we provide a positive answer to this question in the context of robust matrix recovery problem. We show that a simple sub-gradient method converges to the true low-rank solution efficiently, when it is applied to the over-parameterized l1-loss function without any explicit regularization or rank constraint. Moreover, by building upon a new notion of restricted isometry property, called sign-RIP, we prove the robustness of the sub-gradient method against outliers in the over-parameterized regime. |
17:36 | Algorithms for Nonconvex Probabilistic PCA for Data With Heteroscedastic Noise PRESENTER: Kyle Gilman ABSTRACT. Principal component analysis (PCA) is a classical and ubiquitous method for reducing data dimensionality, but it is suboptimal for heterogeneous data that are increasingly common in modern applications. PCA treats all samples uniformly so degrades when the noise is heteroscedastic across samples, as occurs, e.g., when samples come from sources of heterogeneous quality. In this talk, we study a probabilistic PCA variant that estimates and accounts for this heterogeneity by incorporating it in the statistical model. Unlike in the homoscedastic setting, the resulting nonconvex optimization problem is not directly solved by singular value decomposition. We present a heteroscedastic probabilistic PCA technique (HePPCAT) that uses efficient alternating maximization algorithms to jointly estimate both the underlying factors and the unknown noise variances. Simulation experiments illustrate the comparative speed of the algorithms, the benefit of accounting for heteroscedasticity, and the seemingly favorable optimization landscape of this problem. Real data experiments on environmental air quality data show that HePPCAT can give a better PCA estimate than techniques that do not account for heteroscedasticity. Further, we discuss extending our results to the case of streaming heteroscedastic data with missing entries. |
16:30 | Efficient Sequential Experiment Design for Generalized Linear Models PRESENTER: Yongchun Li ABSTRACT. This paper studies the D-optimal experiment design of generalized linear models (GLMs) by selecting a fixed number of experiments to maximize the log-determinant of the Fisher information matrix (FIM). However, the FIM often involves the unknown regression parameters of GLMs, and a small number of experiments are selected sequentially due to resource constraints. To address these issues, we propose a new sequential experimental design framework, where at each experimental run, we update the uncertainty set of regression parameters based on the existing experimental results and then select a handful of new experiments. Specifically, we develop the first-known mixed integer robust optimization model and its optimistic counterpart to formulate the sequential experimental design problem. To efficiently solve our mixed integer robust and optimistic models, we design a branch and cut algorithm with closed-form cuts. We further prove that the regret led by the two models is upper bounded by $O(T^{-1/2})$. |
16:52 | On the convex hull of convex quadratic optimization with indicators PRESENTER: Andres Gomez ABSTRACT. We consider the convex quadratic optimization problem with indicator variables and arbitrary constraints on the indicators. We show that a convex hull description of the associated mixed-integer set in an extended space with a quadratic number of additional variables consists of a single positive semidefinite constraint (explicitly stated) and linear constraints. In particular, convexification of this class of problems reduces to describing a polyhedral set in an extended formulation. We also give descriptions in the original space of variables: we provide a description based on an infinite number of conic-quadratic inequalities, which are ``finitely generated." In particular, it is possible to characterize whether a given inequality is necessary to describe the convex-hull. The new theory presented here unifies several previously established results, and paves the way toward utilizing polyhedral methods to analyze the convex hull of mixed-integer nonlinear sets. |
17:14 | Integer Programming for Causal Structure Learning in the Presence of Latent Variables PRESENTER: Rui Chen ABSTRACT. The problem of finding an ancestral acyclic directed mixed graph (ADMG) that represents the causal relationships between a set of variables is an important area of research on causal inference. Most existing score-based structure learning methods focus on learning directed acyclic graph (DAG) models without latent variables. A number of score-based methods have recently been proposed for the ADMG learning, yet they are heuristic in nature and do not guarantee an optimal solution. We propose a novel exact score-based method that solves an integer programming (IP) formulation and returns a score-maximizing ancestral ADMG for a set of continuous variables that follow a multivariate Gaussian distribution. We generalize the state-of-the-art IP model for DAG learning problems and derive new classes of valid inequalities to formulate an IP model for ADMG learning. Empirically, our model can be solved efficiently for medium-sized problems and achieves better accuracy than state-of-the-art score-based methods as well as benchmark constraint-based methods. |
17:36 | Quant-BnB: A Scalable Branch-and-Bound Method for Optimal Decision Trees with Continuous Features PRESENTER: Haoyue Wang ABSTRACT. Decision trees are one of the most useful and popular methods in the machine learning toolbox. In this paper, we consider the problem of learning optimal decision trees, a combinatorial optimization problem that is challenging to solve at scale. A common approach in the literature is to use greedy heuristics, which may not be optimal. Recently there has been significant interest in learning optimal decision trees using various approaches (e.g., based on integer programming, dynamic programming)---to achieve computational scalability, most of these approaches focus on classification tasks with binary features. In this paper, we present a new discrete optimization method based on branch-and-bound (BnB) to obtain optimal decision trees. Different from existing customized approaches, we consider both regression and classification tasks with continuous features. The basic idea underlying our approach is to split the search space based on the quantiles of the feature distribution---leading to upper and lower bounds for the underlying optimization problem along the BnB iterations. Our proposed algorithm shows significant speedups compared to existing approaches for shallow optimal trees on various real datasets. |
16:30 | Toward Linear Convergence for Non-Smooth Optimization: Implications for Discrete Optimization Problems ABSTRACT. To efficiently solve MILP problems, a novel decomposition and coordination Lagrangian Relaxation-based methodology is developed. The success of the method relies on the optimization of the associated non-smooth dual function. To efficiently optimize the dual function, linear convergence - the best convergence rate theoretically possible - inherent to Polyak’s step-sizing formula is exploited without the generally unknown optimal dual value. The key idea is to obtain “level” values, approaching the optimal dual value from above, through novel hyper-parameter-free “multiplier-convergence-feasibility” LP constraint satisfaction problems, from which the sought-for “level” estimates are inferred whenever the problem is infeasible. Testing results for Generalized Assignment Problems (GAP) demonstrate that 1. “level” estimates decrease and dual values increase faster than those obtained by other existing methods; the “level” values also provide tight upper bounds to quantify the quality of Lagrangian multipliers, 2. the new method converges fast and obtains high-quality feasible solutions – for the largest instances of GAP, globally optimal solutions are obtained, 3. the new method is robust with respect to initial parameters such as step-sizes and multipliers. Stochastic versions of Job-Shop Scheduling problems are also tested demonstrating that the new method is at least 2 orders of magnitude faster than branch-and-cut. |
16:52 | Multi-angle Quantum Approximate Optimization Algorithm PRESENTER: James Ostrowski ABSTRACT. The quantum approximate optimization algorithm (QAOA) generates an approximate solution to combinatorial optimization problems using a variational ansatz circuit defined by parameterized layers of quantum evolution. In theory, the approximation improves with increasing ansatz depth but gate noise and circuit complexity undermine performance in practice. Here, we introduce a multi-angle ansatz for QAOA that reduces circuit depth and improves the approximation ratio by increasing the number of classical parameters. Even though the number of parameters increases, our results indicate that good parameters can be found in polynomial time. This new ansatz gives a 33\% increase in the approximation ratio for an infinite family of MaxCut instances over QAOA. |
17:14 | Deepest Cuts for Benders Decomposition PRESENTER: Mojtaba Hosseini ABSTRACT. Since its inception, Benders Decomposition (BD) has been successfully applied to a wide range of large-scale mixed-integer (linear) problems. The key element of BD is the derivation of Benders cuts, which are often not unique. In this paper, we introduce a novel unifying Benders cut selection technique based on a geometric interpretation of cut “depth”, produce deepest Benders cuts based on lp-norms, and study their properties. Specifically, we show that deepest cuts resolve infeasibility through minimal deviation from the incumbent point, are relatively sparse, and may produce optimality cuts even when classical Benders would require a feasibility cut. Leveraging the duality between separation and projection, we develop a Guided Projections Algorithm for producing deepest cuts while exploiting the combinatorial structure or decomposablity of problem instances. We then propose a generalization of our Benders separation problem that brings several well-known cut selection strategies under one umbrella. In particular, we provide systematic ways of selecting the normalization coefficients in the Minimal Infeasible Subsystems method by establishing its connection to our method. Finally, in our tests on facility location problems, we show deepest cuts often reduce both runtime and number of Benders iterations, as compared to other cut selection strategies. |
17:36 | Motion Planning for Drone Fleets using MINLP PRESENTER: Nathan Runk ABSTRACT. Receding horizon mixed-integer nonlinear programming (MINLP) can address the problem of collision-free, fast, decentralized motion planning (MP) for small multi-rotor unmanned aerial vehicles operating in windy environments. Each quadrotor solves a MINLP to generate its time-optimal speed profile along a minimum snap spline path subject to kinematic, dynamic, communication connectivity, and collision avoidance constraints. The MINLPs are solved using a framework that combines simulated annealing and interior-point methods to handle discrete variables and nonlinear problem components. To simulate realistic wind disturbances which significantly increase the computational time to generate new MPs, the Dryden wind gust model is employed. Control Barrier Functions (CBFs) can guarantee collision avoidance with wind disturbances and alleviate MP recalculations’ necessity. The framework is validated via Hardware-in-the-loop (HITL) experiments using up to six DJI M100 quadrotors. Results demonstrate 1) fast MP solutions for outdoor operations and 2) CBFs’ utility in providing collision-free motion plans. |
16:30 | Federated Learning with Heterogeneous Data: A Superquantile Optimization Approach PRESENTER: Krishna Pillutla ABSTRACT. Federated learning is the centralized training of statistical models from decentralized data on mobile devices while preserving the privacy of each device. We present a federated learning framework that is designed to robustly deliver good predictive performance across individual clients with heterogeneous data. The proposed approach hinges upon a nonsmooth superquantile-based learning objective that captures the tail statistics of the error distribution over heterogeneous clients. We present a stochastic optimization algorithm which interleaves differentially private client reweighting steps with on-device stochastic gradient steps. We prove fine time convergence guarantees of the proposed algorithm in the strongly convex and nonconvex objectives. Numerical results on benchmark datasets for federated learning demonstrate that our approach is competitive with classical ones in terms of average error and outperforms them in terms of tail statistics of the error. |
16:52 | Quantity-Contingent Auctions and Allocation of Airport Slots PRESENTER: Alexander Estes ABSTRACT. We define and investigate quantity-contingent auctions. Such auctions can be used when there exist multiple units of a single product and the value of a set of units depends on the total quantity sold. For example, a road network or airport will become congested as the number of users increase so that a permit for use becomes more valuable as the total number allocated decreases. A quantity-contingent auction determines both the number of items sold and an allocation of items to bidders. Because such auctions could be used by bidders to gain excessive market power, we impose constraints limiting market power. We focus on auctions that allocate airport arrival and departure slots. We propose a continuous model and an integer programming model for the associated winner determination problem. Using these models, we perform computational experiments that lend insights into the properties of the quantity-contingent auction. |
17:14 | Managing Misinformation on Social Media during Disasters: Machine Learning and Game-Theoretic Approaches PRESENTER: Jun Zhuang ABSTRACT. Social media has been increasingly utilized to spread breaking news and risk communications during disasters. Unfortunately, due to the unmoderated nature of social media platforms such as Twitter and Facebook, rumors and misinformation could propagate widely. To address the problem, we develop a machine learning framework to predict the veracity of tweets that are spread during crisis events. The tweets are tracked based on the veracity of their content as either true, false, or neutral. We conduct four separate studies, and the results suggest that the framework is capable of tracking multiple cases of misinformation simultaneously. We also develop two game-theoretical models “Rumor Selection for Clarification” and “Learning for Rumor Clarification”, to help decide which rumor to clarify and when to clarify, respectively. This research provides novel insights on how to efficiently monitor misinformation that is spread during disasters. |
16:30 | On the Convergence of Generalized Proximal Point Algorithms for Monotone Inclusion Problems ABSTRACT. In this talk, we first provide sufficient conditions on the regularization and relaxation parameters of the generalized proximal point algorithm for the weak and strong convergence of the algorithm. Linear convergence of some generalized proximal point algorithms will also be presented. At last, we exhibit some particular examples to illustrate the improvements of our main results. |
16:52 | Multiplicative Gradient Method: Theory and Applications ABSTRACT. The multiplicative gradient (MG) method was originally developed for the positron emission tomography problem, where in each iteration one simply multiplies the current iterate with the current gradient to produce the next iterate. Recently it has been applied to D-optimal design and quantum state tomography as well. However, the understanding of its fundamental mechanism is lacking. In this talk, we unravel this mystery by identifying a general class of convex problems that involve optimizing a log-homogeneous function over a slice of a symmetric cone. We show that the MG method has (ergodic) $O(1/T)$ convergence rate on this class of problems. Our theory not only subsumes the existing results as special cases, but also helps to identify new applications where the MG method significantly outperforms the state-of-the-art. |
17:14 | Robust Interior Point Methods for Quantum Key Rate Computation for Quantum Key Distribution PRESENTER: Haesol Im ABSTRACT. We derive a stable reformulation of the quantum key rate computation for finite dimensional quantum key distribution (QKD) problems. This avoids the difficulties from singularities that arise due to loss of positive definiteness. This allows for the derivation of an efficient Gauss-Newton Interior point approach. Empirical evidence illustrate the strength of this approach as we obtain high accuracy solutions and theoretically guaranteed upper and lower bounds for the quantum key rate. |
17:36 | A Strengthened Barvinok-Pataki Bound on SDP Rank PRESENTER: Henry Wolkowicz ABSTRACT. The Barvinok-Pataki bound provides an upper bound on the rank of extreme points of a spectrahedron. This bound depends solely on the number of affine constraints of the problem, i.e.,~on the algebra of the problem. Specifically, the triangular number of the rank r is upper bounded by the number of affine constraints. We revisit this bound and provide a strengthened upper bound on the rank using the singularity degree of the spectrahedron. Thus we bring in the geometry and stability of the spectrahedron, i.e.,~increased instability as seen by higher singularity degree, yields a lower, strengthened rank bound. |