IOS 2024: INFORMS OPTIMIZATION SOCIETY CONFERENCE 2024
PROGRAM FOR SATURDAY, MARCH 23RD

08:30-10:00 Session SatA 1
08:30
Gradient Descent Provably Solves Nonlinear Tomographic Reconstruction

ABSTRACT. About 10 percent of Americans live with metal implants. These implants can cause “metal artifacts” in medical images reconstructed by X-ray computed tomography (CT), in some cases limiting their diagnostic quality. These artifacts are worsened by the logarithmic preprocessing commonly used to linearize CT measurements into a convex inverse problem, because the logarithm is poorly conditioned for the small number of photons that pass through metal implants. In this talk, we instead consider CT reconstruction directly from raw nonlinear measurements, bypassing the ill-conditioned logarithm and replacing it with a nonconvex inverse problem. We show that, despite nonconvexity, gradient descent provably converges to the global optimum at a geometric rate, perfectly reconstructing the underlying signal with a near minimal number of random measurements. We also prove similar results in the under-determined (compressive sensing) setting where the number of measurements is significantly smaller than the dimension of the signal. We illustrate the benefits of direct nonlinear CT reconstruction with cone-beam CT experiments on synthetic and real 3D volumes.
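For readers who want a concrete picture, the following is a minimal NumPy sketch of the kind of nonconvex reconstruction described above, assuming an idealized noiseless Beer-Lambert measurement model y_i = I0·exp(−⟨a_i, x⟩); the problem sizes, random measurement model, and step size are illustrative choices, not the setup or the guarantees of the talk.

```python
import numpy as np

# Toy nonlinear CT reconstruction: raw measurements follow an idealized
# Beer-Lambert model  y_i = I0 * exp(-<a_i, x_true>)  (no noise, no log
# preprocessing), and we fit x by plain gradient descent on the nonconvex
# least-squares objective  f(x) = (1/2m) * sum_i (I0*exp(-<a_i, x>) - y_i)^2.
rng = np.random.default_rng(0)
n, m, I0 = 50, 400, 1.0
A = rng.normal(size=(m, n)) / np.sqrt(n)       # random "projection" rows (illustrative)
x_true = 0.5 * np.abs(rng.normal(size=n))      # nonnegative attenuation map
y = I0 * np.exp(-A @ x_true)                   # raw (non-logarithmized) measurements

def grad(x):
    r = I0 * np.exp(-A @ x)                    # model prediction per ray
    return A.T @ ((y - r) * r) / m             # gradient of 0.5*mean((r - y)^2)

x = np.zeros(n)
step = 2.0                                     # fixed step size; no claim of optimality
for _ in range(5000):
    x -= step * grad(x)

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```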

08:52
PathProx: A Proximal Gradient Algorithm for Weight Decay Regularized Deep Neural Networks

ABSTRACT. Weight decay is one of the most widely used forms of regularization in deep learning, and has been shown to improve generalization and robustness. The optimization objective driving weight decay is a sum of losses plus a term proportional to the sum of squared weights. This paper argues that stochastic gradient descent (SGD) may be an inefficient algorithm for this objective. For neural networks with ReLU activations, solutions to the weight decay objective are equivalent to those of a different objective in which the regularization term is instead a sum of products of \ell_2 (not squared) norms of the input and output weights associated with each ReLU neuron. This alternative (and effectively equivalent) regularization suggests a novel proximal gradient algorithm for network training. Theory and experiments support the new training approach, showing that it can converge much faster to the sparse solutions it shares with standard weight decay training.
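The equivalence alluded to in this abstract hinges on the positive homogeneity of ReLU neurons: the input and output weights of a neuron can be rescaled inversely without changing the network. A one-line version of that standard rescaling identity (not the paper's full argument) is:

```latex
\[
\min_{\alpha > 0}\; \tfrac{1}{2}\Bigl(\alpha^{2}\,\lVert u\rVert_{2}^{2} + \alpha^{-2}\,\lVert v\rVert_{2}^{2}\Bigr)
\;=\; \lVert u\rVert_{2}\,\lVert v\rVert_{2},
\qquad \text{attained at } \alpha^{2} = \lVert v\rVert_{2}/\lVert u\rVert_{2},
\]
```

where u and v are the input and output weights of a single ReLU neuron; minimizing the squared-norm (weight decay) penalty over such rescalings therefore yields the sum of products of \ell_2 norms described above.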

09:14
A linearly convergent Gauss-Newton subgradient method for ill-conditioned problems

ABSTRACT. We analyze a preconditioned subgradient method for optimizing composite functions h∘c, where h is a locally Lipschitz function and c is a smooth nonlinear mapping. We prove that when c satisfies a constant rank property and h is semismooth and sharp on the image of c, the method converges linearly. In contrast to standard subgradient methods, its oracle complexity is invariant under reparameterizations of c.
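As a point of reference, one natural instantiation of a preconditioned (Gauss-Newton) subgradient step for F = h∘c is sketched below; this is a schematic consistent with the abstract, not necessarily the exact update analyzed in the talk:

```latex
\[
x_{k+1} \;=\; x_k \;-\; t_k\,\bigl(\nabla c(x_k)^{\top}\nabla c(x_k)\bigr)^{\dagger} g_k,
\qquad g_k \in \nabla c(x_k)^{\top}\,\partial h\bigl(c(x_k)\bigr),
\]
```

with the dagger denoting the Moore-Penrose pseudoinverse and t_k a suitable step size.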

09:36
A superlinearly convergent subgradient method for nonsmooth optimization

ABSTRACT. Nonsmooth optimization problems appear throughout machine learning and signal processing. However, standard first-order methods for nonsmooth optimization can be slow for "poorly conditioned" problems. In this talk, I will present a locally accelerated first-order method that is less sensitive to conditioning and achieves superlinear (i.e., double-exponential) convergence near solutions for a broad family of problems. The algorithm is inspired by Newton's method for solving nonlinear equations.

08:30-10:00 Session SatA 2
08:30
Using non-convex optimization in quantum process tomography: Factored gradient descent is tough to beat

ABSTRACT. We propose a non-convex optimization algorithm, based on the Burer-Monteiro (BM) factorization, for the quantum process tomography problem, in order to estimate a low-rank process matrix χ for near-unitary quantum gates. In this work, we compare our approach against state-of-the-art convex optimization approaches based on gradient descent. We use a reduced set of initial states and measurement operators that require 2·8^n circuit settings, as well as O(4^n) measurements, for an underdetermined setting. We find that our algorithm converges faster and achieves higher fidelities than the state of the art, both in terms of measurement settings and in terms of noise tolerance, in the cases of depolarizing and Gaussian noise models.
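To make the factored approach concrete, here is a generic Burer-Monteiro gradient-descent sketch for recovering a low-rank PSD matrix X = UU† from linear measurements; the Hermitian Gaussian sensing model, spectral initialization, and step size are stand-ins and do not reflect the tomography measurement settings used in the talk.

```python
import numpy as np

# Generic Burer-Monteiro (factored) gradient descent: recover a low-rank PSD
# matrix X = U U^H from linear measurements y_i = <A_i, X> by running gradient
# descent directly on the factor U.  All sizes and the sensing model are
# illustrative, not the quantum-process-tomography setup of the talk.
rng = np.random.default_rng(1)
d, r, m = 16, 2, 400
U_true = (rng.normal(size=(d, r)) + 1j * rng.normal(size=(d, r))) / np.sqrt(d)
X_true = U_true @ U_true.conj().T
B = rng.normal(size=(m, d, d)) + 1j * rng.normal(size=(m, d, d))
A = (B + np.conj(np.transpose(B, (0, 2, 1)))) / 2       # Hermitian sensing matrices
y = np.einsum('mij,ji->m', A, X_true).real              # y_i = tr(A_i X_true)

# Spectral initialization from the top-r eigenpairs of (1/m) * sum_i y_i A_i
M0 = np.einsum('m,mij->ij', y, A) / m
w, V = np.linalg.eigh(M0)
U = V[:, -r:] * np.sqrt(np.maximum(w[-r:], 1e-6))

step = 0.1
for _ in range(1500):
    X = U @ U.conj().T
    resid = np.einsum('mij,ji->m', A, X).real - y        # residuals of the fit
    G = np.einsum('m,mij->ij', resid, A) / m             # gradient w.r.t. X
    U = U - step * 2.0 * (G @ U)                         # chain rule through X = U U^H

X_hat = U @ U.conj().T
print("relative error:", np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))
```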

08:52
Iterative algorithms for finding phase factors in Quantum signal processing

ABSTRACT. Quantum signal processing (QSP) is a powerful quantum tool to exactly implement matrix polynomials on quantum computers. It represents a real scalar polynomial of degree d using a product of 2×2 unitary matrices, parameterized by (d+1) real numbers called the phase factors. For a given function f, the phase factors can be obtained by solving an optimization problem. However, the cost function is non-convex and has a very complex energy landscape with numerous global and local minima. We provide a partial explanation of the success of our optimization-based algorithm. Inspired by our theoretical analysis, we then propose a fixed-point iteration, which is the first numerically stable algorithm for finding phase factors with provable performance guarantees in the limit d → ∞. We also present a novel Newton's method, which demonstrates rapid and robust convergence in all parameter regimes, including the challenging scenario with ill-conditioned Jacobian matrices. Extensive numerical tests validate the effectiveness and robustness of our approaches, which have been implemented in the QSPPACK software package.

09:14
Local minima in quantum systems

ABSTRACT. Finding ground states of quantum many-body systems is known to be hard for both classical and quantum computers. As a result, when Nature cools a quantum system in a low-temperature thermal bath, the ground state cannot always be found efficiently. Instead, Nature finds a local minimum of the energy. In this work, we study the problem of finding local minima in quantum systems under thermal perturbations. While local minima are much easier to find than ground states, we show that finding a local minimum is computationally hard for classical computers, even when the task is to output a single-qubit observable at any local minimum. In contrast, we prove that a quantum computer can always find a local minimum efficiently using a thermal gradient descent algorithm that mimics the cooling process in Nature. To establish the classical hardness of finding local minima, we consider a family of two-dimensional Hamiltonians such that any problem solvable by polynomial-time quantum algorithms can be reduced to finding ground states of these Hamiltonians. We prove that for such Hamiltonians, all local minima are global minima. Therefore, assuming quantum computation is more powerful than classical computation, finding local minima is classically hard and quantumly easy.

09:36
Performance and limitations of the QAOA at constant levels on large sparse hypergraphs and spin glass models

ABSTRACT. The Quantum Approximate Optimization Algorithm (QAOA) is a general-purpose quantum algorithm designed for combinatorial optimization. We analyze its expected performance and prove concentration properties at any constant level (number of layers) on ensembles of random combinatorial optimization problems in the infinite size limit. These ensembles include mixed spin models and Max-q-XORSAT on sparse random hypergraphs. Our analysis can be understood via a saddle-point approximation of a sum-over-paths integral. This is made rigorous by proving a generalization of the multinomial theorem, which is a technical result of independent interest. We then show that the performance of the QAOA at constant levels for the pure q-spin model asymptotically matches that for Max-q-XORSAT on random sparse Erdős–Rényi hypergraphs and on every large-girth regular hypergraph. Through this correspondence, we establish that the average-case value produced by the QAOA at constant levels is bounded away from optimality for pure q-spin models when q ≥ 4 is even. This limitation gives a hardness-of-approximation result for quantum algorithms in a new regime where the whole graph is seen.

08:30-10:00 Session SatA 3
08:30
Maximum Load Assortment Optimization: Approximation Algorithms and Adaptivity Gaps

ABSTRACT. Motivated by modern-day applications such as Attended Home Delivery and Preference-based Group Scheduling, where decision makers wish to steer a large number of customers toward choosing the exact same alternative, we introduce a novel class of assortment optimization problems, referred to as Maximum Load Assortment Optimization. In such settings, given a universe of substitutable products, we are facing a stream of customers, each choosing between either selecting a product out of an offered assortment or opting to leave without making a selection. Assuming that these decisions are governed by the Multinomial Logit choice model, we define the random load of any underlying product as the total number of customers who select it. Our objective is to offer an assortment of products to each customer so that the expected maximum load across all products is maximized. We consider both static and dynamic formulations. In the static setting, a single offer set is carried throughout the entire process of customer arrivals, whereas in the dynamic setting, the decision maker offers a personalized assortment to each customer, based on the entire information available at that time. The main contribution of this paper resides in proposing efficient algorithmic approaches for computing near-optimal static and dynamic assortment policies. In particular, we develop a polynomial-time approximation scheme (PTAS) for the static formulation. Additionally, we demonstrate that an elegant policy utilizing weight-ordered assortments yields a 1/2-approximation. Concurrently, we prove that such policies are sufficiently strong to provide a 1/4-approximation with respect to the dynamic formulation, establishing a constant-factor bound on its adaptivity gap. Finally, we design an adaptive policy whose expected maximum load is within a factor of 1 − ε of optimal, admitting a quasi-polynomial time implementation.
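For intuition about the objective, the following small Monte Carlo script estimates the expected maximum load of a fixed (static) assortment under the Multinomial Logit model; the preference weights and the number of customers are made-up values.

```python
import numpy as np

# Expected maximum load of a *static* assortment under the MNL choice model.
# Each of T arriving customers independently picks product i in assortment S
# with probability v_i / (1 + sum_{j in S} v_j), and leaves otherwise (the
# "1" is the no-purchase option).  The objective is estimated by Monte Carlo;
# the weights and sizes below are purely illustrative.
rng = np.random.default_rng(0)
v = np.array([1.0, 0.8, 0.5, 0.3])      # MNL preference weights of products in S
T = 20                                   # number of arriving customers
n_sims = 20000

def expected_max_load(v, T, n_sims):
    probs = np.append(v, 1.0) / (1.0 + v.sum())             # products + no-purchase
    loads = rng.multinomial(T, probs, size=n_sims)[:, :-1]  # drop no-purchase column
    return loads.max(axis=1).mean()                          # average maximum load

print("estimated expected max load:", expected_max_load(v, T, n_sims))
```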

08:52
Stochastic Scheduling: Strategies for Abandonment Management

ABSTRACT. We explore a job scheduling model driven by applications with impatience, such as in call centers and cloud computing. We have a single server and n jobs with known non-negative values. These jobs also have unknown stochastic service and departure times with known independent distributional information. We aim to design a policy that maximizes the expected total value obtained from jobs run on the server. Natural dynamic programming formulations face the curse of dimensionality. Furthermore, our problem is demonstrably NP-hard. Hence, we focus on policies that can provide good approximation factors compared to the optimal value. We provide a compact linear program relaxation that upper bounds the performance of the optimal policy and examine the quality of this LP relaxation by bounding its integrality gap by constants. As a byproduct of our analysis and mild assumptions on the supports of the service times, we provide a polynomial-time computable policy that is a (1/2)(1 − 1/e)-approximation of the optimal expected reward. Additionally, we consider jobs having independent and identically distributed service times and show that the natural greedy policy guarantees half of the optimal expected value. We provide empirical validation of our policies via numerical experiments.

09:14
Random-order Contention Resolution via Continuous Induction: Tightness for Bipartite Matching under Vertex Arrivals

ABSTRACT. We introduce a new approach for designing Random-order Contention Resolution Schemes (RCRS) via exact solution in continuous time. Given a function $c(y):[0,1] \to [0,1]$, we show how to select each element which arrives at time $y \in [0,1]$ with probability exactly $c(y)$. We provide a rigorous algorithmic framework for achieving this, which discretizes the time interval and also needs to sample its past execution to ensure these exact selection probabilities. We showcase our framework in the context of online contention resolution schemes for matching with random-order vertex arrivals. For bipartite graphs with two-sided arrivals, we design a $(1+e^{-2})/2 \approx 0.567$-selectable RCRS, which we also show to be tight. Next, we show that the presence of short odd-length cycles is the only barrier to attaining a (tight) $(1+e^{-2})/2$-selectable RCRS on general graphs. By generalizing our bipartite RCRS, we design an RCRS for graphs with odd-length girth $g$ which is $(1+e^{-2})/2$-selectable as $g \rightarrow \infty$. This convergence happens very rapidly: for triangle-free graphs (i.e., $g \ge 5$), we attain a $121/240 + 7/(16e^{2}) \approx 0.563$-selectable RCRS. Finally, for general graphs we improve on the $8/15 \approx 0.533$-selectable RCRS of Fu et al. (2021) and design an RCRS which is at least $0.535$-selectable. Due to the reduction of Ezra et al. (2020), our bounds yield a $0.535$-competitive (respectively, $(1+e^{-2})/2$-competitive) algorithm for prophet secretary matching on general (respectively, bipartite) graphs under vertex arrivals.

09:36
Adaptivity Gaps in Two-sided Assortment Optimization

ABSTRACT. We study a two-sided assortment optimization framework to address the challenge of choice congestion faced by matching platforms. The goal is to decide the assortments to offer to agents in order to maximize the expected number of matches. We identify several classes of policies that the platforms can use in their design. Our main goal is to measure the value that one class of policies has over another. For this, we define the adaptivity gap as the worst-case ratio between the optimal values achieved by two different policy classes. First, we show that the adaptivity gap between the class of policies that statically show assortments to one side first and the class of policies that adaptively show assortments to one side first is exactly 1-1/e. Second, we show that the adaptivity gap between the latter class of policies and the fully adaptive class of policies that show assortments to agents one by one is exactly 1/2. We observe that the worst policies are those that simultaneously show assortments to all the agents; in fact, we show that the adaptivity gap with respect to one-sided policies can be arbitrarily small. These results showcase the benefit of each class of policies and, in particular, demonstrate that the optimal value of the best class of adaptive policies is a constant multiplicative factor away from that of one-sided policies. Finally, we provide improved approximation guarantees for the two-sided assortment optimization problem under fixed policy classes.

08:30-10:00 Session SatA 4
08:30
Stochastic-Gradient-based Interior-Point Algorithms

ABSTRACT. I will share some of our recent work on stochastic-gradient-based interior-point algorithms for solving constrained optimization problems with objective functions defined by expectations or large finite sums. The algorithms are single-loop in the sense that they do not require an inner criterion for updating the barrier parameter; rather, the barrier parameter is decreased according to a prescribed sequence. Convergence guarantees are attained in both deterministic and stochastic settings. The algorithms exhibit good practical performance in comparison to projected-gradient-based methods.
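A toy illustration of the single-loop idea, with the barrier parameter driven by a prescribed sequence rather than an inner convergence test, is sketched below for a simple bound-constrained stochastic problem; the schedules, the safeguard, and the test problem are illustrative assumptions, not the algorithms analyzed in the talk.

```python
import numpy as np

# Toy single-loop stochastic-gradient log-barrier method for
#   min_x  E_xi[ 0.5 * ||x - xi||^2 ]   subject to  x >= 0,
# where xi ~ N(mu_data, I).  The barrier parameter mu_k follows a prescribed
# decreasing sequence instead of an inner convergence criterion.
rng = np.random.default_rng(0)
n = 10
mu_data = rng.normal(size=n)            # unconstrained optimum of the expectation
x = np.ones(n)                          # strictly feasible start (x > 0)

for k in range(1, 20001):
    mu_k = 1.0 / k**0.6                 # prescribed barrier-parameter sequence
    alpha_k = 0.5 / k**0.7              # prescribed step-size sequence
    xi = mu_data + rng.normal(size=n)   # stochastic sample
    g = (x - xi) - mu_k / x             # stochastic gradient of the barrier objective
    x = np.maximum(x - alpha_k * g, 1e-10)   # crude safeguard to stay strictly feasible

print("iterate:            ", np.round(x, 3))
print("true solution max(mu,0):", np.round(np.maximum(mu_data, 0.0), 3))
```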

08:52
Almost-sure convergence of iterates and multipliers in stochastic sequential quadratic optimization

ABSTRACT. Stochastic sequential quadratic optimization (SQP) methods for solving continuous optimization problems with nonlinear equality constraints have attracted attention recently, such as for solving large-scale data-fitting problems subject to nonconvex constraints. However, for a recently proposed subclass of such methods that is built on the popular stochastic-gradient methodology from the unconstrained setting, convergence guarantees have been limited to the asymptotic convergence of the expected value of a stationarity measure to zero. This is in contrast to the unconstrained setting in which almost-sure convergence guarantees (of the gradient of the objective to zero) can be proved for stochastic-gradient-based methods. In this paper, new almost-sure convergence guarantees for the primal iterates, Lagrange multipliers, and stationarity measures generated by a stochastic SQP algorithm are proved. It is shown that the error in the Lagrange multipliers can be bounded by the distance of the primal iterate to a primal stationary point plus the error in the latest stochastic gradient estimate. It is further shown that, subject to certain assumptions, this latter error can be made to vanish by employing a running average of the Lagrange multipliers that are computed during the run of the algorithm. The results of numerical experiments demonstrate the proved theoretical guarantees.

09:14
Retrospective Approximation for Stochastic Constrained Problems Using Sequential Quadratic Programming

ABSTRACT. Sequential Quadratic Programming (SQP) is one of the state-of-the-art algorithms for solving deterministic constrained problems. In recent years the framework has been extended to solve equality-constrained problems with stochastic objective functions. To adapt the step size in stochastic settings, new schemes such as stochastic line search, Lipschitz constant estimation, and Hessian averaging have been introduced in SQP. We use SQP algorithms in a retrospective approximation framework that allows us to solve a series of subsampled deterministic subproblems in order to solve the stochastic constrained problem. This framework decouples the stochasticity from the SQP algorithm, allowing us to use legacy deterministic solvers (other than SQP as well) to solve constrained stochastic programs.
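A schematic of the retrospective-approximation outer loop is shown below, using SciPy's SLSQP as a stand-in for the legacy deterministic solver; the sample-size and tolerance schedules, as well as the test problem, are illustrative assumptions rather than the authors' setup.

```python
import numpy as np
from scipy.optimize import minimize

# Retrospective approximation (RA) sketch: solve a sequence of subsampled
# deterministic equality-constrained problems with growing sample sizes and
# tightening tolerances, warm-starting each solve from the previous solution.
# Stochastic problem:  min E_xi[ 0.5*||x - xi||^2 ]  s.t.  sum(x) = 1,
# with xi ~ N(mu, I); the schedules and the SLSQP solver are illustrative.
rng = np.random.default_rng(0)
n = 5
mu = rng.normal(size=n)

x = np.zeros(n)                                   # initial iterate
N_k, tol_k = 8, 1e-2                              # initial sample size / tolerance
for k in range(6):
    xi = mu + rng.normal(size=(N_k, n))           # fixed subsample for this outer iteration
    obj = lambda z: 0.5 * np.mean(np.sum((z - xi) ** 2, axis=1))
    cons = {"type": "eq", "fun": lambda z: np.sum(z) - 1.0}
    res = minimize(obj, x, method="SLSQP", constraints=[cons], tol=tol_k)
    x = res.x                                     # warm start for the next subproblem
    N_k, tol_k = 2 * N_k, tol_k / 4               # grow the sample, tighten the tolerance

# Analytic solution of the true (expected-value) problem for comparison
x_star = mu + (1.0 - mu.sum()) / n
print("RA solution:  ", np.round(x, 3))
print("true solution:", np.round(x_star, 3))
```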

09:36
Sequential Quadratic Programming for Noisy Equality Constrained Optimization with Rank-Deficient Jacobians

ABSTRACT. We design, analyze, and implement a sequential quadratic programming method to solve general nonlinear equality-constrained optimization problems. We especially consider the setting where only noisy estimates of the objective function and gradient values are available, and where users can likewise compute only noisy constraint function and Jacobian estimates. One of the major contributions is that the proposed algorithm can handle noisy and rank-deficient Jacobians. The results of numerical experiments reveal the efficiency and efficacy of our algorithmic framework.

08:30-10:00 Session SatA 5
08:30
A Review of Quantum Interior Point Methods - pros and cons

ABSTRACT. Various Quantum Interior Point Methods (QIPMs) have been proposed and analyzed in recent years for Conic Linear Optimization (CLO) problems. These include Inexact Infeasible and Inexact Feasible QIPMs, as well as Quantum Central Path algorithms. Some QIPMs are enhanced with preconditioning and iterative refinement methodologies. Novel quantum variants have also inspired novel classical variants.

In this talk, at a high level, we review the key features of the various QIPMs and their classical counterparts, and highlight the advantages and challenges offered by quantum computing, as well as the pros and cons of the algorithmic paradigms.

08:52
Discrete Optimization: A Quantum Revolution?

ABSTRACT. We use Grover’s algorithm to build quantum algorithms that can be used to solve discrete optimization problems on a universal quantum computer. We use our quantum algorithms to solve the binary knapsack problem, and show that our algorithms can match the performance of the best classical algorithms. Our algorithms, however, are not limited to solving the binary knapsack problem, and can be used to solve any discrete optimization problem using only O(η2^(0.5nb)) operations, where η is the number of operations required to evaluate the feasibility of a solution, n is the number of decision variables, and 2^b is the number of discrete values that can be assigned to each decision variable. Last but not least, we also show that our algorithms can be used as heuristics that can obtain a good solution using far less than O(η2^(0.5nb)) operations.
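The quoted operation count is just Grover's quadratic speedup applied to exhaustive search over the discrete solution space:

```latex
\[
\underbrace{(2^{b})^{n} = 2^{nb}}_{\text{candidate solutions}}
\;\;\Longrightarrow\;\;
O\!\bigl(\sqrt{2^{nb}}\bigr) = O\!\bigl(2^{0.5\,nb}\bigr)\ \text{Grover iterations},
\qquad
\text{total cost } O\!\bigl(\eta\, 2^{0.5\,nb}\bigr),
\]
```

since each Grover iteration invokes an oracle that costs η operations to evaluate a candidate solution.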

09:14
Quantum algorithms for linear differential equations beyond Hamiltonian simulation

ABSTRACT. Fault-tolerant quantum computers are expected to simulate unitary dynamics (i.e., the Schrödinger equation) much faster than classical computers. However, most applications in scientific and engineering computations involve non-unitary dynamics and processes, and efficient quantum algorithms for non-unitary dynamics are key to unlocking the full potential of quantum computers to achieve comparable speedup in these general tasks. In this talk, we will focus on quantum algorithms for general linear differential equations. We will discuss a recently proposed method based on linear combination of Hamiltonian simulation (LCHS) problems. The new LCHS method does not rely on converting the problem into a dilated linear system problem, or on the spectral mapping theorem. It can achieve optimal cost in terms of state preparation and near-optimal dependence on all parameters.

08:30-10:00 Session SatA 6
08:30
Recent Developments in the Cardinal Optimizer

ABSTRACT. Cardinal Optimizer (COPT) is a mathematical optimization solver for large-scale optimization problems. It is developed by Cardinal Operations, and it includes high-performance solvers for LP, MIP, SDP, (MI)SOCP, convex (MI)QP and convex (MI)QCP. In this talk we will introduce the latest research and developments in COPT.

08:52
Second-Derivative SQP Methods for Nonconvex Nonlinear Optimization

ABSTRACT. Sequential quadratic programming (SQP) methods solve a nonlinearly constrained optimization problem as a sequence of simpler subproblems that involve the minimization of a quadratic model of the Lagrangian function subject to the linearized constraints. In contrast to the quasi-Newton approach, which maintains a positive-definite Hessian approximation, second-derivative SQP methods use the exact Hessian of the Lagrangian. In this context, we consider an adaptive convexification strategy that makes minimal matrix modifications while ensuring that the subproblem iterates are bounded and that the QP solution defines a descent direction for the relevant merit function.
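For context, the subproblem solved at iterate x_k (written schematically for equality constraints; inequalities are linearized analogously) is the standard SQP quadratic program, and the convexification strategy acts on the matrix H_k:

```latex
\[
\min_{d}\;\; \nabla f(x_k)^{\top} d \;+\; \tfrac{1}{2}\, d^{\top} H_k\, d
\qquad \text{s.t.} \qquad c(x_k) + \nabla c(x_k)\, d = 0,
\]
```

with H_k the (possibly modified) Hessian of the Lagrangian $\nabla^2_{xx} L(x_k, \lambda_k)$.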

09:14
Algorithm NCL for constrained optimization: Overcoming LICQ

ABSTRACT. Problems with LICQ difficulties are readily handled by LANCELOT because the subproblems in its short sequence have only bound constraints. We call LANCELOT a BCL method (bound-constrained augmented Lagrangian). Algorithm NCL solves an equivalent sequence of nonlinearly constrained subproblems that are much bigger but are suitable for interior methods such as IPOPT and KNITRO. We explain how IPOPT and KNITRO can be warm-started between subproblems, and give numerical results for both AMPL and Julia implementations of Algorithm NCL.

09:36
An LDLT Quasi-Newton Trust-Region Method

ABSTRACT. For quasi-Newton methods in unconstrained minimization, it is valuable to develop methods that are robust, i.e., methods that converge on a large number of problems. Trust-region algorithms are often regarded as more robust than line-search methods; however, because trust-region methods are computationally more expensive, the most popular quasi-Newton implementations use line searches. To fill this gap, we develop a trust-region method that updates an LDLT factorization, scales quadratically with the size of the problem, and is competitive with a conventional line-search method.

08:30-10:00 Session SatA 7
08:30
A Robust Approach to Locating Vaccination Centers for a Pandemic: A Problem Driven by Herd Immunity and Decision-Dependent Utility, with a Solution Framework

ABSTRACT. This study introduces an optimization model for determining the location of vaccination centers while accounting for distributionally robust decision-dependent utility, herd immunity, and equity. We explore both linear and concave utility functions. Addressing uncertainty in the parameters defining utility and the concave herd-immunity function, we reformulate the problems as two-stage stochastic mixed-integer bilinear and mixed-integer biconvex programs, respectively. In their extended form, the latter problem cannot be handled by GUROBI, while for the bilinear case GUROBI is observed to be inefficient as the problem scales. A Benders-type decomposition, on the other hand, results in bilinearity and biconvexity between the first- and second-stage variables in the second-stage problem. However, a simple treatment with McCormick relaxation, although faster than GUROBI, still struggles to close the desired optimality gap within a reasonable time. To address this limitation, we generate valid cuts, focusing on the first- and second-stage variables that appear in nonlinear relationships, to strengthen the projected optimality cuts for the first stage in the Benders structure. We demonstrate how these cuts help close the optimality gap faster. Furthermore, we interpret how location decisions change with respect to anticipated herd-immunity levels, equity thresholds, robustness considerations, and decision dependence, and discuss the resulting implications for pandemic containment.
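For reference, the McCormick relaxation mentioned above replaces each bilinear term w = xy, with x ∈ [x^L, x^U] and y ∈ [y^L, y^U], by its standard convex and concave envelopes; the valid cuts developed in the talk strengthen the relaxation beyond these inequalities:

```latex
\[
\begin{aligned}
w &\ge x^{L}y + x\,y^{L} - x^{L}y^{L}, &\qquad w &\ge x^{U}y + x\,y^{U} - x^{U}y^{U},\\
w &\le x^{U}y + x\,y^{L} - x^{U}y^{L}, &\qquad w &\le x^{L}y + x\,y^{U} - x^{L}y^{U}.
\end{aligned}
\]
```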

08:52
A Successive Refinement Algorithm for Stochastic Programs with Decision-Dependent Distribution Functions

ABSTRACT. We study a class of stochastic programming problems where the distribution function of uncertainty is a function of the first-stage variables. A natural way to formulate the objective functions of these problems is as the summation of high-degree products of probabilities, each of which is a function of first-stage variables; hence, this approach is intractable. To solve this class of problems we investigate a successive refinement algorithm that incrementally applies Jensen's inequality to improve lower bounds. We apply this technique to a diverse set of optimization problems, and our experiments indicate that our algorithm outperforms other potential approaches.
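The lower-bounding device is the conditional form of Jensen's inequality: partitioning the support of the uncertainty and replacing it in each cell by its conditional mean underestimates a convex value function, and refining the partition can only tighten the bound. Stated schematically for a convex value function Q (the talk applies this idea in its decision-dependent setting):

```latex
\[
\sum_{j=1}^{J} \Pr(\xi \in C_j)\; Q\bigl(x,\ \mathbb{E}[\xi \mid \xi \in C_j]\bigr)
\;\le\; \mathbb{E}\bigl[\,Q(x,\xi)\,\bigr]
\qquad \text{for any partition } \{C_1,\ldots,C_J\} \text{ of the support of } \xi .
\]
```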

09:14
Distributionally Robust Optimization with Multimodal Decision-Dependent Ambiguity Sets

ABSTRACT. We consider a two-stage distributionally robust optimization (DRO) model with multi-modal uncertainty, where both the mode probabilities and uncertainty distributions could be affected by the first-stage decisions. We propose different uncertainty sets to characterize the decision-dependent mode probabilities and consider both moment-based and Wasserstein distance-based ambiguity sets to characterize the uncertainty distribution under each mode. We identify specific forms of decision dependence under which we can derive tractable reformulations for the resulting DRO models. Furthermore, we investigate the benefits of considering multimodality in a DRO model compared to a single-modal one that fuses the information together.

09:36
Residual-Based Distributionally Robust Optimization with Decision-Dependent Uncertainty

ABSTRACT. We consider a residual-based distributionally robust optimization model, where the underlying uncertainty depends on our decisions. We adopt regression models to learn the latent decision dependency and construct ambiguity sets around the learned model using empirical residuals. Specifically, we build sample-based and Wasserstein distance-based ambiguity sets, where both the empirical distributions and radii could be decision-dependent. We provide statistical guarantees of the solutions to our problem, including asymptotic optimality, rate of convergence, and finite sample guarantees. For the problem with decision-dependent radii, we further develop algorithms to calibrate the dependence of radii on the decisions and propose tractable reformulations. Through numerical experiments, we illustrate the effectiveness of our algorithm and the benefits of integrating decision dependency into a residual-based DRO framework.

08:30-10:00 Session SatA 8
08:30
A Bilevel Optimization Approach for a Class of Combinatorial Problems with Disruptions and Probing

ABSTRACT. We study a class of combinatorial optimization problems for which uncertain disruptions can affect the objective's cost coefficients. The disruptions affect the coefficients component-wise, and their occurrences are represented by a sequence of Bernoulli random variables. Uncertainty can be mitigated by investing resources into information collection, i.e., probing, before solving the problem. In a first stage, a decision-maker (DM) selects which components to probe subject to a limited budget, whereas in a second decision stage the DM solves the combinatorial problem by estimating the uncertainty using the information collected by the probes. This two-stage problem is formulated and solved as a stochastic bilevel problem. We devise exact and heuristic algorithms that exploit the structure of the problem setting. The results show that our methods are capable of solving medium-sized instances of the problem and that in many instances it is sufficient to probe only a few coefficients to attain the same performance as a full-information decision-maker.

08:52
A branch-and-bound framework to solve a general class of three-stage interdiction games

ABSTRACT. We study a general class of three-stage interdiction games that model the sequential interaction of three decision-makers: a protector, an attacker, and a defender. In the first stage, the protector conducts a set of defensive actions targeting the interdiction strategies carried out by the attacker during the second stage, in order to mitigate their effect on the defender's objective. Finally, in the third stage, the defender optimizes their course of action, taking into consideration the effects caused by the previous stages. We focus on a general variation in which the actions taken by the protector can directly affect the solution space of the defender. In other words, the protection actions taken in the first stage may also render solutions of the defender infeasible. General applications of this type of game arise in problems with an underlying network structure, where the protector can alter the composition of the network by adding/removing edges and vertices. In this presentation, we develop an exact solution framework for these types of problems based on a combinatorial branch-and-bound search. We provide some examples of applications arising from communications and network design problems.

09:14
A Single-Level Reformulation of Integer Bilevel Programs using Decision Diagrams

ABSTRACT. Integer bilevel programming problems are known to be very challenging due to the lack of strong relaxations that can be efficiently computed. We propose a new single-level representation of integer bilevel programming problems that relies on a network flow-based formulation of the follower’s value function, using decision diagrams and primal-dual constraints. We then show how we can derive scalable relaxations from this representation by incorporating only a restricted set of the master problem solutions, yielding dual bounds. We experimentally compare our approach with state-of-the-art bilevel programming solvers and show that we can obtain competitive results for certain problem classes, especially when the leader and the follower have opposing objective functions.

09:36
Value Function Projections for Discrete Bilevel Optimization

ABSTRACT. We propose an exact solution method for linear discrete bilevel optimization problems using approximations of the value function of the discrete inner problem based on relaxed decision diagrams. We exploit the network structure provided by the decision diagrams to strengthen the big-M single-level reformulations of the original bilevel problem. Our computational experiments show that our proposed approaches compare favorably with existing state-of-the-art methods on a testbed of instances from the literature.

08:30-10:00 Session SatA 9
08:30
Explicit solutions of the SI and Bass models on random networks

ABSTRACT. We introduce a new methodology for computing explicitly the expected adoption (infection) level in the SI and Bass models for spreading of new products and epidemics on networks. In contrast to the mean-field approach, the obtained expressions are exact. We use this methodology to obtain explicit solutions for two types of sparse random networks: k-regular networks and Erdős–Rényi networks. To the best of our knowledge, this is the first result of this type.
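For orientation, the mean-field benchmark against which the exact network-level expressions are contrasted is the classical compartmental Bass model, in which the adopted fraction f(t) evolves as

```latex
\[
\frac{df}{dt} \;=\; \bigl(p + q\,f(t)\bigr)\,\bigl(1 - f(t)\bigr), \qquad f(0) = 0,
\]
```

with p the external (innovation) rate and q the internal (imitation) rate; the SI epidemic model corresponds to the special case p → 0 with a seeded initial condition.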

08:52
Distributionally Ambiguous Multistage Stochastic Integer and Disjunctive Programs: Applications to Sequential Two-player Network Interdiction Games

ABSTRACT. We introduce generalizations of multistage stochastic mixed-integer programs (MSIPs) with distributional ambiguity, namely distributionally risk-receptive and risk-averse multistage stochastic mixed-integer programs (denoted by DRR- and DRA-MSIPs). These modeling frameworks have applications in non-cooperative Stackelberg games involving two players, namely a leader and a follower, with uncertainty in the impact of the decisions made by the leader. We present cutting plane-based and reformulation-based approaches for solving DRR- and DRA-MSIPs to optimality. We showcase that these approaches are finitely convergent with probability one. We also introduce generalizations of MSIPs by considering multistage stochastic disjunctive programs with(out) distributional ambiguity and present algorithms for solving them. To assess the performance of the algorithms for MSIPs, we consider instances of multistage maximum flow network and facility location interdiction problems with distributional ambiguity that are important interdiction problems in their own right. Based on our computational results, we observe that the cutting plane-based approaches are 28 and 24.1 times (on average) faster than the reformulation-based approaches for the foregoing instances with distributional risk-aversion and risk-receptiveness, respectively.

09:14
Fundamentals of Sparse Exact Linear Algebra

ABSTRACT. Solving sparse linear systems has a central role in solving linear programs and other optimization problems. Solving linear programs and systems exactly is necessary for some applications (e.g., theoretical results, feasibility problems, military applications, applications with hefty costs, ill-conditioned problems, etc.). To address this, we are developing the Sparse Exact (SPEX) Factorization Framework: a high-performance, well-documented, and extremely robust software package. This talk will focus on the history and the theoretical foundations of the package.

09:36
The minimum feedback arc set problem over bidirectional graphs with an application to faculty hiring networks

ABSTRACT. We introduce a novel graph theory problem that includes the minimum feedback arc set as a special case; specifically, it expands from this classic problem by allowing bidirectional arcs in the input graph and output arc set. We derive and compare multiple integer programming formulations for solving it and explore additional techniques for characterizing hierarchical structures from the resulting solution graph. The proposed framework is implemented to study faculty hiring networks to identify the prestige (rank) of different departments within specific disciplines.

08:30-10:00 Session SatA10
08:30
Machine learning enhanced meta-heuristic optimization algorithms for autonomous experimentation

ABSTRACT. In this work, we investigate a novel approach to autonomous experimentation by integrating machine learning with meta-heuristic optimization algorithms, specifically enhancing Particle Swarm Optimization (PSO) and Differential Evolution (DE). Our method accelerates these algorithms' convergence rates, significantly improving efficiency in autonomous decision-making processes. Utilizing machine learning (ML) models, we guide the meta-heuristic algorithms toward more promising solution spaces, thereby enhancing their exploration capabilities and increasing the likelihood of identifying near-optimal solutions. This synergy is particularly effective in complex scenarios, such as the inverse design of optical surfaces. Here, our enhanced algorithms navigate the intricate design space more efficiently than traditional methods. The integration of ML with PSO and DE not only speeds up the optimization process but also opens new possibilities for autonomous problem-solving in various domains.

08:52
Hexaly, a new kind of global optimization solver

ABSTRACT. Hexaly Optimizer is a new kind of global optimization solver. Its modeling interface is nonlinear and set-oriented. It also supports user-coded functions, thus enabling black-box optimization and, more particularly, simulation optimization. In a sense, Hexaly APIs unify modeling concepts from mixed-integer linear programming, nonlinear programming, and constraint programming. Under the hood, Hexaly combines various exact and heuristic optimization methods: spatial branch-and-bound, simplex methods, interior-point methods, augmented Lagrangian methods, automatic Dantzig-Wolfe reformulation, column and row generation, propagation methods, local search, direct search, population-based methods, and surrogate modeling techniques for black-box optimization.

Regarding performance benchmarks, Hexaly distinguishes itself from the leading solvers in the market, such as Gurobi, IBM CPLEX, and Google OR-Tools, by delivering fast and scalable solutions to problems in the spaces of Supply Chain and Workforce Management, such as Routing, Scheduling, Packing, Clustering, and Location. For example, on notoriously hard problems like the Pickup and Delivery Problem with Time Windows or Flexible Job Shop Scheduling with Setup Times, Hexaly delivers solutions with a gap to the best solutions known in the literature smaller than 1% within a few minutes of running time on a basic computer.

In addition to the Optimizer, we provide an innovative development platform called Hexaly Studio to model and solve rich Vehicle Routing and Job Shop Scheduling problems in a no-code fashion. Users can define their problem and data, run the Optimizer, visualize the solutions and key metrics through dashboards, and deploy the resulting app in the cloud, all without coding. This web-based platform is particularly interesting for educational purposes; it is free for faculty and students.

09:14
Data-driven Parameter Estimation for Large-Scale Nonlinear Dynamical Systems

ABSTRACT. Many sequential decision-making problems can be described by a system of nonlinear ordinary differential equations. To use these models for downstream tasks such as inference, relevant model parameters must often be estimated from past data, in the absence of known physical laws. For a variety of applications, including logistics, epidemic modelling, or energy systems, this defines large-scale dynamical systems with a variety of data sources to be used for estimation. The resulting nonlinear optimization problems are thus challenging to solve, especially considering potential real-time constraints. To tackle these challenges, we utilize a parallel decomposition scheme based on the Schur complement method implemented in ParaPint. This approach separates the model components related to parameter estimation, e.g. linear models or neural networks, from the dynamical systems that they are embedded in. The parameters of the estimation model become complicating variables in a structured optimization problem, which can be solved in parallel using an interior point method, where a Schur complement decomposition is applied at every iteration. We demonstrate this framework on parameter estimation problems arising in epidemic inference and demonstrate the parallel scaling with increasing problem size and estimation model complexity.

09:36
Efficient Hybrid Simulation Optimization Via Graph Neural Network Metamodeling

ABSTRACT. Simulation metamodeling is essential for speeding up optimization via simulation to support rapid decision making. During optimization, the metamodel, rather than expensive simulation, is used to compute objective values. We recently developed graphical neural metamodels (GMMs) that use graph neural networks to allow the graphical structure of a simulation model to be treated as a metamodel input parameter that can be varied along with scalar inputs. In this talk, we first introduce GMMs and then we provide novel methods for using GMMs to solve hybrid optimization problems where both real-valued input parameters and graphical structure are jointly optimized. The key ideas are to modify Monte Carlo tree search to incorporate both discrete and continuous optimization and to leverage the automatic differentiation infrastructure used for neural network training to quickly compute gradients of the objective function during stochastic gradient descent. Experiments on stochastic activity network and warehouse models demonstrate the potential of our method.

08:30-10:00 Session SatA11
08:30
A Fokker-Planck optimal control framework in esophageal cancer

ABSTRACT. In this talk, we present an optimal control framework for controlling aberrant signaling pathways in esophageal cancer. The dynamics of signaling pathways is given by a stochastic process that models the randomness present in the system. The stochastic dynamics is then represented by the Fokker-Planck (FP) partial differential equation that governs the evolution of the associated probability density function. We solve two inverse problems: a FP parameter estimation problem for model fitting and a FP feedback control problem to determine the optimal combination therapies for controlling the signaling pathways. Finally, we demonstrate the efficiency of the proposed framework through numerical results with combination drugs. This work was funded by the US National Science Foundation (award number: DMS 2212938)
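For readers less familiar with the setup, the Fokker-Planck equation governing the density ρ(x, t) of an Itô diffusion dX = b(X, u) dt + σ(X) dW (with the control u assumed here to enter the drift, as is typical in such frameworks) reads

```latex
\[
\frac{\partial \rho}{\partial t}
\;=\; -\sum_{i} \frac{\partial}{\partial x_i}\Bigl[\, b_i(x,u)\,\rho \,\Bigr]
\;+\; \frac{1}{2}\sum_{i,j} \frac{\partial^{2}}{\partial x_i\,\partial x_j}
\Bigl[\, \bigl(\sigma\sigma^{\top}\bigr)_{ij}(x)\,\rho \,\Bigr],
\]
```

and the estimation and control problems described in the talk use this equation as the governing dynamics.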

08:52
A differential game framework for regulating aberrant signaling pathways in esophageal cancer

ABSTRACT. In this work, we present a new framework for a competitive evolutionary game between monoclonal antibodies and signaling pathways in esophageal cancer. The framework is based on a novel mathematical model that takes into account the dynamic progression of signaling pathways, resistance mechanisms, and monoclonal antibody therapies. We conduct the analysis and computation of two game-theoretic strategies, Stackelberg and Nash's equilibria, within this framework to ascertain the most favorable outcome for the patient. Numerical experiments show that the Stackelberg equilibria are superior for treating signaling pathways and are critical for the success of monoclonal antibodies in improving patient outcomes. This work was funded by the US National Science Foundation (award number: DMS 2212938)

09:14
Classifying with Uncertain DEA

ABSTRACT. Data Envelopment Analysis (DEA) is a longstanding and trusted computational stalwart in OR, but its efficacy depends on, and assumes, dependable and accurate data, which is a fallacy in many applications. We present an uncertain variation of DEA and show how it aptly applies not only to efficiency analysis but also to classification, and we illustrate with examples in healthcare (prostate treatments), finance (Dow Jones Stocks), and the social sciences (gerrymandering).

09:36
Blind Source Separation with Spatiotemporal Beamforming

ABSTRACT. Blind Source Separation (BSS) techniques such as Independent Component Analysis, Sparse Component Analysis, and Bounded Component Analysis often fail to perform when working with signal mixtures from non-sparse, unbounded, or stochastically dependent sources. This limits their real-world applications. Delivering signal separation in real time is a far more difficult problem. We have developed a BSS technique with spatiotemporal beamforming that can separate signals in real time, even in scenarios where the exact positions of the sensors receiving the signal mixtures are not known.

Using targeted search algorithms to quickly find solutions in a high-dimensional search space, the proposed BSS technique is able to successfully extract desired signals from signal mixtures and deliver real-time performance. For a real-world application scenario, modeled in software as a discrete, single-objective, black-box optimization problem, automated hyperparameter tuning is first performed to identify good hyperparameter values for a few targeted search algorithms, and then a number of efficiency and effectiveness performance metrics are used to compare and rank them. The proposed technique is able to successfully extract the desired signal even when interferers are co-located with the source transmitting the desired signal. We have identified a wide range of potential applications in the management and healthcare domains.

13:15-14:45 Session SatB 1
13:15
Stochastic Gradient Succeeds for Bandits

ABSTRACT. The stochastic policy gradient is a classic algorithm for bandit problems, which is naturally compatible with deep neural networks, in stark contrast to UCB and Thompson sampling, and has seen significant practical success. However, the theoretical understanding of the algorithm, and in particular a global convergence result, had not previously been established. In this talk, we close this gap and show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an O(1/t) rate, even with a constant step size. The new result is achieved by establishing two novel technical findings: first, the noise of the stochastic updates in the gradient bandit algorithm satisfies a strong “growth condition” property, where the variance diminishes whenever progress becomes small, implying that additional noise control via diminishing step sizes is unnecessary; second, a form of “weak exploration” is automatically achieved through the stochastic gradient updates, since they prevent the action probabilities from decaying faster than O(1/t), thus ensuring that every action is visited enough times. These two findings can be used to show that the stochastic gradient update is already “sufficient” for bandits in the sense that exploration versus exploitation is automatically balanced in a manner that ensures almost sure convergence to a global optimum.
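A minimal implementation of the algorithm in question, the softmax ("gradient bandit") stochastic policy gradient with a constant step size, is given below on a synthetic instance; the arm means, noise level, and step size are arbitrary illustrative choices.

```python
import numpy as np

# Stochastic gradient bandit with a softmax policy and a *constant* step size:
# sample an action from the softmax policy, observe a noisy reward, and take
# an unbiased stochastic-gradient ascent step on the expected reward.
rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.9, 0.4])    # synthetic bandit instance
K = len(true_means)
theta = np.zeros(K)                            # softmax logits
eta = 0.2                                      # constant step size

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for t in range(50000):
    pi = softmax(theta)
    a = rng.choice(K, p=pi)
    r = true_means[a] + rng.normal(scale=0.1)          # noisy reward
    grad = -pi * r                                      # REINFORCE estimate:
    grad[a] += r                                        #   r * (e_a - pi) = r * d(log pi_a)/d(theta)
    theta += eta * grad                                 # ascent step

print("final policy:", np.round(softmax(theta), 3))    # should concentrate on the 0.9 arm
```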

13:37
On the Foundation of Distributionally Robust Reinforcement Learning

ABSTRACT. Motivated by the need for a robust policy in the face of environment shifts between training and deployment, we contribute to the theoretical foundation of distributionally robust reinforcement learning (DRRL). This is accomplished through a comprehensive modeling framework centered around distributionally robust Markov decision processes (DRMDPs). This framework obliges the decision maker to choose an optimal policy under the worst-case distributional shift orchestrated by an adversary. By unifying and extending existing formulations, we rigorously construct DRMDPs that embrace various modeling attributes for both the decision maker and the adversary. These attributes include the granularity of adaptability, covering history-dependent, Markov, and Markov time-homogeneous decision-maker and adversary dynamics. Additionally, we delve into the flexibility of the shifts induced by the adversary, examining SA- and S-rectangularity. Within this DRMDP framework, we investigate conditions for the existence or absence of the dynamic programming principle (DPP). From an algorithmic standpoint, the existence of the DPP holds significant implications, as the vast majority of existing data- and computationally efficient RL algorithms rely on the DPP. To study its existence, we comprehensively examine combinations of controller and adversary attributes, providing streamlined proofs grounded in a unified methodology. We also offer counterexamples for settings in which a DPP with full generality is absent.
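As one concrete point of reference, in the (s,a)-rectangular, stationary, infinite-horizon discounted setting, the dynamic programming principle discussed above takes the familiar robust Bellman form

```latex
\[
V^{*}(s) \;=\; \max_{a \in \mathcal{A}} \; \min_{P_{s,a} \in \mathcal{P}(s,a)}
\Bigl[\, r(s,a) \;+\; \gamma \sum_{s'} P_{s,a}(s')\, V^{*}(s') \,\Bigr],
\]
```

whereas the talk characterizes which combinations of decision-maker and adversary attributes admit, or preclude, such a principle.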

13:59
Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Algorithm Design and Convergence Analysis

ABSTRACT. Nonsmooth, nonconvex-nonconcave minimax optimization has gained widespread interest in recent years in machine learning and data science. However, much of the existing work focuses on a variant of the gradient descent-ascent (GDA) algorithm, which exhibits several limitations: (1) They are only applicable to smooth problems, where gradient information is available. (2) They may not converge globally, and they may suffer from limit cycles. (3) No single algorithm can be universally applied to all minimax optimization problems, such as nonconvex-concave and convex-nonconcave minimax settings.

In this talk, we will tackle these challenges for a class of structured nonsmooth, nonconvex-nonconcave minimax optimization problems. The main difficulty in algorithmic design for minimax optimization is balancing the primal decrease and dual increase in an efficient way. To overcome this difficulty, we develop a novel primal-dual error bound that serves as the foundation for both convergence analysis and algorithmic design. This gives rise to a new algorithm design principle known as optimal primal-dual balancing. Following this principle, we develop a single algorithm, doubly smoothed (prox-linear)/gradient descent-ascent, which universally works for all minimax optimization problems. Our algorithm finds an $\epsilon$-stationary point in $O(\epsilon^{-4})$ iterations. If additional regularity is assumed (weaker than standard assumptions imposed in the literature), we obtain sharper, even optimal, iteration complexity results. We showcase the effectiveness of our algorithm in getting rid of limit cycles in challenging nonconvex-nonconcave minimax optimization problems.

14:21
HyperAgent: A Simple, Efficient and Scalable Reinforcement Learning Framework in Complex Environment

ABSTRACT. Efficiency and scalability in complex environments remain pressing challenges in reinforcement learning (RL). Traditionally, Thompson Sampling has offered a principled approach to deep exploration in RL but falls short in terms of computational feasibility in complex environments due to its reliance on maintaining posterior distributions. The posterior is only tractable in simple environments with restrictive assumptions. A crucial aspect in modern RL is thus to develop an algorithm that is both data and computationally efficient, and scalable to complex environments. HyperAgent, our proposed framework, addresses these challenges head-on. It can be regarded as approximate Thompson Sampling, employing a hypermodel for efficient uncertainty estimation in the action-value function and thus enabling deep exploration in complex environments. HyperAgent also incorporates an incremental update mechanism, allowing for computationally efficient model updates, essential for compatibility with the demands of modern deep RL and scalability to complex and high-dimensional environments. Empirically, HyperAgent demonstrates its robust performance. It efficiently solves the DeepSea benchmark, illustrating its capability for deep exploration and scalability to large state spaces. Additionally, in the Atari benchmark, HyperAgent achieves super-human performance with just 2M interactions, highlighting its computational efficiency and scalability to high-dimensional environments. Theoretically, HyperAgent undergoes a rigorous regret performance analysis, affirming its computational and data efficiency. This positions HyperAgent as the first provable RL algorithm with practical efficiency in large-scale complex environments. This work bridges the theoretical and practical realms of RL, establishing a new benchmark for efficient and scalable RL algorithms.

13:15-14:45 Session SatB 2
13:15
Data-Driven Structured Reinforcement Learning: Foundations and Practice

ABSTRACT. Next-generation (NextG) networked systems enabling applications ranging from 5G/6G and autonomous driving to augmented/virtual reality will be data-driven and large-scale. This poses enormous burdens on traditional systems such as wireless communication networks, and on emerging architectures such as edge and cloud computing systems, in which classic algorithmic guarantees do not always hold and rigorous performance evaluation can be difficult. Though various data-driven techniques have recently been applied to a variety of networking and distributed systems, purely data-driven solutions face their own challenges regarding the curse of dimensionality, expressiveness, and generalization. In this talk, I will present a data-driven structured reinforcement learning framework which overcomes these challenges by judiciously combining model-based and model-free learning techniques. This novel framework exploits the inherent structure encoded in classical models, and enables the design of novel reinforcement learning architectures with strong theoretical performance guarantees (e.g., order-optimal regret) and low computational complexity, and hence can be easily implemented in practice. Along the way, I will discuss my vision for NextG networked systems that expand the scope of structured learning so as to enable tractable system design.

13:37
Hindsight Learning for MDPs with Exogenous Inputs

ABSTRACT. Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes are exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). Our HL algorithms achieve data efficiency by leveraging a key insight: having samples of the exogenous variables, past decisions can be revisited in hindsight to infer counterfactual consequences that can accelerate policy improvements. We test our algorithms in a cloud resource management problem -- allocating Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider. We find that HL algorithms outperform domain-specific heuristics, as well as state-of-the-art reinforcement learning methods.

13:59
Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health

ABSTRACT. This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features. However, prior works often learn the model by maximizing the predictive accuracy instead of final RMAB solution quality, causing a mismatch between training and evaluation objectives. To address this shortcoming, this paper proposes a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the RMAB solution quality. We present three key contributions: (i) we achieve decision-focused learning by establishing the differentiability of the approximate solution Whittle index policy; (ii) we significantly improve the scalability to real-world problem sizes; (iii) we apply our algorithm to a previously collected dataset of maternal and child health to demonstrate its performance. Our algorithm is the first for decision-focused learning in RMAB that scales to real-world problem sizes, and it has been deployed, with more than 280,000 beneficiaries having benefited from it.

14:21
Handling Action Space Constraints in Deep Reinforcement Learning

ABSTRACT. Many important real-world problems can be formulated as sequential decision-making problems that are then solved using reinforcement learning. One such class is resource allocation problems, sometimes in an adversarial setting. Often, resource allocation needs to satisfy constraints that arise in the real world. In this talk, I will present two papers that address the issue of constraints on the action space in a reinforcement learning setting. The first work tackles linear constraints arising from the presence of an adversary by utilizing a novel pseudo-projection technique, resulting in a fast and practical solution. This work has applications in security resource allocation for the screening of threats, e.g., screening at airports. The second work tackles arbitrary constraints that cannot be written compactly in mathematical form by using a novel framework in which a conditional generative model serves as the policy network, which further provides the advantage of increased expressivity of stochastic policies. This work has applications in emergency service resource allocation, e.g., the allocation of ambulances.

13:15-14:45 Session SatB 3
13:15
An interesting generalization of the Koopmans-Beckmann quadratic assignment problem

ABSTRACT. In this paper, we study a variation of the well-known Koopmans-Beckmann quadratic assignment problem called the independent quadratic assignment problem. Motivating applications of the model are indicated. The problem is strongly NP-hard and is also hard to approximate. Some polynomially solvable special cases are identified, along with a complete characterization of linearizable instances of the problem, the validity of which can be tested efficiently. We then develop innovative and simple heuristic algorithms to solve the problem based on the paradigms of local search, very large-scale neighborhood (VLSN) search, and local search with path-relinking. The performance of the algorithms is analyzed using results generated by extensive computational experiments. Our experimental study highlights that the strategies of path-relinking and VLSN search, when used in unison, produce superior outcomes, outperforming algorithms that employ these strategies individually. This underscores the power of hybrid algorithmic approaches in solving complex optimization problems, in particular the independent quadratic assignment problem.
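
For reference, the classical Koopmans-Beckmann objective that this variation builds on can be written, in generic notation, as

$\min_{\pi \in S_n} \; \sum_{i=1}^{n}\sum_{j=1}^{n} f_{ij}\, d_{\pi(i)\pi(j)} \;+\; \sum_{i=1}^{n} c_{i\pi(i)},$

where $f_{ij}$ are flows between facilities, $d_{k\ell}$ are distances between locations, and the permutation $\pi$ assigns facilities to locations; the independent quadratic assignment problem studied in this talk is a variation of this model.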

13:37
Quadratic Combinatorial Optimization: A Polyhedral Characterization of Linearizable Instances

ABSTRACT. We introduce a polyhedral framework for characterizing instances of 0-1 quadratic programs that are linearizable, meaning that the quadratic objective can be equivalently rewritten as a linear one in a manner that preserves the objective function value at all feasible binary solutions. Of particular interest to our study are problems that are challenging when the objective function is quadratic but readily solvable in the linear case, such as the quadratic assignment problem (QAP) and the quadratic minimum spanning tree problem (QMSTP). Our main result is that an instance is linearizable if and only if the quadratic cost coefficients can be used to create a linear equation in a lifted variable space that is valid for the affine hull of a specially structured discrete set. In this talk, we will discuss how viewing the concept of linearizability through this new polyhedral lens has interesting implications for both the QAP and the QMSTP.
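
In symbols, and in generic notation rather than the authors' own, an instance with feasible set $F \subseteq \{0,1\}^n$, quadratic costs $Q$, and linear costs $c$ is linearizable when there exists a cost vector $\ell \in \mathbb{R}^n$ such that

$x^\top Q x + c^\top x \;=\; \ell^\top x \quad \text{for all } x \in F.$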

13:59
Simple Randomized Rounding for Max-Min Eigenvalue Augmentation

ABSTRACT. We consider the max-min eigenvalue augmentation problem: given an $n \times n$ symmetric positive semidefinite matrix $M$, vectors $a_1, \dots, a_m \in \mathbb{R}^n$, and a positive integer $k \le m$, the goal is to choose a subset $I \subseteq \{1, \dots, m\}$ of cardinality at most $k$ to maximize the minimum eigenvalue of the matrix $M + \sum_{i \in I} a_i a_i^\top$. The problem captures both the maximum algebraic connectivity augmentation and E-optimal design problems. There are two distinct research streams dedicated to these problems that develop approximation algorithms based on rounding optimal semidefinite programming relaxation solutions. Interestingly, both research streams empirically evaluate the performance of “simple” randomized rounding methods but do not establish theoretical approximation guarantees for them. In this paper we establish approximation guarantees for a simple randomized rounding method applied to the general max-min eigenvalue augmentation problem. Furthermore, we investigate its empirical performance through computational experiments.
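
One natural form of the relaxation-plus-rounding pipeline the abstract refers to (the authors' exact scheme may differ) is to solve the semidefinite relaxation

$\max_{t,\,x} \; t \quad \text{s.t.} \quad M + \sum_{i=1}^{m} x_i\, a_i a_i^\top \succeq t I, \quad \sum_{i=1}^{m} x_i \le k, \quad 0 \le x_i \le 1,$

and then form $I$ by a simple randomized rounding of the optimal fractional solution $x^*$, for instance including index $i$ independently with probability $x_i^*$.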

14:21
Tightening Linear Reformulations of 0-1 Cubic Knapsack Problems with Variable Reordering

ABSTRACT. The 0-1 cubic knapsack problem (CKP) is a generalization of the classical 0-1 quadratic knapsack problem. An effective exact solution strategy for the CKP is to reformulate the nonlinear problem into an equivalent linear form that can then be solved using a standard mixed-integer programming solver. We consider a classical linearization method and propose a variant of a more recent technique for linearizing 0-1 cubic programs applied to the CKP. Using a variable reordering strategy, we show how to improve the strength of the linear programming relaxations and reduce the overall solution times. In addition, we develop a simple heuristic method for obtaining good-quality solutions that can be used to provide a warm start to the solver. Computational tests demonstrate the effectiveness of both our heuristic method and variable reordering strategy.

13:15-14:45 Session SatB 4
13:15
A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks

ABSTRACT. Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer followed by ridge regression on the second layer can lead to feature learning, characterized by the appearance of a separated rank-one component -- a spike -- in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function, and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that, in the limit of large dimension and large sample size, the training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the loss, we demonstrate that these non-linear features can enhance learning.

This talk is based on our recent work (https://arxiv.org/abs/2310.07891) with Donghwan Lee, Hamed Hassani, and Edgar Dobriban.

13:37
Towards a First-Order Algorithmic Framework for Wasserstein Distributionally Robust Optimization

ABSTRACT. Due to its ability to offer interpretations of robustness in various learning models, Wasserstein distance-based distributionally robust optimization (DRO) has gained much attention in both optimization and machine learning communities. Although many DRO problems arising in the learning context admit convex reformulations, there is currently a lack of fast iterative methods for solving them. This severely limits the applicability of the DRO approach in machine learning. In this talk, we present a first-order algorithmic framework for solving a class of structured Wasserstein distributionally robust risk minimization problems and discuss its convergence guarantees.

This talk is based on joint work with Caihua Chen and Jiajin Li.

13:59
How do neural nets learn features of "simple" functions?

ABSTRACT. One of the key mysteries in modern learning is that a variety of models, such as deep neural networks, when trained via (stochastic) gradient descent can extract useful features and learn high-quality representations directly from data simultaneously with fitting the labels. This feature learning capability is also at the forefront of the recent success of a variety of contemporary paradigms such as end-to-end training for image reconstruction, self-supervised learning, and transfer learning. Despite a flurry of exciting activity over the past few years, existing theoretical results are often too crude and/or pessimistic to explain feature/representation learning in practical regimes of operation or to serve as a guiding principle for practitioners. Indeed, the existing literature often requires unrealistic hyperparameter choices (e.g., very small step sizes, large initialization, or wide models). Surprisingly, this difficulty persists even for learning very "simple" function classes. In this talk I will focus on demystifying this feature/representation learning phenomenon for "simple" function classes that have remained unresolved despite nearly a decade of rigorous analysis of nonconvex learning.

14:21
Can Learning Be Explained By Local Optimality In Low-rank Matrix Recovery?

ABSTRACT. We explore the local landscape of low-rank matrix recovery, aiming to reconstruct a low-rank matrix from a limited number of linear measurements, some potentially noisy. When the true rank is unknown, overestimation is common, yielding an over-parameterized model. Recent findings suggest that first-order methods with the robust L1-loss can recover the true low-rank solution even when the rank is overestimated and measurements are noisy, implying that true solutions might emerge as local or global minima. Our study challenges this notion, demonstrating that, under mild conditions, true solutions manifest as strict saddle points.

13:15-14:45 Session SatB 5
13:15
Facial Reduction for Semidefinite Relaxations of Combinatorial Optimization Problems

ABSTRACT. We propose an efficient implementation of facial reduction (FR) algorithms for SDP relaxations of combinatorial optimization (CO) problems. Our approach is to leverage the combinatorial and polyhedral structures behind a CO problem and its SDP relaxations. The key idea is that the feasible set of the semidefinite program is a relaxation of the lifted feasible set of the associated CO problem. Thus, it is possible to connect certain geometric properties of these feasible sets. Our strategy is to derive these relevant properties directly from the underlying CO problem. As a result, we can facially reduce its semidefinite relaxation more efficiently and effectively.

13:37
O-minimal structures and central paths in nonlinear optimization

ABSTRACT. Using results from singularity theory and real and o-minimal geometry, we prove quantitative results for the existence of central paths, their convergence, and their differentiability at the limit point for general nonlinear optimization problems, where the objective function and constraints are definable functions in an o-minimal structure.

13:59
A Peaceman-Rachford Splitting Method for the Protein Side-Chain Positioning Problem

ABSTRACT. This paper considers the NP-hard protein side-chain positioning problem, SCP, an important final task of protein structure prediction. We formulate the SCP as an integer quadratic program, IQP, and derive its doubly nonnegative, DNN, (convex) relaxation. Strict feasibility fails for this DNN relaxation. We apply facial reduction, FR, to regularize the problem. This gives rise to a natural splitting of the variables. We then use a variation of the Peaceman-Rachford splitting method, PRSM, to solve the DNN relaxation. The resulting relaxation and rounding procedures provide strong approximate solutions. Empirical evidence shows that almost all our instances of this NP-hard SCP problem, taken from the Protein Data Bank, are solved to provable optimality. Our large problems correspond to solving a DNN relaxation with 2,883,601 binary variables to provable optimality.
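
For orientation, a doubly nonnegative relaxation of a binary IQP $\min\{x^\top Q x : Ax = b,\ x \in \{0,1\}^n\}$ typically takes the following generic shape (the SCP-specific constraints and the facial reduction step are omitted here):

$\min_{x,\,Y}\ \langle Q, Y\rangle \quad \text{s.t.}\quad \begin{pmatrix} 1 & x^\top \\ x & Y\end{pmatrix} \succeq 0,\quad Y \ge 0,\quad \operatorname{diag}(Y) = x,\quad Ax = b.$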

13:15-14:45 Session SatB 6
13:15
Performance Benchmarks of Quantum Annealing Hardware

ABSTRACT. Over the past decade, the usefulness of quantum computers for combinatorial optimization has been the subject of much debate. Thus far, experimental benchmarking studies have indicated that quantum computing hardware does not provide an irrefutable performance gain over state-of-the-art optimization methods. However, as this hardware continues to evolve, each new iteration brings improved performance and warrants further benchmarking. To that end, this work conducts an optimization performance assessment of D-Wave Systems’ most recent "Advantage Performance Update" computer, which can natively solve sparse unconstrained quadratic optimization problems with over 5,000 binary decision variables and 40,000 quadratic terms. This work demonstrates that classes of contrived optimization problems exist where this quantum annealer can provide run time benefits over a collection of established classical solution methods that represent the current state-of-the-art for benchmarking quantum annealing hardware. Although this work does not present strong evidence of an irrefutable performance benefit for this emerging optimization technology, it does exhibit encouraging progress, signaling the potential impacts on practical optimization tasks in the future.

13:37
Federated Hierarchical Tensor Networks: a Collaborative Quantum AI-Driven Framework for Healthcare

ABSTRACT. Due to the decentralized and diverse nature of healthcare units, real-time data collection is challenging because of geographical spread, diversity of systems, and resource disparities. Furthermore, the healthcare industry frequently handles sensitive and proprietary data and, due to strict privacy regulations, is often reluctant to share it directly. In today's context, Federated Learning (FL) stands out as a crucial remedy, facilitating the rapid advancement of distributed machine learning while effectively managing critical concerns regarding data privacy and governance. Its intersection with Quantum Machine Learning (QML) represents a groundbreaking interdisciplinary approach with the potential to revolutionize various industries, ranging from healthcare to finance. Notably, the resemblance between the methods used in quantum many-body physics (Tensor Networks, TNs) and in machine learning has drawn considerable attention. In this work, we propose a quantum federated learning (QFL) framework consisting of the following quantum circuits with hierarchical structures: Matrix Product State (MPS), Tree Tensor Network (TTN), and multi-scale entanglement renormalization ansatz (MERA). We systematically evaluate the performance of these quantum tensor networks (QTNs) in various medical applications by employing chest radiographs from the RSNA dataset and MRI brain scans from the Alzheimer's ADNI datasets, which are non-independently and non-identically distributed among healthcare institutions/clients. Experimental results demonstrate that a quantum federated global model, comprising highly entangled TTN and MERA structures, achieved higher testing accuracy, surpassing the performance of locally trained clients under unbalanced data distributions. To the best of our knowledge, this is the first study to investigate the effectiveness and feasibility of a framework that integrates QTNs and differential privacy mechanisms to ensure sensitive data remains secure across healthcare institutions.

13:59
Optimization Applications as Quantum Performance Benchmarks

ABSTRACT. Combinatorial optimization is anticipated to be one of the primary use cases for quantum computation in the coming years. Quantum Approximate Optimization Algorithm (QAOA) and Quantum Annealing (QA) are expected to demonstrate significant run-time performance benefits over classical solutions. We review existing methods for characterizing the performance of classical optimization algorithms and use these to compare solution quality obtained by solving Max-Cut problems using a quantum annealing device and gate-model quantum processors and simulators. This is used to guide the development of an advanced benchmarking framework for quantum computers designed to evaluate the trade-off between run-time execution performance and the quality of solution for iterative hybrid quantum-classical applications. Of particular relevance is how the nature of the problem input configuration impacts the solution quality, the length of time required to achieve a solution, and the algorithm limitations uncovered by the benchmarking approach. The framework is an enhancement to the existing open-source QED-C Application-Oriented Benchmark suite. The suite can be executed on various quantum simulators, quantum hardware backends, and a quantum annealing device.

14:21
Benchmarking the Operation of Quantum Heuristics and Ising Machines: Scoring Parameter Setting Strategies on Optimization Applications

ABSTRACT. We discuss guidelines for the performance evaluation of parameterized stochastic solvers of optimization problems, with particular attention to systems that employ novel hardware, such as digital quantum processors running variational algorithms, analog processors performing quantum annealing, or coherent Ising Machines. We illustrate through an example a benchmarking procedure grounded in the statistical analysis of the expectation of a given performance metric measured in a test environment. In particular, we discuss the necessity and the cost of setting parameters that affect the performance of the algorithm, whose optimal values could vary significantly between instances of the same target problem. We present an open-source software package that facilitates the design, evaluation, and visualization of practical parameter tuning strategies for complex uses of the heterogeneous components of the solver. We examine in detail an example using parallel tempering as well as a simulator of a photonic Coherent Ising Machine, and display the scoring of an illustrative baseline family of parameter-setting strategies that features an exploration-exploitation trade-off.

13:15-14:45 Session SatB 7
13:15
MSPLib and MSPFormat: A Library Of Problems and a New Standardized Data Format For Benchmarking Stochastic Dual Dynamic Programming

ABSTRACT. In this paper, we introduce two essential contributions: MSPLib, a comprehensive library of multistage stochastic programming problems, and MSPFormat, a new standardized data structure format for multistage stochastic programs. The main objective of MSPLib is to facilitate the evaluation and comparison of computational performance among different implementations of stochastic dual dynamic programming (SDDP). The library encompasses a diverse range of instances, including real-world problems and synthetic variations with varying levels of complexity. By incorporating MSPFormat, we provide a unified and consistent representation of multistage stochastic programs. We also test prevailing implementations of SDDP - including QUASAR, SDDP.jl, and MSPPy.

13:37
Self-Adapting Network Relaxations for Weakly Coupled Markov Decision Processes

ABSTRACT. High-dimensional weakly coupled Markov decision processes (WDPs) arise in dynamic decision-making and reinforcement learning, decomposing into smaller MDPs when linking constraints are relaxed. The Lagrangian relaxation of WDPs (LAG) exploits this property to compute policies and (optimistic) bounds efficiently; however, dualizing linking constraints averages away combinatorial information. We introduce feasibility network relaxations (FNRs), a new class of linear programming relaxations that exactly represent the linking constraints. We develop a procedure to obtain the unique minimally sized relaxation, which we refer to as self-adapting FNR, as its size automatically adjusts to the structure of the linking constraints. Our analysis informs model selection: (i) the self-adapting FNR provides (weakly) stronger bounds than LAG, is polynomially sized when linking constraints admit a tractable network representation, and can even be smaller than LAG, and (ii) self-adapting FNR provides bounds and policies that match the classical approximate linear programming (ALP) approach but is substantially smaller in size than the ALP formulation. We perform numerical experiments on constrained dynamic assortment/restless multi-armed bandits and preemptive maintenance applications. Our results show that self-adapting FNR significantly improves upon LAG in terms of policy performance and/or bounds, while being an order of magnitude faster than an alternative Lagrangian and ALP, which are unsolvable in several instances.

13:59
A Data-Driven Multistage Stochastic-Robust Optimization Approach with Temporally Correlated Uncertainty

ABSTRACT. Sequential decision-making is a procedure where decisions are made adaptively based on progressively observed data through multistage optimization. It has gained popularity in recent years and has various applications, such as power systems scheduling, production-inventory planning, financial optimization, and transportation. The main challenge in multistage optimization is how to make future decisions based on historical data. Existing techniques fail to capture the temporal correlation among historical and future time series data accurately. This leads to imprecise estimation of future uncertain parameters and unreliable or overly conservative decisions. To address this challenge, a data-driven multistage hybrid stochastic-robust optimization approach for sequential decision-making with temporally correlated uncertainty is proposed. Our approach makes three contributions: 1) A data-driven dynamic uncertainty set is constructed to model the discrete-time autoregressive stochastic process. The statistical properties of the stochastic process model are analytically derived, and an uncertainty set that captures prediction errors resulting from misspecification of regression coefficients is proposed. 2) A multistage stochastic-robust optimization model for sequential decision-making with partially observed uncertainty is developed. This model is nested in a rolling-horizon decision-making framework that allows decision-makers to dynamically update the uncertainty modeling as more information becomes available, and decisions can be adaptively made with the partially observed data. A tailored column generation algorithm is designed to globally solve the proposed stochastic-robust optimization model with an implicit robust feasible region restricted by an infinite number of constraints. The proposed data-driven hybrid stochastic-robust optimization approach is tested in economic dispatch problems of power systems and compared with conventional multistage stochastic and robust optimization approaches. Our computational results show that our approach produces decision policies with relatively low frequency of constraint violations without significantly impacting economic and environmental benefits.

14:21
Nonlinear Decision Rules Made Scalable by Nonparametric Liftings

ABSTRACT. Sequential decision-making often requires dynamic policies, which are not computationally tractable in general. Decision rules provide approximate solutions by restricting decisions to simple functions of uncertainties. In this paper, we consider a nonparametric lifting framework where the uncertainty space is lifted to higher dimensions to obtain nonlinear decision rules. Current lifting-based approaches require pre-determined functions and are parametric. We propose two nonparametric liftings, which derive the nonlinear functions by leveraging the uncertainty set structure and problem coefficients. Both methods integrate the benefits of lifting and nonparametric approaches, and hence provide scalable decision rules with performance bounds. More specifically, the set-driven lifting is constructed by finding polyhedrons within uncertainty sets, inducing piecewise-linear decision rules with performance bounds. The dynamics-driven lifting, on the other hand, is constructed by extracting geometric information and accounting for problem coefficients. This is achieved by using linear decision rules of the original problem, which also enables us to quantify lower bounds on the objective improvement over linear decision rules. Numerical comparisons with competing methods demonstrate superior computational scalability and comparable performance in objectives. These observations are magnified in multistage problems with extended time horizons, suggesting practical applicability of the proposed nonparametric liftings in large-scale dynamic robust optimization.

13:15-14:45 Session SatB 8
13:15
Exact Algorithm for Maximum Feasible Subsystem Problems with Uncertain Input Data

ABSTRACT. The Maximum Feasible Subsystem Problem (MSFP) involves finding a largest subsystem of a given infeasible system of linear equations such that this subsystem has a feasible solution. This combinatorial optimization problem has a wide range of applications in domains such as machine learning, protein folding, and chemical engineering. We introduce the stochastic MSFP, which has an infeasible system of linear equations with uncertain input data parameters, and present an exact algorithm for solving this problem. We conduct extensive computational experiments to evaluate the performance of this approach and observe that it is effective and efficient. Furthermore, for the deterministic MSFP, this approach is computationally much faster than the various known exact algorithms for the MSFP.

13:37
Chance-Constrained Fractional Programming under Ambiguity -- Distributionally Robust Maximization of Ratio Exceedance Probability

ABSTRACT. We study distributionally robust chance-constrained fractional programs that maximize the exceedance probability of a ratio. We consider statistical distance-based (phi-divergence, Wasserstein) and moment-based ambiguity sets and several ratio functional forms. We devise reformulation methods providing convex reformulations and present some computational results.

13:59
Proximal OA method for solving nonconvex MIQCQP

ABSTRACT. In this paper, we introduce a new proximal outer approximation method (PrOA) for solving mixed-integer quadratically constrained quadratic programs. The key elements of the method are: i) it extends the outer approximation approach to models involving nonlinear/quadratic objectives, ii) the nonlinear linking constraints can include both inequalities and equalities, and iii) integer-feasible points are generated by solving an MIQP using KKT conditions with a reformulated objective. We establish the convergence properties of the proposed algorithm. Next, we illustrate its execution on a quadratically constrained variant of the quadratic assignment problem.

14:21
Risk-Adaptive Local Decision Rules

ABSTRACT. For parameterized mixed-binary optimization problems, we construct local decision rules that prescribe near-optimal courses of action across a set of parameter values. The decision rules stem from solving risk-adaptive training problems over classes of continuous, possibly nonlinear mappings. In asymptotic and nonasymptotic analysis, we establish that the decision rules prescribe near-optimal decisions locally for the actual problems, without relying on linearity, convexity, or smoothness. The development also accounts for practically important aspects such as inexact function evaluations, solution tolerances in training problems, regularization, and reformulations to solver-friendly models. The decision rules also furnish a means to carry out sensitivity and stability analysis for broad classes of parameterized optimization problems. We develop a decomposition algorithm for solving the resulting training problems and demonstrate its ability to generate quality decision rules on a nonlinear binary optimization model from search theory.

13:15-14:45 Session SatB 9
13:15
Applications of 0-1 Neural Networks in Prescription and Prediction

ABSTRACT. A key challenge in medical decision making is learning treatment policies for patients with limited data. This challenge is particularly evident in personalized healthcare decision-making, where models need to take into account the intricate, potentially causal, relationships between patient characteristics, treatment options, and health outcomes. To address this, we introduce prescriptive networks, shallow 0-1 neural networks trained with mixed-integer programming that can be used with counterfactual estimation to derive optimal treatment policies in medium data settings. These models offer greater interpretability than deep neural networks and can encode more complex policies than common models such as decision trees. We show that our models can outperform existing methods in both synthetic data experiments and in a real-world case study of assigning treatments for postpartum hypertension using limited observational data.

13:37
Adaptive Optimization and Safe Reinforcement for Heparin Dosing

ABSTRACT. A key challenge in sequential decision making is optimizing systems safely under partial information. While much of the literature has focused on the cases of either partially known states or partially known dynamics, the challenge is further exacerbated in cases where both states and dynamics are partially known. Computing heparin doses for patients fits this paradigm, since the concentration of heparin in the patient cannot be measured directly and the rates at which patients metabolize heparin vary greatly between individuals. While many proposed solutions are model free, they require complex models and have difficulty ensuring safety. However, if some of the structure of the dynamics is known, a model-based approach can be leveraged to provide safe policies. In this paper we propose such a framework to address the challenge of optimizing personalized heparin doses. We use a predictive model, parameterized individually for each patient, to predict future therapeutic effects. We then leverage this model using a scenario-generation-based approach that is capable of ensuring patient safety. We validate our models with numerical experiments, comparing the predictive capabilities of our model against existing machine learning techniques and demonstrating how our dosing algorithm can treat patients in a simulated ICU environment.

13:59
Improving Patient Access via Primary Care Optimization

ABSTRACT. We propose a joint predictive-prescriptive framework for primary care transformation. Primary care departments traditionally take a “one size fits all” approach to panel assignment and care delivery, which can lead to over-burdened providers and long delays for booking appointments. However, the heterogeneity of patients within a panel, and the emergence of more flexible care models, provide an opportunity to transform department operations. We develop models to predict individual patient demand and an optimization framework for panel assignment. Our model accommodates multiple provider levels (e.g., MDs and APRNs) and visit modalities (in person vs. virtual), providing a tool to evaluate the tradeoffs between current practice and alternative models. The model is currently in use to inform department care delivery strategies across a large CT health system.

14:21
Quantifying Childbirth Care Inequity for Socioeconomically Disadvantaged Groups

ABSTRACT. Obstetrical and neonatal health care disparities among socioeconomically disadvantaged groups, including but not limited to people of color, immigrants, and people of low educational and/or economic status, are persistent, prevalent, and complex. Although many epidemiology studies show inferior obstetrical and neonatal healthcare services received by these groups, most studies on socioeconomic disparity in childbirth care focus on one or two episodes of maternal and/or neonatal care services. We quantify inequity in quality childbirth health care for socioeconomically disadvantaged groups, covering the prenatal, perinatal, and newborn care periods. We aim to promote equitable access for socioeconomically disadvantaged women and their newborns to high-quality childbirth health services by developing a comprehensive quality score for childbirth care. Driven by the actual dataset, we develop multilevel mixed-format item response theory (MMIRT) models with a skewed latent trait (i.e., pregnant women's health). We use nonparametric analysis for feature selection, assumption validation, and identification of the best-fit model. We further develop a parametric method for efficient MMIRT model fitting of large-size, mixed-type data.

13:15-14:45 Session SatB10
13:15
Surface Mount Technology (SMT) Production Line Scheduling by Computer Simulation and Genetic Algorithm

ABSTRACT. A typical SMT production line consists of a solder paste printing station, component placement machines, a reflow oven, and an inspection station. Considering the complexity of the system, this study adopts a simulation methodology, instead of mathematical programming, to model the SMT operations. We use the SimPy package of Python to build the simulation model. Although the simulation model provides a good representation of the SMT system, it cannot by itself search for a good schedule of orders. Thus, a genetic algorithm is embedded with the simulation module to find schedules that make the completion time as small as possible. The genetic algorithm interacts with the simulation module iteratively, exchanging newly generated order sequences for the completion times that the simulation computes for each sequence (a minimal illustration of this loop is sketched below). The proposed approach is evaluated with real cases from a local printed circuit board (PCB) company. Experiments are performed with different problem sizes and degrees of similarity among orders. A small problem instance is also tested to assess the proposed approach's capability of finding optimal solutions when the number of orders is small. The experimental results show the feasibility of our approach in real applications.
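
A minimal sketch of the simulation-optimization loop described above, assuming hypothetical station times, order workloads, and a simple swap-mutation GA (this is not the authors' implementation; only the SimPy mechanics and the GA/simulation hand-off are illustrated):

import random
import simpy

STATIONS = ["printing", "placement", "reflow", "inspection"]
BASE_TIME = {"printing": 2, "placement": 5, "reflow": 3, "inspection": 1}  # hypothetical minutes per unit
ORDER_SIZE = [1, 3, 2, 5, 4, 2, 1, 3]                                      # hypothetical order workloads

def simulate_makespan(sequence):
    # Run one pass of the SMT line for a given order sequence and return the completion time.
    env = simpy.Environment()
    machines = {s: simpy.Resource(env, capacity=1) for s in STATIONS}
    finish_times = []

    def job(order):
        for s in STATIONS:  # each order visits the stations in line order
            with machines[s].request() as req:
                yield req
                yield env.timeout(BASE_TIME[s] * ORDER_SIZE[order])
        finish_times.append(env.now)

    for order in sequence:  # orders are released in the GA-proposed sequence
        env.process(job(order))
    env.run()
    return max(finish_times)

def genetic_search(pop_size=20, generations=50):
    n = len(ORDER_SIZE)
    population = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=simulate_makespan)  # the simulation module supplies the fitness values
        parents = population[: pop_size // 2]
        children = []
        for p in parents:  # swap mutation as a stand-in for the paper's GA operators
            child = p[:]
            i, j = random.sample(range(n), 2)
            child[i], child[j] = child[j], child[i]
            children.append(child)
        population = parents + children
    return min(population, key=simulate_makespan)

A real implementation would add crossover, station-specific setup times, and the order-similarity effects mentioned in the abstract.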

13:37
A Novel Population-Based Optimization Approach: The GAGAN Algorithm for the Traveling Salesman Problem

ABSTRACT. In this paper, a novel population-based optimization approach, the GAGAN Algorithm (GA), is proposed. GAGAN means "sky" in Sanskrit. The algorithm strives to reach the optimal path by exploring a wide search space, as in the sky; hence the name. Stars form in the sky from random fluctuations of density within a cloud, or due to outside influences. The stars are always moving, exploring the huge area of the sky by following their local paths. The formation and behavior of stars thus motivate the GAGAN Algorithm, in which the population is randomly generated and then follows a local search strategy (e.g., 2-opt), as sketched below. In this study, the Traveling Salesman Problem (TSP) is considered. Most algorithms need to be modified or hybridized to tackle the TSP, as they are not able to solve it directly; the GAGAN Algorithm, however, can solve the TSP directly. Finally, to evaluate the performance of the GAGAN Algorithm, it has been tested on 21 benchmark datasets taken from TSPLIB and compared with other competitive algorithms. Simulation outcomes reveal that the proposed GA is superior in terms of accuracy, convergence speed, stability, and robustness.
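
For reference, a generic 2-opt improvement pass of the kind referenced above looks as follows (illustrative only; the GAGAN-specific population dynamics are not reproduced here):

def tour_length(tour, dist):
    # Total length of a closed tour under a distance matrix dist.
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, dist):
    # Repeatedly reverse tour segments while doing so shortens the tour.
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(candidate, dist) < tour_length(tour, dist):
                    tour, improved = candidate, True
    return tour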

13:59
Worm Optimization Algorithm to Solve the Transportation Problem of a Cross-Docking Network

ABSTRACT. This research addresses the transportation problem of a cross-docking network, where loads are transferred from origins to destinations through cross-docking facilities without being stored in a distribution center. The transportation cost is minimized by loading trucks at the supplier locations and then routing them either directly to the customers or indirectly to cross-docking facilities so the loads can be consolidated. The transportation problem of a cross-docking network is a pure-integer programming model containing both non-negative integer variables and binary variables, and it is a well-known NP-hard problem. Subsequently, a Worm Optimization (WO) algorithm, inspired by the foraging behaviors of Caenorhabditis elegans (worms), is developed for this complex problem. Despite having a limited number of neurons, worms achieve several intricate behaviors, including finding better-quality food, avoiding toxins, switching between solitary and social foraging styles, alternating between "dwelling" (food exploiting) and "roaming" (food seeking), and entering a type of stasis stage. Consequently, WO mimics the worms' behaviors in order to find optimal solutions in less computational time. WO's parameters were optimized using a custom design of experiments, and its performance was assessed by comparing it with ant colony optimization and branch-and-bound.

13:15-14:45 Session SatB11
13:15
Tight Bounds for The Price of Fairness

ABSTRACT. A central decision maker (CDM), who seeks an efficient allocation of scarce resources among a finite number of players, often has to incorporate fairness criteria to avoid unfair outcomes. Indeed, the Price of Fairness (POF), a term coined in Bertsimas et al. (2011), refers to the efficiency loss due to the incorporation of fairness criteria into the allocation method. Quantifying the POF helps the CDM strike an appropriate balance between efficiency and fairness. In this paper we improve upon existing results in the literature by providing tight bounds on the POF for the proportional fairness criterion for any number of players n, both when the maximum achievable utilities of the players are equal and when they are not. Further, while Bertsimas et al. (2011) have already derived a tight bound for the max-min fairness criterion for the case that all players have equal maximum achievable utilities, we also provide a tight bound in scenarios where these utilities are not equal. Finally, we investigate the sensitivity of our bounds and the bounds of Bertsimas et al. (2011) for the POF to the variability of the maximum achievable utilities.
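
As a reminder of the quantity being bounded, and following the definition in Bertsimas et al. (2011) in generic notation, for a utility set $U$ the price of fairness is

$\mathrm{POF}(U) \;=\; \frac{\mathrm{SUM}(U) - \mathrm{FAIR}(U)}{\mathrm{SUM}(U)},$

where $\mathrm{SUM}(U)$ is the maximum achievable sum of the players' utilities and $\mathrm{FAIR}(U)$ is the total utility of the allocation selected under the fairness criterion (proportional or max-min fairness in this paper).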

13:37
Achieving Group Fairness through Social Welfare Optimization

ABSTRACT. Statistical parity metrics (demographic parity, equalized odds, predictive rate parity, etc.) are widely used in the AI community as a means of assessing fairness. However, there are no clear criteria for selecting the appropriate parity metric, identifying relevant protected groups, or reconciling conflicts of interest among groups. We explore whether a broader conception of social justice, based on optimizing well-known social welfare functions (SWFs), can address these issues. Unlike parity metrics, SWFs take into account the actual welfare consequences for the individuals involved. We determine the extent to which maximizing social welfare in the population as a whole achieves parity of various kinds for any given pair of subgroups. Our results suggest that optimization theory can shed light on the intensely discussed question of how to achieve group fairness in AI.

13:59
Districting with Overlapping Service Regions

ABSTRACT. Districting is a crucial optimization problem for modern service systems, such as transportation, healthcare, and emergency response. The goal is to partition a geographical region into districts to provide efficient and effective services to the population. However, traditional non-overlapping districting schemes may suffer from inefficient service delivery when the demand is not evenly distributed across the district. In this talk, we highlight the importance of overlapping districting, which allows for a more flexible and responsive allocation of resources based on the actual service demand. We explore heuristic algorithms and Bayesian optimization as solutions to the overlapping districting challenge, considering real-world limitations. Our goal is to increase understanding of overlapping districting and inspire new research in this field.

14:21
Dynamic Matching with Post-allocation Service and its Application to Refugee Resettlement

ABSTRACT. Motivated by our collaboration with a major refugee resettlement agency in the U.S., we study a dynamic matching problem where each new arrival (a refugee case) must be matched immediately and irrevocably to one of the static resources (a location with a fixed annual quota). In addition to consuming the static resource, each case requires post-allocation services from a server, such as a translator. Given the uncertainty in service time, a server may not be available at a given time; thus we refer to it as a dynamic resource. Upon matching, the case will wait for service in a first-come-first-served manner. Bursty matching to a location may result in undesirable congestion at its corresponding server. Consequently, the central planner (the agency) faces a dynamic matching problem with an objective that combines the matching reward (captured by pair-specific employment outcomes) with the cost of congestion for dynamic resources and of over-allocation for the static ones. Motivated by the observed fluctuations in the composition of refugee pools across the years, we aim to design algorithms that do not rely on distributional knowledge. We develop learning-based algorithms that are asymptotically optimal in certain regimes, easy to interpret, and computationally fast. Our design is based on learning the dual variables of the underlying optimization problem; however, the main challenge lies in the time-varying nature of the dual variables associated with dynamic resources. Our theoretical development brings together techniques from Lyapunov analysis, adversarial online learning, and stochastic optimization. On the application side, when tested on real data from our partner agency, our method outperforms existing ones, making it a viable candidate for replacing the current practice upon experimentation.

16:30-18:00 Session SatC 1
16:30
No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions

ABSTRACT. Existing online learning algorithms for adversarial Markov Decision Processes achieve vanishing regret even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this talk, I will show how to design algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. I will start from the case when the degree of maliciousness is known, and then show how to remove this requirement using a novel black-box reduction approach. Time permitting, I will also discuss how to further improve the algorithm so it simultaneously adapts to certain easier environments with much lower regret.

16:52
Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements

ABSTRACT. For min-max optimization and variational inequality problems (VIPs), Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size versions of SEG/SGDA have gained popularity due to several appealing benefits, but their convergence behaviors are complicated even in rudimentary bilinear models. Our work elucidates the probabilistic behavior of these algorithms and their projected variants, for a wide range of monotone and non-monotone VIPs with potentially biased stochastic oracles. By recasting them as time-homogeneous Markov chains, we establish geometric convergence to a unique invariant distribution and asymptotic normality. Specializing to min-max optimization, we characterize the relationship between the step size and the induced bias with respect to the global solution, which in turn allows for bias refinement via the Richardson-Romberg scheme. Our theoretical analysis is corroborated by numerical experiments.
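
For concreteness, the constant step-size SGDA iteration discussed here can be written, in generic notation, as

$x_{t+1} = x_t - \gamma\, g_x(x_t, y_t; \xi_t), \qquad y_{t+1} = y_t + \gamma\, g_y(x_t, y_t; \xi_t),$

and if the mean of the induced invariant distribution behaves as $\bar{x}_\gamma = x^* + \gamma b + O(\gamma^2)$, a Richardson-Romberg extrapolation such as $2\bar{x}_\gamma - \bar{x}_{2\gamma} = x^* + O(\gamma^2)$ cancels the leading-order bias; this is the kind of refinement the abstract alludes to (details of the exact scheme used in the work may differ).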

17:14
Policy Mirror Descent Inherently Explores the Action Space

ABSTRACT. Explicit exploration in the action space was assumed to be indispensable for online policy gradient methods to avoid a drastic degradation in sample complexity, for solving general reinforcement learning problems over finite state and action spaces. In this paper, we establish for the first time an $O(1/\epsilon^2)$ sample complexity for online policy gradient methods without incorporating any exploration strategies, provided Markov chains on the state space of generated policies are uniformly mixing with non-diminishing minimal visitation measure. The essential development consists of two new on-policy evaluation operators, and a novel analysis of the stochastic policy mirror descent method that considers interactions between policy evaluation and policy improvement.

17:36
Constrained Policy Optimization in Reinforcement Learning: A Tale of Regularization and Optimism

ABSTRACT. In contemporary reinforcement learning, constrained policy optimization is a prominent methodology for finding optimal policies under constraints. Its applications span diverse domains, ranging from robot navigation to health care. However, the challenge of oscillating policy iterates in training due to the minimax structure within Lagrangian-based policy gradient methods poses a significant hurdle. To mitigate the oscillation, this talk first introduces regularization into Lagrangian-based constrained minimax optimization and presents a regularized policy gradient-based primal-dual method with sublinear convergence of policy iterates to an optimum. Next, this talk introduces optimism from learning in games to develop an optimistic gradient-based primal-dual method. This method demonstrates interesting linear convergence of policy iterates to an optimum, showcasing the effectiveness of optimism in a scenario where the minimax optimization lacks convexity. Ultimately, our results shed light on the role of regularization and optimism in mitigating oscillation in constrained policy learning.

16:30-18:00 Session SatC 2
16:30
Neuromancer: Differentiable Programming Library for Data-driven Modeling and Control

ABSTRACT. Neural Modules with Adaptive Nonlinear Constraints and Efficient Regularizations (NeuroMANCER) is an open-source differentiable programming library for solving parametric constrained optimization problems, physics-informed system identification, and parametric model-based optimal control. NeuroMANCER is written in PyTorch and allows for systematically integrating machine learning with scientific computing to create end-to-end differentiable models and algorithms embedded with prior knowledge and physics. The library is focused on research, rapid prototyping, and streamlined deployment. Strong emphasis is given to extensibility and interoperability with the PyTorch ecosystem, with quick adaptability to custom domain problems. Our open-source repository contains numerous tutorial examples demonstrating the use of physics-informed neural networks for solution and parameter estimation of differential equations, learning-to-optimize methods with feasibility restoration layers, and differentiable control algorithms for learning constrained control policies for nonlinear systems.

16:52
Exploiting Observed Network Behavior for Accelerated Optimal Power Flow

ABSTRACT. AC optimal power flow (OPF) is a non-convex, non-linear programming problem. For large systems, computing the optimal solution in the required time frame can be extremely challenging. However, the full formulation does not take into account network-specific parameters, which significantly affect which constraints are active. This talk will explore methods to incorporate observations from data to accelerate the computation of OPF.

17:14
Adaptive Prescriptions for Optimal Power Flow under Uncertainty

ABSTRACT. The massive roll-out of stochastic resources (e.g., wind and solar power) in the power system complicates its operations and challenges established regulatory and economic paradigms. Yet, while novel and provably effective decision-making techniques can internalize parameter stochasticity efficiently, they are often complex and deviate too much from existing approaches for a timely implementation in practice. This talk discusses approaches on how optimal power flow, a central tool for power system operations and planning, can be enriched with few interpretable parameters that are prescribed such that its performance is as close as possible to a fully stochastic approach. We briefly survey the state-of-the-art and present a novel data-driven approach to learn an efficient parameter prescription for robust optimal power flow.

17:36
On the coupling between the scheduling of travel and charging for electrified bus fleets and grid pricing

ABSTRACT. The scheduling of service routes for fleets of buses is a classical example of a complex integer programming problem, for which effective approximate solutions have been developed. The recent electrification of these fleets introduces a new dimension to the scheduling task, involving decisions on when and where to charge the buses. As buses contribute significant demand to the grid distribution system, their charging schedules can impact local congestion and influence the locational marginal price of electricity. This creates a feedback loop between wholesale market operators and transportation operator decisions if scheduling responds to these prices. This presentation examines approaches to address this coupling, aiming to effectively capture the price-sensitive nature of bus travel and charging scheduling. Simultaneously, the talk seeks computationally efficient methods to obtain solutions for the bus schedule and the correct price of electricity facing this enhanced complexity.

16:30-18:00 Session SatC 3
16:30
Non-Monotonicity of Branching Rules with respect to Linear Relaxations

ABSTRACT. Modern mixed-integer programming solvers use the branch-and-cut framework, where cutting planes are added to improve the tightness of the linear programming (LP) relaxation, with the expectation that the tighter formulation will produce smaller branch-and-bound trees. In this work, we consider the question of whether adding cuts always leads to smaller trees for a given fixed branching rule. We formally refer to such a property of a branching rule as monotonicity. We show that several well-known branching rules, such as full strong branching with product score, linear score, and Le Bodic-Nemhauser ratio score, most fractional branching, and separating distance branching, are non-monotonic. Moreover, we present a family of instances where adding a single cut leads to an exponential increase in the tree size for most of the above rules, in spite of improving the LP bound. Finally, we empirically attempt to estimate the prevalence of non-monotonicity in practice when using full strong branching. We consider randomly generated multi-dimensional knapsacks tightened by cover cuts, as well as instances from the MIPLIB 2017 benchmark set, for the computational experiments. Our main insight from these experiments is that if the gap closed by cuts is small, the change in tree size is difficult to predict and often increases, possibly due to inherent non-monotonicity. However, when a sufficiently large gap is closed, a significant decrease in tree size may be expected.

16:52
Compressing Branch-and-Bound Trees

ABSTRACT. A branch-and-bound (BB) tree certifies a dual bound on the value of an integer program. In this work, we introduce the tree compression problem (TCP): Given a BB tree T that certifies a dual bound, can we obtain a smaller tree with the same (or stronger) bound by either (1) applying a different disjunction at some node in T or (2) removing leaves from T? We believe such post-hoc analysis of BB trees may assist in identifying helpful general disjunctions in BB algorithms. We initiate our study by considering computational complexity and limitations of TCP. We then conduct experiments to evaluate the compressibility of realistic branch-and-bound trees generated by commonly-used branching strategies, using both an exact and a heuristic compression algorithm.

17:14
Branch-and-Bound with Predictions for Variable Selection

ABSTRACT. In this paper, we are interested in solving a binary integer program using the branch-and-bound method. The execution of the branch-and-bound method varies significantly with the choice of variable selection rule (sometimes called branching rule). To solve a given MILP with the smallest branch-and-bound tree possible, one would want to branch on a variable such that the two resulting subproblems are as "easy" as possible. Informally, given access to a function $\sigma(S)$ that gives the size of the smallest branch-and-bound tree solving subproblem $S$, we can construct a smallest branch-and-bound tree by branching on the variable $j$ resulting in subproblems $S_{x_j = 0}, S_{x_j = 1}$ such that $\sigma(S_{x_j = 0}) + \sigma(S_{x_j = 1})$ is minimum over all choices of $j$. Of course, computing all of the necessary values of $\sigma(S)$ is no easier than solving the integer program itself. This motivates the need for a good estimator $\hat{\sigma}$ for $\sigma$, which has been an implicit goal of several previous studies, including those employing machine learning techniques for variable selection. Indeed, one can view full strong branching as branching according to an estimator $\hat{\sigma}$ that is a (monotone) function of only the additive integrality gap. Our study reflects the goals of the rapidly developing research on "algorithms with predictions", which focuses on how to leverage constantly advancing machine learning tools for improved algorithms. In particular, the goal of this work is to quantify the impact of the quality of the estimate $\hat{\sigma}$ when used for variable selection. We provide some theoretical results and discussion, as well as a computational study of strong branching under this context.

17:36
Branch-and-bound for D-Optimality with fast local search and variable-bound tightening

ABSTRACT. We develop a branch-and-bound algorithm for the D-optimality problem, a central problem in statistical design theory, based on two convex relaxations, employing variable-bound tightening and fast local-search procedures, and testing our ideas on randomly generated test problems.
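
In its standard form, and in generic notation, the D-optimality problem asks, given experiment vectors $a_1, \dots, a_m \in \mathbb{R}^n$ and a budget $k$, to solve

$\max_{S \subseteq \{1, \dots, m\},\ |S| = k} \; \log\det\Big(\sum_{i \in S} a_i a_i^\top\Big);$

continuous relaxations of this problem are a typical source of the convex bounds used in such branch-and-bound schemes.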

16:30-18:00 Session SatC 4
16:30
Lower Complexity Bounds of First-order Methods for Affinely Constrained Composite Non-convex Problems

ABSTRACT. Many recent studies on first-order methods (FOMs) focus on composite non-convex non-smooth optimization with linear and/or nonlinear function constraints. Upper (or worst-case) complexity bounds have been established for these methods. However, little can be claimed about their optimality as no lower bound is known, except for a few special smooth non-convex cases. In this talk, I will present lower complexity bounds of FOMs for solving a class of composite non-convex non-smooth optimization with linear constraints. Assuming two different first-order oracles, we establish lower complexity bounds of FOMs to produce a (near) $\epsilon$-stationary point of a problem (and its reformulation) in the considered problem class, for any given tolerance $\epsilon>0$. Our lower bounds indicate that the existence of a non-smooth convex regularizer can make the affinely constrained problem arbitrarily harder. In addition, we show that our bounds are tight, with a difference of up to a logarithmic factor from an upper complexity bound.

16:52
Last-iterate convergence for convex function-constrained optimization

ABSTRACT. In recent times, various works have explored algorithms for convex function-constrained optimization problems. Most of these works provide convergence guarantees for the average iterate. Such solutions degrade various desirable properties of the algorithmic output, such as sparsity requirements or differential privacy guarantees. Moreover, it is common practice to output the last iterate. Even so, last-iterate convergence guarantees are uncommon in function-constrained optimization problems. In this talk, we present primal-dual type algorithms that converge in the last iterate for nonsmooth and stochastic convex function-constrained optimization problems. In particular, our algorithms do not need to search for the Lagrange multiplier, nor do we need explicit descent in some potential function to get a bound on the dual iterate. Our designs are simple to implement, and the gradient complexity of our proposed methods matches the existing state of the art for the average iterate in both the nonsmooth and stochastic cases.

17:14
Computing a directional stationary solution of DC programs via ADMM

ABSTRACT. Difference-of-convex (DC) programming has been exploited to solve various decision problems in diverse applications, including finance, health care, power systems, and machine learning. This talk presents two alternating direction methods of multipliers (ADMM) that are designed to solve an unconstrained DC program and a special case that aims to reconstruct a group-structured sparsity in the model variables. We show the iterates of these methods converge to a directional stationary solution under suitable conditions. We present the results of our numerical study to compare the proposed methods to widely practiced algorithms, including the DC Algorithm and the iteratively reweighted algorithm.

17:36
Optimal Regularization for a Data Source

ABSTRACT. In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structural properties in the solution. The choice of a suitable regularizer is typically driven by a combination of prior domain information and computational considerations. Convex regularizers are attractive computationally but they are limited in the types of structure they can promote. On the other hand, nonconvex regularizers are more flexible in the forms of structure they can promote and they have showcased strong empirical performance in some applications, but they come with the computational challenge of solving the associated optimization problems. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what is the optimal regularizer for data drawn from the distribution? What properties of a data source govern whether the optimal regularizer is convex? We address these questions for the class of regularizers specified by functionals that are continuous, positively homogeneous, and positive away from the origin. As the regularizers we consider are in one-to-one correspondence with star bodies, we leverage dual Brunn-Minkowski theory to show that a radial function derived from a data distribution is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization.

16:30-18:00 Session SatC 5
16:30
Any-dimensional convex sets

ABSTRACT. Classical algorithms are defined on inputs of different sizes. In contrast, algorithms learned from data may only be defined on inputs of the same size as the data. What does it mean for an algorithm to be defined on infinitely-many input sizes? How do we describe such algorithms, and how do we parametrize and search over them?

In this talk, we tackle these questions for convex optimization-based algorithms. Describing such algorithms reduces to describing convex sets. These, in turn, are often "freely" described, meaning that their description makes instantiation in every dimension obvious. Examples include unit balls of standard norms defined on vectors of any size, graph parameters defined for graphs of any size, and (quantum) information theoretic quantities defined for distributions on any number of (qu)bits.

We show that such free descriptions of convex sets arise from two ingredients. First, group invariance and the recently identified phenomenon of representation stability. Second, embeddings and projections relating different-sized problem instances. We combine these ingredients to obtain parametrized families of infinitely instantiable convex sets. We also use representation stability to explain why the sizes of many symmetry-reduced conic programs stabilize, unifying some of the arguments in the literature.

16:52
Iterative Refinement for Semidefinite Optimization

ABSTRACT. Iterative Refinement (IR) is a classical computing technique for obtaining highly precise solutions to linear systems of equations, as well as to linear optimization problems. We develop the first IR scheme for solving semidefinite optimization (SDO) problems and explore two major impacts of the proposed IR scheme. First, we prove that the proposed IR scheme exhibits quadratic convergence toward an optimal solution without any assumptions on problem characteristics. On the practical side, IR can be used to find precise solutions while relying on inexact SDO solvers. We also show that using IR with Quantum Interior Point Methods (QIPMs) leads to exponential improvements in the worst-case overall running time of QIPMs, compared to previous best-performing QIPMs.
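As background, the sketch below illustrates classical iterative refinement for a linear system, the residual-correction loop that the proposed SDO scheme generalizes; the inexact solver and test system are illustrative assumptions, not the authors' implementation.

import numpy as np

def iterative_refinement(A, b, inexact_solve, num_iters=5):
    """Classical iterative refinement for A x = b.

    inexact_solve(A, r) can be any low-accuracy solver; each pass corrects
    the current iterate with the solution of the residual system.
    (Sketch of classical IR only, not the SDO scheme from the talk.)
    """
    x = inexact_solve(A, b)            # initial inexact solution
    for _ in range(num_iters):
        r = b - A @ x                  # residual of the current iterate
        d = inexact_solve(A, r)        # correction from the residual system
        x = x + d                      # refined solution
    return x

# Example: an "inexact" solver obtained by solving in single precision.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)) + 50 * np.eye(50)
b = rng.standard_normal(50)
single = lambda A, r: np.linalg.solve(A.astype(np.float32), r.astype(np.float32)).astype(np.float64)
print(np.linalg.norm(A @ iterative_refinement(A, b, single) - b))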

17:14
Computing implicit equalities of a linear inequality system

ABSTRACT. A linear inequality is called an implicit equality of a linear inequality system if it holds at equality for all solutions to the system. Identifying all implicit equalities of a linear inequality system is equivalent to finding the affine hull of the polyhedron defined by the system. In this talk, we discuss and compare existing methods for computing implicit equalities and propose a simple new method that combines existing ideas. Numerical results show that our new method performs better on the tested instances.
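As background, one standard (if potentially expensive) way to detect an implicit equality is to maximize the slack of each inequality over the feasible region; the sketch below implements this baseline check with scipy and is not the new method proposed in the talk.

import numpy as np
from scipy.optimize import linprog

def implicit_equalities(A, b, tol=1e-9):
    """Return indices i such that a_i^T x = b_i for every x with A x <= b.

    Baseline check: inequality i is an implicit equality iff the maximum
    slack b_i - a_i^T x over the feasible region is (numerically) zero.
    (Illustrative baseline, not the new combined method from the talk.)
    """
    implicit = []
    for i in range(A.shape[0]):
        # maximizing b_i - a_i^T x is the same as minimizing a_i^T x
        res = linprog(c=A[i], A_ub=A, b_ub=b, bounds=[(None, None)] * A.shape[1])
        if res.status == 0 and b[i] - res.fun <= tol:
            implicit.append(i)
    return implicit

# Example: x1 >= 0, x2 >= 0, x1 + x2 <= 0 forces x1 = x2 = 0,
# so all three inequalities are implicit equalities.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 0.0])
print(implicit_equalities(A, b))   # expected: [0, 1, 2]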

16:30-18:00 Session SatC 6
16:30
Model Deployment with GAMS Engine

ABSTRACT. The right tools help you deploy your GAMS model and maximize the impact of your decision support application. If your model requires significant computational resources to solve, you may benefit from GAMS Engine, a powerful tool for solving GAMS models either on-prem or in the cloud. Engine acts as a broker between applications or users that have GAMS models to solve and the computational resources used to solve them. Central to Engine is a modern REST API that provides an interface to a scalable, containerized system of services, comprising an API, a database, a queue, and a configurable number of GAMS workers. GAMS Engine is available as a standalone application or as a Software-as-a-Service solution running on AWS.

In this talk we will introduce GAMS Engine and demonstrate its use with a simple example.

16:52
The Best of Both Worlds - Integrating Python and GAMS

ABSTRACT. Optimization applications combine technology and expertise from many different areas, including model-building, algorithms, and data-handling. Often, the gathering, pre/post-processing, and visualization of the data is done by a diverse organization-spanning group that shares a common bond: their skill in and appreciation for Python and the vast array of available packages it provides. For this reason, GAMS offers multiple ways to integrate with Python on the data-handling side, as well as offering some packages of our own (e.g. GAMS Transfer, GAMSPy). In this talk, we will explore the benefits of this integration.

17:14
AMPLS Solver Libraries

ABSTRACT. AMPLS Solver Libraries allow users to seamlessly export AMPL model instances to persistent solver representations, facilitating advanced solution algorithm implementation. Compatible with C++, Python, and C#, these libraries offer flexibility in terms of performance/ease of use tradeoff. The libraries ensure a smooth transition from AMPL modeling to ad-hoc algorithm implementation. Notably, some solver functionalities are mapped to allow the reuse of implementations across solvers, while retaining access to solver-specific features. This mapping streamlines the development process, providing a unified interface for common functionalities while preserving the unique capabilities of each solver.

17:36
Automatic Reformulations in AMPL’s New MP Solver Interface Library

ABSTRACT. MP, the AMPL modeling system's new solver interface library, streamlines optimization model development by offering a broad selection of automatic reformulations. These include decomposition (e.g., linearization of logical constraints) and globalization (e.g., filtering of conic constraints), as well as piecewise-linear approximation. The library also streamlines development of new solver interfaces via a flexible type-safe C++ class hierarchy, and offers tools to explore the reformulations that have been performed. Automatic reformulations -- performed either by a modeling tool or in the solver -- can sometimes hide numerical issues with a solution; we discuss several examples and workarounds.
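As a generic illustration of what "linearization of logical constraints" involves (a textbook big-M reformulation, not necessarily the exact rewriting MP applies), a disjunction such as $x \le 5 \ \vee\ y \le 3$ can be replaced by $x \le 5 + M(1-z)$, $y \le 3 + Mz$, $z \in \{0,1\}$, where $M$ bounds how far each inequality can be violated over the feasible region; choosing $M$ too large is one classic source of the numerical issues mentioned above.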

16:30-18:00 Session SatC 7
16:30
Adaptive Partitioning for Chance-Constrained Problems with Finite Support

ABSTRACT. This presentation considers chance-constrained stochastic problems (CCSPs) with finite support. We present an iterative algorithm that solves a reduced-size chance-constrained model. This reduced model is obtained by partitioning the scenario set and, when solved to optimality, yields a bound on the optimal objective value of the original CCSP. Moreover, the algorithm ensures finite termination at an optimal solution of the original CCSP. At the heart of the algorithm lie two fundamental operations: refinement and merging. Refinement involves splitting a subset of the partition, while merging combines two subsets. These operations are executed to enhance the bound obtained in each step of the algorithm while minimally increasing the size of the reduced model to be solved. Under mild conditions, we prove that, for specific refinement and merge operations, the bound obtained after solving each reduced model strictly improves throughout the iterative process. Furthermore, the algorithm structure allows for the seamless integration of various computational enhancements, significantly reducing the computational time required to solve the reduced chance-constrained problems. Based on theoretical arguments, we also provide strategies to initialize the partition considered in the algorithm. The efficiency of the algorithm is assessed through numerical experiments. We study the impact of every component of the algorithm and compare its performance against state-of-the-art methods on chance-constrained multidimensional knapsack problems.

16:52
Integer Programming Approaches for Distributionally Robust Chance Constraints with Adjustable Risks

ABSTRACT. We study distributionally robust chance-constrained programs (DRCCPs) with individual chance constraints and random right-hand sides. The DRCCPs treat the risk tolerances associated with the distributionally robust chance constraints (DRCCs) as decision variables to trade off between the system cost and the risk of violations by penalizing the risk tolerances in the objective function. We consider two types of Wasserstein ambiguity sets: one with finite support and one with a continuum of realizations. By exploring the hidden discrete structures, we develop mixed integer programming reformulations under the two types of ambiguity sets to determine the optimal risk tolerance for the chance constraint. Valid inequalities are derived to strengthen the formulations. We test our approaches on transportation problems of diverse sizes and on a demand response management problem.

17:14
Chance-Constrained Set k-Covering Problem

ABSTRACT. We consider a variant of the set covering problem with uncertain data, which we refer to as the chance-constrained set $k$-covering problem. In this problem, we assume that there is uncertainty regarding whether a selected set can cover an item, and the objective is to determine a minimum-cost combination of sets that covers each item $i$ at least $k_i$ times with a prescribed probability. We formulate deterministic integer programs and propose solution approaches for this problem using some combinatorial methods.
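One natural way to write such a model (an illustrative formulation, not necessarily the exact one studied in the talk) is $\min \sum_j c_j x_j$ subject to $\mathbb{P}\{\sum_j \xi_{ij} x_j \ge k_i\} \ge 1 - \epsilon_i$ for each item $i$ and $x_j \in \{0,1\}$, where $\xi_{ij}$ is the random indicator that set $j$ covers item $i$ and $\epsilon_i$ is the allowed violation probability for item $i$.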

17:36
Mixed-Integer Programming Formulations for Chance-Constrained Optimization with Mixture Models

ABSTRACT. We study chance-constrained optimization problems characterized by Discrete-Gaussian Mixture Models (DGMM) that mix both continuous Gaussian components and discrete masses. Central to our approach is the development of an efficient mixed-integer quadratically constrained program (MIQCP), designed to approximate these problems within any specified error tolerance. Our idea leverages the convex-concave nature of the standard normal cumulative distribution function (CDF), in conjunction with special ordered sets, to offer both inner and outer approximations of the original problem. These approximations are realized through piecewise linear functions along with guarantees on the necessary number of linear segments for achieving desired accuracy levels. Complementing the MIQCP, we also introduce a linear-fractional quadratically constrained program (LFQCP), providing a conservative upper bound and solvable through an efficient bisection algorithm. Empirical evaluations underscore the strengths of our proposed models. Through extensive numerical experiments that analyze the tradeoff between solution time and approximation quality, we find that our formulations offer a computational advantage over classical Sample Average Approximation (SAA) methods across various scenarios, including different problem classes and sizes, mixture components, chance constraint reliability levels, and reference distributions.
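As a point of reference for the role of the normal CDF, recall the classical single-Gaussian case: if $\xi \sim \mathcal{N}(\mu, \sigma^2)$, then $\mathbb{P}\{\xi \le a^{\top} x\} \ge 1-\epsilon$ is equivalent to the linear constraint $a^{\top} x \ge \mu + \sigma \Phi^{-1}(1-\epsilon)$. For a discrete-Gaussian mixture the CDF becomes a weighted sum of shifted normal CDFs and point masses, which is where the convex-concave structure and piecewise-linear approximations above come into play (standard background, not the paper's formulation).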

16:30-18:00 Session SatC 8
16:30
Local Search for Constrained Determinant Maximization

ABSTRACT. Determinant maximization provides an elegant generalization of problems in many areas, including convex geometry, statistics, machine learning, fair allocation of goods, and network design. In an instance of the determinant maximization problem, we are given a collection of vectors $v_1, v_2, \ldots, v_n \in \mathbb{R}^d$, and the goal is to pick a subset $S\subseteq [n]$ of the given vectors to maximize the determinant of the matrix $\sum_{i\in S} v_i v_i^{\top}$, where the picked set of vectors $S$ must satisfy a combinatorial constraint such as a cardinality constraint ($|S| \leq k$) or a matroid constraint ($S$ is a basis of a matroid defined on $[n]$).

In this talk, we describe a local search-based combinatorial algorithm for the determinant maximization problem under a matroid constraint that achieves a $\min\{O(d^d), O(r^r)\}$-approximation for any matroid of rank $r$. Our approach builds on the classical matroid intersection algorithm by Edmonds while combining ideas from combinatorial optimization with algebraic properties of the determinant.
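To make the local-search template concrete, the sketch below applies single-swap local search to the cardinality-constrained special case ($|S| = k$ with $k \geq d$, so the determinant can be nonzero); it is a simplified illustration only, without the matroid machinery or the approximation guarantee described above.

import numpy as np

def local_search_logdet(V, k, max_iters=1000):
    """Single-swap local search for max over |S| = k of log det(sum_{i in S} v_i v_i^T).

    Simplified cardinality-constrained sketch (assumes k >= d); the talk's
    algorithm handles general matroid constraints and comes with guarantees.
    """
    n, d = V.shape
    S = list(range(k))                       # arbitrary initial feasible set

    def logdet(idx):
        M = V[idx].T @ V[idx]                # equals sum_{i in idx} v_i v_i^T
        sign, val = np.linalg.slogdet(M)
        return val if sign > 0 else -np.inf

    best = logdet(S)
    for _ in range(max_iters):
        improved = False
        for i in S:
            for j in range(n):
                if j in S:
                    continue
                T = [j if s == i else s for s in S]   # swap i out, j in
                val = logdet(T)
                if val > best + 1e-12:
                    S, best, improved = T, val, True
                    break
            if improved:
                break
        if not improved:                     # local optimum: no improving swap
            break
    return S, best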

This talk is based on joint works with Adam Brown, Madhusudhan Pittu, Mohit Singh, and Prasad Tetali.

16:52
Non-convex geodesic optimization for the Paulsen problem in Frame Theory

ABSTRACT. The Paulsen problem is a basic question in operator theory: given a frame $u_{1}, \ldots, u_{n} \in \mathbb{R}^{d}$ that is $\varepsilon$-Parseval, $\frac{1-\varepsilon}{d} I_{d} \preceq \sum_{j=1}^{n} u_{j} u_{j}^{*} \preceq \frac{1+\varepsilon}{d} I_{d}$, and $\varepsilon$-nearly equal norm, $\forall j \in [n]: \frac{1-\varepsilon}{n} \leq \|u_{j}\|_{2}^{2} \leq \frac{1+\varepsilon}{n}$, what is the distance to the nearest frame $v_{1}, \ldots, v_{n} \in \mathbb{R}^{d}$ that exactly satisfies these two conditions? A priori, the distance bound may depend on the parameters $(d,n,\varepsilon)$. There are known examples showing a lower bound of $\text{dist}(V,U) \gtrsim \varepsilon$. A recent series of works [KLLR, HM19, R23] used techniques from the scaling framework to resolve this question, showing that this distance bound of $\varepsilon$ is tight up to constant factors in the worst case. Prior to these works, there were two partial results [CFM, BC10] showing a beyond-worst-case distance bound of $\mathrm{poly}(d,n)\,\varepsilon^{2}$, but these techniques were somewhat ad hoc and applied only in certain special cases. In this work, we re-derive these results from the perspective of geodesic optimization. Specifically, we show that the procedures developed in [CFM, BC10] can be understood as gradient flows over certain non-Euclidean manifolds. Unlike the results from the scaling framework, this is still a non-convex optimization problem. Nevertheless, we are able to prove global convergence by appealing to the Polyak-Lojasiewicz condition. As added benefits, we are able to generalize and improve the result while simplifying the analysis.

17:14
Hidden convexity, optimization, and algorithms on rotation matrices

ABSTRACT. The set of rotation matrices, also known as the special orthogonal group, is the domain of a number of optimization problems in data science and robotics. We will discuss convexity properties of this set, and how they can be used to produce efficient algorithms for such optimization problems. Of special interest to us will be the facts that the image of the special orthogonal group under any linear projection into two-dimensional space is convex, and so are certain coordinate projections of the special orthogonal group. We will also discuss some generalizations involving group theory and homotopy theory if time allows.

17:36
Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements

ABSTRACT. For min-max optimization and variational inequality problems (VIPs), Stochastic Extra-gradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size versions of SEG/SGDA have gained popularity due to several appealing benefits, but their convergence behaviors are complicated even in rudimentary bilinear models. Our work elucidates the probabilistic behavior of these algorithms and their projected variants for a wide range of monotone and non-monotone VIPs with potentially biased stochastic oracles. By recasting them as time-homogeneous Markov chains, we establish geometric convergence to a unique invariant distribution and asymptotic normality. Specializing to min-max optimization, we characterize the relationship between the step size and the induced bias with respect to the global solution, which in turn allows for bias refinement via the Richardson-Romberg scheme. Our theoretical analysis is corroborated by numerical experiments.
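To illustrate the bias-refinement step, the sketch below shows generic Richardson-Romberg extrapolation: if the time-averaged iterate of a constant step-size method has bias $c\gamma + O(\gamma^2)$ in the step size $\gamma$, combining two runs cancels the first-order term (a generic sketch of the extrapolation idea, not the paper's specific analysis).

def richardson_romberg(run_avg, gamma):
    """Richardson-Romberg bias refinement.

    run_avg(gamma) should return the time-averaged iterate of a constant
    step-size method (e.g., SGDA or SEG) run with step size gamma.  If that
    average equals x* + c * gamma + O(gamma^2), the combination below equals
    x* + O(gamma^2), cancelling the first-order bias.
    """
    x_gamma = run_avg(gamma)
    x_half = run_avg(gamma / 2.0)
    return 2.0 * x_half - x_gamma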

16:30-18:00 Session SatC 9
16:30
SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models

ABSTRACT. Fine-tuning pre-trained models has gained significant success in delivering SOTA results across various NLP tasks. In the absence of centralized data, Federated Learning (FL) helps the model benefit from clients' private data for fine-tuning. However, due to the limited communication, computation, and storage capabilities of edge devices and the huge sizes of popular pre-trained models, efficient fine-tuning is crucial. This work explores the opportunities and challenges of applying parameter-efficient fine-tuning (PEFT) methods in FL for language tasks. Specifically, our investigations reveal that with increasing data heterogeneity across users, the gap between fully fine-tuning the model and employing PEFT methods widens. To bridge this performance gap, we propose SLoRA, a method that overcomes the key limitations of LoRA in highly heterogeneous data scenarios through a novel data-driven initialization technique. Our experimental results demonstrate that SLoRA achieves performance comparable to full fine-tuning, with sparse updates of $\sim 1\%$ density, while reducing training time by up to $90\%$.
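For readers less familiar with the underlying adapter, the sketch below shows a plain LoRA layer (frozen pre-trained weight plus a trainable low-rank update), which is the object being fine-tuned and communicated in this setting; SLoRA's data-driven initialization of the factors is not shown, and the hyperparameters are illustrative.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Vanilla LoRA adapter: y = x W0^T + (alpha / r) * x A^T B^T.

    The pre-trained weight W0 is frozen; only the low-rank factors A, B are
    trained (and communicated in FL).  Sketch of standard LoRA for context;
    SLoRA's data-driven initialization of A and B is not shown here.
    """
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                # freeze pre-trained weight
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))   # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T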

16:52
Sparse Random Networks for Communication-Efficient Federated Learning

ABSTRACT. One main challenge in federated learning is the large communication cost of exchanging weight updates from clients to the server at each round. While prior work has made great progress in compressing the weight updates through gradient compression methods, we propose a radically different approach that does not update the weights at all. Instead, our method freezes the weights at their initial random values and learns how to sparsify the random network for the best performance. To this end, the clients collaborate in training a stochastic binary mask to find the optimal sparse random network within the original one. At the end of the training, the final model is a sparse network with random weights – or a subnetwork inside the dense random network. We show improvements in accuracy, communication (less than 1 bit per parameter (bpp)), convergence speed, and final model size (less than 1 bpp) over relevant baselines on the MNIST, EMNIST, CIFAR-10, and CIFAR-100 datasets, in the low bitrate regime.

17:14
Adaptive Federated Learning with Auto-Tuned Clients

ABSTRACT. Federated learning (FL) is a distributed machine learning framework where the global model of a central server is trained via multiple collaborative steps by participating clients without sharing their data. While being a flexible framework, where the distribution of local data, participation rate, and computing power of each client can greatly vary, such flexibility gives rise to many new challenges, especially in the hyperparameter tuning on the client side. We propose $\Delta$-SGD, a simple step size rule for SGD that enables each client to use its own step size by adapting to the local smoothness of the function each client is optimizing. We provide theoretical and empirical results where the benefit of the client adaptivity is shown in various FL scenarios.
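A minimal sketch of the kind of client-side rule being described is given below, where the step size is adapted from a local estimate of smoothness along the trajectory; this is a generic illustration in that spirit under assumed constants, not the exact $\Delta$-SGD rule from the talk.

import numpy as np

def local_smoothness_step(x_prev, x_curr, g_prev, g_curr, eps=1e-12):
    """Step size from a local smoothness estimate (generic sketch, not Delta-SGD).

    L_hat approximates the local Lipschitz constant of the gradient between
    the last two iterates; the step size is (half) its reciprocal, so flatter
    regions get larger steps and sharper regions get smaller ones.
    """
    L_hat = np.linalg.norm(g_curr - g_prev) / max(np.linalg.norm(x_curr - x_prev), eps)
    return 1.0 / (2.0 * max(L_hat, eps))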

17:36
CyberTransfer: A Transfer Learning Approach for Enhancing Graph Neural Networks for Intrusion Detection

ABSTRACT. There is a growing interest in applying supervised learning for intrusion detection. However, many emerging applications lack sufficient training data for training an effective prediction model. Transfer learning is a promising approach for intrusion detection by enabling labeled data from other domains to be leveraged. In this paper, we focus on transfer learning for intrusion detection using Graph Neural Networks (GNNs). Graph models have achieved promising results for intrusion detection due to their ability to combine both structural and feature information from communication flows. However, whether GNNs for intrusion detection can benefit from transfer learning remains an open question. To address this question, we propose a transfer learning GNN based on shared latent representations between the source and target domain. Our results, using six datasets, show that the proposed approach consistently outperforms alternative solutions based only on limited labels from the target domain in terms of accuracy and generalization.

16:30-18:00 Session SatC10
16:30
Exploring Nonlinear Distance Metrics for Lipschitz Constant Estimation in Lower Bound Construction for Global Optimization

ABSTRACT. Bounds play a crucial role in guiding optimization algorithms, improving their speed and quality, and providing optimality gaps. While Lipschitz constant-based lower bound construction is an effective technique, the quality of the linear bounds depends on the function’s topological properties. In this research, we improve upon this by incorporating nonlinear distance metrics and surrogate approximations to generate higher-quality bounds. We emphasize the importance of using a flexible distance metric that can adapt to any function. We examine the characteristics and properties of different incorporated distance metrics. While the linear distance metric is popular in the literature due to its simplicity and intuitive interpretation, we discuss the potential benefits of alternative distance metrics, such as sublinear and superlinear distance metrics, which may be used for Hölder continuous functions. Sublinear distance metrics are advantageous for sparse data settings, while superlinear distance metrics can capture nonlinear relationships between data points. Combining surrogate models and nonlinear distance metrics for Lipschitz constant estimations results in high-quality lower bounds that can contribute to more effective exploration and exploitation and more accurate optimality gap estimation.
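To fix ideas, the sketch below evaluates the pointwise lower bound induced by sampled function values under a Hölder-type distance metric $d(x,y) = \|x-y\|^{\alpha}$: $\alpha = 1$ recovers the usual Lipschitz bound, while $\alpha \ne 1$ gives the sublinear/superlinear metrics discussed above (illustrative sketch with an assumed constant $L$, not the surrogate-based estimation from the talk).

import numpy as np

def pointwise_lower_bound(x, samples, values, L, alpha=1.0):
    """Lower bound on f(x) from samples, assuming |f(x) - f(y)| <= L * ||x - y||^alpha.

    Each sample (x_i, f(x_i)) certifies f(x) >= f(x_i) - L * ||x - x_i||^alpha;
    the tightest certificate is the maximum over all samples.
    """
    samples = np.asarray(samples, dtype=float)
    values = np.asarray(values, dtype=float)
    dists = np.linalg.norm(samples - np.asarray(x, dtype=float), axis=1)
    return float(np.max(values - L * dists**alpha))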

16:52
A New Fast Proximal Point Algorithm for Monotone Inclusions

ABSTRACT. We present a numerically fast proximal point algorithm for solving monotone inclusion problems in a real Hilbert space. The proposed algorithm combines the classical proximal point algorithm with an inertial term and two correction terms. The two correction terms are added to further accelerate the convergence of the inertial proximal point algorithms already studied in the literature. In our convergence results, we obtain both weak and linear convergence of the proposed algorithm under standard assumptions. Applications of the proposed algorithm to mixed variational inequalities, quasiconvex optimization problems, and the Douglas-Rachford splitting algorithm are given. Numerical results from compressive principal component pursuit and quasiconvex optimization problems show that our proposed algorithm outperforms inertial proximal point algorithms and other related algorithms in the literature.
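For reference, the base scheme being accelerated is the inertial proximal point iteration for finding a zero of a maximally monotone operator $A$: $y_k = x_k + \theta_k (x_k - x_{k-1})$ followed by $x_{k+1} = (I + \lambda_k A)^{-1} y_k$, where $(I + \lambda_k A)^{-1}$ is the resolvent of $A$; the two correction terms of the proposed algorithm modify this update and are not reproduced here.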

17:14
A new path tracking algorithm for solving the CDT subproblem

ABSTRACT. The CDT subproblem is a classical quadratic programming problem with applications in nonlinear programming. In 2003, researchers proposed a path tracking algorithm for CDT, which either solves a relaxation given by a trust-region subproblem or enumerates some of the KKT solutions. The original algorithm has significant limitations in practice because techniques for obtaining the multipliers of the KKT system were lacking. In recent years, there has been great progress on solving the KKT system, especially via parametric eigenvalue-based techniques. The corresponding algorithm can solve CDT globally by enumerating all the KKT solutions. However, because the number of candidates that must be enumerated grows dramatically with the dimension, the efficiency of the eigenvalue-based algorithm is not satisfactory. In this talk, we propose a new path tracking algorithm that combines the advantages of both approaches: it either solves a trust-region relaxation of CDT or enumerates some of the KKT systems. The new algorithm significantly reduces the number of KKT systems that must be solved.
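For reference, a standard statement of the CDT (Celis-Dennis-Tapia) subproblem is to minimize a possibly nonconvex quadratic over the intersection of two ellipsoids, e.g. $\min_x \tfrac{1}{2} x^{\top} A x + b^{\top} x$ subject to $\|x\|_2 \le 1$ and $\|Cx + d\|_2 \le 1$; dropping the second constraint yields the trust-region subproblem used as a relaxation in the algorithm.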

17:36
Deep Learning with Nontrivial Constraints

ABSTRACT. Despite the sweeping success and impact of deep learning in numerous domains, imposing explicit constraints is relatively new but increasingly pressing in deep learning (DL), stimulated by, e.g., trustworthy AI that performs robust optimization over complicated perturbation sets and scientific and engineering applications that need to respect physical laws and constraints. In this talk, we will (1) survey DL problems with nontrivial constraints across science, engineering, and medicine, (2) highlight the NCVX computing framework we have recently built, which provides deterministic solvers to solve constrained DL problems, and (3) invite the optimization community to solve the stochastic constrained DL problems.