Optimal and parameter-free gradient minimization methods for convex and nonconvex optimization
ABSTRACT. We propose novel optimal and parameter-free algorithms for computing an approximate solution with small (projected) gradient norm. Specifically, for computing an approximate solution such that the norm of its (projected) gradient does not exceed $\varepsilon$, we obtain the following results:
a) for the convex case, the total number of gradient evaluations is bounded by $\cO(1)\sqrt{L\|x_0 - x^*\|/\varepsilon}$, where $L$ is the Lipschitz constant of the gradient and $x^*$ is any optimal solution; b) for the strongly convex case, the total number of gradient evaluations is bounded by $\cO(1)\sqrt{L/\mu}\log(\|\nabla f(x_0)\|/\varepsilon)$, where $\mu$ is the strong convexity modulus; and c) for the nonconvex case, the total number of gradient evaluations is bounded by $\cO(1)\sqrt{Ll}(f(x_0) - f(x^*))/\varepsilon^2$, where $l$ is the lower curvature constant. Our complexity results match the lower complexity bounds of the convex and strongly convex cases, and, for the nonconvex case, achieve the above best-known complexity bound for the first time in the literature. Moreover,
for all the convex, strongly convex, and nonconvex cases, we propose parameter-free algorithms that do not require the input of any problem parameters. To the best of our knowledge, no such parameter-free methods existed before, especially for the strongly convex and nonconvex cases. Since most regularity conditions (e.g., strong convexity and lower curvature) are imposed over a global scope, the corresponding problem parameters are notoriously difficult to estimate. However,
gradient norm minimization equips us with a convenient tool to monitor the progress of algorithms and thus the ability to estimate such parameters in situ.
ABSTRACT. Recent works in dimensionality reduction for regression tasks have introduced the notion of sensitivity, an estimate of the importance of a specific datapoint in a dataset, offering provable guarantees on the quality of the approximation after removing low-sensitivity datapoints via subsampling. However, fast algorithms for approximating ℓp sensitivities, which we show to be equivalent to approximate ℓp regression, are known only for the ℓ2 setting, in which they are termed leverage scores.
In this work, we provide efficient algorithms for approximating ℓp sensitivities and related summary statistics of a given matrix. In particular, for a given n×d matrix, we compute an α-approximation to its ℓ1 sensitivities at the cost of O(n/α) sensitivity computations. For estimating the total ℓp sensitivity (i.e., the sum of ℓp sensitivities), we provide an algorithm based on importance sampling of ℓp Lewis weights, which computes a constant-factor approximation to the total sensitivity at the cost of roughly O(\sqrt{d}) sensitivity computations. Furthermore, we estimate the maximum ℓ1 sensitivity, up to a \sqrt{d} factor, using O(d) sensitivity computations. We generalize all these results to ℓp norms for p>1. Lastly, we experimentally show that for a wide class of matrices in real-world datasets, the total sensitivity can be quickly approximated and is significantly smaller than the theoretical prediction, demonstrating that real-world datasets have low intrinsic effective dimensionality.
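To make the ℓ2 special case concrete, below is a minimal sketch (assuming numpy; the random matrix is illustrative) of computing leverage scores, i.e., the ℓ2 sensitivities the abstract refers to, via a thin QR factorization.

```python
import numpy as np

def leverage_scores(A):
    """l2 sensitivities (leverage scores) of the rows of A: tau_i = a_i^T (A^T A)^+ a_i."""
    # With Q an orthonormal basis for the column space of A, tau_i = ||q_i||^2.
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 20))
tau = leverage_scores(A)
print(tau.shape, tau.sum())  # the total l2 sensitivity equals rank(A) (= 20 here)
```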
A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport
ABSTRACT. Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples. Recent works suggest that these estimators are more statistically efficient than plug-in (linear programming-based) OT estimators when comparing probability measures in high dimensions~\citep{Vacher-2021-Dimension}. Unfortunately, that statistical benefit comes at a very steep computational price: because their computation relies on the short-step interior-point method (SSIPM), which comes with a large iteration count in practice, these estimators quickly become intractable w.r.t. sample size $n$. To scale these estimators to larger $n$, we propose a nonsmooth fixed-point model for the kernel-based OT problem, and show that it can be efficiently solved via a specialized semismooth Newton (SSN) method: by exploiting the problem's structure, we show that the per-iteration cost of performing one SSN step can be significantly reduced in practice. We prove that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions. We show substantial speedups over SSIPM on both synthetic and real datasets.
A simple uniformly optimal method without line search for convex optimization
ABSTRACT. Line search (or backtracking) procedures have been widely employed in first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., the Lipschitz constant). In this paper, we show that line search is superfluous in attaining the optimal rate of convergence for solving a convex optimization problem whose parameters are not given a priori. In particular, we present a novel accelerated gradient descent type algorithm called the auto-conditioned fast gradient method (AC-FGM) that can achieve an optimal O(1/k^2) rate of convergence for smooth convex optimization without requiring an estimate of the global Lipschitz constant or the employment of line search procedures. We then extend AC-FGM to solve convex optimization problems with Hölder continuous gradients and show that it automatically achieves the optimal rates of convergence uniformly for all problem classes with the desired accuracy of the solution as the only input. Finally, we report some encouraging numerical results that demonstrate the advantages of AC-FGM over previously developed parameter-free methods for convex optimization.
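For contrast, here is a minimal sketch (not AC-FGM) of the classical accelerated gradient method that attains the same O(1/k^2) rate but requires the Lipschitz constant L as input, which is precisely the ingredient AC-FGM dispenses with; the quadratic test problem and iteration count are illustrative.

```python
import numpy as np

def nesterov_agd(grad, x0, L, iters=200):
    """Classical accelerated gradient descent: needs the Lipschitz constant L up front."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = y - grad(y) / L                       # gradient step from the extrapolated point
        t_next = (1 + np.sqrt(1 + 4 * t**2)) / 2
        y = x_next + (t - 1) / t_next * (x_next - x)   # momentum extrapolation
        x, t = x_next, t_next
    return x

# Illustrative smooth convex problem: f(x) = 0.5 x^T Q x with Q PSD, L = lambda_max(Q).
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 10))
Q = M.T @ M
L = np.linalg.eigvalsh(Q).max()
x_final = nesterov_agd(lambda x: Q @ x, np.ones(10), L)
print(np.linalg.norm(Q @ x_final))  # gradient norm should be small
```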
Quantum Algorithms for Scientific Computation and Optimization Problems
ABSTRACT. Quantum computation can dramatically outperform classical computation for certain tasks, such as integer factoring. There remain grand challenges in developing quantum algorithms for killer apps such as protein design, epidemic prevention, climate change, and training large language models. This talk aims to cover recent advances in quantum algorithms for scientific computation and optimization problems, and to discuss the prospects of quantum computation towards end-to-end applications.
Quantum communication complexity of linear regression
ABSTRACT. Dequantized algorithms show that quantum computers do not have exponential speedups for many linear algebra problems in terms of time and query complexity. In this work, we show that quantum computers can have exponential speedups in terms of communication complexity for some fundamental linear algebra problems. We mainly focus on solving linear regression and Hamiltonian simulation. In the quantum case, the task is to prepare the quantum state of the result. To allow for a fair comparison, in the classical case the task is to sample from the result. We investigate these two problems in two-party and multiparty models, propose near-optimal quantum protocols and prove quantum/classical lower bounds. In this process, we propose an efficient quantum protocol for quantum singular value transformation, which is a powerful technique for designing quantum algorithms. As a result, for many linear algebra problems where quantum computers lose exponential speedups in terms of time and query complexity, it is possible to have exponential speedups in terms of communication complexity.
Quantum Speedups for Zero-Sum Games via Improved Dynamic Gibbs Sampling
ABSTRACT. We give a quantum algorithm for computing an $\epsilon$-approximate Nash equilibrium of a zero-sum game in an $m \times n$ payoff matrix with bounded entries. Given a standard quantum oracle for accessing the payoff matrix, our algorithm runs in time $\widetilde{O}(\sqrt{m + n}\cdot \epsilon^{-2.5} + \epsilon^{-3})$ and outputs a classical representation of the $\epsilon$-approximate Nash equilibrium. This improves upon the best prior quantum runtime of $\widetilde{O}(\sqrt{m + n} \cdot \epsilon^{-3})$ obtained by [van Apeldoorn and Gilyen, 2019] and the classical $\widetilde{O}((m + n) \cdot \epsilon^{-2})$ runtime due to [Grigoriadis and Khachiyan, 1995] whenever $\epsilon = \Omega((m +n)^{-1})$. We obtain this result by designing new quantum data structures for efficiently sampling from a slowly-changing Gibbs distribution.
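As a classical point of reference, a minimal sketch of multiplicative-weights dynamics for a zero-sum matrix game, whose averaged strategies converge to an approximate Nash equilibrium; the step size, iteration count, and random payoff matrix are illustrative and this is not the quantum algorithm described above.

```python
import numpy as np

def mwu_zero_sum(A, iters=2000, eta=0.05):
    """Both players run multiplicative weights; the averaged strategies approximate a Nash equilibrium."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n          # row player minimizes x^T A y, column maximizes
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        x = x * np.exp(-eta * (A @ y)); x /= x.sum()
        y = y * np.exp(eta * (A.T @ x)); y /= y.sum()
        x_avg += x; y_avg += y
    return x_avg / iters, y_avg / iters

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(50, 80))
x, y = mwu_zero_sum(A)
print((A.T @ x).max() - (A @ y).min())  # duality gap of the averaged strategies (should be small)
```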
Quantum Lower Bounds for Finding Stationary Points of Nonconvex Functions
ABSTRACT. Quantum algorithms for optimization problems are of general interest. Despite recent progress in classical lower bounds for nonconvex optimization under different settings and quantum lower bounds for convex optimization, quantum lower bounds for nonconvex optimization are still widely open. In this paper, we conduct a systematic study of quantum query lower bounds on finding $\epsilon$-approximate stationary points of nonconvex functions, and we consider the following two important settings: 1) having access to $p$-th order derivatives; or 2) having access to stochastic gradients. The classical query lower bound is $\Omega\big(\epsilon^{-\frac{1+p}{p}}\big)$ for the first setting, and $\Omega(\epsilon^{-4})$ for the second setting (or $\Omega(\epsilon^{-3})$ if the stochastic gradient function is mean-squared smooth). In this paper, we extend all these classical lower bounds to the quantum setting. They match the classical algorithmic results, respectively, demonstrating that there is no quantum speedup for finding $\epsilon$-stationary points of nonconvex functions with $p$-th order derivative inputs or stochastic gradient inputs, whether with or without the mean-squared smoothness assumption. Technically, our quantum lower bounds are obtained by showing that the sequential nature of classical hard instances in all these settings also applies to quantum queries, preventing any quantum speedup other than revealing information about the stationary points sequentially.
Determining Sharp Proximity Bounds For Low Row Rank and $\Delta$-Modularity
ABSTRACT. A question of great interest in integer programming is that of proximity bounding: how large can the distance be between an optimal vertex solution of the linear programming relaxation and the closest optimal solution of the integer linear program? The proximity bound has many applications; for example, it can be used to bound the integrality gap and estimate nearby integer solutions in a dynamic programming framework. The maximum absolute subdeterminant of the coefficient matrix is known to be an important parameter for measuring the proximity bound. Here, we consider the integer program in the standard form $\max\{c^\top x:~Ax=b, x\ge0\}$ with a bounded feasible region, where the coefficient matrix $A$ is of full row rank $m$ and has a maximum order-$m$ absolute subdeterminant of $\Delta$ (we call such a rank-$m$ matrix $A$ $\Delta$-modular). Using Wolsey's $b$-hull results, we propose a method to determine the proximity bound for a fixed $A$ and parametric $b,c$. Furthermore, by adapting an algorithm by Averkov and Schymura for classifying such rank-$m$ $\Delta$-modular matrices up to unimodular transformation, we calculate sharp proximity bounds for small $m$ and $\Delta$ using \texttt{SageMath}.
ABSTRACT. Unlike matrix completion, tensor completion does not have an algorithm that is known to achieve the information-theoretic sample complexity rate. We develop the first practical algorithm to attain this rate. Even the feasibility variant of low-rank tensor completion is computationally challenging, as determining the rank of a generic tensor is NP-hard. To address this issue, our algorithm relies on a modified Frank-Wolfe approach that ensures global convergence, whereby separation is conducted over a polyhedron that is implicitly described via integer programming. The implementation is able to handle a variety of instances involving up to several million entries.
ABSTRACT. A conflict graph represents logical relations between binary variables, and effective use of the graph can significantly accelerate branch-and-cut solvers for mixed-integer programming (MIP). In this paper we develop efficient parallel algorithms for the various components of conflict graph management: conflict detection; maximal clique generation; clique extension; and clique merging. We leverage parallel computing in order to intensify computational effort on the conflict graph, thereby generating a much larger pool of cutting planes than what can be practically achieved in serial. Computational experiments demonstrate near-linear (i.e., ideal) speedups using up to 64 cores when there is high computational load from the conflict graph. Moreover, the expanded pool of cuts enabled by parallel computing leads to substantial reductions in total MIP solve time, especially for more challenging cases.
ABSTRACT. We propose an extension of matrix games where the row player may select rows and remove columns, subject to a budget constraint. We present an exact mixed-integer linear programming (MILP) formulation for the problem, provide analytical results concerning its solution, and discuss applications in the security domain. Our computational experiments show that heuristic approaches on average obtain suboptimal solutions with at least a 20% relative gap to those obtained by our MILP formulation.
Adaptive Surrogate Modeling for Trajectory Optimization with Model Inexactness
ABSTRACT. In many applications one must compute optimal trajectories from imperfect knowledge of the dynamics. For example, solving trajectory optimization problems for hypersonic vehicles requires the computation of lift and drag coefficients at many flight configurations. Determining these coefficients requires expensive high-fidelity computations using detailed representations of the hypersonic vehicle. This talk proposes the use of computationally inexpensive adaptive Gaussian process models constructed from high-fidelity samples to approximate the components of the dynamics that are expensive to evaluate. To reduce the effect of model errors on the optimal trajectory, the current Gaussian process model is updated as needed at the cost of evaluating the components of the dynamics at a small number of additional sample points. First, the optimal control problem is solved using the mean of the current Gaussian process model to represent the dynamics. Next, sensitivity analysis is combined with properties of the Gaussian process model, such as its variance, to determine whether the Gaussian process model needs to be updated, and if so, at which samples the dynamics should be evaluated to update the Gaussian process model. This talk outlines our current model refinement procedure and demonstrates its performance on a trajectory optimization problem for a hypersonic vehicle with lift and drag models that are known, but expensive to evaluate.
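A minimal sketch of the core idea, assuming scikit-learn: fit a Gaussian process to expensive samples of one dynamics component (here a hypothetical drag-coefficient function) and use the predictive variance to decide where an additional high-fidelity evaluation is needed. The refinement rule, tolerance, and data are illustrative, not the talk's full sensitivity-based procedure.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_drag(x):                      # stand-in for a high-fidelity aerodynamic computation
    return np.sin(3 * x) + 0.5 * x

X = np.linspace(0.0, 2.0, 6).reshape(-1, 1)                 # initial high-fidelity samples
y = expensive_drag(X).ravel()
gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True).fit(X, y)

# Candidate flight configurations visited by the trajectory optimizer.
X_cand = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
mean, std = gp.predict(X_cand, return_std=True)

# Refinement rule: if the predictive uncertainty is too large along the candidate
# trajectory, evaluate the expensive model there and refit the surrogate.
tol = 0.05
if std.max() > tol:
    x_new = X_cand[np.argmax(std)]
    X = np.vstack([X, x_new.reshape(1, -1)])
    y = np.append(y, expensive_drag(x_new))
    gp.fit(X, y)
print("largest predictive std after refinement:", gp.predict(X_cand, return_std=True)[1].max())
```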
ABSTRACT. Chance constraints provide a principled framework to mitigate the risk of high-impact extreme events by modifying the controllable properties of a system. The low probability and rare occurrence of such events, however, impose severe sampling and computational requirements on classical solution methods that render them impractical. This work proposes a novel sampling-free method for solving rare chance constrained optimization problems affected by uncertainties that follow general Gaussian mixture distributions. By integrating modern developments in large deviation theory with tools from convex analysis and bilevel optimization, we propose tractable formulations that can be solved by off-the-shelf solvers. Our formulations enjoy several advantages compared to classical methods: their size and complexity are independent of event rarity, they do not require linearity or convexity assumptions on system constraints, and, under easily verifiable conditions, they serve as safe conservative approximations or asymptotically exact reformulations of the true problem. Computational experiments on linear, nonlinear, and PDE-constrained problems from applications in portfolio management, structural engineering, and fluid dynamics illustrate the broad applicability of our method and its advantages over classical sampling-based approaches in terms of both accuracy and efficiency.
Fast iterative solvers for initial value control problems with application to diffeomorphic image registration
ABSTRACT. We consider a variational optimization problem governed by transport equations for image intensities and a geodesic equation on the group of diffeomorphisms. The underlying inverse problem is nonlinear, ill-posed, and infinite-dimensional in principle, leading to high-dimensional, ill-conditioned inversion operators after discretization. Developing effective numerical methods to solve the associated optimality conditions poses significant mathematical and computational challenges.
Our contributions are the design, implementation, and analysis of effective numerical algorithms. We will discuss fast numerical methods for evaluating forward and adjoint operators, describe effective strategies for numerical optimization, and propose several methods for preconditioning. We test the performance of the designed numerical methods on synthetic and real-world data.
Neural Network Approaches for Parameterized Optimal Control
ABSTRACT. This presentation focuses on optimal control problems with unknown or uncertain parameters. We aim to streamline offline solutions across relevant parameter sets for quick decision-making and adaptability in the online stage. Using neural networks to represent policies, we compare a model-based approach, leveraging system dynamics, and an actor-critic reinforcement learning method for data-driven policy approximation. Through a two-dimensional convection-diffusion equation example, we evaluate accuracy, efficiency, and scalability. The model-based method proves more accurate and reduces partial differential equation solves significantly.
A Practical and Optimal First-Order Method for Large-Scale Convex Quadratic Programming
ABSTRACT. Convex quadratic programming (QP) is an important class of optimization problems with wide applications in practice. The classic QP solvers are based on either the simplex or the barrier method, both of which suffer from scalability issues because their computational bottleneck is solving linear equations. In this paper, we design and analyze a first-order method called the restarted accelerated primal-dual hybrid gradient method for QP, whose computational bottleneck is matrix-vector multiplication. We show that the proposed algorithm has a linear convergence rate when solving generic QP, and the obtained linear rate is optimal among a wide class of primal-dual methods. Furthermore, we connect the linear rate with a sharpness constant of the KKT system of the QP, which is a standard quantity for measuring the hardness of a continuous optimization problem. Numerical experiments on a standard QP benchmark set showcase the advantage of the proposed algorithm compared to its first-order counterparts.
Sublevel-Set Geometry and the Performance of restarted-PDHG for Linear and Conic Optimization
ABSTRACT. The restarted primal-dual hybrid gradient method (rPDHG) -- with heuristic enhancements and GPU implementation -- has been very successful in solving huge-scale LP problems; however, its application to more general convex conic optimization problems is not so well studied. Here we analyze the theoretical and practical performance of rPDHG for general (convex) conic optimization. We show a relationship between the geometry of the primal-dual sublevel sets L_eps and the convergence rate of rPDHG. In particular, we show that the convergence rate of rPDHG is faster when there is a primal-dual sublevel set L_eps for which (i) L_eps is close (in Hausdorff distance) to the optimal solution set, and (ii) the ratio of the diameter to the width of L_eps is small. Our computational guarantees suggest new practical enhancements for rPDHG via conic rescaling and linear transformations based on approximate analytic centers. This is joint work with Robert Freund.
Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient
ABSTRACT. We present PDLP, a practical first-order method for linear programming (LP) that can solve to the high levels of accuracy that are expected in traditional LP applications. In addition, it can scale to very large problems because its core operation is matrix-vector multiplication. PDLP is derived by applying the primal-dual hybrid gradient (PDHG) method, popularized by Chambolle and Pock (2011), to a saddle-point formulation of LP. PDLP enhances PDHG for LP by combining several new techniques with older tricks from the literature; the enhancements include diagonal preconditioning, presolving, adaptive step sizes, and adaptive restarting. PDLP improves the state of the art for first-order methods applied to LP. We compare PDLP with SCS, an ADMM-based solver, on a set of 383 LP instances derived from MIPLIB 2017. With a target of 10^{-8} relative accuracy and a 1-hour time limit, PDLP achieves a 6.3x reduction in the geometric mean of solve times and a 4.6x reduction in the number of instances unsolved (from 227 to 49). Furthermore, we highlight standard benchmark instances and a large-scale application (PageRank) where our open-source prototype of PDLP, written in Julia, outperforms a commercial LP solver.
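A minimal numpy sketch of the core PDHG iteration on the LP saddle point min_{x>=0} max_y c^T x + y^T(b - Ax); the step sizes and toy instance are illustrative, and none of PDLP's enhancements (preconditioning, presolve, adaptive steps, restarts) are included.

```python
import numpy as np

def pdhg_lp(c, A, b, iters=20000):
    """Vanilla PDHG for min c^T x s.t. Ax = b, x >= 0 (no PDLP enhancements)."""
    m, n = A.shape
    step = 0.9 / np.linalg.norm(A, 2)     # tau = sigma = step satisfies tau*sigma*||A||^2 < 1
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        x_next = np.maximum(x - step * (c - A.T @ y), 0.0)   # primal step + projection onto x >= 0
        y = y + step * (b - A @ (2 * x_next - x))            # dual step with extrapolation
        x = x_next
    return x, y

# Toy LP: min x1 + 2*x2  s.t.  x1 + x2 = 1,  x >= 0  (optimum x = (1, 0)).
c = np.array([1.0, 2.0]); A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x, y = pdhg_lp(c, A, b)
print(x)   # approximately [1, 0]
```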
Improving the Geometry of Linear and Conic Optimization Problems for restarted-PDHG
ABSTRACT. Our recent research has shown that the performance of the restarted primal-dual hybrid gradient method (rPDHG) on linear and conic optimization problems depends on certain geometric condition measures of the primal-dual sublevel sets L_eps, namely the ratio of the diameter to the interior radius of L_eps. This ratio can take on extreme values and lead to poor performance of rPDHG both in theory and in practice, which raises the question of whether these condition measures can be improved via suitable linear transformations. In this talk we show how linear transformations -- including conic rescaling -- can markedly enhance the theoretical and practical convergence rate of rPDHG. We present extensive computational results showing that central-path-based rescaling can accelerate convergence to high-accuracy solutions and lead to more efficient methods for large-scale linear and conic optimization problems. This is joint work with Zikai Xiong.
Stochastic derivative-free optimization algorithms using random subspace strategies
ABSTRACT. We propose stochastic derivative-free optimization (DFO) algorithms that achieve scalability using random models and polling directions in low-dimensional random affine subspaces. The proposed trust-region and direct-search methods reduce per-iteration costs in terms of function evaluations, thus yielding strong performance on large-scale DFO problems. The dimension of these subspaces can be chosen via Johnson--Lindenstrauss transforms. The quality of the subspaces and the accuracies of estimates and models used by the algorithms hold with sufficiently high, but fixed, probabilities. Convergence and complexity results are obtained using Clarke nonsmooth analysis and martingale theory.
Branch-and-Model: A Derivative-free Global Optimization Algorithm
ABSTRACT. Derivative-free optimization (DFO) is an important class of optimization algorithms that solve problems using only objective function evaluations. In this work, we present a novel derivative-free global optimization algorithm, Branch-and-Model (BAM). The BAM algorithm utilizes a flexible partition scheme and model-based search techniques, which exploit local trends and speed up convergence in solution refinement. We show that the timing to start local refinement is critical and that clustering with the DBSCAN algorithm can help the local search start at a better time and identify promising regions for refinement search. Extensive computational experiments over 500 publicly available test problems show that BAM outperforms state-of-the-art DFO algorithms, especially for higher-dimensional problems.
First-order Trust-region Methods with Adaptive Sampling
ABSTRACT. We are interested in optimizing noisy black-box problems that are non-convex in expectation. Such settings occur in a multitude of engineering and machine learning applications. For noisy black boxes that provide direct derivative observations, we devise new variants of trust-region methods that use adaptive sampling schemes and adjusted acceptance criteria to improve the iteration and sample complexity of these algorithms. The resulting methods have clear, implementable guidelines for speedy and robust finite-time performance.
Enabling Scale-up through Multi-fidelity Adaptive Computing
ABSTRACT. We present ideas from our ongoing work in adaptive computing – an optimization framework that allows us to strategically deploy experiments and simulations at various fidelity levels to guide decision making. The framework aims to enable uncertainty-quantified scale-up of simulations and experiments, which introduces increased complexity. A key feature of the framework is the integration of user-specified local model trustworthiness estimates. Adaptive sampling strategies based on surrogate models allow us to optimally exploit the multiple fidelity levels of information and trustworthiness measures to arrive at the best decisions within a highly limited budget of objective function evaluations.
ABSTRACT. Wasserstein distributionally robust optimization (DRO) finds robust solutions by hedging against data perturbation specified by distributions in a Wasserstein ball. The robustness is linked to the regularization effect, which has been studied for continuous losses in various settings. However, existing results cannot be simply applied to the 0-1 loss, which is frequently seen in uncertainty quantification, classification, and chance-constrained programs. In this paper, we relate the Wasserstein DRO with 0-1 loss to a new regularization problem, in which the regularization term is a polynomial of the radius of the Wasserstein ball and the density around the decision boundary. Importantly, this result implies a qualitative difference between 0-1 losses and continuous losses in terms of radius selection: for most interesting cases, it suffices to choose a radius of an order smaller than that suggested by the root-n rule. Numerical experiments demonstrate the effectiveness of our implied radius selection rule.
A tractable smart predict-then-optimize algorithm beyond the realizable case
ABSTRACT. A common application of machine learning in decision-making is contextual optimization, where we build a machine learning model to predict the unknown parameters of the optimization problems from contextual information. For example, key cost parameters in traffic routing or electricity dispatch are not known a priori and crucially depend on contextual data such as weather and incidents. The "smart" predict-then-optimize approach (SPO) has become an attractive framework for designing prediction models that reduce decision error due to a prediction, rather than simply minimizing prediction error. However, this problem is highly nonconvex, and no provably tractable algorithms have been proposed for solving it in misspecified cases. In this talk, we will present our new work that proposes a tractable algorithmic framework for minimizing SPO loss. Theoretically, this algorithm allows us to find the best prediction within the function class even in misspecified cases. Our numerical experiments show that our algorithm can significantly improve the prediction compared to SPO+.
PAC-Bayesian Spectrally-Normalized Bounds for Adversarially Robust Generalization
ABSTRACT. Deep neural networks (DNNs) are vulnerable to adversarial attacks. It has been found empirically that adversarially robust generalization is crucial in establishing defense algorithms against adversarial attacks. Therefore, it is interesting to study the theoretical guarantees of robust generalization. This paper focuses on norm-based complexity, based on a PAC-Bayes approach (Neyshabur et al., 2018). The main challenge lies in extending the key ingredient, which is a weight perturbation bound in standard settings, to the robust settings. Existing attempts heavily rely on additional strong assumptions, leading to loose bounds. In this paper, we address this issue and provide a spectrally-normalized robust generalization bound for DNNs. Compared to existing bounds, our bound offers two significant advantages: first, it does not depend on additional assumptions; second, it is considerably tighter, aligning with the bounds of standard generalization. Therefore, our result provides a different perspective on understanding robust generalization: the mismatch terms between standard and robust generalization bounds shown in previous studies do not contribute to the poor robust generalization. Instead, these disparities are solely due to mathematical issues. Finally, we extend the main result to adversarial robustness against general non-$\ell_p$ attacks and other neural network architectures.
ABSTRACT. Despite the growing prevalence of artificial neural networks in real-world applications, their vulnerability to adversarial attacks remains a significant concern, which motivates us to investigate the robustness of machine learning models.
While various heuristics aim to optimize the distributionally robust risk using the $\infty$-Wasserstein metric, such a notion of robustness frequently encounters computational intractability. To tackle the computational challenge, we develop a novel approach to adversarial training that integrates divergence regularization into the distributionally robust risk function. This regularization brings a notable improvement in computation compared with the original formulation. We develop stochastic gradient methods with near-optimal sample complexity to solve this problem efficiently. Moreover, we establish the regularization effects and demonstrate that this formulation is asymptotically equivalent to a regularized empirical risk minimization~(ERM) framework, by considering various scaling regimes of the regularization parameter $\eta$ and robustness level $\rho$. These regimes yield gradient norm regularization, variance regularization, or a smoothed gradient norm regularization that interpolates between these extremes. We numerically validate our proposed method in supervised learning and reinforcement learning applications and showcase its state-of-the-art performance against various adversarial attacks.
Coordinate Partitioning for Difficult Euclidean Max-Sum Diversity Problems
ABSTRACT. We present a new method for solving difficult Euclidean max-sum diversity problems. Our approach involves constructing a new set of locations whose squared Euclidean distances equal those of the original. Using squared distances allows the objective function to be broken up into the sum of pairwise distances within each coordinate. We then use a partition set of the coordinates to enable a functional decomposition of the objective. Each functional component is expected to be simpler than the original, and therefore easier to approximate via cutting plane methods. This allows the method to be extremely effective at maximising Euclidean dissimilarity between data points with large feature sets. We prove the global convergence of our new approach and introduce several partitioning strategies. Furthermore, we show how a principal component analysis of coordinate influence can be conducted with minimal extra computation, the results of which can then be used to guide the partitioning process. Extensive numerical results prove the efficiency of the partitioned cutting plane method, with the algorithm able to solve large, 20-coordinate problems of up to 1000 locations. Finally, we introduce a new class of challenging diversity problems, characterised by locations situated on the edge of a ball.
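A minimal numerical illustration of the decomposition the abstract relies on: the max-sum objective with squared Euclidean distances separates exactly into contributions from any partition of the coordinates (numpy; the random data and block count are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))          # 50 locations, 20 coordinates

def sq_maxsum(Y):
    """Sum of pairwise squared Euclidean distances over the rows of Y."""
    G = Y @ Y.T
    sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G
    return np.triu(sq, 1).sum()

# Partition the coordinates into blocks; the objective decomposes exactly across blocks.
blocks = np.array_split(np.arange(X.shape[1]), 4)
total = sq_maxsum(X)
decomposed = sum(sq_maxsum(X[:, idx]) for idx in blocks)
print(np.isclose(total, decomposed))       # True: the objective separates across coordinate blocks
```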
Boosted Difference of Convex Algorithms for Clustering with Constraints
ABSTRACT. This talk aims to investigate the effectiveness of the recently proposed Boosted Difference of Convex functions Algorithm (BDCA) when applied to clustering with constraints and set clustering with constraints problems. We present the mathematical basis for the BDCA and the Difference of Convex functions Algorithm (DCA), along with a penalty method based on distance functions. We then present algorithms for solving these problems and implement them computationally. We compare against previous examples from the literature and provide new experiments to test the algorithms. We find that the BDCA method converges in fewer iterations than the corresponding DCA-based method. In addition, BDCA yields faster CPU running times in all tested problems.
Efficient relax-and-cut separation in a branch-and-cut solver
ABSTRACT. Separation in a branch-and-cut framework, i.e., the construction of valid inequalities cutting off the current relaxation's fractional solution, is an essential component of modern mixed-integer solvers. It is typically performed in rounds, i.e., the cuts generated in a separation call are added to the relaxation before the next separation call. This process has potential drawbacks: (1) the generation of higher-rank cuts, which can cause numerical trouble in the solving process for cuts such as Gomory mixed-integer cuts, and (2) the need for multiple rounds of separation to achieve dual bound improvement in the presence of dual degeneracy. Although multiple lines of work on the relax-and-cut framework address the first drawback, the second drawback has received less attention despite being critical to the usefulness of separation in solvers.
Our relax-and-cut separation addresses both of these drawbacks. We will discuss the general relax-and-cut framework, the specific version of Fischetti and Salvagnin (2011) upon which we build our approach, and our algorithmic enhancements inspired by techniques from nonlinear optimization. We implemented this in SCIP, a state-of-the-art open-source mixed-integer solver. Computational experiments on the MIPLIB 2017 benchmark dataset show improvements in solving time and number of nodes, especially for harder instances.
Network Design for the Traffic Assignment Problem with Mixed-Integer Frank-Wolfe
ABSTRACT. We tackle the network design problem for centralized traffic assignment, which can be cast as a mixed-integer convex optimization (MICO) problem. For this task, we propose different formulations and solution methods in both a deterministic and a stochastic setting in which the demand is unknown in the design phase. We leverage the recently proposed Boscia framework, which can solve MICO problems when the main nonlinearity stems from a differentiable objective function. Boscia tackles these problems by branch-and-bound with continuous relaxations solved approximately with Frank-Wolfe algorithms. We compare different linear relaxations and the corresponding subproblems solved by Frank-Wolfe, and alternative problem formulations to identify the situations in which each performs best.
Our experiments evaluate the different approaches on instances from the Transportation Networks library and highlight the suitability of the mixed-integer Frank-Wolfe algorithm for this problem. In particular, we find that the Boscia framework is well suited to this problem and that a mixed-integer linear Frank-Wolfe subproblem performs well for the deterministic case, while a penalty-based approach, with decoupled feasible regions for the design and flow variables, dominates other approaches for stochastic instances with many scenarios.
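For context, a minimal sketch of a vanilla Frank-Wolfe iteration of the kind used to solve continuous relaxations, here over a probability-simplex feasible region with a simple quadratic objective (illustrative only; Boscia combines such iterations with branch-and-bound and problem-specific linear minimization oracles).

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, iters=500):
    """Vanilla Frank-Wolfe over the probability simplex: the LMO returns a vertex e_i."""
    x = x0.copy()
    for k in range(iters):
        g = grad(x)
        s = np.zeros_like(x); s[np.argmin(g)] = 1.0   # linear minimization oracle over the simplex
        gamma = 2.0 / (k + 2)                          # standard open-loop step size
        x = (1 - gamma) * x + gamma * s
    return x

# Illustrative objective: f(x) = 0.5 ||x - t||^2 minimized over the simplex.
t = np.array([0.6, 0.3, -0.2, 0.1])
x = frank_wolfe_simplex(lambda x: x - t, np.ones(4) / 4)
print(x.round(3))
```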
Joint Expansion Planning of Power and Water Distribution Networks
ABSTRACT. This research explores the joint expansion planning of power and water distribution networks, which exhibit interdependence at various levels. We specifically focus on the dependency arising from the power consumption of pumps and develop models to integrate new components into existing networks seamlessly. Subsequently, we formulate the joint expansion planning as a Mixed Integer Nonlinear Program (MINLP). By applying this MINLP to a small-scale test network, we demonstrate the advantages of combining expansion planning, including cost savings and reduced redundancy, compared to independently expanding power and water distribution networks. Then, we will briefly discuss the computational challenges involved in solving the problem on large-scale networks and the potential methods to tackle these challenges.
Designing Resilient Relay Logistics Networks Using k Shortest Paths
ABSTRACT. We consider the problem of designing resilient relay logistics networks by selecting hub locations to minimize the total demand-weighted distance of the k-shortest paths between each origin-destination pair. We model this problem using a path-based integer-programming formulation and leverage its structure to devise two tailored scalable solution methodologies based on Benders decomposition and branch-and-price. To showcase the applicability of the developed model and solution methodologies, we design large-scale intercity parcel-delivery networks across Central China and compare their resilience to disruptions with that of traditional cost-optimized logistics networks.
Data Storage and Sharing Scheme Based on Double Blockchain and Proxy Re-Encryption
ABSTRACT. Data diffusion has become a hot topic in most areas that involve the exchange of value between stakeholders. Information security and privacy play a considerable role in data processing, storage, and sharing. The purpose of this article is to present a decentralized data storage and sharing scheme that deals with the access delegation problem. The design securely manages private and sensitive information, enhances its confidentiality, integrity, and availability, and reinforces data access control. The proposed model combines the decentralized technology of Blockchain, which consolidates data integrity thanks to its groundbreaking features (i.e., immutability, traceability, anonymity, etc.), the proxy re-encryption (PRE) technique, which strengthens the confidentiality of shared data, and the storage framework Storj, which efficiently protects saved information (i.e., splitting, encrypting, etc.) and improves its availability. The paper begins by presenting related works and the security goals of the scheme, then defines the key concepts of the solution, describes the suggested storage and sharing model, presents its workflow, and announces future work.
ABSTRACT. This paper considers the online resource allocation problem, which generalizes many online problems. In this problem, a decision-maker has a limited amount of resources. Requests arrive sequentially, and the decision-maker needs to decide on an action for each request immediately upon arrival. An action generates a reward and consumes resources. The rewards and resource consumption can differ between each action and each request. Actions must be taken without knowledge of future requests. The objective is to maximize the total reward subject to resource constraints.
We consider the algorithms-with-predictions model and assume the shadow price is predicted a priori. The predictions can be erroneous. Our main contribution is an algorithm that takes a prediction of unknown quality as input and achieves asymptotically optimal performance under both request arrival models (stochastic and adversarial), without knowing the prediction quality or the request arrival model beforehand.
Spectral Bundle Methods For Primal And Dual Semidefinite Programs
ABSTRACT. The spectral bundle method developed by Helmberg and Rendl is well-established for solving large-scale semidefinite programs (SDPs) in the dual form, especially when the SDPs admit low-rank primal solutions. Under mild regularity conditions, a recent result by Ding and Grimmer has established fast linear convergence rates when the bundle method captures the rank of primal solutions.
In this paper, we present an overview and comparison of spectral bundle methods for solving both primal and dual SDPs. In particular, we introduce a new family of spectral bundle methods for solving SDPs in the primal form. The algorithm developments parallel those by Helmberg and Rendl, mirroring the elegant duality between primal and dual SDPs. The new family of spectral bundle methods also achieves linear convergence rates for primal feasibility, dual feasibility, and duality gap when the algorithm captures the rank of the dual solutions. Therefore, the original spectral bundle method by Helmberg and Rendl is well-suited for SDPs with low-rank primal solutions, while our new spectral bundle method works well for SDPs with low-rank dual solutions. These theoretical findings are supported by a range of large-scale numerical experiments. Finally, we demonstrate that our new spectral bundle method achieves state-of-the-art efficiency and scalability for solving polynomial optimization problems, compared to a set of baseline solvers: SDPT3, MOSEK, CDCS, and SDPNAL+.
A globally convergent difference-of-convex algorithmic framework and application to log-determinant optimization problems
ABSTRACT. The difference-of-convex algorithm (DCA) is a conceptually simple method for the minimization of (possibly) nonconvex functions that are expressed as the difference of two convex functions. At each iteration, DCA constructs a global overestimator of the objective and solves the resulting convex subproblem. Despite its conceptual simplicity, the theoretical understanding and algorithmic framework of DCA need further investigation. In this paper, global convergence of DCA at a linear rate is established under an extended Polyak--Łojasiewicz condition. The proposed condition holds for a class of DC programs with a bounded, closed, and convex constraint set, for which global convergence of DCA cannot be covered by existing analyses. Moreover, the DCProx computational framework is proposed, in which the DCA subproblems are solved by a primal--dual proximal algorithm with Bregman distances. With a suitable choice of Bregman distances, DCProx has simple update rules with cheap per-iteration complexity. As an application, DCA is applied to several fundamental problems in network information theory, for which no existing numerical methods are able to compute the global optimum. For these problems, our analysis proves the global convergence of DCA, and more importantly, DCProx solves the DCA subproblems efficiently. Numerical experiments are conducted to verify the efficiency of DCProx.
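A minimal one-dimensional-per-coordinate sketch of the DCA iteration for f = g - h: linearize h at the current point and solve the resulting convex subproblem. The toy decomposition below is illustrative and far simpler than the log-determinant problems treated in the paper.

```python
import numpy as np

# DC decomposition of the nonconvex f(x) = ||x||^2 - ||x||_1, with g(x) = ||x||^2, h(x) = ||x||_1.
def dca(x0, iters=20):
    x = x0.astype(float).copy()
    for _ in range(iters):
        v = np.sign(x)                 # a subgradient of h(x) = ||x||_1 at the current x
        # Convex subproblem: minimize g(x) - <v, x> = ||x||^2 - v^T x, solved in closed form.
        x = v / 2.0
    return x

x = dca(np.array([3.0, -1.5, 0.2]))
print(x)   # a DC critical point of f; here each nonzero coordinate lands at +/- 0.5
```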
Accelerating nuclear-norm regularized low-rank matrix optimization through Burer-Monteiro decomposition
ABSTRACT. This work proposes a rapid algorithm, BM-Global, for nuclear-norm-regularized convex and low-rank matrix optimization problems. BM-Global efficiently decreases the objective value via low-cost steps leveraging the nonconvex but smooth Burer-Monteiro (BM) decomposition, while effectively escaping the saddle points and spurious local minima ubiquitous in the BM form, thereby obtaining guarantees of fast convergence rates to the global optima of the original nuclear-norm-regularized problem through aperiodic inexact proximal gradient steps on it. The proposed approach adaptively adjusts the rank of the BM decomposition and can provably identify an optimal rank for the BM decomposition problem automatically in the course of optimization through tools of manifold identification. BM-Global hence also spends significantly less time on parameter tuning than existing matrix-factorization methods, which require an exhaustive search to find this optimal rank. Extensive experiments on real-world large-scale problems of recommendation systems, regularized kernel estimation, and molecular conformation confirm that BM-Global can indeed effectively escape spurious local minima at which existing BM approaches get stuck, and is an order of magnitude faster than state-of-the-art algorithms for low-rank matrix optimization problems involving a nuclear-norm regularizer.
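A minimal sketch of the underlying BM idea for a nuclear-norm-regularized matrix-completion objective, using the identity ||X||_* = min_{X=UV^T} (||U||_F^2 + ||V||_F^2)/2 and plain gradient steps on the factors (numpy; the rank, step size, and data are illustrative, and BM-Global's rank adaptation and proximal-gradient safeguards are omitted).

```python
import numpy as np

rng = np.random.default_rng(0)
n, r_true, r = 60, 3, 5
M = rng.standard_normal((n, r_true)) @ rng.standard_normal((r_true, n))   # low-rank ground truth
mask = rng.random((n, n)) < 0.3                                            # observed entries
lam, step = 0.1, 1e-3

U, V = 0.1 * rng.standard_normal((n, r)), 0.1 * rng.standard_normal((n, r))
for _ in range(5000):
    R = mask * (U @ V.T - M)              # residual on the observed entries only
    # Gradients of 0.5*||P_Omega(UV^T - M)||_F^2 + (lam/2)*(||U||_F^2 + ||V||_F^2)
    gU, gV = R @ V + lam * U, R.T @ U + lam * V
    U, V = U - step * gU, V - step * gV

rel_err = np.linalg.norm(mask * (U @ V.T - M)) / np.linalg.norm(mask * M)
print(rel_err)   # the relative residual on the observed entries should be small
```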
Uncertainty Quantification of Set-Membership Estimation in Control and Perception: Revisiting the Minimum Enclosing Ellipsoid
ABSTRACT. Set-membership estimation (SME) outputs a set estimator that is guaranteed to cover the ground truth. Such sets are, however, defined by (many) abstract (and potentially nonconvex) constraints and are therefore difficult to manipulate. We present tractable algorithms to compute simple and tight overapproximations of SME in the form of minimum enclosing ellipsoids (MEE). We first introduce the hierarchy of enclosing ellipsoids proposed by Nie and Demmel (2005), based on sums-of-squares relaxations, which asymptotically converges to the MEE of a basic semialgebraic set. This framework, however, struggles in modern control and perception problems due to computational challenges. We contribute three computational enhancements to make this framework practical, namely constraint pruning, a generalized relaxed Chebyshev center, and handling non-Euclidean geometry. We showcase numerical examples on system identification and object pose estimation. (Joint work with Yukai Tang and Jean-Bernard Lasserre)
A Low-Rank Augmented Lagrangian Method for Large-Scale Semidefinite Programming Based on a Hybrid Convex-Nonconvex Approach
ABSTRACT. This talk discusses the theory of a new first-order method for solving large-scale semidefinite programs (SDPs) with bounded domain. It is an inexact augmented Lagrangian (AL) method where the AL subproblems are solved by a novel hybrid low-rank method. In contrast to the classical low-rank method, the new method finds a near-optimal solution (with provable complexity bounds) of SDP instances satisfying strong duality. Computational results (to be presented in the next talk) comparing the new method to state-of-the-art solvers on several large SDP instances show that the former finds more accurate solutions in substantially less CPU time than the latter. For example, it can solve, in less than 30 minutes and under a $10^{-5}$ relative error condition, a maximum stable set SDP with a $10^6 \times 10^7$ matrix variable.
A Low-Rank Augmented Lagrangian Method for Large-Scale Semidefinite Programming Based on a Hybrid Convex-Nonconvex Approach: Part II
ABSTRACT. This talk discusses the computational aspects of a new first-order method for solving large-scale semidefinite programs (SDPs) with bounded domain. It is an inexact augmented Lagrangian (AL) method where the AL subproblems are solved by a novel hybrid low-rank method. In contrast to the classical low-rank method, the new method finds a near-optimal solution (with provable complexity bounds) of SDP instances satisfying strong duality. Computational results comparing the new method to state-of-the-art solvers on several large SDP instances show that the former finds more accurate solutions in substantially less CPU time than the latter. For example, it can solve, in less than 30 minutes and under a $10^{-5}$ relative error condition, a maximum stable set SDP with a $10^6 \times 10^7$ matrix variable.
An Adaptive Block Proximal ADMM for Weakly Convex, Linearly-Constrained Composite Functions
ABSTRACT. This talk presents an adaptive, cyclic block proximal version of the alternating direction method of multipliers (AP-ADMM) for solving linearly-constrained, weakly convex, composite optimization problems. This method is adaptive to all problem parameters, including smoothness and weak convexity constants. We assume that the smooth component of the objective is weakly convex and possibly nonseparable, while the nonsmooth component is convex, block-separable, and has easily computable block proximal maps. Each iteration of AP-ADMM consists of: (1) sequentially solving the sequence of all block proximal subproblems, and (2) performing full Lagrange multiplier and/or penalty parameter updates based on a novel update criterion. Without any rank assumptions on the constraint matrices, it is shown that AP-ADMM obtains an approximate first-order stationary point of the constrained problem in $\mathcal{O}(\varepsilon^{-3})$ iterations.
Problem-dependent convergence analysis for randomized linear gradient compression
ABSTRACT. In distributed optimization, the communication of model updates can be a performance bottleneck. Consequently, gradient compression has been proposed as a means of increasing optimization throughput. In general, due to information loss, compression introduces a penalty on the number of iterations needed to reach a solution. In this work, we investigate how the iteration penalty depends on the interaction between compression and problem structure, in the context of non-convex stochastic optimization. We focus on linear compression schemes, where compression and decompression can be expressed as multiplication with a random matrix. We consider several distributions of matrices, among them random orthogonal matrices and matrices with random Gaussian entries. We find that in each case, the impact of compression on convergence can be quantified in terms of the norm of the Hessian of the objective, using a norm defined by the compression scheme. The analysis reveals that in certain cases, compression can benefit from low-rank structure in the problem. In these cases, our bounds predict that the penalty introduced by compression is significantly reduced compared to worst-case bounds that only consider the compression level, ignoring problem data. We verify the theoretical findings on several real-world distributed machine learning training scenarios.
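A minimal sketch of the linear compression schemes analyzed, assuming numpy: a gradient g is compressed to Sg with a k x d random Gaussian matrix S and decompressed as S^T S g / k, which is an unbiased (but noisy) estimate of g; the dimensions and sample counts are illustrative.

```python
import numpy as np

def compress_decompress(g, k, rng):
    """Linear compression with a random Gaussian matrix: unbiased since E[S^T S] = k I."""
    d = g.shape[0]
    S = rng.standard_normal((k, d))
    return S.T @ (S @ g) / k              # only S g (k numbers) needs to be communicated

rng = np.random.default_rng(0)
g = rng.standard_normal(1000)
estimates = np.stack([compress_decompress(g, 100, rng) for _ in range(500)])
# The relative error of the averaged estimate shrinks as more samples are averaged (unbiasedness).
print(np.linalg.norm(estimates.mean(axis=0) - g) / np.linalg.norm(g))
```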
Optimal Minimax Optimization and Variational Inequalities with Separable Structure
ABSTRACT. We consider the problem of solving stochastic monotone variational inequalities with a separable structure using a stochastic first-order oracle. Building on standard extragradient for variational inequalities, we propose a novel algorithm—stochastic accelerated gradient-extragradient (AG-EG)—for strongly monotone variational inequalities (VIs). Our approach combines the strengths of extragradient and Nesterov acceleration. By showing that its iterates remain in a bounded domain and applying scheduled restarting, we prove that AG-EG has an optimal convergence rate for strongly monotone VIs. Furthermore, when specializing to the particular case of bilinearly coupled strongly-convex-strongly-concave saddle-point problems, including bilinear games, our algorithm achieves fine-grained convergence rates that match the respective lower bounds, with the stochasticity being characterized by an additive statistical error term that is optimal up to a constant prefactor.
Stochastic Controlled Averaging for Federated Learning with Communication Compression
ABSTRACT. Communication compression, a technique aiming to reduce the volume of information transmitted over the air, has gained great interest in Federated Learning (FL) for its potential to alleviate communication overhead. However, communication compression brings forth new challenges in FL due to the interplay of compression-incurred information distortion and inherent characteristics of FL such as partial participation and data heterogeneity. Despite recent developments, the potential of compressed FL approaches has not been fully exploited. The existing approaches either cannot accommodate arbitrary data heterogeneity or partial participation, or require stringent conditions on compression.
In this talk, we revisit the seminal stochastic controlled averaging method by proposing an equivalent but more efficient/simplified formulation with halved uplink communication costs. Building upon this implementation, we propose two compressed FL algorithms, SCALLION and SCAFCOM, to support unbiased and biased compression, respectively. Both the proposed methods outperform the existing compressed FL methods in communication and computation complexities. Moreover, SCALLION and SCAFCOM accommodate arbitrary data heterogeneity and do not make any additional assumptions on compression errors. Experiments verify the extraordinary performance of SCALLION and SCAFCOM.
On the Adversarial Robustness of Benjamini Hochberg
ABSTRACT. The Benjamini-Hochberg (BH) procedure is widely used to control the false discovery rate (FDR) in large-scale hypothesis testing. We investigate the effect that adversarial perturbations may have on this control; in particular, we demonstrate that the FDR guarantee can be badly broken with minimal perturbation to a single p-value, highlighting the fragility of the BH procedure. We focus on identifying the perturbation of a single p-value that generates the largest adversarially-adjusted FDR, and provide: (1) an efficient (linear-time) optimal algorithm; (2) non-asymptotic guarantees on the expected adversarial adjustment to FDR; and (3) numerical experiments on data sets to illustrate this effect. The takeaway is that the probabilistic dependence among the reject/accept decisions generated by the BH procedure can be hacked. Our technical analysis involves reframing the BH procedure as a ``balls into bins'' process, and drawing a connection to generalized ballot problems that facilitates an information-theoretic approach for deriving non-asymptotic lower bounds.
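For reference, a minimal sketch of the BH procedure being attacked, assuming numpy: sort the p-values, find the largest k with p_(k) <= k*alpha/m, and reject the k smallest; the p-values below are illustrative.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of hypotheses rejected by BH at level alpha."""
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    below = sorted_p <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])          # largest index passing the BH threshold
        reject[order[: k + 1]] = True           # reject the k+1 smallest p-values
    return reject

pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.74])
print(benjamini_hochberg(pvals, alpha=0.05))
```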
ABSTRACT. We consider the stochastic multi-armed bandit (MAB), which is equivalent to a Markov decision process with a single state and unknown costs. MAB models many data-driven decision-making problems, such as advertisement, hyperparameter tuning, and active learning of neural networks. To solve it, we present a mirror-descent type method and derive regret bounds. We provide numerical simulations to support our theoretical results.
Partially observable multi-agent RL with (quasi-)efficiency: The blessing of information sharing
ABSTRACT. We study provable multi-agent reinforcement learning (MARL) in the general framework of partially observable stochastic games (POSGs). To circumvent the known hardness results and the use of computationally intractable oracles, we advocate leveraging the potential information sharing among agents, a common practice in empirical MARL and a standard model for multi-agent control systems with communications. We first establish several computational complexity results to justify the necessity of information sharing, as well as the observability assumption that has enabled quasi-efficient single-agent RL with partial observations, for computational efficiency in solving POSGs. We then propose to further approximate the shared common information to construct an approximate model of the POSG, in which planning an approximate equilibrium (in terms of solving the original POSG) can be quasi-efficient, i.e., of quasi-polynomial time, under the aforementioned assumptions. Furthermore, we develop a partially observable MARL algorithm that is both statistically and computationally quasi-efficient. We hope our study may open up the possibilities of leveraging and even designing different information structures for developing both sample- and computation-efficient partially observable MARL.
ABSTRACT. Learning in multi-player games captures many modern machine learning applications, ranging from generative adversarial networks and adversarial training to robust optimization and multi-agent reinforcement learning. Understanding last-iterate convergence in games is crucial since the last iterate characterizes the stability of the learning process and is widely used in practice.
In this talk, we first focus on two fundamental algorithms—the Extra Gradient (EG) method [Korpelevich, 1976] and the Optimistic Gradient (OG) method [Popov, 1980]. Although these two algorithms have been popular and extensively studied in practice, the last-iterate convergence rates for both remained an open question. We resolve this long-standing open question by showing that both EG and OG exhibit a tight $O(1/\sqrt{T})$ last-iterate convergence rate in monotone games, which contain convex-concave min-max optimization as a special case. In the second part of the talk, we present the first set of optimal first-order methods that achieve accelerated $O(1/T)$ last-iterate convergence rates. These results hold for the more general comonotone games that cover a class of structured nonconvex-nonconcave min-max optimization problems. Our proof relies on the use of the tangent residual, a new convergence measure, and an accompanying sum-of-squares programming-based analysis. Finally, we discuss some recent progress on the last-iterate convergence for the Optimistic Multiplicative Weights Update (OMWU) algorithm.
Efficiently Computing Nash Equilibria in Adversarial Team Markov Games
ABSTRACT. Computing Nash equilibrium policies is a central problem in multi-agent reinforcement learning that has received extensive attention both in theory and in practice. However, provable guarantees have been thus far either limited to fully competitive or cooperative scenarios or impose strong assumptions that are difficult to meet in most practical applications. In this work, we depart from those prior results by investigating infinite-horizon \emph{adversarial team Markov games}, a natural and well-motivated class of games in which a team of identically-interested players -- in the absence of any explicit coordination or communication -- is competing against an adversarial player. This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games, and serves as a step to model more realistic strategic interactions that feature both competing and cooperative interests. Our main contribution is the first algorithm for computing stationary ϵ-approximate Nash equilibria in adversarial team Markov games with computational complexity that is polynomial in all the natural parameters of the game, as well as 1/ϵ. The proposed algorithm is particularly natural and practical, and it is based on performing independent policy gradient steps for each player in the team, in tandem with best responses from the side of the adversary; in turn, the policy for the adversary is then obtained by solving a carefully constructed linear program. Our analysis leverages non-standard techniques to establish the KKT optimality conditions for a nonlinear program with nonconvex constraints, thereby leading to a natural interpretation of the induced Lagrange multipliers. Along the way, we significantly extend an important characterization of optimal policies in adversarial (normal-form) team games due to Von Stengel and Koller (GEB `97).
High probability and risk-averse guarantees for stochastic saddle point problems
ABSTRACT. We consider stochastic strongly-convex-strongly-concave (SCSC) saddle point (SP) problems which frequently arise in applications ranging from distributionally robust learning to game theory and fairness in machine learning. We focus on the recently developed stochastic accelerated primal-dual algorithm (SAPD), which admits optimal complexity in several settings as an accelerated algorithm. We provide high probability guarantees for convergence to a neighborhood of the saddle point that reflects accelerated convergence behavior. We also provide an analytical formula for the limiting covariance matrix of the iterates for a class of stochastic SCSC quadratic problems where the gradient noise is additive and Gaussian. This allows us to develop lower bounds for this class of quadratic problems which show that our analysis is tight in terms of the high probability bound dependency to the parameters. We also provide a risk-averse convergence analysis characterizing the ``Conditional Value at Risk'', the ``Entropic Value at Risk'', and the χ2-divergence of the distance to the saddle point, highlighting the trade-offs between the bias and the risk associated with an approximate solution obtained by terminating the algorithm at any iteration.
An Exponential Quantum Advantage for Approximating Directed Max Cut in Graph Streams
ABSTRACT. While the search for quantum advantage typically focuses on speedups in execution time, quantum algorithms also offer the potential for advantage in space complexity. Previous work has shown such advantages for data stream problems, in which elements arrive and must be processed sequentially without random access, but these have been restricted to specially-constructed problems [Le Gall, SPAA `06] or polynomial advantage [Kallaugher, FOCS `21]. We will present an exponential quantum space advantage for approximating the maximum directed cut problem. This is the first known exponential quantum space advantage for any natural streaming problem. This also constitutes the first unconditional exponential quantum resource advantage for approximating a discrete optimization problem in any setting.
Universal Quantum Speedup for Branch-and-Bound, Branch-and-Cut, and Tree-Search Algorithms
ABSTRACT. Mixed Integer Programs (MIPs) model many optimization problems of interest in Computer Science, Operations Research, and Financial Engineering. Solving MIPs is NP-Hard in general, but several solvers have found success in obtaining near-optimal solutions for problems of intermediate size. Branch-and-Cut algorithms, which combine Branch-and-Bound logic with cutting-plane routines, are at the core of modern MIP solvers. Montanaro proposed a quantum algorithm with a near-quadratic speedup compared to classical Branch-and-Bound algorithms in the worst case, when every optimal solution is desired. In practice, however, a near-optimal solution is satisfactory, and by leveraging tree-search heuristics to search only a portion of the solution tree, classical algorithms can perform much better than the worst-case guarantee. In this paper, we propose a quantum algorithm, Incremental-Quantum-Branch-and-Bound, with universal near-quadratic speedup over classical Branch-and-Bound algorithms for every input, i.e., if classical Branch-and-Bound has complexity $Q$ on an instance that leads to solution depth d, Incremental-Quantum-Branch-and-Bound offers the same guarantees with a complexity of $\tilde{O}(\sqrt{Q}d)$. Our results are valid for a wide variety of search heuristics, including depth-based, cost-based, and A∗ heuristics. Universal speedups are also obtained for Branch-and-Cut as well as heuristic tree search. Our algorithms are directly comparable to commercial MIP solvers, and guarantee near quadratic speedup whenever $Q \gg d$. We use numerical simulation to verify that $Q \gg d$ for typical instances of the Sherrington-Kirkpatrick model, Maximum Independent Set, and Portfolio Optimization; as well as to extrapolate the dependence of $Q$ on input size parameters. This allows us to project the typical performance of our quantum algorithms for these important problems.
Quantum algorithms and software for nonconvex optimization
ABSTRACT. Identifying efficient quantum algorithms for nonconvex optimization would be appealing for both theoretical and practical applications. A conventional approach to achieving quantum speedups in optimization relies on the quantum acceleration of intermediate steps in classical algorithms, while maintaining the overall algorithmic trajectory and solution quality unchanged. We propose Quantum Hamiltonian Descent (QHD), which is derived from the path integral of dynamical systems corresponding to certain classical optimization algorithms, as a genuine quantum approach to nonconvex optimization. Specifically, we prove that QHD can efficiently solve a family of nonconvex continuous optimization instances, each characterized by exponentially many local minima. Meanwhile, a comprehensive empirical study suggests that representative state-of-the-art classical optimization algorithms/solvers, including Gurobi, would require super-polynomial time to solve such instances.
We also propose several implementation techniques that enable us to run QHD on today's analog quantum computers for solving large-scale non-convex quadratic programming (QP) problems. Based on the QHD algorithm and its real-machine implementation, we have developed a user-friendly software package named QHDOPT, specifically designed for nonconvex optimization.
On Quantum Speedups for Nonconvex Optimization via Quantum Tunneling Walks
ABSTRACT. Classical algorithms are often not effective for solving nonconvex optimization problems where local minima are separated by high barriers. We explore possible quantum speedups for nonconvex optimization by leveraging the global effect of quantum tunneling. Specifically, we introduce a quantum algorithm termed the quantum tunneling walk (QTW) and apply it to nonconvex problems where local minima are approximately global minima. We show that QTW achieves quantum speedup over classical stochastic gradient descent (SGD) when the barriers between different local minima are high but thin and the minima are flat. Based on this observation, we construct a specific double-well landscape, where classical algorithms cannot efficiently hit one target well knowing the other well but QTW can when given proper initial states near the known well. Finally, we corroborate our findings with numerical experiments.
A characterization of simultaneous optimization, majorization, and (bi)submodular polyhedra
ABSTRACT. Motivated by resource allocation problems (RAPs) in power management applications, we investigate the existence of solutions to optimization problems that simultaneously minimize the class of Schur-convex functions, also called least majorized elements. For this, we introduce a generalization of majorization and least majorized elements, called (a,b)-majorization and least (a,b)-majorized elements, and characterize the feasible sets of problems that have such elements in terms of base and (bi)submodular polyhedra. Hereby, we also obtain new characterizations of these polyhedra that extend classical characterizations in terms of optimal greedy algorithms from the 1970s. We discuss the implications of our results for RAPs in power management applications and derive a new characterization of convex cooperative games and new properties of optimal estimators of specific regularized regression problems. In general, our results highlight the combinatorial nature of simultaneously optimizing solutions and provide a theoretical explanation for why such solutions generally do not exist.
On the structure of the inverse-feasible region of a multiobjective integer program
ABSTRACT. The inverse of an optimization problem asks, for fixed constraint data, what objective(s) make some target solution(s) optimal. We study this problem in the multiobjective setting, where (i) multiple, conflicting objectives are present, (ii) in place of a single optimal solution, there is a set of efficient solutions, and (iii) instead of single objective vectors, objective matrices are sought.
This working study establishes foundations for approaching an exact representation of the inverse feasible set. We provide two outer approximations for the inverse feasible set of objective matrices and prove fundamental properties about them.
The first approximation is based on stability (that one image should not dominate another), and the second is based on supportedness (existing on the boundary of the convex hull of the image set). The theory developed in this paper explores the limitations of single-objective optimization and highlights gaps in multiobjective tools, e.g., superadditive duality. We include novel visualization tools to ground our understanding in geometric intuition, and we conclude with our current understanding of the remaining representation gap.
Pareto Leap: An Algorithm for Biobjective Mixed-Integer Programming
ABSTRACT. Biobjective mixed-integer programs (BOMIPs) have become an important area of research due to the breadth of their applications. However, most results focus on the linear case. In this talk, we present an extension of a class of methods which use so-called tabu or "no-good" constraints from the linear to nonlinear case. This extension makes use of the nonconvex nature of the outcome sets of BOMIPs and "leaps" between subsets of the outcome set which contribute to the Pareto set, without ever finding dominated subsets. We present a discussion of the theory, an algorithmic framework, and apply the results to an example.
Relaxation and Duality for Multiobjective Integer Programming
ABSTRACT. Multiobjective integer programs (MOIPs), as the name suggests, simultaneously optimize multiple objective functions over a set of linear constraints and integer variables. Relaxations of MOIPs provide bounds on their solutions and play an important role in solution algorithms. In this talk, we present a Lagrangian relaxation of an MOIP and discuss its properties. Specifically, we show that there exist sparse Lagrangian multipliers that satisfy a complementary slackness property and generate tight relaxations at supported solutions. At unsupported solutions, where the convex hull relaxation is not tight either, we show that a Lagrangian relaxation improves upon the convex hull bounds. We use the Lagrangian relaxation to define a Lagrangian dual of an MOIP and establish weak duality. The dual is strong at supported solutions under certain conditions on the primal feasible region which are identical to the single-objective case.
Dynamic Parameter Estimation in Deterministic Dynamical Systems
ABSTRACT. Estimating and quantifying uncertainty in unknown system parameters from partial, noisy system observations remains a challenging inverse problem. In addition to constant parameters, a variety of systems stemming from real-world applications include unobservable parameters that change with time but have unknown evolution models. In this talk, we present a systematic approach to optimize deterministic dynamical systems of this type using hierarchical Bayesian filtering to estimate dynamic parameters given online system observations.
Optimization on Measure Spaces with Regular Divergences and Mixed Boundary Conditions
ABSTRACT. We consider a class of models of sandpile accumulation featuring optimization with gradient constraints. These lead (via duality) to variational problems involving partial differential operators on non-standard vector-valued measure spaces with divergences (represented by measures or by functions). Further, the latter include mixed boundary conditions that are established by means of a normal trace characterization of the vector measure. We determine existence of solutions, stability and strong duality results. We finalize the talk with numerical tests.
A Discontinuous Galerkin Method for Optimal Control Problem of the Obstacle Problem
ABSTRACT. In this talk, we provide quasi-optimal a priori error estimates for an optimal control problem constrained by an elliptic obstacle problem, where the finite element discretization is carried out using the symmetric interior penalty discontinuous Galerkin method. The main proofs are based on improved $L^2$-error estimates for the obstacle problem, the discrete maximum principle, and a well-known quadratic growth property. The standard (restrictive) assumptions on the mesh are not required here.
Risk-averse optimization using randomized quasi-Monte Carlo methods
ABSTRACT. We use randomized quasi-Monte Carlo (RQMC) methods to approximate risk-averse stochastic programs. For RQMC methods based on digital nets, we demonstrate the asymptotic consistency of these approximations using a strong law of large numbers. Moreover, we empirically compare the performance of RQMC, quasi-Monte Carlo (QMC), and Monte Carlo (MC) approximations for high-dimensional risk-averse stochastic problems and a PDE-constrained optimization problem under uncertainty. For these problems, we find that RQMC approaches based on Halton and Sobol sequences yield substantially smaller variance than MC and QMC methods.
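As a small self-contained illustration of the RQMC mechanics (not the paper's risk-averse stochastic programs or PDE-constrained experiments), scrambled Sobol' points from SciPy can be substituted for i.i.d. Monte Carlo draws when estimating a risk measure such as CVaR, and replicating over independent scramblings gives an empirical variance comparison.

```python
import numpy as np
from scipy.stats import norm, qmc

def cvar_estimate(points, alpha=0.95):
    """Estimate CVaR_alpha of a toy loss L(xi) = max(xi_1 + 0.5*xi_2, 0), xi ~ N(0, I),
    from uniform points in [0,1)^2 via the inverse-CDF transform."""
    xi = norm.ppf(points)
    loss = np.maximum(xi[:, 0] + 0.5 * xi[:, 1], 0.0)
    var = np.quantile(loss, alpha)
    return var + np.mean(np.maximum(loss - var, 0.0)) / (1 - alpha)

n, reps = 2 ** 12, 20
mc = [cvar_estimate(np.random.default_rng(r).uniform(size=(n, 2))) for r in range(reps)]
rqmc = [cvar_estimate(qmc.Sobol(d=2, scramble=True, seed=r).random(n)) for r in range(reps)]
print(np.std(mc), np.std(rqmc))    # RQMC replications typically show much smaller spread
```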
Computational and duality properties of Hoffman constants
ABSTRACT. A classical result of Alan Hoffman from 1952 provides a bound on the distance from a point to a non-empty polyhedron. The bound is stated in terms of a "Hoffman constant" associated to the linear inequalities that define the polyhedron.
This talk will present some recent developments concerning the computational and duality properties for Hoffman constants. We will discuss an algorithmic approach to compute Hoffman constants and illustrate its effectiveness in some interesting classes of problems. We will also show a novel "Hoffman duality identity" between some canonical primal and dual systems of linear inequality constraints defined by an m-by-n matrix.
cuPDLP.jl: A GPU Implementation of Restarted Primal-Dual Hybrid Gradient for Linear Programming in Julia
ABSTRACT. In this paper, we provide an affirmative answer to the long-standing question: are GPUs useful in solving linear programming? We present cuPDLP.jl, a GPU implementation of restarted primal-dual hybrid gradient (PDHG) for solving linear programming (LP). We show that this prototype implementation in Julia has numerical performance on standard LP benchmark sets comparable to that of Gurobi, a highly optimized implementation of the simplex and interior-point methods. Furthermore, we demonstrate the superior performance of cuPDLP.jl over its CPU counterpart. This demonstrates the power of using GPUs in optimization solvers.
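For orientation, the basic (unrestarted) PDHG iteration for a standard-form LP min c^T x s.t. Ax = b, x >= 0 takes only a few lines; the NumPy sketch below shows the skeleton on a toy instance. cuPDLP.jl builds restarts, adaptive step sizes, preconditioning, and GPU kernels on top of this kind of update; this is not its actual code.

```python
import numpy as np

def pdhg_lp(c, A, b, iters=20000):
    """Primal-dual hybrid gradient for min c'x s.t. Ax = b, x >= 0 (no restarts)."""
    m, n = A.shape
    eta = 0.9 / np.linalg.norm(A, 2)      # tau = sigma = eta so tau*sigma*||A||^2 < 1
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        x_new = np.maximum(x - eta * (c - A.T @ y), 0.0)   # projected primal step
        y = y + eta * (b - A @ (2 * x_new - x))            # dual step with extrapolation
        x = x_new
    return x, y

# toy LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0   (optimum x = (1, 0))
c = np.array([1.0, 2.0]); A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x, y = pdhg_lp(c, A, b)
print(x)   # approximately [1, 0]
```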
Worst-case analysis of restarted primal-dual hybrid gradient on totally unimodular linear programs
ABSTRACT. Recently, there has been increasing interest in using matrix-free methods to solve linear programs due to their ability to attack extreme-scale problems. Each iteration of these methods is a matrix-vector product, so they benefit from a low memory footprint and are parallelized easily. Restarted primal-dual hybrid gradient (PDHG) is a matrix-free method with good practical performance. Prior analysis showed that it converges linearly on linear programs where the linear convergence constant depends on the Hoffman constant of the KKT system.
We refine this analysis of restarted PDHG for totally unimodular linear programs. In particular, we show that restarted PDHG finds an $\epsilon$-optimal solution in $O( \ME \nrow^{2.5} \sqrt{\nnz} \log(\ME \ncol /\epsilon) )$ matrix-vector multiplies where $\nrow$ is the number of constraints, $\ncol$ the number of variables, $\nnz$ is the number of nonzeros in the constraint matrix, $\ME$ is the largest absolute coefficient in the right hand side or objective vector, and $\epsilon$ is the distance to optimality of the point outputted by PDHG.
A feasible method for general convex low-rank SDP problems
ABSTRACT. In this work, we consider the low rank decomposition (SDPR) of general convex semidefinite programming problems (SDP) that contain both a positive semidefinite matrix and a nonnegative vector as variables. We develop a rank-support-adaptive feasible method to solve (SDPR) based on Riemannian optimization. The method is able to escape from a saddle point to ensure its convergence to a global optimal solution for generic constraint vectors.
We prove its global convergence and local linear convergence without assuming that the objective function is twice differentiable. Due to the special structure of the low-rank SDP problem, our algorithm can achieve better iteration complexity than existing results for more general smooth nonconvex problems. In order to overcome the degeneracy issues of SDP problems, we develop two strategies based on random perturbation and dual refinement. These techniques enable us to solve some primal degenerate SDP problems efficiently, for example, Lov\'{a}sz theta SDPs. Our work is a step forward in extending the application range of Riemannian optimization approaches for solving SDP problems. Numerical experiments are conducted to verify the efficiency and robustness of our method.
Scalable Climate-Aware Power Grid Capacity Expansion using MPI-SPPY
ABSTRACT. In this talk, we describe a power grid capacity expansion model that simultaneously considers generation, transmission, and storage - leveraging a representative day formulation. Data projected from the US Department of Energy's E3SM climate model is used as a key input, considering several years of simulation output that is used to compute potential future renewables production and demand. The model is structurally equivalent to a two-stage stochastic mixed-integer program, which we solve using the open source Python-based mpi-sppy library. We describe challenges and their solutions utilizing mpi-sppy in solving this large-scale stochastic mixed-integer program, considering a ~10K nodal representation of the California electricity grid in the US. Leveraging high-performance computing, scenario-based decomposition (specifically progressive hedging), and lower and upper bounding strategies, we identify provably near-optimal solutions for instances with thousands of representative days in approximately ten hours of wall clock time.
A filter trust-region continuation method for contact optimization and complementarity problems
ABSTRACT. We present a globalization algorithm for using Newton's method to solve nonsmooth systems of equations, such as nonlinear complementarity problems and the optimality conditions of nonlinear continuously constrained optimization problems. We propose a non-interior smoothing continuation path that is tracked using trust-region techniques and multidimensional filters. We present extensive numerical results on problems reported as problematic in the literature, demonstrating the robustness and efficiency of the proposed algorithm. Performance on real-world contact mechanics problems will also be presented, with a focus on the high-performance computing aspects of this class of problems.
ABSTRACT. Nonlinear local optimization is a versatile and efficient tool for decision making in many operations and engineering contexts, but convergence of solvers for these problems is sensitive to problem formulation, scaling factors, and the initial guess. As there are limited analytical tools to determine the quality of formulations, scaling factors, and initial guesses, critical solver failures can be difficult to debug. In this work, we present tools developed to diagnose potential causes of non-convergence in nonlinear optimization problems. First, we introduce the Dulmage-Mendelsohn decomposition, which identifies under- and over-constrained subsystems of variables and constraints, and give examples where it identifies modeling errors or faulty assumptions. Second, we introduce the Institute for the Design of Advanced Energy Systems (IDAES) model diagnostics toolbox, a Python-based software tool for diagnosing issues with nonlinear optimization problems. The toolbox incorporates the Dulmage-Mendelsohn decomposition, checks for consistent units of measure, and checks for potentially undefined function evaluations, among other useful analyses. Finally, we propose a workflow for developing and maintaining reliable nonlinear optimization models. This presentation will be useful to cross-disciplinary users and developers of nonlinear optimization tools.
ABSTRACT. This talk will cover five years of R&D on the open-source modeling tool Gravity (github.com/coin-or/Gravity). We will discuss the lessons learned along the way and present some of the latest features including a solver for sparse mixed-integer semidefinite programming. We will also present two use cases: satellite collision avoidance and verification of neural networks.
ABSTRACT. When the data used for optimization and decision-making is uncertain or random, optimum solutions can have a disparate impact on different demographic groups. This impact arises from variations in the probability distributions of data corresponding to different groups or the accuracy of our estimates for these distributions. Such uncertainties can stem from the output of machine learning pipelines (e.g., a differentially private algorithm) or the stochastic nature of the problem (e.g., demands in inventory management). In this talk, we will explore these disparate impacts on budget allocation due to differentially private data and stock-outs in inventory management. We will introduce algorithms based on chance-constrained optimization to effectively mitigate such biases. Furthermore, we demonstrate that our algorithms yield equitable solutions for diverse demographic groups in practical applications.
Estimate-Then-Optimize versus Integrated-Estimation-Optimization versus Sample Average Approximation: A Stochastic Dominance Perspective
ABSTRACT. In data-driven stochastic optimization, model parameters of the underlying distribution need to be estimated from data in addition to the optimization task. Recent literature suggests that integrated estimation-and-optimization (IEO), namely selecting model parameters via the empirical objective of the target optimization problem, outperforms simple estimate-then-optimize (ETO). In this talk, we demonstrate how a reverse ordering of performance can appear when the model class is well-specified and there is sufficient data. We establish this ordering in the strong sense of stochastic dominance of the regret: not only the mean or other moments, but the entire distribution of the regret is always better for ETO than for IEO. Our results apply to constrained, contextual optimization problems where the decision depends on observed features. Moreover, we demonstrate how standard sample average approximation (SAA), whenever applicable, also enters this chain of reverse performance ordering.
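A toy newsvendor instance (our own illustration, not the paper's constrained contextual setting) makes the ETO/SAA distinction concrete: ETO fits a parametric demand model and plugs in the implied optimal order quantity, while SAA takes a minimizer of the empirical cost, which here is an empirical quantile.

```python
import numpy as np

def newsvendor_cost(order, demand, b=4.0, h=1.0):
    """Average cost with underage penalty b and overage penalty h on a demand sample."""
    return np.mean(b * np.maximum(demand - order, 0) + h * np.maximum(order - demand, 0))

rng = np.random.default_rng(0)
train = rng.exponential(1.0, size=30)            # small, well-specified training sample
test = rng.exponential(1.0, size=200000)         # large sample standing in for test risk
q = 4.0 / (4.0 + 1.0)                            # critical fractile b / (b + h)

order_eto = -np.mean(train) * np.log(1 - q)      # plug-in quantile of fitted exponential
order_saa = np.quantile(train, q)                # empirical quantile (approx. SAA minimizer)
# with a well-specified model and limited data, the plug-in order is often better out of sample
print(newsvendor_cost(order_eto, test), newsvendor_cost(order_saa, test))
```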
The Terminator: An Integration of Inner and Outer Approximations for Solving Regular and Distributionally Robust Chance Constrained Programs via Variable Fixing
ABSTRACT. We present a novel approach aimed at enhancing the efficacy of solving both regular and distributionally robust chance constrained programs using an empirical reference distribution. In general, these programs can be reformulated as mixed-integer programs (MIPs) by introducing binary variables for each scenario, indicating whether a scenario should be satisfied. While existing methods have predominantly focused on either inner or outer approximations, this paper bridges this gap by studying a scheme that effectively combines these approximations via variable fixing. Through probing the restricted outer approximations and comparing them with the inner approximations, we derive optimality cuts that can notably reduce the number of binary variables by effectively setting them to either one or zero. We conduct a theoretical analysis of variable fixing techniques, deriving an asymptotic closed-form expression. This expression quantifies the proportion of binary variables that should be optimally fixed to zero. Our empirical results showcase the advantages of our approach, both in terms of computational efficiency and solution quality. Notably, we solve all the tested instances from literature to optimality, signifying the robustness and effectiveness of our proposed approach.
Learning Decision-Focused Uncertainty Sets in Robust Optimization
ABSTRACT. We propose a data-driven technique to automatically learn the uncertainty sets in robust optimization. Our method reshapes the uncertainty sets by minimizing the expected performance across a family of problems while guaranteeing constraint satisfaction. We learn the uncertainty sets using a stochastic augmented Lagrangian method that relies on differentiating the solutions of the robust optimization problems with respect to the parameters of the uncertainty set. We show finite-sample probabilistic guarantees of constraint satisfaction using empirical process theory.
Our approach is very flexible and can learn a wide variety of uncertainty sets while preserving tractability. Numerical experiments show that our method outperforms traditional approaches in robust and distributionally robust optimization in terms of out-of-sample performance and constraint satisfaction guarantees. We implemented our method in the open-source package LROPT.
AI Hilbert: A New Paradigm for Scientific Discovery by Unifying Data and Background Knowledge
ABSTRACT. The discovery of scientific formulae that parsimoniously explain natural phenomena and align with existing background theory is a key goal in science. Historically, scientists have derived natural laws by manipulating equations based on existing knowledge, forming new equations, and verifying them experimentally. In recent years, data-driven scientific discovery has emerged as a viable competitor in settings with large amounts of experimental data. Unfortunately, data-driven methods often fail to discover valid laws when data is noisy or scarce. Accordingly, recent works combine regression and reasoning to eliminate formulae inconsistent with background theory. However, the problem of searching over the space of formulae consistent with background theory to find one that fits the data best is not well-solved. We propose a solution to this problem when all axioms and scientific laws are expressible via polynomial equalities and inequalities and argue that our approach is widely applicable. We further model notions of minimal complexity using binary variables and logical constraints, solve polynomial optimization problems via mixed-integer linear or semidefinite optimization, and prove the validity of our scientific discoveries in a principled manner using Positivstellensatz certificates. Remarkably, the optimization techniques leveraged in this paper allow our approach to run in polynomial time with fully correct background theory, or non-deterministic polynomial (NP) time with partially correct background theory. We demonstrate that some famous scientific laws, including Kepler’s Third Law of Planetary Motion, the Hagen-Poiseuille Equation, and the Radiated Gravitational Wave Power equation, can be derived in a principled manner from background axioms and experimental data.
Gain Confidence, Reduce Disappointment: A New Approach to Cross-Validation for Sparse Regression
ABSTRACT. Ridge regularized sparse linear regression involves selecting a subset of features that explains the relationship between a high-dimensional design matrix and an output vector in an interpretable manner. To select the sparsity and robustness of linear regressors, techniques like leave-one-out cross-validation are commonly used for hyperparameter tuning. However, cross-validation typically increases the cost of sparse regression by several orders of magnitude, because it requires solving multiple mixed-integer optimization problems (MIOs) for each hyperparameter combination. Additionally, validation metrics are noisy estimators of the test-set error, with different hyperparameter combinations leading to models with different amounts of noise. Therefore, optimizing over these metrics is vulnerable to out-of-sample disappointment, especially in underdetermined settings. To address this state of affairs, we make two contributions. First, we leverage the generalization theory literature to propose confidence-adjusted variants of the leave-one-out error that display less propensity to out-of-sample disappointment. Second, we leverage ideas from the mixed-integer optimization literature to obtain computationally tractable relaxations of the confidence-adjusted leave-one-out error, thereby minimizing it without solving as many MIOs. Our relaxations give rise to an efficient cyclic coordinate descent scheme which allows us to obtain significantly lower leave-one-out errors than via other methods in the literature. We validate our theory by demonstrating that we obtain significantly sparser and comparably accurate solutions than via popular methods like GLMNet and suffer from less out-of-sample disappointment. On synthetic datasets, our confidence adjustment procedure generates significantly fewer false discoveries, and improves out-of-sample performance by 2%–5% compared to cross-validating without confidence adjustment. Across a suite of 13 real datasets, a calibrated version of our confidence adjustment improves the test set error by an average of 4% compared to cross-validating without confidence adjustment.
Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression
ABSTRACT. Models like LASSO and ridge regression are extensively used in practice due to their interpretability, ease of use, and strong theoretical guarantees. Cross-validation (CV) is widely used for hyperparameter tuning in these models, but do practical optimization methods minimize the true out-of-sample loss? A recent line of research promises to show that the optimum of the CV loss matches the optimum of the out-of-sample loss (possibly after simple corrections). It remains to show how tractable it is to minimize the CV loss. In the present paper, we show that, in the case of ridge regression, the CV loss may fail to be quasiconvex and thus may have multiple local optima. We can guarantee that the CV loss is quasiconvex in at least one case: when the spectrum of the covariate matrix is nearly flat and the noise in the observed responses is not too high. More generally, we show that quasiconvexity status is independent of many properties of the observed data (response norm, covariate-matrix right singular vectors and singular-value scaling) and has a complex dependence on the few that remain. We empirically confirm our theory using simulated experiments.
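The object under study is cheap to compute exactly: for ridge regression the leave-one-out residuals admit the classical hat-matrix shortcut e_i/(1 - h_ii), so the whole CV curve over the regularization parameter can be traced and inspected for multiple local optima. A minimal sketch on a synthetic instance (an illustration, not the paper's construction):

```python
import numpy as np

def ridge_loocv(X, y, lam):
    """Exact leave-one-out MSE for ridge via the hat-matrix shortcut e_i / (1 - h_ii)."""
    n, d = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)   # ridge hat matrix
    residuals = y - H @ y
    return np.mean((residuals / (1 - np.diag(H))) ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 10))
y = X @ rng.standard_normal(10) + 0.5 * rng.standard_normal(40)
lams = np.logspace(-3, 3, 200)
cv = [ridge_loocv(X, y, lam) for lam in lams]
# best lambda on the grid; plotting cv vs. lams reveals whether the curve is quasiconvex
print(lams[int(np.argmin(cv))])
```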
Stability Analysis of Discrete-Time Linear Complementarity Systems
ABSTRACT. A Discrete-Time Linear Complementarity System (DLCS) is a dynamical system in discrete time whose state evolution is governed by linear dynamics in states and algebraic variables that solve a Linear Complementarity Problem (LCP). The DLCS is the hybrid dynamical system that is the discrete-time counterpart of the well-known Linear Complementarity System (LCS). We derive sufficient conditions for Lyapunov stability of a DLCS when using either a quadratic Lyapunov function that depends only on the state variables or one that depends on both the state and the algebraic variables. The sufficient conditions require checking the feasibility of a copositive program over nonconvex cones. Our results only assume that the LCP is solvable and do not require the solutions to be unique. We devise a novel, exact cutting plane algorithm for the verification of stability and the computation of the Lyapunov functions. To the best of our knowledge, our algorithm is the first exact approach for stability verification of DLCS. A number of numerical examples are presented to illustrate the approach. Though our main object of study in this paper is the DLCS, the proposed algorithm can be readily applied to the stability verification of LCS. In this context, we show the equivalence between the stability of an LCS and the DLCS resulting from a time-stepping procedure applied to the LCS, for all sufficiently small time steps.
Derivation and Generation of Path-Based Valid Inequalities for Transmission Expansion Planning with New Bus Integration
ABSTRACT. This research tackles the intricacies of DCOPF-based transmission expansion planning, accommodating the integration of new buses into the power grid. To handle the general intractability of this problem, the primary computational strategy involves deriving valid inequalities (VIs). The proposed approach utilizes problem structure and auxiliary efficient algorithms to generate highly effective VIs. More specifically, these VIs are generated by considering different paths within the existing and expanded network before solving the dispatch problem. While in the associated theorems identifying the longest paths is deemed essential to preserve integer solutions, the proposed approach efficiently circumvents such an inefficient procedure for the majority of bus pairs. We show that, for instances where this is not applicable, alternative techniques can be employed. Two classes of VIs result from these insights, and their effectiveness and computational advantages are demonstrated through application to a modified Polish 2383-bus system.
Intertemporal Uncertainty Management in Gas-Electric Energy Systems using Stochastic Finite Volumes
ABSTRACT. In this paper, we utilize a recently developed stochastic finite volume representation for solutions of the nonlinear gas flow equations with uncertain boundary conditions to manage intertemporal uncertainties for a pipeline system. Intertemporal uncertainty in this context denotes a temporary increase in load starting at a time that is randomly distributed (e.g., uniformly on an interval). This represents the activation of a peaking plant at an a priori unknown time in the day-ahead planning interval. In conjunction with the proven method of chance constraints, this joint uncertainty management scheme provides robustness guarantees with a minimum of extra reserve requirements.
A Compact Formulation for Dense Fixed-Charge Network Flow Problems
ABSTRACT. We present a compact mixed integer program (MIP) for the fixed charge network flow problem (FCNFP). The triples formulation is a compact formulation of multicommodity network flow that provides an alternative to the traditional and widely used node-arc and arc-path representations. We show that the triples formulation significantly reduces the number of variables and constraints in the mixed integer programming formulations of the FCNFP. In an empirical study with dense benchmark instances from the literature, we find that the triples-based formulation of FCNFP produces better solutions than the node-arc formulation early in the branch-and-bound process. We find that instances that can be solved to provable optimality are solved an average of 2.75 times faster with the triples formulation than with the node-arc formulation. For instances that cannot be solved to provable optimality in our computing environment, we find that given a 30-minute time limit the triples formulation produces solutions with lower costs and optimality gaps.
Convex Hull Pricing in Electricity Markets: Evidence from a Large US Market
ABSTRACT. We present the theory and application of convex hull pricing in non-convex electricity markets. Recently, we provided the proof-of-concept of a tractable, highly parallelizable, and exact method for convex hull price computation with guaranteed finite convergence, relying on Dantzig-Wolfe decomposition and Column Generation. We demonstrated how this approach provides intuition on the underlying price formation rationale. In this work, we present evidence from the application of our method to a large US electricity market, accounting for all applicable characteristics of the market participants and a fully granular network model, and we elaborate on performance-boosting techniques.
Bayesian Strategies for Numerical Solutions in Nonlinear Fractional Order Systems: A Focus on Cutaneous Leishmaniasis
ABSTRACT. The purpose of this study is to present the relevance and significance of fractional order (FO) derivatives in the FO-CLDV model, a nonlinear model of the cutaneous leishmaniasis (CL) disease virus. Numerical solutions of the FO-CLDV model are presented in the current work. Making use of fractional order derivatives in the sense of Caputo yields realistic results for the nonlinear model. The dynamics of the CL disease model are described by eight classes, including susceptible vector, exposed vector, infected vector, infectious human, clinically ill human, recovered human, and exposed human. The numerical results of the mathematical cutaneous leishmaniasis disease model are given for four different fractional order cases using an artificial intelligence-based Bayesian regularization neural network approach, referred to as SNN-BBA. For training, testing, and validation, the data were split into 75%, 12%, and 13%, respectively. Comparison of the produced results with the reference Grünwald-Letnikov numerical results demonstrates the accuracy of the stochastic technique. To demonstrate rationality and consistency, stochastic solutions are employed in conjunction with simulations based on state transition measurements, error histogram performances, regression, and correlation.
Line-Search Based Optimization Using Function Approximations with Tunable Accuracy
ABSTRACT. A line-search algorithm that uses objective function models with tunable accuracy to solve smooth optimization problems with convex constraints is developed. Given a current iterate, traditional line-search methods use a Taylor expansion to build a quadratic model, and construct a new iterate from an approximate minimizer of this quadratic model. However, in many applications other effective models of the objective can be constructed for which approximate minimizers can be efficiently computed. This paper specifies how objective function models can be used to generate new iterates in the context of line-search methods, and specifies approximation properties these models have to satisfy. At each iteration, the model has to satisfy function error and relative gradient error tolerances that are determined by the algorithm based on its progress. Moreover, the algorithm assumes that a bound for the model error is available and uses this bound to explore regions where the model is sufficiently accurate. Our algorithm has the same first-order global convergence properties as standard line-search methods. However, our algorithm uses only the models and the model error bounds, but never directly accesses the original objective function. As an example, the algorithm is applied to problems where the evaluation of the objective requires the solution of a large-scale system of nonlinear equations. The models are constructed from reduced order models of this system. Numerical results for partial differential equation constrained optimization problems show the benefits of the proposed algorithm.
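A schematic of the idea, not the paper's exact algorithm: run an Armijo-style backtracking line search that only ever evaluates a surrogate model m, and accept a step when the model decrease exceeds the Armijo threshold by twice a known uniform model-error bound delta, so that the same decrease is certified for the unseen true objective f (since |f - m| <= delta).

```python
import numpy as np

def model_armijo_step(model, grad_model, x, delta, c1=1e-4, t0=1.0, shrink=0.5):
    """Backtracking line search on a surrogate model with uniform error bound delta.
    A step is accepted only if the model decrease exceeds the Armijo decrease plus
    2*delta, which certifies the same Armijo decrease for the true objective."""
    g = grad_model(x)
    t = t0
    while t > 1e-12:
        x_new = x - t * g
        certified_decrease = model(x) - model(x_new) - 2 * delta
        if certified_decrease >= c1 * t * np.dot(g, g):
            return x_new
        t *= shrink
    return x    # model not accurate enough to certify progress; a refined model is needed

# toy quadratic objective approximated by a perturbed model with |f - m| <= delta
delta = 1e-3
model = lambda x: 0.5 * np.dot(x, x) + delta * np.sin(np.sum(x))
grad_model = lambda x: x + delta * np.cos(np.sum(x)) * np.ones_like(x)
x = np.ones(5)
for _ in range(50):
    x = model_armijo_step(model, grad_model, x, delta)
print(np.linalg.norm(x))   # iterates stall once the certified decrease drops below 2*delta
```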
Optimization in hybrid controlled systems with piecewise-smooth system dynamics
ABSTRACT. In this talk I will give an overview of recent advances in the field of optimally controlled piecewise-smooth dynamical systems. While there are many papers on piecewise-smooth dynamics, studies on optimization for such systems are relatively scarce. Several papers (published and in preparation) co-authored by the presenter and various colleagues establish novel results concerning optimal dynamics of piecewise-smooth systems, including optimality of sliding motion, optimal periodic dynamics for time-driven systems, optimality of hybrid crossing limit cycles for higher-dimensional (non-planar) systems, and heteroclinic cycles. In these papers we show how a geometric approach can be useful in establishing such novel types of optimality for hybrid controlled systems. It is shown that there is no universal version of the conventional Maximum Principle, but rather straightforward modifications of existing versions suffice to prove at least necessary conditions for optimality of non-smooth trajectories. These results have multiple potential applications in resource management, renewable energy system transitions, and related fields; some such applications are also discussed.
Optimality analysis and duality conditions for a class of conic semi-infinite program having vanishing constraints
ABSTRACT. This work focuses on a non-smooth conic semi-infinite programming problem having vanishing constraints. Using the limiting constraint qualification, we establish a necessary optimality condition for the optimization model. Subsequently, the concept of generalized convexity over cones is introduced, followed by the development of sufficient optimality conditions. Wolfe’s and Mond-Weir type dual models are also formulated for the considered semi-infinite optimization problem, and weak, strong, and converse duality results are established under generalized Q-convexity/pseudoconvexity/Q-quasiconvexity assumptions.
Interior point method for tensor optimal transport, spectral radius of nonnegative tensors and hypergraphs
ABSTRACT. In the first part of this talk we discuss the interior point method for tensor optimal transport [1]. In the second part of the talk we show that several quantities, such as the spectral radius of weakly irreducible nonnegative tensors, the spectral radius of uniform weighted hypergraphs, and the maximum of d-homogeneous polynomials with nonnegative coefficients on the unit ball of the d-Hölder norm, are polynomially computable [2]. Furthermore, one can apply the interior point method to compute these quantities.
References:
[1] S. Friedland, Interior point method in tensor optimal transport, arXiv:2310.02510, 2023.
[2] S. Friedland and S. Gaubert, Bit-complexity estimates in geometric programming, and application to the polynomial-time computation of the spectral radius of nonnegative tensors, arXiv:2301.10637, 2023.
Solving Tall Dense SDPs in the Current Matrix Multiplication Time
ABSTRACT. In this talk I will introduce a new interior point method that solves semidefinite programming (SDP) problems with an n×n matrix variable and m constraints in the (current) matrix multiplication time m^ω when m=Ω(n^2). This suggests that solving SDP is nearly as fast as solving a linear system with the same number of variables and constraints. This is the first result showing that tall dense SDPs can be solved in nearly-optimal running time, and it also improves upon the state-of-the-art SDP solvers. Our algorithm maintains the inverse of a Kronecker product using lazy updates, and we design a general amortization scheme for positive semidefinite matrices.
A Higher-Order Interior Point Method for Semidefinite Optimization Problems Failing Strict Complementarity Condition
ABSTRACT. Semidefinite optimization (SDO) problems are known to be solved efficiently using interior point methods (IPMs), however, it is known that fast asymptotic convergence is attained only under the strict complementarity condition. Failing strict complementarity leads to losing the analyticity of the central path at optimality. To address this issue, based on a semi-algebraic description of the central path, we reparametrize the central path and then propose a higher-order IPM to follow the reparametrized central path. We discuss the convergence of our higher-order IPM and present numerical results.
A Stochastic GDA Method with Backtracking for Solving Nonconvex (Strongly) Concave Minimax Problems
ABSTRACT. We propose a stochastic Gradient Descent Ascent (GDA) method with backtracking (SGDA-B) to solve nonconvex-(strongly) concave (NCSC) minimax problems min_x max_y sum_{i=1}^N g_i(x_i)+f(x,y)-h(y), where h and g_i for i=1...N are closed convex, and f is L-smooth and μ-strongly concave in y. SGDA-B does not require knowledge of L, and for any p ∈ (0, 1) and ε > 0, using random block-coordinate updates it can compute an ε-stationary point with probability at least 1-p within O(κ^3 L ε^{-4} log(1/p)) and O(L^4 ε^{−7} log(1/p)) stochastic oracle calls when μ > 0 and μ = 0, respectively, where κ = L/μ. To our knowledge, SGDA-B is the first GDA-type method with backtracking for solving NCSC minimax problems. We provide numerical results for SGDA-B on a distributionally robust learning problem.
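For orientation, the un-backtracked baseline is plain two-step-size stochastic GDA; the toy NumPy sketch below runs it on a simple nonconvex-strongly-concave objective with hand-tuned step sizes, which is exactly the tuning burden that SGDA-B's backtracking is designed to remove. This is not the SGDA-B algorithm itself.

```python
import numpy as np

def stochastic_gda(grad_x, grad_y, x0, y0, tau_x=0.01, tau_y=0.1, iters=5000, seed=0):
    """Plain alternating stochastic GDA: noisy descent in x, noisy ascent in y."""
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    for _ in range(iters):
        gx = grad_x(x, y) + 0.1 * rng.standard_normal()
        x = x - tau_x * gx                     # descent step on the primal variable
        gy = grad_y(x, y) + 0.1 * rng.standard_normal()
        y = y + tau_y * gy                     # ascent step on the dual variable
    return x, y

# toy nonconvex-strongly-concave objective f(x, y) = cos(x) + x*y - 0.5*y^2
grad_x = lambda x, y: -np.sin(x) + y
grad_y = lambda x, y: x - y
print(stochastic_gda(grad_x, grad_y, 2.0, 0.0))   # drifts toward the stationary point (0, 0)
```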
Towards Provable Gradient Descent Ascent Dynamics in Hidden Structure
ABSTRACT. Despite the empirical success in training zero-sum game objectives, with the generative adversarial network (GAN) being one of the most popular applications, there has been little understanding of the dynamics of minimax optimization with a nonconvex-nonconcave structure. In this work, we focus on a specific paradigm of problems in the literature called the hidden convex-concave (HCC) problem, where the objective is convex-concave in the image space of some functions. We propose a sufficient condition that ensures convergence in high-dimensional settings. Further, we show that a two-layer neural network satisfies our condition and thus converges to the equilibrium point, providing the first practical example with a convergence guarantee.
f-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization
ABSTRACT. Numerous constraints and regularization terms have been proposed in the literature to promote fairness in machine learning tasks. Most of these methods are not amenable to stochastic optimization due to the complex and nonlinear structure of constraints and regularizers. Here, the term ``stochastic'' refers to the ability of the algorithm to work with small mini-batches of data. Motivated by the limitation of existing literature, this paper presents a unified stochastic optimization framework for fair empirical risk minimization based on $f$-divergence measures ($f$-FERM). The proposed stochastic algorithm enjoys theoretical convergence guarantees. In addition, our experiments demonstrate the superiority of fairness-accuracy tradeoffs offered by $f$-FERM for almost all batch sizes (ranging from full-batch to batch size of one). Moreover, we show that our framework can be extended to the case where there is a distribution shift from training to the test data.
Our extension is based on a distributionally robust optimization reformulation of $f$-FERM objective under $\ell_p$ norms as uncertainty sets. Again, in this distributionally robust setting, $f$-FERM enjoys not only theoretical convergence guarantees but also outperforms other baselines in the literature in the tasks involving distribution shifts. An efficient stochastic implementation of $f$-FERM is publicly available.
Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization
ABSTRACT. Algorithmic reproducibility measures the deviation in outputs of machine learning algorithms upon minor changes in the training process. Previous work suggests that first-order methods would need to trade off convergence rate (gradient complexity) for better reproducibility. In this work, we challenge this perception and demonstrate that both optimal reproducibility and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems under various error-prone oracle settings. Particularly, given the inexact initialization oracle, our regularization-based algorithms achieve the best of both worlds - optimal reproducibility and near-optimal gradient complexity - for minimization and minimax optimization. With the inexact gradient oracle, the near-optimal guarantees also hold for minimax optimization. Additionally, with the stochastic gradient oracle, we show that stochastic gradient descent ascent is optimal in terms of both reproducibility and gradient complexity.
Co-Design of Complex Systems: From Autonomy to Future Mobility Systems
ABSTRACT. When designing complex systems, we need to consider multiple trade-offs at various abstraction levels, and choices of single components need to be studied jointly.
For instance, the design of future mobility solutions (e.g., autonomous vehicles) and the design of the mobility systems they enable are closely coupled. Indeed, knowledge about the intended service of novel mobility solutions would impact their design and deployment process, whilst insights about their technological development could significantly affect transportation policies.
Co-designing sociotechnical systems is a complex task for at least two reasons. First, the co-design of interconnected systems (e.g., networks of cyber-physical systems) involves the simultaneous choice of components arising from heterogeneous fields, while satisfying systemic constraints and accounting for multiple objectives. Second, components are connected via interactions between different stakeholders.
I will present a framework to co-design such systems, leveraging a monotone theory of co-design.
The framework will be instantiated in the task of designing future mobility systems, all the way from the policies that a city can design, to the autonomy of vehicles part of an autonomous mobility-on-demand service. I will show how the proposed approaches allow one to efficiently answer heterogeneous questions, unifying different techniques and promoting interdisciplinarity and compositionality.
ABSTRACT. We propose an approach to solving online Stackelberg games with multiple followers in a follower-agnostic way. Unlike previous works, our approach allows the leader to achieve their Stackelberg equilibrium with no knowledge about the followers' utility functions or strategy space. Instead, we introduce an algorithm in which the leader updates their strategies using a specially crafted gradient estimator obtained by probing followers with specially designed strategies. To model realistic interactions, we do not assume that followers respond with optimal play but instead that they follow a convergent adaptation rule (i.e., the followers themselves may be learning). The leader constructs the gradient estimator based solely on observations of the followers' actions. We provide both non-asymptotic convergence rates to stationary points of the leader's objective and asymptotic convergence to a local Stackelberg equilibrium. We show the effectiveness of our algorithm in solving incentive design problems in non-atomic routing games, even when the leader lacks access to follower demand information.
ABSTRACT. Gentrification is a process of neighborhood change in which the primary beneficiaries tend to be homeowners and newcomers, as opposed to incumbent renters. However, operational definitions of gentrification and other concepts of neighborhood change are elusive, making them and their interactions with policy interventions difficult to quantify. We propose formulating processes of neighborhood change as instances of no-regret dynamics; a collective learning process in which a set of strategic agents rapidly reach a state of approximate equilibrium. We mathematize concepts of neighborhood change to model the incentive structures impacting individual dwelling site decision-making. Our model accounts for affordability, access to relevant amenities, community ties, and site upkeep. We showcase our model with computational experiments that provide semi-quantitative insights on the spatial economics of neighborhood change, particularly on the influence of residential zoning policy and the placement of urban transportation amenities.
When Simple is (Near)-Optimal in Security Games for Urban Transportation
ABSTRACT. Intermediate public transport (IPT) services, including mini-buses and share-taxis, are pivotal in serving the last-mile connectivity needs in many developing nation cities. However, such systems are often prone to fraud or illegal behavior, as some passengers may jump queues to reduce wait times. For instance, some users, typically men, might skip the queue and board a moving vehicle as it arrives at the mini-bus stand to pick up passengers, which can be detrimental to other users (e.g., women). Beyond IPT systems, such fraudulent behavior is ubiquitous across many urban transportation contexts. For example, users often drive above the speed limit on roads to reach their destinations faster, and truck drivers often engage in excessive illegal lane-changing maneuvers to reduce their travel times, which can compromise road safety.
To mitigate the potential harms associated with users engaging in fraudulent activities in urban transportation systems, in this work, we study the problem of policing and monitoring such fraud in urban transportation applications as a security game. In this game, an administrator deploys $R$ security resources (or personnel) to $L$ locations and levies fines against users engaging in fraudulent activities. For this security game, we study both the fraud minimization (or social welfare maximization) and revenue maximization administrator objectives. In both settings, we show that the problem of computing the optimal administrator strategy is NP-hard and develop natural greedy algorithm variants for the respective settings that achieve at least half the reduction in fraud (or at least half the revenue) as the fraud-minimizing (revenue-maximizing) solutions. Moreover, we establish a resource augmentation guarantee that our proposed greedy algorithms with just one additional resource, i.e., $R+1$ resources, achieve at least the same reduction in fraud (level of revenue) as the fraud minimizing (revenue maximizing) outcome with $R$ resources that is NP-hard to compute. Finally, we perform numerical experiments and study several model extensions, highlighting the generalizability of our proposed framework.
Sensitivity analysis for mixed binary quadratic programming
ABSTRACT. We consider sensitivity analysis for Mixed Binary Quadratic Programs (MBQPs) with respect to changing right-hand-sides (rhs). We show that even if the optimal solution of a given MBQP is known, it is NP-hard to approximate the change in objective function value with respect to changes in rhs. Next, we study algorithmic approaches to obtaining dual bounds for MBQPs with changing rhs. We leverage Burer’s completely-positive (CPP) reformulation of MBQPs. Its dual is an instance of co-positive programming (COP), and can be used to obtain sensitivity bounds. We prove that strong duality between the CPP and COP problems holds if the feasible region is bounded or if the objective function is convex, while the duality gap can be strictly positive if neither condition is met. We also show that the COP dual has multiple optimal solutions, and the choice of the dual solution affects the quality of the bounds as the rhs changes. We finally provide a method for finding good, near-optimal dual solutions, and we present preliminary computational results on sensitivity analysis for MBQPs.
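For context, a standard statement of the completely-positive reformulation referenced above (Burer, 2009), written here in our own notation for $\min\{x^\top Qx + c^\top x : Ax=b,\ x\ge 0,\ x_j\in\{0,1\}\ \forall j\in B\}$:
\[
\min\ \langle Q, X\rangle + c^\top x \quad \text{s.t.}\quad a_i^\top x = b_i,\quad \langle a_i a_i^\top, X\rangle = b_i^2 \ \ \forall i,\quad X_{jj} = x_j \ \ \forall j\in B,\quad \begin{pmatrix} 1 & x^\top\\ x & X \end{pmatrix} \in \mathcal{C}^*,
\]
where $\mathcal{C}^*$ is the cone of completely positive matrices. The COP dual of this program is the source of the sensitivity bounds discussed in the abstract.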
ABSTRACT. Clustering is an unsupervised learning task that aims to partition data into a set of clusters. In many applications, these clusters correspond to real-world constructs (e.g., electoral districts) whose benefit can only be attained by groups when they reach a minimum level of representation (e.g., 50% to elect their desired candidate). In this paper, we study the k-means clustering problem with the additional constraint that each group (e.g., demographic group) must have a minimum level of representation in at least a given number of clusters. We formulate the problem through a mixed-integer optimization framework and present an alternating minimization algorithm, called MiniReL, that directly incorporates the fairness constraints. While incorporating the fairness criteria leads to an NP-hard assignment problem within the algorithm, we present computational approaches that make the algorithm practical even for large datasets. Numerical results show that this approach can produce fair clusters with practically no increase in the clustering cost across standard benchmark datasets.
DeLuxing: Deep Lagrangian Underestimate Fixing for Column-Generation-Based Exact Methods
ABSTRACT. In this paper, we propose an innovative variable fixing strategy called deep Lagrangian underestimate fixing (DeLuxing). It is a highly effective approach for removing unnecessary variables in column-generation (CG)-based exact methods used to solve challenging discrete optimization problems commonly encountered in various industries, including vehicle routing problems (VRPs). DeLuxing employs a novel linear programming (LP) formulation with only a small subset of the enumerated variables, which is theoretically guaranteed to yield qualified dual solutions for computing Lagrangian underestimates (LUs). Because these LPs are small, DeLuxing can efficiently solve a sequence of similar LPs to generate multiple high-quality LUs, and thus can, in most cases, remove over 75% of the variables from the enumerated pool. We extend the fundamental concept underpinning the new formulation to contexts beyond variable fixing, namely variable type relaxation and cutting plane addition. We demonstrate the effectiveness of the proposed method in accelerating CG-based exact methods via the capacitated multi-trip vehicle routing problem with time windows (CMTVRPTW) and two important variants with loading times or release dates. Enhanced by DeLuxing and the extensions, one of the best exact methods for solving the CMTVRPTW, developed in Yang (2023), doubles the size of instances solved optimally for the first time while being, on average, more than 7 times as fast as the top-performing exact methods reported in the literature, and up to over 20 times as fast.
ABSTRACT. An M-decomposable set is a closed convex set which is the sum of a compact convex set and a closed convex cone. We present properties of these sets in the context of integer programming. In particular, we present properties of their integer hull, cutting plane closures and duality.
Inexact bilevel stochastic gradient methods for constrained and unconstrained lower-level problems
ABSTRACT. Two-level stochastic optimization formulations have become instrumental in a number of machine learning contexts (such as continual learning, neural architecture search, adversarial learning, and hyperparameter tuning) as well as in cybersecurity, network interdiction, and power network defense. Practical stochastic bilevel optimization problems become challenging in optimization or learning scenarios where the number of variables is high or there are constraints. In this talk, we introduce the bilevel stochastic gradient method for bilevel problems with lower-level constraints. We also present a comprehensive convergence theory that covers inexact calculation of the adjoint gradient (or hypergradient) and addresses both the unconstrained and constrained lower-level cases. To promote the use of bilevel optimization in large-scale learning, we introduce practical bilevel stochastic gradient methods that are matrix-free and do not require second-order derivatives.
An Inexact Conditional Gradient Method for Constrained Bilevel Optimization
ABSTRACT. Bilevel optimization is an important class of optimization problems where one optimization problem is nested within another. This framework is widely used in machine learning problems, including meta-learning, data hyper-cleaning, and matrix completion with denoising. Several methods have been developed for tackling unconstrained bilevel optimization problems, but there is limited work on methods for the constrained setting. In fact, for those methods that can handle constrained problems, either the convergence rate is slow or the per-iteration computational cost is high. In this talk, we focus on a bilevel optimization problem with a strongly convex lower-level problem and a smooth upper-level objective function over a compact and convex constraint set. We introduce a novel single-loop projection-free method using a nested approximation technique with a convergence rate guarantee. Our proposed method has an improved per-iteration complexity, surpassing existing methods, and achieves optimal convergence rate guarantees matching the best-known complexity of projection-free algorithms for solving convex constrained single-level optimization problems. We also present numerical experiments to showcase the superior performance of our method compared with state-of-the-art methods.
ABSTRACT. Transformers have recently revolutionized many machine learning domains, and one salient discovery is their remarkable in-context learning capability, where models can capture an unseen task by utilizing task-specific prompts without further parameter fine-tuning. In this talk, I will present our recent work that aims at understanding the in-context learning mechanism of transformers. Our focus is on the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent in order to in-context learn linear function classes. I will first present our characterization of the training convergence of in-context learning for data with balanced and imbalanced features, respectively. I will then discuss the insights that we obtain about attention models and training processes. I will also talk about the analysis techniques that we develop, which may be useful for a broader set of problems. I will finally conclude my talk with comments on a few future directions.
Min-Max Problems with Coupled Constraints: Theory and Algorithms
ABSTRACT. In this work we study extensions of min-max problems in which the constraints of the inner problem couple both the minimization and maximization variables. Unlike typical min-max problems, the coupled problem is NP-hard, even under mild assumptions, and it does not satisfy the max-min inequality. Moreover, the addition of the coupled constraints enables the modeling of competitive settings in which the decision of the outer (min) agent constrains the decision set of the inner (max) one; such settings arise in telecommunications, transportation, and other domains. We develop a duality theory for the coupled problem and a (duality-based) reformulation that is easier to solve than the original problem. Then, by leveraging this reformulation, we develop the Dual Gradient Descent (DGD) algorithm. We show asymptotic and finite-time convergence to stationary solutions under the assumptions that the objective is strongly-convex strongly-concave and the constraints are (strongly) convex-concave (under which the problem is still NP-hard). We also propose the stochastic DGD (SDGD) and prove non-asymptotic convergence to stationary solutions. Finally, we perform experiments on a network interdiction problem demonstrating the superior performance of the proposed algorithm compared to other methods for both classical and coupled min-max problems.
ABSTRACT. Linear programming has played a crucial role in shaping decision-making, resource allocation, and cost reduction in various domains. In this paper, we investigate the application of overparametrized neural networks and their implicit bias in solving linear programming problems. Specifically, our findings reveal that training diagonal linear networks with gradient descent, while optimizing the squared L2-norm of the slack variable, leads to solutions for entropically regularized linear programming problems. Remarkably, the strength of this regularization depends on the initialization used in the gradient descent process. We analyze the convergence of both discrete-time and continuous-time dynamics and demonstrate that both exhibit a linear rate of convergence, requiring only mild assumptions on the constraint matrix.
For the first time, we introduce a comprehensive framework for solving linear programming problems using diagonal neural networks. We underscore the significance of our discoveries by applying them to address challenges in basis pursuit and optimal transport problems. [Joint work with Haoyue Wang (MIT), and Promit Ghosal (Brandeis)]
New Developments of ADMM-based Interior Point Methods for Linear Programming and Conic Programming
ABSTRACT. The ADMM-based interior point method (ABIP, Lin et al. 2021) is a hybrid algorithm that effectively combines the interior point method and the first-order method to achieve a performance boost in large-scale linear programming. Unlike the standard interior point method, which relies on a costly Newton step, ABIP applies the alternating direction method of multipliers (ADMM) to approximately solve the barrier penalized problem. In this paper, we provide a new version of ABIP with multiple improvements. First, we develop several new implementation strategies to accelerate ABIP’s performance for linear programming. Next, we extend ABIP to the more general linear conic programming setting and establish the associated iteration complexity of the algorithm. Finally, we conduct extensive numerical experiments on both synthetic and real-world datasets to demonstrate the empirical advantages of our developments. In particular, the enhanced ABIP achieves a 5.8x reduction in the geometric mean of run time on 105 LP instances from Netlib, and it compares favorably against state-of-the-art open-source solvers on a wide range of large-scale problems. Moreover, it is even comparable to commercial solvers on some particular datasets.
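As a rough sketch of the mechanism described above (our notation, omitting the homogeneous self-dual embedding and other implementation details): rather than taking a Newton step, each log-barrier subproblem of a standard-form LP,
\[
\min_x\ c^\top x - \mu \sum_i \log x_i \quad \text{s.t.}\quad Ax = b,
\]
is solved only approximately by ADMM, e.g. after splitting $x$ into two copies, one carrying the linear constraint $Ax=b$ and the other carrying the barrier term, before the barrier parameter $\mu$ is decreased.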
ABSTRACT. The COPT conic optimizer solves large-scale linear conic optimization problems over symmetric cones (standard linear, second-order cone, and semi-definite optimization). In this talk we give an overview of the software and discuss recent advances towards non-symmetric conic optimization.
EXPRESS: Automatic Convexity Detection and Transformation to Conic Forms for Nonlinear Optimization
ABSTRACT. Nonlinear optimization problems are often more effectively solved when converted into conic optimization forms, which leverage advanced conic solvers. These solvers generally require prior transformations into canonical forms, a step that is often laborious. To streamline this process, we introduce EXPRESS, an innovative optimization toolkit designed to automatically detect the convexity of nonlinear functions and transform these problems into their equivalent conic forms. When direct transformation is infeasible, EXPRESS provides symbolic analysis of function convexity to guide the choice of a solver. We validate the effectiveness of EXPRESS with experimental results on the MINLPLib and CUTEST datasets, demonstrating its proficiency in simplifying and accelerating nonlinear optimization processes.
Accelerated Nonlinear Programming on GPUs: Implementing Solver and Automatic Differentiation
ABSTRACT. Despite the remarkable success of graphics processing units (GPUs) in the realms of deep learning and forward simulations, their application in classical large-scale constrained nonlinear programming (NLP) has faced limitations. These limitations arise from the challenges associated with sparse automatic differentiation and the factorization of sparse indefinite Karush-Kuhn-Tucker (KKT) systems on GPUs, both of which are essential in applying second-order algorithms. To overcome these limitations, we introduce two novel strategies: Single Instruction, Multiple Data (SIMD) abstraction of NLPs (addressing the sparse automatic differentiation challenge) and a condensed-space interior point method with inequality relaxation (tackling the sparse linear algebra issue). In this presentation, we elucidate the key strategies that underpin the development of software infrastructure for nonlinear optimization on GPUs.
The SIMD abstraction represents an NLP modeling approach that simplifies the implementation of parallelized sparse automatic differentiation for large-scale NLPs. This abstraction leverages the fact that most large-scale NLPs exhibit repetitive and parallelizable computational patterns. Our modeling syntax preserves the inherent parallel structure within model equations and constructs and compiles parallel derivative evaluation kernels, which can be executed in parallel over large data. This not only ensures GPU compatibility but also delivers exceptional performance, yielding speed-ups exceeding 100 times compared to state-of-the-art tools running on CPUs.
The challenge posed by sparse matrix factorization can be overcome by approximating the indefinite KKT systems using an alternative form involving positive definite matrices. This approach is particularly effective for GPUs because positive definite matrices can be factorized without the need for numerical pivoting, a hindrance to parallel computing resource utilization in sparse matrix factorization. We have discovered that the condensed-space interior point method (IPM) with an inequality relaxation strategy enables us to transform the KKT system into a condensed positive definite form. This enables the implementation of the interior-point method assisted by parallel sparse Cholesky solvers. Furthermore, various other techniques, such as reduction and Schur complement decomposition strategies, can be employed to further exploit the structure and accelerate computations.
Our numerical results on AC optimal power flow problems demonstrate the substantial potential of GPU utilization. Our implementations, ExaModels and MadNLP, running on GPUs, achieve speedups of over 10 times compared to state-of-the-art CPU-based tools. This presentation will spotlight these advancements and their implications for other large-scale optimization problems. Additionally, we will address the current limitations of our implementation and discuss avenues for further research.
Optimizing AC Power Flow by Exploiting the Power Flow Manifold
ABSTRACT. AC Optimal Power Flow has proven difficult to solve with interior point methods on GPUs. This is largely due to challenging linear algebra problems that current state-of-the-art massively parallel linear solvers struggle with. However, the advent of Riemannian optimization techniques and the fact that the power flow equations form a smooth manifold present an alternative approach. In this talk we present the basics of Riemannian optimization techniques, in which optimization is done directly on a manifold. Then we present computational results showing that Riemannian techniques are capable of producing solutions of quality comparable to that of interior point methods.
Scalable, high performance linear solvers for nonlinear constrained optimization problems
ABSTRACT. A need to repeatedly solve a large, sparse linear system is a major computational bottleneck when implementing optimization solvers. Computations related to solving these linear systems can easily consume ninety percent or more of the total runtime. In alternating current optimal power flow (ACOPF) analysis for transmission power grids this issue is even more pronounced, as the underlying KKT linear systems have extremely sparse and irregular structure without dense blocks, so standard direct-solver parallelization techniques do not perform well in these cases. In my talk, I will focus on linear solver methods that ultimately led to the first GPU-accelerated linear solvers for the ACOPF problem that were able to outperform MA57. The breakthrough was achieved by combining several strategies, including refactorization, iterative refinement, and a highly optimized implementation. I will also introduce Re::Solve, our library of GPU-resident linear solvers.
Frequency Recovery in Power Grids using High-Performance Computing
ABSTRACT. Maintaining electric power system stability is paramount, especially in extreme contingencies involving unexpected outages of multiple generators or transmission lines that are typical during severe weather events. Such outages often lead to large supply-demand mismatches followed by subsequent system frequency deviations from their nominal value. The extent of frequency deviations is an important metric of system resilience, and its timely mitigation is a central goal of power system operation and control. This paper develops a novel nonlinear model predictive control (NMPC) method to minimize frequency deviations when the grid is affected by an unforeseen loss of multiple components. Our method is based on a novel multi-period alternating current optimal power flow (ACOPF) formulation that accurately models both nonlinear electric power flow physics and the primary and secondary frequency response of generator control mechanisms. We develop a distributed parallel Julia package for solving the large-scale nonlinear optimization problems that result from our NMPC method and thereby address realistic test instances on existing high-performance computing architectures. Our method demonstrates superior performance in terms of frequency recovery over existing industry practices, where generator levels are set based on the solution of single-period classical ACOPF models.
ABSTRACT. This paper optimizes the configuration of large-scale data centers toward cost-effective, reliable and sustainable cloud supply chains. We formulate an integer optimization model that optimizes server placement within a data center to maximize demand coverage while adhering to space, power and cooling restrictions as well as operational considerations. We propose modeling and algorithmic adjustments to handle online demand arrivals. In particular, we define a tractable single-sample online approximation (SSOA) approach to multi-stage stochastic optimization, which approximates unknown parameters with a single realization and re-optimizes decisions at each decision epoch. Our theoretical results show strong performance guarantees of SSOA in the canonical online generalized assignment and online bin packing settings. Our computational results using real-world data show that our optimization approach can lead to enhanced utilization and reduced power stranding in data centers. Our model and algorithm have been packaged into a software solution deployed in Microsoft's data centers worldwide to support data center managers’ rack placement decisions. Iterative modeling improvements based on user feedback led to a significant increase in adoption over time, resulting in reduced power stranding and multi-million dollar annual cost savings. Ultimately, this paper constitutes one of the first large-scale deployments of a decision-making tool to support data center management, contributing a collaborative decision-making process at the human-machine interface.
ABSTRACT. Adaptive submodular cover is a topic in stochastic optimization with applications in active learning, sensor placement, viral marketing, and hypothesis identification. In previous work, we showed that, given a monotone and submodular utility function, we can obtain a solution that is within a factor of 4(ln(Q)+1) of the minimum-cost cover, where Q is the target value; this approximation is the best possible up to constant factors (unless P = NP). We extend this analysis to the p-th moment of the expected cost and show that the same adaptive greedy solution attains a (p+1)^(p+1)(ln(Q)+1)^p approximation of the minimum p-th moment cost cover. We also present some empirical results that exhibit the performance of our algorithm.
A Semidefinite Hierarchy for Expected Independence Numbers of Random Graphs
ABSTRACT. We introduce convex optimization methods to find upper bounds on the expected independence number of a random graph, in the vein of the Lovász theta function’s bound for the independence number of a deterministic graph. Specifically, we propose a hierarchy of semidefinite programs (SDPs) whose values upper bound the expected independence number. For symmetric random graphs, the last level of the hierarchy is equivalent to a linear program whose optimal value can often be calculated or approximated in closed form. We show that our methods provide good upper bounds in a number of examples, including Erdős-Rényi graphs and geometric random graphs.
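For reference, the deterministic bound that the hierarchy above generalizes is the Lovász theta function: for a graph $G=(V,E)$ with independence number $\alpha(G)$,
\[
\alpha(G) \;\le\; \vartheta(G) \;=\; \max\ \langle J, X\rangle \quad \text{s.t.}\quad X_{ij} = 0 \ \ \forall \{i,j\}\in E, \quad \operatorname{tr}(X) = 1, \quad X \succeq 0,
\]
where $J$ is the all-ones matrix. The SDP hierarchy described in the abstract plays the analogous role for the expected independence number of a random graph.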
ABSTRACT. Motivated by bursty bandwidth allocation and by the allocation of virtual machines to servers in the cloud, we consider the online problem of packing items with random sizes into unit-capacity bins. Items arrive sequentially, but upon arrival an item’s actual size is unknown; only its probabilistic information is available to the decision maker. Without knowing this size, the decision maker must irrevocably pack the item into an available bin or place it in a new bin. Once packed in a bin, the decision maker observes the item’s actual size, and overflowing the bin is a possibility. An overflow incurs a large penalty cost and the corresponding bin is unusable for the rest of the process. In practical terms, this overflow models delayed services, failure of servers, and/or loss of end-user goodwill. The objective is to minimize the total expected cost given by the sum of the number of opened bins and the overflow penalty cost. We present an online algorithm with expected cost at most a constant factor times the cost incurred by the optimal packing policy when item sizes are drawn from an i.i.d. sequence of unknown length. We give a similar result when item size distributions are exponential with arbitrary rates. We also study the offline model, where distributions are known in advance but must be packed sequentially. We construct a soft-capacity PTAS for this problem, and show that the complexity of computing the optimal offline cost is #P-hard. Finally, we provide an empirical study of our online algorithm’s performance.
Learning to Accelerate the Global Optimization of Quadratically-Constrained Quadratic Programs
ABSTRACT. We learn optimal instance-specific heuristics for the global minimization of non-convex quadratically-constrained quadratic programs (QCQPs). Specifically, we consider partitioning-based convex mixed-integer programming relaxations for non-convex QCQPs and propose the novel problem of strong partitioning (SP) to optimally partition variable domains without sacrificing global optimality. We design a local optimization method for solving this challenging max-min strong partitioning problem and replace this expensive benchmark strategy with a machine learning (ML) approximation for homogeneous families of QCQPs. We present a detailed computational study on randomly generated families of QCQPs, including instances of the pooling problem. Numerical experiments demonstrate that SP and its ML approximation significantly reduce the average run times of partitioning-based algorithms, by factors of 3.5 to 16.5 and 2 to 4.5, respectively, over different families of non-convex QCQPs.
A Generalizable End-to-End Learning Approach for Global Optimization of Quadratically-Constrained Quadratic Programs
ABSTRACT. Quadratically-constrained quadratic programs (QCQPs) are a broad class of non-convex optimization problems which occur in applications such as network optimization and the pooling problem, but are often slow to solve in practice. We present a neural network approach that can learn partitioning heuristics which result in significant time savings for existing global optimization solvers. We construct a novel end-to-end loss function so that the desired attributes of an output partitioning scheme can be learned directly, without reference to a pre-selected heuristic. Additionally, we describe how to use the graph scattering transform to produce a single model which works on a variety of different problem structures and sizes. We demonstrate that the neural network is able to learn heuristics that speed up QCQP optimization significantly, greatly exceeding the speedups of previous work. Furthermore, we demonstrate that the model is able to learn heuristics which generalize well even to problem sizes not seen in training.
Solution of disjoint bilinear programs using simplicial branch-and-bound
ABSTRACT. In this talk, we propose a branch-and-bound algorithm for disjoint bilinear programs (DBPs), whose objective function includes bilinear terms of two vectors of variables lying in separate convex polytopes. By utilizing the reformulation-linearization technique and the double-description method, we devise a splitting scheme that splits the search space into simplices in the branching step. We demonstrate that this scheme ensures the algorithm converges to a global optimum of the DBP in a finite number of iterations.
Bounding-Focused Discretization Methods for the Global Optimization of Nonconvex Semi-Infinite Programs
ABSTRACT. We use sensitivity analysis to design bounding-focused discretization (cutting-plane) methods for the global optimization of nonconvex semi-infinite programs (SIPs). We begin by formulating the optimal bounding-focused discretization of SIPs as a max-min problem and propose variants that are more computationally tractable. We then use parametric sensitivity theory to design an effective heuristic approach for solving these max-min problems. We also show how our new iterative discretization methods may be modified to ensure that the solutions of their discretizations converge to an optimal solution of the SIP. We then formulate optimal bounding-focused generalized discretization of SIPs as max-min problems and design effective heuristic algorithms for their solution. Numerical experiments on standard nonconvex SIP test instances from the literature demonstrate that our new bounding-focused discretization methods can significantly reduce the number of iterations for convergence relative to a state-of-the-art feasibility-focused discretization method.
Backhaul Profit Maximization with Multiple Vehicles
ABSTRACT. We address the Multi-Vehicle Backhaul Profit Maximization Problem (MVBPMP). The single-vehicle BPMP, which is known to be NP-hard, requires simultaneously solving two problems: (1) determining how to route an empty delivery vehicle back from its current location to its depot by a scheduled arrival time, and (2) selecting a profit-maximizing subset of spot-market delivery requests between various locations on the route subject to the vehicle’s capacity. We propose a compact mixed integer programming formulation for the MVBPMP and a Lagrangian-relaxation-based heuristic for finding high quality solutions within practical time limits. We present results of a computational study on problem instances derived from the BPMP literature.
Pricing and Routing Strategy to Maximize Profit in Congested Roads Considering Interactions with Travelers
ABSTRACT. We introduce an innovative approach for analyzing strategic interactions in transportation networks featuring ride-sourcing services. This study focuses on achieving company-traveler equilibria, whereby a single company optimizes pricing and routing decisions to maximize profitability while considering travelers’ mode choices, modeled via multinomial logit models (MNL). Although profit maximization problems have been extensively studied in the field of revenue management, their application to transportation networks poses unique challenges. To address the inherent non-linear relationship arising from endogenous travel demand, we shift our domain space from price to market share. Subsequently, we derive prices using a one-to-one correspondence within the MNL. This work pioneers the application of such novel techniques in the context of transportation network analysis. Remarkably, the proposed reformulation results in an equivalent problem exhibiting convexity, offering computational efficiency and interpretability. By solving the Karush-Kuhn-Tucker conditions, we characterize user equilibrium with the generalized route cost, which incorporates operating costs due to rebalancing and travelers’ disutility caused by congestion. Our approach is empirically validated through a numerical analysis conducted on the widely recognized Sioux Falls network. The results underscore the effectiveness and practical applicability of our method in analyzing transportation networks featuring ride-sourcing services and set the stage for important future investigations.
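A sketch of the price-to-share correspondence exploited above, under a standard MNL specification (our notation, not necessarily the authors'): if mode $i$ is chosen with probability
\[
s_i(p) \;=\; \frac{\exp(a_i - \beta p_i)}{1 + \sum_j \exp(a_j - \beta p_j)},
\]
and $s_0 = 1 - \sum_j s_j$ denotes the share of the outside option, then prices are recovered from market shares via $p_i = \bigl(a_i - \log(s_i/s_0)\bigr)/\beta$. Optimizing over shares $s$ rather than prices $p$ through this one-to-one map is what underlies the convex reformulation mentioned in the abstract.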
Integer programming formulations for partitioning a graph into low-diameter clusters
ABSTRACT. This paper studies the problem of partitioning the vertices of a graph into s-clubs from the lens of integer programming (IP). An s-club is a subset of vertices for which the diameter of its induced subgraph is at most s. We propose IP formulations and compare them with existing ones theoretically and computationally. We also develop approximation algorithms for the s-club partitioning problem. Furthermore, we propose variable fixing procedures to improve the performance of the IP formulations. We finally test our IP formulations on two existing sets of benchmark instances. Our computational experiments show the superiority of our approach when s is two or three.
The edge-based contiguous p-median problem with connections to territorial districting
ABSTRACT. We introduce the edge-based contiguous p-median (ECpM) problem to partition a road network into a given number of compact and contiguous territories. Two binary programming models are introduced, both of which incorporate node-to-edge (i.e., network) distances to reflect road travel. The first model requires an exponential number of cut set-based constraints to model contiguity; it is paired with separation algorithms that generate only a small number of these constraints, namely branch-and-cut and iterative branch-and-bound with a cut generation scheme. The second model utilizes a polynomial number of shortest-path contiguity (SPC) constraints to model contiguity and can be solved with off-the-shelf solvers. In addition, we demonstrate that the SPC constraints represent supervalid inequalities of the edge-based p-median problem, meaning that they cut off integer feasible but non-optimal solutions. The two solution approaches associated with the proposed models are tested on road networks with over 2000 nodes and close to 3000 edges. The shortest path-based formulation attains speedups in computational time of up to almost 48x relative to the featured separation schemes.
ABSTRACT. The two-metric projection method is a simple yet elegant algorithm proposed by Bertsekas in 1984 to address bound/box-constrained optimization problems. The algorithm's low per-iteration cost and potential for using Hessian information make it a favourable computational method for this problem class. However, its global convergence guarantee had not been studied in the nonconvex regime. In our work, we first investigate the global complexity of such a method for finding a first-order stationary solution. After properly scaling each step, we equip the algorithm with competitive complexity guarantees. Furthermore, we generalize the two-metric projection method to nonsmooth nonconvex problems such as l1-norm minimization and discuss its computational advantages and complexity.
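For orientation, the iteration in question (Bertsekas, 1984), written for $\min f(x)$ over a box $X = \{x : l \le x \le u\}$, is
\[
x^{k+1} \;=\; P_X\bigl(x^k - \alpha_k D_k \nabla f(x^k)\bigr),
\]
where $P_X$ is the Euclidean projection onto the box and $D_k$ is a positive definite scaling (possibly built from Hessian information) kept diagonal with respect to the coordinates at or near their bounds, so that the scaled step remains a descent direction after projection. The step scaling referred to in the abstract is applied to this iteration.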
ABSTRACT. Bilevel optimization has been widely used in a variety of areas such as adversarial training, hyperparameter tuning, image reconstruction, meta-learning, neural architecture search, and reinforcement learning. In this talk, we will present first-order methods for solving a class of bilevel optimization problems through the use of a single or sequential minimax optimization scheme. The first-order operation complexity of the proposed methods will be discussed.
[REFEREED PAPER TRACK ONLY] Convergence Guarantees of a Distributed Network Equivalence Algorithm for Distribution-OPF
ABSTRACT. We present ENDiCo-OPF, a distributed algorithm with convergence guarantees for the nonlinear optimal power flow problem. It is based on a previously developed decomposition-based method that employs network equivalence. We derive a sufficient condition under which ENDiCo-OPF is guaranteed to converge for a single iteration step on a local subsystem. We then derive conditions that guarantee convergence of a local subsystem over time. Finally, we derive system-specific conditions that, when satisfied in a time-sequential manner, guarantee global convergence for a line network. We also present simulations using the IEEE-123 bus test system to demonstrate the algorithm's effectiveness and provide additional insights into the theoretical results. Full version at https://www.arxiv.org/abs/2210.17465.
Toward a grand unified theory of accelerations in optimization and machine learning
ABSTRACT. Momentum-based acceleration of first-order optimization methods, first introduced by Nesterov, has been foundational to the theory and practice of large-scale optimization and machine learning. However, finding a fundamental understanding of such acceleration remains a long-standing open problem. In the past few years, several new acceleration mechanisms, distinct from Nesterov’s, have been discovered, and the similarities and dissimilarities among these new acceleration phenomena hint at a promising avenue of attack for the open problem. In this talk, we discuss the envisioned goal of developing a mathematical theory unifying the collection of acceleration mechanisms and the challenges that are to be overcome.
Learning and Predictions in Power System Frequency Control
ABSTRACT. Ensuring the frequency stability of electric grids with increasing renewable resources is a key problem in power system operations. Two questions have emerged: how should we design optimal controllers, and how do we deal with time-varying net loads? We will discuss how to use machine learning to design controllers and make useful predictions in the frequency control problem. We prove that the overall decentralized control architecture achieves frequency restoration.
Unit Commitment without Commitment: A Dynamic Framework for Managing an Integrated Energy System Under Uncertainty
ABSTRACT. Though variability and uncertainty have always posed challenges for power systems, the increasing use of renewable energy sources has exacerbated these issues. At a vertically integrated utility, the system operator manages many generation units -- renewable and otherwise -- and storage units to ensure that the total energy production matches contemporaneous demand. Current industry practice at these utilities involves solving "unit commitment" and "economic dispatch" optimization problems to choose production plans: these models, while complex, do not explicitly incorporate uncertainty. In this paper, we develop a dynamic framework to help system operators manage production under uncertainty. We formulate the problem as a stochastic dynamic program and use Lagrangian methods to decompose the system across units. The Lagrangian model relaxes the demand-matching constraint and introduces stochastic Lagrange multipliers that can be interpreted as prices representing the varying marginal value of energy production; each unit is then operated to maximize its own expected "profit" given these uncertain prices. These unit-specific value functions are then used to incorporate longer-term effects in dispatch decisions. The unit-specific value functions also provide a way to value generation and storage units in an uncertain environment. We develop relevant theory and demonstrate this dynamic framework using data from the Duke Energy Carolinas and Progress systems. Our numerical experiments demonstrate that this dynamic approach is computationally feasible at an industrial scale and can improve on current practice. Specifically, our results suggest that this dynamic approach can reduce operational costs by about 2% on average in the present Duke Energy system and, in a "future" system with increased solar and storage capacity, can reduce operational costs by 4-5% on average. Perhaps more strikingly, this dynamic approach, on average, performs within 0.2-0.3% of production plans based on perfect foresight about future net demands.
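A schematic version of the decomposition described above (our notation, suppressing unit-level constraints): relaxing the demand-matching constraint $\sum_i g_{it} = d_t$ with stochastic multipliers $\lambda_t$ gives the Lagrangian dual bound
\[
\mathbb{E}\Bigl[\sum_t \lambda_t d_t\Bigr] \;-\; \sum_i \max_{g_i}\ \mathbb{E}\Bigl[\sum_t \bigl(\lambda_t g_{it} - c_{it}(g_{it})\bigr)\Bigr],
\]
which separates across units: each generation or storage unit $i$ maximizes its own expected "profit" against the uncertain prices $\lambda_t$, with $c_{it}$ denoting its operating cost.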
ABSTRACT. This paper addresses the problem of computing systemic set-valued risk measures. The proposed method incorporates the clearing mechanism of the Eisenberg-Noe (EN) model, used as an aggregation function, with the value-at-risk (V@R), used as the underlying risk measure. The sample-average approximation (SAA) of the corresponding set-valued systemic risk measure can be calculated by solving a vector optimization problem. For this purpose, we propose a variation of the so-called grid algorithm in which grid points are evaluated by solving certain scalar mixed-integer programming problems, namely, the weighted sum and norm-minimizing scalarizations. We prove the convergence of the SAA optimal values of the scalarization problems to their respective true values. Moreover, we prove the convergence of the approximated set-valued risk measure to the true set-valued risk measure in both the Wijsman and Hausdorff senses. In order to demonstrate the applicability of our findings, we conducted a comprehensive sensitivity analysis by constructing a financial network based on the Bollobás preferential attachment model and modeling the economic disruptions using identically distributed random variables with a Pareto distribution.
Sectoral portfolio optimization by judicious selection of financial ratios via PCA
ABSTRACT. This paper introduces a novel strategy for embedding value investment into portfolio optimization. We utilize Principal Component Analysis (PCA) to identify dominant financial ratios (FRs) specific to each sector in two distinct ways: one from the component solution and the other from each category of ratios (viz., liquidity, solvency, profitability, and valuation) on the basis of their communalities. These sector-specific ratios are then integrated into a two-step sectoral portfolio optimization (SPO) model, incorporating the second-order stochastic dominance (SSD) criterion to derive an optimal investment. The strategy formed using the former set of extracted ratios is termed PCA-SPO(A), and the latter PCA-SPO(B). The empirical performance of the proposed strategies is assessed through a rolling window approach and over different market scenarios for six sectors of the S&P BSE 500 (India) and S&P 500 (U.S.) indices. The results obtained are compared with those from mean-variance, minimum variance, SPO, and nominal SSD models, both with and without FRs. PCA-SPO(B) consistently outperforms all other models in terms of downside deviation, CVaR, VaR, Sortino, Rachev, and STARR ratios over almost all out-of-sample periods. This approach highlights the importance of value investment, where ratios are carefully selected and embedded quantitatively in the portfolio selection process.
A Multiparametric Programming Approach to Accelerate Neural Network based Model Predictive Control
ABSTRACT. Recently, rectified linear unit (ReLU) based neural networks (NN) were shown to be mixed integer linear representable, thus allowing their incorporation into mixed integer programming (MIP) frameworks. In this work, we propose a novel bound tightening procedure based on a multiparametric (MP) programming formulation of the corresponding ReLU-reformulated optimization problem. The bounding procedure features 1) the generation of valid tight bounds on the individual auxiliary variables introduced from the ReLU NN reformulation and 2) bounds on groups of input and output variables of ReLU NNs. The tightened bounds are valid for all parameter realizations and thus can be utilized for different optimization problem realizations. This tightening only needs to be computed once and offline, thus allowing applications such as control where the online computation of these problems can be computationally prohibitive. Additionally, some aspects of the effect of NN normalization on the computational difficulty of these optimization problems are investigated in conjunction with the proposed bounds-tightening approach. A thorough analysis of how ReLU model weights affect the tractability of the overall optimization problem is provided. The procedure is compared to the base case without the parametric tightening procedure and network weight normalization through a Model Predictive Control (MPC) case study on a nonlinear chemostat.
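For context, the mixed-integer representation that the bound tightening above operates on is the standard big-M encoding of a ReLU unit (our notation): given valid bounds $L \le w^\top x + b \le U$ with $L < 0 < U$, the output $y = \max(0,\, w^\top x + b)$ is represented exactly by
\[
y \ge w^\top x + b, \qquad y \ge 0, \qquad y \le w^\top x + b - L(1-z), \qquad y \le Uz, \qquad z \in \{0,1\}.
\]
Tighter values of the bounds $L$ and $U$, which the multiparametric procedure supplies for all parameter realizations, translate directly into tighter continuous relaxations of the MIP.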
Improved Airline Network Topologies for Building Resilient & Robust Aviation Infrastructures
ABSTRACT. We propose new mathematical models to generate network connectivity patterns for airline networks in order to provide cost-effective solutions for enhanced operational efficiency and airport connectivity with improved tolerance to node disruptions. The objective is to enable the creation of a modeling and optimization framework for the design and development of commercial airline networks and to analyze them from a graph-theoretic perspective. We compare these newly developed, theoretically optimal network topologies with existing airline networks to evaluate their performance with regard to connectivity, robustness, resilience, and tolerance to targeted attacks.
Edge-concave Underestimators and Relaxations for Deterministic Global Optimization
ABSTRACT. Convex underestimators are often used to obtain a guaranteed bound on the global solution of a nonconvex problem. A key challenge with convex underestimators is that the resulting relaxation is not always tight if the original function is highly nonconvex. In this presentation, we will describe a new relaxation technique for the deterministic global optimization of twice-differentiable continuous problems, which marks a departure from convexity-based optimization approaches. Instead of using a convex underestimator, the relaxation is based on an underestimator which is edge-concave (componentwise concave). The underestimator is constructed by subtracting a positive quadratic expression such that all non-edge-concavities in the original function are overpowered by the subtracted expression. Next, we use the linear facets of the vertex polyhedral convex envelope of the edge-concave underestimator to obtain a linear programming (LP)-based relaxation of general nonconvex functions. The method will be presented with theoretical results and will be compared with convexification/underestimation techniques such as αBB and its variants through test examples.
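A minimal sketch of the construction described above, in our own notation and with one possible choice of quadratic shift (the authors' construction may differ in detail): for twice-differentiable $f$ on a box $[x^L, x^U]$, take
\[
g(x) \;=\; f(x) - \sum_i \alpha_i x_i^2, \qquad \alpha_i \;\ge\; \tfrac{1}{2}\max\Bigl(0,\ \max_{x\in[x^L,x^U]} \frac{\partial^2 f}{\partial x_i^2}(x)\Bigr),
\]
so that $g \le f$ and $\partial^2 g/\partial x_i^2 \le 0$ in every coordinate, i.e., $g$ is edge-concave; the LP relaxation is then built from the linear facets of the vertex-polyhedral convex envelope of $g$.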
A Hierarchy of Nonconvex Continuous Reformulations for Discrete Optimization
ABSTRACT. Many discrete optimization problems can be formulated as binary integer programs. Some of the most powerful general-purpose approaches for solving binary linear programs are based on hierarchies of successive convex (linear or semidefinite) relaxations that converge to an exact solution of the original formulation in a finite number of steps (given by the number of variables in the original formulation). Some well-known examples of such approaches include Sherali-Adams, Lovasz-Schrijver and Lasserre hierarchies. The limitation of these approaches is the exponential increase in the number of variables and constraints which hampers their practical applicability to a few levels of the hierarchies. We propose a fundamentally different approach to designing hierarchies of continuous formulations for discrete and combinatorial optimization problems. It shifts the focus from convexifying a given formulation to establishing equivalent non-convex reformulations of the original problem. The aim is to move towards an "equi-maximal" reformulation in which every local optimum is global, a property which is eventually achieved at the final level of the hierarchy. In terms of computational complexity, the improved quality of local maxima in the proposed approach comes with an increased cost of objective function evaluation, as opposed to the problem size in the existing hierarchies. The approach is demonstrated by extending the Motzkin-Straus formulation for the maximum clique problem to a hierarchy of standard polynomial programming formulations with the above-described properties. We explore various properties of these formulations and how they change through the different levels of the hierarchy.
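For reference, the formulation being extended is the Motzkin-Straus program: for a graph $G$ with adjacency matrix $A_G$ and clique number $\omega(G)$,
\[
\max_{x \in \Delta_n}\ x^\top A_G x \;=\; 1 - \frac{1}{\omega(G)}, \qquad \Delta_n = \Bigl\{x \ge 0 : \textstyle\sum_i x_i = 1\Bigr\},
\]
a continuous, nonconvex program whose global maximizers encode maximum cliques. The hierarchy described in the abstract builds successively higher-level standard polynomial programming formulations of this type, improving the quality of local maxima at each level.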
Efficient Pre-training on Massive Datasets: A Data perspective
ABSTRACT. Pre-training deep models with self-supervised contrastive methods allows learning more robust representations with superior generalization performance. However, the benefits of pre-training are often contingent on the large volume of the pre-training data. Recently, improving the quality of the pre-training data has been shown to be more effective in improving the performance than increasing its size. Nevertheless, finding subsets of examples that contribute the most to pre-training has remained an open question. In this talk, I discuss the first theoretically rigorous data selection methods for contrastive pre-training on large (1) image datasets and (2) multimodal image-text datasets. I show that such techniques allow discarding up to 40% of examples during pre-training without harming the performance and can even yield better models.
ABSTRACT. Standard bandit algorithms that assume continual reallocation of measurement effort are challenging to implement due to delayed feedback and infrastructural/organizational difficulties. Motivated by practical instances involving a handful of reallocation epochs in which outcomes are measured in batches, we develop a computation-driven adaptive experimentation framework that can flexibly handle batching. Our main observation is that normal approximations, which are universal in statistical inference, can also guide the design of adaptive algorithms. Instead of the typical theory-driven paradigm, we leverage computational tools and empirical benchmarking for algorithm development. Our approach significantly improves statistical power over standard methods, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. Overall, we expand the scope of adaptive experimentation to settings that are difficult for standard methods, involving limited adaptivity, low signal-to-noise ratio, and unknown reward distributions.
In-context learning of linear functions with gradient descent (an invited talk)
ABSTRACT. In-context learning is the striking capability of large language models to learn from data, which demonstrates a strong generalization capability of these models. For example, one can teach LLMs to swap the first and second characters in a word simply by providing examples. In this talk, we study the mechanism of in-context learning of least squares. By studying the landscape of the training loss, we show that models with linear attention implicitly implement gradient descent to solve least squares. This result provides novel insights on in-context learning and calls for future studies in this vein. This talk is based on joint work with Kwangjun Ahn, Xiang Cheng, and Suvrit Sra titled “Transformers learn to implement preconditioned gradient descent for in-context learning”, accepted at NeurIPS 2023.
How well do learning and dimension reduction methods for nonlinear dynamics generalize?
ABSTRACT. Here we define a notion of generalization applicable to ergodic dynamical systems. We observe empirically, in elementary examples, that the Neural ODE approach, which learns neural network representations of dynamical systems purely from timeseries data, does not lead to good generalization. The learned representation can be non-ergodic and statistically inaccurate. We discuss theoretically motivated strategies to reduce the generalization gap that extend to deterministic chaotic systems as well as stochastic systems. In the second part of the talk, we introduce dynamics-based generalization analyses for dimension reduction methods for timeseries data. Again considering the case of ergodic dynamics, we show how the spectral data of the associated transition operators are related to the generalization gap. Our analyses provide a rigorous grounding for folk wisdom around generalization of popular dimension reduction methods based on proper orthogonal decomposition (POD) and extended dynamic mode decomposition (DMD). Our results illustrate how ergodic theory and dynamical systems theory can fruitfully intersect with statistical learning theory and optimization to better our understanding of learning dynamical models from timeseries information.
Branch-and-price for prescriptive contagion analytics
ABSTRACT. Predictive contagion models are ubiquitous in epidemiology, social sciences, engineering, and management. This paper formulates a prescriptive contagion analytics model where a decision-maker allocates shared resources across multiple segments of a population, each governed by continuous-time dynamics. We define four real-world problems under this umbrella: vaccine distribution, vaccination centers deployment, content promotion, and congestion mitigation. These problems feature a large-scale mixed-integer non-convex optimization structure with constraints governed by ordinary differential equations, combining the challenges of discrete optimization, non-linear optimization, and continuous-time system dynamics. This paper develops a branch-and-price methodology for prescriptive contagion analytics based on: (i) a set partitioning reformulation; (ii) a column generation decomposition; (iii) a state-clustering algorithm for discrete-decision continuous-state dynamic programming; and (iv) a tri-partite branching scheme to circumvent non-linearities. Extensive experiments show that the algorithm scales to very large and otherwise-intractable instances, outperforming state-of-the-art benchmarks. Our methodology provides practical benefits in contagion systems; in particular, it can increase the effectiveness of a vaccination campaign by an estimated 12-70\%, resulting in 7,000 to 12,000 extra saved lives over a three-month horizon mirroring the COVID-19 pandemic. We provide an open-source implementation of the methodology in an online repository to enable replication.
Electric Vehicle Routing: Subpath-Based Decomposition
ABSTRACT. A critical challenge in transportation decarbonization lies in the electrification of long-haul operations along highways. To support this transition, this paper tackles the electric vehicle routing problem, and proposes a modelling and algorithmic approach to solve it without time or charge discretization. Starting with column generation for the path-based set-partitioning formulation, we develop a novel algorithm for the pricing subproblem which generates and stitches together non-dominated subpaths. We also use ng-route relaxations iteratively, expanding neighborhoods until one obtains elementary paths. Lastly, we apply cuts which strengthen the linear relaxation and show how the pricing problem can be solved including new duals from these cuts. Numerical results show that our approach outperforms state-of-the-art algorithms and scales to realistic problem sizes.
ABSTRACT. Shared rides, which pool individual riders into a single vehicle, are essential for mitigating congestion and promoting more sustainable urban transportation. However, major ridesharing platforms have long struggled to maintain a healthy and profitable shared rides product. To understand why shared rides have struggled, we analyze procedures commonly used in practice to set static prices for shared rides, and discuss their pitfalls. We then propose a pricing policy that is adaptive to matching outcomes, dubbed match-based pricing, which varies prices depending on whether a rider is dispatched alone or to what extent she is matched with another rider. Analysis on a single origin-destination setting reveals that match-based pricing is both profit-maximizing and altruistic, simultaneously improving cost efficiency and reducing rider payments relative to the optimal static pricing policy. These theoretical results are validated on a large-scale simulation with Chicago ridesharing data. The improvements in efficiency and reductions in payments are especially noticeable when costs are high and demand density is low, enabling healthy operations where they have historically been most challenging.
ABSTRACT. We consider a dispatching or scheduling model with release times and batching, where the time to dispatch a set of orders or jobs is submodular, and multiple vehicles or machines are available with the objective of minimizing the makespan. We propose and contrast several column generation mixed-integer formulations of the problem; we discuss the special case of interval-solvable functions, where an optimal solution only includes batches of the form [i, j]; finally, we present computational results for applications in same-day delivery and machine scheduling with family setup times.
ABSTRACT. To better handle real-time load and wind generation volatility in unit commitment, we present an enhancement to the computation of the security-constrained unit commitment (SCUC) problem. More specifically, we propose a two-stage optimization model for SCUC, which aims to provide a risk-aware schedule for power generation. Our model features a data-driven uncertainty set based on principal component analysis, which accommodates both load and wind production volatility and captures the locational correlation of uncertain data. To solve the model more efficiently, we develop a decomposition algorithm that can handle nonconvex subproblems. Our extensive experiments on the NYISO dataset show that the risk-aware model protects the public from potentially high costs caused by adverse circumstances.
Performance Risk Scoring of Risk-free Renewable Generation Bids
ABSTRACT. Renewable generators can bid a conservative segment of the forecasted power generated into the day-ahead market with the aim of generating greater risk-adjusted profitability. However, these bids and their underlying risks may vary across renewable generators, which must be calibrated for a reliable clearing of the day-ahead market. A rigorous third-party performance risk scoring of these risk-free bids is a valuable input and validation in support of a robust day-ahead market clearing framework. In this paper, we develop an ensemble machine learning approach for performance risk scoring of risk-free renewable generation bids and evaluate the performance of these bids made by different renewable resources and in different geographies.
ABSCORES, Managing Risk and Uncertainty in Electricity Systems Using Banking Scoring and Rating Methodologies
ABSTRACT. We present a usable multi-layered framework that facilitates the implementation of data-enabled recourse actions, instead of worst-case risk management, and manages prospective decisions about system dispatch and planning.
The ABSCORES platform establishes an Electric Assets Risk Bureau, with different scores customized according to the application required.
We leverage scoring and ratings methodologies from banking and financial institutions alongside current optimization methods in dispatching power systems to help system operators and electricity markets schedule resources. This approach is based on the observation that there are major discrepancies between the power scheduled by a system operator and the actual power generated/consumed. These discrepancies—exacerbated by unplanned contingencies (e.g., natural disasters)—are caused by multiple factors, including the different financial, environmental and risk preferences of power producers, consumers, and aggregators.
Optimizing design and dispatch of a renewable energy system with combined heat and power
ABSTRACT. We augment a mixed-integer program that prescribes a set of renewable energy, conventional generation, and storage technologies to procure, along with a corresponding dispatch strategy. Specifically, we add combined heat and power to this set. The model minimizes fixed and operational costs less incentives for the use of various technologies, subject to a series of component interoperability and system-wide constraints. The resulting mixed-integer linear program contains hundreds of thousands of variables and constraints. We demonstrate how to efficiently formulate and solve the corresponding instances such that we produce near-optimal solutions in minutes. A previous rendition of the model required hours of solution time for the same instances. We then apply this model to a case study for a hospital in South Africa, considering its need for reliable electricity in light of multiple outages that might occur over the course of a year, as well as its high heating and cooling loads. Our results show that optimal design and dispatch decisions for the distributed generation system address reliability.
Local convergence of a sequential quadratic programming method for a class of nonsmooth nonconvex objectives
ABSTRACT. A sequential quadratic programming (SQP) algorithm is designed for nonsmooth optimization problems with upper-C^2 objective functions. Upper-C^2 functions are locally equivalent to difference-of-convex (DC) functions with smooth convex parts. They arise naturally in many applications such as certain classes of solutions to parametric optimization problems, e.g., recourse of stochastic programming. We show that the algorithm is globally convergent under reasonable assumptions. Moreover, we study the local convergence behavior of the algorithm under additional assumptions of Kurdyka-{\L}ojasiewicz (KL) properties, which have been applied to many nonsmooth optimization problems. Due to the nonconvex nature of the problems, a special potential function is used to analyze local convergence. We show that, under acceptable assumptions, upper bounds on the local convergence rate can be proven. Additionally, we show that for a large number of optimization problems with upper-C^2 objectives, their corresponding potential functions are indeed KL functions. Numerical experiments are performed with a power grid optimization problem that is consistent with the assumptions and analysis in this paper.
A new working set method for nonlinear optimization
ABSTRACT. In this talk, we introduce a new active set method (ASM) called a working set method (WSM) for solving a general nonlinear inequality constrained minimization problem in a Hilbert space. Mathematical analysis will be presented to validate the method and to show its merits over other ASMs. Since the method is quite general and new, some results on its implementation and application are presented. Numerical examples on some benchmark problems are presented to show the efficiency of this method.
Projection-Free Methods for Solving Nonconvex-Concave Saddle Point Problems
ABSTRACT. In this paper, we investigate a class of constrained saddle point (SP) problems where the objective function is nonconvex-concave and smooth. This class of problems has wide applicability in machine learning, including robust multi-class classification and dictionary learning. While several projection-based primal-dual methods have been developed for tackling such problems, the availability of methods with projection-free oracles remains limited. To fill this gap, we propose efficient single-loop projection-free methods that rely on first-order information. Specifically, through the use of regularization and nested approximation techniques, we introduce a primal-dual conditional gradient method with convergence guarantees, which solely employs linear minimization oracles to handle constraints. To the best of our knowledge, our proposed algorithms stand among the first projection-free methods with convergence guarantees for solving nonconvex-concave SP problems.
Optimization with Superquantile Constraints: A Fast Computational Approach
ABSTRACT. We present an efficient and scalable second-order computational framework for solving large-scale optimization problems with superquantile constraints. Unlike empirical risk models, superquantile models have non-separable constraints that make typical first-order algorithms difficult to scale. We address the challenge by adopting a hybrid of the second-order semismooth Newton method and the augmented Lagrangian method, which takes advantage of the structured sparsity brought by the risk sensitive measures. The key to making the proposed computational framework scalable in the number of training data points is that the matrix-vector multiplication in solving the resulting Newton system can be computed in a reduced space due to the aforementioned sparsity. The computational cost per iteration for the Newton method is similar or even smaller than that of the first-order method. Our developed solver is expected to help improve the reliability and accuracy of statistical estimation and prediction, as well as control the risk of adverse events for safety-critical applications.
A Quantum-Classical Performance Separation in Nonconvex Optimization
ABSTRACT. In this paper, we identify a family of nonconvex continuous optimization instances, each d-dimensional instance with 2d local minima, to demonstrate a quantum-classical performance separation. Specifically, we prove that the recently proposed Quantum Hamiltonian Descent (QHD) algorithm [Leng et al., arXiv:2303.01471] is able to solve any d-dimensional instance from this family using O(d^3) quantum queries to the function value and O(d^4) additional 1-qubit and 2-qubit elementary quantum gates. On the other hand, a comprehensive empirical study suggests that representative state-of-the-art classical optimization algorithms/solvers (including Gurobi) would require a super-polynomial time to solve such optimization instances.
Analysis of Quantum Linear System Algorithms for Linear Systems in Tensor Format
ABSTRACT. Solving linear systems is the foundation of many algorithms. Recently, quantum linear system algorithms (QLSAs) have attracted great attention since they solve quantum linear systems exponentially faster than classical algorithms in terms of dimension. However, these QLSAs assume the existence of certain oracles in preprocessing data and building circuits, making these algorithms not implementable on current quantum computers. In this work, we focus on the application of QLSAs to linear systems in low rank tensor format, which is widely used when solving discretized PDEs. There have been works using modified Krylov subspace methods to solve such linear systems with per-iteration complexity being polylogarithmic in the dimension. We propose an implementation of a recently proposed adiabatic-inspired QLSA and perform a detailed analysis of the circuit depth of the implementation. We show the total complexity of our implementation is polylogarithmic in the dimension, which is comparable to the per-iteration complexity of the classical methods.
Divide and conquer approach for distributed quantum computing
ABSTRACT. We present a quantum divide-and-conquer algorithm designed to facilitate the application of distributed computing to constrained combinatorial optimization problems. This algorithm comprises three key elements: firstly, the classical partitioning of a target graph into multiple subgraphs; secondly, variational optimization conducted over these subgraphs; and finally, a quantum circuit cutting technique that enables independent optimization on distinct quantum processors.
Rigorously Approximating the Local Hamiltonian Problem
ABSTRACT. We discuss several works related to approximating natural quantum generalizations of constraint satisfaction problems. In contrast to the landscape of classical optimization, quantum problems offer rich behavior where analogies of standard techniques which achieve the optimal approximation factor classically are known to be suboptimal. I will give an overview of our work, discussing three results: (1) an approximation algorithm for arbitrary 2-local positive semi-definite (PSD) local Hamiltonians (arXiv:2012.12347), (2) an SDP-based approximation algorithm for a natural quantum analog of Max-Cut (arXiv:2105.05698), and (3) an optimal product state algorithm for the same problem (arXiv:2206.08342). We build an understanding of monogamy of entanglement inequalities satisfied by polynomial-time relaxations of the local Hamiltonian problem for the algorithms in arXiv:2105.05698 and arXiv:2206.08342. These inequalities have been important ingredients in subsequent (best known) approximation algorithms by other authors.
ABSTRACT. The modeling efforts towards handling non-stationary data are important in practice but remain underexplored. In this presentation, our discussion centers on the enhanced performance capabilities of non-parametric Bayesian surrogate models. We focus on Bayesian Gaussian Processes (GP) and Bayesian Additive Regression Trees (BART).
We introduce an advanced approach to non-stationary modeling in GPs, achieved by integrating deep kernels, a technique that significantly improves non-stationary fitting precision.
We present a novel method for scaling BART tree surrogate models using sharding techniques, enhancing their adaptiveness and scalability on massive datasets.
These novel approaches invite us to rethink possible approaches to non-stationarity using Bayesian non-parametric modeling.
ABSTRACT. We consider iterative methods for finite-sum minimization problems in which the summand functions are computationally expensive, making it undesirable to evaluate all summands on every iteration. We present the idea of stochastic average model methods, which sample component functions according to a probability distribution on component functions that minimizes an upper bound on the variance of the resulting stochastic model. These methods are conceptually similar to, and inspired by, ``semi-stochastic" gradient methods, which were motivated in large part by problems coming from the machine learning community. We present promising numerical results on a corresponding extension to the derivative-free model-based trust-region solver POUNDERS.
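To convey the flavor of sampling components from a distribution chosen to control estimator variance, the sketch below shows a simpler, standard importance-sampling scheme for a finite-sum gradient with probabilities proportional to per-component bounds; the actual stochastic average model methods of the talk sample according to a model-variance bound, and the toy problem, bounds, and batch size here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy finite sum f(x) = sum_i f_i(x) with quadratic components of very
# different scales, mimicking expensive summands with heterogeneous bounds.
n, d = 50, 3
A = [rng.normal(scale=s, size=(d, d)) for s in np.linspace(0.1, 5.0, n)]

def grad_i(i, x):
    return A[i].T @ (A[i] @ x)

# Per-component bounds (e.g., Lipschitz-type constants); sampling with
# probabilities proportional to such bounds minimizes a standard variance
# upper bound for the importance-sampled estimator.
bounds = np.array([np.linalg.norm(Ai, 2) ** 2 for Ai in A])
p = bounds / bounds.sum()

def sampled_gradient(x, batch=5):
    idx = rng.choice(n, size=batch, p=p)
    # Unbiased estimate of the full-sum gradient via importance weights 1/p_i.
    return sum(grad_i(i, x) / (batch * p[i]) for i in idx)

x = rng.normal(size=d)
full = sum(grad_i(i, x) for i in range(n))
print(np.linalg.norm(sampled_gradient(x) - full))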
Integration of Machine Learning and Genetic Algorithm as a Generative Model
ABSTRACT. Standard generative models such as variational autoencoders, generative adversarial networks, or properly implemented recurrent neural networks have already been used to accelerate antimicrobial discovery, to design functional protein variants, peptide-based drugs, and small molecules. However, these generative models are limited by a lack of interpretability and by the ability to fine-tune their search direction. Although the genetic algorithm (GA) is used to solve optimization tasks and is not considered a generative model, in our project we implemented a machine learning (ML)-based predictive model in the fitness function, introduced mutation operators which enable structural modifications, and constructed a GA that discovers novel peptides with functions that the ML model classifies. This flexible generative framework is enriched by two penalty functions which guide the search towards high diversity in the population and promote a desired feature. The application of ML allows us to deal with the incomplete understanding of complex phenomena such as the sequence-to-function relationship in peptide chemistry, while the stochastic nature of the GA allows us to repeat the evolutionary process and obtain a different output, making the model highly reusable. In addition, this optimization method can perform an unbiased search that is free of prejudice, assumptions, or other human-introduced biases and may ultimately find solutions superior to those that come from a gradual search guided by rational design.
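As a minimal, hedged sketch of the overall scheme (a GA over peptide sequences whose fitness combines a predictive score with a diversity penalty), the code below uses a placeholder scoring rule where the trained ML classifier would sit; the scoring function, penalty weight, mutation operators, and population sizes are all hypothetical stand-ins, not the authors' implementation.

```python
import random

random.seed(0)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def ml_fitness(seq):
    # Placeholder for a trained ML classifier's predicted probability that the
    # peptide has the target function (hypothetical scoring rule).
    return sum(seq.count(a) for a in "KRH") / len(seq)

def diversity_penalty(seq, population):
    # Penalize sequences that are too similar to the rest of the population.
    same = sum(sum(a == b for a, b in zip(seq, other)) / len(seq) for other in population)
    return same / len(population)

def mutate(seq):
    i = random.randrange(len(seq))
    return seq[:i] + random.choice(AMINO_ACIDS) + seq[i + 1:]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = ["".join(random.choices(AMINO_ACIDS, k=12)) for _ in range(30)]
for generation in range(50):
    scored = sorted(population,
                    key=lambda s: ml_fitness(s) - 0.1 * diversity_penalty(s, population),
                    reverse=True)
    parents = scored[:10]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]
    population = parents + children

print(max(population, key=ml_fitness))
```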
Cost-Optimization of Process-Scale Desalination Systems Incorporating Surrogate-based Water Chemistry Models
ABSTRACT. Incorporating water chemistry phenomena into water treatment design is a critical aspect of evaluating emerging water treatment processes, but one that is often overlooked in literature due to the complexity of water chemistry processes and the limitations on data availability. In this talk, we will discuss how machine learning surrogates can provide a powerful mechanism for capturing complex water chemistry phenomena such as chemical pre-treatment and mineral scaling, thereby enabling the design, optimization, and technoeconomic assessment of advanced desalination technologies such as high-pressure reverse osmosis (HPRO). Finally, using WaterTAP, an advanced technoeconomic platform for water treatment, we explore the implications of pretreatment decisions and potential minimum liquid discharge targets on desalination costs and system operation.
Adversarially Robust Optimal Control with Causal Transport Distance
ABSTRACT. We investigate stochastic optimal control problems in the presence of ambiguous disturbance distributions. To address this dynamic uncertainty, we propose a minimax distributionally robust formulation based on the $\infty$-causal transport distance. We develop a nested dynamic programming reformulation for this problem. To tackle the dynamic programming problem involving continuous states and controls, we devise a stochastic dual dynamic programming algorithm. Numerical experiments on energy planning demonstrate the effectiveness of our approach.
Multi-stage stochastic programming for integrated hurricane relief logistics and evacuation planning
ABSTRACT. In this paper, we study an integrated hurricane relief logistics and evacuation planning (HRLEP) problem. We propose multi-stage stochastic programming models that integrate the hurricane relief item pre-positioning problem and the hurricane evacuation planning problem, which are often treated as separate problems in the literature, by incorporating the historical data on hurricane forecast errors (FE). Specifically, we fit the historical FE data to an auto-regressive model of order one (AR-1), from which we generate FE realizations to create evacuation demand scenarios. We evaluate the value of integration and the value of adaptive planning by comparing the decision policy given by the proposed multi-stage stochastic programming model with a static decision policy based on two-stage stochastic programming. We will present extensive numerical experiment results and sensitivity analyses based on real-world data from Hurricane Florence and Hurricane Ian.
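A minimal sketch of the scenario-generation step, assuming synthetic forecast-error data in place of the real FE records: fit an AR(1) model by least squares and simulate error paths from it (each path would then be translated into an evacuation demand scenario).

```python
import numpy as np

rng = np.random.default_rng(0)

# Historical hurricane forecast errors (synthetic stand-in for the real FE data).
fe = np.cumsum(rng.normal(size=200)) * 0.1

# Fit an AR(1) model  e_t = c + phi * e_{t-1} + eps_t  by least squares.
X = np.column_stack([np.ones(len(fe) - 1), fe[:-1]])
c, phi = np.linalg.lstsq(X, fe[1:], rcond=None)[0]
sigma = np.std(fe[1:] - X @ np.array([c, phi]))

def scenarios(n_scen, horizon, e0):
    """Simulate forecast-error paths from the fitted AR(1) model."""
    paths = np.empty((n_scen, horizon))
    e = np.full(n_scen, e0)
    for t in range(horizon):
        e = c + phi * e + sigma * rng.normal(size=n_scen)
        paths[:, t] = e
    return paths

print(scenarios(5, 4, fe[-1]))
```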
Data-driven Policies for Multistage Stochastic Programming
ABSTRACT. This talk presents a study of data-driven policies or decision rules for multistage stochastic linear programming (MSLP) problems, called basic-feasible policies. These policies are piecewise affine mappings whose design exploits the polyhedral structure of the feasible regions and value functions of the MSLP problems. We introduce approaches to derive these policies for static instances of MSLP problems where the underlying distribution of random quantities has finite support. We also propose a sequential sampling approach to derive policies for stochastic instances of the MSLP problems with continuous random variables. We establish the performance of solution algorithms that embed these basic-feasible policies.
ABSTRACT. Multistage stochastic programming is a powerful tool allowing decision-makers to revise their decisions at each stage based on the realized uncertainty. However, organizations may not be fully flexible, as decisions cannot be revised too frequently due to their high organizational impact. Consequently, decision commitment becomes crucial to ensure that initially made decisions remain unchanged for a certain period. This study introduces adaptive multistage stochastic programming, a new optimization paradigm that strikes an optimal balance between decision flexibility and commitment by determining the best stages to revise decisions depending on the allowed level of flexibility. We introduce a novel mathematical formulation and theoretical properties eliminating certain constraint sets. Furthermore, we develop a decomposition method that effectively handles mixed-integer adaptive multistage programs by adapting the integer L-shaped method and Benders decomposition. Computational experiments on stochastic lot-sizing and generation expansion planning problems show substantial advantages attained through optimal selections of revision times when flexibility is limited, while demonstrating computational efficiency of the proposed properties and solution methodology. Optimizing revision times in a less flexible case can outperform arbitrary selection in a more flexible case. By adhering to these optimal revision times, organizations can achieve performance levels comparable to fully flexible settings.
ABSTRACT. We study the unit-demand pricing problem for Hicksian demand. In the well-studied Bayesian unit-demand pricing problem, when valuations are IID, an optimally chosen single price is a two-approximation of the optimal pricing. We find that under Hicksian demand, a well-chosen single price is always optimal, implying that the paradoxical advantage of non-symmetric prices vanishes when valuations are symmetric in more conservative models. We also consider extensions to independent but non-identical valuations, and give new structural results and approximation guarantees for exchangeable valuation distributions.
Min-max-min Optimization with Smooth and Strongly Convex Objectives
ABSTRACT. We consider min-max-min optimization with smooth and strongly convex objectives. Our motivation for studying this class of problems stems from its connection to the k-center problem and the growing literature on min-max-min robust optimization. In particular, the considered class of problems non-trivially generalizes the Euclidean k-center problem in the sense that distances in this more general setting do not necessarily satisfy metric properties.
We present a 9 kappa-approximation algorithm (where kappa is the maximum condition number of the functions involved) that generalizes a simple greedy 2-approximation algorithm for the classical k-center problem. We show that for any choice of kappa, there is an instance with condition number kappa for which our algorithm yields a (4 kappa + 4\sqrt{kappa} + 1)-approximation guarantee, implying that our analysis is tight when kappa = 1. Finally, we compare the computational performance of our approximation algorithm with an exact mixed integer linear programming approach.
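For reference, the classical farthest-point greedy 2-approximation for Euclidean k-center that the abstract's algorithm generalizes can be sketched as follows; this is the textbook baseline, not the algorithm for the strongly convex setting, and the data here is synthetic.

```python
import numpy as np

def greedy_k_center(points, k, seed=0):
    """Farthest-point greedy: a 2-approximation for Euclidean k-center."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(points)))]
    dist = np.linalg.norm(points - points[centers[0]], axis=1)
    while len(centers) < k:
        nxt = int(np.argmax(dist))          # farthest point from current centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers, dist.max()              # chosen centers and covering radius

pts = np.random.default_rng(1).normal(size=(200, 2))
centers, radius = greedy_k_center(pts, k=5)
print(centers, radius)
```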
ABSTRACT. Inverse optimization involves inferring the unknown parameters of a “forward” optimization problem from a given solution to the forward problem. In this talk, we consider the case in which the forward problem is a mixed-integer linear program (MILP). While this class of problems has received growing attention, ongoing work primarily considers norm-based inverse objectives and solves the corresponding inverse problems with cutting-plane-based methods that lack theoretical guarantees. In this study, we consider more general convex inverse objectives and propose first-order methods for solving the corresponding inverse problems. We establish convergence rates for the methods in terms of the number of forward optimization problem solves and a natural condition measure. We also assess the empirical performance of our methods through computational experiments.
Learning Discrete-Time Markov Chains from Aggregate Data
ABSTRACT. We consider the problem of learning the transition probabilities of a Discrete-Time Markov Chain (DTMC) from aggregate data, i.e., estimates of the probability that the DTMC is in certain states at certain times. Our motivation for studying the problem stems from health studies that consider calibrating (i.e., algorithmically adjusting) the transition probabilities of Markov chain models of disease progression to agree with target disease prevalence data. The calibration problem has primarily been tackled with black-box simulation-based calibration methods, leaving open a number of interesting computational and mathematical questions that we investigate in this talk. We formulate the calibration problem as a non-convex optimization problem and consider solving it with a first-order method that only requires relatively inexpensive matrix-vector multiplications. We establish conditions under which the transition probabilities are identifiable and the first-order method converges to them. We also investigate the performance of the method through computational experiments.
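A minimal sketch, under assumptions, of calibrating transition probabilities to aggregate state-probability targets with a projected-gradient method whose forward and adjoint passes use only matrix-vector products; this generic least-squares illustration is not the authors' specific formulation or identifiability conditions, and the chain size, targets, and step size are hypothetical.

```python
import numpy as np

def project_rows_to_simplex(P):
    """Euclidean projection of each row onto the probability simplex."""
    Q = np.empty_like(P)
    for i, v in enumerate(P):
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.max(np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0])
        theta = (css[rho] - 1) / (rho + 1)
        Q[i] = np.maximum(v - theta, 0.0)
    return Q

def loss_and_grad(P, mu0, targets):
    """0.5 * sum_t ||mu0 P^t - y_t||^2 and its gradient in P (adjoint recursion)."""
    mus = [mu0]
    for _ in targets:
        mus.append(mus[-1] @ P)          # forward pass: matrix-vector products only
    loss, grad, bar = 0.0, np.zeros_like(P), np.zeros_like(mu0)
    for t in range(len(targets), 0, -1):
        resid = mus[t] - targets[t - 1]
        loss += 0.5 * resid @ resid
        bar = bar + resid                # adjoint of mu_t
        grad += np.outer(mus[t - 1], bar)
        bar = bar @ P.T                  # pass adjoint back to mu_{t-1}
    return loss, grad

rng = np.random.default_rng(0)
n = 4
P_true = project_rows_to_simplex(rng.random((n, n)))
mu0 = np.eye(n)[0]                       # chain starts in state 0
targets = [mu0 @ np.linalg.matrix_power(P_true, t) for t in range(1, 6)]

P = project_rows_to_simplex(rng.random((n, n)))   # random feasible starting chain
for it in range(3000):
    loss, grad = loss_and_grad(P, mu0, targets)
    P = project_rows_to_simplex(P - 0.2 * grad)    # projected-gradient step
print(loss)  # fit to the aggregate targets; identifiability depends on the data
```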
Recent advances in computing Nash equilibria in Markov Games
ABSTRACT. In this talk, we will focus on the problem of computing Nash equilibria in Markov Games. The seminal work of Daskalakis et al. (and later works) indicates that computing Nash equilibria in multi-player Markov games is a computationally hard task. This fact raises the question of whether or not computational intractability can be circumvented if one focuses on specific classes of Markov games. We answer the question in the affirmative by providing computationally efficient policy gradient-type methods that approximate Nash equilibria for various classes of Markov games, including Markov potential games, adversarial team Markov games, and zero-sum polymatrix Markov games.
Two-Timescale Q-Learning with Function Approximation in Zero-Sum Markov Games
ABSTRACT. Many real-world applications by nature involve the decision-making and learning of agents in strategic environments, such as board/strategic games, resource allocation, cybersecurity, pricing, and auction bidding. In this talk, for the development of theoretically solid algorithms, we consider two-player zero-sum Markov games, a benchmark setting where conflicts of interest come into play. We propose a learning algorithm called two-timescale Q-learning with function approximation that is payoff-based, convergent, rational, and symmetric between the two players. In the special case of linear function approximation, we show that our proposed algorithm enjoys last-iterate finite-sample guarantees. To the best of our knowledge, this is the first payoff-based independent learning algorithm that successfully incorporates function approximation and has last-iterate guarantees.
To establish the results, we model our proposed algorithm as a two-timescale stochastic approximation and derive the finite-sample bound through a Lyapunov-based approach. The key novelty lies in constructing a valid Lyapunov function to capture the evolution of the slow-timescale iterates, which we use to compute the policies. Specifically, through a change of variables, we demonstrate that the update equation of the slow-timescale iterates resembles the smoothed best-response dynamics, which are classical learning dynamics in games. This connection enables us to construct a valid Lyapunov function via a generalized variant of the Moreau envelope of the regularized Nash gap. The roadmap for the construction of our Lyapunov function might be of independent interest broadly for studying the behavior of stochastic approximation algorithms.
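For reference, the standard Moreau envelope underlying this construction is recalled below; the talk uses a generalized variant of it applied to the regularized Nash gap, so the exact object differs.
$$
  g_\lambda(\theta) \;=\; \min_{\theta'} \Big\{\, g(\theta') + \tfrac{1}{2\lambda}\,\|\theta' - \theta\|^2 \Big\}, \qquad \lambda > 0 .
$$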
Decentralized and Networked Policy Gradient for Solving Markov Potential Games
ABSTRACT. We design a multi-agent and networked policy gradient algorithm in Markov potential games. Each agent has its own rewards and utility as functions of joint actions and a shared state among agents. The state dynamics depend on the joint actions taken. Differentiable Markov potential games are defined based on the existence of a potential (value) function having partial gradients equal to the local gradients of agents' individual value functions. Agents implement continuous parameterized policies defined over the state and other agents' parameters to maximize their utilities against each other. Agents compute their stochastic policy gradients to update their parameters with respect to their local estimates of Q-functions and joint parameters. The updated parameters are shared with neighbors over a time-varying network. We prove the convergence of joint parameters to a first-order stationary point of the potential function almost surely for any type of state and action spaces. We also show that the convergence rate to reach an $\epsilon$-approximate first-order stationary point is $O(\epsilon^{-2})$ with a chosen step-size rate. Numerical results illustrate the potential advantages of using networked policies compared to independent policies.
Optimistic Policy Iteration In Linearly-Solvable Multi-Agent Games
ABSTRACT. We present schemes for performing optimistic policy iteration in linearly-solvable multi-agent, discounted infinite-horizon Markov games with provable convergence to the optimal joint policy.
We first introduce our Kullback-Leibler regularized optimistic policy iteration scheme (KL-OPI) in identical-interest Markov games.
We show that the KL-OPI scheme only propagates a decaying policy evaluation error but not a policy improvement error. In addition, we extend the scheme to the Markov Potential Games (MPGs) setting as the KL-OPI-MPG scheme and show that we obtain a stationary Nash equilibrium point in the MPG. Moreover, we relate the MPG game to the original identical-interest game by showing that our KL-OPI-MPG scheme gives a $\nu$-approximate Nash equilibrium point in the identical-interest Markov game. Based on the two schemes, we design our asynchronous tabular reinforcement learning algorithms. The proposed algorithms obtain the policy for each agent by updating the value function of a subset of initially sampled joint states using first-visit Monte Carlo trajectories. Simulation results on a multi-agent Stag-Hare example validate our learning algorithms' performance in terms of minimizing the total discounted costs.
Minimal representations of generalized power cones
ABSTRACT. In this work, we analyze minimal representations of generalized power cones as simpler cones. We derive some new results on the complexity of the representations, and we provide a procedure to construct a minimal representation by means of second order cones when the parameters defining the cones are rational. The construction is based on the identification of the cones with a graph, the mediated graph. We then develop a mixed integer linear optimization formulation to obtain the optimal mediated graph, and thereby the minimal representation. We present the results of a series of computational experiments in order to analyze the computational performance of the approach, both for obtaining the representation and for incorporating it into a practical conic optimization model that arises in facility location.
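For context, one common parameterization of the generalized power cone is the following; the exact definition and parameter conventions used in the talk may differ.
$$
  \mathcal{P}_{\alpha}^{n,m} \;=\; \Big\{ (x, z) \in \mathbb{R}^{n}_{+} \times \mathbb{R}^{m} : \prod_{i=1}^{n} x_i^{\alpha_i} \;\ge\; \|z\|_2 \Big\},
  \qquad \alpha_i > 0,\ \ \textstyle\sum_{i=1}^{n} \alpha_i = 1 .
$$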
Integer Points in Convex Cones: The Case of PSD and SOC
ABSTRACT. The integer points within a convex cone form a semigroup. In this talk, we will extend classical results in integer linear programming to conic integer programming in the special cases of the cone of positive semidefinite matrices (PSD) and the second order cone (SOC). We show that both of these conical semigroups have a notion of a finite generating set similar in spirit to Hilbert bases, but requiring the action of a finitely generated group. This allows us to discuss generating sets of conical semigroups from arbitrary convex cones, rather than remaining confined to polyhedral cones. Lastly, we extend notions of total dual integrality, Gomory-Chvátal closure, and Carathéodory rank to integer points in arbitrary convex cones.
Mixture Model for Contextual Route Choice in Multimodal Transportation Systems
ABSTRACT. The development of personalized travel recommendation systems (TRSs) is crucial in assisting travelers to navigate intricate transportation networks and providing affordable, reliable multimodal mobility options. However, predicting route behavior based on trip data becomes a challenging task because of the diversification of intermodal options and latent factors such as trust in the recommended routes. In this paper, we propose an Inverse Mixture Model (IMM) composed of two interconnected components: (1) a set of dynamic route choice models with arc-to-arc transition functions contingent on the contextual traffic conditions, and (2) mixture probabilities determining the selection of route choice models. These mixture probabilities are influenced by the expected value of following recommended routes, thus emulating travel behavior in dynamic and convoluted traffic networks. Two EM algorithms are tailored for estimating IMMs in the following scenarios: (1) when mixture probabilities are predetermined upon arriving at the origin, and (2) when mixture probabilities vary at each intersection. The effectiveness of the IMM and the estimation methods is validated through our numerical results, using both synthesized and real-world trip data. The extension to a mixture of route choice models enables the identification of heterogeneous routing behavior at a granular level, thereby facilitating the development of data-driven travel behavior prediction models.
Profits or Loss: Installing Electric Vehicle Chargers on Public Parking Lots
ABSTRACT. Electric vehicles (EVs) are the promising key to sustainable transport systems. The EV fleet will become a stakeholder in and service supplier to the Texas power grid because of tremendous powertrain efficiency gains. Limited public charging infrastructure and large upfront investment could be a bottleneck in the wide adoption and use of EVs. We propose a data-driven study to examine the price elasticity of operating EV chargers in public parking lots. The study determines optimal siting and sizing of EV charging stations to maximize the profitability of charger investment. Specifically, we first study an open dataset of historical parking information for 18 public parking lots in Santa Monica, CA from 2014 to present to identify factors that affect parking demand, including time of year, nearby traffic, parking prices, and the pandemic event. Then we recommend the best price policy for EV chargers installed in public parking lots, following the price elasticity resulting from the historical data. Finally, we propose a robust optimization model to determine the optimal locations and sizes of EV chargers by considering uncertain future demand. The approach of obtaining optimal charger allocations can be rolled out for multiple phases through 2050.
An infeasible interior-point arc-search algorithm for spacecraft trajectory optimization
ABSTRACT. Optimization is a critical step in the trajectory design process for spacecraft because it can lower the time or propellant mass required to achieve a mission’s objectives. Two popular methods have been used to solve the nonlinear optimization problem: the sequential quadratic programming (SQP) method and the interior-point method (IPM). It can be shown that the first-order IPM is essentially equivalent to the steepest descent algorithm, which is inefficient. In this presentation, we introduce a second-order IPM, which uses an arc-search technique to approximate the central path better than the traditional line search, thereby significantly improving optimization efficiency. The second-order method requires solving two linear systems instead of one, but the coefficient matrix is the same for both systems, meaning only one matrix decomposition is required. Additionally, most of the elements of the right-hand side vector of the second system are computed during the assembly of the coefficient matrix. Preliminary tests are performed using some CUTEst benchmark problems and show promising results. Several trajectory design problems, including a two-dimensional circular orbit Hohmann transfer, an elliptical orbit Hohmann transfer, and a spiraling low-thrust orbit transfer, are successfully solved.
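A minimal sketch of the "one decomposition, two solves" idea, using a dense LU factorization on a synthetic well-conditioned matrix; the actual IPM linear algebra (sparse, structured, with predictor- and corrector-style right-hand sides) would of course differ.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 200
A = rng.normal(size=(n, n)) + n * np.eye(n)   # stand-in for the common coefficient matrix
rhs1 = rng.normal(size=n)                      # right-hand side of the first linear system
rhs2 = rng.normal(size=n)                      # right-hand side of the second linear system

factors = lu_factor(A)          # single matrix decomposition, O(n^3)
dx1 = lu_solve(factors, rhs1)   # first solve, O(n^2)
dx2 = lu_solve(factors, rhs2)   # second solve reuses the same factorization
print(np.linalg.norm(A @ dx1 - rhs1), np.linalg.norm(A @ dx2 - rhs2))
```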
Optimization of a Bioenergy Supply Chain for Agricultural Waste Considering Environmental Aspects in Ireland
ABSTRACT. This research focuses on the optimization of biorefinery location and supply chain design, seeking to revolutionize agricultural waste management in Ireland. It places a strong emphasis on the optimization component, aiming to develop a sustainable bioenergy supply chain that considerably decreases environmental impact and maximizes resource efficiency. Through the integration of advanced techniques such as Geographic Information Systems (GIS), this research pioneers a holistic methodology for selecting optimal facility locations and designing efficient supply chains. By doing so, it not only tackles agricultural waste challenges but also sets the stage for a more environmentally responsible and sustainable future in biorefinery operations. The study's findings and methodology have the potential to reshape how industries approach waste management and network design.