Harry Jiang (Carnegie Mellon University, United States) Xiaoxi Zhang (Sun Yat-Sen University, United States) Carlee Joe-Wong (Carnegie Mellon University, United States)
DOLL: Distributed OnLine Learning Using Preemptible Instances
ABSTRACT. Training large machine learning models, e.g., large language models, is often prohibitively costly, particularly for small enterprises. Thus, much recent work has proposed to run such training workloads on preemptible instances, which cost much less than traditional on-demand cloud instances but do not provide reliable access to computing resources. Mitigating the resulting instability in access to compute resources is made even more challenging when the model training data arrives in an online fashion. In such datastream settings, both the data arrivals and compute resources may bottleneck training. We propose DOLL, which designs, analyzes, and optimizes a framework to train machine learning models on distributed datastreams. We derive theoretical guarantees on model convergence in DOLL's framework and validate them experimentally.
ABSTRACT. In many applications such as rationing medical care and supplies, university admissions, and the assignment of public housing, the decision of who receives an allocation can be justified by various normative criteria (ethical, financial, legal, etc). Such settings motivate a natural priority-respecting allocation problem: several categories, each with a quota of interchangeable items, wish to allocate the
items among a set of agents. Each category has a list of eligible agents and a priority ordering over these agents; agents may be eligible in multiple categories. The goal is to select a valid allocation: one that respects quotas, eligibility, and priorities, and ensures Pareto efficiency. I will describe some recent work of ours, which provides a complete algorithmic characterization of all valid allocations. I will then show how this characterization allows us to understand the strange complexity landscape of such problems, as well as characterize optimal tradeoffs in dynamic priority-respecting allocation. In particular, in settings where allocations must be made to agents arriving online via some stochastic process, we show that while insisting on zero priority violations leads loss in efficiency which is linear in the horizon, one can design allocation policies that ensure that the ex post sum of the efficiency loss and priority violations is constant, independent of the horizon.
Learning in Network Games with Dynamic Populations
ABSTRACT. In many practical settings, agents need to make strategic decisions in a network environment, but lack information or computational capabilities to determine the best policy ahead of time and thus must learn online what strategy to use. Such a learning process becomes challenging when the underlying network of interactions is time-varying and the number of agents is very large and dynamic, with agents joining or leaving the system. In this talk, we investigate whether agents can adapt to such a non-stationary environment and nonetheless learn a suitable strategy, based on their past actions and observations. In particular, we focus on learning dynamics over large linear-quadratic network games. We model time-varying populations and their changing interactions as stochastic network realizations from a random network model and demonstrate that under suitable gradient dynamics, agents almost surely converge to a strategy that corresponds to the Nash equilibrium of the game played over the expected network of interactions. Moreover, we prove that, for any stage game, the strategy learned by the agents is, with high probability, an epsilon-Nash equilibrium of the stage game where epsilon decreases with increasing population size. We apply these results to study dynamic pricing games in which online merchants must adapt their pricing strategy to changing customers with varying demands.
09:30
Ozan Candogan (University of Chicago, Booth School of Business, United States) Manxi Wu (Cornell University, Operations Research and Information Engineering, United States)
Information Design for Spatial Resource Allocation
ABSTRACT. In this paper, we study platforms where resources and jobs are spatially distributed, and resources have the flexibility to strategically move to different locations for better payoffs. The price of the service at each location depends on the number of resources present and the market size, which is modeled as a random state. Our focus is on how the platform can utilize information about the underlying state to influence resource repositioning decisions and ultimately increase commission revenues. We establish that in many practically relevant settings a simple monotone partitional information disclosure policy is optimal. This policy reveals state realizations below a threshold and above a second (higher) threshold, and pools all states in between and maps them to a unique signal realization. We also provide algorithmic approaches for obtaining (near-)optimal information structures that are monotone partitional in general settings.
Discovering and Learning from Closed-Loop Failures of Vision-based Controllers.
ABSTRACT. Machine learning-driven image-based controllers allow robotic systems to take intelligent actions based on the visual feedback from their environment. Understanding when these controllers might lead to system safety violations is important for their integration in safety-critical applications and engineering corrective safety measures for the system. In this talk, we will propose a framework for detecting such image-based failures of closed-loop systems using tools from Hamilton-Jacobi reachability analysis. This approach blends simulation-based analysis with HJ reachability methods to compute an approximation of the backward reachable tube (BRT) of the system, i.e., the set of unsafe states for the system under vision-based controllers. Besides its scalability to high-dimensional visual inputs, explicit computation of BRT allows the proposed approach to capture non-trivial system failures that are difficult to expose via random simulations. Finally, we will go over how we can take advantage of the insights from BRT’s value function to further construct an anomaly detector that can flag inputs at system runtime, allowing us to take corrective measures through a fallback controller that ensures system safety in real-time.
08:50
Mahyar Fazlyab (Johns Hopkins University, United States) Taha Entesari (Johns Hopkins University, United States)
Certified Robustness via Dynamic Margin Maximization and Improved Lipschitz Regularization
ABSTRACT. To improve the robustness of deep classifiers against adversarial perturbations, many approaches have been proposed, such as designing new architectures with better robustness properties (e.g., Lipschitz-capped networks), or modifying the training process itself (e.g., min-max optimization, constrained learning, or regularization). These approaches, however, might not be effective at increasing the margin in the input (feature) space. As a result, there has been an increasing interest in developing training procedures that can directly manipulate the decision boundary in the input space. In this paper, we build upon recent developments in this category by developing a robust training algorithm whose objective is to increase the margin in the output (logit) space while regularizing the Lipschitz constant of the model along vulnerable directions. We show that these two objectives can directly promote larger margins in the input space. To this end, we develop a scalable method for calculating guaranteed differentiable upper bounds on the Lipschitz constant of neural networks accurately and efficiently. The relative accuracy of the bounds prevents excessive regularization and allows for more direct manipulation of the decision boundary. Furthermore, our Lipschitz bounding algorithm exploits the monotonicity and Lipschitz continuity of the activation layers, and the resulting bounds can be used to design new layers with controllable bound on their Lipschitz constant. Experiments on the MNIST and CIFAR-10 data sets verify that our proposed algorithm obtains competitively improved results compared to the state-of-the-art.
09:10
Sicun Gao (University of California San Diego, United States)
Towards Rigorous Use of Neural Networks in Nonlinear Control
ABSTRACT. Data-driven and learning-based approaches have become an indispensable part of automation and autonomous systems. Control and planning components that are neural network models challenge existing principled methods for ensuring reliability and safety. By taking a numerical and statistical perspective on synthesis and verification, we can still aim for assurance in some strong sense for highly nonlinear systems with neural control policies. One way of doing so is to find value certificates, such as Lyapunov and barrier functions, that are often themselves neural networks to capture the highly nonlinear value landscape of these systems. I will describe some of our work in different settings of nonlinear control design across model-based and model-free settings.
09:30
Nicholas Rober (Massachusetts Institute of Technology, United States) Sydney Katz (Stanford University, United States) Chelsea Sidrane (Stanford University, United States) Esen Yel (Stanford University, United States) Michael Everett (Northeastern University, United States) Mykel Kochenderfer (Stanford University, United States) Jonathan How (Massachusetts Institute of Technology, United States)
Backward Reachability Analysis for Safety Verification of Neural Feedback Loops
ABSTRACT. As neural networks (NNs) are increasingly applied to safety critical systems such as autonomous cars or aircraft collision avoidance systems, it is important to develop methods that provide safety assurances at the system level. Generating such assurances is a challenge, not only because it involves formally analyzing the interaction between NNs and other components of a given system, but also because it may require handling large amounts of uncertainty from the environment in which the system operates. In this talk we present a backward reachability approach for safety certification of neural feedback loops (NFLs), i.e., closed-loop systems with NN control policies. While backward reachability strategies have been developed for systems without NN components, the nonlinearities in NN activation functions and general noninvertibility of NN weight matrices make backward reachability for NFLs a challenging problem. To avoid the difficulties associated with propagating sets backward through NNs, we introduce a framework that leverages standard forward NN analysis tools to efficiently find over-approximations to backprojection (BP) sets, i.e., sets of states for which an NN policy will lead a system to a given target set. In this talk, we present frameworks for calculating BP over-approximations for both linear and nonlinear systems with control policies represented by feedforward NNs and propose computationally efficient strategies. We use numerical results from a variety of models to showcase the proposed algorithms, including a demonstration of safety certification for a 6D system.
Vulnerability Analysis of Perception-Based Control: Modeling and Fundamental Limits
ABSTRACT. In this talk, I will present some of our recent results on modeling and analysis of neural network-based control systems operating in adversarial environments. We have introduced the notions of attack effectiveness and stealthiness independent of any potentially employed anomaly detector; i.e., the attack remaining stealthy even from the best existing detectors. The talk will discuss attacks with different levels of runtime knowledge about the physical plant and its states, before presenting sufficient conditions for existence of stealthy effective attacks, forcing the system into an unsafe operating region while remaining stealthy. We have shown that depending on runtime information available to the attacker, the probability of attack remaining stealthy can be arbitrarily close to one, if the attacker’s estimate of the plant’s state is arbitrarily close to the true plant state; on the other hand, when the attacker does not have access to an accurate estimate of the plant state, the stealthiness level of the attack depends on the control performance in attack-free operation. Finally, I will illustrate our results on several real world systems with varying levels of autonomy.
Pawan Poojary (Northwestern University, United States) Randall Berry (Northwestern University, United States)
Bayesian Observational Learning in Mean-Field Games with Imperfect Observations
ABSTRACT. There are many settings in which agents learn from observing the actions of other agents. Bayesian observational learning models provide a framework for studying such situations and have been well-studied in settings where agents sequentially choose Bayes' optimal actions by learning from the actions of previous agents. Here, we consider such observational learning in a mean-field game setting, in which agents repeatedly choose actions over time to maximize an infinite horizon discounted pay-off. This pay-off depends on the underlying mean-field population state, which agents do not know and only have a prior common belief over it. At the end of each time-step, agents observe a common signal which is an imperfect observation of the mean-field action profile played in that time-step and use this to update their beliefs. We give a sequential decomposition of this game that enables one to characterize Markov perfect equilibria of the game. We then focus on a particular sub-class of these games which can be viewed as a mix of coordination/anti-coordination players. Using the sequential decomposition, we characterize the impact of varying the observation quality on the outcome of the game and show that this can exhibit non-monotonic behaviour, where in many instances, poorer observations lead to better expected total discounted pay-offs.
08:50
Keith Paarporn (University of Colorado, Colorado Springs, United States)
Non-myopic agents can stabilize cooperation in feedback-evolving games
ABSTRACT. Individuals make decisions that affect a shared environment, and in turn, a changing environment shapes incentives for future action. In this paper, we investigate the impact that non-myopic agents can have in altering environmental outcomes in scenarios where cooperation improves the environment and defection degrades it. The non-myopic agents considered here make choices that optimize future payoffs, based on projections of the future environmental states. We thus break with traditional assumptions in evolutionary game theory that all agents are myopic, basing decisions only on immediately available rewards. We show that the presence of non-myopic types can stabilize interior fixed points that were originally unstable in a fully myopic population. However, non-myopic types cannot de-stabilize fixed points that constitute a tragedy of the commons (TOC), i.e. configurations where all agents defect. We provide conditions on i) how non-myopic agents need to be, and ii) what fraction of non-myopic types in the population are necessary such that these results hold.
09:10
Tao Zhang (New York University, United States) Quanyan Zhu (New York University, United States)
Stochastic Game with Interactive Information Acquisition: A Fixed-Point Alignment Principle
ABSTRACT. This paper studies a multi-player, general-sum stochastic game characterized by a dual-stage temporal structure per period. The agents face uncertainty regarding the time-evolving state that is realized at the beginning of each period. During the first stage, agents engage in information acquisition regarding the unknown state. Each agent strategically selects from multiple signaling options, each carrying a distinct cost. The selected signaling rule dispenses private information that determines the type of the agent. In the second stage, the agents play a Bayesian game by taking actions contingent on their private types. We introduce an equilibrium concept, Pipelined Perfect Markov Bayesian Equilibrium (PPME), which incorporates the Markov perfect equilibrium and the perfect Bayesian equilibrium. We propose a novel equilibrium characterization principle termed fixed-point alignment and deliver a set of verifiable necessary and sufficient conditions for any strategy profile to achieve PPME.
Characterization of Potential Games: Application in Aggregative Games
ABSTRACT. The main objective of this work is to describe games which fall under title of Potential and simplify the conditions for class of aggregative games. Games classified as aggregative are ones in which, in addition to the player's own action, the payoff for each player depends on an aggregate of all the players' decision variables. In this study, we developed a method based on payoff functions to determine if a given game is potential. Then, in order to identify the Aggregative Games that fall under this class we simplified the criteria for the class of Aggregative Games. A $3$-player Cournot game, also known as an Aggregative Potential Game, is used to test the characterization criteria for Potential Games. A $4$-player Cournot game is also utilized to test the form of potential function we obtained for class of general potential games.
United We Fall: On the Nash Equilibria of Multiplex Network Games
ABSTRACT. Network games have provided a framework to study strategic decision making processes that are governed by an underlying network of interdependencies among agents. However, existing models do not account for environments in which agents simultaneously interact over multiple networks. In this paper, we propose a model of multiplex network games to capture the different modalities of interactions among strategic agents. We then explore how the properties of the constituent networks of a multiplex network can undermine or support the uniqueness of its Nash equilibria. We first show that in general, even if the constituent networks are guaranteed to have unique Nash equilibria in isolation, the resulting multiplex need not have a unique equilibrium. We then identify certain subclasses of networks wherein guarantees on the uniqueness of Nash equilibria on the isolated networks lead to the same guarantees on the multiplex network game. We further highlight that both the largest and smallest eigenvalues of the constituent networks (reflecting their connectivity and two-sidedness, respectively) are instrumental in determining the uniqueness of the multiplex network equilibrium. Together, our findings shed light on the reasons for the fragility of the uniqueness of equilibria in multiplex networks, and potential interventions to alleviate them.
ABSTRACT. We characterize the differential privacy guarantees of privacy mechanisms in the large-composition regime, i.e., when a privacy mechanism is sequentially applied many times to sensitive data. Via exponentially tilting the privacy loss random variable, we derive a new formula for the privacy curve expressing it as a contour integral over an integration path that runs parallel to the imaginary axis with a free real-axis intercept. Then, using the method of steepest descent from mathematical physics, we demonstrate that the choice of saddle-point as the real-axis intercept yields closed-form accurate approximations of the desired contour integral. This procedure---dubbed the saddle-point accountant (SPA)---yields a constant-time accurate approximation of the privacy curve. Theoretically, our results can be viewed as a refinement of both Gaussian Differential Privacy and the moments accountant method found in Rényi Differential Privacy. In practice, we demonstrate through numerical experiments that the SPA provides a precise approximation of privacy guarantees competitive with purely numerical-based methods (such as FFT-based accountants), while enjoying closed-form mathematical expressions.
ABSTRACT. Local differential privacy (LDP) is a powerful method for privacy-preserving data collection. In this paper, we develop a framework for training Generative Adversarial Networks (GAN) on differentially privatized data. We show that entropic regularization of the Wasserstein distance -- a popular regularization method in the literature that has been often leveraged for its computational benefits -- can be used to denoise the data distribution when data is privatized by common additive noise mechanisms, such as Laplace and Gaussian. This combination uniquely enables the mitigation of both the regularization bias and the effects of privatization noise, thereby enhancing the overall efficacy of the model. We analyze the proposed method, provide sample complexity results and experimental evidence to support its efficacy.
ABSTRACT. Consider the problem of binary hypothesis testing. Given $Z$ coming from either $\mathbb P^{\otimes m}$ or $\mathbb Q^{\otimes m}$, to decide between the two with small probability of error it is sufficient and in most cases necessary to have $m \asymp 1/\epsilon^2$, where $\epsilon$ measures the separation between $\mathbb P$ and $\mathbb Q$ in total variation ($\mathsf{TV}$). Achieving this, however, requires complete knowledge of the distributions and can be done, for example, using the Neyman-Pearson test. In this paper we consider a variation of the problem, which we call likelihood-free (or simulation-based) hypothesis testing, where access to $\mathbb P$ and $\mathbb Q$ is given through $n$ iid observations from each. In the case when $\mathbb P,\mathbb Q$ are assumed to belong to a non-parametric family $\mathcal P$, we demonstrate the existence of a fundamental trade-off between $n$ and $m$ given by $nm \asymp n^2_\mathsf{GoF}(\epsilon,\cal P)$, where $n_\mathsf{GoF}$ is the minimax sample complexity of testing between the hypotheses $H_0: \mathbb P= \mathbb Q$ vs $H_1: \mathsf{TV}(\mathbb P,\mathbb Q) \ge \epsilon$. We show this for three families of distributions: $\beta$-smooth densities supported on $[0,1]^d$, the Gaussian sequence model over a Sobolev ellipsoid, and the collection of distributions on alphabet $[k]=\{1,2,\dots,k\}$ with pmfs bounded by $c/k$ for fixed $c$. For the larger family of all distributions on $[k]$ we obtain a more complicated trade-off that exhibits a phase-transition. The test that we propose, based on the $L^2$-distance statistic of Ingster, simultaneously achieves all points on the trade-off curve for the regular classes. This demonstrates the possibility of testing without fully estimating the distributions, provided $m\gg1/\epsilon^2$.
09:30
Cynthia Rush (Columbia University, United States) Ruchira Ray (Columbia University, United States) Marco Avella (Columbia university, United States)
Asymptotics for alpha-posteriors with applications to variational inference
ABSTRACT. In this work we provide some results related to low dimensional asymptotics of the so-called alpha-posterior. We use these results to study the robustness of variations inference to model misspecification.
09:50
Syed Eqbal Alam (Department of Electrical & Computer Engineering, University of New Brunswick, Fredericton, New Brunswick, Canada, Canada) Dhirendra Shukla (Department of Electrical & Computer Engineering, University of New Brunswick, Fredericton, New Brunswick, Canada, Canada) Shrisha Rao (International Institute of Information Technology, Bangalore, Karnataka, India, India)
Near Optimal Differentially Private Client Selection in Federated Settings
ABSTRACT. We develop an iterative differentially private algorithm for client selection in federated settings. We consider a federated network wherein clients coordinate with a central server to complete a task; however, the clients decide whether to participate or not in completing the task based on their preferences---local computation and probabilistic intent. The algorithm does not require client-to-client information exchange. The developed algorithm provides near-optimal values to the clients over long-term average participation with a certain differential privacy guarantee. Finally, we present the experimental results to check the algorithm's efficacy.
ABSTRACT. DNA is an attractive medium for digital data storage. When data is stored on DNA, errors occur, which makes error-correcting coding techniques critical for reliable DNA data storage.
To reduce the number of errors, a common technique is to include constraints that avoid homopolymers (consecutive repeated nucleotides) and balance the GC content, as sequences with homopolymers and unbalanced GC contents are often associated with larger error rates.
However, constrained coding comes at the cost of an increase in redundancy. An alternative is to control the errors by randomizing the sequences, embracing the extra errors, and paying for them with additional coding redundancy. In this paper, we determine the error regimes in which embracing errors is more efficient than constrained coding. We find that constrained coding is inefficient in most common error regimes for DNA data storage. Specifically, the error probabilities for homopolymers and unbalanced GC contents must be very large for constrained coding to achieve a higher code rate than unconstrained coding.
10:50
Netanel Raviv (Washington University in St. Louis, United States) Canran Wang (Washington University in St. Louis, United States) Jin Sima (University of Illinois at Urbana Champaign, United States) Vinh Pham (Washington University in St. Louis, United States) Mi Zhou (Washington University in St. Louis, United States)
Coding for forensic 3D fingerprinting
ABSTRACT. 3D printing brings about a revolution in consumption and distribution of goods, but poses a significant risk to public safety. Any individual with internet access and a commodity printer can now produce untraceable firearms (ghost guns), keys, and dangerous counterfeit products. To combat these threats, it is important to embed identifying information inside the object, so that the perpetrator could be identified if the (potentially damaged) object is confiscated or left at a crime scene.
The talk will begin by surveying some existing 3D printing technologies, and presenting new and existing techniques for information embedding. Related adversarial noise patterns will be analyzed, and an abstract coding problem will be formulated. Then, our progress in solving this problem, as well as experimental proof-of-concept, will be presented.
11:10
Han Mao Kiah (Nanyang Technological University, Singapore)
The Bee Identification Problem and DNA-Based Data Storage
ABSTRACT. The bee-identification problem, formally defined by Tandon, Tan, and Varshney (2019), requires the receiver to identify "bees" using a set of unordered noisy measurements. In this prior work, Tandon, Tan, and Varshney studied error exponents and showed that decoding the measurements jointly results in a significantly smaller error exponent.
In this current work, we study algorithms related to this joint decoder and discuss how these decoders can address the unordered nature of DNA-based data storage systems.
Cover Your Bases: How to Minimize the Sequencing Coverage in DNA Storage Systems
ABSTRACT. Although the expenses associated with DNA sequencing have been rapidly decreasing, the current cost stands at roughly $1.3K/TB, which is dramatically more expensive than reading from existing archival storage solutions today. In this work, we aim to reduce not only the cost but also the latency of DNA storage by studying the DNA coverage depth problem, which aims to reduce the required number of reads to retrieve information from the storage system. Under this framework, our main goal is to understand how to optimally pair an error-correcting code with a given retrieval algorithm to minimize the sequencing coverage depth, while guaranteeing retrieval of the information with high probability. Additionally, we study the DNA coverage depth problem under the random-access setup.
11:50
Kuan Cheng (Peking University, China) Elena Grigorescu (Purdue University, United States) Xin Li (University of Washington, United States) Madhu Sudan (Microsoft, United States) Minshen Zhu (Purdue University, United States)
On $k$-Mer-Based and Maximum Likelihood Estimation Algorithms for Trace Reconstruction
ABSTRACT. The goal of the trace reconstruction problem is to recover a string $\*x\in\{0,1\}^n$ given many independent {\em traces} of $\*x$, where a trace is a subsequence obtained from deleting bits of $\*x$ independently with some given probability $p\in [0,1).$ A recent result of Chase (STOC 2021) shows how $\*x$ can be determined (in exponential time) from $\exp(\widetilde{O}(n^{1/5}))$ traces. This is the state-of-the-art result on the sample complexity of trace reconstruction.
In this paper we consider two kinds of algorithms for the trace reconstruction problem.
On the one hand we study algorithms based on ``$k$-mer statistics'', i.e. statistics regarding occurrences of {\em contiguous} $k$-bit strings (a.k.a, {\em $k$-mers}) in the initial string $\*x$, for $k = O(n^{1/5})$. Mazooji and Shomorony (arXiv.2210.10917) show that such statistics (called $k$-mer density map) can be estimated within $\epsilon$ accuracy from $2^k\cdot\poly(n, \frac{1}{\epsilon})$ traces. We call an algorithm to be {\em $k$-mer-based} if it reconstructs $\*x$ given estimates of the $k$-mer density map. These are in fact a very general class of algorithms that essentially capture all the analyses in the worst-case and smoothed complexity models of the trace reconstruction problem we know of so far.
Our first, and technically more involved, result shows that any $k$-mer-based algorithm for trace reconstruction must use $\exp(\Omega(n^{1/5}))$ traces, under the assumption that the estimator requires $2^k \cdot \poly(n, \frac{1}{\epsilon})$ traces, thus establishing the optimality of this number of traces
Our second, simple, result considers the performance of the Maximum Likelihood Estimator (MLE), which specifically picks the source string that has the maximum likelihood to generate the samples (traces). We show that
the MLE algorithm uses a nearly optimal number of traces, i.e., up to a factor of $n$ in the number of samples needed for an optimal algorithm, and show that this factor of $n$ loss may be necessary in general ``model estimation'' settings.
ABSTRACT. We study the problem of achieving decentralized coordination by a group of strategic decision-makers choosing to engage or not in a task in a stochastic setting. First, we define a class of symmetric utility games that encompasses a broad type of coordination games, including the popular framework known as Global Games. To study the extent to which agents engaging in a stochastic coordination game coordinate, we propose a new probabilistic measure of coordination efficiency. Then, we provide a universal information-theoretic upper bound on the coordination efficiency as a function of the amount of noise in the observation channels. Finally, we revisit a large class of Global Games and illustrate that their Nash equilibrium policies may be less coordination efficient than certainty equivalent policies, in spite of providing expected utility. This counter-intuitive result establishes the existence of a nontrivial trade-off between coordination efficiency and expected utility in coordination games.
10:50
Ali Kara (University of Michigan, United States) Serdar Yuksel (Queen's University, Canada)
Reinforcement Learning for Stochastic Control under Weak Feller Continuous Models and General Information Structures
ABSTRACT. We study reinforcement learning in stochastic control under a variety of information structures. We will first study fully observed Markov Decision Processes (MDPs) with continuous spaces, then partially observable Markov Decision Processes (POMDPs) with continuous spaces, and finally decentralized stochastic control problems with a variety of information structures under which a dynamic programming formulation can be established and a weak Feller continuity condition can be satisfied: (i) We first show that under weak continuity conditions and a technical ergodicity condition, Q-learning can be applied to fully observable MDPs for near optimal performance under both discounted and average cost criteria. (ii) For POMDPs, we show that under filter stability, finite memory policies can be used to arrive at near optimality, and in the absence of filter stability, quantization can be used to arrive at near optimality, under a mild unique ergodicity condition and weak Feller continuity of the non-linear filter dynamics. (iii) For decentralized stochastic control problems, under a variety of information structures with strictly local information or partial global information, we show that synchronous or asynchronous quantized Q-learning can be used to arrive at near optimal performance. We will discuss both the discounted cost criteria as well as average cost criteria.
The general theme is that reinforcement learning can be applied to a large class of models under a variety of performance criteria, provided that modest weak continuity conditions apply for the associated Markov kernels.
Communication Enabled Collaborations in Multi-Agent Systems
ABSTRACT. In large-scale systems, distributing decision-making can reduce physical and computational burdens, but often comes at the cost of overall system performance. With the emergence of new communication and computation technologies, local decision making entities (or agents) may be enabled to collaborate their actions and regain some of the lost efficiency. To understand the possible performance improvements from these inter-agent collaborations, we model a multi-agent system as a common-interest game in which groups of agents can collaborate their action to jointly improve the common objective. We specifically consider k-strong Nash equilibria as the emergent behavior of these systems and address how well these states approximate the system optimal, formalized by the k-strong price of anarchy ratio. Our main contributions are in generating tight bounds on the k-strong price of anarchy in finite resource allocation games as the solution to a tractable linear program.
By varying k –- the maximum size of a collaborative coalition -– we observe exactly how much performance is regained from inter-agent collaborations. To investigate further opportunities for improvement, we generate upper bounds on the maximum attainable k-strong price of anarchy when the agents’ objective function can be designed.
ABSTRACT. This paper aims at extending the framework of optimization and output reachability analysis for functional expansions known as Chen-Fliess series (CFS). In the author's earlier work, it was shown that the minimum bound box (MBB) of output reachable sets can be computed efficiently via a gradient descent and Newton's method. Here steps towards the computation of input-output backward reachable set are taken, and a method for computing the pre-image of a target output set using an optimization routine with constraints that are depending on the CFS representation of a system is provided. Furthermore, this paper also introduces inner approximations of forward and backward reachable sets of a CFS. The interest in such inner approximations lies in the fact that inputs taking values within the MBB of a pre-image of a target output set can render inputs that will produce outputs outside the target set, which is not the case for an inner approximation. The procedure for obtaining such inner approximations is based on the concept of viability theory for which one can compute the largest polygon with a pre-defined number of vertices that is inscribed to the true reachable set. Illustrative examples are included in the final section to illustrate the methods developed in this work.
10:50
Yunshen Huang (Washington University in St. Louis, United States) Wenbo He (Washington University in St. Louis, United States) Shen Zeng (Washington University in St. Louis, United States)
A Differential Dynamic Programming-based Approach for Balancing Energy and Time Optimality in Motion Planning
ABSTRACT. Optimal motion planning that simultaneously considers energy and time efficiency is crucial for a wide range of applications from industrial manufacturing to autonomous vehicle navigation. This paper introduces a novel perspective on discrete-time systems which recognizes the time interval size as an additional aspect of the control variable that is conventionally sought to be determined. By incorporating ideas from Dynamic Differential Programming (DDP), our method, called BO-DDP (Balancing energy and time Optimality via DDP), enables an adjustable trade-off between energy and time optimality. DDP leverages quadratic approximation of dynamics and cost function, ensuring accurate information capture and a high convergence rate. To address the challenge of high computational complexity in obtaining related derivatives, we introduce a Taylor series-based numerical scheme for simultaneous forward integration and differentiation. Extensive simulation experiments on two scenarios, including autonomous car navigation and quadcopter flight, demonstrate the practicality and effectiveness of our algorithm.
11:10
W. Steven Gray (Old Dominion University, United States) Erik I. Verriest (Georgia Institute of Technology, United States)
Linear Operator Differential Equations with Applications to Linear Time-Varying Systems
ABSTRACT. A classical result in linear system theory is that a causal linear time-varying (LTV) system has a smooth finite dimensional realization if and only if its impulse response exhibits a certain factorization. In which case, the standard four canonical realizations can be written explicitly in terms of the coefficient functions appearing in various forms of the input-output differential equation. On the other hand, little appears to be known about the non-factorizable case. Thus, the first goal of this paper is to examine in detail one specific instance of an LTV system whose impulse response is not factorizable and to show explicitly that its input-output map satisfies a type of linear functional differential equation, which will be referred to as a linear operator differential equation. This leads to the second goal of the paper which is to describe the general solutions of first-order and second-order linear operator differential equations. The treatment in the second-order case is focused primarily on singular equations, which are shown to describe the non-factorizable system presented in the first part of the paper.
ABSTRACT. This paper presents an automated algorithm to analyze the stability of piecewise affine PWA dynamical systems due to their broad applications.
We parametrize the Lyapunov function as a PWA function, with polytopic regions defined by the PWA dynamics. Using this parametrization, Stability conditions can be expressed as linear constraints restricted to polytopes so that the search for a Lyapunov function involves solving a linear program. However, a valid Lyapunov function might not be found given these polytopic regions. A natural response is to increase the size of the parametrization of the Lyapunov function by dividing regions and solving the new linear program. This paper proposes two new methods to divide each polytope into smaller ones. The first approach divides a polytope based on the sign of the derivative of the candidate Lyapunov function, while the second divides it based on the change in the vector field of the PWA dynamical system. In addition, we propose using Delaunay triangulation to achieve automated division of regions and preserve the continuity of the PWA Lyapunov function. Examples involving learned models and explicit MPC controllers demonstrate that the proposed method of dividing regions leads to valid Lyapunov functions with fewer regions than existing methods, reducing the computational time taken for stability analysis
Wasserstein Distributionally Robust Regret-Optimal Control under Partial Observability
ABSTRACT. This paper presents a framework for Wasserstein distributionally robust (DR) regret-optimal (RO) control in the context of partially observable systems. DR-RO control considers the regret in LQR cost between a causal and non-causal controller and aims to minimize the worst-case regret over all disturbances whose probability distribution is within a certain Wasserstein-2 ball of a nominal distribution. Our work builds upon the full-information DR-RO problem that was introduced and solved in Yan et al., 2023, and extends it to handle partial observability and measurement-feedback (MF). We solve the finite horizon partially observable DR-RO and show that it reduces to a tractable semi-definite program whose size is proportional to the time horizon. Through simulations, the effectiveness and performance of the framework are demonstrated, showcasing its practical relevance to real-world control systems. The proposed approach enables robust control decisions, enhances system performance in uncertain and partially observable environments, and provides resilience against measurement noise and model discrepancies.
12:10
Faraz Farahvash (Cornell University, United States) Ao Tang (Cornell University, United States)
Stochastic Control under Correlated Disturbances
ABSTRACT. Stochastic control is a fundamental concept in
control theory. Traditionally, it is assumed that the system
disturbances are independent. This assumption does not
always hold. In this paper, we look into cases where the
disturbances are correlated. We provide a modified version of
the Dynamic Programming algorithm to find optimal policies
without the independence assumption. Then, we look into
the classic LQG problem but with correlated disturbances.
We show that the optimal policy is closely related to the
trajectory-following LQG problem. The trajectory in this
problem is defined by estimating future disturbances and
is time-varying. We provide numerical examples to illustrate
the theoretical results.
Yiming Zhang (Northwestern University, United States) Dongning Guo (Northwestern University, United States)
Distributed MARL for Scheduling in Conflict Graphs
ABSTRACT. This paper approaches the scheduling problem in conflict graphs by framing it as a distributed learning problem, with the objective of minimizing average packet delays. Each agent in the network is responsible for a single link and only has access to its own state and the states of its immediate neighbors. The problem formulation aligns with the framework of decentralized partially observable Markov decision process (Dec-POMDP).
The proposed solution leverages on-policy reinforcement learning algorithms within a multi-agent networked system. Specifically, the paper adopts the multi-agent proximal policy optimization (MAPPO) algorithm, incorporating well-designed recurrent structures in the neural network. Through extensive simulations, the solution demonstrates its superiority over a widely used scheduler in terms of both throughput and delay metrics.
10:50
Faraz Farahvash (Cornell University, United States) Ao Tang (Cornell University, United States)
Delay Performance Optimization with Packet Drop
ABSTRACT. In this paper, we aim to improve delay performance of an M/M/1 queue by dropping the right packets at
the right time. First, for a given dropping budget, we explore
the average delay metric and prove that dropping from the
head of the queue is optimal. Next, we look into cases where
packets have a deadline they need to make. We introduce the
concept of gain and propose a policy called ”Drop Positive
Gain Policy (DPGP)” that increases the percentage of packets
meeting their deadline. Finally, some numerical examples are
provided to confirm that DPGP outperforms conventional
policies.
11:10
Chih-Yuan Chiu (University of California, Berkeley, United States) Chinmay Maheshwari (University of California, Berkeley, United States) Pan-Yang Su (University of California, Berkeley, United States) Shankar Sastry (University of California, Berkeley, United States)
Dynamic Tolling in Arc-based Traffic Assignment Models
ABSTRACT. Tolling in traffic networks offers a popular measure to minimize overall congestion. Existing toll designs primarily focus on congestion in route-based traffic assignment models (TAMs), in which travelers make a single route selection from their source to destination. However, these models do not reflect real-world traveler decisions because they preclude deviations from a chosen route, and because the enumeration of all routes is computationally expensive. To address these limitations, our work focuses on arc-based TAMs, in which travelers sequentially select individual arcs (or edges) on the network to reach their destination. We first demonstrate that marginal pricing, a tolling scheme commonly used in route-based TAMs, also achieves socially optimal congestion levels in our arc-based formulation. Then, we use perturbed best response dynamics to model the evolution of travelers' arc selection preferences over time, and a marginal pricing scheme to the social planner's adaptive toll updates in response. We prove that our adaptive learning and marginal pricing dynamics converge to a neighborhood of the socially optimal loads and tolls. We then present empirical results that verify our theoretical claims.
11:30
Amin Shojaeighadikolaei (PhD Student at University of Kansas, United States) Morteza Hashemi (Faculty, Assistant professor at EECS department, United States)
An Efficient Distributed Multi-Agent Reinforcement Learning for EV Charging Network Control
ABSTRACT. The increasing trend in adopting electric vehicles (EVs) will significantly impact the residential electricity demand, which results in an increased risk of transformer overload in the distribution grid. To mitigate such risks, there are urgent needs to develop effective EV charging controllers. Currently, the majority of the EV charge controllers are based on a centralized approach for managing individual EVs or a group of EVs. In this paper, we introduce a decentralized Multi-agent Reinforcement Learning (MARL) charging framework that prioritizes the preservation of privacy for EV owners. We employ the Centralized Training Decentralized Execution-Deep Deterministic Policy Gradient (CTDE-DDPG) scheme, which provides valuable information to users during training while maintaining privacy during execution. Our results demonstrate that the CTDE framework improves the performance of the charging network by reducing the network costs. Moreover, we show that the Peak-to-Average Ratio (PAR) of the total demand is reduced, which, in turn, reduces the risk of transformer overload during the peak hours.