On the connection between reinforcement learning and gradient descent
ABSTRACT. Temporal difference (TD) learning with linear function approximation is one of the earliest methods in reinforcement learning and the basis of many modern methods. We revisit the analysis of TD learning through a new lens and show that TD may be viewed as a modification of gradient descent. This leads not only to a better explanation of what TD does but also to improved convergence-time guarantees. We also discuss how other methods in reinforcement learning could be viewed through a lens of gradient descent.
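As background for the gradient-descent viewpoint, the classical TD(0) update with linear features can be sketched in a few lines. Everything below (the toy three-state chain, the step size, the tabular feature map) is an illustrative choice and not taken from the paper:

```python
import numpy as np

# Illustrative sketch only: TD(0) with linear function approximation.
# The update theta += alpha * delta * phi(s) looks like a gradient step
# on the squared value error, except that the target bootstraps on the
# current parameters -- the "modified gradient descent" structure.

def td0_linear(P, r, phi, gamma=0.9, alpha=0.1, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = phi.shape
    theta = np.zeros(d)
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n, p=P[s])
        # TD error: bootstrapped target minus current estimate
        delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        theta += alpha * delta * phi[s]   # semi-gradient step
        s = s_next
    return theta

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])    # transitions of a 3-state chain
r = np.array([1.0, 0.0, -1.0])    # per-state rewards
theta = td0_linear(P, r, np.eye(3))   # tabular features for clarity
```

With tabular features the iterate tracks the true value function; the "semi-gradient" structure of the update is what the abstract reinterprets through the lens of gradient descent.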
08:50
Xiaochun Niu (Northwestern University, United States) Ermin Wei (Northwestern University, United States)
GRAND: A Gradient-Related Ascent and Descent Algorithmic Framework for Minimax Problems
ABSTRACT. Many distributed and centralized optimization problems can be formulated as minimax problems, which motivates our study of minimax optimization. Existing works focus either on the global performance of gradient-type methods, including the gradient descent ascent (GDA) method and its variants such as the extra-gradient (EG) and optimistic gradient descent ascent (OGDA) methods, or on the local convergence analysis of Newton-type methods. In this work, we propose GRAND, a gradient-related ascent and descent algorithmic framework for finding global minimax points. GRAND covers and motivates gradient-type, Newton-type, and other general descent ascent methods as special cases. Theoretically, we present a global sublinear rate of GRAND for solving strongly-convex-nonconcave problems and a linear rate of GRAND for solving strongly-convex-PL problems. These results match the state-of-the-art convergence rates of GDA in the corresponding settings. We conduct numerical experiments to demonstrate the efficacy of GRAND. To the best of our knowledge, GRAND is the first generalized framework for minimax problems with provable guarantees.
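For readers unfamiliar with the gradient-type baselines named above, the contrast between GDA and EG is easiest to see on the bilinear toy problem $\min_x \max_y xy$; this sketch is purely illustrative and is not the GRAND framework itself:

```python
import math

# Toy illustration: on min_x max_y x*y, plain gradient descent ascent
# (GDA) spirals away from the saddle point (0, 0), while the
# extra-gradient (EG) method converges. Here grad_x f = y, grad_y f = x.

def gda(x, y, eta=0.1, steps=200):
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x
    return x, y

def extragradient(x, y, eta=0.1, steps=200):
    for _ in range(steps):
        xh, yh = x - eta * y, y + eta * x    # lookahead point
        x, y = x - eta * yh, y + eta * xh    # update with its gradient
    return x, y

print(math.hypot(*gda(1.0, 1.0)))            # norm grows: GDA diverges
print(math.hypot(*extragradient(1.0, 1.0)))  # norm shrinks toward 0
```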
09:10
Behrouz Touri (University of California San Diego, United States) Rohit Parasnis (University of California San Diego, United States) Massimo Franceschetti (University of California San Diego, United States) Ashwin Verma (University of California San Diego, United States)
Random Adaptation Perspective to Distributed Computation
ABSTRACT. We propose a random adaptation variant of time-varying distributed averaging dynamics in discrete time. This provides a new viewpoint on distributed averaging and many related concepts, including ergodicity and absolute probability sequences. It also allows us to propose and study other networked dynamics models, including the Friedkin-Johnsen model, through the lens of random adaptation dynamics.
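For reference, the standard fixed-weight averaging dynamics that this abstract generalizes can be stated in a few lines (the paper's random-adaptation variant and time-varying matrices are not shown; the ring topology and weights are illustrative):

```python
import numpy as np

# With a doubly stochastic weight matrix W on a connected graph,
# x(t+1) = W x(t) drives every node's value to the average of the
# initial values.

W = np.array([[0.50, 0.25, 0.00, 0.25],   # 4-node ring, lazy weights
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
x = np.array([4.0, 0.0, 2.0, 6.0])        # initial values; average 3.0

for _ in range(100):
    x = W @ x   # every entry converges to 3.0
```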
09:30
Yang Hu (Harvard University, United States) Adam Wierman (Caltech, United States) Guannan Qu (Carnegie Mellon University, United States)
On the Sample Complexity of Stabilizing LTI Systems on a Single Trajectory
ABSTRACT. Stabilizing an unknown dynamical system is one of the central problems in control theory. In this paper, we study the sample complexity of the learn-to-stabilize problem for Linear Time-Invariant (LTI) systems on a single trajectory. Current state-of-the-art approaches require a sample complexity linear in $n$, the state dimension, and incur a state norm that blows up exponentially in $n$. We propose a novel algorithm based on spectral decomposition that only needs to learn ``a small part'' of the dynamical matrix acting on its unstable subspace. We show that, under proper assumptions, our algorithm stabilizes an LTI system on a single trajectory with $\tilde{O}(k)$ samples, where $k$ is the instability index of the system. This is the first sub-linear sample complexity result for the stabilization of LTI systems in the regime where $k = o(n)$.
ABSTRACT. In this talk, we begin by optimizing the gradostat, in which several bioreactors are interconnected by mass flow and diffusion. The gradostat is of interest both as a classical nonlinear system and because the basic network structure and nonlinearities appear in a wide variety of bioprocesses, including wastewater treatment. We formulate a convex relaxation for optimizing the gradostat. The relaxation is exact under several conditions, for example, if the gradostat is outflow connected and its flow matrix is irreducible. When the microbial growth in the bioreactors is described by the Monod or Contois functions, the relaxation is a second-order cone program, which can be solved at scales of over 10^5 variables in minutes with industrial software. We present a numerical example based on the operation of a wastewater treatment network over a two-week period.
08:50
Mahnoosh Alizadeh (University of California Santa Barbara, United States)
A safe pricing algorithm for distributed demand management
ABSTRACT. This talk is inspired by applications where customers' demand, served over a safety-critical network such as the power system, can only be affected through posted prices. We assume no real-time two-way communication with users and no information on the users' price response function. As such, we discuss the design of algorithms that learn to generate “safe” prices over multiple iterations. A safe pricing algorithm requires that at no iteration should the realized demand in response to the posted prices violate the safety constraints of the network (such as power flow constraints), in spite of uncertainty about the users' price response. We discuss two formulations we have considered: 1) a first-order dual gradient algorithm for solving network utility maximization problems for resource allocation over networks with safety-critical constraints. In contrast to existing first-order methods, our algorithm, called the safe dual gradient method (SDGM), is guaranteed to produce feasible primal iterates at all iterations. We ensure primal feasibility by a) adding a diminishing safety margin to the constraints, and b) using a sign-based dual update method with different step sizes for the plus and minus directions. In addition, we prove that the primal iterates produced by the SDGM achieve a sublinear static regret of ${\cal O}(\sqrt{T})$. 2) If time allows, a bandit-based formulation that utilizes the well-known principle of optimism in the face of uncertainty while simultaneously being pessimistic with respect to constraint violation. Our analysis of this algorithm shows that, with high probability, it will not violate the constraints and will achieve $\mathcal{O}(\log(T)\sqrt{T})$ regret.
09:10
Cong Chen (Cornell University, United States) Lang Tong (Cornell University, United States)
Storage Participation with State-of-Charge Dependent Offers and Bids: Market Clearing, Pricing, and Incentive Compatibility
ABSTRACT. There have been recent proposals to allow storage participants in the wholesale electricity market to submit state-of-charge dependent offers and bids. In general, such bids and offers are nonconvex in the multi-interval look-ahead electricity market, resulting in computationally expensive market-clearing processes, out-of-merit dispatch, and the need for out-of-the-market settlements in the day-ahead and real-time markets. This paper proposes ways to mitigate such complications and analyzes incentive compatibility with respect to following the market-clearing dispatch signals and truthfully revealing bidding parameters.
09:30
Ye Yuan (Huazhong University of Science and Technology, China) Steven Low (Caltech, United States) Omid Ardakanian (University of Alberta, Canada) Claire Tomlin (UC Berkeley, United States)
Inverse Power Flow Problem
ABSTRACT. We consider the problem of determining the network admittance matrix of a single-phase radial network from partial measurements. Suppose there are current injections at a strict subset of the buses. Suppose voltage and current phasors are measured at these buses so that the admittance matrix of the Kron-reduced network is determined. We show, by construction, that it is then possible to determine the admittance matrix of the original network without error.
Synchronized Stop and Re-start Distributed Average Consensus for Control and Coordination Applications
ABSTRACT. In this work we develop and analyze a distributed event-triggered iterative algorithm that enables the components of a distributed system, each with some initial value, to reach approximate average consensus on their initial values after executing a finite number of iterations. The proposed algorithm has features that facilitate its use in control/coordination applications where timing constraints might prohibit the asymptotic techniques used in most distributed average consensus protocols. More specifically, the proposed algorithm allows all the nodes to (i) reduce the number of transmitted messages, (ii) simultaneously stop the average consensus process once the network has reached consensus, and (iii) simultaneously restart the process, possibly with different initial values. To provide this capability, the algorithm includes a criterion that enables all the nodes to determine, in a distributed manner, when to terminate because approximate average consensus has been reached, i.e., all nodes have obtained a value within a small distance of the average of their initial values. A second criterion enables all the nodes to terminate their operation together, at the same time-step, after they have reached approximate consensus; this criterion also enables the nodes to simultaneously restart a new round of distributed average computation.
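The flavor of such a distributed stopping criterion can be illustrated with a standard max/min-consensus construction (a common device in the finite-time consensus literature; the function below is a simplified stand-in for the paper's algorithm, and the ring weights, tolerance, and diameter bound are illustrative choices):

```python
import numpy as np

# Nodes interleave averaging with max- and min-consensus sweeps; after
# diam(G) steps those sweeps hold the global extremes, so every node
# detects, at the same round, that the spread is below eps and can
# stop (or restart with new values) simultaneously.

def consensus_with_stopping(W, x, eps=1e-3, diam=3, max_rounds=100):
    neighbors = [np.flatnonzero(W[i]) for i in range(len(x))]
    for rnd in range(max_rounds):
        mx, mn = x.copy(), x.copy()     # fresh max/min sweep
        for _ in range(diam):
            x = W @ x
            mx = np.array([mx[nb].max() for nb in neighbors])
            mn = np.array([mn[nb].min() for nb in neighbors])
        if np.all(mx - mn < eps):       # all nodes agree: stop
            return x, rnd
    return x, max_rounds

W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
x_final, rounds = consensus_with_stopping(W, np.array([4.0, 0.0, 2.0, 6.0]))
```

Note that the detected quantity `mx - mn` reflects the spread at the start of the round, which upper-bounds the current spread since averaging never increases it, so the stopping decision is conservative.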
Unbounded Gradients in Federated Learning with Buffered Asynchronous Aggregation
ABSTRACT. The efficiency of cross-silo federated learning may be compromised by synchronous updates as the number of active users increases. FedBuff (Nguyen et al.) alleviates this problem by allowing asynchronous (stale) updates, which enhances the scalability of training. We revisit the FedBuff algorithm for asynchronous federated learning and extend the existing analysis by removing the boundedness assumption on the gradient norm. This paper presents a theoretical analysis of the convergence rate of this algorithm when heterogeneity in data, batch size, and delay are taken into account. We conclude by illustrating our theoretical findings through numerical simulations.
08:50
Zhaolin Ren (Harvard University, United States) Na Li (Harvard University, United States)
Escaping saddle points in zeroth-order optimization: two function evaluations suffice
ABSTRACT. Zeroth-order methods are useful for solving black-box optimization, reinforcement learning, and control problems in unknown environments. These methods use function values to estimate the gradient. As the optimization problems are often nonconvex, it is natural to ask how zeroth-order methods escape saddle points. In this paper, we consider zeroth-order methods that, at each iteration, may freely choose $2m$ function evaluations, where $m$ ranges from $1$ to $d$, with $d$ denoting the problem dimension. We show that by adding an appropriate isotropic perturbation at each iteration, a zeroth-order algorithm based on $2m$ function evaluations per iteration can not only find $\epsilon$-second-order stationary points polynomially fast, but do so using only $\tilde{O}(d/\epsilon^{2.5})$ function evaluations.
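A minimal two-point (the $m = 1$ case) zeroth-order step in the spirit of the abstract looks as follows; the smoothing radius, step size, perturbation scale, and toy objective are illustrative choices, not the paper's schedules:

```python
import numpy as np

# The gradient is estimated from two function evaluations along a
# random direction u, and an isotropic perturbation is added to help
# the iterate escape saddle points.

def zo_step(f, x, rng, mu=1e-4, eta=0.05, noise=1e-3):
    u = rng.standard_normal(x.size)
    # two evaluations -> directional-derivative estimate along u
    g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return x - eta * g + noise * rng.standard_normal(x.size)

f = lambda x: 0.5 * np.sum(x ** 2)   # toy smooth objective
x = np.array([1.0, -2.0])
rng = np.random.default_rng(1)
for _ in range(500):
    x = zo_step(f, x, rng)
# x ends up near the minimizer, up to the injected noise floor
```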
Symmetric Natural Policy Gradient for Regularized Multi-Agent Learning with Parameter Convergence
ABSTRACT. Multi-agent interactions are increasingly important in the context of reinforcement learning, and the theoretical foundations of policy gradient methods in this setting have attracted growing attention. We investigate the global convergence of Natural Policy Gradient (NPG) algorithms in multi-agent learning. We first show that the parameters of the policy may not converge for the vanilla NPG method, even when the payoffs are regularized (which is key in establishing strong convergence guarantees in the policy space). This non-convergence of parameters leads to stability issues in learning, which become especially relevant in the function approximation setting, where we can only operate on low-dimensional parameters instead of the high-dimensional policy. We then propose variants of the NPG algorithm, both with and without optimism, for several standard multi-agent learning scenarios: two-player zero-sum matrix and Markov games, and multi-player monotone games. We establish global and last-iterate convergence of the policy parameters when the reward is regularized. One feature of our algorithms is that the agents take symmetric roles and operate independently. Moreover, our results may also be of independent interest for solving nonconvex-nonconcave minimax optimization problems with certain structures.
09:30
Xinyi Chen (Princeton University and Google AI Princeton, United States) Elad Hazan (Princeton University and Google AI Princeton, United States)
Black-box Control for Linear Dynamical Systems
ABSTRACT. We consider the problem of black-box control: the task of controlling an unknown linear time-invariant dynamical system from a single trajectory without a stabilizing controller. Under the assumption that the system is controllable, we give the first efficient algorithm that attains sublinear regret in the setting of online nonstochastic control. This resolves an open problem since the work of Abbasi-Yadkori and Szepesvári (2011) on the stochastic LQR problem, and in a more challenging setting that allows for adversarial perturbations and adversarially chosen, changing convex loss functions. We give finite-time regret bounds for our algorithm on the order of $2^{\mathrm{poly}(d)} + O(\mathrm{poly}(d) T^{2/3})$ for general nonstochastic control. To complete the picture, we investigate the complexity of the online black-box control problem and give a matching regret lower bound of $2^{\Omega(d)}$, showing that the exponential cost is inevitable. This lower bound holds even in the noiseless setting and applies to any black-box control method, randomized or deterministic.
Philippe Rigollet (Massachusetts Institute of Technology, United States) Austin Stromme (Massachusetts Institute of Technology, United States)
On the sample complexity of entropic optimal transport
ABSTRACT. We study the sample complexity of entropic optimal transport in high dimensions using computationally efficient plug-in estimators. We significantly advance the state of the art by establishing dimension-free, parametric rates for estimating various quantities of interest, including the entropic regression function which is a natural analog to the optimal transport map. As an application, we propose a practical model for transfer learning based on entropic optimal transport and establish parametric rates of convergence for nonparametric regression and classification.
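For background, the plug-in estimators in question are built from the entropic plan between empirical measures, which is computed by the standard Sinkhorn iterations; the sample sizes and regularization below are illustrative, and the abstract's statistical rates are theoretical results not demonstrated here:

```python
import numpy as np

# Sinkhorn alternately rescales the kernel K = exp(-C/eps) so that the
# plan's marginals match the input weights a and b.

def sinkhorn(a, b, C, eps=0.5, iters=300):
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                 # match column marginals
        u = a / (K @ v)                   # match row marginals
    return u[:, None] * K * v[None, :]    # entropic transport plan

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((5, 2)), rng.standard_normal((6, 2))
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared distances
a, b = np.full(5, 1 / 5), np.full(6, 1 / 6)
P = sinkhorn(a, b, C)   # rows sum to a, columns approach b
```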
08:50
Cheng Mao (Georgia Institute of Technology, United States) Alexander Wein (University of California, Davis, United States)
Detection-Recovery Gap for Planted Dense Cycles
ABSTRACT. Planted dense cycles are a type of latent structure that appears in many applications, such as small-world networks in social sciences and sequence assembly in computational biology. We consider a model where a dense cycle with expected bandwidth $n \tau$ and edge density $p$ is planted in an Erd\H{o}s--R\'enyi graph $G(n,q)$. We characterize the computational thresholds for the associated detection and recovery problems for the class of low-degree polynomial algorithms. In particular, a gap exists between the two thresholds in a certain regime of parameters. For example, if $n^{-3/4} \ll \tau \ll n^{-1/2}$ and $p = 2q = \frac{1}{\mathrm{polylog}(n)}$, the detection problem is computationally easy, while the recovery problem is computationally hard.
ABSTRACT. We consider the problem of learning the causal structure of a system from observational data. Constraint-based methods are one of the main approaches for solving this problem, but the existing methods are either computationally impractical when dealing with large graphs or lacking completeness guarantees. We propose a novel computationally efficient recursive constraint-based method that is sound and complete. The key idea of our approach is that at each iteration a specific type of variable is identified and removed. This allows us to learn the structure efficiently and recursively, as this technique reduces both the number of required conditional independence (CI) tests and the size of the conditioning sets.
09:30
Jiaze Qiu (Harvard University, United States) Subhabrata Sen (Harvard University, United States)
The TAP free energy for high-dimensional linear regression with uniform spherical prior
ABSTRACT. Consider the Bayes linear regression model with an i.i.d. Gaussian design. Assume that the number of observations and covariates are both large and comparable. In this setting, the Thouless-Anderson-Palmer (TAP) approximation from spin glass theory is conjectured to yield a tight approximation for the log-normalizing constant. We will present a rigorous proof under a uniform spherical prior on the regression vector. Finally, we will discuss the implications of this approximation in the context of variational inference.
Local convexity of the TAP free energy and AMP convergence for Z2-synchronization
ABSTRACT. We study mean-field variational Bayesian inference using the TAP approach, for Z2-synchronization as a prototypical example of a high-dimensional Bayesian model. We show that for any signal strength λ>1 (the weak-recovery threshold), there exists a unique local minimizer of the TAP free energy functional near the mean of the Bayes posterior law. Furthermore, the TAP free energy in a local neighborhood of this minimizer is strongly convex. Consequently, a natural-gradient/mirror-descent algorithm achieves linear convergence to this minimizer from a local initialization, which may be obtained by a finite number of iterates of Approximate Message Passing (AMP). This provides a rigorous foundation for variational inference in high dimensions via minimization of the TAP free energy.
We also analyze the finite-sample convergence of AMP, showing that AMP is asymptotically stable at the TAP minimizer for any λ>1, and is linearly convergent to this minimizer from a spectral initialization for sufficiently large λ. Such a guarantee is stronger than results obtainable by state evolution analyses, which only describe a fixed number of AMP iterations in the infinite-sample limit.
Our proofs combine the Kac-Rice formula and Sudakov-Fernique Gaussian comparison inequality to analyze the complexity of critical points that satisfy strong convexity and stability conditions within their local neighborhoods.
Phase transitions in the Gaussian database alignment and planted Gaussian matching problems
ABSTRACT. In the correlated Gaussian database model, n pairs of d-dimensional correlated Gaussian vectors are generated and one of the collections of vectors is shuffled. When the correlation is high enough, recovery of the correspondence between the vectors is possible. We investigate the structure of the exact recovery phase transition and find qualitatively different behaviors for large and small numbers of dimensions. In the high-dimensional limit, the problem becomes equivalent to recovering a planted matching in a matrix of independent Gaussian entries. In low dimensions, errors of size two play a special role.
Low-rank matrix estimation with groupwise heteroskedasticity
ABSTRACT. High-dimensional inference problems involving heterogeneous pairwise observations arise in applications such as low-rank covariance estimation and covariate-assisted clustering. This talk focuses on a groupwise model for these types of problems, where interactions between different subsets of latent variables are observed under different noise levels. The main results are formulas for the asymptotic mutual information and minimum mean-squared error that describe the fundamental limits of inference without any computational constraints. As an application of these results, it is shown that recently proposed methods based on applying principal component analysis to weighted combinations of the data are optimal in some settings but suboptimal in others.
14:10
Peizhong Ju (The Ohio State University, United States) Xiaojun Lin (Purdue University, United States) Ness Shroff (The Ohio State University, United States)
Understanding the generalization power of overfitted NTK models: 3-layer vs. 2-layer
ABSTRACT. We study the generalization performance of overparameterized 3-layer NTK models and compare it with that of 2-layer NTK models. For 3-layer NTK models, with a specific set of ground-truth functions (which we refer to as the "learnable set"), we establish an upper bound on the test error of the min-$l_2$-norm overfitted solution, which decreases with the number of neurons of the hidden layers. Our upper bound for 3-layer NTK reveals that, between the two hidden layers, the test error descends faster with respect to the number of neurons in the second hidden layer (the one closer to the output) than with respect to that in the first hidden layer (the one closer to the input). We also show that the learnable set of 3-layer NTK without bias is no smaller than that of 2-layer NTK models with various choices of bias in the neurons. However, in terms of the actual generalization performance, our results suggest that 3-layer NTK is much less sensitive to the choices of bias than 2-layer NTK, especially when the input dimension is large.
14:30
Rishabh Dudeja (Harvard University, United States) Subhabrata Sen (Harvard University, United States) Yue Lu (Harvard University, United States)
Universality of High-Dimensional Estimation with Nearly Deterministic Sensing Matrices
ABSTRACT. It has been observed that the performances of many high-dimensional estimation problems are universal with respect to underlying sensing (or design) matrices. Specifically, matrices with markedly different constructions seem to achieve identical performance if they share the same spectral distribution and have "generic" singular vectors. We prove this universality phenomenon for the case of convex regularized least squares (RLS) estimators under a linear regression model with additive Gaussian noise. Our main contributions are two-fold: (1) We introduce a notion of universality classes for sensing matrices, defined through a set of deterministic conditions that fix the spectrum of the sensing matrix and precisely capture the previously heuristic notion of generic singular vectors; (2) We show that for all sensing matrices that lie in the same universality class, the dynamics of the proximal gradient descent algorithm for solving the regression problem, as well as the performance of RLS estimators themselves (under additional strong convexity conditions) are asymptotically identical. In addition to including i.i.d. Gaussian and rotational invariant matrices as special cases, our universality class also contains highly structured, strongly correlated, or even (nearly) deterministic matrices. Examples of the latter include randomly signed versions of incoherent tight frames and randomly subsampled Hadamard transforms. As a consequence of this universality principle, the asymptotic performance of regularized linear regression on many structured matrices constructed with limited randomness can be characterized by using the rotationally invariant ensemble as an equivalent yet mathematically more tractable surrogate.
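The proximal gradient dynamics analyzed in the abstract can be instantiated concretely for the lasso, i.e. $l_1$-regularized least squares; the sensing matrix, sparse signal, and regularization weight below are our own toy choices:

```python
import numpy as np

# Proximal gradient (ISTA) for the lasso: a gradient step on the
# least-squares term followed by the soft-thresholding prox of the
# l1 penalty. The universality result concerns how such dynamics
# depend on A only through its spectrum and singular vectors.

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_gradient(A, y, lam=0.1, iters=500):
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L for the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)             # gradient of the LS term
        x = soft_threshold(x - step * grad, step * lam)   # l1 prox
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20)) / np.sqrt(50)   # Gaussian sensing
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]                     # sparse signal
y = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = prox_gradient(A, y)   # recovers the large entries, shrunk
```

The universality principle says these iterates would behave (asymptotically) identically if the Gaussian `A` were replaced by a structured matrix in the same universality class, such as a randomly signed incoherent frame with matching spectrum.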
Wenqi Cui (University of Washington, United States) Weiwei Yang (Microsoft, United States) Baosen Zhang (University of Washington, United States)
Stable and Decentralized Learning for Voltage Regulation
ABSTRACT. Inverter-based distributed energy resources provide the possibility for fast time-scale voltage control by quickly adjusting their reactive power. Reinforcement learning approaches are becoming increasingly popular for searching for the best control policies. It is difficult, however, to enforce that the learned controllers are safe, in the sense that they may introduce instabilities into the system. In addition, the optimality of the learning methods is often quantified only through simulations.
In this paper, we propose a safe learning approach for voltage control. We prove that the system is guaranteed to be exponentially stable if each controller satisfies certain Lipschitz constraints. A decentralized RL framework is constructed to train a local neural network controller at each bus in a model-free setting. We provide theoretical guarantees for the convergence and quality of these learned controllers.
13:50
Bernard Lesieutre (University of Wisconsin-Madison, United States) Yasmine Abdennadher (University of Wisconsin-Madison, United States) Sandip Roy (Washington State University, United States)
Model-Enhanced Localization of Forced Oscillation Using PMU Data
ABSTRACT. The detection and mitigation of low-frequency forced oscillations are important for reliable grid operation. Forced oscillations, driven by periodic disturbances such as malfunctioning controllers, may resonate with poorly damped natural modes across wide areas of the grid and risk large system failures. Locating the source within the system is a prerequisite for mitigation schemes that correct the actions responsible for the original oscillation. We introduce a model-enhanced method for locating the source of a forced oscillation using PMU measurements. This method augments data-only approaches, such as the energy-function-based dissipating energy flow method. The algorithms are tested on several scenarios using a 240-bus, 178-line WECC model.
14:10
Anna Scaglione (Cornell University, United States) Tong Wu (Cornell University, United States) Daniel Arnold (Lawrence Berkeley National Lab, United States)
Graph Convolutional Neural Networks for Control of Smart Inverters in Power Grids
ABSTRACT. In this work, we propose a physics-inspired Graph Convolutional Neural Network (GCN)-Reinforcement Learning (RL) architecture to train online control policies for the optimal selection of Distributed Energy Resources (DER) set-points. In particular, the GCN layers perform the feature extraction task for the policy function and the value function of the RL architecture. While the use of GCN is compatible with any deep RL scheme, we test it in combination with the popular proximal policy optimization (PPO) algorithm and, as an application, we consider the selection of set-points for the Volt/Var and Volt/Watt control logic of smart inverters. We show numerically that the GCN scheme is more effective than various benchmarks in regulating voltage and mitigating undesirable voltage dynamics generated by cyber attacks.
In addition to exploring the performance of GCN for a given network, we investigate the case of grids that change dynamically due to topology or line-parameter variations. We test the robustness of GCN-RL policies against small perturbations, explore more drastic dynamics for the grid parameters, and evaluate the so-called ``transfer learning'' capabilities of the scheme.
14:30
Yury Dvorkin (Johns Hopkins University, United States)
Bringing Data Science and Machine Learning into Electricity Markets
ABSTRACT. Integration of volatile and uncertain renewable energy resources and synergistic energy storage and demand response technologies motivates the pursuit of stochastic electricity market designs that accommodate these resources and technologies efficiently and reliably. Until recently, however, the primary means of achieving these goals rested on traditional optimization techniques, in particular stochastic optimization and its proxies. Recent developments make it possible to “re-invent” stochastic electricity markets by borrowing results from data science, machine learning, and financial engineering. This presentation will first describe how stressing of load and renewable time series can be achieved using principal component analysis and generative adversarial networks, and then how these stressed time series can be integrated into a stochastic electricity market design. Building on this result, we will describe a machine learning approach to stochastic market clearing that affords both market- and operation-feasible solutions and reduces computational requirements.
14:50
Sunho Jang (University of Michigan, United States) Necmiye Ozay (University of Michigan, United States) Johanna Mathieu (University of Michigan, United States)
Data-driven estimation of probabilistic constraints for network-safe distributed energy resource control
ABSTRACT. Distribution network safety should not be compromised when distributed energy resources provide balancing services to the grid. This work proposes an approach for the distribution network operator to probabilistically maintain the network's safety by computing and imposing a constraint on the controller's action. We first introduce a data-driven approach to estimate the network-safety probability under a given constraint, which uses realizations of the random variables related to the network's safety. Then, we derive a set of sufficient conditions on the network-safety probability estimate and determine the number of realizations needed to satisfy the desired chance constraint with the desired confidence level. Next, we leverage the bisection method to find the tightest possible constraint that satisfies the obtained conditions and impose it on the aggregator's action. Numerical simulations show that the proposed approach achieves the desired network-safety probability.
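The estimate-then-bisect idea can be sketched as follows. The network model here is a deliberate stand-in: "safety" just means a scalar headroom sample stays above the imposed constraint level, and the target probability and sample distribution are illustrative, not the paper's:

```python
import numpy as np

# Bisection finds the tightest (largest) constraint level whose
# empirical safety probability, estimated from realizations, still
# clears the target.

def tightest_safe_level(samples, target=0.95, tol=1e-4):
    lo, hi = 0.0, samples.max()              # level 0 is always safe here
    while hi - lo > tol:
        mid = (lo + hi) / 2
        p_safe = np.mean(samples >= mid)     # empirical safety probability
        if p_safe >= target:
            lo = mid                         # safe: try a looser level
        else:
            hi = mid
    return lo

rng = np.random.default_rng(0)
headroom = rng.uniform(0.0, 10.0, size=2000)  # synthetic realizations
c = tightest_safe_level(headroom)             # near the 5% quantile
```

Because the empirical safety probability is monotone in the level, bisection converges to the boundary of the acceptable region, and the returned level is always one that passed the empirical check.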
Individual Altruism Cannot Overcome Congestion Effects in a Global Pandemic Game
ABSTRACT. A key challenge in responding to public health crises such as COVID-19 is the difficulty of predicting the results of feedback interconnections between the disease and society. As a step towards understanding these interconnections, we pose a simple game-theoretic model of a global pandemic in which individuals can choose where to live, and we investigate the global behavior that may emerge as a result of individuals reacting locally to the competing costs of isolation and infection. We study the game-theoretic equilibria that emerge from this setup when the population is composed of either selfish or altruistic individuals. First, we demonstrate that, as is typical in these types of games, selfish equilibria are in general not optimal, but that all stable selfish equilibria are within a constant factor of optimal. Second, there exist infinitely many stable altruistic equilibria; all but finitely many of these are worse than the worst selfish equilibrium, and the social cost of altruistic equilibria is unbounded. Our work is in sharp contrast to recent work on network congestion games, in which all altruistic equilibria are socially optimal. This suggests that a population without central coordination may react very poorly to a pandemic, and that individual altruism could even exacerbate the problem.
13:50
Xing Gao (University of Illinois at Chicago, United States) Lev Reyzin (University of Illinois at Chicago, United States)
An Interactive Search Game with Two Agents
ABSTRACT. We study a two-player interactive game where the players search for a target vertex in a connected undirected graph via multiple rounds of query and feedback. This two-player search game is an adaptation of the single-agent interactive learning problem (Emamjomeh-Zadeh and Kempe, 2017). Here, we analyze the game for trees under different costs: under zero-sum cost we present the one-step Nash equilibrium strategy, and show that the competition hinders both players' search progress as compared to binary search; under non-zero-sum costs, however, we present a cooperative strategy which benefits both players.
14:10
Carmel Fiscko (Carnegie Mellon University, United States) Soummya Kar (Carnegie Mellon University, United States) Bruno Sinopoli (Washington University in St. Louis, United States)
Identifying Influential Agents Via Faux Adversarial Games
ABSTRACT. Identifying important agents is a key goal in the study of multi-agent systems (MAS), but standard graph-based centrality metrics often fail to account for the long-term influence of an agent through a system. In this work we model a MAS as a controlled Markov decision process (MDP), and quantify an agent’s impact as its maximum ability to minimize the MDP’s value function. To evaluate this metric, the agent is considered as a “faux adversary” and a secondary MDP is defined from its viewpoint. A game is defined between the controller of the main MDP and the adversary of the secondary MDP, who each work to optimize their respective policies. We show that defining the Q functions of the MDPs as the players’ utilities results in a zero-sum game whose Nash equilibrium value is equal to the proposed impact metric; the exact value can then be found using traditional methods in game theory. Finally, simulations demonstrate that the proposed impact metric can outperform graph-based centrality measures in reinforcement learning settings, and show how simulated game-play can approximate the Nash equilibrium value.
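The abstract notes that once the zero-sum game is set up, its equilibrium value can be found with traditional game-theoretic methods. As a minimal, hypothetical illustration (not the paper's construction), the value of a 2x2 zero-sum game can be computed in closed form:

```python
def zero_sum_2x2_value(A):
    """Value of a 2x2 zero-sum game with row-player payoff matrix A.

    If a pure saddle point exists, return it; otherwise use the
    closed-form mixed-strategy value (ad - bc) / (a + d - b - c).
    """
    (a, b), (c, d) = A
    maximin = max(min(row) for row in A)                     # row player's guarantee
    minimax = min(max(A[0][j], A[1][j]) for j in range(2))   # column player's guarantee
    if maximin == minimax:                                   # pure saddle point
        return maximin
    return (a * d - b * c) / (a + d - b - c)

# Matching pennies: no pure saddle point, and neither player can
# guarantee a positive payoff, so the value is 0.
print(zero_sum_2x2_value([[1, -1], [-1, 1]]))  # → 0.0
```

Larger games require linear programming rather than a closed form, but the idea is the same: the Nash equilibrium value of the zero-sum game is a single well-defined number.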
14:30
Adel Aghajan (University of California Santa Barbara, United States) Keith Paarporn (University of California Santa Barbara, United States) Jason Marden (University of California Santa Barbara, United States)
Equilibrium characterizations of multi-resource Lotto games
ABSTRACT. The allocation of heterogeneous resources is increasingly becoming a significant aspect for system planners to consider, given the complexity and diversity of objectives that need to be accomplished. In this manuscript, we consider the allocation of multiple heterogeneous resource types in a strategic adversarial setting known as General Lotto games. We propose a new framework for multi-resource General Lotto games, which generalizes the classic and predominantly single-resource General Lotto games considered in the literature. Here, the specification of winning rules for each individual contest is the novel modeling consideration in our framework. We derive equilibrium characterizations in the cases where the winning rules are based on weakest-link objectives, and where they are based on a linear combination of the allocated resource types. We then analyze how investment decisions should be made when it is costly to utilize each resource type.
14:50
Jean-Baptiste Seby (Massachusetts Institute of Technology, United States) Charles Harvey (Massachusetts Institute of Technology, United States) Saurabh Amin (Massachusetts Institute of Technology, United States)
Equilibrium analysis of game on heterogeneous networks with coupled activities
ABSTRACT. We study a game where agents interacting over a network engage in two coupled activities and have to strategically decide their production for each of these activities. Agent interactions involve local and global network effects, as well as a coupling between activities. We consider the general case where the network effects are heterogeneous across activities, i.e., the underlying graph structures of the two activities differ and/or the parameters of the network effects are not equal. In particular, we apply this game in the context of palm oil tree cultivation and timber harvesting, where network structures are defined by spatial boundaries of concessions. We first derive a sufficient condition for the existence and uniqueness of a Nash equilibrium. This condition can be derived using the potential game property of our game or by employing a variational inequality framework. We show that the equilibrium can be expressed as a linear combination of two Bonacich centrality vectors.
Raphael Chincilla (University of California, Santa Barbara, United States) Guosong Yang (University of California, Santa Barbara, United States) Joao Hespanha (University of California, Santa Barbara, United States)
Second order methods for min-max optimization with stability guarantees
ABSTRACT. Min-max optimization is widely used in robust model predictive control, computer security problems, robust training of neural networks, generative adversarial networks, reformulating stochastic programming as min-max optimization, and robust stochastic programs. In this talk, we address the design of algorithms to solve nonconvex-nonconcave min-max optimizations using second order methods. These algorithms modify the Hessian matrix to obtain a search direction that can be seen as the solution to a quadratic program that locally approximates the min-max problem. We show that by selecting this type of modification appropriately, the only stable points of the resulting iterations are local min-max points. For min-max model predictive control problems, these algorithms lead to computation times that scale linearly with the horizon length.
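As a toy illustration of the quadratic-model viewpoint (this is not the authors' Hessian-modification scheme): on a purely quadratic min-max objective, the plain Newton step, which solves the local quadratic approximation exactly, reaches the saddle point in a single iteration. A minimal sketch with an assumed objective f(x, y) = x^2 - y^2 + 2xy:

```python
def newton_step(z, grad, hess):
    """One Newton step z - H^{-1} g, solving the 2x2 linear system by hand."""
    (a, b), (c, d) = hess
    det = a * d - b * c
    gx, gy = grad
    dx = (d * gx - b * gy) / det
    dy = (-c * gx + a * gy) / det
    return (z[0] - dx, z[1] - dy)

# f(x, y) = x^2 - y^2 + 2xy: minimized in x, maximized in y, saddle at origin.
def grad_f(x, y):
    return (2 * x + 2 * y, -2 * y + 2 * x)

HESS_F = ((2, 2), (2, -2))  # constant Hessian of this quadratic

x, y = 1.0, -2.0
x, y = newton_step((x, y), grad_f(x, y), HESS_F)
print((x, y))  # → (0.0, 0.0): one step lands on the saddle
```

For genuinely nonconvex-nonconcave problems the unmodified step can converge to points that are not local min-max, which is precisely the issue the Hessian modification in the talk is designed to address.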
13:50
Jingjin Yu (Rutgers University at New Brunswick, United States)
Rubik Tables, Stack Rearrangement, and Multi-Robot Routing
ABSTRACT. In a basic Rubik table problem, there are n types of items where each type has a multiplicity of n. These n^2 items are randomly placed in an n x n table, one in each table cell. In a shuffle, a row or a column of the table may be permuted to achieve an arbitrary configuration of the items in that row/column. The Rubik table problem asks how many shuffles are needed to sort the table so that type i items occupy column i of the table. Surprisingly, it turns out that 2n shuffles suffice. Interestingly, the Rubik table problem and its generalizations allow us to successfully tackle difficult multi-object stack rearrangement problems and multi-robot path planning problems, resulting in polynomial-time algorithms for computing asymptotically optimal solutions that were previously not possible.
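A brute-force sanity check of the 2n bound for the smallest nontrivial case n = 2 (a sketch of the problem setup only, not the paper's algorithm): breadth-first search over row/column shuffles confirms that every 2 x 2 arrangement sorts well within 2n = 4 shuffles.

```python
from collections import deque
from itertools import permutations

def shuffles_to_sort(grid):
    """Minimum number of row/column shuffles sorting a 2x2 Rubik table.

    grid is a tuple of 2 row-tuples over types {1, 2}; sorted means
    type 1 occupies column 0 and type 2 occupies column 1.
    """
    goal = ((1, 2), (1, 2))
    def moves(g):
        (a, b), (c, d) = g
        yield ((b, a), (c, d))   # shuffle (swap) row 0
        yield ((a, b), (d, c))   # shuffle row 1
        yield ((c, b), (a, d))   # shuffle column 0
        yield ((a, d), (c, b))   # shuffle column 1
    seen, queue = {grid}, deque([(grid, 0)])
    while queue:
        g, depth = queue.popleft()
        if g == goal:
            return depth
        for h in moves(g):
            if h not in seen:
                seen.add(h)
                queue.append((h, depth + 1))

# All distinct placements of the multiset {1, 1, 2, 2} in the 4 cells.
configs = {((p[0], p[1]), (p[2], p[3])) for p in permutations([1, 1, 2, 2])}
print(max(shuffles_to_sort(g) for g in configs))  # → 2, within the 2n = 4 bound
```

For n = 2 two shuffles always suffice; the general 2n bound is where the combinatorial work (row shuffles to a Latin-square-like intermediate, then column shuffles) comes in.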
14:10
Necmiye Ozay (University of Michigan, United States) Liren Yang (Huazhong University of Science and Technology, China)
Guaranteeing safety for systems with missing measurements
ABSTRACT. We consider the problem of synthesizing controllers for uncertain systems that enforce hard state-constraints under occasionally missing state measurements, where the potential missing data patterns are specified by an automaton. Although the problem can be solved as a partial information safety game via power set construction, this naive approach is computationally intractable. In this talk, we show that the problem indeed admits a simpler solution by providing a reduction to a full information safety game that does not require the power set construction. The key to our approach is to introduce an alternative representation of missing data patterns, namely a string automaton. By using the string automaton and exploiting the causality requirement for controllers, we construct a product system that reveals the required memory structure for the controller. Finally, we show that solving a full information game on this product system is equivalent to the original control synthesis problem. We will end the talk with some examples demonstrating the application of the approach.
Asymptotically Optimal Worst-Case State Estimation over Noisy Channels
ABSTRACT. In this talk we present some recent results on worst-case state estimation of a discrete-time LTI system with bounded disturbances when the sensor measurements are transmitted over a noisy channel. It is known that in order to achieve estimation errors that are uniformly bounded in the worst case over time, the zero-error capacity of the channel must exceed the topological entropy of the LTI system. As zero-error capacity is sometimes achieved only at large block-coding lengths, this leads to estimation errors that can be very large, even if uniformly bounded over time.
In this work, we characterize how worst-case state estimation performance is impacted by the block-coding length n and other parameters, such as the channel rate, plant dynamics and disturbance size. By exploiting the Brunn-Minkowski inequality, a universal lower bound on the worst-case state estimation error is obtained, which is valid under any coding and estimation scheme. An upper bound is then derived by proposing and analyzing a specific scheme with optimized allocation of bits. As n increases, the upper and lower bounds approach each other, implying asymptotic optimality of the scheme. It is shown that performance is dominated by the maximum-magnitude eigenvalue(s).
If time permits, extensions to multiple-access channels will be discussed, building on other recent results in https://arxiv.org/abs/2002.03294
14:50
Randy Freeman (Northwestern University, United States)
Proximal methods for self-healing and exact distributed convex optimization
ABSTRACT. Distributed convex optimization algorithms employ a variety of methods
for achieving exact convergence to the global optimal value (modulo
numerical precision): some use time-varying dynamics, some use
dynamics on each edge rather than on each node of the communication
graph, some use double the communication between nodes per
optimization step, and some use a specific initialization that
enforces the dynamics to evolve on a particular subspace. Each of
these methods has its drawbacks. Using time-varying dynamics might
require a global clock, and it might cause transients due to
disturbances to take longer and longer to die out as time progresses.
Using edge-based rather than node-based dynamics might increase the
memory and computation costs, and it typically requires the
communication graph to be undirected. Using double the communication
per optimization step might increase the communication cost, and it
might also slow down the convergence to the optimal value. Using a
specific initialization to enforce a particular invariant subspace
might render the algorithm unable to recover from faults or
disturbances that perturb its dynamics from this subspace, resulting
in convergence to the incorrect value. In this latter case we say
that the algorithm is not self-healing.
In our previous work we considered strongly convex
objective functions having Lipschitz continuous gradients, and we
introduced a new first-order method for achieving exact convergence to
the global optimal value. Our new algorithm has none of the above
drawbacks, and in particular it is a self-healing algorithm. But, it
does possess a peculiar internal instability: each node has states
that grow linearly in time even as its output converges to the
optimal value. In the present work, we consider general non-smooth
and extended-real-valued convex objective functions (which can thus
incorporate hard convex constraints). We present proximal algorithms
that again employ our new, internally unstable method for achieving
exact convergence.
Optimal Primary- and Secondary-control Design for Grids with Generators and Inverters
ABSTRACT. For traditional power grids predominantly featuring large synchronous generators (SGs), there exists a significant body of work bridging optimization and control tasks. A generic workflow in such efforts entails: characterizing the steady state of control algorithms and SG dynamics; assessing the optimality of the resulting operating point with respect to an optimal dispatch task; and prescribing control parameters to ensure that (under reasonable ambient perturbations) the considered control nudges the system steady state to optimality. Well-studied instances of the aforementioned approach include designing: i) generator participation factors to ensure economic optimality, and ii) governor frequency-droop slopes to ensure power sharing. Recognizing that future power grids will feature a diverse mix of SGs and inverter-based resources (IBRs) with varying control structures, this work examines the different steps of the optimization-control workflow for this context. Considering the contemporary models of active power-frequency dynamics of IBRs and SGs, a characterization of steady state is put forth (with and without secondary frequency control). Necessary and sufficient conditions on active-power droop slopes of SGs and IBRs are then derived to ascertain desired power sharing and ensure economically optimal operation under varying power demands.
15:50
Dominic Groß (University of Wisconsin-Madison, United States)
Compensating Network Dynamics in Grid-Forming Control
ABSTRACT. The circuit dynamics of power networks are typically neglected in transient stability analysis of power systems. While this approximation is justified in classical multi-machine power systems, the circuit dynamics can compromise stability of power systems dominated by grid-forming (GFM) power converters. In this work, we show that the impact of circuit dynamics can be mitigated by (i) augmenting GFM droop control with derivative feedback, and (ii) leveraging derivative control terms of dual-port GFM control that was recently proposed in the literature. Moreover we show that, contrary to conventional wisdom, avoiding instability by slowing down the converter dynamics (e.g., through virtual inertia) is not always possible or may require unrealistic virtual inertia constants.
From Automatic Generation Control to Fast Frequency Control using Inverter-Based Resources
ABSTRACT. Automatic generation control (AGC) is the decentralized integral control system responsible for continuous re-balancing of generation and load in interconnected power systems. For multiple reasons, AGC must be tuned to operate slowly in practice, rendering the mechanism inappropriate for fast time-scale control services. In contrast, significant recent attention has been targeted towards the development of 'fast frequency control' approaches, with the aim of fully leveraging inverter-based resource (IBR) capabilities for frequency control in an increasingly volatile grid environment. How should the ideas of AGC inform these modern approaches?
We begin this talk by describing recent theoretical advances in our understanding of the dynamic stability and performance of traditional AGC, with an emphasis on the crude disturbance estimation mechanism implicitly embedded within AGC. We then discuss how this disturbance estimation mechanism can be modernized using either model-based or data-driven control methods to significantly improve response speed and enable fast frequency control via IBRs. The approach is supported by both theoretical and simulation results.
16:30
Ian Hiskens (University of Michigan, United States) Sijia Geng (Johns Hopkins University, United States)
Dynamic Performance of Unified Grid-Forming/Following Inverter Control
ABSTRACT. The paper discusses the dynamic performance of grid-connected inverter-based resources (IBRs). This discussion makes use of an inverter control scheme that exhibits both grid-forming and grid-following characteristics. The dynamic behaviour of that grid-forming/following controller is explored under a variety of system conditions to categorize the performance of power systems that incorporate both IBRs and synchronous machines. Small disturbance (linear) analysis is used to provide insights into the nature of modal oscillations and factors affecting oscillation frequency and damping. Examples illustrate the capability of the grid-forming/following control scheme to support autonomous switching between grid-connected and islanded operation, and vice-versa.
Approaches to Analysis of Power System Dynamics with High Penetration of Inverter Interfaced Devices
ABSTRACT. Due to a rapid increase in power-electronic-converter-interfaced renewable sources and loads connected to the power grid, new complex hybrid dynamics have begun to arise. The asynchronous and limited-inertia characteristics of current-mode grid-connected converters can severely degrade system performance. At the same time, the overall system controllability could potentially be improved because of the fast regulating time and flexible programmability of converter controls. Acceptable grid performance will increasingly require converters to be capable of operating with different grid-support functionalities, such as inertia emulation or voltage support, and leveraging the high controllability of power converters' adjustable output characteristics, such as for virtual synchronous generators. In addition, point-of-load converters (POLCs) in motor drives, computer power supplies, and compact fluorescent lighting typically control load behavior to be constant power and can contribute to voltage instability. These controls could be modified to support the grid. However, this flexibility comes at the cost of extreme complexity that renders both analytical approaches and extensive numerical simulations of limited use. This talk discusses some of these complexities and contrasts possible approaches to understanding reliable grid dynamic performance with large numbers of inverters, including passivity methods, set-theoretic analysis, temporal logic, and the use of time-scale theory.
Sohil Shah (Massachusetts Institute of Technology, United States) Saurabh Amin (Massachusetts Institute of Technology, United States) Patrick Jaillet (Massachusetts Institute of Technology, United States)
Information provision to manage strategic agents over star networks - static and dynamic designs
ABSTRACT. We study information provision by a strategic central planner who publicly signals about an uncertain parameter to agents distributed over the hosts of a star network. The parameter impacts the agents’ gains in moving to the congestible central hub. Agents update their belief using provisioned public signals and decide whether to move to the hub or remain at their respective hosts. The planner maintains a set of desirable outcomes and seeks to design a signaling mechanism to maximize the probability that agents choose an acceptable outcome for the true parameter. We consider two aspects in our design of optimal signaling mechanisms: (i) whether the set of desirable outcomes is independent of the parameter (state-independent case) or not (state-dependent case); and (ii) whether signaling is static or dynamic. For the state-independent case, we design both static and dynamic mechanisms by leveraging the mapping between agents’ equilibrium decisions and the induced posterior belief, and then another mapping from this belief to the planner’s objective. For the state-dependent case, we show that optimal static signaling can be obtained by solving a linear program. The most general case of dynamic mechanism for state-dependent set of desirable outcomes remains an open question.
15:50
Runyu Zhang (Harvard University, United States) Zhaolin Ren (Harvard University, United States) Na Li (Harvard University, United States)
Gradient play in stochastic games: stationary points, convergence, and sample complexity
ABSTRACT. Policy gradient-type methods are popular for multiagent reinforcement learning because of their flexibility and capability to incorporate structured state and action spaces. We will focus on studying the performance of the gradient play algorithm for stochastic games (SGs), where each agent tries to maximize its own total discounted reward by making decisions independently based on current state information which is shared between agents. Policies are directly parameterized by the probability of choosing a certain action at a given state. We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting, and give a local convergence rate around strict NEs. Further, for a subclass of SGs called Markov potential games (which includes the cooperative setting with identical rewards among agents as an important special case), we design a sample-based reinforcement learning algorithm and give a non-asymptotic global convergence rate analysis for both exact gradient play and our sample-based learning algorithm.
Local geometry and local stability are also considered, where we prove that strict NEs are local maxima of the total potential function and fully-mixed NEs are saddle points. Lastly, we will briefly discuss the extension to softmax policies.
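A minimal sketch of gradient play in the simplest potential-game instance, a one-state identical-interest matrix game with directly parameterized (projected) policies. The payoff matrix is an assumed toy example; this elides the stochastic-game dynamics studied in the talk:

```python
def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

# Identical-interest game: both agents receive U[a1][a2]; the shared
# payoff is itself the potential function.
U = [[1.0, 0.0], [0.0, 2.0]]
p, q = [0.5, 0.5], [0.5, 0.5]   # each agent's directly parameterized policy
eta = 0.2                        # step size
for _ in range(200):
    grad_p = [sum(U[i][j] * q[j] for j in range(2)) for i in range(2)]
    grad_q = [sum(U[i][j] * p[i] for i in range(2)) for j in range(2)]
    p = project_simplex([p[i] + eta * grad_p[i] for i in range(2)])
    q = project_simplex([q[j] + eta * grad_q[j] for j in range(2)])

# Gradient play converges to the strict NE (action 1, action 1), value 2.
print(round(p[1], 3), round(q[1], 3))  # → 1.0 1.0
```

The strict NE here is a local (in fact global) maximum of the potential, matching the local-geometry picture in the abstract; the fully mixed stationary point of this game is the saddle that gradient play escapes.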
16:10
Ozan Candogan (University of Chicago, Booth School of Business, United States)
Persuasion in Networks: Public Signals and Cores
ABSTRACT. We consider a setting where agents in a social network take binary actions that exhibit local strategic complementarities. Their payoffs are affine and increasing in an underlying real-valued state of the world. An information designer commits to a signaling mechanism that publicly reveals a signal that is potentially informative about the state. She wants to maximize the expected number of agents who take action 1. We study the structure and design of optimal public signaling mechanisms.
The designer’s payoff is an increasing step function of the posterior mean (of the state) induced by the realization of her signal. We provide a convex optimization formulation and an algorithm that obtain an optimal public signaling mechanism whenever the designer’s payoff admits this structure. This structure is prevalent, making our formulation and results useful well beyond persuasion in networks. In our problem, the step function is characterized in terms of the cores of the underlying network. The optimal mechanism is based on a “double-interval partition” of the set of states: it associates up to two subintervals of the set of states with each core, and when the state realization belongs to the interval(s) associated with a core, the mechanism publicly reveals this fact. In turn, this induces the agents in the relevant core to take action 1. We also provide a framework for obtaining asymptotically optimal public signaling mechanisms for a class of random networks. Our approach uses only the limiting degree distribution information, thereby making it useful even when the network structure is not fully known. Finally, we explore which networks are more amenable to persuasion, and show that more assortative connection structures lead to larger payoffs for the designer. On the other hand, the dependence of the designer’s payoff on the agents’ degrees can be quite counterintuitive. In particular, we focus on networks sampled uniformly at random from the set of all networks consistent with a degree sequence, and illustrate that when the degrees of some nodes increase, this can reduce the designer’s expected payoff, despite an increase in the extent of (positive) network externalities.
ABSTRACT. Can intentionally decreasing one’s own competitive advantage ever provide strategic benefits in adversarial environments? In this talk we pursue this question under a game theoretic framework termed General Lotto games, and investigate whether a player can gain an advantage by either conceding budgetary resources or conceding valuable objectives to an opponent. While one might intuitively assume that the player cannot, our work demonstrates that – perhaps surprisingly – concessions do offer strategic benefits when made correctly. Our first result shows that budgetary concessions cannot be advantageous in either one-vs.-one or two-vs.-one adversarial scenarios, supporting the intuition. Our second result shows that the same intuition does not hold true for objective concessions, i.e., conceding objectives can be strategically beneficial. Accordingly, our main result fully characterizes the scenarios where such opportunities are present. Finally, we draw connections between our results and previous results on strategic, pre-emptive mechanisms in General Lotto games.
Huanran Li (University of Wisconsin-Madison, United States) Daniel Pimentel-Alarcón (University of Wisconsin-Madison, United States)
Minimum-Length Trace Reconstruction via Integer Programming
ABSTRACT. We propose a new perspective for solving the trace-reconstruction problem. In this problem, multiple traces (or subsequences) are independently sampled from an unknown integer sequence. The goal is to find the shortest sequence that agrees with all traces. Our primary result is that this problem can be efficiently solved with an integer programming model under some mild assumptions. We introduce two sets of valid inequalities to further speed up the optimization process. We also compare our algorithm with classic depth-first search (DFS). Our experimental results show that our algorithm has a dramatic speed and efficiency advantage over DFS.
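The core feasibility condition is easy to state: a candidate sequence "agrees with" the data iff every trace is a subsequence of it. A minimal sketch of that check (the IP model and valid inequalities from the abstract are not reproduced here):

```python
def is_supersequence(seq, trace):
    """True iff trace is a subsequence of seq (greedy left-to-right scan)."""
    it = iter(seq)
    # 'symbol in it' advances the iterator past the first match, so
    # successive symbols must appear in order.
    return all(symbol in it for symbol in trace)

def agrees_with_all(seq, traces):
    """A candidate reconstruction must contain every trace as a subsequence."""
    return all(is_supersequence(seq, t) for t in traces)

traces = [[1, 3], [2, 1], [1, 1]]
print(agrees_with_all([1, 2, 1, 3], traces))  # → True
print(agrees_with_all([1, 2, 3], traces))     # → False: misses [2, 1] and [1, 1]
```

The optimization problem is then to find the shortest sequence passing this check, which is where the integer programming formulation comes in.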
15:50
Meng-Che Chang (Georgia Institution of Technology, United States) Shi-Yuan Wang (Georgia Institution of Technology, United States) Matthieu Bloch (Georgia Institution of Technology, United States)
Controlled Sensing with Corrupted Commands
ABSTRACT. We consider a controlled sensing problem in which
the actions of the decision-maker are corrupted by an adversary.
Two kinds of error probabilities are defined according to whether
the adversary exists or not. We denote $P_{E,0}$ as the error probability
when there is no adversary and $P_{E,-1}$ as the error probability when
an adversary exists. Our main result provides a Stein’s-lemma-like
characterization of the achievable exponent of $P_{E,0}$. Finally,
we give some numerical examples to illustrate our main theorem.
An Information-Theoretic Analysis of Bayesian Reinforcement Learning
ABSTRACT. Building on the framework introduced by Xu and
Raginsky for supervised learning problems, we study the best
achievable performance for model-based Bayesian reinforcement
learning problems. With this purpose, we define minimum
Bayesian regret (MBR) as the difference between the maximum
expected cumulative reward obtainable by knowing the environment
and its dynamics and that obtainable by learning from the collected data.
We specialize this definition to reinforcement learning problems
modeled as Markov decision processes (MDPs) whose kernel
parameters are unknown to the agent and whose uncertainty is
expressed by a prior distribution. One method for deriving upper
bounds on the MBR is presented and specific bounds based on the
relative entropy and the Wasserstein distance are given. We then
focus on two particular cases of MDPs, the multi-armed bandit
problem (MAB) and the online optimization with partial feedback
problem. For the latter problem, we show that our bounds can
recover from below the current information-theoretic bounds by
Russo and Van Roy.
16:30
Adam Case (Drake University, United States) Jack Lutz (Iowa State University, United States)
ABSTRACT. In 2004, Dai, Lathrop, Lutz, and Mayordomo defined and investigated the {\it finite-state dimension} (a finite-state version of algorithmic dimension) of a sequence S and, in 2018, Case and Lutz defined and investigated the {\it mutual (algorithmic) dimension} between two sequences S and T. In this paper, we propose a definition for the {\it lower} and {\it upper finite-state mutual dimensions} mdim_FS(S:T) and Mdim_FS(S:T) between two sequences S and T. Intuitively, the finite-state dimension of a sequence S represents the density of finite-state information contained within S, while the finite-state mutual dimension between two sequences S and T represents the density of finite-state information {\it shared} by S and T. Thus "finite-state mutual dimension" can be viewed as a "finite-state" version of mutual dimension and as a "mutual" version of finite-state dimension.
The main results of this investigation are as follows. First, we show that finite-state mutual dimension, defined using {\it information-lossless finite-state compressors}, has all of the properties expected of a measure of {\it mutual information}. Next, we prove that finite-state mutual dimension may be characterized in terms of {\it block mutual information rates}. Finally, we provide necessary and sufficient conditions for two {\it normal} sequences R_1 and R_2 to achieve mdim_FS(R_1:R_2) = Mdim_FS(R_1:R_2) = 0.
On Learning a Hidden Directed Graph with Path Queries
ABSTRACT. In this paper, we consider the problem of reconstructing a directed graph using path queries. In the query model of learning, a graph is hidden from the learner, and the learner can access information about it with path queries. For a source and destination node, a path query returns whether there is a directed path from the source to the destination node in the hidden graph. We first give bounds for learning graphs on n vertices with k strongly connected components. We then study the case of bounded-degree directed trees and give new algorithms for learning “almost-trees” – directed trees to which extra edges have been added. We also give some lower bound constructions justifying our approach.
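As a minimal, hypothetical illustration of the query model (not the paper's algorithms): a path query is just a reachability test on the hidden graph, and in the special case where the hidden graph is a single directed path, counting how many vertices each vertex can reach recovers the order using n(n-1) queries.

```python
def make_path_query(adj):
    """Oracle over a hidden digraph: query(s, t) = 'is there a path s -> t?'"""
    def query(s, t):
        seen, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u == t:
                return True
            if u not in seen:
                seen.add(u)
                stack.extend(adj.get(u, []))
        return False
    return query

def recover_hidden_path(vertices, query):
    """If the hidden graph is a directed path, the i-th vertex reaches
    exactly n - 1 - i others, so sorting by reach count recovers the order."""
    reach = {v: sum(query(v, w) for w in vertices if w != v) for v in vertices}
    return sorted(vertices, key=lambda v: -reach[v])

hidden = {'a': ['b'], 'b': ['c'], 'c': ['d']}   # the learner never sees this
order = recover_hidden_path(['c', 'a', 'd', 'b'], make_path_query(hidden))
print(order)  # → ['a', 'b', 'c', 'd']
```

General graphs and almost-trees require far more care (and fewer queries than this naive n^2 scan), which is the subject of the paper's bounds.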
Overparameterization improves robustness to covariate shift in high dimensions
ABSTRACT. A significant obstacle in the development of robust machine learning models is covariate shift, a form of distribution shift that occurs when the input distributions of the training and test sets differ while the conditional label distributions remain the same. Despite the prevalence of covariate shift in real-world applications, a theoretical understanding in the context of modern machine learning has remained lacking. We examine the exact high-dimensional asymptotics of random feature regression under covariate shift and present a precise characterization of the limiting test error, bias, and variance in this setting. Our results motivate a natural partial order over covariate shifts that provides a sufficient condition for determining when the shift will harm (or even help) test performance. We find that overparameterized models exhibit enhanced robustness to covariate shift, providing one of the first theoretical explanations for this ubiquitous empirical phenomenon. Additionally, our analysis reveals an exact linear relationship between the in-distribution and out-of-distribution generalization performance, offering an explanation for this surprising recent observation.
A Kernel Analysis of Feature Learning in Deep Neural Networks
ABSTRACT. Deep neural networks learn useful representations of data, yet the nature of these representations has not been fully understood. Here, we empirically study the kernels induced by the layer representations during training by analyzing their kernel alignment to the network's target function. We show that representations from earlier to deeper layers increasingly align with the target task for both training and test sets, implying better generalization. We analyze these representations across different architectures, optimization methods and batch sizes, and bridge our findings to the Neural Collapse (NC) phenomenon. Furthermore, we compare the Neural Tangent Kernel (NTK) of deep neural networks and its alignment with the target during training and find that NTK-target alignment also increases during training.
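The alignment statistic behind this kind of analysis can be sketched in a few lines: compare the Gram (kernel) matrix of a layer's representations against the target kernel via a normalized Frobenius inner product. This is a simplified, uncentered variant for illustration; the talk's exact metric may differ:

```python
def gram(X):
    """Linear-kernel Gram matrix K[i][j] = <x_i, x_j> of row vectors X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def frobenius(A, B):
    """Frobenius inner product <A, B> = sum_ij A[i][j] * B[i][j]."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def kernel_alignment(K, L):
    """Normalized Frobenius inner product; 1 means perfectly aligned kernels."""
    return frobenius(K, L) / (frobenius(K, K) ** 0.5 * frobenius(L, L) ** 0.5)

# Targets as signed labels; the first "representation" induces the same
# kernel as the target, the second an orthogonal one.
Y = [[1.0], [1.0], [-1.0], [-1.0]]
aligned_repr   = [[1.0], [1.0], [-1.0], [-1.0]]
unaligned_repr = [[1.0], [-1.0], [1.0], [-1.0]]

print(kernel_alignment(gram(aligned_repr), gram(Y)))    # → 1.0
print(kernel_alignment(gram(unaligned_repr), gram(Y)))  # → 0.0
```

Tracking this number layer by layer during training is what reveals the increasing alignment with the target task reported in the abstract.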
Natasha Devroye (University of Illinois Chicago, United States) Abhijeet Mulgund (University of Illinois Chicago, United States) Raj Shekhar (University of Illinois Chicago, United States) Gyorgy Turan (University of Illinois Chicago, United States) Yeqi Wei (University of Illinois Chicago, United States) Milos Zefran (University of Illinois Chicago, United States)
Opening the black box: deep-learned error-correcting codes
ABSTRACT. Deep-learned error-correcting codes (DL-ECCs), where encoders and decoders are learned "black-boxes", have been experimentally shown to outperform traditional codes in several scenarios. In order to put these results in context and make DL-ECCs attractive for applications, it is necessary to better understand these codes, and in turn interpret the neural networks implementing them. We summarize our results on interpreting the encoder part of TurboAE, one of the first DL-ECCs, and describe several tools that were used in the process and may be of general interest for interpreting deep-learned components. We also provide further insights into the performance of TurboAE by interpreting its decoder.
17:00
Rafal Goebel (Loyola University of Chicago, United States)
On Conley's decomposition for well-posed hybrid inclusions
ABSTRACT. The so-called Fundamental Theorem of Dynamical Systems, due to Conley and initially stated for flows, states that the dynamics on a compact and invariant set decompose into the chain-recurrent part and the gradient-like part. The talk will discuss the generalization of this result to hybrid (and multivalued) dynamics.