Program for Thursday, September 28th

PROGRAM FOR THURSDAY, SEPTEMBER 28TH

Days:

previous day

next day

all days

View: session overview talk overview

08:30-10:10 Session ThA1: BHRS3 - Learning and Networks III

Organizers - Bruce Hajek and R. Srikant

Chair:

Sanjay Shakkotttai (University of Texas Austin, United States)

Location: Library

08:30	Dileep Kalathil (TAMU, United States) Natural Actor-Critic for Robust Reinforcement Learning ABSTRACT. We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms mainly focus on the tabular setting under uncertainty sets that facilitate robust policy evaluation, but are no longer tractable when the number of states scales up. To this end, we propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric. Both make large-scale robust RL tractable even when one only has access to a simulator. We propose a robust natural actor-critic (RNAC) approach that incorporates the new uncertainty sets and employs function approximation. We provide finite-time convergence guarantees for the proposed RNAC algorithm to the optimal robust policy within the function approximation error. Finally, we demonstrate the robust performance of the policy learned by our proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task.
08:50	Richard Combes (Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes, France, France) Raymond Zhang (Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes, France, France) Curse of Dimension of Thompson Sampling for Combinatorial Bandits ABSTRACT. We consider Thompson Sampling (TS) for combinatorial bandits with Bernoulli rewards and uniform priors, and we show that TS suffers from the curse of dimension in the sense that its problem dependent regret scales exponentially in the ambiant dimension, and that its minimax regret scales close to linearly in time. Both these facts indicate that TS is strictly sub-optimal for the problem at hand, when compared with optimistic algorithms such as ESCB. We further show that simple fixes such as including forced exploration do not alleviate the problem. We further confirm our theoretical predictions with numerical experiments, even on simple examples.
09:10	Jerome Taupin (Ecole Normale Superieure, France) Yassir Jedra (KTH Royal Institute of Technology, Sweden) Alexandre Proutiere (KTH Royal Institute of Technology, Sweden) Best Policy Identification in Linear MDPs ABSTRACT. We consider the problem of best policy identification in discounted Linear Markov Decision Processes in the fixed confidence setting, under both generative and forward models. We derive an instance-specific lower bound on the expected number of samples required to identify an $\varepsilon$-optimal policy with probability $1-\delta$. The lower bound characterizes the optimal sampling rule as the solution of an intricate non-convex optimization program, but can be used as the starting point to devise simple and near-optimal sampling rules and algorithms. We devise such algorithms. In the generative model, our algorithm exhibits a sample complexity upper bounded by ${\cal O}( (d(1-\gamma)^{-4}/ (\varepsilon+\Delta)^2) (\log(1/\delta)+d))$ where $\Delta$ denotes the minimum reward gap of sub-optimal actions and $d$ is the dimension of the feature space. This upper bound holds in the moderate-confidence regime (i.e., for all $\delta$), and matches existing minimax and gap-dependent lower bounds. In the forward model, we determine a condition under which learning approximately optimal policies is possible; this condition is weak and does not require the MDP to be ergodic nor communicating. Under this condition, the sample complexity of our algorithm is asymptotically (as $\delta$ approaches 0) upper bounded by ${\cal O}((\sigma^\star(1-\gamma)^{-4}/{(\varepsilon+\Delta)^2}) (\log(\frac{1}{\delta})))$ where $\sigma^\star$ is an instance-specific constant, value of an optimal experiment-design problem. To derive this bound, we establish novel concentration results for random matrices built on Markovian data.
09:30	Matthew Faw (UT Austin, United States) Litu Rout (UT Austin, United States) Constantine Caramanis (The University of Texas at Austin, United States) Sanjay Shakkottai (UT Austin, United States) Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD PRESENTER: Constantine Caramanis ABSTRACT. A long line of work has established conditions under which SGD converges to a stationary point, along with corresponding rates of convergence. Typically, this requires control of the smoothness of the gradients, as well as the variance of the stochastic oracle. Adaptive algorithms like AdaGrad address the same problem, but do not require a priori knowledge of key regularity parameters of the problem. We consider the class of $(L_0,L_1)$-smooth functions proposed by Zhang et al.\ (ICLR'20). Empirical evidence suggests that these functions more closely capture practical machine learning problems as compared to the pervasive $L_0$-smoothness. To date, analysis of adaptive gradient descent algorithms has required a uniform boundedness assumption on the stochastic oracle (the noise), a condition that is often violated in real-world problems. This is precisely the problem we address. A major challenge in the analysis of adaptive stochastic gradient descent is the correlation between the stochastic gradients and the step-size. Our key innovation is a recursively-defined stopping time. Using this idea, we prove a 1/sqrt{T} rate of convergence, for $(L_0,L_1)$-smooth functions, but without requiring uniformly bounded noise.
09:50	Arunabh Srivastava (University of Maryland, United States) Sennur Ulukus (University of Maryland, United States) Age of Gossip on a Grid ABSTRACT. We consider a gossip network consisting of a source generating updates and $n$ nodes connected in a two-dimensional square grid. The source keeps updates of a process, that might be generated or observed, and shares them with the grid network. The nodes in the grid network communicate with their neighbors and disseminate these version updates using a push-style gossip strategy. We use the version age metric to quantify the timeliness of information at the nodes. We find an upper bound for the average version age for a set of nodes in a general network. Using this, we show that the average version age at a node scales as $O(n^{\frac{1}{3}})$ in a grid network. Prior to our work, it has been known that when $n$ nodes are connected on a ring the version age scales as $O(n^{\frac{1}{2}})$, and when they are connected on a fully-connected graph the version age scales as $O(\log n)$. Ours is the first work to show an age scaling result for a connectivity structure other than the ring and fully-connected networks that represent two extremes of network connectivity. Our work shows that higher connectivity on a grid compared to a ring lowers the age experience of each node from $O(n^{\frac{1}{2}})$ to $O(n^{\frac{1}{3}})$.

08:30-10:10 Session ThA2: RZSF2- Recent Advances in Optimization II

Organizers - Richard Zhang and Salar Fattahi

Location: Solarium

08:30	Eitan Levin (Caltech, United States) Venkat Chandrasekaran (Caltech, United States) Any-dimensional convex sets ABSTRACT. Classical algorithms are defined on inputs of different sizes. In contrast, algorithms learned from data may only be defined on inputs of the same size as the data. What does it mean for an algorithm to be defined on infinitely-many input sizes? How do we describe such algorithms, and how do we parametrize and search over them? In this talk, we tackle these questions for convex optimization-based algorithms. Describing such algorithms reduces to describing convex sets. These, in turn, are often "freely" described, meaning that their description makes instantiation in every dimension obvious. Examples include unit balls of standard regularizers defined on signals and images of any size, graph parameters defined for graphs of any size, and (quantum) information theoretic quantities defined for distributions on any number of (qu)bits. We show that such descriptions arise from a recently-identified phenomenon in algebraic topology called representation stability, which combines symmetry with embeddings and projections relating different-sized problem instances. We use representation stability to formally define free descriptions and obtain parametric families of them adapted to specific applications. Our framework also allows us to prove that many symmetric semidefinite and relative entropy programs can be solved in time independent of dimension. Our proof uses representation stability to unify and strengthen existing results as well as prove new ones.
08:50	Qing Qu (University of Michigan, United States) The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks ABSTRACT. Over the past few years, an extensively studied phenomenon in training deep networks is the implicit bias of gradient descent towards parsimonious solutions. In this work, we investigate this phenomenon by narrowing our focus to deep linear networks. Through our analysis, we reveal a surprising "law of parsimony" in the learning dynamics when the data possesses low-dimensional structures. Specifically, we show that the evolution of gradient descent starting from orthogonal initialization only affects a minimal portion of singular vector spaces across all weight matrices. In other words, the learning process happens only within a small invariant subspace of each weight matrix, despite the fact that all weight parameters are updated throughout training. This simplicity in learning dynamics could have significant implications for both efficient training and a better understanding of deep networks. First, the analysis enables us to considerably improve training efficiency by taking advantage of the low-dimensional structure in learning dynamics. We can construct smaller, equivalent deep linear networks without sacrificing the benefits associated with the wider counterparts. Second, it allows us to better understand deep representation learning by elucidating the linear progressive separation and concentration of representations from shallow to deep layers. We also conduct numerical experiments to support our theoretical results. The code for our experiments can be found at https://github.com/cjyaras/lawofparsimony.
09:10	Christian Kümmerle (University of North Carolina at Charlotte, United States) Dominik Stöger (KU Eichstätt-Ingolstadt, Germany) Weights Matter: Linear & Superlinear Convergence Rates for IRLS in Low-Rank Optimization ABSTRACT. We provide the first convergence analysis for an iteratively reweighted least squares (IRLS) algorithm applied to nuclear norm minimization that involves convergence rates. For MatrixIRLS minimizing the nuclear norm, we show both a global linear convergence rate and local linear convergence rate with dimension-independent convergence factor, which apply for measurements satisfying a null space prop- erty. Furthermore, we show that MatrixIRLS optimizing non-convex Schatten-p objectives, a setting for which widespread global convergence has been observed empirically in matrix completion settings if in a low-data regime, exhibits local superlinear convergence within a convergence radius that is larger than in previous results by order of magnitudes. Unlike for other non-convex algorithms, the basin of attraction does not depend on the condition number of the matrix of interest.
09:30	Marie Maros (Purdue University, United States) Gesualdo Scutari (Purdue University, United States) A Unified View of Decentralized Algorithms for Sparse Linear Regression ABSTRACT. We investigate sparse linear regression in high-dimension via the projected LASSO estimator from distributed datasets across a network of agents. This model enables the ambient dimension to scale exponentially with the total sample size. We develop a unified algorithmic framework that encompasses a variety of distributed algorithms, new and old, such as primal, primal-dual, and gradient tracking-based methods, for which we establish both statistical and computational guarantees. Our findings demonstrate that, under standard data generation models, appropriate network connectivity and algorithm tuning, the studied schemes converge to statistically optimal estimates at a linear rate. However, the dependencies of convergence rate on the ambient dimension exhibit a noteworthy difference, ranging from linear to independent scaling. This fact is a new, interesting phenomenon distinctive of the high-dimensional setting and revealed by this study.
09:50	Amin Karbasi (Yale & Google, United States) Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization ABSTRACT. Algorithmic reproducibility measures the deviation in outputs of machine learning algorithms upon minor changes in the training process. Previous work suggests that first-order methods would need to trade-off convergence rate (gradient complexity) for better reproducibility. In this work, we challenge this perception and demonstrate that both optimal reproducibility and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems under various error-prone oracle settings. Particularly, given the inexact initialization oracle, our regularization-based algorithms achieve the best of both worlds – optimal reproducibility and near-optimal gradient complexity – for minimization and minimax optimization. With the inexact gradient oracle, the near-optimal guarantees also hold for minimax optimization. Additionally, with the stochastic gradient oracle, we show that stochastic gradient descent ascent is optimal in terms of both reproducibility and gradient complexity.

08:30-10:10 Session ThA3: Federated learning

Location: Butternut

08:30	Benedito Fonseca Jr (Northern Illinois University, United States) Fairness in Distributed Sensor Systems to Detect Emitters ABSTRACT. When designing a distributed sensor system to detect the occurrence of an emitter in multiple communities, the designer may face questions related to fairness. To assist designers answer such questions, we adopt a fairness definition based on envy-free fairness: we define a design to be envy-free fair when all communities have the same probability of detecting the emitter. Using an optimization framework considering a constraint in the total rate of false alarms, we show that, when communities are isolated, the optimum envy-free fair system exists; and, for a given sensor allocation, we propose an algorithm to find such a system. We also propose an algorithm to allocate sensors to communities and, through examples, we illustrate that it can produce allocations close to the optimum allocation most of the times.
08:50	Ahmed Ali Abbasi (Iowa State University, United States) Shana Moothedath (Iowa State University, United States) Namrata Vaswani (Iowa State University, United States) AltGDMin for Fast Federated Low Rank Matrix Completion ABSTRACT. We develop a fast, communication-efficient and private algorithm, called Alternating Gradient Descent and Minimization (AltGDMin), for solving the Low Rank Matrix Completion (LRMC) problem. The communication-efficiency and privacy claims assume a vertically federated setting in which each node observes some entries of a different subset of columns of the unknown LR matrix X∗. Using extensive simulation experiments, we demonstrate the superior communication efficiency of AltGDMin. We also numerically evaluate the sample/iteration complexity and show competitive performance compared to benchmark methods.
09:10	Syed Eqbal Alam (Department of Electrical & Computer Engineering, University of New Brunswick, Fredericton, New Brunswick, Canada, Canada) Dhirendra Shukla (Department of Electrical & Computer Engineering, University of New Brunswick, Fredericton, New Brunswick, Canada, Canada) Communication-efficient Federated Multi-resource Allocation: A Non-homogeneous Markov Chain Model PRESENTER: Syed Eqbal Alam ABSTRACT. In many applications in communication networks, smart energy systems, edge computing, smart cities, etcetera, agents wish not to exchange information with other agents in the network and make decisions based on their choices and preferences and local computation. The agents cooperate with a central server to minimize the overall cost to the network. Such settings are called federated setting; it has recently received much interest from the research community. In this paper, we develop a communication-efficient stochastic multi-resource allocation algorithm for federating settings, generalizing the additive increase multiplicative decrease (AIMD) algorithm. Our solution does not require inter-agent communication. We consider a central server that coordinates with the agents to track the aggregate consumption of resources and sends one-bit feedback signals in the multi-agent system when the resource capacity constraints are violated. We model the system as a non-homogeneous Markov chain with place-dependent probabilities. We show the convergence of the average allocations to the optimal values. Furthermore, we present experimental results to demonstrate the algorithms’ efficacy.
09:30	Ankit Pratap Singh (Iowa State University, United States) Namrata Vaswani (Iowa State University, United States) Byzantine-resilient Federated Low Rank Column-wise Compressive Sensing ABSTRACT. In this work we consider the low rank column-wise (compressive) sensing (LRCCS) problem within a horizontally federated framework. We introduce a fast and communication-efficient algorithm for solving this problem, while being provably resilient to Byzantine attacks on all but the initial iterations of the algorithm. We do this by modifying our recently developed GD-based algorithm called AltGDmin (alternating GD and minimization) by replacing the summation/mean step of its GD step by the geometric median (GM).

08:30-10:10 Session ThA4: Information Theory

Organizers - Olgica Milenkovic and Ilan Shomorony

Location: Pine

08:30	Oguzhan Kubilay Ulger (New York University, United States) Elza Erkip (New York University, United States) Single-Shot Lossy Compression for Joint Inference and Reconstruction ABSTRACT. In the classical source coding problem, the compressed source is reconstructed at the decoder with respect to some distortion metric. Motivated by settings in which we are interested in more than simply reconstructing the compressed source, we investigate a single-shot compression problem where the decoder is tasked with reconstructing the original data as well as making inferences from it. Quality of inference and reconstruction is determined by a distortion criteria for each task. Given allowable distortion levels, we are interested in characterizing the probability of excess distortion. Modeling the joint inference and reconstruction problem as direct-indirect source coding one, we obtain lower and upper bounds for excess distortion probability. We specialize the converse bound and present a new easily computable achievability bound for the case where the distortion metric for reconstruction is logarithmic loss.
08:50	Mohammad Zaeri Amirani (University of Hawaii, United States) Anders Host-Madsen (University of Hawaii, United States) Learning Source Coding for General Alphabets and Finite State Machines ABSTRACT. This paper develops bounds for learning lossless source coding under the PAC (probably approximately correct) framework. The paper considers both iid sources and sources generated by finite state machines (FSM).
09:10	Jeremy Johnston (Columbia University, United States) Xiaoyang Liu (Columbia University, United States) Shixun Wu (University of California, Riverside, United States) Xiaodong Wang (Columbia University, United States) A Curriculum Learning Approach to Optimization with Application to Downlink Beamforming ABSTRACT. We explore the potential of neural networks to learn the solutions of a target family of continuous optimization problems, with application to two downlink beamforming optimization scenarios: MISO and MIMO. The learned neural network's forward pass produces near-optimal solutions, thereby affording significant computational savings compared to existing optimization solvers. To learn the optimal solution mapping, we develop two curriculum learning strategies for learning optimizers for beamforming design: First, the reward curriculum defines a sequence of training objective functions of increasing complexity, utilizing the mean-squared error criterion before switching to a more complex performance criterion (e.g., sum rate). Second, the subspace curriculum defines a sequence of training data distributions by restricting the data to linear subspaces of increasing dimension. For the MISO beamforming problem, the learned optimizer achieves near-optimal objective value (sum rate or min rate) across a range of signal-to-noise ratios. In the MIMO scenario, the learned optimizer is on par with and in some cases far exceeds performance of suboptimal beamforming strategies.
09:30	Zhi Li (Dept of Electrical and Computer Engineering, University of Delaware, United States) Riley Kilfoyle (Dept of Electrical and Computer Engineering, Texas A&M University, United States) Zixiang Xiong (Dept of Electrical and Computer Engineering, Texas A&M University, United States) Javier Garcia-Frias (Dept of Electrical and Computer Engineering, University of Delaware, United States) Non-Linear Analog Coding for Image Transmission over Noisy Channels ABSTRACT. We propose a novel analog joint source-channel coding system for image transmission over noisy channels. First, a variable density compressive sensing encoder is applied to the desired image. Then, the resulting compressive measurements are encoded using a recently proposed non-linear analog encoder, which can operate in a rateless fashion and produces continuous amplitude symbols that are directly transmitted over the noisy channel. Due to the non-uniform energy allocation resulting from the compressive sensing encoder, the non-linear encoder has to be carefully designed following the insights provided by an optimization analysis. Simulation results show the potential of the proposed framework, which completely skips the digital domain (bits are never utilized during the encoding process).

10:30-12:10 Session ThB1: Jitendra Malik, Plenary Speaker

Location: Library

10:30-12:10 Session ThB2: Plenary - Spill Over

Location: Solarium

10:30-12:10 Session ThB3: PLENARY - NO TALKS SCHEDULED

Location: Butternut

10:30-12:10 Session ThB4: PLENARY - NO TALKS SCHEDULED

Location: Pine

13:30-15:10 Session ThC1: Learning and Networks IV

Organizers - Bruce Hajek and R. Srikant

Chair:

Richard Sowers (University of Illinois at Urbana Champaign, United States)

Location: Library

13:30	Rudrajit Das (UT Austin, United States) Sujay Sanghavi (The University of Texas at Austin, United States) Understanding Self Distillation ABSTRACT. Self-distillation (SD) is the process of first training a \enquote{teacher} model and then using its predictions to train a \enquote{student} model with the \textit{same} architecture. Specifically, the student's objective function is (ξ∗ℓ(teacher's predictions, student's predictions)+(1−ξ)∗ℓ(given labels, student's predictions)), where ℓ is some loss function and ξ is some parameter ∈[0,1]. Empirically, SD has been observed to provide performance gains in several settings. In this paper, we theoretically characterize the effect of SD in two supervised learning problems with \textit{noisy labels}. We first analyze SD for regularized linear regression and show that in the high label noise regime, the optimal value of ξ that minimizes the expected error in estimating the ground truth parameter is surprisingly greater than 1. Empirically, we show that ξ>1 works better than ξ≤1 even with the cross-entropy loss for several classification datasets when 50\% or 30\% of the labels are corrupted. Further, we quantify when optimal SD is better than optimal regularization. Next, we analyze SD in the case of logistic regression for binary classification with random label corruption and quantify the range of label corruption in which the student outperforms the teacher in terms of accuracy. To our knowledge, this is the first result of its kind for the cross-entropy loss.
13:50	Thodoris Lykouris (Massachusetts Institute of Technology, United States) Efficient decentralized multi-agent learning in asymmetric bipartite queueing systems ABSTRACT. Learning in multi-agent systems often poses significant challenges due to interference between agents. In particular, unlike classical stochastic systems, the performance of an agent's action is not drawn i.i.d. from some distribution but is directly affected by the (unobserved) actions of the other agents. This is the reason why most collaborative multi-agent learning approaches aim to globally coordinate all agents' actions to evade this interference. In this talk, we focus on agents in a decentralized bipartite queueing system, where N agents request service from K servers. Prior decentralized approaches aim to decentrally identify a globally coordinated schedule, which leads to various shortcomings: they are restricted to symmetric systems, have performance that degrades exponentially in the number of servers, require communication through shared randomness, two-way message passing or unique identifiers, or are computationally demanding. In contrast, we provide a low-complexity algorithm that is run decentrally by each agent, avoids the shortcomings of "global coordination", and leads the queueing system to have efficient performance in asymmetric bipartite queuing systems while also having additional robustness properties. Paper information: The paper on which this talk is based is joint work with Daniel Freund and Wentao Weng and can be found here: https://arxiv.org/abs/2206.03324. The paper is accepted for publication in Operations Research, a preliminary version appeared at COLT'22, and Wentao was selected as a finalist in the Applied Probability Society student paper competition for this work.
14:10	Emmanouil-Vasileios Vlatakis-Gkaragkounis (University of California, Berkeley, United States) Angeliki Giannou (University of Wisconsin-Madison, United States) Yudong Chen (University of Wisconsin-Madison, United States) Qiaomin Xie (University of Wisconsin-Madison, United States) Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements ABSTRACT. For min-max optimization and variational inequalities problems (VIP) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgiveness of initial conditions, but their convergence behaviors are more complicated even in rudimentary bilinear models. Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms. By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem, demonstrating that the average iterate is asymptotically normal with a unique invariant distribution for an extensive range of monotone and non-monotone VIPs. Specializing to convex-concave min-max optimization, we characterize the relationship between the step-size and the induced bias with respect to the Von-Neumann's value. Finally, we establish that Richardson-Romberg extrapolation can improve proximity of the average iterate to the global solution for VIPs. Our probabilistic analysis, underpinned by experiments corroborating our theoretical discoveries, harnesses techniques from optimization, Markov chains, and operator theory.
14:30	Harit Vishwakarma (University of Wisconsin-Madison, United States) Heguang Lin (University of Wisconsin-Madison, United States) Ramya Korlakai Vinayak (University of Wisconsin-Madison, United States) Human-in-the-Loop Out-of-Distribution Detection with False Positive Rate Control ABSTRACT. Robustness to Out-of-Distribution (OOD) samples is essential for the successful deployment of machine learning models in the open world. Since it is not possible to have a priori access to a variety of OOD data before deployment, several recent works have focused on designing scoring functions to quantify OOD uncertainty. These methods often find a threshold that achieves $95\%$ true positive rate (TPR) on the In-Distribution (ID) data used for training and uses this threshold for detecting OOD samples. However, this can lead to very high FPR as seen in a comprehensive evaluation in the Open-OOD benchmark, the FPR can range between 60 to 96\% on several ID and OOD dataset combinations. In contrast, practical systems deal with a variety of OOD samples on the fly and critical applications, e.g., medical diagnosis, demanding guaranteed control of the false positive rate (FPR). To meet these challenges, we propose a mathematically grounded framework for human-in-the-loop OOD detection, wherein expert feedback is used to update the threshold. This allows the system to adapt to variations in the OOD data while adhering to the quality constraints. We propose an algorithm that uses any time-valid confidence intervals based on the Law of Iterated Logarithm (LIL). Our theoretical results show that the system meets FPR constraints while minimizing the human feedback for points that are in-distribution. Another key feature of the system is that it can work with any existing post-hoc OOD uncertainty-quantification methods. We evaluate our system empirically on a mixture of benchmark OOD datasets in image classification tasks on CIFAR-10 and CIFAR-100 as in distribution datasets and show that our method can maintain FPR at most $5\%$ while maximizing TPR.
14:50	Arnab Maiti (University of Washington, United States) Kevin Jamieson (University of Washington, United States) Lillian Ratliff (University of Washington, United States) Instance-dependent Sample Complexity Bounds for Zero-sum Matrix Games ABSTRACT. We study the sample complexity of identifying an approximate equilibrium for two-player zero-sum n × 2 matrix games. That is, in a sequence of repeated game plays, how many rounds must the two players play before reaching an approximate equilibrium (e.g., Nash)? We derive instance-dependent bounds that define an ordering over game matrices that captures the intuition that the dynamics of some games converge faster than others. Specifically, we consider a stochastic observation model such that when the two players choose actions i and j, respectively, they both observe each other’s played actions and a stochastic observation Xij such that E[ Xij ] = Aij . To our knowledge, our work is the first case of instance-dependent lower bounds on the number of rounds the players must play before reaching an approximate equilibrium in the sense that the number of rounds depends on the specific properties of the game matrix A as well as the desired accuracy. We also prove a converse statement: there exist player strategies that achieve this lower bound.

13:30-15:10 Session ThC2: ABMR1 - Dynamics of Learning I

Organizers - Ali Belabbas, Мах Raginsky, and Philippe Rigollet

Location: Solarium

13:30	Borjan Geshkovski (MIT, United States) Cyril Letrouit (CNRS, Laboratoire de Mathématiques d'Orsay, France) Yury Polyanskiy (MIT, United States) Philippe Rigollet (MIT, United States) Clustering in transformers: an interacting particle systems perspective PRESENTER: Borjan Geshkovski ABSTRACT. With remarkable empirical success, Transformers enable large language models to compute succinct representations of data using the self-attention mechanism. We model these architectures as interacting particle systems in the spirit of models in collective behaviour and opinion dynamics, allowing us to show the appearance of various clustering phenomena. Based on joint work with Cyril Letrouit, Yury Polyanskiy, and Philippe Rigollet.
13:50	Maxwell Varley (Uni Melbourne, Australia) Timothy Molloy (Australian National University, Australia) Girish Nair (The University of Melbourne, Australia) Pulling Back from Infinity: Asymptotically Optimal Sampling of Infinite-Dimensional Measurements ABSTRACT. Vision-based sensors are widely used in autonomous systems, and yield rich, high-dimensional measurements. Inspired by statistical mechanics, recent work has proposed going to the continuum limit, and treating such measurements as infinite-dimensional objects on a continuous pixel domain. While this can simplify the analysis, for practical implementation the resulting filter must be reduced to a finite-dimensional model with minimal degradation. In this talk, we propose a novel approach for performing this reduction. Assuming linear scalar states with Gaussian process and measurement noise, we seek to sample the infinite-dimensional measurement at N points, in order to minimize the mean-squared filtering error. We show that for large N, this problem can be expressed using the notion of an asymptotic point density function from the field of high-resolution quantization. This leads to a nonlinear Urysohn integral equation, which may be solved numerically to yield an asymptotically optimal N-point filter. The mean-square approximation error decays like N^{-4}, which is faster than the typical N^{-2} decay of high-resolution quantization, suggesting that this method will be efficient even at moderate or small N. These properties are verified by simulations based on a linearized pinhole camera measurement model.
14:10	Yilong Qin (Carnegie Melln University, United States) Andrej Risteski (Carnegie Mellon University, United States) Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Markov Chains ABSTRACT. Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g. Energy-Based Models). The idea is to fit the score of the distribution, rather than the likelihood, thus avoiding the need to evaluate the constant of proportionality. While there's a clear algorithmic benefit, the statistical "cost" can be steep: recent work by Koehler et al. 2022 showed that for distributions that have poor isoperimetric properties (a large Poincaré or log-Sobolev constant), score matching is substantially statistically less efficient than maximum likelihood. However, many natural realistic distributions, e.g. multimodal distributions as simple as a mixture of two Gaussians in one dimension -- have a poor Poincaré constant. In this paper, we show a close connection between the mixing time of an arbitrary Markov process with generator L and an appropriately chosen generalized score matching loss that tries to fit Op/p. If L corresponds to a Markov process corresponding to a continuous version of simulated tempering, we show the corresponding generalized score matching loss is a Gaussian-convolution annealed score matching loss, akin to the one proposed in Song and Ermon 2019. Moreover, we show that if the distribution being learned is a finite mixture of Gaussians in d dimensions with a shared covariance, the sample complexity of annealed score matching is polynomial in the ambient dimension, the diameter the means, and the smallest and largest eigenvalues of the covariance -- obviating the Poincaré constant-based lower bounds of the basic score matching loss shown in Koehler et al. 2022. This is the first result characterizing the benefits of annealing for score matching -- a crucial component in more sophisticated score-based approaches like Song and Ermon 2019.
14:30	Xudong Chen (Washington University in St. Louis, United States) Super-linearization of Polynomial Systems ABSTRACT. In this talk, we investigate finite-dimensional Koopman embeddings of nonlinear dynamical systems, or super-linearizations for short. Roughly speaking, a super-linearization of a system gives rise to a linear dynamical system, evolving in a possibly higher-dimensional state-space, whose trajectories coincide with the trajectories of the original system after projection. The benefits of having a linear model for a nonlinear system are clear; in fact, controller and observer design based on such linearizations, either approximate or exact, has already been widely investigated. While these works start with the assumption of the existence of a linearization, or of a good approximate one via truncation of some infinite-dimensional linearization, we focus in this talk on the existence of exact, finite-dimensional linearization. Amongst others, we provide a sufficient condition for a polynomial dynamical system to be super-linearizable. The condition is expressed in terms of the hereby introduced weighted dependency graph, whose nodes correspond to state-variables and edges have partial derivatives of the polynomial vector field as weights. We demonstrate that if the product of the edge weights along any cycle in the graph is a constant, then the system is super-linearizable. The proof is constructive, and we will provide an algorithm to obtain super-linearizations. Examples will also be provided for illustration.
14:50	Matus Telgarsky (New York University, United States) Benign overfitting, calibration, and the benefits of early stopping ABSTRACT. This paper studies logistic loss minimization in high dimensional ("benign overfitting") cases inspired by interpolation phenomena in neural networks. In contrast with the standard regression setting, where the asymptotic gradient descent solution --- equivalently the ordinary least squares solution --- is the optimal predictor, early stopping is beneficial.

13:30-15:10 Session ThC3: OMRE2 - Machine Learning II

Organizers - Olgica Milenkovic and Lu and Rasoul Etesami

Location: Butternut

13:30	Yuksel Arslantas (Bilkent University, Turkey) Ege Yuceel (Bilkent University, Turkey) Yigit Yalin (Bilkent University, Turkey) Muhammed Sayin (Bilkent University, Turkey) Convergence of Heterogeneous Learning Dynamics in Zero-sum Stochastic Games PRESENTER: Muhammed Sayin ABSTRACT. In this paper, we present new families of algorithms for the repeated play of two-agent (near) zero-sum games and two-agent zero-sum stochastic games. The algorithms can differ in terms of their step sizes, preferences in exploitation vs exploration, access to model knowledge, and access to opponent actions. For example, the family includes fictitious play and its variants as members.Commonly, the algorithms in this family are all uncoupled (i.e., they do not depend on the opponent objective), convergent in homogenous cases (where every agent follow the same algorithm with the same parameters), and rational. Furthermore, the algorithm family is convergent in heterogenous cases, where the dynamics may differ in terms of fast vs slow learning rates, best vs smoothed best responses, belief-based vs payoff-based learning, and model-based vs model-free learning.We prove that any mixture of such asymmetries does not impact the algorithms' convergence to equilibrium (or near equilibrium if there is experimentation) in zero-sum games with repeated play and in zero-sum (irreducible) stochastic games with sufficiently small discount factors.Our findings are of practical interest because agents may have no interest or means to follow the same dynamic with the same parameters in competitive settings. Our work opens the door for a paradigm shift toward provable convergence of heterogeneous learning dynamics in competitive settings to enable their reliable deployment in the wild.
13:50	Thinh Doan (Virginia Tech, United States) A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning ABSTRACT. We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying Markov random processes controlled by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated ``faster'' (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, Polyak-Lojasiewicz (PL) condition, and general non-convexity. We apply our framework to policy optimization problems in control and reinforcement learning. First, we look at the infinite-horizon average-reward Markov decision process with finite state and action spaces and derive a convergence rate of O(k^(-2/5)) for the online actor-critic algorithm under function approximation, which recovers the best-known rate derived specifically for this problem. Second, we study the linear-quadratic regulator and show that an online actor-critic method converges with rate O(k^(-2/3)). Third, we use the actor-critic algorithm to solve the policy optimization problem in an entropy-regularized Markov decision process, where we also establish a convergence of O(k^(-2/3)). The results we derive for both the second and third problems are novel and previously unknown in the literature. Finally, we briefly present the application of our framework to gradient-based policy evaluation algorithms in reinforcement learning.
14:10	Chi Ho Leung (Purdue, United States) William Ebenezer (IIT Kharagpur, India) Ashish Hota (IIT Kharagpur, India) Philip E. Paré (Purdue University, United States) Adaptive Identification from a Support Vector Perspective ABSTRACT. This work addresses the challenge of adaptive identification for non-linear multi-variate dynamical systems. To overcome the requirement of ill-conditioning and lack of persistent excitation in traditional adaptive identification schemes, we propose a novel approach using the $\epsilon$-insensitive norm. The estimated parameters are expressed as a linear combination of support vectors derived from KKT variables, supplemented with information from time-varying a priori parameter constraints. Numerical case studies on networked epidemic systems demonstrate the effectiveness of the proposed algorithm.
14:30	Hao Lou (University of Virginia, United States) Tao Jin (University of Virginia, United States) Yue Wu (UCLA, United States) Pan Xu (Duke, United States) Quanquan Gu (UCLA, United States) Farzad Farnoud (University of Virginia, United States) Active Ranking without Strong Stochastic Transitivity ABSTRACT. In this talk, we consider the problem of recovering the exact full ranking for a list of items under ranking models that do not assume the Strong Stochastic Transitivity property. We propose a δ-correct algorithm, Probe-Rank, that actively learns the ranking from noisy pairwise comparisons. We prove a sample complexity upper bound for Probe-Rank, which only depends on the preference probabilities between items that are adjacent in the true ranking. This improves upon existing sample complexity results that depend on the preference probabilities for all pairs of items. Probe-Rank thus outperforms existing methods over a large collection of instances that do not satisfy Strong Stochastic Transitivity. Thorough numerical experiments in various settings demonstrate that Probe-Rank is significantly more sample-efficient than the state-of-the-art active ranking method.
14:50	Daniel Cullina (Penn State, United States) Characterizing optimal loss for adversarial classification versus multiple adversaries ABSTRACT. Over the last several years, a line of work has characterized the optimal 0-1 loss and log loss for adversarially robust classification. In the case of log loss, there are connections to the graph entropy of a conflict graph on the examples. After discuss this framework for information theoretic bounds, we extend to the setting of multiple adversaries. Here, a fundamental question is: to what extent are there tradeoffs between performance of a single classifier versus each of multiple adversaries? We define Bayesian and minimax multi-adversary settings along with a few other variations and characterize optimal performance in each setting. We compute these optimal loss quantities for several standard benchmark image classification datasets and adversarial models.

13:30-15:10 Session ThC4: BIHU1 - Certification of Neural Networks I

Organizers - Bin Hu and Huan Zhang

Location: Pine

13:30	Lars Lindemann (University of Southern California, United States) Xin Qin (University of Southern California, United States) Jyotirmoy V. Deshmukh (University of Southern California, United States) George J. Pappas (University of Pennsylvania, United States) Conformal Prediction for STL Runtime Verification ABSTRACT. We are interested in predicting failures of learning-enabled systems during their operation. Particularly, we consider stochastic learning-enabled systems and signal temporal logic specifications, and we want to calculate the probability that the current system trajectory will violate the specification. The paper presents two predictive runtime verification algorithms that predict future system states from the current observed system trajectory. As these predictions may not be accurate, we construct prediction regions that quantify prediction uncertainty by using conformal prediction, a statistical tool for uncertainty quantification. Our first algorithm directly constructs a prediction region for the satisfaction measure of the specification so that we can predict specification violations with a desired confidence. The second algorithm constructs prediction regions for future system states first, and uses these to obtain a prediction region for the satisfaction measure. To the best of our knowledge, these are the first formal guarantees for a predictive runtime verification algorithm that applies to widely used trajectory predictors such as RNNs and LSTMs, while being computationally simple and making no assumptions on the underlying distribution. We present numerical experiments of an F-16 aircraft and a self-driving car.
13:50	Peter Seiler (University of Michigan, United States) Analysis of Neural Network Controllers Using Quadratic Constraints ABSTRACT. This talk considers the stability of feedback systems with neural network controllers. The analysis method consists of two key steps. First, we separate the activation functions in the neural network from the (linear) weights and biases. Next, we use quadratic constraints to locally bound the input/output behavior of the activation functions. This approach can be used to build a variety of stability and robustness results for feedback systems. The analysis conditions are formulated as semidefinite programs which can be efficiently solved for "small" neural networks with up to a few hundred neurons. Examples are given to illustrate the proposed approach. This results in this talk build on prior work by (Fazlyab, Morari, Pappas; 2019) and additional details can be found in (Yin, Seiler, Arcak; 2020).
14:10	Kunal Garg (Massachusetts Institute of Technology, United States) Chuchu Fan (Massachusetts Institute of Technology, United States) Neural Network-based Fault Detection and Identification for Quadrotors using Dynamic Symmetry PRESENTER: Kunal Garg ABSTRACT. Autonomous robotic systems, such as quadrotors, are susceptible to actuator faults, and for the safe operation of such systems, timely detection and isolation of these faults is essential. Neural networks can be used for verification of actuator performance via online actuator fault detection with high accuracy. In this paper, we develop a novel model-free fault detection and isolation (FDI) framework for quadrotor systems using long-short-term memory (LSTM) neural network architecture. The proposed framework only uses system output data and the commanded control input and requires no knowledge of the system model. Utilizing the symmetry in quadrotor dynamics, we train the FDI for fault in just one of the motors (e.g., motor $\# 2$), and the trained FDI can predict faults in any of the motors. This reduction in search space enables us to design an FDI for partial fault as well as complete fault scenarios. Numerical experiments illustrate that the proposed NN-FDI correctly verifies the actuator performance and identifies partial as well as complete faults with over $90\%$ prediction accuracy. We also illustrate that model-free NN-FDI performs at par with model-based FDI, and is robust to model uncertainties as well as distribution shifts in input data.
14:30	Saber Jafarpour (Georgia Institute of Technology, United States) Akash Harapanahalli (Georgia Institute of Technology, United States) Samuel Coogan (Georgia Institute of Technology, United States) Interaction-aware interval reachability of neural network controlled systems ABSTRACT. In this talk, we propose a computationally efficient framework for estimating reachable sets of a dynamical system with a neural network controller. Inspired by mixed monotone system theory, we leverage inclusion functions for the open-loop system and the neural network controller to embed the closed-loop system into a larger-dimensional embedding system with favorable control theoretic properties. In particular, we show that the hyper-rectangular over-approximations of the reachable sets can be efficiently computed using a single trajectory of this embedding system. We propose two methods for constructing closed-loop embedding systems, which account for the interactions between the system and the controller in different ways. The interconnection-based approach considers the worst-case evolution of each coordinate separately by substituting the neural network inclusion function into the open-loop inclusion function. The interaction-based approach uses Jacobian-based inclusion functions to capture the first-order interactions between the open-loop system and the controller by leveraging state-of-the-art neural network verifiers. Finally, we demonstrate the efficiency and scalability of our approach on several benchmarks.
14:50	Qi Zhu (Northwestern University, United States) Wenchao Li (Boston University, United States) Chao Huang (University of Liverpool, UK) Xin Chen (University of New Mexico, United States) Weichao Zhou (Boston University, United States) Yixuan Wang (Northwestern University, United States) Jiajun Li (University of Liverpool, UK) Feisi Fu (Boston University, United States) Verification and Design of Robust and Safe Neural Network-enabled Autonomous Systems ABSTRACT. Neural networks are being applied to a wide range of tasks in autonomous systems, such as perception, prediction, planning, control, and general decision making. While they may improve system performance over traditional physical model-based methods, pressing concerns have been raised on the uncertain behaviors of neural networks under varying inputs, especially for safety-critical systems such as autonomous vehicles and robots. In this paper, we will discuss the challenges in ensuring the safety and robustness of neural network-enabled autonomous systems, and present our recent work in addressing these challenges. These include methods for certifying the robustness of neural networks, verifying the safety of neural network-controlled systems, and designing these systems with safety assurance.

15:30-17:30 Session ThD1: Stochastic Methods in Information Systems

Location: Library

15:30	Maryam Keshvari (Wichita State University, United States) Remi Chou (Wichita State University, United States) Distributed Storage Over a Public Channel: Trade-off between Privacy and Shared Key Lengths ABSTRACT. Consider a user who wants to store a file across multiple servers such that at least t servers are needed to reconstruct the file and any z colluding servers cannot learn more than a fraction α of the file. Unlike traditional secret sharing models that assume the availability of secure channels at no cost, we assume that the user can only transmit data to the servers through a public channel and that the user and each server share an individual secret key of length n bits. For a fixed key length n, our objective is to determine the fundamental trade-off between the privacy leakage parameter α and the file length that the user can store in the servers. Furthermore, for the optimal trade-off between α and the file length, we determine (i) the minimum amount of local randomness needed by the user, (ii) the minimum amount of public communication from the user to the servers, and (iii) and the minimum storage requirement at the servers.
15:50	Yinbin Ma (University of Illinois Chicago, United States) Daniela Tuninetti (University of Illinois Chicago, United States) Coded Caching with Linear Coded Placement: Exact Tradeoff for the Three User Case PRESENTER: Yinbin Ma ABSTRACT. This paper studies the optimal memory-load tradeoff in a coded caching system with K = 3 users under the constraint that the contents in the local caches are the result of encoding the files by a linear code. This setting generalizes pastwork that had established the optimal tradeoff under uncodedplacement. Let N be the number of files. For K = N = 3 the optimal tradeoff under linear coded placement is shown to have a corner point in the low memory regime that was unknown before this work, which is actually optimalwithout any restrictions on the placement. For K = 3, N ≥ 4,the optimal tradeoff under linear coded placement is shown to be attained by uncoded placement. As a consequence of this result together with past optimality results, it is an open question whether non-linear coded placement would outperform the tradeoff derived in this work for the memory regime M ∈ (1/2, 1) for N = 3, and M ∈ (0, N/3) for K = 3, N ∈ {4, 5}.
16:10	Viswanathan Ramachandran (KTH Royal Institute of Technology, Sweden) Tobias J. Oechtering (KTH Royal Institute of Technology, Sweden) Mikael Skoglund (KTH Royal Institute of Technology, Sweden) Multi-terminal Strong Coordination over Noiseless Networks with Secrecy Constraints ABSTRACT. We investigate the problem of securely emulating a two-user multiple-access channel (MAC) aided by a multiple-access network of noiseless links as a resource. In this configuration, two encoders observe independent and identically distributed (i.i.d.) samples of a source random variable each, while the decoder receives i.i.d. samples of a side-information random variable. At limited rates, each encoder and the decoder possess independent pairwise shared randomness. The objective is for the decoder to generate approximately i.i.d. samples of another random variable, which is jointly distributed with the two sources and the side information. Furthermore, we require that an external eavesdropper who intercepts the communication and has correlated observations but no access to the shared randomness variables, learns virtually nothing about the input sources and the simulated output sequence. We are interested in the rate tuples which permit such simulation with strong secrecy. We establish a complete characterization for this secure multi-terminal coordination problem when the sources are independent and one of the pairwise shared randomness rates is unconstrained. Moreover, we derive an achievable region and an outer bound for the general case of correlated sources and limited shared randomness rates.
16:30	Barron Han (Caltech, United States) Oron Sabag (The Hebrew University of Jerusalem, Israel) Victoria Kostina (Caltech, United States) Babak Hassibi (Caltech, United States) Coded Kalman Filtering Over Gaussian Channels with Feedback PRESENTER: Barron Han ABSTRACT. This paper investigates the problem of zero-delay joint source-channel coding of a vector Gauss-Markov source over a multiple-input mulitple-output (MIMO) additive white Gaussian noise (AWGN) channel with feedback. In contrast to the classical problem of causal estimation using noisy observations, we examine a system where the source can be encoded before transmission. An encoder, equipped with feedback of past channel outputs, observes the source state and encodes the information in a causal manner as inputs to the channel while adhering to a power constraint. The objective of the code is to estimate the source state with minimum mean square error at the infinite horizon. This work shows a fundamental theorem for two scenarios: for the transmission of an unstable vector Gauss-Markov source over either a multiple-input single-output (MISO) or a single-input multiple-output (SIMO) AWGN channel, finite estimation error is achievable if and only if the sum of the unstable eigenvalues of the state gain matrix is less than the Shannon channel capacity. We prove these results by showing an optimal linear innovations encoder that can be applied to sources and channels of any dimension and analyzing it together with the corresponding Kalman filter decoder.
16:50	Xinyi Tong (Tsinghua University, China) Shao-Lun Huang (Tsinghua Shenzhen International Graduate School, China) Optimal Error Exponent of Type-Based Distributed Hypothesis Testing over AWGN Channels PRESENTER: Shao-Lun Huang ABSTRACT. This paper investigates the distributed hypothesis testing (DHT) problem over AWGN channels, where the distributed nodes are restricted to transmit functions of the empirical distributions of the observed data sequences due to practical computational constraints. The goal of this problem is to design the transmitted signals of the distributed nodes, so that the decision center can make a decision based on the received signals with the maximal decision error exponent. In order to solve this problem, we first propose a mixture coding strategy for each node, which combines the decode-and-forward and amplify-and-forward strategies, and then the achievable error exponent is computed. Moreover, we present a genie-aided approach to prove the converse result, thereby demonstrating the optimality of the achievable error exponent. In summary, our results provide coding guidance for distributed systems, and could be applied to more complicated problems.

15:30-17:10 Session ThD2: YLMD1 - Learning and Estimation in High Dimensions

Location: Solarium

15:30	Yiqiao Zhong (University of Wisconsin--Madison, United States) Andrea Montanari (Stanford University, United States) Why interpolating neural nets generalize well: recent insights from neural tangent model ABSTRACT. A mystery of modern neural networks is their surprising generalization power in overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if actual labels are replaced by purely random ones; despite this, they achieve good prediction error on unseen data. In this talk, we focus on the neural tangent (NT) model for two-layer neural networks, which is a simplified model. Under the isotropic input data, we first show that interpolation phase transition is around Nd ~ n, where Nd is the number of parameters and n is the sample size. To demystify the generalization puzzle, we consider the min-norm interpolator and show that its test error/generalization error is largely determined by a smooth, low-degree component. Moreover, we find that nonlinearity of the activation function has an implicit regularization effect. These results offer new insights to recent discoveries in overparametrized models such as double descent phenomena.
15:50	Weijie Su (University of Pennsylvania, United States) A Law of Data Separation in Deep Learning ABSTRACT. While deep learning has enabled significant advances in many areas of science, its black-box nature hinders architecture design for future artificial intelligence applications and interpretation for high-stakes decision makings. We addressed this issue by studying the fundamental question of how deep neural networks process data in the intermediate layers. Our finding is a simple and quantitative law that governs how deep neural networks separate data according to class membership throughout all layers for classification. This law shows that each layer improves data separation at a constant geometric rate, and its emergence is observed in a collection of network architectures and datasets during training. This law offers practical guidelines for designing architectures, improving model robustness and out-of-sample performance, as well as interpreting the predictions.
16:10	Ahmed El Alaoui (Cornell University, United States) Stochastic localization and algorithmic variants ABSTRACT. I will present a general method for sampling from a probability distribution. The method, not based on a Markov chain, is derived from the stochastic localization process of Eldan (2013), and proceeds by iterating two steps: (1) compute the mean vector m_t of a measure obtained by tilting the target measure in a direction y_t, and (2) update the tilting direction y_t using m_t and extra Gaussian noise. This method, when viewed in continuous time, turns out to be equivalent to recent diffusion-based sampling methods, much popular for generating realistic images. I will review the main properties of the method and show how it can be applied using one or two examples.
16:30	Galen Reeves (Duke University, United States) Henry Pfister (Duke University, United States) Achieving Capacity on Non-Binary Channels with Generalized Reed–Muller Codes PRESENTER: Henry Pfister ABSTRACT. Recently, the authors showed that Reed-Muller (RM) codes achieve capacity on binary memoryless symmetric (BMS) channels with respect to bit error rate. Here, we extend that work by showing that RM codes defined on non-binary fields, known as generalized RM codes, achieve capacity on sufficiently symmetric non-binary channels with respect to symbol error rate. The new proof also simplifies the previous approach (for BMS channels) in a variety of ways that may be of independent interest. Connections to other high-dimensional estimation problems, such as compressed sensing, are also discussed.
16:50	Charles Bouman (Purdue University, United States) Gregery Buzzard (Purdue University, United States) Generative Plug and Play: Posterior Sampling for Inverse Problems ABSTRACT. Over the past decade, Plug-and-Play (PnP) has become a popular method for reconstructing images using a modular framework consisting of a forward and prior model. The great strength of PnP is that an image denoiser can be used as a prior model while the forward model can be implemented using more traditional physics-based approaches. However, a limitation of PnP is that it reconstructs only a single deterministic image. In this paper, we introduce Generative Plug-and-Play (GPnP), a generalization of PnP to sample from the posterior distribution. As with PnP, GPnP has a modular framework using a physics-based forward model and an image denoising prior model. However, in GPnP these models are extended to become proximal generators, which sample from associated distributions. GPnP applies these proximal generators in alternation to produce samples from the posterior. We present experimental simulations using the well-known BM3D denoiser. Our results demonstrate that the GPnP method is robust, easy to implement, and produces intuitively reasonable samples from the posterior for sparse interpolation and tomographic reconstruction.

15:30-17:30 Session ThD3: Compressed Sensing and Blockmodels

Location: Butternut

15:30	Silpa Babu (Iowa State University, United States) Namrata Vaswani (Iowa State University, United States) A Fast Algorithm for Low Rank + Sparse column-wise Compressive Sensing ABSTRACT. This paper focuses studies the following low rank plus sparse column-wise compressive sensing problem. We aim to recover a matrix from undersampled independent linear projections of each of its columns. We assume that the unknown matrix can be decomposed as a sum of low rank matrix and a sparse matrix , where rank of the low rank matrix is given by r satisfies r << min(n, q) and S is a sparse matrix. Each column of S contains rho non-zero entries. To address this recovery problem, we propose a novel fast GD-based solution called AltGDmin-LR+S, which is memory and communication efficient. We numerically evaluate its performance by conducting a detailed simulation based study.
15:50	Muralikrishnna Guruswamy Sethuraman (Georgia Institute of Technology, United States) Hang Zhang (Amazon, United States) Faramarz Fekri (Georgia Institute of Technology, United States) A Density Evolution Framework for Recovery of Covariance and Causal Graphs from Compressed Measurements PRESENTER: Muralikrishnna Guruswamy Sethuraman ABSTRACT. In this paper, we propose a general framework for designing sensing matrix $A \in R^{d\times p}$, for estimation of sparse covariance matrix from compressed measurements of the form $y = Ax + n$, where $y, n \in R^d$, and $x \in R^p$. By viewing covariance recovery as inference over factor graphs via message passing algorithm, ideas from coding theory, such as \textit{Density Evolution} (DE), are leveraged to construct a framework for the design of the sensing matrix. The proposed framework can handle both (1) regular sensing, i.e., equal importance is given to all entries of the covariance, and (2) preferential sensing, i.e., higher importance is given to a part of the covariance matrix. Through experiments, we show that the sensing matrix designed via density evolution can match the state-of-the-art for covariance recovery in the regular sensing paradigm and attain improved performance in the preferential sensing regime. Additionally, we study the feasibility of causal graph structure recovery using the estimated covariance matrix obtained from the compressed measurements.
16:10	Mengyu Zhao (Rutgers University, United States) Shirin Jalali (Rutgers University, United States) Theoretical Analysis of Binary Masks in Snapshot Compressive Imaging Systems PRESENTER: Mengyu Zhao ABSTRACT. Snapshot compressive imaging (SCI) systems have gained significant attention in recent years. While previous theoretical studies have primarily focused on the performance analysis of Gaussian masks, practical SCI systems often employ binary-valued masks. Furthermore, recent research has demonstrated that optimized binary masks can significantly enhance system performance. In this paper, we present a comprehensive theoretical characterization of binary masks and their impact on SCI system performance. Initially, we investigate the scenario where the masks are binary and independently identically distributed (iid), revealing a noteworthy finding that aligns with prior numerical results. Specifically, we show that the optimal probability of non-zero elements in the masks is smaller than 0.5. This result provides valuable insights into the design and optimization of binary masks for SCI systems, facilitating further advancements in the field. Additionally, we extend our analysis to characterize the performance of SCI systems where the mask entries are not independent but are generated based on a stationary first-order Markov process. Overall, our theoretical framework offers a comprehensive understanding of the performance implications associated with binary masks in SCI systems.
16:30	Simina Brânzei (Purdue University, United States) Nithish Kumar (Purdue University, United States) Gireeja Ranade (UC Berkeley, United States) Phase Transitions for Diversity in Stochastic Block Model Dynamics ABSTRACT. This paper proposes a stochastic block model with dynamics where the population grows using preferential attachment. Nodes with higher weighted degree are more likely to recruit nodes from their own community. We focus on the case of two communities. This model can capture how communities grow or shrink based on their collaborations with other nodes in the network. We derive a deterministic approximation to the dynamics and characterize the phase transitions for diversity, i.e. the parameter regimes in which either one of the communities dies out over time or the two communities reach parity. We show the minority may vanish when the probability of cross-community edges is low, even when cross-community projects are more valuable than monochrome projects (i.e. projects with collaborators from the same community).
16:50	Joonhyuk Yang (KAIST(Korea Advanced Institute of Science and Technology), South Korea) Hye Won Chung (KAIST(Korea Advanced Institute of Science and Technology), South Korea) Graph Matching in Correlated Stochastic Block Models for Improved Graph Clustering ABSTRACT. We consider the community detection from multiple correlated graphs sharing the same community structure. The correlated graphs are generated by independent subsampling of a parent graph sampled from the stochastic block model. The vertex correspondence between the correlated graphs is assumed to be unknown. We consider the two-step procedure where the vertex correspondence between the correlated graphs is first revealed, and the communities are recovered from the union of the correlated graphs, which becomes denser than each single graph. We derive the information-theoretic limits for exact graph matching in general density regimes and the number of communities, and then analyze the regime of graph parameters, where one can benefit from the matching of the correlated graphs in recovering the latent community structure of the graphs.

15:30-16:30 Session ThD4: Stochastic Systems

Location: Pine

15:30	Ulysses Lee (UCSD, United States) Subspace Methods for Blind Fractionally-Spaced Equalization of Distributed Phase-Shift Beamforming Signals ABSTRACT. This work investigates different subspace-based approaches for the equalization of signals transmitted from phase shift Distributed Beamforming systems. While phase shifters have been shown to be capable of enabling distributed beamforming by providing carrier coherence, it does not address mismatches in symbol timing, resulting in potentially significant levels of intersymbol interference. Using fractionally spaced receivers and the singular value decomposition of the channel matrix, we design linear and nonlinear equalization methods to combat the distortion induced by this symbol timing mismatch. We show that when linear equalization is not practical, a Causal Minimum Noise Decision Feedback Equalizer would provide us with better symbol error performance when operating with high signal-to-noise ratio, and we provide a lower bound on the error rate of this proposed equalization method.
15:50	Shuai Huang (Emory University, United States) Deqiang Qiu (Emory University, United States) Rotation-Invariant Correlation Features for 2D Sparse Unknown View Tomography ABSTRACT. 2D sparse unknown view tomography aims to reconstruct a set of point sources from its random projections with unknown viewing directions. We propose a set of rich rotation-invariant correlation features that can be extracted from the projections, which allows us to reconstruct Gaussian point-source signals without estimating the viewing directions. We show that the proposed correlation feature is a quadratic function of the signal, and formulate the reconstruction problem as solving a system of quadratic equations with linear and nonnegative constraints. Simulation experiments show that the proposed approach is robust to noise and successfully recovers the point set with Gaussian sources.
16:10	Jamie Haddock (Harvey Mudd College, United States) Anna Ma (University of California, Irvine, United States) Elizaveta Rebrova (Princeton University, United States) On Subsampled Quantile Randomized Kaczmarz PRESENTER: Jamie Haddock ABSTRACT. When solving noisy linear systems Ax = b + c, the theoretical and empirical performance of stochastic iterative methods, such as the Randomized Kaczmarz algorithm, depends on the noise level. However, if there are a small number of highly corrupt measurements, one can instead use quantile- based methods to guarantee convergence to the solution x of the system, despite the presence of noise. Such methods require the computation of the entire residual vector, which may not be desirable or even feasible in some cases. In this work, we analyze the sub-sampled quantile Randomized Kaczmarz (sQRK) algorithm for solving large-scale linear systems which utilize a sub-sampled residual to approximate the quantile threshold. We prove that this method converges to the unique solution to the linear system and provide numerical experiments that support our theoretical findings. We additionally remark on the extremely small sample size case and demonstrate the importance of interplay between the choice of quantile and subset size.
16:30	Anju Anand (Binghamton University, United States) Emrah Akyol (Binghamton University, United States) Strategic Quantization of a Noisy Source ABSTRACT. This paper is concerned with strategic quantization of a noisy source where the encoder, which observes the remote source through a noisy channel, and the decoder, with a distortion defined over the remote source, have misaligned objectives. Such scenarios constitute a special class of games called information design in Economics. Here, we approach them with an Engineering lens by adding a constraint on the message space in the problem setting. This is an extension of our recent work on strategic quantization of a noiseless source. Finally, we present the numerical results that validate the analysis in this paper. The codes are available at: https://tinyurl.com/allerton2023.