View: session overviewtalk overview
Organizers - Bruce Hajek and R. Srikant
08:30 | Natural Actor-Critic for Robust Reinforcement Learning ABSTRACT. We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms mainly focus on the tabular setting under uncertainty sets that facilitate robust policy evaluation, but are no longer tractable when the number of states scales up. To this end, we propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric. Both make large-scale robust RL tractable even when one only has access to a simulator. We propose a robust natural actor-critic (RNAC) approach that incorporates the new uncertainty sets and employs function approximation. We provide finite-time convergence guarantees for the proposed RNAC algorithm to the optimal robust policy within the function approximation error. Finally, we demonstrate the robust performance of the policy learned by our proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task. |
08:50 | Curse of Dimension of Thompson Sampling for Combinatorial Bandits ABSTRACT. We consider Thompson Sampling (TS) for combinatorial bandits with Bernoulli rewards and uniform priors, and we show that TS suffers from the curse of dimension in the sense that its problem dependent regret scales exponentially in the ambiant dimension, and that its minimax regret scales close to linearly in time. Both these facts indicate that TS is strictly sub-optimal for the problem at hand, when compared with optimistic algorithms such as ESCB. We further show that simple fixes such as including forced exploration do not alleviate the problem. We further confirm our theoretical predictions with numerical experiments, even on simple examples. |
09:10 | Best Policy Identification in Linear MDPs ABSTRACT. We consider the problem of best policy identification in discounted Linear Markov Decision Processes in the fixed confidence setting, under both generative and forward models. We derive an instance-specific lower bound on the expected number of samples required to identify an $\varepsilon$-optimal policy with probability $1-\delta$. The lower bound characterizes the optimal sampling rule as the solution of an intricate non-convex optimization program, but can be used as the starting point to devise simple and near-optimal sampling rules and algorithms. We devise such algorithms. In the generative model, our algorithm exhibits a sample complexity upper bounded by ${\cal O}( (d(1-\gamma)^{-4}/ (\varepsilon+\Delta)^2) (\log(1/\delta)+d))$ where $\Delta$ denotes the minimum reward gap of sub-optimal actions and $d$ is the dimension of the feature space. This upper bound holds in the moderate-confidence regime (i.e., for all $\delta$), and matches existing minimax and gap-dependent lower bounds. In the forward model, we determine a condition under which learning approximately optimal policies is possible; this condition is weak and does not require the MDP to be ergodic nor communicating. Under this condition, the sample complexity of our algorithm is asymptotically (as $\delta$ approaches 0) upper bounded by ${\cal O}((\sigma^\star(1-\gamma)^{-4}/{(\varepsilon+\Delta)^2}) (\log(\frac{1}{\delta})))$ where $\sigma^\star$ is an instance-specific constant, value of an optimal experiment-design problem. To derive this bound, we establish novel concentration results for random matrices built on Markovian data. |
09:30 | Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD PRESENTER: Constantine Caramanis ABSTRACT. A long line of work has established conditions under which SGD converges to a stationary point, along with corresponding rates of convergence. Typically, this requires control of the smoothness of the gradients, as well as the variance of the stochastic oracle. Adaptive algorithms like AdaGrad address the same problem, but do not require a priori knowledge of key regularity parameters of the problem. We consider the class of $(L_0,L_1)$-smooth functions proposed by Zhang et al.\ (ICLR'20). Empirical evidence suggests that these functions more closely capture practical machine learning problems as compared to the pervasive $L_0$-smoothness. To date, analysis of adaptive gradient descent algorithms has required a uniform boundedness assumption on the stochastic oracle (the noise), a condition that is often violated in real-world problems. This is precisely the problem we address. A major challenge in the analysis of adaptive stochastic gradient descent is the correlation between the stochastic gradients and the step-size. Our key innovation is a recursively-defined stopping time. Using this idea, we prove a 1/sqrt{T} rate of convergence, for $(L_0,L_1)$-smooth functions, but without requiring uniformly bounded noise. |
09:50 | Age of Gossip on a Grid ABSTRACT. We consider a gossip network consisting of a source generating updates and $n$ nodes connected in a two-dimensional square grid. The source keeps updates of a process, that might be generated or observed, and shares them with the grid network. The nodes in the grid network communicate with their neighbors and disseminate these version updates using a push-style gossip strategy. We use the version age metric to quantify the timeliness of information at the nodes. We find an upper bound for the average version age for a set of nodes in a general network. Using this, we show that the average version age at a node scales as $O(n^{\frac{1}{3}})$ in a grid network. Prior to our work, it has been known that when $n$ nodes are connected on a ring the version age scales as $O(n^{\frac{1}{2}})$, and when they are connected on a fully-connected graph the version age scales as $O(\log n)$. Ours is the first work to show an age scaling result for a connectivity structure other than the ring and fully-connected networks that represent two extremes of network connectivity. Our work shows that higher connectivity on a grid compared to a ring lowers the age experience of each node from $O(n^{\frac{1}{2}})$ to $O(n^{\frac{1}{3}})$. |
Organizers - Richard Zhang and Salar Fattahi
08:30 | Fairness in Distributed Sensor Systems to Detect Emitters ABSTRACT. When designing a distributed sensor system to detect the occurrence of an emitter in multiple communities, the designer may face questions related to fairness. To assist designers answer such questions, we adopt a fairness definition based on envy-free fairness: we define a design to be envy-free fair when all communities have the same probability of detecting the emitter. Using an optimization framework considering a constraint in the total rate of false alarms, we show that, when communities are isolated, the optimum envy-free fair system exists; and, for a given sensor allocation, we propose an algorithm to find such a system. We also propose an algorithm to allocate sensors to communities and, through examples, we illustrate that it can produce allocations close to the optimum allocation most of the times. |
08:50 | AltGDMin for Fast Federated Low Rank Matrix Completion ABSTRACT. We develop a fast, communication-efficient and private algorithm, called Alternating Gradient Descent and Minimization (AltGDMin), for solving the Low Rank Matrix Completion (LRMC) problem. The communication-efficiency and privacy claims assume a vertically federated setting in which each node observes some entries of a different subset of columns of the unknown LR matrix X∗. Using extensive simulation experiments, we demonstrate the superior communication efficiency of AltGDMin. We also numerically evaluate the sample/iteration complexity and show competitive performance compared to benchmark methods. |
09:10 | Communication-efficient Federated Multi-resource Allocation: A Non-homogeneous Markov Chain Model PRESENTER: Syed Eqbal Alam ABSTRACT. In many applications in communication networks, smart energy systems, edge computing, smart cities, etcetera, agents wish not to exchange information with other agents in the network and make decisions based on their choices and preferences and local computation. The agents cooperate with a central server to minimize the overall cost to the network. Such settings are called federated setting; it has recently received much interest from the research community. In this paper, we develop a communication-efficient stochastic multi-resource allocation algorithm for federating settings, generalizing the additive increase multiplicative decrease (AIMD) algorithm. Our solution does not require inter-agent communication. We consider a central server that coordinates with the agents to track the aggregate consumption of resources and sends one-bit feedback signals in the multi-agent system when the resource capacity constraints are violated. We model the system as a non-homogeneous Markov chain with place-dependent probabilities. We show the convergence of the average allocations to the optimal values. Furthermore, we present experimental results to demonstrate the algorithms’ efficacy. |
09:30 | Byzantine-resilient Federated Low Rank Column-wise Compressive Sensing ABSTRACT. In this work we consider the low rank column-wise (compressive) sensing (LRCCS) problem within a horizontally federated framework. We introduce a fast and communication-efficient algorithm for solving this problem, while being provably resilient to Byzantine attacks on all but the initial iterations of the algorithm. We do this by modifying our recently developed GD-based algorithm called AltGDmin (alternating GD and minimization) by replacing the summation/mean step of its GD step by the geometric median (GM). |
Organizers - Olgica Milenkovic and Ilan Shomorony
Organizers - Bruce Hajek and R. Srikant
Organizers - Ali Belabbas, Мах Raginsky, and Philippe Rigollet
13:30 | Clustering in transformers: an interacting particle systems perspective PRESENTER: Borjan Geshkovski ABSTRACT. With remarkable empirical success, Transformers enable large language models to compute succinct representations of data using the self-attention mechanism. We model these architectures as interacting particle systems in the spirit of models in collective behaviour and opinion dynamics, allowing us to show the appearance of various clustering phenomena. Based on joint work with Cyril Letrouit, Yury Polyanskiy, and Philippe Rigollet. |
13:50 | Pulling Back from Infinity: Asymptotically Optimal Sampling of Infinite-Dimensional Measurements ABSTRACT. Vision-based sensors are widely used in autonomous systems, and yield rich, high-dimensional measurements. Inspired by statistical mechanics, recent work has proposed going to the continuum limit, and treating such measurements as infinite-dimensional objects on a continuous pixel domain. While this can simplify the analysis, for practical implementation the resulting filter must be reduced to a finite-dimensional model with minimal degradation. In this talk, we propose a novel approach for performing this reduction. Assuming linear scalar states with Gaussian process and measurement noise, we seek to sample the infinite-dimensional measurement at N points, in order to minimize the mean-squared filtering error. We show that for large N, this problem can be expressed using the notion of an asymptotic point density function from the field of high-resolution quantization. This leads to a nonlinear Urysohn integral equation, which may be solved numerically to yield an asymptotically optimal N-point filter. The mean-square approximation error decays like N^{-4}, which is faster than the typical N^{-2} decay of high-resolution quantization, suggesting that this method will be efficient even at moderate or small N. These properties are verified by simulations based on a linearized pinhole camera measurement model. |
14:10 | Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Markov Chains ABSTRACT. Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g. Energy-Based Models). The idea is to fit the score of the distribution, rather than the likelihood, thus avoiding the need to evaluate the constant of proportionality. While there's a clear algorithmic benefit, the statistical "cost" can be steep: recent work by Koehler et al. 2022 showed that for distributions that have poor isoperimetric properties (a large Poincaré or log-Sobolev constant), score matching is substantially statistically less efficient than maximum likelihood. However, many natural realistic distributions, e.g. multimodal distributions as simple as a mixture of two Gaussians in one dimension -- have a poor Poincaré constant. In this paper, we show a close connection between the mixing time of an arbitrary Markov process with generator L and an appropriately chosen generalized score matching loss that tries to fit Op/p. If L corresponds to a Markov process corresponding to a continuous version of simulated tempering, we show the corresponding generalized score matching loss is a Gaussian-convolution annealed score matching loss, akin to the one proposed in Song and Ermon 2019. Moreover, we show that if the distribution being learned is a finite mixture of Gaussians in d dimensions with a shared covariance, the sample complexity of annealed score matching is polynomial in the ambient dimension, the diameter the means, and the smallest and largest eigenvalues of the covariance -- obviating the Poincaré constant-based lower bounds of the basic score matching loss shown in Koehler et al. 2022. This is the first result characterizing the benefits of annealing for score matching -- a crucial component in more sophisticated score-based approaches like Song and Ermon 2019. |
14:30 | Super-linearization of Polynomial Systems ABSTRACT. In this talk, we investigate finite-dimensional Koopman embeddings of nonlinear dynamical systems, or super-linearizations for short. Roughly speaking, a super-linearization of a system gives rise to a linear dynamical system, evolving in a possibly higher-dimensional state-space, whose trajectories coincide with the trajectories of the original system after projection. The benefits of having a linear model for a nonlinear system are clear; in fact, controller and observer design based on such linearizations, either approximate or exact, has already been widely investigated. While these works start with the assumption of the existence of a linearization, or of a good approximate one via truncation of some infinite-dimensional linearization, we focus in this talk on the existence of exact, finite-dimensional linearization. Amongst others, we provide a sufficient condition for a polynomial dynamical system to be super-linearizable. The condition is expressed in terms of the hereby introduced weighted dependency graph, whose nodes correspond to state-variables and edges have partial derivatives of the polynomial vector field as weights. We demonstrate that if the product of the edge weights along any cycle in the graph is a constant, then the system is super-linearizable. The proof is constructive, and we will provide an algorithm to obtain super-linearizations. Examples will also be provided for illustration. |
14:50 | Benign overfitting, calibration, and the benefits of early stopping ABSTRACT. This paper studies logistic loss minimization in high dimensional ("benign overfitting") cases inspired by interpolation phenomena in neural networks. In contrast with the standard regression setting, where the asymptotic gradient descent solution --- equivalently the ordinary least squares solution --- is the optimal predictor, early stopping is beneficial. |
Organizers - Olgica Milenkovic and Lu and Rasoul Etesami
13:30 | Convergence of Heterogeneous Learning Dynamics in Zero-sum Stochastic Games PRESENTER: Muhammed Sayin ABSTRACT. In this paper, we present new families of algorithms for the repeated play of two-agent (near) zero-sum games and two-agent zero-sum stochastic games. The algorithms can differ in terms of their step sizes, preferences in exploitation vs exploration, access to model knowledge, and access to opponent actions. For example, the family includes fictitious play and its variants as members.Commonly, the algorithms in this family are all uncoupled (i.e., they do not depend on the opponent objective), convergent in homogenous cases (where every agent follow the same algorithm with the same parameters), and rational. Furthermore, the algorithm family is convergent in heterogenous cases, where the dynamics may differ in terms of fast vs slow learning rates, best vs smoothed best responses, belief-based vs payoff-based learning, and model-based vs model-free learning.We prove that any mixture of such asymmetries does not impact the algorithms' convergence to equilibrium (or near equilibrium if there is experimentation) in zero-sum games with repeated play and in zero-sum (irreducible) stochastic games with sufficiently small discount factors.Our findings are of practical interest because agents may have no interest or means to follow the same dynamic with the same parameters in competitive settings. Our work opens the door for a paradigm shift toward provable convergence of heterogeneous learning dynamics in competitive settings to enable their reliable deployment in the wild. |
13:50 | A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning ABSTRACT. We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying Markov random processes controlled by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated ``faster'' (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, Polyak-Lojasiewicz (PL) condition, and general non-convexity. We apply our framework to policy optimization problems in control and reinforcement learning. First, we look at the infinite-horizon average-reward Markov decision process with finite state and action spaces and derive a convergence rate of O(k^(-2/5)) for the online actor-critic algorithm under function approximation, which recovers the best-known rate derived specifically for this problem. Second, we study the linear-quadratic regulator and show that an online actor-critic method converges with rate O(k^(-2/3)). Third, we use the actor-critic algorithm to solve the policy optimization problem in an entropy-regularized Markov decision process, where we also establish a convergence of O(k^(-2/3)). The results we derive for both the second and third problems are novel and previously unknown in the literature. Finally, we briefly present the application of our framework to gradient-based policy evaluation algorithms in reinforcement learning. |
14:10 | Adaptive Identification from a Support Vector Perspective ABSTRACT. This work addresses the challenge of adaptive identification for non-linear multi-variate dynamical systems. To overcome the requirement of ill-conditioning and lack of persistent excitation in traditional adaptive identification schemes, we propose a novel approach using the $\epsilon$-insensitive norm. The estimated parameters are expressed as a linear combination of support vectors derived from KKT variables, supplemented with information from time-varying a priori parameter constraints. Numerical case studies on networked epidemic systems demonstrate the effectiveness of the proposed algorithm. |
14:30 | Active Ranking without Strong Stochastic Transitivity ABSTRACT. In this talk, we consider the problem of recovering the exact full ranking for a list of items under ranking models that do not assume the Strong Stochastic Transitivity property. We propose a δ-correct algorithm, Probe-Rank, that actively learns the ranking from noisy pairwise comparisons. We prove a sample complexity upper bound for Probe-Rank, which only depends on the preference probabilities between items that are adjacent in the true ranking. This improves upon existing sample complexity results that depend on the preference probabilities for all pairs of items. Probe-Rank thus outperforms existing methods over a large collection of instances that do not satisfy Strong Stochastic Transitivity. Thorough numerical experiments in various settings demonstrate that Probe-Rank is significantly more sample-efficient than the state-of-the-art active ranking method. |
14:50 | Characterizing optimal loss for adversarial classification versus multiple adversaries ABSTRACT. Over the last several years, a line of work has characterized the optimal 0-1 loss and log loss for adversarially robust classification. In the case of log loss, there are connections to the graph entropy of a conflict graph on the examples. After discuss this framework for information theoretic bounds, we extend to the setting of multiple adversaries. Here, a fundamental question is: to what extent are there tradeoffs between performance of a single classifier versus each of multiple adversaries? We define Bayesian and minimax multi-adversary settings along with a few other variations and characterize optimal performance in each setting. We compute these optimal loss quantities for several standard benchmark image classification datasets and adversarial models. |
Organizers - Bin Hu and Huan Zhang
13:30 | Conformal Prediction for STL Runtime Verification ABSTRACT. We are interested in predicting failures of learning-enabled systems during their operation. Particularly, we consider stochastic learning-enabled systems and signal temporal logic specifications, and we want to calculate the probability that the current system trajectory will violate the specification. The paper presents two predictive runtime verification algorithms that predict future system states from the current observed system trajectory. As these predictions may not be accurate, we construct prediction regions that quantify prediction uncertainty by using conformal prediction, a statistical tool for uncertainty quantification. Our first algorithm directly constructs a prediction region for the satisfaction measure of the specification so that we can predict specification violations with a desired confidence. The second algorithm constructs prediction regions for future system states first, and uses these to obtain a prediction region for the satisfaction measure. To the best of our knowledge, these are the first formal guarantees for a predictive runtime verification algorithm that applies to widely used trajectory predictors such as RNNs and LSTMs, while being computationally simple and making no assumptions on the underlying distribution. We present numerical experiments of an F-16 aircraft and a self-driving car. |
13:50 | Analysis of Neural Network Controllers Using Quadratic Constraints ABSTRACT. This talk considers the stability of feedback systems with neural network controllers. The analysis method consists of two key steps. First, we separate the activation functions in the neural network from the (linear) weights and biases. Next, we use quadratic constraints to locally bound the input/output behavior of the activation functions. This approach can be used to build a variety of stability and robustness results for feedback systems. The analysis conditions are formulated as semidefinite programs which can be efficiently solved for "small" neural networks with up to a few hundred neurons. Examples are given to illustrate the proposed approach. This results in this talk build on prior work by (Fazlyab, Morari, Pappas; 2019) and additional details can be found in (Yin, Seiler, Arcak; 2020). |
14:10 | Neural Network-based Fault Detection and Identification for Quadrotors using Dynamic Symmetry PRESENTER: Kunal Garg ABSTRACT. Autonomous robotic systems, such as quadrotors, are susceptible to actuator faults, and for the safe operation of such systems, timely detection and isolation of these faults is essential. Neural networks can be used for verification of actuator performance via online actuator fault detection with high accuracy. In this paper, we develop a novel model-free fault detection and isolation (FDI) framework for quadrotor systems using long-short-term memory (LSTM) neural network architecture. The proposed framework only uses system output data and the commanded control input and requires no knowledge of the system model. Utilizing the symmetry in quadrotor dynamics, we train the FDI for fault in just one of the motors (e.g., motor $\# 2$), and the trained FDI can predict faults in any of the motors. This reduction in search space enables us to design an FDI for partial fault as well as complete fault scenarios. Numerical experiments illustrate that the proposed NN-FDI correctly verifies the actuator performance and identifies partial as well as complete faults with over $90\%$ prediction accuracy. We also illustrate that model-free NN-FDI performs at par with model-based FDI, and is robust to model uncertainties as well as distribution shifts in input data. |
14:30 | Interaction-aware interval reachability of neural network controlled systems ABSTRACT. In this talk, we propose a computationally efficient framework for estimating reachable sets of a dynamical system with a neural network controller. Inspired by mixed monotone system theory, we leverage inclusion functions for the open-loop system and the neural network controller to embed the closed-loop system into a larger-dimensional embedding system with favorable control theoretic properties. In particular, we show that the hyper-rectangular over-approximations of the reachable sets can be efficiently computed using a single trajectory of this embedding system. We propose two methods for constructing closed-loop embedding systems, which account for the interactions between the system and the controller in different ways. The interconnection-based approach considers the worst-case evolution of each coordinate separately by substituting the neural network inclusion function into the open-loop inclusion function. The interaction-based approach uses Jacobian-based inclusion functions to capture the first-order interactions between the open-loop system and the controller by leveraging state-of-the-art neural network verifiers. Finally, we demonstrate the efficiency and scalability of our approach on several benchmarks. |
14:50 | Verification and Design of Robust and Safe Neural Network-enabled Autonomous Systems ABSTRACT. Neural networks are being applied to a wide range of tasks in autonomous systems, such as perception, prediction, planning, control, and general decision making. While they may improve system performance over traditional physical model-based methods, pressing concerns have been raised on the uncertain behaviors of neural networks under varying inputs, especially for safety-critical systems such as autonomous vehicles and robots. In this paper, we will discuss the challenges in ensuring the safety and robustness of neural network-enabled autonomous systems, and present our recent work in addressing these challenges. These include methods for certifying the robustness of neural networks, verifying the safety of neural network-controlled systems, and designing these systems with safety assurance. |
15:30 | Distributed Storage Over a Public Channel: Trade-off between Privacy and Shared Key Lengths ABSTRACT. Consider a user who wants to store a file across multiple servers such that at least t servers are needed to reconstruct the file and any z colluding servers cannot learn more than a fraction α of the file. Unlike traditional secret sharing models that assume the availability of secure channels at no cost, we assume that the user can only transmit data to the servers through a public channel and that the user and each server share an individual secret key of length n bits. For a fixed key length n, our objective is to determine the fundamental trade-off between the privacy leakage parameter α and the file length that the user can store in the servers. Furthermore, for the optimal trade-off between α and the file length, we determine (i) the minimum amount of local randomness needed by the user, (ii) the minimum amount of public communication from the user to the servers, and (iii) and the minimum storage requirement at the servers. |
15:50 | Coded Caching with Linear Coded Placement: Exact Tradeoff for the Three User Case PRESENTER: Yinbin Ma ABSTRACT. This paper studies the optimal memory-load tradeoff in a coded caching system with K = 3 users under the constraint that the contents in the local caches are the result of encoding the files by a linear code. This setting generalizes pastwork that had established the optimal tradeoff under uncodedplacement. Let N be the number of files. For K = N = 3 the optimal tradeoff under linear coded placement is shown to have a corner point in the low memory regime that was unknown before this work, which is actually optimalwithout any restrictions on the placement. For K = 3, N ≥ 4,the optimal tradeoff under linear coded placement is shown to be attained by uncoded placement. As a consequence of this result together with past optimality results, it is an open question whether non-linear coded placement would outperform the tradeoff derived in this work for the memory regime M ∈ (1/2, 1) for N = 3, and M ∈ (0, N/3) for K = 3, N ∈ {4, 5}. |
16:10 | Multi-terminal Strong Coordination over Noiseless Networks with Secrecy Constraints ABSTRACT. We investigate the problem of securely emulating a two-user multiple-access channel (MAC) aided by a multiple-access network of noiseless links as a resource. In this configuration, two encoders observe independent and identically distributed (i.i.d.) samples of a source random variable each, while the decoder receives i.i.d. samples of a side-information random variable. At limited rates, each encoder and the decoder possess independent pairwise shared randomness. The objective is for the decoder to generate approximately i.i.d. samples of another random variable, which is jointly distributed with the two sources and the side information. Furthermore, we require that an external eavesdropper who intercepts the communication and has correlated observations but no access to the shared randomness variables, learns virtually nothing about the input sources and the simulated output sequence. We are interested in the rate tuples which permit such simulation with strong secrecy. We establish a complete characterization for this secure multi-terminal coordination problem when the sources are independent and one of the pairwise shared randomness rates is unconstrained. Moreover, we derive an achievable region and an outer bound for the general case of correlated sources and limited shared randomness rates. |
16:30 | PRESENTER: Barron Han ABSTRACT. This paper investigates the problem of zero-delay joint source-channel coding of a vector Gauss-Markov source over a multiple-input mulitple-output (MIMO) additive white Gaussian noise (AWGN) channel with feedback. In contrast to the classical problem of causal estimation using noisy observations, we examine a system where the source can be encoded before transmission. An encoder, equipped with feedback of past channel outputs, observes the source state and encodes the information in a causal manner as inputs to the channel while adhering to a power constraint. The objective of the code is to estimate the source state with minimum mean square error at the infinite horizon. This work shows a fundamental theorem for two scenarios: for the transmission of an unstable vector Gauss-Markov source over either a multiple-input single-output (MISO) or a single-input multiple-output (SIMO) AWGN channel, finite estimation error is achievable if and only if the sum of the unstable eigenvalues of the state gain matrix is less than the Shannon channel capacity. We prove these results by showing an optimal linear innovations encoder that can be applied to sources and channels of any dimension and analyzing it together with the corresponding Kalman filter decoder. |
16:50 | Optimal Error Exponent of Type-Based Distributed Hypothesis Testing over AWGN Channels PRESENTER: Shao-Lun Huang ABSTRACT. This paper investigates the distributed hypothesis testing (DHT) problem over AWGN channels, where the distributed nodes are restricted to transmit functions of the empirical distributions of the observed data sequences due to practical computational constraints. The goal of this problem is to design the transmitted signals of the distributed nodes, so that the decision center can make a decision based on the received signals with the maximal decision error exponent. In order to solve this problem, we first propose a mixture coding strategy for each node, which combines the decode-and-forward and amplify-and-forward strategies, and then the achievable error exponent is computed. Moreover, we present a genie-aided approach to prove the converse result, thereby demonstrating the optimality of the achievable error exponent. In summary, our results provide coding guidance for distributed systems, and could be applied to more complicated problems. |
15:30 | Why interpolating neural nets generalize well: recent insights from neural tangent model ABSTRACT. A mystery of modern neural networks is their surprising generalization power in overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if actual labels are replaced by purely random ones; despite this, they achieve good prediction error on unseen data. In this talk, we focus on the neural tangent (NT) model for two-layer neural networks, which is a simplified model. Under the isotropic input data, we first show that interpolation phase transition is around Nd ~ n, where Nd is the number of parameters and n is the sample size. To demystify the generalization puzzle, we consider the min-norm interpolator and show that its test error/generalization error is largely determined by a smooth, low-degree component. Moreover, we find that nonlinearity of the activation function has an implicit regularization effect. These results offer new insights to recent discoveries in overparametrized models such as double descent phenomena. |
15:50 | A Law of Data Separation in Deep Learning ABSTRACT. While deep learning has enabled significant advances in many areas of science, its black-box nature hinders architecture design for future artificial intelligence applications and interpretation for high-stakes decision makings. We addressed this issue by studying the fundamental question of how deep neural networks process data in the intermediate layers. Our finding is a simple and quantitative law that governs how deep neural networks separate data according to class membership throughout all layers for classification. This law shows that each layer improves data separation at a constant geometric rate, and its emergence is observed in a collection of network architectures and datasets during training. This law offers practical guidelines for designing architectures, improving model robustness and out-of-sample performance, as well as interpreting the predictions. |
16:10 | Stochastic localization and algorithmic variants ABSTRACT. I will present a general method for sampling from a probability distribution. The method, not based on a Markov chain, is derived from the stochastic localization process of Eldan (2013), and proceeds by iterating two steps: (1) compute the mean vector m_t of a measure obtained by tilting the target measure in a direction y_t, and (2) update the tilting direction y_t using m_t and extra Gaussian noise. This method, when viewed in continuous time, turns out to be equivalent to recent diffusion-based sampling methods, much popular for generating realistic images. I will review the main properties of the method and show how it can be applied using one or two examples. |
16:30 | Achieving Capacity on Non-Binary Channels with Generalized Reed–Muller Codes PRESENTER: Henry Pfister ABSTRACT. Recently, the authors showed that Reed-Muller (RM) codes achieve capacity on binary memoryless symmetric (BMS) channels with respect to bit error rate. Here, we extend that work by showing that RM codes defined on non-binary fields, known as generalized RM codes, achieve capacity on sufficiently symmetric non-binary channels with respect to symbol error rate. The new proof also simplifies the previous approach (for BMS channels) in a variety of ways that may be of independent interest. Connections to other high-dimensional estimation problems, such as compressed sensing, are also discussed. |
16:50 | Generative Plug and Play: Posterior Sampling for Inverse Problems ABSTRACT. Over the past decade, Plug-and-Play (PnP) has become a popular method for reconstructing images using a modular framework consisting of a forward and prior model. The great strength of PnP is that an image denoiser can be used as a prior model while the forward model can be implemented using more traditional physics-based approaches. However, a limitation of PnP is that it reconstructs only a single deterministic image. In this paper, we introduce Generative Plug-and-Play (GPnP), a generalization of PnP to sample from the posterior distribution. As with PnP, GPnP has a modular framework using a physics-based forward model and an image denoising prior model. However, in GPnP these models are extended to become proximal generators, which sample from associated distributions. GPnP applies these proximal generators in alternation to produce samples from the posterior. We present experimental simulations using the well-known BM3D denoiser. Our results demonstrate that the GPnP method is robust, easy to implement, and produces intuitively reasonable samples from the posterior for sparse interpolation and tomographic reconstruction. |
15:30 | A Fast Algorithm for Low Rank + Sparse column-wise Compressive Sensing ABSTRACT. This paper focuses studies the following low rank plus sparse column-wise compressive sensing problem. We aim to recover a matrix from undersampled independent linear projections of each of its columns. We assume that the unknown matrix can be decomposed as a sum of low rank matrix and a sparse matrix , where rank of the low rank matrix is given by r satisfies r << min(n, q) and S is a sparse matrix. Each column of S contains rho non-zero entries. To address this recovery problem, we propose a novel fast GD-based solution called AltGDmin-LR+S, which is memory and communication efficient. We numerically evaluate its performance by conducting a detailed simulation based study. |
15:50 | A Density Evolution Framework for Recovery of Covariance and Causal Graphs from Compressed Measurements PRESENTER: Muralikrishnna Guruswamy Sethuraman ABSTRACT. In this paper, we propose a general framework for designing sensing matrix $A \in R^{d\times p}$, for estimation of sparse covariance matrix from compressed measurements of the form $y = Ax + n$, where $y, n \in R^d$, and $x \in R^p$. By viewing covariance recovery as inference over factor graphs via message passing algorithm, ideas from coding theory, such as \textit{Density Evolution} (DE), are leveraged to construct a framework for the design of the sensing matrix. The proposed framework can handle both (1) regular sensing, i.e., equal importance is given to all entries of the covariance, and (2) preferential sensing, i.e., higher importance is given to a part of the covariance matrix. Through experiments, we show that the sensing matrix designed via density evolution can match the state-of-the-art for covariance recovery in the regular sensing paradigm and attain improved performance in the preferential sensing regime. Additionally, we study the feasibility of causal graph structure recovery using the estimated covariance matrix obtained from the compressed measurements. |
16:10 | Theoretical Analysis of Binary Masks in Snapshot Compressive Imaging Systems PRESENTER: Mengyu Zhao ABSTRACT. Snapshot compressive imaging (SCI) systems have gained significant attention in recent years. While previous theoretical studies have primarily focused on the performance analysis of Gaussian masks, practical SCI systems often employ binary-valued masks. Furthermore, recent research has demonstrated that optimized binary masks can significantly enhance system performance. In this paper, we present a comprehensive theoretical characterization of binary masks and their impact on SCI system performance. Initially, we investigate the scenario where the masks are binary and independently identically distributed (iid), revealing a noteworthy finding that aligns with prior numerical results. Specifically, we show that the optimal probability of non-zero elements in the masks is smaller than 0.5. This result provides valuable insights into the design and optimization of binary masks for SCI systems, facilitating further advancements in the field. Additionally, we extend our analysis to characterize the performance of SCI systems where the mask entries are not independent but are generated based on a stationary first-order Markov process. Overall, our theoretical framework offers a comprehensive understanding of the performance implications associated with binary masks in SCI systems. |
16:30 | Phase Transitions for Diversity in Stochastic Block Model Dynamics ABSTRACT. This paper proposes a stochastic block model with dynamics where the population grows using preferential attachment. Nodes with higher weighted degree are more likely to recruit nodes from their own community. We focus on the case of two communities. This model can capture how communities grow or shrink based on their collaborations with other nodes in the network. We derive a deterministic approximation to the dynamics and characterize the phase transitions for diversity, i.e. the parameter regimes in which either one of the communities dies out over time or the two communities reach parity. We show the minority may vanish when the probability of cross-community edges is low, even when cross-community projects are more valuable than monochrome projects (i.e. projects with collaborators from the same community). |
16:50 | Graph Matching in Correlated Stochastic Block Models for Improved Graph Clustering ABSTRACT. We consider the community detection from multiple correlated graphs sharing the same community structure. The correlated graphs are generated by independent subsampling of a parent graph sampled from the stochastic block model. The vertex correspondence between the correlated graphs is assumed to be unknown. We consider the two-step procedure where the vertex correspondence between the correlated graphs is first revealed, and the communities are recovered from the union of the correlated graphs, which becomes denser than each single graph. We derive the information-theoretic limits for exact graph matching in general density regimes and the number of communities, and then analyze the regime of graph parameters, where one can benefit from the matching of the correlated graphs in recovering the latent community structure of the graphs. |
15:30 | Subspace Methods for Blind Fractionally-Spaced Equalization of Distributed Phase-Shift Beamforming Signals ABSTRACT. This work investigates different subspace-based approaches for the equalization of signals transmitted from phase shift Distributed Beamforming systems. While phase shifters have been shown to be capable of enabling distributed beamforming by providing carrier coherence, it does not address mismatches in symbol timing, resulting in potentially significant levels of intersymbol interference. Using fractionally spaced receivers and the singular value decomposition of the channel matrix, we design linear and nonlinear equalization methods to combat the distortion induced by this symbol timing mismatch. We show that when linear equalization is not practical, a Causal Minimum Noise Decision Feedback Equalizer would provide us with better symbol error performance when operating with high signal-to-noise ratio, and we provide a lower bound on the error rate of this proposed equalization method. |
15:50 | Rotation-Invariant Correlation Features for 2D Sparse Unknown View Tomography ABSTRACT. 2D sparse unknown view tomography aims to reconstruct a set of point sources from its random projections with unknown viewing directions. We propose a set of rich rotation-invariant correlation features that can be extracted from the projections, which allows us to reconstruct Gaussian point-source signals without estimating the viewing directions. We show that the proposed correlation feature is a quadratic function of the signal, and formulate the reconstruction problem as solving a system of quadratic equations with linear and nonnegative constraints. Simulation experiments show that the proposed approach is robust to noise and successfully recovers the point set with Gaussian sources. |
16:10 | On Subsampled Quantile Randomized Kaczmarz PRESENTER: Jamie Haddock ABSTRACT. When solving noisy linear systems Ax = b + c, the theoretical and empirical performance of stochastic iterative methods, such as the Randomized Kaczmarz algorithm, depends on the noise level. However, if there are a small number of highly corrupt measurements, one can instead use quantile- based methods to guarantee convergence to the solution x of the system, despite the presence of noise. Such methods require the computation of the entire residual vector, which may not be desirable or even feasible in some cases. In this work, we analyze the sub-sampled quantile Randomized Kaczmarz (sQRK) algorithm for solving large-scale linear systems which utilize a sub-sampled residual to approximate the quantile threshold. We prove that this method converges to the unique solution to the linear system and provide numerical experiments that support our theoretical findings. We additionally remark on the extremely small sample size case and demonstrate the importance of interplay between the choice of quantile and subset size. |
16:30 | Strategic Quantization of a Noisy Source ABSTRACT. This paper is concerned with strategic quantization of a noisy source where the encoder, which observes the remote source through a noisy channel, and the decoder, with a distortion defined over the remote source, have misaligned objectives. Such scenarios constitute a special class of games called information design in Economics. Here, we approach them with an Engineering lens by adding a constraint on the message space in the problem setting. This is an extension of our recent work on strategic quantization of a noiseless source. Finally, we present the numerical results that validate the analysis in this paper. The codes are available at: https://tinyurl.com/allerton2023. |