ALLERTON 2023: 2023 ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, & COMPUTING
PROGRAM FOR WEDNESDAY, SEPTEMBER 27TH

08:30-10:10 Session WeA1: ADGB1 - Electric Power Systems I

Organizers - Alejandro Dominguez-Garcia and Subhonmesh Bose

Location: Library
08:30
Wenqi Cui (University of Washington, United States)
Guanya Shi (Carnegie Mellon University, United States)
Yuanyuan Shi (University of California, San Diego, United States)
Baosen Zhang (University of Washington, United States)
Leveraging Predictions in Power System Frequency Control: an Adaptive Approach

ABSTRACT. Ensuring the frequency stability of electric grids with increasing renewable resources is a key problem in power system operations. In recent years, a number of advanced controllers have been designed to optimize frequency control. These controllers, however, almost always assume that the net load in the system remains constant over a sufficiently long time. Given the intermittent and uncertain nature of renewable resources, it is becoming important to explicitly consider net load that is time-varying.

We propose an adaptive approach to frequency control in power systems with significant time-varying net load. We leverage the advances in short-term load forecasting, where the net load in the system can be accurately predicted using weather and other features. We integrate these predictions into the design of adaptive controllers, which can be seamlessly combined with most existing controllers, including conventional droop control and emerging neural network-based controllers. We prove that the overall control architecture achieves frequency restoration in a decentralized manner. Case studies verify that the proposed method improves both transient and frequency-restoration performance compared to existing approaches.
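
The idea of combining a conventional controller with a prediction-driven term can be sketched as follows; this is a hypothetical illustration (the gains `k_droop`, `k_ff` and the additive structure are illustrative assumptions, not the authors' controller):

```python
def droop_with_feedforward(delta_omega, predicted_net_load, k_droop=20.0, k_ff=1.0):
    """Hypothetical sketch: conventional droop control augmented with a
    feedforward term driven by a short-term net-load prediction.
    delta_omega: frequency deviation from nominal (p.u.)
    predicted_net_load: predicted net-load change (p.u.)"""
    # Conventional droop term: oppose the frequency deviation.
    u_droop = -k_droop * delta_omega
    # Feedforward term: pre-compensate the predicted net-load change.
    u_ff = -k_ff * predicted_net_load
    return u_droop + u_ff
```

In the paper's architecture the prediction-dependent term is designed adaptively and wraps around the existing controller rather than simply adding to it.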

08:50
Lihui Yi (Northwestern University, United States)
Ermin Wei (Northwestern University, United States)
The Effect of Altruism in the Electricity Network

ABSTRACT. We explore the interplay between a self-interested player and an altruistic player acting on behalf of a benevolent social planner in the electricity market. Their interaction is coupled through the underlying physical network properties. Our study focuses on a stylized model, inspired by Cournot competition in economics, to analyze the behavior of two electricity firms serving a single load. We aim to understand how the electricity transmission network impacts the influence of an altruistic player in the market. In our work, we consider two scenarios: one with both firms as self-interested players (i.e., profit maximizers), and another where one firm is altruistic, with the same objective as a benevolent social planner, aiming to maximize total social welfare. First, we assume the transmission capacity is arbitrarily large and compare the equilibrium outcomes of the two scenarios. Then we investigate the influence of limited transmission capacity on the equilibrium. Taking into account the underlying network, we find that the altruistic player's ability to curb market manipulation is reduced, and in some cases, even entirely lost.

09:10
Xavier Farrell (University of Michigan, United States)
Johanna Mathieu (University of Michigan, United States)
Quantifying Confidence in Electricity Rate Plan Recommendations Considering Variability in Future Electricity Usage

ABSTRACT. Electricity Rate Plan Recommender Systems (ERPS) are a class of tools developed to help electrical utility customers find rate plans that could save them money on their utilities. ERPS generate recommendations using past home energy usage data or other home-specific features such as appliance ownership and occupancy. However, the homeowner’s future electricity usage characteristics cannot be precisely predicted, and therefore, the same must be said about the future savings (or losses) obtained by adopting an ERPS recommendation. Thus, the homeowner must ask: how confident can they be that the recommended rate plan will incur the least cost relative to the alternatives? To answer this question, this work applies least-squares temperature-dependent load modeling techniques and statistical bootstrapping methods to estimate confidence intervals for the next year's expected savings for each recommendation. The method is then applied to one year of hourly smart meter data for 800+ households in Detroit, MI. The confidence interval for the expected savings of each household is then validated against each household's true savings in the following year, as computed from the following year's energy usage data.
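
The bootstrapping step can be sketched as below. This is a simplified, hypothetical version: resampling whole days of usage stands in for the paper's temperature-dependent load modeling, and the price vectors and function names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_savings_ci(hourly_kwh, price_a, price_b, n_boot=1000, alpha=0.05):
    """Hypothetical sketch: bootstrap a confidence interval for the annual
    savings of plan B over plan A.
    hourly_kwh: (days, 24) usage array; price_a, price_b: length-24 $/kWh vectors."""
    days = hourly_kwh.shape[0]
    # Per-day cost difference between the two plans (positive = plan B cheaper).
    daily_savings = hourly_kwh @ (price_a - price_b)
    estimates = []
    for _ in range(n_boot):
        # Resample days with replacement to capture usage variability.
        sample = rng.choice(daily_savings, size=days, replace=True)
        estimates.append(sample.sum())
    lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

A recommendation would then be reported together with the interval `(lo, hi)` rather than a point estimate of savings.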

09:30
Apostolos Rikos (KTH Royal Institute of Technology, Sweden)
Nicolas Manitara (University of Cyprus, Cyprus)
Gabriele Oliva (University Campus Bio-Medico of Rome, Italy)
Christoforos Hadjicostis (University of Cyprus, Cyprus)
Privacy-Preserving Distributed k-means Clustering with Quantized Communication in Finite Time

ABSTRACT. In this paper, we propose a distributed algorithm that extends the application of the well-known k-means algorithm to multi-agent systems with directed communication links. Our algorithm addresses key challenges in such systems by leveraging three strategies: preserving the privacy of the data in each node, transmitting quantized values between nodes, and exhibiting distributed stopping capabilities. By preserving the privacy of each node’s data we maintain confidentiality and security in the network, as we ensure that sensitive or confidential information remains undisclosed to unauthorized entities (in our case, these entities could be, for example, curious nodes within the network). By quantizing the transmitted values, we optimize bandwidth utilization and alleviate communication bottlenecks, leading to improved efficiency. Additionally, nodes possess the ability to collectively determine when to terminate the algorithm, enabling the conservation of valuable resources. To enhance privacy within the system, we introduce a novel privacy-preserving protocol. This protocol allows nodes to protect their initial values from curious nodes by defining specific conditions for secure information exchange. We carefully analyze these conditions and provide precise guarantees of privacy preservation for individual nodes. Moreover, our distributed algorithm provides guarantees for the formation of exclusive clusters within a finite time frame on any static and strongly connected directed graph. The paper concludes by providing illustrative examples that highlight the functionality, performance, and potential benefits of the proposed algorithm.

08:30-10:10 Session WeA2: VVAZ1 - Robust Inference and Learning

Organizers - Venu Veeravalli, George Atia, and Shaofeng Zou

Location: Solarium
08:30
Sen Lin (The Ohio State University, United States)
Peizhong Ju (The Ohio State University, United States)
Yingbin Liang (The Ohio State University, United States)
Ness Shroff (The Ohio State University, United States)
Theory on Forgetting and Generalization of Continual Learning

ABSTRACT. Continual learning (CL), which aims to learn a sequence of tasks, has attracted significant recent attention. However, most work has focused on the experimental performance of CL, and theoretical studies of CL are still limited. In particular, there is a lack of understanding of what factors are important and how they affect "catastrophic forgetting" and generalization performance. To fill this gap, our theoretical analysis, under overparameterized linear models, provides the first-known explicit form of the expected forgetting and generalization error. Further analysis of such a key result yields a number of theoretical explanations about how overparameterization, task similarity, and task ordering affect both forgetting and generalization error of CL. More interestingly, by conducting experiments on real datasets using deep neural networks (DNNs), we show that some of these insights even go beyond the linear models and can be carried over to practical setups. In particular, we use concrete examples to show that our results not only explain some interesting empirical observations in recent studies, but also motivate better practical algorithm designs of CL.

08:50
Yue Wang (University at Buffalo, United States)
Alvaro Velasquez (Air Force Research Laboratory, United States)
George Atia (University of Central Florida, United States)
Ashley Prater-Bennette (Air Force Research Laboratory, United States)
Shaofeng Zou (University at Buffalo, the State University of New York, United States)
Robust Average-Reward Markov Decision Processes

ABSTRACT. In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average-reward as the discount factor γ goes to 1, and moreover, when γ is large, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward. We further design a robust dynamic programming approach, and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
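
For orientation, standard (non-robust) relative value iteration for average-reward MDPs can be sketched as below; the paper's robust variant replaces the expectation over the nominal kernel with a worst case over an uncertainty set, which this illustrative sketch omits:

```python
import numpy as np

def relative_value_iteration(P, r, n_iters=500):
    """Minimal sketch of (non-robust) relative value iteration for an
    average-reward MDP under the unichain condition.
    P: (S, A, S) transition tensor, r: (S, A) reward matrix.
    Returns the estimated gain, relative value function, and greedy policy."""
    S, A = r.shape
    v = np.zeros(S)
    g = 0.0
    for _ in range(n_iters):
        q = r + P @ v          # (S, A) one-step lookahead values
        v_new = q.max(axis=1)
        g = v_new[0]           # gain estimate: value at a reference state
        v = v_new - v_new[0]   # subtract reference value to keep iterates bounded
    return g, v, q.argmax(axis=1)
```

In the robust setting, the lookahead `P @ v` becomes an inner minimization of the expected relative value over the uncertainty set for each state-action pair.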

09:10
Abhin Shah (Massachusetts Institute of Technology, United States)
Maohao Shen (Massachusetts Institute of Technology, United States)
Jongha Jon Ryu (Massachusetts Institute of Technology, United States)
Subhro Das (MIT-IBM Watson AI Lab, IBM Research, United States)
Prasanna Sattigeri (MIT-IBM Watson AI Lab, IBM Research, United States)
Yuheng Bu (University of Florida, United States)
Gregory W. Wornell (Massachusetts Institute of Technology, United States)
Group Fairness with Uncertainty in Sensitive Attributes

ABSTRACT. Learning a fair predictive model is crucial to mitigate biased decisions against minority groups in high-stakes applications. A common approach to learn such a model involves solving an optimization problem that maximizes the predictive power of the model under an appropriate group fairness constraint. However, in practice, sensitive attributes are often missing or noisy, resulting in uncertainty. We demonstrate that solely enforcing fairness constraints on uncertain sensitive attributes can fall significantly short in achieving the level of fairness of models trained without uncertainty. To overcome this limitation, we propose a bootstrap-based algorithm that achieves the target level of fairness despite the uncertainty in sensitive attributes. The algorithm is guided by a Gaussian analysis for the independence notion of fairness where we propose a robust quadratically constrained quadratic problem to ensure a strict fairness guarantee with uncertain sensitive attributes. Our algorithm is applicable to both discrete and continuous sensitive attributes and is effective in real-world classification and regression tasks for various group fairness notions, e.g., independence and separation.

09:30
Banghua Zhu (University of California, Berkeley, United States)
Hiteshi Sharma (Microsoft Research, United States)
Felipe Frujeri (Microsoft Research, United States)
Shi Dong (Microsoft Research, United States)
Chenguang Zhu (Microsoft Research, United States)
Michael Jordan (University of California, Berkeley, United States)
Jiantao Jiao (University of California, Berkeley, United States)
Fine-Tuning Language Models with Advantage-Induced Policy Alignment

ABSTRACT. Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences. Among the plethora of RLHF techniques, proximal policy optimization (PPO) is one of the most widely used methods. Despite its popularity, however, PPO may suffer from mode collapse, instability, and poor sample efficiency. We show that these issues can be alleviated by a novel algorithm that we refer to as Advantage-Induced Policy Alignment (APA), which leverages a squared error loss function based on the estimated advantages. We demonstrate empirically that APA consistently outperforms PPO in language tasks by a large margin when a separate reward model is employed as the evaluator. In addition, compared with PPO, APA offers a more stable form of control over the deviation from the model’s initial policy, ensuring that the model improves its performance without collapsing to deterministic output. In addition to empirical results, we also provide a theoretical justification supporting the design of our loss function.
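
The flavor of an advantage-based squared-error alignment objective can be sketched as below. This is a hedged illustration only: the exact loss in the paper may differ in form and scaling, and the function name, `beta` parameter, and regression-to-target structure are assumptions:

```python
import numpy as np

def apa_style_loss(logp_theta, logp_init, advantages, beta=1.0):
    """Illustrative sketch of a squared-error alignment loss in the spirit of
    APA: regress the current policy's log-probabilities toward the initial
    policy's log-probabilities shifted by scaled advantage estimates.
    All inputs are per-token (or per-action) arrays of equal length."""
    target = logp_init + advantages / beta   # advantage-induced target log-probs
    return np.mean((logp_theta - target) ** 2)
```

Unlike PPO's clipped ratio objective, a squared-error loss of this kind directly penalizes deviation from an advantage-shifted reference, which is one way to control drift from the initial policy.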

09:50
Xiangxiang Xu (MIT, United States)
Ishank Agrawal (MIT, United States)
Lizhong Zheng (MIT, United States)
Feature Geometry of Multivariate Dependence and Applications

ABSTRACT. (To be provided.)

08:30-10:10 Session WeA3: BHRS1 - Learning and Networks I

Organizers - Bruce Hajek and R. Srikant

Chair:
Thodoris Lykouris (MIT Sloan School of Management, United States)
Location: Butternut
08:30
Yige Hong (Carnegie Mellon University, United States)
Qiaomin Xie (University of Wisconsin-Madison, United States)
Yudong Chen (University of Wisconsin-Madison, United States)
Weina Wang (Carnegie Mellon University, United States)
Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption

ABSTRACT. We study the infinite-horizon restless bandit problem with the average reward criterion, under both discrete-time and continuous-time settings. A fundamental question is how to design computationally efficient policies that achieve a diminishing optimality gap as the number of arms, $N$, grows large. Existing results on asymptotic optimality all rely on the uniform global attractor property (UGAP), a complex and challenging-to-verify assumption. In this paper, we propose a general, simulation-based framework that converts any single-armed policy into a policy for the original $N$-armed problem. This is accomplished by simulating the single-armed policy on each arm and carefully steering the real state towards the simulated state. Our framework can be instantiated to produce a policy with an $O(1/\sqrt{N})$ optimality gap. In the discrete-time setting, our result holds under a simpler synchronization assumption, which covers some problem instances that do not satisfy UGAP. More notably, in the continuous-time setting, our result does not require any additional assumptions beyond the standard unichain condition. In both settings, we establish the first asymptotic optimality result that does not require UGAP.

08:50
Sai Sumedh Hindupur (Indian Institute of Technology Bombay, India)
Vivek Borkar (Indian Institute of Technology Bombay, India)
Online Parameter Estimation in Partially Observed Markov Decision Processes

ABSTRACT. We consider the problem of online parameter estimation using maximum likelihood for Partially Observed Markov Decision Processes (POMDPs). Classical approaches based upon the EM algorithm are designed for the episodic framework and are not suited to this problem. We develop a methodology based upon extremum seeking to obtain a multiple-time-scale iterative scheme for the online problem, and study it both theoretically and with numerical experiments.

09:10
Michael Neely (University of Southern California, United States)
Adaptive optimization for stochastic renewal systems

ABSTRACT. This paper considers opportunistic scheduling over a system where tasks are performed back-to-back. Each new task has a random type. The type of a task determines the performance of the system under each possible scheduling option, which includes the duration of the task, the energy expended, and a vector of general penalties. The goal is to make decisions on how to process each task to maximize time average reward subject to a collection of time average inequality constraints. Prior work on this problem uses a drift-plus-penalty ratio algorithm, or related variations, that do not adapt to changes in the system probabilities. This work uses a new type of optimization, utilizing a convex subset of a nonconvex set, to solve the problem using a very small state space description and using a method that adapts to environmental changes.

09:30
Xinghan Wang (University of California, San Diego, United States)
Nasimeh Heidaribeni (University of California, San Diego, United States)
Tara Javidi (University of California San Diego, United States)
A variational approach to federated and robust learning

ABSTRACT. This paper considers a class of decentralized learning problems where a set of networked agents are tasked to learn and act from their local data/experiences as well as their social interactions. This generalizes the problem of federated learning to the problem of decentralized (reinforcement) learning under data and model poisoning attacks. Inspired by a Bayesian view of the problem, we propose a variational method to approximately quantify and then utilize uncertainty across the network. We empirically investigate and illustrate the superior performance of the proposed algorithms in multiple concrete examples.

09:50
Caio Lauand (University of Florida, United States)
Ana Busic (Inria and ENS, France)
Sean Meyn (University of Florida, United States)
Inverse-Free Zap Stochastic Approximation

ABSTRACT. Stochastic approximation (SA) concerns recursive algorithms to approximate the solution θ∗ of f(θ) = 0, using a sequence of d-dimensional noisy measurements, where in this paper the noise is modeled as an ergodic Markov chain. The standard condition for success is global asymptotic stability of the ODE with vector field f, which may be difficult to verify in applications.

Zap SA is a matrix gain algorithm designed to simultaneously ensure stability, and minimize the asymptotic covariance appearing in the Central Limit Theorem for the scaled parameter estimates. The algorithm is found to be very reliable in practice, but highly complex since each iteration requires an update of an estimate of a Jacobian matrix, followed by an inversion of this estimate. This work goes far beyond this prior research. The main contributions follow: 1) The new Zap Zero SA algorithm is introduced in which there is no need for matrix estimation or matrix inversion, yet the algorithm enjoys the same universal stability properties as Zap SA. 2) On employing averaging of parameter estimates, the asymptotic covariance is optimal, exactly as found in the original Zap SA. 3) Examples are provided to illustrate the general theory, and suggest topics for future research.

08:30-10:10 Session WeA4: OMRE1 - Machine Learning I

Organizers - Olgica Milenkovic and Rasoul Etesami

Location: Pine
08:30
Xiangyu Liu (University of Maryland, College Park, United States)
Kaiqing Zhang (University of Maryland, College Park, United States)
Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing

ABSTRACT. We study provable multi-agent reinforcement learning (MARL) in the general framework of partially observable stochastic games (POSGs). To circumvent the known hardness results and the use of computationally intractable oracles, we propose to leverage the potential information-sharing among agents, a standard practice in empirical MARL and a common model for multi-agent control systems with communications. We first establish several computation complexity results to justify the necessity of information-sharing, as well as the observability assumption that has enabled quasi-efficient single-agent RL with partial observations, for computational efficiency in solving POSGs. We then propose to further approximate the shared common information to construct an approximate model of the POSG, in which planning an approximate equilibrium (in terms of solving the original POSG) can be quasi-efficient, i.e., of quasi-polynomial-time, under the aforementioned assumptions. Furthermore, we develop a partially observable MARL algorithm that is both statistically and computationally quasi-efficient. We hope our study may open up the possibilities of leveraging and even designing different information structures, for developing both sample- and computation-efficient partially observable MARL.

08:50
Cesar A. Uribe (Rice University, United States)
Push-sum, Kullback-Leibler Barycenters, and Colored Flows

ABSTRACT. The Push-sum algorithm has become the off-the-shelf subroutine for distributed optimization over time-varying and directed graphs. Its structure allows an algorithm to reach consensus by keeping track of two appropriately initialized mixing variables, whose ratio for each agent will converge to the mean of initial values held over the network. Its convergence rate is geometric. However, its worst-case bounds show an exponential dependency on the number of nodes in the graph. In this paper, we show an unexpected connection between the algorithmic structure of the push-sum algorithm, the computation of geometric averages of Gaussian distributions (Kullback-Leibler Barycenters), and concentration mixing in water tanks described as colored flows. This similarity in dynamics might shed light on a better understanding of mixing in directed graphs and provide immediate convergence properties of some variations of the consensus problem.
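
The two-variable structure referred to above is the standard push-sum protocol, sketched here on a static directed graph (the paper's setting also covers time-varying graphs):

```python
import numpy as np

def push_sum(values, out_neighbors, n_iters=100):
    """Minimal sketch of the push-sum protocol on a directed graph.
    values: initial value at each node; out_neighbors[i]: list of nodes
    that node i sends to (self-loop added automatically).
    Each node splits its two running sums x and y equally among its
    out-neighbors; the ratio x/y at every node converges to the average."""
    n = len(values)
    x = np.array(values, dtype=float)
    y = np.ones(n)                              # weight variable, initialized to 1
    for _ in range(n_iters):
        x_new, y_new = np.zeros(n), np.zeros(n)
        for i in range(n):
            targets = out_neighbors[i] + [i]    # always keep a self-loop
            share_x = x[i] / len(targets)
            share_y = y[i] / len(targets)
            for j in targets:
                x_new[j] += share_x
                y_new[j] += share_y
        x, y = x_new, y_new
    return x / y
```

On a strongly connected digraph the underlying mixing matrix is column-stochastic, which is exactly the "colored flow" conservation structure the talk connects to concentration mixing in water tanks.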

09:10
Behrouz Touri (University of California San Diego, United States)
Hadi Reisizadeh (University of Minnesota, United States)
Soheil Mohajer (University of Minnesota, United States)
Distributed Optimization with Imperfect Information Sharing

ABSTRACT. We study the problem of distributed optimization with imperfect information sharing. We propose and study a two-time-scale decentralized gradient descent algorithm to reach a consensus on an optimizer of the objective function. One time scale fades out the imperfect incoming information from the neighboring agents, and the second one adjusts the local loss functions' gradients. We show that under certain conditions on the connectivity of the underlying time-varying network and the time-scale sequences, the algorithm converges to an optimal point of the cost function.
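
The two-time-scale structure can be sketched as below. This is an illustrative toy, not the paper's algorithm: constant step sizes `alpha` and `beta` replace the paper's diminishing time-scale sequences, and the quadratic local losses are an assumption for demonstration:

```python
import numpy as np

def two_time_scale_run(c, W, alpha=0.005, beta=0.5, n_iters=2000):
    """Illustrative sketch in the spirit of the two-time-scale scheme: one
    time scale (weight beta) fades in the possibly imperfect information from
    neighbors via consensus mixing, while a slower time scale (weight alpha)
    follows the local gradients. Local losses here are (x_i - c_i)^2, whose
    sum is minimized at mean(c). W is a doubly stochastic mixing matrix."""
    x = np.zeros(len(c))
    for _ in range(n_iters):
        x = (1 - beta) * x + beta * (W @ x)  # consensus step on shared values
        x = x - alpha * 2.0 * (x - c)        # local gradient step
    return x
```

With diminishing, appropriately coupled sequences in place of the constants, the iterates of all agents converge to a common optimizer of the aggregate cost.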

09:30
Shubhada Agrawal (Georgia Institute of Technology, United States)
Martin Zubeldia (University of Minnesotta, United States)
Siva Theja Maguluri (Georgia Institute of Technology, United States)
Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise

ABSTRACT. We study the concentration behavior of a stochastic approximation algorithm under a contractive operator with respect to an arbitrary norm. We consider two settings where the iterates are potentially unbounded: (1) bounded multiplicative noise, and (2) additive sub-Gaussian noise. We obtain concentration bounds on the convergence errors, and show that these errors have sub-Gaussian tails in the additive noise setting, and super-polynomial tails (faster than polynomial decay) in the multiplicative noise setting. We also present recent generalizations of our results to the case of Markovian noise, which enables us to obtain high probability convergence bounds and sample complexity for a large class of RL algorithms including TD learning and its variants, as well as Q learning.

10:30-12:30 Session WeB1: GFVV1 - Sequential Methods I

Organizers - G. Fellouris and V. Veeravalli

Location: Library
10:30
Haoyun Wang (Georgia Institute of Technology, United States)
Yao Xie (Georgia Insitute of Technology, United States)
Non-parametric distribution-free CUSUM for online change-point detection

ABSTRACT. In modern applications, it is of interest to detect changes in possibly high-dimensional time series without making distributional assumptions, due to the complex nature of the data. We develop a general framework for non-parametric CUSUM procedures based on popular distribution-free statistical divergences, such as MMD and classification loss, that can be conveniently estimated from mini-batches of data. We present a way to analyze the statistical performance of such procedures by extending classic non-linear renewal theory.
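
The mini-batch-driven CUSUM recursion can be sketched as below; the `drift` and `threshold` parameters and the plug-in divergence are illustrative assumptions (the paper's statistics, e.g. MMD estimates, would take the place of the `divergence` callable):

```python
import numpy as np

def minibatch_cusum(batches, divergence, drift=0.1, threshold=5.0):
    """Hypothetical sketch of a CUSUM recursion driven by a mini-batch
    divergence estimate (e.g. an MMD or classification-loss statistic).
    batches: iterable of mini-batches; divergence: callable mapping a batch
    to a scalar divergence estimate. Returns the first batch index at which
    the statistic crosses the threshold, or None if no alarm is raised."""
    s = 0.0
    for t, batch in enumerate(batches):
        s = max(0.0, s + divergence(batch) - drift)  # reflected CUSUM recursion
        if s > threshold:
            return t
    return None
```

Before the change the drift term keeps the statistic near zero; after the change the divergence estimates exceed the drift and the statistic grows roughly linearly until the alarm.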

10:50
Taposh Banerjee (University of Pittsburgh, United States)
Vahid Tarokh (Duke University, United States)
Score-Based Bayesian Quickest Change Detection

ABSTRACT. The problem of quickest change detection is studied in which the decision maker has access to the true pre- and post-change distributions only up to normalizing constants. The case where only the gradient of the logarithm of the density is known is also investigated. Algorithms are proposed based on the Hyvärinen score of the data. Expressions are provided for the detection delay and the false alarm rate. In this score-based approach to quickest change detection, the role of Kullback-Leibler divergence is replaced by the Fisher divergence.
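
For reference, the standard definitions behind this approach are as follows (conventions vary by a factor of 1/2); both depend on the densities only through their score functions ∇ log p, which is why unnormalized models suffice:

```latex
% Hyvärinen score of a density p at a point x:
S_H(x, p) \;=\; \Delta_x \log p(x) \;+\; \tfrac{1}{2}\,\bigl\|\nabla_x \log p(x)\bigr\|^2 .

% Fisher divergence between densities p and q, which plays the role of the
% Kullback-Leibler divergence in this score-based setting:
D_F(p \,\|\, q) \;=\; \mathbb{E}_{x \sim p}\!\left[\bigl\|\nabla_x \log p(x) - \nabla_x \log q(x)\bigr\|^2\right].
```

Since scaling p by a constant leaves ∇ log p unchanged, both quantities are invariant to normalizing constants.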

11:10
Mehmet Kurt (Columbia University, United States)
Xiaodong Wang (Columbia University, United States)
Pieter J. Mosterman (MathWorks, United States)
Yasin Yilmaz (University of South Florida, United States)
Data-Driven Sequential Change Detection in Privacy-Sensitive Networks

ABSTRACT. We study privacy-preserving sequential change detection in a setting where the data are distributed over a network and locally sensitive to each node, and the probabilistic data model is unknown. We design and analyze a data-driven solution scheme where each node observes a high-dimensional data stream for which it computes a local outlierness score, perturbs, encrypts, and sends it to a network operator. The network operator then decrypts an aggregate statistic over the network and performs online network change detection via the proposed generalized cumulative sum (CUSUM) algorithm. We derive an asymptotic lower bound and an asymptotic approximation for the average false alarm period of the proposed algorithm. Additionally, we derive an asymptotic upper bound and asymptotic approximation for the average detection delay of the proposed algorithm under a certain anomaly. We show the analytical tradeoff between the change detection performance and the differential privacy level, controlled via the local perturbation noise. Experiments illustrate that the proposed algorithm offers a good tradeoff between privacy and quick change detection against the UDP flooding and spam attacks in a real Internet of Things (IoT) network.

11:30
Shuchen Cao (University of Nebraska-Lincoln, United States)
Ruizhi Zhang (University of Georgia, United States)
An Adaptive Approach for Online Monitoring of Large Scale Data Streams

ABSTRACT. We propose an adaptive top-r method to monitor large-scale data streams where the change may affect a set of unknown data streams at some unknown time. Our proposed method combines the Benjamini-Hochberg (BH) false discovery rate (FDR) control procedure and the CUSUM procedure to construct a global monitoring statistic that can estimate the number of changed data streams adaptively and can detect the change quickly. Theoretically, we show that our proposed method can estimate the number of changed data streams under both the pre-change and post-change regimes. Moreover, we perform simulations and two case studies to show its detection efficiency.
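
The FDR-control building block is the standard BH step-up procedure, sketched below; in the paper it is combined with per-stream CUSUM statistics to adaptively select the top-r streams, a combination this sketch does not attempt to reproduce:

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Standard Benjamini-Hochberg step-up procedure at FDR level q.
    Returns a boolean array marking which hypotheses are rejected."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)                        # ranks of sorted p-values
    thresholds = q * np.arange(1, m + 1) / m     # BH step-up thresholds q*k/m
    below = p[order] <= thresholds
    # Largest rank k whose sorted p-value passes its threshold.
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```

In the monitoring context, the number of rejections serves as an adaptive estimate of the number of changed streams.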

11:50
Xiaoou Li (University of Minnesota, United States)
Hongru Zhao (University of Minnesota, United States)
Globally-optimal Greedy Experiment Selection For Active Sequential Estimation

ABSTRACT. Motivated by applications such as computerized adaptive testing, sequential rank aggregation, and data source selection in signal processing, we study the problem of active sequential estimation, which involves adaptively selecting experiments for sequentially collected data. The goal is to design the experiment selection rule so that the model parameter can be estimated more accurately. Greedy information-based experiment selection methods, optimizing the information gain for one-step ahead, have been employed in practice thanks to their computational convenience and flexibility to context change. However, statistical analysis is limited to the one-dimensional case due to the problem’s combinatorial nature and the seemingly limited capacity of greedy algorithms in higher-dimensional spaces. In this talk, I will present the statistical analysis for a general class of greedy experiment selection methods that extend existing techniques for specific problems and introduce new methods with improved computational efficiency. Theoretical results regarding the consistency, asymptotic normality and asymptotic optimality will be discussed.

12:10
Liyan Xie (The Chinese University of Hong Kong, Shenzhen, China)
Jie Gao (The Chinese University of Hong Kong, Shenzhen, China)
Zhaoyuan Li (The Chinese University of Hong Kong, Shenzhen, China)
Sequential Change-Point Detection for Correlation Matrices

ABSTRACT. We study the quickest change detection for correlation matrices. We assume the pre-change correlation matrix is an identity matrix (i.e., no pairwise correlation) and the post-change correlation matrix is unknown. We propose detection statistics based on the sample correlation matrices. Two types of detection statistics are studied: the sum-type statistic, which is good at detecting dense changes, and the max-type statistic, which is good at detecting sparse changes in the correlation matrix. We run the two types of detection procedures in parallel to improve detection efficiency. Theoretical guarantees are provided for the evaluation criteria, including average run length to false alarm and the detection delay. Synthetic and real data examples are also provided to validate the performance.
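
The two kinds of statistics can be sketched on a single sample window as below; the exact statistics, scalings, and calibrations in the paper may differ, so this is illustrative only:

```python
import numpy as np

def correlation_change_statistics(X):
    """Illustrative sketch: sum-type and max-type statistics on a sample
    correlation matrix, measuring deviation from the identity (the no-pairwise-
    correlation pre-change model). X: (n_samples, d) data window."""
    R = np.corrcoef(X, rowvar=False)
    off = R[np.triu_indices_from(R, k=1)]   # strictly upper-triangular entries
    sum_stat = np.sum(off ** 2)             # aggregates many small (dense) changes
    max_stat = np.max(np.abs(off))          # catches a few large (sparse) changes
    return sum_stat, max_stat
```

Running CUSUM-style recursions on both statistics in parallel covers dense and sparse post-change correlation structures simultaneously.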

10:30-12:30 Session WeB2: ADGB2 - Electric Power Systems II

Organizers - Alejandro Dominguez-Garcia and Subhonmesh Bose

Location: Solarium
10:30
Dominic Gross (University of Wisconsin-Madison, United States)
Xue Lyu (Pacific Northwest National Laboratory, United States)
Towards constrained grid-forming control

ABSTRACT. Replacing conventional fuel-based synchronous generators with converter-interfaced resources results in significantly different power system dynamics and challenges standard operating paradigms and controls. In this context, grid-forming (GFM) converters are envisioned to replace synchronous machines as the cornerstone of future power systems. However, compared to synchronous generators, power electronic converters and their renewable generation sources have limited inherent energy storage and overload capability. Typically GFM controls are designed neglecting these constraints and then augmented with heuristic limiters for a subset of constraints. Instead, in this work we propose a more systematic approach to GFM control that explicitly accounts for constraints of both voltage source converters and their power source in the GFM control design. The results are illustrated using a grid-connected single-stage photovoltaic system.

10:50
Bernard Lesieutre (University of Wisconsin-Madison, United States)
Bhumesh Kumar (University of Wisconsin-Madison, United States)
Participation Factor Analysis of Nonlinear Systems: Beyond Equilibria

ABSTRACT. Participation factors are widely used in power system analysis to associate dynamic behavior to state variables. Participation factors account for both the appearance of the dynamic modes of interest in the response of the state variables, and the impact of state variables on the excitation of the dynamics. They include aspects of observability and controllability, and they are calculated as the elementwise product of right- and left-eigenvectors associated with an eigenvalue. It is the purpose of this work to extend the use of participation factors beyond the analysis of constant fixed point solutions. We return to earlier work that attempted to extend participation factors to limit cycles through a nonlinear dynamic phasor transformation resulting in a system with a constant steady-state. In this paper we present additional analysis to support this extension by examining sensitivities involving limit cycle characteristics. Then we explore further extensions to general systems and trajectories, attempting to assign participation-factor-like measures that associate dynamic phenomena with state variables. We use the Rössler system, which exhibits ordinary fixed points, limit cycles, and chaotic behavior depending on parameters.
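
The classical equilibrium-based computation that this talk extends beyond fixed points can be sketched as:

```python
import numpy as np

def participation_factors(A):
    """Classical participation factors for the linearized system x' = A x.
    P[k, i] is the product of the k-th entry of the i-th right eigenvector
    and the k-th entry of the i-th left eigenvector. With left eigenvectors
    taken from inv(V), each column of P sums to one since inv(V) @ V = I."""
    eigvals, V = np.linalg.eig(A)   # columns of V: right eigenvectors
    W = np.linalg.inv(V)            # rows of W: left eigenvectors (normalized)
    P = V * W.T                     # elementwise: P[k, i] = V[k, i] * W[i, k]
    return eigvals, P
```

Large |P[k, i]| indicates that state k both strongly exhibits and strongly excites mode i, which is the observability/controllability pairing the abstract describes.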

11:10
Steven Low (Caltech, United States)
Three-phase Transformer Modeling

ABSTRACT. We present a simple and transparent method to derive the admittance matrices of three-phase banks of single-phase transformers in an unbalanced setting. The basic idea is to explicitly separate a transformer model into an internal model that specifies the characteristics of the constituent single-phase transformers, and a conversion rule that maps its internal variables to its terminal variables. Its admittance matrix can then be derived by eliminating the internal variables from the set of equations describing the internal model and the conversion rule. The method is general and provides rigor and clarity, facilitating extensions. It also reveals a transformer model that is unified and modular.

10:30-12:30 Session WeB3: RZSF1 - Recent Advances in Optimization I

Organizers - Richard Zhang and Salar Fattahi

Location: Butternut
10:30
Xingyu Xu (Carnegie Mellon University, United States)
Yandi Shen (University of Chicago, United States)
Yuejie Chi (Carnegie Mellon University, United States)
Cong Ma (University of Chicago, United States)
The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing

ABSTRACT. We propose ScaledGD(λ), a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparameterized factor representations, ScaledGD(λ) starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, ScaledGD(λ) is remarkably robust to ill-conditioning compared to vanilla gradient descent (GD) even with overparameterization. Specifically, we show that, under the Gaussian design, ScaledGD(λ) converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla GD, which suffers from a polynomial dependency on the condition number. Our work provides evidence of the power of preconditioning in accelerating convergence without hurting generalization in overparameterized learning.
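To make the idea concrete, here is a schematic numpy sketch of damped preconditioned gradient descent on an overparameterized symmetric matrix factorization problem. This is not the paper's exact ScaledGD(λ) algorithm (which operates in the sensing, rather than full-observation, setting); the step size, damping value, and problem sizes are illustrative choices only.

```python
import numpy as np

# Schematic sketch of damped preconditioned gradient descent for
# overparameterized symmetric factorization: minimize ||F F^T - M||_F^2.
# NOT the paper's exact ScaledGD(lambda); parameters are illustrative.
rng = np.random.default_rng(0)
n, r_true, r_over = 30, 2, 5
U = rng.standard_normal((n, r_true))
M = U @ U.T                                   # ground-truth rank-2 matrix

F = 1e-3 * rng.standard_normal((n, r_over))   # small random initialization
eta, lam = 0.2, 1e-3
for _ in range(800):
    G = (F @ F.T - M) @ F                     # gradient of 0.25*||F F^T - M||_F^2
    P = np.linalg.inv(F.T @ F + lam * np.eye(r_over))
    F -= eta * G @ P                          # damped preconditioned step

rel_err = np.linalg.norm(F @ F.T - M) / np.linalg.norm(M)
```

The damping λ keeps the preconditioner (FᵀF + λI)⁻¹ well defined even though F is overparameterized and nearly rank-deficient at initialization.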

10:50
Joshua Cutler (University of Washington, United States)
Mateo Diaz (Johns Hopkins University, United States)
Dmitriy Drusvyatskiy (University of Washington, United States)
Stochastic approximation with decision-dependent distributions: asymptotic normality and optimality

ABSTRACT. We analyze a stochastic approximation algorithm for decision-dependent problems, wherein the data distribution used by the algorithm evolves along the iterate sequence. The primary examples of such problems appear in performative prediction and its multiplayer extensions. We show that under mild assumptions, the deviation between the average iterate of the algorithm and the solution is asymptotically normal, with a covariance that nicely decouples the effects of the gradient noise and the distributional shift. Moreover, building on the work of Hájek and Le Cam, we show that the asymptotic performance of the algorithm is locally minimax optimal.

11:10
Jianhao Ma (University of Michigan, United States)
Salar Fattahi (University of Michigan, United States)
Blessing of Nonconvexity in Factorized Models

ABSTRACT. Factorized models, from low-rank matrix recovery to deep neural networks, play a central role in many modern machine learning problems. Despite their widespread applications, problems based on factorized models are deemed difficult to solve in their worst case due to their inherent nonconvexity. Our talk is inspired by the recent observations in the optimization and machine learning communities that many realistic and practical instances of factorized models are far from their worst-case scenarios. We study a natural nonconvex and nonsmooth formulation of two prototypical factorized models, namely low-rank matrix factorization and deep linear regression. On the negative side, we show that these problems do not have a benign landscape: with high probability, there always exists a true solution that is not a global minimum of the loss function. However, on the positive side, we show that a simple subgradient method with small initialization is oblivious to such problematic solutions; instead, it converges to a balanced solution that is not only close to the ground truth but also enjoys a flat local landscape.

11:30
Jose Gallego-Posada (Mila, University of Montreal; Meta Platforms, Inc., Canada)
Hao-Jun Michael Shi (Meta Platforms, Inc., United States)
Michael Rabbat (Meta Platforms, Inc., Canada)
Explorations on scalable preconditioned gradient methods for deep learning

ABSTRACT. The growth in model size observed in current deep learning applications has reinforced the research interest in stochastic preconditioned gradient methods. Amongst an extensive list of alternatives, distributed Shampoo is a promising preconditioned stochastic gradient method for training neural networks, motivated as an approximation to (the often intractable) full-matrix Adagrad. In this talk, I will provide a technical introduction to the distributed Shampoo algorithm and its PyTorch implementation for distributed data-parallel training. We discuss the tradeoffs between memory, efficiency, and model quality when tuning distributed Shampoo and other block-preconditioned methods for industrial settings, as well as the performance optimizations necessary to produce a fast, high-quality implementation.

11:50
Ahmed Khaled (Princeton University, United States)
Konstantin Mishchenko (Samsung AI Center, UK)
Chi Jin (Princeton University, United States)
DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method

ABSTRACT. This paper proposes a new easy-to-implement parameter-free gradient-based optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient -- matching the convergence rate of optimally tuned gradient descent in convex optimization up to a logarithmic factor without tuning any parameters -- and universal -- automatically adapting to both smooth and nonsmooth problems. While popular algorithms such as AdaGrad, Adam, or DoG compute a running average of the squared gradients, DoWG maintains a new distance-based weighted version of the running average, which is crucial to achieving the desired properties. To the best of our knowledge, DoWG is the first parameter-free, efficient, and universal algorithm that does not require backtracking search procedures. It is also the first parameter-free AdaGrad-style algorithm that adapts to smooth optimization. To complement our theory, we also show empirically that DoWG trains at the edge of stability, and we validate its effectiveness on practical machine learning tasks. This paper further uncovers the underlying principle behind the success of the AdaGrad family of algorithms by presenting a novel analysis of Normalized Gradient Descent (NGD), which shows that NGD adapts to smoothness when it exists, with no change to the stepsize. This establishes the universality of NGD and partially explains the empirical observation that it trains at the edge of stability in a much more general setup than standard gradient descent. The latter might be of independent interest to the community.
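As a rough illustration of the flavor of such updates, here is a numpy sketch of a DoWG-style iteration on a toy convex quadratic. The update rule is our paraphrase of the paper's (a running max distance from the initial point, and a distance-weighted squared-gradient sum); the objective, initialization, and seed value are made up for the demo, so consult the paper for the exact algorithm and guarantees.

```python
import numpy as np

# DoWG-style parameter-free step on f(x) = 0.5 * x^T A x (our paraphrase
# of the update rule; problem and constants are illustrative only).
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x0 = np.array([5.0, 5.0])
x, r, v = x0.copy(), 1e-2, 0.0       # r: running max distance (seeded small)
f_best = f(x0)
for _ in range(1000):
    g = grad(x)
    r = max(r, np.linalg.norm(x - x0))      # distance-from-start estimate
    v += r**2 * (g @ g)                     # distance-weighted gradient sum
    x = x - (r**2 / np.sqrt(v)) * g         # step size: r^2 / sqrt(v), untuned
    f_best = min(f_best, f(x))
```

Note that no step size is tuned anywhere: the distance estimate r grows automatically from its small seed, which is the "parameter-free" aspect the abstract describes.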

12:10
David Rosen (Northeastern University, United States)
Practical Algorithms for Globally Optimal Geometric Machine Perception

ABSTRACT. Many fundamental geometric estimation tasks in robotics and computer vision naturally take the form of high-dimensional nonconvex optimization problems over smooth manifolds; this class includes (for example) the fundamental problems of simultaneous localization and mapping (in robotics), 3D reconstruction (in computer vision), and sensor network localization (in distributed sensing). These problems are known to be computationally hard in general, with many local minima that can entrap the smooth local optimization methods commonly applied to solve them. The result is that standard machine perception algorithms can be surprisingly brittle, often returning egregiously wrong estimates in practice.

In this talk, we describe a novel class of *certifiably correct* estimation algorithms that can efficiently recover globally optimal solutions of these challenging estimation problems in many practical settings. Our approach is based upon a family of convex (semidefinite) relaxations whose minimizers we prove provide *exact*, *globally optimal* solutions of the original estimation problem under moderate measurement noise. We illustrate this class of methods using the fundamental problem of robotic mapping as a running example.

10:30-12:30 Session WeB4: Quantum Computing
Location: Pine
10:30
Sayantan Chakraborty (Centre for Quantum Technologies, National University of Singapore, Singapore)
Rahul Jain (Centre for Quantum Technologies, National University of Singapore, Singapore)
Pranab Sen (Tata Institute of Fundamental Research, Mumbai, India)
One-Shot Non-Catalytic Distributed Purity Distillation

ABSTRACT. In this paper we revisit the problem of distributed purity distillation in the one-shot setting. The problem considers a bipartite quantum state $\rho^{AB}$ shared between two parties, Alice and Bob, who are allowed local unitary operations and one-way classical communication. The goal is to extract pure qubit states in their respective laboratories using these given resources. A caveat of this problem is that all known protocols which recover the optimal number of pure qubit states require some pure qubits at the beginning of the protocol, to be used as ancilla. These ancilla qubits are used catalytically and recovered at the end of the protocol, but typically a large amount of ancilla is necessary for the protocol to work. Some workarounds exist via bootstrapping in the asymptotic iid setting, but aside from being rather ad hoc, these ideas fail in the one-shot setting. In this paper, we present a one-shot purity distillation protocol which requires no ancilla qubits for most choices of the state $\rho^{AB}$. Due to the nature of one-shot quantities, there do exist certain pathological situations in which our protocol requires some ancilla to work, but we show that these pathological cases do not arise when multiple iid copies of the underlying state are available.

10:50
Yu-Chen Shen (National Taiwan University, Taiwan)
Li Gao (University of Houston, United States)
Hao-Chung Cheng (National Taiwan University, Taiwan)
Privacy Amplification Against Quantum Side Information Via Regular Random Binning

ABSTRACT. We consider privacy amplification against quantum side information by using regular random binning as an effective extractor. For constant-type sources, we obtain both error exponent and strong converse bounds in terms of the so-called quantum Augustin information. Via type decomposition, we then recover the error exponent for independent and identically distributed sources proved by Dupuis [arXiv:2105.05342]. As an application, we obtain an achievable secrecy exponent for classical-quantum wiretap channel coding in terms of the Augustin information, which solves an open problem in [IEEE Trans. Inf. Theory, 65(12):7985, 2019]. Our approach is to establish an operational equivalence between privacy amplification and quantum soft covering; this may be of independent interest.

11:10
Michael Graifer (Tel Aviv University, Israel)
Yuval Kochman (HUJI, Israel)
Ofer Shayevitz (Tel Aviv University, Israel)
Quantum Key Distribution with State Replacement
PRESENTER: Michael Graifer

ABSTRACT. Quantum key distribution protocols have been studied for almost four decades, and have already seen real-world applications. Such implementations often operate in naturally noisy environments (e.g., sensor thermal noise, sun radiation, etc.) with known or measurable noise levels. However, existing security proofs implicitly attribute any channel noise to a possible eavesdropper, an overly pessimistic assumption that can adversely affect the guaranteed protocol key-rate. In this paper, we analyze general prepare-and-measure QKD schemes, and show how to improve the bounds on the key-rate under the assumption of an independent state replacement noise at the receiver. Taking BB84 as a concrete example, our results theoretically guarantee a higher key-rate, as a function of the minimal possible replacement probability. In practice, for some parameter regimes, this guarantees a higher tolerable QBER.

11:30
James I. Lathrop (Iowa State University, United States)
Peter-Michael Osera (Grinnell College, United States)
Addison W. Schmidt (Iowa State University, United States)
Jesse C. Slater (Iowa State University, United States)
Verifying Chemical Reaction Networks with the Isabelle Theorem Prover

ABSTRACT. Molecular programming involves algorithmically controlling nanoscale devices to compute functions, create structures, and manipulate molecules to perform various tasks. Many programmed nanoscale devices have been successfully demonstrated in the laboratory with physical experiments. Some of the applications for these devices are safety critical, and thus require extensive verification for their correctness. Unfortunately, experiments and deployment of these applications require molecular device counts on the order of ten to the eighteenth power, making verification via model checking and simulation infeasible. Verification of these systems must therefore be accomplished through proof techniques independent of the number of devices in the system.

Developing these proofs can be tedious and time-consuming. In this paper, we develop general proof constructs for a subclass of molecular programs that compute semilinear functions modeled by a stochastic chemical reaction network. These proof techniques are developed in Isabelle, an automated theorem prover, with resulting theory files packaged so that they can be reused.

13:30-15:10 Session WeC1: ADGB3 - Electric Power Systems III

Organizers - Alejandro Dominguez-Garcia and Subhonmesh Bose

Location: Library
13:30
Gerald Ogbanna (Systems Engineering, Cornell University, United States)
Lindsay Anderson (Cornell University, United States)
Aggregating Power Networks for Decentralized Control: A Spectral Clustering Approach

ABSTRACT. With the large-scale integration of variable distributed energy resources (DERs) and the growing influence of aggregators in power networks, electricity generation and grid operation are increasingly becoming decentralized. Aggregating the power network into smaller groups of loosely coupled subsystems facilitates operational flexibility, the prevention of widespread blackouts, and efficient grid operation.

In this work, we establish a notion of similarity and closeness of the buses in the network based on the sensitivity matrix obtained from the power flow solution. The underlying coupling between buses is then represented as a directed graph (similarity graph). We utilize the computationally efficient and highly scalable spectral clustering technique to analyze the Laplacian of the similarity graph. Results of the proposed clustering technique on various IEEE test systems are provided.
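The pipeline described here can be sketched in a few lines. Below is a toy numpy example with a made-up symmetric similarity matrix standing in for the sensitivity-based similarity graph (the paper's graph is directed and derived from power-flow sensitivities); for a 2-way split, the sign pattern of the Fiedler vector of the graph Laplacian recovers the two loosely coupled groups:

```python
import numpy as np

# Toy sketch: similarity matrix -> graph Laplacian -> spectral split.
# Similarity values are made up (in the paper they come from the power
# flow sensitivity matrix); for a 2-way partition, the sign of the
# Fiedler vector suffices in place of k-means on the embedding.
S = np.array([[0.0, 5.0, 4.0, 0.1, 0.0, 0.0],
              [5.0, 0.0, 6.0, 0.0, 0.1, 0.0],
              [4.0, 6.0, 0.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.0, 7.0, 5.0],
              [0.0, 0.1, 0.0, 7.0, 0.0, 6.0],
              [0.0, 0.0, 0.1, 5.0, 6.0, 0.0]])  # two loosely coupled 3-bus groups

L = np.diag(S.sum(axis=1)) - S      # unnormalized graph Laplacian
w, V = np.linalg.eigh(L)            # eigenvalues in ascending order
fiedler = V[:, 1]                   # eigenvector of 2nd-smallest eigenvalue
labels = (fiedler > 0).astype(int)  # 2-way partition of the buses
```

The weak 0.1 couplings between the two 3-bus blocks are exactly the "loose coupling" the clustering is meant to cut.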

13:50
Ahmed Alahmed (Cornell University, United States)
Guido Cavraro (National Renewable Energy Laboratory, United States)
Andrey Bernstein (National Renewable Energy Laboratory, United States)
Lang Tong (Cornell University, United States)
Operating-Envelopes-Aware Decentralized Welfare Maximization for Energy Communities
PRESENTER: Lang Tong

ABSTRACT. We consider the optimal scheduling of consumption and distributed energy resources within an energy community managed by a profit-neutral community operator in a distribution system. We assume that the distribution system operator transacts with its customers under the regulated net energy metering tariff, and it sets operating envelopes to ensure system reliability. We propose an incentive-aligned community pricing mechanism under which the decentralized optimization of community members' benefits implies the optimization of overall community welfare. The proposed pricing mechanism satisfies the cost-causation principle and ensures the stability of the energy community in a coalition game setting. Numerical examples provide insights into the characteristics of the proposed pricing mechanism and quantitative measures of its performance.

14:10
Fernando Paganini (Universidad ORT Uruguay, Uruguay)
Andres Ferragut (Universidad ORT Uruguay, Uruguay)
Dynamic Load Balancing of Selfish Drivers between Spatially Distributed Electrical Vehicle Charging Stations

ABSTRACT. This paper considers an electrical vehicle recharging infrastructure made up of physically separate stations serving spatially distributed requests for charge. Arriving EVs receive feedback on transport times to each station, and waiting times at congested stations, based on which they make a selfish selection. We present a fluid model of the resulting dynamics, in particular modeling queueing delays as a function of fluid queues, and two different models of client departures: given sojourn times, or given service times. In each case, the joint load balancing dynamics is related to a convex program, a suitable variant of a centralized optimal transport problem. In particular, we show the correspondence between equilibrium points and the corresponding optima, and use Lagrange duality to interpret the convergence properties of the dynamics. The results have similarities and differences with classical work on selfish routing for transportation networks. We present illustrative simulations, which also explore the validity of the model beyond the fluid assumption.

14:30
Young-ho Cho (The University of Texas at Austin, United States)
Hao Zhu (University of Texas at Austin, United States)
Topology-aware Piecewise Linearization of the AC Power Flow through Generative Modeling

ABSTRACT. Effective power flow modeling critically affects the ability to efficiently solve large-scale grid optimization problems, especially those with topology-related decision variables. In this work, we put forth a generative modeling approach to obtain a piecewise linear (PWL) approximation of AC power flow by training a simple neural network model on actual data samples. By using the ReLU activation, the NN models produce a PWL mapping from the input voltage magnitudes and angles to the output power flows and injections. Our proposed generative PWL model uniquely accounts for the nonlinear and topology-related couplings of power flow models, and thus it can greatly improve the accuracy and consistency of output power variables. Most importantly, it enables reformulating the nonlinear power flow and line-status-related constraints as mixed-integer linear ones, so that one can efficiently solve grid topology optimization tasks such as the AC optimal transmission switching (OTS) problem. Numerical tests using the IEEE 14- and 118-bus test systems demonstrate the modeling accuracy of the proposed PWL approximation, as well as its ability to enable competitive OTS solutions at very low computational cost.
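The structural fact this abstract relies on (ReLU networks are piecewise linear, hence encodable with mixed-integer linear constraints) can be checked numerically. This toy numpy snippet uses made-up random weights, not a trained power-flow surrogate: away from activation boundaries the finite-difference slope is constant, i.e., the map is locally affine.

```python
import numpy as np

# A ReLU network is piecewise linear, which is what makes mixed-integer
# linear encodings of such surrogates possible. Weights here are made up
# (not a trained power-flow model): while the ReLU activation pattern is
# fixed, the input-output map is exactly affine, so secant slopes agree.
rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((8, 2)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((1, 8)), rng.standard_normal(1)
f = lambda x: W2 @ np.maximum(W1 @ np.asarray(x) + b1, 0.0) + b2

x = np.array([0.3, -0.2])
eps = 1e-6
slope_a = (f(x + [eps, 0.0]) - f(x)) / eps             # secant slope near x
slope_b = (f(x + [2 * eps, 0.0]) - f(x + [eps, 0.0])) / eps
```

In the MILP reformulation, each ReLU's on/off state becomes a binary variable selecting which affine piece is active.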

13:30-15:10 Session WeC2: BHRS2 - Learning and Networks II

Organizers - Bruce Hajek and R. Srikant

Chair:
Aleksander Stolyar (University of Illinois at Urbana Champaign, United States)
Location: Solarium
13:30
Julia Gaudio (Northwestern University, United States)
Miklos Racz (Northwestern University, United States)
Colin Sandon (MIT, United States)
Jiaming Xu (Duke University, United States)
Dana Yang (Cornell University, United States)
"All-Something-Nothing" Phase Transitions in Planted Subgraph Recovery

ABSTRACT. This talk delves into the problem of inferring a k-factor, specifically a spanning k-regular graph, planted within an ER random graph. Our research uncovers an interesting "all-something-nothing" phase transition. Specifically, we demonstrate that as the average degree of the ER random graph surpasses the critical threshold of 1/k, the inference problem undergoes a transition from near-perfect recovery ("all" phase) to partial recovery ("something" phase). Moreover, as the average degree continues to increase, the accuracy of recovery diminishes to zero, leading to the onset of the "nothing" phase. This finding complements the recent result by Mossel et al., who established that for certain sufficiently dense graphs, the problem undergoes an "all-or-nothing" phase transition, jumping from near-perfect recovery to near-zero accuracy.

13:50
Akshay Mete (Texas A&M University, United States)
Rahul Singh (Indian Institute of Science, Bengaluru, India, India)
P. R. Kumar (Texas A&M University, United States)
The Reward Biased Method: An Optimism based Approach for Reinforcement Learning
PRESENTER: P. R. Kumar

ABSTRACT. A fundamental problem in reinforcement learning is what is referred to as the closed-loop identifiability problem, the conflict between exploration and exploitation, or the problem of dual control. Generally, for a system operating in closed-loop, one can only learn the closed-loop dynamics. To learn enough about the open-loop dynamics to be able to choose an optimal control law, one will need to excite the system to explore it. How to optimally do so?

One approach proposed in 1980 consisted of adding a bias term that favored model parameters with smaller costs (or larger rewards) to the log-likelihood function, and then using a certainty equivalent control law. It was shown that this optimistic approach yields a long-term average cost (or reward) that is optimal for the unknown system. This result was established under various conditions in a variety of contexts including Multi-Armed Bandits (MABs), Markov Decision Processes (MDPs), Linear Quadratic Gaussian (LQG) systems, nonlinear systems, and controlled diffusions. This method is now called the "Reward-Biased Method" (RBM).

An alternative optimistic approach was proposed by Lai and Robbins in 1983 in the context of MABs. It consisted of developing confidence intervals for each bandit, and using the upper end of the confidence interval to choose between bandits. Importantly, it was shown that this approach is asymptotically optimal with respect to the order of the "regret" as well as its pre-constant. Subsequently this approach was extended under various conditions to systems such as MDPs and Linear Quadratic (LQ) Gaussian systems. This method is now called the "Upper Confidence Bound" (UCB) method.

Over the past few years, the analysis of RBM has been extended to characterize its regret for MABs, MDPs and LQ sub-Gaussian systems. Also, comparative simulation studies have been conducted on the RBM, UCB and other methods. These results appear to show that RBM has excellent finite-time performance. RBM can be considered as a Lagrangian formulation, and UCB as a primal constrained optimization problem. The simulation results appear to suggest that the upper confidence regions of UCB are too large, or equivalently that the implicit Lagrange multiplier corresponding to UCB is too small.

This paper will provide an account of these recent developments as well as some open problems.

14:10
Advait Parulekar (The University of Texas at Austin, United States)
Liam Collins (The University of Texas at Austin, United States)
Karthikeyan Shanmugam (Google, India)
Aryan Mokhtari (University of Texas at Austin, United States)
Sanjay Shakkottai (The University of Texas at Austin, United States)
On representation learning with the InfoNCE loss

ABSTRACT. Contrastive learning is a popular means for learning representations of data without the need for labeled samples, with the InfoNCE loss being the most widely used objective for learning such representations. This method has had great empirical success, yet it is not well understood how this loss enables representation learning. In brief, the method associates each data sample (e.g., an image) with positive augmentations (e.g., rotation, crop) of itself along with a finite number of negative samples (typically random, unrelated images); each image with its augmentations forms a cluster in the image space. In this work, we show that if the clusters associated with related images (e.g., images of dogs) are possibly non-overlapping but are close and intertwined, then the InfoNCE loss provably recovers the underlying representation.
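For readers unfamiliar with the loss, a minimal single-anchor version of InfoNCE (standard cross-entropy-over-similarities form; the vectors and temperature below are made up for illustration) looks like this:

```python
import numpy as np

# Minimal single-anchor InfoNCE loss: one positive (an "augmentation" of
# the anchor) scored against K negatives via cosine similarity.
# Representations and temperature are made up for illustration.
def info_nce(anchor, positive, negatives, tau=0.1):
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)]
                      + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                 # for numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(0)
z = rng.standard_normal(16)                # anchor representation
z_pos = z + 0.1 * rng.standard_normal(16)  # positive: close to the anchor
negs = [rng.standard_normal(16) for _ in range(8)]
loss = info_nce(z, z_pos, negs)            # small when the positive stands out
```

Minimizing this loss pulls the anchor toward its augmentations and pushes it away from the negatives, which is the clustering intuition the abstract appeals to.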

14:30
Qining Zhang (University of Michigan - Ann Arbor, United States)
Lei Ying (The University of Michigan, Ann Arbor, United States)
Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms

ABSTRACT. We consider a stochastic multi-armed bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward maximization throughout a sequence of T consecutive rounds. Though each objective has been individually well-studied, i.e., best arm identification for (i) and regret minimization for (ii), the simultaneous realization of both objectives remains an open problem, despite its practical importance. We introduce Regret Optimal Best Arm Identification which aims to achieve these dual objectives. To solve this problem, we characterize its fundamental limits and develop low-complexity algorithms for fast and regret optimal best arm identification.

14:50
Daniel Freund (MIT, United States)
Thodoris Lykouris (MIT, United States)
Wentao Weng (MIT, United States)
Quantifying the Cost of Learning in Queueing Systems

ABSTRACT. Queueing systems are widely applicable stochastic models with use cases in communication networks, healthcare, service systems, etc. Although their optimal control has been extensively studied, most existing approaches assume perfect knowledge of system parameters. Of course, this assumption rarely holds in practice where there is parameter uncertainty, thus motivating a recent line of work on bandit learning for queueing systems. This nascent stream of research focuses on the asymptotic performance of the proposed algorithms. In this paper, we argue that an asymptotic metric, which focuses on late-stage performance, is insufficient to capture the intrinsic statistical complexity of learning in queueing systems which typically occurs in the early stage. Instead, we propose the Cost of Learning in Queueing (CLQ), a new metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty. We characterize the CLQ of a single-queue multi-server system, and then extend these results to multi-queue multi-server systems and networks of queues. In establishing our results, we propose a unified analysis framework for CLQ which bridges Lyapunov and bandit analysis and could be of independent interest.

15:10
Anton Xue (University of Pennsylvania, United States)
Rajeev Alur (University of Pennsylvania, United States)
Eric Wong (University of Pennsylvania, United States)
Stability Guarantees for Feature Attributions with Multiplicative Smoothing

ABSTRACT. Explanation methods for machine learning models tend to not provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. To achieve such a model, we develop a smoothing method called Multiplicative Smoothing (MuS). We show that MuS overcomes theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with a variety of feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.

13:30-15:10 Session WeC3: Information Theory and Coding
Location: Butternut
13:30
Yuming Han (Dept of Electrical and Computer Engineering, Texas A&M University, United States)
Zixiang Xiong (Dept of Electrical and Computer Engineering, Texas A&M University, United States)
Anders Host-Madsen (Dept of Electrical Engineering, University of Hawaii at Manoa, United States)
On Energy-Delay Tradeoff in Uncoordinated MAC

ABSTRACT. Polyanskiy [1] proposed a framework for the MAC problem with a large number of users, where users employ a common codebook in the finite blocklength regime. In this work, we extend [1] to the case where the number of active users is random and there is also a delay constraint. We first define a random-access channel and derive a general converse bound. Our bound captures the basic tradeoff between the required energy and the delay constraint. We then propose an achievable bound for block transmission; in this case, all packets are transmitted in the second half of the block to avoid interference. We then study treating interference as noise (TIN) with both a single user and multiple users. Last, we derive an achievable bound for the packet splitting model, which allows users to split each packet into two parts with different blocklengths. Our numerical results indicate that, when the delay is large, TIN is effective; on the other hand, packet splitting outperforms it as the delay decreases.

13:50
Sina Eghbal (University of Cincinnati, United States)
Badri N. Vellambi (University of Cincinnati, United States)
Lawrence Ong (University of New Castle, Australia)
Parastoo Sadeghi (University of New South Wales (Canberra), Australia)
An Improved Greedy Cover Algorithm for Pliable Index Coding

ABSTRACT. We present the ImpGrCov algorithm, a novel algorithm for devising binary linear codes for pliable index coding that have lower codelengths than those devised using state-of-the-art algorithms such as the greedy cover and random cover algorithms. The proposed algorithm employs a novel, improved decoder that uses a representative matrix to keep track of relationships between undecoded messages, along with a demand-based exponential weighting of receivers. Simulations indicate a 10–19% improvement in average codelength over the best-performing algorithm in the literature.

14:10
Eric Ruzomberka (Princeton University, United States)
Homa Nikbakht (Princeton University, United States)
Christopher Brinton (Purdue University, United States)
David Love (Purdue University, United States)
H. Vincent Poor (Princeton University, United States)
Derandomizing Codes for the Binary Adversarial Wiretap Channel of Type II

ABSTRACT. We revisit the binary adversarial wiretap channel (AWTC) of type II in which an active adversary can read a fraction $r$ and flip a fraction $p$ of codeword bits. The semantic-secrecy capacity of the AWTC II is partially known, where the best-known lower bound is non-constructive, proven via a random coding argument that uses a large number (exponential in the blocklength $n$) of random bits to seed the random code. In this paper, we establish a new derandomization result in which we match the best-known lower bound of $1-H_2(p)-r$, where $H_2(\cdot)$ is the binary entropy function, via a random code that uses a small seed of only $O(n^2)$ bits. Our random code construction is a novel application of pseudolinear codes -- a class of non-linear codes that have $k$-wise independent codewords when picked at random, where $k$ is a design parameter. As the key technical tool in our analysis, we provide a soft-covering lemma in the flavor of Goldfeld, Cuff and Permuter (Trans. Inf. Theory 2016) that holds for random codes with $k$-wise independent codewords.

14:30
Tharindu Adikari (University of Toronto, Canada)
Haider Al-Lawati (University of Toronto, Canada)
Jason Lam (Huawei Technologies Canada, Canada)
Zhenhua Hu (Huawei Technologies Canada, Canada)
Stark Draper (University of Toronto, Canada)
Straggler Exploitation in Distributed Computing Systems with Task Grouping

ABSTRACT. We consider the problem of stragglers in distributed computing systems. Stragglers are compute nodes that unpredictably become slow, which often increases the completion times of tasks. One popular method for mitigating stragglers is to replicate work: of all replications of a task, only the first to complete is accepted, and the rest are discarded. Discarding work completed by some of the workers wastes resources. In this paper, we propose a method for exploiting the work completed by stragglers rather than discarding it. We increase the granularity of the assigned work and the frequency of worker updates. We demonstrate that the proposed method reduces the completion time of tasks via experiments performed on a simulated cluster as well as on Amazon EC2 with Apache Hadoop.

14:50
Vinayak Ramkumar (Tel Aviv University, Israel)
Netanel Raviv (Washington University in St. Louis, United States)
Itzhak Tamo (Tel Aviv University, Israel)
Coefficient Complexity in Low-Access Quantized Linear Computations

ABSTRACT. Quantizing linear computations over the reals is a common practice in many distributed systems, with applications in efficient, private, or robust machine learning algorithms. Recent work studied the access parameter of such computations, i.e., the maximum number of data entries required to accommodate any given computation. It was shown that lower access can be achieved with higher redundancy and vice versa, and a corresponding scheme based on covering codes was given. Surprisingly, the resulting scheme is universal to all possible two-valued quantizations, meaning that the same storage protocol can be used to retrieve any linear combination with two distinct coefficients---regardless of what those coefficients are---with the same access parameter. In this paper we extend this universal protocol to all possible quantizations with any number of values; while the storage parameter remains identical, the access parameter increases according to a new additive-combinatorics property we call coefficient complexity. We then turn to study the coefficient complexity---we characterize the complexity of small sets of coefficients, provide bounds, and identify coefficient sets having the highest and lowest complexity. Interestingly, arithmetic progressions have the lowest possible complexity, and some geometric progressions have the highest possible complexity, the former being particularly attractive for its common use in uniform quantization.

13:30-15:10 Session WeC4: Learning and Inference I
Location: Pine
13:30
Chedi Morchdi (University of Utah, United States)
Yi Zhou (University of Utah, United States)
Jie Ding (University of Minnesota, United States)
Bei Wang (University of Utah, United States)
Exploring Gradient Oscillation in Deep Neural Network Training

ABSTRACT. Understanding optimization in deep learning is a fundamental problem, and recent findings have challenged the previously held belief that gradient descent stably trains deep networks. In this study, we delve deeper into the instability of gradient descent during the training of deep networks. By employing gradient descent to train various modern deep networks, we provide empirical evidence demonstrating that a significant portion of the optimization progress occurs through the utilization of oscillating gradients. These gradients exhibit a high negative correlation between adjacent iterations. Furthermore, we make the following noteworthy observations about these gradient oscillations (GO): (i) GO manifests in different training stages for networks with diverse architectures; (ii) when using a large learning rate, GO consistently emerges across all layers of the networks; and (iii) when employing a small learning rate, GO is more prominent in the input layers compared to the output layers. These discoveries indicate that GO is an inherent characteristic of training different types of neural networks and may serve as a source of inspiration for the development of novel optimizer designs.
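
The oscillation phenomenon is easy to reproduce on a toy problem. The NumPy sketch below (ours, not the authors' deep-network experiments) runs gradient descent on a one-dimensional quadratic with a step size large enough that adjacent gradients are almost perfectly negatively correlated, yet the iterate still converges -- progress through oscillating gradients:

```python
import numpy as np

# Minimal illustration (not the paper's setup): GD on f(x) = 0.5 * a * x^2
# with step size lr such that lr * a is in (1, 2) still converges, but
# adjacent gradients are negatively correlated -- "gradient oscillation".
def run_gd(a=4.0, lr=0.4, steps=50, x0=1.0):
    x, grads = x0, []
    for _ in range(steps):
        g = a * x          # gradient of 0.5 * a * x^2
        grads.append(g)
        x -= lr * g        # lr * a = 1.6: oscillating yet convergent
    return x, np.array(grads)

x_final, grads = run_gd()
# Correlation between gradients at adjacent iterations is strongly negative.
adj_corr = np.corrcoef(grads[:-1], grads[1:])[0, 1]
print(f"final |x| = {abs(x_final):.2e}, adjacent-gradient corr = {adj_corr:.2f}")
```

Here each update overshoots the minimizer, so the gradient flips sign every iteration, while its magnitude still shrinks geometrically.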

13:50
Xinyi Tong (Tsinghua University, China)
Xiangxiang Xu (MIT, United States)
Shao-Lun Huang (Tsinghua Shenzhen International Graduate School, China)
Lizhong Zheng (MIT, United States)
Robust Transfer Learning Based on Minimax Principle
PRESENTER: Xinyi Tong

ABSTRACT. The similarity between target and source tasks is a crucial quantity for theoretical analyses and algorithm designs in transfer learning studies. However, this quantity is often difficult to capture precisely. To address this issue, we make a boundedness assumption on the task similarity and then propose a mathematical framework based on the minimax principle, which minimizes the worst-case expected population risk under this assumption. Furthermore, our proposed minimax problem can be solved analytically, which provides a guideline for designing robust transfer learning models. Based on the analytical expression, we interpret the influences of sample sizes, task distances, and the model dimensionality in knowledge transfer. Then, practical algorithms are developed based on the theoretical results. Finally, experiments conducted on image classification tasks show that our approaches can achieve robust and competitive accuracies on practical datasets.

14:10
Seonho Kim (The Ohio State University, United States)
Kiryung Lee (The Ohio State University, United States)
Fast max-affine regression via stochastic gradient descent

ABSTRACT. We consider regression for the max-affine model, which combines multiple affine models via the max function. The max-affine model arises ubiquitously in applications such as multiclass classification and auction problems. It also generalizes the forward model in phase retrieval and the rectified linear unit (ReLU) activation function. We present a non-asymptotic convergence analysis of mini-batch stochastic gradient descent (SGD) for max-affine regression when the model is observed at random locations that are sub-Gaussian and satisfy an anti-concentration property. Under these assumptions, a suitably initialized SGD converges linearly to the ground truth. Due to its low per-iteration cost, SGD converges faster than alternating minimization and gradient descent in run time. Our numerical results corroborate the presented theoretical results.
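
As a concrete illustration of the setup (a hedged sketch under our own choices of dimensions, initialization scale, and step size, not the paper's algorithm or analysis), mini-batch SGD for max-affine regression propagates the gradient only through the maximizing affine piece of each sample:

```python
import numpy as np

# Hedged sketch: mini-batch SGD for y = max_j (<a_j, x> + b_j).
# "Suitably initialized" is assumed: we start near the ground truth.
rng = np.random.default_rng(0)
d, k, n, batch, lr = 5, 3, 4000, 64, 0.1
A_true = rng.normal(size=(k, d))
b_true = rng.normal(size=k)
X = rng.normal(size=(n, d))                    # sub-Gaussian locations
y = (X @ A_true.T + b_true).max(axis=1)

A = A_true + 0.3 * rng.normal(size=(k, d))     # perturbed initialization
b = b_true + 0.3 * rng.normal(size=k)

def mse(A, b):
    return np.mean(((X @ A.T + b).max(axis=1) - y) ** 2)

mse_init = mse(A, b)
for _ in range(3000):
    idx = rng.integers(0, n, size=batch)
    Xb, yb = X[idx], y[idx]
    scores = Xb @ A.T + b
    j = scores.argmax(axis=1)                  # active affine piece per sample
    resid = scores[np.arange(batch), j] - yb
    for m in range(k):                         # gradient flows only through
        sel = (j == m)                         # each sample's active piece
        A[m] -= lr * (resid[sel, None] * Xb[sel]).sum(axis=0) / batch
        b[m] -= lr * resid[sel].sum() / batch
mse_final = mse(A, b)
```

Because the model is realizable here, SGD interpolates and the training error drops sharply from its initial value.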

14:30
Ege Can Kaya (Purdue University, United States)
Abolfazl Hashemi (Purdue University, United States)
Relative Entropy Regularization for Robust Submodular Multi-Task Subset Selection

ABSTRACT. We propose a novel formulation for multi-task subset selection that seeks a solution which is locally distributionally robust in the ``neighborhood'' of a reference distribution. The formulation assigns an importance score to each task in the multi-task objective and uses relative entropy as a regularizer. Using duality, we show that this novel formulation is equivalent to maximizing a normalized, monotone nondecreasing submodular function, which can be optimized very efficiently with standard greedy-based methods. This approach bridges the existing gap in optimizing the performance-robustness tradeoff in multi-task subset selection. We then experimentally corroborate our theoretical results, comparing our method with two baselines: one that optimizes the performance of the worst-case task and one that directly optimizes the performance on the reference distribution itself. We conclude that solving our novel formulation produces a solution that performs well on the reference distribution, is locally distributionally robust, and is quick in terms of computation time.
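
For context, the greedy method invoked above is the standard algorithm for maximizing a monotone nondecreasing submodular function under a cardinality constraint; a minimal sketch on a toy coverage objective (our example, not the paper's multi-task objective) is:

```python
# Standard greedy maximization of a monotone submodular function f under a
# cardinality constraint k; enjoys the classical (1 - 1/e) guarantee.
def greedy(ground, f, k):
    S = set()
    for _ in range(k):
        gains = {e: f(S | {e}) - f(S) for e in ground - S}
        best = max(gains, key=gains.get)   # element with largest marginal gain
        if gains[best] <= 0:
            break
        S.add(best)
    return S

# Example objective: set coverage, a classic monotone submodular function.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1, 6}}
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
S = greedy(set(sets), f, k=2)
```

On this instance the greedy choice of two sets covers all six elements.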

14:50
Rachel Newton (University of Michigan, United States)
Zhe Du (University of Michigan, United States)
Laura Balzano (University of Michigan, United States)
Peter Seiler (University of Michigan, United States)
Manifold Optimization for Data Driven Reduced-Order Modeling
PRESENTER: Rachel Newton

ABSTRACT. The focus of this paper is on data-driven reduced-order modeling. We assume a high fidelity, discrete-time model is available for simulation. The simulator allows state and output trajectories to be collected for any specified initial condition and input signal. An optimal reduced-order model (ROM) requires: (i) the selection of a lower dimensional subspace for the state of the ROM, and (ii) an optimal reduced-order state-space model (evolving on the lower dimensional subspace). A common heuristic is to: (i) select the lower-order subspace using proper orthogonal decomposition (POD), and (ii) use least-squares to fit the reduced-order state-space model on the POD subspace. We demonstrate the potential deficiencies of this heuristic via two simple examples. In order to address these deficiencies, we propose a novel method to optimize the choice of subspace using the Grassmannian manifold. Finally, we show that our proposed manifold optimization empirically outperforms the POD heuristic on the two motivating examples and a planar wind farm model.
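
The POD + least-squares heuristic that the paper critiques can be sketched on a toy linear system (our example and dimensions, not the authors'): collect state snapshots, take the leading left singular vectors as the POD basis, and fit the reduced transition matrix by least squares.

```python
import numpy as np

# Toy full-order system: diagonal, stable, with 3 dominant (slow) modes.
rng = np.random.default_rng(1)
nx, r, T = 20, 3, 200
decay = np.full(nx, 0.05)
decay[:r] = [0.95, 0.9, 0.85]
A_full = np.diag(decay)

x = rng.normal(size=nx)
snaps = [x.copy()]
for _ in range(T):
    x = A_full @ x
    snaps.append(x.copy())
X = np.array(snaps).T                      # snapshot matrix, shape (nx, T+1)

U, _, _ = np.linalg.svd(X, full_matrices=False)
V = U[:, :r]                               # POD basis: r leading modes
Z = V.T @ X                                # reduced coordinates
# Least-squares fit of reduced dynamics: Z[:, 1:] ≈ Ar @ Z[:, :-1]
Ar = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])
resid = np.linalg.norm(Z[:, 1:] - Ar @ Z[:, :-1]) / np.linalg.norm(Z[:, 1:])
```

On this well-separated example the heuristic works well; the paper's point is that on less benign systems this two-step procedure can be far from optimal, motivating joint optimization over the Grassmannian.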

15:30-17:30 Session WeD1: Learning Theory
Location: Library
15:30
Enoch Hyunwook Kang (University of Washington, United States)
P. R. Kumar (Texas A&M University, United States)
Bounded (O(1)) Regret Recommendation Learning via Synthetic Controls Oracle

ABSTRACT. In online exploration systems where users with fixed preferences repeatedly arrive, it has recently been shown that O(1), i.e., bounded, regret can be achieved when the system is modeled as a linear contextual bandit. This result may be of interest for recommender systems, where item popularity is often short-lived, as the exploration itself may be completed quickly, before potential long-run non-stationarities come into play. However, in practice, exact knowledge of the linear model is difficult to justify. Furthermore, the potential existence of unobservable covariates, uneven user arrival rates, the interpretation of the necessary rank condition, and users opting out of private data tracking all need to be addressed for practical recommender system applications. In this work, we conduct a theoretical study that addresses all these issues while still achieving bounded regret. Aside from proof techniques, the key differentiating assumption we make is the availability of effective Synthetic Control Methods (SCM), which we show to be a practical relaxation of the exact-linear-model-knowledge assumption. We verify our theoretical bounded-regret result with a minimal simulation experiment.

15:50
Xing Gao (University of Illinois at Chicago, United States)
Thomas Maranzatto (University of Illinois at Chicago, United States)
Lev Reyzin (University of Illinois at Chicago, United States)
A Unified Analysis of Dynamic Interactive Learning

ABSTRACT. In this paper we investigate the problem of learning evolving concepts over a combinatorial structure. Previous work by Emamjomeh-Zadeh et al. (2020) introduced dynamics into interactive learning as a way to model non-static user preferences in clustering problems or recommender systems. We make several contributions to this problem. First, we give a framework that captures the two different models analyzed by Emamjomeh-Zadeh et al. (2020), which allows us to study any type of concept evolution and matches the query complexity bounds and running time guarantees of the previous models. Using this general model, we solve the open problem of closing the gap between the upper and lower bounds on query complexity. Finally, we study an efficient algorithm in which the learner simply follows the feedback at each round, and we provide mistake bounds for low-diameter graphs such as cliques, stars, and general o(log n)-diameter graphs by using a Markov chain model.

16:10
Zahra Esmaeilbeig (University of Illinois at Chicago, United States)
Mojtaba Soltanalian (University of Illinois at Chicago, United States)
Deep Learning Meets Adaptive Filtering: A Stein's Unbiased Risk Estimator Approach

ABSTRACT. This paper revisits two prominent adaptive filtering algorithms, namely recursive least squares (RLS) and equivariant adaptive source separation (EASI), through the lens of algorithm unrolling, in the context of source estimation and separation. Building upon the unrolling methodology, we introduce novel task-based deep learning frameworks, denoted as Deep RLS and Deep EASI. These architectures transform the iterations of the original algorithms into layers of a deep neural network, thereby enabling efficient source signal estimation by taking advantage of a training process. To further enhance performance, we propose training these deep unrolled networks using a loss function grounded in Stein's unbiased risk estimator (SURE). Our empirical evaluations demonstrate the efficacy of this SURE-based approach for enhanced source signal estimation.
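
For reference, the RLS recursion that gets unrolled into network layers is the standard textbook form below (a plain NumPy implementation of ours, not the authors' unrolled architecture, in which such quantities become trainable):

```python
import numpy as np

# Classic exponentially weighted RLS: each iteration of this loop is what an
# unrolling approach would turn into one layer of a deep network.
def rls(X, d, lam=0.99, delta=1e2):
    """Estimate w minimizing exponentially weighted least squares d ≈ X w."""
    n_feat = X.shape[1]
    w = np.zeros(n_feat)
    P = delta * np.eye(n_feat)                 # inverse correlation estimate
    for x_t, d_t in zip(X, d):
        k = P @ x_t / (lam + x_t @ P @ x_t)    # gain vector
        e = d_t - w @ x_t                      # a priori error
        w = w + k * e
        P = (P - np.outer(k, x_t @ P)) / lam
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(500, 3))
d = X @ w_true + 0.01 * rng.normal(size=500)
w_hat = rls(X, d)
```

The forgetting factor `lam` and regularizer `delta` are fixed hyperparameters here; unrolling makes them (and related per-layer quantities) learnable end-to-end.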

16:30
Somtochukwu Oguchienti (Purdue University, United States)
Mahsa Ghasemi (Purdue University, United States)
Inverse Reinforcement Learning with Learning and Leveraging Demonstrators' Varying Expertise Levels

ABSTRACT. A common assumption in most Inverse Reinforcement Learning (IRL) methods is that human demonstrations are drawn from an optimal policy. However, this assumption poses a problem in real-world applications, as humans may have varying aptitudes for different tasks. To address this, preference-based IRL methods, using human rankings of trajectories, have been explored as a potential solution. More recently, demonstrations from humans with different levels of expertise have also been utilized to relax this assumption. In this paper, we consider scenarios where there is no prior knowledge of the demonstrators' expertise levels and humans are not readily available to provide rankings. Therefore, we propose a new method called Inverse Reinforcement Learning from Varying Expertise Levels (IRL-VEL), which combines demonstrations from demonstrators with varying levels of expertise to learn a high-performance policy from their proficiency scores—an indication of how optimal their demonstrations are. In particular, we solve an optimization problem to learn these scores and subsequently use them to efficiently learn a reward function that optimizes a policy through reinforcement learning. To assess the performance of our algorithm, we learn two separate policies from two reward functions: an initial reward function directly obtained from the combined demonstrations and a final reward function that incorporates the proficiency scores. We evaluate the proposed algorithm in two simulation environments, namely, the continuous mountain car and the MuJoCo swimmer. The simulation results demonstrate that the proposed method can effectively learn and leverage the expertise levels underlying the demonstrations to learn a policy, significantly outperforming methods that do not account for expertise levels.

16:50
Xiangxiang Xu (Massachusetts Institute of Technology, United States)
Lizhong Zheng (Massachusetts Institute of Technology, United States)
Sequential Dependence Decomposition and Feature Learning
PRESENTER: Xiangxiang Xu

ABSTRACT. In this paper, we introduce an approach to decompose statistical dependence and learn informative features from sequential data. We first present a sequential decomposition of dependence, which extends the chain rule of mutual information. To learn this decomposition from data, we investigate the optimal feature representations associated with decomposed dependence modes and develop corresponding learning algorithms. Specifically, for stationary processes, we demonstrate applications of the learned features in computing optimal Markov approximation and testing the order of Markov processes.

15:30-17:30 Session WeD2: Learning and Inference II
Location: Solarium
15:30
Mohammadhossein Abedinzadeh (Binghamton University–SUNY, United States)
Emrah Akyol (Binghamton University–SUNY, United States)
A Multidimensional Opinion Evolution Model with Confirmation Bias

ABSTRACT. In this paper, we present a novel opinion dynamics model in vector spaces. The model comprises individuals and information sources: individuals' opinions evolve over time and are represented by vectors, while the information sources are characterized by persistent viewpoints. Confirmation bias is explicitly incorporated into the model to account for individuals' selective acquisition of information from sources that align with their existing opinions. Drawing inspiration from the Friedkin-Johnsen model, this mechanism combines an individual's innate opinion, the opinions of their social network neighbors, and the influence of information sources. The social network operates on trust and remains static, while the information sources introduce dynamism through weighted norms based on the distance between an individual's opinion vector and the information source's opinion vector. We characterize the convergence conditions for the proposed dynamics and present estimation and computation methods for steady-state values under both linear and non-linear state-dependent confirmation bias weight functions.
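
A minimal sketch of such dynamics follows, with an exponential proximity weight of our own choosing standing in for the paper's state-dependent weight function (all parameter values here are illustrative):

```python
import numpy as np

# Friedkin-Johnsen-style vector update with a confirmation-bias weighting of
# information sources: each agent mixes its innate opinion, its neighbors'
# opinions (static trust matrix W), and sources weighted by opinion proximity.
rng = np.random.default_rng(0)
n_agents, n_src, dim = 6, 2, 3
W = rng.random((n_agents, n_agents))
W /= W.sum(axis=1, keepdims=True)             # static, row-stochastic trust
innate = rng.normal(size=(n_agents, dim))     # innate opinions (FJ anchor)
sources = rng.normal(size=(n_src, dim))       # persistent source viewpoints
alpha, beta = 0.3, 0.2                        # social / source susceptibility

def step(x):
    # Confirmation bias: source weight decays with opinion distance.
    dist = np.linalg.norm(x[:, None, :] - sources[None, :, :], axis=2)
    B = np.exp(-dist)
    B /= B.sum(axis=1, keepdims=True)
    return (1 - alpha - beta) * innate + alpha * (W @ x) + beta * (B @ sources)

x = innate.copy()
for _ in range(300):
    x = step(x)
change = np.linalg.norm(step(x) - x)          # near zero at steady state
```

With `alpha + beta < 1` this toy map is a contraction, so iterating it converges to a steady-state opinion profile; the paper characterizes convergence conditions for its actual weight functions.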

15:50
Abhin Shah (MIT, United States)
Raaz Dwivedi (MIT, United States)
Devavrat Shah (MIT, United States)
Gregory Wornell (MIT, United States)
On counterfactual inference with unobserved confounding

ABSTRACT. Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one $p$-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes as well as exacerbates the heterogeneity across units. Modeling the underlying joint distribution as an exponential family, we reduce learning the unit-level counterfactual distributions to learning $n$ exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are an $s$-sparse linear combination of $k$ known vectors, the error is $O(s\log k/p)$. En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing covariates.

16:10
Chelsea Huynh (University of California, Irvine (UCI) - Department of Mathematics, United States)
Anna Ma (University of California, Irvine (UCI) - Department of Mathematics, United States)
Michael Strand (University of California, Irvine (UCI) - Department of Statistics, United States)
Block-missing data in linear systems: An unbiased stochastic gradient descent approach
PRESENTER: Michael Strand

ABSTRACT. Achieving accurate approximations to solutions of large linear systems is crucial, especially when those systems utilize real-world data. A consequence of using real-world data is that there will inevitably be missingness. Current approaches for dealing with missing data, such as deletion and imputation, can introduce bias. Recent studies proposed an adaptation of stochastic gradient descent (SGD) in specific missing-data models. In this work, we propose a new algorithm, $\ell$-tuple mSGD, for the setting in which data is missing in a block-wise, tuple pattern. We prove that our proposed method uses unbiased estimates of the gradient of the least squares objective in the presence of tuple missing data. We also draw connections between $\ell$-tuple mSGD and previously established SGD-type methods for missing data. Furthermore, we prove our algorithm converges when using time-varying step sizes and empirically demonstrate the convergence of $\ell$-tuple mSGD on synthetic data. Lastly, we evaluate $\ell$-tuple mSGD applied to real-world continuous glucose monitoring (CGM) device data.
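
To illustrate the flavor of an unbiased-gradient correction for missing data (a simplified entrywise-missingness sketch of our own, not the paper's $\ell$-tuple pattern or algorithm), one can rescale the observed entries by the observation rate and subtract the diagonal bias that rescaling introduces:

```python
import numpy as np

# Least squares A w ≈ y with entries of A observed independently w.p. q.
# With z = mask * a / q, E[z] = a but E[z_i^2] = a_i^2 / q, so the naive
# gradient z (z^T w - y) is biased; subtracting (1 - q) z_i^2 w_i fixes the
# diagonal, giving an unbiased estimate of a (a^T w - y).
rng = np.random.default_rng(0)
n, d, q = 5000, 4, 0.8
w_true = rng.normal(size=d)
A = rng.normal(size=(n, d))
y = A @ w_true
mask = rng.random((n, d)) < q                 # observed entries
A_obs = np.where(mask, A, 0.0)

w = np.zeros(d)
for t in range(1, 50001):
    i = rng.integers(n)
    z = A_obs[i] / q                          # E[z] = A[i]
    g = z * (z @ w) - (1.0 - q) * (z * z) * w - z * y[i]   # unbiased gradient
    w -= (0.1 / np.sqrt(t)) * g               # decaying step sizes
```

Despite never imputing the missing entries, the iterates approach the true solution because the gradient estimate is unbiased in expectation over the missingness.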

16:30
Meiyu Zhong (University of Arizona, United States)
Ravi Tandon (University of Arizona, United States)
Learning Fair Classifiers via Min-Max F-divergence Regularization

ABSTRACT. As machine learning (ML) based systems are adopted in domains such as law enforcement, criminal justice, finance, hiring and admissions, ensuring the fairness of ML aided decision-making is becoming increasingly important. In this paper, we focus on the problem of fair classification, and introduce a novel min-max F-divergence regularization framework for learning fair classification models while preserving high accuracy.

Our framework consists of two trainable networks, namely, a classifier network and a bias/fairness estimator network, where the fairness is measured using the statistical notion of F-divergence. We show that F-divergence measures possess convexity and differentiability properties, and their variational representation makes them widely applicable in practical gradient based training methods. The proposed framework can be readily adapted to multiple sensitive attributes and for high dimensional datasets. We study the F-divergence based training paradigm for two types of group fairness constraints, namely, demographic parity and equalized odds. We present a comprehensive set of experiments for several real-world data sets arising in multiple domains (including COMPAS, Law Admissions, Adult Income, and CelebA datasets).

To quantify the fairness-accuracy tradeoff, we introduce the notion of fairness-accuracy receiver operating characteristic (FA-ROC) and a corresponding low-bias FA-ROC, which we argue is an appropriate measure to evaluate different classifiers. In comparison to several existing approaches for learning fair classifiers (including pre-processing, post-processing and other regularization methods), we show that the proposed F-divergence based framework achieves state-of-the-art performance with respect to the trade-off between accuracy and fairness.

16:50
Alon Kipnis (Reichman University, Israel)
The Minimax Risk in Testing the Histogram of Discrete Distributions for Uniformity under Missing Ball Alternatives

ABSTRACT. We consider the problem of testing the fit of a sample of items from many categories to the uniform distribution over the categories. As a class of alternative hypotheses, we consider the removal of an $\ell_p$ ball of radius $\epsilon$ around the uniform rate sequence for $p \leq 2$. We characterize the asymptotic minimax risk as the number of samples and number of dimensions go to infinity when $\epsilon$ is small, for testing based on the sample's histogram (number of absent categories, singletons, collisions, ...). For example, with $p=1$ and in the limit of a small expected number of samples $n$ compared to the number of categories $N$ (aka "sub-linear" regime), the minimax risk $R^*_\epsilon$ asymptotes to $2 \Phi(-n \epsilon^2/\sqrt{8N})$, with $\Phi(x)$ the normal CDF. Empirical studies over a range of problem parameters show that this estimate is accurate in finite samples and that our test is significantly better than the chi-squared test or a test that only uses collisions. Our analysis relies on the asymptotic normality of histogram ordinates, the equivalence between the minimax setting and a Bayesian setting, and the characterization of the least favorable prior by reducing a multi-dimensional optimization problem to a one-dimensional problem.
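
The closed-form asymptote quoted above is straightforward to evaluate; a small sketch (using the standard normal CDF via the error function) is:

```python
import math

# Asymptotic minimax risk from the abstract for p = 1 in the sub-linear
# regime: R*_eps ≈ 2 * Phi(-n * eps^2 / sqrt(8 * N)), Phi the normal CDF.
def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def asymptotic_minimax_risk(n, N, eps):
    return 2.0 * normal_cdf(-n * eps ** 2 / math.sqrt(8.0 * N))

# Risk is 1 (testing is hopeless) when n * eps^2 / sqrt(N) vanishes, and
# decreases monotonically as the sample size n grows.
print(asymptotic_minimax_risk(n=1_000, N=1_000_000, eps=0.1))
```

Note the signal strength is governed by $n \epsilon^2/\sqrt{N}$, so meaningful power in the sub-linear regime requires $n \gg \sqrt{N}/\epsilon^2$.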

15:30-17:30 Session WeD3: Coding Techniques
Location: Butternut
15:30
Lu Le Phuc (Ho Chi Minh University of Science, Viet Nam)
Hoang Dau Son (The School of Computing Technologies, RMIT University, Australia)
Hy Ngo Dinh (Ho Chi Minh University of Science, Viet Nam)
Thuc Nguyen Dinh (Ho Chi Minh University of Science, Viet Nam)
New results on Erasure Combinatorial Batch Codes
PRESENTER: Lu Le Phuc

ABSTRACT. In this work, we investigate the problem of Erasure Combinatorial Batch Codes (ECBC), in which $n$ files are stored on $m$ servers so that every set of $n-r$ servers allows a client to retrieve at most $k$ distinct files by downloading at most $t$ files from each server. Previous studies have solved this problem for the special case of $t=1$ using Combinatorial Batch Codes (CBC). We tackle the general case $t \geq 1$ using a generalization of Hall's theorem. Additionally, we address a realistic scenario in which the retrieved files are consecutive according to some order and provide a simple and optimal solution for this case.

15:50
Anindya Bijoy Das (Purdue University, United States)
Aditya Ramamoorthy (Iowa State University, United States)
David J. Love (Purdue University, United States)
Christopher G. Brinton (Purdue University, United States)
Preserving Sparsity and Privacy in Straggler-Resilient Distributed Matrix Computations

ABSTRACT. Existing approaches to distributed matrix computations involve allocating coded combinations of submatrices to worker nodes to build resilience to stragglers and/or enhance privacy. In this study, we consider the challenge of preserving input sparsity in such approaches so as to retain the associated computational efficiency enhancements. First, we find a lower bound on the weight of the coding, i.e., the number of submatrices to be combined to obtain coded submatrices, required to provide resilience to the maximum possible number of stragglers (for a given number of nodes and their storage constraints). Next, we propose a distributed matrix computation scheme that meets this lower bound on the weight of the coding exactly. Further, we develop a controllable trade-off between worker computation time and the privacy constraint for sparse input matrices in settings where the worker nodes are honest but curious. Numerical experiments conducted on Amazon Web Services (AWS) validate our assertions regarding straggler mitigation and computation speed for sparse matrices.

16:10
Emna Ben Yacoub (Technical University of Munich, Germany)
Trapping and Absorbing Set Enumerators for Multi-Edge Type LDPC Code Ensembles

ABSTRACT. The finite-length and asymptotic (elementary) trapping and (fully) absorbing set enumerators for multi-edge type (MET) low-density parity-check (LDPC) code ensembles are derived. An efficient method for the evaluation of the asymptotic distributions is presented and evaluated.

16:30
Mohammad Reza Deylam Salehi (Eurecom, Sorbonne Université, France)
Derya Malak (Eurecom, France)
An Achievable Low Complexity Encoding Scheme for Coloring Cyclic Graphs
PRESENTER: Derya Malak

ABSTRACT. In this paper, we utilize graph coloring for functions captured by cycle graphs to achieve lossless function compression. For functional compression of characteristic graphs or cycles and their OR graphs, we present a feasible scheme for coloring the odd cycles and their OR power graphs. In addition, we calculate the chromatic number for coloring the $n$-th power of even cycles, which paves the way for calculating the minimum entropy coloring. Moreover, we determine the number of distinct eigenvalues in each OR power and the largest eigenvalue for each power, which allows us to derive a bound on the chromatic number of each OR power of a cycle. Our method provides a proxy for understanding the fundamental limits of compressing cyclic graphs, which has broad applications in various practical use cases of cryptography and computer algebra.
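
The base case underlying the analysis is the classical fact that an even cycle is 2-chromatic while an odd cycle needs 3 colors; a small sketch (ours, illustrating only this base case, not the OR-power constructions) is:

```python
# Proper coloring of the cycle C_n: alternate two colors around the cycle;
# when n is odd the last vertex clashes with the first and needs a third color.
def cycle_coloring(n):
    colors = [i % 2 for i in range(n)]
    if n % 2 == 1:
        colors[-1] = 2        # odd cycle: chromatic number is 3
    return colors

def is_proper(colors):
    n = len(colors)
    return all(colors[i] != colors[(i + 1) % n] for i in range(n))
```

The abstract's results concern how this picture changes for the $n$-th OR powers of cycles, where computing the chromatic number is far less immediate.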

16:50
Giuseppe Cotardo (Virginia Tech, United States)
Gretchen Matthews (Virginia Tech, United States)
Alberto Ravagnani (Eindhoven University of Technology, Netherlands)
Julia Shapiro (Virginia Tech, United States)
Multishot Adversarial Network Decoding

ABSTRACT. We investigate adversarial network coding and decoding focusing on the multishot regime. Errors can occur on a proper subset of the network edges and are modeled via an adversarial channel. The paper contains both bounds and capacity-achieving schemes for the Diamond Network and the Mirrored Diamond Network. We also initiate the study of the generalizations of these networks.

15:30-17:30 Session WeD4: RESU1 - Towards Mathematical Foundation of Federated Learning I

Organizers - Lili Su, Pengkun Yang, Jiaming Xu, and Rasoul Etesami

Location: Pine
15:30
Yichen Ruan (Carnegie Mellon University, United States)
Carlee Joe-Wong (Carnegie Mellon University, United States)
Fedsoft: Soft clustered federated learning with proximal local updating

ABSTRACT. While federated learning has recently exploded in popularity as a means to distributedly train a machine learning model, most federated learning frameworks assume that one model is used for all users. In practice, users may instead wish to utilize models personalized to their own local data. We propose a mechanism to do so that follows the ideas of clustered federated learning, in which each client's data is drawn from a mixture of the same underlying source distributions. FedSoft trains both locally personalized models and high-quality cluster models in this setting and limits client workload by using proximal updates to require the completion of only one optimization task from a subset of clients in every communication round. We show, analytically and empirically, that FedSoft effectively exploits similarities between the source distributions to learn personalized and cluster models that perform well.

15:50
Jiadong Liang (Peking University, China)
Yuze Han (Peking University, China)
Xiang Li (Peking University, China)
Zhihua Zhang (Peking University, China)
Complete Asymptotic Analysis for Projected Stochastic Approximation and Debiased Variants
PRESENTER: Xiang Li

ABSTRACT. In this study, we investigate federated optimization through the lens of linearly constrained optimization problems and analyze the loopless projection stochastic approximation (LPSA) method. LPSA incorporates a probabilistic projection with probability $p_n$ at the $n$-th iteration to ensure solution feasibility at random. We set $p_n \propto \eta_n^\beta$, where $\eta_n$ represents the step size and $\beta \in [0, 1]$ is a tuning parameter. Previous research demonstrated that LPSA exhibits a fascinating bias-variance trade-off through diffusion approximation in the asymptotic regime. Specifically, for $\beta < 0.5$, the bias becomes degenerate, while for $\beta > 0.5$, the bias dominates the variance. In this work, we investigate the intricate scenario where $\beta=0.5$ and discover that the last iterate, after appropriate scaling, weakly converges to a biased Gaussian distribution. We specify the exact form of the incurred bias and, as a result, provide a comprehensive asymptotic analysis of LPSA and a complete characterization of its phase transitions. We observe that such a non-zero bias leads to slow convergence. Motivated by this observation, we propose a debiased version of LPSA, called Debiased LPSA (DLPSA), which effectively reduces projection complexity compared to the vanilla LPSA.
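
A toy sketch of the loopless probabilistic projection (our illustrative constrained quadratic, not the paper's federated setting or analysis) clarifies the role of $p_n \propto \eta_n^\beta$:

```python
import numpy as np

# Minimize 0.5 * ||x||^2 subject to a^T x = 1 (minimizer x* = a for unit a)
# via SGD, applying the affine projection only with probability p_n ~ eta_n^beta.
rng = np.random.default_rng(0)
d = 4
a = np.ones(d) / np.sqrt(d)                 # unit normal of the constraint
x = rng.normal(size=d)

def project(x):                             # projection onto {x : a^T x = 1}
    return x - (a @ x - 1.0) * a

for n in range(1, 20001):
    eta = 0.5 / n ** 0.75                   # decaying step size
    p = min(1.0, eta ** 0.5)                # beta = 0.5: the critical regime
    g = x + 0.1 * rng.normal(size=d)        # noisy gradient of 0.5 * ||x||^2
    x = x - eta * g
    if rng.random() < p:                    # loopless: project only sometimes
        x = project(x)
```

Between projections the iterate drifts off the constraint set, which is the source of the bias the paper characterizes at $\beta = 0.5$; here the drift stays small and the iterate lands near the constrained minimizer.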

16:10
Ningning Ding (Northwestern University, United States)
Zhenyu Sun (Northwestern University, United States)
Ermin Wei (Northwestern University, United States)
Randall Berry (Northwestern University, United States)
Incentive Mechanism Design for Federated Learning and Unlearning

ABSTRACT. To protect users' right to be forgotten in federated learning, federated unlearning aims at eliminating the impact of leaving users' data on the global learned model. Current research in federated unlearning has mainly concentrated on developing effective and efficient unlearning techniques. However, the issue of incentivizing valuable users to remain engaged and preventing their data from being unlearned is still under-explored, yet important to the unlearned model's performance. This paper focuses on the incentive issue and develops an incentive mechanism for federated learning and unlearning. We first characterize the leaving users' impact on the global model accuracy and the required communication rounds for unlearning. Building on these results, we propose a four-stage game to capture the interactions and information updates during the learning and unlearning process. A key contribution is to summarize users' multi-dimensional private information into one-dimensional metrics to guide the incentive design. We show that users who incur high costs and experience significant training losses are more likely to discontinue their engagement through federated unlearning. On the other hand, the server tends to retain users who make substantial contributions to the model but faces a trade-off on users' training losses, as large training losses of retained users increase privacy costs but decrease unlearning costs. The numerical results demonstrate the necessity of unlearning incentives for retaining valuable leaving users, and also show that our proposed mechanisms decrease the server's cost by up to 53.91% compared to state-of-the-art benchmarks.

16:30
Sravani Ramishetty (Purdue University, United States)
Abolfazl Hashemi (Purdue University, United States)
High Probability Guarantees for Federated Learning

ABSTRACT. Federated learning (FL) has emerged as a promising approach for training machine learning models on distributed data while ensuring privacy preservation and data locality. A key challenge in FL optimization, however, is the lack of high-probability guarantees, which can undermine the trustworthiness of FL solutions. To address this issue, we introduce Federated Averaging with post-optimization (FedAvg-PO), a modification of the Federated Averaging (FedAvg) algorithm. The proposed algorithm applies a post-optimization phase that evaluates a short list of solutions generated by several independent runs of FedAvg. This modification significantly improves the large-deviation properties of FedAvg, enhancing the reliability and robustness of the optimization process. Our complexity analysis shows that FedAvg-PO can compute accurate and statistically guaranteed solutions in the federated learning setting. Our results further relax the restrictive assumptions common in FL theory by developing new technical tools that may be of independent interest. The accompanying analysis of computational requirements sheds light on the scalability and efficiency of the algorithm, guiding its practical implementation.
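The "short list plus post-optimization" idea described above can be sketched generically: run FedAvg several times independently, then keep the candidate with the lowest held-out loss. This is a simplified illustration on a homogeneous least-squares problem, not the authors' exact procedure; all names and hyperparameters are ours.

```python
import numpy as np

def fedavg(clients, rounds=200, lr=0.1, seed=0):
    """One independent run of FedAvg on a shared least-squares model (sketch)."""
    rng = np.random.default_rng(seed)
    d = clients[0][0].shape[1]
    w = 0.5 * rng.standard_normal(d)         # random initialization per run
    for _ in range(rounds):
        local = []
        for X, y in clients:                 # each client takes one local gradient step
            g = X.T @ (X @ w - y) / len(y)
            local.append(w - lr * g)
        w = np.mean(local, axis=0)           # server averages the local models
    return w

def fedavg_po(clients, val, runs=5):
    """Post-optimization phase: evaluate a short list of independent FedAvg runs
    and keep the candidate with the lowest validation loss."""
    Xv, yv = val
    candidates = [fedavg(clients, seed=s) for s in range(runs)]
    losses = [float(np.mean((Xv @ w - yv) ** 2)) for w in candidates]
    return candidates[int(np.argmin(losses))]

# Synthetic homogeneous setup: all clients share the same ground truth w_true.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0])
clients = []
for _ in range(4):
    X = rng.standard_normal((30, 2))
    clients.append((X, X @ w_true))
Xv = rng.standard_normal((50, 2))
w_hat = fedavg_po(clients, (Xv, Xv @ w_true))
```

Selecting among independent runs is what drives the improved large-deviation behavior: a single unlucky run is discarded rather than returned.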

16:50
Lili Su (Northeastern University, United States)
Jiaming Xu (Duke University, United States)
Pengkun Yang (Tsinghua University, China)
A Non-parametric View of FedAvg and FedProx: Beyond Stationary Points

ABSTRACT. Federated Learning (FL) is a promising decentralized learning framework with great potential for privacy preservation and for lowering the computation load in the cloud. Recent work showed that FedAvg and FedProx – the two most widely adopted FL algorithms – fail to reach the stationary points of the global optimization objective, even for homogeneous linear regression problems. Further, there is concern that, in the presence of heterogeneity, the common learned model might not generalize well locally at all.

In this paper, we analyze the convergence and statistical efficiency of FedAvg and FedProx, addressing both concerns above. Our analysis is based on standard nonparametric regression in a reproducing kernel Hilbert space (RKHS) and allows for heterogeneous local data distributions and unbalanced local datasets. We prove that the estimation errors, measured in either the empirical norm or the RKHS norm, decay at a rate of $1/t$ in general, and exponentially for finite-rank kernels. In certain heterogeneous settings, these upper bounds also imply that both FedAvg and FedProx achieve the optimal error rate. To further quantify analytically the impact of heterogeneity at each client, we propose and characterize a novel notion, the federation gain, defined as the reduction in estimation error that a client obtains by joining the FL system. We discover that when data heterogeneity is moderate, a client with limited local data can benefit from a common model with a large federation gain. Considering the statistical aspect yields two new insights: (1) the standard bounded-dissimilarity assumption is pessimistic for the convergence analysis of FedAvg and FedProx; (2) despite the inconsistency of stationary points, the limiting points of FedAvg and FedProx are unbiased estimators of the underlying truth. Numerical experiments further corroborate our theoretical findings.
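The federation gain described above admits a compact formalization. The notation below is ours for illustration only; the paper's precise definition may differ (for instance, it may be a ratio of errors rather than a difference):

```latex
% Federation gain of client k: reduction in estimation error from joining FL.
% Illustrative notation: \hat f_k^{\mathrm{loc}} is client k's purely local
% estimate, \hat f^{\mathrm{FL}} the federated estimate, f_k^\ast the truth.
\mathrm{FG}_k
  \;=\;
  \mathbb{E}\bigl\|\hat f_k^{\mathrm{loc}} - f_k^\ast\bigr\|^2
  \;-\;
  \mathbb{E}\bigl\|\hat f^{\mathrm{FL}} - f_k^\ast\bigr\|^2,
\qquad
\mathrm{FG}_k > 0 \;\Longleftrightarrow\; \text{client } k \text{ benefits from federation.}
```

Under this reading, the abstract's claim is that moderate heterogeneity plus limited local data makes $\mathrm{FG}_k$ large, so joining the federation is worthwhile for that client.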