Program for Thursday, September 26th

PROGRAM FOR THURSDAY, SEPTEMBER 26TH

Days:

previous day

next day

all days

View: session overview talk overview

08:30-10:10 Session 4A: Learning and Networks III

Invited Session

Chairs:

R Srikant and Bruce Hajek

Location: Auditorium

08:30	Xihe Gu, Greg Fields and Tara Javidi Active Exploration for Minimax Estimation in MDPs ABSTRACT. The problem of active exploration is central to a great many stochastic control problems. In this paper, we consider the problem of active pure exploration problem for a Markov decision problems (MDPs) where an accurate estimate of the controlled transition probability matrix must be obtained uniformly well (minimizing the maximum error) in terms $\ell_2$ distance. In contrast to much of prior work which rely on coverage and randomization as means to ensure effective model estimation, our proposed algorithm is fully deterministic and builds on our prior work on adaptive sampling for estimating probability distributions. More specifically, we first propose a genie-aided distribution matching optimization and analyze its sample allocation performance with respect to an oracle. We then instantiate this in form of an optimistic tracking algorithm that 1) tracks the estimation error at each state-action pair and 2) relies on a stochastic shortest path to reach and sample at the state-action pair with the worst estimation error.
08:50	Thodoris Lykouris and Wentao Weng Learning to Defer in Content Moderation: The Human-AI Interplay ABSTRACT. BHRS1 Ensuring successful content moderation is vital for a healthy online social platform where it is necessary to responsively remove harmful posts without jeopardizing non-harmful content. Due to the high-volume nature of online posts, human-only moderation is operationally challenging, and platforms often employ a human-AI collaboration approach. A typical machine-learning heuristic estimates the expected harmfulness of incoming posts and uses fixed thresholds to decide whether to remove the post (classification decision) and whether to send it for human review (admission decision). This can be inefficient as it disregards the uncertainty in the machine learning estimation, the time-varying element of human review capacity and post arrivals, and the selective sampling in the dataset (humans only review posts filtered by the admission algorithm). We introduce a model to capture the human-AI interplay in content moderation. The algorithm observes contextual information for incoming posts, makes classification and admission decisions, and schedules posts for human review. Non-admitted posts do not receive reviews (selective sampling) and admitted posts receive human reviews on their harmfulness. These reviews help educate the machine-learning algorithms but are delayed due to congestion in the human review system. The classical learning-theoretic way to capture this human-AI interplay is via the framework of learning to defer, where the algorithm has the option to defer a classification task to humans for a fixed cost and immediately receive feedback. Our model contributes to this literature by introducing congestion in the human review system. Moreover, unlike work on online learning with delayed feedback where the delay in the feedback is exogenous to the algorithm’s decisions, the delay in our model is endogenous to both the admission and the scheduling decisions. We propose a near-optimal learning algorithm that carefully balances the classification loss from a selectively sampled dataset, the idiosyncratic loss of non-reviewed posts, and the delay loss of having congestion in the human review system. To the best of our knowledge, this is the first result for online learning in contextual queueing systems and hence our analytical framework may be of independent interest.
09:10	Keshav Patel Keval, Vivek Borkar and Ananya Singhal Minkowski Descent: An Algorithm for Stochastic Global Optimization ABSTRACT. We introduce a novel stochastic global optimization algorithm called Minkowski descent for optimizing a black-box function over a convex domain. No assumptions are made about the function, such as Lipschitz smoothness or strong convexity. The novel algorithm is a two timescale scheme which uses a heuristic inspired by the Shapley-Folkman theorem in order to accelerate the search process. We compare the multi-start version of our algorithm with standard algorithms such as pure random search, multi-start simulated annealing and multi-start stochastic gradient descent, among other gradient based algorithms.
09:30	Litu Rout, Yujia Chen, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai and Wen-Sheng Chu On solving inverse problems in computer vision using latent diffusion models ABSTRACT. BHRS1 Image reconstruction, often formulated as posterior sampling, presents significant computational challenges while using latent diffusion models. We present recent progress in using computationally tractable second-order methods for posterior sampling, and demonstrate its applicability to image reconstruction in the presence of corruptions (e.g., blurriness, low resolution, noise), as well as text-guided image editing.
09:50	Austin Cooper and Sean Meyn Quickest Change Detection Using Mismatched CUSUM ABSTRACT. The field of quickest change detection (QCD) concerns design and analysis of algorithms to estimate in real time the time at which an important event takes place and identify properties of the post-change behavior. The goal is to devise a stopping time adapted to the observations that minimizes an L1 loss. Approximately optimal solutions are well known under a variety of assumptions. In the work surveyed here we consider the CUSUM statistic, which is defined as a one-dimensional reflected random walk driven by a functional of the observations. It is known that the optimal functional is a log likelihood ratio subject to special statical assumptions. The talk will discuss model free approaches to detection design, and consider the following questions: 1. What is the performance for a given functional of the observations in Bayesian and minimax settings? 2. How do the conclusions change when there is dependency between pre- and post-change behavior? 3. How can techniques from statistics and machine learning be adapted to approximate the best functional in a given class? Based on the recent research surveyed in Reinforcement Learning Design for Quickest Change Detection, https://arxiv.org/abs/2403.14109

08:30-10:10 Session 4B: Information Theory, Statistics, and Data Science II

Invited Session

Chairs:

Olgica Milenkovic, Kamalika Chaudhuri, Eli Chien and Pan Li

Location: 1040

08:30	Elena Grigorescu Differential Privacy and Sublinear Time are Incompatible Sometimes ABSTRACT. Differential privacy and sublinear algorithms are both rapidly emerging algorithmic themes in times of big data analysis. Although recent works have shown the existence of differentially private sublinear algorithms for many problems including graph parameter estimation and clustering, little is known regarding hardness results on these algorithms. In this paper, we initiate the study of lower bounds for problems that aim for both differentially-private and sublinear-time algorithms. Our main result is the incompatibility of both the desiderata in the general case. In particular, we prove that a simple problem based on one-way marginals yields both a differentially-private algorithm, as well as a sublinear-time algorithm, but does not admit a ``strictly'' sublinear-time algorithm that is also differentially private. Joint work with Jeremiah Blocki, Hendrik Fichtenberger, Tamalika Mukherjee.
08:50	Rachel Cummings, Alessandro Epasto, Jieming Mao, Tamalika Mukherjee, Tingting Ou and Peilin Zhong Differentially Private Space-Efficient Algorithms for Frequency Moment Estimation in the Turnstile Model ABSTRACT. The \emph{turnstile} continual release model of differential privacy captures scenarios where a privacy-preserving real-time analysis is sought for a dataset evolving through additions and deletions. In typical applications of real-time data analysis, both the length of the stream $T$ and the size of the universe $\|\cU\|$ from which data come can be extremely large. This motivates the study of private algorithms in the turnstile setting using space sublinear in both $T$ and $\|\mathcal{U}\|$. In this paper, we give the first sublinear space differentially private algorithms for the fundamental problems of counting distinct elements and $\ell_p$-frequency moment estimation in the turnstile streaming model. For counting distinct elements, our algorithm achieves $\Tilde{O}_{\eta}(T^{1/3})$ space and additive error, and a $(1+\eta)$-relative approximation for all $\eta \in (0,1)$. Our result significantly improves upon the space requirements of the state-of-the-art for this problem in this model, which has a linear dependency in both $T$ and $\vert \cU \vert$, while still achieving an additive error that is close to the known $\Omega(T^{1/4})$ lower bound for arbitrary streams. This addresses an open question posed by~\cite{JainKRSS23} about designing low-memory mechanisms for this problem. For the more general problem of $\ell_p$-frequency moment estimation for $p \in [0,+\infty)$, our algorithm achieves an additive error and space of $\Tilde{O}_{\eta}(T^{1/3})$, and a $(1+\eta)$-relative approximation for all $\eta \in (0,1)$. We also give a space lower bound for this problem, which shows that any algorithm that uses our techniques must use space $\Tilde{\Omega}(T^{1/3})$.
09:10	Oliver Kosut, Wael Alghamdi, Shahab Asoodeh, Flavio P. Calmon, Atefeh Gilani, Juan Felipe Gomez and Lalitha Sankar Optimized Mechanisms for Differentially Private Composition ABSTRACT. In many applications, differential privacy mechanisms must be applied many times over multiple iterations, or compositions, of an algorithm. For example, when training a machine learning model, differentially private noise is typically added at each iteration of stochastic gradient descent. In practice, Gaussian noise is often employed, but is it the optimal choice? This talk will explore several settings in which this question arises to find that sometimes the answer is no. In the setting with multiple compositions, we argue that the optimal mechanism should minimize either a Kullback-Leibler divergence or a Rényi divergence (i.e., the Rényi differential privacy). We find that the optimal mechanism according to either of these objectives is not Gaussian, and in fact does not even have a closed-form solution. Numerical optimization can be used to derive mechanisms that are very close to optimal, which turn out to have a surprisingly ragged shape, which we call Cactus mechanisms. This same approach is further applied to multi-dimensional mechanisms and discrete mechanisms, again deriving optimal mechanisms that are not easily expressible in closed form. We also find that, in the limit of small sensitivity (or equivalently, large noise variance), the Gaussian distribution does reappear. Thus, the correct answer to the question “Is Gaussian noise optimal?” seems to be “maybe”.
09:30	Rongzhe Wei, Eli Chien and Pan Li Differentially Private Graph Diffusion with Applications in Personalized PageRanks ABSTRACT. Graph diffusion, which iteratively propagates real-valued substances among the graph, is used in numerous graph/network-involved applications. However, releasing diffusion vectors may reveal sensitive linking information in the data such as transaction information in financial network data. Protecting the privacy of graph data is challenging due to its interconnected nature. This work proposes a novel graph diffusion framework with edge-level differential privacy guarantees by using noisy diffusion iterates. The algorithm injects Laplace noise per diffusion iteration and adopts a degree-based thresholding function to mitigate the high sensitivity induced by low-degree nodes. Our privacy loss analysis is based on Privacy Amplification by Iteration (PABI), which to our best knowledge, is the first effort that analyzes PABI with Laplace noise and provides relevant applications. We also introduce a novel $\infty$-Wasserstein distance tracking method, which tightens the analysis of privacy leakage and makes PABI practically applicable. We evaluate this framework by applying it to Personalized Pagerank computation for ranking tasks. Experiments on real-world network data demonstrate the superiority of our method under stringent privacy conditions.
09:50	Charlie Hou, Akshat Shrivastava, Hongyuan Zhan, Rylan Conway, Trang Le, Adithya Sagar, Giulia Fanti and Daniel Lazar Training Language Models on Private Federated Data in the Age of LLMs ABSTRACT. On-device training, or federated learning (FL), is currently the most common approach for training machine learning (ML) models on private, distributed user data. Despite this, on-device training has several drawbacks: (1) most user devices are too small to train large models on-device, (2) on-device training is communication- and computation-intensive, and (3) on-device training can be difficult to debug and deploy. In parallel, the quality of differentially private (DP) synthetic data has increased significantly in recent years, thanks in part to improvements in foundation models. To address the problems with on-device learning, we propose a new paradigm, in which FL servers first collect DP synthetic data from clients, then finetune their model centrally on the private synthetic data. To this end, we present Private Evolution-Text (PrE-Text), a method for generating differentially private (DP) synthetic textual data. First, we show that across multiple datasets, training small models (models that fit on user devices) with PrE-Text synthetic data outperforms small models trained on-device under practical privacy regimes (epsilon=1.29, epsilon=7.58). We achieve these results while using 9x fewer rounds, 6x less client computation per round, and 100x less communication per round. Second, finetuning large models on PrE-Text's DP synthetic data improves large language model (LLM) performance on private data across the same range of privacy budgets. Altogether, these results suggest that training on DP synthetic data can be a promising alternative to DP-FL for training models from private, distributed data.

08:30-10:10 Session 4C: Optimization II

Invited Session

Chairs:

Salar Fattahi and Richard Zhang

Location: 1030

08:30	Nikos Zarifis, Puqian Wang, Ilias Diakonikolas and Jelena Diakonikolas Robustly Learning Single-Index Models via Alignment Sharpness ABSTRACT. We study the problem of learning Single-Index Models under the L_2^2 loss in the agnostic model. We give an efficient learning algorithm, achieving a constant factor approximation to the optimal loss, that succeeds under a range of distributions (including log-concave distributions) and a broad class of monotone and Lipschitz link functions. This is the first efficient constant factor approximate agnostic learner, even for Gaussian data and for any nontrivial class of link functions. Prior work for the case of unknown link function either works in the realizable setting or does not attain constant factor approximation. The main technical ingredient enabling our algorithm and analysis is a novel notion of a local error bound in optimization that we term alignment sharpness and that may be of broader interest.
08:50	Yiping Lu Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls ABSTRACT. The optimal design of experiments typically involves solving an NP-hard combinatorial optimization problem. In this paper, we aim to develop a globally convergent and practically efficient optimization algorithm. Specifically, we consider a setting where the pre-treatment outcome data is available and the synthetic control estimator is invoked. The average treatment effect is estimated via the difference between the weighted average outcomes of the treated and control units, where the weights are learned from the observed data. Under this setting, we surprisingly observed that the optimal experimental design problem could be reduced to a so-called phase synchronization problem. We solve this problem via a normalized variant of the generalized power method with spectral initialization. On the theoretical side, we establish the first global optimality guarantee for experiment design when pre-treatment data is sampled from certain data-generating processes. Empirically, we conduct extensive experiments to demonstrate the effectiveness of our method on both the US Bureau of Labor Statistics and the Abadie-Diemond-Hainmueller California Smoking Data. In terms of the root mean square error, our algorithm surpasses the random design by a large margin.
09:10	Frank E. Curtis and Qi Wang Stochastic-Gradient-based Interior-Point Methods ABSTRACT. I will discuss some of our recent work on stochastic-gradient-based interior-point algorithms for solving constrained optimization problems, such as those arising in informed machine learning. The algorithms are single-loop in the sense that they do not require an inner criterion for updating the barrier parameter; rather, the barrier parameter is decreased according to a prescribed sequence. Convergence guarantees are attained in both deterministic and stochastic settings. The algorithms exhibit good practical performance in comparison to projected-gradient-based methods.
09:30	Jianhao Ma and Salar Fattahi Convergence of Gradient Descent with Small Initialization for Unregularized Matrix Completion ABSTRACT. We study the problem of symmetric matrix completion, where the goal is to reconstruct a positive semidefinite and low-rank matrix from only a subset of its observed entries. We show that the vanilla gradient descent (GD) with small initialization and constant step-size provably converges to the ground truth without requiring any explicit regularization or projection step. This convergence result holds true even in the over-parameterized scenario, where the true rank is unknown and conservatively over-estimated by a search rank. The existing results for this problem require explicit regularization, periodic projections, or a sufficiently accurate initial point. At the crux of our method lies a novel weakly-coupled leave-one-out analysis, which allows us to establish the global convergence of GD, extending beyond what was previously possible using the classical leave-one-out analysis.
09:50	Andrew Cheng and Melanie Weber Structured Regularization for SPD Optimization with Side Information ABSTRACT. Matrix-valued optimization tasks, including those involving symmetric positive definite (SPD) matrices, arise in a wide range of applications in machine learning, data science and statistics. Classically, such problems are solved via constrained Euclidean optimization, where the domain is viewed as a Euclidean space and the structure of the matrices (e.g., positive definiteness) enters as constraints. More recently, geometric approaches that leverage parametrizations of the problem as unconstrained tasks on the corresponding matrix manifold have been proposed. While they exhibit algorithmic benefits in many settings, they cannot directly handle additional constraints, such as side information on the solution. A remedy comes in the form of constrained Riemannian optimization methods, notably, Riemannian Frank-Wolfe and Projected Gradient Descent. However, both algorithms require potentially expensive subroutines that can introduce computational bottlenecks in practise. To mitigate these shortcomings, we propose a structured regularization approach based on symmetric gauge functions. On the example of computing optimistic likelihoods, we show that the regularizer preserves crucial structure in the objective, including geodesic convexity. This allows for solving the regularized problem with a fast unconstrained method with global optimality certificate. We demonstrate the effectiveness of our approach in numerical experiments and through theoretical analysis.

08:30-10:10 Session 4D: Capacity of Communication Channels

Location: 3000

08:30	Omri Lev, Matthew Ho, Ligong Wang and Gregory Wornell On the Capacity of Scalar Gaussian Channels Subject to State Obfuscation PRESENTER: Matthew Ho ABSTRACT. We study communication over the scalar Gaussian fading channel subject to a state-obfuscation constraint, which requires that the channel outputs and the fading coefficients be almost independent. We consider two cases for the fading coefficient: where it is independent and identically distributed in time, and where it is quasistatic, i.e., it is randomly generated but then remains the same during communication. For the transmitter, we study three different scenarios: where it only has access to the message; where it has channel-state information about the fading coefficient; and where it has access to feedback. We establish conditions for the communication capacity subject to obfuscation to be non-zero, and analyze this capacity in the high signal-to-noise ratio regime.
08:50	Omkar Mujumdar and J. Nicholas Laneman Capacity of a Complex-Valued Gaussian Channel with Frequency Offset PRESENTER: Omkar Mujumdar ABSTRACT. We consider a complex additive white Gaussian noise (AWGN) channel consisting of an approximately ideal low pass filter (LPF) along with a carrier frequency offset (CFO) that stays constant during transmission. The CFO is assumed to lie with a maximum value around the origin. The transmitter and the receiver are supposed to be unaware of the exact value of the CFO but aware of its maximum value. The lack of knowledge of the precise channel probability law while knowing the family of probability laws to which it belongs gives rise to a compound channel model. An expression for the capacity of this compound channel is derived and shown to be independent of the exact value of the CFO. We provide a proof of the forward or the achievability part of the coding theorem.
09:10	Haobo Chen, Gholamali Aminian and Yuheng Bu An Algorithm for Computing the Capacity of Symmetrized KL Information for Discrete Channels ABSTRACT. Symmetrized Kullback-Leibler (KL) information ($I_{\mathrm{SKL}}$), which symmetrizes the traditional mutual information by integrating Lautum information, has been shown as a critical quantity in communication~\cite{aminian2015capacity} and learning theory~\cite{aminian2023information}. This paper considers the problem of computing the capacity in terms of $I_{\mathrm{SKL}}$ for a fixed discrete channel. Such a maximization problem is reformulated into a discrete quadratic optimization with a simplex constraint. One major challenge here is the non-concavity of Lautum information, which complicates the optimization problem. Our method involves symmetrizing the KL divergence matrix and applying iterative updates to ensure a non-decreasing update while maintaining a valid probability distribution. We validate our algorithm on Binary symmetric Channels and Binomial Channels, demonstrating its consistency with theoretical values. Additionally, we explore its application in machine learning through the Gibbs channel, showcasing the effectiveness of our algorithm in finding the worst-case data distributions.
09:30	Amanda Merkley and Pulkit Grover Understanding Neural Population Communication with Latent Channels ABSTRACT. Experiments in neuroscience often aim to understand how information about a message, e.g., an experimental stimulus, is communicated between neural populations, where each population can comprise millions of neurons. In such high-dimensional systems, dimension reduction is imperative for isolating message-relevant communication. Here, we address the problem of defining and characterizing latent representations of population-level communication. We introduce the concept of a latent channel (LC) to capture statistical communication between latent activity of two neural populations. Through simple, illustrative examples of population communication, we show how four kinds of LCs emerge from message-specific interactions between populations. We then propose an algorithm for identifying `valid’ LCs that optimally preserve information about the message. Finally, we use Partial Information Decomposition, a framework for describing how two populations interact with respect to the message, to provide insight into how LCs preserve and reallocate pieces of message-relevant information from the original, high-dimensional system. Through definitions and examples, we provide a principled approach for understanding latent representations of neural population communication.

08:30-10:10 Session 4E: Sequential Methods II

Invited Session

Chairs:

Venu Veeravalli and Georgios Fellouris

Location: 2000N

08:30	Ayelet Cohen-Portnoy and Alejandro Cohen Non-Adaptive Multi-Stage Algorithm for Group Testing with Prior Statistics PRESENTER: Alejandro Cohen ABSTRACT. In this paper, we propose an efficient multi-stage algorithm for non-adaptive Group Testing (GT) with general correlated prior statistics. The proposed solution can be applied to any correlated statistical prior represented in trellis, e.g., finite state machines and Markov processes. We introduce a variation of List Viterbi Algorithm (LVA) to enable accurate recovery using much fewer tests than objectives, which efficiently gains from the correlated prior statistics structure. Our numerical results demonstrate that the proposed Multi-Stage GT (MSGT) algorithm can obtain the optimal Maximum A Posteriori (MAP) performance with feasible complexity in practical regimes, such as with COVID-19 and sparse signal recovery applications, and reduce in the scenarios tested the number of pooled tests by at least 25% compared to existing classical low complexity GT algorithms. Moreover, we analytically characterize the complexity of the proposed MSGT algorithm that guarantees its efficiency.
08:50	Brian Cho, Aurelien Bibaut and Nathan Kallus Post-Contextual Bandit Inference with Adaptive Stopping ABSTRACT. Post-Contextual Bandit Inference with Adaptive Stopping
09:10	Aaditya Ramdas Power-one sequential tests, confidence sequences, e-processes and e-detectors ABSTRACT. This talk will summarize recent progress in a line of work called anytime-valid inference that stems back to Robbins, Lai and Siegmund (1967-76). E-processes are a nonparametric and composite generalization of nonnegative supermartingales. If a problem is (power-one) sequentially testable at level a if and only if it is testable by thresholding an e-process at 1/a. In fact, all admissible power-one sequential tests must be based on thresholding e-processes using Ville's inequality. All admissible confidence sequences can be obtained by inverting admissible sequential tests. For sequential change detection, we provide two very general (and potentially all-encompassing) frameworks for change detection: one employing e-processes (e-detectors) and one employing confidence sequences. Together, these provide a unified and extremely general and powerful framework for sequential testing, estimation and change detection, providing new solutions for lots of composite and nonparametric problems.
09:30	Liad Lea Didi, Tomer Gafni and Kobi Cohen Active Change Point Anomaly Detection Over Composite Hypotheses ABSTRACT. The problem of detecting a single anomalous process among a finite number M of processes is considered. We examine a composite hypothesis case, where observations of a process follow a common distribution with an unknown parameter (vector). The parameter value resides in either normal or abnormal parameter spaces, contingent on the process state. Until the change point, all processes are in a normal state, and after the change point, one process transitions to an abnormal state. Our goal is to develop a sequential search strategy that minimizes the expected detection time since the anomaly occurred subject to an error probability constraint. We develop a novel anomaly detection algorithm, named Searching for Change Point Anomaly (SCPA), with the following desired properties. Firstly, when no additional side information on the process states is available, the proposed algorithm is asymptotically optimal in terms of minimizing the detection delay as the error probability approaches zero. Secondly, in the scenario where the parameter value under the null hypothesis is known and equal for all normal processes, the proposed algorithm is also asymptotically optimal, demonstrating improved detection time determined by the true null state. Finally, we establish an explicit upper bound on the error probability under the proposed algorithm for the finite sample regime.
09:50	Mohamed Seif, Liyan Xie, Andrea Goldsmith and Vincent Poor Private Online Community Detection for Censored Block Models ABSTRACT. We study the private online change detection problem for dynamic communities, using a censored block model (CBM). Focusing on the notion of edge differential privacy (DP), we seek to understand the fundamental tradeoffs between the privacy budget, detection delay, and exact community recovery of community labels. We establish the theoretical lower bound on the delay in detecting changes privately and propose an algorithm capable of identifying changes in the community structure, while maintaining user privacy. Further, we provide theoretical guarantees for the effectiveness of our proposed method by showing necessary and sufficient conditions for change detection and exact recovery under edge DP. Simulation and real data examples are provided to validate the proposed method.

10:30-12:10 Session 5A: Games, Markets, and Learning

Invited Session

Chairs:

Tamer Basar and Rasoul Etesami

Location: Auditorium

10:30	Xiaochun Niu, Lihui Yi and Ermin Wei Incentive Analysis for Agent Participation in Federated Learning ABSTRACT. Federated learning presents a decentralized approach to machine learning, where multiple agents collaboratively train a model using their local data while preserving data privacy. This study investigates the incentives for each agent to join the global training process versus conducting local training independently. In this work, we focus on a multi-time period framework, where agents aim to minimize their accumulated expected loss functions. At each time period, agents decide whether to join the global training process or continue with their local training. Our goal is to determine if individual agents have sufficient incentives to participate in global training and, if so, to find the optimal time for their participation. By analyzing the trade-offs between global and local training in terms of expected loss reduction, we aim to develop insights into the incentive structures in federated learning and eventually advance collaborations in distributed machine learning systems.
10:50	Siddhartha Banerjee, Christina Lee Yu and Alankrita Bhatt Instance-Optimality in Online Learning ABSTRACT. We devise an online learning algorithm -- titled Switching via Monotone Adapted Regret Traces (SMART) -- that adapts to the data and achieves regret that is instance optimal, i.e., simultaneously competitive on every input sequence compared to the performance of the follow-the-leader (FTL) policy and the worst case guarantee of any other input policy. We show that the regret of the SMART policy on any input sequence is within a multiplicative factor e/(e−1)≈1.58 of the smaller of: 1) the regret obtained by FTL on the sequence, and 2) the upper bound on regret guaranteed by the given worst-case policy. This implies a strictly stronger guarantee than typical 'best-of-both-worlds' bounds as the guarantee holds for every input sequence regardless of how it is generated; we complement our algorithm with a fundamental lower bound showing that over all input sequences, no algorithm can get better than a 1.43-fraction of the minimum regret achieved by FTL and the minimax-optimal policy. Our approach is simple to implement as it begins by playing FTL and switches at most once during the time horizon to the worst-case algorithm. Our results follow from an operational reduction of instance optimal online learning to competitive analysis for the ski-rental problem, and we discuss how this idea may prove useful beyond the settings we consider.
11:10	Vijay Kamble and Eren Ozbay Incentives for Exploration at Market Equilibrium ABSTRACT. Online marketplaces face an exploration problem: the qualities of new supply units are unknown and must be discovered through customer feedback so that higher-quality supply gets prioritized for matching. However, customers are generally myopic and unwilling to participate in exploratory matches, leading to the well-known concern of incentivizing such exploration. This paper uncovers the role of competitive pricing effects arising from market congestion in creating incentives for exploratory behavior among myopic customers. The intuition is that since established higher-quality supply units are expected to be more popular and, hence, more congested, they naturally demand higher prices at a competitive equilibrium than new supply units, effectively incentivizing customers to explore. This paper presents a comprehensive analysis of the extent to which this intuition holds and the extent to which exogenous incentives for exploration are necessary for such markets. To investigate this question, we define a novel competitive equilibrium notion for markets with evolving public information about supply units. The key result establishes that, under a tightly characterized market regularity condition, the ratio of equilibrium matching value to the system optimal value is bounded by the aggregate congestion level, indicating that congested regular markets inherently incentivize exploration. We also show that this bound is optimal in the worst case. Furthermore, we show that in markets with linear information transition structures, the equilibrium achieves the first-best system-optimal value regardless of the congestion level. Finally, we address the problem of designing optimal price interventions to align market equilibrium with the system-optimal solution. Overall, our results inform market designers grappling with the concern of incentivizing exploration in various online marketplaces, shedding light on when interventions may be (un)necessary.
11:30	Eric Mazumdar, Kishan Panaganti and Laixi Shi Tractable Equilibrium Computation in Markov Games through Risk Aversion ABSTRACT. A significant roadblock to the development of principled multi-agent reinforcement learning is the fact that desired solution concepts like Nash equilibria may be intractable to compute. To overcome this obstacle, we take inspiration from behavioral economics and show that -- by imbuing agents with important features of human decision-making like risk aversion and bounded rationality -- a class of risk-averse quantal response equilibria (RQE) become tractable to compute in all n-player matrix and finite-horizon Markov games. In particular, we show that they emerge as the endpoint of no-regret learning in suitably adjusted versions of the games. Crucially, the class of computationally tractable RQE is independent of the underlying game structure and only depends on agents' degree of risk-aversion and bounded rationality. To validate the richness of this class of solution concepts we show that it captures peoples' patterns of play in a number of 2-player matrix games previously studied in experimental economics. Furthermore, we give a first analysis of the sample complexity of computing these equilibria in finite-horizon Markov games when one has access to a generative model and validate our findings on a simple multi-agent reinforcement learning benchmark.
11:50	Rohit Parasnis, Visal Soeun and Saurabh Amin Designing interventions in multi-activity network games for forest resource management ABSTRACT. Motivated by the challenge of sustainable forest resource management, we consider the problem of shaping the incentives of entities that have harvesting rights in forest concession regions. We study optimal price-shaping in a coupled activity network game model in which these entities (agents) engage in two activities: (a) sustainable production of commodities, which does not interfere with protected forest resources, and (b) unsustainable production, which involves infringing into protected regions to expand their agricultural footprint. Crucially, we account for strategic agent-to-agent interactions as well as the clandestine deforestation activities of shadow companies. We focus on solving a social welfare maximization subject to budget constraints and bounds on aggregate unsustainable production at equilibrium and show that the resulting pricing policies are effective in reducing incentives for unsustainable production. Our results provide novel insights into how the interaction network topology, strategic substitutability between sustainable and unsustainable production, and cross-activity and intra-activity effects of strategic interactions shape the optimal policies. We also identify a node centrality measure that helps identify agents that contribute the most to reducing aggregate unsustainable production. Our empirical analyses validate our theoretical results and rank the performance of all our policy designs using two yardsticks: social welfare and aggregate unsustainable production in the post-intervention equilibrium.

10:30-12:10 Session 5B: Quantum Error-Correction

Invited Session

Chairs:

Dror Baron, Yuan Liu, Eric Chitambar and Olgica Milenkovic

Location: 1040

10:30	Bane Vasic, Nithin Raveendran and David Declercq Quantum LDPC Codes: Constructions and Decoders Handling Degeneracy ABSTRACT. Quantum error correction is critical for the practical realization of fault-tolerant quantum computing. Novel families of quantum low-density parity-check (QLDPC) codes with superior asymptotic error correction code properties have been proposed recently and are also considered for quantum computing hardware architectures/implementation. From a practical perspective, it is also essential to design finite-length QLDPC codes with high minimum distance and encoding rates and have efficient decoders. This talk gives an overview of our recent results in understanding degenerate errors of QLDPC codes and new decoders that correct them.
10:50	Eduardo Camps Moreno, Hiram López, Gretchen Matthews, Diego Ruano, Rodrigo San-José and Ivan Soprunov Binary Triorthogonal and CSS-T Codes for Quantum Error Correction ABSTRACT. In this paper, we study binary triorthogonal codes and their relation to CSS-T quantum codes. We characterize the binary triorthogonal codes that are minimal or maximal with respect to the CSS-T poset, and we also study how to derive new triorthogonal matrices from existing ones. Given a binary triorthogonal matrix, we characterize which of its equivalent matrices are also triorthogonal. As a consequence, we show that a binary triorthogonal matrix uniquely determines the parameters of the corresponding triorthogonal quantum code, meaning that any other equivalent matrix that is also triorthogonal gives rise to a triorthogonal quantum code with the same parameters.
11:10	Oskar Novak and Narayanan Rengaswamy GNarsil: Splitting Stabilizers into Gauges ABSTRACT. Quantum subsystem codes have been shown to improve error-correction performance, ease the implementation of logical operations on codes, and make stabilizer measurements easier by decomposing stabilizers into smaller-weight gauge operators. In this paper, we present two algorithms that produce new subsystem codes from a "seed" CSS code. They replace some stabilizers of a given CSS code with smaller-weight gauge operators that split the remaining stabilizers, while being compatible with the logical Pauli operators of the code. The algorithms recover the well-known Bacon-Shor code computationally as well as produce a new [[9,1,2,2]] rotated surface subsystem code with weight-3 gauges and weight-4 stabilizers. We illustrate using a [[100,25,3]] subsystem hypergraph product (SHP) code that the algorithms can produce more efficient gauge operators than the closed-form expressions of the SHP construction. However, we observe that the stabilizers of the lifted product quantum LDPC codes are more challenging to split into small-weight gauge operators. Hence, we introduce the subsystem lifted product (SLP) code construction and develop a new [[775,124,20]] code from Tanner's classical quasi-cyclic LDPC code. The code has high-weight stabilizers but all gauge operators that split stabilizers have weight 5, except one. In contrast, the LP stabilizer code from Tanner's code has parameters [[1054,124,20]]. This serves as a novel example of new subsystem codes that outperform stabilizer versions of them. Finally, based on our experiments, we share some general insights about non-locality's effects on the performance of splitting stabilizers into small-weight gauges.
11:30	Mert Gökduman, Hanwen Yao and Henry Pfister Erasure Decoding for Quantum LDPC Codes via Belief Propagation with Guided Decimation ABSTRACT. Quantum low-density parity-check (LDPC) codes are a promising family of quantum error-correcting codes with low overhead. Decoding quantum LDPC codes on quantum erasure channels has received more attention recently due to advances in erasure conversion for various types of qubits including neutral atoms, trapped ions, and superconducting qubits. Belief propagation with guided decimation (BPGD) decoding of quantum LDPC codes has demonstrated good performance in bit-flip and depolarizing noise. In this work, we apply BPGD decoding to quantum erasure channels. Using a natural modification, we show that BPGD offers competitive performance on quantum erasure channels for multiple families of quantum LDPC codes. Furthermore, we show that the performance of BPGD decoding on erasure channels can sometimes be improved significantly by either adding damping or adjusting the initial channel log-likelihood ratio for bits that are not erased. More generally, our results demonstrate BPGD is an effective general-purpose solution for erasure decoding across the quantum LDPC landscape.
11:50	Serge Adonsou, Guillaume Dauphinais, David Kribs, Priya Nadkarni and Michael Vasmer Unified and Generalized Approach to Entanglement-Assisted Quantum Error Correction ABSTRACT. In this talk, I’ll give an overview of a new framework for entanglement-assisted quantum error correcting codes that both unifies and generalizes the three original frameworks for such codes called EAQEC, EAOQEC, and EACQ. The approach is arrived at by viewing entanglement-assisted codes from the ‘operator algebra quantum error correction’ perspective, and it is built upon a recently established extension of the stabilizer formalism for quantum error correction to that setting (and so we denote it by EAOAQEC). Time dependent, I’ll briefly discuss a selection of our main results and new code constructions. This talk is based on joint work with Serge Adonsou, Guillaume Dauphinais, Priya Nadkarni, and Michael Vasmer.

10:30-12:10 Session 5C: Federated, Distributed, and Statistical Learning

Location: 2000N

10:30	Ran Greidi and Kobi Cohen Communication-Efficient Federated Learning via Sparse Training with Regularized Error Correction ABSTRACT. Federated Learning (FL) is an emerging paradigm that allows for decentralized machine learning (ML), where multiple models are collaboratively trained in a privacy-preserving manner. However, since communications and computation resources are limited, training DNN models in FL systems face challenges such as elevated computational and communication costs in complex tasks. Sparse training schemes gain increasing attention in order to scale down the dimensionality of each client (i.e., node) transmission. Specifically, sparsification with error correction methods is a promising technique, where only important updates are sent to the parameter server (PS) and the rest are accumulated locally. While error correction methods have shown to achieve a significant sparsification level of the client-to-PS message without harming convergence, pushing sparsity further remains unresolved due to the staleness effect. In this paper, we propose a novel algorithm, dubbed Federated Learning with Accumulated Regularized Embeddings (FLARE), to overcome this challenge. FLARE presents a novel sparse training approach via accumulated pulling of the updated models with regularization on the embeddings in the FL process, providing a powerful solution to the staleness effect, and pushing sparsity to an exceptional level. Our theoretical analysis demonstrates that FLARE's regularized error feedback achieves significant improvements in scalability with sparsity parameter. The performance of FLARE is validated through experiments on diverse and complex models, achieving a remarkable sparsity level (10 times and more beyond the current state-of-the-art) along with significantly improved accuracy.
10:50	Sajjad Ghiasvand, Amirhossein Reisizadeh, Mahnoosh Alizadeh and Ramtin Pedarsani Communication-efficient and Decentralized Federated Minimax Optimization PRESENTER: Sajjad Ghiasvand ABSTRACT. As distributed learning applications like Federated Learning, the Internet of Things (IoT), and Edge Computing expand, addressing their limitations becomes crucial. We approach decentralized learning across a network of communicating clients or nodes, focusing on two primary challenges: data heterogeneity and adversarial robustness. To address these, we introduce a decentralized minimax optimization method incorporating two key components: local updates and gradient tracking. Minimax optimization serves as a fundamental tool for adversarial training, ensuring robustness. Local updates are vital in Federated Learning (FL) to alleviate the communication bottleneck, while gradient tracking is necessary to demonstrate convergence amid data heterogeneity. Our analysis of the proposed algorithm, Dec-FedTrack, in nonconvex-strongly-concave minimax optimization demonstrates its convergence to a stationary point. Additionally, numerical experiments support our theoretical results.
11:10	Tamir L.S. Gez and Kobi Cohen A Masked Pruning-Based Algorithm for Dimensionality Reduction in Federated Learning Systems ABSTRACT. Federated Learning (FL) represents a growing machine learning (ML) paradigm designed for training models across numerous nodes that retain local datasets, all without directly exchanging the underlying private data with the parameter server (PS). Its increasing popularity is attributed to notable advantages in terms of training deep neural network (DNN) models under privacy aspects and efficient utilization of communication resources. Unfortunately, DNNs suffer from high computational and communication costs, as well as memory consumption in intricate tasks. These factors restrict the applicability of FL algorithms in communication-constrained systems with limited hardware resources. In this paper, we develop a novel algorithm that overcomes these limitations by synergistically combining a pruning-based method with the FL process, resulting in low-dimensional representations of the model with minimal communication cost, dubbed Masked Pruning over FL (MPFL). The algorithm operates by initially distributing weights to the nodes through the PS. Subsequently, each node locally trains its model and computes pruning masks. These low-dimensional masks are then transmitted back to the PS, which generates a consensus pruning mask, broadcasted back to the nodes. This iterative process enhances the robustness and stability of the masked pruning model. The generated mask is used to train the FL model, achieving significant bandwidth savings. We present an extensive experimental study demonstrating the superior performance of MPFL compared to existing methods. Additionally, we have developed an open-source software package for the benefit of researchers and developers in related fields.
11:30	Adrian Edin, Zheng Chen, Michel Kieffer and Mikael Johansson Temporal Predictive Coding for Gradient Compression in Distributed Learning ABSTRACT. This paper proposes a prediction-based gradient compression method for distributed learning with event-triggered communication. Our goal is to reduce the amount of information transmitted from the distributed agents to the parameter server by exploiting temporal correlation in the local gradients. We use a linear predictor that combines past gradients to form a prediction of the current gradient, with coefficients that are optimized by solving a least-square problem. In each iteration, every agent transmits the predictor coefficients to the server such that the predicted local gradient can be computed. The difference between the true local gradient and the predicted one, termed the prediction residual, is only transmitted when its norm is above some threshold. When this additional communication is omitted, the server uses the prediction as the estimated gradient. This proposed design shows notable performance gains compared to existing methods in the literature, achieving convergence with reduced communication costs.
11:50	Xuanzhou Chen, Austin Xu, Jingyan Wang and Ashwin Pananjady Learning the eye of the beholder: Statistical modeling and estimation for personalized color perception ABSTRACT. Color perception has long remained an intriguing topic spanning vision and cognitive science, signal processing, and computer graphics. People are often classified as either “color-normal” or “color-blind” and there are a few types of colorblindness that are the most prevalent. Such a classification has had significant downstream impact, e.g., on how we correct for colorblindness on computer displays. At the same time, empirical evidence, such as in optometry and vision science, has repeatedly suggested that categories for colorblindness only serve as approximations to real manifestations of it. With the motivation of better understanding individual-level color perception, we propose a model for color vision that unifies existing theories for color-normal and color-blind populations. This model posits a certain type of low-dimensional structure in color space according to which any given person distinguishes colors. We design an algorithm to learn this low-dimensional structure from user queries, and prove statistical guarantees on its performance. To collect user data, we adapt a user interface design, termed “perceptual adjustment queries” (PAQs), to a color matching task. This user interface efficiently infers a user’s color distinguishability profile from a small number of cognitively lightweight responses. In a host of simulations, PAQs offer significant advantages over the de facto method of collecting comparison-based similarity queries. A user study shows that our method captures individual differences among color-normal and color-blind people.

10:30-12:10 Session 5D: Quantization, Compression, and Computation

Location: 3000

10:30	Anju Anand and Emrah Akyol Strategic Quantization with Quadratic Distortion Measures ABSTRACT. We consider the problem of strategic quantization where an encoder and a decoder with misaligned objectives communicate over a rate-constrained noiseless channel. Specifically, we focus on a 2-dimensional source with quadratic distortion measures. We provide a design algorithm for this special case of strategic quantization, as well as an upper bound on the encoder distortion via employing linear communication strategies. Finally, we present comparative numerical results obtained via the proposed method, in conjunction with the aforementioned upper bound. We provide our numerical results and the code to obtain them for research purposes at https://tinyurl.com/allerton24.
10:50	Samar Agnihotri, Renu Rameshan, Ritwik Ghosal and Prerit Gupta Lossless Binary Image Compression Using Learned Multi-level Dictionaries PRESENTER: Renu Rameshan ABSTRACT. Lossless image compression is required in various applications to reduce storage or transmission costs of images, while requiring the reconstructed images to have zero information loss compared to the original. Existing lossless image compression methods either have simple design but poor compression performance, or complex design, better performance, but with no performance guarantees. In our endeavor to develop a lossless image compression method with low complexity and guaranteed performance, we argue that compressibility of a color image is essentially derived from the patterns in its spatial structure, intensity variations, and color variations. Thus, we divide the overall design of a lossless image compression scheme into three parts that exploit corresponding redundancies. We further argue that the binarized version of an image captures its fundamental spatial structure. In this first part of our work, we propose a scheme for lossless compression of binary images. The proposed scheme first learns dictionaries of 16×16, 8×8, 4 × 4, and 2 × 2 square pixel patterns from various datasets of binary images. It then uses these dictionaries to encode binary images. These dictionaries have various interesting properties that are further exploited to construct an efficient and scalable scheme. Our preliminary results show that the proposed scheme consistently outperforms existing conventional and learning based lossless compression approaches, and provides, on average, as much as 1.5× better performance than a common general purpose lossless compression scheme (WebP), more than 3× better performance than a state of the art learning based scheme, and better performance than a specialized scheme for binary image compression (JBIG2).
11:10	Derya Malak, Mohammad Reza Deylam Salehi, Berksan Serbetci and Petros Elia Multi-Functional Distributed Computing PRESENTER: Derya Malak ABSTRACT. The work here studies the communication cost for a multi-server, multi-task distributed computation framework, and does so for a broad class of functions and data statistics. Considering the framework where a user seeks the computation of multiple complex (conceivably non-linear) tasks from a set of distributed servers, we establish communication cost upper bounds for a variety of data statistics, function classes and data placements across the servers. To do so, we proceed to apply, for the first time here, Körner's characteristic graph approach --- which is known to capture the structural properties of data and functions --- to the promising framework of multi-server multi-task distributed computing. Going beyond the general expressions, and in order to offer clearer insight, we also consider the well-known scenario of cyclic dataset placement and linearly separable functions over the binary field, in which case our approach exhibits considerable gains over the state of the art. Similar gains are identified for multi-linear functions.

10:30-12:10 Session 5E: Information Theory, Statistics, and Data Science III

Invited Session

Chairs:

Pan Li, Kamalika Chaudhuri, Eli Chien and Olgica Milenkovic

Location: 1030

10:30	Eli Chien, Haoyu Wang, Ziang Chen and Pan Li Langevin Unlearning: A New Perspective of Noisy Gradient Descent for Machine Unlearning ABSTRACT. Machine unlearning has raised significant interest with the adoption of laws ensuring the ``right to be forgotten''. Researchers have provided a probabilistic notion of approximate unlearning under a similar definition of Differential Privacy (DP), where privacy is defined as statistical indistinguishability to retraining from scratch. We propose Langevin unlearning, an unlearning framework based on noisy gradient descent with privacy guarantees for approximate unlearning problems. Langevin unlearning unifies the DP learning process and the privacy-certified unlearning process with many algorithmic benefits. These include approximate certified unlearning for non-convex problems, complexity saving compared to retraining, sequential and batch unlearning for multiple unlearning requests.
10:50	Salim El Rouayheb You Shall Not Leak Your Brother's Data: Privacy Control in the Presence of Correlation ABSTRACT. Online users are being given more agency over their online privacy. The California Consumer Privacy Act (CCPA) and the EU General Data Protection Regulation (GDPR) give individual online users control over the personal information collected about them. However, this gives rise to a somehow overlooked problem in privacy. Due to correlation, one has still to worry about privacy even when opting out of it. For example, a user with a public profile in a social network may unknowingly compromise the privacy settings of his friends and family due to them having correlated behavior. In this talk, I will give a mathematical formulation of this problem which we refer to as On/Off privacy. I will discuss algorithms that allow opting in and out of privacy in the presence of different types of correlations, including Markov (as in location-based services), Hidden Markov Model (for time series and signals) and Bayesian networks (for graphical data).
11:10	Sina Eghbal, Badri Vellambi, Lawrence Ong and Parastoo Sadeghi Restricted Converse Results for g-Group Complete-{s} Pliable Index Coding Problems ABSTRACT. This paper expands upon the optimality results for a specific class of PICOD(t) problems referred to as g-group complete-S problems where S is a singleton set. The paper introduces a new class of codes referred to as excess-rate and derive non-achievability results that establishes the optimality of the previously proposed achievable scheme for g-group complete-{s} PICOD(t) problems when restricting codes from this class of codes. Using the aforementioned result, we derive the capacity region when either t or t + 1 is a multiple of the number of groups not in any user’s side information.
11:30	Jiaqi Wang and Ziv Goldfeld Differentially Private Sinkhorn Algorithm ABSTRACT. Optimal transport (OT) theory play a central role in the design and analysis of various machine learning algorithms. As such, approximate computation of the OT cost between large-scale dataset via the popular Sinkhorn algorithms forms a basic primitive. However, this approach may lead to privacy violations when dealing with datasets that contain sensitive information. To address this predicament, we propose a differentially private variant of the Sinkhorn algorithm and couple it with formal guarantees by deriving its privacy utility tradeoff (PUT). To that end, the Sinkhorn algorithm is treated as a block coordinate descent algorithm scheme, which we privatize by injecting Gaussian noise to the iterates. We establish a linear convergence rate for our private Sinkhorn algorithm and analyze its privacy by controlling the Renyi divergence between outputs corresponding to neighboring input dataset. Combining these results we obtain the desired PUT. In doing so, this work also closes an existing gap in formal guarantees for private constrained nonlinear optimization. As an application, we employ the noisy Sinkhorn algorithm for differentially private (approximate) computation of OT cost and derive insights from its PUT.
11:50	Gokcan Tatli and Ramya Vinayak Metric Clustering From Triplet Comparisons ABSTRACT. Using triplet comparison queries of the form "Do you think item a is more similar to item b or item c?" to learn the a positive definite matrix to capture a distance metric in $\mathbb{R}^d$ has been a popular approach to capture how human perceive similarity and differences between various objects/concepts. Most of the existing works focus on learning a single metric using data from all people in the dataset. However, people can systematically differ in their notions of similarity over a set of objects due to their diverse backgrounds. Therefore, using a single metric for everyone has limited capacity in capture the heterogeneity while modeling how people perceive objects in populations that have diverse subgroups. The subgroup structure is often salient and difficult to know a priori. We propose to learn the subgroup structure from the answers to triplet queries by clustering the user-triplet observation matrix. By modeling the problem of metric clustering as a low-rank matrix recovery problem, we leverage convex optimization based approach to perform clustering. We provide analysis for the cluster case that sheds light on when the approach succeeds and fails as function of distance between the metrics, size of the clusters, number of triplet queries answered per person and the noise level in the answers obtained. We validate of results through extensive simulations. Furthermore, we also provide analysis that shows how an outlier impacts the discovery of cluster structure.

16:00-17:00 Session 6: PLENARY

Location: Electrical and Computer Engineering Building (1002 ECEB)

Location: 1030