Organizers - Alejandro Dominguez-Garcia and Subhonmesh Bose
Organizers - Venu Veeravalli, George Atia, and Shaofeng Zou
Organizers - Bruce Hajek and R. Srikant
Organizers - Olgica Milenkovic and Rasoul Etesami
Organizers - G. Fellouris and V. Veeravalli
Organizers - Alejandro Dominguez-Garcia and Subhonmesh Bose
Organizers - Richard Zhang and Salar Fattahi
10:30 | One-Shot Non-Catalytic Distributed Purity Distillation ABSTRACT. In this paper we revisit the problem of distributed purity distillation in the one-shot setting. The problem considers a bipartite quantum state $\rho^{AB}$ shared between two parties Alice and Bob, who are allowed local unitary operations and one-way classical communication. The goal is to extract pure qubit states using these given resources, in their respective laboratories. A caveat of this problem is that all known protocols which recover the optimal number of pure qubit states require some pure qubits at the beginning of the protocol, to be used as ancilla. These ancilla qubits are used catalytically and recovered at the end of the protocol, but typically a large amount of ancilla is necessary for the protocol to work. Some workarounds exist via bootstrapping in the asymptotic iid setting, but aside from being rather ad-hoc, these ideas fail in the one-shot setting. In this paper, we present a one-shot purity distillation protocol which requires no ancilla qubits to work for most choices of the state $\rho^{AB}$. Due to the nature of one-shot quantities, there do exist certain pathological situations in which our protocol requires some ancilla to work, but we show that these pathological cases do not arise when multiple iid copies of the underlying state are available. |
10:50 | Privacy Amplification Against Quantum Side Information Via Regular Random Binning ABSTRACT. We consider privacy amplification against quantum side information by using regular random binning as an effective extractor. For constant-type sources, we obtain both error exponent and strong converse bounds in terms of the so-called quantum Augustin information. Via type decomposition, we then recover the error exponent for independent and identically distributed sources proved by Dupuis [arXiv:2105.05342]. As an application, we obtain an achievable secrecy exponent for classical-quantum wiretap channel coding in terms of the Augustin information, which solves an open problem in [IEEE Trans. Inf. Theory, 65(12):7985, 2019]. Our approach is to establish an operational equivalence between privacy amplification and quantum soft covering; this may be of independent interest. |
11:10 | Quantum Key Distribution with State Replacement PRESENTER: Michael Graifer ABSTRACT. Quantum key distribution protocols have been studied for almost four decades, and have already seen real-world applications. Such implementations often operate in naturally noisy environments (e.g., sensor thermal noise, sun radiation, etc.) with known or measurable noise levels. However, existing security proofs implicitly attribute any channel noise to a possible eavesdropper, an overly pessimistic assumption that can adversely affect the guaranteed protocol key-rate. In this paper, we analyze general prepare-and-measure QKD schemes, and show how to improve the bounds on the key-rate under the assumption of an independent state replacement noise at the receiver. Taking BB84 as a concrete example, our results theoretically guarantee a higher key-rate, as a function of the minimal possible replacement probability. In practice, for some parameter regimes, this guarantees a higher tolerable QBER. |
11:30 | Verifying Chemical Reaction Networks with the Isabelle Theorem Prover ABSTRACT. Molecular programming involves algorithmically controlling nanoscale devices to compute functions, create structures, and manipulate molecules to perform various tasks. Many programmed nanoscale devices have been successfully demonstrated in the laboratory with physical experiments. Some of the applications for these devices are safety critical, and thus require extensive verification for their correctness. Unfortunately, experiments and deployment of these applications require molecular device counts on the order of ten to the eighteenth power, making verification via model checking and simulation infeasible. Verification of these systems must therefore be accomplished through proof techniques independent of the number of devices in the system. Developing these proofs can be tedious and time-consuming. In this paper, we develop general proof constructs for a subclass of molecular programs that compute semilinear functions modeled by a stochastic chemical reaction network. These proof techniques are developed in Isabelle, an automated theorem prover, with resulting theory files packaged so that they can be reused. |
Organizers - Alejandro Dominguez-Garcia and Subhonmesh Bose
13:30 | Aggregating Power Networks for Decentralized Control: A Spectral Clustering Approach ABSTRACT. With the large-scale integration of variable distributed energy resources (DERs) and the growing influence of aggregators in power networks, electricity generation and grid operation are increasingly becoming decentralized. Aggregating the power network into smaller groups of loosely-coupled subsystems facilitates operational flexibility, the prevention of widespread blackouts, and efficient grid operation. In this work, we establish a notion of similarity and closeness of the buses in the network based on the sensitivity matrix obtained from the power flow solution. The underlying coupling between buses is then represented as a directed graph (similarity graph). We utilize the computationally efficient and highly scalable spectral clustering technique to analyze the network Laplacian of the similarity graph. Results of the proposed clustering technique on various IEEE test systems are provided. |
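The clustering step described above can be illustrated with a short sketch. The snippet below is a minimal illustration, not the paper's exact construction: the edge-weight definition from the sensitivity matrix, the symmetrization of the directed similarity graph, and the use of k-means on the spectral embedding are all assumptions made for the example.

```python
# Minimal sketch: spectral clustering of buses from a power-flow sensitivity matrix.
import numpy as np
from sklearn.cluster import KMeans

def cluster_buses(S, n_clusters):
    """S: (n_bus x n_bus) sensitivity matrix from a power flow solution (illustrative input)."""
    W = np.abs(S)                      # treat |sensitivity| as an edge weight
    W = 0.5 * (W + W.T)                # symmetrize the directed similarity graph
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    L = np.diag(d) - W                 # graph Laplacian of the similarity graph
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :n_clusters]        # eigenvectors of the k smallest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)

labels = cluster_buses(np.random.rand(30, 30), n_clusters=3)  # toy sensitivity matrix
```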
13:50 | Operating-Envelopes-Aware Decentralized Welfare Maximization for Energy Communities PRESENTER: Lang Tong ABSTRACT. We consider the optimal scheduling of consumption and distributed energy resources within an energy community managed by a profit-neutral community operator in a distribution system. We assume that the distribution system operator transacts with its customers under the regulated net energy metering tariff, and it sets operating envelopes to ensure system reliability. We propose an incentive-aligned community pricing mechanism under which the decentralized optimization of community members' benefit implies the optimization of overall community welfare. The proposed pricing mechanism satisfies the cost-causation principle and ensures the stability of the energy community in a coalition game setting. Numerical examples provide insights into the characteristics of the proposed pricing mechanism and quantitative measures of its performance. |
14:10 | Dynamic Load Balancing of Selfish Drivers between Spatially Distributed Electrical Vehicle Charging Stations ABSTRACT. This paper considers an electric vehicle recharging infrastructure made up of physically separate stations serving spatially distributed requests for charge. Arriving EVs receive feedback on transport times to each station, and waiting times at congested stations, based on which they make a selfish selection. We present a fluid model of the resulting dynamics, in particular modeling queueing delays as a function of fluid queues, and two different models of client departures: given sojourn times, or given service times. In each case, the joint load balancing dynamics is related to a convex program, a suitable variant of a centralized optimal transport problem. In particular we show the correspondence between equilibrium points and the corresponding optima, and use Lagrange duality to interpret the convergence properties of the dynamics. The results have similarities and differences with classical work on selfish routing for transportation networks. We present illustrative simulations, which also explore the validity of the model beyond the fluid assumption. |
14:30 | Topology-aware Piecewise Linearization of the AC Power Flow through Generative Modeling ABSTRACT. Effective power flow modeling critically affects the ability to efficiently solve large-scale grid optimization problems, especially those with topology-related decision variables. In this work, we put forth a generative modeling approach to obtain a piecewise linear (PWL) approximation of AC power flow by training a simple neural network (NN) model from actual data samples. By using the ReLU activation, the NN models can produce a PWL mapping from the input voltage magnitudes and angles to the output power flow and injection. Our proposed generative PWL model uniquely accounts for the nonlinear and topology-related couplings of power flow models, and thus it can greatly improve the accuracy and consistency of output power variables. Most importantly, it enables reformulating the nonlinear power flow and line status-related constraints into mixed-integer linear ones, such that one can efficiently solve grid topology optimization tasks like the AC optimal transmission switching (OTS) problem. Numerical tests using the IEEE 14- and 118-bus test systems have demonstrated the modeling accuracy of the proposed PWL approximation using a generative approach, as well as its ability to enable competitive OTS solutions at very low computational cost. |
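To make the PWL property concrete, here is a small sketch, with illustrative sizes and random (untrained) weights rather than the authors' model, showing that a one-hidden-layer ReLU network is exactly affine on each activation region, which is what allows the mixed-integer linear reformulation described above.

```python
# Sketch: a one-hidden-layer ReLU network is a piecewise-linear (PWL) map,
# so fitting it to power-flow samples yields a PWL surrogate of the AC equations.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 8, 32, 4          # e.g. (|V|, theta) -> power flows/injections
W1, b1 = rng.normal(size=(n_hidden, n_in)), rng.normal(size=n_hidden)
W2, b2 = rng.normal(size=(n_out, n_hidden)), rng.normal(size=n_out)

def pwl_model(x):
    h = np.maximum(W1 @ x + b1, 0.0)       # ReLU: the activation pattern defines a region
    return W2 @ h + b2

def local_linearization(x):
    """On the region where the activation pattern is fixed, the map is exactly affine."""
    active = (W1 @ x + b1 > 0).astype(float)
    A = W2 @ (active[:, None] * W1)        # local Jacobian (constant over the region)
    c = W2 @ (active * b1) + b2
    return A, c                            # pwl_model(x) == A @ x + c on this region

x0 = rng.normal(size=n_in)
A, c = local_linearization(x0)
assert np.allclose(pwl_model(x0), A @ x0 + c)
```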
Organizers - Bruce Hajek and R. Srikant
13:30 | "All-Something-Nothing" Phase Transitions in Planted Subgraph Recovery ABSTRACT. This talk delves into the problem of inferring a k-factor, specifically a spanning k-regular graph, planted within an ER random graph. Our research uncovers an interesting "all-something-nothing" phase transition. Specifically, we demonstrate that as the average degree of the ER random graph surpasses the critical threshold of 1/k, the inference problem undergoes a transition from near-perfect recovery ("all" phase) to partial recovery ("something" phase). Moreover, as the average degree continues to increase, the accuracy of recovery diminishes to zero, leading to the onset of the "nothing" phase. This finding complements the recent result by Mossel et al., who established that for certain sufficiently dense graphs, the problem undergoes an "all-or-nothing" phase transition, jumping from near-perfect recovery to near-zero accuracy. |
13:50 | The Reward Biased Method: An Optimism based Approach for Reinforcement Learning PRESENTER: P. R. Kumar ABSTRACT. A fundamental problem in reinforcement learning is what is referred to as the closed-loop identifiability problem, the conflict between exploration and exploitation, or the problem of dual control. Generally, for a system operating in closed-loop, one can only learn the closed-loop dynamics. To learn enough about the open-loop dynamics to be able to choose an optimal control law, one will need to excite the system to explore it. How to optimally do so? One approach proposed in 1980 consisted of adding a bias term that favored model parameters with smaller costs (or larger rewards) to the log-likelihood function, and then using a certainty equivalent control law. It was shown that this optimistic approach yields a long-term average cost (or reward) that is optimal for the unknown system. This result was established under various conditions in a variety of contexts including Multi-Armed Bandits (MABs), Markov Decision Processes (MDPs), Linear Quadratic Gaussian (LQG) systems, nonlinear systems, and controlled diffusions. This method is now called the ``Reward-Biased Method'' (RBM). An alternative optimistic approach was proposed by Lai and Robbins in 1983 in the context of MABs. It consisted of developing confidence intervals for each bandit, and using the upper end of the confidence interval to choose between bandits. Importantly, it was shown that this approach is asymptotically optimal with respect to the order of the ``regret'' as well as its pre-constant. Subsequently this approach was extended under various conditions to systems such as MDPs and Linear Quadratic (LQ) Gaussian systems. This method is now called the ``Upper Confidence Bound'' (UCB) Method. Over the past few years, the analysis of RBM has been extended to characterize its regret for MABs, MDPs and LQ sub-Gaussian systems. Also, comparative simulation studies have been conducted on the RBM, UCB and other methods. These results appear to show that RBM has excellent finite-time performance. RBM can be considered as a Lagrangian formulation, and UCB as a primal constrained optimization problem. The simulation results appear to suggest that the upper confidence regions of UCB are too large, or equivalently that the implicit Lagrange multiplier corresponding to UCB is too small. This paper will provide an account of these recent developments as well as some open problems. |
14:10 | On representation learning with the InfoNCE loss ABSTRACT. Contrastive learning is a popular means for learning representations of data without the need for labeled samples, with the InfoNCE loss being the most widely used objective for learning such representations. This method has had great empirical success, yet it is not well-understood how this loss enables representation learning. In brief, this method associates each data sample (e.g., an image) with positive augmentations (e.g., rotation, crop) of itself along with a finite number of negative samples (typically random, unrelated images); and each image with its augmentations forms a cluster in the image space. In this work, we show that if the clusters associated with related images (e.g., images of dogs) are possibly non-overlapping but are close and intertwined, then the InfoNCE loss provably recovers the underlying representation. |
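For reference, here is a minimal sketch of the standard InfoNCE objective with one positive and K negatives per anchor; the use of cosine similarity and the temperature value are assumptions made for the example, not details from the talk.

```python
# Sketch of the standard InfoNCE loss with cosine similarity.
import numpy as np

def infonce_loss(anchor, positive, negatives, tau=0.1):
    """anchor, positive: (d,); negatives: (K, d). Returns a scalar loss."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                      # numerical stability
    # Cross-entropy with the positive in slot 0: -log softmax(logits)[0]
    return -(logits[0] - np.log(np.exp(logits).sum()))

rng = np.random.default_rng(0)
loss = infonce_loss(rng.normal(size=16), rng.normal(size=16), rng.normal(size=(8, 16)))
```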
14:30 | Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms ABSTRACT. We consider a stochastic multi-armed bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward maximization throughout a sequence of T consecutive rounds. Though each objective has been individually well-studied, i.e., best arm identification for (i) and regret minimization for (ii), the simultaneous realization of both objectives remains an open problem, despite its practical importance. We introduce Regret Optimal Best Arm Identification which aims to achieve these dual objectives. To solve this problem, we characterize its fundamental limits and develop low-complexity algorithms for fast and regret optimal best arm identification. |
14:50 | Quantifying the Cost of Learning in Queueing Systems ABSTRACT. Queueing systems are widely applicable stochastic models with use cases in communication networks, healthcare, service systems, etc. Although their optimal control has been extensively studied, most existing approaches assume perfect knowledge of system parameters. Of course, this assumption rarely holds in practice where there is parameter uncertainty, thus motivating a recent line of work on bandit learning for queueing systems. This nascent stream of research focuses on the asymptotic performance of the proposed algorithms. In this paper, we argue that an asymptotic metric, which focuses on late-stage performance, is insufficient to capture the intrinsic statistical complexity of learning in queueing systems which typically occurs in the early stage. Instead, we propose the Cost of Learning in Queueing (CLQ), a new metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty. We characterize the CLQ of a single-queue multi-server system, and then extend these results to multi-queue multi-server systems and networks of queues. In establishing our results, we propose a unified analysis framework for CLQ which bridges Lyapunov and bandit analysis and could be of independent interest. |
15:10 | Stability Guarantees for Feature Attributions with Multiplicative Smoothing ABSTRACT. Explanation methods for machine learning models tend to not provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. To achieve such a model, we develop a smoothing method called Multiplicative Smoothing (MuS). We show that MuS overcomes theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with a variety of feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees. |
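As an illustration of the general idea, a classifier can be smoothed by averaging its scores over random multiplicative feature masks; the Bernoulli keep-masks, keep probability, and Monte Carlo averaging below are assumptions for the sketch and not necessarily the exact MuS construction.

```python
# Illustrative sketch: smoothing a classifier by random multiplicative feature masking.
import numpy as np

def smoothed_classifier(f, x, keep_prob=0.8, n_samples=256, seed=0):
    """f maps a feature vector to class scores; returns scores averaged over masks."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    scores = []
    for _ in range(n_samples):
        mask = rng.binomial(1, keep_prob, size=d)   # multiplicative feature mask
        scores.append(f(mask * x))
    return np.mean(scores, axis=0)

# Toy linear "classifier" for demonstration only.
W = np.random.default_rng(1).normal(size=(3, 10))
f = lambda z: W @ z
print(smoothed_classifier(f, np.ones(10)))
```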
13:30 | Exploring Gradient Oscillation in Deep Neural Network Training ABSTRACT. Understanding optimization in deep learning is a fundamental problem, and recent findings have challenged the previously held belief that gradient descent stably trains deep networks. In this study, we delve deeper into the instability of gradient descent during the training of deep networks. By employing gradient descent to train various modern deep networks, we provide empirical evidence demonstrating that a significant portion of the optimization progress occurs through the utilization of oscillating gradients. These gradients exhibit a high negative correlation between adjacent iterations. Furthermore, we make the following noteworthy observations about these gradient oscillations (GO): (i) GO manifests in different training stages for networks with diverse architectures; (ii) when using a large learning rate, GO consistently emerges across all layers of the networks; and (iii) when employing a small learning rate, GO is more prominent in the input layers compared to the output layers. These discoveries indicate that GO is an inherent characteristic of training different types of neural networks and may serve as a source of inspiration for the development of novel optimizer designs. |
13:50 | Robust Transfer Learning Based on Minimax Principle PRESENTER: Xinyi Tong ABSTRACT. The similarity between target and source tasks is a crucial quantity for theoretical analyses and algorithm designs in transfer learning studies. However, this quantity is often difficult to capture precisely. To address this issue, we make a boundedness assumption on the task similarity and then propose a mathematical framework based on the minimax principle, which minimizes the worst-case expected population risk under this assumption. Furthermore, our proposed minimax problem can be solved analytically, which provides a guideline for designing robust transfer learning models. According to the analytical expression, we interpret the influences of sample sizes, task distances, and the model dimensionality on knowledge transfer. Then, practical algorithms are developed based on the theoretical results. Finally, experiments conducted on image classification tasks show that our approaches can achieve robust and competitive accuracies for practical datasets. |
14:10 | Fast max-affine regression via stochastic gradient descent ABSTRACT. We consider regression of the max-affine model that combines multiple affine models via the max function. The max-affine model ubiquitously arises in applications such as multiclass classification and auction problems. It also generalizes the forward model in phase retrieval and the rectified linear unit (ReLU) activation function. We present a non-asymptotic convergence analysis of mini-batch stochastic gradient descent (SGD) for max-affine regression when the model is observed at random locations that are sub-Gaussian and satisfy an anti-concentration condition. Under these assumptions, a suitably initialized SGD converges linearly to the ground truth. Due to its low per-iteration cost, SGD converges faster than alternating minimization and gradient descent in run time. Our numerical results corroborate the presented theoretical results. |
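A minimal sketch of the algorithm class analyzed above: mini-batch SGD for the max-affine model y = max_j (a_j . x + b_j), where the gradient flows only through the active (argmax) affine piece of each sample. The random initialization, step size, and data below are illustrative, not the paper's prescribed scheme.

```python
# Sketch: mini-batch SGD for max-affine regression on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 3, 2000
A_true, b_true = rng.normal(size=(k, d)), rng.normal(size=k)
X = rng.normal(size=(n, d))
y = (X @ A_true.T + b_true).max(axis=1) + 0.01 * rng.normal(size=n)

A, b = rng.normal(size=(k, d)), np.zeros(k)    # illustrative random initialization
lr, batch = 0.05, 32
for it in range(3000):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    scores = Xb @ A.T + b                      # (batch, k)
    j = scores.argmax(axis=1)                  # active affine piece per sample
    resid = scores[np.arange(batch), j] - yb
    for m in range(k):                          # gradient only through the active piece
        sel = (j == m)
        if sel.any():
            A[m] -= lr * (resid[sel][:, None] * Xb[sel]).mean(axis=0)
            b[m] -= lr * resid[sel].mean()
```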
14:30 | Relative Entropy Regularization for Robust Submodular Multi-Task Subset Selection ABSTRACT. We propose a novel formulation for multi-task subset selection with the aim of finding a solution that is locally distributionally robust in the ``neighborhood'' of a reference distribution, assigning an importance score to each task in a multi-task objective, using relative entropy as a regularizer. Using duality, we show that this novel formulation is equivalent to a normalized, monotone nondecreasing submodular function, which can be optimized very efficiently with standard greedy-based methods. This approach bridges the existing gap in optimizing the performance-robustness tradeoff in multi-task subset selection. We then experimentally corroborate our theoretical results, comparing our approach with two other algorithms focused on optimizing the performance of the worst-case task, and on directly optimizing the performance on the reference distribution itself. We conclude that solving our novel formulation produces a solution that performs well on the reference distribution, is locally distributionally robust, and is quick in terms of computation time. |
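Since the dual formulation reduces to maximizing a normalized, monotone nondecreasing submodular function, the standard greedy method applies. Below is a generic greedy sketch under a cardinality constraint; the toy coverage objective stands in for the paper's regularized objective and is an assumption for illustration.

```python
# Generic greedy maximization of a monotone nondecreasing submodular set function
# under a cardinality constraint (toy objective, not the paper's).
def greedy_submodular(F, ground_set, budget):
    selected = set()
    for _ in range(budget):
        best_gain, best_e = -float("inf"), None
        for e in ground_set - selected:
            gain = F(selected | {e}) - F(selected)   # marginal gain of adding e
            if gain > best_gain:
                best_gain, best_e = gain, e
        selected.add(best_e)
    return selected

# Toy coverage objective: F(S) = number of items covered by the chosen sets.
cover = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}}
F = lambda S: len(set().union(*[cover[s] for s in S])) if S else 0
print(greedy_submodular(F, set(cover), budget=2))
```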
14:50 | Manifold Optimization for Data Driven Reduced-Order Modeling PRESENTER: Rachel Newton ABSTRACT. The focus of this paper is on data-driven reduced-order modeling. We assume a high fidelity, discrete-time model is available for simulation. The simulator allows state and output trajectories to be collected for any specified initial condition and input signal. An optimal reduced-order model (ROM) requires: (i) the selection of a lower dimensional subspace for the state of the ROM, and (ii) an optimal reduced-order state-space model (evolving on the lower dimensional subspace). A common heuristic is to: (i) select the lower-order subspace using proper orthogonal decomposition (POD), and (ii) use least-squares to fit the reduced-order state-space model on the POD subspace. We demonstrate the potential deficiencies of this heuristic via two simple examples. In order to address these deficiencies, we propose a novel method to optimize the choice of subspace using the Grassmannian manifold. Finally, we show that our proposed manifold optimization empirically outperforms the POD heuristic on the two motivating examples and a planar wind farm model. |
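The baseline heuristic discussed above, a POD subspace followed by a least-squares fit of the reduced dynamics, can be sketched in a few lines; the snapshot data and dimensions are illustrative, and this is not the paper's manifold-optimization method.

```python
# Sketch of the POD + least-squares ROM heuristic on toy snapshot data.
import numpy as np

rng = np.random.default_rng(0)
n, r, T = 50, 4, 200
A_full = 0.95 * np.linalg.qr(rng.normal(size=(n, n)))[0]   # toy stable discrete dynamics
X = np.zeros((n, T)); X[:, 0] = rng.normal(size=n)
for t in range(T - 1):
    X[:, t + 1] = A_full @ X[:, t]

# (i) POD: leading left singular vectors of the snapshot matrix.
U, _, _ = np.linalg.svd(X, full_matrices=False)
V = U[:, :r]                                               # n x r reduced basis

# (ii) Least-squares fit of reduced dynamics z_{t+1} = A_r z_t in POD coordinates.
Z = V.T @ X
A_r = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])
print(np.linalg.norm(Z[:, 1:] - A_r @ Z[:, :-1]) / np.linalg.norm(Z[:, 1:]))
```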
15:30 | Bounded (O(1)) Regret Recommendation Learning via Synthetic Controls Oracle ABSTRACT. In online exploration systems where users with fixed preferences repeatedly arrive, it has recently been shown that O(1), i.e., bounded regret, can be achieved when the system is modeled as a linear contextual bandit. This result may be of interest for recommender systems, where the popularity of their items is often short-lived, as the exploration itself may be completed quickly before potential long-run non-stationarities come into play. However, in practice, exact knowledge of the linear model is difficult to justify. Furthermore, potential existence of unobservable covariates, uneven user arrival rates, interpretation of the necessary rank condition, and users opting out of private data tracking all need to be addressed for practical recommender system applications. In this work, we conduct a theoretical study to address all these issues while still achieving bounded regret. Aside from proof techniques, the key differentiating assumption we make here is the presence of effective Synthetic Control Methods (SCM), which are shown to be a practical relaxation of the exact linear model knowledge assumption. We verify our theoretical bounded regret result using a minimal simulation experiment. |
15:50 | A Unified Analysis of Dynamic Interactive Learning PRESENTER: Thomas Maranzatto ABSTRACT. In this paper we investigate the problem of learning evolving concepts over a combinatorial structure. Previous work by Emamjomeh-Zadeh et al. (2020) introduced dynamics into interactive learning as a way to model non-static user preferences in clustering problems or recommender systems. We provide many useful contributions to this problem. First, we give a framework that captures the two different models analyzed by Emamjomeh-Zadeh et al. (2020), which allows us to study any type of concept evolution and matches the same query complexity bounds and running time guarantees of the previous models. Using this general model we solve the open problem of closing the gap between the upper and lower bounds on query complexity. Finally, we study an efficient algorithm where the learner simply follows the feedback at each round, and we provide mistake bounds for low diameter graphs such as cliques, stars, and general o(log n) diameter graphs by using a Markov Chain model. |
16:10 | Deep Learning Meets Adaptive Filtering: A Stein's Unbiased Risk Estimator Approach PRESENTER: Zahra Esmaeilbeig ABSTRACT. This paper revisits two prominent adaptive filtering algorithms through the lens of algorithm unrolling, namely recursive least squares (RLS) and equivariant adaptive source separation (EASI), in the context of source estimation and separation. Building upon the unrolling methodology, we introduce novel task-based deep learning frameworks, denoted as Deep RLS and Deep EASI. These architectures transform the iterations of the original algorithms into layers of a deep neural network, thereby enabling efficient source signal estimation by taking advantage of a training process. To further enhance performance, we propose training these deep unrolled networks utilizing a loss function grounded on a Stein's unbiased risk estimator (SURE). Our empirical evaluations demonstrate the efficacy of this SURE-based approach for enhanced source signal estimation. |
16:30 | Inverse Reinforcement Learning with Learning and Leveraging Demonstrators' Varying Expertise Levels ABSTRACT. A common assumption in most Inverse Reinforcement Learning (IRL) methods is that human demonstrations are drawn from an optimal policy. However, this assumption poses a problem in real-world applications, as humans may have varying aptitudes for different tasks. To address this, preference-based IRL methods, using human rankings of trajectories, have been explored as a potential solution. More recently, demonstrations from humans with different levels of expertise have also been utilized to relax this assumption. In this paper, we consider scenarios where there is no prior knowledge of the demonstrators' expertise levels and humans are not readily available to provide rankings. Therefore, we propose a new method called Inverse Reinforcement Learning from Varying Expertise Levels (IRL-VEL), which combines demonstrations from different demonstrators with different levels of expertise to learn a high performance policy from their proficiency scores—an indication of how optimal their demonstrations are. In particular, we solve an optimization problem to learn these scores and subsequently use them to efficiently learn a reward function that optimizes a policy through reinforcement learning. To assess the performance of our algorithm, we learn two separate policies from two reward functions: an initial reward function directly obtained from these combined demonstrations and a final reward function that incorporates the proficiency scores. We evaluate the proposed algorithm in two simulation environments, namely, the continuous mountain car and the MuJoCo swimmer. The simulation results demonstrate that the proposed method can effectively learn and leverage the expertise levels underlying the demonstrations to learn a policy, significantly outperforming the methods that do not account for the expertise levels. |
16:50 | Sequential Dependence Decomposition and Feature Learning PRESENTER: Xiangxiang Xu ABSTRACT. In this paper, we introduce an approach to decompose statistical dependence and learn informative features from sequential data. We first present a sequential decomposition of dependence, which extends the chain rule of mutual information. To learn this decomposition from data, we investigate the optimal feature representations associated with decomposed dependence modes and develop corresponding learning algorithms. Specifically, for stationary processes, we demonstrate applications of the learned features in computing optimal Markov approximation and testing the order of Markov processes. |
15:30 | A Multidimensional Opinion Evolution Model with Confirmation Bias ABSTRACT. In this paper, we present a novel opinion dynamics model in vector spaces. The model comprises individuals and information sources, where individuals' opinions evolve over time and are represented by vectors. The information sources are characterized by persistent viewpoints, while confirmation bias is explicitly incorporated into the model to account for individuals' selective information acquisition from information sources that align with their existing opinions. Drawing inspiration from the Friedkin-Johnsen model, this mechanism combines an individual's innate opinion, the opinions of their social network neighbors, and the influence of information sources. The social network operates on trust and remains static, while the information sources introduce dynamism through weighted norms based on the distance between an individual's and the information source's opinion vector. We characterize the convergence conditions for the proposed dynamics and present estimation and computation methods for steady-state values under both linear and non-linear state-dependent confirmation bias weight functions. |
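A toy sketch of dynamics of this flavor is given below; the exponential confirmation-bias kernel and the mixing coefficients are assumptions for illustration, not the paper's exact model.

```python
# Illustrative sketch: vector opinions updated as a mix of innate opinions,
# trusted neighbors, and information sources weighted by a confirmation-bias kernel.
import numpy as np

def step(X, X0, W, S, a=0.3, b=0.4, c=0.3):
    """X: (n, d) opinions; X0: innate opinions; W: (n, n) row-stochastic trust matrix;
    S: (m, d) persistent information-source viewpoints."""
    n = X.shape[0]
    X_new = np.empty_like(X)
    for i in range(n):
        dists = np.linalg.norm(X[i] - S, axis=1)
        w = np.exp(-dists**2)                      # confirmation-bias weights (assumed kernel)
        w = w / w.sum()
        src = w @ S                                # biased average of information sources
        X_new[i] = a * X0[i] + b * (W[i] @ X) + c * src
    return X_new

rng = np.random.default_rng(0)
n, m, d = 10, 3, 2
W = rng.random((n, n)); W /= W.sum(axis=1, keepdims=True)
X = X0 = rng.normal(size=(n, d)); S = rng.normal(size=(m, d))
for _ in range(50):
    X = step(X, X0, W, S)
```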
15:50 | On counterfactual inference with unobserved confounding ABSTRACT. Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one $p$-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes as well as exacerbates the heterogeneity across units. Modeling the underlying joint distribution as an exponential family, we reduce learning the unit-level counterfactual distributions to learning $n$ exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are $s$-sparse linear combination of $k$ known vectors, the error is $O(s\log k/p)$. En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing covariates. |
16:10 | Block-missing data in linear systems: An unbiased stochastic gradient descent approach PRESENTER: Michael Strand ABSTRACT. Achieving accurate approximations to solutions of large linear systems is crucial, especially when those systems utilize real-world data. A consequence of using real-world data is that there will inevitably be missingness. Current approaches for dealing with missing data, such as deletion and imputation, can introduce bias. Recent studies proposed an adaptation of stochastic gradient descent (SGD) in specific missing-data models. In this work, we propose a new algorithm, $\ell$-tuple mSGD, for the setting in which data is missing in a block-wise, tuple pattern. We prove that our proposed method uses unbiased estimates of the gradient of the least squares objective in the presence of tuple missing data. We also draw connections between $\ell$-tuple mSGD and previously established SGD-type methods for missing data. Furthermore, we prove our algorithm converges when using updating step sizes and empirically demonstrate the convergence of $\ell$-tuple mSGD on synthetic data. Lastly, we evaluate $\ell$-tuple mSGD applied to real-world continuous glucose monitoring (CGM) device data. |
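For intuition, the entrywise-missing special case admits a standard unbiased gradient correction. The sketch below shows that simpler case (the paper's $\ell$-tuple mSGD handles block/tuple missingness), and the debiasing formula is a generic construction shown under stated assumptions rather than the authors' exact estimator.

```python
# Sketch: unbiased SGD for least squares when each entry of a row is observed
# independently with probability 1 - p (entrywise special case).
import numpy as np

def unbiased_grad(a_obs, y, x, p):
    """a_obs: data row with unobserved entries set to 0; p: missingness probability.
    Unbiased estimate of the gradient a*(a.x - y) of 0.5*(a.x - y)^2."""
    q = 1.0 - p
    # E[a_obs a_obs^T] = q^2 * a a^T + q*p * diag(a^2), so correct the diagonal term.
    quad = (a_obs * (a_obs @ x) - p * (a_obs**2) * x) / q**2
    return quad - a_obs * y / q

rng = np.random.default_rng(0)
d, n, p = 10, 5000, 0.3
A = rng.normal(size=(n, d)); x_true = rng.normal(size=d); y = A @ x_true
mask = rng.random((n, d)) > p
A_obs = A * mask                                  # observed data with entries missing

x, lr = np.zeros(d), 0.01
for t in range(20000):
    i = rng.integers(n)
    x -= lr / (1 + 0.001 * t) * unbiased_grad(A_obs[i], y[i], x, p)
print(np.linalg.norm(x - x_true))                 # error shrinks despite missing entries
```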
16:30 | Learning Fair Classifiers via Min-Max F-divergence Regularization ABSTRACT. As machine learning (ML) based systems are adopted in domains such as law enforcement, criminal justice, finance, hiring and admissions, ensuring the fairness of ML aided decision-making is becoming increasingly important. In this paper, we focus on the problem of fair classification, and introduce a novel min-max F-divergence regularization framework for learning fair classification models while preserving high accuracy. Our framework consists of two trainable networks, namely, a classifier network and a bias/fairness estimator network, where the fairness is measured using the statistical notion of F-divergence. We show that F-divergence measures possess convexity and differentiability properties, and their variational representation makes them widely applicable in practical gradient based training methods. The proposed framework can be readily adapted to multiple sensitive attributes and for high dimensional datasets. We study the F-divergence based training paradigm for two types of group fairness constraints, namely, demographic parity and equalized odds. We present a comprehensive set of experiments for several real-world data sets arising in multiple domains (including COMPAS, Law Admissions, Adult Income, and CelebA datasets). To quantify the fairness-accuracy tradeoff, we introduce the notion of fairness-accuracy receiver operating characteristic (FA-ROC) and a corresponding low-bias FA-ROC, which we argue is an appropriate measure to evaluate different classifiers. In comparison to several existing approaches for learning fair classifiers (including pre-processing, post-processing and other regularization methods), we show that the proposed F-divergence based framework achieves state-of-the-art performance with respect to the trade-off between accuracy and fairness. |
16:50 | The Minimax Risk in Testing the Histogram of Discrete Distributions for Uniformity under Missing Ball Alternatives ABSTRACT. We consider the problem of testing the fit of a sample of items from many categories to the uniform distribution over the categories. As a class of alternative hypotheses, we consider the removal of an $\ell_p$ ball of radius $\epsilon$ around the uniform rate sequence for $p \leq 2$. We characterize the asymptotic minimax risk as the number of samples and number of dimensions go to infinity when $\epsilon$ is small, for testing based on the sample's histogram (number of absent categories, singletons, collisions, ...). For example, with $p=1$ and in the limit of a small expected number of samples $n$ compared to the number of categories $N$ (aka "sub-linear" regime), the minimax risk $R^*_\epsilon$ asymptotes to $2 \Phi(-n \epsilon^2/\sqrt{8N})$, with $\Phi(x)$ the normal CDF. Empirical studies over a range of problem parameters show that this estimate is accurate in finite samples and that our test is significantly better than the chi-squared test or a test that only uses collisions. Our analysis relies on the asymptotic normality of histogram ordinates, the equivalence between the minimax setting and a Bayesian setting, and the characterization of the least favorable prior by reducing a multi-dimensional optimization problem to a one-dimensional problem. |
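A quick numerical evaluation of the quoted asymptotic risk $2\Phi(-n\epsilon^2/\sqrt{8N})$ for $p=1$ in the sub-linear regime; the parameter values below are illustrative.

```python
# Evaluate the asymptotic minimax risk R* ~ 2 * Phi(-n * eps^2 / sqrt(8 N)).
from math import erf, sqrt

def normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def asymptotic_risk(n, N, eps):
    return 2.0 * normal_cdf(-n * eps**2 / sqrt(8.0 * N))

# Illustrative values: risk near 1 corresponds to a test no better than guessing.
print(asymptotic_risk(n=5_000, N=100_000, eps=0.5))   # roughly 0.16
```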
15:30 | New results on Erasure Combinatorial Batch Codes PRESENTER: Lu Le Phuc ABSTRACT. In this work, we investigate the problem of Erasure Combinatorial Batch Codes (ECBC), in which $n$ files are stored on $m$ servers so that every set of $m-r$ servers allows a client to retrieve any set of at most $k$ distinct files by downloading at most $t$ files from each server. Previous studies have solved this problem for the special case of $t=1$ using Combinatorial Batch Codes (CBC). We tackle the general case $t \geq 1$ using a generalization of Hall's theorem. Additionally, we address a realistic scenario in which the retrieved files are consecutive according to some order and provide a simple and optimal solution for this case. |
15:50 | Preserving Sparsity and Privacy in Straggler-Resilient Distributed Matrix Computations ABSTRACT. Existing approaches to distributed matrix computations involve allocating coded combinations of submatrices to worker nodes, to build resilience to stragglers and/or enhance privacy. In this study, we consider the challenge of preserving input sparsity in such approaches to retain the associated computational efficiency enhancements. First, we find a lower bound on the weight of coding, i.e., the number of submatrices to be combined to obtain coded submatrices, that provides resilience to the maximum possible number of stragglers (for a given number of nodes and their storage constraints). Next, we propose a distributed matrix computation scheme which meets this exact lower bound on the weight of the coding. Further, we develop a controllable trade-off between worker computation time and the privacy constraint for sparse input matrices in settings where the worker nodes are honest but curious. Numerical experiments conducted in Amazon Web Services (AWS) validate our assertions regarding straggler mitigation and computation speed for sparse matrices. |
16:10 | Trapping and Absorbing Set Enumerators for Multi-Edge Type LDPC Code Ensembles ABSTRACT. The finite-length and asymptotic (elementary) trapping and (fully) absorbing set enumerators for multi-edge type (MET) low-density parity-check (LDPC) code ensembles are derived. An efficient method for the evaluation of the asymptotic distributions is presented and evaluated. |
16:30 | An Achievable Low Complexity Encoding Scheme for Coloring Cyclic Graphs PRESENTER: Derya Malak ABSTRACT. In this paper, we utilize graph coloring for functions captured by cycle graphs to achieve lossless function compression. For functional compression of characteristic graphs or cycles and their OR graphs, we present a feasible scheme for coloring the odd cycles and their OR power graphs. In addition, we calculate the chromatic number for coloring the $n$-th power of even cycles, which paves the way for calculating the minimum entropy coloring. Moreover, we determine the number of distinct eigenvalues in each OR power and the largest eigenvalue for each power, which allows us to derive a bound on the chromatic number of each OR power of a cycle. Our method provides a proxy for understanding the fundamental limits of compressing cyclic graphs, which has broad applications in various practical use cases of cryptography and computer algebra. |
16:50 | Multishot Adversarial Network Decoding ABSTRACT. We investigate adversarial network coding and decoding focusing on the multishot regime. Errors can occur on a proper subset of the network edges and are modeled via an adversarial channel. The paper contains both bounds and capacity-achieving schemes for the Diamond Network and the Mirrored Diamond Network. We also initiate the study of the generalizations of these networks. |
Organizers - Lili Su, Pengkun Yang, Jiaming Xu, and Rasoul Etesami
15:30 | Fedsoft: Soft clustered federated learning with proximal local updating ABSTRACT. While federated learning has recently exploded in popularity as a means to distributedly train a machine learning model, most federated learning frameworks assume that one model is used for all users. In practice, users may instead wish to utilize models personalized to their own local data. We propose a mechanism to do so that follows the ideas of clustered federated learning, in which each client's data is drawn from a mixture of the same underlying source distributions. FedSoft trains both locally personalized models and high-quality cluster models in this setting and limits client workload by using proximal updates to require the completion of only one optimization task from a subset of clients in every communication round. We show, analytically and empirically, that FedSoft effectively exploits similarities between the source distributions to learn personalized and cluster models that perform well. |
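A minimal sketch of a proximal local update of the kind described, in which a selected client optimizes its local loss plus a proximal term anchored at a cluster model; the quadratic local loss, the weight mu, and the step size are illustrative assumptions rather than the FedSoft specification.

```python
# Sketch: one client's proximal local update, local_loss(w) + (mu/2)*||w - w_c||^2.
import numpy as np

def proximal_local_update(w_cluster, X, y, mu=0.1, lr=0.05, steps=50):
    w = w_cluster.copy()
    for _ in range(steps):
        grad_local = X.T @ (X @ w - y) / len(y)        # gradient of the local squared loss
        grad_prox = mu * (w - w_cluster)               # proximal pull toward the cluster model
        w -= lr * (grad_local + grad_prox)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)); w_star = rng.normal(size=5)
y = X @ w_star + 0.1 * rng.normal(size=100)
w_new = proximal_local_update(np.zeros(5), X, y)
```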
15:50 | Complete Asymptotic Analysis for Projected Stochastic Approximation and Debiased Variants PRESENTER: Xiang Li ABSTRACT. In this study, we investigate federated optimization through the lens of linearly constrained optimization problems and analyze the loopless projection stochastic approximation (LPSA) method. LPSA incorporates a probabilistic projection with probability $p_n$ at the $n$-th iteration to ensure solution feasibility at random. We set $p_n \propto \eta_n^\beta$, where $\eta_n$ represents the step size and $\beta \in [0, 1]$ is a tuning parameter. The previous research demonstrates that LPSA exhibits a fascinating bias-variance trade-off through diffusion approximation in the asymptotic regime. Specifically, for $\beta < 0.5$, the bias becomes degenerate, while for $\beta > 0.5$, the bias dominates the variance. In this work, we investigate the intricate scenario where $\beta=0.5$ and discover that the last iterate, after appropriate scaling, weakly converges to a biased Gaussian distribution. We specify the exact form of the incurred bias, as a result of which, we provide a comprehensive asymptotic analysis of LPSA and a complete characterization of phase transitions. We observe that such a non-zero bias leads to slow convergence. Motivated by this observation, we propose a debiased version of LPSA, called Debiased LPSA (DLPSA), which effectively reduces projection complexity compared to the vanilla LPSA. |
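A toy sketch of the LPSA iteration, a stochastic gradient step followed by a projection performed only with probability $p_n \propto \eta_n^\beta$; the least-squares objective, affine constraint, and constants are illustrative assumptions.

```python
# Sketch: loopless projected stochastic approximation (LPSA) on a toy problem.
import numpy as np

rng = np.random.default_rng(0)
dim = 5
A = rng.normal(size=(200, dim)); b = rng.normal(size=200)        # least-squares data
C = rng.normal(size=(2, dim)); d = rng.normal(size=2)            # affine constraint Cx = d

def project(x):
    # Euclidean projection onto {x : Cx = d}
    return x - C.T @ np.linalg.solve(C @ C.T, C @ x - d)

x, beta = project(np.zeros(dim)), 0.5
for n in range(1, 20001):
    eta = 1.0 / n**0.75                       # step size eta_n
    p = min(1.0, eta**beta)                   # projection probability p_n ∝ eta_n^beta
    i = rng.integers(len(b))
    x -= eta * A[i] * (A[i] @ x - b[i])       # stochastic gradient step
    if rng.random() < p:
        x = project(x)                        # occasional projection
print(np.linalg.norm(C @ x - d))              # feasibility residual of the last iterate
```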
16:10 | Incentive Mechanism Design for Federated Learning and Unlearning ABSTRACT. To protect users' right to be forgotten in federated learning, federated unlearning aims at eliminating the impact of leaving users' data on the global learned model. The current research in federated unlearning has mainly concentrated on developing effective and efficient unlearning techniques. However, the issue of incentivizing valuable users to remain engaged and preventing their data from being unlearned is still under-explored, yet important to the unlearned model performance. This paper focuses on the incentive issue and develops an incentive mechanism for federated learning and unlearning. We first characterize the leaving users' impact on the global model accuracy and the required communication rounds for unlearning. Building on these results, we propose a four-stage game to capture the interaction and information updates during the learning and unlearning process. A key contribution is to summarize users' multi-dimensional private information into one-dimensional metrics to guide the incentive design. We show that users who incur high costs and experience significant training losses are more likely to discontinue their engagement through federated unlearning. On the other hand, the server tends to retain users who make substantial contributions to the model but faces a trade-off with respect to users' training losses, as large training losses of retained users increase privacy costs but decrease unlearning costs. The numerical results demonstrate the necessity of unlearning incentives for retaining valuable leaving users, and also show that our proposed mechanisms decrease the server's cost by up to 53.91% compared to state-of-the-art benchmarks. |
16:30 | High Probability Guarantees For Federated Learning. PRESENTER: Sravani Ramishetty ABSTRACT. Federated learning (FL) has emerged as a promising approach for training machine learning models on distributed data while ensuring privacy preservation and data locality. However, one key challenge in FL optimization is the lack of high probability guarantees, which can undermine the trustworthiness of FL solutions. To address this critical issue, we introduce the Federated Averaging with post-optimization (FedAvg-PO) method, a modification to the Federated Averaging (FedAvg) algorithm. The proposed algorithm applies a post-optimization phase to evaluate a short list of solutions generated by several independent runs of the FedAvg method. These modifications significantly improve the large-deviation properties of FedAvg, thereby improving the reliability and robustness of the optimization process. The novel complexity analysis shows that FedAvg-PO can compute accurate and statistically guaranteed solutions in the federated learning context. Our result further relaxes the restrictive assumptions in FL theory by developing new technical tools which may be of independent interest. The insights provided by the computational requirements analysis contribute to the understanding of the scalability and efficiency of the algorithm, guiding its practical implementation. |
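A toy sketch of the post-optimization idea: run several independent FedAvg instances and keep the candidate with the smallest evaluated objective. The quadratic clients, single local step per round, and selection criterion below are illustrative stand-ins, not the paper's algorithm or analysis.

```python
# Sketch: independent FedAvg runs followed by post-optimization selection.
import numpy as np

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(5)]

def fedavg(rounds=30, lr=0.1, seed=0):
    r = np.random.default_rng(seed)
    w = np.zeros(4)
    for _ in range(rounds):
        locals_ = []
        for X, y in clients:
            i = r.choice(len(y), size=10, replace=False)      # one noisy local step
            locals_.append(w - lr * X[i].T @ (X[i] @ w - y[i]) / 10)
        w = np.mean(locals_, axis=0)                          # server averaging
    return w

def global_loss(w):
    return sum(np.mean((X @ w - y) ** 2) for X, y in clients) / len(clients)

candidates = [fedavg(seed=s) for s in range(5)]               # independent runs
w_best = min(candidates, key=global_loss)                     # post-optimization selection
print(global_loss(w_best))
```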
16:50 | A Non-parametric View of FedAvg and FedProx: Beyond Stationary Points ABSTRACT. Federated Learning (FL) is a promising decentralized learning framework and has great potential in privacy preservation and in lowering the computation load at the cloud. Recent work showed that FedAvg and FedProx – the two widely-adopted FL algorithms – fail to reach the stationary points of the global optimization objective even for homogeneous linear regression problems. Further, there is concern that the common model learned might not generalize well locally at all in the presence of heterogeneity. In this paper, we analyze the convergence and statistical efficiency of FedAvg and FedProx, addressing the above two concerns. Our analysis is based on the standard nonparametric regression in a reproducing kernel Hilbert space (RKHS), and allows for heterogeneous local data distributions and unbalanced local datasets. We prove that the estimation errors, measured in either the empirical norm or the RKHS norm, decay with a rate of $1/t$ in general and exponentially for finite-rank kernels. In certain heterogeneous settings, these upper bounds also imply that both FedAvg and FedProx achieve the optimal error rate. To further analytically quantify the impact of the heterogeneity at each client, we propose and characterize a novel notion, the federation gain, defined as the reduction in estimation error a client attains by joining the FL. We discover that when the data heterogeneity is moderate, a client with limited local data can benefit from a common model with a large federation gain. Two new insights introduced by considering the statistical aspect are: (1) requiring the standard bounded dissimilarity is pessimistic for the convergence analysis of FedAvg and FedProx; (2) despite inconsistency of stationary points, their limiting points are unbiased estimators of the underlying truth. Numerical experiments further corroborate our theoretical findings. |