View: session overviewtalk overview
08:30 | A Precise Performance Analysis of the Randomized Singular Value Decomposition ABSTRACT. The Randomized Singular Value Decomposition (RSVD) is a widely used algorithm for efficiently computing low-rank approximations of large matrices, without the need to construct a full-blown SVD. Of interest, of course, is the approximation error of RSVD compared to the optimal low-rank approximation error obtained from the SVD. While the literature provides various upper and lower error bounds for RSVD, in this paper we derive precise asymptotic expressions that characterize its approximation error as the matrix dimensions grow to infinity. Our expressions depend only on the singular values of the matrix, and we evaluate them for two important matrix ensembles: those with power law and bilevel singular value distributions. Our results aim to quantify the gap between the existing theoretical bounds and the actual performance of RSVD. Furthermore, we extend our analysis to polynomial-filtered RSVD, deriving performance characterizations that provide insights into optimal filter selection. |
08:50 | Estimating Error in Natural Distribution Estimation PRESENTER: Haricharan Balasundaram ABSTRACT. Given i.i.d. samples from an unknown discrete distribution, the goal of distribution estimation is to construct an accurate estimate of the underlying distribution. Natural distribution estimators assign one probability estimate to all letters occurring with the same frequency, and this is well-justified for i.i.d. models. However, natural estimators can be significantly erroneous for low frequency or missing (frequency 0) letters in large alphabet scenarios. In this work, we introduce a statistic that captures the unavoidable error at a particular frequency of any natural distribution estimator. For this proposed error statistic, which depends on the distribution and the samples, we provide an estimator that is non-linear in the prevalences (frequencies of frequencies). We show that the proposed estimator has low bias and is consistent, and can be used to ascertain if the distribution restricted to letters of same frequency is close to uniform. Our approach is validated through simulations on synthetic and natural language data. |
09:10 | Linking Diffusion Models & I-MMSE ABSTRACT. This paper establishes a theoretical connection between diffusion models and the I-MMSE theorem in information theory. Specifically, we demonstrate that two widely used log-likelihood-based training objectives in diffusion models are closely related, with their relationship explicitly established via the I-MMSE theorem. As a direct consequence of this theoretical link, we derive a single-step denoising method that achieves optimal performance without iterative sampling. |
09:30 | Bounds on the Privacy Amplification of Arbitrary Channels via the Contraction of $f_\alpha$-Divergence PRESENTER: Leonhard Grosse ABSTRACT. We examine the privacy amplification of channels that do not necessarily satisfy any LDP guarantee by analyzing their contraction behavior in terms of $f_\alpha$-divergence, an $f$-divergence related to Rényi-divergence via a monotonic transformation. We present bounds on contraction for restricted sets of prior distributions via $f$-divergence inequalities and present an improved Pinsker's inequality for $f_\alpha$-divergence based on the joint range technique by Harremoës and Vajda. The presented bound is tight whenever the value of the total variation distance is larger than $\nicefrac{1}{\alpha}$. By applying these inequalities in a cross-channel setting, we arrive at strong data processing inequalities for $f_\alpha$-divergence that can be adapted to use-case specific restrictions of input distributions and channel. The application of these results to privacy amplification shows that even very sparse channels can lead to significant privacy amplification when used as a post-processing step after local differentially private mechanisms. |
09:50 | Information-Theoretic Outer-bounds from Sphere-Packing, Compactness, and Zero-One Laws ABSTRACT. New derivations of outer-bounds for the point-to-point channel and the relay channel are proposed using a combination of sphere-packing and compactness arguments, as well as zero-one laws. The key idea is the notion of ``decoding sets'', which map regions of the ${\bf y}$-space to transmitted codewords from the code-book. Each decoding set requires an asymptotic minimum volume of $2^{nH(Y|X)}$ sequences in order for the decoding error to vanish. Strong converses for the point-to-point channel are derived from Kolmogorov's zero-one law. An outer-bound for the relay channel is proposed when the channel capacity exceeds $I(X_{s};Y_{r}|X_{r})$. The proposed outer-bound is $\max_{p(x_{s})p(x_{r})}I(X_{s};\hat{Y}_{r}Y_{d}|X_{r})$ for $I(\hat{Y}_{r};Y_{r}|X_{r})>I(X_{r};Y_{d})$. This outer-bound can be achieved using a simplified version of the original compress-forward scheme. |
Organizers: Venu Veeravalli, Shaofeng Zou, & George Atia
Organizers: Yunzong Xu & Bhaskar Ray Chaudhury
Organizers: Ilan Shomorony & Lav Varshney
10:40 | Learning and inference from temporally dependent data ABSTRACT. Given a sequence of length $n$ from an unknown, discretely supported distribution, can we estimate, for each $k$, the mass placed by the distribution on elements occurring $k$ times? Flavors of this question have been tackled by an extensive body of literature spanning statistics, information theory and computer science, and dating back to the 40s. Classical estimators of the “unseen” such as those due to Good and Turing are known to possess desirable properties when the data is i.i.d. However, these estimators are no longer reliable in modern settings spanning genomics and language, where the data can be temporally dependent. We design an alternative family of estimators for learning properties of the stationary distribution in mixing sequences. We show that these practical, linear-time estimators achieve a form of “universal consistency” in that their error can decay in the sequence length $n$ with no dependence on the size of the support. Our guarantees are also adaptive to properties of the stationary distribution. Time permitting, we will also show how similar algorithmic ideas can be applied to perform assumption-lean predictive inference with dependent data. |
11:00 | Ordinal-Poisson Causal Discovery ABSTRACT. This paper deals with causal discovery in the setting where random variables in the causal graph can be either ordinal or Poisson. In particular, we show that it is possible to recover the causal network (modeled by a directed acyclic graph) from purely observational data. Such results are important in the field of causal discovery since, in general, it is impossible to recover causal relationships from purely observational data, and the identifiability of causal network structures relies on assumptions regarding the data-generating process. Causal discovery in the setting where all variables in the graph are ordinal and the setting where all variables are Poisson has been separately considered. However, the setting in this paper where the two are allowed to co-exist in the graph has, to the best of our knowledge, not been considered. In addition to proving the identifiability of ordinal-Poisson causal discovery, we show, through numerical simulations, that the Bayesian information criterion (BIC) tends to reliably recover the true causal graph as the number of observations increases. |
11:20 | Synchronizing Probabilities in Model-Driven Compression ABSTRACT. Classical lossless compression algorithms such as Huffman coding or arithmetic coding rely on using probabilities of symbols to achieve compression rates near entropy. Deep neural networks provide a natural choice for estimating these probabilities using their impressive ability to model relevant domains such as natural language or images. However, to take advantage of this for compression, both the compressor and decompressor require access to the same neural network to compute these probabilities directly from context without communicating the probabilities themselves. This introduces the possibility that slight variations in hardware or software may cause the computed probabilities to differ slightly at the compressor and decompressor ends, which could derail the entire compression algorithm. In this work, we look at methods to solve this issue by synchronizing the probabilities generated at the compressor and decompressor ends, with minimal additional communication. |
11:40 | A FREQUENTIST THEORY FOR M-POSTERIORS: ASYMPTOTIC AND ROBUSTNESS PROPERTIES ABSTRACT. An M-estimator is a general class of estimators in statistics that are defined as the minimizer of an objective function, typically derived from a loss or score function and an M-posterior incorporates the Bayesian framework through a prior distribution the parameters. In this work, we study asymptotic properties of such posteriors as well as their robustness to outliers. We show that the robustness properties of estimators based on M-posteriors are similar to those of M-estimators. |
12:00 | Focal Loss: Statistical and Information-Theoretic Perspectives PRESENTER: Martina Cardone ABSTRACT. The focal loss has emerged as a de facto training loss in machine learning domains, especially in computer vision. Despite its ubiquity, its theoretical properties and benefits have not been well explored. In this talk, we will investigate statistical properties of the focal loss. In particular, we will explore the consistency and asymptotic normality of estimators that minimize the focal loss. In addition, we will also study several information-theoretic aspects, including connections to the relative entropy and the emergence of novel information measures induced by the focal loss. |
10:40 | Optimal Sampling and Scheduling for Remote Fusion Estimation of Correlated Wiener Processes PRESENTER: Elif Uysal ABSTRACT. In distributed sensor networks, sensors often observe a dynamic process within overlapping regions. Due to random delays, these correlated observations arrive at the fusion center asynchronously, raising a central question: How can one fuse asynchronous yet correlated information for accurate remote fusion estimation? This paper addresses this challenge by studying the joint design of sampling, scheduling, and estimation policies for monitoring a correlated Wiener process. Though this problem is coupled, we establish a separation principle and identify the joint optimal policy: the optimal fusion estimator is a weighted-sum fusion estimator conditioned on Age of Information (AoI), the optimal scheduler is a Maximum Age First (MAF) scheduler that prioritizes the most stale source, and the optimal sampling can be designed given the optimal estimator and the MAF scheduler. To design the optimal sampling, we show that, under the infinite-horizon average-cost criterion, optimizing AoI is equivalent to optimizing MSE under pull-based communications, despite the presence of strong inter-sensor correlations. This structural equivalence allows us to identify the MSE-optimal sampler as one that is AoI-optimal. This result underscores an insight: information freshness can serve as a design surrogate for optimal estimation in correlated sensing environments. |
11:00 | Optimizing Preventive and Reactive Defense Resource Allocation with Uncertain Sensor Signals ABSTRACT. Cyber attacks continue to be a cause of concern despite advances in cyber defense techniques. Although cyber attacks cannot be fully prevented, standard decision-making frameworks typically focus on how to prevent them from succeeding, without considering the cost of cleaning up the damages incurred by successful attacks. This motivates us to investigate a new resource allocation problem formulated in this paper: The defender must decide how to split its investment between preventive defenses, which aim to harden nodes from attacks, and reactive defenses, which aim to quickly clean up the compromised nodes. This encounters a challenge imposed by the uncertainty associated with the observation, or sensor signal, whether a node is truly compromised or not; this uncertainty is real because attack detectors are not perfect. We investigate how the quality of sensor signals impacts the defender's strategic investment in the two types of defense, and ultimately the level of security that can be achieved. In particular, we show that the optimal investment in preventive resources increases, and thus reactive resource investment decreases, with higher sensor quality. We also show that the defender's performance improvement, relative to a baseline of no sensors employed, is maximal when the attacker can only achieve low attack success probabilities. |
11:20 | Error-Sensitive Dynamic Calibration for Online Optimization with Switching Costs ABSTRACT. Neural networks (NNs) have achieved state-of-the-art performance in a wide range of applications; however, their deployment in online settings often exposes a lack of robustness due to various dynamic and uncertain conditions. In contrast, traditional online algorithms offer formal robustness guarantees, yet generally do not attain the predictive accuracy of modern neural network models. To address this trade-off, we propose a novel neural network architecture that incorporates a dynamic calibration layer, designed to enhance robustness in online environments without sacrificing predictive performance. The proposed dynamic calibration layer consists of a differentiable optimization component that adjusts the NN output in real time, enabling end-to-end training via standard backpropagation. We provide theoretical analysis establishing performance bounds relative to an offline optimal benchmark, and we leverage these bounds to constrain the function class used for dynamic calibration, ensuring both practical feasibility and theoretical soundness. Empirical evaluation is conducted on a real-world case study involving data center energy management. Comparative results against existing online and hybrid approaches demonstrate that our architecture consistently outperforms baseline methods, both in terms of average performance and robustness to environmental variability. |
11:40 | Control of Dynamical Systems with Multiplicative Observation Noise with Unknown Distribution Parameters ABSTRACT. In this work, we consider the problem of stabilizing a linear dynamical system with multiplicative observation noise (MON), where the precise distribution generating the MON is unknown. We propose a control algorithm that first uses system identification to estimate the necessary parameters for the optimal policy and then applies the resulting control policy using those parameter estimates. We provide theoretical guarantees for our algorithm which show that (1) our estimation scheme requires $O(\log(1/\delta)^2/\epsilon^2)$ samples to obtain estimates with $\epsilon$ accuracy and high probability $1-\delta$, (2) the resulting controller guarantees second-moment stability conditioned on a good estimation event. This controller has a bounded gap to the performance of the optimal linear memoryless controller that knows the distribution of the noise a-priori. |
12:00 | Problem-dependent convergence bounds for randomized linear gradient compression ABSTRACT. —In distributed optimization, the communication of model updates can be a performance bottleneck. Consequently, gradient compression has been proposed as a means of increasing optimization throughput. In general, due to information loss, compression introduces a penalty on the number of iterations needed to reach a solution. In this work, we investigate how the iteration penalty depends on the interaction between compression and problem structure, in the context of non-convex stochastic optimization. We focus on linear schemes, where compression and decompression can be modeled as multiplication with a random matrix. We consider several distributions of matrices, among them Haar-distributed orthogonal matrices and matrices with random Gaussian entries. We find that the impact of compression on convergence can be quantified in terms of a smoothness matrix associated with the objective function, using a norm defined by the compression scheme. The analysis reveals that in certain cases, compression performance is related to low-rank structure or other spectral properties of the problem and our bounds predict that the penalty introduced by compression is significantly reduced compared to worst-case bounds that only consider the compression level, ignoring problem data. We verify the theoretical findings experimentally, including fine-tuning an image classification model. |
10:40 | BayesFlow++: A Unified Framework for Amortized Bayesian Inference in Complex Systems ABSTRACT. In the contemporary landscape of data science, the demand for robust and scalable inference methods has intensified, particularly when addressing complex, high-dimensional systems. Traditional Bayesian inference techniques, while theoretically rigorous, often suffer from computational limitations in the presence of intractable likelihoods or large parameter spaces. In order to overcome these obstacles, we present BayesFlow++, a sophisticated system that incorporates adaptive summary statistics learning and dynamic nested sampling into the core BayesFlow architecture. Through the use of invertible neural networks and simulation-based amortized inference, BayesFlow++ allows for quick posterior estimate across datasets without the need for retraining. By concentrating on high-posterior-probability zones, dynamic nested sampling enhances convergence and estimation accuracy while also enabling more efficient use of processing resources. Simultaneously, adaptive summary statistics learning replaces fixed data representations with task-specific, learned encodings, enhancing inference robustness across varying data distributions. Compared to current Bayesian methods, the unified framework provides better scalability, flexibility, and precision. To evaluate its effectiveness, we apply BayesFlow++ to three domains: epidemiological modeling, where it accurately estimates disease transmission dynamics; climate modeling, where it handles high-dimensional uncertainty in environmental systems; and in financial forecasting, where nonlinear dependencies are captured for analysis of probabilistic markets. In all cases, BayesFlow++ demonstrates superior performance in terms of inference accuracy and computational efficiency. This work establishes the foundation for future adaptations to streaming and real-time data settings and represents a major step toward general-purpose, scalable Bayesian inference in complicated real-world systems. |
11:00 | On the Sign Structure of the Ridge Estimator for Nonnegative Linear Mappings ABSTRACT. The $\ell^2$-regularized linear estimator (ridge regression estimator) is examined, in the case that the linear mapping from parameters to responses is entry-wise nonnegative. The sign pattern of several matrices associated with the estimator is characterized. The main finding is that, for several of these matrices, either their positive or negative entries define a connected graph. These analyses give some insight into the ridge estimator for the nonnegative case. |
11:20 | Hybrid Atomic Norm Sparse/Diffuse Channel Estimation ABSTRACT. In this paper, the hybrid sparse/diffuse (HSD) channel model in frequency domain is proposed. Based on the structural analysis on the resolvable paths and diffuse scattering statistics in the channel, the Hybrid Atomic-Least-Squares (HALS) algorithm is designed to estimate sparse/diffuse components with a combined atomic and l2 regularization. A theoretical analysis is conducted on the Lagrangian dual problem and the conditions needed to be satisfied by primal and dual solutions are provided. This analysis, in turn, suggests an algorithm for optimal frequency support estimation. Debiased methods for improved channel estimation are provided. Given differing amounts of side information, performance bounds are derived in terms of a genie-aided estimator and constrained Cramer-Rao lower bounds (CRLB). Numerical results via simulations on synthetic data as well as real experimental data validate the efficacy of the proposed method. There are clear tradeoffs with respect to the properties of the channel with respect to performance: sparsity of specular paths and relative energy of diffuse components. |
11:40 | Scaling Wideband Massive MIMO Radar via Beamspace Dimension Reduction PRESENTER: Oveys Delafrooz Noroozi ABSTRACT. We present an architecture for scaling digital beamforming for wideband massive MIMO radar. Conventional spatial processing becomes computationally prohibitive as array size grows; for example, the computational complexity of MVDR beamforming scales as $O(N^3)$ for an $N$-element array. In this paper, we show that energy concentration in beamspace provides the basis for drastic complexity reduction, with array scaling governed by the $O(N \log N)$ complexity of the spatial FFT used for beamspace transformation. Specifically, we propose an architecture for windowed beamspace MVDR beamforming, parallelized across targets and subbands, and evaluate its efficacy for beamforming and interference suppression for government-supplied wideband radar data from the DARPA SOAP (Scalable On-Array Processing) program. We demonstrate that our approach achieves detection performance comparable to full-dimensional benchmarks while significantly reducing computational and training overhead, and provide insight into tradeoffs between beamspace window size and FFT resolution in balancing complexity, detection accuracy, and interference suppression. |
12:00 | Multi-Cell MMSE Processing and Pilot Assignment in Massive MIMO under Correlated Fading PRESENTER: Azadeh Vosoughi ABSTRACT. This paper investigates a multi-cell massive MIMO network subject to spatially correlated Rayleigh fading, where pilot reuse occurs both within and across cells. Base stations (BSs) implement multi-cell minimum mean square error (M-MMSE) processing to mitigate interference. We present a new deterministic approximation of the uplink coherent interference term, which remains accurate in the large-antenna regime despite the presence of spatial correlation and pilot reuse—addressing a critical gap in prior analyses. Based on this analytical result, we design a multi-cell pilot assignment (PA) algorithm that fully removes pilot contamination by utilizing the spatial correlation information of all users. To enable deployment in large-scale networks, we also develop a scalable variant using partial M-MMSE (P-MMSE) processing, which restricts inter-BS coordination by selectively engaging only strong interferers, and provide a complexity evaluation to demonstrate its implementation feasibility. Simulation results confirm that our proposed PA algorithms significantly outperform existing PA schemes across various system configurations, achieving higher spectral efficiency, improved fairness, and reduced pilot overhead—offering practical design insights for next-generation massive MIMO systems. |