View: session overviewtalk overview

Invited Session

Invited Session

Invited Session

08:30 | A robust testing approach to measuring model varability ABSTRACT. Training deep neural networks generally uses stochastic optimization, which means each run produces a different model. Despite this variability, models are often deemed to be "equally good" based on their performance on the test set. A variety of measures for model variability have been proposed, usually based around the decisions made by classification models. In this work we propose using robust hypothesis testing to measure the difference between models by using the marginal distribution on pre-thresholded predictions. In particular, we measure a trimming level between models and argue that it provides a more nuanced view model variability. We apply this to understand the benefits of model ensembling. |

08:50 | An Information Geometric Analysis of Noise Contrastive Estimation PRESENTER: Hasan Sabri Melihcan Erol ABSTRACT. We study noise-contrastive estimation (NCE) through the lens of local information geometry. In particular, paralleling recent work [1], [2], we demonstrate that the optimal noise distribution for NCE is not the same as those typically chosen based on practical heuristics. Furthermore, we show that an important information geometric construct, the angle between the subspaces of distributions, dictates the asymptotic performance of an extension of NCE proposed in [3]. This analysis provides an information geometric interpretation of the method, encouraging further study of NCE and its variants using information geometric methods. |

09:10 | Impossibility of latent inner product recovery via rate distortion ABSTRACT. In a random geometric graph model or latent space model, the observed graph $A$ on $n$ vertices with average edge density $p$ is assumed to be generated from latent locations $z_1, \dots, z_n$ in $\mathbb{R}^d$ associated with the $n$ vertices. Given the graph $A$, it is of interest to estimate the inner products $\langle z_i, z_j \rangle$ which represents the geometry of the latent locations. In this note, assuming that the latent locations are Gaussian or spherical points, we show an impossibility result for inner product recovery when $d \gtrsim n h(p)$ where $h(p)$ is the binary entropy function. This matches the condition required for positive results on inner product recovery in the literature. The main technical ingredient of this work is a lower bound on the rate-distortion function of the Wishart distribution which is interesting in its own right. |

09:30 | OpAMP: Linear Regressions with Erasures ABSTRACT. Approximate message passing (AMP) is a powerful framework for precise characterization of the iterative inference algorithms. In recent work, we proposed linear operator approximate message passing (OpAMP) as a generalization of AMP that permits the data matrix to be passed through a linear operator at each iterations. In this paper, we apply this framework to large-scale linear regression subject to erasures. These erasures can be used to model scheduled partial computations as well as stragglers in distributed computing. |

09:50 | Chernoff Information as a Privacy Constraint for Adversarial Classification PRESENTER: Ayse Unsal ABSTRACT. This work inspects a privacy metric based on Chernoff information, \textit{Chernoff differential privacy}, due to its significance in characterization of the optimal classifier's performance. Adversarial classification, as any other classification problem is built around minimization of the (average or correct detection) probability of error in deciding on either of the classes in the case of binary classification. Unlike the classical hypothesis testing problem, where the false alarm and mis-detection probabilities are handled separately resulting in an asymmetric behavior of the best error exponent, in this work, we focus on the Bayesian setting and characterize the relationship between the best error exponent of the average error probability and $\varepsilon\textrm{-}$differential privacy. Accordingly, we re-derive Chernoff differential privacy in terms of $\varepsilon\textrm{-}$differential privacy using the Radon-Nikodym derivative and show that it satisfies the composition property for sequential composition. Subsequently, we present numerical evaluation results, which demonstrates that Chernoff information outperforms Kullback-Leibler divergence as a function of the privacy parameter $\varepsilon$, the impact of the adversary's attack and global sensitivity for the problem of adversarial classification in Laplace mechanisms. |

08:30 | Rendering Wireless Environments Useful for Gradient Estimators: A Zero-Order Stochastic Federated Learning Method PRESENTER: Elissa Mhanna ABSTRACT. Cross-device federated learning (FL) is a growing machine learning setting whereby multiple edge devices collaborate to train a model without disclosing their raw data. With the great number of mobile devices participating in more FL applications via the wireless environment, the practical implementation of these applications will be hindered due to the limited uplink capacity of devices, causing critical bottlenecks. In this work, we propose a novel doubly communication-efficient zero-order (ZO) method with a one-point gradient estimator that replaces communicating long vectors with scalar values and that harnesses the nature of the wireless communication channel, overcoming the need to know the channel state coefficient. It is the first method that includes the wireless channel in the learning algorithm itself instead of wasting resources to analyze it and remove its impact. We then offer a thorough analysis of the proposed zero-order federated learning (ZOFL) framework and prove that our method converges \textit{almost surely}, which is a novel result in nonconvex ZO optimization. We further prove a convergence rate of $O(\frac{1}{\sqrt[3]{K}})$ in the nonconvex setting. We finally demonstrate the potential of our algorithm with experimental results. |

08:50 | Mobility in Age-Based Gossip Networks ABSTRACT. We consider a gossiping network where a source forwards updates to a set of $n$ gossiping nodes that are placed in an arbitrary graph structure and gossip with their neighbors. In this paper, we analyze how mobility of nodes affects the freshness of nodes in the gossiping network. To model mobility, we let nodes randomly exchange positions with other nodes in the network. The position of the node determines how the node interacts with the rest of the network. In order to quantify information freshness, we use the version age of information metric. We use the stochastic hybrid system (SHS) framework to derive recursive equations to find the version age for a set of positions in the network in terms of version ages of sets of positions that are one larger or of the same size. We use these recursive equations to find an upper bound for the average version age of a node in two example networks. We perform numerical simulations to analyze how mobility affects version age of different positions in the network and also show that the upper bounds obtained for version age of the example networks are tight. |

09:10 | Linear-time Scheduling for Time-varying Wireless Networks via Randomization ABSTRACT. We study the problem of designing efficient throughput-optimal joint routing and scheduling policy for time-varying network, which has seen recent emergence from the surge in wireless edge-device systems. The only known solutions to the optimal control of dynamic network require solving the NP-hard problem of Max-Weight scheduling, thereby hindering the practicality. Even for the simplified setting of time-invariant network, previous work only considered efficient scheduling without routing, where any direct extension would only be able to incorporate unicast routing, which is restricted in practice. In this paper, we propose the Randomized Universal Max-Weight (RUMW) policy, which supports the generalized routing paradigm and relies on incremental sampling and updating rule to find good scheduling decisions, thereby resulting a linear-time scheduling algorithm. We further show that the RUMW algorithm is throughput-optimal in the setting of time-varying network. Of independent interest, the algorithmic structure of RUMW is directly congruent with the emerging Software-Defined Network (SDN) architecture. |

09:30 | Traffic-driven Spectrum and Power Allocation via Scalable Multi-agent Reinforcement Learning ABSTRACT. This paper introduces a novel traffic-driven approach to radio resource allocation in cellular networks by leveraging a fully scalable multi-agent reinforcement learning (MARL) framework. The objective is to minimize packets delay of links under stochastic arrivals, where access points (APs) make spectrum and power allocation decisions based on limited local information. Formulated as a distributed learning problem, we implement multi-agent proximal policy optimization (MAPPO) algorithm with recurrent neural networks and queueing dynamics to train flexible policies that map dynamic traffic and channel state information (CSI) to allocation decisions. The proposed MARL-based solution enables decentralized training and execution, ensuring scalability to large networks. Extensive simulations demonstrate that the proposed methods achieve comparable packet delay performance to genie-aided centralized algorithms while using only local information and reducing execution time. The trained policies also show scalability and robustness across various network sizes and traffic conditions. |

Invited Session

Invited Session

10:30 | Linearly Solvable General-Sum Markov Games ABSTRACT. Linearly solvable Markov decision processes (LSMDPs) are a special class of Markov decision processes (MDPs) in which the optimal value function under an exponential transformation satisfies a linear equation. This model was previously extended to a class of two-player, zero-sum Markov games in which the game's equilibrium value function can similarly be derived from a linear equation. In this work, a new class of linearly solvable n-player, general-sum Markov games is proposed. We show this class of games has a unique Nash equilibrium, and the equilibrium value functions can be derived from a single linear equation. We demonstrate how to approximate arbitrary discrete-state, discrete-action Markov games outside this class through an embedding process analogous to the way LSMDPs approximate standard MDPs. An off-policy reinforcement learning algorithm for general-sum Markov games, which we call Nash-Z learning, is presented. We empirically demonstrate that Nash-Z learning finds equilibrium policies with low regret and that it finds these solutions orders of magnitude faster than the classic Nash-Q learning algorithm. |

10:50 | Constrained Correlated Equilibria ABSTRACT. This paper introduces constrained correlated equilibrium, a solution concept combining correlation and coupled constraints in finite non-cooperative games. In a correlated equilibrium, players coordinate their actions based on common random signals, yielding outcomes that can be more efficient than Nash equilibria but may lack feasibility when considering certain constraints. We present the formal definition of the concept, illustrating its relevance through examples. Furthermore, we analyze its fundamental properties, including relations to correlated equilibria in the general case of arbitrary constraints. In the particular case of constraints induced by a feasible set of probability distributions over action profiles, we show that canonical correlation devices suffice to characterize the set of constrained correlated equilibrium distributions and provide a sufficient condition of the existence of the constrained correlated equilibrium. Finally, we illustrate these results through numerical examples. |

11:10 | Fairness of Equilibrium in Proportional Sharing Mechanisms with Bid Costs ABSTRACT. This paper considers the proportional sharing mechanism from the lens of fairness. Deviating from prior work we consider the net utility to users by incorporating a weighted cost of bidding, one example being the use of power, incurred at user-dependent costs, to competitively bid for throughput in wireless systems. In this context, we analyze the price of anarchy of the equilibrium when the social welfare is measured both by a fair-sharing utility measure, as well as the sum of utility measures. In the context of power bids by users, we consider both continuous and discrete bids. The analysis shows the PoA can be zero and even for a uniform linear utility function when using the sum-total measure, but with distinct costs of bids, is asymptotically zero with increasing number of users indicating that a number of users/clients were starved of resources. This illustrates that accounting for the cost of bids can substantially change the structure of the equilibrium as compared to uniform costs. We also investigate the number of such starved users experimentally. In order to determine equilibrium, we show convex program formulations as well as present approximation methods to determine (1 + ϵ)-approximate equilibrium via an efficient algorithm. We also design methods to compute the optimum solution. Optimization of the social utility in the general case, where multiple resources can be chosen by users, is shown to be NP-hard. Experimental results illustrate the asymptotic behavior of the PoA and of the number of users that are starved of resources when users (competition) increases. |

11:30 | Price of Anarchy of a Censorship Policy removing Misinformation in a Social Network ABSTRACT. The proliferation of misinformation on social media has become a significant concern, particularly in the realms of political discourse and public health. Censorship policies have emerged as a solution to limit the spread of misinforma- tion. However, although censorship reduces the proportion of misinformation disseminated, it also creates an implied truth effect which skews the perception of less reliable information. This paper investigates the impact of censorship policies in an online social network model where agents sequentially observe an article and decide whether to share it with others or not. We measure the impact of censorship in the virality of articles containing misinformation and observe that while cen- sorship can effectively reduce the spread of misinformation, it also allows less reliable articles to spread over the network. Specifically, we quantify the “price of anarchy” associated with these censorship policies using a formal model that incorporates agents’ beliefs, network structure, and content reliability. Unlike usual frameworks of resources allocation games in commutation networks, we show that the price of anarchy is unbounded and we exhibit minimal limit case scenarios. |

11:50 | Suppressing Overestimation in Q-Learning through Adversarial Behaviors PRESENTER: Hyeann Lee ABSTRACT. The goal of this paper is to propose a new Q-learning algorithm with a dummy adversarial player, which is called dummy adversarial Q-learning (DAQ), that can effectively regulate the overestimation bias in standard Q-learning. With the dummy player, the learning can be formulated as a two-player zero-sum game. The proposed DAQ unifies several Q-learning variations to control overestimation biases, such as maxmin Q-learning and minmax Q-learning (proposed in this paper) in a single framework. The proposed DAQ is a simple but effective way to suppress the overestimation bias through dummy adversarial behaviors and can be easily applied to off-the-shelf value-based reinforcement learning algorithms to improve the performances. A finite-time convergence of DAQ is analyzed from an integrated perspective by adapting an adversarial Q-learning. The performance of the suggested DAQ is empirically demonstrated under various benchmark environments. |

12:10 | Non-Monotone Variational Inequalities ABSTRACT. In this paper, we focus on deriving some sufficient conditions for the existence of solutions to non-monotone Variational Inequalities (VIs) based on inverse mapping theory. We have obtained several widely applicable sufficient conditions for this problem and have introduced a sufficient condition for the existence of a Minty solution. We have shown that the extra-gradient method converges to a solution of VI in the presence of a Minty solution. Additionally, we have shown that, under some extra assumption, the algorithm is efficient and approaches a particular type of Minty solution. Interpreting these results in an equivalent game theory problem, weak coupling conditions will be obtained, stating that if the players' cost functions are sufficiently weakly coupled, the game has a pure quasi-Nash equilibrium. Moreover, under the additional assumption of the existence of Minty solutions, a pure Nash equilibrium exists for the corresponding game. |

Invited Session

10:30 | From Large Language Models to Large Agent Models: Planning and Reasoning with Physical World Knowledge ABSTRACT. While Large Language Models excel in language processing, Large Agent Models are designed to interact with the environment. This transition poses significant challenges in understanding lower-level visual details, and long-horizon reasoning for effective goal interpretation and decision-making. Despite the impressive performance of LLMs/VLMs on various benchmarks, these models perceive images as bags of words (semantic concepts). In detail, they use semantic understanding as a shortcut but lack the ability to recognize geometric structures or solve spatial problems such as mazes. To interact with the physical world, we focus on two dimensions: (1) From high-level semantic to low-level geometric understanding: We introduce a low-level visual description language that serves as geometric tokens, allowing the abstraction of multimodal low-level geometric structures. (2) From fast-thinking to slow-thinking: We propose to quantify long-horizon reasoning by incorporating Markov Decision Process (MDP) based decision-making. The key difference between language models and agent models lies in their decision-making capabilities. This fundamental difference necessitates a shift in how we approach the development of large agent models, focusing on both geometric understanding and long-term planning to create more capable embodied AI agents. Bio: Manling Li is an Assistant Professor at Northwestern University. She was a postdoc at Stanford University and obtained the PhD degree in Computer Science at University of Illinois Urbana-Champaign in 2023. She works on the intersection of language, vision, and robotics. Her work on multimodal knowledge extraction won the ACL'20 Best Demo Paper Award, and the work on scientific information extraction from COVID literature won NAACL'21 Best Demo Paper Award. She was a recipient of Microsoft Research PhD Fellowship in 2021, an EE CS Rising Star in 2022, a DARPA Riser in 2022, etc. She served as Organizing Committee of ACL 25 and EMNLP 2024, and delivered tutorials about multimodal knowledge at IJCAI'23, CVPR'23, NAACL'22, AAAI'21, ACL'21, etc. Additional information is available at https://limanling.github.io/. |

10:50 | Mitigating Backdoor Threats to Large Language Models ABSTRACT. Despite the success of Large Language Models (LLMs), their increasingly scaled sizes, as well as their growing deployments in systems, services and scientific studies, are bringing along more and more emergent issues in security and privacy. One significant area of security concern for LLMs is their susceptibility during the training phase. Adversaries can exploit this vulnerability by strategically contaminating a small fraction of the training data and lead to the introduction of backdoors or a significant degradation in model performance. In this introductory talk, will begin discussing the training-time threats by delving into various attack types including ample-agnostic attacks like word or sensitive-level trigger attacks, sample-dependent attacks such as syntactic, paraphrasing and back translation attacks. Subsequently, encompassing emergent LLM development processes of instruction tuning and RLHF, we will discuss how attackers may capitalize on these processes, injecting tailored instruction-following examples or manipulating ranking scores to purposefully alter the model’s behavior. We will also shed light on the far-reaching consequences of training-time attacks across diverse LLM applications. Moving forward, we will introduce threat mitigation strategies in the training stage where defenders can measure and counteract the influence of poisoned data within the training process, and in the inference stage where defenders can detect and eliminate poisoned data given the compromised model. |

11:10 | Foundation Models for Robotic Manipulation: Opportunities and Challenges ABSTRACT. Foundation models, such as GPT-4 Vision, have achieved significant milestones in the fields of natural language processing and vision, demonstrating exceptional abilities to adapt to new tasks and scenarios. However, physical interactions—such as cooking, cleaning, or caregiving—remain a frontier where foundation models and robotic systems have yet to achieve the desired level of adaptability and generalization. In this talk, I will explore the opportunities for incorporating foundation models into traditional robotic pipelines to endow robots with capabilities beyond those achievable with conventional robotic tools. The core idea behind this research is to introduce novel representations and integrate structural priors into robot learning systems, leveraging the commonsense knowledge learned from foundation models to achieve the best of both worlds. I will demonstrate how such integration enables robots to interpret instructions given in free-form natural language and allows for finer-grained robotic manipulation, considering both spatial and temporal constraints. The performance of the system is showcased through a wide range of real-world manipulation tasks involving rigid, articulated, and deformable objects. Towards the end of the talk, I will discuss the limitations of current foundation models, the challenges that still lie ahead, and potential avenues to address these challenges. |

11:30 | Dependence Induced Representations PRESENTER: Xiangxiang Xu ABSTRACT. We study the problem of learning feature representations from a pair of random variables, where we focus on the representations that are induced by their dependence. We provide sufficient and necessary conditions for such dependence induced representations, and illustrate their connections to Hirschfeld--Gebelein--R\'{e}nyi (HGR) maximal correlation functions and minimal sufficient statistics. We characterize a large family of loss functions that can learn dependence induced representations, including cross entropy, hinge loss, and their regularized variants. In particular, we show that the features learned from this family can be expressed as the composition of a loss-dependent function and the maximal correlation function, which reveals a key connection between representations learned from different losses. Our development also gives a statistical interpretation of the neural collapse phenomenon observed in deep classifiers. Finally, we present the learning design based on the feature separation, which allows hyperparameter tuning during inference. |

11:50 | Efficient LLM Serving via Lossy Computation ABSTRACT. Large language models (LLMs) have exhibited human-like conversation ability. Yet scaling LLMs to longer contexts, such as extracting information from lengthy articles–one of the most fundamental tasks in healthcare applications–poses significant challenges. The primary issues are their inability to handle contexts beyond pre-training lengths and the system constraints that make deployment difficult, as memory requirements for inference increase with context length. The key idea to overcome these issues is that LLMs are extremely robust to the noise from lossy computation, like low-precision computation. Following this insight, we will discuss some recent results in serving LLMs at scale, especially in handling longer contexts. To deal with the algorithmic challenge, I will share our recent work on algorithmic support to extend the context length of LLMs to at least 8X longer by coarsening the positional information of distant tokens. To deal with the system challenge, I'll discuss our recent efforts in quantizing the intermediate states of past tokens to 2-bit numbers, which leads to a 3.5X wall-clock time speedup without harming performance. Finally, I will discuss our latest projects applying LLMs in healthcare applications, especially how we utilize retrieval techniques for long contexts to reduce the hallucination problem of healthcare chatbots. |

Invited Session