View: session overviewtalk overview

Organizer: Bin Hu

Organizers: Bruce Hajek, Jiaming Xu, Yihong Wu

08:30 | Active Learning for Individual Data via Minimal Stochastic Complexity ABSTRACT. Modern machine learning systems require massive amounts of labeled training data in order to achieve high accuracy rates. Active learning uses feedback to label the most informative data points and significantly reduce the labeling effort. Many heuristics for selecting data points have been developed in recent years which are usually tailored to a specific task and a general unified framework is lacking. In this work, the individual data setting is considered and an active learning criterion is proposed. In this setting the features and labels, both in the training and the test, are specific individual, deterministic quantities. Motivated by connections between source coding and minimax learning, the proposed criterion attempts to find data points which minimize the average Predictive Normalized Maximum Likelihood (pNML) on the unlabeled test set. It is shown using a real data set that the proposed criterion performs better than other active learning criteria. |

08:50 | Byzantine Resilience With Reputation Scores ABSTRACT. Training distributed machine learning algorithms is prone to Byzantine attacks where the adversarial workers send corrupted model updates to derail the training. In this paper, we propose a reputation score based gradient aggregation as a possible solution. We introduce a class of novel stochastic gradient descent algorithms, ByGARS (Byzantine Gradient Aggregation using Reputation Scores) that involve computing reputation scores (of workers) using an auxiliary dataset at the server. These reputation scores are then used for aggregating the gradients (model updates) at the server. Under reasonable assumptions, we show that using these reputation scores is robust to any number of adversaries and prove the convergence of a representative algorithm, ByGARS++ for strongly convex objective functions using results from two-timescale stochastic approximation theory. The computational complexity of ByGARS++ is the same as the usual distributed stochastic gradient descent method with only an additional inner product computation in every iteration. We also demonstrate the effectiveness of the algorithms for non-convex learning problems using MNIST and CIFAR-10 datasets against almost all state-of-the-art Byzantine attacks. |

09:10 | CoBAAF: Controlled Bayesian Air Aggregation Federated Learning from Heterogeneous Data ABSTRACT. Federated learning (FL) is an emerging machine learning paradigm for training models across multiple edge devices holding local data sets. Due to communications overhead, complexity, and privacy requirements, the global training at the server is done without receiving the entire original data from the edge devices, but only outputs of their local processing and training. A major challenge in FL is to reduce the bandwidth and energy consumption due to the repeated transmissions of large volumes of data by a large number of users over the wireless channel. Recently, over-the-air (OTA) FL has been suggested to achieve this goal. In this setting, all users transmit their data signal simultaneously over a Multiple Access Channel (MAC), and the computation is done over the wireless channel via air aggregation. However, existing methods suffer from performance degradation when handling statistical heterogeneity of users data. In this paper we address this challenge. We present a novel algorithm, dubbed Controlled Bayesian Air Aggregation Federated-learning (CoBAAF), that handles statistical heterogeneity in noisy networks using a joint design of three main steps in FL: Model distribution, local training, and global aggregation. Specifically, CoBAAF controls the drift in local updates using a correction term, and allows users to transmit their data signal simultaneously over MAC. Second, it adopts a Bayesian approach to average properly the channel output, thus mitigating the effect of the noise and fading induced by the channel. We analyze the convergence of CoBAAF to the loss minimizing model theoretically, showing its ability to achieve a convergence rate similar to that achieved over error-free channels. Extensive simulation results demonstrate the improved convergence of CoBAAF for training in machine learning problems. |

09:30 | Ensuring the Defense of Paths and Perimeters in Dynamic Defender-Attacker Blotto Games (dDAB) on Graphs PRESENTER: Bryce Ferguson ABSTRACT. When can we defend a region of space from being breached by an adversary? To answer this, we consider the Dynamic Defender-Attacker Blotto Game (dDAB) played on a graph, in which a fleet of defensive assets must be maneuvered through a network with the objective of defending a specified set of nodes from an adversarial fleet. In this framework, we seek to understand how the defender's objective (which nodes to defend in the graph) affects the amount of resources required to provide a safety guarantee. Particularly, we are interested in understanding how to defend paths-through and perimeters-of regions of space. Using the dDAB framework, we show the necessary and sufficient size of the defender's fleet needed to guarantee the defense of line graphs and cycle graphs, respectively. Then we show how these results can be viewed through a spatial interpretation to give insights on defense strategies in 2 Dimensional spaces. We find that more defender assets are needed for paths that are less direct (longer than the shortest path) and perimeters that have nooks and crevasses (non-convex). |

09:50 | Over-the-Air Federated Learning with Privacy Protection via Correlated Additive Perturbations PRESENTER: Erik G. Larsson ABSTRACT. In this paper, we consider privacy aspects of wireless federated learning (FL) with Over-the-Air (OtA) transmission of gradient updates from multiple users/agents to an edge server. By exploiting the waveform superposition property of multiple access channels, OtA FL enables the users to transmit their updates simultaneously with only linear processing techniques, which improves resource efficiency. However, this setting is vulnerable to privacy leakage since an adversary node can hear directly the uncoded message. Traditional perturbation-based methods provide privacy protection while sacrificing the training accuracy due to the reduced signal-to-noise ratio. In this work, we aim at minimizing privacy leakage to the adversary and the degradation of model accuracy at the edge server at the same time. More explicitly, spatially correlated perturbations are added to the gradients at the users before transmission. Utilizing the zero-sum property of the correlated perturbations, the side effect of added perturbation on the aggregated gradient update at the edge server can be minimized. In the meanwhile, the added perturbation will not be canceled out at the adversary due to the heterogeneity of wireless channels and misaligned power scaling at the adversary, which prevents privacy leakage. Theoretical analysis on the perturbation covariance matrix, differential privacy, and model convergence are provided, based on which an optimization problem is formulated to jointly design the covariance matrix and the amplitude scaling factor to balance between privacy protection and convergence performance. |

08:30 | PRESENTER: Yuanyuan Tang ABSTRACT. Codes over permutations have received significant attention due to their applications in memory devices and powerline transmission systems. In this paper, we construct a family of permutation codes that are capable of correcting a burst of at most $t$ deletions with $\log n+O(\log\log n)$ bits of redundancy where $t$ is a constant, which significantly improves upon the state-of-the-art construction with redundancy $2t\log n$ bits. |

08:50 | Bounds on Reversible, Complement, Reversible-Complement, Constant Weight Sum Codes ABSTRACT. For an even-length alphabet, a code is called Reversible, Complement, Reversible-Complement, Constant Weight Sum code (RCCWS code) if the code is closed under reverse, complement, and reverse-complement sequences, and for given symbols in the alphabet, the total number of occurrence of those symbols in each sequence is identical. In this work, we have obtained an upper bound (Sphere Packing bound) and a lower bound (Gilbert-Varshamov bound) on the size of an RCCWS code defined over an even-size alphabet. As a special case of these two bounds, we also have obtained an upper and lower bound on the size of DNA codes that satisfies Reverse, Complement, Reverse-complement, and GC-content constraints. |

09:10 | Zero-error Coding for Computing with Encoder Side-information ABSTRACT. We study the zero-error source coding problem in which an encoder with side-information (SI) $g(Y)$ transmits source symbols $X$ to a decoder. The decoder has SI $Y$ and wants to recover $f(X,Y)$ where $f,g$ are deterministic. We exhibit a condition on the source distribution and $g$ that we call ``pairwise shared SI'', such that the optimal rate has a single-letter expression. This condition is satisfied if every pair of source symbols ``share" at least one SI symbol for all output of $g$. It has a practical interpretation, as $Y$ models a request made by the encoder on an image $X$, and $g(Y)$ corresponds to the type of request. It also has a graph-theoretical interpretation: under ``pairwise shared SI'' the characteristic graph can be written as a disjoint union of OR products. In the case where the source distribution is full-support, we provide an analytic expression for the optimal rate. We develop two examples, one under ``pairwise shared SI'', the other one with full-support source distribution; and we show that the optimal coding scheme outperforms trivial strategies in both examples. |

09:30 | Extracting Unique Information Through Markov Relations ABSTRACT. We propose two new measures for extracting the unique information in $X$ and not $Y$ about a message $M$, when $X, Y$ and $M$ are joint random variables with a given joint distribution. We take a Markov based approach, motivated by questions in fair machine learning, and inspired by similar Markov-based optimization problems that have been used in the Information Bottleneck and Common Information frameworks. We obtain a complete characterization of our definitions in the Gaussian case (namely, when $X, Y$ and $M$ are jointly Gaussian), under the assumption of Gaussian optimality. We also examine the consistency of our definitions with the partial information decomposition (PID) framework, and show that these Markov based definitions achieve non-negativity, but not symmetry, within the PID framework. |

09:50 | Strategic Communication via Cascade Multiple-Description Network ABSTRACT. In decentralised decision-making problems, agents choose their actions based on locally available information and knowledge about decision rules or strategies of other agents. We consider a three-node cascade network with an encoder, a relay and a decoder, having distinct objectives captured by cost functions. In such a cascade network, agents choose their respective strategies sequentially, as a response to the former agent's strategy and in a way to influence the decision of the latter agent in the network. We assume the encoder commits to a strategy before the communication takes place. Upon revelation of the encoding strategy, the relay commits to a strategy and reveals it. The communication starts, the source sequence is drawn and processed by the encoder and relay. Then, the decoder observes a sequences of symbols, updates its Bayesian posterior beliefs accordingly, and takes the optimal action. This is an extension of the Bayesian persuasion problem in the Game Theory literature. In this work, we provide an information-theoretic approach to study the fundamental limit of the strategic communication via three-node cascade network. Our goal is to characterize the optimal strategies of the encoder, the relay and the decoder, and study the asymptotic behavior of the encoder's minimal long-run cost function. |

Organizer: Bin Hu

Organizer: G. Fellouris and V. Veeravalli

Organizers: Bruce Hajek and R. Srikant

10:30 | Uncertainty in Biometric Identification and Authentication Systems with Strong Secrecy ABSTRACT. In this work, we derive fundamental bounds of biometric identification and authentication systems under uncertainty considerations. The uncertainty lies in two aspects, i.e., the probability mass function (p.m.f.) of users' biometric information and the observation channel. For the p.m.f. of users' biometrics, it is assumed that the users are from different groups and the distribution of each group is determined by the state. For the observation channel, it is assumed that there is a set of channels and the actual observation channel is from this set. It is shown that our work is more general and can be reduced to the setup without uncertainty studied in the literature. Moreover, we show that strong secrecy can be achieved without the loss of the other performance. |

10:50 | Linear Coding for Gaussian Two-Way Channels PRESENTER: Junghoon Kim ABSTRACT. We consider linear coding for Gaussian two-way channels (GTWCs), in which each user generates the transmit symbols by linearly encoding both its message and the past received symbols (i.e., the feedback information) from the other user. In Gaussian one-way channels (GOWCs), Butman has proposed a well-developed model for linear encoding that encapsulates feedback information into transmit signals. However, such a model for GTWCs has not been well studied since the coupling of the encoding processes at the users in GTWCs renders the encoding design non-trivial and challenging. In this paper, we aim to fill this gap in the literature by extending the existing signal models in GOWCs to GTWCs. With our developed signal model for GTWCs, we formulate an optimization problem to jointly design the encoding/decoding schemes for both the users, aiming to minimize the weighted sum of their transmit powers under signal-to-noise ratio constraints. Further, we provide new insights on the encoding design for GTWCs. In particular, we show that it is optimal that one of the users (i) does not transmit the feedback information to the other user at the last channel use, and (ii) transmits its message only over the last channel use. With these solution behaviors, we further simplify the problem and solve it via an iterative two-way optimization scheme. We numerically demonstrate that our proposed scheme for GTWCs achieves a better performance in terms of the transmit power compared to the existing counterparts, such as the non-feedback scheme and one-way optimization scheme. |

11:10 | Finite-Blocklength Results for the A-channel: Applications to Unsourced Random Access and Group Testing ABSTRACT. We present finite-blocklength achievability bounds for the unsourced A-channel. In this multiple-access channel, users noiselessly transmit codewords picked from a common codebook with entries generated from a q-ary alphabet. At each channel use, the receiver observes the set of different transmitted symbols but not their multiplicity. We show that the A-channel finds applications in unsourced random-access (URA) and group testing. Leveraging the insights provided by the finite-blocklength bounds and the connection between URA and non-adaptive group testing through the A-channel, we propose improved decoding methods for state-of-the-art A-channel codes and we showcase how A-channel codes provide a new class of structured group testing matrices. The developed bounds allow to evaluate the achievable error probabilities of group testing matrices based on random A-channel codes for arbitrary numbers of tests, items and defectives. We show that such a construction asymptotically achieves the optimal number of tests. In addition, every efficiently decodable A-channel code can be used to construct a group testing matrix with sub-linear recovery time. |

11:30 | On the Symmetries of the Deletion Channel ABSTRACT. In this short note, we consider a class of symmetry groups associated to communication channels, which can informally be viewed as the transformations of the set of inputs that “commute” with the action of the channel. These groups were first studied by Polyanskiy in (IEEEToIT 2012). We show the simple result that the input distribution that attains the maximum mutual information for a given channel is a “fixed point” of its group. We conjecture (and give empirical evidence) that the channel group of the deletion channel is extremely small (it contains a number of elements constant in the blocklength). We prove a special case of this conjecture. This serves as some formal justification for why the analysis of the binary deletion channel has proved much more difficult than its memoryless counterparts. |

Organizers: Bruce Hajek and R. Srikant

Organizers: G. Fellouris and V. Veeravalli

13:30 | Efficient SPRT-based Best Arm Identification in Stochastic Bandits ABSTRACT. This paper investigates the problem of best arm identification (BAI) in stochastic multi-armed bandits in the fixed confidence setting. A novel formulation based on sequential hypothesis testing is provided, and an algorithm for BAI is proposed that, in spirit, follows the structure of the canonical sequential probability ratio test (SPRT). The algorithm has three features: (1) its sample complexity is asymptotically optimal, (2) it is guaranteed to be $\delta$-PAC, and (3) it addresses the computational challenge of the state-of-the-art approaches. Specifically, the existing approaches rely on Thompson sampling for dynamically identifying the best arm and a challenger. This paper shows that identifying the challenger can be computationally expensive and demonstrates that the SPRT-based approach addresses that computational weakness. |

13:50 | Efficient sequential UCB-based Hungarian algorithm for assignment problems ABSTRACT. The assignment problem has many real-world applications such as allocations of agents and tasks for optimal utility gain. While it has been well-studied in the optimization literature when the underlying utility between every pair of agent and task is known, research is limited when the utilities are unknown and need to be learned from data on the fly. In this work, motivated by the mentor-mentee matching application in U.S. universities, we develop an efficient sequential assignment algorithm, with the objective of nearly maximizing the overall utility simultaneously for each time. Our proposed algorithm is to use stochastic binary bandit feedback to estimate the unknown utilities through the logistic regression, and then to combine the Upper Confidence Bound (UCB) algorithm in the multi-armed bandit problem with the Hungarian algorithm in the assignment problem. We derive the theoretical bounds of our algorithm for both the estimation error and the total regret, and numerical studies are also conducted to illustrate the usefulness of our algorithm. |

14:10 | Data-Driven Sequential Change Detection in Privacy-Sensitive Networks PRESENTER: Yasin Yilmaz ABSTRACT. We study privacy-preserving sequential change detection in a setting where the data are distributed over a network and locally sensitive to each node, and probabilistic data model is unknown. We design and analyze a data-driven solution scheme where each node observes a high-dimensional data stream for which it computes a local outlierness score, perturbs, encrypts, and sends it to a network operator. The network operator then decrypts an aggregate statistic over the network and performs online network change detection via the proposed generalized cumulative sum (CUSUM) algorithm. We derive an asymptotic lower bound and an asymptotic approximation for the average false alarm period of the proposed algorithm. Additionally, we derive an asymptotic upper bound and asymptotic approximation for the average detection delay of the proposed algorithm under a certain anomaly. We show the analytical tradeoff between the change detection performance and the differential privacy level, controlled via the local perturbation noise. Experiments illustrate that the proposed algorithm offers a good tradeoff between privacy and quick change detection against the UDP flooding and spam attacks in a real Internet of Things (IoT) network. |

14:30 | Decentralized Anomaly Detection via Deep Multi-Agent Reinforcement Learning ABSTRACT. We consider a decentralized anomaly detection problem, where multiple agents collaborate to localize a single anomalous process among a finite number M of processes. At each time, a subset of the processes can be observed by each agent, and the observations from each chosen process follow two different distributions, depending on whether the process is normal or abnormal. The communication channel between agents is rate-limited. The objective is a sequential search strategy that minimizes the Bayes risk, consisting of the sampling cost, and the joint terminal cost among the agents. This problem generalizes previous studies that considered anomaly detection by a single detector. We develop a novel algorithm based on deep multi-agent reinforcement learning optimization to minimize the Bayes risk. Numerical experiments demonstrate the ability of the algorithm to learn good policies in this challenging problem, and improve the single-agent performance by applying the proposed multi-agent collaborative learning method. |

13:30 | Oversquashing in GNNs through the lens of information contraction and graph expansion PRESENTER: Kedar Karhadkar ABSTRACT. The quality of signal propagation in message-passing graph neural networks (GNNs) strongly influences their expressivity as has been observed in recent works. In particular, for prediction tasks relying on long-range interactions, recursive aggregation of node features can lead to an undesired phenomenon called ``oversquashing''. We present a framework for analyzing oversquashing based on information contraction. Our analysis is guided by a model of reliable computation due to von Neumann that lends a new insight into oversquashing as signal quenching in noisy computation graphs. Building on this, we propose a graph rewiring algorithm aimed at alleviating oversquashing. Our algorithm employs a random local edge flip primitive motivated by an expander graph construction. We compare the spectral expansion properties of our algorithm with that of an existing curvature-based non-local rewiring strategy. Synthetic experiments show that while our algorithm in general has a slower rate of expansion, it is overall computationally cheaper, preserves the node degrees exactly and never disconnects the graph. |

13:50 | Classifying Incomplete Data with a Mixture of Subspace Experts ABSTRACT. Performing inference on data with missing values is a highly common yet challenging task in machine learning, and is often addressed by imputing missing values before performing inference on the filled dataset. However, the two parts of this paradigm may be at odds with one another in that optimal imputation may not yield data that is optimal for inference. In this paper, we propose an ensemble approach to classification with missing values where base learners are trained on embeddings of the data, not on imputed versions of the data. To accomplish this, MoSEs first uses a novel subspace clustering algorithm which characterizes an input space as a hierarchy of localized affine subspaces, each of which produces a low-dimensional embedding of samples with missing data. Then, an expert for each subspace is trained on the associated embedding, with final predictions being a weighted sum of all expert predictions. The degree to which experts are localized to a subspace of data, as well as the degree to which final predictions are influenced by the nearest subspaces are controlled by a pair of tunable hyper-parameters. We perform experiments demonstrating the state-of-the-art performance of this method across a variety of reference datasets. |

14:10 | Multivariate Feature Extraction PRESENTER: Xiangxiang Xu ABSTRACT. We propose a general framework to extract features for learning problems involving multiple variables. First, we decompose multivariate dependence into different components to obtain the piece relevant to the learning task. Then, we establish a modal decomposition approach to represent the component as informative features. We further demonstrate the algorithm design for extracting such features from real data, which can incorporate and utilize existing deep feature extractors. We also present an application of our framework in learning tasks with side information. |

14:30 | On Sample Complexity of Learning Shared Representations: The Asymptotic Regime PRESENTER: Xiangxiang Xu ABSTRACT. Federated learning and multi-task learning have achieved great successes in recent works, which exploit the knowledge contained in different tasks to alleviate the influence of lacking training samples. In order to understand the knowledge utilization mechanism, we study the sample complexity of the federated learning algorithms by computing their expected population risks, where the asymptotic regime is considered that the number of tasks is large. Specifically, we focus on several typical algorithms, including (i) training personalized models merely by local data, (ii) jointly training a single model by all the data of devices, and (iii) jointly training the shared feature representation and learning individual classifiers of different devices, where the latter one is recently widely-adopted. The results reveal that the validity of the shared representation algorithm is affected by the task similarities, the model dimensionality, and the sample size, which quantify the sample size range where such algorithm is preferable. Moreover, we develop a new algorithm in consideration of collaborating the classifiers of different devices under the shared representations, which can have lower sample complexity than conventional methods under the asymptotic regime of large sample size. This provides theoretical guidance for practical algorithm designs. |

14:50 | Distributed Consensus-based Kalman Filter under Limited Communication ABSTRACT. In this work, we consider a distributed estimation problem in a communication-constrained environment. To address the limited communication challenge, we present a fully distributed Kalman filtering algorithm in which each agent shares a compressed version of its estimated state information with its neighboring nodes. In the proposed algorithm, we explicitly compute the estimation error covariances of each node in a distributed manner based on the consensus filter using the compressed estimates. An intuitive finding is that for a specific mid-tread quantization function, compared with the uncompressed distributed Kalman consensus filter, the state estimates obtained with the quantized Kalman consensus filter are significantly similar; however, the estimation error covariances are noticeably different. We validate the theoretical results using simulations. |

Organizers: Bruce Hajek and R. Srikant

Organizer: V. Veeravalli

15:30 | The Inverse Problem of Linear-Quadratic Differential Games: When is a Control Strategies Profile Nash? ABSTRACT. This paper aims to formulate and study the inverse problem of non-cooperative linear quadratic games: Given a profile of control strategies, find cost parameters for which this profile of control strategies is Nash. We formulate the problem as a leader-followers problem, where a leader aims to implant a desired profile of control strategies among selfish players. In this paper, we leverage frequency-domain techniques to develop a necessary and sufficient condition on the existence of cost parameters for a given profile of stabilizing control strategies to be Nash under a given linear system. The necessary and sufficient condition includes the circle criterion for each player and a rank condition related to the transfer function of each player. The condition provides an analytical method to check the existence of such cost parameters, while previous studies need to solve a convex feasibility problem numerically to answer the same question. We develop an identity in frequency-domain representation to characterize the cost parameters, which we refer to as the Kalman equation. The Kalman equation reduces redundancy in the time-domain analysis that involves solving a convex feasibility problem. Using the Kalman equation, we also show the leader can enforce the same Nash profile by applying penalties on the shared state instead of penalizing the player for other players' actions to avoid the impression of unfairness. |

15:50 | Adversarial Examples for Model-Based Control: A Sensitivity Analysis ABSTRACT. We propose a method to attack controllers that rely on external timeseries forecasts as task parameters. An adversary can manipulate the costs, states, and actions of the controllers by forging the timeseries, in this case perturbing the real timeseries. Since the controllers often encode safety requirements or energy limits in their costs and constraints, we refer to such manipulation as an adversarial attack. We show that different attacks on model-based controllers can increase control costs, activate constraints, or even make the control optimization problem infeasible. We use the linear quadratic regulator and convex model predictive controllers as examples of how adversarial attacks succeed and demonstrate the impact of adversarial attacks on a battery storage control task for power grid operators. As a result, our method increases control cost by 8500% and energy constraints by 13% on real electricity demand timeseries. |

16:10 | System Identification Entropy for Chen-Fliess Series and Their Interconnections ABSTRACT. The notion of system identification entropy is defined for the class of analytic nonlinear input-output systems having Chen-Fliess series representations. This quantity describes the growth in the number of bits needed to specify the series coefficients in order to represent the input-output map to a desired accuracy. Upper bounds are then computed in terms of the growth rate of the series coefficients and the entropy of the generating series, which is an entirely distinct entropy concept. Once the results for the system identification entropy of a single system are established, the focus turns to interconnected systems, specifically, parallel, series, and feedback interconnected systems. Examples include estimating upper bounds for the system identification entropies of parallel interconnected rational systems and cascaded Wiener systems. |

16:30 | PRESENTER: Pouria Tooranjipour ABSTRACT. This paper synthesizes a risk-aware safe optimal controller for partially unknown linear systems under additive Gaussian noises. The risk is assessed through the concept of Conditional Value-at-Risk (CVaR) to account for the extreme low probability events that occurred in a one-step cost function without being overly conservative. The safety of the CVaR optimization solution is also guaranteed with high probability by imposing a chance constraint. A state-feedback risk-aware controller is first obtained that provides an upper bound to the formulated safe CVaR optimization problem. Then, an online data-driven quadratic programming (QP) optimization problem is devised to simultaneously and safely learn the unknown dynamics and control the system with high probability. As more measurements are collected, the safety constraint is tightened due to increasing the confidence in estimating the dynamic model. In the end, a numerical example is given to elucidate the efficacy of the proposed method. |

16:50 | Beyond UCB: statistical complexity and optimal algorithm for non-linear ridge bandits ABSTRACT. Many existing literature on bandits and reinforcement learning assume a linear reward/value function, but what happens if the reward is non-linear? Two curious phenomena arise for non-linear bandits: first, in addition to the "learning phase" with a standard \Theta(\sqrt(T)) regret, there is an "initialization phase" with a fixed cost determined by the reward function; second, achieving the smallest cost of the initialization phase requires new learning algorithms other than traditional ones such as UCB. For a special family of non-linear bandits, we derive upper and lower bounds on the optimal fixed cost, and in addition, on the entire learning trajectory in the initialization phase via differential equations. In particular, we show that a two-stage algorithm which first finds a good initialization and then treats the problem as a locally linear bandit is statistically optimal, while several classical algorithms such as UCB are provably suboptimal. This is based on a recent joint work with Jiantao Jiao, Nived Rajaraman, and Kannan Ramchandran. |

15:30 | Queueing Delay Minimization in Overloaded Networks via Rate Control ABSTRACT. We study the pattern of queueing delay minimization in overloaded networks using link rate control. We show that serving packets with maximum link rates does not minimize queueing delay when networks are on overload. To understand this counter-intuitive observation and identify the optimal solution, we characterize the packet queueing delay explicitly through a fluid queue model, which enables us to characterize the sufficient and necessary condition for a policy to be delay-optimal for single-hop networks. We show that setting the link rates to be proportional to the packet arrival rates minimizes both average and maximum delay among all sources simultaneously. We further prove that a queue-proportional rate control policy, agnostic of packet arrival rates, asymptotically achieves optimal delay performance given any initial state of queue backlog. We also demonstrate that our proposed policies are implementable in distributed manner to reduce communication overhead and computational cost. We evaluate our policies over single-hop networks with different network settings, and demonstrate its superiority compared to the policies that work well in networks not overloaded. |

15:50 | Age of Information Process under Strongly Mixing Communication -- Moment Bound, Mixing Rate and Strong Law PRESENTER: Adrian Redder ABSTRACT. The decentralized nature of multi-agent systems requires continuous data exchange to achieve global objectives. In such scenarios, Age of Information (AoI) has become an important metric of the freshness of exchanged data due to the error-proneness and delays of communication systems. Communication systems usually possess dependencies: the process describing the success or failure of communication is highly correlated when these attempts are ``close'' in some domain (e.g. in time, frequency, space or code as in wireless communication) and is, in general, non-stationary. To study AoI in such scenarios, we consider an abstract event-based AoI process $\Delta(n)$, expressing time since the last update: If, at time $n$, a monitoring node receives a status update from a source node (event $A(n-1)$ occurs), then $\Delta(n)$ is reset to one; otherwise, $\Delta(n)$ grows linearly in time. This AoI process can thus be viewed as a special random walk with resets. The event process $A(n)$ may be nonstationary and we merely assume that its temporal dependencies decay sufficiently, described by $\alpha$-mixing. We calculate moment bounds for the resulting AoI process as a function of the mixing rate of $A(n)$. Furthermore, we prove that the AoI process $\Delta(n)$ is itself $\alpha$-mixing from which we conclude a strong law of large numbers for $\Delta(n)$. These results are new, since AoI processes have not been studied so far in this general strongly mixing setting. This opens up future work on renewal processes with non-independent interarrival times. |

16:10 | ASUMAN: Age Sense Updating Multiple Access in Networks ABSTRACT. We consider a fully-connected wireless gossip network which consists of a source and $n$ receiver nodes. The source updates itself with a Poisson process and also sends updates to the nodes as Poisson arrivals. Upon receiving the updates, the nodes update their knowledge about the source. The nodes gossip the data among themselves in the form of Poisson arrivals to disperse their knowledge about the source. The total gossiping rate is bounded by a constraint. The goal of the network is to be as timely as possible with the source. In this work, we propose ASUMAN, a distributed opportunistic gossiping scheme, where after each time the source updates itself, each node waits for a time proportional to its current age and broadcasts a signal to the other nodes of the network. This allows the nodes in the network which have higher age to remain silent and only the low-age nodes to gossip, thus utilizing a significant portion of the constrained total gossip rate. We calculate the average age for a typical node in such a network with symmetric settings and show that the theoretical upper bound on the age scales as $O(1)$. ASUMAN, with an average age of $O(1)$, offers significant gains compared to a system where the nodes just gossip blindly with a fixed update rate in which case the age scales as $O(\log n)$. |

16:30 | Universal Policy Tracking: Scheduling for Wireless Networks with Delayed State Observation ABSTRACT. Numerous scheduling algorithms have been proposed to optimize various performance metrics like throughput, delay and utility in wireless networks. However, these algorithms often require instantaneous access to network state information, which is not always available. While network stability can sometimes be achieved with delayed state information, other performance metrics such as latency may degrade. Thus, instead of simply stabilizing the system, our goal is to design a framework that can mimic arbitrary scheduling algorithms with performance guarantees. A naive approach is to make decisions directly with delayed information, but we show that such methods may lead to poor performance. Instead, we propose the Universal Tracking (UT) algorithm that can mimic the actions of arbitrary scheduling algorithms under observation delay. We rigorously show that the performance gap between UT and the scheduling algorithm being tracked is bounded by constants. Our numerical experiments show that UT significantly outperforms the naive approach in various applications. |

16:50 | Pricing, competition and market segmentation in ride hailing ABSTRACT. We analyse a non-cooperative strategic game among two ride-hailing platforms, each of which is modeled as a two-sided queueing system, where drivers (with a certain patience level) are assumed to arrive according to a Poisson process at a fixed rate, while the arrival process of passengers is split across the two providers based on QoS considerations. We also consider two monopolistic scenarios: (i) each platform has half the market share, and (ii) the platforms merge into a single entity, serving the entire passenger base using their combined driver resources. The key novelty of our formulation is that the total market share is fixed across the platforms. The game thus captures the competition among the platforms over market share, which is modeled using two different Quality of Service (QoS) metrics: (i) probability of driver availability, and (ii) probability that an arriving passenger takes a ride. The objective of the platforms is to maximize the profit generated from matching drivers and passengers. In each of the above settings, we analyse the equilibria associated with the game. Interestingly, under the second QoS metric, we show that for a certain range of parameters, no Nash equilibrium exists. Instead, we demonstrate a new solution concept called an equilibrium cycle. Our results highlight the interplay between competition, cooperation, passenger-side price sensitivity, and passenger/driver arrival rates. |