next day
all days

View: session overviewtalk overview

14:00-15:10 Session 3: Multi-armed Bandits I
Simple Bayesian Algorithms for Best Arm Identification
SPEAKER: Daniel Russo

ABSTRACT. This paper considers the optimal adaptive allocation of measurement effort in order to identify the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their quality, with the goal of confidently identifying the best design after a small number of measurements.

We propose two simple Bayesian algorithms for adaptively allocating measurement effort. One, which we call Top-Two sampling, computes the two designs with the highest posterior probability of being optimal, and then randomizes to select among these two. The other algorithm we propose is a modified version of Thompson sampling that is tailored for identifying the best design.

We prove that these simple algorithms satisfy a strong optimality property. In a frequestist setting where the true quality of the designs is fixed, the posterior is said to be consistent if it correctly identifies the optimal design, in the sense that that the posterior probability assigned to the event that some other design is optimal converges to zero as measurements are collected. We show that under the proposed algorithms this convergence occurs at an \emph{exponential rate}, and the corresponding exponent is the best possible among all allocation rules.

Optimal Best Arm Identification with Fixed Confidence
SPEAKER: unknown

ABSTRACT. We provide a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the 'Track-and-Stop' strategy, which is proved to be asymptotically optimal. It consists in a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and in a stopping rule named after Chernoff, for which we give a new analysis.

Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem

ABSTRACT. We consider the problem of \textit{best arm identification} with a \textit{fixed budget $T$}, in the $K$-armed stochastic bandit setting, with arms distribution defined on $[0,1]$. We prove that any bandit strategy, for at least one bandit problem characterized by a complexity $H$, will misidentify the best arm with probability lower bounded as $$\exp\Big(-\frac{T}{\log(K)H}\Big),$$ where $H$ is the sum for all sub-optimal arms of the inverse of the squared gaps. Our result disproves formally the general belief - coming from results in the fixed confidence setting - that there must exist an algorithm for this problem whose probability of error is upper bounded as $\exp(-T/H)$. This also proves that some existing strategies based on the Successive Rejection of the arms are optimal - closing therefore the current gap between upper and lower bounds for the fixed budget best arm identification problem.

Best-of-K Bandits
SPEAKER: unknown

ABSTRACT. This paper studies the Best-of-K Bandit game: At each time the player chooses a subset S among all N-choose-K possible options and observes reward max(X(i) : i in S) where X is a random vector drawn from a joint distribution. The objective is to identify the subset that achieves the highest expected reward with high probability using as few queries as possible. We present distribution-dependent lower bounds which force a learner to consider all N-choose-K subsets and match naive extensions of known upper bounds in the bandit setting obtained by treating each subset as a separate arm. Nevertheless, we present evidence that exhaustive search may be avoided for certain, favorable distributions because the influence of high-order order correlations may be dominated by lower order statistics. Finally, we present an algorithm and analysis for independent arms, which mitigates the surprising non-trivial information occlusion that occurs due to only observing the max in the subset. This may inform strategies for more general dependent measures, and we complement these result with independent-arm lower bounds.

Pure Exploration of Multi-armed Bandit Under Matroid Constraints
SPEAKER: unknown

ABSTRACT. We study the pure exploration problem subject to a matroid constraint (Best-Basis) in a stochastic multi-armed bandit game. In a Best-Basis instance, we are given $n$ stochastic arms with unknown reward distributions, as well as a matroid $M$ over the arms. Let the weight of an arm be the mean of its reward distribution. Our goal is to identify a basis of $M$ with the maximum total weight, using as few samples as possible. The problem is a significant generalization of the best arm identification problem and the top-$k$ arm identification problem, which have attracted significant attentions in recent years. We study both the exact and PAC versions of Best-Basis, and provide algorithms with nearly-optimal sample complexities for these versions. Our results generalize and/or improve on several previous results for the top-$k$ arm identification problem and the combinatorial pure exploration problem (when the combinatorial constraint is a matroid).

15:10-15:30Coffee Break
15:30-16:20 Session 4: Multi-armed Bandits II
Maximin Action Identification: A New Bandit Framework for Games
SPEAKER: unknown

ABSTRACT. We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper- confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.

Instance-dependent Regret Bounds for Dueling Bandits
SPEAKER: unknown

ABSTRACT. We study the multi-armed dueling bandit problem in which feedback is provided in the form of relative comparisons between pairs of actions, with the goal of eventually learning to select actions that are close to the best. Following Dudik et al. (2015), we aim for algorithms whose performance approaches that of the optimal randomized choice of actions, the von Neumann winner, expressly avoiding more restrictive assumptions, for instance, regarding the existence of a single best action (a Condorcet winner). In this general setting, the best known algorithms achieve regret O(\sqrt{KT}) in T rounds with K actions. In this paper, we present the first instance-dependent regret bounds for the general problem, focusing particularly on when the von Neumann winner is sparse. Specifically, we propose a new algorithm whose regret, relative to a unique von Neumann winner with sparsity s, is at most O(\sqrt{sT}), plus an instance-dependent constant. Thus, when the sparsity is much smaller than the total number of actions, our result indicates that learning can be substantially faster.

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives
SPEAKER: unknown

ABSTRACT. We consider a contextual version of multi-armed bandit problem with global knapsack constraints. In each round, the outcome of pulling an arm is a scalar reward and a resource consumption vector, both dependent on the context, and the global knapsack constraints require the total consumption for each resource to be below some pre-fixed budget. The learning agent competes with an arbitrary set of context-dependent policies. This problem was introduced by Badanidiyuru et al.(2014), who gave a computationally inefficient algorithm with near optimal regret bounds for it. We give a "computationally efficient" algorithm for this problem with slightly better regret bounds, by generalizing the approach of Agarwal et al.(2014) for the non-constrained version of the problem. The computational time of our algorithm scales "logarithmically" in the size of the policy space. This answers the main open question of Badanidiyuru et al.(2014). We also extend our results to a variant where there are no knapsack constraints but the objective is an arbitrary Lipschitz concave function of the sum of outcome vectors.

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
SPEAKER: Tor Lattimore

ABSTRACT. I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is an improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling.

An algorithm with nearly optimal pseudo-regret \\for both stochastic and adversarial bandits
SPEAKER: unknown

ABSTRACT. We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. Against adversarial bandits the pseudo-regret is $\bigO{\sqrt{K n \log n}}$ and against stochastic bandits the regret is $\bigO{\sum_i (\log n)/\Delta_i}$. We also show that no algorithm with $\bigO{\log n}$ pseudo-regret against stochastic bandits can achieve $\tO{\sqrt{n}}$ expected regret against adaptive adversarial bandits. This complements previous results of \cite{BS12} that show $\tO{\sqrt{n}}$ expected adversarial regret with $\bigO{(\log n)^2}$ stochastic pseudo-regret.

16:40-18:00 Session 5: Stochastic Block Models
Information-theoretic thresholds for community detection in sparse networks
SPEAKER: unknown

ABSTRACT. We give upper and lower bounds on the information-theoretic threshold for community detection in the stochastic block model. Specifically, consider a symmetric stochastic block model with q groups, average degree d, and connection probabilities c_in/n and c_out/n for within-group and between-group edges respectively; let lambda = (c_in-c_out)/(q d). We show that, when q is large, and lambda = O(1/q), the critical value of d at which community detection becomes possible---in physical terms, the condensation threshold---is

d_c = \Theta(log(q) / (q lambda^2)),

with tighter results in certain regimes. Above this threshold, we show that any partition of the nodes into q groups which is as `good' as the planted one, in terms of the number of within- and between-group edges, is correlated with it. This gives an exponential-time algorithm that performs better than chance; specifically, community detection becomes possible below the Kesten-Stigum bound for k >= 5 in the disassortative case lambda < 0, and for k >= 11 in the assortative case lambda > 0 (similar upper bounds were obtained independently by Abbe and Sandon). Conversely, below this threshold, we show that no algorithm can label the vertices better than chance, or even distinguish the block model from an Erdos-Renyi random graph with high probability.

Our lower bound on d_c uses Robinson and Wormald's small subgraph conditioning method, and we also give (less explicit) results for non-symmetric stochastic block models. In the symmetric case, we obtain explicit results by using bounds on certain functions of doubly stochastic matrices due to Achlioptas and Naor; indeed, our lower bound on d_c is their second moment lower bound on the q-colorability threshold for random graphs with a certain effective degree.

On the low-rank approach for semidefinite programs arising in synchronization and community detection
SPEAKER: unknown

ABSTRACT. To address difficult optimization problems, convex relaxations based on semidefinite programming are now common place in many fields. Although solvable in polynomial time, large semidefinite programs tend to be computationally challenging. Over a decade ago, exploiting the fact that in many applications of interest the desired solutions are low rank, Burer and Monteiro proposed a heuristic to solve such semidefinite programs by restricting the search space to low-rank matrices. The accompanying theory does not explain the extent of the empirical success. We focus on Synchronization and Community Detection problems and provide theoretical guarantees shedding light on the remarkable efficiency of this heuristic.

Density Evolution in the Degree-correlated Stochastic Block Model
SPEAKER: unknown

ABSTRACT. There is a recent surge of interest in identifying the sharp recovery thresholds for cluster recovery under the stochastic block model. In this paper, we address the more refined question of how many vertices that will be misclassified on average. We consider the binary form of the stochastic block model, where $n$ vertices are partitioned into two clusters with edge probability $a/n$ within the first cluster, $c/n$ within the second cluster, and $b/n$ across clusters. Suppose that as $n \to \infty$, $a= b+ \mu \sqrt{ b} $, $c=b+ \nu \sqrt{ b} $ for two fixed constants $\mu, \nu$, and $b \to \infty$ with $b=n^{o(1)}$. When the cluster sizes are balanced and $\mu \neq \nu$, we show that the minimum fraction of misclassified vertices on average is given by $Q(\sqrt{v^*})$, where $Q(x)$ is the Q-function for standard normal, $v^*$ is the unique fixed point of $v= \frac{(\mu-\nu)^2}{16} + \frac{ (\mu+\nu)^2 }{16} \mathbb{E}[ \tanh(v+ \sqrt{v} Z)]$, and $Z$ is standard normal. Moreover, the minimum misclassified fraction on average is attained by a local algorithm, namely belief propagation, in time linear in the number of edges. Our proof techniques are based on connecting the cluster recovery problem to tree reconstruction problems, and analyzing the density evolution of belief propagation on trees with Gaussian approximations.

Learning Communities in the Presence of Errors
SPEAKER: unknown

ABSTRACT. We study the problem of learning communities in the presence of modeling errors and give robust recovery algorithms for the Stochastic Block Model (SBM). This model, which is also known as the Planted Partition Model, is widely used for community detection and graph partitioning in various fields, including machine learning, statistics, and social sciences. Many algorithms exist for learning communities in the Stochastic Block Model, but they do not work well in the presence of errors.

In this paper, we initiate the study of robust algorithms for partial recovery in SBM with modeling errors or noise. We consider graphs generated according to the Stochastic Block Model and then modified by an adversary. We allow two types of adversarial errors, Feige-Kilian or monotone errors, and edge outlier errors. Mossel, Neeman and Sly (STOC 2015) posed an open question about whether an almost exact recovery is possible when the adversary is allowed to add o(n) edges. Our work answers this question affirmatively even in the case of k>2 communities.

We then show that our algorithms work not only when the instances come from SBM, but also work when the instances come from any distribution of graphs that is (\epsilon m) close to SBM in the Kullback-Leibler divergence. This result also works in the presence of adversarial errors. Finally, we present almost tight lower bounds for two communities.

Semidefinite Programs for Exact Recovery of a Hidden Community
SPEAKER: unknown

ABSTRACT. We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality $K$ from an $n \times n$ symmetric data matrix $A$, where for distinct indices $i,j$, $A_{ij} \sim P$ if $i, j$ are both in the community and $A_{ij} \sim Q$ otherwise, for two known probability distributions $P$ and $Q$. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case ($P=\Bern(p)$ and $Q=\Bern(q)$ with $p>q$) and the Gaussian case ($P=\calN(\mu,1)$ and $Q=\calN(0,1)$ with $\mu>0$), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If $K=\omega( n /\log n)$, SDP attains the information-theoretic recovery limits with sharp constants; (2) If $K=\Theta(n/\log n)$, SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If $K=o(n/\log n)$ and $K \to \infty$, SDP is order-wise suboptimal. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.

Spectral thresholds in the bipartite stochastic block model
SPEAKER: unknown

ABSTRACT. We consider a bipartite stochastic block model on vertex sets V_1 and V_2, with planted partitions in each, and ask at what densities efficient algorithms can recover the partition of the smaller vertex set.

When |V_2| >>|V_1|, multiple thresholds emerge. We first locate a sharp threshold for detection of the partition, in the sense of the results of \cite{mossel2012stochastic,mossel2013proof} and \cite{massoulie2014community} for the stochastic block model. We then show that at a higher edge density, the singular vectors of the rectangular biadjacency matrix exhibit a localization / delocalization phase transition, giving recovery above the threshold and no recovery below. Nevertheless, we propose a simple spectral algorithm, Diagonal Deletion SVD, which recovers the partition at a nearly optimal edge density.

The bipartite stochastic block model studied here was used by \cite{feldman2014algorithm} to give a unified algorithm for recovering planted partitions and assignments in random hypergraphs and random k-SAT formulae respectively. Our results give the best known bounds for the clause density at which solutions can be found efficiently in these models as well as showing a barrier to further improvement via this reduction to the bipartite block model.