09:00 | Synthetic theory of Ricci curvature - when information theory, optimization, geometry and gradient flows meet SPEAKER: Cédric Villani |
10:00 | Computational Lower Bounds for Community Detection on Random Graphs SPEAKER: unknown ABSTRACT. This paper studies the problem of detecting the presence of a small dense community planted in a large Erd\H{o}s-R\'enyi random graph $\mathcal{G}(N,q)$, where the edge probability within the community exceeds $q$ by a constant factor. Assuming the hardness of the planted clique detection problem, we show that the computational complexity of detecting the community exhibits the following phase transition phenomenon: as the graph size $N$ grows and the graph becomes sparser according to $q = N^{-\alpha}$, there exists a critical value of $\alpha = \frac{2}{3}$, below which there exists a computationally intensive procedure that can detect far smaller communities than any computationally efficient procedure, and above which a linear-time procedure is statistically optimal. The results also lead to average-case hardness results for recovering the dense community and approximating the densest $K$-subgraph. |
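The simplest member of the linear-time family alluded to in this abstract is a total-edge-count statistic. The sketch below is our own illustration, not necessarily the procedure analyzed in the paper: it plants a denser community inside an Erd\H{o}s-R\'enyi graph and flags it when the edge count exceeds its null mean by several standard deviations. All names and numerical values are illustrative.

```python
import math, random

def count_edges(N, q, community=(), q_in=0.0, rng=random):
    # Sample G(N, q) with an optional planted community whose internal
    # edge probability is q_in, and return the number of edges.
    comm = set(community)
    edges = 0
    for u in range(N):
        for v in range(u + 1, N):
            p = q_in if (u in comm and v in comm) else q
            if rng.random() < p:
                edges += 1
    return edges

def total_edge_test(edge_count, N, q, z=3.0):
    # Reject the null "no community" when the count exceeds its
    # G(N, q) mean by z standard deviations.
    mean = q * N * (N - 1) / 2
    std = math.sqrt(mean * (1 - q))
    return edge_count > mean + z * std

rng = random.Random(42)
N, q = 400, 0.1
null_count = count_edges(N, q, rng=rng)
planted = count_edges(N, q, community=range(80), q_in=0.3, rng=rng)
# The planted graph should trigger the test; the null graph,
# with high probability, should not.
```

This statistic runs in time linear in the number of edges; the regime where no such efficient test can match the best computationally unbounded detector is exactly what the hardness result above delimits.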
10:05 | Bad Universal Priors and Notions of Optimality SPEAKER: unknown ABSTRACT. A big open question of algorithmic information theory is the choice of the universal Turing machine (UTM). For Kolmogorov complexity and Solomonoff induction we have invariance theorems: the choice of the UTM changes bounds only by a constant. For the universally intelligent agent AIXI (Hutter, 2005) no invariance theorem is known. Our results are entirely negative: we discuss cases in which unlucky or adversarial choice of the UTM causes AIXI to misbehave drastically. We show that Legg-Hutter intelligence and thus balanced Pareto optimality is entirely subjective, and that every policy is Pareto optimal in the class of all computable environments. This undermines all existing optimality properties for AIXI. While it may still serve as a gold standard for AI, our results imply that AIXI is a relative theory, dependent on the choice of the UTM. |
10:10 | Thompson Sampling for Learning Parameterized Markov Decision Processes SPEAKER: unknown ABSTRACT. We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards. Consequently, observing a particular state transition might yield useful information about other, unobserved, parts of the MDP. We present a version of Thompson sampling for parameterized reinforcement learning problems, and derive a frequentist regret bound for priors over general parameter spaces. The result shows that the number of instants where suboptimal actions are chosen scales logarithmically with time, with high probability. It holds for priors without any additional, specific closed-form structure such as conjugate or product-form priors. Moreover, the constant factor in the logarithmic scaling encodes the information complexity of learning the MDP, in terms of the Kullback-Leibler geometry of the parameter space. |
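As a toy illustration of the coupling this abstract describes (a sketch under our own assumptions, not the paper's algorithm or its regret analysis), the loop below runs Thompson sampling over a finite grid of candidate parameters for a two-armed "parameterized bandit": a single shared parameter theta links both arms' reward probabilities, so every observation is informative about both arms at once.

```python
import random

def reward_prob(theta, arm):
    # Assumed parameterization, purely illustrative: arm 0 succeeds
    # with probability theta, arm 1 with probability 1 - theta.
    return theta if arm == 0 else 1.0 - theta

def thompson_step(grid, posterior, true_theta, rng):
    # 1. Sample a parameter from the current posterior.
    theta_hat = rng.choices(grid, weights=posterior)[0]
    # 2. Act greedily with respect to the sampled model.
    arm = max(range(2), key=lambda a: reward_prob(theta_hat, a))
    # 3. Observe a Bernoulli reward from the true environment.
    r = 1 if rng.random() < reward_prob(true_theta, arm) else 0
    # 4. Bayes update of the posterior over the whole grid: one
    #    observation reweights every candidate parameter.
    likes = [reward_prob(t, arm) if r else 1 - reward_prob(t, arm)
             for t in grid]
    post = [p * l for p, l in zip(posterior, likes)]
    z = sum(post)
    return arm, [p / z for p in post]

rng = random.Random(0)
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
posterior = [1.0 / len(grid)] * len(grid)
for _ in range(200):
    arm, posterior = thompson_step(grid, posterior, 0.9, rng)
# The posterior should concentrate near the true parameter 0.9.
```

The paper's setting replaces this two-armed toy with a full MDP and a general prior, and quantifies the constant in the logarithmic regret via the Kullback-Leibler geometry of the parameter space.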
10:15 | Fast Mixing for Discrete Point Processes SPEAKER: unknown ABSTRACT. We investigate systematic mechanisms for designing fast mixing Markov chain Monte Carlo algorithms to sample from discrete point processes. Such processes are defined as probability distributions $\mu(S)\propto \exp(f(S))$ over all subsets $S\subseteq V$ of a finite set $V$ through a bounded set function $f:2^V\rightarrow \mathbb{R}$. In particular, a subclass of discrete point processes characterized by submodular functions (which includes determinantal point processes, log-submodular distributions, and submodular point processes) has recently gained a lot of interest in machine learning and has been shown to be effective for modeling diversity and coverage. We show that if the set function (not necessarily submodular) displays a natural notion of decay of correlation, then it is possible to design fast mixing Markov chain Monte Carlo methods that provide size-free error bounds on marginal approximations. The conditions that we introduce involve a control on the second-order (discrete) derivatives of set functions. We then provide sufficient conditions for fast mixing when the set function is submodular, and specialize our results to canonical examples. |
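A minimal instance of such a chain is the single-site Gibbs sampler below: it resamples the membership of one element of $V$ at a time, conditioned on the rest of the current subset, and estimates inclusion marginals from the trajectory. The set function $f$ here (modular weights minus a pairwise penalty, so larger subsets are discouraged) is our own toy choice, not one from the paper.

```python
import math, random

def f(S):
    # Toy bounded set function: modular weights minus a repulsion
    # term that penalizes every pair of co-occurring elements.
    weights = {0: 1.0, 1: 0.8, 2: 0.5}
    penalty = 0.6 * sum(1 for i in S for j in S if i < j)
    return sum(weights[i] for i in S) - penalty

def gibbs_step(S, V, rng):
    # Resample membership of one uniformly chosen element i,
    # conditioned on the rest of S, under mu(S) ~ exp(f(S)).
    i = rng.choice(sorted(V))
    with_i = math.exp(f(S | {i}))
    without_i = math.exp(f(S - {i}))
    p_in = with_i / (with_i + without_i)
    return S | {i} if rng.random() < p_in else S - {i}

rng = random.Random(1)
V = {0, 1, 2}
S = set()
counts = {i: 0 for i in V}
n_steps = 20000
for _ in range(n_steps):
    S = gibbs_step(S, V, rng)
    for i in S:
        counts[i] += 1
# marginals[i] estimates Pr(i in S); the element with the largest
# weight should have the highest inclusion probability.
marginals = {i: counts[i] / n_steps for i in V}
```

The abstract's decay-of-correlation conditions, phrased through second-order discrete derivatives of $f$, are what guarantee that chains of this kind mix fast enough for such marginal estimates to come with size-free error bounds.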
10:20 | On Convergence of Emphatic Temporal-Difference Learning SPEAKER: Huizhen Yu ABSTRACT. We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood, and White as an improved solution to the problem of divergence of off-policy temporal-difference learning with linear function approximation. We present in this paper the first convergence proofs for two emphatic algorithms, ETD($\lambda$) and ELSTD($\lambda$). We prove, under general off-policy conditions, the convergence in $L^1$ of the ELSTD($\lambda$) iterates and the almost sure convergence of the approximate value functions calculated by both algorithms. Our analysis involves new techniques with applications beyond emphatic algorithms, leading, for example, to the first proof that standard TD($\lambda$) also converges under off-policy training for $\lambda$ sufficiently large. |
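For reference, the ETD($\lambda$) iterates have a compact recursive form. The sketch below is a minimal rendering of the commonly presented updates (follow-on trace, emphasis, emphatic eligibility trace) with constant discount $\gamma$, constant interest $i_t = 1$, and linear features; all step-size and problem constants are chosen purely for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def etd_lambda(transitions, n_features, alpha=0.05, gamma=0.9, lam=0.5):
    """Run ETD(lambda) over transitions (x, reward, x_next, rho),
    where x and x_next are feature vectors and rho = pi(a|s)/mu(a|s)
    is the importance-sampling ratio; interest is fixed to 1."""
    theta = [0.0] * n_features   # weight vector
    e = [0.0] * n_features       # eligibility trace
    F, prev_rho = 0.0, 1.0       # follow-on trace, previous ratio
    for x, reward, x_next, rho in transitions:
        F = gamma * prev_rho * F + 1.0          # follow-on: F_t
        M = lam + (1.0 - lam) * F               # emphasis: M_t
        e = [rho * (gamma * lam * ei + M * xi)  # emphatic trace
             for ei, xi in zip(e, x)]
        delta = reward + gamma * dot(theta, x_next) - dot(theta, x)
        theta = [ti + alpha * delta * ei for ti, ei in zip(theta, e)]
        prev_rho = rho
    return theta

# On-policy sanity check (rho = 1): a single self-looping state with
# reward 1 and feature x = [1.0] has value 1 / (1 - gamma) = 10.
transitions = [([1.0], 1.0, [1.0], 1.0)] * 500
theta = etd_lambda(transitions, n_features=1)
```

The convergence results in the talk concern the stochastic behavior of exactly these iterates under genuinely off-policy ratios $\rho_t$, where the follow-on trace $F_t$ is unbounded and the classical ODE arguments no longer apply directly.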
10:25 | Faster Algorithms for Testing under Conditional Sampling SPEAKER: unknown ABSTRACT. There has been considerable recent interest in distribution tests whose run-time and sample requirements are sublinear in the domain size $k$. We study two of the most important tests under the conditional-sampling model, where each query specifies a subset $S$ of the domain and the response is a sample drawn from $S$ according to the underlying distribution. For identity testing, which asks whether the underlying distribution equals a specific given distribution or $\delta$-differs from it, we reduce the known time and sample complexities from $O(\delta^{-4})$ to $O(\delta^{-2})$, thereby matching the information-theoretic lower bound. For closeness testing, which asks whether two distributions underlying observed data sets are equal or different, we reduce the existing complexity from $O(\delta^{-4} \log^5 k)$ to an even sub-logarithmic $O(\delta^{-5} \log \log k)$, thereby providing a better bound for an open problem posed at the Bertinoro Workshop on Sublinear Algorithms. |
10:30 | Interactive Fingerprinting Codes and the Hardness of Preventing False Discovery SPEAKER: unknown ABSTRACT. We show an essentially tight bound on the number of adaptively chosen statistical queries that a computationally efficient algorithm can answer accurately given n samples from an unknown distribution. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is accurate if it is "close" to the correct expectation over the distribution. This question was recently studied by Dwork et al., who showed how to answer ~n^2 queries efficiently, and also by Hardt and Ullman, who showed that answering ~n^3 queries is hard. We close the gap between the two bounds and show that, under a standard hardness assumption, there is no computationally efficient algorithm that, given n samples from an unknown distribution, can give valid answers to O(n^2) adaptively chosen statistical queries. An implication of our results is that computationally efficient algorithms for answering arbitrary, adaptively chosen statistical queries may as well be differentially private. We obtain our results using a new connection between the problem of answering adaptively chosen statistical queries and a combinatorial object called an interactive fingerprinting code (Fiat and Tassa '01). In order to optimize our hardness result, we give a new Fourier-analytic approach to analyzing fingerprinting codes that is simpler, more flexible, and yields better parameters than previous constructions. |