previous day
all days

View: session overviewtalk overview

09:00-12:00 Session 5: Tutorial

Fairness, Privacy, and Ethics in Data Science Tutorial

Fairness, Privacy, and Ethics in Data Science Tutorial
12:30-13:30 Session 6: Keynote


Semantic Scholar, NLP, and the Fight Against COVID-19

ABSTRACT. My talk will describe the dramatic creation of the COVID-19 Open Research Dataset (CORD-19) at the Allen Institute for AI and the broad range of efforts, both inside and outside of the Semantic Scholar project, to garner insights into COVID-19 and its treatment based on this data.  The talk will highlight the difficult problems facing the emerging field of Scientific Language Processing.

14:00-15:50 Session 7: Plenary Session

Data Science Theory

Non-Uniform Sampling of Fixed Margin Binary Matrices

ABSTRACT. Data sets in the form of binary matrices are ubiquitous across scientific domains, and researchers are often interested in identifying and quantifying noteworthy structure. One approach is to compare the observed data to that which might be obtained under a null model.Here we consider sampling from the space of binary matrices which satisfy a set of marginal row and column sums. Whereas existing sampling methods have focused on uniform sampling from this space, we introduce modified versions of two elementwise swapping algorithms which sample according to a non-uniform probability distribution defined by a weight matrix, which gives the relative probability of a one for each entry. We demonstrate that values of zero in the weight matrix, i.e. structural zeros, are generally problematic for swapping algorithms, except when they have special monotonic structure. We explore the properties of our algorithms through simulation studies, and illustrate the potential impact of employing a non-uniform null model using a classic bird habitation dataset.

Large very dense subgraphs in a stream of edges

ABSTRACT. We study the detection and the reconstruction of a large very dense subgraph in a social graph with $n$ nodes and $m$ edges given as a stream of edges, when the graph follows a power law degree distribution, in the regime when $m=O(n. \log n)$. A subgraph is very dense if its edge density is comparable to a clique. We uniformly sample the edges with a Reservoir of size $k=O(\sqrt{n}.\log n)$. The detection algorithm of a large very dense subgraph checks whether the Reservoir has a giant component. We show that if the graph contains a very dense subgraph of size $\Omega(\sqrt{n})$, then the detection algorithm is almost surely correct. On the other hand, a random graph that follows a power law degree distribution almost surely has no large very dense subgraph, and the detection algorithm is almost surely correct. We define a new model of random graphs which follow a power law degree distribution and have large very dense subgraphs. We then show that on this class of random graphs we can reconstruct a good approximation of the very dense subgraph with high probability. We generalize these results to dynamic graphs defined by sliding windows in a stream of edges.

Toward Communication Efficient Adaptive Gradient Method

ABSTRACT. In recent years, distributed optimization is proven to be an effective approach to accelerate training of large scale machine learning models such as deep neural networks. With the increasing computation power of GPUs, the bottleneck of training speed in distributed training is gradually shifting from computation to communication. Meanwhile, in the hope of training machine learning models on mobile devices, a new distributed training paradigm federated learning is proposed. The communication time in federated learning is especially important due to the low bandwidth of mobile devices. Various approaches to improve communication efficiency are proposed for federated learning. Yet, most of them are designed with SGD as the prototype training algorithm. While adaptive gradient methods are very effective for training neural nets, the study of adaptive gradient methods in federated learning is scarce. In this paper, we study the design of adaptive gradient methods in federated learning. As a result, we propose an adaptive gradient method that can guarantee both convergence and communication efficiency for federated learning.

Towards Practical Lipschitz Bandits

ABSTRACT. Stochastic Lipschitz bandit algorithms are methods that govern exploration-exploitation tradeoffs, and have been used for a variety of important task domains, including zeroth order optimization.While beautiful theory has been developed for the stochastic Lipschitz bandit problem, the methods arising from these theories are not practical, and accordingly, the development of practical well-performing Lipschitz bandit algorithms has stalled in recent years. To remedy this, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context- and arm-space.Due to this flexibility, the algorithm is able to efficiently optimize rewards and minimize regret, by focusing on the portions of the space that are most relevant. In our analysis, we link tree-based methods to Gaussian processes. In light of our analysis, we design a novel hierarchical Bayesian model for Lipschitz bandit problems. Our experiments show that our algorithms can achieve state-of-the-art performance in challenging real-world tasks such as neural network hyperparameter tuning.

On Reinforcement Learning for Turn-based Zero-sum Markov Games

ABSTRACT. We consider the problem of finding Nash equilibrium for two-player turn-based zero-sum games. Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement Learning based approach. Specifically, we propose Explore-Improve-Supervise (EIS) method that combines “exploration”, “policy improvement” and “supervised learning” to find the value function and policy associated with Nash equilibrium. We identify sufficient conditions for convergence and correctness for such an approach. For a concrete instance of EIS where random policy is used for “exploration”, Monte-Carlo Tree Search is used for “policy improvement” and Nearest Neighbors is used for “supervised learning”, we establish that this method finds an $\varepsilon$-approximate value function of Nash equilibrium in {$\widetilde{O}(\varepsilon^{-(d+4)})$} steps when the underlying state-space of the game is continuous and $d$-dimensional. This is nearly optimal as we establish a lower bound of {$\widetilde{\Omega}(\varepsilon^{-(d+2)})$} for any policy.

16:00-18:00 Session 8: Plenary Session

Foundations in Practice

Transforming Probabilistic Programs for Model Checking

ABSTRACT. Probabilistic programming is perfectly suited to reliable and transparent data science, as it allows the user to specify their models in a high-level language without worrying about the complexities of how to fit the models. Static analysis of probabilistic programs presents even further opportunities for enabling a high-level style of programming, by automating time-consuming and error-prone tasks. We apply static analysis to probabilistic programs to automate large parts of two crucial model checking methods: Prior Predictive Checks and Simulation-Based Calibration. Our method transforms a probabilistic program specifying a density function into an efficient forward-sampling form. To achieve this, we extract a factor graph from a probabilistic program using static analysis, generate a set of proposal directed acyclic graphs using a SAT solver, select a graph which will produce provably correct sampling code, then generate one or more sampling programs. We allow minimal user interaction to broaden the scope of application beyond what is possible with static analysis alone. We present an implementation targeting the popular Stan probabilistic programming language, automating large parts of a robust Bayesian workflow for a wide community of probabilistic programming users.

StyleCAPTCHA: CAPTCHA based on style-transferred images to defend against Deep Convolutional Networks

ABSTRACT. CAPTCHA has found widespread applications for bot detection in the cyberspace. Many CAPTCHAs are based on visual perception tasks such as text recognition, objection recognition and image classification. However, they are under serious threat from modern visual perception technologies, especially deep convolutional networks (DCNs). We propose a novel CAPTCHA, called Style- CAPTCHA, which asks users to classify stylized human versus animal face images. Each stylized image in StyleCAPTCHA is created by combining the content representations of a human or animal face image and the style representations of a style reference image, both of which are hidden from the user. The style is changed per a few requests of the StyleCAPTCHA service. While attackers need to repeatedly train or retrain their DCNs for adaptation to various styles, normal users extract the content information of stylized images to tell human faces apart from animal faces with no training or retraining process. In addition, we propose a metric called Classifier Cross-task Transferability to measure the ability of a classifier transferring from its original task to another task. This metric allows us to effectively arrange the schedule of styles such that the transferability of DCNs of potential attackers across classification tasks featured by various styles are limited. We showcase the utility of StyleCAPTCHA by applying it to defend against state-of-the-art face detection networks and general DCN classifiers.

Statistical significance in high-dimensional linear mixed models

ABSTRACT. This paper develops an inferential framework for high-dimensional linear mixed effect models. Such models are suitable, e.g., when collecting n repeated measurements for M subjects. We consider a scenario where the number of fixed effects p is large (and may be larger than M), but the number of random effects q is small. Our framework is inspired by a recent line of work that proposes de-biasing penalized estimators to perform inference for high-dimensional linear models with fixed effects only. In particular, we demonstrate how to correct a ‘naive’ ridge estimator to build asymptotically valid confidence intervals for mixed effect models. We validate our theoretical results with numerical experiments that show that our method can successfully account for the correlation induced by the random effects. For a practical demonstration we consider a riboflavin production dataset that exhibits group structure, and show that conclusions drawn using our method are consistent with those obtained on a similar dataset without group structure.

Dynamical Gaussian Process Latent Variable Model for Representation Learning from Longitudinal Data

ABSTRACT. Many real-world applications involve longitudinal data, consisting of observations of several variables, where different subsets of variables are sampled at irregularly spaced time points. We introduce the Longitudinal Gaussian Process Latent Variable Model (L-GPLVM), a variant of the Gaussian Process Latent Variable Model, for learning compact representations of such data. L-GPLVM overcomes a key limitation of the Dynamic Gaussian Process Latent Variable Model and its variants, which rely on the assumption that the data are fully observed over all of the sampled time points. We describe an effective approach to learning the parameters of L-GPLVM from sparse observations, by coupling the dynamical model with a Multitask Gaussian Process model for sampling of the missing observations at each step of the gradient-based optimization of the variational lower bound. We further show the advantage of the Sparse Process Convolution framework to learn the latent representation of sparsely and irregularly sampled longitudinal data with minimal computational overhead relative to a standard Latent Variable Model. We demonstrated experiments with synthetic data as well as variants of MOCAP data with varying degrees of sparsity of observations that show that L-GPLVM substantially and consistently outperforms the state-of-the-art alternatives in recovering the missing observations even when the available data exhibits a high degree of sparsity. The compact representations of irregularly sampled and sparse longitudinal data can be used to perform a variety of machine learning tasks, including clustering, classification, and regression.