PROGRAM FOR WEDNESDAY, JULY 30TH
08:30 | What's Hot in Reinforcement Learning SPEAKER: Volodymyr Mnih |
08:45 | What's Hot in Machine Learning SPEAKER: Animashree Anandkumar |
09:00 | Challenges in Robotics SPEAKER: Stefanie Tellex |
08:30 | Program Induction in the Real World SPEAKER: Stephen Muggleton ABSTRACT. Much of the world’s population uses computers for everyday tasks, but most fail to benefit from the power of computation due to their inability to program. Inductive Program Synthesis, the subject of a long tradition of ongoing academic AI research [6, 12], has recently been incorporated for the first time into a mass market product in the form of the FlashFill subsystem found in Microsoft Excel 2013. FlashFill [3, 4] employs Formal Methods techniques to support inductive synthesis of string processing programs for spreadsheets, typically from a single user-provided example. The ability to learn from a single example is typical of human learning [7], while being uncharacteristic of most forms of machine learning. Despite the power of FlashFill to support a natural interface for everyday inductive programming, its present implementation falls short of the early goals of AI areas such as Inductive Logic Programming (ILP) [12] in 1) its lack of automatic hierarchical program decomposition, 2) the absence of iteration and recursion in the synthesised programs, 3) its inability to re-use parts of learned programs in new tasks, and 4) its inability to provide simple, human-understandable feedback to the user on the form of the learned program. This presentation will survey recent progress in AI approaches to inductive program synthesis and will demonstrate how these techniques are able to overcome the shortcomings of FlashFill on string transformation tasks. |
08:45 | Implications of Algorithmic and High-Frequency Trading SPEAKER: Michael Wellman ABSTRACT. Algorithms are taking over financial markets. AI researchers have a unique perspective to contribute in helping the world understand the implications of this development. |
09:00 | Spectral Learning Methods for Natural Language Processing SPEAKER: Lyle Ungar ABSTRACT. This talk will explain recent advances in spectral methods for latent variable models. We will focus on spectral methods for estimating HMMs, as this setting serves best for giving the intuition behind the algorithms and theorems about them. Spectral HMM methods form the basis for a variety of NLP applications, including spectral dependency parsing and latent variable PCFGs. We show that using such spectral methods to map from words to vectors provides features that improve performance in a number of natural language processing applications including named entity recognition (is a mention of "Paris" a place, person or perfume?) and parsing. Spectral HMM estimation methods can also be elegantly generalized to include continuous emissions, for applications including speech recognition, machine vision and robotics. Examples will be drawn primarily from NLP, but these techniques are increasingly also used in robotics and machine vision. |
08:30 | What's Hot in Social Media SPEAKER: Eytan Adar |
08:45 | Challenges in Combinatorial Search SPEAKER: Nathan Sturtevant |
09:00 | Challenges in Cognitive Science SPEAKER: Vincent C. Mueller |
08:30 | Advances in Developing Physical and Cognitive Surrogates for Remote Operations—The Mars Exploration Rovers as Collaboration Tools SPEAKER: William Clancey ABSTRACT. This presentation draws primarily from the author’s book (Clancey, 2012)—which recently received the AIAA 2014 Gardner-Lasser History Literature Award—about the MER scientists' experience in their "voyages of discovery" exploring the surface of Mars, as well as related studies by Vertesi (2012), Cohn (2013), and Mazmanian (in press). Because of the infrequency of planetary missions, the opportunities for study are few; however, lessons learned could have a fundamental, broad effect on how we conceive of the role of robotic surrogates of teams on Earth in scientific and military field operations, hospitals, factories, etc. |
08:45 | Knowledge Compilation SPEAKER: Adnan Darwiche ABSTRACT. Knowledge compilation is an area of research that has a long tradition in AI. Initially, work in this area took the form of searching for tractable languages based on CNFs (e.g. Horn clauses). However, the area took a different turn a decade ago with the publication of the "Knowledge Compilation Map." Since then, the work on knowledge compilation became structured across three major dimensions: identifying new tractable languages and placing them on the map by characterizing their succinctness and the polytime operations they support; building compilers that map propositional knowledge bases into tractable languages; and using these languages in various applications, such as diagnosis, planning, probabilistic reasoning, and statistical relational learning. More recently, knowledge compilation has greatly influenced the area of probabilistic databases and has also become increasingly influential in first-order probabilistic inference (having been mostly restricted to propositional representations in the last two decades). Another area of influence is in the learning of tractable probabilistic models, as knowledge compilation has formed the basis of a number of recent approaches in this area of research---ICML will host the First International Workshop on Learning Tractable Probabilistic Models (LTPM) in 2014. The purpose of this talk would be to give a perspective on knowledge compilation, while highlighting some of its historical milestones and paying particular attention to its recent role in first-order reasoning, probabilistic learning, and probabilistic databases. The goal would be to identify current trends and research problems that could benefit from further contributions by the broader AI community. |
09:00 | From Programs to Program Spaces: Leveraging Machine Learning and Optimisation for Automated Algorithm Design SPEAKER: Holger Hoos ABSTRACT. In recent years, there has been a significant increase in the use of automated algorithm design methods, such as portfolio-based algorithm selection and automated algorithm configuration, across many areas within AI (and beyond). These methods are based on cutting-edge machine learning and optimisation methods and, in many cases, have prompted advances in those areas, as they stimulate the development of new techniques and can be applied to automatically improve machine learning and optimisation methods, as seen from recent successes in hyperparameter optimisation and mixed integer programming. This sets the stage for a paradigm shift in the design of algorithms for computationally challenging problems: Rather than striving for parameter-less (or almost parameter-less) algorithms with robust performance across a broad range of problem instances, algorithm developers are now encouraged to avoid premature commitment during the design process, to actively seek design variants and to expose parameters; those deferred choices will then be made automatically, using automated algorithm design techniques, in such a way that performance is optimised for a specific application context. This is the key idea behind programming by optimisation; it essentially takes us from thinking about programs to opening up large combinatorial spaces of programs. Intriguingly, this paradigm shift is echoed by recent developments in software engineering, which have given rise to theoretical concepts, like variation programming and the choice calculus for variation maintenance, as well as practical approaches, such as adaptation-based programming. The goal of this presentation is to outline these developments and to present a vision of the process of designing solvers for challenging AI problems and, more generally, versatile software for performance-critical tasks, under this new paradigm. This includes mechanisms for the automated generation of effective parallel programs from sequential solvers for the same problem, of portfolio-based algorithm selectors and of on-line parameter control strategies. A second focus will be on the impact this paradigm shift is likely to have on machine learning, search and software engineering, in terms of the development it is likely to spawn in these areas and the advances that will further enable it. Finally, I will discuss how programming by optimisation and related approaches will encourage us to think very differently about the way in which we solve challenging problems, and how, in fact, it can unlock human creativity in problem solving to a much larger degree than is currently the case, as well as provide new scientific insight into what makes problems challenging to solve. |
09:00 | The Conversational User Interface SPEAKER: Ron Kaplan ABSTRACT. Work on both the graphical user interface (GUI) and the conversational user interface (CUI) started at about the same time, about 40 years ago. The GUI was a lot easier to implement, and it made computing and information resources available to ordinary people. But over the years it has lost much of its simplicity and charm. The CUI has taken many more years to develop, requiring major scientific and engineering advances in speech, natural language processing, user-modeling, and reasoning, not to mention increases in cost-effective computation. But the infrastructure is now in place for the creation and widespread distribution of conversational interfaces. This talk describes some natural modes of conversational interaction and some of the supporting technologies that are now under development. |
09:15 | A Parameterized Complexity Analysis of Generalized CP-Nets SPEAKER: Andreas Pfandler ABSTRACT. Generalized CP-nets (GCP-nets) allow a succinct representation of preferences over multi-attribute domains. As a consequence of their succinct representation, many GCP-net related tasks are computationally hard. Even finding the more preferable of two outcomes is PSPACE-complete. In this work, we employ the framework of parameterized complexity to achieve two goals: First, we want to gain a deeper understanding of the complexity of GCP-nets. Second, we search for efficient fixed-parameter tractable algorithms. |
09:30 | Evolutionary Dynamics of Q-Learning over the Sequence Form SPEAKER: unknown ABSTRACT. Multi-agent learning is a challenging open task in artificial intelligence. An interesting connection is known between multi-agent learning algorithms and evolutionary game theory, showing that the learning dynamics of some algorithms can be modeled as replicator dynamics with a mutation term. Inspired by the recent sequence-form replicator dynamics, we develop a new version of the Q-learning algorithm working on the sequence form of an extensive-form game, thus allowing an exponential reduction of the dynamics' length w.r.t. the normal form. The dynamics of the proposed algorithm can be modeled by using the sequence-form replicator dynamics with a mutation term. We show that, although sequence-form and normal-form replicator dynamics are realization equivalent, the Q-learning algorithm applied to the two forms has non-realization-equivalent dynamics. Departing from previous work on evolutionary game theory models for multi-agent learning, we produce an experimental evaluation to show the accuracy of the model. |
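The replicator-with-mutation dynamics mentioned above can be illustrated with a small sketch. This is the generic normal-form dynamic, not the paper's sequence-form variant; the payoff matrix, strategies, and step sizes are illustrative placeholders.

```python
import numpy as np

def replicator_mutation_step(x, y, A, mu=0.01, dt=0.01):
    """One Euler step of replicator dynamics with a uniform mutation term."""
    payoffs = A @ y                      # expected payoff of each pure strategy
    avg = x @ payoffs                    # average payoff under mixed strategy x
    selection = x * (payoffs - avg)      # standard replicator term
    mutation = mu * (1.0 / len(x) - x)   # drift toward the uniform strategy
    return x + dt * (selection + mutation)

# Example: matching pennies for the row player, opponent plays uniformly.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = np.array([0.9, 0.1])
y = np.array([0.5, 0.5])
for _ in range(1000):
    x = replicator_mutation_step(x, y, A)
print(x)  # drifts toward the mixed equilibrium (0.5, 0.5)
```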
09:45 | Two Case Studies for Trading Multiple Indivisible Goods with Indifferences SPEAKER: unknown ABSTRACT. Designing mechanisms that satisfy individual rationality, Pareto efficiency, and strategyproofness is one of the most important problems in mechanism design. In this paper we investigate mechanism design for exchange models where each agent is initially endowed with a set of goods, each agent may have indifferences on distinct bundles of goods, and monetary transfers are not allowed. Sönmez (1999) showed that in such models, those three properties are not compatible in general. The impossibility, however, only holds under an assumption on preference domains. The purpose of this paper is to give a discussion on the compatibility of those three properties when the assumption does not hold. We first establish a preference domain called top-only preferences, which violates the assumption, and develop a class of exchange mechanisms satisfying all those properties. Each mechanism in the class utilizes one instance of mechanisms introduced by Saban and Sethuraman (2013). We also find a class of preference domains called m-chotomous preferences, where the assumption fails and those properties are incompatible. |
10:20 | Encoding Tree Sparsity in Multi-Task Learning: A Probabilistic Framework SPEAKER: unknown ABSTRACT. Multi-task learning seeks to improve the generalization performance by sharing common information among multiple related tasks. A key assumption in most MTL algorithms is that all tasks are related, which, however, may not hold in many real-world applications. Existing techniques, which attempt to address this issue, aim to identify groups of related tasks using group sparsity. In this paper, we propose a probabilistic tree sparsity (PTS) model to utilize the tree structure to obtain the sparse solution instead of the group structure. Specifically, each model coefficient in the learning model is decomposed into a product of multiple component coefficients each of which corresponds to a node in the tree. Based on the decomposition, Gaussian and Cauchy distributions are placed on the component coefficients as priors to restrict the model complexity. We devise an efficient expectation maximization algorithm to learn the model parameters. Experiments conducted on both synthetic and real-world problems show the effectiveness of our model compared with state-of-the-art baselines. |
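A minimal sketch of the coefficient decomposition described above, assuming a toy tree layout; the node names, tree shape, and random component vectors are illustrative, not the authors' model or its Gaussian/Cauchy priors. Tasks that share ancestors share factors, which is how the tree structure induces shared sparsity.

```python
import numpy as np

d = 5  # feature dimension
# A toy tree: root -> {internal1 -> {taskA, taskB}, taskC}
components = {
    "root": np.random.randn(d),
    "internal1": np.random.randn(d),
    "taskA": np.random.randn(d),
    "taskB": np.random.randn(d),
    "taskC": np.random.randn(d),
}
paths = {
    "taskA": ["root", "internal1", "taskA"],
    "taskB": ["root", "internal1", "taskB"],
    "taskC": ["root", "taskC"],
}

def task_weights(task):
    """Task coefficients = elementwise product of components on its path."""
    w = np.ones(d)
    for node in paths[task]:
        w *= components[node]
    return w

print(task_weights("taskA"))  # shares the root and internal1 factors with taskB
```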
10:35 | Labeling Complicated Objects: Multi-View Multi-Instance Multi-Label Learning SPEAKER: unknown ABSTRACT. Multi-Instance Multi-Label (MIML) is a learning framework where an example is associated with multiple labels and represented by a set of feature vectors (multiple instances). In the original formalization of MIML learning, instances come from a single source (single view). To leverage multiple information sources (multi-view), we develop a multi-view MIML framework based on a hierarchical Bayesian network, and present an effective learning algorithm based on variational inference. The model can naturally deal with examples in which some views are absent (partial examples). On multi-view datasets, it is shown that our method is better than other multi-view and single-view approaches, particularly in the presence of partial examples. On single-view benchmarks, extensive evaluation shows that our approach is highly comparable or better than other MIML approaches on labeling examples and instances. Moreover, our method can effectively handle datasets with a large number of labels. |
10:50 | Cross-Domain Metric Learning Based on Information Theory SPEAKER: Wei Wang ABSTRACT. Supervised metric learning plays a substantial role in statistical classification. Conventional metric learning algorithms have limited utility when the training data and the testing data are drawn from related but different domains (i.e., source domain and target domain). Although some progress has been made on this issue in feature-based transfer learning, most of the work in this area suffers from non-trivial optimization and pays little attention to preserving the discriminating information. In this paper, we propose a novel metric learning algorithm to transfer knowledge from the source domain to the target domain in an information-theoretic setting, where a shared Mahalanobis distance across two domains is learnt by combining three goals: 1) reducing the distribution difference between different domains; 2) preserving the geometry of target domain data; 3) aligning the geometry of source domain data with its label information. Based on this combination, the learnt Mahalanobis distance effectively transfers the discriminating power and propagates standard classifiers across two domains. More importantly, our proposed method has a closed-form solution and can be efficiently optimized. Experiments on two real-world applications (i.e., face recognition and text classification) demonstrate the effectiveness and efficiency of our proposed method. |
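For readers unfamiliar with the object being learnt, the sketch below shows how a shared Mahalanobis distance is applied once a matrix M is in hand; M here is a random PSD placeholder, not the output of the paper's information-theoretic objective.

```python
import numpy as np

def mahalanobis(x, y, M):
    """Mahalanobis distance sqrt((x-y)^T M (x-y)) under metric parameter M."""
    d = x - y
    return np.sqrt(d @ M @ d)

rng = np.random.default_rng(0)
L = rng.normal(size=(4, 4))
M = L @ L.T                      # any PSD matrix is a valid metric parameter
x_target = rng.normal(size=4)    # a target-domain test point
x_source = rng.normal(size=4)    # a labelled source-domain point
print(mahalanobis(x_target, x_source, M))
```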
11:05 | HC-Search for Multi-Label Prediction: An Empirical Study SPEAKER: unknown ABSTRACT. Multi-label learning concerns learning multiple, overlapping, and correlated classes. In this paper, we adapt a recent structured prediction framework called HC-Search for multi-label prediction problems. One of the main advantages of this framework is that its training is sensitive to the loss function, unlike the other multi-label approaches that either assume a specific loss function or require a manual adaptation to each loss function. We empirically evaluate our instantiation of the HC-Search framework along with many existing multi-label learning algorithms on a variety of benchmarks by employing diverse task loss functions. Our results demonstrate that the performance of existing algorithms tends to be very similar in most cases, and that the HC-Search approach is comparable and often better than all other algorithms across different loss functions. |
11:20 | Learning Instance Concepts from Multiple-Instance Data with Bags as Distributions SPEAKER: unknown ABSTRACT. We analyze and evaluate a generative process for multiple-instance learning (MIL) in which bags are distributions over instances. We show that our generative process contains as special cases generative models explored in prior work, while excluding scenarios known to be hard for MIL. Further, under the mild assumption that every negative instance is observed with nonzero probability in some negative bag, we show that it is possible to learn concepts that accurately label instances from MI data in this setting. Finally, we show that standard supervised approaches can learn concepts with low area-under-ROC error from MI data in this setting. We validate this surprising result with experiments using several real-world MI datasets that have been annotated with instance labels. |
11:35 | Online Multi-Task Learning via Sparse Dictionary Optimization SPEAKER: unknown ABSTRACT. This paper develops an efficient online algorithm for learning multiple consecutive tasks based on the K-SVD algorithm for sparse dictionary optimization. We first derive a batch multi-task learning method that builds upon K-SVD, and then extend the batch algorithm to train models online in a lifelong learning setting. The resulting method has lower computational complexity than other current lifelong learning algorithms while maintaining nearly identical performance. Additionally, the proposed method offers an alternate formulation for lifelong learning that supports both task and feature similarity matrices. |
10:20 | SenticNet 3: A Common and Common-Sense Knowledge Base for Cognition-Driven Sentiment Analysis SPEAKER: Erik Cambria ABSTRACT. SenticNet is a publicly available semantic and affective resource for concept-level opinion mining and sentiment analysis. Rather than using graph-mining and dimensionality-reduction techniques, SenticNet 3 makes use of "energy flows" to connect various parts of extended common and common-sense knowledge representations to one another. SenticNet 3 models nuanced semantics and sentics (that is, the conceptual and affective information associated with multi-word natural language expressions), representing information with a symbolic opacity intermediate between that of neural networks and of typical symbolic systems. |
10:35 | Improving Domain-independent Cloud-based Speech Recognition with Domain-dependent Phonetic Post-processing SPEAKER: unknown ABSTRACT. Automated speech recognition (ASR) technology has been developed to such a level that off-the-shelf distributed speech recognition services are available, free of cost, that allow researchers to integrate speech into their applications with little development effort or expert knowledge, leading to better results than previously used open-source tools. Often, however, such services do not accept language models or grammars but process free speech from any domain. While results are very good given the enormous size of the search space, they frequently contain out-of-domain words or constructs that cannot be understood by subsequent domain-dependent natural language understanding (NLU) components. In this paper we present a versatile post-processing technique based on phonetic distance that integrates domain knowledge with open-domain ASR results, leading to improved ASR performance. Notably, our technique is able to make use of domain restrictions using various degrees of domain knowledge, ranging from pure vocabulary restrictions via grammars or N-grams to restrictions of the acceptable utterances. We present results for a variety of corpora (mainly from human-robot interaction) where our combined approach significantly outperforms Google ASR as well as a plain open-source ASR solution. |
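A toy sketch of the phonetic-distance idea: snap an out-of-domain ASR hypothesis onto the closest in-domain phrase by edit distance over phoneme strings. The phoneme transcriptions and vocabulary below are hypothetical placeholders; a real system would use a grapheme-to-phoneme converter and the paper's richer domain restrictions.

```python
def edit_distance(a, b):
    """Standard single-row Levenshtein distance over two token sequences."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

# Hypothetical in-domain phrases with placeholder phonetisations.
domain_vocab = {"grasp the ball": "G R AE S P DH AH B AO L",
                "drop the ball":  "D R AA P DH AH B AO L"}
asr_phones = "G R AE S DH AH B AO L"   # phonetisation of ASR output "grass the ball"

best = min(domain_vocab, key=lambda p: edit_distance(
    asr_phones.split(), domain_vocab[p].split()))
print(best)  # -> "grasp the ball" (one phoneme insertion away)
```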
10:50 | Knowledge Graph Embedding by Translating on Hyperplanes SPEAKER: unknown ABSTRACT. We deal with embedding a large scale knowledge graph composed of entities and relations into a continuous vector space. TransE is a promising method proposed recently which is very efficient while achieving state-of-the-art predictive performance. We discuss some mapping properties of relations which should be considered in embedding, such as symmetry, one-to-many, many-to-one, and many-to-many. We point out that TransE does not do well in dealing with these properties. Some complex models are capable of preserving these mapping properties, but at the cost of efficiency. To make a good trade-off between model capacity and efficiency, in this paper we propose a method where a relation is modeled as a hyperplane together with a translation operation on it. In this way, we can well preserve the above mapping properties of relations with almost the same model complexity as TransE. Meanwhile, as a practical knowledge graph is often far from complete, how to construct negative examples to reduce false negative labels in training is very important. Utilizing the one-to-many/many-to-one mapping property of a relation, we propose a simple trick to reduce the possibility of false negative labeling. We conduct extensive experiments on the tasks of link prediction, triplet classification and fact extraction on benchmark data sets from WordNet and Freebase. They show impressive improvements on predictive accuracy and also the capability to scale up. |
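One natural reading of the hyperplane-plus-translation model is the scoring sketch below: project head and tail embeddings onto the relation's hyperplane, translate on the hyperplane, and measure the residual, with lower scores meaning more plausible triples. The vectors are random placeholders rather than trained embeddings.

```python
import numpy as np

def transh_score(h, t, w_r, d_r):
    """Score ||(h - w^T h w) + d_r - (t - w^T t w)||^2 on relation hyperplane."""
    w_r = w_r / np.linalg.norm(w_r)          # unit normal of the hyperplane
    h_perp = h - (w_r @ h) * w_r             # projection of the head entity
    t_perp = t - (w_r @ t) * w_r             # projection of the tail entity
    return np.linalg.norm(h_perp + d_r - t_perp) ** 2

rng = np.random.default_rng(0)
h, t, w_r, d_r = (rng.normal(size=50) for _ in range(4))
print(transh_score(h, t, w_r, d_r))
```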
11:05 | Mind the Gap: Machine Translation by Minimizing the Semantic Gap in Embedding Space SPEAKER: Jiajun Zhang ABSTRACT. Conventional statistical machine translation (SMT) methods perform the decoding process by composing the set of translation rules with the highest probability. However, the probabilities of the translation rules are calculated only according to co-occurrence statistics in the bilingual corpus rather than semantic similarity. In this paper, we propose a Recursive Neural Network (RNN) based model that converts each translation rule into a compact real-valued vector in the semantic embedding space and performs the decoding process by minimizing the semantic gap between the source language string and its translation candidates at each state in a bottom-up structure. The RNN-based translation model is trained using a max-margin objective function. Extensive experiments on Chinese-to-English translation show that our RNN-based model can significantly improve the translation quality by up to 1.68 BLEU score. |
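A toy sketch of the two ingredients named above, recursive composition of child embeddings into a parent and a max-margin ranking loss; the dimensions, left-branching composition order, and random vectors are all illustrative assumptions, not the paper's trained model.

```python
import numpy as np

d = 8
W = np.random.randn(d, 2 * d) * 0.1   # composition matrix
u = np.random.randn(d) * 0.1          # scoring vector

def compose(c1, c2):
    """Recursive composition of two child embeddings into a parent."""
    return np.tanh(W @ np.concatenate([c1, c2]))

def score(children):
    node = children[0]
    for c in children[1:]:
        node = compose(node, c)       # simple left-branching derivation
    return u @ node

ref = [np.random.randn(d) for _ in range(3)]   # reference translation pieces
riv = [np.random.randn(d) for _ in range(3)]   # a rival candidate
margin_loss = max(0.0, 1.0 - score(ref) + score(riv))  # max-margin objective
print(margin_loss)
```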
11:20 | Hybrid Singular Value Thresholding for Tensor Completion SPEAKER: unknown ABSTRACT. In this paper, we study the low-rank tensor completion problem, where a high-order tensor with missing entries is given and the goal is to complete the tensor. We propose to minimize a new convex objective function, based on the log-sum of exponentials of nuclear norms, that promotes the low-rankness of unfolding matrices of the completed tensor. We show for the first time that the proximal operator of this objective function is readily computable through a hybrid singular value thresholding scheme. This leads to a new solution to high-order (low-rank) tensor completion via convex relaxation. We show that this convex relaxation and the resulting solution are much more effective than existing tensor completion methods (including those also based on minimizing ranks of unfolding matrices). The hybrid singular value thresholding scheme can be applied to any problem where the goal is to minimize the maximum rank of a set of low-rank matrices. |
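The single-matrix building block behind such schemes is the classical singular value thresholding (SVT) operator, the proximal operator of the nuclear norm; the paper's hybrid scheme combines such steps across tensor unfoldings, which this sketch does not attempt.

```python
import numpy as np

def svt(X, tau):
    """Prox of tau*||.||_* : soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)        # shrink the spectrum by tau
    return U @ np.diag(s_shrunk) @ Vt

X = np.random.randn(6, 4)
print(np.linalg.matrix_rank(svt(X, tau=1.0)))  # typically lower than rank(X)
```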
11:35 | PREGO: An Action Language for Belief-Based Cognitive Robotics in Continuous Domains SPEAKER: unknown ABSTRACT. Cognitive robotics is often subject to the criticism that the proposals investigated in the literature are far removed from the kind of continuous uncertainty and noise seen in actual real-world robotics. This paper proposes a new language and an implemented system, called PREGO, based on the situation calculus, that is able to reason effectively about degrees of belief against noisy sensors and effectors in continuous domains. It embodies the representational richness of conventional logic-based action languages, such as context-sensitive successor state axioms, but is still shown to be efficient using a number of empirical evaluations. We believe that PREGO is a simple yet powerful dialect to explore real-time reactivity and an interesting bridge between logic and probability for cognitive robotics applications. |
10:20 | Deploying CommunityCommands: A Software Command Recommender System Case Study SPEAKER: Wei Li ABSTRACT. In 2009 we presented the idea of using collaborative filtering within a complex software application to help users learn new and relevant commands (Matejka et al. 2009). This project continued to evolve and we explored the design space of a contextual software command recommender system and completed a four-week user study (Li et al. 2011). We then expanded the scope of our project by implementing CommunityCommands, a fully functional and deployable recommender system. CommunityCommands was released as a publicly available plug-in download for Autodesk’s flagship software application AutoCAD. During a one-year period, the recommender system was used by more than 1100 AutoCAD users. In this paper, we present our system usage data and payoff. We also provide an in-depth discussion of the challenges and design issues associated with developing and deploying the front end AutoCAD plug-in and its back end system. This includes a detailed description of the issues surrounding cold start and privacy. We also discuss how our practical system architecture was designed to leverage Autodesk’s existing Customer Involvement Program (CIP) data to deliver in-product contextual recommendations to end-users. Our work sets important groundwork for the future development of recommender systems within the domain of end-user software learning assistance. |
10:50 | Predictive Models for Determining If and When to Display Online Lead Forms SPEAKER: Tim Chan ABSTRACT. This paper will demonstrate a machine learning application for predicting positive lead conversion events on the Edmunds.com website, an American destination for car shopping. A positive conversion event occurs when a user fills out and submits a lead form interstitial. We used machine learning to identify which users might want to fill out lead forms, and where in their sessions to present the interstitials. There are several factors that make these predictions difficult, such as (a) far more negative than positive responses, (b) seasonality effects due to car sales events near holidays, which require the model to be easily tunable, and (c) the need for computationally fast predictions for real-time decision-making in order to minimize any impact on the website’s usability. Rather than develop a single highly complex model, we used an ensemble of three simple models: Naive Bayes, Markov Chain, and Vowpal Wabbit. The ensemble generated significant lift over random predictions and demonstrated comparable accuracy to an external consulting company’s model. |
11:20 | CiteSeerX: AI in a Digital Library Search Engine SPEAKER: Jian Wu ABSTRACT. CiteSeerX is a digital library search engine that provides access to more than 4 million academic documents with nearly a million users and millions of hits per day. Artificial intelligence (AI) technologies are used in many components of CiteSeerX, e.g. to accurately extract metadata, intelligently crawl the web, and ingest documents. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5–6 years. We also show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines. |
13:00 | AI and AAAI: Fascinating Research and Engaged Community SPEAKER: Manuela Veloso ABSTRACT. AI research and development are currently in great demand. As we experience an enormous increase in a wide variety of cyber-physical-social systems and the goals to make them increasingly intelligent, AI is needed in all its multiple and varied technical facets. The talk will make the case for the need of an increased integration of our worldwide research, development, and organizational efforts and initiatives. Highlights of the past, of the recent present, and proposed directions for the future of AAAI will be presented. The talk will include and acknowledge input from the AAAI office, members, and current and past leadership. |
14:00 | Feature-Cost Sensitive Learning with Submodular Trees of Classifiers SPEAKER: unknown ABSTRACT. During the past decade, machine learning algorithms have become commonplace in large-scale real-world industrial applications. In these settings, the computation time to train and test machine learning algorithms is a key consideration. At training-time the algorithms must scale to very large data set sizes. At testing-time, the cost of feature extraction can dominate the CPU runtime. Recently, a promising method was proposed to account for the feature extraction cost at testing time, called Cost-sensitive Tree of Classifiers (CSTC). Although the CSTC problem is NP-hard, the authors suggest an approximation through a mixed-norm relaxation across many classifiers. This relaxation is slow to train and requires involved optimization hyperparameter tuning. We propose a different relaxation using approximate submodularity, called Approximately Submodular Tree of Classifiers (ASTC). ASTC is much simpler to implement, yields equivalent results but requires no optimization hyperparameter tuning and is up to two orders of magnitude faster to train. |
14:30 | A Computational Challenge Problem in Materials Discovery: Synthetic Problem Generator and Real-World Datasets SPEAKER: unknown ABSTRACT. Newly-discovered materials have been central to recent technological advances. They have contributed significantly to breakthroughs in electronics, renewable energy and green buildings, and overall, have promoted the advancement of global human welfare. Yet, only a fraction of all possible materials have been explored. Accelerating the pace of discovery of materials would foster technological innovations, and would potentially address pressing issues in sustainability, such as energy production or consumption. The bottleneck of this discovery cycle lies, however, in the analysis of the materials data. As materials scientists have recently devised techniques to efficiently create thousands of materials and experimentalists have developed new methods and tools to characterize these materials, the limiting factor has become the data analysis itself. Hence, the goal of this paper is to stimulate the development of new computational techniques for the analysis of materials data, by bringing together the complementary expertise of materials scientists and computer scientists. In collaboration with two major research laboratories in materials science, we provide the first publicly available dataset for the phase map identification problem. In addition, we provide a parameterized synthetic data generator to assess the quality of proposed approaches, as well as tools for data visualization and solution evaluation. |
15:00 | Collaborative Models for Referring Expression Generation in Situated Dialogue SPEAKER: unknown ABSTRACT. In situated dialogue with artificial agents (e.g., robots), although a human and an agent are co-present, the agent's representation and the human's representation of the shared environment are significantly mismatched. Because of this misalignment, previous work has shown that when the agent applies traditional approaches to generate referring expressions to describe target objects, the intended objects often cannot be correctly identified by the human. To address this problem, motivated by collaborative behaviors in human referential communication, we have developed two collaborative models - an episodic model and an installment model - for referring expression generation. In both models, instead of generating a single referring expression to describe a target object as in previous work, the agent generates multiple small expressions that lead to the target object, with the goal of minimizing the collaborative effort. In particular, our installment model incorporates human feedback in a reinforcement learning framework to learn the optimal generation strategies. Our empirical results have shown that the episodic model and the installment model outperform previous non-collaborative models with an absolute gain of 6% and 21% respectively. |
15:30 | On Boosting Sparse Parities SPEAKER: Lev Reyzin ABSTRACT. While the ensemble method of boosting has been extensively studied, considerably less attention has been devoted to the task of designing good weak learning algorithms. In this paper we consider the problem of designing weak learners that are especially suited to the boosting procedure and specifically the AdaBoost algorithm. First we describe conditions desirable for a weak learning algorithm. We then propose using sparse parity functions, which have many of our desired properties, as weak learners in boosting. Our experimental tests show the proposed weak learners to be competitive with the most widely used ones: decision stumps and pruned decision trees. |
16:00 | Regret-Based Multi-Agent Coordination with Uncertain Task Rewards SPEAKER: Feng Wu ABSTRACT. Many multi-agent coordination problems can be represented as distributed constraint optimization problems (DCOPs). Motivated by task allocation in disaster response, we extend standard DCOP models to consider uncertain task rewards where the outcome of completing a task depends on its current state, which is randomly drawn from unknown distributions. The goal of solving this problem is to find a solution for all agents that minimizes the overall worst-case loss. This is a challenging problem for centralized algorithms because the search space grows exponentially with the number of agents, and it is nontrivial for existing algorithms for standard DCOPs. To address this, we propose a novel decentralized algorithm that incorporates Max-Sum with iterative constraint generation to solve the problem by passing messages among agents. By so doing, our approach scales well and can solve instances of the task allocation problem with hundreds of agents and tasks. |
16:30 | How Long Will It Take? Accurate Prediction of Ontology Reasoning Performance SPEAKER: unknown ABSTRACT. For expressive ontology languages such as OWL 2 DL, classification is a computationally expensive task---2NExpTime-complete in the worst case. Hence, it is highly desirable to be able to accurately estimate classification time, especially for large and complex ontologies. Recently, machine learning techniques have been successfully applied to predicting the reasoning hardness category for a given (ontology, reasoner) pair. In this paper, we further develop predictive models to estimate actual classification time using regression techniques, with ontology metrics as features. Our large-scale experiments on 6 state-of-the-art OWL 2 DL reasoners and more than 450 significantly diverse ontologies demonstrate that the prediction models achieve high accuracy, good generalizability and statistical significance. Such prediction models have a wide range of applications. We demonstrate how they can be used to efficiently and accurately identify performance hotspots in a large and complex ontology, an otherwise very time-consuming and resource-intensive task. |
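As an illustration of the regression setup, with random placeholder data and an arbitrary model choice rather than the paper's features or learners, one can regress log classification time on ontology metrics:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))                  # 20 ontology metrics per ontology
y = X[:, 0] * 2 + rng.normal(size=300)          # synthetic log-seconds target

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:250], y[:250])                     # train on 250 ontologies
pred_log_time = model.predict(X[250:])          # predict held-out runtimes
print(np.exp(pred_log_time[:3]))                # back-transform to seconds
```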
17:00 | Avoiding Plagiarism in Markov Sequence Generation SPEAKER: unknown ABSTRACT. Markov processes are widely used to generate sequences that imitate a given style, using random walk. Random walk generates sequences by iteratively concatenating states to prefixes of length equal to or less than the given Markov order. However, at higher orders, Markov chains tend to replicate chunks of the corpus with a size possibly higher than the order, a primary form of plagiarism. In fact, the Markov order defines a maximum length for training but not for generation. In the framework of constraint satisfaction (CSP), we introduce MaxOrder. This global constraint ensures that generated sequences do not include chunks larger than a given maximum order. We exhibit an automaton that recognises the solution set, with a size linear in the size of the corpus. We propose a linear-time procedure to generate this automaton from a corpus and a given max order. We then use this automaton to achieve generalised arc consistency for the MaxOrder constraint, holding on a sequence of size n, in O(n·T) time, where T is the size of the automaton. We illustrate our approach by generating text sequences from text corpora with a maximum order guarantee, effectively controlling plagiarism. |
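The plagiarism property at issue can be made concrete with a naive generate-and-check sketch; unlike the paper's automaton, this version can hit dead ends, which is exactly the problem generalised arc consistency on the MaxOrder constraint solves. The corpus and parameters are toy placeholders.

```python
import random

def build_model(corpus, k):
    """Order-k Markov model: map each k-gram to its observed continuations."""
    model = {}
    for i in range(len(corpus) - k):
        model.setdefault(tuple(corpus[i:i + k]), []).append(corpus[i + k])
    return model

def in_corpus(chunk, corpus):
    return any(corpus[i:i + len(chunk)] == chunk
               for i in range(len(corpus) - len(chunk) + 1))

def generate(corpus, k, max_order, n):
    model = build_model(corpus, k)
    seq = list(corpus[:k])
    while len(seq) < n:
        options = model.get(tuple(seq[-k:]), [])[:]
        random.shuffle(options)
        for nxt in options:
            chunk = (seq + [nxt])[-(max_order + 1):]
            # Reject any step that would copy a corpus chunk of max_order+1 symbols.
            if len(chunk) <= max_order or not in_corpus(chunk, corpus):
                seq.append(nxt)
                break
        else:
            break  # dead end; the MaxOrder automaton avoids these by construction
    return seq

words = "the cat sat on the mat the cat ran on the mat the rat sat on the rug".split()
print(" ".join(generate(words, k=1, max_order=3, n=15)))
```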
17:30 | Fast Multi-Instance Multi-Label Learning SPEAKER: unknown ABSTRACT. In multi-instance multi-label learning (MIML), one object is represented by multiple instances and simultaneously associated with multiple labels. Existing MIML approaches have been found useful in many applications; however, most of them can only handle moderate-sized data. To efficiently handle large data sets, we propose the MIMLfast approach, which first constructs a low-dimensional subspace shared by all labels, and then trains label specific linear models to optimize approximated ranking loss via stochastic gradient descent. Although the MIML problem is complicated, MIMLfast is able to achieve excellent performance by exploiting label relations with shared space and discovering sub-concepts for complicated labels. Experiments show that the performance of MIMLfast is highly competitive with state-of-the-art techniques, whereas its time cost is much less; particularly, on a data set with 30K bags and 270K instances, where none of the existing approaches can return results in 24 hours, MIMLfast takes only 12 minutes. Moreover, our approach is able to identify the most representative instance for each label, thus providing a chance to understand the relation between input patterns and output semantics. |
18:00 | Modeling and Mining Spatiotemporal Patterns of Infection Risk from Heterogeneous Data for Active Surveillance Planning SPEAKER: unknown ABSTRACT. Active surveillance is a desirable way to prevent the spread of infectious diseases in that it aims to discover individual incidences in a timely manner through actively searching for patients. However, in practice active surveillance is difficult to implement, especially when the monitoring space is large but the available resources are limited. Therefore, it is extremely important for public health authorities to know how to distribute their very sparse resources to high-priority regions so as to maximize the outcomes of active surveillance. In this paper, we raise the problem of active surveillance planning and provide an effective method to address it via modeling and mining spatiotemporal patterns of infection risk from heterogeneous data sources. Taking malaria as an example, we perform an empirical study on real-world data to validate our method and provide our new findings. |
18:30 | Scalable Sparse Covariance Estimation via Self-Concordance SPEAKER: unknown ABSTRACT. We consider the class of convex minimization problems composed of a self-concordant function, such as the log-det metric, a convex data fidelity term h(·), and a regularizing, possibly non-smooth, function g(·) with an easily computable proximity operator. These types of problems have recently attracted a great deal of interest, mainly due to their omnipresence in top-notch applications. Under this locally Lipschitz-continuous gradient setting, we analyze the convergence behavior of proximal Newton schemes with the added twist of a probable presence of inexact evaluations, a scenario that has not been considered yet, to the best of our knowledge. By using standard convex tools combined with self-concordance machinery, we provide a concise convergence theory with attractive convergence rate guarantees, and enhance state-of-the-art optimization schemes to accommodate such developments. Experimental results on sparse covariance estimation show the merits of our algorithm, both in terms of recovery efficiency and complexity, rendering the proposed framework a suitable choice for such problems. |
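To fix notation, a simplified instance of this problem class is sparse inverse-covariance estimation, with smooth self-concordant part f(X) = -logdet(X) + tr(CX) and an l1 regularizer whose proximal operator is entrywise soft-thresholding. The sketch below uses a plain proximal-gradient step, not the paper's inexact proximal Newton scheme, purely to expose the structure; the safeguard projection is an added assumption to keep the toy iterate invertible.

```python
import numpy as np

def soft_threshold(X, t):
    """Prox of t*||.||_1 (entrywise)."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def project_pd(X, eps=1e-3):
    """Safeguard: keep the iterate safely positive definite."""
    w, V = np.linalg.eigh((X + X.T) / 2)
    return V @ np.diag(np.maximum(w, eps)) @ V.T

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
C = A @ A.T / 5 + np.eye(5)              # empirical covariance, well conditioned
X, lam, step = np.eye(5), 0.1, 0.05
for _ in range(200):
    grad = C - np.linalg.inv(X)          # gradient of -logdet(X) + tr(CX)
    X = project_pd(soft_threshold(X - step * grad, step * lam))
print(np.round(X, 2))                    # a sparse precision-like estimate
```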
19:00 | Spatio-Temporal Consistency as a Means to Identify Unlabeled Objects in a Continuous Data Field SPEAKER: unknown ABSTRACT. Mesoscale ocean eddies are a critical component of the Earth System as they dominate the ocean's kinetic energy and impact the global distribution of oceanic heat, salinity, momentum, and nutrients. Thus, accurately representing these dynamic features is critical for our planet's sustainability. The majority of methods that identify eddies from satellite observations analyze the data on a frame-by-frame basis despite the fact that eddies are dynamic objects that propagate across space and time. We introduce the notion of spatio-temporal consistency to identify eddies in a continuous spatio-temporal field, ensuring that the features detected are both spatially consistent and temporally persistent. Our spatio-temporal consistency approach allows us to remove most of the expert criteria used in traditional methods and enables us to better render eddy dynamics by identifying smaller and longer lived eddies than existing methods. |
19:30 | Active Learning for Crowdsourcing Using Knowledge Transfer SPEAKER: unknown ABSTRACT. This paper studies the problems associated with active learning, in which multiple users with varying levels of expertise are available for labeling data. Annotations collected from different users may be noisy and unreliable, and the quality of labeled data needs to be maintained for data mining tasks. Previous solutions have included estimating individual user reliability based on existing knowledge in each task, but for this to be effective each task requires a large quantity of labeled data to provide accurate estimates. In practice, annotation budgets for a given target task are limited, so each example can be presented to only a few users, each of whom can only label a few examples. To overcome data scarcity we propose a new probabilistic model that transfers knowledge from abundant unlabeled data in auxiliary domains to help estimate labelers' expertise. Based on this model we present a novel active learning algorithm that: a) simultaneously selects the most informative example and b) queries its label from the labeler with the best expertise. Experiments on both text and image datasets demonstrate that our proposed method outperforms other state-of-the-art active learning methods. |
20:00 | Sequential Click Prediction for Sponsored Search with Recurrent Neural Networks SPEAKER: unknown ABSTRACT. Click prediction is one of the fundamental problems in sponsored search. Most existing studies take advantage of machine learning approaches to predict ad clicks for each ad-view event independently. While these studies aim at providing a stationary interpretation of ad clicks, they lack the capability to understand user clicks in a dynamic way. As observed in a real-world sponsored search system, a user's behavior on an ad depends strongly on how the user previously behaved over time, especially in terms of the queries the user submitted, clicks/non-clicks on ads, dwell time on landing pages, etc. Inspired by these observations, we introduce a novel framework based on Recurrent Neural Networks (RNNs) to model a user's sequential behaviors in the click prediction process. Compared to traditional methods, this framework aims at effective click prediction by leveraging not only the user's stationary historical behaviors but also the rich information and patterns implied by the user's dynamic sequential behaviors. Large-scale evaluations on the click-through logs from a commercial search engine demonstrate that our approach can significantly improve the click prediction accuracy compared to other time-independent approaches. |
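A minimal sketch of the sequential idea, with made-up dimensions, features, and untrained weights: a vanilla RNN carries a hidden state across a user's session events and emits a click probability at each impression.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_h = 16, 32
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))   # event features -> hidden
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))    # hidden -> hidden (the memory)
w_out = rng.normal(scale=0.1, size=d_h)          # hidden -> click logit

session = [rng.normal(size=d_in) for _ in range(5)]  # query/ad/dwell features
h = np.zeros(d_h)
for event in session:
    h = np.tanh(W_xh @ event + W_hh @ h)   # state summarises behaviour so far
    p_click = sigmoid(w_out @ h)           # click probability at this impression
    print(round(float(p_click), 3))
```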
20:30 | Learning Parametric Models for Social Infectivity in Multi-dimensional Hawkes Processes SPEAKER: unknown ABSTRACT. Efficient and effective learning of social infectivity is a critical challenge in modeling diffusion phenomena in social networks and other applications. Existing methods require a substantial amount of event cascades to guarantee learning accuracy, and they consider only time-invariant infectivity. Our paper overcomes both drawbacks by constructing a more compact model and parameterizing the infectivity using time-varying features, dramatically reducing the data requirement and enabling the learning of time-varying infectivity that also takes into account the underlying network topology. We replace the pairwise infectivity in the multidimensional Hawkes processes with linear combinations of those time-varying features, and optimize the associated coefficients with lasso regularization. To efficiently solve the resulting optimization problem, we employ the alternating direction method of multipliers, and within that framework update each coefficient independently by optimizing a surrogate function that upper-bounds the original objective. On both synthetic and real-world data, the proposed method performs better than alternatives in terms of both recovering the hidden diffusion network and predicting the occurrence times of social events. |
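The parameterized intensity described above might be sketched as follows; the exponential kernel, feature function, and all parameter values are illustrative assumptions rather than the paper's specification.

```python
import numpy as np

def intensity(i, t, events, mu, theta, features, omega=1.0):
    """Hawkes intensity for node i at time t.

    events: list of (time, node); features(j, i, t) -> feature vector,
    so the pairwise infectivity a_ji is theta . f_ji(t) instead of a free
    parameter per pair.
    """
    lam = mu[i]                                           # baseline rate
    for t_k, j in events:
        if t_k < t:
            a_ji = max(0.0, theta @ features(j, i, t_k))  # parameterized infectivity
            lam += a_ji * np.exp(-omega * (t - t_k))      # decaying excitation
    return lam

mu = np.array([0.1, 0.2])
theta = np.array([0.5, 0.3])
feats = lambda j, i, t: np.array([1.0, np.sin(t)])        # toy time-varying features
history = [(0.5, 0), (1.2, 1)]
print(intensity(0, 2.0, history, mu, theta, feats))
```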
21:00 | A Convex Formulation for Semi-supervised Multi-Label Feature Selection SPEAKER: unknown ABSTRACT. The explosive growth of multimedia data has brought the challenge of how to efficiently browse, retrieve and organize these data. Under this circumstance, different approaches have been proposed to facilitate multimedia analysis. Several semi-supervised feature selection algorithms have been proposed to exploit both labeled and unlabeled data. However, they are implemented based on graphs, so they cannot handle large-scale datasets. How to conduct semi-supervised feature selection on large-scale datasets has become a challenging research problem. Moreover, existing multi-label feature selection algorithms rely on eigen-decomposition with a heavy computational burden, which further prevents current feature selection algorithms from being applied to big data. In this paper, we propose a novel semi-supervised multi-label feature selection method for large-scale multimedia analysis. We evaluate the performance of the proposed algorithm on five benchmark datasets and compare the results with state-of-the-art supervised and semi-supervised feature selection algorithms as well as a baseline using all features. The experimental results demonstrate that our proposed algorithm consistently achieves superior performance. |
21:30 | TacTex’13: A Champion Adaptive Power Trading Agent SPEAKER: unknown ABSTRACT. Sustainable energy systems of the future will no longer be able to rely on the current paradigm that energy supply follows demand. Many of the renewable energy resources do not necessarily produce the energy when it is needed, and therefore there is a need for new market structures that motivate sustainable behaviors by participants. The Power Trading Agent Competition (Power TAC) is a new annual competition that focuses on the design and operation of future retail power markets, specifically in smart grid environments with renewable energy production, smart metering, and autonomous agents acting on behalf of customers and retailers. It uses a rich, open-source simulation platform that is based on real-world data and state-of-the-art customer models. Its purpose is to help researchers understand the dynamics of customer and retailer decision-making, as well as the robustness of proposed market designs. This paper introduces TacTex'13, the champion agent from the inaugural competition in 2013. TacTex'13 is an adaptive agent that learns and reacts to the environment in which it operates, relying heavily on reinforcement learning and prediction methods. This paper describes the constituent components of our agent and examines the success of the complete agent through analysis of competition results and subsequent controlled experiments. |
14:00 | Emerging: A Smart Range Helping Cognitively-Impaired Persons Cooking SPEAKER: Bruno Bouchard ABSTRACT. People suffering from a loss of autonomy caused by a cognitive deficit generally have to perform important daily tasks (such as cooking) using devices and appliances designed for healthy people, which do not take their cognitive impairment into consideration. Using these devices is risky and may lead to a tragedy (e.g. fire). A potential solution to this issue is to provide automated systems which perform tasks on behalf of the patient. However, clinical studies have shown that encouraging users to maintain their autonomy greatly helps to preserve health, dignity, and motivation. Therefore, we present in this paper a new smart range prototype that can monitor and guide a cognitively-impaired user in the activity of preparing a meal. This new original prototype is capable of giving adapted prompting to the user in the completion of several recipes by exploiting load cells, heat sensors and electromagnetic contacts embedded in the range. We currently hold a provisional patent on this new invention, and we have completed a first experimental phase. |
14:20 | Emerging: A Speech-Driven Second Screen Application for TV Program Discovery SPEAKER: Peter Z. Yeh ABSTRACT. In this paper, we present a speech-driven second screen application for TV program discovery. We give an overview of the application and its architecture. We also present a user study along with a failure analysis. The results from the study are encouraging, and demonstrate our application's effectiveness in the target domain. We conclude with a discussion of follow-on efforts to further enhance our application. |
14:40 | Emerging: Clustering Species Accumulation Curves to Identify Skill Levels of Citizen Scientists Participating in the eBird Project SPEAKER: Jun Yu ABSTRACT. Although citizen science projects such as eBird can compile large volumes of data over broad spatial and temporal extents, the quality of this data is a concern due to differences in the skills of volunteers at identifying bird species. Species accumulation curves, which plot the number of unique species observed over time, are an effective way to quantify the skill level of an eBird participant. Intuitively, more skilled observers can identify a greater number of species per unit time than inexperienced birders, resulting in a steeper curve. We propose a mixture model for clustering species accumulation curves. These clusters enable the identification of distinct skill levels of eBird participants, which can then be used to build more accurate species distribution models and to develop automated data quality filters. |
15:00 | Emerging: Pattern Discovery in Protein Networks Reveals High-Confidence Predictions of Novel Interactions SPEAKER: Hazem Radwan Ahmed ABSTRACT. Pattern discovery in protein interaction networks can reveal crucial biological knowledge on the inner workings of cellular machinery. Although far from complete, extracting meaningful patterns from proteomic networks is a non-trivial task due to their size-complexity. This paper proposes a computational framework to efficiently discover topologically-similar patterns from large proteomic networks using Particle Swarm Optimization (PSO). PSO is a robust and low-cost optimization technique that has been demonstrated to work effectively on the complex, mostly sparse proteomic networks. The resulting topologically-similar patterns of close proximity are utilized to systematically predict new high-confidence protein-protein interactions (PPIs). The proposed PSO-based PPI prediction method (3PI) managed to predict high-confidence PPIs, validated by more than one computational/experimental source, through a proposed PPI knowledge-transfer process between topologically-similar interaction patterns of close proximity. In three case studies, over 50% of the predicted interactions for EGFR, ERBB2, ERBB3, GRB2 and UBC overlap with publicly available interaction databases, ~80% of the predictions are found among the top 1% of results of another PPI prediction method, and their genes are significantly co-expressed across different tissues. Moreover, the only prediction that did not overlap with any of our validation sources was recently experimentally supported by two PubMed publications. |
14:45 | Predicting the Hardness of Learning Bayesian Networks SPEAKER: Brandon Malone ABSTRACT. There are various algorithms for finding a Bayesian network structure (BNS) that is optimal with respect to a given scoring function. No single algorithm dominates the others in speed, and given a problem instance, it is a priori unclear which algorithm will perform best and how fast it will solve the problem. Estimating the running times directly is extremely difficult as they are complicated functions of the instance. The main contribution of this paper is a characterization of the empirical hardness of an instance for a given algorithm, based on a novel collection of non-trivial yet efficiently computable features. Our empirical results, based on the largest evaluation of state-of-the-art BNS learning algorithms to date, demonstrate that we can predict the runtimes to a reasonable degree of accuracy, and effectively select algorithms that perform well on a particular instance. Moreover, we also show how the results can be utilized in building a portfolio algorithm that combines several individual algorithms in an almost optimal manner. |
15:00 | Relational One-Class Classification: A Non-Parametric Approach SPEAKER: unknown ABSTRACT. One-class classification approaches have been proposed in the literature to learn classifiers from examples of only one class. But these approaches are not directly applicable to relational domains due to their reliance on feature vectors or a distance measure. We propose a non-parametric relational one-class classification approach based on first-order trees. We learn a tree-based distance measure that iteratively introduces new relational features to differentiate relational examples, and we update the distance measure so as to maximize the one-class classification performance of our model. We also relate our model definition to existing work on combination functions and density estimation. Finally, we experimentally show that our approach can discover relevant features for this task and outperform three baseline approaches. |
15:15 | User Group Oriented Temporal Dynamics Exploration SPEAKER: unknown ABSTRACT. Dynamic online content has become the zeitgeist, and vibrant user groups are essential participants and promoters behind it. Studying temporal dynamics is a valuable means of uncovering group characteristics. Since different user groups usually have highly diverse interests and show distinct behavior patterns, they exhibit varying temporal dynamics. However, most current work uses only the global trends of temporal topics and fails to distinguish such fine-grained patterns across groups. In this paper, we propose GrosToT (Group Specific Topics-over-Time), a unified probabilistic model which infers latent user groups and temporal topics. It can model group-specific temporal topic variation from social media streams. By leveraging these comprehensive group-specific temporal patterns, GrosToT significantly outperforms state-of-the-art dynamics modeling methods. Our proposed approach shows advantages not only in temporal dynamics modeling but also in group content exploration. Based on GrosToT, we uncover the interplay between group interest and temporal dynamics. Specifically, we find that a group's attention to topics of medium interest is event-driven, showing rich bursts, while its engagement with the group's dominating topics is interest-driven, remaining stable over time. The dynamics of different groups vary and reflect the groups' intentions. |
16:00 | Panel: From Research to Deployed Applications: Crossing the Valley of Death SPEAKER: Craig Knoblock |
16:15 | Compact Aspect Embedding For Diversified Query Expansion SPEAKER: unknown ABSTRACT. Diversified query expansion (DQE) approaches aim to select a set of expansion terms with little redundancy among them while covering as many query aspects as possible; they have recently demonstrated their effectiveness experimentally for the task of search result diversification. One challenge faced by existing DQE approaches is ensuring aspect coverage. In this paper, we propose a novel DQE method, called compact aspect embedding, which exploits trace-norm regularization to learn a low-rank vector space for the query, with each eigenvector of the learnt space representing an aspect and the absolute value of its corresponding eigenvalue representing the strength of that aspect's association with the query. Each expansion term is mapped into the same vector space. Based on this novel representation of query aspects and expansion terms, we design a greedy selection strategy that chooses a set of expansion terms to explicitly cover all possible aspects of the query. We test our method on several TREC diversification data sets and show that it significantly outperforms the state-of-the-art approaches. |
16:15 | Learning Scripts as Hidden Markov Models SPEAKER: unknown ABSTRACT. Scripts have been proposed to model the stereotypical event sequences found in narratives. They can be applied to make a variety of inferences including filling gaps in the narratives and resolving ambiguous references. This paper proposes the first formal framework for scripts based on Hidden Markov Models (HMMs). Our framework supports robust inference and learning algorithms, which are lacking in previous clustering models. We develop an algorithm for structure and parameter learning based on Expectation Maximization and evaluate it on a number of natural and synthetic datasets. The results show that our algorithm is superior to several informed baselines for predicting future events given some past history. |
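To make the script-as-HMM idea concrete, the sketch below scores candidate next events given a partial event history using the forward algorithm on an already-learned HMM. The transition matrix A, emission matrix B, and initial distribution pi are made-up toy numbers; the paper learns both structure and parameters with Expectation Maximization.

    import numpy as np

    def forward(A, B, pi, obs):
        # Forward algorithm: state belief after observing the event indices in obs.
        alpha = pi * B[:, obs[0]]
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
        return alpha / alpha.sum()

    def next_event_distribution(A, B, pi, history):
        # P(next event | history): propagate the belief one step, then emit.
        belief = forward(A, B, pi, history)
        return (belief @ A) @ B

    # Toy script model: 3 latent states, 4 observable event types.
    A = np.array([[0.8, 0.2, 0.0], [0.0, 0.7, 0.3], [0.1, 0.0, 0.9]])
    B = np.array([[0.9, 0.1, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0], [0.0, 0.0, 0.2, 0.8]])
    pi = np.array([1.0, 0.0, 0.0])
    print(next_event_distribution(A, B, pi, history=[0, 1]))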
16:15 | Using Response Functions to Measure Strategy Strength SPEAKER: unknown ABSTRACT. Extensive-form games are a powerful tool for representing complex multi-agent interactions. Nash equilibrium strategies are commonly used as a solution concept for extensive-form games, but many games are too large for the computation of Nash equilibria to be tractable. In these large games, exploitability has traditionally been used to measure deviation from Nash equilibrium, and strategies are therefore designed to achieve minimal exploitability. However, while exploitability measures a strategy's worst-case performance, it fails to capture how likely that worst case is to be observed in practice. In fact, empirical evidence has shown that a less exploitable strategy can perform worse than a more exploitable strategy in one-on-one play against a variety of opponents. In this work, we propose a class of response functions that can be used to measure the strength of a strategy. We prove that standard no-regret algorithms can be used to learn optimal strategies for a scenario where the opponent uses one of these response functions. We demonstrate the effectiveness of this technique in Leduc poker against opponents that use the UCT Monte Carlo tree search algorithm. |
16:15 | Relaxation Search: a Simple Way of Managing Optional Clauses SPEAKER: unknown ABSTRACT. A number of problems involve managing a set of optional clauses. For example, the soft clauses in a MaxSat formula are optional---they can be falsified for a cost. Similarly, when computing a Minimum Correction Set for an unsatisfiable formula, all clauses are optional---some can be falsified in order to make the remainder satisfiable. In both cases the task is to find a subset of the optional clauses that achieves some optimization criterion and is satisfiable. Relaxation search is a simple method of using a standard SAT solver to solve this task. It is very easy to implement, sometimes requiring only a simple modification of the variable-selection heuristic in the SAT solver; furthermore, it offers considerable flexibility and control over the order in which subsets of the optional clauses are examined. We demonstrate how relaxation search can be used to solve MaxSat and to compute Minimum Correction Sets. In both cases relaxation search achieves state-of-the-art performance and solves some instances other solvers cannot. |
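The relaxation encoding itself is easy to state: each optional clause C_i gets a fresh relaxation variable r_i and is added as (r_i OR C_i), so forcing r_i false insists on the clause while letting r_i be true relaxes it. The sketch below, which assumes the python-sat package, uses solver assumptions on the r_i to stand in for the paper's modified variable-selection heuristic; the fixed relaxation order is illustrative, not an optimal MaxSat search.

    from pysat.solvers import Glucose3

    hard = [[1, 2], [-1, 3]]          # must hold
    soft = [[2], [-3], [1, -2]]       # optional, falsifiable for a cost

    solver = Glucose3()
    for c in hard:
        solver.add_clause(c)

    relax, next_var = [], 4           # fresh relaxation variable per soft clause
    for c in soft:
        relax.append(next_var)
        solver.add_clause(c + [next_var])  # r_i OR C_i
        next_var += 1

    # Insist on all soft clauses first; relax them one at a time until SAT.
    for k in range(len(soft) + 1):
        assumptions = [-r for r in relax[:len(soft) - k]]
        if solver.solve(assumptions=assumptions):
            print("satisfiable while insisting on", len(soft) - k, "soft clauses")
            print(solver.get_model())
            break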
16:15 | Towards Scalable Exploration of Diagnoses in an Ontology Stream SPEAKER: Freddy Lecue ABSTRACT. Diagnosis, the process of identifying the nature and cause of an anomaly in an ontology, has been studied extensively by the Semantic Web community. In the context of ontology streams, diagnosis results are captured not by a unique fixed ontology but by numerous time-evolving ontologies. Thus any anomaly can be diagnosed by a large number of different explanations depending on the version and evolution of the ontology. We address the problems of identifying, representing, exploiting and exploring the evolution of diagnosis representations. Our approach consists of a graph-based representation which aims at (i) efficiently organizing and linking time-evolving diagnoses and (ii) supporting scalable exploration. Experiments demonstrate scalable exploration of diagnoses on real, live data from Dublin City. |
16:15 | Calibration-Free BCI Based Control SPEAKER: unknown ABSTRACT. Recent work has shown that it is possible to extract feedback information from EEG measurements of brain activity, such as error potentials, and use it to solve sequential tasks. As with most brain-computer interfaces, a calibration phase is normally required to build a decoder that translates raw EEG signals into understandable feedback signals. This paper proposes a method for solving sequential tasks based on feedback extracted from the brain without any calibration. The proposed method uses optimal policies to hallucinate the meaning of the EEG signals and selects the target with the lowest expected error. We also use task and symbol uncertainty as an exploration bonus in an active strategy to speed up learning. We report online experiments in which four users directly controlled an agent on a 2D grid world to reach a target without any prior calibration process. |
16:15 | Managing Change in Graph-structured Data Using Description Logics SPEAKER: unknown ABSTRACT. In this paper we consider the setting of graph-structured data that evolves as a result of operations carried out by users or applications. We study different reasoning problems, which range from ensuring the satisfaction of a given set of integrity constraints after a given sequence of updates, to deciding the (non-)existence of a sequence of actions that would take the data to an (un)desirable state, starting either from a specific data instance or from an incomplete description of it. We consider a simple action language in which actions are finite sequences of insertions and deletions of nodes and labels, and use Description Logics for describing integrity constraints and (partial) states of the data. We then formalize the data management problems mentioned above as a static verification problem and several planning problems. We provide algorithms and tight complexity bounds for the formalized problems, both for an expressive DL and for a variant of DL-Lite. |
16:15 | Joule Counting Correction for Electric Vehicles using Artificial Neural Nets SPEAKER: Michael Taylor ABSTRACT. Estimating the remaining energy in high-capacity electric vehicle batteries is essential to safe and efficient operation. Accurate estimation remains a major challenge, however, because battery state cannot be observed directly. In this paper, we demonstrate a method for estimating a battery's remaining energy using real data collected from the Charge Car electric vehicle. The method relies on energy integration as an initial estimation step, which is then corrected using a neural net that learns how error accumulates over recent charge/discharge cycles. In this way, the algorithm is able to adapt to nonlinearities and variations that are difficult to model or characterize. On the collected dataset, the method is accurate to within 2.5% to 5% of the battery's remaining energy, which equates to approximately 1 to 2 miles of residual range for the Charge Car and its 10 kWh battery pack. |
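The two-stage estimator described here, integrate measured power for a raw energy estimate and then learn a correction for the accumulated error, can be sketched as follows. Everything below is an assumption for illustration: the signal names, the three cycle features, the synthetic training data, and the tiny multi-layer perceptron standing in for the paper's network.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def joules_used(voltage, current, dt):
        # Raw joule counting: integrate instantaneous power V * I over time.
        return float(np.sum(voltage * current * dt))

    # Train a corrector on synthetic data: cycle features -> relative error of
    # the integrated estimate (features might be mean current, temperature,
    # depth of discharge -- all assumed here).
    X_train = np.random.rand(200, 3)
    err_train = 0.05 * X_train[:, 0] + 0.02 * np.random.randn(200)
    corrector = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    corrector.fit(X_train, err_train)

    def remaining_energy(capacity_j, voltage, current, dt, cycle_features):
        raw = joules_used(voltage, current, dt)
        rel_err = corrector.predict(cycle_features.reshape(1, -1))[0]
        return capacity_j - raw * (1.0 + rel_err)  # corrected estimate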
16:15 | Solving Zero-Sum Security Games in Discretized Spatio-Temporal Domains SPEAKER: unknown ABSTRACT. Security games model the problem of allocating multiple resources to defend multiple targets. In such settings, the defender's action space is usually exponential in the input size, so known general-purpose linear or mixed-integer program formulations scale exponentially. One method that has been widely deployed to address this computational issue is to instead compute the marginal probabilities, of which there are only polynomially many. In this paper, we address a class of problems that cannot be handled by previous approaches based on marginal probabilities: security games in discretized spatio-temporal domains, in which the schedule set is so large that even the marginal-probability formulation has exponential size. We develop novel algorithms under an oracle-based algorithmic framework and show that this framework allows us to efficiently compute Stackelberg mixed strategies whenever the problem admits a polynomial-time oracle. For cases in which efficient oracles are difficult to find, we propose new direct algorithms or prove hardness results. All our algorithms are evaluated in experiments with realistic and artificial data. |
16:15 | Robust Non-Negative Dictionary Learning SPEAKER: unknown ABSTRACT. Dictionary learning plays an important role in machine learning, where data vectors are modeled as sparse linear combinations of basis factors (i.e., a dictionary). However, how to conduct dictionary learning in noisy environments has not been well studied. Moreover, in practice the dictionary (i.e., the low-rank approximation of the data matrix) and the sparse representations are often required to be non-negative, as in applications such as image annotation, document summarization, and microarray analysis. In this paper, we propose a new formulation for non-negative dictionary learning in noisy environments, in which structured sparsity is enforced on the sparse representation. The new formulation is also robust to data with noise and outliers, owing to the robust loss function used. We derive an efficient multiplicative updating algorithm to solve the optimization problem, in which the dictionary and the sparse representation are updated iteratively, and we rigorously prove its convergence and correctness. We show how the dictionary differs at different levels of the sparsity constraint. The proposed algorithm can be adapted for data clustering and semi-supervised learning purposes. Promising results in extensive experiments validate the effectiveness of the proposed approach. |
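For readers unfamiliar with multiplicative updates, the classic Lee-Seung NMF iteration below shows the style of update the paper builds on. Note this sketch is plain non-negative matrix factorization under squared loss; the paper's actual algorithm additionally incorporates a robust loss and structured sparsity, which are omitted here.

    import numpy as np

    def nmf_multiplicative(V, k, iters=200, eps=1e-9):
        # Factor a non-negative matrix V (n x m) as W @ H with W, H >= 0.
        n, m = V.shape
        rng = np.random.default_rng(0)
        W, H = rng.random((n, k)), rng.random((k, m))
        for _ in range(iters):
            # Ratios of non-negative matrices keep the factors non-negative.
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        return W, H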
16:15 | Doubly Regularized Portfolio with Risk Minimization SPEAKER: unknown ABSTRACT. Owing to recent empirical successes, machine learning algorithms have drawn considerable attention and are becoming important analysis tools in the financial industry. In particular, portfolio management, the core engine of many financial services such as private wealth and pension fund management, calls for the application of such novel algorithms. Most portfolio allocation strategies do not account for costs arising from market frictions such as transaction costs and capital gains taxes, as the complexity of sensible cost models often renders the induced problem intractable. In this paper, we propose a doubly regularized sparse and consistent portfolio that provides a modest but effective solution to this difficulty. Specifically, since trading costs of all kinds primarily stem from large transaction volumes, we reduce volume by synergistically combining two penalty terms with classic risk-minimization models to ensure that (1) only a small set of assets is selected for investment in each period, and (2) portfolios in subsequent trading periods are similar. To assess the new portfolio, we apply standard evaluation criteria and conduct extensive experiments on well-known benchmarks and market datasets. Compared to various state-of-the-art portfolios, the proposed portfolio demonstrates superior performance, achieving both higher risk-adjusted returns and dramatically decreased transaction volumes. |
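A plausible form of the doubly regularized objective, written here as an assumption since the abstract does not give the exact formulation, combines a classic risk term with the two penalties it describes:

    $\min_{w_t} \; w_t^\top \Sigma_t w_t - \gamma\, \mu_t^\top w_t + \lambda_1 \|w_t\|_1 + \lambda_2 \|w_t - w_{t-1}\|_1$

where $w_t$ is the portfolio at period $t$ and $\Sigma_t$, $\mu_t$ are the estimated return covariance and mean. The $\lambda_1$ term selects only a few assets per period, and the $\lambda_2$ term keeps consecutive portfolios similar, which is what suppresses transaction volume.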
16:15 | Mixing-time Regularized Policy Gradient SPEAKER: unknown ABSTRACT. Policy gradient reinforcement learning (PGRL) methods have received substantial attention as a means of seeking stochastic policies that maximize cumulative reward. However, PGRL methods can take a huge number of learning steps before finding a reasonable stochastic policy. This learning speed depends on the mixing time of the Markov chains induced by the policies that PGRL explores. In this paper, we give a new PGRL approach that regularizes the policy-update rule with the hitting time, which bounds the mixing time. In particular, we propose hitting-time regressions based on temporal-difference learning. The proposed approach keeps the Markov chain compact and can improve learning efficiency. Numerical experiments show the proposed method outperforms conventional policy gradient methods. |
16:15 | Generalized Higher-Order Tensor Decomposition via Parallel ADMM SPEAKER: unknown ABSTRACT. Higher-order tensors are becoming prevalent in many scientific areas such as computer vision, social network analysis, data mining and neuroscience. Traditional tensor decomposition approaches face three major challenges: model selection, gross corruptions, and computational efficiency. To address these problems, we first propose a parallel trace-norm-regularized tensor decomposition method and formulate it as a convex optimization problem. This method does not require the rank of each mode to be specified beforehand, and can automatically determine the number of factors in each mode through our optimization scheme. By considering the low-rank structure of the observed tensor, we analyze the equivalence of the trace norm between a low-rank tensor and its core tensor. We then cast a non-convex tensor decomposition model as a weighted combination of multiple much smaller-scale matrix trace-norm minimization problems. Finally, we develop two parallel alternating direction methods of multipliers (ADMM) to solve the proposed problems. Experimental results verify that our regularized formulation is reasonable and that our methods are very robust to noise and outliers. |
16:15 | Towards Topological-transformation Robust Shape Comparison: A Sparse Representation Based Manifold Embedding Approach SPEAKER: unknown ABSTRACT. Non-rigid shape comparison based on manifold embedding using Generalized Multidimensional Scaling (GMDS) has attracted a lot of attention for its high accuracy. However, this method requires that the shape surface not be elastic; in other words, it is sensitive to topological transformations such as stretching and compressing. To tackle this problem, we propose a new approach that constructs a high-dimensional space in which to embed the manifolds of shapes, one that completely withstands rigid transformations and considerably tolerates topological transformations. Experiments on TOSCA shapes validate the proposed approach. |
16:15 | Propagating Regular Counting Constraints SPEAKER: unknown ABSTRACT. Constraints over finite sequences of variables are ubiquitous in sequencing and timetabling. This has led to general modeling techniques and generic propagators, often based on deterministic finite automata (DFA) and their extensions. We consider counter-DFAs (cDFA), which provide concise models for regular counting constraints, that is, constraints on the number of times a regular-language pattern occurs in a sequence. We show how to enforce domain consistency in polynomial time for atmost and atleast regular counting constraints, based on the frequent case of a cDFA with only accepting states and a single counter that can be increased by transitions. We also show that the satisfaction of exact regular counting constraints is NP-hard, and that an incomplete propagator for exact regular counting constraints is faster and provides more pruning than the existing propagator of (Beldiceanu, Carlsson, and Petit 2004). Finally, by avoiding the unrolling of the cDFA used by COSTREGULAR, the space complexity is reduced from O(n|\Sigma||Q|) to O(n(|\Sigma|+|Q|)), where \Sigma is the alphabet and Q the state set of the cDFA. |
16:15 | Manifold Spanning Graphs SPEAKER: unknown ABSTRACT. Graph construction is the essential first step for nearly all manifold learning algorithms. While many applications assume that a simple k-nearest or epsilon-close neighbors graph will accurately model the topology of the underlying manifold, these methods often require expert tuning and may not produce high-quality graphs. In this paper, the hyperparameter sensitivity of existing graph construction methods is demonstrated. We then present a new algorithm for unsupervised graph construction, based on minimal assumptions about the input data and its manifold structure. Notably, this method requires no hyperparameter tuning. |
16:15 | Tightening Bounds for Bayesian Network Structure Learning SPEAKER: unknown ABSTRACT. A recent breadth-first branch and bound algorithm (BFBnB) for learning Bayesian network structures (Malone et al. 2011) uses two bounds to prune the search space for better efficiency; one is a lower bound calculated from pattern database heuristics, and the other is an upper bound obtained by a hill climbing search. Whenever the lower bound of a search path exceeds its upper bound, the path is guaranteed to lead to suboptimal solutions and is discarded immediately. This paper introduces methods for tightening the bounds. The lower bound is tightened by using more informed variable groupings in creating the pattern databases, and the upper bound is tightened using an anytime learning algorithm. Empirical results show that these bounds improve the efficiency of Bayesian network learning by two to three orders of magnitude. |
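The pruning rule being tightened is the standard branch-and-bound test: discard any node whose lower bound meets or exceeds the current upper bound. A skeletal sketch follows; the node representation and the bound functions are placeholders, and this generic best-first loop stands in for the paper's BFBnB with pattern-database heuristics.

    import heapq
    import itertools

    def branch_and_bound(start, is_goal, children, cost, lower_bound, incumbent):
        best = incumbent                  # upper bound, e.g. from hill climbing
        tie = itertools.count()           # tiebreaker so the heap never compares nodes
        frontier = [(lower_bound(start), next(tie), start)]
        while frontier:
            lb, _, node = heapq.heappop(frontier)
            if lb >= best:
                continue                  # provably suboptimal path: prune it
            if is_goal(node):
                best = min(best, cost(node))
                continue
            for child in children(node):
                clb = lower_bound(child)
                if clb < best:            # prune at generation time as well
                    heapq.heappush(frontier, (clb, next(tie), child))
        return best

Tightening either bound, a higher lower_bound or a lower incumbent, makes the lb >= best test fire earlier, which is where the paper's two to three orders of magnitude of speedup come from.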
16:15 | Incentivizing High-Quality Content from Heterogeneous Users: On the Existence of Nash Equilibrium SPEAKER: unknown ABSTRACT. In this paper, we study the existence of pure Nash equilibria (PNE) for the mechanisms used by Internet services (e.g., online review and question-answer websites) to incentivize users to generate high-quality content. Most existing work assumes that users are homogeneous and have the same ability. However, real-world users are heterogeneous, and their abilities can differ greatly owing to their diverse backgrounds, cultures, and professions. In this work, we consider heterogeneous users within the following framework: (1) the users are heterogeneous, and each has a private type indicating the best quality of content she can generate; (2) there is a fixed amount of reward to allocate among the participating users. Under this framework, we study the existence of pure Nash equilibria for several mechanisms composed of different allocation rules, action spaces, and information settings. We prove the existence of a PNE for some mechanisms and its non-existence for others. We also discuss how to find a PNE for the mechanisms that admit one, either constructively or through a search algorithm. |
16:15 | Voting with Rank Dependent Scoring Rules SPEAKER: unknown ABSTRACT. Positional scoring rules in voting compute the score of an alternative by summing the scores for the alternative induced by every vote. This summation principle ensures that all votes contribute equally to the score of an alternative. We relax this assumption and, instead, aggregate scores by taking into account the rank of a score in the ordered list of scores obtained from the votes. This defines a new family of voting rules, rank-dependent scoring rules (RDSRs), based on ordered weighted average (OWA) operators, which include scoring rules, plurality, k-approval, and Olympic averages. We study some properties of these rules, and show, empirically, that certain RDSRs are less manipulable than Borda voting, across a variety of statistical cultures. |
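An RDSR is simple to compute: collect the scores an alternative receives across votes, sort them, and take an ordered weighted average. A minimal sketch (the score profile and weight vectors are illustrative):

    def rdsr_score(scores_from_votes, owa_weights):
        # Rank-dependent score: OWA over the per-vote scores sorted in
        # decreasing order, so weights attach to ranks rather than to votes.
        ordered = sorted(scores_from_votes, reverse=True)
        return sum(w * s for w, s in zip(owa_weights, ordered))

    # Borda-style scores one alternative received from 4 votes (made-up).
    scores = [3, 1, 2, 0]
    uniform = [0.25, 0.25, 0.25, 0.25]  # recovers the ordinary scoring rule
    olympic = [0.0, 0.5, 0.5, 0.0]      # drops the best and worst scores
    print(rdsr_score(scores, uniform), rdsr_score(scores, olympic))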
16:15 | Latent Low-Rank Transfer Subspace Learning for Missing Modality Recognition SPEAKER: unknown ABSTRACT. In this paper we consider the interesting problem of using transfer learning in two directions to compensate for missing knowledge in the target domain. Transfer learning is usually exploited as a powerful tool to mitigate the discrepancy between different databases for knowledge transfer; it can also be used for knowledge transfer between different modalities within one database. In either case, however, transfer learning will fail if the target data are missing. To overcome this, we consider knowledge transfer between different databases and modalities simultaneously in a single framework, in which missing target data from one database are recovered to facilitate the recognition task. We call this framework the Latent Low-rank Bi-directional Transfer Subspace Learning method (L2BTSL). First, we propose to use a low-rank constraint as well as dictionary learning in a learned subspace to guide knowledge transfer between and within different databases. Second, a latent factor is introduced to uncover the underlying structure of the missing target data. Third, bi-directional transfer learning is proposed to integrate an auxiliary database for transfer learning with missing target data. Experimental results on multi-modality knowledge transfer with missing target data demonstrate that our method can successfully inherit knowledge from the auxiliary database to complete the target domain, and therefore enhances performance when recognizing data from a modality without any training data. |
16:15 | Diagnosing Analogue Linear Systems Using Dynamic Topological Reconfiguration SPEAKER: unknown ABSTRACT. Fault diagnosis of analogue linear systems is a challenging task, and no fully automated solution exists. Two challenges in this diagnosis task are the size of the search space that must be explored and the possibility of simulation instabilities introduced by particular fault classes. We study a novel algorithm that addresses both problems by dynamically modifying the simulation model during diagnosis, pruning parametrized components that cause discontinuities in the model. We provide a theoretical framework for predicting the speedups, which depend on the topology of the model, and we empirically validate the theoretical predictions through extensive experimentation on a benchmark of circuits. |
16:15 | Sparse Compositional Metric Learning SPEAKER: unknown ABSTRACT. We propose a new approach for metric learning by framing it as learning a sparse combination of locally discriminative metrics that are inexpensive to generate from the training data. This flexible framework allows us to naturally derive formulations for global, multi-task and local metric learning. These new algorithms have several advantages over existing methods in the literature: a much smaller number of parameters to be estimated and a principled way to generalize learned metrics to new testing data points. To analyze the approach theoretically, we derive a generalization bound that justifies the sparse combination. Empirically, we evaluate our algorithms on several datasets against state-of-the-art metric learning methods. The results are consistent with our theoretical findings and demonstrate the superiority of our approach in terms of classification performance and scalability. |
16:15 | Role-Aware Conformity Influence Modeling and Analysis in Social Networks SPEAKER: unknown ABSTRACT. Conformity is the inclination of a person to be influenced by others. In this paper, we study how the conformity tendency of a person changes with her role, as defined by her structural properties in a social network. We first formalize conformity using a utility function based on the conformity theory from social psychology, and validate the proposed utility function by proving the existence of Nash Equilibria when all users in a network behave according to it. We then extend and incorporate the utility function into a probabilistic topic model, called the Role-Conformity Model (RCM), for modeling user behaviors under the effect of conformity. We apply the proposed RCM to several academic research networks, and discover that people with higher degree and lower clustering coefficient are more likely to conform to others. We also evaluate RCM through the task of word usage prediction in academic publications, and show significant improvements over the performance of baselines. |
16:15 | Planning as Model Checking in Hybrid Domains SPEAKER: unknown ABSTRACT. Planning in hybrid domains is an important and challenging task, and various planning algorithms have been proposed in recent years. From an abstract point of view, hybrid planning domains are based on hybrid automata, which have been studied intensively in the model checking community. In particular, powerful model checking algorithms and tools have emerged for this formalism. However, despite the quest for more scalable planning approaches, model checking algorithms have so far not been applied to planning in hybrid domains. In this paper, we make a first step in bridging the gap between these two worlds. We provide a formal translation scheme from PDDL+ to the standard formalism of hybrid automata, as a solid basis for using hybrid-system model-checking tools to deal with hybrid planning domains. As a case study, we use the SpaceEx model checker, showing how we can address PDDL+ domains that are out of the scope of state-of-the-art planners. |
16:15 | Computing General First-order Parallel and Prioritized Circumscription SPEAKER: unknown ABSTRACT. This paper focuses on computing general first-order parallel and prioritized circumscription with varied constants. We propose polynomial translations from general first-order circumscription to first-order stable model semantics over arbitrary structures, including $Tr_v$ for parallel circumscription and $Tr^s_v$ for several parallel circumscriptions (and further for prioritized circumscription). To improve efficiency, we give an optimization, called $\Gamma_{\exists}$, that reduces the number of auxiliary predicates and the size of logic programs when eliminating existential quantifiers during the translations. Based on these results, we develop a general first-order circumscription solver, named cfo2lp, by calling answer set programming solvers. Using the circuit diagnosis problem and the extended stable marriage problem as benchmarks, we compare the efficiency of cfo2lp with that of the propositional circumscription solver circ2dlp. Experimental results demonstrate that, for problems naturally and intuitively represented by general first-order circumscription, cfo2lp can compute all solutions over finite structures. We also apply our approach to description logics with circumscription and to repairs of inconsistent databases, both of which can be handled effectively. |
16:15 | A Joint Optimization Model for Image Summarization Based on Image Content and Tags SPEAKER: unknown ABSTRACT. As an effective technology for navigating large collections of images, image summarization is becoming a promising task with the rapid development of image-sharing sites and social networks. Most existing summarization approaches use visual features for image representation without considering tag information. In this paper, we propose a novel framework, named JOINT, which employs both image content and tag information to summarize images. Our model generates the summary images that best reconstruct the original collection. Based on the assumption that an image with representative content should also have typical tags, we introduce a similarity-inducing regularizer into our model. Furthermore, we impose a lasso penalty on the objective function to yield a concise summary set. Extensive experiments demonstrate that our model outperforms state-of-the-art approaches. |
16:15 | Pre-trained Multi-view Word Embedding using Two-side Neural Network SPEAKER: unknown ABSTRACT. Word embedding aims to learn a continuous representation for each word. It has attracted increasing attention owing to its effectiveness in tasks such as named entity recognition and language modeling. Most existing word embeddings are trained on a single data source, such as news pages or Wikipedia articles, and their performance suffers when they are applied to other tasks such as web search. To obtain a word embedding robust across applications, multiple data sources can be leveraged. In this paper, we propose a two-side multimodal neural network to learn a robust word embedding from multiple data sources, including free text, user search queries and search click-through data. The framework takes word embeddings learned from the different data sources as pre-training and then uses a two-side neural network to unify them. The pre-trained embeddings are obtained by adapting the recently proposed CBOW algorithm. Since the proposed neural network does not need to re-train word embeddings for a new task, it is highly scalable for real-world problem solving. Moreover, the network allows different sources to be weighted differently for different application tasks. Experiments on two real-world applications, web search ranking and word similarity measurement, show that our neural network with multiple sources outperforms a state-of-the-art word embedding algorithm trained on each individual source, and also outperforms other competitive baselines that use multiple sources. |
16:15 | Regret-based Optimization and Preference Elicitation for Stackelberg Security Games with Uncertainty SPEAKER: unknown ABSTRACT. Stackelberg security games (SSGs) have been deployed in a number of real-world security domains. One key challenge in these applications is the assessment of attacker payoffs, which may not be perfectly known. Previous work has studied SSGs with uncertain payoffs modeled by interval uncertainty and maximin-based robust optimization. In contrast, this paper is the first to propose the less conservative minimax regret as a decision criterion for payoff-uncertain SSGs and to present several algorithms for computing minimax regret for such games. The paper also addresses, for the first time, the challenge of preference elicitation in SSGs, providing novel regret-based solution strategies. Experimental results validate the runtime performance and solution quality of our approaches. |
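For reference, minimax regret here can be stated compactly (a standard definition, written under the assumption of an uncertainty set $\mathcal{U}$ of attacker payoffs): the defender picks the mixed strategy whose worst-case loss relative to hindsight-optimal play is smallest,

    $x^* \in \arg\min_{x} \max_{u \in \mathcal{U}} \big( \max_{x'} U_d(x', u) - U_d(x, u) \big)$

where $U_d(x, u)$ is the defender's utility of strategy $x$ when the true payoffs are $u$. Unlike maximin, regret compares against what was achievable under each payoff realization, which is what makes it less conservative.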
16:15 | Globally and Locally Consistent Unsupervised Projection SPEAKER: unknown ABSTRACT. In this paper, we propose an unsupervised projection method for feature extraction that preserves both the global and local consistency of the input data in the projected space. Traditional unsupervised feature extraction methods, such as principal component analysis and locality preserving projections, can explore either the global or the local geometric structure of the input data, but not both. In our new method, we introduce a new measurement using neighborhood data variances to assess data locality, with which we learn an optimal projection that rewards both the global and local structures of the input data. Moreover, to improve the robustness of the learning model against outlier samples and outlier features, which is of particular importance in unsupervised learning, we propose a new objective that simultaneously minimizes and maximizes (minmax) L1-norm distances instead of the traditional squared L2-norm distances. Solving the formulated optimization problem is very challenging, because it minimizes and maximizes a number of non-smooth L1-norm terms at the same time. As an important theoretical contribution, we propose a simple yet effective optimization method to solve the L1-norm minmax problem, and we theoretically prove its convergence and correctness. To the best of our knowledge, this paper makes the first attempt to solve the general L1-norm minmax problem with orthogonal constraints. Extensive experiments have been performed on six benchmark data sets, and the promising results validate the proposed method. |
16:15 | Low-Rank Tensor Completion with Spatio-Temporal Consistency SPEAKER: unknown ABSTRACT. Video completion is a computer vision technique for recovering missing values in video sequences by filling the unknown regions with known information. In recent research, tensor completion, a generalization of matrix completion to higher-order data, has emerged as a solution for estimating missing information in video, under the assumption that the video frames are homogeneous and correlated. However, a video clip often contains heterogeneous episodes, and the correlations among all video frames are not high. Thus, regular tensor completion methods are not suitable for recovering missing video values in practical applications. To solve this problem, we propose a novel spatio-temporally consistent tensor completion method for recovering missing video data. Instead of minimizing the average of the trace norms of all matrices unfolded along each mode of the tensor, we introduce a new smoothness regularization along the video's time dimension to exploit the temporal information between consecutive frames. Meanwhile, we also minimize the trace norm of each individual video frame to exploit the spatial correlations among pixels. Unlike previous tensor completion approaches, our new method maintains spatio-temporal consistency in video and does not assume global correlation among video frames, so it can be applied to general, practical video completion applications. Our method shows promising results in all evaluations on a 3D biomedical image sequence and video benchmark data sets. |
16:15 | Sample-Adaptive Multiple Kernel Learning SPEAKER: unknown ABSTRACT. Existing multiple kernel learning (MKL) algorithms indiscriminately apply the same set of kernel combination weights to all samples. However, the utility of base kernels can vary across samples: a base kernel useful for one sample may be noisy for another. In this case, rigidly applying a single set of kernel combination weights can adversely affect learning performance. To improve this situation, we propose a sample-adaptive MKL algorithm in which base kernels can be adaptively switched on or off for each sample. We achieve this by assigning a latent binary variable to each base kernel as it is applied to a sample. The kernel combination weights and the latent variables are jointly optimized via the margin-maximization principle. As demonstrated on five benchmark data sets, the proposed algorithm consistently outperforms comparable algorithms in the literature. |
16:15 | Gradient Descent with Proximal Average for Nonconvex and Composite Regularization SPEAKER: unknown ABSTRACT. Sparse modeling has been highly successful on high-dimensional data. While much attention has focused on convex regularization, recent studies show that nonconvex regularizers can outperform their convex counterparts in many situations. However, the resulting nonconvex optimization problems are often challenging, especially for composite regularizers such as the nonconvex overlapping group lasso. In this paper, using a recent tool known as the proximal average, we propose a novel proximal gradient descent method for a wide class of composite and nonconvex problems. Instead of directly solving the proximal step with a composite regularizer, we average the solutions of the proximal problems of the individual regularizers. This simple strategy has convergence guarantees similar to those of existing nonconvex optimization approaches, but its per-iteration complexity is much lower. Experimental results on synthetic and real-world data sets demonstrate the effectiveness and efficiency of the proposed optimization algorithm, as well as the improved prediction performance resulting from the nonconvex regularizers. |
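The key step can be stated compactly. For a composite regularizer $r = \frac{1}{K}\sum_{k} r_k$ whose joint proximal map is hard, the proximal-average step replaces it with the average of the individual maps (a minimal statement of the idea; step sizes and weights follow the paper):

    $x_{t+1} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{prox}_{\eta r_k}\big(x_t - \eta \nabla f(x_t)\big), \quad \mathrm{prox}_{\eta r}(z) = \arg\min_{x} \tfrac{1}{2}\|x - z\|^2 + \eta\, r(x)$

Each $\mathrm{prox}_{\eta r_k}$ is typically available in cheap closed form (e.g., soft-thresholding for the L1 norm), which is what keeps the per-iteration cost low.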