Enhancing Human Capability with Intelligent Machine Teammates

ABSTRACT. Every team has top performers -- people who excel at working in a team to find the right solutions in complex, difficult situations. These top performers include nurses who run hospital floors, emergency response teams, air traffic controllers, and factory line supervisors. While they may outperform the most sophisticated optimization and scheduling algorithms, they often cannot tell us how they do it. Similarly, even when a machine can do the job better than most of us, it can’t explain how. In this talk I share recent work investigating effective ways to blend the unique decision-making strengths of humans and machines. I discuss the development of computational models that enable machines to efficiently infer the mental state of human teammates and thereby collaborate with people in richer, more flexible ways. Our studies demonstrate statistically significant improvements in people’s performance on military, healthcare and manufacturing tasks when aided by intelligent machine teammates.

Bribery as a Measure of Candidate Success: Complexity Results for Approval-Based Multiwinner Rules

ABSTRACT. We study the problem of bribery in multiwinner elections, for the case where the voters cast approval ballots (i.e., sets of candidates they approve) and the bribery actions are limited to: adding an approval to a vote, deleting an approval from a vote, or moving an approval within a vote from one candidate to another. We consider a number of approval-based multiwinner rules (AV, SAV, GAV, RAV, approval-based Chamberlin--Courant, and PAV). We find the landscape of complexity results quite rich, going from polynomial-time algorithms through NP-hardness with constant-factor approximations, to outright inapproximability. However, in general, our problems tend to be easier when we restrict the bribery actions to those that increase the number of approvals of the candidate that we want to be in a winning committee (e.g., adding approvals only for this preferred candidate, or moving approvals only to him or her). We also study the parameterized complexity of our problems, with a focus on parameterizations by the number of candidates and by the number of voters.
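For readers unfamiliar with the simplest of the rules mentioned above, plain Approval Voting (AV) elects the k candidates with the most approvals. The following minimal Python sketch illustrates AV only, not the paper's bribery algorithms; the function name and lexicographic tie-breaking are illustrative assumptions:

```python
from collections import Counter

def av_committee(ballots, k):
    """Approval Voting (AV): pick the k candidates approved most often.

    `ballots` is a list of sets of approved candidates. Ties are broken
    lexicographically here purely for determinism; the rules studied in
    the paper may use other tie-breaking conventions.
    """
    scores = Counter()
    for ballot in ballots:
        scores.update(ballot)
    # Sort by descending approval score, then by candidate name.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:k]

ballots = [{"a", "b"}, {"a", "c"}, {"b"}, {"a"}]
committee = av_committee(ballots, 2)  # a has 3 approvals, b has 2
```

In this framing, a bribery action such as "adding an approval" simply inserts a candidate into one voter's ballot set before scores are tallied.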

ABSTRACT. We consider elections where the voters come one at a time, in a streaming fashion, and devise space-efficient algorithms which identify an approximate winning committee with respect to common multiwinner proportional representation voting rules; specifically, we consider the Approval-based and the Borda-based variants of both the Chamberlin--Courant rule and the Monroe rule. We complement our algorithms with lower bounds. Somewhat surprisingly, our results imply that, using space which does not depend on the number of voters, it is possible to efficiently identify an approximate representative committee of fixed size over vote streams with a huge number of voters.

Beyond Electing and Ranking: Collective Dominating Chains, Dominating Subsets and Dichotomies

ABSTRACT. Classical voting rules, or social choice functions, map a collection of linear orders into a winning alternative or (for irresolute rules) a nonempty set of (tied) alternatives. Social welfare functions map a profile into a ranking over alternatives. There are many practical situations where we have to output a different structure than a winner or a ranking: we may have to output a ranked or non-ranked set of $k$ winning alternatives, or an ordered partition into $k$ classes, and so on. We define several classes of such aggregation functions, induced by two parameters: a consensus class that depends on the desired structure, and a distance (or something which plays the role of a distance) between profiles. We focus on the more specific case where the distance between profiles can be defined by a distance between (weighted or unweighted) tournaments. We address the computational issues pertaining to our rules, their normative properties, and list a number of specific applications.

ABSTRACT. We study the problem of computing optimal bundles given agents' preferences over individual items, when agents derive satisfaction from the entire bundle under constraints on the size k of the bundle. Building on the notion of Condorcet winning sets by Gehrlein, we extend common Condorcet consistent voting rules from the single-winner voting setting to that of forming bundles of size k. Our main technical contribution involves designing efficient algorithms for computing (approximately) optimal bundles for multi-winner extensions of the following voting rules: Copeland, Minimax, Ranked Pairs, and Schulze.
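As background on the first of the listed rules, single-winner Copeland scores each alternative by its pairwise-majority record. This is a hedged sketch of the standard single-winner rule only, not the paper's bundle extension; the lexicographic tie-breaking is an assumption:

```python
from itertools import combinations

def copeland_winner(rankings):
    """Single-winner Copeland: score = pairwise wins minus pairwise losses.

    `rankings` is a list of strict preference orders (best first), all over
    the same alternatives. Ties are broken lexicographically for determinism.
    """
    alternatives = sorted(rankings[0])
    scores = {a: 0 for a in alternatives}
    for a, b in combinations(alternatives, 2):
        # Count voters preferring a to b in the pairwise majority contest.
        a_wins = sum(r.index(a) < r.index(b) for r in rankings)
        b_wins = len(rankings) - a_wins
        if a_wins > b_wins:
            scores[a] += 1
            scores[b] -= 1
        elif b_wins > a_wins:
            scores[b] += 1
            scores[a] -= 1
    return min(alternatives, key=lambda a: (-scores[a], a))

winner = copeland_winner([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]])
```

The bundle version studied in the paper instead compares size-k sets of items against individual alternatives, which is what makes the computation nontrivial.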

Parameterized Dichotomy of Choosing Committees Based on Approval Votes in the Presence of Outliers

ABSTRACT. We study the computational complexity of the committee selection problem for several approval-based voting rules in the presence of outliers. Our first result shows that outliers render the committee selection problem intractable for the approval, net approval, and minisum approval voting rules. We next study the parameterized complexity of this problem with five natural parameters: the target score, the size of the committee (and its dual parameter, the number of candidates outside the committee), and the number of outliers (and its dual parameter, the number of non-outliers). For the approval, net approval, and minisum approval voting rules, we provide a dichotomous result which resolves the parameterized complexity of this problem for every subset of the above five natural parameters, showing either fixed-parameter tractability or W[1]-hardness in each case.

ABSTRACT. For the past 16 years, much of the work on combinatorial auctions (CAs) has used the CATS instance generator (Leyton-Brown et al., 2000). However, CATS does not include a good model for spectrum auctions, which have become the most important application of CAs. In this paper, we make three contributions. First, we propose a new "Multi-Region Value Model (MRVM)" that captures the difficult-to-model geographic complementarities of large US and Canadian auctions. We also encode our model as a MIP, making the auction's winner determination problem tractable. Second, we introduce and release a new "spectrum auction test suite (SATS)" that provides a unified framework and code base for several of the models from the literature, as well as our new model. Third, using SATS, we evaluate our MRVM model experimentally: after fitting the model parameters to the bidding data from the 2014 Canadian auction, we show that our model can represent the Canadian auction well.

Generalizing Demand Response Through Reward Bidding

ABSTRACT. Demand-side response (DR) is emerging as a crucial technology for assuring the stability of modern power grids. The uncertainty about the cost agents face for reducing consumption imposes challenges in achieving a reliable, coordinated response.

In recent work, [Ma et al. 2016] introduce DR as a mechanism design problem and solve it for a setting where an agent has a binary preparation decision and where, contingent on choosing to prepare, the probability an agent will reduce demand is fixed. We generalize this model in two ways: (i) agents have uncertainty in their costs of responding, and (ii) agents have multiple levels of effort they can exert in preparing. For both cases, the design of contingent payments now affects the probability of response. We design a new, truthful and reliable mechanism that uses a “reward-bidding” approach rather than the “penalty-bidding” approach. It has good performance when compared to natural benchmarks. The mechanism also extends to handle multiple units of demand response from each agent.

Mechanism Design with Unknown Correlated Distributions: Can We Learn Optimal Mechanisms?

ABSTRACT. Due to Cremer and McLean (1985), it is well known that in a setting where bidders’ values are correlated, an auction designer can extract the full social surplus as revenue. However, this result strongly relies on the assumption of a common prior distribution between the mechanism designer and the bidders. A natural question to ask is: can a mechanism designer distinguish between a set of possible distributions, or, failing that, use a finite number of samples from the true distribution to learn enough about the distribution to recover the Cremer and McLean result? We show that if a bidder’s distribution is one of a countably infinite sequence of potential distributions that converges to an independent private values distribution, then no mechanism can guarantee more revenue than the optimal mechanism for the independent private values distribution, even with sampling from the true distribution. We also show that any mechanism over this infinite sequence can guarantee at most a $(|\Theta| + 1)/(2 + \epsilon)$ approximation, where $|\Theta|$ is the number of bidder types, to the revenue achievable by a mechanism where the designer knows the bidder’s distribution. Finally, as a positive result, we show that for any distribution where full surplus extraction as revenue is possible, a mechanism exists that guarantees revenue arbitrarily close to full surplus for sufficiently close distributions. Intuitively, our results suggest that a high degree of correlation will be essential in the effective application of correlated mechanism design techniques to settings with uncertain distributions.

ABSTRACT. Designing simple mechanisms with desirable revenue guarantees has become a major research agenda in the economics and computation community. However, few mechanisms have actually been applied in industry.

In this paper, we aim to bridge the gap between the “simple versus optimal” theory and practice, and propose a class of parameterized mechanisms tailored to sponsored search auction settings. Our mechanisms can balance different objectives by simple parameter tuning, while at the same time guaranteeing near-optimal revenue in both theoretical and practical senses.

Thompson Sampling Based Mechanisms for Stochastic Multi-Armed Bandit Problems

ABSTRACT. This paper explores Thompson sampling in the context of mechanism design for stochastic multi-armed bandit (MAB) problems. The setting is that of an MAB problem where the reward distribution of each arm consists of a stochastic component as well as a strategic component. Many existing MAB mechanisms use upper confidence bound (UCB) based algorithms for learning the parameters of the reward distribution. Our motivation to use the Thompson sampling based approach comes from the superior regret performance of Thompson sampling when compared to UCB based approaches. The randomized nature of Thompson sampling, however, introduces certain unique, non-trivial challenges for mechanism design, which we effectively address in this paper. We first propose an MAB mechanism with a deterministic payment rule, namely TSM-D. We show that in TSM-D, the variance of agent utilities asymptotically approaches zero. However, the game theoretic properties satisfied by TSM-D (incentive compatibility and individual rationality with high probability) are rather weak. As our main contribution, we then propose the mechanism TSM-R, with a randomized payment rule, and prove that TSM-R satisfies adequate notions of incentive compatibility and individual rationality. For TSM-R, we also establish a theoretical upper bound on the variance in utilities of the agents. We further show, using simulations, that the variance in social welfare incurred by TSM-D or TSM-R is much lower than that of existing UCB based mechanisms. We believe this paper establishes Thompson sampling as an attractive approach for MAB mechanism design.
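As context for readers, the basic non-strategic Thompson sampling loop for Bernoulli bandits is sketched below. This illustrates only the underlying learning algorithm, not the TSM-D/TSM-R payment rules; the Beta(1, 1) priors and seeding are conventional assumptions:

```python
import random

def thompson_sampling(arm_probs, horizon, rng=random.Random(0)):
    """Thompson sampling for Bernoulli arms with Beta(1, 1) priors.

    `arm_probs` are the hidden success probabilities; the algorithm only
    observes sampled rewards. Returns the number of pulls per arm.
    """
    n = len(arm_probs)
    successes = [0] * n
    failures = [0] * n
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its Beta posterior
        # and play the arm whose sample is highest.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        reward = 1 if rng.random() < arm_probs[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [successes[i] + failures[i] for i in range(n)]

pulls = thompson_sampling([0.3, 0.7], horizon=2000)
# Over a long horizon, the better arm is pulled far more often.
```

The randomness visible in the posterior-sampling step is precisely what complicates payment design in the strategic setting the paper studies.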

A Declarative Modular Framework for Representing and Applying Ethical Principles

ABSTRACT. This paper investigates the use of high-level action languages for designing ethical autonomous agents in goal specification domains. It proposes a novel and modular logic-based framework for representing and reasoning over a variety of ethical theories, based on a modified version of the Event Calculus and implemented in Answer Set Programming. The ethical decision-making process is conceived of as a four-step procedure captured by four types of interdependent models which allow the agent to assess its environment, reason over its accountability and make ethically informed choices. The overarching ambition of the presented research is twofold. First, to allow the systematic representation of an unbounded number of ethical reasoning processes, through a framework that is adaptable and extensible by virtue of its designed hierarchisation and standard syntax. Second, to avoid the pitfall of much research in current computational ethics that too readily embeds moral information within computational engines, thereby feeding agents with atomic answers that fail to truly represent underlying dynamics. The aim instead is to comprehensively displace the burden of moral reasoning from the programmer to the program itself.

The Event Calculus in Probabilistic Logic Programs with Annotated Disjunctions

ABSTRACT. We propose a new probabilistic extension to the event calculus using the ProbLog language with annotated disjunctions. This is the first extension of the event calculus capable of handling numerous forms of uncertainty (e.g. in event observations and in composite event definitions). It is also the first extension capable of handling multiple sources of event observations (e.g. in multi-sensor environments). We then describe characteristics of this new extension (e.g. rationality of conclusions), and prove some important properties (e.g. validity in ProbLog). Our extension is directly implementable in ProbLog and has been successfully applied to handling uncertainty in activity recognition for intelligent surveillance.

Symbolic Model Checking Multi-Agent Systems against CTL*K Specifications

ABSTRACT. We introduce a methodology for model checking multi-agent systems against temporal-epistemic specifications expressed in the logic CTL$^*$K. We present an algorithm for the verification of explicit models and show it is PSPACE-complete. We show that the technique is amenable to symbolic implementation via binary decision diagrams. We introduce MCMAS$^*$, a toolkit implementing the technique, based on the open-source model checker MCMAS, which presently supports CTLK only. We present the experimental results obtained and show its attractiveness compared to all other available toolkits.

ABSTRACT. Dynamic epistemic logic (DEL) is an extension of modal multi-agent epistemic logic with dynamic operators. We propose a succinct version of DEL where Kripke models and event models are described succinctly. Our proposal relies on Dynamic Logic of Propositional Assignments (DLPA): epistemic relations are described with so-called mental programs written in DLPA. We give examples of models that are exponentially more succinct in our framework. Interestingly, the model checking of DEL is PSPACE-complete and we show that it remains in PSPACE for the succinct version.

Exploring the bidimensional space: a dynamic logic point of view

ABSTRACT. We present a family of logics for reasoning about agents' positions and motion in the two-dimensional plane which have several potential applications in the area of multi-agent systems (MAS), such as multi-agent planning and robotics. The most general logic includes (i) atomic formulas for representing the truth of a given fact or the presence of a given agent at a certain position of the plane, (ii) atomic programs corresponding to the four basic orientations in the plane (up, down, left, right) as well as the four program constructs of propositional dynamic logic PDL (sequential composition, nondeterministic composition, iteration and test). As this logic is undecidable and non-axiomatizable, we study some interesting decidable and axiomatizable fragments of it. We also present a decidable extension of the iteration-free fragment of the logic by special programs representing motion of agents in the plane.

Coordinating vessel traffic to improve safety and efficiency

ABSTRACT. The global increase in trade leads to congestion of maritime traffic in and out of ports. This often results in more maritime incidents or near-miss situations. To improve maritime safety while maintaining efficiency, the movement of vessels needs to be better coordinated. Our work formulates this problem of coordinating the paths of vessels as a multi-agent path-finding (MAPF) problem. To address this problem, we introduce an innovative application of multi-agent path finding in the maritime domain known as the Vessel Coordination Module (VCM). Based on a local search paradigm, VCM plans on a joint state space updated using the Electronic Navigation Charts (ENC) and the paths of vessels. We introduce the notion of path quality, which measures the number of positions of a vessel path that are too close to some other vessels spatially and temporally. VCM aims to improve the overall path quality of vessels by improving the path quality of selected vessels. Experiments are conducted on the Singapore Straits to evaluate and compare the performance of our proposed approach in a heterogeneous maritime scenario. Our experimental results show that VCM can improve the overall path quality of vessels.

Influence Maximization in the Field: The Arduous Journey from Emerging to Deployed Application [BEST PAPER NOMINEE]

ABSTRACT. This paper focuses on a topic that is rarely addressed in the AAMAS application track literature, i.e., challenges faced in transitioning agents from an emerging phase in the lab, to a deployed application in the field. We focus on challenges faced in transitioning HEALER and DOSIM, two agents for social influence maximization, which assist service providers in maximizing HIV awareness in real-world homeless youth social networks. These agents recommend key "seed" nodes in social networks, i.e., homeless youth who would maximize HIV awareness in their real-world social network. While earlier research published promising simulation results from the lab, this paper illustrates that transitioning these agents from the lab into the real world is not straightforward, and outlines three major lessons. First, it is important to conduct real-world pilot tests; indeed, due to the health-critical nature of the domain and complex influence spread models used by these agents, it is important to conduct field tests to ensure the real-world usability and effectiveness of these agents. We present results from three real-world pilot studies conducted with homeless youth in an American city. These are the first such pilot studies that provide a head-to-head comparison of different agents for social influence maximization, including a comparison with a baseline approach. Second, we present analyses of these real-world results, illustrating the strengths and weaknesses of the different influence maximization approaches we compare. Third, we present research challenges revealed in conducting these pilot tests, and propose solutions to address them. These challenges and proposed solutions are instructive in assisting the transition of agents focused on social influence maximization from the emerging to the deployed application phase. The promising results obtained in the pilot studies open the door to future deployment of HEALER and DOSIM by service providers on a regular basis.

Cloudy with a Chance of Poaching: Adversary Behavior Modeling and Forecasting with Real-World Poaching Data

ABSTRACT. Given that wildlife conservation organizations task rangers to deter and capture wildlife poachers, and rangers are responsible for patrolling vast areas, predictive models for poaching activities can help more effectively direct future patrols. In this paper, we present the most extensive empirical evaluation in the AI literature of one of the largest poaching datasets from Queen Elizabeth National Park (QENP) in Uganda, with the aim of deploying the results for saving animals in QENP. We present three major contributions. First, we present a paradigm shift for modeling and forecasting wildlife poacher behavior. Some of the latest work in the AI literature (and in Conservation) has relied on models similar to the Quantal Response model from Behavioral Game Theory for poacher behavior prediction. In contrast, we present a decision tree-based model (i) that more effectively predicts poacher attacks and (ii) that is more readily explainable and verifiable. We augment this basic decision tree approach via an iterative supervised learning approach with feature boosting and an ensemble of the best classifiers, significantly improving performance. Second, we conduct an extensive evaluation on the QENP dataset, comparing 40 models in prediction performance over two years, illustrating trade-offs. Third, we present results from a one-month trial field test of our model in QENP; this led to finding a poached elephant, and a dozen snares (including a roll of elephant snares) before they were deployed, potentially saving the lives of multiple animals.

Prioritized Allocation of Emergency Responders based on a Continuous-Time Incident Prediction Model

ABSTRACT. Efficient emergency response is a major concern in densely populated urban areas. Numerous techniques have been proposed to allocate emergency responders to optimize response times, coverage, and incident prevention. Effective response depends, in turn, on effective prediction of incidents occurring in space and time, a problem which has also received considerable prior attention. We formulate a non-linear mathematical program maximizing expected incident coverage, and propose a novel algorithmic framework for solving this problem. To aid the optimization problem, we propose a novel incident prediction mechanism. Prior art in incident prediction does not generally consider incident priorities, which are crucial in optimal dispatch, and spatial modeling either considers each grid cell independently or learns a homogeneous model. We bridge these gaps by learning a joint distribution of both incident arrival time and severity, with spatial heterogeneity captured using a hierarchical clustering approach. Moreover, our decomposition of the joint arrival and severity distributions allows us to independently learn the continuous-time arrival model, and subsequently use a multinomial logistic regression to capture severity, conditional on incident time. We use real traffic accident and response data for a major US urban area to evaluate the proposed approach, showing that it significantly outperforms prior art as well as the real dispatch method currently in use.

A Game Theoretic Approach to Strategy Generation for Moving Target Defense in Web Applications

ABSTRACT. The present complexity in designing web applications makes software security a difficult goal to achieve. Moreover, an attacker can explore a deployed service on the web and attack at his/her own leisure. Although Moving Target Defense (MTD) in web applications is an effective mechanism to nullify this reconnaissance advantage, the framework demands a good strategy for switching between the multiple configurations of its web stack. To address this issue, we propose modeling a real-world MTD web application as a repeated Bayesian game. We then formulate an optimization problem that generates an effective switching strategy while considering the cost of switching between different web-stack configurations. To incorporate this model into a developed MTD system, we propose an automated system for generating attack sets of Common Vulnerabilities and Exposures (CVEs) for input attacker types with predefined capabilities. Our framework obtains realistic reward values for the players (defenders and attackers) in this game by using security domain expertise on CVEs obtained from the National Vulnerability Database (NVD). Also, with our model, we address the issue of prioritizing vulnerabilities that, when fixed, improve the security of the MTD system. Lastly, we demonstrate the robustness of our proposed model by evaluating its performance when there is uncertainty about input attacker information.

A Partial Decision Scheme for Local Search Algorithms for Distributed Constraint Optimization Problems

ABSTRACT. Local search algorithms are widely adopted in solving large-scale DCOPs. However, since each agent always makes its value decision based on the values of its neighbors in local search, those algorithms usually suffer from premature convergence to local optima. More concretely, an agent cannot make a wise decision when its neighbors hold poor values, since its decision space is bound up with those poor values. In this paper, we propose a Partial Decision Scheme (PDS) to relax the decision space of an agent by ignoring the value of a neighbor that has a bad impact on its local benefits. The PDS comprises two partial decision processes: trigger partial decision and recursive partial decision. The former is iteratively performed by agents who cannot enhance their local benefits unilaterally, to break out of potential local optima. The latter is recursively performed by neglected agents to improve global benefits. In addition, trigger conditions along with a self-adaptive probability are introduced to control the use of PDS. The PDS can be easily added to any local search algorithm to overcome its premature convergence with a small additional overhead. In our theoretical analysis, we prove the feasibility and convergence of PDS. Moreover, the experimental results demonstrate that typical local search algorithms equipped with PDS outperform state-of-the-art local search algorithms.
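For context, the kind of local search baseline such a scheme plugs into can be sketched as a DSA-style algorithm, in which each agent repeatedly best-responds to its neighbors' current values with some activation probability. This is a generic sketch of a standard DSA-B-style loop, not of the PDS itself; all names and parameter values are illustrative:

```python
import random

def dsa_b(domain, cost, neighbors, rounds=50, p=0.7, rng=random.Random(0)):
    """DSA-B style local search for a binary DCOP.

    `cost(i, vi, j, vj)` is the constraint cost between agents i and j.
    Each round, every agent computes its best value given its neighbors'
    current values, and switches with probability p if that improves
    its local cost. Returns the final assignment as a dict.
    """
    agents = sorted(neighbors)
    values = {i: rng.choice(domain) for i in agents}

    def local_cost(i, v):
        return sum(cost(i, v, j, values[j]) for j in neighbors[i])

    for _ in range(rounds):
        new_values = dict(values)  # synchronous update
        for i in agents:
            best = min(domain, key=lambda v: local_cost(i, v))
            if local_cost(i, best) < local_cost(i, values[i]) and rng.random() < p:
                new_values[i] = best
        values = new_values
    return values

# Graph coloring as a toy DCOP: neighboring agents should pick different values.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
conflict = lambda i, vi, j, vj: 1 if vi == vj else 0
solution = dsa_b([0, 1, 2], conflict, neighbors)
```

The premature convergence discussed in the abstract arises exactly in this loop: each agent's candidate values are evaluated only against its neighbors' current (possibly poor) assignments.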

An Iterative Refined Max-sum_AD Algorithm via Single-side Value Propagation and Local Search

ABSTRACT. Max-sum_ADVP is a message-passing algorithm for solving Distributed Constraint Optimization Problems (DCOPs) that can obtain a convergent solution in a cyclic factor graph. Nevertheless, the solution quality is closely tied to the timing for starting value propagation in Max-sum_ADVP: low-quality initial assignments will lead to a poor result. In this paper, we illustrate that value propagation can eliminate the inconsistent contexts in Max-sum_AD and break ties among utilities, but it also restricts the exploration brought by Max-sum. To balance exploration and accuracy, we propose a new iterative refined Max-sum_AD algorithm with single-side value propagation, called Max-sum_ADSSVP. It performs two phases in every two convergences: an exploration phase that finds high-quality initial assignments, and a value propagation phase that guarantees solution quality. Max-sum_ADSSVP tackles the timing selection problem by iteratively refining initial assignments in every exploration phase. In addition, local search is introduced into the value propagation phase to speed up the convergence process. Our empirical evaluation indicates that our methods are independent of initial assignments and less likely to get stuck in local optima.

A Deterministic Distributed Algorithm for Reasoning with Connected Row-Convex Constraints

ABSTRACT. The class of CRC constraints generalizes several tractable classes of constraints and is expressive enough to model problems in domains such as temporal reasoning, geometric reasoning, and scene labelling. This paper presents the first distributed deterministic algorithm for connected row-convex (CRC) constraints. Our distributed (partial) path consistency algorithm efficiently transforms a CRC constraint network into an equivalent constraint network where all constraints are minimal (i.e., they are the tightest constraints) and generating all solutions can be done in a backtrack-free manner. When compared with the state-of-the-art distributed algorithm for CRC constraints, which is a randomized one, our algorithm is guaranteed to generate a solution for satisfiable CRC constraint networks, and it is applicable to solving large networks in real distributed systems. The experimental evaluations show that our algorithm outperforms the state-of-the-art algorithm in both theory and practice.

ABSTRACT. The Distributed Constraint Optimization Problem (DCOP) formulation is a powerful tool for modeling multi-agent coordination problems. Researchers have recently extended this model to Proactive Dynamic DCOPs (PD-DCOPs) to capture the inherent dynamism present in many coordination problems. PD-DCOP is a finite-horizon model that integrates future problems into the current problem up to a finite horizon which is known a priori. However, this assumption can be unrealistic in many applications and does not guarantee optimal solutions over an infinite horizon. Therefore, we (i) propose the Infinite-Horizon PD-DCOP (IPD-DCOP) model, which extends PD-DCOPs to handle infinite horizons; (ii) exploit the convergence properties of Markov chains to determine optimal solutions for the infinite horizon; (iii) propose two distributed greedy algorithms to solve IPD-DCOPs; (iv) provide theoretical quality guarantees on the new model; and (v) empirically evaluate both proactive and reactive algorithms to determine the tradeoffs between the two classes of algorithms. The final contribution is important because, thus far, researchers have evaluated these two classes of algorithms only in isolation, making it difficult to identify the characteristics of problems in which each excels. Our results are the first in this important direction.

Proactive Project Scheduling with Time-dependent Workability Uncertainty

ABSTRACT. Proactive scheduling can effectively handle activity duration uncertainty in real-world projects by generating a baseline solution according to prior stochastic knowledge. However, most previous approaches cannot deal with the activity duration uncertainty caused by time-dependent workability uncertainty. In this paper, we aim at finding a partial-order schedule (POS) that produces the minimum expected makespan on a given probability model of workability uncertainty. Since this is a hard discrete stochastic optimization problem, we propose an approximation approach based on Sample Average Approximation (SAA), and develop a branch-and-bound algorithm to optimally solve the SAA problem. Empirical results on benchmark problem instances and real-world distribution data show that our approach outperforms the best general-purpose POS generation approaches that do not exploit the stochastic knowledge.

Arnor: Modeling Social Intelligence via Norms to Engineer Privacy-Aware Personal Agents

ABSTRACT. We propose Arnor, a method to engineer personal agents that are socially intelligent. Arnor incorporates social norms and goes beyond existing agent-oriented software engineering (AOSE) methods by systematically capturing how an agent's actions influence the social experience it delivers. We conduct two rigorous empirical studies to evaluate Arnor. (1) Via a multiphase developer study, we show that Arnor simplifies application development. (2) Via a set of simulation experiments, we show that Arnor provides a better social experience to end users than personal agents engineered using a traditional AOSE method.

ABSTRACT. We describe DecAMon, an algorithm for decentralizing the monitoring of the MAS communicative behavior described via an Agent Interaction Protocol (AIP). DecAMon produces results even when the AIP does not satisfy the “unique point of choice” and “connectedness for sequence” coherence conditions. It is well known from the literature that these conditions are necessary to fully decentralize the system monitoring, namely to associate one monitor with each individual agent in the MAS. If some agents can be grouped together and monitored by the same monitor, instead of individually, a partial decentralization of the monitoring activity can still be obtained even if the above conditions are not satisfied. Given an AIP specification, DecAMon outputs a set of “Monitoring Safe Partitions” of the agents, namely partitions P which ensure that having one monitor in charge of each group of agents in P allows detection of the same protocol violations that a fully centralized monitor would detect. A post-processing algorithm can be run on the DecAMon output to select those partitions which meet further conditions, such as being minimal (groups cannot be further split without losing monitoring safety), keeping some agents disjoint, having no more than n agents in each group, no more than m groups in the partition, and so on.

ABSTRACT. Debugging is hard, and debugging cognitive agent programs is particularly hard, since they involve concurrency, a dynamic environment, and a complex execution model that includes failure handling.
Previous work by Ko & Myers has demonstrated that providing Alice and Java programmers with software that can answer "why?" and "why not?" questions can make a dramatic difference to debugging performance. This paper considers how to adapt this approach to cognitive agent programs, specifically AgentSpeak. It develops and formalises definitions for "why?" and "why not?" questions and associated answers, and illustrates their application using a scenario.

Approximate Solutions To Max-Min Fair and Proportionally Fair Allocations of Indivisible Goods

ABSTRACT. Max-min fair allocations and proportionally fair allocations are desirable outcomes in a fair division of indivisible goods. Unfortunately, such allocations do not always exist, not even in very simple settings with few agents. A natural question is to ask about the largest value $c$ for which there is an allocation such that every agent has utility of at least $c$ times her fair share. Our goal is to approximate this value~$c$. For additive utilities, we show that when the number of agents is fixed, one can approximate $c$ by a polynomial-time approximation scheme. We show that the case when utility functions are defined based on scoring vectors (binary, Borda, and lexicographic vectors) is tractable. For $2$-additive functions, we show that a bounded constant for max-min fair allocations does not exist, not even when there are only two agents. We explore a class of symmetric submodular functions for which a tight $\frac{1}{2}$-max-min fair allocation exists and show how it can be approximated within a factor of~$\frac{1}{4}$.
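For intuition, the value $c$ can be computed exactly by brute force on tiny instances with additive utilities: enumerate all allocations and take the best worst-case ratio of an agent's bundle value to her fair share. The sketch below uses the proportional fair share (an agent's total value divided by the number of agents) as the benchmark and is exponential in the number of goods; the paper's PTAS avoids this enumeration for a fixed number of agents.

```python
from itertools import product

def best_maxmin_ratio(utilities):
    """Largest c such that some allocation gives every agent at least
    c times her proportional fair share, for additive utilities.

    utilities[i][g]: agent i's utility for good g. Exhaustive over all
    n^m allocations, so only usable on tiny instances."""
    n = len(utilities)
    m = len(utilities[0])
    shares = [sum(u) / n for u in utilities]  # proportional fair shares
    best = 0.0
    for assignment in product(range(n), repeat=m):  # owner of each good
        bundle_value = [0.0] * n
        for g, owner in enumerate(assignment):
            bundle_value[owner] += utilities[owner][g]
        best = max(best, min(bundle_value[i] / shares[i] for i in range(n)))
    return best
```

With two agents and two equally valued goods, the best allocation gives each agent exactly her proportional share (c = 1); with a single good and two agents, one agent necessarily gets nothing (c = 0), matching the abstract's observation that fair allocations need not exist.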

The Atkinson Inequality Index in Multiagent Resource Allocation

ABSTRACT. We analyse the problem of finding an allocation of resources in a multiagent system that is as fair as possible in terms of minimising inequality between the utility levels enjoyed by the individual agents. We use the well-known Atkinson index to measure inequality and we focus on the distributed approach to multiagent resource allocation, where new allocations emerge as the result of a sequence of local deals between groups of agents agreeing on an exchange of some of the items in their possession. Our results show that it is possible to design systems that provide theoretical guarantees for optimal outcomes that minimise inequality, but also that in practice there are significant computational hurdles to be overcome: finding an optimal allocation is computationally intractable---independently of the approach chosen---and large numbers of potentially highly complex deals may be required under the distributed approach. From a methodological point of view, while much work in multiagent resource allocation relies on combinatorial arguments, here we use insights from basic calculus.

ABSTRACT. We study cake cutting on a graph, where agents can only evaluate their shares relative to neighbors in the network. This is an extension of the classical problem of fair division to incorporate the notion of social comparison from the social sciences, in which individuals tend to focus on comparisons with their social contacts rather than with the population overall. We say an allocation is locally envy-free if no agent envies a neighbor's allocation, and locally proportional if each agent’s own allocation has as much value as its average value for the neighbors’ allocations. Whereas global envy-freeness implies local envy-freeness, we show that local and global proportionality are incomparable. We generalize the classical “Cut and Choose” protocol for two agents to this network setting, by fully characterizing the set of graphs for which an oblivious single-cutter protocol--- a protocol that uses a single agent to cut the cake into pieces ---is capable of producing locally envy-free allocations. These allocations are also locally proportional. We study the price of envy-freeness, which compares the total utility of an optimal allocation to the best utility of an allocation that is also locally envy-free. We show that a lower bound of $\Omega(\sqrt{n})$ on the price of envy-freeness for global allocations also holds for local envy-freeness in any connected undirected graph, so that sparse graphs do not provide more flexibility asymptotically with respect to the quality of envy-free allocations.
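The two local fairness notions defined above are straightforward to check once each agent's valuation of every piece and the social graph are given. A minimal sketch, with assumed data representations (a value matrix and an adjacency dict):

```python
def locally_envy_free(values, graph):
    """values[i][j]: agent i's value for agent j's piece;
    graph: dict agent -> set of neighbours.
    Agent i envies neighbour j if she values j's piece more than her own."""
    return all(values[i][i] >= values[i][j]
               for i in graph for j in graph[i])

def locally_proportional(values, graph):
    """Each agent's own piece must be worth at least her average value
    for the neighbours' pieces (agents with no neighbours pass trivially)."""
    return all(values[i][i] >= sum(values[i][j] for j in graph[i]) / len(graph[i])
               for i in graph if graph[i])
```

On a complete graph these checks coincide with the global notions; on sparser graphs only neighbouring pieces are compared, which is exactly what makes local and global proportionality incomparable.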

Stability of Generalized Two-Sided Markets with Transaction Thresholds [BEST PAPER NOMINEE]

ABSTRACT. We study a type of generalized two-sided market where a set of sellers, each with multiple units of the same divisible good, trade with a set of buyers. Possible trade of each unit between a buyer and a seller generates a given welfare (to be split among them), indicated by the weight of the edge between them. What makes these markets interesting is a special type of constraint, called a transaction threshold constraint, which essentially means that the amount of goods traded between a pair of agents must either be zero or above a certain edge-specific threshold. These constraints were originally motivated by the water-right market domain studied by Liu et al., where minimum thresholds must be imposed to mitigate administrative and other costs. The same constraints have been witnessed in several other market domains.

Without the threshold constraints, it is known that the seminal result by Shapley and Shubik carries over: the social-welfare-maximizing assignments between buyers and sellers are in the core. In other words, by algorithmically optimizing the market, one obtains desirable incentive properties for free. This is no longer the case for markets with threshold constraints, i.e., the model considered in this paper.

We first exhibit a counterexample where no optimal assignment (with respect to any way of splitting the trade welfare) is in the core. Motivated by this observation, we study the stability of the optimal assignments from two perspectives: 1) relaxing the definition of the core; 2) restricting the graph structure. For the first line, we show that the optimal assignments are pairwise stable, and that no deviating coalition can gain more than twice its current welfare. For the second line, we exactly characterize the graph structures for which the core is nonempty: the core is nonempty if and only if the market graph is a tree. Last but not least, we complement our previous results by quantitatively measuring the welfare loss caused by the threshold constraints: the optimal welfare without transaction thresholds is no greater than a constant times that with transaction thresholds. We evaluate and confirm our theoretical results using real data from a water-right market.

ABSTRACT. Cooperative games offer an elegant framework to model cooperation among self-interested agents. A central question in these games is how the payoff ought to be distributed to each player when all players cooperate and derive some benefit. In this work, we consider cooperative transferable utility games where a subset of players can form a coalition if and only if they are connected in the underlying communication structure. We propose a relaxed notion of supermodularity, called quasi-supermodularity, for such games. We identify a class of networks on which many of these problems are polynomial-time solvable for quasi-supermodular games. We also complement these results by showing that, without supermodularity, these problems become hard even if the underlying graph is a tree.

Coalition Structure Generation and CS-core: Results on the Tractability Frontier for games represented by MC-nets

ABSTRACT. The coalition structure generation problem (CSG) consists in partitioning a group of agents into coalitions so that the sum of their values is maximized.
We consider here the case of coalitional games whose characteristic function is compactly represented by a set of weighted conjunctive formulae (an MC-net).
In this context the CSG problem is known to be computationally hard in general.

In this paper, we first study some key parameters of MC-nets that make the CSG problem difficult to solve. Then we consider a specific class of MC-nets, called bipolar MC-nets, and prove that the CSG problem is polynomial for this class.
Finally, we show that the CS-core of games represented by bipolar MC-nets is never empty, and that an imputation belonging to the CS-core can be computed in polynomial time.

ABSTRACT. We investigate markets with a set of students on one side and a set of colleges on the other. A student and college can be linked by a weighted contract that defines the student's wage, while a college's budget for hiring students is limited. Stability is a crucial requirement for matching mechanisms to be applied in the real world. A standard stability requirement is coalitional stability, i.e., no pair of a college and group of students has an incentive to deviate. We find that a coalitionally stable matching is not guaranteed to exist, that verifying the coalitional stability of a given matching is coNP-complete, and that deciding whether a coalitionally stable matching exists in a given market is NP^NP-complete (that is, Sigma_2^P-complete). Given these computational hardness results, we pursue a weaker stability requirement called pairwise stability, i.e., no pair of a college and a single student has an incentive to deviate. We then design a strategy-proof mechanism that computes a pairwise stable matching in polynomial time in typed markets, in which students are partitioned into types that induce their possible wages.

ABSTRACT. A central problem in multiagent systems concerns the fair assignment of objects to agents. We initiate the study of randomized assignment rules with optional participation and investigate whether agents always benefit from participating in the assignment mechanism. Our results are largely positive, irrespective of the strategyproofness of the considered rules. In particular, random serial dictatorship, the probabilistic serial rule, and the Boston mechanism strictly incentivize single agents to participate, no matter what their underlying utility functions are. Random serial dictatorship and the probabilistic serial rule also cannot be manipulated by groups of agents who abstain strategically. These results stand in contrast to similar problems in the more general domain of voting where many rules suffer from the so-called “no-show paradox”. We also show that rules that return popular random assignments may disincentivize participation for some (but never all) utility representations consistent with the agents’ ordinal preferences.

Majority Graphs of Assignment Problems and Properties of Popular Random Assignments

ABSTRACT. Randomized mechanisms for assigning objects to individual agents have received increasing attention by computer scientists as well as economists. In this paper, we study a property of random assignments, called popularity, which corresponds to the well-known notion of Condorcet-consistency in social choice theory. Our contribution is threefold. First, we define a simple condition that characterizes whether two assignment problems induce the same majority graph and which can be checked in polynomial time. Secondly, we analytically and experimentally investigate the uniqueness of popular random assignments. Finally, we prove that popularity is incompatible with very weak notions of both strategyproofness and envy-freeness. This settles two open problems by Aziz et al. (2013) and reveals an interesting tradeoff between social and individual goals in random assignment.

Stable Matching with Uncertain Pairwise Preferences

ABSTRACT. In this paper we study a two-sided matching problem under preferences, where the agents have independent pairwise comparisons on their possible partners and these preferences may be uncertain. In this case preferences may be intransitive and agents may even have certain cycles in their preferences; e.g. an agent a may prefer b to c, c to d, and d to b, all with probability one. If an instance has such a cycle, then there may not exist any matching that is stable with positive probability. We focus on the computational problems of checking the existence of possibly and certainly stable matchings, i.e., matchings whose probability of being stable is positive or one, respectively. We show that finding possibly stable matchings is NP-hard, even if only one side can have cyclic preferences. On the other hand we show that the problem of finding a certainly stable matching is polynomial-time solvable if only one side can have cyclic preferences and the other side has transitive preferences, but that this problem becomes NP-hard when both sides can have cyclic preferences. The latter complexity result also implies the hardness of finding a kernel in a special class of directed graphs.

Parameterized Complexity of Group Activity Selection

ABSTRACT. We consider the Group Activity Selection Problem (GASP) in which a group of agents need to be assigned to activities, subject to agent preferences and stability conditions. In GASP, the agents announce preferences on which activities are acceptable to them and how many other participants in these activities they prefer. We consider five solution concepts of assignments: (1) individual rationality (everyone who is assigned to an activity is willing to participate), (2) (Nash) stability (no agent wants to deviate from the assignment), (3) envy-freeness (no agent is envious of someone else's assignment), (4) stability and envy-freeness, and (5) perfection (everyone is assigned and willing to participate).
It is known that finding an assignment of a given size with any of these properties is NP-complete. We study the complexity of GASP on a finer scale, through the lens of parameterized complexity. We show that the solution concepts above differ substantially, when parameterized by the size of the solution (the number of assigned agents or the number of used activities).
In particular, we show that finding an individually rational assignment of size $k$ is fixed-parameter tractable, yet other solution concepts are much more complex (W[1]- and W[2]-hard) even under very natural restrictions on inputs.
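Of the solution concepts listed above, individual rationality is the cheapest to verify: each assigned agent must approve her activity at the group size the assignment induces. A small sketch of that check, with an assumed `approves(agent, activity, size)` interface (the FPT algorithm for finding a size-k individually rational assignment is more involved; this only verifies a given one):

```python
def individually_rational(assignment, approves):
    """assignment: dict agent -> activity (unassigned agents omitted);
    approves(agent, activity, size) -> bool says whether the agent finds
    the activity acceptable with that many participants."""
    # group size of each activity induced by the assignment
    sizes = {}
    for act in assignment.values():
        sizes[act] = sizes.get(act, 0) + 1
    return all(approves(agent, act, sizes[act])
               for agent, act in assignment.items())
```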

Efficient near-optimal algorithms for barter exchange

ABSTRACT. We study polynomial-time clearing algorithms for the barter exchange problem. We put forward a family of carefully designed approximation algorithms with desirable worst-case guarantees. We further apply a series of novel heuristics to carry these algorithms into practice. We demonstrate via standard kidney exchange data sets that these algorithms achieve near-optimal performance (no less than 98% of the optimum) while outperforming the state-of-the-art ILP-based algorithms in running time by orders of magnitude.

ABSTRACT. In this paper we present a new algorithm for negotiations in non-zero-sum games. Although games have been studied extensively, most game playing algorithms have been developed under the assumption that players do not communicate. Many real-world problems, however, can be modeled as non-zero-sum games in which players may mutually benefit if they coordinate their actions, which requires negotiation. The field of Automated Negotiations is another important topic in AI, but in this field it is usually assumed that utility values are given by an explicit function that can be calculated easily. Traditional approaches do not apply to domains in which the utility functions are determined by the rules of some complex game. In this paper we aim to bridge the gap between General Game Playing and Automated Negotiations. Our algorithm is an adaptation of Monte Carlo Tree Search that allows players to negotiate. It is completely domain-independent in the sense that it is not tailored to any specific game. It can be applied to any non-zero-sum game, provided that its rules are described in Game Description Language.

An Automated Negotiation Agent for Permission Management

ABSTRACT. The digital economy is based on data sharing. Yet citizens still need to be able to exercise choice about which services they engage with and what they choose to share. Data management during web and app-based use is already a challenge, but as the Internet of Things scales up, the number of devices requiring and accessing personal data will go beyond what a person can manually assess in terms of data access requests. Since the user's willingness to share data depends on context and the user's expected benefit, set-and-forget-it approaches (for example as used for smartphones) are sub-optimal. Therefore, new approaches are needed for managing privacy preferences at scale and providing active consent around data sharing that can improve fidelity of operation in alignment with user intent.

In this paper we report on two key components: 1) a novel agent-based approach to negotiating the permission to exchange private data between users and services; 2) the criticality of the interaction design in supporting the usability of such services. To evaluate our agent-based approach, we developed an experimental tool that runs on people's own smartphones, where users were asked to share their private, real data (e.g., photos, contacts) under various conditions. The agent autonomously negotiates potential agreements for the user, which the user can refine by manually continuing the negotiation. The agent learns from these interactions and updates the user model in subsequent interactions. We found that the agent was able to effectively capture the user's preferences and negotiate on the user's behalf but, surprisingly, did not reduce user engagement with the system. We propose that understanding how interaction design interplays with agent-based automation is a key component of successfully deploying negotiating agents in real-life settings, and within the IoT context in particular.

The Value of Information in Automated Negotiation: A Decision Model for Eliciting User Preferences

ABSTRACT. Consider an agent that can autonomously negotiate and coordinate with others in our stead, to reach outcomes and agreements in our interest. Such automated negotiation agents are already common practice in areas such as high-frequency trading, and are now finding applications in domains closer to home, which involve not only mere financial optimizations but balanced tradeoffs between multiple issues, such as cost and convenience. As a simple example, a smart thermostat controlling a heat pump could provide demand response to the electricity grid if the inconvenience is offset by the grid relief incentives. In such situations, the agent represents a user with individual and a priori unknown preferences, which are costly to elicit due to the bother this causes the user. Therefore, the agent needs to strike a balance between increasing the accuracy of the user model and the inconvenience caused by interacting with the user. To do so, we require a tractable metric for the value of information in an ensuing negotiation, which until now has not been available. In this paper, we propose a decision model that finds the point of diminishing returns for improving the model of user preferences with costly queries. We present a reasoning framework to derive this metric, and show a myopically optimal and tractable stopping criterion for querying the user before a fixed number of negotiation rounds. Our method provides an extensible basis for interactive negotiation agents to evaluate which questions are worth posing, given the marginal utility expected to arise from more accurate beliefs.

ABSTRACT. We present the Interactive Arbitration Guide Online (IAGO) platform, a tool for designing human-aware agents for use in negotiation. Current state-of-the-art research platforms are ideally suited for agent-agent interaction. While helpful, these often fail to address the reality of human negotiation, which involves irrational actors, natural language, and deception. We step through the design of four agents that target the IAGO platform, and go on to show how these agents might be used to answer core questions in human-centered computing by presenting the results of a 2x2 study. The results show which agents were most successful at winning a negotiation against humans, and we discuss their implications for the design of effective, human-relatable agents.

Towards An Autonomous Agent that Provides Automated Feedback on Students' Negotiation Skills

ABSTRACT. Although negotiation is an integral part of daily life, most people are poor negotiators. A range of costly options exists for improving one's skills: self-study guides, courses, and even training programs offered by various companies and educational institutions. Virtual role-playing agents offer a low-cost alternative to these training methods. To be effective, such systems must allow students to engage in experiential learning exercises and provide personalized feedback on the learner's performance. In this paper, we show how a number of negotiation principles can be formalized and quantified in a corpus of human-agent negotiation role plays. We then establish the pedagogical relevance of these automated metrics and show that they are significantly correlated with negotiation outcomes in a human-agent negotiation. These metrics can be used to provide students with personalized feedback on the errors they make in a negotiation exercise and thereby support guided experiential learning. This illustrates the potential of technology to quantify feedback that is traditionally provided through more qualitative approaches.

Increasing Fairness by Delegating Decisions to Autonomous Agents

ABSTRACT. There has been growing interest in autonomous agents that act on our behalf, or represent us, across various domains such as negotiation, transportation, health, finance, and defense. As these agent representatives become immersed in society, it is critical that we understand whether and, if so, how they disrupt traditional patterns of interaction with others. In this paper we study how programming agents to represent us shapes our decisions in social settings. We show that, when acting through agent representatives, people are considerably less likely to accept unfair offers from others than in direct interaction. This result demonstrates that agent representatives have the potential to promote fairer outcomes. Moreover, we show that this effect can also occur when people are asked to “program” human representatives, revealing that the effect is caused by the act of programming itself. We argue this happens because programming requires the programmer to deliberate on all possible situations that might arise and thus promotes consideration of social norms, such as fairness, when making decisions. These results have important theoretical, practical, and ethical implications for designing, and understanding the nature of, people's decision making when they act through agent representatives.

Agent-Based Modeling and Simulation of Mosquito-Borne Disease Transmission

ABSTRACT. Mosquito-borne diseases, such as chikungunya, dengue, and malaria, are re-emerging and expanding to new and formerly unaffected places, leading to a need for models which can track their evolution. We give a generalized agent-based model (ABM) which captures the disorganized spatio-temporal interactions of hosts and vectors at a micro-scale by explicitly modeling each human and mosquito to predict the complex trajectory of the infection. The model has been integrated with geographic information system (GIS), census, and climate data to effectively account for the host and agent behaviors. We also provide a novel framework for incorporating a complex, climate-sensitive mosquito reproduction cycle, which adequately imitates the mosquito population in nature. As a proof of concept, the model is trained and validated using real data from the 2013-14 chikungunya epidemic in the Caribbean. Two popular infection intervention strategies, LSM (Larval Source Management) and ITN + IRS (Insecticide-Treated Nets plus Indoor Residual Spraying), are also simulated and analyzed in terms of cost-effectiveness and mosquito control to demonstrate our model's ability to contribute to policy design.

What Becomes of the Broken Hearted? - An Agent-Based Approach to Self-Evaluation, Interpersonal Loss, and Suicide Ideation

ABSTRACT. Social surroundings greatly affect how we perceive ourselves. Examining the network of social relations may help us reveal important information about our psyche. We build on this idea and propose a model that is consistent with social psychological theories and connects individuals' emotional states with their social relations. The model consists of a social network of agents. Using a centrality-based measure of self-evaluation, the model quantifies the amount of interpersonal loss experienced by agents as their social relations change. Applying this model, we analyze interpersonal loss of agents in standard network structures under different conditions. We then draw a link to suicide and discuss two real-world suicide incidents. Finally, we simulate dynamics of large random networks and investigate how network structures affect suicide ideation and its possible contagion.

The importance of modelling realistic human behaviour when planning evacuation schedules

ABSTRACT. Various projects work on devising optimal schedules for evacuating a population. These assume that people do as they are asked in terms of when and by what route they leave the area. However, we know from numerous case studies that this is not so. People have their own priorities and concerns, and will address these before doing as they are asked. We know from many emergency situations that a key behaviour people engage in is assembling the family group and checking on relatives or close friends. These behaviours are highly likely to affect how a provided optimal schedule plays out in practice. In this work we obtain an actual optimised evacuation schedule for a large, wildfire-prone area of over 35,000 households. We then compare an agent-based simulation that adheres to this schedule with ones that use it as a base, but in which some of the agents first engage in the behaviours mentioned. We analyse the differences on 17 different configurations, exploring both the statistical significance of the differences and their importance for ensuring successful evacuation.

Learning social conventions via collaborative reinforcement learning in complex and open settings

ABSTRACT. This paper explores the computation of conventions in agent societies via social learning methods, in settings where agents aim to perform tasks collaboratively with their acquaintances. Specifically, agents need to accomplish multiple, even incompatible, joint tasks simultaneously, with limited information about the means (e.g., resources) others prefer for accomplishing tasks, implying the need for coordination. In addition, agents may have limited abilities to monitor the strategies of their acquaintances. The article formulates the problem as an MDP and reports on the effectiveness of a collaborative reinforcement learning method for computing conventions with respect to the existing constraints and limitations, compared against other social learning approaches.

Multiagent Reinforcement Learning in Sequential Social Dilemmas

ABSTRACT. Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1. a fruit gathering game and 2. a wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real world social dilemmas affects cooperation.
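The atomic-action baseline that the paper generalizes, multiple self-interested agents learning independently in a matrix game, can be sketched with tabular Q-learning standing in for the deep Q-networks used in the paper. The payoff-table interface below is an assumption made for the example, not the paper's setup:

```python
import random

def independent_q_learning(payoffs, episodes=5000, eps=0.1, lr=0.1, seed=0):
    """Two independent epsilon-greedy Q-learners repeatedly playing a
    one-shot matrix game; payoffs[(a0, a1)] = (reward0, reward1).
    Each learner updates only its own Q-value for the action it took."""
    rng = random.Random(seed)
    actions = sorted({a for pair in payoffs for a in pair})
    q = [dict.fromkeys(actions, 0.0) for _ in range(2)]
    for _ in range(episodes):
        acts = [rng.choice(actions) if rng.random() < eps
                else max(actions, key=q[i].get)
                for i in range(2)]
        rewards = payoffs[(acts[0], acts[1])]
        for i in range(2):
            q[i][acts[i]] += lr * (rewards[i] - q[i][acts[i]])
    return [max(actions, key=q[i].get) for i in range(2)]  # greedy policies
```

In the sequential setting studied in the paper, each "cooperate" or "defect" choice becomes a whole policy over a Markov game rather than a single table entry, which is exactly what the tabular view cannot capture.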

Phase Transitions in Possible Dynamics of Cellular and Graph Automata Models of Sparsely Interconnected Multi-Agent Systems

ABSTRACT. We study Communicating Finite State Machine (CFSM) based models of large-scale multi-agent systems and their emergent behavior. CFSM-based models are suitable for studying large ensembles of simple reactive agents and the collective dynamics of such ensembles. In this paper, we focus on the asymptotic dynamics of a class of classical (finite) Cellular Automata (CA) and more general Network or Graph Automata (GA). We restrict our attention to CA and GA with Boolean-valued linear threshold functions as the node update rules, inspired by well-known models of biological neurons and research on artificial neural networks. In particular, we fully characterize the configuration spaces of a class of Simple Threshold CA, with a focus on the Majority update rule, which results in the most interesting dynamics among the CA in this class. We analytically demonstrate the stark contrast in possible dynamics between the Majority rule and all other Simple Threshold rules in the context of 1-D CA.

We then discuss the contrast with respect to the intractability of counting for the related classes of Boolean graph automata with the same restrictions on the node update rules. The GA with provably complex dynamics have only slightly more complex structures when it comes to (i) the underlying interconnection topologies ("cellular spaces") and (ii) the diversity of the node update rules, i.e., whether all the nodes use the same update rule (as in the classical CA) or just two different rules from the given class are allowed.

Thus, our research demonstrates two interesting phase transitions in the complexity of dynamics. First, for the simple threshold CA (corresponding to homogeneous ensembles of reactive agents), the number of possible dynamics is small for all agent update rules other than the Majority function; in the latter case, there are in general exponentially many possible asymptotic dynamics. Second, predicting the asymptotic dynamics of such homogeneous systems is always computationally feasible; however, when even the smallest degree of heterogeneity is added to individual agent behaviors and inter-agent interaction patterns, in several such scenarios the worst-case complexity of predicting asymptotic dynamics becomes provably intractable for nontrivial ensemble sizes. The main conclusion is that even rather modest differences with respect to MAS (non)uniformity can lead to a major phase transition from simple collective dynamics that are "easy in principle" to fully characterize, to very complex behaviors that are provably computationally intractable to predict.

A GRASP Metaheuristic for the Coverage of Grid Environments with Limited-Footprint Tools

ABSTRACT. Coverage of known environments is a task involved in several applications of autonomous mobile robots, like patrolling, search and rescue, and cleaning. The coverage problem can be formulated as that of finding the optimal tour that, when followed, allows a robot to cover with its tool (e.g., a sensor or a brush) all the points of the free space of a given environment. Most current methods for coverage discretize the environment into cells. In this paper, we consider a setting in which the environment is represented as a grid of equal square cells and in which a robot has a tool with limited range and angular field of view, able to cover a set of cells from a given pose. We propose an efficient covering method based on a metaheuristic approach that iteratively constructs a feasible solution and tries to improve it through local search. Experimental results show that the proposed method efficiently produces solutions of better quality than those of a state-of-the-art method.
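
A GRASP iteration pairs a greedy randomized construction with local search. The sketch below applies that scheme to a toy version of the coverage problem, reduced to choosing a set of covering poses (a set-cover simplification of the tour problem); the `alpha` restricted-candidate-list parameter, the pose/cell encoding, and the redundancy-removal local search are our illustrative assumptions, not the paper's method.

```python
import random

def grasp_cover(cells, poses, iters=50, alpha=0.3, seed=0):
    """GRASP sketch: randomized greedy construction + local search.

    cells: set of cell ids to cover; poses: dict pose_id -> set of cells
    covered from that pose. Assumes every cell is coverable by some pose.
    Returns the smallest covering set of poses found.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(iters):
        # --- randomized greedy construction ---
        uncovered, chosen = set(cells), []
        while uncovered:
            gains = {p: len(poses[p] & uncovered)
                     for p in poses if poses[p] & uncovered}
            gmax, gmin = max(gains.values()), min(gains.values())
            # restricted candidate list: poses whose gain is near the best
            rcl = [p for p, g in gains.items()
                   if g >= gmax - alpha * (gmax - gmin)]
            p = rng.choice(rcl)
            chosen.append(p)
            uncovered -= poses[p]
        # --- local search: drop poses made redundant by later choices ---
        for p in list(chosen):
            if len(chosen) > 1:
                rest = set().union(*(poses[q] for q in chosen if q != p))
                if set(cells) <= rest:
                    chosen.remove(p)
        if best is None or len(chosen) < len(best):
            best = chosen
    return best

# toy instance: 6 cells, 4 candidate poses
cells = set(range(6))
poses = {"a": {0, 1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {0, 5}}
sol = grasp_cover(cells, poses, iters=10, seed=1)
```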

Decentralized Online Planning for Multi-Robot Warehouse Commissioning [BEST PAPER NOMINEE]

ABSTRACT. Warehouse commissioning, in which a team of robots needs to gather and deliver items as fast and efficiently as possible while adhering to the robots' capacity constraints, is a complex task. Typical centralised control approaches can quickly become infeasible when dealing with many robots. Instead, we tackle this spatial task allocation problem (SPATAP) via distributed planning on each robot in the system. Previous such distributed planning approaches suffer from a number of limiting assumptions and ad-hoc approximations. This paper demonstrates how we can use Monte Carlo Tree Search (MCTS) to overcome these limitations and provide scalability in a more principled manner. Our simulation-based evaluation demonstrates that this translates to higher task performance, especially as tasks get more complex. Moreover, this higher performance does not come at the cost of scalability: in fact, the proposed approach scales better than the previous best approach, demonstrating excellent performance on an 8-robot team servicing a warehouse comprised of over 200 locations.
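
The MCTS machinery such planners build on can be sketched generically. Below is a minimal UCT-style planner run on a toy deterministic domain; the state encoding, the exploration constant `c`, and the absence of a separate rollout policy are simplifying assumptions of ours, not the paper's algorithm.

```python
import math, random

def uct_plan(state, actions, step, horizon, n_sims=2000, c=1.4, seed=0):
    """Minimal UCT sketch (illustrative, not the paper's planner).

    step(state, action) -> (next_state, reward). Returns the action
    with the highest estimated value at the root.
    """
    rng = random.Random(seed)
    N, Q = {}, {}  # visit counts and mean returns per (depth, state, action)

    def simulate(s, depth):
        if depth == horizon:
            return 0.0
        keys = [(depth, s, a) for a in actions]
        if any(N.get(k, 0) == 0 for k in keys):
            # expand: try an untried action first
            a = rng.choice([a for a in actions if N.get((depth, s, a), 0) == 0])
        else:
            # UCB1 selection over tried actions
            total = sum(N[k] for k in keys)
            a = max(actions, key=lambda a: Q[(depth, s, a)]
                    + c * math.sqrt(math.log(total) / N[(depth, s, a)]))
        s2, r = step(s, a)
        g = r + simulate(s2, depth + 1)
        k = (depth, s, a)
        N[k] = N.get(k, 0) + 1
        Q[k] = Q.get(k, 0.0) + (g - Q.get(k, 0.0)) / N[k]  # running mean
        return g

    for _ in range(n_sims):
        simulate(state, 0)
    return max(actions, key=lambda a: Q.get((0, state, a), float("-inf")))

# toy domain: walk on the integers, reward equals the position reached
best = uct_plan(0, [+1, -1], lambda s, a: (s + a, s + a), horizon=4)
```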

Multirobot Symbolic Planning under Temporal Uncertainty

ABSTRACT. Multirobot symbolic planning (MSP) aims at computing plans, each in the form of a sequence of actions, for a team of robots to achieve their individual goals while minimizing overall cost. Solving MSP problems requires modeling limited domain resources (e.g., corridors that allow at most one robot at a time) and the possibility of action synergy (e.g., multiple robots going through a door after a single door-opening action). However, it is a challenge to plan for resource sharing and realizing synergy in a team of robots due to the robots’ noisy action durations. This paper, for the first time, introduces the problem of MSP under temporal uncertainty (MSPTU). We present a novel, iterative conditional planning (ICP) algorithm, including two configurations (simple and enhanced), for solving general MSPTU problems. We then focus on multirobot navigation tasks, presenting a full instantiation of ICP that includes a new algorithm for computing conditional plan cost under temporal uncertainty and a novel shifted-Poisson distribution for accumulating temporal uncertainty over actions. The algorithms have been implemented both in simulation and on real robots. We observed a significant reduction in overall cost compared to baselines in which robots do not communicate or model temporal uncertainty.
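
One convenient property that a shifted-Poisson duration model can exploit (our reading of why such a distribution suits accumulation, not necessarily the paper's exact construction): sums of independent Poisson variables are again Poisson, so per-action durations of the form shift + Poisson(lambda) accumulate into a shifted Poisson with summed shifts and rates.

```python
def accumulate_shifted_poisson(durations):
    """Combine per-action duration models, each a shifted Poisson given
    as a (shift, lam) pair: duration = shift + Poisson(lam).

    By Poisson additivity, the total duration of a sequence of
    independent actions is shifted-Poisson with summed parameters.
    """
    total_shift = sum(s for s, _ in durations)
    total_lam = sum(l for _, l in durations)
    return total_shift, total_lam

def mean_and_var(shift, lam):
    # A Poisson's mean and variance are both lam; the shift moves only the mean.
    return shift + lam, lam

# e.g. two actions: 2 + Poisson(0.5) followed by 3 + Poisson(1.5)
total = accumulate_shifted_poisson([(2, 0.5), (3, 1.5)])
```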

Exploiting Robotic Swarm Characteristics for Adversarial Subversion in Coverage Tasks

ABSTRACT. Multi-robot systems, such as swarms, with a large number of members that are homogeneous and anonymous are robust to the deletion and addition of members. However, the same properties that make the system robust create vulnerabilities under certain circumstances. In this paper, we study such a case, namely the insertion by an adversary of agents, called moles, that subvert the performance of the system. The adversary monitors the swarm's movement during surveillance operations for the presence of holes, i.e., areas that were left uncovered by the swarm. The adversary then adds moles that get positioned in the swarm in such a way as to deceive the swarm regarding the existence of holes, thus preventing the swarm from discovering and repairing them. This problem has significant military applications. Our contributions are as follows: First, to the best of our knowledge, this is the first paper to study this problem. Second, we provide a formalization of the problem. Third, we provide several algorithms and characterize them both formally and experimentally.

Three Years of the RoboCup Standard Platform League Drop-in Player Competition: Creating and Maintaining a Large Scale Ad Hoc Teamwork Robotics Competition (JAAMAS Paper)

Scaling Expectation-Maximization for Inverse Reinforcement Learning to Multiple Robots under Occlusion

ABSTRACT. We consider inverse reinforcement learning (IRL) when portions of the expert's trajectory are occluded from the learner. For example, two experts performing tasks in close proximity may block each other from the learner's view, or the learner may be a robot observing mobile robots from a fixed position with limited sensor range. Previous methods mitigate this challenge either by focusing on the observed data only or by forming an expectation over the missing portions of the expert's trajectories given the observed data. However, not only is the resulting optimization nonlinear and nonconvex, but the space of occluded trajectories may also be very large, especially when multiple agents are observed over an extended time, which makes computing the expectation intractable. We present methods for speeding up the computation of conditional expectations by employing blocked Gibbs sampling. Challenged by a time-limited, multi-robot domain, we explore various blocking schemes and demonstrate that our methods offer significantly improved performance over existing IRL techniques under occlusion.
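
Blocked Gibbs sampling resamples groups of hidden variables jointly from their conditional distribution given everything else. The sketch below imputes occluded entries of a binary "sticky" Markov chain block by block; the chain model, the enumeration over block assignments, and the block sizes are illustrative assumptions of ours, not the paper's trajectory model.

```python
import itertools, random

def blocked_gibbs(obs, blocks, p_stay=0.9, n_iters=200, seed=0):
    """Blocked Gibbs sketch over a binary Markov chain.

    obs: list of 0/1/None (None = occluded observation).
    blocks: the occluded indices, partitioned into contiguous blocks;
            each block is resampled jointly given its neighbours.
    Returns a posterior-mean estimate for every index.
    """
    rng = random.Random(seed)
    n = len(obs)
    state = [v if v is not None else rng.randint(0, 1) for v in obs]
    counts = [0.0] * n

    def pair(a, b):  # transition potential of a sticky chain
        return p_stay if a == b else 1.0 - p_stay

    for _ in range(n_iters):
        for block in blocks:
            lo, hi = block[0], block[-1]
            weights, choices = [], []
            # enumerate all joint assignments of the (small) block
            for assign in itertools.product([0, 1], repeat=len(block)):
                w, seq = 1.0, list(assign)
                if lo > 0:
                    w *= pair(state[lo - 1], seq[0])
                for a, b in zip(seq, seq[1:]):
                    w *= pair(a, b)
                if hi < n - 1:
                    w *= pair(seq[-1], state[hi + 1])
                weights.append(w)
                choices.append(seq)
            pick = rng.choices(choices, weights=weights)[0]
            for i, v in zip(block, pick):
                state[i] = v
        for i in range(n):
            counts[i] += state[i]
    return [c / n_iters for c in counts]

# two occluded segments: between two 1s, and between two 0s
m = blocked_gibbs([1, None, None, 1, 0, None, 0], [[1, 2], [5]], seed=0)
```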

Asynchronous Data Aggregation for Training End to End Visual Control Networks

ABSTRACT. Robust training of deep neural networks requires a large amount of data. However, gathering and labeling this data can be expensive, and determining what distribution of features is needed for training is not a trivial problem. This is compounded when training neural networks to navigate autonomous agents in continuous, non-deterministic environments from visual input. Increasing the amount of demonstrated data does not solve this problem, as demonstrated sequences of actions are not guaranteed to produce the same outcomes, and slight changes in orientation generate drastically different visual representations. This results in a training set with a different distribution than what the agent will typically encounter in application. Here, we develop a method that can grow a training set from the same distribution as the agent's experiences and capture useful features not found in demonstrated behavior. Additionally, we show that our approach scales to efficiently handle complex tasks that require a large amount of data (experiences) for training (e.g., navigating 3D mazes). Concretely, we propose the deep asynchronous DAgger framework, which combines the DAgger algorithm with an asynchronous actor-learner architecture for parallel dataset aggregation and network policy learning. We apply our method to the task of navigating 3D mazes in Minecraft with randomly changing block types and analyze our results.
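
The DAgger loop at the core of such a framework can be sketched as follows (leaving out the asynchronous actor-learner machinery): run the current policy, label the states it actually visits with the expert, aggregate, and retrain, so the dataset grows from the learner's own state distribution. The toy line-world instantiation below is our assumption, purely for illustration.

```python
def dagger(expert, train, rollout, init_policy, n_rounds=5):
    """Core DAgger loop (sketch; no asynchrony, no neural network).

    expert(s) -> correct action; train(data) -> policy (callable);
    rollout(policy) -> the states that policy actually visits.
    """
    data, policy = [], init_policy
    for _ in range(n_rounds):
        # run the CURRENT policy, but label its states with the expert
        data += [(s, expert(s)) for s in rollout(policy)]
        policy = train(data)
    return policy

# --- toy instantiation: walk right along a line to reach state 4 ---
def expert(s):
    return +1 if s < 4 else 0

def rollout(policy):
    s, states = 0, []
    for _ in range(6):
        states.append(s)
        s = max(0, min(4, s + policy(s)))
    return states

def train(data):
    # "learning" = majority label per state, default action 0
    by_state = {}
    for s, a in data:
        by_state.setdefault(s, []).append(a)
    table = {s: max(set(acts), key=acts.count) for s, acts in by_state.items()}
    return lambda s: table.get(s, 0)

learned = dagger(expert, train, rollout, init_policy=lambda s: 0)
```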

Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

ABSTRACT. For an autonomous agent, executing a poor policy may be costly or even dangerous.
For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy.
Current methods for exact high confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound.
Existing model-based methods only address the problem in discrete state spaces.
Since exact bounds are intractable for many domains, we trade off strict guarantees of safety for more data-efficient approximate bounds.
In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces.
Since direct use of a model may introduce bias, we derive a theoretical upper bound on model bias for the case in which the model transition function is estimated from i.i.d. sampled trajectories.
This bound broadens our understanding of the conditions under which model-based methods have high bias.
Finally, we empirically evaluate our proposed methods and analyze the settings in which different bootstrapping off-policy methods succeed and fail.
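
The resampling step underlying such bounds can be illustrated with a plain percentile bootstrap over per-trajectory returns; the paper's methods plug model-based return estimates into this kind of procedure, but the sketch below is generic, not their exact estimator.

```python
import random

def bootstrap_lower_bound(returns, alpha=0.05, n_boot=2000, seed=0):
    """Percentile-bootstrap lower confidence bound on mean return.

    Resamples the returns with replacement n_boot times and takes the
    alpha-quantile of the resampled means as an approximate
    (1 - alpha)-confidence lower bound. Approximate, not exact.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(returns) for _ in returns]
        means.append(sum(sample) / len(sample))
    means.sort()
    return means[int(alpha * n_boot)]

lb = bootstrap_lower_bound([0.0] * 10 + [1.0] * 10)  # empirical mean is 0.5
```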

Reasoning about Hypothetical Agent Behaviours and their Parameters

ABSTRACT. Agents can achieve effective interaction with previously unknown other agents by maintaining beliefs over a set of hypothetical behaviours, or types, that these agents may have. A current limitation of this method is that it does not recognise parameters within type specifications, because types are viewed as black-box mappings from interaction histories to probability distributions over actions. In this work, we propose a general method which allows an agent to reason about both the relative likelihood of types and the values of any bounded continuous parameters within types. The method maintains individual parameter estimates for each type and selectively updates the estimates for some types after each observation. We propose different methods for the selection of types and the estimation of parameter values. The proposed methods are evaluated in detailed experiments, showing that updating the parameter estimates of a single type after each observation can be sufficient to achieve good performance.
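
The belief-maintenance step over hypothetical types is a standard Bayesian update: multiply each type's prior by the likelihood it assigns to the observed action, then renormalize. A minimal sketch (how each parameterised type computes its action likelihood is abstracted away here):

```python
def update_beliefs(beliefs, likelihoods):
    """One Bayesian belief update over hypothesised behaviour types.

    beliefs: dict type_name -> prior probability (sums to 1).
    likelihoods: dict type_name -> P(observed action | history, type),
    e.g. produced by each type's (parameterised) action distribution.
    Returns the normalised posterior over types.
    """
    post = {t: beliefs[t] * likelihoods[t] for t in beliefs}
    z = sum(post.values())
    return {t: p / z for t, p in post.items()}

# two candidate types; the observed action is far likelier under "greedy"
posterior = update_beliefs({"greedy": 0.5, "random": 0.5},
                           {"greedy": 0.9, "random": 0.1})
```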

Forward actor-critic for non-linear function approximation in reinforcement learning

ABSTRACT. Multi-step methods are important in reinforcement learning (RL). The usual way of handling them is through eligibility traces, which have worked well with linear function approximators. Recently, van Seijen (2016) introduced an alternative approach for handling multi-step updates, called forward SARSA(λ), that can work substantially better in the case of non-linear function approximators. These methods learn action-value estimates using multi-step update targets without eligibility traces. In this paper, we extend this idea to the policy-gradient case to produce a forward actor-critic method. We compared the proposed forward actor-critic method with the conventional actor-critic on mountain-car and pole-balancing tasks, where it produced substantially better performance. Notably, the forward actor-critic method gives rise to a new class of multi-step RL algorithms without eligibility traces.
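
The multi-step update targets that such forward methods use in place of eligibility traces have the familiar n-step form G = r_1 + γ·r_2 + … + γ^(n-1)·r_n + γ^n·V(s_n). A minimal sketch of computing such a target (generic; not the specific forward-view bookkeeping of the paper):

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """n-step return target without eligibility traces:
    G = r_1 + gamma*r_2 + ... + gamma^(n-1)*r_n + gamma^n * V(s_n).

    rewards: the n rewards observed after the state being updated;
    bootstrap_value: the critic's estimate V(s_n) at the end state.
    """
    g = bootstrap_value
    for r in reversed(rewards):  # fold backwards: g <- r + gamma * g
        g = r + gamma * g
    return g
```

For example, with rewards [1, 2, 3], bootstrap value 10, and gamma = 0.5, the target is 1 + 0.5·2 + 0.25·3 + 0.125·10 = 4.0.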

ABSTRACT. Recent advancements in reinforcement learning confirm that reinforcement learning techniques can solve large-scale problems, leading to high-quality autonomous decision making. It is only a matter of time until we see large-scale applications of reinforcement learning in various sectors, such as health-care and cyber-security, among others. However, reinforcement learning can be time-consuming, because the learning algorithms have to determine the long-term consequences of their actions using delayed feedback or rewards. Reward shaping is a method of incorporating domain knowledge into reinforcement learning so that the algorithms are guided faster towards more promising solutions. Under an overarching theme of episodic reinforcement learning, we show a unifying analysis of potential-based reward shaping, which leads to new theoretical insights into reward shaping in both model-free and model-based algorithms, as well as in multi-agent reinforcement learning.
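
Potential-based reward shaping adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, a form known to preserve optimal policies (Ng, Harada and Russell, 1999). A minimal sketch, using the usual convention that the potential of a terminal state is zero:

```python
def shaped_reward(r, s, s_next, potential, gamma=0.99, done=False):
    """Potential-based shaping: return r + F(s, s'), where
    F(s, s') = gamma * Phi(s') - Phi(s).

    potential: a user-supplied heuristic Phi encoding domain knowledge
    (e.g. negative distance to goal). Terminal potentials are zero so
    the shaping telescopes to a constant over any episode.
    """
    phi_next = 0.0 if done else potential(s_next)
    return r + gamma * phi_next - potential(s)

# example potential: the state index itself (a hypothetical heuristic)
phi = lambda s: float(s)
```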

Learning to Partition using Score Based Compatibilities

ABSTRACT. We study the problem of learning to partition users into groups, where one must learn the compatibilities between the users to achieve optimal groupings. We define four natural objectives that optimize for average and worst-case compatibilities and propose new algorithms for adaptively learning optimal groupings. When we do not impose any structure on the compatibilities, we show that the group formation objectives considered are $NP$-hard to solve, and we either give approximation guarantees or prove inapproximability results. We then introduce an elegant structure, namely that of \textit{intrinsic scores}, that makes many of these problems polynomial-time solvable. We explicitly characterize the optimal groupings under this structure and show that the optimal solutions are related to \emph{homophilous} and \emph{heterophilous} partitions, well studied in the psychology literature. For one of the four objectives, we show $NP$-hardness under the score structure and give a $\frac{1}{2}$-approximation algorithm for which no constant approximation was known thus far. Finally, under the score structure, we propose an online, low sample complexity PAC algorithm for learning the optimal partition. We demonstrate the efficacy of the proposed algorithm on synthetic and real-world datasets.
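
The two partition styles can be illustrated concretely: sort users by their intrinsic score, then either cut contiguous chunks (homophilous: similar scores together) or deal round-robin (heterophilous: scores spread across groups). This is our illustration of the two structures, not the paper's exact characterization of the optimal groupings.

```python
def homophilous_groups(scores, k):
    """Group similar scores together: sort, then cut contiguous chunks.
    Assumes len(scores) is divisible by k. Returns lists of user indices.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = len(scores) // k
    return [order[j * size:(j + 1) * size] for j in range(k)]

def heterophilous_groups(scores, k):
    """Spread scores across groups: deal the sorted users round-robin."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return [order[j::k] for j in range(k)]

# four users with scores 1, 9, 2, 8, split into two groups
homo = homophilous_groups([1, 9, 2, 8], 2)    # low scores vs high scores
hetero = heterophilous_groups([1, 9, 2, 8], 2)  # each group mixes low/high
```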

ABSTRACT. Self-driving cars hold the promise of making transportation safer and more efficient than ever before. They will also be among the most complex robotic systems ever fielded, and thus require an unprecedented level of machine learning throughout to achieve their desired performance. These learners create an ever-changing environment for all algorithms operating in the system, and optimizing their performance will become a perpetual activity rather than a one-off task.

I will present active optimization methods and their applications in robotics, focusing on scaling up dimensionality and managing multi-fidelity evaluations. I will summarize lessons learned and share thoughts on future directions as these methods move into fielded systems. Finally, I will describe current efforts on self-driving cars and give some views on their future.