ABSTRACT. How can computers help ordinary people make collective decisions about real-life dilemmas, like which restaurant to go to with friends, or even how to divide an inheritance? In this talk, I will present an optimization-driven approach that draws on ideas from AI, theoretical computer science, and economic theory, and illustrate it through my research in computational social choice and computational fair division. In both areas, I will make a special effort to demonstrate how fundamental theoretical questions underlie the design and implementation of deployed services — Spliddit.org and RoboVote.org — that are already used by more than a hundred thousand people.
Manipulation of Hamming-based Approval Voting for Multiple Referenda and Committee Elections
ABSTRACT. Several methods based on approval voting exist for electing committees of representatives and conducting multiple referenda. Recently, a family of approval-based voting rules using ordered weighted averaging has been proposed. This family ranges from simple candidate-wise majority, called minisum, to the egalitarian rule, minimax. While minisum is strategy-proof, minimax is not, due to its egalitarian nature; for the rules in between, only a partial study of manipulation has been conducted.
This paper studies the manipulability of fair rules within this family. We first prove that, with the exception of minisum, all rules parameterized by fair (non-increasing) weight vectors are manipulable, whether they are made deterministic via a tie-breaking mechanism or treated as non-deterministic under classic extension principles. We then conduct an empirical study of the proportion of elections that are manipulable, showing that it increases with the fairness of the rule.
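As a concrete illustration of the family's two endpoints, here is a minimal sketch (a toy example of ours, not code from the paper) that computes the minisum and minimax outcomes over binary approval ballots by minimizing, respectively, the sum and the maximum of the Hamming distances to the ballots:

```python
# Toy illustration of the two endpoints of the OWA family on approval ballots.
from itertools import product

ballots = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]  # hypothetical approval ballots

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def best_outcome(aggregate):
    """Return the outcome minimizing `aggregate` of the Hamming distances to all ballots."""
    outcomes = product((0, 1), repeat=len(ballots[0]))
    return min(outcomes, key=lambda o: aggregate(hamming(o, b) for b in ballots))

minisum = best_outcome(sum)   # candidate-wise majority
minimax = best_outcome(max)   # egalitarian rule
print(minisum, minimax)
```

Intermediate members of the family would replace `sum`/`max` by an ordered weighted average of the sorted distance vector.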
New Approximation for Borda Coalitional Manipulation
ABSTRACT. We study the problem of Borda Coalitional Manipulation, where $k$ manipulators out of $n$ voters try to manipulate an election on $m$ candidates under the Borda protocol. This problem is known to be NP-hard. While most recent approaches to approximation tried to minimize $k$, the number of manipulators needed to make the preferred candidate win (thus assuming the number of manipulators is not limited in advance), we focus instead on minimizing the maximum score obtainable by a non-preferred candidate. We provide a randomized, additive $O(k \sqrt{m} \log m)$ approximation to this value; in other words, if there exists a strategy enabling the preferred candidate to win by an $\Omega(k \sqrt{m} \log m)$ margin, our method, with high probability, will find a strategy enabling her to win (albeit with a possibly smaller margin). It thus provides a somewhat stronger guarantee compared to previous methods, where the addition of an extra manipulator implied (with respect to the original $k$) a strategy that provides an $O(m)$-additive approximation to a runner-up's score: when $k$ is smaller than $\sqrt{m}/\log m$, our strategy provides a stronger approximation. Our algorithm can also be viewed as a $(1+o(1))$-multiplicative approximation, since the value we approximate has a natural $\Omega(km)$ lower bound. Our methods are novel and adapt techniques from multiprocessor scheduling by carefully rounding an exponentially large configuration linear program that is solved using the ellipsoid method with an efficient separation oracle. We believe that such methods could be beneficial in approximating coalitional manipulation in other election protocols as well.
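To make the approximated quantity concrete, the sketch below (our illustration, not the paper's algorithm) computes Borda scores for a complete profile and the preferred candidate's winning margin over the strongest non-preferred candidate, which is the margin the additive guarantee refers to:

```python
# A small illustration (not the paper's algorithm): Borda scores and the
# preferred candidate's winning margin for a fixed profile of rankings.
def borda_scores(profile, m):
    """profile: list of rankings; each ranking lists candidate indices, best first."""
    scores = [0] * m
    for ranking in profile:
        for pos, cand in enumerate(ranking):
            scores[cand] += m - 1 - pos   # top candidate gets m-1 points, bottom gets 0
    return scores

def winning_margin(profile, m, preferred):
    scores = borda_scores(profile, m)
    best_other = max(s for c, s in enumerate(scores) if c != preferred)
    return scores[preferred] - best_other

profile = [[0, 1, 2], [0, 2, 1], [1, 2, 0]]        # hypothetical votes over m = 3 candidates
print(winning_margin(profile, m=3, preferred=0))   # 4 - max(3, 2) = 1
```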
Complexity of Control by Partition of Voters and of Voter Groups in Veto and Other Scoring Protocols
ABSTRACT. In order to model maliciously resizing election districts (a.k.a. gerrymandering), Bartholdi et al. (1992) introduced control by partition of voters, which means that an election chair can influence the outcome of an election to her advantage by partitioning the voters such that two first-round subelections are created whose winners will take part in a final run-off. Control by partition of voter groups, due to Erd\'{e}lyi et al. (2015), refers to the same model with the additional constraint that a partition of voters into groups is given beforehand, which the chair's control action must respect: either all voters of a group take part in one of the two first-round subelections or none of them. Maushagen and Rothe (2016) recently classified some problems of control by partition of either voters or candidates for veto elections in terms of their computational complexity, leaving some other problems open. We solve all these remaining cases. In addition, we generalize a result of Erd\'{e}lyi et al. (2015) for constructive control by partition of voter groups from plurality elections to all nontrivial pure scoring protocols by showing NP-hardness, and we also obtain the analogous result for its destructive variant.
Divide and Conquer: Using Geographic Manipulation to Win District-Based Elections
ABSTRACT. District-based elections, in which voters vote for a district representative and those representatives ultimately choose the winner, are vulnerable to gerrymandering, i.e., manipulation of the outcome by changing the location and borders of districts. Many countries aim to limit blatant gerrymandering, and thus we introduce a geographically-based manipulation problem, where voters must vote at the ballot box closest to them.
We show that this problem is NP-complete in the worst case. However, we present a greedy algorithm for it, and by applying the algorithm to both simulated data and real-world data from the 2015 Israeli and British elections, we show that many parties are potentially able to make themselves victorious using district manipulation. Moreover, we show that vote share is not the only relevant variable: the form of geographic dispersion also plays a crucial role.
ABSTRACT. Multi-winner voting rules aiming at proportional representation, such as those suggested by Chamberlin and Courant and by Monroe, partition an electorate into virtual districts, such that a representative is assigned to each district; these districts are formed based on the voters' preferences. In some applications it is beneficial to require certain structural properties to be satisfied by these virtual districts. In this paper we consider situations where the voters are embedded in a network, and we require each virtual district to be connected (with respect to the network). We discuss applications of a corresponding combinatorial problem and study its computational complexity, identifying several variants and special cases which can be solved efficiently.
Referral-Embedded Provision Point Mechanisms for Crowdfunding of Public Projects
ABSTRACT. Civic crowdfunding is emerging as a popular means to mobilize funding from citizens for public projects. A popular mechanism deployed on civic crowdfunding platforms is a provision point mechanism, wherein the total contributions must reach a predetermined threshold in order for the project to be provisioned (undertaken). Such a mechanism may have multiple equilibria; unfortunately, in many of these equilibria, the project is not funded even if it is highly valued among the agents. Recent work has proposed mechanisms with refund bonuses in which the project gets funded in equilibrium if its net value, among the agents who are aware of the crowdfunding effort, exceeds a threshold. In this paper, we formalize the notion of social desirability of a public project and propose mechanisms that use the idea of referrals to expand the pool of participants and achieve an equilibrium in which the project gets funded if its net value exceeds a threshold among the entire agent population. A key challenge in introducing a referral mechanism in civic crowdfunding settings is to ensure that incentivizing referrals does not disincentivize contributions; a referral mechanism introduced in conjunction with a civic crowdfunding mechanism must still ensure that the project gets funded at equilibrium. The class of mechanisms we propose achieves this. We call this new class of mechanisms Referral-Embedded Provision Point Mechanisms (REPPMs). In REPPMs, by referring others to contribute, an agent can reduce his equilibrium contribution, but only up to a bound that keeps the project funded at equilibrium. We propose two variants of REPPM, and both mechanisms have the remarkable property that, at equilibrium, referral bonuses are offered but there is no need to actually pay them. In our view, REPPMs can increase the number of projects that are funded on civic crowdfunding platforms.
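For background, the provision point idea with refund bonuses that REPPMs build on can be sketched as follows (a toy settlement rule of ours with illustrative names and a proportional bonus; the paper's mechanisms additionally embed referral bonuses, which are not modeled here):

```python
# Toy provision point mechanism with proportional refund bonuses (illustrative only).
def settle(contributions, target, bonus_budget):
    total = sum(contributions.values())
    if total >= target:
        # Provision point reached: the project is undertaken, no refunds are paid.
        return {"provisioned": True, "refunds": {a: 0.0 for a in contributions}}
    # Otherwise refund every contribution plus a share of the bonus budget
    # proportional to the agent's contribution.
    share = (lambda x: x / total) if total > 0 else (lambda x: 0.0)
    refunds = {a: x + bonus_budget * share(x) for a, x in contributions.items()}
    return {"provisioned": False, "refunds": refunds}

print(settle({"agent1": 40.0, "agent2": 35.0}, target=100.0, bonus_budget=10.0))
```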
Spoofing the Limit Order Book: An Agent-Based Model
ABSTRACT. We present an agent-based model of manipulating prices in financial markets through spoofing: submitting spurious orders to mislead traders who observe the order book. Built around the standard limit-order mechanism, our model captures a complex market environment with combined private and common values, the latter represented by noisy observations of a dynamic fundamental time series. We consider background agents following two types of trading strategies: zero intelligence (ZI), which ignores the order book, and heuristic belief learning (HBL), which exploits the order book to predict price outcomes. By employing an empirical game-theoretic analysis to derive approximate strategic equilibria, we demonstrate the effectiveness of HBL and the usefulness of order book information in a range of non-spoofing environments. We further show that a market with HBL traders is spoofable, in that a spoofer can qualitatively manipulate prices in its desired direction. After re-equilibrating games with spoofing, we find that spoofing generally hurts market surplus and decreases the proportion of HBL traders. However, HBL's persistence in most environments with spoofing indicates a consistently spoofable market. Our model provides a way to quantify the effect of spoofing on trading behavior and efficiency, and thus to measure the profitability and cost of an important form of market manipulation.
ABSTRACT. In this paper we study variations of the standard Hotelling-Downs model of spatial competition, where each agent attracts only the clients in a restricted neighborhood, and each client randomly picks one attracting agent for service.
Two utility functions for agents are considered: support utility and winner utility. We generalize the results in [9] to the case where the clients are distributed arbitrarily. In the support utility setting, we show that a pure Nash equilibrium always exists, by formulating the game as a potential game. In the winner utility setting, we show that a Nash equilibrium exists in two cases: when there are at most 3 agents, and when the size of the attraction area is at least half of the entire space. We also consider the price of anarchy and the fairness of equilibria under certain conditions.
Stop Nuclear Smuggling Through Efficient Container Inspection
ABSTRACT. Since 2003, the U.S. government has spent $850 million on the Megaport Initiative, which aims to stop nuclear smuggling in international container shipping through advanced inspection facilities, including Non-Intrusive Inspection (NII) and the Mobile Radiation Detection and Identification System (MRDIS). Unfortunately, due to limited inspection resources, it remains a significant challenge to efficiently inspect the more than 11.7 million containers imported to the U.S. Moreover, existing work on container inspection neglects the sophisticated behavior of smugglers, who can surveil the inspector's strategy and draw up a long-term sequential plan. This paper is the first to tackle such sophisticated smugglers in the container inspection domain. We propose a novel Container Inspection Model (CIM) that models the interaction between the inspector and smugglers as a leader-follower Stackelberg game and formulates the smugglers' sequential decision behavior as a Markov Decision Process (MDP). The special structure of the CIM results in a non-convex optimization problem, which existing approaches fail to address. Therefore, we make several key contributions: i) a linear relaxation approximation with a guarantee on solution quality, which reformulates the model as a bilinear optimization; ii) an algorithm inspired by the Multi-parametric Disaggregation Technique (MDT) to solve the reformulated bilinear optimization; and iii) an iterative algorithm to further improve scalability. Extensive experimental evaluation shows that our approach scales up to realistic-sized problems with robust solutions that significantly outperform heuristic baselines.
Coordinating multiple defensive resources in patrolling games with alarm systems
ABSTRACT. Alarm systems represent a novel issue in Security Games. Recent works studied their employment, even considering various forms of uncertainty, and showed that disregarding them can lead to arbitrarily poor strategies. One of the key problems is computing the best strategy with which each mobile defensive resource should respond to alarm signals. The current literature only solves the basic single-resource version of this problem. In this paper, we provide a solution for the multi-resource case, addressing the challenge of designing algorithms that coordinate a scaling-up number of resources. First, we focus on finding the minimum number of resources assuring non-null protection to every target. Then, we deal with the computation of multi-resource strategies with different degrees of coordination among resources, resorting to adversarial team game models. For each considered problem, we provide algorithms along with their theoretical and empirical analysis.
A path in the jungle of logics for multi-agent systems: on the relation between general game-playing logics and seeing-to-it-that logics
ABSTRACT. In recent years, several concurrent logical systems for reasoning about agency and social interaction and for representing game properties have been proposed. The aim of the present paper is to put some order in this ‘jungle’ of logics by studying the relationship between the dynamic logic of agency DLA and the game description language GDL. The former has been proposed as a variant of the logic of agency STIT by Belnap, Perloff, and Xu (2001) in which agents' actions are named, while the latter has been introduced in AI as a formal language for reasoning about general game playing. The paper provides complexity results for the satisfiability problems of both DLA and GDL, as well as a polynomial embedding of GDL into DLA.
ABSTRACT. Linear Dynamic Logic on finite traces (LDL_F) is a powerful logic for reasoning about the behaviour of concurrent and multi-agent systems. In this paper, we investigate techniques for both the characterisation and verification of equilibria in multi-player games with goals/objectives expressed using logics based on LDL_F. This study builds upon a generalisation of Boolean games, a logic-based game model of multi-agent systems where players have goals succinctly represented in a logical way. Because LDL_F goals are considered, in the setting we study, iterated Boolean games with goals over finite traces (iBG_F), players' goals can be regular properties that are achieved over a finite, but arbitrarily large, trace. In particular, using alternating automata, the paper investigates automata-theoretic approaches to the characterisation and verification of (Nash) equilibria, shows that the set of Nash equilibria in games with LDL_F objectives is regular, and provides complexity results for the associated automata constructions.
ABSTRACT. Rational verification is the problem of understanding what temporal
logic properties hold of a multi-agent system when agents are
modelled as players in a game, each acting rationally in pursuit of
personal preferences. More specifically, rational verification asks
whether a given property, expressed as a temporal logic formula, is
satisfied in a computation of the system that might be generated if
agents within the system choose strategies for selecting actions
that form a Nash equilibrium. We show that, when agents are modelled
using the Simple Reactive Modules Language, a widely-used system
modelling language for concurrent and multi-agent systems, this
problem can be reduced to a simpler query: whether some iterated
game---in which players have control over a finite set of Boolean
variables and goals expressed as Linear Temporal Logic (LTL)
formulae---has a Nash equilibrium. To better understand the
complexity of solving this kind of verification problem in practice,
we then study the two-player case for various types of LTL goals,
present some experimental results, and describe a general technique
to implement rational verification using MCMAS, a model checking
tool for the verification of concurrent and multi-agent systems.
ABSTRACT. In game theory, as well as in the semantics of game logics, a strategy can be represented by any function from states of the game to the agent's actions. That makes sense from the mathematical point of view, but not necessarily in the context of human behavior. This is because humans are quite bad at executing complex plans, and also rather unlikely to come up with such plans in the first place. In this paper, we adopt the view of bounded rationality, and look only at "simple" strategies in specifications of agents' abilities. We formally define what "simple" means, and propose a variant of alternating-time temporal logic that takes only such strategies into account. We also study the model checking problem for the resulting semantics of ability.
ABSTRACT. The paper proposes a bimodal logic that describes an interplay between coalition strategies and epistemic knowledge. Unlike the existing literature, the paper assumes that a strategy must be executable and verifiable. That is, the strategy of a coalition should be based only on the information distributively known by the coalition and the coalition must be able to verify the result after the strategy is executed. The main technical result of the paper is a sound and complete logical system describing all universal properties expressible in the proposed bimodal language.
Exploiting Anonymity and Homogeneity in Factored Dec-MDPs through Precomputed Binomial Distributions
ABSTRACT. Recent work in decentralized stochastic planning for cooperative agents has focussed on exploiting homogeneity of agents and anonymity in interactions to solve problems with large numbers of agents. A key insight that helps achieve improved scalability with respect to number of agents is to approximate joint expected reward with reward for expected number of agents in all state, action pairs. While such an approximation works well in problems with large numbers
of agents, it performs significantly worse than existing approaches on problems with fewer agents. In this paper, we provide a direct approximation of joint expected reward that is based on offline computation of binomial distributions. Our new technique is not only able to improve quality performance on problems with
large numbers of agents, but is able to perform on par or better than existing best approaches on problems with fewer agents. This is achieved without sacrificing on scalability/run-time performance of the previous work.
GUBS: a utility-based semantic for Goal-Directed Markov Decision Processes
ABSTRACT. A key question in stochastic planning is how to evaluate policies when a goal cannot be reached with probability one. Usually, in the Goal-Directed Markov Decision Process (GD-MDP) formalization, the outcome of a policy is summarized by two criteria: the probability of reaching a goal state and the expected cost to reach a goal state. The dual-criterion solution considers a lexicographic preference, first maximizing the probability of reaching a goal state and then minimizing the expected cost to the goal. Other solutions consider only cost, using mathematical devices to guarantee that every policy has a finite expected cost. In this paper we show that the lexicographic solution does not allow a smooth trade-off between goal and cost, while the expected-cost solution does not define a goal semantics. We propose {\sc GUBS} (Goals with Utility-Based Semantic), a new model to evaluate policies based on expected utility theory; this model defines a trade-off between cost and goal by proposing an axiomatization of goal semantics in GD-MDPs. We show that our model can be solved by any continuous-state MDP solver, and we propose an algorithm to solve a special class of GD-MDPs.
Cost-Based Goal Recognition for Path-Planning [BEST PAPER NOMINEE]
ABSTRACT. Recent developments in plan recognition have established a method of identifying an agent's most probable plan or goal by use of a classical planner and an appeal to the principle of rational action. We examine this technique in the strict context of path-planning. We show that a simpler formula provides an identical result in less than half the time under all but one set of conditions, which we identify. Further, we show that the probability distribution based on this technique is observation-independent and present a revised formula that achieves goal-recognition by reference to the agent's current location only.
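For context, the cost-based approach this line of work builds on (in the style of Ramirez and Geffner; our paraphrase, with $\beta$ a rationality parameter) compares, for each candidate goal $G$, the optimal cost of reaching it while complying with the observations $O$ against the optimal cost while deviating from them:
\[
P(O \mid G) \;=\; \frac{e^{-\beta\,c(G,O)}}{e^{-\beta\,c(G,O)} + e^{-\beta\,c(G,\bar{O})}},
\qquad
P(G \mid O) \;\propto\; P(O \mid G)\,P(G),
\]
where $c(G,O)$ is the cost of a cheapest path to $G$ consistent with the observed movement so far and $c(G,\bar{O})$ the cost of a cheapest path that avoids it; the simpler formula referred to in the abstract yields the same distribution in the path-planning setting at lower computational cost.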
BDI Agent Reasoning with Guidance from HTN Recipes
ABSTRACT. Belief-Desire-Intention (BDI) agent systems and Hierarchical Task Network (HTN) planning systems are two popular approaches to acting and planning, both of which are based on hierarchical, context-based expansion of subgoals. Over the past three decades, various authors have recognised the similarities between the two approaches, and developed methods for making domain knowledge that is embedded in one system accessible to the other, and for augmenting BDI agents with the ability to perform HTN-style lookahead.
This paper makes a novel contribution to this strand of work by developing a formal account of “plugging” available HTN hierarchies (e.g. from the International Planning Competition) into a BDI agent’s “goal-plan” hierarchy. When combined with lookahead-based execution, the agent’s new hierarchies are guaranteed to behave in accordance with the “operational guidelines” embedded in the HTN hierarchies. We also explore using HTNs to obtain BDI hierarchies that can be executed without performing any lookahead, and study the resulting tradeoffs. In particular, we show that in some cases, BDI preconditions cannot replace (or "mimic") HTN-style lookahead without losing completeness, and we characterise the restrictions that are needed in HTNs to be able to encode them as BDI entities.
Progressing Intention Progression: A Call for a Goal-Plan Tree Contest
ABSTRACT. User-supplied domain control knowledge in the form of hierarchically structured Goal-Plan Trees (GPTs) is at the heart of a number of approaches to reasoning about action. Reasoning with GPTs connects the AAMAS community with other communities such as automated planning, and forms the foundation for important reasoning capabilities, especially intention progression in Belief-Desire-Intention (BDI) agents. Research on GPTs has a long history but suffers from fragmentation and lack of common terminology, data formats, and enabling tools. One way to address this fragmentation is through a competition. Competitions are now familiar as a means to foster research and challenge the state of the art. The AAMAS conference has a number of associated competitions, such as the Trading Agent Competition, while agent research is showcased at competitions such as RoboCup. We therefore issue a call for a Goal-Plan Tree Contest, with the ambition of drawing together a community and incentivizing research.
Using Virtual Narratives to Explore Children's Story Understanding
ABSTRACT. Interactive Narratives are systems that use automated narrative generation techniques to create multiple story variants which can be shown to an audience, as "virtual" narratives, using cinematic staging techniques. Previous research in this area has focused on assessment of aspects such as the quality of the automatically generated narratives and their acceptance by the audience. However, in our work we deviate from this to explore the use of interactive narratives to support cognitive psychology experiments in story understanding. We hypothesized that the use of virtual narratives would enable narrative comprehension to be studied independently of linguistic phenomena. To assess this, we developed a demonstration interactive narrative, featuring a virtual environment (Unity3D engine) based on a pre-existing children's story, which allows variants of the original story to be generated and "told" via visualization in the 3D world. In the paper we introduce a narrative generation mechanism that provides control over the insertion of cues facilitating story understanding, whilst also ensuring that the plot itself is unaffected. An intuitive user interface allows experimenters to insert and order cues and specific events, while the narrative generation techniques ensure these requests are effected in a consistent fashion. We also report the results of a field experiment with children (age 9-10) that demonstrates the potential for the use of virtual narratives in story understanding experiments. Our results demonstrated acceptance of virtual narratives, the usability of the system, and the impact of cue insertion on inference and story understanding.
MISER: Mise-En-Scene Region Support for Staging Narrative Actions in Interactive Storytelling
ABSTRACT. The recent increase in interest in Interactive Storytelling systems, spurred on by the emergence of affordable virtual reality technology, has brought with it a need to address the way in which narrative content is visualized through the complex staging of multiple narrative agents' behaviors within virtual story worlds. In this work we address the challenge of automating several aspects of staging the activities of a population of narrative agents and their interactions, where agents can have differing levels of narrative relevance within the situated narrative actions.
Our solution defines an approach that integrates the use of multiple dynamic regions within a virtual story world, specified via a semantic representation that is able to support the staging of narrative actions through the behaviors of the primary and supporting agents that are involved. This encompasses both the mechanics of dealing with the narrative discourse level and the interaction with the narrative generation layer to account for any dynamic modifications of the virtual story world. We refer to this approach as MISE-en-scene Region (MISER) support.
In the paper, we describe our approach and its integration as part of a fully implemented Interactive Storytelling system. We illustrate the work through detailed examples of short narrative instantiations. We present the results of our evaluation which clearly demonstrate the potential of the MISER approach, as well as its scalability.
Pedagogical Agents as Team Members: Impact of Proactive and Pedagogical Behavior on the User
ABSTRACT. In the context of a virtual environment for the learning of cooperative tasks, learners have to coordinate with one or more autonomous agents in order to perform a task. This coordination requires that the human and agent team members reason and dialogue about their resources, action plans, and shared actions. This article proposes a new agent architecture called PC$^2$BDI designed to generate reactive, proactive, and pedagogical behaviors of the agent based on three agendas: the task to be performed, dialogues, and the pedagogical scenario.
An experimental study, in the context of the learning of a procedural activity in a virtual environment involving 3 team members (1 human and 2 agents), is presented to evaluate the effect of the agent behavior on a learner. The results show that the use of proactive pedagogical agents improves the learner's engagement and task learning.
Incorporating Emotion Perception into Opponent Modeling for Social Dilemmas
ABSTRACT. Many everyday decisions involve a social dilemma: cooperation can enhance joint gains, but also make one vulnerable to exploitation. Emotion and emotional signaling are an important element of how people resolve these dilemmas. With the rise of affective computing, emotion is also an important element of how people resolve these dilemmas with machines. In this article, we learn a predictive model of how people make decisions in an iterative social dilemma. We further show that the accuracy of this model improves by incorporating a player's emotional displays as input. Finally, we show how this model can be used to perform ``social planning'': i.e., to generate a sequence of actions and expressions that achieve social goals (such as maximizing individual rewards). These techniques could be used to enhance machine understanding of human behavior, as social decision-aids, or to drive the actions of virtual and robotic agents.
"Is It Just Me?": Evaluating Attribution of Negative Feedback as a Function of Virtual Instructor’s Gender and Proxemics
ABSTRACT. Virtual agents are used in a number of different outcome-based contexts such as physical and mental health, skill-based training, as well as classroom learning and pedagogy. Virtual agents in such applications are largely designed so that they project a positive attitude and feedback towards the human participant. Human-human interactions, however, are certainly not exclusively positive in valence. For example, teachers and educators engage in both positive and negative feedback strategies for pedagogical outcomes. While the distinct effects of positive and negative feedback on learning are well established, few studies have attempted to examine the effects of negative feedback across different combinations of an instructor's gender and proxemics-based physical behavior. This study explores this very question with a 2 (instructor gender) x 2 (proxemic behavior) between-subjects design. In this experiment, participants (N=63) actively engage in a learning task with a male/female virtual instructor that provides negative feedback while either standing stationary or physically approaching the participant. Based on the different deliveries of the negative feedback, the study aimed to identify the sources of variation in participants' reactions to the negative feedback, namely patterns of attribution and both behavioral and physiological measurements of emotions. The results indicate that participants attribute greater self-blame (internal attribution) for their purported poor performance when interacting with the female virtual instructor than when interacting with the male virtual instructor. Participants also generally exhibited greater positive affect in response to the female virtual instructor than to the male virtual instructor. These results are highly relevant both to the design of virtual agents and to our understanding of the role of gender and behavior in human-human, non-peer interaction.
Data-driven simulation and optimization for incident response in urban railway networks
ABSTRACT. In this article we introduce an integrated agent-based train, passenger and incident simulation engine for data-driven incident response in urban railway networks. We model the movement of passengers and trains as individual agents behaving according to parsimonious models defined by data availability. Appropriate statistical routines are implemented for model calibration. We also design a generic incident model appropriate for typical localized mechanical failure scenarios in which the transport supply is adversely impacted in a short spatio-temporal window. Given the brief and localized nature of these events, a mathematical programming formulation is proposed, which computes the optimal action plan for a specific incident. The set of action plans considered includes re-scheduling existing train services as well as running temporary services. The numerical performance of the simulation engine introduced in this work is presented using a large dataset of real anonymized smart card data. The results of the proposed optimization framework are then evaluated using real incident scenarios.
Real-time Adaptive Tolling Scheme for Optimized Social Welfare in Traffic Networks
ABSTRACT. Connected and autonomous vehicle technology has advanced rapidly in recent years.
These technologies create possibilities for
AI-based traffic management techniques. Developing such techniques is an important challenge and opportunity for the AI community as it requires synergy between experts in game theory, multiagent systems, behavioral science, and flow optimization.
This paper takes a step in this direction by considering traffic flow optimization through setting and
broadcasting of dynamic and adaptive tolls.
Previous tolling schemes either were not adaptive in real time, were not scalable to large networks, or did not optimize the entire system. Moreover, previous schemes made strong assumptions about observable demands, road capacities, and user homogeneity.
This paper introduces Delta-tolling, a novel tolling scheme that is adaptive in real time and able to scale to large networks. We provide theoretical evidence showing that, under certain assumptions, Delta-tolling is equivalent to Marginal-Cost Tolling, which provably leads to system-optimal traffic flow, and empirical evidence showing that Delta-tolling increases social welfare (by up to 33%) in two traffic simulators with markedly different modeling assumptions.
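A minimal sketch of a Delta-tolling-style update as we read the scheme (parameter names `beta` and `R` are ours): each link's toll tracks the gap between its currently observed travel time and its free-flow travel time, smoothed over successive observations.

```python
# Sketch of a Delta-tolling-style update (our reading; beta and R are tuning
# parameters, t_obs the observed link travel time, t_free the free-flow time).
def update_toll(toll, t_obs, t_free, beta=4.0, R=0.1):
    target = beta * max(t_obs - t_free, 0.0)   # toll proportional to current delay
    return (1.0 - R) * toll + R * target       # exponential smoothing toward the target

toll = 0.0
for t_obs in [10.0, 12.5, 15.0, 11.0]:         # hypothetical observed travel times
    toll = update_toll(toll, t_obs, t_free=10.0)
    print(round(toll, 2))
```

Because such an update only needs observed link travel times, it requires no demand or capacity model, which is what makes a scheme of this kind usable in real time.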
Lifelong Multi-Agent Path Finding for Online Pickup and Delivery Tasks
ABSTRACT. The multi-agent path finding (MAPF) problem has recently received a lot of attention. However, it does not capture important characteristics of many real-world domains, such as automated warehouses, where agents are constantly engaged with new tasks and destinations. In this paper, we study the lifelong version of the MAPF problem, dubbed the multi-agent pickup and delivery (MAPD) problem. In the MAPD problem, agents not only avoid collisions with each other but also attend to delivery tasks that can appear at any time, each with corresponding pickup and delivery locations. Each agent that attends to an assigned task has to first move to its pickup location and then to its delivery location before it attends to the next task. We present three MAPD algorithms based on existing MAPF methods. Theoretically, we show that all three algorithms are complete if the MAPD problem instance satisfies a realistic sufficient condition and qualifies as a well-formed infrastructure. Experimentally, we show that the algorithms scale well for MAPD instances with dozens of agents and hundreds of tasks, as required for real-world applications.
ABSTRACT. Reinforcement learning (RL) is a challenging task, especially in highly competitive multiagent scenarios. We consider the route choice problem, in which self-interested drivers aim at choosing routes that minimise their travel times. Employing RL here is challenging because agents must adapt to each other's decisions. In this paper, we investigate how agents can overcome this difficulty by minimising the regret associated with their decisions. Regret measures how much worse an agent performs on average compared to the best fixed action in hindsight. We present a simple yet effective regret-minimising algorithm to address this scenario. To this end, we introduce the action regret, which measures the performance of each route in comparison to the best one, and employ it as the reinforcement signal. Given that agents do not know the cost of all routes in advance (except for the ones currently taken), we also devise a method through which they can estimate the action regret. We analyse the theoretical properties of our method and prove that it minimises the agents' regret by means of the action regret. Furthermore, we provide formal guarantees on the agents' convergence to a $\phi$-approximate User Equilibrium (UE), where $\phi$ is the bound on the agents' regret. To the best of our knowledge, this is the first work in which RL agents are formally proven to converge to an approximate UE, without further assumptions, in the context of route choice.
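A minimal sketch of the action-regret idea as we read it (the estimation details in the paper differ; `alpha` and `eps` are illustrative parameters of ours): each driver maintains estimated average travel times per route, and the regret of the route just taken is its estimate minus that of the best-estimated route.

```python
# Sketch (assumptions, not the paper's exact algorithm): epsilon-greedy route
# choice with estimated costs; the action regret of the taken route is returned.
import random

class RegretDriver:
    def __init__(self, n_routes, alpha=0.1, eps=0.1):
        self.est = [0.0] * n_routes      # estimated average cost per route
        self.alpha, self.eps = alpha, eps

    def choose(self):
        if random.random() < self.eps:
            return random.randrange(len(self.est))
        return min(range(len(self.est)), key=lambda a: self.est[a])

    def update(self, route, observed_cost):
        self.est[route] += self.alpha * (observed_cost - self.est[route])
        best = min(self.est)
        return self.est[route] - best    # action regret of the route just taken
```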
ABSTRACT. We consider a social choice problem where only a small number of people out of a large population are sufficiently available or motivated to vote. A common solution to increase participation is to allow voters to use a proxy, that is, to transfer their voting rights to another voter. Considering social choice problems on metric spaces, we compare voting with and without the use of proxies to see which mechanism better approximates the optimal outcome, and we characterize the regimes in which proxy voting is beneficial.
When voters' opinions are located on an interval, both the median mechanism and the mean mechanism are substantially improved by proxy voting. When voters vote on many binary issues, proxy voting is better when the sample of active voters is too small to provide a good outcome. Our theoretical results extend to situations where available voters choose strategically whether to participate. We support our theoretical findings with empirical results showing substantial benefits of proxy voting on simulated and real preference data.
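One natural instantiation of the interval setting (our assumption about the delegation rule, for illustration only): each inactive voter is represented by the nearest active voter, and the mechanism outputs the weighted median of the active voters' positions.

```python
# Toy proxy-voting median on an interval (illustrative delegation rule):
# inactive voters delegate to the nearest active voter; the weighted median wins.
def proxy_median(active, inactive):
    weights = {a: 1 for a in active}                   # each active voter votes for herself
    for x in inactive:
        proxy = min(active, key=lambda a: abs(a - x))  # delegate to nearest active voter
        weights[proxy] += 1
    total, acc = sum(weights.values()), 0
    for a in sorted(weights):                          # weighted median
        acc += weights[a]
        if 2 * acc >= total:
            return a

print(proxy_median(active=[0.2, 0.6, 0.9], inactive=[0.1, 0.15, 0.55, 0.8]))
```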
Real Candidacy Games: A New Model for Strategic Candidacy
ABSTRACT. We introduce a novel model of strategic candidacy, Real Candidacy Games (RCGs), where candidates have a continuous range of positions that affect their attractiveness to voters. In addition, we assume that candidates may have their own non-trivial preferences over the candidate set. We study RCGs with restricted and unrestricted positioning strategies to establish conditions for the existence of a Nash Equilibrium (NE). That is, we investigate under what voting rules and tie-breaking schemes a stable candidate positioning exists. While we obtain positive results for some classes of voting rules (e.g., Condorcet-consistent rules), we also show that for some scoring rules there are examples without a NE for an arbitrarily large number of voters.
ABSTRACT. In Doodle polls, each voter approves a subset of the available alternatives
according to his preferences. While such polls can be captured by the standard
models of Approval voting, Zou et al. analyse real-life Doodle poll data
and conclude that poll participants' behaviour seems to be affected by considerations other than their intrinsic preferences over the alternatives. To capture this phenomenon, they propose a model of social voting, where voters approve their top alternatives as well as additional `safe' choices so as to appear cooperative. The predictions of this model turn out to be consistent with the real-life data. However, Zou et al. do not attempt to rationalise the voters' behaviour in the context of social voting: they explicitly describe the voters' strategies rather than explain how these strategies arise from the voters' preferences.
In this paper, we complement the work of Zou et al. by putting forward a model in which the behaviour described by Zou et al. arises as an equilibrium strategy. In our model, a voter derives a bonus from approving each additional alternative, up to a certain cap. We show that trembling hand perfect Nash equilibria of our model behave consistently with the model of Zou et al. Importantly, placing a cap on the total bonus is an essential component of our model: in the absence of the cap, all Nash equilibria are very far from the behaviour observed in Doodle polls.
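In our notation (a paraphrase of the model described above, not a formula from the paper), the utility of voter $i$ casting approval ballot $A_i$ has roughly the form
\[
u_i \;=\; v_i\bigl(w(A_1,\dots,A_n)\bigr) \;+\; \min\bigl(\delta\,|A_i|,\; C\bigr),
\]
where $w(\cdot)$ is the winning alternative, $v_i$ is voter $i$'s intrinsic value for it, $\delta > 0$ is the per-approval bonus, and $C$ is the cap. The result above says that with a finite cap $C$ the trembling-hand perfect equilibria match the behaviour reported by Zou et al., whereas without the cap they do not.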
ABSTRACT. In recent years, there has been an increasing interest within the computational social choice community in models where voters are biased towards specific behaviors or have secondary preferences. An important representative example of this approach is the model of truth bias, where voters prefer to be honest about their preferences unless they are pivotal. This model has been demonstrated to be an effective tool for controlling the set of pure Nash equilibria in a voting game, which otherwise lacks predictive power. However, in the models that have been used thus far, the bias is binary, i.e., the final utility of a voter depends on whether he cast a truthful vote or not, independently of the type of lie.
In this paper, we introduce a more robust framework, and eliminate this limitation, by investigating truth-biased voters with variable bias strength. Namely, we assume that even when voters face incentives to lie towards a better outcome, the ballot distortion from their truthful preference incurs a cost, measured by a distance function. We study various such distance-based cost functions and explore their effect on the set of Nash equilibria of the underlying game. Intuitively, one might expect that such distance metrics may induce very similar behavior. To our surprise, we show that the presented metrics exhibit quite different equilibrium behavior.
A Restricted Markov Tree Model for Inference and Generation in Social Choice with Incomplete Preferences
ABSTRACT. We introduce a probabilistic graphical model of ranked preferences for social choice based on a restricted version of a $k$th-order Markov tree. The system is similar to Plackett's model and is based on the intuition that, in some domains, an agent's next most preferred alternative is highly predictable given a few of the immediately preceding alternatives. We provide a bound on the data requirements for learning the parameters of such a model and show eventual consistency between the most probable sequences of the model and both the centroid of a Mallows model and the induced ranking of a RUM, theoretically and on artificial data. This is followed by a full evaluation of the model as a social choice system, including comparisons with existing approaches on 6 real-world datasets. An application of the system to a simulated multiagent coordination task is also demonstrated, in which the proposed system offers pronounced advantages over existing approaches.
Combining Incremental Strategy Generation and Branch and Bound Search for Computing Maxmin Strategies in Imperfect Recall Games
ABSTRACT. Extensive-form games with imperfect recall are an important model of dynamic games where the players are allowed to forget previously known information. Often, imperfect recall games are the result of an abstraction algorithm that simplifies a large game with perfect recall. Unfortunately, solving an imperfect recall game poses fundamental problems, since a Nash equilibrium need not exist. As an alternative, one can compute maxmin strategies that guarantee an expected outcome. The only existing algorithm for computing maxmin strategies in imperfect recall games, however, requires approximating a bilinear mathematical program whose size is proportional to the size of the whole game and thus has limited scalability. We propose a novel algorithm for computing maxmin strategies that combines this approximate algorithm with an incremental strategy-generation technique designed previously for extensive-form games with perfect recall. Experimental evaluation shows that the proposed algorithm often builds only a fraction of the whole game and improves scalability by several orders of magnitude.
Computing Approximate Pure Nash Equilibria in Digraph k-Coloring Games
ABSTRACT. We investigate approximate pure Nash equilibria in digraph $k$-coloring games, where we are given an unweighted directed graph together with a set of $k$ colors. Vertices represent agents and arcs their mutual unidirectional interests. The strategy set of each agent $v$ consists of the $k$ colors, and the payoff of $v$ in a given state or coloring is given by the number of its outgoing neighbors with a color different from that of $v$. Such games form some of the basic payoff structures in game theory, model many real-world scenarios with selfish agents, and extend or are related to several fundamental classes of games.
It is known that deciding whether the game admits a pure Nash equilibrium is NP-complete. Therefore, we focus on designing polynomial-time algorithms that return approximate Nash equilibria. Informally, we say that a coloring is a $\gamma$-Nash equilibrium (for some $\gamma \geq 1$) if no agent can strictly improve her payoff by a factor of $\gamma$ by changing color.
We first propose a deterministic polynomial time algorithm that, for any $k \geq 3$, returns a $k$-coloring that is a $\Delta_o(G)$-Nash equilibrium, where $\Delta_o(G)$ is the maximum outdegree of the digraph.
We then provide our two main results:
i) By exploiting the constructive version of the well-known Lov\'{a}sz Local Lemma, we show a randomized algorithm with polynomial expected running time that, given any constant $k \geq 2$, computes a constant-Nash equilibrium for a broad class of digraphs, i.e., for digraphs where, for any $v \in V$, $\delta_{o}^{v}(G) = \Omega(\ln \Delta_o(G) + \ln \Delta_i(G))$, where $\Delta_o(G)$ (resp. $\Delta_i(G)$) is the maximum outgoing (resp. maximum ingoing) degree of $G$, and $\delta_{o}^{v}(G)$ is the outgoing degree of agent $v$.
ii) For generic digraphs, we show a deterministic polynomial time algorithm that computes a ($1+\epsilon$)-Nash equilibrium, for any $\epsilon > 0$, by using $O(\frac {\log n} {\epsilon} )$ colors.
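The payoff and the $\gamma$-equilibrium notion above can be illustrated with a small sketch (toy code, not the paper's algorithms):

```python
# Toy illustration of digraph k-coloring games: payoffs and a check for a
# gamma-approximate pure Nash equilibrium. arcs[v] lists v's outgoing neighbors.
def payoff(v, coloring, arcs):
    # Number of v's outgoing neighbors colored differently from v.
    return sum(1 for u in arcs[v] if coloring[u] != coloring[v])

def is_gamma_equilibrium(coloring, arcs, k, gamma=1.0):
    for v in arcs:
        current = payoff(v, coloring, arcs)
        for c in range(k):                   # every unilateral color change
            deviated = dict(coloring)
            deviated[v] = c
            if payoff(v, deviated, arcs) > gamma * current:
                return False
    return True

arcs = {0: [1, 2], 1: [2], 2: [0]}           # hypothetical digraph
print(is_gamma_equilibrium({0: 1, 1: 1, 2: 0}, arcs, k=2))   # True for this coloring
```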
Safely Using Predictions in General-Sum Normal Form Games
ABSTRACT. It is often useful to predict opponent behavior when playing a general-sum two-player normal form game. However, best-responding to an inaccurate prediction can lead to a strategy that is vulnerable to exploitation. This paper proposes a novel method, Restricted Stackelberg Response with Safety (RSRS), for an agent to select a strategy with which to respond to a prediction. The agent uses the confidence it has in the prediction and a safety margin, which reflects the level of risk it is willing to tolerate, to make a controlled trade-off between best-responding to the prediction and providing a guarantee on worst-case performance. We describe an algorithm that selects parameter values for RSRS to produce strategies that play well against the prediction, deal with a best-responding opponent, and guard against worst-case outcomes. We report results obtained by the algorithm on multiple general-sum games against different opponents.
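One way to read the trade-off described above (our paraphrase, not necessarily the paper's exact formulation): given a predicted opponent strategy $\hat{\pi}$, the agent's maximin value $v_{\mathrm{safe}}$, and a safety margin $\epsilon$ chosen according to the confidence in the prediction, the agent plays
\[
\sigma^{*} \;\in\; \arg\max_{\sigma}\; u(\sigma, \hat{\pi})
\quad\text{subject to}\quad
\min_{\pi}\, u(\sigma, \pi) \;\geq\; v_{\mathrm{safe}} - \epsilon,
\]
so that $\epsilon = 0$ recovers a purely safe (maximin) strategy, while a large $\epsilon$ approaches an unconstrained best response to the prediction.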
ABSTRACT. We aim to find a winning strategy that determines the arguments a proponent should assert during a dialogue such that it will successfully persuade its opponent of some goal arguments, regardless of the strategy employed by the opponent. By restricting the strategies we consider for the proponent to what we call \emph{simple strategies} and by modelling this as a planning problem, we are able to use an automated planner to generate \emph{optimal simple strategies} for realistically sized problems. These strategies guarantee with a certain probability (determined by the proponent's uncertain model of the arguments known to the opponent) that the proponent will be successful no matter what strategy the opponent employs. Our model accounts for the possibility that the proponent, when asserting arguments, may give away knowledge that the opponent can subsequently use against it; we investigate how this affects both the time taken to find an optimal simple strategy and its probability of success.
Are ranking semantics sensitive to the notion of core?
ABSTRACT. In this paper we study the impact of two notions of core on the output of ranking semantics in logical argumentation frameworks. We consider the existential rules fragment, a language widely used in Semantic Web and Ontology-Based Data Access applications. Using burden semantics as an example, we show how some ranking semantics yield different outputs on the argumentation graph and its cores. We extend existing results in the literature regarding core equivalences on logical argumentation frameworks and propose the first formal characterization of core-induced modification for a class of ranking semantics satisfying given postulates.
Complexity Results for Aggregating Judgments using Scoring or Distance-Based Procedures
ABSTRACT. Judgment aggregation is an abstract framework for studying collective decision making by aggregating individual opinions on logically related issues. Important types of judgment aggregation methods are scoring and distance-based methods, many of which can be seen as generalisations of voting rules. An important question to investigate for judgment aggregation methods is how hard it is to find a collective decision by applying them. In this article we study the complexity of this "winner determination" problem for several scoring and distance-based judgment aggregation procedures. Such procedures aggregate judgments by assigning values to judgment sets. Our work fills in some of the last gaps in the complexity landscape for winner determination in judgment aggregation. Our results reaffirm that aggregating judgments is computationally hard and strongly point towards the necessity of analysing approximation methods or parameterized algorithms in judgment aggregation.
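As a concrete example of the kind of procedure studied, the brute-force sketch below (illustrative only) computes the winner for one distance-based rule: among all consistent judgment sets, pick the one minimizing the sum of Hamming distances to the individual judgments. Its exhaustive enumeration over the issues is in line with the hardness results discussed above.

```python
# Toy brute-force winner determination for a Hamming-distance-based rule.
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def distance_winner(judgments, is_consistent):
    """judgments: list of 0/1 tuples over the issues; is_consistent: rationality check."""
    n_issues = len(judgments[0])
    candidates = [j for j in product((0, 1), repeat=n_issues) if is_consistent(j)]
    return min(candidates, key=lambda c: sum(hamming(c, j) for j in judgments))

# Doctrinal-paradox style agenda over issues (p, q, p_and_q): consistency
# requires the third issue to equal the conjunction of the first two.
def consistent(j):
    return j[2] == (j[0] and j[1])

print(distance_winner([(1, 1, 1), (1, 0, 0), (0, 1, 0)], consistent))
```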
Reinforcement Learning for Multi-Step Expert Advice
ABSTRACT. Complex tasks over heterogeneous data sources, such as finding and linking named entities in text documents or detecting objects in images, often require multiple steps to be solved in a processing pipeline.
In most cases, there exist numerous, exchangeable software components for a single step, each an ``expert'' for data with certain characteristics.
Which expert to apply to which observed data instance in which step becomes a challenge that is hard even for humans to decide.
In this work, we treat the problem as a Single-Agent System (SAS) in which a central agent learns how to best exploit the experts.
We therefore define locality-sensitive relational measures for experts and data points, so-called ``meta-dependencies'', to assess expert performance, and use them for decision-making via model-free online and batch Reinforcement Learning (RL) approaches based on techniques from Contextual Bandits (CBs) and Statistical Relational Learning (SRL).
The resulting system automatically learns to pick the best pipeline of experts for a given set of data points.
We evaluate our approach for Entity Linking on text corpora with heterogeneous characteristics (such as news articles or tweets).
Our empirical results show improved estimation of expert accuracies as well as improvements over the out-of-the-box performance of the original experts, without manual tuning.
Applying Copeland Voting to Design an Agent-Based Hyper-Heuristic
ABSTRACT. Meta-heuristics are algorithms that are applied to solve problems when conventional algorithms cannot find good solutions in reasonable time. As there are many possible meta-heuristics, finding the most suitable meta-heuristic for a given problem is not a trivial task. In order to make this choice, one can design hyper-heuristics. In the literature, one can find agent-based research that proposes frameworks where meta-heuristics are treated as agents that solve a given problem in a collaborative or competitive way. Most of these works focus on single-objective meta-heuristics; other works focus on how to select multi-objective meta-heuristics, but not using an agent-based approach. We present in this work an agent-based hyper-heuristic for choosing the most suitable meta-heuristic for a given problem. Our approach performs a cooperative Copeland voting procedure over five different metrics to decide which of three competing meta-heuristics should execute during a certain processing time. We used the WFG (Walking Fish Group) test suite with two and three objectives to verify the performance of the proposed approach. The obtained results show that in all cases our strategy found the most suitable meta-heuristic and achieved competitive results against the state of the art.
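A sketch of the Copeland voting step (illustrative; the metric and meta-heuristic names below are hypothetical examples, not necessarily those used in the paper): each metric ranks the candidate meta-heuristics, and the one with the highest Copeland score, i.e., pairwise majority wins minus losses, is selected to run next.

```python
# Illustrative Copeland voting among quality metrics to pick a meta-heuristic.
from itertools import combinations

def copeland_winner(rankings):
    """rankings: one ranked list per metric, best meta-heuristic first."""
    candidates = rankings[0]
    score = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        prefer_a = sum(r.index(a) < r.index(b) for r in rankings)
        prefer_b = len(rankings) - prefer_a
        if prefer_a > prefer_b:
            score[a] += 1; score[b] -= 1
        elif prefer_b > prefer_a:
            score[b] += 1; score[a] -= 1
    return max(score, key=score.get)

rankings = [  # e.g. one ranking per metric (hypothetical metrics and algorithms)
    ["NSGA-II", "SPEA2", "MOEA/D"],
    ["MOEA/D", "NSGA-II", "SPEA2"],
    ["NSGA-II", "MOEA/D", "SPEA2"],
    ["SPEA2", "NSGA-II", "MOEA/D"],
    ["NSGA-II", "SPEA2", "MOEA/D"],
]
print(copeland_winner(rankings))  # -> NSGA-II
```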
A Multiagent System Approach to Scheduling Devices in Smart Homes
ABSTRACT. Demand-side management (DSM) in the smart grid allows customers to make autonomous decisions on their energy consumption, helping energy providers to reduce the peaks in load demand. The automated scheduling of smart devices in residential and commercial buildings plays a key role in DSM. Due to data privacy and user autonomy, such an approach is best implemented through distributed multi-agent systems. This paper makes the following contributions:
(i) It introduces the Smart Home Device Scheduling (SHDS) problem, which formalizes the device scheduling and coordination problem across multiple smart homes as a multi-agent system;
(ii) It describes a mapping of this problem to a Distributed Constraint Optimization Problem (DCOP) and to a potential game;
(iii) It proposes a distributed algorithm for the SHDS problem; and
(iv) It presents empirical results from a physically distributed system of Raspberry Pis, each capable of controlling smart devices through hardware interfaces.
Label Correction and Event Detection for Electricity Disaggregation
ABSTRACT. Electricity disaggregation focuses on identifying individual appliances from one or more aggregate signals. By reporting detailed appliance consumption to users, disaggregation has the potential to reduce electrical waste in residential and commercial sectors. However, widespread application of existing methods is limited by two critical shortcomings. First, supervised learning methods implicitly assume error-free labels in training data, an unrealistic expectation that lowers classification accuracy on imperfectly-labeled consumer data. Second, both supervised and unsupervised learning methods suffer from the need to tune multiple parameters to individual appliances and/or datasets, limiting accurate classification across large numbers of samples. To address these limitations, this paper introduces the implementation of Bayesian changepoint detection (BCD) with necessary adaptations to electricity disaggregation. We introduce an algorithm to effectively apply changepoint detection to automatically correct labels. We then apply the same BCD method to event detection to identify transitions between appliances' on and off states, the first such parameter-free application. Performance is evaluated using 3 fully labeled, publicly available datasets containing over 250 appliances across 11 houses. Results show both BCD applications are competitive, and in some cases outperform existing state-of-the-art methods, overcoming the need for manual parameter tuning and advancing disaggregation towards widespread, real-world deployment.
A Distributed Constraint Optimization (DCOP) Approach to the Economic Dispatch with Demand Response
ABSTRACT. With the growing complexity of the current power grid, there is an increasing need for intelligent operations coordinating energy supply and demand. A key feature of the smart grid vision is that intelligent mechanisms will coordinate the production, transmission, and consumption of energy in a distributed and reliable way. Economic Dispatch (ED) and Demand Response (DR) are two key problems that need to be solved to achieve this vision. In traditional operations, ED and DR are implemented separately, despite the strong inter-dependences between these two problems.
Against this background, (1) we propose an integrated approach to the ED and DR problems that simultaneously maximizes the benefits of customers and minimizes the generation costs; (2) we introduce an effective multi-agent-based algorithm, based on the distributed constraint optimization (DCOP) model, acting on direct control of both generators and dispatchable loads;
(3) we employ General Purpose Graphical Processing Units (GPGPUs) to cope with the high complexity of the problem and speed up the solving time; and
(4) we empirically evaluate the proposed algorithms on standard IEEE bus systems and test the stability of the proposed solution with a state-of-the-art power system simulator on the IEEE 30-bus system.
Save Money or Feel Cozy? A Field Experiment Evaluation of a Smart Thermostat that Learns Heating Preferences
ABSTRACT. We present the design of a fully autonomous smart thermostat that supports end-users in managing their heating preferences in a real-time pricing regime. The thermostat uses a machine learning algorithm to learn how a user wants to trade off comfort versus cost. We evaluate the thermostat in a field experiment in the UK involving 30 users over a period of 30 days. We make two main contributions. First, we study whether our smart thermostat enables end-users to handle real-time prices, and in particular, whether machine learning can help them. We find that the users trust the system and that they can successfully express their preferences; thus, the smart thermostat enables the users to manage their heating given real-time prices. Moreover, our machine learning-based thermostats outperform a baseline without machine learning in terms of usability. Second, we present a quantitative analysis of the users' economic behavior, including their reaction to price changes, their price sensitivity, and their comfort-cost trade-offs. We find a wide variety regarding the users' willingness to make trade-offs. But in aggregate, the users' thermostat settings were sufficient to enable a large amount of demand response, reducing the average energy consumption during peak hours by 38%.
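To make the comfort-versus-cost trade-off concrete, here is a minimal, hypothetical model, not the paper's learning algorithm: assume a user's utility for setpoint $T$ at price $\pi$ is $u(T) = -\alpha\,(T - T^{*})^{2} - \pi\,c\,T$, where $T^{*}$ is the preferred temperature and $c\,T$ approximates energy use; the user's chosen setpoint is then $T^{*} - \pi c/(2\alpha)$, so a thermostat that observes the setpoints picked at different prices can recover the single trade-off weight $\alpha$ by least squares.

    import numpy as np

    # Hypothetical model: u(T) = -alpha*(T - T_pref)**2 - price*c*T,
    # so the user's chosen setpoint is T_pref - price*c/(2*alpha).
    T_PREF, C, TRUE_ALPHA = 21.0, 0.5, 0.8      # illustrative constants

    rng = np.random.default_rng(1)
    prices = rng.uniform(0.05, 0.40, size=30)   # 30 days of real-time prices
    setpoints = T_PREF - prices * C / (2 * TRUE_ALPHA) + rng.normal(0, 0.05, 30)

    # A least-squares fit of setpoint against price recovers the trade-off weight.
    slope, intercept = np.polyfit(prices, setpoints, 1)
    alpha_hat = -C / (2 * slope)

    def recommend_setpoint(price, alpha=alpha_hat, t_pref=intercept):
        """Setpoint the learned model would choose at a given price."""
        return t_pref - price * C / (2 * alpha)

    print(round(alpha_hat, 2), round(recommend_setpoint(0.30), 1))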
Adaptive Pricing Mechanisms for On-Demand Mobility
ABSTRACT. We consider on-demand car rental systems for public transportation. In these systems, demands are often unbalanced across different parking stations, necessitating costly manual relocations of vehicles. To address this so-called ``deadheading'' effect and maximise the operator's revenue, we propose two novel pricing mechanisms. These adaptively adjust the prices between origin and destination stations depending on their current occupancy, probabilistic information about the customers' valuations and estimated relocation costs. In so doing, the mechanisms incentivise drivers to help rebalance the system and place a premium on trips that lead to costly relocations. We evaluate the mechanisms in a series of experiments using real historical data from an existing on-demand mobility system in a French city. We show that our mechanisms achieve an up to 64% increase in revenue for the operator and at the same time up to 36% fewer relocations.
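As an illustration only, since the paper's mechanisms additionally use probabilistic information about customer valuations, an occupancy-aware price adjustment can be sketched as a simple rule: trips that drain an already-empty origin or fill an already-full destination pay a premium tied to the expected relocation cost, while trips that rebalance the system receive a discount.

    # Minimal, hypothetical occupancy-based pricing rule (illustrative constants).
    BASE_FARE = 6.0          # base price of a trip
    RELOCATION_COST = 4.0    # estimated cost of manually moving one vehicle
    TARGET = 0.5             # desired fill ratio at every station

    def trip_price(origin_fill, dest_fill):
        """Price a trip given the fill ratios (0..1) of origin and destination.

        Leaving a nearly empty origin or arriving at a nearly full destination
        makes a manual relocation more likely, so such trips carry a premium;
        trips in the opposite, rebalancing direction get a discount."""
        imbalance = (TARGET - origin_fill) + (dest_fill - TARGET)
        return max(0.0, BASE_FARE + RELOCATION_COST * imbalance)

    print(trip_price(0.1, 0.9))   # aggravates the imbalance -> 9.2
    print(trip_price(0.9, 0.1))   # rebalances the system    -> 2.8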
Uniform Information Exchange in Multi-channel Wireless Ad Hoc Networks
ABSTRACT. Information exchange is a basic primitive for maintaining the smooth running of a network. In particular,
$k$ packets are initially stored at $k$ nodes, and the problem is to
disseminate the $k$ packets to the whole network with the objective of minimizing the time used.
We study this problem in single-hop multi-channel networks of $n$ nodes, and aim to devise uniform distributed protocols that do not rely on any prior knowledge of network parameters, such as the network size $n$ or the number of packet holders $k$. Uniform protocols scale better and are more suitable for real-world implementation. Specifically, we propose a uniform distributed protocol that with high probability accomplishes the dissemination
in $O(k/\mathcal{F} + \mathcal{F}\cdot \log n)$ rounds, assuming $\mathcal{F}$
available channels. This protocol is asymptotically optimal
when $k$ is large ($k \geq \mathcal{F}^2 \cdot \log n$), and provides the best possible linear speedup with multiple channels compared with using a single channel. To the best of our knowledge, this
is the first uniform protocol for information exchange in
multi-channel networks.
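One way to see the optimality claim: since at most one transmission per channel can succeed in a round, at most $\mathcal{F}$ of the $k$ initial packet holders can deliver their packet per round, so every protocol needs $\Omega(k/\mathcal{F})$ rounds. When $k \geq \mathcal{F}^2 \cdot \log n$ we have $\mathcal{F} \cdot \log n \leq k/\mathcal{F}$, hence $O(k/\mathcal{F} + \mathcal{F}\cdot\log n) = O(k/\mathcal{F})$, which matches this lower bound and is an $\mathcal{F}$-fold (linear) speedup over the single-channel case.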
Multi-agent nonlinear negotiation for Wi-Fi channel assignment
ABSTRACT. Optimizing resource use in complex networks with self-interested participants (e.g. transportation networks, electric grids, Internet systems) is a challenging and increasingly critical real-world problem. We propose an approach for solving this problem based on multi-agent nonlinear negotiation, and demonstrate it in the context of Wi-Fi channel assignment. We compare the performance of our proposed approach with a complete-information optimizer based on particle swarms, together with the de facto heuristic technique based on using the least congested channel. We have evaluated all these techniques in a wide range of settings, including randomly generated scenarios and real-world ones. Our experiments show that our approach outperforms the other techniques in terms of social welfare. The particle swarm optimizer is the only technique whose performance is close to ours, but its computation cost is much higher. Finally, we also study the effect of some graph metrics on the gain that our approach can achieve.
A Distributed, Multi-Agent approach to Reactive Network Resilience
ABSTRACT. Critical network infrastructures are communication networks whose disruption can create a severe impact on other systems. Multi-agent systems have been successfully used to protect critical network infrastructures, with approaches ranging from reasoning about secure design and policies to multi-agent intrusion detection systems (IDS). However, there is little research on the possibilities for multi-agent systems to react to known intrusions. In this paper we propose a multi-agent framework for reactive network resilience, that is, to allow a network to reconfigure itself in the event of a security incident so that the risk of further damage is mitigated. The proposed framework takes advantage of a risk model based on multilayer networks and of distributed belief propagation techniques to agree on a new, more resilient configuration of the network in the event of an attack. We compare our proposal with a number of centralized optimization and multi-agent negotiation techniques. Experiments show that our proposal outperforms the reference approaches both in terms of risk mitigation and performance.
Splee: A Declarative Information-Based Language for Multiagent Interaction Protocols
ABSTRACT. We take as our point of departure declarative, information-based
approaches for specifying interaction protocols that support fully
decentralized protocol enactments via asynchronous messaging between
agents.
We introduce \emph{Splee}, a significant extension of
\emph{Blindingly Simple Protocol Language} (BSPL). The extensions
fall into two broad categories: \emph{roles} and \emph{multicast}.
In Splee, a role binding is information that is dynamically
generated during protocol enactment, potentially as the content
(payload) of communication between two agents. Multicast
communication is the idea that a message is sent to a set of agents.
The two categories of extensions are interconnected via novel
features such as \emph{set roles} (the idea that a role
binding can be a set of agents) and \emph{subroles} (the idea that
agents playing a role must be a subset of agents playing another
role). We give the formal semantics of Splee and give \emph{small
model} characterizations of the correctness of decentralized
enactments in terms of the safety and liveness of Splee protocols.
We additionally introduce the pragmatic enhancement of \emph{query
attachments} for messages, which takes advantage of Splee's
information-orientation, and can be used to restrict the values
(parameter bindings) communicated in a message.
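Since the abstract does not reproduce Splee's concrete syntax, the fragment below is only a hypothetical Python rendering of the concepts it names (run-time role bindings, set roles, subroles, multicast); it is not Splee or BSPL code.

    from dataclasses import dataclass, field

    @dataclass
    class Message:
        sender_role: str
        receiver_role: str          # a set role here means multicast
        name: str
        parameters: tuple           # information (payload) carried by the message

    @dataclass
    class Protocol:
        roles: set                  # singleton roles
        set_roles: set              # roles bound to sets of agents
        subrole_of: dict            # subrole -> superrole (subset requirement)
        messages: list = field(default_factory=list)

    # A buyer multicasts a request to a dynamically bound set of sellers; the
    # eventual "Winner" must be a subset of the agents playing "Sellers".
    rfq = Protocol(
        roles={"Buyer", "Winner"},
        set_roles={"Sellers"},
        subrole_of={"Winner": "Sellers"},
        messages=[
            Message("Buyer", "Sellers", "rfq", ("item",)),          # multicast
            Message("Sellers", "Buyer", "quote", ("item", "price")),
            Message("Buyer", "Winner", "accept", ("item", "price")),
        ],
    )
    print(len(rfq.messages), rfq.subrole_of["Winner"])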
Vocabulary Alignment in Openly Specified Interactions
ABSTRACT. The problem of achieving common understanding between agents that use different vocabularies has been mainly addressed by designing techniques that explicitly negotiate mappings between their vocabularies, requiring agents to share a meta-language. In this paper we consider the case of agents that use different vocabularies and have no meta-language in common, but share the knowledge of how to perform a task, given by the specification of an interaction protocol. For this situation, we present a framework that lets agents learn a vocabulary alignment from the experience of interacting. Unlike previous work in this direction, we use open protocols that constrain possible actions instead of defining procedures, making our approach more general. We present two techniques that can be used either to learn an alignment from scratch or to repair an existing one, and we experimentally evaluate their performance.
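One simple way to picture learning an alignment from the experience of interacting, offered here only as an illustrative baseline rather than the technique proposed in the paper, is to count how often a foreign term co-occurs with each of one's own terms in interactions that end successfully and map each foreign term to its most frequent partner.

    from collections import defaultdict

    # Illustrative baseline: co-occurrence counts from successful interactions.
    counts = defaultdict(lambda: defaultdict(int))

    def observe(foreign_word, own_word, success):
        """Record one interaction step and whether the interaction succeeded."""
        if success:
            counts[foreign_word][own_word] += 1

    def alignment():
        """Current best guess: map each foreign word to its most frequent partner."""
        return {f: max(c, key=c.get) for f, c in counts.items()}

    # Toy run: the other agent says "prezzo" whenever we would say "price".
    for _ in range(5):
        observe("prezzo", "price", success=True)
    observe("prezzo", "item", success=True)      # one noisy observation
    print(alignment())                           # {'prezzo': 'price'}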
ABSTRACT. Online team games need matchmaking systems which can handle a high throughput of players and form fair teams to play matches together. We study this problem from a theoretical perspective, by defining and analyzing a broad model which captures many applications. In the most basic formulation, we desire a data structure which supports adding and removing players, and extracting the best possible games. We design an efficient solution, where with n players in the structure, operations can be performed in O(log n) time. We then consider a natural extension one might want in a team game setting, where each team has different roles, and each player is only willing to play in some of these roles. We show that this extension is computationally intractable, conditioned on the popular 3SUM conjecture; nevertheless, we are able to design an efficient constant-factor approximate solution. Our results help explain recent practical issues with the matchmaking systems in some of the most popular online games. We also prove similar results in an offline setting, which has many applications to matching and partitioning beyond online games.
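The basic formulation (add and remove players, extract the best possible game) can be pictured with the toy sketch below. Here "best game" is taken to mean the window of $2t$ consecutive ratings with the smallest spread, an assumption made for illustration, and insertion into a sorted Python list costs $O(n)$ rather than the $O(\log n)$ the paper achieves; the sketch conveys only the interface, not the authors' data structure.

    import bisect

    class ToyMatchmaker:
        """Conceptual sketch of the matchmaking interface (not the paper's structure)."""

        def __init__(self, team_size):
            self.t = team_size
            self.players = []                      # rating-sorted list of (rating, name)

        def add(self, name, rating):
            bisect.insort(self.players, (rating, name))

        def remove(self, name, rating):
            self.players.remove((rating, name))

        def extract_best_game(self):
            n, size = len(self.players), 2 * self.t
            if n < size:
                return None
            # Window of 2t consecutive ratings with the smallest spread.
            start = min(range(n - size + 1),
                        key=lambda i: self.players[i + size - 1][0] - self.players[i][0])
            game = [self.players.pop(start) for _ in range(size)]
            return game[0::2], game[1::2]          # alternate players into two teams

    mm = ToyMatchmaker(team_size=2)
    for name, rating in [("a", 1510), ("b", 1490), ("c", 1900), ("d", 1500), ("e", 1495)]:
        mm.add(name, rating)
    print(mm.extract_best_game())   # the four closely rated players split into two teams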
Incentivizing Cooperation Between Heterogeneous Agents in Dynamic Task Allocation
ABSTRACT. Market Clearing is an economic concept that features attractive properties when used for resource and task allocation, e.g., Pareto optimality and Envy Freeness.
Recently, an algorithm based on Market Clearing, FMC_TA, has been shown to be most effective for realistic dynamic multi-agent task allocation, outperforming general optimization methods, e.g., simulated annealing, and dedicated algorithms specifically designed for task allocation.
That being said, FMC_TA was applied to a homogeneous team of agents and used linear personal utility functions to represent agents' preferences. These properties limited the settings to which the algorithm could be applied.
In this paper we advance the research on task allocation methods based on market clearing by enhancing the FMC_TA algorithm such that it: 1) can use concave personal utility functions as its input and 2) can be applied to settings that require the collaboration of heterogeneous agents, i.e., agents with different capabilities. We demonstrate that the use of concave functions indeed encourages collaboration among agents.
Our results on both homogeneous and heterogeneous scenarios indicate that personal utility functions with only slight concavity suffice to achieve the desired cooperation incentive while, in contrast to functions with greater concavity, not causing a severe delay in the execution of tasks.
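A toy intuition, ours rather than the paper's, for why concavity encourages sharing work: if an agent's personal utility for performing a workload $x$ is linear, $u(x) = x$, then one agent executing a workload of $4$ and two agents executing $2$ each yield the same total, $u(4) = u(2) + u(2) = 4$; with the concave utility $u(x) = \sqrt{x}$, the single agent obtains $\sqrt{4} = 2$ while the sharing pair obtains $2\sqrt{2} \approx 2.83$. The gap grows with the degree of concavity, which is why strongly concave utilities push harder towards splitting tasks (at the price of the execution delays mentioned above), while, as reported, slight concavity already tips the balance towards cooperation.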
Causality, Responsibility, and Blame in Team Plans
ABSTRACT. Many objectives can be achieved (or may be achieved more effectively) only by a group of agents executing a team plan. If a team plan fails, it is often of interest to determine what caused the failure, the degree of responsibility of each agent for the failure, and the degree of blame attached to each agent. We show how team plans can be represented in terms of structural equations, and then apply the definitions of causality introduced by Halpern (2015) and degree of responsibility and blame introduced by Chockler and Halpern (2004) to determine the agent(s) who caused the failure and what their degree of responsibility/blame is. We also prove new results on the complexity of computing causality and degree of responsibility and blame, showing that they can be determined in polynomial time for many team plans of interest.
Simultaneously Learning and Advising in Multiagent Reinforcement Learning
ABSTRACT. Reinforcement Learning has long been employed to solve sequential decision-making problems with minimal input data. However, the classical approach requires a large number of interactions with the environment to learn a suitable policy. This problem is further intensified when multiple autonomous agents are simultaneously learning in the environment. The student-teacher approach aims at alleviating this problem by integrating an advising procedure in the learning process, in which an experienced agent (human or not) can advise a student to guide her exploration. Even though previous works reported that an agent can learn faster when receiving advice, their proposals require that the teacher is an expert in the learning task.
Sharing successful episodes can also accelerate learning, but this procedure requires extensive communication among agents, which is infeasible for domains in which communication is limited. Thus, we here propose a multiagent advising framework in which multiple agents can advise each other while learning in a shared environment.
If an agent is unsure about what to do in a state, she can ask other agents for advice and may receive answers from agents that are more confident about how to act in that state. We perform experiments in a simulated Robot Soccer environment and show that the learning process is improved by incorporating this kind of advice.
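A minimal sketch of the ask/advise loop described above; the confidence measure used here (state visit counts) is an assumption made for illustration, not the framework's own definition.

    from collections import defaultdict

    class AdvisingAgent:
        """Illustrative confidence-based advising between simultaneous learners."""

        def __init__(self, actions, ask_threshold=10):
            self.q = defaultdict(float)            # Q-values keyed by (state, action)
            self.visits = defaultdict(int)         # visit counts per state
            self.actions = actions
            self.ask_threshold = ask_threshold

        def confidence(self, state):
            return self.visits[state]              # assumed proxy for confidence

        def greedy_action(self, state):
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def act(self, state, others):
            self.visits[state] += 1
            if self.confidence(state) < self.ask_threshold:
                advisors = [o for o in others if o.confidence(state) > self.confidence(state)]
                if advisors:
                    # Follow the advice of the most confident peer.
                    return max(advisors, key=lambda o: o.confidence(state)).greedy_action(state)
            return self.greedy_action(state)

    # Toy usage: a fresh agent defers to a peer that has seen the state many times.
    a, b = AdvisingAgent(["left", "right"]), AdvisingAgent(["left", "right"])
    b.visits["s0"] = 100
    b.q[("s0", "right")] = 1.0
    print(a.act("s0", [b]))    # 'right', advised by the more confident agent b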
Cooperative Set Function Optimization Without Communication or Coordination
ABSTRACT. We introduce a new model for cooperative agents that seek to optimize a common goal without communication or coordination. Given a universe of elements V, a set of agents, and a set function f, we ask each agent i to select a subset Sᵢ ⊂ V such that the size of Sᵢ is constrained (i.e., |Sᵢ| < k). The goal is for the agents to cooperatively choose the sets Sᵢ to maximize the function evaluated at the union of these sets, ∪ᵢ Sᵢ; we seek max f(∪ᵢ Sᵢ). We assume the agents can neither communicate nor coordinate how they choose their sets. This model arises naturally in many real-world settings such as swarms of surveillance robots and colonies of foraging insects. Even for simple classes of set functions, there are strong lower bounds on the achievable performance of coordinating deterministic agents. We show, surprisingly, that for the fundamental class of submodular set functions, there exists a near-optimal distributed algorithm for this problem that does not require communication. We demonstrate that our algorithm performs nearly as well as recently published algorithms that allow full coordination.
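The flavour of cooperation without communication can be conveyed with a small sketch; this is an illustrative decentralised heuristic, not the paper's algorithm. Each agent independently subsamples the ground set and greedily picks k elements from its private sample so as to maximise a coverage function (a submodular f); because the samples differ, the agents' choices tend to complement each other even though they never exchange messages.

    import random

    # Ground set: overlapping intervals, each represented as a frozenset of points.
    UNIVERSE = [frozenset(range(i, i + 3)) for i in range(0, 30, 2)]

    def coverage(chosen):
        """Submodular objective: number of points covered by the union of chosen sets."""
        return len(set().union(*chosen)) if chosen else 0

    def agent_select(k, sample_prob=0.5, seed=None):
        """One silent agent: greedy selection over an independent private subsample."""
        rng = random.Random(seed)
        pool = [s for s in UNIVERSE if rng.random() < sample_prob]
        chosen = []
        for _ in range(min(k, len(pool))):
            best = max(pool, key=lambda s: coverage(chosen + [s]) - coverage(chosen))
            chosen.append(best)
            pool.remove(best)
        return chosen

    picks = [agent_select(k=3, seed=i) for i in range(4)]     # 4 non-communicating agents
    union = [s for agent in picks for s in agent]
    print(coverage(union), "points covered by the union of the agents' selections")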
ABSTRACT. What is fun about artificial intelligence (AI) is that it is fundamentally constructive! Rather than theorizing about the behavior of an existing, say social system, or understanding the way in which decisions are currently being made, one gets to ask a profound question: how should a system for making intelligent decisions be designed? In multi-agent systems we’re often interested, in particular, in what happens when multiple participants, each with autonomy, self-interest and potentially misaligned incentives come together. These systems involve people (frequently people and firms), as well as the use of AI to automate some parts of decision making. How should such a system be designed?
In adopting this normative viewpoint, we follow a successful branch of economic theory that includes mechanism design, social choice and matching. But in pursuit of success, we must grapple with problems that are more complex than has been typical in economic theory, and solve these problems at scale. We want to attack difficult problems by using the methods of AI together with the methods of economic theory. In this talk, I will highlight the following themes from my own research, themes that emerge as one lifts nice ideas from economic theory and brings them to bear on the kinds of problems that are at the heart of modern AI research:
1. Preferences need to be elicited, and cannot be assumed to be known or easily represented.
2. Mechanisms don’t need to be centralized. Rather, we can use (incentive aligned) distributed optimization!
3. Many interesting problems are temporal and involve uncertainty, leading to rich challenges that have been largely untouched by economic theory.
4. Markets become algorithms, and market-based optimization is a powerful and increasingly real paradigm.
5. Scale becomes an opportunity: large systems beget data, and this data enables new approaches to robust incentive alignment.
6. Computation becomes a tool -- machine learning has taken automated mechanism design up to the frontier of knowledge from 35 years of auction theory.
The future will bring a tighter and ever more compelling integration of AI with markets. The human and societal interface will remain, and become ever more important as there is more that can be automated. The anticipated "agent-mediated economy" is almost upon us, and holds much promise as long as we make sure that AIs represent our preferences and system designs capture our values as society.