## Session BP1 – Best Papers Session I

B39
Central to the vision of the smart grid is the deployment of smart meters that will allow autonomous software agents, representing the consumers, to optimise their use of devices and heating in the smart home while interacting with the grid. However, without some form of coordination, the population of agents may end up with overly-homogeneous optimised consumption patterns that may generate significant peaks in demand in the grid. These peaks, in turn, reduce the efficiency of the overall system, increase carbon emissions, and may even, in the worst case, cause blackouts. Hence, in this paper, we introduce a novel model of a Decentralised Demand Side Management (DDSM) mechanism that allows agents, by adapting the deferment of their loads based on grid prices, to coordinate in a decentralised manner. Specifically, using average UK consumption profiles for 26M homes, we demonstrate that, through an emergent coordination of the agents, the peak demand of domestic consumers in the grid can be reduced by up to 17% and carbon emissions by up to 6%. We also show that our DDSM mechanism is robust to the increasing electrification of heating in UK homes (i.e., it exhibits a similar efficiency).
Agent-Based Control for Decentralised Demand Side Management in the Smart Grid
Sarvapali D. Ramchurn, Perukrishnen Vytelingum, Alex Rogershas 6 papers, Nicholas R. Jenningshas 9 papers

G38
Grid-Integrated Vehicles (GIVs) are plug-in Electric Drive Vehicles (EDVs) with power-management and other controls that allow them to respond to external commands sent by power-grid operators, or their affiliates, when parked and plugged-in to the grid. At a bare minimum, such GIVs should respond to demand-management commands or pricing signals to delay, reduce or switch-off the rate of charging when the demand for electricity is high. In more advanced cases, these GIVs might sell both power and storage capacity back to the grid in any of the several electric power markets – a concept known as Vehicle-to-Grid power or V2G power. Although individual EDVs control too little power to sell in the market at an individual level, a large group of EDVs may form an aggregate or coalition that controls enough power to meaningfully sell, at a profit, in these markets. The profits made by such a coalition can then be used by the coalition members to offset the costs of the electric vehicles and batteries themselves. In this paper we describe an implemented and deployed multi-agent system that is used to integrate EDVs into the electricity grid managed by PJM, the largest transmission service operator in the world. We provide a brief introduction to GIVs and the various power markets and discuss why multi-agent systems are a good match for this application.
Deploying Power Grid-Integrated Electric Vehicles as a Multi-Agent System
Sachin Kamboj, Willett Kempton, Keith S. Decker

G40
Building on research previously reported at AAMAS conferences, this paper describes an innovative application of a novel gametheoretic approach for a national scale security deployment. Working with the United States Transportation Security Administration (TSA), we have developed a new application called GUARDS to assist in resource allocation tasks for airport protection at over 400 United States airports. In contrast with previous efforts such as ARMOR and IRIS, which focused on one-off tailored applications and one security activity (e.g. canine patrol or checkpoints) per application, GUARDS faces three key issues: (i) reasoning about hundreds of heterogeneous security activities; (ii) reasoning over diverse potential threats; (iii) developing a system designed for hundreds of end-users. Since a national deployment precludes tailoring to specific airports, our key ideas are: (i) creating a new game-theoretic framework that allows for heterogeneous defender activities and compact modeling of a large number of threats; (ii) developing an efficient solution technique based on general purpose Stackelberg game solvers; (iii) taking a partially centralized approach for knowledge acquisition and development of the system. In doing so we develop a software scheduling assistant, GUARDS, designed to reason over two agents – the TSA and a potential adversary – and allocate the TSA's limited resources across hundreds of security activities in order to provide protection within airports. The scheduling assistant has been delivered to the TSA and is currently under evaluation and testing for scheduling practices at an undisclosed airport. If successful, the TSA intends to incorporate the system into their unpredictable scheduling practices nationwide. In this paper we discuss the design choices and challenges encountered during the implementation of GUARDS. GUARDS represents promising potential for transitioning years of academic research into a nationally deployed system.
GUARDS - Game Theoretic Security Allocation on a National Scale
James Pitahas 2 papers, Milind Tambehas 8 papers, Christopher Kiekintveldhas 4 papers, Shane Cullen, Erin Steigerwald

## Session BP2 – Best Papers Session II

R38
Overlapping Coalition Formation (OCF) games are cooperative games where the players can simultaneously participate in several coalitions. Capturing the notion of stability in OCF games is a difficult task: a player may deviate by abandoning some, but not all of the coalitions he is involved in, and the crucial question is whether he then gets to keep his payoff from the unaffected coalitions. In related work the authors introduce three stability concepts for OCF games – the conservative, refined, and optimistic core – that are based on different answers to this question. In this paper, we propose a unified framework for the study of stability in the OCF setting, which encompasses the concepts considered previously as well as a wide variety of alternative stability concepts. Our approach is based on the notion of an arbitrator, which can be thought of as an external party that determines payoff to deviators. We give a complete characterization of outcomes that are stable under arbitration. In particular, our results provide a criterion for the outcome to be in the refined or optimistic core, thus complementing previously results for the conservative core, and answering questions left open previously. We also introduce a notion of the nucleolus for arbitrated OCF games, and argue that it is non-empty. Finally, we extend the definition of the Shapley value to the OCF setting, and provide an axiomatic characterization for it.
Arbitrators in Overlapping Coalition Formation Games
Yair Zick, Edith Elkindhas 3 papers

R39
Online digital goods auctions are settings where a seller with an unlimited supply of goods (e.g. music or movie downloads) interacts with a stream of potential buyers. In the posted price setting, the seller makes a take-it-or-leave-it offer to each arriving buyer. We study the seller's revenue maximization problem in posted-price auctions of digital goods. We find that algorithms from the multi-armed bandit literature like UCB, which come with good regret bounds, can be slow to converge. We propose and study two alternatives: (1) a scheme based on using Gittins indices with priors that make appropriate use of domain knowledge; (2) a new learning algorithm, LLVD, that assumes a linear demand curve, and maintains a Beta prior over the free parameter using a moment-matching approximation. LLVD is not only (approximately) optimal for linear demand, but also learns fast and performs well when the linearity assumption is violated, for example in the cases of two natural valuation distributions, exponential and log-normal.
Learning the Demand Curve in Posted-Price Digital Goods Auctions
Meenal Chhabrahas 2 papers, Sanmay Dashas 2 papers

B42
In their groundbreaking paper, Bartholdi, Tovey and Trick argued that many well-known voting rules, such as Plurality, Borda, Copeland and Maximin are easy to manipulate. An important assumption made in that paper is that the manipulator's goal is to ensure that his preferred candidate is among the candidates with the maximum score, or, equivalently, that ties are broken in favor of the manipulator's preferred candidate. In this paper, we examine the role of this assumption in the easiness results of Bartholdi et al. We observe that the algorithm presented in Bartholdi et al extends to all rules that break ties according to a fixed ordering over the candidates. We then show that all scoring rules are easy to manipulate if the winner is selected from all tied candidates uniformly at random. This result extends to Maximin under an additional assumption on the manipulator's utility function that is inspired by the original model of Bartholdi et al. In contrast, we show that manipulation becomes hard when arbitrary polynomial-time tie-breaking rules are allowed, both for the rules considered in Bartholdi et al, and for a large class of scoring rules.
Ties Matter: Complexity of Voting Manipulation Revisited
Svetlana Obraztsova, Edith Elkindhas 3 papers, Noam Hazon

R40
Boolean games are a natural, compact, and expressive class of logic-based games, in which each player exercises unique control over some set of Boolean variables, and has some logical goal formula that it desires to be achieved. A player's strategy set is the set of all possible valuations that may be made to its variables. A player's goal formula may contain variables controlled by other agents, and in this case, it must reason strategically about how best to assign values to its variables. In the present paper, we consider the possibility of overlaying Boolean games with taxation schemes. A taxation scheme imposes a cost on every possible assignment an agent can make. By designing a taxation scheme appropriately, it is possible to perturb the preferences of the agents within a society, so that agents are rationally incentivised to choose some socially desirable equilibrium that would not otherwise be chosen, or incentivised to rule out some socially undesirable equilibria. After formally presenting the model, we explore some issues surrounding it (e.g., the complexity of finding a taxation scheme that implements some socially desirable outcome), and then discuss possible desirable properties of taxation schemes.
Designing Incentives for Boolean Games
Ulle Endriss, Sarit Kraushas 4 papers, Jérôme Langhas 2 papers, Michael Wooldridgehas 6 papers

## Session A1 – Robotics

R41
A common decision problem in multi-robot applications involves deciding on which robot, out of a group of *N* robots, should travel to a goal location, to carry out a task there. Trivially, this decision problem can be solved greedily, by selecting the robot with the shortest expected travel time. However, this ignores the inherent uncertainty in path traversal times; we may prefer a robot that is slower (but always takes the same time), over a robot that is expected to reach the goal faster, but on occasion takes a very long time to arrive. We make several contributions that address this challenge. First, we bring to bear economic decision-making theory, to distinguish between different selection policies, based on risk (risk averse, risk seeking, etc.). Second, we introduce social regret (the difference between the actual travel time by the selected robot, and the hypothetical time of other robots) to augment decision-making in practice. Then, we carry out experiments in simulation and with real robots, to demonstrate the usefulness of the selection procedures under real-world settings, and find that travel-time distributions have repeating characteristics.
Who Goes There? Selecting a Robot to Reach a Goal Using Social Regret
Meytal Traub, Gal A. Kaminkahas 3 papers, Noa Agmonhas 2 papers

B43
Autonomy requires robustness. The use of unmanned (autonomous) vehicles is appealing for tasks which are dangerous or dull. However, increased reliance on autonomous robots increases reliance on their robustness. Even with validated software, physical faults can cause the controlling software to perceive the environment incorrectly, and thus to make decisions that lead to task failure. We present an online anomaly detection method for robots, that is light-weight, and is able to take into account a large number of monitored sensors and internal measurements, with high precision. We demonstrate a specialization of the familiar Mahalanobis Distance for robot use, and also show how it can be used even with very large dimensions, by online selection of correlated measurements for its use. We empirically evaluate these contributions in different domains: commercial Unmanned Aerial Vehicles (UAVs), a vacuum-cleaning robot, and a high-fidelity flight simulator. We find that the online Mahalanobis distance technique, presented here, is superior to previous methods.
Online Anomaly Detection in Unmanned Vehicles
Eliahu Khalastchi, Gal A. Kaminkahas 3 papers, Meir Kalech, Raz Lin

B44
Incremental heuristic search algorithms can solve sequences of similar search problems potentially faster than heuristic search algorithms that solve each search problem from scratch. So far, there existed incremental heuristic search algorithms (such as Adaptive A*) that make the h-values of the current A* search more informed, which can speed up future A* searches, and incremental heuristic search algorithms (such as D* Lite) that change the search tree of the current A* search to the search tree of the next A* search, which can be faster than constructing it from scratch. In this paper, we present Tree Adaptive A*, which applies to goal-directed navigation in unknown terrain and builds on Adaptive A* but combines both classes of incremental heuristic search algorithms in a novel way. We demonstrate experimentally that it can run faster than Adaptive A*, Path Adaptive A* and D* Lite, the top incremental heuristic search algorithms in the context of goal-directed navigation in unknown grids.
Tree Adaptive A*
Carlos Hernándezhas 2 papers, Xiaoxun Sunhas 2 papers, Sven Koenighas 2 papers, Pedro Meseguerhas 2 papers

## Session B1 – Distributed Problem Solving I

G42
*k*- and *t*-optimality algorithms provide solutions to DCOPs that are optimal in regions characterized by its size and distance respectively. Moreover, they provide quality guarantees on their solutions. Here we generalise the *k*- and *t*-optimal framework to introduce *C*-optimality, a flexible framework that provides reward-independent quality guarantees for optima in regions characterised by any arbitrary criterion. Therefore, *C*-optimality allows us to explore the space of criteria (beyond size and distance) looking for those that lead to better solution qualities. We benefit from this larger space of criteria to propose a new criterion, the socalled size-bounded-distance criterion, which outperforms *k*- and *t*-optimality.
Quality Guarantees for Region Optimal DCOP Algorithms
Meritxell Vinyals, Eric Shieh, Jesus Cerquideshas 2 papers, Juan Antonio Rodriguez-Aguilarhas 3 papers, Zhengyu Yinhas 2 papers, Milind Tambehas 8 papers, Emma Bowringhas 2 papers

G43
Scheduling agents can use the Multiagent Simple Temporal Problem (MaSTP) formulation to efficiently find and represent the complete set of alternative consistent joint schedules in a distributed and privacy-maintaining manner. However, continually revising this set of consistent joint schedules as new constraints arise may not be a viable option in environments where communication is uncertain, costly, or otherwise problematic. As an alternative, agents can find and represent a temporal decoupling in terms of locally independent sets of consistent schedules that, when combined, form a set of consistent joint schedules. Unlike current algorithms for calculating a temporal decoupling that require centralization of the problem representation, in this paper we present a new, provably correct, distributed algorithm for calculating a temporal decoupling. We prove that this algorithm has the same theoretical computational complexity as current state-of-the-art MaSTP solution algorithms, and empirically demonstrate that it is more efficient in practice. We also introduce and perform an empirical cost/benefit analysis of new techniques and heuristics for selecting a maximally flexible temporal decoupling.
Distributed Algorithms for Solving the Multiagent Temporal Decoupling Problem
James C. Boerkoel, Edmund H. Durfeehas 3 papers

R43
Distributed systems can often be modeled as a collection of distributed (system) variables whose values are constrained by a set of constraints. In distributed multi-agent systems, the set of variables occurring at a site (subsystem) is usually viewed as controllable by a local agent. This agent assigns values to the variables, and the aim is to provide distributed methods enabling a set of agents to come up with a global assignment (solution) that satisfies all the constraints. Alternatively, the system might be understood as a distributed database. Here, the focus is on ensuring consistency of the global system if local constraints (the distributed parts of the database) change. In this setting, the aim is to determine whether the existence of a global solution can be guaranteed. In other settings (e.g., P2P systems, sensor networks), the values of the variables might be completely out of control of the individual systems, and the constraints only characterize globally normal states or behavior of the system. In order to detect anomalies, one specifies distributed methods that can efficiently indicate violations of such constraints. The aim of this paper is to show that the following three main problems identified in these research areas are in fact identical: (i) the problem of ensuring that independent agents come up with a global solution; (ii) the problem of ensuring that global consistency is maintained if local constraint stores change; and (iii) the problem of ensuring that global violations can be detected by local nodes. This claim is made precise by developing a decomposition framework for distributed constraint systems and then extracting preservation properties that must satisfied in order to solve the above mentioned problems. Although satisfying the preservation properties seems to require different decomposition modes, our results demonstrate that in fact these decomposition properties are equivalent, thereby showing that the three main problems identified above are identical. We then show that the complexity of finding such decompositions is polynomially related to finding solutions for the original constraint system, which explains the popularity of decomposition applied to tractable constraint systems. Finally, we address the problem of finding optimal decompositions and show that even for tractable constraint systems, this problem is hard.
Decomposing Constraint Systems: Equivalences and Computational Properties
Wiebe van der Hoekhas 5 papers, Cees Witteveen, Michael Wooldridgehas 6 papers

B46
We consider the fundamental problem of reaching consensus in multiagent systems; an operation required in many applications such as, among others, vehicle formation and coordination, shape formation in modular robotics, distributed target tracking, and environmental modeling. To date, the consensus problem (the problem where agents have to agree on their reported values) has been typically solved with iterative decentralized algorithms based on graph Laplacians. However, the convergence of these existing consensus algorithms is often too slow for many important multiagent applications, and thus they are increasingly being combined with acceleration methods. Unfortunately, state-of-the- art acceleration techniques require parameters that can be optimally selected only if complete information about the network topology is available, which is rarely the case in practice. We address this limitation by deriving two novel acceleration methods that can deliver good performance even if little information about the network is available. The first proposed algorithm is based on the Chebyshev semi-iterative method and is optimal in a well defined sense; it maximizes the worst-case convergence speed (in the mean sense) given that only rough bounds on the extremal eigenvalues of the network matrix are available. It can be applied to systems where agents use unreliable communication links, and its computational complexity is similar to those of simple Laplacian-based methods. This algorithm requires synchronization among agents, so we also propose an asynchronous version that approximates the output of the synchronous algorithm. Mathematical analysis and numerical simulations show that the convergence speed of the proposed acceleration methods decrease gracefully in scenarios where the sole use of Laplacian-based methods is known to be impractical.
Consensus Acceleration in Multiagent Systems with the Chebyshev Semi-Iterative Method
Renato L.G. Cavalcantehas 2 papers, Alex Rogershas 6 papers, Nicholas R. Jenningshas 9 papers

## Session C1 – Game Theory I

R44
Proper scoring rules, particularly when used as the basis for a prediction market, are powerful tools for eliciting and aggregating beliefs about events such as the likely outcome of an election or sporting event. Such scoring rules incentivize a single agent to reveal her true beliefs about the event. Othman and Sandholm introduced the idea of a decision rule to examine these problems in contexts where the information being elicited is conditional on some decision alternatives. For example,“What is the probability having ten million viewers if we choose to air new television show X? What if we choose Y?” Since only one show can actually air in a slot, only the results under the chosen alternative can ever be observed. Othman and Sandholm developed proper scoring rules (and thus decision markets) for a single, deterministic decision rule: always select the action with the greatest probability of success. In this work we significantly generalize their results, developing scoring rules for other deterministic decision rules, randomized decision rules, and situations where there may be more than two outcomes (e.g. less than a million viewers, more than one but less than ten, or more than ten million).
Information Elicitation for Decision Making
Yiling Chenhas 2 papers, Ian A. Kash

## Session D1 – Multiagent Learning

G47
We present a game-theoretic self-organizing approach for scheduling the radio activity of wireless sensor nodes. Our approach makes each node play a *win-stay lose-shift* (WSLS) strategy to choose when to schedule radio transmission, reception and sleeping periods. The proposed strategy relies only on local interactions with neighboring nodes, and is thus fully decentralized. This behavior results in shorter communication schedules, allowing to not only reduce energy consumption by reducing the wake-up cycles of sensor nodes, but also to decrease the data retrieval latency. We implement this WSLS approach in the OMNeT++ sensor network simulator where nodes are organized in three topologies —line, grid and random. We compare the performance of our approach to two state-of-the-art scheduling protocols, namely S-MAC and D-MAC, and show that the WSLS strategy brings significant gains in terms of energy savings, while at the same time reduces communication delays. In addition, we show that our approach performs particularly well in large, random topologies.
Distributed Cooperation in Wireless Sensor Networks
Mihail Mihaylov, Yann-Aël Le Borgne, Karl Tuylshas 4 papers, Ann Nowéhas 2 papers

## Session A2 – Logic-Based Approaches I

B50
We propose coalitional normative system (cns), which can selectively restrict the joint behavior of a coalition, in this paper. We extend the semantics of atl and propose Coordinated atl (co-atl) to support the formalizing of cns. We soundly and completely characterize the limitation of the normative power of a coalition by identifying two fragments of col-atl language corresponding to two types of system properties that are unchangeable by restricting the joint behavior of such a coalition. Then, we prove that the effectiveness checking, feasibility and synthesis problems of cns are ptime-complete, cp-complete and fnp-complete, respectively. Moreover, we define two concepts of optimality for cns, that is, minimality and compactness, and prove that both minimality checking and compactness checking are conp-complete while the problem of checking whether a coalition is a minimal controllable coalition is dp-complete. The relation between ns and cns is discussed, and it turns out that css intrinsically consists of a proper subset of cnss and some basic problems related to cns are no more complex than that of ns.
A Framework for Coalitional Normative Systems
Jun Wu, Chongjun Wang, Junyuan Xie

G48
An abstract argumentation framework and the semantics, often called Dungean semantics, give a general framework for nonmonotonic logics. In the last fteen years, a great number of papers in computational argumentation adopt Dungean semantics as a fundamental principle for evaluating various kinds of defeasible consequences. Recently, many papers address problems not only with theoretical reasoning, i.e., reasoning about what to believe, but also practical reasoning, i.e., reasoning about what to do. This paper proposes a practical argumentation semantics specic to practical argumentation. This is motivated by our hypothesis that consequences of such argumentation should satisfy Pareto optimality because the consequences strongly depend on desires, aims, or values an individual agent or a group of agents has. We dene a practical argumentation framework and two kinds of extensions, preferred and grounded extensions, with respect to each group of agents. We show that evaluating Pareto optimality can be translated to evaluating preferred extensions of a particular practical argumentation framework. Furthermore, we show that our semantics is a natural extension of Dungean semantics in terms of considering more than one defeat relation. We give a generality order of four practical argumentation frameworks specied by taking into account Dungean semantics and Pareto optimality. We show that a member of preferred extensions of the most specic one is not just Pareto optimal, but also it is theoretically justied.
Practical Argumentation Semantics for Socially Efficient Defeasible Consequence
Hiroyuki Kido, Katsumi Nitta

## Session B2 – Agent-Based System Development I

## Session C2 – Social Choice Theory

G52
Given the preferences of several agents over a common set of candidates, voting trees can be used to select a candidate (the winner) by a sequence of pairwise competitions modelled by a binary tree (the agenda). The majority graph compactly represents the preferences of the agents and provides enough information to compute the winner. When some preferences are missing, there are various notions of winners, such as the possible winners (that is, winners in at least one completion) or the necessary winners (that is, winners in all completions). In this generalized scenario, we show that using the majority graph to compute winners is not correct, since it may declare as winners candidates that are not so. Nonetheless, the majority graph can be used to compute efficiently an upper or lower approximation of the correct set of winners.
Possible And Necessary Winners In Voting Trees: Majority Graphs Vs. Profiles
Maria Silvia Pinihas 2 papers, Francesca Rossihas 2 papers, Kristen Brent Venablehas 2 papers, Toby Walshhas 2 papers

R47
Strategyproof (SP) classification considers situations in which a decision-maker must classify a set of input points with binary labels, minimizing expected error. Labels of input points are reported by self-interested agents, who may lie so as to obtain a classifier more closely matching their own labels. These lies would create a bias in the data, and thus motivate the design of truthful mechanisms that discourage false reporting.
We here answer questions left open by previous research on strategyproof classification, in particular regarding the best approximation ratio (in terms of social welfare) that an SP mechanism can guarantee for *n* agents. Our primary result is a lower bound of *3 - 2/n* on the approximation ratio of SP mechanisms under the shared inputs assumption; this shows that the previously known upper bound (for uniform weights) is tight. The proof relies on a result from Social Choice theory, showing that any SP mechanism must select a dictator at random, according to some fixed distribution. We then show how different randomizations can improve the best known mechanism when agents are weighted, matching the lower bound with a tight upper bound. These results contribute both to a better understanding of the limits of SP classification, as well as to the development of similar tools in other, related domains such as SP facility location.
Tight Bounds for Strategyproof Classification
Reshef Meir, Shaull Almagor, Assaf Michaely, Jeffrey S. Rosenscheinhas 2 papers

G53
In response to the Mumbai attacks of 2008, the Mumbai police have started to schedule a limited number of inspection checkpoints on the road network throughout the city. Algorithms for similar security-related scheduling problems have been proposed in recent literature, but security scheduling in networked domains when targets have varying importance remains an open problem at large. In this paper, we cast the network security problem as an attackerdefender zero-sum game. The strategy spaces for both players are exponentially large, so this requires the development of novel, scalable techniques. We first show that existing algorithms for approximate solutions can be arbitrarily bad in general settings. We present RUGGED (Randomization in Urban Graphs by Generating strategies for Enemy and Defender), the first scalable optimal solution technique for such network security games. Our technique is based on a double oracle approach and thus does not require the enumeration of the entire strategy space for either of the players. It scales up to realistic problem sizes, as is shown by our evaluation of maps of southern Mumbai obtained from GIS data.
A Double Oracle Algorithm for Zero-Sum Security Games on Graphs
Manish Jainhas 3 papers, Dmytro Korzhykhas 2 papers, Ondřej Vaněkhas 4 papers, Vincent Conitzerhas 2 papers, Michal Pěchoučekhas 5 papers, Milind Tambehas 8 papers

## Session D2 – Preferences and Strategies

R49
CP-net (Conditional Preference Network) is one of the extensively studied languages for representing and reasoning with preferences. The fundamental operation of dominance testing in CP-nets, i.e. determining whether an outcome is preferred to another, is very important in many real-world applications. Current techniques for solving general dominance queries is to search for improving flipping sequence from one outcome to another as a proof of the dominance relation in all rankings satisfying the given CP-net. However, it is generally a hard problem even for binary-valued, acyclic CP-nets and tractable search algorithms exist only for specific problem classes. Hence, there is a need for efficient algorithms and techniques for dominance testing in more general problem settings. In this paper, we propose a heuristic approach, called DT*, to dominance testing in arbitrary acyclic multi-valued CP-nets. Our proposed approach guides the search process efficiently and allows significant reduction of search effort without impacting soundness or completeness of the search process. We present results of experiments that demonstrate the computational efficiency and feasibility of our approach to dominance testing.
Efficient Heuristic Approach to Dominance Testing in CP-nets
Minyi Lihas 3 papers, Quoc Bao Vohas 4 papers, Ryszard Kowalczykhas 4 papers

## Session A3 – Distributed Problem Solving II

R50
In this paper we address efficient decentralised coordination of cooperative multi-agent systems by taking into account the actual computation and communication capabilities of the agents. We consider coordination problems that can be framed as Distributed Constraint Optimisation Problems, and as such, are suitable to be deployed on large scale multi-agent systems such as sensor networks or multiple unmanned aerial vehicles. Specifically, we focus on techniques that exploit structural independence among agents' actions to provide optimal solutions to the coordination problem, and, in particular, we use the Generalized Distributive Law (GDL) algorithm. In this settings, we propose a novel resource aware heuristic to build junction trees and to schedule GDL computations across the agents. Our goal is to minimise the total running time of the coordination process, rather than the theoretical complexity of the computation, by explicitly considering the computation and communication capabilities of agents. We evaluate our proposed approach against DPOP, RDPI and a centralized solver on a number of benchmark coordination problems, and show that our approach is able to provide optimal solutions for DCOPs faster than previous approaches. Specifically, in the settings considered, when resources are scarce our approach is up to three times faster than DPOP (which proved to be the best among the competitors in our settings).
Resource-Aware Junction Trees for Efficient Multi-Agent Coordination
N. Stefanovitch, A. Farinelli, Alex Rogershas 6 papers, Nicholas R. Jenningshas 9 papers

B52
We propose the bounded multi-objective max-sum algorithm (B-MOMS), the first decentralised coordination algorithm for multi-objective optimisation problems. B-MOMS extends the max-sum message-passing algorithm for decentralised coordination to compute bounded approximate solutions to multi-objective decentralised constraint optimisation problems (MO-DCOPs). Specifically, we prove the optimality of B-MOMS in acyclic constraint graphs, and derive problem dependent bounds on its approximation ratio when these graphs contain cycles. Furthermore, we empirically evaluate its performance on a multi-objective extension of the canonical graph colouring problem. In so doing, we demonstrate that, for the settings we consider, the approximation ratio never exceeds 2, and is typically less than 1.5 for less-constrained graphs. Moreover, the runtime required by B-MOMS on the problem instances we considered never exceeds 30 minutes, even for maximally constrained graphs with 100 agents. Thus, B-MOMS brings the problem of multi-objective optimisation well within the boundaries of the limited capabilities of embedded agents.
Bounded Decentralised Coordination over Multiple Objectives
Francesco M. Delle Fave, Ruben Stranders, Alex Rogershas 6 papers, Nicholas R. Jenningshas 9 papers

## Session B3 – Agent-Based System Development II

R51
Multi-agent systems form the basis of many innovative large-scale distributed applications. The development of such applications requires a careful balance of a wide range of concerns: a detailed understanding of the behaviour of the abstract algorithms being employed, a knowledge of the effects and costs of operating in a distributed environment, and an expertise in the performance requirements of the application itself. Experimental work plays a key role in the process of designing such systems. This paper examines the multi-agent systems development cycle from a distributed systems perspective. A survey of recent experimental studies finds that a large proportion of work on the design of multi-agent systems is focused on the analytical and simulation phases of development. This paper advocates an alternative more comprehensive development cycle, which extends from theoretical studies to simulations, emulations, demonstrators and finally staged deployment. AgentScope, a tool that supports the experimental stages of multiagents systems development and facilitates long-term dispersed research efforts, is introduced. AgentScope consists of a small set of interfaces on which experimental work can be built independently of a particular type of platform. The aim is to make not only agent code but also experimental scenarios, and metrics reusable, both between projects and over simulation, emulation and demonstration platforms. An example gossip-based sampling experiment demonstrates reusability, showing the ease with which an experiment can be defined, modified into a comparison study, and ported between a simulator and an actual agent-operating system.
AgentScope: Multi-Agent Systems Development in Focus
Elth Ogston, Frances Brazier

## Session C3 – Bounded Rationality

R53
Increasingly in both traditional, and especially Internet-based marketplaces, knowledge is becoming a traded commodity. This paper considers the impact of the presence of knowledge-brokers, or experts, on search-based markets with noisy signals. For example, consider a consumer looking for a used car on a large Internet marketplace. She sees noisy signals of the true value of any car she looks at the advertisement for, and can disambiguate this signal by paying for the services of an expert (for example, getting a Carfax report, or taking the car to a mechanic for an inspection). Both the consumer and the expert are rational, self-interested agents. We present a model for such search environments, and analyze several aspects of the model, making three main contributions: (1) We derive the consumer's optimal search strategy in environments with noisy signals, with and without the option of consulting an expert; (2) We find the optimal strategy for maximizing the expert's profit; (3) We study the option of market designers to subsidize search in a way that improves overall social welfare. We illustrate our results in the context of a plausible distribution of signals and values.
Expert-Mediated Search
Meenal Chhabrahas 2 papers, Sanmay Dashas 2 papers, David Sarnehas 2 papers

G55
In many settings and for various reasons, people fail to make optimal decisions. These factors also influence the agents people design to act on their behalf in such virtual environments as eCommerce and distributed operating systems, so that the agents also act sub-optimally despite their greater computational capabilities. In some decision-making situations it is theoretically possible to supply the optimal strategy to people or their agents, but this optimal strategy may be non-intuitive, and providing a convincing explanation of optimality may be complex. This paper explores an alternative approach to improving the performance of a decision-maker in such settings: the data on choices is manipulated to guide searchers to a strategy that is closer to optimal. This approach was tested for sequential search, which is a classical sequential decision-making problem with broad areas of applicability (e.g., product search, partnership search). The paper introduces three heuristics for manipulating choices, including one for settings in which repeated interaction or access to a decision-maker's past history is available. The heuristics were evaluated on a large population of computer agents, each of which embodies a search strategy programmed by a different person. Extensive tests on thousands of search settings demonstrate the promise of the problem-restructuring approach: despite a minor degradation in performance for a small portion of the population, the overall and average individual performance improve substantially. The heuristic that adapts based on a decision-maker's history achieved the best results.
Less Is More: Restructuring Decisions to Improve Agent Search
David Sarnehas 2 papers, Avshalom Elmalech, Barbara J. Grosz, Moti Geva

## Session D3 – Virtual Agents I

G57
In creating an evacuation simulation for training and planning, realistic agents that reproduce known phenomenon are required. Evacuation simulation in the airport domain requires additional features beyond most simulations, including the unique behaviors of first-time visitors who have incomplete knowledge of the area and families that do not necessarily adhere to often-assumed pedestrian behaviors. Evacuation simulations not customized for the airport domain do not incorporate the factors important to it, leading to inaccuracies when applied to it. In this paper, we describe ESCAPES, a multiagent evacuation simulation tool that incorporates four key features: (i) different agent types; (ii) emotional interactions; (iii) informational interactions; (iv) behavioral interactions. Our simulator reproduces phenomena observed in existing studies on evacuation scenarios and the features we incorporate substantially impact escape time. We use ESCAPES to model the International Terminal at Los Angeles International Airport (LAX) and receive high praise from security officials.
ESCAPES - Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social comparison
Jason Tsai, Natalie Fridmanhas 2 papers, Emma Bowringhas 2 papers, Matthew Brown, Shira Epstein, Gal A. Kaminkahas 3 papers, Stacy Marsellahas 2 papers, Andrew Ogden, Inbal Rika, Ankur Sheel, Matthew E. Taylorhas 4 papers, Xuezhi Wang, Avishay Zilka, Milind Tambehas 8 papers

## Session A4 – Agent Communication

R54
Commitments provide a flexible means for specifying the business relationships among autonomous and heterogeneous agents, and lead to a natural way of enacting such relationships. However, current formalizations of commitments incorporate conditions expressed as propositions, but disregard (1) temporal regulations and (2) an agent's control over such regulations. Thus, they cannot handle realistic application scenarios where time and control are often central because of domain conventions or other requirements.
We propose a new formalization of commitments that builds on an existing representation of events in which we can naturally express temporal regulations as well as what an agent can control, including indirectly as based on the commitments and capabilities of other agents. Our formalization supports a notion of commitment safety. A benefit of our consolidated approach is that by incorporating these considerations into commitments we enable agents to reason about and flexibly enact the regulations.
The main contributions of this paper include (1) a formal semantics of commitments that accommodates temporal regulations; (2) a formal semantics of the notions of innate and social control; and (3) a formalization of when a temporal commitment is safe for its debtor. We evaluate our contributions using an extensive case study.
Commitments with Regulations: Reasoning about Safety and Control in Regula
Elisa Marengo, Matteo Baldoni, Cristina Baroglio, Amit K. Choprahas 2 papers, Viviana Patti, Munindar P. Singhhas 5 papers

B58
Communication is a key capability of autonomous agents in a multiagent system to exchange information about their environment. It requires a naming convention that typically involves a set of predefined names for all objects in the environment, which the agents share and understand. However, when the agents are heterogeneous, highly distributed, and situated in an unknown environment, it is very unrealistic to assume that all the objects can be foreseen in advance, and therefore their names cannot be defined beforehand. In such a case, each individual agent needs to be able to introduce new names for the objects it encounters and align them with the naming convention used by the other agents. A language game is a prospective mechanism for the agents to learn and align the naming conventions between them. In this paper we extend the language game model by proposing novel strategies for selecting topics, i.e. attracting agent's attention to different objects during the learning process. Using a simulated multi-agent system we evaluate the process of name alignment in the case of the least restrictive type of language game, the naming game without feedback. Utilising proposed strategies we study the dynamic character of formation of coherent naming conventions and compare it with the behaviour of commonly used random selection strategy. The experimental results demonstrate that the new strategies improve the overall convergence of the alignment process, limit agent's overall demand on memory, and scale with the increasing number of the interacting agents.
On Topic Selection Strategies in Multi-Agent Naming Game
Wojciech Lorkiewicz, Ryszard Kowalczykhas 4 papers, Radoslaw Katarzyniak, Quoc Bao Vohas 4 papers

## Session B4 – Game Theory and Learning

B60
In infinitely repeated games, the act of teaching an outcome to our adversaries can be beneficial to reach coordination, as well as allowing us to `steer' adversaries to outcomes that are more beneficial to us. Teaching works well against followers, agents that are willing to go along with the proposal, but can lead to miscoordination otherwise. In the context of infinitely repeated games there is, as of yet, no clear formalism that tries to capture and combine these behaviours into a unified view in order to reach a solution of a game. In this paper, we propose such a formalism in the form of an algorithmic criterion, which uses the concept of targeted learning. As we will argue, this criterion can be a beneficial criterion to adopt in order to reach coordination. Afterwards we propose an algorithm that adheres to our criterion that is able to teach pure strategy Nash Equilibria to a broad class of opponents in a broad class of games and is able to follow otherwise, as well as able to perform well in self-play.
Sequential Targeted Optimality as a New Criterion for Teaching and Following in Repeated Games
Max Knobbout, Gerard A.W. Vreeswijkhas 2 papers

B61
In the well-known scheduling game, a set of jobs controlled by selfish players wishes each to minimize the load of the machine on which it is executed, while the social goal is to minimize the makespan, that is, the maximum load of any machine. We consider this problem on the three most common machines models, identical machines, uniformly related machines and unrelated machines, with respect to both weak and strict Pareto optimal Nash equilibria. These are kinds of equilibria which are stable not only in the sense that no player can improve its cost by changing its strategy unilaterally, but in addition, there is no alternative choice of strategies for the entire set of players where no player increases its cost, and at least one player reduces its cost (in the case of strict Pareto optimality), or where all players reduce their costs (in the case of weak Pareto optimality). We give a complete classification of the social quality of such solutions with respect to an optimal solution, that is, we find the Price of Anarchy of such schedules as a function of the number of machines, *m*. In addition, we give a full classification of the recognition complexity of such schedules.
On the Quality and Complexity of Pareto Equilibria in the Job Scheduling Game
Leah Epstein, Elena Kleiman

G59
False-name bids are bids submitted by a single agent under multiple fictitious names such as multiple e-mail addresses. False-name bidding can be a serious fraud in Internet auctions since identifying each participant is virtually impossible. It is shown that even the theoretically well-founded Vickrey-Clarke-Groves auction (VCG) is vulnerable to falsename bidding. Thus, several auction mechanisms that cannot be manipulated by false-name bids, i.e., false-name-proof mechanisms, have been developed. This paper investigates a slightly different question, i.e., how do they affect (perfect) Bayesian Nash equilibria of first-price combinatorial auctions? The importance of this question is that first-price combinatorial auctions are by far widely used in practice than VCG, and can be used as a benchmark for evaluating alternate mechanisms. In an environment where false-name bidding are possible, analytically investigating bidders' behaviors is very complicated, since nobody knows the number of real bidders. As a first step, we consider a kind of minimal settings where falsename bids become effective, i.e., an auction with two goods where one naive bidder competes with one shill bidder who may pretend to be two distinct bidders. We model this auction as a simple dynamic game and examine approximate Bayesian Nash equilibria by utilizing a numerical technique. Our analysis revealed that false-name bidding significantly affects the first-price auctions. Furthermore, the shill bidder has a clear advantage against the naive bidder.
False-name Bidding in First-price Combinatorial Auctions with Incomplete Information
Atsushi Iwasakihas 5 papers, Atsushi Katsuragi, Makoto Yokoohas 5 papers

## Session C4 – Teamwork

R57
This paper presents a novel method to describe and analyze strategic interactions in settings that include multiple actors, many possible actions and relationships among goals, tasks and resources. It shows how to reduce these large interactions to a set of bilateral normal-form games in which the strategy space is significantly smaller than the original setting, while still preserving many of its strategic characteristics. We demonstrate this technique on the Colored Trails (CT) framework, which encompasses a broad family of games defining multi-agent interactions and has been used in many past studies. We define a set of representative heuristics in a three-player CT setting. Choosing players' strategies from this set, the original CT setting is analytically decomposed into canonical bilateral social dilemmas, i.e., Prisoners' Dilemma, Stag Hunt and Ultimatum games. We present a set of criteria for generating strategically interesting CT games and empirically show that they indeed decompose into bilateral social dilemmas if players play according to the heuristics. Our results have significance for multi-agent systems researchers in mapping large multi-player task settings to well-known bilateral normal-form games in a way that facilitates the analysis of the original setting.
Metastrategies in the Colored Trails Game
Steven de Jonghas 2 papers, Daniel Henneshas 2 papers, Karl Tuylshas 4 papers, Ya'akov (Kobi) Galhas 2 papers

R58
We study the computational complexity of finding stable outcomes in hedonic games, which are a class of coalition formation games. We restrict our attention to a nontrivial subclass of such games, which are guaranteed to possess stable outcomes, i.e., the set of symmetric additively-separable hedonic games. These games are specified by an undirected edge-weighted graph: nodes are players, an outcome of the game is a partition of the nodes into coalitions, and the utility of a node is the sum of incident edge weights in the same coalition. We consider several stability requirements defined in the literature. These are based on restricting feasible player deviations, for example, by giving existing coalition members veto power. We extend these restrictions by considering more general forms of preference aggregation for coalition members. In particular, we consider voting schemes to decide if coalition members will allow a player to enter or leave their coalition. For all of the stability requirements we consider, the existence of a stable outcome is guaranteed by a potential function argument, and local improvements will converge to a stable outcome. We provide an almost complete characterization of these games in terms of the tractability of computing such stable outcomes. Our findings comprise positive results in the form of polynomialtime algorithms, and negative (PLS-completeness) results. The negative results extend to more general hedonic games.
Computing Stable Outcomes in Hedonic Games with Voting-Based Deviations
Martin Gairing, Rahul Savani

B63
The behavior composition problem involves realizing a virtual target behavior (i.e., the desired module) by suitably coordinating the execution of a set of partially controllable available components (e.g., agents, devices, processes, etc.) running in a shared partially predictable environment. All existing approaches to such problem have been framed within *strict* uncertainty settings. In this work, we propose a framework for automatic behavior composition which allows the seamless integration of classical behavior composition with decision-theoretic reasoning. Specifically, we consider the problem of *maximizing the “expected realizability”* of the target behavior in settings where the uncertainty can be quantified. Unlike previous proposals, the approach developed here is able to (better) deal with instances that do not accept “exact” solutions, thus yielding a more practical account for real domains. Moreover, it is provably strictly more general than the classical composition framework. Besides formally defining the problem and what counts as a solution, we show how a decision-theoretic composition problem can be solved by reducing it to the problem of finding an optimal policy in a Markov decision process.
Decision Theoretic Behavior Composition
Nitin Yadav, Sebastian Sardina

G60
An interesting problem of multi-agent systems is that of voting, in which the preferences of autonomous agents are to be combined. Applications of voting include modeling social structures, search engine ranking, and choosing a leader among computational agents. In the setting of voting, it is very important that each agent presents truthful information about his or her preferences, and not manipulate. The choice of election system may encourage or discourage voters from manipulating. Because manipulation often results in undesirable consequences, making the determination of such intractable is an important goal. An interesting metric on the robustness of an election system concerns the frequency in which opportunities of manipulations occur in a given election system. Previous work by Walsh has evaluated the frequency of manipulation in the context of very specific election systems, particularly veto, when the number of candidates is limited to at most three, by showing that manipulation problems in these systems can be directly viewed as problems of (Two-Way) Partition, and then using the best known heuristics of Partition. Walsh also claimed similar results hold for k-candidate veto election by way of problems involving multi-way partitions. We show that the results for k-candidate veto elections do not follow directly from common versions of partition problems and require non-trivial modifications to Multi-Way Partition. With these modifications, we confirm Walsh's claim that these elections are also vulnerable to manipulation. Our new computational problems also allow one to evaluate manipulation in the general case of k-candidate scoring protocols. We investigate the complexity of manipulating scoring protocols using new algorithms we derive by extending the known algorithms of Multi-Way Partition. It is our conclusion that the problems of manipulation in more general scoring protocols of four or more candidates are not vulnerable to manipulation using extensions of the current known algorithms of Multi-Way Partition. This may be due to weaknesses in these algorithms or complexity in manipulating general scoring protocols.
Solving Election Manipulation Using Integer Partitioning Problems
Andrew Lin

## Session A5 – Learning Agents

G61
The field of multiagent decision making is extending its tools from classical game theory by embracing reinforcement learning, statistical analysis, and opponent modeling. For example, behavioral economists conclude from experimental results that people act according to levels of reasoning that form a "cognitive hierarchy" of strategies, rather than merely following the hyper-rational Nash equilibrium solution concept. This paper expands this model of the iterative reasoning process by widening the notion of a level within the hierarchy from one single strategy to a distribution over strategies, leading to a more general framework of multiagent decision making. It provides a measure of sophistication for strategies and can serve as a guide for designing good strategies for multiagent games, drawing it's main strength from predicting opponent strategies. We apply these lessons to the recently introduced Lemonade-stand Game, a simple setting that includes both collaborative and competitive elements, where an agent's score is critically dependent on its responsiveness to opponent behavior. The opening moves are significant to the end result and simple heuristics have achieved faster cooperation than intricate learning schemes. Using results from the past two real-world tournaments, we show how the submitted entries fit naturally into our model and explain why the top agents were successful.
Using Iterated Reasoning to Predict Opponent Strategies
Michael Wunder, John Robert Yaros, Michael Littman, Michael Kaisershas 3 papers

R59
In continuous learning settings stochastic stable policies are often necessary to ensure that agents continuously adapt to dynamic environments. The choice of the decentralised learning system and the employed policy plays an important role in the optimisation task. For example, a policy that exhibits fluctuations may also introduce non-linear effects which other agents in the environment may not be able to cope with and even amplify these effects. In dynamic and unpredictable multiagent environments these oscillations may introduce instabilities. In this paper, we take inspiration from the limbic system to introduce an extension to the weighted policy learner, where agents evaluate rewards as either positive or negative feedback, depending on how they deviate from average expected rewards. Agents have positive and negative biases, where a bias either magnifies or depresses a positive or negative feedback signal. To contain the non-linear effects of biased rewards, we incorporate a decaying memory of past positive and negative feedback signals to provide a smoother gradient update on the probability simplex, spreading out the effect of the feedback signal over time. By splitting the feedback signal, more leverage on the win or learn fast (WoLF) principle is possible. The cognitive policy learner is evaluated using a small queueing network and compared with the fair action and weighted policy learner. Emphasis is placed on analysing the dynamics of the learning algorithms with respect to the stability of the queueing network and the overall queueing performance.
Cognitive Policy Learner: Biasing Winning or Losing Strategies
Dominik Dahlem, Jim Dowling, William Harrison

## Session B5 – Auction and Incentive Design

R62
We consider a setting in which a principal seeks to induce an adaptive agent to select a target action by providing incentives on one or more actions. The agent maintains a belief about the value for each action—which may update based on experience—and selects at each time step the action with the maximal sum of value and associated incentive. The principal observes the agent's selection, but has no information about the agent's current beliefs or belief update process. For inducing the target action as soon as possible, or as often as possible over a fixed time period, it is optimal for a principal with a per-period budget to assign the budget to the target action and wait for the agent to want to make that choice. But with an across-period budget, no algorithm can provide good performance on all instances without knowledge of the agent's update process, except in the particular case in which the goal is to induce the agent to select the target action once. We demonstrate ways to overcome this strong negative result with knowledge about the agent's beliefs, by providing a tractable algorithm for solving the offline problem when the principal has perfect knowledge, and an analytical solution for an instance of the problem in which partial knowledge is available.
Incentive Design for Adaptive Agents
Yiling Chenhas 2 papers, Jerry Kung, David C. Parkeshas 2 papers, Ariel D. Procaccia, Haoqi Zhang

B64
So far computer cannot satisfyingly solve many tasks that are extremely easy for human, such as image recognition or common sense reasoning. A partial solution is to delegate algorithmically difficult computation task to human, called human computation. The Game with a Purpose (GWAP), in which computational task is transformed into a game, is perhaps the most popular form of human computation. A simplified adverse selection model for output-agreement / simultaneous-verification GWAP was built, using the ESP Game as example. The experiment results favored an adverse selection model over an moral hazard model. We were particularly interested in output quality of a GWAP affected by how players are matched with each other, and proposed capability-aligned matching (CAM) versus commonly-used random matching. The analysis showed that when compared with random mathcing, the CAM improved output quality. The experiment confirmed conclusions drawed from the analysis, and further pointed out that task-human matching scheme was as important as human-human matching scheme studied in this paper. The main contribution of this paper is the analysis and empirical evaluation of humanhuman matching scheme, showing that capability-aligned matching can improve quality of GWAP.
Capability-Aligned Matching: Improving Quality of Games with a Purpose
Che-Liang Chiou, Jane Yung-Jen Hsuhas 2 papers

G62
Mechanism design studies how to design mechanisms that result in good outcomes even when agents strategically report their preferences. In traditional settings, it is assumed that a mechanism can enforce payments to give an incentive for agents to act honestly. However, in many Internet application domains, introducing monetary transfers is impossible or undesirable. Also, in such highly anonymous settings as the Internet, declaring preferences dishonestly is not the only way to manipulate the mechanism. Often, it is possible for an agent to pretend to be multiple agents and submit multiple reports under different identifiers, e.g., by creating different e-mail addresses. The effect of such false-name manipulations can be more serious in a mechanism without monetary transfers, since submitting multiple reports would have no risk. In this paper, we present a case study in false-name-proof mechanism design without money. In our basic setting, agents are located on a real line, and the mechanism must select the location of a public facility; the cost of an agent is its distance to the facility. This setting is called the facility location problem and can represent various situations where an agent's preference is single-peaked. First, we fully characterize the deterministic false-name-proof facility location mechanisms in this basic setting. By utilizing this characterization, we show the tight bounds of the approximation ratios for two objective functions: social cost and maximum cost. We then extend the results in two natural directions: a domain where a mechanism can be randomized and a domain where agents are located in a tree. Furthermore, we clarify the connections between false-name-proofness and other related properties.
False-name-proof Mechanism Design without Money
Taiki Todo, Atsushi Iwasakihas 5 papers, Makoto Yokoohas 5 papers

## Session C5 – Simulation and Emergence

B66
We study the phenomenon of evolution of cooperation in a society of self-interested agents using repeated games in graphs. A repeated game in a graph is a multiple round game, where, in each round, an agent gains payoff by playing a game with its neighbors and updates its action (state) by using the actions and/or payoffs of its neighbors. The interaction model between the agents is a two-player, two-action (cooperate and defect) Prisoner's Dilemma (PD) game (a prototypical model for interaction between self-interested agents). The conventional wisdom is that the presence of network structure enhances cooperation and current models use multiagent simulation to show evolution of cooperation. However, these results are based on particular combination of interaction game, network model and state update rules (e.g., PD game on a grid with imitate your best neighbor rule leads to evolution of cooperation). The state-of-the-art lacks a comprehensive picture of the dependence of the emergence of cooperation on model parameters like network topology, interaction game, state update rules and initial fraction of cooperators. We perform a thorough study of the phenomenon of evolution of cooperation using (a) a set of popular categories of networks, namely, grid, random networks, scale-free networks, and small-world networks and (b) a set of cognitively motivated update rules. Our simulation results show that the evolution of cooperation in networked systems is quite nuanced and depends on the combination of network type, update rules and the initial fraction of cooperating agents. We also provide an analysis to support our simulation results.
The Evolution of Cooperation in Self-Interested Agent Societies: A Critical Study
Lisa-Maria Hofmann, Nilanjan Chakraborty, Katia Sycarahas 7 papers

G64
We analyze and extend a recently proposed model of linguistic diffusion in social networks, to analytically derive time to convergence, and to account for the innovation phase of lexical dynamics in networks. Our new model, the degree-biased voter model with innovation, shows that the probability of existence of a norm is inversely related to innovation probability. When the innovation rate in the population is low, variants that become norms are due to a peripheral member with high probability. As the innovation rate increases, the fraction of time that the norm is a peripheral-introduced variant and the total time for which a norm exists at all in the population decrease. These results align with historical observations of rapid increase and generalization of slang words, technical terms, and new common expressions at times of cultural change in some languages.
A Model of Norm Emergence and Innovation in Language Change
Samarth Swarup, Andrea Apolloni, Zsuzsanna Fagyal

## Session D5 – Logic-Based Approaches II

R65
In modal logic, when adding a syntactic property to an axiomatisation, this property will semantically become true in all models, in all situations, under all circumstances. For instance, adding a property like *K_a p → K_b p* (agent *b* knows at least what agent *a* knows) to an axiomatisation of some epistemic logic has as an effect that such a property becomes globally true, i.e., it will hold in all states, at all time points (in a temporal setting), after every action (in a dynamic setting) and after any communication (in an update setting), and every agent will know that it holds, it will even be common knowledge. We propose a way to express that a property like the above only needs to hold locally: it may hold in the actual state, but not in all states, and not all agents may know that it holds. We can achieve this by adding relational atoms to the language that represent (implicitly) quantification over all formulas, as in *∀ p (K_a p → K_b p)*. We show how this can be done for a rich class of modal logics and a variety of syntactic properties.
Reasoning About Local Properties in Modal Logic
Hans van Ditmarsch, Wiebe van der Hoekhas 5 papers, Barteld Kooi

R66
Logics of propositional control, such as van der Hoek and Wooldridge's CL-PC, were introduced in order to represent and reason about scenarios in which each agent within a system is able to exercise unique control over some set of system variables. Our aim in the present paper is to extend the study of logics of propositional control to settings in which these agents have incomplete information about the society they occupy. We consider two possible sources of incomplete information. First, we consider the possibility that an agent is only able to “read” a subset of the overall system variables, and so in any given system state, will have partial information about the state of the system. Second, we consider the possibility that an agent has incomplete information about which agent controls which variables. For both cases, we introduce a logic combining epistemic modalities with the operators of CL-PC, investigate its axiomatization, and discuss its properties.
Knowledge and Control
Wiebe van der Hoekhas 5 papers, Nicolas Troquard, Michael Wooldridgehas 6 papers

R68
In epistemic logic, Kripke structures are used to model the distribution of information in a multi-agent system. In this paper, we present an approach to quantifying how much information each particular agent in a system has, or how important the agent is, with respect to some fact represented as a goal formula. It is typically the case that the goal formula is distributed knowledge in the system, but that no individual agent alone knows it. It might be that several different groups of agents can get to know the goal formula together by combining their individual knowledge. By using power indices developed in voting theory, such as the Banzhaf index, we get a measure of how important an agent is in such groups. We analyse the properties of this notion of information-based power in detail, and characterise the corresponding class of voting games. Although we mainly focus on distributed knowledge, we also look at variants of this analysis using other notions of group knowledge. An advantage of our framework is that power indices and other power properties can be expressed in standard epistemic logic. This allows, e.g., standard model checkers to be used to quantitatively analyse the distribution of information in a given Kripke structure.
Scientia Potentia Est
Thomas Ågotnes, Wiebe van der Hoekhas 5 papers, Michael Wooldridgehas 6 papers

## Session A6 – Robotics and Learning

R70
Maintaining accurate world knowledge in a complex and changing environment is a perennial problem for robots and other artificial intelligence systems. Our architecture for addressing this problem, called Horde, consists of a large number of independent reinforcement learning sub-agents, or demons. Each demon is responsible for answering a single predictive or goal-oriented question about the world, thereby contributing in a factored, modular way to the system's overall knowledge. The questions are in the form of a value function, but each demon has its own policy, reward function, termination function, and terminal-reward function unrelated to those of the base problem. Learning proceeds in parallel by all demons simultaneously so as to extract the maximal training information from whatever actions are taken by the system as a whole. Gradient-based temporal-difference learning methods are used to learn efficiently and reliably with function approximation in this off-policy setting. Horde runs in constant time and memory per time step, and is thus suitable for learning online in real-time applications such as robotics. We present results using Horde on a multi-sensored mobile robot to successfully learn goal-oriented behaviors and long-term predictions from off-policy experience. Horde is a significant incremental step towards a real-time architecture for efficient learning of general knowledge from unsupervised sensorimotor interaction.
Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction
Richard S. Sutton, Joseph Modayil, Michael Delp, Thomas Degris, Patrick M. Pilarski, Adam White, Doina Precuphas 2 papers

B69
In several realistic domains an agent's behavior is composed of multiple interdependent skills. For example, consider a humanoid robot that must play soccer, as is the focus of this paper. In order to succeed, it is clear that the robot needs to walk quickly, turn sharply, and kick the ball far. However, these individual skills are ineffective if the robot falls down when switching from walking to turning, or if it cannot position itself behind the ball for a kick. This paper presents a learning architecture for a humanoid robot soccer agent that has been fully deployed and tested within the RoboCup 3D simulation environment. First, we demonstrate that individual skills such as walking and turning can be parameterized and optimized to match the best performance statistics reported in the literature. These results are achieved through effective use of the CMA-ES optimization algorithm. Next, we describe a framework for optimizing skills in conjunction with one another, a little-understood problem with substantial practical significance. Over several phases of learning, a total of roughly 100-150 parameters are optimized. Detailed experiments show that an agent thus optimized performs comparably with the top teams from the RoboCup 2010 competitions, while taking relatively few man-hours for development.
On Optimizing Interdependent Skills: A Case Study in Simulated 3D Humanoid Robot Soccer
Daniel Urielihas 2 papers, Patrick MacAlpine, Shivaram Kalyanakrishnan, Yinon Bentor, Peter Stonehas 5 papers

## Session B6 – Energy Applications

R72
The creation of Virtual Power Plants (VPPs) has been suggested in recent years as the means for achieving the cost-efficient integration of the many distributed energy resources (DERs) that are starting to emerge in the electricity network. In this work, we contribute to the development of VPPs by offering a game-theoretic perspective to the problem. Specifically, we design cooperatives (or “cooperative VPPs”—CVPPs) of rational autonomous DER-agents representing small-to-medium size renewable electricity producers, which coalesce to profitably sell their energy to the electricity grid. By so doing, we help to counter the fact that individual DERs are often excluded from the wholesale energy market due to their perceived inefficiency and unreliability. We discuss the issues surrounding the emergence of such cooperatives, and propose a pricing mechanism with certain desirable properties. Specifically, our mechanism guarantees that CVPPs have the incentive to truthfully report to the grid accurate estimates of their electricity production, and that larger rather than smaller CVPPs form; this promotes CVPP efficiency and reliability. In addition, we propose a scheme to allocate payments within the cooperative, and show that, given this scheme and the pricing mechanism, the allocation is in the core and, as such, no subset of members has a financial incentive to break away from the CVPP. Moreover, we develop an analytical tool for quantifying the uncertainty about DER production estimates, and distinguishing among different types of errors regarding such estimates. We then utilize this tool to devise protocols to manage CVPP membership. Finally, we demonstrate these ideas through a simulation that uses real-world data.
Cooperatives of Distributed Energy Resources for Efficient Virtual Power Plants
Georgios Chalkiadakis, Valentin Robuhas 2 papers, Ramachandra Kotahas 2 papers, Alex Rogershas 6 papers, Nicholas R. Jenningshas 9 papers

R73
Traffic causes pollution and demands fuel. When it comes to vehicle traffic, intersections tend to be a main bottleneck. Traditional approaches to control traffic at intersections have not been designed to optimize any environmental criterion. Our objective is to design mechanisms for intersection control which minimize fuel consumption.
This is difficult because it requires a specialized infrastructure: It must allow vehicles and intersections to communicate, e.g., vehicles send their dynamic characteristics (position, speed etc.) to the intersection more or less continuously so that it can estimate the fuel consumption. In this context, the use of software agents supports the driver by reducing the necessary degree of direct interaction with the intersection.
In this paper, we quantify the fuel consumption with existing agent-based approaches for intersection control. Further, we propose a new, agent-based mechanism for intersection control, with minimization of fuel consumption as an explicit design objective. It reduces fuel consumption by up to 26% and waiting time by up to 98%, compared to traffic lights. Thus, agent-based mechanisms for intersection control may reduce fuel consumption in a way that is substantial.
How Agents Can Help Curbing Fuel Combustion – a Performance Study of Intersection Control for Fuel-Operated Vehicles
Natalja Pulter, Heiko Schepperle, Klemens Böhmhas 2 papers

G65
Intelligent electricity grids, or `Smart Grids', are being introduced at a rapid pace. Smart grids allow the management of new distributed power generators such as solar panels and wind turbines, and innovative power consumers such as plug-in hybrid vehicles. One challenge in Smart Grids is to fulfill consumer demands while avoiding infrastructure overloads. Another challenge is to reduce imbalance costs: after ahead scheduling of production and consumption (the so-called `load schedule'), unpredictable changes in production and consumption yield a cost for repairing this balance. To cope with these risks and costs, we propose a decentralized, multi-agent system solution for coordinated charging of PHEVs in a Smart Grid. Essentially, the MAS utilizes an "intention graph" for expressing the flexibility of a fleet of PHEVs. Based on this flexibility, charging of PHEVs can be rescheduled in real-time to reduce imbalances. We discuss and evaluate two scheduling strategies for reducing imbalance costs: reactive scheduling and proactive scheduling. Simulations show that reactive scheduling is able to reduce imbalance costs by 14%, while proactive scheduling yields the highest imbalance cost reduction of 44%.
Decentralized Coordination Of Plug-in Hybrid Vehicles For Imbalance Reduction In A Smart Grid
Stijn Vandael, Klaas De Craemer, Nelis Boucké, Tom Holvoet, Geert Deconinck

B70
Plug-in hybrid electric vehicles are expected to place a considerable strain on local electricity distribution networks, requiring charging to be coordinated in order to accommodate capacity constraints. We design a novel online auction protocol for this problem, wherein vehicle owners use agents to bid for power and also state time windows in which a vehicle is available for charging. This is a multi-dimensional mechanism design domain, with owners having non-increasing marginal valuations for each subsequent unit of electricity. In our design, we couple a greedy allocation algorithm with the occasional “burning” of allocated power, leaving it unallocated, in order to adjust an allocation and achieve monotonicity and thus truthfulness. We consider two variations: burning at each time step or on-departure. Both mechanisms are evaluated in depth, using data from a real-world trial of electric vehicles in the UK to simulate system dynamics and valuations. The mechanisms provide higher allocative efficiency than a fixed price system, are almost competitive with a standard scheduling heuristic which assumes non-strategic agents, and can sustain a substantially larger number of vehicles at the same per-owner fuel cost saving than a simple random scheme.
Online Mechanism Design for Electric Vehicle Charging
Enrico H. Gerding, Valentin Robuhas 2 papers, Sebastian Stein, David C. Parkeshas 2 papers, Alex Rogershas 6 papers, Nicholas R. Jenningshas 9 papers

## Session C6 – Voting Protocols

R75
In a voting system, sometimes multiple new alternatives will join the election after the voters' preferences over the initial alternatives have been revealed. Computing whether a given alternative can be a co-winner when multiple new alternatives join the election is called the possible co-winner with new alternatives (PcWNA) problem and was introduced by (Chevaleyre et al., 2010). In this paper, we show that the PcWNA problems are NP-complete for the Bucklin, Copeland*_0*, and maximin (a.k.a. Simpson) rule, even when the number of new alternatives is no more than a constant. We also show that the PcWNA problem can be solved in polynomial time for plurality with runoff. For the approval rule, we examine three different ways to extend a linear order with new alternatives, and characterize the computational complexity of the PcWNA problem for each of them.
Possible Winners When New Alternatives Join: New Results Coming Up!
Lirong Xia, Jérôme Langhas 2 papers, Jérôme Monnot

B71
Electoral control models ways of changing the outcome of an election via such actions as adding/deleting/partitioning either candidates or voters. These actions modify an election's participation structure and aim at either making a favorite candidate win (“constructive control”) or prevent a despised candidate from winning (“destructive control”). To protect elections from such control attempts, computational complexity has been used to show that electoral control, though not impossible, is computationally prohibitive. Recently, Erdélyi and Rothe proved that Brams and Sanver's fallback voting, a hybrid voting system that combines Bucklin with approval voting, is resistant to each of the standard types of control except five types of voter control. They proved that fallback voting is vulnerable to two of those control types, leaving the other three cases open. We solve these three open problems, thus showing that fallback voting is resistant to all standard types of control by partition of voters–which is a particularly important and well-motivated control type, as it models “two-district gerrymandering.” Hence, fallback voting is not only fully resistant to candidate control but also fully resistant to constructive control, and it displays the broadest resistance to control currently known to hold among natural voting systems with a polynomial-time winner problem. We also show that Bucklin voting behaves almost as good in terms of control resistance. Each resistance for Bucklin voting strengthens the corresponding control resistance for fallback voting.
The Complexity of Voter Partition in Bucklin and Fallback Voting: Solving Three Open Problems
Gábor Erdélyi, Lena Piras, Jörg Rothehas 2 papers

G66
A possible winner of an election is a candidate that has, in some kind of incomplete-information election, the possibility to win in a complete extension of the election. The first type of problem we study is the Possible co-Winner with respect to the Addition of New Candidates (PcWNA) problem, which asks, given an election with strict preferences over the candidates, is it possible to make a designated candidate win the election by adding a limited number of new candidates to the election? In the case of unweighted voters we show NP-completeness of PcWNA for a broad class of pure scoring rules. We will also briefly study the case of weighted voters. The second type of possible winner problem we study is Possible Winner/co-Winner under Uncertain Voting System (PWUVS and PcWUVS). Here, uncertainty is present not in the votes but in the election rule itself. For example, PcWUVS is the problem of whether, given a set *C* of candidates, a list of votes over *C*, a distinguished candidate *c ın C*, and a class of election rules, there is at least one election rule from this class under which *c* wins the election. We study these two problems for a class of systems based on approval voting, the family of Copeland*^α* elections, and a certain class of scoring rules. Our main result is that it is NP-complete to determine whether there is a scoring vector that makes c win the election, if we restrict the set of possible scoring vectors for an m-candidate election to those of the form *(α_1, …, α_{m-4}, x_1, x_2, x_3, 0)*, with *x_i = 1* for at least one *i ın {1, 2, 3}*.
Computational Complexity of Two Variants of the Possible Winner Problem
Dorothea Baumeister, Magnus Roos, Jörg Rothehas 2 papers

## Session D6 – Trust and Organisational Structure

G67
We propose that the trust an agent places in another agent declaratively captures an architectural *connector* between the two agents. We formulate trust as a generic modality expressing a relationship between a truster and a trustee. Specifically, trust here is *definitionally independent* of, albeit constrained by, other relevant modalities such as commitments and beliefs. Trust applies to a variety of attributes of the relationship between truster and trustee. For example, an agent may trust someone to possess an important capability, exercise good judgment, or to intend to help it. Although such varieties of trust are hugely different, they respect common logical patterns. We present a logic of trust that expresses such patterns as reasoning postulates concerning the static representation of trust, its dynamics, and its relationships with teamwork and other agent interactions. In this manner, the proposed logic illustrates the general properties of trust that reflect natural intuitions, and can facilitate the engineering of multiagent systems.
Trust as Dependence: A Logical Approach
Munindar P. Singhhas 5 papers

## Session A7 – Argumentation and Negotiation

R78
There is now considerable evidence in social psychology, economics, and related disciplines that emotion plays an important role in negotiation. For example, humans make greater concessions in negotiation to an opposing human who expresses anger, and they make fewer concessions to an opponent who expresses happiness, compared to a no-emotion-expression control. However, in AI, despite the wide interest in negotiation as a means to resolve differences between agents and humans, emotion has been largely ignored. This paper explores whether expression of anger or happiness by computer agents, in a multi-issue negotiation task, can produce effects that resemble effects seen in human-human negotiation. The paper presents an experiment where participants play with agents that express emotions (anger vs. happiness vs. control) through different modalities (text vs. facial displays). An important distinction in our experiment is that participants are aware that they negotiate with computer agents. The data indicate that the emotion effects observed in past work with humans also occur in agent-human negotiation, and occur independently of modality of expression. The implications of these results are discussed for the fields of automated negotiation, intelligent virtual agents and artificial intelligence.
The Effect of Expression of Anger and Happiness in Computer Agents on Negotiations with Humans
Celso M. de Melo, Peter Carnevale, Jonathan Gratchhas 2 papers

## Session B7 – Planning

B75
Over the past few years, attempts to scale up infinite-horizon DEC-POMDPs are mainly due to approximate algorithms, but without the theoretical guarantees of their exact counterparts. In contrast, *ε*-optimal methods have only theoretical significance but are not efficient in practice. In this paper, we introduce an algorithmic framework (*β*-PI) that exploits the scalability of the former while preserving the theoretical properties of the latter. We build upon *β*-PI a family of approximate algorithms that can find (provably) errorbounded solutions in reasonable time. Among this family, H-PI uses a branch-and-bound search method that computes a near-optimal solution over distributions over histories experienced by the agents. These distributions often lie near structured, low-dimensional subspace embedded in the high-dimensional sufficient statistic. By planning only on this subspace, H-PI successfully solves all tested benchmarks, outperforming standard algorithms, both in solution time and policy quality.
Toward Error-Bounded Algorithms for Infinite-Horizon DEC-POMDPs
Jilles S. Dibangoye, Abdel-Illah Mouaddib, Brahim Chaib-draa

B76
The use of distributed POMDPs for cooperative teams has been severely limited by the incredibly large joint policyspace that results from combining the policy-spaces of the individual agents. However, much of the computational cost of exploring the entire joint policy space can be avoided by observing that in many domains important interactions between agents occur in a relatively small set of scenarios, previously defined as *coordination locales* (CLs). Moreover, even when numerous interactions *might* occur, given a set of individual policies there are relatively few *actual* interactions. Exploiting this observation and building on an existing model shaping algorithm, this paper presents D-TREMOR, an algorithm in which cooperative agents iteratively generate individual policies, identify and communicate possible interactions between their policies, shape their models based on this information and generate new policies. D-TREMOR has three properties that jointly distinguish it from previous DEC-POMDP work: (1) it is completely distributed; (2) it is scalable (allowing 100 agents to compute a “good” joint policy in under 6 hours) and (3) it has low communication overhead. D-TREMOR complements these traits with the following key contributions, which ensure improved scalability and solution quality: (a) techniques to ensure convergence; (b) faster approaches to detect and evaluate CLs; (c) heuristics to capture dependencies between CLs; and (d) novel shaping heuristics to aggregate effects of CLs. While the resulting policies are not globally optimal, empirical results show that agents have policies that effectively manage uncertainty and the joint policy is better than policies generated by independent solvers.
Distributed Model Shaping for Scaling to Decentralized POMDPs with Hundreds of Agents
Prasanna Velagapudi, Pradeep Varakanthamhas 4 papers, Katia Sycarahas 7 papers, Paul Scerrihas 3 papers

G73
This contribution proposes a model for argumentation-based multi-agent planning, with a focus on cooperative scenarios. It consists in a multi-agent extension of DeLP-POP, partial order planning on top of argumentation-based defeasible logic programming. In DeLP-POP, actions and arguments (combinations of rules and facts) may be used to enforce some goal, if their conditions (are known to) apply and arguments are not defeated by other arguments applying. In a cooperative planning problem a team of agents share a set of goals but have diverse abilities and beliefs. In order to plan for these goals, agents start a stepwise dialogue consisting of exchanges of plan proposals, plus arguments against them. Since these dialogues instantiate an A search algorithm, these agents will find a solution if some solution exists, and moreover, it will be provably optimal (according to their knowledge).
Multiagent Argumentation for Cooperative Planning in DeLP-POP
Pere Pardo, Sergio Pajares, Eva Onaindiahas 2 papers, Pilar Dellunde, Lluís Godo

## Session C7 – Game Theory II

## Session D7 – Virtual Agents II

G78
In this paper, we merge speech act theory, emotion theory, and logic. We propose a modal logic that integrates the concepts of belief, goal, ideal and responsibility and that allows to describe what a given agent expresses in the context of a conversation with another agent. We use the logic in order to provide a systematic analysis of expressive speech acts, that is, speech acts that are aimed at expressing a given emotion (e.g. to apologize, to thank, to reproach, etc.).
The Face of Emotions: A Logical Formalization of Expressive Speech Acts
Nadine Guiraudhas 2 papers, Dominique Longin, Emiliano Lorinihas 2 papers, Sylvie Pesty, Jérémy Rivière

G79
While speaking about social interaction, psychology claims as crucial the temporal correlations between interactants' behaviors: to give to their partners a feeling of natural interaction, interactants, be human, robotic or virtual, must be able to react on appropriate time. Recent approaches consider autonomous agents as dynamical systems and the interaction as a coupling between these systems. These approaches solve the issue of time handling and enable to model synchronization and turn-taking as phenomenon emerging with the coupling. But when complex computations are added to their architecture, such as processing of video and audio signals, delays appear within the interaction loop and disrupt this coupling. We model here a dyad of agents where processing delays are controlled. These agents, driven by oscillators, synchronize and take turns when there is no delay. We describe the methodology enabling to evaluate the synchrony and turn-taking emergence. We test oscillators coupling properties when there is no delay: coupling occurs if coupling strength is inferior to the parameter controlling oscillators natural period and if the ratio between oscillators periods is inferior to 1/2. We quantify the maximal delays between agents which do not disrupt the interaction: the maximal delay tolerated by agents is proportional to the natural period of the coupled system and to the strength of the coupling. These results are put in perspective with the different time constraints of human-human and human-agent interactions.
Effect of Time Delays on Agents' Interaction Dynamics
Ken Prepin, Catherine Pelachaud