Full Papers

Session BP1 – Best Papers Session I
Session BP2 – Best Papers Session II
Session A1 – Robotics
Session B1 – Distributed Problem Solving I
Session C1 – Game Theory I
Session D1 – Multiagent Learning
Session A2 – Logic-Based Approaches I
Session B2 – Agent-Based System Development I
Session C2 – Social Choice Theory
Session D2 – Preferences and Strategies
Session A3 – Distributed Problem Solving II
Session B3 – Agent-Based System Development II
Session C3 – Bounded Rationality
Session D3 – Virtual Agents I
Session A4 – Agent Communication
Session B4 – Game Theory and Learning
Session C4 – Teamwork
Session A5 – Learning Agents
Session B5 – Auction and Incentive Design
Session C5 – Simulation and Emergence
Session D5 – Logic-Based Approaches II
Session A6 – Robotics and Learning
Session B6 – Energy Applications
Session C6 – Voting Protocols
Session D6 – Trust and Organisational Structure
Session A7 – Argumentation and Negotiation
Session B7 – Planning
Session C7 – Game Theory II
Session D7 – Virtual Agents II

Session BP1 – Best Papers Session I

B39

Central to the vision of the smart grid is the deployment of smart meters that will allow autonomous software agents, representing the consumers, to optimise their use of devices and heating in the smart home while interacting with the grid. However, without some form of coordination, the population of agents may end up with overly-homogeneous optimised consumption patterns that may generate significant peaks in demand in the grid. These peaks, in turn, reduce the efficiency of the overall system, increase carbon emissions, and may even, in the worst case, cause blackouts. Hence, in this paper, we introduce a novel model of a Decentralised Demand Side Management (DDSM) mechanism that allows agents, by adapting the deferment of their loads based on grid prices, to coordinate in a decentralised manner. Specifically, using average UK consumption profiles for 26M homes, we demonstrate that, through an emergent coordination of the agents, the peak demand of domestic consumers in the grid can be reduced by up to 17% and carbon emissions by up to 6%. We also show that our DDSM mechanism is robust to the increasing electrification of heating in UK homes (i.e., it exhibits a similar efficiency).
Agent-Based Control for Decentralised Demand Side Management in the Smart Grid Sarvapali D. Ramchurn, Perukrishnen Vytelingum, Alex Rogers, Nicholas R. Jennings

G38

Grid-Integrated Vehicles (GIVs) are plug-in Electric Drive Vehicles (EDVs) with power-management and other controls that allow them to respond to external commands sent by power-grid operators, or their affiliates, when parked and plugged in to the grid. At a bare minimum, such GIVs should respond to demand-management commands or pricing signals to delay, reduce or switch off charging when the demand for electricity is high. In more advanced cases, these GIVs might sell both power and storage capacity back to the grid in any of the several electric power markets – a concept known as Vehicle-to-Grid power or V2G power. Although individual EDVs control too little power to sell in the market at an individual level, a large group of EDVs may form an aggregate or coalition that controls enough power to meaningfully sell, at a profit, in these markets. The profits made by such a coalition can then be used by the coalition members to offset the costs of the electric vehicles and batteries themselves. In this paper we describe an implemented and deployed multi-agent system that is used to integrate EDVs into the electricity grid managed by PJM, the largest transmission service operator in the world. We provide a brief introduction to GIVs and the various power markets and discuss why multi-agent systems are a good match for this application.
Deploying Power Grid-Integrated Electric Vehicles as a Multi-Agent System Sachin Kamboj, Willett Kempton, Keith S. Decker

B40

In this paper we propose a Multi-Agent version of UCT Monte Carlo Go. We use the emergent behavior of a great number of simple agents to increase the quality of the Monte Carlo simulations, increasing the strength of the artificial player as a whole. Instead of one agent playing against itself, different agents play in the simulation phase of the algorithm, leading to a better exploration of the search space. We significantly outperformed Fuego, a top Computer Go program. Emergent behavior seems to be the next step of Computer Go development.
Multi-Agent Monte Carlo Go Leandro Soriano Marcolino, Hitoshi Matsubara

G39

Researchers in the field of multiagent sequential decision making have commonly used the terms “weakly-coupled” and “loosely-coupled” to qualitatively classify problems involving agents whose interactions are limited, and to identify various structural restrictions that yield computational advantages to decomposing agents' centralized planning and reasoning into largely-decentralized planning and reasoning. Together, these restrictions make up a heterogeneous collection of facets of “weakly-coupled” structure that are conceptually related, but whose purported computational benefits are hard to compare evenhandedly. The contribution of this paper is a unified characterization of weak coupling that brings together three complementary aspects of agent interaction structure. By considering these aspects in combination, we derive new bounds on the computational complexity of optimal Dec-POMDP planning that together quantify the relative benefits of exploiting different forms of interaction structure. Further, we demonstrate how our characterizations can be used to explain why existing classes of decoupled solution algorithms perform well on some problems but poorly on others, as well as to predict the performance of a particular algorithm from identifiable problem attributes.
Towards a Unifying Characterization for Quantifying Weak Coupling in Dec-POMDPs Stefan J. Witwicki, Edmund H. Durfee

G40

Building on research previously reported at AAMAS conferences, this paper describes an innovative application of a novel game-theoretic approach for a national scale security deployment. Working with the United States Transportation Security Administration (TSA), we have developed a new application called GUARDS to assist in resource allocation tasks for airport protection at over 400 United States airports. In contrast with previous efforts such as ARMOR and IRIS, which focused on one-off tailored applications and one security activity (e.g. canine patrol or checkpoints) per application, GUARDS faces three key issues: (i) reasoning about hundreds of heterogeneous security activities; (ii) reasoning over diverse potential threats; (iii) developing a system designed for hundreds of end-users. Since a national deployment precludes tailoring to specific airports, our key ideas are: (i) creating a new game-theoretic framework that allows for heterogeneous defender activities and compact modeling of a large number of threats; (ii) developing an efficient solution technique based on general purpose Stackelberg game solvers; (iii) taking a partially centralized approach for knowledge acquisition and development of the system. In doing so we develop a software scheduling assistant, GUARDS, designed to reason over two agents – the TSA and a potential adversary – and allocate the TSA's limited resources across hundreds of security activities in order to provide protection within airports. The scheduling assistant has been delivered to the TSA and is currently under evaluation and testing for scheduling practices at an undisclosed airport. If successful, the TSA intends to incorporate the system into their unpredictable scheduling practices nationwide. In this paper we discuss the design choices and challenges encountered during the implementation of GUARDS. GUARDS represents promising potential for transitioning years of academic research into a nationally deployed system.
GUARDS - Game Theoretic Security Allocation on a National Scale James Pita, Milind Tambe, Christopher Kiekintveld, Shane Cullen, Erin Steigerwald

Session BP2 – Best Papers Session II

B41

In recent years, several bilateral protocols regulating the exchange of arguments between agents have been proposed. When dealing with persuasion, the objective is to arbitrate among conflicting viewpoints. Often, these debates are not entirely predetermined from the initial situation, which means that agents have a chance to influence the outcome in a way that fits their individual preferences. This paper introduces a simple and intuitive protocol for multiparty argumentation, in which several (more than two) agents are equipped with argumentation systems. We further assume that they focus on a (unique) argument (or issue) – thus making the debate two-sided – but do not coordinate. We study what outcomes can (or will) be reached if agents follow this protocol. We investigate in particular under which conditions the debate is pre-determined or not, and whether the outcome coincides with the result obtained by merging the argumentation systems.
On the Outcomes of Multiparty Persuasion Elise Bonzon, Nicolas Maudet

R38

Overlapping Coalition Formation (OCF) games are cooperative games where the players can simultaneously participate in several coalitions. Capturing the notion of stability in OCF games is a difficult task: a player may deviate by abandoning some, but not all of the coalitions he is involved in, and the crucial question is whether he then gets to keep his payoff from the unaffected coalitions. In related work the authors introduce three stability concepts for OCF games – the conservative, refined, and optimistic core – that are based on different answers to this question. In this paper, we propose a unified framework for the study of stability in the OCF setting, which encompasses the concepts considered previously as well as a wide variety of alternative stability concepts. Our approach is based on the notion of an arbitrator, which can be thought of as an external party that determines payoff to deviators. We give a complete characterization of outcomes that are stable under arbitration. In particular, our results provide a criterion for the outcome to be in the refined or optimistic core, thus complementing previous results for the conservative core, and answering questions left open previously. We also introduce a notion of the nucleolus for arbitrated OCF games, and argue that it is non-empty. Finally, we extend the definition of the Shapley value to the OCF setting, and provide an axiomatic characterization for it.
Arbitrators in Overlapping Coalition Formation Games Yair Zick, Edith Elkind

R39

Online digital goods auctions are settings where a seller with an unlimited supply of goods (e.g. music or movie downloads) interacts with a stream of potential buyers. In the posted price setting, the seller makes a take-it-or-leave-it offer to each arriving buyer. We study the seller's revenue maximization problem in posted-price auctions of digital goods. We find that algorithms from the multi-armed bandit literature like UCB, which come with good regret bounds, can be slow to converge. We propose and study two alternatives: (1) a scheme based on using Gittins indices with priors that make appropriate use of domain knowledge; (2) a new learning algorithm, LLVD, that assumes a linear demand curve, and maintains a Beta prior over the free parameter using a moment-matching approximation. LLVD is not only (approximately) optimal for linear demand, but also learns fast and performs well when the linearity assumption is violated, for example in the cases of two natural valuation distributions, exponential and log-normal.
Learning the Demand Curve in Posted-Price Digital Goods Auctions Meenal Chhabra, Sanmay Das
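
To make the posted-price setting concrete, the following is a minimal sketch of the bandit baseline the abstract mentions: standard UCB1 run over a small grid of candidate prices. It is not the paper's LLVD algorithm or its Gittins-index scheme. The function name `ucb1_posted_price`, the price grid, and the assumption that prices and valuations lie in [0, 1] are illustrative choices, not taken from the paper.

```python
import math


def ucb1_posted_price(prices, valuations):
    """Run UCB1 over a fixed grid of posted prices (hypothetical helper).

    prices     -- candidate take-it-or-leave-it prices, assumed to lie in [0, 1]
    valuations -- one private valuation per arriving buyer, assumed in [0, 1]
    Returns (counts, revenue): how often each price was posted and the
    total revenue collected at that price.
    """
    counts = [0] * len(prices)
    revenue = [0.0] * len(prices)
    for t, valuation in enumerate(valuations, start=1):
        if t <= len(prices):
            # Initialisation: post every price once.
            arm = t - 1
        else:
            # UCB1 index: empirical mean revenue plus an exploration bonus.
            arm = max(
                range(len(prices)),
                key=lambda i: revenue[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # The buyer accepts only if the posted price does not exceed the valuation.
        reward = prices[arm] if valuation >= prices[arm] else 0.0
        counts[arm] += 1
        revenue[arm] += reward
    return counts, revenue


if __name__ == "__main__":
    # Illustrative stream: every buyer values the good at 0.6.
    counts, revenue = ucb1_posted_price([0.2, 0.5, 0.9], [0.6] * 2000)
    print(counts, revenue)
```

Discretising the price range is a simplification; a finer grid trades a better achievable price against more exploration, which is one way to see why such baselines can be slow to converge.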

B42

In their groundbreaking paper, Bartholdi, Tovey and Trick argued that many well-known voting rules, such as Plurality, Borda, Copeland and Maximin are easy to manipulate. An important assumption made in that paper is that the manipulator's goal is to ensure that his preferred candidate is among the candidates with the maximum score, or, equivalently, that ties are broken in favor of the manipulator's preferred candidate. In this paper, we examine the role of this assumption in the easiness results of Bartholdi et al. We observe that the algorithm presented in Bartholdi et al. extends to all rules that break ties according to a fixed ordering over the candidates. We then show that all scoring rules are easy to manipulate if the winner is selected from all tied candidates uniformly at random. This result extends to Maximin under an additional assumption on the manipulator's utility function that is inspired by the original model of Bartholdi et al. In contrast, we show that manipulation becomes hard when arbitrary polynomial-time tie-breaking rules are allowed, both for the rules considered in Bartholdi et al., and for a large class of scoring rules.
Ties Matter: Complexity of Voting Manipulation Revisited Svetlana Obraztsova, Edith Elkind, Noam Hazon
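
As a small illustration of why the tie-breaking rule matters, the sketch below handles only the simplest case: Plurality with a single manipulator and ties broken by a fixed ordering over candidates. It is not the algorithm of Bartholdi et al. or of this paper; the function names and the list-based vote encoding are assumptions made for the example.

```python
from collections import Counter


def plurality_winner(votes, tie_order):
    """Return the Plurality winner, breaking ties by a fixed candidate ordering.

    votes     -- list with each voter's top-ranked candidate
    tie_order -- all candidates, earliest entry wins a tie (assumed complete)
    """
    scores = Counter(votes)
    top_score = max(scores.get(candidate, 0) for candidate in tie_order)
    for candidate in tie_order:
        if scores.get(candidate, 0) == top_score:
            return candidate


def manipulator_can_win(other_votes, preferred, tie_order):
    """Check whether one extra voter can make `preferred` the winner.

    Under Plurality the manipulator's best move is to vote for the preferred
    candidate, so it suffices to test that single vote.
    """
    return plurality_winner(list(other_votes) + [preferred], tie_order) == preferred


if __name__ == "__main__":
    others = ["a", "a", "b"]
    # The same votes give different answers under different tie orderings.
    print(manipulator_can_win(others, "b", ["a", "b"]))
    print(manipulator_can_win(others, "b", ["b", "a"]))
```

The same profile is manipulable under one ordering and not under the other, which is the kind of sensitivity to tie-breaking the abstract examines for richer rules.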

R40

Boolean games are a natural, compact, and expressive class of logic-based games, in which each player exercises unique control over some set of Boolean variables, and has some logical goal formula that it desires to be achieved. A player's strategy set is the set of all possible valuations that may be made to its variables. A player's goal formula may contain variables controlled by other agents, and in this case, it must reason strategically about how best to assign values to its variables. In the present paper, we consider the possibility of overlaying Boolean games with taxation schemes. A taxation scheme imposes a cost on every possible assignment an agent can make. By designing a taxation scheme appropriately, it is possible to perturb the preferences of the agents within a society, so that agents are rationally incentivised to choose some socially desirable equilibrium that would not otherwise be chosen, or incentivised to rule out some socially undesirable equilibria. After formally presenting the model, we explore some issues surrounding it (e.g., the complexity of finding a taxation scheme that implements some socially desirable outcome), and then discuss possible desirable properties of taxation schemes.
Designing Incentives for Boolean Games Ulle Endriss, Sarit Kraus, Jérôme Lang, Michael Wooldridge

Session A1 – Robotics

R41

A common decision problem in multi-robot applications involves deciding on which robot, out of a group of N robots, should travel to a goal location, to carry out a task there. Trivially, this decision problem can be solved greedily, by selecting the robot with the shortest expected travel time. However, this ignores the inherent uncertainty in path traversal times; we may prefer a robot that is slower (but always takes the same time), over a robot that is expected to reach the goal faster, but on occasion takes a very long time to arrive. We make several contributions that address this challenge. First, we bring to bear economic decision-making theory, to distinguish between different selection policies, based on risk (risk averse, risk seeking, etc.). Second, we introduce social regret (the difference between the actual travel time by the selected robot, and the hypothetical time of other robots) to augment decision-making in practice. Then, we carry out experiments in simulation and with real robots, to demonstrate the usefulness of the selection procedures under real-world settings, and find that travel-time distributions have repeating characteristics.
Who Goes There? Selecting a Robot to Reach a Goal Using Social Regret Meytal Traub, Gal A. Kaminka, Noa Agmon
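
The contrast between the greedy policy and a risk-averse one can be sketched as follows. This is an illustrative mean-plus-spread score, not the selection policies or the social regret measure defined in the paper; `select_robot`, the `risk_weight` parameter, and the sample travel times are all hypothetical.

```python
from statistics import mean, pstdev


def select_robot(travel_times, risk_weight=0.0):
    """Pick the robot with the lowest risk-adjusted travel time.

    travel_times -- dict mapping a robot name to observed travel times
    risk_weight  -- 0 reproduces the greedy expected-time rule; larger
                    values penalise variability (a risk-averse policy)
    """
    def score(name):
        samples = travel_times[name]
        return mean(samples) + risk_weight * pstdev(samples)

    return min(travel_times, key=score)


if __name__ == "__main__":
    observed = {
        "steady": [10, 10, 10, 10],   # slower, but always the same
        "erratic": [6, 6, 6, 20],     # faster on average, occasionally very slow
    }
    print(select_robot(observed, risk_weight=0.0))
    print(select_robot(observed, risk_weight=1.0))
```

With `risk_weight=0.0` the erratic robot has the lower mean and is selected; with `risk_weight=1.0` its spread outweighs the half-unit advantage in mean time and the steady robot is chosen instead.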

R42

Autonomous mobile robots are considered a valuable technology for search and rescue applications, where an initially unknown environment has to be explored to locate human victims. In this scenario, robots exploit exploration strategies to autonomously move around the environment. Most of the strategies proposed in the literature are based on the idea of evaluating a number of candidate locations according to ad hoc utility functions that combine different criteria. In this paper, we show some of the advantages of using a more theoretically-grounded approach, based on Multi-Criteria Decision Making (MCDM), to define exploration strategies for robots employed in search and rescue applications. We implemented some MCDM-based exploration strategies within an existing robot controller and we experimentally evaluated their performance in a simulated environment.
Exploration Strategies Based on Multi-Criteria Decision Making for Search and Rescue Autonomous Robots Nicola Basilico, Francesco Amigoni

G41

Performing everyday manipulation tasks successfully depends on the ability of autonomous robots to appropriately account for the physical behavior of task-related objects. This means that robots have to predict and consider the physical effects of their possible actions. In this work we investigate a simulation-based approach to naive physics temporal projection in the context of autonomous robot everyday manipulation. We identify the abstractions underlying typical first-order axiomatizations as the key obstacles for making valid naive physics predictions. We propose that temporal projection for naive physics problems should not be performed based on abstractions but rather based on detailed physical simulations. This idea is realized as a temporal projection system for autonomous manipulation robots that translates naive physics problems into parametrized physical simulation tasks, that logs the data structures and states traversed in simulation, and translates the logged data back into symbolic time-interval-based first-order representations. Within this paper, we describe the concept and implementation of the temporal projection system and present the example of an egg-cracking robot for demonstrating its feasibility.
Simulation-based Temporal Projection of Everyday Robot Object Manipulation Lars Kunze, Mihai Emanuel Dolha, Emitza Guzman, Michael Beetz

B43

Autonomy requires robustness. The use of unmanned (autonomous) vehicles is appealing for tasks which are dangerous or dull. However, increased reliance on autonomous robots increases reliance on their robustness. Even with validated software, physical faults can cause the controlling software to perceive the environment incorrectly, and thus to make decisions that lead to task failure. We present an online anomaly detection method for robots, that is light-weight, and is able to take into account a large number of monitored sensors and internal measurements, with high precision. We demonstrate a specialization of the familiar Mahalanobis Distance for robot use, and also show how it can be used even with very large dimensions, by online selection of correlated measurements for its use. We empirically evaluate these contributions in different domains: commercial Unmanned Aerial Vehicles (UAVs), a vacuum-cleaning robot, and a high-fidelity flight simulator. We find that the online Mahalanobis distance technique, presented here, is superior to previous methods.
Online Anomaly Detection in Unmanned Vehicles Eliahu Khalastchi, Gal A. Kaminka, Meir Kalech, Raz Lin
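
For readers unfamiliar with the Mahalanobis distance, here is a minimal two-sensor sketch of the standard (textbook) definition. It does not implement the paper's online specialisation or its selection of correlated measurements; the function name and the sample readings are made up for illustration.

```python
import math


def mahalanobis_2d(samples, point):
    """Mahalanobis distance of `point` from the distribution of `samples`.

    samples -- list of (x, y) readings from two sensors (needs at least 3
               readings that are not perfectly collinear)
    point   -- the (x, y) reading to score
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean_x, point[1] - mean_y
    # d^2 = v^T S^{-1} v, using the closed-form inverse of a 2x2 matrix.
    squared = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(squared, 0.0))


if __name__ == "__main__":
    # Two strongly correlated (made-up) sensor channels.
    readings = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
    print(mahalanobis_2d(readings, (5, 10)))  # follows the usual correlation
    print(mahalanobis_2d(readings, (3, 9)))   # breaks it
```

The reading (3, 9) is closer to the mean in Euclidean terms than (5, 10), yet scores far higher because it violates the correlation between the two channels; that sensitivity is what makes the measure attractive for anomaly detection.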

B44

Incremental heuristic search algorithms can solve sequences of similar search problems potentially faster than heuristic search algorithms that solve each search problem from scratch. So far, there have existed incremental heuristic search algorithms (such as Adaptive A*) that make the h-values of the current A* search more informed, which can speed up future A* searches, and incremental heuristic search algorithms (such as D* Lite) that change the search tree of the current A* search to the search tree of the next A* search, which can be faster than constructing it from scratch. In this paper, we present Tree Adaptive A*, which applies to goal-directed navigation in unknown terrain and builds on Adaptive A* but combines both classes of incremental heuristic search algorithms in a novel way. We demonstrate experimentally that it can run faster than Adaptive A*, Path Adaptive A* and D* Lite, the top incremental heuristic search algorithms in the context of goal-directed navigation in unknown grids.
Tree Adaptive A* Carlos Hernández, Xiaoxun Sun, Sven Koenig, Pedro Meseguer

Session B1 – Distributed Problem Solving I

G42

k- and t-optimality algorithms provide solutions to DCOPs that are optimal in regions characterized by their size and distance respectively. Moreover, they provide quality guarantees on their solutions. Here we generalise the k- and t-optimal framework to introduce C-optimality, a flexible framework that provides reward-independent quality guarantees for optima in regions characterised by any arbitrary criterion. Therefore, C-optimality allows us to explore the space of criteria (beyond size and distance) looking for those that lead to better solution qualities. We benefit from this larger space of criteria to propose a new criterion, the so-called size-bounded-distance criterion, which outperforms k- and t-optimality.
Quality Guarantees for Region Optimal DCOP Algorithms Meritxell Vinyals, Eric Shieh, Jesus Cerquides, Juan Antonio Rodriguez-Aguilar, Zhengyu Yin, Milind Tambe, Emma Bowring

G43

Scheduling agents can use the Multiagent Simple Temporal Problem (MaSTP) formulation to efficiently find and represent the complete set of alternative consistent joint schedules in a distributed and privacy-maintaining manner. However, continually revising this set of consistent joint schedules as new constraints arise may not be a viable option in environments where communication is uncertain, costly, or otherwise problematic. As an alternative, agents can find and represent a temporal decoupling in terms of locally independent sets of consistent schedules that, when combined, form a set of consistent joint schedules. Unlike current algorithms for calculating a temporal decoupling that require centralization of the problem representation, in this paper we present a new, provably correct, distributed algorithm for calculating a temporal decoupling. We prove that this algorithm has the same theoretical computational complexity as current state-of-the-art MaSTP solution algorithms, and empirically demonstrate that it is more efficient in practice. We also introduce and perform an empirical cost/benefit analysis of new techniques and heuristics for selecting a maximally flexible temporal decoupling.
Distributed Algorithms for Solving the Multiagent Temporal Decoupling Problem James C. Boerkoel, Edmund H. Durfee

R43

Distributed systems can often be modeled as a collection of distributed (system) variables whose values are constrained by a set of constraints. In distributed multi-agent systems, the set of variables occurring at a site (subsystem) is usually viewed as controllable by a local agent. This agent assigns values to the variables, and the aim is to provide distributed methods enabling a set of agents to come up with a global assignment (solution) that satisfies all the constraints. Alternatively, the system might be understood as a distributed database. Here, the focus is on ensuring consistency of the global system if local constraints (the distributed parts of the database) change. In this setting, the aim is to determine whether the existence of a global solution can be guaranteed. In other settings (e.g., P2P systems, sensor networks), the values of the variables might be completely out of control of the individual systems, and the constraints only characterize globally normal states or behavior of the system. In order to detect anomalies, one specifies distributed methods that can efficiently indicate violations of such constraints. The aim of this paper is to show that the following three main problems identified in these research areas are in fact identical: (i) the problem of ensuring that independent agents come up with a global solution; (ii) the problem of ensuring that global consistency is maintained if local constraint stores change; and (iii) the problem of ensuring that global violations can be detected by local nodes. This claim is made precise by developing a decomposition framework for distributed constraint systems and then extracting preservation properties that must be satisfied in order to solve the above-mentioned problems.
Although satisfying the preservation properties seems to require different decomposition modes, our results demonstrate that in fact these decomposition properties are equivalent, thereby showing that the three main problems identified above are identical. We then show that the complexity of finding such decompositions is polynomially related to finding solutions for the original constraint system, which explains the popularity of decomposition applied to tractable constraint systems. Finally, we address the problem of finding optimal decompositions and show that even for tractable constraint systems, this problem is hard.
Decomposing Constraint Systems: Equivalences and Computational Properties Wiebe van der Hoek, Cees Witteveen, Michael Wooldridge

B45

Anytime algorithms allow a system to trade solution quality for computation time. In previous work, monitoring techniques have been developed to allow agents to stop the computation at the “right” time so as to optimize a given time-dependent utility function. However, these results apply only to the single-agent case. In this paper we analyze the problems that arise when several agents solve components of a larger problem, each using an anytime algorithm. Monitoring in this case is more challenging as each agent is uncertain about the progress made so far by the others. We develop a formal framework for decentralized monitoring, establish the complexity of several interesting variants of the problem, and propose solution techniques for each one. Finally, we show that the framework can be applied to decentralized flow and planning problems.
Decentralized Monitoring of Distributed Anytime Algorithms Alan Carlin, Shlomo Zilberstein

B46

We consider the fundamental problem of reaching consensus in multiagent systems; an operation required in many applications such as, among others, vehicle formation and coordination, shape formation in modular robotics, distributed target tracking, and environmental modeling. To date, the consensus problem (the problem where agents have to agree on their reported values) has been typically solved with iterative decentralized algorithms based on graph Laplacians. However, the convergence of these existing consensus algorithms is often too slow for many important multiagent applications, and thus they are increasingly being combined with acceleration methods. Unfortunately, state-of-the-art acceleration techniques require parameters that can be optimally selected only if complete information about the network topology is available, which is rarely the case in practice. We address this limitation by deriving two novel acceleration methods that can deliver good performance even if little information about the network is available. The first proposed algorithm is based on the Chebyshev semi-iterative method and is optimal in a well defined sense; it maximizes the worst-case convergence speed (in the mean sense) given that only rough bounds on the extremal eigenvalues of the network matrix are available. It can be applied to systems where agents use unreliable communication links, and its computational complexity is similar to that of simple Laplacian-based methods. This algorithm requires synchronization among agents, so we also propose an asynchronous version that approximates the output of the synchronous algorithm. Mathematical analysis and numerical simulations show that the convergence speed of the proposed acceleration methods decreases gracefully in scenarios where the sole use of Laplacian-based methods is known to be impractical.
Consensus Acceleration in Multiagent Systems with the Chebyshev Semi-Iterative Method Renato L.G. Cavalcante, Alex Rogers, Nicholas R. Jennings
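
As background, the plain Laplacian-based iteration that the abstract takes as its starting point can be sketched in a few lines. This is only the unaccelerated baseline, not the Chebyshev semi-iterative method or its asynchronous variant; the function name, the example graph, and the step size are assumptions chosen for illustration.

```python
def laplacian_consensus(values, edges, step, iterations):
    """Iterate x <- x - step * L x on an undirected graph.

    values     -- initial value reported by each agent
    edges      -- list of (i, j) pairs, one per undirected communication link
    step       -- step size; assumed smaller than 2 / (largest eigenvalue of L)
    iterations -- number of synchronous rounds
    """
    x = [float(v) for v in values]
    for _ in range(iterations):
        # delta[i] accumulates sum_j (x[j] - x[i]) over neighbours j, i.e. -(L x)[i].
        delta = [0.0] * len(x)
        for i, j in edges:
            diff = x[j] - x[i]
            delta[i] += diff
            delta[j] -= diff
        x = [xi + step * di for xi, di in zip(x, delta)]
    return x


if __name__ == "__main__":
    # Three agents on a path graph 0 - 1 - 2; the average of the inputs is 4.
    print(laplacian_consensus([0, 3, 9], [(0, 1), (1, 2)], step=0.3, iterations=200))
```

Each agent only uses its neighbours' values, and the sum of all values is preserved every round, so on a connected graph the agents converge to the average of the initial values. How fast they converge depends on the eigenvalues of the Laplacian, which is where acceleration methods come in.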

Session C1 – Game Theory I

R44

Proper scoring rules, particularly when used as the basis for a prediction market, are powerful tools for eliciting and aggregating beliefs about events such as the likely outcome of an election or sporting event. Such scoring rules incentivize a single agent to reveal her true beliefs about the event. Othman and Sandholm introduced the idea of a decision rule to examine these problems in contexts where the information being elicited is conditional on some decision alternatives. For example, “What is the probability of having ten million viewers if we choose to air new television show X? What if we choose Y?” Since only one show can actually air in a slot, only the results under the chosen alternative can ever be observed. Othman and Sandholm developed proper scoring rules (and thus decision markets) for a single, deterministic decision rule: always select the action with the greatest probability of success. In this work we significantly generalize their results, developing scoring rules for other deterministic decision rules, randomized decision rules, and situations where there may be more than two outcomes (e.g. less than a million viewers, more than one but less than ten, or more than ten million).
Information Elicitation for Decision Making Yiling Chen, Ian A. Kash
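
The defining property of a proper scoring rule, that reporting the true belief maximises the expected score, can be checked numerically. The sketch below uses the standard quadratic (Brier-style) rule for a binary event with no decision alternatives, so it does not cover the decision-rule setting of the paper; the function names and the grid search are illustrative assumptions.

```python
def quadratic_score(report, outcome):
    """Quadratic scoring rule for a binary event.

    report  -- reported probability that the event occurs
    outcome -- 1 if the event occurred, 0 otherwise
    """
    return 1.0 - (outcome - report) ** 2


def expected_score(report, belief):
    """Expected score of `report` for an agent whose true belief is `belief`."""
    return belief * quadratic_score(report, 1) + (1.0 - belief) * quadratic_score(report, 0)


def best_report(belief, grid_size=100):
    """Search a grid of reports for the one with the highest expected score."""
    candidates = [i / grid_size for i in range(grid_size + 1)]
    return max(candidates, key=lambda report: expected_score(report, belief))


if __name__ == "__main__":
    # An agent who believes the event has probability 0.7 does best by saying so.
    print(best_report(0.7))
```

Algebraically, the expected score equals a constant minus `(report - belief) ** 2`, so any misreport strictly lowers it; the grid search simply confirms this for particular beliefs.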

R45

An important aspect in systems of multiple autonomous agents is the exploitation of synergies via coalition formation. In this paper, we solve various open problems concerning the computational complexity of stable partitions in additively separable hedonic games. First, we propose a polynomial-time algorithm to compute a contractually individually stable partition. This contrasts with previous results such as the NP-hardness of computing individually stable or Nash stable partitions. Secondly, we prove that checking whether the core or the strict core exists is NP-hard in the strong sense even if the preferences of the players are symmetric. Finally, it is shown that verifying whether a partition consisting of the grand coalition is contractual strict core stable or Pareto optimal is coNP-complete.
Stable Partitions in Additively Separable Hedonic Games Haris Aziz, Felix Brandt, Hans Georg Seedig

R46

We revisit the coalition structure generation problem in which the goal is to partition the players into exhaustive and disjoint coalitions so as to maximize the social welfare. One of our key results is a general polynomial-time algorithm to solve the problem for all monotonic coalitional games provided that player types are known and the number of player types is bounded by a constant. As a corollary, we obtain a polynomial-time algorithm to compute an optimal partition for weighted voting games with a constant number of weight values and for coalitional skill games with a constant number of skills. We also consider well-studied and well-motivated coalitional games defined compactly on combinatorial domains. For these games, we characterize the complexity of computing an optimal coalition structure by presenting polynomial-time algorithms, approximation algorithms, or NP-hardness and inapproximability lower bounds.
Complexity of Coalition Structure Generation Haris Aziz, Bart de Keijzer

B47

The class of simulation-based games, in which the payoffs are generated as an output of a simulation process, has recently received a lot of attention in the literature. In this paper, we extend this class to games in extensive form with continuous actions and perfect information. We design two convergent algorithms to find an approximate subgame perfect equilibrium (SPE) and an approximate Nash equilibrium (NE) respectively. Our algorithms can exploit different optimization techniques. In particular, we use: simulated annealing, cross entropy method, and Lipschitz optimization. We produce an extensive experimental evaluation of the performance of our algorithms in terms of approximation degree of the optimal solution and number of evaluated samples. Finding approximate NE and SPE requires exponential time in the game tree depth: an SPE can be computed in game trees with a small depth, while the computation of an NE is easier.
Equilibrium Approximation in Simulation-Based Extensive-Form Games Nicola Gatti, Marcello Restelli

B48

Motivated by a machine learning perspective, namely that game-theoretic equilibria constraints should serve as guidelines for predicting agents' strategies, we introduce maximum causal entropy correlated equilibria (MCECE), a novel solution concept for general-sum Markov games. In line with this perspective, an MCECE strategy profile is a uniquely-defined joint probability distribution over actions for each game state that minimizes the worst-case prediction of agents' actions under log-loss. Equivalently, it maximizes the worst-case growth rate for gambling on the sequences of agents' joint actions under uniform odds. We present a convex optimization technique for obtaining MCECE strategy profiles that resembles value iteration in finite-horizon games. We assess the predictive benefits of our approach by predicting the strategies generated by previously proposed correlated equilibria solution concepts, and compare against those previous approaches on that same prediction task. Maximum Causal Entropy Correlated Equilibria for Markov Games Brian D. Ziebart, J. Andrew Bagnell, Anind K. Dey

Session D1 – Multiagent Learning

G44

In multi-agent planning environments, action models for each agent must be given as input. However, creating such action models by hand is difficult and time-consuming, because it requires formally representing the complex relationships among different objects in the environment. The problem is compounded in multi-agent environments where agents can take more types of actions. In this paper, we present an algorithm to learn action models for multi-agent planning systems from a set of input plan traces. Our learning algorithm Lammas automatically generates three kinds of constraints: (1) constraints on the interactions between agents, (2) constraints on the correctness of the action models for each individual agent, and (3) constraints on actions themselves. Lammas attempts to satisfy these constraints simultaneously using a weighted maximum satisfiability model known as MAX-SAT, and converts the solution into action models. We believe this to be one of the first learning algorithms to learn action models in the context of multi-agent planning environments. We empirically demonstrate that Lammas performs effectively and efficiently in several planning domains. Learning Action Models for Multi-Agent Planning Hankz Hankui Zhuo, Hector Muñoz-Avila, Qiang Yang

G45

Potential-based reward shaping has previously been proven to both be equivalent to Q-table initialisation and guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and the Nash Equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon. Theoretical Considerations of Potential-Based Reward Shaping for Multi-Agent Systems Sam Devlin, Daniel Kudenko
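
As a rough illustration of the technique named in this abstract (not the authors' code), potential-based shaping in its standard form adds a term F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward. The sketch below assumes a hypothetical potential function `phi` over integer states with a goal at state 10; both are invented for illustration. It shows that the discounted shaping terms telescope along a trajectory, so their sum depends only on the first and last states.

```python
from typing import Callable, Sequence


def shaping_reward(phi: Callable[[int], float], s: int, s_next: int, gamma: float) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def discounted_shaping_sum(phi: Callable[[int], float], states: Sequence[int], gamma: float) -> float:
    """Discounted sum of the shaping terms along a trajectory of states."""
    total = 0.0
    for t in range(len(states) - 1):
        total += (gamma ** t) * shaping_reward(phi, states[t], states[t + 1], gamma)
    return total


def phi(s: int) -> float:
    # Hypothetical potential (an assumption): negative distance to a goal at state 10.
    return -abs(10 - s)


trajectory = [0, 3, 1, 7, 10]
gamma = 0.9
total = discounted_shaping_sum(phi, trajectory, gamma)

# Telescoping identity: the sum equals gamma^T * phi(s_T) - phi(s_0).
endpoint_only = gamma ** (len(trajectory) - 1) * phi(trajectory[-1]) - phi(trajectory[0])
```

Because the accumulated shaping depends only on the endpoints, it cannot change which action sequences are preferred from a given start state, which is the intuition behind the single-agent invariance result the abstract refers to.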

G46

We have proposed the utility-based Q-learning concept that supposes an agent internally has an emotional mechanism that derives subjective utilities from objective rewards and the agent uses the utilities as rewards of Q-learning. We have also proposed such an emotional mechanism that facilitates cooperative actions in Prisoner's Dilemma (PD) games. However, this mechanism has been designed and implemented manually in order to force the agents to take cooperative actions in PD games. Since it seems slightly unnatural, this work considers whether such an emotional mechanism exists and where it comes from. We try to evolve such mechanisms that facilitate cooperative actions in PD games by conducting simulation experiments with a genetic algorithm, and we investigate the evolved mechanisms from various points of view. Evolving Subjective Utilities: Prisoner's Dilemma Game Examples Koichi Moriyama, Satoshi Kurihara, Masayuki Numao

B49

The Service Game is a model for reciprocity in multiagent systems. Here, agents interact repeatedly by requesting and providing services. In contrast to existing models where players are matched randomly, players of the Service Game may choose with whom they play. The rationale behind provider selection is to choose a provider that is likely to perform a task as desired. We develop a formal model for provider selection in the Service Game. An evolutionary process based on a genetic algorithm allows us to incorporate notions of bounded rationality, learning, and adaptation into the analysis of the game. We conduct a series of experiments to study the evolution of strategies and the emergence of cooperation. We show that cooperation is more expensive with provider selection than with random matching. Further, populations consisting of discriminators and defectors form a bistable community. Cooperation through Reciprocity in Multiagent Systems: An Evolutionary Analysis Christian Hütter, Klemens Böhm

G47

We present a game-theoretic self-organizing approach for scheduling the radio activity of wireless sensor nodes. Our approach makes each node play a win-stay lose-shift (WSLS) strategy to choose when to schedule radio transmission, reception and sleeping periods. The proposed strategy relies only on local interactions with neighboring nodes, and is thus fully decentralized. This behavior results in shorter communication schedules, which not only reduces energy consumption by reducing the wake-up cycles of sensor nodes but also decreases the data retrieval latency. We implement this WSLS approach in the OMNeT++ sensor network simulator where nodes are organized in three topologies: line, grid and random. We compare the performance of our approach to two state-of-the-art scheduling protocols, namely S-MAC and D-MAC, and show that the WSLS strategy brings significant gains in terms of energy savings, while at the same time reducing communication delays. In addition, we show that our approach performs particularly well in large, random topologies. Distributed Cooperation in Wireless Sensor Networks Mihail Mihaylov, Yann-Aël Le Borgne, Karl Tuyls, Ann Nowé
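
The win-stay lose-shift rule mentioned in this abstract can be sketched generically as follows. This is a toy model under loud assumptions, not the paper's protocol: all nodes are assumed to share a single collision domain, a "win" is assumed to mean that a node was alone in its time slot, and a losing node is assumed to shift to a uniformly random different slot. The function names and parameters are hypothetical.

```python
import random
from collections import Counter
from typing import List


def wsls_round(slots: List[int], num_slots: int, rng: random.Random) -> List[int]:
    """One round of win-stay lose-shift over transmission slots."""
    counts = Counter(slots)
    new_slots = []
    for s in slots:
        if counts[s] == 1:
            # Win (assumed: no collision in this slot): keep the same slot.
            new_slots.append(s)
        else:
            # Lose (collision): shift to a random different slot.
            new_slots.append(rng.choice([c for c in range(num_slots) if c != s]))
    return new_slots


def run_wsls(num_nodes: int, num_slots: int, seed: int = 0, max_rounds: int = 10_000) -> List[int]:
    """Repeat WSLS rounds until every node holds a distinct slot (or rounds run out)."""
    rng = random.Random(seed)
    slots = [rng.randrange(num_slots) for _ in range(num_nodes)]
    for _ in range(max_rounds):
        if len(set(slots)) == num_nodes:
            break
        slots = wsls_round(slots, num_slots, rng)
    return slots


final_slots = run_wsls(num_nodes=5, num_slots=8)
```

Each node uses only the outcome of its own last transmission, which is what makes a rule of this kind usable without central coordination.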

Session A2 – Logic-Based Approaches I

B50

In this paper, we propose the coalitional normative system (CNS), which can selectively restrict the joint behavior of a coalition. We extend the semantics of ATL and propose Coordinated ATL (Co-ATL) to support the formalizing of CNS. We soundly and completely characterize the limitation of the normative power of a coalition by identifying two fragments of the Co-ATL language corresponding to two types of system properties that are unchangeable by restricting the joint behavior of such a coalition. Then, we prove that the effectiveness checking, feasibility and synthesis problems of CNS are PTIME-complete, NP-complete and FNP-complete, respectively. Moreover, we define two concepts of optimality for CNS, that is, minimality and compactness, and prove that both minimality checking and compactness checking are coNP-complete while the problem of checking whether a coalition is a minimal controllable coalition is DP-complete. The relation between NS and CNS is discussed, and it turns out that NS intrinsically consists of a proper subset of CNSs and some basic problems related to CNS are no more complex than those of NS. A Framework for Coalitional Normative Systems Jun Wu, Chongjun Wang, Junyuan Xie

G48

An abstract argumentation framework and the semantics, often called Dungean semantics, give a general framework for nonmonotonic logics. In the last fifteen years, a great number of papers in computational argumentation adopt Dungean semantics as a fundamental principle for evaluating various kinds of defeasible consequences. Recently, many papers address problems not only with theoretical reasoning, i.e., reasoning about what to believe, but also practical reasoning, i.e., reasoning about what to do. This paper proposes a practical argumentation semantics specific to practical argumentation. This is motivated by our hypothesis that consequences of such argumentation should satisfy Pareto optimality because the consequences strongly depend on desires, aims, or values an individual agent or a group of agents has. We define a practical argumentation framework and two kinds of extensions, preferred and grounded extensions, with respect to each group of agents. We show that evaluating Pareto optimality can be translated to evaluating preferred extensions of a particular practical argumentation framework. Furthermore, we show that our semantics is a natural extension of Dungean semantics in terms of considering more than one defeat relation. We give a generality order of four practical argumentation frameworks specified by taking into account Dungean semantics and Pareto optimality. We show that a member of preferred extensions of the most specific one is not just Pareto optimal, but also it is theoretically justified. Practical Argumentation Semantics for Socially Efficient Defeasible Consequence Hiroyuki Kido, Katsumi Nitta

G49

Reasoning about the mental states of agents is important in various settings, and has been recognized as vital for teamwork. But the complexity of some of the more well-known agent logics that facilitate reasoning about mental states prohibits the use of these logics in practice. An alternative is to investigate fragments of these logics that have a lower complexity but are still expressive enough for reasoning about the mental states of (other) agents. We explore this alternative and take as our starting point the linear time variant of BDI logic (BDI_LTL). We summarize some of the relevant known complexity results for e.g. LTL, KD45_n, and BDI_LTL itself. We present a tableau-based method for establishing complexity bounds, and provide a map of the complexity of (various fragments of) BDI_LTL. Finally, we identify a few fragments that may be usefully applied for reasoning about mental states. Taming the Complexity of Linear Time BDI Logics Nils Bulling, Koen V. Hindriks

Session B2 – Agent-Based System Development I

B51

Scenarios in current design methodologies provide a natural way for the users to identify the inputs and outputs of the system revolving around a particular interaction process. A scenario typically consists of a sequence of steps which captures a particular run of the system and satisfies some aspect of the requirements. In this work we add additional structure to the scenarios used in the Prometheus agent development methodology. This additional structure then facilitates both traceability and automated testing. We describe our process for mapping the scenarios and their steps to the initial detailed design, where we then maintain the traceability as the design develops. The structured action lists that we define for both scenarios and their variations provide the basis for facilitating automated testing of system behavior. We describe how we use the newly defined structure within the scenarios to facilitate testing, describing how we automate test case generation, execution and analysis. Scenarios for System Requirements Traceability and Testing John Thangarajah, Gaya Jayatilleke, Lin Padgham

G50

The introduction of affect or emotion modeling into software opens up new possibilities for improving user experience. Yet, current techniques for building affective applications are limited, with the treatment of affect in essence handcrafted in each application. The multiagent middleware Koko attempts to reduce the burden of incorporating affect modeling into applications. However, Koko can be effective only if the models it needs to function are suitably constructed. We propose Kokomo, a methodology that employs expressive communicative acts as an organizing principle for affective applications. Kokomo specifies the steps needed to create an affective application in Koko. A key motivation is that Kokomo would facilitate the construction of an affective application by engineers who may lack a prior background in affective modeling. We empirically evaluate Kokomo's utility through a developer study. The results are positive and demonstrate that the developers who used Kokomo were able to develop an affective application in less time, with fewer lines of code, and with a lower perceived difficulty than developers who worked without Kokomo. Kokomo: An Empirically Evaluated Methodology for Affective Applications Derek J. Sollenberger, Munindar P. Singh

G51

Many multi-agent system applications involve software agents that reason about the behavior of other agents with which they interact in cooperation or competition. In order to design and develop those systems, the employed programming languages should provide tools to facilitate the implementation of agents that can perform such reasoning. This paper focuses on BDI-based programming languages and proposes a nonmonotonic reasoning mechanism that can be incorporated into agents, allowing them to reason about observed behavior to infer others' beliefs or goals. In particular, it is suggested that the behavior-generating rules of agents be translated into a nonmonotonic logic programming framework. A formal analysis of the presented approach is provided and it is shown that it has desirable properties. Programming Mental State Abduction Michal Sindlar, Mehdi Dastani, John-Jules Ch. Meyer

Session C2 – Social Choice Theory

G52

Given the preferences of several agents over a common set of candidates, voting trees can be used to select a candidate (the winner) by a sequence of pairwise competitions modelled by a binary tree (the agenda). The majority graph compactly represents the preferences of the agents and provides enough information to compute the winner. When some preferences are missing, there are various notions of winners, such as the possible winners (that is, winners in at least one completion) or the necessary winners (that is, winners in all completions). In this generalized scenario, we show that using the majority graph to compute winners is not correct, since it may declare as winners candidates that are not so. Nonetheless, the majority graph can be used to compute efficiently an upper or lower approximation of the correct set of winners. Possible And Necessary Winners In Voting Trees: Majority Graphs Vs. Profiles Maria Silvia Pini, Francesca Rossi, Kristen Brent Venable, Toby Walsh
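
To make the complete-information case described in this abstract concrete, here is a minimal sketch of computing a voting-tree winner from a majority graph. The representation is an assumption made for illustration: the agenda is a nested tuple whose leaves are candidate names, and the majority graph is a set of `(winner, loser)` pairs that is assumed to be complete (exactly one direction for every pair of candidates). Missing preferences, which are the paper's actual subject, are not handled here.

```python
from typing import Set, Tuple, Union

# An agenda is either a candidate name (leaf) or a pair of sub-agendas.
Agenda = Union[str, Tuple["Agenda", "Agenda"]]


def tree_winner(agenda: Agenda, beats: Set[Tuple[str, str]]) -> str:
    """Return the winner of the pairwise competitions described by the agenda."""
    if isinstance(agenda, str):
        return agenda
    left = tree_winner(agenda[0], beats)
    right = tree_winner(agenda[1], beats)
    if left == right:
        return left
    # Assumes a complete majority graph: exactly one of the two directions is present.
    return left if (left, right) in beats else right


# Hypothetical majority graph with a cycle: a beats b, b beats c, c beats a.
majority = {("a", "b"), ("b", "c"), ("c", "a")}

winner_1 = tree_winner((("a", "b"), "c"), majority)
winner_2 = tree_winner(("a", ("b", "c")), majority)
```

With a cyclic majority graph the two agendas select different winners, which shows how strongly the outcome can depend on the shape of the tree.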

R47

Strategyproof (SP) classification considers situations in which a decision-maker must classify a set of input points with binary labels, minimizing expected error. Labels of input points are reported by self-interested agents, who may lie so as to obtain a classifier more closely matching their own labels. These lies would create a bias in the data, and thus motivate the design of truthful mechanisms that discourage false reporting. We here answer questions left open by previous research on strategyproof classification, in particular regarding the best approximation ratio (in terms of social welfare) that an SP mechanism can guarantee for n agents. Our primary result is a lower bound of 3 - 2/n on the approximation ratio of SP mechanisms under the shared inputs assumption; this shows that the previously known upper bound (for uniform weights) is tight. The proof relies on a result from Social Choice theory, showing that any SP mechanism must select a dictator at random, according to some fixed distribution. We then show how different randomizations can improve the best known mechanism when agents are weighted, matching the lower bound with a tight upper bound. These results contribute both to a better understanding of the limits of SP classification and to the development of similar tools in other, related domains such as SP facility location. Tight Bounds for Strategyproof Classification Reshef Meir, Shaull Almagor, Assaf Michaely, Jeffrey S. Rosenschein

G53

In response to the Mumbai attacks of 2008, the Mumbai police have started to schedule a limited number of inspection checkpoints on the road network throughout the city. Algorithms for similar security-related scheduling problems have been proposed in recent literature, but security scheduling in networked domains when targets have varying importance remains an open problem at large. In this paper, we cast the network security problem as an attacker-defender zero-sum game. The strategy spaces for both players are exponentially large, so this requires the development of novel, scalable techniques. We first show that existing algorithms for approximate solutions can be arbitrarily bad in general settings. We present RUGGED (Randomization in Urban Graphs by Generating strategies for Enemy and Defender), the first scalable optimal solution technique for such network security games. Our technique is based on a double oracle approach and thus does not require the enumeration of the entire strategy space for either of the players. It scales up to realistic problem sizes, as is shown by our evaluation of maps of southern Mumbai obtained from GIS data. A Double Oracle Algorithm for Zero-Sum Security Games on Graphs Manish Jain, Dmytro Korzhyk, Ondřej Vaněk, Vincent Conitzer, Michal Pěchouček, Milind Tambe

Session D2 – Preferences and Strategies

G54

Game-tree search algorithms have contributed greatly to the success of computerized players in two-player extensive-form games. In multi-player games there has been less success, partly because of the difficulty of recognizing and reasoning about the inter-player relationships that often develop and change during human game-play. Simplifying assumptions (e.g., assuming each player selfishly aims to maximize its own payoff) have not worked very well in practice. We describe a new algorithm for multi-player games, Socially-oriented Search (SOS), that incorporates ideas from Social Value Orientation theory from social psychology. We provide a theoretical study of the algorithm, and a method for recognizing and reasoning about relationships as they develop and change during a game. Our empirical evaluations of SOS in the strategic board game Quoridor show it to be significantly more effective against players with dynamic interrelationships than the current state-of-the-art algorithms. Modeling Social Preferences in Multi-player Games Brandon Wilson, Inon Zuckerman, Dana Nau

R48

Revelation games are bilateral bargaining games in which agents may choose to truthfully reveal their private information before engaging in multiple rounds of negotiation. They are analogous to real-world situations in which people need to decide whether to disclose information such as medical records or university transcripts when negotiating over health plans and business transactions. This paper presents an agent design that is able to negotiate proficiently with people in a revelation game with different dependencies that hold between players. The agent modeled the social factors that affect the players' revelation decisions on people's negotiation behavior. In empirical evaluations, it outperformed both people and agents playing equilibrium strategies. It was also more likely to reach agreement than people or equilibrium agents. A Study of Computational and Human Strategies in Revelation Games Noam Peled, Ya'akov (Kobi) Gal, Sarit Kraus

R49

The CP-net (Conditional Preference Network) is one of the most extensively studied languages for representing and reasoning with preferences. The fundamental operation of dominance testing in CP-nets, i.e. determining whether an outcome is preferred to another, is very important in many real-world applications. The current technique for solving general dominance queries is to search for an improving flipping sequence from one outcome to another as a proof of the dominance relation in all rankings satisfying the given CP-net. However, it is generally a hard problem even for binary-valued, acyclic CP-nets and tractable search algorithms exist only for specific problem classes. Hence, there is a need for efficient algorithms and techniques for dominance testing in more general problem settings. In this paper, we propose a heuristic approach, called DT*, to dominance testing in arbitrary acyclic multi-valued CP-nets. Our proposed approach guides the search process efficiently and allows significant reduction of search effort without impacting soundness or completeness of the search process. We present results of experiments that demonstrate the computational efficiency and feasibility of our approach to dominance testing. Efficient Heuristic Approach to Dominance Testing in CP-nets Minyi Li, Quoc Bao Vo, Ryszard Kowalczyk

Session A3 – Distributed Problem Solving II

R50

In this paper we address efficient decentralised coordination of cooperative multi-agent systems by taking into account the actual computation and communication capabilities of the agents. We consider coordination problems that can be framed as Distributed Constraint Optimisation Problems, and as such, are suitable to be deployed on large scale multi-agent systems such as sensor networks or multiple unmanned aerial vehicles. Specifically, we focus on techniques that exploit structural independence among agents' actions to provide optimal solutions to the coordination problem, and, in particular, we use the Generalized Distributive Law (GDL) algorithm. In this setting, we propose a novel resource aware heuristic to build junction trees and to schedule GDL computations across the agents. Our goal is to minimise the total running time of the coordination process, rather than the theoretical complexity of the computation, by explicitly considering the computation and communication capabilities of agents. We evaluate our proposed approach against DPOP, RDPI and a centralized solver on a number of benchmark coordination problems, and show that our approach is able to provide optimal solutions for DCOPs faster than previous approaches. Specifically, in the settings considered, when resources are scarce our approach is up to three times faster than DPOP (which proved to be the best among the competitors in our settings). Resource-Aware Junction Trees for Efficient Multi-Agent Coordination N. Stefanovitch, A. Farinelli, Alex Rogers, Nicholas R. Jennings

B52

We propose the bounded multi-objective max-sum algorithm (B-MOMS), the first decentralised coordination algorithm for multi-objective optimisation problems. B-MOMS extends the max-sum message-passing algorithm for decentralised coordination to compute bounded approximate solutions to multi-objective decentralised constraint optimisation problems (MO-DCOPs). Specifically, we prove the optimality of B-MOMS in acyclic constraint graphs, and derive problem dependent bounds on its approximation ratio when these graphs contain cycles. Furthermore, we empirically evaluate its performance on a multi-objective extension of the canonical graph colouring problem. In so doing, we demonstrate that, for the settings we consider, the approximation ratio never exceeds 2, and is typically less than 1.5 for less-constrained graphs. Moreover, the runtime required by B-MOMS on the problem instances we considered never exceeds 30 minutes, even for maximally constrained graphs with 100 agents. Thus, B-MOMS brings the problem of multi-objective optimisation well within the boundaries of the limited capabilities of embedded agents. Bounded Decentralised Coordination over Multiple Objectives Francesco M. Delle Fave, Ruben Stranders, Alex Rogers, Nicholas R. Jennings

B53

In this paper we focus on solving DCOPs in communication constrained scenarios. The GDL algorithm optimally solves DCOP problems, but requires the exchange of exponentially large messages which makes it impractical in such settings. Function filtering is a technique that alleviates this high communication requirement while maintaining optimality. Function filtering involves calculating approximations of the exact cost functions exchanged by GDL. In this work, we explore different ways to compute such approximations, providing a novel method that empirically achieves significant communication savings. Communication-Constrained DCOPs: Message Approximation in GDL with Function Filtering Marc Pujol-Gonzalez, Jesus Cerquides, Pedro Meseguer, Juan Antonio Rodriguez-Aguilar

Session B3 – Agent-Based System Development II

R51

Multi-agent systems form the basis of many innovative large-scale distributed applications. The development of such applications requires a careful balance of a wide range of concerns: a detailed understanding of the behaviour of the abstract algorithms being employed, a knowledge of the effects and costs of operating in a distributed environment, and an expertise in the performance requirements of the application itself. Experimental work plays a key role in the process of designing such systems. This paper examines the multi-agent systems development cycle from a distributed systems perspective. A survey of recent experimental studies finds that a large proportion of work on the design of multi-agent systems is focused on the analytical and simulation phases of development. This paper advocates an alternative, more comprehensive development cycle, which extends from theoretical studies to simulations, emulations, demonstrators and finally staged deployment. AgentScope, a tool that supports the experimental stages of multi-agent systems development and facilitates long-term dispersed research efforts, is introduced. AgentScope consists of a small set of interfaces on which experimental work can be built independently of a particular type of platform. The aim is to make not only agent code but also experimental scenarios and metrics reusable, both between projects and over simulation, emulation and demonstration platforms. An example gossip-based sampling experiment demonstrates reusability, showing the ease with which an experiment can be defined, modified into a comparison study, and ported between a simulator and an actual agent-operating system. AgentScope: Multi-Agent Systems Development in Focus Elth Ogston, Frances Brazier

R52

We present AgentSpeak(RT), a real-time BDI agent programming language based on AgentSpeak(L). AgentSpeak(RT) extends AgentSpeak intentions with deadlines which specify the time by which the agent should respond to an event, and priorities which specify the relative importance of responding to a particular event. The AgentSpeak(RT) interpreter commits to a priority-maximal set of intentions: a set of intentions which is maximally feasible while preferring higher priority intentions. We prove some properties of the language, such as guaranteed reactivity delay of the AgentSpeak(RT) interpreter and probabilistic guarantees of successful execution of intentions by their deadlines. Agent Programming with Priorities and Deadlines Konstantin Vikhorev, Natasha Alechina, Brian Logan

B54

Goals are central to the design and implementation of intelligent software agents. Much of the literature on goals and reasoning about goals in agent programming frameworks only deals with a limited set of goal types, typically achievement goals, and sometimes maintenance goals. In this paper we extend a previously proposed unifying framework for goals with additional richer goal types that are explicitly represented as Linear Temporal Logic (LTL) formulae. We show that these goal types can be modelled as a combination of achieve and maintain goals. This is done by providing an operationalization of these new goal types, and showing that the operationalization generates computation traces that satisfy the temporal formula. Rich Goal Types in Agent Programming Mehdi Dastani, M. Birna van Riemsdijk, Michael Winikoff

Session C3 – Bounded Rationality

R53

Increasingly, in both traditional and especially Internet-based marketplaces, knowledge is becoming a traded commodity. This paper considers the impact of the presence of knowledge-brokers, or experts, on search-based markets with noisy signals. For example, consider a consumer looking for a used car on a large Internet marketplace. She sees noisy signals of the true value of any car whose advertisement she looks at, and can disambiguate this signal by paying for the services of an expert (for example, getting a Carfax report, or taking the car to a mechanic for an inspection). Both the consumer and the expert are rational, self-interested agents. We present a model for such search environments, and analyze several aspects of the model, making three main contributions: (1) We derive the consumer's optimal search strategy in environments with noisy signals, with and without the option of consulting an expert; (2) We find the optimal strategy for maximizing the expert's profit; (3) We study the option of market designers to subsidize search in a way that improves overall social welfare. We illustrate our results in the context of a plausible distribution of signals and values. Expert-Mediated Search Meenal Chhabra, Sanmay Das, David Sarne

B55

Creating agents that properly simulate and interact with people is critical for many applications. Towards creating these agents, models are needed that quickly and accurately predict how people behave in a variety of domains and problems. This paper explores how one bounded rationality theory, Aspiration Adaptation Theory (AAT), can be used to aid in this task. We extensively studied two types of problems: a relatively simple optimization problem and two complex negotiation problems. We compared the predictive capabilities of traditional learning methods with those where we added key elements of AAT and other optimal and bounded rationality models. Within the extensive empirical studies we conducted, we found that machine learning models combined with AAT were most effective in quickly and accurately predicting people's behavior. Using Aspiration Adaptation Theory to Improve Learning Avi Rosenfeld, Sarit Kraus

G55

In many settings and for various reasons, people fail to make optimal decisions. These factors also influence the agents people design to act on their behalf in such virtual environments as eCommerce and distributed operating systems, so that the agents also act sub-optimally despite their greater computational capabilities. In some decision-making situations it is theoretically possible to supply the optimal strategy to people or their agents, but this optimal strategy may be non-intuitive, and providing a convincing explanation of optimality may be complex. This paper explores an alternative approach to improving the performance of a decision-maker in such settings: the data on choices is manipulated to guide searchers to a strategy that is closer to optimal. This approach was tested for sequential search, which is a classical sequential decision-making problem with broad areas of applicability (e.g., product search, partnership search). The paper introduces three heuristics for manipulating choices, including one for settings in which repeated interaction or access to a decision-maker's past history is available. The heuristics were evaluated on a large population of computer agents, each of which embodies a search strategy programmed by a different person. Extensive tests on thousands of search settings demonstrate the promise of the problem-restructuring approach: despite a minor degradation in performance for a small portion of the population, the overall and average individual performance improve substantially. The heuristic that adapts based on a decision-maker's history achieved the best results. Less Is More: Restructuring Decisions to Improve Agent Search David Sarne, Avshalom Elmalech, Barbara J. Grosz, Moti Geva

Session D3 – Virtual Agents I

B56

Integrating culture as a parameter into the behavioral models of virtual characters to simulate cultural differences is becoming more and more popular. But do these differences affect the user's perception? In the work described in this paper, we integrated aspects of non-verbal behavior as well as communication management behavior into the behavioral models of virtual characters for the two cultures of Germany and Japan in order to find out which of these aspects affect human observers of the target cultures. We give a literature review pointing out the expected differences in these two cultures and describe the analysis of a multi-modal corpus including video recordings of German and Japanese interlocutors. After integrating our findings into a demonstrator featuring a German and a Japanese scenario, we presented the virtual scenarios to human observers of the two target cultures in an evaluation study. Culture-related Differences in Aspects of Behavior for Virtual Characters Across Germany and Japan Birgit Endrass, Elisabeth André, Afia Akhter Lipi, Matthias Rehm, Yukiko Nakano

G56

Narrative time has an important role to play in Interactive Storytelling (IS). The prevailing approach to controlling narrative time has been to use implicit models that allow only limited temporal reasoning about virtual agent behaviour. In contrast, this paper proposes the use of an explicit model of narrative time which provides a control mechanism that enhances narrative generation, orchestration of virtual agents and number of possibilities for the staging of agent actions. This approach can help address a number of problems experienced in IS systems both at the level of execution staging and at the level of narrative generation. Consequently it has a number of advantages: it is more flexible with respect to the staging of virtual agent actions; it reduces the possibility of timing problems in the coordination of virtual agents; and it enables more expressive representation of narrative worlds and narrative generative power. Overall it provides a uniform, consistent, principled and rigorous approach to the problem of time in agent-based storytelling. In the paper we demonstrate how this approach to controlling narrative time can be implemented within an IS system and illustrate this using our fully implemented IS system that features virtual agents inspired by Shakespeare's The Merchant of Venice. The paper presents results of an experimental evaluation with the system that demonstrates the use of this approach to co-ordinate the actions of virtual agents and to increase narrative generative power. Controlling Narrative Time in Interactive Storytelling Julie Porteous, Jonathan Teutenberg, Fred Charles, Marc Cavazza

G57

In creating an evacuation simulation for training and planning, realistic agents that reproduce known phenomena are required. Evacuation simulation in the airport domain requires additional features beyond most simulations, including the unique behaviors of first-time visitors who have incomplete knowledge of the area and families that do not necessarily adhere to often-assumed pedestrian behaviors. Evacuation simulations not customized for the airport domain do not incorporate the factors important to it, leading to inaccuracies when applied to it. In this paper, we describe ESCAPES, a multiagent evacuation simulation tool that incorporates four key features: (i) different agent types; (ii) emotional interactions; (iii) informational interactions; (iv) behavioral interactions. Our simulator reproduces phenomena observed in existing studies on evacuation scenarios and the features we incorporate substantially impact escape time. We use ESCAPES to model the International Terminal at Los Angeles International Airport (LAX) and receive high praise from security officials. ESCAPES - Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social comparison Jason Tsai, Natalie Fridman, Emma Bowring, Matthew Brown, Shira Epstein, Gal A. Kaminka, Stacy Marsella, Andrew Ogden, Inbal Rika, Ankur Sheel, Matthew E. Taylor, Xuezhi Wang, Avishay Zilka, Milind Tambe

Session A4 – Agent Communication

R54

Commitments provide a flexible means for specifying the business relationships among autonomous and heterogeneous agents, and lead to a natural way of enacting such relationships. However, current formalizations of commitments incorporate conditions expressed as propositions, but disregard (1) temporal regulations and (2) an agent's control over such regulations. Thus, they cannot handle realistic application scenarios where time and control are often central because of domain conventions or other requirements. We propose a new formalization of commitments that builds on an existing representation of events in which we can naturally express temporal regulations as well as what an agent can control, including indirectly as based on the commitments and capabilities of other agents. Our formalization supports a notion of commitment safety. A benefit of our consolidated approach is that by incorporating these considerations into commitments we enable agents to reason about and flexibly enact the regulations. The main contributions of this paper include (1) a formal semantics of commitments that accommodates temporal regulations; (2) a formal semantics of the notions of innate and social control; and (3) a formalization of when a temporal commitment is safe for its debtor. We evaluate our contributions using an extensive case study. Commitments with Regulations: Reasoning about Safety and Control in Regula Elisa Marengo, Matteo Baldoni, Cristina Baroglio, Amit K. Chopra, Viviana Patti, Munindar P. Singh

R55

Recent work in communications and business modeling emphasizes a commitment-based view of interaction. By abstracting away from implementation-level details, commitments can potentially enhance perspicuity during modeling and flexibility during enactment. We address the problem of creating commitment-based specifications that directly capture business requirements, yet apply in distributed settings. We encode important business patterns in terms of commitments and group them into methods to better capture business requirements. Our approach yields significant advantages over existing approaches: our patterns (1) respect agent autonomy; (2) capture business intuitions faithfully; and (3) can be enacted in real-life, distributed settings. We evaluate our contributions using the Extended Contract Net Protocol. Specifying and Applying Commitment-Based Business Patterns Amit K. Chopra, Munindar P. Singh

G58

Social commitments have been widely studied to represent business contracts among agents with different competing objectives in communicating multi-agent systems. However, their formal verification is still an open issue. This paper proposes a novel model-checking algorithm to address this problem. We define a new temporal logic, CTLC, which extends CTL with modalities for social commitments and their fulfillment and violation. The verification technique is based on symbolic model checking that uses ordered binary decision diagrams to give a compact representation of the system. We also prove that the problem of model checking CTLC is polynomial-time reducible to the problem of model checking CTLK, the combination of CTL with modalities for knowledge. We finally present the full implementation of the proposed algorithm by extending the MCMAS symbolic model checker and report on the experimental results obtained when verifying the NetBill protocol. On the Verification of Social Commitments and Time Mohamed El-Menshawy, Jamal Bentahar, Hongyang Qu, Rachida Dssouli

B57

We present a novel approach to interaction-oriented programming based on declaratively representing communication protocols. Our approach exhibits the following distinguishing features. First, it treats a protocol as an engineering abstraction in its own right. Second, it models a protocol in terms of the information that the protocol needs to proceed (so agents enact it properly) and the information the protocol would produce (when it is enacted). Third, it naturally maps traditional operational constraints to the information needs of protocols, thereby obtaining the desired interactions without additional effort or reasoning. Fourth, our approach naturally supports shared nothing enactments: everything of relevance is included in the communications and no separate global state need be maintained. Fifth, our approach accommodates, but does not require, formal representations of the meanings of the protocols. We evaluate this approach via examples from the literature. Information-Driven Interaction-Oriented Programming: BSPL, the Blindingly Simple Protocol Language Munindar P. Singh

B58

Communication is a key capability of autonomous agents in a multiagent system to exchange information about their environment. It requires a naming convention that typically involves a set of predefined names for all objects in the environment, which the agents share and understand. However, when the agents are heterogeneous, highly distributed, and situated in an unknown environment, it is very unrealistic to assume that all the objects can be foreseen in advance, and therefore their names cannot be defined beforehand. In such a case, each individual agent needs to be able to introduce new names for the objects it encounters and align them with the naming convention used by the other agents. A language game is a prospective mechanism for the agents to learn and align the naming conventions between them. In this paper we extend the language game model by proposing novel strategies for selecting topics, i.e. attracting an agent's attention to different objects during the learning process. Using a simulated multi-agent system we evaluate the process of name alignment in the case of the least restrictive type of language game, the naming game without feedback. Utilising the proposed strategies we study the dynamic character of the formation of coherent naming conventions and compare it with the behaviour of the commonly used random selection strategy. The experimental results demonstrate that the new strategies improve the overall convergence of the alignment process, limit the agents' overall demand on memory, and scale with the increasing number of interacting agents. On Topic Selection Strategies in Multi-Agent Naming Game Wojciech Lorkiewicz, Ryszard Kowalczyk, Radoslaw Katarzyniak, Quoc Bao Vo

Session B4 – Game Theory and Learning

B59

Many games have undesirable Nash equilibria. For example, consider a resource allocation game in which two players compete for exclusive access to a single resource. It has three Nash equilibria. The two pure-strategy NE are efficient, but not fair. The one mixed-strategy NE is fair, but not efficient. Aumann's notion of correlated equilibrium fixes this problem: It assumes a correlation device which suggests to each agent an action to take. However, such a “smart” coordination device might not be available. We propose using a randomly chosen, “stupid” integer coordination signal. “Smart” agents learn which action they should use for each value of the coordination signal. We present a multi-agent learning algorithm which converges in a polynomial number of steps to a correlated equilibrium of a wireless channel allocation game, a variant of the resource allocation game. We show that the agents learn to play for each coordination signal value a randomly chosen pure-strategy Nash equilibrium of the game. Therefore, the outcome is an efficient correlated equilibrium. This CE becomes more fair as the number of the available coordination signal values increases. We believe that a similar approach can be used to reach efficient and fair correlated equilibria in a wider set of games, such as potential games. Reaching Correlated Equilibria Through Multi-agent Learning Ludek Cigler, Boi Faltings
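To make the idea in the abstract above concrete, the following is a minimal Python sketch of learning with a randomly drawn integer coordination signal on a single shared channel. It is not the authors' algorithm: the back-off rule, the probability `p`, the assumption that idle agents can sense whether the channel was used, and all function and variable names are illustrative assumptions. Each agent keeps one transmit/yield decision per signal value; colliding agents back off at random, and a sole transmitter keeps the channel for that signal value.

```python
import random


def learn_channel_access(n_agents=3, n_signals=4, steps=10_000, p=0.5, seed=0):
    """Hypothetical back-off learner for a single shared channel."""
    rng = random.Random(seed)
    # transmit[i][k] is True if agent i transmits when the signal equals k.
    transmit = [[True] * n_signals for _ in range(n_agents)]
    for _ in range(steps):
        k = rng.randrange(n_signals)  # the "stupid" coordination signal
        senders = [i for i in range(n_agents) if transmit[i][k]]
        if len(senders) > 1:
            # Collision: each sender independently backs off with probability p.
            for i in senders:
                if rng.random() < p:
                    transmit[i][k] = False
        elif not senders:
            # Idle channel (assumed observable): each agent may start transmitting.
            for i in range(n_agents):
                if rng.random() < p:
                    transmit[i][k] = True
        # Exactly one sender: successful transmission, nobody changes strategy.
    return transmit


if __name__ == "__main__":
    table = learn_channel_access()
    for k in range(4):
        owners = [i for i in range(3) if table[i][k]]
        print(f"signal {k}: transmitting agents {owners}")
```

Under these assumptions, a signal value with exactly one transmitter is an absorbing state of the sketch, which mirrors the abstract's claim that agents settle on one pure-strategy outcome per signal value.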

B60

In infinitely repeated games, the act of teaching an outcome to our adversaries can be beneficial to reach coordination, as well as allowing us to 'steer' adversaries to outcomes that are more beneficial to us. Teaching works well against followers, agents that are willing to go along with the proposal, but can lead to miscoordination otherwise. In the context of infinitely repeated games there is, as of yet, no clear formalism that tries to capture and combine these behaviours into a unified view in order to reach a solution of a game. In this paper, we propose such a formalism in the form of an algorithmic criterion, which uses the concept of targeted learning. As we will argue, this criterion can be a beneficial criterion to adopt in order to reach coordination. Afterwards we propose an algorithm that adheres to our criterion that is able to teach pure strategy Nash Equilibria to a broad class of opponents in a broad class of games and is able to follow otherwise, as well as able to perform well in self-play. Sequential Targeted Optimality as a New Criterion for Teaching and Following in Repeated Games Max Knobbout, Gerard A.W. Vreeswijk

B61

In the well-known scheduling game, each job in a set is controlled by a selfish player who wishes to minimize the load of the machine on which the job is executed, while the social goal is to minimize the makespan, that is, the maximum load of any machine. We consider this problem on the three most common machine models, identical machines, uniformly related machines and unrelated machines, with respect to both weak and strict Pareto optimal Nash equilibria. These are kinds of equilibria which are stable not only in the sense that no player can improve its cost by changing its strategy unilaterally, but in addition, there is no alternative choice of strategies for the entire set of players where no player increases its cost, and at least one player reduces its cost (in the case of strict Pareto optimality), or where all players reduce their costs (in the case of weak Pareto optimality). We give a complete classification of the social quality of such solutions with respect to an optimal solution, that is, we find the Price of Anarchy of such schedules as a function of the number of machines, m. In addition, we give a full classification of the recognition complexity of such schedules. On the Quality and Complexity of Pareto Equilibria in the Job Scheduling Game Leah Epstein, Elena Kleiman
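As a small illustration of the definitions in the abstract above, the Python sketch below checks whether a schedule on identical machines is a (plain) Nash equilibrium and computes its makespan. It covers only the unilateral-deviation condition, not the Pareto-optimality conditions the paper studies, and the function names and the example instance are assumptions made for illustration.

```python
def loads(sizes, assignment, m):
    """Total load on each of the m identical machines."""
    load = [0] * m
    for size, machine in zip(sizes, assignment):
        load[machine] += size
    return load


def is_nash_equilibrium(sizes, assignment, m):
    """True if no job can lower its own cost by moving to another machine."""
    load = loads(sizes, assignment, m)
    for size, machine in zip(sizes, assignment):
        for other in range(m):
            # After moving, the job would experience load[other] + size.
            if other != machine and load[other] + size < load[machine]:
                return False
    return True


def makespan(sizes, assignment, m):
    """Social cost: the maximum load of any machine."""
    return max(loads(sizes, assignment, m))


if __name__ == "__main__":
    sizes = [2, 2, 1, 1]
    for schedule in ([0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]):
        print(schedule, is_nash_equilibrium(sizes, schedule, 2),
              makespan(sizes, schedule, 2))
```

In this example both `[0, 1, 0, 1]` and `[0, 0, 1, 1]` are equilibria, yet their makespans differ (3 versus 4), which is the kind of gap between equilibrium and optimal schedules that the Price of Anarchy measures.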

R56

We develop an algorithm for opponent modeling in large extensive-form games of imperfect information. It works by observing the opponent's action frequencies and building an opponent model by combining information from a precomputed equilibrium strategy with the observations. It then computes and plays a best response to this opponent model; the opponent model and best response are both updated continually in real time. The approach combines game-theoretic reasoning and pure opponent modeling, yielding a hybrid that can effectively exploit opponents after only a small number of interactions. Unlike prior opponent modeling approaches, ours is fundamentally game theoretic and takes advantage of recent algorithms for automated abstraction and equilibrium computation rather than relying on domain-specific prior distributions, historical data, or a handcrafted set of features. Experiments show that our algorithm leads to significantly higher win rates (than an approximate-equilibrium strategy) against several opponents in limit Texas Hold'em – the most studied imperfect-information game in computer science – including competitors from recent AAAI computer poker competitions. Game Theory-Based Opponent Modeling in Large Imperfect-Information Games Sam Ganzfried, Tuomas Sandholm

G59

False-name bids are bids submitted by a single agent under multiple fictitious names such as multiple e-mail addresses. False-name bidding can be a serious fraud in Internet auctions since identifying each participant is virtually impossible. It is shown that even the theoretically well-founded Vickrey-Clarke-Groves auction (VCG) is vulnerable to false-name bidding. Thus, several auction mechanisms that cannot be manipulated by false-name bids, i.e., false-name-proof mechanisms, have been developed. This paper investigates a slightly different question, i.e., how do false-name bids affect (perfect) Bayesian Nash equilibria of first-price combinatorial auctions? The importance of this question is that first-price combinatorial auctions are far more widely used in practice than VCG, and can be used as a benchmark for evaluating alternate mechanisms. In an environment where false-name bidding is possible, analytically investigating bidders' behaviors is very complicated, since nobody knows the number of real bidders. As a first step, we consider a kind of minimal setting where false-name bids become effective, i.e., an auction with two goods where one naive bidder competes with one shill bidder who may pretend to be two distinct bidders. We model this auction as a simple dynamic game and examine approximate Bayesian Nash equilibria by utilizing a numerical technique. Our analysis revealed that false-name bidding significantly affects the first-price auctions. Furthermore, the shill bidder has a clear advantage against the naive bidder. False-name Bidding in First-price Combinatorial Auctions with Incomplete Information Atsushi Iwasaki, Atsushi Katsuragi, Makoto Yokoo

Session C4 – Teamwork

R57

This paper presents a novel method to describe and analyze strategic interactions in settings that include multiple actors, many possible actions and relationships among goals, tasks and resources. It shows how to reduce these large interactions to a set of bilateral normal-form games in which the strategy space is significantly smaller than the original setting, while still preserving many of its strategic characteristics. We demonstrate this technique on the Colored Trails (CT) framework, which encompasses a broad family of games defining multi-agent interactions and has been used in many past studies. We define a set of representative heuristics in a three-player CT setting. Choosing players' strategies from this set, the original CT setting is analytically decomposed into canonical bilateral social dilemmas, i.e., Prisoners' Dilemma, Stag Hunt and Ultimatum games. We present a set of criteria for generating strategically interesting CT games and empirically show that they indeed decompose into bilateral social dilemmas if players play according to the heuristics. Our results have significance for multi-agent systems researchers in mapping large multi-player task settings to well-known bilateral normal-form games in a way that facilitates the analysis of the original setting. Metastrategies in the Colored Trails Game Steven de Jong, Daniel Hennes, Karl Tuyls, Ya'akov (Kobi) Gal

R58

We study the computational complexity of finding stable outcomes in hedonic games, which are a class of coalition formation games. We restrict our attention to a nontrivial subclass of such games, which are guaranteed to possess stable outcomes, i.e., the set of symmetric additively-separable hedonic games. These games are specified by an undirected edge-weighted graph: nodes are players, an outcome of the game is a partition of the nodes into coalitions, and the utility of a node is the sum of incident edge weights in the same coalition. We consider several stability requirements defined in the literature. These are based on restricting feasible player deviations, for example, by giving existing coalition members veto power. We extend these restrictions by considering more general forms of preference aggregation for coalition members. In particular, we consider voting schemes to decide if coalition members will allow a player to enter or leave their coalition. For all of the stability requirements we consider, the existence of a stable outcome is guaranteed by a potential function argument, and local improvements will converge to a stable outcome. We provide an almost complete characterization of these games in terms of the tractability of computing such stable outcomes. Our findings comprise positive results in the form of polynomial-time algorithms, and negative (PLS-completeness) results. The negative results extend to more general hedonic games. Computing Stable Outcomes in Hedonic Games with Voting-Based Deviations Martin Gairing, Rahul Savani
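The local-improvement dynamics mentioned in the abstract above can be sketched in Python for the simplest stability notion, Nash stability, where a player may join any coalition (or start a new one) without asking permission. The veto- and voting-based restrictions studied in the paper are not modelled here, and the function names, starting partition and example graph are illustrative assumptions. Each strictly improving move raises the total weight of edges inside coalitions, so the loop terminates.

```python
def utility(node, label, coalition_of, weights):
    """Sum of edge weights between `node` and the other members of `label`."""
    total = 0
    for (u, v), w in weights.items():
        other = v if u == node else u if v == node else None
        if other is not None and coalition_of[other] == label:
            total += w
    return total


def local_improvement(nodes, weights):
    """Repeat strictly improving single-player moves until none remains."""
    coalition_of = {v: i for i, v in enumerate(nodes)}  # start from singletons
    improved = True
    while improved:
        improved = False
        for v in nodes:
            current = utility(v, coalition_of[v], coalition_of, weights)
            fresh = max(coalition_of.values()) + 1  # label of an empty coalition
            candidates = sorted(set(coalition_of.values())) + [fresh]
            best = max(candidates,
                       key=lambda c: utility(v, c, coalition_of, weights))
            if utility(v, best, coalition_of, weights) > current:
                coalition_of[v] = best
                improved = True
    return coalition_of


if __name__ == "__main__":
    weights = {("a", "b"): 3, ("b", "c"): -2, ("c", "d"): 4, ("a", "d"): -1}
    print(local_improvement(["a", "b", "c", "d"], weights))
```

On this example graph the players end up in the coalitions {a, b} and {c, d}, and no single player can gain by moving.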

B62

The concept of creating autonomous agents capable of exhibiting ad hoc teamwork was recently introduced as a challenge to the AI, and specifically to the multiagent systems community. An agent capable of ad hoc teamwork is one that can effectively cooperate with multiple potential teammates on a set of collaborative tasks. Previous research has investigated theoretically optimal ad hoc teamwork strategies in restrictive settings. This paper presents the first empirical study of ad hoc teamwork in a more open, complex teamwork domain. Specifically, we evaluate a range of effective algorithms for on-line behavior generation on the part of a single ad hoc team agent that must collaborate with a range of possible teammates in the pursuit domain. Empirical Evaluation of Ad Hoc Teamwork in the Pursuit Domain Samuel Barrett, Peter Stone, Sarit Kraus

B63

The behavior composition problem involves realizing a virtual target behavior (i.e., the desired module) by suitably coordinating the execution of a set of partially controllable available components (e.g., agents, devices, processes, etc.) running in a shared partially predictable environment. All existing approaches to such problem have been framed within strict uncertainty settings. In this work, we propose a framework for automatic behavior composition which allows the seamless integration of classical behavior composition with decision-theoretic reasoning. Specifically, we consider the problem of maximizing the “expected realizability” of the target behavior in settings where the uncertainty can be quantified. Unlike previous proposals, the approach developed here is able to (better) deal with instances that do not accept “exact” solutions, thus yielding a more practical account for real domains. Moreover, it is provably strictly more general than the classical composition framework. Besides formally defining the problem and what counts as a solution, we show how a decision-theoretic composition problem can be solved by reducing it to the problem of finding an optimal policy in a Markov decision process. Decision Theoretic Behavior Composition Nitin Yadav, Sebastian Sardina

G60

An interesting problem of multi-agent systems is that of voting, in which the preferences of autonomous agents are to be combined. Applications of voting include modeling social structures, search engine ranking, and choosing a leader among computational agents. In the setting of voting, it is very important that each agent presents truthful information about his or her preferences, and not manipulate. The choice of election system may encourage or discourage voters from manipulating. Because manipulation often results in undesirable consequences, making the determination of such intractable is an important goal. An interesting metric on the robustness of an election system concerns the frequency in which opportunities of manipulations occur in a given election system. Previous work by Walsh has evaluated the frequency of manipulation in the context of very specific election systems, particularly veto, when the number of candidates is limited to at most three, by showing that manipulation problems in these systems can be directly viewed as problems of (Two-Way) Partition, and then using the best known heuristics of Partition. Walsh also claimed similar results hold for k-candidate veto election by way of problems involving multi-way partitions. We show that the results for k-candidate veto elections do not follow directly from common versions of partition problems and require non-trivial modifications to Multi-Way Partition. With these modifications, we confirm Walsh's claim that these elections are also vulnerable to manipulation. Our new computational problems also allow one to evaluate manipulation in the general case of k-candidate scoring protocols. We investigate the complexity of manipulating scoring protocols using new algorithms we derive by extending the known algorithms of Multi-Way Partition. 
It is our conclusion that the problems of manipulation in more general scoring protocols of four or more candidates are not vulnerable to manipulation using extensions of the current known algorithms of Multi-Way Partition. This may be due to weaknesses in these algorithms or complexity in manipulating general scoring protocols. Solving Election Manipulation Using Integer Partitioning Problems Andrew Lin

Session A5 – Learning Agents

G61

The field of multiagent decision making is extending its tools from classical game theory by embracing reinforcement learning, statistical analysis, and opponent modeling. For example, behavioral economists conclude from experimental results that people act according to levels of reasoning that form a "cognitive hierarchy" of strategies, rather than merely following the hyper-rational Nash equilibrium solution concept. This paper expands this model of the iterative reasoning process by widening the notion of a level within the hierarchy from one single strategy to a distribution over strategies, leading to a more general framework of multiagent decision making. It provides a measure of sophistication for strategies and can serve as a guide for designing good strategies for multiagent games, drawing its main strength from predicting opponent strategies. We apply these lessons to the recently introduced Lemonade-stand Game, a simple setting that includes both collaborative and competitive elements, where an agent's score is critically dependent on its responsiveness to opponent behavior. The opening moves are significant to the end result and simple heuristics have achieved faster cooperation than intricate learning schemes. Using results from the past two real-world tournaments, we show how the submitted entries fit naturally into our model and explain why the top agents were successful. Using Iterated Reasoning to Predict Opponent Strategies Michael Wunder, John Robert Yaros, Michael Littman, Michael Kaisers

R59

In continuous learning settings stochastic stable policies are often necessary to ensure that agents continuously adapt to dynamic environments. The choice of the decentralised learning system and the employed policy plays an important role in the optimisation task. For example, a policy that exhibits fluctuations may also introduce non-linear effects which other agents in the environment may not be able to cope with and even amplify these effects. In dynamic and unpredictable multiagent environments these oscillations may introduce instabilities. In this paper, we take inspiration from the limbic system to introduce an extension to the weighted policy learner, where agents evaluate rewards as either positive or negative feedback, depending on how they deviate from average expected rewards. Agents have positive and negative biases, where a bias either magnifies or depresses a positive or negative feedback signal. To contain the non-linear effects of biased rewards, we incorporate a decaying memory of past positive and negative feedback signals to provide a smoother gradient update on the probability simplex, spreading out the effect of the feedback signal over time. By splitting the feedback signal, more leverage on the win or learn fast (WoLF) principle is possible. The cognitive policy learner is evaluated using a small queueing network and compared with the fair action and weighted policy learner. Emphasis is placed on analysing the dynamics of the learning algorithms with respect to the stability of the queueing network and the overall queueing performance. Cognitive Policy Learner: Biasing Winning or Losing Strategies Dominik Dahlem, Jim Dowling, William Harrison

R60

Distributed collaborative adaptive sensing (DCAS) of the atmosphere is a new paradigm for detecting and predicting hazardous weather using a large dense network of short-range, low-powered radars to sense the lowest few kilometers of the Earth's atmosphere. In DCAS, radars are controlled by a collection of Meteorological Command and Control (MC&C) agents that instruct where to scan based on emerging weather conditions. Within this context, this work concentrates on designing efficient approaches for allocating sensing resources to cope with restricted real-time requirements and limited computational resources. We have developed a new approach based on explicit goals that can span multiple system heartbeats. This allows us to reason ahead about sensor allocations based on expected requirements of goals as they project forward in time. Each goal explicitly specifies end-users' preferences as well as a prediction of how a phenomenon will move. We use a genetic algorithm to generate scanning strategies of each single MC&C and a distributed negotiation model to coordinate multiple MC&Cs' scanning strategies over multiple heartbeats. Simulation results show that as compared to simpler variants of our approach, the proposed distributed model achieved the highest social welfare. Our approach also has exhibited similarly very good performance in an operational radar testbed that is deployed in Oklahoma to observe severe weather events. Agent-Mediated Multi-Step Optimization for Resource Allocation in Distributed Sensor Networks Bo An, Victor Lesser, David Westbrook, Michael Zink

R61

This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning significantly improve both learning time and policy performance. Our evaluation compares three algorithmic approaches to incorporating demonstration rule summaries into transfer learning, and studies the impact of demonstration quality and quantity, as well as the effect of combining demonstrations from multiple teachers. Our results show that all three transfer methods lead to statistically significant improvement in performance over learning without demonstration. The best performance was achieved by combining the best demonstrations from two teachers. Integrating Reinforcement Learning with Human Demonstrations of Varying Ability Matthew E. Taylor, Halit Bener Suay, Sonia Chernova

Session B5 – Auction and Incentive Design

R62

We consider a setting in which a principal seeks to induce an adaptive agent to select a target action by providing incentives on one or more actions. The agent maintains a belief about the value for each action (which may update based on experience) and selects at each time step the action with the maximal sum of value and associated incentive. The principal observes the agent's selection, but has no information about the agent's current beliefs or belief update process. For inducing the target action as soon as possible, or as often as possible over a fixed time period, it is optimal for a principal with a per-period budget to assign the budget to the target action and wait for the agent to want to make that choice. But with an across-period budget, no algorithm can provide good performance on all instances without knowledge of the agent's update process, except in the particular case in which the goal is to induce the agent to select the target action once. We demonstrate ways to overcome this strong negative result with knowledge about the agent's beliefs, by providing a tractable algorithm for solving the offline problem when the principal has perfect knowledge, and an analytical solution for an instance of the problem in which partial knowledge is available. Incentive Design for Adaptive Agents Yiling Chen, Jerry Kung, David C. Parkes, Ariel D. Procaccia, Haoqi Zhang
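The agent's selection rule and the per-period policy described in the abstract above can be written down in a few lines of Python. This is only a sketch under assumed names (`agent_choice`, `spend_on_target`) and made-up belief values; the agent's belief update process, which the principal cannot observe, is deliberately left out.

```python
def agent_choice(values, incentives):
    """The agent picks the action with the largest value plus incentive."""
    return max(range(len(values)), key=lambda a: values[a] + incentives[a])


def spend_on_target(n_actions, target, budget):
    """Per-period policy from the abstract: put the whole budget on the target."""
    incentives = [0.0] * n_actions
    incentives[target] = budget
    return incentives


if __name__ == "__main__":
    beliefs = [1.0, 0.4, 0.2]  # hypothetical current beliefs of the agent
    for budget in (0.5, 0.7):
        chosen = agent_choice(beliefs, spend_on_target(3, target=1, budget=budget))
        print(f"budget {budget}: agent selects action {chosen}")
```

With these illustrative numbers a budget of 0.5 is not yet enough to make action 1 attractive, so the principal has to wait for the agent's beliefs to shift, whereas a budget of 0.7 induces the target action immediately.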

R63

We study a problem where a group of agents has to decide how a joint reward should be shared among them. We focus on settings where the share that each agent receives depends on the subjective opinions of its peers concerning that agent's contribution to the group. To this end, we introduce a mechanism to elicit and aggregate subjective opinions and to determine agents' shares. The intuition behind the proposed mechanism is that each agent who believes that the others are telling the truth has its expected share maximized to the extent that it is well-evaluated by its peers and that it is truthfully reporting its opinions. Under the assumptions that agents are Bayesian decision-makers and that the underlying population is sufficiently large, we show that our mechanism is incentive-compatible, budget-balanced, and tractable. We also present strategies to make this mechanism individually rational and fair. A Truth Serum for Sharing Rewards Arthur Carvalho, Kate Larson

B64

So far, computers cannot satisfactorily solve many tasks that are extremely easy for humans, such as image recognition or common-sense reasoning. A partial solution, called human computation, is to delegate algorithmically difficult computational tasks to humans. The Game with a Purpose (GWAP), in which a computational task is transformed into a game, is perhaps the most popular form of human computation. A simplified adverse selection model for output-agreement / simultaneous-verification GWAPs was built, using the ESP Game as an example. The experimental results favored an adverse selection model over a moral hazard model. We were particularly interested in how the output quality of a GWAP is affected by the way players are matched with each other, and proposed capability-aligned matching (CAM) as an alternative to the commonly used random matching. The analysis showed that, compared with random matching, CAM improved output quality. The experiment confirmed the conclusions drawn from the analysis, and further indicated that the task-human matching scheme is as important as the human-human matching scheme studied in this paper. The main contribution of this paper is the analysis and empirical evaluation of the human-human matching scheme, showing that capability-aligned matching can improve the quality of GWAPs. Capability-Aligned Matching: Improving Quality of Games with a Purpose Che-Liang Chiou, Jane Yung-Jen Hsu

G62

Mechanism design studies how to design mechanisms that result in good outcomes even when agents strategically report their preferences. In traditional settings, it is assumed that a mechanism can enforce payments to give an incentive for agents to act honestly. However, in many Internet application domains, introducing monetary transfers is impossible or undesirable. Also, in such highly anonymous settings as the Internet, declaring preferences dishonestly is not the only way to manipulate the mechanism. Often, it is possible for an agent to pretend to be multiple agents and submit multiple reports under different identifiers, e.g., by creating different e-mail addresses. The effect of such false-name manipulations can be more serious in a mechanism without monetary transfers, since submitting multiple reports would have no risk. In this paper, we present a case study in false-name-proof mechanism design without money. In our basic setting, agents are located on a real line, and the mechanism must select the location of a public facility; the cost of an agent is its distance to the facility. This setting is called the facility location problem and can represent various situations where an agent's preference is single-peaked. First, we fully characterize the deterministic false-name-proof facility location mechanisms in this basic setting. By utilizing this characterization, we show the tight bounds of the approximation ratios for two objective functions: social cost and maximum cost. We then extend the results in two natural directions: a domain where a mechanism can be randomized and a domain where agents are located in a tree. Furthermore, we clarify the connections between false-name-proofness and other related properties. False-name-proof Mechanism Design without Money Taiki Todo, Atsushi Iwasaki, Makoto Yokoo
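To make the facility location setting above concrete, the following Python sketch computes the two objectives named in the abstract (social cost and maximum cost) and shows a false-name manipulation of the familiar median rule. The median and leftmost rules are standard textbook examples used here as assumptions; they are not claimed to be the mechanisms characterized in the paper, and all locations are invented.

```python
from statistics import median_low
from typing import List


def social_cost(locations: List[float], facility: float) -> float:
    """Sum of the agents' distances to the facility."""
    return sum(abs(x - facility) for x in locations)


def maximum_cost(locations: List[float], facility: float) -> float:
    """Largest distance of any agent to the facility."""
    return max(abs(x - facility) for x in locations)


def median_mechanism(reports: List[float]) -> float:
    """Place the facility at the (lower) median of the reported locations."""
    return median_low(reports)


def leftmost_mechanism(reports: List[float]) -> float:
    """Place the facility at the smallest reported location."""
    return min(reports)


true_locations = [0.0, 2.0, 10.0]

# Honest reports: the agent at 10 pays distance 8 under the median rule.
honest = median_mechanism(true_locations)

# The agent at 10 submits two extra reports under false names.
manipulated = median_mechanism(true_locations + [10.0, 10.0])

print(honest, manipulated)
print(social_cost(true_locations, honest), maximum_cost(true_locations, honest))
print(leftmost_mechanism(true_locations),
      leftmost_mechanism(true_locations + [10.0, 10.0]))
```

In this example the extra identities pull the median to the manipulator's own location, while the leftmost rule is unaffected by them, at the price of a higher social cost on the true locations.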

R64

This paper studies the problem of majority-rule-based collective decision-making where the agents' preferences are represented by CP-nets (Conditional Preference Networks). As there are exponentially many alternatives, it is impractical to reason about the individual full rankings over the alternative space and apply majority rule directly. Most existing works either do not consider computational requirements, or depend on a strong assumption that the agents have acyclic CP-nets that are compatible with a common order on the variables. To this end, this paper proposes an efficient SAT-based approach, called MajCP (Majority-rule-based collective decision-making with CP-nets), to compute the majority winning alternatives. Our proposed approach only requires that each agent submit a CP-net; the CP-net can be cyclic, and there need not be any common structure among the agents' CP-nets. The experimental results presented in this paper demonstrate that the proposed approach is computationally efficient. It offers several orders of magnitude improvement in performance over a brute-force algorithm for large numbers of variables. Majority-Rule-Based Preference Aggregation on Multi-Attribute Domains with CP-Nets Minyi Li, Quoc Bao Vo, Ryszard Kowalczyk

Session C5 – Simulation and Emergence

B65

The dynamic formation of coalitions is a well-known area of interest in multi-agent systems (MAS). Coalitions can help self-interested agents to successfully cooperate and coordinate in a mutually beneficial manner. Moreover, the organization provided by coalitions is particularly helpful for large-scale MAS. In this paper we present a distributed approach for coalition emergence in large-scale MAS. In particular, we focus on MAS with agents interacting over complex networks since they provide a realistic model of today's interconnected world (e.g. social networks). Our experiments show the effectiveness of our coalition emergence approach in achieving full cooperation over different complex networks. Furthermore, they provide a clear picture of the strong influence the topology has on coalition emergence. Emerging Cooperation on Complex Networks Norman Salazar, Juan Antonio Rodriguez-Aguilar, Josep Lluís Arcos, Ana Peleteiro, Juan C. Burguillo-Rial

G63

Large heterogeneous teams in a variety of applications must make joint decisions using large volumes of noisy and uncertain data. Often not all team members have access to a sensor, relying instead on information shared by peers to make decisions. These sensors can become permanently corrupted through hardware failure or as a result of the actions of a malicious adversary. Previous work showed that when the trust between agents was tuned to a specific value the resulting dynamics of the system had a property called scale invariance which led to agents reaching highly accurate conclusions with little communication. In this paper we show that these dynamics also leave the system vulnerable to most agents coming to incorrect conclusions as a result of small amounts of anomalous information maliciously injected in the system. We conduct an analysis that shows that the efficiency of scale invariant dynamics is due to the fact that a large number of agents can come to correct conclusions when the difference between the percentages of agents holding conflicting opinions is relatively small. Although this allows the system to come to correct conclusions quickly, it also means that it would be easy for an attacker with specific knowledge to tip the balance. We explore different methods for selecting which agents are Byzantine and when attacks are launched informed by the analysis. Our study reveals global system properties that can be used to predict when and where in the network the system is most vulnerable to attack. We use the results of this study to design an algorithm used by agents to effectively attack the network, informed by local estimates of the global properties revealed by our investigation. An Investigation of the Vulnerabilities of Scale Invariant Dynamics in Large Teams Robin Glinton, Paul Scerri, Katia Sycara

B66

We study the phenomenon of evolution of cooperation in a society of self-interested agents using repeated games in graphs. A repeated game in a graph is a multiple round game, where, in each round, an agent gains payoff by playing a game with its neighbors and updates its action (state) by using the actions and/or payoffs of its neighbors. The interaction model between the agents is a two-player, two-action (cooperate and defect) Prisoner's Dilemma (PD) game (a prototypical model for interaction between self-interested agents). The conventional wisdom is that the presence of network structure enhances cooperation and current models use multiagent simulation to show evolution of cooperation. However, these results are based on particular combinations of interaction game, network model and state update rules (e.g., PD game on a grid with imitate your best neighbor rule leads to evolution of cooperation). The state-of-the-art lacks a comprehensive picture of the dependence of the emergence of cooperation on model parameters like network topology, interaction game, state update rules and initial fraction of cooperators. We perform a thorough study of the phenomenon of evolution of cooperation using (a) a set of popular categories of networks, namely, grid, random networks, scale-free networks, and small-world networks and (b) a set of cognitively motivated update rules. Our simulation results show that the evolution of cooperation in networked systems is quite nuanced and depends on the combination of network type, update rules and the initial fraction of cooperating agents. We also provide an analysis to support our simulation results. The Evolution of Cooperation in Self-Interested Agent Societies: A Critical Study Lisa-Maria Hofmann, Nilanjan Chakraborty, Katia Sycara
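One round of a repeated game in a graph, as described in the abstract above, can be sketched in Python as follows. The sketch uses the "imitate your best neighbor" update rule mentioned in the abstract; the payoff values, the three-node path graph, and the tie-breaking choice (an agent keeps its own action on ties) are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumed Prisoner's Dilemma payoffs: (my action, other's action) -> my payoff.
PAYOFF: Dict[Tuple[str, str], int] = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}


def play_round(neighbors: Dict[int, List[int]],
               actions: Dict[int, str]) -> Tuple[Dict[int, int], Dict[int, str]]:
    """Play the PD with every neighbor, then imitate the best-scoring node nearby."""
    scores = {
        v: sum(PAYOFF[(actions[v], actions[u])] for u in neighbors[v])
        for v in neighbors
    }
    new_actions = {}
    for v in neighbors:
        # The agent itself is listed first, so it keeps its action on ties.
        best = max([v] + list(neighbors[v]), key=lambda u: scores[u])
        new_actions[v] = actions[best]
    return scores, new_actions


# Path graph 0 - 1 - 2 with a single defector at one end.
graph = {0: [1], 1: [0, 2], 2: [1]}
start = {0: "C", 1: "C", 2: "D"}

scores, updated = play_round(graph, start)
print(scores)
print(updated)
```

On this tiny path the defector earns the highest payoff and its neighbor copies it, which illustrates the abstract's point that the outcome depends on the particular combination of network, game, and update rule.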

G64

We analyze and extend a recently proposed model of linguistic diffusion in social networks, to analytically derive time to convergence, and to account for the innovation phase of lexical dynamics in networks. Our new model, the degree-biased voter model with innovation, shows that the probability of existence of a norm is inversely related to innovation probability. When the innovation rate in the population is low, variants that become norms are due to a peripheral member with high probability. As the innovation rate increases, the fraction of time that the norm is a peripheral-introduced variant and the total time for which a norm exists at all in the population decrease. These results align with historical observations of rapid increase and generalization of slang words, technical terms, and new common expressions at times of cultural change in some languages. A Model of Norm Emergence and Innovation in Language Change Samarth Swarup, Andrea Apolloni, Zsuzsanna Fagyal

B67

Large scale agent-based simulations typically face a trade-off between the level of detail in the representation of each agent and the scalability seen as the number of agents that can be simulated with the computing resources available. In this paper, we aim at bypassing this trade-off by considering that the level of detail is itself a parameter that can be adapted automatically and dynamically during the simulation, taking into account elements such as user focus, or specific events. We introduce a framework for such a methodology, and detail its deployment within an existing simulator dedicated to the simulation of urban infrastructures. We evaluate the approach experimentally along two criteria: (1) the impact of our methodology on the resources (CPU use), and (2) an estimate of the dissimilarity between the two modes of simulation, i.e. with and without applying our methodology. Initial experiments show that a major gain in CPU time can be obtained for a very limited loss of consistency. Dynamic Level of Detail for Large Scale Agent-Based Urban Simulations Laurent Navarro, Fabien Flacher, Vincent Corruble

Session D5 – Logic-Based Approaches II

R65

In modal logic, when adding a syntactic property to an axiomatisation, this property will semantically become true in all models, in all situations, under all circumstances. For instance, adding a property like K_a p → K_b p (agent b knows at least what agent a knows) to an axiomatisation of some epistemic logic has as an effect that such a property becomes globally true, i.e., it will hold in all states, at all time points (in a temporal setting), after every action (in a dynamic setting) and after any communication (in an update setting), and every agent will know that it holds, it will even be common knowledge. We propose a way to express that a property like the above only needs to hold locally: it may hold in the actual state, but not in all states, and not all agents may know that it holds. We can achieve this by adding relational atoms to the language that represent (implicitly) quantification over all formulas, as in ∀ p (K_a p → K_b p). We show how this can be done for a rich class of modal logics and a variety of syntactic properties. Reasoning About Local Properties in Modal Logic Hans van Ditmarsch, Wiebe van der Hoek, Barteld Kooi

R66

Logics of propositional control, such as van der Hoek and Wooldridge's CL-PC, were introduced in order to represent and reason about scenarios in which each agent within a system is able to exercise unique control over some set of system variables. Our aim in the present paper is to extend the study of logics of propositional control to settings in which these agents have incomplete information about the society they occupy. We consider two possible sources of incomplete information. First, we consider the possibility that an agent is only able to “read” a subset of the overall system variables, and so in any given system state, will have partial information about the state of the system. Second, we consider the possibility that an agent has incomplete information about which agent controls which variables. For both cases, we introduce a logic combining epistemic modalities with the operators of CL-PC, investigate its axiomatization, and discuss its properties. Knowledge and Control Wiebe van der Hoek, Nicolas Troquard, Michael Wooldridge

R67

A well-known (and often used) result by Marc Pauly states that for every playable effectivity function E there exists a strategic game that assigns to coalitions exactly the same power as E, and vice versa. While the latter direction of the correspondence is correct, we show that the former does not always hold in the case of infinite game models. We point out where the proof of correspondence goes wrong, and we present examples of playable effectivity functions in infinite models for which no equivalent strategic game exists. Then, we characterize the class of truly playable effectivity functions, which does correspond to strategic games. Moreover, we discuss a construction that transforms any playable effectivity function into a truly playable one while preserving the power of most (but not all) coalitions. We also show that Coalition Logic is not expressive enough to distinguish between playable and truly playable effectivity functions, and we extend it to a logic that can make this distinction while enjoying finite axiomatization and the finite model property. Strategic Games and Truly Playable Effectivity Functions Valentin Goranko, Wojciech Jamroga, Paolo Turrini

R68

In epistemic logic, Kripke structures are used to model the distribution of information in a multi-agent system. In this paper, we present an approach to quantifying how much information each particular agent in a system has, or how important the agent is, with respect to some fact represented as a goal formula. It is typically the case that the goal formula is distributed knowledge in the system, but that no individual agent alone knows it. It might be that several different groups of agents can get to know the goal formula together by combining their individual knowledge. By using power indices developed in voting theory, such as the Banzhaf index, we get a measure of how important an agent is in such groups. We analyse the properties of this notion of information-based power in detail, and characterise the corresponding class of voting games. Although we mainly focus on distributed knowledge, we also look at variants of this analysis using other notions of group knowledge. An advantage of our framework is that power indices and other power properties can be expressed in standard epistemic logic. This allows, e.g., standard model checkers to be used to quantitatively analyse the distribution of information in a given Kripke structure. Scientia Potentia Est Thomas Ågotnes, Wiebe van der Hoek, Michael Wooldridge
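The Banzhaf index mentioned in the abstract above can be computed by brute force for small systems. The Python sketch below counts, for each agent, the coalitions it turns from losing to winning. The `wins` predicate is a hypothetical stand-in for "the group has distributed knowledge of the goal formula"; in the paper that would be determined by a Kripke structure, which is not modelled here.

```python
from fractions import Fraction
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List


def banzhaf(agents: List[str],
            wins: Callable[[FrozenSet[str]], bool]) -> Dict[str, Fraction]:
    """Fraction of coalitions of the other agents for which each agent is a swing."""
    index = {}
    for agent in agents:
        others = [a for a in agents if a != agent]
        swings = 0
        for size in range(len(others) + 1):
            for combo in combinations(others, size):
                coalition = frozenset(combo)
                # The agent is a swing if joining turns a losing group into a winning one.
                if not wins(coalition) and wins(coalition | {agent}):
                    swings += 1
        index[agent] = Fraction(swings, 2 ** len(others))
    return index


# Hypothetical example: the goal becomes known exactly when a pools knowledge with b or c.
def knows_goal(group: FrozenSet[str]) -> bool:
    return {"a", "b"} <= group or {"a", "c"} <= group


power = banzhaf(["a", "b", "c"], knows_goal)
print(power)
```

In this made-up example no agent knows the goal alone, yet agent a is needed by every successful group and therefore receives the largest index.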

B68

Session A6 – Robotics and Learning

R69

Recent research in multi-robot exploration and mapping has focused on sampling environmental fields, which are typically modeled using the Gaussian process (GP). Existing information-theoretic exploration strategies for learning GP-based environmental field maps adopt the non-Markovian problem structure and consequently scale poorly with the length of history of observations. Hence, it becomes computationally impractical to use these strategies for in situ, real-time active sampling. To ease this computational burden, this paper presents a Markov-based approach to efficient information-theoretic path planning for active sampling of GP-based fields. We analyze the time complexity of solving the Markov-based path planning problem, and demonstrate analytically that it scales better than that of deriving the non-Markovian strategies with increasing length of planning horizon. For a class of exploration tasks called the transect sampling task, we provide theoretical guarantees on the active sampling performance of our Markov-based policy, from which ideal environmental field conditions and sampling task settings can be established to limit its performance degradation due to violation of the Markov assumption. Empirical evaluation on real-world temperature and plankton density field data shows that our Markov-based policy can generally achieve active sampling performance comparable to that of the widely-used non-Markovian greedy policies under less favorable realistic field conditions and task settings while enjoying significant computational gain over them. Active Markov Information-Theoretic Path Planning for Robotic Environmental Sensing Kian Hsiang Low, John M. Dolan, Pradeep Khosla

R70

Maintaining accurate world knowledge in a complex and changing environment is a perennial problem for robots and other artificial intelligence systems. Our architecture for addressing this problem, called Horde, consists of a large number of independent reinforcement learning sub-agents, or demons. Each demon is responsible for answering a single predictive or goal-oriented question about the world, thereby contributing in a factored, modular way to the system's overall knowledge. The questions are in the form of a value function, but each demon has its own policy, reward function, termination function, and terminal-reward function unrelated to those of the base problem. Learning proceeds in parallel by all demons simultaneously so as to extract the maximal training information from whatever actions are taken by the system as a whole. Gradient-based temporal-difference learning methods are used to learn efficiently and reliably with function approximation in this off-policy setting. Horde runs in constant time and memory per time step, and is thus suitable for learning online in real-time applications such as robotics. We present results using Horde on a multi-sensored mobile robot to successfully learn goal-oriented behaviors and long-term predictions from off-policy experience. Horde is a significant incremental step towards a real-time architecture for efficient learning of general knowledge from unsupervised sensorimotor interaction. Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction Richard S. Sutton, Joseph Modayil, Michael Delp, Thomas Degris, Patrick M. Pilarski, Adam White, Doina Precup

B69

In several realistic domains an agent's behavior is composed of multiple interdependent skills. For example, consider a humanoid robot that must play soccer, as is the focus of this paper. In order to succeed, it is clear that the robot needs to walk quickly, turn sharply, and kick the ball far. However, these individual skills are ineffective if the robot falls down when switching from walking to turning, or if it cannot position itself behind the ball for a kick. This paper presents a learning architecture for a humanoid robot soccer agent that has been fully deployed and tested within the RoboCup 3D simulation environment. First, we demonstrate that individual skills such as walking and turning can be parameterized and optimized to match the best performance statistics reported in the literature. These results are achieved through effective use of the CMA-ES optimization algorithm. Next, we describe a framework for optimizing skills in conjunction with one another, a little-understood problem with substantial practical significance. Over several phases of learning, a total of roughly 100-150 parameters are optimized. Detailed experiments show that an agent thus optimized performs comparably with the top teams from the RoboCup 2010 competitions, while taking relatively few man-hours for development. On Optimizing Interdependent Skills: A Case Study in Simulated 3D Humanoid Robot Soccer Daniel Urieli, Patrick MacAlpine, Shivaram Kalyanakrishnan, Yinon Bentor, Peter Stone

R71

A key component of any reinforcement learning algorithm is the underlying representation used by the agent. While reinforcement learning (RL) agents have typically relied on hand-coded state representations, there has been a growing interest in learning this representation. While inputs to an agent are typically fixed (i.e., state variables represent sensors on a robot), it is desirable to automatically determine the optimal relative scaling of such inputs, as well as to diminish the impact of irrelevant features. This work introduces HOLLER, a novel distance metric learning algorithm, and combines it with an existing instance-based RL algorithm to achieve precisely these goals. The algorithms' success is highlighted via empirical measurements on a set of six tasks within the mountain car domain. Metric Learning for Reinforcement Learning Agents Matthew E. Taylor, Brian Kulis, Fei Sha

Session B6 – Energy Applications

R72

The creation of Virtual Power Plants (VPPs) has been suggested in recent years as the means for achieving the cost-efficient integration of the many distributed energy resources (DERs) that are starting to emerge in the electricity network. In this work, we contribute to the development of VPPs by offering a game-theoretic perspective to the problem. Specifically, we design cooperatives (or “cooperative VPPs”, CVPPs) of rational autonomous DER-agents representing small-to-medium size renewable electricity producers, which coalesce to profitably sell their energy to the electricity grid. By so doing, we help to counter the fact that individual DERs are often excluded from the wholesale energy market due to their perceived inefficiency and unreliability. We discuss the issues surrounding the emergence of such cooperatives, and propose a pricing mechanism with certain desirable properties. Specifically, our mechanism guarantees that CVPPs have the incentive to truthfully report to the grid accurate estimates of their electricity production, and that larger rather than smaller CVPPs form; this promotes CVPP efficiency and reliability. In addition, we propose a scheme to allocate payments within the cooperative, and show that, given this scheme and the pricing mechanism, the allocation is in the core and, as such, no subset of members has a financial incentive to break away from the CVPP. Moreover, we develop an analytical tool for quantifying the uncertainty about DER production estimates, and distinguishing among different types of errors regarding such estimates. We then utilize this tool to devise protocols to manage CVPP membership. Finally, we demonstrate these ideas through a simulation that uses real-world data. Cooperatives of Distributed Energy Resources for Efficient Virtual Power Plants Georgios Chalkiadakis, Valentin Robu, Ramachandra Kota, Alex Rogers, Nicholas R. Jennings

R73

Traffic causes pollution and demands fuel. When it comes to vehicle traffic, intersections tend to be a main bottleneck. Traditional approaches to control traffic at intersections have not been designed to optimize any environmental criterion. Our objective is to design mechanisms for intersection control which minimize fuel consumption. This is difficult because it requires a specialized infrastructure: It must allow vehicles and intersections to communicate, e.g., vehicles send their dynamic characteristics (position, speed etc.) to the intersection more or less continuously so that it can estimate the fuel consumption. In this context, the use of software agents supports the driver by reducing the necessary degree of direct interaction with the intersection. In this paper, we quantify the fuel consumption with existing agent-based approaches for intersection control. Further, we propose a new, agent-based mechanism for intersection control, with minimization of fuel consumption as an explicit design objective. It reduces fuel consumption by up to 26% and waiting time by up to 98%, compared to traffic lights. Thus, agent-based mechanisms for intersection control may reduce fuel consumption in a way that is substantial. How Agents Can Help Curbing Fuel Combustion – a Performance Study of Intersection Control for Fuel-Operated Vehicles Natalja Pulter, Heiko Schepperle, Klemens Böhm

G65

Intelligent electricity grids, or “Smart Grids”, are being introduced at a rapid pace. Smart grids allow the management of new distributed power generators such as solar panels and wind turbines, and innovative power consumers such as plug-in hybrid electric vehicles (PHEVs). One challenge in Smart Grids is to fulfill consumer demands while avoiding infrastructure overloads. Another challenge is to reduce imbalance costs: after production and consumption have been scheduled ahead of time (the so-called “load schedule”), unpredictable changes in production and consumption yield a cost for restoring this balance. To cope with these risks and costs, we propose a decentralized, multi-agent system (MAS) solution for coordinated charging of PHEVs in a Smart Grid. Essentially, the MAS utilizes an “intention graph” for expressing the flexibility of a fleet of PHEVs. Based on this flexibility, charging of PHEVs can be rescheduled in real-time to reduce imbalances. We discuss and evaluate two scheduling strategies for reducing imbalance costs: reactive scheduling and proactive scheduling. Simulations show that reactive scheduling is able to reduce imbalance costs by 14%, while proactive scheduling yields the highest imbalance cost reduction of 44%. Decentralized Coordination Of Plug-in Hybrid Vehicles For Imbalance Reduction In A Smart Grid Stijn Vandael, Klaas De Craemer, Nelis Boucké, Tom Holvoet, Geert Deconinck

B70

Plug-in hybrid electric vehicles are expected to place a considerable strain on local electricity distribution networks, requiring charging to be coordinated in order to accommodate capacity constraints. We design a novel online auction protocol for this problem, wherein vehicle owners use agents to bid for power and also state time windows in which a vehicle is available for charging. This is a multi-dimensional mechanism design domain, with owners having non-increasing marginal valuations for each subsequent unit of electricity. In our design, we couple a greedy allocation algorithm with the occasional “burning” of allocated power, leaving it unallocated, in order to adjust an allocation and achieve monotonicity and thus truthfulness. We consider two variations: burning at each time step or on-departure. Both mechanisms are evaluated in depth, using data from a real-world trial of electric vehicles in the UK to simulate system dynamics and valuations. The mechanisms provide higher allocative efficiency than a fixed price system, are almost competitive with a standard scheduling heuristic which assumes non-strategic agents, and can sustain a substantially larger number of vehicles at the same per-owner fuel cost saving than a simple random scheme. Online Mechanism Design for Electric Vehicle Charging Enrico H. Gerding, Valentin Robu, Sebastian Stein, David C. Parkes, Alex Rogers, Nicholas R. Jennings

Session C6 – Voting Protocols

R74

Distance rationalizability is a framework for classifying voting rules by interpreting them in terms of distances and consensus classes. It can also be used to design new voting rules with desired properties. A particularly natural and versatile class of distances that can be used for this purpose is that of votewise distances, which “lift” distances over individual votes to distances over entire elections using a suitable norm. In this paper, we continue the investigation of the properties of votewise distance-rationalizable rules initiated in Elkind et al. We describe a number of general conditions on distances and consensus classes that ensure that the resulting voting rule is homogeneous or monotone. This complements the results of Elkind et al., where the authors focus on anonymity, neutrality and consistency. We also introduce a new class of voting rules, that can be viewed as “majority variants” of classic scoring rules, and have a natural interpretation in the context of distance rationalizability. Homogeneity and Monotonicity of Distance-Rationalizable Voting Rules Edith Elkind, Piotr Faliszewski, Arkadii Slinko
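The idea of a votewise distance in the abstract above (a distance over individual votes lifted to whole elections through a norm) can be sketched in Python. The swap distance between rankings and the two norms used below are common choices assumed here for illustration; the paper's conditions on distances and consensus classes are not reproduced.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Vote = Tuple[str, ...]


def swap_distance(u: Vote, v: Vote) -> int:
    """Number of candidate pairs that the two rankings order differently."""
    position = {c: i for i, c in enumerate(v)}
    return sum(1 for x, y in combinations(u, 2) if position[x] > position[y])


def votewise_distance(e1: List[Vote], e2: List[Vote],
                      d: Callable[[Vote, Vote], int],
                      norm: Callable[[Sequence[int]], int]) -> int:
    """Lift a distance over votes to elections with the same voters, via a norm."""
    return norm([d(u, v) for u, v in zip(e1, e2)])


election_1 = [("a", "b", "c"), ("b", "a", "c")]
election_2 = [("b", "a", "c"), ("c", "a", "b")]

l1 = votewise_distance(election_1, election_2, swap_distance, sum)     # l1 norm
linf = votewise_distance(election_1, election_2, swap_distance, max)   # l-infinity norm
print(l1, linf)
```

Swapping the norm changes how disagreement is aggregated across voters: the l1 norm adds up all swaps, while the l-infinity norm looks only at the most changed vote.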

R75

In a voting system, sometimes multiple new alternatives will join the election after the voters' preferences over the initial alternatives have been revealed. Computing whether a given alternative can be a co-winner when multiple new alternatives join the election is called the possible co-winner with new alternatives (PcWNA) problem and was introduced by Chevaleyre et al. (2010). In this paper, we show that the PcWNA problems are NP-complete for the Bucklin, Copeland_0, and maximin (a.k.a. Simpson) rules, even when the number of new alternatives is no more than a constant. We also show that the PcWNA problem can be solved in polynomial time for plurality with runoff. For the approval rule, we examine three different ways to extend a linear order with new alternatives, and characterize the computational complexity of the PcWNA problem for each of them. Possible Winners When New Alternatives Join: New Results Coming Up! Lirong Xia, Jérôme Lang, Jérôme Monnot

B71

Electoral control models ways of changing the outcome of an election via such actions as adding/deleting/partitioning either candidates or voters. These actions modify an election's participation structure and aim at either making a favorite candidate win (“constructive control”) or preventing a despised candidate from winning (“destructive control”). To protect elections from such control attempts, computational complexity has been used to show that electoral control, though not impossible, is computationally prohibitive. Recently, Erdélyi and Rothe proved that Brams and Sanver's fallback voting, a hybrid voting system that combines Bucklin with approval voting, is resistant to each of the standard types of control except five types of voter control. They proved that fallback voting is vulnerable to two of those control types, leaving the other three cases open. We solve these three open problems, thus showing that fallback voting is resistant to all standard types of control by partition of voters, which is a particularly important and well-motivated control type, as it models “two-district gerrymandering.” Hence, fallback voting is not only fully resistant to candidate control but also fully resistant to constructive control, and it displays the broadest resistance to control currently known to hold among natural voting systems with a polynomial-time winner problem. We also show that Bucklin voting behaves almost as well in terms of control resistance. Each resistance for Bucklin voting strengthens the corresponding control resistance for fallback voting. The Complexity of Voter Partition in Bucklin and Fallback Voting: Solving Three Open Problems Gábor Erdélyi, Lena Piras, Jörg Rothe
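For readers unfamiliar with Bucklin voting, the winner problem referred to above can be sketched in a few lines of Python. The definition assumed here is the common one: find the smallest level k at which some candidate appears among the top k positions of a strict majority of votes, and elect the candidates with the highest level-k score. Fallback voting, which adds approval cut-offs, is not modelled, and the example votes are invented.

```python
from typing import List, Set, Tuple

Vote = Tuple[str, ...]


def bucklin_winners(votes: List[Vote]) -> Set[str]:
    """Return the Bucklin winners for complete rankings over the same candidates."""
    n = len(votes)
    candidates = votes[0]
    for level in range(1, len(candidates) + 1):
        # Level score: number of votes ranking the candidate among the top `level`.
        level_score = {c: sum(1 for v in votes if c in v[:level]) for c in candidates}
        best = max(level_score.values())
        if 2 * best > n:  # strict majority reached at this level
            return {c for c, s in level_score.items() if s == best}
    return set()  # only reached for an empty list of candidates


votes = [("a", "b", "c"), ("b", "a", "c"), ("c", "b", "a")]
print(bucklin_winners(votes))
```

No candidate has a first-place majority in this example, so the rule moves to level 2, where b appears in the top two positions of every vote.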

B72

We introduce a new algorithm for the Unweighted Coalitional Manipulation problem under the Maximin voting rule. We prove that the algorithm gives an approximation ratio of 1 2/3 to the corresponding optimization problem. This is an improvement over the previously known algorithm that gave a 2-approximation. We also prove that its approximation ratio is no better than 1 1/2, i.e., there are instances on which a 1 1/2-approximation is the best the algorithm can achieve. Finally, we prove that no algorithm can approximate the problem better than to the factor of 1 1/2, unless P = NP. An Algorithm for the Coalitional Manipulation Problem under Maximin Michael Zuckerman, Omer Lev, Jeffrey S. Rosenschein
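As background for the Maximin rule referred to in this abstract, the sketch below computes maximin scores and co-winners for a small profile. It illustrates the voting rule only, under the usual definition (a candidate's score is its worst pairwise support), and is not the manipulation algorithm of the paper; the function names and the input format are hypothetical.

```python
def maximin_scores(votes, candidates):
    """Maximin score of each candidate.

    votes: list of complete rankings, most preferred candidate first.
    The score of c is the minimum, over all other candidates d, of the
    number of voters who rank c above d.
    """
    scores = {}
    for c in candidates:
        pairwise_support = [
            sum(1 for v in votes if v.index(c) < v.index(d))
            for d in candidates
            if d != c
        ]
        scores[c] = min(pairwise_support)
    return scores


def maximin_winners(votes, candidates):
    """Candidates with the highest maximin score (co-winners)."""
    scores = maximin_scores(votes, candidates)
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best)


if __name__ == "__main__":
    example_votes = [["a", "b", "c"], ["a", "c", "b"], ["b", "c", "a"]]
    print(maximin_scores(example_votes, ["a", "b", "c"]))
    print(maximin_winners(example_votes, ["a", "b", "c"]))
```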

G66

A possible winner of an election is a candidate that has, in some kind of incomplete-information election, the possibility to win in a complete extension of the election. The first type of problem we study is the Possible co-Winner with respect to the Addition of New Candidates (PcWNA) problem, which asks, given an election with strict preferences over the candidates, is it possible to make a designated candidate win the election by adding a limited number of new candidates to the election? In the case of unweighted voters we show NP-completeness of PcWNA for a broad class of pure scoring rules. We will also briefly study the case of weighted voters. The second type of possible winner problem we study is Possible Winner/co-Winner under Uncertain Voting System (PWUVS and PcWUVS). Here, uncertainty is present not in the votes but in the election rule itself. For example, PcWUVS is the problem of whether, given a set C of candidates, a list of votes over C, a distinguished candidate c ∈ C, and a class of election rules, there is at least one election rule from this class under which c wins the election. We study these two problems for a class of systems based on approval voting, the family of Copeland^α elections, and a certain class of scoring rules. Our main result is that it is NP-complete to determine whether there is a scoring vector that makes c win the election, if we restrict the set of possible scoring vectors for an m-candidate election to those of the form (α_1, …, α_{m-4}, x_1, x_2, x_3, 0), with x_i = 1 for at least one i ∈ {1, 2, 3}. Computational Complexity of Two Variants of the Possible Winner Problem Dorothea Baumeister, Magnus Roos, Jörg Rothe

Session D6 – Trust and Organisational Structure

G67

We propose that the trust an agent places in another agent declaratively captures an architectural connector between the two agents. We formulate trust as a generic modality expressing a relationship between a truster and a trustee. Specifically, trust here is definitionally independent of, albeit constrained by, other relevant modalities such as commitments and beliefs. Trust applies to a variety of attributes of the relationship between truster and trustee. For example, an agent may trust someone to possess an important capability, exercise good judgment, or to intend to help it. Although such varieties of trust are hugely different, they respect common logical patterns. We present a logic of trust that expresses such patterns as reasoning postulates concerning the static representation of trust, its dynamics, and its relationships with teamwork and other agent interactions. In this manner, the proposed logic illustrates the general properties of trust that reflect natural intuitions, and can facilitate the engineering of multiagent systems. Trust as Dependence: A Logical Approach Munindar P. Singh

G68

In the absence of legal enforcement procedures for the participants of an open e-marketplace, trust and reputation systems are central for resisting threats from malicious agents. Such systems provide mechanisms for identifying the participants who disseminate unfair ratings. However, it is possible that some of the honest participants are also victimized as a consequence of the poor judgement of these systems. In this paper, we propose a two-layer filtering algorithm that cognitively elicits the behavioral characteristics of the participating agents in an e-marketplace. We argue that the notion of unfairness does not exclusively refer to deception but can also imply differences in dispositions. The proposed filtering approach aims to go beyond inflexible judgements on the quality of participants and instead allows the human dispositions that we call optimism, pessimism and realism to be incorporated into our trustworthiness evaluations. Our proposed filtering algorithm consists of two layers. In the first layer, a consumer agent measures the competency of its neighbors for being a potentially helpful adviser. Thus, it automatically disqualifies the deceptive agents and/or the newcomers that lack the required experience. Afterwards, the second layer measures the credibility of the surviving agents of the previous layer on the basis of their behavioral models. This tangible view of trustworthiness evaluation boosts the confidence of human users in using a web-based agent-oriented e-commerce application. Multi-Layer Cognitive Filtering by Behavioral Modeling Zeinab Noorian, Stephen Marsh, Michael Fleming

B73

In any group of agents, trust plays an important role. The degree to which agents trust one another will inform what they believe and, as a result, the reasoning that they perform and the conclusions that they come to when that involves information from other agents. In this paper we consider a group of agents with varying degrees of trust of each other, and examine the combinations of trust with the argumentation-based reasoning that they can carry out. The question we seek to answer is “What is the relationship between the trust one agent has in another and the conclusions that it can draw using information from that agent?”, and we show that there is a range of answers depending upon the way that the agents deal with trust. Argumentation-Based Reasoning in Agents with Varying Degrees of Trust Simon Parsons, Yuqing Tang, Elizabeth Sklar, Peter McBurney, Kai Cai

G69

Keyword auctions are becoming increasingly important in today's electronic marketplaces. One of their most challenging aspects is the limited amount of information revealed about other advertisers. In this paper, we present a particle filter that can be used to estimate the bids of other advertisers given a periodic ranking of their bids. This particle filter makes use of models of the bidding behavior of other advertisers, and so we also show how such models can be learned from past bidding data. In experiments in the Ad Auction scenario of the Trading Agent Competition, the combination of this particle filter and bidder modeling outperforms all other bid estimation methods tested. A Particle Filter for Bid Estimation in Ad Auctions with Periodic Ranking Observations David Pardoe, Peter Stone

R76

Conviviality has been introduced as a social science concept for multiagent systems to highlight soft qualitative requirements like user friendliness of systems. In this paper we introduce formal conviviality measures for dependence networks using a coalitional game theoretic framework, which we contrast with more traditional efficiency and stability measures. Roughly, more opportunities to work with other people increases the conviviality, whereas larger coalitions may decrease the efficiency or stability of these involved coalitions. We first introduce assumptions and requirements, then we introduce a classification, and finally we introduce the conviviality measures. We use a running example from robotics to illustrate the measures. Conviviality Measures Patrice Caire, Baptiste Alcalde, Leendert van der Torre, Chattrakul Sombattheera

Session A7 – Argumentation and Negotiation

B74

We present a dialogue system that allows agents to exchange arguments in order to come to an agreement on how to act. When selecting arguments to assert, an agent uses a model of what is important to the recipient agent. The system lets the agents agree to an action that each finds acceptable, but does not necessarily demand that they resolve their differing preferences. We present an analysis of the behaviour of our system and develop a mechanism with which an agent can develop a model of another's preferences. Choosing Persuasive Arguments for Action Elizabeth Black, Katie Atkinson

G70

What do I need to say to convince you to do something? This is an important question for an autonomous agent deciding whom to approach for a resource or for an action to be done. Were similar requests granted from similar agents in similar circumstances? What arguments were most persuasive? What are the costs involved in putting certain arguments forward? In this paper we present an agent decision-making mechanism where models of other agents are refined through evidence from past dialogues, and where these models are used to guide future argumentation strategy. We empirically evaluate our approach to demonstrate that decision-theoretic and machine learning techniques can both significantly improve the cumulative utility of dialogical outcomes, and help to reduce communication overhead. Argumentation Strategies for Plan Resourcing Chukwuemeka D. Emele, Timothy J. Norman, Simon Parsons

G71

The main goal of a persuasion dialogue is to persuade, but agents may have a number of additional goals concerning the dialogue duration, how much and what information is shared or how aggressive the agent is. Several criteria have been proposed in the literature covering different aspects of what may matter to an agent, but it is not clear how to combine these criteria that are often incommensurable and partial. This paper is inspired by multi-attribute decision theory and considers argument selection as decision-making where multiple criteria matter. A meta-level argumentation system is proposed to argue about what argument an agent should select in a given persuasion dialogue. The criteria and sub-criteria that matter to an agent are structured hierarchically into a value tree and meta-level argument schemes are formalized that use a value tree to justify what argument the agent should select. In this way, incommensurable and partial criteria can be combined. Multi-Criteria Argument Selection In Persuasion Dialogues Tom L. van der Weide, Frank Dignum, John-Jules Ch. Meyer, H. Prakken, Gerard A.W. Vreeswijk

R77

An agent-based negotiation team is a group of two or more agents with their own and possibly conflicting preferences who join together as a single negotiating party because they share a common goal which is related to the negotiation. Scenarios involving negotiation teams require coordination among party members in order to reach a good agreement for all of the party members. An intra-team strategy defines what decisions are taken by the negotiation team and when and how these decisions are taken. Thus, they are tightly linked with the results obtained by the team in a negotiation process. Environmental conditions affect the performance of the different intra-team strategies in different ways. Thus, team members need to analyze their environment in order to select the most appropriate strategy according to the current conditions. In this paper, we analyze how environmental conditions affect different intra-team strategies in order to provide teams with the knowledge necessary to select the proper intra-team strategy. Analyzing Intra-Team Strategies for Agent-Based Negotiation Teams Víctor Sánchez-Anguix, Vicente Julián, Vicente Botti, Ana García-Fornes

R78

There is now considerable evidence in social psychology, economics, and related disciplines that emotion plays an important role in negotiation. For example, humans make greater concessions in negotiation to an opposing human who expresses anger, and they make fewer concessions to an opponent who expresses happiness, compared to a no-emotion-expression control. However, in AI, despite the wide interest in negotiation as a means to resolve differences between agents and humans, emotion has been largely ignored. This paper explores whether expression of anger or happiness by computer agents, in a multi-issue negotiation task, can produce effects that resemble effects seen in human-human negotiation. The paper presents an experiment where participants play with agents that express emotions (anger vs. happiness vs. control) through different modalities (text vs. facial displays). An important distinction in our experiment is that participants are aware that they negotiate with computer agents. The data indicate that the emotion effects observed in past work with humans also occur in agent-human negotiation, and occur independently of modality of expression. The implications of these results are discussed for the fields of automated negotiation, intelligent virtual agents and artificial intelligence. The Effect of Expression of Anger and Happiness in Computer Agents on Negotiations with Humans Celso M. de Melo, Peter Carnevale, Jonathan Gratch

Session B7 – Planning

B75

Over the past few years, attempts to scale up infinite-horizon DEC-POMDPs are mainly due to approximate algorithms, but without the theoretical guarantees of their exact counterparts. In contrast, ε-optimal methods have only theoretical significance but are not efficient in practice. In this paper, we introduce an algorithmic framework (β-PI) that exploits the scalability of the former while preserving the theoretical properties of the latter. We build upon β-PI a family of approximate algorithms that can find (provably) error-bounded solutions in reasonable time. Among this family, H-PI uses a branch-and-bound search method that computes a near-optimal solution over distributions over histories experienced by the agents. These distributions often lie near a structured, low-dimensional subspace embedded in the high-dimensional sufficient statistic. By planning only on this subspace, H-PI successfully solves all tested benchmarks, outperforming standard algorithms, both in solution time and policy quality. Toward Error-Bounded Algorithms for Infinite-Horizon DEC-POMDPs Jilles S. Dibangoye, Abdel-Illah Mouaddib, Brahim Chaib-draa

B76

The use of distributed POMDPs for cooperative teams has been severely limited by the incredibly large joint policy space that results from combining the policy spaces of the individual agents. However, much of the computational cost of exploring the entire joint policy space can be avoided by observing that in many domains important interactions between agents occur in a relatively small set of scenarios, previously defined as coordination locales (CLs). Moreover, even when numerous interactions might occur, given a set of individual policies there are relatively few actual interactions. Exploiting this observation and building on an existing model shaping algorithm, this paper presents D-TREMOR, an algorithm in which cooperative agents iteratively generate individual policies, identify and communicate possible interactions between their policies, shape their models based on this information and generate new policies. D-TREMOR has three properties that jointly distinguish it from previous DEC-POMDP work: (1) it is completely distributed; (2) it is scalable (allowing 100 agents to compute a “good” joint policy in under 6 hours) and (3) it has low communication overhead. D-TREMOR complements these traits with the following key contributions, which ensure improved scalability and solution quality: (a) techniques to ensure convergence; (b) faster approaches to detect and evaluate CLs; (c) heuristics to capture dependencies between CLs; and (d) novel shaping heuristics to aggregate effects of CLs. While the resulting policies are not globally optimal, empirical results show that agents have policies that effectively manage uncertainty and the joint policy is better than policies generated by independent solvers. Distributed Model Shaping for Scaling to Decentralized POMDPs with Hundreds of Agents Prasanna Velagapudi, Pradeep Varakantham, Katia Sycara, Paul Scerri

G72

PAC-MDP algorithms are particularly efficient in terms of the number of samples obtained from the environment which are needed by the learning agents in order to achieve a near optimal performance. These algorithms, however, execute a time-consuming planning step after each new state-action pair becomes known to the agent, that is, the pair has been sampled sufficiently many times to be considered as known by the algorithm. This fact is a serious limitation on broader applications of this kind of algorithm. This paper examines the planning problem in PAC-MDP learning. Value iteration, prioritized sweeping, and backward value iteration are investigated. Through the exploitation of the specific nature of the planning problem in the considered reinforcement learning algorithms, we show how these planning algorithms can be improved. Our extensions yield significant improvements in all evaluated algorithms, and standard value iteration in particular. The theoretical justification to all contributions is provided and all approaches are further evaluated empirically. With our extensions, we managed to solve problems of sizes which have never been approached by PAC-MDP learning in the existing literature. Efficient Planning in R-max Marek Grześ, Jesse Hoey
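To make the planning step discussed in this abstract concrete, here is a minimal sketch of standard value iteration on a small, fully known MDP. It shows only the textbook baseline with in-place updates, not the extensions proposed in the paper; the function name, the dictionary-based MDP encoding, and the default `gamma` and `tol` values are assumptions made for illustration.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Standard value iteration with in-place (Gauss-Seidel style) updates.

    transitions[s][a] is a list of (probability, next_state) pairs.
    rewards[s][a] is the immediate reward for taking action a in state s.
    Returns a dict mapping each state to its estimated optimal value.
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman optimality backup for state s.
            best = max(
                rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in outcomes)
                for a, outcomes in actions.items()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


if __name__ == "__main__":
    # Hypothetical two-state MDP: s1 pays reward 1 forever, s0 can move to s1.
    transitions = {
        "s0": {"stay": [(1.0, "s0")], "go": [(1.0, "s1")]},
        "s1": {"stay": [(1.0, "s1")]},
    }
    rewards = {"s0": {"stay": 0.0, "go": 0.0}, "s1": {"stay": 1.0}}
    print(value_iteration(transitions, rewards, gamma=0.5))
```

With a discount factor of 0.5, the value of s1 converges to 1 / (1 - 0.5) = 2 and the value of s0 to 0.5 × 2 = 1.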

G73

This contribution proposes a model for argumentation-based multi-agent planning, with a focus on cooperative scenarios. It consists of a multi-agent extension of DeLP-POP, partial order planning on top of argumentation-based defeasible logic programming. In DeLP-POP, actions and arguments (combinations of rules and facts) may be used to enforce some goal, if their conditions (are known to) apply and arguments are not defeated by other arguments applying. In a cooperative planning problem a team of agents share a set of goals but have diverse abilities and beliefs. In order to plan for these goals, agents start a stepwise dialogue consisting of exchanges of plan proposals, plus arguments against them. Since these dialogues instantiate an A* search algorithm, these agents will find a solution if some solution exists, and moreover, it will be provably optimal (according to their knowledge). Multiagent Argumentation for Cooperative Planning in DeLP-POP Pere Pardo, Sergio Pajares, Eva Onaindia, Pilar Dellunde, Lluís Godo

Session C7 – Game Theory II

R79

The Nash equilibrium is the most commonly adopted solution concept for non-cooperative interaction situations. However, it rests on the assumption of common information, which is hardly verified in many practical situations. When information is not common, the appropriate game theoretic solution concept is the self-confirming equilibrium. It requires that every agent plays the best response to her beliefs and that the beliefs are correct on the equilibrium path. We present, to the best of our knowledge, the first study on the computation of a self-confirming equilibrium for two-player extensive-form games. We provide algorithms, we analyze the computational complexity, and we experimentally evaluate the performance of our algorithms in terms of computational time. Computing a Self-Confirming Equilibrium in Two-Player Extensive-Form Games Nicola Gatti, Fabio Panozzo, Sofia Ceppi

B77

We study how a mobile defender should patrol an area to protect multiple valuable targets from being attacked by an attacker. In contrast to existing approaches, which assume stationary targets, we allow the targets to move through the area according to a priori known, deterministic movement schedules. We represent the patrol area by a graph of arbitrary topology and do not put any restrictions on the movement schedules. We assume the attacker can observe the defender and has full knowledge of the strategy the defender employs. We construct a game-theoretic formulation and seek the defender's optimal randomized strategy in a Stackelberg equilibrium of the game. We formulate the computation of the strategy as a mathematical program whose solution corresponds to an optimal time-dependent Markov policy for the defender. We also consider a simplified formulation allowing only stationary defender policies, which are generally less effective but are computationally significantly cheaper to obtain. We provide experimental evaluation examining this trade-off on a set of test problems covering various topologies of the patrol area and various movement schedules of the targets. Computing Time-Dependent Policies for Patrolling Games with Mobile Targets Branislav Bošanský, Viliam Lisý, Michal Jakob, Michal Pěchouček

G74

The fastest known algorithm for solving General Bayesian Stackelberg games with a finite set of follower (adversary) types has seen direct practical use at the LAX airport for over 3 years; and currently, an (albeit non-Bayesian) algorithm for solving these games is also being used for scheduling air marshals on limited sectors of international flights by the US Federal Air Marshals Service. These algorithms find optimal randomized security schedules to allocate limited security resources to protect targets. As we scale up to larger domains, including the full set of flights covered by the Federal Air Marshals, it is critical to develop newer algorithms that scale up significantly beyond the limits of the current state of the art of Bayesian Stackelberg solvers. In this paper, we present a novel technique based on a hierarchical decomposition and branch and bound search over the follower type space, which may be applied to different Stackelberg game solvers. We have applied this technique to different solvers, resulting in: (i) A new exact algorithm called HBGS that is orders of magnitude faster than the best known previous Bayesian solver for general Stackelberg games; (ii) A new exact algorithm called HBSA which extends the fastest known previous security game solver towards the Bayesian case; and (iii) Approximation versions of HBGS and HBSA that show significant improvements over these newer algorithms with only 12% sacrifice in the practical solution quality. Quality-bounded Solutions for Finite Bayesian Stackelberg Games: Scaling up Manish Jain, Christopher Kiekintveld, Milind Tambe

G75

Game theory is fast becoming a vital tool for reasoning about complex real-world security problems, including critical infrastructure protection. The game models for these applications are constructed using expert analysis and historical data to estimate the values of key parameters, including the preferences and capabilities of terrorists. In many cases, it would be natural to represent uncertainty over these parameters using continuous distributions (such as uniform intervals or Gaussians). However, existing solution algorithms are limited to considering a small, finite number of possible attacker types with different payoffs. We introduce a general model of infinite Bayesian Stackelberg security games that allows payoffs to be represented using continuous payoff distributions. We then develop several techniques for finding approximate solutions for this class of games, and show empirically that our methods offer dramatic improvements over the current state of the art, providing new ways to improve the robustness of security game models. Approximation Methods for Infinite Bayesian Stackelberg Games: Modeling Distributional Payoff Uncertainty Christopher Kiekintveld, Janusz Marecki, Milind Tambe

G76

Recent applications of game theory in security domains use algorithms to solve a Stackelberg model, in which one player (the leader) first commits to a mixed strategy and then the other player (the follower) observes that strategy and best-responds to it. However, in real-world applications, it is hard to determine whether the follower is actually able to observe the leader's mixed strategy before acting. In this paper, we model the uncertainty about whether the follower is able to observe the leader's strategy as part of the game (as proposed in the extended version of Yin et al. [17]). We describe an iterative algorithm for solving these games. This algorithm alternates between calling a Nash equilibrium solver and a Stackelberg solver as subroutines. We prove that the algorithm finds a solution in a finite number of steps and show empirically that it runs fast on games of reasonable size. We also discuss other properties of this methodology based on the experiments. Solving Stackelberg Games with Uncertain Observability Dmytro Korzhyk, Vincent Conitzer, Ronald Parr

Session D7 – Virtual Agents II

G77

Creating a virtual character that exhibits realistic physical behaviors requires a rich set of animations. To mimic the variety as well as the subtlety of human behavior, we may need to animate not only a wide range of behaviors but also variations of the same type of behavior influenced by the environment and the state of the character, including the emotional and physiological state. A general approach to this challenge is to gather a set of animations produced by artists or motion capture. However, this approach can be extremely costly in time and effort. In this work, we propose a model that can learn styled motion generation and an algorithm that produces new styles of motions via style interpolation. The model takes a set of styled motions as training samples, and can create new motions that are the generalization among given styles of motions. Our style interpolation algorithm can blend together motions with distinct styles, and it also helps improve the performance of previous work. We verify our algorithm using walking motions of different styles, and the experimental results show that our method is significantly better than previous work. A Style Controller for Generating Virtual Human Behaviors Chung-Cheng Chiu, Stacy Marsella

G78

In this paper, we merge speech act theory, emotion theory, and logic. We propose a modal logic that integrates the concepts of belief, goal, ideal and responsibility and that makes it possible to describe what a given agent expresses in the context of a conversation with another agent. We use the logic in order to provide a systematic analysis of expressive speech acts, that is, speech acts that are aimed at expressing a given emotion (e.g. to apologize, to thank, to reproach, etc.). The Face of Emotions: A Logical Formalization of Expressive Speech Acts Nadine Guiraud, Dominique Longin, Emiliano Lorini, Sylvie Pesty, Jérémy Rivière

B78

The objective of our current work was to create a model for agent memory retrieval of emotionally relevant episodes. We analyzed agent architectures that support memory retrieval and found that none fulfilled all of our requirements. We designed an episodic memory retrieval model consisting of two main steps: location ecphory, in which the agent's current location is matched against stored memories' associated locations; and recollective experience, in which memories that had a positive match are re-appraised. We implemented our model and used it to drive the behavior of characters in a game application. We recorded the application running and used the videos to create a non-interactive evaluation. The evaluation's results are consistent with our hypothesis that agents with memory retrieval of emotionally relevant episodes would be perceived as more believable than similar agents without it. I've Been Here Before! Location and Appraisal in Memory Retrieval Paulo F. Gomes, Carlos Martinho, Ana Paiva

B79

This paper introduces a model which connects representations of the space surrounding a virtual humanoid's body with the space it shares with several interaction partners. This work intends to support virtual humans (or humanoid robots) in near space interaction and is inspired by studies from cognitive neurosciences on the one hand and social interaction studies on the other hand. We present our work on learning the body structure of an articulated virtual human by using data from virtual touch and proprioception sensors. The results are utilized for a representation of its reaching space, the so-called peripersonal space. In interpersonal interaction involving several partners, their peripersonal spaces may overlap and establish a shared reaching space. We define it as their interaction space, where cooperation takes place and where actions to claim or release spatial areas have to be adapted, to avoid obstructions of the other's movements. Our model of interaction space is developed as an extension of Kendon's F-formation system, a foundational theory of how humans orient themselves in space when communicating. Thus, interaction space allows for analyzing the spatial arrangement (i.e., body posture and orientation) between multiple interaction partners and the extent of space they share. Peripersonal and interaction space are modeled as potential fields to control the virtual human's behavior strategy. As an example we show how the virtual human can relocate object positions toward or away from locations reachable for all partners, thus influencing the degree of cooperation in an interaction task. From Body Space to Interaction Space - Modeling Spatial Cooperation for Virtual Humans Nhung Nguyen, Ipke Wachsmuth

G79

In the study of social interaction, psychology treats the temporal correlations between interactants' behaviors as crucial: to give their partners a feeling of natural interaction, interactants, be they human, robotic or virtual, must be able to react at the appropriate time. Recent approaches consider autonomous agents as dynamical systems and the interaction as a coupling between these systems. These approaches solve the issue of time handling and make it possible to model synchronization and turn-taking as phenomena emerging from the coupling. But when complex computations are added to their architecture, such as processing of video and audio signals, delays appear within the interaction loop and disrupt this coupling. We model here a dyad of agents where processing delays are controlled. These agents, driven by oscillators, synchronize and take turns when there is no delay. We describe the methodology used to evaluate the emergence of synchrony and turn-taking. We test the oscillators' coupling properties when there is no delay: coupling occurs if the coupling strength is less than the parameter controlling the oscillators' natural period and if the ratio between the oscillators' periods is less than 1/2. We quantify the maximal delays between agents which do not disrupt the interaction: the maximal delay tolerated by agents is proportional to the natural period of the coupled system and to the strength of the coupling. These results are put in perspective with the different time constraints of human-human and human-agent interactions. Effect of Time Delays on Agents' Interaction Dynamics Ken Prepin, Catherine Pelachaud