Full Papers

Session A – Best Paper and Best Student Paper

653

  As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that shaping can greatly reduce the sample complexity required to learn a good policy, can enable lay users to teach agents the behaviors they desire, and can allow agents to learn within a Markov Decision Process (MDP) in the absence of a coded reward function. However, TAMER does not allow this human training to be combined with autonomous learning based on such a coded reward function. This paper leverages the fast learning exhibited within the TAMER framework to hasten a reinforcement learning (RL) algorithm's climb up the learning curve, effectively demonstrating that human reinforcement and MDP reward can be used in conjunction with one another by an autonomous agent. We tested eight plausible TAMER+RL methods for combining a previously learned human reinforcement function, $\hat{H}$, with MDP reward in a reinforcement learning algorithm. This paper identifies which of these methods are most effective and analyzes their strengths and weaknesses. Results from these TAMER+RL algorithms indicate better final performance and better cumulative performance than either a TAMER agent or an RL agent alone.
Combining Manual Feedback with Subsequent MDP Reward Signals for Reinforcement Learning
W. Bradley Knox, Peter Stone
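One natural way to combine a learned $\hat{H}$ with MDP reward, in the spirit of the combination methods the paper tests, is reward shaping. A minimal tabular Q-learning sketch under that assumption; the `env` interface (`reset()`, `step(a)`, `actions`), the weight `w`, and all names here are illustrative, not the paper's API:

```python
import random
from collections import defaultdict

def shaped_q_learning(env, H_hat, w=1.0, alpha=0.1, gamma=0.99,
                      epsilon=0.1, episodes=500):
    """Q-learning on MDP reward augmented with a previously learned human
    reinforcement function H_hat(s, a), i.e. R'(s, a) = R(s, a) + w * H_hat(s, a)
    (one plausible TAMER+RL combination scheme)."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a_: Q[(s, a_)])
            s2, r, done = env.step(a)
            # shape the MDP reward with the human model
            r_shaped = r + w * H_hat(s, a)
            target = r_shaped if done else r_shaped + gamma * max(
                Q[(s2, a_)] for a_ in env.actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

Annealing `w` toward zero over training is one way such a scheme can hand control back to the MDP reward as the agent improves.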

697

  Large heterogeneous teams will often be in situations where sensor data that is uncertain and conflicting is shared across a peer-to-peer network. Not every team member will have direct access to sensors, and team members will be influenced mostly by teammates with whom they communicate directly. In this paper, we investigate the dynamics and emergent behaviors of a large team sharing beliefs to reach conclusions about the world. We find empirically that the dynamics of information propagation in such belief sharing systems are characterized by information avalanches of belief changes caused by a single additional sensor reading. The distribution of the size of these avalanches dictates the speed and accuracy with which the team reaches conclusions. A key property of the system is that it exhibits qualitatively different dynamics and system performance over small changes in system parameter ranges. In one particular range, the system exhibits behavior known as scale-invariant dynamics, which we empirically find to correspond to dramatically more accurate conclusions being reached by team members. Because the dynamics are very sensitive to configuration details, the parameter ranges over which specific system dynamics occur are extremely difficult to predict precisely. In this paper we (a) develop techniques to mathematically characterize the dynamics of team belief propagation, (b) obtain through simulations the relation between the dynamics and overall system performance, and (c) develop a novel distributed algorithm that the agents in the team use locally to steer the whole team to areas of optimized performance.
Exploiting Scale Invariant Dynamics for Efficient Information Propagation in Teams
Robin Glinton, Paul Scerri, Katia Sycara

Session B – Best Paper and Best Student Paper

283

  The use of energy storage devices in homes has been advocated as one of the main ways of saving energy and reducing the reliance on fossil fuels in the future Smart Grid. However, if micro-storage devices are all charged at the same time using power from the electricity grid, the result is higher demand and, hence, more generation capacity, more carbon emissions, and, in the worst case, a breakdown of the system due to over-demand. To alleviate such issues, in this paper, we present a novel agent-based micro-storage management technique that allows all (individually-owned) storage devices in the system to converge to profitable, efficient behaviour. Specifically, we provide a general framework within which to analyse the Nash equilibrium of an electricity grid and devise new agent-based storage learning strategies that adapt to market conditions. Taken altogether, our solution shows that, specifically in the UK electricity market, a consumer with a 4 kWh storage device can achieve average savings of up to 13\% on their electricity bill. Moreover, we show that there exists an equilibrium where only 38\% of UK households would own storage devices and where social welfare would also be maximised (with overall annual savings of nearly GBP 1.5B at that equilibrium).
Agent-based Micro-Storage Management for the Smart Grid
Perukrishnen Vytelingum, Thomas D. Voice, Sarvapali D. Ramchurn, Alex Rogers, Nicholas R. Jennings

Session 1 – Virtual Agents I

Session 2 – Coordination and Cooperation I

227

  We consider the issue of representing coalitional games in multi-agent systems that exhibit externalities from coalition formation, i.e., systems in which the gain from forming a coalition may be affected by the formation of other co-existing coalitions. Although externalities play a key role in many real-life situations, very little attention has been given to this issue in the multi-agent system literature, especially with regard to the computational aspects involved. To this end, we propose a new representation which, in the spirit of Ieong and Shoham, is based on Boolean expressions. The idea behind our representation is to construct much richer expressions that allow for capturing externalities induced upon coalitions. We show that the new representation is fully expressive, at least as concise as the conventional partition function game representation and, for many games, exponentially more concise. We evaluate the efficiency of our new representation by considering the problem of computing the Extended and Generalized Shapley value, a powerful extension of the conventional Shapley value to games with externalities. We show that by using our new representation, the Extended and Generalized Shapley value, which has not been studied in the computer science literature to date, can be computed in time linear in the size of the input.
A Logic-Based Representation for Coalitional Games with Externalities
Tomasz Michalak, Dorota Marciniak, Marcin Szamotulski, Talal Rahwan, Michael Wooldridge, Peter McBurney, Nicholas R. Jennings

258

  Distributed Constraint Optimization (DCOP) is a popular framework for cooperative multi-agent decision making. DCOP is NP-hard, so an important line of work focuses on developing fast incomplete solution algorithms for large-scale applications. One of the few incomplete algorithms to provide bounds on solution quality is k-size optimality, which defines a local optimality criterion based on the size of the group of deviating agents. Unfortunately, the lack of a general-purpose algorithm and the commitment to forming groups based solely on group size have limited the use of k-size optimality. This paper introduces t-distance optimality, which departs from k-size optimality by using graph distance as an alternative criterion for selecting groups of deviating agents. This throws open a new research direction into the tradeoffs between different group selection and coordination mechanisms for incomplete DCOP algorithms. We derive theoretical quality bounds for t-distance optimality that improve known bounds for k-size optimality. In addition, we develop a new efficient asynchronous local search algorithm for finding both k-size and t-distance optimal solutions, allowing these concepts to be deployed in real applications. Indeed, empirical results show that this algorithm significantly outperforms the only existing algorithm for finding general k-size optimal solutions, which is also synchronous. Finally, we compare the algorithmic performance of k-size and t-distance optimality using this algorithm. We find that t-distance consistently converges to higher-quality solutions in the long run, but results are mixed on convergence speed; we identify cases where each criterion converges faster.
Asynchronous Algorithms for Approximate Distributed Constraint Optimization with Quality Bounds
Christopher Kiekintveld, Zhengyu Yin, Atul Kumar, Milind Tambe
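To illustrate the group-selection criterion only (this is not the paper's local search algorithm): a candidate t-distance group is the ball of radius t around a central agent in the constraint graph, which breadth-first search computes directly. A sketch, assuming an adjacency-set representation; all names are ours:

```python
from collections import deque

def t_distance_group(graph, center, t):
    """Return all agents within graph distance t of `center`, i.e. a
    candidate deviating group under t-distance optimality.
    `graph` maps each agent to the set of its neighbours."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == t:
            continue  # do not expand past distance t
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

# e.g. t_distance_group({'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'}}, 'a', 1)
# returns {'a', 'b'}
```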

Session 3 – Game Theory I

742

  We consider the computational complexity of pure Nash equilibria in graphical games. It is known that the problem is NP-complete in general, but tractable (i.e., in P) for special classes of graphs such as those with bounded treewidth. It is then natural to ask: is it possible to characterize all tractable classes of graphs for this problem? In this work, we provide such a characterization for the case of bounded in-degree graphs, thereby resolving the gap between existing hardness and tractability results. In particular, we analyze the complexity of PUREGG(C, —), the problem of deciding the existence of pure Nash equilibria in graphical games whose underlying graphs are restricted to class C. We prove that, under reasonable complexity-theoretic assumptions, for every recursively enumerable class C of directed graphs with bounded in-degree, PUREGG(C, —) is in polynomial time if and only if the reduced graphs (the graphs resulting from iterated removal of sinks) of C have bounded treewidth. We also give a characterization for PURECHG(C, —), the problem of deciding the existence of pure Nash equilibria in colored hypergraphical games, a game representation that can express the additional structure that some of the players have identical local utility functions. We show that the tractable classes of bounded-arity colored hypergraphical games are precisely those whose reduced graphs have bounded treewidth modulo homomorphic equivalence. Our proofs make novel use of Grohe's characterization of the complexity of homomorphism problems.
Pure Nash Equilibria: Complete Characterization of Hard and Easy Graphical Games
Albert Xin Jiang, MohammadAli Safari
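For concreteness, the reduced graph used in the characterization is obtained by repeatedly deleting sinks (vertices with no outgoing edges) until none remain; note that removing one round of sinks can create new ones. A small sketch of that preprocessing step, assuming a successor-set representation:

```python
def reduced_graph(succ):
    """Iteratively remove sinks from a directed graph.
    `succ` maps each vertex to the set of its successors."""
    succ = {v: set(ns) for v, ns in succ.items()}  # defensive copy
    while True:
        sinks = {v for v, ns in succ.items() if not ns}
        if not sinks:
            return succ
        # drop the sinks and all edges pointing into them
        succ = {v: ns - sinks for v, ns in succ.items() if v not in sinks}
```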

Session 4 – Trust

Session 5 – Agent Reasoning I

436

  One of the most challenging aspects of reasoning, planning, and acting in a multi-agent domain is reasoning about what the agents know about the knowledge of their fellows, and taking it into account when planning and acting. In the past this has been done using modal and dynamic epistemic logics. In this paper we explore the use of answer set programming (ASP) and reasoning-about-actions techniques for this purpose. These approaches present a number of theoretical and practical advantages. From the theoretical perspective, ASP's property of non-monotonicity (and several other features) allows us to express causality in an elegant fashion. From the practical perspective, recent implementations of ASP solvers have become very efficient, outperforming several other systems in recent SAT competitions. Finally, the use of ASP and reasoning-about-actions techniques allows for the adaptation of a large body of research developed for single-agent domains to multi-agent domains. We begin our discussion by showing how ASP can be used to find Kripke models of a modal theory. We then illustrate how both the muddy children and the sum-and-product problems can be represented and solved using these concepts. We describe and implement a new kind of action, which we call "ask-and-truthfully-answer," and show how this action brings forth a new dimension to the muddy children problem.
Using Answer Set Programming to model multi-agent scenarios involving agents' knowledge about other's knowledge
Chitta Baral, Gregory Gelfond, Tran Cao Son, Enrico Pontelli

Session 6 – Learning I

Session 7 – Social Choice

104

  A voting rule is an algorithm for determining the winner in an election, and several approaches have been used to justify proposed rules. One justification is to show that a rule satisfies a set of desirable axioms that uniquely identify it. Another is to show that the calculation that it performs is actually maximum likelihood estimation relative to a certain model of noise that affects voters (the MLE approach). The third approach, which has recently been actively investigated, is the so-called {\em distance rationalizability} framework. In it, a voting rule is defined via a class of consensus elections (i.e., a class of elections that have a clear winner) and a distance function. A candidate $c$ is a winner of an election $E$ if $c$ wins in one of the consensus elections that are closest to $E$ relative to the given distance. In this paper, we show that essentially any voting rule is distance-rationalizable if we do not restrict the two ingredients of the rule: the consensus class and the distance. Thus distance rationalizability of a rule does not by itself guarantee that the voting rule has any desirable properties. However, we demonstrate that the distance used to rationalize a given rule may provide useful information about this rule's behavior. Specifically, we identify a large class of distances, which we call {\em votewise} distances, and show that if a rule is rationalized via a distance from this class, many important properties of this rule can be easily expressed in terms of the underlying distance. This enables us to provide a new characterization of scoring rules and to establish a connection with the MLE framework. We also give bounds on the complexity of the winner determination problem for distance-rationalizable rules.
On the Role of Distances in Defining Voting Rules
Edith Elkind, Piotr Faliszewski, Arkadii Slinko
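As a concrete instance of the framework, Plurality is known to be distance-rationalizable via the consensus class in which all voters rank the same candidate first, together with the votewise extension of the discrete distance (the number of voters whose ballots must change). A toy sketch under that reading; function and variable names are ours:

```python
def plurality_via_distance(election, candidates):
    """Distance-rationalization of Plurality: the consensus class is
    'every voter ranks c first', and the distance to it is the number
    of ballots that must change (votewise discrete distance).
    `election` is a list of rankings (tuples of candidates, best first)."""
    def dist_to_consensus(c):
        # ballots already ranking c first need not change
        return sum(1 for ballot in election if ballot[0] != c)
    return min(candidates, key=dist_to_consensus)

votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(plurality_via_distance(votes, {"a", "b", "c"}))  # -> 'a'
```

Minimizing the number of changed ballots is exactly maximizing the number of first-place votes, which is why this pair of ingredients recovers Plurality.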

Session 8 – Agreement Technologies

119

  A number of recent research lines address complex negotiations in highly rugged utility spaces. However, most of them focus on overcoming the problems imposed by the complexity of the scenario, without analyzing the strategic behavior of the agents in the models they propose. Analyzing the dynamics of the negotiation process when agents with different strategies interact is necessary to apply these models to real, competitive environments, where agents cannot be assumed to behave in the same way. Especially problematic are situations like the well-known prisoner's dilemma or, more generally, situations with a high price of anarchy. These situations imply that individual rationality drives the agents towards strategies which yield low individual and social welfare. In highly rugged scenarios, such situations usually make agents fail to reach an agreement, and therefore negotiation mechanisms should be designed to avoid them. This paper performs a strategy analysis of an auction-based negotiation model designed for highly rugged scenarios, revealing that the approach is prone to the prisoner's dilemma. In addition, a set of techniques to solve this problem is proposed, and an experimental evaluation is performed to validate the adequacy of the proposed approaches to improve the strategic stability of the negotiation process.
Avoiding the Prisoner's Dilemma in Auction-based Negotiations for Highly Rugged Utility Spaces
Ivan Marsa-Maestre, Miguel A. Lopez-Carmona, Juan R. Velasco, Enrique de la Hoz

Session 9 – KR/AAMAS Joint Session I

Session 10 – KR/AAMAS Joint Session II

Session 11 – Simulation

103

  Experimental analysis of networks of cooperative learning agents (to verify certain properties such as the system's stability) has been commonly used due to the complexity of theoretical analysis in such cases. Because of the large number of parameters to analyze, researchers have used metrics that summarize the system in few parameters. Since in cooperative systems the ultimate goal is to optimize some global metric, researchers typically analyzed the evolution of the global performance metric over time to verify system properties. For example, if the global metric improves and eventually stabilizes, this is considered a reasonable verification of the system's stability. The global performance metric, however, overlooks an important aspect of the system: the network structure. We show an experimental case study where the convergence of the global performance metric is deceiving, hiding an underlying instability in the system that later leads to a significant drop in performance. To expose such instability, we propose the use of the graph analysis methodology, where the network structure is summarized using network measures. We develop a new network measure that summarizes an agent's interaction with its neighbors and takes the disparity of these interactions into account. The new measure is applied to our case study, clearly exposing the instability that was previously hidden by the global performance metric.
Using Graph Analysis to Study Networks of Adaptive Agents
Sherief Abdallah

Session 12 – Robotics I

333

  The problem of multi-robot patrol in adversarial environments has been gaining considerable interest during recent years. In this problem, a team of mobile robots is required to repeatedly visit some target area in order to detect penetrations that are controlled by an adversary. Little has been written so far on the nature of the penetration event, and it is commonly assumed that the goal of the robots is to detect the penetration at any time during its occurrence. In this paper we offer a new definition of an event, tied to a utility function such that detection of the event by the robots at different stages of its occurrence grants the robots a different reward. The goal of the robots is, then, to maximize their utility from detecting the event. We provide three different models of events, for which we describe algorithms for calculating the expected utility from detecting the event, and we discuss how the model influences the optimality of the patrol algorithm. In the first and basic model, we assume that there exists a reward function such that detecting an event at different times grants the robots an associated reward. In the second model, the event might evolve during its occurrence, and this progression corresponds to both different rewards and a growing probability of detection. Finally, we consider a general model, in which the event can be detected from a distance, where the probability of detection depends both on the distance from the robot and on the current state of the event. Last, we discuss how the new event models presented in this paper set grounds for handling the problem of patrol in heterogeneous environments, where parts of the perimeter could be more sensitive to the occurrence of events.
On Events in Multi-Robot Patrol in Adversarial Environments
Noa Agmon
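In the basic model, for instance, the quantity the robots maximize can be written (in our notation, which need not match the paper's) as $\mathbb{E}[U] = \sum_{t} p(t)\, r(t)$, where $p(t)$ is the probability that the patrol first detects the event at stage $t$ of its occurrence and $r(t)$ is the reward for detection at that stage; the second and third models can be read as refining $p(t)$ with event evolution and distance-dependent detection.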

Session 13 – Economic Paradigms I

112

  In the strategyproof classification setting, a set of labeled examples is partitioned among multiple agents. Given the reported labels, an optimal classification mechanism returns a classifier that minimizes the number of mislabeled examples. However, each agent is interested in the accuracy of the returned classifier on its own examples, and may misreport its labels in order to achieve a better classifier, thus contaminating the dataset. The goal is to design strategyproof mechanisms that correctly label as many examples as possible. Previous work has investigated the foregoing setting under limiting assumptions, or with respect to very restricted classes of classifiers. In this paper, we study the strategyproof classification setting with respect to prominent classes of classifiers (Boolean conjunctions and linear separators) and without any assumptions on the input. On the negative side, we show that strategyproof mechanisms cannot achieve a constant approximation ratio, by showing that such mechanisms must be dictatorial on a subdomain, in the sense that the outcome is selected according to the preferences of a single agent. On the positive side, we present a randomized mechanism, Iterative Random Dictator, and demonstrate both that it is strategyproof and that its approximation ratio does not increase with the number of agents. Interestingly, the notion of dictatorship is prominently featured in all our results, helping to establish both upper and lower bounds.
On the Limits of Dictatorial Classification
Reshef Meir, Ariel Procaccia, Jeffrey Rosenschein

224

  We explore settings where a principal must make a decision about which action to take to achieve a desired outcome. The principal elicits the probability of achieving the outcome by following each action from a self-interested (but decision-agnostic) expert. We prove results about the relation between the principal's decision rule and the scoring rules that determine the expert's payoffs. For the most natural decision rule (where the principal takes the action with the highest success probability), we prove that no symmetric scoring rule, nor any of Winkler's asymmetric scoring rules, has desirable incentive properties. We characterize the set of differentiable scoring rules with desirable incentive properties and construct one. We then consider decision markets, where the role of a single expert is replaced by multiple agents that interact by trading with an automated market maker. We prove a surprising impossibility for this setting: an agent can always benefit from exaggerating the success probability of a suboptimal action. To mitigate this, we construct automated market makers that minimize manipulability. Finally, we consider two alternative decision market designs. We prove that the first, in which all outcomes live in the same probability universe, has poor incentive properties. The second, in which the experts trade in the probability of the outcome occurring unconditionally, exhibits a new kind of no-trade phenomenon.
Decision Rules and Decision Markets
Abraham Othman, Tuomas Sandholm

308

  This paper analyzes the worst-case efficiency ratio of false-name-proof combinatorial auction mechanisms. False-name-proofness generalizes strategy-proofness by assuming that a bidder can submit multiple bids under fictitious identifiers. Even the well-known Vickrey-Clarke-Groves mechanism is not false-name-proof. It has previously been shown that there is no false-name-proof mechanism that always achieves a Pareto-efficient allocation. Hence, if false-name bids are possible, we need to sacrifice efficiency to some extent. This leaves the natural question of how much surplus must be sacrificed. To answer this question, this paper focuses on worst-case analysis. Specifically, we consider the fraction of the Pareto-efficient surplus that we obtain, and try to maximize this fraction in the worst case, under the constraint of false-name-proofness. As far as we are aware, this is the first attempt to examine the worst-case efficiency of false-name-proof mechanisms. We show that the worst-case efficiency ratio of any false-name-proof mechanism that satisfies some apparently minor assumptions is at most $2/(m+1)$ for auctions with $m$ different goods. We also observe that the worst-case efficiency ratio of existing false-name-proof mechanisms is generally $1/m$ or 0. Finally, we propose a novel mechanism, called the adaptive reserve price mechanism, which is false-name-proof when all bidders are single-minded, and its worst-case efficiency ratio is $2/(m+1)$, i.e., optimal.
Worst-case efficiency ratio in false-name-proof combinatorial auction mechanisms
Atsushi Iwasaki, Vincent Conitzer, Yoshifusa Omori, Yuko Sakurai, Taiki Todo, Mingyu Guo, Makoto Yokoo

440

  In this paper, we study a service procurement problem with uncertainty as to whether service providers are capable of completing a given task within a specified deadline. This type of setting is often encountered in large and dynamic multi-agent systems, such as computational Grids or clouds. To effectively deal with this uncertainty, the consumer may dynamically and redundantly procure multiple services over time, in order to increase the probability of success, while at the same time balancing this with the additional procurement costs. However, in order to do this optimally, the consumer requires information about the providers' costs and their success probabilities over time. This information is typically held privately by the providers, and they may have incentives to misreport it, so as to increase their own profits. To address this problem, we introduce a novel mechanism that incentivises self-interested providers to reveal their true costs and capabilities, and we show that this mechanism is ex-post incentive compatible, efficient and individually rational. However, for these properties to hold, it generally needs to compute the optimal solution, which can be intractable in large settings. Therefore, we show how we can generate approximate solutions while maintaining the economic properties of the mechanism. This approximation admits a polynomial-time solution that can be computed in seconds even for hundreds of providers, and we demonstrate empirically that it performs as well as the optimal in typical scenarios. In particularly challenging settings, we show that it still achieves 97\% or more of the optimal.
Scalable Mechanism Design for the Procurement of Services with Uncertain Durations
Enrico Gerding, Sebastian Stein, Kate Larson, Alex Rogers, Nicholas R. Jennings

Session 14 – Verification

Session 15 – Learning II

421

  This paper describes an algorithm, called CQ-learning, which learns to adapt the state representation for multi-agent systems in order to coordinate with other agents. We propose a multi-level approach which builds a progressively more advanced representation of the learning problem. The idea is that agents start with a minimal single-agent state space representation, which is expanded only when necessary. In cases where agents detect conflicts, they automatically expand their state to explicitly take into account the other agents. These conflict situations are then analyzed in an attempt to find an abstract representation which generalises over the problem states. Our system allows agents to learn effective policies, while avoiding the exponential state space growth typical in multi-agent environments. Furthermore, the method we introduce to generalise over conflict states allows knowledge to be transferred to unseen and possibly more complex situations. Our research departs from previous efforts in this area of multi-agent learning because our agents combine state space generalisation with an agent-centric point of view. The algorithms that we introduce can be used in robotic systems to automatically reduce the sensor information to what is essential to solve the problem at hand. This is a must when dealing with multiple agents, since learning in such environments is a cumbersome task due to the massive amount of information, much of which may be irrelevant. In our experiments we demonstrate the approach in simulations of such environments using various gridworlds.
Learning multi-agent state space representations
Yann-Michaël De Hauwere, Peter Vrancx, Ann Nowe
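A hedged sketch of the conflict-detection idea only: compare per-state reward statistics collected in the multi-agent setting against a single-agent baseline, and mark for expansion the states where they diverge. The actual statistical test and thresholds used by CQ-learning may differ; this threshold rule and all names are our simplification:

```python
import statistics

def detect_conflict_states(baseline_rewards, observed_rewards, threshold=2.0):
    """Flag states whose observed multi-agent rewards deviate from the
    single-agent baseline. Both arguments map state -> list of sampled
    rewards; flagged states are candidates for state-space expansion."""
    conflicts = set()
    for s, samples in observed_rewards.items():
        base = baseline_rewards.get(s, [])
        if len(samples) < 2 or len(base) < 2:
            continue  # not enough evidence yet
        gap = abs(statistics.mean(samples) - statistics.mean(base))
        spread = statistics.stdev(base) or 1e-9
        if gap > threshold * spread:
            conflicts.add(s)  # expand this state to include other agents
    return conflicts
```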

630

  A major challenge for traditional approaches to multiagent learning is to train teams that easily scale to include additional agents. The problem is that such approaches typically encode each agent's policy separately. Such separation means that computational complexity explodes as the number of agents in the team increases, and also leads to the problem of reinvention: skills that should be shared among agents must be rediscovered separately for each agent. To address this problem, this paper presents an alternative evolutionary approach to multiagent learning called multiagent HyperNEAT that encodes the team as a pattern of related policies rather than as a set of individual agents. To capture this pattern, a policy geometry is introduced to describe the relationship between each agent's policy and its canonical geometric position within the team. Because policy geometry can encode variations of a shared skill across all of the policies it represents, the problem of reinvention is avoided. Furthermore, because the policy geometry of a particular team can be sampled at any resolution, it acts as a heuristic for generating policies for teams of any size, producing a powerful new capability for multiagent learning. In this paper, multiagent HyperNEAT is tested in predator-prey and room-clearing domains. In both domains the results are effective teams that can be successfully scaled to larger team sizes without any further training.
Evolving Policy Geometry for Scalable Multiagent Learning
David D'Ambrosio, Joel Lehman, Sebastian Risi, Kenneth Stanley

Session 16 – Coordination and Cooperation II

312

  In this paper, we extend the traditional formalization of Distributed Constraint Satisfaction Problems (DisCSPs) to Quantified DisCSPs. A Quantified DisCSP includes several universally quantified variables, while all of the variables in a traditional DisCSP are existentially quantified. A universally quantified variable represents a choice of nature or an adversary. A Quantified DisCSP formalizes a situation where a team of agents is trying to make a plan that is robust against nature or an adversary. In this paper, we present the formalization of such a Quantified DisCSP and develop an algorithm for solving it. This algorithm generalizes the asynchronous backtracking algorithm used for solving a DisCSP. In this algorithm, agents communicate a value assignment called a good in addition to the nogood used in asynchronous backtracking. Interestingly, the procedures executed by an adversarial/cooperative agent for goods/nogoods are totally symmetrical. Furthermore, we develop a method for improving this basic algorithm. Experimental evaluation shows an easy-hard-easy transition as the tightness of constraints changes, while very loose problem instances are relatively hard. Also, the modification of the basic algorithm is effective, achieving about a 25\% reduction in cycles for the hardest problem instances.
Cooperative Problem Solving against Adversary: Quantified Distributed Constraint Satisfaction Problem
Satomi Baba, Atsushi Iwasaki, Makoto Yokoo, Marius Silaghi, Katsutoshi Hirayama, Toshihiro Matsui

Session 17 – Agent Societies

Session 18 – Economic Paradigms II

Session 19 – Robotics II

455

  Several researchers, present authors included, envision personal mobile robot agents that can assist humans in their daily tasks. Despite many advances in robotics, such mobile robot agents still face many limitations in their perception, cognition, and action capabilities. In this work, we propose a symbiotic interaction between robot agents and humans to overcome the robot limitations while allowing robots to also help humans. We introduce a visitor's companion robot agent as a natural task for such symbiotic interaction: the visitor lacks knowledge of the environment but can easily open a door or read a door label, while the mobile robot with no arms cannot open a door and may be confused about its exact location, but can plan paths well through the building and can provide useful relevant information to the visitor. We present this visitor companion task in detail with an enumeration and formalization of the actions of the robot agent in its interaction with the human. We briefly describe the wifi-based robot localization algorithm and show results for the different levels of human help to the robot during navigation. We then model the tradeoffs of the value of the robot's help to the human and present illustrative experiments. Our work has been fully implemented in a mobile robot agent, CoBot, which has successfully navigated for several hours and continues to navigate in our indoor environment.
An Effective Personal Mobile Robot Agent Through Symbiotic Human-Robot Interaction
Stephanie Rosenthal, Joydeep Biswas, Manuela Veloso

Session 20 – Agent-Based System Development

Session 21 – Distributed Problem Solving

469

  In this paper, we propose the Quantified Distributed Constraint Optimization problem (QDCOP), which extends the framework of Distributed Constraint Optimization problems (DCOPs). DCOPs have been studied as a fundamental model of multi-agent cooperation. In traditional DCOPs, all agents cooperate to optimize the sum of their cost functions. However, in practical systems some agents may desire to select the value of their variables without cooperation. In special cases, such agents may take the values with the worst impact on the quality of the result reachable by the optimization process. We apply existential/universal quantifiers to distinct uncooperative variables. A universally quantified variable is left unassigned by the optimization, as the result has to hold when it takes any value from its domain, while an existentially quantified variable takes exactly one of its values for each context. Similar classes of problems have recently been studied as (Distributed) Quantified Constraint Problems, where the variables of the CSP have quantifiers and all constraints should be satisfied independently of the values taken by universal variables. The QDCOP we propose applies the concept of game-tree search to DCOPs. If the original problem is a minimization problem, agents that own universally quantified variables may intend to maximize the cost value in the worst case, while the other agents intend to minimize it as usual. Therefore, only bounds, especially upper bounds, on the optimal value are guaranteed. The purpose of the new class of problems is to compute such bounds, as well as to compute sub-optimal solutions. For the QDCOP, we also propose several methods that are based on min-max/alpha-beta and ADOPT algorithms.
A Quantified Distributed Constraint Optimization Problem
Toshihiro Matsui, Marius Silaghi, Katsutoshi Hirayama, Makoto Yokoo, Hiroshi Matsuo, Satomi Baba
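To illustrate the quantified semantics being computed (not the distributed algorithms themselves, which are based on min-max/alpha-beta and ADOPT): a centralized game-tree evaluation in which universal variables maximize and existential variables minimize the total cost. All names are ours; `cost` evaluates a complete assignment:

```python
def qdcop_value(variables, domains, quantifier, cost, assignment=None):
    """Game-tree evaluation of a quantified DCOP: existential variables
    minimize the total cost, universal (adversarial) variables maximize it.
    `quantifier[v]` is 'exists' or 'forall'; `cost` maps a complete
    assignment (dict) to a number; variables are processed in order."""
    assignment = assignment or {}
    if len(assignment) == len(variables):
        return cost(assignment)
    v = variables[len(assignment)]
    best = None
    for value in domains[v]:
        child = qdcop_value(variables, domains, quantifier, cost,
                            {**assignment, v: value})
        if best is None:
            best = child
        elif quantifier[v] == 'forall':
            best = max(best, child)  # adversary picks the worst case
        else:
            best = min(best, child)  # cooperative agent optimizes
    return best
```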

606

  Recent studies have investigated how a team of mobile sensors can cope with real-world constraints, such as uncertainty in the reward functions, dynamically appearing and disappearing targets, technology failures and changes in the environment conditions. In this study we consider an additional element, deception by an adversary, which is relevant in many (military) applications. The adversary is expected to use deception to prevent the sensor team from performing its tasks. We employ a game-theoretic model to analyze the expected strategy of the adversary and find the best response. More specifically, we consider that the adversary deceptively changes the importance that agents give to targets in the area. The opponent is expected to use camouflage in order to create confusion among the sensors regarding the importance of targets, and reduce the team's efficiency in target coverage. We represent a Mobile Sensor Team problem using the Distributed Constraint Optimization Problem (DCOP) framework. We propose an optimal method for the selection of a position of a single agent facing a deceptive adversary. This method serves as a heuristic for agents to select their positions in a full-scale problem with multiple agents in a large area. Our empirical study demonstrates the success of our model as compared with existing models in the presence of deception.
Deception in Networks of Mobile Sensing Agents
Viliam Lisý, Roie Zivan, Katia Sycara, Michal Péchouček

189

  Many applications in multiagent learning are essentially convex optimization problems in which agents have only limited communication and partial information about the function being minimized (examples of such applications include, among others, coordinated sensor localization, distributed adaptive filtering, control, and coordination). Given this observation, we propose a new non-hierarchical decentralized algorithm for the asymptotic minimization of possibly time-varying convex functions. In our method each agent has knowledge of a time-varying local cost function, and the objective is to minimize asymptotically a global cost function defined by the sum of the local functions. At each iteration of our algorithm, agents improve their estimates of a minimizer of the global function by applying a particular version of the adaptive projected subgradient method to their local functions. Then the agents exchange and mix their improved estimates using a probabilistic model based on recent results in weighted average consensus algorithms. The resulting algorithm is provably optimal and reproduces as particular cases many existing algorithms (such as consensus algorithms and recent methods based on the adaptive projected subgradient method). To illustrate one possible application, we show how our algorithm can be applied to coordinated acoustic source localization in sensor networks.
Distributed Multiagent Learning with a Broadcast Adaptive Subgradient Method
Renato Cavalcante, Alex Rogers, Nicholas R. Jennings, Isao Yamada
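A minimal NumPy sketch of one round of this general scheme: each agent takes a projected subgradient step on its local cost, then estimates are mixed by a row-stochastic consensus weight matrix `W`. The fixed step size, projection, and weights here are placeholders rather than the paper's adaptive choices:

```python
import numpy as np

def consensus_subgradient_step(x, subgrads, project, W, step=0.1):
    """One round: local projected subgradient step, then consensus mixing.
    x: (n_agents, dim) array of current estimates.
    subgrads[i](x_i): a subgradient of agent i's local cost at x_i.
    project: projection onto the common feasible set.
    W: (n_agents, n_agents) row-stochastic mixing matrix."""
    n = x.shape[0]
    local = np.stack([project(x[i] - step * subgrads[i](x[i]))
                      for i in range(n)])
    return W @ local  # weighted-average consensus mixing
```

Iterating this map is the standard pattern by which each agent's estimate tracks a minimizer of the sum of the local costs.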

Session 22 – Agent Reasoning II

Session 23 – Game Theory II

250

  There has been significant recent interest in game-theoretic approaches to security, with much of the recent research focused on utilizing the leader-follower Stackelberg game model; for example, these games are at the heart of major applications such as the ARMOR program deployed for security at the LAX airport since 2007 and the IRIS program in use by the US Federal Air Marshals (FAMS). The foundational assumption for using Stackelberg games is that security forces (leaders), acting first, commit to a randomized strategy, while their adversaries (followers) choose their best response after surveillance of this randomized strategy. Yet, in many situations, the followers may act without observation of the leader's strategy, essentially converting the game into a simultaneous-move game model. Previous work fails to address how a leader should compute her strategy given this fundamental uncertainty about the type of game faced. Focusing on the complex games that are directly inspired by real-world security applications, the paper provides four contributions in the context of a general class of security games. First, exploiting the structure of these security games, the paper shows that the Nash equilibria in security games are interchangeable, thus alleviating the equilibrium selection problem. Second, resolving the leader's dilemma, it shows that under a natural restriction on security games, any Stackelberg strategy is also a Nash equilibrium strategy; furthermore, the solution is unique in a class of real-world security games of which ARMOR is a key exemplar. Third, when faced with a follower that can attack multiple targets, many of these properties no longer hold. Fourth, our experimental results emphasize positive properties of games that do not fit our restrictions. Our contributions have major implications for real-world applications.
Stackelberg vs. Nash in Security Games: Interchangeability, Equivalence, and Uniqueness
Zhengyu Yin, Dmytro Korzhyk, Christopher Kiekintveld, Vincent Conitzer, Milind Tambe

Session 24 – Coordination and Cooperation III

390

  In large wireless sensor networks, the problem of assigning radio frequencies to sensing agents such that no two connected sensors are assigned the same value (and will thus interfere with one another) is a major challenge. To tackle this problem, we develop a novel decentralised coordination algorithm that activates only a subset of the deployed agents, subject to the connectivity graph of this subset being provably 3-colourable in linear time, hence allowing the use of a simple decentralised graph colouring algorithm. Crucially, while doing this, our algorithm maximises the sensing coverage achieved by the selected sensing agents, which is given by an arbitrary non-decreasing submodular set function. We empirically evaluate our algorithm by benchmarking it against a centralised greedy algorithm and an optimal one, and show that the selected sensing agents manage to achieve 90\% of the coverage provided by the optimal algorithm, and 85\% of the coverage provided by activating all sensors. Moreover, we use a simple decentralised graph colouring algorithm to show that the frequency assignment problem is easy in the resulting graphs; in all considered problem instances, this algorithm managed to find a colouring in less than 5 iterations on average. We then show how the algorithm can be used in dynamic settings, in which sensors can fail or new sensors can be deployed. In this setting, our algorithm provides 250\% more coverage over time compared to activating all available sensors simultaneously.
A Decentralised Coordination Algorithm for Minimising Conflict and Maximising Coverage in Sensor Networks
Ruben Stranders, Alex Rogers, Nicholas R. Jennings

509

  The coordination of emergency responders and robots to undertake a number of tasks in disaster scenarios is a grand challenge for multi-agent systems. Central to this endeavour is the problem of forming the best teams (coalitions) of responders to perform the various tasks in the area where the disaster has struck. Moreover, these teams may have to form, disband, and reform in different areas of the disaster region. This is because in most cases there will be more tasks than agents; hence, agents need to schedule themselves to attempt each task in turn. Second, the tasks themselves can be very complex: requiring the agents to work on them for different lengths of time and having deadlines by when they need to be completed. The problem is complicated still further when different coalitions perform tasks with different levels of efficiency. Given all these facets, we define this as the Coalition Formation with Spatial and Temporal constraints Problem (CFSTP). We show that this problem is NP-hard; in particular, it contains the well-known complex combinatorial problem of Team Orienteering as a special case. Based on this, we design a Mixed Integer Program to optimally solve small-scale instances of the CFSTP and develop new anytime heuristics that can, on average, complete 97\% of the tasks for large problems (20 agents and 300 tasks). In so doing, our solutions represent the first results for the CFSTP.
Coalition Formation with Spatial and Temporal Constraints
Sarvapali D. Ramchurn, Maria Polukarov, Alessandro Farinelli, Cuong Truong, Nicholas R. Jennings

Session 25 – Agent Reasoning III

529

  Despite recent advances in planning with MDPs, the problem of generating good policies is still hard. In this paper, we describe a novel planning approach for generating policies in MDPs that (1) determinizes the given MDP model into a classical planning problem; (2) generates partial policies off-line by producing solution plans to the classical planning problem and incrementally aggregating them into a policy; and (3) uses sequential Monte-Carlo (MC) simulations of the partial policies before execution, in order to assess the probability of replanning after execution. The objective of this approach is to quickly generate policies in which the probability of replanning is low and below a given threshold. We describe our planner {\tt RFF}, which incorporates the above ideas, and present an extensive experimental study: (1) {\tt RFF} was the winner of the fully-observable probabilistic track of the 2008 International Planning Competition (IPC-08). (2) In addition to our study of the IPC-08 results, we analyzed {\tt RFF}'s performance with different plan aggregation and determinization strategies, with varying amounts of MC sampling, and with varying threshold values for the probability of execution failures. The results of these experiments revealed how these factors impact the time {\tt RFF} takes to generate solution policies and the quality of those policies (i.e., the average cumulative reward gathered from their execution).
Incremental Plan Aggregation for Generating Policies in MDPs
Florent Teichteil-Konigsbuch, Ugur Kuter, Guillaume Infantes
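A schematic of the generate-and-check loop described above, with determinization, classical planning, and simulation left as stubs; this sketches the idea rather than the RFF implementation, and all names are ours:

```python
def build_partial_policy(initial_state, plan, simulate, rho=0.2, n=100):
    """Incrementally aggregate deterministic plans into a partial policy
    until the Monte-Carlo estimate of the replanning probability drops
    below rho. `plan(s)` returns {state: action} pairs solving the
    determinized problem from s; `simulate(policy)` rolls the policy out
    in the probabilistic model and returns a state the policy does not
    cover (a replanning point), or None if the rollout reached a goal."""
    policy = dict(plan(initial_state))
    while True:
        escapes = [s for s in (simulate(policy) for _ in range(n))
                   if s is not None]
        if len(escapes) / n < rho:
            return policy  # estimated replanning probability below threshold
        for s in set(escapes):
            policy.update(plan(s))  # extend the policy from escape states
```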

Session 26 – Virtual Agents II

798

  Virtual humans are embodied software agents that should not only be realistic looking but also have natural and realistic behaviors. Traditional virtual human systems learn these interaction behaviors by observing how individuals respond in face-to-face situations (i.e., direct interaction). In contrast, this paper introduces a novel methodological approach called parasocial consensus sampling (PCS) which allows multiple individuals to vicariously experience the same situation, in order to gain insight into the typical (i.e., consensus) view of human responses in social interaction. This approach can help tease apart what is idiosyncratic from what is essential, and help reveal the strength of cues that elicit social responses. Our PCS approach has several advantages over traditional methods: (1) it integrates data from multiple independent listeners interacting with the same speaker, (2) it associates a probability with how likely feedback will be given over time, (3) it can be used as a prior to analyze and understand face-to-face interaction data, and (4) it facilitates much quicker and cheaper data collection. In this paper, we apply our PCS approach to learn a predictive model of listener backchannel feedback. Our experiments demonstrate that a virtual human driven by our PCS approach creates significantly more rapport and is perceived as more believable than a virtual human driven by face-to-face interaction data.
Parasocial Consensus Sampling: Combining Multiple Perspectives to Learn Virtual Human Behavior
Lixing Huang, Louis-Philippe Morency, Jonathan Gratch

669

  Virtual Actors are at the heart of Interactive Storytelling systems, and in recent years multiple approaches have been described to specify their autonomous behaviour. One well-known problem is how to achieve a balance between the characters' autonomy, defined in terms of their individual roles and motivations, and the global structure of the plot, which tends to emphasise narrative phenomena and the coordination of multiple characters. In this paper we report a new approach to the definition of virtual characters aimed at achieving a balance between character autonomy and global plot structure. Where previous approaches have tended to focus on individual actions, our objective is to reincorporate higher-level narrative elements in the behaviour of individual actors and address the relation between character and plot at the level of behaviour representation. To this end we introduce the notion of a character's Point of View and show how it enables a story to be described from the perspective of a number of different characters: it is not merely a presentation effect; it is also a different way to tell a story. As an illustration, we have developed an Interactive Narrative based on Shakespeare's Merchant of Venice. The system, which features a novel planning approach to story generation, can generate very different stories depending on the Point of View adopted, and supports dynamic modification of the story world, which results in different story consequences. In the paper, we illustrate this approach with example narratives generated by our fully implemented prototype.
Narrative Generation through Characters' Point of View
Julie Porteous, Marc Cavazza, Fred Charles

Session 27 – ICAPS/AAMAS I

775

  We consider the problem of coordinating a team of agents engaged in executing a set of inter-dependent, geographically dispersed tasks in an oversubscribed and uncertain environment. In such domains, where there are sequence-dependent setup activities (e.g., travel), we argue that there is inherent leverage in having agents maintain advance schedules. In the distributed problem solving setting we consider, each agent begins with a task itinerary, and, as execution unfolds and dynamics ensue (e.g., tasks fail, new tasks are discovered, etc.), agents must coordinate to extend and revise their plans accordingly. The team objective is to maximize the utility accrued from executed actions over a given time horizon. Our approach to solving this problem is based on distributed management of agent schedules. We describe an agent architecture that uses the synergy between intra-agent scheduling and inter-agent coordination to promote task allocation decisions that minimize travel time and maximize time available for utility-accruing activities. Experimental results are presented that compare our agent's performance to that of an agent using an intelligent dispatching strategy previously shown to outperform our approach on synthetic, stateless, utility maximization problems. Across a range of problems involving a mix of situated and non-situated tasks, our advance scheduling approach dominates this same dispatch strategy. Finally, we report performance results with an extension of the system on a limited set of field test experiments.
Distributed Coordination of Mobile Agent Teams: The Advantage of Planning Ahead
Laura Barbulescu, Zachary Rubinstein, Stephen Smith, Terry Zimmerman

Session 28 – ICAPS/AAMAS II



Copyright © 2010 IFAAMAS