AAMAS 2012 - June 4-9, 2012 - Valencia, Spain

Full Papers

Session 1A – Innovative Applications


While three deployed applications of game theory for security have recently been reported at AAMAS, we as a community remain in the early stages of these deployments; there is a continuing need to understand the core principles for innovative security applications of game theory. Towards that end, this paper presents PROTECT, a game-theoretic system deployed by the United States Coast Guard (USCG) in the port of Boston for scheduling their patrols. USCG has termed the deployment of PROTECT in Boston a success, and efforts are underway to test it in the port of New York, with the potential for nationwide deployment.

PROTECT is premised on an attacker-defender Stackelberg game model and offers five key innovations. First, this system is a departure from the assumption of perfect adversary rationality noted in previous work, relying instead on a quantal response (QR) model of the adversary's behavior --- to the best of our knowledge, this is the first real-world deployment of the QR model. Second, to improve PROTECT's efficiency, we generate a compact representation of the defender's strategy space, exploiting equivalence and dominance. Third, we show how to practically model a real maritime patrolling problem as a Stackelberg game. Fourth, our experimental results illustrate that PROTECT's QR model more robustly handles real-world uncertainties than a perfect rationality model. Finally, in evaluating PROTECT, this paper for the first time provides real-world data: (i) comparison of human-generated vs PROTECT security schedules, and (ii) results from an Adversarial Perspective Team's (human mock attackers) analysis.
PROTECT: A Deployed Game Theoretic System to Protect the Ports of the United States
Eric Shieh, Bo An, Rong Yang, Milind Tambe, Craig Baldwin, Joseph DiRenzo, Ben Maule, Garrett Meyer
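The logit quantal response model that PROTECT relies on has a standard closed form; a minimal sketch (the function name and sample utilities are ours for illustration, not PROTECT's actual parameters):

```python
import math

def quantal_response(utilities, lam=1.0):
    # Logit quantal response: attack probability of each target is
    # proportional to exp(lam * expected utility). lam -> infinity
    # recovers a perfectly rational best responder; lam = 0 gives a
    # uniformly random attacker.
    weights = [math.exp(lam * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative utilities for three targets (not PROTECT's data).
probs = quantal_response([5.0, 1.0, 1.0], lam=0.0)  # indifferent attacker
```

Raising `lam` smoothly interpolates between random behaviour and best response, which is what makes the model robust to uncertainty about the adversary.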


This paper describes an innovative multiagent system called SAVES with the goal of conserving energy in commercial buildings. We specifically focus on an application to be deployed in an existing university building that provides several key novelties: (i) developed jointly with the university facility management team, SAVES is based on actual occupant preferences and schedules, actual energy consumption and loss data, real sensors and hand-held devices, etc.; (ii) it addresses novel scenarios that require negotiations with groups of building occupants to conserve energy; (iii) it focuses on a non-residential building, where human occupants do not have a direct financial incentive in saving energy and thus requires a different mechanism to effectively motivate occupants; and (iv) SAVES uses a novel algorithm for generating optimal MDP policies that explicitly consider multiple criteria optimization (energy and personal comfort) as well as uncertainty over occupant preferences when negotiating energy reduction - this combination of challenges has not been considered in previous MDP algorithms.
In a validated simulation testbed, we show that SAVES substantially reduces the overall energy consumption compared to the existing control method while achieving comparable average satisfaction levels for occupants. As a real-world test, we provide results of a trial study where SAVES is shown to lead occupants to conserve energy in real buildings.
SAVES: A Sustainable Multiagent Application to Conserve Building Energy Considering Occupants
Jun-young Kwak, Pradeep Varakantham, Rajiv Maheswaran, Milind Tambe, Farrokh Jazizadeh, Geoffrey Kavulya, Laura Klein, Burcin Becerik-Gerber, Timothy Hayes, Wendy Wood


Cyber security is increasingly important for defending computer systems from loss of privacy or unauthorised use. One important aspect is threat analysis: how does an attacker infiltrate a system, and what do they want once inside? This paper considers the problem of Active Malware Analysis, where we learn about a human or software intruder by actively interacting with it, with the goal of learning its behaviours and intentions, whilst at the same time the intruder may be trying to avoid detection or to conceal those behaviours and intentions. This game-theoretic active learning is then used to obtain a behavioural clustering of malware, an important contribution both for understanding malware at a high level and, more crucially, for the deployment of effective anti-malware defences. This paper makes the following contributions: (i) a formal definition of the game-theoretic active malware analysis problem; (ii) a fast algorithm for learning about malware in the active analysis problem, which utilises the concept of reducing entropy in the beliefs about the malware; (iii) a virtual machine based agent architecture for the implementation of the active malware analysis problem; and (iv) a behaviour-based clustering of malware which is shown to be more accurate than a similar clustering using only passive information about the malware.
Active Malware Analysis using Stochastic Games
Simon Williamson, Pradeep Varakantham, Debin Gao, Ong Chen Hui


Contemporary maritime piracy presents a significant threat to the global shipping industry, with annual costs estimated at up to US\$12bn. To address the threat, commanders and policymakers need new data-driven decision-support tools that will allow them to plan and execute counter-piracy activities most effectively. So far, however, the provision of such tools has been very limited. To fill this gap, we have employed the multi-agent approach and developed a novel suite of computational tools and techniques for operational management of counter-piracy operations. A comprehensive agent-based simulation enables the stakeholders to assess the efficiency of a range of piracy counter-measures, including recommended transit corridors, escorted convoys, group transit schemes, route randomization and navy patrol deployments. Decision-theoretic and game-theoretic optimization techniques further assist in discovering counter-measure configurations that yield the best trade-off between transportation security and cost. We demonstrate our approach on two case studies based on the problems and solutions currently explored by the maritime security community. Our work is the first integrated application of agent-based techniques to high-seas maritime security and opens a wide range of directions for follow-up research and development.
Agents vs. Pirates: Multi-agent Simulation and Optimization to Fight Maritime Piracy
Michal Jakob, Ondřej Vaněk, Ondřej Hrstka, Michal Pěchouček

Session 1B – Teamwork I


In many real-life networks, such as urban structures, protein interactions and social networks, one of the key issues is to measure the centrality of nodes, i.e. to determine which nodes and edges are more central to the functioning of the entire network than others. In this paper we focus on \emph{betweenness centrality} - a metric that measures the centrality of a node by the number of shortest paths that pass through it. This metric has been shown to be well suited for many, often complex, networks. In its standard form, the betweenness centrality, just like other centrality metrics, evaluates nodes based on their individual contributions to the functioning of the network. For instance, the importance of an intersection in a road network can be computed as the difference between the full capacity of this network and its capacity when the intersection is completely shut down. However, as recently argued in the literature, such an approach is inadequate for many real-life applications, as, for example, multiple nodes can fail simultaneously. Thus, what would be desirable is to refine the existing centrality metrics such that they take into account not only the functioning of nodes as individual entities but also as members of groups of nodes. One recently-proposed way of doing this is based on the \emph{Shapley Value} - a solution concept in cooperative game theory that measures in a fair way the contributions of players to all the coalitions that they could possibly participate in. Although this approach has been used to extend various centrality metrics, such an extension to betweenness centrality is yet to be developed. The main challenge when developing such a refinement is to tackle the computational complexity; the Shapley Value generally requires an exponential number of operations, making its use limited to a small number of players (or nodes in our context).
Against this background, our main contribution in this paper is to refine the betweenness centrality metric based on the Shapley Value: we develop an algorithm for computing this new metric, and show that it has the same complexity as the best known algorithm due to Brandes to compute the standard betweenness centrality (i.e., polynomial in the size of the network). Finally, we show that our results can be extended to another important centrality metric called stress centrality.
A New Approach to Betweenness Centrality Based on the Shapley Value
Piotr Szczepański, Tomasz Michalak, Talal Rahwan
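For contrast with the polynomial-time algorithm this paper develops, the naive exponential Shapley Value computation it avoids can be sketched directly (the characteristic function below is a toy example, not the betweenness game itself):

```python
import math
from itertools import permutations

def shapley_values(players, value):
    # Exact Shapley Values by enumerating every ordering of the players:
    # each player's value is its average marginal contribution over all
    # orderings. This is O(n!) -- precisely the blow-up the paper's
    # polynomial algorithm avoids for the betweenness-centrality game.
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            phi[p] += value(frozenset(coalition)) - before
    n_orderings = math.factorial(len(players))
    return {p: total / n_orderings for p, total in phi.items()}

# Toy "dictator" game: a coalition is worth 1 iff it contains 'a'.
vals = shapley_values(['a', 'b', 'c'], lambda s: 1.0 if 'a' in s else 0.0)
```

In the paper's setting the characteristic function is the group betweenness of a coalition of nodes, and the contribution is computing all n Shapley Values without this enumeration.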


Many multi-agent applications may involve a notion of spatial coherence. For instance, simulations of virtual agents often need to model a coherent group or crowd. Alternatively, robots may prefer to stay within a pre-specified communication range. This paper proposes an extension of a decentralized, reactive collision avoidance framework, which defines obstacles in the velocity space, known as Velocity Obstacles (VOs), for coherent groups of agents. The extension, referred to in this work as a Loss of Communication Obstacle (LOCO), aims to maintain proximity among agents by imposing constraints in the velocity space and restricting the set of feasible controls. If the introduction of LOCOs results in a problem that is too restrictive, then the proximity constraints are relaxed in order to maintain collision avoidance. If agents break their proximity constraints, a method is applied to reconnect them. The approach is fast and integrates nicely with the Velocity Obstacle framework. It yields improved coherence for groups of robots connected through an input constraint graph that are moving with constant velocity. Simulated environments involving a single team moving among static obstacles, as well as multiple teams operating in the same environment, are considered in the experiments and evaluated for collisions, computational cost and proximity constraint maintenance. The experiments show that improved coherence is achieved while maintaining collision avoidance, at a small computational cost and path quality degradation.
Maintaining Team Coherence under the Velocity Obstacle Framework
Andrew Kimmel, Andrew Dobson, Kostas Bekris

Session 1C – Learning I


Although Reinforcement Learning (RL) has been successfully deployed in a variety of tasks, learning speed remains a fundamental problem for applying RL in complex environments. Transfer learning aims to ameliorate this shortcoming by speeding up learning through the adaptation of previously learned behaviors in similar tasks. Transfer techniques often use an inter-task mapping, which determines how a pair of tasks are related. Instead of relying on a hand-coded inter-task mapping, this paper proposes a novel transfer learning method capable of autonomously creating an inter-task mapping by using a novel combination of sparse coding, sparse projection learning and sparse Gaussian processes. We also propose two new transfer algorithms (\emph{TrLSPI} and \emph{TrFQI}) based on least squares policy iteration and fitted-Q-iteration. Experiments not only show successful transfer of information between similar tasks, inverted pendulum to cart pole, but also between two very different domains: mountain car to cart pole. This paper empirically shows that the learned inter-task mapping can be successfully used to (1) improve the performance of a learned policy on a fixed number of samples, (2) reduce the learning times needed by the algorithms to converge to a policy on a fixed number of samples, and (3) converge faster to a near-optimal policy given a large number of samples.
Reinforcement Learning Transfer via Sparse Coding
Haitham Bou Ammar, Karl Tuyls, Matthew Taylor, Kurt Driessen, Gerhard Weiss

Session 1D – Social Choice I


This paper considers the communication complexity of approximating common voting rules. Both upper and lower bounds are presented for $n$ voters and $m$ alternatives. It is shown that for all $\epsilon \in (0,1)$, the communication complexity of obtaining a $1 - \epsilon$ approximation to Borda is $O(\log(\frac{1}{\epsilon}) nm)$. A lower bound of $\Omega(nm)$ is provided for small values of $\epsilon$. The communication complexity of computing the true Borda winner is $\Omega(nm\log(m))$. Thus, in the case of Borda, one can obtain arbitrarily good approximations with less communication overhead than is required to compute the true Borda winner.
For other voting rules, no such $1 \pm \epsilon$ approximation scheme exists. In particular, it is shown that the communication complexity of computing any constant factor approximation, $\rho$, to Bucklin is $\Omega(\frac{nm}{\rho^2})$. Conitzer and Sandholm show that the communication complexity of computing the true Bucklin winner is $O(nm)$. However, we show that for all $\delta \in (0,1)$, the communication complexity of computing an $m^{\delta}$-approximate winner in Bucklin elections is $O(nm^{1-\delta}\log(m))$. For $\delta \in (\frac{1}{2}, 1)$, a lower bound of $\Omega(nm^{1-2\delta})$ is also provided.
Similar lower bounds are presented on the communication complexity of computing approximate winners in Copeland elections.
Communication Complexity of Approximating Voting Rules
Travis Service, Julie Adams
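As a point of reference for the communication costs discussed above, exact Borda winner determination is straightforward once the full profile has been communicated; a minimal sketch (the example profile is invented for illustration):

```python
def borda_winner(profile, m):
    # Borda: an alternative ranked in position k (0-indexed, best first)
    # by a voter receives m - 1 - k points; the winner maximises the
    # total score. Communicating every full ranking is the overhead the
    # paper's approximation scheme saves on.
    scores = [0] * m
    for ranking in profile:
        for pos, alt in enumerate(ranking):
            scores[alt] += m - 1 - pos
    return max(range(m), key=lambda a: scores[a]), scores

# Three voters, three alternatives (toy profile; best alternative first).
winner, scores = borda_winner([[0, 1, 2], [1, 0, 2], [0, 2, 1]], 3)
```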

Session 1E – Game Theory I


A Coalition Structure Generation (CSG) problem involves partitioning a set of agents into coalitions so that the social surplus is maximized. Recently, Ohta et al. developed an efficient algorithm for solving CSG, assuming that a characteristic function is represented as a set of rules, such as marginal contribution networks (MC-nets).
In this paper, we extend the formalization of CSG in Ohta et al. so that it can handle negative value rules. Here, we assume that a characteristic function is represented by either MC-nets (without externalities) or embedded MC-nets (with externalities). Allowing negative value rules is important since it can reduce the effort required to describe a characteristic function. In particular, in many realistic situations, it is natural to assume that a coalition has negative externalities to other coalitions.
To handle negative value rules, we examine the following three algorithms: (i) a full transformation algorithm, (ii) a partial transformation algorithm, and (iii) a direct encoding algorithm. We show that the full transformation algorithm is not scalable in MC-nets (the worst-case representation size is $\Omega(n^2)$, where n is the number of agents), and does not seem to be tractable in embedded MC-nets (representation size would be $\Omega(2^n)$). In contrast, by using the partial transformation or direct encoding algorithms, an exponential blow-up never occurs even for embedded MC-nets. For embedded MC-nets, the direct encoding algorithm creates fewer rules than the partial transformation algorithm.
Experimental evaluations show that the direct encoding algorithm is scalable, i.e., an off-the-shelf optimization package (CPLEX) can solve problem instances with 100 agents and rules within 10 seconds.
Handling Negative Value Rules in MC-net-based Coalition Structure Generation
Suguru Ueda, Takato Hasegawa, Naoyuki Hashimoto, Naoki Ohta, Atsushi Iwasaki, Makoto Yokoo
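The rule-based representation the abstract builds on can be sketched as a minimal MC-net evaluator, assuming rules of the form (positive literals, negative literals, value), with negative values permitted (the example rules are ours, not from the paper):

```python
def mc_net_value(coalition, rules):
    # A rule (pos, neg, v) applies to a coalition that contains every
    # agent in pos and no agent in neg; the coalition's value is the sum
    # of v over all applicable rules. v may be negative, which is the
    # case this paper's algorithms are designed to handle.
    total = 0.0
    for pos, neg, v in rules:
        if pos <= coalition and not (neg & coalition):
            total += v
    return total

rules = [({1, 2}, set(), 5.0),   # agents 1 and 2 together: +5
         ({1}, {3}, 2.0),        # agent 1 without agent 3: +2
         ({2, 3}, set(), -4.0)]  # agents 2 and 3 clash: -4
v = mc_net_value({1, 2, 3}, rules)
```

For the grand coalition {1, 2, 3}, the first and third rules apply while the second is blocked by agent 3, giving a value of 1.0.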

Session 1F – Planning


In this paper, we investigate real-time path planning in static terrain, as needed in video games. We introduce the game time model, where time is partitioned into uniform time intervals, an agent can execute one movement during each time interval, and search and movements are done in parallel. The objective is to move the agent from its start location to its goal location in as few time intervals as possible. For known terrain, we show experimentally that Time-Bounded A* (TBA*), an existing real-time search algorithm for undirected terrain, needs fewer time intervals than two state-of-the-art real-time search algorithms and about the same number of time intervals as A*. TBA*, however, cannot be used when the terrain is not known initially. For initially partially or completely unknown terrain, we thus propose a new search algorithm. Our Time-Bounded Adaptive A* (TBAA*) extends TBA* to on-line path planning with the freespace assumption by combining it with Adaptive A*. We prove that TBAA* either moves the agent from its start location to its goal location or detects that this is impossible - an important property since many existing real-time search algorithms are not able to detect efficiently that no path exists. Furthermore, TBAA* can eventually move the agent on a cost-minimal path from its start location to its goal location if it resets the agent into its start location whenever it reaches its goal location. We then show experimentally in initially partially or completely unknown terrain that TBAA* needs fewer time intervals than several state-of-the-art complete and real-time search algorithms and about the same number of time intervals as the best compared complete search algorithm, even though it has the advantage over complete search algorithms that the agent starts to move right away.
Time Bounded Adaptive A*
Carlos Hernández, Jorge Baier, Tansel Uras, Sven Koenig

Session 2A – Virtual Agents


Research in the behavioral sciences suggests that emotion can serve important social functions and that, more than a simple manifestation of internal experience, emotion displays communicate one's beliefs, desires and intentions. In a recent study we have shown that, when engaged in the iterated prisoner's dilemma with agents that display emotion, people infer, from the emotion displays, how the agent is appraising the ongoing interaction (e.g., is the situation favorable to the agent? Does it blame me for the current state-of-affairs?). From these appraisals, people then infer whether the agent is likely to cooperate in the future.
In this paper we propose a Bayesian model that captures this social function of emotion. The model supports probabilistic predictions, from emotion displays, about how the counterpart is appraising the interaction which, in turn, lead to predictions about the counterpart's intentions. The model's parameters were learned using data from the empirical study. Our evaluation indicated that considering emotion displays improved the model's ability to predict the counterpart's intentions, in particular, how likely it was to cooperate in a social dilemma. Using data from another empirical study where people made inferences about the counterpart's likelihood of cooperation in the absence of emotion displays, we also showed that the model could, from information about appraisals alone, make appropriate inferences about the counterpart's intentions. Overall, the paper suggests that appraisals are valuable for computational models of emotion interpretation. The relevance of these results for the design of multiagent systems where agents, human or not, can convey or recognize emotion is discussed.
Bayesian Model of the Social Effects of Emotion in Decision-Making in Multiagent Systems
Celso de Melo, Peter Carnevale, Stephen Read, Dimitrios Antos, Jonathan Gratch

Session 2B – Distributed Problem Solving


Real life coordination problems are characterised by stochasticity and a lack of \emph{a priori} knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely accepted framework for modelling decentralised coordination problems, assume perfect knowledge, thus limiting their practical applicability. To address this shortcoming, we introduce the MAB-DCOP, in which the interactions between agents are modelled by multi-armed bandits (MABs). Unlike canonical DCOPs, a MAB-DCOP is not a single shot optimisation problem. Rather, it is a sequential one in which agents need to coordinate in order to strike a balance between acquiring knowledge about the \emph{a priori} unknown and stochastic interactions (exploration), and taking the currently believed optimal joint action (exploitation), so as to maximise the cumulative global utility over a finite time horizon.
We propose \textsc{Heist}, the first asymptotically optimal algorithm for coordination under stochasticity and lack of prior knowledge. \textsc{Heist} solves MAB-DCOPs in a decentralised fashion using a generalised distributive law (GDL) message passing phase to find the joint action with the highest upper confidence bound (UCB) on global utility. We demonstrate that \textsc{Heist} outperforms other state of the art techniques from the MAB and DCOP literature by up to 1.5 orders of magnitude on MAB-DCOPs in experimental settings.
DCOPs and Bandits: Exploration and Exploitation in Decentralised Coordination
Ruben Stranders, Long Tran-Thanh, Francesco Maria Delle Fave, Alex Rogers, Nick Jennings
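The upper confidence bound driving the exploration/exploitation trade-off above has a familiar single-agent form; a sketch of the canonical UCB1-style statistic (constants and example numbers are ours, not HEIST's exact bound on global utility):

```python
import math

def ucb(mean_reward, pulls, t, c=2.0):
    # Canonical UCB1-style bound: empirical mean plus an exploration
    # bonus that shrinks as an arm is sampled more often.
    return mean_reward + math.sqrt(c * math.log(t) / pulls)

# Two arms at time step 12: the under-sampled arm wins on its bonus,
# even though its empirical mean advantage is small.
means, pulls, t = [0.5, 0.7], [10, 2], 12
best = max(range(2), key=lambda a: ucb(means[a], pulls[a], t))
```

HEIST's contribution is computing the joint action maximising such a bound over a whole DCOP in a decentralised fashion, via GDL message passing.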

Session 2C – Learning II


Solving complex but structured problems in a decentralized manner via multiagent collaboration has received much attention in recent years. This is natural, as on one hand, multiagent systems usually possess a structure that determines the allowable interactions among the agents; and on the other hand, the single most pressing need in a cooperative multiagent system is to coordinate the local policies of autonomous agents with restricted capabilities to serve a system-wide goal. The presence of uncertainty makes this even more challenging, as the agents face the additional need to learn the unknown environment parameters while forming (and following) local policies in an online fashion. In this paper, we provide the first Bayesian reinforcement learning (BRL) approach for distributed coordination and learning in a cooperative multiagent system by devising two solutions to this type of problem. More specifically, we show how the Value of Perfect Information (VPI) can be used to perform efficient decentralised exploration in both model-based and model-free BRL, and in the latter case, provide a closed form solution for VPI, correcting a decade-old result by Dearden, Friedman and Russell. To evaluate these solutions, we present experimental results comparing their relative merits, and demonstrate empirically that both solutions outperform an existing multiagent learning method, representative of the state-of-the-art.
Decentralized Bayesian Reinforcement Learning for Online Agent Collaboration
Luke Teacy, Georgios Chalkiadakis, Alessandro Farinelli, Alex Rogers, Nick Jennings, Sally McClean, Gerard Parr

Session 2D – Social Choice II

Session 2E – Game Theory II

Session 2F – Knowledge Representation & Reasoning


Policy iteration algorithms for partially observable Markov decision processes (POMDP) offer the benefits of quick convergence and the ability to operate directly on the solution, which usually takes the form of a finite state controller. However, the controller tends to grow quickly in size across iterations, making its evaluation and improvement costly. Bounded policy iteration provides a way of keeping the controller size fixed while improving it monotonically until convergence, although it is susceptible to getting trapped in local optima. Despite these limitations, policy iteration algorithms are viable alternatives to value iteration.
In this paper, we generalize the bounded policy iteration technique to problems involving multiple agents. Specifically, we show how we may perform policy iteration in settings formalized by the interactive POMDP framework. Although policy iteration has been extended to decentralized POMDPs, the context there is strictly cooperative. Its generalization here makes it useful in non-cooperative settings as well. As interactive POMDPs involve modeling others, we ascribe nested controllers to predict others' actions, with the benefit that the controllers compactly represent the model space. We evaluate our approach on multiple problem domains, and demonstrate its properties and scalability.
Generalized and Bounded Policy Iteration for Finitely-Nested Interactive POMDPs: Scaling Up
Ekhlas Sonu, Prashant Doshi

Session 3A – Robotics I


A central problem in environmental sensing and monitoring is to classify/label the hotspots in a large-scale environmental field. This paper presents a novel \emph{decentralized active robotic exploration} (DARE) strategy for probabilistic classification/labeling of hotspots in a \emph{Gaussian process} (GP)-based field. In contrast to existing state-of-the-art exploration strategies for learning environmental field maps, the time needed to solve the DARE strategy is independent of the map resolution and the number of robots, thus making it practical for in situ, real-time active sampling. Its exploration behavior exhibits an interesting formal trade-off between boundary tracking, until the hotspot region boundary can be accurately predicted, and wide-area coverage, to find new boundaries to be tracked in sparsely sampled areas. We provide a theoretical guarantee on the active exploration performance of the DARE strategy: under a reasonable conditional independence assumption, we prove that it can optimally achieve two formal cost-minimizing exploration objectives based on the misclassification and entropy criteria. Importantly, this result implies that the uncertainty of labeling the hotspots in a GP-based field is greatest at or close to the hotspot region boundaries. Empirical evaluation on real-world plankton density and temperature field data shows that, subject to limited observations, the DARE strategy achieves superior hotspot classification and time efficiency compared to state-of-the-art active exploration strategies.
Decentralized Active Robotic Exploration and Mapping for Probabilistic Field Classification in Environmental Sensing
Kian Hsiang Low, Jie Chen, John Dolan, Steve Chien, David Thompson


We consider the problem of dynamic self-reconfiguration in a modular self-reconfigurable robot (MSR). Previous approaches to MSR self-reconfiguration solve this problem using algorithms that search for a goal configuration in the MSR's configuration space. In contrast, we model the self-reconfiguration problem as a constrained optimization problem that attempts to minimize the reconfiguration cost while achieving a desirable configuration. We formulate the MSR self-reconfiguration problem as finding the optimal coalition structure within a coalition game theoretic framework. To reduce the complexity of finding the optimal coalition structure, we represent the set of all robot modules as a fully-connected graph. Each robot module corresponds to a vertex of the graph and edge weights represent the utility of a pair of modules being in the same coalition (or, connected component). The value of a coalition structure is then defined as the sum of the weights of all edges that are completely within the same coalition in that coalition structure. We then use a graph partitioning technique to cluster the vertices (robot modules) in the constructed graph so that the obtained coalition structure has close to optimal value. The clustering algorithm has time complexity polynomial in the number of agents, $n$, and yields an $O(\log n)$ approximation. We have verified our technique experimentally for a variety of settings. Our results show that the graph clustering-based self-reconfiguration algorithm performs comparably with two other existing algorithms for determining optimal coalition structures.
Dynamic Reconfiguration in Modular Robots using Graph Partitioning-based Coalitions
Prithviraj Dasgupta, Vladimir Ufimtsev, Carl Nelson, S. G. M. Hossain
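The coalition-structure value defined in the abstract (the sum of intra-coalition edge weights) can be computed directly; a minimal sketch with an invented weight map:

```python
def structure_value(structure, weights):
    # Value of a coalition structure, as defined in the abstract: sum of
    # the weights of edges whose endpoints lie in the same coalition.
    total = 0.0
    for coalition in structure:
        members = set(coalition)
        for (u, v), w in weights.items():
            if u in members and v in members:
                total += w
    return total

# Four modules; pairing {0, 1} and {2, 3} captures both heavy edges
# while sacrificing the light cross edge (1, 2).
w = {(0, 1): 3.0, (2, 3): 2.0, (1, 2): 1.0}
val = structure_value([{0, 1}, {2, 3}], w)
```

The hard part, which the paper addresses with graph partitioning, is searching over the exponentially many structures for one whose value is near-optimal.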

Session 3C – Human-agent Interaction


As computational agents are increasingly used beyond research labs, their success will depend on their ability to learn new skills and adapt to their dynamic, complex environments. If human users - without programming skills - can transfer their task knowledge to agents, learning can accelerate dramatically, reducing costly trials. The \textsc{tamer} framework guides the design of agents whose behavior can be shaped through signals of approval and disapproval, a natural form of human feedback. More recently, \textsc{tamer+rl} was introduced to enable human feedback to augment a traditional reinforcement learning (RL) agent that learns from a Markov decision process's (MDP) reward signal. We address limitations of prior work on \textsc{tamer} and \textsc{tamer+rl}, contributing in two critical directions. First, the four successful techniques for combining human reward with RL from prior \textsc{tamer+rl} work are tested on a second task, and these techniques' sensitivities to parameter changes are analyzed. Together, these examinations yield more general and prescriptive conclusions to guide others who wish to incorporate human knowledge into an RL algorithm. Second, \textsc{tamer+rl} has thus far been limited to a \emph{sequential} setting, in which training occurs before learning from MDP reward. In this paper, we introduce a novel algorithm that shares the same spirit as \textsc{tamer+rl} but learns \emph{simultaneously} from both reward sources, enabling the human feedback to come at any time during the reinforcement learning process. We call this algorithm simultaneous \textsc{tamer+rl}. To enable simultaneous learning, we introduce a new technique that appropriately determines the magnitude of the human model's influence on the RL algorithm throughout time and state-action space.
Reinforcement Learning from Simultaneous Human and MDP Reward
W. Bradley Knox, Peter Stone
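One simple way to combine the two reward sources described above is additively weighted action selection; a sketch of the general idea (the names and the fixed weight are ours for illustration, not the paper's exact modulation scheme, which varies the influence over time and state-action space):

```python
def combined_action(q_values, human_model, state, actions, weight):
    # Pick the action maximising Q(s, a) + weight * H(s, a), where H is
    # a learned model of human reward. Decaying `weight` over time is
    # one simple way to fade out the human model's influence as the RL
    # estimates mature.
    return max(actions, key=lambda a: q_values[(state, a)]
               + weight * human_model[(state, a)])

# Toy tables: the human model tips the choice toward 'left'.
q = {('s', 'left'): 1.0, ('s', 'right'): 1.2}
h = {('s', 'left'): 1.0, ('s', 'right'): 0.0}
a = combined_action(q, h, 's', ['left', 'right'], weight=0.5)
```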


Both Learning from Demonstration (LfD) and Reinforcement Learning (RL) are popular approaches for building decision-making agents. LfD applies supervised learning to a set of human demonstrations to infer and imitate the human policy, while RL uses only a reward signal and exploration to find an optimal policy. For complex tasks both of these techniques may be ineffective. LfD may require many more demonstrations than it is feasible to obtain, and RL can take an infeasible amount of time to converge.
We present Automatic Decomposition and Abstraction from demonstration (ADA), an algorithm that uses mutual information measures over a set of human demonstrations to decompose a sequential decision process into several subtasks, finding state abstractions for each one of these subtasks. ADA then projects the human demonstrations into the abstracted state space to build a policy. This policy can later be improved using RL algorithms to surpass the performance of the human teacher. We find empirically that ADA can find satisficing policies for problems that are too complex to be solved with traditional LfD and RL algorithms. In particular, we show that we can use mutual information across state features to leverage human demonstrations to reduce the effects of the curse of dimensionality by finding subtasks and abstractions in sequential decision processes.
Automatic Task Decomposition and State Abstraction from Demonstration
Luis C. Cobo, Charles L. Isbell Jr., Andrea Thomaz
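The mutual-information idea behind ADA's abstraction step can be sketched as follows. This is a simplified stand-in assuming discrete features, with a hypothetical `relevant_features` helper; the actual algorithm decomposes sequential decision processes rather than ranking features of flat demonstrations:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def relevant_features(demos, k=1):
    """Rank state features by mutual information with the demonstrated action.

    `demos` is a list of (state_tuple, action) pairs; returns the indices of
    the k most informative features -- a toy analogue of finding a state
    abstraction from demonstrations.
    """
    actions = [a for _, a in demos]
    n_features = len(demos[0][0])
    scores = [(mutual_information([s[i] for s, _ in demos], actions), i)
              for i in range(n_features)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]
```

Features carrying no information about the action (zero mutual information) can be abstracted away, shrinking the state space the imitation policy must cover.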

Session 3D – Economies & Markets I


Many agent-based models of financial markets have been able to reproduce certain stylized facts that are observed in actual empirical time series data by using "zero-intelligence" agents whose behaviour is largely random in order to ascertain whether certain phenomena arise from market micro-structure as opposed to strategic behaviour. Although these models have been highly successful, it is not surprising that they are unable to explain \emph{every} stylized fact, and indeed it seems plausible that although some phenomena arise purely from market micro-structure, other phenomena arise from the behaviour of the participating agents, as suggested by more complex agent-based models which use agents endowed with various forms of strategic behaviour. Given that both zero-intelligence and strategic models are each able to explain various phenomena, an interesting question is whether there are hybrid, "zero-intelligence plus" models containing a minimal amount of strategic behaviour that are simultaneously able to explain all of the stylized facts. We conjecture that as we gradually increase the level of strategic behaviour in a zero-intelligence model of a financial market we will obtain an increasingly good fit with the stylized facts of empirical financial time-series data. We test this hypothesis by systematically evaluating several different experimental treatments in which we incrementally add minimalist levels of strategic behaviour to our model, and test the resulting time series of returns for the following statistical features: fat tails, volatility clustering, persistence and non-Gaussianity. Surprisingly, the resulting "zero-intelligence plus" models do \emph{not} introduce more realism to the time series, thus supporting other research which conjectures that some phenomena in the financial markets are indeed the result of more sophisticated learning, interaction and adaptation.
Can a Zero-Intelligence Plus Model Explain the Stylized Facts of Financial Time Series Data?
Imon Palit, Steve Phelps, Wing Lon Ng
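Two of the stylized facts tested above can be computed from a return series in a few lines; this is a standard-statistics sketch, not the paper's evaluation code:

```python
def excess_kurtosis(returns):
    """Excess kurtosis of a return series; values > 0 indicate fat tails."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return m4 / var ** 2 - 3.0

def autocorr(xs, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return cov / var

def volatility_clustering(returns, lag=1):
    """Autocorrelation of absolute returns; markedly positive values are the
    volatility-clustering signature."""
    return autocorr([abs(r) for r in returns], lag)
```

Persistence can be probed the same way via autocorrelations of the raw returns at longer lags.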

Session 3E – Game Theory III


To step beyond the first-generation deployments of attacker-defender security games -- for LAX Police, US FAMS and others -- it is critical that we relax the assumption of perfect rationality of the human adversary. Indeed, this assumption is a well-accepted limitation of classical game theory and modeling human adversaries' bounded rationality is critical. To this end, quantal response (QR) has provided very promising results to model human bounded rationality. However, in computing optimal defender strategies in real-world security games against a QR model of attackers, we face difficulties including (1) solving a nonlinear non-convex optimization problem efficiently for massive real-world security games; and (2) addressing constraints on assigning security resources, which adds to the complexity of computing the optimal defender strategy.
This paper presents two new algorithms to address these difficulties: \textsc{GOSAQ} can compute the globally optimal defender strategy against a QR model of attackers when there are no resource constraints and gives an efficient heuristic otherwise; \textsc{PASAQ} in turn provides an efficient approximation of the optimal defender strategy with or without resource constraints. These two novel algorithms are based on three key ideas: (i) use of a binary search method to solve the fractional optimization problem efficiently, (ii) construction of a convex optimization problem through a non-linear transformation, (iii) building a piecewise linear approximation of the non-linear terms in the problem. Additional contributions of this paper include proofs of approximation bounds, detailed experimental results showing the advantages of \textsc{GOSAQ} and \textsc{PASAQ} in solution quality over the benchmark algorithm (\textsc{BRQR}) and the efficiency of \textsc{PASAQ}. Given these results, \textsc{PASAQ} is at the heart of the PROTECT system, which is deployed for the US Coast Guard in the port of Boston, and is now headed to other ports.
Computing Optimal Strategy against Quantal Response in Security Games
Rong Yang, Fernando Ordóñez, Milind Tambe
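Key idea (i), using binary search for a fractional objective, can be illustrated on a toy discrete problem. This is a Dinkelbach-style sketch over a finite candidate set, not the actual defender-strategy polytope:

```python
def maximize_ratio(candidates, num, den, tol=1e-9):
    """Binary search for max num(x)/den(x) over a finite set, assuming den > 0.

    The optimal ratio r* is the unique r at which
    max_x [num(x) - r * den(x)] = 0, so we can bisect on r and check the sign
    of that (non-fractional) subproblem at each step.
    """
    lo = min(num(x) / den(x) for x in candidates)
    hi = max(num(x) for x in candidates) / min(den(x) for x in candidates) + 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # ratio mid is achievable iff some x satisfies num(x) >= mid * den(x)
        if max(num(x) - mid * den(x) for x in candidates) >= 0:
            lo = mid
        else:
            hi = mid
    return lo
```

In the security-game setting the inner maximization is over the defender's strategy space rather than a finite list, which is where the convexification and piecewise-linear approximation ideas come in.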


Given their existing and potential real-world security applications, Bayesian Stackelberg games have received significant research interest. In these games, the defender acts as a leader, and the many different follower types model the uncertainty over discrete attacker types. Unfortunately since solving such games is an NP-hard problem, scale-up has remained a difficult challenge.
This paper scales up Bayesian Stackelberg games, providing a novel unified approach to handle uncertainty not only over discrete follower types but also other key continuously distributed real world uncertainty, due to the leader's execution error, the follower's observation error, and continuous payoff uncertainty. To that end, this paper provides contributions in two parts. First, we present a new algorithm for Bayesian Stackelberg games, called \textsc{hunter}, to scale up the number of types. \textsc{hunter} combines the following five key features: i) efficient pruning via a best-first search of the leader's strategy space; ii) a novel linear program for computing tight upper bounds for this search; iii) using Benders' decomposition for solving the upper bound linear program efficiently; iv) efficient inheritance of Benders' cuts from parent to child; v) an efficient heuristic branching rule. Our experiments show that \textsc{hunter} provides orders of magnitude speedups over the best existing methods to handle discrete follower types. In the second part, we show \textsc{hunter}'s efficiency for Bayesian Stackelberg games can be exploited to also handle the continuous uncertainty using sample average approximation. We experimentally show that our \textsc{hunter}-based approach also outperforms the latest robust solution methods under continuously distributed uncertainty.
A Unified Method for Handling Discrete and Continuous Uncertainty in Bayesian Stackelberg Games
Zhengyu Yin, Milind Tambe


The burgeoning area of security games has focused on real-world domains where security agencies protect critical infrastructure from a diverse set of adaptive adversaries. There are security domains where the payoffs for preventing the different types of adversaries may take different forms (seized money, reduced crime, saved lives, etc.) which are not readily comparable. Thus, it can be difficult to know how to weigh the different payoffs when deciding on a security strategy. To address the challenges of these domains, we propose a fundamentally different solution concept, multi-objective security games (MOSG), which combines security games and multi-objective optimization. Instead of a single optimal solution, MOSGs have a set of Pareto optimal (non-dominated) solutions referred to as the Pareto frontier. The Pareto frontier can be generated by solving a sequence of constrained single-objective optimization problems (CSOP), where one objective is selected to be maximized while lower bounds are specified for the other objectives. Our contributions include: (i) an algorithm, Iterative $\epsilon$-Constraints, for generating the sequence of CSOPs; (ii) an exact approach for solving an MILP formulation of a CSOP (which also applies to multi-objective optimization in more general Stackelberg games); (iii) heuristics that achieve speedup by exploiting the structure of security games to further constrain a CSOP; (iv) an approximate approach for solving an algorithmic formulation of a CSOP, increasing the scalability of our approach with quality guarantees. Additional contributions of this paper include proofs on the level of approximation and detailed experimental evaluation of the proposed approaches.
Multi-Objective Optimization for Security Games
Matthew Brown, Bo An, Christopher Kiekintveld, Fernando Ordóñez, Milind Tambe
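For intuition, the $\epsilon$-constraints idea can be sketched for two objectives over a finite solution set. The real algorithm solves a MILP per CSOP over mixed strategies; all names here are illustrative:

```python
def pareto_frontier(solutions, f1, f2, step=1.0):
    """Toy iterative epsilon-constraints for two objectives over a finite set.

    Repeatedly maximize f1 subject to f2 >= bound, then tighten the bound just
    past the f2 value achieved, sweeping out non-dominated solutions.
    """
    frontier, bound = [], float("-inf")
    while True:
        feasible = [s for s in solutions if f2(s) >= bound]
        if not feasible:
            break
        best = max(feasible, key=f1)  # solve one constrained single-objective problem
        frontier.append(best)
        bound = f2(best) + step       # tighten the lower bound on f2
    return frontier
```

The `step` parameter plays the role of $\epsilon$: a smaller step yields a finer sweep of the frontier at the cost of more CSOPs.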


There has been significant recent interest in computing effective strategies for playing large imperfect-information games. Much prior work involves computing an approximate equilibrium strategy in a smaller abstract game, then playing this strategy in the full game (with the hope that it also well approximates an equilibrium in the full game). In this paper, we present a family of modifications to this approach that work by constructing non-equilibrium strategies in the abstract game, which are then played in the full game. Our new procedures, called \emph{purification} and \emph{thresholding}, modify the action probabilities of an abstract equilibrium by preferring the higher-probability actions. Using a variety of domains, we show that these approaches lead to significantly stronger play than the standard equilibrium approach. As one example, our program that uses purification came in first place in the two-player no-limit Texas Hold'em total bankroll division of the 2010 Annual Computer Poker Competition. Surprisingly, we also show that purification significantly improves performance (against the full equilibrium strategy) in random 4 x 4 matrix games using random 3 x 3 abstractions. We present several additional results (both theoretical and empirical). Overall, one can view these approaches as ways of achieving robustness against overfitting one's strategy to one's lossy abstraction. Perhaps surprisingly, the performance gains do not necessarily come at the expense of worst-case exploitability.
Strategy Purification and Thresholding: Effective Non-Equilibrium Approaches for Playing Large Games
Sam Ganzfried, Tuomas Sandholm, Kevin Waugh
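Purification and thresholding are themselves simple transformations of a mixed strategy; a minimal sketch (illustrative, with ties split uniformly under purification):

```python
def purify(strategy):
    """Purification: move all probability mass to the highest-probability
    action (ties split uniformly)."""
    top = max(strategy.values())
    winners = [a for a, p in strategy.items() if p == top]
    return {a: (1.0 / len(winners) if a in winners else 0.0) for a in strategy}

def threshold(strategy, eps):
    """Thresholding: zero out actions with probability below eps, then
    renormalize the remaining mass."""
    kept = {a: p for a, p in strategy.items() if p >= eps}
    total = sum(kept.values())
    return {a: kept.get(a, 0.0) / total for a in strategy}
```

Purification is the limiting case of thresholding as the cutoff grows, which is why the two are naturally studied as a family.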


Moving assets through a transportation network is a crucial challenge in hostile environments such as future battlefields where malicious adversaries have strong incentives to attack vulnerable patrols and supply convoys. Intelligent agents must balance network costs with the harm that can be inflicted by adversaries who are in turn acting rationally to maximize harm while trading off against their own costs to attack. Furthermore, agents must choose their strategies even without full knowledge of their adversaries' capabilities, costs, or incentives.
In this paper we model this problem as a non-zero sum game between two players, a sender who chooses flows through the network and an adversary who chooses attacks on the network. We advance the state of the art by: (1) moving beyond the zero-sum games previously considered to non-zero sum games where the adversary incurs attack costs that are not incorporated into the payoff of the sender; (2) introducing a refinement of the Stackelberg equilibrium that is more appropriate to network security games than previous solution concepts; and (3) using Bayesian games where the sender is uncertain of the capabilities, payoffs, and costs of the adversary. We provide polynomial time algorithms for finding equilibria in each of these cases. We also show how our approach can be applied to games where there are multiple adversaries.
Solving Non-Zero Sum Multiagent Network Flow Security Games with Attack Costs
Steven Okamoto, Noam Hazon, Katia Sycara

Session 3F – Agent-based Software Development

Session 4A – Robotics II


In this paper, we propose a novel top-down design method for the development of collective behaviors of swarm robotics systems called \emph{property-driven design}. Swarm robotics systems are usually designed and developed using a \emph{code-and-fix} approach, that is, the developer devises, tests and modifies the individual robot behaviors until a desired collective behavior is obtained. The code-and-fix approach can be very time consuming and relies completely on the ingenuity and expertise of the designer. The idea of property-driven design is that a swarm robotics system can be described by specifying formally a set of desired properties. In an iterative process similar to test-driven development, the developer produces a model of the system that satisfies the desired properties. Subsequently, the system is implemented in simulation and using real robots. Property-driven design helps to minimize the risk of developing a system that does not satisfy the required properties, and to promote the reuse of hardware-independent models. In this paper, we start by giving a general description of the method. We then present a possible way to apply it by using Discrete Time Markov Chains (DTMC) and Probabilistic Computation Tree Logic* (PCTL*). Finally, we conclude by presenting the application of the proposed method to the design and development of a swarm robotics system performing aggregation.
Property-driven design for swarm robotics
Manuele Brambilla, Carlo Pinciroli, Mauro Birattari, Marco Dorigo


This paper presents a novel decision-theoretic approach to control and coordinate multiple active cameras for observing a number of moving targets in a surveillance system. This approach offers the advantages of being able to (a) account for the stochasticity of targets' motion via probabilistic modeling, and (b) address the trade-off between maximizing the expected number of observed targets and the resolution of the observed targets through stochastic optimization. One of the key issues faced by existing approaches in multi-camera surveillance is that of scalability with increasing number of targets. We show how scalability can be improved by exploiting the problem structure: as proven analytically, our decision-theoretic approach incurs time that is linear in the number of targets to be observed during surveillance. As demonstrated empirically through simulations, our proposed approach can achieve high-quality surveillance of up to 50 targets in real time and its surveillance performance degrades gracefully with increasing number of targets. We also demonstrate our proposed approach with real AXIS 214 PTZ cameras in maximizing the number of Lego robots observed at high resolution over a surveyed rectangular area. The results are promising and clearly show the feasibility of our decision-theoretic approach in controlling and coordinating the active cameras in a real surveillance system.
Decision-Theoretic Approach to Maximizing Observation of Multiple Targets in Multi-Camera Surveillance
Prabhu Natarajan, Trong Nghia Hoang, Kian Hsiang Low, Mohan Kankanhalli


When a mixture of particles with different attributes undergoes vibration, a segregation pattern is often observed. For example, in muesli cereal packs, the largest particles - the Brazil nuts - tend to end up at the top. For this reason, the phenomenon is known as the Brazil nut effect. In previous research, an algorithm inspired by this effect was designed to produce segregation patterns in swarms of simulated agents that move on a horizontal plane.
In this paper, we adapt this algorithm for implementation on robots with directional vision. We use the e-puck robot as a platform to test our implementation. In a swarm of e-pucks, different robots mimic disks of different sizes (larger than their physical dimensions). The motion of every robot is governed by a combination of three components: (i) attraction towards a point, which emulates the effect of a gravitational pull, (ii) random motion, which emulates the effect of vibration, and (iii) repulsion from nearby robots, which emulates the effect of collisions between disks. The algorithm does not require robots to discriminate between other robots; yet, it is capable of forming annular structures where the robots in each annulus represent disks of identical size.
We report on a set of experiments performed with a group of 20 physical e-pucks. The results obtained in 100 trials of 20 minutes each show that the percentage of incorrectly-ordered pairs of disks from different groups decreases as the size ratio of disks in different groups is increased. In our experiments, this percentage was, on average, below 0.5% for size ratios from 3.0 to 5.0. Moreover, for these size ratios, all segregation errors observed were due to mechanical failures that caused robots to stop moving.
Segregation in Swarms of e-puck Robots Based On the Brazil Nut Effect
Jianing Chen, Melvin Gauci, Michael J. Price, Roderich Groß
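The three motion components described above combine by vector addition; a toy 2-D sketch with illustrative gains, not the paper's calibrated controller:

```python
import math

def motion_vector(pos, center, neighbours, size, rnd):
    """Sum of the three components: attraction to a centre point (gravity),
    random jitter (vibration), and size-scaled repulsion from nearby robots
    (disk collisions). `size` is the robot's virtual disk radius; `rnd` is a
    callable returning uniform samples in [0, 1)."""
    ax, ay = center[0] - pos[0], center[1] - pos[1]   # (i) attraction
    jx, jy = rnd() - 0.5, rnd() - 0.5                 # (ii) random motion
    rx = ry = 0.0
    for nx, ny in neighbours:                         # (iii) repulsion
        dx, dy = pos[0] - nx, pos[1] - ny
        d = math.hypot(dx, dy) or 1e-9
        if d < size:  # virtual disks overlap: push away, harder when closer
            rx += dx / d * (size - d)
            ry += dy / d * (size - d)
    return ax + jx + rx, ay + jy + ry
```

Because larger `size` values produce stronger, longer-range repulsion, robots mimicking larger disks are driven outward, yielding the annular segregation pattern.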


Modern model-driven engineering and Agent-Oriented Software Engineering (AOSE) methods are rarely utilized in developing robotic software. In this paper, we show how a Model-Driven AOSE methodology can be used for specifying the behavior of multi-robot teams. Specifically, the Agent Systems Engineering Methodology (ASEME) was used for developing the software that realizes the behavior of a physical robot team competing in the Standard Platform League of the RoboCup competition (the robot soccer world cup). The team consists of four humanoid robots, which play soccer autonomously in real time utilizing the on-board sensing, processing, and actuating capabilities, while communicating and coordinating with each other in order to achieve their common goal of winning the game. Our work focuses on the challenges of coordinating the base functionalities (object recognition, localization, motion skills) within each robot (intra-agent control) and coordinating the activities of the robots towards a desired team behavior (inter-agent control). We discuss the difficulties we faced and present the solutions we gave to a number of practical issues, which, in our view, are inherent in applying any AOSE methodology to robotics. We demonstrate the added value of using an AOSE methodology in the development of robotic systems, as ASEME allowed for a platform-independent team behavior specification, automated a large part of the code generation process, and reduced the total development time.
Model-Driven Behavior Specification for Robotic Teams
Alexandros Paraschos, Nikolaos Spanoudakis, Michail Lagoudakis

Session 4B – Agent Societies


We propose a novel method for assessing the reputation of agents in multiagent systems that is capable of exploiting the structure and semantics of rich agent interaction protocols and agent communication languages. Our method is based on using so-called \emph{conversation models}, i.e. succinct, qualitative models of agents' behaviours derived from the application of data mining techniques on protocol execution data in a way that takes advantage of the semantics of inter-agent communication available in many multiagent systems. Contrary to existing systems, which only allow for querying agents regarding their assessment of others' reputation in an \emph{outcome-based} way (often limited to distinguishing between "successful" and "unsuccessful" interactions), our method allows for contextualised queries regarding the structure of past interactions, the values of content variables, and the behaviour of agents across different protocols. Moreover, this is achieved while preserving maximum privacy for the reputation querying agent and the witnesses queried, and without requiring a common definition of reputation, trust or reliability among the agents exchanging reputation information. A case study shows that, even with relatively simple reputation measures, our qualitative method outperforms quantitative approaches, proving that we can meaningfully exploit the additional information afforded by rich interaction protocols and agent communication semantics.
A qualitative reputation system for multiagent systems with protocol-based communication
Emilio Serrano, Michael Rovatsos, Juan Botia

Session 4C – Argumentation & Negotiation


This contribution presents a practical extension of a theoretical model for multi-agent planning based upon DeLP, an argumentation-based defeasible logic. Our framework, named DeLP-MAPOP, is implemented on a platform for open multi-agent systems and has been experimentally tested, among others, in applications of ambient intelligence in the field of health-care. DeLP-MAPOP is based on a multi-agent partial order planning paradigm in which agents have diverse abilities, use an argumentation-based defeasible reasoning to support their own beliefs and refute the beliefs of the others according to their knowledge during the plan search process. The requirements of Ambient Intelligence (AmI) environments, characterized by the imperfect nature of context information and the heterogeneity of the involved agents, make defeasible argumentation an ideal approach to resolve potential conflicts caused by the contradictory information coming from the ambient agents. Moreover, the ability to build a course of action to achieve the user's needs is also a key capability demanded of AmI systems. DeLP-MAPOP proves to be an adequate approach to tackle AmI problems, as it combines in a single framework the ability of planning with the ability of agents to put forward arguments that support or challenge the accuracy, unambiguity and reliability of context-aware information.
Defeasible Argumentation for Multi-Agent Planning in Ambient Intelligence Applications
Sergio Pajares Ferrando, Eva Onaindia

Session 4D – Economies & Markets II

Session 4E – Game Theory IV


We consider multi-player games, and the guarantees that a master player that plays on behalf of a set of players can offer them, without making any assumptions on the rationality of the other players. Our model consists of an $(n+1)$-player game, with $m$ strategies per player, in which a \emph{master} player $M$ forms a coalition with nontransferable utilities among $n$ players, and the remaining player is called the {\em independent} player. Existentially, it is shown that every game admits a \emph{product-minimax-safe} strategy for $M$ -- a strategy that guarantees for every player in $M$'s coalition an expected value of at least her \emph{product minimax value} (which is at least as high as her minimax value and is often higher). Algorithmically, for any given vector of values for the players, one can decide in polytime whether it can be ensured by $M$, and if so, compute a mixed strategy that guarantees it. In symmetric games, a product minimax strategy for $M$ can be computed efficiently, even without being given the safety vector. We also consider the performance guarantees that $M$ can offer his players in repeated settings. Our main result here is the extension of the oblivious setting of Feldman, Kalai and Tennenholtz, showing that in every symmetric game, a master player who never observes a single payoff can guarantee for each of its players a {\em similar} performance to that of the independent player, even if the latter gets to choose the payoff matrix after the fact.
Mastering multi-player games
Yossi Azar, Uriel Feige, Michal Feldman, Moshe Tennenholtz


We analytically study the role played by the network topology in sustaining cooperation in a society of myopic agents in an evolutionary setting. In our model, each agent plays the Prisoner's Dilemma (PD) game with its neighbours, as specified by a network. Cooperation is the incumbent strategy, whereas defectors are the mutants. Starting with a population of cooperators, some agents are switched to defection. The agents then play the PD game with their neighbours and compute their fitness. After this, an evolutionary rule, or imitation dynamic is used to update the agent strategy. A defector switches back to cooperation if it has a cooperator neighbour with higher fitness. The network is said to sustain cooperation if almost all defectors switch to cooperation. Earlier work on the sustenance of cooperation has largely consisted of simulation studies, and we seek to complement this body of work by providing analytical insight for the same.
We find that in order to sustain cooperation, a network should satisfy some properties such as small average diameter, densification, and irregularity. Real-world networks have been empirically shown to exhibit these properties, and are thus candidates for the sustenance of cooperation. We also analyze some specific graphs to determine whether or not they sustain cooperation. In particular, we find that scale-free graphs belonging to a certain family sustain cooperation, whereas Erdos-Renyi random graphs do not. To the best of our knowledge, ours is the first analytical attempt to determine which networks sustain cooperation in a population of myopic agents in an evolutionary setting.
Sustaining Cooperation on Networks: An Analytical Study based on Evolutionary Game Theory
Raghunandan Ananthasayanam, Subramanian Chandrasekarapuram
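One synchronous round of the imitation dynamic described above can be sketched as follows (a toy version; the computation of fitness from the PD payoffs against neighbours is omitted and passed in directly):

```python
def imitation_step(graph, strategy, fitness):
    """One round of the imitation dynamic: each defector ('D') switches back
    to cooperation ('C') if some cooperating neighbour has strictly higher
    fitness. `graph` maps each node to its list of neighbours."""
    new = dict(strategy)
    for node, nbrs in graph.items():
        if strategy[node] == 'D' and any(
                strategy[n] == 'C' and fitness[n] > fitness[node]
                for n in nbrs):
            new[node] = 'C'
    return new
```

The network sustains cooperation when iterating this step from a mostly-cooperative initial state drives almost all defectors back to 'C'.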

Session 4F – Logics for Agency

Session 5A – Robotics III

Session 5B – Teamwork II


This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogenous, in the sense of utilising different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in \emph{ad hoc team} problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often been focussed on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Also, those algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which are adapting their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criteria, e.g., social welfare vs. attainment of equilibrium.
Comparative Evaluation of MAL Algorithms in a Diverse Set of Ad Hoc Team Problems
Stefano Albrecht, Subramanian Ramamoorthy

Session 5C – Emergence


In this paper we present an approach for improving the accuracy of shared opinions in a large decentralised team. Specifically, our solution optimises the opinion sharing process in order to help the majority of agents to form the correct opinion about a state of a common subject of interest, given only few agents with noisy sensors in the large team. We build on existing research that has examined models of this opinion sharing problem and shown the existence of optimal parameters where incorrect opinions are filtered out during the sharing process. In order to exploit this collective behaviour in complex networks, we present a new decentralised algorithm that allows each agent to gradually regulate the importance of its neighbours' opinions (their social influence). This leads the system to the optimised state in which agents are most likely to filter incorrect opinions, and form a correct opinion regarding the subject of interest. Crucially, our algorithm is the first that does not introduce additional communication over the opinion sharing itself. Using it, 80-90% of the agents form the correct opinion, in contrast to 60-75% with the existing message-passing algorithm DACOR proposed for this setting. Moreover, our solution is adaptive to the network topology and scales to thousands of agents. Finally, the use of our algorithm allows agents to significantly improve their accuracy even when deployed by only half of the team.
Efficient Opinion Sharing in Large Decentralised Teams
Oleksandr Pryymak, Alex Rogers, Nick Jennings


In recent years, social networking sites and social media have become a very important part of people's lives, driving everything from family relationships to revolutions. In this work, we study the different patterns of interaction behavior seen in an online social network. We investigate the difference in the relative time people allocate to their friends versus that which their friends allocate to them, and propose a measure for this difference in time allocation. The distribution of this measure is used to identify classes of social agents through agglomerative hierarchical clustering. These classes are then characterized in terms of two important structural attributes: degree distributions and clustering coefficients.
We demonstrate our approach on two large social networks obtained from Facebook. For each network we have the list of all social interactions that took place over six months. The total number of people in the two networks is 939,453, and 841,456 with 1.4 million and 8.4 million interactions, respectively. Our results show that, based on the interaction behavior, there are four main classes of agents in both networks, and that they are consistent across the two networks. Furthermore, each class is characterized by a specific profile of degree distributions and clustering coefficients, which are also consistent across both networks.
We speculate that agents corresponding to the four classes play different roles in the social network. To test this, we developed an opinion propagation model where opinions are represented as m-bit strings communicated from agent to agent. An agent receiving an opinion then selectively modifies its own opinions depending on the social and informational value it places upon communications from the sending agent, its overall agreement with the sending agent, and its own propensities. Opinions are injected into the system by agents of specific classes and their spread is tracked by propagating tags. The resulting data is used to analyze the influence of agents from each class in the viral spread of ideas under various conditions. The analysis also shows which behavioral factors at the agent level have the most significant impact on the spreading of ideas.
Agents of Influence in Social Networks
Amer Ghanem, Srinivasa Vedanarayanan, Ali Minai

Session 5D – Auction & Mechanism Design


Scoring rules for eliciting expert predictions of random variables are usually developed assuming that experts derive utility only from the quality of their predictions. We study more realistic settings in which (a) the principal is a decision maker who takes a decision based on the expert's prediction; and (b) the expert has an inherent \emph{interest} in the decision. Not surprisingly, in such situations, the expert usually has an incentive to misreport her forecast to influence the choice of the decision maker. We develop a general model for this setting and introduce the concept of a \emph{compensation rule}. When combined with the expert's inherent utility for decisions, a compensation rule induces a \emph{net scoring rule} that behaves like a traditional scoring rule. Assuming full knowledge of expert utility, we provide a complete characterization of all (strictly) proper compensation rules. We then analyze the case when the expert's utility function is not fully known to the decision maker. We show bounds on: (a) expert incentive to misreport; (b) the degree to which an expert will misreport; and (c) decision maker loss in utility due to such uncertainty. These bounds depend in natural ways on the degree of uncertainty, the local degree of convexity of net scoring function, and properties of the decision maker's utility function. Finally, we briefly discuss the use of compensation rules in prediction markets.
Eliciting Forecasts from Self-interested Experts: Scoring Rules for Decision Makers
Craig Boutilier
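For reference, a traditional strictly proper scoring rule, the kind of behavior a compensation rule plus the expert's inherent utility is meant to induce as a net scoring rule, looks like this for a binary event (a quadratic/Brier-type rule; truthful reporting maximizes expected score):

```python
def quadratic_score(q, outcome):
    """Quadratic (Brier-type) score for reporting probability q in [0, 1]
    when the binary outcome (0 or 1) is realized."""
    return 1.0 - (outcome - q) ** 2

def expected_score(q, p):
    """Expected score of reporting q when the true probability is p.
    For a strictly proper rule this is uniquely maximized at q = p."""
    return p * quadratic_score(q, 1) + (1 - p) * quadratic_score(q, 0)
```

The paper's setting is harder precisely because the expert's payoff adds a decision-dependent utility term on top of such a score, which can destroy propriety unless the compensation rule corrects for it.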


Many important problems in multiagent systems involve the allocation of multiple resources among the agents. For resource allocation problems, the well-known VCG mechanism satisfies a list of desired properties, including efficiency, strategy-proofness, individual rationality, and the non-deficit property. However, VCG is generally not budget-balanced. Under VCG, agents pay the VCG payments, which reduces social welfare. To offset the loss of social welfare due to the VCG payments, VCG redistribution mechanisms were introduced. These mechanisms aim to redistribute as much of the VCG payments back to the agents as possible, while maintaining the aforementioned desired properties of the VCG mechanism.
We continue the search for worst-case optimal VCG redistribution mechanisms -- mechanisms that maximize the fraction of total VCG payment redistributed in the worst case. Previously, a worst-case optimal VCG redistribution mechanism (denoted by WCO) was characterized for multi-unit auctions with nonincreasing marginal values. Later, WCO was generalized to settings involving heterogeneous items, resulting in the HETERO mechanism. It was \emph{conjectured} that HETERO is feasible and worst-case optimal for heterogeneous-item auctions with unit demand. In this paper, we propose a more natural way to generalize the WCO mechanism. We prove that our generalized mechanism, though represented differently, actually coincides with HETERO. Based on this new representation of HETERO, we prove that HETERO is indeed feasible and worst-case optimal in heterogeneous-item auctions with unit demand. Finally, we conjecture that HETERO remains feasible and worst-case optimal in the even more general setting of combinatorial auctions with gross substitutes.
Worst-Case Optimal Redistribution of VCG Payments in Heterogeneous-Item Auctions with Unit Demand
Mingyu Guo
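The flavor of VCG redistribution can be seen in a much simpler mechanism than WCO or HETERO: the classic Bailey-Cavallo rebate for a multi-unit auction with unit-demand bidders. Each agent receives 1/n of the VCG revenue that would arise without that agent; since the rebate does not depend on the agent's own bid, strategy-proofness is preserved, and the rebate never exceeds the collected revenue. This is an illustrative baseline, not the mechanism analyzed in the paper.

```python
def vcg_revenue(bids, m):
    """Total VCG revenue for m identical units and unit-demand bidders:
    each of the top-m bidders pays the (m+1)-th highest bid."""
    b = sorted(bids, reverse=True)
    price = b[m] if m < len(b) else 0.0
    return min(m, len(b)) * price

def bailey_cavallo_rebates(bids, m):
    """Agent i receives 1/n of the VCG revenue computed without agent i.

    The rebate is independent of i's own bid, so truthful bidding
    remains a dominant strategy and the mechanism runs no deficit.
    """
    n = len(bids)
    return [vcg_revenue(bids[:i] + bids[i + 1:], m) / n for i in range(n)]
```

For bids [10, 8, 6, 4] and m = 2, VCG revenue is 2 x 6 = 12, while the rebates sum to 9, so 75% of the payment is returned in this instance; worst-case optimal mechanisms such as WCO improve this worst-case fraction.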

Session 5E – Game & Agent Theories


Boolean games are a compact and expressive class of games, based on propositional logic. However, Boolean games are computationally complex: checking for the existence of pure Nash equilibria in Boolean games is $\Sigma^p_2$-complete, and it is co-NP-complete to check whether a given outcome for a Boolean game is a pure Nash equilibrium. In this paper, we consider two possible avenues to tractability in Boolean games. First, we consider the development of alternative solution concepts for Boolean games. We introduce the notion of $k$-bounded Nash equilibrium, meaning that no agent can benefit from deviation by altering fewer than $k$ variables. After motivating and discussing this notion of equilibrium, we give a logical characterisation of a class of Boolean games for which $k$-bounded equilibria correspond to Nash equilibria. That is, we precisely characterise a class of Boolean games for which all Nash equilibria are in fact $k$-bounded Nash equilibria. Second, we consider classes of Boolean games for which computational problems related to Nash equilibria are easier than in the general setting. We first identify some restrictions on games that make checking for beneficial deviations by individual players computationally tractable, and then show that certain types of \emph{socially desirable} equilibria can be hard to compute even when the standard decision problems for Boolean games are easy. We conclude with a discussion of related work and possible future work.
Towards Tractable Boolean Games
Paul Dunne, Michael Wooldridge
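The $k$-bounded equilibrium check can be sketched directly: an outcome fails to be a $k$-bounded Nash equilibrium if some agent whose goal is unsatisfied can satisfy it by flipping at most $k$ of the variables it controls. The sketch below assumes binary goal-satisfaction utilities (no variable costs) and represents goals as Python predicates rather than propositional formulas; both are simplifying assumptions, not the paper's formal setup.

```python
from itertools import combinations

def is_k_bounded_equilibrium(goals, control, outcome, k):
    """Check whether `outcome` (a dict var -> bool) is a k-bounded
    Nash equilibrium.

    goals[i] is a predicate over valuations (agent i's goal formula);
    control[i] lists the variables agent i controls. An agent whose
    goal is already satisfied has no beneficial deviation.
    """
    for i, goal in goals.items():
        if goal(outcome):
            continue
        # Try every deviation that flips at most k controlled variables.
        for r in range(1, k + 1):
            for flips in combinations(control[i], r):
                dev = dict(outcome)
                for v in flips:
                    dev[v] = not dev[v]
                if goal(dev):
                    return False
    return True
```

For fixed $k$ this check enumerates only $O(|V|^k)$ deviations per agent instead of all $2^{|V|}$, which is the tractability gain the solution concept is aiming at.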

Session 5F – Logic and Verification


Copyright © 2012 IFAAMAS