<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="client.xsl" type="text/xsl"?>
<article article-type="other">
<front>
<journal-meta>
<journal-id/>
<issn/>
<banner>
<!--<href>banner.jpg</href>-->
<size width="100%"/>
</banner>
</journal-meta>
<article-meta>
<title-group>
<article-title>Multiagent Learning in Adaptive Dynamic Systems</article-title>
</title-group>

<author><a href="mailto:burkov@damas.ift.ulaval.ca"><name>Andriy Burkov</name></a></author>
<aff>DAMAS Laboratory <br/>Laval University G1K 7P4, Quebec, Canada</aff>

<author><a href="mailto:chaib@damas.ift.ulaval.ca"><name>Brahim Chaib-draa</name></a></author>
<aff>DAMAS Laboratory <br/>Laval University G1K 7P4, Quebec, Canada</aff>
</article-meta></front>
<body>
<abstract>
<title>ABSTRACT</title>
<p>Classical approaches to multiagent policy learning
assumed that the agents, via interactions and/or by using
prior knowledge about the reward functions of all
players, would find an interdependent solution called an "equilibrium".
Recently, however, some researchers have questioned
the necessity and validity of equilibrium
as the most important multiagent solution concept.
They argue that a "good" learning algorithm is one that
is efficient with respect to a certain class of counterparts.
Adaptive players are an important class of agents that learn
their policies separately from maintaining beliefs
about their counterparts' future actions, and that make their decisions
based on that policy and the current belief. In this
paper, we propose an algorithm called Adaptive Dynamics
Learner (ADL), which learns efficiently in the presence of adaptive
counterparts. ADL learns a policy over the opponents' adaptive
dynamics rather than over simple actions and beliefs and, by doing so,
exploits these dynamics to obtain a higher utility than any equilibrium
strategy can provide. We tested our algorithm on a substantial,
representative set of the best-known and most illustrative
matrix games and observed that the ADL agent is highly efficient
against the Adaptive Play <italic>Q</italic>-learning (APQ) agent and the Infinitesimal
Gradient Ascent (IGA) agent. In self-play, when
possible, ADL can converge to a Pareto-optimal strategy
that maximizes the welfare of all players.</p>
</abstract>
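<p>The adaptive-player model described in the abstract (a policy learned separately from a belief about the opponent, combined only at decision time) can be illustrated with a minimal Python sketch. It assumes a stateless two-player matrix game, a belief formed in the style of Young's adaptive play (the empirical frequency of a random sample of k opponent actions drawn from a memory of the last m), and stateless Q-values over joint actions with epsilon-greedy exploration. The class AdaptivePlayer and all of its parameters are hypothetical illustrations, not the APQ or ADL agents used in the paper.</p>

```python
import random
from collections import Counter, deque


class AdaptivePlayer:
    """Hypothetical adaptive player for a stateless two-player matrix game.

    Beliefs about the opponent and action values are maintained separately
    and only combined when an action is chosen.
    """

    def __init__(self, my_actions, opp_actions, memory=10, sample=5,
                 alpha=0.1, epsilon=0.1, seed=0):
        self.my_actions = list(my_actions)
        self.opp_actions = list(opp_actions)
        self.history = deque(maxlen=memory)  # last m opponent actions
        self.sample = sample                 # k actions sampled from memory
        self.alpha = alpha                   # learning rate for the values
        self.epsilon = epsilon               # exploration probability
        # Stateless Q-values over joint actions (my action, opponent action)
        self.q = {(a, o): 0.0 for a in self.my_actions for o in self.opp_actions}
        self.rng = random.Random(seed)

    def belief(self):
        """Empirical distribution of a random sample of remembered opponent actions."""
        if not self.history:
            # No observations yet: assume a uniform belief
            return {o: 1.0 / len(self.opp_actions) for o in self.opp_actions}
        k = min(self.sample, len(self.history))
        counts = Counter(self.rng.sample(list(self.history), k))
        return {o: counts[o] / k for o in self.opp_actions}

    def expected_value(self, action, belief):
        """Value of an own action averaged over the believed opponent actions."""
        return sum(belief[o] * self.q[(action, o)] for o in self.opp_actions)

    def act(self, explore=True):
        """Best response to the current belief, with epsilon-greedy exploration."""
        if explore and self.epsilon > self.rng.random():
            return self.rng.choice(self.my_actions)
        b = self.belief()
        return max(self.my_actions, key=lambda a: self.expected_value(a, b))

    def update(self, my_action, opp_action, reward):
        """Update the value of the played joint action, then the belief memory."""
        key = (my_action, opp_action)
        self.q[key] += self.alpha * (reward - self.q[key])
        self.history.append(opp_action)


if __name__ == "__main__":
    # Assumed row-player prisoner's dilemma payoffs: C = cooperate, D = defect
    payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
    player = AdaptivePlayer(["C", "D"], ["C", "D"], seed=0)
    for _ in range(2000):
        a = player.act()
        # The opponent is a fixed, always-cooperating strategy
        player.update(a, "C", payoff[(a, "C")])
    print(player.belief(), player.act(explore=False))
```

<p>In the demonstration, the opponent always plays C under assumed prisoner's dilemma payoffs, so the belief concentrates on C and the greedy choice settles on the best response D. Because the belief is kept apart from the values, a change in the opponent's behavior affects decisions as soon as it enters the memory window, while the value of each joint action is learned independently.</p>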
<fpdf>
<href>pdflogo.jpg</href>
<hpdf>AAMAS07_0001_f8f8510f752a4a9bb356f0d287429aba</hpdf>
</fpdf>
</body>
</article>

