<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="client.xsl" type="text/xsl"?>
<article article-type="other">
<front>
<journal-meta>
<journal-id/>
<issn/>
<banner>
<!--<href>banner.jpg</href>-->
<size width="100%"/>
</banner>
</journal-meta>
<article-meta>
<title-group>
<article-title>Batch Reinforcement Learning in a Complex Domain</article-title>
</title-group>

<author><a href="mailto:shivaram@cs.utexas.edu"><name>Shivaram Kalyanakrishnan</name></a></author>
<aff>Department of Computer Sciences <br/>The University of Texas at Austin
</aff>

<author><a href="mailto:pstone@cs.utexas.edu"><name>Peter Stone</name></a></author>
<aff>Department of Computer Sciences <br/>The University of Texas at Austin
</aff>


</article-meta></front>
<body>
<abstract>
<title>ABSTRACT</title>
<p>Temporal difference reinforcement learning algorithms are
well suited to autonomous agents because they learn
directly from an agent's experience based on sequential actions
in the environment. However, their most common algorithmic
variants are relatively inefficient in their use of
experience data, which in many agent-based settings can be
scarce. In particular, they make just one learning "update"
for each atomic experience. Batch reinforcement learning
algorithms, on the other hand, aim to achieve greater data
efficiency by saving experience data and using it in aggregate
to make updates to the learned policy. Their success has
been demonstrated in the past on simple domains like grid
worlds and low-dimensional control applications like pole
balancing. In this paper, we compare and contrast batch
reinforcement learning algorithms with on-line algorithms
based on their empirical performance in a complex, continuous,
noisy, multiagent domain, namely RoboCup soccer
Keepaway. We find that the two batch methods we consider,
Experience Replay and Fitted Q Iteration, both yield
significant gains in sample efficiency while achieving high
asymptotic performance.</p>
</abstract>
<fpdf>
<href>pdflogo.jpg</href>
<hpdf>AAMAS07_0271_a9e2d13da1aa1443ef8a35cdca6df501</hpdf>
</fpdf>
</body>
</article>

