<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="client.xsl" type="text/xsl"?>
<article article-type="other">
<front>
<journal-meta>
<journal-id/>
<issn/>
<banner>
<!--<href>banner.jpg</href>-->
<size width="100%"/>
</banner>
</journal-meta>
<article-meta>
<title-group>
<article-title>Model-Based Function Approximation in Reinforcement Learning</article-title></title-group>

<author><a href="mailto:nkj@cs.utexas.edu"><name>Nicholas K. Jong</name></a></author>
<aff>The University of Texas at Austin, 1 University Station C0500, Austin, Texas 78712-0233</aff>

<author><a href="mailto:pstone@cs.utexas.edu"><name>Peter Stone</name></a></author>
<aff>The University of Texas at Austin, 1 University Station C0500, Austin, Texas 78712-0233</aff>
</article-meta></front>
<body>
<abstract>
<title>ABSTRACT</title>
<p>Reinforcement learning promises a generic method for adapting
agents to arbitrary tasks in arbitrary stochastic environments,
but applying it to new real-world problems remains
difficult, a few impressive success stories notwithstanding.
Most interesting agent-environment systems have large state
spaces, so performance depends crucially on efficient generalization
from a small amount of experience. Current algorithms
rely on model-free function approximation, which
estimates the long-term values of states and actions directly
from data and assumes that actions have similar values in
similar states. This paper proposes model-based function
approximation, which combines two forms of generalization
by assuming that in addition to having similar values in similar
states, actions also have similar effects. For one family of
generalization schemes known as averagers, computing
an approximate value function from an approximate model
is shown to be equivalent to computing the exact
value function for a finite model derived from data. This
derivation both integrates two independent sources of generalization
and permits the extension of model-based techniques
developed for finite problems. Preliminary experiments
with a novel algorithm, AMBI (Approximate Models
Based on Instances), demonstrate that this approach yields
faster learning on some standard benchmark problems than
many contemporary algorithms.</p>
</abstract>
<fpdf>
<href>pdflogo.jpg</href>
<hpdf>AAMAS07_0507_bdb14e0b95b4ae78b6befc637836ebf0</hpdf>
</fpdf>
</body>
</article>

