<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="client.xsl" type="text/xsl"?>
<article article-type="other">
<front>
<journal-meta>
<journal-id/>
<issn/>
<banner>
<!--<href>banner.jpg</href>-->
<size width="100%"/>
</banner>
</journal-meta>
<article-meta>
<title-group>
<article-title>A Globally Optimal Algorithm for TTD-MDPs</article-title>
</title-group>

<author><a href="mailto:sooraj@cc.gatech.edu"><name>Sooraj Bhat</name></a></author>
<aff>College of Computing<br/>Georgia Institute of Technology</aff>

<author><a href="mailto:robertsd@cc.gatech.edu"><name>David L. Roberts</name></a></author>
<aff>College of Computing<br/> Georgia Institute of Technology</aff>

<author><a href="mailto:mnelson@cc.gatech.edu"><name>Mark J. Nelson</name></a></author>
<aff>College of Computing<br/> Georgia Institute of Technology</aff>

<author><a href="mailto:isbell@cc.gatech.edu"><name>Charles L. Isbell</name></a></author>
<aff>College of Computing<br/> Georgia Institute of Technology</aff>

<author><a href="mailto:michaelm@cs.ucsc.edu"><name>Michael Mateas</name></a></author>
<aff>Department of Computer Science<br/> University of California-Santa Cruz</aff>
</article-meta></front>
<body>
<abstract>
<title>ABSTRACT</title>
<p>In this paper, we discuss the use of <italic>Targeted Trajectory Distribution
Markov Decision Processes</italic> (TTD-MDPs), a variant of MDPs in
which the goal is to realize a specified distribution of trajectories
through a state space, as a general agent-coordination framework.</p>
<p>We present several advances over previous work on TTD-MDPs.
We improve on the existing algorithm for solving TTD-MDPs by
deriving a greedy algorithm that finds a policy that provably minimizes
the global <italic>KL</italic>-divergence from the target distribution. We
test the new algorithm by applying TTD-MDPs to <italic>drama management</italic>,
where a system must coordinate the behavior of many agents
to ensure that a game follows a coherent storyline, is in keeping
with the author's desires, and offers a high degree of replayability.</p>
<p>Although we show that suboptimal greedy strategies will fail
in some cases, we validate previous work suggesting that they
can work well in practice. We also show that our new algorithm
provides guaranteed accuracy even in those cases, with little additional
computational cost. Further, we illustrate how this new
approach can be applied online, eliminating the memory-intensive
offline sampling required by the previous approach.</p>
</abstract>
<fpdf>
<href>pdflogo.jpg</href>
<hpdf>AAMAS07_0217_4c4356a20e38facc909085ac5729dc38</hpdf>
</fpdf>
</body>
</article>
