<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="client.xsl" type="text/xsl"?>
<article article-type="other">
<front>
<journal-meta>
<journal-id/>
<issn/>
<banner>
<!--<href>banner.jpg</href>-->
<size width="100%"/>
</banner>
</journal-meta>
<article-meta>
<title-group>
<article-title>A Reinforcement Learning based Distributed Search Algorithm For Hierarchical<br/> Peer-to-Peer Information Retrieval Systems</article-title>
</title-group>

<author><a href="mailto:hzhang@ist.psu.edu"><name>Haizheng Zhang</name></a></author>
<aff>College of Information Science and Technology<br/> Pennsylvania State University, University Park, PA 16803</aff>

<author><a href="mailto:lesser@cs.umass.edu"><name>Victor Lesser</name></a></author>
<aff>Department of Computer Science, University of Massachusetts
Amherst, MA 01003</aff>

</article-meta></front>
<body>
<abstract>
<title>ABSTRACT</title>
<p>The dominant existing routing strategies employed in
peer-to-peer (P2P) based information retrieval (IR) systems are
similarity-based approaches. In these approaches, agents
depend on the content similarity between incoming queries
and their direct neighboring agents to direct the distributed
search sessions. However, such a heuristic is myopic in that
the neighboring agents may not be connected to more relevant
agents. In this paper, an online reinforcement-learning
based approach is developed to take advantage of the dynamic
run-time characteristics of P2P IR systems as represented
by information about past search sessions. Specifically, agents maintain estimates on the downstream agents' abilities to provide relevant documents for incoming queries.
These estimates are updated gradually by learning from the
feedback information returned from previous search sessions.
Based on this information, the agents derive corresponding
routing policies. Thereafter, these agents route the queries
based on the learned policies and update the estimates based
on the new routing policies. Experimental results demonstrate
that the learning algorithm considerably improves the
routing performance on two test collection sets that have
been used in a variety of distributed IR studies.</p>
</abstract>
<fpdf>
<href>pdflogo.jpg</href>
<hpdf>AAMAS07_0359_3500bb99ce25c0f19ecabf81c0035d19</hpdf>
</fpdf>
</body>
</article>

