This paper presents five document retrieval systems for a small (a few thousand documents), domain-specific corpus (weekly peer-reviewed medical journals published in French), as well as an evaluation methodology to quantify the models' performance. The proposed methodology does not rely on external annotations and can therefore be used as an ad hoc evaluation procedure for most document retrieval tasks. Statistical models and vector space models are compared empirically on a synthetic document retrieval task. For a dataset of this size and specificity, the statistical approaches consistently performed better than their vector space counterparts.