There is a real need among researchers and students for pedagogical resources. In France, information retrieval techniques have been developed, for example in the Doc'CISMeF web site. As in PubMed, documents are indexed with (French) MeSH terms. One of the problems revealed by quality studies is the mismatch between user queries and the MeSH controlled vocabulary. Moreover, French (like Greek or Spanish) poses specific indexing problems because of its diacritic characters.
In this article, we present the Grepator project. Its main goal is to convert any thesaurus (or any entry) to mixed case with accented characters, for a specific domain. Grepator must also complete MeSH terms according to their usual form in natural language and, finally, correct user spelling mistakes. Grepator is based on a statistical approach. A large French medical corpus was built from pedagogical resources indexed in CISMeF. Using regular expressions, Grepator searches this corpus for the most usual way to spell each word. With this method, seventy-five percent of MeSH terms are found in the corpus, with less than one mistake per hundred words. We analyze this first evaluation of the tool and discuss further steps that might be developed.
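The corpus lookup described above can be illustrated with a minimal sketch. It is not Grepator's actual implementation, which the abstract does not detail. It assumes that each unaccented letter of a term is expanded into a character class of its French accented variants, and that the most frequent matching form in the corpus is kept. The names `VARIANTS`, `term_pattern`, and `most_usual_form` are hypothetical.

```python
import re
from collections import Counter

# Assumed mapping from a base letter to its French accented variants
# (illustrative only; the abstract does not list the characters handled).
VARIANTS = {
    "a": "aàâä",
    "c": "cç",
    "e": "eéèêë",
    "i": "iîï",
    "o": "oôö",
    "u": "uùûü",
}


def term_pattern(term):
    """Build a case-insensitive regex matching any accented spelling of term."""
    parts = []
    for ch in term.lower():
        if ch in VARIANTS:
            parts.append("[" + VARIANTS[ch] + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


def most_usual_form(term, corpus):
    """Return the most frequent spelling of term in corpus, or None if absent."""
    counts = Counter(m.group(0) for m in term_pattern(term).finditer(corpus))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy corpus standing in for the CISMeF pedagogical resources.
corpus = "L'hépatite virale est fréquente. Une hépatite aiguë. HEPATITE."
print(most_usual_form("HEPATITE", corpus))  # prints "hépatite"
```

In this sketch a term with no match in the corpus yields `None`, which loosely corresponds to the share of MeSH terms that the method does not find.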