Finding relevant information in the biomedical literature increasingly depends on efficient information retrieval (IR) algorithms. Cross-Encoders, Sentence-BERT, and ColBERT are algorithms based on pre-trained language models that use nuanced but computable vector representations of search queries and documents for IR applications. Here we investigate how well these vectorization algorithms estimate the relevance labels of biomedical documents for search queries, using the OHSUMED dataset. For our evaluation, we compared the computed scores to the provided labels using boxplots and Spearman’s rank correlations. According to these metrics, Sentence-BERT moderately outperformed the alternative vectorization algorithms, and additional fine-tuning on a subset of OHSUMED labels yielded little additional benefit. Future research might aim to develop a larger dedicated dataset in order to optimize such methods more systematically, and to evaluate the corresponding functions in IR tools with end-users.
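The comparison of computed scores with provided labels can be illustrated with a minimal Python sketch of Spearman’s rank correlation. It is not the authors’ code: the function names `average_ranks` and `spearman`, the 0/1/2 label coding, and the toy scores are assumptions made purely for illustration, and tied values are handled with average ranks, which is one common convention.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over all entries tied with the current value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: one relevance label per query-document pair
# (assumed coding: 0 = not relevant, 1 = possibly, 2 = definitely relevant)
# and one model similarity score for the same pair.
labels = [0, 0, 1, 1, 2, 2]
scores = [0.12, 0.35, 0.30, 0.55, 0.61, 0.80]

rho = spearman(scores, labels)
print(f"Spearman's rho: {rho:.3f}")
```

Because the labels are ordinal and heavily tied, a rank-based coefficient is a natural choice here: it rewards a model for ordering documents consistently with the labels without requiring its scores to lie on the same scale as the labels.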