

We address the need for focused retrieval and integration of biomedical literature by enabling improved subject annotation of documents with domain concepts for which no manual labels are available. To this end, we propose a novel zero-shot method, called PN Relabeler, that improves heuristic concept-level annotations by relabeling the documents Predicted as Negative (PN) by the concept occurrence (CO) heuristic. CO has proven to be a good heuristic for fine-grained semantic indexing (FGSI) of biomedical literature: it is quite precise but still misses some document labels, i.e., it suffers from lower recall. PN Relabeler addresses this problem by introducing a novel approach that combines heuristic annotations for unseen labels with knowledge learned from past labels. It first tackles the intermediate task of zero-shot relabeling of documents labeled with CO-based FGSI annotations. It then builds upon the power of domain-specific deep pretrained language models to improve the recall of the heuristic annotation, i.e., to reduce the rate of false negatives. The results reveal that relabeling with PN Relabeler improves the micro-F1 by more than 6% compared to CO annotations and by 3% compared to state-of-the-art CO-based FGSI methods. This highlights the potential of learning from the errors of a strong heuristic on concepts for which manual labels are already available. This knowledge is useful for improving the heuristic FGSI labels of new, unseen concepts without manual labels in a zero-shot scenario.
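
To make the setting concrete, the following is a minimal Python sketch of the two stages described above: CO labeling, followed by relabeling of only the (document, concept) pairs that CO predicted as negative. It is an illustration under stated assumptions, not the authors' implementation. The function names, the toy documents, the concept identifiers `C1` and `C2`, the decision threshold, and in particular `toy_scorer` are all hypothetical; in the actual method the scoring role would be played by a domain-specific pretrained language model trained on concepts with manual labels.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (document id, concept id)


def co_labels(docs: Dict[str, str], concept_terms: Dict[str, List[str]]) -> Set[Pair]:
    """Concept occurrence (CO): a document is labeled with a concept
    if any of the concept's terms literally occurs in its text."""
    positives: Set[Pair] = set()
    for doc_id, text in docs.items():
        for concept, terms in concept_terms.items():
            if any(re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)
                   for t in terms):
                positives.add((doc_id, concept))
    return positives


def relabel_pn(docs: Dict[str, str], concept_terms: Dict[str, List[str]],
               co_pos: Set[Pair], scorer: Callable[[str, str], float],
               threshold: float = 0.5) -> Set[Pair]:
    """Keep all CO positives and revisit only the pairs CO predicted as
    negative (PN), flipping those that the scorer rates above the threshold."""
    final = set(co_pos)
    for doc_id, text in docs.items():
        for concept in concept_terms:
            if (doc_id, concept) not in co_pos and scorer(text, concept) >= threshold:
                final.add((doc_id, concept))
    return final


def micro_f1(pred: Set[Pair], gold: Set[Pair]) -> float:
    """Micro-averaged F1 over all (document, concept) pairs."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): d2 is about concept C1 but never uses its full name.
docs = {
    "d1": "Patients with Duchenne muscular dystrophy were enrolled.",
    "d2": "A cohort with DMD showed progressive weakness.",
    "d3": "Becker muscular dystrophy is usually milder.",
}
concept_terms = {
    "C1": ["Duchenne muscular dystrophy"],
    "C2": ["Becker muscular dystrophy"],
}
gold = {("d1", "C1"), ("d2", "C1"), ("d3", "C2")}


def toy_scorer(text: str, concept: str) -> float:
    """Hypothetical stand-in for a learned model; not a real relabeler."""
    return 1.0 if concept == "C1" and "DMD" in text else 0.0


co = co_labels(docs, concept_terms)
relabeled = relabel_pn(docs, concept_terms, co, toy_scorer)
print(round(micro_f1(co, gold), 3), round(micro_f1(relabeled, gold), 3))
```

On this toy data, CO is fully precise but misses the pair (`d2`, `C1`), and relabeling the PN pairs recovers it. Restricting the second stage to PN pairs is one way to preserve the precision of the heuristic while targeting its recall; the numbers here are purely illustrative and unrelated to the reported results.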