Word Sense Disambiguation of Medical Terms via Recurrent Convolutional Neural Networks

Festag, Sven; Spreckelsen, Cord

doi:10.3233/978-1-61499-759-7-8

Abstract

Background: Tagging text data with codes representing biomedical concepts plays an important role in medical data management and analysis. Problems arise when a word is ambiguous, that is, when it is linked to several concepts.

Objectives and Methods: This study investigates word sense disambiguation based on word embeddings and recurrent convolutional neural networks. It focuses on terms that are mapped to multiple concepts of the Unified Medical Language System (UMLS).

Results: We created 20 text processing pipelines trained on a subset of the MeSH Word Sense Disambiguation (MSH WSD) data set, each pipeline disambiguating the sense of one word. The pipelines were then tested on a disjoint subset of the MSH WSD data. Most pipelines achieved good or even excellent results (70% of the pipelines achieved at least 90% accuracy, 40% achieved at least 98% accuracy). One poorly performing outlier was detected.
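
The threshold shares reported above are simple counts over per-pipeline test accuracies. The following minimal sketch shows how such a summary could be computed. The function name `share_at_least`, the term labels, and the accuracy values are hypothetical placeholders for illustration only, not the study's code or results.

```python
from typing import Dict


def share_at_least(accuracies: Dict[str, float], threshold: float) -> float:
    """Return the fraction of pipelines whose test accuracy is >= threshold."""
    if not accuracies:
        raise ValueError("no pipelines to summarize")
    hits = sum(1 for acc in accuracies.values() if acc >= threshold)
    return hits / len(accuracies)


# Placeholder accuracies, one per ambiguous term (assumed values,
# NOT taken from the study).
example = {"term_a": 0.99, "term_b": 0.95, "term_c": 0.91, "term_d": 0.62}

print(share_at_least(example, 0.90))  # share of pipelines at >= 90% accuracy
print(share_at_least(example, 0.98))  # share of pipelines at >= 98% accuracy
```

Applied to 20 pipelines, a share of 70% corresponds to 14 pipelines and a share of 40% to 8 pipelines.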

Conclusion: The proposed approach can serve as a basis for a scaled-up system combining pipelines for many ambiguous words. The methods used here have recently proved very successful in other fields of text understanding and can be expected to scale up with improved availability of training data.
