Background: Tagging text data with codes representing biomedical concepts plays an important role in medical data management and analysis. The task becomes difficult when an ambiguous word is linked to several concepts.
Objectives and Methods: This study investigates word sense disambiguation based on word embeddings and recurrent convolutional neural networks. It focuses on terms that map to multiple concepts of the Unified Medical Language System (UMLS).
Results: We created 20 text processing pipelines, each trained on a subset of the MeSH Word Sense Disambiguation (MSH WSD) data set to disambiguate the sense of one word. The pipelines were then tested on a disjoint subset of the MSH WSD data. Most pipelines achieved good or even excellent results (70% of the pipelines achieved at least 90% accuracy, 40% achieved at least 98% accuracy). One pipeline was a poorly performing outlier.
Conclusion: The proposed approach can serve as a basis for a scaled-up system that combines pipelines for many ambiguous words. The methods used here have recently proved very successful in other fields of text understanding and can be expected to scale up as more training data becomes available.
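The one-pipeline-per-ambiguous-word design described in the Results, with training and testing on disjoint subsets, might be organized roughly as in the following minimal sketch. It is an illustration under stated assumptions, not the study's implementation: a small naive Bayes classifier over context words stands in for the word-embedding and recurrent convolutional network model, and the example word, the sense labels, the sentences, and all names (`SenseClassifier`, `accuracy`, `share_reaching`) are hypothetical rather than taken from MSH WSD or UMLS.

```python
import math
from collections import Counter, defaultdict


class SenseClassifier:
    """One pipeline for one ambiguous word.

    Assumption: naive Bayes over context tokens, used here only as a
    lightweight stand-in for the embedding + RCNN model in the abstract.
    """

    def __init__(self):
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        # examples: iterable of (context_text, sense_label) pairs
        for text, sense in examples:
            self.sense_counts[sense] += 1
            for tok in text.lower().split():
                self.word_counts[sense][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total = sum(self.sense_counts.values())

        def score(sense):
            n = sum(self.word_counts[sense].values())
            s = math.log(self.sense_counts[sense] / total)
            for tok in tokens:
                # Laplace smoothing so unseen tokens do not zero out a sense
                count = self.word_counts[sense][tok] + 1
                s += math.log(count / (n + len(self.vocab)))
            return s

        return max(self.sense_counts, key=score)


def accuracy(clf, examples):
    """Fraction of held-out examples whose sense is predicted correctly."""
    hits = sum(clf.predict(text) == sense for text, sense in examples)
    return hits / len(examples)


def share_reaching(accuracies, threshold):
    """Share of pipelines whose accuracy is at least `threshold`."""
    return sum(a >= threshold for a in accuracies.values()) / len(accuracies)


# Hypothetical toy data: disjoint train and test sets per ambiguous word.
train = {
    "cold": [
        ("patient exposed to cold temperature outdoors", "temperature"),
        ("cold weather and low temperature exposure", "temperature"),
        ("cold with cough and runny nose", "infection"),
        ("common cold virus infection with cough", "infection"),
    ],
}
test = {
    "cold": [
        ("low temperature cold exposure", "temperature"),
        ("cold virus causing cough", "infection"),
    ],
}

# One independently trained pipeline per ambiguous word, looked up by word.
pipelines = {word: SenseClassifier().fit(exs) for word, exs in train.items()}
acc = {word: accuracy(pipelines[word], test[word]) for word in pipelines}
print(acc, share_reaching(acc, 0.90))
```

Because each word gets its own independently trained classifier, adding a further ambiguous word only means adding another entry to `pipelines`, which is the sense in which such a design could be combined into a larger system.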