An Unsupervised Approach to Structuring and Analyzing Repetitive Semantic Structures in Free Text of Electronic Medical Records

Koshman, Varvara; Funkner, Anastasia; Kovalchuk, Sergey

doi:10.3233/SHTI210579

Pages

94 - 99

Category

Research Article

Series

Studies in Health Technology and Informatics

Ebook

Volume 285: pHealth 2021

Abstract

Electronic Medical Records (EMRs) contain a lot of valuable data about patients, but this data is unstructured. Labeled medical text data in Russian is scarce, and there are no tools for automatic annotation. We present an unsupervised approach to medical data annotation. Morphological and syntactic analyses of the initial sentences produce syntactic trees; similar subtrees are then grouped using Word2Vec and labeled using dictionaries and Wikidata categories. This method can be used to automatically label EMRs in Russian, and the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabularies.
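
The grouping and labeling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the toy three-dimensional vectors stand in for trained Word2Vec embeddings, the greedy cosine-similarity grouping and its threshold stand in for whatever grouping procedure the paper uses, the small `TOY_DICTIONARY` stands in for the dictionaries and Wikidata categories, and English tokens replace the Russian EMR text. Subtrees are represented simply as tuples of tokens, as if they had already been extracted from syntactic trees.

```python
import math
from collections import Counter

# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
TOY_VECTORS = {
    "pain": [0.9, 0.1, 0.0],
    "ache": [0.8, 0.2, 0.0],
    "chest": [0.7, 0.3, 0.1],
    "head": [0.7, 0.2, 0.2],
    "takes": [0.0, 0.9, 0.1],
    "aspirin": [0.1, 0.8, 0.3],
    "ibuprofen": [0.1, 0.8, 0.2],
}

# Hypothetical dictionary standing in for domain dictionaries / Wikidata categories.
TOY_DICTIONARY = {
    "pain": "symptom",
    "ache": "symptom",
    "aspirin": "medication",
    "ibuprofen": "medication",
}


def subtree_vector(tokens):
    """Average the vectors of the known tokens in one subtree."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_subtrees(subtrees, threshold=0.95):
    """Greedy grouping: a subtree joins the first group whose seed vector is
    at least `threshold` similar; otherwise it starts a new group."""
    groups = []  # list of (seed_vector, members)
    for tokens in subtrees:
        vec = subtree_vector(tokens)
        if vec is None:
            continue  # no known tokens, nothing to compare
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(tokens)
                break
        else:
            groups.append((vec, [tokens]))
    return [members for _, members in groups]


def label_group(members):
    """Label a group with the most frequent dictionary category of its tokens."""
    categories = Counter(
        TOY_DICTIONARY[t] for tokens in members for t in tokens if t in TOY_DICTIONARY
    )
    return categories.most_common(1)[0][0] if categories else None


subtrees = [
    ("chest", "pain"),
    ("head", "ache"),
    ("takes", "aspirin"),
    ("takes", "ibuprofen"),
]
groups = group_subtrees(subtrees)
labels = [label_group(g) for g in groups]
```

With these toy inputs, the two complaint-like subtrees end up in one group and the two medication-like subtrees in another. In a real setting the vectors would come from a Word2Vec model trained on the EMR corpus, and the subtrees from a morphological and syntactic parser.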
