Sharing Models and Tools for Processing German Clinical Texts

Hellrich, Johannes; Matthies, Franz; Faessler, Erik; Hahn, Udo

doi:10.3233/978-1-61499-512-8-734

Abstract

The automatic processing of non-English clinical documents is massively hampered by the lack of publicly available medical language resources for training, testing and evaluating NLP components. We suggest sharing statistical models derived from access-protected clinical documents as a reasonable substitute and provide solutions for sentence splitting, tokenization and POS tagging of German clinical texts. These three components were trained on the confidential FRAMED corpus, a non-sharable collection of various German-language clinical document types. The models derived therefrom outperform alternative components from OPENNLP and the Stanford POS tagger, also trained on FRAMED.
