Hybrid Approach for Automatic Identification of Multi-Word Expressions in Lithuanian

Mandravickaitė, Justina; Rimkutė, Erika; Krilavičius, Tomas

doi:10.3233/978-1-61499-701-6-153

Abstract

Identification of multi-word expressions (MWEs) is one of the most challenging problems in Computational Linguistics and Natural Language Processing. A number of techniques are used to solve this problem in different languages, mostly English. However, not all techniques and approaches can be directly transferred to Lithuanian. Hence, in this paper we experiment with automatic identification of bi-gram MWEs for Lithuanian, which is considered to be under-resourced in terms of lexical resources and the availability or accuracy of special lexical tools (e.g., POS-taggers, parsers). We use a raw corpus and a combination of lexical association measures and supervised machine learning, an approach that has been shown to perform well for English and some other languages. Using this approach we have reached 70.4% precision for identification of typical MWEs, 77.1% precision for non-typical MWEs, as well as 60.0% and 81.6% precision for typical adjective + noun and noun + noun MWEs, respectively.
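
To illustrate the general idea of deriving lexical association measures from a raw corpus, the following minimal Python sketch computes two common measures for adjacent word pairs. It is not the authors' implementation: the abstract does not name the measures used, so the choice of pointwise mutual information (PMI) and the Dice coefficient, the function name bigram_association_features, and the simple probability estimates are all assumptions made for illustration. In a hybrid setup, scores like these could serve as features for a supervised classifier trained on bi-grams labelled as MWE or non-MWE.

```python
import math
from collections import Counter


def bigram_association_features(tokens):
    """Return {(w1, w2): {"pmi": float, "dice": float}} for adjacent token pairs.

    Hypothetical helper for illustration; PMI and Dice are assumed measures,
    not necessarily those used in the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs in the corpus

    features = {}
    for (w1, w2), c12 in bigrams.items():
        p12 = c12 / n_bi            # estimated probability of the pair
        p1 = unigrams[w1] / n_uni   # estimated probability of the first word
        p2 = unigrams[w2] / n_uni   # estimated probability of the second word
        features[(w1, w2)] = {
            # PMI: how much more often the pair occurs than expected by chance
            "pmi": math.log2(p12 / (p1 * p2)),
            # Dice: overlap between the pair count and the two word counts
            "dice": 2 * c12 / (unigrams[w1] + unigrams[w2]),
        }
    return features
```

Calling the function on a tokenized corpus returns one small feature dictionary per bi-gram. Because the counts come from surface tokens only, no POS-tagger or parser is needed, which matches the raw-corpus setting described above.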
