Identification of multiword expressions (MWEs) is one of the most challenging problems in computational linguistics and natural language processing. A number of techniques have been used to solve this problem in different languages, mostly English. However, not all techniques and approaches can be transferred directly to Lithuanian. Hence, in this paper we experiment with automatic identification of bigram MWEs for Lithuanian, which is considered under-resourced in terms of lexical resources and of the availability or accuracy of special lexical tools (e.g., POS taggers, parsers). We use a raw corpus and a combination of lexical association measures and supervised machine learning, an approach that has been shown to perform well for English and some other languages. Using this approach, we reached 70.4% precision for identification of typical MWEs and 77.1% precision for non-typical MWEs, as well as 60.0% and 81.6% precision for typical adjective + noun and noun + noun MWEs, respectively.
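The general pipeline summarized above can be sketched as follows. Candidate bigrams are counted in a raw, untagged corpus. Each bigram is scored with lexical association measures, and those scores become features for a supervised classifier, which is evaluated by precision, TP / (TP + FP). This is a minimal illustration under stated assumptions, not the authors' implementation. The three measures (PMI, Dice, t-score), the hand-rolled logistic regression, and every function name are hypothetical choices, because the abstract does not name the specific measures or learner.

```python
import math
from collections import Counter


def bigram_stats(tokens):
    """Count unigrams and adjacent bigrams in a tokenised raw corpus."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def association_features(pair, unigrams, bigrams):
    """Score one bigram with three common association measures.

    Returns [PMI, Dice, t-score]. This feature set is an assumption,
    not necessarily the measures used in the paper. The pair must
    occur at least once in the corpus.
    """
    n = sum(unigrams.values())          # corpus size in tokens
    f12 = bigrams[pair]                 # observed bigram frequency
    f1, f2 = unigrams[pair[0]], unigrams[pair[1]]
    expected = f1 * f2 / n              # frequency expected under independence
    pmi = math.log2(f12 / expected)
    dice = 2 * f12 / (f1 + f2)
    t_score = (f12 - expected) / math.sqrt(f12)
    return [pmi, dice, t_score]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit a hypothetical logistic-regression classifier with plain SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            g = p - yi                  # gradient of log-loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Label a candidate bigram as MWE (1) or non-MWE (0)."""
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5)


def precision(predicted, gold):
    """Precision = TP / (TP + FP), the metric reported in the abstract."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    return tp / (tp + fp) if tp + fp else 0.0
```

The sketch uses only surface counts, so no POS tagger or parser is needed. That matters for an under-resourced language. In practice, the training labels would come from a manually annotated sample of candidate bigrams, and a library classifier could replace the toy logistic regression. Both of these points are assumptions and are not taken from the paper.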
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.