Latvian FrameNet: Cross-Lingual Issues

Nešpore-Bērzkalne, Gunta; Saulīte, Baiba; Grūzītis, Normunds

doi:10.3233/978-1-61499-912-6-96

Abstract

This paper reports the lessons learned while creating a FrameNet-annotated text corpus of Latvian. This is still an ongoing work, a part of a larger project which aims at the creation of a multilayer text corpus, anchored in cross-lingual state-of-the-art representations: Universal Dependencies (UD), FrameNet and PropBank, as well as Abstract Meaning Representation (AMR). For the FrameNet layer, we use the latest frame inventory of Berkeley FrameNet (BFN v1.7), while the annotation itself is done on top of the underlying UD layer. We strictly follow a corpus-driven approach, meaning that lexical units (LU) in Latvian FrameNet are created only based on the annotated corpus examples. Since we are aiming at a medium-sized still general-purpose corpus, an important aspect that we take into account is the variety and balance of the corpus in terms of genres, domains and LUs. We have finished the first phase of the FrameNet corpus annotation, and we have collected and discuss cross-lingual issues and their possible solutions. The issues are relevant for other languages as well, particularly if the goal is to maintain cross-lingual compatibility via BFN.

This website uses cookies

This website uses cookies