Linguistically-Motivated Automatic Classification of Lithuanian Texts for Didactic Purposes

Grigonytė, Gintarė; Kovalevskaitė, Jolanta; Rimkutė, Erika

doi:10.3233/978-1-61499-912-6-38

Authors

Gintarė Grigonytė, Jolanta Kovalevskaitė, Erika Rimkutė

Pages

38 - 46

DOI

10.3233/978-1-61499-912-6-38

Series

Frontiers in Artificial Intelligence and Applications

Ebook

Volume 307: Human Language Technologies – The Baltic Perspective

Abstract

This paper presents an effort to provide a level-appropriate study corpus for learners of Lithuanian. The collected corpus includes levelled texts from study books and unlevelled texts from other sources. The main goal is to assign level-appropriate labels (A1, A2, B1, B2) to the texts from other sources. For automatic classification we use preselected surface features, based on text readability research, and shallow linguistic features. First, we train the model on the levelled texts from study books; second, we apply the learned model to classify the other texts. The best classification results are achieved with the Logistic Regression method.
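
The abstract does not list the surface features used, so the following is only a minimal sketch of what readability-style surface features might look like. The function name `surface_features` and the three features (average sentence length, average word length, type-token ratio) are illustrative assumptions, not the authors' feature set. In a pipeline like the one described, vectors of this kind would be computed for the levelled study-book texts, used to train a classifier such as logistic regression, and then computed for the unlevelled texts to predict their labels.

```python
import re
from typing import Dict


def surface_features(text: str) -> Dict[str, float]:
    """Compute a few readability-style surface features for one text.

    The feature choice is hypothetical; the paper's actual features are
    not given in the abstract.
    """
    # Rough sentence split on ., ! and ? (a heuristic, not a real segmenter).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # \w+ is Unicode-aware in Python 3, so Lithuanian letters are kept.
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Mean number of words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
        # Mean number of characters per word.
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Distinct word forms divided by total words (lexical variety).
        "type_token_ratio": len(set(words)) / len(words),
    }


if __name__ == "__main__":
    print(surface_features("Labas rytas. Kaip sekasi?"))
```

Each text is reduced to a small numeric vector, which is the form a standard classifier expects as input; the shallow linguistic features mentioned in the abstract would need a morphological or part-of-speech tool and are left out of this sketch.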
