Language Use in a Multilingual Tweet Corpus

Milajevs, Dmitrijs

doi:10.3233/978-1-61499-912-6-88

IOS Press Ebooks

Authors

Dmitrijs Milajevs

Pages

88 - 95

DOI

10.3233/978-1-61499-912-6-88

Series

Frontiers in Artificial Intelligence and Applications

Ebook

Volume 307: Human Language Technologies – The Baltic Perspective

Abstract

A trilingual Latvian-Russian-English corpus of tweets is presented, together with an analysis of its users, languages and topics. The corpus consists of 1.4 million tweets covering the period from April 2017 to July 2018. The language analysis reveals that most users tweet predominantly in one language. Within topics, the share of Latvian content is higher than in the collection as a whole. Among many potential use cases, the corpus can be used to study the public engagement of major Latvian media outlets and public figures, or the factors that determine the language choice and content of a tweet.
