Large-scale n-gram models are available for only a small number of languages, and until now Croatian was not one of them. The research presented in this paper describes the development of an n-gram database system suitable for large-scale language modeling in Croatian. The n-gram collection process relies on Hascheck, the Croatian academic online spellchecker, which has been publicly available since 1993 and is today a popular language service with average daily traffic exceeding one million tokens. The approach demonstrated in this paper eliminates the need for n-gram data cleaning in the post-processing phase, which is a serious issue in other languages. The spellchecker dynamics allowed Heaps’ law modeling to be applied to Croatian n-grams, which enabled the prediction of n-gram count growth.
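As a rough illustration of how Heaps’ law can support growth prediction, the sketch below fits the standard form V(N) = K · N^β, where N is the number of tokens processed and V(N) is the number of distinct n-grams observed so far. The fitting method (ordinary least squares in log-log space), the function names `fit_heaps` and `predict_distinct`, and the data points are all assumptions made for illustration. They are not taken from the paper, and the numbers are synthetic.

```python
import math


def fit_heaps(tokens_seen, distinct_counts):
    """Fit V = K * N**beta by least squares on (log N, log V).

    Hypothetical helper: the paper does not state its fitting procedure.
    """
    xs = [math.log(n) for n in tokens_seen]
    ys = [math.log(v) for v in distinct_counts]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                         # slope in log-log space
    k = math.exp(mean_y - beta * mean_x)     # intercept, mapped back from logs
    return k, beta


def predict_distinct(k, beta, n_tokens):
    """Predicted number of distinct n-grams after n_tokens tokens."""
    return k * n_tokens ** beta


# Synthetic measurements generated from K = 30, beta = 0.6 (NOT real Hascheck data).
tokens = [10**4, 10**5, 10**6, 10**7]
distinct = [30.0 * n ** 0.6 for n in tokens]

k, beta = fit_heaps(tokens, distinct)
forecast = predict_distinct(k, beta, 10**8)  # extrapolate one decade further
print(f"K = {k:.3f}, beta = {beta:.3f}, forecast at 1e8 tokens = {forecast:.0f}")
```

With real data, the measurements would be periodic snapshots of the cumulative token count and the distinct n-gram count, fitted separately for each n-gram order.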