The Classification of Short Scientific Texts Using Pretrained BERT Model

Danilov, Gleb; Ishankulov, Timur; Kotik, Konstantin; Orlov, Yuriy; Shifrin, Mikhail; Potapov, Alexander

doi:10.3233/SHTI210125

Pages

83 - 87

Category

Research Article

Series

Studies in Health Technology and Informatics

Ebook

Volume 281: Public Health and Informatics

Abstract

Automated text classification is a natural language processing (NLP) technology that could significantly facilitate the selection of scientific literature. A topic-specific dataset of 630 article abstracts was obtained from the PubMed database. We proposed 27 parametrized variants of the PubMedBERT model and 4 ensemble models to solve a binary classification task on that dataset. For each classification approach, 300 tests with resampling were performed. The best PubMedBERT model achieved an F1-score of 0.857, while the best ensemble model reached an F1-score of 0.853. We concluded that the classification quality for short scientific texts might be improved using the latest state-of-the-art approaches.
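The evaluation protocol mentioned in the abstract (repeated tests on resamples, scored by F1) can be illustrated with a minimal sketch. The abstract does not specify the resampling scheme, so the random hold-out split, the `test_fraction` value, and the `fit_predict` callable below are assumptions made purely for illustration; they are not the authors' implementation, and any real classifier (for example, a fine-tuned PubMedBERT model) would be supplied by the caller.

```python
import random


def f1_score(y_true, y_pred):
    """Binary F1 for labels in {0, 1}; returns 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def repeated_resample_f1(texts, labels, fit_predict,
                         n_runs=300, test_fraction=0.2, seed=0):
    """Score a classifier on n_runs random train/test resamples.

    fit_predict is a hypothetical callable:
        fit_predict(train_texts, train_labels, test_texts) -> predicted labels
    test_fraction=0.2 is an assumed value, not taken from the paper.
    """
    rng = random.Random(seed)
    indices = list(range(len(texts)))
    n_test = max(1, int(len(indices) * test_fraction))
    scores = []
    for _ in range(n_runs):
        # Draw a fresh random split for every run.
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        preds = fit_predict(
            [texts[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [texts[i] for i in test_idx],
        )
        scores.append(f1_score([labels[i] for i in test_idx], preds))
    return scores
```

The returned list of per-run scores can then be summarized (mean, spread) to compare approaches, which is one plausible way to read the comparison between single models and ensembles.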
