For the purpose of text classification or information retrieval, we apply preprocessing to these texts such as stemming and stopwords removal. Almost all the techniques could be useful only to well-formed text information like textbooks and news articles, but is not true to social network services (SNS) or any other texts in internet world. In this investigation, we propose how to extract stopwords in context of social network services. To do that, first we discuss what stopwords mean, how different from conventional ones, and we propose statistical filters TFIG and TFCHI, to identify. We examine categorical estimation to extract characteristic values putting our attention on Kullback Leibler Divergence (KLD) over temporal sequences on SNS data. Moreover we apply several preprocessing to manage unknown words and to improve morphological analysis.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 firstname.lastname@example.org
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 email@example.com