Experiment and Comparison on Classification of Chinese Car Reviews

Liu, Xuedan; Wang, Yinglin

doi:10.3233/978-1-61499-900-3-810

Abstract

With the rapid development of e-commerce and online review platforms, the number of product reviews has multiplied, which makes mining valuable information from them important for both businesses and consumers. Text classification methods are the usual approach to this kind of problem. Text classification proceeds in several steps, and each step offers a choice of methods or components, so many combinations are possible. However, these combinations have rarely been compared. In this paper, different combinations of text classification components are constructed and evaluated. In the feature selection and weighting step, mutual information, information gain, the chi-square test and TF-IDF are used as alternatives. In the classification step, four frequently used machine learning methods are selected as components. The experiments are conducted on an annotated corpus of Chinese car reviews. Results show that the combination of the chi-square test and the Support Vector Machine algorithm obtains the best performance. The relationship between performance and the number of features is also studied, and an empirical corpus size for this kind of task is given.
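
As an illustration of the chi-square feature selection step named above, the following is a minimal sketch and not the authors' implementation. It assumes reviews are already segmented into tokens and uses the standard 2x2 term/class contingency form of the statistic. The function names `chi_square` and `select_features`, and the toy reviews (English stand-ins for segmented Chinese words), are hypothetical and chosen only for this example.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, target, k):
    """Return the k terms with the highest chi-square score for `target`,
    together with the score of every term in the vocabulary."""
    in_class = Counter()   # document frequency of each term inside the class
    out_class = Counter()  # document frequency of each term outside the class
    n_in = sum(1 for y in labels if y == target)
    n_out = len(labels) - n_in
    for tokens, y in zip(docs, labels):
        counter = in_class if y == target else out_class
        counter.update(set(tokens))  # count each term once per document
    vocab = set(in_class) | set(out_class)
    scores = {}
    for term in vocab:
        a, b = in_class[term], out_class[term]
        scores[term] = chi_square(a, b, n_in - a, n_out - b)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Toy, invented data: four pre-tokenized reviews with sentiment labels.
docs = [
    ["fuel", "low", "satisfied"],
    ["space", "large", "satisfied"],
    ["fuel", "high", "disappointed"],
    ["noise", "large", "disappointed"],
]
labels = ["pos", "pos", "neg", "neg"]

top, scores = select_features(docs, labels, target="pos", k=2)
```

In this toy data, terms that occur equally often in both classes (such as `fuel`) score zero, while terms confined to one class rank highest. The selected terms would then form the feature space passed to a classifier such as a Support Vector Machine.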

Contact

IOS Press Copyright 2024
