Imbalanced text classification, a practical and essential form of text classification, is the task of learning labels or categories for imbalanced text data. Existing imbalanced text classification approaches are mostly based on the imbalance ratio (i.e., the ratio of sizes between categories). Recently, some researchers have verified that the imbalance ratio severely affects classifier performance when intrinsic characteristics of the data, such as class overlap and small disjuncts, are present. However, since the distribution of real-world data is unknown, it is difficult to describe these intrinsic characteristics directly. In this paper, we transform the unknown distribution of data into a graph model and present a graph-based imbalance index named GIR to predict the impact of imbalanced text data on classification performance. First, we introduce an environmental factor that makes the imbalance index sensitive to the intrinsic characteristics of the data. Second, we propose a graph-based method to calculate this environmental factor. Finally, we use the imbalance index to analyze the performance of imbalanced learning methods and the impact of imbalanced data on text classifiers. Experimental results on both synthetic and real-world data sets demonstrate the effectiveness of our approach.
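The abstract contrasts the conventional imbalance ratio with a graph-based index but does not give the formula for GIR or its environmental factor. The following is a minimal Python sketch under stated assumptions, not the paper's method. It computes the imbalance ratio as largest-class size divided by smallest-class size and builds a brute-force k-nearest-neighbour graph. As a purely hypothetical stand-in for an environmental factor, it measures the fraction of the minority class's k-NN edges that reach a node of another class. The 2-D points stand in for text feature vectors, and the names `imbalance_ratio`, `knn_graph` and `minority_overlap` are illustrative.

```python
from collections import Counter
import math


def imbalance_ratio(labels):
    """Conventional imbalance ratio: largest class size / smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def knn_graph(points, k):
    """Build a k-nearest-neighbour graph as {node index: [neighbour indices]}.

    Brute-force Euclidean distances; ties are broken by node index.
    Only suitable for small illustrative data.
    """
    graph = {}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in dists[:k]]
    return graph


def minority_overlap(points, labels, k=3):
    """HYPOTHETICAL environmental factor (not GIR's definition):
    fraction of the minority class's k-NN edges that point to another class.
    0.0 means the minority class is isolated; 1.0 means fully surrounded."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    graph = knn_graph(points, k)
    edges = [(i, j) for i, nbrs in graph.items() if labels[i] == minority for j in nbrs]
    cross = sum(1 for i, j in edges if labels[j] != labels[i])
    return cross / len(edges)


# Two toy data sets with the same class sizes (9 majority vs 3 minority points).
grid = [(x, y) for x in range(3) for y in range(3)]
separated = grid + [(10, 10), (10, 11), (11, 10)]        # minority far away
overlapping = grid + [(0.5, 0.5), (1.5, 1.5), (1.5, 0.5)]  # minority inside majority
labels = ["maj"] * 9 + ["min"] * 3

print(imbalance_ratio(labels))                       # 3.0
print(minority_overlap(separated, labels, k=2))      # 0.0
print(minority_overlap(overlapping, labels, k=2))    # 1.0
```

Both toy data sets have an imbalance ratio of 3.0, yet the hypothetical overlap factor is 0.0 when the minority class is isolated and 1.0 when it sits inside the majority class. This is the kind of difference that, according to the abstract, a ratio of class sizes alone cannot capture.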
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.