Imbalanced data classification is an important task in data mining and machine learning. Imbalanced data consist of a majority class and a minority class, and the dominance of the majority class leads to misclassification of minority samples. Various approaches have been proposed in recent years to address this problem. Sampling, which balances the class sizes, is one of them. In our previous research, we proposed Multivariate Normal Distribution based Over-Sampling (MNDO), which tackles this problem using correlations between attributes and statistical methods. In this paper, we propose Multivariate Normal Distribution based Over-sampling for Numerical and Categorical features (MNDO-NC), which samples datasets that contain both numerical and categorical data. First, MNDO-NC generates numerical data using correlation coefficients and a multivariate normal distribution. Next, it calculates the distances between each generated sample and the original data and identifies the 5 nearest neighbors. The categorical data are then sampled by applying a voting strategy to these neighbors. Some existing methods generate new samples using a distance function, whereas our method uses positive class statistics. Therefore, it can be applied even when the number of training samples is very small. In addition, outliers can be reproduced stochastically, so more realistic samples can be generated. In the experiments, we used 17 imbalanced datasets consisting of numerical and categorical data. To compare with existing methods, 6 sampling methods, 2 scaling methods, and 3 learning methods were used. The experiments showed that the proposed method achieved results comparable to those of the other methods.
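The three steps described above (draw numerical features from a multivariate normal distribution, find the 5 nearest original samples, vote on the categorical features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the function name `mndo_nc_sketch` and its parameters are hypothetical, the minority-class covariance matrix stands in for the correlation coefficients, distances are assumed to be Euclidean over the numerical features only, and ties in the vote go to the value seen first among the neighbors.

```python
import numpy as np
from collections import Counter


def mndo_nc_sketch(X_num, X_cat, n_new, k=5, seed=0):
    """Illustrative sketch of the MNDO-NC steps (hypothetical helper).

    X_num: (n, d) numerical features of the minority (positive) class.
    X_cat: (n, c) categorical features of the same samples.
    Returns n_new synthetic samples as (numerical, categorical) arrays.
    """
    rng = np.random.default_rng(seed)
    X_num = np.asarray(X_num, dtype=float)
    X_cat = np.asarray(X_cat, dtype=object)

    # Positive-class statistics: mean vector and covariance matrix.
    # Assumption: the covariance carries the attribute correlations.
    mean = X_num.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_num, rowvar=False))

    # Step 1: generate numerical features from the fitted distribution.
    new_num = rng.multivariate_normal(mean, cov, size=n_new)

    # Step 2: Euclidean distance (assumed) from each generated sample
    # to every original sample, then the indices of the k nearest ones.
    dists = np.linalg.norm(new_num[:, None, :] - X_num[None, :, :], axis=2)
    k = min(k, len(X_num))
    nn_idx = np.argsort(dists, axis=1)[:, :k]

    # Step 3: majority vote among the neighbors, per categorical column.
    new_cat = np.empty((n_new, X_cat.shape[1]), dtype=object)
    for i in range(n_new):
        for j in range(X_cat.shape[1]):
            votes = Counter(X_cat[nn_idx[i], j])
            new_cat[i, j] = votes.most_common(1)[0][0]
    return new_num, new_cat
```

Because the numerical features are drawn from class-level statistics rather than interpolated between pairs of neighbors, the sketch still runs when only a handful of minority samples are available, which matches the motivation given above.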
IOS Press, Inc.