Parallel Distributed Genetic Rule Selection for Data Mining from Large Data Sets

Nojima, Yusuke; Mihara, Shingo; Ishibuchi, Hisao

doi:10.3233/978-1-61499-092-5-140

Abstract

Genetic algorithms (GAs) have been successfully used for data mining thanks to their flexibility: users can easily incorporate their preferences into the objective functions to be optimized. Although GA-based data mining techniques are useful, they have serious difficulty handling data sets with a large number of patterns and/or attributes. That is, evaluating candidate solutions takes a long computation time because every pattern has to be classified by each candidate solution. To reduce the computation time of GA-based data mining, we propose a parallel distributed implementation of genetic rule selection. Its main characteristic is that not only the GA population but also the training data set is divided into a number of subgroups. Each pair of a sub-population and a training data subset is then assigned to a single CPU core. This approach can drastically reduce the computation time with no serious deterioration in the generalization ability of the obtained classifiers. We demonstrate the effectiveness of our parallel distributed implementation through computational experiments on large data sets available from the UCI machine learning repository.
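The pairing of sub-populations with training data subsets described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the threshold-style rules, the first-match classifier, the fitness weights, and the helper names `partition`, `classify`, `fitness`, and `evaluate_island` are all hypothetical. The sequential loop stands in for the step where each pair would be handed to its own CPU core.

```python
import random

def partition(items, n_parts):
    """Split items into n_parts disjoint, nearly equal groups (round-robin)."""
    return [items[i::n_parts] for i in range(n_parts)]

def classify(rule_mask, rules, x):
    """Return the label of the first selected rule that matches x, else None."""
    for selected, (attr, threshold, label) in zip(rule_mask, rules):
        if selected and x[attr] >= threshold:
            return label
    return None

def fitness(rule_mask, rules, data_subset):
    """Correctly classified patterns minus a small penalty per selected rule.
    The 0.01 weight is an arbitrary illustrative choice."""
    correct = sum(1 for x, y in data_subset if classify(rule_mask, rules, x) == y)
    return correct - 0.01 * sum(rule_mask)

def evaluate_island(sub_population, rules, data_subset):
    """Evaluate one sub-population against only its own training data subset."""
    return [fitness(mask, rules, data_subset) for mask in sub_population]

rng = random.Random(0)

# Hypothetical candidate rules: (attribute index, threshold, class label).
rules = [(0, 0.5, 1), (0, 0.0, 0)]

# Toy training data: one attribute, class 1 when the value is at least 0.5.
data = [((v,), 1 if v >= 0.5 else 0) for v in (rng.random() for _ in range(120))]

# Each individual is a binary mask that selects a subset of the candidate rules.
population = [[rng.randint(0, 1) for _ in rules] for _ in range(8)]

n_cores = 4  # assumed number of CPU cores
data_subsets = partition(data, n_cores)
sub_populations = partition(population, n_cores)

# One (sub-population, data subset) pair per core; evaluated sequentially here.
island_scores = [
    evaluate_island(sub_pop, rules, subset)
    for sub_pop, subset in zip(sub_populations, data_subsets)
]
```

In this toy setting each individual is evaluated on 30 patterns instead of 120, which is where the reduction in evaluation cost comes from. A real run would dispatch each pair to a separate worker process rather than looping over them.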
