In this paper we propose a method for scaling up filter-based feature selection in classification problems. We use conditional mutual information as the filter measure and show how the required statistics can be computed in parallel while avoiding unnecessary calculations. The calculations are distributed among the available computing units according to balanced incomplete block designs, a strategy first developed in the statistical design of experiments. We demonstrate the scalability of our method through a series of experiments on synthetic and real-world datasets.
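To illustrate why a balanced incomplete block design suits this kind of work distribution, the following minimal sketch uses the Fano plane, a standard (7, 3, 1) design. It assumes, purely for illustration, that each block lists the features assigned to one computing unit and that the statistics of interest are pairwise; the names `FANO_BLOCKS`, `pair_coverage`, and `assign_pairs` are hypothetical and are not taken from the paper. The sketch shows only the defining property of such a design: every pair of features falls in exactly one block, so no pairwise statistic is computed twice.

```python
from itertools import combinations
from collections import Counter

# Fano plane: a (v=7, k=3, lambda=1) balanced incomplete block design.
# Assumption for illustration: each block holds the feature indices
# handed to one computing unit.
FANO_BLOCKS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]


def pair_coverage(blocks):
    """Count how many blocks contain each unordered pair of features."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


def assign_pairs(blocks):
    """Map each block index to the feature pairs that block can process."""
    return {i: list(combinations(sorted(b), 2)) for i, b in enumerate(blocks)}


coverage = pair_coverage(FANO_BLOCKS)
all_pairs = set(combinations(range(7), 2))

# Every one of the 21 feature pairs is covered, and none is covered twice.
print(set(coverage) == all_pairs)
print(all(count == 1 for count in coverage.values()))

# Each of the 7 units receives the same number of pairs (balanced load).
workload = {unit: len(pairs) for unit, pairs in assign_pairs(FANO_BLOCKS).items()}
print(workload)
```

In this toy setting, seven units each handle three of the 21 feature pairs. How the paper maps conditional mutual information statistics onto blocks may differ from this simplified pairwise view.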