This paper deals with the characterization of data complexity and the relationship with the classification accuracy. We study three dimensions of data complexity: the length of the class boundary, the number of features, and the number of instances of the data set. We find that the length of the class boundary is the most relevant dimension of complexity, since it can be used as an estimate of the maximum achievable accuracy rate of a classifier. The number of attributes and the number of instances do not affect classifier accuracy by themselves, if the boundary length is kept constant. The study emphasizes the use of measures revealing the intrinsic structure of data and recommends their use to extract conclusions on classifier behavior and their relative performance in multiple comparison experiments.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 firstname.lastname@example.org
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 email@example.com