Background: Measuring the performance of a classifier is crucial when searching for the best machine-learning algorithm and its optimal parameters. Multiple methods are available for this purpose, the most common being k-fold cross-validation. Similar methods include the bootstrap method and k-fold repeated cross-validation.
Objective: This paper compares three such methods: k-fold cross-validation, k-fold repeated cross-validation, and the bootstrap method. In our experimental set-up, the latter two require a 20-fold increase in computational effort. The objective of this paper was to experimentally determine the best cross-validation method with respect to both accuracy and computational effort.
Methods: Four classification algorithms were selected and applied to multiple datasets from the field of Life Sciences (n=35) using each of the three selected cross-validation methods. For statistical comparisons, we used the paired (dependent) Student's t-test at the standard 95% confidence level.
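The paper does not give implementation details for the statistical comparison; as a minimal sketch, the paired (dependent) t-statistic over per-dataset accuracy scores can be computed with the Python standard library as follows (the score values shown are hypothetical, purely for illustration):

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Paired (dependent) t-statistic over matched per-dataset scores:
    t = mean(d) / (sd(d) / sqrt(n)), where d are the pairwise differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean_d / (sd_d / math.sqrt(n))

# Hypothetical accuracies of two cross-validation methods on five datasets.
method_a = [0.91, 0.85, 0.78, 0.88, 0.93]
method_b = [0.90, 0.84, 0.79, 0.87, 0.92]
t = paired_t_statistic(method_a, method_b)
```

The resulting |t| would then be compared against the two-sided critical value of the t-distribution with n-1 degrees of freedom at the 95% confidence level (about 2.03 for n=35 datasets).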
Results: The statistical comparisons between the cross-validation methods yielded the following. Despite requiring 20 times less computational effort, k-fold cross-validation was statistically equivalent to k-fold repeated cross-validation. The third method, the bootstrap method, was found to be too pessimistic and therefore inferior to the other two selected methods.
Conclusion: k-fold cross-validation proved to be the best choice among the selected cross-validation methods with respect to both accuracy and computational effort.
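For readers unfamiliar with the three resampling schemes compared here, the following standard-library sketch shows how each one generates train/test index splits. The function names, the choice of k, and the number of repeats/iterations are illustrative assumptions, not taken from the paper:

```python
import random

def kfold_splits(n, k, seed=0):
    """k-fold cross-validation: shuffle once, then each fold serves as the
    test set exactly once while the remaining folds form the training set."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

def repeated_kfold_splits(n, k, repeats, seed=0):
    """k-fold repeated cross-validation: repeat k-fold with fresh shuffles,
    multiplying the computational effort by `repeats`."""
    for r in range(repeats):
        yield from kfold_splits(n, k, seed=seed + r)

def bootstrap_splits(n, iterations, seed=0):
    """Bootstrap: draw n indices with replacement as the training set;
    the out-of-bag (unsampled) indices form the test set."""
    rng = random.Random(seed)
    for _ in range(iterations):
        train = [rng.randrange(n) for _ in range(n)]
        oob = [j for j in range(n) if j not in set(train)]
        yield train, oob
```

Under these assumed settings, 10-fold cross-validation fits 10 models, whereas 10-fold cross-validation repeated 20 times fits 200 — the kind of 20-fold difference in computational effort the comparison above refers to.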