Item response theory (IRT) is widely used to measure the latent abilities of subjects, especially in educational testing, based on their responses to items of differing difficulty. Adapting IRT has recently been suggested as a novel perspective for better understanding the results of machine learning experiments and, by extension, other artificial intelligence experiments. For instance, IRT maps naturally onto classification tasks, where instances correspond to items and classifiers correspond to subjects. By adopting IRT, item (i.e., instance) characteristic curves can be estimated using logistic models, in which three parameters characterise each dataset instance: difficulty, discrimination and guessing. IRT looks promising for the analysis of instance hardness, noise, classifier dominance, etc. However, some caveats arise when interpreting the IRT parameters in a machine learning setting, especially when some artificial classifiers are included in the pool of classifiers to be evaluated: the optimal and pessimal classifiers, a random classifier, and the majority and minority classifiers. In this paper we perform a series of experiments with a range of datasets and classification methods to fully understand how IRT works and what its parameters really mean in the context of machine learning. This better understanding will hopefully pave the way to a myriad of potential applications in machine learning and artificial intelligence.
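As an illustration of the item characteristic curve mentioned above, the following minimal sketch assumes the standard three-parameter logistic (3PL) form, in which the probability that a classifier of ability theta labels an instance correctly depends on that instance's difficulty, discrimination and guessing parameters. The function name `icc_3pl` and the parameter values are hypothetical and chosen only for illustration; they are not taken from the experiments in this paper.

```python
import math


def icc_3pl(theta, difficulty, discrimination, guessing):
    """Probability of a correct response under an assumed 3PL model.

    theta          -- latent ability of the subject (here, a classifier)
    difficulty     -- location of the curve on the ability scale
    discrimination -- slope of the curve at the difficulty point
    guessing       -- lower asymptote (chance of success at very low ability)
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return guessing + (1.0 - guessing) * logistic


# Hypothetical instance: average difficulty, unit slope, 20% guessing floor.
for ability in (-3.0, 0.0, 3.0):
    print(ability, round(icc_3pl(ability, 0.0, 1.0, 0.2), 3))
```

With a positive discrimination the curve rises from the guessing level towards 1 as ability increases, and at theta equal to the difficulty the probability lies halfway between the guessing level and 1.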