Data-Centric Machine Learning: Improving Model Performance and Understanding Through Dataset Analysis

Westermann, Hannes; Šavelka, Jaromír; Walker, Vern R.; Ashley, Kevin D.; Benyekhlef, Karim

doi:10.3233/FAIA210316

Authors

Hannes Westermann, Jaromír Šavelka, Vern R. Walker, Kevin D. Ashley, Karim Benyekhlef

Pages

54 - 57

DOI

10.3233/FAIA210316

Category

Research Article

Series

Frontiers in Artificial Intelligence and Applications

Ebook

Volume 346: Legal Knowledge and Information Systems

Abstract

Machine learning research typically starts with a fixed data set created early in the process. The focus of the experiments is finding a model and training procedure that result in the best possible performance in terms of some selected evaluation metric. This paper explores how changes in a data set influence the measured performance of a model. Using three publicly available data sets from the legal domain, we investigate how changes to their size, the train/test splits, and the human labelling accuracy impact the performance of a trained deep learning classifier. Our experiments suggest that analyzing how data set properties affect performance can be an important step in improving the results of trained classifiers, and leads to better understanding of the obtained results.
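
The kind of analysis the abstract describes can be illustrated with a small, self-contained sketch. It is not the authors' code or setup: synthetic 2-D points stand in for the three legal data sets, a nearest-centroid rule stands in for the deep learning classifier, and every name here (`make_data`, `flip_labels`, `train_centroids`, `run_experiment`) is hypothetical. The sketch only shows how training-set size, the choice of train/test split, and label noise could each be varied while the resulting accuracy is recorded.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic two-class 2-D points (assumption: a stand-in for a real labelled data set)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 if label == 1 else -1.5
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def flip_labels(data, noise_rate, rng):
    """Simulate annotation errors: flip each binary label with probability noise_rate."""
    return [(x, 1 - y if rng.random() < noise_rate else y) for x, y in data]


def train_centroids(train):
    """Toy 'model': the mean point of each class present in the training data."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in train if y == label]
        if points:
            centroids[label] = tuple(sum(c) / len(points) for c in zip(*points))
    return centroids


def accuracy(centroids, test):
    """Fraction of test points whose nearest centroid carries the correct label."""
    def predict(x):
        return min(
            centroids,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
        )
    return sum(predict(x) == y for x, y in test) / len(test)


def run_experiment(train_size, noise_rate, n_splits=10, seed=0):
    """Mean and standard deviation of test accuracy over several random splits.

    Label noise is applied to the training portion only; test labels stay clean.
    """
    rng = random.Random(seed)
    data = make_data(1000, rng)
    scores = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, pool = shuffled[:200], shuffled[200:]
        train = flip_labels(pool[:train_size], noise_rate, rng)
        scores.append(accuracy(train_centroids(train), test))
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    for size in (10, 50, 200, 800):
        for noise in (0.0, 0.1, 0.3):
            mean, std = run_experiment(size, noise)
            print(f"train_size={size:4d} noise={noise:.1f} acc={mean:.3f} +/- {std:.3f}")
```

Reporting the spread across splits alongside the mean is the point of the design: a single fixed split would hide how much of a measured difference comes from the data rather than from the model.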
