Training data used in machine learning applications are often assumed to be perfect, i.e., to contain no errors; in practice this is almost never the case, and label errors may limit the performance of the resulting model. In this paper, the effect of label errors in training data is studied quantitatively and in relation to model overfitting. When label errors are artificially introduced, a constrained (small) CNN model is observed to exhibit remarkable generalizability, retaining high accuracy even when most of the data are mislabelled. Test accuracy falls catastrophically only at unrealistically high label error rates, at a point related to the number of classes present in the data. These preliminary experiments pave the way towards further studies of model robustness, possibly offering a quantitative method through which to compare models.
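As a rough illustration of the kind of label-corruption scheme described above, the sketch below reassigns a fraction p of training labels to a uniformly random different class (symmetric label noise). The function name, array shapes, and the uniform-noise assumption are illustrative only and are not taken from the paper.

    import numpy as np

    def corrupt_labels(labels, p, num_classes, rng=None):
        """Return a copy of `labels` with a fraction `p` replaced by a
        uniformly random *different* class (symmetric label noise)."""
        rng = rng or np.random.default_rng()
        corrupted = labels.copy()
        n = len(labels)
        # Choose which examples to mislabel.
        idx = rng.choice(n, size=int(p * n), replace=False)
        # Draw a random offset in [1, num_classes - 1] so the new label
        # always differs from the original one.
        offsets = rng.integers(1, num_classes, size=len(idx))
        corrupted[idx] = (labels[idx] + offsets) % num_classes
        return corrupted

    # Example: mislabel 60% of a hypothetical 10-class training set.
    y = np.random.default_rng(0).integers(0, 10, size=50_000)
    y_noisy = corrupt_labels(y, p=0.6, num_classes=10)
    print((y_noisy != y).mean())  # ~0.6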