Abstract
We consider a binary classification problem in which some observations in the training data are incorrectly labeled. In the presence of such label noise, conventional classification methods may fail to produce a classifier that generalizes to the population. In this work, we investigate label noise logistic regression and explain how it works with noisy training data. We demonstrate that, when the label transition probabilities are correctly specified, label noise logistic regression satisfies Fisher consistency and enjoys robustness. To accommodate the various label noise mechanisms that occur in practice, we propose a flexible, nonparametric label noise model. We also propose an efficient algorithm, based on a thresholding rule, for estimating the individual parameters. We demonstrate its performance on synthetic and real data examples. Finally, we discuss how the proposed flexible transition model is also useful for robust classification.
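To make the role of the transition probabilities concrete, here is a minimal Python sketch. It assumes the standard class-conditional flip model with fixed, known rates, where rho0 = P(observed 1 | true 0) and rho1 = P(observed 0 | true 1). Under this model the observed-label probability is rho0 * (1 - pi(x)) + (1 - rho1) * pi(x), with pi(x) the clean logistic probability. This is not the flexible nonparametric transition model or the thresholding algorithm described above, and the function names (`noisy_prob`, `fit_label_noise_logistic`) and simulation settings are hypothetical choices made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def noisy_prob(x, beta, rho0, rho1):
    """P(observed label = 1 | x) under an assumed flip model with known rates.

    rho0 = P(observed 1 | true 0), rho1 = P(observed 0 | true 1).
    """
    pi = sigmoid(beta[0] + beta[1] * x)  # clean-label probability P(Y = 1 | x)
    return rho0 * (1.0 - pi) + (1.0 - rho1) * pi

def fit_label_noise_logistic(xs, ys, rho0, rho1, lr=1.0, iters=1500):
    """Maximise the noisy-label log-likelihood by plain gradient ascent (illustrative)."""
    beta = [0.0, 0.0]
    n = len(xs)
    eps = 1e-12
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            pi = sigmoid(beta[0] + beta[1] * x)
            q = rho0 * (1.0 - pi) + (1.0 - rho1) * pi
            q = min(max(q, eps), 1.0 - eps)
            # Chain rule: d loglik / d eta = (y - q) / (q (1 - q)) * dq/d eta,
            # where dq/d eta = (1 - rho0 - rho1) * pi * (1 - pi).
            d_eta = (y - q) / (q * (1.0 - q)) * (1.0 - rho0 - rho1) * pi * (1.0 - pi)
            g0 += d_eta
            g1 += d_eta * x
        beta[0] += lr * g0 / n
        beta[1] += lr * g1 / n
    return beta

# Hypothetical simulation: clean logistic labels, then class-dependent flips.
rng = random.Random(0)
true_beta, rho0, rho1 = (0.0, 2.0), 0.1, 0.2
xs, ys = [], []
for _ in range(1000):
    x = rng.uniform(-3.0, 3.0)
    y = 1 if rng.random() < sigmoid(true_beta[0] + true_beta[1] * x) else 0
    if y == 0 and rng.random() < rho0:
        y = 1
    elif y == 1 and rng.random() < rho1:
        y = 0
    xs.append(x)
    ys.append(y)

beta_hat = fit_label_noise_logistic(xs, ys, rho0, rho1)
print(beta_hat)
```

Because the likelihood is written in terms of the observed (noisy) labels, the fitted coefficients target the clean-label model pi(x), provided rho0 and rho1 are specified correctly. Ordinary logistic regression on the same noisy labels would instead fit the attenuated curve q(x).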
Supplementary Materials
Supplementary document: The PDF file contains the proofs of all theorems and corollaries, together with additional tables for the simulation studies.
R code: The zipped file contains (1) an R file with the core functions implementing label noise logistic regression, (2) four R files used for the simulations under four different label noise scenarios, and (3) a README file.