Learning from Training Dynamics: Identifying Mislabeled Data Beyond Manually Designed Features
Tags: label noise detection, learning to identify mislabeled samples, training dynamics
Abstract:
Label noise exists in many datasets, even in the widely used CIFAR and ImageNet benchmarks. Mislabeled or ambiguously labeled samples in the training set can degrade the performance of deep models, so diagnosing a dataset and identifying its mislabeled samples helps improve generalization. Recently, training dynamics, i.e., the traces left by iterations of optimization algorithms, have been used to localize mislabeled samples with hand-crafted features. In this paper, we introduce a novel learning-based solution that leverages a noise detector, which learns to predict whether a sample is mislabeled from its training dynamics. Extensive experiments are conducted to evaluate the proposed method. Results show that it precisely detects mislabeled samples across various datasets without further adaptation or retraining, and outperforms state-of-the-art methods in terms of the AUC and mAP of mislabeled-sample detection as well as the test error obtained after excluding the predicted mislabels.
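The core idea described in the abstract can be sketched as follows: record each training sample's dynamics (for example, its per-epoch loss or the predicted probability of its given label) while fitting a base model, then train a binary detector on these traces, using synthetically injected label noise as supervision, so that it can later score samples from other datasets without retraining. The snippet below is a minimal sketch of this setup in PyTorch; the `NoiseDetector` architecture, the choice of per-epoch features, and all hyperparameters are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class NoiseDetector(nn.Module):
    """Binary classifier over per-sample training-dynamics sequences (a sketch)."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, dynamics: torch.Tensor) -> torch.Tensor:
        # dynamics: (batch, num_epochs, feat_dim), e.g. per-epoch loss and the
        # softmax probability assigned to the (possibly noisy) given label.
        _, h = self.rnn(dynamics)
        return self.head(h[-1]).squeeze(-1)  # logit: "is this sample mislabeled?"


def train_detector(dynamics: torch.Tensor, is_mislabeled: torch.Tensor,
                   epochs: int = 20) -> NoiseDetector:
    """Fit the detector on traces with known (synthetic) label noise.

    dynamics: (N, T, D) per-sample traces collected while training a base model.
    is_mislabeled: (N,) float targets in {0, 1}.
    """
    model = NoiseDetector(dynamics.shape[-1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(dynamics), is_mislabeled)
        loss.backward()
        opt.step()
    return model


# Usage sketch: once trained, the detector scores samples from a different
# dataset by feeding their training dynamics, without further adaptation.
# scores = torch.sigmoid(detector(new_dynamics))  # higher = more likely mislabeled
```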