Learning from Training Dynamics: Identifying Mislabeled Data Beyond Manually Designed Features
Tags: label noise detection, learning to identify mislabeled samples, training dynamics
Abstract:
Label noise exists in many datasets, even in the widely used CIFAR and ImageNet benchmarks. Mislabeled or ambiguously labeled samples in the training set can degrade the performance of deep models, so diagnosing a dataset and identifying its mislabeled samples helps improve generalization. Recently, training dynamics, i.e., the traces left by iterations of optimization algorithms, have been used to localize mislabeled samples with hand-crafted features. In this paper, we introduce a novel learning-based solution that leverages a noise detector, which learns to predict whether a sample is mislabeled from its training dynamics. Extensive experiments are conducted to evaluate the proposed method. Results show that it precisely detects mislabeled samples across various datasets without further adaptation or retraining, and outperforms state-of-the-art methods in terms of the AUC and mAP of mislabeled-sample detection as well as the test error obtained after excluding the predicted mislabels.
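The core idea described in the abstract can be sketched as follows: record each training sample's dynamics (for example, its per-epoch loss or the predicted probability of its given label) while fitting a base model, then train a binary detector on these traces, using synthetically injected label noise as supervision, so that it can later score samples from other datasets without retraining. The snippet below is a minimal sketch of this setup in PyTorch; the `NoiseDetector` architecture, the choice of per-epoch features, and all hyperparameters are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class NoiseDetector(nn.Module):
    """Binary classifier over per-sample training-dynamics sequences (a sketch)."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, dynamics: torch.Tensor) -> torch.Tensor:
        # dynamics: (batch, num_epochs, feat_dim), e.g. per-epoch loss and the
        # softmax probability assigned to the (possibly noisy) given label.
        _, h = self.rnn(dynamics)
        return self.head(h[-1]).squeeze(-1)  # logit: "is this sample mislabeled?"


def train_detector(dynamics: torch.Tensor, is_mislabeled: torch.Tensor,
                   epochs: int = 20) -> NoiseDetector:
    """Fit the detector on traces with known (synthetic) label noise.

    dynamics: (N, T, D) per-sample traces collected while training a base model.
    is_mislabeled: (N,) float targets in {0, 1}.
    """
    model = NoiseDetector(dynamics.shape[-1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(dynamics), is_mislabeled)
        loss.backward()
        opt.step()
    return model


# Usage sketch: once trained, the detector scores samples from a different
# dataset by feeding their training dynamics, without further adaptation.
# scores = torch.sigmoid(detector(new_dynamics))  # higher = more likely mislabeled
```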