Tags: Backdoor cleansing, Deep neural networks and Knowledge Distillation
Abstract:
Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, which need to poison only a small fraction of the training samples to control the behavior of the target model. Moreover, the growing stealth and potency of backdoor attacks pose significant challenges to backdoor defenses and a serious threat to the widespread adoption of DNNs. In this paper, we propose a novel backdoor defense framework, called Self-Distillation Backdoor Cleansing (SDBC), to remove backdoor triggers from the attacked model. Targeting the practical scenario where only a very small portion of clean data is available, SDBC is the first to introduce self-distillation to cleanse backdoors in DNNs. Extensive experiments demonstrate that SDBC can effectively remove backdoor triggers under 6 state-of-the-art backdoor attacks using less than 5%, or even less than 1%, of the clean training data without compromising accuracy. The results also show that SDBC outperforms existing state-of-the-art (SOTA) methods, reducing the average attack success rate (ASR) from 95.36% to 5.75% and increasing the average accuracy (ACC) by 1.92%.
SDBC: a Novel and Effective Self-Distillation Backdoor Cleansing Approach