Tags: AI safety, Digital trust, Ensemble models, Machine unlearning, Poisoning attack
Abstract:
Knowledge removal is a crucial task in AI safety and in complying with the Right To Be Forgotten (RTBF) principle. Machine Unlearning (MU) is an important means of achieving knowledge removal by eliminating the influence of a specified subset of training data on a trained model. However, existing MU frameworks may be misused to facilitate a novel class of poisoning attacks, in which adversaries introduce both poisoned data and corresponding mitigation data that temporarily neutralizes the poison's effects. The adversaries then submit malicious unlearning requests for the mitigation data, thereby unleashing the malicious effects of the poison. Such attacks have been shown to be effective in single-model scenarios; however, their impact on ensemble models, which are widely adopted because of their robustness, remains underexplored. Recognizing this gap, we extend these emerging poisoning attacks to ensemble settings to better understand and address the potential risks of malicious unlearning. Our extensive experimental results show that the extended poisoning attacks are also effective in ensemble settings, achieving high attack success rates. These findings highlight the importance of continued research on safeguards against the misuse of MU as an important requirement of AI safety.
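
To make the attack flow concrete, the following is a minimal, illustrative Python sketch, not the paper's implementation: a bagged ensemble is trained on clean data plus adversarial poison and mitigation ("camouflage") points, and the malicious unlearning request is approximated by exact retraining without the mitigation subset. All data distributions, model choices, and names here are assumptions made for illustration only.

```python
# Illustrative sketch of a camouflaged poisoning attack followed by a
# malicious unlearning request against an ensemble (toy setup, assumed names).
import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)

# Clean two-class data standing in for the victim's training set.
X_clean = rng.normal(size=(500, 10))
y_clean = (X_clean[:, 0] > 0).astype(int)

# Adversary crafts label-flipped poison points and "mitigation" (camouflage)
# points that offset the poison's effect while both remain in the training set.
X_poison = rng.normal(loc=2.0, size=(40, 10))
y_poison = np.zeros(40, dtype=int)
X_camo = rng.normal(loc=2.0, size=(40, 10))
y_camo = np.ones(40, dtype=int)

def train_ensemble(X, y):
    """Train a simple bagged ensemble (a stand-in for the ensemble setting)."""
    return BaggingClassifier(n_estimators=20, random_state=0).fit(X, y)

# Phase 1: poison + camouflage are both present, so the model behaves normally.
X_full = np.vstack([X_clean, X_poison, X_camo])
y_full = np.concatenate([y_clean, y_poison, y_camo])
model_before = train_ensemble(X_full, y_full)

# Phase 2: the adversary requests unlearning of the camouflage subset only.
# Exact unlearning is approximated here by retraining without those points,
# which leaves the poison's influence unmitigated.
X_after = np.vstack([X_clean, X_poison])
y_after = np.concatenate([y_clean, y_poison])
model_after = train_ensemble(X_after, y_after)

# Region targeted by the poison (illustrative); accuracy there should drop
# after the malicious unlearning request is honored.
X_target = rng.normal(loc=2.0, size=(200, 10))
y_target = (X_target[:, 0] > 0).astype(int)
print("target-region accuracy before unlearning:", model_before.score(X_target, y_target))
print("target-region accuracy after unlearning: ", model_after.score(X_target, y_target))
```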