Adversarial Attacks and Defenses in NLP: Securing Models Against Malicious Inputs

EasyChair Preprint 12271 · 9 pages · Date: February 24, 2024

Abstract

Adversarial attacks in NLP involve crafting inputs deliberately designed to mislead or manipulate a model's output, often through subtle alterations imperceptible to human observers. This paper provides an overview of adversarial attack techniques targeting NLP models, including input perturbations, gradient-based attacks, and semantic attacks. It then surveys existing defense mechanisms aimed at bolstering the robustness of NLP models against such attacks, encompassing adversarial training, input preprocessing, and model interpretability techniques. Finally, it underscores the importance of addressing these security concerns to enable the responsible deployment of NLP technology in real-world applications.

Keyphrases: Adversarial, Attacks, Defenses
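As a minimal sketch of the kind of input perturbation the abstract mentions, the snippet below swaps Latin characters for visually similar Cyrillic homoglyphs, producing text that looks unchanged to a human reader but tokenizes differently for a model. The `perturb` helper and the `HOMOGLYPHS` map are illustrative assumptions, not taken from the paper itself.

```python
def perturb(text: str, substitutions: dict) -> str:
    """Replace characters with look-alike variants, leaving others intact."""
    return "".join(substitutions.get(ch, ch) for ch in text)

# Illustrative Latin -> Cyrillic homoglyph map (an assumption for this sketch).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

original = "great movie"
adversarial = perturb(original, HOMOGLYPHS)

# The two strings render nearly identically but differ at the byte level,
# so a model's tokenizer sees different input than the human reader.
print(adversarial)
print(original == adversarial)
```

Gradient-based attacks work at the other end of the pipeline, using the model's gradients to pick perturbations, but the character-level trick above already shows why "imperceptible to human observers" does not imply "harmless to the model".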