
Adversarial Attacks and Defenses in NLP: Securing Models Against Malicious Inputs

EasyChair Preprint no. 12271

9 pages · Date: February 24, 2024

Abstract

Adversarial attacks in NLP craft inputs that are deliberately designed to mislead or manipulate a model's output, often through subtle alterations imperceptible to human observers. This paper provides an overview of adversarial attack techniques targeting NLP models, including input perturbations, gradient-based attacks, and semantic attacks. It then surveys existing defense mechanisms aimed at strengthening the robustness of NLP models against such attacks, covering adversarial training, input preprocessing, and model interpretability techniques. Finally, it underscores the importance of addressing these security concerns to support the responsible deployment of NLP technology in real-world applications.
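To make the two central notions in the abstract concrete, the following is a minimal, hypothetical sketch (not taken from the paper): a character-swap input perturbation applied to a toy bag-of-words sentiment classifier, followed by a simple form of adversarial training that retrains on perturbed copies of the data. The dataset, perturbation rule, and model are illustrative placeholders.

```python
# Illustrative only: toy input-perturbation attack and adversarial training.
import random

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

random.seed(0)

train_texts = ["great acting and a great plot", "terrible acting, boring plot",
               "loved every minute", "hated every minute"]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative


def perturb(text: str) -> str:
    """Input perturbation: swap two adjacent characters inside one longer word.
    A human still reads the sentence the same way, but a word-level model
    usually sees an out-of-vocabulary token instead of the original word."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if len(w) > 3]
    i = random.choice(candidates)
    w = words[i]
    j = random.randrange(len(w) - 1)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def train(texts, labels):
    """Fit a plain bag-of-words logistic-regression classifier."""
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    return model


clean_model = train(train_texts, train_labels)

victim = "great acting"
attacked = perturb(victim)
print("clean input :", victim, "->", clean_model.predict([victim])[0])
print("perturbed   :", attacked, "->", clean_model.predict([attacked])[0])

# Adversarial training (one simple variant): augment the training set with
# perturbed copies of each example, keep the original labels, and retrain.
aug_texts = train_texts + [perturb(t) for t in train_texts]
aug_labels = train_labels + train_labels
robust_model = train(aug_texts, aug_labels)
print("robust model:", attacked, "->", robust_model.predict([attacked])[0])
```

Gradient-based and semantic attacks follow the same pattern but choose the perturbation differently: the former uses model gradients to pick which tokens to alter, while the latter substitutes semantically equivalent words or paraphrases rather than characters.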

Keyphrases: Adversarial Attacks, Defenses

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:12271,
  author       = {Kurez Oroy and Herber Schield},
  title        = {Adversarial Attacks and Defenses in NLP: Securing Models Against Malicious Inputs},
  howpublished = {EasyChair Preprint no. 12271},
  year         = {EasyChair, 2024}}