Title: IndoClinNER: Overcoming Medical Prior Bias in Clinical de-Identification via Adversarial Surname Injection
Conference: IEEE CBMS 2026
Tags: Adversarial NLP, Alert Fatigue, Clinical Privacy, De-identification, Hybrid AI Architecture and Medical NLP
Abstract: Nursing shift handovers automated via Retrieval-Augmented Generation (RAG) promise significant reductions in administrative burden. Our prior work demonstrated a local-first RAG framework deployable on consumer CPU hardware, achieving a 43.2% reduction in handover time while maintaining zero patient identifier leakage through deterministic regex privacy controls. However, regex-based de-identification triggers false positives when common Bengali and Hindi names (Joy, Deep, Anal) overlap with English vocabulary and medical terminology, risking desensitization to genuine privacy warnings over time, a precursor to alert fatigue. Conversely, Western-trained Named Entity Recognition (NER) models exhibit what we term Medical Prior Bias, systematically failing to detect these homonymous names in clinical contexts. We present IndoClinNER, a hybrid privacy architecture combining deterministic regex, contextual NER, and Adversarial Surname Injection (ASI), a novel inference-time technique that exploits learned bigram dependencies by synthetically injecting surnames to force syntactic disambiguation. To address the scarcity of annotated code-mixed clinical data, we developed a Dual-Path augmentation strategy: injecting realistic Bengali names (constructed from 438 unique first names and 86 surnames extracted from West Bengal voter lists) into authentic MIMIC-III nursing notes via a constrained LLM pipeline. On 450 synthetic adversarial sentences across three independent runs, ASI achieved 99.44% recall with 99.44% precision. On 97 expert-generated clinical notes, the system achieved 84.4% recall, with failure analysis confirming that errors occurred predominantly in telegraphic syntax lacking grammatical markers.
The system operates at 20 ms mean inference latency on a 33M-parameter model running on consumer CPU hardware, suitable for resource-constrained settings without GPU infrastructure.
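The two mechanical ideas in the abstract, a context-blind regex that over-flags homonymous names and the surname-injection step that builds a probe sentence for a downstream NER model, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the three-name lexicon reuses only the examples quoted in the abstract, `Chatterjee` is a placeholder surname rather than one confirmed to be in the authors' list, and the function names `regex_name_hits` and `inject_surname` are hypothetical. How IndoClinNER actually scores the injected text is not described in the abstract, so the NER call is deliberately left out.

```python
import re

# Hypothetical mini-lexicon: the three first names quoted in the abstract.
FIRST_NAMES = ["Joy", "Deep", "Anal"]
NAME_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, FIRST_NAMES)) + r")\b",
    re.IGNORECASE,
)


def regex_name_hits(text: str) -> list[str]:
    """Return every lexicon match, with no regard for context."""
    return [m.group(0) for m in NAME_RE.finditer(text)]


def inject_surname(text: str, surname: str = "Chatterjee") -> list[str]:
    """Build one probe sentence per candidate token by inserting a
    surname directly after it. The placeholder surname is an assumption;
    a contextual NER model (not shown) would then tag each probe."""
    probes = []
    for m in NAME_RE.finditer(text):
        probes.append(text[: m.end()] + " " + surname + text[m.end():])
    return probes


note = "Patient expressed joy after deep breathing exercises."

# The regex flags ordinary vocabulary as names: false positives.
print(regex_name_hits(note))

# One probe per candidate, ready to hand to an NER model.
for probe in inject_surname(note):
    print(probe)
```

In this sketch the regex flags both `joy` and `deep` even though neither is a name in this note, which is the false-positive pattern the abstract associates with alert fatigue. The probes only show the text transformation; deciding whether a candidate is a person would depend on the model's tags for the injected bigram.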
