Title: A Multi-Stage Data Construction Approach for Code-Switching Grammatical Error Correction

Conference: PRICAI 2025

Tags: Code-Switching, Grammatical Error Correction and Perplexity Filtering

Abstract: Code-switching (CSW) refers to the phenomenon where multilingual speakers integrate multiple languages within a single utterance. Although recent studies have made notable progress in developing Grammatical Error Correction (GEC) systems for CSW scenarios involving Chinese lexical items, many still rely on simplistic translation-based data generation, which often limits semantic diversity and fails to capture the complexity of natural CSW expressions. To address this issue, we propose a multi-stage data construction approach to enrich training datasets and improve model generalization. Specifically, we first employ a model-based generation method to produce monolingual augmented data, followed by a perplexity-based (PPL) adaptive filtering algorithm to ensure data diversity and quality. Next, we apply three levels of translation-based augmentation to both the filtered and the original datasets, effectively simulating natural CSW patterns at varying levels of complexity. Finally, we perform multi-stage model training on the combined datasets to progressively enhance model robustness across diverse data distributions. Experimental results show that our optimized model achieves an average improvement of 1.82 $F_{0.5}$ points across two CSW GEC test sets, demonstrating the effectiveness of the proposed approach.
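
To make the perplexity-filtering step concrete, the following is a minimal sketch rather than the authors' algorithm. The abstract does not state the filtering rule, so the band around the median perplexity, the function names `perplexity` and `filter_by_perplexity`, and the `low`/`high` parameters are all assumptions made for illustration. Only the perplexity definition itself (the exponential of the negative mean token log-probability) is standard.

```python
import math
from statistics import median


def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def filter_by_perplexity(scored, low=0.5, high=2.0):
    """Keep sentences whose perplexity lies in [low * median, high * median].

    `scored` is a list of (sentence, perplexity) pairs. The band is
    hypothetical: it adapts to the batch through the median, which is one
    possible reading of "adaptive", not the rule used in the paper.
    """
    if not scored:
        return []
    med = median(ppl for _, ppl in scored)
    return [sent for sent, ppl in scored if low * med <= ppl <= high * med]
```

Under these assumptions, a sentence with very low perplexity (likely a near-duplicate of common text) and one with very high perplexity (likely noisy) would both be dropped, while sentences near the batch median are kept.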
