Re-Evaluating LLMs for Arabic Grammatical Error Correction: A Data-Centric Approach Using Synthetic Data and Knowledge Distillation

Abstract

Large language models (LLMs) have shown strong performance across Natural Language Processing (NLP) tasks, yet their effectiveness in Arabic Grammatical Error Correction (AGEC) relative to smaller, task-specific models remains unclear because evaluation settings have been inconsistent. We address this gap with a controlled comparison between LLMs and specialized sequence-to-sequence (seq2seq) models under identical training conditions, using taxonomy-controlled synthetic data refined through sequence-level knowledge distillation. We construct DIA2-AGEC, a large-scale synthetic dataset of approximately 650K sentence pairs, balanced across the Arabic grammatical error types defined in the ALC taxonomy. To improve target correctness in DIA2-AGEC, we generate a distilled variant, DIA2-Dist, in which a stronger teacher model (Command R7B), itself further improved through iterative self-distillation, re-corrects the synthetic inputs. Two representative models are evaluated under identical conditions: the LLM Qwen-2.5-3B-Instruct and the seq2seq model AraT5v2-base-1024. LLMs consistently underperform specialized models, but the gap narrows as LLM scale increases from 3B to 14B parameters. In parallel, our data-centric approach achieves state-of-the-art performance on QALB: AraT5v2 trained on distilled data reaches 80.86 F0.5 on QALB-test, our LLM-seq2seq ensemble reaches 81.59 F0.5, and adding GPT-4o predictions to the ensemble raises this to 82.37. These findings indicate that data quality and supervision signals are the primary drivers of Arabic GEC performance, and that carefully designed synthetic data can enable open-source models to match and surpass previous state-of-the-art results. They also give a clearer picture of how LLMs scale on this task.
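
For readers unfamiliar with the F0.5 metric reported above, the following is a minimal sketch of how an F-beta score is computed from edit-level counts, assuming the usual definition in which beta = 0.5 weights precision more heavily than recall. The function name `f_beta` and the example counts are hypothetical illustrations; this section does not describe the scorer actually used to produce the reported numbers.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Compute F-beta from counts of correct, spurious, and missed edits.

    tp: proposed edits that match a gold edit
    fp: proposed edits that match no gold edit
    fn: gold edits the system failed to propose
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 correct edits, 2 spurious edits, 8 missed edits.
# Precision is 0.8 and recall is 0.5.
score = f_beta(tp=8, fp=2, fn=8)
print(f"F0.5 = {score:.4f}")
```

With beta = 0.5, a system that proposes fewer but more reliable corrections scores higher than one with the same recall and lower precision, which is why this variant is commonly preferred for grammatical error correction.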

Description

Release date: 2028-05-10.
