Abstract:
Arabic dialogue generation presents unique challenges due to the language's rich
morphology and the scarcity of data resources. Recent advances have employed
meta-learning to facilitate fast adaptation of language models to low-resource domains. This
thesis builds upon such groundwork by introducing paraphrase data augmentation to
further improve the generalization and adaptation capabilities of pre-trained models in
Arabic Natural Language Generation (NLG). We propose an enhanced approach that
leverages a fine-tuned AraT5 model with meta-learning via the Reptile algorithm. Our
methodology augments both the contexts and the responses within the
auxiliary and target datasets. We apply paraphrase data augmentation to 10% and
30% of the seed data and examine the resulting impact on model performance. Our
experiments demonstrate significant improvements in dialogue generation quality, as
evidenced by higher BLEU-4 scores and Semantic Textual Similarity (STS) metrics in
intrinsic evaluation, even with limited data. These results surpass those achieved by the
state-of-the-art methods described in prior work. The qualitative extrinsic evaluations
reinforce the quantitative metrics, indicating a noticeable enhancement in the fluency and
relevance of the generated responses. Our findings suggest that paraphrase data
augmentation, when used judiciously within the framework of meta-learning, can serve
as a powerful tool for advancing the field of Arabic conversational AI, particularly in
low-resource scenarios.
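
As a rough illustration of the two ingredients named above, the following minimal Python sketch shows (a) paraphrasing a fixed fraction of seed context-response pairs and (b) the Reptile meta-update, in which the shared parameters are moved a fraction of the way toward the parameters adapted on a sampled task. It is a toy under stated assumptions, not the thesis implementation: the names `augment_fraction`, `inner_sgd`, and `reptile`, the `paraphrase` callable, all hyperparameter values, and the quadratic loss standing in for a fine-tuned sequence-to-sequence model are hypothetical.

```python
import random


def augment_fraction(pairs, fraction, paraphrase, seed=0):
    """Return the seed pairs plus paraphrased copies of a random subset.

    `pairs` is a list of (context, response) tuples and `paraphrase` is any
    callable str -> str (a hypothetical stand-in for a paraphrasing model).
    Both the context and the response of each chosen pair are paraphrased.
    """
    rng = random.Random(seed)
    k = round(len(pairs) * fraction)  # e.g. 10% or 30% of the seed data
    chosen = rng.sample(pairs, k)
    extra = [(paraphrase(c), paraphrase(r)) for c, r in chosen]
    return pairs + extra


def inner_sgd(weights, target, lr, steps):
    """Run a few SGD steps on the toy loss 0.5 * ||w - target||^2."""
    w = list(weights)
    for _ in range(steps):
        # The gradient of the toy loss with respect to w is (w - target).
        w = [wi - lr * (wi - ti) for wi, ti in zip(w, target)]
    return w


def reptile(weights, tasks, meta_lr=0.5, inner_lr=0.1,
            inner_steps=5, iterations=200, seed=0):
    """Reptile outer loop: phi <- phi + meta_lr * (adapted - phi)."""
    rng = random.Random(seed)
    phi = list(weights)
    for _ in range(iterations):
        target = rng.choice(tasks)  # sample one task (toy: a target vector)
        adapted = inner_sgd(phi, target, inner_lr, inner_steps)
        # Move the shared initialization toward the task-adapted weights.
        phi = [p + meta_lr * (a - p) for p, a in zip(phi, adapted)]
    return phi
```

In this toy setting, each task is just a target vector, so the meta-learned initialization settles between the task optima. In a dialogue setting the inner loop would instead fine-tune the language model on batches from one auxiliary dataset, with the augmented pairs added to that dataset beforehand.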