AraDialAug: A Meta-Learning and Data Augmentation Approach for Arabic Dialogue Generation

dc.contributor.AUBidnumber: 201907829
dc.contributor.advisor: El Hajj, Wassim
dc.contributor.author: El Halabi, Assaad
dc.contributor.commembers: Safa, Haidar
dc.contributor.commembers: Elbassuoni, Shady
dc.contributor.degree: MS
dc.contributor.department: Department of Computer Science
dc.contributor.faculty: Faculty of Arts and Sciences
dc.date: 2024
dc.date.accessioned: 2024-02-07T12:41:27Z
dc.date.available: 2024-02-07T12:41:27Z
dc.date.issued: 2024-02-07
dc.date.submitted: 2024-02-06
dc.description.abstract: Arabic dialogue generation presents unique challenges due to the language's rich morphology and the scarcity of data resources. Recent advances have employed meta-learning to facilitate fast adaptation of language models to low-resource domains. This thesis builds on that groundwork by introducing paraphrase data augmentation to further improve the generalization and adaptation capabilities of pre-trained models in Arabic Natural Language Generation (NLG). We propose an enhanced approach that combines a fine-tuned AraT5 model with meta-learning via the Reptile algorithm. Our methodology augments both the contexts and the responses in the auxiliary and target datasets. We apply paraphrase data augmentation to 10% and 30% of the seed data and examine the resulting impact on model performance. Our experiments demonstrate significant improvements in dialogue generation quality, as evidenced by higher BLEU-4 and Semantic Textual Similarity (STS) scores in intrinsic evaluation, even with limited data. These results surpass those achieved by the state-of-the-art methods described in prior work. The qualitative extrinsic evaluations reinforce the quantitative metrics, indicating a noticeable enhancement in the fluency and relevance of the generated responses. Our findings suggest that paraphrase data augmentation, when used judiciously within a meta-learning framework, can serve as a powerful tool for advancing Arabic conversational AI, particularly in low-resource scenarios.
dc.identifier.uri: http://hdl.handle.net/10938/24319
dc.language.iso: en
dc.subject: Natural Language Processing
dc.subject: Machine Learning
dc.subject: Data Science
dc.subject: Data Augmentation
dc.subject: Arabic Conversational Systems
dc.title: AraDialAug: A Meta-Learning and Data Augmentation Approach for Arabic Dialogue Generation
dc.type: Thesis

Files

Original bundle

Name: ElHalabiAssaad_2024.pdf
Size: 1.76 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.65 KB
Format: Item-specific license agreed upon to submission