Abstract:
Resource Description Framework (RDF) is the standard for representing structured
knowledge on the Web. It describes entities, facts, and events, as well as the relationships between them. RDF verbalizers are important for generating high-quality textual descriptions from such RDF data. Despite the significant work done for English, no efforts have been directed towards low-resource languages such as Arabic. This work promotes the development of RDF data-to-text (D2T) generation systems for Arabic by introducing a new Arabic dataset (AraWebNLG). A comparative study of multiple sequence-to-sequence models is also presented, examining the transfer of knowledge from pre-trained language models (AraBERT, AraGPT2, and mT5) to overcome data limitations. The analysis involves numerical metrics
(BLEU and perplexity scores) as well as task-specific metrics related to the accuracy of content selection and the fluency of the generated text. The results highlight the importance of pre-training on a large corpus of Arabic data, as the AraBERT-initialized model performs best. Text-to-text pre-training using mT5 also achieves competitive results, even with multilingual weights.