AUB ScholarWorks

MODELS AND RESOURCES FOR ARABIC DATA-TO-TEXT GENERATION

dc.contributor.advisor Hajj, Hazem
dc.contributor.author Touma, Roudy
dc.date.accessioned 2022-02-03T06:00:39Z
dc.date.available 2022-02-03T06:00:39Z
dc.date.issued 2022-02-03
dc.date.submitted 2022-02-02
dc.identifier.uri http://hdl.handle.net/10938/23291
dc.description.abstract Resource Description Framework (RDF) is the standard for representing structured knowledge on the Web. It is based on entities such as facts and events, and the relationships between them. RDF verbalizers are needed to generate good-quality textual descriptions from such RDF data. Despite the significant work done for English, no efforts have been directed towards low-resource languages such as Arabic. This work promotes the development of RDF data-to-text (D2T) generation systems for Arabic by introducing a new Arabic dataset (AraWebNLG). It also presents a comparative study of multiple sequence-to-sequence models and examines the transfer of knowledge from pre-trained language models (AraBERT, AraGPT2 and mT5) to overcome data limitations. The analysis involves numerical metrics (BLEU and perplexity scores) as well as task-specific metrics related to the accuracy of content selection and the fluency of the generated text. The results highlight the importance of pre-training on a large corpus of Arabic data, as the AraBERT-initialized model performs best among the models compared. Text-to-text pre-training using mT5 also achieves competitive results, even with multilingual weights.
dc.language.iso en_US
dc.title MODELS AND RESOURCES FOR ARABIC DATA-TO-TEXT GENERATION
dc.type Thesis
dc.contributor.department Department of Electrical and Computer Engineering
dc.contributor.faculty Maroun Semaan Faculty of Engineering and Architecture
dc.contributor.institution American University of Beirut
dc.contributor.commembers Saghir, Mazen
dc.contributor.commembers El Hajj, Wassim
dc.contributor.degree ME
dc.contributor.AUBidnumber 201500600

