AUB ScholarWorks

Transformers for Arabic Natural Language Understanding and Generation

dc.contributor.advisor Hajj, Hazem
dc.contributor.author Antoun, Wissam
dc.date.accessioned 2020-09-23T18:06:19Z
dc.date.available 2020-09-23T18:06:19Z
dc.date.issued 2020-09-23
dc.identifier.uri http://hdl.handle.net/10938/22132
dc.description Mazen Saghir; Wassim El-Hajj
dc.description.abstract Natural Language Processing (NLP) aims to advance Artificial Intelligence by developing methods that enable machines to process language as humans do. While there have been significant breakthroughs in English NLP with the introduction of Machine Learning (ML) models called Transformers, Arabic NLP has lagged behind due to the lack of the large-scale data these new models require. Transformers are a family of deep learning (DL) architectures in which the model learns to combine and weigh the different internal representations of a sentence. Furthermore, Arabic presents its own challenges, such as lexical sparsity and a complex, concatenative morphology. This work aims to advance Arabic NLP tasks and bring their performance closer to that of English NLP. We propose multiple Transformer-based models developed specifically for Arabic Natural Language Understanding (NLU) and Generation (NLG). For Arabic NLU, we developed an Arabic-centric Bidirectional Encoder Representations from Transformers model, called AraBERT, bridging the gap with the English BERT model developed by Google. The model comprises 110 million parameters. For Arabic NLG, we proposed a Transformer-based encoder-decoder architecture to address the challenges of Arabic open-domain chatbots. We built a large conversational dataset annotated with the gender of both interlocutors. The resulting model is the first open-domain gender-aware Arabic chatbot. In the NLU experiments, we applied AraBERT to Arabic text classification and Arabic question answering, achieving state-of-the-art performance compared to multilingual BERT. In the NLG experiments, the results showed that the model succeeds at simple open-ended Arabic conversations, demonstrating basic world knowledge and common-sense reasoning.
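The abstract describes the core Transformer mechanism as the model learning to "combine and weigh the different internal representations of a sentence." The following is a minimal NumPy sketch of that operation (scaled dot-product self-attention); it is an illustrative toy, not code from the thesis, and all names and dimensions here are invented for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weigh the value vectors V by the similarity of queries Q to keys K.

    Each output row is a weighted combination of all token representations,
    which is the "combine and weigh" step the abstract refers to.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise token similarities, scaled
    # softmax over each row so the weights for one token sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy "sentence" of 3 token vectors with hidden dimension 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
# Self-attention: queries, keys, and values all come from the same sequence.
out, w = scaled_dot_product_attention(X, X, X)
```

In a full Transformer such as BERT or AraBERT, Q, K, and V are learned linear projections of the token embeddings, and many such attention heads run in parallel across stacked layers.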
dc.language.iso en
dc.subject Arabic
dc.subject NLP
dc.subject BERT
dc.subject AraBERT
dc.subject Chatbot
dc.subject Transformers
dc.subject NLU
dc.subject NLG
dc.subject Sentiment Analysis
dc.subject Question Answering
dc.subject Hate Speech Detection
dc.subject Offensive Language Detection
dc.subject Named Entity Recognition
dc.title Transformers for Arabic Natural Language Understanding and Generation
dc.type Thesis
dc.contributor.department Department of Electrical and Computer Engineering
dc.contributor.faculty Maroun Semaan Faculty of Engineering and Architecture
dc.contributor.institution American University of Beirut

