dc.contributor.advisor |
Hajj, Hazem |
dc.contributor.author |
Antoun, Wissam |
dc.date.accessioned |
2020-09-23T18:06:19Z |
dc.date.available |
2020-09-23T18:06:19Z |
dc.date.issued |
2020-09-23 |
dc.identifier.uri |
http://hdl.handle.net/10938/22132 |
dc.description |
Mazen Saghir; Wassim El-Hajj |
dc.description.abstract |
Natural Language Processing (NLP) aims at advancing Artificial Intelligence by developing methods that enable machines to process language like humans do. While there have been significant breakthroughs in English NLP with the introduction of Machine Learning (ML) models called Transformers, Arabic NLP has been lagging behind due to the lack of the large-scale data needed by these new models. Transformers represent a special type of deep learning (DL) architecture in which the model learns to combine and weigh the different internal representations of a sentence. Furthermore, Arabic presents its own challenges, such as lexical sparsity and a complex, concatenative morphology. This work aims to advance Arabic NLP tasks and bring their performance closer to that of English NLP. We propose multiple Transformer-based models developed specifically for Arabic Natural Language Understanding (NLU) and Generation (NLG). For Arabic NLU, we developed an Arabic-centric Bidirectional Encoder Representations from Transformers model, called \textsc{AraBERT}, bridging the gap with the English model BERT developed by Google. The model comprises 110 million parameters. For Arabic NLG, we proposed a Transformer-based encoder-decoder architecture to address the challenges of Arabic open-domain chatbots. We built a large conversational dataset annotated with the gender of both interlocutors. The resulting model is the first open-domain gender-aware Arabic chatbot. For the NLU experiments, we applied \textsc{AraBERT} to Arabic text classification and Arabic question answering, where it achieved state-of-the-art performance, outperforming multilingual BERT. For the NLG experiments, the results showed that the model succeeds at simple open-ended Arabic conversations, demonstrating basic world knowledge and common-sense reasoning. |
dc.language.iso |
en |
dc.subject |
Arabic |
dc.subject |
NLP |
dc.subject |
BERT |
dc.subject |
AraBERT |
dc.subject |
Chatbot |
dc.subject |
Transformers |
dc.subject |
NLU |
dc.subject |
NLG |
dc.subject |
Sentiment Analysis |
dc.subject |
Question Answering |
dc.subject |
Hate Speech Detection |
dc.subject |
Offensive Language Detection |
dc.subject |
Named Entity Recognition |
dc.title |
Transformers for Arabic Natural Language Understanding and Generation |
dc.type |
Thesis |
dc.contributor.department |
Department of Electrical and Computer Engineering |
dc.contributor.faculty |
Maroun Semaan Faculty of Engineering and Architecture |
dc.contributor.institution |
American University of Beirut |