Abstract:
Recent advances have made chatbots an essential part of people's daily lives, with uses ranging from answering general questions about the weather to booking movie tickets. Chatbots can be classified as open-domain or task-oriented. Open-domain chatbots can hold engaging conversations on any topic, whereas task-oriented chatbots, which are the focus of this thesis, aim to complete specific tasks such as booking movie tickets. While task-oriented chatbots have advanced significantly in English, task-oriented chatbots in Arabic remain limited in their capabilities, mainly because of the scarcity of datasets and resources for training task-oriented dialogue systems in Arabic. To overcome these challenges, we explored two state-of-the-art strategies for task-oriented bots: end-to-end models, and pipeline models that consist of Natural Language Understanding (NLU) followed by a Dialogue Manager (DM) and Natural Language Generation (NLG). For the end-to-end approach, we proposed using AraGPT2 and created a large multi-domain human-to-human conversational dataset in Arabic by translating a large-scale English dataset. Our end-to-end model achieved state-of-the-art results for Arabic and performed comparably to state-of-the-art English end-to-end models. For the pipeline approach, we addressed the NLU challenge by developing a multi-task model based on AraBERT that performs intent classification and slot filling simultaneously. To train this NLU model, we created a large dataset labeled with intents and slots by translating another large English dataset for training task-oriented bots. The resulting NLU model achieved results comparable to the state-of-the-art results reported for English pipeline models.