Few-shot Learning for Conversational Bots in Low-Resource Settings

Naous, Tarek

URI: http://hdl.handle.net/10938/23363

Date: 4/6/2022

Abstract:

Open-domain dialogue agents are systems that can converse with users on any topic of the user's choice. Building such agents has been a long-standing objective in Artificial Intelligence, as they can make human-computer interaction much more engaging. Recent advances in English open-domain dialogue have leveraged state-of-the-art Large Language Models (LLMs) for Natural Language Generation (NLG). Such LLMs are massively pre-trained on unlabeled data in a self-supervised mode to learn abstract representations of the language. They also require large amounts of labeled open-domain dialogue data for fine-tuning to achieve the challenging task of dialogue response generation. In low-resource settings such as Arabic and its dialects, such pre-trained LLMs and large labeled dialogue datasets are often non-existent, hindering the development of open-domain chatbots for those languages. This limited-resource modeling problem is known as the few-shot learning problem. In this thesis, we address multiple aspects of the few-shot learning problem for open-domain Arabic conversational bots. The first contribution is a solution to overcome the unavailability of LLMs with large amounts of labeled dialogue data for Modern Standard Arabic (MSA). To address the response generation problem, we propose a model that transfers knowledge from a pre-trained BERT encoder to an encoder-decoder model for dialogue response generation. The second contribution addresses a more extreme case of limited resources: Arabic dialects. To address the LLM and NLG challenges for Arabic dialects, we propose a three-stage learning framework based on warm-starting, self-supervised pre-training, and few-shot fine-tuning. The third contribution addresses the challenge of ensuring that generated responses are relevant to the user's query, for both English and Arabic. We propose a new decoding algorithm that considers a larger number of samples during response generation and then chooses the response with the highest similarity to the user's query. The fourth contribution is the development of new data resources for Arabic: one message-response dataset in MSA and three datasets for the most widely spoken Arabic dialects (Levantine, Egyptian, and Gulf). Experimental results showed that the proposed methods were successful and achieved state-of-the-art performance for Arabic open-domain dialogue systems.
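
The decoding idea in the abstract (generate several candidate responses, then keep the one most similar to the user's query) can be illustrated with a minimal sketch. The abstract does not say which similarity measure or sampling method the thesis uses, so everything below is an assumption: `cosine_similarity` and `pick_most_relevant` are hypothetical helper names, bag-of-words cosine similarity stands in for the actual similarity measure, and the candidate list is hard-coded in place of responses sampled from a generation model.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two strings.

    Assumption: a stand-in for whatever similarity measure the thesis uses.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def pick_most_relevant(query: str, candidates: list[str]) -> str:
    """Return the candidate response most similar to the query."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    return max(candidates, key=lambda r: cosine_similarity(query, r))


# Hypothetical candidates standing in for responses sampled from a generator.
samples = [
    "I like pizza",
    "The weather in Beirut is sunny today",
    "I do not know",
]
best = pick_most_relevant("how is the weather in Beirut", samples)
print(best)
```

In a real system the candidates would come from a sampling-based decoder, and the similarity would more plausibly be computed over sentence embeddings than over raw word counts; the selection step itself would keep the same shape.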

Files in this item

Name: NaousTarek_2022.pdf

Size: 2.140Mb

Format: PDF

Description: Thesis Report

This item appears in the following Collection(s)

AUB Students' Theses, Dissertations, and Projects [12709]

Copyright Statement

All materials included in the institutional repository are protected by copyright laws and are the property of their respective copyright holders. Materials may be used for non-commercial, educational, or research purposes only, and must be cited or attributed to the original source. Permission for any other use must be obtained from the copyright holder(s) directly. The American University of Beirut Libraries does not assume responsibility for any infringement of copyright laws that may occur as a result of the use of materials in the repository. If you believe that your copyright has been infringed upon in the repository, please contact the AUB Libraries immediately.

For further information, please contact us at scholarworks@aub.edu.lb
