AUB ScholarWorks

MIXED ARABIC AUDIO DATA SET FOR COCKTAIL PARTY MODELS

dc.contributor.advisor Hajj, Hazem
dc.contributor.author Achour, Rim
dc.date.accessioned 2021-09-17T11:52:21Z
dc.date.available 2021-09-17T11:52:21Z
dc.date.issued 2021-09-17
dc.date.submitted 2021-09-17
dc.identifier.uri http://hdl.handle.net/10938/23045
dc.description.abstract The cocktail party problem (CPP) arises in complex auditory settings. It requires a device to function like the human ear: to tune out noise and focus on a single voice of interest. Machines still perform poorly at this task. To address the problem, speech recognition has been used to separate overlapping speakers in audio recordings and transcribe their speech into text. Researchers face several challenges, such as building fast algorithms and the lack of real datasets, which limits a machine learning (ML) model’s ability to generalize to real-world settings. In particular, no work has been done on solving the CPP for Arabic, primarily because no mixtures of Arabic speech are available. This thesis aims to close this gap by creating a corpus of mixed Arabic audio that can be used to develop CPP ML models. Arabic is a challenging language, especially given its many dialects. The texts available for transcription are non-diacritized and have a complex morphology. To overcome these challenges, we created a corpus of speech mixtures starting from a dataset of individual audio recordings, the Common Voice Modern Standard Arabic dataset. To generate the CPP corpus, the data was processed in four stages: preprocessing, normalization, silence removal, and mixing of the shortest and longest speech durations. The mixtures were evaluated manually by listening to a representative sample and assessing the volume of the mixture, the effectiveness of the silence removal technique, and the extent to which the two speakers overlap. The evaluation showed that the method succeeds in generating mixtures that can be used to develop ML models for Arabic CPP.
dc.language.iso en
dc.title MIXED ARABIC AUDIO DATA SET FOR COCKTAIL PARTY MODELS
dc.type Thesis
dc.contributor.department Graduate Program in Computational Science
dc.contributor.faculty Faculty of Arts and Sciences
dc.contributor.institution American University of Beirut
dc.contributor.commembers Elbassuoni, Shady
dc.contributor.commembers Nassif, Nabil
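
The normalization, silence removal, and mixing stages named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not describe the actual algorithms, so peak normalization, a frame-energy silence gate, and the reading of "shortest and longest" as truncate-to-shorter versus pad-to-longer are all assumptions, as are the function names, the frame length, and the threshold.

```python
import numpy as np

def peak_normalize(x, peak=0.9):
    """Scale a waveform so its largest absolute sample equals `peak` (assumed scheme)."""
    m = np.max(np.abs(x)) if x.size else 0.0
    return x if m == 0 else x * (peak / m)

def remove_silence(x, frame_len=400, threshold=0.01):
    """Drop fixed-size frames whose RMS energy is below `threshold`.

    A simple energy gate chosen for illustration; samples past the last
    full frame are discarded.
    """
    n_frames = len(x) // frame_len
    if n_frames == 0:
        return x
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return frames[rms >= threshold].reshape(-1)

def mix(a, b, mode="shortest"):
    """Sum two waveforms and renormalize the result.

    mode="shortest" truncates both to the shorter length;
    mode="longest" zero-pads the shorter one to the longer length.
    """
    if mode == "shortest":
        n = min(len(a), len(b))
        a, b = a[:n], b[:n]
    elif mode == "longest":
        n = max(len(a), len(b))
        a = np.pad(a, (0, n - len(a)))
        b = np.pad(b, (0, n - len(b)))
    else:
        raise ValueError("mode must be 'shortest' or 'longest'")
    return peak_normalize(a + b)
```

In use, each single-speaker recording would be passed through `peak_normalize` and `remove_silence` before two of them are handed to `mix`. Renormalizing after the sum keeps the mixture from clipping.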

