MIXED ARABIC AUDIO DATA SET FOR COCKTAIL PARTY MODELS

Achour, Rim

dc.contributor.advisor	Hajj, Hazem
dc.contributor.author	Achour, Rim
dc.date.accessioned	2021-09-17T11:52:21Z
dc.date.available	2021-09-17T11:52:21Z
dc.date.issued	2021-09-17
dc.date.submitted	2021-09-17
dc.identifier.uri	http://hdl.handle.net/10938/23045
dc.description.abstract	The cocktail party problem (CPP) arises in complex auditory settings. Solving it requires a device to function like the human ear: to tune out background noise and focus on a single voice of interest. Machines still perform poorly at this task. To address it, speech recognition has been used to separate overlapping speakers in audio recordings and transcribe their speech into text. Researchers face many challenges, such as building fast algorithms and the lack of real datasets; the latter limits a machine learning (ML) model's ability to generalize to real-world settings. No prior work has addressed the CPP for Arabic, primarily because no mixtures of Arabic speech are available. This thesis aims to close this gap by creating a corpus of mixed Arabic audio that can be used to develop CPP ML models. The Arabic language is challenging, especially given its many dialects, and the available texts used for transcription are non-diacritized and morphologically complex. To overcome these challenges, we created a corpus of speech mixtures starting from a dataset of individual audio recordings, Common Voice Modern Standard Arabic. To generate the CPP corpus, the data was processed in four stages: preprocessing, normalization, silence removal, and mixing of the shortest and longest speech durations. The mixtures were evaluated manually by listening to a representative sample and assessing the mixture volume, the effectiveness of the silence-removal technique, and the extent to which the two speakers overlap. The evaluation showed that the method succeeds in generating mixtures suitable for developing ML models for Arabic CPP.
dc.language.iso	en
dc.title	MIXED ARABIC AUDIO DATA SET FOR COCKTAIL PARTY MODELS
dc.type	Thesis
dc.contributor.department	Graduate Program in Computational Science
dc.contributor.faculty	Faculty of Arts and Sciences
dc.contributor.institution	American University of Beirut
dc.contributor.commembers	Elbassuoni, Shady
dc.contributor.commembers	Nassif, Nabil
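The normalization, silence-removal, and mixing stages described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the peak-normalization target (0.9), the 400-sample frame length, and the RMS silence threshold (0.01) are hypothetical choices. Reading "mixing of the shortest and longest speech durations" as truncating both utterances to the shorter one versus zero-padding to the longer one (similar to the "min" and "max" variants of wsj0-2mix) is also an assumption. Preprocessing (decoding Common Voice clips, resampling) is omitted; signals are assumed to be mono lists of floats at a common sample rate.

```python
import math
from typing import List

Signal = List[float]  # mono samples, assumed already at a common sample rate


def peak_normalize(x: Signal, target: float = 0.9) -> Signal:
    """Scale x so its largest absolute sample equals `target` (hypothetical value)."""
    peak = max((abs(s) for s in x), default=0.0)
    if peak == 0.0:
        return list(x)  # all-silent input: nothing to scale
    return [s * target / peak for s in x]


def remove_silence(x: Signal, frame_len: int = 400, threshold: float = 0.01) -> Signal:
    """Drop fixed-length frames whose RMS energy is below `threshold`.

    A simple energy-based stand-in; the thesis's actual technique may differ.
    """
    kept: Signal = []
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]  # never empty, since start < len(x)
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept


def mix(a: Signal, b: Signal, mode: str = "min") -> Signal:
    """Overlay two utterances and renormalize the sum to avoid clipping.

    mode="min": truncate both to the shorter duration (speakers fully overlap).
    mode="max": zero-pad the shorter one to the longer duration.
    """
    if mode == "min":
        n = min(len(a), len(b))
        a, b = a[:n], b[:n]
    elif mode == "max":
        n = max(len(a), len(b))
        a = a + [0.0] * (n - len(a))
        b = b + [0.0] * (n - len(b))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return peak_normalize([s + t for s, t in zip(a, b)])


def make_mixture(a: Signal, b: Signal, mode: str = "min") -> Signal:
    """Hypothetical pipeline: normalize each speaker, strip silence, then mix."""
    return mix(remove_silence(peak_normalize(a)), remove_silence(peak_normalize(b)), mode)


if __name__ == "__main__":
    sr = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]  # 1 s "speech"
    speaker_a = [0.0] * 4000 + tone  # 0.25 s of leading silence, removed by remove_silence
    speaker_b = tone[:8000]          # 0.5 s utterance
    print(len(make_mixture(speaker_a, speaker_b, "min")))  # 8000
    print(len(make_mixture(speaker_a, speaker_b, "max")))  # 16000
```

The "min" mode guarantees that both speakers are active for the whole mixture, which maximizes overlap; the "max" mode preserves each utterance in full at the cost of a single-speaker tail.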

Files in this item

Name: AchourRim_2021.pdf

Size: 1.592Mb

Format: PDF

Description: Thesis Report

This item appears in the following Collection(s)

AUB Students' Theses, Dissertations, and Projects [12709]
