AUB ScholarWorks

Direct Speech to Speech Turkish to Arabic

dc.contributor.advisor El Hajj, Wassim
dc.contributor.author Baali, Massa
dc.date.accessioned 2021-04-23T17:20:48Z
dc.date.available 2021-04-23T17:20:48Z
dc.date.issued 4/23/2021
dc.identifier.uri http://hdl.handle.net/10938/22441
dc.description Ahmed Ali Mohammad Nassar
dc.description.abstract Dubbed series have gained considerable popularity in recent years, with strong support from major media service providers. This popularity is fueled by studies showing that dubbed versions of TV shows are more popular than their subtitled equivalents. In this paper, we propose an unsupervised approach to constructing a speech-to-speech corpus, aligned at the short-segment level, that yields a parallel speech corpus in the source and target languages. Our methodology exploits speech recognition, machine translation, and noisy-frame removal algorithms to match segments in both languages. Without loss of generality, our approach was successfully applied to Turkish-Arabic dubbed series. Out of 36 hours, our pipeline was able to generate 17 hours of paired segments with 70% overall accuracy. The corpus will be freely available to the research community.
dc.language.iso en
dc.subject Deep learning
dc.subject Speech to Speech
dc.subject Speech Recognition
dc.subject Parallel Corpus
dc.title Direct Speech to Speech Turkish to Arabic
dc.type Thesis
dc.contributor.department Department of Computer Science
dc.contributor.faculty Faculty of Arts and Sciences
dc.contributor.institution American University of Beirut

