SAraBERT: Affixing Inter-Sentence Transformers to AraBERT

dc.contributor.advisor: Awad, Mariette
dc.contributor.author: Shameseldeen, Sami
dc.contributor.commembers: Elbassuoni, Shady
dc.contributor.commembers: Khreich, Wael
dc.contributor.degree: MS
dc.contributor.department: Computational Science Program
dc.contributor.faculty: Faculty of Arts and Sciences
dc.contributor.institution: American University of Beirut
dc.date: 2022
dc.date.accessioned: 2022-05-18T10:01:24Z
dc.date.available: 2022-05-18T10:01:24Z
dc.date.issued: 2022-05-17T21:00:00Z
dc.date.submitted: 2022-05-11T21:00:00Z
dc.description.abstract: Natural language processing (NLP) has advanced remarkably with the advent of deep learning. Deep learning models have improved results on NLP tasks such as text summarization, text translation, and sentiment analysis. In particular, text summarization is becoming an important task as the number and volume of electronic documents increase rapidly. However, NLP for Modern Standard Arabic (MSA) has not received enough research attention, owing to the many challenges the language poses, among them its inherent complexity and the lack of structured data. In this research, we introduce SAraBERT, an enhanced version of AraBERT that adds inter-sentence transformer layers for extractive summarization tasks. To ensure that the generated summaries achieve high coverage of the document's main ideas, we propose Semantic Siamese Similarity, a novel evaluation metric that measures the level of similarity between two text inputs. Testing SAraBERT and published related models using BLEU, ROUGE, and Semantic Siamese Similarity showed the effectiveness of our proposed model and motivates follow-on research.
dc.identifier.uri: http://hdl.handle.net/10938/23460
dc.language.iso: en
dc.subject: Arabic NLP
dc.subject: Natural Language Processing
dc.subject: Extractive Summarization
dc.subject: Text Summarization
dc.subject: Transformers
dc.subject: Deep Learning
dc.title: SAraBERT: Affixing Inter-Sentence Transformers to AraBERT
dc.type: Thesis
local.AUBID: 202120016
