Abstract:
Natural language processing (NLP) has made remarkable advances with the advent of deep learning. Deep learning models have produced improved results on NLP tasks such as text summarization, machine translation, and sentiment analysis. In particular, text summarization is becoming an increasingly important task as the number and volume of electronic documents grow rapidly. However, NLP for Modern Standard Arabic (MSA) has not received sufficient research attention, owing to the many challenges the language presents, including its morphological complexity and the lack of structured data. In this research, we introduce SAraBERT, an enhanced version of AraBERT that adds inter-sentence transformer layers for extractive summarization. To ensure that the generated summaries achieve high coverage of a document's main ideas, we propose Semantic Siamese Similarity, a novel evaluation metric that measures the semantic similarity between two text inputs. Evaluation with BLEU, ROUGE, and Semantic Siamese Similarity on SAraBERT and published related models demonstrates the effectiveness of our proposed model and motivates follow-on research.