Detecting Hate Speech Across Arabic Dialects

Abstract

With the ever-increasing adoption of social network platforms, online hate speech has become a pressing and growing issue. Hate speech detection in English is attracting increasing attention, and some detection systems have shown successful results. In contrast, hate speech detection in Arabic still faces various challenges, mainly due to the wide variety of Arabic dialects. The main goal of this work is to build an accurate hate speech detection system that generalizes well across different Arabic dialects. To that end, we conduct an extensive analysis of various preprocessing techniques (e.g., stemming, lemmatization, and emoji translation), feature extraction techniques (e.g., frequency-based features and word embeddings), classification models (including Logistic Regression and Support Vector Machine), and combination techniques (at the data, feature, and model level). We also fine-tune BERT models and optimize their hyperparameters for our detection tasks. Our experiments use six datasets containing different dialects, plus three further datasets covering the Levantine dialect, the Tunisian dialect, and a combination of several dialects. 80% of each of the six datasets is combined and used for model training and validation, while the remaining 20% is used for model evaluation. The three remaining datasets are kept for testing the generalization of our best models. The results on our test sets indicate that a score-level combination of three models, namely logistic regression using (unigram) term frequency-inverse document frequency (TF-IDF) features, logistic regression using AraVec word embedding features, and support vector machine using TF-IDF features, achieves good detection performance across all test sets, with area under the curve (AUC) of 84%, 89%, and 78% on the three unseen datasets. In addition, we find that using lemmatization and considering emojis' meanings have a considerable impact on the results. The pre-trained AraBERT model outperforms all other trained models and generalizes better, with AUC scores of 91%, 93%, and 85% on the unseen datasets. The results indicate that both this combination of models and AraBERT are robust to data imbalance and achieve relatively good generalization performance.
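
The abstract does not say how the scores of the three models are combined, so the following is only a minimal sketch of one common option: min-max normalize each model's scores, average them, and evaluate the result with AUC. The function names (min_max, combine_scores, auc) and the toy scores are hypothetical and chosen for illustration; they are not taken from the thesis.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so models with different score ranges are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant scorer carries no ranking information; map it to the midpoint.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine_scores(per_model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average the normalized scores of several models (assumed combination rule)."""
    normalized = [min_max(s) for s in per_model_scores]
    n_models = len(normalized)
    return [sum(column) / n_models for column in zip(*normalized)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = hate speech, 0 = not hate speech.
labels = [1, 0, 1, 0]
lr_tfidf = [0.9, 0.4, 0.3, 0.2]     # stand-in for logistic regression on TF-IDF
lr_aravec = [0.6, 0.5, 0.7, 0.1]    # stand-in for logistic regression on AraVec
svm_tfidf = [1.2, -0.5, 0.1, -1.0]  # stand-in for SVM decision values on TF-IDF

combined = combine_scores([lr_tfidf, lr_aravec, svm_tfidf])
combined_auc = auc(labels, combined)
single_auc = auc(labels, lr_tfidf)
```

Normalizing before averaging matters here because SVM decision values are unbounded while logistic regression outputs probabilities; without it, the SVM could dominate the average. Other combination rules (weighted averaging, voting, stacking) would fit the same interface.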

Keywords

Hate Speech Detection, Social Media, Arabic Dialects, Machine Learning Algorithms, Language Models
