Assessing in real-time the credibility of Arabic blog posts using traditional and deep learning models

Publisher

Springer

Abstract

Blogging websites are growing globally at a fast pace, allowing online users to express their views and engage in discussions on domains such as politics, technology, and lifestyle. While some blog posts state facts and genuine personal views, others spread rumors or support propaganda. This has created a need for models that automatically rate the credibility of blog posts. Arabic blog posts in particular have drawn considerable attention following the recent uprisings in the Arab world. To the best of our knowledge, little work has been done on predicting the credibility of Arabic blogs, a task that faces several challenges: the subjectivity and complexity inherent in assessing credibility, the rich morphology of the Arabic language, and the lack of appropriate lexicons and corpora for credibility analysis. In this paper, we focus on developing a fully automated system to assess the credibility of Arabic blog posts. We collected Arabic blog posts, annotated them, extracted and reduced the important features, and then employed various machine learning models (e.g., Support Vector Machines) and deep learning models, such as Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models, under various input settings. We conclude that LSTM performs best, with accuracy reaching 74%, when the input is composed of the full blog posts along with a set of syntactic and morphological features. Incorporating hand-crafted features and adding a CNN to extract more complex features did not improve the accuracy. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature.
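
The abstract does not spell out how full blog posts are prepared as model input or how accuracy is computed, so the following is only a minimal sketch under stated assumptions. It shows one common way to turn raw posts into fixed-length integer sequences (the usual input form for a sequence model such as an LSTM) and to score predictions against annotated labels. The function names `build_vocab`, `encode`, and `accuracy`, the reserved ids, and the whitespace tokenization are all hypothetical choices for illustration, not the authors' pipeline; given the rich morphology of Arabic, a real system would likely need a dedicated tokenizer or morphological analyzer.

```python
from collections import Counter

# Hypothetical reserved ids: padding and out-of-vocabulary tokens.
PAD, UNK = 0, 1


def build_vocab(posts, min_count=1):
    """Map each whitespace token seen at least min_count times to an integer id.

    Assumption: whitespace splitting stands in for real Arabic tokenization.
    """
    counts = Counter(tok for post in posts for tok in post.split())
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = {}
    for tok, n in ordered:
        if n >= min_count:
            vocab[tok] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(post, vocab, max_len):
    """Turn a post into a fixed-length list of ids (truncate or right-pad)."""
    ids = [vocab.get(tok, UNK) for tok in post.split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def accuracy(predicted, gold):
    """Fraction of posts whose predicted label matches the annotated label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

In such a setup, the encoded sequences would be passed to an embedding layer followed by the sequence model, and any syntactic or morphological features would be supplied as a separate input vector; how the paper actually combines them is not stated in the abstract.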

Keywords

Blog credibility, Corpus development, Deep learning, Machine learning, Automation, Blogs, Complex networks, Learning systems, Long short-term memory, Support vector machines, Arabic languages, Convolution neural network, Fully automated, Important features, Learning models, Machine learning models, Morphological features, Online users
