Abstract:
Blogging websites are growing globally, allowing online users to express their views and engage in discussions related to various domains such as politics, technology, entertainment, and lifestyle. Posted blog entries often reflect their authors’ trustworthiness, quality, authority and believability, which vary from one author to another. While some blog posts state facts, others tend to spread rumors, state personal views, or support certain propaganda. The aim of this work is to create models to automatically rate the credibility of Arabic blog posts in real time, adopting the Merriam-Webster definition of credibility: the quality of being believed or accepted as true, real or honest. We focus on Arabic blog posts due to their recent popularity, fueled by the recent uprisings in the Arab world, and due to the scarcity of tools for assessing the credibility of Arabic blog posts. We note that Arabic Natural Language Processing (NLP) is challenging due to the natural complexity of the Arabic language and its very rich morphology, the unavailability of benchmark corpora, and the immaturity of its NLP tools compared to those available for English and other languages. To achieve our objective, we first compiled a set of credibility features from the literature, and added other features that we believe affect the credibility of Arabic blog posts. We then selected 25 Arabic blog posts from the web, extracted these features, and annotated the posts for credibility. Afterwards, we applied feature selection, and reduced the feature space to the four features that affected credibility the most, namely: reasonability, bias, objectivity, and sentiment.
Having selected the features of interest, we annotated a manually collected, medium-size corpus of 273 Arabic blog posts and created several classification models, including SVM, Neural Nets, Decision Trees and others. Among these, we ended up using Decision Trees, which achieved 74 percent accuracy and F-measure score, and a 10 percent increase on those scores (84 percent) when we tested the mode
Description:
Thesis. M.S. American University of Beirut. Department of Computer Science, 2017. T:6563
Advisor : Dr. Wassim El Hajj, Associate Professor, Computer Science ; Committee members : Dr. Shady Elbassuoni, Assistant Professor, Computer Science ; Dr. Hazem Hajj, Associate Professor, Electrical and Computer Engineering.
Includes bibliographical references (leaves 68-71)