AUB ScholarWorks

Mining for credible opinions in Arabic blogs


dc.contributor.author Al Zaatari, Ayman Bassam,
dc.date.accessioned 2017-12-11T16:30:54Z
dc.date.available 2017-12-11T16:30:54Z
dc.date.issued 2017
dc.date.submitted 2017
dc.identifier.other b19152760
dc.identifier.uri http://hdl.handle.net/10938/21000
dc.description Thesis. M.S. American University of Beirut. Department of Computer Science, 2017. T:6563
dc.description Advisor : Dr. Wassim El Hajj, Associate Professor, Computer Science ; Committee members : Dr. Shady Elbassuoni, Assistant Professor, Computer Science ; Dr. Hazem Hajj, Associate Professor, Electrical and Computer Engineering.
dc.description Includes bibliographical references (leaves 68-71)
dc.description.abstract Blogging websites are growing globally, allowing online users to express their views and engage in discussions related to various domains such as politics, technology, entertainment, and lifestyle. Posted blog entries often reflect their authors’ trustworthiness, quality, authority and believability, which vary from one author to another. While some blog posts state facts, others tend to spread rumors, state personal views, or support certain propaganda. The aim of this work is to create models to automatically rate the credibility of Arabic blog posts in real time, adopting the Merriam-Webster definition of credibility: "the quality of being believed or accepted as true, real or honest." We focus on Arabic blog posts due to their recent popularity, fueled by the recent uprisings in the Arab world, and due to the scarcity of tools for assessing the credibility of Arabic blog posts. We note that Arabic Natural Language Processing (NLP) is challenging due to the natural complexity of the Arabic language and its very rich morphology, the unavailability of benchmark corpora, and the immaturity of its NLP tools compared to those available for English and other languages. To achieve our objective, we first compiled a set of credibility features from the literature, and added other features that we believe affect the credibility of Arabic blog posts. We then selected 25 Arabic blog posts from the web, extracted these features, and annotated the posts for credibility. Afterwards, we applied feature selection and reduced the feature space to the four features that affected credibility the most, namely: reasonability, bias, objectivity, and sentiment.
Having selected the features of interest, we annotated a manually collected medium-size corpus of 273 Arabic blog posts, and created several classification models, including SVM, neural networks, decision trees, and others. Among these, we ended up using decision trees, which achieved 74 percent accuracy and F-measure score, and a 10 percent increase on those scores (84 percent) when we tested the mode
dc.format.extent 1 online resource (vii, 71 leaves) : color illustrations
dc.language.iso eng
dc.relation.ispartof Theses, Dissertations, and Projects
dc.subject.classification T:006563
dc.subject.lcsh Data mining.
dc.subject.lcsh Machine learning.
dc.subject.lcsh Arabic language -- Morphology.
dc.subject.lcsh Natural language processing (Computer science)
dc.subject.lcsh Artificial intelligence.
dc.subject.lcsh Blogs.
dc.subject.lcsh Text processing (Computer science)
dc.subject.lcsh Computational linguistics.
dc.title Mining for credible opinions in Arabic blogs
dc.type Thesis
dc.contributor.department Faculty of Arts and Sciences.
dc.contributor.department Department of Computer Science,
dc.contributor.institution American University of Beirut.
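
The abstract reports classifier quality as accuracy and F-measure. As a rough illustration of how those two scores are computed, the following is a minimal sketch and not the thesis's own code. The label names ("credible", "not_credible"), the toy predictions, and the choice of a single positive class for the F-measure are all assumptions made for this example; the record does not state the labeling scheme or how the F-measure was averaged.

```python
def accuracy(y_true, y_pred):
    """Fraction of posts whose predicted label matches the annotated label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class (F1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations and predictions for five blog posts.
y_true = ["credible", "credible", "not_credible", "not_credible", "credible"]
y_pred = ["credible", "not_credible", "not_credible", "credible", "credible"]

acc = accuracy(y_true, y_pred)
f1 = f_measure(y_true, y_pred, positive="credible")
print(f"accuracy={acc:.2f} f_measure={f1:.2f}")
```

On a balanced test set the two scores often land close together, which is consistent with the abstract quoting a single figure for both.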

