AUB ScholarWorks

Credibility models for Arabic content on Twitter

dc.contributor.author El-Ballouli, Reem Ossama
dc.date 2014
dc.date.accessioned 2015-02-03T10:35:12Z
dc.date.available 2015-02-03T10:35:12Z
dc.date.issued 2014
dc.date.submitted 2014
dc.identifier.other b18295009
dc.identifier.uri http://hdl.handle.net/10938/10104
dc.description Thesis. M.S. American University of Beirut. Department of Computer Science, 2014. T:6109
dc.description Advisor : Dr. Wassim El Hajj, Associate Professor, Computer Science ; Members of Committee : Dr. Shady ELbassouni, Assistant Professor, Computer Science ; Dr. Hazem Hajj, Associate Professor, Electrical and Computer Engineering.
dc.description Includes bibliographical references (leaves 42-43)
dc.description.abstract Microblogging websites such as Twitter have gained popularity as an effective and quick means of expressing opinions, sharing news, and promoting information and updates. As a result, data generated on Twitter has become a vital and rich source for tasks such as sentiment mining and newsgathering. However, a significant portion of such data is biased, untruthful, spam, or otherwise non-credible. Consequently, filtering out non-credible tweets becomes crucial when performing data analysis on Twitter. In this work, we present credibility models for content on Twitter. We focus on Arabic tweets due to the recent popularity of Twitter in the Arab world and the large portion of non-credible tweets in Arabic. We build a binary credibility classifier that labels a tweet belonging to a given topic as either credible or non-credible. The suggested classifier relies on an exhaustive set of features extracted from both the author of the tweet (user-based) and the tweet itself (content-based). To evaluate the performance of the suggested classifier in categorizing credible vs. non-credible tweets, we compared it to several baselines and to state-of-the-art approaches. The classifier consistently surpassed the accuracy of the baseline approaches. It also outperformed the state-of-the-art approaches with an increase of 14 percent in F-measure. Furthermore, we analyzed our feature set by comparing the accuracy of the classifier when trained on user-based features only versus content-based features only. Overall, user-based features alone yielded better accuracy than content-based features alone when tested on multiple topics, indicating that features related to the tweet author are more important than features related to the tweet content when deciding on tweet credibility.
dc.format.extent 1 online resource (xi, 43 leaves) : illustrations ; 30 cm
dc.language.iso eng
dc.relation.ispartof Theses, Dissertations, and Projects
dc.subject.classification T:006109 AUBNO
dc.subject.lcsh Data mining.
dc.subject.lcsh Social media.
dc.subject.lcsh Arabic language -- Texts.
dc.subject.lcsh Web sites.
dc.subject.lcsh Microblogs.
dc.title Credibility models for Arabic content on Twitter
dc.type Thesis
dc.contributor.department American University of Beirut. Faculty of Arts and Sciences. Department of Computer Science, degree granting institution.

