Abstract:
Microblogging websites such as Twitter have gained popularity as a quick and effective means of expressing opinions, sharing news, and promoting information and updates. As a result, data generated on Twitter has become a rich and vital source for tasks such as sentiment mining and newsgathering. However, a significant portion of this data is biased, untruthful, spam, or otherwise non-credible. Consequently, filtering out non-credible tweets becomes crucial when performing data analysis tasks on Twitter. In this work, we present credibility models for content on Twitter. We focus on Arabic tweets because of the recent popularity of Twitter in the Arab world and the large portion of non-credible tweets in Arabic. We build a binary credibility classifier that labels a tweet belonging to a given topic as either credible or non-credible. The suggested classifier relies on an exhaustive set of features extracted from both the author of the tweet (user-based) and the tweet itself (content-based). To evaluate how well the classifier separates credible from non-credible tweets, we compared it to several baselines and to state-of-the-art approaches. The classifier consistently surpassed the accuracy of the baseline approaches. It also outperformed the state-of-the-art approaches, with an increase of 14 percent in F-measure. Furthermore, we analyzed our feature set by comparing the accuracy of the classifier when trained on user-based features only versus content-based features only. Overall, user-based features alone yielded better accuracy than content-based features alone when tested on multiple topics, indicating that features related to the tweet author are more important than features related to the tweet content when deciding on tweet credibility.
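For readers unfamiliar with the F-measure reported above, the following is a minimal sketch of how precision, recall, and their harmonic mean (F1) can be computed for a binary credible vs. non-credible labeling. The function name, the label strings, and the toy labels are assumptions made for illustration only; they are not taken from the thesis, and the thesis may use a different variant or averaging of the F-measure.

```python
def f_measure(y_true, y_pred, positive="credible"):
    """Return (precision, recall, F1) for the chosen positive class.

    y_true and y_pred are equal-length sequences of labels.
    The label names are hypothetical, chosen only for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration (not data from the thesis).
y_true = ["credible", "credible", "non-credible", "non-credible", "credible", "non-credible"]
y_pred = ["credible", "non-credible", "non-credible", "credible", "credible", "non-credible"]

precision, recall, f1 = f_measure(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

The same function could be applied to the predictions of a classifier trained on user-based features only and to one trained on content-based features only, which gives a simple way to compare the two feature groups on a held-out set.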
Description:
Thesis. M.S. American University of Beirut. Department of Computer Science, 2014. T:6109
Advisor : Dr. Wassim El Hajj, Associate Professor, Computer Science ; Members of Committee : Dr. Shady ELbassouni, Assistant Professor, Computer Science ; Dr. Hazem Hajj, Associate Professor, Electrical and Computer Engineering.
Includes bibliographical references (leaves 42-43)