AUB ScholarWorks

A Large Scale Analysis of COVID-19 Tweets in the Arab Region

dc.contributor.advisor Elbassuoni, Shady
dc.contributor.author Mourad, Aya
dc.date.accessioned 2022-02-07T05:59:30Z
dc.date.available 2022-02-07T05:59:30Z
dc.date.issued 2/7/2022
dc.date.submitted 2/6/2022
dc.identifier.uri http://hdl.handle.net/10938/23319
dc.description.abstract Since the first case was discovered in Wuhan, China, in December 2019, the coronavirus disease (COVID-19) has caused harm worldwide. It spread rapidly to the Arab world, affecting public health, the economy, and mental health. To combat its spread, Arab governments announced many states of emergency and curfews. As a result, most people started communicating about the pandemic through social media platforms such as Twitter. This thesis proposes a suite of text mining tasks to extract useful insights into people’s perceptions of and reactions to the pandemic. We identified 11 relevant topics based on intensive sampling of randomly selected tweets from a large dataset of 6,710,598 tweets spanning February 1, 2020, to April 30, 2020, combined with an extensive literature review. The dataset consists of geolocated multilingual tweets from the Arab region in English, Arabic, and French. We then defined an annotation schema that classifies tweets as misinformative or informative, with the informative tweets divided into 10 fine-grained classes. The resulting labeled datasets, composed of 5600 English, 4725 Arabic, and 5496 French tweets, were fed to different deep learning and transformer models, including CNN, BiLSTM, and BERT, to conduct multi-label classification. The performance evaluation shows that the BERT-based models outperformed the other deep learning models in classifying English, multidialect Arabic, and French tweets, with F1-Micro scores of 0.84, 0.81, and 0.87, respectively. We also tested the BERT-based models and performed a large-scale analysis on an unlabeled dataset spanning February 1, 2020, to March 31, 2021. The share of tweets was highest in Saudi Arabia (23%), the UAE (20%), and Egypt (8%). The analysis by gender shows that male users in the Arab region mainly discussed the conspiracy theory and governmental measures topics, making up 68.5% of the total tweets.
The topics debated showed a remarkably similar pattern of rapid rise and slow decline across the region. A sudden surge in the vaccine topic was observed after October 2020, and discussion of it continued to increase afterward. The conversation in the Arab region was strongly negative until mid-September 2020, when positive sentiment started to dominate, coinciding with the period in which the vaccine topic was discussed. Overall, the analysis shows that optimistic feelings increased over time. Surprisingly, Saudi Arabia (41.7%) and other countries, including Kuwait (36.5%), Bahrain (36.5%), and Jordan (35.6%), had higher positive sentiment than negative.
dc.language.iso en_US
dc.subject NLP, Deep Learning
dc.title A Large Scale Analysis of COVID-19 Tweets in the Arab Region
dc.type Thesis
dc.contributor.department Department of Computer Science
dc.contributor.faculty Faculty of Arts and Sciences
dc.contributor.institution American University of Beirut
dc.contributor.commembers Safa, Haidar
dc.contributor.commembers Awad, Mariette
dc.contributor.degree MS
dc.contributor.AUBidnumber 201300800
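
The abstract reports F1-Micro scores for multi-label tweet classification. As a reading aid, the following is a minimal sketch of how a micro-averaged F1 score is commonly computed when each tweet can carry several labels: true positives, false positives, and false negatives are pooled over all tweets and labels before the score is taken. The function name `micro_f1`, the label strings, and the toy data are assumptions made for illustration; they are not taken from the thesis, and the thesis may compute the metric with a library routine instead.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], pred_labels: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    Counts are pooled across all tweets and all labels, so frequent
    labels weigh more than rare ones.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have the same length")

    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but not in the gold set
        fn += len(truth - pred)   # gold labels the model missed

    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when nothing is labeled.
    return 2 * tp / denom if denom else 0.0


# Hypothetical gold and predicted label sets for three tweets (illustration only).
gold = [{"vaccine"}, {"misinformative", "governmental_measures"}, {"conspiracy_theory"}]
pred = [{"vaccine"}, {"misinformative"}, {"vaccine"}]
score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

Because the counts are pooled, a micro-averaged score reflects overall per-label accuracy across the dataset rather than treating each class equally, which is one reason it is often reported for imbalanced multi-label data.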

