AUB ScholarWorks

Information Extraction from Arabic Social Media Content

dc.contributor.advisor Zaraket, Fadi
dc.contributor.author Dankar, Rayan
dc.date.accessioned 2020-09-23T18:00:51Z
dc.date.available 2020-09-23T18:00:51Z
dc.date.issued 2020-09-23
dc.identifier.uri http://hdl.handle.net/10938/22111
dc.description.abstract Stakeholders such as advertisers, celebrities, and politicians are showing strong interest in extracting information from social media content. Social media content includes posts and interactions in local dialects. Arabic and its local dialects are among the most widely used languages in the world. Modern Standard Arabic (MSA) is dominant in formal platforms and events such as newscasts, speeches, books, and newspapers. Dialects of Arabic are dominant in everyday communication and are increasingly used on social media platforms. Information extraction techniques focus on extracting entities and relational entities from unstructured text. They have limited support for MSA and lack support for Arabic dialects. In this thesis, we construct the resources necessary for information extraction from Arabic social media content. We introduce a method to retrieve relevant information by extracting entities and relational entities from Modern Standard and dialectal Arabic text. To improve entity and relational entity extraction, we construct ADAT, an Arabic Dialect Annotation Tool. ADAT serves as a building block for constructing computational linguistic models from Arabic text and requires basic linguistic knowledge from its users. We evaluate our entity and relational entity extraction on a corpus concerning Yemen and report its performance using precision and recall metrics. Finally, we use graph construction and analysis techniques to draw insights from the extracted entities and relational entities.
dc.language.iso en
dc.subject natural language processing, information extraction, entity extraction, relational entity extraction
dc.title Information Extraction from Arabic Social Media Content
dc.type Thesis
dc.contributor.department Department of Electrical and Computer Engineering
dc.contributor.faculty Maroun Semaan Faculty of Engineering and Architecture
dc.contributor.institution American University of Beirut
dc.contributor.commembers Chehab, Ali
dc.contributor.commembers Jaber, Mohamad

