AUB ScholarWorks

QUANTIFYING GENDER AND POLITICAL BIAS IN ARABIC WORD EMBEDDINGS

dc.contributor.advisor Abu Salem, Fatima
dc.contributor.advisor Elbassuoni, Shady
dc.contributor.author Al-Sabahi, Ghumdan
dc.date.accessioned 2022-09-16T09:31:59Z
dc.date.available 2022-09-16T09:31:59Z
dc.date.issued 2022-09-16
dc.date.submitted 2022-09-15
dc.identifier.uri http://hdl.handle.net/10938/23618
dc.description.abstract Word embeddings are a breakthrough in the world of artificial intelligence. They have replaced the one-hot encoding used in many Natural Language Processing (NLP) systems, such as sentiment analysis and recommendation systems. In word embeddings, each word is represented as a vector, with related words clustered together; that is, words that are close in vector space should have similar meanings. Recent research, however, has revealed that these word embeddings contain biases towards specific groups, biases that are transferred from our culture to machines. Most of this research has been conducted on English word embeddings. Other research, on languages with grammatical gender, has adjusted the bias test to accommodate gendered words. Little, however, has been done on the Arabic language. In this study, we focus on quantifying gender and political bias in word embeddings of Twitter, Wikipedia, and two Lebanese newspaper corpora, all of which were trained using the continuous bag-of-words (CBOW) algorithm. In the Twitter and Wikipedia models, we examine the relation of male and female terms with various categories, including strength, weakness, career, family, domestic work, science, art, money & business, and beauty & appearance. Furthermore, we investigate the relationships between “Palestine” and “Israel” in all of our embeddings with “occupation”, “resistance”, “peace”, and “violence” & “terrorism”. We rely on manual translation and evaluation due to a scarcity of Arabic language literature. Our findings reveal that some stereotypes, such as the association of females with domestic work and art and of males with strength and money & business, are expressed in our embeddings. In terms of political categories, the Lebanese newspapers examined have long portrayed Israel using terms associated with “occupation” and “violence” & “terrorism”, whereas Palestinians have long been associated with “resistance”. Furthermore, we investigate the political bias in greater depth across decades to demonstrate how the newspapers' opinions have evolved over time.
dc.language.iso en_US
dc.subject word embeddings
dc.subject political bias
dc.subject gender bias
dc.title QUANTIFYING GENDER AND POLITICAL BIAS IN ARABIC WORD EMBEDDINGS
dc.type Student Project
dc.contributor.department Department of Computer Science
dc.contributor.faculty Faculty of Arts and Sciences
dc.contributor.institution American University of Beirut
dc.contributor.commembers Abu Salem, Fatima
dc.contributor.commembers Elbassuoni, Shady
dc.contributor.degree MS in Computer Science
dc.contributor.AUBidnumber 202125172

