Abstract:
Word embeddings are a breakthrough in artificial intelligence. They have replaced one-hot encoding as the word representation in many Natural Language Processing (NLP) systems, such as sentiment analysis and recommendation systems. In a word embedding, each word is represented as a dense vector, and related words cluster together; in other words, words that are close in the vector space should have similar meanings. Recent research, however, has revealed that these embeddings contain biases towards specific groups, transferred from our culture to machines. Most of this research has been conducted on English word embeddings, and work on languages with grammatical gender has adapted the bias tests to accommodate gendered words. Little work, however, has addressed Arabic. In this study, we quantify gender and political bias in word embeddings trained with the CBOW algorithm on four corpora: Twitter, Wikipedia, and two Lebanese newspaper corpora. In the Twitter and Wikipedia models, we examine the association of male and female terms with categories including strength, weakness, career, family, domestic work, science, art, money & business, and beauty & appearance. In all of our embeddings, we also investigate the associations of “Palestine” and “Israel” with “occupation”, “resistance”, “peace”, and “violence & terrorism”. Because Arabic-language literature on this topic is scarce, we rely on manual translation and evaluation. Our findings reveal that some stereotypes, such as the association of females with domestic work and art and of males with strength and money & business, are expressed in our embeddings. In terms of political categories, the Lebanese newspapers examined have long portrayed Israel using terms associated with “occupation” and “violence & terrorism”, whereas Palestine has long been associated with “resistance”.
Finally, we examine this political bias in greater depth across decades to show how the newspapers' positions have evolved over time.
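To make concrete what it means to measure the association between gendered terms and an attribute category, here is a minimal sketch of a WEAT-style (Word Embedding Association Test) effect size computed from cosine similarities. It is an illustrative assumption, not the exact test used in this study: the two-dimensional vectors are hypothetical toy values rather than CBOW embeddings, and the English placeholder words stand in for the Arabic terms actually examined.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B, emb):
    """s(w, A, B): mean similarity of word w to attribute set A minus its mean similarity to set B."""
    return (mean(cosine(emb[w], emb[a]) for a in A)
            - mean(cosine(emb[w], emb[b]) for b in B))


def effect_size(X, Y, A, B, emb):
    """WEAT-style effect size: how much more target set X leans towards A (vs. B) than target set Y does."""
    s_x = [association(x, A, B, emb) for x in X]
    s_y = [association(y, A, B, emb) for y in Y]
    # Normalise by the standard deviation of the association scores over all target words.
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings (NOT trained vectors), chosen so that the male
# terms lie near "strength" words and the female terms near "weakness" words.
emb = {
    "he": [1.0, 0.1], "man": [0.9, 0.2],
    "she": [0.1, 1.0], "woman": [0.2, 0.9],
    "strong": [1.0, 0.0], "power": [0.8, 0.3],
    "weak": [0.0, 1.0], "fragile": [0.3, 0.8],
}
X, Y = ["he", "man"], ["she", "woman"]        # male vs. female target terms
A, B = ["strong", "power"], ["weak", "fragile"]  # strength vs. weakness attributes

print(round(effect_size(X, Y, A, B, emb), 3))
```

A positive effect size indicates that the first target set (here, the male terms) is more closely associated with the first attribute set (strength) than the second target set is. Swapping the two target sets flips the sign.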