dc.contributor.advisor |
Khreich, Wael |
dc.contributor.author |
Al Bidewe, Nour Ayman |
dc.date.accessioned |
2024-01-29T06:49:22Z |
dc.date.available |
2024-01-29T06:49:22Z |
dc.date.issued |
2024-01-29 |
dc.date.submitted |
2024-01-27 |
dc.identifier.uri |
http://hdl.handle.net/10938/24280 |
dc.description.abstract |
This thesis investigates gender detection in text analysis: identifying an author's gender through linguistic and stylistic analysis. The study emphasizes the role of gender detection in improving the precision and relevance of information processing systems, which supports more personalized content strategies and helps combat gender bias in sectors such as social media and AI-driven analytics. The research evaluates a range of preprocessing techniques and feature selection strategies, and assesses the effectiveness of both traditional models and advanced language models such as BERT, particularly on tweets. The key findings show that, for social media data, splitting by username rather than at random enhances model performance and generalization and prevents data leakage. Integrating word and character n-grams, along with combining linguistic and textual features, proved highly effective. BERT was the strongest performer among the large language models evaluated, though it did not outperform traditional models. This work advances the understanding of gender detection and contributes to the development of more sophisticated and equitable text analysis tools in computational linguistics. |
dc.language.iso |
en |
dc.subject |
Gender Detection |
dc.subject |
Large Language Model |
dc.subject |
Natural Language Processing |
dc.subject |
Bidirectional Encoder Representations from Transformers (BERT) |
dc.subject |
Generative Pre-trained Transformers (GPT) |
dc.title |
Unveiling Gender in Text: Advanced Approaches in Language Model Analysis |
dc.type |
Thesis |
dc.contributor.department |
Suliman S. Olayan School of Business |
dc.contributor.faculty |
Suliman S. Olayan School of Business |
dc.contributor.commembers |
Nasr, Walid |
dc.contributor.commembers |
Taleb, Sirine |
dc.contributor.degree |
MSBA |
dc.contributor.AUBidnumber |
201706202 |