AUB ScholarWorks

Beyond Labels: Unsupervised Approaches and Representation Learning Techniques for Hate Speech Detection


dc.contributor.advisor Khreich, Wael
dc.contributor.author Ben Abdallah, Malek Nabil
dc.date.accessioned 2024-01-24T10:41:37Z
dc.date.available 2024-01-24T10:41:37Z
dc.date.issued 2024-01-24
dc.date.submitted 2024-01
dc.identifier.uri http://hdl.handle.net/10938/24271
dc.description.abstract Hate speech on social media platforms has proliferated in recent years, with severe adverse effects on victims’ mental health and well-being. Countering it requires up-to-date automated detection systems. However, existing supervised machine learning models rely heavily on labeled data, which is costly to produce and prone to errors, and which limits their scalability and generalizability. This thesis explores unsupervised learning techniques, specifically clustering enhanced with deep representation learning, to overcome these limitations. Traditional methods (TF-IDF, Word2Vec) and modern methods (transformers, pre-trained language models, and contrastive learning) are leveraged to enrich representations of short texts and capture semantic similarities without labels. We investigate Simple Contrastive Learning of Sentence Embeddings (SimCSE), a state-of-the-art contrastive learning approach, and propose Hate-SimCSE: a fine-tuned SimCSE framework that encodes robust hate speech representations, leading to better clustering results. Extensive experiments on diverse public datasets demonstrate significant clustering performance improvements from Hate-SimCSE over conventional text clustering approaches, with accuracy ranging from 0.58 to 0.86, a 2% to 15% improvement. Overall, our work illustrates the potential of these new techniques to develop more effective methods for combating the pressing societal issue of online hate and to create a safer online environment for all users. Additionally, this research can extend beyond hate speech detection to other downstream NLP tasks, such as semantic textual similarity, information extraction, and question answering.
dc.language.iso en
dc.subject Hate speech detection
dc.subject Machine learning
dc.subject Contrastive learning
dc.subject Unsupervised learning
dc.subject Natural language processing
dc.title Beyond Labels: Unsupervised Approaches and Representation Learning Techniques for Hate Speech Detection
dc.type Thesis
dc.contributor.department Suliman S. Olayan School of Business
dc.contributor.faculty Suliman S. Olayan School of Business
dc.contributor.commembers Nasr, Walid
dc.contributor.commembers Taleb, Sirine
dc.contributor.degree MSBA
dc.contributor.AUBidnumber 202224352

