Beyond Labels: Unsupervised Approaches and Representation Learning Techniques for Hate Speech Detection

Ben Abdallah, Malek Nabil

dc.contributor.advisor	Khreich, Wael
dc.contributor.author	Ben Abdallah, Malek Nabil
dc.date.accessioned	2024-01-24T10:41:37Z
dc.date.available	2024-01-24T10:41:37Z
dc.date.issued	2024-01-24
dc.date.submitted	2024-01
dc.identifier.uri	http://hdl.handle.net/10938/24271
dc.description.abstract	Hate speech on social media platforms has proliferated in recent years, with severe adverse effects on victims’ mental health and well-being. Addressing it requires up-to-date automated detection systems. However, existing supervised machine learning models are significantly limited by their heavy reliance on labeled data, which is costly to produce and prone to errors, and which limits scalability and generalizability. This thesis explores unsupervised learning techniques, specifically clustering enhanced with deep representation learning, to overcome these limitations. Traditional methods (TF-IDF, Word2Vec) and modern methods (transformers, pre-trained language models, and contrastive learning) are leveraged to enrich representations of short texts and capture semantic similarities without labeling. We investigate the state-of-the-art Simple Contrastive Learning of Sentence Embeddings (SimCSE) framework and propose Hate-SimCSE, a fine-tuned SimCSE framework that encodes robust hate speech representations and leads to better clustering results. Extensive experiments on diverse public datasets demonstrate significant clustering performance improvements from Hate-SimCSE over conventional text clustering approaches, with accuracy ranging from 0.58 to 0.86, a 2% to 15% improvement. Overall, our work illustrates the potential of these techniques for developing more effective methods to combat the pressing societal issue of online hate and to create a safer online environment for all users. The research can also extend beyond hate speech detection to other NLP downstream tasks, such as semantic text similarity, information extraction, and question answering.
dc.language.iso	en
dc.subject	Hate speech detection
dc.subject	Machine Learning
dc.subject	Contrastive learning
dc.subject	Unsupervised learning
dc.subject	Natural language processing
dc.title	Beyond Labels: Unsupervised Approaches and Representation Learning Techniques for Hate Speech Detection
dc.type	Thesis
dc.contributor.department	Suliman S. Olayan School of Business
dc.contributor.faculty	Suliman S. Olayan School of Business
dc.contributor.commembers	Nasr, Walid
dc.contributor.commembers	Taleb, Sirine
dc.contributor.degree	MSBA
dc.contributor.AUBidnumber	202224352
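
Illustrative note (not part of the deposited record): the abstract names SimCSE, a contrastive learning approach for sentence embeddings. The sketch below shows one common form of such an objective, an InfoNCE-style loss with in-batch negatives over cosine similarities. It is a minimal illustration under stated assumptions, not the thesis's Hate-SimCSE implementation: the function names `cosine` and `simcse_loss`, the temperature value of 0.05, and the toy two-dimensional embeddings are all hypothetical choices made here, and a real setup would obtain the embeddings from a pre-trained encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE-style loss with in-batch negatives.

    anchors[i] and positives[i] are assumed to be two embeddings of the
    same sentence; positives[j] with j != i act as negatives for anchors[i].
    The temperature default is an assumption made for this sketch.
    """
    total = 0.0
    for i, h in enumerate(anchors):
        logits = [cosine(h, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true positive.
        total += log_denom - logits[i]
    return total / len(anchors)


# Toy embeddings (hypothetical values, not taken from the thesis).
anchors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
positives = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]

# Correct pairing versus a deliberately rotated (wrong) pairing.
aligned = simcse_loss(anchors, positives)
misaligned = simcse_loss(anchors, positives[1:] + positives[:1])
print(f"aligned={aligned:.4f}  misaligned={misaligned:.4f}")
```

In this toy case the loss is lower when each anchor is paired with its own positive than when the pairing is rotated, which is the behaviour a contrastive objective is meant to reward. Embeddings trained this way could then be passed to a clustering algorithm, as the abstract describes.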

Files in this item

Name: Ben AbdallahMalek ...

Size: 1.533Mb

Format: PDF

This item appears in the following Collection(s)

AUB Students' Theses, Dissertations, and Projects [12709]
