Beyond Labels: Unsupervised Approaches and Representation Learning Techniques for Hate Speech Detection

Ben Abdallah, Malek Nabil

URI: http://hdl.handle.net/10938/24271

Date: 2024-01-24

Abstract:

Hate speech on social media platforms has been proliferating recently, with severe adverse effects on victims’ mental health and well-being. This phenomenon calls for up-to-date automated detection systems. However, existing supervised machine learning models have significant limitations: they rely heavily on labeled data, which is costly to produce, prone to annotation errors, and difficult to scale or generalize. This thesis explores unsupervised learning techniques, specifically clustering enhanced with deep representation learning, to overcome these limitations. Both traditional methods (TF-IDF, Word2Vec) and modern methods (transformers, pre-trained language models, and contrastive learning) are leveraged to enrich the representations of short texts and to capture semantic similarity without labels. We investigate the state-of-the-art SimCSE (Simple Contrastive Learning of Sentence Embeddings), a contrastive learning approach for sentence embeddings, and propose Hate-SimCSE, a fine-tuned SimCSE framework that encodes robust hate speech representations and thereby yields better clustering results. Extensive experiments on diverse public datasets show that Hate-SimCSE significantly improves clustering performance over conventional text clustering approaches, reaching accuracy between 0.58 and 0.86, an improvement of 2% to 15%. Overall, our work illustrates the potential of these techniques for building more effective methods to combat online hate, a pressing societal issue, and to create a safer online environment for all users. Beyond hate speech detection, this research can also benefit other downstream NLP tasks, such as semantic textual similarity, information extraction, and question answering.
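The abstract reports clustering accuracy (0.58 to 0.86) without defining the metric on this page. A common definition for unsupervised clustering accuracy maps each cluster to at most one distinct ground-truth class so that agreement is maximized, then reports the fraction of texts that agree. The Python sketch below assumes that definition; the function name `clustering_accuracy` and the toy labels are hypothetical illustrations, not code or data from the thesis. It brute-forces the matching over permutations, which is adequate for the small label sets (often binary or three-way) common in hate speech datasets.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Unsupervised clustering accuracy (ACC), assuming the common definition.

    Hypothetical helper: each cluster id is mapped to at most one distinct
    class label so that the number of agreeing items is maximal; the score
    is that maximum divided by the number of items.
    """
    if not true_labels or len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must be non-empty and of equal length")

    clusters = sorted(set(cluster_ids))
    classes = sorted(set(true_labels))
    # Pad with None so that surplus clusters can stay unmatched;
    # items in an unmatched cluster count as errors.
    targets = classes + [None] * max(0, len(clusters) - len(classes))

    # Contingency counts: how many items of class y fall into cluster c.
    counts = Counter(zip(cluster_ids, true_labels))

    # Brute-force search over one-to-one cluster -> label assignments.
    best = max(
        sum(counts[(c, y)] for c, y in zip(clusters, assignment))
        for assignment in permutations(targets, len(clusters))
    )
    return best / len(true_labels)


if __name__ == "__main__":
    # Hypothetical toy data: gold labels vs. cluster ids from some clustering run.
    gold = ["hate", "hate", "neutral", "neutral", "neutral"]
    pred = [1, 1, 0, 0, 1]
    print(clustering_accuracy(gold, pred))  # prints 0.8
```

Brute force grows factorially with the number of clusters; with many clusters, the usual choice is the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) applied to the same contingency counts.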

Advisor(s):

Khreich, Wael

Files in this item

Name: Ben AbdallahMalek ...

Size: 1.533Mb

Format: PDF

This item appears in the following Collection(s)

AUB Students' Theses, Dissertations, and Projects

Copyright Statement

All materials included in the institutional repository are protected by copyright laws and are the property of their respective copyright holders. Materials may be used for non-commercial, educational, or research purposes only, and must be cited or attributed to the original source. Permission for any other use must be obtained from the copyright holder(s) directly. The American University of Beirut Libraries does not assume responsibility for any infringement of copyright laws that may occur as a result of the use of materials in the repository. If you believe that your copyright has been infringed upon in the repository, please contact the AUB Libraries immediately.

For further information, please contact us at scholarworks@aub.edu.lb
