Beyond Labels: Unsupervised Approaches and Representation Learning Techniques for Hate Speech Detection

Abstract

Hate speech has proliferated on social media platforms in recent years, with severe adverse effects on victims’ mental health and well-being. This serious phenomenon calls for up-to-date automated detection systems. However, existing supervised machine learning models are significantly limited by their heavy reliance on labeled data, which is costly and error-prone to obtain and limits scalability and generalizability. This thesis explores unsupervised learning techniques, specifically clustering enhanced with deep representation learning, to overcome these limitations. Both traditional methods (TF-IDF, Word2Vec) and modern ones (transformers, pre-trained language models, and contrastive learning) are leveraged to enrich representations of short texts and capture semantic similarities without labels. We investigate Simple Contrastive Learning of Sentence Embeddings (SimCSE), a state-of-the-art contrastive learning approach, and propose Hate-SimCSE, a fine-tuned SimCSE framework that encodes robust hate speech representations and leads to better clustering results. Extensive experiments on diverse public datasets show that Hate-SimCSE significantly improves clustering performance over conventional text clustering approaches, with accuracy ranging from 0.58 to 0.86, an improvement of 2% to 15%. Overall, our work illustrates the potential of these techniques for developing more effective methods to combat the pressing societal issue of online hate and to create a safer online environment for all users. This research can also extend beyond hate speech detection to other downstream NLP tasks, such as semantic text similarity, information extraction, and question answering.
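
For readers unfamiliar with contrastive sentence embeddings, the following is a minimal sketch of the kind of in-batch contrastive (InfoNCE) objective that SimCSE-style training is generally understood to optimize. It is illustrative only and is not taken from the thesis: the function names `cosine` and `info_nce_loss`, the toy two-dimensional vectors, and the temperature value of 0.05 are all assumptions, and a real setup would obtain embeddings from a pre-trained language model rather than hand-written lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean contrastive loss over a batch (hypothetical helper).

    positives[i] is the positive view of anchors[i]; every other
    positives[j] in the batch acts as an in-batch negative.
    The temperature value is an assumption, not a thesis setting.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true positive.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D "embeddings" standing in for encoder outputs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the wrong anchor

loss_aligned = info_nce_loss(anchors, aligned)
loss_swapped = info_nce_loss(anchors, swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each text sits close to its own positive view and far from the other texts in the batch, which is the property that would make the resulting embeddings easier to cluster without labels.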

Keywords

Machine Learning, Contrastive learning, Unsupervised learning, Natural language processing, Hate speech detection
