Bridging the Semantic Gap: Tackling Contradictions in Semantic Similarity for Natural Language

Abstract

Understanding and accurately processing semantic relations is key to advancing Natural Language Processing (NLP). One primary semantic relation is contradiction between sentences, which plays a crucial role in shaping the interpretation of other semantic relations and is essential for several NLP tasks, such as sarcasm and inconsistency detection. The ability to detect contradictions automatically is vital for identifying mutually exclusive statements, and thus for recognizing the underlying irony in sarcastic expressions and ensuring logical coherence in textual data. Additionally, differentiating between various semantic relations can significantly enhance the precision of automated systems and virtual assistants in generating contradiction-free information. However, within the semantic field, contradiction detection has often been overshadowed by entailment and similarity tasks. Contradictory ideas can appear in diverse forms within sentences, which makes them challenging to identify. Our research addresses this gap by developing reliable models tailored specifically to contradiction detection. We employed extensive methodologies, including data restructuring, benchmarking, and fine-tuning, and achieved an accuracy of 98% in classifying contradictions. Furthermore, we developed a second model specialized in differentiating between three semantic relations (contradiction, similarity, and dissimilarity), which achieved an accuracy of 97% in distinguishing contradicting pairs from dissimilar ones. Leveraging these models, we discovered historically overlooked contradictory pairs within the Semantic Textual Similarity (STS) benchmarks that had been inaccurately labeled as similar or dissimilar; these pairs represent about a quarter of the dataset. This mislabeling may bias how language models differentiate between contradicting, similar, and dissimilar pairs.
Highlighting these neglected contradicting pairs provides insight into how contradictions within the STS dataset affect the models trained on it. These insights confirm that the presence of contradictions significantly affects the accuracy and effectiveness of STS models. This thesis contributes significantly to realizing the full potential of NLP in capturing the complexity of human communication, thereby enriching both academic discourse and practical applications in the digital age.
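The label audit described in the abstract can be illustrated with a minimal Python sketch. It assumes that a contradiction classifier has already produced one predicted relation per STS pair; the `STSPair` record, the `gold_relation` and `audit_sts_labels` helpers, the 3.0 similarity threshold, and the toy sentences, scores, and predictions are all hypothetical and are not taken from the thesis or from the STS benchmark. The sketch shows only the bookkeeping step: flag the pairs the classifier calls contradictions, record the gold label they currently carry, and report what share of the dataset they make up.

```python
from dataclasses import dataclass


@dataclass
class STSPair:
    """One sentence pair with its STS gold score (conventionally 0 to 5)."""
    sentence1: str
    sentence2: str
    gold_score: float


def gold_relation(score: float, threshold: float = 3.0) -> str:
    """Map an STS score to a coarse gold label.

    The 3.0 cut-off is a hypothetical choice for illustration only.
    """
    return "similar" if score >= threshold else "dissimilar"


def audit_sts_labels(pairs, predicted):
    """Flag pairs that a (hypothetical) classifier predicts as contradictions.

    Returns the flagged pairs together with the gold label they currently
    carry, plus the fraction of the dataset that was flagged.
    """
    if len(pairs) != len(predicted):
        raise ValueError("need exactly one prediction per pair")
    flagged = [
        (pair, gold_relation(pair.gold_score))
        for pair, relation in zip(pairs, predicted)
        if relation == "contradiction"
    ]
    fraction = len(flagged) / len(pairs) if pairs else 0.0
    return flagged, fraction


# Toy inputs: sentences, scores and predictions are invented for illustration.
pairs = [
    STSPair("A man is playing a guitar.", "A man is not playing a guitar.", 3.8),
    STSPair("A cat sits on the mat.", "A cat is sitting on a mat.", 4.8),
    STSPair("The sky is blue.", "Stocks fell sharply today.", 0.2),
    STSPair("The shop opens at nine.", "The shop never opens.", 1.4),
]
predicted = ["contradiction", "similarity", "dissimilarity", "contradiction"]

flagged, fraction = audit_sts_labels(pairs, predicted)
print(f"flagged {len(flagged)} of {len(pairs)} pairs ({fraction:.0%})")  # flagged 2 of 4 pairs (50%)
for pair, label in flagged:
    print(f"gold={label}: {pair.sentence1!r} vs {pair.sentence2!r}")
```

Passing the predicted relations in as plain input keeps the audit independent of any particular model, so the same bookkeeping applies whichever contradiction classifier produced the labels.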

Keywords

Natural Language Processing, Machine Learning, Semantic Textual Similarity, Contradictions, Embeddings, Large Language Models