Abstract:
Understanding and accurately processing semantic relations is key to advancing Natural Language Processing (NLP). One primary semantic relation is contradiction
between sentences, which plays a crucial role in influencing the interpretation of other
semantic relations and is essential for several NLP tasks, such as sarcasm and
inconsistency detection. The ability to automatically detect contradictions is vital
for identifying mutually exclusive statements, thus recognizing the underlying irony
in sarcastic expressions and ensuring logical coherence in textual data. Additionally, differentiating between various semantic relations can significantly enhance the
precision of automated systems and virtual assistants in generating contradiction-free information. However, contradiction detection has often been overshadowed
within the semantic field in favor of entailment and similarity tasks. Contradictory
ideas can appear in diverse forms within sentences, making them challenging to
identify. Our research addresses this gap by developing reliable models specifically
tailored for contradiction detection. We employed extensive methodologies, including data restructuring, benchmarking, and fine-tuning, achieving an accuracy of 98%
in classifying contradictions. Furthermore, we developed another model specialized
in differentiating between the three semantic relations: contradiction, similarity,
and dissimilarity, which achieved an accuracy of 97% in differentiating between
contradicting and dissimilar pairs. Leveraging these models, we discovered historically overlooked contradictory pairs within the Semantic Textual Similarity (STS)
benchmarks, inaccurately labeled as similar or dissimilar, which represent about a
quarter of this dataset. This mislabeling may lead to biases in how language models differentiate between contradicting, similar, and dissimilar pairs. Highlighting
these neglected contradicting pairs provides insights into the impact of contradictions within the STS dataset on corresponding models. These insights confirm that
the presence of contradictions significantly affects the accuracy and effectiveness of
STS models. This thesis contributes significantly to realizing the full potential of
NLP in capturing the complexity of human communication, thereby enriching both
academic discourse and practical applications in the digital age.