Abstract:
In this rapidly evolving digital era, textual content production relies heavily on Large Language Models (LLMs). These models are prone to inheriting, and thus propagating, various forms of stereotypes and gender bias from their training corpora, with harmful consequences for populations worldwide, such as loss of human potential, aggressive behavior, biased mental imagery, and unequal labor force participation. This thesis therefore evaluated gender bias in the responses of one of the most recent and popular LLMs, ChatGPT. We examined occupational and semantic bias in three common ChatGPT tasks as well as in the embedding task of the Ada-V2 model. We then fine-tuned ChatGPT on bias detection for three types of bias: sexism, dehumanization, and generic bias. The fine-tuned versions outperformed the original model, as well as other popular LLMs, in bias detection. We also highlighted two major weaknesses in ChatGPT’s learning capabilities and reduced the gender gaps in the model’s responses. This research builds a strong basis for future work to ensure the safe and valuable use of recent AI tools such as ChatGPT.