Automatic Personality Detection Through Text: Predicting the Big Five Traits from Self-Narratives

Wehbe, Rima

URI: http://hdl.handle.net/10938/24302

Date: 2024-02-05

Abstract:

Automatic personality detection from text has gained interest since researchers discovered that linguistic style can be an indicator of personality. However, accurate personality classification remains a challenging task, often hampered by limited data and a lack of robust evaluation metrics. This thesis investigates the ability of various machine learning models to predict the Big Five personality traits from text. We evaluate our models using two datasets. The first is the existing Stream of Consciousness Essays (SoCE) dataset, containing essays written by college students about their thoughts. The second is our newly collected Behavioral Interview Data (BID), featuring an annotated corpus tailored for this research. This new dataset includes university students' responses to behavioral questions similar to those asked in job interviews. In our experiments with both datasets, we explore different Natural Language Processing (NLP) techniques, focusing particularly on the Generative Pre-trained Transformer (GPT), using various parameters and testing methods. We compare GPT’s performance with a wide range of traditional and deep learning classifiers, including the BERT base model. Our key findings indicate that the BID dataset provides stronger indicators for detecting the Big Five traits than the SoCE dataset. Among the models tested, GPT-based approaches, notably GPT-4 (the latest version of GPT), consistently outperformed other approaches in identifying all five traits, even without prior training on the datasets. Additionally, we observe that fine-tuning GPT enhances its performance, particularly with the SoCE dataset. While achieving accuracy and F1 scores that are comparable to those in related studies, our research offers a more reliable evaluation of model performance by employing the Area Under the Curve (AUC) score, a metric that is more robust to data imbalance and to sensitive model parameters. Moreover, our work underscores the practical applications of these models in real-world contexts, like behavioral job interviews, providing valuable insights for future research and applications in this field.
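
The abstract's point about AUC and data imbalance can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the labels are synthetic (10 positive and 90 negative examples), the "classifier" is a trivial one that always predicts the majority class, and the helper names `accuracy` and `roc_auc` are hypothetical. None of it is taken from the thesis or its datasets.

```python
# Illustrative sketch only: synthetic labels, not data from the thesis.

def accuracy(labels, preds):
    """Fraction of predictions that match the true labels."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)

def roc_auc(labels, scores):
    """AUC computed as the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Imbalanced binary labels for one hypothetical trait: 10% positive.
labels = [1] * 10 + [0] * 90

# A degenerate model that gives every example the same score
# and therefore always predicts the majority class (0).
constant_scores = [0.0] * 100
constant_preds = [0] * 100

acc = accuracy(labels, constant_preds)
auc = roc_auc(labels, constant_scores)
print(f"accuracy = {acc:.2f}, AUC = {auc:.2f}")
```

On this toy input the majority-class predictor reaches an accuracy of 0.90 while its AUC stays at 0.50, the chance level. AUC is also computed from the ranking of scores rather than from predictions at one fixed decision threshold, which is one way a metric can be less sensitive to that particular parameter.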

Advisor(s):

Khreich, Wael

Files in this item

Name: RimaWehbe_2024.pdf

Size: 1.164 MB

Format: PDF

This item appears in the following Collection(s)

AUB Students' Theses, Dissertations, and Projects [12714]

Copyright Statement

All materials included in the institutional repository are protected by copyright laws and are the property of their respective copyright holders. Materials may be used for non-commercial, educational, or research purposes only, and must be cited or attributed to the original source. Permission for any other use must be obtained from the copyright holder(s) directly. The American University of Beirut Libraries does not assume responsibility for any infringement of copyright laws that may occur as a result of the use of materials in the repository. If you believe that your copyright has been infringed upon in the repository, please contact the AUB Libraries immediately.

For further information, please contact us at scholarworks@aub.edu.lb