LLM-Based Approaches For Gene Essentiality Prediction

Authors

Shawraba, Sara

Abstract

Essential genes are indispensable for an organism's survival; removing any one of them is lethal. Identifying essential genes has important therapeutic applications, because such genes are ideal drug targets for antibiotic development and for targeted gene therapy in diseases such as cancer. The use of machine learning models to identify essential genes has grown considerably in recent years. However, existing approaches rely exclusively on numerical methods, and large language models (LLMs) remain unexplored for this task. In this article, we propose the first LLM-based pipeline for essentiality prediction and argue that LLMs can achieve competitive performance while capturing meaningful information from textual representations that numerical approaches lose. Using sequence-derived and text-based features, our RoBERTa-based model achieved ROC-AUC values as high as 0.92, demonstrating that essentiality can be predicted effectively from textual data.
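
For readers unfamiliar with the reported metric, the following is a minimal sketch of how ROC-AUC can be computed for a binary essential/non-essential classifier, using the rank-sum (Mann-Whitney U) formulation. It is an illustration only and not the evaluation code used in this work; the function name `roc_auc` and the example labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 (1 = essential gene, 0 = non-essential)
    scores: iterable of predicted essentiality scores, higher = more likely essential
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the block of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_block = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_block
        i = j

    # U statistic for the positive class, normalised by the number of
    # (positive, negative) pairs.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Hypothetical example: four genes with made-up labels and model scores.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.3, 0.1]
example_auc = roc_auc(example_labels, example_scores)
```

Under this formulation, the value is the probability that a randomly chosen essential gene is scored higher than a randomly chosen non-essential gene, so 0.5 corresponds to chance and 1.0 to a perfect ranking.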

Description

Release date: 2027-02-16.
