Abstract:
This thesis presents an approach to improving Fine-Grained Arabic Named Entity Recognition (NER) using both supervised and semi-supervised deep learning models, trained on both labeled and semi-labeled data. The study is motivated by the need for computers to process and interpret natural language effectively. We review a few similar studies on semi-supervised NER using Arabic language models and propose to build a large and reliable training dataset for Fine-Grained Arabic NER, a task that is largely overlooked in the field of Arabic NLP. To that end, we experiment with several annotation platforms and choose Labelbox for its online accessibility, its support for fine-grained labeling, and its set of useful tools. We follow the FIGER dataset standard for named entities and sub-entities, and plan to build our own dataset consisting of around 2,000 Arabic Wikipedia articles. Our proposed semi-supervised deep learning model aims to improve on existing fine-grained Arabic NER models.