Abstract:
Named entity recognition (NER) is the task of identifying named entities such as locations, persons, and organizations in a given piece of text. NER plays a significant role in many applications, including information retrieval, question answering, machine translation, text clustering, and navigation systems. In this thesis, we tackled the problem of Arabic NER. Arabic is a very challenging language for natural language processing (NLP) in general: it is both morphologically rich and highly ambiguous, and it has complex morpho-syntactic agreement rules and many irregular forms. To address these issues, we proposed a deep learning approach based on Arabic word embeddings that capture syntactic and semantic relationships between words. Deep learning has been shown to perform significantly better than other approaches on various NLP tasks, including NER. However, deep learning models also require large amounts of training data, which are scarce for Arabic. To overcome this, we proposed a semi-supervised deep learning approach that uses both labeled and semi-labeled data, which we coin deep co-learning. We tested our approach on different established benchmarks and compared it to state-of-the-art Arabic NER tools such as MadaMira and Farasa. Our deep co-learning approach significantly outperformed these Arabic NER tools as well as purely supervised deep learning approaches.
Description:
Thesis. M.S. American University of Beirut. Department of Computer Science, 2017. T:6553
Advisor : Dr. Shady Elbassuoni, Assistant Professor, Computer Science ; Committee members : Dr. Wassim El Hajj, Associate Professor, Computer Science ; Dr. Hazem El Hajj, Associate Professor, Electrical and Computer Engineering.
Includes bibliographical references (leaves 44-47)