Abstract:
With the expansion of scientific and social media, a wealth of online information resources has accumulated as free text, including articles, studies, and social blogs. Mining, standardizing, and extracting information from these resources enable novel approaches to data analysis and knowledge discovery, particularly for large domain-specific text corpora. Annotated corpora are key to this effort: supervised machine learning algorithms need them for training, and unsupervised algorithms need them for testing and evaluation. Manual annotation is expensive, especially in expert domains such as medicine. This thesis presents a Semi-Automatic Annotator for Medical NLP Applications (SAMNA). SAMNA takes a large corpus, a list of labels, a list of terms associated with each label, and lists of rules associated with labels and terms. SAMNA annotates the words in the corpus that match the corresponding terms and rules. It also uses distributional similarity to discover novel annotations. In addition, it provides the annotating scholar with an intuitive, friendly, and efficient interface to navigate and edit the annotations. We used SAMNA in several medical NLP applications to annotate protein sets in medical articles related to specific diseases such as stroke, spinal cord injuries, and Alzheimer's disease. The graph-theory-based analysis of the corpora annotated with SAMNA led to discoveries of interest to medical experts. SAMNA can also be applied in systems review, as well as in other annotation domains.
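The term-matching step described in the abstract can be illustrated with a minimal Python sketch. It is not SAMNA's implementation: the names `Annotation` and `annotate`, the case-insensitive whole-word matching, and the sample labels and terms are all assumptions made for illustration, and the rule lists and the distributional-similarity step are left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One labeled span in the corpus (hypothetical structure, not SAMNA's)."""
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    text: str    # matched text as it appears in the corpus
    label: str   # label whose term list produced the match


def annotate(corpus, terms_by_label):
    """Label every whole-word, case-insensitive occurrence of each term.

    This covers only dictionary matching; rules and distributional
    similarity, which the abstract also mentions, are not modeled here.
    """
    annotations = []
    for label, terms in terms_by_label.items():
        for term in terms:
            # \b keeps "tau" from matching inside a longer word
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(corpus):
                annotations.append(
                    Annotation(match.start(), match.end(), match.group(0), label)
                )
    # Return annotations in reading order
    return sorted(annotations, key=lambda a: (a.start, a.end))
```

Calling `annotate(text, {"PROTEIN": ["tau"], "DISEASE": ["stroke"]})` returns the labeled spans in reading order, which an annotator could then review and edit by hand.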
Description:
Thesis. M.E. American University of Beirut. Department of Electrical and Computer Engineering, 2015. ET:6306
Advisor : Dr. Fadi Zaraket, Assistant Professor, Electrical and Computer Engineering ; Committee Members : Dr. Mariette Awad, Associate Professor, Electrical and Computer Engineering ; Dr. Rouwaida Kanj, Assistant Professor, Electrical and Computer Engineering.
Includes bibliographical references (leaves 51-54)