Abstract:
With the advancement of social applications, the number of people using them has grown enormously: as of 2021, Facebook alone has 2.8 billion users. With so many users, the amount of text data has grown as well, which has drawn data scientists' interest towards understanding this form of data. Text data can be used to extract information about sentiment and emotion, which is useful in many areas, such as business, election campaigns, and entertainment. As more and more people from all over the world join the World Wide Web, text data is being produced in many different languages, such as Russian, Chinese, and Arabic. For this reason, the last few years have seen a surge in the development of natural language resources for analyzing text in different languages. As Armenian is one of the ``new'' languages on the Internet, very few resources exist for analyzing Armenian text. Hence, this thesis focuses on developing effective large-scale sentiment and emotion lexicons that can be used to extract such information from Armenian text. Moreover, to further advance the resources available for Armenian NLP (Natural Language Processing), we develop an Armenian version of BERT (Bidirectional Encoder Representations from Transformers) by applying the approach used to develop the English BERT to a large Armenian corpus.