A STUDY ON THE DIFFICULTIES AND CHALLENGES INVOLVED IN CREATING A CORPUS FOR ENGLISH-MANIPURI CODE MIXED SOCIAL MEDIA TEXTS

Journal: International Journal of Advanced Research in Engineering and Technology (IJARET) (Vol.9, No. 3)

Publication Date:

Authors : ; ;

Page : 186-193

Keywords : Social Media; Natural Language Processing; Code-mixing

Abstract

Since the inception of social media, people have shifted markedly away from formal documents toward social media platforms, which have made daily life more convenient. People today have become more creative in expressing their thoughts on social media. They have begun using informal language, similar to speech, made up of free word forms that let them interact with others easily. Moreover, in bilingual or multilingual societies, people switch between two or more languages. This trend has introduced many challenges in the field of natural language processing (NLP), as available language detectors fail to identify the languages in such text because of its creative and diverse writing style. In this paper, we focus on code-mixed English-Manipuri social media text and describe the difficulties we faced in collecting the data and creating a corpus for the development of NLP tools.

Last modified: 2018-12-10 16:28:39