A STUDY ON THE DIFFICULTIES AND CHALLENGES INVOLVED IN CREATING A CORPUS FOR ENGLISH-MANIPURI CODE MIXED SOCIAL MEDIA TEXTS

Journal: International Journal of Advanced Research in Engineering and Technology (IJARET) (Vol.9, No. 3)

Publication Date: 2018-05-31

Authors : PRIYADARSHINI LAMABAM; KUNAL CHAKMA;

Page : 186-193

Keywords : Social Media; Natural Language Processing; Code-mixing;


Abstract

Since the inception of social media, people have increasingly moved away from formal documents toward social media platforms, which have made daily life more convenient. People of this era have become more creative in expressing their thoughts on social media. They have started using informal language (similar to spoken language) consisting of free word forms, which makes it easier for them to interact with others. Moreover, in bilingual or multilingual societies, people switch between two or more languages. This trend has introduced many challenges in the field of natural language processing (NLP), as the available language detectors fail to identify such languages because of their creative and diverse writing styles. In this paper, we focus on code-mixed English-Manipuri social media text and describe the difficulties we faced in collecting the data and creating a corpus for the development of NLP tools.
