A STUDY ON THE DIFFICULTIES AND CHALLENGES INVOLVED IN CREATING A CORPUS FOR ENGLISH-MANIPURI CODE MIXED SOCIAL MEDIA TEXTS
Journal: International Journal of Advanced Research in Engineering and Technology (IJARET) (Vol. 9, No. 3)
Publication Date: 2018-05-31
Authors: PRIYADARSHINI LAMABAM; KUNAL CHAKMA
Page: 186-193
Keywords: Social Media; Natural Language Processing; Code-mixing
Abstract
Since the inception of social media, people have shifted away from formal documents toward social media platforms, which have made daily life more convenient. People have also become more creative in expressing their thoughts on social media. They write in an informal style, close to spoken language, using free word forms that make it easy to interact with others. Moreover, in bilingual or multilingual societies, people switch between two or more languages. This trend poses many challenges for natural language processing (NLP), because the available language detectors fail to identify the languages in such text owing to its creative and diverse writing style. In this paper, we focus on code-mixed English-Manipuri social media text and describe the difficulties we faced in collecting the data and creating a corpus for the development of NLP tools.