Morpheme Based Myanmar Word Segmenter
Journal: International Journal of Trend in Scientific Research and Development (Vol. 3, No. 5)
Publication Date: 2019-08-15
Authors : Sin Thi Yar Myint; Hanni Htun; Myat Myo Nwe Wai
Page : 911-914
Keywords : Syllable breaking; Morpheme; style; styling;
Abstract
Myanmar script has no fixed delimiters between words or syllables, so segmenting text into meaningful, correct words is a challenging task. This paper proposes a morpheme-based Myanmar word tokenizer that combines rule-based syllable breaking with dictionary-lookup syllable merging, using a longest string matching approach. The proposed approach is tested on a monolingual dictionary that contains useful information for word segmentation. The dictionary contains more than 32,581 words, including headwords, stop words and essential words, in the Myanmar3 font. These words are collected from the Myanmar and Essential Words dictionaries. According to the experimental results, the approach provides promising segmentation accuracy on Myanmar text.
Sin Thi Yar Myint | Hanni Htun | Myat Myo Nwe Wai, "Morpheme Based Myanmar Word Segmenter", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019.
URL: https://www.ijtsrd.com/papers/ijtsrd26520.pdf
Paper URL: https://www.ijtsrd.com/computer-science/other/26520/morpheme-based-myanmar-word-segmenter/sin-thi-yar-myint
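The dictionary-lookup merging step with longest string matching, as named in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the text has already been broken into syllables, uses romanized placeholder syllables instead of Myanmar script, and relies on a tiny hypothetical dictionary. The function name `merge_syllables` and the `max_len` limit are assumptions made for illustration.

```python
def merge_syllables(syllables, dictionary, max_len=6):
    """Greedily merge syllables into the longest words found in `dictionary`.

    `max_len` (an assumed limit) caps how many syllables one word may span.
    A syllable that starts no dictionary word is emitted on its own.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first and shrink until a match is found.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            # A single syllable is always accepted as a fallback.
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words


# Hypothetical toy data: romanized stand-ins for Myanmar syllables and words.
toy_dictionary = {"myanmar", "myanmarsar", "sar", "thin"}
syllables = ["myan", "mar", "sar", "thin", "kya"]
print(merge_syllables(syllables, toy_dictionary))
# → ['myanmarsar', 'thin', 'kya']
```

Because the longest candidate is tried first, "myanmarsar" is preferred over the shorter dictionary entry "myanmar" followed by "sar". A real system would feed this step with the output of the rule-based syllable breaker and the full dictionary.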