Template Based Affix Stemmer for a Morphologically Rich Language
Journal: The International Arab Journal of Information Technology (Vol. 12, No. 2)
Publication Date: 2015-03-01
Authors: Sajjad Khan; Waqas Anwar; Usama Bajwa; Xuan Wang
Pages: 146-154
Keywords: IR; stemming; prefix; infix; suffix; exception lists
Abstract
Word stemming is one of the most significant factors affecting the performance of Natural Language Processing (NLP) applications such as Information Retrieval (IR) systems, part-of-speech tagging, machine translation, and syntactic parsing. Urdu poses several challenges for NLP, largely because of its rich morphology. Stemming in Urdu differs from stemming in many other languages: it requires removing not only prefixes and suffixes but also infixes. In this paper, we introduce a template-based stemmer that removes all kinds of affixes, i.e., prefixes, infixes, and suffixes, depending on the morphological pattern of the word. The presented results are excellent, and this stemmer can prove very effective for a morphologically rich language.
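As a rough illustration of the idea, template-based affix removal can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the template notation (`C` marks a character that is kept, any other character is an affix letter that must match and is dropped), the three templates, the exception list, and the Latin-transliterated sample words are all hypothetical and chosen only to show prefix, infix, and suffix removal in one pass.

```python
# Illustrative sketch only: the templates, the exception list, and the
# transliterated words are assumptions, not data taken from the paper.

# In a template, "C" marks a position whose character is kept in the stem.
# Any other character is an affix letter: it must match the word at that
# position and is removed from the output.
TEMPLATES = [
    "maCCuC",   # prefix "ma" plus infix "u"  (maktub  -> ktb)
    "CaCiC",    # infixes "a" and "i"         (katib   -> ktb)
    "CCCCCon",  # suffix "on"                 (kitabon -> kitab)
]

# Hypothetical exception list: words that happen to fit a template but
# should be returned unchanged (here, a proper name).
EXCEPTIONS = {"hamid"}


def match_template(word, template):
    """Return the stem if `word` fits `template`, otherwise None."""
    if len(word) != len(template):
        return None
    kept = []
    for ch, slot in zip(word, template):
        if slot == "C":
            kept.append(ch)      # stem character: keep it
        elif ch != slot:
            return None          # affix letter does not match
    return "".join(kept)


def stem(word):
    """Strip affixes using the first matching template, if any."""
    if word in EXCEPTIONS:
        return word
    for template in TEMPLATES:
        result = match_template(word, template)
        if result is not None:
            return result
    return word                  # no template applies: leave as-is


if __name__ == "__main__":
    for w in ["maktub", "katib", "kitabon", "hamid", "dil"]:
        print(w, "->", stem(w))
```

Checking the exception list before any template is tried is one simple way to stop a pattern from over-stemming words that only superficially resemble it; a real system would work on Urdu script and would need a much larger, linguistically motivated set of templates and exceptions.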
Last modified: 2019-11-14 22:03:31