Template Based Affix Stemmer for a Morphologically Rich Language

Journal: The International Arab Journal of Information Technology (Vol.12, No. 2)

Page : 146-154

Keywords : IR; stemming; prefix; infix; suffix; exception lists

Abstract

Word stemming is one of the most significant factors affecting the performance of Natural Language Processing (NLP) applications such as Information Retrieval (IR) systems, part-of-speech tagging, machine translation, and syntactic parsing. Urdu poses several challenges for NLP, largely because of its rich morphology. Stemming in Urdu differs from stemming in other languages because it depends on removing not only prefixes and suffixes but also infixes. In this paper, we introduce a template-based stemmer that removes all kinds of affixes, i.e., prefixes, infixes, and suffixes, depending on the morphological pattern of the word. The presented results are excellent, and this stemmer can prove to be very effective for a morphologically rich language.
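
The general idea of template-based affix removal can be illustrated with a minimal sketch. This is not the paper's implementation: the template notation (`C` marks a slot whose letter is kept, any other character is an affix letter that must match literally), the romanized toy words, the template list, and the exception entry are all assumptions made for illustration. A real Urdu stemmer would operate on Urdu script with linguistically derived templates.

```python
# Hypothetical templates. 'C' marks a slot whose letter is kept in the stem;
# every other character is an affix letter that must match the word literally.
TEMPLATES = [
    "maCCuC",   # prefix "ma" plus infix "u"
    "CaCiC",    # infixes "a" and "i"
    "CCCCCon",  # suffix "on"
]

# Hypothetical exception list: words returned unchanged even if a template fits.
EXCEPTIONS = {"masrur"}


def match_template(word, template):
    """Return the letters in the 'C' slots if word fits template, else None."""
    if len(word) != len(template):
        return None
    kept = []
    for letter, slot in zip(word, template):
        if slot == "C":
            kept.append(letter)      # stem letter: keep it
        elif letter != slot:
            return None              # affix letter does not match
    return "".join(kept)


def stem(word):
    """Apply the first matching template; fall back to the word itself."""
    if word in EXCEPTIONS:
        return word
    for template in TEMPLATES:
        result = match_template(word, template)
        if result is not None:
            return result
    return word


if __name__ == "__main__":
    for w in ["maktub", "katib", "kitabon", "masrur", "ghar"]:
        print(w, "->", stem(w))
```

In this sketch the first matching template wins, so template order matters, and the exception list is consulted before any template is tried. Both are design choices of the sketch rather than details taken from the paper.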

Last modified: 2019-11-14 22:03:31