A Markovian Approach for Arabic Root Extraction
Journal: The International Arab Journal of Information Technology (Vol. 8, No. 1)
Publication Date: 2011-01-01
Authors : Abderrahim Boudlal; Rachid Belahbib; Abdelhak Lakhouaja; Azzeddine Mazroui; Abdelouafi Meziane; Mohamed Bebah
Page : 91-98
Keywords : Arabic NLP; morphological analysis; root extraction; hidden Markov models; Viterbi algorithm
Abstract
In this paper, we present an Arabic morphological analysis system that assigns to each word of an unvoweled Arabic sentence a unique root, chosen according to the context. The proposed system is composed of two modules. The first module performs an out-of-context analysis: we segment each word of the sentence into its elementary morphological units in order to identify its possible roots. To do so, we segment the word into three parts (prefix, stem, suffix). The second module uses the context to identify the correct root among all the possible roots of the word. For this purpose, we use a hidden Markov model (HMM) approach in which the observations are the words and the hidden states are the possible roots. We validate the approach using the NEMLAR Arabic writing corpus, which consists of 500,000 words. The system gives the correct root for more than 98% of the words in the training set and for almost 94% of the words in the testing set.
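The second module described above can be illustrated with a minimal Python sketch of Viterbi decoding, in which the hidden states at each position are restricted to the candidate roots proposed by the first module. This is not the authors' implementation: the function name `viterbi_roots`, the dictionary-based probability tables, and the `floor` value used for unseen events are all assumptions made for illustration, and the abstract does not say how the probabilities are estimated or smoothed.

```python
def viterbi_roots(words, candidates, log_start, log_trans, log_emit, floor=-20.0):
    """Return the most likely root sequence for a sentence (illustrative sketch).

    words      : list of observed (unvoweled) words
    candidates : dict mapping each word to its possible roots (module 1 output)
    log_start  : dict root -> log P(root at sentence start)
    log_trans  : dict (previous_root, root) -> log P(root | previous_root)
    log_emit   : dict (root, word) -> log P(word | root)
    floor      : log-probability assigned to unseen events (assumed smoothing)
    """
    if not words:
        return []

    # columns[i][root] = (best log-score of a path ending in root, backpointer)
    columns = [{
        root: (log_start.get(root, floor)
               + log_emit.get((root, words[0]), floor), None)
        for root in candidates[words[0]]
    }]

    for i in range(1, len(words)):
        word = words[i]
        column = {}
        for root in candidates[word]:
            # Pick the predecessor root that maximises the path score.
            prev_root, score = max(
                ((prev, prev_score + log_trans.get((prev, root), floor))
                 for prev, (prev_score, _) in columns[i - 1].items()),
                key=lambda pair: pair[1],
            )
            column[root] = (score + log_emit.get((root, word), floor), prev_root)
        columns.append(column)

    # Backtrack from the best final root to recover the whole sequence.
    root = max(columns[-1], key=lambda r: columns[-1][r][0])
    path = [root]
    for i in range(len(words) - 1, 0, -1):
        root = columns[i][root][1]
        path.append(root)
    path.reverse()
    return path
```

Restricting each column to `candidates[word]` keeps the search small, since only the roots that survive the out-of-context segmentation compete at each position. Working in log-probabilities avoids numerical underflow on long sentences.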
Last modified: 2019-04-28 18:21:59