A Markovian Approach for Arabic Root Extraction

Journal: The International Arab Journal of Information Technology (Vol.8, No. 1)

Publication Date:

Authors : ; ;

Page : 91-98

Keywords : Arabic NLP; morphological analysis; root extraction; hidden Markov models; Viterbi algorithm

Abstract

In this paper, we present an Arabic morphological analysis system that assigns each word of an unvoweled Arabic sentence a unique root depending on its context. The proposed system is composed of two modules. The first module analyzes each word out of context: it segments the word into its elementary morphological units, using a three-part segmentation (prefix, stem, suffix), in order to identify the word's possible roots. The second module uses the context to select the correct root among these candidates. For this purpose, we use a hidden Markov model approach in which the words are the observations and the possible roots are the hidden states. We validate the approach on the NEMLAR Arabic Written Corpus, which consists of 500,000 words. The system gives the correct root for more than 98% of the words in the training set and for almost 94% of the words in the testing set.
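
The two modules described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the affix lists, the transliterated toy words, the function names (`segmentations`, `viterbi_roots`), the uniform initial distribution, and the `floor` smoothing value are all assumptions made for illustration. The first function enumerates (prefix, stem, suffix) splits of a word; the second applies standard Viterbi decoding, with words as observations and candidate roots as hidden states, to pick one root per word.

```python
import math

# Hypothetical toy affix lists (transliterated). A real system would rely on
# full Arabic prefix and suffix inventories; these are placeholders.
PREFIXES = ["", "al", "wa", "wal"]
SUFFIXES = ["", "a", "un", "at"]


def segmentations(word, min_stem=2):
    """Enumerate every (prefix, stem, suffix) split allowed by the affix lists."""
    splits = []
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            if not (word.startswith(prefix) and word.endswith(suffix)):
                continue
            stem = word[len(prefix):len(word) - len(suffix)]
            # An overlapping prefix and suffix yields an empty stem, rejected here.
            if len(stem) >= min_stem:
                splits.append((prefix, stem, suffix))
    return splits


def viterbi_roots(words, candidates, log_trans, log_emit, floor=-20.0):
    """Choose one root per word by Viterbi decoding.

    words      -- list of observed words
    candidates -- dict: word -> list of possible roots (hidden states)
    log_trans  -- dict: (previous_root, root) -> log P(root | previous_root)
    log_emit   -- dict: (root, word) -> log P(word | root)
    floor      -- log-probability assumed for unseen events (crude smoothing)

    The initial state distribution is assumed uniform, so it is omitted.
    """
    if not words:
        return []
    # best[i][root] = (score of the best path ending in root, backpointer)
    best = [{r: (log_emit.get((r, words[0]), floor), None)
             for r in candidates[words[0]]}]
    for i in range(1, len(words)):
        column = {}
        for root in candidates[words[i]]:
            emit = log_emit.get((root, words[i]), floor)
            prev, score = max(
                ((q, s + log_trans.get((q, root), floor))
                 for q, (s, _) in best[i - 1].items()),
                key=lambda item: item[1],
            )
            column[root] = (score + emit, prev)
        best.append(column)
    # Backtrack from the best final state.
    root = max(best[-1], key=lambda r: best[-1][r][0])
    path = [root]
    for i in range(len(words) - 1, 0, -1):
        root = best[i][root][1]
        path.append(root)
    return path[::-1]


# Toy usage: the word "w1" is ambiguous between roots "A" and "B";
# the transition into the root of "w2" resolves the ambiguity.
toy_candidates = {"w1": ["A", "B"], "w2": ["C"]}
toy_emit = {("A", "w1"): math.log(0.5), ("B", "w1"): math.log(0.5),
            ("C", "w2"): math.log(1.0)}
toy_trans = {("A", "C"): math.log(0.1), ("B", "C"): math.log(0.9)}
toy_path = viterbi_roots(["w1", "w2"], toy_candidates, toy_trans, toy_emit)
```

In this toy setup both roots emit "w1" equally well, so the choice rests entirely on the transition probabilities, which is the role the abstract assigns to context. In a trained system, the probability tables would be estimated from a root-annotated corpus rather than written by hand.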

Last modified: 2019-04-28 18:21:59