Parallel HMM-Based Approach for Arabic Part of Speech Tagging
Journal: The International Arab Journal of Information Technology (Vol. 15, No. 2)
Publication Date: 2018-03-01
Authors: Ayoub Kadim; Azzeddine Lazrek
Pages: 341-351
Keywords: Part of speech tagging; hidden Markov model; Viterbi algorithm; natural language processing; corpus; Arabic language
Abstract
In this paper we try to go beyond the classical use of the Hidden Markov Model for part-of-speech tagging, particularly for the Arabic language. Most available Arabic tagging systems and tagsets are derived from English and do not exploit the linguistic richness of Arabic. The proposed tagging system consists of two Hidden Markov Models working in parallel: in addition to the main model, a second model serves as a reference for low-probability tags. Training both models requires a dual corpus, so we restructure the Nemlar Arabic corpus and extract a new tagset from diacritics and grammatical rules. The approach is implemented in Java, and several experiments are conducted to evaluate it. The results, which are promising, are discussed in depth together with the limitations of the approach, and possible future enhancements are highlighted. This work opens new research perspectives, particularly for Arabic language processing and, more generally, for applications of Hidden Markov Models.
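As a point of reference for the classical setup the abstract builds on, the following is a minimal sketch of Viterbi decoding for a single HMM tagger, written in Java because that is the environment the abstract names. It is not the authors' implementation. The class name `ViterbiSketch`, the method signature, and the toy probabilities are all assumptions made for illustration. The abstract does not say how the second model is consulted, so the sketch covers single-model decoding only.

```java
import java.util.Arrays;

public class ViterbiSketch {

    /**
     * Returns the most likely tag sequence for a sentence.
     * All names and the array layout are illustrative assumptions:
     *   obs[t]      index of the word at position t
     *   start[i]    P(tag i at position 0)
     *   trans[i][j] P(tag j | previous tag i)
     *   emit[i][o]  P(word o | tag i)
     * Scores are kept in log space to avoid underflow on long sentences.
     */
    static int[] viterbi(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = obs.length;
        int k = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] score = new double[n][k]; // best log-probability ending in tag j at t
        int[][] back = new int[n][k];        // predecessor tag on that best path

        for (int i = 0; i < k; i++) {
            score[0][i] = Math.log(start[i]) + Math.log(emit[i][obs[0]]);
        }
        for (int t = 1; t < n; t++) {
            for (int j = 0; j < k; j++) {
                int bestPrev = 0;
                double best = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < k; i++) {
                    double s = score[t - 1][i] + Math.log(trans[i][j]);
                    if (s > best) {
                        best = s;
                        bestPrev = i;
                    }
                }
                score[t][j] = best + Math.log(emit[j][obs[t]]);
                back[t][j] = bestPrev;
            }
        }

        // Pick the best final tag, then follow the back-pointers.
        int last = 0;
        for (int i = 1; i < k; i++) {
            if (score[n - 1][i] > score[n - 1][last]) {
                last = i;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy model with two hypothetical tags (0 and 1) and three word types.
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.3, 0.7}, {0.8, 0.2}};
        double[][] emit = {{0.5, 0.1, 0.4}, {0.1, 0.6, 0.3}};
        int[] tags = viterbi(new int[] {0, 1, 2}, start, trans, emit);
        System.out.println(Arrays.toString(tags));
    }
}
```

In a two-model arrangement such as the one described above, each model would need its own `start`, `trans`, and `emit` tables estimated from its side of the dual corpus. How the two decodings are then reconciled is left open here.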