A Hybrid Approach for NER System for Scarce Resourced Language-URDU: Integrating n-gram with Rules and Gazetteers
Journal: Mehran University Research Journal of Engineering and Technology (Vol. 34, No. 4)
Publication Date: 2015-10-01
Authors: Saeeda Naz; Arif Iqbal Umar; Imran Razzak
Pages: 349-358
Keywords: Entity Recognition; Named Entities; N-Gram Model; Gazetteer Lists
Abstract
We present a hybrid NER (Named Entity Recognition) system for Urdu script that integrates an n-gram model (unigram and bigram), rules and gazetteers. We built the rules from prefix and suffix characters, rather than from first-name and last-name lists or potential terms, and applied them to the output list produced by the n-gram model. The system is evaluated on two Urdu corpora, the IJCNLP NE (Named Entity) corpus and the CRL NE corpus. It achieved 92.65% and 87.6% with the hybrid unigram model, and 92.47% and 86.83% with the hybrid bigram model, on the IJCNLP NE corpus and the CRL NE corpus, respectively.
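The abstract does not spell out how the three components are combined. The sketch below shows one way such a hybrid pipeline could be wired together. It assumes a most-frequent-tag n-gram tagger whose "O" (non-entity) outputs are then re-checked against gazetteer lists and prefix/suffix rules. The training sentences, gazetteer entries, rules, tag set, precedence order and function names (`train_ngram`, `rule_tag`, `hybrid_tag`) are hypothetical illustrations, not the authors' implementation. Romanized tokens stand in for Urdu script to keep the example readable.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (romanized Urdu); not taken from the IJCNLP or CRL corpora.
TRAIN = [
    [("ahmad", "PERSON"), ("lahore", "LOCATION"), ("gaya", "O")],
    [("ali", "PERSON"), ("karachi", "LOCATION"), ("mein", "O"), ("hai", "O")],
]
# Hypothetical gazetteer lists and prefix/suffix rules, for illustration only.
GAZETTEERS = {"LOCATION": {"quetta", "peshawar"}, "ORGANIZATION": {"wapda"}}
PREFIX_RULES = [("abdul", "PERSON")]
SUFFIX_RULES = [("abad", "LOCATION"), ("pur", "LOCATION")]


def most_frequent(counts):
    """Collapse per-key tag counts to the single most frequent tag."""
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def train_ngram(sentences):
    """Return (unigram, bigram) lookup tables learned from tagged sentences."""
    uni, bi = defaultdict(Counter), defaultdict(Counter)
    for sentence in sentences:
        prev = "<s>"  # sentence-start marker
        for token, tag in sentence:
            uni[token][tag] += 1
            bi[(prev, token)][tag] += 1
            prev = token
    return most_frequent(uni), most_frequent(bi)


def rule_tag(token):
    """Gazetteer lookup first, then prefix/suffix rules; 'O' if nothing fires."""
    for label, names in GAZETTEERS.items():
        if token in names:
            return label
    for prefix, label in PREFIX_RULES:
        if token.startswith(prefix):
            return label
    for suffix, label in SUFFIX_RULES:
        if token.endswith(suffix):
            return label
    return "O"


def hybrid_tag(tokens, unigram, bigram, use_bigram=False):
    """Tag with the n-gram model, then let gazetteers/rules relabel 'O' tokens."""
    tags, prev = [], "<s>"
    for token in tokens:
        tag = None
        if use_bigram:
            tag = bigram.get((prev, token))  # None if the pair was never seen
        if tag is None:
            tag = unigram.get(token, "O")  # back off to unigram, then to 'O'
        if tag == "O":
            tag = rule_tag(token)  # second pass over the n-gram output
        tags.append(tag)
        prev = token
    return tags


if __name__ == "__main__":
    unigram, bigram = train_ngram(TRAIN)
    print(hybrid_tag(["ahmad", "faisalabad", "gaya"], unigram, bigram))
    print(hybrid_tag(["ali", "quetta", "mein"], unigram, bigram, use_bigram=True))
```

In this sketch the n-gram model takes precedence, and the gazetteers and rules only label tokens it leaves as "O", such as unseen words like "faisalabad" (caught by the hypothetical "-abad" suffix rule). Setting `use_bigram=True` switches from the hybrid unigram to the hybrid bigram variant, backing off to the unigram table for unseen token pairs. Other orderings (for example, letting gazetteers override the n-gram tag) are equally plausible readings of the abstract.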
Last modified: 2016-01-10 01:17:23