A Hybrid Approach for Urdu Sentence Boundary Disambiguation
Journal: The International Arab Journal of Information Technology (Vol. 9, No. 3)
Publication Date: 2012-05-01
Authors: Zobia Rehman; Waqas Anwar
Pages: 250-255
Keywords: Sentence boundary disambiguation; unigram model
Abstract
Sentence boundary identification is a preliminary step in preparing a text document for Natural Language Processing tasks such as machine translation, POS tagging, and text summarization. We present a hybrid approach to Urdu sentence boundary disambiguation that combines a unigram statistical model with a rule-based algorithm. When the training and test data were kept disjoint, this approach achieved 99.48% precision, 86.35% recall, and a 92.45% F1-measure; when the same data were used for both training and testing, it achieved 99.36% precision, 96.45% recall, and a 97.89% F1-measure.
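The abstract does not describe the model or the rules in detail, so the following is only a minimal sketch of what a hybrid unigram plus rule-based disambiguator could look like. It assumes pre-tokenized text, a hypothetical set of candidate marks (`CANDIDATE_MARKS`), a unigram model that estimates the probability that a mark ends a sentence given the single token before it, and three invented rules (`rule_decision`) that take precedence over the statistics. None of these names, rules, thresholds, or the toy data come from the paper. The sketch also scores predicted boundary positions with precision, recall, and F1, where F1 = 2PR / (P + R); for example, 2 × 99.48 × 86.35 / (99.48 + 86.35) ≈ 92.45, matching the disjoint-data figure above.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence, Set, Tuple

# Hypothetical candidate marks: Urdu full stop (U+06D4), Arabic question mark
# (U+061F), exclamation mark, and the ASCII period.
CANDIDATE_MARKS = {"\u06d4", "\u061f", "!", "."}


def train_unigram(corpus: Iterable[Tuple[Sequence[str], Set[int]]]) -> Dict[str, float]:
    """Estimate P(boundary | token before the mark) from (tokens, gold indices) pairs."""
    seen: Counter = Counter()
    ended: Counter = Counter()
    for tokens, gold in corpus:
        for i in range(1, len(tokens)):
            if tokens[i] in CANDIDATE_MARKS:
                seen[tokens[i - 1]] += 1
                if i in gold:
                    ended[tokens[i - 1]] += 1
    return {w: ended[w] / seen[w] for w in seen}


def is_number(tok: str) -> bool:
    # str.isdigit() also accepts Urdu (Extended Arabic-Indic) digits.
    return tok.isdigit()


def rule_decision(tokens: Sequence[str], i: int) -> Optional[bool]:
    """Hypothetical hand-written rules; None means 'let the unigram model decide'."""
    prev = tokens[i - 1] if i > 0 else None
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    if nxt is None:
        return True  # a mark at the very end of the text closes a sentence
    if tokens[i] == "." and prev is not None and is_number(prev) and is_number(nxt):
        return False  # decimal point between two number tokens
    if nxt in CANDIDATE_MARKS:
        return False  # in a run such as "؟!", only the last mark is the boundary
    return None


def predict(tokens: Sequence[str], model: Dict[str, float],
            threshold: float = 0.5, prior: float = 0.5) -> Set[int]:
    """Hybrid decision: rules first, unigram probability as the fallback."""
    boundaries = set()
    for i, tok in enumerate(tokens):
        if tok not in CANDIDATE_MARKS:
            continue
        decision = rule_decision(tokens, i)
        if decision is None:
            # Unseen preceding tokens fall back to the hypothetical prior.
            p = model.get(tokens[i - 1], prior) if i > 0 else prior
            decision = p >= threshold
        if decision:
            boundaries.add(i)
    return boundaries


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over sentence-boundary token positions."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, f1(p, r)


if __name__ == "__main__":
    # Invented toy data; "ص" and "ق" stand in for hypothetical abbreviation tokens.
    train = [
        (["یہ", "کتاب", "ہے", "۔", "وہ", "گھر", "گیا", "۔"], {3, 7}),
        (["ص", "۔", "12", "پر", "دیکھیں", "۔"], {5}),
    ]
    model = train_unigram(train)
    test = ["قیمت", "3", ".", "5", "روپے", "ہے", "۔", "ص", "۔", "8",
            "دیکھیں", "۔", "ق", "۔", "م"]
    gold = {6, 11}
    predicted = predict(test, model)
    print(sorted(predicted))                                  # [6, 11, 13]
    print([round(x, 3) for x in evaluate(predicted, gold)])   # [0.667, 1.0, 0.8]
```

Applying the rules before the statistical fallback keeps deterministic cases (decimals, runs of marks) out of the model. The false positive at index 13 comes from the unseen token `ق`, which falls back to `prior`; this is the kind of case where a real system would need more training data or an additional rule.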