Improving the Accuracy of English-Arabic Statistical Sentence Alignment
Journal: The International Arab Journal of Information Technology (Vol. 8, No. 2)
Publication Date: 2011-04-01
Authors : Mohammad Salameh; Rached Zantout; Nashat Mansour
Page : 171-177
Abstract
Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora are the basic building block for training a statistical natural language processing system and for creating translation and language models. Several systems have been devised that automatically align the words of a pair of sentences, each in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system performs poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step that is applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted on the preprocessed sentences.