Improving the Accuracy of English-Arabic Statistical Sentence Alignment

Journal: The International Arab Journal of Information Technology (Vol.8, No. 2)

Authors : Mohammad Salameh; Rached Zantout; Nashat Mansour

Page : 171-177

Abstract

Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora are the basic building block for training statistical natural language processing systems and for creating translation and language models. Several systems have been devised that automatically align the words of a sentence pair, each sentence in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system performs poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted on the preprocessed unaligned sentences.
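The abstract does not name the word-alignment system or describe the Arabic preprocessing step. To illustrate what statistical word alignment involves, here is a minimal sketch of IBM Model 1 trained with expectation-maximisation (EM). Model 1 is the simplest of the classic IBM translation models that many statistical word aligners build on. This is only an illustration under stated assumptions, not the system evaluated in the paper. The function names `train_ibm_model1` and `align` are hypothetical, a NULL source token is assumed, and the toy English-German pairs are a standard textbook example rather than data from the corpus.

```python
from collections import defaultdict

# Hypothetical illustration of IBM Model 1; not the alignment system used in the paper.
NULL = "<NULL>"  # empty source word, so a target word may align to nothing


def train_ibm_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict mapping (f, e) -> t(f | e).
    """
    target_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in pairs:
            es = [NULL] + es
            for f in fs:
                # E-step: share one unit of count for f across all source
                # words, in proportion to the current t(f | e).
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise the expected counts for each source word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)


def align(es, fs, t):
    """Link each target word j to its most probable source word i.

    Returns (i, j) pairs; target words whose best source is NULL are left unlinked.
    """
    candidates = [NULL] + es
    links = []
    for j, f in enumerate(fs):
        best = max(range(len(candidates)), key=lambda i: t.get((f, candidates[i]), 0.0))
        if best > 0:  # index 0 is the NULL token
            links.append((best - 1, j))
    return links


# Toy English-German pairs (textbook example, not data from the paper).
pairs = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]
t = train_ibm_model1(pairs, iterations=10)
print(align(["the", "house"], ["das", "Haus"], t))  # [(0, 0), (1, 1)]
```

Each EM iteration shifts probability toward word pairs that co-occur consistently across sentence pairs, such as "the"/"das". With only three sentence pairs the estimates stay rough. Practical aligners train on large parallel corpora and combine Model 1 with richer models.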

Last modified: 2019-04-28 21:12:18