Improving the Accuracy of English-Arabic Statistical Sentence Alignment

Journal: The International Arab Journal of Information Technology (Vol.8, No. 2)

Publication Date: 2011-04-01

Authors : Mohammad Salameh, Rached Zantout, Nashat Mansour

Page : 171-177



Abstract

Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora constitute the basic building block for training a statistical natural language processing system and for creating translation and language models. Several systems have been devised that automatically align the words of a pair of sentences, each in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system performs poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step to be applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted on the preprocessed, unaligned sentences.
