Constructing a Lexicon of Arabic-English Named Entity using SMT and Semantic Linked Data

Journal: The International Arab Journal of Information Technology (Vol.14, No. 6)

Publication Date:

Authors : ; ; ; ;

Page : 820-825

Keywords : NER; named entity translation; parallel Arabic-English lexicon; DBpedia; linked data entities; parallel corpus; SMT

Abstract

Named Entity Recognition (NER) is the problem of locating and categorizing atomic entities in a given text. In this work, we used DBpedia linked datasets and combined existing open-source tools to generate a bilingual lexicon of Named Entities (NEs) from a parallel corpus. To annotate NEs in the monolingual English corpus, we used linked data entities by mapping them to GATE gazetteers. To translate the entities identified by GATE in the English corpus, we used Moses, a Statistical Machine Translation (SMT) system. The construction of the Arabic-English NE lexicon is based on the results of the Moses translation. Our method is fully automatic and aims to help Natural Language Processing (NLP) tasks such as Machine Translation (MT), information retrieval, text mining, and question answering. Our lexicon contains 48,753 Arabic-English NE pairs and is freely available for use by other researchers.
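
The pipeline described in the abstract (gazetteer annotation, translation, lexicon construction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the small dictionary `gazetteer` stands in for GATE gazetteers populated from DBpedia, the `toy_translations` lookup stands in for Moses output, and the function names `annotate_entities` and `build_lexicon` are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple


def annotate_entities(sentence: str, gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (entity, type) pairs using longest-match gazetteer lookup.

    Assumption: whitespace tokenization and exact string matching, which is
    far simpler than what a real GATE gazetteer does.
    """
    tokens = sentence.split()
    max_len = max((len(name.split()) for name in gazetteer), default=0)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in gazetteer:
                found.append((candidate, gazetteer[candidate]))
                matched = n
                break
        i += matched if matched else 1
    return found


def build_lexicon(
    corpus: List[str],
    gazetteer: Dict[str, str],
    translate: Callable[[str], Optional[str]],
) -> Dict[str, Tuple[str, str]]:
    """Map each English entity to (Arabic translation, entity type).

    `translate` is a placeholder for an SMT system; entities it cannot
    translate are skipped.
    """
    lexicon: Dict[str, Tuple[str, str]] = {}
    for sentence in corpus:
        for entity, ne_type in annotate_entities(sentence, gazetteer):
            arabic = translate(entity)
            if arabic:
                lexicon[entity] = (arabic, ne_type)
    return lexicon


# Hypothetical toy data, invented for illustration only.
gazetteer = {"Cairo": "Location", "United Nations": "Organization"}
toy_translations = {"Cairo": "القاهرة", "United Nations": "الأمم المتحدة"}

lexicon = build_lexicon(
    ["The United Nations met in Cairo"],
    gazetteer,
    toy_translations.get,
)
for english, (arabic, ne_type) in lexicon.items():
    print(f"{english}\t{arabic}\t{ne_type}")
```

Longest-match lookup is used so that a multi-word entity such as "United Nations" is kept whole rather than split into shorter matches. In a real setup, the `translate` callable would wrap a call to a trained translation model instead of a dictionary.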

Last modified: 2019-05-09 19:09:54