Constructing a Lexicon of Arabic-English Named Entity using SMT and Semantic Linked Data
Journal: The International Arab Journal of Information Technology (Vol. 14, No. 6)
Publication Date: 2017-11-01
Authors : Emna Hkiri; Souheyl Mallat; Mounir Zrigui; Mourad Mars
Page : 820-825
Keywords : NER; named entity translation; parallel Arabic-English lexicon; DBpedia; linked data entities; parallel corpus; SMT
Abstract
Named Entity Recognition (NER) is the problem of locating and categorizing atomic entities in a given text. In this work, we used DBpedia linked datasets and combined existing open-source tools to generate a bilingual lexicon of Named Entities (NEs) from a parallel corpus. To annotate NEs in the monolingual English corpus, we used linked data entities, mapping them to GATE gazetteers. To translate the entities that GATE identified in the English corpus, we used Moses, a Statistical Machine Translation (SMT) system. The construction of the Arabic-English NE lexicon is based on the results of the Moses translation. Our method is fully automatic and aims to help Natural Language Processing (NLP) tasks such as Machine Translation (MT), information retrieval, text mining, and question answering. Our lexicon contains 48,753 pairs of Arabic-English NEs and is freely available for use by other researchers.
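The pipeline described in the abstract (gazetteer-based annotation, translation of each identified entity, collection of the pairs into a lexicon) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not call GATE, Moses, or DBpedia. The gazetteer is a hand-written set standing in for entries derived from linked data, and `toy_translate` is a hypothetical dictionary lookup standing in for an SMT system. The function names `find_entities` and `build_lexicon` are likewise assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Gazetteer = Set[Tuple[str, ...]]


def find_entities(tokens: List[str], gazetteer: Gazetteer, max_len: int = 4) -> List[str]:
    """Greedy longest-match lookup of gazetteer entries in a token list."""
    found: List[str] = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in gazetteer:
                match_len = n
                break
        if match_len:
            found.append(" ".join(tokens[i:i + match_len]))
            i += match_len
        else:
            i += 1
    return found


def build_lexicon(
    sentences: Iterable[str],
    gazetteer: Gazetteer,
    translate: Callable[[str], str],
) -> Dict[str, str]:
    """Map each English entity found in the sentences to its translation."""
    lexicon: Dict[str, str] = {}
    for sentence in sentences:
        for entity in find_entities(sentence.split(), gazetteer):
            if entity not in lexicon:
                lexicon[entity] = translate(entity)
    return lexicon


# Assumption: a tiny hand-written gazetteer in place of DBpedia-derived entries.
toy_gazetteer: Gazetteer = {("United", "Nations"), ("Tunisia",)}

# Assumption: a fixed dictionary in place of an SMT system such as Moses.
toy_translations = {"United Nations": "الأمم المتحدة", "Tunisia": "تونس"}


def toy_translate(entity: str) -> str:
    return toy_translations.get(entity, entity)


if __name__ == "__main__":
    corpus = ["The United Nations met in Tunisia"]
    for english, arabic in build_lexicon(corpus, toy_gazetteer, toy_translate).items():
        print(english, "->", arabic)
```

Passing the translator as a callable keeps the lexicon-building step independent of any particular translation system, so the toy lookup could be replaced by a wrapper around a real decoder without changing the rest of the sketch.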
Last modified: 2019-05-09 19:09:54