A Hybrid Multi-Word Terms Extraction System Applied to Topic Detection

Journal: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY (Vol.13, No. 10)

Publication Date:

Authors : ; ;

Page : 5105-5112

Keywords : Multi-word Terms Extraction; Topic Detection; C-value; LLR


Abstract

Multi-word term extraction plays an important role in many Natural Language Processing (NLP) tasks. Despite its importance, few works have been dedicated to Arabic multi-word term extraction. This paper proposes an automatic Arabic multi-word term (MWT) extraction system based on two major filtering steps: a linguistic filter that uses a part-of-speech tagger along with morphological patterns, and a statistical filter based on two probabilistic measures, the Log-Likelihood Ratio (LLR) and C-value. We evaluate the performance of the resulting system on Wattan, a topic-oriented Arabic newspaper corpus. Our system achieves 90.23% precision in multi-word term extraction. We also study the use of MWTs as features in Arabic topic detection. The conducted experiments show good results.

Last modified: 2016-06-29 16:52:36