A Hybrid Multi-Word Terms Extraction System Applied to Topic Detection
Journal: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY (Vol. 13, No. 10)
Publication Date: 2014-08-31
Authors: Rim Koulali; Abdelouafi Meziane
Page: 5105-5112
Keywords: Multi-word Terms Extraction; Topic Detection; C-value; LLR
Abstract
Multi-word term extraction plays an important role in many Natural Language Processing (NLP) tasks. Despite its importance, few works have been dedicated to Arabic multi-word term extraction. This paper proposes an automatic Arabic multi-word term (MWT) extraction system based on two major filtering steps: a linguistic filter that uses a part-of-speech tagger along with morphological patterns, and a statistical filter based on two probabilistic measures, the Log-Likelihood Ratio (LLR) and C-value. We evaluate the performance of the resulting system on Wattan, a topic-oriented Arabic newspaper corpus. Our system achieves 90.23% precision in multi-word term extraction. We also study the use of MWTs as features in Arabic topic detection. The conducted experiments show good results.
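As an illustration of the two statistical measures named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes the textbook definitions: Dunning's log-likelihood ratio over a 2x2 contingency table of bigram counts, and a simplified C-value that uses the raw frequencies of longer candidates containing a term. The function names (`llr`, `c_value`, `contains`) and the toy counts are hypothetical, and the abstract does not say how the paper combines the two scores.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: count of the bigram (w1, w2)
    k12: count of w1 followed by a word other than w2
    k21: count of w2 preceded by a word other than w1
    k22: count of all remaining bigrams
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for observed, i, j in cells:
        if observed > 0:  # a zero cell contributes nothing to the sum
            expected = rows[i] * cols[j] / n
            total += observed * math.log(observed / expected)
    return 2.0 * total


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run of words in `longer`."""
    span = len(shorter)
    return any(longer[i:i + span] == shorter
               for i in range(len(longer) - span + 1))


def c_value(freq):
    """Simplified C-value for candidate terms.

    freq maps each candidate (a tuple of words, length >= 2) to its
    corpus frequency. Assumption: nested-term frequencies are taken
    as-is, without the incremental bookkeeping of the full algorithm.
    """
    scores = {}
    for term, f_term in freq.items():
        nesting = [f for other, f in freq.items()
                   if len(other) > len(term) and contains(other, term)]
        penalty = sum(nesting) / len(nesting) if nesting else 0.0
        scores[term] = math.log2(len(term)) * (f_term - penalty)
    return scores


# Toy candidates with made-up counts, for illustration only.
toy_freq = {("topic", "detection"): 5,
            ("arabic", "topic", "detection"): 2}
toy_scores = c_value(toy_freq)
```

In this sketch, a candidate that mostly appears inside longer candidates is penalized by C-value, while LLR is zero when the two words of a bigram co-occur exactly as often as independence would predict.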