Clustering with Probabilistic Topic Models on Arabic Texts: A Comparative Study of LDA and K-Means
Journal: The International Arab Journal of Information Technology (Vol. 13, No. 2)
Publication Date: 2016-03-01
Authors : Abdessalem Kelaiaia; Hayet Merouani
Page : 332-338
Keywords : Clustering; topics identification; Arabic text; LDA; k-means; preprocessing
Abstract
Probabilistic topic models such as Latent Dirichlet Allocation (LDA) have recently been widely used in text mining tasks such as retrieval, summarization, and clustering across different languages. In this paper, we present a first comparative study of LDA and K-means, two well-known methods for topic identification and clustering respectively, applied to Arabic texts. Our aim is to compare how the morpho-syntactic characteristics of the Arabic language influence the performance of the first method relative to the second. To study different aspects of these methods, the study is conducted on four benchmark document collections, and clustering quality is measured with four well-known evaluation measures: Rand index, Jaccard index, F-measure, and Entropy. The results consistently show that LDA performs better than K-means in most cases.
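The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas used in the paper, so the definitions below are assumptions: Rand index, Jaccard index, and F-measure are computed in their common pair-counting form, and Entropy as the size-weighted average of per-cluster class entropy. All function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def pair_counts(truth, pred):
    """Classify every document pair by whether the reference classes
    and the predicted clusters agree on putting the two together."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_class = truth[i] == truth[j]
        same_cluster = pred[i] == pred[j]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(truth, pred):
    """Fraction of pairs on which classes and clusters agree."""
    tp, fp, fn, tn = pair_counts(truth, pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def jaccard_index(truth, pred):
    """Like the Rand index, but ignoring pairs separated in both partitions."""
    tp, fp, fn, _ = pair_counts(truth, pred)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def pairwise_f_measure(truth, pred):
    """Harmonic mean of pairwise precision and recall (assumed variant)."""
    tp, fp, fn, _ = pair_counts(truth, pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def clustering_entropy(truth, pred):
    """Size-weighted mean of class entropy inside each cluster; lower is better."""
    n = len(truth)
    total = 0.0
    for cluster in set(pred):
        members = [t for t, p in zip(truth, pred) if p == cluster]
        counts = Counter(members)
        h = -sum((c / len(members)) * log2(c / len(members))
                 for c in counts.values())
        total += (len(members) / n) * h
    return total


# Toy example: six documents, two reference classes, two predicted clusters.
truth = ["sport", "sport", "sport", "politics", "politics", "politics"]
pred = [0, 0, 1, 1, 1, 1]
scores = {
    "rand": rand_index(truth, pred),
    "jaccard": jaccard_index(truth, pred),
    "f_measure": pairwise_f_measure(truth, pred),
    "entropy": clustering_entropy(truth, pred),
}
```

The same functions apply whether `pred` comes from K-means cluster assignments or from assigning each document to its most probable LDA topic, which is one plausible way to treat LDA output as a clustering. For the first three measures higher values indicate better agreement with the reference classes, while for Entropy lower is better.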
Last modified: 2019-11-13 20:56:40