Optimizing Text Clustering: A Methodological Approach for Determining the Optimal Number of Clusters
Journal: International Journal of Advanced Trends in Computer Science and Engineering (IJATCSE), Vol. 13, No. 3
Publication Date: 2024-06-10
Authors: Oussama Chabih, Sara Sbai, Mohammed Reda Chbihi Louhdi, Hicham Behja
Pages: 103-111
Keywords: K-means; Number of clusters; Text document clustering; Unsupervised classification
Abstract
Determining the optimal number of clusters is a crucial task, particularly in text clustering, where the high variability of textual data poses significant challenges. Our study addresses this problem in the setting of unsupervised text analysis. We propose an approach that combines the K-means algorithm with a Bregman distance adapted to the characteristics of textual data. Our iterative method has two aims: to reduce the effect of noise and to ensure the stability of the resulting clusters, with the Kullback-Leibler divergence as the underlying measure. Experiments confirm that the method segments texts into coherent clusters. Notably, it outperformed an initial categorization of the corpus, giving a more nuanced and representative picture of the topics the corpus contains. Overall, our study offers a promising way to improve unsupervised text analysis and opens avenues for further research in this field.
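The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general technique it names: K-means-style hard clustering with the Kullback-Leibler divergence as the Bregman distance, applied to term-frequency vectors. The function names (`normalize`, `kl`, `farthest_first`, `bregman_kmeans`), the smoothing constant `EPS`, the deterministic farthest-first initialization, and the toy corpus are all hypothetical choices for illustration, not the authors' implementation. The number of clusters `k` is fixed here rather than selected by the authors' iterative procedure.

```python
import math

EPS = 1e-12  # hypothetical smoothing so KL stays finite when a term has zero count


def normalize(counts):
    """Turn a term-count vector into a smoothed probability distribution."""
    total = sum(counts) + EPS * len(counts)
    return [(c + EPS) / total for c in counts]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q); q must have no zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean(vectors):
    """Component-wise arithmetic mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def farthest_first(docs, k):
    """Deterministic seeding: start from docs[0], then repeatedly add the
    document with the largest KL divergence to its nearest chosen centroid."""
    centroids = [list(docs[0])]
    while len(centroids) < k:
        far = max(docs, key=lambda d: min(kl(d, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def bregman_kmeans(docs, k, iters=50):
    """Hard clustering with KL divergence as the Bregman distance.

    docs: list of smoothed probability distributions (each sums to 1).
    Returns (labels, centroids, objective).
    """
    centroids = farthest_first(docs, k)
    labels = [None] * len(docs)
    for _ in range(iters):
        # Assignment step: each document goes to the centroid minimizing KL(doc || centroid).
        new_labels = [min(range(k), key=lambda j: kl(d, centroids[j])) for d in docs]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: for a Bregman divergence D(x, mu), the arithmetic mean
        # of a cluster's members minimizes their total divergence to mu.
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean(members)
    objective = sum(kl(d, centroids[lab]) for d, lab in zip(docs, labels))
    return labels, centroids, objective


# Toy corpus (hypothetical): counts over the vocabulary
# [ball, goal, team, vote, party, election]; three "sports" and three "politics" documents.
counts = [
    [5, 3, 4, 0, 0, 1],
    [4, 5, 3, 0, 1, 0],
    [6, 2, 5, 1, 0, 0],
    [0, 1, 0, 5, 4, 6],
    [1, 0, 0, 4, 6, 5],
    [0, 0, 1, 6, 5, 4],
]
docs = [normalize(c) for c in counts]
labels, centroids, objective = bregman_kmeans(docs, k=2)
print(labels, round(objective, 4))
```

The update step can stay a plain mean because, for any Bregman divergence measured from the data point to the centroid, the mean minimizes the summed divergence within a cluster; this is what lets the K-means loop carry over from squared Euclidean distance to KL divergence unchanged. Comparing `objective` across several values of `k` is one generic way to probe the number of clusters, but it is not the stability-based criterion the abstract refers to.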
Last modified: 2024-06-17 18:57:14