A hybrid approach for generative process model with topic modelling towards efficient and dynamic document clustering
Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE) (Vol. 10, No. 106)
Publication Date: 2023-10-30
Authors: Gugulothu Venkanna; K.F Bharati
Pages: 1184-1197
Keywords: Document clustering; Natural language processing; Generative process model; Document similarity; Dynamic document clustering
Abstract
Clustering text documents has a wide range of applications across various domains. However, the diversity and rapid growth of textual data make clustering a given text corpus increasingly challenging. Several existing approaches to text document clustering rely on natural language processing (NLP) and text similarity measures, but they lack a generative process model that handles text corpora systematically and progressively, and a hybrid approach is needed to improve clustering performance. It is therefore important to build a model for a given text corpus and update it dynamically as new documents arrive, rather than restarting clustering from scratch. In this paper, a framework known as the hybrid approach for dynamic document clustering (HADDC) is proposed. The framework is realized through two algorithms that work together to achieve dynamic document clustering. The first algorithm, similar document identification (SDI), uses the WordNet lexical dictionary and similarity measures to identify similar documents. The second algorithm, topic modelling for efficient and dynamic document clustering (TM-EDDC), is a dynamic process model based on latent Dirichlet allocation (LDA) that clusters documents incrementally as new ones become available. The proposed framework and its underlying algorithms were evaluated using the news groups dataset. Experimental results show that the proposed methods outperform existing ones, as evidenced by a lower mean absolute error (MAE). The empirical study indicates that the framework offers improved utility and efficiency, and that organizations can integrate it into their existing applications.
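The idea of updating clusters as documents arrive, instead of reclustering the whole corpus, can be illustrated with a minimal sketch. This is not the paper's SDI or TM-EDDC algorithm: it swaps in plain bag-of-words cosine similarity in place of WordNet-based similarity and LDA, and the class name `IncrementalClusterer`, the `threshold` parameter, and the toy documents are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class IncrementalClusterer:
    """Hypothetical stand-in: assigns each new document to the most
    similar existing cluster, or opens a new cluster, without
    revisiting documents that were clustered earlier."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed similarity cut-off
        self.centroids = []         # summed term counts per cluster
        self.members = []           # document ids per cluster

    def add(self, doc_id, text):
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is None or best_sim < self.threshold:
            # No cluster is similar enough: start a new one.
            self.centroids.append(vec)
            self.members.append([doc_id])
            return len(self.centroids) - 1
        # Fold the document into the matching cluster's centroid.
        self.centroids[best].update(vec)
        self.members[best].append(doc_id)
        return best


# Toy corpus (made up for this example).
clusterer = IncrementalClusterer(threshold=0.3)
docs = [
    ("d1", "the team won the hockey game"),
    ("d2", "the hockey team lost the game"),
    ("d3", "nasa launched a new space shuttle"),
]
labels = [clusterer.add(doc_id, text) for doc_id, text in docs]

# A later arrival is placed without reprocessing d1 to d3.
new_label = clusterer.add("d4", "space shuttle launch delayed by nasa")
print(labels, new_label)
```

Each call to `add` touches only the cluster centroids, so the cost of handling a new document grows with the number of clusters rather than with the size of the corpus. A topic-model-based variant would replace the term-count vectors with per-document topic distributions.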
Last modified: 2023-10-07 16:38:31