A soft similarity measure for k-means based high dimensional document clustering
Journal: IADIS INTERNATIONAL JOURNAL ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (Vol. 12, No. 1)
Publication Date: 2017-07-01
Authors: T. V. Rajinikanth; G. Suresh Reddy
Pages: 88-108
Keywords: Feature Selection; Feature Reduction; Clustering; Classification; Dimensionality
Abstract
Feature dimensionality has long been one of the key challenges in text mining: documents represented in a high-dimensional feature space are more complex to mine. High dimensionality introduces sparseness and noise and increases both computational and space complexity. Dimensionality reduction is usually addressed through either feature reduction or feature selection techniques. In this work, dimensionality reduction is performed using singular value decomposition (SVD), and the results are compared with an information gain approach that retains the top-k features. High-dimensional clustering is carried out using the k-means algorithm with a Gaussian function. The proposed dimensionality reduction and clustering approaches are compared with conventional approaches, and the results demonstrate the effectiveness of our approach.
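The abstract does not state the exact form of the Gaussian function used with k-means. The following is a minimal sketch, assuming the Gaussian acts as a soft similarity between a document vector x and a centroid c, sim(x, c) = exp(-||x - c||^2 / (2 sigma^2)), with each document's similarities normalised into cluster memberships. This is a standard soft k-means formulation and not necessarily the authors' method; the function names and the bandwidth parameter `sigma` are hypothetical. Documents are assumed to already be dense, reduced vectors (for example after SVD or top-k feature selection).

```python
import math
import random


def gaussian_similarity(x, c, sigma):
    """Gaussian (RBF) similarity exp(-||x - c||^2 / (2 sigma^2)), in (0, 1]."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def memberships(doc, centroids, sigma):
    """Normalised Gaussian similarities of one document to every centroid.

    Computed in the log domain (subtracting the max exponent) so that
    distant centroids do not underflow every similarity to 0; the result
    equals sim_j / sum(sim) and sums to 1.
    """
    logs = [
        -sum((a - b) ** 2 for a, b in zip(doc, c)) / (2.0 * sigma ** 2)
        for c in centroids
    ]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)  # >= 1, because the largest term is exp(0)
    return [e / total for e in exps]


def soft_kmeans(docs, k, sigma=1.0, iters=50, seed=0):
    """Hypothetical soft k-means driven by Gaussian similarity.

    Returns (centroids, weights, labels): weights[i][j] is the membership
    of document i in cluster j; labels are the argmax memberships.
    """
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(docs, k)]  # random initial centroids
    dim = len(docs[0])
    for _ in range(iters):
        # Soft assignment step: every document belongs partially to every cluster.
        weights = [memberships(d, centroids, sigma) for d in docs]
        # Update step: each centroid is the membership-weighted mean of all documents.
        for j in range(k):
            wsum = sum(w[j] for w in weights)
            if wsum > 0:
                centroids[j] = [
                    sum(w[j] * d[t] for w, d in zip(weights, docs)) / wsum
                    for t in range(dim)
                ]
    # Final memberships against the final centroids.
    weights = [memberships(d, centroids, sigma) for d in docs]
    labels = [max(range(k), key=lambda j: w[j]) for w in weights]
    return centroids, weights, labels


if __name__ == "__main__":
    # Four toy 2-D "documents" forming two obvious groups.
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    centroids, weights, labels = soft_kmeans(docs, k=2, sigma=0.3)
    print(labels)
```

As `sigma` shrinks, memberships approach 0 or 1 and the procedure behaves like ordinary hard k-means; a very large `sigma` pulls all centroids toward the global mean, so in this sketch the bandwidth has to be tuned to the scale of the reduced feature space.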