A soft similarity measure for k-means based high dimensional document clustering

Journal: IADIS International Journal on Computer Science and Information Systems (Vol. 12, No. 1)

Page : 88-108

Keywords : Feature Selection; Feature Reduction; Clustering; Classification; Dimensionality;

Abstract

Feature dimensionality has long been a key challenge in text mining, because mining high-dimensional documents increases complexity. High dimensionality introduces sparseness and noise, and it raises both computational and space complexity. Dimensionality reduction is usually addressed through either feature reduction or feature selection techniques. In this work, dimensionality reduction is performed using singular value decomposition, and the results are compared with an information gain approach that retains the top-k features. High-dimensional clustering is carried out using the k-means algorithm with a Gaussian function. The proposed dimensionality reduction and clustering approaches are compared with conventional approaches, and the results demonstrate the importance of the proposed approach.
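
The abstract does not spell out how the Gaussian function enters the clustering step, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a Gaussian (RBF) similarity exp(-d^2 / (2 sigma^2)) between a document and a centroid, a single shared width sigma, and a deterministic farthest-point initialisation chosen purely for reproducibility. The function names (reduce_with_svd, gaussian_similarity, kmeans_gaussian) and the toy term counts are hypothetical.

```python
import numpy as np

def reduce_with_svd(X, k):
    """Project the rows of X (documents) onto the top-k singular directions."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]  # document coordinates in the k-dimensional space

def gaussian_similarity(X, centroids, sigma):
    """Assumed similarity: exp(-||x - c||^2 / (2 * sigma^2)) for every pair."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def farthest_point_init(X, n_clusters):
    """Deterministic start: row 0, then repeatedly the row farthest from those chosen."""
    chosen = [0]
    for _ in range(n_clusters - 1):
        d2 = ((X[:, None, :] - X[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(d2.argmax()))
    return X[chosen].copy()

def kmeans_gaussian(X, n_clusters, sigma=1.0, n_iter=50):
    """k-means loop that assigns each row to its most similar centroid."""
    centroids = farthest_point_init(X, n_clusters)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        new_labels = gaussian_similarity(X, centroids, sigma).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(n_clusters):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Hypothetical document-term counts: rows are documents, columns are terms.
docs = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [2, 2, 0, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 1, 1, 2],
    [0, 0, 0, 2, 2],
], dtype=float)

reduced = reduce_with_svd(docs, k=2)  # 5 term features reduced to 2
labels, centroids = kmeans_gaussian(reduced, n_clusters=2, sigma=1.0)
print(labels)
```

With one shared sigma, picking the most similar centroid is equivalent to picking the nearest one, so this sketch behaves like standard k-means on the SVD-reduced data. A per-feature or per-cluster width, which the paper's soft similarity measure may use, would change the assignments.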
