k-means Based Document Clustering with Automatic “k” Selection and Cluster Refinement

Journal: International Journal of Computer Science and Mobile Applications (IJCSMA), Vol. 2, No. 5

Publication Date: 2014-05-30

Authors : Himanshu Gupta; Rajeev Srivastava

Pages : 7-13

Keywords : Document Clustering; k-means; Feature Voting; SVD; Vector Space Model; Cosine Similarity

Abstract

In recent years, use of the web has increased manifold, and efficiency has become as important as accuracy. Automatic document clustering is an important part of many fields, such as data mining and information retrieval. Most document clustering techniques are based on k-means and its variants. K-means is a fast algorithm, but it has some shortcomings. The k in k-means stands for the number of clusters, which the user has to provide, yet most of the time users have no clue what k should be. In our implementation of document clustering, we used SVD (Singular Value Decomposition) to find the number of clusters required (the value of k). The k-means algorithm is then used to create the clusters, and in the last phase of the algorithm the clusters are refined by feature voting. The refinement phase enables us to make our algorithm much faster than the standard k-means algorithm.
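The pipeline described in the abstract (vector space model, SVD to choose k, then k-means with cosine similarity) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how k is read off the SVD, so the rule used here is an assumption: pick the smallest k whose top singular values capture 90% of the total squared singular-value mass. Raw term-frequency weighting and farthest-first seeding of the centroids are also assumptions. The helper names (`tf_vectors`, `singular_values`, `choose_k`, `kmeans_cosine`, `cluster_documents`) are hypothetical. The feature-voting refinement phase is not sketched, because the abstract does not describe it.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Vector space model with raw term-frequency weights (assumed weighting)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, c in Counter(d.lower().split()).items():
            v[index[w]] = float(c)
        vecs.append(v)
    return vecs


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def dot(u, v):
    # For unit-length vectors the dot product equals the cosine similarity.
    return sum(a * b for a, b in zip(u, v))


def singular_values(rows, sweeps=50):
    """Singular values of the document-term matrix A, largest first.

    Computed as square roots of the eigenvalues of the Gram matrix A A^T,
    diagonalized with cyclic Jacobi rotations (suitable for toy inputs only).
    """
    n = len(rows)
    g = [[dot(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(g[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(g[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so the new g[p][q] is zero.
                theta = (g[q][q] - g[p][p]) / (2.0 * g[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # G <- G J (update columns p and q)
                    gkp, gkq = g[k][p], g[k][q]
                    g[k][p], g[k][q] = c * gkp - s * gkq, s * gkp + c * gkq
                for k in range(n):  # G <- J^T G (update rows p and q)
                    gpk, gqk = g[p][k], g[q][k]
                    g[p][k], g[q][k] = c * gpk - s * gqk, s * gpk + c * gqk
    return sorted((math.sqrt(max(g[i][i], 0.0)) for i in range(n)), reverse=True)


def choose_k(vecs, energy=0.9):
    """Assumed rule: smallest k whose top-k squared singular values reach `energy` of the total."""
    sv = singular_values(vecs)
    total = sum(s * s for s in sv)
    acc = 0.0
    for k, s in enumerate(sv, start=1):
        acc += s * s
        if acc >= energy * total:
            return k
    return len(sv)


def kmeans_cosine(vecs, k, iters=20):
    """k-means on unit vectors using cosine similarity, with farthest-first seeding (assumed)."""
    centroids = [vecs[0]]
    while len(centroids) < k:
        far = min(range(len(vecs)), key=lambda i: max(dot(vecs[i], c) for c in centroids))
        centroids.append(vecs[far])
    labels = None
    for _ in range(iters):
        new = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vecs]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels


def cluster_documents(docs, energy=0.9):
    vecs = [normalize(v) for v in tf_vectors(docs)]
    k = choose_k(vecs, energy)
    return k, kmeans_cosine(vecs, k)


if __name__ == "__main__":
    docs = ["apple banana", "apple banana apple", "banana apple banana",
            "car engine", "car engine car", "engine car engine"]
    print(cluster_documents(docs))  # (2, [0, 0, 0, 1, 1, 1])
```

On these six toy documents the two disjoint vocabularies give squared singular values of about 2.8, 2.8, 0.2, 0.2, 0 and 0. The top two capture 5.6 of the total 6.0 (about 93%), so the 90% rule picks k = 2. The Jacobi routine is used only so that the sketch needs no third-party libraries. On a real corpus, a library SVD of a sparse term-document matrix would be the practical choice.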
