k-means Based Document Clustering with Automatic “k” Selection and Cluster Refinement
Journal: International Journal of Computer Science and Mobile Applications IJCSMA (Vol. 2, No. 5)
Publication Date: 2014-05-30
Authors : Himanshu Gupta; Rajeev Srivastava;
Page : 7-13
Keywords : Document Clustering; k-means; Feature Voting; SVD; Vector Space Model; Cosine similarity;
Abstract
In recent years, use of the web has increased manifold, and efficiency has become as important as accuracy. Automatic document clustering is an important part of fields such as data mining and information retrieval. Most document clustering techniques are based on k-means and its variants. K-means is a fast algorithm, but it has some shortcomings. The k in k-means stands for the number of clusters, which the user has to provide, but most of the time users have no clue about k. In our implementation of a document clustering technique, we use SVD (Singular Value Decomposition) to find the number of clusters (the value of k) required. The k-means algorithm is then used to create the clusters, and in the last phase of the algorithm the clusters are refined by feature voting. The refinement phase enables us to make our algorithm much faster than the k-means algorithm.
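The first two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how k is read off the SVD, so the largest-gap rule in `choose_k` is an assumption, as are the plain term-frequency weighting, the farthest-first seeding, and every function name below. The feature-voting refinement is left out because the abstract gives no details about it. Singular values are obtained from the document-by-document Gram matrix by power iteration, which keeps the example free of external libraries.

```python
import math
import random


def tf_vectors(docs):
    """Unit-length term-frequency vectors (a simple vector space model)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d.lower().split():
            v[index[w]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        rows.append([x / norm for x in v])
    return rows


def dot(a, b):
    # For unit-length vectors the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def singular_values(rows, iters=200):
    """Singular values of the document-term matrix A, computed as square
    roots of the eigenvalues of A A^T (power iteration with deflation)."""
    n = len(rows)
    gram = [[dot(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    rng = random.Random(0)
    values = []
    for _ in range(n):
        v = [rng.random() + 0.1 for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [dot(gram[i], v) for i in range(n)]
            lam = math.sqrt(dot(w, w))
            if lam < 1e-12:
                lam = 0.0
                break
            v = [x / lam for x in w]
        # Deflate: remove the component that was just found.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= lam * v[i] * v[j]
        values.append(math.sqrt(lam))
    return sorted(values, reverse=True)


def choose_k(values):
    """Assumed heuristic: k sits at the largest drop between
    consecutive singular values."""
    if len(values) < 2:
        return len(values)
    gaps = [values[i] - values[i + 1] for i in range(len(values) - 1)]
    return gaps.index(max(gaps)) + 1


def cosine_kmeans(rows, k, max_iter=100):
    """k-means with cosine similarity and deterministic farthest-first seeds."""
    centroids = [rows[0]]
    while len(centroids) < k:
        far = min(rows, key=lambda r: max(dot(r, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(max_iter):
        new = [max(range(k), key=lambda c: dot(r, centroids[c])) for r in rows]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                norm = math.sqrt(dot(mean, mean)) or 1.0
                centroids[c] = [x / norm for x in mean]
    return labels


if __name__ == "__main__":
    docs = [
        "cat dog pet", "cat dog animal", "dog pet animal",
        "stock market trade", "stock market price", "market trade price",
    ]
    rows = tf_vectors(docs)
    k = choose_k(singular_values(rows))
    print(k, cosine_kmeans(rows, k))
```

On a real corpus one would likely swap the power iteration for a library SVD routine and use TF-IDF weights; the largest-gap rule is only one of several plausible ways to turn a singular value spectrum into a value of k.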
Last modified: 2014-05-24 02:18:44