Survey of Document Clustering

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.3, No. 5)

Publication Date:

Authors : ; ; ;

Page : 871-874

Keywords : clustering; document; hierarchical; partitional;


Abstract

This paper presents the results of an experimental study of well-known document clustering algorithms. In essence, there are two main approaches to document clustering: agglomerative hierarchical clustering and K-means. (For K-means, there is a "standard" K-means algorithm and a variant, "bisecting" K-means, in which K-means is repeated a finite number of times.) Hierarchical clustering, often portrayed as the better-quality clustering approach, is limited by its quadratic time complexity. In contrast, K-means and its variant (bisecting K-means) have a time complexity that is linear in the number of documents, but are considered to produce inferior clusters. However, our results indicate that the bisecting K-means approach is better than the standard K-means approach and as good as or better than the hierarchical approaches that we tested for a variety of clusters.
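
To make the bisecting idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes documents are already represented as numeric vectors (for example, term weights), uses squared Euclidean distance, always splits the largest cluster, and seeds each split deterministically with two far-apart points. All of these choices, along with the function names, are assumptions made for illustration; the abstract does not specify them, and it also mentions repeating K-means, whereas this sketch runs a single 2-means pass per split.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def two_means(points, iters=20):
    """Split a list of at least two points into two groups (K-means, K = 2)."""
    # Assumed seeding: the point farthest from the centroid, then the
    # point farthest from that first seed.
    c = centroid(points)
    seed_a = max(points, key=lambda p: sq_dist(p, c))
    seed_b = max(points, key=lambda p: sq_dist(p, seed_a))
    centers = [seed_a, seed_b]
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points
                if sq_dist(p, centers[0]) <= sq_dist(p, centers[1])]
        right = [p for p in points
                 if sq_dist(p, centers[0]) > sq_dist(p, centers[1])]
        if not left or not right:
            # Degenerate case (e.g. all points identical): halve arbitrarily.
            half = len(points) // 2
            return points[:half], points[half:]
        new_centers = [centroid(left), centroid(right)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return left, right


def bisecting_kmeans(points, k):
    """Start with one cluster and bisect the largest until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        biggest = clusters.pop()  # assumed rule: split the largest cluster
        if len(biggest) < 2:
            clusters.append(biggest)  # nothing left that can be split
            break
        left, right = two_means(biggest)
        clusters.extend([left, right])
    return clusters


# Hypothetical 2-D "document vectors" forming three visible groups.
docs = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
        (5.0, 5.1), (5.1, 5.0),
        (9.0, 0.0), (9.1, 0.1)]
for cluster in bisecting_kmeans(docs, 3):
    print(len(cluster), cluster)
```

Each split only touches the points of one cluster, which is where the roughly linear cost in the number of documents comes from, in contrast to the pairwise comparisons that agglomerative hierarchical clustering needs.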

Last modified: 2014-05-29 23:38:39