Survey of Document Clustering
Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.3, No. 5)
Publication Date: 2014-05-30
Authors : Hetal Gaudani; Khushboo Lakhani; Riten Chhatrala;
Page : 871-874
Keywords : clustering; document; hierarchical; partitional;
This paper presents the results of an experimental study of commonly known document clustering algorithms. In essence, there are two main approaches to document clustering: agglomerative hierarchical clustering and K-means. (For K-means there are a "standard" K-means algorithm and a variant, "bisecting" K-means, in which K-means is repeated a finite number of times.) Hierarchical clustering, often portrayed as the better-quality clustering approach, is limited by its quadratic time complexity. In contrast, K-means and its variant (bisecting K-means) have a time complexity that is linear in the number of documents, but are considered to produce inferior clusters. However, our results indicate that the bisecting K-means approach is better than the standard K-means approach and as good as or better than the hierarchical approaches that we tested for a variety of clusters.
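To make the bisecting variant concrete, the following is a minimal sketch and not the authors' implementation. It assumes documents are already represented as numeric vectors (tuples of floats), uses squared Euclidean distance rather than whatever similarity measure the paper uses, always splits the largest cluster, and keeps the best of several trial splits. The names `kmeans2`, `sse`, `bisecting_kmeans`, and the parameters `trials`, `iters`, and `seed` are hypothetical.

```python
import random


def sse(cluster):
    """Sum of squared distances from each point to the cluster mean."""
    center = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return sum(sum((a - b) ** 2 for a, b in zip(p, center)) for p in cluster)


def kmeans2(points, rng, iters=20):
    """Split points into two groups with basic K-means (K = 2)."""
    centers = rng.sample(points, 2)  # two random points as initial centers
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            d0 = sum((a - b) ** 2 for a, b in zip(p, centers[0]))
            d1 = sum((a - b) ** 2 for a, b in zip(p, centers[1]))
            groups[0 if d0 <= d1 else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split (e.g. duplicate points); caller skips it
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
    return groups


def bisecting_kmeans(points, k, trials=5, seed=0):
    """Repeatedly bisect the largest cluster until k clusters exist.

    Assumption: each bisection is tried `trials` times and the split with
    the lowest total squared error is kept.
    """
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        target = clusters.pop(idx)
        best = None
        if len(target) >= 2:
            for _ in range(trials):
                left, right = kmeans2(target, rng)
                if not left or not right:
                    continue
                cost = sse(left) + sse(right)
                if best is None or cost < best[0]:
                    best = (cost, left, right)
        if best is None:
            clusters.append(target)  # cannot split further; stop early
            break
        clusters.extend(best[1:])
    return clusters
```

Calling `bisecting_kmeans(vectors, k)` returns a list of `k` lists of vectors, or fewer if the data cannot be split further. Each bisection runs a fixed number of K-means passes over a single cluster, which is in keeping with the linear-time behaviour the abstract attributes to K-means and its variant.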
Last modified: 2014-05-29 23:38:39