Document Clustering using Improved K-means Algorithm
Journal: International Journal of Science and Research (IJSR) (Vol. 5, No. 6)
Publication Date: 2016-06-05
Authors: Anjali Vashist; Rajender Nath
Pages: 2206-2210
Keywords: Document Clustering; Cosine Similarity; Term Finder; Tf-Idf; Threshold
Abstract
Clustering is an efficient technique that organizes a large quantity of unordered text documents into a small number of significant and coherent clusters, thereby providing a basis for intuitive and informative navigation and browsing mechanisms. It is widely studied because of its broad application in areas such as web mining, search engines, and information extraction. Documents are clustered according to various similarity measures. The existing K-means document clustering algorithm relies on random center generation, so the clusters it produces differ from run to run. In this paper, an Improved Document Clustering algorithm is presented that generates a number of clusters for any set of text documents based on fixed center generation, collects only exclusive words from the different documents in the dataset, and uses the cosine similarity measure to place similar documents in the appropriate clusters. Experimental results show that the proposed algorithm performs better than the existing algorithm in terms of F-measure, recall, precision, and time complexity.
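The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of the general idea it names: TF-IDF vectors, cosine similarity, and K-means with fixed rather than random initial centers. The function names (`tf_idf`, `cosine`, `cluster`) are hypothetical, and the choice of the first k documents as fixed centers is an assumption made for illustration, not the authors' rule. The exclusive-word collection and threshold steps mentioned in the abstract and keywords are not modeled.

```python
import math
from collections import Counter, defaultdict

def tf_idf(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(vectors, k, max_iter=20):
    """K-means style clustering with deterministic initial centers.

    Assumption: the first k documents serve as the fixed initial centers.
    """
    centers = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each document to the most similar center.
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centers[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each center as the mean of its member vectors.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = defaultdict(float)
                for v in members:
                    for t, w in v.items():
                        total[t] += w
                centers[j] = {t: w / len(members) for t, w in total.items()}
    return labels

docs = [
    ["cat", "dog", "pet"],
    ["stock", "market", "trade"],
    ["dog", "pet", "cat", "vet"],
    ["market", "stock", "price"],
]
labels = cluster(tf_idf(docs), k=2)
```

Because the initial centers are chosen deterministically, repeated runs on the same input return the same labels, which is the property the abstract contrasts with random center generation.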
Last modified: 2021-07-01 14:39:08