K-Means Document Clustering using Vector Space Model
Journal: Bonfring International Journal of Data Mining (Vol. 5, No. 2)
Publication Date: 2015-07-31
Authors: R. Malathi Ravindran; Antony Selvadoss Thanamani
Page: 10-14
Keywords: Data Mining; Document Clustering; High Dimensional Data; Vector Space Model; K-Means Clustering; Cosine Similarity
Abstract
Document clustering groups similar documents into classes, where similarity is some function defined on the documents. It requires neither a separate training process nor manual tagging of groups in advance. Documents in the same cluster are more similar to one another, while documents in different clusters are more dissimilar. Clustering is a familiar technique in data analysis and is used in many areas, including data mining, statistics and image analysis. Traditional clustering approaches lose effectiveness when handling high-dimensional data. To address this, a new K-Means clustering technique is proposed in this work. Here, cosine similarity under the Vector Space Model is used to measure how close each document is to a cluster centroid. With this approach, documents can be clustered efficiently even when the dimension is high, because documents are represented as vectors, a representation suited to high-dimensional data.
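The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: documents become vectors, and K-Means assigns each document to the centroid with the highest cosine similarity. The TF-IDF weighting, whitespace tokenisation, fixed seed documents, and all function names (`tfidf_vectors`, `normalise`, `cosine`, `kmeans_cosine`) are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a sparse vector (term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Turn raw strings into unit-length TF-IDF vectors (assumed weighting)."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF, so a term found in every document keeps some weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        vectors.append(normalise(vec))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_cosine(vectors, k, seeds, max_iter=20):
    """K-Means with cosine similarity; `seeds` are indices of initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: pick the most similar centroid for each document.
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the normalised sum of its members.
        for c in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == c:
                    for t, w in v.items():
                        total[t] += w
            if total:  # an empty cluster keeps its previous centroid
                centroids[c] = normalise(total)
    return labels, centroids


# Hypothetical toy corpus, chosen only to make the behaviour easy to follow.
docs = [
    "data mining finds patterns in data",
    "clustering is a data mining technique",
    "image analysis segments an image",
    "statistics supports image analysis",
]
vectors = tfidf_vectors(docs)
labels, _ = kmeans_cosine(vectors, k=2, seeds=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

Because every vector is scaled to unit length, cosine similarity reduces to a sparse dot product, so the cost of each comparison depends on the number of distinct terms in a document and not on the full vocabulary size. This is one common reason the vector space representation is considered practical for high-dimensional text.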
Last modified: 2015-07-23 20:28:54