A Similarity Measure for Documents Using Clustering Technique
Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol. 7, No. 12)
Publication Date: 2018-12-30
Authors: R. Anushya; A. Linda Sherin; A. Finny Belwin; Antony Selvadoss Thanamani
Pages: 239-248
Keywords: Clustering; Jaccard similarity; Cosine similarity; Euclidean measure; Correlation coefficient; K-means
Abstract
Text clustering is an important application of data mining. It is concerned with grouping similar text documents together. Text document clustering plays a vital role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. A clustering method needs to embed the documents in a suitable similarity space. In this paper we compare four popular similarity measures (cosine similarity, Jaccard similarity, Euclidean distance, and correlation coefficient) in conjunction with several types of vector space representation (Boolean, term frequency, and inverse document frequency) of documents. Clustering of documents is performed using generalized k-means, a partition-based clustering technique suited to the high-dimensional sparse data that represents text documents. Performance is measured against a human-imposed classification into Topic and Place categories. We conducted a number of experiments and used an entropy measure to ensure the statistical significance of the results. Cosine, Pearson correlation, and Jaccard similarity emerge as the best measures for capturing human categorization behavior, while Euclidean measures perform poorly.
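For readers unfamiliar with the four measures named in the abstract, the following is a minimal sketch of how each might be computed on two document vectors. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formulas, so the Jaccard measure is assumed here to be the extended (Tanimoto) form for real-valued vectors, and the function names and the two toy Boolean vectors are hypothetical.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    norm_a = math.sqrt(_dot(a, a))
    norm_b = math.sqrt(_dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return _dot(a, b) / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Extended Jaccard (Tanimoto) coefficient; an assumed formulation.

    For Boolean vectors this reduces to |A intersect B| / |A union B|.
    """
    dot = _dot(a, b)
    denom = _dot(a, a) + _dot(b, b) - dot
    return dot / denom if denom else 0.0


def euclidean_distance(a, b):
    """Straight-line distance; a dissimilarity, so smaller means more alike."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation coefficient (0.0 if either vector is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Hypothetical Boolean term vectors over a four-term vocabulary.
doc_a = [1, 1, 0, 0]
doc_b = [1, 1, 1, 0]

print("cosine   ", cosine_similarity(doc_a, doc_b))
print("jaccard  ", jaccard_similarity(doc_a, doc_b))
print("euclidean", euclidean_distance(doc_a, doc_b))
print("pearson  ", pearson_correlation(doc_a, doc_b))
```

The same functions accept term-frequency or inverse-document-frequency weighted vectors in place of the Boolean ones. Note that Euclidean distance is sensitive to vector length (and therefore document length), whereas cosine similarity is not, which is one commonly cited reason the two can behave differently on text.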