
An efficient distance estimation and centroid selection based on k-means clustering for small and large dataset

Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE) (Vol.7, No. 73)

Publication Date:

Authors : ; ;

Page : 234-240

Keywords : K-means; Distance estimation; Centroid selection; Distance methods


Abstract

This paper presents an efficient approach to distance estimation and centroid selection based on k-means clustering for small and large datasets. The PIMA Indian diabetes dataset was used for the complete study and analysis. Data pre-processing was performed first on the dataset, followed by distance and centroid estimation. Initial centroids were selected at random and then updated until the specified number of iterations (epochs) was reached. The distance measures used are Euclidean distance (Ed), Pearson coefficient distance (PCd), Chebyshev distance (Csd) and Canberra distance (Cad). The results indicate that all the distance measures performed roughly equally well in terms of clustering, but in terms of time Cad outperformed the other measures.
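
The procedure outlined in the abstract (random initial centroids, assignment by a chosen distance, centroid updates for a fixed number of epochs) might be sketched as follows. This is an illustration only, not the authors' implementation: the function names, the `epochs` and `seed` parameters, the definition of Pearson distance as one minus the correlation coefficient, and the use of the arithmetic mean as the centroid update for every distance measure are all assumptions made here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance (Ed)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    """Chebyshev distance (Csd): largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Canberra distance (Cad); terms of the form 0/0 are counted as 0."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total


def pearson_distance(a, b):
    """Pearson distance (PCd), assumed here to be 1 - correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # correlation undefined; treated as uncorrelated
    return 1.0 - cov / math.sqrt(var_a * var_b)


def kmeans(points, k, distance, epochs=100, seed=0):
    """K-means with a pluggable distance (hypothetical helper).

    Initial centroids are k distinct points drawn at random; centroids
    are then updated until they stop changing or `epochs` is reached.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(epochs):
        # Assignment step: nearest centroid under the chosen distance.
        labels = [
            min(range(k), key=lambda j: distance(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of each cluster's members (an assumption;
        # an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    # Toy data, not the PIMA Indian diabetes dataset.
    data = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
            [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
    for name, dist in [("Ed", euclidean), ("PCd", pearson_distance),
                       ("Csd", chebyshev), ("Cad", canberra)]:
        centroids, labels = kmeans(data, k=2, distance=dist)
        print(name, labels)
```

Passing the distance as a function keeps the clustering loop identical across the four measures, so any difference in running time comes from the distance computation itself. Timing each call (for example with `time.perf_counter`) on a real dataset would be one way to compare the measures, though results will depend on the data and the implementation.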

Last modified: 2021-01-06 14:38:12