Efficient Way of Determining the Number of Clusters Using Hadoop Architecture

Journal: International Journal of Science and Research (IJSR) (Vol.4, No. 2)

Publication Date:

Authors : ; ;

Page : 633-638

Keywords : Minimum Spanning Tree (MST); Gap statistic; IC-av


Data mining extracts information from a data set and transforms it into an understandable structure. Clustering plays an important role in many areas, such as exploratory data analysis, pattern recognition, computer vision, and information retrieval. The key idea is to view clustering as a supervised classification problem in which the true class labels are estimated. Determining the valid number of clusters is a difficult problem. Several well-known methods have been used to estimate the correct number of clusters, e.g., the Gap statistic, path-based clustering, and Figure of Merit (FOM), but these methods do not find the number of clusters efficiently. This paper focuses on the Average Intracluster Distance index to validate the estimated number of arbitrarily shaped clusters. The proposed technique, implemented on Hadoop, is based on the local relations between patterns and their clustering labels, and uses a Minimum Spanning Tree (MST) algorithm that exploits the multiplicity property of the MST to obtain accurate results efficiently.
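
As a rough illustration of the two ingredients named in the abstract, an MST over the data and an average intracluster distance score, the following Python sketch may help. It is not the paper's method: the function names (mst_edges, mst_clusters, avg_intracluster_distance) are hypothetical, the score is assumed here to be the plain mean Euclidean distance between points that share a label (the paper's index may be defined differently), the clusters are obtained by the common heuristic of cutting the heaviest MST edges, and everything runs on a single machine rather than on Hadoop.

```python
import math
from itertools import combinations


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n-1 tree edges as (weight, i, j) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, k):
    """Cut the k-1 heaviest MST edges and return one cluster label per point."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    root = list(range(len(points)))  # union-find forest over point indices

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def avg_intracluster_distance(points, labels):
    """Mean Euclidean distance over all pairs of points with the same label.

    Assumption: a simplified stand-in for the index discussed in the abstract.
    """
    dists = [
        math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if labels[i] == labels[j]
    ]
    return sum(dists) / len(dists) if dists else 0.0


if __name__ == "__main__":
    # Toy data (made up for illustration): two well-separated groups.
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for k in range(1, 4):
        labels = mst_clusters(points, k)
        print(k, round(avg_intracluster_distance(points, labels), 3))
```

On data like this, the score typically drops sharply when k reaches the number of well-separated groups and changes little afterwards, which is the kind of signal a validity index looks for when estimating the number of clusters.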

Last modified: 2021-06-30 21:22:46