Efficient Way of Determining the Number of Clusters Using Hadoop Architecture
Journal: International Journal of Science and Research (IJSR) (Vol. 4, No. 2)
Publication Date: 2015-02-05
Authors : Siri H. P.; Shashikala.B;
Page : 633-638
Keywords : Minimum Spanning Tree (MST); Gap statistic; IC-av;
Abstract
Data mining extracts information from a data set and transforms it into an understandable structure. Clustering plays an important role in many areas, such as exploratory data analysis, pattern recognition, computer vision, and information retrieval. The key idea is to view clustering as a supervised classification problem in which the true class labels are estimated. Determining the valid number of clusters is not easy. Several well-known methods, such as the Gap statistic, path-based clustering, and the Figure of Merit (FOM), have been used to find the correct number of clusters, but they do not solve the problem efficiently. This paper focuses on the Average Intracluster Distance index to validate the estimated number of arbitrarily shaped clusters. The proposed technique, implemented in Hadoop, is based on the local relations between patterns and their clustering labels; it uses a Minimum Spanning Tree (MST) algorithm, relying on the multiplicity property of the MST, to obtain accurate results efficiently.
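As a rough illustration of how an MST-based average intracluster distance might be computed, the following single-machine Python sketch builds one MST over all points, takes the distance between two points to be the largest edge on the MST path joining them, and sums the mean of these distances within each cluster. This is only one plausible reading of the abstract: the function names (`prim_mst`, `minimax_distances`, `avg_intracluster_distance`), the per-cluster averaging, and the use of Euclidean distance are assumptions made here, and the sketch does not involve Hadoop or reproduce the paper's actual implementation.

```python
import math
from itertools import combinations


def prim_mst(points):
    """Return MST edges (i, j, weight) over all points (Prim, O(n^2))."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])  # Euclidean (assumption)
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def minimax_distances(n, edges):
    """d[i][j] = largest edge weight on the MST path between i and j."""
    adj = [[] for _ in range(n)]
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    d = [[0.0] * n for _ in range(n)]
    for s in range(n):
        stack = [(s, -1, 0.0)]  # (node, node we came from, max edge so far)
        while stack:
            u, p, m = stack.pop()
            d[s][u] = m
            for v, w in adj[u]:
                if v != p:
                    stack.append((v, u, max(m, w)))
    return d


def avg_intracluster_distance(points, labels):
    """Sum over clusters of the mean pairwise MST-path distance (hypothetical form)."""
    d = minimax_distances(len(points), prim_mst(points))
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for members in clusters.values():
        pairs = list(combinations(members, 2))
        if pairs:  # singleton clusters contribute nothing
            total += sum(d[i][j] for i, j in pairs) / len(pairs)
    return total


# Two well-separated groups; the labeling that respects the gap scores lower.
pts = [(0, 0), (0, 1), (0, 2), (10, 0), (10, 1)]
good = avg_intracluster_distance(pts, [0, 0, 0, 1, 1])
bad = avg_intracluster_distance(pts, [0, 0, 1, 1, 1])
print(good, bad)
```

Because the path-based distance follows the MST and not the straight line between points, a labeling that cuts across a wide gap is penalized by the long bridging edge, which is what makes this kind of index suitable for arbitrarily shaped clusters. Comparing the index across candidate numbers of clusters would be the next step; how the paper distributes that work on Hadoop is not described in the abstract.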
Last modified: 2021-06-30 21:22:46