Variability analysis of the hierarchical clustering algorithms and its implication on consensus clustering

Journal: International Journal of Advanced Engineering Research and Science (Vol.4, No. 5)

Publication Date:

Authors : ;

Page : 118-131

Keywords : Data Mining; Cluster analysis; Consensus clustering; Hierarchical clustering algorithm; Validation indices


Abstract

Clustering is one of the most important unsupervised learning tools when no prior knowledge about the data set is available. Clustering algorithms aim to find the underlying structure of a data set, taking into account the clustering criteria, the properties of the data, and the specific way the data are compared. Many clustering algorithms have been proposed in the literature with a common goal: given a set of objects, group similar objects in the same cluster and dissimilar objects in different clusters. Hierarchical clustering algorithms are of great importance in data analysis because they provide knowledge about the data structure: the graphical representation of the resulting partitions, through a dendrogram, may give more information than the clustering obtained by non-hierarchical algorithms. Using different clustering methods on the same data set, or the same clustering method with different initializations (different parameters), can produce different clusterings. Several studies have therefore been concerned with validating the resulting clusterings by analyzing them in terms of stability or variability, and there has been increasing interest in the problem of determining a consensus clustering. This work empirically analyzes the variability of the clusterings delivered by hierarchical algorithms and also investigates some consensus clustering techniques. Based on the variability of hierarchical clustering, we select the most suitable consensus clustering technique from those existing in the literature. Results on a range of synthetic and real data sets reveal significant differences in the variability of hierarchical clustering, as well as different performances of the consensus clustering techniques.
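
The idea of comparing hierarchical clusterings and then combining them can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the linkage criteria, validation indices, or consensus techniques used. The sketch assumes three common linkages (single, complete, average), uses the plain Rand index as a stand-in agreement measure, and builds a consensus with a co-association (evidence accumulation) approach. The function names and the toy points are hypothetical.

```python
from itertools import combinations
from math import dist


def distance_matrix(points):
    """Pairwise Euclidean distances as a list of lists."""
    return [[dist(p, q) for q in points] for p in points]


def agglomerative(D, k, linkage="average"):
    """Naive agglomerative clustering on distance matrix D, cut at k clusters."""
    combine = {"single": min, "complete": max,
               "average": lambda d: sum(d) / len(d)}[linkage]
    clusters = [[i] for i in range(len(D))]

    def gap(pair):
        # Linkage distance between two current clusters.
        a, b = pair
        return combine([D[i][j] for i in clusters[a] for j in clusters[b]])

    while len(clusters) > k:
        # Merge the closest pair of clusters (first pair wins on ties).
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[a] += clusters[b]
        del clusters[b]

    labels = [0] * len(D)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(u, v):
    """Fraction of object pairs on which two partitions agree."""
    pairs = list(combinations(range(len(u)), 2))
    agree = sum((u[i] == u[j]) == (v[i] == v[j]) for i, j in pairs)
    return agree / len(pairs)


def consensus(partitions, k, linkage="average"):
    """Co-association consensus (an assumed technique, not the paper's).

    The distance between two objects is the fraction of input
    partitions that place them in different clusters.
    """
    n, m = len(partitions[0]), len(partitions)
    D = [[1 - sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
         for i in range(n)]
    return agglomerative(D, k, linkage)


# Hypothetical toy data: two small groups joined by a chain of points.
points = [(0, 0), (0, 1), (1, 0), (2.2, 0.5), (3.4, 0.5),
          (4.6, 0.5), (5.8, 0), (5.8, 1), (6.8, 0.5)]
D = distance_matrix(points)
partitions = {name: agglomerative(D, 2, name)
              for name in ("single", "complete", "average")}

# Variability: pairwise agreement between the linkage criteria.
for (a, pa), (b, pb) in combinations(partitions.items(), 2):
    print(a, "vs", b, "Rand index:", round(rand_index(pa, pb), 3))

print("consensus:", consensus(list(partitions.values()), 2))
```

A Rand index below 1 between two linkages indicates that they disagree on at least one pair of objects. The consensus partition is then cut from the co-association distances rather than from the original data, so it reflects how often the input clusterings group each pair together.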

Last modified: 2017-07-03 03:16:51