An Efficient Hierarchical Clustering Algorithm for Large Datasets

Journal: Austin Journal of Proteomics, Bioinformatics & Genomics (Vol.2, No. 1)

Publication Date:

Authors : ; ;

Page : 1-6

Keywords : Hybrid hierarchical clustering; Hierarchical clustering; k-means clustering; Large datasets;

Abstract

Hierarchical clustering is a widely adopted unsupervised learning algorithm for discovering intrinsic groups embedded within a dataset. Standard implementations of the exact algorithm for hierarchical clustering require $O(n^2)$ time and $O(n^2)$ memory and thus are unsuitable for processing datasets containing more than 20 000 objects. In this study, we present a hybrid hierarchical clustering algorithm requiring approximately $O(n\sqrt{n})$ time and $O(n\sqrt{n})$ memory while still preserving the most desirable properties of the exact algorithm. The algorithm was capable of clustering one million compounds within a few hours on a single processor. The clustering program is freely available to the research community.
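
To make the scaling argument concrete, the following back-of-the-envelope sketch compares the storage needed for all pairwise distances (quadratic in n) with a count that grows as n times the square root of n, which is how the reduced bound in the abstract is read here. It is not taken from the paper: the 8-byte distance, the constant factor of 1 on the n*sqrt(n) term, and the function names are all assumptions chosen for illustration.

```python
import math

BYTES_PER_DISTANCE = 8  # assumption: one float64 per stored distance


def pairwise_distance_count(n: int) -> int:
    """Number of distinct pairs among n objects: n(n-1)/2, which grows as O(n^2)."""
    return n * (n - 1) // 2


def hybrid_entry_count(n: int) -> int:
    """Illustrative n*sqrt(n) count; the constant factor is assumed to be 1."""
    return n * math.isqrt(n)


def gigabytes(entries: int) -> float:
    """Convert a number of stored distances to decimal gigabytes."""
    return entries * BYTES_PER_DISTANCE / 1e9


# Compare the two growth rates at the sizes mentioned in the abstract.
for n in (20_000, 1_000_000):
    exact = gigabytes(pairwise_distance_count(n))
    hybrid = gigabytes(hybrid_entry_count(n))
    print(f"n={n:>9,}: all pairs ~{exact:,.1f} GB, n*sqrt(n) ~{hybrid:,.3f} GB")
```

Under these assumptions, 20 000 objects need about 1.6 GB for the full set of pairwise distances, while one million objects would need roughly 4 TB. An n*sqrt(n) footprint for one million objects is on the order of 8 GB, which is consistent with the abstract's claim that such a dataset can be processed on a single processor.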

Last modified: 2017-10-30 15:19:58