An Efficient Hierarchical Clustering Algorithm for Large Datasets

Journal: Austin Journal of Proteomics, Bioinformatics & Genomics (Vol. 2, No. 1)

Publication Date: 2015-02-23

Authors : Olga Tanaseichuk; Alireza Hadj Khodabakshi; Dimitri Petrov; Jianwei Che; Tao Jiang; Bin Zhou; Andrey Santrosyan; Yingyao Zhou;

Page : 1-6

Keywords : Hybrid hierarchical clustering; Hierarchical clustering; k-means clustering; Large datasets;

Abstract

Hierarchical clustering is a widely adopted unsupervised learning algorithm for discovering intrinsic groups embedded within a dataset. Standard implementations of the exact algorithm for hierarchical clustering require $O(n^2)$ time and $O(n^2)$ memory and thus are unsuitable for processing datasets containing more than 20 000 objects. In this study, we present a hybrid hierarchical clustering algorithm requiring approximately $O(n\sqrt{n})$ time and $O(n\sqrt{n})$ memory while still preserving the most desirable properties of the exact algorithm. The algorithm was capable of clustering one million compounds within a few hours on a single processor. The clustering program is freely available to the research community.
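
To give a rough sense of why quadratic memory limits the exact algorithm, the following Python sketch counts the pairwise distances a full distance matrix would hold and compares that with an n·sqrt(n) growth rate. It is an illustration only, not the authors' program: the function names, the 4-byte entry size, and the reading of the hybrid bound as n·sqrt(n) are assumptions, and asymptotic bounds hide constant factors, so the figures indicate scale rather than actual memory use.

```python
import math

# Assumption: each stored distance is a 32-bit float (4 bytes).
BYTES_PER_ENTRY = 4


def pairwise_entries(n: int) -> int:
    """Number of distinct unordered pairs among n objects: n(n-1)/2."""
    return n * (n - 1) // 2


def n_sqrt_n(n: int) -> float:
    """Growth term n * sqrt(n), assumed here for the hybrid algorithm."""
    return n * math.sqrt(n)


def compare(n: int) -> dict:
    """Return both growth terms for n objects, plus the matrix size in GB."""
    pairs = pairwise_entries(n)
    return {
        "n": n,
        "pairwise_entries": pairs,
        "pairwise_gigabytes": pairs * BYTES_PER_ENTRY / 1e9,
        "n_sqrt_n": n_sqrt_n(n),
    }


if __name__ == "__main__":
    # 20 000 is the practical limit cited in the abstract; one million is
    # the dataset size the hybrid algorithm is reported to handle.
    for n in (20_000, 1_000_000):
        row = compare(n)
        print(
            f"n={row['n']:>9,}  pairs={row['pairwise_entries']:.3e}  "
            f"matrix~{row['pairwise_gigabytes']:.1f} GB  "
            f"n*sqrt(n)={row['n_sqrt_n']:.3e}"
        )
```

Under these assumptions, the number of pairwise distances grows by a factor of about 2500 between 20 000 and one million objects, while the n·sqrt(n) term grows by a factor of roughly 350.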
