A NOVEL DISTANCE BASED MODIFIED K-MEANS CLUSTERING ALGORITHM FOR ESTIMATION OF MISSING VALUES IN MICRO-ARRAY GENE EXPRESSION DATA

Journal: International Journal of Information Technology and Management information System (IJITMIS) (Vol.5, No. 3)

Publication Date:

Authors : ;

Page : 1-13

Keywords : Microarray; Clustering; K-Means; Missing Value Estimation; Coexpressed Gene; Coregulated Gene

Abstract

Microarray experiments typically produce data sets with many missing expression values, owing to various experimental problems. Unfortunately, many gene expression analysis algorithms require a complete matrix of expression values as input. Effective missing value estimation methods are therefore needed to minimize the effect of incomplete data when these algorithms are applied. In DNA microarray analysis, coexpressed genes often share similar biological functions. Coexpressed genes are those whose expression levels rise and fall synchronously in response to a set of experimental conditions; although the magnitudes of their expression levels may differ, the patterns they exhibit can be highly similar (correlated). In this paper, a new distance is proposed to find the closest coexpressed genes efficiently. Based on this distance, a modified K-means clustering algorithm is proposed to accurately predict missing values in microarray gene expression data. The estimation accuracy of the proposed method is compared with the widely used KNNimpute, SKNNimpute, and IKNNimpute methods on several microarray data sets with different rates of missing entries. The experimental results demonstrate the effectiveness of the proposed method relative to these existing methods in terms of normalized root mean square error (NRMSE).
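The abstract does not spell out the proposed distance or how K-means is modified, so the Python below is only a minimal sketch of the general idea under stated assumptions. It swaps in a standard Pearson-correlation distance (1 - r) as a stand-in for the paper's new distance. It then runs ordinary K-means on z-scored complete genes (for z-scored profiles, squared Euclidean distance is proportional to 1 - r), assigns each gene with gaps to its nearest centroid, and fills the gaps by rescaling that centroid to the gene's observed values. The rescaling reflects the point that coexpressed genes share a pattern but not necessarily a magnitude. It also computes NRMSE using one common definition (RMSE over the imputed entries divided by the standard deviation of their true values). The function names `pearson_distance`, `impute`, and `nrmse`, the farthest-first initialisation, and the least-squares rescaling step are all hypothetical choices, not the authors' algorithm.

```python
import math


def pearson_distance(x, y):
    """Stand-in distance (NOT the paper's): 1 - Pearson r over positions observed
    in both profiles. 0 = same pattern, 2 = opposite pattern. None marks a gap."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 1.0  # correlation undefined: treat as uncorrelated
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 1.0
    return 1.0 - sxy / math.sqrt(sxx * syy)


def zscore(row):
    """Standardize a complete profile. For z-scored profiles, squared Euclidean
    distance equals 2 * n * (1 - r), so K-means on them groups genes by pattern."""
    n = len(row)
    m = sum(row) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in row) / n)
    return [0.0] * n if sd == 0 else [(v - m) / sd for v in row]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=50):
    """Plain K-means; the farthest-first initialisation is an illustrative choice."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(p, centroids[j]))].append(p)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def impute(matrix, k):
    """Hypothetical cluster-based imputation: cluster complete genes, assign each incomplete
    gene to the nearest centroid by pearson_distance, fit gene ~ a * centroid + b on its
    observed columns by least squares, and fill the gaps from that fit (handles magnitude shifts)."""
    complete = [zscore(r) for r in matrix if None not in r]
    centroids = kmeans(complete, k)
    result = []
    for row in matrix:
        if None not in row:
            result.append(list(row))
            continue
        c = min(centroids, key=lambda cen: pearson_distance(row, cen))
        obs = [(cv, v) for cv, v in zip(c, row) if v is not None]
        if not obs:
            raise ValueError("gene has no observed values")
        mc = sum(cv for cv, _ in obs) / len(obs)
        mv = sum(v for _, v in obs) / len(obs)
        var_c = sum((cv - mc) ** 2 for cv, _ in obs)
        a = 0.0 if var_c == 0 else sum((cv - mc) * (v - mv) for cv, v in obs) / var_c
        b = mv - a * mc
        result.append([a * cv + b if v is None else v for cv, v in zip(c, row)])
    return result


def nrmse(estimates, truths):
    """NRMSE (one common definition): RMSE on imputed entries / std of their true values."""
    n = len(truths)
    rmse = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)
    m = sum(truths) / n
    sd = math.sqrt(sum((t - m) ** 2 for t in truths) / n)
    if sd == 0:
        raise ValueError("true values have zero variance")
    return rmse / sd


if __name__ == "__main__":
    genes = [
        [1, 2, 3, 4, 5, 6],
        [2, 4, 6, 8, 10, 12],
        [11, 12, 13, 14, 15, 16],
        [6, 5, 4, 3, 2, 1],
        [12, 10, 8, 6, 4, 2],
        [3, 6, 9, 12, 15, None],  # true value 18
        [None, 15, 12, 9, 6, 3],  # true value 18
    ]
    filled = impute(genes, k=2)
    print(round(filled[5][5], 6), round(filled[6][0], 6))
```

On the toy matrix in the `__main__` block, each incomplete gene is an exact linear rescaling of one of two patterns, so both gaps come back as 18.0. Real microarray data is far noisier, and the number of clusters `k` would need to be chosen with care.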

Last modified: 2021-04-13 17:51:18