A NOVEL DISTANCE BASED MODIFIED K-MEANS CLUSTERING ALGORITHM FOR ESTIMATION OF MISSING VALUES IN MICRO-ARRAY GENE EXPRESSION DATA
Journal: International Journal of Information Technology and Management information System (IJITMIS) (Vol.5, No. 3)
Publication Date: 2014-12-31
Authors : Chandra Das, Shilpi Bose, Matangini Chattopadhyay, Samiran Chattopadhyay
Page : 1-13
Keywords : Microarray; Clustering; K-Means; Missing Value Estimation; Coexpressed Gene; Coregulated Gene
Abstract
Microarray experiments normally produce data sets with multiple missing expression values because of various experimental problems. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene expression values as input. Effective missing value estimation methods are therefore needed to minimize the effect of incomplete data when these algorithms are applied to gene expression data. In DNA microarray analysis, coexpressed genes perform similar biological functions. Coexpressed genes are those whose expression levels may rise and fall synchronously in response to a set of experimental conditions. Although the magnitudes of their expression levels may not be close, the patterns they exhibit can be highly correlated. In this paper, a new distance is proposed to find the closest coexpressed genes efficiently. Based on this distance, a modified K-means clustering algorithm is proposed to accurately predict missing values in microarray gene expression data. The estimation accuracy of the proposed clustering method is compared with that of the widely used KNNimpute, SKNNimpute and IKNNimpute methods on various microarray data sets with different rates of missing entries. The experimental results show the effectiveness of the proposed method compared with these existing methods in terms of normalized root mean square error (NRMSE).
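The two quantitative ideas in the abstract, a distance that tracks expression pattern rather than magnitude and the normalized root mean square error used for evaluation, can be illustrated with a minimal sketch. The abstract does not define the proposed distance, so the code below uses a plain Pearson-correlation distance as a stand-in assumption, not the authors' measure. The error function uses one common normalization (RMS error divided by the standard deviation of the true values), which is also an assumption, since the abstract does not state the formula. The function names `pearson_distance` and `nrmse` are hypothetical.

```python
import math


def pearson_distance(x, y):
    """Stand-in pattern distance (assumption, not the paper's distance):
    1 - Pearson correlation, computed over positions observed in both rows.
    Missing entries are represented by None."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 1.0  # too little overlap to judge similarity
    xs, ys = zip(*pairs)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 1.0  # a constant profile has no defined correlation
    return 1.0 - sxy / math.sqrt(sxx * syy)


def nrmse(estimated, actual):
    """RMS error of the estimates divided by the standard deviation of the
    true values (one common normalization; assumed here)."""
    n = len(actual)
    mean = sum(actual) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in actual) / n)
    rms = math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / n)
    return rms / std


# Two genes with very different magnitudes but the same pattern are "close".
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]
gene_c = [4.0, 3.0, 2.0, 1.0]  # opposite pattern
d_same = pearson_distance(gene_a, gene_b)
d_opposite = pearson_distance(gene_a, gene_c)

# Predicting the mean for every missing entry gives an NRMSE of 1 under this
# normalization, so useful imputation methods should score below 1.
baseline = nrmse([2.5, 2.5, 2.5, 2.5], gene_a)
```

Under this stand-in, a pattern-based distance treats `gene_a` and `gene_b` as near neighbors even though a Euclidean distance would place them far apart, which is the motivation the abstract gives for a coexpression-aware distance.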
Last modified: 2021-04-13 17:51:18