Finding Correlated CCC-Biclusters from Gene Expression Data

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.3, No. 4)

Page : 1019-1034


Several unsupervised machine learning methods have been used to analyze gene expression data obtained from microarray experiments. Recently, biclustering, an unsupervised approach that clusters the row and column dimensions of the data matrix simultaneously, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of experimental conditions in which the genes exhibit highly correlated behaviors. These correlated behaviors correspond to coherent expression patterns and can be used to identify potential regulatory modules that may be involved in regulatory mechanisms. Many specific versions of the biclustering problem have been shown to be NP-complete (nondeterministic polynomial-time complete). However, when identifying biclusters in time series expression data, the problem can be restricted to finding only maximal biclusters with contiguous columns. This restriction leads to a tractable problem. The motivation is that biological processes start and finish in an identifiable contiguous period of time, leading to increased (or decreased) activity of sets of genes that form biclusters with contiguous columns. In this context, an algorithm is proposed that finds and reports all maximal contiguous column coherent biclusters (CCC-Biclusters) in time linear in the size of the expression matrix. Each relevant CCC-Bicluster identified corresponds to the discovery of a coherent expression pattern shared by a group of genes in a contiguous subset of time points and identifies a potentially relevant regulatory module. The linear time complexity of CCC-Biclustering is obtained by manipulating a discretized version of the gene expression matrix and using efficient string processing techniques based on suffix trees. Results of the proposed algorithm on synthetic and real data show the effectiveness of the approach and the relevance of CCC-Biclustering in the discovery of regulatory modules.
The real-data results were obtained by applying the algorithm to the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress. They show not only the ability of the proposed methodology to extract relevant information compatible with documented biological knowledge, but also the utility of this algorithm in the study of other environmental stresses and of regulatory modules in general.
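
To make the idea of a maximal CCC-Bicluster concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a three-symbol discretization ('U' for up, 'D' for down, 'N' for no change between consecutive time points) with a hypothetical `threshold` parameter; the abstract only says that a discretized matrix is used. It also replaces the suffix-tree technique with a brute-force enumeration over contiguous column ranges, so it is not linear time and is meant only to show what is being searched for. The function names `discretize` and `maximal_ccc_biclusters` are illustrative.

```python
from collections import defaultdict


def discretize(matrix, threshold=1.0):
    """Turn each gene's time series into a string of symbols.

    Assumed scheme (not taken from the abstract): each symbol describes the
    change between two consecutive time points, 'U' if the value rises by at
    least `threshold`, 'D' if it falls by at least `threshold`, else 'N'.
    """
    rows = []
    for series in matrix:
        symbols = []
        for prev, curr in zip(series, series[1:]):
            delta = curr - prev
            if delta >= threshold:
                symbols.append("U")
            elif delta <= -threshold:
                symbols.append("D")
            else:
                symbols.append("N")
        rows.append("".join(symbols))
    return rows


def maximal_ccc_biclusters(rows, min_genes=2):
    """Brute-force search for maximal contiguous-column coherent biclusters.

    Returns tuples (genes, start, end, pattern): every gene in `genes` shows
    `pattern` in discretized columns start..end-1. This stand-in is quadratic
    in the number of columns; the paper obtains linear time with suffix trees.
    """
    n_cols = len(rows[0]) if rows else 0
    found = []
    for start in range(n_cols):
        for end in range(start + 1, n_cols + 1):
            # Grouping by substring makes every group row-maximal.
            groups = defaultdict(list)
            for gene, row in enumerate(rows):
                groups[row[start:end]].append(gene)
            for pattern, genes in groups.items():
                if len(genes) < min_genes:
                    continue
                # Column-maximal: the same genes must not all agree on the
                # column just before `start` or just after `end - 1`.
                extends_left = start > 0 and len(
                    {rows[g][start - 1] for g in genes}) == 1
                extends_right = end < n_cols and len(
                    {rows[g][end] for g in genes}) == 1
                if not extends_left and not extends_right:
                    found.append((genes, start, end, pattern))
    return found


if __name__ == "__main__":
    # Toy expression matrix: 3 genes measured at 5 time points.
    toy = [
        [1, 3, 5, 2, 2],
        [0, 2, 4, 1, 4],
        [5, 3, 5, 2, 2],
    ]
    for genes, start, end, pattern in maximal_ccc_biclusters(discretize(toy)):
        print(genes, (start, end), pattern)
```

Note that a discretized column here represents a transition between two time points, so a pattern covering columns `start..end-1` spans the original time points `start..end`. A suffix-tree implementation would report the same maximal patterns without enumerating every column range.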

Last modified: 2014-04-29 12:22:35