
A Verified Technique for Colon Cancer Analysis with Minimum Number of Features

Journal: International Journal of Scientific Engineering and Research (IJSER) (Vol. 3, No. 8)

Publication Date:

Authors:

Pages: 77-79

Keywords: Feature selection; QDA classifier; gene expression; t-test; p-value


Abstract

Gene expression data are characterized by high dimensionality and a small number of samples. Much research addresses data reduction, that is, selecting the most influential features (feature selection). This work differs in that it verifies each step of the selection, and it reaches a smaller number of features with high discriminative power. Reducing data dimensionality leads to a more effective analysis of gene features, but there is a tradeoff between the number of selected features and acceptable accuracy. The goal is to find a compact set of features that supports knowledge discovery while keeping accuracy acceptable. We therefore present a novel framework that integrates dimensionality reduction with classification for gene expression data analysis. To achieve this objective, we use oligonucleotide arrays, which provide a broad picture of the cell state by monitoring the expression levels of thousands of genes simultaneously, and the developed techniques extract useful information from the resulting data sets. Gene expression is analyzed using 40 tumor and 22 normal colon tissue samples covering 2000 human genes. In the first phase, preprocessing, the data are arranged and normalized. The second phase performs feature reduction in two steps. The first step reduces the features from 2000 to 602 using a t-test (keeping the genes with the lowest p-values). The second step applies sequential forward correlation, which yields only three gene features. Using only these three genes, a quadratic discriminant analysis (QDA) classifier is trained to test the significance of the features. This classification achieves a success rate of more than 96%.
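The pipeline described in the abstract (normalize, filter by t-test, pick three genes by forward correlation, classify with QDA) can be sketched as below. This is a minimal NumPy-only sketch under stated assumptions, not the authors' implementation: it uses a pooled-variance t-test (every gene then shares the same degrees of freedom, so ranking by |t| is equivalent to ranking by p-value); the forward-selection criterion (highest |correlation| with the class label, skipping genes correlated above a hypothetical `max_redundancy` threshold with already-selected genes) is an assumption, since the abstract does not define "sequential forward correlation"; the evaluation protocol (leave-one-out) is also assumed. All function names are hypothetical, and the data are synthetic with the same shape as the colon set, not the real dataset.

```python
import numpy as np


def zscore_genes(X):
    """Normalize each gene (column) to zero mean and unit variance across samples."""
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # guard against constant genes
    return (X - X.mean(axis=0)) / sd


def pooled_t(X, y):
    """Pooled-variance two-sample t statistic per gene (class 1 vs class 0)."""
    a, b = X[y == 1], X[y == 0]
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * a.var(axis=0, ddof=1) + (n2 - 1) * b.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))


def forward_correlation_select(X, y, candidates, k=3, max_redundancy=0.8):
    """Hypothetical criterion: add genes in order of |corr(gene, label)|, skipping any
    gene whose |corr| with an already-selected gene reaches max_redundancy."""
    relevance = np.array([abs(np.corrcoef(X[:, g], y)[0, 1]) for g in candidates])
    selected = []
    for idx in np.argsort(-relevance):
        g = int(candidates[idx])
        if all(abs(np.corrcoef(X[:, g], X[:, s])[0, 1]) < max_redundancy for s in selected):
            selected.append(g)
        if len(selected) == k:
            break
    return selected


def qda_fit(X, y, reg=1e-6):
    """Per-class mean, (slightly regularized) covariance, and log prior."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        cov = np.cov(Xc, rowvar=False) + reg * np.eye(X.shape[1])
        model[c] = (Xc.mean(axis=0), cov, np.log(len(Xc) / len(X)))
    return model


def qda_predict(model, X):
    """Assign each row to the class with the largest quadratic discriminant score."""
    classes = list(model)
    scores = []
    for c in classes:
        mu, cov, log_prior = model[c]
        d = X - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,ij->i", d, np.linalg.solve(cov, d.T).T)  # squared Mahalanobis distance
        scores.append(-0.5 * logdet - 0.5 * maha + log_prior)
    return np.array(classes)[np.argmax(scores, axis=0)]


def loo_accuracy(X, y):
    """Leave-one-out accuracy of the QDA classifier."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        hits += int(qda_predict(qda_fit(X[mask], y[mask]), X[i:i + 1])[0] == y[i])
    return hits / len(y)


# Synthetic stand-in: 62 samples x 2000 genes; genes 0-4 are made informative
# by shifting the tumor samples. This is NOT the real colon dataset.
rng = np.random.default_rng(0)
y = np.array([1] * 40 + [0] * 22)  # 1 = tumor, 0 = normal
X = rng.normal(size=(62, 2000))
X[y == 1, :5] += 2.0

Xn = zscore_genes(X)
t = pooled_t(Xn, y)
top = np.argsort(-np.abs(t))[:602]  # 2000 -> 602 genes, lowest p-values first
genes = forward_correlation_select(Xn, y, top, k=3)
acc = loo_accuracy(Xn[:, genes], y)
print(genes, round(acc, 3))
```

Because both selection steps in this sketch see all 62 samples before the leave-one-out loop, the printed accuracy is optimistic; a stricter evaluation would repeat the t-test filter and forward selection inside each fold.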

Last modified: 2021-07-08 15:26:54