
A REVIEW ON CLUSTERING-BASED FEATURE SUBSET SELECTION ALGORITHM FOR HIGH DIMENSIONAL DATA

Journal: International Journal of Engineering Sciences & Research Technology (IJESRT) (Vol.4, No. 1)

Publication Date:

Authors : ; ;

Page : 199-202

Keywords : Minimum Spanning Tree; Good Feature subset selection; Clustering;


Abstract

In high-dimensional (HD) datasets, feature selection means identifying a subset of good features, here by using a clustering approach. Feature selection removes irrelevant and redundant features, an essential data preprocessing activity for effective data mining. A clustering-based approach to good feature selection is evaluated from both the efficiency and the effectiveness points of view. Efficiency concerns the time required to find a subset of good features, while effectiveness concerns the quality of that subset. A feature selection algorithm for high-dimensional data should produce results compatible with those obtained from the original, entire feature set. Feature selection algorithms are categorized by search strategy, evaluation criteria, and data mining task; this categorization reveals unattempted combinations and provides guidelines for choosing among feature selection algorithms. The FAST algorithm for feature subset selection works in two steps. The first step divides the features into clusters by using graph-theoretic clustering methods. The second step selects from each cluster the most useful feature, the one most strongly related to the target classes, and these selected features form the subset of good features. To increase the efficiency of FAST, we adopt an efficient Minimum Spanning Tree clustering method. Based on some of these criteria, a clustering-based feature selection algorithm for HD data is proposed and experimentally evaluated in this paper.
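
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name a correlation measure or an edge-cutting rule, so the sketch assumes symmetric uncertainty (SU) between discrete features, uses 1 - SU as the edge weight of a complete feature graph, and cuts a Minimum Spanning Tree edge when the two features are less correlated with each other than each is with the target. The function names `symmetric_uncertainty` and `select_features` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_information = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_information / (hx + hy)


def select_features(columns, target, threshold=0.0):
    """Assumed two-step selection: MST clustering, then one feature per cluster.

    columns: dict mapping feature name -> list of discrete values.
    target:  list of class labels, same length as each column.
    """
    # Relevance of each feature to the target; drop irrelevant features.
    relevance = {f: symmetric_uncertainty(v, target) for f, v in columns.items()}
    kept = [f for f in columns if relevance[f] > threshold]
    if not kept:
        return []

    def pair_su(a, b):
        return symmetric_uncertainty(columns[a], columns[b])

    # Step 1a: Prim's algorithm on the complete graph with weight 1 - SU,
    # so strongly correlated features are joined first.
    in_tree = {kept[0]}
    mst_edges = []
    while len(in_tree) < len(kept):
        _, a, b = min(
            (1.0 - pair_su(a, b), a, b)
            for a in in_tree for b in kept if b not in in_tree
        )
        mst_edges.append((a, b))
        in_tree.add(b)

    # Step 1b: cut weak edges; the surviving edges define the clusters
    # (tracked with a small union-find structure).
    parent = {f: f for f in kept}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in mst_edges:
        su = pair_su(a, b)
        if not (su < relevance[a] and su < relevance[b]):
            parent[find(a)] = find(b)

    clusters = {}
    for f in kept:
        clusters.setdefault(find(f), []).append(f)

    # Step 2: keep the feature most related to the target from each cluster.
    return sorted(max(members, key=relevance.get) for members in clusters.values())
```

For example, given a feature, an exact duplicate of it, an independent second informative feature, and a noise feature, the sketch would keep one of the two duplicates plus the second informative feature and discard the noise. The quadratic number of pairwise SU computations is the main cost, which is why the efficiency of the clustering step matters for HD data.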

Last modified: 2015-01-17 21:13:40