Incremental Semi-Supervised Clustering Ensemble for High Dimensional Data Clustering

Journal: International Journal for Modern Trends in Science and Technology (IJMTST) (Vol.6, No. 4)

Publication Date:

Authors : ; ;

Page : 41-48

Keywords : IJMTST;

Abstract

Traditional cluster ensemble approaches have three limitations: (1) they do not make use of prior knowledge of the datasets given by experts; (2) most of them cannot obtain satisfactory results when handling high dimensional data; and (3) all ensemble members are considered, even those that make no positive contribution. To address these limitations, we first propose an incremental semi-supervised clustering ensemble framework (ISSCE), which combines the random subspace technique, the constraint propagation approach, the proposed incremental ensemble member selection process, and the normalized cut algorithm to perform high dimensional data clustering. The random subspace technique is effective for handling high dimensional data, while the constraint propagation approach is useful for incorporating prior knowledge [2]. The incremental ensemble member selection process is newly designed to judiciously remove redundant ensemble members based on a newly proposed local cost function and a global cost function, and the normalized cut algorithm is adopted as the consensus function to provide more stable, robust, and accurate results. Then, a measure is proposed to quantify the similarity between two sets of attributes; it is used to compute the local cost function in ISSCE. Next, we analyze the time complexity of ISSCE theoretically [3]. ISSCE works well on datasets with very high dimensionality and outperforms the state-of-the-art semi-supervised clustering ensemble approaches.
Clustering techniques are applied to partition the transaction data values. Missing support for high dimensional data, unused prior knowledge, and equal priority for all ensemble members are the key limitations of the traditional cluster ensemble approach. The Incremental Semi-Supervised Clustering Ensemble (ISSCE) approach is built to overcome these limitations [4].
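The random subspace step and the attribute-set similarity mentioned above can be illustrated with a small sketch. The abstract does not define the similarity measure, so the Jaccard index is used here purely as a stand-in assumption; the function names, the sampling ratio, and the seed are likewise hypothetical and not taken from the paper.

```python
import random

def random_subspace(n_attributes, ratio, rng):
    """Sample a random subset of attribute indices for one ensemble member."""
    size = max(1, int(round(ratio * n_attributes)))
    return frozenset(rng.sample(range(n_attributes), size))

def attribute_set_similarity(attrs_a, attrs_b):
    """Jaccard index between two attribute sets (assumed stand-in measure)."""
    union = attrs_a | attrs_b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(attrs_a & attrs_b) / len(union)

# Five ensemble members, each seeing half of 20 attributes (illustrative values).
rng = random.Random(0)
subspaces = [random_subspace(20, 0.5, rng) for _ in range(5)]

# Pairwise similarities could feed a local cost that penalises redundant members.
pairwise = [
    attribute_set_similarity(subspaces[i], subspaces[j])
    for i in range(len(subspaces))
    for j in range(i + 1, len(subspaces))
]
```

Two members whose subspaces overlap heavily score close to 1, which is one plausible signal that a member is redundant; how ISSCE actually turns such a score into its local cost function is not stated in the abstract.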
The ISSCE approach combines the random subspace technique, the constraint propagation approach, the incremental ensemble member selection process, and the normalized cut algorithm to perform high dimensional data clustering. The random subspace technique is effective for handling high dimensional data. The constraint propagation approach is useful for incorporating prior knowledge. The incremental ensemble member selection process is applied to judiciously remove redundant ensemble members based on a local cost function and a global cost function. The normalized cut algorithm is adopted as the consensus function to provide more stable, robust, and accurate results [5]. A measure is applied to quantify the similarity between two sets of attributes and is used to compute the local cost function in ISSCE. The ISSCE approach is enhanced to support a structure-based parameter selection process, into which dataset complexity is also integrated. A membership rearrangement mechanism is adapted to handle the incremental membership selection process. A member and ensemble weight measure is also applied to determine the importance of the cluster ensembles [6]. The cluster ensemble model is integrated with the Partitioning Around Medoids (PAM) clustering scheme. The system also increases clustering accuracy and scalability.
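As a rough illustration of how base clusterings can be merged into one consensus partition, the sketch below builds a co-association matrix and clusters it with a swap-based k-medoids routine in the spirit of PAM. The abstract names normalized cut as the consensus function and does not say how PAM is integrated, so using PAM on co-association distances here is an assumption, as are all function names and the toy labelings.

```python
from itertools import product

def co_association(labelings):
    """Fraction of base clusterings that place each pair of points together."""
    n, m = len(labelings[0]), len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]

def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam(dist, k):
    """Swap-based k-medoids on a precomputed distance matrix."""
    n = len(dist)
    medoids = list(range(k))  # naive initialisation: the first k points
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(dist, trial)
            if cost < best:  # keep a swap only if it strictly lowers the cost
                medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

# Three toy base clusterings of six points (made-up labels for illustration).
labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
coassoc = co_association(labelings)
dist = [[1.0 - s for s in row] for row in coassoc]
medoids, consensus = pam(dist, k=2)
```

On this toy input the consensus groups the first three points together and the last three together, even though the second base clustering disagrees about the third point.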

Last modified: 2020-05-05 20:58:20