OPTIMAL FEATURE BASED DENSITY CLUSTERING FOR OUTLIER DETECTION IN MULTIVARIATE DATA

Journal: International Journal of Civil Engineering and Technology (IJCIET) (Vol. 8, No. 9)

Pages : 520-538

Keywords : Ant Colony Optimization; Generalized Sequential Pattern; Feature Optimization; Local Outlier Factor; Outlier Detection

Abstract

Efficient outlier detection in large-scale big data environments is complicated both by the cost of processing the information and by the need to handle it efficiently. When separating outliers from normal data items, many existing methods incur complexity that grows with the number of features involved in each attribute. Identifying the appropriate features is what defines the characteristics of the data, yet analyzing every feature increases processing time and memory consumption. To address these issues, this paper proposes the Optimal Feature based Outlier Factor Model (OF-OFM), an outlier detection approach preceded by a feature optimization step. First, a preprocessing stage formats all data instances in the dataset, which is deployed on a Spark architecture. Next, Ant Colony Optimization (ACO) selects an optimal subset of features from the full feature set. The Generalized Sequential Pattern (GSP) method then forms tightly coupled sequential patterns that exclude outliers on the basis of the selected feature set. A density-based clustering approach groups these sequentially associated patterns into dense clusters. Finally, a Local Outlier Factor (LOF) based detection method separates the outliers from the data processed so far. OF-OFM is evaluated against existing outlier detection methods in terms of Area Under the Curve (AUC), CPU utilization time, execution time, detection accuracy and memory consumption, and the reported results show it to be more effective than the compared approaches.
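The abstract does not spell out how Ant Colony Optimization chooses the feature subset. As an illustration only, the sketch below uses a simplified binary ACO: each feature has an "include" and an "exclude" pheromone trail, each ant samples a subset from the pheromone ratios, and after evaporation the best subset found so far is reinforced. The function name `aco_feature_selection`, its parameters, and the toy fitness function are hypothetical; in a real pipeline the fitness would score a subset by some assumed criterion such as downstream detection quality.

```python
import random
from typing import Callable, FrozenSet


def aco_feature_selection(
    n_features: int,
    fitness: Callable[[FrozenSet[int]], float],  # hypothetical scoring function
    n_ants: int = 20,
    n_iters: int = 40,
    rho: float = 0.2,      # evaporation rate
    tau_min: float = 0.1,  # pheromone floor keeps every decision possible
    tau_max: float = 5.0,  # pheromone ceiling
    seed: int = 0,
) -> FrozenSet[int]:
    """Simplified binary ACO: returns the best feature subset found."""
    rng = random.Random(seed)
    # One trail for "include feature f" and one for "exclude feature f".
    tau_in = [1.0] * n_features
    tau_out = [1.0] * n_features
    best: FrozenSet[int] = frozenset()
    best_fit = fitness(best)

    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant decides feature by feature, biased by the pheromone ratio.
            subset = frozenset(
                f for f in range(n_features)
                if rng.random() < tau_in[f] / (tau_in[f] + tau_out[f])
            )
            fit = fitness(subset)
            if fit > best_fit:
                best, best_fit = subset, fit

        # Evaporate both trails, then reinforce the decisions of the best subset.
        for f in range(n_features):
            tau_in[f] *= 1 - rho
            tau_out[f] *= 1 - rho
            if f in best:
                tau_in[f] += 1.0
            else:
                tau_out[f] += 1.0
            tau_in[f] = min(max(tau_in[f], tau_min), tau_max)
            tau_out[f] = min(max(tau_out[f], tau_min), tau_max)
    return best


# Toy fitness (hypothetical): features 0 and 2 are useful, others are penalized.
RELEVANT = {0, 2}


def toy_fitness(subset: FrozenSet[int]) -> float:
    return len(subset & RELEVANT) - 0.25 * len(subset - RELEVANT)


selected = aco_feature_selection(5, toy_fitness)
print(sorted(selected))
```

The pheromone bounds `tau_min` and `tau_max` borrow the MAX-MIN Ant System idea of never letting a decision become impossible, so the search can still drop an unneeded feature after the trails have started to converge.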
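The abstract names only "a density based clustering approach". DBSCAN is one widely used density-based algorithm, so the following sketch uses it as a stand-in; it is not necessarily the method used in OF-OFM, and it clusters plain numeric vectors rather than sequential patterns. Points with at least `min_pts` neighbors within `eps` (counting themselves) are core points, clusters grow outward from core points, and points reachable from no core point are labeled noise (`-1`). The names `dbscan` and `region_query` are illustrative.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1
Point = Sequence[float]


def region_query(data: Sequence[Point], i: int, eps: float) -> List[int]:
    # All points within eps of point i, including i itself.
    return [j for j in range(len(data)) if math.dist(data[i], data[j]) <= eps]


def dbscan(data: Sequence[Point], eps: float, min_pts: int) -> List[Optional[int]]:
    labels: List[Optional[int]] = [None] * len(data)  # None = not yet visited
    cluster = 0
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        neighbors = region_query(data, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = region_query(data, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                seeds.extend(j_neighbors)
        cluster += 1
    return labels


points = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5),
          (5, 5), (5, 5.5), (5.5, 5), (5.5, 5.5),
          (20, 20)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

With these toy points the two tight groups form clusters 0 and 1, and the isolated point is marked as noise.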
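Local Outlier Factor compares the local density of a point with that of its k nearest neighbors: a score near 1 means the point is about as dense as its neighborhood, while a score well above 1 flags an outlier. The following minimal sketch computes the textbook LOF with a brute-force neighbor search. The function names and toy data are hypothetical, and whereas the abstract applies LOF after feature selection and clustering, this sketch scores raw points directly.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_neighborhood(data: Sequence[Point], i: int, k: int) -> Tuple[float, List[int]]:
    """Return the k-distance of point i and its k-distance neighborhood (ties included)."""
    dists = sorted((math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i)
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def local_outlier_factor(data: Sequence[Point], k: int = 3) -> List[float]:
    n = len(data)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(data)")
    k_dist, neigh = zip(*(k_neighborhood(data, i, k) for i in range(n)))

    def reach_dist(i: int, j: int) -> float:
        # Reachability distance of i with respect to j: at least j's k-distance.
        return max(k_dist[j], math.dist(data[i], data[j]))

    # Local reachability density: inverse of the mean reachability distance.
    # The 1e-12 floor avoids division by zero when points are duplicated.
    lrd = []
    for i in range(n):
        total = sum(reach_dist(i, j) for j in neigh[i])
        lrd.append(len(neigh[i]) / max(total, 1e-12))

    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = local_outlier_factor(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

For this toy data the isolated point (10, 10) receives a score far above 1, while the points in the unit square stay close to 1. The brute-force search is quadratic in the number of points, so large datasets would normally use a spatial index or a library implementation instead.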

Last modified: 2018-04-16 14:51:05