ResearchBib Share Your Research, Maximize Your Social Impacts
Sign for Notice Everyday Sign up >> Login

Anomaly Detection of Online Data using Oversampling Principal Component Analysis

Journal: International Journal of Science and Research (IJSR) (Vol.3, No. 12)

Publication Date:

Authors : ; ;

Page : 687-690

Keywords : Anomaly detection; principal Component Analysis; outlier; oversampling;

Source : Downloadexternal Find it from : Google Scholarexternal

Abstract

Anomaly detection is very important topic in data mining and machine learning. This technique is helpful in many real world applications such as intrusion or credit card fraud detection, fault detection in safety critical systems, and military surveillance for enemy activities. Anomaly detection is basically used to find the patterns in data that do not conform to their expected behavior. Such patterns are termed as anomalies, outliers, discordant observations, exceptions, aberrations etc in different application domains. From all these terms anomalies and outliers can be used interchangeably. Outlier detection methods can be used to deal with extremely unbalanced data distribution problems. Most of the anomaly detection methods are implemented in batch mode due to which they cannot be extended to large scale problems. If we extend them to large scale problems, they will result in sacrificing computation and memory requirements. To tackle this problem we proposed oversampling Principal Component Analysis (osPCA) scheme in this paper. This technique aims at detecting the presence of outliers from large amount of data. In previously proposed Principal Component Analysis methods, it is required to store entire data matrix or covariance matrix, but this is not the case with our osPCA approach. So it can be extended to large scale or online problems. Principal Component Analysis is used to find the principal direction of the data and oversampling technique will duplicate the target instance multiple times to amplify the effect of outliers. By oversampling the target instance and extracting the principal directions of the data the osPCA allows us to determine the anomaly in target instance according to the variations in the resulting dominant eigenvector. This online updating technique allows us to efficiently calculate dominant eigenvector without eigen analysis or storing entire covariance matrix. Compared with the other anomaly detection methods the required computational costs and memory requirements are significantly reduced.

Last modified: 2021-06-30 21:15:01