Anomaly Detection of Online Data using Oversampling Principal Component Analysis
Journal: International Journal of Science and Research (IJSR) (Vol. 3, No. 12)
Publication Date: 2014-12-05
Authors: Supriya A. Bagane; J. L. Chaudhari
Pages: 687-690
Keywords: anomaly detection; Principal Component Analysis; outlier; oversampling
Abstract
Anomaly detection is an important topic in data mining and machine learning. It is useful in many real-world applications, such as intrusion detection, credit card fraud detection, fault detection in safety-critical systems, and military surveillance for enemy activities. Anomaly detection finds patterns in data that do not conform to expected behavior. Depending on the application domain, such patterns are called anomalies, outliers, discordant observations, exceptions, or aberrations; of these terms, anomalies and outliers are used interchangeably. Outlier detection methods can also be used to handle extremely unbalanced data distributions. Most anomaly detection methods are implemented in batch mode, so they cannot be extended to large-scale problems without incurring heavy computation and memory costs. To tackle this problem, we propose an oversampling Principal Component Analysis (osPCA) scheme in this paper. The technique aims to detect outliers in large amounts of data. Previously proposed PCA-based methods need to store the entire data matrix or covariance matrix; our osPCA approach does not, so it can be extended to large-scale or online problems. Principal Component Analysis finds the principal direction of the data, and the oversampling technique duplicates the target instance multiple times to amplify the effect of an outlier. By oversampling the target instance and extracting the principal direction of the data, osPCA determines how anomalous the target instance is from the variation in the resulting dominant eigenvector. The online updating technique lets us compute the dominant eigenvector efficiently without performing eigen analysis or storing the entire covariance matrix. Compared with other anomaly detection methods, the required computational cost and memory are significantly reduced.
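The oversampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it keeps the whole data matrix in memory and uses plain power iteration in place of the online update, which the abstract does not specify. The function names, the `ratio` parameter, and the choice of one minus the absolute cosine similarity between the two dominant eigenvectors as the score are all assumptions made for illustration.

```python
import numpy as np


def dominant_direction(X, iters=200, seed=0):
    """Leading principal direction of X via power iteration.

    The covariance matrix is never formed explicitly; each step applies
    the centered data matrix twice instead.
    """
    Xc = X - X.mean(axis=0)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(X.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = Xc.T @ (Xc @ v)  # proportional to (covariance @ v)
        norm = np.linalg.norm(w)
        if norm == 0.0:
            break  # degenerate data: no variance in any direction
        v = w / norm
    return v


def ospca_score(X, target, ratio=0.1):
    """Hypothetical anomaly score in [0, 1] for one target instance.

    The target is duplicated round(ratio * n) times (at least once), and
    the score measures how far the dominant direction rotates as a result.
    """
    n_dup = max(1, int(round(ratio * X.shape[0])))
    u = dominant_direction(X)
    X_over = np.vstack([X, np.tile(target, (n_dup, 1))])
    u_over = dominant_direction(X_over)
    # abs() removes the sign ambiguity of eigenvectors.
    cos = min(abs(float(u @ u_over)), 1.0)
    return 1.0 - cos
```

Under these assumptions, an instance that lies along the main direction of the data should leave the dominant eigenvector almost unchanged and score near 0, while an instance far off that direction should rotate it and score closer to 1.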