ResearchBib Share Your Research, Maximize Your Social Impacts
Sign for Notice Everyday Sign up >> Login

A Model for Improving Classifier Accuracy using Outlier Analysis

Journal: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY (Vol.7, No. 1)

Publication Date:

Authors : ; ; ;

Page : 500-509

Keywords : Data Mining; Outlier detection; BAD Score; NAVF; Fuzzy AVF;

Source : Download Find it from : Google Scholarexternal

Abstract

Anomalies are those records, which have different behavior and do not comply with the remaining records in the dataset. Outlier analysis is the concept to find anomalies in Datasets. ?Detecting outliers efficiently is an important issue in many fields of science, medicine and technology. Many methods are available to detect anomalies in numerical datasets but a limited number of methods available for categorical datasets. In this work, a novel method to detect outliers in categorical data based on entropy is proposed. This algorithm finds anomalies based on each record score and has great intuitive appeal. These scores called BAD scores. This algorithm utilizes the frequency of each value in the dataset. Greedy method needs k- scans of dataset to find ‘k’ outliers where as the proposed method needs only one scan of dataset and it calculates BAD score of each record directly. It avoids the problem of giving ‘k’ as an input and can find any number of outliers based on our data set directly.AVF method has less time complexity when compared with the other methods like Greedy, FPOF and FDOD. Greedy has good accuracy when compared with other methods like AVF and FPOF, FDOD (which are based on frequency patterns of all combinations of values in each record). Our algorithm shows better results in accuracy than AVF algorithm and Greedy. But this method has reached nearest to AVF in time complexity.? This algorithm has been applied on Nursery dataset and Bank dataset taken from “UCI Machine Learning Repository”. In this work, it is proposed to extend Normal distribution [11], and Fuzzy concept [12] to BAD score [13] that is NAVF combined with Fuzzy AVF is applied to BAD Score. ?Numerical attributes are excluded from Datasets for our analysis. The experimental results show that it is efficient for outlier detection in categorical dataset.

Last modified: 2016-06-29 19:35:50