A Survey on Imbalanced Data Handling Techniques for Classification

Journal: International Journal of Emerging Trends in Engineering Research (IJETER) (Vol.9, No. 10)

Publication Date: 2021-09-14

Authors : Abhisar Sharma; Anuradha Purohit; Himani Mishra

Page : 1341-1347

Keywords : Imbalanced dataset; random under-sampling; random over-sampling; SMOTE

Abstract

Classification is a supervised learning task that assigns instances to groups on the basis of class labels. Algorithms are trained on labeled datasets to accomplish this task, so the dataset plays an important role in classification. If the instances of one class (the majority class) greatly outnumber the instances of another class (the minority class), to the point that a classifier struggles to learn the characteristics of the minority class, the dataset is termed an imbalanced dataset. Such datasets lead to biased predictions and misclassification in the real world: a model trained on them may report very high accuracy during training but, being unfamiliar with minority class instances, is unable to predict the minority class and therefore performs poorly. This paper presents a survey of techniques proposed by researchers for handling imbalanced data, and compares and discusses the techniques on the basis of F-measure.
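
The gap between accuracy and F-measure described in the abstract can be illustrated with a small sketch. This is not taken from the paper. The 95/5 class split, the always-predict-majority "classifier", and the helper names `accuracy`, `f_measure`, and `random_over_sample` are assumptions chosen for illustration. The last helper shows one plausible form of random over-sampling, one of the techniques named in the keywords.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen instances of the smaller classes until
    every class has as many instances as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(indices)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 95 majority (label 0) and 5 minority (label 1).
y_true = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)          # high, despite missing every minority case
f1_minority = f_measure(y_true, y_pred)  # zero for the minority class

X_bal, y_bal = random_over_sample(X, y_true)
balanced_counts = Counter(y_bal)         # both classes now have equal counts
```

In this toy setting the majority-only predictor scores high on accuracy while its minority-class F-measure is zero, which is why a comparison based on F-measure is more informative for imbalanced data. Random over-sampling only duplicates existing minority instances; SMOTE instead synthesizes new ones and is not shown here.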
