A Survey on Imbalanced Data Handling Techniques for Classification
Journal: International Journal of Emerging Trends in Engineering Research (IJETER) (Vol. 9, No. 10)
Publication Date: 2021-09-14
Authors: Abhisar Sharma; Anuradha Purohit; Himani Mishra
Page : 1341-1347
Keywords: Imbalanced dataset; random under-sampling; random over-sampling; SMOTE
Abstract
Classification is a supervised learning task that assigns instances to groups on the basis of class labels. Algorithms are trained on labeled datasets to accomplish this task, so the dataset plays an important role in classification. If instances of one class (the majority class) far outnumber instances of another class (the minority class), to the point that a classifier struggles to learn the characteristics of the minority class, the dataset is termed an imbalanced dataset. Such datasets lead to biased predictions and misclassification in real-world use: a model trained on them may report very high accuracy during training but, being unfamiliar with minority class instances, fail to predict the minority class and therefore perform poorly. This paper presents a survey of techniques proposed by researchers for handling imbalanced data, and compares and discusses the techniques on the basis of F-measure.
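The accuracy problem described in the abstract, and two of the techniques named in the keywords, can be illustrated with a minimal sketch. It is not taken from the paper: the 95/5 class split, the always-predict-majority "classifier", and the helper names accuracy, f_measure, random_over_sample, and random_under_sample are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen instances until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_under_sample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lbl in zip(X, y) if lbl == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Hypothetical imbalanced dataset: 95 majority (0) and 5 minority (1) instances.
X = list(range(100))
y_true = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class.
y_pred = [0] * 100
print(accuracy(y_true, y_pred))   # 0.95
print(f_measure(y_true, y_pred))  # 0.0

X_over, y_over = random_over_sample(X, y_true)
X_under, y_under = random_under_sample(X, y_true)
print(Counter(y_over))   # both classes have 95 instances
print(Counter(y_under))  # both classes have 5 instances
```

In this sketch the degenerate model scores 95% accuracy while its F-measure on the minority class is zero, which is why a comparison based on F-measure is more informative than accuracy for imbalanced data. SMOTE, also listed in the keywords, differs from random over-sampling in that it synthesizes new minority instances by interpolating between neighbouring minority instances rather than duplicating existing ones; it is not implemented here.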
Last modified: 2021-10-14 21:51:09