Performance analysis of samplers and calibrators with various classifiers for asymmetric hydrological data

Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE) (Vol. 10, No. 107)

Publication Date:

Authors : ; ;

Page : 1316-1335

Keywords : Machine learning; Calibration; Asymmetric data; Classification; Probability; Prediction

Abstract

Classifying asymmetric data remains a significant challenge in machine learning (ML). While ML algorithms classify symmetric data effectively, handling data asymmetry is an ongoing concern in classification tasks. This paper aims to select an appropriate method for classifying and predicting asymmetric data, focusing on both label and probability predictions. To this end, various ML classifiers, calibration techniques, and sampling methods are systematically analyzed. The classifiers considered are logistic regression (LR), k-nearest neighbour (KNN), Gaussian naive Bayes (GNB), random forest (RF), decision tree (DT), and support vector classifier (SVC). The calibration techniques are isotonic regression (IR) and Platt scaling (PS), and the sampling techniques are synthetic minority oversampling technique (SMOTE), Tomek links (T-link), adaptive synthetic sampling (AdaSyn), SMOTE combined with edited nearest neighbour (SMOTEENN), and SMOTE combined with T-link (SMOTETomek). For label prediction, simulation results consistently favour SMOTEENN: the RF classifier combined with SMOTEENN performs best, with a balanced random accuracy (BRA) of 98.07%, sensitivity of 98.02%, specificity of 99.01%, an area under the curve (AUC) of 0.98, and a geometric mean (G-mean) of 98.50%. For probability prediction, IR calibration consistently excels. Specifically, the GNB classifier combined with IR performs best, yielding low values of the Brier score (BS), expected calibration error (ECE), and maximum calibration error (MCE), and its reliability curve shows perfect calibration. Based on these findings, the study recommends SMOTEENN for data resampling and IR calibration for probability prediction as superior methods for addressing data asymmetry. The comparative analysis offers valuable insights for selecting appropriate techniques for asymmetric data classification.
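
The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It uses the commonly cited textbook definitions of sensitivity, specificity, G-mean, Brier score, ECE, and MCE for a binary problem. The abstract does not state the paper's exact formulas, so the function names, the equal-width binning, and the default of 10 bins are assumptions made here for illustration and are not taken from the paper.

```python
import math


def label_metrics(y_true, y_pred):
    """Sensitivity, specificity and G-mean for binary labels (1 = minority/positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def calibration_errors(y_true, y_prob, n_bins=10):
    """Return (ECE, MCE) from equal-width probability bins.

    Assumption: bins are equal-width over [0, 1]; the paper's binning
    scheme is not given in the abstract.
    """
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((t, p))

    n = len(y_true)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(t for t, _ in members) / len(members)   # fraction of positives
        predicted = sum(p for _, p in members) / len(members)  # mean predicted probability
        gap = abs(observed - predicted)
        ece += (len(members) / n) * gap  # gap weighted by bin size
        mce = max(mce, gap)              # worst single-bin gap
    return ece, mce


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1]
    y_prob = [0.9, 0.4, 0.2, 0.1, 0.3, 0.6]
    print(label_metrics(y_true, y_pred))
    print(brier_score(y_true, y_prob))
    print(calibration_errors(y_true, y_prob, n_bins=5))
```

Under this definition, G-mean is the square root of sensitivity times specificity. Applied to the reported sensitivity (98.02%) and specificity (99.01%), it gives roughly 98.51%, in line with the reported G-mean of 98.50%. Lower Brier score, ECE, and MCE values indicate better-calibrated probabilities, which is the sense in which the abstract ranks the calibrators.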

Last modified: 2023-11-02 21:36:51