A two-phase feature selection technique using mutual information and XGB-RFE for credit card fraud detection
Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE), Vol. 8, No. 85
Publication Date: 2021-12-30
Authors: C. Victoria Priscilla; D. Padma Prabha
Pages: 1656-1668
Keywords: Recursive feature elimination; Hyper-parameter optimization; Class imbalance; XGBoost; Binary classification
Abstract
With the rapid increase in online transactions, credit card fraud has become a serious menace. Machine Learning (ML) algorithms are beneficial in building a good model to detect fraudulent transactions. However, dealing with high-dimensional and imbalanced datasets is a hindrance in real-world applications such as credit card fraud detection. To overcome this issue, feature selection, a pre-processing technique, is adopted with both classification performance and computational efficiency in mind. This paper proposes a new two-phase feature selection approach that integrates filter and wrapper methods to identify significant feature subsets. In the first phase, Mutual Information (MI), chosen for its computational efficiency, ranks the features by their importance; ranking alone, however, cannot drop the less important features. A second phase therefore eliminates redundant features using Recursive Feature Elimination (RFE), a wrapper method applied with 5-fold cross-validation. eXtreme Gradient Boosting (XGBoost) serves as the estimator for RFE, with class weights adjusted for the imbalance. The optimal features obtained from the proposed method were used in four boosting algorithms, namely XGBoost, Gradient Boosting Machine (GBM), Categorical Boosting (CatBoost) and Light Gradient Boosting Machine (LGBM), to analyse the classification performance. The proposed approach was applied to the credit card fraud detection dataset from the IEEE-CIS, which has an imbalanced binary target class. The experimental outcome shows promising results in terms of Geometric Mean (G-Mean) for XGBoost (84.8%) and LGBM (83.7%); the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) increased from 79.8% to 85.5% for XGBoost, and the computation time for training the classifiers was also reduced.
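The sketch below illustrates the two-phase idea described in the abstract: an MI-based filter ranking followed by RFE with cross-validation using a class-weighted XGBoost estimator. It assumes a pandas DataFrame X of transaction features and a binary target y (1 = fraud); the MI cut-off fraction, scoring metric, and XGBoost hyper-parameters are illustrative assumptions, not the authors' exact settings.

import numpy as np
from sklearn.feature_selection import mutual_info_classif, RFECV
from xgboost import XGBClassifier

def two_phase_selection(X, y, mi_keep=0.5, cv=5):
    # Phase 1 (filter): rank features by mutual information with the class
    # label and keep the top fraction (mi_keep is an assumed cut-off).
    mi_scores = mutual_info_classif(X, y, random_state=0)
    order = np.argsort(mi_scores)[::-1]
    keep = order[: max(1, int(len(order) * mi_keep))]
    X_mi = X.iloc[:, keep]

    # Phase 2 (wrapper): recursive feature elimination with 5-fold
    # cross-validation, using XGBoost with class weights adjusted for
    # the imbalance in the binary target.
    pos_weight = (y == 0).sum() / max(1, (y == 1).sum())
    estimator = XGBClassifier(
        scale_pos_weight=pos_weight,  # up-weights the minority (fraud) class
        eval_metric="logloss",
        n_estimators=200,
    )
    selector = RFECV(estimator, step=1, cv=cv, scoring="roc_auc")
    selector.fit(X_mi, y)
    return X_mi.columns[selector.support_]

The selected feature subset would then be fed to the boosting classifiers (XGBoost, GBM, CatBoost, LGBM) and evaluated with imbalance-aware metrics such as G-Mean and AUC, as reported in the abstract.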