
A two-phase feature selection technique using mutual information and XGB-RFE for credit card fraud detection

Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE) (Vol.8, No. 85)

Publication Date:

Authors :

Pages : 1656-1668

Keywords : Recursive feature elimination; Hyper-parameter optimization; Class imbalance; XGBoost; Binary classification;


Abstract

With the rapid increase in online transactions, credit card fraud has become a serious menace. Machine Learning (ML) algorithms are beneficial for building a good model to detect fraudulent transactions. However, dealing with high-dimensional and imbalanced datasets is a hindrance in real-world applications such as credit card fraud detection. To overcome this issue, feature selection, a pre-processing technique, is adopted with both classification performance and computational efficiency in mind. This paper proposes a new two-phase feature selection approach that integrates filter and wrapper methods to identify significant feature subsets. In the first phase, Mutual Information (MI) is adopted, owing to its computational efficiency, to rank the features by importance. However, MI alone cannot drop the less important features. Thus, a second phase is added to eliminate redundant features using Recursive Feature Elimination (RFE), a wrapper method, with 5-fold cross-validation. eXtreme Gradient Boosting (XGBoost) is adopted as the estimator for RFE, with the class weights adjusted to handle the imbalance. The optimal features obtained from the proposed method were used in four boosting algorithms, namely XGBoost, Gradient Boosting Machine (GBM), Categorical Boosting (CatBoost) and Light Gradient Boosting Machine (LGBM), to analyse classification performance. The proposed approach was applied to the credit card fraud detection dataset from IEEE-CIS, which has an imbalanced binary target class. The experimental outcomes show promising results: the Geometric Mean (G-Mean) reaches 84.8% for XGBoost and 83.7% for LGBM, the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) increases from 79.8% to 85.5% for XGBoost, and the computation time for training the classifiers is reduced.
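
The two-phase pipeline described in the abstract maps onto standard tooling. Below is a minimal sketch, assuming scikit-learn and xgboost: an MI filter pass to rank and trim features, followed by XGBoost-driven RFE with 5-fold cross-validation and a class-weight adjustment for the imbalanced target. The MI cut-off ratio and all hyper-parameters are illustrative assumptions, not the values tuned in the paper.

```python
# Sketch of a two-phase feature selection: MI filter + XGB-RFE (assumed parameters).
import numpy as np
from sklearn.feature_selection import mutual_info_classif, RFECV
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier

def two_phase_select(X, y, mi_keep_ratio=0.5):
    """X: 2-D numpy array of features, y: binary labels (1 = fraud).
    Returns indices of the selected features in the original feature space."""
    # Phase 1 (filter): rank features by mutual information with the target
    # and keep only the top-ranked fraction (cut-off is an assumption).
    mi = mutual_info_classif(X, y, random_state=0)
    ranked = np.argsort(mi)[::-1]
    keep = ranked[: max(1, int(len(ranked) * mi_keep_ratio))]

    # Phase 2 (wrapper): recursive feature elimination with an XGBoost
    # estimator, class weights adjusted for imbalance, 5-fold CV.
    pos_weight = (y == 0).sum() / max((y == 1).sum(), 1)
    estimator = XGBClassifier(scale_pos_weight=pos_weight, eval_metric="auc")
    rfe = RFECV(estimator, step=1,
                cv=StratifiedKFold(n_splits=5), scoring="roc_auc")
    rfe.fit(X[:, keep], y)

    # Map the RFE mask back to the original column indices.
    return keep[rfe.support_]
```

The resulting subset would then be passed to the four boosting classifiers and scored with AUC and G-Mean, where G-Mean is the square root of the product of sensitivity and specificity, so both classes of the imbalanced target contribute to the score.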

Last modified: 2022-01-12 21:52:50