
Evaluating the efficacy of decision tree-based machine learning in classifying intrusive behaviour of network users

Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE) (Vol.11, No. 114)

Publication Date:

Authors :

Page : 736-758

Keywords : Machine learning; Cross-validation; Discriminant power; Geometric mean; Random forest; Naïve Bayes tree

Source : Download | Find it from : Google Scholar

Abstract

Building network intrusion detection models that can identify the intrusive behaviour of malicious users remains a major challenge in protecting network resources. In this study, decision tree (DT) based machine learning (ML) classification techniques, namely best first tree (BFT), functional tree (FT), J48, naïve Bayes tree (NBT), random forest (RF), random tree (RT), reduced error pruning tree (REPT), and simple classification and regression tree (Simple CART), were employed to build an anomaly-based network intrusion detection model. Further, to remove irrelevant features from the intrusion data, three categories of feature selection techniques were applied: (i) entropy based (gain ratio (GR), information gain (IG), and symmetrical uncertainty (SU)), (ii) statistical based (chi-squared, one-r, and relief-f), and (iii) search based (exploratory data analysis (EDA), feature subset harmony search (FSHS), linear forward search (LFS), and feature vote harmony search (FVHS)). The proposed method was evaluated using the widely recognized NSL-KDD dataset. The efficacy of the combinations of eight classifiers and ten feature selection methods (eighty models) was analysed using seventeen evaluation metrics, including sensitivity, false positive rate (FPR), Matthews correlation coefficient (MCC), Kappa coefficient (KC), geometric mean (GM), and discriminant power (DP). Experimental results showed that the LFS+RF model achieved the highest accuracy of 0.9989, with sensitivity 0.9982, F-value 0.9988, specificity 0.9994, false negative rate (FNR) 0.0018, MCC 0.9977, GM 0.9988, and DP 7.6156 on the NSL-KDD dataset. The proposed model outperformed other existing models such as support vector machine (SVM), JRip, bagging, deep learning, and neural network (NN) approaches.
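As a brief illustration of how the less common evaluation metrics named in the abstract are typically computed, the Python sketch below derives sensitivity, specificity, FPR, FNR, MCC, GM, and DP from a binary confusion matrix. This follows the standard textbook definitions of these metrics rather than code from the paper itself, and the TP/FP/TN/FN counts used at the end are purely hypothetical.

```python
import math

def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics for an intrusion detection model.

    Assumes the 'attack' class is the positive class. Counts are hypothetical.
    """
    sensitivity = tp / (tp + fn)   # true positive rate (detection rate)
    specificity = tn / (tn + fp)   # true negative rate
    fpr = fp / (fp + tn)           # false positive rate = 1 - specificity
    fnr = fn / (fn + tp)           # false negative rate = 1 - sensitivity

    # Matthews correlation coefficient (MCC)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Geometric mean (GM) of sensitivity and specificity
    gm = math.sqrt(sensitivity * specificity)

    # Discriminant power (DP): scaled sum of the log-odds of detecting
    # positives and negatives (the commonly used Sokolova-style definition)
    dp = (math.sqrt(3) / math.pi) * (
        math.log(sensitivity / (1 - sensitivity))
        + math.log(specificity / (1 - specificity))
    )

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "FPR": fpr,
        "FNR": fnr,
        "MCC": mcc,
        "GM": gm,
        "DP": dp,
    }

# Hypothetical confusion-matrix counts, only for demonstration.
print(detection_metrics(tp=95, fp=3, tn=97, fn=5))
```

Note that GM and DP both reward a balance between sensitivity and specificity, which is why they are useful alongside plain accuracy when attack and normal traffic are imbalanced.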

Last modified: 2024-06-04 23:16:02