Enhancing intrusion detection with imbalanced data classification and feature selection in machine learning algorithms
Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE) (Vol. 11, No. 112)
Publication Date: 2024-03-31
Authors: S. V. Sugin; M. Kanchana
Page: 405-419
Keywords: Machine learning (ML); Intrusion detection system (IDS); UNSW-NB15 dataset; XGBoost algorithm; Convolutional neural network (CNN)
Abstract
How well an organization detects and prevents computer network (CN) attacks depends heavily on the performance of its intrusion detection systems (IDS) and intrusion prevention systems (IPS). This research focuses on machine learning (ML)-based IDS, which it argues are effective and accurate at detecting network attacks. The models are trained and tested on the UNSW-NB15 network intrusion detection dataset. A filter-based feature selection approach is implemented using the extreme gradient boosting (XGBoost) algorithm, and the reduced feature set is then used to train several classifiers: support vector machine (SVM), logistic regression (LR), k-nearest neighbour (KNN), decision tree (DT), and convolutional neural network (CNN). Suitable feature selection is essential for eliminating features that have minimal impact on classification. The study also notes that many ML-based IDS suffer from limited detection accuracy and a higher false positive rate (FPR) when trained on highly imbalanced datasets. Both binary and multiclass classification configurations are considered. Results indicate that XGBoost-based feature selection allows techniques such as DT to raise the test accuracy of the binary classification scheme from 88.13% to 90.85%. The XGBoost-KNN and XGBoost-DT configurations also show improved performance.
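The abstract reports results in terms of test accuracy and false positive rate (FPR). As a minimal sketch of how these two metrics are conventionally computed for the binary case, the snippet below uses the standard definitions (accuracy = correct predictions / all predictions, FPR = FP / (FP + TN)). The function name `binary_metrics` and the toy labels are illustrative assumptions; they are not taken from the paper or from the UNSW-NB15 dataset.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, false_positive_rate) for 0/1 labels.

    Assumed convention for this sketch: 1 = attack, 0 = normal traffic.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the binary confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0

    # FPR is the share of normal samples wrongly flagged as attacks.
    negatives = fp + tn
    fpr = fp / negatives if negatives else 0.0
    return accuracy, fpr


# Toy, made-up labels (not from the paper): 8 normal flows and 2 attacks.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

accuracy, fpr = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, FPR={fpr:.2f}")
```

Reporting FPR alongside accuracy matters for IDS evaluation because accuracy alone does not show how often normal traffic is flagged as an attack.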
Last modified: 2024-04-04 15:29:13