Abstract: |
Machine learning techniques are effective at identifying patterns in data, but their performance degrades on imbalanced datasets, a pervasive problem that is especially evident in Intrusion Detection Systems (IDS). An IDS monitors a network or system for malicious activity, so dataset imbalance must be addressed before machine learning models can detect intrusions accurately. In particular, the cyber-attacks hidden within heavily imbalanced IDS datasets are difficult for conventional machine learning methods to detect. This study introduces two novel algorithms for rectifying imbalance in IDS datasets. The first, Markov Chain Monte Carlo Random Oversampling (MCMCRO), combines Markov Chain Monte Carlo (MCMC) and Random Oversampling techniques to systematically synthesize new data. The second, GANMCMCRO (Generative Adversarial Networks Markov Chain Monte Carlo Random Oversampling), embeds MCMCRO's data synthesis function in the generator model of a Generative Adversarial Network framework to strengthen its data generation. Both algorithms are evaluated on the CSE-CIC-IDS2018 dataset. MCMCRO achieves a recall of 0.66, a precision of 1.00, an F1 score of 0.79, and an overall accuracy of 0.91. GANMCMCRO attains a recall of 0.81, a precision of 0.82, an F1 score of 0.81, and an overall accuracy of 0.88. These results indicate that both techniques have substantial potential to improve the accuracy of machine learning models in imbalanced data domains, particularly IDS datasets. |