DMMLACS 2023 Abstracts

Area 1 - Data Mining and Machine Learning Applications for Cyber Security

Full Papers

Paper Nr:	6
Title:	Detecting the Manipulation of Text Structure in Text Steganography Using Machine Learning
Authors:	Benjamin Aziz and Aysha Bukhelli
Abstract:	In this paper, we evaluate the security of a recently proposed method for embedding hidden content in textual documents through paragraph size manipulation. Our steganalysis is based on machine learning, and the classification method we use to analyse a document utilises text attributes such as words per paragraph, paragraph proportion based on sentences, and other English document features. The embedding model proved resilient against the analysis techniques: the highest plotted accuracy was 0.601, which is considered poor. The analysis methods were able to detect around half of the embedded corpus, which is equivalent to a random guess. We conclude that it is difficult to detect an embedding model that manipulates the paragraphs of novel texts, as the structure of these texts depends fully on the writer’s style of writing. Thus, shifting sentences up and down between paragraphs, without changing the order of the sentences or affecting the context of the text, yields a reasonably secure method of embedding.
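As a rough illustration of the kind of structural attributes this abstract mentions (words per paragraph and each paragraph's share of the document's sentences), the following minimal Python sketch computes them for a plain-text document. The function name `paragraph_features`, the blank-line paragraph delimiter, and the punctuation-based sentence splitter are assumptions made for illustration; the abstract does not describe the authors' actual feature extraction.

```python
import re


def paragraph_features(text):
    """Return simple structural features for each paragraph of a document.

    Assumptions (not taken from the paper): paragraphs are separated by
    blank lines, and sentences end with '.', '!' or '?' followed by
    whitespace.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    word_counts = []
    sentence_counts = []
    for paragraph in paragraphs:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        word_counts.append(len(paragraph.split()))
        sentence_counts.append(len(sentences))

    total_sentences = sum(sentence_counts)
    return [
        {
            "words": words,
            "sentences": sentences,
            # Share of the document's sentences held by this paragraph.
            "sentence_share": sentences / total_sentences,
        }
        for words, sentences in zip(word_counts, sentence_counts)
    ]


sample = "One two. Three four five.\n\nSix."
features = paragraph_features(sample)
```

Feature vectors of this kind could then be passed to any standard classifier. Which features and classifiers the authors actually used is not stated in the abstract.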

Paper Nr:	7
Title:	An Attention-Based Deep Generative Model for Anomaly Detection in Industrial Control Systems
Authors:	Mayra Macas, Chunming Wu and Walter Fuertes
Abstract:	Anomaly detection is critical for the secure and reliable operation of industrial control systems. As our reliance on such complex cyber-physical systems grows, it becomes paramount to have automated methods for detecting anomalies, preventing attacks, and responding intelligently. This paper presents a novel deep generative model to meet this need. The proposed model follows a variational autoencoder architecture with a convolutional encoder and decoder to extract features from both spatial and temporal dimensions. Additionally, we incorporate an attention mechanism that directs focus towards specific regions, enhancing the representation of relevant features and improving anomaly detection accuracy. We also employ a dynamic threshold approach leveraging the reconstruction probability and make our source code publicly available to promote reproducibility and facilitate further research. Comprehensive experimental analysis is conducted on data from all six stages of the Secure Water Treatment (SWaT) testbed, and the experimental results demonstrate the superior performance of our approach compared to several state-of-the-art baseline techniques.
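The abstract does not say how the dynamic threshold is computed. One common way to make a threshold adapt over time is to compare each anomaly score with a rolling mean plus a multiple of the rolling standard deviation, as in the hypothetical sketch below. The function name `dynamic_threshold_flags`, the parameters `window` and `k`, and the use of a generic anomaly score (for example, a negative log reconstruction probability) are all assumptions, not the authors' method.

```python
from collections import deque
from statistics import mean, pstdev


def dynamic_threshold_flags(scores, window=5, k=3.0):
    """Flag scores that exceed a rolling threshold.

    The threshold at each step is mean + k * std of the previous `window`
    scores. No score is flagged until the window has filled.
    This is an illustrative scheme, not the one used in the paper.
    """
    history = deque(maxlen=window)
    flags = []
    for score in scores:
        if len(history) == window:
            threshold = mean(history) + k * pstdev(history)
            flags.append(score > threshold)
        else:
            flags.append(False)
        history.append(score)
    return flags


# Higher score means a less likely reconstruction (more anomalous).
scores = [1.0, 1.1, 0.9, 1.0, 1.05, 5.0, 1.0]
flags = dynamic_threshold_flags(scores)
```

In this toy series only the spike of 5.0 is flagged. Because the spike then enters the rolling window, the threshold rises for the steps that follow, which is the usual trade-off of this simple scheme.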

Paper Nr:	9
Title:	Cache Side-Channel Attacks Against Black-Box Image Processing Software
Authors:	Ssuhung Yeh and Yuji Sekiya
Abstract:	Cache side-channel attacks are a persistent threat to modern computers because they can steal secret information from memory and are hard to detect. While researchers have studied these attacks for a long time, there has been relatively little focus on attacks against media software. One reason is the inherent noisiness of cache side channels, which makes it challenging to extract meaningful information from them. However, recent advancements in machine learning have changed the landscape, making side-channel analysis more accessible. In this paper, we propose a new side-channel analysis framework that is capable of extracting high-level information from complex applications. With this framework, we attacked image processing programs, used cache side-channel attacks to reconstruct images that the victim opened, and achieved significantly improved results compared to previous work.

Short Papers

Paper Nr:	5
Title:	GANMCMCRO: A Generative Adversarial Network Markov Chain Monte Carlo Random Oversampling Algorithm for Imbalance Datasets
Authors:	Najmeh Abedzadeh and Matthew Jacobs
Abstract:	Machine learning techniques are adept at identifying patterns within data, yet their efficacy diminishes on imbalanced datasets, a pervasive concern that is especially apparent in Intrusion Detection Systems (IDS). IDS, which are pivotal for monitoring malicious activity in networks or systems, require strategic interventions to address dataset imbalance and increase machine learning model accuracy. Notably, covert cyber-attacks are hidden within these heavily imbalanced IDS datasets, which complicates detection for conventional machine learning methods. This study introduces two novel algorithms designed to rectify imbalances within IDS datasets. The first algorithm, Markov Chain Monte Carlo Random Oversampling (MCMCRO), integrates Markov Chain Monte Carlo (MCMC) and Random Oversampling techniques to synthesize new data. The second algorithm, GANMCMCRO (Generative Adversarial Networks Markov Chain Monte Carlo Random Oversampling), uses MCMCRO’s data synthesis capability within the Generative Adversarial Network framework, where it strengthens the data generation function of the generator model. An evaluation conducted on the CSE-CIC-IDS2018 dataset supports the efficacy of both algorithms. MCMCRO achieves a recall of 0.66, a precision of 1, an F1 score of 0.79, and an overall accuracy of 0.91. GANMCMCRO attains a recall of 0.81, a precision of 0.82, an F1 score of 0.81, and an overall accuracy of 0.88. These results indicate that both algorithms help mitigate the challenges posed by imbalanced datasets and show substantial potential for enhancing the accuracy of machine learning models in imbalanced data domains, particularly IDS datasets.
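For readers who want to relate the reported metrics to one another, the F1 score is the harmonic mean of precision and recall. The short sketch below applies that standard formula to the rounded precision and recall values quoted in the abstract. The helper name `f1_score` is illustrative, and because the inputs are already rounded, the results can only be expected to match the reported F1 scores up to rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as quoted in the abstract.
mcmcro_f1 = f1_score(precision=1.0, recall=0.66)      # about 0.795
ganmcmcro_f1 = f1_score(precision=0.82, recall=0.81)  # about 0.815
```

Both values fall within rounding distance of the reported F1 scores of 0.79 and 0.81.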

Paper Nr:	10
Title:	Adapting Open-Set Recognition Method to Various Time-Series Data
Authors:	András Halász, Lóránt Szabolcs Daubner, Nawar Al-Hemeary, János Juhász, Tamás Zsedrovits and Kálmán Tornai
Abstract:	In real-world scenarios, conventional classifier methods often stumble when faced with the unexpected emergence of unknown samples or classes previously unseen during training. Open-Set Recognition (OSR) models have emerged as a solution to this ubiquitous challenge. Our previous work introduced a robust OSR method leveraging synthesized – or “fake” – features to delineate the uncharted territory of unknowns, focusing on image datasets. Recognizing the imperative to extend this capability to diverse data types, we have successfully transposed this model to time-series datasets. A pivotal feature of the original model was its modular architecture, allowing for focused modification in feature extraction. Consequently, the core components remained intact, including feature extraction, sample generation, and feature transformation. This paper illuminates our initial strides, employing a one-dimensional convolutional network for feature extraction and showcasing promising preliminary OSR results using that network. Additionally, our adapted model maintains its advantageous edge in terms of time complexity, achieved through the discreet generation of fake features in a simplified hidden layer. Future investigations will further delve into alternative feature extraction methodologies, promising to broaden the scope of applications for this adaptable OSR model.