DMMLACS 2021 Abstracts


Area 1 - Data Mining and Machine Learning Applications for Cyber Security

Full Papers
Paper Nr: 3
Title:

Asguard: Adaptive Self-guarded Honeypot

Authors:

Sereysethy Touch and Jean-Noël Colin

Abstract: Cybersecurity is of critical importance to any organisation on the Internet, as attackers exploit any security loophole to attack it. To combat cyber threats, honeypots, which are decoy systems, have been an effective tool since 1991 for deceiving attackers and luring them into revealing their attacks. However, these tools have become increasingly easy to detect, which diminishes their usefulness. Recently, adaptive honeypots, which can change their behaviour in response to attackers, have emerged; despite their promise, they still have shortcomings of their own. In this paper we survey conventional and adaptive honeypots and discuss their limitations. We introduce an approach for adaptive honeypots that uses Q-learning, a reinforcement learning algorithm, to achieve two objectives at the same time: (1) learn to engage with attackers to collect their attack tools and (2) guard against being compromised, by combining the environment state and the action to form a new reward function.
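The Q-learning idea mentioned in this abstract can be illustrated with a minimal tabular sketch. The state names, the action set, and the reward values below are hypothetical and chosen only for illustration; the abstract does not specify the paper's actual states, actions, or reward function.

```python
import random
from collections import defaultdict

# Hypothetical action set; the abstract does not list the honeypot's actions.
ACTIONS = ["allow", "block", "substitute"]


def reward(state, action):
    """Illustrative reward that depends on both state and action (assumed values)."""
    if state == "download" and action == "allow":
        return 1.0   # engaging: the attacker's tool gets collected
    if state == "risky" and action == "allow":
        return -1.0  # letting a dangerous command run risks compromise
    if state == "risky" and action in ("block", "substitute"):
        return 0.5   # self-guarding behaviour is rewarded
    return 0.0


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the Q-table."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update; returns the new Q-value."""
    r = reward(state, action)
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)   # Q-table with all values starting at 0
rng = random.Random(0)
first = q_update(q, "download", "allow", "other")
```

Because the reward looks at the state and the action together, the same action ("allow") is encouraged in one state and penalised in another, which is one simple way to pursue both objectives with a single table.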

Short Papers
Paper Nr: 2
Title:

Detecting Turkish Phishing Attack with Machine Learning Algorithm

Authors:

Melih Turhanlar and Cengiz Acartürk

Abstract: Phishing attacks are social engineering attacks that aim at stealing a victim’s personal information. The primary motivation is the exploitation of human emotion. The body of a phishing message usually includes a webpage link intended to convince the victim to click and submit credentials. The victim typically connects to a mock webpage. Solutions exist for mitigating phishing attacks, such as phishing detection by Natural Language Processing (NLP). We present a framework for detecting phishing text in Turkish by running machine learning classifiers on an imbalanced phishing data set. The training dataset includes emails, SMS, and tweets. Our findings reveal that Logistic Regression combined with the Synthetic Minority Over-Sampling Technique (SMOTE) achieves high performance compared with the other machine learning models tested in the study.
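The oversampling step named in this abstract can be sketched in a few lines. This is a simplified, SMOTE-style interpolation written from scratch for illustration, not the authors' pipeline and not a library implementation; the function name `smote_like`, its parameters, and the toy feature vectors are assumptions.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class (e.g. phishing messages as 2-D feature vectors).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like(minority, n_new=5)
```

The synthetic points would then be appended to the training set before fitting a classifier such as logistic regression, so that the minority class is no longer heavily outnumbered.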

Paper Nr: 4
Title:

Hoplite Antivirus for Adversarial Attacks: A Theoretical Approach

Authors:

Anastasios Nikolakopoulos, Achilleas Marinakis, Vrettos Moulos and Theodora Varvarigou

Abstract: In the Machine Learning and Artificial Intelligence community, adversarial attacks are becoming an emerging issue. Carefully perturbed data samples are fed into deep neural networks and cause problems such as misclassifications and false or malformed outputs. The community has proposed multiple defense strategies to overcome this problem. This paper summarizes the existing (and most well-known) adversarial attacks and defenses. It then proposes a potential solution to the issue, a theoretical approach based on an antivirus software scenario: the Hoplite Antivirus. This approach could be a vital step towards addressing the constantly evolving adversarial attacks, taking a cue from the way software scientists defended (and keep defending) against computer viruses.

Paper Nr: 7
Title:

Implementation and Repeatability Aspects Combined with Refactoring for a Reviews Manager System

Authors:

Agorakis Bompotas, Aristidis Ilias, Andreas Kanavos, Panayiotis Kechagias, Panayiotis Arvanitakis, Nikos Zotos, Konstantinos Kovas and Christos Makris

Abstract: With the advent of social media, data are abundant enough that analytics can be reliably designed to provide valuable information about a given product or service. Hotel customers write reviews of each accommodation service provided and/or of the accommodation as a whole. These reviews are of particular interest to the tourism industry, which can extract customers’ opinions and aspects from them in order to improve the services it provides. In this paper, we delve into the details of the design and implementation of a system that first applies pre-processing techniques and then employs classic Natural Language Processing approaches, namely TF-IDF bag of words and word embeddings. The outputs of these approaches can be used as input to various classifiers and to Long Short Term Memory Neural Networks. The main aspects of this system have been described in (Bompotas et al., 2020a) and (Bompotas et al., 2020b). In the present article we refactor the system described in those works by embedding a Latent Dirichlet Allocation (LDA) component in the implementation, and we perform a repeatability study on the experimental findings reported in (Bompotas et al., 2020a), showing that those findings are valid.

Paper Nr: 5
Title:

Federated Learning in Healthcare is the Future, But the Problems Are Contemporary

Authors:

Mustafa Y. Topaloglu, Elisabeth M. Morrell and Umit Topaloglu

Abstract: Federated Learning (FL) originated from a need to mitigate certain inherent limitations of ML, particularly in the capability to train on larger datasets for improved performance, which typically requires unwieldy coordination in an inter-institutional collaboration due to existing patient protection laws and regulations. FL may also play a crucial role in bypassing ML’s innate algorithmic discrimination issues through access to data from underrepresented groups across geographically distributed institutions and diverse populations. FL inherits many of the difficulties of ML, and we therefore discuss two pressing FL challenges: privacy of the model exchange, and equity and contribution considerations.

Paper Nr: 6
Title:

Modeling Credibility in Social Big Data using LSTM Neural Networks

Authors:

Athanasios Lyras, Sotiria Vernikou, Andreas Kanavos, Spyros Sioutas and Phivos Mylonas

Abstract: Communication is a vital need among people, allowing them to express and exchange ideas, emotions, messages, etc. Social media fulfill this need, as users can use a variety of platforms, such as Twitter, to leave their digital fingerprint by uploading personal data. The ever-larger volume of users calls for evaluation, which is why the subject of user credibility or trust in a social network is equally vital and is discussed in detail in this paper. Specifically, we propose a method that measures user credibility and trust in a social environment using user metrics. Our dataset is derived from Twitter and consists of tweets related to a popular television series. Initially, our text data are analyzed and preprocessed using NLP tools; subsequently, a balanced dataset is constructed for model evaluation and parameter tuning. A deep learning forecasting model, which uses LSTM/BiLSTM layers along with a classic Artificial Neural Network (ANN) to predict user credibility, is assessed in terms of model accuracy.