WEBIST 2021 Abstracts


Area 1 - HCI in MOBILE SYSTEMS and WEB INTERFACES

Full Papers
Paper Nr: 17
Title:

A Teamwork Tool to Support Group Work in Online-based Higher Education: Exploring User Experience and the Use of Support Mechanisms by Students

Authors:

Jessica Brandenburger and Monique Janneck

Abstract: Teamwork is often used in online and blended courses. However, numerous problems can arise within computer-supported groups. In this paper we present a tool for supporting teamwork in computer-supported collaborative learning (CSCL) in higher education. The tool was implemented as a Moodle plugin and combines automatic analyses of student behavior with students’ self-reports on their teamwork. The tool was evaluated in two field tests with students from an online course and a blended learning course. Surveys, group discussions, and log file analysis were used as evaluation methods. The teamwork tool was rated positively in terms of usability and visual aesthetics. Functions that reflect group members’ participation and task deadlines turned out to be very useful for students. In general, students consented to the automatic analysis of their learning behavior. Based on the results of our studies, we derive design implications as well as suggestions for improving functionalities to support students’ teamwork online.

Paper Nr: 20
Title:

From Linked Data Querying to Visual Search: Towards a Visualization Pipeline for LOD Exploration

Authors:

Aline Menin, Catherine Faron, Olivier Corby, Carla S. Freitas, Fabien Gandon and Marco Winckler

Abstract: In recent years, Linked Open Data (LOD) has been increasingly used to support decision-making processes in various application domains. To this end, the literature shows a growing interest in information visualization as a suitable means of communicating the knowledge described in LOD data sources. Nonetheless, transforming raw LOD data into a graphical representation (the so-called visualization pipeline) is not a straightforward process and often requires a set of operations to turn data into meaningful visualizations that suit users’ needs. In this paper, we propose a generic LOD visualization pipeline and discuss the implications of its internal operations (import → transform → map → render → interact) for creating meaningful visualizations of LOD datasets. To demonstrate the feasibility of this generic visualization pipeline, we implement it as the tool LDViz (Linked Data Visualizer). Through multiple use cases, we demonstrate how LDViz supports access to any SPARQL endpoint, allowing users to perform searches with SPARQL queries and visualize the results using multiple visualization techniques.
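The five stages named in this abstract can be pictured as a chain of small functions. The Python sketch below is a toy, text-only version that assumes SPARQL 1.1 JSON query results in which every row binds the hypothetical variables `s` and `o`; the degree-based node sizing and the plain-text renderer are stand-ins for real visualization techniques and do not describe how LDViz is implemented.

```python
import json


def import_results(raw: str) -> list[dict[str, str]]:
    """Import: flatten a SPARQL 1.1 JSON results document into rows of strings."""
    doc = json.loads(raw)
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in doc["results"]["bindings"]]


def transform(rows: list[dict[str, str]]) -> dict:
    """Transform: build a node-link graph (assumes every row binds ?s and ?o)."""
    nodes = sorted({r["s"] for r in rows} | {r["o"] for r in rows})
    links = [(r["s"], r["o"]) for r in rows]
    return {"nodes": nodes, "links": links}


def map_to_marks(graph: dict) -> list[dict]:
    """Map: encode each node's degree as a visual size (hypothetical encoding)."""
    degree = {n: 0 for n in graph["nodes"]}
    for s, o in graph["links"]:
        degree[s] += 1
        degree[o] += 1
    return [{"label": n, "size": 5 + 2 * degree[n]} for n in graph["nodes"]]


def render(marks: list[dict]) -> str:
    """Render: a plain-text stand-in for an actual chart."""
    return "\n".join(f"{m['label']} ({m['size']})" for m in marks)


def interact(rows: list[dict[str, str]], keep) -> list[dict[str, str]]:
    """Interact: a user-chosen filter whose output is fed back into the pipeline."""
    return [r for r in rows if keep(r)]


def pipeline(rows: list[dict[str, str]]) -> str:
    return render(map_to_marks(transform(rows)))


# Two made-up result rows; the "ex:" IRIs are placeholders.
example = json.dumps({"head": {"vars": ["s", "o"]}, "results": {"bindings": [
    {"s": {"type": "uri", "value": "ex:a"}, "o": {"type": "uri", "value": "ex:b"}},
    {"s": {"type": "uri", "value": "ex:a"}, "o": {"type": "uri", "value": "ex:c"}},
]}})
rows = import_results(example)
print(pipeline(rows))
print(pipeline(interact(rows, lambda r: r["o"] != "ex:c")))
```

Keeping each stage a separate function makes it easy to swap one stage (for example, a different mapping) without touching the others, which is the point of thinking in terms of a pipeline.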

Paper Nr: 82
Title:

Web Tool based on Machine Learning for the Early Diagnosis of ASD through the Analysis of the Subject’s Gaze

Authors:

Sara Vecino, Martín Gonzalez-Rodriguez, Javier de Andres-Suarez and Daniel Fernandez-Lanvin

Abstract: Early diagnosis of autism spectrum disorder (ASD) is key to helping children and their families sooner, thus avoiding the high social and economic costs that would otherwise arise. The aim of the project is to create a web application, available in every health centre, that could potentially be used as a preliminary step in early ASD diagnosis. Because it is web-based, it would provide a fast means of diagnosis that requires almost no resources. The system uses machine learning techniques to generate the diagnosis by analysing the data obtained from an eye tracker, and every time an evaluation is confirmed, it is added to the training data set, thereby improving the evaluation process.

Paper Nr: 85
Title:

A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements

Authors:

Julián Grigera, Juan C. Gardey, Alejandra Garrido and Gustavo Rossi

Abstract: Most documents on the WWW are generated from templates that represent user interface (UI) elements and are later filled with content. In the field of information extraction, many approaches have emerged to analyze the documents’ structure, find similar features among them, and generate wrappers that are used to extract the raw content from such documents. Consequently, most techniques documented in the literature are optimized to compare full documents, but other fields of application, such as web augmentation or transcoding, require analyzing structural similarity on smaller UI components. In this paper we present two flexible algorithms to measure similarity between DOM elements using a mixed approach that considers both the elements’ location and their inner structure. The proposed algorithms were used in the context of two projects: an approach for automatic usability refactoring, and a web accessibility helper. We also present a wrapper induction technique based on these algorithms. Additionally, we present a precision and recall evaluation of our algorithms compared with other known approaches, applied to DOM elements of different sizes, all smaller than full-scale documents. The proposed algorithms run in linear time, so they are faster than most approaches that analyze structural similarity.
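The abstract does not spell out the scoring-map algorithms themselves. As a rough illustration of what a mixed measure combining location and inner structure can look like, the Python sketch below scores two elements by (a) the shared prefix of their root-to-element tag paths and (b) the Jaccard overlap of the tag paths inside each element. The `Node` class, the path representation, and the 0.5 weighting are assumptions made for illustration; this is not the authors' algorithm, and the repeated set unions are not claimed to match its linear running time.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for a DOM element: a tag name and child elements."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def inner_paths(node: Node, prefix: str = "") -> set[str]:
    """Tag paths relative to `node`, e.g. {'div', 'div/ul', 'div/ul/li'}.
    Child order and repeated siblings are deliberately ignored."""
    path = f"{prefix}/{node.tag}" if prefix else node.tag
    paths = {path}
    for child in node.children:
        paths |= inner_paths(child, path)
    return paths


def structure_similarity(a: Node, b: Node) -> float:
    """Jaccard overlap of the two elements' inner tag paths (0..1)."""
    pa, pb = inner_paths(a), inner_paths(b)
    return len(pa & pb) / len(pa | pb)


def location_similarity(path_a: list[str], path_b: list[str]) -> float:
    """Length of the shared root-to-element tag prefix, normalised (0..1)."""
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    longest = max(len(path_a), len(path_b))
    return common / longest if longest else 1.0


def element_similarity(a: Node, path_a: list[str], b: Node, path_b: list[str],
                       weight: float = 0.5) -> float:
    """Weighted mix of location and inner-structure similarity.
    `weight` is a hypothetical tuning knob, not a value from the paper."""
    return (weight * location_similarity(path_a, path_b)
            + (1 - weight) * structure_similarity(a, b))


# Hypothetical elements: two product cards and a navigation menu
card_a = Node("div", [Node("img"), Node("span")])
card_b = Node("div", [Node("img"), Node("span"), Node("span")])
menu = Node("div", [Node("ul", [Node("li")])])
print(structure_similarity(card_a, card_b))   # 1.0
print(element_similarity(card_a, ["html", "body", "main", "div"],
                         menu, ["html", "body", "aside", "div"]))
```

Ignoring sibling order and counts makes the two product cards score as structurally identical even though one has an extra `span`, which is the kind of tolerance template-generated components usually need.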

Short Papers
Paper Nr: 29
Title:

On the Integration of Shared Autonomous Mobility on Demand in Mobility Service Platforms

Authors:

Felix Schwinger, Ralf Philipsen, Simon Himmel, Matthias Jarke and Martina Ziefle

Abstract: Recently, travelers have increasingly booked trips that combine public transportation with emerging mobility modes such as ride-sharing. Mobility service platforms aim to integrate this heterogeneous mobility mix on a single software platform. In practice, however, collaborations between public transit companies and ride-sharing companies have only begun to emerge. With the anticipated emergence of autonomous vehicles in particular, ride-sharing services will become a vital mobility mode within Mobility-as-a-Service schemes. Therefore, this study investigates the requirements for integrating ride-sharing services into a mobility service platform from a user-centered and a technical perspective. To this end, we first analyzed the overall attitude towards autonomous ride-sharing in a citizen workshop and evaluated a prototype of the system in a small user study. Additionally, we conceptually integrated the service into an existing reference platform for mobility services and investigated the technical and operational differences between public transportation and ride-sharing services. The analysis shows that autonomous ride-sharing services can be integrated into a mobility service platform but have distinct requirements that other mobility services, such as scooter-sharing or public transit, do not have.

Paper Nr: 63
Title:

Integrating Gamification: The Human-Centered Gamification Process

Authors:

Alischa Thomas, Frederik Bader, Jörg Thomaschewski and Maria Rauschenberger

Abstract: The rapidly growing research area of gamification suffers from a lack of consensus on definitions, a variety of non-validated frameworks, and few practical insights. Hence, we conducted a literature review to explore current best practices in applying gamification, with a view to integrating it into a practical use case. Instead, we found a narrow focus on theoretical discussions. To strengthen practical research, standards need to be established for transferring gamification concepts to practical application. To fill this research gap, we designed a process and tools for a practical, human-centered, and context-related gamification application. We derived the process and tools from the insights of our literature review as well as from the realization of a gamification use case on a German online comparison platform. In addition, we incorporated standards such as the Human-Centered Design Process to maintain the established quality level of the field of user experience. In this paper, we present the Human-Centered Gamification Process (HCGP) and provide tools as practical guidance to lower the barrier for researchers and professionals to conduct theoretical and practical gamification projects.

Paper Nr: 8
Title:

Study on CPU and RAM Resource Consumption of Mobile Devices using Streaming Services

Authors:

Przemyslaw Falkowski-Gilski and Michal Wozniak

Abstract: Streaming multimedia services have become very popular in recent years, owing to the development of wireless networks. With the growing number of mobile devices worldwide, service providers offer dedicated applications that deliver on-demand audio and video content anytime and anywhere. The aim of this study was to compare different streaming services and investigate their impact on CPU and RAM resources with respect to the type of Internet connection. The paper consists of two parts: a theoretical part and a research part. The first part describes current means of wireless communication, including the transmission of multimedia in Wi-Fi and cellular systems, as well as the principles of operation of popular streaming services available on the market, including the coding algorithms they use and the bitrates available. The second part describes the set of consumer devices used, including 50 smartphones, as well as the tools, laboratory equipment, and research scenarios. The results of this study may aid both researchers and professionals involved in the digital mobile market, including content and service providers as well as network operators.

Paper Nr: 15
Title:

GIMO-PD: Towards a Health Technology Proposal for Improving the Personalized Treatment of Parkinson’s Disease Patients

Authors:

E. Enamorado-Díaz and J. A. García-García

Abstract: Parkinson's disease (PD) is the second most common neurodegenerative disease and its pharmacological treatment usually has unwanted side effects (motor fluctuations, dyskinesias and other motor alterations). These effects vary from patient to patient, resulting in the use of «trial and error» manual methods by healthcare professionals to optimize treatment. The GIMO-PD project (Mobile health solution based on Genetic profile, Image analysis and the permanent Monitoring of symptoms for the personalized management of Parkinson’s Disease patients) aims to present a technological solution for improving clinical decision-making on the allocation of appropriate personalized treatments according to the characteristics of each PD patient. This clinical decision support system integrates and combines patient biomarkers (such as genetic and neurological markers), motor markers (based on the computerised monitoring of activity and movement) and the digitization of clinical practice guidelines to optimise the diagnosis and treatment processes of patients with PD and to improve their quality of life.

Paper Nr: 36
Title:

A User-centric System for Improving Human-Computer Interaction through Fuzzy Logic-based Assistive Messages

Authors:

Christos Troussas, Akrivi Krouska and Cleo Sgouropoulou

Abstract: The rapid growth of the internet and communication technology in recent years has made computers easily accessible to everyone. However, people have different knowledge and characteristics that can affect their ability to use computers and create barriers to an effective user experience. This creates a need for software that adapts dynamically to users' individual needs. In view of this compelling need, this paper presents a user-centric system that seeks mainly to improve the interaction of users with the software they use. To achieve this, the system employs fuzzy logic to model users' computer knowledge and, based on this classification, delivers assistive messages that are pertinent to the interaction with the system. These messages are tailored to the user groups that have been created, as well as to the degree of detail most adequate for each group. As a testbed for our research, the presented approach has been incorporated into a learning management system to help tutors have a better experience while interacting with this software. The system was evaluated by users during the COVID-19 lockdown with promising results.
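The abstract does not specify the fuzzy sets or rules used. As a generic illustration of fuzzy classification, the Python sketch below maps a hypothetical computer-knowledge score on a 0 to 10 scale onto three overlapping triangular fuzzy sets and assigns the user to the set with the highest membership degree, which could then select how detailed an assistive message should be. The set names, breakpoints, and detail levels are all assumptions, not values from the paper.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet a and c and peak b.
    a == b gives a left shoulder, b == c a right shoulder."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets over a 0-10 knowledge score
FUZZY_SETS = {
    "novice": (0.0, 0.0, 5.0),
    "intermediate": (0.0, 5.0, 10.0),
    "expert": (5.0, 10.0, 10.0),
}
# Hypothetical level of detail for the assistive message of each group
MESSAGE_DETAIL = {"novice": "step-by-step",
                  "intermediate": "summary",
                  "expert": "short hint"}


def memberships(score: float) -> dict[str, float]:
    """Degree of membership of `score` in every fuzzy set."""
    return {name: triangular(score, *abc) for name, abc in FUZZY_SETS.items()}


def classify(score: float) -> str:
    """Crisp group: the fuzzy set with the highest membership degree."""
    m = memberships(score)
    return max(m, key=m.get)


for s in (2, 5, 9):
    group = classify(s)
    print(s, group, MESSAGE_DETAIL[group])
# 2 novice step-by-step
# 5 intermediate summary
# 9 expert short hint
```

Unlike a crisp threshold, the membership degrees (for a score of 2: 0.6 novice, 0.4 intermediate) keep the information that a user is near a boundary, which a rule base could use to blend message styles.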

Paper Nr: 43
Title:

An Overview of Conversational Agent: Applications, Challenges and Future Directions

Authors:

Ahlam Alnefaie, Sonika Singh, Baki Kocaballi and Mukesh Prasad

Abstract: Recent years have seen the increased use of artificial intelligence technologies such as conversational agents. Conversational agents, also referred to as chatbots, interact with users in natural language. As a result, various fields, such as education, healthcare, marketing, customer service, and entertainment, have started to adopt conversational agents. However, determining the motivations that drive the use of conversational agents and clarifying their usefulness remain challenging. This paper presents an overview of the evolution of conversational agents from an initial model to advanced intelligent systems and of their deployment in various real-world applications. Moreover, this paper contributes to the information systems literature by comparing the different types of conversational agents based on their roles and interaction styles. This paper also highlights the current challenges of conversational applications along with recommendations for future research.

Paper Nr: 71
Title:

Integrating Agile Human-Centered Design with Lean UX and Scrum

Authors:

Jenny Pilz, Jana Deutschländer, Jörg Thomaschewski and Eva-Maria Schön

Abstract: The integration of Human-Centered Design into agile product development can be challenging. In particular, the application of established user research and UX design methods within short feedback cycles frequently leads to discussions. This article provides an overview of the development and current state of research in Agile UX. In particular, we analyse Lean UX and answer the following questions: How can user research be implemented in agile product development and which best practices can be used to achieve it? For this purpose, we conducted a literature review and analysed how user research in agile product development has progressed in recent years. We discuss an approach that leverages identified best practices for Agile UX by introducing a new model for integrating Lean UX with Scrum to address the needs of the users more strongly in agile product development. We conclude that existing best practices and patterns already aim to adapt established user research methods to the agile framework. Lean UX provides a suitable approach to integrate such user research methods for use in agile product development. This approach has the potential to improve the user experience.

Area 2 - INTERNET TECHNOLOGY

Full Papers
Paper Nr: 10
Title:

The LeWiS Method: Target Variable Estimation using Cyber Security Intelligence

Authors:

Leigh Chase, Alaa Mohasseb and Benjamin Aziz

Abstract: Information Technology plays an increasingly important role in the provision of essential services. For these systems and networks to be reliable and trustworthy, we must defend them from those who would seek to compromise their Confidentiality, Integrity and Availability. Security intelligence tells us about the Tactics, Techniques and Procedures used by threat actors for these very purposes. In this paper, we introduce a novel method for learning malicious behaviours and then estimating how likely it is that a system has been compromised. One of the difficulties encountered when applying machine learning to cyber security is the lack of ground truth on which to train supervised techniques. This is often compounded by the volume, variety and velocity of the data, which far exceed what can be processed by human analysis alone. The technique, known as LeWiS, includes data preparation and processing phases that learn and later predict the presence of threat actors using a model of their behaviours. The method addresses the problems of scale and veracity by learning Indicators of Attack via feature extraction from security intelligence obtained through empirical methods. This approach shows promising classification performance for detecting learned malicious behaviours within synthesised systems’ event data.

Paper Nr: 24
Title:

Beyond Classical SERVICE Clause in Federated SPARQL Queries: Leveraging the Full Potential of URI Parameters

Authors:

Olivier Corby, Catherine Faron, Fabien Gandon, Damien Graux and Franck Michel

Abstract: Semantic Web applications that integrate very different software and data sources must cope with the heterogeneous quality and standards compliance of each resource involved. In this paper we propose a uniform way of adapting and customizing the behavior of both the client and the server components of an HTTP exchange to cope with this diversity. We revisit the classical SERVICE clause in SPARQL federated queries in order to parameterize the behavior of both the SPARQL client and the SPARQL service. We propose mechanisms to identify and specify SPARQL federated query services and extended SPARQL-based services.

Paper Nr: 42
Title:

Memoryless: A Two-phase Methodology for Setting Memory Requirements on Serverless Applications

Authors:

Rodrigo R. Righi, Gabriel Borges, Cristiano André da Costa and Vinicius F. Rodrigues

Abstract: Serverless computing, also known as Function as a Service, is a new paradigm that aims to shield the user of the platform from the details of infrastructure deployment. The problem is that all current Serverless platforms require the user to specify at least the memory needed by their Serverless offerings. This creates a paradox, since users must deal with technical issues to run their applications efficiently, both in terms of execution time and financial cost. To the best of our knowledge, the state of the art lacks studies on the best memory size for a particular application setting. In this context, this work presents Memoryless, a computational methodology that relieves users of setting memory limits when launching Serverless demands. To accomplish this, we introduce a two-pass algorithm composed of a microbenchmark in which users provide simple application parameters (first pass) and then receive from the hypervisor the memory required to run their demands. In addition to user abstraction, financial cost also drives our research, since this metric is commonly directly proportional to the selected memory size. We implemented Memoryless using NodeJS, Kubeless, and Kubernetes. The results confirm that the proposed methodology is capable of lowering the memory needed to run an application while maintaining expected execution times. This benefits both cloud administrators (who can run more Serverless demands for different users in parallel) and cloud users, who pay less for using the cloud and thus make better use of the pay-as-you-go policy.
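The abstract does not detail how microbenchmark results are turned into a memory recommendation. One plausible selection rule, sketched below in Python purely as an assumption, picks the smallest memory size whose measured execution time stays within a relative tolerance of the fastest run; because cost is described as commonly proportional to memory size, the smallest adequate size is also the cheapest. The function name, the tolerance, and the measurements are hypothetical.

```python
from collections.abc import Mapping


def pick_memory(times_ms: Mapping[int, float], tolerance: float = 0.10) -> int:
    """Return the smallest memory size (MB) whose measured execution time is
    within `tolerance` (relative) of the fastest measured time.
    Hypothetical selection rule, not the Memoryless algorithm itself."""
    if not times_ms:
        raise ValueError("need at least one measurement")
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    fastest = min(times_ms.values())
    limit = fastest * (1.0 + tolerance)
    # The fastest size always qualifies, so this min() is never empty.
    return min(mem for mem, t in times_ms.items() if t <= limit)


# Hypothetical microbenchmark results: memory size (MB) -> execution time (ms)
measurements = {128: 900.0, 256: 420.0, 512: 400.0, 1024: 395.0}
print(pick_memory(measurements))        # 256
print(pick_memory(measurements, 0.0))   # 1024
```

With a 10% tolerance the rule gives up a few milliseconds to run at a quarter of the largest size, which illustrates the cost/time trade-off the abstract describes.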

Paper Nr: 65
Title:

A Framework for Staging Personal Health Trains in the Cloud

Authors:

Virginia Graciano Martinez, Luís Ferreira Pires, Luiz S. Santos, João R. Moreira and Renata G. Souza

Abstract: Data analysis can be quite valuable for the progress of science in general, and more specifically in the healthcare domain, where it can generate medical advances and improve healthcare services. The Personal Health Train (PHT) is an approach based on distributed learning that allows analytics to be brought to the (personal health) data rather than the other way around, allowing data privacy and control to be preserved, and ethical and legal concerns to be observed. Since processing requires computational resources, a sandboxed environment should be available within the healthcare organization. This environment should allow the received algorithms to be executed without interfering with the organization’s regular processing. However, the IT infrastructure of a healthcare organization may not be powerful enough to perform a requested analysis task. This paper reports on our efforts to extend the PHT approach to allow data to be processed in the cloud, augmenting the processing power of the IT infrastructure of healthcare organizations. Our solution not only fulfills the functional requirements of the PHT approach but also complies with privacy regulations, particularly the General Data Protection Regulation (GDPR). The paper describes the design and implementation of our solution and demonstrates its suitability with a simple yet representative case study.

Paper Nr: 73
Title:

Simulated IoT Runtime with Virtual Smart Devices: Debugging and Testing End-user Automations

Authors:

Anthony Savidis, Yannis Valsamakis and Dimitris Linaritis

Abstract: End-user programming is gaining increasing attention in the context of the Internet of Things (IoT) as a promising way to enable users to develop personalized automations using visual programming tools. In an IoT ecosystem, devices may be invisible to users, embedded, hard to locate, or even physically inaccessible. This makes testing very challenging, since bringing physical devices to certain states may be either impractical (e.g. window and door sensors) or outright unsafe (e.g. fire or smoke sensors). It is therefore crucial that trials are carried out in a protected virtual environment rather than the physical one. In this context, we discuss a simulated runtime that addresses the challenges of testing end-user automations by entirely virtualizing devices. In this runtime, tests are not confined to a particular location but may be carried out anywhere and anytime, totally disengaged from the physical ecosystem, with all user tools residing on any typical mobile machine capable of operating fully standalone in test mode. Finally, when automations involve time and scheduling, time itself can, for practical reasons, be simulated so that testing is done on demand rather than at the pace of physical time.
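The idea of simulating time so that scheduled automations can be tested on demand can be illustrated with a virtual clock: scheduled actions fire when the test advances the clock, not when wall-clock time passes. The Python sketch below is a minimal, hypothetical version; the `VirtualClock` class and the lights-off automation are invented for illustration and do not reflect the authors' runtime or its API.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class _Event:
    due: float
    seq: int                                   # tie-breaker: FIFO for equal times
    action: Callable[[], None] = field(compare=False)


class VirtualClock:
    """Hypothetical simulated time source for tests: scheduled actions run
    when the test advances the clock, not when real time passes."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start
        self._queue: list[_Event] = []
        self._seq = 0

    def schedule(self, delay: float, action: Callable[[], None]) -> None:
        heapq.heappush(self._queue, _Event(self.now + delay, self._seq, action))
        self._seq += 1

    def advance(self, seconds: float) -> None:
        """Jump forward, firing due actions in time order."""
        target = self.now + seconds
        while self._queue and self._queue[0].due <= target:
            event = heapq.heappop(self._queue)
            self.now = event.due               # actions observe their own due time
            event.action()
        self.now = target


# Hypothetical automation: turn the lights off 30 minutes after motion stops.
log: list[tuple[str, float]] = []
clock = VirtualClock()
clock.schedule(30 * 60, lambda: log.append(("lights_off", clock.now)))
clock.advance(29 * 60)
print(log)      # []
clock.advance(60)
print(log)      # [('lights_off', 1800.0)]
```

A 30-minute rule is verified instantly, and because actions scheduled from inside another action are picked up by the same loop, chained automations can be tested the same way.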

Paper Nr: 75
Title:

Optimizing Resource Allocation in Edge-distributed Stream Processing

Authors:

Aluizio R. Neto, Thiago P. Silva, Thais V. Batista, Frederico Lopes, Flávia C. Delicato and Paulo F. Pires

Abstract: Emerging Web applications based on distributed IoT sensor systems and machine intelligence, such as those in smart city scenarios, have posed many challenges to network and processing infrastructures. For example, environment monitoring cameras generate massive data streams for event-based applications that require fast processing for immediate actions. Finding a missing person in public spaces is an example of such applications, since the person’s location is perishable information. Recently, the integration of edge computing with machine intelligence has been explored as a promising strategy to interpret such massive data near the sensor and reduce the end-to-end latency of processing events. However, due to the limited capacity and heterogeneity of edge resources, the placement of task processing is not trivial, especially when applications have different quality of service (QoS) requirements. In this paper, we develop an algorithm to solve the optimization problem of allocating a set of nodes with sufficient processing capacity to execute a pipeline of tasks while minimizing the operational cost related to latency and energy and maximizing availability. We compare our algorithm with the classic resource allocation algorithms first-fit, best-fit, and worst-fit, achieving a lower cost in scenarios with different degrees of node heterogeneity. We also demonstrate that distributing processing across multiple edge nodes reduces latency and energy consumption and still improves availability compared with processing only in the cloud.
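The three baselines this paper compares against are classic placement heuristics. The Python sketch below shows them for a single resource dimension (free processing capacity per node) and places a pipeline task by task, deducting capacity as it goes. It illustrates only the baselines, not the authors' cost-minimizing algorithm, and the capacities and demands in the example are made up.

```python
from collections.abc import Callable, Sequence
from typing import Optional


def _feasible(demand: float, free: Sequence[float]) -> list[int]:
    """Indices of nodes with enough free capacity for `demand`."""
    return [i for i, cap in enumerate(free) if cap >= demand]


def first_fit(demand: float, free: Sequence[float]) -> Optional[int]:
    """First node (in list order) that can host the task."""
    nodes = _feasible(demand, free)
    return nodes[0] if nodes else None


def best_fit(demand: float, free: Sequence[float]) -> Optional[int]:
    """Node left with the least spare capacity after placement."""
    nodes = _feasible(demand, free)
    return min(nodes, key=lambda i: free[i] - demand) if nodes else None


def worst_fit(demand: float, free: Sequence[float]) -> Optional[int]:
    """Node left with the most spare capacity after placement."""
    nodes = _feasible(demand, free)
    return max(nodes, key=lambda i: free[i] - demand) if nodes else None


def place_pipeline(demands: Sequence[float], free: Sequence[float],
                   strategy: Callable[[float, Sequence[float]], Optional[int]]
                   ) -> Optional[list[int]]:
    """Place pipeline tasks in order; None if some task does not fit."""
    remaining = list(free)        # copy so the caller's list is left untouched
    placement = []
    for d in demands:
        i = strategy(d, remaining)
        if i is None:
            return None
        remaining[i] -= d
        placement.append(i)
    return placement


capacities = [4, 10, 6]   # hypothetical free capacity per edge node
tasks = [3, 5]            # hypothetical demand per pipeline stage
for strategy in (first_fit, best_fit, worst_fit):
    print(strategy.__name__, place_pipeline(tasks, capacities, strategy))
# first_fit [0, 1]
# best_fit [0, 2]
# worst_fit [1, 1]
```

All three look only at capacity; none of them weighs latency, energy, or availability, which is the gap a cost-aware allocator such as the one proposed here is meant to close.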

Paper Nr: 88
Title:

Profiling Media Outlets and Audiences on Facebook: COVID-19 Coverage, Emotions and Controversy

Authors:

Luciana Oliveira and Joana Azevedo

Abstract: The outbreak of the COVID-19 pandemic was accompanied by a steep rise in worldwide media coverage of the phenomenon, in which social media were deemed critical platforms and became a popular place to receive and share information, as well as to express personal views. In this paper, we present the preliminary results of ongoing work analysing the media coverage of the COVID-19 outbreak in Portugal (March-May 2020), the subsequent emotional engagement of audiences, and the entropy-based emotional controversy generated. Using a cross-sectional descriptive methodology, we analyse the activity of the three major news outlets in the country for the category of general news. Our results reveal three distinct profiles of media coverage; negativity as the core engine for interacting with news and spreading information; negative and volatile Facebook reactions (“Angry”, “Haha” and “Wow”) as the main inputs for controversy, prevailing in COVID-19 news; and a general tendency of audiences to share controversial news.
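The abstract refers to entropy-based emotional controversy without giving the formula. A common way to quantify how divided the reactions to a post are is the Shannon entropy of the reaction distribution, optionally normalised by its maximum; the Python sketch below assumes that definition, which may differ from the one the authors use. Reaction names follow Facebook's labels; the counts are invented.

```python
import math
from collections.abc import Mapping


def reaction_entropy(counts: Mapping[str, int], normalise: bool = True) -> float:
    """Shannon entropy (bits) of a post's reaction distribution.

    Assumed controversy proxy. With normalise=True the value is divided by
    log2(k), where k is the number of reaction types passed in, so 0 means
    unanimous and 1 means evenly split across all types."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    if not normalise:
        return h
    k = len(counts)
    return h / math.log2(k) if k > 1 else 0.0


# Invented reaction counts for two hypothetical posts
calm = {"Like": 120, "Love": 0, "Haha": 0, "Wow": 0, "Sad": 0, "Angry": 0}
split = {"Like": 40, "Love": 0, "Haha": 40, "Wow": 0, "Sad": 0, "Angry": 40}
print(reaction_entropy(calm))                      # 0.0
print(round(reaction_entropy(split, False), 3))    # 1.585 (= log2 3)
```

Entropy captures disagreement rather than negativity: a post with only "Angry" reactions scores 0, so a separate valence measure would still be needed to express the negativity findings reported above.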

Paper Nr: 95
Title:

Perception and Adoption of Customer Service Chatbots among Millennials: An Empirical Validation in the Indian Context

Authors:

Himanshu Joshi

Abstract: The last decade has witnessed several successful automation efforts, such as customer service chatbots. Besides reducing costs for companies, chatbots save time and effort and enhance customer experience. Millennials, being aspirational, educated and technology-savvy, find chatbots suited to the way they seek information. While there are several studies on technology adoption, work on chatbot adoption among millennials is scanty. The purpose of this study is to examine the factors that influence user intention, adoption and satisfaction related to chatbots. Hence, the objective is to develop a conceptual model by extending the Unified Theory of Acceptance and Use of Technology (UTAUT) to the context of chatbot adoption. A mixed-method approach was employed, with qualitative data collected through five personal interviews followed by a quantitative web-based survey. Data were collected from 60 users of chatbot applications. The proposed model, depicting 13 hypothesized relationships, was estimated using the partial least squares structural equation modelling (PLS-SEM) approach. The results show that performance expectancy and social influence significantly influence behavioural intention. Trust and facilitating conditions were found to impact satisfaction significantly. With respect to adoption, facilitating conditions, satisfaction and behavioural intention were found to have a positive but insignificant impact.

Paper Nr: 96
Title:

Towards Services Profiling for Energy Management in Service-oriented Architectures

Authors:

Jorge A. Larracoechea, Philippe Roose, Sergio Ilarri, Yudith Cardinale, Sébastien Laborie and Mauricio J. González

Abstract: Even though hardware architects have managed to incrementally reduce energy consumption in information and communication technology devices, energy will always be a requisite for software execution. This has motivated researchers to develop a limited number of methodologies that promote green software development and its philosophy, with new assessment methods for calculating the energy costs of software development and software execution. Despite this, these methodologies have been acknowledged and adopted with limited success, as they try to address highly volatile variables (like human behavior) and environments with specific hardware/software platforms and language-centric solutions. This has created a conflict between theory and practice, which a generic and adaptive approach could otherwise resolve. In this paper, we present a brief review of selected research on services’ requirements definition and profiling for energy management, as well as the limitations and advantages of existing proposals in relation to green software development. Furthermore, we present our progress towards a series of properties to define services’ requirements and their resource consumption behavior. Our final goal is to create a proper approach for energy management from the analysis and design phases of the Software Development Life Cycle, using Service-Oriented Architectures as a platform for our work.

Short Papers
Paper Nr: 11
Title:

Prediction Sentiment Polarity using Past Textual Content and CNN-LSTM Neural Networks

Authors:

Yassin Belhareth and Chiraz Latiri

Abstract: Sentiment analysis in social networks plays an important role in different areas, and one of its main tasks is to determine the polarity of sentiments towards various subjects. In this paper, our goal is to create a supervised machine learning model for predicting the polarity of users’ sentiments about a predefined topic, based solely on their textual history. The proposed approach is based on two neural network architectures: long short-term memory (LSTM) and convolutional neural networks (CNN). To evaluate our system, we purposely created a collection from SemEval-2017 data. The results revealed that our approach outperforms the approach used for comparison.

Paper Nr: 12
Title:

A Website with an Activity based Traffic Indicator System as a Warning Tool for the COVID-19 Pandemic

Authors:

Justin Junsay, Aaron J. Lebumfacil, Ivan G. Tarun and William E. Yu

Abstract: This study describes an activity-based traffic indicator system that provides information for the management of the COVID-19 pandemic. The purpose of the indicator is to discern which activities (e.g., grocery shopping and sports) are dangerous, uncertain, or safe to do in the Philippines on a per-region basis, using the red, yellow, and green colors of a traffic light. The activity-based traffic indicator system does this by using a social probability model based on the birthday paradox to determine the exposure risk, which is the probability of meeting someone infected (PoMSI). Additionally, a website called SANITTISE was created to host the indicator system and to display other pandemic-related graphs. Furthermore, a user interface/user experience (UI/UX) test was conducted through a survey to measure the effectiveness of the website. The results of the test were positive, since all of the sections were well received in the survey. This suggests that the website is effective, as respondents were able to understand its contents and purpose, navigate it effectively, and draw conclusions from the information available on it.
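The abstract names a birthday-paradox-style model but does not give its formula. Under the simplifying assumption that each of n contacts is independently infected with probability p, the complement rule gives PoMSI = 1 - (1 - p)^n, sketched below in Python together with a hypothetical mapping onto the traffic-light colors. The thresholds and example numbers are placeholders, not values from the paper.

```python
def prob_meeting_infected(prevalence: float, contacts: int) -> float:
    """Probability that at least one of `contacts` encounters is with an
    infected person, assuming each person is independently infected with
    probability `prevalence` (an assumption, not the paper's exact model)."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    if contacts < 0:
        raise ValueError("contacts must be non-negative")
    # Complement rule, as in the birthday problem: 1 - P(nobody is infected)
    return 1.0 - (1.0 - prevalence) ** contacts


def traffic_light(pomsi: float, yellow_at: float = 0.1, red_at: float = 0.5) -> str:
    """Map an exposure probability to a color (hypothetical thresholds)."""
    if pomsi >= red_at:
        return "red"
    if pomsi >= yellow_at:
        return "yellow"
    return "green"


# Hypothetical region with 1 infected person per 1,000 residents
for contacts in (50, 200, 1000):
    p = prob_meeting_infected(0.001, contacts)
    print(contacts, round(p, 3), traffic_light(p))
# 50 0.049 green
# 200 0.181 yellow
# 1000 0.632 red
```

As with the birthday paradox, the risk grows much faster with the number of contacts than intuition suggests, which is why crowded activities can turn red even at low prevalence.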

Paper Nr: 16
Title:

Working in a Smart Home-office: Exploring the Impacts on Productivity and Wellbeing

Authors:

Davit Marikyan, Savvas Papagiannidis, Rajiv Ranjan and Omer Rana

Abstract: Following the outbreak of the Coronavirus (COVID-19) pandemic, many organisations shifted to remote working overnight. The new reality has created conditions for using smart home technologies for work purposes, for which they were not originally intended. The lack of insights into this new application of smart home technologies has led to two research objectives. First, the paper aimed to investigate the factors correlating with productivity and perceived wellbeing. Second, the study tried to explore individuals’ intentions to use smart home offices for remote work in the future. A total of 528 responses were gathered from individuals who had smart homes and had worked from home during the pandemic. The results showed that productivity positively relates to service relevance, perceived usefulness, perceived ease of use, hedonic beliefs, control over environmental conditions, innovativeness and attitude. Task-technology fit, service relevance, attitude to smart homes, innovativeness, hedonic beliefs, perceived usefulness, perceived ease of use and control over environmental conditions correlate with perceived wellbeing. The intention to work from smart home-offices in the future is determined by perceived wellbeing. The findings contribute to research on smart homes and remote work practices by providing the first empirical evidence about the new applications and outcomes of smart home use in the work context.

Paper Nr: 31
Title:

Technological Model using Machine Learning Tools to Support Decision Making in the Diagnosis and Treatment of Pediatric Leukemia

Authors:

Daniel Mendoza-Vasquez, Stephany Salazar-Chavez and Willy Ugarte

Abstract: In recent years, machine learning has been applied to solve problems in many different contexts, among which the health field stands out. Consequently, there is wide interest in developing machine learning models that support medical care for diseases such as pediatric cancer. Our work proposes a technological model based on machine learning that analyzes input medical data to obtain a predictive result, aimed at supporting the specialist physician's decision making in the diagnosis and treatment of pediatric leukemia. To evaluate the proposed model, a web validation system was developed that communicates with a service hosted on a cloud server, which performs the predictive analysis of the inputs entered by the physician. As a result, an accuracy of 92.86% was obtained in the diagnosis of pediatric leukemia using the multiclass boosted decision tree classification algorithm.

Paper Nr: 33
Title:

On Provisioning Procedural Geometry Workloads on Edge Architectures

Authors:

Ilir Murturi, Chao Jia, Bernhard Kerbl, Michael Wimmer, Schahram Dustdar and Christos Tsigkanos

Abstract: Contemporary applications such as those within Augmented or Virtual Reality (AR/VR) pose challenges for the software architectures supporting them, which have to meet stringent latency, data transmission, and performance requirements. This manifests in the processing of 3D models, whose contents are increasingly generated procedurally rather than explicitly, resulting in computational workloads (i.e., Procedural Geometry Workloads) with particular characteristics and resource requirements. Traditionally, such workloads are executed in resource-rich environments such as the cloud. However, the massive amount of data transfer and the heterogeneous devices and networks involved affect latency, which in turn causes low-quality visualization in user-facing applications (e.g., AR/VR). To overcome these challenges, processing elements available close to end users can be leveraged to generate 3D models instead, and as such the edge emerges as a central architectural entity. This paper describes such procedural geometry workloads, their particular characteristics, and the challenges of executing them on heterogeneous devices. Furthermore, we propose an architecture capable of provisioning procedural geometry workloads in edge scenarios.

Paper Nr: 39
Title:

ProFog: A Proactive Elasticity Model for Fog Computing-based IoT Applications

Authors:

Guilherme G. Barth, Rodrigo R. Righi, Cristiano André da Costa and Vinicius F. Rodrigues

Abstract: Today, streaming, Artificial Intelligence, and the Internet of Things (IoT) are among the main drivers accelerating process automation in various companies. These technologies are often connected to critical tasks, requiring reliable and scalable environments. Although Fog Computing has been on the rise as an alternative to address those challenges, we perceive a gap in the literature related to adapting the number of resources on both the cloud and fog layers. Multiple studies suggest different Cloud-Fog architectures for IoT implementations but do not thoroughly address elasticity control mechanisms. In this context, this article presents ProFog, a proactive elasticity model for IoT-based Cloud-Fog architectures. ProFog uses the ARIMA prediction model to anticipate load behavior, thereby triggering scaling actions as close as possible to when they are required. This strategy allows new resources to be delivered before an overloaded or underloaded state is reached, benefiting both performance and energy saving. We developed a ProFog prototype that showed an 11.21% improvement in energy consumption in favor of ProFog.
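The proactive idea, scaling on the forecast load rather than the current load, can be sketched as follows. To keep the example dependency-free, a least-squares linear trend stands in for ARIMA; the utilisation thresholds, the forecast horizon, and the function names are hypothetical and not taken from ProFog.

```python
from typing import Sequence


def linear_forecast(history: Sequence[float], steps_ahead: int) -> float:
    """Fit a least-squares line to the load history and extrapolate it.
    A deliberately simple stand-in for the ARIMA model used by ProFog."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), history))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


def scaling_decision(history: Sequence[float], capacity: float, steps_ahead: int = 3,
                     upper: float = 0.8, lower: float = 0.3) -> str:
    """Return 'scale_out', 'scale_in' or 'keep' based on the *predicted*
    utilisation, so resources change before a threshold is actually crossed."""
    predicted_utilisation = linear_forecast(history, steps_ahead) / capacity
    if predicted_utilisation > upper:
        return "scale_out"
    if predicted_utilisation < lower:
        return "scale_in"
    return "keep"


load = [40, 45, 52, 58, 63, 70]  # hypothetical requests/s per interval
print(scaling_decision(load, capacity=100))
```

With the sample history, current utilisation is 70%, below the hypothetical 80% threshold, but the extrapolated load three intervals ahead exceeds it, so the sketch scales out early; a purely reactive rule would wait.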

Paper Nr: 52
Title:

An Open-source Testbed for IoT Systems

Authors:

Augusto Ciuffoletti

Abstract: A research team that wants to validate a new IoT solution has to implement a testbed. This is a complex step, since the testbed must provide a realistic environment, which may require skills that are not present in the team. This paper explores the requirements of an IoT testbed and proposes an open-source solution based on low-cost and widely available components and technologies. The testbed implements an architecture consisting of a collector managing several edge devices. Security levels and duty cycle are tunable depending on the specific application. After analyzing the testbed requirements, the paper illustrates a template that uses WiFi for the link layer, HTTPS for structured communication, an ESP8266 board for edge units, and a Raspberry Pi for the collector.

Paper Nr: 66
Title:

Automatic Detection and Classification of Cognitive Distortions in Journaling Text

Authors:

Mai Mostafa, Alia El Bolock and Slim Abdennadher

Abstract: Cognitive distortions are negative thinking patterns that people adopt. Left undetected, they could lead to mental health problems. The goal of cognitive behavioral therapy is to correct and change cognitive distortions, which in turn helps with recovery from mental illnesses such as depression and anxiety, with overcoming addictions, and with facing common life challenges. The aim of this study is to provide a machine learning solution for the automatic detection and classification of common cognitive distortions in journaling texts. Relatively few works have explored machine learning solutions and tools in the context of cognitive behavioral therapy. Given the rising popularity of online therapy programs, this tool could be used for instant feedback and would also be a helpful service for therapists and psychiatrists to initiate and ease the detection of cognitive distortions. In this study, we provide a novel dataset that we used to train machine learning and deep learning algorithms. We then employed the best-performing model in an easy-to-use user interface.

Paper Nr: 68
Title:

A RESTful Northbound Interface for Applications in Software Defined Networks

Authors:

Abdullah Alghamdi, David Paul and Edmund Sadgrove

Abstract: Software Defined Networking (SDN) aims to help overcome the complexities inherent in traditional networks. The main concept in SDN is the decoupling of the data layer from the control layer, the latter of which is centralised in a controller. OpenFlow has been adopted as the standard protocol for the southbound interface, where the controller communicates with forwarding devices. However, the northbound interface (NBI), connecting the controller with end-user business applications, does not have an open standard. NBIs have accelerated application development because developers can implement required functionality without the need to consider matters related to the data layer, but there is an issue of compatibility because each SDN has its own NBI. In this position paper we present a plan to design a RESTful NBI for SDN applications to improve compatibility across SDN technologies.

Paper Nr: 69
Title:

Collaborative Historical Platform for Historians: Extended Functionalities in Pauliceia 2.0

Authors:

Karla D. Fook, Daniela L. Musa, Nandamudi Vijaykumar, Rodrigo M. Mariano, Gabriel R. Morais, Raphael O. Silva, Gabriel Sansigolo, Luciana Rebelo, Luís C. Ferla, Cintia Almeida, Luanna Nascimento, Vitória M. Fontes da Silva, Monaliza D. Santos, Aracele Torres, Ângela Pereira, Fernando Atique, Jeffrey Lesser, Thomas D. Rogers, Andrew G. Britt, Rafael Laguardia, Ana A. Barbour, Orlando G. Farias, Ariana Marco, Caróu Dickinson and Sand Tamires P. Camargo

Abstract: The paper discusses a platform developed to support the cataloguing of maps, data, images, audio, and video files for researchers in the area of Digital Humanities. The platform is open source and online. Its most important aspect is that it is entirely collaborative, i.e., those interested in history can upload information as well as download the available datasets. The platform development relies mainly on Geocoding and VGI (Volunteered Geographic Information), the latter being based on crowdsourcing. Besides showing the platform’s functionalities, the paper also presents several relevant improvements requested by the community of historians after using the platform.

Paper Nr: 74
Title:

A Practical Experience Applying Security Audit Techniques in an Industrial e-Health System Which Uses an Open Source ERP

Authors:

Julián Gómez, Miguel Á. Olivero, J. A. García-García and María J. Escalona

Abstract: Healthcare is an ever-innovative field in which modernization is moving forward in giant steps. This modernization, so-called “digitization”, raises some concerns that should be carefully considered. Currently, the most sensitive concerns in this field are the management of Electronic Health Records and patients’ data privacy. Health-related data in healthcare systems are subject to strict regulations, such as the EU’s General Data Protection Regulation (GDPR), whose non-compliance imposes huge penalties and fines. Cybersecurity in healthcare plays an important role in protecting these sensitive data, which are highly valuable to criminals. Security experts follow existing security frameworks to orchestrate the security assessment process, so that auditing is as complete and as organized as possible. This study extends the lifecycle of a security assessment framework and conducts an exploitation and vulnerability analysis on an actual industrial scenario. The results of this security audit show that even if a system is heavily fortified, some vulnerabilities can still remain.

Paper Nr: 83
Title:

Anomaly Detection in Multivariate Spatial Time Series: A Ready-to-Use Implementation

Authors:

Chiara Bachechi, Federica Rollo, Laura Po and Fabio Quattrini

Abstract: IoT technologies, together with AI and edge computing, will drive the evolution of Smart Cities. IoT devices are being adopted exponentially in the urban context to implement real-time monitoring of environmental variables or city services such as air quality, parking slots, traffic lights, traffic flows, public transport, etc. IoT observations are usually associated with a specific location and time slot; therefore, they are spatio-temporal collections of data. Moreover, since IoT devices are generally low-cost and low-maintenance, their data can be affected by noise and errors. For this reason, there is an urgent need for anomaly detection techniques able to recognize errors and noise in sensor data streams. The Spatio-Temporal Behavioral Density-Based Clustering of Applications with Noise (ST-BDBCAN) algorithm, combined with the Spatio-Temporal Behavioral Outlier Factor (ST-BOF), employs both spatial and temporal dimensions to evaluate the distance between sensor observations and detect anomalies in spatial time series. In this paper, a Python implementation of ST-BOF and ST-BDBCAN in the context of IoT sensor networks is described. The implemented solution has been tested on the traffic flow data stream of the city of Modena. Four experiments with different parameter settings are compared to highlight the versatility of the proposed implementation in detecting sensor faults and also recognizing unusual traffic conditions.

Paper Nr: 86
Title:

Vulnerability Assessment of Angolan University Web Applications

Authors:

Emanuel Mateus and Carlos Serrão

Abstract: Vulnerability assessment is one of the technical procedures that can help prevent serious security breaches, which, when exploited, can undermine brand credibility and/or the continuity of a business. Universities hold and process relevant and sensitive student and staff information that is appealing to attackers, and disclosure of such information might affect the organisations' credibility. This work presents a study conducted to assess the security status of Angolan universities' web applications, identifying the most frequent security vulnerabilities and their criticality, using the OWASP Top 10 and CWE Top 25 references to identify and validate the findings of the automatic vulnerability assessment process.

Paper Nr: 93
Title:

A Model Driven Framework for the Development of Adaptable REST SERVICES

Authors:

Adil Kenzi and Fadoua Yakine

Abstract: REST (Representational State Transfer) is an architectural style for distributed, open, loosely coupled, and decentralized hypermedia systems such as the Web. In the context of this architectural style, RESTful Web services have gained significant attention in both academia and industry. RESTful services may interact with several types of service requesters. Therefore, the key issue is how to deal with the challenge of adaptability of RESTful services. In this paper, we propose a framework for the development of adaptable REST services. The core building blocks of this framework are a Unified Modeling Language profile called RESTVSoaML and its associated tool support, RESTVSoaMLTool. RESTVSoaML aims at modeling adaptable RESTful Web services regardless of standards and implementation platforms. RESTVSoaMLTool is an MDD tool that uses a model transformation language to generate code from high-level models defined with our RESTVSoaML profile. In particular, it permits the generation of the description of each RESTful service and its implementation.

Paper Nr: 7
Title:

Development and Performance Analysis of RESTful APIs in Core and Node.js using MongoDB Database

Authors:

Endrit Shkodra, Edmond Jajaga and Mehmet Shala

Abstract: The purpose of this paper is to present a comparative study of Core and Node.js for the development of Representational State Transfer (RESTful) Application Programming Interfaces (APIs) using the MongoDB non-relational database. The study covers Create, Read, Update and Delete (CRUD) functionality, authentication and authorization using JSON (JavaScript Object Notation) Web Tokens, as well as the ease of development and development time of the two competing technologies. Tests show that, in general, the performance of the two technologies does not differ much: depending on the test, either technology may perform better than the other. However, Core outperforms Node.js in a test case with large loads.
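For readers unfamiliar with JSON Web Tokens, the HS256 variant can be sketched with the Python standard library alone: a base64url-encoded header and claims set, signed with HMAC-SHA256. This sketch is independent of the paper's Core and Node.js implementations, skips claim validation such as `exp`, and is not a substitute for a maintained JWT library; the secret and claims are placeholders.

```python
import base64
import hashlib
import hmac
import json


def _b64url_encode(data: bytes) -> str:
    # JWTs use base64url encoding with the trailing '=' padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url_encode stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Return a compact 'header.claims.signature' token."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Check the HMAC signature and return the claims; raise ValueError otherwise."""
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{claims_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(_b64url_decode(claims_b64))


token = sign_hs256({"sub": "user-42", "role": "editor"}, secret=b"change-me")
print(verify_hs256(token, secret=b"change-me"))
```

In an API like the ones compared in the paper, the server would issue such a token after login and verify it on each protected CRUD request; authorization decisions would then read claims such as the hypothetical `role` field.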

Paper Nr: 28
Title:

Carsharing System for Urban Transport in Lima using Internet of Things

Authors:

Jean P. Vásquez-Garaya, Elizabeth Munayco-Apolaya and Willy Ugarte

Abstract: Carsharing has become a trend in the transport industry, growing exponentially in recent years and gaining popularity in large cities in Europe and Asia, such as Madrid, Berlin, and Amsterdam, among others. This work presents an implementation of a carsharing system that activates all of the car’s functionalities through an application, without the need for additional elements such as cards or physical keys. By connecting the car and the mobile application through an IoT device backed by a cloud infrastructure, it offers a new, technology-based mobility modality in the city of Lima that is flexible, safe, and affordable for most people; in addition, it brings the concepts of the shared economy and the uberization of things to Peru. We note that Lima has a deplorable, disorderly, and low-quality urban transport system that generates many problems for users. The first alternative would be to acquire a private car, but this is not accessible to many people, especially those between 20 and 45 years old, since it entails a large budget covering the cost of the car, maintenance, security permits, and more, which many people cannot fully afford. We report on the use of technologies such as the Internet of Things, Cloud Computing, and mobile applications, within an innovative technological architecture that incorporates new advances in the automotive field, such as electric cars.

Paper Nr: 59
Title:

A Survey Study and Analysis of Task Scheduling Approaches for the Major Computing Environments

Authors:

Dalia A. Rizk, Hoda M. Hosny, El-Sayed M. El-Horbaty and Abdel-Badeeh M. Salem

Abstract: Nowadays, task scheduling is a central point of attention in cloud computing. Retrieving, storing, or computing/analysing data on the cloud are typical examples of such tasks. Due to the huge amount of data found on the cloud and the need for deep analysis and heavy computation, scheduling tasks appropriately for execution is becoming increasingly important. In this paper, we present diverse types of task scheduling algorithms for different environments, namely cloud, grid, and fog, under the two widely known types of task representation (task-based and workflow-based).

Paper Nr: 89
Title:

Experimental Evaluation of the Message Formats’ Impact for Communication in Multi-party Edge Computing Applications

Authors:

Tarciso O. Filho, Dalton G. Valadares, Thiago F. Meneses, Adauto Ferreira de Barros, Aramis S. Araujo and Danilo S. Santos

Abstract: The increasing integration of 5G, multi-access edge computing (MEC), and microservices benefits the development of applications that demand low coupling, low communication latency, high scalability, and high availability. A typical scenario with such requirements is a video application, either to run inference on video images or to perform video analytics. Given that video data are considered heavy to process and transmit, the best way to handle such data should be investigated. This work presents an experimental setup for comparing four data formats used to send video frames among distributed application components in a MEC server. We measured and analyzed the communication latency when sending video data between distributed parties, considering three scenarios.

Area 3 - SOCIAL NETWORK ANALYTICS

Full Papers
Paper Nr: 47
Title:

Impact of Tailored Network Splitting and Community Features’ Change Rates on Prediction Accuracy in Dynamic Social Networks

Authors:

Narimene Dakiche, Karima Benatchba, Fatima Benbouzid-Si Tayeb and Yahya Slimani

Abstract: Predicting community evolution, which has various real-life applications, is a challenging task in the field of social network analysis. In this paper, we analyze the accuracy of community evolution prediction in dynamic social networks. The proposed approach combines two key concepts, aiming to enrich the prediction model with additional information that could improve the results: (1) a tailored network splitting that produces snapshots spanning different periods rather than a single fixed period, and (2) the change rates of the features that characterize communities over time, instead of the absolute values of those features. Our experiments on four real-world social networks confirm that community evolution prediction can be achieved with very high accuracy by using both tailored network splitting, as a first step of the prediction process, and the change rates of features.
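The difference between absolute feature values and change rates can be illustrated with a short sketch. The feature names (`size`, `density`) and the relative-change formula (x_t - x_{t-1}) / x_{t-1} are assumptions for illustration; the paper may define its change rates differently.

```python
def change_rates(snapshots: list[dict[str, float]]) -> list[dict[str, float]]:
    """Relative change of every feature between consecutive snapshots.

    A feature whose previous value is 0 gets a rate of 0.0, an arbitrary
    convention chosen here to avoid division by zero.
    """
    rates = []
    for previous, current in zip(snapshots, snapshots[1:]):
        rates.append({
            name: (current[name] - previous[name]) / previous[name] if previous[name] else 0.0
            for name in current
        })
    return rates


# One hypothetical community observed over three snapshots.
community = [
    {"size": 10, "density": 0.50},
    {"size": 12, "density": 0.40},
    {"size": 9, "density": 0.45},
]
for step, rates in enumerate(change_rates(community), start=1):
    print(step, {name: round(value, 3) for name, value in rates.items()})
```

Each row of rates could then serve as the feature vector for a classifier that predicts the community's next evolution event, in place of the raw snapshot values.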

Paper Nr: 48
Title:

Commercial Sentiment Analysis Solutions: A Comparative Study

Authors:

Tatiana Ermakova, Max Henke and Benjamin Fabian

Abstract: Empirical insights into highly promising commercial sentiment analysis solutions that go beyond their vendors’ claims are rare. Moreover, because the field is constantly evolving, earlier studies are far from reflecting the current situation. The present research aims to evaluate and compare current solutions. Based on tweets about airline service quality, we test the solutions of six vendors with different market power, namely Amazon, Google, IBM, Microsoft, Lexalytics, and MeaningCloud, and report their accuracy, precision, recall, (macro) F1, time performance, and service level agreements (SLA). For positive and neutral classifications, none of the solutions showed a precision of over 70%. For negative classifications, all of them demonstrate a high precision of around 90%; however, only IBM Watson NLU and Google Cloud Natural Language achieve a recall of over 70% and thus can be seen as worth considering for application scenarios where negative text detection is a major concern. Overall, our study shows that an independent, critical experimental analysis of sentiment analysis services can provide interesting insights into their general reliability and particular classification accuracy beyond marketing claims, making it possible to critically compare solutions based on real-world data and to analyze potential weaknesses and margins of error before making an investment.
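The metrics reported above can be computed from gold and predicted labels as in the following sketch, which treats each sentiment class one-vs-rest and averages the per-class F1 scores into macro F1. The label set and sample predictions are hypothetical, not the study's data.

```python
from collections import Counter


def evaluate(y_true, y_pred, labels):
    """Accuracy, per-class precision/recall/F1 (one-vs-rest), and macro F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted class received a wrong item
            fn[gold] += 1  # the gold class missed an item
    per_class = {}
    for label in labels:
        p_den, r_den = tp[label] + fp[label], tp[label] + fn[label]
        # Convention: a metric with an empty denominator is reported as 0.0.
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(tp.values()) / len(y_true)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    return accuracy, per_class, macro_f1


labels = ["negative", "neutral", "positive"]
gold = ["negative", "negative", "neutral", "positive", "negative"]
pred = ["negative", "neutral", "neutral", "negative", "negative"]
accuracy, per_class, macro_f1 = evaluate(gold, pred, labels)
print(accuracy, per_class["negative"], round(macro_f1, 3))
```

Macro F1 weights every class equally, which is why a service that never recognizes one class (here `positive`) is penalized heavily even when its accuracy looks acceptable.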

Short Papers
Paper Nr: 3
Title:

How Does the Indonesian Government Communicate Food Security during COVID-19 Pandemic: A Social Media Analysis on Indonesia Official Twitter Account

Authors:

Dimas Subekti, Eko P. Purnomo, Lubna Salsabila and Aqil T. Fathani

Abstract: This study aims to determine how food security was communicated during the COVID-19 pandemic by analyzing the Indonesian Government's official Twitter accounts. The research uses NVivo 12 Plus to analyze data through chart, cluster, and word cloud analyses. The data came from the Twitter accounts of the Food Security Agency and the Logistics Affairs Agency, which were chosen because these agencies are responsible for Indonesia's food security. The findings show that the Food Security Agency is more dominant than the Logistics Affairs Agency in discussing content related to agriculture, availability of foodstuffs, food needs, and food prices. Meanwhile, the Logistics Affairs Agency is superior in communicating content about rice availability during the COVID-19 pandemic. The content topics are related to one another, but the strongest link is between foodstuffs and rice availability. The communication narrative of the Food Security Agency and the Logistics Affairs Agency with the Indonesian people during the COVID-19 pandemic concerns rice, prices, food, and Indonesian farmers. The Logistics Affairs Agency had a higher communication intensity with the Indonesian people than the Food Security Agency in the period from early 2020 to March 2021.

Paper Nr: 14
Title:

Opinion and Sentiment Analysis of Twitter Users during the 2021 Ecuador Presidential Election

Authors:

Jorge Parraga-Alava, Jorge Rodas-Silva, Iván Quimi and Roberth Alcivar-Cevallos

Abstract: Social media data have been used for opinion and sentiment analysis and seem to have the potential to reflect the political picture of many territories. This paper analyzes users' opinions and sentiments about the organization and candidates of the 2021 Ecuadorian presidential election to determine whether these can be considered a relevant factor for predicting election outcomes in this country. We used a social media analytics methodology with four phases. The first two correspond to data acquisition and pre-processing, where the Twitter search API was used to fetch election-related tweets, which were then converted into a structured format. In the third phase, an opinion analysis was performed to provide statistics about the number of tweets and users, hashtags, mentions, and word clouds. In the fourth phase, we assessed the emotional attitude of users toward the presidential candidates using sentiment analysis. The results showed that most users’ opinions reflected positive sentiment about presidential candidate Arauz in the first round. In the second round, on the other hand, presidential candidate Lasso captured a more significant positive response from Twitter users than in the first round, achieving a close result over candidate Arauz. Finally, we conclude that there is a correspondence between the positive sentiments expressed in the tweets and the total votes obtained by the candidates.

Paper Nr: 18
Title:

Recommendation Recovery with Adaptive Filter for Recommender Systems

Authors:

José M. Blanco, Mouzhi Ge and Tomáš Pitner

Abstract: Most recommender systems focus on suggesting optimal recommendations rather than on finding a way to recover from a failed recommendation. Thus, when a failed recommendation appears several times, users may abandon the recommender system, considering that it does not take their preferences into account. One reason is that when a user does not like a recommendation, this preference cannot be captured instantly by the recommender's learning model, since the model cannot be updated constantly. Although this can be alleviated to some extent by critique-based algorithms, fine-tuning the preferences cannot fully expel non-preferred items. This paper therefore proposes a recommendation recovery solution with an adaptive filter that deals with failed recommendations while keeping users engaged, in turn allowing the recommender system to become a long-term application. It also avoids the cost of constantly updating the recommender's learning model.
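One possible reading of an adaptive post-filter is sketched below: it sits on top of an unchanged recommender and removes items a user has rejected, plus whole categories after repeated rejections. The class, the category-blocking rule, and `category_threshold` are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


class AdaptiveFilter:
    """Hypothetical post-filter placed after an unchanged recommender model.

    Rejected items are never shown again to that user, and a whole category
    is hidden once the user has rejected `category_threshold` of its items.
    """

    def __init__(self, category_threshold: int = 3):
        self.category_threshold = category_threshold
        self.rejected_items = defaultdict(set)           # user -> {item}
        self.category_rejections = defaultdict(Counter)  # user -> Counter(category)

    def reject(self, user: str, item: str, category: str) -> None:
        """Record a failed recommendation instantly, without retraining."""
        self.rejected_items[user].add(item)
        self.category_rejections[user][category] += 1

    def apply(self, user: str, ranked: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
        """Filter the base model's ranked (item, category) list and keep the top k."""
        blocked = {
            category
            for category, count in self.category_rejections[user].items()
            if count >= self.category_threshold
        }
        kept = [
            (item, category)
            for item, category in ranked
            if item not in self.rejected_items[user] and category not in blocked
        ]
        return kept[:k]


ranked = [("a", "x"), ("b", "x"), ("c", "y"), ("d", "x"), ("e", "z")]
f = AdaptiveFilter(category_threshold=2)
f.reject("u1", "a", "x")
print(f.apply("u1", ranked, k=3))
f.reject("u1", "b", "x")
print(f.apply("u1", ranked, k=3))
```

Because the filter only reads the base model's output, a rejection takes effect on the very next request, while the learning model can still be retrained on its own schedule.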

Paper Nr: 25
Title:

A Mixed Model for Identifying Fake News in Tweets from the 2020 U.S. Presidential Election

Authors:

Vítor Bernardes and Álvaro Figueira

Abstract: The recent proliferation of so-called “fake news” content, assisted by the widespread use of social media platforms and with serious real-world impacts, makes it imperative to find ways to mitigate this problem. In this paper, we propose a machine learning-based approach to tackle it by automatically identifying tweets associated with questionable content, using newly collected data from Twitter about the 2020 U.S. presidential election. To create a sizable annotated data set, we use an automatic labeling process based on the factual reporting level of links contained in tweets, as classified by human experts. We derive relevant features from that data and investigate the specific contribution of features derived from named entity and emotion recognition techniques, including a novel approach using sequences of prevalent emotions. We conclude the paper by evaluating and comparing the performance of several machine learning models on different test sets, and show that they are applicable to addressing the issue of fake news dissemination.

Paper Nr: 64
Title:

Predicting Stock Market Movements with Social Media and Machine Learning

Authors:

Paraskevas Koukaras, Vasiliki Tsichli and Christos Tjortjis

Abstract: Microblogging data analysis and sentiment extraction have become a popular approach for market prediction. However, this kind of data contains noise, and it is difficult to distinguish truly valid information. In this work, we collected 782,459 tweets from 2018/11/01 to 2019/07/31. For each day, we create a graph (271 graphs in total) describing users and their followers. We use each graph to obtain a PageRank score, which is multiplied by the sentiment data. Findings indicate that using an importance-based measure, such as PageRank, can improve the scoring ability of the applied prediction models. This approach is validated using three datasets (PageRank, economic, and sentiment). On average, the PageRank dataset achieved a lower mean squared error than the economic and sentiment datasets. Finally, we tested multiple machine learning models, showing that XGBoost is the best model, with random forest second best and LSTM the worst.
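The PageRank weighting can be sketched with a plain power iteration over a follower graph, with each user's PageRank multiplied by that user's sentiment. Edge direction (from follower to followed user, so that followed users gain importance), the damping factor of 0.85, the fixed iteration count, and summing the products into one daily score are assumptions of this sketch, not details confirmed by the paper.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a list of (follower, followed) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for follower, followed in edges:
        out_links[follower].append(followed)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling user (follows nobody): spread its rank uniformly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


def weighted_sentiment(edges, sentiment_by_user):
    """Daily score: sum of each user's PageRank multiplied by their sentiment."""
    ranks = pagerank(edges)
    return sum(ranks.get(user, 0.0) * s for user, s in sentiment_by_user.items())


# Hypothetical one-day graph and per-user sentiment in [-1, 1].
edges = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]
sentiment = {"alice": 0.2, "bob": 0.9, "carol": -0.5}
print(weighted_sentiment(edges, sentiment))
```

Here `carol`'s negative sentiment counts for little because nobody follows her, which is the intended effect of weighting noisy tweets by user importance.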

Paper Nr: 76
Title:

Real-time Recommendation System for Stock Investment Decisions

Authors:

Artur Bugaj and Weronika T. Adrian

Abstract: Recommendation systems have become omnipresent, helping people make decisions in various areas. While most systems can give accurate recommendations, their learning procedures can be time-consuming. In some cases, this is not acceptable, for example when the information about items and users changes very quickly. In this paper, we discuss a new recommendation engine, based on labelled property graph knowledge representation and attributed network embeddings, which calculates real-time recommendations for stock investment decisions. In particular, we demonstrate an application of the DANE (dynamic attributed network embedding) framework proposed by Li et al. and show the promising results of the system.

Paper Nr: 38
Title:

A Ridesharing Recommendation Framework with Hard and Soft Constraints

Authors:

Mow F. Kim and Hamidah Ibrahim

Abstract: Ridesharing, the act or practice of sharing automobile trips, has become very popular due to the many benefits it provides not only to society and the economy but also to the environment. Several ridesharing frameworks and applications have been proposed; however, in identifying a ridesharing group consisting of a driver and passengers (also known as riders), most of these solutions rely on hard constraints, which include timeslot (departure and arrival), location (departure and arrival), and vehicle capacity. Since these people will be sharing a ride and are strangers to each other, it is important to consider their preferences in identifying an ideal group. These preferences, called soft constraints in this paper, include, among others, race, age group, gender, and non-smoking. They are meant to ensure a pleasant, cosy, and, most importantly, safe journey. Hence, this paper proposes a ridesharing recommendation framework that aims at identifying an ideal group by considering both hard and soft constraints. The framework is then embedded into a mobile application prototype named SAGE, which aims to provide a safe, available, green, and economical ridesharing service.
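The two-stage idea, filtering on hard constraints and then ranking by soft constraints, can be sketched for a single rider and a pool of drivers. The data classes, the exact-match origin/destination rule, and the fraction-of-preferences-matched score are hypothetical simplifications; the framework described above forms whole groups rather than single driver-rider pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Driver:
    name: str
    origin: str
    destination: str
    departure: int  # minutes after midnight
    free_seats: int
    attributes: dict = field(default_factory=dict)


@dataclass
class Rider:
    name: str
    origin: str
    destination: str
    earliest: int  # acceptable departure window, minutes after midnight
    latest: int
    preferences: dict = field(default_factory=dict)


def meets_hard_constraints(driver: Driver, rider: Rider) -> bool:
    """Location, timeslot and capacity must all be satisfied."""
    return (
        driver.origin == rider.origin
        and driver.destination == rider.destination
        and rider.earliest <= driver.departure <= rider.latest
        and driver.free_seats > 0
    )


def soft_score(driver: Driver, rider: Rider) -> float:
    """Fraction of the rider's preferences that the driver matches."""
    if not rider.preferences:
        return 1.0
    matched = sum(driver.attributes.get(k) == v for k, v in rider.preferences.items())
    return matched / len(rider.preferences)


def recommend(drivers: list[Driver], rider: Rider) -> list[Driver]:
    """Drop drivers that violate a hard constraint, then rank by soft score."""
    feasible = [d for d in drivers if meets_hard_constraints(d, rider)]
    return sorted(feasible, key=lambda d: soft_score(d, rider), reverse=True)


drivers = [
    Driver("D1", "A", "B", 480, 2, {"non_smoking": True, "age_group": "30-39"}),
    Driver("D2", "A", "B", 490, 1, {"non_smoking": True, "age_group": "20-29"}),
    Driver("D3", "A", "C", 480, 3, {"non_smoking": True}),
]
rider = Rider("R1", "A", "B", 470, 500, {"non_smoking": True, "age_group": "20-29"})
print([d.name for d in recommend(drivers, rider)])
```

Hard constraints act as a strict filter (D3 goes to the wrong destination and is never shown), whereas soft constraints only reorder the feasible candidates, so a rider still gets a ride when no driver matches every preference.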

Area 4 - WEB INTELLIGENCE and SEMANTIC WEB

Full Papers
Paper Nr: 19
Title:

When owl:sameAs is the Same: Experimenting Online Resolution of Identity with SPARQL Queries to Linked Open Data Sources

Authors:

Raphaël Gazzotti and Fabien Gandon

Abstract: Equivalence links are the cornerstone of Linked Data and its integration. However, they are not easy to establish and manipulate, since the Web is always evolving, with datasets emerging and disappearing. Inconsistencies may also be present on the Web, leading to erroneous assertions and inferences. We propose a method to identify the owl:sameAs relationships of a resource relying on online SPARQL querying of distributed datasets, and to correct the results using declarative curation rules. We also exploit and inspect the quality of owl:InverseFunctionalProperty and owl:FunctionalProperty relationships, using the definitions given by their schemata, endpoints, and a voting approach. We evaluate our method on an existing benchmark and compare it to state-of-the-art baselines. We show that a heuristic approach can retrieve high-quality equivalence links without requiring the extraction of all the alleged existing equivalence relations.
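Because owl:sameAs is symmetric and transitive under OWL semantics, links gathered from several endpoints partition resources into equivalence classes. The union-find sketch below shows only that closure step; it performs no SPARQL querying and none of the curation rules or voting described above, and the example IRIs are hypothetical.

```python
class IdentitySets:
    """Union-find over resource IRIs connected by owl:sameAs links."""

    def __init__(self):
        self.parent: dict[str, str] = {}

    def find(self, iri: str) -> str:
        self.parent.setdefault(iri, iri)
        root = iri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited IRI directly at the root.
        while self.parent[iri] != root:
            next_iri = self.parent[iri]
            self.parent[iri] = root
            iri = next_iri
        return root

    def add_same_as(self, a: str, b: str) -> None:
        """Record an asserted owl:sameAs link (symmetric by construction)."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def equivalents(self, iri: str) -> set[str]:
        """All known IRIs in the same identity class, including iri itself."""
        root = self.find(iri)
        return {other for other in list(self.parent) if self.find(other) == root}


links = [  # hypothetical links returned by different SPARQL endpoints
    ("http://example.org/a/Paris", "http://example.org/b/paris"),
    ("http://example.org/b/paris", "http://example.org/c/Q-Paris"),
    ("http://example.org/a/Lyon", "http://example.org/b/lyon"),
]
identities = IdentitySets()
for subject, obj in links:
    identities.add_same_as(subject, obj)
print(sorted(identities.equivalents("http://example.org/c/Q-Paris")))
```

The closure also shows why erroneous links are costly: a single wrong owl:sameAs assertion merges two whole identity classes, which is what curation rules of the kind described above aim to prevent.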

Paper Nr: 32
Title:

CONCORDIA: COmputing semaNtic sentenCes for fRench Clinical Documents sImilArity

Authors:

Khadim Dramé, Gorgoumack Sambe and Gayo Diallo

Abstract: Detecting similar sentences or paragraphs is a key issue when dealing with text duplication. This is the case, for instance, in the clinical domain, for identifying the same multi-occurring events. Due to a lack of resources, this task is a key challenge for French clinical documents. In this paper, we introduce CONCORDIA, an approach for computing semantic similarity between sentences within French clinical texts based on supervised machine learning algorithms. After briefly reviewing various semantic textual similarity measures reported in the literature, we describe the approach, which relies on Random Forest, Multilayer Perceptron, and Linear Regression algorithms to build supervised models. These models are then used to determine the degree of semantic similarity between clinical sentences. CONCORDIA is evaluated with the classical Spearman correlation and EDRM evaluation metrics on standard benchmarks provided in the context of the DEFT 2020 text mining challenge. According to the official DEFT 2020 challenge results, the CONCORDIA Multilayer Perceptron-based algorithm achieves the best performance among all participating systems, reaching an EDRM of 0.8217.
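Spearman correlation, one of the two evaluation metrics mentioned above, is the Pearson correlation of rank-transformed scores, with tied values receiving their average rank. A dependency-free sketch follows; the sample scores are hypothetical, and EDRM, which is specific to the DEFT challenge, is not reproduced here.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


gold = [0, 1, 3, 4, 5]  # hypothetical gold similarity scores
predicted = [0.4, 1.2, 2.9, 2.9, 4.8]
print(round(spearman(gold, predicted), 4))
```

Because only ranks matter, a model whose raw similarity scores are systematically shifted or stretched still obtains a high Spearman correlation as long as it orders sentence pairs correctly.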

Paper Nr: 49
Title:

A Comparison of Smart Grids Domain Ontologies

Authors:

José M. Blanco, Bruno Rossi and Tomáš Pitner

Abstract: Smart Grids (SGs) are among the key critical infrastructures. Over time, several ontologies have been defined in the SG domain to model aspects such as the integration of devices and sensors and the communication needs of prosumers. In this paper, we review the state of the art regarding semantic web reasoning in the SG domain. We compare five main ontologies in terms of descriptive statistics (e.g., number of axioms), load time and reasoner runtime performance. Results show that not all the ontologies in the SG domain are readily available, and that some of them might be more appropriate for deployment on devices with limited computational resources.

Paper Nr: 51
Title:

Lbl2Vec: An Embedding-based Approach for Unsupervised Document Retrieval on Predefined Topics

Authors:

Tim Schopf, Daniel Braun and Florian Matthes

Abstract: In this paper, we consider the task of retrieving documents on predefined topics from an unlabeled document dataset using an unsupervised approach. The proposed approach requires only a small number of keywords describing the respective topics and no labeled documents. Existing approaches rely heavily either on a large amount of additionally encoded world knowledge or on term-document frequencies. In contrast, we introduce a method that learns jointly embedded document and word vectors solely from the unlabeled document dataset in order to find documents that are semantically similar to the topics described by the keywords. The proposed method requires almost no text preprocessing while remaining effective at retrieving relevant documents with high probability. When successively retrieving documents on different predefined topics from publicly available and commonly used datasets, we achieved an average area under the receiver operating characteristic curve (AUC) of 0.95 on one dataset and 0.92 on another. Furthermore, our method can be used for multiclass document classification without the need to assign labels to the dataset in advance. Compared with an unsupervised classification baseline, we increased F1 scores from 76.6 to 82.7 and from 61.0 to 75.1 on the respective datasets. For easy replication of our approach, we make the developed Lbl2Vec code publicly available as a ready-to-use tool under the 3-Clause BSD license.
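The retrieval results above are reported as AUC. AUC equals the probability that a randomly chosen relevant document is scored above a randomly chosen irrelevant one (the Mann-Whitney formulation). The Python sketch below computes it from any per-document relevance scores. The scores and labels are hypothetical, and the sketch does not reproduce how Lbl2Vec itself produces its scores.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via its Mann-Whitney interpretation.

    AUC is the probability that a randomly chosen relevant item (label 1)
    receives a higher score than a randomly chosen irrelevant one (label 0);
    ties count as half a win. O(P * N), which is fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical similarity scores between one topic and six documents,
# with labels saying whether each document really belongs to the topic.
scores = [0.91, 0.85, 0.40, 0.62, 0.20, 0.05]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 8/9, about 0.889
```

An average AUC of 0.95 would mean that, for a typical topic, a relevant document outranks an irrelevant one about 95% of the time.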

Short Papers
Paper Nr: 21
Title:

MENTORS: Monitoring Environment for System of Systems

Authors:

Antonello Calabrò, Said Daoudagh and Eda Marchetti

Abstract: Context: Systems of Systems (SoSs) are an increasingly widespread architecture, used in many everyday contexts. Therefore, when a new device is integrated into an existing SoS, mechanisms able to effectively assess and prevent anomalous and dangerous situations are necessary. Objective: The aim is to define a reference environment for monitoring and assessing the behavior of an SoS when a new device is added. Method: In this paper, we present MENTORS, a monitoring environment for SoSs. MENTORS is based on semantic web technologies to formally represent SoS and monitoring knowledge through a core ontology called MONTOLOGY. Results and Conclusion: We defined the conceptual model of MENTORS, which is composed of two phases, off-line and on-line, supported by a reference architecture that allows its (semi-)automation. Validation of the proposal with real use cases is planned as future work.

Paper Nr: 23
Title:

Optimizing Sales Forecasting in e-Commerce with ARIMA and LSTM Models

Authors:

Konstantinos N. Vavliakis, Andreas Siailis and Andreas L. Symeonidis

Abstract: Sales forecasting is the process of estimating future revenue by predicting the amount of products or services a sales unit will sell in the near future. Although significant advances have been made in developing sales forecasting techniques over the past decades, the problem is so diverse and multi-dimensional that high-accuracy predictions can be achieved only in a few cases. In this work, we propose a new hybrid model, suitable for modeling both linear and non-linear sales trends, that combines an ARIMA (autoregressive integrated moving average) model with an LSTM (long short-term memory) neural network. Since the primary focus of our work is predicting e-commerce sales, we incorporated the value of the final sale into our solution, as it greatly affects sales in highly competitive and price-sensitive environments like e-commerce. We compare the proposed solution against three competing solutions using a dataset from a real-life e-commerce store, and we show that our solution outperforms all three.

Paper Nr: 26
Title:

Text Analytics Can Predict Contract Fairness, Transparency and Applicability

Authors:

Nicola Assolini, Adelaide Baronchelli, Matteo Cristani, Luca Pasetto, Francesco Olivieri, Roberto Ricciuti and Claudio Tomazzoli

Abstract: There is growing attention in the political economics research community to the potential of text analytics for classifying documents with economic content. This interest extends the data analytics approach that has traditionally served as the basis of economic theory from a scientific perspective. To devise a generally applicable prediction method, we identify the phases of a methodology and perform tests on a large, well-structured repository of resource contracts, the majority of which involve mining resources. In this paper, we show that, by using text analytics measures, we can cluster these documents on three indicators: fairness of the contract content, transparency of the documents themselves, and applicability of the contract clauses intended to guarantee execution on an international basis. We achieve these results, which are consistent with a gold standard obtained from human experts, using text similarity based on the basic notions of bag of words, the tf-idf index, and three distinct cut-off measures.
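The similarity machinery named above (bag of words, tf-idf, a cut-off) can be sketched in a few lines of Python. Each text becomes a tf-idf-weighted bag-of-words vector, texts are compared by cosine similarity, and a threshold turns the similarity into a label. The clauses and the 0.3 cut-off below are hypothetical. The paper's three cut-off measures are not described in the abstract and are not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Bag-of-words tf-idf vectors: raw term count x log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical snippets: a reference clause and two contract clauses.
docs = [
    "the company shall pay royalties to the state",
    "the company shall pay taxes to the state",
    "disputes go to international arbitration",
]
vecs = tfidf_vectors(docs)
CUTOFF = 0.3  # hypothetical threshold, not taken from the paper
for i in (1, 2):
    sim = cosine(vecs[0], vecs[i])
    print(i, round(sim, 3), "similar" if sim >= CUTOFF else "dissimilar")
```

A term that occurs in every document (here "to") gets an idf of zero and contributes nothing to similarity. This is why tf-idf downweights boilerplate wording shared by all contracts.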

Paper Nr: 57
Title:

Harnessing Hypermedia MAS and Microservices to Deliver Web Scale Agent-based Simulations

Authors:

Rem Collier, Seán Russell and Fatemeh Golpayegani

Abstract: This paper presents a vision for a new breed of Agent-Based Simulations built on the technology of the Web. Inspired by the recently proposed concept of Hypermedia Multi-Agent Systems, which combines the concepts of Hypermedia Systems, the Semantic Web and Affordances, we propose a novel approach to implementing complex agent-based simulations from suites of loosely coupled, reusable components in a manner that ensures scalability.

Paper Nr: 77
Title:

Set Expander: A Knowledge-based System for Entity Set Expansion

Authors:

Weronika T. Adrian and Paweł Caryk

Abstract: Entity Set Expansion (ESE) is a problem that underlies several important tasks, such as entity recommendation, query expansion and synonym finding. Traditional strategies relied on corpus-based methods to recognize the intended category of the input words. However, with the growing importance and visibility of knowledge graphs, new methods based on explicit knowledge representation have been put forward to solve the ESE problem. In this paper, we review the existing knowledge-based methods for entity set expansion and introduce a new online tool called Set Expander that uses semantic technologies and knowledge bases to solve the ESE problem efficiently. We present the algorithms and techniques implemented to ensure the responsiveness and effectiveness of the tool. We analyze the strengths and weaknesses of the proposed solution and outline future research directions.
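A knowledge-based approach to ESE can be illustrated with a generic shared-feature heuristic. The features (predicate and object pairs) common to all seed entities approximate the intended category. Other entities are then ranked by how many of those features they have. The minimal Python sketch below runs over a toy in-memory triple list. The triples are hypothetical, and this is not the Set Expander algorithm, which the abstract does not detail.

```python
from collections import Counter


def expand_set(seeds, triples, top_k=3):
    """Rank non-seed entities by how many (predicate, object) features they
    share with the features common to every seed entity."""
    if not seeds:
        raise ValueError("need at least one seed entity")
    features = {}
    for subj, pred, obj in triples:
        features.setdefault(subj, set()).add((pred, obj))
    # Features shared by all seeds approximate the intended category.
    common = set.intersection(*(features.get(s, set()) for s in seeds))
    scores = Counter()
    for entity, feats in features.items():
        overlap = len(feats & common)
        if entity not in seeds and overlap:
            scores[entity] = overlap
    return [entity for entity, _ in scores.most_common(top_k)]


# Toy knowledge graph standing in for a real knowledge base.
triples = [
    ("Python", "type", "ProgrammingLanguage"), ("Python", "paradigm", "ObjectOriented"),
    ("Java", "type", "ProgrammingLanguage"), ("Java", "paradigm", "ObjectOriented"),
    ("Ruby", "type", "ProgrammingLanguage"), ("Ruby", "paradigm", "ObjectOriented"),
    ("Haskell", "type", "ProgrammingLanguage"), ("Haskell", "paradigm", "Functional"),
    ("Paris", "type", "City"),
]
print(expand_set({"Python", "Java"}, triples))  # ['Ruby', 'Haskell']
```

With explicit triples, the reason an entity is suggested (the shared features) is directly inspectable. This is one advantage of knowledge-based methods over purely corpus-based ones.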

Paper Nr: 78
Title:

SCAN-NF: A CNN-based System for the Classification of Electronic Invoices through Short-text Product Description

Authors:

Diego S. Kieckbusch, Geraldo R. Filho, Vinicius Di Oliveira and Li Weigang

Abstract: This research presents a Convolutional Neural Network (CNN)-based system, named SCAN-NF, for classifying Consumer Electronic Invoices (NFC-e) based on product descriptions. Because of the way individual issuers submit Consumer Electronic Invoices, processing them is often a challenging task: the reported information is often incomplete or contains mistakes. Before any meaningful processing of these invoices, it is necessary to identify the product represented in each document. SCAN-NF was developed to identify the correct product codes in electronic invoices from short-text product descriptions. Real data from Brazilian NFC-e and NF-e documents related to B2B and retail transactions are used in the experiments. Comparing a single base model with the proposed ensemble model, evaluation results in terms of recall, precision and accuracy show that the developed system performs satisfactorily.

Paper Nr: 87
Title:

A Bi-recursive Auto-encoders for Learning Semantic Word Embedding

Authors:

Amal Bouraoui, Salma Jamoussi and Abdelmajid Ben Hamadou

Abstract: The meaning of a word depends heavily on the context in which it is embedded. Deep neural networks have recently achieved great success in representing word meaning. Among them, auto-encoder-based models have proven robust in representing the internal structure of various kinds of data. In this paper, we present a novel deep model that represents word meanings using auto-encoders and takes into account the left and right contexts around the word of interest. Our proposal, referred to as Bi-Recursive Auto-Encoders (Bi-RAE), models the meaning of a word as an evolving vector and learns its semantic features over its set of contexts.
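The building block of a recursive auto-encoder is a composition step. Two child vectors are encoded into one parent vector, and a decoder tries to reconstruct the children from it. The reconstruction error is the training signal. The NumPy sketch below shows that generic step with untrained random parameters and folds a context window into one vector. Folding the left and right contexts separately towards the target word is an illustrative assumption, since the abstract does not say how Bi-RAE combines them. The linear decoder is also a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # illustrative embedding size

# Untrained encoder/decoder parameters; in practice they are learned by
# minimising the reconstruction error summed over the training corpus.
W_enc = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
b_enc = np.zeros(DIM)
W_dec = rng.normal(scale=0.1, size=(2 * DIM, DIM))
b_dec = np.zeros(2 * DIM)


def compose(left, right):
    """Encode two child vectors into a parent vector and return it together
    with the squared error of reconstructing the children from it."""
    children = np.concatenate([left, right])
    parent = np.tanh(W_enc @ children + b_enc)
    reconstruction = W_dec @ parent + b_dec  # linear decoder (a simplification)
    return parent, float(np.sum((reconstruction - children) ** 2))


def encode_sequence(vectors):
    """Fold a sequence of word vectors left to right into a single vector,
    accumulating the reconstruction error of every composition step."""
    current, total_error = vectors[0], 0.0
    for vec in vectors[1:]:
        current, err = compose(current, vec)
        total_error += err
    return current, total_error


# Hypothetical left and right context windows around a target word.
left_context = list(rng.normal(size=(3, DIM)))
right_context = list(rng.normal(size=(2, DIM)))
left_vec, left_err = encode_sequence(left_context)
right_vec, right_err = encode_sequence(right_context[::-1])  # fold towards the word
print(left_vec.shape, right_vec.shape, left_err + right_err > 0)
```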

Paper Nr: 94
Title:

Process Mining Analytics for Industry 4.0 with Graph Signal Processing

Authors:

Georgios Drakopoulos, Eleanna Kafeza, Phivos Mylonas and Spyros Sioutas

Abstract: Process mining is the art and science of (semi-)automatically generating business processes from a large number of logs coming from potentially heterogeneous systems. With the recent advent of Industry 4.0, analog enterprise environments such as shop floors and long supply chains are bound to be fully digitized. In this context, interest in process mining has been reinvigorated. Multilayer graphs constitute a broad class of combinatorial objects for representing, among others, business processes in a natural and intuitive way. Specifically, the concepts of state and transition, central to the majority of existing approaches, are inherent in these graphs and can be coupled with both semantics and graph signal processing. In this work, we propose a model for representing business processes with multilayer graphs, along with related analytics based on information theory. As a proof of concept, these analytics have been applied, with encouraging results, to large synthetic datasets of increasing complexity with real-world properties, as determined by the recent process mining literature.
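The combination of states, transitions and information theory can be illustrated on an event log. The sketch below counts directly-follows transitions between activities and measures the Shannon entropy of each activity's outgoing transitions. Zero means the next step is fully determined, and higher values mean more branching. This single-layer view and the log are hypothetical simplifications. The sketch does not implement the paper's multilayer graph model or its graph signal processing.

```python
import math
from collections import Counter, defaultdict


def transition_entropy(traces):
    """Shannon entropy (bits) of each activity's outgoing-transition
    distribution, estimated from directly-follows pairs in an event log."""
    follows = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            follows[current][nxt] += 1
    entropy = {}
    for activity, counts in follows.items():
        total = sum(counts.values())
        # sum of p * log2(1/p) over the observed successors
        entropy[activity] = sum((c / total) * math.log2(total / c) for c in counts.values())
    return entropy


# Hypothetical event log: each trace is one case, listed as ordered activities.
log = [
    ["receive", "check", "approve", "ship"],
    ["receive", "check", "reject"],
    ["receive", "check", "approve", "ship"],
]
print({a: round(h, 3) for a, h in transition_entropy(log).items()})
```

Activities with no observed successor (here "ship" and "reject") have no outgoing distribution and are therefore absent from the result.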

Paper Nr: 30
Title:

ARKIVO Dataset: A Benchmark for Ontology-based Extraction Tools

Authors:

Laura Pandolfo and Luca Pulina

Abstract: The amount of data available on the Web has grown significantly in recent years, thus increasing the need for efficient techniques able to retrieve information from data in order to discover valuable and relevant knowledge. In the last decade, the intersection of the Information Extraction and Semantic Web areas has provided new opportunities for improving ontology-based information extraction tools. However, one of the critical aspects in the development and evaluation of this type of system is the limited availability of annotated documents, especially in domains such as history. In this paper, we present the current state of our work on building a large, real-world RDF dataset to support the development of ontology-based extraction tools. The presented dataset is the result of efforts made within the ARKIVO project and contains about 300 thousand triples, produced by a manual annotation process carried out by domain experts. The ARKIVO dataset is freely available and can be used as a benchmark for evaluating systems that automatically annotate and extract entities from documents.

Paper Nr: 34
Title:

Which Is More Helpful in Finding Scientific Papers to Be Top-cited in the Future: Content or Citations? Case Analysis in the Field of Solar Cells 2009

Authors:

Masanao Ochi, Masanori Shiro, Jun’ichiro Mori and Ichiro Sakata

Abstract: With the increasing digital publication of scientific literature and the fragmentation of research, it is becoming more and more difficult to find promising papers. We can, of course, examine the contents of a large number of papers, but it is easier to look at the references they cite. We therefore want to know whether a paper is promising based only on its content and citation information. This paper proposes a method that extracts and clusters the content and the citations of papers as distributed representations and compares them using the same criteria. This method clarifies whether future promising papers are more clearly distinguished by their content or by their citations. We evaluated the proposed method by comparing, among the papers published in 2009, the distribution of those that became top-cited papers three years later. As a result, we found that citation information makes it 39.9% easier than content information to identify the papers that will be top-cited in the future. This analysis provides a basis for developing more general models for the early prediction of the impact of scientific research and of trends in science and technology.

Paper Nr: 62
Title:

Towards a Smart Identification of Tax Default Risk with Machine Learning

Authors:

Vinicius Di Oliveira, Ricardo M. Chaim, Li Weigang, Sergio B. Neto and Geraldo R. Filho

Abstract: Failure to detect the non-payment of tax due is the main risk in tax inspection. To tackle tax evasion, the complexity of tax legislation and the volume of available information must be overcome. Few studies have investigated the analysis of tax default risk with Machine Learning (ML) algorithms. This study proposes using ML algorithms commonly applied in credit risk analysis as a risk analysis tool for tax default. The tax data preparation issue was addressed by discretizing qualitative and quantitative variables. This work presents a new approach for classifying companies with regard to tax avoidance using ML. The developed artificial neural network (ANN) model achieved an AUC of 0.9568 in the classification task. The study covers more than 300 thousand companies in the city of Brasília, Brazil, analyzing their socioeconomic and financial characteristics.
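The abstract mentions discretizing quantitative variables but does not state the scheme. One common choice is equal-frequency (quantile) binning, in which cut points are taken at empirical quantiles so that each bin holds roughly the same number of companies. The Python sketch below shows that technique as an assumption, not the authors' procedure. The revenue figures are hypothetical.

```python
from bisect import bisect_right


def quantile_bins(values, n_bins):
    """Equal-frequency discretisation: map each value to a bin index in
    0..n_bins-1 using cut points taken at the empirical quantiles."""
    if n_bins < 1 or not values:
        raise ValueError("need n_bins >= 1 and at least one value")
    ordered = sorted(values)
    cuts = [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]
    # bisect_right puts equal values in the same bin.
    return [bisect_right(cuts, v) for v in values]


# Hypothetical annual revenues (in thousands) of six companies, split into 3 bins.
revenues = [120, 15, 980, 40, 310, 75]
print(quantile_bins(revenues, 3))  # [1, 0, 2, 0, 2, 1]
```

Quantile bins are robust to the heavy skew typical of financial variables, where a few very large companies would otherwise compress everyone else into a single equal-width bin. The resulting bin indices could then be one-hot encoded as inputs to a classifier such as an ANN.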

Paper Nr: 72
Title:

Entity Linking of Sound Recordings and Compositions with Pre-trained Language Models

Authors:

Nikiforos Katakis and Pantelis Vikatos

Abstract: In this paper, we present a Deep Learning (DL) approach to tackle a real-world, large-scale music entity matching task. Poor data quality, missing information and the absence of unique identifiers affect the effectiveness of entity matching and pose many challenges to the matching process. We propose an efficient method for linking recordings to their compositions through metadata using pre-trained language models. We represent each entity as a vector and estimate the similarity between the vectors of a pair of entities. Our experiments show that applying language models such as BERT, DistilBERT or ALBERT to large text corpora significantly improves matching quality at an industrial scale. We created a human-annotated dataset of sound recording and composition pairs obtained from music usage logs and publishers, respectively. The proposed language model achieves 95% precision and 96.5% recall, a high performance on this challenging task.
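Once each entity is represented as a vector, linking can be sketched as a nearest-neighbour search with a similarity threshold. The Python sketch below uses cosine similarity. In the paper the vectors come from pre-trained language models applied to metadata. Here they are hypothetical 3-dimensional toy vectors, and both the choice of cosine similarity and the 0.8 threshold are assumptions rather than details from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_recordings(recordings, compositions, threshold=0.8):
    """Return, for each recording vector, the index of the most similar
    composition vector, or None when the best similarity is below threshold."""
    if not compositions:
        return [None] * len(recordings)
    links = []
    for rec in recordings:
        sims = [cosine(rec, comp) for comp in compositions]
        best = max(range(len(sims)), key=sims.__getitem__)
        links.append(best if sims[best] >= threshold else None)
    return links


# Toy 3-d vectors standing in for language-model embeddings of metadata.
recordings = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
compositions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.2]]
print(link_recordings(recordings, compositions))  # [0, None]
```

The threshold trades precision against recall. Raising it leaves more recordings unlinked but makes the remaining links more reliable.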