Keynote Lectures

GPT-4 versus BERT: Which Foundation Model Is More Suitable for Integrating Data from the Web?
Christian Bizer, University of Mannheim, Germany

Recommendation Systems in Scholarly Publishing
Yannis Manolopoulos, School of Pure and Applied Sciences, Open University of Cyprus, Cyprus

Capturing the Meaning of Industrial Data
Dimitris Kiritsis, STI-IGM, EPFL, Switzerland

 

GPT-4 versus BERT: Which Foundation Model Is More Suitable for Integrating Data from the Web?

Christian Bizer
University of Mannheim
Germany
 

Brief Bio
Christian Bizer explores technical and empirical questions concerning the development of global, decentralized information environments. His research focus is the evolution of the World Wide Web from a medium for the publication of documents into a global dataspace. Christian Bizer co-founded the DBpedia project, which derives a comprehensive knowledge graph from Wikipedia. He initiated the WebDataCommons project, which monitors the adoption of schema.org, JSON-LD, RDFa, Microdata, and Microformats on the Web by analyzing large Web crawls. His technical research focuses on the integration of data from large numbers of Web data sources and includes topics such as information extraction, identity resolution, schema matching, data fusion, and data search. Christian Bizer is a full professor at the University of Mannheim, where he leads the Web-based Systems group. http://dws.informatik.uni-mannheim.de/bizer


Abstract
The Web contains vast amounts of structured data in the form of HTML tables and schema.org annotations, as well as datasets accessible via data repositories. The automated integration of data from large numbers of Web data sources is a long-standing research challenge, as the integration requires dealing with several tricky tasks such as schema matching, entity matching, and data indexing for retrieval. Most state-of-the-art methods for these tasks rely on variants of the BERT transformer model fine-tuned with significant amounts of task-specific training data. In this talk, Christian Bizer will critically review BERT-based data integration methods and question their robustness concerning out-of-distribution entities. He will compare the performance of BERT-based methods with that of GPT-4-based data integration methods and will argue that GPT-4-based methods are more training-data-efficient and more robust with respect to unseen entities.
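
To make the contrast concrete, the following minimal sketch (an illustration, not code from the talk) shows the two styles on an entity-matching question: do two product offers describe the same real-world product? The BERT path assumes a pair classifier that, in practice, would be fine-tuned on labeled match/non-match pairs, while the GPT-4 path uses zero-shot prompting; the model names, example offers, and prompt wording are all assumptions.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from openai import OpenAI

offer_a = "DJI Mavic 3 Pro drone, 4/3 CMOS Hasselblad camera"
offer_b = "Mavic 3 Pro by DJI with Hasselblad 4/3 sensor"

# BERT-style matcher: encode the offer pair and classify match/non-match.
# In practice the checkpoint would be fine-tuned on labeled pairs;
# "bert-base-uncased" with a fresh classification head is only a placeholder.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
inputs = tokenizer(offer_a, offer_b, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print("match" if logits.argmax(-1).item() == 1 else "non-match")

# GPT-4-style matcher: zero-shot prompting, no task-specific fine-tuning.
client = OpenAI()  # expects OPENAI_API_KEY in the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": f"Do the following two offers refer to the same product? "
                   f"Answer Yes or No.\nOffer A: {offer_a}\nOffer B: {offer_b}",
    }],
)
print(response.choices[0].message.content)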



 

 

Recommendation Systems in Scholarly Publishing

Yannis Manolopoulos
School of Pure and Applied Sciences, Open University of Cyprus
Cyprus
 

Brief Bio
Yannis Manolopoulos holds a 5-year Diploma degree in Electrical Engineering (1981) and a Ph.D. degree in Computer Engineering (1986), both from the Aristotle University of Thessaloniki. He is Professor at the Open University of Cyprus, as well as Professor Emeritus of the Aristotle University of Thessaloniki. Moreover, he is a Member of Academia Europaea, London. He has been with the University of Toronto, the University of Maryland at College Park, the University of Cyprus, and the Hellenic Open University. He has served as Vice-Rector of the Open University of Cyprus, President of the Board of the University of Western Macedonia in Greece, and Vice-President of the Greek Computer Society. Currently, he serves as Dean of the Faculty of Pure and Applied Sciences of the Open University of Cyprus and Member of the Board of the Research and Innovation Foundation of Cyprus. His research interests focus on Data Management. He has co-authored 6 monographs and 10 textbooks (in Greek), as well as >350 journal and conference papers. He has received >17500 citations from >2600 distinct academic institutions in >100 countries (h-index=60 according to Google Scholar, d-index=61 according to Research.com). He has also received 5 best paper awards from the SIGMOD, ECML/PKDD, MEDES (2), and ISSPIT conferences. Currently, he serves on the Editorial Boards of the following journals (among others): Digital (Editor-in-Chief), The Computer Journal (Deputy Editor), Information Systems, World Wide Web, Expert Systems, Data Science & Analytics.


Abstract
The area of Recommendation Systems has matured after intensive theoretical studies by researchers and practical applications by large e-commerce companies. On the other hand, Scientometrics has become an independent field, focusing on the study of laws and statistics related to scholarly publications. Nowadays, the publishing industry has accumulated vast amounts of bibliographic data. Thus, the need has arisen to provide recommendations when searching this abundance of bibliographic data.
• Journal recommenders are important tools for researchers, as a multitude of journals from different publishers has emerged.
• Conference recommenders are useful for steering authors away from predatory venues.
• Citation recommenders play an important role in reducing the time and effort researchers spend on literature surveys.
• Reviewer recommenders for scientific research proposals are helpful tools for funding agencies.
• Recommending articles to the best-fitting reviewers is crucial for obtaining constructive reviews and building a strong conference program.
• Collaborator recommenders learn from researchers’ publications and suggest persons who can open up new research directions.

These are some fundamental research questions at the intersection of Recommendation Systems and Scientometrics. In this talk, key approaches to each question will be presented, discussed, and compared; a minimal sketch of the first of them, journal recommendation, follows below.
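
The sketch below implements one common baseline for journal recommendation: rank journals by the TF-IDF cosine similarity between a manuscript and each journal's stated scope. This is an illustrative assumption rather than an approach taken from the talk, and the journal scopes and manuscript text are invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented scope descriptions; a real recommender would use full
# aims-and-scope texts or the past articles of each journal.
journal_scopes = {
    "Information Systems": "databases data management query processing indexing",
    "World Wide Web": "web mining search engines social networks hyperlinks",
    "Expert Systems": "knowledge-based systems reasoning decision support",
}

manuscript = "We propose a new indexing scheme for query processing in databases."

# Build one TF-IDF space over the scopes plus the manuscript,
# then score the manuscript against every scope.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(journal_scopes.values()) + [manuscript])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

# Print journals ranked by decreasing similarity to the manuscript.
for name, score in sorted(zip(journal_scopes, scores), key=lambda x: -x[1]):
    print(f"{name}: {score:.2f}")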



 

 

Capturing the Meaning of Industrial Data

Dimitris Kiritsis
STI-IGM, EPFL
Switzerland
 

Brief Bio
Prof. Dr. Dimitris Kyritsis (Kiritsis) is Emeritus Professor of ICT for Sustainable Manufacturing at EPFL, Lausanne, Switzerland, and Senior Adviser at the Department of Informatics of the University of Oslo, Norway. His research interests are Closed-Loop Lifecycle Management, the Industrial Internet of Things (IIoT), Semantic Technologies and Data Analytics for Engineering Applications, Industrial Ontologies, Knowledge Graphs, and Cognitive Digital Twins for industrial applications. He served as Guest Professor at the IMS Center of the University of Cincinnati, and as Invited Professor at the University of Technology of Compiègne, the University of Technology of Belfort-Montbéliard, and ParisTech ENSAM Paris. Dimitris is actively involved in EU research programs in the areas of Connected Factories of the Future, Zero Defect Manufacturing, Circular Manufacturing, the Twin (Green and Digital) Transition, etc. He has more than 250 publications. Dimitris served as Director of the Doctoral Program on Robotics, Control and Intelligent Systems of EPFL from 2019 to 2021. From 2013 to 2019, Dimitris was the Chair of IFIP WG5.7 - Advances in Production Management Systems. He is a founding member of the International Society for Engineering Asset Management and co-founder of the Industrial Ontologies Foundry. Dimitris was a Member of the WEF Global Future Council on Advanced Manufacturing and Value Chains from 2019 to 2022; since 2022 he has been Editor-in-Chief of the Open Access journal Frontiers in Manufacturing Technology, and since 2023 a Member of the WEF Impact Circle: Digital Transformation of Industries.


Abstract
We live in the age of data, where everything around us is linked to a data source and many aspects of our lives are increasingly digitalized. Under the lens of data heterogeneity, it can be argued that the search for the meaning of data constitutes the main challenge of data integration. Today, myriads of data are collected at every moment by various types of sensors, and the trend is growing with the availability of smaller, cheaper, and more efficient sensors able to sense almost everything. Consider a simple, everyday example: what is the source of the data we read on a thermometer when we take our body temperature? The thermometer itself, as many people reply? But the thermometer cannot create heat or raise the temperature of our body. Rather, a phenomenon in our body creates heat and increases its temperature, which in turn is captured by the thermometer, provided we apply it correctly and at the right point of the body. The same reflection applies to any sensor used to capture the value of something we want to measure: temperature, vibration, acceleration, movement, etc. The real source of the captured data is not the sensor itself but a particular phenomenon of the object in which the sensor is embedded. The function of the sensor is to capture the value of a parameter that allows us to assess the behavior of the observed phenomenon. This example illustrates the significance of identifying the meaning of data. As a solution to this challenge, a semantic data model explicitly defines what things like “temperature”, “heat”, “fever”, “thermometer”, “Celsius degrees”, and so forth are. In addition, the collected data are always interpreted within the boundaries of a well-defined “context”. This keynote talk will show how these concepts are used in emerging I4.0 applications, using semantic technologies and so-called Cognitive Digital Twins, in a number of EU R&I projects.
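
The thermometer example maps naturally onto standard semantic sensor vocabularies. The sketch below (an assumption for illustration, not necessarily the modeling used in the projects discussed) uses rdflib and the W3C SOSA vocabulary to state explicitly that the thermometer is only the sensor of an observation whose feature of interest is the body; the individuals (ex:thermometer1, ex:obs1, and so on) are invented.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

SOSA = Namespace("http://www.w3.org/ns/sosa/")
EX = Namespace("http://example.org/")

g = Graph()
g.bind("sosa", SOSA)
g.bind("ex", EX)

# The data source is the observed phenomenon in the body; the
# thermometer merely makes the observation of that property.
g.add((EX.thermometer1, RDF.type, SOSA.Sensor))
g.add((EX.bodyTemperature, RDF.type, SOSA.ObservableProperty))
g.add((EX.obs1, RDF.type, SOSA.Observation))
g.add((EX.obs1, SOSA.madeBySensor, EX.thermometer1))
g.add((EX.obs1, SOSA.observedProperty, EX.bodyTemperature))
g.add((EX.obs1, SOSA.hasFeatureOfInterest, EX.patientBody))
g.add((EX.obs1, SOSA.hasSimpleResult, Literal("38.5", datatype=XSD.decimal)))

print(g.serialize(format="turtle"))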


