Word Association Thematic Analysis: Insight Discovery from the Social Web
Mike Thelwall, University of Wolverhampton, United Kingdom
Causal Inference: Why We Should Care
Wolfgang Nejdl, L3S and University of Hannover, Germany
Knowledge-based Confidentiality-preserving Query Answering
Riccardo Rosati, Sapienza Università di Roma, Italy
Diving Deep into the Murky Web of Social Media
Diana Maynard, University of Sheffield, United Kingdom
Brief Bio
Mike Thelwall is Professor of Data Science and leads the Statistical Cybermetrics and Research Evaluation Group at the University of Wolverhampton in the UK. He researches sentiment analysis, science of science, and social web analysis methods. He has created software and methods for analysing Twitter, YouTube, and general web pages from a quantitative social science perspective.
Abstract
Billions of short messages are posted daily to the public social web. This gives opportunities for researchers to gain insights into the issues discussed, but extracting useful information is challenging. On the one hand, the simplifying quantitative approaches for large-scale analysis risk misinterpreting the patterns found because of the many different uses of the social web. On the other hand, small-scale qualitative investigations may miss the big picture and ignore most of the data. This talk describes a mixed methods approach, word association thematic analysis, that attempts to combine the face validity of small-scale qualitative investigations with the power of large-scale pattern detection. The method leverages comparisons to identify sets of characteristic words, then applies thematic analysis to group these words into patterns according to the context in which they are used. The comparisons can be temporal (e.g., early vs. late tweets), topic-based (e.g., vaxxers vs. antivaxxers), or tweeter-based (e.g., gender, location). The outcome of word association thematic analysis is a set of themes that characterise an issue in a social web site, supported by qualitative evaluations of the context of the words analysed and statistical tests for the validity of the differences identified. The method is supported by the free software mozdeh.wlv.ac.uk and the talk will give a range of examples from YouTube and Twitter.
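The comparison step described above can be illustrated with a minimal sketch. It assumes a 2x2 Pearson chi-square test on the share of posts containing each word; this is an illustrative choice, not necessarily what Mozdeh implements. All function names and the toy posts are hypothetical, and a real analysis would need far larger samples and a correction for multiple testing.

```python
from collections import Counter


def doc_frequencies(posts):
    """Count how many posts contain each word (once per post)."""
    counts = Counter()
    for post in posts:
        counts.update(set(post.lower().split()))
    return counts


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def characteristic_words(set_a, set_b, threshold=3.841):
    """Words found in a larger share of set_a posts than set_b posts.

    threshold=3.841 is the chi-square critical value for p = 0.05
    with one degree of freedom (no multiple-testing correction here).
    """
    freq_a, freq_b = doc_frequencies(set_a), doc_frequencies(set_b)
    n_a, n_b = len(set_a), len(set_b)
    results = []
    for word, in_a in freq_a.items():
        in_b = freq_b.get(word, 0)
        if in_a / n_a <= in_b / n_b:
            continue  # not over-represented in set_a
        stat = chi_square_2x2(in_a, n_a - in_a, in_b, n_b - in_b)
        if stat >= threshold:
            results.append((word, stat))
    return sorted(results, key=lambda pair: -pair[1])


# Invented toy data standing in for "early" and "late" tweets.
early_posts = ["lockdown announced today"] * 8 + ["weather is nice"] * 2
late_posts = ["vaccine rollout today"] * 8 + ["weather is nice"] * 2

for word, stat in characteristic_words(early_posts, late_posts):
    print(word, round(stat, 2))
```

The words returned are only candidates: in the method described above, a researcher would then read the posts containing each word and group the words into themes by context.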
Brief Bio
Prof. Dr. Wolfgang Nejdl (born 1960) has been full professor of computer science at the University of Hannover since 1995. He received his M.Sc. (1984) and Ph.D. degree (1988) at the Technical University of Vienna, was assistant professor in Vienna from 1988 to 1992, and associate professor at the RWTH Aachen from 1992 to 1995. He worked as visiting researcher / professor at Xerox PARC, Stanford University, University of Illinois at Urbana-Champaign, EPFL Lausanne, PUC Rio, Trento and Politecnico di Milano. Prof. Nejdl heads the L3S Research Center, www.L3S.de, as well as the Distributed Systems Institute / Knowledge Based Systems, and does research in the areas of Information Retrieval, Web Science, Artificial Intelligence, Social and Semantic Web, Digital Libraries and Technology Enhanced Learning. He was PI of the ERC Advanced Grant ALEXANDRIA from 2014 to 2019, working on foundations for temporal retrieval, exploration and analytics in Web archives. Current projects include NoBIAS, SoBigData++ as well as the International Leibniz Future Lab on Artificial Intelligence, with a special focus on personalized medicine. Wolfgang Nejdl, www.kbs.uni-hannover.de/~nejdl, has published more than 420 scientific articles, as listed at DBLP, with an h-index (based on Google Scholar) of 77.
Abstract
Millions of users frequently share their thoughts and experiences about work, relations, or health practices on online social media platforms. This makes such platforms a treasure trove of information which can be leveraged to better identify common and critical situations, as witnessed by a huge number of interesting social media papers at ICWSM, The Web Conference, WEBIST, and others. Given that L3S has several large projects connected to medicine, I will focus on papers related to mental health and how we can draw conclusions based on social media activity of users, such as predicting mental stress based on users’ social media activity. Most of these methods are associational and depend on correlations in the data to make inferences. This can lead to wrong or useless associations that do not reflect causal dependencies, as expressed by the phrase “correlation does not imply causation”.
In my talk, I will therefore focus on the importance and utility of causal inference for AI in general and for social media analysis specifically. I will start with a quick primer on causal models encompassing two major causal frameworks – structural causal models based on Pearl’s SCM approach and the potential outcome approach by Rubin et al. I will discuss some of the existing works that apply causal modeling to online social media data to better understand the mental wellbeing of users, including some work currently being pursued in our group. I will conclude by pointing to some interesting research directions which our community should take up in the future.
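As a minimal illustration of why associational estimates can mislead, the sketch below contrasts a naive difference in outcome rates with an estimate adjusted for a single observed confounder by stratification (standardisation over the confounder). The records are invented, and the variables z (confounder), t (exposure) and y (outcome) are hypothetical; the sketch assumes z is the only confounder and that every stratum contains both exposed and unexposed records.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def naive_difference(records):
    """Difference in mean outcome between exposed and unexposed records."""
    exposed = [y for z, t, y in records if t == 1]
    unexposed = [y for z, t, y in records if t == 0]
    return mean(exposed) - mean(unexposed)


def adjusted_difference(records):
    """Average the within-stratum differences, weighted by stratum size.

    Assumes z is the only confounder and each stratum has both groups.
    """
    strata = defaultdict(list)
    for z, t, y in records:
        strata[z].append((t, y))
    total = len(records)
    effect = 0.0
    for rows in strata.values():
        exposed = [y for t, y in rows if t == 1]
        unexposed = [y for t, y in rows if t == 0]
        effect += len(rows) / total * (mean(exposed) - mean(unexposed))
    return effect


# Invented records (z, t, y): z raises both the chance of exposure and
# the outcome rate, while t has no effect on y within either stratum.
records = (
    [(1, 1, 1)] * 24 + [(1, 1, 0)] * 6 + [(1, 0, 1)] * 8 + [(1, 0, 0)] * 2
    + [(0, 1, 1)] * 2 + [(0, 1, 0)] * 8 + [(0, 0, 1)] * 6 + [(0, 0, 0)] * 24
)

print("naive:", naive_difference(records))
print("adjusted:", adjusted_difference(records))
```

In this toy data the naive comparison suggests a sizeable association, while the adjusted estimate is zero, because the association is produced entirely by z.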
Brief Bio
Riccardo Rosati, Ph.D. in Computer Science, is Professor of Computer Science and Engineering at the Department of Computer, Control and Management Engineering (DIAG) of Sapienza Università di Roma, Italy. His research interests are in the areas of Artificial Intelligence and Databases, focusing on knowledge representation, data management, ontologies and semantic technologies. He has published over 170 papers in international journals and conferences. He is a Fellow of the European Association for Artificial Intelligence. He is currently involved in projects focusing on privacy-preserving data management, inconsistency-tolerant data management, and data management for healthcare applications.
Abstract
This talk will focus on Controlled Query Evaluation (CQE), a form of confidentiality-preserving query answering over databases and knowledge bases. The CQE approach is based on the declarative specification of a data protection policy on the database/knowledge base through logical formulas, and on the enforcement of such a policy through specialized query answering techniques. Such techniques are parameterized with respect to the data protection policy, and are able to prevent the disclosure of sensitive information to unauthorized users while being as cooperative as possible, that is, answering queries honestly whenever this does not harm confidentiality. The talk will present some recent theoretical and practical advancements in CQE over databases, knowledge bases and OWL ontologies, pointing out a few interesting open problems.
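The idea of a declarative policy enforced at query time can be sketched with a deliberately simple refusal-style censor over ground facts. This is a toy under strong assumptions (no ontology or integrity constraints, yes/no queries on single facts only) and not the techniques presented in the talk; the names make_censor and ask, and the example facts, are hypothetical.

```python
def make_censor(database, policy):
    """Return an ask(fact) function that enforces the policy.

    database: set of ground facts (strings).
    policy: list of frozensets of facts; the user must never be able to
            conclude that all facts in any one of these sets are true.
    ask returns True or False for an honest answer, None for a refusal.
    """
    disclosed = set()  # facts already confirmed as true to the user

    def ask(fact):
        # Refuse whenever a "yes" would complete a protected set. The
        # decision ignores the actual truth of the fact, so a refusal
        # reveals nothing about the database.
        if any(secret <= disclosed | {fact} for secret in policy):
            return None
        if fact in database:
            disclosed.add(fact)
            return True
        return False

    return ask


# Hypothetical data: it may be known that alice is a patient, or that
# she has flu, but not both together.
database = {"patient(alice)", "hasFlu(alice)", "patient(bob)"}
policy = [frozenset({"patient(alice)", "hasFlu(alice)"})]

ask = make_censor(database, policy)
print(ask("patient(alice)"))
print(ask("hasFlu(alice)"))
print(ask("patient(bob)"))
print(ask("hasFlu(bob)"))
```

The censor answers honestly until a truthful "yes" would let the user combine answers into a protected conjunction, which mirrors, in a very reduced form, the goal of being cooperative without harming confidentiality.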
Brief Bio
Dr Diana Maynard is a Senior Research Fellow in the Computer Science department at the University of Sheffield. She has a PhD in Natural Language Processing (NLP) and has more than 30 years of experience in the field. Since 2000 she has been one of the key developers of the GATE NLP toolkit, leading work on Sheffield’s open-source multilingual text analysis tools. Her main research interests are in practical, multidisciplinary approaches to text and social media analysis, in a wide range of fields including cultural heritage, human rights, law, journalism, sustainability and the environment, geography, politics, and natural disasters. She is currently working on various projects based around the detection and analysis of online hate speech, including methods for removing bias in Machine Learning, and for early warning detection of abuse escalation.
Abstract
In this talk, Dr. Maynard will explore online conversation from a linguistic and social point of view, investigating both constructive and destructive behaviour as well as notions around conversational evolution, message amplification, and behavioural change. She will focus on practical methods for understanding and interpreting online discussions, discussing how NLP tools can be combined with social science methods and visual and predictive analytics for understanding and mitigating topics such as online abuse, disinformation, and behavioural change, with practical examples from recent case studies.