Paolo Rosso
Dpto. Sistemas Informáticos y Computación (DSIC)
Universitat Politècnica de València
Abstract: Social media have given users the opportunity to publish content and express their opinions online quickly and easily. The ease of generating content online and the anonymity that social media provide have increased the amount of harmful content that is published. A great amount of fake news, hoaxes, hurtful comments, inaccurate reviews and offensive content is published and propagated every day on social media. Bots also play a key role in disseminating false claims and hate speech. In this keynote I will focus on two types of harmful information: fake news and hate speech. Moreover, I will describe some of the shared tasks that have been organised in our research community, also from an author profiling perspective, such as at PAN, where, given a Twitter feed, the aim has been to identify whether its author is a bot (2019) or is likely to be a spreader of fake news (2020).
Paolo Rosso's Biography
Paolo Rosso is a full professor at the Universitat Politècnica de València, Spain, where he is also a member of the PRHLT research center. His research interests focus mainly on author profiling, irony detection, fake reviews detection, plagiarism detection, and recently hate speech and fake news detection. Since 2009 he has been involved in the organisation of PAN benchmark activities at the CLEF and FIRE evaluation forums, mainly on plagiarism / text reuse detection and author profiling. At SemEval he has been co-organiser of shared tasks on sentiment analysis of figurative language in Twitter (2015) and on multilingual detection of hate speech against immigrants and women in Twitter (2019). He is coordinator of the activities of the IberLEF evaluation forum. He has been PI of national and international research projects funded by the EC and the U.S. Army Research Office. At the moment, in collaboration with Carnegie Mellon University, he is involved in a project funded by the Qatar National Research Fund on author profiling for cyber-security. He serves as deputy steering committee chair for the CLEF conference and as associate editor for the Information Processing & Management journal. He has been chair of *SEM-2015, and organisation chair of CERI-2012, CLEF-2013 and EACL-2017. He is the author of 400+ papers, published in journals, book chapters, and conference and workshop proceedings.
Diana Maynard
Natural Language Processing Group
Department of Computer Science
University of Sheffield
Traditionally, there has been a disconnect between the custom-built applications used to solve real-world information extraction problems in industry and the automated learning-based approaches developed in academia. Automated sentiment analysis and misinformation detection systems, for example, are often developed around publicly available datasets, with models constantly improved through competitive challenges. However, despite approaches such as transfer learning, adapting these to more customised solutions, where the task and data may be slightly different and where training data may be largely unavailable, is still hugely problematic, with the result that many systems are typically still custom-built using expert hand-crafted knowledge and do not scale. In the legal domain, traditionally a slow adopter of technology, black-box machine learning systems are simply too untrustworthy to be widely used. In industrial settings, the fine-grained, highly specialised knowledge of human experts is still critical, and it is not obvious how to integrate this into automated classification systems. In this talk, we examine how to combine this expert human knowledge with automated technologies from NLP and deep learning, in order to scale the tools to larger amounts of data and to perform more complex analyses.
Diana Maynard's Biography
Dr Diana Maynard is a Senior Researcher at the University of Sheffield, where since 2000 she has been one of the key developers of GATE, leading work on Sheffield's open-source multilingual text analysis tools. Her main research interests are in developing tools and applications for information extraction, sentiment analysis, social media analysis, and terminology, and in investigating correlations between human behaviour and language, including hate speech and misinformation. She has led teams on numerous European and national projects in a wide range of fields including politics, climate change, natural disasters, human rights, geography, journalism, and scientometrics. She regularly provides consultancy and training on text and social media analysis in both the public and private sectors, and engages widely with the community through talks, tutorials, and publications.
Joakim Nivre
Department of Linguistics and Philology
Uppsala University
Box 635, SE-75126 Uppsala
Research on dependency parsing has always had a strong multilingual orientation, but the lack of standardized annotations for a long time made it difficult both to meaningfully compare results across languages and to develop truly multilingual systems. The Universal Dependencies project has, during the last five years, tried to overcome this obstacle by developing cross-linguistically consistent morphosyntactic annotation for many languages. During the same period, dependency parsing (like the rest of NLP) has been transformed by the adoption of continuous vector representations and neural network techniques. In this paper, I will introduce the framework and resources of Universal Dependencies and discuss advances in multilingual dependency parsing enabled by these resources in combination with deep learning techniques, ranging from traditional word and character embeddings to deep contextualized word representations like ELMo and BERT.