Ralph Grishman
Department of Computer Science
Courant Institute of Mathematical Sciences
New York University
715 Broadway, Room 703
New York, NY 10003, U.S.A.
Information extraction (IE) involves the identification of user-specified semantic classes of entities, relations, and events in text. IE renders the information in text in a structured form that can be searched, analyzed, and reasoned about. But adapting an IE system to a new extraction task involving a new set of classes remains a major challenge, and this has limited the impact of IE.
We review some of the research aimed at meeting this challenge, moving from hand-crafted rules (which require developer expertise) and supervised training (which requires large annotated corpora) to semi-supervised methods and distant supervision (which require less effort but offer limited performance). We examine the potential for active learning and the trade-off between transparency and efficiency, particularly for users with limited NLP experience. We consider this trade-off in light of our initial experience with an IE customization tool, which gives the user overall control but provides guidance based on a distributional analysis of the text. The talk concludes with a demonstration of the customization process.
(joint work with Yifan He)
Ralph Grishman's Biography
Prof. Ralph Grishman is Professor of Computer Science at New York University, and served as chair of the department from 1986 to 1988. He has been involved in research in natural language processing since 1969. He served for a year (1982-83) as project leader in natural language processing at the Navy Center for Applied Research in Artificial Intelligence. Since 1985 he has directed the Proteus Project under funding from DARPA, NSF, and other Government agencies, focusing on research in information extraction. He has participated in most of the multi-site evaluations of information extraction, including MUC [the Message Understanding Conferences], ACE [Automatic Content Extraction] and KBP [Knowledge Base Population], and has been involved in organizing several of these evaluations, most recently KBP. He is a past president of the Association for Computational Linguistics and author of the text Computational Linguistics: An Introduction (Cambridge Univ. Press).
Bernardo Magnini
FBK - Fondazione Bruno Kessler
Via Sommarive 18, 38100 Povo-Trento, Italy
In recent years, an important line of research in NLP has focused on detecting semantic relations among portions of text, including entailment, similarity, temporality, and, to a lesser degree, causality. The attention paid to such semantic relations has raised the demand for more informative meaning representations, which express properties of concepts and the relations among them. This demand has triggered research on "statement entailment graphs", where nodes are natural language statements (propositions), consisting of predicates with their arguments and modifiers, while edges represent entailment relations between nodes.
In this talk we report initial research that defines the properties of entailment graphs and their potential applications. In particular, we show how entailment graphs are put to use in the EXCITEMENT EU project, where they are applied to the analysis of customer interactions across multiple channels (speech, email, chat and social media) and multiple languages (English, German, Italian).
Bernardo Magnini's Biography
Bernardo Magnini is a senior researcher at FBK in Trento, where he is co-responsible for the Research Unit on Human Language Technology. His interests are in the field of Natural Language Processing, particularly lexical semantics, question answering and textual entailment, areas in which he has published more than 130 scientific papers. He has coordinated several international research projects, including QALL-ME (Question Answering), LiveMemories (content extraction and integration), and EXCITEMENT (textual inferences). He launched and has coordinated EVALITA, the evaluation campaign for both NLP and speech tools for Italian. He is a contract professor at the University of Bolzano, and a member of the Scientific Committee of several initiatives, both academic and industrial. He currently coordinates the special interest group on NLP of the Italian Association for Artificial Intelligence.
Salim Roukos
Senior Manager, Multilingual NLP Technologies & CTO Translation Technologies
Thomas J. Watson Research Center
Yorktown Heights, NY, USA
Significant progress in the quality of statistical machine translation over the past few years includes the use of source-language analysis (including parse forests of the input), techniques for cross-system adaptation, and improved data modeling for informal communications such as those found in blogs. We will give an overview of what has been done in each of these areas. We will also give an update on techniques to improve the value of machine translation for human translators performing MT post-editing. In particular, we will discuss the impact of MT on human translator productivity in translating English to Japanese, a notoriously challenging language pair.