| Eduard Hovy |
University of Southern California / Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695
Abstract: A few years ago, three research groups participated in an audacious experiment called Project Halo: (manually) converting the information contained in one chapter of a high school chemistry textbook into knowledge representation statements, and then having the knowledge representation system take the high school AP exam. Surprisingly, all three systems passed, albeit at a relatively low level of performance. Could one do the same automatically? If not fully, how far can one go? Since October, several projects have taken up this challenge, or aspects of it. Our Learning by Reading project at ISI, drawing on the part-time participation of experts in NLP and KR&R, addresses the problem from the perspective of NLP. After suitable analysis and preparation, we parse the chemistry textbook and then convert the results into very shallow 'pre-logic' predications, which are asserted to a knowledge base. The evaluation, still in progress, has two aspects. In the first, we pose questions to the system at various levels (text only, knowledge level without inference, and knowledge level with inference) and compare performance. In the second, we (in conjunction with other groups) compare the system's 'bottom-up', automatically derived representations to the 'top-down' ones created by hand by those groups. Although this and related projects are merely pilot studies, they are nonetheless likely to generate some interesting conclusions about the gap between what automated systems can deliver and what human knowledge engineers deem necessary, in the fascinating endeavor of learning by reading.
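The abstract does not specify the form of the 'pre-logic' predications, so the following is only a minimal, hypothetical sketch of the general idea: parsed clauses become shallow predicate-argument assertions in a simple in-memory knowledge base that supports lookup without inference. The class and predicate names are invented for illustration.

```python
# Hypothetical sketch: shallow predications asserted to a knowledge base
# that answers queries by direct lookup (the "knowledge level without
# inference" setting), with None as a wildcard argument.

class KnowledgeBase:
    def __init__(self):
        self.facts = set()

    def assert_predication(self, predicate, *args):
        self.facts.add((predicate, args))

    def query(self, predicate, *args):
        # None acts as a wildcard for any argument position
        return [f for f in self.facts
                if f[0] == predicate
                and all(q is None or q == a for q, a in zip(args, f[1]))]

kb = KnowledgeBase()
# e.g. from a sentence like "An atom consists of protons, neutrons,
# and electrons." (an invented chemistry-textbook example):
kb.assert_predication("consist-of", "atom", "proton")
kb.assert_predication("consist-of", "atom", "neutron")
kb.assert_predication("consist-of", "atom", "electron")

atom_parts = kb.query("consist-of", "atom", None)  # all three facts
```

Inference-level evaluation would then layer rules over such a store; this sketch covers only the lookup case.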
Eduard Hovy's Biography
Specialisms:
- Computational Linguistics
- Lexical Acquisition
- Machine Translation
- Multimodal Communication
- Multimodal Dialogue Systems
- Multimodal Human-Machine Communication
- Natural Language Semantics
- Text Generation
Louise Guthrie |
Department of Computer Science, University of Sheffield
211 Portobello Street
Sheffield, S1 4DP
Abstract: This paper describes some of our work on a Ministry of Defence funded project to detect language use which is 'unusual' in some way. We will describe our work in identifying an anomalous document in a collection of documents and our work in identifying an anomalous segment within a document. We have designed thousands of test collections to allow the techniques we have used to be evaluated for several dimensions of anomaly: an anomalous author, genre, topic, or emotional tone. We will describe both supervised techniques (where training data is available for the 'normal' population) and unsupervised techniques (where no training data is available). In the supervised methods, our interest is in whether or not to attempt to represent the 'non-normal' population, and if so, how. In the unsupervised methods, we use several hundred stylistic features to rank the segments of a document by their degree of anomaly. Experiments vary the size of the segments (100-word, 500-word, and 1000-word) and the similarity measures used. Overall results will be compared and analyzed.
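The unsupervised method can be pictured with a small sketch, under loose assumptions: split a document into fixed-size word segments, compute a stylistic feature vector per segment (the real system uses several hundred features; only two toy ones appear here), and rank segments by their distance from the mean profile of the whole document. All function names and features are invented for illustration.

```python
# Hypothetical sketch of unsupervised segment-anomaly ranking.
import math

def features(words):
    # two toy stylistic features; the described system uses hundreds
    avg_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    return (avg_word_len, type_token_ratio)

def rank_segments(text, segment_size=100):
    """Return segment indices, most anomalous first."""
    words = text.split()
    segments = [words[i:i + segment_size]
                for i in range(0, len(words), segment_size)]
    vecs = [features(s) for s in segments]
    # mean stylistic profile of the whole document
    mean = tuple(sum(v[i] for v in vecs) / len(vecs)
                 for i in range(len(vecs[0])))
    # rank by Euclidean distance from the mean profile
    return sorted(range(len(segments)),
                  key=lambda i: math.dist(vecs[i], mean),
                  reverse=True)
```

Varying `segment_size` (100, 500, 1000) and swapping `math.dist` for other similarity measures mirrors the experimental dimensions the abstract mentions.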
Louise Guthrie's Biography
Louise Guthrie is Senior Lecturer in the Department of Computer Science at the University of Sheffield. Although her early career was in Mathematics, she developed one of the first commercial packages for stock market analysis in the early 1980s and completed a Ph.D. in computer science in 1986. She has undertaken research in Natural Language Processing since 1986, especially in the areas of information extraction, computational lexicography, text classification and word sense disambiguation.
Her recent research has focussed on text classification and semantic analysis of text. She was a researcher and Associate Director of CRL, New Mexico State University, where she participated as part of the team funded by the DARPA Tipster program to develop text extraction systems and participate in the MUC (Message Understanding Conferences) competitions. Subsequently, she became the Lead Scientist in text processing for Lockheed Martin (formerly GE Aerospace), where she managed the development of one of the top-performing text extraction systems in the Tipster program. Apart from many refereed publications, she co-authored "Electronic Words" (MIT Press, 1996) with Wilks and Slator. She has managed a large number of research projects on text processing, text classification, and dictionary analysis and construction. At the 2003 Summer Workshop of the Johns Hopkins University Center for Language and Speech Processing, she led the team investigating Semantic Analysis over Sparse Data. Her most recent work develops multilingual techniques for the identification of anomalous material in text.
James Pustejovsky |
Department of Computer Science, Brandeis University
Abstract: Most recent annotation efforts for language have focused on small pieces of the larger problem of semantic annotation, rather than producing a single unified representation. In this talk, I investigate the issues involved in merging several of these efforts into a unified linguistic structure: specifically, PropBank, NomBank, TimeBank, Discourse Treebank, and Opinion Corpus. Each of these is focused on a specific aspect of the semantic representation task: semantic role labeling, discourse relations, temporal relations, etc., and has reached a level of maturity that warrants a concerted effort to merge them into a single, unified representation, what I will refer to as a Unified Linguistic Annotation (ULA). There are several technical and theoretical issues that must be resolved to bring these different layers together seamlessly. Most of these approaches have annotated the same type of data (Wall Street Journal), so it is also important to demonstrate that the annotation can be extended to other genres such as spoken language. The demonstration of success for the extensions is the training of accurate statistical semantic taggers.
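One common way to combine independently produced annotation layers, which may or may not match the ULA design, is stand-off annotation: each layer labels spans of the same base text, so a unified view can be built by indexing every annotation under its span. The layer names and spans below are invented toy data.

```python
# Hypothetical sketch: merging stand-off annotation layers by span.
from collections import defaultdict

def merge_layers(layers):
    """layers: dict mapping layer name -> list of (start, end, label)
    character-span annotations over the same base text.
    Returns a dict mapping (start, end) -> [(layer, label), ...]."""
    unified = defaultdict(list)
    for layer, annotations in layers.items():
        for start, end, label in annotations:
            unified[(start, end)].append((layer, label))
    return dict(unified)

# toy example: two layers annotating overlapping spans of one sentence
merged = merge_layers({
    "propbank": [(0, 4, "ARG0"), (5, 9, "rel")],
    "timebank": [(5, 9, "EVENT")],
})
```

Spans annotated by several layers (here `(5, 9)`) are exactly where the merging issues the abstract mentions arise, e.g. reconciling conflicting or partially overlapping span boundaries, which this sketch does not attempt.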
James Pustejovsky's Biography
Research Interests:
- computational linguistics
- lexical semantics
- compositional semantics
- theories of categorization and commonsense metaphysics
- automatic indexing techniques
- content-based information extraction
Eva Hudlicka |
Psychometrix Associates, Blacksburg, VA, USA
Abstract: Neuroscience and psychology research over the past two decades has demonstrated the close connection between cognition and affect. Affective factors (emotions and personality traits) can profoundly influence perception, decision-making and behavior, contributing to a variety of biases and heuristics. These effects may enhance or degrade cognitive processes and performance, depending on the context. The ability to explicitly represent affective factors in user models and cognitive architectures has a number of benefits, including more accurate user models, increased realism and believability of synthetic agents, and improved effectiveness of assistive technologies and decision-aids. Consideration of affective factors can also provide disambiguating information for speech recognition and natural language understanding. This paper discusses the motivation and alternatives for incorporating emotions and personality traits within a user model. Three key questions are addressed: "Why should affective factors be represented in user models?", "When are they beneficial?", and "What are the available modeling alternatives?" The paper first proposes a framework for analyzing and comparing computational models of emotions. The representational requirements for several alternative modeling approaches are then discussed, along with examples from existing cognitive-affective architectures. These include 'shallow' and 'deep' models of emotions, and a generic methodology for modeling multiple, interacting affective factors. The paper concludes with a discussion of challenges involved in building emotion models, including the selection of appropriate psychological theories, obtaining the necessary data, and validation.
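As a concrete illustration of the 'shallow' end of the modeling spectrum, one could represent an affective factor as a single state variable that directly biases a processing parameter. The sketch below is entirely hypothetical (the names, update rule, and bias function are invented, not taken from the paper): anxiety, updated from appraised event threat, scales the salience of threat cues in a toy user model.

```python
# Hypothetical sketch of a 'shallow' affect model in a user model:
# one affective state variable that directly biases attention.

class ShallowAffectModel:
    def __init__(self):
        self.anxiety = 0.0  # 0 = calm, 1 = highly anxious

    def appraise(self, event_threat):
        # exponential smoothing toward the appraised threat of events;
        # the 0.7/0.3 weights are arbitrary illustration values
        self.anxiety = 0.7 * self.anxiety + 0.3 * event_threat

    def attention_bias(self):
        # anxious users attend more to threat cues: return a
        # multiplier on threat-cue salience for downstream processing
        return 1.0 + self.anxiety

model = ShallowAffectModel()
model.appraise(1.0)  # a maximally threatening event is appraised
```

A 'deep' model, by contrast, would derive such biases from an explicit appraisal and emotion-generation process rather than a single hand-set parameter; this sketch shows only the shallow case.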