Skip Nav Destination
Close Modal
Update search
NARROW
Format
Journal
TocHeadingTitle
Date
Availability
1-3 of 3
Bonnie J. Dorr
Close
Follow your search
Access your saved searches in your account
Would you like to receive an alert when new items match your search?
Sort by
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2013) 39 (3): 555–590.
Published: 01 September 2013
FIGURES
| View All (4)
Abstract
View article
PDF
Knowing the degree of semantic contrast between words has widespread application in natural language processing, including machine translation, information retrieval, and dialogue systems. Manually created lexicons focus on opposites, such as hot and cold . Opposites are of many kinds such as antipodals, complementaries, and gradable. Existing lexicons often do not classify opposites into the different kinds, however. They also do not explicitly list word pairs that are not opposites but yet have some degree of contrast in meaning, such as warm and cold or tropical and freezing . We propose an automatic method to identify contrasting word pairs that is based on the hypothesis that if a pair of words, A and B, are contrasting, then there is a pair of opposites, C and D, such that A and C are strongly related and B and D are strongly related. (For example, there exists the pair of opposites hot and cold such that tropical is related to hot , and freezing is related to cold .) We will call this the contrast hypothesis. We begin with a large crowdsourcing experiment to determine the amount of human agreement on the concept of oppositeness and its different kinds. In the process, we flesh out key features of different kinds of opposites. We then present an automatic and empirical measure of lexical contrast that relies on the contrast hypothesis, corpus statistics, and the structure of a Roget-like thesaurus. We show how, using four different data sets, we evaluated our approach on two different tasks, solving “most contrasting word” questions and distinguishing synonyms from opposites. The results are analyzed across four parts of speech and across five different kinds of opposites. We show that the proposed measure of lexical contrast obtains high precision and large coverage, outperforming existing methods.
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2012) 38 (2): 411–438.
Published: 01 June 2012
FIGURES
| View All (13)
Abstract
View article
PDF
This article describes the resource- and system-building efforts of an 8-week Johns Hopkins University Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically Informed Machine Translation (SIMT). We describe a new modality/negation (MN) annotation scheme, the creation of a (publicly available) MN lexicon, and two automated MN taggers that we built using the annotation scheme and lexicon. Our annotation scheme isolates three components of modality and negation: a trigger (a word that conveys modality or negation), a target (an action associated with modality or negation), and a holder (an experiencer of modality). We describe how our MN lexicon was semi-automatically produced and we demonstrate that a structure-based MN tagger results in precision around 86% (depending on genre) for tagging of a standard LDC data set. We apply our MN annotation scheme to statistical machine translation using a syntactic framework that supports the inclusion of semantic annotations. Syntactic tags enriched with semantic annotations are assigned to parse trees in the target-language training texts through a process of tree grafting. Although the focus of our work is modality and negation, the tree grafting procedure is general and supports other types of semantic information. We exploit this capability by including named entities, produced by a pre-existing tagger, in addition to the MN elements produced by the taggers described here. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet reported on the NIST 2009 Urdu–English test set. This finding supports the hypothesis that both syntactic and semantic information can improve translation quality.
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2010) 36 (3): 341–387.
Published: 01 September 2010
Abstract
View article
PDF
The task of paraphrasing is inherently familiar to speakers of all languages. Moreover, the task of automatically generating or extracting semantic equivalences for the various units of language—words, phrases, and sentences—is an important part of natural language processing (NLP) and is being increasingly employed to improve the performance of several NLP applications. In this article, we attempt to conduct a comprehensive and application-independent survey of data-driven phrasal and sentential paraphrase generation methods, while also conveying an appreciation for the importance and potential use of paraphrases in the field of NLP research. Recent work done in manual and automatic construction of paraphrase corpora is also examined. We also discuss the strategies used for evaluating paraphrase generation techniques and briefly explore some future trends in paraphrase generation.