Skip Nav Destination
Close Modal
Update search
NARROW
Format
Journal
TocHeadingTitle
Date
Availability
1-7 of 7
Bonnie Webber
Close
Follow your search
Access your saved searches in your account
Would you like to receive an alert when new items match your search?
Sort by
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2023) 49 (1): 157–198.
Published: 01 March 2023
FIGURES
| View All (12)
Abstract
View article
PDF
Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that several popular datasets contain a surprising number of annotation errors or inconsistencies. To alleviate this issue, many methods for annotation error detection have been devised over the years. While researchers show that their approaches work well on their newly introduced datasets, they rarely compare their methods to previous work or on the same datasets. This raises strong concerns on methods’ general performance and makes it difficult to assess their strengths and weaknesses. We therefore reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets for text classification as well as token and span labeling. In addition, we define a uniform evaluation setup including a new formalization of the annotation error detection task, evaluation protocol, and general best practices. To facilitate future research and reproducibility, we release our datasets and implementations in an easy-to-use and open source software package. 1
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2021) 47 (1): 1–7.
Published: 21 April 2021
Abstract
View article
PDF
Because the 2020 ACL Lifetime Achievement Award presentation could not be done in person, we replaced the usual LTA talk with an interview between Professor Kathy McKeown (Columbia University) and the recipient, Bonnie Webber. The following is an edited version of the interview, with added citations.
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2018) 44 (3): 387–392.
Published: 01 September 2018
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2014) 40 (4): 921–950.
Published: 01 December 2014
FIGURES
Abstract
View article
PDF
The Penn Discourse Treebank (PDTB) was released to the public in 2008. It remains the largest manually annotated corpus of discourse relations to date. Its focus on discourse relations that are either lexically-grounded in explicit discourse connectives or associated with sentential adjacency has not only facilitated its use in language technology and psycholinguistics but also has spawned the annotation of comparable corpora in other languages and genres. Given this situation, this paper has four aims: (1) to provide a comprehensive introduction to the PDTB for those who are unfamiliar with it; (2) to correct some wrong (or perhaps inadvertent) assumptions about the PDTB and its annotation that may have weakened previous results or the performance of decision procedures induced from the data; (3) to explain variations seen in the annotation of comparable resources in other languages and genres, which should allow developers of future comparable resources to recognize whether the variations are relevant to them; and (4) to enumerate and explain relationships between PDTB annotation and complementary annotation of other linguistic phenomena. The paper draws on work done by ourselves and others since the corpus was released.
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2011) 37 (2): 385–393.
Published: 01 June 2011
Abstract
View article
PDF
Every text has at least one topic and at least one genre. Evidence for a text's topic and genre comes, in part, from its lexical and syntactic features—features used in both Automatic Topic Classification and Automatic Genre Classification (AGC). Because an ideal AGC system should be stable in the face of changes in topic distribution, we assess five previously published AGC methods with respect to both performance on the same topic–genre distribution on which they were trained and stability of that performance across changes in topic–genre distribution. Our experiments lead us to conclude that (1) stability in the face of changing topical distributions should be added to the evaluation critera for new approaches to AGC, and (2) Part-of-Speech features should be considered individually when developing a high-performing, stable AGC system for a particular, possibly changing corpus.
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2007) 33 (4): 607–611.
Published: 01 December 2007
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2003) 29 (4): 545–587.
Published: 01 December 2003
Abstract
View article
PDF
We argue in this article that many common adverbial phrases generally taken to signal a discourse relation between syntactically connected units within discourse structure instead work anaphorically to contribute relational meaning, with only indirect dependence on discourse structure. This allows a simpler discourse structure to provide scaffolding for compositional semantics and reveals multiple ways in which the relational meaning conveyed by adverbial connectives can interact with that associated with discourse structure. We conclude by sketching out a lexicalized grammar for discourse that facilitates discourse interpretation as a product of compositional rules, anaphor resolution, and inference.