English corpora for non-NA anaphora. For each work, the Anaphora column gives the number and type of non-NA instances. “(+)” and “(‡)” mark corpora that indicate the antecedent or its semantic type, respectively. Publicly available corpora are marked with an asterisk “(*)”.
Work | Corpus data | Anaphora |
---|---|---|
Schiffman (1985) | Transcribed career-counseling interviews | 298 pronouns (it: 65, that: 233) |
Webber (1991) | Essays, reviews, technical reports | 96 pronouns (it: 15, this: 62, that: 19) |
Eckert and Strube (2000) | Switchboard corpus (telephone conversations) | (+) 154 pronouns (it: 47, this, that: 107) |
Byron (2003) | *TRAINS93 (task-oriented dialogues), *BUR (read news stories) | (+)(‡) 68 pronouns (it: 16; demonstratives: 52) |
Poesio and Modjeska (2002, 2005) | GNOME (museum descriptions and pharmaceutical leaflets) | 19 demonstratives |
Botley and McEnery (2001); Botley (2006) | Associated Press, Hansard, and American Printing House for the Blind | 403 demonstratives (this: 149, that: 244, these: 9, those: 1) |
Gundel, Hedberg, and Zacharski (2002) | Santa Barbara Corpus of Spoken American English (spontaneous conversation) | (‡) 110 personal pronouns (it) |
Artstein and Poesio (2006) | TRAINS91 (task-oriented dialogues) | (+) 28 instances (it: 2, demonstratives: this: 4, that: 20, those: 2) (experiment 1) |
Hedberg, Gundel, and Zacharski (2007) | New York Times | (+)(‡) 178 pronouns¹ (it, this, that) |
Pradhan et al. (2007) | OntoNotes (mix of genres) | (+)² 502 pronouns (it: 146, this: 85, that: 271) |
Müller (2008) | ICSI meeting corpus (multi-party discussions) | (+)² 150 pronouns (it, this, that) |
Kolhatkar and Hirst (2012) | *This issue corpus (MEDLINE abstracts) | (+) 183 instances of "this issue" |
Kolhatkar, Zinsmeister, and Hirst (2013a); Kolhatkar (2015) | *ASN and *CSN corpora (New York Times corpus) | (+) 1,810 anaphoric instances (ASN), (+) 114,700 cataphoric instances (CSN) of six shell nouns |
Uryupina et al. (2018) | *ARRAU (mix of genres) | (+) 1,633 pronouns and shell nouns |
Lapshinova-Koltunski, Hardmeier, and Krielke (2018) | *TED talks, news | (+) 468 instances (pronouns, nominalizations, possibly shell nouns) |