Table 4 

Corpora in English for NA anaphora. For each work, we provide information about the number and type of NA instances in the anaphora column. “(+)” and “(‡)” mark corpora that indicate the antecedent or its semantic type, respectively. Publicly available corpora are marked with an asterisk “(*)”.

| Work | Corpus data | Anaphora |
| --- | --- | --- |
| Schiffman (1985) | Transcribed career-counseling interviews | 298 pronouns (it: 65, that: 233) |
| Webber (1991) | Essays, reviews, technical reports | 96 pronouns (it: 15, this: 62, that: 19) |
| Eckert and Strube (2000) | Switchboard corpus (telephone conversations) | (+) 154 pronouns (it: 47, this/that: 107) |
| Byron (2003) | *TRAINS93 (task-oriented dialogues), *BUR (read news stories) | (+)(‡) 68 pronouns (it: 16; demonstratives: 52) |
| Poesio and Modjeska (2002, 2005) | GNOME (museum descriptions and pharmaceutical leaflets) | 19 demonstratives |
| Botley and McEnery (2001); Botley (2006) | Associated Press, Hansard, and American Printing House for the Blind | 403 demonstratives (this: 149, that: 244, these: 9, those: 1) |
| Gundel, Hedberg, and Zacharski (2002) | Santa Barbara Corpus of Spoken American English (spontaneous conversation) | (‡) 110 personal pronouns (it |
| Artstein and Poesio (2006) | TRAINS91 (task-oriented dialogs) | (+) 28 instances (it: 2, demonstratives: this: 4, that: 20, those: 2) (experiment 1) |
| Hedberg, Gundel, and Zacharski (2007) | New York Times | (+)(‡) 178 pronouns¹ (it, this, that |
| Pradhan et al. (2007) | OntoNotes (mix of genres) | (+)² 502 pronouns (it: 146, this: 85, that: 271) |
| Müller (2008) | ICSI meeting corpus (multi-party discussions) | (+)² 150 pronouns (it, this, that |
| Kolhatkar and Hirst (2012) | *This issue corpus (MEDLINE abstracts) | (+) 183 this issue |
| Kolhatkar, Zinsmeister, and Hirst (2013a); Kolhatkar (2015) | *ASN and *CSN corpora (New York Times corpus) | (+) 1,810 anaphoric instances (ASN); (+) 114,700 cataphoric instances (CSN) of six shell nouns |
| Uryupina et al. (2018) | *ARRAU (mix of genres) | (+) 1,633 pronouns and shell nouns |
| Lapshinova-Koltunski, Hardmeier, and Krielke (2018) | *TED talks, news | (+) 468 instances (pronouns, nominalizations, possibly shell nouns) |
¹ These are cases marked as indirect by both annotators. Reference to events cannot be distinguished from NA anaphora in their scheme, cf. Section 2.1.5.

² Pradhan et al. (2007) and Müller (2008) only mark the antecedent’s head verb.
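Several rows in the Anaphora column give both a total and a per-form breakdown, so the two can be cross-checked. The sketch below is a hypothetical helper, not taken from any of the cited works; `CorpusRow`, `inconsistent_rows`, and `totals_by_form` are illustrative names. It encodes only the rows whose breakdown is complete, copying the counts from the table, and keeps combined categories such as "this/that" (Eckert and Strube) and "demonstratives" (Byron) as separate keys, so per-form totals are not strictly comparable across works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusRow:
    """One table row: the work, its stated total, and its per-form counts (hypothetical encoding)."""
    work: str
    total: int
    breakdown: dict[str, int]  # anaphor form -> count, as written in the Anaphora column


# Only rows whose Anaphora column gives a complete per-form breakdown are included.
ROWS = [
    CorpusRow("Schiffman (1985)", 298, {"it": 65, "that": 233}),
    CorpusRow("Webber (1991)", 96, {"it": 15, "this": 62, "that": 19}),
    CorpusRow("Eckert and Strube (2000)", 154, {"it": 47, "this/that": 107}),
    CorpusRow("Byron (2003)", 68, {"it": 16, "demonstratives": 52}),
    CorpusRow("Botley and McEnery (2001); Botley (2006)", 403,
              {"this": 149, "that": 244, "these": 9, "those": 1}),
    CorpusRow("Artstein and Poesio (2006)", 28,
              {"it": 2, "this": 4, "that": 20, "those": 2}),
    CorpusRow("Pradhan et al. (2007)", 502, {"it": 146, "this": 85, "that": 271}),
]


def inconsistent_rows(rows: list[CorpusRow]) -> list[str]:
    """Return the works whose per-form counts do not add up to the stated total."""
    return [r.work for r in rows if sum(r.breakdown.values()) != r.total]


def totals_by_form(rows: list[CorpusRow]) -> dict[str, int]:
    """Sum counts per anaphor form across rows, keeping keys exactly as written."""
    totals: dict[str, int] = {}
    for r in rows:
        for form, count in r.breakdown.items():
            totals[form] = totals.get(form, 0) + count
    return totals


if __name__ == "__main__":
    print(inconsistent_rows(ROWS))      # → []
    print(totals_by_form(ROWS)["it"])   # → 291
```

Every encoded breakdown sums to its stated total, and these seven works together account for 1,549 annotated instances. Rows with truncated or unbroken-down counts are deliberately left out rather than guessed.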
