Table 1

Annotation task (ST for sequence tagging, Cl for classification) and the number of instances per dataset and split. $\mu_{|D|}$ denotes the average instance length in characters and $\mu_t$ the average annotation time; $\sigma_{|D|}$ and $\sigma_t$ denote the respective standard deviations. Across all datasets, annotation time is reported for annotating the whole instance (i.e., not for individual entities).

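| Name | Task | $\lvert D \rvert$ | $\lvert D_{train} \rvert$ | $\lvert D_{dev} \rvert$ | $\lvert D_{test} \rvert$ | $\mu_{\lvert D \rvert}$ | $\sigma_{\lvert D \rvert}$ | $\mu_t$ | $\sigma_t$ |
|---|---|---|---|---|---|---|---|---|---|
| Muc7T | ST | 3,113 | 2,179 | 467 | 467 | 133.7 | 70.8 | 5.4 | 3.9 |
| Muc7T | ST | 3,113 | 2,179 | 467 | 467 | 133.7 | 70.8 | 5.2 | 4.2 |
| SigIE | ST | 251 | 200 | – | 51 | 226.4 | 114.8 | 27.0 | 14.7 |
| SPEC | Cl | 850 | 680 | – | 170 | 160.4 | 64.2 | 22.7 | 12.4 |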
