Table 4: 

Development set performance. Len inc. is the average percentage increase in length from decontextualization. % edited is the proportion of examples that have at least one edit. Match-all is the percentage of outputs that match at least one of the human references; match-edited is the same match value computed only on cases where all references include at least one edit.

| | len inc. | % edited | match (all / edited) | SARI add F1 (P/R) | SARI del F1 (P/R) |
|---|---|---|---|---|---|
| Repeat | | | 38 / 0 | 0 (0/0) | 0 (0/0) |
| Coref | | 42 | 39 / 13 | 22 (51/14) | 31 (34/28) |
| T5-Base | | 40 | 48 / 21 | 29 (67/19) | 40 (54/32) |
| T5-11B | 12 | 59 | 53 / 32 | 42 (72/30) | 46 (49/43) |
| Human | 24 | 76 | 45 / 29 | 56 (64/49) | 58 (61/55) |
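
As a rough illustration of how the non-SARI columns could be computed, the sketch below follows the caption's definitions of len inc., % edited, match-all, and match-edited. It is not the evaluation code behind the table: the function name `summarize`, whitespace tokenization, exact string comparison, and averaging the length increase per example are all assumptions made here for illustration.

```python
from typing import Dict, List


def _num_tokens(text: str) -> int:
    # Assumption: length is measured in whitespace-separated tokens.
    return len(text.split())


def summarize(
    originals: List[str],
    outputs: List[str],
    references: List[List[str]],
) -> Dict[str, float]:
    """Compute the caption's four statistics, as percentages.

    originals[i]  : the sentence before decontextualization
    outputs[i]    : the system (or annotator) output for that sentence
    references[i] : the human reference decontextualizations
    """
    n = len(originals)

    # len inc.: mean of the per-example percentage length increase (assumed).
    len_inc = 100 * sum(
        (_num_tokens(out) - _num_tokens(src)) / _num_tokens(src)
        for src, out in zip(originals, outputs)
    ) / n

    # % edited: share of examples whose output differs from the original.
    pct_edited = 100 * sum(
        out != src for src, out in zip(originals, outputs)
    ) / n

    # match-all: share of outputs identical to at least one reference.
    match_all = 100 * sum(
        out in refs for out, refs in zip(outputs, references)
    ) / n

    # match-edited: same match, restricted to examples where every
    # reference differs from the original (all references were edited).
    edited = [
        i for i in range(n)
        if all(ref != originals[i] for ref in references[i])
    ]
    match_edited = (
        100 * sum(outputs[i] in references[i] for i in edited) / len(edited)
        if edited else 0.0
    )

    return {
        "len_inc": len_inc,
        "pct_edited": pct_edited,
        "match_all": match_all,
        "match_edited": match_edited,
    }
```

Under these assumptions, a system that copies every input unchanged has zero length increase and zero edits, and it can only score on match-all for examples where some reference is also unchanged.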