Development set performance. Len inc. is the average percentage increase in length after decontextualization. % edited is the percentage of examples that have at least one edit. Match-all is the percentage of outputs that match at least one of the human references; match-edited is the same value computed only on examples where every reference includes at least one edit.
System | len inc. | % edited | match (all / edited) | SARI add, F1 (P/R) | SARI del, F1 (P/R) |
---|---|---|---|---|---|
Repeat | 0 | 0 | 38 / 0 | 0 (0/0) | 0 (0/0) |
Coref | 7 | 42 | 39 / 13 | 22 (51/14) | 31 (34/28) |
T5-Base | 8 | 40 | 48 / 21 | 29 (67/19) | 40 (54/32) |
T5-11B | 12 | 59 | 53 / 32 | 42 (72/30) | 46 (49/43) |
Human | 24 | 76 | 45 / 29 | 56 (64/49) | 58 (61/55) |
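
The non-SARI columns follow directly from the caption's definitions, and a minimal sketch of how they could be computed is given below. This is not the evaluation script behind the table: the `Example` container, the function names, whitespace tokenization for length, and case- and whitespace-insensitive exact match are all assumptions made for illustration. SARI is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    """Hypothetical container for one evaluation item."""
    original: str          # sentence as it appears in context
    output: str            # system's decontextualized sentence
    references: List[str]  # human decontextualizations


def normalize(text: str) -> str:
    # Assumption: comparisons ignore case and whitespace differences.
    return " ".join(text.lower().split())


def length_increase(ex: Example) -> float:
    # Assumption: length is counted in whitespace-separated tokens.
    orig_len = len(ex.original.split())
    return 100.0 * (len(ex.output.split()) - orig_len) / orig_len


def is_edited(original: str, candidate: str) -> bool:
    # A candidate counts as edited if it differs from the original sentence.
    return normalize(original) != normalize(candidate)


def matches_reference(ex: Example) -> bool:
    # True if the output equals at least one human reference.
    return any(normalize(ex.output) == normalize(r) for r in ex.references)


def pct(count: int, total: int) -> float:
    return 100.0 * count / total if total else 0.0


def summarize(examples: List[Example]) -> Dict[str, float]:
    n = len(examples)
    # match-edited is restricted to examples where every reference is edited.
    edited_subset = [
        ex for ex in examples
        if all(is_edited(ex.original, r) for r in ex.references)
    ]
    return {
        "len_inc": sum(length_increase(ex) for ex in examples) / n,
        "pct_edited": pct(sum(is_edited(ex.original, ex.output) for ex in examples), n),
        "match_all": pct(sum(matches_reference(ex) for ex in examples), n),
        "match_edited": pct(sum(matches_reference(ex) for ex in edited_subset),
                            len(edited_subset)),
    }
```

Under these assumptions, a system that copies its input unchanged scores 0 on len inc., % edited, and match-edited by construction, which agrees with the Repeat row.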