Preference between T5 output and human annotation. Columns represent the judgement of expert A, rows that of expert B. We see high agreement between the two expert annotators: they choose the same option in 59 of 100 cases and prefer opposite sources in only 3, although one expert (the column annotator, expert A) is ambivalent more frequently (49 "either" judgements versus 33).
Expert B (rows) / Expert A (columns) | T5 | either | Annotator | Sum |
---|---|---|---|---|
T5 | 13 | 12 | 2 | 27 |
either | 7 | 22 | 4 | 33 |
Annotator | 1 | 15 | 24 | 40 |
Sum | 21 | 49 | 30 | 100 |
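
The agreement claim can be checked directly from the counts in the table. The following is a minimal sketch, not part of the original analysis: it computes the raw agreement rate and, as an additional statistic that the source does not report, Cohen's kappa. The names `COUNTS` and `agreement_stats` are hypothetical and introduced only for illustration.

```python
# Counts copied from the table: rows = expert B, columns = expert A.
# Category order in both dimensions: T5, either, Annotator.
COUNTS = [
    [13, 12, 2],
    [7, 22, 4],
    [1, 15, 24],
]


def agreement_stats(counts):
    """Return (observed agreement, chance agreement, Cohen's kappa)."""
    k = len(counts)
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]
    # Share of items on the diagonal, where both experts chose the same option.
    observed = sum(counts[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the product of the marginals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, expected, kappa


observed, expected, kappa = agreement_stats(COUNTS)
# Cases where one expert preferred T5 and the other preferred the annotator.
opposite = COUNTS[0][2] + COUNTS[2][0]

print(f"observed agreement: {observed:.2f}")
print(f"chance agreement:   {expected:.4f}")
print(f"Cohen's kappa:      {kappa:.3f}")
print(f"opposite choices:   {opposite}")
```

Kappa treats the "either" category like any other label, so it penalises the many cases where only one expert is ambivalent; the count of opposite choices is the more direct measure of outright contradiction.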