For each disagreement category: the percentage of items exhibiting convergence (at least 80 of 100 annotators agreed on the same NLI label), the total number of items in the category, and the mean and standard deviation of the majority vote count.
Category | Converge % | Total # | Majority vote, mean (std) |
---|---|---|---|
Lexical | 17.74 | 124 | 66.02 (14.15) |
Implicature | 12.50 | 24 | 63.96 (14.21) |
Presupposition | 0.00 | 12 | 57.92 (13.82) |
Probabilistic Enrichment | 13.33 | 165 | 64.56 (12.06) |
Imperfection | 22.73 | 22 | 67.18 (14.71) |
Coreference | 14.67 | 75 | 66.17 (14.39) |
Temporal Reference | 25.00 | 12 | 62.00 (19.33) |
Interrogative Hypothesis | 20.00 | 15 | 63.13 (14.32) |
Accommodating | 25.49 | 51 | 67.76 (16.48) |
High Overlap | 0.00 | 8 | 65.12 (4.75) |
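
The columns above can be derived from raw annotations roughly as follows. This is a minimal sketch, not the authors' code: the function name `summarize`, the toy data, and the use of the population standard deviation are all assumptions (the caption does not say whether the sample or population formula was used).

```python
from collections import Counter
from statistics import mean, pstdev


def summarize(items, threshold=80):
    """Summarize one disagreement category.

    `items` is a list of items; each item is a list of NLI labels,
    one per annotator (100 per item in the setting described above).
    An item "converges" when its most frequent label was chosen by
    at least `threshold` annotators.
    """
    # Majority vote count = number of annotators behind the top label.
    majority_counts = [Counter(labels).most_common(1)[0][1] for labels in items]
    converged = sum(count >= threshold for count in majority_counts)
    return {
        "converge_pct": 100 * converged / len(majority_counts),
        "total": len(majority_counts),
        "mean": mean(majority_counts),
        # Assumption: population standard deviation.
        "std": pstdev(majority_counts),
    }


# Hypothetical toy category with four items and 100 annotators each.
toy_items = [
    ["E"] * 85 + ["N"] * 10 + ["C"] * 5,   # majority count 85, converges
    ["N"] * 60 + ["C"] * 40,               # majority count 60
    ["E"] * 50 + ["N"] * 30 + ["C"] * 20,  # majority count 50
    ["C"] * 80 + ["N"] * 20,               # majority count 80, converges
]
summary = summarize(toy_items)
print(summary)
```

In this toy category, two of the four items reach the 80-annotator threshold, so the convergence rate is 50% and the mean majority vote count is 68.75.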