Table 3: For each disagreement category: the percentage of items that converge (at least 80 of the 100 annotators chose the same NLI label), the total number of items in the category, and the mean and standard deviation of the majority vote count.

Category                    Converge (%)   Total (#)   Majority vote count, mean (std)
Lexical                            17.74         124   66.02 (14.15)
Implicature                        12.50          24   63.96 (14.21)
Presupposition                      0.00          12   57.92 (13.82)
Probabilistic Enrichment           13.33         165   64.56 (12.06)
Imperfection                       22.73          22   67.18 (14.71)
Coreference                        14.67          75   66.17 (14.39)
Temporal Reference                 25.00          12   62.0 (19.33)
Interrogative Hypothesis           20.00          15   63.13 (14.32)
Accommodating                      25.49          51   67.76 (16.48)
High Overlap                        0.00               65.12 (4.75)
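The per-category columns can be derived from per-item label counts. The sketch below is a minimal illustration, assuming each item is stored as a dict mapping NLI label to the number of annotators who chose it (100 per item), and assuming the sample standard deviation; the source does not state which variant was used. The names `category_stats`, `converged_items`, and `toy_items` are hypothetical, and the toy counts are invented for illustration, not taken from the table.

```python
from statistics import mean, stdev

# Convergence threshold from the caption: >= 80 of 100 annotators on one label.
CONVERGE_THRESHOLD = 80


def majority_vote_count(label_counts):
    """Size of the largest label group for one item."""
    return max(label_counts.values())


def category_stats(items, threshold=CONVERGE_THRESHOLD):
    """Summarize one category (hypothetical helper).

    items: list of dicts mapping NLI label -> annotator count.
    Assumes the sample standard deviation (an assumption, not from the source).
    """
    votes = [majority_vote_count(counts) for counts in items]
    converged = sum(v >= threshold for v in votes)
    return {
        "converge_pct": 100 * converged / len(votes),
        "total": len(votes),
        "mean": mean(votes),
        "std": stdev(votes) if len(votes) > 1 else 0.0,
    }


def converged_items(converge_pct, total):
    """Recover the number of converged items from a table row (hypothetical helper)."""
    return round(converge_pct * total / 100)


# Invented toy data: majority vote counts are 85, 40, 60, 78.
toy_items = [
    {"entailment": 85, "neutral": 10, "contradiction": 5},
    {"entailment": 40, "neutral": 35, "contradiction": 25},
    {"entailment": 60, "neutral": 30, "contradiction": 10},
    {"entailment": 20, "neutral": 78, "contradiction": 2},
]
stats = category_stats(toy_items)
# stats["converge_pct"] == 25.0 (only 85 reaches the threshold); stats["mean"] == 65.75
```

As a consistency check, `converged_items` recovers whole item counts from the table: 17.74% of the 124 Lexical items is 22 items, and 25.49% of the 51 Accommodating items is 13.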