Table 4: Rater study results. Corr and Incorr are accuracies of raters in each group on correct and incorrect instances respectively, with incorrect instances further broken into Pred(icate) and Ref(erence) model errors. F1 is on the task of identifying incorrect instances.

Group      Acc. (All)   Acc. (Corr)   Acc. (Incorr/Pred/Ref)   F1 (Incorr)
None       67.5         90.4          44.3/43.9/44.7           57.6
Sentence   69.7         92.4          47.1/46.1/48.0           60.9
QED        70.2         90.6          49.7/48.2/51.0           62.5
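
To make the column definitions concrete, the following is a minimal sketch of how such metrics could be computed from rater judgments. It assumes judgments are pooled over all instances and that "incorrect" is the positive class for F1; the source does not state how ratings were aggregated, so the function name `rater_metrics`, its arguments, and the toy data are hypothetical and not taken from the study.

```python
def rater_metrics(gold_correct, judged_correct):
    """Accuracy and F1 for a group of rater judgments (illustrative only).

    gold_correct[i]   -- True if model output i is actually correct
    judged_correct[i] -- True if the rater judged output i to be correct
    Assumption: judgments are pooled, and "incorrect" is the positive class.
    """
    pairs = list(zip(gold_correct, judged_correct))

    # Confusion counts with "incorrect" as the positive class.
    tp = sum(1 for g, j in pairs if not g and not j)  # incorrect, flagged
    fn = sum(1 for g, j in pairs if not g and j)      # incorrect, missed
    fp = sum(1 for g, j in pairs if g and not j)      # correct, wrongly flagged
    tn = sum(1 for g, j in pairs if g and j)          # correct, accepted

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # same quantity as accuracy on incorrect instances

    return {
        "acc_all": ratio(tp + tn, len(pairs)),
        "acc_corr": ratio(tn, tn + fp),
        "acc_incorr": recall,
        "f1_incorr": ratio(2 * precision * recall, precision + recall),
    }


# Toy data (hypothetical): four correct and four incorrect instances.
gold = [True, True, True, True, False, False, False, False]
judged = [True, True, True, False, False, False, True, True]
metrics = rater_metrics(gold, judged)
print(metrics)
```

Under these assumptions, accuracy on incorrect instances coincides with recall for the "incorrect" class, so the F1 column differs from the Incorr accuracy column only through precision, that is, through how often raters flag correct instances as incorrect.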