Table 6: Accuracy of models trained on the supervised training split of each dataset (Supervised), with the contrastive objective added to supervised training (+CL), and with the counterfactually augmented data (+CAD). The models are evaluated on the Evidence Sufficiency Prediction task on datasets with extracted unrelated evidence information (§6.4).

Model         BERT    RoBERTa  ALBERT  Ens.

FEVER
Supervised    82.18   81.88    85.03   84.24
 + CL         87.63   93.53    95.18   91.60
 + CAD        89.50   94.73    90.89   90.95

HoVer
Supervised    97.27   78.64    97.65   88.57
 + CL         99.58   99.71    99.45   99.98
 + CAD        99.65   98.52    99.30   99.97

VitaminC
Supervised    69.99   80.36    80.69   78.33
 + CL         75.77   79.32    78.95   78.90
 + CAD        80.71   82.69    75.69   80.78