Accuracy of models trained on the supervised training split of each dataset (Supervised), with the contrastive objective added to supervised training (+CL), and with counterfactually augmented data (+CAD). The models are evaluated on Evidence Sufficiency Prediction using datasets with extracted unrelated evidence information (§6.4).
Model | BERT | RoBERTa | ALBERT | Ens. |
---|---|---|---|---|
FEVER | ||||
Supervised | 82.18 | 81.88 | 85.03 | 84.24 |
+ CL | 87.63 | 93.53 | 95.18 | 91.60 |
+ CAD | 89.50 | 94.73 | 90.89 | 90.95 |
HoVer | ||||
Supervised | 97.27 | 78.64 | 97.65 | 88.57 |
+ CL | 99.58 | 99.71 | 99.45 | 99.98 |
+ CAD | 99.65 | 98.52 | 99.30 | 99.97 |
VitaminC | ||||
Supervised | 69.99 | 80.36 | 80.69 | 78.33 |
+ CL | 75.77 | 79.32 | 78.95 | 78.90 |
+ CAD | 80.71 | 82.69 | 75.69 | 80.78 |