Table 5: Macro F1 scores on the original test set, compared with the baseline (FEVER) and state-of-the-art (HoVer, VitaminC) oracle results. The highest result for each dataset is marked with an asterisk (*).

Dataset   Model                            F1
--------  ------------------------------   ------
FEVER     DA (Thorne et al., 2018)         83.84
          RoBERTa Supervised               88.69
            + CL                           88.68
            + Augmented                    89.23*

HoVer     BERT (Jiang et al., 2020)        81.20
          BERT Supervised                  80.75
            + CL                           81.82
            + Augmented                    81.87*

VitaminC  ALBERT (Schuster et al., 2021)   82.76
          ALBERT Supervised                83.38
            + CL                           83.48
            + Augmented                    83.82*
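Macro F1 is the unweighted mean of the per-class F1 scores, so every verdict label counts equally regardless of how often it occurs. The evaluation script behind Table 5 is not shown here. The following is only a minimal sketch that assumes the standard definition, F1_c = 2·TP_c / (2·TP_c + FP_c + FN_c), averaged over every label that appears in either the gold or the predicted labels. The function name `macro_f1` and the FEVER-style label strings are illustrative assumptions, and the table appears to report the result scaled by 100.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over labels seen in gold or pred.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for class g
        else:
            fp[p] += 1  # predicted p, but gold label differs
            fn[g] += 1  # gold class g was missed
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Equivalent to 2PR/(P+R); denom is never 0 for labels in the union
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Illustrative FEVER-style labels (assumed, not taken from the paper's data)
gold = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO", "SUPPORTS"]
pred = ["SUPPORTS", "NOT ENOUGH INFO", "NOT ENOUGH INFO", "REFUTES"]
print(round(100 * macro_f1(gold, pred), 2))  # 44.44
```

Here SUPPORTS and NOT ENOUGH INFO each get F1 = 2/3 and REFUTES gets 0, so the macro average is 4/9. A majority-weighted (micro) average would instead let the most frequent label dominate the score.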