Macro F1-score on the original test set compared to baseline (FEVER) and SOTA (HoVer, VitaminC) oracle results. Highest results for a dataset are in bold.
| Dataset | Model | F1 |
|---|---|---|
| FEVER | DA (Thorne et al., 2018) | 83.84 |
| | RoBERTa Supervised | 88.69 |
| | + CL | 88.68 |
| | + Augmented | **89.23** |
| HoVer | BERT (Jiang et al., 2020) | 81.20 |
| | BERT Supervised | 80.75 |
| | + CL | 81.82 |
| | + Augmented | **81.87** |
| VitaminC | ALBERT (Schuster et al., 2021) | 82.76 |
| | ALBERT Supervised | 83.38 |
| | + CL | 83.48 |
| | + Augmented | **83.82** |