Table 8: Training models on SQuAD combined with all the adversarially created datasets D_BiDAF, D_BERT, and D_RoBERTa. Underlined results indicate the best result per model. We report the mean and standard deviation (subscript) over 10 runs with different random seeds.
                          Evaluation (Test) Dataset
Model    D_SQuAD             D_BiDAF             D_BERT              D_RoBERTa
         EM        F1        EM        F1        EM        F1        EM        F1
BiDAF    57.1_0.4  70.4_0.3  17.1_0.8  27.0_0.9  20.0_1.0  29.2_0.8  18.3_0.6  27.4_0.7
BERT     75.5_0.2  87.2_0.2  57.7_1.0  71.0_1.1  52.1_0.7  62.2_0.7  43.0_1.1  54.2_1.0
RoBERTa  74.2_0.3  86.9_0.3  59.8_0.5  74.1_0.6  55.1_0.6  65.1_0.7  41.6_1.0  52.7_1.0
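
Each score in Table 8 is reported as a mean and standard deviation over 10 runs with different random seeds. A minimal sketch of that aggregation follows. It assumes the sample standard deviation (the source does not say which estimator was used), and both the `summarize` helper and the per-seed scores are hypothetical, for illustration only.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation), each rounded to one decimal.

    Assumption: sample standard deviation (n - 1 denominator); the source
    does not state which estimator was used.
    """
    return round(mean(scores), 1), round(stdev(scores), 1)


# Hypothetical EM scores from 10 seeded runs (illustrative, not from the paper).
em_runs = [57.0, 57.5, 56.8, 57.2, 57.1, 56.6, 57.4, 57.3, 56.9, 57.2]

avg, spread = summarize(em_runs)
print(f"EM: {avg} ± {spread}")
```

Rounding is applied only at reporting time, so the per-seed scores keep full precision until the final summary.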