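Table 6: Training models on various datasets, each with 10,000 samples, and measuring their generalization to different evaluation datasets. Results underlined indicate the best result per model. We report the mean ± standard deviation over 10 runs with different random seeds.

Columns give EM and F1 on each Evaluation (Test) Dataset.

| Model | Trained On | D_SQuAD EM | D_SQuAD F1 | D_BiDAF EM | D_BiDAF F1 | D_BERT EM | D_BERT F1 | D_RoBERTa EM | D_RoBERTa F1 | D_DROP EM | D_DROP F1 | D_NQ EM | D_NQ F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BiDAF | D_SQuAD(10K) | 40.9 ± 0.6 | 54.3 ± 0.6 | 7.1 ± 0.6 | 15.7 ± 0.6 | 5.6 ± 0.3 | 13.5 ± 0.4 | 5.7 ± 0.4 | 13.5 ± 0.4 | 3.8 ± 0.4 | 8.6 ± 0.6 | 25.1 ± 1.1 | 38.7 ± 0.7 |
| BiDAF | D_BiDAF | 11.5 ± 0.4 | 20.9 ± 0.4 | 5.3 ± 0.4 | 11.6 ± 0.5 | 7.1 ± 0.4 | 14.8 ± 0.6 | 6.8 ± 0.5 | 13.5 ± 0.6 | 6.5 ± 0.5 | 12.4 ± 0.4 | 15.7 ± 1.1 | 28.7 ± 0.8 |
| BiDAF | D_BERT | 10.8 ± 0.3 | 19.8 ± 0.4 | 7.2 ± 0.5 | 14.4 ± 0.6 | 6.9 ± 0.3 | 14.5 ± 0.4 | 8.1 ± 0.4 | 15.0 ± 0.6 | 7.8 ± 0.9 | 14.5 ± 0.9 | 16.5 ± 0.6 | 28.3 ± 0.9 |
| BiDAF | D_RoBERTa | 10.7 ± 0.2 | 20.2 ± 0.3 | 6.3 ± 0.7 | 13.5 ± 0.8 | 9.4 ± 0.6 | 17.0 ± 0.6 | 8.9 ± 0.9 | 16.0 ± 0.8 | 15.3 ± 0.8 | 22.9 ± 0.8 | 13.4 ± 0.9 | 27.1 ± 1.2 |
| BERT | D_SQuAD(10K) | 69.4 ± 0.5 | 82.7 ± 0.4 | 35.1 ± 1.9 | 49.3 ± 2.2 | 15.6 ± 2.0 | 27.3 ± 2.1 | 11.9 ± 1.5 | 23.0 ± 1.4 | 18.9 ± 2.3 | 28.9 ± 3.2 | 52.9 ± 1.0 | 68.2 ± 1.0 |
| BERT | D_BiDAF | 66.5 ± 0.7 | 80.6 ± 0.6 | 46.2 ± 1.2 | 61.1 ± 1.2 | 37.8 ± 1.4 | 48.8 ± 1.5 | 30.6 ± 0.8 | 42.5 ± 0.6 | 41.1 ± 2.3 | 50.6 ± 2.0 | 54.2 ± 1.2 | 69.8 ± 0.9 |
| BERT | D_BERT | 61.2 ± 1.8 | 75.7 ± 1.6 | 42.9 ± 1.9 | 57.5 ± 1.8 | 37.4 ± 2.1 | 47.9 ± 2.0 | 29.3 ± 2.1 | 40.0 ± 2.3 | 39.4 ± 2.2 | 47.6 ± 2.2 | 49.9 ± 2.3 | 65.7 ± 2.3 |
| BERT | D_RoBERTa | 57.0 ± 1.7 | 71.7 ± 1.8 | 37.0 ± 2.3 | 52.0 ± 2.5 | 34.8 ± 1.5 | 45.9 ± 2.0 | 30.5 ± 2.2 | 41.2 ± 2.2 | 39.0 ± 3.1 | 47.4 ± 2.8 | 45.8 ± 2.4 | 62.4 ± 2.5 |
| RoBERTa | D_SQuAD(10K) | 68.6 ± 0.5 | 82.8 ± 0.3 | 37.7 ± 1.1 | 53.8 ± 1.1 | 20.8 ± 1.2 | 34.0 ± 1.0 | 11.0 ± 0.8 | 22.1 ± 0.9 | 25.0 ± 2.2 | 39.4 ± 2.4 | 43.9 ± 3.8 | 62.8 ± 3.1 |
| RoBERTa | D_BiDAF | 64.8 ± 0.7 | 80.0 ± 0.4 | 48.0 ± 1.2 | 64.3 ± 1.1 | 40.0 ± 1.5 | 51.5 ± 1.3 | 29.0 ± 1.9 | 39.9 ± 1.8 | 44.5 ± 2.1 | 55.4 ± 1.9 | 48.4 ± 1.1 | 66.9 ± 0.8 |
| RoBERTa | D_BERT | 59.5 ± 1.0 | 75.1 ± 0.9 | 45.4 ± 1.5 | 60.7 ± 1.5 | 38.4 ± 1.8 | 49.8 ± 1.7 | 28.2 ± 1.5 | 38.8 ± 1.5 | 42.2 ± 2.3 | 52.6 ± 2.0 | 45.8 ± 1.1 | 63.6 ± 1.1 |
| RoBERTa | D_RoBERTa | 56.2 ± 0.7 | 72.1 ± 0.7 | 41.4 ± 0.8 | 57.1 ± 0.8 | 38.4 ± 1.1 | 49.5 ± 0.9 | 30.2 ± 1.3 | 41.0 ± 1.2 | 41.2 ± 0.9 | 51.2 ± 0.8 | 43.6 ± 1.1 | 61.6 ± 0.9 |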