Table 5: Quality on the TyDi QA primary tasks (passage answer and minimal answer) using: a naïve first-passage baseline, the open-source multilingual BERT model (mBERT), and a human predictor (Section 7.3). F1, precision, and recall measurements (Section 7.1) are averaged over four fine-tuning replicas for mBERT.
Language     Train Size  Passage Answer F1 (P/R)                               Minimal Answer Span F1 (P/R)
                         First passage     mBERT             Lesser Human      mBERT             Lesser Human
(English)    9,211       32.9 (28.4/39.1)  62.5 (62.6/62.5)  69.4 (63.4/77.6)  44.0 (52.9/37.8)  54.4 (52.9/56.5)

Arabic       23,092      64.7 (59.2/71.3)  81.7 (85.7/78.1)  85.4 (82.1/89.0)  69.3 (74.9/64.5)  73.5 (73.6/73.5)
Bengali      10,768      21.4 (15.5/34.6)  60.3 (61.4/59.5)  85.5 (81.6/89.7)  47.7 (50.7/45.3)  79.1 (78.6/79.7)
Finnish      15,285      35.4 (28.4/47.1)  60.8 (58.7/63.0)  76.3 (69.8/84.2)  48.0 (56.7/41.8)  65.3 (61.8/69.4)
Indonesian   14,952      32.6 (23.8/51.7)  61.4 (57.2/66.7)  78.6 (72.7/85.6)  51.3 (54.5/48.8)  71.1 (68.7/73.7)
Japanese     16,288      19.4 (14.8/28.0)  40.6 (42.2/39.5)  65.1 (57.8/74.8)  30.4 (42.1/23.9)  53.3 (51.8/55.2)
Kiswahili    17,613      20.3 (13.4/42.0)  60.2 (58.4/62.3)  76.8 (70.1/85.0)  49.7 (55.2/45.4)  67.4 (63.4/72.1)
Korean       10,981      19.9 (13.1/41.5)  56.8 (58.7/55.3)  72.9 (66.3/82.4)  40.1 (45.2/36.2)  56.7 (56.3/58.6)
Russian      12,803      30.0 (25.5/36.4)  63.2 (65.3/61.2)  87.2 (84.4/90.2)  45.8 (51.7/41.2)  76.0 (82.0/70.8)
Telugu       24,558      23.3 (15.1/50.9)  81.3 (81.7/80.9)  95.0 (93.3/96.8)  74.3 (77.7/71.3)  93.3 (91.6/95.2)
Thai         11,365      34.7 (27.8/46.4)  64.7 (61.8/68.0)  76.1 (69.9/84.3)  48.3 (54.3/43.7)  65.6 (63.9/67.9)

Overall      166,916     30.2 (23.6/45.0)  63.1 (57.0/59.1)  79.9 (84.4/74.5)  50.5 (41.3/35.3)  70.1 (70.8/62.4)
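
The Overall F1 values in the table can be cross-checked against the per-language rows. The sketch below assumes that the Overall row is an unweighted mean of the ten non-English F1 scores. That assumption is inferred from the numbers and from the parenthesized English row; it is not stated in the caption. The names F1_ROWS, REPORTED_OVERALL, and macro_f1 are illustrative only. The check covers F1 only, not the precision and recall figures.

```python
# F1 scores copied from Table 5, one row per non-English language.
# Column order: first-passage baseline, mBERT passage, human passage,
# mBERT minimal answer, human minimal answer.
F1_ROWS = {
    "Arabic":     (64.7, 81.7, 85.4, 69.3, 73.5),
    "Bengali":    (21.4, 60.3, 85.5, 47.7, 79.1),
    "Finnish":    (35.4, 60.8, 76.3, 48.0, 65.3),
    "Indonesian": (32.6, 61.4, 78.6, 51.3, 71.1),
    "Japanese":   (19.4, 40.6, 65.1, 30.4, 53.3),
    "Kiswahili":  (20.3, 60.2, 76.8, 49.7, 67.4),
    "Korean":     (19.9, 56.8, 72.9, 40.1, 56.7),
    "Russian":    (30.0, 63.2, 87.2, 45.8, 76.0),
    "Telugu":     (23.3, 81.3, 95.0, 74.3, 93.3),
    "Thai":       (34.7, 64.7, 76.1, 48.3, 65.6),
}

# F1 values from the Overall row of the table, in the same column order.
REPORTED_OVERALL = (30.2, 63.1, 79.9, 50.5, 70.1)


def macro_f1(rows, column):
    """Unweighted mean of one F1 column across languages, to one decimal."""
    values = [scores[column] for scores in rows.values()]
    return round(sum(values) / len(values), 1)


computed = tuple(macro_f1(F1_ROWS, c) for c in range(5))
print("computed:", computed)
print("reported:", REPORTED_OVERALL)
print("match:   ", computed == REPORTED_OVERALL)
```

Under the same reading, the Overall train size (166,916) is the sum of all eleven train sizes, English included.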