Table 2: Direct comparison between mBERT (rows 1–2) and Canine (rows 5–7) on TyDi QA. Public mBERT results are taken from the TyDi QA paper. Rows 3 and 4 are simple character-level baselines that yield inefficient or low-quality models. Despite operating on 4x more sequence positions (2048 vs. 512), Canine remains comparable to mBERT in pre-training speed. Pre-training examples/sec are reported for our hardware (see Setup, §4.1). r is the downsampling ratio. Parameter counts are computed at fine-tuning time. All results are averaged over 3 fine-tuning replicas. TyDi QA scores are F1 scores, macro-averaged across languages. Deltas from our mBERT (the most comparable baseline) are shown in parentheses.

| Model | Input | MLM | r | Length | Examples/sec | Params | TyDi QA SelectP | TyDi QA MinSpan |
|---|---|---|---|---|---|---|---|---|
| mBERT (public) | Subwords | Subwords | – | 512 | – | 179M | 63.1 | 50.5 |
| mBERT (ours) | Subwords | Subwords | – | 512 | 9000 | 179M | 63.2 | 51.3 |
|  | Chars | Single Chars |  | 2048 | 925 | 127M | 59.5 (–3.7) | 43.7 (–7.5) |
|  | Chars | Subwords |  | 2048 | 900 | 127M | 63.8 (+0.6) | 50.2 (–1.0) |
| Canine-S | Chars | Subwords |  | 2048 | 6400 | 127M | 66.0 (+2.8) | 52.5 (+1.2) |
| Canine-C | Chars | Autoreg. Chars |  | 2048 | 6050 | 127M | 65.7 (+2.5) | 53.0 (+1.7) |
| Canine-C + n-grams | Chars | Autoreg. Chars |  | 2048 | 5600 | 167M | 68.1 (+4.9) | 57.0 (+5.7) |
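To make the parenthesized deltas and the speed comparison concrete, the Python sketch below recomputes them from the rounded values in the table, relative to mBERT (ours). The labels for rows 3 and 4 are hypothetical shorthand (those rows have no model name), and `compare` is an illustrative helper, not code from the paper.

```python
# Values transcribed from Table 2: (pre-training examples/sec, SelectP F1, MinSpan F1).
# The "row 3"/"row 4" labels are hypothetical shorthand for the unnamed baselines.
BASELINE = "mBERT (ours)"

rows = {
    "mBERT (ours)": (9000, 63.2, 51.3),
    "row 3: chars in, single-char MLM": (925, 59.5, 43.7),
    "row 4: chars in, subword MLM": (900, 63.8, 50.2),
    "Canine-S": (6400, 66.0, 52.5),
    "Canine-C": (6050, 65.7, 53.0),
    "Canine-C + n-grams": (5600, 68.1, 57.0),
}


def compare(rows, baseline=BASELINE):
    """Return {name: (relative throughput, SelectP delta, MinSpan delta)} vs. the baseline."""
    base_eps, base_sel, base_min = rows[baseline]
    out = {}
    for name, (eps, sel, mins) in rows.items():
        if name == baseline:
            continue
        out[name] = (
            round(eps / base_eps, 2),   # fraction of mBERT's pre-training throughput
            round(sel - base_sel, 1),   # SelectP delta in F1 points
            round(mins - base_min, 1),  # MinSpan delta in F1 points
        )
    return out


for name, (speed, d_sel, d_min) in compare(rows).items():
    print(f"{name:35s} speed x{speed:.2f}  SelectP {d_sel:+.1f}  MinSpan {d_min:+.1f}")
```

The Canine variants run at 0.62 to 0.71 of mBERT's pre-training throughput, while rows 3 and 4 run at about 0.10. Recomputing from the rounded entries reproduces every printed delta except two MinSpan values (–7.6 and –1.1 here versus –7.5 and –1.0 in the table); a plausible explanation, not confirmed by the source, is that the printed deltas were computed from unrounded averages.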