Direct comparison between mBert (rows 1–2) and Canine (rows 5–7) on TyDi QA. Public mBert results are taken from the TyDi QA paper. Rows 3 and 4 show simple character-input baselines without downsampling (r = 1). Both are slow, and row 3 is also low-quality. Despite operating on 4× more input positions (2048 characters vs. 512 subwords), Canine remains comparable to mBert in speed (5600–6400 vs. 9000 pre-training examples/sec), while the baselines in rows 3 and 4 reach only 900–925. Pre-training examples/sec are reported for our hardware (see Setup, §4.1). r is the downsampling ratio. Parameter counts are computed at fine-tuning time. All results are averaged over 3 fine-tuning replicas. TyDi QA scores are F1 scores, macro-averaged across languages. Deltas from our mBert run (the most comparable baseline) are shown in parentheses.
| Model | Input | MLM | r | Length | Examples/sec | Params | TyDi QA SelectP | TyDi QA MinSpan |
|---|---|---|---|---|---|---|---|---|
| mBert (public) | Subwords | Subwords | – | 512 | – | 179M | 63.1 | 50.5 |
| mBert (ours) | Subwords | Subwords | – | 512 | 9000 | 179M | 63.2 | 51.3 |
| | Chars | Single Chars | 1 | 2048 | 925 | 127M | 59.5 (–3.7) | 43.7 (–7.5) |
| | Chars | Subwords | 1 | 2048 | 900 | 127M | 63.8 (+0.6) | 50.2 (–1.0) |
| Canine-S | Chars | Subwords | 4 | 2048 | 6400 | 127M | 66.0 (+2.8) | 52.5 (+1.2) |
| Canine-C | Chars | Autoreg. Chars | 4 | 2048 | 6050 | 127M | 65.7 (+2.5) | 53.0 (+1.7) |
| Canine-C + n-grams | Chars | Autoreg. Chars | 4 | 2048 | 5600 | 167M | 68.1 (+4.9) | 57.0 (+5.7) |
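The parenthesized deltas and the effect of the downsampling ratio r can be checked with a little arithmetic. The Python sketch below copies the scores from the table, treats mBert (ours) as the reference row, and assumes, as the r column suggests, that a ratio of 4 reduces 2048 character positions to 2048 / 4 = 512, the same length mBert processes. The helper names (`downsampled_length`, `delta`, `summarize`) and row labels are hypothetical and serve only to illustrate the arithmetic.

```python
# Values copied from the table (input lengths and TyDi QA F1 scores).
# Each row is (label, r, input_length, select_p, min_span).
# The labels are hypothetical shorthand, not names used in the paper.
ROWS = [
    ("row 3 baseline", 1, 2048, 59.5, 43.7),
    ("row 4 baseline", 1, 2048, 63.8, 50.2),
    ("Canine-S", 4, 2048, 66.0, 52.5),
    ("Canine-C", 4, 2048, 65.7, 53.0),
    ("Canine-C + n-grams", 4, 2048, 68.1, 57.0),
]

# "mBert (ours)" is the reference row for the parenthesized deltas.
BASELINE_SELECT_P = 63.2
BASELINE_MIN_SPAN = 51.3


def downsampled_length(input_length: int, r: int) -> int:
    """Number of positions left after downsampling the input by ratio r."""
    return input_length // r


def delta(score: float, baseline: float) -> float:
    """Difference from the baseline, rounded to one decimal like the table."""
    return round(score - baseline, 1)


def summarize():
    """Return (label, positions, SelectP delta, MinSpan delta) per row."""
    out = []
    for label, r, length, select_p, min_span in ROWS:
        out.append((
            label,
            downsampled_length(length, r),
            delta(select_p, BASELINE_SELECT_P),
            delta(min_span, BASELINE_MIN_SPAN),
        ))
    return out


if __name__ == "__main__":
    for label, positions, d_sel, d_min in summarize():
        print(f"{label:20s} positions={positions:5d} "
              f"SelectP {d_sel:+.1f} MinSpan {d_min:+.1f}")
```

Run as a script, it prints one line per character-input row. The SelectP deltas and the Canine MinSpan deltas match the table. For rows 3 and 4, recomputing from the rounded scores gives MinSpan deltas of –7.6 and –1.1 rather than the printed –7.5 and –1.0, which may reflect deltas computed from unrounded averages, so the printed values remain the reported ones.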