Table 6:

Ablation experiments on the Canine model with TyDi QA F1 scores. Deltas are shown in parentheses with regard to the top-most experiment, which serves as the baseline configuration for all experiments in this table. Each result is averaged over 3 fine-tuning and evaluation replicas.

| Condition | Examples / sec | TyDi QA SelectP | TyDi QA MinSpan |
|---|---|---|---|
| Attend to $h'_{\text{down}}$ (instead of $h_{\text{up}}$) | 6400 | 64.5 | 52.2 |
| 8k codepoint hash buckets (instead of 16k) | 6400 | 64.1 (–0.4) | 50.5 (–1.7) |
| Character vocab (no hashing) | 6400 | 64.6 (+/–) | 51.2 (–1.0) |
| Input character dim 384 (instead of 768) | 6600 | 62.9 (–1.2) | 49.3 (–1.2) |
| Input character dim 192 (instead of 768) | 6400 | 61.7 (–2.4) | 47.3 (–3.2) |
| No initial character transformer | 6700 | 63.2 (–1.4) | 48.3 (–2.9) |
| Downsample by a factor of 5 (instead of 4) | 7000 | 62.9 (–1.7) | 49.2 (–2.0) |
| Downsample by a factor of 6 (instead of 4) | 9200 | 62.7 (–1.9) | 47.6 (–3.6) |
| Don’t limit final character transformer to MLM positions | 5200 | — | — |
| Canine-S | 6400 | 66.0 | 52.5 |