Ablation experiments on the Canine model, reported as TyDi QA F1 scores. Deltas in parentheses are relative to the top-most experiment, which serves as the baseline configuration for all experiments in this table. Each result is averaged over 3 fine-tuning and evaluation replicas.
Condition | Examples / sec | TyDi QA SelectP | TyDi QA MinSpan |
---|---|---|---|
Attend to (instead of h_up) | 6400 | 64.5 | 52.2 |
8k codepoint hash buckets (instead of 16k) | 6400 | 64.1 (–0.4) | 50.5 (–1.7) |
Character vocab (no hashing) | 6400 | 64.6 (+/–) | 51.2 (–1.0) |
Input character dim 384 (instead of 768) | 6600 | 62.9 (–1.2) | 49.3 (–1.2) |
Input character dim 192 (instead of 768) | 6400 | 61.7 (–2.4) | 47.3 (–3.2) |
No initial character transformer | 6700 | 63.2 (–1.4) | 48.3 (–2.9) |
Downsample by a factor of 5 (instead of 4) | 7000 | 62.9 (–1.7) | 49.2 (–2.0) |
Downsample by a factor of 6 (instead of 4) | 9200 | 62.7 (–1.9) | 47.6 (–3.6) |
Don’t limit final character transformer to MLM positions | 5200 | — | — |
Canine-S | 6400 | 66.0 | 52.5 |
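According to the caption, each parenthesized delta is a condition's replica-averaged score minus the baseline's replica-averaged score. The minimal Python sketch below assumes that each condition's three per-replica F1 scores are available as a list, and that results are rounded to one decimal place. The function names and the replica values are hypothetical; the paper's per-replica scores are not given here. The values were chosen so their means match the 8k-hash-bucket row and the top-most row.

```python
from statistics import mean


def ablation_delta(condition_scores, baseline_scores, ndigits=1):
    """Return (condition mean, condition mean minus baseline mean).

    Each argument is a list of per-replica F1 scores, e.g. the 3
    fine-tuning/evaluation replicas mentioned in the caption.
    Both values are rounded to `ndigits` decimal places (an assumption).
    """
    cond_mean = mean(condition_scores)
    base_mean = mean(baseline_scores)
    return round(cond_mean, ndigits), round(cond_mean - base_mean, ndigits)


def format_cell(score, delta):
    """Render a cell in the table's style, e.g. '64.1 (–0.4)'.

    Uses an en dash as the minus sign, matching the table's notation.
    """
    sign = "–" if delta < 0 else "+"
    return f"{score:.1f} ({sign}{abs(delta):.1f})"


if __name__ == "__main__":
    # Hypothetical replica scores; means are 64.1 and 64.5 respectively.
    condition = [64.0, 64.1, 64.2]
    baseline = [64.3, 64.6, 64.6]
    print(format_cell(*ablation_delta(condition, baseline)))  # prints 64.1 (–0.4)
```

Averaging before subtracting gives the same result as averaging per-replica differences only when both conditions use the same number of replicas. Computing each mean separately keeps the sketch correct even when that does not hold.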