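Average inference examples per second on the test sets of word-level tasks (top) and sentence- or document-level tasks (bottom). Values in parentheses give the ratio of mT5 to ByT5 throughput, so a value above 1.0× means ByT5 is slower. We use a TPUv3-128 for GEM-XSum, and a TPUv3-32 elsewhere.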
| Size | Grapheme-to-Phoneme, mT5 | Grapheme-to-Phoneme, ByT5 | Dakshina, mT5 | Dakshina, ByT5 |
|---|---|---|---|---|
| Small | 1223 | 1190 (1.0×) | 9483 | 6482 (1.5×) |
| Base | 726 | 932 (0.8×) | 7270 | 4272 (1.7×) |
| Large | 387 | 478 (0.8×) | 4243 | 2282 (1.9×) |
| XL | 280 | 310 (0.9×) | 2922 | 1263 (2.3×) |
| XXL | 150 | 146 (1.0×) | 1482 | 581 (2.6×) |

| Size | XNLI, mT5 | XNLI, ByT5 | GEM-XSum, mT5 | GEM-XSum, ByT5 |
|---|---|---|---|---|
| Small | 8632 | 1339 (6.4×) | 750 | 202 (3.7×) |
| Base | 5157 | 687 (7.5×) | 450 | 114 (3.9×) |
| Large | 1598 | 168 (9.5×) | 315 | 51 (6.2×) |
| XL | 730 | 81 (9.0×) | 162 | 25 (6.4×) |
| XXL | 261 | 33 (8.0×) | 61 | 10 (6.3×) |
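
The parenthesized factors are consistent with mT5 throughput divided by ByT5 throughput. As a minimal sketch of that arithmetic, the snippet below recomputes the factor from the XNLI columns. The dictionary `xnli` and the helper `slowdown` are illustrative names, not part of the original work. Because the inputs are the rounded throughputs printed in the table, a recomputed factor can differ from the published one in the last digit.

```python
# Throughput in examples/sec, copied from the XNLI columns of the table:
# each entry is (mT5, ByT5).
xnli = {
    "Small": (8632, 1339),
    "Base": (5157, 687),
    "Large": (1598, 168),
    "XL": (730, 81),
    "XXL": (261, 33),
}


def slowdown(mt5_eps: float, byt5_eps: float) -> float:
    """Return mT5 throughput divided by ByT5 throughput.

    A result above 1.0 means ByT5 processes fewer examples per second.
    """
    return mt5_eps / byt5_eps


for size, (mt5_eps, byt5_eps) in xnli.items():
    print(f"{size}: {slowdown(mt5_eps, byt5_eps):.1f}x")
```

The same helper applies to the other three tasks by swapping in their columns.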