Pre-training speed and computation of mT5 vs. ByT5. Left: sequences per second during pre-training on a TPUv3-64 device. Right: total einsum operations for a forward pass, as logged by the T5 framework. Parenthesized values give the ByT5/mT5 ratio.
Size | mT5 sequences/sec | ByT5 sequences/sec | mT5 einsum ops × 1e12 | ByT5 einsum ops × 1e12
---|---|---|---|---
Small | 1646 | 1232 (0.75×) | 87 | 98 (1.13×)
Base | 747 | 576 (0.77×) | 168 | 194 (1.15×)
Large | 306 | 232 (0.76×) | 346 | 416 (1.20×)
XL | 94 | 70 (0.74×) | 1000 | 1220 (1.22×)
XXL | 33 | 25 (0.76×) | 1660 | 2070 (1.25×)
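The parenthesized ratios are ByT5 divided by mT5 for each model size. The sketch below recomputes them from the table's raw numbers as a consistency check. The dictionary names `THROUGHPUT` and `EINSUM_OPS` and the helper `ratios` are hypothetical, introduced only for illustration, and the values are transcribed by hand from the table rather than read from any T5 log.

```python
# Values transcribed from the table: (mT5, ByT5) for each model size.
THROUGHPUT = {  # pre-training sequences / sec on a TPUv3-64 device
    "Small": (1646, 1232),
    "Base": (747, 576),
    "Large": (306, 232),
    "XL": (94, 70),
    "XXL": (33, 25),
}

EINSUM_OPS = {  # einsum ops x 1e12 for one forward pass
    "Small": (87, 98),
    "Base": (168, 194),
    "Large": (346, 416),
    "XL": (1000, 1220),
    "XXL": (1660, 2070),
}


def ratios(table):
    """Return the ByT5 / mT5 ratio per model size, rounded to two decimals."""
    return {size: round(byt5 / mt5, 2) for size, (mt5, byt5) in table.items()}


if __name__ == "__main__":
    speed = ratios(THROUGHPUT)
    compute = ratios(EINSUM_OPS)
    for size in THROUGHPUT:
        print(f"{size:>5}: throughput {speed[size]:.2f}x, einsum ops {compute[size]:.2f}x")
```

Every recomputed value matches the table. A throughput ratio near 0.75 means ByT5 processes about three sequences in the time mT5 processes four, while its forward-pass compute grows from about 1.13× to 1.25× as model size increases.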