mT5 and ByT5 performance on GLUE and SuperGLUE. For each benchmark, we fine-tune a single model on a mixture of all tasks, select the best checkpoint per task based on validation set performance, and report average validation set scores over all tasks.
| Model | GLUE (mT5) | GLUE (ByT5) | SuperGLUE (mT5) | SuperGLUE (ByT5) |
|---|---|---|---|---|
| Small | 75.6 | 80.5 | 60.2 | 67.8 |
| Base | 83.0 | 85.3 | 72.5 | 74.0 |
| Large | 87.6 | 87.0 | 81.9 | 80.4 |
| XL | 88.7 | 87.9 | 84.7 | 83.2 |
| XXL | 90.7 | 90.1 | 89.2 | 88.6 |
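
The reporting procedure described in the caption (pick the best checkpoint separately for each task, then average those per-task scores) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function name `best_per_task_average`, the task names, the checkpoint steps, and the scores are all hypothetical.

```python
from statistics import mean


def best_per_task_average(scores):
    """Select the best validation score per task, then average over tasks.

    scores: {checkpoint_step: {task_name: validation_score}}
    (hypothetical layout, chosen only for this sketch)
    """
    # Collect every task that appears in any checkpoint's results.
    tasks = sorted({task for per_task in scores.values() for task in per_task})
    # For each task, keep the highest validation score across checkpoints.
    best = {
        task: max(per_task[task] for per_task in scores.values() if task in per_task)
        for task in tasks
    }
    # The reported number is the unweighted mean of the per-task bests.
    return best, mean(best.values())


# Hypothetical validation scores for two checkpoints of one fine-tuned model.
scores = {
    1000: {"task_a": 70.0, "task_b": 60.0},
    2000: {"task_a": 68.0, "task_b": 66.0},
}
best, average = best_per_task_average(scores)
```

Because the best checkpoint is chosen independently per task, the averaged score can exceed the average achieved by any single checkpoint, as it does in this toy input.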