Degradation of mT5 and ByT5 under various types of noise. “Clean” shows original task performance. Subsequent rows show the delta from “clean” when adding different types of noise. Learnable noise is added in training and eval, while unseen noise only affects eval.
| Noise | Model | Learnable Noise: XNLI (accuracy) | Learnable Noise: TyDiQA-GoldP (F1) | Unseen Noise: XNLI (accuracy) |
|---|---|---|---|---|
| Clean | mT5 | 81.1 | 85.3 | 81.1 |
| | ByT5 | 79.7 | 87.7 | 79.7 |
| Drop | mT5 | −10.2 | −24.0 | −18.3 |
| | ByT5 | −8.2 | −19.5 | −11.4 |
| Repetitions | mT5 | −8.5 | −9.5 | −12.3 |
| | ByT5 | −4.1 | −3.0 | −5.9 |
| Antspeak | mT5 | −32.0 | −27.7 | −34.4 |
| | ByT5 | −8.7 | −4.8 | −24.4 |
| Uppercase | mT5 | −7.0 | −8.0 | −8.1 |
| | ByT5 | −1.5 | −0.5 | −1.7 |
| Random Case | mT5 | −25.7 | −14.3 | −19.2 |
| | ByT5 | −1.5 | −0.2 | −5.9 |
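
Because the noise rows report deltas rather than scores, the absolute score under noise is the clean score plus the delta. The following is a minimal Python sketch of that arithmetic using the values transcribed from the table. The names `COLUMNS`, `CLEAN`, `DELTAS`, `absolute_scores`, and `more_robust_model` are illustrative assumptions and do not come from the original work.

```python
# Column order used by every tuple below (transcribed from the table).
COLUMNS = ("XNLI learnable", "TyDiQA-GoldP learnable", "XNLI unseen")

# Clean task performance per model.
CLEAN = {
    "mT5": (81.1, 85.3, 81.1),
    "ByT5": (79.7, 87.7, 79.7),
}

# Deltas from clean performance for each noise type.
DELTAS = {
    "Drop": {"mT5": (-10.2, -24.0, -18.3), "ByT5": (-8.2, -19.5, -11.4)},
    "Repetitions": {"mT5": (-8.5, -9.5, -12.3), "ByT5": (-4.1, -3.0, -5.9)},
    "Antspeak": {"mT5": (-32.0, -27.7, -34.4), "ByT5": (-8.7, -4.8, -24.4)},
    "Uppercase": {"mT5": (-7.0, -8.0, -8.1), "ByT5": (-1.5, -0.5, -1.7)},
    "Random Case": {"mT5": (-25.7, -14.3, -19.2), "ByT5": (-1.5, -0.2, -5.9)},
}


def absolute_scores(noise: str, model: str) -> tuple:
    """Return clean score + delta for each column, rounded to one decimal."""
    return tuple(
        round(clean + delta, 1)
        for clean, delta in zip(CLEAN[model], DELTAS[noise][model])
    )


def more_robust_model(noise: str) -> tuple:
    """For each column, name the model whose score degrades less."""
    return tuple(
        "ByT5" if abs(byt5) < abs(mt5) else "mT5"
        for mt5, byt5 in zip(DELTAS[noise]["mT5"], DELTAS[noise]["ByT5"])
    )


if __name__ == "__main__":
    for noise in DELTAS:
        for model in CLEAN:
            print(noise, model, dict(zip(COLUMNS, absolute_scores(noise, model))))
        print(noise, "smaller degradation:", more_robust_model(noise))
```

For example, `absolute_scores("Drop", "mT5")` gives the mT5 scores under drop noise, and `more_robust_model("Antspeak")` compares the two models' deltas column by column.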