Table 7: Models used in our ablation study.

Model                    Params   Description
ByT5-Large               1.23B    Baseline ByT5 model
mT5-Large                1.23B    Baseline mT5 model

(a) ByT5-36/12-668M      668M     Encoder: 36 layers, decoder: 12 layers
(b) ByT5-24/24-718M      718M     Encoder: 24 layers, decoder: 24 layers
(c) ByT5-12/36-768M      768M     Encoder: 12 layers, decoder: 36 layers

(d) mT5-36/12-1.18B      1.18B    Encoder: 36 layers, decoder: 12 layers

(e) ByT5-Large-Span3     1.23B    Mean noise span length 3.0
(f) ByT5-Large-Span40    1.23B    Mean noise span length 40.0

(g) CharT5-36/12-1.23B   1.23B    47K-character vocabulary
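Variants (e) and (f) differ only in the mean noise span length used by the span-corruption pre-training objective. The sketch below illustrates what that setting changes. It follows the span-count arithmetic of T5's span-corruption noise mask and assumes a noise density of 0.15 and an input length of 1024 bytes. Neither value appears in this table, and the function name `span_counts` is hypothetical. Holding the masking budget fixed, a longer mean span yields fewer, longer masked chunks.

```python
def span_counts(length: int, noise_density: float, mean_span: float) -> tuple[int, int]:
    """Return (number of masked units, number of masked spans).

    Mirrors the arithmetic of T5-style span corruption: the total masking
    budget depends only on the noise density, and the mean span length
    decides how that budget is split into spans.
    """
    # Total number of bytes/tokens to mask, kept within [1, length - 1].
    num_noise = int(round(length * noise_density))
    num_noise = min(max(num_noise, 1), length - 1)
    # Number of contiguous masked spans; always at least one.
    num_spans = max(int(round(num_noise / mean_span)), 1)
    return num_noise, num_spans


if __name__ == "__main__":
    # Assumed settings: 1024-byte inputs, 15% noise density.
    for mean_span in (3.0, 40.0):
        noise, spans = span_counts(1024, 0.15, mean_span)
        print(f"mean span {mean_span}: {noise} bytes masked in {spans} spans")
```

Under these assumptions, both variants mask the same number of bytes per sequence. The Span3 variant spreads them over many short gaps, while the Span40 variant concentrates them in a handful of long ones.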