Table 4: Uncased BLEU-4 scores on the WMT14 English↔German newstest2014 and newstest2016 test sets. Models in the middle section use the 110k-wordpiece vocabulary that ships with the multilingual BERT checkpoint. In the bottom two sections, we use a native 32k-wordpiece vocabulary extracted from the WMT14 training set and a BERT checkpoint pre-trained only on the English and German portions of Wikipedia. *Leverages a large number of additional parallel sentence pairs obtained with back-translation; we include this score as a reference for the highest result achieved on newstest2014. The En→De GPT-2 results (where the GPT-2-initialized decoder decodes German targets; shown in italics) a priori penalize GPT-2, which was pretrained only on English text.
| Model | newstest2014 En→De | newstest2014 De→En | newstest2016 En→De | newstest2016 De→En |
|---|---|---|---|---|
| (Vaswani et al., 2017) | 27.3 | – | – | – |
| Transformer (ours) | 28.1 | 31.4 | 33.5 | 37.9 |
| KERMIT (Chan et al., 2019) | 28.7 | 31.4 | – | – |
| (Shaw et al., 2018) | 29.2 | – | – | – |
| (Edunov et al., 2018)* | 35.0 (33.8) | – | – | – |
| **Initialized with public checkpoints (12 layers) and vocabulary** | | | | |
| Transformer (ours) | 23.7 | 26.6 | 31.6 | 35.8 |
| rnd2rnd | 26.0 | 29.1 | 32.4 | 36.7 |
| bert2rnd | 30.1 | 32.7 | 34.4 | 39.6 |
| rnd2bert | 27.2 | 30.4 | 33.2 | 37.5 |
| bert2bert | 30.1 | 32.7 | 34.6 | 39.3 |
| bertShare | 29.6 | 32.6 | 34.4 | 39.6 |
| gpt | *16.4* | 21.5 | *22.4* | 27.7 |
| rnd2gpt | *19.6* | 23.2 | *24.2* | 28.5 |
| bert2gpt | *23.2* | 31.4 | *28.1* | 37.0 |
| **Initialized with a custom BERT checkpoint (12 layers) and vocabulary** | | | | |
| bert2rnd | 30.6 | 33.5 | 35.1 | 40.2 |
| bertShare | 30.5 | 33.6 | 35.5 | 40.1 |
| **Initialized with a custom BERT checkpoint (24 layers) and vocabulary** | | | | |
| bert2rnd | 31.7 | 34.2 | 35.6 | 41.1 |
| bertShare | 30.5 | 33.8 | 35.4 | 40.9 |
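For reference, uncased BLEU-4 is the standard 4-gram BLEU computed after lowercasing both hypotheses and references. A minimal sketch using sacrebleu, which is an assumption for illustration; the exact scorer behind the table is not specified here:

```python
# Sketch: uncased BLEU-4 scoring with sacrebleu (tooling assumption;
# the scorer used to produce the table is not specified in this section).
import sacrebleu

# Hypothetical system outputs and references for a newstest-style evaluation.
hypotheses = [
    "the cat sat on the mat .",
    "a quick brown fox jumps over the lazy dog .",
]
references = [
    "The cat sat on the mat .",
    "The quick brown fox jumped over the lazy dog .",
]

# lowercase=True makes the metric case-insensitive ("uncased");
# BLEU uses up to 4-gram precision by default (BLEU-4).
bleu = sacrebleu.corpus_bleu(hypotheses, [references], lowercase=True)
print(f"Uncased BLEU-4: {bleu.score:.1f}")
```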
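The row labels encode how the encoder and decoder are initialized: for example, bert2rnd pairs a BERT-initialized encoder with a randomly initialized decoder, bert2gpt pairs a BERT-initialized encoder with a GPT-2-initialized decoder, and bertShare ties the encoder and decoder parameters. A hedged sketch of the bert2bert setup using the Hugging Face EncoderDecoderModel API, as a modern stand-in rather than the paper's original implementation:

```python
# Sketch: warm-starting an encoder-decoder model from BERT checkpoints,
# in the spirit of the bert2bert row. Hugging Face transformers is used
# here as an illustrative stand-in for the paper's own code.
from transformers import BertTokenizer, EncoderDecoderModel

# Public multilingual BERT checkpoint (110k wordpiece vocabulary),
# matching the "public checkpoints" section of the table.
checkpoint = "bert-base-multilingual-cased"

tokenizer = BertTokenizer.from_pretrained(checkpoint)

# Both encoder and decoder start from BERT weights; the decoder
# additionally gets randomly initialized cross-attention layers
# and a causal language-modeling head.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(checkpoint, checkpoint)

# Generation settings required for seq2seq decoding.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# From here, the model would be fine-tuned on WMT14 En-De parallel data.
```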