Table 2: Results of different models and initialization techniques on DiscoFuse and subsampled training sets. Blockwise sorted by SARI score on 100% of the training set.
DiscoFuse               100%     100%     10%      1%
                        Exact    SARI     SARI     SARI
(Geva et al., 2019)     51.1     84.5     –        –

Initialized with the base checkpoint (12 layers)
roberta2gpt             65.6     89.9     87.1     80.3
robertaShare            65.3     89.7     86.9     81.2
bert2bert               63.9     89.3     86.1     81.2
bert2rnd                63.9     89.3     86.1     80.3
bertShare               63.9     89.2     86.0     80.8
bert2gpt                61.5     88.4     84.1     70.2
gpt                     60.4     88.0     82.9     74.5
rnd2bert                60.0     87.6     82.1     72.8
rnd2rnd                 58.3     86.9     81.5     69.3
rnd2gpt                 57.6     86.5     81.4     70.6

Initialized with the large checkpoint (24 layers)
robertaShare            66.6     90.3     87.7     81.5
bertShare               65.3     89.9     86.6     81.4