Table 5: Test set results for the structured prediction tasks with BERTLARGE-Cased; each entry is the mean over three random seeds. We compare the no-distillation baseline (“No-KD”) with the best structure-distilled model, as selected on the validation set (“Best-KD”); “Error Red.” reports the test error reduction relative to the No-KD baseline. We also report the previous state of the art among non-ensemble models pretrained on the original BERT training set (“BERT SoTA”; see footnote 21).
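Test Set – BERTLARGE-Cased

| Group   | Task – Metric   | No-KD | Best-KD | Error Red. | BERT SoTA |
|---------|-----------------|-------|---------|------------|-----------|
| Parsing | Const. PTB – F1 | 95.80 | 95.95   | 3.73%      | 95.84     |
|         | Const. PTB – EM | 56.87 | 57.74   | 2.02%      | –         |
|         | Const. OOD – F1 | 89.63 | 90.20   | 5.48%      | 89.91     |
|         | Dep. PTB – UAS  | 96.91 | 97.03   | 3.78%      | 97.0      |
|         | Dep. PTB – LAS  | 95.33 | 95.49   | 3.43%      | 95.43     |
| SRL     | OntoNotes       | 87.59 | 87.77   | 1.45%      | 86.5♢     |
| Coref.  | OntoNotes       | 74.03 | 74.69   | 2.55%      | 79.6♦     |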
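The “Error Red.” column can be read as a relative reduction in error. The short sketch below assumes that error is defined as 100 minus the score (an assumption; the caption does not spell out the formula) and recomputes the column from the rounded mean scores in the table. Under this assumption several entries are reproduced exactly (for example Const. PTB EM at 2.02%, Dep. PTB LAS at 3.43%, and SRL at 1.45%). Others differ slightly (for example 3.57% rather than 3.73% for Const. PTB F1), presumably because the reported values were computed from unrounded or per-seed scores that this sketch does not have. The function name `error_reduction` is hypothetical.

```python
def error_reduction(baseline: float, improved: float, max_score: float = 100.0) -> float:
    """Relative error reduction (in %) of `improved` over `baseline`.

    Assumption: error = max_score - score, i.e. 100 - score for percentage
    metrics such as F1, EM, UAS, and LAS.
    """
    baseline_err = max_score - baseline
    improved_err = max_score - improved
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Rounded mean test scores copied from Table 5 as (No-KD, Best-KD).
SCORES = {
    "Const. PTB - F1": (95.80, 95.95),
    "Const. PTB - EM": (56.87, 57.74),
    "Const. OOD - F1": (89.63, 90.20),
    "Dep. PTB - UAS": (96.91, 97.03),
    "Dep. PTB - LAS": (95.33, 95.49),
    "SRL - OntoNotes": (87.59, 87.77),
    "Coref. - OntoNotes": (74.03, 74.69),
}

# Print the recomputed error reduction for each task.
for task, (no_kd, best_kd) in SCORES.items():
    print(f"{task}: {error_reduction(no_kd, best_kd):.2f}%")
```

Because the inputs are already-rounded means, the recomputed column should be treated as a sanity check on the definition, not as a replacement for the reported figures.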