Table 7: 
Summary of the full results on GLUE, comparing the No-KD baseline with the UG-KD structure-distilled BERT (§4.2). All results are based on a single random seed: we select the 1-best fine-tuning hyperparameters (including the random seed) on the validation set, and then evaluate the selected model on the test set.
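|      | Model | CoLA | SST-2 | MRPC | QQP  | MNLI (m/mm) | QNLI | RTE  | GLUE Avg |
|------|-------|------|-------|------|------|-------------|------|------|----------|
| Dev  | No-KD | 60.2 | 92.2  | 90.0 | 89.4 | 90.3/90.9   | 90.7 | 71.1 | 84.4     |
| Dev  | UG-KD | 60.6 | 92.0  | 88.9 | 89.3 | 89.6/90.0   | 89.9 | 68.6 | 83.6     |
| Test | No-KD | 53.1 | 92.5  | 88.0 | 88.8 | 82.8/81.8   | 89.9 | 65.4 | 80.3     |
| Test | UG-KD | 55.3 | 91.2  | 87.6 | 88.7 | 81.9/80.8   | 89.5 | 65.0 | 80.0     |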
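
The caption does not say how the GLUE Avg column is computed. The reported values appear consistent with an unweighted mean over the eight listed scores, with MNLI matched and mismatched counted as separate entries. The following is a minimal sketch of that check, not the authors' evaluation code; the names `TABLE_7`, `REPORTED_AVG`, and `glue_avg` are illustrative, and the averaging rule itself is an assumption inferred from the numbers.

```python
from statistics import mean

# Scores copied from Table 7, in column order:
# CoLA, SST-2, MRPC, QQP, MNLI-m, MNLI-mm, QNLI, RTE.
# Assumption: MNLI matched/mismatched count as two separate entries.
TABLE_7 = {
    ("Dev", "No-KD"): [60.2, 92.2, 90.0, 89.4, 90.3, 90.9, 90.7, 71.1],
    ("Dev", "UG-KD"): [60.6, 92.0, 88.9, 89.3, 89.6, 90.0, 89.9, 68.6],
    ("Test", "No-KD"): [53.1, 92.5, 88.0, 88.8, 82.8, 81.8, 89.9, 65.4],
    ("Test", "UG-KD"): [55.3, 91.2, 87.6, 88.7, 81.9, 80.8, 89.5, 65.0],
}

# GLUE Avg column as reported in Table 7.
REPORTED_AVG = {
    ("Dev", "No-KD"): 84.4,
    ("Dev", "UG-KD"): 83.6,
    ("Test", "No-KD"): 80.3,
    ("Test", "UG-KD"): 80.0,
}


def glue_avg(scores):
    """Unweighted mean of the per-task scores (assumed averaging rule)."""
    return mean(scores)


for key, scores in TABLE_7.items():
    split, model = key
    print(f"{split:4s} {model}: computed {glue_avg(scores):.3f}, "
          f"reported {REPORTED_AVG[key]}")
```

Under this assumption, each computed mean agrees with the reported average to within rounding at one decimal place.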