Table 3: 
Summary of the validation and test set results on GLUE. Validation results are reported in two ways for each task: as the average over five random seeds, which accounts for variance, and as the 1-best random seed, which does not. Test results are obtained, for each task, from the random seed that scored best on the validation set.
                              No-KD          UG-KD
Validation set (per-task average / 1-best random seed)
CoLA                          50.7 / 60.2    54.3 / 60.6
7-task avg. (excl. CoLA)      85.4 / 87.8    84.8 / 86.9
Overall 8-task avg.           81.1 / 84.4    81.0 / 83.6

Test set (per-task 1-best random seed on validation set)
CoLA                          53.1           55.3
7-task avg. (excl. CoLA)      84.2           83.5
Overall 8-task avg.           80.3           80.0
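
The rows of Table 3 appear to be related by simple arithmetic: the overall 8-task average can be recovered from the CoLA score and the 7-task average. The sketch below checks this under the assumption (not stated in the table) that the overall figure is an unweighted mean over the eight tasks. The function name `overall_avg` and the dictionary `rows` are hypothetical; the numbers are copied from the table. Because the 7-task averages are already rounded to one decimal place, the recomputed values can differ from the reported ones by a few hundredths.

```python
# Consistency check for Table 3.
# Assumption: "Overall 8-task avg." is the unweighted mean of CoLA and the
# seven remaining tasks, i.e. (CoLA + 7 * seven_task_avg) / 8.

def overall_avg(cola: float, seven_task_avg: float) -> float:
    """Combine the CoLA score and the 7-task average into an 8-task average."""
    return (cola + 7 * seven_task_avg) / 8

# (CoLA, 7-task avg., reported overall avg.), copied from Table 3.
rows = {
    "validation, seed average, No-KD": (50.7, 85.4, 81.1),
    "validation, seed average, UG-KD": (54.3, 84.8, 81.0),
    "validation, 1-best seed, No-KD": (60.2, 87.8, 84.4),
    "validation, 1-best seed, UG-KD": (60.6, 86.9, 83.6),
    "test, No-KD": (53.1, 84.2, 80.3),
    "test, UG-KD": (55.3, 83.5, 80.0),
}

for name, (cola, seven, reported) in rows.items():
    recomputed = overall_avg(cola, seven)
    # Print to two decimals so the effect of rounding in the inputs is visible.
    print(f"{name}: recomputed {recomputed:.2f}, reported {reported}")
```

Under this assumption, every recomputed value agrees with the reported overall average to within rounding of the inputs.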