Table 8: Results of BERT-based experiments on GLUE development sets. Results on MNLI and QQP are the median of five runs; results on the other datasets are the median of nine runs. We used fewer runs for MNLI and QQP because these datasets are very large and take a long time to train on. Because we re-implemented BERT with a different optimization method, our median performance differs from that reported in Lan et al. (2019).

| Model | CoLA (Matthews Corr.) | SST-2 (Accuracy) | RTE (Accuracy) | QNLI (Accuracy) | MRPC (Accuracy/F1) |
|---|---|---|---|---|---|
| **The median result** | | | | | |
| BERT (Lan et al., 2019) | 60.6 | 93.2 | 70.4 | 92.3 | 88.0/– |
| BERT (our run) | 62.1 | 93.1 | 74.0 | 92.1 | 86.8/90.8 |
| TAPT | 61.2 | 93.1 | 74.0 | 92.0 | 85.3/89.8 |
| SSL-Reg (SATP) | 63.7 | 93.9 | 74.7 | 92.3 | 86.5/90.3 |
| SSL-Reg (MTP) | 63.8 | 93.8 | 74.7 | 92.6 | 87.3/90.9 |
| **The best result** | | | | | |
| BERT (our run) | 63.9 | 93.3 | 75.8 | 92.5 | 89.5/92.6 |
| TAPT | 62.0 | 93.9 | 76.2 | 92.4 | 86.5/90.7 |
| SSL-Reg (SATP) | 65.3 | 94.6 | 78.0 | 92.8 | 88.5/91.9 |
| SSL-Reg (MTP) | 66.3 | 94.7 | 78.0 | 93.1 | 89.5/92.4 |
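To make the reporting protocol in the caption concrete, here is a minimal Python sketch (not the authors' evaluation code) of the two ingredients: the Matthews correlation coefficient used for CoLA, assuming binary 0/1 labels, and the median-over-runs aggregation. The per-run scores below are placeholder values for illustration, not numbers from the paper.

```python
import math
from statistics import median

def matthews_corr(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical per-run CoLA scores for one configuration (nine runs);
# the table reports median(run_scores), not the mean or the maximum.
run_scores = [62.5, 63.1, 63.7, 63.9, 64.2, 63.5, 62.8, 64.0, 63.6]
print(f"reported (median of {len(run_scores)} runs): {median(run_scores):.1f}")
```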