Table 10: Results of BERT-based experiments on GLUE test sets, scored by the GLUE evaluation server (https://gluebenchmark.com/leaderboard). Models evaluated on AX are trained on the MNLI training set.

Task (Metric)                          BERT       TAPT       SSL-Reg (SATP)  SSL-Reg (MTP)
CoLA (Matthews Corr.)                  60.5       61.3       63.0            61.2
SST-2 (Accuracy)                       94.9       94.4       95.1            95.2
RTE (Accuracy)                         70.1       70.3       71.2            72.7
QNLI (Accuracy)                        92.7       92.4       92.5            93.2
MRPC (Accuracy/F1)                     85.4/89.3  85.9/89.5  85.3/89.3       86.1/89.8
MNLI-m/mm (Accuracy)                   86.7/85.9  85.7/84.4  86.2/85.4       86.6/86.1
QQP (Accuracy/F1)                      89.3/72.1  89.3/71.6  89.6/72.2       89.7/72.5
STS-B (Pearson Corr./Spearman Corr.)   87.6/86.5  88.4/87.3  88.3/87.5       88.1/87.2
WNLI (Accuracy)                        65.1       65.8       65.8            66.4
AX (Matthews Corr.)                    39.6       39.3       40.2            40.3
Average                                80.5       80.6       81.0            81.3
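The Average row appears to follow the GLUE leaderboard convention: paired metrics within a task (e.g. Accuracy/F1) are averaged first, MNLI-m/mm counts as a single task, and the AX diagnostic set is excluded. The source does not state this explicitly, so the sketch below is a check under that assumption, using the BERT column of Table 10:

```python
# Sketch: reproduce the "Average" row of Table 10 for the BERT column,
# assuming the GLUE-leaderboard convention (an assumption, not stated in
# the table itself): paired metrics are averaged within a task, MNLI-m/mm
# is one task, and AX is excluded from the average.

# Per-task scores from the BERT column; multi-metric tasks list both values.
bert = {
    "CoLA": [60.5],
    "SST-2": [94.9],
    "RTE": [70.1],
    "QNLI": [92.7],
    "MRPC": [85.4, 89.3],
    "MNLI-m/mm": [86.7, 85.9],
    "QQP": [89.3, 72.1],
    "STS-B": [87.6, 86.5],
    "WNLI": [65.1],
}

def glue_average(scores):
    """Mean of per-task scores, where each multi-metric task is first
    collapsed to the mean of its own metrics."""
    per_task = [sum(vals) / len(vals) for vals in scores.values()]
    return sum(per_task) / len(per_task)

print(round(glue_average(bert), 1))  # matches the reported 80.5
```

The same procedure reproduces the other three columns (80.6, 81.0, 81.3), which supports the assumed averaging rule.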