Table 5: 
Test set performance on GLUE tasks. MRPC: F1/accuracy; STS-B: Pearson/Spearman correlation; QQP: F1/accuracy; MNLI: matched/mismatched accuracy; accuracy for all other tasks. WNLI (not shown) is always set to the majority class (65.1% accuracy) and is included in the average.
Model          CoLA  SST-2  MRPC       STS-B      QQP        MNLI       QNLI  RTE   (Avg)
Google BERT    59.3  95.2   88.5/84.3  86.4/88.0  71.2/89.0  86.1/85.7  93.0  71.1  80.4
Our BERT       58.6  93.9   90.1/86.6  88.4/89.1  71.8/89.3  87.2/86.6  93.0  74.7  81.1
Our BERT-1seq  63.5  94.8   91.2/87.8  89.0/88.4  72.1/89.5  88.0/87.4  93.0  72.1  81.7
SpanBERT       64.3  94.8   90.9/87.9  89.9/89.1  71.9/89.5  88.1/87.7  94.3  79.0  82.8
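The caption states that WNLI is included in the average but does not spell out how the (Avg) column is formed. The following is a minimal sketch of one plausible reading, not a procedure confirmed by the source: each task with two metrics is first reduced to the mean of its pair, the fixed WNLI score of 65.1 is appended, and the nine per-task scores are averaged. The names `TABLE_5`, `REPORTED_AVG`, and `glue_average` are hypothetical and introduced only for illustration; the numbers are copied from Table 5.

```python
# Assumption (not stated in the source): paired metrics are averaged per task,
# then all nine task scores, including WNLI, are averaged with equal weight.
WNLI_MAJORITY = 65.1  # majority-class accuracy given in the caption

# Column order: CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE.
# Tasks reported with two metrics are stored as tuples.
TABLE_5 = {
    "Google BERT":   [59.3, 95.2, (88.5, 84.3), (86.4, 88.0), (71.2, 89.0), (86.1, 85.7), 93.0, 71.1],
    "Our BERT":      [58.6, 93.9, (90.1, 86.6), (88.4, 89.1), (71.8, 89.3), (87.2, 86.6), 93.0, 74.7],
    "Our BERT-1seq": [63.5, 94.8, (91.2, 87.8), (89.0, 88.4), (72.1, 89.5), (88.0, 87.4), 93.0, 72.1],
    "SpanBERT":      [64.3, 94.8, (90.9, 87.9), (89.9, 89.1), (71.9, 89.5), (88.1, 87.7), 94.3, 79.0],
}

# The (Avg) column as printed in Table 5.
REPORTED_AVG = {
    "Google BERT": 80.4,
    "Our BERT": 81.1,
    "Our BERT-1seq": 81.7,
    "SpanBERT": 82.8,
}


def glue_average(task_scores, wnli=WNLI_MAJORITY):
    """Average per-task scores, collapsing metric pairs to their mean first."""
    per_task = [
        sum(score) / len(score) if isinstance(score, tuple) else score
        for score in task_scores
    ]
    per_task.append(wnli)  # WNLI is not shown in the table but counts toward the average
    return sum(per_task) / len(per_task)


if __name__ == "__main__":
    for model, scores in TABLE_5.items():
        print(f"{model}: computed {glue_average(scores):.1f}, reported {REPORTED_AVG[model]}")
```

Under this assumption the computed values round to the reported (Avg) entries for all four rows, which suggests, but does not prove, that this is how the column was produced.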