Results for linguistic probes on incoherent Wikipedia test documents. BERT-LargeCNN, XLNet-LargeCNN, RoBERTa-LargeCNN, and ALBERT-xxLargeCNN are trained on CNN, while BERT-Large, XLNet-Large, RoBERTa-Large, and ALBERT-xxLarge are trained on Wikipedia. Here, F1 is measured on the original incoherent documents (excluding linguistic probes), and ΔF1 is the absolute change in F1 that results from incorporating the linguistic probes.
Model | Gender F1 | Gender ΔF1 | Animacy↓ F1 | Animacy↓ ΔF1 | Animacy↑ F1 | Animacy↑ ΔF1 | Past to Future F1 | Past to Future ΔF1 |
---|---|---|---|---|---|---|---|---|
BERT-Large | 26.5 | +65.3 | 26.3 | +53.2 | 33.6 | +45.1 | 35.6 | +42.1 |
XLNet-Large | 55.8 | +41.6 | 50.0 | +45.2 | 64.0 | +23.5 | 64.9 | +16.9 |
RoBERTa-Large | 64.9 | +32.5 | 50.7 | +38.3 | 59.7 | +21.7 | 69.2 | +19.9 |
ALBERT-xxLarge | 74.0 | +25.4 | 71.8 | +8.5 | 81.0 | +2.9 | 79.8 | +4.3 |
BERT-LargeCNN | 23.9 | +70.0 | 22.2 | +60.2 | 27.6 | +51.4 | 30.6 | +14.7 |
XLNet-LargeCNN | 13.6 | +83.1 | 10.0 | +71.3 | 8.0 | +71.8 | 23.2 | +27.6 |
RoBERTa-LargeCNN | 15.4 | +82.4 | 7.9 | +64.4 | 9.8 | +73.3 | 23.4 | +40.0 |
ALBERT-xxLargeCNN | 21.6 | +72.8 | 20.2 | +51.8 | 27.6 | +33.4 | 38.0 | +30.4 |
Human | 35.8 | +53.4 | 36.6 | +45.3 | 29.8 | +53.9 | 40.9 | +34.4 |

Model | Conjunction F1 | Conjunction ΔF1 | Demonstrative F1 | Demonstrative ΔF1 | Negation F1 | Negation ΔF1 | Number F1 | Number ΔF1 |
---|---|---|---|---|---|---|---|---|
BERT-Large | 51.9 | +17.3 | 34.8 | +15.6 | 34.5 | +32.2 | 32.5 | +31.2 |
XLNet-Large | 68.6 | +3.6 | 55.4 | 0.0 | 57.7 | +8.9 | 50.7 | +11.3 |
RoBERTa-Large | 73.0 | +0.7 | 57.9 | 0.0 | 68.4 | +10.9 | 54.2 | +20.0 |
ALBERT-xxLarge | 83.5 | −1.6 | 75.2 | +1.3 | 79.5 | +2.9 | 63.9 | +10.4 |
BERT-LargeCNN | 38.2 | −1.4 | 35.6 | −5.7 | 28.8 | +4.2 | 19.6 | +11.7 |
XLNet-LargeCNN | 31.0 | 0.0 | 14.1 | 0.0 | 15.7 | +11.8 | 15.2 | +13.1 |
RoBERTa-LargeCNN | 33.9 | +1.4 | 17.8 | 0.0 | 21.0 | +12.4 | 18.3 | +23.6 |
ALBERT-xxLargeCNN | 41.6 | +1.3 | 30.9 | 0.0 | 28.1 | +19.2 | 23.0 | +16.0 |
Human | 40.5 | +8.7 | 38.0 | +1.0 | 40.4 | +36.8 | 37.3 | +24.2 |
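
Because the table reports only the baseline F1 and the change ΔF1, it can help to recover the implied score on the probed documents. The snippet below is a minimal sketch of that arithmetic, assuming ΔF1 = F1(with probe) − F1(original) in absolute points, which is one reading of the caption rather than a definition confirmed by it. The names `GENDER`, `f1_with_probe`, and `recovered_scores` are hypothetical, and the values are copied from four rows of the Gender columns.

```python
# (F1, ΔF1) pairs copied from the Gender columns of the table.
GENDER = {
    "BERT-Large": (26.5, 65.3),
    "ALBERT-xxLarge": (74.0, 25.4),
    "XLNet-LargeCNN": (13.6, 83.1),
    "Human": (35.8, 53.4),
}


def f1_with_probe(f1_original: float, delta_f1: float) -> float:
    """Return the implied F1 on probed documents.

    Assumption (not stated explicitly in the caption):
    ΔF1 = F1(with probe) - F1(original), in absolute points.
    """
    return round(f1_original + delta_f1, 1)


def recovered_scores(table: dict) -> dict:
    """Apply f1_with_probe to every (F1, ΔF1) pair in the table."""
    return {name: f1_with_probe(f1, delta) for name, (f1, delta) in table.items()}


if __name__ == "__main__":
    for name, score in recovered_scores(GENDER).items():
        print(f"{name}: {score}")
```

Under this reading, a low baseline F1 paired with a large ΔF1 still corresponds to a high score once the probe is present, so the two columns are best read together rather than in isolation.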