Table 7: 

Results over language probes in incoherent Wikipedia test documents. BERT-LargeCNN, XLNet-LargeCNN, RoBERTa-LargeCNN, and ALBERT-xxLargeCNN are trained over CNN, while BERT-Large, XLNet-Large, RoBERTa-Large, and ALBERT-xxLarge are trained over Wikipedia. Here, F1 is over the original incoherent documents (excluding linguistic probes), and ΔF1 indicates the absolute performance difference resulting from incorporating linguistic probes.

                    Gender          Animacy         Animacy         Past to Future
                    F1      ΔF1     F1      ΔF1     F1      ΔF1     F1      ΔF1
BERT-Large          26.5    +65.3   26.3    +53.2   33.6    +45.1   35.6    +42.1
XLNet-Large         55.8    +41.6   50.0    +45.2   64.0    +23.5   64.9    +16.9
RoBERTa-Large       64.9    +32.5   50.7    +38.3   59.7    +21.7   69.2    +19.9
ALBERT-xxLarge      74.0    +25.4   71.8    +8.5    81.0    +2.9    79.8    +4.3
BERT-LargeCNN       23.9    +70.0   22.2    +60.2   27.6    +51.4   30.6    +14.7
XLNet-LargeCNN      13.6    +83.1   10.0    +71.3   8.0     +71.8   23.2    +27.6
RoBERTa-LargeCNN    15.4    +82.4   7.9     +64.4   9.8     +73.3   23.4    +40.0
ALBERT-xxLargeCNN   21.6    +72.8   20.2    +51.8   27.6    +33.4   38.0    +30.4
Human               35.8    +53.4   36.6    +45.3   29.8    +53.9   40.9    +34.4

                    Conjunction     Demonstrative   Negation        Number
                    F1      ΔF1     F1      ΔF1     F1      ΔF1     F1      ΔF1
BERT-Large          51.9    +17.3   34.8    +15.6   34.5    +32.2   32.5    +31.2
XLNet-Large         68.6    +3.6    55.4    0.0     57.7    +8.9    50.7    +11.3
RoBERTa-Large       73.0    +0.7    57.9    0.0     68.4    +10.9   54.2    +20.0
ALBERT-xxLarge      83.5    −1.6    75.2    +1.3    79.5    +2.9    63.9    +10.4
BERT-LargeCNN       38.2    −1.4    35.6    −5.7    28.8    +4.2    19.6    +11.7
XLNet-LargeCNN      31.0    0.0     14.1    0.0     15.7    +11.8   15.2    +13.1
RoBERTa-LargeCNN    33.9    +1.4    17.8    0.0     21.0    +12.4   18.3    +23.6
ALBERT-xxLargeCNN   41.6    +1.3    30.9    0.0     28.1    +19.2   23.0    +16.0
Human               40.5    +8.7    38.0    +1.0    40.4    +36.8   37.3    +24.2
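
As a reading aid, the caption's definition of ΔF1 can be turned into a small calculation. The sketch below assumes that ΔF1 is the with-probe F1 minus the F1 on the original incoherent documents (the caption gives the difference but not its direction explicitly), so the with-probe score is recovered as F1 + ΔF1. The function name `f1_with_probe` and the dictionary `gender_column` are hypothetical names introduced for illustration; the numbers are copied from the Gender column of Table 7.

```python
# (F1 on original incoherent documents, ΔF1 after adding the probe),
# copied from the Gender column of Table 7.
gender_column = {
    "BERT-Large": (26.5, 65.3),
    "ALBERT-xxLarge": (74.0, 25.4),
    "Human": (35.8, 53.4),
}


def f1_with_probe(f1: float, delta_f1: float) -> float:
    """Recover the with-probe F1.

    Assumption: ΔF1 = F1(with probe) - F1(original), so the
    with-probe score is the sum of the two reported values.
    """
    return round(f1 + delta_f1, 1)


for name, (f1, delta_f1) in gender_column.items():
    print(f"{name}: {f1} -> {f1_with_probe(f1, delta_f1)}")
```

Under this reading, a large positive ΔF1 paired with a low F1 (as for BERT-Large on Gender) means most of the final score comes from the probe itself, while a ΔF1 near zero means the probe adds little beyond what is already detected in the original documents.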