Table 1: RoBERTa-base (top, a) and RoBERTa-large (bottom, b) parsing results for the full ceiling model and the probe on the PTB Stanford Dependencies (SD) test set and the CoNLL 2015 DM in-domain test set. We also report their absolute and relative differences (probe − full); the relative difference is expressed as a percentage of the ceiling score. The smaller the magnitude of the difference, the more relevant content the pretrained model already encodes. We report the canonical parsing metric (LAS for PTB SD and labeled F1 for DM) as well as labeled and unlabeled exact match scores (LEM/UEM). All numbers are means ± standard deviations across three seeds.

         PTB SD                                         CoNLL 2015 DM
Metrics  AbsΔ        RelΔ         Ceiling    Probe      AbsΔ        RelΔ         Ceiling    Probe
LAS/F1   −13.5±0.2   −14.2%±0.2   95.2±0.1   81.7±0.1   −23.5±0.1   −24.9%±0.2   94.2±0.0   70.7±0.2
LEM      −36.4±0.8   −72.4%±1.1   50.3±0.5   13.9±0.5   −45.4±1.1   −93.5%±0.5   48.5±1.2   3.1±0.2
UEM      −46.3±0.7   −73.2%±0.5   63.3±0.8   17.0±0.3   −48.8±1.0   −92.8%±0.5   52.6±1.0   3.8±0.2
(a) Base. 
 
         PTB SD                                         CoNLL 2015 DM
Metrics  AbsΔ        RelΔ         Ceiling    Probe      AbsΔ        RelΔ         Ceiling    Probe
LAS/F1   −17.6±0.1   −18.5%±0.1   95.3±0.0   77.7±0.1   −26.7±0.3   −28.3%±0.3   94.4±0.1   67.7±0.2
LEM      −40.0±0.6   −77.2%±0.4   51.9±0.6   11.8±0.2   −46.6±1.1   −94.4%±0.1   49.3±1.1   2.7±0.0
UEM      −50.2±0.6   −77.4%±0.2   64.8±0.7   14.6±0.2   −50.0±1.1   −93.9%±0.2   53.2±1.0   3.3±0.1
(b) Large. 
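The columns of Table 1 are related arithmetically, and the sketch below makes that relationship concrete. It is a minimal illustration, not the authors' evaluation code, and it rests on several assumptions. AbsΔ is taken as probe − ceiling and RelΔ as AbsΔ divided by the ceiling, in percent; this reading matches the reported numbers but is inferred rather than stated. Exact match is taken as the percentage of sentences whose predicted edge set equals the gold set, with edges assumed to be (head, dependent, label) triples and labels dropped for UEM. The across-seed spread is assumed to be the sample standard deviation. All names (`deltas`, `mean_std`, `exact_match`, `BASE_PTB`) are hypothetical. Because the table reports means over three seeds, recomputing RelΔ from the rounded means can differ from the printed value by about 0.1.

```python
from statistics import mean, stdev


def deltas(ceiling: float, probe: float) -> tuple[float, float]:
    """Return (AbsΔ, RelΔ in %), assuming AbsΔ = probe - ceiling, RelΔ = AbsΔ / ceiling."""
    abs_delta = probe - ceiling
    rel_delta = 100.0 * abs_delta / ceiling
    return abs_delta, rel_delta


def mean_std(values: list[float]) -> tuple[float, float]:
    """Mean and (assumed) sample standard deviation of one metric across seeds."""
    return mean(values), stdev(values)


def exact_match(gold_sents, pred_sents, labeled: bool = True) -> float:
    """Percentage of sentences whose predicted edge set matches gold exactly.

    Each sentence is a set of (head, dependent, label) triples (an assumed
    representation). With labeled=False the label is dropped, giving UEM.
    """
    def edges(sent):
        return {e if labeled else e[:2] for e in sent}

    hits = sum(edges(g) == edges(p) for g, p in zip(gold_sents, pred_sents))
    return 100.0 * hits / len(gold_sents)


# Rounded means from subtable (a), PTB SD:
# metric -> (ceiling, probe, reported AbsΔ, reported RelΔ)
BASE_PTB = {
    "LAS": (95.2, 81.7, -13.5, -14.2),
    "LEM": (50.3, 13.9, -36.4, -72.4),
    "UEM": (63.3, 17.0, -46.3, -73.2),
}

for metric, (ceil, probe, abs_rep, rel_rep) in BASE_PTB.items():
    a, r = deltas(ceil, probe)
    print(f"{metric}: AbsΔ={a:.1f} (reported {abs_rep}), RelΔ={r:.1f}% (reported {rel_rep}%)")
```

The exact-match helper also shows why LEM and UEM fall so much further than LAS: a single wrong label or arc anywhere in a sentence turns that sentence into a miss, so small per-edge errors compound at the sentence level.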