Parsing results of RoBERTa-base (a) and RoBERTa-large (b) for the full ceiling model and the probe, on the PTB Stanford Dependencies (SD) test set and the CoNLL 2015 DM in-domain test set. We also report their absolute and relative differences (probe − ceiling). The relative difference is the absolute difference divided by the ceiling score. The smaller the magnitude of the difference, the more task-relevant information the pretrained model already encodes. For each task we report the canonical parsing metric (LAS for PTB SD, labeled F1 for DM) as well as labeled and unlabeled exact match (LEM/UEM). All numbers are mean ± standard deviation across three seeds.
| Metric | PTB SD AbsΔ | PTB SD RelΔ (%) | PTB SD Ceiling | PTB SD Probe | CoNLL 2015 DM AbsΔ | CoNLL 2015 DM RelΔ (%) | CoNLL 2015 DM Ceiling | CoNLL 2015 DM Probe |
|---|---|---|---|---|---|---|---|---|
| LAS/F1 | −13.5±0.2 | −14.2±0.2 | 95.2±0.1 | 81.7±0.1 | −23.5±0.1 | −24.9±0.2 | 94.2±0.0 | 70.7±0.2 |
| LEM | −36.4±0.8 | −72.4±1.1 | 50.3±0.5 | 13.9±0.5 | −45.4±1.1 | −93.5±0.5 | 48.5±1.2 | 3.1±0.2 |
| UEM | −46.3±0.7 | −73.2±0.5 | 63.3±0.8 | 17.0±0.3 | −48.8±1.0 | −92.8±0.5 | 52.6±1.0 | 3.8±0.2 |

(a) Base.

| Metric | PTB SD AbsΔ | PTB SD RelΔ (%) | PTB SD Ceiling | PTB SD Probe | CoNLL 2015 DM AbsΔ | CoNLL 2015 DM RelΔ (%) | CoNLL 2015 DM Ceiling | CoNLL 2015 DM Probe |
|---|---|---|---|---|---|---|---|---|
| LAS/F1 | −17.6±0.1 | −18.5±0.1 | 95.3±0.0 | 77.7±0.1 | −26.7±0.3 | −28.3±0.3 | 94.4±0.1 | 67.7±0.2 |
| LEM | −40.0±0.6 | −77.2±0.4 | 51.9±0.6 | 11.8±0.2 | −46.6±1.1 | −94.4±0.1 | 49.3±1.1 | 2.7±0.0 |
| UEM | −50.2±0.6 | −77.4±0.2 | 64.8±0.7 | 14.6±0.2 | −50.0±1.1 | −93.9±0.2 | 53.2±1.0 | 3.3±0.1 |

(b) Large.
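The metrics and differences described in the caption can be sketched in Python. This is a minimal illustration, not the authors' evaluation code. The following are all hypothetical:

- the sentence representation (one `(head, label)` pair per token);
- the helper names `las`, `exact_match`, `summarize`, and `deltas`;
- the use of the sample standard deviation.

The sketch also assumes that RelΔ is computed per seed and then averaged, which the separate ± on RelΔ suggests. It covers only tree-based dependency metrics (not DM labeled F1 over graph edges). Unlike many PTB evaluations, it does not exclude punctuation.

```python
from statistics import mean, stdev

# Hypothetical format: a sentence is a list of (head_index, label) pairs, one per
# token (head 0 = root); a corpus is a list of sentences. Gold and predicted
# corpora are assumed to share tokenization. Punctuation is NOT excluded here.


def las(gold_corpus, pred_corpus):
    """Labeled attachment score (%): share of tokens with correct head AND label."""
    correct = total = 0
    for gold, pred in zip(gold_corpus, pred_corpus):
        for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
            correct += int(g_head == p_head and g_label == p_label)
            total += 1
    return 100.0 * correct / total


def exact_match(gold_corpus, pred_corpus, labeled=True):
    """LEM (labeled=True) or UEM (labeled=False), in %: share of sentences
    in which every token's head (and label, if labeled) is correct."""
    def key(arc):
        return arc if labeled else arc[0]  # drop the label for UEM

    hits = sum(
        all(key(g) == key(p) for g, p in zip(gold, pred))
        for gold, pred in zip(gold_corpus, pred_corpus)
    )
    return 100.0 * hits / len(gold_corpus)


def summarize(values):
    """Mean and sample standard deviation (0.0 if there is a single value)."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


def deltas(ceiling_by_seed, probe_by_seed):
    """Per-seed AbsΔ = probe - ceiling and RelΔ (%) = 100 * AbsΔ / ceiling,
    each summarized across seeds as (mean, std)."""
    abs_d = [p - c for c, p in zip(ceiling_by_seed, probe_by_seed)]
    rel_d = [100.0 * (p - c) / c for c, p in zip(ceiling_by_seed, probe_by_seed)]
    return summarize(abs_d), summarize(rel_d)


# Toy example (invented sentences, not data from the table).
gold = [[(2, "nsubj"), (0, "root"), (2, "dobj")], [(0, "root"), (1, "punct")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "iobj")], [(0, "root"), (1, "punct")]]
print(las(gold, pred))                         # 80.0
print(exact_match(gold, pred, labeled=True))   # 50.0
print(exact_match(gold, pred, labeled=False))  # 100.0

# Sanity check against the reported RoBERTa-base PTB SD means.
(abs_mean, _), (rel_mean, _) = deltas([95.2], [81.7])
print(round(abs_mean, 1), round(rel_mean, 1))  # -13.5 -14.2
```

Applied to the RoBERTa-base PTB SD means (ceiling 95.2, probe 81.7), `deltas` gives an AbsΔ of −13.5 and a RelΔ of about −14.2%, matching the first row. With real per-seed scores, the same function would also yield the ± spread. The toy example shows why exact match falls so much faster than LAS: a single wrong label fails the whole sentence under LEM but costs only one token under LAS.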