Table 3: Comparing scoring arrangements. Each base dataset for which Δppl scores have been calculated carries a subscript (written here as _BF) naming the target dataset used; e.g., in A (the first arrangement), the scores are calculated for PRE Lang-8 using BF as the target. All scored datasets are trained via soft down-weighting. The final column gives the change in F0.5 over the unscored setup at the same training stage (absolute values are in Table 2).
Training data            BEA-19 dev F0.5   Δ vs unscored
(PRE Lang-8)_BF          44.9              +12.5
  → BF                   51.8              +0.4
PRE_BF                   37.0              +6.8
  → Lang-8               43.3              +0.8
  → BF                   51.7              +0.2
PRE                      24.0              n/a
Lang-8_BF                47.2              +4.7
  → BF                   51.9              +0.4
PRE_BF Lang-8_BF         48.0              +5.5
  → BF                   52.3              +0.8
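Because the last column is described as the change over the unscored setup at the same training stage, each row implies an unscored F0.5 of (F0.5 minus Δ). The short sketch below recovers those implied baselines from the rows above. It assumes Δ is a plain absolute difference in F0.5 points; Table 2 itself is not reproduced here, so this is only a consistency check, and the helper name `implied_unscored` is hypothetical.

```python
# Rows transcribed from Table 3: (training stage, BEA-19 dev F0.5, Δ vs unscored).
# The unscored PRE row has no Δ, represented as None.
ROWS = [
    ("(PRE Lang-8)_BF", 44.9, 12.5),
    ("→ BF", 51.8, 0.4),
    ("PRE_BF", 37.0, 6.8),
    ("→ Lang-8", 43.3, 0.8),
    ("→ BF", 51.7, 0.2),
    ("PRE", 24.0, None),
    ("Lang-8_BF", 47.2, 4.7),
    ("→ BF", 51.9, 0.4),
    ("PRE_BF Lang-8_BF", 48.0, 5.5),
    ("→ BF", 52.3, 0.8),
]


def implied_unscored(f05, delta):
    """Hypothetical helper: unscored F0.5 at the same stage,
    assuming delta = scored F0.5 - unscored F0.5."""
    if delta is None:
        return None  # row is itself unscored
    return round(f05 - delta, 1)


for stage, f05, delta in ROWS:
    print(f"{stage:<20} {f05:5.1f}  implied unscored: {implied_unscored(f05, delta)}")
```

Under that assumption, every "→ BF" row implies an unscored baseline of roughly 51.4 to 51.5, and every row ending after the Lang-8 stage implies about 42.5, which fits the caption's statement that each Δ is measured against the unscored setup at the same stage.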