Table 4: Comparing training strategies for PREBF and Lang-8BF, following setup (D) in Table 3. The asterisk indicates the training stage that is varied in each experiment. In (i) all models are pretrained on PREBF using the soft strategy; in (ii) all models are fine-tuned on Lang-8BF using the soft strategy. The hard strategies filter out all examples with positive Δppl, leaving 37% of the dataset for both PREBF and Lang-8BF. The curriculum strategies anneal down to the best 5% of the dataset, following Wang et al. (2018).
| Training Data | unscored | hard | soft | hard-cclm | soft-cclm |
|---|---|---|---|---|---|
| (i) PREBF (soft) → Lang-8BF* | 43.3 | 49.0 | 48.0 | 45.8 | 47.9 |
| → BF | 51.7 | 52.1 | 52.3 | 51.8 | 52.4 |
| (ii) PREBF* | 24.0 | 45.7 | 37.0 | 47.7 | 36.9 |
| → Lang-8BF (soft) | 42.5 | 48.1 | 48.4 | 48.6 | 48.0 |
| → BF | 51.5 | 51.8 | 52.4 | 52.3 | 52.2 |
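To make the selection rules in the caption concrete, below is a minimal Python sketch of how the hard filter and a curriculum-style anneal over Δppl-scored examples might look. The example representation, the function names, and the linear annealing schedule are illustrative assumptions, not the paper's exact procedure.

```python
from typing import List, Tuple

# (source, target, delta_ppl) -- assumed representation of a scored example
Example = Tuple[str, str, float]


def hard_filter(examples: List[Example]) -> List[Example]:
    """Hard strategy: discard every example whose delta-ppl is positive.

    Per the caption, this keeps roughly 37% of PREBF and Lang-8BF.
    """
    return [ex for ex in examples if ex[2] <= 0.0]


def curriculum_subset(examples: List[Example], step: int, total_steps: int,
                      final_fraction: float = 0.05) -> List[Example]:
    """Curriculum-style selection: anneal the retained fraction from the full
    dataset down to the best `final_fraction` (5%), ranking examples by
    delta-ppl (lower is better). The linear schedule is an assumption.
    """
    progress = min(step / max(total_steps, 1), 1.0)
    frac = 1.0 - (1.0 - final_fraction) * progress
    ranked = sorted(examples, key=lambda ex: ex[2])
    keep = max(1, int(len(ranked) * frac))
    return ranked[:keep]
```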