Table 8: Comparison of test set evaluation results to prior work, showing the best reported result for each test set in each cited work. Cited values for different test sets do not necessarily represent the same model.

| | BEA-19 test F0.5 (ERRANT) | CoNLL-14 test F0.5 (M2) | JFLEG test GLEU+ |
|---|---|---|---|
| **Single model** | | | |
| Kiyono et al. (2019) | 64.2 | 61.3 | 59.7 |
| Lichtarge et al. (2019) | — | 56.8 | 61.6 |
| Xu et al. (2019) | — | 60.9 | 60.8 |
| Omelianchuk et al. (2020) | 72.4 | 65.3 | — |
| this work - unscored | 66.1 | 61.1 | 63.6 |
| this work - scored | 66.5 | 62.1 | 63.8 |
| **Ensemble** | | | |
| Choe et al. (2019) | 69.1 | 60.3 | — |
| Ge et al. (2018b) | — | 61.3 | 62.4 |
| Grundkiewicz et al. (2019) | 69.5 | 64.2 | 61.2 |
| Kiyono et al. (2019) | 70.2 | 65.0 | 61.4 |
| Lichtarge et al. (2019) | — | 60.4 | 63.3 |
| Xu et al. (2019) | 66.6 | 63.2 | 62.6 |
| Omelianchuk et al. (2020) | 73.7 | 66.5 | — |
| this work - unscored | 71.9 | 65.3 | 64.7 |
| this work - scored | 73.0 | 66.8 | 64.9 |
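Both error-correction metrics in the table (ERRANT on BEA-19 and M2 on CoNLL-14) report F0.5, which weights precision twice as heavily as recall. For reference, the sketch below shows the standard F-beta computation; the `tp`, `fp`, and `fn` edit-level counts are hypothetical inputs here, and in the cited works they come from the ERRANT and M2 scorers rather than this function.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Edit-level F_beta score; beta < 1 favors precision over recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative (made-up) counts: 70 correct edits, 20 spurious, 50 missed.
print(round(f_beta(70, 20, 50), 3))  # 0.729, closer to precision (0.778) than recall (0.583)
```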