Table 3:
Answer F1 scores on the NLmaps v2 test set for RAMP and the MRT objective, as well as for two further objectives that help crystallize the difference between the former two, averaged over two independent runs. M is the minibatch size, and Δ is the difference in F1 from the MLE baseline. All models differ from each other with statistical significance at p < 0.01, except the pair (3, 4).
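|   | Model      | M  | % F1         | Δ      |
|---|------------|----|--------------|--------|
| 1 | MLE        |    | 57.45        |        |
| 2 | MRT        |    | 63.60 ± 0.02 | +6.15  |
| 3 | MRT neg    |    | 65.93 ± 0.16 | +8.48  |
| 4 | RAMP m=1   |    | 66.78 ± 0.21 | +9.33  |
| 5 | RAMP       | 80 | 69.03 ± 0.04 | +11.58 |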