Table 2: Answer F1 scores on the NLmaps v2 test set for various objectives, averaged over two independent runs. M is the minibatch size. All models differ significantly from each other at p < 0.01, except the pair (2, 4).
#  Objective  M    % F1           Δ
1  MLE             57.45
2  MRT             63.60 ± 0.02   +6.15
3  RAMP1      80   60.50 ± 0.01   +3.05
4  RAMP2      80   64.22 ± 0.00   +6.77
5  RAMP       80   69.03 ± 0.04   +11.58
6  RAMP-T     80   69.87 ± 0.02   +12.42