Table 6: Mean BLEU scores and exact-match accuracies of monolingual models (§5.1) on test-intersection-MT and test-intersection-gold. Scores are averaged over the predictions of monolingual models trained on three MCD splits. Overall, there is no substantial difference between performance on the two intersection sets, which supports the reliability of evaluating on machine-translated data in this case.

SPARQL BLEU (MT = test-intersection-MT; gold = test-intersection-gold)

| Model | En (MT) | He (MT) | Kn (MT) | Zh (MT) | En (gold) | He (gold) | Kn (gold) | Zh (gold) |
|---|---|---|---|---|---|---|---|---|
| mT5-small+RIR | 86.1 | 82.5 | 78.9 | 85.1 | – | 81.8 | 77.7 | 86.0 |
| mT5-base+RIR | 85.5 | 83.7 | 81.8 | 83.2 | – | 83.8 | 80.9 | 83.8 |

Exact Match (%)

| Model | En (MT) | He (MT) | Kn (MT) | Zh (MT) | En (gold) | He (gold) | Kn (gold) | Zh (gold) |
|---|---|---|---|---|---|---|---|---|
| mT5-small+RIR | 45.6 | 35.7 | 32.7 | 38.5 | – | 35.9 | 28.2 | 39.8 |
| mT5-base+RIR | 40.4 | 41.9 | 40.2 | 38.7 | – | 41.1 | 34.0 | 38.9 |
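The exact-match numbers above can be understood as the percentage of predicted SPARQL queries that are identical to the reference query, averaged over the three MCD splits. A minimal sketch of that computation (function names are illustrative, not the paper's code):

```python
# Sketch: exact-match accuracy for predicted vs. reference SPARQL queries,
# averaged over MCD splits as in Table 6. Not the authors' implementation.

def exact_match_accuracy(predictions, references):
    """Percentage of predictions that match the reference string exactly
    (after trimming surrounding whitespace)."""
    assert len(predictions) == len(references)
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * matches / len(references)

def mean_over_splits(per_split_scores):
    """Average a metric over the per-split scores (three MCD splits here)."""
    return sum(per_split_scores) / len(per_split_scores)
```

Note that exact match is a strict metric: semantically equivalent queries that differ in whitespace inside the query body, variable naming, or clause order count as mismatches, which is one reason the exact-match scores sit far below the BLEU scores.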