Mean BLEU scores and exact-match accuracies of the monolingual models (§5.1) on test-intersection-MT and test-intersection-gold. Numbers are averaged over the predictions of the monolingual models trained on the three MCD splits. Overall, there is no substantial difference between performance on the two intersection sets, which supports the reliability of evaluating on machine-translated data in this case.
SPARQL BLEU

| Model | En (MT) | He (MT) | Kn (MT) | Zh (MT) | En (gold) | He (gold) | Kn (gold) | Zh (gold) |
|---|---|---|---|---|---|---|---|---|
| mT5-small+RIR | 86.1 | 82.5 | 78.9 | 85.1 | – | 81.8 | 77.7 | 86.0 |
| mT5-base+RIR | 85.5 | 83.7 | 81.8 | 83.2 | – | 83.8 | 80.9 | 83.8 |

Exact Match (%)

| Model | En (MT) | He (MT) | Kn (MT) | Zh (MT) | En (gold) | He (gold) | Kn (gold) | Zh (gold) |
|---|---|---|---|---|---|---|---|---|
| mT5-small+RIR | 45.6 | 35.7 | 32.7 | 38.5 | – | 35.9 | 28.2 | 39.8 |
| mT5-base+RIR | 40.4 | 41.9 | 40.2 | 38.7 | – | 41.1 | 34.0 | 38.9 |
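For concreteness, the aggregation behind the table can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the file names and layout are hypothetical, and sacrebleu's default tokenization is assumed as a stand-in for whatever BLEU configuration the authors apply to SPARQL strings.

```python
"""Sketch: mean SPARQL BLEU and exact match over three MCD splits.

Assumptions (not from the paper): predictions and gold targets are
stored one query per line in files named pred_{set}_{split}_{lang}.txt
and gold_{set}_{split}_{lang}.txt.
"""
from statistics import mean
from sacrebleu.metrics import BLEU

SPLITS = ["mcd1", "mcd2", "mcd3"]  # the three MCD splits averaged in the table
LANGS = ["en", "he", "kn", "zh"]


def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]


def evaluate(lang, test_set="intersection-mt"):
    metric = BLEU()
    bleu_scores, em_scores = [], []
    for split in SPLITS:
        preds = read_lines(f"pred_{test_set}_{split}_{lang}.txt")  # hypothetical paths
        golds = read_lines(f"gold_{test_set}_{split}_{lang}.txt")
        # Corpus-level BLEU between predicted and gold SPARQL queries.
        bleu_scores.append(metric.corpus_score(preds, [golds]).score)
        # Exact match: fraction of predictions identical to the gold query.
        em_scores.append(mean(p == g for p, g in zip(preds, golds)) * 100)
    # Average each metric over the three MCD splits, as reported above.
    return mean(bleu_scores), mean(em_scores)
```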