Spearman correlation of spBLEU with BLEU and with chrF++. We evaluate on three sets of languages (En-XX). The evaluated models are derived from our baselines (discussed in Section 6). In the top section, we evaluate languages that often use the standard mosestokenizer. In the bottom section, we evaluate languages that have their own custom tokenization.
Lang | Correlation (spBLEU v. BLEU) | Correlation (spBLEU v. chrF++) |
---|---|---|
French | 0.99 | 0.98 |
Italian | 0.99 | 0.98 |
Spanish | 0.99 | 0.98 |
Hindi | 0.99 | 0.98 |
Tamil | 0.41 | 0.94 |
Chinese | 0.99 | 0.98 |
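
As a rough illustration of how a per-language correlation like those in the table could be computed, the sketch below ranks a set of models under two metrics and takes the Pearson correlation of the ranks, which is the usual definition of Spearman's coefficient. The score lists are hypothetical placeholders, not values from our experiments, and the helper names `average_ranks` and `spearman` are ours for this example only.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores of five models on one En-XX direction (illustrative only).
spbleu = [12.1, 18.4, 22.7, 25.3, 30.9]
bleu = [10.5, 16.2, 21.0, 20.4, 28.8]

rho = spearman(spbleu, bleu)
print(round(rho, 2))  # 0.9
```

In this toy input the two metrics disagree on the order of only one adjacent pair of models, so the coefficient stays close to 1. A low value, such as the spBLEU v. BLEU entry for Tamil, would correspond to the two metrics ordering the models quite differently.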