Table 5: Spearman correlation of spBLEU with BLEU and with chrF++. We evaluate on three sets of languages (En-XX). Models evaluated are derived from our baselines (discussed in Section 6). In the top section, we evaluate languages that often use the standard mosestokenizer. In the bottom section, we evaluate languages that have their own custom tokenization.

Lang       Correlation       Correlation
           spBLEU v. BLEU    spBLEU v. chrF++
French     0.99              0.98
Italian    0.99              0.98
Spanish    0.99              0.98

Hindi      0.99              0.98
Tamil      0.41              0.94
Chinese    0.99              0.98
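
As an illustration of how a rank correlation of this kind can be computed, the following is a minimal sketch in plain Python. It is not the evaluation code behind the table: the function names and the per-model scores are hypothetical, chosen only to show that Spearman correlation is the Pearson correlation of the ranks of two score lists.

```python
def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical scores for four models on one language (not from the table).
spbleu_scores = [12.1, 18.4, 25.0, 30.2]
bleu_scores = [10.3, 17.9, 16.5, 28.8]

rho = spearman(spbleu_scores, bleu_scores)
print(f"Spearman correlation: {rho:.2f}")
```

With these made-up scores, the two metrics disagree on the ordering of the second and third models, so the correlation falls below 1. A value near 1, as in most rows of the table, means the two metrics rank the models almost identically.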