Table 4

Segment-level results on WMT11 (calculated on 3,695 pairs for cs-en, 8,950 for de-en, 5,974 for es-en, and 6,337 for fr-en), with tuning on the entire WMT12. Scores are Kendall's τ with human judgments. Improvements over the baseline are shown in bold, and statistically significant improvements are marked with ** for p-value < 0.01.

      Metrics    Orig.      Tuned
                            +DR         +DR-lex
I     DR         −0.447     –           –
      DR-lex      0.146     –           –

III   BLEU        0.186     0.192       0.207 **
      NIST        0.219     0.226 **    0.232 **
      Rouge       0.205     0.218 **    0.242 **
      TER         0.262     0.274 **    0.296 **
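The scores in this table are Kendall's τ computed over pairwise human preferences. As a rough illustration only, the sketch below computes τ as (concordant − discordant) / (concordant + discordant) over such pairs. The function name `kendall_tau_pairwise`, the input layout, and the choice to count metric ties as discordant are assumptions made for this example, not details confirmed by the table. For an error metric such as TER, where lower is better, scores would need to be negated first.

```python
from typing import Iterable, Tuple


def kendall_tau_pairwise(pairs: Iterable[Tuple[float, float]]) -> float:
    """Kendall's tau over pairwise preference judgments.

    Each pair holds (metric score of the translation the human preferred,
    metric score of the other translation). Hypothetical input layout.
    """
    concordant = 0
    discordant = 0
    for preferred_score, other_score in pairs:
        if preferred_score > other_score:
            # The metric agrees with the human preference.
            concordant += 1
        else:
            # Assumption: a tie in metric scores is counted as discordant.
            discordant += 1
    total = concordant + discordant
    if total == 0:
        raise ValueError("at least one judged pair is required")
    return (concordant - discordant) / total


if __name__ == "__main__":
    # Toy data, not taken from WMT: two agreements, one disagreement, one tie.
    toy_pairs = [(0.5, 0.3), (0.2, 0.4), (0.7, 0.1), (0.6, 0.6)]
    print(kendall_tau_pairwise(toy_pairs))
```

Under this definition τ ranges from −1 (the metric always disagrees with the human preference) to 1 (it always agrees).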