Table 6: 

Ranking (lower is better) of the top candidate selected by each decoding method, as ranked among the 1,000 candidates for each query using Bleurt v0.2 (BL.2) against reference Ref-C. The percentiles are computed over the 1,002 test queries of Newstest2021 En→De. A smaller value indicates that the chosen candidate is also preferred by the BL.2 metric computed against the actual reference Ref-C. This table shows that MBR provides more stable quality estimates than a single reference.

Rank w.r.t. Bleurt v0.2 Ref-C (percentiles p5, p25, p50, p75, p95)
MAP: 13 78 181 355 717
Oracle Ref-D: 18 78 327
MBR BL.2: 26 105
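The ranking statistic described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce Table 6: each query is assumed to come with the BL.2 scores of all its candidates against Ref-C, plus the index of the candidate a decoding method picked. The function names are hypothetical; rank 1 denotes the top candidate, ties are counted in the chosen candidate's favour, and percentiles use linear interpolation (the same convention as NumPy's default). All three conventions are assumptions.

```python
from typing import Dict, Sequence


def rank_of_choice(ref_scores: Sequence[float], chosen: int) -> int:
    """Hypothetical helper: 1-based rank of candidate `chosen` when candidates
    are sorted by their reference-based score, best first.
    Assumption: ties are resolved in favour of the chosen candidate."""
    target = ref_scores[chosen]
    return 1 + sum(1 for s in ref_scores if s > target)


def percentile(values: Sequence[float], p: float) -> float:
    """Linear-interpolation percentile (assumed convention)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty input")
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def rank_percentiles(
    ref_scores_per_query: Sequence[Sequence[float]],
    chosen_per_query: Sequence[int],
    ps: Sequence[float] = (5, 25, 50, 75, 95),
) -> Dict[float, float]:
    """Rank of each query's chosen candidate, summarised by percentiles
    over all queries (lower is better)."""
    ranks = [
        rank_of_choice(scores, chosen)
        for scores, chosen in zip(ref_scores_per_query, chosen_per_query)
    ]
    return {p: percentile(ranks, p) for p in ps}


if __name__ == "__main__":
    # Toy data: 3 queries with 3 candidates each (made-up scores, not from the paper).
    scores = [[0.9, 0.5, 0.7], [0.2, 0.8, 0.6], [0.4, 0.4, 0.1]]
    chosen = [2, 1, 1]  # candidate index picked by some decoding method
    print(rank_percentiles(scores, chosen))
```

In the setting of the table, `scores` would hold 1,000 BL.2 scores per query for the 1,002 Newstest2021 En→De queries, and `chosen` would come from MAP, Oracle Ref-D, or MBR BL.2.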