Table 2: System-level Kendall’s tau correlation coefficients between expert annotations along four quality dimensions and automatic metrics computed using 11 reference summaries per example. ^ denotes metrics that use the source document. The five most-correlated metrics in each column are bolded.

Metric                  Coherence  Consistency      Fluency    Relevance
ROUGE-1                    0.2500       0.5294       0.5240       0.4118
ROUGE-2                    0.1618       0.5882       0.4797       0.2941
ROUGE-3                    0.2206       0.7059       0.5092       0.3529
ROUGE-4                    0.3088       0.5882       0.5535       0.4118
ROUGE-L                    0.0735       0.1471       0.2583       0.2353
ROUGE-su*                  0.1912       0.2941       0.4354       0.3235
ROUGE-w                    0.0000       0.3971       0.3764       0.1618
ROUGE-we-1                 0.2647       0.4559       0.5092       0.4265
ROUGE-we-2                −0.0147       0.5000       0.3026       0.1176
ROUGE-we-3                 0.0294       0.3676       0.3026       0.1912
S3-pyr                    −0.0294       0.5147       0.3173       0.1324
S3-resp                   −0.0147       0.5000       0.3321       0.1471
BertScore-p                0.0588      −0.1912       0.0074       0.1618
BertScore-r                0.1471       0.6618       0.4945       0.3088
BertScore-f                0.2059       0.0441       0.2435       0.4265
MoverScore                 0.1912      −0.0294       0.2583       0.2941
SMS                        0.1618       0.5588       0.3616       0.2353
SummaQA^                   0.1176       0.6029       0.4059       0.2206
BLANC^                     0.0735       0.5588       0.3616       0.2647
SUPERT^                    0.1029       0.5882       0.4207       0.2353
BLEU                       0.1176       0.0735       0.3321       0.2206
CHRF                       0.3971       0.5294       0.4649       0.5882
CIDEr                      0.1176      −0.1912      −0.0221       0.1912
METEOR                     0.2353       0.6324       0.6126       0.4265
Length^                   −0.0294       0.4265       0.2583       0.1618
Novel unigram^             0.1471      −0.2206      −0.1402       0.1029
Novel bi-gram^             0.0294      −0.5441      −0.3469      −0.1029
Novel tri-gram^            0.0294      −0.5735      −0.3469      −0.1324
Repeated unigram^         −0.3824       0.1029      −0.0664      −0.3676
Repeated bi-gram^         −0.3824      −0.0147      −0.2435      −0.4559
Repeated tri-gram^        −0.2206       0.1471      −0.0221      −0.2647
Stats-coverage^           −0.1324       0.3529       0.1550      −0.0294
Stats-compression^         0.1176      −0.4265      −0.2288      −0.0147
Stats-density^             0.1618       0.6471       0.3911       0.2941
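The caption describes system-level correlations: each metric and each human quality dimension is first reduced to one score per summarization system, and Kendall's tau is then computed across systems rather than across individual summaries. The sketch below is a minimal illustration under two assumptions the caption does not state: a system's score is the mean of its per-example scores, and the tie-corrected tau-b variant is used. The function names `kendall_tau_b` and `system_level_tau` and the toy data are hypothetical, not taken from the authors' code.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (tie-corrected)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both sides: excluded from both denominator terms
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    # tau-b denominator: sqrt(pairs not tied in x * pairs not tied in y)
    not_tied_x = concordant + discordant + ties_y_only
    not_tied_y = concordant + discordant + ties_x_only
    denom = sqrt(not_tied_x * not_tied_y)
    return (concordant - discordant) / denom if denom else float("nan")


def system_level_tau(metric_scores, human_scores):
    """metric_scores, human_scores: dict mapping system name -> per-example scores.

    ASSUMPTION: each system is summarised by the mean of its per-example scores.
    """
    systems = sorted(metric_scores)
    metric_means = [mean(metric_scores[s]) for s in systems]
    human_means = [mean(human_scores[s]) for s in systems]
    return kendall_tau_b(metric_means, human_means)


# Toy data for three hypothetical systems (not taken from the table).
metric = {"sys_a": [0.2, 0.4], "sys_b": [0.5, 0.7], "sys_c": [0.1, 0.1]}
human = {"sys_a": [3, 4], "sys_b": [4, 5], "sys_c": [2, 3]}
print(system_level_tau(metric, human))  # 1.0: both rank sys_b > sys_a > sys_c
```

With n systems there are n(n-1)/2 system pairs, so in the absence of ties every pair that flips from concordant to discordant moves tau by 4/(n(n-1)); with few systems, system-level values therefore change in coarse steps.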
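The bolding mentioned in the caption does not survive in plain text. A small parser and sort can recover a per-column top-five ranking from data rows in the format above. This is a sketch, not the authors' procedure: it ranks by signed tau, which is an assumption because the caption does not say whether "most-correlated" means largest signed or largest absolute value, so its output may differ from the original bolding. `parse_rows` and `top_k` are hypothetical helpers.

```python
COLUMNS = ("Coherence", "Consistency", "Fluency", "Relevance")


def parse_rows(text):
    """Parse data rows such as 'ROUGE-1 0.2500 0.5294 0.5240 0.4118' into {metric: [floats]}."""
    table = {}
    for line in text.splitlines():
        line = line.strip().replace("\u2212", "-")  # the table uses U+2212 MINUS SIGN
        if not line:
            continue
        # Metric names may contain spaces ("Novel bi-gram^"), so split the numbers off the right.
        name, *values = line.rsplit(maxsplit=len(COLUMNS))
        table[name] = [float(v) for v in values]
    return table


def top_k(table, column, k=5):
    """Names of the k metrics with the largest signed tau in `column` (ASSUMPTION: signed)."""
    idx = COLUMNS.index(column)
    # sorted() is stable, so tied metrics keep their input order.
    return sorted(table, key=lambda name: table[name][idx], reverse=True)[:k]


rows = """
ROUGE-1 0.2500 0.5294 0.5240 0.4118
ROUGE-L 0.0735 0.1471 0.2583 0.2353
CHRF 0.3971 0.5294 0.4649 0.5882
METEOR 0.2353 0.6324 0.6126 0.4265
Novel bi-gram^ 0.0294 −0.5441 −0.3469 −0.1029
"""
table = parse_rows(rows)
print(top_k(table, "Coherence", k=2))  # ['CHRF', 'ROUGE-1']
```

Pasting all 34 data rows (without the header line) into `rows` yields a full per-column ranking. Ties matter at the cutoff: in the Relevance column, ROUGE-1 and ROUGE-4 tie at 0.4118 for fifth place, so a strict top-five slice keeps only whichever appears first.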