System-level Kendall’s tau correlation coefficients between expert annotations along four quality dimensions and automatic metrics, computed using 11 reference summaries per example. ^ denotes metrics that use the source document. The five most-correlated metrics in each column are bolded.
Metric | Coherence | Consistency | Fluency | Relevance |
---|---|---|---|---|
ROUGE-1 | 0.2500 | 0.5294 | 0.5240 | 0.4118 |
ROUGE-2 | 0.1618 | 0.5882 | 0.4797 | 0.2941 |
ROUGE-3 | 0.2206 | 0.7059 | 0.5092 | 0.3529 |
ROUGE-4 | 0.3088 | 0.5882 | 0.5535 | 0.4118 |
ROUGE-L | 0.0735 | 0.1471 | 0.2583 | 0.2353 |
ROUGE-su* | 0.1912 | 0.2941 | 0.4354 | 0.3235 |
ROUGE-w | 0.0000 | 0.3971 | 0.3764 | 0.1618 |
ROUGE-we-1 | 0.2647 | 0.4559 | 0.5092 | 0.4265 |
ROUGE-we-2 | −0.0147 | 0.5000 | 0.3026 | 0.1176 |
ROUGE-we-3 | 0.0294 | 0.3676 | 0.3026 | 0.1912 |
S3-pyr | −0.0294 | 0.5147 | 0.3173 | 0.1324 |
S3-resp | −0.0147 | 0.5000 | 0.3321 | 0.1471 |
BertScore-p | 0.0588 | −0.1912 | 0.0074 | 0.1618 |
BertScore-r | 0.1471 | 0.6618 | 0.4945 | 0.3088 |
BertScore-f | 0.2059 | 0.0441 | 0.2435 | 0.4265 |
MoverScore | 0.1912 | −0.0294 | 0.2583 | 0.2941 |
SMS | 0.1618 | 0.5588 | 0.3616 | 0.2353 |
SummaQA^ | 0.1176 | 0.6029 | 0.4059 | 0.2206 |
BLANC^ | 0.0735 | 0.5588 | 0.3616 | 0.2647 |
SUPERT^ | 0.1029 | 0.5882 | 0.4207 | 0.2353 |
BLEU | 0.1176 | 0.0735 | 0.3321 | 0.2206 |
CHRF | 0.3971 | 0.5294 | 0.4649 | 0.5882 |
CIDEr | 0.1176 | −0.1912 | −0.0221 | 0.1912 |
METEOR | 0.2353 | 0.6324 | 0.6126 | 0.4265 |
Length^ | −0.0294 | 0.4265 | 0.2583 | 0.1618 |
Novel unigram^ | 0.1471 | −0.2206 | −0.1402 | 0.1029 |
Novel bi-gram^ | 0.0294 | −0.5441 | −0.3469 | −0.1029 |
Novel tri-gram^ | 0.0294 | −0.5735 | −0.3469 | −0.1324 |
Repeated unigram^ | −0.3824 | 0.1029 | −0.0664 | −0.3676 |
Repeated bi-gram^ | −0.3824 | −0.0147 | −0.2435 | −0.4559 |
Repeated tri-gram^ | −0.2206 | 0.1471 | −0.0221 | −0.2647 |
Stats-coverage^ | −0.1324 | 0.3529 | 0.1550 | −0.0294 |
Stats-compression^ | 0.1176 | −0.4265 | −0.2288 | −0.0147 |
Stats-density^ | 0.1618 | 0.6471 | 0.3911 | 0.2941 |
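The "system-level" correlation in the caption can be illustrated with a minimal sketch. It assumes that each system's score is the mean of its per-example scores, and that the tie-adjusted tau-b variant of Kendall's tau is used. Under those assumptions, the system means under the metric are compared pairwise against the system means under the human annotations. The function names (`kendall_tau_b`, `system_level_tau`) and the toy scores are hypothetical illustrations and do not come from the table above.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (adjusts for ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: excluded from numerator and denominator
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1  # both rankings order the pair the same way
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x_only)
                 * (concordant + discordant + ties_y_only))
    return float("nan") if denom == 0 else (concordant - discordant) / denom


def system_level_tau(metric_scores, human_scores):
    """Correlate per-system averages (assumed aggregation: mean over examples).

    Both arguments map a system name to its list of per-example scores.
    """
    systems = sorted(metric_scores)
    metric_means = [mean(metric_scores[s]) for s in systems]
    human_means = [mean(human_scores[s]) for s in systems]
    return kendall_tau_b(metric_means, human_means)


# Hypothetical toy data: three systems, two examples each.
metric = {"sysA": [0.30, 0.34], "sysB": [0.41, 0.39], "sysC": [0.25, 0.27]}
human = {"sysA": [3.0, 3.5], "sysB": [4.0, 4.5], "sysC": [2.5, 3.0]}
print(system_level_tau(metric, human))  # identical system rankings -> 1.0
```

Because only the ordering of system means enters the statistic, a metric can be poorly calibrated per summary and still reach a high system-level tau, as long as it ranks the systems the way the annotators do.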