Table 1: Statistics of the six datasets in the SummaC Benchmark. For each dataset, we report the validation and test set sizes, the percentage of summaries with a positive (consistent) label (% Positive), the inter-annotator agreement when available (IAA), the source of the documents (Source: C for CNN/DM, X for XSum), the number of summarizers evaluated (# Summarizer), and the number of sublabels annotated (# Sublabel).

| Dataset | Valid. | Test | % Positive | IAA | Source | # Summarizer | # Sublabel |
|---|---|---|---|---|---|---|---|
| CoGenSumm (Falke et al., 2019) | 1281 | 400 | 49.8 | 0.65 | | | |
| XSumFaith (Maynez et al., 2020) | 1250 | 1250 | 10.2 | 0.80 | | | |
| Polytope (Huang et al., 2020) | 634 | 634 | 6.6 | − | | 10 | |
| FactCC (Kryscinski et al., 2020) | 931 | 503 | 85.0 | − | | 10 | |
| SummEval (Fabbri et al., 2021) | 850 | 850 | 90.6 | 0.7 | | 23 | |
| FRANK (Pagnoni et al., 2021) | 671 | 1575 | 33.2 | 0.53 | C+X | | |