Performance of summary inconsistency detection models on the test set of the SummaC Benchmark. Balanced accuracy is computed for each model on each of the six datasets in the benchmark, and the average over the six datasets is reported as the overall performance on the benchmark. We obtain confidence intervals comparing the SummaC models to prior work: * indicates an improvement with 95% confidence, and ** an improvement with 99% confidence (details in Section 5.2.1). The results of the throughput analysis of Section 5.2.2 are reported in the Doc./min. column (documents per minute).
Model Type | Model Name | CGS | XSF | Polytope | FactCC | SummEval | FRANK | Overall | Doc./min. |
---|---|---|---|---|---|---|---|---|---|
Baseline | NER-Overlap | 53.0 | 63.3 | 52.0 | 55.0 | 56.8 | 60.9 | 56.8 | 55,900 |
Baseline | MNLI-doc | 57.6 | 57.5 | 61.0 | 61.3 | 66.6 | 63.6 | 61.3 | 6,200 |
Classifier | FactCC-CLS | 63.1 | 57.6 | 61.0 | 75.9 | 60.1 | 59.4 | 62.8 | 13,900 |
Parsing | DAE | 63.4 | 50.8 | 62.8 | 75.9 | 70.3 | 61.7 | 64.2 | 755 |
QAG | FEQA | 61.0 | 56.0 | 57.8 | 53.6 | 53.8 | 69.9 | 58.7 | 33.9 |
QAG | QuestEval | 62.6 | 62.1 | 70.3* | 66.6 | 72.5 | 82.1 | 69.4 | 22.7 |
NLI | SummaCZS | 70.4* | 58.4 | 62.0 | 83.8* | 78.7 | 79.0 | 72.1* | 435 |
NLI | SummaCConv | 64.7 | 66.4* | 62.7 | 89.5** | 81.7** | 81.6 | 74.4** | 433 |
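
The two quantities named in the caption can be illustrated with a minimal Python sketch. It assumes the standard definition of balanced accuracy as the mean of per-class recall over binary labels; the function name `balanced_accuracy` and the label encoding (1 = consistent, 0 = inconsistent) are illustrative assumptions, not taken from the authors' code. The second half recomputes the Overall score for SummaCConv from its six per-dataset scores in the table.

```python
from statistics import mean


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels.

    Assumed encoding (hypothetical): 1 = consistent, 0 = inconsistent.
    """
    recalls = []
    for cls in (0, 1):
        # Indices of examples whose gold label is this class.
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if not idx:
            raise ValueError(f"no examples with gold label {cls}")
        # Recall for this class: fraction of its examples predicted correctly.
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return mean(recalls)


# Per-dataset balanced accuracy of SummaCConv, copied from the table.
summac_conv = {
    "CGS": 64.7,
    "XSF": 66.4,
    "Polytope": 62.7,
    "FactCC": 89.5,
    "SummEval": 81.7,
    "FRANK": 81.6,
}

# Overall benchmark score: unweighted mean over the six datasets.
overall = round(mean(list(summac_conv.values())), 1)
print(overall)
```

Because the overall score is an unweighted mean, each dataset contributes equally regardless of how many documents it contains.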