Effect of NLI category inclusion on SummaCConv performance. Models had access to different subsets of the three category predictions (Entailment, Neutral, Contradiction), and performance is measured as balanced accuracy. Experiments were performed with three NLI models: Vitamin C+MNLI (VITC+MNLI), ANLI, and MNLI.
| E | N | C | VITC+MNLI | ANLI | MNLI |
|---|---|---|---|---|---|
| ✓ |   |   | 74.4 | 69.2 | 72.6 |
|   | ✓ |   | 71.2 | 55.8 | 66.4 |
|   |   | ✓ | 72.5 | 69.2 | 72.6 |
| ✓ | ✓ |   | 73.1 | 69.6 | 72.6 |
| ✓ |   | ✓ | 74.0 | 70.2 | 73.0 |
|   | ✓ | ✓ | 72.5 | 69.2 | 72.6 |
| ✓ | ✓ | ✓ | 74.0 | 69.7 | 73.0 |
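
The seven rows correspond to the seven non-empty subsets of the three NLI categories. As a rough illustration only, the sketch below enumerates those subsets and keeps just the selected category scores from a toy list of per-pair NLI probabilities. The names `category_subsets`, `select_categories`, and the toy scores are hypothetical and are not taken from the SummaC code; how the selected scores are consumed downstream is not shown.

```python
from itertools import combinations

# Category order assumed to match the table columns: E, N, C.
CATEGORIES = ("E", "N", "C")


def category_subsets(categories=CATEGORIES):
    """Return every non-empty subset: singles first, then pairs, then all three."""
    return [
        subset
        for size in range(1, len(categories) + 1)
        for subset in combinations(categories, size)
    ]


def select_categories(nli_probs, subset, categories=CATEGORIES):
    """Keep only the scores for the categories named in `subset`.

    `nli_probs` is a list of (E, N, C) probability triples, one per
    (document sentence, summary sentence) pair. Toy format, assumed here.
    """
    columns = [categories.index(name) for name in subset]
    return [[row[col] for col in columns] for row in nli_probs]


# Toy scores for two sentence pairs (made-up values, for illustration only).
toy_probs = [(0.7, 0.2, 0.1), (0.1, 0.3, 0.6)]

for subset in category_subsets():
    print("+".join(subset), select_categories(toy_probs, subset))
```

The enumeration order produced by `itertools.combinations` (E, N, C, E+N, E+C, N+C, E+N+C) happens to match the row order of the table.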