Accuracy (%) of RoBERTa-base and SIFT on each HANS heuristic. Examples are from McCoy et al. (2019). “E”: entailment; “N”: non-entailment. Bold marks the better result in each category.
Heuristic | Premise | Hypothesis | Label | RoBERTa | SIFT |
---|---|---|---|---|---|
Lexical overlap | The banker near the judge saw the actor. | The banker saw the actor. | E | 98.3 | **98.9** |
Lexical overlap | The judge by the actor stopped the banker. | The banker stopped the actor. | N | 68.1 | **71.0** |
Subsequence | The artist and the student called the judge. | The student called the judge. | E | 99.7 | **99.8** |
Subsequence | The judges heard the actors resigned. | The judges heard the actors. | N | 25.8 | **29.5** |
Constituent | Before the actor slept, the senator ran. | The actor slept. | E | **99.3** | 98.8 |
Constituent | If the actor slept, the judge saw the artist. | The actor slept. | N | **37.9** | 37.6 |
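McCoy et al. (2019) define the three heuristics as follows. Under lexical overlap, a premise is assumed to entail any hypothesis built from its words. Under subsequence, it is assumed to entail any of its contiguous subsequences. Under constituent, it is assumed to entail any complete subtree of its parse. Each heuristic is a special case of the previous one, so every example in the table has full lexical overlap. A model that leans on this shortcut therefore predicts “E” for every row: it is right on the E rows and wrong on the N rows, which is where both models' accuracy drops. The sketch below checks the first two heuristics on the table's examples. It assumes a crude lowercase, letters-only tokenization and does not implement the constituent heuristic, since that needs a parser. The names `tokens`, `lexical_overlap`, `subsequence`, and `heuristic_prediction` are hypothetical and come from neither paper.

```python
import re

def tokens(sentence: str) -> list[str]:
    # Assumed tokenization: lowercase, keep runs of letters, drop punctuation.
    return re.findall(r"[a-z]+", sentence.lower())

def lexical_overlap(premise: str, hypothesis: str) -> bool:
    # Lexical overlap: every hypothesis word also occurs somewhere in the premise.
    p = set(tokens(premise))
    return all(w in p for w in tokens(hypothesis))

def subsequence(premise: str, hypothesis: str) -> bool:
    # Subsequence: the hypothesis tokens appear as one contiguous run in the premise.
    p, h = tokens(premise), tokens(hypothesis)
    return any(p[i:i + len(h)] == h for i in range(len(p) - len(h) + 1))

def heuristic_prediction(premise: str, hypothesis: str) -> str:
    # Hypothetical shortcut "model": predict entailment whenever lexical overlap holds.
    return "E" if lexical_overlap(premise, hypothesis) else "N"

# Examples copied from the table (McCoy et al., 2019).
EXAMPLES = [
    ("Lexical overlap", "The banker near the judge saw the actor.", "The banker saw the actor.", "E"),
    ("Lexical overlap", "The judge by the actor stopped the banker.", "The banker stopped the actor.", "N"),
    ("Subsequence", "The artist and the student called the judge.", "The student called the judge.", "E"),
    ("Subsequence", "The judges heard the actors resigned.", "The judges heard the actors.", "N"),
    ("Constituent", "Before the actor slept, the senator ran.", "The actor slept.", "E"),
    ("Constituent", "If the actor slept, the judge saw the artist.", "The actor slept.", "N"),
]

if __name__ == "__main__":
    for name, p, h, gold in EXAMPLES:
        pred = heuristic_prediction(p, h)
        status = "ok" if pred == gold else "WRONG"
        print(f"{name:16} gold={gold} overlap={lexical_overlap(p, h)} "
              f"subseq={subsequence(p, h)} pred={pred} {status}")
```

Running it shows that lexical overlap holds for all six pairs and subsequence holds for the last four. The shortcut prediction is correct only on the three “E” rows.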