NQ Behavioural splits (Lewis et al., 2021). “Q overlap” are test questions with paraphrases in the training data. “A-only” are test questions whose answers appear in the training data but whose questions do not. “No overlap” are test questions where neither the question nor the answer appears in the training data.
Model | Total | Q overlap | A-only overlap | No overlap |
---|---|---|---|---|
CBQA BART w/ NQ | 26.5 | 67.6 | 10.2 | 0.8 |
CBQA BART w/ NQ+PAQ | 28.2 | 52.8 | 24.4 | 9.4 |
+ final NQ finetune | 32.7 | 69.8 | 22.2 | 7.51 |
RePAQ (retriever only) | 41.7 | 65.4 | 31.7 | 21.4 |
RePAQ (with reranker) | 47.3 | 73.5 | 39.7 | 26.0 |
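
The three splits described in the caption can be illustrated with a minimal Python sketch. This is not the procedure used to build the splits: it assumes normalized exact string match as a crude stand-in for paraphrase detection (which would miss genuine paraphrases), and the helper names `normalize` and `assign_split` and the toy data are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def assign_split(question, answers, train_questions, train_answers):
    """Return 'q_overlap', 'a_only_overlap', or 'no_overlap'.

    train_questions and train_answers are sets of already-normalized strings.
    Assumption: exact match after normalization approximates overlap.
    """
    if normalize(question) in train_questions:
        return "q_overlap"
    if any(normalize(a) in train_answers for a in answers):
        return "a_only_overlap"
    return "no_overlap"


# Toy training data (hypothetical, for illustration only).
train = [
    ("who wrote hamlet", ["William Shakespeare"]),
    ("capital of france", ["Paris"]),
]
train_questions = {normalize(q) for q, _ in train}
train_answers = {normalize(a) for _, answers in train for a in answers}

# Toy test examples, one per split.
examples = [
    ("Who wrote Hamlet?", ["Shakespeare"]),
    ("Which city hosts the Louvre?", ["Paris"]),
    ("Who painted Guernica?", ["Pablo Picasso"]),
]
splits = [assign_split(q, a, train_questions, train_answers) for q, a in examples]
```

Checking question overlap first means a test question counts as “A-only” only when its answer is seen in training but its question is not, matching the caption's definition.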