Table 7:

NQ behavioural splits (Lewis et al., 2021). “Q overlap” are test questions with paraphrases in the training data. “A-only overlap” are test questions whose answers appear in the training data but whose questions do not. “No overlap” are test questions where neither the question nor the answer overlaps with the training data.

Model                   Total  Q overlap  A-only overlap  No overlap
CBQA BART w/ NQ          26.5       67.6            10.2         0.8
CBQA BART w/ NQ+PAQ      28.2       52.8            24.4         9.4
+ final NQ finetune      32.7       69.8            22.2        7.51
RePAQ (retriever only)   41.7       65.4            31.7        21.4
RePAQ (with reranker)    47.3       73.5            39.7        26.0
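As a rough illustration of how per-split scores like those in Table 7 can be computed, the Python sketch below buckets test questions using the caption's criteria and reports accuracy per bucket. It assumes the overlap judgments are already available as boolean flags. The `Example` record, its field names, and the helper functions are hypothetical. It also uses a simplified exact-match comparison that only lowercases and collapses whitespace. It is not the evaluation code behind the reported numbers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record for one test question; overlap flags are assumed
    # to be supplied externally (e.g. from annotation), not computed here.
    prediction: str
    gold_answers: list[str]
    question_overlap: bool  # a paraphrase of the question is in training data
    answer_overlap: bool    # a gold answer appears in training data


def normalize(text: str) -> str:
    # Simplified normalization: lowercase and collapse whitespace only.
    return " ".join(text.lower().split())


def exact_match(pred: str, golds: list[str]) -> bool:
    # Correct if the normalized prediction equals any normalized gold answer.
    return any(normalize(pred) == normalize(g) for g in golds)


def split_of(ex: Example) -> str:
    # Question overlap takes precedence; otherwise check answer-only overlap.
    if ex.question_overlap:
        return "q_overlap"
    if ex.answer_overlap:
        return "a_only_overlap"
    return "no_overlap"


def split_accuracies(examples: list[Example]) -> dict[str, float]:
    # Accuracy (in %) over all examples ("total") and within each split.
    correct: dict[str, int] = defaultdict(int)
    count: dict[str, int] = defaultdict(int)
    for ex in examples:
        hit = exact_match(ex.prediction, ex.gold_answers)
        for key in ("total", split_of(ex)):
            count[key] += 1
            correct[key] += int(hit)
    return {k: 100.0 * correct[k] / count[k] for k in count}
```

In this sketch the "total" score is computed over exactly the examples passed in. A split that receives no examples is simply absent from the returned dictionary rather than reported as zero.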
