Table 2: F1 scores of Large-sized model variants for salient span mask prediction on CustomNews and TempLAMA. T5-CBQA is the pretrained model from Roberts et al. (2020), and T5-CBQA-ft is further finetuned on TempLAMA. The Yearly model is an ensemble of 9 models each finetuned on a yearly slice of the training data between 2010 and 2018. We use the 2018 model when testing on 2019–20. The Uniform and Temporal models are trained on the entire data from 2010–18, and the latter has additional temporal context. The F1 scores are macro-averaged across the evaluation years. The Temporal model performs better on TempLAMA, which is focused only on temporally scoped facts, as well as on the unseen years for CustomNews.

Model       #Parameters  CustomNews                  TempLAMA
                         2010–18  2019–20  Overall   2010–18  2019–20  Overall
T5-CBQA     737M         20.2     19.8     20.1      5.4      4.3      5.2
T5-CBQA-ft  737M         15.2     15.7     15.3      17.8     15.3     17.3
Uniform     737M         30.6     27.8     30.1      28.1     19.8     26.6
Yearly      6.6B         33.4     26.7     32.2      28.5     21.8     27.3
Temporal    737M         32.1     29.5     31.6      29.6     22.2     28.2
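
As a sanity check on the macro-averaging described in the caption, the Overall columns can be approximately recovered from the two period columns. The sketch below assumes that Overall is the unweighted mean over the 11 evaluation years (9 seen years, 2010–18, and 2 unseen years, 2019–20) and that each period column is itself a mean over its years; the helper name `overall_from_periods` and the dictionary layout are illustrative, not taken from the paper. Because the table values are rounded to one decimal, agreement is only expected to within about 0.1 F1.

```python
# Assumption: "Overall" is the macro-average over 11 evaluation years,
# i.e. a year-weighted mean of the 2010-18 and 2019-20 columns.
N_SEEN_YEARS = 9    # 2010-18
N_UNSEEN_YEARS = 2  # 2019-20


def overall_from_periods(seen: float, unseen: float) -> float:
    """Hypothetical helper: year-weighted mean of the two period scores."""
    total_years = N_SEEN_YEARS + N_UNSEEN_YEARS
    return (N_SEEN_YEARS * seen + N_UNSEEN_YEARS * unseen) / total_years


# (2010-18, 2019-20, reported Overall), copied from Table 2.
TABLE2 = {
    "CustomNews": {
        "T5-CBQA": (20.2, 19.8, 20.1),
        "T5-CBQA-ft": (15.2, 15.7, 15.3),
        "Uniform": (30.6, 27.8, 30.1),
        "Yearly": (33.4, 26.7, 32.2),
        "Temporal": (32.1, 29.5, 31.6),
    },
    "TempLAMA": {
        "T5-CBQA": (5.4, 4.3, 5.2),
        "T5-CBQA-ft": (17.8, 15.3, 17.3),
        "Uniform": (28.1, 19.8, 26.6),
        "Yearly": (28.5, 21.8, 27.3),
        "Temporal": (29.6, 22.2, 28.2),
    },
}

# Absolute gap between the recomputed and the reported Overall score.
GAPS = {
    (dataset, model): abs(overall_from_periods(seen, unseen) - reported)
    for dataset, rows in TABLE2.items()
    for model, (seen, unseen, reported) in rows.items()
}

# The Yearly ensemble size follows from 9 models of 737M parameters each.
YEARLY_PARAMS_BILLIONS = 9 * 737 / 1000

if __name__ == "__main__":
    for (dataset, model), gap in GAPS.items():
        print(f"{dataset:10s} {model:10s} gap={gap:.3f}")
    print(f"Yearly ensemble: {YEARLY_PARAMS_BILLIONS:.1f}B parameters")
```

Under these assumptions, every recomputed value should land within rounding distance of the reported Overall score, and the 6.6B parameter count of the Yearly model corresponds to nine 737M-parameter models.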