Table 2: Number of images in evaluation tasks and whether datasets were used in a zero-shot (ZS) or fine-tuned (FT) setting.

Dataset      # images (train)   # images (test)   ZS   FT
Flickr30k    29K                1K                ✓    ✓
MSCOCO       n/a                5K                ✓
VQA          440K               210K                   ✓