Evaluation time for the MSCOCO test sets of 1k, 5k, and 100k images on an NVIDIA V100 with batch size 512. The time includes bi-encoding images and text, i.e., the embeddings are not pre-computed. * denotes extrapolated values.