Table 4: Evaluation time for the MSCOCO test sets of 1k, 5k, and 100k images on an NVIDIA V100 with batch size 512. The time includes bi-encoding images and text, i.e., the embeddings are not pre-computed. * denotes extrapolated values.

Model            1k      5k      100k
BE               5s      30s     7min
Sep/Joint+Coop   5min    25min   8.5h
CE               2h      50h     2.3 years*
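
The extrapolated CE entry can be sanity-checked with a short calculation. This is a minimal sketch, not the authors' procedure: it assumes that cross-encoder time scales with the number of image-caption pairs scored and that each MSCOCO image has 5 captions. The function names are hypothetical. Under those assumptions, the measured 2h for 1k images reproduces both the 50h and the roughly 2.3-year figures.

```python
# Assumption (not stated in the source): cross-encoder (CE) evaluation time is
# proportional to the number of image-caption pairs that must be scored.
CAPTIONS_PER_IMAGE = 5  # MSCOCO provides 5 captions per image


def num_pairs(num_images: int) -> int:
    """Number of image-caption pairs when every image is scored against every caption."""
    num_captions = num_images * CAPTIONS_PER_IMAGE
    return num_images * num_captions


def extrapolate_hours(known_images: int, known_hours: float, target_images: int) -> float:
    """Scale a measured runtime by the ratio of pair counts."""
    return known_hours * num_pairs(target_images) / num_pairs(known_images)


# Start from the measured CE time for the 1k test set (2h in the table).
hours_5k = extrapolate_hours(1_000, 2.0, 5_000)
hours_100k = extrapolate_hours(1_000, 2.0, 100_000)
years_100k = hours_100k / (24 * 365)

print(f"5k:   {hours_5k:.0f} h")
print(f"100k: {hours_100k:.0f} h = {years_100k:.1f} years")
```

The quadratic growth in pair count is what separates CE from BE in the table: a bi-encoder embeds each image and caption once, so its encoding cost grows roughly linearly with the test set size.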