R@1 with a classification ITM loss (Cls.) and a contrastive ITM loss (Con.) for an MMT with one multimodal layer (MMT-1) and a model that has only modality-specific attention (MSA).
Model | Loss | Negatives | Flickr-ZS | COCO-ZS |
---|---|---|---|---|
MSA | Cls. | 1 | 15.0 | 6.9 |
MSA | Con. | 32 | 17.9 | 8.3 |
MSA | Con. | 1024 | 19.7 | 9.5 |
MMT-1 | Cls. | 1 | 37.3 | 19.1 |
MMT-1 | Con. | 32 | 35.7 | 19.1 |
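
The table contrasts two ways of training on image-text matching scores, and the "Negatives" column reflects how many mismatched pairs each loss sees per positive. The sketch below is only an illustration of that difference, not the formulation used for these results: it assumes the classification loss is a binary cross-entropy over one matched and one mismatched pair, and the contrastive loss is a softmax cross-entropy in which the matched pair competes against N mismatched pairs. The function names and the use of raw scalar scores (logits) are hypothetical.

```python
import math


def classification_itm_loss(pos_score, neg_score):
    """Assumed form: binary cross-entropy over one matched pair (label 1)
    and one mismatched pair (label 0), with scores given as logits."""
    def log_sigmoid(x):
        # Numerically stable log(sigmoid(x)).
        if x >= 0:
            return -math.log1p(math.exp(-x))
        return x - math.log1p(math.exp(x))

    # -log p(match | positive) - log p(no match | negative), averaged.
    return -(log_sigmoid(pos_score) + log_sigmoid(-neg_score)) / 2.0


def contrastive_itm_loss(pos_score, neg_scores):
    """Assumed form: softmax cross-entropy where the matched pair must
    outscore every mismatched pair in neg_scores."""
    scores = [pos_score] + list(neg_scores)
    m = max(scores)  # subtract the max for a stable log-sum-exp
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - pos_score


# Hypothetical scores: the positive pair is scored higher than the negatives.
cls = classification_itm_loss(2.0, -1.0)
con_32 = contrastive_itm_loss(2.0, [-1.0] * 32)
con_1024 = contrastive_itm_loss(2.0, [-1.0] * 1024)
```

Under these assumptions, the contrastive loss at fixed scores grows with the number of negatives, so the model has to separate the matched pair more strongly as the negative set gets larger.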