Results on Multi30k (multilingual setups). Following prior work (Ni et al., 2021), we report mean Recall (mR) scores: mR averages Recall@1, Recall@5, and Recall@10 over the image-to-text and text-to-image retrieval tasks. All methods in the comparison use text data from all four languages. We divide the models into groups G1-G3 as in Table 1. † indicates results taken directly from the literature (Ni et al., 2021); ‡ indicates our own results. MULE (Kim et al., 2020); S-LIWE (Wehrmann et al., 2019); SMALR (Burns et al., 2020); CEM3P† (Ni et al., 2021).
| Type | Model | en | de | fr | cs | mean |
|---|---|---|---|---|---|---|
| G1. PT | MULE | 70.3 | 64.1 | 62.3 | 57.7 | 63.6 |
| | S-LIWE | 76.3 | 72.1 | 63.4 | 59.4 | 67.8 |
| | SMALR | 74.5 | 69.8 | 65.9 | 64.8 | 68.8 |
| G2. CE | CEM3P† | 86.7 | 82.2 | 73.5 | 70.2 | 78.2 |
| | CEM3P‡ | 83.7 | 79.4 | 76.5 | 74.6 | 78.6 |
| G3. BE | BEM3P | 82.8 | 78.0 | 75.1 | 73.6 | 77.4 |
| | Sep+CoopM3P | 84.8 | 80.5 | 77.5 | 75.6 | 79.6 |
| | Joint+CoopM3P | 83.0 | 79.2 | 75.9 | 74.0 | 78.0 |
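For reference, a minimal sketch of how the mR metric reported above can be computed from per-query retrieval ranks. The function names and the example rank lists are our own illustration of the definition (average of Recall@1/5/10 over both retrieval directions), not code from Ni et al. (2021):

```python
import numpy as np

def recall_at_k(ranks, k):
    """Fraction of queries whose gold item appears in the top k.

    `ranks` holds the 1-indexed rank of the correct match for each query.
    """
    ranks = np.asarray(ranks)
    return float((ranks <= k).mean())

def mean_recall(i2t_ranks, t2i_ranks, ks=(1, 5, 10)):
    """mR: average of Recall@1/5/10 over image-to-text and text-to-image."""
    scores = [recall_at_k(r, k) for r in (i2t_ranks, t2i_ranks) for k in ks]
    return 100.0 * sum(scores) / len(scores)

# Hypothetical example: gold-item ranks for five queries in each direction.
print(mean_recall(i2t_ranks=[1, 3, 2, 11, 1], t2i_ranks=[2, 1, 7, 1, 4]))
```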