Table 2: Results on Multi30k (multilingual setups). Following prior work (Ni et al., 2021), we report mean Recall (mR) scores: mR is the average of Recall@1, Recall@5, and Recall@10 on the image-to-text and text-to-image retrieval tasks. All methods in the comparison use text data from all four languages. We divide the models into groups G1-G3 as in Table 1. † indicates results taken directly from the literature (Ni et al., 2021) and ‡ indicates our own results. MULE (Kim et al., 2020); S-LIWE (Wehrmann et al., 2019); SMALR (Burns et al., 2020); CEM3P † (Ni et al., 2021).
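
As a minimal sketch of the caption's definition (the per-direction notation $R@k_{d}$ is ours, introduced only for clarity), mR is the unweighted mean of the six recall values:

$$
\mathrm{mR} \;=\; \frac{1}{6} \sum_{d \,\in\, \{\text{image}\rightarrow\text{text},\ \text{text}\rightarrow\text{image}\}} \bigl( R@1_{d} + R@5_{d} + R@10_{d} \bigr)
$$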

Type    Model           en    de    fr    cs    mean
G1. PT  MULE            70.3  64.1  62.3  57.7  63.6
G1. PT  S-LIWE          76.3  72.1  63.4  59.4  67.8
G1. PT  SMALR           74.5  69.8  65.9  64.8  68.8
G2. CE  CEM3P †         86.7  82.2  73.5  70.2  78.2
G2. CE  CEM3P ‡         83.7  79.4  76.5  74.6  78.6
G3. BE  BEM3P           82.8  78.0  75.1  73.6  77.4
G3. BE  Sep+CoopM3P     84.8  80.5  77.5  75.6  79.6
G3. BE  Joint+CoopM3P   83.0  79.2  75.9  74.0  78.0