Table 4: Spearman’s rank ρ correlation between similarities computed with representations by all tested models and human similarity judgments in the five evaluation benchmarks: the higher the better. Results in bold are the highest in the column among models run on our dataset (that is, all but the top 2 models). Results underlined are the highest among LV models. *Original results from Bommasani et al. (2020).

model           input  Spearman ρ correlation (layer)
                       RG65         WS353        SL999        MEN          SVERB
BERT-1M-Wiki*   L      0.7242 (1)   0.7048 (1)   0.5134 (3)   –            0.3948 (4)
BERT-Wiki ours  L      0.8107 (1)   0.7262 (1)   0.5213 (0)   0.7176 (2)   0.4039 (4)

GloVe           L      0.7693       0.6097       0.3884       0.7296       0.2183
BERT            L      0.8124 (2)   0.7096 (1)   0.5191 (0)   0.7368 (2)   0.4027 (3)

LXMERT          LV     0.7821 (27)  0.6000 (27)  0.4438 (21)  0.7417 (33)  0.2443 (21)
UNITER          LV     0.7679 (18)  0.6813 (2)   0.4843 (2)   0.7483 (20)  0.3926 (10)
ViLBERT         LV     0.7927 (20)  0.6204 (14)  0.4729 (16)  0.7714 (26)  0.3875 (14)
VisualBERT      LV     0.7592 (2)   0.6778 (2)   0.4797 (4)   0.7512 (20)  0.3833 (10)

Vokenization    LV     0.8456 (9)   0.6818 (3)   0.4881 (9)   0.8068 (10)  0.3439 (9)
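
As a rough illustration of the quantity reported in each cell, the sketch below computes a Spearman ρ between model similarities and human ratings for a handful of word pairs. It assumes cosine similarity as the model-side similarity measure and uses average ranks for ties. The helper names (`cosine`, `average_ranks`, `spearman`) and the toy vectors and ratings are all hypothetical, invented for illustration; they are not taken from the benchmarks or from the models in the table. The per-layer selection shown in parentheses in the table is not modelled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy word vectors (hypothetical, 2-dimensional for readability).
toy_embeddings = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "train": [0.6, 0.8],
    "banana": [0.0, 1.0],
}

# Toy (word1, word2, human rating) triples, also hypothetical.
toy_pairs = [
    ("car", "automobile", 9.5),
    ("car", "train", 6.0),
    ("car", "banana", 1.0),
]

model_scores = [cosine(toy_embeddings[a], toy_embeddings[b]) for a, b, _ in toy_pairs]
human_scores = [rating for _, _, rating in toy_pairs]
rho = spearman(model_scores, human_scores)
```

Because ρ depends only on the ranking of the pairs, a model whose similarities order the pairs exactly as the human ratings do scores 1 regardless of the absolute similarity values.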