Table 7: Proportion of aligned cases between humans and the models when considering all pairs in the benchmarks (all), their highest + high partition (similar), and lowest + low partition (dissimilar).

Model           RG65    WS353   SL999   MEN     SVERB

all
BERT            0.52    0.39    0.38    0.41    0.31
ViLBERT         0.49    0.37    0.35    0.43    0.30
Vokenization    0.60    0.39    0.35    0.45    0.29

similar
BERT            0.62    0.45    0.41    0.44    0.33
ViLBERT         0.50    0.43    0.38    0.47    0.31
Vokenization    0.73    0.48    0.38    0.46    0.29

dissimilar
BERT            0.46    0.43    0.39    0.42    0.32
ViLBERT         0.50    0.39    0.33    0.44    0.33
Vokenization    0.54    0.41    0.36    0.48    0.31