Proportion of cases in which the models align with human judgments, computed over all pairs in each benchmark (all), over the highest + high partitions (similar), and over the lowest + low partitions (dissimilar).
Model | RG65 | WS353 | SL999 | MEN | SVERB |
all | | | | | |
BERT | 0.52 | 0.39 | 0.38 | 0.41 | 0.31 |
ViLBERT | 0.49 | 0.37 | 0.35 | 0.43 | 0.30 |
Vokenization | 0.60 | 0.39 | 0.35 | 0.45 | 0.29 |
similar | | | | | |
BERT | 0.62 | 0.45 | 0.41 | 0.44 | 0.33 |
ViLBERT | 0.50 | 0.43 | 0.38 | 0.47 | 0.31 |
Vokenization | 0.73 | 0.48 | 0.38 | 0.46 | 0.29 |
dissimilar | | | | | |
BERT | 0.46 | 0.43 | 0.39 | 0.42 | 0.32 |
ViLBERT | 0.50 | 0.39 | 0.33 | 0.44 | 0.33 |
Vokenization | 0.54 | 0.41 | 0.36 | 0.48 | 0.31 |