Statistics of the benchmarks before (original) and after (found in VICO) filtering them against VICO. rel. refers to the type of semantic relation, i.e., (S)imilarity or (R)elatedness; PoS to the parts of speech covered; # W to the number of unique words; concr. to the average concreteness of the pairs (in brackets, the number of pairs found in Brysbaert et al., 2014). In the found in VICO columns, percentages in brackets give the coverage relative to original.
| | | original | | | found in VICO | | |
benchmark | rel. | PoS | # pairs | # W | concr. (#) | # pairs (%) | # W (%) | concr. (#) |
RG65 | S | N | 65 | 48 | 4.37 (65) | 65 (100%) | 48 (100%) | 4.37 (65) |
WordSim353 | R | N, V, Adj | 353 | 437 | 3.82 (331) | 306 (86.7%) | 384 (87.9%) | 3.91 (300) |
SimLex999 | S | N, V, Adj | 999 | 1028 | 3.61 (999) | 957 (95.8%) | 994 (99.5%) | 3.65 (957) |
MEN | R | N, V, Adj | 3000 | 752 | 4.41 (2954) | 2976 (99.2%) | 750 (99.7%) | 4.41 (2930) |
SimVerb3500 | S | V | 3500 | 827 | 3.08 (3487) | 2890 (82.6%) | 729 (88.2%) | 3.14 (2890) |
total | | | 7917 | 2453 | | 7194 (90.9%) | 2278 (92.9%) | |
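
The coverage figures in the found in VICO columns can be illustrated with a minimal sketch. It rests on two assumptions that the table does not state: a pair is retained only if both of its words are in the VICO vocabulary, and # W counts the unique words of the retained pairs. The function name `coverage` and the toy data are hypothetical and are not taken from any benchmark.

```python
def coverage(pairs, vocab):
    """Filter word pairs by a vocabulary and report pair and word coverage.

    Assumption: a pair is kept only when BOTH of its words are in `vocab`.
    """
    kept = [(a, b) for a, b in pairs if a in vocab and b in vocab]
    # Unique words before and after filtering.
    words = {w for pair in pairs for w in pair}
    kept_words = {w for pair in kept for w in pair}
    return {
        "pairs": len(kept),
        "pairs_pct": round(100 * len(kept) / len(pairs), 1),
        "words": len(kept_words),
        "words_pct": round(100 * len(kept_words) / len(words), 1),
    }


# Toy data (hypothetical): "idea" is missing from the vocabulary,
# so the pair ("cat", "idea") is dropped.
toy_pairs = [("cat", "dog"), ("cat", "idea"), ("run", "walk"), ("sun", "moon")]
toy_vocab = {"cat", "dog", "run", "walk", "sun", "moon"}

stats = coverage(toy_pairs, toy_vocab)
print(stats)
```

On the toy data, three of the four pairs and six of the seven unique words survive the filter. If # W instead counts every benchmark word that appears in VICO, regardless of whether its pair is retained, `kept_words` would be replaced by `words & vocab`.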