Table 2: Statistics of the datasets used in our experiments: the sizes of the training and test sets, the proportion of instances containing a figuratively used PIE (pct. idiomatic), the size of the PIE set, i.e., the number of distinct PIEs (# of idioms), the average number of occurrences per PIE (avg. idiom occ), and the standard deviation of the number of occurrences per PIE (std. idiom occ).

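| Dataset | Split | Size (pct. idiomatic), Train | Size (pct. idiomatic), Test | # of idioms, Train | # of idioms, Test | Avg. idiom occ, Train | Avg. idiom occ, Test | Std. idiom occ, Train | Std. idiom occ, Test |
|---|---|---|---|---|---|---|---|---|---|
| MAGPIE | Random | 32,162 (76.63%) | 4,030 (76.48%) | 1,675 | 1,072 | 19.2 | 3.76 | 24.82 | 3.65 |
| MAGPIE | Type-aware | 32,155 (77.90%) | 4,050 (70.54%) | 1,411 | 168 | 22.79 | 24.11 | 29.96 | 32.05 |
| SemEval5B | Random | 1,420 (50.56%) | 357 (50.70%) | 10 | 10 | 142 | 35.7 | 51.25 | 12.69 |
| SemEval5B | Type-aware | 1,111 (58.74%) | 341 (58.65%) | 31 |  | 35.81 | 37.89 | 28.84 | 30.12 |
| VNC | Random | 2,285 (79.52%) | 254 (70.47%) | 53 | 50 | 43.11 | 5.08 | 25.89 | 2.93 |
| VNC | Type-aware | 2,191 (79.69%) | 348 (71.84%) | 47 |  | 46.62 | 58 | 27.99 | 27.77 |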
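The per-split statistics in the caption can be computed from labelled instances roughly as follows. This is a minimal sketch, not the authors' code: the input format (one `(pie, is_figurative)` pair per instance) and the names `SplitStats` and `split_stats` are hypothetical, and because the caption does not say whether the standard deviation is the population or the sample version, the sketch assumes the population standard deviation (`statistics.pstdev`).

```python
from collections import Counter
from dataclasses import dataclass
from statistics import pstdev
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SplitStats:
    size: int             # number of instances in the split
    pct_idiomatic: float  # % of instances whose PIE is used figuratively
    num_idioms: int       # number of distinct PIEs (# of idioms)
    avg_occ: float        # mean instances per PIE (avg. idiom occ)
    std_occ: float        # std of instances per PIE (std. idiom occ); population std assumed


def split_stats(instances: Iterable[Tuple[str, bool]]) -> SplitStats:
    """Compute Table 2-style statistics for one split.

    `instances` uses a hypothetical format: one (pie, is_figurative)
    pair per instance.
    """
    instances = list(instances)
    if not instances:
        raise ValueError("split has no instances")
    # Occurrences per PIE: how many instances share each expression.
    occ = list(Counter(pie for pie, _ in instances).values())
    n_figurative = sum(1 for _, is_fig in instances if is_fig)
    return SplitStats(
        size=len(instances),
        pct_idiomatic=100.0 * n_figurative / len(instances),
        num_idioms=len(occ),
        avg_occ=sum(occ) / len(occ),
        std_occ=pstdev(occ),  # assumption: population, not sample, std
    )


if __name__ == "__main__":
    toy = [
        ("kick the bucket", True),
        ("kick the bucket", False),
        ("spill the beans", True),
        ("kick the bucket", True),
    ]
    print(split_stats(toy))
```

Because every instance belongs to exactly one PIE, avg. idiom occ equals the split size divided by # of idioms; for example, 32,162 / 1,675 ≈ 19.20 for the MAGPIE random training split.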