Summary of the data sets for each language/condition. Note that the tokenized input units for Chinese are given as ‘words’ in quotes because the segmentation quality is generally very poor, so the units correspond only loosely to Chinese words. The Korean unit is listed as a phonological phrase, which consists of words and additional particles and is referred to in Korean terminology as an eojeol.