Table 1: The data sets, the split sizes (train, dev, test), and the available categories and their properties. Numbers inside parentheses are the number of unique category labels.
| Data Set | Splits (train / dev / test) | Categories | Properties |
|---|---|---|---|
| Yelp 2013 | 62,522 / 7,773 / 8,671 | users (1.6k); products (1.6k) | Categories can be sparse (i.e., there may not be enough reviews for each user/product). |
| AAPR | 33,464 / 2,000 / 2,000 | author (48k); research area (144) | Authors are sparse and have many category labels. Categories can have multiple labels (e.g., multiple authors, multidisciplinary fields). |
| PolMed | 4,500 / 0 / 500 | politician (505); media source (2); audience (2); political bias (2) | The data set has more categories. Categories with binary labels may not be diverse enough to be useful. |