Common subtasks of TST with their corresponding attribute values and datasets. For datasets with multiple attribute-specific corpora, we report the size as the number of sentences in the smallest corpus. We also report whether each dataset is parallel (Pa?).
| Task | Attribute Values | Datasets | Size | Pa? |
|---|---|---|---|---|
| Style Features | | | | |
| Formality | Informal ↔ Formal | GYAFC³ (Rao and Tetreault 2018) | 50K | ✓ |
| | | XFORMAL⁴ (Briakou et al. 2021b) | 1K | ✓ |
| Politeness | Impolite → Polite | Politeness⁵ (Madaan et al. 2020) | 1M | ✗ |
| Gender | Masculine ↔ Feminine | Yelp Gender⁶ (Prabhumoye et al. 2018) | 2.5M | ✗ |
| Humor & Romance | Factual ↔ Humorous ↔ Romantic | FlickrStyle⁷ (Gan et al. 2017) | 5K | ✓ |
| Biasedness | Biased → Neutral | Wiki Neutrality⁸ (Pryzant et al. 2020) | 181K | ✓ |
| Toxicity | Offensive → Non-offensive | Twitter (dos Santos, Melnyk, and Padhi 2018) | 58K | ✗ |
| | | Reddit (dos Santos, Melnyk, and Padhi 2018) | 224K | |
| | | Reddit Politics (Tran, Zhang, and Soleymani 2020) | 350K | |
| Authorship | Shakespearean ↔ Modern | Shakespeare (Xu et al. 2012) | 18K | ✓ |
| | Different Bible translators | Bible⁹ (Carlson, Riddell, and Rockmore 2018) | 28M | |
| Simplicity | Complicated → Simple | PWKP (Zhu, Bernhard, and Gurevych 2010) | 108K | ✓ |
| | | Expert (den Bercken, Sips, and Lofi 2019) | 2.2K | ✓ |
| | | MIMIC-III¹⁰ (Weng, Chung, and Szolovits 2019) | 59K | ✗ |
| | | MSD¹¹ (Cao et al. 2020) | 114K | ✓ |
| Engagingness | Plain → Attractive | Math¹² (Koncel-Kedziorski et al. 2016) | <1K | ✓ |
| | | TitleStylist¹³ (Jin et al. 2020a) | 146K | ✗ |
| Content Preferences | | | | |
| Sentiment | Positive ↔ Negative | Yelp¹⁴ (Shen et al. 2017) | 250K | ✗ |
| | | Amazon¹⁵ (He and McAuley 2016) | 277K | |
| Topic | Entertainment ↔ Politics | Yahoo! Answers¹⁶ (Huang et al. 2020) | 153K | ✗ |
| Politics | Democratic ↔ Republican | Political¹⁷ (Voigt et al. 2018) | 540K | ✗ |