Examples of methods from each category, along with papers studying those methods. These lists are non-exhaustive. In the interest of replicability, we have made our coding for all papers publicly available at: http://www.shorturl.at/stuAT.
Category | Example Methods | Example Studies |
---|---|---|
Feat Aug (FA) | Structural correspondence learning, Frustratingly easy domain adaptation | (Blitzer et al., 2006; Daumé III, 2007) |
Feat Gen (FG) | Marginalized stacked denoising autoencoders, Deep belief networks | (Jochim and Schütze, 2014; Ji et al., 2015; Yang et al., 2015) |
Loss Aug (LA) | Multi-task learning, Adversarial learning, Regularization-based methods | (Zhang et al., 2017; Liu et al., 2019; Chen et al., 2020) |
Init (PI) | Prior estimation, Parameter matrix initialization | (Chan and Ng, 2006; Al Boni et al., 2015) |
Add (PA) | Adapter networks | (Lin and Lu, 2018) |
Freeze (FR) | Embedding freezing, Layerwise freezing | (Yin et al., 2015; Tourille et al., 2017) |
Ensemble (EN) | Mixture of experts, Weighted averaging | (McClosky et al., 2010; Nguyen et al., 2014) |
Instance Weighting (IW) | Classifier-based weighting | (Jiang and Zhai, 2007; Jeong et al., 2009) |
Data Selection (DS) | Confidence-based sample selection | (Scheible and Schütze, 2013; Braud and Denis, 2014) |
Pseudo-Labeling (PL) | Semi-supervised learning, Self-training | (Umansky-Pesin et al., 2010; Lison et al., 2020) |
Noising/Denoising (NO) | Token dropout | (Pilán et al., 2016) |
Active Learning (AL) | Sample selection via active learning | (Rai et al., 2010; Wu et al., 2017) |
Pretraining (PT) | Language model pretraining, Supervised pretraining | (Conneau et al., 2017; Howard and Ruder, 2018) |
Instance Learning (IL) | Nearest neighbor learning | (Gong et al., 2016) |
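To make the first category concrete: "frustratingly easy domain adaptation" (Daumé III, 2007) augments each feature vector into shared, source-specific, and target-specific copies, so a standard classifier can learn which features transfer across domains. The sketch below is a minimal illustration of that mapping using dense NumPy arrays; the function name and example data are ours, not from any cited implementation.

```python
import numpy as np

def augment(X, domain):
    """Feature augmentation (Daumé III, 2007):
    each row x becomes [x_shared, x_source, x_target], where the
    copy for the other domain is zeroed out.
      source: [x, x, 0]    target: [x, 0, x]
    """
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])
    if domain == "target":
        return np.hstack([X, zeros, X])
    raise ValueError("domain must be 'source' or 'target'")

# Toy 2-dimensional feature vectors, one per domain.
X_src = np.array([[1.0, 2.0]])
X_tgt = np.array([[3.0, 4.0]])

print(augment(X_src, "source"))  # [[1. 2. 1. 2. 0. 0.]]
print(augment(X_tgt, "target"))  # [[3. 4. 0. 0. 3. 4.]]
```

Any off-the-shelf linear classifier trained on the augmented space can then place weight on the shared copy for domain-general features and on the domain-specific copies for the rest.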