Table 2: Examples of methods from each category, and papers studying these methods. These lists are non-exhaustive. In the interest of replicability, we have made our coding for all papers publicly available at: http://www.shorturl.at/stuAT.

Category | Example Methods | Example Studies
Feat Aug (FA) | Structural correspondence learning, Frustratingly easy domain adaptation | Blitzer et al., 2006; Daumé III, 2007
Feat Gen (FG) | Marginalized stacked denoising autoencoders, Deep belief networks | Jochim and Schütze, 2014; Ji et al., 2015; Yang et al., 2015
Loss Aug (LA) | Multi-task learning, Adversarial learning, Regularization-based methods | Zhang et al., 2017; Liu et al., 2019; Chen et al., 2020
Init (PI) | Prior estimation, Parameter matrix initialization | Chan and Ng, 2006; Al Boni et al., 2015
Add (PA) | Adapter networks | Lin and Lu, 2018
Freeze (FR) | Embedding freezing, Layerwise freezing | Yin et al., 2015; Tourille et al., 2017
Ensemble (EN) | Mixture of experts, Weighted averaging | McClosky et al., 2010; Nguyen et al., 2014
Instance Weighting (IW) | Classifier-based weighting | Jiang and Zhai, 2007; Jeong et al., 2009
Data Selection (DS) | Confidence-based sample selection | Scheible and Schütze, 2013; Braud and Denis, 2014
Pseudo-Labeling (PL) | Semi-supervised learning, Self-training | Umansky-Pesin et al., 2010; Lison et al., 2020
Noising/Denoising (NO) | Token dropout | Pilán et al., 2016
Active Learning (AL) | Sample selection via active learning | Rai et al., 2010; Wu et al., 2017
Pretraining (PT) | Language model pretraining, Supervised pretraining | Conneau et al., 2017; Howard and Ruder, 2018
Instance Learning (IL) | Nearest neighbor learning | Gong et al., 2016
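
To make the feature-centric entries concrete, the Feat Aug row's "frustratingly easy domain adaptation" (Daumé III, 2007) copies each feature vector into a shared block plus a domain-specific block, so a linear model can learn shared weights alongside per-domain corrections. The sketch below is ours, not the author's code; the function name and the dense NumPy representation are assumptions (the original operates on sparse feature spaces).

```python
import numpy as np

def augment_features(X, domain):
    """Frustratingly-easy feature augmentation (Daume III, 2007).

    Each row x of X is mapped to [x, x, 0] for source examples and
    [x, 0, x] for target examples: the first block carries shared
    weights, the remaining blocks carry domain-specific corrections.
    """
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])
    if domain == "target":
        return np.hstack([X, zeros, X])
    raise ValueError("domain must be 'source' or 'target'")

# Example: train any linear classifier on the augmented spaces.
# X_src_aug = augment_features(X_src, "source")
# X_tgt_aug = augment_features(X_tgt, "target")
```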
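
Among the parameter-centric categories, layerwise freezing (Freeze row; e.g., Tourille et al., 2017) simply marks a subset of parameters as non-trainable before fine-tuning on the target domain. A minimal PyTorch sketch follows; the helper name and the parameter-name prefixes are hypothetical and depend on the model in use.

```python
import torch.nn as nn

def freeze_layers(model: nn.Module, prefixes: tuple) -> None:
    """Disable gradients for every parameter whose name starts with
    one of the given prefixes (e.g. embeddings or lower encoder layers)."""
    for name, param in model.named_parameters():
        if name.startswith(prefixes):
            param.requires_grad = False

# Example (prefixes follow the Hugging Face BERT naming convention;
# adjust to whatever model.named_parameters() actually reports):
# freeze_layers(model, ("embeddings.", "encoder.layer.0.", "encoder.layer.1."))
```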
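
Finally, for the data-centric categories, one round of self-training (Pseudo-Labeling row) fits a model on labeled source data, pseudo-labels the unlabeled target examples it is confident about, and refits on the union. A minimal scikit-learn sketch, assuming a logistic-regression base learner and a 0.9 confidence threshold; both are illustrative choices rather than settings from the cited studies.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_src, y_src, X_tgt, threshold=0.9):
    """One round of self-training: fit on labeled source data,
    pseudo-label confident target examples, refit on the union."""
    clf = LogisticRegression(max_iter=1000).fit(X_src, y_src)
    probs = clf.predict_proba(X_tgt)
    confident = probs.max(axis=1) >= threshold
    pseudo_y = clf.classes_[probs.argmax(axis=1)][confident]
    X_all = np.vstack([X_src, X_tgt[confident]])
    y_all = np.concatenate([y_src, pseudo_y])
    return LogisticRegression(max_iter=1000).fit(X_all, y_all)
```

Iterating this loop (relabeling with the refit model) yields classic self-training; keeping only high-confidence pseudo-labels limits error propagation.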