Table 1

Comparison of selected DA methods, adapted from Feng et al. (2021). Level denotes the depth at which the DA method modifies the data. Task indicates whether the DA method can be applied to different tasks (i.e., is task-agnostic) or is specifically designed for one task. The Reason column states whether the method is included in this article and, if not, why.

| DA Method | Level | Task | Reason |
| --- | --- | --- | --- |
| Synonym Replacement (Wang and Yang 2015) | Input | Agnostic | included in study |
| Random Deletion (Wei and Zou 2019) | Input | Agnostic | included in study |
| Random Swap (Wei and Zou 2019) | Input | Agnostic | included in study |
| DTreeMorph (Şahin and Steedman 2018) | Input | Agnostic | included in study |
| Synthetic Noise (Karpukhin et al. 2019) | Input | Agnostic | included in study |
| Nonce (Gulordava et al. 2018) | Input | Agnostic | included in study |
| Backtranslation (Sennrich, Haddow, and Birch 2016a) | Input | Agnostic | labels not preserved |
| UBT & TBT (Vaibhav et al. 2019) | Input | Agnostic | labels not preserved |
| Data Diversification (Nguyen et al. 2020) | Input | Agnostic | labels not preserved |
| SCPN (Wieting and Gimpel 2017) | Input | Agnostic | labels not preserved |
| Semantic Text Exchange (Feng, Li, and Hoey 2019) | Input | Agnostic | labels not preserved |
| XLDA (Singh et al. 2019) | Input | Agnostic | labels not preserved |
| LAMBADA (Anaby-Tavor et al. 2020) | Input | classification | labels not preserved |
| ContextualAug (Kobayashi 2018) | Input | Agnostic | requires strong pretrained model |
| Soft Contextual DA (Gao et al. 2019) | Emb/Hidden | Agnostic | requires strong pretrained model |
| Slot-Sub-LM (Louvan and Magnini 2020) | Input | slot filling | requires strong pretrained model |
| WN-Hypers (Feng et al. 2020) | Input | Agnostic | requires WordNet |
| UEdin-MS (DA part) (Grundkiewicz, Junczys-Dowmunt, and Heafield 2019) | Input | Agnostic | requires spell checker |
| SeqMixUp (Guo, Kim, and Rush 2020) | Input | seq2seq | not suitable |
| Emix (Jindal et al. 2020a) | Emb/Hidden | classification | not suitable |
| SpeechMix (Jindal et al. 2020b) | Emb/Hidden | Speech/Audio | not suitable |
| MixText (Chen, Yang, and Yang 2020b) | Emb/Hidden | classification | not suitable |
| SwitchOut (Wang et al. 2018) | Input | machine translation | not suitable |
| SignedGraph (Chen, Ji, and Evans 2020) | Input | paraphrase | not suitable |
| DAGA (Ding et al. 2020) | Input+Label | sequence tagging | not suitable |
| SeqMix (Zhang, Yu, and Zhang 2020) | Input+Label | active sequence labeling | not suitable |
| GECA (Andreas 2020) | Input | Agnostic | not suitable |