Table 4

Ablation study on data augmentation. Neither the additional out-of-domain samples nor the large-scale parallel corpus used for machine translation training contributes directly to LID.

Training Set                    QID-21   KB-21
Out-of-Domain                   89.77    94.29
w/ Synthetic In-Domain          95.35    96.86

w/ Parallel                     90.93    94.38
w/ Out-of-Domain (Addition)     90.89    94.52

w/ Synthetic In-Domain (20%)    92.02    94.91
w/ Synthetic In-Domain (50%)    94.89    96.12
w/ Synthetic In-Domain (80%)    95.30    96.79
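
To make the comparison in the caption easier to read, the following minimal sketch subtracts the Out-of-Domain baseline row from every other row of Table 4. The numbers are copied from the table. The dictionary layout and the helper name `gains_over_baseline` are illustrative assumptions, not part of the original work.

```python
# Scores copied from Table 4 as (QID-21, KB-21) pairs.
# The data structure and function name are hypothetical, chosen for illustration.
TABLE_4 = {
    "Out-of-Domain": (89.77, 94.29),
    "w/ Synthetic In-Domain": (95.35, 96.86),
    "w/ Parallel": (90.93, 94.38),
    "w/ Out-of-Domain (Addition)": (90.89, 94.52),
    "w/ Synthetic In-Domain (20%)": (92.02, 94.91),
    "w/ Synthetic In-Domain (50%)": (94.89, 96.12),
    "w/ Synthetic In-Domain (80%)": (95.30, 96.79),
}


def gains_over_baseline(table, baseline_name="Out-of-Domain"):
    """Return each row's score minus the baseline row, per test set."""
    base = table[baseline_name]
    return {
        name: tuple(round(score - ref, 2) for score, ref in zip(scores, base))
        for name, scores in table.items()
        if name != baseline_name
    }


gains = gains_over_baseline(TABLE_4)
for name, (qid_gain, kb_gain) in gains.items():
    # Differences are in the same units as the scores reported in the table.
    print(f"{name:<30} QID-21 {qid_gain:+.2f}   KB-21 {kb_gain:+.2f}")
```

The differences make it easy to compare the rows that add parallel or extra out-of-domain data against the rows that add synthetic in-domain data at each ratio.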