Ablation study on data augmentation. Neither the additional out-of-domain samples nor the large-scale parallel corpus used for machine translation training directly contributes to LID performance.
| Training Set | QID-21 | KB-21 |
|---|---|---|
| Out-of-Domain | 89.77 | 94.29 |
| w/ Synthetic In-Domain | 95.35 | 96.86 |
| w/ Parallel | 90.93 | 94.38 |
| w/ Out-of-Domain (Addition) | 90.89 | 94.52 |
| w/ Synthetic In-Domain (20%) | 92.02 | 94.91 |
| w/ Synthetic In-Domain (50%) | 94.89 | 96.12 |
| w/ Synthetic In-Domain (80%) | 95.30 | 96.79 |
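For reference, the proportion sweep in the lower rows of the table can be reproduced with a simple data-mixing step. The sketch below is only an illustration under our assumptions: the loader names (`load_out_of_domain`, `load_synthetic_in_domain`), the sampling seed, and the reading of the percentage as the fraction of the synthetic in-domain set that is kept are not specified by the table and are hypothetical, not the exact pipeline used in the experiments.

```python
import random

def build_training_set(out_of_domain, synthetic_in_domain, fraction, seed=0):
    """Mix the out-of-domain LID training data with a subsampled
    portion of the synthetic in-domain data.

    `fraction` is assumed to be the share of the synthetic in-domain
    set that is kept (e.g. 0.2 for the "20%" row); this interpretation
    is an assumption, not stated in the table.
    """
    rng = random.Random(seed)
    k = int(len(synthetic_in_domain) * fraction)
    sampled = rng.sample(synthetic_in_domain, k)
    mixed = list(out_of_domain) + sampled
    rng.shuffle(mixed)
    return mixed

# Hypothetical usage: one training set per row of the ablation.
# ood = load_out_of_domain()          # baseline LID training data
# syn = load_synthetic_in_domain()    # synthetic in-domain samples
# for frac in (0.2, 0.5, 0.8, 1.0):
#     train_set = build_training_set(ood, syn, frac)
#     # train the LID classifier and evaluate on QID-21 / KB-21
```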