Table 7: Tokenized UNMT BLEU scores on IWSLT’15 English-Vietnamese (tst2013) with XLM initialization. We mined 300k pseudo-parallel (PP) sentence pairs from the English and Vietnamese Wikipedias (Oct. 2019). We pre-trained two XLM models: one whose pre-training corpus includes the PP pairs and one whose corpus excludes them. We compare their downstream UNMT performance with and without the PP pairs used as “bitext” during UNMT training.
Pre-training corpus | w/o PP as bitext | w/ PP as bitext
XLM excl. PP text | 23.2 | 28.9
XLM incl. PP text | 23.1 | 28.3