Tokenized UNMT BLEU scores on IWSLT’15 English–Vietnamese (tst2013) with XLM initialization. We mined 300k pseudo-parallel (PP) sentence pairs from English and Vietnamese Wikipedia (October 2019). We pre-trained two XLM models, one on a corpus that includes the PP pairs and one on a corpus that excludes them. We compare their downstream UNMT performance with and without the PP pairs used as “bitext” during UNMT training.