Table 6: 
Quantitative analysis of the generated data. “%” indicates the number of occurrences per 100 tokens. For English, we compute the statistics on the en-fr data. For the MTNT test sets, the statistics are computed on the source side. RL1-L2S has been generated by the alteration of PL1-L2 by our approach #1.
Dataset  English                                                  French         Japanese
         Profanities %  Slang %  Contractions %  -ise/-ize Ratio  Profanities %  Profanities %  Formal/Informal Pronouns Ratio
MTNT     0.27           0.21     1.90            40.00/60.00      0.90           0.01           68.75/31.25
PL1-L2   0.01           0.00     0.03            92.00/8.00       0.45           0.00           96.88/3.12
PL1-L2S  0.06           0.04     0.21            41.03/58.97      0.57           0.01           83.01/16.99