Table 3: Perplexity per word of language models on the NIST dev set. GW refers to Gigaword.
Architecture     Data             PPL
transformer-XL   NIST sent        83.3
transformer-XL   NIST + GW sent   96.5

LSTM             NIST doc         71.6
transformer-XL   NIST doc         43.8
transformer-XL   NIST + GW doc    43.4