Table 3: Perplexity per word of language models on NIST dev set. GW refers to Gigaword.

| Architecture | Data | PPL |
| --- | --- | --- |
| transformer-XL | NIST sent | 83.3 |
| transformer-XL | NIST + GW sent | 96.5 |
| LSTM | NIST doc | 71.6 |
| transformer-XL | NIST doc | 43.8 |
| transformer-XL | NIST + GW doc | 43.4 |
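For readers unfamiliar with the metric, the following is a minimal sketch of how a per-word perplexity such as those in Table 3 is commonly computed: the total negative log-likelihood of the evaluation text is divided by the number of words, and the result is exponentiated. The function name `perplexity_per_word` and its arguments are hypothetical and chosen for illustration only; the source does not describe its evaluation code, and whether its models score subword tokens is an assumption here.

```python
import math


def perplexity_per_word(token_log_probs, num_words):
    """Per-word perplexity from natural-log token probabilities.

    token_log_probs: log p(token | history) for every token in the text,
        as scored by the language model (hypothetical input format).
    num_words: number of words in the text. If the model scores subword
        tokens, this is smaller than len(token_log_probs).
    """
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    # Total negative log-likelihood of the text under the model.
    total_nll = -sum(token_log_probs)
    # Normalize by words (not tokens), then exponentiate.
    return math.exp(total_nll / num_words)
```

Normalizing by the word count rather than the token count keeps the figure comparable across models that segment text differently, which matters when comparing architectures as the table does.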