Table 3
Summary of the scaling properties of the language models with the PTB. Zipf's law and Heaps' law characterize the vocabulary population; Ebeling's method, Taylor's law, and long-range correlation characterize long memory. † The perplexity measure for HPYLM is not equivalent to that for the n-gram and neural language models because of a preprocessing difference. ‡ The values for these models are in bits per character.

| Model | Perplexity | Zipf's Law f(r) ∝ r^−α | Heaps' Law v(n) ∝ n^β | Ebeling's Method m(l) ∝ l^η | Taylor's Law σ ∝ μ^ζ | Long-Range Correlation c(s) ∝ s^−ξ |
|---|---|---|---|---|---|---|
| **Original Data Set** | | | | | | |
| Penn Treebank (Preprocessed) | — | Yes | 0.70 (0.16) | 1.23 (0.06) | 0.56 (0.14) | 0.81 (0.24) |
| Penn Treebank (Original) | — | Yes | 0.83 (0.07) | 1.20 (0.05) | 0.57 (0.06) | 0.60 (0.16) |
| **Shuffled Data Set** | | | | | | |
| Penn Treebank (1-gram) | — | Yes | 0.72 (0.18) | 1.00 (0.00) | 0.50 (0.02) | No |
| Penn Treebank (2-gram) | — | Yes | 0.72 (0.18) | 1.00 (0.00) | 0.50 (0.02) | No |
| Penn Treebank (5-gram) | — | Yes | 0.72 (0.18) | 1.00 (0.00) | 0.50 (0.02) | No |
| Penn Treebank (10-gram) | — | Yes | 0.72 (0.18) | 1.00 (0.01) | 0.50 (0.02) | No |
| **N-gram Language Model** | | | | | | |
| 3-gram | 367.79 | Yes | 0.71 (0.19) | 0.99 (0.01) | 0.50 (0.02) | No |
| 5-gram | 561.65 | Yes | 0.72 (0.21) | 1.00 (0.00) | 0.50 (0.02) | No |
| Linear interpolation | 238.59 | Yes | 0.71 (0.20) | 1.00 (0.00) | 0.50 (0.02) | No |
| Katz backoff 3-gram | 195.65 | Yes | 0.71 (0.19) | 1.00 (0.00) | 0.50 (0.02) | No |
| Katz backoff 5-gram | 250.18 | Yes | 0.71 (0.19) | 1.00 (0.00) | 0.50 (0.02) | No |
| Kneser-Ney 3-gram | 150.64 | Yes | 0.72 (0.21) | 1.00 (0.00) | 0.50 (0.02) | No |
| Kneser-Ney 5-gram | 156.70 | Yes | 0.71 (0.20) | 1.00 (0.00) | 0.50 (0.02) | No |
| **Simon/Pitman-Yor Process and Related Language Model** | | | | | | |
| HPYLM | 140.49† | Yes | 0.73 (0.21) | 1.00 (0.00) | 0.50 (0.02) | No |
| **Grammatical Model** | | | | | | |
| PCFG | — | Yes | 0.73 (0.19) | 1.00 (0.00) | 0.50 (0.02) | No |
| **Neural Language Model (character based)** | | | | | | |
| LSTM (no regularization) | 1.38‡ | Yes | 0.79 (0.08) | 1.03 (0.01) | 0.50 (0.01) | No |
| AWD-LSTM | 1.18‡ | Yes | 0.76 (0.12) | 1.10 (0.03) | 0.51 (0.02) | 0.40 (0.10) |
| **Neural Language Model (word based)** | | | | | | |
| Simple RNN | 123.96 | Yes | 0.71 (0.19) | 1.00 (0.01) | 0.50 (0.02) | 0.74 (Weak) |
| GRU | 85.05 | Yes | 0.71 (0.18) | 1.05 (0.02) | 0.50 (0.02) | 0.40 (Weak) |
| QRNN | 62.65 | Yes | 0.71 (0.18) | 1.10 (0.03) | 0.51 (0.02) | 0.54 (Weak) |
| LSTM (no regularization) | 111.79 | Yes | 0.71 (0.19) | 1.04 (0.01) | 0.51 (0.02) | 0.84 (Weak) |
| AWD-LSTM | 56.40 | Yes | 0.71 (0.18) | 1.06 (0.02) | 0.51 (0.03) | 0.69 (Weak) |
| AWD-LSTM-Simon | 57.85 | Yes | 0.72 (0.16) | 1.04 (0.01) | 0.51 (0.03) | No |
| AWD-LSTM-MoS | 54.77 | Yes | 0.71 (0.18) | 1.10 (0.03) | 0.52 (0.04) | 0.77 (Weak) |
| AWD-LSTM-MoS-Cache | 54.03 | Yes | 0.71 (0.18) | 1.13 (0.04) | 0.55 (0.06) | 0.61 (Weak) |
| AWD-LSTM-Cache | 52.51 | Yes | 0.72 (0.17) | 1.07 (0.02) | 0.53 (0.05) | 0.57 (Weak) |
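To make the exponents in the table concrete, here is a minimal illustrative sketch (not the paper's estimation code) of how the Heaps and Taylor exponents can be fitted by ordinary least squares in log-log coordinates. The token stream is synthetic i.i.d. Zipf-like data, a hypothetical stand-in for a shuffled corpus, so the Taylor exponent should come out near 0.5, consistent with the "Shuffled Data Set" rows above.

```python
# Illustrative sketch only: fitting Heaps' law v(n) ∝ n^β and
# Taylor's law σ ∝ μ^ζ with log-log least squares on synthetic tokens.
import math
import random
from collections import Counter

def slope_loglog(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def heaps_exponent(tokens, num_points=20):
    """Heaps' law: vocabulary size v(n) after the first n tokens."""
    checkpoints = {len(tokens) * (i + 1) // num_points for i in range(num_points)}
    ns, vs, seen = [], [], set()
    for i, tok in enumerate(tokens, 1):
        seen.add(tok)
        if i in checkpoints:
            ns.append(i)
            vs.append(len(seen))
    return slope_loglog(ns, vs)

def taylor_exponent(tokens, window=100):
    """Taylor's law: mean vs. std. dev. of each word's count per window."""
    counts = [Counter(tokens[i:i + window])
              for i in range(0, len(tokens) - window + 1, window)]
    mus, sigmas = [], []
    for w in set(tokens):
        cs = [c[w] for c in counts]   # Counter returns 0 for absent words
        mu = sum(cs) / len(cs)
        var = sum((c - mu) ** 2 for c in cs) / len(cs)
        if mu > 0 and var > 0:
            mus.append(mu)
            sigmas.append(math.sqrt(var))
    return slope_loglog(mus, sigmas)

# Synthetic Zipf-like i.i.d. tokens (hypothetical stand-in for shuffled text).
random.seed(0)
tokens = [f"w{int(random.paretovariate(1.0))}" for _ in range(20000)]
print(f"Heaps beta  = {heaps_exponent(tokens):.2f}")
print(f"Taylor zeta = {taylor_exponent(tokens):.2f}")  # near 0.5 for i.i.d. text
```

Because i.i.d. word counts in fixed windows are approximately binomial, their standard deviation scales as the square root of the mean, which is exactly the ζ ≈ 0.50 that all shuffled and n-gram rows of the table report; only models with genuine long memory exceed it.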