Table 2
Summary of the scaling properties of the language models on WikiText-2 (WT2). †The perplexity measure for HPYLM is not equivalent to that for the n-gram and neural language models because of a difference in preprocessing. ‡Values for these models (the character-based models) are in bits per character.
Zipf's and Heaps' laws characterize the vocabulary population; Ebeling's method, Taylor's law, and long-range correlation characterize long memory.

| Model | Perplexity | Zipf's law f(r) ∝ r^−α | Heaps' law v(n) ∝ n^β | Ebeling's method m(l) ∝ l^η | Taylor's law σ ∝ μ^ζ | Long-range correlation c(s) ∝ s^−ξ |
|---|---|---|---|---|---|---|
| **Original dataset** | | | | | | |
| WikiText-2 (preprocessed) | – | Yes | 0.75 (0.13) | 1.32 (0.10) | 0.62 (0.15) | 0.33 (0.04) |
| WikiText-2 (original) | – | Yes | 0.78 (0.09) | 1.33 (0.10) | 0.65 (0.11) | 0.32 (0.03) |
| **Shuffled dataset** | | | | | | |
| WikiText-2 (1-gram) | – | Yes | 0.75 (0.16) | 1.00 (0.01) | 0.50 (0.02) | No |
| WikiText-2 (2-gram) | – | Yes | 0.76 (0.16) | 1.00 (0.00) | 0.50 (0.01) | No |
| WikiText-2 (5-gram) | – | Yes | 0.76 (0.16) | 1.00 (0.00) | 0.50 (0.02) | No |
| WikiText-2 (10-gram) | – | Yes | 0.76 (0.16) | 1.00 (0.00) | 0.50 (0.02) | No |
| **N-gram language model** | | | | | | |
| 3-gram | 837.58 | Yes | 0.79 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| 5-gram | 534.98 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| Linear interpolation | 294.72 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| Katz backoff 3-gram | 285.14 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| Katz backoff 5-gram | 357.94 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| Kneser-Ney 3-gram | 204.15 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| Kneser-Ney 5-gram | 215.44 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| **Simon/Pitman-Yor process and related language model** | | | | | | |
| Simon | – | Yes | 0.95 (0.15) | – | 0.50 (0.01) | 0.09 (0.03) |
| Pitman-Yor | – | Yes | 0.78 (0.09) | – | 0.50 (0.01) | No |
| HPYLM | 184.34† | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| **Neural language model (character-based)** | | | | | | |
| LSTM (no regularization) | 1.44‡ | Yes | 0.74 (0.17) | 1.06 (0.05) | 0.50 (0.01) | No |
| AWD-LSTM | 1.22‡ | Yes | 0.73 (0.15) | 1.27 (0.10) | 0.54 (0.04) | 0.30 (0.05) |
| **Neural language model (word-based)** | | | | | | |
| Simple RNN | 164.51 | Yes | 0.79 (0.12) | 1.01 (0.00) | 0.50 (0.02) | No |
| GRU | 96.22 | Yes | 0.79 (0.11) | 1.12 (0.06) | 0.52 (0.03) | 0.52 (weak) |
| QRNN | 74.74 | Yes | 0.79 (0.11) | 1.08 (0.03) | 0.52 (0.03) | 0.57 (0.08) |
| LSTM (no regularization) | 113.18 | Yes | 0.78 (0.12) | 1.10 (0.03) | 0.52 (0.03) | 0.43 (0.15) |
| AWD-LSTM | 64.27 | Yes | 0.76 (0.13) | 1.30 (0.15) | 0.58 (0.06) | 0.05 (0.01) |
| AWD-LSTM-Simon | 61.59 | Yes | 0.77 (0.10) | 1.25 (0.15) | 0.55 (0.05) | 0.03 (0.01) |
| AWD-LSTM-MoS | 62.44 | Yes | 0.78 (0.12) | 1.16 (0.07) | 0.54 (0.04) | 0.33 (0.07) |
| AWD-LSTM-MoS-Cache | 59.21 | Yes | 0.78 (0.11) | 1.20 (0.07) | 0.57 (0.07) | 0.29 (0.05) |
| AWD-LSTM-Cache | 50.39 | Yes | 0.78 (0.11) | 1.25 (0.10) | 0.59 (0.07) | 0.14 (0.04) |
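Each property column in the table is a power law, so its exponent can be estimated as the slope of a least-squares fit in log-log space. The sketch below illustrates this for the Zipf, Heaps, and Taylor exponents. It is a minimal illustration under stated assumptions, not the paper's implementation: the window size, the number of sample points, and the input path are placeholders, and Ebeling's method and the long-range correlation analysis are omitted for brevity.

```python
# Minimal sketch (not the authors' implementation) of estimating the
# power-law exponents in Table 2 from a whitespace-tokenized corpus.
# The window size, sample counts, and file path below are assumptions.
import numpy as np
from collections import Counter

def loglog_slope(x, y):
    """Least-squares slope of log(y) against log(x): the scaling exponent."""
    return np.polyfit(np.log(x), np.log(y), 1)[0]

def zipf_alpha(tokens):
    """Zipf's law f(r) ∝ r^-α: negated slope of the rank-frequency plot."""
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    return -loglog_slope(ranks, freqs)

def heaps_beta(tokens, n_points=50):
    """Heaps' law v(n) ∝ n^β: vocabulary size as a function of text length."""
    ns = np.unique(np.logspace(2, np.log10(len(tokens)), n_points).astype(int))
    seen, sizes, j = set(), [], 0
    for i, tok in enumerate(tokens, 1):
        seen.add(tok)
        if j < len(ns) and i == ns[j]:      # record vocab size at checkpoints
            sizes.append(len(seen))
            j += 1
    return loglog_slope(ns, sizes)

def taylor_zeta(tokens, window=1000):
    """Taylor's law σ ∝ μ^ζ: per-word std vs. mean of counts across windows."""
    n_win = len(tokens) // window
    counts = [Counter(tokens[i * window:(i + 1) * window]) for i in range(n_win)]
    mu, sigma = [], []
    for w in set().union(*counts):
        c = np.array([ct[w] for ct in counts], dtype=float)
        if c.std() > 0:                     # log(0) is undefined; skip flat series
            mu.append(c.mean())
            sigma.append(c.std())
    return loglog_slope(mu, sigma)

if __name__ == "__main__":
    # "wiki.train.tokens" is a placeholder path for a tokenized corpus.
    tokens = open("wiki.train.tokens").read().split()
    print(f"Zipf alpha : {zipf_alpha(tokens):.2f}")
    print(f"Heaps beta : {heaps_beta(tokens):.2f}")
    print(f"Taylor zeta: {taylor_zeta(tokens):.2f}")
```

As a sanity check, shuffled text should yield ζ ≈ 0.50, matching the shuffled-dataset rows above, whereas natural text yields ζ noticeably above 0.50. Note that this sketch fits all points of each plot; a more careful analysis would restrict the fitting range (for example, excluding the hapax tail of the rank-frequency plot).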