Zipf's and Heaps' laws assess vocabulary population; Ebeling's method, Taylor's law, and long-range correlation assess long memory. Each entry reports the fitted exponent with its deviation in parentheses.

| Model | Perplexity | Zipf's Law f(r) ∝ r^{−α} | Heaps' Law v(n) ∝ n^{β} | Ebeling's Method m(l) ∝ l^{η} | Taylor's Law σ ∝ μ^{ζ} | Long-Range Correlation c(s) ∝ s^{−ξ} |
|---|---|---|---|---|---|---|
| **Original Data set** | | | | | | |
| Wikitext-2 (Preprocessed) | - | Yes | 0.75 (0.13) | 1.32 (0.10) | 0.62 (0.15) | 0.33 (0.04) |
| Wikitext-2 (Original) | - | Yes | 0.78 (0.09) | 1.33 (0.10) | 0.65 (0.11) | 0.32 (0.03) |
| **Shuffled Data set** | | | | | | |
| Wikitext-2 (1-gram) | - | Yes | 0.75 (0.16) | 1.00 (0.01) | 0.50 (0.02) | No |
| Wikitext-2 (2-gram) | - | Yes | 0.76 (0.16) | 1.00 (0.00) | 0.50 (0.01) | No |
| Wikitext-2 (5-gram) | - | Yes | 0.76 (0.16) | 1.00 (0.00) | 0.50 (0.02) | No |
| Wikitext-2 (10-gram) | - | Yes | 0.76 (0.16) | 1.00 (0.00) | 0.50 (0.02) | No |
| **N-gram Language Model** | | | | | | |
| 3-gram | 837.58 | Yes | 0.79 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| 5-gram | 534.98 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| Linear interpolation | 294.72 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| Katz backoff 3-gram | 285.14 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| Katz backoff 5-gram | 357.94 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| Kneser-Ney 3-gram | 204.15 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| Kneser-Ney 5-gram | 215.44 | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| **Simon/Pitman-Yor Process and Related Language Model** | | | | | | |
| Simon | - | Yes | 0.95 (0.15) | - | 0.50 (0.01) | 0.09 (0.03) |
| Pitman-Yor | - | Yes | 0.78 (0.09) | - | 0.50 (0.01) | No |
| HPYLM | (184.34†) | Yes | 0.78 (0.13) | 1.00 (0.00) | 0.50 (0.02) | No |
| **Neural Language Model (character-based)** | | | | | | |
| LSTM (no regularization) | (1.44‡) | Yes | 0.74 (0.17) | 1.06 (0.05) | 0.50 (0.01) | No |
| AWD-LSTM | (1.22‡) | Yes | 0.73 (0.15) | 1.27 (0.10) | 0.54 (0.04) | 0.30 (0.05) |
| **Neural Language Model (word-based)** | | | | | | |
| Simple RNN | 164.51 | Yes | 0.79 (0.12) | 1.01 (0.00) | 0.50 (0.02) | No |
| GRU | 96.22 | Yes | 0.79 (0.11) | 1.12 (0.06) | 0.52 (0.03) | 0.52 (Weak) |
| QRNN | 74.74 | Yes | 0.79 (0.11) | 1.08 (0.03) | 0.52 (0.03) | 0.57 (0.08) |
| LSTM (no regularization) | 113.18 | Yes | 0.78 (0.12) | 1.10 (0.03) | 0.52 (0.03) | 0.43 (0.15) |
| AWD-LSTM | 64.27 | Yes | 0.76 (0.13) | 1.30 (0.15) | 0.58 (0.06) | 0.05 (0.01) |
| AWD-LSTM-Simon | 61.59 | Yes | 0.77 (0.10) | 1.25 (0.15) | 0.55 (0.05) | 0.03 (0.01) |
| AWD-LSTM-MoS | 62.44 | Yes | 0.78 (0.12) | 1.16 (0.07) | 0.54 (0.04) | 0.33 (0.07) |
| AWD-LSTM-MoS-Cache | 59.21 | Yes | 0.78 (0.11) | 1.20 (0.07) | 0.57 (0.07) | 0.29 (0.05) |
| AWD-LSTM-Cache | 50.39 | Yes | 0.78 (0.11) | 1.25 (0.10) | 0.59 (0.07) | 0.14 (0.04) |
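To make the vocabulary-population exponents concrete, the following is a minimal Python sketch of how α (Zipf) and β (Heaps) can be estimated from a token sequence via least-squares fits in log-log space. The function name and fitting details are illustrative assumptions, not the exact measurement procedure behind the table.

```python
import numpy as np
from collections import Counter

def zipf_heaps_exponents(tokens):
    """Estimate Zipf's f(r) ∝ r^-alpha and Heaps' v(n) ∝ n^beta
    by linear regression in log-log space (illustrative sketch)."""
    # Zipf: slope of log-frequency against log-rank
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1, dtype=float)
    alpha = -np.polyfit(np.log(ranks), np.log(freqs), 1)[0]

    # Heaps: vocabulary size v(n) sampled at log-spaced prefix lengths n
    ns = np.unique(np.logspace(1, np.log10(len(tokens)), 50).astype(int))
    vs = [len(set(tokens[:n])) for n in ns]
    beta = np.polyfit(np.log(ns), np.log(vs), 1)[0]
    return alpha, beta
```

Usage might look like `alpha, beta = zipf_heaps_exponents(text.split())`, where `text` is the raw corpus (or a model's generated sample).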
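The Taylor exponent ζ can be sketched in the same spirit: split the sequence into fixed-length windows and, across all words, regress the log standard deviation of per-window counts against the log mean. The window length and names below are illustrative choices, not the paper's exact setup.

```python
import numpy as np
from collections import Counter

def taylor_exponent(tokens, window=1000):
    """Estimate zeta in Taylor's law sigma ∝ mu^zeta from per-window
    word counts (illustrative sketch; window length is an assumption)."""
    n_win = len(tokens) // window
    counts = [Counter(tokens[i * window:(i + 1) * window]) for i in range(n_win)]
    vocab = set().union(*counts)

    mu, sigma = [], []
    for w in vocab:
        c = np.array([ct[w] for ct in counts], dtype=float)  # missing -> 0
        mu.append(c.mean())
        sigma.append(c.std())
    mu, sigma = np.array(mu), np.array(sigma)

    mask = sigma > 0  # constant-count words have no log(sigma)
    zeta = np.polyfit(np.log(mu[mask]), np.log(sigma[mask]), 1)[0]
    return zeta
```

Under this kind of fit, an i.i.d. (shuffled) text gives ζ ≈ 0.5, consistent with the 0.50 entries for the shuffled datasets and n-gram models in the table, while burstier, long-memory text pushes ζ higher.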