The Zipf's Law and Heaps' Law columns assess *Vocabulary Population*; the Ebeling's Method, Taylor's Law, and Long-Range Correlation columns assess *Long Memory*.

| Model | Perplexity | Zipf's Law f(r) ∝ r^−α | Heaps' Law v(n) ∝ n^β | Ebeling's Method m(l) ∝ l^η | Taylor's Law σ ∝ μ^ζ | Long-Range Correlation c(s) ∝ s^−ξ |
|---|---|---|---|---|---|---|
| **Original Data Set** | | | | | | |
| Penn Treebank (Preprocessed) | - | Yes | 0.70 (0.16) | 1.23 (0.06) | 0.56 (0.14) | 0.81 (0.24) |
| Penn Treebank (Original) | - | Yes | 0.83 (0.07) | 1.20 (0.05) | 0.57 (0.06) | 0.60 (0.16) |
| **Shuffled Data Set** | | | | | | |
| Penn Treebank (1-gram) | - | Yes | 0.72 (0.18) | 1.00 (0.00) | 0.50 (0.02) | No |
| Penn Treebank (2-gram) | - | Yes | 0.72 (0.18) | 1.00 (0.00) | 0.50 (0.02) | No |
| Penn Treebank (5-gram) | - | Yes | 0.72 (0.18) | 1.00 (0.00) | 0.50 (0.02) | No |
| Penn Treebank (10-gram) | - | Yes | 0.72 (0.18) | 1.00 (0.01) | 0.50 (0.02) | No |
| **N-gram Language Model** | | | | | | |
| 3-gram | 367.79 | Yes | 0.71 (0.19) | 0.99 (0.01) | 0.50 (0.02) | No |
| 5-gram | 561.65 | Yes | 0.72 (0.21) | 1.00 (0.00) | 0.50 (0.02) | No |
| Linear interpolation | 238.59 | Yes | 0.71 (0.20) | 1.00 (0.00) | 0.50 (0.02) | No |
| Katz backoff 3-gram | 195.65 | Yes | 0.71 (0.19) | 1.00 (0.00) | 0.50 (0.02) | No |
| Katz backoff 5-gram | 250.18 | Yes | 0.71 (0.19) | 1.00 (0.00) | 0.50 (0.02) | No |
| Kneser-Ney 3-gram | 150.64 | Yes | 0.72 (0.21) | 1.00 (0.00) | 0.50 (0.02) | No |
| Kneser-Ney 5-gram | 156.70 | Yes | 0.71 (0.20) | 1.00 (0.00) | 0.50 (0.02) | No |
| **Simon/Pitman-Yor Process and Related Language Model** | | | | | | |
| HPYLM | (140.49†) | Yes | 0.73 (0.21) | 1.00 (0.00) | 0.50 (0.02) | No |
| **Grammatical Model** | | | | | | |
| PCFG | - | Yes | 0.73 (0.19) | 1.00 (0.00) | 0.50 (0.02) | No |
| **Neural Language Model (character based)** | | | | | | |
| LSTM (no regularization) | (1.38‡) | Yes | 0.79 (0.08) | 1.03 (0.01) | 0.50 (0.01) | No |
| AWD-LSTM | (1.18‡) | Yes | 0.76 (0.12) | 1.10 (0.03) | 0.51 (0.02) | 0.40 (0.10) |
| **Neural Language Model (word based)** | | | | | | |
| Simple RNN | 123.96 | Yes | 0.71 (0.19) | 1.00 (0.01) | 0.50 (0.02) | 0.74 (Weak) |
| GRU | 85.05 | Yes | 0.71 (0.18) | 1.05 (0.02) | 0.50 (0.02) | 0.40 (Weak) |
| QRNN | 62.65 | Yes | 0.71 (0.18) | 1.10 (0.03) | 0.51 (0.02) | 0.54 (Weak) |
| LSTM (no regularization) | 111.79 | Yes | 0.71 (0.19) | 1.04 (0.01) | 0.51 (0.02) | 0.84 (Weak) |
| AWD-LSTM | 56.40 | Yes | 0.71 (0.18) | 1.06 (0.02) | 0.51 (0.03) | 0.69 (Weak) |
| AWD-LSTM-Simon | 57.85 | Yes | 0.72 (0.16) | 1.04 (0.01) | 0.51 (0.03) | No |
| AWD-LSTM-MoS | 54.77 | Yes | 0.71 (0.18) | 1.10 (0.03) | 0.52 (0.04) | 0.77 (Weak) |
| AWD-LSTM-MoS-Cache | 54.03 | Yes | 0.71 (0.18) | 1.13 (0.04) | 0.55 (0.06) | 0.61 (Weak) |
| AWD-LSTM-Cache | 52.51 | Yes | 0.72 (0.17) | 1.07 (0.02) | 0.53 (0.05) | 0.57 (Weak) |
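As an illustration of how the vocabulary-population exponents in the table can be measured, the sketch below estimates the Zipf exponent α (from f(r) ∝ r^−α) and the Heaps exponent β (from v(n) ∝ n^β) for a token sequence via least-squares regression in log-log space. This is a minimal sketch under simple assumptions (plain log-log fits over the full range, evenly spaced sample points), not the exact estimation procedure used for the table.

```python
import numpy as np
from collections import Counter

def zipf_exponent(tokens):
    """Estimate alpha in f(r) ∝ r^-alpha: slope of the
    log-log rank-frequency curve, fitted by least squares."""
    freqs = np.sort(np.array(list(Counter(tokens).values()), dtype=float))[::-1]
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope  # slope is negative; alpha is its magnitude

def heaps_exponent(tokens, n_points=20):
    """Estimate beta in v(n) ∝ n^beta: vocabulary size v(n)
    measured at n_points prefix lengths, fitted on log-log axes."""
    ns = np.unique(np.linspace(1, len(tokens), n_points).astype(int))
    vs = [len(set(tokens[: n])) for n in ns]
    slope, _ = np.polyfit(np.log(ns), np.log(vs), 1)
    return slope
```

A β near 1.00, as reported for the shuffled data sets and most non-neural models, means vocabulary grows almost linearly with text length over the measured range, whereas the original Penn Treebank and the stronger AWD-LSTM variants show clearly sublinear growth.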