Data set or model | Perplexity | Taylor exponent | Perplexity from eval-AWD-LSTM |
---|---|---|---|
Original Data Set | | | |
Wikitext-2 (Preprocessed) | - | 0.62 (0.15) | 33.81 |
Shuffled Data Set | | | |
Wikitext-2 (1-gram) | - | 0.50 (0.02) | 7,389.15 |
Wikitext-2 (2-gram) | - | 0.50 (0.02) | 2,405.15 |
Wikitext-2 (5-gram) | - | 0.50 (0.02) | 559.92 |
Wikitext-2 (10-gram) | - | 0.50 (0.02) | 236.49 |
N-gram Language Model | | | |
3-gram | 837.58 | 0.50 (0.02) | 3,730.74 |
5-gram | 534.98 | 0.50 (0.02) | 7,532.91 |
linear interpolation | 294.72 | 0.50 (0.02) | 1,371.75 |
Katz backoff 3-gram | 285.14 | 0.50 (0.02) | 663.74 |
Katz backoff 5-gram | 357.94 | 0.50 (0.02) | 664.25 |
Kneser-Ney 3-gram | 204.15 | 0.50 (0.02) | 2,562.24 |
Kneser-Ney 5-gram | 215.44 | 0.50 (0.02) | 2,743.65 |
HPYLM | 184.34 | 0.50 (0.02) | 884.76 |
Neural Language Model | | | |
Simple RNN | 164.51 | 0.50 (0.02) | 645.64 |
GRU | 96.22 | 0.52 (0.03) | 266.33 |
QRNN | 74.74 | 0.52 (0.03) | 135.68 |
LSTM (no regularization) | 113.18 | 0.52 (0.03) | 177.12 |
AWD-LSTM | 64.27 | 0.58 (0.06) | 88.73 |
AWD-LSTM-Simon | 61.59 | 0.55 (0.05) | 130.52 |
AWD-LSTM-MoS | 62.44 | 0.54 (0.04) | 97.89 |
AWD-LSTM-MoS-Cache | 59.21 | 0.57 (0.07) | 164.39 |
AWD-LSTM-Cache | 50.39 | 0.59 (0.07) | 109.02 |