Data set / model | Perplexity | Taylor exponent | Perplexity from eval-AWD-LSTM |
---|---|---|---|
Original Data Set | |||
Penn Tree Bank (Preprocessed) | - | 0.56 (0.14) | 40.70 |
Shuffled Data Set | |||
Penn Tree Bank (1-gram) | - | 0.50 (0.02) | 3,698.52 |
Penn Tree Bank (2-gram) | - | 0.50 (0.02) | 1,328.39 |
Penn Tree Bank (5-gram) | - | 0.50 (0.02) | 351.22 |
Penn Tree Bank (10-gram) | - | 0.50 (0.02) | 166.93 |
N-gram Language Model | |||
3-gram | 367.79 | 0.50 (0.02) | 1,697.99 |
5-gram | 561.65 | 0.50 (0.02) | 3,463.88 |
linear interpolation | 238.59 | 0.50 (0.02) | 965.58 |
Katz backoff 3-gram | 195.65 | 0.50 (0.02) | 420.48 |
Katz backoff 5-gram | 250.18 | 0.50 (0.02) | 471.03 |
Kneser-Ney 3-gram | 150.64 | 0.50 (0.02) | 1,324.67 |
Kneser-Ney 5-gram | 156.70 | 0.50 (0.02) | 1,411.14 |
HPYLM | 140.49 | 0.50 (0.02) | 412.13 |
Neural Language Model | |||
Simple RNN | 123.96 | 0.50 (0.02) | 321.31 |
GRU | 85.05 | 0.50 (0.02) | 258.12 |
QRNN | 62.65 | 0.51 (0.02) | 113.22 |
LSTM (no regularization) | 113.18 | 0.51 (0.02) | 234.05 |
AWD-LSTM | 64.27 | 0.51 (0.03) | 90.01 |
AWD-LSTM-Simon | 61.59 | 0.51 (0.03) | 144.45 |
AWD-LSTM-MoS | 62.44 | 0.52 (0.04) | 97.73 |
AWD-LSTM-MoS-Cache | 59.21 | 0.55 (0.06) | 100.56 |
AWD-LSTM-Cache | 50.39 | 0.53 (0.05) | 123.32 |