Table 3: Character-level language modeling results: bits per character (BPC) on the Penn Treebank (PTB) test split; lower is better. Using tanh is slightly better than ReLU (lines 2–3). Removing the update gate (line 1) is worse than keeping it (line 2). Phase-inspired regularization may improve lines 1–3, 6–8, 9–10, and 16.
| # | Model | BPC | Params. |
|---|-------|-----|---------|
| 1 | RUM 1400 w/o upd. gate (ours) | 1.326 | 2.4M |
| 2 | RUM 1000 (ours) | 1.302 | 2.4M |
| 3 | RUM 1000 w/ tanh (ours) | 1.299 | 2.4M |
| 4 | LSTM (Krueger et al., 2017) | 1.270 | – |
| 5 | LSTM 1000 (ours) | 1.240 | 4.5M |
| 6 | RUM 1400 (ours) | 1.284 | 4.5M |
| 7 | RUM 2000 (ours) | 1.280 | 8.9M |
| 8 | 2 × RUM 1500 (ours) | 1.260 | 16.4M |
| 9 | FS-EURNN-2’ (ours) | 1.662 | 14.3M |
| 10 | FS-GORU-2’ (ours) | 1.559 | 17.0M |
| 11 | HM-LSTM (Chung et al., 2017) | 1.240 | – |
| 12 | HyperLSTM (Ha et al., 2016) | 1.219 | 14.4M |
| 13 | NASCell (Zoph and V. Le, 2017) | 1.214 | 16.3M |
| 14 | FS-LSTM-4 (Mujika et al., 2017) | 1.193 | 6.5M |
| 15 | FS-LSTM-2 (Mujika et al., 2017) | 1.190 | 7.2M |
| 16 | FS-RUM-2 (ours) | 1.189 | 11.2M |
| 17 | 6lyr-QRNN (Merity et al., 2018) | 1.187 | 13.8M |
| 18 | 3lyr-LSTM (Merity et al., 2018) | 1.175 | 13.8M |
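For reference, BPC is the model's average per-character cross-entropy expressed in base 2; frameworks typically report the loss in nats, so the conversion divides by ln 2. Below is a minimal sketch of that conversion (the function name and the toy numbers are illustrative, not from the paper):

```python
import math

def bits_per_character(total_nll_nats: float, num_chars: int) -> float:
    """Convert a summed negative log-likelihood (in nats, the usual
    framework output) into bits per character (BPC)."""
    return total_nll_nats / (num_chars * math.log(2))

# Illustrative numbers only: a model with an average cross-entropy of
# 0.824 nats per character scores ~1.189 BPC, on the scale of Table 3.
print(bits_per_character(total_nll_nats=0.824 * 1_000_000,
                         num_chars=1_000_000))  # ≈ 1.189
```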