Table 1: Associative recall results. T is the input length. Note that the model in row 8 (RUM, λ = 1, η = 1.0) still learns the task completely for T = 50, but it needs more than 100k training steps. Moreover, varying the activations or removing the update gate does not change the result in the last row.
| Model | Acc. (%), T = 30/50 | Params. |
| --- | --- | --- |
| GRU (ours) | 21.5/17.6 | 14k |
| GORU (ours) | 21.8/18.9 | 13k |
| EURNN (ours) | 24.5/18.5 | 4k |
| LSTM (ours) | 25.6/20.5 | 17k |
| FW-LN (Ba et al., 2016a) | 100.0/20.8 | 9k |
| WeiNet (Zhang and Zhou, 2017) | 100.0/100.0 | 22k |
| RUM, λ = 0, η = N/A (ours) | 25.0/18.5 | 13k |
| RUM, λ = 1, η = 1.0 (ours) | 100.0/83.7 | 13k |
| RUM, λ = 1, η = N/A (ours) | 100.0/100.0 | 13k |