Table 1:

| Task | Training set size | Test cross-entropy ($\times 10^{-2}$): MDLRNN | Test cross-entropy ($\times 10^{-2}$): RNN | Test cross-entropy ($\times 10^{-2}$): Optimal | Test accuracy (%): MDLRNN | Test accuracy (%): RNN | Best RNN: type | Best RNN: size | MDLRNN proof |
|---|---|---|---|---|---|---|---|---|---|
| $a^{n}b^{n}$ | 100 | 29.4 | 53.2 | 25.8 | 100.0 | 99.8 | Elman | 2 | Th. 4.1 |
| $a^{n}b^{n}$ | 500 | 25.8 | 51.0 | 25.8 | 100.0 | 99.8 | Elman | 2 | |
| $a^{n}b^{n}c^{n}$ | 100 | 49.3 | 62.6 | 17.2 | 96.5 | 99.8 | Elman | 4 | Th. 4.2 |
| $a^{n}b^{n}c^{n}$ | 500 | 17.2 | 55.4 | 17.2 | 100.0 | 99.8 | Elman | 4 | |
| $a^{n}b^{n}c^{n}d^{n}$ | 100 | 65.3 | 68.1 | 12.9 | 68.6 | 99.8 | GRU | 4 | |
| $a^{n}b^{n}c^{n}d^{n}$ | 500 | 13.5 | 63.6 | 12.9 | 99.9 | 99.8 | GRU | 4 | |
| $a^{n}b^{2n}$ | 100 | 17.2 | 38.0 | 17.2 | 100.0 | 99.9 | Elman | 4 | Th. 4.3 |
| $a^{n}b^{2n}$ | 500 | 17.2 | 34.7 | 17.2 | 100.0 | 99.9 | GRU | 4 | |
| $a^{n}b^{m}c^{n+m}$ | 100 | 39.8 | 47.6 | 26.9 | 98.9 | 98.9 | Elman + L1 | 128 | Th. 4.4 |
| $a^{n}b^{m}c^{n+m}$ | 500 | 26.8 | 45.1 | 26.9 | 100.0 | 98.9 | Elman | 128 | |
| Dyck-1 | 100 | 110.7 | 94.5 | 88.2 | 69.9 | 10.9 | Elman | 4 | Th. 4.5 |
| Dyck-1 | 500 | 88.7 | 93.0 | 88.2 | 100.0 | 10.8 | LSTM | 4 | |
| Dyck-2 | 20,000 | 1.19 | 1.19 | 1.18 | 99.3 | 89.0 | GRU | 128 | |
| Addition | 100 | 0.0 | 75.8 | 0.0 | 100.0 | 74.9 | Elman | 4 | Th. 4.6 |
| Addition | 400 | 0.0 | 72.1 | 0.0 | 100.0 | 79.4 | Elman | 4 | |

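As a reading aid, here is a minimal Python sketch that re-enters the test cross-entropy columns of Table 1 and computes how far each model's value sits above the optimal one. The dictionary name `CROSS_ENTROPY` and the function `excess_over_optimal` are illustrative and are not taken from the authors' code. In the $a^{n}b^{m}c^{n+m}$ row at 500 training examples, the MDLRNN value (26.8) is slightly below the listed optimum (26.9). The sketch reports that case as a negative gap and does not correct it.

```python
# Test cross-entropy values (x 10^-2) transcribed from Table 1.
# Key: (task, training set size) -> (MDLRNN, best RNN, optimal).
# Names here are illustrative, not from the paper's implementation.
CROSS_ENTROPY = {
    ("a^n b^n", 100): (29.4, 53.2, 25.8),
    ("a^n b^n", 500): (25.8, 51.0, 25.8),
    ("a^n b^n c^n", 100): (49.3, 62.6, 17.2),
    ("a^n b^n c^n", 500): (17.2, 55.4, 17.2),
    ("a^n b^n c^n d^n", 100): (65.3, 68.1, 12.9),
    ("a^n b^n c^n d^n", 500): (13.5, 63.6, 12.9),
    ("a^n b^2n", 100): (17.2, 38.0, 17.2),
    ("a^n b^2n", 500): (17.2, 34.7, 17.2),
    ("a^n b^m c^(n+m)", 100): (39.8, 47.6, 26.9),
    ("a^n b^m c^(n+m)", 500): (26.8, 45.1, 26.9),
    ("Dyck-1", 100): (110.7, 94.5, 88.2),
    ("Dyck-1", 500): (88.7, 93.0, 88.2),
    ("Dyck-2", 20000): (1.19, 1.19, 1.18),
    ("Addition", 100): (0.0, 75.8, 0.0),
    ("Addition", 400): (0.0, 72.1, 0.0),
}


def excess_over_optimal(table):
    """Return {(task, size): (mdlrnn_gap, rnn_gap)}.

    Each gap is the model's test cross-entropy minus the optimal value,
    rounded to two decimals to hide floating-point noise. A gap can be
    negative if a table entry lies below the listed optimum.
    """
    gaps = {}
    for key, (mdlrnn, rnn, optimal) in table.items():
        gaps[key] = (round(mdlrnn - optimal, 2), round(rnn - optimal, 2))
    return gaps


if __name__ == "__main__":
    gaps = excess_over_optimal(CROSS_ENTROPY)
    # Sort rows by the MDLRNN gap, smallest first.
    for (task, size), (mdl_gap, rnn_gap) in sorted(gaps.items(), key=lambda kv: kv[1][0]):
        print(f"{task:<18} n={size:>6}  MDLRNN {mdl_gap:+7.2f}  RNN {rnn_gap:+7.2f}")
```

Subtracting the optimum, and not taking a ratio, keeps the Addition rows meaningful, because their optimal cross-entropy is 0.0.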
