We present a model architecture (see Figure 1A) that achieves good performance in all four memory tasks. The parameter values and other network specifics were kept the same in all simulations; an overview of these parameters is given in Table 1. We first describe the details of these computations and then discuss our simulations. Code used to implement the architecture and run the simulations is available at https://osf.io/jrkdq/?view_only=e2251230b9bf415a9da837ecba3a7d64.
Figure 1:

(A) The network architecture used in all simulations: a standard multilayer network, complemented by a gated store composed of two independent memory blocks. The input layer and memory store both project to the hidden layer, which in turn projects to two output modules. There, activity encodes Q-values that drive action selection. (B) Memory unit within a block, with a closed gate: the memory content is maintained via self-recurrent connections. Additionally, a match value is computed between sensory and memory information by comparing a projection of the sensory information ($m_i'$) to the memory content ($m_i$). The comparison is performed by two units that respond to positive and negative disparities between the two values. Their output is summed across memory units, yielding one match value for each block. The closed gate inhibits the connection $m_i' → m_i$ so that the original memory is maintained. Only when a gating action is selected is the recurrent projection inhibited and the connection $m_i' → m_i$ opened, so that the memory content is updated. Figure 2 illustrates network activity in a task context, and Table 1 lists the number of units in each layer.
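The maintenance, gating, and match computations described for panel B can be sketched in code. The following is a minimal illustration, not the authors' implementation: class and variable names are ours, the sensory projection weights are random placeholders, and the two disparity units are modeled as rectified positive and negative differences summed across memory units.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class MemoryBlock:
    """Hypothetical sketch of one memory block from Figure 1B."""

    def __init__(self, n_units, n_inputs, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        # Projection of sensory input onto the memory units (yields m_i').
        self.W_in = rng.normal(scale=0.1, size=(n_units, n_inputs))
        # Memory content m_i, maintained by self-recurrence while the gate is closed.
        self.m = np.zeros(n_units)

    def step(self, x, gate_open):
        m_proj = self.W_in @ x  # m_i': projection of the current sensory input
        # Two units per comparison respond to positive and negative disparities
        # between m_i' and m_i; their output is summed across memory units,
        # yielding a single match value for the block.
        match = np.sum(relu(m_proj - self.m) + relu(self.m - m_proj))
        if gate_open:
            # Gating action selected: recurrence is inhibited and m_i' -> m_i opens.
            self.m = m_proj
        # Gate closed: self-recurrent connections leave self.m unchanged.
        return match
```

With this formulation, storing an input and then presenting the same input again yields a match value of zero, since $m_i'$ and $m_i$ coincide.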

Table 1:
Parameters Used in All Simulations, Unless Stated Otherwise.
| Symbol | Name | Value |
| --- | --- | --- |
| $β$ | Learning rate | 0.15 |
| $γ$ | Temporal discounting factor | 0.9 |
| $λ$ | Eligibility trace decay rate | 0.8 |
| $ε$ | Exploration rate | 0.025 |

| Symbol | Name | Size |
| --- | --- | --- |
| $x$ | Total input units | 17 |
| $x_s$ | Sensory input units | 7 |
| $x_τ$ | Time input units | 10 |
| $h$ | Hidden units | 15 |
| $S$ | Memory store (blocks) | 2 |
| $m_i$ | Units per memory block | 14 |
| $q_{int}$ | Output q-units (internal actions) | |
| $q_{ext}$ | Output q-units (external actions) | 2 or 3 (task dependent) |
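The learning parameters in Table 1 are standard ingredients of temporal-difference learning with eligibility traces and ε-greedy action selection over Q-values. As a sketch of how they interact, here is a minimal tabular SARSA(λ)-style loop: the parameter values come from Table 1, but the tabular setting and the toy chain environment are our own illustration, not the paper's network-based implementation.

```python
import numpy as np

# Parameter values from Table 1.
beta = 0.15   # learning rate
gamma = 0.9   # temporal discounting factor
lam = 0.8     # eligibility trace decay rate
eps = 0.025   # exploration rate

def select_action(q_row, rng):
    # Epsilon-greedy selection over the Q-values of the current state.
    if rng.random() < eps:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def sarsa_lambda(n_states=5, n_actions=3, episodes=200, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        e = np.zeros_like(Q)  # eligibility traces, reset each episode
        s = 0
        a = select_action(Q[s], rng)
        done = False
        while not done:
            # Toy chain environment: action 0 advances, other actions stay put;
            # reward 1 on reaching the terminal state.
            s_next = min(s + 1, n_states - 1) if a == 0 else s
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            a_next = select_action(Q[s_next], rng)
            # Temporal-difference error, discounted by gamma.
            delta = r + (0.0 if done else gamma * Q[s_next, a_next]) - Q[s, a]
            e[s, a] += 1.0          # mark the visited state-action pair
            Q += beta * delta * e   # credit all recently visited pairs
            e *= gamma * lam        # decay the traces
            s, a = s_next, a_next
    return Q

Q = sarsa_lambda()
```

The eligibility traces let a single reward update all state-action pairs visited earlier in the episode, with credit decaying by a factor $γλ$ per step.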