We present a model architecture (see Figure 1A) that achieves good performance in all four memory tasks. The parameter values and other network specifics were kept the same in all simulations; an overview of these parameters is given in Table 1. We first describe the details of these computations and then discuss our simulations. Code used to implement the architecture and run the simulations is available at https://osf.io/jrkdq/?view_only=e2251230b9bf415a9da837ecba3a7d64.
Figure 1:

(A) The network architecture used in all simulations: a standard multilayer network, complemented by a gated store composed of two independent memory blocks. The input layer and memory store both project to the hidden layer, which in turn projects to two output modules. There, activity encodes Q-values that drive action selection. (B) Memory unit within a block, with a closed gate: the memory content is maintained via self-recurrent connections. Additionally, a match value is computed between sensory and memory information by comparing a projection of the sensory information ($m_i'$) to the memory content ($m_i$). The comparison is performed by two units that respond to positive and negative disparities between the two values. Their output is summed across memory units, yielding one match value for each block. The closed gate inhibits the connection $m_i' → m_i$ so that the original memory is maintained. Only when a gating action is selected is the recurrent projection inhibited and the connection $m_i' → m_i$ opened, so that the memory content is updated. Figure 2 illustrates network activity in a task context, and Table 1 lists the number of units in each layer.
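The maintenance, gating, and match computations described for panel B can be sketched in code. The following is a minimal illustration, not the authors' implementation: class and variable names are ours, the sensory projection weights are random placeholders, and the two disparity units are modeled as rectified positive and negative differences summed across memory units.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class MemoryBlock:
    """Hypothetical sketch of one memory block from Figure 1B."""

    def __init__(self, n_units, n_inputs, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        # Projection of sensory input onto the memory units (yields m_i').
        self.W_in = rng.normal(scale=0.1, size=(n_units, n_inputs))
        # Memory content m_i, maintained by self-recurrence while the gate is closed.
        self.m = np.zeros(n_units)

    def step(self, x, gate_open):
        m_proj = self.W_in @ x  # m_i': projection of the current sensory input
        # Two units per comparison respond to positive and negative disparities
        # between m_i' and m_i; their output is summed across memory units,
        # yielding a single match value for the block.
        match = np.sum(relu(m_proj - self.m) + relu(self.m - m_proj))
        if gate_open:
            # Gating action selected: recurrence is inhibited and m_i' -> m_i opens.
            self.m = m_proj
        # Gate closed: self-recurrent connections leave self.m unchanged.
        return match
```

With this formulation, storing an input and then presenting the same input again yields a match value of zero, since $m_i'$ and $m_i$ coincide.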

Table 1:
Parameters Used in All Simulations, Unless Stated Otherwise.
| Symbol | Name | Value |
| --- | --- | --- |
| $β$ | Learning rate | 0.15 |
| $γ$ | Temporal discounting factor | 0.9 |
| $λ$ | Eligibility trace decay rate | 0.8 |
| $ε$ | Exploration rate | 0.025 |

| Symbol | Name | Size |
| --- | --- | --- |
| $x$ | Total input units | 17 |
| $x_s$ | Sensory input units | 7 |
| $x_τ$ | Time input units | 10 |
| $h$ | Hidden units | 15 |
| $S$ | Memory store (blocks) | 2 |
| $m_i$ | Units per memory block | 14 |
| $q_{int}$ | Output q-units (internal actions) | |
| $q_{ext}$ | Output q-units (external actions) | 2 or 3 (task dependent) |
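The learning parameters in Table 1 are standard ingredients of temporal-difference learning with eligibility traces and ε-greedy action selection over Q-values. As a sketch of how they interact, here is a minimal tabular SARSA(λ)-style loop: the parameter values come from Table 1, but the tabular setting and the toy chain environment are our own illustration, not the paper's network-based implementation.

```python
import numpy as np

# Parameter values from Table 1.
beta = 0.15   # learning rate
gamma = 0.9   # temporal discounting factor
lam = 0.8     # eligibility trace decay rate
eps = 0.025   # exploration rate

def select_action(q_row, rng):
    # Epsilon-greedy selection over the Q-values of the current state.
    if rng.random() < eps:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def sarsa_lambda(n_states=5, n_actions=3, episodes=200, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        e = np.zeros_like(Q)  # eligibility traces, reset each episode
        s = 0
        a = select_action(Q[s], rng)
        done = False
        while not done:
            # Toy chain environment: action 0 advances, other actions stay put;
            # reward 1 on reaching the terminal state.
            s_next = min(s + 1, n_states - 1) if a == 0 else s
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            a_next = select_action(Q[s_next], rng)
            # Temporal-difference error, discounted by gamma.
            delta = r + (0.0 if done else gamma * Q[s_next, a_next]) - Q[s, a]
            e[s, a] += 1.0          # mark the visited state-action pair
            Q += beta * delta * e   # credit all recently visited pairs
            e *= gamma * lam        # decay the traces
            s, a = s_next, a_next
    return Q

Q = sarsa_lambda()
```

The eligibility traces let a single reward update all state-action pairs visited earlier in the episode, with credit decaying by a factor $γλ$ per step.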