Abstract
How episodic memories are formed in the brain is a continuing puzzle for the neuroscience community. The brain areas that are critical for episodic learning (e.g., the hippocampus) are characterized by recurrent connectivity and generate frequent offline replay events. The function of the replay events is a subject of active debate. Recurrent connectivity, computational simulations show, enables sequence learning when combined with a suitable learning algorithm such as backpropagation through time (BPTT). BPTT, however, is not biologically plausible. We describe here, for the first time, a biologically plausible variant of BPTT in a reversible recurrent neural network, R2N2, that critically leverages offline replay to support episodic learning. The model uses forward and backward offline replay to transfer information between two recurrent neural networks, a cache and a consolidator, that perform rapid one-shot learning and statistical learning, respectively. Unlike replay in standard BPTT, this architecture requires no artificial external memory store. This approach outperforms existing solutions like random feedback local online learning and reservoir networks. It also accounts for the functional significance of hippocampal replay events. We demonstrate the R2N2 network properties using benchmark tests from computer science and simulate the rodent delayed alternation T-maze task.
1 Introduction
Forming memories of our lives' episodes requires the ability to encode and store extended temporal sequences. Those sequences could be things said, places visited, or innumerable other sequences of states. Beyond enabling the human pastime of recounting prior experiences, episodic memories are the basis of predictive models of how the world works that support adaptive decision making (Murty et al., 2016; Gershman, 2018). How brains build memories of temporal sequences remains poorly understood. It is known that specific brain circuits (e.g., the hippocampal formation; Diba & Buzsáki, 2007; Hemberger et al., 2019; Schuck & Niv, 2019; Vaz et al., 2020; Eichenlaub et al., 2020; Fernández-Ruiz et al., 2019; Michon et al., 2019) and functional dynamics (e.g., hippocampal replay events) are particularly important. However, the functional principles by which the hippocampus and replay events enable sequence or episode encoding remain a puzzle. Machine learning approaches can solve this problem but are biologically implausible. Here, we explore how modifying the principles that underlie those machine learning approaches to be biologically plausible may elucidate how the brain builds episodic memories.
Artificial neural networks, when trained with engineered machine learning approaches, are capable of encoding protracted temporal sequences. Temporal sequence learning is a task solved particularly well by recurrent neural networks (RNNs). RNNs contain one or more layers of neurons with reciprocal connections among the neurons in that layer (i.e., recurrent connections). This means that the activity of a neuron is a function of both activity in other layers and the activity of its own layer a moment prior. When the recurrent connections are tuned appropriately, the network becomes capable of recognizing sequences, predicting upcoming transitions, and intrinsically recalling sequences. The key challenge, of course, is how to tune the connections. This breaks down into two specific questions: "What learning algorithm allows for reliable encoding and storage of extended sequences from as little as a single experience with that sequence?" and "How can this learning be done in a biologically plausible way?"
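As a minimal illustration of this recurrence (the sizes and names below are ours, not the paper's implementation), a recurrent layer update can be written as:

```python
# A recurrent layer: each state mixes the current input with the layer's own
# activity from the previous time step.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_rec = 16, 128
W_in = rng.normal(0, 1 / np.sqrt(n_in), (n_rec, n_in))    # input projection
W_rec = rng.normal(0, 1 / np.sqrt(n_rec), (n_rec, n_rec))  # recurrent projection

def rnn_step(h_prev, x_t):
    """One update: the new state depends on the input and the prior state."""
    return np.tanh(W_in @ x_t + W_rec @ h_prev)

h = np.zeros(n_rec)
for x_t in rng.integers(0, 2, (10, n_in)).astype(float):  # a toy input sequence
    h = rnn_step(h, x_t)
```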
A potent learning algorithm that enables artificial RNNs to effectively encode extended temporal sequences is backpropagation through time (BPTT; Werbos, 1990). Briefly, BPTT works as follows: A sequence of patterns is applied to an input layer of an RNN. A recurrent layer integrates this input along with its own state, generating a temporally evolving pattern of activity. A full record of the spatiotemporal activity of the input and recurrent layers is stored. Given the activity of the recurrent layer, the network can then predict subsequent outputs or any signals that it is trained with. Differences between the predicted and actual next states are errors. Errors are the product of the current connection strengths and the past activity of the input and recurrent network layers. Following presentation of a sequence, during an offline learning phase, BPTT combines information about the connection strengths and activity history while propagating the error back along the computation graph to attribute the blame for the errors to individual network connections (see Figure 1A). Finally, individual connections are weakened or strengthened proportionally to their share of the blame for the error. With repeated presentations and epochs of offline learning, the network becomes able to accurately predict or generate the sequence.
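The following compact sketch, continuing the toy network above, makes the two offline ingredients concrete: the stored activation history and the backward error pass. The squared-error loss and the restriction to the recurrent-weight gradient are our simplifications for brevity.

```python
# Minimal BPTT for next-step prediction. Note the external record of every
# past state (`hs`) and the reuse of the forward weights' transposes
# (`W_out.T`, `W_rec.T`) to route error backward: the two ingredients whose
# biological plausibility is questioned below.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_rec = 16, 128
W_in = rng.normal(0, 1 / np.sqrt(n_in), (n_rec, n_in))
W_rec = rng.normal(0, 1 / np.sqrt(n_rec), (n_rec, n_rec))
W_out = rng.normal(0, 1 / np.sqrt(n_rec), (n_in, n_rec))   # predicts next input

def bptt_grad(xs):
    """Gradient of the next-step prediction error with respect to W_rec."""
    hs = [np.zeros(n_rec)]                        # stored activation history
    for x_t in xs:
        hs.append(np.tanh(W_in @ x_t + W_rec @ hs[-1]))
    dW_rec = np.zeros_like(W_rec)
    delta = np.zeros(n_rec)                       # error carried backward in time
    for t in reversed(range(len(xs) - 1)):
        err = W_out @ hs[t + 1] - xs[t + 1]       # prediction error at step t
        delta = (W_out.T @ err + W_rec.T @ delta) * (1 - hs[t + 1] ** 2)
        dW_rec += np.outer(delta, hs[t])          # blame for each connection
    return dW_rec

xs = rng.integers(0, 2, (20, n_in)).astype(float)
W_rec -= 1e-3 * bptt_grad(xs)                     # one offline learning step
```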
Figure 1: (A) Temporally unfolded computational process of an RNN in BPTT. The recurrent layer at each time step generates an output through the blue projection; error signals (yellow circles) are computed according to the output layer and propagated to the recurrent layer via the yellow projection. To compute the gradient at different time steps, error signals also need to be propagated temporally, that is, backward in time. (B) Consolidator-cache model. The consolidator-cache system first generates output and gets feedback from the environment. Then the cache network stores sequences and plays them back in reverse order to the consolidator network, which in turn optimizes itself based on the replayed content.
Though BPTT is an effective RNN learning algorithm, its value for explaining how brains form episodic memories is questionable because it is not biologically plausible. The implausibility results from violations of the locality constraint. The locality constraint captures the fact that biological connections, synapses, can only be changed given locally available information. BPTT violates this constraint in two key ways.
First, BPTT stores and uses an external record of the network’s past activity. Though this activity information was available as it propagated across the network during the presentation of the input sequence, it is no longer locally available during the offline learning phase. Moreover, the prior activation states cannot be recomputed with locally available information. This is because the state at time $t$ depended on the state at time $t-1$, information not available at time $t$. Ordinary BPTT solves this by literally saving a record of the activation history (e.g., in RAM or GPU memory). Resolving this form of biological implausibility requires addressing how information about the activation history can be obtained in reverse chronological order with only locally available information.
Second, BPTT violates the locality constraint in how the information about current connection strengths is used in the offline learning phase. To attribute blame for error to individual connections, BPTT propagates error backward over the computation graph. This can be accomplished in two conceptually distinct ways, but both are biologically implausible. The first is that a separate error network is implemented wherein the connections (e.g., between neurons A and B) are defined as the transpose of the connections in the main network. In other words, the strength of connection B→A in the error network is identical to the strength of A→B in the main network. This makes it so that the error passed from B to A in the error network is proportional to the amount of activity passed from A to B in the main network. In this way, the error network accurately attributes errors to each connection. The main network is then updated accordingly. The key implausibility, called the “weight transport problem,” is how the connections of the error network come to mirror those of the main network and how the main network updates are informed by the processing in the error network. The second way to backpropagate errors addresses the weight transport problem but suffers from another implausibility: it propagates error backward in the main network itself, sending information about the error backward across the same connections. This effectively removes the need to transfer connection information between networks but creates the need to pass information backward across connections. Though dedicated backward projections exist, particularly in the sensory processing stream, and are essential in theories like predictive coding (Huang & Rao, 2011), individual synapses generally do not run in reverse in terms of propagating downstream neural activity (Lillicrap et al., 2020). Resolving these implausibilities requires addressing the question of how the connection strength information can be factored into learning so that only locally available information is used.
Establishing biologically plausible means of tuning neural networks to store temporal sequences is an important and ambitious goal. This goal is important for its potential to offer a functional hypothesis for how neural systems support memory for episodes or protracted sequential events. It is ambitious because the native mechanisms of BPTT were engineered to specifically meet the functional needs. Reaching this goal requires addressing biological implausibilities regarding retrieving the activation history of the input and recurrent layers and how information about connection strengths is used to attribute error across connections and backward in time.
Solutions have been proposed previously for both implausibilities, but each has suffered from notable limitations. The utilization of external storage, for example, has been addressed with various approaches. The specifics of those approaches differ, but they share a common feature. The solutions omit the need for offline access to a record of the activity through clever handling of the activity while it is still present during the original presentation of the sequence. That is, they compute the sources of error in an online way (Depasquale et al., 2018; Murray, 2019; Tallec & Ollivier, 2017; Bellec et al., 2019). These are remarkable for their ability to leverage on-the-fly computations to support learning, but these adaptations come at a substantial cost to final performance. Moreover, the omission of offline replay does not improve biological plausibility. Offline forward and reverse replays are well established to occur in biological neural networks (Foster & Wilson, 2006). Indeed, there is strong evidence that offline replay is essential for learning (Jadhav et al., 2012). For offline replay to exist, however, there must be a way to regenerate the patterns. Defining how this occurs is a puzzle that we address in this article.
Solutions also exist to address the weight transport problem and the reversible connections problem, but they too have limitations. For example, the reversible connection problem comes up in training feedforward networks. In that setting, it was shown that knowledge of the weights is not needed and that fixed random top-down connections can function to train various networks (Lillicrap et al., 2016; see also Akrout et al., 2019). This is referred to as feedback alignment. Though originally designed for feedforward networks, feedback alignment can also be used in RNNs. One such variant is random feedback local online (RFLO) learning (Murray, 2019). The feedback alignment approach of RFLO effectively addresses the biological implausibility issue. Critically, however, RFLO is functionally limited to propagating error only one step backward in time (Marschall et al., 2020). A method capable of tracking the temporal gradient over many time steps in RNNs remains lacking but is a gap we address in this article. There are also variants like e-prop (Bellec et al., 2019), which claim to be biologically plausible learning algorithms for spiking and other types of RNNs. However, e-prop can be decomposed into several major versions. The first version, e-prop-1, to our knowledge, works similarly to RFLO in applying direct feedback alignment to RNNs. The later versions, e-prop-2 and e-prop-3, leverage extra information from modules learned via BPTT to improve the effect of online error propagation.
Though evolution-based explanations can be used to argue for the biological plausibility of such BPTT-optimized external error modules, we do not consider them pure BPTT-free biologically plausible learning algorithms for RNNs. We present here a biologically plausible recurrent network model of episodic learning based on BPTT that does not suffer from the biological implausibilities of BPTT. This model, referred to here as R2N2 (short for reversible recurrent neural network), fully satisfies the locality constraint. R2N2 uses no external record of the network’s past activity. Instead, it leverages two previously described separate solutions for enabling reversible reactivation of a network, one for the input layer and one for the recurrent layer. Further, R2N2 neither transports weights nor assumes reversible synapses. Instead, it leverages an error network that is controlled by and controls the main network in a way that allows error backpropagation to train the network without weight transport. The individual components of R2N2 are each based on established approaches. The full R2N2 model, and the fact that it collectively represents a high-functioning biologically plausible replacement for BPTT, is novel and innovative. The specifics of each separate solution and the operation of the full R2N2 model are described in section 2. In section 3, we demonstrate the sufficiency of each solution separately. We then combine the components to form R2N2 and benchmark the performance of this fully biologically plausible implementation of BPTT, showing it surpasses current state-of-the-art biologically plausible implementations. Finally, to facilitate comparison to sequence learning in brains, we show that R2N2 can learn the classic delayed alternation T-maze task. While our model is designed to replace biologically implausible components with ones that are plausible in principle, it is not designed to simulate specific anatomy or physiology. Nonetheless, as illustrated and discussed, the full model recapitulates several key functional properties of the hippocampal formation, including place cells and offline replay.
2 Model
The full model consists of two interconnected RNNs referred to here as the consolidator and the cache (see Figure 1B). We refer to the full model as the reversible recurrent neural network (R2N2). The consolidator and cache are both RNNs but with different architectures and learning rules as each is designed for distinct functions. These are briefly summarized here, and specifics are unpacked in detail in the sections that follow. The consolidator is the primary RNN, designed to have a large storage capacity and robust generalization ability. A trained consolidator is functionally akin to an RNN trained with BPTT. The cache is an auxiliary network, designed to support the training of the consolidator. The cache is optimized for rapid encoding and high-precision bidirectional retrieval of input sequences. This enables it to perform one-shot learning of to-be-learned sequences. During offline processing, the cache replays the sequences in reverse order to the consolidator to train the durable trace of the memory.
2.1 The Consolidator Network
Figure 2: Schematic structure of the consolidator and cache. (A) Projections of neuronal groups A and B in the consolidator network. The consolidator network is composed of an activity network (red) and an error network (blue). Notice that the direction of sequence play (forward or reverse) is not fixed by the arrows but is determined by the competition between the forward projections $W$ and the backward projections $\hat{W}$, unlike in Chang et al. (2017), where the forward and backward connections are the same and are controlled explicitly through the code logic. (B) The temporal unfolding of forward and backward computation in the consolidator. When the $W$ projections are stronger, the whole network operates in the forward mode. In this case, the network updates its activations in both groups A and B, generates outputs, and computes error signals accordingly. When the $\hat{W}$ connections are stronger, the whole network turns into backward mode. Neuronal groups A and B generate the reversed activation sequence and thus can be used to propagate activation backward to reconstruct the previous time step's activities, bypassing the temporal credit assignment issue caused by nonlocality and obviating the need to store previous activities. The error network recursively multiplies the error vector by the feedback alignment random matrix to compute the error vector at the previous time step. (C) Left: Projections in the cache. In the cache, each neuron receives projections from both weight sets, $W_1$ (orange) and $W_2$ (green). Right: detailed description of the connection pattern of a single neuron. The final incoming weight of a neuron is determined by the control signal, which can be viewed as the result of competing oscillating interneurons tuning the $W_1$ and $W_2$ synaptic inputs. (D) A schematic description of state transitions in the cache. By periodically switching between $W_1$ and $W_2$ via the control signal (lower panel), the network alternates between two sets of weights. Each of them builds state attractors between successive states (energy landscape slopes between two edges of the same phase). Different successive state pairs are connected in a chaining way and thus form a long state sequence (upper panel).
We define the $W_{XY}$ ($X$ and $Y$ can be $A$ or $B$) projections as forward connections, with the $\hat{W}_{XY}$ being the backward projections and $W_{\mathrm{out}}$ being the output projection. In this section, we focus on the recurrent units in A and B and leave the discussion of output to later sections. The network firing-rate dynamics generated by the forward connections is called the forward sequence and represents the normal running or forward replay phase of this RNN. To generate a reversed sequence mathematically, one needs simply to flip the sign of the derivatives in the dynamics equation and then integrate from the final time back to the initial time. If we discretize the dynamics equations into a difference form, the model degenerates to the case of a reversible deep neural network block (Chang et al., 2017), which has been shown to be memory constant in various tasks, as the storage requirement for neural activity equals the number of units in the network.
The intuition is depicted in Figure 2B. The blue circuit represents a transformation (where $\oplus$ is the operation that combines a state with a projection's output) from $(a_t, b_t)$ to $(a_{t+1}, b_{t+1})$ in the next time step: $a_{t+1} = a_t \oplus F(b_t)$ and $b_{t+1} = b_t \oplus G(a_{t+1})$. Then, as long as there exists an inverse operation $\ominus$ that satisfies $(x \oplus y) \ominus y = x$, we can construct a circuit that turns $(a_{t+1}, b_{t+1})$ back into $(a_t, b_t)$: $b_t = b_{t+1} \ominus G(a_{t+1})$ and $a_t = a_{t+1} \ominus F(b_t)$.
This can be implemented with local dendritic propagation and local training (see equation 2.2) of the backward projection $\hat{F}$'s parameters (Poirazi et al., 2003; Guerguiev et al., 2017), tuning $\hat{F}$ so that $F(x) + \hat{F}(x) \approx 0$. Although equation 2.2 does not guarantee this sum to be exactly zero after training, in practice we find that it works extremely well (see Figure 3). This detailed balance between the $F$ and $\hat{F}$ projections thus makes it possible to run the whole system in a backward manner: if the excitability of a trained backward projection is scaled by a factor of 2, the network will switch from forward to backward replay. The resulting effective projection will be approximately $-F$, as when $\hat{F} \approx -F$, $F + 2\hat{F}$ becomes approximately $-F$. This is precisely the method we used to switch the consolidator between forward and reverse play in the simulations below.
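A minimal runnable sketch of one reversible step under additive coupling follows. The variable names are ours, and we exploit the fact that tanh is odd to set the backward weights to exact negations; in the model, the backward projections are instead learned online via equation 2.2.

```python
# Reversible consolidator step: groups A and B update each other additively,
# and trained backward projections cancel the forward ones, so the past can be
# reconstructed from the present state alone.
import numpy as np

rng = np.random.default_rng(1)
n = 64                                       # neurons per group (A and B)
W_F = rng.normal(0, 0.5 / np.sqrt(n), (n, n))
W_G = rng.normal(0, 0.5 / np.sqrt(n), (n, n))
F = lambda x: np.tanh(W_F @ x)               # B -> A forward projection
G = lambda x: np.tanh(W_G @ x)               # A -> B forward projection
F_hat = lambda x: np.tanh(-W_F @ x)          # backward projection (= -F exactly)
G_hat = lambda x: np.tanh(-W_G @ x)          # backward projection (= -G exactly)

def forward(a, b):
    a_next = a + F(b)                        # group A updates from B
    b_next = b + G(a_next)                   # group B updates from the new A
    return a_next, b_next

def backward(a_next, b_next):
    b = b_next + G_hat(a_next)               # cancel G's contribution
    a = a_next + F_hat(b)                    # cancel F's contribution
    return a, b

a, b = rng.standard_normal(n), rng.standard_normal(n)
a0, b0 = a, b
for _ in range(100):                         # run forward, storing nothing ...
    a, b = forward(a, b)
for _ in range(100):                         # ... then reconstruct the past
    a, b = backward(a, b)
print(np.max(np.abs(a - a0)))                # tiny (float rounding only)
```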
Figure 3: Normalized firing rates of neuronal groups A and B in the consolidator (for each group, the first 5 neurons are selected, while overall the consolidator has 128 neurons). (A, D) Solid lines represent firing rates during forward running. (B, E) Dashed lines represent firing rates during backward running. (C, F) Difference of firing rates between forward and backward running (flipped). Signs of symmetry are shown clearly in comparisons between each pair.
There are many possible choices for the $F$ and $\hat{F}$ projections. For example, each can be a simple two-layer neural network if we consider the branching structure of dendrites. Neurons with dendritic structures have been demonstrated experimentally to be more complex than a simple neural network with just one activation function (Poirazi et al., 2003), since the inputs, prior to summation and thresholding in the soma, are initially aggregated with another nonlinear activation function in elongated terminal dendrites. In this case, a complex architecture $F$ can be approximated by $\hat{F}$ as long as the complexity of $F$ is not higher than that of $\hat{F}$. For demonstration, in the following experiments, we choose the simplest form of $F$, a single-layer network of the form $F(x) = \phi(Wx)$ (where $\phi$ is tanh), and $\hat{F}$ to be another simple network of the same form, $\hat{F}(x) = \phi(\hat{W}x)$. Mathematically, this reduces the learning of the backward projection to a regression process, which is known to be trivial to solve with local learning rules but is still enough to support complex sequential computation, as nonlinearity is involved in each time step. This architecture (see Figure 2A) is scalable, and thus it is possible to support more complicated computations. If one adds another group C along with groups A and B, the coupling can be extended, and the backward running can still be preserved.
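Under the single-layer tanh parameterization above, this regression can be sketched as follows. The variable names and the plain delta-rule update are our stand-in for equation 2.2; the point is that the update uses only the presynaptic activity and a locally observable residual.

```python
# Local regression that tunes a backward projection to cancel the forward one.
import numpy as np

rng = np.random.default_rng(2)
n = 64
W = rng.normal(0, 1 / np.sqrt(n), (n, n))       # forward projection (given)
W_hat = rng.normal(0, 1 / np.sqrt(n), (n, n))   # backward projection (learned)

lr = 0.05
for step in range(5000):
    x = rng.standard_normal(n)                  # activity arriving at the synapse
    y_hat = np.tanh(W_hat @ x)
    e = np.tanh(W @ x) + y_hat                  # residual to be driven to zero
    # delta rule: presynaptic x, postsynaptic error gated by tanh'(W_hat @ x)
    W_hat -= lr * np.outer(e * (1 - y_hat ** 2), x)

x = rng.standard_normal(n)
print(np.abs(np.tanh(W @ x) + np.tanh(W_hat @ x)).max())  # shrinks with training
```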
Figure 2A depicts a network that can produce sequences of activity in reverse, and the same network structure can also be used to propagate error signals in reverse. Figure 2B shows two network graphs, one that runs activity sequences in reverse and one that runs error sequences in reverse.
Two types of implausibility exist in this process: (1) the weight transport problem (i.e., how to compute the error term that depends on the transpose of the forward weights; Whittington & Bogacz, 2019) and (2) the external storage of activations (i.e., how to obtain the past activations required in equation 2.3). For the first issue, Lillicrap et al. (2016) proposed an alternative local error circuit, feedback alignment (FA). With fixed random projections, it has been shown to be effective on various deep network architectures and tasks (Nøkland, 2016; Moskovitz et al., 2018). Our synaptic competition balance mechanism described above addresses the second issue, as it reconstructs the previous network states in a backward manner. This eliminates the need to store the neural activities at multiple time steps in the past.
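For readers unfamiliar with feedback alignment, the toy sketch below (a small feedforward regression problem; all names and sizes are ours) illustrates the principle: error is routed backward through a fixed random matrix rather than the transpose of the forward weights, yet the loss still falls because the forward weights gradually come to align with the random feedback.

```python
# Feedback alignment (Lillicrap et al., 2016) on a toy regression problem.
import numpy as np

rng = np.random.default_rng(3)
n_in, n_hid, n_out = 20, 40, 5
W1 = rng.normal(0, 0.1, (n_hid, n_in))
W2 = rng.normal(0, 0.1, (n_out, n_hid))
B = rng.normal(0, 0.1, (n_hid, n_out))        # fixed random feedback, never W2.T
W_true = rng.normal(0, 0.3, (n_out, n_in))    # target mapping to learn

lr = 0.01
for step in range(10000):
    x = rng.standard_normal(n_in)
    h = np.tanh(W1 @ x)
    y = W2 @ h
    err = y - W_true @ x
    # backward pass: B replaces W2.T, avoiding weight transport
    delta = (B @ err) * (1 - h ** 2)
    W2 -= lr * np.outer(err, h)
    W1 -= lr * np.outer(delta, x)
    if step % 9999 == 0:
        print(step, float(err @ err))         # loss falls despite random feedback
```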
2.2 The Cache Network
In the replay phase, the consolidator by itself cannot generate the dynamics without knowing the input sequence and, by extension, the input-driven component of its activity. Superficially, this brings us back to the original dilemma: we must design another RNN that can run backward. The difference is that it must be able to memorize a given sequence after as little as a single exposure, which makes the problem harder. Nevertheless, sensory input sequences usually occupy a space with many fewer dimensions than the number of neurons in the consolidator, and this suggests a solution.
The cache network thus can learn arbitrary sequences of patterns in a local, stable, one-shot manner, as the weight update rule of the Hopfield network is local and can be computed with only a single exposure to the inputs. The cache network must discretize the input stimuli in time, so a continuous Hopfield network would not be appropriate. This is not a problem, though, as arbitrary sequences can be learned provided the sampling rate is high enough to avoid aliasing. Moreover, this is appropriate as a model of biology, as there is evidence that human perception itself is discretized in time (Landau et al., 2015).
2.3 R2N2: Sequence Learning with Consolidator and Cache
In training a vanilla RNN with BPTT, one needs to perform the following steps:
1. Initialize the RNN and run forward with temporally varying inputs.
2. Store the input sequence, the hidden unit activations, the generated output sequence, and the target sequence to an external memory device.
3. After the whole input sequence has been received, compute the error between output and target for the last time step.
4. Extract input, output, and target pairs from the memory device in temporally reversed order, propagate the error backward, and compute weight changes simultaneously.
5. Apply the accumulated weight changes after finishing the backward running phase.
The first point to note here is that the external storage is where the main biological implausibility lies. It is unclear how the brain could store the activity of each cell at each time step somewhere else and replay it precisely. However, with the consolidator and cache, this activity memory can be reconstructed dynamically. Notably, the storage size requirement is substantially reduced, as the consolidator can reproduce its historical activations as a reverse-play sequence with the help of the cache. Consider a case in which the consolidator has 128 neurons and the channel size of the inputs is 16. Standard BPTT needs to store a sequence of $128 + 16 = 144$-dimensional vectors, as all input and hidden state vectors must be preserved in the temporal unfolding process. In our model, however, the system only requires a cache storing a sequence of 16-dimensional vectors representing the input sequence alone, because the consolidator can reconstruct its own activity. This means a memory of the sensory experience, rather than of all neural activity, is enough to support sequence learning. This approach also matches the empirical finding that the replay of location sequences improves animals’ performance in spatial navigation tasks (Ambrose et al., 2016).
Second, we modify the standard learning process in BPTT to fit the R2N2 model. In BPTT, the input and target channels usually belong to different categories. Taking the classic random-dots perceptual decision-making task as an example, the input is usually set to the coherence of the randomly moving dots’ directions, and the desired output target is the eye movement direction (Lo & Wang, 2006). This makes the backward running phase more complicated, as the system needs to store the desired targets and input patterns together and compute the error signal based only on the difference between generated outputs and desired targets. The process can be simplified if there is no categorical difference between desired outputs and inputs. Taking inspiration from predictive coding in sequence learning (Zhang et al., 2019), we view performing cognitive tasks as a process of online sequence prediction: the task-relevant stimuli, action signal, and reward signal are treated equally and are concatenated into an integrated sensory input vector. Regardless of their structure, various cognitive tasks can then be reduced to the same type of sequence prediction task. Thus, the task reduces to predicting the future state at time $t+1$ on the basis of the task-relevant variables at time $t$.
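For instance, a single time step's input under this scheme might be assembled as follows (the channel sizes are illustrative, not the paper's):

```python
# Stimulus, action, and reward concatenated into one state vector; the network's
# only job is to predict the next such vector.
import numpy as np

def make_step(obs, act, rew):
    """obs, act: one-hot numpy arrays; rew: scalar in {0, 1}."""
    return np.concatenate([obs, act, [float(rew)]])

# e.g., 4 visual cues, 3 actions, 1 reward bit -> an 8-dimensional state
x_t = make_step(np.eye(4)[2], np.eye(3)[0], rew=0)
# the training target at time t is simply the full input vector at time t + 1
```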
Based on these assumptions and modifications, we propose that learning a specific task can be divided into two phases, with the first one mapped to fast learning and the second one to slower statistical learning, essentially as a consolidation process. In the first stage, the animal explores the task settings and environment randomly, generating both rewarded and unrewarded sensory sequences involving all task-relevant variables. During this initial phase, the cache memorizes sensory sequences that are rewarded at the end of each trial, which can be learned in a one-shot fashion as it is a Hopfield network in principle (see equation 2.6).
In the second phase, the cache starts reverse replay, sending signals to the consolidator and thus training it. Because the task is next-step prediction, a target at time $t$ is simply the input to both the consolidator and the cache at time $t+1$, so the cache does not need to store a target sequence separately. The consolidator in the second phase then optimizes its forward projections according to the targets provided by the cache and its own reconstructed reversed activations. Once its forward projections are changed, the backward projections are adjusted accordingly to cancel the new forward projections. Notice that the adjustments of the forward and backward projections (see equations 2.4 and 2.2) can occur simultaneously, as the learning of the backward projection is an online process. Consequently, the knowledge about the rewarded sensory experience is transferred from cache to consolidator, via fast learning at first and statistical learning later. In addition, as the consolidator can go back to states it experienced, it can also perform forward replay using the forward projections, which could be used to explore possible future outcomes when the model is in an intermediate state (Van Der Meer & Redish, 2010; Pfeiffer & Foster, 2013). To sum up, we view this process as an implementation of Buzsáki's two-phase model (Lörincz & Buzsáki, 2000) for training long-term memories, as the interplay between consolidator and cache across the two phases simulates entorhinal-hippocampal communication. The overall flow is sketched below.
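As a structural illustration only, here is a deliberately simplified, runnable version of the two phases: a plain Python list stands in for the Hopfield-style cache and a linear delta-rule learner stands in for the consolidator, so only the information flow (one-shot capture of rewarded sequences, then reverse-replay-driven consolidation) mirrors the model.

```python
# Two-phase training skeleton: explore and capture, then consolidate offline.
import numpy as np

rng = np.random.default_rng(4)
dim = 8
cache = []                                   # stand-in for the Hopfield-style cache
W = np.zeros((dim, dim))                     # toy linear consolidator weights

def explore_trial():
    xs = [rng.integers(0, 2, dim).astype(float) for _ in range(20)]
    rewarded = bool(rng.integers(0, 2))      # placeholder reward outcome
    return xs, rewarded

def consolidator_update(x_target, x_prev, lr=0.1):
    global W                                 # delta-rule stand-in for the local update
    W += lr * np.outer(x_target - W @ x_prev, x_prev)

# Phase 1: awake exploration; rewarded sequences are stored in one shot.
for _ in range(10):
    xs, rewarded = explore_trial()
    if rewarded:
        cache.append(xs)

# Phase 2: offline reverse replay from the cache trains the consolidator.
for xs in cache:
    for t in range(len(xs) - 1, 0, -1):      # reverse order, as in replay events
        consolidator_update(x_target=xs[t], x_prev=xs[t - 1])
```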
3 Results
3.1 Consolidator
The consolidator is the primary long-term memory store of R2N2. The architecture is illustrated in Figures 2A and 2B and described in section 2.1. To be a biologically plausible, high-functioning network, the consolidator must demonstrate two capabilities using only local information: (1) reversibility, that is, the ability to reconstruct its activity backward, and (2) error propagation, that is, the ability to tune connections in proportion to the errors produced so that error can be reduced. BPTT uses nonlocal solutions to achieve both capabilities. In this section, we show functioning solutions to both that use only local information. Competing complementary subnetworks (Chang et al., 2017) can reconstruct the spatiotemporal consolidator activity patterns in reverse without an external record. Feedback alignment (Lillicrap et al., 2016) can reduce the reconstruction error over training without weight transport or reversible synapses. The consolidator should also outperform the highest-functioning biologically plausible algorithms. We benchmark R2N2 performance and demonstrate that R2N2 performs comparably to BPTT and better than RFLO and echo state networks.
3.1.1 Reversibility
To achieve reversibility in the consolidator network, we used the competing subnetwork approach described previously (Chang et al., 2017). To demonstrate reversibility, the consolidator should be able to reconstruct an activation sequence in reversed temporal order without external signals so that the error can be aligned to the dynamics that led to the error.
A random time-varying pattern of activity was applied to the input layer of the consolidator, inducing a complex time-varying pattern of consolidator activity. A representative example of the consolidator activity is shown for five neurons in each of the consolidator subnetworks in Figure 3A. The activity of each consolidator subnetwork is in part a function of the input from the other subnetwork by way of the forward connections $W$ (see section 2 for implementation details). A separate set of intersubnetwork connections, $\hat{W}$, learns to be equal in magnitude and opposite in sign to $W$ through a local learning rule (i.e., without the use of nonlocal information). It is the proper training of $\hat{W}$ that allows reversible reconstruction of the consolidator activity. Figure 3B illustrates the reversibility of the consolidator after four epochs of training. Shown are 100 time steps of the same five neurons as in Figure 3A as the newly trained connections control the consolidator activity. Comparing this activity to the forward pattern by flipping the time axis and subtracting it from the forward pattern reveals that the activity was well matched, as shown in Figure 3C.
3.1.2 Error Backpropagation with Feedback Alignment
Backpropagation allows a network to take error information that becomes available at the end of a sequence and retroactively tune connection strengths to reduce error. Successful backpropagation requires both a record of the prior activity and a means of relaying the error signal. The record of prior activity in the consolidator is provided by the reversibility property shown above. To relay the error signal, we used the feedback alignment approach described previously (Lillicrap et al., 2016). To demonstrate successful backpropagation, the consolidator should be able to adjust its connections to be able to minimize error and thereby reconstruct an input sequence.
A random binary vector data stream was generated to serve as inputs, as shown in Figure 4A. Notably, the random inputs included repeated elements at both adjacent and remote time points, challenging the network to attribute the error appropriately as a function of time (i.e., not simply learn that state Y always follows state X). The consolidator was trained to generate the input pattern of the next time step (i.e., predict transitions) using data at the current time step as a cue. After 50,000 training steps, the consolidator was able to predict the random binary vectors, as shown in Figure 4B. This experiment shows that the consolidator can learn to map input patterns to outputs at a given time despite using shared weights across multiple time steps. This implies that the temporal credit assignment problem is solved effectively (i.e., the correct connections were adjusted for an error resulting from an earlier activity pattern) through the use of feedback alignment and consolidator reversibility.
Figure 4: Sequence memorization task for the consolidator. The consolidator is trained to recall elements in the next step using data from the current time step. The top row represents the input data stream, and the bottom row represents the sequence generated by the consolidator.
3.1.3 Performance
We benchmarked the consolidator's performance by comparing its memorization capacity to those of BPTT, RFLO (random feedback local online learning; Murray, 2019), and ESN (echo state networks; Maass et al., 2002), the latter two being high-performing, biologically plausible sequence encoders. e-prop is not included, as, per our discussion in section 1, we see no fundamental difference between BPTT-free e-prop and RFLO. We therefore compared the performance of the four algorithms (consolidator, BPTT, RFLO, and ESN) on a character prediction task (see Figure 5A). The character prediction task (Rodriguez et al., 1999) is a classic test for assessing RNN capability to encode sequences in the face of strong interference. In short, the input stimuli are a stream of $a$s and $b$s (see section 2 for details); a sketch of the stream follows. Performing this task requires accurately predicting whether the next character is another repeat or a switch, which requires staying oriented to how many repeats have already occurred. We also tested the sequential MNIST task, but it turned out to be surprisingly easy and not a useful way to discriminate among the models: R2N2 could make use of local or compressed information to perform the task, and even an echo state network can solve the problem.
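For concreteness, here is how such a stream can be generated, following our reading of the task description and the Figure 5 caption (the delimiter placement is our assumption based on that caption):

```python
# a^n b^n character-prediction stream (after Rodriguez et al., 1999): n a's,
# a delimiter, n b's, a delimiter, so each trial has length 2n + 2.
import numpy as np

SYMBOLS = "ab\n"                                   # two letters plus a delimiter

def anbn_string(n):
    return "a" * n + "\n" + "b" * n + "\n"         # length 2n + 2 per trial

def one_hot_stream(n_max, trials, rng):
    chars = "".join(anbn_string(int(rng.integers(1, n_max + 1)))
                    for _ in range(trials))
    eye = np.eye(len(SYMBOLS))
    return np.stack([eye[SYMBOLS.index(c)] for c in chars])

stream = one_hot_stream(n_max=4, trials=100, rng=np.random.default_rng(5))
# inputs are stream[:-1]; next-step prediction targets are stream[1:]
```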
Figure 5: (A) A schematic view of the task. Each model is trained to predict the next character based on the previous inputs. (B) Performance comparison on the task. The upper rows show averaged loss curves, while the lower rows show prediction accuracies. For all curves, solid lines represent the running average (window 1000), while the shadowed lines represent the corresponding standard error. From the left-most to the right-most column, the range of $n$ in each data set is linearly increased from a bin including 1 to 4 to a bin including 21 to 24; thus, the corresponding sequence length ranges from $2 \times 1 + 2 = 4$ to $2 \times 24 + 2 = 50$ ($2n + 2$ is used, as both $a$ and $b$ are repeated $n$ times, and the extra two elements are the newline symbols in the middle and at the end; see panel A for details). Notice that for each column, the results are visualized for all sequence lengths within its range. Orange lines represent the consolidator with the transposes of the forward matrices as backward matrices. Blue lines represent the consolidator with fixed random backward matrices, implementing feedback alignment. Green lines represent RFLO, and red lines represent an echo state network (ESN).
The panels of Figure 5B show the mean squared error (MSE) and correct rates of the consolidator (blue), BPTT (orange), RFLO (green), and ESN (red) for sequences of increasing length. We used 64 neurons for all networks in this experiment. Since the consolidator is composed of two groups of neurons, for fair comparison, each group is set to 32 neurons. The left-most panels show the performance for input sequences where $n \in [1, 4]$, and the right-most panels show the performance when $n \in [21, 24]$. Given the differences in how each algorithm handles the temporal gradient (as unpacked in section 2), we expected the consolidator's performance to be close to that of BPTT and better than that of RFLO (expected ordering: BPTT $\gtrsim$ consolidator $>$ RFLO). The results match our prediction and are shown in Figure 5B. Across sequence lengths, the consolidator performs consistently better than RFLO and ESN, approaching the performance of the biologically implausible BPTT. Trained RFLO networks reach a correct rate with an upper bound of around 0.5, which reflects that the error gradient in RFLO is essentially limited to one step backward in time (Marschall et al., 2020), while the consolidator can propagate the error gradient backward over multiple time steps. By comparison, the ESN can barely generate any appropriate outputs when the sequence length exceeds 10 (depicted by the red lines in the right four columns of Figure 5B). It should be noted that the total sequence length is double the single $n$, as in the $a^n b^n$ task each letter is repeated $n$ times. This shows the advantage of multi-time-step temporal error signal propagation over online error minimization, even with the constraint of no external storage of the neural activation history.
For shorter sequences (the first two columns in Figure 5B), the consolidator and BPTT have similar asymptotic performance. As the sequence length increases, the performance of all models decreases (bottom row of Figure 5B), and the divergence between a consolidator implemented with BPTT (orange lines) and one implemented with feedback alignment (blue lines) gradually increases. This divergence shows that feedback alignment does have limits, relative to pure BPTT with its perfect record, when propagating error across numerous time steps.
3.2 Cache
The cache network functions as the primary input to the consolidator. Functionally, it performs rapid memorization of the input sequence for subsequent playback to the consolidator during the offline learning phase. BPTT uses an externally stored record of the input sequence that is aligned with the backpropagated error. To be biologically plausible and high functioning, the cache must be capable of storing a sequence of states, after a single training trial, in a way that can be retrieved in reverse order (to synchronize with the reverse replay in the consolidator).
To achieve this, the cache is itself a classic form of a recurrent neural network, using well-established learning principles (akin to a Hopfield network with multiple weight matrices) that enable retrieval of stored states in forward or reverse order, as described in full detail in section 2. To demonstrate this ability, we tested the ability of an isolated cache network (i.e., with no consolidator network connected) to retrieve a sequence of randomly generated binary vectors.
As shown in Figure 6, the cache was presented with 20 distinct binary vectors over time. Transitions between adjacent vectors were encoded by alternating weight matrices based on a control signal (see Figure 6A). With only the single presentation, the cache can step through the same set of states (see Figure 6B). This playback can be performed in the forward or backward direction depending on which pattern the cache is initialized with. In our experiments, for the purpose of imposing learning on the consolidator, the cache network is usually initialized with its terminal states, acting as a cue to trigger learning. The timing of each transition is tuned by the control signal, allowing the network to intrinsically regulate the retrieval. With this ability, the cache can support playback of the input sequence so that it is synchronized with consolidator processing.
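A runnable sketch of this mechanism follows. The symmetric Hebbian storage rule, the pattern sizes, and the discrete phase signal are our simplifications of the cache described in section 2, but the one-shot storage and the direction-by-initialization behavior match the description above.

```python
# Two Hopfield-style weight matrices store alternating transitions between
# successive patterns; an oscillating control signal selects which matrix
# drives the dynamics. One presentation suffices (purely Hebbian and local).
import numpy as np

rng = np.random.default_rng(6)
n, T = 512, 20
patterns = rng.choice([-1.0, 1.0], (T, n))        # the sequence to memorize

W1 = np.zeros((n, n))                             # stores transitions 0-1, 2-3, ...
W2 = np.zeros((n, n))                             # stores transitions 1-2, 3-4, ...
for t in range(T - 1):
    W = W1 if t % 2 == 0 else W2
    W += (np.outer(patterns[t + 1], patterns[t])
          + np.outer(patterns[t], patterns[t + 1])) / n   # symmetric, one-shot

def replay(start, steps):
    x, out = patterns[start].copy(), []
    for k in range(steps):
        W = W1 if k % 2 == 0 else W2              # control signal picks the matrix
        x = np.sign(W @ x)                        # hop to the paired pattern
        out.append(x)
    return out

# With T = 20, the first transition from either end is stored in W1, so the
# same phase sequence serves both directions; only the initialization differs.
fwd = replay(start=0, steps=T - 1)                # p1, p2, ..., p19
rev = replay(start=T - 1, steps=T - 1)            # p18, p17, ..., p0
print(np.array_equal(fwd[-1], patterns[-1]),
      np.array_equal(rev[-1], patterns[0]))       # should print: True True
```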
Figure 6: One-shot sequence memorization task for the cache. (A) A cache trained to recall a random binary sequence with an external periodic signal. Top: the oscillating external control signal. Bottom: the relative Hamming distance between the cache's activation and all stored patterns. A larger pattern index represents a pattern that appears later in the given sequence. (B) The same as panel A except that an internally generated control signal is used.
3.3 R2N2 Solving Sequence Learning Problems
The results shown thus far demonstrate that each component of R2N2 is capable of performing its function as intended. In this section, we demonstrate that nothing is lost and nothing additional is needed when the components are assembled into the full R2N2 model while adhering to the locality constraint. That is, we show that R2N2 is capable of encoding the memory of a temporal sequence into a recurrent neural network using only local information. Given that our motivation was to understand how brains enable episodic memory, we applied R2N2 to a simulation of a T-maze navigation task, in which the animal must decide to turn to either the left or the right end of the horizontal branch to get a reward, based on the visual cue at the beginning of the maze (see Figure 7). In our simulations, the consolidator and cache networks are discretized with a time step of 10 ms. Under this setting, single trials can be encoded into sequences with lengths up to 106.
Figure 7: T-maze task trained with consolidator and cache. (A) Task structure and training paradigm. Top: The animal decides to run and then turn to the left or right according to the cue type (black or gray block at the top of the T-maze) to get a reward. Bottom: The system first selectively receives rewarded sensory sequences (red trajectories in the T-maze) and stores them in the cache, which then performs reverse replay, providing a reversed input sequence to the consolidator. The consolidator is then trained to generate rewarding predictions. (B) Place representations in the hidden unit firing rates of the consolidator. For left- and right-rewarded trials, neurons are sorted according to the distance between the starting point and the positions with the highest firing rates. The first vertical dashed line represents the distance at which the cue ends, and the second one indicates where the left or right decision point lies. (C) The distribution of place representation density across neurons. Mann-Whitney tests are performed between the density of the three crucial regions (cue, turning point, and reward) and that of all other regions; all differences are significant.
This was not intended to be a simulation of the brain itself. Rather, it was to test and examine the functionality of the model in a setting parallel to one commonly used to study memory in rodent models.
Briefly, the model alternately explored a T-maze and performed offline learning after collecting enough experiences. As with rodents learning to complete the task, the model generated actions that directly affected the sensory inputs that formed the episodic sequences. Thus, the learning task was two-fold: encoding the experienced sequences to enable accurate prediction of upcoming transitions and adaptive selection of actions to collect rewards. This is fundamentally different from the benchmark tests presented above in that the question is not whether the network can simply recall a training sequence.
Similar to the paradigm discussed previously, we performed another simulation to test both performance and the match with empirical data from the hippocampus. We set the sensory input to both the consolidator and the cache as a concatenated binary vector data stream at each time step, with separate channels for the visual observation, the action, and the presence of the reward.
During training, the R2N2 model learns from successive trials, and its capability to generate correct responses increases over time; this can be treated as a biased trajectory-sampling process over the distribution of all possible behaviors. After 500 epochs of training (each epoch is composed of 100 trials), the consolidator-cache system successfully mastered the task with a correct performance rate above 90% (see Figure 8D), comparable to the results of animal experiments (see Figure 8A), in which the rat learns the task after around eight sessions (15 to 20 minutes per session). The performance rise is driven by a series of gradually decreasing replay epochs of training (see Figure 8E). Note that since we did not intend to model the exact change in reverse replay frequency observed in animals (see Figure 8B), the correct rate curve of the model (see Figure 8D) may be distorted if the decrease in replay frequency is slower in the initial stage. Nonetheless, as long as the model undergoes sufficient reverse replay, the final correct ratio will eventually converge to the same level.
Figure 8: Model versus rat behavior in navigation tasks, in comparison with results reproduced with permission from Shin et al. (2019). Panels A, B, and C are from Figure 4 in Shin et al. (2019). Panels D, E, and F are results from the consolidator-cache model. (A, D) Throughout training, the model and animal performance steadily increase. (B, E) During the training process, the replay rate gradually decreases as the performance increases. (C) Animals have relatively balanced forward and reverse replay in all training sessions regardless of performance. (F) In the consolidator-cache system, the replay rate is balanced by definition, as forward replay involves learning of the backward circuits and backward replay trains the forward circuits.
In addition, since this system requires training of both the forward and backward circuits in the consolidator, the replay rate for both directions is balanced by definition (see Figure 8F). These results account for the empirical finding that as the animal becomes familiar with the task, replay events occur less often (Shin et al., 2019; see Figures 8A, 8B, and 8C). To investigate the representation of the task in the system, we sorted neurons' normalized firing rates according to the distance between the starting point and the position at which each neuron fires most. Similar to previous work (Ziv et al., 2013; Driscoll et al., 2017), the results (see Figure 7B) show that a place cell-like structure emerged after training, and some neurons are biased toward crucial positions such as the end of the cue and the turning point. We verified this by calculating the place representation density for different regions of the task (see Figure 7C). First, we categorized the task into four types of regions: the cue region, the turning-point region where the rat is about to make a decision, the reward region, and all other regions. Next, for both left and right trials, we calculated the place representation density for each region type by dividing the number of cells with their highest firing rate in that region by the region length. The results show that the density for the ordinary regions was significantly lower than that of the three crucial regions.
4 Discussion
4.1 The R2N2 Model
The massive success of deep learning models, and their similarities with biological neural networks at both the behavioral and neural dynamics levels, has drawn the attention of the neuroscience community to many questions. One of the most crucial is how full error gradient learning in RNNs might be implemented in the brain, given the brain's recurrent (lateral connections) and hierarchical (multilayer) connectivity.
To form episodic memories, brains need to encode protracted temporal sequences of states with high efficiency. RNNs can accomplish this task when trained with the biologically implausible machine learning algorithm BPTT. To build a biologically plausible alternative to BPTT, we divided the problem into three pieces and applied solutions to each to form a novel integrated learning system: R2N2. The first two pieces eliminate the need for an external record of the spatiotemporal activity pattern during offline learning. The first establishes a way to reconstruct the input sequence activity in the backward direction; this was solved with an RNN using a reciprocating weight structure. The second establishes a way to reconstruct the recurrent layer activity; this was solved in the consolidator network with a pair of competing, complementary subnetworks. Finally, the third piece eliminates the need for either weight transport or reversible synapses to backpropagate error; this is solved using feedback alignment.
Consequently, R2N2 consists of two major RNNs: (1) the consolidator, the main network that is to be trained and that serves as a long-term memory store for inference, and (2) the cache, an auxiliary network that supports the training of the consolidator by performing one-shot learning of the input layer activity sequences and thereby providing training samples to the consolidator. Despite the constraint that only local information is available at synapses, a seemingly obvious biological constraint, much of the existing work on this problem has focused on online learning as a workaround (see Whittington & Bogacz, 2019, for a comprehensive and in-depth discussion). For example, Whittington and Bogacz (2017), Han et al. (2018), Ororbia and Kifer (2020), and Song et al. (2020) employ the predictive coding approach to address the issue of complex global error propagation using stateful neurons combined with error-correcting units. Some models, like Scellier and Bengio (2017), take an energy-based approach to the error signal estimation issue. Other models focus on the biological realization of such learning algorithms; Guerguiev et al. (2017) use apical dendrites to perform error propagation. Given certain assumptions, these frameworks and their variants have been proved mathematically equivalent to backpropagation (Song et al., 2020). It is also worth noting that these models are not mutually exclusive; as Whittington and Bogacz (2019) highlight, they can converge to account for multifaceted learning in the brain. However, unlike these online alternatives to BPTT, R2N2's underlying principle is that it has a backward phase, inspired by learning-related reverse replay phenomena, to compute and assign credit to synapses in recurrent projections without violating the locality constraint.
The ability to propagate the error feedback signal through numerous time steps may account for R2N2's advantage over previous localist supervised sequence-learning models such as reservoir networks (Maass et al., 2002; Jaeger, 2002) and RFLO (Murray, 2019): it can extend the error gradient further back in time.
Since the consolidator is gradually trained to perform reverse replay nearly perfectly, we further speculate that the consolidator could, in turn, train other consolidator instances in the cortex in a bootstrapped manner, implementing distributed knowledge representation across distinct brain regions.
4.2 Biological Implications
Our model bears some similarity to the complementary learning systems (CLS) framework regarding the relative roles of the hippocampus and cortex. Typically the hippocampus is cast as the fast learner and the cortex as the slower learner (McClelland et al., 1995), and more recently the role of replay has been incorporated into the framework (Kumaran et al., 2016). The R2N2 model suggests that the cache and consolidator functions (analogous to fast and statistical learning, respectively) may both be carried out within the hippocampal region, as well as between the hippocampus and neocortex. For example, the cache could be implemented in CA3 pyramidal neurons with recurrent lateral excitatory projections, which exhibit the same arbitrary spatial-association and pattern-completion capabilities. With a trainer providing reversed sequence samples, the consolidator, which could be a circuit in the entorhinal cortex receiving inputs from CA3, could learn the statistics of the data stream and solidify short-term memory in the hippocampus into longer-lasting memories. Assembled together, this system could be triggered and tuned by reward-related signals, as in our T-maze simulation, to ensure that the sequence being replayed and learned is rewarded and beneficial to the animal, as has been found in the hippocampus (Ambrose et al., 2016).
Recent work has similarly argued that both fast and statistical learning may take place within the hippocampus, with the entorhinal cortex-to-CA1 pathway providing statistical learning and the dentate gyrus-to-CA3-to-CA1 pathway providing fast learning (Schapiro et al., 2017). The R2N2 model is consistent with this anatomical delineation but does not exclude other possible functional mappings. Another implication of R2N2 is its utilization of the reverse-replay phenomenon found in the hippocampus (Foster & Wilson, 2006; Diba & Buzsáki, 2007), the key element that drives the whole model to learn sequences. Many existing models of reverse and forward replay (Haga & Fukai, 2018; Evangelista et al., 2020), however, either do not account for sequence learning at all or have limited learning capacity. They are limited because they are built on handcrafted attractor connectivity patterns and thus usually have only one or a few spatially clustered neurons active at each moment, which is functionally equivalent to a one-hot encoding and caps their learning capacity at the number of neurons. The consolidator in our model, by contrast, builds connectivity matrices for the reverse replay of arbitrary neuronal activation-pattern sequences without any prior assumptions about the spatial distribution of synapse strengths, which is far more flexible and biologically realistic given the high-dimensional nature of spiking activity in the brain. This also matches previous observations that reverse replay in the brain is key for sequence learning (Diba & Buzsáki, 2007; Hemberger et al., 2019; Schuck & Niv, 2019; Vaz et al., 2020; Eichenlaub et al., 2020; Fernández-Ruiz et al., 2019; Michon et al., 2019). It implies that the hippocampal-cortical system may be a neural instantiation of BPTT, and our proposed model might account for the underlying mechanism of reverse replay as well as its computational role in learning.
A possible neural realization of this consolidator-cache system is the entorhinal-hippocampal communication system. First, empirical evidence shows that it is the backward-running phase (reverse replay), rather than the forward-running phase (forward replay), in the hippocampus during immobility that is crucial for the animal's later performance in the task environment (Ambrose et al., 2016). More recent studies show that prolonged reverse replay enhances task performance (Fernández-Ruiz et al., 2019), whereas disrupted reverse replay leads to failures in task performance (Michon et al., 2019). Second, R2N2 also suggests the importance of internal clock signals, as the consolidator and cache each oscillate to generate state updates. The importance of oscillating clock signals is consistent with several hippocampal cell types that show either their greatest or their smallest activity levels at the peak of the theta cycle or during a ripple (Klausberger & Somogyi, 2008). Our simulation results also reveal that during learning, the system exhibits internal representations and characteristics similar to the place cells and replay-rate effects observed in previous studies (Ziv et al., 2013; Driscoll et al., 2017; Shin et al., 2019).
Together, these observations imply the existence of offline backward learning in recurrent neuronal networks, conceptually isomorphic with the temporal unfolding process in BPTT. The current implementation of R2N2, however, has some limitations. For instance, forward and reverse replay run at the same speed in our simulation, whereas reverse replay in the hippocampus is usually highly compressed in time relative to the forward-running process (Foster & Wilson, 2006).
This temporal symmetry arises because we used the same time constant in equation 2.1 for both the forward and reverse modes. In future research, we will explore addressing this issue at the level of spiking networks, since the speed change is caused by a reduction in spike intervals; this can be implemented with a discrete version of the consolidator that preserves the symmetry while allowing the timing to be tuned freely. The speed of network evolution may also be controlled by a clocking mechanism similar to a CPU clock, in which the frequency of an oscillatory signal, as in Figure 6, sets the speed of the network. Furthermore, the application of R2N2 to the T-maze task shown here is limited in that the model is trained only on rewarded trials; we did this purely to demonstrate the computational and modeling power of R2N2. There are several potential directions for evolving R2N2 into a more realistic model of learning in recurrent neural networks. For instance, we could adopt a pretraining/fine-tuning approach, first training the model to learn the task dynamics (unbiased prediction-error reduction) regardless of reward and then selectively training it to build a reward preference. This strategy resembles the one now common among large autoregressive language models (Brown et al., 2020), which likewise formalize complex tasks as a simple sequence-prediction problem and have been shown to share computational principles with humans (Goldstein et al., 2022).
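As a sketch of this clocking idea, a generic leaky update of the form used in equation 2.1 can expose the time constant as a parameter. The symbols and values below are illustrative, not our exact formulation, but they show how a smaller reverse-phase time constant would compress replay in time:

```python
import numpy as np

# Illustrative leaky-RNN update (generic form; not the paper's exact
# equation 2.1). With the same tau in both directions, forward and
# reverse replay unfold at the same speed; a smaller tau in the reverse
# phase would advance the state faster, compressing replay in time.

def leaky_step(h, x, W, U, dt=1.0, tau=10.0):
    # h decays toward the recurrent-plus-input drive at rate dt/tau
    return h + (dt / tau) * (-h + np.tanh(W @ h + U @ x))

rng = np.random.default_rng(1)
n = 32
W, U = rng.normal(0, 0.1, (n, n)), rng.normal(0, 0.1, (n, n))
h, x = np.zeros(n), rng.normal(size=n)
h_fwd = leaky_step(h, x, W, U, tau=10.0)  # forward-speed update
h_rev = leaky_step(h, x, W, U, tau=2.0)   # faster, time-compressed update
```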
Another limitation concerns depth: because the current implementation relies on feedback alignment (FA) to propagate error signals across layers, it is difficult to build spatially deep architectures with the consolidator, which makes the framework hard to apply to tasks with high expressive-power requirements (Raghu et al., 2017). A further limitation lies in the cache network. Because we currently model the cache as a classical Hopfield network, it inherits that network's linear storage capacity, which prevents its use in tasks with an extremely vast input space, such as language modeling. A potential solution is to replace the classical Hopfield network with modern associative memory networks (Ramsauer et al., 2020; Krotov & Hopfield, 2020), which show that an RNN employing higher-order energy functions through recurrent interactions can store continuous-time, continuous-state variables far more efficiently than the classical Hopfield network while remaining biologically plausible.
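As a sketch of why such networks relax the capacity limit, one retrieval step in the Ramsauer et al. (2020) formulation is a softmax-weighted readout over the stored patterns. The dimensions and values below are illustrative:

```python
import numpy as np

# Sketch of modern-Hopfield retrieval (Ramsauer et al., 2020): each
# update step is a softmax attention over the stored patterns X.
# Capacity grows far beyond the ~0.14N limit of the classical binary
# Hopfield network.

def modern_hopfield_retrieve(X, query, beta=8.0, steps=3):
    """X: (num_patterns, dim) stored continuous patterns; query: (dim,)."""
    xi = query
    for _ in range(steps):
        scores = beta * (X @ xi)       # similarity to each stored pattern
        p = np.exp(scores - scores.max())
        p /= p.sum()                   # softmax over stored patterns
        xi = X.T @ p                   # move toward the best-matching pattern
    return xi

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 100))             # 50 stored patterns, dim 100
noisy = X[7] + 0.3 * rng.normal(size=100)  # corrupted retrieval cue
recalled = modern_hopfield_retrieve(X, noisy)
print(np.argmax(X @ recalled))             # expected: 7
```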
4.3 Summary
In summary, this article provides a new possible approach by which biological RNNs can learn sequential tasks: R2N2. R2N2 memorizes sequences in one shot and transfers the experience into long-lasting synaptic changes through reverse replay. For cognitive tasks, the consolidator-cache system treats different task types under a unified sequence-prediction framework and solves them using reward as the signal for reverse replay. The whole process is based on competition between different synaptic projections, both within the consolidator and within the cache, and requires neither nonlocal information nor weight symmetry. Compared with other online alternatives to BPTT, R2N2 better propagates temporal information during training and thus performs better on some tasks. This computational advantage is driven by the use of reverse replay as an error-propagating mechanism, which also yields several experimentally testable predictions for future research on sequence learning in the brain. First, an imbalanced synaptic projection (e.g., a decreased excitatory level in one projection) between neural assemblies should impair reverse replay, since in our model reverse replay in the consolidator relies on competition among the projections between neuronal groups. Second, because in the cache an internally generated pseudo-periodic signal drives the transitions between firing-pattern attractors, one may expect to induce reverse replay with external periodic signals acting on the gating neurons of the CA3 network, or to corrupt reverse replay with aperiodic perturbations.
5 Conclusion
In this article, we develop a novel learning system, R2N2, to address the long-standing question of biologically plausible learning in RNNs. It comprises two components: a fast RNN that stores and replays experiences (the cache) and a statistical-learning RNN (the consolidator).
We have shown that R2N2 can run itself in reverse and, across a variety of tasks, outperforms other models by virtue of recurrent weight updates computed in the backward phase. In addition, by applying the model to a rat navigation task, we demonstrated its power as a complete sequence-learning system and showed that it captures several phenomena observed in previous experiments, such as balanced replay and place-cell-like encodings.
Appendix A: Forward Simulation and Hidden-Layer Error Computation
We include a leaky term in the above update equations to maintain consistency with the leaky nature of the activity updates in equation 2.1.
When connecting with the cache network, since we adopt a sequence-prediction paradigm, the desired output sequence is the activation sequence of the cache, which can itself perform reverse replay to provide the reversed desired output sequence required in equation A.5.
Appendix B: Pseudocode for the R2N2 Architecture
R2N2 has two main components: the consolidator and the cache. When learning sequences, the sequence information is first stored in the cache, forming a fast, erasable, and nongeneralizing memory. Then, through reverse replay, the cache provides the consolidator with the task information, which the consolidator learns. The temporal procedure of R2N2 learning is shown in algorithm 1.
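The following toy Python rendering paraphrases this procedure at a high level. All class and function names are placeholders, and the cache and consolidator are reduced to caricatures (verbatim storage and a linear delta rule, respectively); see algorithm 1 and the released code for the actual method.

```python
import numpy as np

# Toy, self-contained paraphrase of the R2N2 learning loop described
# above (not the released implementation). The cache is reduced to
# verbatim storage with reversed playback; the consolidator is a linear
# map trained on each reversed transition with a local delta rule.

class ToyCache:
    def store(self, seq):
        self.seq = np.asarray(seq)              # one-shot memorization
    def replay(self, direction="reverse"):
        return self.seq[::-1] if direction == "reverse" else self.seq

class ToyConsolidator:
    def __init__(self, dim, lr=0.05):
        self.W = np.zeros((dim, dim))           # recurrent transition weights
        self.lr = lr
    def train_on_replay(self, targets):
        for prev, nxt in zip(targets[:-1], targets[1:]):
            err = self.W @ prev - nxt           # transition prediction error
            self.W -= self.lr * np.outer(err, prev)  # local update

rng = np.random.default_rng(3)
episode = rng.normal(size=(20, 16))             # one experienced sequence
cache, consolidator = ToyCache(), ToyConsolidator(16)
cache.store(episode)                            # fast one-shot storage
for _ in range(200):                            # offline replay epochs
    consolidator.train_on_replay(cache.replay("reverse"))
```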
Appendix C: Hyperparameter Settings
This appendix lists the hyperparameter settings used for the experiments reported in the main results.
Sequence Memorization Task for the Consolidator Network.
| Hyperparameter | Value |
|---|---|
| Number of neurons (group A + group B) | |
| Learning rate | 0.005 |
| | 10 |
| Time steps | 19 |
Task.
| Hyperparameter | Value |
|---|---|
| Number of neurons (group A + group B) in the consolidator RNN | 32 + 32 |
| Number of hidden neurons in the ESN RNN | 64 |
| Number of neurons in the RFLO RNN | 64 |
| Number of neurons in the BPTT RNN | 64 |
| Learning rate (shared by all RNNs) | 0.001 |
| (shared by all RNNs) | 10 |
One-Shot Sequence Memorization Task for the Cache Network.
| Hyperparameter | Value |
|---|---|
| Number of neurons | 500 |
| Sequence length | 570 |
| | 6 |
Declaration of Competing Interest
We declare no competing interests.
Acknowledgments
We thank Ehren Newman for extensive and helpful discussions and comments on the manuscript.
Data Availability
The code for the simulations in this article can be found at https://github.com/CogControlLab/R2N2.