Memory is a fundamental part of computational systems like the human brain. Theoretical models identify memories as attractors of neural network activity patterns, based on the theory that attractor (recurrent) neural networks are able to capture some crucial characteristics of memory, such as encoding, storage, retrieval, and long-term and working memory. In such networks, long-term storage of the memory patterns is enabled by synaptic strengths that are adjusted according to some activity-dependent plasticity mechanisms (of which the most widely recognized is the Hebbian rule) such that the attractors of the network dynamics represent the stored memories. Most previous studies on associative memory have focused on Hopfield-like binary networks, and the learned patterns are often assumed to be uncorrelated so that interactions between memories are minimal. In this letter, we restrict our attention to a more biologically plausible attractor network model and study the neuronal representations of correlated patterns. We have examined the role of saliency weights in memory dynamics. Our results demonstrate that the retrieval process of the memorized patterns is characterized by the saliency distribution, which affects the landscape of the attractors. We have established the conditions under which the network state converges to a unique memory or to multiple memories. The analytical result also holds for variable coding levels and nonbinary patterns, indicating a general property emerging from correlated memories. Our results confirm the advantage of computing with graded-response neurons over binary neurons (i.e., the reduction of spurious states). It was also found that a nonuniform saliency distribution can contribute to the disappearance of spurious states when they exist.
The brain is able to encode and store representations of the external world through memory. Electrophysiological recordings in various brain regions have suggested that human memory, involving the neocortex for short-term memory and the hippocampus for long-term memory (Jensen & Lisman, 2004; Bird & Burgess, 2008), has a structure similar to that of computer memories. However, the memory mechanisms are fundamentally different: while classic computer memories rely on a static approach, biological memories are based on firing activities of neurons driven by internal dynamics such that memories are represented as stable network activity states called attractors (Hopfield, 1982; Amit, Gutfreund, & Sompolinsky, 1985a).
Attractor networks have been central to neuronal models of memory for more than three decades for both experimentalists and theorists (Amit, Gutfreund, & Sompolinsky, 1985b; Dayan & Willshaw, 1991; Chechik, Meilijson, & Ruppin, 2001; Pantic, Torres, Kappen, & Gielen, 2002; Amit & Mongillo, 2003; Poucet & Save, 2005; Tsodyks, 2005), as they possess such an appealing feature that the underlying attractor dynamics theory can provide a unified description of several aspects of memory, including encoding, storage, retrieval, and long-term and short-term memory. The hypothesis of attractor dynamics is supported by the persistent activity observed in the neocortex and hippocampus during memory experiments (Miyashita, 1988; Ericson & Desimone, 1999; Bakker, Kirwan, Miller, & Stark, 1999). It was also shown that hippocampal region CA3, characterized by heavy recurrent connections and modifiability, could be an anatomical substrate where the attractor networks reside (Treves & Rolls, 1994; Lisman, 1999; Moser, Kropff, & Moser, 2008).
Attractor networks perform a sparse encoding of memories by mapping a continuous input space to a sparse output space, which is composed of a discrete set of attractors. When a stimulus pattern close to a stored pattern is presented to the system, the network states are drawn by the intrinsic dynamics toward the attractor that corresponds to the memorized pattern. One important feature of attractor networks is that memory retrieval depends on the attractor landscape, which reflects the high-level capabilities of pattern completion (recalling the original input from a degraded version) and pattern separation (doing so without mixing in the other stored memories). The formation of the attractor landscape is achieved by Hebbian-type synaptic modifications (Hebb, 1949) in which each synapse is involved in the storage of multiple items. This common synaptic representation implies interactions between memories stored in the same network. The associative networks studied in previous work are successful in capturing some fundamental characteristics of associative memory, such as memory capacity and associative capability (Amit et al., 1985a; Mueller & Herz, 1999; Chechik et al., 2001).
Most previous studies on associative neural network models suffer from two limitations. First, the models are restricted to networks with binary (0/1 or −1/1) neurons or saturating graded-response (e.g., sigmoidal) neurons, which cannot capture the fact that biological neurons seldom fire at saturation. The steady states of such networks can also arise from saturation, thus leading to spurious attractor states. Second, in these models, the patterns are often assumed to be uncorrelated so that the interactions in the resulting memories are minimal. Real memories, however, are not restricted to uncorrelated patterns. Some direct evidence has emerged recently showing that attractor dynamics are used to encode and retrieve memories of correlated shapes of an animal's environment (Wills, Lever, Cacucci, Burgess, & O'Keefe, 2005; Leutgeb et al., 2005). Wills et al. (2005) recorded the firing activity of hippocampal place cells in the brains of freely moving rats exposed to a square or circular environment. When the rats initially explored the circular and the square environments, the ensemble activity of hippocampal cells differed between the two, thereby distinguishing them. To see how the neuronal representations changed in intermediate environments, the rats were then tested in a set of environments of intermediate shapes between the circle and the square. Surprisingly, most cells fired in a pattern that was either circle-like or square-like, and no gradual change in the representations was observed; the neurons abruptly and simultaneously switched from one activity pattern to the other at the same midpoint between the circle and the square. Wills et al. interpreted their results as strong support for the presence of two distinct attractors in the hippocampal network corresponding to the circular and square environments.
Many of the hippocampal neurons changed their representations sharply and coherently at a certain position along the morph sequence, where the basins of attraction of the two attractors meet. Similar experiments were performed by Leutgeb et al. (2005), but with different results. Instead of observing two distinct activity patterns, Leutgeb et al. reported that the neuronal representations of the subsequent environments in the morph sequence change gradually. The results indicate the flexibility of memory representation in attractor networks and put forth new challenges to provide a unified description using attractor networks' modality.
The theoretical work of Blumenfeld, Preminger, Sagi, and Tsodyks (2006) was the first attempt to tackle such challenges; they showed that attractor network models allow memory representations of correlated stimuli that depend on learning order. In their study, the Hopfield network (Hopfield, 1982) was used to study the attractor dynamics of long-term memory, where the states of neurons have binary values (+1 for active neuron and −1 for inactive neuron). A novel form of facilitated learning was proposed to explain that different learning protocols could be the reason for the incongruous experimental observations.
In this study, we aim to extend the work on associative memory by using more biologically plausible network models and exploring the role of saliency weights on the memory dynamics for the attractor networks. Our analytical result is consistent with the previous study that when network inputs are correlated, this mechanism results in overassociations or splitting of attractors. The model predicts that memory representations should be sensitive to the saliency weights formed by the learning mechanism, compatible with recent electrophysiological experiments on hippocampal place cells. Section 2 introduces the associative memory model. Section 3 presents the stability conditions of the attractor networks. Section 4 presents the memory dynamics analysis and the analytical solution for the correlated binary patterns and gives numerical examples. More general cases for variable coding level and nonbinary patterns and disappearance of spurious states are studied in sections 5 and 6. Finally, these results are discussed in section 7.
2. The Model
Hebbian synaptic plasticity has been the major paradigm for studying memory and self-organization in computational neuroscience. Within the associative memory framework, many Hebbian learning rules have been suggested in the neural network literature. However, the relationship between the input patterns and the landscape of attractors by using the Hebbian regime has not been established, and no procedure can robustly translate an arbitrary specification of an attractor landscape into a set of weights. One of the challenges is that knowledge in the network is distributed over connections; each connection participates in specifying multiple attractors. Therefore, developing neural networks as a practical memory device is still an intriguing problem that continues to attract interest (Zemel & Mozer, 2001; Siegelmann, 2008).
Through the Hebbian synaptic modification mechanism, correlations within patterns to be memorized are encoded in the synaptic weights. By this procedure, multiple patterns can be implemented as fixed-point attractors of the network dynamics. Starting from an initial state close to one of the stored attractors, the system dynamics relaxes to this attractor and thus retrieves the stored pattern. Associative memory storage in a dynamic system requires the existence of multiple attractors. The attractors are imprinted by the Hebbian rule.
In equation 2.3, the Hebb-like terms $(\xi_i^{\mu} - c)(\xi_j^{\mu} - c)$ are modulated by a corresponding saliency factor s(μ). This approach is inspired by the fact that the patterns presented for learning may acquire different saliencies, captured by the factor s(μ); a larger value of s(μ) gives pattern μ a larger contribution to the synaptic weights.
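As an illustration of the saliency-modulated Hebbian rule described above, the following is a minimal NumPy sketch. It only assumes what the text states (Hebb-like terms centered by the coding level c and scaled by s(μ)); the 1/N normalization, the zeroed diagonal, and the function name `saliency_hebbian_weights` are assumptions made for this example and may differ from equation 2.3.

```python
import numpy as np

def saliency_hebbian_weights(patterns, saliency, c):
    """Saliency-weighted Hebbian weight matrix (illustrative sketch).

    patterns : (P, N) array, one stored pattern per row
    saliency : (P,) array, one saliency factor s(mu) per pattern
    c        : coding level used to center the patterns
    """
    patterns = np.asarray(patterns, dtype=float)
    saliency = np.asarray(saliency, dtype=float)
    centered = patterns - c                     # (xi_i^mu - c)
    n = patterns.shape[1]
    # Sum over mu of s(mu) * (xi_i^mu - c) * (xi_j^mu - c).
    # Assumption: normalized by the number of neurons N.
    w = (centered.T * saliency) @ centered / n
    # Assumption: no self-connections.
    np.fill_diagonal(w, 0.0)
    return w

# Small usage example with three random binary patterns on eight neurons.
rng = np.random.default_rng(0)
xi = (rng.random((3, 8)) < 0.5).astype(float)
s = np.array([1.0, 0.5, 2.0])
w = saliency_hebbian_weights(xi, s, c=0.5)
```

Because each pattern enters through a symmetric outer product, the resulting matrix is symmetric for any choice of saliency factors.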
3. Stability Conditions
The model consists of neurons with unsaturating responses; thus, its stability has to be maintained by suitable interactions between the ensemble of neurons. The network dynamics has been analyzed in a series of theoretical studies (Hahnloser, 1998; Wersing, Beyn, & Ritter, 2001; Yi, Tan, & Lee, 2003; Tang, Tan, & Zhang, 2005; Tang, Tan, & Teoh, 2006), where conditions for fixed-point attractors and limit cycles were examined. These studies show that the stability properties (monostability and multistability) depend only on the synaptic strengths.
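To make the dependence of stability on the synaptic strengths concrete, here is a small sketch of a discrete-time linear-threshold network with parallel updates. The update rule x(t+1) = max(0, W x(t) + h), the function name `run_linear_threshold`, and the rescaling of W are assumptions for this example, not the conditions derived in the cited studies. The sketch uses one simple sufficient condition: if the spectral norm of W is below 1, the update is a contraction, so every initial state converges to the same fixed point (monostability).

```python
import numpy as np

def run_linear_threshold(w, h, x0, steps=500, tol=1e-10):
    """Iterate x <- max(0, W x + h) in parallel until the state stops changing."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x_next = np.maximum(0.0, w @ x + h)    # unsaturating rectification
        if np.max(np.abs(x_next - x)) < tol:
            return x_next
        x = x_next
    return x

rng = np.random.default_rng(1)
a = rng.normal(size=(6, 6))
w = (a + a.T) / 2.0                            # symmetric weights
w *= 0.9 / np.linalg.norm(w, 2)                # assumption: rescale to spectral norm 0.9
h = np.full(6, 0.2)                            # uniform external input (hypothetical value)

# Two different random initial states reach the same steady state.
xa = run_linear_threshold(w, h, rng.random(6))
xb = run_linear_threshold(w, h, rng.random(6))
```

Multistability, which the memory model relies on, requires weights outside this contraction regime, so this sketch illustrates only the monostable case.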
4. Memory Dynamics with Saliency Weights
The stability conditions of the previous section do not make any predictions about the memory retrieval process. We now turn to the memory dynamics and examine the role of saliency weights in the emergence of attractor states. We follow a procedure similar to that of Blumenfeld et al. (2006), based on the analysis of the overlap (a variable measuring the similarity between the network activity states and the stored patterns).
The network dynamics converges to a stable steady state, which is an attractor of the network. In contrast to the Hopfield network of binary neurons, where the retrieved memory is identified by the overlap value (Blumenfeld et al., 2006), the retrieved memory of our model is identified by the maximal overlap value over all the patterns, that is, by the pattern μ with the largest overlap, since the neuronal responses do not saturate to binary states.
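The retrieval criterion can be sketched in a few lines. The overlap formula used here, m^μ = (1/N) Σ_i (ξ_i^μ − c) x_i, is an assumption chosen to match the centered Hebb-like terms; the paper's exact definition may use a different normalization. The function names are hypothetical.

```python
import numpy as np

def overlaps(patterns, state, c):
    """Overlap of a network state with every stored pattern (assumed definition)."""
    patterns = np.asarray(patterns, dtype=float)
    state = np.asarray(state, dtype=float)
    return (patterns - c) @ state / patterns.shape[1]

def retrieved_pattern(patterns, state, c):
    """Index of the pattern with the maximal overlap."""
    return int(np.argmax(overlaps(patterns, state, c)))

# Three overlapping binary patterns and a graded (non-binary) network state.
xi = np.array([[1, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 1, 1]], dtype=float)
state = np.array([0.1, 0.9, 0.8, 0.0])

m = overlaps(xi, state, c=0.5)                 # approximately [0.025, 0.2, -0.025]
best = retrieved_pattern(xi, state, c=0.5)     # 1: the middle pattern wins
```

Taking the argmax rather than thresholding a single overlap reflects that the graded state need not coincide exactly with any stored binary vector.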
4.2. Uniform Saliency Yields a Single Attractor.
First, we show how uniform saliency weights affect the attractor states. Seventeen morphed patterns, formed as in Figure 1, are encoded into the synaptic weights by using equation 2.3, and the network consists of 32 neurons. In the simulation, we set s(μ) = 0.6 and external inputs . The network evolves from random initial configurations according to a parallel updating dynamics. A single attractor is reached (see Figure 3A), which corresponds to the midpoint of the morph sequence, μ = 0.5. In accordance with the theoretical prediction made in the previous section, the pattern $\xi^{0.5}$ takes the maximal overlap value (Figures 3B and 3C) among all the morph patterns when the network stabilizes.
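For readers who want to reproduce a comparable stimulus set, the sketch below builds one possible morph sequence of 17 binary patterns over 32 neurons. The construction (a block of active neurons that slides by one position per step, so that successive patterns differ in two neurons and the coding level stays at 0.5) is an assumption for illustration; the construction behind Figure 1 may differ. The function name `morph_sequence` is hypothetical.

```python
import numpy as np

def morph_sequence(n_neurons=32, n_patterns=17):
    """Gradually morph one binary pattern into its complement (illustrative)."""
    half = n_neurons // 2
    steps = n_patterns - 1
    if half % steps != 0:
        raise ValueError("half the neurons must be divisible by n_patterns - 1")
    per_step = half // steps
    patterns = np.zeros((n_patterns, n_neurons))
    for k in range(n_patterns):
        shift = k * per_step
        patterns[k, shift:shift + half] = 1.0  # sliding block of active neurons
    return patterns

xi = morph_sequence()
mu = np.linspace(0.0, 1.0, xi.shape[0])        # morph parameter of each pattern
uniform_saliency = np.full(xi.shape[0], 0.6)   # s(mu) = 0.6, as in the text
```

With this indexing, the pattern at μ = 0.5 is row 8, and neighboring patterns are strongly correlated while the two end patterns are complementary.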
4.3. Nonuniform Saliency Allows Multiple Attractors.
Next, we show the effects of nonuniform scaling of the Hebbian terms on the attractor landscape. In the simulation, we choose the saliency weight distribution $s(\mu) = 6(\mu - 0.5)^2$. The network starts from random initial states. When stabilized, the activities of the network converge to two distinct attractor states (see Figure 4A). In the two attractor states, the pattern μ = 0 (denoted by the square in the solid line) and the pattern μ = 1 (denoted by the circle in the dashed line) take the maximal overlap values, respectively (see Figure 4B), indicating that the retrieved memories are biased toward one of the edge patterns, depending on which is favored by the initial state (see Figures 4C and 4D). This result also verifies the prediction of the developed theory.
The above results are compatible with those of the theoretical work of Blumenfeld, Preminger, Sagi, and Tsodyks (2006) and are also consistent with the experiments of Wills, Lever, Cacucci, Burgess, and O'Keefe (2005), who interpreted their results as evidence for the presence of two distinct attractors corresponding to the circular and square environments. Instead of observing two distinct place-dependent activity patterns, Leutgeb et al. (2005) reported that the similarity between representations of the subsequent environments in the sequence gradually decreased. In their theoretical contribution, Blumenfeld et al. (2006) explained that the different phenomena could result from different learning protocols.
5. Different Coding Schemes
Our analysis in section 4 leads to a solution similar to that of Blumenfeld et al. (2006), based on the assumption that the stored memory patterns are binary vectors with coding level c = 0.5. This confirms that the saliency weight (scaling of Hebbian terms) has the same influence on the formation of the attractor landscape in binary neural networks and in linear-threshold neural networks. Now an interesting question arises: Does the solution also hold for other cases where c ≠ 0.5?
Since real neurons behave very differently from binary units, their activity can be approximated better by an analog than a binary variable. The biological plausibility and computational advantages of the graded-response networks with threshold linear units have been elucidated in previous analytical studies (Treves, 1990a, 1990b; Treves & Rolls, 1991; Roudi & Treves, 2003). Though with linear threshold neurons, the memory patterns encoded in synaptic weights can be taken to be binary vectors, as we did in the above studies, they can also be taken to be drawn from a distribution with several discrete activity values or from a continuous distribution. Hence, the second question arises: Does the conclusion also hold for graded-response networks storing nonbinary memory patterns?
As Treves and Rolls (1991) noted, the ternary distribution offers a good prototypical example to probe into the features emerging with nonbinary structures, and the exponential distribution is biologically meaningful because it is consistent with the continuous distribution demonstrated by experimental data.
5.1. Variable Coding Levels.
As in the extensions of the Hopfield model, we allow the parameter c to differ from the value c = 0.5. Figure 5 demonstrates the morphed binary memory patterns for different coding levels (c = 0.25 and 0.75), where nine patterns are encoded into a network with 32 neurons. In simulations, the network receives the same uniform background input b.
We vary c to examine retrieval performance and find that the saliency weights have a consistent effect on the attractor landscapes. For c = 0.25, the simulation results are given in Figure 6 for the uniform and nonuniform saliency distributions. For c = 0.75, the results are given in Figure 7 for both cases.
5.2. Nonbinary Memory Patterns.
To understand the effect of increasingly structured memories, it is necessary to consider nonbinary structured memories. The ternary patterns, representing the simplest nonbinary structure, become a natural choice for studying the features that emerged with nonbinary patterns (Treves, 1990a). Hence, we are prompted to use the ternary patterns in our numerical studies.
A sequence of correlated patterns is generated according to the ternary distribution in equation 5.1. Figure 8 shows the raster plot of nine correlated ternary patterns with the coding level c = 0.25, where the neural activities are indicated by gray scale.
In the numerical simulations, the network evolves from random initial configurations that are positively correlated with the stored patterns. We also vary the coding levels. The effects of uniform and nonuniform saliency weights on the attractor landscape are investigated, and the results are in accordance with those of the binary cases. The uniform saliency weight induces a single attractor in the middle of the morphed sequence, while the nonuniform saliency drives the network to retrieve one of the two extreme patterns. Figure 9 shows an example of such results for c = 0.25.
6. Disappearance of Spurious States
In attractor network models, individual memory (a pattern of activity distribution) is encoded as an attractor in an energy landscape. Whether the stored memory can be retrieved safely without being trapped by dynamical hurdles depends critically on two aspects: the smoothness of the energy landscape and the existence of other attractor states (namely, spurious states). Waugh, Marcus, and Westervelt (1990) showed analytically that neural networks with analog neurons have a computational advantage over their binary-neuron counterparts, as analog-valued neurons contribute to smoothing the energy landscape and thus greatly reduce the number of spurious states. The question of mixture states, related to the other aspects of the dynamical hurdles, was studied analytically in Roudi and Treves (2003), where the mean field solutions and their stability conditions were developed by considering an associative network with threshold-linear units. It was revealed that symmetric n-mixture states, n = 2, 3, …, are almost never stable, and only with a binary coding scheme can a limited region of the parameter space be found in which either 2- or 3-mixtures are stable. This property demonstrates further the advantage of computation with graded-response neurons, as the stability region of the spurious states is eliminated by nonbinary coding schemes that are naturally endowed with the network of graded-response units.
Two questions remain. First, when the same binary coding scheme is used, is the stability of mixture states more restricted in a graded-response network than in a binary-valued network? Second, can the saliency weights contribute to smoothing the attractor landscape (or reducing the mixture states)?
6.1. Networks of Binary Neurons and Graded-Response Neurons.
First, we consider 2-mixture states in a network of binary neurons and a network of graded-response (linear-threshold) neurons, respectively. The encoded memory patterns are generated using the same binary coding scheme described in equation 5.1, with N = 1000 and P = 3. For large N, the patterns are effectively uncorrelated.
We run the networks from the same initial configuration, which is correlated equally with two patterns. The simulation results are shown in Figure 10. For the same saliency distribution, the binary network typically evolves toward one of the spurious states (see Figure 10, left). In contrast, the spurious state disappears in the graded-response network, as shown in Figure 10 (right), in which one of the overlaps tends to dominate, reaching the corresponding attractor, whereas the other one tends to zero. This result illustrates that given the same saliency distribution, different attractor landscapes can be expected in networks with binary neurons and linear-threshold neurons. It is also in agreement with previous findings that the free-energy landscape is smoothed by using analog neurons (Waugh et al., 1990; Roudi & Treves, 2003).
Next, we investigated the effect of different saliency distributions on mixture states in the linear-threshold network. Our computer simulations show that when mixture states exist, a nonuniform saliency distribution contributes to eliminating them, as shown in Figure 11. In the simulations, the 3-mixture states are studied, and the initial states are correlated equally with three stored patterns. With a uniform distribution, a mixture state is reached (two overlaps tend to grow; see Figure 11, left), while with a nonuniform distribution, a single overlap dominates the others (see Figure 11, right), implying that the corresponding memory pattern is retrieved.
6.2. Related Work.
Through extensive numerical studies on more general cases, including variable coding levels and nonbinary patterns, it is shown that our solution is not restricted to the particular case for gradually changing binary memories. In fact, reshaping of the attractor landscape through a different saliency distribution is a robust property emerging from correlated memories stored in the graded-response neural networks. Our results also confirmed the computational advantages of graded-response neurons (i.e., eliminating mixture states). Interestingly, a nonuniform saliency can also contribute to the disappearance of spurious states when they exist.
From the modeling aspect, Treves's model, equation 6.3, is more attractive for encoding the memories of pattern activities, being more flexible than the LTA model, which was also studied in an application to analog associative memories (Tang et al., 2006). In the former, the neural activities are modulated separately by the b term, an approximation of the lumped inhibition effects, and the synaptic weights serve for information encoding only. In contrast, the LTA model mixes the roles of memory encoding and activity (stability) control into a single term of synaptic weights, without distinguishing excitatory and inhibitory contributions. Despite the difference in modeling local inputs, similar memory retrieval properties have been observed for the two models (see Figures 10 and 1 in Roudi & Treves, 2003) on uncorrelated memories. Hence, our results can be considered complementary to such studies on associative memories in networks with graded-response neurons.
In this contribution we studied an attractor network with a biologically realistic model for the associative memory of correlated patterns. Through theoretical analysis, we examined the mechanisms for encoding and retrieving memories and the dynamics of memory representations. It was shown that saliency weights can dramatically affect the resulting memory representations. The model identifies memories as attractor states of a network whose weights adhere to a Hebb-like process of long-term synaptic modification, and these memories are retrieved via the internal dynamics of patterned network activity. The results are in line with the recent experimental observations by Wills et al. (2005) and extend the previous theoretical contributions by Blumenfeld et al. (2006). This model allows the merging and splitting of attractors, which leads to overassociations (several memories merging into one) or complete associations (retrieving all memory items), and supports the flexibility of the attractor representations through the novelty-facilitated learning mechanisms proposed by Blumenfeld et al. (2006). Based on their novelty-facilitated learning scheme, the incongruous findings in Wills et al. (2005) and Leutgeb et al. (2005) could be formulated as the results of different learning orders.
Our model, representing a large class of recurrent network models, provides a convenient framework for studying long-term memory and its relation to pattern separation and pattern completion. The model is also useful for investigating the design of the weights for different input patterns and the effects of the learning protocol on memory representations. For a system in the regime of activity-dependent plasticity, given enough exposure time and a stable sensory environment, it would create a stable attractor state for each item with an equal basin of attraction. The novelty-facilitated or history-dependent learning strategy allows reshaping the profile of saliency weights and thus strengthens or weakens the impact of individual memorized items. The basins of attraction would then be expected to enlarge or shrink, resulting in the merging or splitting of memories.
Appendix: Analysis of Attractor States
The following analysis gives the procedure for finding attractor states through the overlapping dynamics mμ(t).
We are grateful to the anonymous reviewers for their helpful comments and constructive suggestions.
Throughout this letter, the coding level is defined as the mean level of activity of the network (Treves, 1990a; Pantic et al., 2002). In some previous work it is also defined as a fraction of firing neurons (e.g., Chechik et al., 2001). For binary 0/1 neurons, they are equivalent; otherwise they are not.
A different overlap formulation was used there: , is the mean of the network states. It is noted that .