Recurrent neural architectures with oscillatory dynamics use rhythmic network activity to represent patterns stored in short-term memory. Multiple stored patterns can be retained in memory over the same neural substrate because the network's state persistently switches between them. Here we present a simple oscillatory memory that extends the dynamic threshold approach of Horn and Usher (1991) by including weight decay. The modified model is able to match behavioral data from human subjects performing a running memory span task simply by assuming appropriate weight decay rates. The results suggest that simple oscillatory memories incorporating weight decay capture at least some key properties of human short-term memory. We examine the implications of the results for theories about the relative role of interference and decay in forgetting, and hypothesize that adjustments of activity decay rate may be an important aspect of human attentional mechanisms.
During the past several years, there has been growing interest in developing oscillatory neural network models for a variety of different tasks. These models consist of recurrent neural networks whose dynamics are characterized by persistent learned or designed rhythmic activity. For example, oscillatory networks simulating biological central pattern generators have been studied both to investigate the mechanisms of locomotion in a number of animal species (Ijspeert, 2001) and as the basis for controlling movements of legged robots (Kimura, Fukuoka, Hada, & Takase, 2002). They have also been used to test hypotheses about the biophysical mechanisms underlying human electroencephalographic (EEG) activity (Nunez & Srinivasan, 2006) and for a variety of image processing tasks such as segmentation (Chen & Wang, 2002).
In this letter, we are concerned with the problem of developing oscillatory neural network models of short-term memory, the human memory system that retains information over brief time intervals (on the order of seconds) and has substantial capacity limitations, in contrast to the relatively limitless capacity of more permanent long-term memory (Baddeley, 2000; Cowan, 2001). For example, current evidence suggests that human short-term memory capacity is approximately four items (Cowan et al., 2005). In neurocomputational terms, long-term memories are typically viewed as being represented by the weights on connections in a neural network, while short-term memories are often represented by sustained activity patterns. Thus, in many neural models of associative memory, learned patterns are stored in a weight matrix. An initial network activity state, either assigned or determined by an input pattern and consisting of a partial or noisy activity pattern that is reasonably close to a previously stored memory, serves as a query as to the current contents of memory. This initial state evolves over time to a fixed-point attractor state (equilibrium state) that ideally corresponds to a recalled memory pattern that is active in short-term memory (Amit, 1989; Haykin, 1999).
While neural networks using fixed-point attractor states can be effective as memory models and have generated substantial theoretical and experimental analysis, they are typically limited to maintaining a single pattern at a time in short-term memory. Further, it is difficult to relate activity in these models to neurobiological systems where rhythmic activity, rather than fixed-point attractor states, is the rule (Buzsaki, 2006). In response to these and other concerns, a number of oscillatory memory models have been created and studied during the past several years. In these models, stored or recalled memory patterns are typically represented as rhythmic network activity in which multiple memory patterns are simultaneously active over the same neural substrate. This is possible because the network's activity oscillates between activity states representing different stored patterns.
A remarkably diverse set of oscillatory memory models exists today. Some models are based on theories about the mechanisms underlying theta and gamma activity in specific brain regions such as the hippocampus or neocortex (Hasselmo, Bodelon, & Wyble, 2002; Ingber, 1995; Lisman & Idiart, 1995). For example, Koene and Hasselmo (2007, 2008) use a model of entorhinal cortex based on integrate-and-fire neurons that exhibit persistent spiking to show how theta rhythm may help explain order and capacity effects in recall. Other models that also use individual spiking neurons are based on more abstract architectures (Raffone & Wolters, 2001), while still others have adopted a higher-level approach such as Wilson-Cowan oscillators (Chakravarthy & Ghosh, 1996; Hayashi, 1994; Wang, 1995). Many of these past oscillatory models are quite complex (see Howard, in press, for a review).
A particularly simple and elegant approach to creating oscillatory short-term memory models is based on minimally modifying Hebbian associative memories having fixed-point attractor states so that they become oscillatory. In the following, we refer to all such models as simple oscillatory memories. For example, Horn and Usher (1991) produced a simple oscillatory memory by introducing dynamic thresholds into Hopfield networks (Hopfield, 1982; Amit, 1989). With this approach, whenever a node has a particular activity level ±1, the threshold of that node gradually changes so that eventually the node switches its activity level to the complementary value. When such a network is presented with an input that is a superposition of multiple stored memories, it is found to oscillate between activity states that represent these individual memories, thereby indicating its recognition and recall of the memories in parallel. Similar behaviors have been produced based on Hopfield networks modified to use dynamic synapses (Pantic, Torres, Kappen, & Gielen, 2002) or negative feedback with asymmetric connection weights (Brown & Collins, 2000).
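The dynamic-threshold mechanism can be illustrated with a minimal sketch (our own illustration under simplifying assumptions, not Horn and Usher's exact equations; the `gain` parameter and update schedule are hypothetical). Each node's threshold slowly tracks its own activity, so a node that holds one value accumulates a threshold that eventually forces it to switch to the complementary value, and the network never freezes at a fixed point:

```python
import numpy as np

def store(patterns):
    """Hebbian weight matrix for a set of +/-1 patterns (zero diagonal)."""
    N = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / N
    np.fill_diagonal(W, 0.0)
    return W

def run(W, a, steps=50, gain=0.2):
    """Binary threshold updates with a slowly adapting threshold per node.
    Each threshold drifts toward its node's recent activity, so a node that
    holds one value eventually flips to the complementary value."""
    a = a.astype(int)
    theta = np.zeros(len(a))
    history = []
    for _ in range(steps):
        theta += gain * (a - theta)          # threshold tracks activity
        a = np.where(W @ a - theta >= 0, 1, -1)
        history.append(a.copy())
    return history
```

Started from a single stored pattern, a network like this does not remain at that pattern indefinitely; in the full model, an input that superposes several stored patterns produces oscillation among them.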
Simple oscillatory memories derived from Hopfield networks are intriguing in their simplicity as models of short-term memory. In this letter, we present a simple oscillatory memory based on dynamic thresholds as used by Horn and Usher (1991), except that our model is extended to include rapid decay of connection weights. This weight decay makes the network sensitive to the order in which it sees input stimuli, something that is not the case with classical Hopfield networks. Further, it lets us examine the relative roles of interference and decay as mechanisms underlying forgetting. To evaluate the model, we use data that we collect from human subjects performing a running memory span task. This task involves rapidly presenting a sequence of stimuli that are to be recalled subsequently by the subject. We show that our model can demonstrate recall performance similar to the behavioral data that we obtain: a capacity limit of approximately three items and a prominent recency effect. To our knowledge, no previous work has examined how the performance of simple oscillatory memories compares to behavioral data collected from human subjects on a short-term memory task as we do here.
2.1. Model Description.
Short-term memory is modeled using a fully connected network of 35 linear threshold nodes, similar to many past neural network models. There are two possible values for the activity state ai of each node i, −1 and 1, and these values change over time, governed by the activity state of the network and the connection strengths, as described below. Memory patterns to be stored are essentially arbitrary. However, to facilitate visual interpretation of the network's state, the nodes in the network are displayed as a 7 × 5 array, and each memory state is taken to correspond to a specific letter (A–Z), represented so as to visually resemble that letter, as illustrated in Figure 1.
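The encoding of a letter as a network state can be made concrete with a short sketch. The glyph below is a hypothetical rendering of the letter T, not the exact bitmaps used in Figure 1:

```python
import numpy as np

# Hypothetical 7x5 glyph for the letter T ('#' -> +1, '.' -> -1); the actual
# bitmaps of Figure 1 are not reproduced here.
T_GLYPH = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]

def glyph_to_pattern(rows):
    """Flatten a 7x5 glyph into a length-35 vector of +/-1 node activities."""
    return np.array([1 if ch == "#" else -1 for row in rows for ch in row])

pattern = glyph_to_pattern(T_GLYPH)
```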
2.2. Measuring Stimulus Patterns Actively Retained in Short-Term Memory.
The process of testing for the retention of specific letters that have just been presented sequentially to the model is done as follows. The network is started in a random initial activity state, and then the network's state is allowed to evolve according to the dynamics described above for a 200 time step test period. Because of the changing nature of the thresholds, the network does not reach a fixed state during the test period, but instead typically oscillates between states that are at or close to some of the activity patterns that were shown to it during the simulated running memory span task. During this testing period, the similarity measure sλ for each of the stimuli λ is recorded at each time step.
In the simulations that follow, we label a specific stimulus or letter λ that was presented to the network as being actively present in memory, and thus recalled by the model, only if sλ reaches a value of 1.0 during the 200 time step testing interval. This means the stimulus must be perfectly recalled by the network at least once during this testing period. We use this strict criterion because the similarity measures of letters that resemble each other tend to rise and fall synchronously. For example, when the current state is an exact match for the letter T (i.e., the state representing letter T is present, with sT= 1.0), then letter I would simultaneously have a small but significant sI value due to its overlap with T.
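The recall criterion can be stated compactly in code. Since the exact definition of sλ in terms of the distance dλ is not reproduced here, a plain normalized overlap stands in for it (an assumption); it shares the relevant property that the measure equals 1.0 exactly when the state matches the stored pattern:

```python
import numpy as np

def similarity(state, pattern):
    """Normalized overlap in [-1, 1]; equals 1.0 only on an exact match.
    (A stand-in for s_lambda, whose exact definition via the distance
    d_lambda is not reproduced here.)"""
    return float(state @ pattern) / len(pattern)

def recalled(history, pattern):
    """Strict criterion: the stimulus counts as recalled only if its pattern
    occurs exactly (similarity 1.0) at least once in the test period."""
    return any(similarity(s, pattern) == 1.0 for s in history)
```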
2.3. Behavioral Data.
We collected behavioral data on a running memory span task for comparison with the model's performance, roughly following the designs of Pollack, Johnson, and Knaff (1959) and Bunting, Cowan, and Saults (2006). Our human experimental data were obtained from 38 adult subjects (13 females, 25 males, mean age 25) who were shown a rapidly presented, two per second sequence of 12 to 20 randomly ordered stimuli under computer control and were asked to remember the most recent six items in the order of their presentation.3 Subjects indicated the stimuli that they recalled by clicking on a subsequent graphical display of all possible stimuli. Recall was measured by assessing accuracy of recall as a function of stimulus position. A stimulus was counted as accurately recalled only if (1) it was presented in the retention window (i.e., the last six items, depending on instructions), (2) it was correctly recalled by the participant, and (3) it was recalled in the same position as it was presented (counting backward from the final, most recent stimulus). Any item presented prior to the retention window that was recalled was considered a false positive, as was any item that was recalled but not presented at all. Any item from the retention window that was not recalled was considered a miss. Any item that was presented in the retention window but recalled in the incorrect position was also counted as wrong (e.g., if the last six items presented were “1 2 3 4 5 6” and the subject recalled “4 3 2 6 5 1,” then only “5” was counted as correct). Twelve trials were conducted, requiring roughly 20 minutes per subject. No time restrictions were placed on subject responses. All 38 subjects completed the task.
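The positional scoring rule above can be expressed as a short sketch (our own paraphrase of the rules; `score_trial` is a hypothetical helper, not part of the experimental software):

```python
def score_trial(presented, response, window=6):
    """Score one running-span trial under the positional rule: an item is a
    hit only if it was among the last `window` presented items and was
    reported in the same (backward-aligned) position; reported items outside
    the retention window count as false positives."""
    target = presented[-window:]
    hits = sum(1 for t, r in zip(target, response) if t == r)
    false_positives = sum(1 for r in response if r not in target)
    return hits, false_positives
```

Applied to the example in the text, recalling "4 3 2 6 5 1" after "1 2 3 4 5 6" yields a single hit ("5" in the fifth position).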
Figure 2 shows a representative example of how, following presentation of a sequence of letters, network activity measured using sλ oscillates between memory states of some of the presented letters, indicating the retention of those letters in short-term memory. In this case, where a decay rate of kd= 0.2 is used, the sequence of stimuli from first to last was M, L, X, N, E, F, H, and B. Subjectively, one can see in Figure 2 that the oscillations associated with the earlier letters have relatively small amplitudes (only partial matches), while those of the more recently presented letters are more prominent. Using our criterion that a letter λ is retained in short-term memory (recalled) if and only if its activity pattern occurs exactly (sλ= 1.0) during the test period, the letters E, H, and B would be labeled as recalled in this example.
Figure 3 shows when the oscillations in sλ values peaked at 0.8 or above for the eight stimuli of this same example during just the middle of the testing period. Peaks in the oscillations associated with the different recalled letters (E, H, and B) alternate with each other, allowing the three remembered letters to be retained in short-term memory simultaneously, unlike with fixed-point neural associative memories. Note that the oscillations are irregularly spaced and not periodic.
Figure 4 displays the fraction of letters recalled (maintained as oscillatory states in short-term memory) for stimuli in each position, averaged over 1000 random input sequences of eight different stimuli. When there is no decay (kd= 0), the fraction of letters recalled is largely independent of a letter's position in the input sequence, resulting in a flat curve. This result can be related to fixed-point attractor networks, where the final weights, and hence network performance as an associative memory, are independent of the order in which the input patterns are stored. Further, recall of any observed letter in this case is quite poor, as would be expected: for a network of the size used here, the number of stimuli used far exceeds the expected memory capacity of an equal size fixed-point attractor associative memory (Hopfield, 1982; Amit, 1989). With no decay, the interference between the stored memory patterns is excessive, preventing almost all letters from being retained effectively in short-term memory. In contrast, when the decay rate is very large (kd= 0.5), a very steep curve is seen (see Figure 4), with the single most recently presented (final) letter always being retained. This occurs because the weight changes from previously stored stimulus patterns quickly dissipate, so that even recently presented earlier patterns no longer interfere significantly with the final letter's retention. In this case, the fraction of the first six presented letters that are recalled is almost zero, reflecting that they have been erased from memory. This can be contrasted with the roughly 20% recall rate of presented letters when there is no decay at all. Intermediate behaviors are seen for intermediate values of the decay rate, as is shown for kd= 0.1 in Figure 4.
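The role of the decay rate kd can be seen directly in a sketch of Hebbian storage with decay. The exact update rule of the model is not reproduced here; the form below is the standard one consistent with the (1 − k)^(M−q) factor discussed in the appendix:

```python
import numpy as np

def store_with_decay(patterns, kd):
    """Hebbian storage with weight decay: existing weights are scaled by
    (1 - kd) before each new +/-1 pattern is added, so the pattern presented
    q-th out of M survives with coefficient (1 - kd)**(M - q)."""
    N = patterns.shape[1]
    W = np.zeros((N, N))
    for p in patterns:
        W = (1.0 - kd) * W + np.outer(p, p) / N
    np.fill_diagonal(W, 0.0)
    return W
```

With kd = 0 this reduces to order-independent Hopfield storage; with kd near 1 only the most recent pattern contributes appreciably, mirroring the flat and steep recall curves of Figure 4.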
Figure 5 shows, for each decay rate value used in the results of Figure 4, the mean total number of stimuli recalled (memory capacity), averaged over the 1000 stimulus sequences tested. These results show that memory capacity is highest at intermediate decay rates. If decay is very low, older input stimuli are substantially retained, and the attempt to store an excessive number of memory patterns results in too much interference. If decay is too high, even recently observed stimuli are lost. Maximum recall of observed stimuli occurs when there is a balance between information lost to decay and information lost to interference. The appendix discusses, from a more analytic point of view, why an intermediate value of the decay rate tends to maximize memory capacity.
How does position-specific recall by the model compare with that exhibited by human subjects? The curves in Figure 6 display experimental results when the human subjects are instructed to remember the six most recent stimuli. As also plotted in Figure 6, the model closely approximates the results found with the human subjects when it is run with a decay rate of 0.1 on input sequences of length six. The mean total memory capacity for recall of six stimuli was 2.73 items for human subjects versus 2.69 items for the model. Thus, both the model's total memory capacity and its position-specific stimulus retention patterns were in close agreement with those seen with the human subjects.
As described in section 2, human subject data were collected under conditions where the subjects were presented with stimulus sequences of 12 to 20 items and were told to recall only the last 6, regardless of the actual sequence length. To explore how the model was affected by seeing more stimuli than were to actually be recalled, we repeated the simulations as above but now using 20 stimuli in each sequence, even though recall of only the last 6 stimuli was of interest. Figure 7 shows how the model's performance in position-specific recall can, with an adjusted decay rate, still reasonably match the human subject results. The model results here are again averaged over 1000 different random sequences of letter stimuli. We found that a modestly higher decay rate (0.185) provides an approximate match to the six-back human data. The model's mean total memory capacity for recall of six stimuli was now 2.28 items (versus 2.73 items for human subjects).
In this work we introduced a simple oscillatory memory model of short-term memory, examined some of its properties, and compared its behavior to that of human subjects on a running memory span task. Our model's dynamics are intrinsically oscillatory due to the use of rapidly varying threshold values, and recall of an item is dependent on the time elapsed since it was observed due to the use of rapidly decaying weights. Unlike with many past neurocomputational models of memory, we assessed recall by initializing the model's activity to a random state rather than by initializing it to a noisy or partial stored memory pattern, or by biasing the network's dynamics by applying an external input pattern that represents a noisy or partial stored pattern. We found that when moderate decay rates were used, this approach resulted in a short-term memory capacity of two to three items, a value that is comparable to what has been observed in experimental studies by others (Baddeley, 2000; Cowan, 2001; Cowan et al., 2005) and that matches the memory capacity that we observed in a group of human subjects performing a similar running memory task. The model also showed a prominent recency effect, as would be expected given the use of weight decay and as is also seen in human subjects.
It should also be noted here that this model is intended to simulate short-term memory processing only, and so it is not intended to address any processes by which semantic or other long-term memory information is accessed to aid storage or recall. Indeed, it has been well established that short-term memory capacity is higher for familiar items, for which long-term representations exist, than for novel stimuli (Hulme, Maughan, & Brown, 1991; see Cowan, 2001, for review). This benefit is likely due to the fact that for novel stimuli, representations must be created before retention can successfully occur. As the model makes no assumptions regarding the relationship between short-term and long-term memory, the choice of letters as the retained stimuli seemed reasonable. Still, we recognize that more complex models of short-term memory could be extended to make predictions regarding the important role of long-term memory for temporarily retained information.
Also, although the computational model is oscillatory in nature (though not periodic), it is not intended to make any predictions regarding frequency-specific responses actually produced in the brain during short-term memory retention. There is a rich and expanding research literature showing that EEG activity in the theta band plays a key role in the maintenance of temporarily retained information (Jensen & Tesche, 2002) and that hippocampal structures contribute significantly to this frequency component (Kahana, Seelig, & Madsen, 2001; Rizzuto et al., 2003). There has also been recent research showing that higher-frequency oscillatory activity (i.e., gamma response) increases approximately linearly with increased memory load (Howard et al., 2003), and that greater gamma activity during encoding predicts greater likelihood of later recall (Sederberg, Kahana, Howard, Donner, & Madsen, 2003). The results of our model are promising in suggesting that oscillatory neural models can show capacity limitations similar to those of humans, but they do not allow us to make predictions regarding frequency-specific contributions to EEG, especially as the model oscillations recorded are in terms of the extent to which a specific distributed memory pattern is present (the quantity sλ) and not in terms of the amount of network activity. While the model does retain information about stimulus order—indeed, stimulus order effects emerge in the model as a result of gradual decay—it does not address issues of temporal sequencing. As recent single-unit recording evidence suggests that ordered sequences of activation are observed in rats (Foster & Wilson, 2006), this may be an interesting area of future expansion of the model.
Our study adds to a rapidly growing literature on computational models of short-term memory by examining the role of weight decay on simple oscillatory memories. Many past models of short-term memory have employed lateral inhibition between representational units to establish competition between activated entities, and thus capacity limitations. For example, Haarmann and Usher (2001) present a model of semantic short-term memory that functions in this fashion. Our approach differs in not explicitly building in such lateral inhibition (although inhibitory weights do occur during pattern storage), with competition between memory patterns arising in the dynamics due to the interference occurring between the nonorthogonal memory patterns. Other recent models of short-term memory, inspired by specific neuroanatomical structures, have used separate modules for memory representation, maintenance, and selective gating. For example, Frank, Loughry, and O'Reilly (2001) and O'Reilly and Frank (2006) incorporate modules representing prefrontal cortex and basal ganglia, with the latter modulating which sensory stimuli are kept active. Our approach does not use a complex architecture or gating mechanisms, and thus shows that some basic behavioral properties of human short-term memory (limited memory capacity, recency effect, and shifts in position-specific stimulus recall) can be captured by a surprisingly simple neurocomputational mechanism. Still other recent short-term memory models have been based on modulation of persistent neuronal firing by rhythmic changes to membrane potential at theta frequencies (Koene & Hasselmo, 2007, 2008). Our approach is quite different in that storage is based primarily on synaptic connectivity, and memory capacity limitations arise mainly due to synaptic decay and pattern interference. 
However, it is an interesting question as to whether the dynamic thresholds in our model might correspond to the changes in effective thresholds brought about by the modulating theta activity in these more physiologically realistic implementations.
Perhaps the most interesting finding with the model is that adjusting just the weight decay rate produces shifts in the model's memory capacity and position-specific recall rates, as is demonstrated in Figure 4. This leads to a prediction of the model: by adjusting the decay rate, one could match the shifts exhibited by human subjects instructed to recall stimulus sequences of different lengths. This would be especially remarkable given the simplicity of our model and the fact that it requires adjustment of only a single parameter. This prediction relates to long-standing issues in the cognitive science literature concerning the nature of forgetting. For example, one view of forgetting is that short-term memory is subject to decay (Brown, 1958), while an alternative view is that forgetting is due to interference between competing elements that are simultaneously vying for attention (Waugh & Norman, 1965). Our model incorporates both interference and decay as mechanisms for forgetting and shows that the latter can partially mitigate effects from interference, consistent with evidence from past behavioral studies (Altmann & Gray, 2002).
In general, it is difficult to map processes in neurocomputational models to cognitive processes, but sometimes there are analogs that are worth considering. The observation that adjustments to decay rate control not only the total short-term memory capacity (see Figure 5) but also position-specific stimulus recall rates (see Figure 4) raises the issue of whether altering decay rate might be a useful mechanism permitting a cognitive system to control short-term memory characteristics. Specifically, our model is consistent with the hypothesis that dynamic adjustments to activity decay rate may be an important aspect of the human attention mechanisms that control forgetting (Altmann & Gray, 2002).
It is already well established that attention is a cognitive property that can be manipulated based on the needs of the task at hand (Broadbent, 1982; Downing & Pinker, 1985; Eriksen & St. James, 1986) and that attentional scope can be adjusted during visual search and memory recall between being more focused or more diffuse (Engle, 2002; Kane & Engle, 2002). Based on our modeling results, we hypothesize that altering the decay rate could serve as a means by which attentional mechanisms could act to manipulate attentional scope. More focused attention is simulated in the model by a higher decay rate, so that attention is directed more intently on a smaller number of items. In this way, decay is used as a means for combating proactive interference, with higher decay rates leading to more effective retention of recent information, but also at the expense of that which was presented before it.
For the running memory span task used here, involving rapid presentation of stimuli, human subjects attempt to hold presented stimuli in a limited-capacity memory without the use of rehearsal (Bunting et al., 2006). Assuming that maintaining such stimuli depends on attentional resources, changing the instructions to require subjects to retain varying numbers of stimuli (i.e., not just six, as in our behavioral experiments) would be expected to have a substantial effect. Specifically, if attention is spread sufficiently thin that activation maintenance is weak across all retained stimuli (a low decay rate in our model), then with longer stimulus sequences (e.g., a task requiring human subjects to recall 12 stimuli), interference would be expected to leave few or none of the stimuli with activation levels above the cognitive threshold required for successful recall, although no doubt some attenuated recency effect would still be present. This is both a surprising and informative prediction from the model: it suggests that overloading subjects' attentional resources (i.e., spreading attention sufficiently thin) has a detrimental effect on retention. Future behavioral testing with a varying-length recall task could therefore either refute or strongly support the model we have presented here.
To our knowledge, the work reported here is the first comparison of simple oscillatory model properties to human behavioral data. While the results are encouraging, they leave open a number of issues that should be examined in future work. Perhaps the most pressing issue concerns the generality of these results. It will be important to determine whether similar correspondences between model properties and human behavior can be produced as readily with other stimulus sets, while varying the number of stimuli that subjects are instructed to retain and recall, and across other short-term memory tasks. For example, it would be useful to compare the model's results against human data using unfamiliar or abstract symbols or nonwords where subject performance is based on free recall. Similarly, with the model, it would be useful to examine how long-term memory influences the results of recall. Another future issue, not examined in our work, would be to characterize model behavior in the absence of interference using orthogonal stimulus patterns, allowing the effects of weight decay to be studied in isolation.
Appendix: Analysis of the Effects of Varying Rate of Decay
We can use this estimate to see why an intermediate value of k optimizes memory capacity. First note that the field supporting recall of the qth stored pattern can be written as the sum of a self-support term, with scalar coefficient Fq, and an interference term, with scalar coefficients Fμ, where Fq and Fμ are functions of k. The self-support term, considered in isolation, tends to make the qth pattern a fixed point (in the low-temperature limit), and thus an energy minimum, if Fq is large. Assuming that N is relatively large compared to M, as is typically the case, the factor (1 − k)^(M−q) will generally be the most important factor in Fq; this factor represents the effects of decay on recall of the qth stored pattern. The interference term, which represents cross-talk due to the other stored patterns, tends to prevent the qth pattern from being a fixed point if the Fμ are large. The magnitude of the Fμ, and thus the amount of interference, depends on how similar the memory patterns are to one another, as measured by the Apq values. Thus, for a specific pattern, we have a multiobjective optimization problem: find a value of k that maximizes Fq while simultaneously minimizing Fμ, subject to the constraint 0 ⩽ k < 1. Such a k value will maximize the probability that the pattern is stored in memory (i.e., is a fixed point). However, as in many multiobjective optimization problems, there is a trade-off: a small value of k will have the desirable effect of maximizing Fq, and hence self-support (which, assuming N is large, is primarily determined by (1 − k)^(M−q)), but will also have the deleterious effect of simultaneously maximizing Fμ, and hence interference. The opposite holds for a large value of k. It is this trade-off between self-support and interference that ultimately makes an intermediate value of k optimal.
Solving multiobjective optimization problems like ours that are subject to constraints is generally recognized as very difficult (Deb, 2001). However, we can estimate a reasonably good k value that will maximize memory capacity for input sequences S of length M by estimating, for each of the memory patterns in S, the k value that maximizes the ratio Fq/Fμ subject to the constraint 0 ⩽ k < 1, and then averaging these values over the patterns in S. For each pattern, a preferred k value is easily estimated by identifying the maximum Fq/Fμ value as k is sampled between 0 and 1. For example, for the conditions under which memory capacity was measured in Figure 5 (N= 35 nodes, M= 8 patterns, and using the 26 letter patterns, for which one can compute Aμ = 7.04), this approach predicts an optimal value of k= 0.22, which is reasonably close to the measured value of approximately 0.15 (see Figure 5), given the approximations made above in deriving Fμ.
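The grid-search procedure just described can be sketched as follows. Since the full expressions for Fq and Fμ are not reproduced here, the forms below are illustrative stand-ins (assumptions, not the paper's exact formulas), keeping only the dominant (1 − k)^(M−q) decay dependence and a single overall similarity scale A/N:

```python
import numpy as np

def preferred_k(M, A, N, samples=1000):
    """Grid-search sketch: for each pattern position q, pick the k in [0, 1)
    maximizing Fq/Fmu, then average over q. The stand-in forms
    Fq = (1-k)**(M-q) and Fmu = (A/N) * sum_{m != q} (1-k)**(M-m)
    are assumptions retaining only the dominant decay dependence."""
    ks = np.linspace(0.0, 0.99, samples)
    best = []
    for q in range(1, M + 1):
        fq = (1.0 - ks) ** (M - q)
        fmu = (A / N) * sum((1.0 - ks) ** (M - m)
                            for m in range(1, M + 1) if m != q)
        best.append(ks[np.argmax(fq / fmu)])
    return float(np.mean(best))
```

Even with these simplified forms, early patterns favor small k and recent patterns favor large k, so the average lands at an intermediate value, reflecting the trade-off discussed above.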
This work was supported in part by NSF award 0753845.
The value 0.85 is somewhat arbitrary (0.75 or 0.90 or other similar values would serve as well). We selected 0.85 prior to systematic data collection by inspecting pilot simulation results, because this value gave visually evident separation of retained versus “forgotten” stimuli in plots like those in Figure 2. Note that any of these values give sλ= 1 when dλ= 0, so the specific value used for this parameter does not alter in any way whether a memory pattern is counted as recalled or not in our results.
As the stimuli are meaningless patterns to the computational model, numerical digits were used as stimuli in human subject testing, rather than letters, to allow simple and efficient behavioral response entry using a mouse. Subjects clicked on a graphical display of the digits 1 through 9 organized in the style of a telephone keypad. In testing human subjects, a digit was never repeated within a moving window of seven stimuli, and no digit appeared more than twice. We do not expect that digit repetition significantly affected responses, because careful attention was paid to avoiding any contamination from strategies that could improve memory capacity, minimizing interference from repetition within any given trial relative to that throughout the experiment.
Ransom Winder is currently at MITRE Corporation.