Most people have great difficulty in recalling unrelated items. For example, in free recall experiments, lists of more than a few randomly selected words cannot be accurately repeated. Here we introduce a phenomenological model of memory retrieval inspired by theories of neuronal population coding of information. The model predicts nontrivial scaling behaviors for the mean and standard deviation of the number of recalled words for lists of increasing length. Our results suggest that associative information retrieval is a dominant factor limiting the number of recalled items.
Human long-term memory contains many thousands of words, pictures, episodes, and other types of information, yet retrieving this information is challenging when no specific cues are available. Real-life examples of this difficulty are abundant (e.g., recalling all the talks attended at the last Society for Neuroscience meeting). Memory performance assessed with forced-choice recognition was reported to be good even with lists of up to 1000 words (Standing, 1973), suggesting that recall is limited by the retrieval process rather than true forgetting.
In the neural network paradigm of associative memory, the issue of storage capacity was extensively analyzed in the framework of attractor neural networks (Amit, 1989). Memories in these networks are stored by strengthening the connections between specific groups of neurons, resulting in stable activity patterns that encode different items (attractors; Hopfield, 1982). Retrieval of an item from memory is mediated by the reactivation of the corresponding attractor (Gelbard-Sagiv, Mukamel, Harel, Malach, & Fried, 2008). These studies considered only retrieval with memory-specific cues, in which the network is initialized in an activity state close to the attractor representing the item being retrieved. In this study, we develop a model of associative retrieval to explore the mechanisms of memory recall and its capacity without specific cues. In the psychological literature, it was reported that the number of items retrieved is a power function of the total number of items of this type in memory (Murray, Pye, & Hockley, 1976). Sublinear scaling was observed when lists of words were presented to the subject for subsequent free recall (Binet & Henri, 1894; Roberts, 1972; Standing, 1973; Murray et al., 1976). Surprisingly, similar scaling was observed when subjects were instructed to recall words based on graphemic cues (i.e., all the words that begin with a particular letter; Murray, 1975). We therefore considered the retrieval capacity (RC) of the system: the average number of memory items that can be retrieved when the total number of items in memory is large.
We first analyzed an attractor neural network model in which memorized items are stored as a set of attractors encoding different items (Hopfield, 1982; Amit, 1989). To mimic the choice of unrelated words presented in recall experiments, we considered random uncorrelated groups of neurons representing different words. In a network of N neurons, each group on average includes fN neurons, where f is a parameter between 0 and 1 that defines the sparseness of representation. If no specific cues are presented, we assume that the items are retrieved associatively, with each currently retrieved item acting as a trigger for the next one.
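For concreteness, the following Python sketch (not part of the original model code; all names are ours) generates such random sparse representations and confirms two basic statistics used below: a pattern contains about fN neurons, and two random patterns share about f^2 N neurons.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, f = 3000, 16, 0.1   # network size, number of items, sparseness (section 2.1)

# Each item is encoded by a random binary pattern: neuron i participates
# in item mu with probability f, so a pattern contains ~fN neurons.
patterns = (rng.random((L, N)) < f).astype(float)

sizes = patterns.sum(axis=1)
print("mean pattern size:", sizes.mean(), "expected:", f * N)

# Pairwise intersection sizes: two random patterns share ~f^2 N neurons.
S = patterns @ patterns.T
off_diag = S[~np.eye(L, dtype=bool)]
print("mean pairwise overlap:", off_diag.mean(), "expected:", f**2 * N)
```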
In one of the network implementations, transitions between attractors are caused by periodic modulation of inhibition in the network. When inhibition is low, each attractor state is stable. When inhibition is increased, the number of active neurons drops, and the network converges to a state in which only the intersection of two attractors remains active. When inhibition decreases again on the subsequent cycle, a new attractor emerges from this intersection. Neuronal adaptation prevents the network from falling back into the preceding attractor (see Russo, Namboodiri, Treves, & Kropff, 2008).
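A minimal toy rendering of this mechanism is sketched below. It is our own simplified version (Hebbian covariance weights, deterministic threshold units, a first-order adaptation variable), not the authors' exact network, and the parameters would likely need tuning to reproduce the latching dynamics of Figure 1.

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, f = 1000, 16, 0.1
patterns = (rng.random((L, N)) < f).astype(float)

# Hebbian covariance weights for sparse binary patterns; the exact
# normalization used by the authors is not reproduced here.
J = (patterns - f).T @ (patterns - f) / (f * (1 - f) * N)
np.fill_diagonal(J, 0.0)

state = patterns[0].copy()        # cue item 0
theta = np.zeros(N)               # adaptive firing thresholds
T_th, D_th = 45.0, 0.03           # adaptation time scale and strength (assumed)

for t in range(400):
    # Global inhibition oscillates with period 25, as in section 2.1.
    J0 = 0.95 + 0.25 * np.sin(2 * np.pi * t / 25.0)
    h = J @ state - J0 * state.sum() / (f * N)   # recurrent input minus inhibition
    state = (h > theta).astype(float)            # deterministic threshold units
    theta += (D_th * state - theta) / T_th       # adaptation: active neurons tire

    if t % 25 == 19:                             # read out near the low-inhibition phase
        overlaps = patterns @ state / (f * N)
        print(t, int(np.argmax(overlaps)), round(float(overlaps.max()), 2))
```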
The main focus of our study is a mathematical abstraction of the network model that illustrates the associative nature of retrieval of information from memory and allows the analytical exploration of its consequences.
2. Materials and Methods
2.1. Network Model
The parameters used for the simulation in Figure 1 are N = 3000, L = 16, f = 0.1, T = 0.015, D_th = 1.9T, and T_th = 45. J_0 oscillates between the values 0.7 and 1.2 with a period of T_{J_0} = 25.
2.2. Power Law Fit of Mean and Standard Deviation of Number of Recalled Words
We obtained data from experiment III of Murdock (1960). Lists of words of varying length were presented aurally to human subjects of either sex at a rate of one word every 2 s. Lists were of L = 5, 6, 8, 10, 15, 30, 45, 60, 75, 100, 150, and 200 words in length, and each list was presented to N = 10, 17, 15, 24, 14, 64, 19, 29, 25, 15, 13, and 17 subjects, respectively. A detailed description of the experiment can be found in the original work (Murdock, 1960). The estimates of the power law exponents reported below were obtained by numerically minimizing the sum of weighted square errors, separately for the mean and standard deviation of the number of recalled words, using the functional form R(L) = a L^γ. For the power law fit of the average number of recalled words, the inverse of their squared standard error was used as weights. For the standard deviation, weights were defined as 2(N − 1)/s², the inverse of the squared standard error s/√(2(N − 1)) of the sample standard deviation (Evans, Hastings, & Peacock, 2000), where N is the number of subjects and s the estimated standard deviation of the number of recalled words. We discarded data corresponding to list lengths of 5, 6, and 8 words in order to minimize the contribution of the fixed-length recency and primacy effects to the number of recalled words (Murdock, 1962). Confidence intervals (CIs) for the estimated parameters were obtained as the 2.5th and 97.5th percentiles of bootstrap parameter distributions over resamplings with replacement of the original data.
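The fitting procedure can be sketched as follows. The list lengths and subject counts below are those kept from the experiment; the mean and standard deviation arrays are synthetic stand-ins generated from a power law (the Murdock summary statistics are not reproduced here), and the bootstrap resamples the summary data points rather than individual subjects.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)

# List lengths after discarding 5, 6, and 8, with subject counts per list.
lengths = np.array([10, 15, 30, 45, 60, 75, 100, 150, 200], dtype=float)
n_subj  = np.array([24, 14, 64, 19, 29, 25, 15, 13, 17], dtype=float)

# Synthetic stand-in data (NOT the real Murdock values).
mean_R = 2.6 * lengths**0.4 * (1 + 0.03 * rng.standard_normal(lengths.size))
sd_R   = 0.5 * lengths**0.5 * (1 + 0.05 * rng.standard_normal(lengths.size))
sem_R  = sd_R / np.sqrt(n_subj)

def powerlaw(x, a, g):
    return a * x**g

# Mean: weights are inverse squared standard errors (sigma = SEM).
(a_m, g_m), _ = curve_fit(powerlaw, lengths, mean_R, p0=(1.0, 0.5),
                          sigma=sem_R, absolute_sigma=True)

# SD: SE(s) ~ s / sqrt(2(N-1)) (Evans, Hastings, & Peacock, 2000).
se_sd = sd_R / np.sqrt(2 * (n_subj - 1))
(a_s, g_s), _ = curve_fit(powerlaw, lengths, sd_R, p0=(1.0, 0.5),
                          sigma=se_sd, absolute_sigma=True)

# Percentile bootstrap over resampled data points.
boot = []
for _ in range(2000):
    idx = rng.integers(0, lengths.size, lengths.size)
    try:
        (a_b, g_b), _ = curve_fit(powerlaw, lengths[idx], mean_R[idx],
                                  p0=(1.0, 0.5), sigma=sem_R[idx],
                                  absolute_sigma=True)
        boot.append(g_b)
    except RuntimeError:
        continue
print("gamma =", round(g_m, 3), "95% CI:", np.percentile(boot, [2.5, 97.5]).round(3))
```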
3. Results

We simulated a network storing L = 16 items, represented by random overlapping groups of neurons. The network repeatedly switches between stored patterns and intersections of two patterns. After several cycles, the dynamics enters a loop in which the same items are retrieved repeatedly. As a result, only 8 of the 16 items are retrieved by the network (see Figure 1A). This partial retrieval is not a consequence of a limited storage capacity, which for this network is much higher than 16 items: all the items can be readily retrieved when cued individually. We then computed the number of neurons in the intersection of patterns representing each pair of items (see Figure 1B). Inspecting Figures 1A and 1B reveals that network transitions typically occur between attractors with the largest intersection. This also holds true when transitions between attractors are induced by short-term synaptic depression alone (Bibitchkov, Herrmann, & Geisel, 2002; Pantic, Torres, Kappen, & Gielen, 2002; results not shown). Hence, for each pair of items, we consider the size of the overlap between the corresponding representations to be a measure of similarity between them. When an item is retrieved, it triggers the subsequent retrieval of the item with which it has the largest similarity. Different realizations of the groups of neurons coding memory items will result in a different number of items retrieved. The aim of our analysis is therefore to characterize the statistical properties of retrieval, such as the distribution of the number of items retrieved. To this end, we consider an ensemble of lists of L items, each endowed with a particular realization of an L × L matrix of similarities S_ij between items.
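A sketch of the resulting abstract model: similarities are the pairwise intersection sizes, and each item points to the item with which it shares the most neurons (variable names are ours).

```python
import numpy as np

rng = np.random.default_rng(3)
N, L, f = 3000, 16, 0.1
P = (rng.random((L, N)) < f).astype(float)

# Similarity = number of neurons shared by the two representations
# (the quantity plotted in Figure 1B).
S = P @ P.T
np.fill_diagonal(S, -np.inf)      # an item is never its own successor

# Retrieval rule of the abstract model: each item triggers the item
# with which it has the largest similarity.
successor = S.argmax(axis=1)
print("most similar item for each of the", L, "items:", successor)
```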
3.1. Random Asymmetric Similarity Matrix
Retrieval in the model depends on the properties of the similarity matrix. We first considered a simplified but very instructive version of the model with a random similarity matrix S in which elements are independent and identically distributed. Retrieval can be visualized by a directed graph, in which each node represents an item stored in memory, and each outgoing edge points to the item with maximal similarity (see Figure 2). Our task is to characterize the distribution of the number of nodes reached by following the edges starting from a random node, computed over the ensemble of graphs corresponding to different realizations of a similarity matrix.
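The following sketch implements this graph walk for an i.i.d. Gaussian similarity matrix and samples the distribution of the number of retrieved items (all names are ours). Because each node has a single outgoing edge, the walk necessarily closes a loop the first time a node is revisited.

```python
import numpy as np

rng = np.random.default_rng(4)

def n_retrieved(L):
    """Number of distinct items visited before the deterministic walk
    on an i.i.d. similarity matrix closes a loop."""
    S = rng.standard_normal((L, L))
    np.fill_diagonal(S, -np.inf)
    succ = S.argmax(axis=1)          # each node's single outgoing edge
    visited, cur = set(), int(rng.integers(L))
    while cur not in visited:
        visited.add(cur)
        cur = int(succ[cur])
    return len(visited)

counts = [n_retrieved(1000) for _ in range(2000)]
print("mean:", np.mean(counts), "sd:", np.std(counts))
```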
3.2. Similarity as Overlap Between Neuronal Populations
Since the overlap similarity matrix is symmetric, retrieval that always proceeds to the most similar item would rapidly fall into a two-item loop; we therefore modify the rule so that when the most similar item is the one just retrieved, the second-most similar item is retrieved instead. This modified algorithm has strong implications for the dynamics of retrieval. When an item is visited for the second time (see, e.g., item 5 in Figure 3A), there is a finite probability (p1) that retrieval will not enter into a loop but rather will continue along the already visited trajectory in the opposite direction. This will happen if, at the first retrieval, this item was followed by the item with second-maximal similarity (dashed arrow from item 5 to item 6 in Figure 3A). Subsequently, at each "old" item, the retrieval can either continue backward or turn to a new item (probability p2 in Figure 3A), starting another exploration of the graph. As opposed to the simplified matrix case considered in the previous section, the dynamics enter a loop only when a previously retrieved sequence of two consecutive items is retrieved again.
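A sketch of this retrieval rule, with the stopping criterion just described (retrieval has entered a loop once a consecutive pair of items repeats); the function and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(5)

def recall_count(L=100, N=3000, f=0.1):
    """Distinct items retrieved under the symmetric-similarity rule."""
    P = (rng.random((L, N)) < f).astype(float)
    S = P @ P.T                       # similarity = intersection size
    np.fill_diagonal(S, -np.inf)
    order = np.argsort(S, axis=1)     # ascending per row
    best, second = order[:, -1], order[:, -2]

    prev, cur = -1, int(rng.integers(L))
    visited = {cur}
    pairs = set()
    while True:
        # Go to the most similar item unless it was just retrieved;
        # in that case, go to the second-most similar item.
        nxt = int(best[cur]) if best[cur] != prev else int(second[cur])
        if (cur, nxt) in pairs:       # a transition repeats: retrieval cycles forever
            return len(visited)
        pairs.add((cur, nxt))
        visited.add(nxt)
        prev, cur = cur, nxt

print(recall_count())
```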
To confirm that the asymptotic results apply to retrieval with a moderate number of items, we simulated the model with different values of f and plotted the average and standard deviation of the number of retrieved items versus the total number of items (see Figures 3C and 3D). As expected, the retrieval capacity decreases with increasing f, and the corresponding exponents are close to those given by equation 3.7.
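To illustrate the scaling analysis, the sketch below repeats the walk from the previous block over increasing list lengths and reads off the two exponents from log-log slopes; the sizes and trial counts are kept small for speed and are our own choices.

```python
import numpy as np

rng = np.random.default_rng(6)

def recall_count(L, N, f):
    """Same symmetric-rule walk as in the previous sketch."""
    P = (rng.random((L, N)) < f).astype(float)
    S = P @ P.T
    np.fill_diagonal(S, -np.inf)
    order = np.argsort(S, axis=1)
    best, second = order[:, -1], order[:, -2]
    prev, cur = -1, int(rng.integers(L))
    visited, pairs = {cur}, set()
    while True:
        nxt = int(best[cur]) if best[cur] != prev else int(second[cur])
        if (cur, nxt) in pairs:
            return len(visited)
        pairs.add((cur, nxt))
        visited.add(nxt)
        prev, cur = cur, nxt

lengths = [25, 50, 100, 200, 400]
means, sds = [], []
for L in lengths:
    counts = [recall_count(L, N=2000, f=0.1) for _ in range(200)]
    means.append(np.mean(counts))
    sds.append(np.std(counts))

# Log-log slopes estimate the power law exponents for the mean and SD.
g_mean = np.polyfit(np.log(lengths), np.log(means), 1)[0]
g_sd   = np.polyfit(np.log(lengths), np.log(sds), 1)[0]
print("exponents:", round(g_mean, 2), round(g_sd, 2))
```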
3.3. Power Laws in Free Recall Data
Several psychological studies of recall with lists of various lengths produced results that follow a power law relationship R = a L^γ with a sublinear exponent γ (Murray et al., 1976). A representative example from the experiments of Murdock (1960) is shown in Figure 4A (data courtesy of the author). The number of words recalled, averaged over subjects, is well fit by a power law with exponent γ ≈ 0.4 (95% bootstrap confidence interval [0.37–0.44]). Similar scaling was reported in a different experimental paradigm in which subjects were asked to produce words beginning with a certain letter (Murray, 1975): as in free recall, the average number of words produced followed a power law of the total number of corresponding words in the vocabulary, with a comparable exponent. The standard deviation of recall across subjects in the same data can also be fit with a power law, with an exponent near 0.5 (but with a relatively large 95% bootstrap confidence interval: 0.42–0.68; see Figure 4B and section 2). These estimates are broadly compatible with our model if we assume a sparseness parameter f in the range of 10% (see Figures 3C and 3D). With this parameter choice, the prefactors a of the power law functions for the average and standard deviation, obtained from simulation, are 2.7 and 0.8, respectively. The corresponding values and confidence intervals (CIs) estimated from the data are 2.57 (CI [2.24–2.93]) for the average number of recalled words and 0.35 (CI [0.22–0.53]) for the standard deviation.
The standard deviation obtained from the data measures across-subject variability in the number of recalled words. To assess the influence of across-subject variability on the power law behavior, we simulated the model with f uniformly distributed on the interval [0.05, 0.15]. The resulting average and standard deviation are still well fit by power law functions, and the corresponding curves are similar to those obtained with f=0.1 (see Figures 3C and 3D).
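A minimal version of this variability experiment, assuming the recall_count function and the rng generator from the scaling sketch above: each simulated "subject" draws its own sparseness from the uniform interval.

```python
# Across-subject variability sketch: each simulated "subject" has its own
# sparseness f ~ U[0.05, 0.15]; assumes recall_count() and rng defined above.
for L in [50, 100, 200]:
    counts = [recall_count(L, N=2000, f=rng.uniform(0.05, 0.15))
              for _ in range(200)]
    print(L, round(float(np.mean(counts)), 1), round(float(np.std(counts)), 2))
```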
We conclude that power law scaling of retrieval capacity is a generic property of memory systems characterized by sequential retrieval of items based on similarity between them, independent of the particular mechanism generating the transitions between the items. This behavior can be mapped to a family of random graph models with the statistics of connections determined by the underlying encoding features of memory items.
Our model is characterized by a single parameter, the average sparseness of memory representations in the network. Representation sparseness strongly affects the retrieval capacity, which could account for the different recall abilities of different subjects. The retrieval probability of an item depends on the size of its neural representation, which could underlie variability in how easily individual items are recalled. To emphasize the universal scaling laws of retrieval capacity, the model makes several simplifications. In particular, we assumed that the retrieval process is deterministic and that items stored in memory are randomly encoded and do not form hierarchical structures characterized by classes and subclasses. The model can readily take hierarchically organized data into account if one assumes that items within the same class are encoded as patterns with a common core of neurons representing this class (Tsodyks, 1990). Increasing inhibition will first cause transitions between the attractors of the same class and then transitions between classes. Adding weak stochasticity to retrieval results in an interesting behavior in which the network can spend a long time in a loop and then initiate a new retrieval sequence (results not shown). Our preliminary simulations show that power law scaling of retrieval capacity is still observed, but with an exponent that increases very slowly with the time available for recall.
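A possible encoding of such hierarchical items is sketched below; the split of the sparseness budget between the class core and the item-specific part (f_class, f_item) is our own assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
N, n_classes, per_class = 3000, 4, 4
f_class, f_item = 0.05, 0.05      # assumed split of the sparseness budget

# Hypothetical hierarchical encoding: items of a class share a common
# "core" of neurons plus item-specific neurons (cf. Tsodyks, 1990).
patterns = []
for _ in range(n_classes):
    core = rng.random(N) < f_class
    for _ in range(per_class):
        patterns.append(core | (rng.random(N) < f_item))
patterns = np.array(patterns, dtype=float)

S = patterns @ patterns.T         # within-class overlaps are much larger
np.fill_diagonal(S, 0)
print(S.astype(int))
```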
Retrieval in the model is determined by the similarity between the memory items, without any reference to how the information was acquired by the subject. In free recall experiments, words are acquired by subjects as a list, and retrieval exhibits certain temporal regularities with respect to the order of exposure: words that are presented at the beginning and the end of the list have a higher probability of being recalled (“primacy” and “recency”), and neighboring words have an increased probability of being recalled in proximity (“contiguity”) (Murdock, 1962; Howard & Kahana, 1999). These temporal regularities have been studied in the context of detailed phenomenological models of free recall (Raaijmakers & Shiffrin, 1980; Polyn, Norman, & Kahana, 2009; Howard & Kahana, 2002; Hasselmo & Wyble, 1997). The influence of primacy and recency on retrieval is reduced when a delay between acquisition and recall is introduced (Howard & Kahana, 1999). Both effects were present in the data we analyzed (Murdock, 1960) and could potentially affect our estimate of the experimental retrieval capacity. However, primacy and recency have been shown to result in a small fixed number of additionally retrieved items that is independent of list length (Murdock, 1962). Hence, the confounding effect of primacy and recency becomes less relevant as longer list lengths are considered.
Contiguity is less sensitive to manipulations of the experimental protocol (Howard & Kahana, 1999). The model could account for contiguity if off-diagonal terms are added to the similarity matrix, reflecting additional associations temporarily formed between neighboring words during acquisition. Our preliminary simulations show that when the contiguity strength is comparable to the experimentally observed one, the resulting retrieval capacity exhibits very similar power law behavior (results not shown).
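As an illustration, contiguity can be added by boosting the similarity between words adjacent in the presentation order; the association strength c below is an arbitrary assumed value, not an experimentally derived one.

```python
import numpy as np

rng = np.random.default_rng(8)
L, N, f, c = 16, 3000, 0.1, 10.0   # c: assumed contiguity boost
P = (rng.random((L, N)) < f).astype(float)
S = P @ P.T

# Boost similarities between words adjacent in the presented list.
idx = np.arange(L - 1)
S[idx, idx + 1] += c               # forward neighbors in presentation order
S[idx + 1, idx] += c               # and backward neighbors
np.fill_diagonal(S, -np.inf)

# The retrieval walk of the earlier sketches can be run on this S unchanged.
print("boosted neighbor similarity:", S[0, 1], "typical overlap:", S[0, 2])
```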
We predict that recall terminates when the same items are repeatedly retrieved, a hypothesis already proposed in the psychology literature (Laming, 2009). In free recall experiments, subjects generally report retrieved items once, but when they erroneously report the same item for the second time, there is a higher probability that no new items will be recalled (Miller, Weidemann, & Kahana, 2012). This observation is compatible with the model prediction. The phenomenological model we propose could be implemented in neural networks with different mechanisms of transitions between memory representations. In this study, we considered one particular mechanism, involving periodic modulation of inhibition. We are not aware of experimental evidence in favor of this mechanism; however, an increase in oscillatory activity in the theta range was reported during memory retrieval (Lega, Jacobs, & Kahana, 2011). Our model provides an impetus for the study of neural oscillations during memory recall.
In conclusion, we presented the first analytical study of associative memory retrieval in the absence of cues. The model predicts a power law scaling dependence of retrieval capacity on the size of the memory. We suggest that power law scaling of cueless retrieval is a general feature of associative memory. Retrieval capacity is intricately linked to the way memories are encoded in the network; in particular, more items can be retrieved from memory with sparser representations.
Appendix: The Model: Analysis and Solutions
A.1. Stability of Memory Items and Intersection Between Them in the Network Model
A.2. Similarity Matrix with Independent Elements
A.3. Symmetric Similarity Matrix with Independent Elements
In this model, the similarity matrix is symmetric but otherwise has independent elements. It fits the definition of similarities as the number of neurons in the intersection of the corresponding neuronal representations but still neglects the correlations between different elements of the matrix. If retrieval proceeds from an item to its most similar one, as in the asymmetric case, the dynamics quickly converge to a two-item loop. The reason is that if item B is most similar to item A, then item A will be most similar to item B with a probability of approximately 0.5. We hence let the system choose the second-most similar item if the most similar one has just been retrieved, as explained in the main text. Two types of transitions can therefore be distinguished: strong transitions (to most similar items) and weak transitions (to second-most similar items).
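This reciprocity argument is easy to check numerically; the sketch below estimates the probability that the most-similar relation is mutual in a symmetric i.i.d. matrix.

```python
import numpy as np

rng = np.random.default_rng(9)

# Empirical check of the 0.5 reciprocity argument: if B is A's most similar
# item in a symmetric i.i.d. matrix, how often is A also B's most similar?
L, trials, hits = 200, 2000, 0
for _ in range(trials):
    S = rng.standard_normal((L, L))
    S = S + S.T                    # symmetrize; off-diagonal elements stay i.i.d.
    np.fill_diagonal(S, -np.inf)
    a = int(rng.integers(L))
    b = int(S[a].argmax())
    hits += int(S[b].argmax()) == a
print("reciprocity:", hits / trials)   # approaches 0.5 for large L
```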
The modified rule is history dependent, which significantly influences the retrieval dynamics. Most important, when reaching an already visited item, retrieval can either repeat the original trajectory (resulting in a loop) or continue backward along the already visited items and then branch off into a new part of the graph (see Figure 3A). As explained in the main text, the statistics of retrieval can be characterized by three probabilities: p0, the probability of returning from a new item to any one of the already visited items; p1, the probability that the retrieval proceeds along the previous trajectory in the opposite direction; and p2, the probability of retrieving a new item after an old one (see Figure 3A). Here we present a way to estimate these probabilities in the limit of a large number of items L.
A.4. Neuronal Representation of Memory Items
We are indebted to Bennet Murdock for the data used in Figure 4, and to D. J. Murray for reprints of his papers. M.T. thanks Michael Kahana and Adi Shamir for fruitful discussions of the work. We are grateful to Dov Sagi, Stefano Fusi, and Omri Barak for comments on an earlier version of the manuscript. This work is supported by the Israeli Science Foundation. S.R. is supported by a Human Frontier Science Program long-term fellowship. We declare no conflicts of interest. S.R. and M.T. wish to dedicate this work to the memory of Daniel Amit, mentor and friend.
S.R., I.P., and M.T. contributed equally to this work.