Cognitive theories suggest that working memory maintains not only the identity of recently presented stimuli but also a sense of the elapsed time since the stimuli were presented. Previous studies of the neural underpinnings of working memory have focused on sustained firing, which can account for maintenance of the stimulus identity, but not for representation of the elapsed time. We analyzed single-unit recordings from the lateral prefrontal cortex of macaque monkeys during performance of a delayed match-to-category task. Each sample stimulus triggered a consistent sequence of neurons, with each neuron in the sequence firing during a circumscribed period. These sequences of neurons encoded both stimulus identity and elapsed time. The encoding of elapsed time became less precise as the sample stimulus receded into the past. These findings suggest that working memory includes a compressed timeline of what happened when, consistent with long-standing cognitive theories of human memory.
Theories of human memory have long suggested that memory depends on a representation of the recent past in which events are organized on a compressed timeline (Howard, Shankar, Aue, & Criss, 2015; Brown, Neath, & Chater, 2007; Crowder, 1976; James, 1890). This implies that memory provides access to what happened when; a neural representation supporting a timeline should enable reconstruction of the chronological order of previous stimuli as well as their identity. If the timeline is compressed, then as stimuli recede into the past, their time of occurrence and identity are represented with less and less accuracy.
The neural underpinnings of working memory are studied using tasks that require maintenance of a small amount of information across a (typically brief) delay interval. Most neural models assume that working memory maintenance relies on sustained firing of neurons (Goldman, 2009; Egorov, Hamam, Fransén, Hasselmo, & Alonso, 2002; Goldman-Rakic, 1995). According to this model, when a to-be-remembered stimulus is presented, it activates a specific population of neurons that remain firing at an elevated rate for as long as necessary until the information is no longer required. A great deal of work in computational neuroscience has developed mechanisms for sustained stimulus-specific firing at the level of circuits, channels, and local field potentials (Chaudhuri & Fiete, 2016; Lundqvist, Herman, & Lansner, 2011; Mongillo, Barak, & Tsodyks, 2008; Sandberg, Tegnér, & Lansner, 2003; Compte, Brunel, Goldman-Rakic, & Wang, 2000; Durstewitz, Seamans, & Sejnowski, 2000; Amit & Brunel, 1997). However, if the firing rate is constant while the stimulus is maintained in working memory, then information about the passage of time is lost. Thus, a memory representation based on sustained firing is not sufficient to represent information about time.
Time cells, neurons that fire sequentially, each for a circumscribed period, during the delay interval of a memory task (MacDonald, Lepage, Eden, & Eichenbaum, 2011; Pastalkova, Itskov, Amarasingham, & Buzsaki, 2008), provide a neural representation that includes information about time. By examining which time cell is firing at a particular moment, one can reconstruct how far in the past the delay began. Behavioral work on timing shows that the accuracy in estimating the elapsed time decreases with the amount of time to be estimated (e.g., Lewis & Miall, 2009; Rakitin et al., 1998). Two properties of time cells are consistent with an analogous decrease in temporal accuracy. First, time fields later in the sequence should be broader (i.e., less precise). Second, there should be more neurons with time fields early in the delay and fewer neurons representing times further in the past. Both of these properties have been observed, primarily in rodent work, for time cells in the hippocampus (Salz et al., 2016; Howard et al., 2014), entorhinal cortex (Kraus et al., 2015), medial prefrontal cortex (mPFC; Tiganj, Jung, Kim, & Howard, 2017), and striatum (Akhlaghpour et al., 2016; Mello, Soares, & Paton, 2015; Jin, Fujii, & Graybiel, 2009).
The cognitive models for a compressed timeline predict that distinct sequences of time cells should be triggered by distinct stimuli. In this way, one could directly read off not only what stimulus occurred in the past but also how far in the past that stimulus was presented. Despite some dramatic evidence for sequence memory in primate lateral prefrontal cortex (lPFC; Ninokura, Mushiake, & Tanji, 2003, 2004), thus far there has been little evidence for conjunctive “what and when” information in populations of time cells. Earlier studies of time cells have not observed stimulus-specific time cell firing (e.g., Akhlaghpour et al., 2016; but see MacDonald, Carrow, Place, & Eichenbaum, 2013), leading some theorists to hypothesize that “what and when” information are maintained separately (Friston & Buzsáki, 2016) in much the same way that “what and where” information are presumably segregated in the visual system.
This article reports reanalysis of data initially described in Cromer, Roy, and Miller (2010). More detailed descriptions of the behavioral and recording methods can be found in that article.
Two macaque monkeys, one male and one female, performed a delayed match-to-category task. The stimuli were chosen from one of two independent category sets; each category set consisted of two categories. One category set was “animals,” which consisted of the categories “dogs” and “cats.” The other category set was “cars,” which consisted of the categories “sports cars” and “sedans.” Stimuli were constructed as morphed images, composed from a mixture of two prototype images each taken from a different category within the same category set (Figure 2A and B). The test stimulus was always chosen to be from the same category set as the sample stimulus, but it could come from either the same or a different category within that category set.
Each trial was initiated by the monkey grabbing a response bar. The trials started with a 1000-msec fixation period, during which a white cross was presented in the middle of the screen. The monkey was required to fixate on it. The fixation period was followed by a 600-msec sample stimulus presentations and then by a 1000-msec delay interval, during which the monkey had to maintain a memory of the stimulus category to be able to successfully complete the task and obtain a reward. The delay period was followed by a test stimulus presentation lasting for another 600 msec.
On match trials, when the test stimulus was from the same category as the sample stimulus, the monkey had to release the response bar within 600 msec of the test stimulus presentation. On nonmatch trials, the monkey had to continue holding the bar during the test stimulus presentation and during a subsequent 600-msec delay interval followed by a second test image. The second test image was always a category match to the sample image (see Figure 2C for a block diagram showing the behavioral protocol). The performance of each monkey was correct in more than 80% of trials.
Neural recordings were made using up to 16 individual, epoxy-coated tungsten electrodes (FHC Inc.) positioned over the lPFC. Spike sorting was performed on digitalized waveforms using principal component analysis (Offline Sorter; Plexon Inc.). Five hundred isolated units were recorded from the two animals. Recordings of 455 of these 500 units were already used for the analysis published in Cromer et al. (2010). Eye movements were recorded using an infrared eye tracking system (Iscan).
Identifying Temporal and Stimulus Selectivity Using Maximum Likelihood
We used a maximum likelihood approach to evaluate whether time and/or stimulus identity was encoded in the firing of the recorded cells. These methods build on analysis methods used to identify time cells in rodent hippocampus and mPFC (Tiganj et al., 2017; Salz et al., 2016). Here, we expanded the approach to include the identity of the stimulus in the modeled firing rates. This enables the method to identify conjunctive, stimulus-specific time cells. In each trial, we only analyzed the 1600 msec starting from presentation of the sample and terminating at the presentation of the next stimulus. This interval includes the 600-msec presentation of the sample stimulus and a 1000-msec blank delay interval. The spike trains of each cell were fitted with models that included different variables, such as time and stimulus identity. The parameter space of these models was systematically explored to compute the maximum likelihood fit. To find the best-fitting model, the parameter space was iteratively searched using a combination of particle swarming and the Quasi-Newton method. Particle swarming was performed first (with the swarm size equal to 50), and its output was used to initialize the Quasi-Newton method, which was performed second (the number of maximum function evaluations was set to 10,000). The computations were implemented in MATLAB 2016a. To avoid solutions that converged to a local minimum, the fitting procedure was repeated until the algorithm did not result with better likelihood for at least five consecutive runs. As a preprocessing step, spike trains were downsampled to 1-msec temporal resolution such that, if a spike was observed in a particular 1-msec time bin, the corresponding data point was set to 1; otherwise, it was set to 0. The maximum likelihood was computed for each recorded cell using all available trials.
Identifying Temporal Receptive Fields
To quantify whether the contribution of the terms that contained time was significant, the maximum log-likelihood was computed again, but this time with the time term set to zero (a1 = 0), such that the likelihood was affected only by the constant term a0. Because the models with and without time are nested, the likelihood ratio test was used to assess the probability that adding the time term significantly improved the fit. The test is based on the ratio of the likelihoods of two different models and expresses how many times the data are more likely under one model than the other, and it takes into account the difference in the number of parameters. To ensure that a unit will not be classified as a time cell only because of its activity in a single trial, the analysis was done separately on even and odd trials. For a unit to be classified as a time cell, it was required that the likelihood ratio test was significant (p < .01) for both even and odd trials. To eliminate units with ramping or decaying firing rate during a delay interval, μt was required to be within the delay interval and at least one σt away from either the beginning or end of the interval. In addition, to eliminate units with overly flat firing rate from classification as a time cell, σt was required to be at most equal to the length of the delay interval.
Quantifying Category Specificity
Quantifying Category Set Specificity
As an additional control to evaluate whether category set specificity was meaningful, we evaluated whether the number of category-set-specific time cells was more than one would expect from artificial pairings of categories. This was done by comparing the number of time cells that distinguished between animal and car category sets with the number of time cells that distinguished between artificial (not meaningful) mixtures of categories. One control model estimated the number of units that distinguished dog and sports car stimuli from cat and sedan car stimuli. The other estimated the number of units that distinguished dog and sedan car stimuli from cat and sports car stimuli. Because these artificial “category sets” are not meaningful, if the actual number of units coding for the true category sets—animal versus car—exceeds the number for the artificial category sets, we can conclude that the population contains information about the organization of the stimuli into category sets.
Quantifying Sustained Activity
Linear Discriminant Analysis for Cross-temporal Classification
In addition to maximum likelihood approach, we used linear discriminant analysis (LDA) to quantify the decoding accuracy. We divided the 1.6-sec interval composed of sample and delay periods into 50-msec-long nonoverlapping time bins. We used the entire population composed of 500 units. The units were recorded during multiple recording sessions with different number of trials. To even the number of trials across all units, we restricted the number of trials to the lowest number recorded from a single unit, 857 trials. For each time bin, we trained an LDA classifier on 80% of randomly chosen trials and used the remaining 20% of the trials for testing. The objective of classification was to accurately assign each trial to one of the four stimulus categories, with chance level being 25%. The testing was done on the same time bin as the training (to evaluate the decoding accuracy) but also on every other time bin (to evaluate performance of a classifier as a function of temporal distance between training and testing time bin). We repeated the training and testing for 10 iterations to obtain robust results (quantified through SEM). The classifier was implemented using MATLAB 2017b function classify. To ensure stability of LDA, the dimensionality of the training and testing data was reduced to full rank before each run of the classifier.
The computational model used here is based on a previously published method for computing a scale-invariant neural timeline (Shankar & Howard, 2012, 2013). The model can be understood as a two-layer feedforward network (Figure 1A). The first layer implements an approximation of the Laplace transform of the input to the network (keeping only the real part of the coefficients). The second layer approximates an inversion of the transform using the Post approximation formula (Post, 1930). Here, we assume that the input function is a transient that captures information about the identity of the sample stimulus at the time it is presented. After the input is presented, the first layer codes the Laplace transform of the input function. As the delay progresses, this input function contains the identity of the sample stimulus further and further in the past. The Laplace transform contains this information, and the second layer approximates a reconstruction of this function, with different units supporting different parts of the time axis. We compare the properties of units in the second layer of the network with experimentally observed stimulus-specific time cells. These units estimate the time of the sample stimulus with decreasing accuracy as it recedes into the past, resulting in broader time fields and fewer units with time fields as the sample stimulus becomes more temporally remote.
To make this more concrete, in the present experiment with four distinct categories of stimuli, it is sufficient to consider an input vector f(t) consisting of four elements. At each moment, f(t) gives the vector-valued category of the stimulus currently presented. When a stimulus from a particular category A is presented, the component fA is set equal to 1 for a brief moment (i.e., a delta function input) and 0 at other times during the delay.
It turns out (Shankar & Howard, 2012) that the width of each unit's activity as a function of time depends linearly on its value of with a Weber fraction that is determined by the value of k. We found a good correspondence to the empirical data with k = 15. Whereas k = 15 yielded the best correspondence with the data, all choices yield similar qualitative results. To implement the kth order derivative with respect to s, 2k + 1 neurons from the first layer need to project to each neuron in the second layer with an on-center/off-surround connectivity pattern. If the neurons in the first layer are anatomically organized by their time constant, a spatially local neighborhood of 2k + 1 neurons from the first layer projects to each neuron in the second layer. A higher-order derivative could also be computed by stacking up layers that implement lower-order derivatives. For instance, a sixth-order derivative could be implemented by stacking three layers, each implementing a second-order derivative. The values of were logarithmically spaced between 100 and 1500 msec. Logarithmic spacing implements Weber–Fechner scaling.
To mimic stimulus specificity, we assumed that units in received a mixture of inputs from stimuli from four categories and two category sets (analogous to the behavioral task). For each unit, we picked one category as its preferred category and weighted its response by one (Equation 9) for that category. For the other categories, we picked coefficients randomly to weight the same temporal response. When a stimulus did not belong to the preferred category but was from the same category set as the preferred category, we weighted the impulse response by a value taken from a normal distribution with a mean of 0.6 and a standard deviation of 0.3. When the stimulus was from the other category set, the temporal response was weighted by a value taken from a normal distribution with a mean of 0.3 and a standard deviation of 0.3. In addition, all of the coefficients were bounded between 0 and 1. These values were chosen informally to provide rough agreement with the empirical data.
In this study, we analyzed recordings from 500 units in lPFC of two macaque monkeys during performance of a delayed match-to-category working memory task initially reported in Cromer et al. (2010). The method is summarized in Figure 2. Neurons in lPFC are known to maintain stimulus information during the delay. Here, we focus on the units that showed temporal modulation, with special attention to the existence of stimulus-specific time cells, as predicted by cognitive models of working memory (Howard et al., 2015). To facilitate comparison of the neural phenomena to theoretical models, we also include simulations of a computational model for a compressed neural timeline (Howard et al., 2014; Shankar & Howard, 2012, 2013).
Units Carrying Temporal Information
Of 500 analyzed units, 240 were classified as time cells. Several examples of firing activity of those units are shown in Figure 3. The time cells activated sequentially, spanning the entire interval. The temporal profiles of all 240 time cells averaged across trials are shown on Figure 4A. The cells are sorted by the peak time of the estimated Gaussian-shaped time fields (μt).
Temporal Information Was Coded with Decreasing Accuracy as the Trial Elapsed
There are two ways that a population of time cells would show decreasing temporal accuracy. First, the width of time fields should increase as the trial elapses. Second, the number of units with time fields earlier in the delay should be larger than the number later in the delay. Both of these properties were observed.
First, the width of the central ridge in Figure 4A increases from the left of the plot to the right of the plot, suggesting that units that fire earlier in the trial tend to have narrower time fields than the units that fire later. This impression was confirmed by analyses of the across-units relationship between the peak time (μt) and the standard deviation (σt) of the estimated Gaussian-shaped time fields (Figure 5A). The correlation between the peak time and the width was significant (Pearson's correlation = .48, p < 10−14). A linear regression model linking the peak time (independent variable) and the width (dependent variable) gave an intercept of 0.09 ± 0.01 (mean ± SE), p < 10−12, and a slope of 0.15 ± 0.02, p < 10−14. To the extent that the relationship is linear, it confirms a key quantitative prediction of a scale-invariant timeline; the dashed green line in Figure 5A is the prediction derived from the theoretical model (see Methods for details).1
Second, the number of time fields later in the trial was smaller than the number of time fields earlier in the interval. This can be seen from the fact that the central ridge in Figure 4A does not follow a straight line, as would have been the case if it followed a uniform distribution. Rather, the curve flattens as the interval proceeds. To quantify this, we examined the distribution of the peak times across time cells (Figure 5B). The Kolmogorov–Smirnov test rejected the hypothesis that the distribution of the peak times is uniform, D(240) = 0.28, p < .001. The dashed green line in Figure 5B is the cumulative that would be expected if the distribution was a power law with exponent −1 with values between 100 and 1500 msec (choice of k does not affect this exponent). The correspondence of the observed results and this theoretical distribution suggests that the timeline is compressed logarithmically, consistent with the Weber–Fechner law. This prediction of the computational model is independent of the choice of k.
In addition, we investigated whether the data are better explained by fitting the stimulus presentation time (first 0.6 sec of the observed interval) separately from the delay period (subsequent 1 sec). We computed a bilinear fit by finding two slopes and two intercepts that maximize the likelihood of the data given the bilinear fit. The power-law fit with the exponent of −1 explained the data better than the bilinear fit (because the two models had different numbers of parameters, the fits were compared in terms of Akaike information criterion (AIC) and Bayesian information criterion (BIC): ΔAIC = 12.6, ΔBIC = 23).
The observation that temporal information was coded with decreasing accuracy as the trial elapsed was further supported by changes in the ensemble similarity across time. The ensemble similarity (Figure 4B) was computed for the population of time cells as a cosine of the normalized firing rate vectors between all pairs of time points during the observed 1.6-sec-long time interval.
The Population Coding for Time Conjunctively Carried Information about Category Identity
Of 240 units classified as time cells, 175 also distinguished the identity of the category of the to-be-remembered stimulus using the criteria described in the Methods section. Figure 6 shows several examples of such cells. These units were classified as time cells because they fired preferentially at circumscribed periods during the trial. However, the magnitude of their firing also depended dramatically on what category of stimulus was presented on the trial.
Time Cells Respected Category Set Structure
The two category sets (animal vs. car) differ in their visual similarity. That is, cat and dog stimuli are more visually similar to one another than they are to stimuli from the car category set. To determine whether stimulus-specific time cells respected this visual similarity structure, we noted that 45 of 240 time cells met the definition for category-set-specific time cells (see Methods for details). Figure 7 shows several examples of representative units that were classified as category-set-specific time cells.
As a control, we computed the number of cells that had similar coefficients for artificial category sets. The number of category-set-specific units significantly exceeded the number of cells specific for artificial category sets: Eight units distinguished dog and sports car stimuli from cat and sedan stimuli. Fifteen units distinguished dog and sedan stimuli from cat and sports car stimuli. Both of these proportions (8/175 and 15/175) are reliably different than the 45 of 175 that distinguished the true category sets (animal vs. car), χ2 = 28.81, p < 10−7, and χ2 = 16.91, p < 10−4, respectively.
The results for conjunctive coding of “what and when” information can be read off directly from the heatmaps in Figure 8, which shows the temporal profiles of all 175 stimulus-specific time cells. The first heatmap in Figure 8 shows the temporal profile for each unit in response to the category that caused the unit to fire at the highest rate. The middle heatmap shows the temporal profile for the same sorting of units for stimuli for the other category from the same category set. For instance, if a particular unit responded most to dog stimuli, that temporal profile would be in the left heatmap and its response to cat stimuli would be in the middle heatmap. The heatmap on the right shows the temporal profile for each unit in response to stimuli from the category set that did not include the unit's best category. For instance, if a unit responded best to dog stimuli, its profile in the right heatmap gives its response to stimuli from the car category set, including trials with sample stimuli from both the sports car and sedan categories.
The sensitivity of the stimulus-specific time cells to category set can be noted from observing the difference between the heatmap in the second column and the heatmap in the third column. Although a difference between the first and second heatmaps could simply be a selection artifact, the difference between the second and the third indicates that time cells in this experiment respected the structure of the category sets.
Most Units That Encoded Category Identity Were Time Cells
The fit with four constant terms was better than the fit with only one constant term for 342 of 500 units. These 342 units distinguished category identity. Most of these units also showed temporally modulated firing. Furthermore, 269 of 342 of the category-selective units were also temporally modulated (Figure 9B). The firing dynamics of the remaining 73 category-specific units seemed irregular rather than sustained in time over the delay (Figure 9A). Figure 10 shows rasters for typical cells that were category specific but did not pass the threshold for reliable temporal modulation (as modeled by a Gaussian time field). Notice that the units that were fit better with a Gaussian-shaped time field than with a constant term were not necessarily considered as time cells. This is because, for classifying units as time cells, we imposed an additional set of requirements regarding the peak time and standard deviation, as described in the Methods section (those criteria were necessary to fully define the time cells in terms of peak time and width of the temporal fields).
The Ensemble Contained Information about the Category Identity Well above Chance at Almost All Time Points
LDA was performed to decode category identity of the sample stimulus at each 50-msec time bin of the sample and delay intervals (see Methods). Accuracy for most of the time bins was above chance (Figure 12). The accuracy was computed for 10 runs; in each run, trials were randomly assigned to train the model (80%, 686 trials) or held out for testing the classifier (20%, 171 trials). For the held-out trials, if the classifier successfully classified 55 or more trials, this exceeds the Type I error rate at the .05 level. As a conservative estimate of decoding accuracy, we took time points for which the average over the 10 runs exceeded this value as reliably coding category identity. The accuracy was particularly high for the time bins where the classifier was tested on the same time bin it was trained on (the diagonal in Figure 12). Every time bin after 100 msec along the diagonal was classified above chance (30/32 bins), indicating that the ensemble maintained information about category identity.
The performance of LDA decoder appeared decreased as a function of temporal distance between the time bin the decoder was trained on and the time bin the decoder was tested on. This is indicated by the gradual change in columns of the heatmap in Figure 12. Peak accuracy was obtained around the diagonal elements and gradually decreased for points further from the diagonal, suggesting that the part of the neural ensemble decoding stimulus category changed gradually over the delay. This observation is consistent with the gradual change through sequential activation observed in time cells.
Computational Model for Compressed Memory Representation
Previous work in computational neuroscience (Howard et al., 2014; Shankar & Howard, 2012) and cognitive psychology (Howard et al., 2015; Shankar & Howard, 2012) has developed a quantitative model for how a compressed timeline could be constructed. This method (described in more detail in the Methods section) makes a strong commitment to scale invariance of the temporal representation inspired by robust behavioral results from timing and memory experiments (Howard, Youker, & Venkatadass, 2008; Rakitin et al., 1998). Consistent with these behavioral findings and the logarithmic compression of receptive fields in the visual system (Van Essen, Newsome, & Maunsell, 1984; Schwartz, 1977), theoretical considerations (Howard & Shankar, 2018; Shankar & Howard, 2013) strongly suggest that time fields should also be logarithmically compressed. This quantitative argument makes two clear predictions. First, the width of time fields should go up linearly with the time of their peak. Second, the number of time fields centered on a time τ should go down like τ−1.
This model can be understood as a two-layer feedforward network. However, because it can be well described mathematically, analytic results can be readily obtained (see Methods for details). Figure 5 provides a means to evaluate whether these quantitative predictions are consistent with the empirical data. The straight line relating spread of the time fields to their center (dashed green line in Figure 5A) is a qualitative prediction of the model; the slope of the line is controlled by a single parameter k (see Methods for details). The distribution of time fields given by the model is given by the distribution N (τ) ≃ τ−1; the dashed green line in Figure 5B shows this distribution with two parameters controlling the smallest and largest possible values of the center of the time field (set to 100 and 1500 msec, respectively). The agreement of the empirical data with the predictions of a scale-invariant timeline is very strong.
To further evaluate the comparison between the model and the empirical results, we also generated heatmaps as in Figure 8. By causing different units in the timeline to respond differentially to stimuli from different categories, we were able to generate a strong agreement with the empirical results, as shown in Figure 11. Because these are analytic results, there is no noise in the time fields across trials. This qualitative fit of the computational model supports quantitative findings from the parameters of the descriptive model of time cells and the linear classifier (Figure 12).
During performance of a working memory task, some neurons in lPFC fired in sequences while information needed to be maintained in working memory. The sequences exhibited coding of temporal information—the time since the sample stimulus was presented (Figure 4)—conjunctively with information about the identity of the category of the sample stimulus (Figure 8). The temporal information decreased in accuracy as time elapsed (Figure 5). This decrease in accuracy aligned well with predictions of a scale-invariant timeline taken from cognitive models of memory.
Temporal Information throughout the Brain
The present findings provide robust evidence of conjunctive coding of “what and when” information in the lPFC. Although, to our knowledge, this is the first report of robust stimulus-specific time cells, many recent articles have shown evidence for sequentially activated time cells in a broad range of brain regions and multiple species. Previous studies in rodents have found time cells with similar properties in hippocampus (Terada, Sakurai, Nakahara, & Fujisawa, 2017; Salz et al., 2016; Kraus, Robinson, White, Eichenbaum, & Hasselmo, 2013; MacDonald et al., 2011), mPFC (Bolkan et al., 2017; Tiganj et al., 2017), and striatum (Akhlaghpour et al., 2016; Mello et al., 2015). A monkey study has observed sequentially activated time cells in dorsolateral PFC and striatum (Jin et al., 2009). In some of those previous studies showing time cell activity, the experimental procedure would not enable measurement of conjunctive “what and when” information (Tiganj et al., 2017; Mello et al., 2015; Kraus et al., 2013; Jin et al., 2009). In some other studies, it would have been possible in principle to measure robust conjunctive “what and when” information (Akhlaghpour et al., 2016), but it was not observed or reported. MacDonald et al. (2011) observed some evidence for conjunctive “what and when” coding in the rodent hippocampus, but it was not as reliable as this study (see also MacDonald et al., 2013). Terada et al. (2017) showed reliable evidence for stimulus coding in the rodent hippocampus. A systematic study will be necessary to determine under what circumstances time cells also show evidence for stimulus identity.
The presence of sequentially activated time cells, regardless of whether or not they also code for stimulus identity, in so many brain regions with such similar functional properties is striking. This may point to a fundamental role for a compressed timeline in many different forms of memory. Our conventional understanding of the cognitive neuroscience of memory describes memory as composed of a number of separable systems associated predominantly with various brain regions (Jenkins & Ranganath, 2016; Eichenbaum, 2012; Squire, 2004). Although there are ongoing disputes about how exactly to specify the systems, there is broad consensus that the hippocampus is associated with the declarative memory system, the striatum is associated with a nondeclarative implicit memory system, PFC is associated with a working memory system, and so forth. The fact that such similar temporal representations are observed in regions associated with so many distinct memory systems suggests that these memory systems rely on a common form of temporal representation. Perhaps, different memory systems perform different operations on a common representation (Aronov, Nevers, & Tank, 2017; Howard et al., 2015).
For many years, the default understanding of the neural basis of working memory was that working memory is maintained through stable persistent firing (Curtis & D'Esposito, 2003; Goldman-Rakic, 1995; Funahashi, Bruce, & Goldman-Rakic, 1989; Fuster & Alexander, 1971). We did not observe large numbers of category-selective cells that exhibited sustained firing (see Figure 10 for the best examples). In contrast, category-selective units were temporally modulated, either as time cells or as ramping or decaying cells. Thus, this study adds to a growing body of empirical work (e.g., Murray et al., 2017; Spaak, Watanabe, Funahashi, & Stokes, 2017; Lundqvist et al., 2016; Stokes et al., 2013) that requires an updated view of working memory as a dynamically changing representation.
Interestingly, although this task did not test animals' memory for temporal order, our results clearly show that the stimulus-selective neurons nonetheless conveyed temporal information. Thus, maintaining a memory representation of what happened when over the recent past might be largely spontaneous.
We observed that the width of the temporal receptive fields increased soon after the stimulus onset, despite the fact that the stimuli remained present for 600 msec. Similarly, when LDA was used for decoding the stimulus identity, we observed the peak performance soon after the stimulus onset. This suggests that the stimulus onset is perceived as a more salient effect than the stimulus offset. Further research with experimental paradigms that include longer stimulus presentations is needed to evaluate how the stimulus duration is encoded. Furthermore, further research is needed to understand whether the neural timeline can maintain multiple stimuli simultaneously as well as multiple repetitions of the same stimulus. The theoretical framework described here is linear and predicts that multiple repetitions of the same stimulus result in additive response that allows reconstruction of the presentation time of each repetition.
Computational Model of Working Memory
Computational neuroscience studies of working memory have predominantly focused on maintaining information about the identity of presented stimuli, either through sustained (Curtis & D'Esposito, 2003; Wang, 2001; Goldman-Rakic, 1995) or time-varying (Murray et al., 2017; Stokes, 2015; Sreekumar, Dennis, Doxas, Zhuang, & Belkin, 2014; Lundqvist et al., 2011; Durstewitz & Seamans, 2006) firing dynamics. Although very useful, such models do not readily account for the temporal aspect of memory. Other computational models (e.g., Itskov, Curto, Pastalkova, & Buzsáki, 2011; Goldman, 2009; Grossberg & Merrill, 1992) can also account for sequentially active firing that can be used to read off temporal information.
The computational model used here differs from previous work in that it makes a strong commitment to a logarithmically compressed timeline and is mathematically tractable, enabling straightforward derivation of behavioral predictions (Shankar & Howard, 2012, 2013). The biological plausibility of the key components of the model has been studied closely. The major objection, that the method requires neurons with slow functional time constants, has been addressed by showing that a single-cell model based on known properties of persistently firing neurons (Fransén, Tahvildari, Egorov, Hasselmo, & Alonso, 2006; Egorov et al., 2002) can be readily adapted to generate a broad range of slow functional time constants (Tiganj, Hasselmo, & Howard, 2015; Tiganj, Shankar, & Howard, 2013). The other major assumption of the model is a feedforward projection that is functionally equivalent to a set of on-center/off-surround receptive fields placed in series. The logarithmic compression of temporal receptive fields parallels the logarithmic compression of visual receptive fields, which has been known for decades (Van Essen et al., 1984; Hubel & Wiesel, 1974). The observed compression is consistent with the Weber–Fechner law, providing a potential neural substrate for this widely observed psychophysical law. Moreover, the mathematics of this model can be readily adapted to provide a description not only of time cells but also a variety of findings from the place cell literature and even sustained firing (Howard et al., 2014).
Stimulus-specific Time Cells Are Predicted by Many Theories of Memory
Previous studies have reported that it is possible to extract temporal or stimulus identity information by applying different decoding techniques on the activity of neural populations (Kim, Ghim, Lee, & Jung, 2013; Stokes et al., 2013; Hung, Kreiman, Poggio, & DiCarlo, 2005; Baeg et al., 2003; Pesaran, Pezaris, Sahani, Mitra, & Andersen, 2002). Neural models of timing rely on gradually changing firing rate throughout a delay period rather than temporal receptive fields (Kim et al., 2013; Gavornik & Shouval, 2011; Simen, Balci, de Souza, Cohen, & Holmes, 2011). Similarly, recurrent neural networks, including liquid state machines (Buonomano & Maass, 2009; White, Lee, & Sompolinsky, 2004; Maass, Natschläger, & Markram, 2002; Buonomano & Merzenich, 1995), can be shown to maintain information about preceding stimuli, but the decoder necessary to extract that information into a useful form can be quite complex.
Fusi, Miller, and Rigotti (2016) have argued that the brain expends significant resources representing features conjunctively using mixed selectivity to enable linear decoding (Rigotti et al., 2013). For coding of continuous dimensions, mixed selectivity manifests as receptive fields. To make a concrete example, the hippocampal place code might have consisted of as few as two neurons that fire proportionally to the x or y position of the animal. Although this code would require few neurons to represent position, learning an association between a location in the middle of an arena and reward would be a significant computational challenge. Instead, the hippocampal place code uses many neurons, each with a place field; the set of all place fields tiles the enclosure. Although this coding scheme uses many more neurons, it is computationally straightforward to learn an association between a circumscribed spatial position (represented by the currently active place cells) and some behaviorally relevant outcome. Despite the fact that sequentially activated stimulus-specific time cells seem to require a great many neurons, conjunctive coding of “what and when” information (Figure 8) enables direct readout of the elapsed time and the stimulus identity.
The authors gratefully acknowledge support from ONR MURI N00014-16-1-2832, NIBIB R01EB022864, NIMH R01MH112169, and NIMH R37MH087027.
Reprint requests should be sent to Zoran Tiganj, Center for Memory and Brain, 610 Commonwealth Avenue, Boston, MA 02215, or via e-mail: firstname.lastname@example.org.
Other values of k would have also resulted in a straight line, but with a different slope. Smaller values of k result in a steeper slope.