Unlike familiarity, recollection involves the ability to reconstruct mentally previous events that results in a strong sense of reliving. According to the reinstatement hypothesis, this specific feature emerges from the reactivation of cortical patterns involved during information exposure. Over time, the retrieval of specific details becomes more difficult, and memories become increasingly supported by familiarity judgments. The multiple trace theory (MTT) explains the gradual loss of episodic details by a transformation in the memory representation, a view that is not shared by the standard consolidation model. In this study, we tested the MTT in light of the reinstatement hypothesis. The temporal dynamics of mental imagery from long-term memory were investigated and tracked over the passage of time. Participant EEG activity was recorded during the recall of short audiovisual clips that had been watched 3 weeks, 1 day, or a few hours beforehand. The recall of the audiovisual clips was assessed using a Remember/Know/New procedure, and snapshots of clips were used as recall cues. The decoding matrices obtained from the multivariate pattern analyses revealed sustained patterns that occurred at long latencies (>500 msec poststimulus onset) that faded away over the retention intervals and that emerged from the same neural processes. Overall, our data provide further evidence toward the MTT and give new insights into the exploration of our “mind's eye.”
Episodic memory involves the recollection of a unique event that occurred in a specific context. It is different from semantic memory, which reflects the ability to retrieve general concepts in the absence of contextual details (Tulving, 1972). The Remember/Know (R/K) paradigm was originally developed by Tulving (1985) to distinguish between episodic (“Remember” responses) and semantic memories (“Know” responses). Over time and with its extensive use in recognition memory, “Remember” and “Know” responses became associated with recollection and familiarity processes, respectively (Yonelinas, 2002). A key difference between “remembering” and “knowing” lies in the ability to create mental images of the information to be retrieved. Indeed, episodic memories would be associated with conscious mental images, an attribute that would not be shared by semantic memories (Gardiner & Richardson-Klavehn, 2000; Brewer & Pani, 1996; Tulving, 1983).
According to the reinstatement hypothesis, the retrieval of specific episodic details would emerge from the reactivation of cortical patterns that were involved during the encoding of the event (Rugg, Johnson, Park, & Uncapher, 2008; Damasio, 1989). Much experimental evidence supports the reinstatement hypothesis by showing that the brain regions that are active during episodic retrieval overlap with those that were involved at the time of the encoding of the information (for a review, see Danker & Anderson, 2010). In the absence of sensory information, top–down mechanisms originating from the prefrontal and lateral parietal cortex would trigger reactivations in the sensory areas (Dijkstra, Zeidman, Ondobaka, Van Gerven, & Friston, 2017; Mechelli, Price, Friston, & Ishai, 2004; Ishai, Ungerleider, & Haxby, 2000). Therefore, the retrieval of content-specific information from the sensory cortical regions (Johnson & Rugg, 2007; Woodruff, Johnson, Uncapher, & Rugg, 2005; Nyberg, Habib, McIntosh, & Tulving, 2000; Wheeler, Petersen, & Buckner, 2000) mediated by content-independent retrieval effects observed in the medial-temporal lobe, in the pFC, and in the lateral parietal cortex would contribute to the reactivation of a unitary episodic representation that would be directly accessible to consciousness (Rugg & Vilberg, 2013).
Importantly, the retrieved episodic content is not an exact copy of the encoded information, and some transformations are already at hand a day after exposure (Xiao et al., 2017). Important reorganizations occur at the system level, resulting in the gradual disengagement of the hippocampus within a few weeks of encoding (Frankland & Bontempi, 2005). This is reflected in terms of subjective reports where retrieval becomes increasingly supported by familiarity judgments (Piolino, Desgranges, & Eustache, 2009; Herbert & Burt, 2004; Conway, Gardiner, Perfect, Anderson, & Cohen, 1997). The gradual loss of episodic details is explained by the multiple trace theory (MTT) under the transformation hypothesis (for a review, see Winocur & Moscovitch, 2011). It posits that episodic memories fade away over time in such a way that most of the transformed memories contain no more than a schematic version of the original memory: the “gist.” This view differs from the standard consolidation model (Squire & Alvarez, 1995) for which (1) episodic memories and semantic memories are considered as two separate entities and (2) episodic memories would not undergo a qualitative change over the consolidation process. Importantly, although these “semanticized” memories would not rely on the hippocampus, remote memories that contain some episodic details would still be hippocampus dependent according to the MTT. Again, this conclusion is not shared by the standard consolidation model for which both remote semantic and episodic memories would not require hippocampal activations.
The reinstatement hypothesis suggests that cortical reinstatement is specific to recollection. So far, however, the patterns of memory reinstatement found in the fMRI studies were similar for recollection and familiarity judgments in R/K paradigms (Thakral, Wang, & Rugg, 2017; Johnson, McDuff, Rugg, & Norman, 2009). This indicates that the observed spatial patterns could not differentiate between the two distinct phenomenological experiences of remembering and knowing. In this study, we decided to focus on the temporal patterns associated with memory reinstatement, as specific ERP signatures have already been identified.
The late positive component (LPC) occurs between 500 and 800 msec in the left parietal region and has been linked to recollection. Converging evidence suggests that activations of the inferior parietal cortex correspond to the LPC (for a review, see Vilberg & Rugg, 2008). Interestingly, the LPC persists over retention intervals ranging from several seconds (Nessler & Mecklinger, 2003) to several minutes and up to 1 day (Wolk et al., 2006; Curran & Friedman, 2004) but significantly decreases after a retention interval of a week (Roberts, Tsivilis, & Mayes, 2013) up to 4 weeks (Tsivilis et al., 2015). This late ERP component is generally contrasted with the FN400, a midfrontal negativity occurring within 300–500 msec after stimulus onset, which has been linked to familiarity judgment (Duarte, Ranganath, Winward, Hayward, & Knight, 2004; Tsivilis, Otten, & Rugg, 2001; Curran, 2000; Düzel, Yonelinas, Mangun, Heinze, & Tulving, 1997; for a review, see Rugg & Curran, 2007). Unlike the LPC, the FN400 appears insensitive to long delays.
ERP data provide meaningful information concerning the time course of long-term retrieval processes but do not propose an integrated view of the complete dynamics during the reinstatement of episodic memories. Multivariate pattern analyses (MVPAs) on time-resolved signals have been proposed in this sense (King & Dehaene, 2014). This decoding technique was recently used to characterize the temporal dynamics associated with mental imagery arising from STM (Dijkstra, Mostert, DeLange, Bosh, & van Gerven, 2018). Data revealed a late sustained pattern that relied on activations in the sensory areas as well as in the frontal and parietal regions. This sustained pattern suggests that a single process was involved during mental imagery reinstatement and contrasted with the sequential ordering observed during perception.
In this study, we decided to test the transformation hypothesis proposed by the MTT in light of the cortical reinstatement theory. To this end, we proposed to characterize the temporal dynamics of mental imagery from long-term memories and track these dynamics as memories aged. Three predictions were made: (1) the recollection of episodic details should be associated with a clear pattern like the one obtained during mental imagery from STM (Dijkstra et al., 2018), (2) this pattern should vanish over the passage of time reflecting the loss of episodic details, and (3) this recollection pattern should rely on the same neural processes irrespective of the age of the memory. To this end, we investigated participant EEG activity (n = 11) during the recall of short audiovisual clips seen 3 weeks, 1 day, and a few hours beforehand. Recall was elicited by a short presentation of snapshots of previously seen or new audiovisual clips and assessed by a Remember/Know/New (R/K/N) paradigm. The reason audiovisual clips were used was to enable participants to perform rich and dynamic mental imagery. ERP and decoding analyses were performed.
Eleven healthy participants (six women, mean age = 24 years, SD = 1.9 years) gave written informed consent and participated in the experiment approved by the INSERM ethical evaluation committee.
Seven-second audiovisual clips (n = 750) downloaded from the Internet were used in the experiment. The clips contained no speech and were selected to be diversified and unknown to the participants. Half of them were shown to a group of participants (n = 6) during the learning sessions, and the other half of the clips were shown to the second group of participants (n = 5). For all the audiovisual clips, one frame of the 7-sec clip was chosen as a representative snapshot of the clip and presented during the recall phase. The same 750 snapshots were presented for all the participants and counterbalanced between participant groups.1
The experiment included two phases: an explicit learning phase during which audiovisual clips were presented to the participants and a recall test of these clips (Figure 1). The explicit learning phase was composed of three viewing sessions occurring respectively 3 weeks (3W, mean interval = 21 days 2 hr 43 min, SD = 16 hr 54 min), 1 day (1D, mean interval = 1 day 2 hr 21 min, SD = 2 hr 39 min), and several hours (HOURS, mean interval = 4 hr 37 min, SD = 1 hr 6 min) before the recall test. In every session, 125 audiovisual clips (duration = 7 sec, size = 320 × 240) were shown to participants (n = 11) on a computer screen. Every session was split into two blocks of 65 and 60 videos. All the clips were presented twice within a block but were not presented again in any other session blocks. To control for participant attention during viewing, participants were asked to make a button-press response every time a clip was shown for the second time. Participants could also indicate that they already knew a clip by pressing a key assigned to this purpose during the clip presentation, in which case the corresponding trials were discarded from further analysis. Overall, participants watched 375 audiovisual clips, presented twice, during the three learning sessions. The duration of a learning session was about 50 min. Participants were instructed that a recall test on the audiovisual clips would be performed 3 weeks after the first learning session. The testing phase, performed 3 weeks after the first learning session, was a cued-recall task of 750 trials during which participant EEG activity was recorded. Each trial started with a fixation cross of varying duration (600–1000 msec) followed by either a snapshot from a clip seen during the learning sessions (OLD, n = 375) or a new one (NEW, n = 375), which was presented for 400 msec.
After picture offset, participants were asked to press one of the three keys corresponding to Remember (“R”), Know (“K”), and New (“N”) judgments (randomized between participants), as fast and as accurately as possible, with the same hand. Before testing, participants were instructed about the meaning of these three response options. The definitions given to the participants were based on those proposed by Gardiner and Richardson-Klavehn (2000) and were the following: “R”: You identify the picture as previously seen in one of the three learning sessions and you can replay the related clip in your mind by anticipating the following scenes of the short scenario; “K”: You identify the picture as previously seen in one of the three learning sessions but you are unable to retrieve the clip it belongs to; “N”: The picture does not belong to any of the audiovisual clips shown during the three learning sessions. Participants were then invited to judge their response on a 5-point confidence scale (1 = not sure at all, 2 = not so sure, 3 = fairly sure, 4 = very sure, 5 = completely sure). The duration of the recall test was about 1 hr 15 min. The learning and recall phases were programmed using the Psychtoolbox, a MATLAB (The MathWorks, Inc.) toolbox.
EEG Acquisition and Analysis
All of the 11 participants were included in the EEG analysis.
EEG Acquisition and Preprocessing
During the recall test, participant brain activity was recorded with a 64-channel cap connected to a BioSemi EEG amplifier (5 kHz) and with a sampling frequency of 1024 Hz. A finite impulse response Blackman band-pass filter was applied to the data with a lower cutoff frequency of 0.1 Hz to avoid filtering artifacts (Acunzo, Mackenzie, & van Rossum, 2012), a higher cutoff frequency of 100 Hz and a transition bandwidth = 1. Data were down-sampled to 256 Hz. Trials were then epoched from −1 to 2 sec relative to picture onset, their baseline removed (−0.2 to 0 sec) and rereferenced based on the average reference. Independent component analyses were performed on the whole data, and the components were visually inspected to remove the ones that did not correspond to EEG signal (eye and jaw movements, heart activity). For all the participants, trials with extreme values for a channel (>100 or < −100 μV), an abnormal trend (slope max: 75 μV, R² > .3), or an abnormal distribution (kurtosis outside 5 SDs of the kurtosis for each single electrode) were excluded from the analysis after visual inspection. All the preprocessing steps were done with EEGlab (Delorme & Makeig, 2004).
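As an illustration only, the baseline removal, average rereferencing, and amplitude-based trial flagging described above can be sketched in a few lines of Python with NumPy. This is not the EEGlab pipeline used in the study: the function names, the array layout (trials × channels × samples, in μV), and the decision to flag rather than delete trials are assumptions made for the example, and the trend and kurtosis criteria are not covered.

```python
import numpy as np

def baseline_and_rereference(epochs, times, baseline=(-0.2, 0.0)):
    """Remove the prestimulus baseline, then apply an average reference.

    epochs: array (n_trials, n_channels, n_samples) in microvolts (assumed layout).
    times:  array (n_samples,) in seconds relative to picture onset.
    """
    in_base = (times >= baseline[0]) & (times < baseline[1])
    # Subtract each channel's mean over the baseline window, trial by trial
    epochs = epochs - epochs[:, :, in_base].mean(axis=2, keepdims=True)
    # Average reference: subtract the mean across channels at every sample
    return epochs - epochs.mean(axis=1, keepdims=True)

def flag_extreme_trials(epochs, limit_uv=100.0):
    """Return a boolean mask of trials in which any channel exceeds +/- limit_uv."""
    return (np.abs(epochs) > limit_uv).any(axis=(1, 2))
```

In the study, flagged trials were excluded only after visual inspection, so a mask of this kind would serve as a list of candidates rather than as an automatic rejection rule.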
Importantly and following the recommendations in VanRullen (2011), the analysis of the EEG data was restricted to the conditions that were manipulated by the experimenter. Indeed, introducing subjective responses in the comparisons such as participant memory judgment or selecting only the correct responses would likely produce biases in the analysis. Therefore, in the following ERP and MVPAs, trials were split according to the four retention intervals: HOURS, 1D, 3W, and NEW for snapshots respectively seen a few hours, 1 day, and 3 weeks beforehand, or never seen in the learning sessions.
The ERP time courses for the four retention intervals were analyzed on average across the participants for t = −200 to 800 msec after picture onset. The topographical representations of the ERPs were analyzed every 50 msec. Differential activity was also computed on a 2 × 2 analysis by contrasting trials with snapshots seen during a learning session and trials with NEW snapshots. The topographical representations of the ERP contrasts were analyzed every 50 msec. To ensure that the activity observed was not mainly driven by some motor activity linked to the button press, the ERPs for the four retention intervals were aligned to participant response onset. All trials answered within 1.8 sec after picture onset were included (HOURS: mean number = 113, SD = 7; 1D: mean number = 105, SD = 12; 3W: mean number = 91, SD = 21; NEW: mean number = 289, SD = 51). These response-locked ERPs were analyzed on average across the participants for t = −800 to 200 msec after response onset. The differential activity between snapshots previously seen and NEW snapshots was also computed on a 2 × 2 analysis for the response-locked ERPs.
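The realignment of the epochs to response onset can be illustrated with the following minimal sketch. It assumes stimulus-locked epochs stored as a NumPy array (trials × channels × samples) with a regular time axis and one RT per trial in seconds; the function name `response_lock` and its parameters are hypothetical and do not correspond to any EEGlab routine.

```python
import numpy as np

def response_lock(epochs, times, rts, tmin=-0.8, tmax=0.2, max_rt=1.8):
    """Cut a window of [tmin, tmax) seconds around each trial's response.

    epochs: (n_trials, n_channels, n_samples), stimulus-locked (assumed layout).
    times:  (n_samples,) seconds relative to picture onset, regularly sampled.
    rts:    (n_trials,) response times in seconds after picture onset.
    Trials answered later than max_rt are dropped so the window fits the epoch.
    """
    sfreq = 1.0 / (times[1] - times[0])
    n_samp = int(round((tmax - tmin) * sfreq))
    keep = np.flatnonzero(rts <= max_rt)
    out = np.empty((len(keep), epochs.shape[1], n_samp))
    for i, k in enumerate(keep):
        # Index of the first sample of the window for this trial
        start = int(round((rts[k] + tmin - times[0]) * sfreq))
        out[i] = epochs[k, :, start:start + n_samp]
    return out, keep
```

With epochs running from −1 to 2 sec and a 1.8-sec RT limit, the window ending 200 msec after the response always stays inside the recorded epoch.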
MVPAs were conducted on the same data used for the ERP analyses. The classification was performed on two classes: one class corresponding to snapshots seen during a specific learning session (HOURS, 1D, or 3W) and the other class corresponding to NEW snapshots. Trials were randomly split 120 times according to the Monte Carlo cross-validation (CV) procedure: 120 CVs. For each CV, the number of trials was equalized between the classes: 90% of the trials were used by the algorithm as a training set, and the remaining 10% were used as a test set. To reduce the effect of outlier values in the signal, the activity of each electrode was normalized across the trials: For each CV, the signal was scaled between 0 and 1 by using parameters that were estimated from the training set. A linear classifier (L2-regularized logistic regression; Fan, Chang, Wang, & Lin, 2008) was trained on individual trials at each time point from −200 to 800 msec and tested on individual trials on the same time points and all the other time points (temporal generalization). The performance of the classifier for every participant and at each test time point was evaluated by using the area under the curve. The average decoding performance across participants resulted in a decoding matrix where the x and y axes represented the training and testing times, respectively. Chance-level decoding was calculated by performing the same classification on randomly permuted labels. This generalization across time was further extended to generalization across conditions where a classifier trained on one condition was tested on its ability to generalize to another condition (e.g., trained on HOURS and NEW trials and tested on 1D and NEW trials). Again for each condition, one class corresponded to snapshots never seen before (NEW) and one class to snapshots seen during a specific learning session (HOURS, 1D, or 3W). This led to six different time-and-condition generalization decoding matrices. 
These six decoding matrices were obtained using the same procedure as the previous time generalization decoding matrices, except for the number of CVs, which was lowered to 20 to reduce computational time.
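The temporal generalization procedure described above can be sketched as follows. This is a simplified NumPy illustration, not the code used in the study: the classifier is an L2-regularized logistic regression fitted by plain gradient descent (standing in for the LIBLINEAR solver of Fan et al., 2008), the 90/10 split is drawn separately within each class, and the min-max parameters are estimated per channel over all training trials and time points. These three choices, as well as all function names, are assumptions.

```python
import numpy as np

def fit_logreg(X, y, l2=1.0, n_iter=300):
    """L2-regularized logistic regression fitted by gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    step = 1.0 / (0.25 * (d + 1) + l2 / n)  # safe step size for features in [0, 1]
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= step * (X.T @ (p - y) / n + l2 * w / n)
        b -= step * np.mean(p - y)
    return w, b

def auc(scores, y):
    """Area under the ROC curve from pairwise comparisons (ties count 0.5)."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))

def temporal_generalization(X, y, n_splits=120, test_frac=0.1, seed=0):
    """Return a (train time x test time) matrix of AUCs averaged over splits.

    X: (n_trials, n_channels, n_times); y: 0 for NEW, 1 for previously seen.
    """
    rng = np.random.default_rng(seed)
    n_times = X.shape[2]
    scores = np.zeros((n_splits, n_times, n_times))
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    n_per_class = min(len(idx0), len(idx1))
    n_test = max(1, int(round(test_frac * n_per_class)))
    for s in range(n_splits):
        # Equalize the classes by random subsampling, then split 90/10
        a = rng.permutation(idx0)[:n_per_class]
        b = rng.permutation(idx1)[:n_per_class]
        test = np.concatenate([a[:n_test], b[:n_test]])
        train = np.concatenate([a[n_test:], b[n_test:]])
        # Min-max scaling with parameters estimated on the training set only
        lo = X[train].min(axis=(0, 2), keepdims=True)
        hi = X[train].max(axis=(0, 2), keepdims=True)
        Xtr, Xte = (X[train] - lo) / (hi - lo), (X[test] - lo) / (hi - lo)
        for t_train in range(n_times):
            w, bias = fit_logreg(Xtr[:, :, t_train], y[train])
            for t_test in range(n_times):
                scores[s, t_train, t_test] = auc(Xte[:, :, t_test] @ w + bias, y[test])
    return scores.mean(axis=0)
```

Generalization across conditions follows the same logic, with the classifier fitted on the trials of one contrast (e.g., HOURS and NEW) and scored on the trials of another (e.g., 1D and NEW).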
The same statistical procedure was applied for the EEG analyses described above: time course representation (channels by time points), topographical representation (channels at specific time points), and decoding matrices (time points by time points). Statistical significance was assessed using nonparametric cluster permutation tests, n(permutations) = 4,000, p < .05, to control for the family-wise error rate (Maris & Oostenveld, 2007). This was performed by adding Fieldtrip functionalities (Oostenveld, Fries, Maris, & Schoffelen, 2011) to the EEGlab toolbox.
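To make the logic of the cluster permutation test concrete, the following sketch implements a one-dimensional, paired (sign-flip) version in NumPy. It is a didactic simplification under stated assumptions and not the Fieldtrip implementation used in the study: clusters are formed over adjacent time points only, the cluster statistic is the summed t value, and the default cluster-forming threshold of 2.228 is the two-sided critical t value at p = .05 for 10 degrees of freedom (11 participants). Function names are hypothetical.

```python
import numpy as np

def _runs(mask):
    """Return (start, stop) index pairs of contiguous True runs in a 1-D mask."""
    padded = np.concatenate([[False], mask, [False]])
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[::2], edges[1::2]))

def _tvals(diff):
    """One-sample t values across participants, one per time point."""
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))

def cluster_permutation_test(diff, t_thresh=2.228, n_perm=4000, seed=0):
    """diff: (n_participants, n_times) paired differences between conditions.

    Returns a list of (start, stop, p) for each observed cluster, where p is
    the share of permutations whose largest cluster mass reaches the observed one.
    """
    rng = np.random.default_rng(seed)

    def cluster_masses(t):
        out = []
        # Positive and negative clusters are formed separately
        for mask in (t > t_thresh, t < -t_thresh):
            out += [(a, b, abs(t[a:b].sum())) for a, b in _runs(mask)]
        return out

    observed = cluster_masses(_tvals(diff))
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        # Under the null hypothesis, the sign of each participant's difference is arbitrary
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        masses = [m for _, _, m in cluster_masses(_tvals(diff * signs))]
        null_max[i] = max(masses, default=0.0)
    return [(int(a), int(b), float(np.mean(null_max >= m))) for a, b, m in observed]
```

Because each observed cluster is compared with the distribution of the maximum cluster mass, the family-wise error rate is controlled across all time points at once.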
On average, 8 of the 375 videos presented (SD = 9.8) were considered already known by the participants. For each participant, the corresponding trials were discarded from the behavioral and the EEG analysis.
Three learning sessions were done respectively 3 weeks (3W), 1 day (1D), and a few hours (HOURS) before the recall test. To make sure that participants were paying attention to the videos, they were asked to make a button-press response whenever a video was presented for the second time. Accordingly, a correct and a wrong button press respectively corresponded to a hit (HIT) and a false alarm (FA) within the signal detection theory framework. The percentage of HITs and FAs was close to optimal performance and was stable over the different learning sessions: 3W: HITs = 96.7% (SD = 3.2, range = 90.2–100%) and FAs = 0.3% (SD = 0.4, range = 0–0.8%); 1D: HITs = 96.3% (SD = 4.7, range = 86.5–100%) and FAs = 0.5% (SD = 0.6, range = 0–1.7%); HOURS: HITs = 95.9% (SD = 5.5, range = 83.1–100%) and FAs = 0.2% (SD = 0.4, range = 0–0.8%). These results show that the encoding of the audiovisual clips was similar across the three learning sessions (one-way ANOVA for HITs: F(2, 30) = 0.1, CI95 difference (Learning 1 vs. Learning 2) [−4.4, 5.3], CI95 difference (Learning 1 vs. Learning 3) [−4.0, 5.8], CI95 difference (Learning 2 vs. Learning 3) [−4.4, 5.3]). For each learning session, the audiovisual clips were presented twice inside a block of 65 or 60 clips. On average across participants, the average median number of clips between the two presentations of the same clip was 30.4 (SD = 0.7, range = 30–32) and varied between 0.5 (SD = 0.5, range = 0–1) and 112.4 (SD = 3.4, range = 110–121) for each participant. Because the number of misses was very small compared with the number of HITs (participants were almost at ceiling), we could not test whether the number of clips between the two presentations of the same clip significantly affected participant performance.
During the recall phase, snapshots from audiovisual clips seen by the participant (n = 375) or NEW pictures (n = 375) were briefly presented on screen. Participants were asked to make an R/K/N judgment as quickly and accurately as possible and to rate the confidence in their response on a 5-point scale.
Performance over the retention intervals.
In this section, “R” and “K” responses were merged to calculate participant performance for the identification of the snapshots.
Participants were 90.1% (SD = 9.5), 85.1% (SD = 10.3), and 64.5% (SD = 18.1) correct in identifying snapshots seen respectively a few hours (HOURS), 1 day (1D), and 3 weeks (3W) before and were 79.7% (SD = 13.3) correct at identifying NEW pictures. The percentage of correct responses for snapshots seen 3W before was significantly different from the performance for snapshots seen HOURS (CI95 difference [10.4, 40.7]) and 1D before (CI95 difference [5.4, 37.7]; one-way ANOVA: F(3, 40) = 7.68; Figure 2A).
Participant average confidence was 4.7 (SD = 0.3), 4.6 (SD = 0.4), and 3.9 (SD = 0.8) for the HOURS, 1D, and 3W conditions, respectively, and 3.5 (SD = 0.9) for the NEW snapshots. The HOURS condition was significantly different from the 3W condition (CI95 difference [0.0, 1.5]) and the NEW condition (CI95 difference [0.4, 1.9]; one-way ANOVA: F(3, 40) = 7.64). These results show that participants were able to identify the snapshots explicitly and that they could estimate accurately the probability of being correct.
Participant performance was strongly negatively correlated (r = −.65, Pearson's correlation coefficient) with the speed of the recall. Participant median RTs increased over the retention intervals: 1.04 sec (SD = 0.12), 1.14 sec (SD = 0.16), and 1.53 sec (SD = 0.34) for snapshots correctly identified and respectively seen in the HOURS, 1D, and 3W conditions. Participant median RT was 1.27 sec (SD = 0.20) for NEW snapshots. The 3W condition was significantly different from the HOURS condition (CI95 difference [0.23, 0.74]) and 1D condition (CI95 difference [0.13, 0.64]; one-way ANOVA: F(3, 40) = 9.5; Figure 2B).
Overall, participant subjective responses were distributed as follows: “R” = 33.9% (SD = 11.4), “K” = 15.8% (SD = 7.4), and “N” = 50.3% (SD = 11.8). The significant difference between “R” and “K” responses shows that the snapshots were good cues for eliciting the recollection of the audiovisual clips (paired t test: t(10) = 3.9, p < .01, CI95 difference [7.9, 28.3]). Participant average performance was high for “R” and “N” responses with respectively 93.0% (SD = 6.0) and 81.4% (SD = 7.4) of correct responses and significantly lower for “K” responses: 55.3% (SD = 20.4; one-way ANOVA: F(2, 30) = 24.34, CI95 difference [24.0, 51.3], CI95 difference [12.5, 39.8]). Participant average confidence was significantly the highest for “R” responses: 4.7 (SD = 0.2) and lower for “N”: 3.4 (SD = 0.9) and “K” responses: 3.0 (SD = 0.5; one-way ANOVA: F(2, 30) = 23.19, CI95 difference [1.1, 2.4], CI95 difference [0.6, 1.9], CI95 difference [−1.1, 0.2]).
The average number of “R,” “K,” and “N” judgments across participants is presented in Table 1. The values seem to indicate that correct “R” responses decrease over the retention interval whereas correct “K” responses increase. To test whether this trend was significant, we calculated the proportion of correct “R” responses compared with correct “K” responses: the correct “R/K” ratio. This ratio was calculated for each participant and over the three retention intervals: HOURS, 1D, and 3W. Interestingly, the average correct “R/K” ratio across the participants decreased over the retention intervals: 87.4% (SD = 15.7), 81.4% (SD = 16.4), 55.6% (SD = 23.31) for snapshots respectively seen in the HOURS, 1D, and 3W conditions with the 3W condition significantly different from the HOURS condition (CI95 difference [12.0, 51.6]) and 1D condition (CI95 difference [5.9, 45.5]; one-way ANOVA: F(2, 30) = 8.85; Figure 2C).
Table 1. Average number of “N,” “K,” and “R” judgments across participants for the NEW, HOURS, 1D, and 3W conditions.
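The correct “R/K” ratio reported above is expressed as a percentage, which suggests that it is the share of correct “R” responses among all correct “R” and “K” responses. Under that assumption (the exact formula is not spelled out in the text), it can be computed per participant and retention interval as in the following sketch; the function name and the example counts are hypothetical.

```python
def correct_rk_ratio(n_correct_r, n_correct_k):
    """Percentage of correct "R" responses among correct "R" and "K" responses.

    Assumed definition: 100 * R / (R + K), computed for one participant
    and one retention interval.
    """
    total = n_correct_r + n_correct_k
    if total == 0:
        raise ValueError("no correct R or K responses for this condition")
    return 100.0 * n_correct_r / total

# Hypothetical counts for one participant in one condition
ratio = correct_rk_ratio(80, 20)
```

A ratio close to 100% indicates that correct identifications were almost always accompanied by recollection, whereas a ratio near 50% indicates an even split between “R” and “K” judgments.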
Participant median RTs for correct “R” responses were on average 1.01 sec (SD = 0.1), 1.07 sec (SD = 0.2), and 1.30 sec (SD = 0.3) for snapshots seen in the HOURS, 1D, and 3W conditions, respectively (Figure 2D), with a significant difference between the HOURS and 3W conditions (CI95 difference [0.04, 0.52]; one-way ANOVA: F(2, 30) = 4.69). Participant RTs for remembering the clips were negatively correlated with participant performance (r = −.58). In contrast, RTs for correct “K” responses were stable over retention intervals: 2.08 sec (SD = 0.7), 2.11 sec (SD = 0.7), and 1.99 sec (SD = 0.7) for the HOURS, 1D, and 3W conditions, respectively, F(2, 29) = 0.09. Overall, participant RTs were slower for “K” (mean = 2.00 sec, SD = 0.7) compared with “R” (mean = 1.07 sec, SD = 0.1, CI95 difference [0.49, 1.37]) and “N” responses (mean = 1.27 sec, SD = 0.2, CI95 difference [0.29, 1.17]; one-way ANOVA: F(2, 30) = 15.17). Slower RTs for “K” responses are due to our experimental design, which pushes participants to make fast “R” responses preferentially. Late “K” responses simply indicate a failure of recollection after about 2 sec.
Overall, the behavioral results indicate that participant ability to replay the audiovisual clips in their mind decreased over the retention intervals. This had a direct impact on the percentage of correct discrimination between the old and new snapshots and on the timing of the decision. The gradual shift from “Remember” to “Know” responses is in agreement with the transformation hypothesis proposed by the MTT. This should be directly reflected by differences of brain dynamics. We now propose to investigate these temporal dynamics by analyzing participant EEG activity during the recall test.
Stimulus-locked ERP Analyses
First, the topographical maps corresponding to the ERP activity elicited by the snapshots presented in the three watching sessions (OLD) and by snapshots never seen before (NEW) were compared every 50 msec (paired t test, p < .05, cluster corrected; Figure 3). This provides a global overview of the temporal dynamics involved during memory recall at the scalp level. Significant differences were found from 350 msec poststimulus onset onward and involved a large and distributed network of channels. Over time, a gradual shift from frontocentral to centroparietal locations could be observed along with significant differences in the occipital regions. To track how these dynamics evolved over the passage of time, the OLD condition was split according to the three retention intervals: HOURS, 1D, and 3W. The corresponding topographical representations are presented in Figure 4A. The activity for these three retention intervals was respectively compared with the NEW condition. The statistical maps with the exact p values (paired t tests, cluster corrected) are presented in Figure 4B. For the three contrasts, significant activations were found at t = 450 msec and t = 650 msec poststimulus onset. At t = 450 msec, significant channels were observed in the frontocentral area. Occipital channels were also involved for the HOURS and 1D conditions. At t = 650 msec, the difference of activity mainly relied on centroparietal channels for the two recent intervals and on frontocentral channels for the 3W condition. Overall, significant activations for the two recent intervals covered an extended time period and involved a vast network of channels as previously observed for the main OLD/NEW contrast. This was different for the 3W condition for which differences of activity remained located in the frontocentral region and did not extend over long time periods. 
The topographical representations allowed a clear identification of the frontocentral, central, parietal, and occipital regions as key regions involved during the retrieval of the audiovisual clips.
ERP time course analyses.
The channels FC1, C1, P3, and P10 were representative of the respective frontocentral, central, parietal, and occipital regions that were involved during memory retrieval. The activity modulation of these channels for the HOURS, 1D, 3W, and NEW retention intervals is presented in Figure 5. The channels FC1 and C1 exhibited significant activity from t = 400 msec to t = 650 msec poststimulus onset (one-way ANOVA, p < .05, cluster corrected). Although the P3 channel was found to be significant at t = 550 msec, t = 600 msec, t = 750 msec, and t = 800 msec poststimulus onset for the OLD/NEW contrast, no significant activation modulation was found for this channel when considering the four conditions tested over all channels and time points (one-way ANOVA, p < .05, cluster corrected). It is interesting to note that the P10 channel was found significant from the stimulus offset (t = 400 msec) to the end of the epoch (t = 800 msec), which was demonstrative of the activity modulation in the occipital area. Overall, the modulation of the ERP activity observed for these representative channels closely matches the memory components classically found in the literature. The frontal activations observed at t = 400–500 msec and the late parietal component found at t = 600–800 msec resemble the FN400 and the LPC, respectively.
The ERP analyses were performed between t = −200 msec and t = 800 msec poststimulus onset. This time window was selected to analyze the components associated with memory retrieval without including participant median button response timing (1.04–1.53 sec). Even with these precautions, however, the significant activations observed at the end of the epoch might have been directly driven by some motor activation linked to the button press. To test for a potential motor effect, the ERPs of the HOURS, 1D, 3W, and NEW conditions were aligned to the participant response onset (t = −800 to 200 msec) and analyzed on average across the participants. Statistical testing (one-way ANOVA, p < .05, cluster corrected) across all channels and time points revealed no significant differences in the time course of the 64 channels. The time courses of the representative channels FC1, C1, P3, and P10 are presented in Figure 6A.
Late effects were mainly observed for the HOURS/NEW and 1D/NEW contrasts. We therefore tested whether significant differences could be detected for these two contrasts when ERPs were aligned to participant response onset (paired t test, p < .05, cluster corrected). Again, no significant differences were observed. The topographical representation of the response-locked ERPs for the HOURS and NEW conditions is shown in Figure 6B. The analysis of the response-locked ERPs did not reveal any significant motor component that would be responsible for the ERP effects observed earlier.
Overall, the ERP data showed that the brain processes involved in the recall of the audiovisual clips differed according to the retention interval. For the clips seen 3 weeks beforehand, the ERP activations were transient and focused on the frontocentral area, whereas for the recent clips a large network of channels was activated for an extended time period. These dynamics might be reflected by different patterns. In the next section, we use MVPA techniques to characterize such temporal patterns.
Multivariate Pattern Analysis
We performed MVPA on our EEG data by comparing the activity for snapshots seen in one of the three learning sessions (HOURS, 1D, or 3W) and NEW snapshots. For each participant, classifier performance was measured on the test trials either at the same time point as the training set (diagonal of the decoding matrix) or at any other time point (off-diagonal entries). Performance was calculated using the area under the curve and averaged across the 11 participants; chance level corresponded to the same classification performed using random labels (Figure 7A). Classification performance was found to be significantly different (paired t test, p < .05, cluster corrected) from chance level from around t = 500 msec poststimulus onset and until the end of the epoch for the HOURS/NEW comparison. This resulted in a clear square-shaped pattern. Similar findings were observed for the 1D/NEW contrast, although no generalization was found when the decoder was trained at t ∼ 500 msec and tested at t = 600–800 msec and vice versa. For the 3W/NEW contrast, significant time points were observed within the same timing range and formed small connected clusters resulting in a degraded square-shaped pattern.
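The time generalization procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the classifier is a simple difference-of-class-means projection used as a stand-in, the cross-validation is an unstratified five-fold split, and the data are synthetic, with a sustained late effect of fixed topography added to the "old" trials. All function names and parameter values are assumptions.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count 1/2)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def fit_mean_difference(X, y):
    """Stand-in linear classifier: weights = difference of class means."""
    return X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)

def temporal_generalization(X, y, n_folds=5, seed=0):
    """X: (n_trials, n_channels, n_times); y: 0 = NEW, 1 = OLD.

    Returns an (n_times, n_times) AUC matrix indexed [train_time, test_time].
    """
    rng = np.random.default_rng(seed)
    n_trials, _, n_times = X.shape
    folds = np.array_split(rng.permutation(n_trials), n_folds)
    scores = np.zeros((n_folds, n_times, n_times))
    for k, test_idx in enumerate(folds):
        train_mask = np.ones(n_trials, dtype=bool)
        train_mask[test_idx] = False
        for t_train in range(n_times):
            w = fit_mean_difference(X[train_mask, :, t_train], y[train_mask])
            # Apply the weights learned at t_train to every test time point.
            proj = np.einsum('c,ict->it', w, X[test_idx])
            for t_test in range(n_times):
                scores[k, t_train, t_test] = auc(proj[:, t_test], y[test_idx])
    return scores.mean(axis=0)

# Synthetic example: noise everywhere, plus a sustained effect from sample 12 on.
rng = np.random.default_rng(42)
n_trials, n_channels, n_times = 200, 8, 20
y = np.repeat([0, 1], n_trials // 2)
X = rng.normal(size=(n_trials, n_channels, n_times))
X[y == 1, :, 12:] += 1.0

gen = temporal_generalization(X, y)
late = gen[12:, 12:].mean()    # block where training and testing are both late
early = gen[:12, :12].mean()   # block with no signal, expected near chance
```

Because the simulated effect keeps the same topography over the late window, a classifier trained at any late time point generalizes to the other late time points, which produces the square-shaped block described in the text. A transient or evolving pattern would instead concentrate performance along the diagonal.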
Similarity analyses on the differential correct response percentages and on the decoding matrices were performed for each participant. Interestingly, we found that the differential correct response percentages for the HOURS/NEW, 1D/NEW, and 3W/NEW conditions were correlated with the respective decoding matrix performance (mean: r = .57, SD = 0.7, range = −0.99 to 1.00, Pearson's correlation coefficient). This suggests that it might be possible to infer participant performance on very long term memories from the matrix decoding obtained through MVPA.
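As an illustration of the per-participant correlation, the sketch below computes a Pearson coefficient between three behavioral values and three decoding values, one per contrast. The numbers are hypothetical, and summarizing each decoding matrix by a single score is an assumption made for the example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for one participant, ordered HOURS/NEW, 1D/NEW, 3W/NEW.
behavior = [0.60, 0.45, 0.20]   # differential correct response proportions
decoding = [0.62, 0.58, 0.52]   # summary decoding score per contrast

r = pearson_r(behavior, decoding)
```

With only three points per participant, individual coefficients can easily approach −1 or 1, which is consistent with the wide range and large standard deviation reported above.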
The above time generalization decoding matrices revealed important features concerning the recollection of long-term memories in regard to the predictions made in the introduction. Recollection was associated with a clear square-shaped pattern that faded away over the retention intervals. To test whether these patterns relied on the same neural processes, we tested how a classifier trained on a specific retention interval could generalize to another retention interval. This resulted in six time-and-condition generalization decoding matrices. Significant time points (paired t test, p < .05, cluster corrected) for these six time-and-condition generalization matrices are shown in Figure 7, along with the significant time points obtained for the three previous time generalization matrices. Each of these nine decoding matrices corresponded to a specific entry in the 3 × 3 training versus test condition map. The three time generalization matrices for which the classifier was trained and tested on the same condition correspond to decoding matrices numbers 1, 5, and 9. Interestingly, significant clusters were found for all six time-and-condition generalization matrices (numbers 2, 3, 4, 6, 7, and 8). This means that common neural processes were shared across the different retention intervals and could be used by the classifier to perform above chance level. However, the generalization across conditions varied according to the conditions used for training and testing. In particular, the classifier performed well when trained on HOURS and NEW trials and tested on 1D and NEW trials (decoding matrix number 4). Classification performance was not perfectly symmetrical when the training and test conditions were inverted (e.g., training on HOURS/NEW and testing on 1D/NEW, decoding matrix number 4, versus training on 1D/NEW and testing on HOURS/NEW, decoding matrix number 2).
Indeed, more information would be available in the HOURS trials for the classifier to generalize to 1D trials rather than the opposite. Importantly, although less information was expected to be contained in the 3W trials, a classifier trained on 3W trials was still able to generalize to HOURS and 1D trials (decoding matrices numbers 3 and 6).
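The numbering of the nine decoding matrices can be made explicit with a small helper. The rule below is inferred from the examples given in the text (numbers 1, 5, and 9 on the diagonal; number 4 for training on HOURS and testing on 1D; number 2 for the reverse; numbers 3 and 6 for training on 3W). The assignment of numbers 7 and 8 follows from the same rule and is an assumption about the figure layout.

```python
CONDITIONS = ("HOURS", "1D", "3W")

def matrix_number(train, test):
    """Index (1 to 9) of a decoding matrix in the 3 x 3 map.

    Assumed layout: test condition selects the row, training condition
    selects the column, and entries are numbered row by row.
    """
    return 3 * CONDITIONS.index(test) + CONDITIONS.index(train) + 1

# Full map as {(train, test): number}, for reference.
condition_map = {
    (train, test): matrix_number(train, test)
    for test in CONDITIONS
    for train in CONDITIONS
}
```

For instance, `matrix_number("HOURS", "1D")` gives 4 and `matrix_number("1D", "HOURS")` gives 2, matching the asymmetric pair discussed above.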
Using MVPA techniques on the EEG signal, we were able to characterize the brain dynamics involved during the long-term recollection of audiovisual clips. The time generalization decoding matrices obtained for each retention interval revealed sustained patterns that emerged 500 msec after the presentation of the snapshots. This sustained activity was the most stable for the clips seen a few hours beforehand and faded away over the passage of time. Generalization across time and condition also revealed that the same processes were involved during the recollection of long-term memories independently of their age. According to the ERP analyses, the patterns observed emerged from the interaction between the central channels and the frontal region at early latencies (∼450 msec poststimulus onset) and between the central channels and the parietal sites later in time (t = 600–800 msec poststimulus onset).
Late effects could have been the result of motor initiation, but no significant motor components were found when participant ERPs were time-locked to the response onset. The correlation between the decoding performance and participant performance over the retention intervals demonstrated a close link between the brain activity used by the algorithm and participant ability to recall the audiovisual clips. Overall, our findings are in agreement with the MTT and the transformation hypothesis.
Many fMRI investigations have been carried out to resolve the ongoing debate between the standard consolidation model and the MTT. More specifically, studies tested whether the hippocampus was still involved during the retrieval of remote episodic memories. In addition to classical univariate approaches (e.g., Harand et al., 2012; Gilboa, Winocur, Grady, Hevenor, & Moscovitch, 2004; Maguire & Frith, 2003; Maguire, Henson, Mummery, & Frith, 2001), MVPA techniques were used (Bonnici & Maguire, 2018; Bonnici, Chadwick, & Maguire, 2013; Bonnici et al., 2012; Chadwick, Hassabis, Weiskopf, & Maguire, 2010). These studies revealed that the hippocampus was still activated during the retrieval of remote episodic details, although some changes were observed over the passage of time, reflecting some transformation of the memory trace. Indeed, remote episodic memories involved preferentially the posterior part of the hippocampus and were particularly visible in the ventromedial pFC compared with recent episodic memories. As in our study, the contrasts used to find specific spatial patterns in these fMRI studies were based on the timing between the learning of the information and its retrieval. This differs from other fMRI studies that used subjective “R” and “K” contrasts (Rissman, Reggente, & Wagner, 2016; Rissman, Greely, & Wagner, 2010) and that could be more prone to interpretation biases (VanRullen, 2011).
We found that the correct “R/K” ratio decreased over the retention intervals. As previously observed in the literature (Piolino et al., 2009; Herbert & Burt, 2004; Conway et al., 1997), this shift from “R” to “K” responses shows that the recollection of the audiovisual content was becoming more difficult as time elapsed. Interestingly, “R” responses also became slower over the retention intervals. Classical speeded R/K paradigms usually compare participants' RTs for “R” and “K” responses. In addition to this comparison, our design allowed us to analyze the speed of “R” and “K” responses according to the retention intervals. Since the early 1950s and the development of the sequential analysis method, the speed of a decision has been linked to the amount of evidence accumulated toward the decision (Wald, 1947). The difference of timing observed for the “R” responses over the retention intervals would then directly reflect the difference of evidence that had been accumulated. The recollection of contextual details, which would result from the accumulation of information retrieved over the time of the decision, would be achieved quickly for recent memories but would take more time for remote memories. Accordingly, in our design, “K” responses would indicate a failure to retrieve contextual information for snapshots previously seen. Indeed, “K” responses were associated with the longest latencies in our experiment.
This view of memory retrieval as an evidence accumulation process is in agreement with the diffusion model proposed by Ratcliff that accounts for perceptual and memory processes (Ratcliff & McKoon, 2008; Ratcliff, 1978). Although this perspective is now commonly held in the perception domain (Dehaene, 2009; Norris & Kinoshita, 2008; Gold & Shadlen, 2007; Ploran et al., 2007), it is not widely shared within the memory community to account for high-level retrieval processes such as familiarity and recollection (Shadlen & Shohamy, 2016; Moulin & Souchay, 2015; Koriat & Levy-Sadot, 2001). Indeed, instead of such a dynamic model of memory retrieval, most of the literature on recollection and familiarity relies on a debate between two classes of static recognition models: dual-process models and single-process models. According to the dual-process models, familiarity and recollection are two distinct processes (Yonelinas, 2002; Gardiner & Richardson-Klavehn, 2000; Mandler, 1980; Juola, Fischler, Wood, & Atkinson, 1971), whereas the single-process models suggest that familiarity and recollection can be explained using a single strength-based measure (Wixted, 2009; Dunn, 2004; McClelland & Chappell, 1998; Shiffrin & Steyvers, 1997; Hintzman, 1988; Eich, 1982).
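The link between the rate of evidence accumulation and decision time discussed above can be illustrated with a toy simulation. The sketch is a deliberately simplified single-boundary accumulator, not Ratcliff's two-boundary diffusion model, and the drift rates, threshold, and noise level are hypothetical values that were not fitted to the present data.

```python
import random

def decision_time(drift, rng, threshold=1.0, noise_sd=1.0,
                  dt=0.005, max_t=5.0):
    """Simulate one trial: accumulate noisy evidence until it reaches
    the threshold (or until max_t) and return the elapsed time in seconds."""
    evidence, t = 0.0, 0.0
    step_sd = noise_sd * dt ** 0.5  # noise scales with the square root of dt
    while evidence < threshold and t < max_t:
        evidence += drift * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
    return t

def mean_decision_time(drift, n_trials=500, seed=0):
    """Average decision time over simulated trials for a given drift rate."""
    rng = random.Random(seed)
    return sum(decision_time(drift, rng) for _ in range(n_trials)) / n_trials

# Hypothetical drift rates: faster accumulation for a recent memory,
# slower accumulation for a remote one.
recent = mean_decision_time(drift=2.0)
remote = mean_decision_time(drift=1.0)
```

Under these assumptions, the lower drift rate yields longer average decision times, which mirrors the idea that less readily available contextual detail would slow "R" responses for remote memories.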
In most of the R/K paradigms, “R” and “K” responses are associated with recollection and familiarity processes. However, depending on the paradigms used, “K” responses can also reflect retrieval from semantic memory. In our study, participants were asked to recall specific audiovisual clips from the presentation of snapshots. Two response judgments were therefore possible and referred to the amount of source information retrieved (Gardiner & Richardson-Klavehn, 2000): Either the amount of information retrieved was sufficient to recall the clip (“R” response), or the identification of an old snapshot was not followed by the retrieval of the clip (“K” response). Therefore, “R” responses corresponded to the retrieval of item plus associative information, whereas “K” responses reflected the retrieval of item-only information (Mickes, Seale-Carlisle, & Wixted, 2013). Accordingly, our design cannot provide a direct comparison between familiarity and recollection processes, although familiarity judgment might emerge from the retrieval of item-only information (Mickes et al., 2013).
The sustained activity that we observed could be explained by the episodic buffer account proposed by Baddeley (2000). According to this view, the key role of the episodic buffer would be to provide temporary storage for the information retrieved from long-term memory. Indeed, the information would be bound into a unitary episodic representation that would be directly accessible to consciousness. Recent evidence shows that the parietal cortex might play this buffer role (Kuhl & Chun, 2014; Vilberg & Rugg, 2008; Wagner, Shannon, Kahn, & Buckner, 2005). Other regions, such as the visual areas, would also be involved, as supported by our data.
Interestingly, sustained decoding patterns were recently observed when participants were involved in a short-term visual imagery task and differed from the sequential decoding patterns observed during visual perception (Dijkstra et al., 2018). However, as noted by the authors, the task and the stimuli used in their study were not optimal for observing the constructive process involved during memory reinstatement. In our task, recollection corresponded to participants' ability to replay short audiovisual clips in their mind. Even in such a dynamic context, it is worth mentioning that a sustained pattern emerged as well. Again, the role of the episodic buffer, which is part of the working memory model, would explain the similarity between imagery from short-term and long-term memory.
In line with the evidence accumulation model described earlier, the difference of signal strength for the sustained patterns would be directly linked with the amount of information retrieved in memory. Indeed, studies showed that the amplitude of the LPC was higher if recollection required the retrieval of several contextual details (Vilberg, Moosavi, & Rugg, 2007; Wilding, 2000; Wilding & Rugg, 1996). The same effect was also observed when comparing the LPC of remote and recent memories (Tsivilis et al., 2015).
Another possibility would be that, over the consolidation process, long-term memories might become supported by a smaller and more specialized neuronal network to prevent their loss (Thorpe, 1989, 2011). The decrease of signal would therefore reflect the involvement of a more localized pattern of neurons.
Importantly, sleep might have played a crucial role in the change of sustained activity found for the different retention intervals. Indeed, numerous studies have shown that, during sleep, a memory trace can be reactivated several times and therefore be consolidated into a stable neuronal pattern (Girardeau, Benchenane, Wiener, Buzsáki, & Zugaro, 2009; Peyrache, Khamassi, Benchenane, Wiener, & Battaglia, 2009; Sirota, Csicsvari, Buhl, & Buzsáki, 2003; Buzsáki, 1989). However, in our design, we could not isolate this variable to measure its impact.
Overall, we showed that imagery from long-term memory was associated with a sustained pattern of activity that faded away over the passage of time. This provides further evidence toward the MTT. Further investigations should be carried out to compare these data with data from patients suffering from memory impairment.
This research was supported by the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 323711 to S. J. T. The authors also acknowledge the support of the French Agence Nationale de la Recherche under grant ANR-12-BSH2-0010 (project ELMA). We thank Emmanuel Barbeau for helpful discussions and Estelle Bonin for her assistance in the stimuli preparation. We are also very grateful for the feedback that the two anonymous reviewers gave us during the revision process.
Reprint requests should be sent to Christelle Larzabal, Pavillon Baudot, CHU Purpan, Toulouse 31052, France, or via e-mail: firstname.lastname@example.org.
All the audiovisual clips and snapshots used in this experiment are freely available for research purposes and can be downloaded using the following link: https://data.mendeley.com/datasets/nh8xsmr289/draft?a=1813589b-7737-442d-9fcb-89fa0f3bee5d.
The open source script can be downloaded at https://github.com/scrouzet/classifyEEG.