Evidence is accumulating that the classic two-stage model of visual STM (VSTM), comprising iconic memory (IM) and visual working memory (WM), is incomplete. A third memory stage, termed fragile VSTM (FM), seems to exist in between IM and WM [Vandenbroucke, A. R. E., Sligte, I. G., & Lamme, V. A. F. Manipulations of attention dissociate fragile visual STM from visual working memory. Neuropsychologia, 49, 1559–1568, 2011; Sligte, I. G., Scholte, H. S., & Lamme, V. A. F. Are there multiple visual STM stores? PLoS One, 3, e1699, 2008]. Although FM can be distinguished from IM using behavioral and fMRI methods, the question remains whether FM is a weak expression of WM or a separate form of memory with its own neural signature. Here, we tested whether FM and WM in humans are supported by dissociable time–frequency features of EEG recordings. Participants performed a partial-report change detection task, from which individual differences in FM and WM capacity were estimated. These individual FM and WM capacities were correlated with time–frequency characteristics of the EEG signal before and during encoding and maintenance of the memory display. FM capacity showed negative alpha correlations over peri-occipital electrodes, whereas WM capacity was positively related, suggesting increased visual processing (lower alpha) to be related to FM capacity. Furthermore, FM capacity correlated with an increase in theta power over central electrodes during preparation and processing of the memory display, whereas WM did not. In addition to a difference in visual processing characteristics, a positive relation between gamma power and FM capacity was observed during both preparation and maintenance periods of the task. On the other hand, we observed that theta–gamma coupling was negatively correlated with FM capacity, whereas it was slightly positively correlated with WM. 
These data show clear differences in the neural substrates of FM versus WM and suggest that FM depends more on visual processing mechanisms compared with WM. This study thus provides novel evidence for a dissociation between different stages in VSTM.
Traditionally, visual STM (VSTM) has been divided into two subsystems: a short-lasting, large capacity storage termed iconic memory (IM; Neisser, 1967; Sperling, 1960) and a long-lasting but limited-capacity storage termed visual working memory (WM; Luck & Vogel, 1997). IM was mainly thought of as a passive visual buffer (Baddeley, 2007) that lasted only for a few hundred milliseconds, from which long-lasting and robust WM representations were encoded. In the last decade, evidence for a third memory stage that lies in between IM and WM, termed fragile VSTM (FM; Sligte, Scholte, & Lamme, 2008; Makovski & Jiang, 2007; Griffin & Nobre, 2003; Landman, Spekreijse, & Lamme, 2003), has been found. However, although FM can be clearly dissociated from IM (Sligte et al., 2008), whether it is really different from WM remains a matter of debate (Makovski, 2012; Matsukura & Hollingworth, 2011).
FM is distinct from IM as it has a smaller capacity and lasts for several seconds instead of milliseconds (Sligte, Scholte, & Lamme, 2009; Sligte et al., 2008). Moreover, in contrast to IM, FM is not erased by a light mask (Sligte et al., 2008), and neural traces associated with FM have been found in V4, showing that these representations are based on cortical processing and not on retinal afterimages (Sligte et al., 2009). At the same time, FM seems to differ from WM because the presentation of a display containing similar stimuli overwrites FM representations, whereas it does not overwrite WM (Pinto, Sligte, Shapiro, & Lamme, 2013; Sligte et al., 2008). Second, when attention is diverted during memory encoding, FM capacity reduces only slightly, whereas WM capacity suffers considerably (Vandenbroucke, Sligte, & Lamme, 2011). In addition, when TMS is applied over the dorsolateral pFC during maintenance, WM capacity decreases, whereas FM capacity remains intact (Sligte, Wokke, Tesselaar, Scholte, & Lamme, 2011). Therefore, we have suggested that FM reflects a stage in VSTM in which visual cortical icons are maintained independent of focused attention, whereas information in WM has received selective attention, thereby making the information more robust and available for further manipulation and report (Sligte, Vandenbroucke, Scholte, & Lamme, 2010).
Although behavioral evidence is accumulating that FM and WM reflect different stages in VSTM, it could be that FM is merely a weak form of WM and depends on the same neural substrates. This would undermine the construct validity of FM. To resolve this issue, we examined the underlying EEG oscillatory characteristics of FM and WM capacities. If FM and WM depend on the same neural substrates, we would expect the same EEG components to underlie both forms of memory, perhaps with a quantitative difference. However, if FM and WM are neurally distinct, a qualitative difference should emerge from their underlying EEG characteristics.
Different oscillatory substrates have been found to support visual WM. For example, a decrease in alpha power over areas that are involved in the task, together with an increase in alpha over areas that are not involved, has been linked to engagement versus disengagement of these areas (Sauseng et al., 2009; Jokisch & Jensen, 2007). In addition, a sustained increase in gamma over midcentral and visual regions is often associated with memory maintenance (Tallon-Baudry, 2009; Jensen, Kaiser, & Lachaux, 2007; Jokisch & Jensen, 2007; Tallon-Baudry, Kreiter, & Bertrand, 1999). Recently, it has been proposed that WM might be supported by the link between gamma and theta oscillations, in which gamma cycles would represent single items embedded in a theta wave (Lisman & Jensen, 2013; Jensen & Colgin, 2007). In the current study, our main goal was to investigate whether FM and WM capacities are supported by different neural substrates. Therefore, we explored whether any difference between FM and WM emerged in oscillatory power in different frequency bands or in the coupling between theta and gamma oscillations.
In this study, we recorded EEG while participants performed a typical partial-report change detection task that measures both FM and WM capacities in a single experiment (Vandenbroucke et al., 2011; Sligte et al., 2008; Makovski & Jiang, 2007; Landman et al., 2003; Figure 1). To measure FM capacity, a spatial cue is presented after offset of the memory display but before onset of the test display (Figure 1A). The cue is presented 1000 msec after offset of the memory display, which ensures that IM has decayed (Sligte et al., 2008; Neisser, 1967; Sperling, 1960). Because, in this case, performance cannot be based on retinal afterimages anymore, memory performance on the cued item is indicative of all information that was cortically processed and maintained (Sligte et al., 2008, 2009). The presumption is that, when no new visual information has perturbed the fragile memory traces, all information that is potentially available on a visual level can be retrieved (Pinto et al., 2013; Makovski, Sussman, & Jiang, 2008). The use of a retro-cue before onset of the test display is thus necessary to probe FM capacity. To measure WM, the spatial cue is presented after onset of the test display. In this case, the presentation of new, and similar, visual information replaces all fragile visual memory traces. Any information that can still be retrieved after onset of the test display is therefore attributed to more deeply processed and robust representations, termed as WM representations (Figure 1B; Pinto et al., 2013; Sligte et al., 2008). Percentage correct for the two trial types is converted into FM and WM capacities and typically differs between individuals. We investigated whether FM and WM capacities—derived from percentage correct on the test display—are related to different oscillatory characteristics of the EEG signal recorded during the task. 
Specifically, we focused on the EEG signals recorded before onset of the cue: If indeed FM and WM reflect different neural representations, the capacity difference should be evident during formation of these representations and thus before the spatial cue is used to access them. We correlated individual FM and WM capacities with power in four different frequency bands (theta: 4–7 Hz, alpha: 8–15 Hz, beta: 16–30 Hz, gamma: 31–70 Hz). In addition, we tested the relationship between capacity and theta–gamma coupling because this specific form of coupling has been linked to WM processing (Lisman & Jensen, 2013; Sauseng et al., 2009; Jensen & Colgin, 2007).
Twenty-five students (mean age = 23 years, SD = 2 years; 11 men) from the University of Amsterdam participated in this experiment for course credit or monetary reward. All participants had normal or corrected-to-normal vision and signed an informed consent form before participation. The study was approved by the local ethics committee of the University of Amsterdam.
Memory and test displays consisted of white rectangles (1.4° × 0.4° in visual degrees) presented on a black background, placed radially (2.6°) in eight invisible placeholders. The rectangles had four possible orientations: horizontal, vertical, 45° rotated to the horizontal, or 135° rotated to the horizontal. The neutral cue consisted of a white star (total span = 2.4°) containing eight arms pointing toward the eight possible item locations. To create the spatial cue, one of the eight white arms was replaced by a red arm (Figure 1A and B).
Task and Procedure
To indicate the start of a trial, the gray fixation dot turned green for 500 msec. Then, the memory display appeared for 250 msec containing two, four, six, or eight oriented rectangles placed randomly in the eight placeholders (Figure 1A and B). Participants were instructed to remember the orientation of all rectangles. On FM trials (Figure 1A), a spatial retro-cue was presented 1000 msec after offset of the memory array, indicating which item could potentially change in the test display (50% change, 90° rotation, all other items remained unchanged). After 500 msec, the retro-cue was replaced by a neutral cue. The test display was presented 1000 msec after offset of the retro-cue, and participants indicated whether they perceived an orientation change in the cued memory item (cues were always valid). WM trials started the same as FM trials (Figure 1B), but instead of presenting a spatial cue 1000 msec after offset of the memory array, a neutral cue was presented for 1500 msec. The spatial cue was then presented 100 msec after onset of the test display. The test display stayed on screen until participants made their response, with a maximum of 4000 msec. All trials were separated by a 1000-msec ITI, in which a gray fixation dot was presented.
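The trial timing described above can be checked with a small sketch. The durations come from the task description; the constant and function names are illustrative assumptions, and the sketch assumes that the memory display directly follows the 500-msec green fixation dot. Onsets are expressed in msec relative to onset of the memory display.

```python
# Durations (msec) taken from the task description; names are illustrative.
GREEN_FIXATION = 500       # green dot announcing the trial
MEMORY_DISPLAY = 250       # memory array on screen
DELAY_TO_CUE = 1000        # memory offset -> retro-cue (FM) or neutral cue (WM)
RETRO_CUE = 500            # retro-cue duration on FM trials
RETRO_TO_TEST = 1000       # retro-cue offset -> test display on FM trials
NEUTRAL_CUE = 1500         # neutral cue duration on WM trials
TEST_TO_SPATIAL_CUE = 100  # test onset -> spatial cue on WM trials


def fm_timeline():
    """Event onsets on FM trials, relative to memory display onset."""
    cue_on = MEMORY_DISPLAY + DELAY_TO_CUE
    cue_off = cue_on + RETRO_CUE
    return {
        "trial_start": -GREEN_FIXATION,
        "memory_on": 0,
        "retro_cue_on": cue_on,
        "neutral_cue_on": cue_off,
        "test_on": cue_off + RETRO_TO_TEST,
    }


def wm_timeline():
    """Event onsets on WM trials, relative to memory display onset."""
    cue_on = MEMORY_DISPLAY + DELAY_TO_CUE
    test_on = cue_on + NEUTRAL_CUE
    return {
        "trial_start": -GREEN_FIXATION,
        "memory_on": 0,
        "neutral_cue_on": cue_on,
        "test_on": test_on,
        "spatial_cue_on": test_on + TEST_TO_SPATIAL_CUE,
    }


print(fm_timeline())
print(wm_timeline())
```

Under these assumptions, both trial types are identical up to 1250 msec after memory display onset and reach the test display at 2750 msec; only the moment at which the spatial cue appears differs.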
Before the start of the EEG recordings, participants received two training blocks of 64 trials (FM: 32, WM: 32; randomly intermixed). Throughout the task, trials were intermixed, and participants were not informed which trial type they would receive. The probability of a trial containing two, four, six, or eight rectangles was equal and randomly distributed within blocks (eight trials for each load in FM and WM). After the training trials, participants performed 384 trials for each condition (96 trials per load, total of 768 trials), separated into blocks of 64 trials.
To determine FM and WM capacities, Cowan's K, which corrects for guessing, was calculated as K = (hit rate − 0.5 + correct rejection rate − 0.5) × N, where N is the number of items in the memory display (Cowan, 2001). To investigate the correlation between behavior and time–frequency characteristics, FM and WM capacities for each participant were taken as the maximum score on any of the four loads (Sauseng et al., 2009). This reflects individual FM and WM capacities most reliably, because when load heavily exceeds memory capacity (e.g., in the Load 8 WM condition), participants might underperform compared with their true capacity.
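As a minimal sketch of this capacity estimate, the snippet below computes Cowan's K per load, K = (hit rate − 0.5 + correct rejection rate − 0.5) × N, and takes the maximum over loads. The function names and the example rates are hypothetical and serve only as illustration; they are not data from this study.

```python
def cowans_k(hit_rate, correct_rejection_rate, load):
    """Cowan's K for one load: chance performance (0.5, 0.5) maps to K = 0."""
    return (hit_rate - 0.5 + correct_rejection_rate - 0.5) * load


def capacity(rates_by_load):
    """Capacity as the maximum K over loads.

    rates_by_load maps load -> (hit rate, correct rejection rate).
    """
    return max(cowans_k(hit, cr, load)
               for load, (hit, cr) in rates_by_load.items())


# Hypothetical rates for one participant in one memory condition.
example_rates = {
    2: (0.95, 0.95),
    4: (0.85, 0.85),
    6: (0.75, 0.70),
    8: (0.65, 0.60),
}
print(capacity(example_rates))
```

In this made-up example, K peaks at Load 4 and drops again at Load 8, which illustrates why the maximum over loads, rather than K at the highest load, is used as the capacity estimate.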
EEG Recordings and Preprocessing
EEG was recorded at 1024 Hz using a 64-channel Biosemi ActiveTwo system (BioSemi, Amsterdam, The Netherlands) placed according to the 10–20 system. Offline data were down-sampled to 512 Hz, high-pass filtered at 0.5 Hz, and rereferenced to the average of the two earlobe electrodes. Trials were epoched from −1 to 4.5 sec relative to the onset of the green preparation cue (which corresponds to −1.5 to 4 sec relative to onset of the memory array). Because of a recording error, only 512 trials were recorded for two participants. All trials were visually inspected, and trials containing artifacts not related to eye blinks, such as activity caused by muscle tension, were removed. One participant was removed because of an excessive number of artifacts, leaving too few trials to analyze. For the remaining 24 participants, an average of 7.8% of the trials was removed (ranging from 1.6% to 22.6%, SD = 5.5%), leaving a minimum of 59 trials per load per memory condition and a minimum of 244 trials per overall memory condition.
After artifact rejection, an independent component analysis was performed for each participant, and components that were clearly related to eye blinks were removed using EEGLAB (UC San Diego; Delorme & Makeig, 2004). Independent components that clearly only mapped onto one lateral electrode were removed as well. After component removal, we applied a spatial filter (surface Laplacian) that increases topographical selectivity by filtering out spatially broad and therefore likely volume-conducted effects (Srinivasan, Winter, Ding, & Nunez, 2007). The units of data after this transformation are millivolts per square centimeter (mV/cm2). Both the removal of components that mapped onto only one lateral electrode and spatially filtering the data make it less likely that any effects found in the gamma range are because of muscle tension during the task (Fitzgibbon et al., 2015).
EEG Time–Frequency Decomposition: Power
All data were analyzed using MATLAB (The MathWorks, Inc., Natick, MA) in combination with EEGLAB (Delorme & Makeig, 2004). We convolved the time domain signal with a complex Morlet wavelet with increasing cycles as frequency increased (3–15 cycles, logarithmically spaced in 30 steps; Cox, van Driel, de Boer, & Talamini, 2014; Cohen, van Gaal, Ridderinkhof, & Lamme, 2009; Cohen, Elger, & Ranganath, 2007). The squared magnitude of the resulting complex signal provided an estimate of power for each time point at 30 frequencies between 2 and 70 Hz (logarithmically spaced). Epochs were centered at the onset of the memory array, and relatively large windows were taken (−1.5 to 4 sec relative to the onset of the memory display) to prevent edge artifacts from contaminating the estimates of power. Power was normalized using a decibel (dB) transform, for which the baseline was taken as the average power over each frequency band at −1000 to −600 msec (gray fixation) for each condition. This way, data from each participant and each condition were on the same scale and thus comparable.
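The decomposition can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the MATLAB/EEGLAB code used in the study: the wavelet is unnormalized and truncated at ±3 SD, edges are zero-padded, and all function names are hypothetical. It shows the three ingredients named above: logarithmically spaced frequencies and cycles, power as the squared magnitude of the complex convolution result, and the dB transform relative to a baseline window.

```python
import cmath
import math


def log_space(start, stop, n):
    """n logarithmically spaced values from start to stop (inclusive)."""
    return [start * (stop / start) ** (k / (n - 1)) for k in range(n)]


def morlet_power(signal, srate, freq, n_cycles):
    """Power over time: squared magnitude of the signal convolved with a
    complex Morlet wavelet (unnormalized; edges are zero-padded)."""
    sigma = n_cycles / (2 * math.pi * freq)      # Gaussian SD in seconds
    half = int(round(3 * sigma * srate))         # wavelet spans +/- 3 SD
    wavelet = []
    for k in range(-half, half + 1):
        t = k / srate
        wavelet.append(cmath.exp(2j * math.pi * freq * t)
                       * math.exp(-t * t / (2 * sigma * sigma)))
    power = []
    for n in range(len(signal)):
        acc = 0j
        for j, w in enumerate(wavelet):
            idx = n - (j - half)                 # convolution index
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        power.append(abs(acc) ** 2)
    return power


def to_db(power, baseline_indices):
    """Decibel change of each sample relative to the mean baseline power."""
    base = sum(power[i] for i in baseline_indices) / len(baseline_indices)
    return [10 * math.log10(p / base) for p in power]


# 30 frequencies between 2 and 70 Hz with 3 to 15 cycles, as in the text.
FREQS = log_space(2, 70, 30)
CYCLES = log_space(3, 15, 30)
```

In practice, `to_db` would be applied per frequency with the baseline indices covering −1000 to −600 msec relative to memory display onset; a vectorized FFT-based convolution would be the usual choice for full-length epochs.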
EEG Time–Frequency Decomposition: Phase–Amplitude Coupling
To extract phase–amplitude coupling (PAC) from the EEG signal, the time domain signal was again convolved with a complex Morlet wavelet. The analysis was restricted to three phase frequencies in the theta range (three frequencies logarithmically spaced between 5 and 7 Hz) and three amplitude frequencies in the gamma range (three frequencies logarithmically spaced between 45 and 65 Hz) to minimize the number of comparisons. The chosen frequencies were determined a priori and based on previous literature that showed a relationship between theta–gamma coupling and WM (Lisman & Jensen, 2013; Sauseng et al., 2009; Jensen & Colgin, 2007).
For the convolution with the Morlet wavelet, six cycles were used for the phase frequencies to obtain a better frequency resolution and three cycles for the frequencies used for the amplitude component to obtain a better temporal resolution. Phase was estimated by taking the angle of the convolution result. Power was defined as the squared magnitude of the convolution result. To calculate PAC, we used debiased PAC (dPAC; van Driel, Cox, & Cohen, 2015; Cox et al., 2014; Canolty et al., 2006). PAC is derived by multiplying power by exp(i × phase) for each time point (where i is the imaginary unit) and then taking the average over a specific time window. Because of the possibility of a nonuniform phase angle distribution, we debiased the PAC term by subtracting the mean of exp(i × phase) for each time window from each individual time point before averaging, thereby creating dPAC (van Driel et al., 2015). dPAC values were calculated over single trials and then averaged. dPAC values are expressed in arbitrary units. To be able to compare coupling across individuals and time–frequency coupling windows, we calculated the z value derived from a random null distribution. The null distribution for dPAC values was created by shuffling the power time series for each time–frequency coupling window with respect to the phase time series 200 times. Z values for the data were then calculated by subtracting the mean of this null distribution and dividing by its standard deviation. Deflections from zero thus reflect coupling relative to the coupling expected when power and phase are unrelated in time. This normalization allowed us to compare data from each participant and each condition. Because we were not interested in task modulation per se, we did not perform an additional baseline correction on these normalized dPAC values.
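The dPAC computation and its permutation-based normalization can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the coupling strength is taken as the magnitude of the averaged complex term, as is standard for this measure, "shuffling" is implemented as a full random permutation of the power samples, and the function names and the seed parameter are hypothetical.

```python
import cmath
import random
import statistics


def dpac(phase, power):
    """Debiased phase-amplitude coupling for one time window.

    phase: phase angles (radians) of the low-frequency signal
    power: power of the high-frequency signal at the same time points
    """
    n = len(phase)
    vectors = [cmath.exp(1j * p) for p in phase]
    bias = sum(vectors) / n                    # mean phase vector (debias term)
    total = sum(a * (v - bias) for a, v in zip(power, vectors))
    return abs(total / n)


def dpac_z(phase, power, n_perm=200, seed=0):
    """Z score of observed dPAC against a shuffled-power null distribution."""
    rng = random.Random(seed)
    observed = dpac(phase, power)
    shuffled = list(power)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                  # assumption: full permutation
        null.append(dpac(phase, shuffled))
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```

With a nonuniform phase distribution and constant power, the raw PAC term would be nonzero even though there is no coupling, whereas `dpac` returns approximately zero; this is the bias that the subtraction removes.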
Electrode and Time–Frequency Window Selection
Because we did not have any a priori hypotheses regarding the electrodes at which we would find correlation differences between oscillatory characteristics and FM/WM, we reduced the number of comparisons by pooling data across electrodes and frequency bands and investigated specific time windows of interest. As the nature of the task was such that we did not expect any lateralization effects before onset of the spatial retro-cue, we first pooled together electrodes from both hemispheres, including the middle electrodes together with their adjacent laterals. This resulted in 27 distinct electrode poolings. Then, we pooled together the data across different frequencies to create frequency bands that are most common in the literature: theta (4–7 Hz), alpha (8–15 Hz), beta (16–30 Hz), and gamma (31–70 Hz). Last, we averaged power over seven different time windows: two preparation phases (−500:−250 and −250:0 msec), the memory display processing phase (0:250 msec), and four encoding/maintenance stages (250:500, 500:750, 750:1000, and 1000:1250 msec; see Figure 1). This left us with 27 (electrode poolings) × 4 (frequency bands) × 7 (time windows) comparisons for the power analysis and 27 (electrode poolings) × 9 (PACs) × 7 (time windows) for the PAC analysis.
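The size of the resulting search space follows directly from the pooling described above. The short sketch below simply multiplies out the stated factors; the variable names are illustrative.

```python
ELECTRODE_POOLINGS = 27
FREQUENCY_BANDS = {          # Hz, as defined in the text
    "theta": (4, 7),
    "alpha": (8, 15),
    "beta": (16, 30),
    "gamma": (31, 70),
}
TIME_WINDOWS = [             # msec relative to memory display onset
    (-500, -250), (-250, 0),                                # preparation
    (0, 250),                                               # memory display
    (250, 500), (500, 750), (750, 1000), (1000, 1250),      # maintenance
]
PHASE_FREQS = 3              # theta phase frequencies (5-7 Hz)
AMPLITUDE_FREQS = 3          # gamma amplitude frequencies (45-65 Hz)

power_tests = ELECTRODE_POOLINGS * len(FREQUENCY_BANDS) * len(TIME_WINDOWS)
pac_tests = (ELECTRODE_POOLINGS * PHASE_FREQS * AMPLITUDE_FREQS
             * len(TIME_WINDOWS))
print(power_tests, pac_tests)  # 756 1701
```

These counts are the numbers of tests entering the multiple-comparison correction for the power analysis and the PAC analysis, respectively.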
To investigate the oscillatory mechanisms underlying FM and WM capacities, we correlated the maximum FM/WM capacity per participant with the average power on all trials at each frequency band and at each time window (using Spearman's rho, R, which can deal with possible nonparametric relationships), creating a correlational time–frequency plot for each electrode pooling. Similarly, the PAC index (dPAC) was correlated with maximum FM/WM capacity. Because our measures of FM and WM capacities were correlated (R = .44, p = .03), we computed partial correlations (analyzing WM capacity while partialing out FM capacity, and vice versa). By using partial correlations, we ensured that the variance shared between capacity and the oscillatory characteristics was attributable either to FM or to WM. Because we were interested in the difference between FM and WM, any shared variance was not analyzed. To directly compare the difference between FM and WM, we transformed the partial correlations using Fisher's Z, which allows for the comparison of nonnormally distributed data (Fisher et al., 1970). FM and WM correlations were then tested against each other using Fisher's Z test, in which a Z value for the difference between the two correlations was calculated. p Values were false discovery rate (FDR) corrected at a false discovery proportion of 0.05. FDR corrections were carried out separately for the power and the PAC correlations.
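The statistical pipeline described above can be sketched in three steps: a partial Spearman correlation, a Fisher Z test on the difference between two correlations, and an FDR correction. The sketch rests on assumptions that the text does not specify: the Z test uses the independent-samples standard error sqrt(2 / (n − 3)), and the FDR correction is taken to be the Benjamini-Hochberg procedure. All function names are hypothetical.

```python
import math


def ranks(values):
    """Ranks starting at 1, with ties receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_spearman(x, y, covariate):
    """Spearman correlation of x and y with the covariate partialed out."""
    rx, ry, rz = ranks(x), ranks(y), ranks(covariate)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_difference(r1, r2, n):
    """Two-sided test of r1 versus r2, each based on n observations.

    Assumption: independent-samples standard error sqrt(2 / (n - 3)).
    """
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(2 / (n - 3))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg: True where the test survives FDR correction."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            cutoff = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= cutoff
    return significant


# Partial correlations reported for PO7/PO8 alpha (FM: -.62, WM: .39, n = 24).
print(fisher_z_difference(-0.62, 0.39, 24))
```

Under the stated standard-error assumption, the PO7/PO8 example yields an uncorrected p value below .001, in line with the difference reported for that window; the exact value depends on the standard error the authors used.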
Our main analyses focused on correlations based on between-subject differences in capacity. Because we believe capacity to be a trait that is stable within a participant rather than a fluctuating state, we deemed this approach valid. However, it could be that the state a participant was in during a particular trial influenced their ability to remember the rectangles on that trial. The nature of the change detection task makes it impossible to investigate fluctuations on a trial-by-trial level: Because it is a two-alternative forced-choice task, in 50% of the correct (and incorrect) trials, performance could have been based on chance. Therefore, a fair number (60+) of trials is necessary to evaluate performance and reliably estimate a participant's capacity.
Because we had a minimum of 244 trials per memory condition per participant, we were able to do a split-half analysis and divide the trials into low- and high-power trials. If the state of a participant would specifically influence their FM or WM performance, one would expect that capacity as measured on the high-power trials would differ from that on the low-power trials. For each significant time–frequency window separately and within each participant, we ranked all FM and WM trials according to power (not baseline corrected), divided the data into two halves (one low-power and one high-power trial set), and calculated capacity over these two sets. The capacity difference between the low- and high-power trial sets was analyzed with 35 ANOVAs (2 × 2, WM/FM × High/low power). Because these were unplanned comparisons, p values were Bonferroni-corrected (alpha = 0.05/35).
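A minimal sketch of this split-half procedure is given below. The trial representation (dictionaries with `power`, `load`, `changed`, and `said_change` fields) and the function names are assumptions made for illustration; capacity is computed as the maximum Cowan's K over loads, as in the main analysis.

```python
BONFERRONI_ALPHA = 0.05 / 35  # 35 unplanned comparisons


def capacity(trials):
    """Maximum Cowan's K over loads; returns None if no load is estimable."""
    best = None
    for load in sorted({t["load"] for t in trials}):
        subset = [t for t in trials if t["load"] == load]
        change = [t for t in subset if t["changed"]]
        same = [t for t in subset if not t["changed"]]
        if not change or not same:
            continue  # hit or correct rejection rate undefined
        hit_rate = sum(t["said_change"] for t in change) / len(change)
        cr_rate = sum(not t["said_change"] for t in same) / len(same)
        k = (hit_rate - 0.5 + cr_rate - 0.5) * load
        best = k if best is None else max(best, k)
    return best


def split_by_power(trials):
    """Rank trials by power and split them into low and high halves."""
    ranked = sorted(trials, key=lambda t: t["power"])
    middle = len(ranked) // 2
    return ranked[:middle], ranked[middle:]
```

For one participant, one memory condition, and one time–frequency window, `low, high = split_by_power(trials)` followed by `capacity(low)` and `capacity(high)` gives the two capacity estimates that would enter the 2 × 2 ANOVA.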
Using a 2 (Memory: FM, WM) × 4 (Load: 2, 4, 6, 8) repeated-measures ANOVA, we found a main effect of Memory, showing that FM capacity was larger than WM capacity (Figure 1C; F(1, 23) = 98.8, p < .001). There was an interaction effect between Memory and Load (F(1.8, 41.2) = 27.5, p < .001), revealing that WM capacity increased between Loads 2 and 4 (t(23) = 7.8, p < .001) but not between Loads 4 and 6 or between Loads 6 and 8 (t(23) = −0.3, p = .736; t(23) = 0.7, p = .473), whereas FM capacity increased until Load 6 (difference between Loads 4 and 6: t(23) = 7.7, p < .001) and then leveled off between Loads 6 and 8 (t(23) = 1.8, p = .093). This confirms previous work showing that FM has a larger capacity than WM and that FM performance can increase with larger memory load, whereas WM capacity stays fixed even when increasing the number of items to remember (Vandenbroucke et al., 2014; Sligte et al., 2008). The current capacities for FM and WM are somewhat lower than previously found for the same objects (Vandenbroucke et al., 2011; Sligte et al., 2008). In previous studies, however, participants received more extensive training on the task, which maximized both their FM and WM scores.
General Power Characteristics
In Figure 2, general power characteristics of the task are depicted. The only statistical analysis we performed on these overall power characteristics was to confirm that there were no statistical differences (FDR-corrected) between FM and WM trials before onset of the cue, as the two trial types are the same here. In Figure 2, it can be seen that there are no differences between the two trial types (FM–WM) between the onset of the green cue indicating the start of the trial (−500 msec relative to memory display) and cue onset (1250 msec relative to memory display) in parietal-occipital (Figure 2A), central-parietal (Figure 2B), or frontal-central (Figure 2C) electrodes. Indeed, no such difference was found for any of the electrode poolings (all ps > .05, FDR-corrected within this analysis). In both FM and WM trials, activity seemed to be most pronounced in the posterior electrodes (e.g., Figure 2A), with a clear theta enhancement after presentation of stimuli and cue and a decrease in alpha that persisted during the delay periods. This effect was similarly present at central electrodes (Figure 2B). After 1250 msec, a difference between FM and WM trials emerged that was manifested over posterior and central electrodes: Theta enhancement after presentation of the cue was most pronounced for FM trials (red coloring) over posterior electrodes (Figure 2A), whereas theta enhancement was larger for WM trials (blue coloring) during this period over central electrodes (Figure 2B). At frontal electrodes (Figure 2C), there was a sustained decrease in alpha after presentation of the memory display for both FM and WM trials but no clear difference between FM and WM.
Correlation between Capacity and Power
To investigate whether FM and WM depend on different underlying oscillatory mechanisms, we correlated individual FM and WM capacities with time–frequency power before onset of the cue. Importantly, participants were not aware of the trial type they would receive before onset of the cue and thus could not prepare for the two conditions differently. We confirmed that indeed no differences were present in power before onset of the retro-cue between FM and WM trials when averaged over participants. However, if different mechanisms support the formation of representations in FM and WM, a divergence should be seen before onset of the retro-cue in the correlation between individual FM/WM capacity and time–frequency power.
There were 35 time–frequency windows that showed a significant difference in correlation between FM and WM (FDR corrected at .05; Figure 3; Table 1 depicts all partial and full correlations for FM and WM). For P5/P6 and PO7/PO8, there was a negative correlation between FM capacity and alpha power (8–15 Hz) at the initial preparation phase of the trial (−500 to −250 msec), whereas there was a slight positive correlation for WM capacity. To illustrate, the correlations for PO7/PO8 are depicted in Figure 4, top (FM: partial R = −.62, WM: partial R = .39, FM − WM difference: p < .001). For ease of interpretation, we plot the original variables next to the residualized data used in the partial correlation analyses (Figure 4B). This gives insight into the range of capacities and power between participants. A positive correlation between FM capacity and theta (4–7 Hz) was found during preparation (−250 to 0 msec at FC1/FCz/FC2) and presentation of the memory display (0–250 msec at CP1/CPz/CP2), whereas the opposite relation was found for WM (correlations for CP1/CPz/CP2 are depicted in Figure 4, center; FM: partial R = .64, WM: partial R = −.50, FM − WM difference: p < .001). In addition, throughout the trial but mainly during early preparation (−500 to −250 msec) and late delay (750–1250 msec), a positive correlation between FM capacity and gamma (31–70 Hz) and a negative relation between WM capacity and gamma were found for several frontal, central parietal, and occipital-parietal electrode pairs (see Figure 3). In Figure 4, bottom, the most pronounced correlation difference is depicted (FC5/FC6 at 750–1000 msec; FM: partial R = .52, WM: partial R = −.63, FM − WM difference: p < .001). Together, these data show that FM capacity is at least partially related to different oscillatory mechanisms than WM capacity.
Crucially, these differences arise even before onset of the retro-cue, which suggests that the buildup of FM and WM representations is supported by dissociable mechanisms.
Table 1. Partial and full correlations for FM and WM (columns: Electrode Pair, Frequency, Time Bin, Partial Correlations [FM, WM], Full Correlations [FM, WM], p Value [Diff]).
The current correlational differences between FM and WM were based on between-subject analyses. The question remains whether these differences reflect interindividual trait differences or intraindividual state differences. We therefore divided the data of each of the 35 significant time–frequency windows into a low-power and a high-power trial set per time window and calculated capacity over the low- and high-power sets separately. If the between-subject correlations between FM and WM capacity and power were state-dependent, one would expect to find a capacity difference between low- and high-power trials. If, however, the difference in neural mechanisms underlying FM and WM capacity reflects trait differences, there should be no difference in capacity between low- and high-power trials. This procedure was warranted because capacity estimates were highly reliable over trials: Capacity calculated over odd and even trials separately yielded a correlation of .84. We conducted ANOVAs (2 × 2, WM/FM × High/low power) to test for a difference in capacity between high- and low-power trials and found no main effect of Power or Memory type and no interaction between Power and Memory type (unplanned comparison, Bonferroni-corrected). This suggests that the correlational differences between FM and WM capacity and power were because of participants' general trait characteristics. However, because our task design did not allow us to perform analyses on a trial-by-trial level, we did not have the power to detect more subtle state differences that might affect the formation of FM and WM representations.
General Theta–Gamma Coupling Characteristics
In Figure 5, theta–gamma PAC (dPAC) values are given for occipital and frontal electrode poolings for the two memory conditions throughout the task. As with power, these values are depicted to illustrate the general landscape of theta–gamma coupling as present in this task, but the only statistical analysis performed was to confirm that there were no statistical differences between FM and WM trials before onset of the cue (FDR-corrected within this analysis). We found no difference between dPAC for FM and WM trials before onset of the cue in any of the electrode poolings (all ps > .05, FDR-corrected). dPAC values were normalized (z scored) based on the average coupling within the same data set when there is no relation between oscillation and time (using a permutation test; see Methods). Deflections from zero thus reflect coupling compared with random fluctuations. Looking at Figure 5, it seems that, in both frontal (upper) and occipital (lower) pools, theta–gamma coupling was present throughout the task. However, theta–gamma coupling seemed more variable for the occipital electrodes, suggesting that the mechanisms at play here might be (differentially) involved in memory formation and/or maintenance (note, however, that these values were not baseline corrected and not statistically tested; we therefore cannot make any statements about task modulation per se).
Correlation between Capacity and Theta–Gamma Coupling
Similar to the power analysis, we focused on the relation between theta–gamma coupling and FM/WM capacity before onset of the retro-cue. A significant difference (FDR-corrected) between FM and WM correlations with theta–gamma coupling was found for an occipital pooling during presentation of the memory display (Figure 6; O1/Oz/O2; theta: 6 Hz, gamma: 55 Hz; 0–250 msec; FM: partial R = −.73, WM: partial R = .35; FM – WM difference: p < .001). Whereas FM capacity negatively correlated with theta–gamma coupling, WM showed a slight positive correlation. This suggests that, although theta and gamma power by themselves are more correlated for FM during presentation of the memory display, the coupling between these two relates more to WM capacity than to FM capacity.
Over the last years, several studies have shown that the traditional two-stage model of VSTM, comprising IM and WM, might be insufficient (Vandenbroucke et al., 2011, 2014; Pinto et al., 2013; Vandenbroucke, Sligte, Fahrenfort, Ambroziak, & Lamme, 2012; Sligte et al., 2008, 2010). A third stage of VSTM, termed fragile memory (FM), has been proposed to lie in between IM and WM. FM can be clearly dissociated from classical IM because it has a smaller capacity, does not rely on retinal afterimages (Sligte et al., 2008), and has a cortical basis (Sligte et al., 2009). In this study, we showed that, on the basis of their electrophysiological correlates, FM and WM could be dissociated as well. We found several time–frequency windows before onset of the retro-cue for which power correlated differentially with FM and WM capacities. This shows that the mechanisms at play during encoding of the memory array differently determined the formation of FM and WM representations and that the construct validity of FM seems warranted. The main focus of this study was to investigate whether FM and WM could be neurally distinguished. No specific claims were made as to which neural substrates would underlie either FM or WM. In the following discussion, we will speculate on the interpretation of our findings on the basis of existing literature.
The first difference between FM and WM was evident before onset of the memory display. When the fixation dot turned green to alert participants to the start of a trial, FM capacity correlated negatively with peri-occipital alpha power (8–15 Hz). A prestimulus decrease in occipital alpha has been related to visual discriminability (Van Dijk, Schoffelen, Oostenveld, & Jensen, 2008), visual attention (Sauseng et al., 2005; Klimesch, 1999), and visual excitability (Lange, Oostenveld, & Fries, 2013). It thus seems that participants who were best able to visually prepare for the upcoming stimuli had a larger memory capacity and, specifically, a larger FM capacity. It seems likely that preparatory attention is more related to memory encoding than to memory maintenance. We thus speculate that the initial formation of FM representations relies more on preparation of the visual system than the formation of WM representations does.
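To make the notion of a capacity-specific correlation concrete, the sketch below computes a partial correlation between one capacity measure and band power while regressing out the other capacity measure. That the partial correlations reported here control for the other capacity is an assumption of this example, as are the variable names and the simulated data; none of the numbers are from the study.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after regressing z out of both."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    design = np.column_stack([np.ones_like(z), z])        # intercept + covariate
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Simulated (hypothetical) per-participant values, for illustration only.
rng = np.random.default_rng(0)
n_participants = 24
wm_capacity = rng.normal(2.5, 0.6, n_participants)
fm_capacity = wm_capacity + rng.normal(1.5, 0.8, n_participants)
alpha_power = -0.5 * fm_capacity + rng.normal(0.0, 0.5, n_participants)

# FM-specific relation: FM vs. alpha power, controlling for WM capacity.
print("FM | WM:", partial_corr(fm_capacity, alpha_power, wm_capacity))
# WM-specific relation: WM vs. alpha power, controlling for FM capacity.
print("WM | FM:", partial_corr(wm_capacity, alpha_power, fm_capacity))
```

Because FM and WM capacity are themselves correlated across participants, residualizing in this way is one simple means of asking whether power relates to one capacity over and above the other.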
Before and during stimulus onset, a positive correlation between FM and theta power (4–7 Hz) and a negative correlation between WM and theta power were found for a central-frontal and central-parietal electrode pool. An increase in power after stimulus presentation at these electrode sites is related to visual processing of the memory display (Klimesch, 1999). Thus, it seems that participants with enhanced visual processing of the memory items had a larger FM capacity but not a larger WM capacity. This might indicate that forming FM representations is more dependent on visual processing of a scene than the forming of WM representations is.
During several phases of the trial, gamma power (31–70 Hz) over peri-occipital, central, and frontal sites was positively correlated with FM capacity and negatively correlated with WM capacity. On the other hand, specifically during the presentation of the memory display, theta–gamma coupling was negatively related to FM capacity but positively related to WM capacity. These findings suggest that there are different attentional and memory mechanisms at play during the formation of FM and WM representations. An increase in gamma power has been associated with increased anticipatory attention (Tallon-Baudry, 2009) as well as with memory formation and maintenance (Honkanen, Rouhinen, Wang, Palva, & Palva, 2014; Tallon-Baudry, 2009; Jokisch & Jensen, 2007; Tallon-Baudry et al., 1999). At the same time, theta–gamma coupling has been related to WM processing and specifically to WM capacity (Sauseng et al., 2009; Canolty et al., 2006). Our results prompt the hypothesis that increases in gamma power are related to the formation of fragile but visually detailed memory traces, whereas theta–gamma frequency coupling supports the formation of long-lasting, robust memory representations.
Previous studies have found both gamma power enhancement and increased theta–gamma coupling as correlates of VSTM formation and maintenance (Sauseng et al., 2009; Jokisch & Jensen, 2007; Canolty et al., 2006; Tallon-Baudry et al., 1999). Possibly, these correlates reflect different types of maintenance. The increase in gamma power might support memory formation in a more local and more visually detailed manner. This could support easy, short-lived maintenance of stimulus features. Indeed, such processing seems evident in tasks in which the emphasis lies on the exact visual features of a stimulus (e.g., in paradigms such as those used by Honkanen et al., 2014; Jokisch & Jensen, 2007; Tallon-Baudry et al., 1999). In these tasks, participants would be making a “visual snapshot,” as it were. Similarly, in our task, making a snapshot of the image and retaining as much visual detail as possible over a short period enhances FM performance. Because, in FM trials, the visual snapshot is not overwritten by the new display before the retro-cue is presented, even local, volatile representations can still be accessed. This is in line with the observation that FM was also supported by increased visual preparation and processing of the memory display, thus showing a larger dependency on the visual system. On the other hand, to enhance WM performance, representations need to be strengthened and made robust against overwriting by a new display. This latter memory system might be supported by the coupling between theta and gamma, producing a more robust form of memory representation over time. Indeed, theta–gamma coupling has been found in a variety of memory tasks in which maintaining visual details was not as emphasized (Axmacher et al., 2010; Sauseng et al., 2009). Although our current results are in line with the theory outlined above, confirmatory studies are needed to support these ideas.
The current study suggests that different mechanisms are at play during VSTM formation. This additionally implies that the nature of an experimental task has a profound impact on the neural correlates that will be found. If the main focus lies on detailed processing and maintenance of visual features over a retention period without interference, mechanisms that support the formation of visual icons, similar to FM, will be uncovered. However, if items need to be protected against overwriting by new stimuli or need to be retained in a different format, for example, capacity will be supported by neural substrates that support the formation of robust WM-type representations. Therefore, the conclusion about which neural correlates are involved in a VSTM task, and thus which capacity system is tapped into, might depend heavily on the nature of the task.
Post hoc analyses showed that the between-participant variance in the power spectra that related to differences in FM and WM capacity in this study seemed to reflect a general trait difference between participants rather than the state a participant was in during a particular trial. Multiple studies have shown that differences in time–frequency spectra can be related to differences in white matter density (Cohen, 2011a, 2011b; Zaehle & Herrmann, 2011; Valdés-Hernández et al., 2010). It could thus be that the current results are explained by differences in structural connectivity between areas that determine the ability to maintain FM and WM representations.
Previous studies have challenged the existence of a fragile stage in VSTM that has different characteristics from WM. Matsukura and Hollingworth (2011) questioned whether FM capacity truly exceeds the classical WM capacity of four objects. They found that, with minimal practice in the partial-report change detection task, FM capacity did not greatly exceed the classical average of four objects associated with WM. Notably, however, their own measure of WM averaged around 2.5 objects, which is below the classical average of four objects. These results map well onto the capacities found in the current study. It appears that, when participants receive minimal practice in the partial-report change detection task, capacity estimates for both FM and WM are somewhat lower than reported in previous literature. The crucial observation here, however, is that FM capacity always exceeds WM capacity. Both in the Matsukura and Hollingworth (2011) study and in this study, at set size 8, FM capacity is at least 39% higher than WM capacity.
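For readers who want to see how such a comparison is computed, the sketch below derives a capacity estimate from change detection performance and expresses the FM advantage as a percentage of WM capacity. It assumes Cowan's K (set size times the difference between hit rate and false alarm rate) as the capacity formula, which this section does not spell out, and the hit and false alarm rates are invented for the example rather than taken from either study.

```python
def cowan_k(hit_rate, false_alarm_rate, set_size):
    """Cowan's K capacity estimate (assumed formula): N * (H - FA)."""
    return set_size * (hit_rate - false_alarm_rate)

def percent_higher(fm_k, wm_k):
    """How much larger FM capacity is than WM capacity, in percent of WM."""
    return 100.0 * (fm_k - wm_k) / wm_k

# Hypothetical performance at set size 8 (illustrative values only).
set_size = 8
fm_k = cowan_k(hit_rate=0.80, false_alarm_rate=0.20, set_size=set_size)
wm_k = cowan_k(hit_rate=0.70, false_alarm_rate=0.30, set_size=set_size)

print("FM capacity:", round(fm_k, 2))
print("WM capacity:", round(wm_k, 2))
print("FM advantage (%):", round(percent_higher(fm_k, wm_k), 1))
```

Expressing the difference relative to WM capacity makes the comparison independent of the overall capacity level, which matters here because both estimates are lower than in earlier reports.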
Makovski (2012) challenged the assumption that FM capacity can only be probed by a cue before interference of the test array and is abolished after interference. He found that, when visual interference is presented after offset of the memory array but before presentation of the retro-cue, participants still perform better than without a retro-cue, thus concluding that FM is not abolished by visual interference. In his design, however, the visual interference was probably not similar enough to overwrite FM representations. Pinto et al. (2013) showed that FM capacity is abolished, and performance is reduced to WM performance, only when the same objects are used in the same spatial locations. With different objects (as in the Makovski study) or when interference is presented at a different location, the retro-cueing effect is diminished but not abolished. This fits with our notion of FM as a fragile but midlevel memory stage. Unlike IM representations, FM representations are not overwritten by just any type of visual stimulation. However, when FM representations are not protected from interference by the test array (Makovski et al., 2008), similar items will overwrite them. Therefore, only items that are encoded and maintained in a more robust format will remain after such interference. The current study shows that, indeed, diverging mechanisms relate to the encoding and maintenance of fragile and robust representations. For an in-depth discussion of different theories on the retro-cue benefit, see van Moorselaar, Olivers, Theeuwes, Lamme, and Sligte (2015).
In recent years, many WM models have emerged that propose a multiple-stage account (Nee & Jonides, 2013; Oberauer & Hein, 2012; Jonides et al., 2008). In these models, a distinction is made between the focus of attention, in which only one item is maintained that can be readily acted upon, and a stage of “direct access” (Nee & Jonides, 2013) or “broad focus” (Oberauer & Hein, 2012), in which three to four items are maintained to which attention can be flexibly switched. Apart from these two attentional sets, there is an activated part of long-term memory (LTM) in which representations are held that could be relevant to the task. This LTM stage is not currently attended but contains representations that can be easily accessed. We believe that WM as measured in our task corresponds to the broad focus stage of WM, in which multiple items are attentively maintained. Where FM fits in remains to be elucidated. It shares characteristics with the activated LTM stage, such as its unattended nature. However, the fact that FM is not accessible after overwriting by similar stimuli, in combination with its visual nature, does not seem to converge with activated LTM. Alternatively, FM as measured in the current partial-report change detection task might reflect a stage within the visual processing stream itself. FM might constitute the representational capacity of midlevel visual areas and reflect a pure form of VSTM: the capacity to maintain information in a visual format. WM, on the other hand, might come into play when information needs to be protected and manipulated for further use and is thus composed of a focus of attention, broad attention, and activated LTM.
The current study shows that FM and WM capacities are related to different oscillatory characteristics and, therefore, that the formation of FM and WM relies at least partly on different mechanisms. FM seems to have a visual basis, whereas for WM, visual processing of the memory display seems less important. In the model we propose (Sligte et al., 2010), FM represents the maintenance of items at a visual level, representing a genuine VSTM store, whereas WM reflects maintenance of items on a cognitive, perhaps more abstract, level. The capacity to maintain items at a visual level is much larger than the capacity to maintain items at a cognitive level (depending on the complexity of the items that are used; Vandenbroucke et al., 2012, 2014; Sligte et al., 2008, 2010). In conclusion, these results show that a distinction between fragile VSTM and visual WM, and thus a distinction between different representational stages in VSTM, is warranted.
This study was made possible by an ERC Advanced Investigator Grant to V. A. F. L. and a Newton International Fellowship by the Royal Academy to I. G. S.
Reprint requests should be sent to Annelinde R. E. Vandenbroucke, Helen Wills Neuroscience Institute, UC Berkeley, 10 Giannini Hall, Berkeley, CA 94720, or via e-mail: email@example.com.