Flexible control over currently relevant sensory representations is an essential feature of primate cognition. We investigated the neurophysiological bases of such flexible control in humans during an intermodal working memory task in which participants retained visual or tactile sequences. Using magnetoencephalography, we first show that working memory retention engages early visual and somatosensory areas, as reflected in the sustained load-dependent suppression of alpha and beta oscillations. Next, we identify three components that are also load dependent but modality independent: medial prefrontal theta synchronization, frontoparietal gamma synchronization, and sustained parietal event-related fields. Critically, these domain-general components predict (across trials and within load conditions) the modality-specific suppression of alpha and beta oscillations, with largely unique contributions per component. Thus, working memory engages multiple complementary frontoparietal components that have discernible neuronal dynamics and that flexibly modulate retention-related activity in sensory areas in a manner that tracks the current contents of working memory.
The ability to dynamically regulate different sensory representations as a function of ongoing task demands is an essential feature of primate cognition that enables adaptive behavior. Such flexible cognitive control is widely believed to be mediated by the same (domain-general) frontal and parietal brain areas whose regulatory influence over other brain areas is continuously aligned to match current task demands (e.g., Squire, Noudoost, Schafer, & Moore, 2013; Gazzaley & Nobre, 2012; Duncan, 2010; Corbetta & Shulman, 2002). Despite this widely accepted notion, however, little evidence to date has confirmed within a single experiment that the same frontoparietal substrates of cognition can flexibly control activity in different sensory modalities as a function of what is currently relevant. Moreover, although frontal and parietal areas consistently “light up” in human fMRI studies of cognition, little remains known about the neurophysiological substrates of frontoparietal control in humans. We therefore set out to investigate the neurophysiology of flexible frontoparietal control in humans and did so in relation to working memory.
Working memory pertains to the core cognitive ability to temporarily retain and manipulate information in mind, for as long as this information remains relevant to current goals (e.g., D'Esposito & Postle, 2015; Baddeley, 1992, 2012). A common aspect of the many models of working memory is that working memory relies on one or more central executive control components that can flexibly regulate representations in distinct storage components, depending on what information is currently relevant (e.g., Baddeley, 1992, 2012). Over the past decade, neuroscience has colored this picture by demonstrating that the latter can engage even primary sensory areas (Spitzer & Blankenburg, 2012; Harrison & Tong, 2009; Pasternak & Greenlee, 2005)—at least when the sensory properties of the information are retained (Lee, Kravitz, & Baker, 2013). Accordingly, it has been proposed that the frontal and parietal substrates that have traditionally been associated with working memory (e.g., Smith & Jonides, 1999; Fuster & Alexander, 1971) reflect domain-general executive control components that regulate representation-specific activity in early sensory areas (D'Esposito & Postle, 2015; Lara & Wallis, 2014; Sreenivasan, Curtis, & D'Esposito, 2014).
Although this framework for understanding the neural implementation of working memory is highly appealing, not many studies to date have directly substantiated its central hypothesis that the same frontoparietal substrates of working memory can flexibly regulate activity in distinct sensory modalities, depending on what information is currently held in working memory. This is because most prior studies (1) presented information in a single sensory modality (making it hard to distinguish domain-general control from representation-specific components) and (2) focused on either frontoparietal or sensory substrates (not addressing whether they co-occur, let alone interact; but see Rissman, Gazzaley, & D'Esposito, 2008; Gazzaley, Rissman, & D'Esposito, 2004). In addition, the majority of prior studies in humans that support this framework are fMRI studies (as reviewed in, e.g., D'Esposito & Postle, 2015; with some notable exceptions: e.g., Palva, Monto, Kulashekhar, & Palva, 2010). Accordingly, the following key questions have remained largely unaddressed: (1) Does “activation” of different frontal–parietal areas engage similar neurophysiological (time–frequency) profiles, or do different frontoparietal areas engage qualitatively distinct neurophysiological processes? (2) Are the same neurophysiological processes engaged for top–down control over different sensory modalities, or are these processes modality specific? (3) How do neural dynamics in frontoparietal areas relate to neuronal dynamics in modality-specific brain areas?
To investigate these questions regarding neurophysiological substrates of flexible frontoparietal control during working memory in humans, we capitalized on the high temporal resolution and whole-head coverage of magnetoencephalography (MEG) and adopted an intermodal working memory task in which the same working memory operations (retaining a sequence of two or four items) were required on either visual or tactile representations. We first confirm that working memory engages early visual and somatosensory areas and show that this is reflected in the sustained suppression of alpha and beta (8–30 Hz) oscillations in the relevant sensory area that, moreover, scales with load. We next identify three electrophysiological substrates of working memory (medial frontal theta synchronization, frontoparietal gamma synchronization, and sustained parietal elevation in magnetic field strength) that each also scale with load but are largely independent of its contents. Finally, and addressing our central hypothesis, we demonstrate that each of these supramodal components predicts, in a largely unique manner, the activity modulations in early sensory areas, with the pattern of correlation tracking the contents of working memory.
This study was conducted in accordance with guidelines of the local ethics committee (Committee on Research Involving Human Subjects, Region Arnhem-Nijmegen, The Netherlands).
Sixteen healthy human volunteers (four women, age range = 24–37 years, all right-handed) participated in the study after providing informed consent. Participants received € 10/hr for their participation. Data from all participants were retained in the analysis.
Task and Procedure
Participants performed an intermodal working memory task with visual and tactile sequences (Figure 1) while seated in the MEG. In different blocks, participants were required to reproduce either the visual or tactile sequences after a 4-sec retention interval. Visual sequences were produced by sequentially lighting up four of eight placeholders that were positioned around a central fixation cross. Similarly, tactile sequences were produced by sequentially tapping four of eight possible fingers (for details, see Visual and tactile stimulation details). Visual and tactile sequences were always presented together, but their sequences were drawn independently of each other. Both sequences always contained four unique items. Within visual and tactile blocks, working memory load was varied across three mini blocks of eight trials each (Figure 1B). In Load 2 mini blocks, participants were required to reproduce only the first two items of the sequence, whereas in Load 4 mini blocks, they were required to reproduce all four items. Finally, in Load 4* mini blocks, they were again required to reproduce all four items but could do so in arbitrary order (and we confirmed that this manipulation worked in our behavioral data; see Results). Instructions were provided through a visual display that required the participant's response to start the (mini)block. It is key to note that, across all conditions (tactile/visual, loads 2/4/4*), sensory input was matched (i.e., only instructions varied).
Twenty-five percent of all trials served as non-working memory control trials. In these trials, the fixation cross turned red (instead of green) at the onset of the sequence, and participants were instructed not to retain the items but instead to base their responses on information provided at the reproduction stage (as described below). The fixation cross remained red or green until the end of the retention interval. Control trials were randomly interleaved with working memory trials and were equally distributed across the mini blocks.
At the reproduction stage, a visual display depicted all possible response options for that block (visual or tactile; see Figure 1A), together with a response cursor (a white line) beneath one of them. To avoid response preparation during the retention interval, we drew the cursor's starting position randomly. Participants moved the cursor clockwise or counterclockwise by pressing a button with their right or left thumb and selected a response option by pressing both buttons within a 75-msec time frame. In control trials, the to-be-selected responses were indicated by dots inside the requested response options. Participants sequentially selected two or four options depending on the load mini block they were in (also in the control trials that were presented in this mini block). After an item was selected, the selection could not be undone. Feedback was presented after every selection by flashing the selected option green (correct) or red (incorrect) for 100 msec. The interval between the last response and the start of the next trial was randomly drawn from a truncated negative exponential distribution (truncated between 1 and 4 sec) with a mean of 1500 msec.
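As an illustration of how such intertrial intervals could be drawn, the following Python sketch samples from an exponential distribution and rejects values outside the 1- to 4-sec range. It is not the original stimulus code. The function name `draw_iti` is hypothetical, and treating 1500 msec as the mean of the untruncated distribution is an assumption, because the text does not state whether the mean refers to the distribution before or after truncation.

```python
import random


def draw_iti(mean_s=1.5, low_s=1.0, high_s=4.0, rng=random):
    """Draw one interval (in seconds) from an exponential distribution
    truncated to [low_s, high_s], using rejection sampling.

    Assumption: mean_s is the mean of the untruncated exponential.
    """
    while True:
        value = rng.expovariate(1.0 / mean_s)
        if low_s <= value <= high_s:
            return value


rng = random.Random(0)  # fixed seed so the illustration is reproducible
itis = [draw_iti(rng=rng) for _ in range(1000)]
```

Rejection sampling keeps the shape of the exponential within the allowed range, at the cost of discarding draws that fall outside it.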
All participants completed two sessions of 1 hr with a 15- to 30-min break in between. Each session contained 10 blocks (five visual, five tactile) of 24 trials. Between blocks, we presented visual and tactile localizers, during which participants were instructed to relax. Localizer stimuli involved stimulation of all eight visual placeholders or all eight fingers and lasted 100 msec each. Visual and tactile stimuli were randomly interleaved, and ISIs were randomly drawn between 500 and 700 msec. Each localizer contained 100 stimuli (50 visual, 50 tactile).
Visual and Tactile Stimulation Details
Visual displays were projected to a screen that was positioned approximately 70 cm in front of the participant's eyes. We placed eight placeholders (small squares of approximately 0.3° visual angle) on an invisible oval that was centered at the fixation cross (Figure 1A). Placeholders appeared immediately after the response in the previous trial. The dimensions of the oval were approximately 5° visual angle in width and 2° in height. Placeholder locations were varied from trial to trial, with the constraint that individual placeholders were at least 25° apart on the circumference of the oval (which spans 360°). Placeholders were gray (RGB values: 15, 15, 15) and were set to purple (RGB values: 20, 4, 30) for 100 msec for visual stimulation.
For tactile stimulation, we made use of two custom-built graspable tactile stimulation devices (as also described and depicted in van Ede, de Lange, Jensen, & Maris, 2011), one for each hand. Each device contained four piezoelectric Braille cells (Metec, Stuttgart, Germany), each with eight plastic pins that can be raised and lowered. Raising the pins produces the sensation of a tap to the finger. We positioned each fingertip on a separate (adjustable) Braille cell but excluded the thumbs. Instead, at the thumbs, each stimulation device contained a response button that was used for sequence reproduction.
Visual and tactile sequences were always presented simultaneously and each consisted of four individual stimulations of 100 msec, with 366-msec ISIs, thus yielding sequences of 1500 msec.
MEG Acquisition and Preprocessing
Data were acquired using a CTF MEG system that contained 275 axial gradiometers and that was housed in a magnetically shielded room. Localization coils at the nasion and the left and right ears continuously monitored the position of the head relative to the gradiometers. Data were sampled at 1200 Hz and were analyzed in MATLAB (The MathWorks, Natick, MA) using FieldTrip (Oostenveld, Fries, Maris, & Schoffelen, 2011). During data preprocessing, we removed line noise using a discrete Fourier transform filter, cut out our epochs of interest, and subtracted the average signal per epoch (i.e., de-meaning). For analyses of event-related fields (ERFs), we instead baseline corrected the signal by subtracting a 1000-msec presequence baseline. Excessively noisy trials were excluded in two ways. First, noisy trials were detected by visual inspection of the signal's variance across trials and channels. Second, for all analyses involving a single (extracted) value per trial, we additionally removed trials for which this value was more than 3 SDs away from all other trials. For sensor level analyses of oscillatory power, we calculated synthetic planar gradients of the signal, which are known to be maximal above the sources (Bastiaansen & Knösche, 2000). Horizontal and vertical gradients were combined (summed) after power was calculated. For ERF analyses, we did not perform this planar gradient transformation because it removes information about the influx and outflux of the source-generated magnetic fields (which are relevant to interpreting ERFs but not power).
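The second exclusion criterion can be sketched as follows in Python. This is one possible reading, not the original MATLAB code: each trial's value is compared with the mean and standard deviation of all other trials. The function name `keep_mask` and the example values are hypothetical.

```python
from statistics import mean, stdev


def keep_mask(values, n_sd=3.0):
    """Return a list of booleans that is True where a trial is kept.

    Assumed reading of the criterion: a trial is removed when its value
    lies more than n_sd standard deviations from the mean of all OTHER
    trials (mean and SD are computed without the trial itself).
    """
    mask = []
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]
        mu, sd = mean(others), stdev(others)
        mask.append(abs(v - mu) <= n_sd * sd)
    return mask


trial_values = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 9.0]  # made-up values
mask = keep_mask(trial_values)
```

Leaving the tested trial out of the mean and SD prevents an extreme value from inflating the very threshold it is judged against.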
We calculated oscillatory power with and without time resolution. For analyses without time resolution, we estimated power across the full 4-sec retention interval for four a priori defined frequency bands that were nonoverlapping. For theta, we used 4–7 Hz. We based this range on the 4- to 8-Hz band put forward by Hsieh and Ranganath (2014) but stopped at 7 Hz to avoid overlap with the 8- to 12-Hz alpha band. For alpha and beta, we used the standard bands from 8 to 12 and 13 to 30 Hz. Finally, for gamma, we used 55–75 Hz based on our own prior study that had revealed a prominent gamma source in this range (van Ede, Szebényi, & Maris, 2014; note that this band is also very close to the 60- to 80-Hz band identified by Roux, Wibral, Mohr, Singer, and Uhlhaas, 2012). We combined Fourier analysis with multitapering (Percival & Walden, 1993) to achieve the desired spectral smoothing in each of these bands. For all analyses with time resolution, we used a 1000-msec sliding time window that was advanced in steps of 200 msec across the epochs of interest. For time-resolved analyses of frequencies below 50 Hz, we applied ±2-Hz smoothing, whereas for frequencies above 50 Hz, we applied ±5-Hz smoothing.
We placed grids with 0.75-cm3 spacing inside a standardized Montreal Neurological Institute anatomy—yielding 5341 voxels inside anatomical boundaries. For each participant, we then warped this grid to match their individual structural MRI. Per voxel, a leadfield matrix was calculated using a forward model based on a single-shell volume conductor (Nolte, 2003). For the different frequency bands of interest, we then used a frequency-domain beamformer (DICS; Gross et al., 2001) to reconstruct source level power. Source level power was subsequently contrasted between conditions.
For the sustained ERF component, a comparable beamforming approach in the time domain (LCMV) did not yield interpretable source reconstructions. There are two plausible reasons for this. First, although the effect topographies (working memory minus control) were relatively clean, the topographies observed in the individual conditions (from which only the presequence baseline had been subtracted) were much noisier. This noise will have affected the source reconstructions of the individual conditions, and the degree to which this has happened may have corrupted the between-condition contrast to such a degree that it does not look focal anymore. Second, and equally important, the source of this sustained ERF component may have had a wide spatial distribution. Distributed sources are a form of correlated sources, and for this type of source, it is known that the beamformer performs poorly (Van Veen, van Drongelen, Yuchtman, & Suzuki, 1997). To deal with these challenges, we resorted to a source reconstruction approach that is not adversely affected by correlated sources (but is inferior to the beamformer when sources are actually uncorrelated). This approach is a form of signal subspace projection. Specifically, per voxel, we calculated the proportion of variance in the average effect topography (working memory minus control trials, averaged across the full retention interval) that could be explained by a linear combination of the three leadfields associated with that voxel. This proportion of explained variance was expressed as an R2 value, and it resulted in one source level R2 map per participant.
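The per-voxel computation can be illustrated with a minimal Python sketch. It is not the original FieldTrip or MATLAB implementation: it projects a topography onto the subspace spanned by a voxel's leadfield columns and returns the proportion of explained variance. The function name `explained_variance` is hypothetical, and computing the variance without mean-centering is an assumption, as the text does not specify this detail.

```python
def explained_variance(topography, leadfields, tol=1e-12):
    """Proportion of (uncentered) variance in `topography` that a linear
    combination of the leadfield columns can explain.

    topography: list of n_channels floats
    leadfields: list of column vectors (three per voxel in the text),
                each a list of n_channels floats
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Gram-Schmidt: build an orthonormal basis of the leadfield subspace.
    basis = []
    for col in leadfields:
        v = list(col)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip (near-)linearly dependent columns
            basis.append([vi / norm for vi in v])

    total = dot(topography, topography)
    explained = sum(dot(topography, b) ** 2 for b in basis)
    return explained / total


# Toy example: four channels, leadfields aligned with the first three.
toy_leadfields = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
r2 = explained_variance([1.0, 2.0, 3.0, 4.0], toy_leadfields)
```

Repeating this for every voxel would give one R2 value per voxel, that is, one map per participant.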
Analysis Strategy and Statistics
For all neurophysiological components of interest (theta/alpha/beta/gamma/sustained ERF), we first contrasted these components between working memory and control trials with regard to their average strength across the full 4-sec retention interval. We did this both at the sensor and source levels. For the oscillatory components (theta, alpha, beta, and gamma), we further normalized this difference by expressing it as a percentage change: [(WM − control)/control] × 100. We then statistically evaluated these contrasts at the source level using a cluster-based permutation approach. This approach circumvents the multiple-comparison problem by evaluating the full dataspace under a single permutation distribution with regard to the largest cluster of neighboring values that exceed the univariate threshold of p < .05 (see Maris & Oostenveld, 2007, for details). Because, for the sustained ERF component, we only obtained a single source-level map of R2 values per participant (for reasons explained above in Source analyses), the same statistical source-level comparison between working memory and control trials was not possible. We instead based our statistical analysis of this component on a leave-one-out approach that we will explain and justify in more detail below.
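The normalization formula given above can be written out as a short Python sketch. The function name `percent_change` and the power values are made up for illustration.

```python
def percent_change(wm_power, control_power):
    """Normalized difference: [(WM - control) / control] * 100."""
    return (wm_power - control_power) / control_power * 100.0


# Made-up power values for three voxels
wm = [8.0, 12.0, 5.0]
control = [10.0, 10.0, 5.0]
changes = [percent_change(w, c) for w, c in zip(wm, control)]
```

In this toy example, the three voxels show changes of about −20%, +20%, and 0%, respectively.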
After our evaluation of the presence of a retention-related source (i.e., working memory minus control) in the different frequency bands of interest, we next turned to the temporal and spectral profiles of these components as well as their dependence on working memory modality and load. For reasons explained below, this involved a slightly different procedure for the different components of interest.
To further explore the profile of the observed alpha and beta band modulations in the sensory areas, we were able to define participant-specific visual and somatosensory ROIs on the basis of an independent localizer. Specifically, we contrasted 8- to 30-Hz power in the 150- to 400-msec poststimulus window between visual and tactile localizer stimuli and assigned the 300 voxels (5.6% of the total volume) that showed the largest positive difference to the somatosensory ROI and the 300 voxels that showed the largest negative difference to the visual ROI. We could then reconstruct activity in these ROIs to map out their time–frequency profiles as well as their load and modality dependence with regard to 8- to 30-Hz power. To this end, we (1) reconstructed the time-domain activity for each of the selected voxels (by multiplying the data with the beamformer-derived filters for those voxels), (2) subjected all virtual channels to the desired frequency analysis, and (3) averaged the resulting power estimates across all voxels within that ROI.
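The ROI assignment can be sketched in Python as follows. This is an illustration under stated assumptions, not the original code: the contrast is taken as visual minus tactile localizer power per voxel, and the function name `select_rois` is hypothetical. The sketch simply takes the extremes of the sorted differences and does not check that the selected values actually have the expected sign.

```python
def select_rois(diff_by_voxel, n_voxels=300):
    """Split voxels into two ROIs from a per-voxel power difference
    (assumed here: visual minus tactile localizer stimuli).

    Returns (somatosensory_idx, visual_idx): the indices of the
    n_voxels largest (most positive) and the n_voxels smallest
    (most negative) differences.
    """
    order = sorted(range(len(diff_by_voxel)), key=lambda i: diff_by_voxel[i])
    visual_idx = order[:n_voxels]          # most negative differences
    somatosensory_idx = order[-n_voxels:]  # most positive differences
    return somatosensory_idx, visual_idx


# Toy example with six voxels and two voxels per ROI
toy_diff = [0.5, -0.3, 0.1, -0.9, 0.8, -0.1]
somato, visual = select_rois(toy_diff, n_voxels=2)
```

With 5341 voxels in the grid, selecting 300 voxels per ROI corresponds to the 5.6% of the total volume mentioned above.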
For the other components of interest, we did not have a localizer and therefore required a different approach. For the theta and gamma components, we reconstructed activity from the significant source-level clusters that were obtained by comparing all working memory and control trials. For theta, this involved a cluster of 343 voxels (6.4% of the total volume); and for gamma, a cluster of 350 voxels (6.6%). Importantly, because we selected these clusters on the basis of a statistically significant difference between working memory and control trials, subsequent analyses evaluating this particular difference will be biased by this selection. In our analysis, this was the case only for the analyses of the time–frequency profiles of these modulations. We therefore presented these profiles only for descriptive purposes and did not use them for statistical inference. At the same time, it is important to note that two other analyses of interest remain unbiased by this selection. First, this holds for the comparison between load and modality conditions, because the clusters were found on the basis of all working memory conditions collapsed (with an equal number of trials in each). Second, this also holds for the correlation analyses between, on the one hand, the sensory-specific working memory components and, on the other hand, the supramodal theta and gamma components. The latter holds because the clusters were found on the basis of the trial averaged difference between working memory and control trials, rather than on the basis of the covariance of any trial-specific values with any other variable.
For the analysis of the sustained ERFs, we employed a slightly different approach. This was driven by the facts that, for this component, (1) we were not able to test for a significant source-level cluster, as we were only able to obtain a single (effect topography based) R2 map per participant, and (2) the individual topographical effect maps revealed vast across-participant variability that was likely driven by differently oriented sources. Instead of reconstructing this sustained ERF component on the basis of a significant group-level source cluster, we therefore reconstructed it on the basis of the participant-specific sensor level effect topographies. Specifically, we used a leave-one-out approach that allowed us to leverage participant-specific information, while avoiding double dipping. In this approach, for every given trial, we obtained a spatial filter from the effect topography of all remaining trials for that participant. We started from the difference between the average working memory and the average control trials, which we calculated using all trials except one (which could be a working memory or control trial). We then subjected this difference (dimensions: Channels × Time points) to a singular value decomposition and used the spatial weights associated with the component with the highest singular value as a spatial filter (dimensions: Channels × 1). Applying this filter to the left-out trial allowed us to estimate the time course of this trial's sustained ERF component. We applied this procedure separately for every trial. Because this leave-one-out approach provides an unbiased selection of the relevant dataspace, all subsequent comparisons (including those of the reconstructed time courses themselves) remained unbiased.
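The leave-one-out procedure can be sketched in plain Python as follows. This is a toy illustration, not the original analysis code. A simple power iteration stands in for a full singular value decomposition, the function names are hypothetical, and the rule used to fix the arbitrary sign of the singular vector is an assumption that the text does not describe.

```python
def leading_left_singular_vector(matrix, n_iter=200):
    """Power iteration on M M^T for a channels x time matrix M.
    Returns a unit-length vector of channel weights. This substitutes
    for a full SVD and assumes M is not all zeros."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0 + r for r in range(n_rows)]  # arbitrary starting vector
    for _ in range(n_iter):
        w = [sum(matrix[r][t] * u[r] for r in range(n_rows))
             for t in range(n_cols)]                      # w = M^T u
        u = [sum(row[t] * w[t] for t in range(n_cols))
             for row in matrix]                           # u = M w
        norm = sum(x * x for x in u) ** 0.5
        u = [x / norm for x in u]
    return u


def mean_trials(trials):
    """Average a list of channels x time matrices."""
    n = len(trials)
    return [[sum(tr[c][t] for tr in trials) / n
             for t in range(len(trials[0][0]))]
            for c in range(len(trials[0]))]


def loo_time_course(trials, is_wm, left_out):
    """Time course of trial `left_out`, obtained with a spatial filter
    that is derived from all other trials (WM minus control)."""
    wm = [x for i, x in enumerate(trials) if is_wm[i] and i != left_out]
    ctrl = [x for i, x in enumerate(trials) if not is_wm[i] and i != left_out]
    m_wm, m_ctrl = mean_trials(wm), mean_trials(ctrl)
    effect = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(m_wm, m_ctrl)]
    filt = leading_left_singular_vector(effect)
    n_chan, n_time = len(filt), len(effect[0])
    # Assumed sign convention: the filtered effect is positive on average.
    proj = [sum(filt[c] * effect[c][t] for c in range(n_chan))
            for t in range(n_time)]
    if sum(proj) < 0:
        filt = [-x for x in filt]
    x = trials[left_out]
    return [sum(filt[c] * x[c][t] for c in range(n_chan))
            for t in range(n_time)]


# Toy data: two channels, three time points, three WM and two control trials
wm_trial = [[2.0, 2.0, 2.0], [1.0, 1.0, 1.0]]
ctrl_trial = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
trials = [wm_trial, wm_trial, wm_trial, ctrl_trial, ctrl_trial]
is_wm = [True, True, True, False, False]
tc_wm = loo_time_course(trials, is_wm, left_out=0)
tc_ctrl = loo_time_course(trials, is_wm, left_out=3)
```

Because the filter for each trial is derived without that trial, the trial's own noise cannot shape the filter that is applied to it.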
Between-component Correlation Analyses
Across-trial correlations between all observed neurophysiological components were calculated with regard to their reconstructed trial-wise strengths, averaged across the full retention interval. Correlations were evaluated exclusively on working memory trials and were calculated separately for each session and subsequently averaged (as for all other outcome measures). To evaluate across-trial correlations within each of the load conditions, we calculated these correlations separately for each of the load conditions and subsequently averaged the resulting correlations.
Before calculating the between-component correlations, we regressed out the contribution of two main potential sources of correlated variability: time-on-task and head position. For time-on-task, we used the trial number within the session as a regressor. For head position, we first subjected the time courses of the nine head movement parameters (x, y, and z, for each of the three localization coils) to a singular value decomposition analysis and retained the component with the largest singular value. Per trial, this component was averaged across the retention interval, and the resulting variable was then used as a regressor to remove its contribution to the neurophysiological components.
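Removing these two regressors before correlating the components can be sketched in Python as follows. This is an illustration, not the original code: it fits an ordinary least squares model with an intercept (the intercept is an assumption) and keeps the residuals. The function names and all example values are made up.

```python
def residualize(y, regressors):
    """Remove from y the part explained by an intercept plus the given
    regressors (ordinary least squares via Gram-Schmidt)."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    # Orthonormal basis for the intercept and the regressors.
    basis = []
    for col in [[1.0] * len(y)] + [list(r) for r in regressors]:
        v = col
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-12:  # skip redundant regressors
            basis.append([vi / norm for vi in v])

    # Subtract the projection of y onto each basis vector.
    resid = list(y)
    for b in basis:
        c = dot(resid, b)
        resid = [ri - c * bi for ri, bi in zip(resid, b)]
    return resid


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Made-up trial-wise values for one session
trial_number = [1, 2, 3, 4, 5, 6]
head_position = [0.2, 0.1, 0.4, 0.3, 0.6, 0.5]
theta = [1.1, 2.3, 2.9, 4.2, 5.0, 6.1]
alpha = [-0.9, -2.2, -3.1, -3.8, -5.2, -5.9]

confounds = [trial_number, head_position]
theta_res = residualize(theta, confounds)
alpha_res = residualize(alpha, confounds)
r = pearson(theta_res, alpha_res)
```

In this toy data, both components drift strongly with trial number. Correlating the residuals rather than the raw values keeps such a shared drift from producing a correlation by itself.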
Our main focus was on the correlation between each of the observed supramodal components (theta/gamma/sustained ERF) and the sensory-specific modulation in 8- to 30-Hz power. We evaluated these correlations separately for power in the alpha and beta bands and did this once for all channels and once with regard to the average power in the participant-specific visual and somatosensory ROIs. We also evaluated this correlation for a combined predictor that was obtained by z-normalizing across trials the strengths of each of the supramodal predictors (i.e., subtracting the mean and dividing by the standard deviation) and summing their trial-specific z values. We focused on the difference in this correlation between visual and tactile working memory trials, and we did so for two reasons: (1) to zoom in on our hypothesis that the same supramodal component correlates with activity in different sensory areas as a function of what is currently held in memory and (2) to increase sensitivity by subtracting out sources of common variance that are unrelated to the working memory task (fluctuations in arousal, head movements, etc.).
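The combined predictor can be sketched in Python as follows. The function names and values are made up, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, because the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def zscore(values):
    """Subtract the mean and divide by the (sample) standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def combined_predictor(theta, gamma, erf):
    """Sum of the trial-specific z values of the three supramodal
    components (one value per trial)."""
    return [sum(z) for z in zip(zscore(theta), zscore(gamma), zscore(erf))]


# Made-up trial-wise strengths for three trials
combined = combined_predictor([1.0, 2.0, 3.0],
                              [2.0, 4.0, 6.0],
                              [3.0, 2.0, 1.0])
```

Normalizing each component before summing puts the three predictors on a common scale, so that none dominates the sum merely because of its units.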
Working Memory Tasks and Performance
Sixteen healthy human volunteers performed an intermodal working memory task with simultaneously presented visual and tactile sequences (Figure 1A) while their electrophysiological brain activity was recorded using MEG. Visual sequences consisted of small squares that lit up around the fixation cross, and tactile sequences consisted of taps to different fingers across both hands. In different blocks, participants reproduced either the visual or tactile items after a 4-sec retention interval (Figure 1A and B). Working memory load and sequencing were varied by instructing participants to reproduce either the first two (Load 2) or all four (Load 4) items of the sequence (in the presented order) or all four items while neglecting the order (Load 4*; Figure 1B). A subset of the trials served as non-working-memory control trials in which the fixation cross turned red, and participants were instructed not to retain the items (see Methods for further details).
Participants performed both the tactile and visual tasks well above chance level: on average, 83.3 ± 1.7% (mean ± 1 SE) of all items were correctly reproduced (Figure 1C). An ANOVA further revealed that performance was higher for visual than tactile items (main effect of Modality: F(1, 15) = 51.91, p = 3.05e−6, ηp2 = 0.78) and that performance was lower with Load 4 than with Loads 2 and 4* (main effect of Load: F(2, 30) = 78.61, p = 1.8e−12, ηp2 = 0.84; Load 4 vs. Load 2: t(15) = −10.99, p = 1.43e−8, d = −2.75, 95% CI [−15.31, −10.34]; Load 4 vs. Load 4*: t(15) = −9.75, p = 6.93e−8, d = −2.44, 95% CI [−15.25, −9.78]), whereas Loads 2 and 4* did not differ (Load 4* vs. Load 2: t(15) = −0.29, p = .78, d = −0.07, 95% CI [−2.6, 1.98]). The performance data also confirmed that participants disregarded the sequence order in the Load 4* condition: whereas 89.4 ± 1.2% of the items were correctly reported in this condition, only 27.4 ± 0.6% of the items were correctly reported in the presented order (see inserted error bars in Figure 1C).
Tactile and Visual Working Memory Engage Early Sensory Areas, as Reflected in the Sustained Suppression of Alpha and Beta Oscillations
Figure 2A shows the topographical maps of working-memory-related changes in power in the alpha (8–12 Hz) and beta (13–30 Hz) frequency bands, estimated over the entire 4-sec retention interval and collapsed across load conditions. Relative to control trials, a marked suppression of power is observed during both tactile and visual working memory. Critically, this suppression involves distinct topographies: During tactile working memory retention (leftmost topographies), power is predominantly suppressed in central sites, whereas during visual working memory retention (middle topographies), this is the case in posterior sites. Indeed, the direct comparison between tactile and visual working memory (rightmost topographies) confirms a clear separation between power modulations in central and posterior sites, with power being lower in the central sites during tactile working memory (blue in the depicted contrast) and lower in posterior sites during visual working memory (red in the depicted contrast). This is similarly evident in both the alpha (top row) and beta (bottom row) frequency bands, and for this reason, we consider them jointly (i.e., 8–30 Hz) in subsequent analyses.
We next studied the sources and statistical significance of these modulations. For this, we calculated the same contrast between tactile and visual working memory for reconstructed source power and statistically evaluated this difference across all voxels using a cluster-based permutation analysis (see Methods for details). This confirmed two significant clusters (Figure 2B): one sensorimotor cluster encompassing left and right primary somatosensory areas (cluster p = .001) and one occipital cluster encompassing left and right early visual areas (cluster p = .007). Although cluster level p values cannot be used for spatially specific statistical inference (Maris, 2012), this result strongly suggests that, during the retention of tactile items, 8- to 30-Hz power is more suppressed in primary sensorimotor areas, whereas during the retention of visual items, 8- to 30-Hz power is more suppressed in early visual areas (see also Figure 6A).
We next characterized the time–frequency profile of the sensory-specific power modulations in the somatosensory and visual areas using an ROI approach. Somatosensory and visual ROIs were each extracted from an independent localizer (see Methods for details). For both ROIs, we contrasted trials in which the items in working memory were relevant or irrelevant to the ROI. As depicted in Figure 2C, for both ROIs collapsed (left) as well as for each ROI separately (center and right), this confirmed that the modality-specific power modulations are most pronounced in the 8- to 30-Hz band, are largely sustained throughout the 4-sec retention interval, and are highly similar in each ROI. Although the depicted contrasts also suggest that the beta suppression diminishes toward the end of the retention period, complementary contrasts with the control condition showed that, instead, this modulation does persist but becomes less modality specific (Figure 2D).
Finally, we investigated to what extent the identified power modulations in the sensory areas depend on working memory load. As depicted in Figure 2E, in both the somatosensory and visual ROIs, power was more suppressed with higher load, but only when the items in working memory were relevant to the ROI (i.e., tactile working memory for the somatosensory ROI and visual working memory for the visual ROI). This was confirmed by a significant three-way interaction between the factors ROI, Modality, and Load (F(2, 30) = 36.75, p = 8e−8, ηp2 = 0.71). Breaking this down for power modulations in the somatosensory ROI during tactile working memory, 8- to 30-Hz power is more suppressed with Loads 4 and 4*, compared with Load 2 (Load 4 vs. Load 2: t(15) = −5.84, p = 3.24e−5, d = −1.46, 95% CI [−14.1, −6.55]; Load 4* vs. Load 2: t(15) = −7.63, p = 1.53e−6, d = −1.91, 95% CI [−14.66, −8.26]), whereas Loads 4 and 4* are not significantly different (t(15) = 1.55, p = .14, d = 0.39, 95% CI [−0.43, 2.71]). The same pattern of load dependence occurs in the visual ROI during visual working memory (Load 4 vs. Load 2: t(15) = −6.55, p = 9.23e−6, d = −1.64, 95% CI [−15.18, −7.73]; Load 4* vs. Load 2: t(15) = −8.21, p = 6.27e−7, d = −2.05, 95% CI [−16.76, −9.85]; Load 4 vs. Load 4*: t(15) = 1.57, p = .14, d = 0.39, 95% CI [−0.66, 4.36]).
In summary, the tactile and visual working memory tasks engaged, respectively, early somatosensory and visual areas, and this covert sensory recruitment is reflected in the sustained suppression of 8- to 30-Hz oscillations, which scales with the number of items held in working memory.
Frontal and Parietal Theta and Gamma Synchronization, as well as a Sustained ERF, Reflect Three Supramodal Substrates of Working Memory
Theta Oscillations (4–7 Hz)
We next analyzed working-memory-related modulations in the theta band by comparing working memory with control trials with regard to power in the 4- to 7-Hz theta band (see Methods for frequency band justification). In contrast to the 8- to 30-Hz band, this revealed a very different picture (Figure 3). Although we also observed prominent modulations in this lower-frequency band, this time, their topographical maps were highly similar during tactile and visual working memory (see also Figure 6B). In both cases, theta power is elevated in the same set of frontal sites (Figure 3A, top). At the source level, this is reflected in a significant cluster (cluster p = .031) that encompasses medial prefrontal areas (Figure 3A, bottom; Figure 6B). The time–frequency profile extracted from this prefrontal source cluster (Figure 3B) confirms that this modulation occurs in the classical 4- to 7-Hz theta band and that it is largely specific to the retention interval. Finally, when comparing the different load conditions with regard to this prefrontal theta source (which was found on the basis of all load conditions collapsed), we observed a main effect of Load (F(2, 30) = 6.45, p = .005, ηp2 = 0.30) that is constituted by the fact that, during both tactile and visual working memory, Loads 4 and 4* show larger increases in power than Load 2 (tactile, Load 4 vs. Load 2: t(15) = 2.52, p = .024, d = 0.63, 95% CI [0.71, 8.47]; Load 4* vs. Load 2: t(15) = 2.42, p = .029, d = 0.60, 95% CI [0.66, 10.48]; visual, Load 4 vs. Load 2: t(15) = 2.38, p = .031, d = 0.60, 95% CI [0.90, 16.32]; Load 4* vs. Load 2: t(15) = 2.37, p = .03, d = 0.60, 95% CI [0.6, 8.5]), whereas Loads 4 and 4* do not significantly differ (tactile: t(15) = −0.79, p = .44, d = −0.2, 95% CI [−3.63, 1.68]; visual: t(15) = 1.99, p = .07, d = 0.5, 95% CI [−0.30, 8.55]).
Thus, similar to the 8- to 30-Hz suppression in the relevant sensory ROIs, this domain-general theta component also scales with the number of items in working memory, independent of whether they need to be retained in their original sequence order.
Gamma Oscillations (55–75 Hz)
The same analyses for gamma power (55–75 Hz; see Methods for frequency band justification) revealed another supramodal component in midfrontal sites (Figure 4A, top; note that, for these topographies, we plotted group level t values, taking advantage of the fact that t value maps down-weight unreliable effects, which are much more pronounced in this higher-frequency band). Source analysis yielded a significant source-level cluster (cluster p = .022) that centers on medial frontal areas and extends to medial parietal areas (Figure 4A, bottom; see Figure 6C). As for the identified theta power increase, this gamma power increase is also band limited and is sustained throughout the retention interval (Figure 4B). Moreover, as depicted in Figure 4C, this increase in medial frontoparietal gamma power scales with load (main effect of Load: F(2, 30) = 22.59, p = 1.04e−6, ηp2 = 0.60). In fact, it was only apparent with Loads 4 and 4*. As for the theta modulation, during both tactile and visual working memory, power is higher with Loads 4 and 4* than with Load 2, whereas Loads 4 and 4* do not differ (tactile, Load 4 vs. Load 2: t(15) = 4.94, p = 1.79e−4, d = 1.23, 95% CI [2.10, 5.27]; Load 4* vs. Load 2: t(15) = 4.88, p = 1.99e−4, d = 1.22, 95% CI [1.91, 4.87]; Load 4 vs. Load 4*: t(15) = 0.5, p = .622, d = 0.13, 95% CI [−0.95, 1.54]; visual, Load 4 vs. Load 2: t(15) = 4.12, p = 8.87e−4, d = 1.03, 95% CI [1.49, 4.68]; Load 4* vs. Load 2: t(15) = 4.16, p = 8.30e−4, d = 1.04, 95% CI [1.56, 4.83]; Load 4 vs. Load 4*: t(15) = −0.19, p = .855, d = −0.047, 95% CI [−1.37, 1.15]).
Apart from their distinct spectral content and spatial localization, the functional properties of the identified theta and gamma modulations thus appear strikingly similar. We did, however, note one key difference: Whereas the increase in theta power is largely restricted to the retention interval (Figure 3B), the increase in gamma power already becomes prominent during sequence encoding (Figure 4B).
We next investigated the average strength of the sustained magnetic field during the retention interval (i.e., the “DC component” of the signal). The topographical maps of this component (average field strength in working memory trials minus average field strength in control trials) are depicted in Figure 5A. Note that, in contrast to the power modulations described before, we here depict axial gradiometer signals, because these retain information on the precise pattern of magnetic outflux and influx (in red and blue, respectively). These effect topographies too are highly similar during tactile and visual memory and suggest a central dipolar source. Despite this promising sensor level topography, we did not succeed in producing a convincing (localized) source reconstruction using the beamformer methodology (Van Veen et al., 1997), which we used in the other analyses. However, using an older and much simpler source reconstruction methodology (signal subspace projection; see Methods for details), we identified a prominent source in superior parietal cortex that extends to medial frontal areas as well (Figure 5B; see also Figure 6D).
We did note vast differences in the effect topographies between participants. For example, whereas the pattern of outflux and influx is relatively similar to the grand average for Participant 1 in Figure 5A, it appears reversed for Participant 2. Yet, also for Participant 2, we observed reproducible effect topographies when moving from tactile to visual working memory. This intermodal reproducibility was confirmed at the group level: Although effect topographies between tactile and visual working memory are based on two fully independent sets of data, they are, on average, highly correlated (r = .51 ± .08, t(15) = 6.561, p = 9.02e−6).
To better deal with these vast interindividual differences, we reconstructed the time courses of this sustained ERF component on the basis of participant-specific effect topographies (rather than on the basis of a group level source cluster, as for the theta and gamma components). To avoid statistical bias, we used a leave-one-out approach in which each trial's time course was reconstructed using a spatial filter that was based on the effect topography of all other trials (see Methods for details). Reconstructed component time courses are depicted in Figure 5C. Relative to control trials, during both tactile (top) and visual (bottom) working memory, field strength increases over the encoding interval and remains elevated throughout the retention interval. Moreover, this elevation increases with higher load (Figure 5C and D). This was confirmed by a significant main effect of Load (F(2, 30) = 5.16, p = .012, ηp2 = 0.26) with follow-up t tests revealing the same pattern of load dependence as described for the theta and gamma components. Only a single exception was observed: The contrast between Loads 4 and 2 during tactile working memory did not reach significance (tactile, Load 4 vs. Load 2: t(15) = 2.06, p = .058, d = 0.51, 95% CI [−0.01, 0.45]); all the other contrasts were as before (tactile, Load 4* vs. Load 2: t(15) = 3.22, p = .006, d = 0.81, 95% CI [0.09, 0.44]*e−12; Load 4 vs. Load 4*: t(15) = −0.88, p = .391, d = −0.22, 95% CI [−0.16, 0.06]*e−12; visual, Load 4 vs. Load 2: t(15) = 2.196, p = .044, d = 0.55, 95% CI [0.01, 0.53]*e−12; Load 4* vs. Load 2: t(15) = 3.298, p = .005, d = 0.82, 95% CI [0.06, 0.29]*e−12; Load 4 vs. Load 4*: t(15) = 0.85, p = .409, d = 0.21, 95% CI [−0.14, 0.33]*e−12). Interestingly, we also noted a further increase in the strength of this component during the report (Figure 5C), where it also becomes evident in control trials (which also require a report; see Methods).
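The logic of the leave-one-out reconstruction can be illustrated with a minimal sketch. It rests on assumptions that the text does not confirm: the spatial filter is taken to be simply the unit-norm effect topography (mean of all other working memory trials minus the control mean), and the function name, array shapes, and toy data are hypothetical. The filter actually used is described in the Methods and may differ.

```python
import numpy as np

def leave_one_out_time_courses(wm_trials, control_mean):
    """Project each trial onto a topography estimated from all other trials.

    wm_trials:    shape (n_trials, n_sensors, n_times)
    control_mean: shape (n_sensors,), mean field in control trials
    Returns component time courses of shape (n_trials, n_times).
    """
    n_trials, _, n_times = wm_trials.shape
    trial_topos = wm_trials.mean(axis=2)   # time-averaged field per trial
    topo_sum = trial_topos.sum(axis=0)
    out = np.empty((n_trials, n_times))
    for i in range(n_trials):
        # Effect topography from all trials except trial i (avoids circularity)
        effect = (topo_sum - trial_topos[i]) / (n_trials - 1) - control_mean
        weights = effect / np.linalg.norm(effect)  # assumed unit-norm filter
        out[i] = weights @ wm_trials[i]
    return out

# Toy data (hypothetical): 3 identical trials, 2 sensors, 2 time points.
pattern = np.array([3.0, 4.0])
wm = np.tile(pattern[None, :, None], (3, 1, 2))
time_courses = leave_one_out_time_courses(wm, np.zeros(2))
print(time_courses)
```

Because the filter for a given trial never includes that trial's own data, noise in the trial cannot inflate its reconstructed component strength.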
In summary, we have identified and characterized three further electrophysiological signatures of working memory that each also scale with the number of items in working memory and that, unlike the identified sensory modulations, are similarly engaged during both tactile and visual working memory (see also Figure 6).
Supramodal Theta, Gamma, and Sustained ERF Components Flexibly Correlate with Somatosensory or Visual Activity, Depending on Working Memory Content
Having identified a robust sensory-specific suppression of 8- to 30-Hz power alongside three supramodal components that are enhanced during working memory, we turned to our central hypothesis that the same supramodal component can flexibly correlate with activity in distinct sensory areas, depending on what information is kept in memory. To this end, we employed (between-frequency) power–power correlations (see also, e.g., Hipp, Hawellek, Corbetta, Siegel, & Engel, 2012; Mazaheri, Nieuwenhuis, van Dijk, & Jensen, 2009). Specifically, we calculated the strength of each of the supramodal components as a single number per trial. These trial-specific quantifications of the three supramodal components were then correlated across trials with the alpha and beta power in each recording site (see Methods for details). We did this separately for tactile and visual working memory trials. By comparing the resulting topographical correlation maps between tactile and visual working memory, we could evaluate our central hypothesis, while at the same time correcting the topographical correlation maps for sources of variance that are common to both tasks, such as fluctuations in arousal. Given that the sensory-specific component involves a decrease in power in the relevant sensory area, we hypothesized more negative correlations in somatosensory sites during tactile working memory and in visual sites during visual working memory.
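The across-trial correlation contrast described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis pipeline itself: Pearson correlation is assumed (the correlation measure is specified in the Methods), the data are synthetic, and the function and variable names are hypothetical.

```python
import numpy as np

def correlation_map(predictor, power):
    """Pearson r between a per-trial predictor and power at every sensor.

    predictor: shape (n_trials,); power: shape (n_trials, n_sensors).
    """
    p = predictor - predictor.mean()
    q = power - power.mean(axis=0)
    return (p @ q) / np.sqrt((p @ p) * (q * q).sum(axis=0))

# Synthetic stand-in data: sensors 0-1 play the role of somatosensory sites,
# sensors 2-3 the role of visual sites.
rng = np.random.default_rng(0)
n_trials, n_sensors = 500, 4

pred_tactile = rng.standard_normal(n_trials)
power_tactile = rng.standard_normal((n_trials, n_sensors))
power_tactile[:, :2] -= 0.8 * pred_tactile[:, None]  # suppression at "somatosensory" sites

pred_visual = rng.standard_normal(n_trials)
power_visual = rng.standard_normal((n_trials, n_sensors))
power_visual[:, 2:] -= 0.8 * pred_visual[:, None]    # suppression at "visual" sites

# Tactile-minus-visual contrast of the two correlation maps.
contrast = (correlation_map(pred_tactile, power_tactile)
            - correlation_map(pred_visual, power_visual))
print(np.round(contrast, 2))
```

In this toy example, the contrast is negative at the "somatosensory" sensors and positive at the "visual" sensors, which is the pattern the hypothesis predicts. Variance shared by both tasks would affect both maps equally and cancel in the subtraction.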
For each of the extracted supramodal predictors (theta, gamma, sustained ERF, and a combined predictor; see Methods for details), Figure 7A shows the degree to which their trial-wise correlation with alpha and beta power in all sites was different depending on whether somatosensory or visual information was retained. During tactile working memory, each of the supramodal predictors correlates more negatively with power in central sites (blue in the depicted contrast), whereas during visual working memory, the same predictors correlate more negatively with power in posterior sites (red in the depicted contrast)—akin to the power modulation topographies in Figure 2A. This is the case for power in both the alpha (top row) and beta (bottom row) bands and, as expected, is most clear when the predictors are combined (rightmost topographies). Thus, for each of the supramodal components, when this component is more pronounced, the suppression of 8- to 30-Hz power in the relevant sensory area is also more pronounced. In other words, the same supramodal components predict the suppression of 8- to 30-Hz power in distinct sensory areas, depending on whether tactile or visual items are retained.
Because all analyzed components followed the same dependence on load (being more pronounced during Loads 4 and 4* compared with Load 2), the observed pattern of correlation may be due to load-induced differences alone. However, if these supramodal components truly regulate activity in these sensory areas, then the same pattern of correlation should also be manifested across trials within each of the load conditions. The topographic maps in Figure 7C show exactly this.
We now turn to the statistical evaluation of this pattern of interest. For this, we again made use of the ROI approach in which, for both the somatosensory and visual ROIs, we contrasted the correlations between the conditions in which the items in working memory were relevant versus irrelevant to the ROI (cf. Figure 2C). To increase sensitivity, we averaged this outcome measure across the somatosensory and visual ROIs (importantly, this does not hamper the interpretation of this statistical test because the topographical maps [Figure 7A and C] reveal largely symmetrical effects between visual and somatosensory sites). This analysis confirmed that, for each supramodal predictor, correlations are more negative when the items in working memory are relevant to the ROI (Figure 7B and D). Restricting ourselves to the more meaningful within-load-condition correlations, we obtained the following (Figure 7D): For the sustained ERF and the combined predictors, this pattern is significant with regard to the power in the alpha and beta bands (sERF-alpha: t(15) = −2.45, p = .027, d = −0.61, 95% CI [−0.12, −0.01]; sERF-beta: t(15) = −2.36, p = .031, d = −0.59, 95% CI [−0.11, −0.06]; combined-alpha: t(15) = −2.48, p = .025, d = −0.62, 95% CI [−0.15, −0.01]; combined-beta: t(15) = −3.56, p = .003, d = −0.89, 95% CI [−0.14, −0.03]); for the theta predictor, this only reaches significance for beta power (theta-alpha: t(15) = −1.64, p = .106, d = −0.41, 95% CI [−0.12, 0.02]; theta-beta: t(15) = −2.60, p = .02, d = −0.65, 95% CI [−0.10, −0.01]); and for the gamma predictor, this is only the case for the alpha power (gamma-alpha: t(15) = −2.43, p = .028, d = −0.61, 95% CI [−0.12, −0.01]; gamma-beta: t(15) = −1.22, p = .18, d = −0.30, 95% CI [−0.09, 0.03]).
The Different Supramodal Components Have a Largely Unique Contribution in Predicting the Sensory-specific Modulation of Alpha and Beta Oscillations
Given that all supramodal components show the same modulation of their topographic correlation maps (see Figure 7), it is natural to ask whether this might be due to the correlation between these supramodal components. To investigate this, we evaluated the unique contributions of the different supramodal components to the across-trial (within load condition) correlations described above. To this end, we performed a partial correlation analysis in which, for each of the supramodal predictors, the contributions of either or both of the other supramodal predictors were partialed out. As depicted in Figure 8, the predictive power of each of the supramodal components was largely independent of the other supramodal components. In fact, for none of the observed effects did we find a significant reduction when either or both of the other two components were partialed out (all ps > .25), and for two of three supramodal predictors, the effect of interest remained significant after partialing out (only for the sustained ERF predictor, this was not the case). This result is further substantiated by the observation that the across-trial correlations between the different supramodal predictors, although positive, were generally low. In fact, this correlation only reached significance between the theta and sustained ERF components (theta-sERF: r = .07 ± .02, t(15) = 3.16, p = .006; theta-gamma: r = .01 ± .01, t(15) = 0.05, p = .59; gamma-sERF: r = .02 ± .02, t(15) = 0.90, p = .38).
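For readers unfamiliar with partialing out, a minimal sketch of a partial correlation via regression residuals is given below. The implementation, variable names, and synthetic data are assumptions for illustration; the exact procedure used here is described in the Methods.

```python
import numpy as np

def partial_correlation(x, y, covariates):
    """Correlation between x and y after regressing out the covariates.

    x, y: shape (n_trials,); covariates: shape (n_trials, n_covariates).
    """
    design = np.column_stack([np.ones(len(x)), covariates])  # intercept + covariates
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

# Synthetic example: a predictor and sensory power that are related only
# through a shared third variable (e.g., another supramodal predictor).
rng = np.random.default_rng(1)
n_trials = 5000
shared = rng.standard_normal(n_trials)
predictor = shared + rng.standard_normal(n_trials)
sensory_power = shared + rng.standard_normal(n_trials)

raw_r = np.corrcoef(predictor, sensory_power)[0, 1]
partial_r = partial_correlation(predictor, sensory_power, shared[:, None])
print(round(raw_r, 2), round(partial_r, 2))
```

In this toy case, the raw correlation is entirely attributable to the shared variable and therefore vanishes after partialing out. The results reported above show the opposite outcome: the correlations were not significantly reduced, which indicates largely unique contributions per component.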
We investigated the neurophysiological bases of working memory in humans and evaluated the hypothesis that the same frontoparietal substrates of working memory can flexibly regulate activity in distinct sensory areas depending on the content of working memory. This hypothesis is akin to the notion of central executive control over information that is held in one of several available sketchpads (see, e.g., Baddeley, 1992, 2012). On the basis of MEG, we identified and characterized the neuronal dynamics of three such supramodal control signatures (enhanced frontoparietal theta and gamma synchronization, as well as sustained ERFs), alongside a robust sensory-specific signature (the suppression of alpha and beta oscillations in the relevant sensory area). The key finding is that each of these domain-general components flexibly correlates with activity in either the visual or somatosensory area, depending on the contents of working memory.
Main Advances of the Current Study
To appreciate the relevance of the present results, it is important to first note that several of the components observed in this study have already been reported and discussed in relation to working memory, such as frontal theta (Hsieh, Ekstrom, & Ranganath, 2011; Jensen & Tesche, 2002; Raghavachari et al., 2001; Gevins, Smith, McEvoy, & Yu, 1997) and gamma (Roux et al., 2012) synchronization as well as the modulation of sensory-specific alpha and beta oscillations (van Ede, Niklaus, & Nobre, 2016; Lozano-Soldevilla, ter Huurne, Cools, & Jensen, 2014; Spitzer & Blankenburg, 2012; Sauseng et al., 2009; see also Roux & Uhlhaas, 2014, for a review). To date, however, these different components have typically been studied in isolation, leaving open to what extent they are task and/or modality specific and whether and how they are interrelated. By evaluating this diverse set of electrophysiological signatures of working memory within a single experiment, this work makes four key advances.
First, although it is widely believed that cognitive control over currently relevant representations is mediated by multiple frontal and parietal brain areas, evidence for this in humans comes predominantly from fMRI studies (e.g., Dosenbach, Fair, Cohen, Schlaggar, & Petersen, 2008; Rissman, Gazzaley, & D'Esposito, 2008; Gazzaley et al., 2004; Corbetta & Shulman, 2002). Our results reveal several neurophysiological processes involved in this control and demonstrate that these are highly multifaceted. Specifically, (1) in medial prefrontal regions, we observed theta oscillations that were largely restricted to the retention interval; (2) in medial premotor and parietal areas, we observed gamma oscillations that were most prominent during the transition from encoding to retention; and (3) in superior parietal areas, we observed a sustained ERF component that was prominent during both the retention interval and the report. These different temporal and spectral profiles could not have been identified using fMRI. An important target for future research will be to address how these different electrophysiological signatures relate to the more commonly observed frontoparietal fMRI activations.
Second, whereas the modulation of sensory alpha and beta oscillations has become a highly popular index of attentional gating (van Ede, de Lange, & Maris, 2012; Foxe & Snyder, 2011; Haegens, Händel, & Jensen, 2011; van Ede et al., 2011; Jensen & Mazaheri, 2010; Thut, Nietzel, Brandt, & Pascual-Leone, 2006; Worden, Foxe, Wang, & Simpson, 2000), the electrophysiological processes that control these modulations in sensory areas remain largely elusive. Although combined TMS–EEG and EEG–fMRI studies have revealed important insights into their control—demonstrating, for example, a causal involvement of the FEF (Marshall, O'Shea, Jensen, & Bergmann, 2015; Capotosto, Babiloni, Romani, & Corbetta, 2009)—they have been blind to the neurophysiological processes reflecting this control. Our data suggest at least three such processes. Of course, because our analysis is only correlational, we must be aware of alternative scenarios that could also explain our results. For example, the domain-general components could be involved in monitoring the output of the sensory areas, rather than controlling their activity in a top–down fashion.
Third, whereas frontoparietal engagement is often considered to be domain general, this is not often directly shown. Here, we have shown this for the theta, gamma, and sustained ERF modulations. Still, it cannot be ruled out that these components might involve modality-specific subpopulations/networks that may only be resolved at a finer spatial scale (cf. Chambers, Stokes, & Mattingley, 2004).
Finally, we show that each of the observed domain-general components has a largely unique contribution in predicting activity modulations in sensory areas. This is surprising, as it suggests a set of relatively autonomous, complementary processes that independently regulate sensory activity. It should be noted, however, that at least part of this apparent independence may be driven by the fact that the different spectral components may be susceptible to different sources of noise.
Relating the Sustained ERF Component to Other Sustained Components
In contrast to the alpha, beta, theta, and gamma components, the sustained parietal ERF component reflects a signature that, to our knowledge, has not yet been reported in relation to working memory. Clearly, because this component was similarly present during tactile and visual working memory, it must be distinguished from other sustained working memory components that are characterized by their content/location specificity (e.g., Khader, Ranganath, Seemüller, & Rösler, 2007; Vogel & Machizawa, 2004). What then might this component relate to? Intracranial recordings in the hippocampus previously revealed a sustained ERP component that also depended on working memory load (Axmacher et al., 2007). Whether and how this hippocampal component is related to the parietal component observed here remains an interesting target for future research. Similarly, it would be interesting to know what the counterpart(s) of this MEG component are in the EEG literature, where sustained components have been given more attention. One possibility is that this working memory-related component is related to the sustained frontoparietal EEG component that has been linked to attentional deployment (Grent-'t-Jong & Woldorff, 2007). A direct test of this hypothesis would require a combined EEG–MEG experiment.
No Robust Signature of Sequencing Operations in Working Memory
Surprisingly, all four neurophysiological signatures of working memory observed in the current study scaled with the number of items in memory but remained independent of whether the items were retained in sequence. Moreover, a direct comparison of the Load 4 and 4* conditions across the full dataspace did not reveal any robust difference either (results not shown). At least for the medial prefrontal theta oscillations, this observation appears at odds with prior empirical (Roberts, Hsieh, & Ranganath, 2013; Hsieh et al., 2011) and theoretical (Roux & Uhlhaas, 2014) work, arguing for a particular role of these oscillations in retaining order information in working memory. Our data suggest that this notion might require revisiting. However, despite the fact that our behavioral data strongly suggested that our participants dropped the order in the Load 4* condition, it cannot be excluded that the sequence order may have only been dropped at the response stage.
Considering Alternative Cognitive Operations
It is conceivable that, during retention, participants were engaged in probe (reproduction display) and/or response anticipation, in addition to working memory retention. Could our results reflect such processes instead? Probe anticipation as a potential confound is particularly relevant with regard to the observed modulations of sensory alpha and beta oscillations, because these modulations are known to also index the allocation of preparatory attention for an upcoming stimulus (e.g., van Ede et al., 2011, 2012; Foxe & Snyder, 2011; Haegens et al., 2011; Thut et al., 2006; Worden et al., 2000). This potential confound can be ruled out because, in our study, the alpha and beta band modulations were highly dependent on the modality of the retained memoranda, whereas the probes were always presented visually. In further support of a mnemonic function for these alpha and beta modulations, we also note that previous studies have linked these modulations in the visual modality to working memory capacity (Fukuda, Mance, & Vogel, 2015), to content-specific working memory representations (Fukuda, Kang, & Woodman, 2016; Foster, Sutterer, Serences, Vogel, & Awh, 2015), and to impairments in working memory performance with neuropsychiatric conditions (Erickson, Albrecht, Robinson, Luck, & Gold, in press).
Response preparation can also be ruled out as a confound because, during the working memory retention interval, the starting position of the response cursor was unknown to the participants. Thus, what action was required to reach the first item for reproduction would only become clear with the initial reproduction display. Therefore, no specific motor program could be prepared during the retention interval. Of course, this does not rule out more general forms of motor preparation. Because we anticipated this, we required participants to also make a motor response in the control trials, thus equating the trials with respect to these more general forms of motor preparation.
Another potential concern regards ocular artifacts. This is particularly a concern regarding the observed theta modulation, given its low-frequency nature and frontal topography. However, if eye artifacts were to account for this, one would probably expect to see this modulation in a broader frequency range (because of the sharp transients in ocular artifacts) and to be strongest during encoding. In contrast, the observed modulation was specific to the narrow theta band and was highly specific to the retention interval. Moreover, one would probably also expect eye movements to be more prevalent during visual as compared with tactile working memory retention. Yet, if anything, the frontal theta modulation was slightly larger during tactile retention (Figure 3A and C).
Excluding probe and response anticipation accounts does not imply that the phenomena reported here are necessarily exclusive to working memory: They may still reflect general processes of cognitive control that are also engaged in other tasks. In fact, in the context of a sustained attention task, we previously observed similar sustained ERF and medial frontal gamma components (data from van Ede et al., 2014; unreported observations), and it is noteworthy that a very similar gamma source has also been reported during response competition (Grent-'t-Jong, Oostenveld, Jensen, Medendorp, & Praamstra, 2013) and long-term memory encoding (Meeuwissen, Takashima, Fernández, & Jensen, 2011). Analogously, medial frontal theta oscillations have also been linked to additional cognitive operations, such as long-term memory encoding and retrieval (as reviewed in Hsieh & Ranganath, 2014).
The present work has revealed that the neural dynamics of top–down control are highly multifaceted, demonstrating at least three distinct and complementary neurophysiological processes that flexibly predict retention-related activity in the relevant sensory areas (i.e., the sensory areas that correspond to the information that is currently kept in working memory).
This work was supported by a Newton International Fellowship from the Royal Society and the British Academy (NF140330) to F. v. E.
Reprint requests should be sent to Eric Maris, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Montessorilaan 3, 6525 HR Nijmegen, The Netherlands, or via e-mail: email@example.com.