Working memory (WM) is limited in capacity, but it is controversial whether these capacity limitations are domain-general or are generated independently within separate modality-specific memory systems. These alternative accounts were tested in bimodal visual/tactile WM tasks. In Experiment 1, participants memorized the locations of simultaneously presented task-relevant visual and tactile stimuli. Visual and tactile WM load was manipulated independently (one, two, or three items per modality), and one modality was unpredictably tested after each trial. To track the activation of visual and tactile WM representations during the retention interval, the visual contralateral delay activity (CDA) and tactile CDA (tCDA) were measured over visual and somatosensory cortex, respectively. CDA and tCDA amplitudes were selectively affected by WM load in the corresponding (visual or tactile) modality. The CDA parametrically increased when visual load increased from one to two and to three items. The tCDA was enhanced when tactile load increased from one to two items and showed no further enhancement for three tactile items. Critically, these load effects were strictly modality-specific, as substantiated by Bayesian statistics. Increasing tactile load did not affect the visual CDA, and increasing visual load did not modulate the tCDA. Task performance at memory test was also unaffected by WM load in the other (untested) modality. This was confirmed in a second behavioral experiment where tactile and visual loads were either two or four items, unimodal baseline conditions were included, and participants performed a color change detection task in the visual modality. These results show that WM capacity is not limited by a domain-general mechanism that operates across sensory modalities. They suggest instead that WM storage is mediated by distributed modality-specific control mechanisms that are activated independently and in parallel during multisensory WM.
Working memory (WM) refers to the ability to memorize stimuli over brief periods of time. The most notable feature of WM is its limited capacity, as only three to four items can be successfully maintained in WM (Vogel & Machizawa, 2004; Cowan, 2001). The reasons for these capacity limitations are still under dispute. They may either arise at a central domain-unspecific level or may be generated independently within separate domain-specific storage systems that represent a particular type of information (e.g., visual, auditory, or tactile items). The domain-unspecific account assumes that the limited capacity of WM reflects the limited availability of an attention resource that is shared across sensory modalities and/or the existence of a central storage system (Cowan, 2011). In this case, the same capacity limitations would apply regardless of whether memorized items have been encoded through the same modality or through different modalities. Alternatively, if the maintenance of items from different modalities is mediated by distributed processes that operate independently at peripheral modality-specific levels (Tamber-Rosenau & Marois, 2016), WM capacity limitations should occur within—but not across—sensory modalities.
The question whether WM capacity limits arise at domain-general or domain-specific levels can be tested in multimodal dual-task experiments, where participants simultaneously memorize sets of stimuli from different modalities (e.g., visual and auditory items) and dual-task interference (i.e., performance decrements in one modality due to WM load increments in another modality) is measured. Crossmodal interference effects were found in numerous auditory–visual experiments (Cowan, Saults, & Blume, 2014; Salmela, Moisala, & Alho, 2014; Fougnie & Marois, 2011; Saults & Cowan, 2007; Morey & Cowan, 2005; Cocchini, Logie, Della Sala, MacPherson, & Baddeley, 2002), but the theoretical implications of such effects remain disputed. Some authors have interpreted interference as evidence for a WM store and/or attention mechanism that is shared across sensory modalities (Cowan et al., 2014; Cowan, 2010, 2011; Saults & Cowan, 2007). Others assume that interference in multimodal WM tasks does not reflect a cognitive bottleneck that is specific to WM storage but instead results from general dual-task coordination costs (e.g., Cocchini et al., 2002). The amount of interference between items from different modalities also varies considerably across previous studies. Experiments that found strong interference led to the conclusion that WM maintenance is mediated by a central mechanism (Saults & Cowan, 2007), whereas studies that only found weak interference (Cocchini et al., 2002) or no interference at all (Fougnie, Zughni, Godwin, & Marois, 2015) suggest that WM maintenance relies on processes that are inherently modality-specific. A third possibility is that WM capacity is constrained by both central and modality-specific mechanisms (Cowan et al., 2014; Fougnie & Marois, 2011).
Evidence that modality-specific mechanisms underpin WM maintenance comes from neuroimaging studies showing that stimulus representations are stored in the same cortical areas that have encoded these stimuli into WM (“sensory recruitment hypothesis”; Emrich, Riggall, LaRocque, & Postle, 2013; Jonides, Lacey, & Nee, 2005; Pasternak & Greenlee, 2005). Modality-specific sources of WM capacity limits were identified by studies that predicted visual WM capacity based on the size of primary visual cortex (Bergmann, Genç, Kohler, Singer, & Pearson, 2016) or on the amplitude of the contralateral delay activity (CDA; e.g., McCollough, Machizawa, & Vogel, 2007; Vogel & Machizawa, 2004) over visual cortex. The CDA component emerges in the EEG over posterior visual areas during the retention period of lateralized visual WM tasks. The somatosensory analogue of the CDA has recently been identified in tactile WM experiments (Katus & Müller, 2016; Katus & Eimer, 2015; Katus, Müller, & Eimer, 2015). During the maintenance of lateralized tactile stimuli, a tactile CDA (tCDA) component is elicited with a topographical distribution over somatosensory cortex. Thus, the CDA and tCDA reflect the activation of WM representations in modality-specific visual and somatosensory cortical areas, respectively. Because both components are sensitive to WM load and WM capacity limits (Katus, Grubert, & Eimer, 2015; Vogel & Machizawa, 2004), coregistering them in bimodal visual/tactile WM tasks makes it possible to test whether WM capacity limitations are shared across sensory modalities or whether they arise independently within modality-specific storage systems. The simultaneous measurement of the tCDA/CDA components in tactile/visual WM tasks (Katus, Grubert, & Eimer, 2017; Katus & Eimer, 2016) is feasible after transforming EEG data to current source densities (CSDs; Tenke & Kayser, 2012).
Combining behavioral and EEG measures in investigations of WM capacity limits is important because behavioral performance may reflect not only WM storage but also other capacity-unrelated processes, such as the comparison between memorized and test items (Awh, Barton, & Vogel, 2007). In contrast, CDA components provide online measures of WM maintenance that are unaffected by subsequent memory comparison or response selection processes. A pattern of results where crossmodal interference effects are observed for performance but not for the CDA and tCDA components would therefore suggest that these effects were generated at later, storage-unrelated stages.
In Experiment 1, participants performed a lateralized dual task in which visual and tactile items were presented simultaneously in the left and right visual fields and to the left and right hands. All items on one side had to be memorized, and WM load was manipulated orthogonally in vision and touch. The critical question was whether the maintenance of visual and tactile items in WM is mediated by a shared central process or by independent modality-specific mechanisms. A recent behavioral dual-task study that required memory for visual colors and auditorily presented digits found no crossmodal interactions (Fougnie et al., 2015, Experiments 1–7), consistent with the assumption that maintenance operates in a modality-specific fashion. However, maintenance processes might also operate independently for different types of content within a single modality (Shin & Ma, 2017; Fougnie & Alvarez, 2011; Wheeler & Treisman, 2002), so the absence of interference in that study may reflect the difference between the memorized contents (colors vs. digits) rather than between the modalities. For this reason, Experiment 1 employed a multisensory WM task in which participants memorized spatial locations in both vision and touch. Although locations are represented in different formats in these modalities (retinotopic or spatiotopic in vision, somatotopic in touch), combining visual and tactile spatial WM tasks may still increase the representational overlap between multisensory information in WM (Tamber-Rosenau & Marois, 2016; Zimmer, 2008) relative to situations where different feature dimensions have to be memorized in different modalities.
On each trial, participants had to memorize the locations of one, two, or three visual items and of one, two, or three tactile items, and memory was unpredictably tested for either modality after the trial. This design allowed us to simultaneously test the effects of increasing WM load within and across modalities on behavioral and electrophysiological measures of WM storage. The number of visual or tactile items that have to be retained should affect performance on trials where the respective modality is tested, with accuracy decreasing as WM load increases. Increasing visual and tactile WM load should also be reflected in CDA and tCDA amplitudes. Previous unimodal studies have found load-dependent CDA enhancements for set sizes up to three visual items (Vogel & Machizawa, 2004) and tCDA enhancements for load increments from one to two tactile items (Katus, Grubert, et al., 2015). Similar modality-specific load effects should also be found in Experiment 1.
The critical question was whether, in addition to these modality-specific effects, there would be additional costs associated with the manipulation of WM load in the other modality. Domain-general accounts (e.g., Cowan, 2011; Saults & Cowan, 2007) assume that the capacity of visual and tactile WM is limited by a shared central mechanism and that the capacity limit of three to four items found for unimodal WM (Vogel & Machizawa, 2004; Cowan, 2001) also determines the maximum number of items that can be simultaneously maintained in multisensory WM tasks. If this is correct, behavioral and electrophysiological crossmodal load effects should be observed in Experiment 1 when more than three to four multisensory items have to be memorized simultaneously. When vision is tested, WM performance should differ as a function of the number of tactile items that are simultaneously maintained, with crossmodal costs on trials with higher tactile load. Analogous crossmodal costs of increased visual load should be observed on trials where tactile WM is tested. In addition, visual CDA components should be affected by concurrent tactile WM load, with reduced components when tactile load is increased, and tCDA components should be affected by visual load in the same way. In contrast, if the maintenance of visual and tactile WM representations operates in an entirely modality-specific fashion, no such crossmodal interference effects should be observed. Load manipulations in vision and touch should then produce only modality-specific behavioral and electrophysiological effects: there should be no impact of visual load on tactile WM performance and tCDA components and no effect of tactile load on visual WM performance and CDA components.
Because this domain-specific account predicts crossmodal null effects that cannot be confirmed by conventional significance tests (which only allow for rejecting the null hypothesis), we assessed the statistical reliability of null effects using Bayesian statistics (Rouder, Morey, Verhagen, Swagman, & Wagenmakers, 2017; Rouder, Speckman, Sun, Morey, & Iverson, 2009).
The sample size was 30 participants (average age = 28 years, 19 women, 28 right-handed) after exclusion of four participants with excessive EEG artifacts. All participants were neurologically unimpaired and gave informed written consent before testing. The experiment was conducted in accordance with the Declaration of Helsinki and was approved by the psychology ethics committee of Birkbeck, University of London.
Participants were seated in a dimly lit recording chamber with their hands covered from sight. All stimuli were presented for 200 msec. Tactile stimuli (100-Hz sinusoids, intensity 0.37 N) were delivered by eight mechanical stimulators attached to the distal phalanges of the index, middle, ring, and little fingers of the left and right hands. The stimulators were driven by custom-built amplifiers, controlled by MATLAB routines (The MathWorks, Natick, MA) via an eight-channel sound card (M-Audio, Delta 1010LT). Headphones played continuous white noise to mask any sounds produced by tactile stimulation. Visual stimuli were shown at a viewing distance of 100 cm against a dark gray background on a 22-in. monitor (Samsung SyncMaster 2233; 100-Hz refresh rate, 16 msec RT). Throughout the experiments, the monitor showed black crosshairs (three lines at 0°, 45°, and 90° polar angle; horizontal/vertical eccentricity: 3.44° of visual angle) and three concentric black rings around the fixation dot (eccentricity: 3.15° outer ring, 2.21° middle ring, 1.26° inner ring; see Figure 1). Stimuli shown on different rings had different sizes, which decreased from the outer to the inner ring (0.40°, 0.34°, and 0.28° for stimuli on the outer, middle, and inner ring, respectively). A headset microphone recorded vocal responses (“a” for match and “e” for mismatch, see below) during the 1800-msec period after the trial.
The experiment comprised 720 trials, run in 16 blocks. Participants were instructed to memorize the tactile and visual samples on one side, left or right. The task-relevant side was randomly selected for each participant, remained constant for Blocks 1–8, and then changed to the opposite side for Blocks 9–16. WM load (one, two, or three items) varied from trial to trial independently for each modality, resulting in nine load conditions with 80 trials each. Memory was unpredictably assessed with a tactile or a visual test set, resulting in 40 trials per load condition for each tested modality. Before the experiment, participants completed 40–80 training trials, depending on individual performance. Feedback about the proportion of correct responses was given after each block.
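The trial structure described above can be checked with a short Python sketch. It assumes that all 720 trials are shuffled across the whole session and then cut into 16 equal blocks of 45 trials; the paper does not say how trials were distributed over blocks, and the function names are hypothetical.

```python
import itertools
import random

LOADS = (1, 2, 3)                 # items per modality
TESTED = ("touch", "vision")      # modality probed after the trial
TRIALS_PER_CELL = 40              # per load combination and tested modality
N_BLOCKS = 16

def build_exp1_trials(seed=None):
    """Return all 720 trials as dicts, shuffled across the session (assumption)."""
    rng = random.Random(seed)
    trials = [
        {"n_tactile": n_t, "n_visual": n_v, "tested": tested}
        for n_t, n_v, tested in itertools.product(LOADS, LOADS, TESTED)
        for _ in range(TRIALS_PER_CELL)
    ]
    rng.shuffle(trials)
    return trials

def split_into_blocks(trials, n_blocks=N_BLOCKS):
    """Cut the trial list into equal consecutive blocks."""
    size, rest = divmod(len(trials), n_blocks)
    if rest:
        raise ValueError("trials do not divide evenly into blocks")
    return [trials[i * size:(i + 1) * size] for i in range(n_blocks)]

def relevant_side(block_index, first_side):
    """Task-relevant side for a 0-based block index; it switches after Block 8."""
    if block_index < N_BLOCKS // 2:
        return first_side
    return "right" if first_side == "left" else "left"
```

Each of the nine load cells then holds 80 trials (40 per tested modality), and each block holds 45 trials.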
Stimulation and Randomization Procedure
In each trial, tactile and visual stimuli were simultaneously presented for the bimodal sample set, which was followed by a unimodal test set after 1 sec. Depending on tactile load (NT), we separately selected NT locations for the tactile samples on the left and right side. Tactile tests comprised one stimulus per hand, presented to the same location as a sample or to a different location (match/mismatch, 50% each). Depending on visual load (NV), we separately selected NV locations for the visual samples on the left and right side. These locations were sampled from 110 angular positions (in polar coordinates, left side: 125° to 234°, right side: 305° to 54°), with the constraint that the sampled positions were at least 25° apart. We randomly formed NV pairs of left- and right-sided positions and assigned these coordinate pairs to the same concentric ring (NV rings were selected without replacement to ensure that no ring contained more than two stimuli, i.e., one per side). Each visual test stimulus matched the location of a sample on half of all trials and appeared at a different location on the other half (30° angular offset relative to the location of a randomly selected sample). Regardless of whether memory was tested for touch or vision, matches/mismatches between sample and test were not correlated for the left and right sides.
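A minimal Python sketch of the visual sampling rule, for illustration only. The 110 one-degree positions per side, the 25° minimum separation, the three rings drawn without replacement, and the 30° mismatch offset come from the text; the rejection-sampling approach, the random direction of the mismatch offset, and all names are assumptions.

```python
import random

LEFT_POSITIONS = list(range(125, 235))                        # 125°..234°: 110 positions
RIGHT_POSITIONS = list(range(305, 360)) + list(range(0, 55))   # 305°..54°: 110 positions
RINGS = ("outer", "middle", "inner")
MIN_SEPARATION = 25   # degrees
MISMATCH_OFFSET = 30  # degrees

def angular_distance(a, b):
    """Shortest distance between two polar angles in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def sample_positions(pool, n, rng, min_sep=MIN_SEPARATION, max_tries=1000):
    """Rejection-sample n angles from pool that are pairwise at least min_sep apart."""
    for _ in range(max_tries):
        picks = rng.sample(pool, n)
        if all(angular_distance(a, b) >= min_sep
               for i, a in enumerate(picks) for b in picks[i + 1:]):
            return picks
    raise RuntimeError("could not satisfy the separation constraint")

def visual_sample_set(n_visual, rng):
    """Pair left and right positions at random and give each pair its own ring."""
    left = sample_positions(LEFT_POSITIONS, n_visual, rng)
    right = sample_positions(RIGHT_POSITIONS, n_visual, rng)
    rng.shuffle(right)                     # random left/right pairing
    rings = rng.sample(RINGS, n_visual)    # rings drawn without replacement
    return [{"ring": ring, "left": l, "right": r} for ring, l, r in zip(rings, left, right)]

def visual_test_angle(sample_angles, is_match, rng):
    """Test angle on one side: a sample location (match) or 30° away from one (mismatch)."""
    target = rng.choice(sample_angles)
    if is_match:
        return target
    return (target + rng.choice((-1, 1)) * MISMATCH_OFFSET) % 360
```

Using the circular `angular_distance` keeps the separation check correct on the right side, where the admissible range wraps through 0°.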
Acquisition and Preprocessing
EEG data, sampled at 500 Hz using a BrainVision amplifier, were DC-recorded from 64 Ag/AgCl active electrodes at standard locations of the extended 10–20 system. Two electrodes at the eyes' outer canthi monitored horizontal eye movements (horizontal EOG). Continuous EEG data were referenced to the left mastoid during recording and rereferenced to the arithmetic mean of both mastoids for data preprocessing. Data were submitted offline to a 20-Hz low-pass filter (Blackman window, filter order 1000). Epochs were extracted for the 1-sec period after the sample set and were corrected for a 200-msec prestimulus baseline.
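The filter and epoching steps can be approximated in NumPy as below. This is a sketch, not the software pipeline actually used: it assumes a windowed-sinc FIR design (order 1000, i.e., 1001 taps, Blackman window, 20-Hz cutoff at 500 Hz) applied to the continuous data with a centered convolution, and epochs spanning −200 to 1000 msec around sample-set onset. Function names are hypothetical.

```python
import numpy as np

FS = 500        # Hz, sampling rate from the text
CUTOFF = 20.0   # Hz, low-pass cutoff
ORDER = 1000    # filter order -> ORDER + 1 taps

def blackman_lowpass(cutoff=CUTOFF, fs=FS, order=ORDER):
    """Windowed-sinc FIR low-pass kernel (Blackman window), unity gain at DC."""
    n = np.arange(order + 1) - order / 2
    kernel = np.sinc(2 * cutoff / fs * n) * np.blackman(order + 1)
    return kernel / kernel.sum()

def lowpass(signal, kernel):
    """Centered convolution; for a symmetric odd-length kernel this adds no phase shift."""
    return np.convolve(signal, kernel, mode="same")

def extract_epochs(data, onsets, fs=FS, tmin=-0.2, tmax=1.0):
    """Cut epochs (channels x samples) around onsets and subtract the -200..0 msec mean."""
    start, stop = int(round(tmin * fs)), int(round(tmax * fs))
    epochs = np.stack([data[:, o + start:o + stop] for o in onsets])
    baseline = epochs[:, :, :-start].mean(axis=2, keepdims=True)   # first 100 samples
    return epochs - baseline
```

Because `np.convolve(..., mode="same")` returns an output as long as the longer input, the continuous recording should be much longer than the 1001-tap kernel, and samples within about 500 points of the recording edges are affected by edge effects.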
Artifact Rejection and Correction
Trials with saccades were rejected using a differential step function that ran on the bipolarized horizontal EOG (step width = 200 msec, threshold = 30 μV). Independent component analysis (Delorme, Sejnowski, & Makeig, 2007) was subsequently used to correct for frontal artifacts such as eye blinks and residual traces of horizontal eye movements that had not been detected by the step function. We rejected trials in which the difference between corresponding left- and right-hemisphere electrodes exceeded a fixed threshold of ±50 μV (for any electrode pair). We furthermore excluded epochs with unusual spectral profiles: using fast Fourier transforms, we calculated the power of these difference values in five frequency bins (between 1 and 9 Hz) for each trial and electrode pair. Spectral power was normalized across trials by means of z transforms, and an epoch was rejected if its z score exceeded 3 (for any frequency bin and electrode pair). Notably, this procedure was only used to identify epochs with artifacts; the z scores were discarded after artifact rejection and played no role in any statistical analysis. The remaining epochs were submitted to Fully Automated Statistical Thresholding for EEG Artifact Rejection (FASTER; Nolan, Whelan, & Reilly, 2010) for the interpolation of noisy electrodes and were subsequently converted to CSDs (iterations = 50, m = 4, lambda = 10⁻⁵; Tenke & Kayser, 2012). Ninety-three percent of epochs remained for statistical analysis. Statistical tests were based on both correct and incorrect trials; excluding incorrect trials did not change the pattern of results but would have reduced the signal-to-noise ratio of the EEG data.
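The three rejection criteria can be sketched as follows. The step function uses the common moving-window definition (mean of the second half minus mean of the first half of each 200-msec window); the five equal-width bands between 1 and 9 Hz are an assumption, because the paper does not give the bin edges, and the function names are hypothetical.

```python
import numpy as np

FS = 500  # Hz

def step_function_flag(heog, fs=FS, width_ms=200, threshold_uv=30.0):
    """Flag a saccade if any window's mean(second half) - mean(first half) exceeds the threshold."""
    half = int(width_ms * fs / 1000) // 2                       # 50 samples at 500 Hz
    csum = np.concatenate(([0.0], np.cumsum(heog, dtype=float)))
    starts = np.arange(len(heog) - 2 * half + 1)
    first = (csum[starts + half] - csum[starts]) / half
    second = (csum[starts + 2 * half] - csum[starts + half]) / half
    return bool(np.any(np.abs(second - first) > threshold_uv))

def amplitude_flag(lateral_diff, threshold_uv=50.0):
    """lateral_diff: electrode pairs x samples of left-minus-right values."""
    return bool(np.any(np.abs(lateral_diff) > threshold_uv))

def spectral_flags(lateral_diffs, fs=FS, n_bands=5, fmin=1.0, fmax=9.0, z_max=3.0):
    """lateral_diffs: trials x pairs x samples. Returns one boolean flag per trial.

    Band power is z-scored across trials; the band edges are an assumption.
    """
    edges = np.linspace(fmin, fmax, n_bands + 1)
    freqs = np.fft.rfftfreq(lateral_diffs.shape[-1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(lateral_diffs, axis=-1)) ** 2
    bands = np.stack(
        [power[..., (freqs >= lo) & (freqs < hi)].sum(axis=-1)
         for lo, hi in zip(edges[:-1], edges[1:])],
        axis=-1,
    )                                                          # trials x pairs x bands
    z = (bands - bands.mean(axis=0)) / bands.std(axis=0)
    return np.any(z > z_max, axis=(1, 2))
```

As in the text, the z scores serve only to flag epochs and are not kept for later analysis.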
Selection of Electrodes and Time Windows; Topographical Maps
We separately averaged CSDs across three adjacent electrodes contralateral and ipsilateral to the task-relevant side. As in prior studies (Katus et al., 2017; Katus & Eimer, 2016), the tactile and visual CDA components were measured at lateral central (tCDA: C3/4, FC3/4, CP3/4) and occipital scalp regions (CDA: PO7/8, PO3/4, O1/2). Statistical tests were conducted on difference values of contralateral minus ipsilateral CSDs averaged between 300 and 1000 msec after the sample set (cf. Katus, Grubert, et al., 2015; Vogel & Machizawa, 2004).
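The contralateral-minus-ipsilateral measure can be written as a small function. The electrode clusters and the 300–1000-msec window come from the text; the channel layout, the assumption that epochs start at −200 msec, and all names are hypothetical. The `mirror` helper swaps odd-numbered (left) and even-numbered (right) 10–20 labels, which is also the kind of mapping needed to flip electrode coordinates over the midline, as described for the topographical maps.

```python
import numpy as np

CLUSTERS = {  # left-hemisphere members of each three-electrode cluster
    "tCDA": ("C3", "FC3", "CP3"),
    "CDA": ("PO7", "PO3", "O1"),
}
FS = 500
EPOCH_START_MS = -200  # assumption: epochs include the 200-msec baseline

def mirror(name):
    """Swap hemispheres in a 10-20 label: odd numbers are left, even numbers right."""
    prefix = name.rstrip("0123456789")
    number = int(name[len(prefix):])
    return f"{prefix}{number + 1 if number % 2 else number - 1}"

def lateralized_amplitude(erp, channels, component, side,
                          window_ms=(300, 1000), fs=FS, t0_ms=EPOCH_START_MS):
    """Mean contralateral-minus-ipsilateral CSD over the cluster and time window.

    erp: channels x samples average; side: task-relevant side ('left' or 'right').
    """
    left_hemi = CLUSTERS[component]
    right_hemi = tuple(mirror(ch) for ch in left_hemi)
    contra, ipsi = (right_hemi, left_hemi) if side == "left" else (left_hemi, right_hemi)
    idx = {ch: i for i, ch in enumerate(channels)}
    a, b = ((ms - t0_ms) * fs // 1000 for ms in window_ms)   # sample indices 250..600
    contra_mean = erp[[idx[ch] for ch in contra], a:b].mean()
    ipsi_mean = erp[[idx[ch] for ch in ipsi], a:b].mean()
    return contra_mean - ipsi_mean
```

With equally sized clusters, averaging the three electrodes first and then subtracting gives the same value as subtracting pairwise and averaging.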
Spline-interpolated voltage maps illustrate the topographical distribution of lateralized activity during the retention period (300–1000 msec). These maps were obtained by subtracting ipsilateral CSDs from contralateral CSDs, with contra/ipsilateral referring to the task-relevant side. To collapse data across blocks where the left or right side was task-relevant, electrode coordinates were flipped over the midline for left-side memory blocks. Therefore, in the topographical maps, a negative potential over the left hemisphere indicates the presence of CDA for the task-relevant sample stimuli.
Data were analyzed with paired t tests and repeated-measures ANOVAs, with Greenhouse–Geisser adjustments when appropriate. Error bars in graphs indicate confidence intervals for the true population mean. Thus, error bars that do not overlap with the zero axis (y ≠ 0) inform about statistically significant tCDA/CDA components; error bars that do not overlap with chance level (y ≠ 50%) indicate behavioral performance that is significantly above chance.
Bayesian t tests (Rouder et al., 2009) and the software JASP (JASP Team, 2016) were used to calculate Bayes factors (BFs) for each main effect and interaction in our statistical designs. The BF denotes the relative evidence for the alternative hypothesis as compared with the null hypothesis and thus allows for statistical inferences regarding the presence or absence of a modulation. The BF for the null hypothesis (BF01) is the inverse of the BF for the alternative hypothesis (BF10) and indexes the relative evidence in the data that an effect is absent rather than present. We report the numerically larger BF; reliable evidence for either hypothesis is marked by a BF of >3 (Jeffreys, 1961), indicating that the empirical data are at least three times more likely under this hypothesis than under the competing hypothesis.
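For the paired comparisons, a JZS Bayes factor (Rouder et al., 2009) can be computed by one-dimensional numerical integration over the prior variance g. The sketch below assumes a Cauchy prior scale of r = √2/2, which we believe matches JASP's default for Bayesian t tests; it does not reproduce the Bayesian repeated-measures ANOVAs, and the function names are hypothetical.

```python
import math

def jzs_bf10(t, n, r=math.sqrt(2) / 2, lo=-20.0, hi=20.0, steps=40_000):
    """JZS Bayes factor BF10 for a one-sample or paired t test with n observations.

    Integrates over g with the substitution g = exp(u), trapezoidal rule in u.
    """
    nu = n - 1

    def integrand(u):
        g = math.exp(u)
        a = 1.0 + n * g * r ** 2
        likelihood = a ** -0.5 * (1.0 + t ** 2 / (a * nu)) ** (-(nu + 1) / 2)
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1.0 / (2 * g))  # inverse-gamma(1/2, 1/2)
        return likelihood * prior * g  # extra g is the Jacobian dg/du

    h = (hi - lo) / steps
    area = 0.5 * (integrand(lo) + integrand(hi))
    area += sum(integrand(lo + k * h) for k in range(1, steps))
    marginal_h1 = area * h
    marginal_h0 = (1.0 + t ** 2 / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0

def bf01(t, n, **kwargs):
    """Evidence for the null: the inverse of BF10."""
    return 1.0 / jzs_bf10(t, n, **kwargs)
```

With this prior, t(29) = 3.712 gives a BF10 of roughly 37 to 38 and t(29) = 1.215 a BF01 of roughly 2.6, close to the values reported for the tCDA comparisons in Experiment 1.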
Tactile and visual CDA components (tCDA/CDA) entered an ANOVA with the factors Component (tCDA, CDA), Tracked modality load (TL: tactile load for the tCDA, visual load for the CDA), and Untracked modality load (UL: visual load for the tCDA, tactile load for the CDA). As observed previously (Katus et al., 2017), the CDA component was larger than the tCDA (Component: F(1, 29) = 42.893, p < 10⁻⁶, BF10 > 10³²). Load manipulations in touch and vision selectively modulated the tCDA and CDA components, respectively (TL: F(1.344, 38.973) = 23.238, p < 10⁻⁵, BF10 > 10⁶). Critically, the tCDA was not sensitive to differences in visual load, and the CDA was unaffected by the manipulation of tactile load (UL: F(2, 58) = 0.141, p = .727, BF01 = 41.251); there was also no interaction between load in the two modalities (TL × UL: F(3.001, 87.025) = 0.890, p = .450, BF01 = 48.282). Load-dependent enhancements of CDA/tCDA amplitudes differed between touch and vision (Component × TL: F(2, 58) = 14.457, p < 10⁻⁵, BF10 > 10³). This is illustrated in Figure 2, where the black line graphs in the bottom row show the impact of tactile load on the tCDA (left) and the influence of visual load on the CDA (right). Visual load parametrically enhanced the CDA (collapsed across tactile load; one vs. two visual items: t(29) = 2.349, p = .026, BF10 = 2.039; two vs. three visual items: t(29) = 6.150, p < 10⁻⁵, BF10 > 10⁴), with the largest CDA amplitudes measured on trials with three visual items (cf. Vogel & Machizawa, 2004). In contrast, the tCDA reached asymptote at two tactile items (collapsed across visual load; one vs. two tactile items: t(29) = 3.712, p < 10⁻³, BF10 = 37.518; two vs. three tactile items: t(29) = 1.215, p = .234, BF01 = 2.635). All remaining effects were nonsignificant (Component × UL: F(2, 58) = 0.996, p = .375, BF01 = 14.497; Component × TL × UL: F(4, 116) = 0.955, p = .435, BF01 = 18.427).
The percentage of correct responses entered an ANOVA with the factors Tested modality (touch, vision), Tested modality load (TL: tactile or visual load, depending on whether memory was tested for touch or vision on a given trial), and Untested modality load (UL: load for the other, untested, modality). Participants responded correctly on 79.4% and 87.1% of trials where memory was tested for touch and vision, respectively, and this difference was significant (Tested Modality: F(1, 29) = 21.583, p < 10⁻⁴, BF10 > 10¹²). Most importantly, as shown in Figure 3A, load manipulations caused strictly modality-specific effects. Performance decreased when load in the tested modality increased from one to two and three items (TL: F(2, 58) = 226.533, p < 10⁻²⁰, BF10 > 10⁶⁰). Critically, no such decrements were found as a result of increased load in the untested modality (UL: F(2, 58) = 1.883, p = .161, BF01 = 26.742). All other effects were nonsignificant (TL × UL: F(4, 116) = 0.812, p = .520, BF01 = 68.807; Tested Modality × TL: F(2, 58) = 0.880, p = .420, BF01 = 10.223; Tested Modality × UL: F(2, 58) = 1.321, p = .275, BF01 = 16.504; Tested Modality × TL × UL: F(3.081, 89.357) = 1.170, p = .328, BF01 = 17.315).
To assess modality-specific capacity limits for visual and tactile WM in Experiment 1, we calculated Cowan's K (Cowan, 2001) for Load 2 and Load 3 in vision and touch (collapsing across load in the untested modality). For visual WM, K values of 1.43 and 1.77 were obtained on Load 2 and Load 3 trials, and this difference was highly reliable (t(29) = 7.521, p < 10⁻⁷, BF10 > 10⁵). For tactile WM, K values of 1.13 and 1.23 were obtained on Load 2 and Load 3 trials. This increase was not significant (t(29) = 1.443, p = .160, BF01 = 2.022), suggesting that, in contrast to vision, the capacity of tactile WM was already exhausted with a load of two items. For comparison, K values increased significantly between Load 1 and Load 2 trials not only in vision (0.92 vs. 1.43; t(29) = 9.644, p < 10⁻⁹, BF10 > 10⁶) but also in touch (0.79 vs. 1.13; t(29) = 5.838, p < 10⁻⁵, BF10 > 10³).
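Cowan's K for a single-probe change-detection test is N × (H − FA), where N is the set size, H the hit rate (a mismatch correctly reported), and FA the false-alarm rate (a mismatch reported on a match trial). When match and mismatch trials are equally frequent, as here, H − FA equals 2p − 1 for proportion correct p. A minimal sketch; the function names are hypothetical, and it shows only the single-probe form of the estimate.

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's K = N * (H - FA) for a single-probe change-detection test."""
    return set_size * (hit_rate - false_alarm_rate)

def cowan_k_from_accuracy(set_size, proportion_correct):
    """Same estimate when match and mismatch trials are equally frequent.

    p = (H + (1 - FA)) / 2, hence H - FA = 2p - 1.
    """
    return set_size * (2 * proportion_correct - 1)
```

For example, 96% correct at Load 1 corresponds to K = 1 × (2 × 0.96 − 1) = 0.92, and chance performance (p = .5) gives K = 0 at any set size.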
In Experiment 1, manipulations of visual and tactile WM load produced entirely modality-specific effects, and no crossmodal interference effects were found either for visual CDA and tCDA components or for behavioral performance in the bimodal WM task. This pattern of results seems to suggest that WM capacity limitations are strictly modality-specific. However, alternative interpretations remain. The load manipulations used in Experiment 1 may not have been sufficiently high to produce crossmodal costs. Previous experiments where visual and auditory WM tasks were combined found no dual-task interference when auditory WM load was low (e.g., Morey & Cowan, 2005; Luck & Vogel, 1997), whereas such effects typically emerged with higher loads (e.g., Saults & Cowan, 2007; Cocchini et al., 2002; but see Fougnie et al., 2015, for an exception). Although the WM capacity estimates for vision and touch in Experiment 1 suggest that a maximal load of three items exhausted the capacity of visual and tactile stores, performance may have been affected by the specific demands of the lateralized WM task used in this experiment. For example, items that were located on the to-be-ignored side of the sample set could have interfered with the encoding of the task-relevant items in the same modality, resulting in an underestimation of WM capacity limitations. Participants may also have adopted specific strategies for reducing the effective loads of the visual and tactile WM tasks. In the visual task, some perceptual grouping of item locations may have occurred, especially for Load 3. On Load 3 trials in the tactile task, three of the four stimulators on the task-relevant hand were activated. In some of these trials, participants may have only memorized the single nonstimulated location, thereby reducing tactile load from three to one on these trials.
Experiment 2 was designed to address all of these possible shortcomings of Experiment 1. In this purely behavioral experiment, bilateral visual and tactile WM tasks were used in which participants had to memorize all visual and tactile sample stimuli in both visual hemifields and on both hands. Because all sample stimuli were now task-relevant, interference by to-be-ignored items of the sample set was no longer possible. In bimodal trials, visual and tactile loads were varied independently (two or four items). On tactile Load 4 trials, two sample items were delivered to the left hand and two to the right hand, so that the strategy of memorizing only a single nonstimulated location was no longer available. To eliminate potential grouping strategies for memorized visual positions on trials with high visual load, the spatial WM task was replaced with a color task for the visual modality. We used the standard color change detection procedure introduced by Luck and Vogel (1997). Observers had to memorize two or four colors and to report whether one of these colors had changed in the test display. Importantly, Experiment 2 also included unimodal baseline trials in which two or four visual or tactile items had to be memorized, to demonstrate that a unimodal load of four items was sufficient to exhaust the capacity of visual and tactile WM stores. If crossmodal interference effects emerge when the effective WM load within both modalities is sufficiently high, such effects should be observed in Experiment 2.
Twelve participants (average age = 28.8 years, seven women, 10 right-handed) were tested. All were neurologically unimpaired and gave informed written consent.
Stimuli and Procedure
These were similar to Experiment 1, with the following exceptions. No EEG was recorded during task performance. The WM task was no longer lateralized, as visual and/or tactile sample stimuli on both sides were task-relevant. WM load was two or four items (separately varied for touch and vision), and unimodal visual and tactile baseline trials (Load 2 or 4) were also included. The tactile task was similar to the one used in Experiment 1. Participants had to memorize the locations of all tactile sample stimuli, which could be presented to the index, middle, ring, or little fingers of the left and right hands. The stimulated locations on each hand were chosen randomly and independently on each trial. On Load 2 trials, one finger on each hand was stimulated. On Load 4 trials, sample stimuli were delivered to two fingers of each hand. The tactile test set included two or four tactile stimuli on Load 2 and Load 4 trials, respectively. On match trials, the test set was identical to the memory set. On mismatch trials, one randomly selected sample location was replaced by a different location on the same hand. The visual task was now a bilateral color change detection task. Sample displays contained two or four differently colored squares (each covering 0.52° × 0.52° of visual angle). The colors shown on each trial were randomly selected from a set of six possible colors (CIE color coordinates for red: .627/.336; green: .263/.568; blue: .189/.193; yellow: .422/.468; cyan: .212/.350; magenta: .289/.168). All colors were equiluminant (11.8 cd/m²). On Load 2 trials, two sample squares were presented to the left and right of fixation at a horizontal eccentricity of 1°. On Load 4 trials, two horizontally aligned squares were presented above and two below fixation, each at a horizontal and vertical eccentricity of 1°. Participants had to memorize the colors of all sample stimuli. On match trials, the test set was identical to the sample set. On mismatch trials, one item in the test set changed its color relative to the sample set.
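The two sample/test rules of Experiment 2 can be sketched as follows. The finger set, the six colors with their CIE coordinates, and the one-item change on mismatch trials come from the text; the assumptions that a replaced tactile location is always a nonstimulated finger and that a changed color is drawn from the colors absent from the sample display are ours, as are all names.

```python
import random

HANDS = ("left", "right")
FINGERS = ("index", "middle", "ring", "little")
COLORS = {  # CIE coordinates from the text
    "red": (0.627, 0.336), "green": (0.263, 0.568), "blue": (0.189, 0.193),
    "yellow": (0.422, 0.468), "cyan": (0.212, 0.350), "magenta": (0.289, 0.168),
}

def tactile_trial(load, is_match, rng):
    """Load 2: one finger per hand; Load 4: two fingers per hand."""
    per_hand = load // 2
    sample = {hand: rng.sample(FINGERS, per_hand) for hand in HANDS}
    test = {hand: list(fingers) for hand, fingers in sample.items()}
    if not is_match:
        hand = rng.choice(HANDS)
        position = rng.randrange(per_hand)
        # Assumption: the new location is a finger that was not stimulated on that hand.
        test[hand][position] = rng.choice([f for f in FINGERS if f not in sample[hand]])
    return sample, test

def color_trial(load, is_match, rng):
    """Distinct sample colors; on mismatch trials exactly one item changes color."""
    sample = rng.sample(list(COLORS), load)
    test = list(sample)
    if not is_match:
        i = rng.randrange(load)
        # Assumption: the new color is one not shown in the sample display.
        test[i] = rng.choice([c for c in COLORS if c not in sample])
    return sample, test
```

Drawing the replacement from unused fingers and colors guarantees that every mismatch differs from the sample set in exactly one item.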
The experiment included 480 trials, run in eight blocks of 60 trials. There were 320 bimodal and 160 unimodal trials that were randomly intermixed in each block. For bimodal trials, visual and tactile load (two or four items) was varied independently, resulting in four different load conditions. Memory was unpredictably tested for touch or vision (160 trials each, with 40 trials for each of the four load conditions). In the unimodal trials, the sample and test sets were presented in the same modality (80 tactile and 80 visual trials, with 40 trials each for Load 2 and Load 4). As in Experiment 1, vocal responses (“a” for match and “e” for mismatch) were registered with a headset microphone on each trial. The timing of all sample and test events was identical to Experiment 1.
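The trial counts above can be reproduced with a short sketch. It assumes that the 480 trials are shuffled across the session and then split into eight blocks of 60; the paper states that bimodal and unimodal trials were intermixed within blocks but not whether each block had a fixed composition. Names are hypothetical.

```python
import itertools
import random

TRIALS_PER_CELL = 40
N_BLOCKS = 8

def build_exp2_blocks(seed=None, n_blocks=N_BLOCKS):
    """Return eight blocks of 60 trial dicts (session-wide shuffle assumed)."""
    rng = random.Random(seed)
    # Bimodal trials: tactile load x visual load x tested modality.
    trials = [
        {"n_tactile": n_t, "n_visual": n_v, "tested": tested}
        for n_t, n_v, tested in itertools.product((2, 4), (2, 4), ("touch", "vision"))
        for _ in range(TRIALS_PER_CELL)
    ]
    # Unimodal baselines: Load 2 or 4 in one modality, zero items in the other.
    for load in (2, 4):
        trials += [{"n_tactile": load, "n_visual": 0, "tested": "touch"} for _ in range(TRIALS_PER_CELL)]
        trials += [{"n_tactile": 0, "n_visual": load, "tested": "vision"} for _ in range(TRIALS_PER_CELL)]
    rng.shuffle(trials)
    size = len(trials) // n_blocks
    return [trials[i * size:(i + 1) * size] for i in range(n_blocks)]
```

The bimodal part contributes 2 × 2 × 2 × 40 = 320 trials and the baselines 2 × 2 × 40 = 160, so each modality is tested on 240 trials in total.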
Figure 3B shows accuracy on trials where touch or vision was tested, for each combination of WM load in the tested modality (two or four items) and load in the untested modality (zero items in the unimodal baseline, otherwise two or four items). There were clear effects of increasing WM load for the tested modality, but no apparent effects of load in the other untested modality. We first assessed whether increasing visual and tactile load to four items was sufficient to exhaust the capacity of visual and tactile WM by calculating Cowan's K as a measure of WM capacity for the two single-task visual and tactile baseline conditions, separately for loads of two and four items. With Load 2, K was 1.91 and 1.94 for the tactile and visual tasks, respectively, reflecting near-perfect performance. With Load 4, K was 3.13 in the tactile task and 3.25 in the visual task. This indicates that a WM load of four items exhausted the capacity of both tactile and visual stores.
For the main analysis, accuracy entered an ANOVA with the factors tested modality load (TL: two or four items), untested modality load (UL: 0, two or four items), and tested modality (TM: vision or touch). This analysis confirmed the presence of strong modality-specific load effects in the absence of any crossmodal effects. Accuracy was lower when four rather than two items had to be memorized in the tested modality (TL, F(1, 11) = 43.575, p < 10⁻⁴, BF10 > 10¹⁵). In contrast, there was no impairment of WM performance due to load in the untested modality (UL: F(2, 22) = 1.333, p = .284, BF01 = 6.550) and no interaction between load in the tested and untested modalities (TL × UL: F(2, 22) = 0.623, p = .546, BF01 = 7.339).¹ Accuracy did not differ between the tactile and visual tasks (93.4% vs. 94.5%, averaged across all load conditions, main effect TM: F(1, 11) = 0.631, p = .444, BF01 = 2.220). There were no other significant interactions (TM × TL: F(1, 11) = 0.095, p = .763, BF01 = 3.634; TM × UL: F(2, 22) = 0.677, p = .518, BF01 = 7.553; TM × TL × UL: F(2, 22) = 0.648, p = .533, BF01 = 4.682).
We investigated whether the maintenance of information in WM is mediated by a domain-general (i.e., central/supramodal) mechanism or by processes that operate independently for WM content that has been encoded via different sensory modalities. In two experiments, we employed bimodal tactile–visual WM tasks and manipulated WM load orthogonally for both modalities. In Experiment 1, spatial WM tasks were used in both modalities. EEG was recorded during task performance, and tCDA and visual CDA components were measured to concurrently track the activation of tactile and visual WM representations.
If visual and tactile WM representations were maintained by a central mechanism, varying visual load should affect the somatosensory tCDA component, and changes in tactile load should modulate the visual CDA. There were no such crossmodal load effects in Experiment 1. CDA amplitudes were entirely unaffected by manipulations of tactile WM load, and tCDA amplitudes remained equally insensitive to manipulations of visual load. The reliability of these null effects was confirmed by Bayesian statistics. BFs (see Rouder et al., 2017) for each main effect and interaction in our factorial design (such as TL, UL, and TL × UL) quantify the relative evidence in the data for the null hypothesis (e.g., the absence of an effect of WM load in the untracked modality) as compared with the alternative hypothesis (the presence of such an effect). The BFs strongly support the null hypothesis with regard to load in the untracked modality (factor UL) and its interaction with load in the tracked modality (TL × UL), thus confirming the absence of crossmodal interference effects on the tCDA (due to visual load) and on the visual CDA (due to tactile load). Adopting a commonly used categorization of BF sizes (Jeffreys, 1961), we found very strong evidence for the absence of tCDA/CDA modulations due to the factor UL (BF01 = 41), as well as very strong evidence for the absence of an interaction between TL and UL (BF01 = 48). For both these effects, the observed data were over 40 times more likely under the null hypothesis than under the alternative hypothesis. This electrophysiological evidence for the absence of crossmodal load effects is at least four times stronger than the behavioral evidence obtained in a recent auditory/visual WM experiment (Fougnie et al., 2015), where BF01 values ranged between 7 and 10.
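The relation between BF01 and BF10, and the verbal categories applied above, can be summarized in a few lines. The cut-offs used here (3, 10, 30, and 100) follow a common adaptation of Jeffreys's (1961) scheme and are an assumption of this sketch; the labels are descriptive conventions, not significance thresholds.

```python
# Assumed cut-offs (a common adaptation of Jeffreys, 1961), strongest first.
CATEGORIES = ((100, "decisive"), (30, "very strong"), (10, "strong"),
              (3, "substantial"), (1, "anecdotal"))


def bf01_from_bf10(bf10):
    """BF01 and BF10 are reciprocals: BF01 = 1 / BF10."""
    return 1.0 / bf10


def classify(bf01):
    """Return (evidence label, favoured hypothesis) for a given BF01."""
    if bf01 <= 0:
        raise ValueError("Bayes factors must be positive")
    favoured = "H0" if bf01 >= 1 else "H1"
    # Strength of evidence is symmetric: BF01 = 41 and BF10 = 41 are equally strong.
    strength = max(bf01, 1.0 / bf01)
    for cutoff, label in CATEGORIES:
        if strength > cutoff:
            return label, favoured
    return "no evidence", favoured


if __name__ == "__main__":
    print(classify(41))                    # ('very strong', 'H0')
    print(classify(bf01_from_bf10(1e6)))   # ('decisive', 'H1')
    print(41 / 10)                         # 4.1
```

The last line reproduces the comparison in the text: the smallest electrophysiological BF01 (41) divided by the largest behavioral BF01 from Fougnie et al. (2015), which was 10.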
It is notable that these highly reliable null effects were accompanied by decisive evidence for an impact of factor TL (BF10 > 10⁶), indicating the presence of load-dependent tCDA/CDA modulations for manipulations of tactile/visual WM load, respectively. These results therefore unequivocally support the conclusion that the tactile and visual CDA components reflect WM maintenance processes that operate in a strictly modality-specific fashion.
This conclusion was further supported by the behavioral results of Experiment 1. For the modality assessed at memory test, increments in WM load led to parametric reductions in performance, but performance was insensitive to load in the untested modality (Figure 3A). Converging with electrophysiological data, Bayesian analysis of behavioral performance provided strong to very strong evidence for the absence of crossmodal load effects (BF01 = 27 for factor UL and BF01 = 69 for the TL × UL interaction) and decisive evidence for the presence of modulations due to increments in load for the modality that was tested after the trial (BF10 > 10⁶⁰ for factor TL). It would in principle have been possible to observe crossmodal load effects for performance only, without any corresponding effects on CDA and tCDA components. Such a pattern of results would have suggested that crossmodal interference specifically affects stages other than WM maintenance, such as the comparison between memorized and test stimuli. In fact, the electrophysiological and behavioral results of Experiment 1 mirrored each other perfectly, with no evidence for crossmodal load effects for either measure. This indicates that none of the stages involved in WM performance were selectively affected by concurrent WM load in another modality.
The fact that performance in Experiment 1 was better in the visual relative to the tactile task could indicate that participants had prioritized vision over touch. According to a domain-general account of WM capacity, such prioritization should have produced asymmetrical crossmodal interference effects. For example, if visual stimuli had been preferentially encoded into a shared domain-general WM store, performance on trials where memory was tested for a tactile load of three items should have been worse with visual Load 3 than with visual Load 1. Because accuracy data from trials where vision or touch was tested were analyzed together, the presence of selective crossmodal costs for the low-priority (tactile) modality should have been reflected by a three-way interaction (Tested Modality × TL × UL). As reported above, there was strong evidence for the absence of this interaction (BF01 > 17). Likewise, we found strong evidence against asymmetrical crossmodal interference effects on tactile and visual CDA components (Component × TL × UL; BF01 > 18). These observations suggest that performance differences between the tactile and visual tasks in Experiment 1 were not attributable to a modality prioritization strategy.
The ERP results of Experiment 1 revealed a difference between the effects of memory load in the tracked modality (TL) on CDA and tCDA components. Increasing visual load led to parametric amplitude enhancements of the CDA component over visual cortex, with largest CDA amplitudes on trials where three visual items had to be memorized, in line with previous experiments of unimodal visual WM (McCollough et al., 2007; Vogel & Machizawa, 2004). The tCDA component over somatosensory cortex increased in amplitude when tactile load increased from one to two items (compare Katus, Grubert, et al., 2015, for unimodal tactile WM), but no further tCDA enhancement was obtained for three tactile items. This difference between the visual CDA and tCDA components was mirrored by behavioral capacity estimates for visual and tactile WM. In vision, Cowan's K increased significantly when visual load was increased from two to three items, whereas no such increase was observed for touch, indicating that, in the specific task context of Experiment 1, the capacity limit of tactile WM was already reached with two items. The fact that tactile WM capacity was substantially higher in the nonlateralized WM task used in Experiment 2 shows that more than two tactile items can be successfully maintained in some conditions (see below for further discussion). It remains to be determined whether tCDA enhancements beyond a load of two tactile items can be obtained in other task contexts. Importantly, any difference between CDA and tCDA asymptotes does not affect our key finding that the load-dependent modulations of CDA and tCDA amplitudes were strictly modality-specific, as demonstrated by the fact that these amplitudes remained entirely unaffected by manipulations of WM load in the other modality.
To rule out the possibility that the absence of crossmodal load effects was due to the specific task demands of Experiment 1, we ran a second behavioral experiment with a nonlateralized design where all sample stimuli were task-relevant. Visual and tactile load was two or four items, the spatial WM task in the visual modality was replaced by a color change detection task, and unimodal baseline trials were included. The results of Experiment 2 fully confirmed the findings of Experiment 1, with strictly modality-specific load effects and no evidence for any crossmodal interference. Capacity estimates on baseline trials confirmed that a load of four items was sufficient to exhaust the capacity of visual and tactile stores. Furthermore, the design of Experiment 2 prevented participants from reducing effective WM load by grouping locations in the visual task or remembering nonstimulated locations in the tactile task. The fact that load effects remained entirely modality-specific in this experiment thus suggests that the analogous pattern observed in Experiment 1 was not due to insufficient demands on storage capacity but instead reflects the independence of WM maintenance processes in different modalities.
It is notable that WM performance differed considerably between these two experiments, with much better performance in Experiment 2. This difference was particularly pronounced for the tactile WM task, in spite of the fact that participants had to memorize stimulated locations in both experiments. Even on tactile Load 1 trials, accuracy was well below 100% in Experiment 1. The improved tactile WM performance in Experiment 2 is most likely due to the fact that a nonlateralized WM task was used, in which all tactile sample stimuli on both hands had to be memorized. In contrast to the lateralized task in Experiment 1, there was no longer any interference from stimulated locations on the other, unattended hand, and the average distance between two tactile stimuli on the same hand was larger. The finding that approximately three tactile stimuli could be successfully retained on Load 4 trials in Experiment 2 demonstrates that, under such optimal conditions, the capacity of tactile WM stores appears to be approximately three items. Visual WM accuracy was also better with the highly distinguishable color stimuli used in Experiment 2 relative to the spatial WM task with monochrome stimuli in Experiment 1. Previous research has shown that visual WM performance is affected by the features that have to be memorized, with tasks involving color typically yielding better performance than tasks where other stimulus dimensions have to be retained (e.g., orientation or shape; Woodman & Vogel, 2008; Awh et al., 2007; Alvarez & Cavanagh, 2004). In addition, some interference from stimuli in the unattended visual field may also have contributed to the lower visual WM performance in Experiment 1. However, the behavioral estimate of WM capacity in Experiment 2 (K = 3.25 items) is in line with the parametric load-dependent CDA enhancements observed in Experiment 1 (for up to three visual items).
What does the absence of crossmodal interference effects on performance in both experiments and on CDA and tCDA amplitudes in Experiment 1 imply for the nature of mechanisms that control the storage of information in WM? It is established that WM and selective attention are closely intertwined (Gazzaley & Nobre, 2012; Ruchkin, Grafman, Cameron, & Berndt, 2003; Awh & Jonides, 2001) and that attentional mechanisms underpin the active maintenance of WM representations (e.g., Emrich, Lockhart, & Al-Aidroos, 2017; Awh, Vogel, & Oh, 2006). Attention optimizes WM representations in a goal-directed fashion (Myers, Stokes, & Nobre, 2017; Lepsien & Nobre, 2006), and the allocation of attention to task-relevant items in WM enhances performance (e.g., Griffin & Nobre, 2003). In line with these ideas, electrophysiological evidence suggests that lateralized delay activity (such as the tCDA/CDA) does not reflect information storage as such, but more specifically the attentional activation of representations of memorized stimuli in sensory cortex (e.g., Berggren & Eimer, 2016; Katus & Eimer, 2015; Kuo, Stokes, & Nobre, 2012). This is analogous to the early interpretation of delay activity in the pFC of monkeys as the indication of a top–down attentive process (Fuster & Alexander, 1971). Although passive mechanisms may also be involved in the short-term storage of information (Mongillo, Barak, & Tsodyks, 2008; for a review of activity-silent WM, see Stokes, 2015), CDA/tCDA components reflect activation-related aspects of WM maintenance that are mediated by selective attention (Katus & Müller, 2016; Katus & Eimer, 2015; Unsworth, Fukuda, Awh, & Vogel, 2015; Vogel, McCollough, & Machizawa, 2005). 
If these active maintenance processes were limited by the capacity of a central attention mechanism (Cowan, 2011), they should be adversely affected by increasing WM load in another modality, provided that this results in an overall bimodal WM load that exceeds the capacity of this domain-general mechanism. However, this study found that increasing multisensory load above the three- to four-item capacity limit of unimodal WM (Vogel & Machizawa, 2004; Cowan, 2001) did not produce any crossmodal interference effects for CDA and tCDA amplitudes. The absence of such effects suggests that the maintenance processes indexed by the tCDA/CDA components are mediated by modality-specific attention mechanisms with independent capacities for tactile and visual information that are activated in parallel during the maintenance of multisensory information.
Such modality-specific attentional control processes operate within hierarchically organized WM systems (Brady, Konkle, & Alvarez, 2011), which are controlled in a top–down fashion by higher-level executive mechanisms (e.g., Katus et al., 2017). This distributed nature of WM (Christophel, Klink, Spitzer, Roelfsema, & Haynes, 2017; Fuster, 2009) can account for the fact that the capacity of multisensory WM (i.e., the number of multisensory items that can be recalled at memory test) exceeds the capacity of unimodal WM (Fougnie et al., 2015; Cowan et al., 2014; Fougnie & Marois, 2011). In such a distributed processing architecture, capacity limitations can arise due to the competition between stimulus representations that are stored in the same cortical map (in somatosensory vs. retinotopic cortex, for tactile vs. visual information; cf. cortical real estate hypothesis: Bergmann et al., 2016; Franconeri, Alvarez, & Cavanagh, 2013) and due to capacity limitations of the maintenance processes that keep these sensory representations in an active state (as indexed by the tCDA/CDA in tactile/visual WM tasks). Rather than reflecting competition between multisensory items for representation in a central WM store and/or for domain-unspecific attention resources (Cowan, 2011; Saults & Cowan, 2007), crossmodal interference effects observed in bimodal WM tasks are likely to reflect factors that are unrelated to WM capacity (e.g., costs that arise during dual-task coordination or during the simultaneous encoding of multisensory stimuli, response selection, etc.; see Fougnie et al., 2015; Brisson & Jolicœur, 2007; Cocchini et al., 2002, for further discussion). Competitive interactions between modality-specific maintenance processes may also contribute to such costs, given that these processes rely on feedback signals from a common source (such as a central executive; Baddeley, 2003).
This is most likely to happen in bimodal WM tasks with extremely high load (e.g., 10 multisensory items, as in Cowan et al., 2014), as such tasks may compromise the ability of the central executive system to effectively coordinate and sustain concurrent activation processes within different sensory modalities (cf. Tamber-Rosenau & Marois, 2016).
Building on evidence that WM recruits sensory mechanisms for information storage, we here show that WM additionally recruits modality-specific control mechanisms to regulate the activation of stimulus representations in somatosensory and visual cortex. The parallel functioning of such distributed processes during the retention of multisensory information explains the absence of crossmodal load effects on behavioral and electrophysiological measures of WM and can also account for the enhanced capacity of multisensory WM relative to unimodal WM.
This work was funded by the Leverhulme Trust (grant RPG-2015-370). We thank Sue Nicholas for help in setting up the tactile stimulation hardware, Andreas Widmann for providing EEGLab plug-ins for digital filtering and spherical spline interpolation, and Anna Grubert for comments on the manuscript.
Reprint requests should be sent to Tobias Katus, Department of Psychology, Birkbeck, University of London, Malet Street, London, UK, WC1E 7HX, or via e-mail: email@example.com.
To assess whether behavioral measures reflected a tradeoff between the number of tactile and visual items maintained in WM, we calculated ΔK to obtain a normalized measure of any interference between the tactile and visual tasks. The ΔK measure (Fougnie & Marois, 2011) quantifies dual-task interference relative to single-task baseline conditions in terms of a value ranging between 0% (reflecting fully independent WM capacities for two tasks/modalities) and 50% (fully shared WM capacity). ΔK for trials where load was four in both modalities was on average 0.4% (relative to the unimodal four-item baselines). ΔK values were significantly below 50% (t(11) = 12.530, p < 10⁻⁷, BF10 > 10⁴), but not different from 0% (t(11) = 0.112, p = .913, BF01 = 3.461), indicating distinct rather than shared capacities for tactile and visual WM.
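A minimal sketch of a ΔK-style normalization follows. The exact formula of Fougnie and Marois (2011) is not given here, so the version below is an assumption chosen to reproduce the stated endpoints: it expresses the loss of summed capacity from single-task to dual-task conditions, which yields 0% when both modalities keep their single-task capacity and 50% when the two modalities together hold only as much as one modality did alone (given equal single-task capacities). The one-sample t helper uses only the standard library, and all input values in the example are hypothetical.

```python
import math
import statistics


def delta_k(k_single_a, k_single_b, k_dual_a, k_dual_b):
    """Proportional loss of summed capacity from single- to dual-task trials.

    Assumed normalization: 0.0 if both modalities keep their single-task K
    (independent stores); 0.5 if the two modalities together retain only as
    much as one modality did alone (fully shared store, equal single-task K).
    """
    single_total = k_single_a + k_single_b
    dual_total = k_dual_a + k_dual_b
    return (single_total - dual_total) / single_total


def one_sample_t(values, mu):
    """t statistic for H0: population mean == mu (df = len(values) - 1)."""
    n = len(values)
    return (statistics.mean(values) - mu) / (statistics.stdev(values) / math.sqrt(n))


if __name__ == "__main__":
    # Hypothetical single-task and dual-task K values (tactile, visual) at Load 4.
    print(f"{delta_k(3.1, 3.3, 3.1, 3.2):.1%}")
    # Hypothetical per-participant ΔK values, tested against 0% and 50%.
    group = [0.02, -0.03, 0.01, 0.00, -0.01, 0.02]
    print(one_sample_t(group, 0.0), one_sample_t(group, 0.5))
```

Testing the same ΔK values against both endpoints, as in the analysis above, distinguishes the two accounts: a mean reliably below 50% but indistinguishable from 0% favors independent capacities.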