Selective attention regulates the activation of working memory (WM) representations. Retro-cues, presented after memory sample stimuli have been stored, modulate these activation states by triggering shifts of attention to task-relevant samples. Here, we investigated whether the control of such attention shifts is modality-specific or shared across sensory modalities. Participants memorized bilateral tactile and visual sample stimuli before an auditory retro-cue indicated which visual and tactile stimuli had to be retained. Critically, these cued samples were located on the same side or opposite sides, thus requiring spatially congruent or incongruent attention shifts in tactile and visual WM. To track the attentional selection of retro-cued samples, tactile and visual contralateral delay activities (tCDA and CDA components) were measured. Clear evidence for spatial synergy effects from attention shifts in visual WM on concurrent shifts in tactile WM was observed: Tactile WM performance was impaired, and tCDA components triggered by retro-cues were strongly attenuated on opposite-sides relative to same-side trials. These spatial congruency effects were eliminated when cued attention shifts in tactile WM occurred in the absence of simultaneous shifts within visual WM. Results show that, in contrast to other modality-specific aspects of WM control, concurrent attentional selection processes within tactile and visual WM are mediated by shared supramodal control processes.
Working memory (WM) is responsible for the maintenance of perceptual information that is no longer physically present and for making this information accessible to other cognitive and response-related processes (e.g., Baddeley, 2012). It is generally believed that spatial attention plays a central role during the encoding and storage of sensory signals in WM (see Awh, Vogel, & Oh, 2006, for discussion). WM maintenance relies on the same frontoparietal networks that are also involved in the control of spatial attention (Awh & Jonides, 2001). Decreasing the distance between memorized stimuli reduces WM precision and increases binding errors in WM (Ahmad et al., 2017; Emrich & Ferber, 2012), reflecting a competition for spatial attention during encoding and WM maintenance of visual objects. This suggests that spatial attention is responsible for the binding of stimulus features and the formation of object-based visual representations not only in perception (e.g., Robertson, 2003) but also in WM.
If spatial attention is critical for the encoding of sensory stimuli into WM, these encoding processes should be affected by the spatial location of these stimuli. This factor should be particularly important for the concurrent encoding of to-be-memorized stimuli from different sensory modalities in multisensory WM tasks. It has been suggested that WM for stimuli from different sensory modalities relies on shared top–down attentional control mechanisms (e.g., Cowan, 2011). If this were the case, spatial synergies should be observed when to-be-memorized sample stimuli from different modalities are presented simultaneously. For example, when visual and tactile sample stimuli appear on opposite sides, concurrent attention shifts in opposite directions are required during the encoding of these stimuli. If the underlying attentional control processes are linked across vision and touch, this should result in costs relative to situations where these stimuli appear on the same side. Such spatial synergies have been demonstrated in previous behavioral and electrophysiological experiments on crossmodal links in spatial attention (e.g., Eimer, van Velzen, & Driver, 2002; Spence & Driver, 1996) for the perceptual processing of multisensory stimuli. Allocating attention to concurrent visual, auditory, or tactile stimuli was more efficient when these stimuli appeared at the same location than when they were presented on opposite sides, suggesting that these attention shifts are controlled by shared central mechanisms (see Eimer & Driver, 2001, for further discussion). However, two recent bimodal WM studies from our lab (Katus & Eimer, 2016, 2019b) have failed to find any evidence for such spatial congruency effects for the encoding and maintenance of visual and tactile sample stimuli. 
The goal of the present experiments was to investigate whether such crossmodal spatial synergies might emerge in tasks where attention has to be allocated retrospectively to visual and tactile items that are already stored in WM.
In the first of these studies (Katus & Eimer, 2016), visual and tactile samples were presented bilaterally, but only stimuli on one of these sides had to be memorized in each modality. In different blocks, task-relevant visual and tactile samples were either located on the same side or on opposite sides. To assess whether the spatial congruency of these samples affected visual and tactile WM encoding and maintenance processes, EEG was recorded during task performance to measure the visual contralateral delay activity (CDA) and its tactile equivalent, the tCDA component. The CDA and tCDA are elicited contralateral to the side of task-relevant visual sample stimuli (e.g., Vogel & Machizawa, 2004) or tactile sample stimuli (e.g., Katus & Eimer, 2015). Both components are sensitive to WM load and individual differences in WM capacity (Katus, Grubert, & Eimer, 2015; Vogel & Machizawa, 2004) and have modality-specific topographies over visual and somatosensory cortex, respectively. This indicates that they reflect the activation of WM representations in modality-specific perceptual cortical areas, as proposed by the sensory recruitment account of WM (Postle, 2006; Jonides, Lacey, & Nee, 2005). In that study, visual CDA and tCDA components were elicited over the same hemisphere in same-side blocks and over different hemispheres in opposite-sides blocks, contralateral to the respective task-relevant visual and tactile sample items. Critically, no crossmodal spatial congruency effects were found; visual CDA and tCDA were elicited at the same time and were identical in amplitude in same-side and opposite-sides blocks, and there were also no costs for WM performance in opposite-sides blocks.
Similar results were observed in our second bimodal tactile/visual WM study (Katus & Eimer, 2019b). Here, participants memorized task-relevant visual and tactile stimuli in a first memory sample set (S1) before encoding relevant samples in a second sample set (S2). Critically, relevant visual or tactile S2 samples appeared unpredictably on the same side or on the side opposite to the task-relevant S1 samples. Thus, the foci of tactile and visual attention could either be maintained on the same side or had to be redirected to opposite sides in the period following the S2 samples. As expected, CDA and tCDA components reversed polarity on trials where visual or tactile S1 and S2 samples appeared on opposite sides, reflecting shifts of spatial attention on these trials. Importantly, however, there were no crossmodal interactions in spatial selection: The visual CDA was entirely unaffected by concurrent attention shifts in tactile WM, and the tCDA was insensitive to shifts of attention in visual WM. Visual and tactile WM performance was also not modulated by attention shifts in the other modality. Overall, the results of these two studies suggest that the control processes that allocate spatial attention during the encoding and subsequent maintenance of visual and tactile sample stimuli are not linked but operate in an entirely independent modality-specific fashion (see also Katus & Eimer, 2018, 2019a, for analogous evidence for independent capacities of spatial WM in touch and vision).
In these previous experiments, attention shifts could already take place during the encoding of the sample displays into WM, as the identity of the task-relevant target samples was known in advance (i.e., before the samples were presented). However, selective spatial attention does not only mediate encoding but can also selectively modulate the activation states of representations that are already stored in WM. This has been demonstrated in experiments where retro-cues presented during the delay period specified a subset of task-relevant visual sample stimuli (e.g., Lepsien & Nobre, 2006; Griffin & Nobre, 2003). These retro-cues produced clear benefits for WM performance, demonstrating that attention modulates WM representations in line with task goals. ERP studies have further shown that retro-cues trigger visual CDA (Kuo, Stokes, & Nobre, 2012; Eimer & Kiss, 2010) and tCDA (Katus, Müller, & Eimer, 2015) components, reflecting changes in the attentional activation states of visual and tactile WM representations in bimodal memory tasks (Katus, Grubert, & Eimer, 2017).
Previous retro-cue studies demonstrate that spatial attention can be selectively allocated to representations that are already stored in WM. This raises the question whether the retrospective selection of information in WM is mediated by processes that operate in a strictly modality-specific fashion or whether these processes are linked across different modalities, resulting in crossmodal interactions during spatial selection. The goal of the current study was to investigate this question. We employed bimodal WM tasks where visual/tactile sample stimuli were followed by retro-cues that marked a subset of the visual and tactile samples as relevant for a comparison with a subsequent test stimulus (unpredictably vision or touch). In Experiment 1, participants had to initially encode four visual samples presented in both visual fields and two tactile samples presented to both hands. These samples were either filled or unfilled (see Methods section for details), and an auditory retro-cue presented 700 msec after the sample stimuli indicated whether filled or unfilled visual and tactile samples had to be retained (see Figure 1). Once the retro-cue appeared, attention could be allocated to the relevant visual and tactile sample items, and irrelevant samples could be dropped from WM. The critical manipulation concerned the spatial congruency of the visual and tactile samples that had to be maintained. On same-side trials, these samples had been presented on the same side (both left or both right). On opposite-sides trials, their spatial relationship was incongruent (visual samples on the left and tactile samples on the right, or vice versa).
CDA and tCDA components were measured during the intervals between samples and retro-cues and between retro-cues and test displays, separately for same-side and opposite-sides trials, and WM performance was also compared between these two types of trials. The critical question was whether spatial congruency would affect the selective activation of retrospectively cued visual and tactile WM representations and subsequent WM performance. If retrospective attention shifts within visual and tactile WM are mediated by shared top–down control processes, spatial synergies should emerge. This should be reflected by attenuated visual CDA and tCDA components in response to retro-cues on opposite-sides trials relative to same-side trials and by impaired visual and tactile WM performance on opposite-sides trials. Alternatively, the absence of such spatial congruency effects would suggest that the allocation of attention to visual and tactile representations that are already stored in WM is not controlled by a central supramodal system but by independent modality-specific processes.
Sixteen participants (mean age = 29 years; nine women, one left-handed) took part in Experiment 1. Three additional participants were excluded from statistical analysis because of excessive alpha activity and EEG artifacts that led to the exclusion of more than 30% of trials. All participants were neurologically unimpaired and gave informed written consent before testing. The experiment was conducted in accordance with the Declaration of Helsinki and was approved by the Psychology Ethics Committee of Birkbeck, University of London.
Stimulus Material and Setup
Participants were seated in a dimly lit recording chamber with their hands covered from sight. Headphones played continuous pink noise during EEG recordings to mask any sounds produced by tactile stimulation. Tactile stimuli (100 Hz sinusoids, intensity 0.37 N, duration 250 msec) were delivered by eight mechanical stimulators that were attached to the left and right hands' distal phalanges of the index, middle, ring, and little fingers. The stimulators were driven by custom-built amplifiers, controlled by MATLAB routines (The MathWorks) via an eight-channel sound card (M-Audio, Delta 1010LT). There were two types of tactile sample stimuli (filled and unfilled). For filled stimuli, a continuous vibration was presented for 250 msec. Unfilled stimuli consisted of two 20-msec pulses, separated by a 210-msec delay. Tactile memory test stimuli consisted of two 60-msec pulses separated by a 130-msec delay.
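The temporal structure of the three tactile stimulus types can be made explicit in a short sketch. The segment durations and the 100-Hz carrier are taken from the description above; the output sample rate and all function names are hypothetical and serve illustration only, and this is not the MATLAB routine used in the experiment.

```python
import math

CARRIER_HZ = 100   # vibration frequency given in the text
SRATE_HZ = 44100   # hypothetical output sample rate (not stated in the text)

# (duration in msec, vibration on?) segments, as described in the text
ENVELOPES = {
    "filled":   [(250, True)],
    "unfilled": [(20, True), (210, False), (20, True)],
    "test":     [(60, True), (130, False), (60, True)],
}

def total_duration_ms(kind):
    """Sum of all segment durations for one stimulus type."""
    return sum(ms for ms, _ in ENVELOPES[kind])

def waveform(kind, srate=SRATE_HZ):
    """Return a gated 100-Hz sinusoid as a list of floats in [-1, 1]."""
    samples = []
    for ms, vibration_on in ENVELOPES[kind]:
        n = round(ms * srate / 1000)
        start = len(samples)
        for i in range(start, start + n):
            if vibration_on:
                samples.append(math.sin(2 * math.pi * CARRIER_HZ * i / srate))
            else:
                samples.append(0.0)  # silent gap between the two pulses
    return samples
```

All three stimulus types sum to the same overall duration of 250 msec, so they differ only in their internal gap structure.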
Visual stimuli were shown at a viewing distance of 100 cm against a dark gray background on a 22-in. monitor (Samsung SyncMaster 2233; 100-Hz refresh rate, 16-msec response time). All visual stimuli were presented for 250 msec. Analogous to the tactile stimulus material, three types of visual stimuli were used (filled samples, unfilled samples, test stimuli). These visual stimuli were presented against a background of black crosshairs (four lines at 0°, 45°, 90°, and 135° of polar angle; horizontal/vertical eccentricity: 3.49° of visual angle) and two concentric black rings around the fixation dot. This background remained constantly on-screen during task performance. The eccentricity of the two rings was 2.52° and 5.04°. Visual sample and test stimuli were presented on top of these rings (see Figure 1). Their size was scaled for eccentricity (0.34° vs. 0.46° for stimuli on the inner vs. outer ring, respectively). Visual sample stimuli were filled white circles or unfilled white outline rings (line width: 0.12 and 0.16 cm, for inner and outer stimuli). Visual test stimuli were white outline rings with a small white dot in the ring's center (diameter: 0.12 and 0.16 cm for inner and outer stimuli).
Two different auditory retro-cues (600 Hz, 200-msec duration with 5-msec ramps, presented via headphones) signaled the stimulus type in the bimodal sample set (filled or unfilled) that had to be retained. Task-relevant filled tactile/visual stimuli were indicated by a continuous tone. Task-relevant unfilled stimuli were signaled by a retro-cue consisting of two 50-msec tones separated by a 100-msec interval. Responses were given vocally during the 1800-msec period after the presentation of the test stimulus set (“a” for match and “e” for mismatch, see below). They were recorded by a headset microphone, were analyzed online with custom-written voice-key routines, and were manually rechecked for correctness after the experiment.
Task Design and Stimulation Procedure
Simultaneously presented tactile and visual sample stimuli (250 msec duration) were followed after 700 msec by an auditory retro-cue (200 msec duration) and after 2000 msec by a unimodal tactile (50%) or visual (50%) test stimulus set (250 msec duration). Tactile WM load was one item per side, and visual WM load was two items per side. Tactile sample sets included one filled and one unfilled stimulus, presented to randomly determined fingers of the left and right hand. A pair of filled visual samples was presented in one hemifield, and a pair of unfilled samples was presented in the other hemifield. The precise locations of these visual sample stimuli were sampled from 102 angular positions (in polar coordinates, left side: 130°–230°, right side: 310°–50°), with the constraint that the sampled positions were at least 25° apart. For each side, the two selected positions were randomly assigned to the inner and outer rings on the monitor, with one sample being shown on top of each ring (see Figure 1). Crucially, tactile and visual sample stimuli of the same type (filled or unfilled) were either located on the same side or on opposite sides. Same-side and opposite-sides trials were equally likely and varied unpredictably within each block.
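The position constraint for the visual samples can be illustrated with a simple rejection-sampling sketch. The angular ranges and the 25° minimum separation come from the description above; the integer-degree grid, the use of rejection sampling, and all names are assumptions, as the text does not specify how the positions were drawn.

```python
import random

MIN_SEPARATION_DEG = 25  # constraint stated in the text

# Angular ranges per side. The right side is written unwrapped (-50..50),
# so that 310 deg corresponds to -50 deg. Integer-degree steps are an assumption.
SIDE_RANGES = {"left": range(130, 231), "right": range(-50, 51)}

def sample_side(side, rng):
    """Draw two angles on one side that are at least 25 deg apart and
    assign them randomly to the inner and outer ring."""
    candidates = list(SIDE_RANGES[side])
    while True:  # redraw until the separation constraint is met
        a, b = rng.sample(candidates, 2)
        if abs(a - b) >= MIN_SEPARATION_DEG:
            break
    rings = ["inner", "outer"]
    rng.shuffle(rings)
    return {rings[0]: a % 360, rings[1]: b % 360}

def sample_display(rng):
    """Positions (polar angle in deg) for both hemifields of one sample display."""
    return {side: sample_side(side, rng) for side in SIDE_RANGES}
```

Calling `sample_display(random.Random())` returns one angle for the inner and one for the outer ring on each side.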
Participants first had to encode all visual and tactile sample stimuli on both sides. Following the retro-cue (a continuous tone or two tones with a gap, presented with equal probability and in random order in each block), they had to selectively maintain the locations of the cued task-relevant tactile and visual samples (filled or unfilled) only. Memory was unpredictably tested for touch or vision (50% each), with test stimulus sets containing one stimulus on the left side and one on the right side. Participants had to decide whether the location of a cued visual or tactile sample matched the location of a (visual or tactile) test stimulus. This was the case on match trials (50%). On mismatch trials (50%), the test stimulus location differed from the memorized sample location. For tactile mismatch trials, the stimulus was randomly delivered to one of the three fingers where no sample had been presented. For visual mismatch trials, the position of one of the two relevant samples was randomly shifted either upward or downward by 30° on its ring in the test display. The task-irrelevant uncued visual or tactile samples could also spatially match or mismatch with the stimuli on the irrelevant (uncued) side of the test stimulus. Spatial matches or mismatches between sample and tests were independently randomized for the cued (relevant) and uncued (irrelevant) test stimulus locations.
The experiments comprised 528 trials each that were run in 12 blocks. There were four experimental conditions (spatial congruency: relevant visual and tactile samples on the same vs. opposite sides; tested modality: touch vs. vision) with 132 trials each, which unpredictably alternated within each block. Training was run before each experiment (depending on individual performance between ∼20 and 44 trials). Feedback about the proportion of correct responses was given after each block.
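The trial counts follow directly from the design: four conditions with 132 trials each give 528 trials, or 44 trials in each of the 12 blocks. The sketch below builds such a trial list. Balancing the four conditions within each block (11 trials per condition) is an assumption; the text only states that conditions alternated unpredictably within blocks.

```python
import itertools
import random

N_BLOCKS = 12
TRIALS_PER_CONDITION = 132
CONDITIONS = list(itertools.product(("same-side", "opposite-sides"),
                                    ("touch", "vision")))

def build_blocks(rng):
    """Return 12 shuffled blocks of (spatial congruency, tested modality) trials."""
    per_block = TRIALS_PER_CONDITION // N_BLOCKS  # 11 per condition (assumed balance)
    blocks = []
    for _block in range(N_BLOCKS):
        block = [cond for cond in CONDITIONS for _rep in range(per_block)]
        rng.shuffle(block)  # unpredictable order within the block
        blocks.append(block)
    return blocks
```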
Acquisition and Preprocessing of EEG Data
EEG data, sampled at 500 Hz using a BrainVision amplifier, were DC recorded from 64 Ag/AgCl active electrodes at standard locations of the extended 10–20 system. Two electrodes at the eyes' outer canthi monitored horizontal eye movements (horizontal EOG, HEOG). Continuous EEG was acquired with left mastoid reference and rereferenced to the arithmetic mean of both mastoids for data preprocessing. Data were offline submitted to a 20-Hz low-pass filter (Blackman window, filter order 1000). Epochs were extracted for the 2-sec period following the sample set and were corrected for a 200-msec prestimulus baseline that was taken before the onset of the tactile/visual sample stimuli.
Artifact Rejection and Correction
Trials with saccades were rejected using a differential step function that ran on the bipolarized HEOG (step width 200 msec, threshold 30 μV). Independent component analysis (Delorme, Sejnowski, & Makeig, 2007) was subsequently used to correct for frontal artifacts such as eye blinks and residual traces of horizontal eye movements that had not been detected by the step function. We rejected trials in which difference values for corresponding left- minus right-hemispheric electrodes exceeded a fixed threshold of ±50 μV for at least two electrode pairs. We also excluded trials where amplitudes at any electrode exceeded a fixed 150-μV threshold. The remaining epochs entered Fully Automated Statistical Thresholding for EEG Artifact Rejection (Nolan, Whelan, & Reilly, 2010) for the interpolation of noisy electrodes and were subsequently converted to current source densities (CSDs: iterations = 50, m = 4, lambda = 10^−5; Tenke & Kayser, 2012); 98.4% of epochs remained for statistical analysis on average. Statistical tests were based on correct and incorrect trials, as the exclusion of incorrect trials did not change the pattern of results but would have decreased the signal-to-noise ratio of EEG data.
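A step-function detector of the kind mentioned above can be sketched as follows. The sampling rate, step width, and threshold are taken from the text. The interpretation that the mean HEOG amplitude is compared between two adjacent windows, each one step width long, is an assumption, and the function name is hypothetical; the routine actually used may differ in these details.

```python
SRATE_HZ = 500        # sampling rate given in the text
STEP_WIDTH_MS = 200   # step width given in the text
THRESHOLD_UV = 30.0   # threshold given in the text

def has_saccade(heog_uv, srate=SRATE_HZ, width_ms=STEP_WIDTH_MS,
                threshold=THRESHOLD_UV):
    """Flag an epoch if the mean bipolar HEOG in two adjacent sliding
    windows differs by more than the threshold (in microvolts)."""
    w = int(width_ms * srate / 1000)  # samples per window
    prefix = [0.0]                    # prefix sums for fast window means
    for value in heog_uv:
        prefix.append(prefix[-1] + value)
    for start in range(0, len(heog_uv) - 2 * w + 1):
        first = (prefix[start + w] - prefix[start]) / w
        second = (prefix[start + 2 * w] - prefix[start + w]) / w
        if abs(second - first) > threshold:
            return True
    return False
```

In this sketch, a sustained 40-μV HEOG shift within an epoch is flagged, whereas a 20-μV shift is not.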
Selection of electrodes and time windows; topographical maps.
We separately averaged CSDs across three adjacent electrodes contralateral and ipsilateral to the task-relevant side. As in prior studies (Katus & Eimer, 2016, 2018, 2019a), the tCDA and visual CDA components were measured at lateral central (tCDA: C3/4, FC3/4, CP3/4) and occipital scalp regions (CDA: PO7/8, PO3/4, O1/2). Statistical tests were conducted on difference values of contra- minus ipsilateral CSDs averaged between 1050 and 2000 msec after the sample set, corresponding to a time window of 350–1300 msec following the retro-cue (cf. McCants, Katus, & Eimer, 2020; Katus, Müller, et al., 2015).
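The relation between the two time windows, and the computation of the lateralized amplitude, can be written out explicitly. The sketch assumes epochs that begin with the 200-msec prestimulus baseline and contralateral/ipsilateral waveforms that have already been averaged across the three electrodes of a cluster; the function names are illustrative and not taken from the original analysis code.

```python
SRATE_HZ = 500          # sampling rate given in the text
EPOCH_START_MS = -200   # epochs include a 200-msec prestimulus baseline
CUE_ONSET_MS = 700      # retro-cue onset relative to sample onset

def window_after_samples(cue_window_ms):
    """Convert a window defined relative to the retro-cue into a window
    defined relative to sample onset."""
    start, end = cue_window_ms
    return (start + CUE_ONSET_MS, end + CUE_ONSET_MS)

def lateralized_amplitude(contra, ipsi, window_ms):
    """Mean contra-minus-ipsilateral difference within window_ms
    (window given relative to sample onset)."""
    first = int((window_ms[0] - EPOCH_START_MS) * SRATE_HZ / 1000)
    last = int((window_ms[1] - EPOCH_START_MS) * SRATE_HZ / 1000)
    diffs = [c - i for c, i in zip(contra[first:last], ipsi[first:last])]
    return sum(diffs) / len(diffs)
```

With these definitions, `window_after_samples((350, 1300))` yields `(1050, 2000)`, which is the measurement window stated above.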
Spline-interpolated voltage maps illustrate the topographical distribution of lateralized activity in the period after the cue. These maps were obtained by subtracting ipsilateral CSDs from contralateral CSDs, with contra-/ipsilateral referring to the side where the task-relevant visual sample stimuli had been presented. To collapse data across blocks where the visual samples on the left versus right side were task-relevant, electrode coordinates were flipped over the midline for visual left-side memory trials. Therefore, in the topographical maps, a negative potential over the left hemisphere indicates the presence of delay activity contralateral to the visual targets.
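Flipping electrode coordinates over the midline is equivalent to exchanging the data of homologous left- and right-hemisphere electrodes. A minimal sketch of this operation is given below. It relies on the standard 10–20 naming convention (odd numbers on the left, even numbers on the right, "z" for midline sites); the function names are hypothetical and this is not the procedure's original implementation.

```python
import re

def mirror_label(label):
    """Return the homologous electrode on the other hemisphere
    (10-20 naming: odd = left, even = right, 'z' = midline)."""
    match = re.fullmatch(r"([A-Za-z]+?)(\d+)", label)
    if match is None:  # midline sites such as Cz or POz carry no number
        return label
    name, number = match.group(1), int(match.group(2))
    mirrored = number + 1 if number % 2 == 1 else number - 1
    return f"{name}{mirrored}"

def flip_over_midline(data_by_electrode):
    """Swap left/right electrode data, as if the scalp map were mirrored."""
    return {mirror_label(label): values
            for label, values in data_by_electrode.items()}
```

Applying `flip_over_midline` to trials with task-relevant visual samples on the left places contralateral activity over the left hemisphere for all trials, as described above.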
Data were analyzed with paired t tests and repeated-measures ANOVAs. Error bars in graphs indicate confidence intervals for the true population mean. Thus, error bars that do not overlap with the zero axis (y = 0) indicate statistically significant tCDA/CDA components.
Bayesian t tests (Rouder, Speckman, Sun, Morey, & Iverson, 2009) and the software JASP (JASP Team, 2016) were used to calculate Bayes factors for each main effect/interaction in our statistical designs. The Bayes factor denotes the relative evidence for the alternative hypothesis as compared with the null hypothesis and thus allows for statistical inferences regarding the presence or absence of a modulation. The Bayes factor for the null hypothesis (BF01) corresponds to the inverse of the Bayes factor for the alternative hypothesis (BF10) and indexes the relative evidence in the data that an effect is absent rather than present. We report the numerically larger BF and categorize the evidence as substantial (for BFs > 3 and < 10), strong (BFs > 10 and < 30), very strong (BFs > 30 and < 100), or decisive (BFs > 100) according to the convention suggested by Jeffreys (1961).
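The reporting convention described above (inverting the Bayes factor, reporting the numerically larger value, and assigning an evidence category) can be summarized in a few lines. The category boundaries follow the text; the label "inconclusive" for Bayes factors of 3 or less is an assumption added for completeness, and the function name is hypothetical.

```python
def report_bayes_factor(bf10):
    """Return (label, value, category) for the numerically larger Bayes factor."""
    if bf10 <= 0:
        raise ValueError("Bayes factors must be positive")
    bf01 = 1.0 / bf10  # evidence for the null is the inverse of BF10
    label, value = ("BF10", bf10) if bf10 >= bf01 else ("BF01", bf01)
    if value > 100:
        category = "decisive"
    elif value > 30:
        category = "very strong"
    elif value > 10:
        category = "strong"
    elif value > 3:
        category = "substantial"
    else:
        category = "inconclusive"  # label for BF <= 3 is an assumption, not from the text
    return label, value, category
```

For example, a BF10 of 0.25 is reported as BF01 = 4 and classified as substantial evidence for the null hypothesis.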
Accuracy data were analyzed using an ANOVA with the factors Tested Modality (touch or vision) and Spatial Congruency (same-side trials, opposite-sides trials). Performance (mean 84.5% correct) was lower in the visual as compared with the tactile task (81.6% vs. 87.4%, Tested Modality: F(1, 15) = 9.074, p = .009, BF10 = 6.107), and was generally reduced on opposite-sides relative to same-side trials (83.0% vs. 86.0%, Spatial Congruency: F(1, 15) = 9.261, p = .008, BF10 = 6.434). The interaction between both factors was significant (F(1, 15) = 7.842, p = .014, BF10 = 4.288). Follow-up analyses conducted separately for trials where tactile or visual WM was tested showed that tactile WM accuracy was impaired on opposite-sides relative to same-side trials (84.4% vs. 90.4%, t(15) = 4.357, p < .001, BF10 = 62.856). In contrast, no such spatial congruency effect was found for visual WM performance (81.6% vs. 81.5% for opposite-sides vs. same-side trials, t(15) = 0.084, p = .934, BF01 = 3.903).
Amplitudes of the tCDA and CDA components were measured between 350 and 1300 msec after the onset of the retro-cue (see Methods section for electrode locations used to compute the tCDA and CDA). We examined whether the amplitudes of these components were sensitive to the spatial congruency of cued task-relevant visual and tactile sample stimuli in WM (same-side vs. opposite-sides trials). We obtained statistically significant visual CDA components in both types of trials (same-side: t(15) = 2.447, p = .027, BF10 = 2.425; opposite-sides: t(15) = 2.144, p = .049, BF10 = 1.530). The CDA did not differ in size between same-side and opposite-sides trials (t(15) = 0.254, p = .803, BF01 = 3.805), as CDA components of equal size were obtained on both types of trials (see bar graphs in Figure 2). In contrast, the tCDA was smaller on opposite-sides relative to same-side trials (t(15) = 4.394, p < .001, BF10 = 67.012). In fact, tCDA amplitudes were not significantly different from zero on opposite-sides trials (t(15) = 1.562, p = .139), although the evidence for the null hypothesis was not conclusive in this case (BF01 = 1.428). In contrast, the tCDA in same-side trials was highly reliable (t(15) = 4.691, p < .001, BF10 = 112.039).
We used retro-cue procedures to investigate whether concurrent attention shifts to visual and tactile WM representations are controlled independently for these two modalities or whether there are spatial synergies between these attentional shifts, indicative of crossmodal links. The results of Experiment 1 were clear but somewhat unexpected. The allocation of attention to retro-cued visual WM representations was entirely unaffected by the spatial congruency of simultaneous attention shifts toward task-relevant tactile WM representations. Visual CDA components emerged in the interval after a retro-cue had been presented, confirming that attention was directed toward visual WM representations that were signaled as task-relevant by these cues. CDA amplitudes did not differ between same-side and opposite-sides trials, and visual WM performance was also virtually identical on these two types of trials. These results provide strong evidence that the control of attention shifts within visual WM operates independently of any concurrent attention shifts in tactile WM. In marked contrast, these visual attention shifts produced clear spatial synergy effects for the concurrent activation of tactile WM representations during the period after the retro-cue. The amplitude of tCDA components measured during this period was strongly attenuated on opposite-sides relative to same-side trials, and the tCDA was not significantly different from zero on opposite-sides trials. Tactile WM performance was also impaired on these trials.
If the control of attention shifts within WM were shared between vision and touch, one would have predicted symmetrical spatial synergy effects for both modalities. In fact, these effects were only found for the allocation of attention to tactile but not visual WM representations in Experiment 1, indicating that attention shifts toward cued samples in visual WM affected concurrent shifts in tactile WM more strongly than vice versa. This asymmetry between sensory modalities could reflect a generic bias in the attentional control of WM toward vision (cf. Katus et al., 2017) but could also be a result of the specifics of the bimodal WM task used in this experiment. As WM load was higher in the visual task than in the tactile task (two items vs. one item), WM maintenance demands were higher for vision than for touch, which is also suggested by the observation that WM performance was lower in the visual task. Determining the identity of retrospectively cued targets, in contrast, was relatively straightforward in vision, where selection was based on a simple feature (filled versus unfilled dots). The tactile targets and distracters were harder to distinguish because the tactile filled and unfilled stimuli were initially identical, with the gap that defined unfilled samples appearing only after 20 msec. Because WM load was higher in vision and the attribute relevant for retrospective selection was more salient, participants may have prioritized attentional shifts in visual WM, resulting in spatial synergy effects from vision to touch but not vice versa.
Regardless of the exact factors responsible for this asymmetry, the fact that there were strong behavioral and electrophysiological crossmodal spatial congruency effects for tactile WM provides clear evidence that the attentional control processes activated in response to the retro-cues were linked across sensory modalities. These results differ markedly from our previous visual/tactile WM experiments that did not involve retro-cues, where no such links between the attentional control of visual and tactile WM were found (Katus & Eimer, 2016, 2019b). This striking dissociation suggests that the encoding of sensory stimuli into WM and their subsequent maintenance on the one hand and the selective allocation of attention to representations that are already stored in WM on the other are controlled by qualitatively distinct mechanisms (see also Katus, Andersen, & Müller, 2012). The former operate in a modality-specific fashion, whereas the latter are shared between sensory modalities. If this were the case, evidence for crossmodal links between visual and tactile WM should only be found in tasks where attention shifts to retro-cued visual and tactile WM representations are activated at the same time in both modalities. When such shifts within WM are only required in touch, they should not be affected by the spatial congruency of simultaneously maintained visual WM representations. Experiment 2 was conducted to test this prediction.
To determine whether spatial congruency effects on attention shifts to retro-cued tactile WM representations are eliminated in the absence of concurrent attention shifts within visual WM, we slightly modified the procedures used in Experiment 1. Stimulus presentation procedures and the timing of events remained the same, but the retro-cues were now only relevant for touch. In Experiment 2, the location of task-relevant visual sample stimuli (left or right visual hemifield) was specified before the first experimental block and remained constant for six successive blocks before it was reversed. As the location of task-relevant visual sample stimuli was known in advance, participants could immediately encode these stimuli into WM before the arrival of the retro-cue. For this reason, visual CDA components were expected to emerge already during the interval between the sample display and the retro-cue, reflecting the attentional activation of visual WM representations during this period. However, the retro-cue still indicated whether tactile sample stimuli on the left or right side had to be retained. Thus, attention shifts to task-relevant tactile WM representations could only be initiated after the retro-cue was presented, as in Experiment 1. Therefore, tCDA components should again only be present during the interval between the retro-cue and the test display.
The critical manipulation in Experiment 2 again concerned the spatial relationship between task-relevant visual and tactile sample stimuli. On same-side trials, retro-cued tactile samples were located on the side that was consistently task-relevant for the visual WM task. On opposite-sides trials, these tactile samples were located on the other side. Thus, the attention shifts within tactile WM prompted by the retro-cues were spatially congruent or incongruent with the representations in visual WM that had already been activated before the retro-cues had been presented. The central question was whether the spatial congruency effects found for tactile WM in Experiment 1 would also be found in Experiment 2. If these effects reflect crossmodal links that are specific to the control of retrospective selection processes in tactile and visual WM, they should now be absent, as no shifts of attention within visual WM were required in response to the retro-cues. Alternatively, the allocation of attention to retro-cued tactile WM representations may be more generally affected by the ongoing maintenance of visual WM representations on the same versus opposite side. In this case, Experiment 2 should find similar crossmodal spatial congruency effects for tactile WM as were observed in Experiment 1.
Sixteen neurologically unimpaired participants (mean age = 32 years; nine women, all right-handed) took part in Experiment 2. One additional participant was excluded because of excessive EEG artifacts. Informed consent and ethics approval procedures were the same as in Experiment 1.
Stimulus Material, Setup, Task Design, and Stimulation Procedure
These were all identical to Experiment 1, with one important exception: In Experiment 2, the retro-cue was only relevant for the tactile modality. The side of the task-relevant visual samples (left or right hemifield) was now indicated via verbal instruction and on the computer screen before the first experimental block and remained constant until the seventh block. The visual sample stimuli on the task-relevant side were equiprobably filled (in which case they were accompanied by unfilled distracters on the opposite side), or they were unfilled (and presented along with filled distracters on the opposite side). Whether the left- or right-side visual samples were relevant in the first or second experimental half was randomly determined for each participant. Participants therefore had to encode visual sample locations only on the relevant side and tactile sample locations on both sides before the retro-cue indicated which tactile sample (filled or unfilled) had to be retained in WM. Note that this blocking of relevant visual sample locations did not affect the spatial congruency manipulation (same-side vs. opposite-sides trials), which still varied unpredictably on a trial-by-trial basis, depending on which tactile sample was indicated as relevant by the retro-cue.
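The trial structure described above can be illustrated with a minimal sketch. It is not the authors' experiment code: the total of 12 blocks (six per visual side), the 40 trials per block, the function name `build_schedule`, and the dictionary keys are all assumptions made for illustration. The sketch only shows how a blocked visual side combined with a trial-wise tactile retro-cue still yields an unpredictable same-side versus opposite-sides sequence.

```python
import random


def build_schedule(n_blocks=12, trials_per_block=40, first_visual_side="left", seed=0):
    """Return one dict per trial for a hypothetical Experiment 2 session.

    n_blocks and trials_per_block are illustrative assumptions, not values
    reported in the text.
    """
    rng = random.Random(seed)
    other = {"left": "right", "right": "left"}
    half = n_blocks // 2
    trials = []
    for block in range(n_blocks):
        # The task-relevant visual side is fixed within each half of the
        # session and reversed once, at the start of the second half.
        visual_side = first_visual_side if block < half else other[first_visual_side]
        for _ in range(trials_per_block):
            # The retro-cue selects the relevant tactile side on every trial.
            tactile_side = rng.choice(["left", "right"])
            congruency = "same-side" if tactile_side == visual_side else "opposite-sides"
            trials.append(
                {
                    "block": block + 1,
                    "visual_side": visual_side,
                    "tactile_side": tactile_side,
                    "congruency": congruency,
                }
            )
    return trials


schedule = build_schedule()
```

Because congruency is derived from the tactile cue alone within a block, blocking the visual side leaves the congruency manipulation unpredictable from trial to trial.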
Acquisition, Preprocessing of EEG Data, Artifact Rejection, and Correction
These were all identical to Experiment 1. After artifact rejection, 98.7% of epochs remained for statistical analysis on average.
Participants correctly responded in 84.5% of all trials. An ANOVA did not yield any significant main effects or interaction. Performance did not significantly differ between trials where touch versus vision was tested (85.9% vs. 83.1% correct; Tested Modality: F(1, 15) = 1.020, p = .329, BF01 = 2.521), and there was no significant difference in performance between opposite-sides and same-side trials (83.9% vs. 85.1%; Spatial Congruency: F(1, 15) = 2.090, p = .169, BF01 = 1.635). In contrast to Experiment 1, the interaction between these two factors was not significant (F(1, 15) = 1.942, p = .184, BF01 = 1.733). Although tactile WM performance tended to be lower on opposite-sides relative to same-side trials (84.8% vs. 87.0%), this difference was not significant (t(15) = 1.711, p = .108, BF01 = 1.187).
Because the location of the task-relevant visual (but not tactile) samples was already known before the onset of the retro-cues in Experiment 2, the visual CDA component was expected to emerge shortly after the presentation of the sample set, whereas the tCDA should only appear after the retro-cue, as in Experiment 1. These predictions were confirmed. In the period before the retro-cue (i.e., 300–700 msec after presentation of the sample set), a reliable CDA was elicited (t(15) = 4.144, p = .001, BF10 = 43.307), whereas there was no evidence for the presence of a contralateral negativity over somatosensory cortex (t(15) = 0.115, p = .910, BF01 = 3.892; see Figure 3). The absence of any lateralized effect at central electrodes (i.e., no tCDA) during the period where a strong visual CDA was already present over posterior electrodes demonstrates that our CSD analysis method was successful in preventing volume conduction of electrical activity over visual areas to more anterior sites.
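The tests against zero reported here have the form of one-sample t tests on lateralized amplitudes. The sketch below shows that computation using only the Python standard library, under the assumption that each participant contributes one mean contralateral-minus-ipsilateral amplitude for the measurement window. The function names and the example values are hypothetical, and the p value is obtained by numerical integration of the t density rather than by whatever statistics package was actually used.

```python
import math
from statistics import mean, stdev


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value via Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_half = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * central_half)


def one_sample_t(values):
    """Test whether the mean of per-participant amplitudes differs from zero."""
    n = len(values)
    t = mean(values) / (stdev(values) / math.sqrt(n))
    return t, n - 1, two_tailed_p(t, n - 1)


# Hypothetical lateralized amplitudes (contralateral minus ipsilateral),
# one value per participant; these are not data from the study.
lateralized = [-0.8, -1.1, -0.2, -0.9, 0.1, -1.4, -0.6, -0.3]
t_value, df, p_value = one_sample_t(lateralized)
```

The same `two_tailed_p` helper can be used to cross-check a reported statistic, for example `two_tailed_p(4.144, 15)` for the pre-cue CDA test.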
During the period after the retro-cue, both the tCDA and CDA components were present in all experimental conditions (tCDA same-side: t(15) = 2.223, p = .042, BF10 = 1.720; tCDA opposite-sides: t(15) = 4.408, p = .001, BF10 = 68.684; CDA same-side: t(15) = 4.025, p = .001, BF10 = 35.187; CDA opposite-sides: t(15) = 2.730, p = .016, BF10 = 3.8165; see also Figure 3, where statistically significant lateralized effects are marked by error bars that do not overlap with the zero axis). As in Experiment 1, the CDA amplitudes did not differ between same-side and opposite-sides trials (t(15) = 0.972, p = .346, BF01 = 2.601). However, in marked contrast to Experiment 1, there was no longer a significant difference between tCDA amplitudes on opposite-sides and same-side trials (t(15) = 0.092, p = .928, BF01 = 3.900).
Comparison between Experiments
To formally assess whether the congruency effects (for same- vs. opposite-sides trials) in tCDA amplitudes and tactile WM accuracy differed between the two experiments, we analyzed the behavioral and ERP results obtained in Experiments 1 and 2. We used mixed ANOVAs with the between-subject factor Experiment and the within-subject factor Spatial Congruency. For tCDA amplitudes, we found no main effect of Experiment (F(1, 30) = 0.316, p = .578, BF01 = 2.874) and only a marginal main effect of Spatial Congruency (F(1, 30) = 3.800, p = .061, BF10 = 1.376). However, the interaction was statistically significant (Experiment × Spatial Congruency: F(1, 30) = 4.295, p = .047, BF10 = 2.242), indicating that spatial congruency effects on tCDA amplitudes were indeed more pronounced in Experiment 1 relative to Experiment 2. For tactile WM accuracy, an analogous mixed ANOVA revealed a reliable main effect of Spatial Congruency (F(1, 30) = 18.169, p < .001, BF10 = 67.174) and a significant interaction between Experiment × Spatial Congruency (F(1, 30) = 4.976, p = .033, BF10 = 2.019), again reflecting a larger congruency effect in Experiment 1. The main effect of Experiment was not significant for tactile WM performance (F(1, 30) = 0.267, p = .609, BF01 = 1.764). Analogous analyses were also conducted for CDA amplitudes and visual WM performance. As expected, there were no significant effects for CDA amplitudes (all ps > .1) or visual WM accuracy (all ps > .6), confirming that no spatial congruency effects were present for these measures in either experiment. Although CDA amplitudes were numerically larger in Experiment 2, this difference between experiments was not reliable (F(1, 30) = 2.772, p = .106, BF01 = 2.710).1
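In a 2 × 2 mixed design of this kind, the Experiment × Spatial Congruency interaction is equivalent to an independent-samples t test on each participant's congruency effect (same-side minus opposite-sides), with F(1, N − 2) = t². The sketch below illustrates this equivalence for equal group sizes. The data are synthetic, and the function names, the simulated effect sizes, and the random seed are assumptions for illustration only, not the study's data or analysis code.

```python
import random
from statistics import mean


def interaction_f(group_a, group_b):
    """Interaction F for a 2 (group) x 2 (condition) mixed ANOVA, equal group sizes.

    Each group is a list of (same_side, opposite_sides) pairs, one per participant.
    """
    assert len(group_a) == len(group_b)
    n = len(group_a)
    groups = [group_a, group_b]
    cell = [[mean(p[c] for p in g) for c in (0, 1)] for g in groups]
    group_mean = [mean(row) for row in cell]
    cond_mean = [mean(cell[g][c] for g in (0, 1)) for c in (0, 1)]
    grand = mean(group_mean)
    ss_int = n * sum(
        (cell[g][c] - group_mean[g] - cond_mean[c] + grand) ** 2
        for g in (0, 1)
        for c in (0, 1)
    )
    # Error term: condition x participant variability within each group.
    ss_err = 0.0
    for g, data in enumerate(groups):
        for p in data:
            subj = mean(p)
            for c in (0, 1):
                ss_err += (p[c] - subj - cell[g][c] + group_mean[g]) ** 2
    df_err = 2 * n - 2
    return ss_int / (ss_err / df_err), df_err


def difference_t(group_a, group_b):
    """Pooled-variance t test comparing congruency effects between groups."""
    d_a = [s - o for s, o in group_a]
    d_b = [s - o for s, o in group_b]
    n_a, n_b = len(d_a), len(d_b)
    m_a, m_b = mean(d_a), mean(d_b)
    ss = sum((d - m_a) ** 2 for d in d_a) + sum((d - m_b) ** 2 for d in d_b)
    pooled_var = ss / (n_a + n_b - 2)
    return (m_a - m_b) / (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5


# Synthetic accuracy scores (percent correct) for 16 participants per experiment.
rng = random.Random(1)
exp1 = [(b + 4 + rng.gauss(0, 3), b + rng.gauss(0, 3)) for b in (rng.gauss(82, 5) for _ in range(16))]
exp2 = [(b + 1 + rng.gauss(0, 3), b + rng.gauss(0, 3)) for b in (rng.gauss(82, 5) for _ in range(16))]

f_value, df_error = interaction_f(exp1, exp2)
t_value = difference_t(exp1, exp2)
```

With 16 participants per experiment, the error degrees of freedom are 30, matching the F(1, 30) tests reported above, and `f_value` equals `t_value ** 2` up to floating-point error.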
The results of Experiment 2 were clear-cut. In marked contrast to Experiment 1, there were now no longer any spatial congruency effects for tactile WM. The amplitudes of tCDA components elicited in the period after the retro-cue was presented were essentially identical on same-side and opposite-sides trials. There was also no significant impairment of tactile WM performance on opposite-sides trials. The presence of spatial congruency effects on tactile WM in Experiment 1 and the absence of these effects in Experiment 2 were substantiated by significant interactions between experiment and spatial congruency in the between-experiment analyses reported above. The critical factor that changed between these two experiments was the point in time when attention could be allocated to task-relevant visual WM representations. In Experiment 2, the to-be-attended side for vision was blocked, so that task-relevant visual samples could be attended and encoded into WM immediately after the sample display was presented. The presence of reliable CDA components in the period between the samples and the retro-cues confirms that this was indeed the case. Following the retro-cues, attention shifts toward task-relevant sample stimuli were required within tactile WM only, whereas the previously established focus of attention in visual WM could be maintained. Thus, the absence of spatial congruency effects for tactile WM in Experiment 2 and the presence of such effects in Experiment 1 were associated with the absence versus presence of concurrent attention shifts toward retro-cued WM representations in both modalities. These findings support the hypothesis that crossmodal links in multimodal WM tasks specifically affect the mechanisms that control the selective attentional activation of representations that are already stored in WM.
The question of whether the processes involved in the attentional control of WM are modality specific or shared across sensory modalities remains the subject of considerable debate (e.g., Fougnie, Zughni, Godwin, & Marois, 2015; Cowan, Saults, & Blume, 2014; Cowan, 2011; Fougnie & Marois, 2011; Cocchini, Logie, Della Sala, MacPherson, & Baddeley, 2002). In our own previous work (Katus & Eimer, 2016, 2018, 2019b), we used ERP markers of concurrent visual and tactile WM maintenance processes and found no evidence for crossmodal links. Visual CDA and tCDA components did not differ between trials where to-be-encoded visual and tactile sample stimuli were presented on the same versus opposite sides, and there were also no differences in visual and tactile WM performance. The absence of any crossmodal spatial congruency effects suggests that the WM encoding and subsequent maintenance processes in vision and touch are mediated by entirely modality-specific attentional control processes.
In this study, we investigated a different aspect of the attentional control of WM. Using retro-cue procedures, we tested whether there are crossmodal links between processes that control the retrospective selection of visual and tactile WM representations. In Experiment 1, we found clear evidence for crossmodal spatial congruency effects. Tactile WM performance was impaired, and tCDA components were strongly attenuated on trials where retro-cues prompted attention shifts into opposite directions within visual and tactile WM. This indicates that the control of such retrospective selection processes is not fully modality specific but is based on mechanisms that are at least partially shared across sensory modalities. The results of Experiment 2 confirmed that these shared mechanisms are specifically involved in the control of concurrent attentional allocation processes in visual and tactile WM. The behavioral and electrophysiological spatial congruency effects observed for tactile WM in Experiment 1 were eliminated when shifts of attention within tactile WM triggered by retro-cues took place in the absence of simultaneous shifts in visual WM, and this was confirmed by additional between-experiment analyses.2 Although visual WM representations that had been encoded previously were still selectively maintained at the point in time when attention was shifted toward retro-cued tactile WM representations, the spatial congruency between these representations did not have any systematic effect on tactile WM. This suggests that the crossmodal spatial synergy effects found in Experiment 1 for tactile WM were not the generic result of a spatial mismatch between simultaneously active visual and tactile WM representations but were more specifically associated with the control of retrospective selection processes within WM. 
The absence of such effects in Experiment 2 also underlines that the initial encoding and storage of sensory stimuli and the subsequent selective activation of a subset of these stimuli are not just temporally, but also functionally, dissociable (see also Myers, Stokes, & Nobre, 2017, for discussion of an analogous distinction between maintenance and retrospective selective activation processes in WM).
The current findings and the results of our previous studies on the control of multimodal WM (Katus & Eimer, 2016, 2018, 2019a, 2019b) indicate that encoding and storage on the one hand and the attentional selection of existing WM representations on the other are regulated by distinct attentional control mechanisms. The former type of control appears to operate in a strictly modality-specific fashion, whereas the latter is shared across sensory modalities. If this is the case, the obvious question is why these two attentional control mechanisms for WM should differ in their modality specificity. At present, we can only offer speculations rather than definitive answers. One possibility is that this apparent dissociation reflects a qualitative difference in the prospective versus retrospective attentional control of WM. In situations where the spatial or nonspatial properties of to-be-memorized events are known in advance, representations of these properties (attentional templates; Duncan & Humphreys, 1989) can be activated in a prospective fashion before the onset of memory sample sets. These templates represent specific sensory attributes of task-relevant sample stimuli and may therefore guide the selection and encoding of these stimuli in a sensory-specific fashion. In contrast, because no such preparatory attentional templates are involved when relevant representations in WM are selected retrospectively, these selection processes may be mediated primarily by modality-unspecific control processes. Another possibility is that the difference between the control of WM encoding/storage and of retrospective selection mechanisms within WM is related to the distinction between external versus internal attention (Chun, Golomb, & Turk-Browne, 2011). 
External attention controls the selective processing of sensory stimuli in the outside world, and internal attention regulates the selective activation of internal representations held in different types of memory stores. The selective processing of sensory stimuli in sample displays and the encoding of these stimuli into WM primarily involves external attention, whereas the selective activation of stored WM representations signaled by retro-cues requires internal attention. Because the demands and constraints on external and internal attention are considerably different (see Myers et al., 2017, for further discussion), this could be reflected in differences in the underlying control processes. For example, the main function of external attention is to rapidly detect and select task-relevant sensory objects and to exclude irrelevant objects from further processing. This is believed to be achieved by “winner-takes-all” competitive interactions between sensory objects that are modulated by top–down task goals (e.g., Desimone & Duncan, 1995). Because such biased competition processes in external attention are implemented in modality-specific sensory-perceptual areas, it is plausible to assume that they are controlled by modality-specific mechanisms. In contrast, internal attention operates within a much more limited number of currently stored mental representations from different sensory modalities, and the selective prioritization of some of these representations does not necessarily involve the loss of others (e.g., Lewis-Peacock, Drysdale, Oberauer, & Postle, 2012). The allocation of attention to such internal WM representations might therefore be mediated by higher-level control processes in pFC that are shared and coordinated across sensory modalities.
Although these suggestions are in line with the results of the current visual/tactile WM experiments, they cannot easily be reconciled with previous evidence for crossmodal links in perceptual attention. In these studies, external attention was allocated to sensory stimuli in different modalities, and target-defining attributes were known in advance (e.g., Eimer & Driver, 2001; Spence & Driver, 1996). Further research is clearly needed to dissociate the aspects of WM that are controlled by modality-specific versus modality-nonspecific mechanisms.
The current study was the first to use retro-cue procedures in a bimodal WM task to investigate the concurrent allocation of spatial attention to representations stored in visual and tactile WM. We found behavioral and electrophysiological crossmodal spatial synergy effects for such attention shifts within WM, indicating that the underlying control mechanisms are shared across sensory modalities. These supramodal mechanisms appear to be specific to the control of attentional selection processes within WM, whereas other WM functions are regulated by modality-specific processes.
This work was funded by the Leverhulme Trust (grant RPG-2015-370). We thank Laura Katus for proofreading the manuscript.
Reprint requests should be sent to Tobias Katus, School of Psychology, University of Aberdeen, AB24 3FX Aberdeen, United Kingdom, or via e-mail: firstname.lastname@example.org.
As expected on the basis of previous observations (van Ede, Chekroud, & Nobre, 2019), the retro-cues employed in the present experiments triggered small but systematic deviations of eye gaze toward the side where cued task-relevant visual sample stimuli had been presented. However, an analysis of mean HEOG amplitudes between 350 and 1300 msec after the retro-cue showed that this effect only reached statistical significance in same-side trials of Experiment 1 (t(15) = 2.824, p = .013, BF10 = 4.455), but not in opposite-sides trials of Experiment 1 or in either same-side or opposite-sides trials of Experiment 2 (all ps > .05). Importantly, a mixed ANOVA across both experiments found no reliable differences in the size of these residual eye movements between experiments (F(1, 30) = 0.037, p = .850, BF01 = 3.145) and no reliable interaction between Experiment × Spatial Congruency (F(1, 30) = 0.738, p = .397, BF01 = 2.217); the main effect of Spatial Congruency also did not reach the significance threshold (F(1, 30) = 3.900, p = .058, BF10 = 1.672). Analogous results were obtained when this analysis was conducted for HEOG data that had not been corrected using independent component analysis. Thus, any tendency to move the eyes toward cued visual sample locations is unlikely to account for the presence of Spatial Congruency effects on tCDA amplitudes in Experiment 1 but not in Experiment 2.
It should be noted that tactile WM accuracy and tCDA amplitudes were numerically enhanced on same-side trials and reduced on opposite-sides trials in Experiment 1 as compared to Experiment 2, suggesting that shifts of attention in visual WM triggered by retro-cues in Experiment 1 produced both costs and benefits for concurrent attention shifts in tactile WM on opposite-sides and same-side trials, respectively. However, direct comparisons of tactile WM performance and tCDA amplitudes between Experiments 1 and 2, conducted separately for same-side and opposite-sides trials, yielded no reliable differences (all t(30) < 1.55, all p > .13).