A consolidated practice in cognitive neuroscience is to explore the properties of human visual working memory through the analysis of electromagnetic signals using cued change detection tasks. Under these conditions, EEG/MEG activity increments in the posterior parietal cortex scaling with the number of memoranda are often reported in the hemisphere contralateral to the objects' position in the memory array. This highly replicable finding clashes with several reported failures to observe compatible hemodynamic activity modulations using fMRI or fNIRS in comparable tasks. Here, we reconcile this apparent discrepancy by acquiring fMRI data on healthy participants and employing a cluster analysis to group voxels in the posterior parietal cortex based on their functional response. The analysis identified two distinct subpopulations of voxels in the intraparietal sulcus (IPS) showing a consistent functional response among participants. One subpopulation, located in the superior IPS, showed a bilateral response to the number of objects coded in visual working memory. A different subpopulation, located in the inferior IPS, showed an increased unilateral response when the objects were displayed contralaterally. The results suggest that a cluster of neurons in the inferior IPS is a candidate source of electromagnetic contralateral responses to working memory load in cued change detection tasks.
The human ability to select and retain for a few seconds information useful to perform a wide range of cognitive tasks appears to be at the root of virtually all human cognition. This ability, commonly referred to as working memory, has recently become a matter of intense neuroimaging investigation (Todd & Marois, 2004; Vogel & Machizawa, 2004) since the demonstration that electromagnetic indices of working memory capacity are predictive of a variety of cognitive skills at the individual level (Unsworth, Fukuda, Awh, & Vogel, 2014, 2015). In vision, neurons showing sustained activation in tasks designed to engage visual working memory (VWM) populate a number of cortical regions of the human brain (including occipitoparietal, frontal, and prefrontal areas; e.g., Naughtin, Mattingley, & Dux, 2014; Pessoa, Gutierrez, Bandettini, & Ungerleider, 2002; Cohen et al., 1997; Courtney, Ungerleider, Keil, & Haxby, 1997). However, studies employing fMRI have pointed to neurons in the IPS as playing a critical role in determining the capacity of VWM, which averages to three to four items (Cowan, 2001). Activation of IPS neurons has been shown to increase almost linearly with the number of items encoded in VWM, but only up to its capacity, leveling off thereafter (Mitchell & Cusack, 2008; Xu & Chun, 2006; Todd & Marois, 2004).
This property of IPS neurons is typically reflected in EEG signals recorded in tasks like the one exemplified in Figure 1. Participants are usually invited to keep their gaze at central fixation and informed that a directional cue displayed shortly afterwards would indicate either the left or right visual hemifield. The directional cue is then followed by a memory array composed of a variable number of color patches evenly distributed between the two hemifields. Participants are instructed to memorize the colors displayed in the cued hemifield and disregard the other colors displayed in the opposite hemifield. After a blank retention interval, memory for colors is probed by displaying a color patch and asking participants to indicate whether this probe stimulus matches one of the previously memorized colors. Typically, ERPs time-locked to the onset of the memory array—held to track processing subtended in the encoding and retention phases of the task—show an increased negativity at occipitoparietal electrode sites (e.g., PO7/PO8, P7/P8) contralateral to the cued hemifield relative to ipsilateral electrode sites. The ERP component obtained by subtracting ipsilateral from contralateral activity has been labeled using different acronyms, like CNSW (for contralateral negative slow wave; Klaver, Talsma, Wijers, Heinze, & Mulder, 1999), CDA (for contralateral delay activity; Vogel & Machizawa, 2004), SPCN (for sustained posterior contralateral negativity; Jolicœur, Brisson, & Robitaille, 2008), and CSA (for contralateral search activity; Emrich, Al-Aidroos, Pratt, & Ferber, 2009), all intended to indicate some form of maintenance in an “online” state of VWM representations. To avoid confusion, we will use one of these acronyms throughout the paper, SPCN. SPCN has two important characteristics. 
One is that, starting from about 300 msec after the onset of the memory array, the amplitude of the SPCN increases monotonically with the number of displayed colors, reaching a plateau when VWM is at capacity (see Luria, Balaban, Awh, & Vogel, 2016; Luck & Vogel, 2013). The other is that SPCN memory-related amplitude modulations are symmetrical, that is, usually reported of comparable intensity at left-sided electrodes when the cued objects are in the right visual hemifield or at right-sided electrodes when the cued objects are in the left visual hemifield. Both these properties have been shown to also characterize the MEG counterpart of the SPCN (Becke, Müller, Vellage, Schoenfeld, & Hopf, 2015; Robitaille et al., 2010).
An issue concerning the role of IPS neurons in VWM is the lack of a hemodynamic analogue of the SPCN response when the activity of IPS neurons is estimated with fMRI or fNIRS using cued change detection tasks. Contrary to SPCN, VWM-related increments in hemodynamic activity in IPS have been reported to be either equivalent between contralateral and ipsilateral cerebral hemispheres (i.e., no SPCN; Cutini et al., 2011; Robitaille et al., 2010) or greater contralaterally than ipsilaterally in one cerebral hemisphere but not the other. These latter cases of interhemispherically asymmetric SPCN are however mutually inconsistent, because evidence of an SPCN-compatible response has been described for the left but not the right IPS by some (e.g., Sheremata, Bettencourt, & Somers, 2010), but also for the right but not the left IPS by others (e.g., Killebrew, Mruczek, & Berryhill, 2015; see also Becke et al., 2015, for converging MEG evidence on VWM-related right IPS bias).
Here, we attempt to solve the divergence between fMRI/fNIRS and EEG/MEG studies using an analytical approach to fMRI data inspired by the observation that subpopulations of neurons, especially in a relatively large and complex cortical area (like IPS), may be simultaneously active during a given cognitive task, though likely subserving different functions (Logothetis, 2008). Resorting to classic parametric mapping approaches under these circumstances may be suboptimal, as one limit of these formal approaches is to chart the activity of neural assemblies (a) in the absence of information about the specific processing stage subserved and (b) blind to interindividual differences in the localization of functionally distinguishable neural assemblies. A solution circumventing this limit is provided by data-driven approaches developed to classify voxels based on how hemodynamic activity modulates across the different conditions of an experiment. One such approach is cluster analysis of fMRI data, which is carried out without a priori formal assumptions about the function describing the time course of the hemodynamic response. On the basis of the similarity of the time courses of the hemodynamic response of each voxel across the different experimental conditions, here we grouped voxels in IPS into distinct clusters using the popular k-means algorithm (Baune et al., 1999; Goutte, Toft, Rostrup, Nielsen, & Hansen, 1999; Duda & Hart, 1973). The cluster analysis was carried out on fMRI data collected with the design illustrated in Figure 1, following a standard analysis that ascertained that the voxels of interest populated regions of IPS closely corresponding to those described in prior fMRI works (e.g., Todd & Marois, 2004). fMRI data were also collected in a control experiment to ensure that the clusters of interest were located in IPS regions displaying a prototypical VWM-compatible hemodynamic response. 
In this control experiment, the memory array contained three or five colors rather than one or three, on the grounds that no further increment in hemodynamic activity should be observed once VWM capacity is likely to be equaled or exceeded. To anticipate, the results of the cluster analysis revealed the presence of a spatially segregated (Pulvermüller, Kherif, Hauk, Mohr, & Nimmo-Smith, 2009; Golland, Golland, Bentin, & Malach, 2008; Simon et al., 2004) subpopulation of voxels in the inferior portion of IPS of each hemisphere showing a clear SPCN-like response to contralateral objects.
Twenty-five students (11 men; age M = 26.48 years; SD = 4.2 years) of the University of Padova took part in the experiment. All participants reported normal or corrected-to-normal vision and had no history of neurological and/or psychiatric disorders. All participants gave written informed consent before entering the scanner according to the ethical principles approved by the University of Padova.
Five participants were discarded from analysis: one because of technical problems of the MRI apparatus, one for excessive head movements during data acquisition, and three for not completing the experiment. The total number of participants considered in the following analyses was 20 (7 men; age M = 26.45 years; SD = 4 years).
Stimuli and Procedure
Participants performed the cued memory probe task illustrated in Figure 1. Each trial began with the presentation of a fixation point at the center of the screen for 600 msec, followed by a 400-msec arrow cue pointing to either the left side or the right side of the screen. The offset of the directional cue was followed by a blank interval of 800–1200 msec (randomly jittered in steps of 100 msec) and by the onset of a memory array composed of two or six color patches, evenly distributed in the left/right visual hemifields, displayed for 300 msec on a black background (RGB 0 0 0). From a viewing distance (depth of vision in the goggle system) of approximately 1.2 m, the dimension of each color patch was 1° × 1° of visual angle. Colors were randomly chosen from a set of eight highly discriminable hues: yellow (RGB 230 235 5), blue (RGB 0 0 255), green (RGB 51 151 68), red (RGB 255 0 0), white (RGB 255 255 255), cyan (RGB 0 255 255), violet (RGB 255 0 255), and claret (RGB 153 0 48). Each color could appear no more than once on either side of fixation. Stimuli could be displayed in random positions within two notional rectangles of 3.5° × 7° visual angle placed symmetrically on the left/right of fixation at a distance of 2.5° visual angle. The minimum distance between the upper left corners of two adjacent stimuli was constrained to be no less than 1.5°.
Participants were instructed to keep their gaze at fixation and memorize the color of the patches presented in the cued visual hemifield while ignoring the colors presented in the opposite hemifield. A single probe color patch was then presented at fixation after a blank retention interval of 1400–1600 msec (randomly jittered in steps of 20 msec). Participants had 2000 msec to indicate, by pressing one of two keys on an optically coupled response pad placed inside the scanner, whether the probe color matched one of the to-be-memorized colors. Half of the participants used the left index finger to respond “match” and the right index finger to respond “no match,” whereas the other half of participants used the opposite response mapping. On half of the trials, the color of the probe matched one of the colors displayed in the cued hemifield, whereas on the other half of trials the probe color was randomly selected among the available set of nondisplayed colors. Following the response, an intertrial interval of 10,000–14,000 msec (randomly jittered in steps of 500 msec) elapsed before the beginning of the next trial. The experimental session consisted of five blocks of 32 trials each. Each participant was exposed to 40 trials per condition in a design generated by the orthogonal combination of memory load (1 vs. 3) and cued visual hemifield (left vs. right). Trials in the different conditions were intermixed at random within each block. Participants were explicitly invited to take a short rest between one block of trials and the next. During the pause, each participant was reminded of the instruction to keep his/her gaze at central fixation throughout the visual stimulation sequence.
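The trial structure described above can be sketched as a small generator. This is only an illustrative sketch: the function name `build_session`, the dictionary keys, the assumption that the four conditions are balanced within each block (8 trials per condition per block), and the assumption that match and no-match probes are balanced within each condition are ours and are not stated in the procedure.

```python
import random
from itertools import product

def build_session(n_blocks=5, reps_per_block=8, seed=0):
    """Return a list of blocks, each a shuffled list of trial dictionaries."""
    rng = random.Random(seed)
    # Orthogonal combination of memory load and cued visual hemifield.
    conditions = list(product((1, 3), ("left", "right")))
    session = []
    for _ in range(n_blocks):
        block = []
        for load, side in conditions:
            for rep in range(reps_per_block):
                block.append({
                    "load": load,
                    "cued_side": side,
                    # Assumption: half match, half no-match within each cell.
                    "probe_matches": rep % 2 == 0,
                    # Jittered intervals, in msec, using the reported ranges and steps.
                    "cue_to_array_ms": rng.choice(range(800, 1201, 100)),
                    "retention_ms": rng.choice(range(1400, 1601, 20)),
                    "iti_ms": rng.choice(range(10000, 14001, 500)),
                })
        rng.shuffle(block)  # conditions intermixed at random within the block
        session.append(block)
    return session

session = build_session()
```

With these defaults, the sketch yields five blocks of 32 trials and 40 trials per condition, matching the totals reported above.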
Before fMRI data acquisition, participants familiarized themselves with the task in a practice session and received feedback on their responses, in the form of a plus (correct) or minus (incorrect) sign displayed at fixation at the end of each trial. No feedback was provided during fMRI data acquisition. During practice, response accuracy was emphasized relative to response speed.
Stimuli were presented with the VisuaStim System through virtual goggles (format: SVGA, 800 (×3) × 600 pixels, refresh rate: 85 Hz, field of view [FOV] = 30° H × 23° V, aspect ratio: 4:3, colors: 16.7 million) connected via a fiber-optic cable entering the magnet room. An eye-tracking camera (Resonance Technology, Northridge, CA) was embedded inside the right lens of the virtual goggles to monitor eye movements with ViewPoint software (Arrington Research, Scottsdale, AZ). Participants were instructed to maintain fixation and were informed verbally that their eye movements were monitored using a system mounted in the goggles. Trials associated with an eye movement equal to or exceeding 1.0° away from fixation (toward either visual hemifield) at any point during the visual stimulation sequence were excluded from analysis.
Data acquisition was performed with a 1.5 T Avanto (Siemens, Erlangen, Germany) MRI scanner of the Radiology Department of the University of Padova equipped with a standard Siemens eight-channel coil. T2*-weighted EPI were acquired tilting the FOV to avoid signal drop (Weiskopf, Hutton, Josephs, & Deichmann, 2006; repetition time [TR] = 2190 msec, echo time [TE] = 49 msec, FOV = 224 mm, flip angle = 90°, 27 axial slices of 64 × 56 voxels, 4.5 mm thick without slice gap, interleaved). Before the beginning of the experimental blocks of trials (256 volumes in each block), localizer sequences were executed to prescribe the position of the slices. The MR scanner was allowed to reach a steady state by discarding the first four volumes in each scan series to ensure that subsequent scans were collected at equilibrium magnetization. A high-resolution anatomical scan, 3-D T1-weighted MPRAGE (TR = 1900 msec, TE = 2.91 msec, isotropic voxel size of 1 mm³), was carried out on each participant following the acquisition of the five blocks of trials. A field map was acquired to correct for geometric distortions (TR = 500 msec, TE/ΔTE = 4.76/4.76 msec).
Psychophysical estimates of individual VWM capacity were calculated using Cowan's (2001) equation, K = S × (H − FA), where K is the number of colors stored in VWM, S is the number of colors displayed in the cued side of the memory array, H is the proportion of “hits” (i.e., correct “no match” detections), and FA is the proportion of “false alarms” (i.e., incorrect “no match” detections). The sensitivity index d′ was also computed (Green & Swets, 1966). Trials associated with an incorrect response were excluded from fMRI analyses.
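As an illustration of the two behavioral indices, the following minimal sketch computes Cowan's K and d′ from a hit rate and a false alarm rate. The function names and the example rates are hypothetical; d′ is computed here as z(H) − z(FA), the standard definition, and any correction applied to rates of exactly 0 or 1 (which the inverse normal function cannot handle) is not specified in the text and is therefore left out.

```python
from statistics import NormalDist

def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's (2001) capacity estimate, K = S * (H - FA)."""
    return set_size * (hit_rate - false_alarm_rate)

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(H) - z(FA); z is the inverse standard normal CDF.

    Rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates for one participant with three colors in the cued hemifield.
k = cowan_k(set_size=3, hit_rate=0.85, false_alarm_rate=0.15)
dp = d_prime(hit_rate=0.85, false_alarm_rate=0.15)
```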
The standard analyses were performed using the FEAT toolbox included in FSL (Jenkinson, Beckmann, Behrens, Woolrich, & Smith, 2012) and in-house MATLAB (The MathWorks, Natick, MA) routines. Data were first skull stripped with BET (Smith, 2002), motion-corrected using MCFLIRT (Jenkinson, Bannister, Brady, & Smith, 2002), spatially smoothed using a 5-mm FWHM kernel, and temporally high-pass filtered with a cutoff frequency automatically tuned by FSL on a subject-specific basis. For every participant's block of trials, boundary-based registration and B0 unwarping using the acquired field map were performed between the EPI and the T1-weighted image using FLIRT (Greve & Fischl, 2009; Jenkinson et al., 2002). FNIRT implemented in FSL (Jenkinson et al., 2012) was used to perform nonlinear registration between the T1-weighted image and the MNI152 2-mm atlas (Grabner et al., 2006). EPI was registered to the MNI atlas by combining the two previously computed registration steps. Statistical parametric maps were created for each participant's block of trials and experimental condition with a multiple regression analysis. Regressors were defined for each experimental condition as delta functions temporally located at the onset of the memory array, convolved with a canonical hemodynamic function (double-gamma hemodynamic response function). Temporal derivatives for each regressor were defined as additional regressors to correct for slice timing alignment. Twenty statistical parametric maps were created (5 blocks × 4 conditions) for each participant. A second-level within-subject analysis was carried out on the data set of each participant to obtain the mean statistical parametric map for each experimental condition. Finally, a third-level fixed-effects repeated-measures 2 × 2 ANOVA considering Memory load (1 vs. 3 colors) and Cued visual hemifield (left vs. right) as within-subject factors was performed. Regressors were defined for each participant's mean variability, for the memory load (1 vs. 3), the cued visual hemifield (left vs. right), and the interaction between factors. Individual K values for each condition were included in the model as a covariate. A cluster-wise analysis was performed using the Gaussian random field theory to correct for multiple comparisons (z threshold = 2.3, cluster p threshold = .05). As a result of the group analysis, three statistical parametric maps were generated, namely, the memory load map, the cued visual hemifield map, and the interaction map.
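The construction of a condition regressor (delta functions at memory array onsets convolved with a double-gamma function) can be sketched as follows. This is not FSL's implementation: the shape parameters (6 and 16) and the undershoot ratio (1/6) are commonly used illustrative values that we assume here, the function names are hypothetical, onsets are rounded to the nearest volume, and the temporal derivative regressors are omitted. Only the TR of 2.19 sec is taken from the acquisition parameters.

```python
import math

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, zero for non-positive t."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (scale ** shape * math.gamma(shape))

def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Positive gamma lobe minus a scaled, later gamma lobe (assumed parameters)."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)

def build_regressor(onsets_s, n_volumes, tr=2.19, hrf_length_s=32.0):
    """Convolve a train of delta functions at the given onsets with the HRF."""
    kernel = [double_gamma_hrf(i * tr) for i in range(int(hrf_length_s / tr) + 1)]
    sticks = [0.0] * n_volumes
    for onset in onsets_s:
        idx = round(onset / tr)  # nearest volume; a simplification
        if 0 <= idx < n_volumes:
            sticks[idx] += 1.0
    # Discrete convolution, truncated to the length of the run.
    return [
        sum(sticks[j - i] * kernel[i] for i in range(len(kernel)) if j - i >= 0)
        for j in range(n_volumes)
    ]

# One hypothetical memory array onset at the start of a 20-volume run.
regressor = build_regressor([0.0], n_volumes=20)
```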
An ROI analysis was performed to compute an SPCN-like BOLD response. ROIs were defined according to regions of activations observed in the group average memory load map (i.e., the two posterior bilateral regions of activation). Preprocessing of the raw data set from each participant was performed with a MATLAB-based custom-made software, using a subset of the FSL functions. Preprocessing consisted of motion correction using MCFLIRT routine (Jenkinson et al., 2002), high-pass temporal filtering, and slice timing correction using SLICETIMER.
Each ROI was back-projected to the individual space before extracting the time courses from each voxel in the ROI for each participant. These time courses were then divided into trials, and each trial was converted into percent signal change using the mean of the two preceding TRs as baseline. Trials in each cell of the memory load by cued visual hemifield design were averaged to calculate the mean BOLD response for each voxel of each participant's ROI. The root mean square error (RMSE) between the median across voxels of the mean BOLD responses in a given experimental condition and each voxel's mean BOLD response in the same experimental condition was then computed. To remove noisy voxels, voxels with an RMSE above the 85th percentile of the RMSE distribution were discarded from the analysis.
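The two steps just described, baseline conversion and removal of noisy voxels, can be sketched as follows. The function names are hypothetical, the example values are invented, and the sketch handles a single experimental condition; how the RMSE values were combined across the four conditions is not specified above and is not addressed here.

```python
import math
import statistics

def percent_signal_change(trial, baseline):
    """Express a trial's samples as percent change from the mean baseline signal."""
    base = sum(baseline) / len(baseline)
    return [100.0 * (x - base) / base for x in trial]

def rmse(a, b):
    """Root mean square error between two equally long curves."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def keep_voxels(mean_responses, percentile=85):
    """Indices of voxels whose RMSE from the across-voxel median curve
    does not exceed the given percentile of the RMSE distribution."""
    median_curve = [statistics.median(col) for col in zip(*mean_responses)]
    errors = [rmse(v, median_curve) for v in mean_responses]
    cutoff = statistics.quantiles(errors, n=100, method="inclusive")[percentile - 1]
    return [i for i, e in enumerate(errors) if e <= cutoff]

# Baseline = the two TRs preceding the trial (hypothetical raw values).
psc = percent_signal_change([102.0, 104.0, 101.0], baseline=[99.0, 101.0])

# Nine well-behaved voxels and one outlier (hypothetical mean BOLD responses).
voxels = [[0.0, 1.0, 0.5]] * 9 + [[0.0, 5.0, -3.0]]
kept = keep_voxels(voxels)
```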
Consistent with the practice adopted to analyze electromagnetic signals in analogous situations, the mean BOLD responses of each participant were rearranged considering the relative, rather than the absolute, position of to-be-memorized objects in the visual field. For the left ROI, contralateral responses were those recorded when the cued objects were displayed in the right visual hemifield, and ipsilateral responses were those recorded when the cued objects were displayed in the left visual hemifield. The symmetrical opposite applied for the right ROI. The design was adapted to this scheme, with the orthogonal combination of the factors Laterality (ipsilateral vs. contralateral) and Memory load (1 vs. 3).
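The recoding from absolute to relative position amounts to a simple relabeling, sketched below. The function name, the dictionary layout, and the example values are illustrative assumptions.

```python
def recode(responses, roi_hemisphere):
    """Relabel responses keyed by (cued hemifield, load) as (laterality, load).

    A response is ipsilateral when the cued hemifield is on the same side
    as the ROI and contralateral otherwise.
    """
    recoded = {}
    for (cued_hemifield, load), curve in responses.items():
        if cued_hemifield == roi_hemisphere:
            laterality = "ipsilateral"
        else:
            laterality = "contralateral"
        recoded[(laterality, load)] = curve
    return recoded

# Hypothetical mean BOLD responses (one sample each) for the left ROI.
left_roi = recode(
    {("left", 1): [0.1], ("right", 1): [0.2], ("left", 3): [0.3], ("right", 3): [0.5]},
    roi_hemisphere="left",
)
```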
For each ROI, a matrix composed of the mean BOLD responses thus obtained for all participants and voxels of the ROI was the input to the cluster-based analysis. The dimensions of the matrix were the four mean BOLD responses (obtained by the orthogonal combination of laterality and memory load) by the overall number of voxels across participants. Figure 2 illustrates the main steps of the k-means clustering analysis (see also Control Experiment and Analyses section). A k-means clustering algorithm (number of clusters = 4, replicates = 50, maximum number of iterations = 400, Euclidean distance as metric) was used to classify the voxel-wise mean BOLD responses of all participants into four different clusters according to their temporal profiles (Seber, 2008). This was done by iteratively assigning each set of BOLD responses to the cluster with the closest mean using the Euclidean distance as metric. To reduce to nil the possibility of erroneous classifications by the k-means algorithm driven by the absolute amplitude of BOLD responses, the percent signal change of each voxel was normalized to the maximum percent signal change of that voxel in the four experimental conditions. As a result, all BOLD response values varied in a [−1, 1] interval.
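The normalization and clustering steps can be illustrated with a plain k-means sketch. This is not the MATLAB routine used in the study: the function names are hypothetical, initial centers are drawn at random from the data (an assumption about initialization), and normalization divides by the largest absolute value of each voxel's response, which is our reading of how values end up in the [−1, 1] interval. The defaults mirror the reported settings (50 replicates, at most 400 iterations, Euclidean distance).

```python
import random

def normalize(voxel_response):
    """Scale one voxel's four-condition response by its largest absolute value."""
    peak = max(abs(x) for x in voxel_response)
    return [x / peak for x in voxel_response] if peak > 0 else list(voxel_response)

def sq_dist(a, b):
    """Squared Euclidean distance between two response vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, replicates=50, max_iter=400, seed=0):
    """Return (labels, cost) of the best of several randomly initialized runs."""
    rng = random.Random(seed)
    best_labels, best_cost = None, float("inf")
    for _ in range(replicates):
        centers = rng.sample(points, k)
        labels = None
        for _ in range(max_iter):
            # Assign each response to the cluster with the closest mean.
            new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
            if new_labels == labels:
                break  # assignments are stable
            labels = new_labels
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # keep the previous center if a cluster empties
                    centers[c] = [sum(col) / len(members) for col in zip(*members)]
        cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost

# Toy data: two well-separated groups of hypothetical two-sample responses.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, cost = kmeans(points, k=2, replicates=5)
```

In the study's setting, each point would be one voxel's normalized concatenation of the four mean BOLD responses, and k would be 4.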
The cluster analysis classified each voxel of the ROI of each participant as belonging to one of the four clusters (see Figure 2). The overall spatial organization of the clusters within each ROI was determined by remapping each participant's cluster spatial maps onto the MNI152 standard space (Grabner et al., 2006). The mean BOLD response for every ROI, participant, and condition was computed by averaging the BOLD responses of the voxels composing each cluster. To perform statistical analysis on the k-means results, we computed the individual peak values of the clustered mean BOLD responses. A repeated-measures 4 × 2 × 2 × 2 ANOVA considering Clusters (1 to 4), ROI (left vs. right), Memory load (1 vs. 3), and Laterality (ipsilateral vs. contralateral) as within-subject factors was performed. The Greenhouse–Geisser correction was applied where appropriate. A further 2 × 2 × 2 ANOVA, considering ROI, Memory load, and Laterality as within-subject factors, was performed on the individual peak values of each cluster. Paired sample t tests were employed to contrast the different levels of ROI, memory load, and laterality factors.
The SPCN-like BOLD response was computed by subtracting, in each ROI and for each memory load, the BOLD response observed in the hemisphere ipsilateral to the cued hemifield from the BOLD response observed in the hemisphere contralateral to the cued hemifield. We computed the mean peak value of this BOLD curve in a temporal window centered around the BOLD response peak (3–5 TRs for Cluster 1 and 3–4 TRs for Cluster 2) and submitted it to repeated-measures 2 × 2 × 2 ANOVA including ROI (right vs. left), Cluster (Cluster 1 vs. Cluster 2), and Memory load (1 vs. 3) as within-subject factors. Paired sample t tests were employed to contrast the different levels of ROI and memory load.
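The subtraction and windowed averaging can be sketched in a few lines. The function names and example curves are hypothetical, and the sketch assumes that TRs are counted from 1 starting at the first sample of the trial-locked response, which is not stated explicitly above.

```python
def spcn_like(contralateral, ipsilateral):
    """Contralateral minus ipsilateral BOLD response, sample by sample."""
    return [c - i for c, i in zip(contralateral, ipsilateral)]

def window_mean(curve, first_tr, last_tr):
    """Mean of the curve over an inclusive window of 1-based TR indices."""
    segment = curve[first_tr - 1:last_tr]
    return sum(segment) / len(segment)

# Hypothetical percent-signal-change curves for one ROI and one memory load.
contra = [0.0, 0.1, 0.4, 0.6, 0.5, 0.2]
ipsi = [0.0, 0.1, 0.3, 0.4, 0.4, 0.2]
difference = spcn_like(contra, ipsi)
peak_cluster1 = window_mean(difference, 3, 5)  # 3-5 TR window, as for Cluster 1
peak_cluster2 = window_mean(difference, 3, 4)  # 3-4 TR window, as for Cluster 2
```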
Control Experiment and Analyses
The control experiment was performed to establish whether VWM-related activation increases with memory load only up to the VWM capacity limit, reaching a plateau thereafter.
Seventeen students (5 men; age M = 25.4 years; SD = 4.03 years) of the University of Padova took part in the control experiment. All participants reported normal or corrected-to-normal vision and had no history of neurological and/or psychiatric disorders. All participants gave written informed consent before entering the scanner according to the ethical principles approved by the University of Padova. One participant was discarded from the analysis for not completing the experiment. The total number of participants considered in the forthcoming analyses was 16.
The control experiment followed the same experimental procedure as the main experiment except for the number of presented stimuli, which was three or five colored patches in each hemifield. The same standard fMRI analysis carried out in the main experiment was performed on the data collected in the control experiment.
A further between-group analysis was performed to compare the results of the main experiment and those of the control one. Regressors were defined for each participant's mean variability, for the memory load in the main experiment, and for the memory load in the control experiment. Each participant's K value for each condition was added as a covariate.
Two additional control analyses were performed to test the results of the cluster analysis. In the first analysis, we validated the spatial localization of the clusters. For each participant and ROI, the assignments of the voxels to the four clusters (i.e., the cluster labels displayed in Figure 2) were randomized 500 times. For each of the 500 repetitions, the same procedure performed to obtain the final spatial maps for each cluster was carried out. A cluster can be considered localized if it has a large number of connected voxels (i.e., a large connected component). Therefore, for the spatial maps in each random assignment and for our original spatial maps, the number of voxels of the largest connected component was computed. The probability of obtaining, under shuffling, a cluster as large as the one we observed was computed with a kernel smoothing density estimator for each of the four clusters and two ROIs using the built-in MATLAB function ksdensity (Bowman & Azzalini, 1997). The probability p was computed by numerically integrating the probability density function up to the number of voxels of the largest connected component of the original map and subtracting the result from 1. In the second additional control analysis, the k-means algorithm was run on the same data with the number of clusters varied from 3 to 8. The number of clusters k that best suits the data should be the lowest number at which the k-means algorithm reaches stability in differentiating clusters showing activity (Lange, Roth, Braun, & Buhmann, 2004).
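The logic of the first control analysis can be sketched on a single label map. Two simplifications should be noted: the sketch shuffles labels within one set of voxels rather than rebuilding across-participant spatial maps, and it replaces the kernel density estimate (ksdensity) with the empirical proportion of shuffles yielding a component at least as large as the observed one. The function names, the 6-connectivity criterion, and the toy grid are assumptions.

```python
import random
from collections import deque

# Face-adjacent neighbors in a 3-D voxel grid (6-connectivity, an assumption).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def largest_component(voxels):
    """Number of voxels in the largest connected component of a set of (x, y, z)."""
    remaining, best = set(voxels), 0
    while remaining:
        queue = deque([remaining.pop()])
        size = 0
        while queue:  # breadth-first traversal of one component
            x, y, z = queue.popleft()
            size += 1
            for dx, dy, dz in NEIGHBORS:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best

def shuffle_p_value(coords, labels, cluster, n_shuffles=500, seed=0):
    """Observed largest component and its empirical p value under label shuffling."""
    rng = random.Random(seed)
    observed = largest_component(c for c, lab in zip(coords, labels) if lab == cluster)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        size = largest_component(c for c, lab in zip(coords, shuffled) if lab == cluster)
        at_least_as_large += size >= observed
    return observed, (at_least_as_large + 1) / (n_shuffles + 1)

# Toy 4 x 4 slice in which cluster 1 occupies one contiguous row.
coords = [(x, y, 0) for x in range(4) for y in range(4)]
labels = [1] * 4 + [0] * 12
observed, p = shuffle_p_value(coords, labels, cluster=1)
```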
The average K for one item was 0.93 (SD = .07) and 2.04 (SD = .25) for three items (t(19) = 20.92, p < .00001); accuracy in the memory task was higher for one color (d′ = 3.36, SD = .38) than for three colors (d′ = 2.09, SD = .36; t(19) = 12.54, p < .0001).
Results of the preliminary standard whole-brain multiple regression analysis are displayed in Figure 3. Figure 3A illustrates the group average memory load map. Three regions of activation were identified (MNI coordinates for each maximum are reported): a central frontal region (0, 10, 58, max z = 4.58, p < .01) and two symmetrical parietal areas (left: −26, −62, 54, max z = 4.53, p < .001; right, 26, −74, 56, max z = 4.56, p < .001, including IPS/IOS). No significant activity modulation depending on the cued visual hemifield was detected in the standard analysis. Figure 3B shows the map of the interaction between memory load and cued visual hemifield in the left posterior region (−34, −62, 56, max z = 4.42, p < .001).
The activation in the frontal central region includes part of the SMA and supplementary eye field (SEF). These regions do not lend themselves to a straightforward interpretation in functional terms in the current paradigm. SEF is widely held to be involved in executive control (especially, inhibition) over saccade generation and performance monitoring (Stuphorn & Schall, 2006; Curtis, Cole, Rao, & D'Esposito, 2005; Stuphorn, Taylor, & Schall, 2000). Given the surprisingly negligible number of trials discarded for detection of eye movements (perhaps because participants were aware of the eye-tracking system we used, which is seldom adopted in other similar studies), one possibility that cannot be excluded is that SEF was engaged to counteract the natural tendency to direct the gaze to “interesting” objects. In this vein, memory load effects detected in SEF could reflect an increased effort to suppress saccades as the number of objects was increased. A function typically ascribed to SMA is the preparation and inhibition of motor plans (Roux, Wibral, Mohr, Singer, & Uhlhaas, 2012; Sumner et al., 2007). Participants may have withheld their responses to comply with the emphasis on response accuracy rather than speed. We suggest, following this line of reasoning, that participants invested more resources to minimize interference while encoding objects, which is known to take longer for three objects versus one object (Jolicœur & Dell'Acqua, 1998). Over and above these mere speculations, we decided to not analyze further this aspect of the fMRI results and to focus on the interhemispheric localization of memory load effects in the posterior brain, namely, the issue at stake in the present experimental context.
The cluster analysis focused on the two symmetrical posterior ROIs, which included the IPS region (Figure 3A). Each voxel of the ROI of each participant was assigned to one of the four clusters. As a result, 1603 voxels were assigned to Cluster 1, 1405 voxels to Cluster 2, 813 voxels to Cluster 3, and 796 voxels to Cluster 4, for the left ROI, while 957, 776, 486, and 381 voxels were assigned to Clusters 1–4, respectively, for the right ROI. These values, collapsed across hemispheres, are summarized in Table 1. Figure 4 shows the extremely low variability of the mean BOLD responses across participants for the two largest clusters as a function of ROI and experimental condition (the raw output of the k-means for each cluster can be viewed in Figure 5).
| | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|
| Number of voxels | 2560 | 2181 | 1299 | 1177 |
The overall spatial organization of the clusters and spatial agreement across participants within each ROI is shown in Figure 6.
The maximum number of participants contributing spatially correspondent voxels to Cluster 1 was 14 for the right ROI and 15 for the left ROI, and the corresponding maximum for Cluster 2 was 11 for the right ROI and 14 for the left ROI. Most importantly, Clusters 1 and 2 were characterized by consistent and symmetrical spatial localizations (Figure 6). Cluster 1 was located in the superior part of IPS (MNI coordinates: right ROI: 26, −72, 50; left ROI: −24, −62, 52). Cluster 2 was located in the inferior part of IPS (MNI coordinates: right ROI: 32, −80, 24; left ROI: −34, −80, 24). Voxels in Clusters 3 and 4 had less well-defined spatial organizations. The maximum number of participants contributing spatially correspondent voxels to Cluster 3 was 10 for the right ROI and 10 for the left ROI, and the corresponding numbers for Cluster 4 were 9 for the right ROI and 7 for the left ROI.
The analyses reported next were carried out separately for each cluster to check for the presence of an SPCN-like response, namely, a larger contralateral than ipsilateral increase in BOLD response as memory load was increased from 1 to 3.
The BOLD response in Cluster 1 increased with memory load (F(1, 19) = 211.04, p < .001), but asymmetrically between ROIs (F(1, 19) = 137.23, p < .0001). Contralateral and ipsilateral BOLD responses increased equally with memory load in the right ROI. Contralateral and ipsilateral BOLD responses showed a different pattern in the left ROI (min t(19) = 12.17, p < .00001), with an effect of memory load that was more substantial ipsilaterally than contralaterally (Figure 4).
The BOLD response in Cluster 2 also increased with memory load (F(1, 19) = 558.19, p < .0001) and symmetrically between ROIs (F < 1), that is, the increase in BOLD response driven by memory load was more evident contralaterally than ipsilaterally in both the left and right ROIs (min t(19) = 6.04, all ps < .00001), indicating a contralateral effect of memory load bearing a strict analogy with SPCN activity (Figure 4).
In Cluster 3, the analysis detected an asymmetric distribution between ROIs of the memory load effect, as stated by the three-way interaction (F(1, 19) = 76.75, p < .0001), with larger BOLD responses in both ROIs when three versus one color(s) had to be memorized (min t(19) = 7.85, all ps < .00001), increased ipsilateral versus contralateral activity across all experimental conditions, except when memory was probed for three colors in the right ROI, where the opposite behavior was detected (min t(19) = 4.84, all ps < .001). Although this type of response pattern resembles that of Cluster 1, activity in Cluster 3 was supported by a number of voxels that was half the number of voxels assigned to Cluster 1. Furthermore, Cluster 3 voxels were more spatially sparse across participants. These features invite caution in attempting to qualify the functional contribution of neurons in Cluster 3 to working memory. We present a tentative account in relation to prior failures to find SPCN-like neurons in the Discussion.
Cluster 4 was the smallest and was not sensitive to memory load (F < 1). Voxels in this cluster were spatially scattered and likely were a collection of nonactive voxels.
Table 1 summarizes the main features of the four clusters, highlighting that Cluster 1 and Cluster 2 meet all requirements to be proposed as neural structures primarily involved in the present VWM task, with responses that lend themselves to a more meaningful interpretation in terms of memory load and side of encoding, compared with those of Cluster 3 and Cluster 4.
A final test examined the expected correlation between the maxima of the contralateral minus ipsilateral mean BOLD responses from Cluster 2 and behavioral estimates of the number of objects stored in VWM, Cowan's K. Individual Ks correlated with the maxima of Cluster 2 when three colors had to be memorized (rs = .397, p = .041). No correlation was detected between maxima of the contralateral minus ipsilateral mean BOLD responses from Cluster 1 and Ks.¹
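The brain-behavior test described above can be illustrated with a minimal sketch. It assumes that the lateralized index is the peak of the contralateral minus ipsilateral difference time course and that rs denotes a Spearman rank correlation; the function names and any input values are hypothetical and are not taken from the study's analysis pipeline.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def lateralization_peak(contra, ipsi):
    """Peak of the contralateral-minus-ipsilateral time course (assumed index)."""
    return max(c - i for c, i in zip(contra, ipsi))
```

In use, one would compute `lateralization_peak` once per participant for the set size 3 condition and pass the resulting list, together with the individual K estimates, to `spearman`.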
Control Experiment and Analyses
Standard whole-brain multiple regression analysis on the control experiment showed that the memory load effect did not vary significantly between three versus five colors (F < 1), suggesting BOLD responses in IPS were at plateau. Consistently, behavioral indexes of VWM capacity and sensitivity indicated generally worse performance with five versus three colors, suggesting that VWM capacity was brought to saturation with five colors. The average K was 1.83 (SD = .35) for three items and 1.46 (SD = .55) for five items (t(15) = 2.94, p = .01); accuracy in the memory task was higher for three items (d′ = 1.79, SD = .43) than for five items (d′ = .82, SD = .38; t(15) = 8.68, p < .0001).
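For readers less familiar with the two behavioral indexes reported above, the following sketch shows how they are commonly computed. It assumes the single-probe form of Cowan's K, K = N × (hit rate − false alarm rate), and the equal-variance signal detection definition of d′. Whether the study used exactly these forms, and how hits and false alarms were defined for the probe task, is an assumption; the function names are hypothetical.

```python
from statistics import NormalDist


def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Capacity estimate for single-probe tasks: K = N * (H - FA)."""
    return set_size * (hit_rate - false_alarm_rate)


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: d' = z(H) - z(FA).

    Rates of exactly 0 or 1 have no finite z score (inv_cdf raises an
    error), so they must be corrected before calling this function.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)
```

Because K is bounded by the set size, a fixed hit and false alarm rate yields a larger K at larger set sizes, which is why a drop in K from three to five items, as reported here, points to a marked fall in accuracy.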
The between-group conjunction analysis contrasting the main experiment versus the control experiment confirmed that no further increment of BOLD response in IPS was found beyond three colors (Figure 7). The pair of posterior symmetric regions of activation (MNI coordinates for each maximum are reported: left: −22, −66, 38, max z = 4.72, p < .001; right: 22, −70, 52, max z = 4.21, p < .001) and one frontal area located centrally (0, 8, 58, max z = 4.37, p < .05) were, indeed, still identified, indicating activation up to VWM capacity limit, but not beyond it.
The results of the control analysis performed to validate the spatial localization of the clusters are displayed in Figure 8. The probability of obtaining a cluster as large as the one we observed, under shuffling, was .024 and .037 for Cluster 1 on the left and right ROI, respectively, and .009 and .377 for Cluster 2 for the left and right ROI, respectively. The probability for Cluster 2 in the right ROI is higher compared with the others because the number of voxels assigned to this cluster is smaller. However, this region is well localized in the spatial map in the symmetric area compared with where Cluster 2 is localized in the left ROI. This further reduces the chance that this localization is a coincidence.
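The logic of this shuffling test can be sketched as follows. The sketch assumes that cluster labels are permuted across the voxel positions of an ROI and that the statistic is the size of the largest face-connected group of voxels carrying the target label; the exact shuffling scheme and statistic used in the study may differ, and all names are hypothetical.

```python
import random
from collections import deque

# Six face-adjacent neighbours of a voxel in a 3-D grid.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def largest_component(coords):
    """Size of the largest face-connected set among the given voxels."""
    remaining = set(coords)
    best = 0
    while remaining:
        queue = deque([remaining.pop()])
        size = 1
        while queue:  # breadth-first search from the seed voxel
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in remaining:
                    remaining.remove(nb)
                    queue.append(nb)
                    size += 1
        best = max(best, size)
    return best


def permutation_p(labels, target, n_perm=1000, seed=0):
    """Estimate P(largest component >= observed) under label shuffling.

    labels maps each voxel coordinate (x, y, z) to its cluster label.
    """
    rng = random.Random(seed)
    coords = list(labels)
    values = [labels[c] for c in coords]
    observed = largest_component([c for c in coords if labels[c] == target])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign labels to random voxel positions
        shuffled = [c for c, v in zip(coords, values) if v == target]
        if largest_component(shuffled) >= observed:
            count += 1
    # Add-one correction so the estimate is never exactly zero.
    return (count + 1) / (n_perm + 1)
```

Under this scheme, a cluster with few voxels can reach its observed size by chance more easily, which is consistent with the higher probability reported for Cluster 2 in the right ROI.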
An additional control analysis was carried out to test the stability of the k-means algorithm by varying the number of clusters from 3 to 8. Using k = 3 does not allow Cluster 1 and Cluster 2 to be differentiated because, as can be seen in Figure 5, the latency of the mean BOLD response in Cluster 3 is quite different from that of Cluster 1 and Cluster 2, leading the algorithm to differentiate only between hemodynamic responses with early or late latency, with the third cluster including the remaining nonactive voxels. Using k > 4, the k-means algorithm continues to differentiate Clusters 1 to 3, as with k = 4, thus showing the stability of the results. The additional clusters do not show any meaningful BOLD response, but only noisy temporal patterns. This result suggests that k = 4 is the correct number of clusters to segregate the distinct sources of BOLD signal modulation in the present data set.
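A stability check of this kind can be sketched as below. The sketch runs a plain Lloyd's k-means on per-voxel time courses for each k and asks how well every cluster of a reference partition (here k = 4) is recovered at other values of k. The Jaccard overlap used to match clusters is an illustrative choice, not the criterion used in the study, and all function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length time courses."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per row of data."""
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(data, k)]
    assignment = None
    for _ in range(n_iter):
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c]))
            for row in data
        ]
        if new_assignment == assignment:
            break  # converged: no voxel changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [row for row, a in zip(data, assignment) if a == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def best_jaccard(reference, candidate, ref_label):
    """Best Jaccard overlap between one reference cluster and any candidate cluster."""
    ref = {i for i, a in enumerate(reference) if a == ref_label}
    best = 0.0
    for label in set(candidate):
        cand = {i for i, a in enumerate(candidate) if a == label}
        best = max(best, len(ref & cand) / len(ref | cand))
    return best


def stability(data, reference_k=4, ks=range(3, 9), seed=0):
    """For each k, report how well each reference cluster is recovered."""
    reference = kmeans(data, reference_k, seed=seed)
    labels = sorted(set(reference))
    report = {}
    for k in ks:
        candidate = kmeans(data, k, seed=seed)
        report[k] = [best_jaccard(reference, candidate, lab) for lab in labels]
    return report
```

Overlap values near 1 for k > 4 would correspond to the situation described above, in which the original clusters persist and the extra clusters absorb only noisy voxels. Because k-means depends on its starting centroids, repeating the check over several seeds is advisable.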
The neural activity underpinning the encoding and maintenance of single-feature objects (colors) in VWM was explored in this study through the analysis of fMRI data recorded while participants performed a cued variant of a memory probe task. The paradigm was designed to avoid any sensory imbalance in the visual stimulation sequence (at the time of cuing, stimulus encoding, and memory retrieval) and to maximize the chance of detecting signals reflecting memory retention of object identities rather than of their spatial position. Participants memorized a varying number of colors from a cued visual hemifield while an equal number of colors were displayed in the opposite hemifield. Memory was probed by asking participants to categorize a color displayed shortly afterwards at central fixation as one of the memoranda or a different color. Following a standard whole-brain analysis that confirmed the involvement of large portions of the IPS during this VWM task, the neural activity underlying the maintenance of contralateral visual stimulation was examined using a data-driven analytical approach, a k-means cluster analysis of per-voxel normalized BOLD signals.
The results of the cluster analysis revealed a set of voxels (Cluster 2), characterized by a consistent localization in the brain across individuals, in a region roughly corresponding to IPS 0–2 (e.g., Swisher, Halko, Merabet, McMains, & Somers, 2007). Using a permutation test, we explored and ruled out the possibility that this set of findings was artificially generated by the present analytical approach to the fMRI data. Cluster 2 voxels showed a BOLD response that was more pronounced in IPS contralateral to memoranda compared with ipsilateral IPS, a hallmark of the SPCN ERP component. Furthermore, Cluster 2 voxels showed a BOLD response that increased as the number of objects was increased from 1 to 3. This pattern was detected in a cortical region where displaying five objects did not cause an increase in BOLD response relative to the response at set size 3. Critically for the present investigation, this pattern was symmetrical relative to the cerebral sagittal midline, with SPCN-like BOLD responses showing the symmetrical contralateral bias to memory load variations commonly found using electromagnetic indices of VWM capacity (e.g., Vogel & Machizawa, 2004). As with SPCN activity, the BOLD response of Cluster 2 correlated with Cowan's Ks when three colors had to be held in VWM. The present results therefore suggest that the activity of Cluster 2 voxels is compatible with the proposal of a neuronal scaffolding for the contralateral dominance of VWM representations localized in the human inferior IPS.
The sluggish time course of hemodynamic signal changes encompasses several stages of processing held to be involved in change detection tasks. This calls for a careful examination of at least two alternative functional sources of the observed effects, that is, the encoding of colors from the memory array and the retrieval of information about the encoded color(s) upon probe presentation. We examine the case of retrieval first. Presenting a probe at fixation is no guarantee, in and of itself, that neural activity would ensue symmetrically between the cerebral hemispheres. In fact, ERP evidence from studies on retrospective search indicates that a contralateral bias similar to that reflected in SPCN is present when attention scans an internal representation of the visual world (when encoding from one hemifield), even when a search target is displayed at central fixation. All of this is in line with the hypothesis put forth by Awh and Jonides (2001) that attention mechanisms operating at the perceptual level and attention mechanisms operating at the VWM level overlap to a large extent, both functionally and neuroanatomically (e.g., Nobre, Griffin, & Rao, 2008; Olivers, 2008; Lepsien, Griffin, Devlin, & Nobre, 2005; Nobre et al., 2004; Yantis et al., 2002; de Fockert, Rees, Frith, & Lavie, 2001; LaBar, Gitelman, Parrish, & Mesulam, 1999; Coull & Nobre, 1998). Kuo, Rao, Lepsien, and Nobre (2009), for instance, instructed participants to search a visual array of random shapes or colored squares for a target (shape in one condition and color in a different condition) that was either shown centrally before the visual array in precue trials or after the visual array in postcue trials. The visual array could be composed of either two or four eccentric shapes or colors to evaluate the impact of varying the number of objects in a visual array on search efficiency in the sensory and VWM domains.
ERP responses time-locked to postcues observed in postcue trials, whose structure closely resembled the trial structure of the present investigation, were characterized by activity compatible with SPCN. A similar design was employed by Dell'Acqua, Sessa, Toffanin, Luria, and Jolicœur (2010), with comparable results. It is important to note, however, that in both these studies SPCN attenuated (Dell'Acqua et al., 2010) or remained stable (Kuo et al., 2009; see also Nobre et al., 2004) as memory load was increased. This distinctive SPCN modulation marks a sizable difference between SPCN driven by retrieval mechanisms and that driven by VWM maintenance, which, like the BOLD response of Cluster 2 voxels, invariably increases as the set size of the memory array is increased. The susceptibility of IPS activity to retrieval operations in change detection was the subject of an fMRI exploration by Todd and Marois (2004). The hemodynamic activity due to maintenance and retrieval was separately analyzed by employing a particularly protracted retention interval in a standard change detection task. VWM load-related increments of the BOLD activity in IPS were observed during retention, but not during retrieval. Collectively, this set of findings makes a strong case against the idea that the SPCN-like behavior of Cluster 2 voxels was modulated by neural processing engaged in the retrieval phase of the present task.
It is admittedly hard to disentangle the impact of encoding operations as a source of SPCN-like activity of Cluster 2 voxels observed in the present investigation, as encoding and VWM maintenance are strongly intertwined operations. One natural objection that can be raised here (as well as in most prior fMRI work using change detection designs or variants, where this issue was seldom explicitly discussed) is that activity of Cluster 2 voxels may have come about as a result of attention iterating through objects displayed on the cued side of the memory array for encoding purposes. This is usually reflected in an ERP component that temporally precedes the onset of SPCN activity, namely, N2pc (Luck & Hillyard, 1994a, 1994b). N2pc bears a number of analogies with SPCN. Like SPCN, N2pc manifests itself as an increment in negativity at posterior electrodes (e.g., OL/R, PO7/8) contralateral to an eccentric target relative to ipsilateral symmetrical electrodes. Figure 5 suggests that the BOLD response from Cluster 2 occurred slightly earlier than BOLD responses from the other three clusters, especially Cluster 1, located in the superior part of IPS, and this may lead one to suspect that Cluster 2 voxels reflected N2pc-like, rather than SPCN-like, activity. As for the case of retrieval, however, it is important to examine reports of N2pc variations as a function of perceptual and/or VWM load. We employed the paradigm used in the present context in a prior study in which we displayed a varying number of colors and simple geometrical shapes to estimate how the number of objects and their complexity affected the amplitude of SPCN. Although SPCN exhibited the prototypical amplitude increment as the number of to-be-encoded colors/shapes was increased, no comparable N2pc modulation was observed across four experiments. N2pc amplitude did not vary whether participants were exposed to two or four objects (Luria, Sessa, Gotler, Jolicœur, & Dell'Acqua, 2010).
In the precue trials of Kuo et al. (2009), when the search target was displayed before a visual array of two or four simple-feature objects, an N2pc was observed in the ERP locked to the onset of the visual array. However, the amplitude of the N2pc did not vary as a function of set size. In a task that required a speeded classification of two or four eccentric alphanumerical stimuli, Jolicœur et al. (2008) found an N2pc, but again no N2pc amplitude variations were reported as a function of set size. These findings point to a functional dissociation between N2pc and SPCN, a notion that seems to have recently gathered a large consensus among scientists involved in attention and VWM studies. N2pc is likely to index attention-driven gating of visual information aimed at preserving VWM from clutter by distractors, whereas SPCN would be a direct reflection of the number of stored objects, in a way that has been shown to be largely unaffected by the distractor/target nature of the encoded visual information (e.g., Gaspar, Christie, Prime, Jolicœur, & McDonald, 2016; Bacigalupo & Luck, 2015; Luck & Vogel, 2013; Jolicœur et al., 2008; Vogel, McCollough, & Machizawa, 2005) and by the extent of the spatial area in the visual hemifield occupied by the targets, which usually covaries with the number of targets in the memory array (Luria & Vogel, 2014; Ikkai, McCollough, & Vogel, 2010). On the other hand, the literature has always emphasized distractors as playing a critical role in N2pc amplitude modulations (Luck & Hillyard, 1994b). Of note, to-be-encoded colors in this study, as in those briefly considered above, were all to-be-encoded “singletons” displayed in the absence of distractors.
A general impression is that, unless target objects must be searched in arrays crowded with distractors (Bacigalupo & Luck, 2015), tracked among nonstationary distractors like in multiple object tracking (Drew, Horowitz, Wolfe, & Vogel, 2012), or enumerated when displayed among distractors (Pagano & Mazza, 2012; Mazza & Caramazza, 2011), N2pc amplitude variations depending on the number of targets have, to the best of our knowledge, never been observed. These findings provide little support for the hypothesis that load variations of the BOLD response from Cluster 2 (and Cluster 1) described in the present investigation are a hemodynamic analogue of N2pc.
Additional results were of interest. The cluster analysis revealed a distinct cluster (Cluster 1) with a consistent interindividual spatial distribution of voxels in the superior IPS, in a region roughly corresponding to the medial banks and extending laterally in proximity of IPS 3–4. In contrast to Cluster 2, BOLD responses from right Cluster 1 were sensitive to memory load but did not depend on the location of the memorized objects. It is possible that neurons in right superior IPS encoded and maintained all objects—both cued (contralateral) and uncued (ipsilateral) objects—from the memory array. The right BOLD response pattern from Cluster 1 is generally compatible with proposals about the dominance of the right frontoparietal circuit (including the dorsal IPS) in attentional control over both visual hemifields (e.g., Corbetta, Kincade, Lewis, Snyder, & Sapir, 2005). BOLD responses detected from the left Cluster 1 were also sensitive to memory load and displayed a response that was intensified for ipsilateral compared with contralateral memoranda. On one hand, the left BOLD response pattern from Cluster 1 converges with recent investigations pointing to the left IPS as a critical hub of a neural circuit that inhibits distractors by downsizing their visual salience while the right hemisphere selects targets (Mevorach, Hodsoll, Allen, Shalev, & Humphreys, 2010; Mevorach, Shalev, Allen, & Humphreys, 2009). Memoranda and distractors in the present design were in fact equally salient, and selection of visual input was driven on a purely attentive basis. Cluster 1 neurons in the left superior IPS may have played just this role, that is, to inhibit contralaterally displayed uncued objects to allow a more efficient selection of visual input from the left hemifield. 
On the other hand, the view that neural activity in IPS cannot be fully understood without considering the strict interhemispheric connection of homologous areas in both cerebral hemispheres (via the caudal portion of the corpus callosum) appears solid and convincing (e.g., Szczepanski & Kastner, 2013; Szczepanski, Konen, & Kastner, 2010). In this framework, a different perspective on the ipsilateral bias of neurons in the left bank of Cluster 1 is one that calls into play the interhemispheric distribution of BOLD responses observed in Cluster 3. Voxels in right Cluster 3 showed a BOLD response pattern indistinguishable from the SPCN-like pattern of Cluster 2. Voxels in Cluster 3 lacked a consistent spatial distribution, presumably because of the inherent interindividual variability associated with superior IPS visuotopic subregions. One limitation of this study is that no visuotopic mapping was performed before the memory test, which might have allowed us to ascertain that Cluster 3 did not simply emerge as some random combination of the BOLD responses from Cluster 1 and Cluster 2. The BOLD response from Cluster 3 did display a distinguishing feature relative to Cluster 1 and Cluster 2, namely, a postponed peak latency associated with a more “smeared” time course, but whether this feature reflected a distinct function of Cluster 3 neurons (e.g., related to probe processing) or the skewed portion of BOLD responses from Cluster 1 and Cluster 2 will require further work. This point apart, the cluster analysis revealed that right IPS voxels with a contralateral bias outnumbered left superior IPS voxels with analogous properties. This may be the cause of the BOLD response asymmetry observed in Cluster 1, assuming that the generally stronger contralateral bias of right IPS voxels to encode input from the left hemifield coaxed part of the left superior IPS for the same purpose, thereby inducing an ipsilateral bias in left superior IPS.
Relative to prior failures to find a hemodynamic correlate of SPCN, it must be pointed out that the present SPCN-like BOLD response shared a characteristic with basically all past similar explorations. Consistently across fMRI/fNIRS studies, the main feature of the hemodynamic response reflecting VWM maintenance in IPS was a bilateral increment in neural activity as the number of memoranda was increased. This effect was replicated here. Figure 4 also makes clear that the intensity of the contralateral dominance was substantially smaller than the bilateral response to memory load, a pattern bearing a close analogy to prior electromagnetic VWM explorations using a cued change detection design (Robitaille et al., 2010). This raises the possibility that prior failures to find an SPCN-like response were caused by the lack of power in detecting a small effect in the presence of a strong bilateral BOLD increment, a limitation that was presumably circumvented in the present investigation with the use of cluster analysis. Although proposing an analytical solution to the discrepancy between the results of the standard whole-brain analysis (showing an SPCN-like BOLD response pattern from the left IPS) and those of the cluster analysis (showing a symmetrical SPCN-like BOLD response from both the left and right inferior IPS) is beyond the scope of the present investigation, some hints at where future efforts should be directed can be mentioned. For instance, one issue that should not be neglected is that the group level statistical analysis based on the general linear model (GLM) approach classifies each voxel as having an equivalent functional connotation across participants, whereas voxels having an analogous functional connotation can be spatially contiguous but not necessarily coincident in the cluster analysis. Furthermore, GLM methods usually hinge on a formal model of the hemodynamic response that is assumed to be constant across voxels.
The results of our cluster analysis revealed, however, that the shape of the hemodynamic response differed substantially across the four clusters. These results suggest, on the one hand, that the standard GLM approach may miss important functional relationships by combining voxels with distinct hemodynamic responses to different conditions and, on the other hand, that the cluster-based approach can not only pull apart these different functional relationships but can also do so while overcoming the strong assumption of a common hemodynamic response in all voxels.
In conclusion, we used a cluster analysis to classify voxels in load-sensitive regions of IPS according to their memory-related physiology. The results suggest the presence of spatially contiguous voxels in the inferior IPS of most of our participants that had a larger load-dependent increase in activity for stimuli memorized from the contralateral visual field (relative to ipsilateral), as memory load increased. This SPCN-like response of inferior IPS neurons may provide part of the neuronal basis of the SPCN observed in EEG and MEG and be part of the neuronal scaffolding for the contralateral dominance of VWM representations in the human brain.
The authors would like to thank Debora Zanatto for her support during the fMRI data acquisition. This work was supported by grant STPD11B8HM from the University of Padova.
Reprint requests should be sent to Roberto Dell'Acqua, Centre for Cognitive Neuroscience and Department of Developmental Psychology, University of Padova, Padova, Italy, or via e-mail: email@example.com.
Like Vogel and Machizawa (2004), we also tested the “filtering efficiency” component of VWM by adopting their subtractive algorithm. We correlated the difference between the maxima of the contralateral minus ipsilateral mean BOLD responses for set sizes 3 and 1 from Cluster 2 with each individual's K for set size 3. The correlation was not significant. We note, however, that Vogel and Machizawa (2004) selected the lateralized ERP responses for set sizes 4 and 2 and found a significant correlation, whereas our failure to find a similar result for the BOLD response was likely due to a ceiling effect in the set size 1 condition, producing little variance in K across individuals. In general, it must be considered that one needs an N ≥ 50 for a Cohen's d of .8 to find a correlation of .4 (Yarkoni & Braver, 2010). Therefore, besides the correlations reported in the present context, the correlations reported in other studies using similar sample sizes should also be taken with caution.