Abstract
The top–down control of attention involves command signals arising chiefly in the dorsal attention network (DAN) in frontal and parietal cortex and propagating to sensory cortex to enable the selective processing of incoming stimuli based on their behavioral relevance. Consistent with this view, the DAN is active during preparatory (anticipatory) attention for relevant events and objects, which, in vision, may be defined by different stimulus attributes including their spatial location, color, motion, or form. How this network is organized to support different forms of preparatory attention to different stimulus attributes remains unclear. We propose that, within the DAN, there exist functional microstructures (patterns of activity) specialized for controlling attention based on the type of information to be attended. To test this, we contrasted preparatory attention to stimulus location (spatial attention) and to stimulus color (feature attention), and used multivoxel pattern analysis to characterize the corresponding patterns of activity within the DAN. We observed different multivoxel patterns of BOLD activation within the DAN for the control of spatial attention (attending left vs. right) and feature attention (attending red vs. green). These patterns of activity for spatial and feature attentional control showed limited overlap with each other within the DAN. Our findings thus support a model in which the DAN has different functional microstructures for distinctive forms of top–down control of visual attention.
INTRODUCTION
Visual attention can be voluntarily directed to spatial locations (spatial attention) or to object features such as color or motion (feature attention; Duncan & Humphreys, 1989; Posner, Snyder, & Davidson, 1980). Deployment of voluntary attention in advance of stimulus processing (preparatory attention) enables facilitation of attended information and suppression of ignored or irrelevant information (Heinze et al., 1994; Mangun & Hillyard, 1991; Corbetta, Miezin, Dobmeyer, Shulman, & Petersen, 1990; Moran & Desimone, 1985; Van Voorhis & Hillyard, 1977). Neurophysiologically, this feat is thought to be achieved by control signals issued by a predominantly frontal and parietal network that bias visual cortex to enable the selective processing of incoming sensory stimuli (Corbetta, Kincade, Ollinger, McAvoy, & Shulman, 2000; Hopfinger, Buonocore, & Mangun, 2000; Gitelman et al., 1999; Kastner, Pinsk, De Weerd, Desimone, & Ungerleider, 1999). This attentional control network includes bilateral FEFs and bilateral superior parietal lobule/intraparietal sulcus (SPL/IPS), and related areas, and has been referred to as the dorsal attention network (DAN; He et al., 2007).
The DAN has been implicated in the attentional control of different forms of visual attention, including spatial attention, feature attention, and object attention (Morishima et al., 2009; Slagter et al., 2007; Corbetta et al., 2005; Giesbrecht, Woldorff, Song, & Mangun, 2003). What remains unclear is precisely how the DAN supports these different forms of attentional control, or, put another way: How does the activity in the DAN represent the different to-be-attended stimulus attributes in order to provide specific top–down control signals to sensory systems? For example, functional imaging studies of top–down preparatory attentional control mechanisms have found scant evidence for differential specializations in top–down control of spatial compared to nonspatial attention (Slagter et al., 2007; Corbetta et al., 2005; Giesbrecht et al., 2003), although important clues come from work on the mechanisms of feature attention (Niklaus, Nobre, & van Ede, 2017; Ibos & Freedman, 2016; Summerfield & Egner, 2016; Astrand, Ibos, Duhamel, & Ben Hamed, 2015; Bichot, Heard, DeGennaro, & Desimone, 2015; Baldauf & Desimone, 2014; Liu & Hou, 2013; Greenberg, Esterman, Wilson, Serences, & Yantis, 2010), and to a lesser but important extent on attention to objects (Liu, 2016; Baldauf & Desimone, 2014; Jiang, Summerfield, & Egner, 2013; Morishima et al., 2009).
Two general alternative models have been offered about the nature of the DAN in the control of attention in vision. One is a domain-general model (Spagna, Mackie, & Fan, 2015; Fedorenko, Duncan, & Kanwisher, 2013; Wojciulik & Kanwisher, 1999) and/or supramodal model (Betti, Corbetta, de Pasquale, Wens, & Della Penna, 2018; Salmela, Salo, Salmi, & Alho, 2018; Wang, Viswanathan, Lee, & Grafton, 2016; Green, Doesburg, Ward, & McDonald, 2011; Shomstein & Yantis, 2004) of the DAN, where it serves as an executive system for all forms of attentional control. In such a view, both specializations in the functional organization within the DAN (Liu & Hou, 2013), and of intra-DAN connectivity (Szczepanski & Kastner, 2013), likely play roles in different forms of attentional control. A different view is the idea that the DAN is primarily a spatially based system (Szczepanski & Kastner, 2013; Mangun & Fannon, 2007; Molenberghs, Mesulam, Peeters, & Vandenberghe, 2007; Bichot, Schall, & Thompson, 1996) and that nonspatial feature and object representations and/or control mechanisms are supported by specialized regions outside the classical DAN (e.g., inferior frontal junction; Bichot et al., 2015; Baldauf & Desimone, 2014). Here, we focus on the functional architecture of the DAN, asking whether specializations within it might be related to different forms of top–down attentional control during preparatory attention.
Neuroimaging studies of attentional control have primarily relied on univariate analysis (e.g., Bengson, Kelley, & Mangun, 2015; Szczepanski, Pinsk, Douglas, Kastner, & Saalmann, 2013; Sestieri et al., 2008; Corbetta et al., 2000; Hopfinger et al., 2000). In univariate fMRI analysis, for a voxel to be reported as activated by an experimental condition, it needs to be consistently activated across individuals. Individual differences in voxel activation patterns could therefore lead to failure to detect the presence of neural activity in a given region of the brain (e.g., Haxby et al., 2011). Multivoxel pattern analysis (MVPA) provides a way to take into account the multivariate spatial pattern of BOLD activity across voxels in order to discriminate between experimental conditions (Haynes, 2015; Tong & Pratte, 2012; Norman, Polyn, Detre, & Haxby, 2006). Studies applying MVPA to object recognition (Sterzer, Haynes, & Rees, 2008), attention (Liu & Hou, 2013; Greenberg et al., 2010), and emotion (Kim et al., 2015) have shown that multivoxel patterns can differ between experimental conditions from individual to individual, even if the average BOLD activity is comparable across conditions and/or individuals. Furthermore, MVPA is conducted at the individual-participant level, thereby accounting for each participant's idiosyncratic spatial pattern of BOLD responses (Haxby, Connolly, & Guntupalli, 2014; Cox & Savoy, 2003).
To investigate the organization of top–down preparatory attentional control in the DAN, we utilized a well-established cued spatial/feature attentional control task, which permitted us to distinguish preparatory attentional control from selective sensory processing and motor responses (Slagter et al., 2007; Giesbrecht et al., 2003). On each trial, an auditory cue (spoken word) was presented that gave advance information about the to-be-attended target attribute (spatial location or color). Univariate and multivariate analyses were performed on the cue-evoked BOLD activity to investigate the distinct functional neuroanatomical substrates of spatial versus feature attentional control in DAN. Here, we report successful decoding of different forms of attentional control in the cue–target interval and provide evidence for distinct neural activity patterns—referred to as microstructures—for spatial versus feature attentional control in the DAN. These findings have important implications for our understanding of the neural mechanisms of voluntary attentional control.
METHODS
Participants
The experimental protocol was approved by the institutional review board of the University of Florida. Twenty healthy, right-handed college students (mean age 24.65 ± 2.87 years, 15 men and 5 women) with normal or corrected-to-normal vision, and no history of neurological or psychological disorders, provided written informed consent and participated in the study.
Paradigm
The experimental paradigm used was a variant of those used in many previous preparatory attention studies in which attention-directing cues instructed participants how to selectively focus attention on each trial (e.g., Corbetta et al., 2000; Hopfinger et al., 2000). As illustrated in Figure 1, two peripheral locations, 3.6° lateral to the upper left and upper right of the fixation point, were marked on the screen. Each trial started with a spoken auditory cue of 500 msec in duration instructing the participant how to covertly direct attention on that trial. Three types of trials were included: (i) spatial cue trials, which directed attention to a spatial location (“left” or “right,” independent of color; 40% of all trials); (ii) color cue trials, which directed attention to a color (“red” or “green,” independent of location; 40% of all trials); and (iii) neutral trials (the word “none”; 20% of all trials), which directed attention to neither a spatial location nor a color, but instructed the participant to prepare to report the orientation of whichever rectangle was presented on a gray patch. On 80% of the spatial and color cue trials, target stimuli followed the cues after a variable delay of 3000–6600 msec. The target stimuli were either two colored rectangles (red and/or green) simultaneously presented in the left and right hemifields for a duration of 200 msec (valid trials) or a single rectangle of 200 msec in duration appearing in the uncued location or having the uncued color (invalid trials). On the remaining 20% of spatial and color cue trials, the cue appeared but no target followed (cue-only trials).
Experimental paradigm. Each trial started with an auditory cue (spoken words, 500 msec in duration) that instructed participants to covertly attend, while maintaining fixation on the plus sign, to either a spatial location (“left” or “right,” independent of color), a color (“red” or “green,” independent of location), or neither (“none”; see text for description of these neutral cues). For spatial and color cues, after a variable cue–target ISI (3000–6600 msec), on the majority of trials, two colored rectangles were displayed (200 msec in duration), one in each visual hemifield. Participants were asked to report the orientation of the rectangle (horizontal or vertical) that was displayed in the cued location (regardless of its color) or that had the cued color (regardless of its location); the uncued rectangle was to be completely ignored (except on 8% of trials that were invalidly cued, in which it was the only stimulus on the screen, i.e., there was only a single rectangle that was presented in the uncued location or having the uncued color). An intertrial interval (ITI) that varied randomly from 8000 to 12800 msec followed the onset of the target.
The participants' task was to report (button press) the orientation of the rectangle (target) appearing in the cued location (spatial attention) or having the cued color (feature attention), and to ignore the other rectangle (distractor). For color cue trials, the two rectangles displayed were always of different colors; for spatial cue trials, the two rectangles were either of the same color or of different colors. For neutral cue trials, two rectangles were also displayed, and participants were required to discriminate the orientation of the rectangle with the gray patch in the background. On 8% of the spatial cue or color cue trials, the cues were invalid: only one rectangle was subsequently displayed (50/50 left or right overall), appearing either in the uncued location or with the uncued color, and the participants were required to report the orientation of that rectangle. Both the neutral and invalidly cued trials were included to permit measurement of the behavioral effects of attentional cuing (Posner, 1980), but were excluded from the BOLD analyses because there were too few such trials. An intertrial interval, from target onset to the start of the next trial, varied randomly from 8000 msec to 12800 msec. Trials were organized into blocks, with each block consisting of 25 trials and lasting approximately 7 min, with short rest periods in between. Each participant completed 10–14 blocks over 2 days. The ISIs and trial structure were designed to enable successful deconvolution of overlapping BOLD responses from cues and targets, given the long duration of hemodynamic responses (Woldorff et al., 2004; Ollinger, Corbetta, & Shulman, 2001; Ollinger, Shulman, & Corbetta, 2001; Burock, Buckner, Woldorff, Rosen, & Dale, 1998).
The goal of our experimental design was to contrast two types of preparatory (postcue/pretarget) attention: attention to a spatial location and attention to a nonspatial feature (color). During the preparatory period after spatial cues, participants could covertly orient spatial attention in preparation for discriminating the target orientation at the cued location, with target colors being irrelevant. During the preparatory period after color cues, participants could not develop an expectancy for where the relevant target would appear, only for what its color would be; spatial attention could therefore be oriented, and the target discriminated, only after the targets actually appeared. As a result, during the preparatory period (postcue/pretarget or cue–target interval), participants engaged different forms of attentional control (spatial or color). The logic of the design follows our prior work (Slagter et al., 2007; Giesbrecht et al., 2003), but it does not explicitly preclude participants from adopting a strategy of dividing their spatial attention during the preparatory period after color cues, given that they knew the targets would only appear in either the left or right locations. This is important to bear in mind because it means that some activation of spatial control structures within the DAN by the color cues may be unavoidable in this design (and most others), which would reduce our ability to differentiate patterns of attentional control for spatial versus color attention (however, the pattern of results presented below suggests that this was not the case).
As we noted earlier, and have done in prior studies, it is possible to add additional control conditions to help with the isolation of feature from spatial attention, but no single design can do that perfectly (Slagter et al., 2007; Giesbrecht et al., 2003). This aspect of the design is one reason that (as described below) we performed the decoding separately for preparatory spatial attention (decoding left vs. right attention) and preparatory feature attention (decoding red vs. green attention). By performing the decoding in this way and then comparing the decoding results, we ensure that our decoding results are focused on the forms of attentional control that we aimed to investigate and not merely differences in preparatory spatial attention (e.g., focused attention in spatial trials, but divided spatial attention during feature trials), which could lead to different task sets between spatial and feature trials (Hubbard, Kikumoto, & Mayr, 2019), or potentially trivial differences between conditions, such as systematic deviations of eye position during task performance (Mostert et al., 2018). With respect to the latter issue of eye positions, we ruled out a confounding influence of systematic differences in eye positions by decoding eye position data recorded using an eye tracker during the scanner sessions (see Figure 8).
All participants went through a task training session during which their eye movements during the task were monitored using the EyeLink 1000 eye tracker system (SR Research). The participants who showed an accuracy above a minimum criterion (> 70%) and who were able to maintain proper eye fixation throughout the experiment (assessed by visual inspection of their fixation maps derived from the eye-tracking data) took part in the actual fMRI experiment, where eye tracking was also employed.
fMRI Acquisition and Preprocessing
Functional images were collected on a 3T Philips Achieva scanner (Philips Medical Systems) equipped with a 32-channel head coil. The EPI sequence parameters were as follows: repetition time = 1.98 sec; echo time = 30 msec; flip angle = 80°; field of view = 224 mm; slice number = 36; voxel size = 3.5 × 3.5 × 3.5 mm; matrix size = 64 × 64. Slice orientation was parallel to the plane connecting the anterior and posterior commissures. Simultaneous EEG was also recorded but not analyzed here. To permit assessment of the quality of the EEG recordings, the image acquisition protocol was modified: image acquisition was performed only during the initial 1.85 sec within each EPI volume, with no image acquisition taking place during an interval of 130 msec toward the end of each repetition time.
fMRI data were preprocessed in SPM (Friston et al., 1994). Preprocessing steps included slice timing correction, realignment, spatial normalization, and smoothing. Slice timing correction was carried out using sinc interpolation to correct for differences in slice acquisition time within an EPI volume. In order to account for changes in head position, spatial realignment of the images to the first image of each session was performed using a 6-parameter rigid body spatial transformation. Images from each participant were normalized and coregistered to the Montreal Neurological Institute template. Images were resampled to a voxel size of 3 × 3 × 3 mm, spatially smoothed using a Gaussian kernel with 7-mm full width at half maximum and high-pass filtered with cutoff frequency set at 1/128 Hz.
fMRI Analyses
Cue-evoked BOLD responses were first examined using the univariate general linear model (GLM) approach (Friston et al., 1994). Eight task-related regressors were included in the GLM, modeling the following events: Five regressors modeled BOLD activity related to the five types of cues with correct responses: attend left, attend right, attend red, attend green (each of which included cue–target and cue-only trials), and neutral attention; two regressors modeled BOLD activity evoked by target stimuli, that is, validly cued and invalidly cued target stimuli; one additional regressor modeled the cues with incorrect responses. A t test was performed by contrasting betas from different conditions at the voxel level to yield the t map for each participant. The brain activation map for spatial attentional control was obtained by combining attend left and attend right cues, and the brain activation map for color attentional control was obtained by combining attend red and attend green cues. Statistical analyses were performed at the group level using a one-sample t test on the t maps from all the participants, thresholded at p < .05, corrected for multiple comparisons with the false discovery rate (FDR) method (Genovese, Lazar, & Nichols, 2002) as implemented in SPM. When the group-level maps were computed using the individual contrast maps instead of the individual t maps, the results were virtually identical, with the maps computed the two ways being highly correlated (R = .95).
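To make the analysis logic concrete, the following is a minimal sketch of this voxelwise GLM approach in Python. It is illustrative only: the actual analyses were run in SPM, and the HRF parameters, helper names (`canonical_hrf`, `make_regressor`, `fit_glm`), and array layouts here are assumptions, not the pipeline's specifics.

```python
import numpy as np
from scipy.stats import gamma, ttest_1samp
from statsmodels.stats.multitest import multipletests

TR = 1.98  # repetition time in seconds (see fMRI Acquisition)

def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled at the TR (illustrative parameters)."""
    t = np.arange(0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6  # peak minus undershoot
    return h / h.max()

def make_regressor(onsets_sec, n_scans, tr):
    """Stick functions at event onsets convolved with the canonical HRF."""
    stick = np.zeros(n_scans)
    stick[(np.asarray(onsets_sec) / tr).astype(int)] = 1.0
    return np.convolve(stick, canonical_hrf(tr))[:n_scans]

def fit_glm(X, Y):
    """Least-squares betas. X: n_scans x n_regressors design matrix
    (one column per cue/target type); Y: n_scans x n_voxels BOLD data."""
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return betas  # n_regressors x n_voxels

def group_map(contrast_per_subject):
    """Group level: one-sample t test on per-participant contrast values
    (n_subjects x n_voxels), FDR-corrected across voxels (Benjamini-
    Hochberg, analogous to SPM's FDR option)."""
    t, p = ttest_1samp(contrast_per_subject, popmean=0, axis=0)
    sig, p_fdr, *_ = multipletests(p, alpha=0.05, method='fdr_bh')
    return t, sig
```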
Definition of ROI
The DAN was the focus of this study because of the extensive literature on the role of the regions within this network in attentional control, and this focus permitted us to ask a specific question about this identified attentional control network. The ROIs corresponding to the DAN were selected using the statistically significant (p < .05, FDR corrected) group-level cue-evoked BOLD activation map (space + color cues). FEF included voxels activated in the precentral gyrus, superior frontal gyrus, and middle frontal gyrus region, and SPL/IPS included voxels activated in the inferior parietal region and SPL, consistent with previous studies (Szczepanski et al., 2013; He et al., 2007; Slagter et al., 2007; Giesbrecht et al., 2003). Activated voxels in dorsal precuneus that were contiguous with activated DAN voxels were also included in the ROI (Liu, Bengson, Huang, Mangun, & Ding, 2016; Giesbrecht et al., 2003). In addition to the DAN as a whole, to investigate whether there were differences in MVPA decoding in major subdivisions of the DAN, we also subdivided it into posterior DAN (pDAN; bilateral SPL/IPS), anterior DAN (aDAN; bilateral FEF), left DAN (lDAN; FEF and SPL/IPS in the left hemisphere), and right DAN (rDAN; FEF and SPL/IPS in the right hemisphere).
As described above, the DAN ROI used in this study was defined using the univariate analyses of the attention conditions at the population level in coregistered standard space (Montreal Neurological Institute space). This approach allowed us to capture the core set of DAN voxels involved in the control of attention to space and feature in the present task that are common across the participants. Using this group approach means, of course, that individual differences in the DAN may not be accounted for in our analyses. Although the functional anatomy of the DAN is well conserved across individuals (Dworetsky et al., 2020; Gratton et al., 2018), we nonetheless could have considered alternatives, such as using a localizer scan or templates, to identify the DAN in order to define individual participant ROIs, as we have done in some prior work (Fannon, Saron, & Mangun, 2008). Indeed, some prior decoding studies have defined the DAN at the individual participant level in native space (Liu & Hou, 2013), whereas others have taken approaches similar to our group-level/standard-space method (Zhang & Golomb, 2021). Although there are pros and cons to each approach, one mitigating factor in our present work is that the DAN ROI used in this study is rather large (1390 voxels) and is therefore expected to have significant overlap with individual DANs.
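As an illustration of this ROI definition step, a minimal sketch follows, assuming nibabel and hypothetical file names; the midline split used for the left/right subdivisions is an assumption made for illustration, not the method used here.

```python
import numpy as np
import nibabel as nib

# Hypothetical inputs: the FDR-thresholded group map for space + color
# cues (see Methods) and illustrative anatomical masks for the DAN nodes.
sig = nib.load('group_cue_fdr_mask.nii.gz').get_fdata().astype(bool)
fef = nib.load('fef_mask.nii.gz').get_fdata().astype(bool)
spl_ips = nib.load('spl_ips_mask.nii.gz').get_fdata().astype(bool)

dan = sig & (fef | spl_ips)            # whole DAN ROI
adan, pdan = sig & fef, sig & spl_ips  # anterior / posterior subdivisions

# Left/right subdivisions split at the volume's x-axis midline (assumes a
# standard MNI-oriented volume; a simplification for illustration).
x_mid = dan.shape[0] // 2
ldan = dan.copy(); ldan[x_mid:, :, :] = False
rdan = dan.copy(); rdan[:x_mid, :, :] = False
```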
Estimation of Single-Trial BOLD Responses and MVPA
The MVPA technique exploits differences in the spatial patterns of BOLD activation to classify experimental conditions (Haynes, 2015; Haxby et al., 2014; Norman et al., 2006), and it is performed at the single-trial level. We applied the beta series regression method (Rissman, Gazzaley, & D'Esposito, 2004) to estimate BOLD activation on each trial. This method has been used effectively to estimate single-trial BOLD responses for MVPA (Kriegeskorte, Goebel, & Bandettini, 2006; Norman et al., 2006).
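A minimal sketch of the beta series idea (in the spirit of Rissman et al., 2004: one regressor per trial, fit jointly), reusing the illustrative `make_regressor` helper from the GLM sketch above, is:

```python
import numpy as np

def beta_series(Y, onsets_sec, nuisance, tr):
    """Beta series estimation: one HRF-convolved regressor per trial,
    fit jointly, so each trial gets its own beta at every voxel.
    Y: n_scans x n_voxels; nuisance: n_scans x k (e.g., motion
    parameters and a constant term). Illustrative, not the exact
    implementation used in this study."""
    n_scans = Y.shape[0]
    X = np.column_stack(
        [make_regressor([t0], n_scans, tr) for t0 in onsets_sec]  # 1/trial
        + [nuisance]
    )
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return betas[:len(onsets_sec)]  # n_trials x n_voxels, the MVPA input
```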
There are different MVPA techniques. In this study, a linear support vector machine (SVM) with C = 1 was used to identify the patterns of activity within the DAN that were related to preparatory attention to spatial location (by decoding left vs. right attention) and, separately, to preparatory attention to stimulus color (by decoding red vs. green attention). The resultant SVM weight maps (described below) were then compared for preparatory spatial versus color attention to assess whether the patterns for the two forms of preparatory attention were overlapping or distinct and, if overlapping, to what degree. All the voxels activated in response to spatial or color cues in the DAN ROIs were chosen as features for the MVPA analyses. The classification accuracy for each participant was calculated using a 10-fold cross-validation technique. In this technique, 90% of the labeled data (e.g., for spatial attention, attend left vs. right trials) was used for training the classifier to generate a predictive model, and the remaining 10% of the data was used to test the model by comparing the actual labels against the predicted labels. This process was repeated 10 times using 10 different subsets of trials as testing data, and the 10 prediction accuracies were averaged. This averaged accuracy, referred to as decoding accuracy, measures the distinctiveness of the preparatory spatial attention or preparatory feature attention patterns of BOLD activation in the DAN (Haxby et al., 2014; Haynes & Rees, 2006). A nonparametric permutation technique was used to test the statistical significance of the decoding accuracy against chance-level decoding (Stelzer, Chen, & Turner, 2013). Specifically, at the individual participant level, the class labels were shuffled 100 times, and for each shuffling, the 10-fold cross-validation procedure was carried out. At the group level, one classifier accuracy from the 100 shuffled accuracies was chosen randomly for each participant, and these were averaged across participants. This procedure was repeated 10⁵ times, resulting in 10⁵ chance-level decoding accuracies at the group level. The group-level decoding accuracy obtained from the actual data was compared with this empirical distribution of group-level chance accuracies to determine its statistical significance.
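A compact sketch of this decoding and permutation scheme, assuming scikit-learn and single-trial beta matrices as produced above (variable names are illustrative), is:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(0)

def decode(betas, labels, n_folds=10):
    """10-fold cross-validated linear SVM decoding accuracy.
    betas: n_trials x n_voxels single-trial estimates; labels: 0/1
    (e.g., attend left vs. attend right)."""
    clf = SVC(kernel='linear', C=1)
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    return cross_val_score(clf, betas, labels, cv=cv).mean()

def permutation_null(betas, labels, n_perm=100):
    """Per-participant null accuracies from label shuffling."""
    return np.array([decode(betas, rng.permutation(labels))
                     for _ in range(n_perm)])

def group_null(null_per_subject, n_draws=100_000):
    """Group-level chance distribution: draw one null accuracy per
    participant, average across participants, repeat 10^5 times.
    null_per_subject: n_subjects x n_perm."""
    n_sub, n_perm = null_per_subject.shape
    picks = rng.integers(0, n_perm, size=(n_draws, n_sub))
    return null_per_subject[np.arange(n_sub), picks].mean(axis=1)
```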
One point bears consideration in these methods. We used a 10-fold validation method, in which 90% of the trials were used to train a model and the remaining 10% to test it. Past work has suggested that a leave-one-run-out approach is better at maintaining independence between training and testing data sets during cross-validation when the trials are close together in time (Varoquaux et al., 2017). When the trials are sufficiently separated in time, however, the leave-one-run-out method and the leave-10%-out (10-fold validation) method are expected to generate similar results. Our experiment utilized a slow event-related design, and the average time interval between two adjacent cues was 15 sec (see Figure 1). In addition, 20% of the trials were cue-only trials in which the cue was not followed by a target, leading to further separation of events in the experiment. These design choices help to ensure that there is little overlap in hemodynamic response between any two events, making them fairly independent of each other. As expected, when we directly compared decoding using leave-one-run-out versus leave-10%-out cross-validation in a randomly selected subset of participants (n = 7), we found no significant differences in decoding accuracy between the two methods of cross-validation (p = .95).
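For comparison, the leave-one-run-out scheme can be sketched in the same framework, again assuming scikit-learn; `runs` is a hypothetical per-trial run/block label, and `betas` and `labels` are the arrays from the sketch above.

```python
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# With widely spaced slow event-related trials, this and the 10-fold
# scheme above should agree closely, as observed in the n = 7 check.
acc_logo = cross_val_score(SVC(kernel='linear', C=1), betas, labels,
                           groups=runs, cv=LeaveOneGroupOut()).mean()
```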
Classifier Weight Maps
In addition to decoding accuracy, another key aspect of the SVM technique is the weight map, which can be used to attribute functional significance to each voxel. Specifically, a linear SVM tries to find a hyperplane that maximizes the margin separating two classes of data (Cortes & Vapnik, 1995), which in this study are (i) attending left versus right for spatial attention and (ii) attending red versus green for feature attention. The weight vector normal to each separating hyperplane represents the direction along which there exists maximal separation between the two classes of data. It is worth noting that the weight maps corresponding to the SVM weight vectors described above are difficult to interpret functionally: an fMRI voxel that does not contain stimulus information may acquire a large weight because it helps to improve the signal-to-noise ratio in other voxels that do contain stimulus information (Kriegeskorte & Douglas, 2019). The transformation method proposed by Haufe et al. (2014), however, remedies this situation. In this method, the weight vector from the SVM is transformed into an activation pattern by multiplying it with the covariance matrix of the input data, Z = Cov(X) · W, where X is the input data (an N × V matrix of N trials by V voxels/features), W is the weight vector of length V, and Z is the corrected weight vector, which, according to prior studies, is more functionally relevant (Grootswagers, Wardle, & Carlson, 2017; Haufe et al., 2014). The corrected weight vector was normalized by dividing by its maximum absolute value, projected onto the voxels within an ROI, and visualized in the form of a brain map referred to as the weight map (Lee, Halder, Kübler, Birbaumer, & Sitaram, 2010; Mourão-Miranda, Bokde, Born, Hampel, & Stetter, 2005). In addition to the magnitude, the sign of the weight in each voxel is also meaningful, providing information about the contribution of that voxel to a particular condition (e.g., attend left vs. attend right). To illustrate, suppose that Condition A is assigned a positive label (+1) and Condition B a negative label (−1). A positive weight means that the voxel has higher activity during Condition A than during Condition B, whereas a negative weight means higher activity during Condition B than during Condition A. The functional activation difference becomes more pronounced for voxels with larger weights (see Results section for a demonstration of these properties). Consequently, for the purposes of this study, the weight maps, obtained by combining SVM weight vectors with the Haufe et al. (2014) transformation, can (with appropriate caveats) be interpreted in terms of the functional anatomical microstructures underlying different types of attentional control.
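The transformation itself is a single matrix operation; a minimal sketch, assuming a fitted linear SVC from the decoding sketch above, is:

```python
import numpy as np

def haufe_weight_map(X, w):
    """Transform SVM weights into an interpretable activation pattern
    (Haufe et al., 2014): Z = Cov(X) @ w, then normalize by max |Z|.
    X: n_trials x n_voxels input data; w: length-V weight vector."""
    Z = np.cov(X, rowvar=False) @ w   # covariance-weighted pattern
    return Z / np.abs(Z).max()        # signed, normalized weight map

# For a fitted linear SVC, the weight vector is clf.coef_.ravel().
# Positive map values index voxels more active for the +1 class
# (e.g., attend left); negative values index the -1 class (attend right).
```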
Monitoring of Eye Movements
During fMRI scanning, eye position was monitored and recorded using an EyeLink 1000 MRI-compatible eye tracker. The x and y coordinates of eye position were averaged in 100-msec windows with a 50% overlap and subjected to SVM analysis. At each time point, decoding accuracy between different attention conditions was obtained for each participant by implementing a 10-fold cross-validation technique and averaged across participants to yield the decoding accuracy time course. Serial t tests were performed to identify time periods where decoding accuracy was above chance level. FDR correction was applied to account for multiple comparisons across time points.
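A sketch of this windowing and multiple-comparison step follows; the sampling-rate handling is illustrative, and the per-window decoding reuses the 10-fold SVM scheme sketched for the fMRI analyses.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

def window_gaze(xy, fs, win_ms=100, overlap=0.5):
    """Average x,y gaze coordinates in 100-msec windows with 50% overlap.
    xy: n_samples x 2 gaze trace; fs: sampling rate in Hz."""
    size = int(fs * win_ms / 1000)
    step = int(size * (1 - overlap))
    starts = range(0, len(xy) - size + 1, step)
    return np.stack([xy[s:s + size].mean(axis=0) for s in starts])

# Per window: decode condition labels from the 2-D gaze feature with
# decode() above, average accuracy over participants, t test against
# chance (0.5), then FDR-correct the p values across windows:
def fdr_across_time(pvals, alpha=.05):
    reject, p_fdr, *_ = multipletests(pvals, alpha=alpha, method='fdr_bh')
    return reject, p_fdr
```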
RESULTS
Behavior
The overall mean RT and accuracy across all trial types were 1011 msec ± 183 msec and 93.66% ± 3.82%, respectively. For spatial trials, mean RT was 1016 msec ± 178 msec, and accuracy was 93.94% ± 3.83%, whereas for color trials, RT was 1006 msec ± 188 msec, and accuracy was 93.39% ± 4.50%. There were no significant differences (p > .5) in these overall behavioral measures between the spatial and color attention conditions (Figure 2A). Because p values depend on sample size whereas effect sizes do not, we also considered effect size. Cohen's d was 0.05 for the RT difference and 0.127 for the accuracy difference, both negligible given that d = 0.2 is the conventional threshold for even a small effect. Furthermore, a Bayesian analysis was applied to further compare RT and accuracy between the spatial and feature attention conditions. For RT, the Bayes factor in favor of the alternative hypothesis (RT for spatial ≠ RT for feature) was very low at 0.26, whereas the Bayes factor in favor of the null hypothesis (RT for spatial = RT for feature) was 3.8, with 3.2 or higher considered substantial supporting evidence (Kass & Raftery, 1995). For accuracy, the Bayes factor in favor of the alternative hypothesis (accuracy for spatial ≠ accuracy for feature) was also very low at 0.28, whereas the Bayes factor in favor of the null hypothesis (accuracy for spatial = accuracy for feature) was 3.6. These behavioral results provide converging evidence that the general level of task difficulty was equivalent between the spatial and color cue conditions, which was by design (during pretesting of the paradigm, the aspect ratios and colors of targets in the spatial and color conditions were independently adjusted to match performance across conditions).
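For reference, such effect size and Bayes factor computations can be sketched as follows. This is a paired-samples sketch assuming the pingouin package for a default JZS Bayes factor; the paired d computed here (sometimes called d_z) is one of several Cohen's d variants, and it is an assumption about the exact formula rather than a statement of the one used in this study.

```python
import numpy as np
import pingouin as pg

def paired_effect_stats(a, b):
    """Paired comparison of two conditions (e.g., per-participant mean
    RT for spatial vs. color trials). Returns a paired Cohen's d (d_z
    variant), BF10, and BF01 = 1 / BF10."""
    res = pg.ttest(a, b, paired=True)       # includes a default JZS BF10
    bf10 = float(res['BF10'].iloc[0])
    diff = np.asarray(a) - np.asarray(b)
    d = diff.mean() / diff.std(ddof=1)      # standardized mean difference
    return d, bf10, 1.0 / bf10
```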
Behavioral results. (A) No significant differences in either overall RT or accuracy were observed between spatial and color trials. This suggests that our design ensured that spatial attention and color attention conditions did not differ in task difficulty. (B) Attention effects (differences between validly and invalidly cued trials) were significant for both spatial and color conditions. RT was faster and accuracy was higher for validly cued (attended) targets (* p < .001). This suggests that the participants attended the cued location or the cued feature according to instructions.
Focused attention improved behavioral performance for both spatial trials (valid RT < invalid RT, p < 10⁻³) and color trials (valid RT < invalid RT, p < 10⁻⁵), as shown in Figure 2B, providing behavioral evidence that the participants deployed covert attention selectively to the cued location or the cued feature according to instructions. Furthermore, responses on validly cued spatial and color trials were significantly faster than on neutral (none cue) trials (p < 10⁻⁵), providing additional behavioral evidence that the participants were deploying the appropriate attention after attention-directing cues.
Univariate Analysis of Cue-evoked BOLD Activation
The GLM was applied to examine univariate BOLD activations in response to various attention-directing cues. Consistent with prior reports, bilateral FEF, bilateral SPL/IPS, and precuneus in the DAN were activated in response to both spatial and color cues (Slagter et al., 2007; Giesbrecht et al., 2003; Corbetta et al., 2000; Hopfinger et al., 2000), providing neural evidence that participants deployed preparatory attention in the cue–target interval (Figure 3A, B, and C). Statistically contrasting spatial cues versus color cues (Figure 3D) revealed that small clusters in bilateral SPL and left FEF within the DAN were significantly more activated for spatial cues than for color cues. However, when color cues were contrasted against spatial cues, no activated regions were found (Figure 3E). This pattern of findings is a replication of our original work contrasting attentional control for spatial location and nonspatial feature (color) attention (Mangun & Fannon, 2007; Slagter et al., 2007; Giesbrecht et al., 2003).
Univariate analyses of cue-evoked BOLD activation. BOLD signal was significantly increased (p < .05, FDR) in DAN structures in response to (A) spatial + color cues, (B) spatial cues only, and (C) color cues only. (D) Parts of bilateral SPL and left FEF were more activated when spatial cues were contrasted against color cues. (E) No regions in DAN were more activated when color cues were contrasted against spatial cues. These findings replicate prior work demonstrating significant overlap between DAN activation for spatial and feature attention control using univariate analysis methods.
MVPA Analysis: Decoding Different Forms of Attentional Control
In order to test for the existence of distinct multivoxel neural activity patterns supporting spatial and color preparatory attentional control in the DAN that might be obscured by univariate methods, the SVM classifier was applied to voxelwise single-trial beta values in the cue–target interval separately for spatial attention (decoding attend left vs. attend right) and color attention (decoding attend red vs. attend green) trials. As shown in Figure 4A and 4C, within the DAN as a whole, the mean accuracy for decoding attend left versus attend right was 55% (p < .0001), and the mean accuracy for decoding attend red versus attend green was 57% (p < .0001), both significantly above the chance level of 50%. Further dividing the DAN into aDAN, pDAN, lDAN, and rDAN, we found that the decoding accuracies for the spatial conditions (Figure 4B) and the color conditions (Figure 4D) were all above chance level, indicating that distinct neural activity patterns supporting different forms of attentional control were also present in DAN subdivisions. Furthermore, across participants, the decoding accuracies in the different DAN subdivisions and in the whole DAN were correlated with one another (Tables 1 and 2), suggesting that individual differences in pattern distinctness were similar across these tested ROIs.
MVPA decoding accuracy for preparatory spatial and feature attention. Cue-evoked BOLD activation was estimated on a trial-by-trial basis. SVM was applied to decode different attention control conditions. Decoding accuracy for spatial attention (attend left vs. attend right): (A) within DAN as a whole and (B) within subdivisions of DAN. Decoding accuracy for color attention (attend red vs. attend green): (C) within DAN as a whole and (D) within subdivisions of DAN (**p < .0001, *p < .05).
Spatial Attention: Correlations between Decoding Accuracies for Decoding Performed on the Whole DAN and for Decoding Performed Separately on Each Specified Subdivision of the DAN
|      | DAN  | pDAN | aDAN | lDAN | rDAN |
|------|------|------|------|------|------|
| DAN  | 1.00 | 0.92 | 0.87 | 0.94 | 0.83 |
| pDAN | 0.92 | 1.00 | 0.75 | 0.86 | 0.78 |
| aDAN | 0.87 | 0.75 | 1.00 | 0.81 | 0.80 |
| lDAN | 0.94 | 0.86 | 0.81 | 1.00 | 0.76 |
| rDAN | 0.83 | 0.78 | 0.80 | 0.76 | 1.00 |
Feature Attention: Correlation between Decoding Accuracies for Decoding Performed on the Whole DAN and for Decoding Performed Separately on Each Specified Subdivision of the DAN
|      | DAN  | pDAN | aDAN | lDAN | rDAN |
|------|------|------|------|------|------|
| DAN  | 1.00 | 0.85 | 0.85 | 0.94 | 0.64 |
| pDAN | 0.85 | 1.00 | 0.57 | 0.75 | 0.72 |
| aDAN | 0.85 | 0.57 | 1.00 | 0.87 | 0.63 |
| lDAN | 0.94 | 0.75 | 0.87 | 1.00 | 0.56 |
| rDAN | 0.64 | 0.72 | 0.63 | 0.56 | 1.00 |
In Figure 3D, small clusters of voxels in the DAN were more activated by spatial cues than by color cues, and these voxels accounted for 8% of the total DAN ROI. We tested whether the decoding results differed when we included (DAN) versus omitted (DAN-8%) these voxels. For attend left versus attend right, the respective decoding accuracies were virtually identical: DAN = 55.37% and DAN-8% = 55.27%. The correlation between the two spatial attention weight maps was 0.991. For attend red versus attend green, the respective decoding accuracies were again virtually identical: DAN = 56.58% and DAN-8% = 56.66%. The correlation between these two color attention weight maps was 0.996. Thus, the results reported here would not be expected to differ had we omitted the 8% of voxels, and we therefore retained those voxels in defining our DAN ROI.
MVPA Analysis: Weight Maps and Microstructures of Attentional Control
In order to investigate whether there were differences in the topographic patterns (microstructures) of neural activity for preparatory spatial attention versus feature attention in the DAN, SVM weight maps derived for each participant from the SVM classifiers were utilized. Importantly, these SVM weight maps were subjected to the transformation introduced by Haufe et al. (2014; see Methods section for details), which rendered them more functionally interpretable (we refer to these transformed SVM weight maps simply as "weight maps" in the following). Specifically, each voxel was given a signed (+ vs. −) weight value according to the weight map, identifying its contribution toward a given form of attentional control. For example, when decoding attend left versus attend right, voxels with positively signed weight values constitute the microstructure supporting covert attention to the left visual field, whereas voxels with negatively signed weight values constitute the microstructure supporting covert attention to the right visual field; the union of the voxels with positively and negatively signed weight values collectively constitutes the microstructure of spatial attention control (provided that proper thresholding of the weight magnitudes is applied to eliminate voxels that contain mainly noise; see below). The microstructure of feature attention control can be similarly derived from the decoding of attending red versus green. We propose that these weight maps can reveal the microstructure of attentional control activity within the DAN, enabling us to investigate whether there are differences in the patterns of brain activity that characterize spatial versus feature attention control.
To verify the proposed functional meaning of the weight maps so derived, we tested whether the signed voxels (as described above) showed the predicted increases in hemodynamic activity implied by our logic. That is, those signed as attend left, for example, should exhibit increased preparatory hemodynamic activity when the participants were attending left versus attending right, whereas those signed as attend right should exhibit larger hemodynamic responses when the participants attended right versus left. A similar logic applies to the feature attention condition. To accomplish this, we extracted the hemodynamic responses (beta values) for voxels coding each of the four attention trial types (i.e., attend left, attend right, attend red, and attend green). For attend left versus attend right, the attend left voxels identified by decoding had significantly higher BOLD activation for the attend left trials as compared to the attend right trials (p < .05; Figure 5A); similarly, the attend right voxels had significantly higher BOLD activation for the attend right trials as compared to the attend left trials (p < .05; Figure 5B). In contrast, for sets of voxels randomly selected to represent attend right or attend left, there was no difference in BOLD activation between the two attention conditions (Figure 5C). We pursued the same approach to evaluate the functional meaning of the weight maps derived from decoding red versus green. The result was the same: The attend red voxels had significantly higher BOLD activation for the attend red trials as compared to the attend green trials (p < .05; Figure 5D), and the attend green voxels had significantly higher BOLD activation for the attend green trials as compared to the attend red trials (p < .05; Figure 5E); again, randomly selected voxels showed no difference between the two attention conditions (Figure 5F). These results demonstrate the functional interpretability of the attentional control microstructures based on the weight maps derived from the combination of SVM classifiers and the Haufe et al. (2014) transformation.
Comparison of activation evoked by different types of attention cues in different types of decoding-identified voxels. (A) BOLD activation evoked by spatial attention cues in attend left voxels. (B) BOLD activation evoked by spatial attention cues in attend right voxels. (C) BOLD activation evoked by spatial attention cues in randomly chosen voxels. (D) BOLD activation evoked by color cues in attend red voxels. (E) BOLD activation evoked by color cues in attend green voxels. (F) BOLD activation evoked by color cues in randomly chosen voxels. *p < .05.
As the foregoing demonstrated, the weight maps reflect the distributed patterns of neural activity in the DAN that supported spatial or feature attentional control. In particular, we posited that these weight maps would differ according to the information attended to. To test these ideas, we examined the extent of overlap in the weight maps for spatial attention (attend left vs. attend right) and feature attention (attend red vs. attend green). To visualize the overlap, we created maps of the absolute value of the normalized weights for each participant (Figure 6). In these maps, hotter (yellow) colors indicate voxels having higher weights for the respective attention condition (i.e., spatial vs. feature attention). By comparing the yellow regions of the maps in Figure 6 for the control of spatial attention and feature attention in the frontal or parietal nodes of the DAN, one can examine the anatomical similarity/dissimilarity of the patterns of activity under the different attention conditions within the DAN. Visually, it is apparent that the voxels contributing most strongly to decoding spatial attentional control differ from those contributing to decoding feature attentional control. Furthermore, as can be observed in Figure 6, the patterns do not cluster into discrete neuroanatomical subregions within the DAN (i.e., dorsal vs. ventral clusters), but rather are distributed across the DAN, and they differ from participant to participant, further highlighting the importance of using multivariate methods to study this question.
Weight maps as attentional control microstructures. Weight maps from three individual participants (A, B, and C) for spatial attention and feature attention conditions in FEF (left) and IPS/SPL (right). For Participant C, the weight maps are shown both on the unfolded left and right hemispheres and as blown-up insets; the flattened hemisphere views also show the ROIs in FEF and IPS/SPL used in each analysis (defined by the univariate analyses, see Methods section). The normalized absolute values of the weight maps are plotted to compare the strength and distribution of activity for spatial and color-based attentional control in a given region. The hotter color (yellow) indicates voxels that most clearly discriminate the respective form of attentional control. Thus, comparing the patterns of the yellow voxels in FEF in the first column maps (spatial attention) to the patterns of the yellow voxels in the second column maps (color attention) reveals the extent to which spatial attentional control and color attentional control involved similar or different patterns of voxels. The same comparisons can be made for the IPS/SPL region in the third and fourth columns. As can be seen here, in each participant, the patterns of activity related to spatial and color attention are distributed and differ, which is quantified and substantiated by the JI (see text and Figure 7). Moreover, there are considerable individual differences in functional anatomy between participants for the same form of attention control (compare rows within each column).
To quantify the extent of overlap/nonoverlap between the spatial and feature attention control weight maps, we computed the Jaccard index (JI) between the weight maps of the two classes of attention conditions (spatial vs. feature); the index is a measure of the extent of similarity/dissimilarity between two sets of data (Levandowsky & Winter, 1971). Specifically, for two sets of voxels, the JI represents the size of the intersection of the two sets divided by the union of the two sets. A JI of 0 (1) means there is no overlap (total overlap). As the index approaches the value of 0, the two sets overlap to a lesser and lesser extent. Because the spatial and feature attention control weight maps necessarily include all the voxels in the ROI, we computed the JI using the top 50% of voxels according to their weight magnitude from each of the two weight maps. We found that the mean JI was 0.399 ± 0.013. Although this suggests that the two maps did not overlap to a great extent, to understand the meaning of this value further, we conducted the following simulations. Our DAN ROI has 1390 voxels, 50% of which (our thresholded value—see above) is 695. We can expect that even for two random sets of 695 voxels in our ROI, there will be overlap between them. The JI value of two random sets of 695 voxels therefore provides a useful reference number. The following procedure was carried out to obtain such a reference number: (1) 695 voxels were randomly selected in the DAN ROI and assigned to be the weight map of attend space. (2) 695 voxels were randomly selected in the DAN ROI and assigned to be the weight map of attend color. (3) The JI was then computed between the two sets. (4) Steps 1–3 were repeated 1000 times. (5) The mean JI was found to be 0.335, which is slightly less than our JI of 0.399 obtained from the actual data. Thus, the JI of 0.399 in our comparison of spatial to feature weight maps, being only slightly higher than the expected overlap between random sets of voxels, can be taken to indicate that the overlap between the two attention control microstructures (attend space vs. attend feature) is limited rather than substantial.
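Both the JI computation and the random-set reference simulation are straightforward; a minimal sketch follows (variable names illustrative), which also supports the threshold sweep reported below by varying `top_frac`.

```python
import numpy as np

rng = np.random.default_rng(0)

def jaccard(map_a, map_b, top_frac=0.5):
    """JI between the top-`top_frac` voxels (by |weight|) of two maps:
    |intersection| / |union| of the two selected voxel sets."""
    k = int(len(map_a) * top_frac)
    top_a = set(np.argsort(-np.abs(map_a))[:k])
    top_b = set(np.argsort(-np.abs(map_b))[:k])
    return len(top_a & top_b) / len(top_a | top_b)

def random_reference(n_vox=1390, top_frac=0.5, n_iter=1000):
    """Expected JI for two random voxel sets of the same size, the
    chance baseline described in the text (~0.335 for 695 of 1390)."""
    k = int(n_vox * top_frac)
    jis = []
    for _ in range(n_iter):
        a = set(rng.choice(n_vox, size=k, replace=False))
        b = set(rng.choice(n_vox, size=k, replace=False))
        jis.append(len(a & b) / len(a | b))
    return float(np.mean(jis))
```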
Beyond the differing distributions of spatial and feature voxels in the weight maps, voxels within each map also differed in weight magnitude, which indexes their relative significance for a given form of attentional control. To further understand the extent of overlap between the spatial and feature control weight maps, we selected voxels within each weight map according to whether they fell within the top 50%, 40%, 30%, 20%, or 10% of voxel weights, and we calculated the corresponding JI for each selection. As Figure 7A shows, the JI declined monotonically as the weight maps became more selective (going from the top 50% to the top 10% of voxel weights), indicating reduced overlap in the functional anatomical structures for spatial and feature attention control.
Relationship between weight maps underlying spatial and feature attention control. (A) JI quantifying the overlap in weight maps in the DAN between spatial and color-based attention for varying thresholds of weights. For voxels with higher weight values, the overlap (JI) between weight maps (i.e., microstructures) for spatial and feature attention control becomes lower. (B) Decoding attend left versus attend right (blue bars) and attend red versus attend green (red bars) using the top 10% of voxels from the spatial weight map and from the color weight map. In color-selective voxels, decoding accuracy for attend red versus attend green is significantly above chance level, but decoding accuracy for attend left versus attend right is not (right). In contrast, spatial-selective voxels showed significantly above-chance decoding accuracy for both attend left versus attend right and attend red versus attend green (left; *p < .01, **p < 10⁻⁴).
If the reduced overlap among the higher weight voxels of the spatial and color attentional control weight maps reflected these voxels becoming more selective for one form of attention over the other, then spatial attention voxels might be expected to do a poor job of decoding color attention conditions, and vice versa for color attention voxels. To test this, for each participant, we took the voxels whose weight values were in the top 10% for preparatory spatial attention and feature attention, rejected any overlapping voxels, and decoded attend left versus attend right as well as attend red versus attend green in each of the two sets of remaining voxels (the average number of space and color voxels chosen in this analysis across participants was 87 ± 25). We found evidence for such selectivity in the feature attention voxels, which showed above-chance decoding for feature attention (56% for attend red vs. attend green, p < 10⁻⁴) but not for spatial attention (51% for attend left vs. attend right, p > .05; Figure 7B right). However, the same effect was not seen in the spatial attention voxels, which showed above-chance decoding for both spatial attention (53% for attend left vs. attend right, p < .005) and feature attention (53% for attend red vs. attend green, p < .01; Figure 7B left). These results suggest that color attention information is more widely represented across DAN voxels than spatial attention information.
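A sketch of this selective-voxel cross-decoding, reusing `decode()` from the decoding sketch above (the weight map and label arrays are hypothetical names), is:

```python
import numpy as np

def top_voxels(weight_map, frac=0.10):
    """Indices of the top-`frac` voxels by absolute weight."""
    k = int(len(weight_map) * frac)
    return set(np.argsort(-np.abs(weight_map))[:k])

# Select each map's top 10% of voxels and drop the overlap.
space_only = top_voxels(spatial_map) - top_voxels(color_map)
color_only = top_voxels(color_map) - top_voxels(spatial_map)

# Cross-decode: e.g., test whether spatial-selective voxels also carry
# color information (and likewise for the other three combinations).
acc = decode(betas[:, sorted(space_only)], color_labels)
```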
MVPA Analysis: Weight Map Overlap in Subdivisions of DAN
Our weight map analyses above considered the DAN as a whole, but prior work by our group and others (e.g., Popov, Kastner, & Jensen, 2017; Liu et al., 2016) has suggested that there may be functional differences among the subregions of the DAN (e.g., left vs. right DAN [lDAN vs. rDAN], or anterior vs. posterior DAN [aDAN vs. pDAN]). In Figure 4, we showed that the attentional control conditions can be decoded in the whole DAN as well as in subdivisions of the DAN. Here, we tested the overlap between the spatial attention control weight map (top 50% of voxels) and the feature attention control weight map (top 50% of voxels) in each of the four subdivisions of the DAN using two methods, as sketched below. In Method 1, the voxels in a given subdivision were used for the classifier, and the resulting weight maps were compared. In Method 2, all voxels in the DAN ROI were used for the classifier, and the portions of the resulting weight maps falling within the given subdivision were used for the overlap analysis. The JI for each method was as follows: pDAN: JI = 0.436 (Method 1) and JI = 0.408 (Method 2); aDAN: JI = 0.401 (Method 1) and JI = 0.386 (Method 2); lDAN: JI = 0.401 (Method 1) and JI = 0.410 (Method 2); rDAN: JI = 0.391 (Method 1) and JI = 0.382 (Method 2). The JI values from the two methods in each of the four subdivisions were not significantly different from each other and were in line with the value obtained using the whole DAN ROI (JI = 0.399).
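The two methods can be expressed compactly as below. This is a schematic under stated assumptions, not our analysis code: haufe_weight_map is a hypothetical helper that approximates the SVM-plus-Haufe pipeline as the data covariance applied to the SVM weight vector, and the random inputs stand in for real trialwise BOLD patterns and a subdivision mask:

```python
import numpy as np
from sklearn.svm import SVC

def haufe_weight_map(X, y):
    """Fit a linear SVM, then apply the Haufe et al. (2014) transform; for a
    single binary classifier the activation pattern is proportional to the
    data covariance times the weight vector, Cov(X) @ w."""
    w = SVC(kernel="linear").fit(X, y).coef_.ravel()
    return np.cov(X, rowvar=False) @ w

def top_half(weights):
    """Indices of the voxels in the top 50% of absolute weight values."""
    return set(np.argsort(np.abs(weights))[::-1][: weights.size // 2])

def subdivision_ji(X_spa, y_spa, X_feat, y_feat, sub_mask, method):
    """JI between the spatial and feature top-50% maps in one DAN subdivision.
    sub_mask marks the subdivision's voxels within the ROI. Method 1 trains on
    the subdivision's voxels only; Method 2 trains on the whole ROI and then
    restricts the resulting maps to the subdivision."""
    if method == 1:
        w_s = haufe_weight_map(X_spa[:, sub_mask], y_spa)
        w_f = haufe_weight_map(X_feat[:, sub_mask], y_feat)
    else:
        w_s = haufe_weight_map(X_spa, y_spa)[sub_mask]
        w_f = haufe_weight_map(X_feat, y_feat)[sub_mask]
    a, b = top_half(w_s), top_half(w_f)
    return len(a & b) / len(a | b)

# Synthetic demonstration (random data; real inputs would be trialwise BOLD
# patterns from the DAN ROI and a subdivision mask, e.g., posterior DAN)
rng = np.random.default_rng(3)
X_spa, X_feat = rng.standard_normal((2, 80, 400))
y_spa, y_feat = rng.integers(0, 2, (2, 80))
mask = np.zeros(400, dtype=bool)
mask[:150] = True
for m in (1, 2):
    print(f"Method {m}: JI = {subdivision_ji(X_spa, y_spa, X_feat, y_feat, mask, m):.3f}")
```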
Decoding Analysis of Eye Movements
To what extent could subtle systematic patterns of eye movements (e.g., microsaccades) under the different cueing instructions carry information about the attended visual attributes and, in turn, influence the decoding results in the DAN? That is, even though the participants were required to maintain fixation, if they subtly moved their eyes differently for attend left versus attend right trials or for attend red versus attend green trials, and the neural correlates of such systematic eye movements contributed to the training and decoding of DAN activity, then our findings would be confounded and potentially invalid (Mostert et al., 2018). To evaluate this possibility, we applied decoding to the eye tracking data recorded during fMRI scanning, considering three contrasts to assess whether the participants' eye movements varied with condition: 1) spatial attention: attend left versus attend right; 2) feature attention: attend red versus attend green; and 3) spatial attention (collapsed over attend left and right) versus feature attention (collapsed over attend red and green). The decoding accuracy time courses are shown in Figure 8. As can be seen, for all three contrasts, at no time did the decoding accuracy differ significantly from chance level (p > .05). We therefore conclude that systematic eye movements between attention conditions are unlikely to have contributed to our decoding results in the DAN.
Decoding eye movements. (A) Decoding accuracy as a function of time for attend left versus attend right. (B) Decoding accuracy as a function of time for attend red versus attend green. (C) Decoding accuracy as a function of time for attend space versus attend feature (collapsed across attend left and right, and attend red and green, respectively).
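As a schematic of this control analysis, the following sketch computes a cross-validated decoding accuracy at each time point from gaze position; the synthetic data and all names are illustrative, and in the real analysis the inputs were the recorded horizontal and vertical eye positions per trial:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def gaze_decoding_timecourse(gaze, labels, cv=5):
    """Cross-validated decoding accuracy at each time point.

    gaze: trials x time x 2 array of horizontal/vertical eye position.
    labels: one condition label per trial (e.g., attend left vs. attend right).
    """
    return np.array([
        cross_val_score(SVC(kernel="linear"), gaze[:, t, :], labels, cv=cv).mean()
        for t in range(gaze.shape[1])
    ])

# Synthetic fixation data: gaze carries no condition information, so the
# accuracy time course should hover around the 0.5 chance level
rng = np.random.default_rng(4)
gaze = rng.standard_normal((60, 20, 2))   # 60 trials, 20 time samples, x/y
labels = rng.integers(0, 2, 60)
print(gaze_decoding_timecourse(gaze, labels).round(2))
```

Significance against chance at each time point can then be assessed, for example, with a permutation test over shuffled condition labels.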
DISCUSSION
We examined whether there exists a functional microstructure of preparatory attentional control within the DAN. Applying MVPA to BOLD signals in the DAN, we found that (1) the accuracy of decoding attentional control activity for spatial attention (attend left vs. attend right) and feature attention (attend red vs. attend green) was above chance in the DAN as a whole, as well as in the major subdivisions of the DAN, namely, lDAN, rDAN, anterior (frontal) DAN, and posterior (parietal) DAN; (2) weight maps obtained by combining SVM classifiers with the Haufe et al. (2014) transformation differed both qualitatively (visual inspection) and quantitatively (JI) for the control of spatial versus feature attention; (3) the overlap between the two weight maps corresponding to the two types of attentional control was limited and not much different from the expected overlap between two random sets of voxels; (4) the overlap between voxels of the weight maps selected according to their weight values decreased monotonically as the weight threshold for voxel inclusion increased; and (5) the top 10% of voxels in the color attention weight map decoded above chance for color conditions (attend red vs. attend green) but not for spatial conditions (attend left vs. attend right), whereas the top 10% of voxels in the spatial attention weight map decoded above chance for both forms of attention, suggesting that spatial attention information may be concentrated in fewer voxels than color attention information.
Our results provide information about how top–down attentional control signals are organized within the DAN and argue against strong domain-general (Spagna et al., 2015; Fedorenko et al., 2013; Wojciulik & Kanwisher, 1999) or supramodal models of the DAN (Betti et al., 2018; Salmela et al., 2018; Wang et al., 2016; Green et al., 2011; Shomstein & Yantis, 2004). Moreover, our findings provide a basis for understanding the specificity of attentional control mechanisms in the DAN by suggesting that functional microstructures for attentional control could serve as the sources of precise top–down projections to sensory structures as a function of stimulus processing requirements according to behavioral goals.
The MVPA approach used here helps to clarify past findings of little or minimal specialization in the DAN for different forms of attentional control based on univariate fMRI analysis methods, including in our own work (Slagter et al., 2007; Giesbrecht et al., 2003), and that of others (Egner et al., 2008; Vandenberghe, Gitelman, Parrish, & Mesulam, 2001; Wojciulik & Kanwisher, 1999). Indeed, there are good reasons—both theoretical and empirical—to propose that specializations within the DAN should exist for different forms of top–down attentional control. From studies in animals, we know that in the FEFs, there are different classes of colocalized neurons with different functional roles and that these neurons project to different cortical and subcortical targets for the control of eye movements (Pouget et al., 2009; Armstrong, Fitzgerald, & Moore, 2006). In addition to the evidence from animal studies, in humans, TMS applied to parietal cortex using different stimulation parameters produces distinct effects on spatial versus feature-based attention (Schenkluhn, Ruff, Heinen, & Chambers, 2008). As well, combined TMS and EEG research has shown that the functional connectivity between the DAN (FEF) and higher-order visual areas, such as the fusiform face area and human motion-specific cortex, shifts with behavioral task requirements (Morishima et al., 2009).
The DAN itself can be divided into multiple functional zones (Szczepanski et al., 2013; Silver & Kastner, 2009; Sereno, Pitzalis, & Martinez, 2001). The DAN is also known to have different intra-DAN connectivity for different forms of spatial attention, such as for viewer- and object-centered spatial attention (Szczepanski et al., 2013). Furthermore, prior work using MVPA has also suggested differences in the organization of attentional control for different stimulus attributes (Liu & Hou, 2013; Greenberg et al., 2010). We observed distinct neural patterns characterizing different forms of attention control in the DAN, as well as in major subdivisions of the DAN. Our findings significantly advance models of top–down attentional control by isolating and focusing on preparatory brain activity in the cue–target interval. In contrast to our focus on preparatory attention, Greenberg et al. (2010) investigated shifts of attention that occurred while covert spatial and feature attention was being sustained over time to ongoing stimulation (e.g., moving dots). In their paradigm, specific changes in the direction of motion (e.g., up vs. down) of attended color dots (e.g., green) at the attended location (left or right hemifield) signaled (cued) the participants to either shift attention from the attended location to the moving colored dots in the opposite visual hemifield, or to maintain attention at the attended location but shift feature attention (e.g., from attending green to attending red moving dots). They found that whereas univariate analyses showed similarity in the patterns of activity within the DAN, MVPA suggested differences between the frontal portions of the DAN and the parietal regions during attentional shifts; the parietal cortex differed for spatial and feature attention shifts, but frontal regions largely did not (but see Supplementary Materials in Greenberg et al., 2010). They also argued for specialization in attentional control by suggesting that interleaved populations of control neurons were present within the posterior parietal cortex for different forms of attentional control (spatial vs. features). The important findings of Greenberg et al. (2010) differ in essential ways from our results in that they investigated shifts of attention (cued by visual signals in an ongoing stimulus display of moving dots), whereas we focus here on preparatory attention to impending visual targets (cued by an auditory signal prior to the appearance of relevant targets). As a result, their task and analyses reveal, in part, different cognitive operations than ours (i.e., attention switching vs. preparatory attention). Nonetheless, at a coarser level of analysis, both our study and that of Greenberg et al. (2010) argue for specializations in attentional control within the DAN, as opposed to more domain-general models of attentional control.
Liu and Hou (2013) also investigated the DAN for specializations in attentional control, describing a hierarchy of attentional control in the DAN. They auditorily cued participants to attend to the location, color, or motion of moving dot patterns located in the lateral visual fields. Using MVPA methods, they found a hierarchical structure in the effects of attention such that the patterns for attending to spatial locations differed from those for attending to features; furthermore, the patterns for the two forms of feature attention (color vs. motion) could themselves be segregated. Their study did not, however, distinguish between preparatory attentional control and selective stimulus processing, as we have done here, because, in their design, the analyses necessarily focused on the period during which the dynamic target stimuli were already in view. As a result, the important work of Liu and Hou (2013) may include effects driven by interactions between attentional control and attentional selection of the incoming sensory signals. Once again, the cognitive operations revealed in our study are related purely to the preparatory control component of visual selective attention. Nonetheless, our work and that of Liu and Hou (2013) converge on the idea that the DAN contains functional specializations for different aspects of top–down attentional control.
In this study, we used weight maps derived by combining SVM with the Haufe et al. (2014) transformation to help understand the microstructure of attentional control for spatial and feature (color) attention. Regarding this approach, two points require consideration. First, the SVM weight vector, which represents the vector normal to the hyperplane optimally dividing two experimental conditions, can be projected onto individual voxels and visualized as a brain map. Such functional brain maps, however, may be difficult to interpret functionally because fMRI voxels that do not contain task-related information can play a significant role in decoding performance as a result of noise cancellation (Kriegeskorte & Douglas, 2019). The transformation suggested by Haufe et al. (2014) helps to mitigate this problem. The weight maps in our study, which are the basis for defining attentional control microstructures, were obtained by applying the Haufe et al. (2014) transformation to the SVM weight vectors. The weight maps so obtained have clearly defined functional meanings, as we demonstrated by comparing the corresponding hemodynamic responses between conditions (Figure 5), and they also show distributed voxel patterns that are variable across participants within the same ROI (Figure 6; Guntupalli et al., 2016). The latter likely contributes to why univariate analyses have failed to consistently uncover differences between preparatory spatial and feature attentional control within the DAN, both in the present univariate analyses (Figure 3) and in previous ones (Slagter et al., 2007; Giesbrecht et al., 2003); in a univariate analysis, to be deemed activated by an experimental condition, a voxel needs to be activated consistently across participants. Second, to assess the extent of overlap between the weight maps for spatial versus feature attention control, we used the JI approach. Choosing the voxels corresponding to the top 50% of weights in each weight map, we calculated the JI between the spatial and feature maps and found that their overlap was similar to (only slightly higher than) what would be expected between two random sets of voxels in our data. Consequently, we suggest that the microstructures underlying the two types of attentional control in the DAN have limited, rather than substantial, functional anatomical overlap.
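For reference, the transformation at issue in the first point can be stated compactly; the following is a sketch of the published result in generic notation, with our use of a single binary SVM per contrast corresponding to the scalar case:

```latex
% Backward (decoding) model: latent sources extracted as \hat{s} = W^{\top} x.
% Haufe et al. (2014): the corresponding forward-model activation pattern is
A \;=\; \Sigma_{x}\, W\, \Sigma_{\hat{s}}^{-1},
% where \Sigma_{x} is the data covariance and \Sigma_{\hat{s}} the covariance
% of the extracted sources. For a single binary classifier, \Sigma_{\hat{s}}
% is a scalar, so the pattern reduces to a \propto \Sigma_{x} w, that is, the
% data covariance applied to the SVM weight vector.
```

It is these covariance-transformed patterns, rather than the raw SVM weights, that constitute the weight maps analyzed throughout this study.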
We interpret the patterns of activity revealed by MVPA decoding as reflections of an underlying difference in the functional representation of attentional control in the DAN. This notion is supported by the results in Figure 5, where it is shown, for example, that in attend left voxels defined by decoding weights, BOLD signals are higher for attend left than for attend right conditions, and that in attend red voxels defined by decoding, BOLD signals are higher for attend red than for attend green conditions. The same was, of course, true for the inverse cases (i.e., for attend right voxels and attend green voxels, as defined by decoding weights). Despite the strong functional interpretability of these attentional control microstructures, we do not suggest that the voxel-based weight maps identify specific underlying circuitry, or subnetworks, per se. One might have hoped to find consistent differences in activity revealing a subregional organization that generalized across participants, such as anterior–posterior, dorsal–ventral, or medial–lateral gradients, as have been observed in decoding high-level object categories in ventral–temporal cortex (e.g., Connolly et al., 2012), but that is not what we observed.
Our general model framework posits that the spatial and feature attentional control signals in the DAN project their top–down influences selectively to specific visual cortical regions to influence perceptual processing of the to-be-attended stimulus attributes (i.e., location vs. color). It therefore follows that specific top–down connectivity necessarily complements the differing patterns of activity in the DAN that support selective attentional control for spatial versus feature attention. That said, from the current data, we cannot distinguish the representation of spatial location and color information in the DAN (e.g., working memory representations; see below) from activity corresponding to the output signals themselves, which is in any case a challenging proposition in human imaging studies. Regardless of whether the spatial and feature attentional control activities we observed in the DAN are primarily representational or primarily output related, or perhaps both in a parsimonious model, our findings shed light on the organization of the DAN for purely preparatory (anticipatory) attention that was not evident in prior work using univariate analyses of BOLD signals (Slagter et al., 2007; Giesbrecht et al., 2003).
Most models of attention include both working memory and attention as components of attentive behavior, and there is a long history of considering the relationships between the two (e.g., Oberauer, 2019; Gazzaley & Nobre, 2012; Lewis-Peacock, Drysdale, Oberauer, & Postle, 2012; Awh & Jonides, 2001). There are many different ways that working memory has been conceptualized (Cowan, 2017), but two main considerations apply most directly here. The first is whether there were differences in working memory load between conditions, which might contribute to differences in the patterns of our effects. To avoid such confounds, we were careful to balance task difficulty and working memory load between spatial attention and feature attention trials, as evidenced by the RTs and accuracies in the two conditions, which were not significantly different from one another (Figure 2A). Moreover, the working memory load in our study was relatively low, being one of two alternatives (left vs. right; red vs. green) in each condition. The second consideration is whether the activation patterns we reveal using decoding in the DAN are related to working memory maintenance/representations in the DAN versus attentional control signals in the DAN, as mentioned earlier. Because our experiment was not designed to test for differences between working memory and attentional control, we cannot distinguish between the two. Indeed, our overarching model holds that, on a trial-by-trial basis, working memory is necessarily involved in performing an attention task such as ours (perhaps even more so given that the cue–target delay can be several seconds). However, whether working memory is actively maintained during this period, or simply used to implement the attentional control (or whether these are indeed the same thing), is a fascinating question we cannot answer here. It has been known for some time (e.g., LaBar, Gitelman, Parrish, & Mesulam, 1999) that there is overlap in (univariate) neural activations for nonspatial working memory and spatial attention in the DAN. More recent work, however, has shown dissociations between spatial attention and working memory storage (Hakim, Adam, Gunseli, Awh, & Vogel, 2019; Lewis-Peacock et al., 2012; Lewis-Peacock & Postle, 2012). What we can say from our findings is that the DAN is activated during covert attention (i.e., the cue–target interval), which may, but does not theoretically have to, include distinct working memory activity (Lewis-Peacock & Postle, 2012); such effects may be task specific (i.e., depending on whether working memory maintenance is integral to task performance). It is nonetheless reasonable, in a design like ours, to assume that working memory is required and active. We also know, however, that working memory representations can be decoded in the DAN (Xu, 2017; Christophel, Hebart, & Haynes, 2012), and therefore, because we did not test this directly, it remains possible that our decoding of spatial attention and feature attention reflects working memory as well as attentional control signals, assuming these are in fact distinct cognitive-neural operations within the DAN. That is, in many models, working memory and attention are not distinct mechanisms (e.g., Awh & Jonides, 2001), whereas in more current models, they are integrally related processes, one (working memory) dependent on the other (attention; Foster, Vogel, & Awh, in press; Hakim et al., 2019).
Given that attended information and remembered information are both decodable in the DAN and, therefore, that the DAN is a common neural substrate for both covert attention and working memory, one may reasonably ask whether, at the microstructural level, the voxels supporting attend left versus attend right (or attend red vs. attend green) also support remember left versus remember right (or remember red vs. remember green). In this sense, for future studies, our framework offers a novel avenue to pursue the relation between attention and working memory, a long-standing problem in cognitive neuroscience.
In conclusion, this study offers new evidence from MVPA and brain mapping approaches supporting the idea of functional–anatomical specializations (microstructures) underlying the control of different forms of preparatory attention in both frontal and parietal portions of the DAN. In our model, we hypothesize that these specialized microstructures have specific output connections for the top–down control of the regions of visual cortex coding the relevant (to-be-attended) stimulus attribute(s), thereby selectively biasing the processing of incoming sensory information in the service of goal-directed behaviors.
Acknowledgments
This work was supported by National Institute of Mental Health grant MH117991 (G. R. M. and M. D.) and National Science Foundation grant BCS-1439188 (M. D.). We are grateful to Steve Luck, Joy Geng, John Henderson, Sean Noah, Edward Awh, Karl Friston, Tamara Swaab, and the members of our laboratories for their helpful comments and advice. All data will be publicly available on the National Institute of Mental Health Data Archive.
Reprint requests should be sent to Mingzhou Ding, J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, BME 149, PO Box 116131, Gainesville, FL 32611, or via e-mail: [email protected], or George R. Mangun, Center for Mind and Brain, University of California, Davis, CA 95618, or via e-mail: [email protected].
Funding Information
George R. Mangun, National Institute of Mental Health (http://dx.doi.org/10.13039/100000025), grant number: MH117991. Mingzhou Ding, National Institute of Mental Health (http://dx.doi.org/10.13039/100000025), grant number: MH117991. Mingzhou Ding, National Science Foundation (http://dx.doi.org/10.13039/100000001), grant number: BCS-1439188.