Organisms operate within both a perceptual domain of objects and events, and a mnemonic domain of past experiences and future goals. Each domain requires a deliberate selection of task-relevant information, through deployments of external (perceptual) and internal (mnemonic) attention, respectively. Little is known about the control of attention shifts in working memory, or about whether voluntary control of attention in these two domains is subserved by common or distinct functional networks. We used human fMRI to examine the neural basis of cognitive control while participants shifted attention in vision and in working memory. We found that these acts of control recruit in common a subset of the dorsal fronto-parietal attentional control network, including the medial superior parietal lobule, intraparietal sulcus, and superior frontal sulcus/gyrus. Event-related multivoxel pattern classification reveals, however, that these regions exhibit distinct spatio-temporal patterns of neural activity during internal and external shifts of attention. These findings constrain theoretical accounts of selection in working memory and perception by showing that populations of neurons in dorsal fronto-parietal network regions exhibit selective tuning for acts of cognitive control in different cognitive domains.
To achieve behavioral goals, organisms must perceive objects and events in the local environment, encode the perceived information into working memory (WM), and evaluate, manipulate, categorize, or otherwise make use of that information in accordance with their goals. Selection of goal-relevant perceptual and mnemonic information is required for flexible, dynamic behavior in order to avoid perseveration on a single item or task. Two broad domains of selection can be distinguished: external selection of sensory information to be perceived and encoded into memory (achieved by selective perceptual attention), and internal selection of goal-relevant information held in WM (achieved by selective mnemonic attention).
External selection is widely viewed as reflecting the integration of bottom–up (stimulus-driven) and top–down (goal-driven) biases that influence the competition for representation among stimuli in the sensorium (Serences & Yantis, 2006; Desimone & Duncan, 1995). Selective visual attention improves perceptual performance concerning the attended item or location (Egeth & Yantis, 1997; Posner, 1980) and modulates neural activity in corresponding regions of extrastriate visual cortex (Kelley, Serences, Giesbrecht, & Yantis, 2008; Reynolds, Chelazzi, & Desimone, 1999). Perceptual attention is not limited to visuospatial selection, but extends to nonspatial (e.g., object or feature) and nonvisual (e.g., auditory) domains as well (e.g., Melcher, Papathomas, & Vidnyanszky, 2005; Liu, Slotnick, Serences, & Yantis, 2003; Muller & Kleinschmidt, 2003; O'Craven, Downing, & Kanwisher, 1999; Egly, Driver, & Rafal, 1994; Shomstein & Yantis, 2004b, 2006; Saenz, Buracas, & Boynton, 2002).
Although the effects of attention may be observed in behavior and in the modulation of activity in sensory brain regions, the control of attention involves coordinated activity of prefrontal and parietal brain regions that initiate the transition from one attentive state to another or maintain attention to relevant information in the face of competing irrelevant information, as dictated by behavioral goals (Corbetta & Shulman, 2002). In particular, voluntary shifts of covert visuospatial attention are associated with transient increases in cortical activity in the medial superior parietal lobule (mSPL). This mSPL activity is thought to reflect a top–down attentional control signal that initiates cortical reconfiguration needed to redeploy attention (Shulman et al., 2009; Kelley et al., 2008; Yantis et al., 2002; Vandenberghe, Gitelman, Parrish, & Mesulam, 2001). Similar mSPL control sources are thought to initiate shifts of attention in visual and nonvisuospatial domains (Shomstein & Yantis, 2004a, 2004b, 2006). Conversely, maintaining attention in the face of distraction is associated with sustained activity in retinotopically organized regions of the intraparietal sulcus and frontal eye fields, thought to reflect an attentional priority map that maintains a given state of attention (e.g., Serences & Yantis, 2007; Silver, Ress, & Heeger, 2005; Bisley & Goldberg, 2003; Corbetta, Kincade, Ollinger, McAvoy, & Shulman, 2000).
Although investigations of selective attention have focused on resolving perceptual competition, virtually any cognitive act requires resolution of conflict between competing memories, goals, plans, and behaviors (Courtney, 2004; Baddeley, 2003; Miller & Cohen, 2001; Ericsson & Delaney, 1999). Within WM, some items are prioritized for carrying out the immediate cognitive operation, whereas others are maintained with lower priority for future use and must be reprioritized before they are fully available to cognition. For instance, when subjects are required to update one of two WM items, updating different items on successive trials takes longer than updating the same item repeatedly (Gehring, Bryck, Jonides, Albin, & Badre, 2003; Garavan, 1998). Similarly, subjects are better at reporting two features of a single object than one feature from each of two objects, not only in the perceptual domain (Duncan, 1984), but also in WM (Awh, Vogel, & Oh, 2006; Awh, Dhaliwal, Christensen, & Matsukura, 2001). Thus, WM items are not all equally accessible at any one time, suggesting that a limited-capacity process must shift internal selection among items in WM. Such a process is critical both in unitary store models with an explicit “focus of attention” (Jonides et al., 2008; Verhaeghen et al., 2007; McElree, 2006; Oberauer, 2002, 2003; Cowan, 1995) and in multipartite models that postulate a “central executive” to manipulate information in an item-wise fashion (Baddeley, 1996, 2003; Baddeley & Hitch, 1974).
Because there are no clearly separable neural bases for different WM representations within the same domain (as there are in perception, e.g., left and right extrastriate cortex for vision), it is difficult to observe the neural effects of the deployment of attention within WM. However, the sources of control of attention in WM are, in principle, more readily investigated. Several studies have included both a WM control task and a perceptual control task (Nee & Jonides, 2008; Lepsien & Nobre, 2007; LaBar, Gitelman, Parrish, & Mesulam, 1999), or have compared WM selection to various aspects of attentional control (e.g., ambiguity of a perceptual cue, Bledowski, Rahm, & Rowe, 2009; control under varying levels of distraction, Nee & Jonides, 2009). Other studies have reported similar neural activity for visuospatial perceptual selection and visuospatial WM selection (Kuo, Rao, Lepsien, & Nobre, 2009; Nobre et al., 2004). Finally, some studies have suggested that internal and external shifts of selection are carried out via domain-specific brain mechanisms with little (e.g., Rushworth, Paus, & Sipila, 2001) or no (e.g., Ravizza & Carter, 2008) spatial overlap in neural activation. These studies have contributed greatly to our understanding of cognitive control in WM. However, it remains unclear to what degree shifts of attention among perceptual and (nonperceptual) mnemonic representations—two highly specific, basic cognitive processes—are carried out by domain-specific mechanisms, or if they share the same domain-independent cortical control machinery.
To address this key unanswered question, we devised a behavioral task requiring interleaved shifts of external perceptual and internal mnemonic attention. In order to extend our knowledge of WM control beyond the visuospatial domain, we used a task that required WM for semantic information without any visual or spatial component, and compared shifts among semantic WM items to shifts of visuospatial attention. Specifically, observers shifted perceptual attention between two rapid serial visual presentation (RSVP) streams of letters and (at distinct times) shifted mnemonic attention between two counters held in WM, according to cue letters embedded within the RSVP streams. We report evidence from univariate general linear model (GLM) analysis of fMRI data that shifts of internal and external attentional selection recruit a partially overlapping domain-independent cortical control network. Event-related multivoxel pattern classification (MVPC) revealed, however, that perceptual and mnemonic shifts of attention evoke reliably distinct spatio-temporal patterns of activity within the cortical control network.
Seven subjects (3 women; mean age = 22.4 years, range = 19–25 years) participated in the fMRI experiment following several 1-hr practice sessions outside the scanner. The data from one subject were excluded from further analysis due to excessive head motion. All subjects gave written informed consent as approved by the Johns Hopkins Medical Institutions Institutional Review Board.
Subjects executed an interleaved series of perceptual shifts between visual items and mnemonic shifts between two numeric counters held in WM. Specifically, subjects continuously fixated a central white dot while observing a display consisting of six white RSVP streams of letters on a black background (Figure 1). Two cue streams fell on the horizontal visual meridian, at 10° of visual angle to the left and right of the fixation point. Above and below each cue stream, at the same distance from fixation, was a distractor RSVP stream which was placed to provide perceptual competition and to maximize visual attention effects. The four distractor streams never contained cue letters. All RSVP streams were presented synchronously at a rate of four items per second (250 msec per frame with no temporal gap). RSVP items were 3.1° of visual angle in height and approximately 2.6° in width.
All streams consisted mostly of filler items selected randomly from the entire alphabet, excluding the letters “G,” “I,” “O,” “Q,” “R,” and “W,” and the four cue letters, “C,” “H,” “L,” and “P.” The cue letter “L” (location shift) instructed a shift of visuospatial attention from the attended cue RSVP stream to the other cue RSVP stream (left-to-right or right-to-left). The cue letter “C” (counter shift) instructed a shift of mnemonic attention from one counter to the other counter being maintained in WM. The cue letter “P” (increment counter, or “plus”) instructed subjects to add one to the selected counter's value. The cue letter “H” (hold) instructed subjects to maintain the current states of visual attention, counter selection, and counter values. Subjects were instructed to press both of the buttons they held (one in each hand) whenever they detected any of the four cue letters. Cues appeared only in the attended RSVP stream. Hold cues were functionally equivalent to filler items: they were specific neither to the state of visuospatial attention nor to that of mnemonic attention, and subjects were instructed not to modify the state of either. Unlike filler items, however, hold cues required a button press and occurred with the same frequency as the other cue types, so they served as a motor control. Subjects were instructed not to shift attention even if they thought they had missed a cue letter; the cues would eventually appear in the attended RSVP stream.
Subjects were told at the start of each run to direct attention covertly to one of the cue streams (the starting stream alternated across runs). They also maintained two counters in WM, whose values were initially set to zero. Before the start of the RSVP streams, a blank display was presented for 1 sec, followed by the fixation dot alone for 2 sec. The RSVP streams then began; 2 sec after the start of the RSVP streams, a “P” appeared in the attended stream, instructing an increment of the first counter. From that point on, the order of cue letter presentations was random. Cue letters appeared with a random interstimulus interval (ISI) of 2.5, 4, 5.5, or 7 sec; the average ISI was 4.75 sec. All four cue letters and all four ISIs appeared an equal number of times in each run: 15, 16, or 17 times. Runs ended with an additional 2 sec in which the fixation dot was displayed on its own. Run length thus varied with the number of cues in the run. This variable run length was imposed to prevent subjects from knowing in advance the correct sum of the values of the two counters, which would have allowed an undesirable strategy whereby the value of just one counter was tracked and the value of the other counter was derived at the end of the run. At the end of the run, subjects verbally reported the values of the two counters.
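The cue logic above can be made concrete with a small simulation. The Python sketch below builds one hypothetical run and computes the ground-truth final counter values against which verbal reports would be scored. It assumes that the fixed first "P" counts toward the n increments of that run and that each cue is followed by one ISI drawn from a balanced, shuffled pool; the original randomization procedure is not described in that detail, and `make_schedule` and `correct_counters` are hypothetical names. One property it makes explicit: every "P" increments exactly one counter, so the two final values always sum to the number of "P" cues, which is why varying that number across runs keeps the sum unknown in advance.

```python
import random

CUES = ("L", "C", "P", "H")      # location shift, counter shift, increment, hold
ISIS = (2.5, 4.0, 5.5, 7.0)      # seconds; their mean is 4.75 s


def make_schedule(n_per_type, seed=None):
    """Hypothetical run: (onset_sec, cue) pairs relative to RSVP onset.

    Assumption: the fixed first "P" is one of the n_per_type increments, and each
    cue is followed by one ISI drawn without replacement from a balanced pool,
    so every cue type and every ISI occurs n_per_type times.
    """
    rng = random.Random(seed)
    rest = [c for c in CUES for _ in range(n_per_type)]
    rest.remove("P")                    # reserve one "P" for the fixed first cue
    rng.shuffle(rest)
    cues = ["P"] + rest
    isis = [d for d in ISIS for _ in range(n_per_type)]
    rng.shuffle(isis)
    schedule, t = [], 2.0               # first cue 2 s after RSVP onset
    for cue, isi in zip(cues, isis):
        schedule.append((t, cue))
        t += isi
    return schedule, t                  # t = end of the last ISI


def correct_counters(cues):
    """Ground-truth final values of Counters A and B for a cue sequence."""
    counters = [0, 0]
    selected = 0                        # Counter A is selected at run start
    for cue in cues:
        if cue == "C":
            selected = 1 - selected     # shift mnemonic attention to the other counter
        elif cue == "P":
            counters[selected] += 1     # increment the currently selected counter
        # "L" and "H" leave both counters unchanged
    return counters
```

For example, `correct_counters(["P", "C", "P", "P", "L", "H", "C", "P"])` yields `[2, 2]`.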
Image Acquisition and Analysis
Each subject participated in three scanning sessions, each conducted on a separate day and lasting approximately 2 hr. This yielded approximately 28 functional runs of the task and 3 structural images (one per session) for each subject. This amount of data provided sufficient statistical power to identify regions replicating prior attention-shifting findings in a group GLM analysis (see Results), and it also provided a substantial dataset from each participant for the within-subject MVPC analysis.
Stimuli were back-projected onto a screen at the head of the scanner bore using an Epson PowerLite 7600p LCD projector with a custom lens (Buhl Optical, Pittsburgh, PA) inserted into a wave guide. The stimuli were visible via a custom mirror mounted on the head coil. Button-press responses were collected via custom-built fiber-optic button boxes during fMRI acquisition.
All scans were performed using a 3.0-T Philips Intera MRI scanner at the F.M. Kirby Center for Functional Brain Imaging (Kennedy Krieger Institute, Baltimore, MD) equipped with a custom SENSE 6-channel parallel-imaging head coil (MRI Devices, Waukesha, WI). A custom-built bite bar mounted to the head coil was used to minimize head motion. Subjects could disengage from the bite bar at will, however, and did so at the end of each run in order to verbally report counter values to the experimenter.
During one scan session (typically, the first) for each subject, a 200-slice coronal MP-RAGE 1-mm isotropic structural scan was acquired (TR = 8.09 msec; TE = 3.8 msec; flip angle = 8°; FOV = 256 × 256 mm). During the other two sessions, identical MP-RAGE scans were acquired for coregistration, except that parallel acquisition (SENSE factor = 2) was used, trading gray–white contrast for a reduced acquisition time.
Echo-planar images (EPIs) were acquired using thirty-five 3-mm axial–oblique slices (no gap) angled to capture occipital, parietal, superior temporal, and all but extremely ventral frontal cortex. The field of view was 192 × 192 mm, with a 64 × 64 matrix, yielding an in-plane resolution of 3 × 3 mm. A complete volume was acquired every 2 sec (TR = 2 sec; TE = 30 msec; flip angle = 70°). In each run, the task lasted 290, 309, or 328 sec (depending on the number of cues per run; see above). In each run, 164 volumes were acquired, and volumes acquired after task completion were discarded. Four volumes were acquired before the start of the task to allow the MR signal to reach a steady state before data acquisition. The first of these otherwise unused volumes was used for alignment and coregistration.
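The reported timing values are mutually consistent, as a quick arithmetic check shows. With four cue types each appearing n times and a mean ISI of 4.75 s, cue time grows by 4 × 4.75 = 19 s per extra repetition, matching the 19-s steps between 290, 309, and 328 s; each duration then exceeds the cue time by a constant 5 s. The sketch also counts how many of the 164 volumes each run needs, assuming (our assumption, not stated above) that the 164 volumes are counted from task onset and that a partial final TR still requires a full volume. `run_budget` is a hypothetical helper.

```python
import math

TR = 2.0                          # seconds per volume
N_VOLS = 164                      # volumes acquired per run
ISIS = (2.5, 4.0, 5.5, 7.0)       # seconds
MEAN_ISI = sum(ISIS) / len(ISIS)  # 4.75 s
N_CUE_TYPES = 4


def run_budget(n_per_type, duration_sec):
    """Return (cue time, remaining offset, volumes needed, surplus volumes).

    Assumes the 164 volumes are counted from task onset (hypothetical reading).
    """
    cue_time = N_CUE_TYPES * n_per_type * MEAN_ISI
    needed = math.ceil(duration_sec / TR)
    return cue_time, duration_sec - cue_time, needed, N_VOLS - needed


for n, dur in [(15, 290), (16, 309), (17, 328)]:
    print(n, run_budget(n, dur))
```

Under these assumptions, the longest run (328 s) uses all 164 volumes, and the shorter runs leave 19 and 9 surplus volumes to discard.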
Preprocessing was performed using BrainVoyager QX (Brain Innovation BV, Maastricht, The Netherlands). For GLM analyses, each subject's high-contrast structural scan was rigidly transformed such that the anterior commissure (AC) lay at the center of the image, and the line from the AC to the posterior commissure (PC) was horizontal (AC–PC alignment). Low-contrast structural scans from the other two days were aligned to this transformed high-contrast scan. The first pretask volume of one EPI from each session was aligned to that session's structural scan. The transformation of the structural scans and the registration of the EPI together yielded a transformation matrix that was used to place all functional runs from all sessions for a single subject into a common space. Within each session, all functional volumes were motion-corrected to the volume used for structural–functional coregistration, corrected for slice acquisition time within the volume acquisition window, spatially smoothed (Gaussian kernel, 4 mm FWHM), and temporally filtered (high-pass, 3 cycles per run; low-pass, Gaussian kernel 2.0 sec FWHM).
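The two temporal filters can be sketched in NumPy. The high-pass step below is written as regression of sine and cosine drift terms with 1 to 3 cycles per run, keeping the run mean so that later conversion to percent signal change still works; this Fourier-basis formulation is an assumption about how a "3 cycles per run" filter is implemented, and BrainVoyager's internals may differ (for example, in whether a linear trend is also removed). The low-pass step converts the 2.0-s FWHM to a Gaussian standard deviation, σ = FWHM / (2√(2 ln 2)) ≈ 0.85 s, or about 0.42 volumes at TR = 2 s. Function names are hypothetical.

```python
import numpy as np


def fourier_highpass(y, n_cycles=3):
    """Remove sine/cosine drifts of 1..n_cycles cycles per run, keeping the mean.

    Assumed implementation of a 'cycles per run' high-pass filter.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n)
    cols = [np.ones(n)]
    for k in range(1, n_cycles + 1):
        cols += [np.cos(2 * np.pi * k * t / n), np.sin(2 * np.pi * k * t / n)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X[:, 1:] @ beta[1:]            # subtract drift terms only, not the constant


def gaussian_lowpass(y, fwhm_sec=2.0, tr=2.0):
    """Gaussian temporal smoothing with the kernel width given as FWHM in seconds."""
    y = np.asarray(y, dtype=float)
    sigma = fwhm_sec / (2 * np.sqrt(2 * np.log(2))) / tr   # in volumes
    radius = max(1, int(np.ceil(3 * sigma)))
    k = np.exp(-0.5 * (np.arange(-radius, radius + 1) / sigma) ** 2)
    # Normalize by the kernel mass inside the run so the edges are not attenuated.
    num = np.convolve(y, k, mode="same")
    den = np.convolve(np.ones_like(y), k, mode="same")
    return num / den
```

Because sampled sinusoids with different integer cycle counts are orthogonal over a run, the regression removes the slow drift terms without touching faster signal components.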
The high-contrast structural scan for each subject was warped to Talairach and Tournoux (1988) space, and the resulting warp was applied to each EPI volume after alignment to the common space.
MVPC was carried out separately for each subject, but with several preprocessing steps omitted or modified. The lower-contrast structural scans from 2 days were rigidly aligned to the AC–PC space of the high-contrast structural scan from the remaining day. Functional data were motion-corrected to the volume used for structural–functional coregistration and were corrected for slice acquisition time within the volume acquisition window. Images were not spatially smoothed. Temporal filtering consisted of high-pass filtering at 3 cycles per run.
General Linear Model
A standard GLM approach (Friston, Frith, Turner, & Frackowiak, 1995), with subjects modeled as a random effect, was used for the initial analysis. Functional data were transformed to percent signal change relative to the mean of the run. Events were modeled as 250-msec boxcars (equal to the duration of each cue letter) convolved with a standard hemodynamic response function (Boynton, Engel, Glover, & Heeger, 1996).
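A minimal sketch of how one event regressor of this kind might be built: a 250-ms boxcar at each cue onset on a fine time grid, convolved with a gamma-function HRF of the form used by Boynton et al. (1996), then sampled once per TR. The HRF parameters (n = 3, τ = 1.25 s, δ = 2.5 s), the 50-ms grid, and sampling at the start of each TR are illustrative assumptions rather than the values used in the analysis, and the function names are hypothetical. Percent signal change is computed relative to the run mean, as described above.

```python
import math

import numpy as np


def boynton_hrf(t, n=3, tau=1.25, delta=2.5):
    """Gamma HRF: ((t - delta)/tau)^(n-1) * exp(-(t - delta)/tau) / (tau * (n-1)!).

    Zero for t < delta. Parameter values are illustrative assumptions.
    """
    s = np.clip((np.asarray(t, dtype=float) - delta) / tau, 0.0, None)
    return s ** (n - 1) * np.exp(-s) / (tau * math.factorial(n - 1))


def event_regressor(onsets_sec, n_vols, tr=2.0, dur=0.25, dt=0.05):
    """250-ms boxcars at each onset, convolved with the HRF, sampled once per TR."""
    step = round(tr / dt)                       # fine-grid samples per TR
    t = np.arange(n_vols * step) * dt           # fine time grid (s)
    box = np.zeros_like(t)
    for onset in onsets_sec:
        box[(t >= onset) & (t < onset + dur)] = 1.0
    kernel = boynton_hrf(np.arange(0.0, 30.0, dt))   # 30-s HRF support
    conv = np.convolve(box, kernel)[: t.size] * dt   # causal convolution
    return conv[::step]                         # sample at the start of each TR


def percent_signal_change(y):
    """Express a time course as percent change from its run mean."""
    y = np.asarray(y, dtype=float)
    m = y.mean()
    return 100.0 * (y - m) / m
```

With a single cue at 10 s, the sampled regressor is zero through 12 s and peaks at the volume acquired at 16 s.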
An initial GLM was carried out using block regressors that modeled the variable-length epochs during which attention was directed to the left or right visual field. This analysis was conducted to confirm that subjects allocated spatial attention as instructed and to check that spatial attention was not allocated based on the deployment of mnemonic attention. Epochs of left and right spatial attention were directly contrasted in this whole-brain GLM in order to examine the effects of spatial attention in extrastriate visual cortex (see Figure 3).
A completely separate GLM was run to analyze shift-related activity. This GLM included the following regressors of interest, each modeled at the times during the task when the corresponding cue letters appeared: location shifts from left to right, location shifts from right to left, counter shifts from Counter A (the first counter to be incremented, 2 sec into the RSVP streams of each run) to Counter B, counter shifts from Counter B to Counter A, and holds (i.e., the motor control for the shift cues). Only correctly detected cues were included in the regressors of interest. Missed cues were modeled separately as regressors of no interest, both to reduce noise in general and because hold cues were detected at a lower rate than the other cue types. Specifically, regressors of no interest were included for location shift misses, counter shift misses, increment misses, and hold misses, as well as for correctly detected increments. A set of contrasts was carried out to identify regions involved in visuospatial perceptual control (location shifts vs. holds), mnemonic control (counter shifts vs. holds), or both types of attentional control (the random effects of the within-subject conjunction of the two control contrasts).
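The conjunction contrast can be sketched as follows: a random-effects t map is computed across subjects for each control contrast, and a voxel is retained only if the smaller of the two t values reaches threshold (a minimum-statistic conjunction). This is one standard way to implement a conjunction, offered here as an assumption; the exact BrainVoyager procedure may differ. The function names and the default threshold of 4.00 (taken from the statistical threshold reported below) are illustrative.

```python
import numpy as np


def random_effects_t(contrast_by_subject):
    """One-sample t across subjects (axis 0) for each voxel."""
    c = np.asarray(contrast_by_subject, dtype=float)
    n = c.shape[0]
    return c.mean(axis=0) / (c.std(axis=0, ddof=1) / np.sqrt(n))


def conjunction_mask(t_location, t_counter, threshold=4.00):
    """Minimum-statistic conjunction: keep a voxel only if BOTH contrasts
    (location shift > hold and counter shift > hold) reach the threshold."""
    t_min = np.minimum(np.asarray(t_location), np.asarray(t_counter))
    return t_min >= threshold
```

For example, t values of (5.0, 4.2) pass, whereas (4.5, 3.9) fail because the weaker contrast falls below threshold.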
Increment trials required subjects to update the counters frequently, which ensured that the counters were maintained in WM and could not efficiently be stored in long-term memory. Increments were modeled in the GLM but are not events of interest here, because they involve several operations above and beyond the cognitive control needed to update WM and are thus beyond the scope of this study.
All statistical maps were thresholded at a t value of 4.00, corresponding to a voxelwise alpha of .0103. The maps were then corrected for multiple comparisons via a cluster-size threshold of 11 functional voxels (297 mm³), corresponding to an alpha of .0005 as determined using the BrainVoyager Cluster Correction Plug-in.
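The reported voxelwise alphas can be checked against Student's t distribution. Assuming two-tailed tests with df = 5 for the six-subject random-effects GLM and df = 4 for the five-subject leave-one-subject-out GLMs described under Multivoxel Pattern Classification (our inference from the sample size), t = 4.00 gives p ≈ .0103 and t = 3.495 gives p ≈ .025, matching the stated values. Cluster volumes follow from the 3 × 3 × 3 mm functional voxels: 11 × 27 = 297 mm³ and 13 × 27 = 351 mm³. The sketch integrates the t density with Simpson's rule so it needs only the standard library; the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Student's t probability density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, n_steps=2000):
    """Two-tailed p = 1 - 2 * P(0 < T < |t|), integrated with Simpson's rule."""
    h = abs(t) / n_steps                          # n_steps must be even
    s = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, n_steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (s * h / 3.0)


def cluster_volume_mm3(n_voxels, voxel_mm=3.0):
    """Volume of a cluster of isotropic functional voxels."""
    return n_voxels * voxel_mm ** 3
```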
The event-related mean BOLD signal within ROIs identified via GLM contrasts was computed to examine the BOLD time course for each type of event of interest in those regions. BOLD time courses locked to the event of interest were calculated for location shifts in each direction, counter shifts in each direction, and holds.
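The event-locked averaging can be sketched as cutting a fixed window of volumes around each event and averaging across events. The window of 4 volumes before to 7 after the event mirrors the er-MVPC window described below and is an assumption here, as is mapping each cue onset to the nearest volume by rounding half up; events whose window would run past the edge of the run are simply dropped. Function names are hypothetical.

```python
import numpy as np


def nearest_volume(onset_sec, tr=2.0):
    """Index of the volume acquisition nearest to an event (rounding half up)."""
    return int(np.floor(onset_sec / tr + 0.5))


def event_locked_average(ts, onset_vols, pre=4, post=7):
    """Average a 1-D ROI time course from -pre to +post volumes around each event."""
    ts = np.asarray(ts, dtype=float)
    epochs = [ts[o - pre:o + post + 1] for o in onset_vols
              if o - pre >= 0 and o + post < ts.size]   # skip truncated windows
    return np.mean(epochs, axis=0)
```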
Multivoxel Pattern Classification
We used MVPC to examine voxelwise patterns of cortical activity in regions that exhibited significant activation for both visuospatial and mnemonic attention shifts according to a conjunction contrast in the GLM. A support vector machine (SVM), implemented in the OSU SVM toolbox (Ma, Zhao, Ahalt, & Eads, 2003; a Matlab adaptation of libsvm available at www.csie.ntu.edu.tw/~cjlin/libsvm/), was used to compute weights that, when multiplied by the magnitude of the BOLD signal in each voxel and summed, could be used to predict which of two shift types (location shift or counter shift) occurred on that trial. The SVM was run for each subject separately in order to generate a set of single-subject linear classifiers. The probability of correct classification (using a leave-one-run-out cross-validation procedure) was compared to a statistical threshold obtained through a permutation test in which the trial type labels were randomized and the analysis was repeated 1000 times. The probability of correct classification in the permutation test had a mean near .5, and the 95th percentile of this null distribution (typically a probability of correct classification of about .53) was taken as the criterion for significantly above-chance classification performance. One SVM parameter that may be chosen arbitrarily or determined empirically is C, the cost or penalty for classification errors within the training data. We chose a C value of 2⁻⁷, but the qualitative pattern of results was preserved across a wide range of C values. To avoid biasing the results of classification, training and testing must be carried out on independent datasets (Vul & Kanwisher, 2010). Therefore, the classifier was trained on all but one run for a subject and then tested on the remaining run. A new classifier was then trained with a different run left out and tested on that run. This leave-one-run-out cross-validation procedure was continued until all runs had been used as the test dataset once.
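To make the cross-validation and permutation logic concrete, the sketch below trains a linear SVM on voxel patterns, scores it with leave-one-run-out cross-validation, and builds a permutation null from shuffled labels. It does not use libsvm or the OSU SVM toolbox: as a self-contained stand-in, the primal SVM objective (hinge loss plus L2 penalty, with λ = 1/(C·n) so that it matches libsvm's C-SVC objective up to scaling) is minimized by simple subgradient descent, so weights will differ slightly from a libsvm solution. The label coding (+1 for location shift, −1 for counter shift), the number of iterations, and all function names are assumptions.

```python
import numpy as np


def train_linear_svm(X, y, C=2.0 ** -7, epochs=200):
    """Linear SVM in the primal (L2 penalty + hinge loss), batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(hinge) with lam = 1 / (C * n), i.e. the
    libsvm C-SVC objective divided by C * n. Labels y must be -1 or +1.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    lam = 1.0 / (C * n)
    for step in range(1, epochs + 1):
        eta = 1.0 / (lam * step)                  # Pegasos-style step size
        viol = y * (X @ w + b) < 1                # trials inside the margin
        yv = y[viol][:, None]
        grad_w = lam * w - (yv * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n               # bias is not regularized
        w = w - eta * grad_w
        b = b - eta * grad_b
    return w, b


def loro_accuracy(X, y, runs, C=2.0 ** -7):
    """Leave-one-run-out probability of correct classification, pooled over folds."""
    correct = 0
    for r in np.unique(runs):
        train, test = runs != r, runs == r
        w, b = train_linear_svm(X[train], y[train], C=C)
        pred = np.where(X[test] @ w + b >= 0, 1, -1)
        correct += int((pred == y[test]).sum())
    return correct / y.size


def permutation_threshold(X, y, runs, n_perm=1000, q=95, seed=0, C=2.0 ** -7):
    """q-th percentile of LORO accuracy with randomly permuted trial labels."""
    rng = np.random.default_rng(seed)
    null = [loro_accuracy(X, rng.permutation(y), runs, C) for _ in range(n_perm)]
    return float(np.percentile(null, q))
```

Classification at a given time point would be called significant when `loro_accuracy` exceeds `permutation_threshold` for the same data.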
We selected the voxels to be used in classification using an independent dataset to avoid statistical bias in the analyses (Vul & Kanwisher, 2010). The regions used for each subject were defined by running a new GLM on the Talairach-transformed data, including all subjects in the group except the subject under analysis (Esterman, Tamber-Rosenau, Chiu, & Yantis, 2010). The same conjunction of contrasts used in the main GLM (spatial attention shift vs. hold and WM attention shift vs. hold), recalculated for the leave-one-subject-out dataset, was used to select voxels for pattern classification. Because this analysis included fewer subjects (and thus had lower power), and because it served only to select voxels for an independent secondary test, a more liberal statistical threshold was employed. Thus, the leave-one-subject-out GLMs were thresholded at a t value of 3.495, corresponding to a voxelwise alpha of .025, and were subjected to a cluster threshold of 13 functional voxels (351 mm³), corresponding to an alpha of .0528. Only regions in the same anatomical neighborhood (Esterman et al., 2010) as the regions that passed the original conjunction analysis were submitted to pattern classification (see Results; for a full discussion of region selection using this method and why it does not lead to biased results, see Esterman et al., 2010). Here, we have taken a conservative approach within that framework. Regions identified via the leave-one-subject-out GLM were then unwarped into the original brain space of the left-out subject and were used for that subject's classifier. A separate classifier was run for each time point from 4 TRs before to 7 TRs after the volume acquisition nearest to the event of interest. This yielded an event-related MVPC (er-MVPC) time course for each region, separately for each subject.
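The event-related version of the analysis repeats the classification separately at each lag from −4 to +7 TRs around the volume nearest each cue. The sketch below shows that loop. To keep it short and self-contained, a nearest-centroid classifier stands in for the SVM, so it illustrates the er-MVPC structure rather than the classifier actually used; the data layout (one volumes-by-voxels array per run in the subject's native space, with cue onsets already converted to volume indices) and the function names are hypothetical.

```python
import numpy as np

LAGS = range(-4, 8)   # -4 ... +7 TRs relative to the volume nearest each cue


def patterns_at_lag(data, onset_vols, lag):
    """Rows of data (n_vols x n_voxels) at onset + lag, plus a validity mask."""
    idx = np.asarray(onset_vols) + lag
    ok = (idx >= 0) & (idx < data.shape[0])
    return data[idx[ok]], ok


def nearest_centroid_loro(X, y, runs):
    """Leave-one-run-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    correct = 0
    for r in np.unique(runs):
        train, test = runs != r, runs == r
        classes = np.unique(y[train])
        centroids = np.stack([X[train & (y == c)].mean(axis=0) for c in classes])
        dists = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        pred = classes[dists.argmin(axis=1)]
        correct += int((pred == y[test]).sum())
    return correct / y.size


def er_mvpc_timecourse(run_data, run_events):
    """Cross-validated accuracy at each lag, one classifier per lag.

    run_data: list of (n_vols, n_voxels) arrays, one per run.
    run_events: list of (onset_vols, labels) pairs, one per run.
    """
    curve = {}
    for lag in LAGS:
        Xs, ys, rs = [], [], []
        for r, (data, (onsets, labels)) in enumerate(zip(run_data, run_events)):
            X, ok = patterns_at_lag(data, onsets, lag)
            Xs.append(X)
            ys.append(np.asarray(labels)[ok])
            rs.append(np.full(int(ok.sum()), r))
        curve[lag] = nearest_centroid_loro(np.vstack(Xs), np.concatenate(ys),
                                           np.concatenate(rs))
    return curve
```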
In addition to the between-domains er-MVPC, we used identical methods on the same ROIs to generate MVPCs distinguishing within-domain shifts, both for location shifts (left to right vs. right to left) and counter shifts (Counter A to Counter B vs. Counter B to Counter A).
To ensure that MVPC results were not driven by differences in mean activation across conditions, the er-MVPC analysis was repeated on mean-centered data. Mean-centering involves subtracting, for each condition, the mean across all features (voxels) from each feature in that condition. The mean-centered analysis was identical to the main er-MVPC analysis in every other respect.
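One reading of the mean-centering step, sketched below, subtracts a single scalar from every value in a condition: the mean over all trials and voxels of that condition. After this step the two conditions have identical mean activation, so above-chance classification cannot rest on an overall amplitude difference. Whether the original analysis averaged over trials as well as voxels is not stated, so this interpretation is an assumption, and the function name is hypothetical.

```python
import numpy as np


def mean_center_by_condition(X, y):
    """Subtract, for each condition, its mean over all trials and features (voxels).

    X: (n_trials, n_voxels) patterns; y: condition label per trial.
    Assumed reading of 'condition-wise mean across features'.
    """
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y)
    for c in np.unique(y):
        X[y == c] -= X[y == c].mean()
    return X
```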
In order to rule out the possibility that consistent subregions of each ROI were more sensitive to either location or counter shifts, we back-projected classifier weights from the between-domains MVPC into the brain. Classifier weights from 6 sec (3 TRs) after cue presentation, an epoch chosen for high classifier performance, were binarized into location shift-selective and counter shift-selective weights. Back-projection was performed separately for each subject-specific ROI because each subject/ROI had a completely independent classifier.
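Back-projection of binarized weights can be sketched as writing the sign of each voxel's weight into that voxel's location in the volume. Assuming location shifts are coded +1 and counter shifts −1 during training, a positive weight marks a voxel as location shift-selective and a negative one as counter shift-selective. The flat-index ROI representation and the function name are hypothetical.

```python
import numpy as np


def backproject_signs(weights, roi_flat_index, volume_shape):
    """Write sign(weight) into each ROI voxel of an otherwise empty volume.

    +1 = location shift-selective, -1 = counter shift-selective
    (assumes location shifts were labeled +1 when training the classifier).
    """
    vol = np.zeros(volume_shape, dtype=int)
    np.put(vol, roi_flat_index, np.sign(weights).astype(int))
    return vol
```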
As in prior studies using similar counting paradigms (e.g., Voigt & Hagendorf, 2002; Garavan, 1998), it is not possible to measure item-by-item accuracy because the counter values are reported only at the end of a series of updates. Furthermore, in order to optimize our paradigm for fMRI (avoiding very short runs), we used relatively few comparatively long sequences of updates. Long runs, of course, result in less accurate final counter value reports. Thus, we report subjects' performance as both the percentage of cues that subjects successfully reported seeing (via button press) and their success in reporting the objectively correct counter values at the end of a series.
Responses to cue items were classified as correct if a button press was made within 2.5 sec of the cue, and as a miss otherwise. Rates of cue detection responses for each of the four types of cues are shown in Figure 2A. The rate of hold cue detection (75%) fell below those of the other cue types (92%–97%), perhaps because hold cues were less behaviorally relevant, as they necessarily did not affect counter values or the deployment of attention. Missed cues were modeled as regressors of no interest, but were not otherwise included in the fMRI analysis.
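The detection criterion can be written as a small scoring routine: a cue counts as detected if at least one button press falls within 2.5 s after its onset. Treating the window as starting at cue onset (presses before the cue never count) is our assumption, and the function names are hypothetical. Hit rates per cue type, as plotted in Figure 2A, follow directly.

```python
from bisect import bisect_left
from collections import Counter


def score_detections(cue_times, press_times, window=2.5):
    """'hit' if any press falls in [cue, cue + window] seconds, else 'miss'."""
    presses = sorted(press_times)
    outcomes = []
    for t in cue_times:
        i = bisect_left(presses, t)               # first press at or after the cue
        hit = i < len(presses) and presses[i] <= t + window
        outcomes.append("hit" if hit else "miss")
    return outcomes


def hit_rate_by_type(cue_types, outcomes):
    """Percent of cues detected, separately for each cue type."""
    totals = Counter(cue_types)
    hits = Counter(c for c, o in zip(cue_types, outcomes) if o == "hit")
    return {c: 100.0 * hits[c] / totals[c] for c in totals}
```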
Deviations from the objectively correct counter values are presented in Figure 2B, and deviations from the counter values conditioned on correct cue detection are presented in Figure 2C. Deviations were binned by first computing, separately for each of the two counters, the number of runs in which the reported counter value differed from the correct or conditional value by a given amount. This number was then converted to a percentage of the total runs for a given subject. The percentages from the two counters were then averaged to produce the average deviation percentage for that subject. The averages and standard errors presented in Figures 2B and 2C were computed from these averaged deviation percentages across subjects.
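The binning procedure can be sketched as follows for one subject: tally the deviation of each reported counter from its expected value across runs, convert each tally to a percentage of runs, and average the two counters' percentages; group means and standard errors are then taken across subjects. Deviations are treated as signed (reported minus expected), which is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, stdev


def deviation_percentages(reported, expected):
    """Percent of runs at each signed deviation, averaged over the two counters.

    reported, expected: one (counter_a, counter_b) pair per run for one subject.
    """
    n_runs = len(reported)
    pct = Counter()
    for k in (0, 1):                              # Counter A, then Counter B
        tally = Counter(r[k] - e[k] for r, e in zip(reported, expected))
        for dev, count in tally.items():
            pct[dev] += 100.0 * count / n_runs / 2    # /2 averages the two counters
    return dict(pct)


def group_mean_sem(per_subject, dev):
    """Mean and standard error across subjects of the percentage at one deviation."""
    vals = [p.get(dev, 0.0) for p in per_subject]
    return mean(vals), stdev(vals) / math.sqrt(len(vals))
```

For example, reports of (3, 5) and (4, 4) against expected values of (3, 4) and (4, 4) give 75% of runs at deviation 0 and 25% at +1.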
In addition to failures to increment a counter or shift attention, subjects may make errors at the report stage. One common error was the reversal of the counter identities. Figures 2D and 2E show deviations from the correct and conditional counter values, respectively, when the order of report is not considered an error. To arrive at these figures, whichever counter order minimized the summed (across both counters) absolute deviation from the expected counter values was taken as the response and compared to the correct or conditional counter values. Further calculations were identical to those for Figures 2B and 2C.
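The order-insensitive scoring reduces to comparing the report as given against the report with the two counters swapped, and keeping whichever has the smaller summed absolute deviation. In this sketch, ties keep the original order; the text does not say how ties were resolved, so that choice is an assumption, and `best_order` is a hypothetical name.

```python
def best_order(reported, expected):
    """Return the report order (as given or swapped) with the smaller summed
    absolute deviation from the expected counter values; ties keep the given order."""
    a, b = reported
    straight = abs(a - expected[0]) + abs(b - expected[1])
    swapped = abs(b - expected[0]) + abs(a - expected[1])
    return (b, a) if swapped < straight else (a, b)
```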
Effects of Cognitive Control: Modulation of Domain-specific Cortical Activity
To examine the effects of attention in visual cortex, a separate GLM was run in which regressors reflecting the variable-length epochs during which visuospatial attention was directed to the left and right sides of space, respectively, were contrasted with one another (Figure 3; see Methods). As expected, activity in extrastriate visual cortex was greater when attention was directed to the contralateral visual hemifield than when it was directed to the ipsilateral hemifield. The event-related time courses shown in Figure 3 depict the magnitude of the BOLD signal in the regions identified via the contrast in the block GLM for events in the main (event-related) GLM. Shifts of attention from left to right (cyan) and vice versa (blue), time-locked to the corresponding attention shift cue, illustrate the dynamic crossover in activation that accompanies shifts of attention. The contrast of left versus right visuospatial attention will necessarily yield an event-related time course in which the BOLD signal evoked by left-directed attention is greater than that evoked by right-directed attention, and this crossover is therefore not, in itself, an additional finding. However, in each of these areas, the BOLD signal evoked by counter shift cues was independent of the direction of spatial attention. Note that shifts of attention between WM counters did not evoke transient responses in these regions.
In this task, the items to be maintained in WM lend themselves to sequential serial rehearsal. Such a strategy could lead to an imagery-like spatial assignment of counters (e.g., Counter A might be identified with the left side of space and Counter B with the right). We found no evidence, however, that extrastriate cortical activity differed when one counter or the other was active (Figure 3, dark and light green time courses).
Sources of Cognitive Control: Reconfiguration during Internal and External Shifts of Attention
The primary event-related GLM analysis permitted several contrasts of interest to investigate the control of shifts of external and internal attention (for a complete list of regions that passed voxelwise and cluster statistical significance thresholds for each contrast of interest, see Supplemental Table 1). A detailed description of each of these contrasts follows.
The control of spatial attention shifts was examined by contrasting the location shift versus hold events (Figure 4A). As in previous investigations of the control of spatial attention (Shulman et al., 2009; Kelley et al., 2008; Yantis et al., 2002; Vandenberghe et al., 2001), shifts evoked activity in superior parietal and frontal cortices (see Supplemental Table 1). This result demonstrates that our sample size was large enough to replicate the corresponding group-level GLM results obtained in previous studies with similar designs.
Control over shifts of attention between WM counters was revealed by the counter shift versus hold contrast (Figure 4B). A network of parietal and frontal brain areas was transiently more active following shift cues than following hold cues. These activations largely overlapped with, or were adjacent to, those revealed by the location shift versus hold contrast, with the exception of regions in the caudate nucleus and globus pallidus that were involved in shifts of WM attention but not of visuospatial attention. This result suggests that a common, domain-independent cognitive control network, including the mSPL/precuneus, intraparietal sulcus, and dorsolateral prefrontal regions, mediates shifts in both the internal and external domains.
To determine whether location shifts might nonetheless evoke activity in the basal ganglia, we applied a more sensitive analysis that did not require correction for multiple comparisons. Specifically, we performed ROI-based GLMs for the caudate and globus pallidus regions identified by the counter shift versus hold contrast. In the caudate ROI, location shifts did not evoke significantly more activation than holds (p = .120). In the globus pallidus ROI, location shifts evoked only marginally greater activation than holds (p = .076).
To identify regions of overlap between location shift- and counter shift-evoked activity more rigorously, we conducted a conjunction analysis of internal and external shifts of attention. The only regions that satisfied this more stringent statistical criterion were the right mSPL/precuneus, the left intraparietal sulcus (IPS), and the right superior frontal sulcus (rSFS) (Figure 5).
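The conjunction logic can be illustrated with a minimal sketch. It assumes a minimum-statistic conjunction, in which a voxel survives only if both the location shift versus hold and the counter shift versus hold statistics exceed a threshold. The function name, the threshold, and the t-values are hypothetical, and the cluster-extent criterion used in the actual analysis is omitted.

```python
from typing import List, Sequence


def conjunction_mask(t_location: Sequence[float],
                     t_counter: Sequence[float],
                     t_crit: float) -> List[bool]:
    """Return True for voxels where BOTH contrasts exceed t_crit.

    Testing min(t1, t2) > t_crit is equivalent to requiring each contrast
    to pass the threshold on its own (a minimum-statistic conjunction).
    """
    if len(t_location) != len(t_counter):
        raise ValueError("contrast maps must have the same number of voxels")
    return [min(a, b) > t_crit for a, b in zip(t_location, t_counter)]


# Hypothetical voxelwise t-values for five voxels (not data from this study).
t_loc = [4.1, 2.0, 5.3, 0.5, 3.9]  # location shift vs. hold
t_cnt = [3.8, 4.6, 1.2, 0.7, 4.4]  # counter shift vs. hold
print(conjunction_mask(t_loc, t_cnt, t_crit=3.5))
# prints [True, False, False, False, True]
```

Voxels 1 and 2 each pass one contrast strongly but fail the other, so they drop out; this is why a conjunction is more stringent than visually overlaying the two thresholded maps.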
Figure 5 shows event-related average time courses in each of the regions identified by the conjunction analysis. Voxels in these three regions were selected on the criterion that internal shifts and external shifts each evoked greater activity than holds; the time courses therefore necessarily exhibit this pattern. The selection criteria did not, however, constrain the relationship between internal and external attention shifts in these voxels. In the rSFS and right mSPL, internal and external shifts evoked similar time courses and did not differ in ROI mean activity [rSFS: t(5) = −.956, p = .383; rmSPL: t(5) = 1.962, p = .107]. In the left IPS, however, internal counter shifts evoked marginally greater activity than external location shifts [see Figure 5 for time courses; t(5) = −2.547, p = .051], perhaps because of the involvement of the IPS in numerical processing (Piazza, Pinel, Le Bihan, & Dehaene, 2007; Pinel, Dehaene, Riviere, & LeBihan, 2001).
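The ROI comparisons above are reported with 5 degrees of freedom, which is consistent with a paired comparison across six subjects. A minimal sketch of a paired-samples t statistic follows, assuming one ROI mean value per subject per condition; the function name and example values are hypothetical, not data from this study. The sign of t depends on the order of subtraction, and a p value would additionally require the t distribution (SciPy's scipy.stats.ttest_rel, for example, returns both).

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom for the differences a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two subjects")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return statistics.mean(diffs) / sem, n - 1


# Hypothetical ROI means (e.g., percent signal change) for six subjects.
location_shift = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1]
counter_shift = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
t, df = paired_t(location_shift, counter_shift)
print(f"t({df}) = {t:.3f}")  # prints t(5) = 4.583
```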
Distinct Spatio-temporal Patterns for Internal and External Shifts of Attention
The conventional GLM analyses reported above suggest that several brain regions exhibit shift-related activity in both the external and internal domains. This could reflect a truly domain-independent signal that triggers both acts of cognitive reconfiguration (a generic “go signal”) but plays no role in specifying the features of the shift (e.g., the old and/or new cognitive state). Alternatively, neurons within these common regions might respond selectively during cognitive reconfiguration in different domains. This second possibility can be tested with multivariate pattern analysis.
To examine domain independence in the regions identified by the conjunction analysis (location shifts vs. holds and counter shifts vs. holds), we performed multivoxel pattern classification with a linear support vector machine (SVM) classifier in ROIs defined separately for each subject. These ROIs were identified with a leave-one-subject-out procedure, so that the data used to define each subject's ROIs were independent of the data used to train and test that subject's classifier (Esterman et al., 2010; see Methods).
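The leave-one-subject-out logic can be sketched as follows. This is a simplified illustration, not the procedure described in the Methods: it assumes each subject contributes one per-voxel value for each of the two contrasts in a common space, replaces the group GLM with a simple cross-subject mean, and uses a hypothetical threshold. The property it shows is that each subject's ROI is defined without any of that subject's own data.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# subject ID -> (location-shift contrast map, counter-shift contrast map)
Maps = Dict[str, Tuple[Sequence[float], Sequence[float]]]


def loso_conjunction_rois(maps: Maps, threshold: float) -> Dict[str, List[int]]:
    """For each held-out subject, define a conjunction ROI from the OTHER subjects."""
    rois: Dict[str, List[int]] = {}
    for held_out in maps:
        others = [m for s, m in maps.items() if s != held_out]
        if not others:
            raise ValueError("need at least two subjects")
        n_vox = len(others[0][0])
        roi = []
        for v in range(n_vox):
            loc = mean(m[0][v] for m in others)  # group location-shift effect
            cnt = mean(m[1][v] for m in others)  # group counter-shift effect
            if min(loc, cnt) > threshold:        # conjunction: both must pass
                roi.append(v)
        rois[held_out] = roi
    return rois


# Hypothetical contrast values for three subjects and three voxels.
maps = {
    "s1": ([2.0, 0.0, 2.0], [2.0, 2.0, 0.0]),
    "s2": ([2.0, 0.0, 2.0], [2.0, 2.0, 2.0]),
    "s3": ([2.0, 2.0, 0.0], [2.0, 0.0, 2.0]),
}
print(loso_conjunction_rois(maps, threshold=0.5))
```

Because the ROI for s3 is computed only from s1 and s2, it differs from the others (voxel 1 drops out); the held-out subject's classifier would then be trained and tested only on voxels in its own ROI.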
In every case, the leave-one-subject-out conjunction GLMs identified regions very similar to the group conjunction regions (right mSPL, right SFS, left IPS). Regions identified by a leave-one-subject-out conjunction analysis were not analyzed further unless they also showed significant activation in the full-group conjunction analysis.
Event-related pattern classification performance for location shifts versus counter shifts in each of the three conjunction regions is presented in Figure 6 (see Figure S4 for mean-centered event-related MVPC [er-MVPC] results). Because MVPC was carried out independently for each subject, each subject's classification performance was compared only with that same subject's randomization test (see Methods). For readability, mean performance across subjects is shown in black; the dashed line shows the mean, across subjects, of the 95th percentile of chance performance estimated by the randomization test. In all three regions, the classifier identified at above-chance rates whether a spatial shift of attention or a WM counter shift had followed a shift cue. Furthermore, the time course of classification approximately follows that of the mean BOLD response in these regions: in each region, classification performance was indistinguishable from chance for several seconds before the cue and significantly greater than chance from 4 to 6 sec after cue onset. Importantly, in the mSPL and SFS this classification performance cannot be attributed to a difference in mean activity, because the mean BOLD signal in these regions did not differ reliably for the two types of shift; it must instead be driven by the pattern of activity across voxels.
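The per-subject randomization test can be sketched as below. As a simplifying assumption, a leave-one-out nearest-centroid classifier stands in for the linear SVM used in the study, and the trial counts, number of permutations, and data are hypothetical. At each time point, a subject's observed accuracy is compared with the 95th percentile of accuracies obtained after shuffling that subject's trial labels.

```python
import random
from typing import List, Sequence


def loo_centroid_accuracy(X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier for 0/1 labels.

    Stand-in for the study's linear SVM; each class needs at least two trials.
    """
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        # Squared Euclidean distance from the held-out trial to each centroid.
        dist = {label: sum((a - b) ** 2 for a, b in zip(X[i], c))
                for label, c in centroids.items()}
        prediction = min(dist, key=dist.get)  # ties resolve to label 0
        correct += prediction == y[i]
    return correct / len(X)


def chance_threshold(X: Sequence[Sequence[float]], y: Sequence[int],
                     n_perm: int = 200, q: float = 0.95, seed: int = 0) -> float:
    """q-th percentile of accuracy under randomly permuted trial labels."""
    rng = random.Random(seed)
    labels = list(y)
    null: List[float] = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks the link between patterns and labels
        null.append(loo_centroid_accuracy(X, labels))
    null.sort()
    return null[min(int(q * n_perm), n_perm - 1)]


# Hypothetical single-subject data: 8 trials x 2 voxels at two time points
# (label 0 = location shift, label 1 = counter shift).
y = [0, 0, 0, 0, 1, 1, 1, 1]
patterns_by_time = [
    [[1.0, 1.0]] * 8,  # before the cue: no information about shift type
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
     [3.0, 3.0], [3.1, 3.0], [3.0, 3.1], [3.1, 3.1]],  # after the cue
]
for t, X_t in enumerate(patterns_by_time):
    acc = loo_centroid_accuracy(X_t, y)
    thr = chance_threshold(X_t, y)
    print(f"time {t}: accuracy={acc:.2f}, chance 95th pct={thr:.2f}, above={acc > thr}")
```

Averaging accuracies across subjects would happen only after this per-subject step, for plotting, mirroring the within-subject logic described for Figure 6.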
Our results cannot be explained by the presence of two consistent sub-ROIs, one more sensitive to location shifts and the other more sensitive to counter shifts. Supplementary Figure 1 presents binarized classification weights back-projected into each ROI: weights of opposite polarity do not form two spatially segregated sub-ROIs but are instead interspersed throughout each region.
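Back-projecting binarized weights can be sketched with a linear SVM. The example below assumes the Pegasos stochastic subgradient method for the hinge-loss SVM (the solver used in the study is not specified here), and the regularization value, iteration count, voxel coordinates, and toy patterns are hypothetical. In practice a library implementation such as scikit-learn's LinearSVC, whose coef_ attribute holds the per-voxel weights, would typically be used instead.

```python
import random
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, int, int]


def pegasos_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                       lam: float = 0.01, n_iter: int = 2000,
                       seed: int = 0) -> Tuple[List[float], float]:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos.

    y must hold +1/-1 labels. A constant 1.0 feature is appended so the last
    weight acts as a bias; it is regularized too, a common simplification.
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(Xa))
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 term
        if margin < 1.0:  # hinge loss active: step toward the violating trial
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w[:-1], w[-1]


def binarized_weight_map(weights: Sequence[float],
                         coords: Sequence[Coord]) -> Dict[Coord, int]:
    """Back-project the sign of each voxel's weight to that voxel's coordinate."""
    return {c: (1 if w > 0 else -1 if w < 0 else 0) for w, c in zip(weights, coords)}


# Hypothetical two-voxel ROI; +1 = location shift, -1 = counter shift.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, -1, -1]
weights, bias = pegasos_linear_svm(X, y)
print(binarized_weight_map(weights, [(10, 20, 30), (11, 20, 30)]))
```

In a real ROI with many voxels, plotting this sign map is what reveals whether positive and negative weights cluster into two sub-regions or are interspersed.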
Results for within-domain MVPC (decoding the direction of location shifts and the direction of counter shifts) are presented in Supplementary Figures 2 and 3, respectively. In the IPS and mSPL, the direction of location shifts could be decoded, and this decoding was sustained over time. The SFS classifier, however, could not decode location shift direction, and no region's classifier decoded the direction of counter shifts.
In this study, we investigated the neural basis of two cardinal processes of attentional control—shifting of external attention between spatial locations and shifting of internal attention between items held in WM. Each of these acts of control recruited a network of frontal and parietal regions, as revealed by the contrasts of spatial attention shifts versus holds and internal attention shifts versus holds. Critically, these networks are overlapping: Three distinct regions exhibited significantly greater activation for both the external attention shift versus hold and internal attention shift versus hold contrasts according to a conjunction analysis.
Both types of attention shift (external and internal) were contrasted against holds. This contrast removes both activity due to motor response demands, which were equally present for shifts and holds, and activity related to detecting and responding to RSVP events in general. One possible concern about this contrast is that hold targets were missed more often than shift targets, suggesting that holds might have been less salient. This concern is mitigated, however, because only correctly detected events were included in the shift and hold regressors.
An alternative contrast that might be considered for isolating activity due to attention shifts is shift versus increment. We ruled it out, however, because to the extent that attention shifts and WM updates evoke activity in common, this contrast would bias the analysis against detecting shift-related activity. WM updates do evoke activity in the same brain regions as shifts of task and shifts of attention (e.g., Roth & Courtney, 2007), including areas identified in this study. The shift versus hold subtraction, which is more neutral with respect to attentional control-related activity, is therefore the appropriate contrast.
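The bias can be made explicit with a schematic additive model. Assume, purely for illustration, that each event's response sums linearly from a component B common to all target events, a shift-specific component S_0, an update-specific component U_0, and a component K shared by shifts and WM updates; these symbols are hypothetical and are not estimated anywhere in the analysis.

```latex
\begin{aligned}
R_{\text{hold}}  &= B \\
R_{\text{shift}} &= B + S_0 + K \\
R_{\text{inc}}   &= B + U_0 + K \\
R_{\text{shift}} - R_{\text{hold}} &= S_0 + K \\
R_{\text{shift}} - R_{\text{inc}}  &= S_0 - U_0
\end{aligned}
```

Under these assumptions, shift versus hold retains all shift-related activity (S_0 + K), whereas shift versus increment cancels the shared component K entirely and additionally subtracts U_0, so regions whose shift-related activity is largely shared with updating could be missed.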
The mean BOLD signals evoked by external and internal attention shifts were similar in the mSPL and SFS. In the IPS, WM attention shifts evoked slightly greater activity than spatial attention shifts. This difference may well be attributable to task-specific factors, because the IPS has been implicated not only in attentional and WM control but also in numerical magnitude processing (Piazza et al., 2007; Pinel et al., 2001).
Substantial areas of the brain were activated for only one type of shift (see Figure 4, Supplemental Table 1). We focused our analysis on the regions of overlap, however, because our primary interest is the domain-general control of attention. The regions of overlap fell within the dorsal fronto-parietal control network widely implicated in attentional and WM control. Regions activated for only one shift type were more widely distributed across the brain and might reflect more “peripheral” processing related to control in a single domain (visuospatial or WM information) rather than the “central” processes of cognitive control.
Whereas conventional univariate GLM analyses are sensitive to activation at a coarse spatial scale, MVPC can reveal patterns of activity at the sub-ROI level. Interpreting MVPC results requires recognizing that MVPC is an inherently within-subject analysis: each brain evokes its own unique patterns of activity, and there is no reason to expect those patterns to be similar across subjects except at a much coarser spatial scale than the one used here. Although Figure 6 presents group-average time courses, all analyses were carried out separately for each subject, and mean classification accuracy was computed only at the final step, for plotting.
This analysis revealed that macroscopically domain-independent regions express domain-specific spatio-temporal patterns of activity: the voxelwise pattern of activity within the right mSPL, left IPS, and right SFS differed reliably for the two types of attention shift. This suggests that neuronal populations in these regions are selectively tuned to participate in distinct acts of cognitive control, and that a distributed pattern of activity specifies the nature of the shift to be carried out.
Several prior studies have reported transient increases in activity in these regions during shifts between different states of attention (e.g., Greenberg, Esterman, Wilson, Serences, & Yantis, 2010; Shulman et al., 2009; Kelley et al., 2008; Serences, Schwarzbach, Courtney, Golay, & Yantis, 2004; Yantis et al., 2002; Vandenberghe et al., 2001) or task set (Chiu & Yantis, 2009; Kamigaki, Fukushima, & Miyashita, 2009; Braver, Reynolds, & Donaldson, 2003). A natural hypothesis emerging from these observations is that these regions are populated by neurons that control cognitive reconfiguration in both domains.
The present study extends this line of work to internal shifts of selection within WM, using a paradigm in which external and internal shifts were interleaved within subjects. This design permitted a direct test of the degree to which control in these different domains recruits similar cortical networks. The findings presented here, together with prior findings of domain-independent control of attention shifts in medial superior parietal and frontal brain regions, support an emerging view of the neural basis of attentional control. According to this account, task-relevant perceptual or mnemonic information is selected for monitoring, processing, or updating (Montojo & Courtney, 2008; Roth & Courtney, 2007; Roth, Serences, & Courtney, 2006). Shifts of selective attention in perception or WM are associated with transient control signals in a fronto-parietal network, and this domain-independent attentional control network is deployed in distinct domain-specific modes, which are reflected in unique patterns of brain activity for different acts of control.
This functional network also serves to reconfigure cognitive task set. In a recent study (Chiu & Yantis, 2009), shifts of spatial attention between two RSVP streams were compared with shifts between two categorization rules (magnitude or parity, applied to digit targets embedded in the attended RSVP stream). A region in the mSPL was transiently active during both spatial attention shifts and categorization rule shifts. The present study echoes these findings: a reconfiguration of WM, whether of the attentional priority of WM counters (here) or of the attentional priority of categorization rules to be applied to bivalent stimuli (Chiu & Yantis, 2009), recruits the mSPL. An fMR-adaptation analysis suggested that subpopulations of neurons within the mSPL were selectively active during the initiation of shifts in the perceptual and cognitive domains, respectively (Chiu & Yantis, 2009), a conclusion consistent with the outcome of the current MVPC analysis. This conclusion is further bolstered by the finding that shifts of categorization rule and shifts of spatial attention can be decoded using MVPC in the mSPL (Esterman, Chiu, Tamber-Rosenau, & Yantis, 2009).
The parallel findings of the current study and Chiu and Yantis (2009) speak to the long-standing debate over whether the same representational and control mechanisms are used for both items and rules (see, e.g., Montojo & Courtney, 2008; Ravizza & Carter, 2008; Rushworth et al., 2001), suggesting that control of these two constructs shares at least some elements. Although the Esterman et al. (2009) study (which included a subset of the data from the present study for a cross-experiment stability analysis) also supports this view, a direct within-paradigm comparison, as presented here, is important. In the present study, we found that different types of shifts of selection can be distinguished from one another in several core attentional control regions.
Like those of the studies reviewed above, the present data do not reveal which aspects of the neural signal drive successful classification. The present results reflect classification between one spatial and one nonspatial type of shift. An important topic for further research will be to determine whether these pattern differences are driven by an internal/external dissociation, a spatial/nonspatial dissociation, or some other dissociation entirely. To this end, future studies should attempt to classify two nonspatial types of shift.
In the current paradigm, the foci of external and internal attention and the counter values were maintained throughout the task (i.e., subjects were actively maintaining these states at all times except during the brief intervals surrounding shifts), so the task does not afford a means of detecting activation related to domain-independent maintenance processes (e.g., via a maintenance vs. no-maintenance contrast). It is, however, well suited to identifying shift-related brain activity. According to a proposed computational model of task shifting (O'Reilly & Frank, 2006), prefrontal cortex maintains WM items and rules, including rules specifying when items or rules should be updated, but the act of updating itself is gated by the basal ganglia (see also McNab & Klingberg, 2008). On this account, the loci of external and internal attention and the counter values would be maintained in WM by prefrontal cortex, whereas shifts of either locus of attention would be mediated by basal ganglia gating. We observed transient activations of the caudate nucleus and globus pallidus evoked by attention shifts between WM items (see Supplemental Table 1 and Figure 4), but no statistically reliable basal ganglia activations were identified for external attention shifts. We take this as intriguing preliminary evidence for the model (O'Reilly & Frank, 2006), but further work is needed to understand when external attention shifts evoke significant basal ganglia activation.
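The distinction between maintenance and gated updating can be illustrated with a toy sketch. It is not the PBWM network model of O'Reilly and Frank (2006); it only assumes, for illustration, that maintained contents persist unless a gate (standing in for a basal ganglia signal) opens, at which point new content is written. The class name, slot names, and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GatedWorkingMemory:
    """Toy maintain-versus-update logic (hypothetical, not the PBWM model).

    `slots` stands in for prefrontal maintenance of task context; a slot
    changes only when the gate, standing in for basal ganglia gating, opens.
    """
    slots: Dict[str, str] = field(default_factory=dict)

    def step(self, slot: str, new_value: str, gate_open: bool) -> None:
        if gate_open:
            self.slots[slot] = new_value  # gated update: content is replaced
        # gate closed: existing content is simply maintained


wm = GatedWorkingMemory({"spatial_focus": "left", "wm_focus": "counter_A"})
wm.step("spatial_focus", "right", gate_open=False)  # input ignored, focus maintained
wm.step("wm_focus", "counter_B", gate_open=True)    # shift cue opens the gate
print(wm.slots)  # prints {'spatial_focus': 'left', 'wm_focus': 'counter_B'}
```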
Taken together with the results of the present study, the model of O'Reilly and Frank (2006) supports an account of attentional control in which shifts of attention require both an update to WM representations of the current task context (e.g., “at this moment in time, I should be directing spatial attention to the left side”) and a reconfiguration of the content-specific regions of sensory or mnemonic cortex (e.g., biasing neural competition in favor of right extrastriate visual cortex). On this view, it is important to consider separately the reconfiguration of the WM resource in which task context is maintained and the reorienting of attention to a new sensory input or a different WM representation. This account of cognitive control highlights multiple distinct types of possible WM updates: a reconfiguration of task context, a replacement of maintained item information with altogether new information, the integration of new and old information, a shift of attention between WM representations, or one of perhaps many other acts of control in WM. One interpretation of our results is that different types of reconfiguration of task-context WM (here, reconfigurations of either visuospatial or mnemonic attention) lead to distinguishable signals in WM control regions, as revealed by MVPC.
Both unitary-store and multipartite models of WM posit (explicitly or implicitly) shifts of attention within WM, but they do not specifically address how this attention is deployed or whether attentional deployments within WM are mediated by the same mechanisms as attentional deployments in perception. The data presented here do not resolve the debate between unitary-store and multipartite models of WM, but they do sharply constrain these models by showing that shifts of attention between perceptual and mnemonic representations share a common cortical source that is deployed in distinct domain-specific modes. These findings suggest that a domain-independent cognitive reconfiguration operation, instantiated by different patterns of activity in the mSPL and other brain areas, subserves mnemonic and perceptual attention. Additional studies that separately evoke changes to task context, to item content, and to the state of attention will further clarify how voluntary shifts of attention at different levels of WM and perception are coordinated to yield flexible and adaptive cognitive behavior.
We thank Amy Shelton and John Serences for statistical and technical advice, and Jared Abrams, Terri Brawner, and Kathleen Kahl for excellent technical assistance. This work was supported by the National Institutes of Health grant R01-DA13165 to S. Y.
Reprint requests should be sent to Benjamin J. Tamber-Rosenau, Department of Psychology, Vanderbilt University, 427 Wilson Hall, 111 21st Avenue South, Nashville, TN 37240, or via e-mail: firstname.lastname@example.org.
Now at Vanderbilt University.