Recent research has demonstrated top–down attentional modulation of activity in extrastriate category-selective visual areas while stimuli are in view (perceptual attention) and after they are removed from view (reflective attention). Perceptual attention is capable of both enhancing and suppressing activity in category-selective areas relative to a passive viewing baseline. In this study, we demonstrate that a brief, simple act of reflective attention (“refreshing”) is also capable of both enhancing and suppressing activity in some scene-selective areas (the parahippocampal place area [PPA]) but not others (refreshing resulted in enhancement but not in suppression in the middle occipital gyrus [MOG]). This suggests that different category-selective extrastriate areas preferring the same class of stimuli may contribute differentially to reflective processing of one's internal representations of such stimuli.
Research in cognitive neuroscience has demonstrated that executive processes such as visual attention and working memory (WM), traditionally associated with heteromodal association areas in frontal and parietal cortex, are also capable of influencing activity levels in brain areas most strongly associated with unimodal perception. This is in line with theories describing the role of prefrontal cortex in executive processing as one of biasing the flow of information in other brain regions (Miller & Cohen, 2001), and more specifically, the ideas that attention serves to enhance perceptual processing of a small subset of stimuli against a background of competing percepts (the “biased competition” model; Desimone & Duncan, 1995) and that WM is at least partially mediated by an enhancement of activity in brain regions where the relevant representations were initially processed perceptually (Ranganath & D'Esposito, 2005; Curtis & D'Esposito, 2003; Ruchkin, Grafman, Cameron, & Berndt, 2003; Petrides, 1994), presumably affording a partial reinstantiation of patterns of neural activity experienced during initial perception.
Category-selective regions of extrastriate visual cortex such as the fusiform face area (Kanwisher, McDermott, & Chun, 1997; McCarthy, Puce, Gore, & Allison, 1997) and the parahippocampal place area (PPA; Epstein & Kanwisher, 1998) have provided a convenient testing ground for studies of executive processing. For example, if the fusiform face area responds more to faces than scenes and PPA responds more to scenes than faces, it would be reasonable to expect that, given a display consisting of a face and a scene, attending to the scene would produce more PPA activity than attending to the face. A number of studies have explored the ability of executive processing to modulate category-selective extrastriate (hereafter CSE) cortex activity, including studies within the domains of mental imagery (O'Craven & Kanwisher, 2000), WM maintenance (Ranganath, DeGutis, & D'Esposito, 2004; Druzgal & D'Esposito, 2003; Postle, Druzgal, & D'Esposito, 2003), visual attention (Wojciulik, Kanwisher, & Driver, 1998), and combinations of WM and attention (Gazzaley et al., 2007; Lepsien & Nobre, 2007; Gazzaley, Cooney, McEvoy, Knight, & D'Esposito, 2005).
One limitation of many of these studies is task complexity; for example, in a typical sustained WM paradigm, the cognitive processes a participant may invoke to maintain a face or scene in memory are relatively unconstrained. This makes it potentially difficult to distinguish which processes are responsible for producing which patterns of activity, and raises the question of what process is minimally necessary to modulate extrastriate activity. One potential explanation for the common characteristics observed in multiple executive control tasks (e.g., Duncan & Owen, 2000) is that they rely on common component processes, suggesting the strategy of breaking these tasks into simpler constituent processes for closer study. Following this strategy, we have used a “refresh” paradigm (Johnson & Johnson, in press; Raye, Johnson, Mitchell, Greene, & Johnson, 2007; Johnson et al., 2005; Johnson, Mitchell, Raye, & Greene, 2004; Johnson, Reeder, Raye, & Mitchell, 2002; Raye, Johnson, Mitchell, Reeder, & Greene, 2002) to investigate one simple executive process in which participants simply think briefly of a stimulus that was presented less than a second earlier, thus eliminating any elaborative rehearsal strategies or brain activity related to the preparation of a motor response. A previous study (Johnson, Mitchell, Raye, D'Esposito, & Johnson, 2007) found that refreshing pictures of faces or scenes is sufficient to modulate CSE area activity, and that the degree of modulation varied (relative to activity produced by perception) according to an anatomical gradient.
Because our previous results (Johnson et al., 2007) indicate that refreshing—the act of turning one's internal, or reflective, attention to highlight one of several activated representations—is capable of modulating CSE cortex activity, we can use this process as a probe of executive function in order to address a number of open questions in the literature. One such question is to what extent executive processes suppress competing representations in CSE cortex in addition to enhancing target representations. Previous studies (Gazzaley et al., 2007; Gazzaley, Cooney, McEvoy, et al., 2005; Gazzaley, Cooney, Rissman, & D'Esposito, 2005) demonstrated that in a sustained WM context, instructing participants ahead of time to attend to (e.g., “remember scenes”) or ignore (e.g., “ignore scenes”) one class of stimuli can enhance or suppress activity in relevant CSE regions (e.g., PPA) during encoding relative to a passive viewing baseline in which participants are free to attend to both classes of stimuli. Other studies showing above-baseline CSE activity during WM delay periods (Ranganath et al., 2004; Druzgal & D'Esposito, 2003; Postle et al., 2003) and mental imagery (O'Craven & Kanwisher, 2000) suggest that these processes enhance activity in extrastriate regions selective for the appropriate category in the absence of perceptual stimulation. However, to our knowledge, no study has conclusively demonstrated suppression of activity in regions selective for an irrelevant category without ongoing perceptual stimulation.
The suppression relative to a passive baseline observed by Gazzaley, Cooney, McEvoy, et al. (2005) and Gazzaley, Cooney, Rissman, et al. (2005) presumably was due to selective perceptual attention (or an unknown combination of perceptual and reflective attention). Our own study of CSE modulation via refreshing faces or scenes (Johnson et al., 2007) as well as another study of CSE modulation via shifting reflective attention to either a face or scene among multiple stimuli held in WM (Lepsien & Nobre, 2007) only compared refreshing/attending to one stimulus versus another without a baseline condition in which neither stimulus was selected (analogous to a passive viewing baseline in a study of perceptual attention), making it impossible to conclude whether the observed modulation was due to enhancement of activity when the area's preferred stimulus class was the target, suppression of activity when the area's preferred stimulus class was irrelevant, or some combination of the above. Thus, the aim of the present study was to determine if reflective attention alone, in the absence of ongoing perceptual stimulation, can suppress as well as enhance activity in CSE areas. We did so by presenting participants with displays consisting of a face and a scene stimulus, followed by a cue to refresh one of the two stimuli; these refresh conditions were then compared to a baseline condition in which the face and scene stimuli were initially presented but then neither stimulus was refreshed (see Methods). We also included an overt attention task (i.e., presenting the same face/scene displays as in the refresh conditions, but cueing participants ahead of time to look at only one stimulus and ignore the other), which allowed us to estimate the approximate strength of any refresh-induced modulation in CSE cortex by comparing it to the activity differences caused by simply foveating stimuli of a preferred or nonpreferred category.
Fourteen young, right-handed, self-reported healthy adults with normal or corrected-to-normal vision participated in the study (5 women, mean age = 21.1 ± 2.8 years). One additional participant was excluded due to excessive sleepiness during scanning. Participants were screened for MRI compatibility, gave written informed consent, and were compensated. The procedure was approved by the Yale University School of Medicine Human Investigation Committee.
On each trial, participants saw a face and a scene presented for 1500 msec on the left and right sides of the screen (see Figure 1). For Refresh trials (Figure 1A), face and scene stimuli were preceded by a white fixation cross (1500 msec) and followed by a brief blank screen (500 msec), and then a white arrow pointing to the left or right side of the screen (1500 msec). In Refresh trials, the arrow cued participants to think back to, or visualize, the picture that had just appeared on the indicated side (which could be either a face or a scene, yielding Refresh Face and Refresh Scene conditions) as long as the arrow was onscreen. For Attend trials (Figure 1B), face and scene stimuli were preceded by a white arrow pointing to the left or right side of the screen (1500 msec) and followed by a brief blank screen (500 msec) and then a white fixation cross (1500 msec). In Attend trials, the arrow cued participants only to look at (overtly attend to) the picture that would shortly appear on the indicated side (which could be either a face or a scene, yielding Attend Face and Attend Scene conditions), and ignore the picture on the other side. For Act trials (Figure 1C), face and scene stimuli were preceded by a white fixation cross (1500 msec) and followed by a brief blank screen (500 msec) and then a gray square presented centrally (1500 msec). The gray square cued participants to press a button with their right index fingers, and not to think about either the face or the scene stimulus. In all trials, fixation crosses served only as an indicator that the trial was ongoing (the screen was blank between trials) and did not cue participants to do anything in particular.
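The three trial structures can be summarized compactly in code. The following Python sketch is only an illustration of the timings stated above; the names `TRIALS`, `onsets`, and `cue_onset` are hypothetical and are not part of the original stimulus-presentation software.

```python
# Event durations in msec, taken from the task description.
FIXATION, PICTURES, BLANK, CUE = 1500, 1500, 500, 1500

# Ordered events for each trial type (names are illustrative).
TRIALS = {
    "Refresh": [("fixation", FIXATION), ("face+scene", PICTURES),
                ("blank", BLANK), ("arrow", CUE)],
    "Attend": [("arrow", CUE), ("face+scene", PICTURES),
               ("blank", BLANK), ("fixation", FIXATION)],
    "Act": [("fixation", FIXATION), ("face+scene", PICTURES),
            ("blank", BLANK), ("gray square", CUE)],
}


def onsets(trial_type):
    """Return (event, onset_msec, offset_msec) tuples for one trial."""
    t, out = 0, []
    for name, duration in TRIALS[trial_type]:
        out.append((name, t, t + duration))
        t += duration
    return out


def cue_onset(trial_type):
    """Onset (msec) of the event carrying the instruction (arrow or gray square)."""
    for name, start, _ in onsets(trial_type):
        if name in ("arrow", "gray square"):
            return start
    raise ValueError(trial_type)
```

Laid out this way, every trial type lasts 5000 msec and shows the pictures from 1500 to 3000 msec; the Attend cue begins at 0 msec, whereas the Refresh and Act cues begin at 3500 msec.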
Trials were separated by blank intertrial intervals (ITIs) of 3000, 5000, or 7000 msec, with the distribution of ITIs selected via a Matlab (MathWorks, Natick, MA) script that generated multiple randomized sets of ITIs and retained those that would provide maximal orthogonality between conditions in subsequent fMRI statistical analyses (see below). Participants completed 5 runs of 40 trials each (8 trials per condition per run, pseudorandomly intermixed), for a total of 40 trials per condition per participant.
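The ITI-selection procedure is described only in outline, so the following Python sketch should be read as an illustration rather than as the original Matlab script. It assumes, hypothetically, that each ITI is drawn at random from the three values, that condition regressors are unconvolved boxcars sampled at 1-sec resolution, and that "maximal orthogonality" is scored as the smallest worst-case absolute pairwise correlation between condition regressors within a run. All function names are illustrative.

```python
import random
from itertools import combinations
from math import sqrt

CONDITIONS = ["Refresh Face", "Refresh Scene", "Attend Face", "Attend Scene", "Act"]
TRIAL_SEC, ITIS_SEC, TRIALS_PER_COND = 5, (3, 5, 7), 8


def make_run(rng):
    """One candidate run: shuffled trial order, each trial followed by a random ITI."""
    order = CONDITIONS * TRIALS_PER_COND
    rng.shuffle(order)
    return [(cond, rng.choice(ITIS_SEC)) for cond in order]


def boxcars(run):
    """Boxcar regressor per condition at 1-sec resolution (1 during a trial, else 0)."""
    total = sum(TRIAL_SEC + iti for _, iti in run)
    regs = {c: [0.0] * total for c in CONDITIONS}
    t = 0
    for cond, iti in run:
        for s in range(t, t + TRIAL_SEC):
            regs[cond][s] = 1.0
        t += TRIAL_SEC + iti
    return regs


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def worst_correlation(run):
    """Largest absolute correlation between any two condition regressors."""
    regs = boxcars(run)
    return max(abs(pearson(regs[a], regs[b])) for a, b in combinations(CONDITIONS, 2))


def pick_best_run(n_candidates=200, seed=0):
    """Generate candidate runs and keep the one with the lowest worst-case correlation."""
    rng = random.Random(seed)
    candidates = [make_run(rng) for _ in range(n_candidates)]
    return min(candidates, key=worst_correlation)
```

A fuller version would presumably convolve the regressors with a hemodynamic response function before scoring them; that step is omitted here to keep the sketch short.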
With this task design, we attempted to equate the reflective attention (Refresh) and overt perceptual attention (Attend) conditions as closely as possible, considering that cues for reflective and perceptual attention must necessarily occur at different points in time. However, both types of trial used the same types of stimuli (a fixation cross, a face/scene pair, and an arrow) and lasted for the same duration; the only difference in the stimuli between the two trial types was that the timings of the fixation cross and the arrow traded places. In turn, we attempted to provide as close a perceptual baseline as possible to both the Refresh and Attend conditions with the Act condition, which also presented a fixation cross, a face/scene pair, and another cue, but without any instruction to attend preferentially to one of the two stimuli. The button press response was included to provide participants with a task to help interrupt any unprompted refreshing or rehearsing of the picture stimuli, but one simple enough not to require more cognitive control to execute than the Refresh or Attend trial types. No overt responses were collected for the Refresh or Attend trial types in an effort to keep these operations as simple as possible, thus keeping participants from invoking additional control processes and preventing them from being distracted from the stimuli. In addition, previous studies had reliably found refresh-related brain activity without recording any behavioral data (M. K. Johnson et al., 2005; M. R. Johnson et al., 2007; Raye et al., 2002, 2007), and had also provided neural (repetition attenuation when refreshed stimuli were subsequently perceived; Yi, Turk-Browne, Chun, & Johnson, 2008) and behavioral (better subsequent memory for refreshed than perceived items; Johnson et al., 2002; Raye et al., 2002) evidence of the consequences of refreshing.
All face and scene stimuli were full-color pictures measuring 300 × 300 pixels. Faces were forward-facing complete head shots of men and women (in equal proportions) with neutral or pleasant facial expressions (see Note 1); all faces were drawn from a database developed by Minear and Park (2004). Scenes were indoor and outdoor (in equal proportions) pictures of landscapes, buildings, and interior rooms in a wide variety of settings, drawn from a number of sources (mostly freely available images from the Internet). Stimuli were counterbalanced across participants with regard to the condition and run in which they appeared. Faces and scenes were also balanced to appear equally often on the left and right, and the stimulus to be attended or refreshed also occurred equally often on the left and right in each condition. Every face and scene stimulus was used exactly once per participant. Several practice trials, using different stimuli than those seen in the scanner, were given prior to scanning to familiarize participants with the task.
Imaging data were acquired on a 1.5-T Siemens Sonata scanner at the Yale University Magnetic Resonance Research Center. Medium-resolution T1 anatomical images were followed by five functional runs of the task (209 volumes, 6:58 per run). Six volumes were discarded from the beginning of each run to allow tissue to reach steady-state magnetization. Functional echo-planar images were whole-brain volumes with the following parameters: 24 axial slices, interleaved acquisition, TR = 2000 msec, TE = 35 msec, flip = 80°, 3.75 mm × 3.75 mm × 3.8 mm voxels with 0 mm skip.
Data were analyzed using SPM5 (Wellcome Department of Imaging Neuroscience, University College London, UK). Preprocessing included slice timing correction (setting the reference slice to the first slice acquired in time), motion correction, coregistration of functional images to the participant's anatomical scan, spatial normalization (with anatomical scans first being warped to match the T1 template provided with SPM, and those warping parameters then being applied to the functional data, resampling functional images to 3 mm isotropic voxels in the process), and spatial smoothing (9 mm FWHM Gaussian kernel). Single-subject statistics were modeled by treating each trial as a 5-sec epoch (which includes the total time needed to display a fixation cross, face/scene pair, and arrow or gray square for all trial types), which was convolved with the canonical hemodynamic response function (HRF) to create regressors for each condition. Thus, all hemodynamic activity reported here for a given trial type can be construed as the total amount of activity across the entire duration of the trial. Parameter estimates (beta images) of activity for each condition and each participant were then entered into a group random effects analysis using a one-way (5 conditions) within-subject ANOVA design.
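To make the epoch model concrete, the following Python sketch builds a regressor by convolving 5-sec boxcar epochs with a double-gamma HRF and sampling the result once per TR. It is an approximation written for illustration, not SPM5 code: the HRF shape parameters (response peak parameter 6, undershoot parameter 16, ratio 1/6) are assumed here as commonly cited defaults, and `canonical_hrf` and `epoch_regressor` are hypothetical names.

```python
from math import exp, gamma


def canonical_hrf(t):
    """Double-gamma HRF at time t (sec); shape parameters are assumed defaults."""
    if t < 0:
        return 0.0
    peak = t ** 5 * exp(-t) / gamma(6)
    undershoot = t ** 15 * exp(-t) / gamma(16)
    return peak - undershoot / 6.0


def epoch_regressor(onsets_sec, n_scans, epoch_sec=5.0, tr=2.0, dt=0.1):
    """Convolve boxcar epochs with the HRF on a fine grid, then sample once per TR."""
    n_fine = int(round(n_scans * tr / dt))
    boxcar = [0.0] * n_fine
    for onset in onsets_sec:
        start = int(round(onset / dt))
        stop = int(round((onset + epoch_sec) / dt))
        for i in range(start, min(stop, n_fine)):
            boxcar[i] = 1.0
    # HRF kernel covering 32 sec, sampled every dt sec.
    kernel = [canonical_hrf(k * dt) for k in range(int(round(32.0 / dt)))]
    step = int(round(tr / dt))
    regressor = []
    for scan in range(n_scans):
        i = scan * step
        acc = 0.0
        for k, h in enumerate(kernel):
            if i - k < 0:
                break
            acc += boxcar[i - k] * h
        regressor.append(acc * dt)
    return regressor
```

Because the boxcar spans the whole 5-sec trial, the fitted parameter estimate reflects activity summed over the fixation cross, the pictures, and the cue, which is why the reported values are interpreted as total activity across the trial.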
Areas exhibiting significant activation differences between conditions were isolated with an effects-of-interest F test (i.e., an unbiased test to determine regions showing any difference among the five conditions in any direction) on this group analysis, with an alpha threshold of p < .001, uncorrected for multiple comparisons (for this particular dataset, the uncorrected threshold was actually substantially more conservative than an FDR-corrected threshold of p < .05) and an extent threshold of 5 voxels. Scene-selective regions of interest (ROIs) were determined from this effects-of-interest map by isolating local statistical maxima near the known locations of these ROIs as reported in prior studies (e.g., Johnson et al., 2007). For all ROI analyses, voxel values from each participant's beta images were extracted from a 6-mm-radius sphere around the appropriate coordinate and averaged to produce a single value for the region.
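The spherical ROI extraction can be sketched as follows. This Python example assumes, for illustration only, that the sphere is centered exactly on a voxel of the 3-mm isotropic grid and that voxels whose centers lie exactly on the 6-mm boundary are included; the source does not state either detail. The volume is represented as a plain dictionary keyed by voxel index, and the function names are hypothetical.

```python
from itertools import product


def sphere_offsets(radius_mm=6.0, voxel_mm=3.0):
    """Voxel-index offsets whose centers lie within radius_mm of the sphere center."""
    reach = int(radius_mm // voxel_mm)
    return [
        (i, j, k)
        for i, j, k in product(range(-reach, reach + 1), repeat=3)
        if (i * i + j * j + k * k) * voxel_mm ** 2 <= radius_mm ** 2
    ]


def roi_mean(volume, center, offsets):
    """Average beta values of a {(x, y, z): beta} volume over the sphere."""
    cx, cy, cz = center
    values = [
        volume[(cx + i, cy + j, cz + k)]
        for i, j, k in offsets
        if (cx + i, cy + j, cz + k) in volume
    ]
    return sum(values) / len(values)
```

Under these assumptions the sphere contains 33 voxels, so each ROI value is an average over a small neighborhood rather than a single peak voxel.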
To track the BOLD signal in each of the five conditions through time, separate sets of single-subject statistics were run using a finite impulse response (FIR) basis function instead of the canonical HRF. The FIR model used a 16-sec time window with eight bins. As all trials were time-locked to the onset of an fMRI volume acquisition and the preprocessing slice timing correction used the first slice acquired in time as the reference slice, the FIR model thus effectively sampled BOLD signal at 0 sec, 2 sec, 4 sec, … , 14 sec following the onset of each trial (Henson, Rugg, & Friston, 2001). To extract signal time courses in our ROIs, parameter estimates from these models were extracted in 6-mm-radius spheres around the coordinate of interest in the same manner as for ROI analyses in the canonical HRF models, detailed above.
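The FIR model described above can be illustrated with a small design-matrix sketch. This Python example assumes that trial onsets coincide with volume acquisitions (as stated) and uses hypothetical names; it is not the SPM5 implementation.

```python
def bin_times(tr=2.0, n_bins=8):
    """Post-onset time (sec) represented by each FIR bin."""
    return [b * tr for b in range(n_bins)]


def fir_design(onsets_sec, n_scans, tr=2.0, n_bins=8):
    """Rows are scans, columns are post-onset bins; 1 marks a scan falling in a bin."""
    design = [[0] * n_bins for _ in range(n_scans)]
    for onset in onsets_sec:
        first_scan = int(round(onset / tr))
        for b in range(n_bins):
            scan = first_scan + b
            if scan < n_scans:
                design[scan][b] = 1
    return design
```

Each column receives its own parameter estimate, so plotting the eight estimates for one condition against `bin_times()` gives the estimated BOLD time course from 0 to 14 sec after trial onset without assuming any particular response shape.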
We identified seven scene-selective ROIs for further analysis (see Note 2); these were located in the bilateral middle occipital gyrus (MOG), bilateral PPA, bilateral retrosplenial cortex (RSC), and right precuneus (PCu)/intraparietal sulcus (IPS); locations are shown in Figure 2H. We concentrated on these regions because a prior study (Johnson et al., 2007) had identified them in a localizer task as being relatively selective for scenes (i.e., activating more for scenes than faces, although not necessarily exclusively activating for scenes vs. other stimulus classes) and examined refresh-related activity in them. Our previous study had also identified the PCu/IPS region on the left, but only the right PCu/IPS activity was found to differ significantly among conditions in the present study.
Activation values for these seven scene-selective ROIs in the five conditions of the task are shown in Figure 2A–G. To better visualize the modulation induced by refreshing or overtly attending faces or scenes, we also calculated four modulation indices for each ROI by subtracting the activation related to the Act condition from the activation related to each of the Refresh and Attend conditions (Figure 3). Thus, in Figure 3, upward-pointing bars represent enhancement of activity in that region relative to the Act condition, and downward-pointing bars represent suppression relative to Act.
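The modulation index is a simple difference score, as the following Python sketch shows. The function name is hypothetical, and the beta values in the usage example are made-up numbers chosen for illustration; they are not data from this study.

```python
def modulation_indices(betas):
    """Subtract the Act baseline from every other condition's activation value."""
    act = betas["Act"]
    return {cond: value - act for cond, value in betas.items() if cond != "Act"}


# Hypothetical ROI values (arbitrary units), NOT results from the study.
example_betas = {
    "Refresh Face": 0.25,
    "Refresh Scene": 0.75,
    "Attend Face": 0.125,
    "Attend Scene": 0.75,
    "Act": 0.5,
}
example_indices = modulation_indices(example_betas)
```

With these illustrative numbers, the scene conditions yield positive indices (enhancement relative to the baseline) and the face conditions yield negative indices (suppression).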
In bilateral PPA (Figures 2C–D and 3), both refreshing and attending scenes significantly enhanced activity relative to the act condition, and both refreshing and attending faces produced significant suppression (one-tailed paired t tests; all individual p < .05). Neither the left nor right PPA showed a difference between the Refresh Scene and Attend Scene conditions (both p > .4, two-tailed paired t tests), indicating that refreshing and attending scenes produce similar degrees of enhancement in the PPA. There was also no difference between refreshing and attending faces in the left PPA (p > .2, two-tailed paired t test), indicating similar degrees of suppression for refreshing and attending, but in the right PPA there was significantly greater suppression in the Attend Face condition than in the Refresh Face condition (p < .05, two-tailed paired t test).
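For readers less familiar with the comparisons used here, the paired t statistic can be computed as in the following Python sketch. The function name is hypothetical and the sketch returns only the statistic and its degrees of freedom; with 14 participants, df = 13, for which the one-tailed critical value at alpha = .05 is approximately 1.771.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(cond_a, cond_b):
    """Return (t, df) for a paired t test on per-participant ROI values."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# One-tailed critical t at alpha = .05 for df = 13 (14 participants).
T_CRIT_ONE_TAILED_DF13 = 1.771
```

A directional prediction (e.g., Refresh Scene > Act for enhancement) would be supported when t exceeds the critical value in the predicted direction; two-tailed tests, as used for the Refresh versus Attend comparisons, require a larger absolute t.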
As a way of confirming that the activity levels captured by our canonical HRF analysis were accurate, we also extracted and plotted timelines of estimated BOLD signal based on our FIR analysis for the bilateral PPA (Figure 4). The enhancement and suppression effects reported in the preceding paragraph are apparent in these timelines as well, suggesting that these effects are not simply artifacts of the way the hemodynamic response was modeled. The timelines clearly suggest, as expected, that modulation in the Attend conditions occurred earlier in time than in the Refresh conditions. In both the left and right PPA, the Attend Scene and Attend Face conditions diverge sharply from each other (and from the Act condition) around 6 sec after trial onset, but the Refresh Face, Refresh Scene, and Act conditions (which appear identical until the last stimulus in the trial is presented) track closely together until about 8 sec after trial onset, at which point the modulation induced by the refresh cue becomes apparent.
Bilateral RSC (Figures 2E–F and 3) showed a qualitatively similar pattern of activity to the PPA, but due to relatively greater variability in this region, not all comparisons were significant. Nevertheless, right RSC showed significant enhancement relative to Act for both refreshing and attending scenes (both p < .05, one-tailed paired t tests) and refreshing faces produced suppression in bilateral RSC (left: p < .05; right: p = .056, both one-tailed paired t tests) relative to the Act condition. Importantly, although not all enhancements/suppressions were significant when compared to the baseline Act condition, refreshing and attending still clearly modulated activity in bilateral RSC; the Refresh Scene > Refresh Face and Attend Scene > Attend Face comparisons were significant on both sides (all p < .05, one-tailed paired t tests).
In the scene-selective clusters of the bilateral MOG (Figures 2A–B and 3), there was significant enhancement of activity relative to Act for both refreshing and attending scenes as well as suppression for attending faces (all p < .05, one-tailed paired t tests). However, there was no evidence of any suppression of activity for refreshing faces (both p > .29, one-tailed paired t tests). Differences between the amount of enhancement for refreshing versus attending scenes were not significant (both p > .12, two-tailed paired t tests), but in both the left and right MOG there was numerically greater enhancement for attending than refreshing scenes.
In the right PCu/IPS (Figures 2G and 3), both refreshing and attending scenes significantly enhanced activity relative to the Act baseline, and the Refresh Scene > Refresh Face and Attend Scene > Attend Face comparisons were significant as well (all p < .05, one-tailed paired t tests). However, neither the Refresh Face nor the Attend Face condition differed from the Act baseline (both p > .14, two-tailed paired t tests), and in fact, activity from refreshing and attending faces was numerically greater than in the Act condition. Thus, there was no evidence of suppression relative to the Act condition for either refreshing or attending faces.
In the present study, we set out to determine if and to what extent reflective attention is capable of enhancing and suppressing activity in CSE cortical areas in a manner similar to perceptual attention. We found that both perceptual and reflective attention were capable of enhancing and suppressing activity relative to a baseline condition to varying degrees throughout the neural scene-processing pipeline. In the PPA, the most commonly studied scene-selective cortical area, both reflective and perceptual attention were capable of enhancing activity when the target stimulus was a scene and suppressing activity when the target stimulus was a face. Strikingly, the degree of enhancement did not differ between refreshing and overtly attending the scene stimulus, suggesting that reflective attention processes may be able to boost scene-processing activity immediately following perception just as much as attending during perception. This is in line with previous results by Yi et al. (2008), who observed an equal amount of repetition attenuation in the PPA produced by perceiving and then refreshing a scene picture versus perceiving it twice. It is also striking that refreshing a nonpreferred stimulus (e.g., a face) can actually suppress PPA activity relative to a baseline condition. This suggests that under ordinary conditions, scene processing in the PPA may continue for some time after the scene stimulus is removed, and that turning one's internal, or reflective, attention to a face representation is sufficient to interrupt this processing. This introduces the question as to what mechanisms are responsible for postperceptual scene processing in the PPA—for example, whether local circuit activity within the PPA is sufficient to carry on this processing, or if a top–down signal from frontal or parietal cortex (e.g., spontaneous, or automatic, refreshing) is needed to sustain PPA activity after perception. 
A related question is whether the “suppression” relative to a baseline condition is accomplished via active top–down inhibition of PPA activity, or whether it simply reflects the diversion to the face representation of executive processing (e.g., refreshing) that otherwise could have been directed to the scene representation.
In RSC, another often-studied scene-selective region (Epstein & Higgins, 2007; Epstein, Parker, & Feiler, 2007; Park, Intraub, Yi, Widders, & Chun, 2007), the pattern of enhancement and suppression was generally consistent with that seen in the PPA, but modulation effects were not observed as reliably as in the PPA. However, the pattern was quite different in two other (relatively) scene-selective areas, the MOG and the PCu/IPS. In the MOG, there were clear enhancement effects due to reflective attention (as well as the expected enhancement/suppression due to overt attention), but there was no evidence of suppression due to refreshing face stimuli. (Consistent with our findings of refresh-based suppression in the PPA but not in the MOG, the [Refresh Face − Act] difference was significantly different between the PPA and the MOG bilaterally; p < .05, two-tailed paired t tests.) This suggests a dissociation either in the type of postperceptual scene processing performed by the MOG and the PPA, or in the ability of top–down processes to modulate activity at different points along the scene-processing pipeline. For example, it may be that the MOG does not continue to process a scene after it is removed from one's visual field, and thus, there is no postperceptual processing occurring in the MOG to be cut off by refreshing a face. Alternatively, it may be that excitatory top–down signals have a pathway to both the MOG and the PPA, but inhibitory top–down signals are capable of suppressing activity in the PPA only. Future studies will be required to determine which of these possibilities is responsible for the effects observed here.
Finally, in our last scene-selective area, the right PCu/IPS, enhancement of activity was observed for both refreshing and overtly attending scenes, but there was no evidence of suppression during the processing of faces. Although this study did not include a functional visual short-term memory (VSTM) localizer task, the pattern of activity in this area and its coordinates suggest that it is anatomically and functionally the same region that has been previously isolated in studies of covert visual attention (e.g., Kincade, Abrams, Astafiev, Shulman, & Corbetta, 2005; Rushworth, Paus, & Sipila, 2001; Culham et al., 1998) and VSTM (e.g., Xu, 2007; Xu & Chun, 2006; Todd & Marois, 2004, 2005). In particular, this region appears to be close to Xu and Chun's (2006) (also see Xu, 2007) “superior IPS” region. That study found that the inferior IPS appears to track the number of objects active in VSTM regardless of complexity, but the superior IPS represents fewer objects when object complexity is high. Thus, Xu and Chun's results suggest that the superior IPS tracks the overall amount of information active in VSTM, regardless of how that information is parceled into discrete objects. The apparent scene-selectivity of this region in our studies suggests that our scene stimuli may have been more “complex” in some way than our face stimuli. If so, our results suggest that this region tracks complexity both during overt attention to scenes and when reflectively attending to an active representation of a scene. Perceptually or reflectively attending to faces, on the other hand, evoked only slightly and nonsignificantly more activity in this area than our baseline condition. 
However, it is unclear whether this means that participants were especially drawn to the face stimuli and tracked their features even during the baseline condition; that faces simply contain far less of the type of information tracked by the superior IPS than scenes; or that complex features in face stimuli are tracked elsewhere.
In short, we have demonstrated that basic acts of reflective as well as perceptual attention can both enhance and suppress extrastriate activity for preferred and nonpreferred stimuli, respectively. However, scene-selective areas did not all show the same pattern of modulation—for example, refreshing produced both enhancement and suppression in the PPA, but only enhancement in the MOG—raising further questions regarding how and to what extent different areas along the visual processing pathway participate in reflective thought processes (Johnson et al., 2007).
We thank the staff of the Yale Magnetic Resonance Research Center for their help with fMRI data acquisition.
Funding: National Institutes of Health (AG09253 to M. K. J.); National Science Foundation (Graduate Research Fellowship to M. R. J.).
Reprint requests should be sent to Matthew R. Johnson, Department of Psychology, Yale University, P.O. Box 208205, New Haven, CT 06520-8205, or via e-mail: email@example.com.
Note 1. Faces were all Caucasian and equally divided between pictures of younger and older adults; these manipulations were related to additional components of the study that are outside the scope of the present manuscript.
Note 2. In the present report, we focus exclusively on scene-selective areas; although some face-selective areas were identified, effects were stronger and more reliable in the scene-selective regions (consistent with our previous findings; Johnson et al., 2007). Participants in this study also performed separate face–scene localizer tasks to confirm the anatomical locations of category-selective ROIs, but the locations were similar to those obtained using the main task, and thus, data from the localizer tasks are not discussed further.