Modulations of sensory processing in early visual areas are thought to play an important role in conscious perception. To date, most empirical studies have focused on effects occurring before or during visual presentation. By contrast, several emerging theories postulate that sensory processing and conscious visual perception may also crucially depend on late top–down influences, potentially arising after a visual display. To provide a direct test of this, we performed an fMRI study using a postcued report procedure. The ability to report a target at a specific spatial location in a visual display can be enhanced behaviorally by symbolic auditory postcues presented shortly after that display. Here we show that such auditory postcues can enhance target-specific signals in early human visual cortex (V1 and V2). For postcues presented 200 msec after stimulus termination, this target-specific enhancement in visual cortex was specifically associated with correct conscious report. The strength of this modulation predicted individual levels of behavioral performance. By contrast, although later postcues presented 1000 msec after stimulus termination had some impact on activity in early visual cortex, this modulation no longer related to conscious report. These results demonstrate that within a critical time window of a few hundred milliseconds after a visual stimulus has disappeared, successful conscious report of that stimulus still relates to the strength of top–down modulation in early visual cortex. We suggest that, within this critical time window, the sensory representation of a visual stimulus is still under construction and so can still be flexibly influenced by top–down modulatory processes.
Activity in early sensory areas is thought to play an important role in conscious perception. For the same external stimulation, trial-to-trial variations in the strength of such activity can relate to fluctuations in conscious report (Ress & Heeger, 2003; Shulman, Ollinger, Linenweber, Petersen, & Corbetta, 2001). Activity levels before or during stimulus presentation are subject both to spontaneous fluctuations (Hesselmann, Kell, Eger, & Kleinschmidt, 2008; Boly et al., 2007) and to top–down influences such as from preparatory endogenous attention (Schwartz et al., 2005; Martinez et al., 1999). These combined factors can influence “feedforward” phases of stimulus processing and thus impact on how well a stimulus will be perceived (Macknik & Martinez-Conde, 2007; Lamme & Roelfsema, 2000). However, several emerging theories now postulate that conscious visual perception also crucially depends on top–down influences from higher cortical areas to early sensory areas potentially arising after a visual display has terminated (Gilbert & Sigman, 2007; Dehaene, Changeux, Naccache, Sackur, & Sergent, 2006; Lamme, 2004; Rees, Kreiman, & Koch, 2002).
Some initial support for this perspective has come from studies showing that conscious visual perception is associated with increased functional coupling between lower- and higher-level visual areas (Sterzer, Haynes, & Rees, 2006; Haynes, Driver, & Rees, 2005). Furthermore, artificially manipulating such functional coupling with TMS can modulate conscious perception (Ruff et al., 2006; Pascual-Leone & Walsh, 2001). Another line of support has come from studies examining the timing of the neural activity associated with conscious perception. In humans, relatively late correlates of attentional modulation or conscious perception can be observed in visual cortex beyond 100 msec poststimulus (Lamy, Salti, & Bar-Haim, 2009; Boehler, Schoenfeld, Heinze, & Hopf, 2008; Wyart & Tallon-Baudry, 2008; Del Cul, Baillet, & Dehaene, 2007; Sergent, Baillet, & Dehaene, 2005; Noesselt et al., 2002). For example, Sergent et al. (2005) used EEG in an attentional blink protocol, during which the perception of a visual stimulus (e.g., a word) can be altered by paying attention to another visual stimulus presented 200 to 300 msec earlier (Raymond, Shapiro, & Arnell, 1992). They contrasted brain potentials evoked by the same visual stimulus according to whether it was seen or missed during the attentional blink. They observed that the initial stages of sensory processing were identical for seen and missed stimuli. It was only beyond 200 msec poststimulus that brain activity started to show differences correlating with participants' conscious report. These observations suggest that the perception of a stimulus might not be entirely determined by the strength of the initial phase of stimulus processing. Potential top–down influences at a later stage, such as those related to attentional systems (Macknik & Martinez-Conde, 2007), might also play a decisive role in perception of a visual stimulus.
However, to date, there has been little direct positive evidence specifically for late top–down modulations affecting early human retinotopic visual cortex or for a specific relationship between such late top–down modulation and successful conscious perceptual report. The main difficulty in testing this top–down perspective has stemmed from the fact that bottom–up, recurrent, and top–down contributions to information processing cannot always be readily disentangled. Here we offer a new approach inspired by the classic partial postcued report procedure (Sperling, 1960), now reconsidered from a top–down perspective.
Human capacity for reporting items from a brief multi-item visual display is restricted. For example, in a 12-letter display comprising three rows of four letters, capacity is typically limited to reporting three to five letters. Yet when observers are postcued (e.g., by a particular sound) several hundred milliseconds after display offset to report just one row, any row can often be reported in full (Sperling, 1960). Thus, although “whole report” of all items in the visual display appears poor, all items remain potentially available for “partial” postcued report several hundred milliseconds later. A classical interpretation of these results is that, immediately after a visual display, visual information is stored in a high-capacity, short-lived sensory memory (Coltheart, 1983; Sperling, 1960). This high-capacity “iconic memory” has been proposed to decay relatively rapidly, so that with a more delayed postcue after stimulus offset, participants' performance reflects only the more restricted capacity of visual STM.
Here we took advantage of the postcueing methodology to investigate the possible role of top–down modulations occurring after stimulus offset on conscious perception. In line with recent frameworks (Ruff, Kristjansson, & Driver, 2007; Sergent & Rees, 2007), we hypothesized that the increased ability to report items at the postcued location for postcues presented a few hundred milliseconds after stimulus offset (as compared with around a second later) might reflect specific top–down enhancement of ongoing sensory processing at the postcued location. To test this, we investigated the influence of a symbolic auditory postcue on retinotopically defined areas within human V1 and V2, using fMRI in healthy participants.
To be able to study fMRI signals in early human visual cortex for different visual quadrants, we used simplified displays comprising four oriented grating stimuli (see below and Figure 1A), one in each quadrant. Although such displays differed in type and had a smaller set size than the 12-letter displays classically used in the original “iconic memory” behavioral studies (e.g., Sperling, 1960), they were better suited for our purposes. Moreover, if top–down processing shortly after stimulus offset can enhance conscious perception, we could still predict better performance for postcues delayed by about 200 msec after display offset than for postcues delayed by about a second. An initial purely behavioral experiment confirmed this prediction with our simplified four-item displays (see below). We then carried out an fMRI study using these displays, which were particularly well suited for retinotopic analysis within V1 and V2.
Stimuli and Experimental Setup
Participants fixated centrally while viewing brief visual displays each comprising four circular patches of square-wave gratings (four cycles, one cycle per degree of visual angle, Michelson contrast = 1), one in each of the four quadrants (Figure 1A). Grating patches subtended 4° on a gray background at 7.5° eccentricity (from fixation to patch center). Their orientation (1/5 × π, 2/5 × π, 3/5 × π, or 4/5 × π) varied independently in each quadrant, with the constraint that each display contained no more than two gratings of the same orientation. The displays were presented for 200 msec, and the contrast of the gratings was reversed every 50 msec to minimize afterimages on the retina. These visual displays were followed, after a variable delay, by an auditory postcue played over headphones and lasting 200 msec. The auditory postcue varied on two dimensions: (1) the ear to which it was presented, left or right, and (2) its pitch, high or low (sine waves of 1000 or 500 Hz, respectively). The resulting four possible sounds instructed the participant to report the orientation of a particular grating from the four previously presented gratings (e.g., a high-pitched tone to the left ear instructed report of the top left grating).
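The display and cue structure described above can be sketched in Python. This is a minimal illustration, not the Cogent script actually used: it assumes rejection sampling as one way of enforcing the "no more than two gratings of the same orientation" constraint, and the names `ORIENTATIONS`, `QUADRANTS`, `sample_display` and `CUE_TO_QUADRANT` are hypothetical. Only the "high-pitched tone to the left ear instructs the top left grating" assignment is stated in the text; the other three cue assignments below are assumed for illustration.

```python
import math
import random
from collections import Counter

# The four candidate orientations (radians): 1/5, 2/5, 3/5 and 4/5 of pi.
ORIENTATIONS = [k * math.pi / 5 for k in (1, 2, 3, 4)]
QUADRANTS = ("top_left", "top_right", "bottom_left", "bottom_right")


def sample_display(rng: random.Random) -> dict:
    """Draw one orientation per quadrant independently, rejecting any
    display in which the same orientation appears more than twice."""
    while True:
        draw = [rng.choice(ORIENTATIONS) for _ in QUADRANTS]
        if max(Counter(draw).values()) <= 2:
            return dict(zip(QUADRANTS, draw))


# Hypothetical ear x pitch -> quadrant mapping (only the first entry is
# given in the text; the remaining three are assumptions).
CUE_TO_QUADRANT = {
    ("left", "high"): "top_left",
    ("right", "high"): "top_right",
    ("left", "low"): "bottom_left",
    ("right", "low"): "bottom_right",
}
PITCH_HZ = {"high": 1000, "low": 500}  # sine-wave tone frequencies


def cued_orientation(display: dict, ear: str, pitch: str) -> float:
    """Orientation the participant should report for a given postcue."""
    return display[CUE_TO_QUADRANT[(ear, pitch)]]
```

Rejection sampling keeps the draws uniform over all displays that satisfy the constraint; an alternative such as shuffling a fixed multiset of orientations would change the distribution of displays.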
At the beginning of each trial, an upcoming display was forewarned by a brief change in the color of the fixation cross (which turned from white to black for 100 msec, then back to white). Two hundred milliseconds after the fixation cross turned white again, four gratings appeared for 200 msec. After the auditory postcue (presented at various delays, see below), participants had to report the orientation of the preceding but now postcued grating within 3 sec (four-alternative forced choice for visual orientation) by pressing one of four designated buttons.
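The within-trial event sequence can be written out as onset times. The sketch below (Python; `TrialTimeline` and `trial_timeline` are hypothetical names) measures everything in msec from the fixation-cross color change and takes the 3-sec response deadline from postcue onset, as stated for the behavioral experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialTimeline:
    """Event times (msec) relative to the fixation-cross color change."""
    warning_onset: int
    display_onset: int
    display_offset: int
    postcue_onset: int
    postcue_offset: int
    response_deadline: int


def trial_timeline(postcue_delay_ms: int) -> TrialTimeline:
    warning_ms = 100           # fixation cross black for 100 msec
    gap_ms = 200               # cross white again for 200 msec
    display_ms = 200           # four gratings on screen
    cue_ms = 200               # auditory postcue duration
    response_window_ms = 3000  # report within 3 sec of postcue onset
    display_on = warning_ms + gap_ms
    display_off = display_on + display_ms
    cue_on = display_off + postcue_delay_ms  # delay counted from display offset
    return TrialTimeline(
        warning_onset=0,
        display_onset=display_on,
        display_offset=display_off,
        postcue_onset=cue_on,
        postcue_offset=cue_on + cue_ms,
        response_deadline=cue_on + response_window_ms,
    )
```

For example, `trial_timeline(200)` places the postcue at 700 msec and the response deadline at 3700 msec after the warning signal.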
In an initial purely behavioral experiment (and also during prescan training outside the scanner), visual stimuli were viewed on a 19-in. CRT monitor (Mitsubishi Diamond Pro 920 [Irvine, CA], 1024 × 768 pixel resolution, refresh rate of 60 Hz) at a viewing distance of 50 cm. In the scanner, visual stimuli were presented using an LCD projector with a 60-Hz refresh rate that projected onto a screen at the head end of the scanner. Participants viewed this screen through an angled mirror mounted on the head coil. The auditory postcues were played over MR-compatible headphones. For both experiments, all stimuli were created and displayed by means of the MATLAB toolbox Cogent (Wellcome Trust Centre for Neuroimaging at University College London, UK), which was also used to control the timing of presentation and to collect responses.
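Because both display devices refreshed at 60 Hz, the stated durations should correspond to whole video frames. The following Python sketch (helper names hypothetical) checks that arithmetic: the 200-msec display lasts 12 frames, and each 50-msec contrast phase lasts 3 frames, giving four alternating contrast phases per display.

```python
from fractions import Fraction

REFRESH_HZ = 60


def frames_for(duration_ms: int, refresh_hz: int = REFRESH_HZ) -> int:
    """Number of video frames spanned by a duration; fails if not integral."""
    frames = Fraction(duration_ms * refresh_hz, 1000)
    if frames.denominator != 1:
        raise ValueError(f"{duration_ms} msec is not a whole number of frames at {refresh_hz} Hz")
    return int(frames)


display_frames = frames_for(200)   # whole display
phase_frames = frames_for(50)      # one contrast polarity
n_phases = display_frames // phase_frames
# Contrast polarity (0 or 1) on each frame of the display.
polarity = [(f // phase_frames) % 2 for f in range(display_frames)]
```

A duration such as 25 msec raises `ValueError`, since it would fall between frames at this refresh rate.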
Purely Behavioral Experiment
In an initial purely behavioral experiment, we systematically varied the delay between visual stimulus offset and onset of the auditory postcue to test whether the classic partial postcued report pattern could be observed with these much simpler visual displays, chosen to enable subsequent study of early retinotopic visual areas with fMRI. Eight healthy volunteers aged 22 to 37 years (mean = 28, SD = 5; one woman, seven men) with normal vision and hearing gave written informed consent to take part in this experiment, which was approved by the local ethics committee.
Participants were first trained to interpret the auditory cue by performing the visual task on trials where the cue was presented at the onset of the visual display, until their performance was better than 80% correct. Then, after a brief training run with the actual postcue delays used in the experiment (see below), participants performed four blocks of 60 experimental trials. We tested five delays between visual stimulus offset and auditory postcue onset: 100, 200, 400, 800, and 1000 msec (randomly intermixed). After each auditory postcue, participants reported the orientation of the postcued grating (1/5 × π, 2/5 × π, 3/5 × π, or 4/5 × π) within 3 sec of postcue onset by pressing one of four designated keys on the computer keyboard. The auditory postcue delay (five levels), the spatial location of the postcued grating (four levels, corresponding to the four different quadrants), and the particular orientation of the postcued grating (four levels) were independently and randomly determined for each trial throughout the experiment by the computer program used to generate the experiment. Performance was analyzed as a function of postcue delay (Figure 1B), pooling over the particular postcued visual location and grating orientation.
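The independent per-trial randomization and the pooled accuracy-by-delay analysis can be sketched as follows. This Python sketch is illustrative only: `make_block` and `accuracy_by_delay` are hypothetical names, and because each factor is drawn independently on every trial (as described), the number of trials per delay is not fixed in advance.

```python
import random

DELAYS_MS = (100, 200, 400, 800, 1000)
QUADRANTS = ("top_left", "top_right", "bottom_left", "bottom_right")
ORIENTATION_STEPS = (1, 2, 3, 4)  # orientation = step * pi / 5


def make_block(n_trials: int, rng: random.Random) -> list:
    """Draw delay, cued quadrant and cued orientation independently per trial."""
    return [
        {
            "delay_ms": rng.choice(DELAYS_MS),
            "cued_quadrant": rng.choice(QUADRANTS),
            "cued_orientation_step": rng.choice(ORIENTATION_STEPS),
        }
        for _ in range(n_trials)
    ]


def accuracy_by_delay(trials, correct) -> dict:
    """Proportion correct per postcue delay, pooled over location and orientation."""
    totals, hits = {}, {}
    for trial, ok in zip(trials, correct):
        d = trial["delay_ms"]
        totals[d] = totals.get(d, 0) + 1
        hits[d] = hits.get(d, 0) + int(ok)
    return {d: hits[d] / totals[d] for d in sorted(totals)}
```

Usage: four blocks of 60 trials would be `sum((make_block(60, rng) for _ in range(4)), [])`, and `correct` is the list of per-trial hit/miss outcomes.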
Ten healthy volunteers aged 22 to 37 years (mean = 27, SD = 5; two women, eight men) with normal vision and hearing gave written informed consent to take part in the fMRI experiment, which was approved by the local ethics committee. Eight of them had participated in the purely behavioral study. During a brief training period outside the scanner, we first confirmed or reconfirmed that participants could perform the visual task and used the auditory cue correctly on more than 80% of trials in a control condition where the auditory cue was presented simultaneously with the visual display. Then, participants trained on the actual postcue delays used in the scanning experiment (200 and 1000 msec, see below) for one or two short runs.
BOLD signals were measured using a 3-T Siemens Allegra scanner (Siemens, Erlangen). To ensure that participants maintained fixation during scanning, we monitored eye position at 60 Hz via a long-range infrared eye tracker (ASL LR504, Applied Science Laboratories). In the experimental session, we collected five runs of 244 fMRI volumes per participant using a gradient-recalled EPI sequence (3 × 3 × 3 mm voxels; 32 slices, repetition time [TR] = 2.08 sec; 106 trials per run). The stimuli and the structure of the trials were identical to those used in the purely behavioral experiment, except that, on the basis of the behavioral time course results already obtained, only two critical delays for the auditory postcue were tested in the scanner: 200 and 1000 msec (randomly intermixed). Trials started with a blank period of fixation that varied in duration so that the total trial duration remained fixed at 4.68 sec. The auditory postcue delay (two levels), the spatial location of the postcued grating (four levels), and the orientation of the postcued grating (four levels) were randomly and independently determined for each trial within a run. Only the postcue delay and the location of the postcued grating were actually relevant for the analysis. Ten blank “null” trials were also inserted randomly within each run. Behavioral responses were recorded via an MR-compatible keypad. At the end of each run, a T1-weighted MP-RAGE image volume (1 × 1 × 1 mm voxels) was acquired to allow coregistration of functional and anatomical data.
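The run-level timing parameters can be checked with exact rational arithmetic. In this Python sketch, `blank_ms` is a hypothetical helper that assumes the blank period is whatever remains of the 4.68-sec trial after the warning signal, display, postcue delay and a full 3-sec response window; the text does not specify this breakdown. The phase calculation assumes trials follow one another without gaps from the start of the run.

```python
from fractions import Fraction

TR = Fraction("2.08")        # sec per volume
TRIAL = Fraction("4.68")     # sec, fixed total trial duration
N_VOLUMES = 244
N_TRIALS = 106

run_duration = N_VOLUMES * TR      # total acquisition time per run
trial_time = N_TRIALS * TRIAL      # time occupied by trials
trs_per_trial = TRIAL / TR         # trial length in units of TR
# Phase of each successive trial onset relative to volume acquisition,
# as a fraction of one TR.
onset_phase = [(k * trs_per_trial) % 1 for k in range(8)]


def blank_ms(postcue_delay_ms: int) -> int:
    """Hypothetical variable blank duration keeping the trial at 4680 msec."""
    fixed_ms = 100 + 200 + 200 + 3000  # warning, gap, display, response window
    return 4680 - (fixed_ms + postcue_delay_ms)
```

Because a trial lasts 2.25 TRs, successive trial onsets fall at four different phases relative to volume acquisition, which samples the hemodynamic response more finely than one TR.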
In a separate retinotopic meridian mapping session, participants fixated and viewed standard meridian mapping stimuli (Tootell et al., 1995), comprising flickering checkerboard wedges presented along the horizontal and vertical meridians (Supplementary Figure 1A). We collected two to three runs of 205 volumes per participant using a similar MR protocol as for the experimental runs, but now with 24 slices covering the occipital cortex and TR = 1.56 sec. Periods of horizontal meridian stimulation, rest, and vertical meridian stimulation alternated in blocks of 10 volumes each.
In a separate within-quadrant “localizer” session, participants fixated centrally while viewing partial versions of the displays used in the experimental session, now comprising two gratings for one or other diagonal pair of gratings (Supplementary Figure 1B). This allowed us to isolate, within each hemisphere, the regions of V1 and V2 responding specifically to each grating patch. We collected two runs of 167 volumes per participant, using the same fMRI sequence as for the experimental runs. The two diagonal presentations alternated in blocks of three volumes each. The orientations of each of the two gratings in each display varied independently from block to block.
We analyzed all fMRI data using SPM5 (http://www.fil.ion.ucl.ac.uk/spm/software/spm5/). Data from one participant had to be excluded from further analysis because on-line eye tracking during the experimental session revealed systematic saccades toward the postcued locations. The first five volumes of each run were discarded to allow for T1 equilibration (magnetic saturation) effects. The remaining images were realigned and coregistered to each participant's structural scan. The images of the localizer runs were spatially smoothed with a Gaussian kernel of 6 mm FWHM. Data were then analyzed using a voxelwise general linear model (GLM) containing regressors representing each of the experimental conditions plus motion parameters as effects of no interest.
fMRI Analysis: Retinotopic Mapping and ROI Definition
For each participant, we used mrGray (Teo, Sapiro, & Wandell, 1997) for segmentation and cortical flattening of the anatomical scans and then defined borders for early visual areas V1, V2v, and V2d using the meridian activations obtained in the meridian mapping session (Supplementary Figure 1A). Within these areas, we then identified ROIs responding to each of the four gratings in the experimental stimuli by means of the localizer data. These ROIs were defined as clusters of 30 voxels responding best to the presence (vs. absence) of a grating at one of the four visual field locations of the stimuli (Supplementary Figure 1B). The definition of these ROIs (four ROIs in each visual area) was thus entirely independent of the data from the main experiment, while confirming that each ROI responded to a grating in a particular visual quadrant.
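Selecting the 30 best-responding voxels within a visual area can be sketched with NumPy. This is a simplification under stated assumptions: `top_k_roi` is a hypothetical helper that takes a localizer statistic map (e.g., a t map for grating presence vs. absence at one location) and a boolean area mask, and simply ranks voxels; it does not enforce spatial contiguity, which a "cluster" definition might additionally require.

```python
import numpy as np


def top_k_roi(t_map: np.ndarray, area_mask: np.ndarray, k: int = 30) -> np.ndarray:
    """Flat indices of the k voxels inside area_mask with the highest
    localizer statistic for one grating location."""
    candidates = np.flatnonzero(area_mask)
    if candidates.size < k:
        raise ValueError("area contains fewer than k voxels")
    order = np.argsort(t_map.ravel()[candidates])[::-1]  # descending statistic
    return candidates[order[:k]]
```

Repeating this for each of the four grating locations within V1 (and within V2) would yield four ROIs per area, each defined only from localizer data.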
Model-based fMRI Analysis (GLM)
Functional data from the experimental runs were modeled with a GLM in which the onsets of the visual stimuli in each of the eight experimental conditions, produced by factorially crossing postcue delay (two levels: 200 or 1000 msec) with location of the postcued grating (four levels), were represented as delta functions convolved with the canonical hemodynamic response function implemented in SPM5. In a second GLM analysis, the experimental conditions were further divided according to whether the cued grating was successfully reported or not (hit or miss trial). Blank “null” trials were also modeled (to be used as general baseline, see below). Motion correction parameters were modeled as effects of no interest.
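The core of this model (delta functions at stimulus onsets, convolved with a canonical HRF, then fitted by least squares) can be sketched in NumPy. This is an approximation, not SPM5 itself: the double-gamma HRF below mimics the shape of SPM's canonical function (peak gamma with shape 6, undershoot gamma with shape 16 scaled by 1/6), regressors are sampled at the start of each scan with 16 microtime bins per TR, and ordinary least squares is used without SPM's high-pass filtering or prewhitening. Function names are hypothetical; motion parameters could be appended as extra columns.

```python
import math

import numpy as np


def canonical_hrf(dt: float, length: float = 32.0) -> np.ndarray:
    """Double-gamma HRF approximating SPM's canonical shape, unit sum."""
    t = np.maximum(np.arange(0.0, length, dt), 1e-12)  # avoid log(0)

    def gamma_pdf(shape: float) -> np.ndarray:
        return np.exp((shape - 1.0) * np.log(t) - t - math.lgamma(shape))

    h = gamma_pdf(6.0) - gamma_pdf(16.0) / 6.0
    return h / h.sum()


def design_matrix(onsets_by_condition, n_scans: int, tr: float, micro: int = 16) -> np.ndarray:
    """One HRF-convolved delta-function regressor per condition plus a constant."""
    dt = tr / micro
    n_fine = n_scans * micro
    hrf = canonical_hrf(dt)
    columns = []
    for onsets in onsets_by_condition:
        sticks = np.zeros(n_fine)
        idx = np.round(np.asarray(onsets, dtype=float) / dt).astype(int)
        sticks[idx[idx < n_fine]] = 1.0              # delta at each onset
        convolved = np.convolve(sticks, hrf)[:n_fine]
        columns.append(convolved[::micro])           # sample once per scan
    columns.append(np.ones(n_scans))                 # constant term
    return np.column_stack(columns)


def fit_glm(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Ordinary least-squares regression coefficients (betas)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

With eight condition onset lists plus a list for null trials, `design_matrix` returns ten columns (including the constant), and `fit_glm` gives one beta per condition for each voxel time series `y`.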
On an individual participant basis, we extracted and averaged the regression coefficients (betas) for the voxels within the independently defined ROIs in V1 and V2 for each experimental condition. The mean regression coefficient for blank “null” trials was used as a baseline and subtracted from the mean regression coefficients in the experimental conditions. For each ROI, the four “cued location” conditions were collapsed according to whether the ROI corresponded to the postcued location (“cued”) or not (“uncued”) on each particular trial. In a second analysis, the “cued” condition was separated into “hit” and “miss” trials. Activations were then collapsed across the four ROIs within V1 and, separately, across the four ROIs within V2. Note that during scanning, the behavioral advantage for 200 versus 1000 msec postcues (see below) did not differ between visual quadrants, F(3, 24) = 1.24, p > .2, and was reliable for each visual quadrant when considered separately (all ps < .005).
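The baseline subtraction and cued/uncued collapsing can be expressed compactly. A minimal NumPy sketch, assuming the betas for one area and one postcue delay are arranged in a 4 × 4 array with ROIs ordered the same way as the postcued quadrants; `cued_uncued` is a hypothetical name.

```python
import numpy as np


def cued_uncued(betas: np.ndarray, null_beta: np.ndarray):
    """betas[r, c]: mean beta in ROI r on trials where quadrant c was postcued.
    null_beta[r]: mean beta for blank null trials in ROI r.
    Returns (cued, uncued) averaged across ROIs."""
    rel = betas - null_beta[:, None]        # subtract null-trial baseline per ROI
    cued_mask = np.eye(rel.shape[0], dtype=bool)  # ROI r is cued when c == r
    cued = rel[cued_mask].mean()
    uncued = rel[~cued_mask].mean()         # three uncued conditions per ROI
    return float(cued), float(uncued)
```

Since every ROI contributes exactly three uncued entries, averaging all off-diagonal entries equals averaging the per-ROI uncued means.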
fMRI Analysis: Time Courses of the Hemodynamic Responses
On an individual participant basis, time series of functional raw data within the previously defined ROIs (see earlier) in V1 and V2 were extracted using the Mars Bar toolbox (http://marsbar.sourceforge.net/) in SPM. Time series data were averaged across all the voxels within each independently defined ROI and high-pass filtered using the same default filter as in the SPM5 analysis to remove slow temporal drifts in the signal. Data from each session were then divided by the session mean (yielding a measure in percent signal change) and resampled to a temporal resolution of TR/16 using spline interpolation. These time series data were then segmented into time windows of 12 sec after each visual onset (or the equivalent of the visual onset time in blank trials). The evoked hemodynamic responses were calculated by averaging these segmented data across the different trials belonging to the same experimental condition, within each session and then across sessions for each participant. The hemodynamic response obtained in blank “null” trials was used as a baseline and subtracted from the hemodynamic responses obtained in the experimental conditions.
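The scaling, resampling, and epoch-averaging steps can be sketched in NumPy. This sketch assumes the ROI time series has already been high-pass filtered, and it substitutes linear interpolation (`np.interp`) for the spline interpolation described above; function names are hypothetical.

```python
import numpy as np


def percent_signal_change(ts: np.ndarray) -> np.ndarray:
    """Scale a session's ROI time series by its mean, in percent."""
    return 100.0 * ts / ts.mean() - 100.0


def epoch_average(ts: np.ndarray, tr: float, onsets, window: float = 12.0,
                  micro: int = 16) -> np.ndarray:
    """Resample to tr/micro, cut a window after each onset (sec), and average."""
    dt = tr / micro
    scan_times = np.arange(ts.size) * tr
    fine_times = np.arange(0.0, scan_times[-1] + dt / 2, dt)
    fine = np.interp(fine_times, scan_times, ts)   # linear, not spline
    n_win = int(round(window / dt))
    starts = (int(round(onset / dt)) for onset in onsets)
    epochs = [fine[s:s + n_win] for s in starts if s + n_win <= fine.size]
    return np.mean(epochs, axis=0)
```

A condition's evoked response relative to baseline would then be `epoch_average(psc, tr, condition_onsets) - epoch_average(psc, tr, null_onsets)`, computed per session before averaging across sessions.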
In the purely behavioral experiment, participants were much more accurate in reporting the orientation of the postcued grating for short postcue latencies (100–400 msec) than for longer postcue latencies (800–1000 msec), as revealed by a highly significant main effect of postcue delay, F(4, 28) = 18.69, p < .0001 (see mean performance against postcue delay in Figure 1B). Performance reached a plateau from 800 msec onward (no significant difference between 800 and 1000 msec), t(7) = −0.4, p = .73, paired t test. This time course of the postcueing effect on performance is very similar to the time courses observed in classical postcued report experiments (Lu, Neuse, Madigan, & Dosher, 2005; Sperling, 1960). As classically observed, the plateau reached at 1000 msec was significantly above chance level (25% in the present experiment), t(7) = 9.0, p < .001, and corresponded to an inferred late postcue “capacity” of two to three items (63% accuracy for a four-item display with a 1000-msec postcue). This capacity is in the lower range of the plateau capacities observed in classical partial postcued report experiments, which typically used alphanumeric stimuli instead. This slight difference may be due to the nature of the visual stimuli used here: oriented gratings, as appropriate for activating early visual cortex in the fMRI part of our study. The plateau capacity observed here matches the capacities estimated in visual memory tasks for simple features such as orientation (Vogel, McCollough, & Machizawa, 2005) or color (Bays, Catalao, & Husain, 2009; Vogel & Machizawa, 2004). However, the more important point for present purposes was that the less delayed postcues (up to around 200 msec after display offset) led to enhanced performance, consistent with possible beneficial top–down modulation of visual processing triggered approximately 200 msec after display offset.
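The "two to three items" capacity follows from Sperling's partial-report logic, in which the number of items available is estimated as proportion correct multiplied by set size (0.63 × 4 ≈ 2.5). The Python sketch below shows that estimate alongside a guessing-corrected variant for a four-alternative forced choice; the corrected formula is a common alternative added here for comparison and is not the computation stated in the text. Both function names are hypothetical.

```python
def sperling_capacity(p_correct: float, set_size: int) -> float:
    """Items available: proportion correct on the cued item times set size."""
    return p_correct * set_size


def guess_corrected_capacity(p_correct: float, set_size: int, n_alternatives: int) -> float:
    """Solve p = k/N + (1 - k/N) * g for k, where g = 1 / n_alternatives."""
    g = 1.0 / n_alternatives
    return set_size * (p_correct - g) / (1.0 - g)
```

With 63% accuracy on a four-item display, the uncorrected estimate is about 2.5 items and the guessing-corrected estimate about 2.0 items, both within the stated range of two to three items.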
Overall, these behavioral results confirm that postcueing benefits for postcues within approximately 200 msec of display offset can indeed be revealed with a simplified four-item display when participants are required to report low-level attributes of the items such as grating orientation (Figure 1A), even with the relatively small set size of four gratings used here. This is consistent with our proposal that such postcueing benefits may relate to top–down enhancement of sensory processing in early visual cortex, even when the memory load does not exceed classically determined visual working memory capacity. In the following fMRI experiment, we directly tested this proposal, taking advantage of these simplified visual displays, which were particularly suitable for retinotopic analysis.
The fMRI experiment was conducted on nine participants (see Methods), using the same visual displays as in the purely behavioral experiment now testing just two different postcue latencies, 200 and 1000 msec, randomly intermixed across trials. These postcue timings were selected because the purely behavioral experiment had shown a clear benefit of postcueing at 200-msec delay poststimulus offset, relative to the plateau reached at 1000 msec poststimulus offset, t(7) = 3.9, p < .005 (Figure 1B). Performance inside the scanner confirmed a highly significant benefit for the shorter vs. the longer postcue delay, t(8) = 9.1, p < .001. Also consistent with the results of the purely behavioral experiment, performance was, however, still above chance at the longer postcue delay, t(8) = 14.9, p < .001.
fMRI Retinotopic Analyses of Postcueing Effects
The fMRI data allowed us to test our hypotheses that the behaviorally beneficial (200-msec delay) auditory postcue would induce retinotopically specific top–down enhancement of activity in early visual cortex and that this enhancement would relate to the accuracy of perceptual report. The 1000-msec postcue allowed us to test also for any impact on visual cortex that was not specific to delay (thus common to the 200- and 1000-msec conditions), as might arise from interrogating visual memory for a specific visual quadrant, regardless of success in perceptual report.
In a standard model-based SPM analysis (see Methods), we first evaluated any impact of the auditory postcue by comparing BOLD signals in the independently defined ROIs on trials where that particular quadrant was postcued versus a different quadrant being postcued instead (respectively “cued” and “uncued” conditions, see Figure 2A). This revealed enhancement specifically for those ROIs in V1 and V2 representing the currently postcued location (main effect of cued > uncued), F(1, 8) = 6.31, p < .05 in V1 and F(1, 8) = 8.69, p < .05 in V2. This particular spatially specific modulation arose regardless of postcue delay, with no interaction between cueing and delay, F(1, 8) < 2, p > .1 both in V1 and V2. This indicates that visual cortex was affected by visual memory being interrogated for a specific location, in accord with other data on possible “baseline shifts” in the absence of current visual stimulation (Martinez et al., 1999). But critically, the next aspect of our findings uncovered a more revealing top–down influence on visual cortex that arose only for the 200-msec delay, now in relation to the success of conscious visual report.
A second analysis separated the BOLD response at the currently cued location in V1 and V2 for correct (“hit”) or incorrect (“miss”) cued reports, again compared with when quadrants were uncued instead as a baseline (Figure 2B). This allowed us to evaluate how the general top–down enhancement observed at the cued location in Figure 2A was distributed across hit and miss trials. According to our hypothesis, top–down modulation in visual cortex should interact with successful report only for postcues presented within a critical time window for which behavioral benefits of postcueing are observed (less than a second, as for the 200-msec postcues). In other words, we expected the top–down modulation to be stronger for hits than for misses when the postcue was presented at the 200-msec but not the 1000-msec delay.
At the 200-msec delay, we found a significant increase of activity at the cued location for hit trials relative to baseline uncued activity both in V1, t(8) = 2.92, p < .05, and V2, t(8) = 5.13, p < .005. But importantly, activity at the cued location in miss trials did not show any such significant difference from baseline uncued activity: V1, t(8) = −1.65, ns; V2, t(8) = −1.94, ns. The postcue effect at the 200-msec delay was significantly larger for hits than misses: V1, t(8) = 3.12, p < .05; V2, t(8) = 4.95, p < .005. Hence, for short delay postcues, the postcueing advantage (cf. Figure 2A) arose solely in hit trials, whereas miss trials corresponded to trials where the same postcue actually failed to trigger this location-specific enhancement (Figure 2B).
In contrast, no differences between hit and miss trials were found with the later 1000-msec postcue, neither for V1 nor for V2, all t(8) < 1, all ps > .5, although behavioral performance was still above chance (Figure 1B). This suggests that, at this longer delay, the strength of BOLD signal enhancement at the cued location did not interact with report accuracy. The difference between the BOLD signal for hits and misses was significantly stronger for the 200- versus 1000-msec postcue delay: V1, t(8) = 2.28, p < .05; V2, t(8) = 2.49, p < .05 (one-tailed t tests). This further suggests that the differential BOLD response for hit versus miss trials at short delay was not solely because of fluctuations in initial processing of the visual display when presented nor because of baseline fluctuations before display onset, as any such differences should have arisen equally on the 200- and 1000-msec postcued trials.
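The delay × accuracy comparison reported above amounts to a paired t test on per-participant differences of differences: (hit − miss) at 200 msec versus (hit − miss) at 1000 msec. A stdlib-only Python sketch with hypothetical function names; it returns the t statistic and degrees of freedom rather than a p value, and the constant `T_CRIT_ONE_TAILED_DF8` is the standard one-tailed .05 critical value for 8 df (nine participants).

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


def interaction_t(hit_short, miss_short, hit_long, miss_long):
    """Paired test of (hit - miss) at the short vs. the long postcue delay."""
    d_short = [h - m for h, m in zip(hit_short, miss_short)]
    d_long = [h - m for h, m in zip(hit_long, miss_long)]
    return paired_t(d_short, d_long)


T_CRIT_ONE_TAILED_DF8 = 1.860  # alpha = .05, one-tailed, df = 8
```

A directional prediction (larger hit − miss difference at the short delay) is what justifies comparing the statistic with the one-tailed critical value.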
In sum, postcues presented at 200 and 1000 msec induced a similar enhancement at the cued retinotopic location in early visual cortex (Figure 2A), consistent with interrogation of visual representations at the postcued location in both cases. However, detailed examination of this cueing effect comparing hit and miss trials revealed clear differences in the way these modulations interacted with perceptual report at the different delays (Figure 2B). For long delay (1000 msec) postcues, the cueing enhancement was of similar strength on hit and miss trials (Figure 2B). In contrast, when the postcue occurred 200 msec after display offset, only hit trials contributed to the general cueing effect observed in Figure 2A, whereas miss trials seemed to reflect instances where the postcue actually failed to enhance activity at the cued location (Figure 2B). Thus, although both types of postcues had some top–down influence on visual cortex (Figure 2A), these modulations related to perceptual success only when the postcue arrived within the behaviorally beneficial time window after display offset (Figure 2B).
Time Courses of the Hemodynamic Responses
Our interpretation was confirmed and further refined by detailed analysis of the time course of the hemodynamic responses in the retinotopic regions of cortex responding to our stimuli, as defined by the independent localizer scans (see Methods). Note that these time course analyses were independent of the GLM-based analyses presented earlier. These analyses allowed us to distinguish effects occurring before or after the auditory postcue started to modulate activity in V1–V2.
We first determined the time point at which the spatial influence of the auditory postcue on visual cortex started to be evident in the hemodynamic responses, irrespective of report accuracy (Figure 3): This corresponded to the first time point where a significant deflection was observed in the hemodynamic curves for cued versus uncued conditions (see black arrows in Figure 3; see also Sligte, Scholte, & Lamme, 2009). On average, across the two cue-delay conditions, the auditory cue influence started to be evident in the hemodynamic curves around 4.29 sec after the onset of the visual display. We thus were able to distinguish an initial time window during which the hemodynamic response to the visual display did not yet reflect the spatial selectivity introduced by the auditory postcue (time window T1 = 0–4.16 sec after visual onset) versus a later time window, starting around 4.29 sec, in which a significant spatially selective influence of the postcue was observed (time window T2 = 4.29–8.45 sec). We set the durations of these two time windows to be equal (both 4.16 sec) to allow further comparison of average activity levels within those two windows on the independent dimension of report accuracy.
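At the TR/16 resampling step (2.08/16 = 0.13 sec), both window edges fall exactly on the sample grid: T1 spans samples 0 to 32 and T2 spans samples 33 to 65, each covering 4.16 sec. The Python sketch below computes these indices with exact fractions, together with two hypothetical helpers: one returning the first time point whose p value falls below a threshold (the correction for multiple time points, if any, is not specified here) and one averaging a response within a window.

```python
from fractions import Fraction

TR = Fraction("2.08")     # sec
DT = TR / 16              # resampling step of the time course analysis


def window_samples(start_s: str, end_s: str) -> range:
    """Inclusive sample range for a window whose edges lie on the DT grid."""
    start, end = Fraction(start_s) / DT, Fraction(end_s) / DT
    if start.denominator != 1 or end.denominator != 1:
        raise ValueError("window edges must be multiples of DT")
    return range(int(start), int(end) + 1)


T1 = window_samples("0", "4.16")      # before the postcue's spatial effect
T2 = window_samples("4.29", "8.45")   # after it


def first_significant(p_values, alpha=0.05):
    """Index of the first time point with p < alpha, or None if none."""
    return next((i for i, p in enumerate(p_values) if p < alpha), None)


def window_mean(response, window: range) -> float:
    """Average response across the samples of one time window."""
    return sum(response[i] for i in window) / len(window)
```

The T2 − T1 change in the hit minus miss difference for one participant would then be `window_mean(diff, T2) - window_mean(diff, T1)`, with `diff` the resampled hit minus miss time course.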
We next compared the hemodynamic responses at the cued location for correct (“hit”) or incorrect (“miss”) cued reports (Figure 4A and B), focusing on the time courses of the hit minus miss difference (Figure 4A and B, lower panels). Both in V1 and V2, when the postcue was presented at the short 200-msec delay, a positive difference between hits and misses was apparent throughout the hemodynamic response. In contrast, when the postcue was presented at the longer delay (1000 msec), a positive difference was only observed in the initial phase of the hemodynamic response. These strikingly different time courses of the hit minus miss difference were further assessed by averaging activity levels before and after the postcue started to influence the hemodynamic response (time windows T1 and T2; Figure 4C and D). These two time windows had been identified in the cued versus uncued contrast (see previous section). Importantly, that contrast was independent of the critical hit versus miss contrast reported now. For short delay postcues, this analysis revealed a significant difference in activity levels for hit versus miss trials only during the second time window T2, when the cue modulation was effective in retinotopic cortex: V1–T1, t(8) = 1.55, p = .16; V1–T2, t(8) = 2.67, p < .05; V2–T1, t(8) = 2.18, p = .06; V2–T2, t(8) = 3.63, p < .05. Conversely, for long delay postcues, a significant difference in activity levels for hit versus miss trials was observed only in the initial time window T1, before the cue modulated hemodynamic activity: V1–T1, t(8) = 3.05, p < .05; V1–T2, t(8) = −1.04, p = .33; V2–T1, t(8) = 2.89, p < .05; V2–T2, t(8) = 0.60, p = .56. The increase in the hit minus miss difference from time window T1 to time window T2 was significantly stronger for short delay postcues than long delay postcues: V1, t(8) = 2.10, p < .05; V2, t(8) = 2.28, p < .05 (one-tailed t tests).
Crucially, this increase in the hit minus miss difference induced by short delay postcues was observed only at the cued retinotopic location; on the same trials, no such increase was observed at the other, “uncued” retinotopic locations (Figure 4E and F). This further indicates that the effect was specifically linked to processing of the cued target.
The results from these time course analyses suggest that when the postcue occurred at the long delay of 1000 msec, at which a specific behavioral benefit could no longer be observed (cf. Figure 1B), the success of perceptual report (i.e., hits vs. misses) relied solely on differences in the initial phase of the hemodynamic response, before the auditory cue started to modulate activity in V1–V2 (Figure 4). This yielded only modest differences in our estimates of brain activation in the model-based analysis of the data (see Figure 2B). However, when presented at the shorter delay of 200 msec, the postcue had a major impact on the capacity to report the stimulus, by maintaining and amplifying the advantage in hemodynamic response amplitude evoked by the cued visual stimulus on hit trials. This impact was necessarily top–down because the auditory postcue was presented 200 msec after the offset of the bottom–up visual stimulus. One plausible underlying neural mechanism could be that short delay postcues allow a stimulus-specific neural trace that is still ongoing in visual cortex shortly after visual offset (i.e., within ∼200 msec of that offset) to be enhanced via top–down influences and thereby maintained longer. As emphasized by several recent models of perceptual decision making (Sigman & Dehaene, 2005; Gold & Shadlen, 2001; Ratcliff & Rouder, 2000; Schall, 2000), such a process could have a critical impact on perceptual decisions by allowing evidence for the correct perceptual decision to accumulate over a longer period.
To further test the relation between fMRI activity and performance from this perspective, we used the cumulative positive Hit > Miss difference in brain activity over the hemodynamic time course as a physiological index of the accumulation of evidence for a correct (“hit”) over an incorrect (“miss”) perceptual decision (Figure 5). For the average hemodynamic response, this index reached a much higher plateau for short delay postcues than for long delay postcues at the cued location (a difference denoted Δ in Figure 5A, left panel). No such difference was observed at the other, uncued locations (Figure 5A, right panel). We next examined how individual variations in the strength of this physiological Δ index related to individual variations in the strength of the behavioral effect of providing a postcue at 200 versus 1000 msec (percent correct for 200-msec minus 1000-msec postcues). Consistent with our hypothesis, we found a significant positive relation between these behavioral and physiological indices of short versus long delay postcue benefits in V1 (Figure 5B). This significant physiological–behavioral correlation was present only at the cued location (cued location: R² = .48, p = .04; uncued location: R² = .15, p = .30). This provides further evidence that the behavioral enhancement of perceptual success by postcues presented within 200 msec of visual offset relates specifically to the top–down influence of such postcues on V1 processing.
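The cumulative Hit > Miss index and its relation to behavior can be sketched as below. We read “cumulative positive Hit > Miss difference” as a running sum of the hit minus miss difference with negative values clipped to zero, and take the plateau as the final value of that running sum; this reading, the sampling step `dt`, and the helper names are assumptions for illustration, not the authors' stated formula. The correlation is an ordinary Pearson r across participants, squared to give R².

```python
import math
from itertools import accumulate


def cumulative_positive_difference(hit, miss, dt=1.0):
    """Running sum of max(hit - miss, 0) over time (assumed reading of the index)."""
    return list(accumulate(max(h - m, 0.0) * dt for h, m in zip(hit, miss)))


def delta_index(short_hit, short_miss, long_hit, long_miss, dt=1.0):
    """Plateau for short-delay minus long-delay postcues.

    The running sum never decreases, so its plateau is its final value.
    """
    short = cumulative_positive_difference(short_hit, short_miss, dt)[-1]
    long_ = cumulative_positive_difference(long_hit, long_miss, dt)[-1]
    return short - long_


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r_squared(r2, n):
    """t statistic (df = n - 2) for a correlation with squared coefficient r2."""
    return math.sqrt(r2 / (1 - r2)) * math.sqrt(n - 2)
```

Per participant, one would compute `delta_index(...)` and the behavioral benefit (percent correct for 200-msec minus 1000-msec postcues), then square `pearson_r` of the two lists. As a worked check, assuming nine participants (as implied by the t(8) statistics), R² = .48 corresponds to t(7) ≈ 2.54, above the two-tailed critical value of about 2.365 at α = .05, in line with the reported p = .04.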
Top–Down Modulation of Visual Activity Induced by a Symbolic Auditory Postcue
In the present study, we found that auditory postcues presented after the offset of a visual display can enhance BOLD activity in visual cortex (V1 and V2), specifically at the cued retinotopic location (Figures 2A and 3). Because the postcues were presented in the auditory modality and at least 200 msec after visual offset (either 200 or 1000 msec later), this enhancement must reflect a mechanism of top–down modulation of activity in early visual areas by higher order areas. Previous fMRI studies (Ruff et al., 2007; Nobre et al., 2004) comparing activations evoked by cueing a spatial location either before or after a visual display have shown that pre- and postcueing induce largely overlapping activations in a fronto-parietal network, including the superior parietal lobule and the human FEF. These results suggest that postcueing triggers control processes very similar to those underlying voluntary orienting of attention (Corbetta & Shulman, 2002). This was found both for postcues presented shortly after visual offset (200 msec; Ruff et al., 2007) and for much later postcues (beyond 2 sec; Nobre et al., 2004). Thus, the postcued top–down modulation we observed here in V1–V2 for both short (200-msec) and long (1000-msec) delay postcues (Figure 2A) probably involves networks similar to those mediating selective top–down amplification by the attentional system.
There is now increasing evidence that cueing a specific visual stimulus after its presentation can still induce selective top–down modulations in sensory areas tuned to the postcued stimulus. Previous imaging work has reported such “retrospective” modulations in relatively high-level visual areas such as lateral occipital cortex (Ruff et al., 2007) or fusiform and parahippocampal gyri (Lepsien & Nobre, 2007). Here we show that this phenomenon extends even to retinotopically specific modulations at the very earliest level of visual cortex. Such top–down modulations at different levels of visual cortex probably reflect the orienting of selective visual attention toward visual traces. However, the enhancement of sensory activity by attention does not necessarily entail better behavioral performance at the attended location, as demonstrated in a recent fMRI study (Houtkamp & Braun, 2010). Critically, the present study further investigated how these attentional top–down modulations were linked to the behavioral ability to successfully report the cued stimulus.
The Effect of Postcued Top–Down Modulation on Perceptual Report Varies with Postcue Delay
Although both short and long delay postcues induced some top–down modulation in early visual areas, the impact of this modulation on perceptual report depended crucially on the postcue delay (compare Figure 2A with Figure 2B). Even postcues presented 1 sec after stimulus offset could induce some retinotopically specific top–down modulation in V1–V2 (Figure 2A). However, this modulation was observed irrespective of whether the postcued stimulus was correctly identified (see hit/miss in Figure 2B), suggesting that, at this longer delay, the top–down modulation may simply have produced a readout of whatever information was still present in early visual areas for the cued retinotopic location. Whether that readout led to hit or miss reports apparently depended on the strength of the initial sensory processing in visual cortex, before the postcue took effect, as shown by the detailed analysis of the hemodynamic responses for hit and miss trials (see Figure 4). In contrast, for shorter delay postcues (200 msec), we observed a stronger effect of top–down modulation for hit than for miss trials (Figure 2B), suggesting that the strength of the postcueing effect at this earlier delay had a decisive impact on the quality of the sensory representation at the cued location and thus on subsequent perceptual report. Hit trials now corresponded to trials in which not only was the initial sensory processing relatively stronger but, critically, this advantage was also maintained and amplified by the top–down modulation induced by the postcue (Figures 4 and 5A). In V1, the more extended accumulation process induced by short delay (200-msec) postcues accounted for 48% of the interparticipant variance in the behavioral benefit of short delay postcues (Figure 5B).
Our results thus suggest the existence of two different phases in the buildup of sensory representations in early visual cortex, at least for brief visual displays such as those used here. In a first phase, within a few hundred milliseconds after visual offset, sensory information relevant to a perceptual decision may still be undergoing processing and therefore remain susceptible to enhancement by top–down modulation, leading to qualitative differences in performance (hit/miss). On the basis of previous fMRI studies showing the involvement of attention-related control areas in postcueing (Ruff et al., 2007; Nobre et al., 2004), we propose that this could reflect a modulation of the sensory representation of the cued stimulus resembling the attentional effects typically observed with precues (Thiele, Pooresmaeili, Delicato, Herrero, & Roelfsema, 2009; Carrasco, 2006; Reynolds, Pasternak, & Desimone, 2000). In neural terms, this could correspond to an attentional enhancement of the “stimulus afterdischarge” typically observed at the offset of a visual stimulus (Macknik & Livingstone, 1998). Indeed, electrophysiological studies in monkeys have shown that turning a stimulus off triggers an excitatory response in V1 called the “afterdischarge,” which can last several hundred milliseconds and has been shown to correlate with stimulus visibility (Macknik & Livingstone, 1998). Future work could test this proposal by manipulating target duration, thus dissociating the timing of recurrent processing associated with stimulus onset from that of afterdischarges associated with stimulus offset, and/or by using neural measures with higher temporal resolution than fMRI in the paradigm introduced here.
In a later phase of sensory processing, beyond 1 sec after stimulus offset, early visual cortex can still hold some representation of task-relevant stimuli, as demonstrated by two recent fMRI studies (Harrison & Tong, 2009; Serences, Ester, Vogel, & Awh, 2009). However, our results suggest that this representation can no longer be flexibly improved in a top–down manner. At this later stage, the top–down modulations induced by postcueing apparently could only perform a location-specific readout of the memory trace still present within sensory areas, irrespective of whether this information was sufficient to lead to a correct perceptual report.
Role of Top–Down Sensory Modulation in Possible Relation to Classic “Iconic Memory” Paradigms
To optimize our retinotopic study of early visual cortex, we chose to test the effects of postcueing on a low-level visual feature judgment (orientation) in very simple four-item visual displays. Classical, purely behavioral experiments on iconic memory typically tested partial report of alphanumeric characters in displays containing six items or more, up to set sizes of 12 or 16 (de Gardelle, Sackur, & Kouider, 2009; Sperling, 1960). Despite these many procedural differences, the time course of postcueing benefits on performance in our initial, purely behavioral experiment (Figure 1B) appears to closely resemble the typical time course of behaviorally inferred iconic memory (Lu et al., 2005). We suggest that the neural effects observed in the present postcueing fMRI experiment may relate to some of the mechanisms involved in classic iconic memory phenomena. The possible contribution of longer-lasting visual “working memory” should not be overlooked here either, although our most critical effects (for the hit/miss difference) had disappeared by the time of the 1000-msec postcue, despite the relatively small set size (four items) used here.
In a classical interpretation of iconic memory phenomena, the decline in performance with increasing postcue delay is taken to reflect the decay function of a large-capacity but short-lived “iconic memory” (Coltheart, 1983). On such traditional accounts, postcues provided within a critical interval allow the sensory trace still present in the iconic buffer for the postcued location to be selectively transferred to a more durable, capacity-limited form of storage that supports subsequent report (Coltheart, 1983). Beyond 1 sec, the iconic memory buffer should have decayed entirely, so memory performance would reflect only the more limited capacity of the durable form of storage, sometimes termed visual working memory.
Whatever terminology is used for the putative psychological processes involved, the present results clearly show that a postcued partial report procedure can induce very specific top–down modulations even in early visual cortex (Figure 2A). Furthermore, they indicate that the influence of postcueing within the so-called iconic time window (i.e., for the 200-msec delay postcues here) might go beyond a simple transfer of the relevant visual information into visual working memory. Indeed, if postcueing simply operated as a selective readout of passively decaying iconic memory traces, correct perceptual report at different postcue latencies should be directly related to the level of activity at the specific visual location just before the postcue takes effect (during the T1 time window; see Figure 4). However, the critical advantage of the 200- over the 1000-msec postcues was not evident during the initial phase of the hemodynamic response (T1) but developed in a second phase (T2), when top–down modulation took place in early visual cortex (Figure 4).
Our new results thus suggest that the advantage of early postcues over late postcues in partial report experiments might reflect the combined impact of at least two phenomena: (a) an early postcue allows readout of sensory traces that are still very strong in visual cortex because of the recent physical stimulation (as originally postulated by the classical interpretation of iconic memory and related phenomena) and (b) an early postcue can allow a further top–down enhancement of the sensory representation for the cued location, in a manner that may share close similarities with the sensory enhancement observed when attention is precued at a particular location before the stimulus. In other words, our results suggest that iconic memory traces can be modulated by attention.
From On-line Visual Processing to Visual Working Memory
Along with other recent neuroimaging work, the present results open new perspectives on the relationship between iconic memory, visual attention, and visual working memory and more broadly speak to the transition between perception and memory.
Recent studies on visual working memory have begun to emphasize the role of low-level visual areas in the maintenance of specific sensory traces over surprisingly long delays of several seconds (Harrison & Tong, 2009; Serences et al., 2009; Super, Spekreijse, & Lamme, 2001a). Two recent fMRI studies in humans have shown that, despite a sharp decrease in BOLD signals in V1 after the offset of a visual presentation, activity patterns can still reflect the specific visual features—for example, color or line orientation—that participants are required to retain for a subsequent comparison judgment (Harrison & Tong, 2009; Serences et al., 2009). This selective maintenance of finely tuned sensory traces could be decoded from activity patterns in V1 throughout retention periods as long as 10 sec (Harrison & Tong, 2009; Serences et al., 2009).
Taken together, these data and the present results suggest that—at least for low-level visual features—on-line visual processing, iconic memory, and visual working memory might not correspond to different neuroanatomical stages of information processing but could rather correspond to different phases in information processing within sensory areas themselves. We propose that, after a brief visual presentation, the first few hundred milliseconds of processing in visual cortex still correspond to a “flexible” buildup of perceptual representations. In accord with this proposition, electrophysiological studies in monkeys have shown that the stimulus afterdischarge, typically observed at the offset of a visual stimulus, is reduced when the stimulus is rendered invisible by backward masking (Macknik & Livingstone, 1998). The present results indicate that, conversely, enhancing neural processing after stimulus offset via postcueing can enhance its visibility, provided the postcue is not too delayed. Beyond 1 sec poststimulus, however, the sensory traces seem to be less flexible in the sense that they are no longer susceptible to improvement by top–down enhancement, just as they are also no longer susceptible to impairment by backward masking, unlike at shorter delays (Macknik & Livingstone, 1998). During this second phase, specific traces can still be selected to be maintained in visual working memory as shown by recent neuroimaging work (Harrison & Tong, 2009), but with no improvement of the sensory encoding per se. We suggest that this transition from a highly flexible to a less flexible sensory encoding within early visual cortex could correspond to the transition between visual perception and visual working memory.
Combined Roles of Bottom–Up and Top–Down Processes in Visual Perception
Finally, the present results help refine our understanding of the combined roles of bottom–up and top–down effects on visual perception. They provide new evidence consistent with emerging accounts postulating that conscious visual perception may crucially involve top–down influences occurring shortly after the first “feed-forward sweep” of sensory processing (Dehaene et al., 2006; Lamme, 2004; Hochstein & Ahissar, 2002; Rees et al., 2002). Previous empirical support for that view has come mainly from studies showing that differences in the processing of seen versus unseen stimuli appear to relate to relatively late components of neural activity (Del Cul et al., 2007; Sergent et al., 2005; Super, Spekreijse, & Lamme, 2001b). In early visual areas, these “late” components of neural activity were often found between 100 and 200 msec after stimulus onset (Boehler et al., 2008; Super et al., 2001b). In the present study, combining retinotopic mapping in fMRI with a postcueing paradigm allowed us to provide direct evidence that even top–down modulations initiated 200 msec or more after stimulus offset can play a crucial role in low-level sensory processing and subsequent perceptual outcome. Further studies using techniques with high temporal resolution, such as EEG or MEG, will be needed to establish precisely when this late top–down modulation takes place. However, the fact that this modulation was initiated by symbolic auditory postcues presented 200 msec after stimulus offset already provides a strong argument that it must rely on top–down influences from higher level areas.
In conclusion, the present results contribute to our understanding of how activity levels in sensory cortex may modulate perception. They indicate that the perceptual fate of a visual stimulus can be influenced by top–down modulations of anatomically early stages of cortical visual processing arising within a time window of several hundred milliseconds after the physical stimulus has disappeared.
The authors thank Mathias Pessiglione and Romain Valabregue for their support and advice in data analysis. They also thank Catherine Tallon-Baudry, Valentin Wyart, Lionel Naccache, Laurent Cohen, Stanislas Dehaene, and anonymous reviewers for useful comments. This work was supported by a Marie Curie Intra-European Fellowship (C. S.), an EU NEST/Pathfinder grant (C. S., G. R.), and the Wellcome Trust (G. R., J. D.). J. D. is a Royal Society Anniversary Research Professor.
Reprint requests should be sent to Claire Sergent, Centre de Recherche de l'Institut du Cerveau et de la Moelle (CRICM), UPMC-UMRS 975 INSERM-UMR 7225 CNRS, 47 boulevard de l'Hôpital, 75651 Paris Cedex 13, France.