We used fMRI to study figure–ground representation and its decay in primary visual cortex (V1). Human observers viewed a motion-defined figure that gradually became camouflaged by a cluttered background after it stopped moving. V1 showed positive fMRI responses corresponding to the moving figure and negative fMRI responses corresponding to the static background. This positive–negative delineation of V1 “figure” and “background” fMRI responses defined a retinotopically organized figure–ground representation that persisted after the figure stopped moving but eventually decayed. The temporal dynamics of V1 “figure” and “background” fMRI responses differed substantially. Positive “figure” responses continued to increase for several seconds after the figure stopped moving and remained elevated after the figure had disappeared. We propose that the sustained positive V1 “figure” fMRI responses reflected both persistent figure–ground representation and sustained attention to the location of the figure after its disappearance, as did subjects' reports of persistence. The decreasing “background” fMRI responses were relatively shorter-lived and less biased by spatial attention. Our results show that the transition from a vivid figure–ground percept to its disappearance corresponds to the concurrent decay of figure enhancement and background suppression in V1, both of which play a role in form-based perceptual memory.
Figure–ground segmentation is fundamental to object perception—we perceive objects as distinct from their surroundings. The process of figure–ground segmentation engages a network of visual cortical areas (Likova & Tyler, 2008; Appelbaum, Wade, Vildavski, Pettet, & Norcia, 2006; Lamme, Super, & Spekreijse, 1998). In primary visual cortex (V1), the differentiation of figure and background has been observed in individual neurons, which exhibit increased responses to figures relative to backgrounds composed of similar visual input (Zipser, Lamme, & Schiller, 1996; Lamme, 1995). Both figure enhancement (increased neural responses) and background suppression (decreased neural responses) in V1 are facilitated by extrastriate feedback, especially for figures of low visibility (Hupe et al., 1998), and involve both preattentive visual processing and attention-related mechanisms (Roelfsema, Tolboom, & Khayat, 2007; Roelfsema, Lamme, & Spekreijse, 1998).
fMRI studies of figure–ground segmentation in human observers also suggest that figure–ground mechanisms exist as early as V1 (Strother et al., 2011; Skiera, Petersen, Skalej, & Fahle, 2000). These studies reported increased fMRI responses in V1 corresponding to perceived figure–ground segmentation. Interestingly, neither of these studies observed decreases in V1 fMRI responses, which one would expect if background suppression is crucial to perceived figure–ground segmentation. Several other fMRI studies reported increased figure–ground fMRI activity in extrastriate visual areas (Wong, Aldcroft, Large, Culham, & Vilis, 2009; Ferber, Humphrey, & Vilis, 2003, 2005; Large, Aldcroft, & Vilis, 2005; Schira, Fahle, Donner, Kraft, & Brandt, 2004), but these studies also failed to observe fMRI responses corresponding to background suppression in V1. This is surprising because perceived figure–ground boundaries automatically engage attentional mechanisms (Egeth & Yantis, 1997), which are known to produce concomitant enhancement and suppression of neural responses in V1, even in the absence of visual stimulation (Silver, Ress, & Heeger, 2007; Kastner, Pinsk, De Weerd, Desimone, & Ungerleider, 1999).
We measured figure–ground fMRI responses in V1 during the perception of motion-defined forms (annuli) composed of line fragments that were superimposed on a background of similar line fragments. Our experiments consisted of two stimulus phases: a motion phase during which the shapes moved and were highly visible and a subsequent static persistence phase (Figure 1) during which shapes briefly remained visible but eventually became camouflaged, despite sustained attention to the figure (shapes that persist or vanish abruptly can be viewed at www.tutis.ca/demos/circle.swf). In addition to V1, we measured fMRI responses in lateral occipital cortex (LO), which mediates shape perception (Kourtzi & Kanwisher, 2000) and represents figures rather than backgrounds (Appelbaum et al., 2006; Goh et al., 2004). Previous studies observed figure–ground fMRI responses in LO using stimuli similar to ours (Strother et al., 2011; Wong et al., 2009; Ferber et al., 2003, 2005; Large et al., 2005), but these studies were not designed to delineate V1 fMRI responses to a figure and its surrounding background.
The main goal of this study was to delineate retinotopically specific V1 fMRI responses to figure and background. This allowed us to determine whether concomitant figure enhancement and background suppression in V1 would correspond to perceived figure–ground segmentation and figure-related responses in LO. We observed sustained positive fMRI responses in portions of V1 corresponding to the retinal location of the figure that were similar to those observed in LO. These positive figure-related fMRI responses were accompanied by negative responses in portions of V1 corresponding to the retinal location of the background. The temporal dynamics of these positive and negative fMRI responses in V1 differed substantially, and although the earliest evidence of persistence in V1 corresponded to background suppression, both figure enhancement and background suppression occurred during the perceptual persistence of figure–ground segmentation. Our findings support the existence of a V1 ↔ LO circuit that participates in figure–ground segmentation and mediates the short-term perceptual memory of global form, the latter of which occurs at an intersection of mnemonic and sensory function (Emrich, Ruppel, & Ferber, 2008; Ferber & Emrich, 2007; Ferber et al., 2005). This proposed V1 ↔ LO circuit is consistent with an interactive account of perceived figure–ground segmentation and neural activity in V1 (Tong, 2003) and may provide visual structure for selective attention, as has been proposed for other image parsing mechanisms (Roelfsema & Houtkamp, 2011; Qiu, Sugihara, & von der Heydt, 2007). We show that the V1 portion of the circuit can be decomposed into a concomitant representation of a figure and its background.
We scanned 12 healthy right-handed volunteers (three women and nine men, age = 21–42 years). All participants gave written consent, and all experiments were approved by the University of Western Ontario Ethics Review Board.
fMRI Data Acquisition
We performed our experiments using a 3-T Siemens Magnetom Tim Trio imaging system. BOLD data were collected using T2*-weighted interleaved, single-segment EPI, and a 32-channel head coil (Siemens, Erlangen, Germany). Functional data were aligned to high-resolution anatomical images obtained using a 3-D T1 magnetization prepared rapid gradient echo sequence (echo time [TE] = 2.98 msec, repetition time [TR] = 2300 msec, inversion time = 900 msec, flip angle [FA] = 15°, 192 contiguous slices of 1.0-mm thickness, field of view [FOV] = 192 × 256 mm²). Scanning parameters for obtaining functional data are reported separately in the following descriptions of our experimental procedures. In all experiments, images were back-projected onto a screen and viewed through a mirror. The display extended approximately 50° horizontally and 24° vertically.
The Figure–Ground Experiments
We performed the fMRI studies of figure–ground persistence using an event-related design. Subjects viewed annuli composed of discontinuous line fragments that were superimposed on a background of similar line fragments and thus camouflaged (Figure 1; www.tutis.ca/demos/circle.swf). Following a stationary fixation period of 0.4 sec at the beginning of each trial, the annulus figure rotated 15° clockwise and 15° counterclockwise in alternation for 3 sec. After the figure stopped moving, the line fragments comprising the figure either remained (persist) or disappeared (vanish); see http://www.tutis.ca/demos/circle.swf. Subjects indicated with a button press the time at which the figure disappeared (henceforth “response time” or RT). In both cases, the background remained, and the figure re-appeared at the beginning of the next epoch (15.6 sec after figure motion stopped in the previous epoch). Participants maintained fixation on a centered stationary dot throughout each scan.
We employed three different figure sizes: Annuli were either 1.5°, 6°, or 12° in diameter (in separate scans) superimposed on a 15° × 15° background of disconnected line fragments. Persist and vanish conditions were randomly permuted and counterbalanced within each scan. Subjects performed up to four functional scans per figure size, 20 epochs per scan, each epoch lasting 19 sec. All 12 subjects participated in the 6° (mid-sized) scans; eight of these subjects also participated in the 12° (large) scans, and seven participated in the 1.5° (small) scans. Scanning parameters for obtaining functional data were as follows: FOV = 240 mm × 240 mm, in-plane pixel size = 2 × 2 mm, TE = 30 msec, TR = 1 sec (single shot), volume acquisition time = 2 sec, FA = 90°, 18 slices (slice thickness = 2 mm).
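The trial timing described above can be laid out as a simple schedule. The following Python sketch is only an illustration: the durations come from the text, whereas the constant and function names are assumptions introduced here and are not part of the original stimulus code.

```python
# Durations (sec) as reported in the text; all names are illustrative.
FIXATION_S = 0.4      # stationary fixation period at trial start
MOTION_S = 3.0        # annulus rotates back and forth
POST_MOTION_S = 15.6  # interval from motion offset to the next epoch
EPOCHS_PER_SCAN = 20


def epoch_duration():
    """Length of one epoch in seconds (fixation + motion + post-motion)."""
    return round(FIXATION_S + MOTION_S + POST_MOTION_S, 1)


def epoch_onsets(n_epochs=EPOCHS_PER_SCAN):
    """Onset time of each epoch relative to the start of the scan."""
    return [round(i * epoch_duration(), 1) for i in range(n_epochs)]


def motion_offsets(n_epochs=EPOCHS_PER_SCAN):
    """Time at which figure motion stops in each epoch."""
    return [round(t + FIXATION_S + MOTION_S, 1) for t in epoch_onsets(n_epochs)]


def scan_duration(n_epochs=EPOCHS_PER_SCAN):
    """Total duration of the epochs in one scan, in seconds."""
    return round(n_epochs * epoch_duration(), 1)
```

Under these values, the three phases sum to the reported 19-sec epoch, and 20 epochs occupy 380 sec per scan.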
V1 and LO Localizer Scans
We performed an eccentricity localizer to identify voxels in V1 that responded most strongly to a flickering annulus with dimensions that matched those of the annuli used in our figure–ground experiments described previously. The flickering annulus was presented for 16 sec and alternated with 16-sec blank fixation periods; this cycle repeated 20 times during an individual scan. We performed one to three runs for each subject.
We also used a standard retinotopic mapping procedure to further confirm the cortical boundaries of V1. Subjects viewed a phase-reversing (temporal frequency = 2 Hz), 100% contrast-defined checkerboard wedge (with a spatial frequency of ∼0.85 cycle/°). The wedge stimulus subtended 45° and extended 15° of visual angle into the periphery. This wedge started at the 12-o'clock position (90° upright, upper visual field, apex at center screen) and rotated anticlockwise to the 6-o'clock position. The duration of each phase-reversing wedge was 2 sec, after which the wedge location revolved anticlockwise around the center of the screen by 15° (resulting in 33% overlap between each wedge and its successor). At the end of each half-cycle (26 sec), the wedge returned to the 12-o'clock position. Individual runs consisted of eight half-cycles, each lasting 24 sec. We again performed one to three retinotopic mapping scans for each subject.
Finally, we performed a conventional LO localizer experiment (one to three scans) in which we presented subjects with intact 2-D grayscale photographs of faces, places, and common everyday objects, which alternated with scrambled versions of the same images. Functional scans consisted of 19 epochs, each of which was 15 sec long. Fifteen images were presented in each epoch at 1-sec intervals while subjects maintained fixation on a centered stationary dot. The parameters for obtaining functional data in the localizer experiments were as follows: FOV = 240 mm × 240 mm, in-plane pixel size = 2 × 2 mm (but 3 × 3 mm in the retinotopic mapping experiment), TE = 30 msec, TR = 2 sec (single shot), volume acquisition time = 2 sec, FA = 90°, 18 slices (slice thickness = 2 mm).
Image Analysis and ROIs
Image analysis was carried out using BrainVoyager QX software. 3-D statistical maps were calculated for each subject based on a general linear model. Retinotopic visual areas were identified using retinotopic maps obtained for each subject, and eccentricity-specific V1 ROIs were identified as the ∼100 voxels showing the greatest fMRI response to each of the three figure sizes in the eccentricity localizer experiment, for both right and left hemispheres (all V1 ROIs were located along or adjacent to the calcarine sulcus; Figure 2, top). In addition to V1, we identified LO as ∼1000 contiguous voxels in both hemispheres that showed significantly stronger activation (p < 10⁻⁴) to intact versus scrambled objects (Figure 2, bottom). In all subjects, the cortical location of LO was consistent with the relative positions of anatomical landmarks, MT+, and other retinotopically defined visual areas.
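The selection of the ∼100 most responsive V1 voxels can be illustrated with a short Python sketch. This is not the BrainVoyager QX procedure used in the study; it assumes that each voxel's localizer response has already been reduced to a single number, and the function name and data layout are hypothetical.

```python
import heapq


def select_top_voxels(responses, n_voxels=100):
    """Return the ids of the n_voxels with the largest localizer response.

    responses: dict mapping a voxel id (any hashable value, e.g. an
    (x, y, z) tuple) to its response in the eccentricity localizer.
    This layout is an assumption made for illustration.
    """
    return set(heapq.nlargest(n_voxels, responses, key=responses.get))
```

In practice the candidate voxels would first be restricted to retinotopically defined V1 within one hemisphere, so that the function is applied once per hemisphere and figure size.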
Previous studies defined persistence-related fMRI responses as those that were greater in the persist condition as compared with the vanish condition following the offset of figure motion, because the stimuli in the two conditions are identical during the motion phase (Strother et al., 2011; Wong et al., 2009; Ferber et al., 2003, 2005; Large et al., 2005). As in these previous studies, whole-brain analyses revealed persistence-related responses in LO (but not adjacent MT+) in all subjects, for all annulus sizes (persist > vanish; always p < 10⁻³). Individual whole-brain analyses also showed significant persistence-related responses in early visual areas including V1. We used an ROI approach (see Methods) to study these in more detail and to compare the time course of V1 fMRI responses to those in LO. We also examined the degree to which fMRI responses were correlated with our behavioral measure of persistence.
fMRI Results for Mid-sized Figures
The right side of Figure 2 shows results for the mid-sized annulus figure–ground experiment. The graphs are event-related averages for the persist and vanish conditions in the V1 “figure” and “background” ROIs and LO, which were averaged across hemispheres and subjects. The colors used to plot these averages correspond to those used in the ROIs (Figure 2, left); the shaded region indicates figure motion (Figure 1, left). The main difference between V1 “figure” and “background” ROI results is that fMRI responses in the “figure” ROI were initially positive for both persist and vanish conditions (Figure 2, dark and light blue lines, respectively), and those in the “background” ROIs were negative (green and pink lines), especially in the persist condition (darker green and pink lines). As in previous studies (Strother et al., 2011; Large et al., 2005; Ferber et al., 2003), fMRI responses in LO (orange lines) were positive and resembled those in the V1 “figure” ROI.
Maximal fMRI responses in the “figure” ROI (Figure 2, blue lines) were observed 7–10 sec into the trial (4–7 sec after motion offset) for both persist and vanish conditions, in all of our subjects, and remained positive in the persist condition until the end of the trial. To assess the divergence of persist and vanish fMRI responses, we performed paired-samples t tests at each point in time for each subject. Previous studies showed that the divergence of persist and vanish fMRI responses reflects perceptual persistence (Strother et al., 2011; Large et al., 2005; Ferber et al., 2003). We were particularly interested in a temporally sustained divergence so we required that a significant difference be observed for each of three successive time points (e.g., a significant difference would have to be observed in separate t tests for 9, 10, and 11 sec to claim statistically significant divergence of persist and vanish at 10 sec); our approach is conservative in the sense that it requires a minimal temporally sustained divergence of 3 sec. Using this approach, we found that the earliest significant divergence of persist and vanish in the “figure” ROI occurred 10 sec into the trial (7 sec after motion offset)—at this point, shortly after V1 “figure” responses had reached a maximum, fMRI responses in the persist condition became significantly greater than those in the vanish condition (persist > vanish; p < .05, two-tailed t test).
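The three-successive-time-points criterion can be made concrete with a small Python sketch. It assumes that the p value from the paired-samples t test at each time point is already available and that time points are sampled at contiguous 1-sec steps; the function name, the dictionary layout, and the default threshold of .05 are illustrative choices rather than a record of the original analysis code.

```python
def earliest_sustained_divergence(p_values, alpha=0.05, run_length=3):
    """Return the earliest time point with a temporally sustained divergence.

    p_values: dict mapping time (sec into the trial) to the p value of the
    persist-versus-vanish test at that time (assumed contiguous sampling).
    A time point qualifies when it sits at the center of run_length
    successive significant tests (e.g., 9, 10, and 11 sec for 10 sec).
    Returns None if no such run exists.
    """
    times = sorted(p_values)
    for i in range(len(times) - run_length + 1):
        window = times[i:i + run_length]
        if all(p_values[t] < alpha for t in window):
            return window[run_length // 2]
    return None
```

For example, significant tests at 9, 10, and 11 sec yield a reported divergence at 10 sec, whereas an isolated significant test at a single earlier time point is ignored.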
In contrast to the V1 “figure” fMRI responses, the persist responses in both “background” ROIs (Figure 2, green and pink lines) became significantly more negative than the vanish responses (persist < vanish; p < .05, two-tailed t test). This divergence of persist and vanish fMRI responses in the “background” ROIs occurred 7 sec into the trial (4 sec after motion offset), 3 sec earlier than the divergence of persist and vanish responses (persist > vanish) in the V1 “figure” ROI. By the end of each trial, persist responses were always greater than vanish responses in both the “figure” and “background” ROIs (and in LO), but this was most apparent in the “figure” ROI.
Behavioral Results for Mid-sized Figures
As in previous studies (Strother et al., 2011; Large et al., 2005; Ferber et al., 2003), RTs in the persist condition were significantly greater than those in the vanish condition (p < .05, two-tailed t test performed on median RTs); we report these RTs relative to motion offset. The average median vanish RT was .50 sec (after motion offset) and ranged from .38 sec (average minimum [min]) to 1.1 sec (average maximum [max]). The average median persist RT was 3.7 sec after motion offset (individuals' medians ranged from 1 to 8.5 sec after motion offset). The bottom-most graph in Figure 2 (right) shows the average median persist RT (black dot) for mid-sized figures, with horizontal error bars corresponding to the average min and max RTs. With respect to the fMRI results in Figure 2, significant differences between the persist and vanish conditions occurred during the min–max range of RTs for all ROIs.
Persist–Vanish for Different Figure Sizes
Before we assessed the relationship between our fMRI results and our behavioral results, we further investigated the degree to which fMRI responses in the persist condition were different from those in the vanish condition and whether this depended on the size of the figure. As in a previous study by Strother et al. (2011), we computed the difference between fMRI responses in the persist and vanish conditions (persist–vanish) as a measure of the effect of the continued perception of the annulus after motion offset; a similar measure has been used in single-unit studies as a measure of figure enhancement (e.g., Roelfsema, Lamme, Spekreijse, & Bosch, 2002). The top graph in Figure 3 shows the time course of this persist–vanish measure for the mid-sized annuli. The solid black line shows persist–vanish values for the V1 “figure” ROI, and the dashed line corresponds to the average persist–vanish values for the two V1 “background” ROIs. These showed similar time courses (Figure 2, green and pink lines) that were statistically indistinguishable (always p > .25, using the statistical approach described earlier in reference to the results in Figure 2). These time courses were therefore averaged for our derived persist–vanish measure shown in Figure 3 (dashed line). The gray line shows persist–vanish values for LO.
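The derived persist–vanish measure amounts to a pointwise subtraction of two event-related averages, followed by averaging of the two “background” ROI time courses. The Python sketch below illustrates this under the assumption that each time course is a list of equally spaced response values; the function names are hypothetical.

```python
def persist_minus_vanish(persist, vanish):
    """Pointwise difference between persist and vanish time courses."""
    if len(persist) != len(vanish):
        raise ValueError("time courses must have the same length")
    return [p - v for p, v in zip(persist, vanish)]


def average_time_courses(*courses):
    """Pointwise mean of several equal-length time courses."""
    if len({len(c) for c in courses}) != 1:
        raise ValueError("time courses must have the same length")
    return [sum(values) / len(values) for values in zip(*courses)]
```

Positive values of the difference indicate persist > vanish (as in the “figure” ROI and LO), and negative values indicate persist < vanish (as in the “background” ROIs).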
For mid-sized annuli, persist–vanish remained near zero for the first few seconds following motion offset in all conditions and ROIs, which was expected because the persist and vanish stimuli were identical during the motion phase (and corresponds to a lack of significant differences between the persist and vanish fMRI responses in the shaded region of Figure 2). As stated earlier, the greatest differences between persist and vanish fMRI responses were observed later in the trial for the V1 “figure” ROI (where persist > vanish; solid black line in Figure 3, topmost graph) and LO (gray line) than those for the V1 “background” ROI (where persist < vanish; dashed black line in Figure 3, topmost graph). Persist–vanish in the V1 “background” ROIs (dashed line) was maximally negative 8 ± 1 sec into the trial (5 ± 1 sec after motion offset) and returned to zero before the maximal persist–vanish values in the V1 “figure” ROI (solid line). Unlike the V1 “background” values (dashed line), the negative V1 “figure” values (solid black line) observed ∼6 sec into the trial were not significantly different from baseline. That is, the first fMRI evidence of persistence was observed earlier in the V1 “background” ROIs than in the V1 “figure” ROI and was negative in its direction (i.e., persist < vanish). Furthermore, this V1 “background” persistence effect occurred in closer temporal proximity to observer reports of perceptual persistence as compared with the maximal V1 “figure” persistence effect.
The persist–vanish results for the small and large annuli were similar to those observed for the mid-sized annuli: Persist > vanish in the V1 “figure” ROIs (solid black lines) always occurred later in the trial than did persist < vanish in the V1 “background” ROIs (dashed lines), as did persist > vanish in LO (gray lines). The maximal persist > vanish values in the “figure” ROIs (solid black lines) were generally more variable than maximal V1 “background” values, which always occurred in closer temporal proximity to the median behavioral RTs for the three different annulus sizes (Figure 3, black dots and horizontal error bars). Statistical analyses of the persist RTs did not show any significant eccentricity effect (i.e., effect of annulus size), although the longest RTs were observed for mid-sized figures (this is evident in the horizontal error bars). In short, although there were some subtle differences, the overall temporal dynamics of the results in Figure 3 were similar for all three figure sizes.
As a test of correspondence between individuals' reports of perceptual persistence and their fMRI results, we performed multiple Pearson correlation analyses using an approach similar to that used by Strother et al. (2011), except that we limited our analyses to fMRI responses that occurred within a limited temporal range that encompassed the min–max persist RTs (horizontal error bars) shown in Figures 2 and 3. We assessed the correlation between individuals' median persist–vanish RTs and the magnitude of the persist–vanish fMRI measure between time points from 5 sec (2 sec after motion offset, to take into account hemodynamic delay) to 14 sec. We did this for all ROIs using the maximally positive persist–vanish values observed within this window for V1 “figure” responses (and LO) and the maximally negative values for V1 “background” responses during the same period. Because fewer subjects participated in the small and large annuli conditions, we only present behavior–fMRI correlations for the mid-sized figures.
We observed a significant positive correlation of persistence RTs and V1 “figure” responses (r = .68, p < .05), which means that greater persist–vanish RTs corresponded to greater persist–vanish fMRI responses. For V1 “background,” we observed a negative but nonsignificant correlation (r = −.43, p = .12), which indicates that the V1 “background” fMRI responses alone did not reliably predict individuals' behavioral reports. In LO, we observed a significant positive correlation (r = .62, p < .05) comparable to that observed for the V1 “figure” ROI.
We conducted an additional correlation analysis of individuals' persistence RTs and the combined V1 “figure” and “background” fMRI responses to see whether it was a better predictor of perceptual persistence than either the V1 “figure” or “background” responses alone. The rationale for this was as follows: Although the RT–“background” fMRI response correlation was not significant, it showed a negative trend, as would be expected if suppression favors figure–ground segmentation. Therefore, some subjects' RTs (the shortest RTs) may have more accurately reflected the “background” fMRI responses and, thus, more accurately reflected the persistent visibility of the figure rather than a slightly later stage of the transition from strong figure–ground segmentation into spatial attention to the location of the figure, which had just disappeared. The others' RTs (the majority of our subjects) may have instead been more strongly biased by sustained attention to the location of the figure because the transition between the two processes (i.e., decay of figure–ground representation) is perceptually gradual and highly subjective.
We combined the V1 “figure” and “background” responses by subtracting the “background” responses from the “figure” responses, which resulted in a positive combined V1 figure–ground fMRI measure because the “background” responses were negative and the “figure” responses were positive. This measure reflects the maximal difference between the “figure” and “background” time courses in Figure 3 for each subject. Using this combined figure–ground V1 fMRI response measure, we observed a stronger correlation with persist–vanish RTs (r = .85, p < .01) than that observed for either V1 “figure” or LO (the V1 “figure” and “background” time courses were not correlated with each other or with those from LO; r is always <.15, p is always >.13). Thus, the best predictor of persist–vanish RTs was a combination of V1 “figure” and “background” fMRI responses, which supports our proposal that both “figure” and “background” fMRI responses contribute to perceptual persistence.
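The combined figure–ground measure and its correlation with RTs can be sketched in Python as follows. The sketch assumes that each subject's persist–vanish time course is a list indexed by seconds into the trial and uses the 5–14 sec analysis window stated earlier; the function names and data layout are assumptions, and the Pearson coefficient is computed from its textbook definition rather than with the statistics package used in the study.

```python
import math


def combined_figure_ground(figure_diff, background_diff, start=5, end=14):
    """Peak "figure" persist-vanish value minus the most negative
    "background" value within the window [start, end] (sec, inclusive).

    Both inputs are lists indexed by seconds into the trial (assumed).
    """
    figure_peak = max(figure_diff[start:end + 1])
    background_trough = min(background_diff[start:end + 1])
    return figure_peak - background_trough


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Because the “background” trough is negative, subtracting it adds its magnitude to the “figure” peak, so the combined value is positive whenever the two responses have the signs reported here. Calling pearson_r on the per-subject combined values and median RTs would give the correlation coefficient; its significance test is not included in this sketch.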
We measured fMRI responses in V1 and LO during motion-based figure–ground segmentation and its perceptual persistence following the offset of figure motion. We observed positive V1 and LO fMRI responses corresponding to the figure both during and after figure motion, and we observed negative V1 responses corresponding to the background. Our use of persist and vanish conditions allowed us to obtain a measure of form-related fMRI responses that were not contaminated by motion-related effects. These form-related fMRI responses exhibited different temporal dynamics once the figure stopped moving. The decay of the negative “background” fMRI responses occurred earlier than the decay of the positive “figure” fMRI responses. Furthermore, the “figure” responses never returned to baseline and therefore reflect continued visual processing related to something other than the continued visibility of the figure, because our subjects indicated that the figures always eventually disappeared. We propose that this additional latent visual processing is attention-related.
Our results confirm that background suppression occurs during figure–ground segmentation and attentional selection (Caputo & Guerra, 1998; Hupe et al., 1998). Our results also show that sustained attention is insufficient to suppress the eventual interference of a highly camouflaging background and the corresponding decay of the V1 “background” fMRI responses. The latter demonstrates a limit on known effects of attention on V1 fMRI responses (Brefczynski-Lewis, Datta, Lewis, & DeYoe, 2009; Silver et al., 2007; Brefczynski & DeYoe, 1999; Kastner et al., 1999). We discuss our results in relation to figure–ground representation in V1, visual awareness and attention, and form-based perceptual memory.
Figure–Ground Representation in V1
We observed positive “figure” and negative “background” fMRI responses for all three figure (annulus) sizes. This suggests that V1 activity corresponding to the retinal vicinity of the figure–ground boundary was enhanced and was accompanied by a concurrent suppression of V1 activity corresponding to the background. The observation of positive fMRI responses in V1 during the motion phase is consistent with a previous fMRI study in which motion-based boundary information was shown to be represented in V1 (Reppas, Niyogi, Dale, Sereno, & Tootell, 1997). Other studies have shown both positive and negative fMRI responses in V1 to highly salient contrast-defined static annuli (Pasley, Inglis, & Freeman, 2007; Shmuel, Augath, Oeltermann, & Logothetis, 2006). Importantly, these and other fMRI studies (Silver et al., 2007; Muller & Kleinschmidt, 2004; Smith, Williams, & Singh, 2004) showed that negative V1 fMRI responses were due to neural suppression rather than blood stealing. Likewise, we interpret our positive–negative fMRI results as reflecting neural enhancement–suppression during the persistence of figure–ground segmentation (which we measured as the difference in persist and vanish fMRI responses; Figure 3) because the temporal pattern of decay in the negative fMRI time courses was strikingly different from that observed in the positive time courses. The negative fMRI responses were relatively weak and always returned to baseline before the positive fMRI responses, which eventually decayed but did not necessarily return to baseline by the end of the trial.
Persistence-related fMRI responses in V1 (and also in LO) corresponding to the figure remained elevated until the end of the trial, whereas V1 suppression of the background returned to baseline shortly after observers reported the disappearance of the figure, which is consistent with a ∼2-sec hemodynamic lag for V1 fMRI responses (Miezin, Maccotta, Ollinger, Petersen, & Buckner, 2000; Menon, Luknowsky, & Gati, 1998). In short, concomitant figure enhancement and background suppression in V1 may better predict the perceptual disappearance of the figure than figure enhancement alone. Our correlation analyses strongly support this possibility. This suggests that background suppression in V1 may be as essential to perceived figure–ground segmentation as figure enhancement, as in push–pull accounts of recurrent figure–ground processing (Scholte, Jolij, Fahrenfort, & Lamme, 2008; Super, Spekreijse, & Lamme, 2001). Furthermore, the fact that the “figure” fMRI responses in V1 (and in LO) better predicted our behavioral results than the “background” fMRI responses means that individuals' behavioral responses (which were highly variable) probably reflected processes in addition to figure–ground segmentation in V1. One possibility is that the persist RTs observed in our study reflected sustained attention to the location of a previously visible figure, which would have emerged gradually during the decay of the figure because this decay was not abrupt in the persist condition (as it was in the vanish condition), and that this was reflected in fMRI responses corresponding to the location of the figure but not the background. We discuss this further in the next section.
Background Suppression and Attention
Enhancement and suppression of V1 fMRI activity strikingly similar to that reported here has been shown to occur in the absence of visual stimulation (Silver et al., 2007) and has been interpreted as a result of spatial attention. By virtue of our task, observers sustained attention to the figure, which nevertheless eventually disappeared; attention was not sufficient to sustain the initial salience of the figure. A purely attention-related account of our results is therefore inadequate. In terms of figure–ground representation in V1, attention was likewise insufficient to sustain the initial positive–negative figure–ground fMRI responses. These figure–ground fMRI responses in V1 may underlie the perceptual experience of a salient figure–ground boundary (Super, van der Togt, Spekreijse, & Lamme, 2003), and our behavior–fMRI analyses strongly support this possibility.
In our results, the failure of attention to sustain initial fMRI responses was most evident in the decay of the V1 “background” responses, for which background suppression (persist < vanish) became zero shortly (within 5 sec) after observers indicated the disappearance of the figure (Figure 3). In contrast, the figure enhancement (persist > vanish) became significantly greater than zero considerably later (≥5 sec after perceptual disappearance) and may reflect sustained attention to the location of the figure in the absence of its perceptibility. This would imply that the spatio-temporal allocation of attention is context-dependent in that enhanced V1 responses to a location are not necessarily accompanied by suppression of other locations to the same degree, if at all, as during the perception of a salient figure. Therefore, if spatial attention is involved in figure–ground segmentation, it operates differently during perceived figure–ground segmentation than during visual processing of the same stimulus in the absence of perceived segmentation.
It is conceivable that object-based attention (Shomstein & Yantis, 2002; Egly, Driver, & Rafal, 1994; Egly & Homa, 1991), which results in sensory enhancement (Corbetta, Miezin, Dobmeyer, Shulman, & Petersen, 1990) even in V1 (Ciaramitaro, Mitchell, Stoner, Reynolds, & Boynton, 2011), operates during perceptually salient figure–ground segmentation and may serve to “label” grouped image elements during perceptual organization (Roelfsema & Houtkamp, 2011). However, as the perceptual salience of the figure decreases, the effect of object-based attention may be replaced by voluntarily directed spatial attention to the location of the figure, which eventually becomes imperceptible. This would be consistent with our observation of positive persist–vanish “background” fMRI responses toward the end of the trial (Figures 2 and 3), which may reflect a gradual decrease in the focus of attention to figure location following the decay of the figure, such as a gradual spreading of a “Mexican hat” distribution (Muller, Mollenhauer, Rosler, & Kleinschmidt, 2005). Taken together, our results support the view that object-based attention and space-based attention interact but are functionally distinct (Muller & Kleinschmidt, 2003).
Finally, our findings are consistent with the idea that stimulus-driven figure–ground segmentation mechanisms provide visual structure for attentional selection (Qiu et al., 2007; Egeth & Yantis, 1997) and extend this idea to a distributed form-processing hierarchy, one that maintains representations of global structure in the absence of perceptual binding and segmentation cues. Recent studies have shown that, in addition to the ventral visual pathway, parietal areas in the dorsal pathway play a role in object perception (McMains & Kastner, 2011; Konen & Kastner, 2008). Future fMRI studies of figure–ground persistence may help elucidate the function of parietal cortex in maintaining figure–ground segmentation and the perception of object shape. Such studies may also clarify the involvement of parietal cortex in the gradual transition from figure salience (segmentation) to invisibility (camouflage), a transition that leads to the emergence of spatial attention to the location of a previously salient figure.
Form-based Perceptual Memory
Although attention is presumably important to figure–ground segmentation, the persistence of figure–ground fMRI responses in V1 and LO, which is initially stimulus-driven, reflects the brief perceptual memory of form (first proposed by Ferber et al., 2003) maintained by a distributed network of visual cortical areas. The premise for this interpretation is that the gradual decay of a form percept and its neural correlates during the persist condition, as compared with the vanish condition, occurs over several seconds in the absence of the initial cue to figure–ground segmentation (motion). Persistence has been shown to last perceptually for up to several seconds using a variety of stimuli and experimental conditions (Wallis, Williams, & Arnold, 2009; Wong et al., 2009; Emrich et al., 2008; Ferber et al., 2003, 2005; Large et al., 2005) and is thought to reflect an intermediate, form-based memory store (Ferber & Emrich, 2007). Unlike iconic memory, which is considerably shorter in duration (Coltheart, 1980; Di Lollo, 1977), persistence occurs because of camouflage rather than the complete removal of visual information specifying a previously salient object. This also makes persistence distinct from other recently proposed types of visual STM related to figure–ground segmentation (O'Herron & von der Heydt, 2009; Sligte, Scholte, & Lamme, 2009).
Only one previous study has reported figure–ground persistence in V1 (Strother et al., 2011), and it reported only positive persistence-related fMRI responses similar to those observed in our V1 “figure” ROIs and in LO. That study was not designed to delineate positive–negative figure–ground fMRI responses in V1. Its authors concluded that V1 receives shape-related feedback from LO and other higher-tier visual areas that represent object shape. Their interpretation is consistent with other reports of shape-related modulation of V1 activity by LO (Williams et al., 2008; Appelbaum et al., 2006; Murray, Kersten, Olshausen, Schrater, & Woods, 2002). The current results extend this idea by showing that negative fMRI responses also reflect persistent form-related perceptual processing. In terms of visual function, this implies that LO represents objects but not the surrounding background, as proposed by others (Appelbaum et al., 2006; Goh et al., 2004), whereas V1 represents the retinal location of the figure–ground boundary (Skiera et al., 2000; Lee, Mumford, Romero, & Lamme, 1998). It is possible that these cortically disparate V1 and LO representations mutually reinforce one another in a bidirectional V1 ↔ LO neural circuit, thus resulting in figure–ground perceptual memory in V1 that lasts much longer than one would expect for individual V1 neurons in the absence of object-related feedback from higher-tier visual areas. In addition to LO, some of this object-related feedback might also originate in object-selective parietal cortex (Konen & Kastner, 2008).
An interesting consequence of our interpretation concerns the function of individual neurons in V1, some of which respond to local orientation and others to the direction of local motion (De Valois & De Valois, 1980). The results reported here and those from previous studies show that the persistence of motion-defined form in LO is not accompanied by persistence in MT+ (Large et al., 2005; Ferber et al., 2003). This is important for two reasons. First, results from fMRI studies suggest that MT+ is involved in motion-defined boundary perception (Likova & Tyler, 2008; Reppas et al., 1997) and exhibits object-selective fMRI responses (Konen & Kastner, 2008). Second, the neurons in V1 that gave rise to our observed effect of persistence in V1 appear to interact with LO but not MT+. This means that, although MT+ may be involved in motion-boundary extraction and object perception, it does not appear to be involved in figure–ground perceptual memory in the absence of motion. That is, MT+ may be involved in the perception of motion-defined form, but this circuit is cortically distinct from the V1 ↔ LO circuit that we posit to underlie figure–ground persistence, a distinction that supports the idea of functional segregation at the earliest cortical stages of visual processing.
Our results suggest that figure enhancement and background suppression in V1 play different but complementary roles in figure–ground segmentation and form-based perceptual memory. We have argued that background suppression may be more closely tied to the decay in spatial resolution or vividness of a persistent figure–ground percept than figure-related fMRI responses or subjects' perceptual judgments of figure–ground decay, both of which may be more strongly biased by the subjective nature of our task and sustained spatial attention to the location of our figures after their disappearance. Additional studies will be necessary to elucidate the interaction of figure enhancement and background suppression in our proposed V1 ↔ LO circuit. Nevertheless, our results clearly show that both are involved in figure–ground segmentation and form-based perceptual memory, which means that V1 should not be considered a unitary entity in our proposed circuit. Although we treated LO as a unitary figure-related entity in this study, an interesting possibility is that it too can be broken down into figure and background responses. A literal interpretation of the view that LO represents figures and not backgrounds (Appelbaum et al., 2006; Goh et al., 2004) argues against this possibility. However, the observation of coarse retinotopy in LO (Strother, Aldcroft, Lavell, & Vilis, 2010; Sayres & Grill-Spector, 2008; Larsson & Heeger, 2006) means that this conceptualization of LO may be oversimplified and that it may be possible to delineate “figure” and “background” fMRI responses in LO in future studies of figure–ground representation and form-based perceptual memory.
Reprint requests should be sent to Lars Strother, The Brain and Mind Institute, University of Western Ontario, London, ON, Canada N6A 5B7, or via e-mail: firstname.lastname@example.org.