Abstract
The temporal sequence of neural processes supporting figure–ground perception was investigated by recording ERPs associated with subjects' perceptions of the face–vase figure. In Experiment 1, subjects continuously reported whether they perceived the face or the vase as the foreground figure by pressing one of two buttons. Each button press triggered a probe flash to the face region, the vase region, or the borders between the two. The N170/vertex positive potential (VPP) component of the ERP elicited by probes to the face region was larger when subjects perceived the faces as figure. Preceding the N170/VPP, two additional components were identified. First, when the borders were probed, ERPs differed in amplitude as early as 110 msec after probe onset depending on subjects' figure–ground perceptions. Second, when the face or vase regions were probed, ERPs were more positive (at ∼150–200 msec) when that region was perceived as figure versus background. These components likely reflect an early “border ownership” stage and a subsequent “figure–ground segregation” stage of processing. To explore the influence of attention on these stages of processing, two additional experiments were conducted. In Experiment 2, subjects selectively attended to the face or vase region, and the same early ERP components were again produced. In Experiment 3, subjects performed an identical selective attention task, but on a display lacking distinctive figure–ground borders, and neither of the early components was produced. Results from these experiments suggest sequential stages of processing underlying figure–ground perception, each of which is subject to modification by selective attention.
INTRODUCTION
The famous face–vase drawing (Rubin, 1921) has long been recognized for its remarkable properties. Although physical input to the eyes remains fixed, the observer's perceptual experience alternates between two different figure–ground configurations: two face profiles versus one central vase. Because these perceptual switches are not driven by changes in sensory input, the face–vase stimulus has become an important tool for laboratory studies aimed at investigating figure–ground processing in the brain.
Figure–ground perception is likely to involve several stages of visual processing. Following initial sensory registration of contours, it is hypothesized that the visual system carries out a process called “border ownership assignment” (or edge assignment) in which regions adjacent to each contour are grouped with either the figure or the background (von der Heydt, Zhou, & Friedman, 2003; Albright & Stoner, 2002; Zhou, Friedman, & von der Heydt, 2000; Koffka, 1935; Rubin, 1921; see Craft, Schutze, Niebur, & von der Heydt, 2007, and Sakai & Nishimura, 2006, for neural models). The regions that are grouped with the figure are then given priority in subsequent processing. This preferential selection of the figure region is often referred to as “figure–ground segregation” (Rubin, 2001; see Roelfsema, Lamme, Spekreijse, & Bosch, 2002, Vecera & O'Reilly, 1998, and Sporns, Tononi, & Edelman, 1991, for neural models).
Neuroimaging studies employing the face–vase stimulus have reported increased activations in the fusiform face area (FFA) during periods when subjects perceived the faces as the foreground figure versus the vase (Andrews, Schluppeck, Homfray, Matthews, & Blakemore, 2002; Hasson, Hendler, Ben Bashat, & Malach, 2001). The electrophysiological counterpart of FFA activation is the well-known N170/vertex positive potential (VPP) component of the visual event-related potential (ERP), which is elicited selectively by stimuli that include faces or facial features (Itier & Taylor, 2004; Rossion, Joyce, Cottrell, & Tarr, 2003; Allison, Puce, Spencer, & McCarthy, 1999; Jemel, George, Olivares, Fiori, & Renault, 1999; Puce, Allison, & McCarthy, 1999). It has not been determined, however, whether the N170/VPP is modulated by perception of the face versus vase, nor is it known whether earlier stages of figure–ground processing (possibly reflecting border ownership) can be isolated by comparing ERPs associated with different figure–ground perceptions.
To investigate the time course of the neural processes underlying figure–ground perception, we recorded ERPs associated with different perceptual reports of the face–vase image. The major advantage of using the face–vase as an experimental stimulus is that it allows isolation of neural activity linked with perceptual change while sensory input is held constant. However, a well-recognized obstacle in using ambiguous figures as stimuli is the variability and unpredictability in the timing of perceptual switching (van Ee, 2005). For ERP studies especially, in which recordings must be time-locked to observable events, it is critical to obtain a temporally precise measure of subjects' ongoing perceptions. In previous experiments, ERPs were time-locked to subjects' perceptual reports when an ambiguous figure was presented continuously (Basar-Eroglu, Struber, Stadler, Kruse, & Basar, 1993), or to stimulus onset when an ambiguous figure was presented intermittently with brief blank-screen intervals (e.g., Pitts, Martinez, Stalmaster, Nerger, & Hillyard, 2009; Kornmeier & Bach, 2004). In the latter situation, perceptual reports were given after each stimulus presentation. Although these techniques were fruitful in measuring large and relatively late (>250 msec) ERP components associated with perceptual switches, neither led to the identification of early (<200 msec) percept-specific modulations. This failure to detect early percept-based ERP differences may have been due to uncontrolled variability in the latencies between percepts and reports, or to variability in the latencies between stimulus onset and perceptual disambiguation.
To overcome these obstacles, we adopted a technique in which the stimulus was continuously visible, and subjects reported their changing percepts by pressing one of two buttons. Each button press then immediately triggered a brief flash (“probe”) to one of three regions: the face region, the vase region, or the borders between the two regions. ERP recordings were time-locked to probe onset and categorized according to the subjects' perceptual reports of face or vase. ERPs were then compared when the same probe was perceived as appearing on the figure versus the background region (or on the shared border when it was “owned” by the face vs. the vase). Thus, by probing the visual system during existing perceptual states, these ERPs were used to track perception-related modulations at different stages of figure–ground processing.
In a second experiment, we investigated the role that selective attention may play in figure–ground perception of the face–vase figure. The same three probes were presented, but instead of being triggered by perceptual reports, the probes were presented at regular intervals while subjects were instructed to selectively attend to either the face or vase regions in order to detect infrequent probes having a longer duration. Similar ERP comparisons were made as in the first experiment (probe on attended figure vs. probe on ignored ground or probe on shared border while the face vs. the vase region was attended).
A third experiment was aimed at ensuring that ERP modulations in Experiments 1 and 2 reflected early stages of figure–ground perception as opposed to simple enhancements of probe-evoked activity by spatially directed attention. That is, in both experiments, it seemed possible that subjects' face or vase perceptions would be associated with their attending to different regions of space. To test this alternative, subjects in Experiment 3 performed the same selective attention task as in Experiment 2, but now the attended regions lacked distinct contours necessary for figure–ground segregation. Taken together, these experiments revealed two previously undescribed ERP components associated with early stages of figure–ground perception and examined the influence of selective attention on each of these stages.
METHODS
Participants
Fifty-four subjects participated, eighteen in each of the three experiments (Experiment 1: 11 women, ages 19–33 years, mean age = 23 years; Experiment 2: 11 women, ages 19–33 years, mean age = 24 years; Experiment 3: 12 women, ages 19–31 years, mean age = 23 years). Subjects were recruited as volunteers, and informed consent was obtained prior to the beginning of each experiment. All procedures were approved by the University of California San Diego Institutional Review Board.
EEG Recording
Brain electrical activity was recorded noninvasively from the scalp using a commercially available electrode cap (Electro-Cap International, Eaton, OH) with 64 electrode placements. The electrode impedances were kept below 5 kΩ. Scalp signals were amplified by a battery powered amplifier (SA Instrumentation, Encinitas, CA) with a gain of 10,000 and band-pass filtered from 0.1 to 80 Hz. Signals were digitized to disk at 250 Hz. During task performance, eye position and eye movements were monitored by means of vertical (V) and horizontal (H) electrooculogram (EOG) recordings. A right mastoid electrode served as the reference for all scalp channels and the VEOG. Left and right HEOG channels were recorded as a bipolar pair. Each recording session lasted 120–180 min, including setup time and cap/electrode preparation. Short breaks were given after each block of trials to help alleviate subject fatigue. Blocks lasted approximately 2.5 min.
ERP Analyses
ERPs were time-locked to probe onset, baseline corrected from −100 to 0 msec, and low-pass filtered at 30 Hz. Trials were discarded if they contained an eye blink or eye movement artifact, or if any electrode channel exceeded defined signal amplitudes. On average, 13% of trials per individual were rejected due to these artifacts. Averaged mastoid-referenced ERPs were calculated off-line as the difference between each scalp channel and an average of the left and right mastoid channels. In general, comparisons of interest were between ERPs associated with perception of (or attention to) the face or vase figures for each type of probe separately. ANOVA was used to test mean amplitude differences between perceptual and attentional states across subjects. Time windows and electrodes for ERP measurements were selected based on topographical distributions and were centered around the maximum amplitudes of the components of interest.
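The epoch-level steps described above can be sketched in a few lines. This is a minimal illustration only, assuming single-channel epochs that start at −100 msec and are sampled at 250 Hz. The rejection threshold (`max_abs_uv`) and all function names are hypothetical, since the text does not give the amplitude criteria, and the 30-Hz low-pass filter and mastoid re-referencing are omitted.

```python
from statistics import mean

SFREQ_HZ = 250          # sampling rate stated in the EEG Recording section
EPOCH_START_MS = -100   # assumed epoch start; the baseline spans -100 to 0 msec


def ms_to_index(t_ms):
    """Convert a latency relative to probe onset into a sample index."""
    return round((t_ms - EPOCH_START_MS) * SFREQ_HZ / 1000)


def baseline_correct(epoch):
    """Subtract the mean of the pre-probe interval from every sample."""
    n_base = ms_to_index(0)  # number of samples before probe onset
    base = mean(epoch[:n_base])
    return [v - base for v in epoch]


def exceeds_threshold(epoch, max_abs_uv=100.0):
    """Flag an epoch whose absolute voltage passes a (hypothetical) limit."""
    return max(abs(v) for v in epoch) > max_abs_uv


def average_erp(epochs):
    """Average the baseline-corrected epochs that survive rejection."""
    corrected = [baseline_correct(e) for e in epochs]
    kept = [e for e in corrected if not exceeds_threshold(e)]
    return [mean(samples) for samples in zip(*kept)]


def mean_amplitude(erp, start_ms, end_ms):
    """Mean voltage over an inclusive latency window, e.g. 132-172 msec."""
    return mean(erp[ms_to_index(start_ms): ms_to_index(end_ms) + 1])
```

At 250 Hz, a 132–172 msec window defined this way spans 11 samples, and the baseline interval spans 25.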
Source Analyses
The neural generators of the ERP components in each individual subject's difference waves (formed by subtracting ERPs associated with different perceptual or attentional states for each probe separately) were modeled using a minimum-norm linear inverse solution approach that involves local autoregressive averaging (LAURA; Grave de Peralta Menendez, Murray, Michel, Martuzzi, & Gonzalez Andino, 2004). The LAURA solution space included 4024 evenly spaced nodes (6 mm³ spacing), restricted to the gray matter of the MNI’s average brain. No a priori assumptions were made regarding the number or location of active sources. Time windows for estimating the sources of each component were the same as in the ERP statistical analyses.
LAURA solutions were computed and transformed into a standardized coordinate system (Talairach & Tournoux, 1988) and exported into the AFNI software package (Cox, 1996) for statistical analyses. Source intensity values for the 4024 nodes were transposed into a 91 × 91 × 109 voxel space (2 mm³ voxel sizes). t tests were conducted over all voxels (using an alpha threshold level of p < 1 × 10⁻⁸ to correct for multiple significance testing) in order to identify consistent sources across subjects. For visualization purposes, statistical maps were projected onto structural (MNI) brain images.
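The text does not say how the per-voxel threshold was derived. As a point of comparison only, the following sketch computes a Bonferroni threshold over the full voxel grid, assuming a family-wise alpha of .05 (an assumption, not a value from the study). Many voxels in the grid lie outside the gray-matter solution space, so the effective number of tests may be smaller.

```python
# Voxel grid stated in the text: 91 x 91 x 109
n_voxels = 91 * 91 * 109

# Hypothetical family-wise alpha; the source does not state one.
family_alpha = 0.05
bonferroni_alpha = family_alpha / n_voxels

# Per-voxel threshold reported in the text.
reported_alpha = 1e-8

print(n_voxels)                           # 902629
print(f"{bonferroni_alpha:.2e}")          # Bonferroni per-voxel threshold
print(reported_alpha < bonferroni_alpha)  # True: the reported threshold is stricter
```

Under these assumptions, the reported threshold is more conservative than a Bonferroni correction over every voxel in the grid.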
EXPERIMENT 1
Procedure
The traditional face–vase drawing (black peripheral face profiles, white central vase, black frame) was modified by adjusting the face and vase regions to neutral gray, adding black contour lines to separate the regions, and framing these regions with a darker gray background. This stimulus and the three types of white probes (face, vase, and border) are shown in Figure 1. The face–vase image subtended a 3.7° × 3.4° viewing angle and was centrally presented on a computer monitor with a frame rate of 60 Hz. Participants maintained their gaze on a small (0.25°) centrally located fixation cross that was visible throughout all trials.
Stimuli and procedure used in Experiment 1. Subjects viewed the face–vase stimulus (gray with black borders) and indicated their perceptions by pressing one of two response keys. Each keypress immediately triggered the presentation of one of three ERP-eliciting “probes” (white) to the face regions, vase region, or borders, or in some cases, no probes were presented (only a digital trigger) to assess the contribution of motor potentials to the ERPs.
Subjects were instructed to view the stimulus and indicate when they clearly perceived the face or the vase as the foreground figure by pressing one of two buttons with their right hand. Immediately following each button press (on the next monitor scan; 0–16 msec), one of three white probes was briefly flashed (100 msec duration) on the face region, vase region, or borders between the two. In some cases, a no-probe event (only a digital code) was triggered by the subject's button press to enable subtraction of motor-response related components from the ERPs time-locked to the probes. Subjects were asked to maintain their reported perception while viewing each probe and then to voluntarily switch their perception and report when the alternative percept was clear. They continued to alternate and report percepts until 35–40 reports were made. No time limit was given for percept-switching. On average, subjects reported alternative percepts every 3400 msec (SD = 1050 msec) and took slightly longer (107 msec) to switch from face-to-vase compared to vase-to-face, although this difference was not significant (p = .22). The three types of probes along with the no-probe trials were presented in random order. Prior to quantitative analyses, the averaged ERPs on the no-probe trials were subtracted from those on each type of probe trial. This subtraction eliminated contamination from motor and response-related potentials while maintaining percept-related differences. Thirty experimental blocks were administered while the continuous EEG was recorded, resulting in a total of 1100 trials per subject (300 for each type of probe and 200 for the no-probe trials).
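The no-probe subtraction can be illustrated with a short sketch. It assumes that averaged waveforms are stored as equal-length lists of voltages and that the no-probe average is subtracted separately for each reported percept. The text does not specify the latter, and the function names are hypothetical.

```python
def subtract(a, b):
    """Pointwise difference of two equal-length averaged waveforms."""
    if len(a) != len(b):
        raise ValueError("waveforms must have the same number of samples")
    return [x - y for x, y in zip(a, b)]


def percept_difference(probe_face, probe_vase, noprobe_face, noprobe_vase):
    """Remove response-locked activity, then contrast the two percepts."""
    face_clean = subtract(probe_face, noprobe_face)  # face percept, motor removed
    vase_clean = subtract(probe_vase, noprobe_vase)  # vase percept, motor removed
    return subtract(face_clean, vase_clean)          # face minus vase


# Trial counts stated in the text: three probe types plus no-probe trials.
total_trials = 3 * 300 + 200  # 1100
```

Because the button press precedes every probe, activity locked to the response appears in both the probe and no-probe averages and cancels in the subtraction, leaving the probe-evoked response.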
Results
ERP Results
ERP comparisons were made between face and vase perceptions for each of the three probes separately. Figure 2 shows ERP and difference waves for each of the probes. For the border probe, ERP amplitudes for the two perceptions began to diverge at ∼100 msec. This difference persisted until ∼190 msec and was maximal over parietal–occipital scalp regions at ∼150 msec. Mean amplitudes across a 132–172 msec time window were tested by ANOVA with the factors perception (face, vase), hemisphere (left, center, right), and channel (PO3, PO7, POz, Oz, PO4, PO8). The main effect of perception was significant (p = .011), and there was no interaction between perception and hemisphere (Table 1). For convenience, we refer to this component as the border difference (Bd).
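For a within-subject factor with only two levels, such as perception (face, vase), the main-effect F in a balanced repeated-measures design equals the square of a paired t statistic computed on each subject's mean amplitude. The sketch below shows that relationship. It assumes one value per subject and condition, already averaged over channels and the time window, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_f(cond_a, cond_b):
    """F(1, n-1) for a two-level within-subject factor, computed as t squared."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # paired t statistic
    return t * t, (1, n - 1)
```

With 18 subjects the degrees of freedom are (1, 17), matching the F(1, 17) values reported for the main effects.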
ERPs (blue, red), difference waves (black dashed), and difference wave scalp topographies for the three types of probes (shown in left column). When the borders were probed (top row), ERPs differed between the two percepts from ∼100 to 190 msec (Bd). For the vase and face probes (middle and bottom rows), difference waves represent perceived figure minus background. When regions perceived as figure (vs. background) were probed, ERPs were more positive between ∼150 and 200 msec (FGd). Also, when the face regions were probed, the vertex positive potential (VPP) was larger for face versus vase perceptions (from ∼190 to 240 msec).
Experiment 1: Main Effects
Probe (Time Window) | Amplitude, Face Percept | Amplitude, Vase Percept | Amplitude Difference | F(1, 17) | p |
---|---|---|---|---|---|
Border (132–172 msec) | −2.93 | −4.10 | 1.17 | 8.19 | <.011 |
Vase (172–192 msec) | −2.86 | −1.33 | 1.53 | 15.87 | <.001 |
Face (172–192 msec) | −0.97 | −1.94 | 0.97 | 5.22 | <.035 |
Face (204–244 msec) | 5.52 | 4.00 | 1.52 | 9.58 | <.007 |
Amplitudes are mean voltages (μV) at electrode locations described in the text.
For the face and vase probes, ERP amplitudes were more positive when the probed region was perceived as figure (vs. as background) across a ∼150–200 msec time window at occipital scalp locations. The difference in mean amplitude for each probe was tested (from 172 to 192 msec) by separate ANOVAs with the factors perception (face, vase), hemisphere (left, center, right), and channel (PO3, PO7, O1, POz, Oz, Iz, PO4, PO8, O2). For the face probe, the main effect of perception was significant (p = .035), and there was no interaction between perception and hemisphere. For the vase probe, the main effect of perception was highly significant (p = .001), and the Perception × Hemisphere interaction approached significance (p = .068) (Table 1). We hereafter refer to these positive difference components as the figure–ground difference (FGd). Figure 2 shows the scalp topographies of these difference wave components.
An additional, centrally distributed amplitude modulation was identified for the face probe over the interval 190–240 msec. This component closely resembles the VPP reported in previous face perception studies (Sui, Zhu, & Han, 2006; Holmes, Winston, & Eimer, 2005; Itier & Taylor, 2002; Campanella et al., 2000; Rossion et al., 1999). The VPP has been shown to co-occur with the well-known face-sensitive N170 component and likely reflects the positive end of the same dipole generator (Holmes et al., 2005; Jemel et al., 2003; Rossion et al., 1999, 2003; Campanella et al., 2000). Because the topography of the N170 is ventral-temporal (i.e., overlapping the mastoids), it is best observed with an average reference, whereas the VPP is more prominent with an averaged mastoid reference (Joyce & Rossion, 2005). The mean amplitude difference of the VPP component (from 204 to 244 msec) was tested by ANOVA with the factors perception (face, vase), hemisphere (left, center, right), and channel (C1, CP1, Cz, CPz, C2, CP2). When the face was perceived as figure, the face probe led to larger VPP amplitudes (p = .007). The vase probe did not produce any modulation corresponding to the VPP (p = .17).
Source Localization Results
Figure 3 shows the cortical sources estimated by LAURA for the Bd, FGd, and VPP components. Estimations for the Bd revealed strong sources in the left and right posterior-ventral lateral occipital complex (LOC). Sources for the FGd were situated in more anterior-ventral LOC regions and were stronger in the left versus right hemisphere (for the vase probe, only left hemisphere sources were consistent across subjects). The VPP was estimated to have generators in the right fusiform gyrus, as well as in the left LOC. Talairach coordinates and anatomical regions for the center of mass of each source are provided in Table 2.
Source analysis results (LAURA) for each of the difference wave components found in Experiment 1: border difference (Bd), figure–ground difference (FGd), and vertex positivity difference (VPP). Source estimates that were consistent across subjects are shown in red, orange, and yellow.
Experiment 1: Talairach Coordinates for Estimated Sources
Difference Component | x | y | z | Anatomical Region(s) |
---|---|---|---|---|
Bd | −20 | −86 | −10 | left middle occipital gyrus |
Bd | 29 | −81 | −10 | right middle occipital gyrus |
FGd (vase) | −32 | −78 | −10 | left middle occipital gyrus |
FGd (face) | −33 | −76 | −9 | left middle occipital gyrus |
FGd (face) | 22 | −81 | −9 | right fusiform gyrus |
VPP | −32 | −77 | −9 | left middle occipital gyrus |
VPP | 44 | −60 | −10 | right fusiform gyrus |
Coordinates represent the centers of mass of the LAURA sources that were consistent across subjects.
Results Summary and Discussion
Two ERP components that appear to be associated with early stages of figure–ground perception were identified. The first was an early difference (at ∼100–190 msec) isolated by comparing ERPs elicited by border probes when subjects perceived the faces versus the vase as figure. This component (the Bd) was followed by a positivity (at ∼150–200 msec) when the region perceived as figure (vs. ground) was probed (for both face and vase regions; the FGd). Both the Bd and the FGd were estimated to have generators in the ventral regions of the LOC, an area previously implicated as important for shape and object processing (Eger, Ashburner, Haynes, Dolan, & Rees, 2008; Martinez, Ramanathan, Foxe, Javitt, & Hillyard, 2007; Ewbank, Schluppeck, & Andrews, 2005; Ferber, Humphrey, & Vilis, 2005; Mandon & Kreiter, 2005; Altmann, Bulthoff, & Kourtzi, 2003; Hasson, Harel, Levy, & Malach, 2003; Kourtzi & Kanwisher, 2000; Kanwisher, Woods, Iacoboni, & Mazziotta, 1997; Malach et al., 1995).
These two early components (Bd and FGd) preceded the well-known VPP component, which most likely reflects specific structural encoding of faces (Rossion et al., 1999). Here, the timing of the VPP was slightly delayed, but this would be expected because critical facial features such as eyes were missing, and the faces were shown in profile instead of frontal view (Rossion et al., 2003; Itier & Taylor, 2002; Eimer, 1998, 2000; Jemel et al., 1999). VPP amplitudes were larger for ERPs elicited by the face probe when the faces (vs. the vase) were perceived as figure. This result is consistent with previous fMRI studies reporting greater FFA activation during face versus vase perception (Andrews et al., 2002; Hasson et al., 2001), and previous ERP (and fMRI) studies that found larger N170/VPP amplitudes (and FFA activation) associated with the perception of faces versus nonface objects (Holmes et al., 2005; Itier & Taylor, 2004; Rossion et al., 2003; McCarthy, Puce, Gore, & Allison, 1997). Consistent with several previous studies, right fusiform gyrus sources were identified for the N170/VPP modulation (Itier & Taylor, 2004; Rossion et al., 2003; Allison et al., 1999; Jemel et al., 1999; Puce et al., 1999), although a source in the left LOC was also identified.
In order to evaluate the reliability and functional characteristics of these early ERP components (Bd, FGd), two follow-up experiments were conducted. Experiment 2 investigated the role that selective attention to the face/vase regions might play in generating these early components, and Experiment 3 tested whether these components might simply reflect the allocation of spatially directed attention.
EXPERIMENT 2
Procedure
As in Experiment 1, subjects viewed the face–vase stimulus and three types of probes (face, vase, and border) while the ongoing EEG was recorded. Instead of reporting their percepts, however, subjects were instructed to selectively focus attention on the face regions or the vase region and respond to infrequent target probes in the attended region. If border assignment and figure–ground segregation occur automatically and are independent of attention, then the perception of the face or vase as figure should be equally likely regardless of which region is attended. Therefore, the percept-related components found in Experiment 1 should be absent in difference waves comparing ERPs elicited by attended versus ignored probes. Alternatively, if attention influences figure–ground assignment, explicit attention to the face or vase regions should bias that region to be perceived as figure more often than not. In this case, the associated ERP components should be similar to the percept-related components found in Experiment 1.
During each block of trials, subjects sustained attention to either the face or vase region (counterbalanced across blocks) and made a button-press response upon detection of infrequent longer-duration (184 msec) target probes (20%) appearing in the attended region, while ignoring probes in the unattended and border regions (Figure 4). The standard (nontarget) probes (80%) were of the same duration (100 msec) as in Experiment 1. The different types of probes were presented in random order with a 500–700 msec stimulus onset asynchrony. Behavioral performance was monitored throughout the experiment. Responses were counted as correct if made within 1500 msec of target stimuli and were otherwise counted as false alarms. Practice trials were administered until subjects' d′ values for target discrimination were greater than 2.0. During each block of trials, 150 probes (40 short-duration probes of each kind and 15 long-duration face and vase targets) were presented. Twelve blocks of trials were administered resulting in a total of 240 probes of each type for each attention condition.
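The block composition and probe durations above can be checked with simple arithmetic. The sketch assumes that Experiment 2 used the same 60 Hz monitor as Experiment 1 and that probe durations are whole numbers of video frames. Neither is stated explicitly for this experiment, and the variable names are illustrative.

```python
FRAME_RATE_HZ = 60                # monitor frame rate given for Experiment 1
frame_ms = 1000 / FRAME_RATE_HZ   # about 16.7 msec per frame


def n_frames(duration_ms):
    """Nearest whole number of video frames for a nominal duration."""
    return round(duration_ms / frame_ms)


standards_per_block = 3 * 40      # face, vase, and border standards
targets_per_block = 2 * 15        # long-duration face and vase probes
probes_per_block = standards_per_block + targets_per_block

blocks_total = 12
blocks_per_condition = blocks_total // 2   # attend-face vs. attend-vase
standards_per_type_per_condition = 40 * blocks_per_condition

print(probes_per_block)                      # 150
print(targets_per_block / probes_per_block)  # 0.2
print(standards_per_type_per_condition)      # 240
print(n_frames(100), n_frames(184))          # 6 11
```

Under these assumptions, the 100 msec standard corresponds to 6 frames and the 184 msec target to 11 frames (about 183 msec), and the long-duration probes make up 20% of each block.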
Stimuli and procedure used in Experiment 2. Subjects viewed the face–vase stimulus and, during different blocks, attended to either the face or vase region in order to detect infrequent longer-duration (184 msec) target probes. Nontarget (100 msec) face, vase, and border probes (white) were presented in random order.
Results
ERP Results
ERPs elicited by the onset of each standard probe were averaged separately as a function of whether attention was directed toward the face or vase region. To eliminate contamination from motor and response-related potentials, trials in which a manual response was made within 500 msec before or after probe onset were excluded from further analyses. Figure 5 shows grand-averaged ERPs, difference waves, and scalp topographies for the components of interest. The same early ERP components identified in Experiment 1 (Bd, FGd), as well as the VPP modulation, were also evident in this experiment.
ERPs (blue, red), difference waves (black dashed), and difference wave scalp topographies for the three types of probes (shown in the left column). When the borders were probed (top row), ERPs differed between the two attentional states from ∼100 to 190 msec (Bd). For the vase and face probes (middle and bottom rows), difference waves represent attended minus unattended regions. When the attended (vs. unattended) regions were probed, ERPs were more positive between ∼150 and 200 msec (FGd). Also, when the face regions were probed, the vertex positive potential (VPP) was larger for face versus vase attention (from ∼180 to 210 msec). Selection negativities (SNs) were also evident from ∼220 to 300 msec for attended versus ignored face and vase probes.
For each of the early components, mean amplitudes were tested with ANOVA across the same time windows and with the same factors as in Experiment 1 (except the factor “perception” was changed to “attention”). Main effects of attention are reported in Table 3. For the border probe, ERP amplitudes for the two attention conditions began to diverge at ∼100 msec and this difference was maximal over parietal–occipital scalp regions. The latency and topography of this effect were highly similar to those of the Bd found in Experiment 1. For the Bd, the main effect of attention was significant (p = .0001), and there was no interaction between attention and hemisphere.
Experiment 2: Main Effects
Probe (Time Window) | Amplitude, Face Attention | Amplitude, Vase Attention | Amplitude Difference | F(1, 17) | p |
---|---|---|---|---|---|
Border (132–172 msec) | −3.75 | −5.76 | 2.01 | 23.87 | <.0001 |
Vase (172–192 msec) | −3.18 | −2.45 | 0.73 | 2.49 | <.133* |
Face (172–192 msec) | −2.48 | −3.61 | 1.13 | 7.38 | <.015 |
Face (184–204 msec) | 3.66 | 2.54 | 1.12 | 5.72 | <.029 |
Amplitudes are mean voltages (μV) at electrode locations described in the text.
*Note that for the vase probe, t tests showed nearly significant differences (p < .057) for the right hemisphere channels.
For the face and vase probes, ERP amplitudes were more positive when that region was attended (vs. unattended) across a 172–192 msec time window over the occipital scalp. These effects closely resembled the FGd from Experiment 1. The main effect of attention for the face probe was significant (p = .015) and there was no interaction between attention and hemisphere. For the vase probe, the main effect of attention did not reach significance (p = .133) and there was no significant Attention × Hemisphere interaction. Post hoc t tests (justified by a significant Attention × Channel interaction) revealed that the FGd to the vase probes, averaged over the three right hemisphere channels, differed in amplitude between the attention conditions at a level approaching significance (p < .057; mean amplitude difference = 1.02 μV). The face probe also elicited a larger VPP component when the face region was attended, but at slightly earlier latencies compared to Experiment 1. This effect was significant across a 184–204 msec time window (p = .029), and there was a significant interaction between attention and hemisphere [F(2, 34) = 5.08, p = .012], with larger amplitudes over right central scalp regions. The vase probe did not produce any modulation corresponding to the VPP (p = .38).
Unlike Experiment 1, the ERPs to the attended face and vase probes included broad negativities between ∼200 and 300 msec poststimulus onset that appear equivalent to the previously reported selection negativity (SN) component (Hillyard & Anllo-Vento, 1998; Harter & Aine, 1984). The SN is typically observed when ERPs elicited by unattended stimuli (e.g., face probe when vase region is attended) are subtracted from the ERPs elicited by the same stimulus when attended. When attended locations are probed, the probes are selected as potential target candidates that require further perceptual analysis; this selection is reflected in the SN. No SNs were observed in Experiment 1 because subjects were simply reporting their perceptions and were not required to discriminate the probes. Because the focus of these experiments was on the earlier modulations, we did not test the SN component statistically.
To confirm that our attention manipulations were successful, we calculated d′ for each subject; d′ was similar for the face and vase attention conditions (3.13 and 2.97, respectively), indicating that subjects discriminated relevant targets accurately. Selectivity was also indicated in the P3 component elicited by attended targets. P3 amplitudes, measured at nine central–parietal electrodes from 400 to 500 msec, were larger for attended than unattended face targets [mean difference = 9.33 μV; F(1, 17) = 86.83, p < .000001] and vase targets [mean difference = 9.08 μV; F(1, 17) = 79.03, p < .000001]. These P3 amplitude differences were also positively correlated with d′ for both face targets (r = .49, p = .04) and vase targets (r = .53, p = .02), thus verifying the use of the P3 as an index of selective target discrimination.
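For readers unfamiliar with d′, the sketch below shows one common way to compute it from response counts: the z-transformed hit rate minus the z-transformed false-alarm rate. The log-linear correction (adding 0.5 to each count) is an assumption made here to keep the rates away from 0 and 1; the text does not state which correction, if any, was applied in the study, and the counts in the example are invented.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (assumed here, not taken from the study) adds 0.5
    to each count so that neither rate is exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Invented counts for illustration: 90 of 100 targets detected,
# 10 false alarms on 100 nontargets.
print(round(d_prime(90, 10, 10, 90), 2))
```

Values near 3, such as those reported for the face and vase attention conditions, correspond to high hit rates combined with very low false-alarm rates.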
Source Localization Results
All source estimates were similar to those of Experiment 1. Figure 6 shows estimated sources for the Bd in posterior-ventral LOC, sources for the FGd in anterior-ventral LOC, and generators of the VPP in the right fusiform gyrus and left LOC. The estimates for the vase probe difference waves contained a right hemisphere source in this experiment (vs. only a left hemisphere source in Experiment 1), which is consistent with the scalp distribution differences across the two experiments. Talairach coordinates for the sources are provided in Table 4.
Source analysis results (LAURA) for each of the difference wave components found in Experiment 2: border difference (Bd), figure–ground difference (FGd), and vertex positivity difference (VPP). Source estimates that were consistent across subjects are shown in red, orange, and yellow.
Experiment 2: Talairach Coordinates for Estimated Sources
Difference Component | x | y | z | Anatomical Region(s) |
---|---|---|---|---|
Bd | −19 | −86 | −9 | left middle occipital gyrus |
Bd | 21 | −84 | −10 | right middle occipital gyrus |
FGd (vase) | −34 | −77 | −10 | left middle occipital gyrus |
FGd (vase) | 28 | −80 | −11 | right middle occipital gyrus |
FGd (face) | −33 | −70 | −11 | left fusiform gyrus |
FGd (face) | 39 | −71 | −10 | right fusiform gyrus |
VPP | −30 | −77 | −10 | left fusiform gyrus |
VPP | 43 | −62 | −11 | right fusiform gyrus |
Coordinates represent the centers of mass of the LAURA sources that were consistent across subjects.
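As a rough illustration of how a center of mass can be obtained for a cluster of source points, the sketch below computes a weighted centroid in three dimensions. Weighting each point by its estimated source strength is an assumption made for this example; the text does not specify how the centers of mass were derived, and the points and weights shown are invented.

```python
def center_of_mass(points, weights):
    """Weighted centroid (x, y, z) of a set of 3-D points.

    points: list of (x, y, z) tuples, e.g. solution-point coordinates in mm.
    weights: non-negative weight per point (hypothetically, source strength).
    """
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return tuple(
        sum(w * p[axis] for p, w in zip(points, weights)) / total
        for axis in range(3)
    )


# Invented example: two points, the second weighted three times as strongly.
print(center_of_mass([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 3.0]))
```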
Results Summary and Discussion
In this experiment, even though subjects were only concerned with attending to regions and discriminating longer-duration target probes, the Bd and FGd components were very similar to those seen in Experiment 1. These results suggest that components associated with early stages of figure–ground perception can be produced by directing attention to the figure region and support the view that figure–ground perception and selective attention are closely linked. The VPP enhancement for face versus vase attention in Experiment 2 further supports this view by providing evidence that figure–ground segregation was completed and face representations activated even when subjects were not required to report their face/vase perceptions.
However, an alternative explanation for these early ERP modulations is that they may be based on differences in the spatial focus of attention and not necessarily on figure–ground perception. Specifically, in Experiment 2, attention was spatially divided between the two face regions or focused within a single central vase region (another way to view this would be in terms of a larger vs. smaller attentional “spotlight”). In Experiment 1, even though the task was to report figure–ground perceptions, the distribution of spatial attention may have covaried with perception and contributed to the ERP modulations. A third experiment was designed to explicitly test this alternative hypothesis.
EXPERIMENT 3
Procedure
The design of this experiment was identical to that of Experiment 2, but the stimuli were tailored to allow isolation of spatial attention modulations from figure–ground effects. The solid-line borders of the face–vase stimulus were replaced with sparsely dotted wavy lines to minimize figure–ground segregation (see Figure 7). Subjects viewed the three-part display, which was physically identical in luminance and approximately equal in size to the face–vase stimulus. The probes in this experiment were also matched in luminance and overall size to those used in Experiments 1 and 2. Subjects were instructed to attend to either the spatially separated regions or the central region (counterbalanced across blocks) in order to detect infrequent longer-duration probes (targets). As in Experiment 2, border probes were always task-irrelevant. All target/nontarget probabilities, probe durations, stimulus onset asynchronies, and number of trials were identical to those of Experiment 2.
Stimuli and procedure used in Experiment 3. Subjects viewed the “figure-less” gray display, and during different blocks, either divided their attention to the outer regions or focused their attention on the central region (regions were defined by the sparsely dotted black lines). The task was to detect infrequent longer-duration (184 msec) probes in the attended region. Nontarget (100 msec) divided, central, and border probes (white) were presented in random order.
Results
ERP Results
ERP amplitude differences were tested by using the same latency windows and ANOVA as in Experiment 2. None of the early components observed in Experiments 1 and 2 were evident in this experiment. In particular, when the borders were probed, ERPs did not differ between the two attentional states from ∼132 to 172 msec. For the central and divided probes, when the attended (vs. unattended) region was probed, ERPs were not more positive between ∼172 and 192 msec. Also, when the divided regions were probed, the VPP was not larger for divided versus central attention (from ∼184 to 204 msec), as the stimuli no longer led to a face perception. Table 5 shows the results from ANOVA and Figure 8 shows the ERPs, difference waves, and scalp topographies for the same electrodes and time windows as shown for Experiment 2.
Experiment 3: Main Effects
Probe (Time Window) | Divided Attention (μV) | Central Attention (μV) | Difference (μV) | F(1, 17) | p |
---|---|---|---|---|---|
Border (132–172 msec) | −3.01 | −3.22 | 0.21 | 0.46 | <.505 |
Central (172–192 msec) | −1.56 | −1.47 | 0.09 | 0.05 | <.822 |
Divided (172–192 msec) | −1.98 | −0.71 | −1.27 | 17.84 | <.001ᵃ |
Divided (184–204 msec) | 2.49 | 2.15 | 0.34 | 1.24 | <.282 |
Amplitudes are mean voltages (μV) at electrode locations described in the text.
ᵃNote that the divided probe effect reflects the early SN; the effect is of opposite polarity to the face probe effects from Experiments 1 and 2.
ERPs (blue, red), difference waves (black dashed), and difference wave scalp topographies for the three types of probes (shown in the left column). The ERP differences found in the first two experiments were absent in this experiment. Latency windows tested are marked with solid bars. As in Experiment 2, selection negativities (SNs) were evident for attended versus ignored probes.
As expected, the SN was produced under these conditions. As Figure 8 shows, the SN for the divided probes began earlier in this experiment than the SN for the face probes in Experiment 2, possibly due to the absence of figure–ground processing. This difference led to a significant main effect of attention for the divided probes, but as is clear from the difference waves and scalp topographies, this effect was of opposite (negative) polarity compared to the positive difference in Experiments 1 and 2.
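The point about polarity can be made concrete. For a within-subject factor with two levels, the main-effect F(1, n − 1) equals the square of the paired t statistic computed on each subject's condition means, so F carries no information about the sign of the effect; the direction has to be read from the mean difference. The sketch below illustrates this with invented per-subject amplitudes. The function name is hypothetical, and the code is not the analysis pipeline used in the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_main_effect(cond_a, cond_b):
    """Return (mean difference, F, error df) for a two-level within-subject factor.

    cond_a, cond_b: one mean amplitude (μV) per subject for each condition,
    already averaged over any other factors (e.g., hemisphere).
    F is the squared paired t statistic, with df = (1, n - 1).
    """
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 subjects")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_diff = mean(diffs)
    t = mean_diff / (stdev(diffs) / sqrt(n))
    return mean_diff, t * t, n - 1


# Invented amplitudes for four hypothetical subjects.
attended = [1.0, 2.0, 3.0, 4.0]
unattended = [0.0, 0.0, 0.0, 0.0]

print(paired_main_effect(attended, unattended))  # positive mean difference
print(paired_main_effect(unattended, attended))  # same F, negative mean difference
```

Swapping the two conditions leaves F unchanged and only flips the sign of the mean difference, which is why a significant F for the divided probes can coexist with an effect in the opposite direction to the FGd.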
Results Summary and Discussion
Unlike Experiments 1 and 2, the Bd, FGd, and VPP were not produced in Experiment 3. These results argue against the possibility that the ERP differences observed in the first two experiments were due simply to enhancements of probe-evoked activity by spatially directed attention. Instead, the early ERP modulations found in Experiments 1 and 2 would appear to reflect early stages of figure–ground processing.
GENERAL DISCUSSION
In Experiment 1, two early ERP modulations were found to be associated with perceptions of the face–vase stimulus. When the shared borders between the face and vase regions were probed during the two different perceptual states, probe-evoked ERPs differed across the interval ∼100–190 msec (the Bd component). When the face or vase regions were probed, ERPs were more positive from ∼150 to 200 msec when the probed region was perceived as figure compared to ground (the FGd component). Both the Bd and FGd preceded the face-specific VPP component (∼190–240 msec), which was larger during face versus vase perceptions. Source localization estimates pointed toward LOC generators for both the Bd and FGd. In Experiment 2, we found that these same probe-evoked modulations were produced when subjects attended to one region while ignoring the other, suggesting that selective attention can influence figure–ground segregation. In Experiment 3, we ruled out the possibility that these modulations were simply produced by the allocation of spatial attention to the probed regions; when those regions lacked distinct contours that promote figure/ground segregation, the early components were absent. These results point to distinctive, sequential levels of figure–ground processing that were evident in modulations of ERPs elicited by probe stimuli delivered to different image regions.
Stages of Figure–Ground Perception
The perception of objects in everyday scenes is likely to involve several stages of visual information processing including contour detection, border ownership, and figure–ground segregation. In the present study, we did not investigate contour detection per se in that the contour (border) probes were always irrelevant, and we measured ERP modulations based on which perceptual figure the contour defined. Therefore, we did not expect to isolate ERP differences associated with this stage of processing. Following contour detection, a subsequent stage has been hypothesized to assign border ownership to regions adjacent to each contour in order to define the edges of figural forms (Zhaoping, 2005; von der Heydt et al., 2003; Albright & Stoner, 2002; Zhou et al., 2000; Koffka, 1935; Rubin, 1921). By probing the shared borders between the faces and vase, we found an early perception-based ERP modulation (Bd; ∼100–190 msec) that we hypothesize is associated with assignment of border ownership. It has been posited that the regions grouped with the figure are then segregated from the background during a subsequent stage of processing (Roelfsema, Tolboom, & Khayat, 2007; Roelfsema et al., 2002; Rubin, 2001; Vecera & O'Reilly, 1998; Sporns et al., 1991). The ERP modulations we observed for the face and vase probes (FGd; ∼150–200 msec) were likely produced during this figure–ground segregation stage.
To be clear, we do not claim to have measured the initial border assignment. This must have occurred before subjects reported their percepts in Experiment 1 and most likely at the beginning of each block in Experiment 2 when subjects first allocated their attention to one region or the other. Instead, we measured the time at which neural processing first differs when the border has (already) been assigned to one figure versus another. If a region is perceived as a foreground object over a period of a few seconds, it is plausible that side-selective neurons (and possibly, other relevant neurons) continue to fire in order to code border ownership, therefore allowing a stable percept to be maintained over time. Single-unit studies (e.g., Qiu, Sugihara, & von der Heydt, 2007) have shown that border ownership signals in V2 persist for at least several hundred milliseconds, implying that border assignment is maintained across time by the sustained firing of side-selective neurons. By probing the borders during temporarily stable perceptual states, we sought to track the ongoing (sustained) neural activity that supports border ownership.
The face minus vase Bd was a positivity at ∼110–190 msec. In both Experiments 1 and 2, we made no a priori predictions regarding the polarity of this difference. When the same border is assigned to one object or another, the relative firing rates of different side-selective neurons have been shown to differ (Qiu et al., 2007; von der Heydt et al., 2003; Zhou et al., 2000). For example, if a border is assigned to an object on the left, the left-side-selective cells fire more than the right-side-selective cells. Because different cells are more active when the border is assigned to a particular region, different patterns of postsynaptic potentials are likely to be produced, which in turn produce differences in electrical potentials measurable at the scalp. Although side-selective neurons have only been studied so far in V2, this same argument would apply for cells that contribute to border assignment in the LOC. In the current study, we measured these differences in border ownership coding by comparing ERPs when the border was currently assigned to one object versus another (face versus vase). We called this component a “Bd” (as opposed to a border positivity) because the polarity of this component depends on the differential synaptic activity patterns and is not readily interpretable, at least for now.
A challenge for future research will be to develop ways to isolate the early stages of figure–ground processing by selectively biasing perception while maintaining equivalent levels of sensory stimulation across conditions (see Palmer & Brooks, 2008; Appelbaum, Wade, Vildavski, Pettet, & Norcia, 2006 for promising approaches). In one such approach, Scholte, Jolij, Fahrenfort, and Lamme (2008) used a technique in which figures were created by orientation-defined texture boundaries. ERPs elicited by single figures differed from background-only stimuli from ∼90 to 120 msec, whereas ERPs elicited by multiple (overlapping) figures differed from single figures from ∼110 to 180 msec. Although the task and stimuli used by Scholte et al. (2008) were different from the present study, their results are consistent with ours in that two sequential phases of figure–ground processing were identified, the first associated with borders and the second with surface segmentation. Differences in the timing of figure–ground modulations across studies may be attributed to differences in figure–ground ambiguity, with less ambiguity resulting in earlier modulations. Also, although Scholte et al. compared ERPs elicited by border-present versus border-absent displays, the borders in the current study were always present and ERPs were compared based on whether the borders were assigned to one object versus another (face or vase).
Border Ownership, Early Form Perception, and Selective Attention
When objects are distinctly separated in space, figure–ground segregation has been shown to occur preattentively (Heinrich, Andres, & Bach, 2007; Scholte, Witteveen, Spekreijse, & Lamme, 2006; Schira, Fahle, Donner, Kraft, & Brandt, 2004; Fahle, Quenzer, Braun, & Spang, 2003; Marcus & Van Essen, 2002; Schubo, Meinecke, & Schroger, 2001; Super, Spekreijse, & Lamme, 2001). However, when objects are partially occluded and their contours are more ambiguous with respect to border ownership, selective attention seems to play a critical role in segregating figure from background (Nelson & Palmer, 2007; Qiu et al., 2007; Han, Jiang, Mao, Humphreys, & Gu, 2005; Han, Jiang, Mao, Humphreys, & Qin, 2005; Vecera, Flevaris, & Filapek, 2004; Driver, Davis, Russell, Turatto, & Freeman, 2001; Freeman, Sagi, & Driver, 2001; Scholl, 2001; Nakayama, He, & Shimojo, 1995; Wong & Weisstein, 1982). Recent neurophysiological research on nonhuman primates suggests shared circuitry for selective attention and border ownership mechanisms (Qiu et al., 2007; Bauer & Heinze, 2002; also see Li, Piech, & Gilbert, 2004, 2008 for attentional influences on contour integration). By changing only the context in the stimulus displays, different stimuli that lead to different figure–ground perceptions can be presented, while local input to individual receptive fields is held constant (Qiu et al., 2007; Qiu & von der Heydt, 2005; Rossi, Desimone, & Ungerleider, 2001; Zhou et al., 2000; Lamme, Zipser, & Spekreijse, 1998; Zipser, Lamme, & Schiller, 1996). This type of design allows an excellent level of experimental control because sensory input to individual cells remains fixed, whereas figure–ground perception is manipulated. 
Experiments utilizing this approach have shown strong modulations of single-cell responses in V1 and/or V2 based on whether physically identical regions were part of a figure or background (Zhou et al., 2000; Baumann, van der Zwan, & Peterhans, 1997; Zipser et al., 1996; Lamme, 1995; Grosof, Shapley, & Hawken, 1993).
Recently, Qiu et al. (2007) measured firing rates of V2 cells in nonhuman primates while single figures, separated figures, or overlapping figures were presented. They found that although V2 cells can code border ownership in the absence of directed attention for single figures (also see Rossi et al., 2001), the firing rates of these same cells were modulated by attention in the multiple figure conditions. Interestingly, for separated figures, the modulation by attention was monophasic and maximal from ∼150 to 200 msec, whereas for overlapping figures, the attention modulation appeared to be multiphasic with a small early phase at ∼80 msec, and two large phases, one from ∼100 to 150 msec and the other from ∼150 to 200 msec. Notable similarities are evident between the temporal phases of these attention effects and those found in the current study: Bd from ∼100 to 190 msec, and FGd from ∼150 to 200 msec. It should be noted, however, that our source estimates suggested generators in the LOC, whereas Qiu et al.'s (2007) results were based on measurements in V2. Future research may help clarify the roles of each of these regions in border ownership assignment and its modulation by attention.
The LOC and Figure–Ground Perception
One of the primary candidates for brain regions involved in figure–ground perception is the LOC. In recent years, this region of visual cortex, located lateral to the fusiform gyrus and extending anteriorly and ventrally, has consistently shown stronger activation in response to objects versus nonobjects (Grill-Spector et al., 1999; see Grill-Spector, 2003, and Grill-Spector, Kourtzi, & Kanwisher, 2001, for reviews). Kourtzi and Kanwisher (2001) used an fMRI adaptation approach to suggest that shape, but not contour processing, is carried out in the LOC (also see Baylis & Driver, 2001). However, it is important to note that Kourtzi and Kanwisher (2001) tested contour detection, not border ownership. Contour detection is hypothesized to occur early, in V1 or V2, whereas border ownership may require LOC activation, and potentially provide feedback to V1/V2 (Lamme, Zipser, & Spekreijse, 2002). Experiments 1 and 2 of the current study allowed isolated measurements of border ownership while contours were held constant (the border was always present and the border probe always irrelevant). The sources of the border difference ERP component (Bd) were estimated to reside in ventral-posterior occipital cortex (in or near the LOC).
It is interesting to note that “shape” stimuli used in experiments focused on the LOC are often identical to stimuli used to elicit figure–ground perceptions (Baylis & Driver, 2001; Hasson et al., 2001; Kourtzi & Kanwisher, 2001). Thus, it is difficult to determine whether the LOC response is more closely associated with shape encoding or figure–ground segregation. In the current study, the enhanced positivity found for probes delivered to regions perceived as figure (FGd) may well be related to early shape processing as opposed to figure–ground segregation. In fact, one of the Gestalt laws of perception states that regions perceived as figure have shape, whereas regions considered background appear shape-less (Rubin, 2001). Although our source localization estimates pointed to anterior LOC generators for the FGd component, it remains to be resolved whether this component reflects figure–ground segregation or low-level shape perception. Further research will be necessary to address this issue and to assess the role of the LOC in border ownership assignment, figure–ground segregation, and early shape encoding.
Acknowledgments
This work was supported in part by NIH Grants 5 T32 MH20002 and 2 R01 EY016984-35. The Cartool software (http://brainmapping.unige.ch/Cartool.php) used for LAURA source analyses was programmed by Denis Brunet, from the Functional Brain Mapping Laboratory, Geneva, Switzerland, and is supported by the Center for Biomedical Imaging (CIBM) of Geneva and Lausanne.
Reprint requests should be sent to Michael A. Pitts, School of Medicine, Department of Neurosciences, University of California, San Diego, 9500 Gilman Drive MC 0608, La Jolla, CA 92093-0608, or via e-mail: michaelapitts@ucsd.edu.