Edge-assignment determines the perception of relative depth across an edge and the shape of the closer side. Many cues determine edge-assignment, but relatively little is known about the neural mechanisms involved in combining these cues. Here, we manipulated extremal edge and attention cues to bias edge-assignment such that these two cues either cooperated or competed. To index their neural representations, we flickered figure and ground regions at different frequencies and measured the corresponding steady-state visual-evoked potentials (SSVEPs). Figural regions had stronger SSVEP responses than ground regions, independent of whether they were attended or unattended. In addition, competition and cooperation between the two edge-assignment cues significantly affected the temporal dynamics of edge-assignment processes. The figural SSVEP response peaked earlier when the cues causing it cooperated than when they competed, but sustained edge-assignment effects were equivalent for cooperating and competing cues, consistent with a winner-take-all outcome. These results provide physiological evidence that figure–ground organization involves competitive processes that can affect the latency of figural assignment.
Edge-assignment is the most conspicuous aspect of figure–ground organization because it governs not only the relative depth of the two regions adjacent to the edge but also the perceived shape of the closer region (e.g., Palmer, 1999). These phenomena can be demonstrated by the well-known vase–faces image in Figure 1 (Rubin, 1921). When the edges are assigned to the common inner region, the observer perceives a closer, black vase against a farther white background. However, when the edges are assigned to the outer regions, the observer perceives the same image as depicting a profoundly different scene: two white profile faces against a black background. A diverse set of image-based cues are known to influence edge-assignment, including convexity (Kanizsa & Gerbino, 1976), relative edge-region motion (Palmer & Brooks, 2008; Yonas, Craton, & Thompson, 1987), and extremal edges (Palmer & Ghose, 2008), among others. Top–down, nonimage factors can also be important, however, as indicated by the effects of previous experience (Peterson & Enns, 2005; Peterson & Gibson, 1994) and attention (Vecera, Flevaris, & Filapek, 2004; Baylis & Driver, 1995). Many of these cues are often simultaneously present for the same edge within the same scene, in which case they can bias its edge-assignment in a common direction (cooperative cue interaction) or in opposite directions (competitive cue interaction). Integration of these cues is critical for determining perceived edge-assignment. The dynamics of cue integration in edge-assignment have only recently been investigated behaviorally (e.g., Peterson & Skow, 2008; Burge, Peterson, & Palmer, 2005; Peterson & Lampignano, 2003), and much remains to be discovered about the neural underpinnings of figural cue integration (although see Qiu & von der Heydt, 2005).
Here, we examine cooperative and competitive interactions of edge-assignment cues during and after the determination of edge-assignment in the human brain. To do this, we used the steady-state visual-evoked potential (SSVEP) technique to measure the neural representation of figural (edge-assigned) regions and ground (edge-not-assigned) regions as has been reported previously by Appelbaum, Wade, Pettet, Vildavski, and Norcia (2008), Parkkonen, Andersson, Hamalainen, and Hari (2008), and Appelbaum, Wade, Vildavski, Pettet, and Norcia (2006). The SSVEP is the sinusoidal electrophysiological response of visual cortex to rapid, flickering visual stimulation (Regan, 1988). This technique has been used previously to study attentional modulation (e.g., Ding, Sperling, & Srinivasan, 2006; Muller, Malinowski, Gruber, & Hillyard, 2003; Muller & Hubner, 2002) and other processes. The flicker frequency of the driving visual item serves as a “tag” (e.g., Srinivasan, Russell, Edelman, & Tononi, 1999; Tononi, Srinivasan, Russell, & Edelman, 1998) for that item in the EEG, allowing activity related to simultaneously presented items to be separated despite the poor spatial resolution of EEG for differentiating retinotopic activations in cortex. For instance, neural activity related to Visual Item A flickering at 10 Hz can be indexed by isolating and plotting oscillatory EEG activity at 10 Hz (and/or its harmonic frequencies) and that related to a simultaneously presented Visual Item B flickering at 6 Hz can be indexed by isolating and plotting oscillatory EEG activity at 6 Hz (and/or its harmonic frequencies). Isolation of oscillatory activity in a frequency band can be accomplished using frequency domain methods such as Fourier analysis (see Figure 2A and B).
Using the SSVEP technique, in combination with EEG source localization, Appelbaum et al. (2006, 2008) found that figural regions (tagged with one flicker frequency), but not ground regions (tagged with another flicker frequency), were preferentially represented by lateral visual cortex, including areas such as lateral occipital complex (Grill-Spector, Kourtzi, & Kanwisher, 2001; Kourtzi & Kanwisher, 2000, 2001). In contrast, neural activity related to ground regions was preferentially routed toward more dorsal cortical areas. This effect occurred regardless of the cues used to establish figure–ground organization. Appelbaum et al. (2008) also found nonlinear spatial interactions between figure and ground regions by measuring power at SSVEP interaction frequencies (e.g., Zemon & Ratliff, 1984). Using a similar region-tagging-by-frequency method, Parkkonen et al. (2008) showed participants Rubin's vase–faces image (e.g., Figure 1) and tagged face and vase regions with different dynamic noise frequencies. They found that the tag-related activity in early visual cortex (including primary visual cortex) varied with the perceptual states reported by the observer during spontaneous alternation of the bistable stimulus. When the observer saw the face regions as figural, the power in the corresponding face-tag frequency band was stronger than in the vase-tag frequency band. Power was stronger in the vase frequency band when that region was perceived as figural.
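As a minimal sketch of the frequency-tagging idea described above, the following Python snippet estimates the amplitude at each tag frequency by projecting a signal onto a cosine and a sine at that frequency (a single-bin Fourier analysis). The synthetic signal, its amplitudes, the 2-sec duration, and the function name `tag_amplitude` are illustrative assumptions and not the authors' analysis pipeline; the 512 Hz rate follows the sampling filter quoted in the EEG methods.

```python
import math

def tag_amplitude(signal, sample_rate_hz, tag_hz):
    """Amplitude of the sinusoidal component of `signal` at `tag_hz`.

    Projects the signal onto a cosine and a sine at the tag frequency,
    which is equivalent to evaluating one bin of a discrete Fourier transform.
    """
    n = len(signal)
    cos_sum = sum(x * math.cos(2 * math.pi * tag_hz * i / sample_rate_hz)
                  for i, x in enumerate(signal))
    sin_sum = sum(x * math.sin(2 * math.pi * tag_hz * i / sample_rate_hz)
                  for i, x in enumerate(signal))
    return 2.0 * math.hypot(cos_sum, sin_sum) / n

SAMPLE_RATE_HZ = 512   # sampling rate quoted in the EEG methods
DURATION_S = 2         # assumed; gives a whole number of cycles for both tags

# Synthetic "EEG": Item A tagged at 10 Hz, Item B tagged at 6 Hz
# (amplitudes and phase are hypothetical).
times = [i / SAMPLE_RATE_HZ for i in range(SAMPLE_RATE_HZ * DURATION_S)]
eeg = [1.5 * math.sin(2 * math.pi * 10.0 * t)
       + 0.8 * math.sin(2 * math.pi * 6.0 * t + 0.3)
       for t in times]

amp_item_a = tag_amplitude(eeg, SAMPLE_RATE_HZ, 10.0)
amp_item_b = tag_amplitude(eeg, SAMPLE_RATE_HZ, 6.0)
print(round(amp_item_a, 3), round(amp_item_b, 3))
```

Because the analysis window holds a whole number of cycles of both tags, the two components are orthogonal and each estimate recovers only its own item's amplitude, which is the property that lets simultaneously presented regions be separated in the EEG.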
Using similar SSVEP methods, the present study focused on how the temporal dynamics of figure–ground organization (i.e., the time course of changes in the neural representations of a figural region and its adjacent ground region) are affected by competition between figure and ground cues when they are integrated to determine the final perceptual result. We applied frequency tags to two adjacent regions (i.e., a bipartite display with figure and ground regions) by contrast-reversing each region's checkerboard texture (Figure 2C) at a different frequency. We independently applied two edge-assignment cues: extremal edges (Palmer & Ghose, 2008), which were manipulated by display characteristics, and attention (Vecera et al., 2004; Baylis & Driver, 1995), which was manipulated by task instructions. The display-based cue of extremal edges nearly always dominated the final perceived figure–ground organization according to participants' reports. Thus, the region with the extremal edge was seen as figural relative to the adjacent checkerboard region and the adjacent region was seen as ground relative to the region with the extremal edge. In some cases, the two cues (i.e., extremal edges and attention) cooperated, whereas in other cases, they competed. This allowed us to examine the neural dynamics of cue cooperation and competition even though the ultimate perceptual result was identical in both cases. The high temporal resolution of EEG allowed us to track these dynamics both while edge-assignment was being determined and after it was established. Furthermore, we included trials in which the extremal edge cue reversed its direction during the trial (Figure 2D), allowing us to assess edge-assignment dynamics as they changed from competitive to cooperative and vice versa. Our use of image-based cues to guide the figure–ground reversals is unlike that of Parkkonen et al. (2008), who relied upon spontaneous reversals of the bistable stimulus (possibly due to changes in top–down cues such as attention).
Finally, because we independently manipulated the location of attention and perceived edge-assignment (as determined by the dominant extremal edge cue), our experiment enabled a dissociation of attention and edge-assignment. This dissociation is especially important in light of recent results showing strong associations between edge-assignment and attention (Nelson & Palmer, 2007; Qiu, Sugihara, & von der Heydt, 2007; Lazareva, Castro, Vecera, & Wasserman, 2006) and questions about the adequacy of attentional controls in previous SSVEP studies by Appelbaum et al. (2006, 2008) and Parkkonen et al. (2008). In the figural shape discrimination condition of Appelbaum et al., the figural region was task-relevant and, therefore, clearly attended. In the letter discrimination (attention control) condition, the figural region was not task-relevant because participants had only to monitor a stream of letters that were superimposed on the figural region to determine whether a target was present. Appelbaum et al. assumed that the figural region was not attended in the letter discrimination task, but the locus of spatial attention, nevertheless, clearly overlapped the figural region. It is therefore possible that attention overlapping the figural region in this task may have contributed to Appelbaum et al.'s results, because edge-assignment and the location of attention were not fully separated. In the Parkkonen et al. (2008) study, the location of attention was not explicitly manipulated, and thus it is impossible to determine the contribution of attention to their results. In the present experiment, we independently manipulated the location of spatial attention and edge-assignment by having observers attend to the figural region half of the time and to the ground region during the other half, thus separating attentional effects from edge-assignment effects.
Sixteen (8 men, mean age = 20.5 years) right-handed, University of California, Berkeley, students participated. All had normal or corrected-to-normal vision and no history of neurological or psychiatric illness. Those with a family history of seizure were excluded to avoid undiagnosed photosensitivity to flicker (e.g., Fisher, Harding, Erba, Barkley, & Wilkins, 2005).
Displays and Design
The rectangular displays (20-in. CRT, 100 Hz, viewing distance = 85 cm) subtended 5.82° (vertically) by 11.64° on a neutral gray background (49.50 cd/m2), divided into two equally sized 5.82° square regions along the vertically oriented meridian (Figure 2). Each region was filled with a black-and-white checkerboard texture rendered on the surface of a cylinder (Figure 2C). The size of the textured squares differed over the cylinder's curved surface, being largest (0.371° square) in the cylinder's middle (closest portion) and foreshortened on the sides. The cylinder's shading pattern was consistent with illumination from directly in front of the center of the display. One cylinder was oriented vertically (Figure 2C, right region) and the other horizontally. The vertical edge at the center of the display constituted a particular type of depth edge—called an extremal edge—between the horizontal and vertical cylinders. An extremal edge is a horizon of self-occlusion where a convex curved surface disappears from a particular viewpoint, signaled by a gradient in luminance and/or texture on the curved side (e.g., Barrow & Tenenbaum, 1981). Extremal edges strongly bias figural assignment of the edge toward the gradient side (Palmer & Ghose, 2008). The figural and ground regions did not differ in any way other than the orientation of the cylinders and their flicker frequency. The assignment of flicker frequency to figure and ground regions was counterbalanced across trials.
The checkerboard texture's contrast in one region cycled at 6.25 Hz (i.e., the contrast reversed 12.5 times/sec), whereas the other cycled at 10.0 Hz. Perceived contrast between the “white” (light) and “black” (dark) rectangular texture elements was lower at the higher frequency. To equate the perceived contrasts within the two regions, each participant adjusted the lower-frequency region's contrast until its perceived contrast was equal to that of the higher-frequency region. The calibration stimulus was similar to that used in the main experiment but with flat texture on both sides. The higher-frequency region luminances were: light rectangles, 99.20 cd/m2; dark rectangles, 0.01 cd/m2. The average adjusted luminances for the lower-frequency side were: light rectangles, 96.4 cd/m2; and dark rectangles, 4.86 cd/m2.
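The flicker arithmetic above can be checked with a short sketch. It assumes the 100 Hz CRT refresh rate given in the display description, and it uses Michelson contrast as one common way to summarize the reported luminances; the text does not say which contrast metric, if any, the authors computed, so that choice and the function names are assumptions made here for illustration.

```python
REFRESH_HZ = 100.0  # CRT refresh rate from the display description

def frames_per_cycle(flicker_hz, refresh_hz=REFRESH_HZ):
    """Video frames spanned by one full flicker cycle (two contrast reversals)."""
    return refresh_hz / flicker_hz

def michelson_contrast(light_cd_m2, dark_cd_m2):
    """(Lmax - Lmin) / (Lmax + Lmin); an assumed summary metric."""
    return (light_cd_m2 - dark_cd_m2) / (light_cd_m2 + dark_cd_m2)

frames_low = frames_per_cycle(6.25)    # lower-frequency region
frames_high = frames_per_cycle(10.0)   # higher-frequency region
reversals_per_sec_low = 2 * 6.25       # two reversals per cycle

# Luminances reported in the text (cd/m2).
contrast_high = michelson_contrast(99.20, 0.01)
contrast_low = michelson_contrast(96.4, 4.86)

print(frames_low, frames_high, reversals_per_sec_low)
print(round(contrast_high, 4), round(contrast_low, 4))
```

Both tag frequencies divide the refresh rate into a whole number of frames per cycle (16 and 10), so each flicker can be rendered without frame-timing jitter. Under the Michelson assumption, the adjusted lower-frequency region has a physical contrast of roughly 0.90, compared with nearly 1.0 for the higher-frequency region.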
The experiment comprised six blocks of 192 trials (1152 total trials, 12 repeated measures/block). Each trial comprised a 4000-msec stimulus period preceded by a variable-length intertrial interval that varied randomly between 1000 and 2000 msec. Blocks lasted approximately 18 min and contained five 15-sec breaks, creating six 3-min sub-blocks. The attention condition was manipulated over sub-blocks. There were three attend-left sub-blocks and three attend-right sub-blocks within each block in random order. A sub-block instruction screen indicated which region (left or right) should be attended and judged. Participants were instructed to maintain fixation at the center of the screen throughout the trial while attending covertly to the task-relevant region. Participants reported whether they perceived the attended region as figure or ground with a button press at the beginning of the trial and again at any point that the perception of the attended region changed. Button assignments were counterbalanced across participants. Eye movements were monitored in two ways. An infrared camera (for low-light conditions) was focused on the eyes of the participant and the image was displayed to the experimenter. The experimenter marked trials with an eye movement (in any direction) with a button press and these trials were later removed from analysis. Additionally, the EOG trace was used off-line to detect eye movements and these trials were also removed. On average, 5.2% (maximum 8.9%) of trials were removed based on these criteria. We also conducted a separate experiment to verify fixation (see below).
EEG Data Collection Methods
We measured EEG using a 64 (modified 10–20 system configuration) + 4 (reference and EOG) channels Biosemi ActiveTwo system (Amsterdam, Netherlands: www.biosemi.com) with active electrodes. No acquisition filtering was done beyond the sampling filter (512 Hz). Data were recorded relative to the Common Mode Sense (CMS) active electrode. The CMS electrode forms a feedback loop with the Driven Right Leg (DRL) passive electrode, which drives the average potential of the subject (the Common Mode voltage) as close as possible to the analog–digital converter (ADC) reference voltage (i.e., “zero”). Data were referenced off-line to a nose reference electrode. The CMS electrode was located approximately halfway between Pz and PO3 and the DRL electrode was located approximately halfway between Pz and PO4.
Signal Processing Methods
Blink and eye movement artifacts were removed automatically. Eye artifacts were marked when the max–min voltage difference within a 200-msec interval exceeded a threshold. The most effective threshold for each participant (range: 75 to 120 μV) was determined by reviewing a subset of the data. The experimenter manually examined the data to detect incorrectly marked trials, missed artifacts, and muscle artifacts. On average, 20.3% (equal between conditions, 36% maximum) of the trials for a given subject were lost to artifacts, including the 5.2% (on average) lost to eye movements described above. Artifact-free data were segmented into 4800 msec epochs (−650 msec to 4150 msec).
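The max–min criterion described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether the 200-msec windows were sliding or non-overlapping (a sliding window is assumed here), and the function name, the 100 μV threshold (chosen from within the reported 75 to 120 μV range), and the synthetic traces are hypothetical.

```python
import math

def has_artifact(samples_uv, sample_rate_hz, threshold_uv, window_ms=200):
    """Return True if any sliding window of `window_ms` has a
    max - min voltage difference above `threshold_uv`."""
    win = max(1, int(round(window_ms * sample_rate_hz / 1000.0)))
    last_start = max(1, len(samples_uv) - win + 1)
    for start in range(last_start):
        chunk = samples_uv[start:start + win]
        if max(chunk) - min(chunk) > threshold_uv:
            return True
    return False

SAMPLE_RATE_HZ = 512   # sampling rate quoted in the EEG methods
THRESHOLD_UV = 100.0   # assumed value within the reported 75-120 microvolt range

# Synthetic traces (hypothetical): a clean 10 Hz oscillation, and the same
# trace with a blink-like 150 microvolt deflection added to ~100 msec of data.
clean = [10.0 * math.sin(2 * math.pi * 10.0 * i / SAMPLE_RATE_HZ)
         for i in range(2 * SAMPLE_RATE_HZ)]
blink = list(clean)
for i in range(300, 350):
    blink[i] += 150.0

print(has_artifact(clean, SAMPLE_RATE_HZ, THRESHOLD_UV))
print(has_artifact(blink, SAMPLE_RATE_HZ, THRESHOLD_UV))
```

A per-participant threshold, as used in the study, would simply be passed as a different `threshold_uv` for each data set.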
The significance of differences between waveforms was assessed using a paired-samples (within-subjects), permutation t-test procedure (Blair & Karniski, 1993). This procedure makes no assumptions about the distributions or autocorrelations of the data and maintains a 5% experiment-wise error rate (i.e., it corrects for multiple comparisons). The procedure compared waveforms point by point along the time dimension, covering all time points from 0 to 4000 msec unless a temporal region of interest is noted.
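One common way to implement a paired permutation test with experiment-wise error control is the maximum-statistic ("t-max") approach: the sign of each participant's difference waveform is flipped at random, and the largest absolute t value across time points is recorded for each permutation. The sketch below illustrates that general idea under stated assumptions; it is not taken from the authors' code, and the function names, the number of permutations, and the synthetic data are hypothetical.

```python
import math
import random

def t_values(diffs):
    """One-sample t statistic at each time point; diffs[participant][time]."""
    n = len(diffs)
    stats = []
    for column in zip(*diffs):
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / (n - 1)
        stats.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return stats

def tmax_permutation_test(diffs, n_perm=2000, alpha=0.05, seed=0):
    """Point-by-point test with a single critical value taken from the
    permutation distribution of the maximum |t| over time points."""
    rng = random.Random(seed)
    observed = t_values(diffs)
    max_stats = []
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sign = rng.choice((-1.0, 1.0))  # same flip for a whole waveform
            flipped.append([sign * x for x in row])
        max_stats.append(max(abs(t) for t in t_values(flipped)))
    max_stats.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    critical = max_stats[index]
    return observed, critical, [abs(t) > critical for t in observed]

# Hypothetical difference waveforms (condition A minus condition B) for
# 16 participants and 5 time points, with a true effect at points 2 and 3.
noise = random.Random(1)
diffs = [[noise.gauss(0.0, 0.2) + (1.0 if t in (2, 3) else 0.0)
          for t in range(5)]
         for _ in range(16)]

observed_t, critical_t, significant = tmax_permutation_test(diffs)
print(significant)
```

Flipping the sign of an entire waveform at once preserves the temporal autocorrelation within each participant, which is why this kind of procedure needs no assumption about it.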
Eye Tracking Control Experiment
In the EEG experiment described above, we used the EOG to exclude trials with saccades. However, this method is not sensitive to slow drifts in eye position and also cannot be used to verify fixation position. Systematic slow drifts of eye position and noncentral fixation could have affected the electrophysiological results. For instance, if fixation was biased toward the figural region, this could have increased the amplitude of the figural frequency because it was better represented in the fovea, an area of the visual field known to involve a larger number of neurons. Therefore, in a separate experiment, we monitored eye position (50 Hz sampling rate) with an ASL 5000 Remote Optics Eye Tracker (Applied Science Laboratories, Bedford, MA) while participants viewed exactly the same stimuli with exactly the same instructions as in the EEG experiment. The procedure was the same except that EEG was not recorded in this session. We tested four new participants (2 women, 2 men; mean age = 23 years) who were recruited from University College London. Because the instructions, stimuli, and procedure were the same as those in the EEG experiment, we expect that the eye tracking results here will be representative of eye movements made in the EEG experiment.
Behavior: Subjective Reports of Edge-assignment
As expected, the display-based extremal edge cue biased reports of edge-assignment very strongly: Participants reported edge-assignment consistent with the extremal edge cue on 98.7% of trials. For trials with cue reversals, this was true for both pre-reversal responses (98.6%) and post-reversal responses (98.7%). When the extremal edge cue cooperated with attention (e.g., extremal edge assigned figure to left and left was attended), the edge was slightly more likely to be assigned in the extremal edge direction (pre-reversal, 99.2%; post-reversal, 99.5%) than when it competed (pre-reversal, 98.0%; post-reversal, 97.9%) [pre-reversal, F(1, 15) = 48.96, p < .0001, η2 = 0.765; post-reversal, F(1, 15) = 30.72, p < .0001, η2 = 0.680].
Behavior: Reaction Time for Subjective Reports
Mean reaction time to report whether the attended region was figure or ground during the pre-reversal period (in both trials with and without reversals) was 494.6 msec, and was affected by whether the extremal edge and attention cues were cooperating (481.1 msec) or competing (507.5 msec) [F(1, 15) = 30.77, p < .0001, η2 = 0.672]. The mean reaction time to report figural status during the post-reversal period was 446.3 msec, and was also faster when the cues cooperated (423.1 msec) than when they competed (467.1 msec) [F(1, 15) = 89.62, p < .0001, η2 = 0.853].
We analyzed the electrophysiological data to address two primary issues. One was the existence of an edge-assignment effect: Was the SSVEP response associated with figural regions different than that associated with ground regions? The other was the existence of an edge-assignment/attention interaction: Were the dynamics of edge-assignment effects modulated by whether the extremal edge and attentional cues were competing or cooperating?
Electrophysiological Results: Edge-assignment Effects
To determine whether figure and ground regions differed in their SSVEP amplitude, we estimated the response at the first harmonic of the figure and ground flicker frequencies separately in each trial (see Methods for details). Only trials having figure–ground judgments consistent with the extremal edge cue were analyzed because there were too few inconsistent responses for analysis. We estimated the SSVEP time-course response (i.e., amplitude at the SSVEP frequency plotted at each point in time) for both figure and ground regions from electrodes O1, PO3, and PO7 (left hemisphere) and O2, PO4, and PO8 (right hemisphere), because these electrodes showed the strongest SSVEP responses and together receive signals from a large area of visual cortex. Because the results did not differ significantly between these electrodes within each hemisphere, they were averaged to create pooled left and right hemisphere estimates, respectively. There were no significant differences in the sustained edge-assignment effect due to hemisphere. The mean SSVEP amplitude over the sustained edge-assignment effect (collapsed over ipsilateral and contralateral) from 650 to 4000 msec (i.e., after the latest mean RT) was 0.367 μV for the left hemisphere and 0.375 μV for the right hemisphere [t(15) = 0.107, p = .915]. Thus, left and right hemisphere results were also pooled. The absolute flicker frequency (i.e., 6.25 Hz or 10 Hz) was of no interest and did not significantly interact with other factors. The results have therefore been averaged over the absolute flicker frequency.
Figure 3A shows the grand-average SSVEP power as a function of time for both perceived figure and ground regions, separately for trials with and without reversals of figure–ground assignment. During trials with no figure–ground reversals (sustained conditions), regions perceived as figure (Figure 3A, open dashed black line) were associated with a sustained, significantly stronger SSVEP response than regions perceived as ground (Figure 3A, open dashed gray line) from approximately 367 msec after display onset until 4000 msec (shaded area). During trials with figure–ground reversals, the SSVEP response associated with each region was also modulated by the region's perceived figure–ground status. The SSVEP response for a region initially (at display onset) perceived as figure (Figure 3A, black solid line) was stronger than the response associated with a region initially perceived as ground (Figure 3A, gray solid line) from 389 msec until 2221 msec from display onset (Figure 3A). Then, when the figure–ground polarity across the edge reversed at 2000 msec, the SSVEP responses for the two regions also changed as demonstrated by the crossing of the solid lines in Figure 3A. The region perceived as ground before the reversal now became figural and showed a stronger SSVEP response than the region that had been figural before the reversal (and was now perceived as ground). This effect lasted from 2378 msec until 4000 msec (Figure 3A). These results show a clear effect of edge-assignment on SSVEP amplitude that is sustained from shortly after display onset or figure–ground cue reversal (in the case of trials with reversals) until the end of the display. 
The edge-assignment effect (difference wave between figure and ground) for sustained figure–ground trials (no reversal) was equally strong for electrode pools ipsilateral and contralateral to the side of the figural region when compared on a point-by-point basis (Figure 3B, no significance shading) as well as when averaged over the entire sustained edge-assignment period (after latest mean RT = 650 msec): contralateral, 0.354 μV, and ipsilateral, 0.388 μV, t(15) = 1.07, p = .22.
Electrophysiological Results: Attention Effect
The location of attention also had a significant main effect on SSVEP amplitude, but it was dissociable from the edge-assignment effect by its hemispheric asymmetry. This attention effect was larger contralateral to the attended location than ipsilateral. Attended regions had significantly higher SSVEP amplitude compared to unattended regions from 223 to 3621 msec when the attended region was contralateral (Figure 3C, contralateral attended minus contralateral unattended difference wave, black line) to the recording site. A similar effect occurred ipsilateral to the attended region from 256 to 3578 msec (Figure 3C, gray line). The ipsilateral attention effect was significantly smaller than the contralateral attention effect during the shaded time ranges in Figure 3C, 302–709 msec and 855–3557 msec. These effects of attention on the SSVEP are consistent with those observed by others (e.g., Muller et al., 2003).
The absence of an attention effect late in the trial may be attributed to the task design. Participants needed to maintain attention until at least 2000 msec in order to report any change in edge-assignment. However, after that time passed, the participant may have realized that no more responses would be necessary for that trial and, therefore, may have relaxed their attention. This would have led to a reduction in the attention effect at the end of the trial only.
There was no difference in the size of the sustained (650–4000 msec average) edge-assignment effect for trials in which the regions were attended (Figure 3D, dark line: Attended Sustained Figure minus Attended Sustained Ground, 0.382 μV) and those in which the regions were not attended (Figure 3D, gray line: Unattended Sustained Figure minus Unattended Sustained Ground, 0.377 μV) [t(15) = 0.895, p = .26]. In other words, the attention effect and the edge-assignment effect did not interact during the sustained period. On point-by-point comparisons, there was a significant difference (indicated by shading in Figure 3D) from 3895 to 3926 msec. The reason for this difference at this point in time is unclear but it does not fit the pattern of the rest of the time period.
Electrophysiological Results: Temporal Dynamics of the Edge-assignment Process
The above results show the effect of figural status on the neural representation of a region during the sustained periods after edge-assignment was completed and reported by the participant. To examine the dynamics of edge-assignment processing as it occurred, however, we now focus on trials that contained a reversal of the extremal edge cue, thus requiring an on-line redetermination of edge-assignment. During these reversals, the attention cue stayed at its original location, whereas the display-based extremal edge cue reversed to support the other region. Thus, if the two cues were originally cooperating, this reversal caused them to compete during the reassignment of the edge. Because the reassignment of the edge could have occurred only after the cue reversal and before the participant's report, we restricted this analysis to a temporal region of interest spanning from 2000 msec (reversal) to 2630 msec (just after the latest mean response).
To test how competition and cooperation affected the temporal dynamics of edge-assignment, we estimated the time of edge-assignment resolution via its neural signature in the SSVEP (i.e., the edge-assignment effect) in both cue-cooperating and cue-competing conditions. For each of the cue-cooperating and cue-competing conditions, we compared the sustained figure condition to the ground-to-figure reversal condition. We then determined the first time point (moving from earlier to later) at which the difference between the two waveforms was not significant for 15 successive time points. We used the permutation t test procedure described in the Methods section, but with an independent-samples t test for comparing nonpaired sets of trials from different conditions from the same participant. We took the result as the time at which the ground-to-figure condition became indistinguishable from a condition that was already figural (i.e., the sustained figure condition). We call this the edge-assignment resolution time (EART). To indicate that the EART is calculated with the ground-to-figure condition—rather than by comparing the figure-to-ground condition to the sustained-ground condition, for instance (see below for this comparison), we label it EART-GF. When attention and edge-assignment cues cooperated, EART-GF was earlier (2321 msec; Figure 4A) than when the two cues competed (2439 msec; Figure 4B) [F(1, 15) = 11.45, p < .004; Figure 4B], indicating that competition between cues significantly extended the time necessary to finish edge-assignment. Individual participants' EART-GF values also showed significant correlations with individual mean reaction times in both cue-cooperating, r = 0.63, p < .008 (Figure 4C), and cue-competing conditions, r = 0.70, p < .0024 (Figure 4D). 
During the sustained post-reversal period (after 2630 msec), however, the size of the edge-assignment effect (ground-to-figure reversal condition minus sustained ground condition) did not differ between cue-competing and cue-cooperating conditions (Figure 4E). This suggests that once the cue competition was complete, edge-assignment was sustained in a winner-take-all fashion to the side with the extremal edge cue.
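The resolution-time criterion defined above (the first time point at which the difference is nonsignificant for 15 successive time points) can be sketched as follows. The sketch takes a per-time-point significance mask as input, such as one produced by a permutation test, and returns the time at which the qualifying run begins. Returning the start of the run is an interpretation made here; the function name, the 2-msec time step, and the example mask are hypothetical.

```python
def resolution_time(significant, times_ms, run_length=15):
    """Time of the first point that begins `run_length` successive
    nonsignificant points, scanning from earlier to later.
    Returns None if no such run exists."""
    run = 0
    for i, is_significant in enumerate(significant):
        run = 0 if is_significant else run + 1
        if run == run_length:
            return times_ms[i - run_length + 1]
    return None

# Hypothetical significance mask for reversal-vs-sustained comparisons,
# sampled every 2 msec starting at the cue reversal (2000 msec).
times_ms = [2000 + 2 * i for i in range(160)]
mask = [True] * 50 + [False] * 5 + [True] * 45 + [False] * 60

eart = resolution_time(mask, times_ms)
print(eart)
```

In this example the brief 5-point nonsignificant gap is ignored because it is shorter than the required run, so the estimate falls at the start of the later, sustained nonsignificant stretch. Requiring a run of successive points keeps momentary dips below significance from being mistaken for the end of the reassignment process.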
We also computed EART-FG (see Figure 5A for difference between EART-FG and EART-GF) by comparing the figure-to-ground reversal condition to the sustained-ground condition. Although we had no strong a priori reason to believe that this would yield different results, it is possible that EART-FG could differ from the EART-GF comparison above if, for instance, there is a neural persistence effect. That is, perhaps region representations do not immediately reduce in activity after a reversal. However, we found that EART-FG also showed a similar difference between cooperative (2358 msec) and competitive (2477 msec) conditions [F(1, 15) = 8.47, p < .01; Figure 5C].
We also used the reversal and sustained conditions to compute edge-assignment starting times (EAST), defined as the time at which the reversal condition first differed significantly from the corresponding sustained condition. For instance, EAST-FG is derived from the difference between the figure-to-ground reversal condition and the sustained-figure condition (see Figure 5A for a schematic depiction of this). Comparing EAST-FG for competitive and cooperative conditions indicates whether edge-assignment started earlier for cooperative cues or whether it just ended earlier (as found above). There were no significant differences between competitive and cooperative conditions for EAST-FG [F(1, 15) = 0.387, p = .543; Figure 5D] or EAST-GF [F(1, 15) = 1.96, p = .182; Figure 5E]. Thus, although edge-assignment was resolved faster during cooperative cue conditions than competitive conditions, it did not start any earlier.
The above results show that the reversal (i.e., the crossover pattern immediately after the cue reversal) in SSVEP signals ended later when the reversal caused a competitive integration between the two edge-assignment cues (i.e., attention and extremal edges) than when the post-reversal cue integration was cooperative. We also tested whether the reversal in the SSVEP signal started later for competitive than cooperative conditions, but we found no evidence of this.
Eye Tracking Control Experiment Results
The eye tracking data were analyzed to address whether eye position differed as a function of the edge-assignment direction and attention-position. We collapsed the data over the reversal factor (i.e., whether or not the trial contained a reversal). For trials with reversals, data from the pre-reversal period (e.g., when the edge was assigned to the left) were assigned to the appropriate edge-assignment condition (e.g., left in this case), and data from the post-reversal period were assigned to the opposite edge-assignment condition (i.e., right in this case), because the edge-assignment conditions differed for the pre- and post-reversal portions of the trial. Figure 6 shows scatterplots (for one randomly selected participant) of fixation positions as a function of edge-assignment direction and attention-position.
Behavioral results in the eye tracking control experiment were similar to those observed in the EEG Experiment. Participants reported edge-assignment consistent with the extremal edge cue on 98.3% of trials. For trials with cue reversals, this was true for both pre-reversal responses (98.4%) and post-reversal responses (98.2%).
We quantitatively tested whether the distribution of eye positions differed as a function of the edge-assignment and attention-location factors. For each subject, we calculated their mean (averaged over time) horizontal and vertical eye positions within each trial segment. Each segment was 2000 msec long because pre- and post-reversal segments were averaged separately. This was done because in trials with reversals, pre- and post-reversal segments belonged to different conditions of the edge-assignment factor. Thus, it was not possible to average over the whole 4000-msec trial. These mean positions (two means for each trial) were then analyzed in a two-way ANOVA with edge-assignment and attention-location as factors. The ANOVA was done separately for each participant and separately for horizontal and vertical eye positions.
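The segment-averaging step can be sketched as follows. This is an illustrative reconstruction, not the original analysis code. The sample format `(t_ms, x, y)`, the condition labels, and the function name `segment_means` are assumptions. The sketch only prepares the two per-trial means and does not run the ANOVA.

```python
from statistics import mean

# Hypothetical labels for the two edge-assignment directions.
FLIP = {"left": "right", "right": "left"}


def segment_means(samples, edge_side, attended_side, has_reversal,
                  reversal_ms=2000):
    """Split one 4000-msec trial into pre- and post-reversal segments and
    return the mean horizontal and vertical eye position of each.

    samples: list of (t_ms, x, y) gaze samples (assumed format).
    edge_side: edge-assignment direction at trial onset.
    On reversal trials, the post-reversal segment is labeled with the
    opposite edge-assignment direction.
    """
    pre = [(x, y) for t, x, y in samples if t < reversal_ms]
    post = [(x, y) for t, x, y in samples if t >= reversal_ms]
    post_side = FLIP[edge_side] if has_reversal else edge_side
    records = []
    for segment, side in ((pre, edge_side), (post, post_side)):
        records.append({
            "edge": side,
            "attention": attended_side,
            "mean_x": mean(x for x, _ in segment),
            "mean_y": mean(y for _, y in segment),
        })
    return records


# Toy trial with a reversal: two samples per segment.
trial = [(0, 1.0, 2.0), (1000, 3.0, 4.0), (2000, -1.0, 0.0), (3000, -3.0, 2.0)]
rows = segment_means(trial, edge_side="left", attended_side="right",
                     has_reversal=True)
```

Each trial contributes two rows, which matches the two means per trial described above. The rows for one participant could then be passed to any two-way ANOVA routine with `edge` and `attention` as factors.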
There were no significant differences in mean horizontal eye position (Table 1) as a function of edge-assignment [F(1, 378) = 0.761, 0.006, 0.187, 0.437, p = .383, .979, .665, .509, for Participants 1–4, respectively]. There were also no significant effects of attention-location [F(1, 378) = 0.803, 3.36, 0.032, 2.944, p = .371, .070, .857, .091, for Participants 1–4, respectively]. There were no significant differences in mean vertical eye position as a function of edge-assignment [F(1, 378) = 1.995, 0.021, 0.001, 0.035, p = .158, .884, .972, .850, for Participants 1–4, respectively]. There were also no significant effects of attention-location on vertical eye position [F(1, 378) = 0.426, 1.423, 0.083, 1.044, p = .514, .234, .772, .307, for Participants 1–4, respectively]. There were also no interactions of these two factors for any of the participants.
We used the SSVEP to demonstrate that the time course of edge-assignment in human visual cortex is modulated by competition and cooperation between edge-assignment cues. We independently measured the SSVEP response to figure and ground regions and found that the sustained SSVEP response associated with regions perceived as figure was significantly greater than that associated with regions perceived as ground (i.e., the SSVEP produces a measurable edge-assignment effect). This result is consistent with the results of Appelbaum et al. (2006, 2008) showing that figure and ground regions involve different neural pathways as well as neurophysiological evidence showing stronger responses within figural surfaces than ground surfaces (Lamme, Zipser, & Spekreijse, 1998; Zipser, Lamme, & Schiller, 1996; Lamme, 1995). In addition, we also found new effects of cue competition on the temporal dynamics of the SSVEP signal. When competition between edge-assignment cues was increased, the peak of this edge-assignment effect was delayed. When competition was reduced (i.e., when cues cooperated), the SSVEP modulation related to edge-assignment occurred relatively more quickly. These effects demonstrate that the temporal dynamics of edge-assignment are affected significantly by the competition between cues on the two sides of an edge. However, once edge-assignment was established, the edge-assignment effect amplitude was the same regardless of cue competition or cue cooperation. This suggests that although cue interactions affect neural dynamics during the initial assignment of edges, edge-assignment is resolved in a winner-take-all fashion at the neural level, consistent with the conscious perceptual outcome.
Others have found that competition between edge-assignment cues, similar to that in our displays but involving different cues, affects reaction time performance on behavioral tasks. Same–different matching decisions for two edges take longer when one of the edges was assigned in the opposite direction during a previous exposure compared to when it was assigned in the same direction (Peterson & Enns, 2005; Peterson & Lampignano, 2003). This result has been taken as evidence that greater competition leads to a delay in the resolution of edge-assignment. The authors reasoned that previous exposure of the edge, but with opposite edge-assignment, comprised a cue that competed with configural cues during edge-assignment in later exposures. Increased competition delayed edge-assignment. Because decision-making in the same–different task depended on the results of edge-assignment, reaction times were correspondingly increased by edge-assignment delays. Similar behavioral results, however, have also been interpreted as evidence of inhibition of ground region representations (Peterson & Kim, 2001; Treisman & DeSchepper, 1996). Our results suggest that the completion of edge-assignment is delayed by greater cue competition and this delay of neural processing may contribute to the reaction time effects observed by others.
This result is consistent with predictions derived from a computational model of border-ownership processing in area V2 of visual cortex (Zhaoping, 2005). Specifically, the model predicts longer border-ownership latency for border segments that have opposite or conflicting ownership biases. Furthermore, Kienker, Sejnowski, Hinton, and Schumacher (1986) built a model that integrated influences of bottom–up edge-assignment biases with a top–down “attentional” influence. Using a simulated annealing algorithm, their model required more iterations to reach an edge-assignment solution that was consistent with bottom–up cues when the “attention” cue was inconsistent (i.e., different location) than when it was consistent (i.e., same location). This model's behavior is also consistent with our results, although the number-of-iterations measure is not directly analogous to the neural and behavioral measures that we employed. Unfortunately, their results were less clear with a gradient descent algorithm. Thus, the generality of their model on this issue is unclear. Some of the other computational models that address integration of multiple cues (e.g., Vecera & O'Reilly, 1998) either do not make specific predictions on this issue or did not present timing results.
Several computational models (Roelfsema, Lamme, Spekreijse, & Bosch, 2002; Vecera & O'Reilly, 1998, 2000; Kienker et al., 1986) and theoretical accounts (Peterson & Skow, 2008; Peterson, de Gelder, Rapcsak, Gerhardstein, & Bachoud-Levi, 2000) of edge-assignment predict that figural regions show stronger neural activity than ground regions. These theoretical accounts and computational models have been supported by both neurophysiological (Zipser et al., 1996; Lamme, 1995) and behavioral data (Peterson & Skow, 2008; Peterson & Kim, 2001). Furthermore, some of the computational models also integrate top–down and bottom–up figure–ground cues, more analogous to the cue integration situation in our experiment (Vecera & O'Reilly, 1998; Kienker et al., 1986). Kienker et al. (1986) did this with a parallel network model and Vecera and O'Reilly (1998) used a PDP interactive network architecture.
Overall, our data square well with these computational models, theoretical accounts, and previous results. We showed a sustained increase in SSVEP amplitude for figural regions relative to ground regions after edge-assignment was complete (i.e., the sustained edge-assignment effect). Some theoretical accounts of edge-assignment (Peterson & Skow, 2008; Peterson et al., 2000) also specifically predict inhibition of ground regions in addition to facilitation of figural regions. Our experimental design has no appropriate neutral comparison, however, that would allow us to determine whether figural regions were enhanced, grounds were inhibited, or both.
Our edge-assignment effects began roughly 250 msec after stimulus onset or reversal onset and reached their peak at 300–400 msec. These effects are relatively late in comparison to the latencies found in neurophysiological studies with nonhuman primates (von der Heydt, Zhou, & Friedman, 2000; Zhou, Friedman, & von der Heydt, 2000; Lamme et al., 1998; Zipser et al., 1996; Lamme, 1995) and electrophysiological studies with humans (Scholte, Jolij, Fahrenfort, & Lamme, 2008; Appelbaum et al., 2006; Caputo & Casco, 1999), which range from 70 to 280 msec. It is unclear why our effects occurred later than those in these studies. However, this may have arisen from the particular edge-assignment cues that were used in this study. Most of the studies noted above used texture segmentation cues, whereas we used the extremal edges cue. None of the previous studies used this cue, and thus, we have no basis for comparing it with other cues. It is also possible that our SSVEP measure only detected later differences that involved larger portions of cortex. A large number of neurons must be active synchronously to give rise to a strong SSVEP response. Further work will be necessary to clarify these issues.
Our experiment was specifically designed to separate the effects of attention from those of edge-assignment because of the close relation between these two processes in behavioral studies (e.g., Nelson & Palmer, 2007; Vecera et al., 2004; Driver & Baylis, 1995) and because of potential confounds in previous EEG studies on edge-assignment, as described in the Introduction. We attempted to avoid this problem by independently manipulating the location of attention and edge-assignment. Participants were instructed to pay attention to one of the flickering regions and make decisions about whether it was figure or ground. They were instructed to report this at the beginning of the trial but they also had to monitor this region throughout the trial and respond again if and when the figural status of the region changed. Although this task encouraged participants to continuously direct their attention toward one region, this cannot be guaranteed, especially because the attention task was not particularly demanding. Furthermore, because the timing of the reversal was predictable (i.e., always at 2000 msec, if it occurred), participants could have let their attention wander after they responded to the reversal (and thus no longer needed to monitor for it) or realized that it was not going to occur. In fact, the data suggest that this may have been the case. The attention effect was not statistically significant after approximately 3600 msec (see Figure 3C). The edge-assignment effect, however, continued until the end of the trial, suggesting that it proceeded independently of the attention effect, even when attention may not have been systematically directed toward one location. These results contrast with fMRI results suggesting that edge-assignment modulations are dependent on attention (Fang, Boyaci, & Kersten, 2009). In that work, however, attention was strongly directed either toward a task at fixation or toward the edge-assignment stimulus. Fixation fell within a gap at the center of the edge-assignment stimulus, and thus, did not necessarily overlap any of the edges in the edge-assignment stimulus. An edge-assignment modulation was only observed when attention was directed to the stimulus. Our paradigm differed from the Fang et al. paradigm because attention was always near the critical edge rather than being directed somewhere else entirely. Furthermore, although our electrophysiological results suggest that attention was manipulated in our experiment, our attention manipulation was unlikely to be as strong as theirs, a factor that could also account for the difference in results.
Our results showed that attention and edge-assignment effects did not interact with the flicker frequency (i.e., the effects of attention and edge-assignment on SSVEP amplitude did not differ between the two frequencies). Other work, however, has shown that attention can have different effects at different flicker frequencies. For instance, Ding et al. (2006) showed that whereas attention to a flickering stimulus may increase its SSVEP power in the delta band (i.e., 2.5, 3, and 4 Hz in their study), both increases and decreases in SSVEP power were observed in the alpha band. However, whether power increased or decreased at a particular flicker frequency also depended on stimulus configuration and which EEG channel was analyzed. Our displays differed from theirs in shape, eccentricity, and flicker type (i.e., we used contrast-reversal and they used homogeneous flicker). The comparison is made even more difficult because stimulus configuration affected the results reported by Ding et al. The differences between the stimulus conditions make detailed comparison of the results difficult, if not impossible. Nonetheless, using our pattern-reversal, checkerboard stimuli presented in central vision, we did not observe any significant interactions of attention effects with flicker frequency. Ding et al. also found that SSVEP attention modulations in the alpha band occurred only if the competing (i.e., nonattended) stimulus was presented in the fovea. Our stimuli were quite large but both regions certainly overlapped the fovea. Again, however, it is difficult to compare our results to theirs because we did not manipulate eccentricity and because the shapes of our stimuli were very different from theirs.
Finally, although we cannot guarantee that our results generalize to other flicker frequencies (because we did not systematically vary over a wide range of flicker frequencies), our manipulations are not confounded by flicker frequency because flicker frequency was completely counterbalanced across the other conditions. Further work will be necessary to determine whether different flicker frequencies evoke a similar pattern of results.
It is important to point out that a region's figural status is not absolute but always relative to something else. In our experiment, we set up a bipartite display with a critical edge between the two regions. The cues we used biased figure–ground assignment across this critical edge and we observed electrophysiological results from the two regions on either side of this edge. Thus, our manipulations were intentionally focused on the effect of figure–ground organization across only one edge in the display. However, the regions in our displays also had borders with the larger background of the screen. It could be said that the regions are figural relative to this background. There are other alternative perceptions as well. Because we did not manipulate the edge-assignment cues across these borders, however, we cannot determine how they affected the results found here. Our results do show, however, differences in the representation of two regions that share a border and have a clear figure–ground relationship relative to one another.
There are several potential neural sources of our sustained edge-assignment effect. Although figure and ground regions were of equal size and eccentricity in our displays, the size, extent, or location of their neural representations may have differed. For instance, figural regions may have engaged a larger portion of the cortex, involved more neurons within the same portion of cortex, or involved cortical regions that were detected better by the analyzed electrodes than did ground areas. Distinguishing between these possibilities is difficult using scalp-recordings due to the relatively poor spatial resolution of EEG. Recent results using EEG source modeling techniques suggest that figural regions may receive stronger representation in ventral cortical areas (which would presumably project more to our electrode region of interest), whereas ground regions are represented more dorsally (Appelbaum et al., 2006). It is unclear, however, whether the effects of attention were properly dissociated from figural status in their study because the attentional control condition involved attending to a task located on the figural region. The present results do not suffer from this problem because we independently manipulated attention and figural status. Several studies have shown that the lateral occipital complex is sensitive to border-ownership manipulations (Fang et al., 2009; Vinberg & Grill-Spector, 2008). Inferotemporal cortex in primates shows sensitivity to border ownership reversals but not mirror and contrast reversals (Baylis & Driver, 2001). A significant body of evidence also suggests that V1, V2, and V4 contain neurons that are sensitive to the direction of border-ownership across edges in their receptive field (Qiu & von der Heydt, 2005; von der Heydt et al., 2000; Zhou et al., 2000; Zipser et al., 1996). Many or all of these areas could have contributed to our results because of the broad spread of EEG signals across the scalp.
Another potential source of the edge-assignment effect is a difference in synchronization of neurons representing figure and ground regions. More synchronized neuronal firing generally gives rise to larger deviations in the EEG recorded at the scalp. Thus, if the neurons representing figural regions were more synchronized than those representing ground regions, the figural SSVEP response would be stronger. Neural synchrony has already been implicated in binding (e.g., Engel & Singer, 2001; Singer & Gray, 1995), object representation (e.g., Bertrand & Tallon-Baudry, 2000; Tallon-Baudry & Bertrand, 1999), and attention (e.g., Bichot, Rossi, & Desimone, 2005; Tallon-Baudry, 2004), in addition to higher-order cognitive processes. Assessing this hypothesis may require investigations in animal models because it is difficult to differentiate the effects of synchrony from other factors (such as the number of neurons involved) when recording at the scalp. However, given the role of synchrony in many other brain processes, it would not be surprising if it played a role here as well.
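The link between synchrony and recorded amplitude can be illustrated with a toy calculation. It is not a model of the present data. It assumes that each neuron contributes a unit-amplitude sinusoid at the flicker frequency and that the scalp signal is simply the average of these contributions. The function name `population_amplitude` is hypothetical.

```python
import cmath
import math


def population_amplitude(phases):
    """Amplitude of the average of unit-amplitude sinusoids that share one
    frequency and have the given phases (radians).

    Each sinusoid is represented as a phasor exp(i * phase); the amplitude
    of the averaged signal is the magnitude of the mean phasor.
    """
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


# Perfectly synchronized population: full amplitude.
synchronized = population_amplitude([0.0, 0.0, 0.0, 0.0])

# Phases spread over a quarter cycle: reduced amplitude, cos(pi / 4).
jittered = population_amplitude([-math.pi / 4, math.pi / 4])

# Opposite phases: the contributions cancel almost completely.
cancelled = population_amplitude([0.0, math.pi])
```

Under these assumptions, the same number of equally active neurons yields a smaller response at the tagged frequency as their phases disperse. The same toy calculation shows why synchrony and the number of contributing neurons are hard to separate at the scalp: either factor changes the summed amplitude.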
Overall, our results support a theoretical account of edge-assignment in which cue competition leads to longer neural processing and delays edge-assignment. Once edge-assignment is determined, however, cue competition has no lingering effect on the strength of the figural representation. This suggests that edge-assignment involves a competitive winner-take-all process. We also found that figural regions entrain a greater or different neural representation than ground regions, in line with the conclusions of previous research. The SSVEP thus provides a useful measure for testing theories about the temporal dynamics of edge-assignment processes as they unfold in human visual cortex.
This work was supported by a Royal Society International Postdoctoral Fellowship and an NIMH Training Fellowship to J. L. B. (T32 MH62997).
Reprint requests should be sent to Joseph L. Brooks, Institute of Cognitive Neuroscience, University College London, 17 Queen Square, London WC1N 3AR, UK, or via e-mail: firstname.lastname@example.org.