Abstract

People are quicker to detect examples of real-world object categories in natural scenes than is predicted by classic attention theories. One explanation for this puzzle suggests that experience renders the visual system sensitive to midlevel features diagnosing target presence. These are detected without the need for spatial attention, much as occurs for targets defined by low-level features like color or orientation. The alternative is that naturalistic search relies on spatial attention but is highly efficient because global scene information can be used to quickly reject nontarget objects and locations. Here, we use ERPs to differentiate between these possibilities. Results show that hallmark evidence of ultrafast target detection in frontal brain activity is preceded by an index of spatially specific distractor suppression in visual cortex. Naturalistic search for heterogenous targets therefore appears to rely on spatial operations that act on neural object representations, as predicted by classic attention theory. People appear able to rapidly reject nontarget objects and locations, consistent with the idea that global scene information is used to constrain naturalistic search and increase search efficiency.

INTRODUCTION

In day-to-day life, our visual system is flooded with a rich stream of input that we must parse for objects of interest. This real-world selection problem has been approximated in the laboratory in visual search experiments in which participants are asked to detect or report characteristics of targets hidden among distractors. To render the issue tractable, visual search experiments have tended to employ simple, synthetic stimuli in regular arrays, with targets defined by unique visual features or limited combinations of features.

This reductionist approach has led to many notable successes, but as outlined in recent reviews it has also led to misunderstanding (Peelen & Kastner, 2014; Wolfe, Võ, Evans, & Greene, 2011). Classical attention theories—developed from experiments with synthetic stimuli in arrays—suggest that attention creates object representations by binding low-level visual features to each other and to locations. By this, search for complex naturalistic targets requires the serial application of attention so that objects can be compared with target templates (Wolfe, Cave, & Franzel, 1989; Treisman & Gelade, 1980). Search for a naturalistic target in a synthetic array should therefore be more efficient than search for the same target in a visually complicated natural scene, simply because scenes will tend to contain more objects. But this is not the case. People are extraordinarily quick to detect the presence of real-world objects in scenes (e.g., Wolfe, Alvarez, Rosenholtz, Kuzmova, & Sherman, 2011; Rousselet, Fabre-Thorpe, & Thorpe, 2002; Potter, 1975) and neural evidence of target detection emerges in frontal cortex as early as 150 msec poststimulus (Fabre-Thorpe, Delorme, Marlot, & Thorpe, 2001; VanRullen & Thorpe, 2001; Thorpe, Fize, & Marlot, 1996). This ultrafast object detection occurs when spatial attention is effortfully engaged elsewhere (Oliva & Torralba, 2007; Li, VanRullen, Koch, & Perona, 2002, for reviews) and has been linked to aspects of scene context that are ignored by classical theory (Torralba, Oliva, Castelhano, & Henderson, 2006; Fabre-Thorpe et al., 2001).

Two proposals have been offered to solve this puzzle. One is that recognition of target presence might rely on the detection of disjunctive sets of features that characterize the target category (e.g., Treisman, 2006; Evans & Treisman, 2005). Simple detection of these features may not require attention, but derivation of details about the eliciting object and its location would still rely on the deployment of attention. This predicts that evidence of target detection may emerge in the brain without the necessary precedence of spatially localized, attention-related activity in retinotopic visual cortex.

The alternative is that categorization and detection of naturalistic targets relies on serial spatial selection, but that global scene information rapidly constrains the search space to reduce the number of objects and locations that need be resolved (e.g., Wolfe, Võ, et al., 2011; Torralba et al., 2006). The prediction here is that spatially localized attentional mechanisms operate prior to the detection of naturalistic targets, but that these spatial effects emerge very quickly.

Here, we use human electrophysiology to differentiate between these proposals. To index spatial processing in naturalistic search, we looked to lateral components in the visual ERP—the N2pc and distractor positivity (Pd). The N2pc is an index of attentional selection that emerges around 175 msec poststimulus as a negative difference in ERPs recorded over visual cortex contralateral versus ipsilateral to an attended target (Luck & Hillyard, 1994a, 1994b). The Pd is similarly defined as a lateral difference in visual ERPs but has polarity opposite to the N2pc and reflects suppression of unattended stimuli rather than selection (Gaspar & McDonald, 2014; Sawaki, Geng, & Luck, 2012; Hickey, Di Lollo, & McDonald, 2009). The Pd has more temporal variability than the N2pc, emerging earlier when task confines support rapid distractor suppression (Kerzel, Barras, & Grubert, 2018; Weaver, van Zoest, & Hickey, 2017; Sawaki & Luck, 2010) and when overt responses are quick (Gaspar & McDonald, 2014).

Few existing studies have employed lateral ERP components as indices of selective processing in scenes. The N2pc has been observed in a set of studies investigating search through scenes, but the target in this work was superimposed over the scene and was not itself naturalistic (Doallo, Patai, & Nobre, 2013; Patai, Doallo, & Nobre, 2012). To our knowledge, only one study has attempted to elicit an N2pc with naturalistic targets—targets defined by their membership in a heterogenous real-world category—but failed to identify reliable effects (Das, Guo, Geisbrecht, & Eckstein, 2010).

Experiment 1 therefore had the simple goal of characterizing spatial object processing in the lateral ERP during discrimination of naturalistic targets in scenes. We had participants search for examples of people and vehicles in photos of urban scenes in alternating blocks, asking them to discriminate the orientation of the target. As described in the Methods, each scene contained examples of both object categories, but we prepared the scenes such that we could extract lateralized ERP activity elicited by individual category examples when they acted as target and when they acted as salient distractor. To foreshadow, a robust, normal latency N2pc is elicited by targets in Experiment 1, and a small, quick Pd is elicited by distractors. Experiment 2 expands from this to investigate the latency of spatially selective object processing in ultrafast object detection. Experiment 2 had two goals: to verify that the early Pd identified in Experiment 1 emerges during detection of naturalistic targets and to determine whether this effect precedes emergence of information about target presence elsewhere in the brain. To this end, Experiment 2 contrasts the onset of early Pd with the latency of a hallmark of ultrafast object detection in frontal cortex.

EXPERIMENT 1

Methods

Participants

Sixteen healthy University of Trento students gave informed consent before completing the experiment. All reported normal or corrected-to-normal vision and were paid for their participation. One participant was removed from analysis due to an inability to complete the task without moving the eyes, resulting in the rejection of more than 50% of trials due to oculomotor artifacts in the EEG. All remaining participants (five men, 19–26 years old) were right-handed. Sample size for both experiments was chosen based on results from pilot experiments suggesting ample power to detect effects of interest.

Stimuli and Procedure

Participants were seated in a dim room at approximately 70 cm distance from a 24-in. LED monitor designed for psychophysical research (VPixx Technologies; 100 Hz refresh rate). Each experimental trial began with presentation of a fixation cross for 500–1000 msec (randomly selected from a rectangular distribution) before 1 of 480 photographs of complex urban scenes was presented at fixation (∼7° × ∼9° visual angle; see Figure 1 for examples). Photos remained on screen until response, after which a new trial began. Approximately 80% of photos were taken by authors D. P. and G. B. using a standard SLR camera, with the remaining 20% taken from the LabelMe image database (Russell, Torralba, Murphy, & Freeman, 2008). All images were rendered in black and white.

Figure 1. 

Examples of scene stimuli employed in Experiment 1.

Each photo contained foreground examples of a vehicle and a person with one category example always appearing in the center of the scene and the other appearing in the lateral periphery. Vehicles and people were themselves oriented to the left or to the right (i.e., side view of vehicle, profile of person), and participants reported the orientation of the target example. Left-hand response on a standard keyboard indicated a left-facing target and right-hand response indicated a right-facing target. To be clear, response was thus based on the direction that the vehicle or person faced, not on its location in the scene. Photos were selected or prepared such that the category example in the periphery was roughly equidistant from fixation in each image and such that people and vehicles had roughly consistent size across the image set.

Participants completed 40 practice trials before completing 20 blocks of 48 trials. The target category was cued at the beginning of each block and alternated across blocks (with the order of alternation counterbalanced across participants, such that equal numbers started with each of the two target categories). This created a situation where, when search was for people, vehicles were a salient distractor category because they had recently served as targets (and vice versa). Accuracy feedback was provided at the end of each block.

Stimuli presented at fixation or along the vertical meridian above and below fixation are processed equally by both visual cortices and therefore do not create lateralized activity in the visual ERP (Hickey et al., 2009; Hickey, McDonald, & Theeuwes, 2006; Woodman & Luck, 2003). As a result, when the target appeared in the periphery and the distractor appeared at fixation, lateralized brain activity could be unambiguously associated with target processing. Equally, when the salient distractor appeared in the periphery and the target appeared at fixation, lateral brain activity could be linked to distractor processing.
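The lateralization logic described here can be made concrete with a minimal sketch. It assumes one ERP per electrode as a plain list of voltages and uses the PO7/PO8 pair at which the lateral components are reported; the function name `lateral_difference` and the example values are hypothetical and are not taken from the authors' analysis code.

```python
def lateral_difference(po7, po8, stimulus_side):
    """Contralateral-minus-ipsilateral difference wave for one lateral stimulus.

    po7 is the left-hemisphere electrode, po8 the right-hemisphere electrode.
    A stimulus in the left visual field is contralateral to PO8, and vice versa.
    """
    if stimulus_side == "left":
        contra, ipsi = po8, po7
    elif stimulus_side == "right":
        contra, ipsi = po7, po8
    else:
        raise ValueError("stimulus_side must be 'left' or 'right'")
    return [c - i for c, i in zip(contra, ipsi)]


# Illustrative values only (microvolts at three time points).
po7 = [1.0, 2.0, 3.0]
po8 = [1.0, 1.5, 2.0]
difference = lateral_difference(po7, po8, "left")
```

With these made-up values the difference wave is negative going, which is the polarity associated with the N2pc; a positive-going difference would correspond to the Pd.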

Participants saw each photo twice, once when the vehicle was a target and once when the person was a target, and the experiment took approximately 1 hr to complete. All procedures were approved by the University of Trento ethics commission.

EEG Recording and Analysis

EEG was recorded at 1 kHz from 64 Ag/AgCl electrodes mounted in an elastic cap (10/20 montage) using the BrainAmp DC system and Brain Vision Recorder software (Brain Products). Additional electrodes were placed at the left and right mastoids and 1 cm lateral to the outer canthus of the left eye. All electrodes were referenced during recording to the right mastoid and subsequently re-referenced offline to the algebraic average of the mastoids. Electrode impedances were kept below 10 kΩ, and EEG was digitally filtered offline at 0.01–45 Hz to remove direct current drift and high-frequency noise (Blackman-windowed FIR filter, 10k kernel length).

Infomax independent component analysis (Bell & Sejnowski, 1995) was used to identify variance stemming from ocular artifacts in the EEG. The independent components representing horizontal and vertical eye movements were used to identify trials in which eye movements were made in the 600-msec interval following stimulus onset. Participants retained for analysis moved their eyes in 6–10% of trials; these trials were removed from analysis. Components representing eye artifacts were subsequently removed from the data, and ERPs were calculated based on trials where participants made correct responses. ERPs were baselined on the interval beginning 100 msec before stimulus onset and ending 50 msec after.
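As an illustration of the baselining step, the sketch below subtracts the mean voltage in the interval from 100 msec before to 50 msec after stimulus onset. It assumes an ERP stored as a list of voltages with a matching list of sample times; the function name and the inclusive handling of the window edges are assumptions rather than details given in the text.

```python
def baseline_correct(erp, times_ms, start_ms=-100, end_ms=50):
    """Subtract the mean voltage in [start_ms, end_ms] from every sample."""
    window = [v for v, t in zip(erp, times_ms) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("baseline window contains no samples")
    baseline = sum(window) / len(window)
    return [v - baseline for v in erp]


# Illustrative data sampled every 50 msec from -100 to 200 msec.
times = list(range(-100, 201, 50))
erp = [2.0, 2.0, 2.0, 2.0, 5.0, 7.0, 3.0]
corrected = baseline_correct(erp, times)
```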

Statistical Analysis

Contrasts of behavior and ERP components measured over predefined intervals relied on bootstrap analysis of difference scores with 10⁵ iterations. Effect size for these contrasts is reported as Cohen's d (Cohen, 1988). Where noted, we additionally assessed lateral brain activity using a randomization method introduced by Sawaki et al. (2012). This tests the hypothesis that lateral ERP amplitude in a given interval is larger than expected by chance. Critically, only directional component amplitude is analyzed, generating a measure of signed area in the difference wave. That is, the N2pc is measured by summing the negative polarity signal in the contralateral-minus-ipsilateral difference across a long, predefined interval (50–500 msec in the current experiment). This approach is adopted because it allows the N2pc to be measured and assessed without the need for a priori definition of precise temporal analysis windows, even when the N2pc occurs in close temporal proximity to opposite-polarity effects like the Pd (which will cancel out N2pc amplitude when included in mean averages). However, because the signed area approach ignores any finding that is contrary to the hypothesis under consideration (that is, when the negative polarity N2pc is being tested, it ignores any positive polarity amplitude), it is a biased estimator. To test if a biased estimate of an effect is significant, it must be compared against a distribution of results observed when no effect is present but the bias in estimation remains.

To generate such a null distribution, we iteratively relabeled ipsilateral and contralateral waveforms for each participant, took a measure of mean negative polarity in the difference wave calculated across the resulting data sets, and stored this in an accumulator. We did this for all 2¹⁵ combinations of ipsilateral and contralateral waveforms in the 15-participant data set. When the observed effect lay at the extreme edge of this null distribution, it was unlikely to be a product of chance, and the null hypothesis could be discarded. The associated p value was calculated as the proportion of the null distribution that lay at or beyond this observation. The same approach was adapted for statistical assessment of Pd, but with the measurement of positive rather than negative polarity.
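The relabeling procedure can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each participant contributes one contralateral-minus-ipsilateral difference wave already restricted to the analysis interval, and it relies on the fact that swapping a participant's ipsilateral and contralateral labels is equivalent to flipping the sign of that participant's difference wave. The function names and the four-participant example data are hypothetical.

```python
from itertools import product


def grand_average(waves):
    """Sample-by-sample mean across participants."""
    n = len(waves)
    return [sum(samples) / n for samples in zip(*waves)]


def negative_area(wave):
    """Sum of negative-going samples only, returned as a positive number."""
    return -sum(v for v in wave if v < 0)


def signed_area_test(diff_waves, area=negative_area):
    """Exhaustive relabeling test on the signed area of the grand average."""
    observed = area(grand_average(diff_waves))
    null = []
    # One sign per participant: -1 swaps that participant's ipsi/contra labels.
    for signs in product((1, -1), repeat=len(diff_waves)):
        relabeled = [[s * v for v in w] for s, w in zip(signs, diff_waves)]
        null.append(area(grand_average(relabeled)))
    # Proportion of the null distribution at or beyond the observation.
    p = sum(a >= observed for a in null) / len(null)
    return observed, p


# Illustrative difference waves for four participants (2**4 = 16 relabelings).
diff_waves = [
    [0.2, -1.0, -2.0, -0.4],
    [0.1, -0.8, -1.5, -0.2],
    [-0.1, -1.2, -1.8, 0.3],
    [0.0, -0.9, -2.2, -0.1],
]
observed_area, p_value = signed_area_test(diff_waves)
```

With only four participants the smallest attainable p value is 1/16, because the unrelabeled data always form part of the null distribution; with 15 participants the same floor is 1/2¹⁵. Passing a positive-area function as `area` would give the corresponding test for the Pd.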

We adapted this approach to test the difference in Pd observed in two experimental conditions. To create a null distribution for this test, we generated the contralateral-minus-ipsilateral difference wave for all 2¹⁵ combinations of ipsilateral and contralateral waveforms for each of the two conditions of interest separately and took a measure of positive polarity for each difference wave. We subsequently calculated the difference between these biased estimators for all 2³⁰ combinations of results for each of the two conditions. Importantly, at this stage both positive and negative polarity differences were included in the null distribution.
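A small sketch of this two-condition variant is given below. It assumes the same sign-flip representation of relabeling, builds one set of relabeled positive areas per condition, and forms every pairwise difference between the two sets. The text does not state how the p value is read from this distribution, so the one-tailed proportion used here is an assumption, as are the function names and example values.

```python
from itertools import product


def grand_average(waves):
    """Sample-by-sample mean across participants."""
    n = len(waves)
    return [sum(samples) / n for samples in zip(*waves)]


def positive_area(wave):
    """Sum of positive-going samples only."""
    return sum(v for v in wave if v > 0)


def relabeled_areas(diff_waves):
    """Positive area of the grand average under every ipsi/contra relabeling."""
    areas = []
    for signs in product((1, -1), repeat=len(diff_waves)):
        relabeled = [[s * v for v in w] for s, w in zip(signs, diff_waves)]
        areas.append(positive_area(grand_average(relabeled)))
    return areas


def area_difference_test(cond_a, cond_b):
    """Compare positive area between conditions against all pairwise null differences."""
    observed = positive_area(grand_average(cond_a)) - positive_area(grand_average(cond_b))
    areas_a = relabeled_areas(cond_a)
    areas_b = relabeled_areas(cond_b)
    # Signed differences, so the null contains positive and negative values.
    null = [a - b for a in areas_a for b in areas_b]
    # Assumption: one-tailed p, the proportion of null values at or beyond the observation.
    p = sum(d >= observed for d in null) / len(null)
    return observed, p, len(null)


# Illustrative data: three participants per condition, so 8 x 8 = 64 null values.
cond_a = [[0.0, 1.0, 2.0], [0.0, 1.5, 2.5], [0.0, 0.5, 1.5]]
cond_b = [[0.0, 0.2, 0.1], [0.0, -0.1, 0.2], [0.0, 0.1, -0.2]]
observed_diff, p_value, null_size = area_difference_test(cond_a, cond_b)
```

The null distribution grows as the product of the two relabeling counts, which is why 15 participants yield 2³⁰ pairwise differences.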

Results

Behavioral Results

When the target appeared at a lateral location and the distractor appeared at fixation, mean RT was 600 msec (SD = 62 msec) and mean error rate was 4.7% (SD = 3.8%). When the target was central and the distractor lateral, participants were both reliably faster (558 msec, SD = 59 msec; p < 10⁻⁵, d = 0.70) and more accurate (2.5% error, SD = 2.1%; p < 10⁻⁵, d = 0.72).

Electrophysiological Results

In the lateral target condition, when the distractor was presented at fixation, signed area analysis revealed a robust target-elicited N2pc beginning at approximately 200 msec poststimulus (p < 10⁻⁴; Figure 2A). This was preceded by apparent emergence of an early, positive polarity difference at around 150 msec poststimulus. This may reflect a sensory imbalance in the scenes: The prominent foreground presence of the lateral target in one field was not balanced by a corresponding foreground stimulus in the opposite field, and this may have created sensory laterality in the visual ERP. However, this positive polarity effect was not statistically significant in signed area analysis (p = .666).

Figure 2. 

Waveforms from Experiment 1 as recorded at PO7/8, where both Pd and N2pc were maximal. Note that negative is plotted upward here and in subsequent figures and that stimulus onset occurred at graph origin. Topographic maps reflect mean difference between ipsilateral and contralateral waveforms mirrored across the hemispheres with the midline electrodes set to zero value. (A) ERPs observed ipsilateral and contralateral to a lateral target. Topographic map reflects lateral voltage difference from 200 to 250 msec poststimulus. (B) ERPs observed ipsilateral and contralateral to a lateral distractor. Topographic map reflects lateral voltage difference from 132 to 182, which reflects a 50-msec interval centered at the cross-conditional peak of the contralateral positivity at 155 msec. (C) Contralateral-minus-ipsilateral difference waves. The N2pc is reflected in negative waveform polarity, and Pd is reflected in positive polarity.

In the lateral distractor condition—when the target was presented at fixation—the early positivity appears larger in amplitude and is followed by a hint of continued positive signal (Figure 2B). Because the raw ERP is dominated by large positive deviation in both ipsilateral and contralateral waveforms, these positive polarity effects are more clearly visible in the difference waves provided in Figure 2C. Though the absolute lateral positivity was only marginally reliable in signed area analysis (p = .090), the waveform was reliably more positive in the lateral distractor condition than in the lateral target condition, both when measured using the signed area approach over a large temporal interval (50–500 msec, p = .048) and using a more traditional measure of mean amplitude across a 40-msec interval centered on the 155-msec cross-conditional peak of positive polarity (p = .015, d = 0.322).

Existing research has associated this early positive polarity effect—the early Pd—with suppression of the lateral distractor (Kerzel et al., 2018; Weaver et al., 2017; Sawaki & Luck, 2010). Its emergence in Experiment 1 thus constitutes an indicator of rapid distractor suppression during naturalistic search. As detailed in the Introduction, Experiment 2 was designed with two primary purposes: (a) to test if the early Pd emerges during naturalistic target detection and (b) to determine if it shows variance as a function of target presence prior to the emergence of evidence of target presence in frontal cortex.

EXPERIMENT 2

Methods

Participants

Sixteen healthy University of Trento students gave informed consent before completing Experiment 2. None had taken part in Experiment 1, all reported normal or corrected-to-normal vision, and all were paid for their participation. Fourteen of the participants (seven men, 18–26 years old) were right-handed.

Stimuli and Procedure

A new set of 600 photographs was used in Experiment 2 (see Figure 3 for examples). This set partially overlapped with that employed in Experiment 1 but was supplemented by additional photos taken by authors D. P. and G. B. and by additional images from the LabelMe database (Russell et al., 2008).

Figure 3. 

Examples of scene stimuli employed in Experiment 2.

Photos once again contained examples of people and vehicles, but in Experiment 2, participants had to report whether an example of the target category was present in the scene. Half of participants responded with the left hand when the target was present and the right hand when it was not, with the reverse response mapping for the remaining participants.

Scenes contained a target in 66% of trials. In 200 photos a person appeared to the left or right and a vehicle appeared at fixation, in 200 photos a vehicle appeared to the left or right and a person appeared at fixation, in 100 photos a vehicle appeared to the left or right with no person in the scene, and in 100 photos a person appeared to the left or right with no vehicle in the scene. Unlike in Experiment 1, vehicles and people could face directly toward or away from the camera. To increase task demands on selective attention, the scene was presented for 60 msec before being replaced by a naturalistic mask (created by generating white noise at a wide range of spatial frequencies and superimposing a naturalistic texture; see Hickey & Peelen, 2015, Figure 1, for an illustration).

Participants completed 60 practice trials before completing 20 experimental blocks of 60 trials. Images containing both object categories appeared twice during the experiment, once when participants were instructed to search for vehicles and once when they were instructed to search for people. All other procedures, including all EEG recording and analysis, were as in Experiment 1.

Statistical Analysis

In addition to the statistical approaches detailed for Experiment 1, in Experiment 2 lateral ERP component onset and offset latencies were measured and compared. For comparisons between components, onset and offset latencies were defined as the time at which an effect either first surpassed 80% of maximum or first fell below 80% of maximum. To test the reliability of conditional differences in onset and offset latency, null distributions were iteratively generated by randomly relabeling observations for each of the two effects of interest. That is, in a comparison of the latency of Effect A to Effect B, the null distribution was created by iteratively relabeling Effect A and Effect B for subsets of participants, measuring the latency difference between mean averages of these two synthetic sets and adding this to an accumulator. This was carried out for all 2¹⁶ combinations of Effect A and Effect B, and the observed effect was compared with this null distribution.
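The latency measure and its relabeling test might be sketched as follows. This is an illustration under stated assumptions: effects are positive going (a negative component such as the N2pc would be sign-inverted first), offset is searched for after the peak, and the p value is two-tailed. None of these choices, nor the function names, are specified in the text.

```python
from itertools import product


def mean_wave(waves):
    """Sample-by-sample mean across participants."""
    n = len(waves)
    return [sum(samples) / n for samples in zip(*waves)]


def fractional_latencies(wave, times_ms, fraction=0.8):
    """Onset: first sample at or above fraction of the peak. Offset: first later sample below it."""
    peak_index = max(range(len(wave)), key=lambda i: wave[i])
    if wave[peak_index] <= 0:
        raise ValueError("expected a positive-going effect")
    threshold = fraction * wave[peak_index]
    onset = next(t for v, t in zip(wave, times_ms) if v >= threshold)
    offset = None
    for v, t in zip(wave[peak_index:], times_ms[peak_index:]):
        if v < threshold:
            offset = t
            break
    return onset, offset


def onset_difference_test(waves_a, waves_b, times_ms, fraction=0.8):
    """Exhaustive relabeling test of the onset difference between two effects."""
    def onset_diff(a, b):
        onset_a = fractional_latencies(mean_wave(a), times_ms, fraction)[0]
        onset_b = fractional_latencies(mean_wave(b), times_ms, fraction)[0]
        return onset_a - onset_b

    observed = onset_diff(waves_a, waves_b)
    null = []
    # For each participant, either keep or swap the Effect A / Effect B labels.
    for swaps in product((False, True), repeat=len(waves_a)):
        a = [wb if s else wa for s, wa, wb in zip(swaps, waves_a, waves_b)]
        b = [wa if s else wb for s, wa, wb in zip(swaps, waves_a, waves_b)]
        null.append(onset_diff(a, b))
    # Assumption: two-tailed p value.
    p = sum(abs(d) >= abs(observed) for d in null) / len(null)
    return observed, p


# Illustrative data: Effect A peaks at 120 msec, Effect B at 140 msec.
times = [100, 110, 120, 130, 140, 150, 160]
waves_a = [[0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
           [0.0, 1.5, 3.0, 1.5, 0.0, 0.0, 0.0],
           [0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0]]
waves_b = [[0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.5, 3.0, 1.5, 0.0],
           [0.0, 0.0, 0.0, 0.5, 1.0, 0.5, 0.0]]
onset_a, offset_a = fractional_latencies(mean_wave(waves_a), times)
observed_diff, p_value = onset_difference_test(waves_a, waves_b, times)
```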

In analysis of Experiment 2, we additionally compare the latency of a conditional difference in Pd components to the latency of a conditional difference in frontal ERPs. Because these effects differed in magnitude, the percentage of maximum approach to measuring onset was unsuitable (i.e., the much larger frontal effect was present for some time before reaching a substantive percentage of its maximum value). To estimate the onset of these effects, we contrasted the experimental conditions for each 1-msec sample in the ERP to identify the first evidence of statistically reliable difference (uncorrected threshold of p < .01). The first sample showing this difference that was followed by four more samples showing the same difference was defined as the onset of the effect. It is important to note that we employed noncausal filters during digitization of the EEG and thereafter and that these filters can shift effects forward in time. This is particularly the case when effects are large in magnitude, as is the case for frontal effects in the current data. The statistical estimates of absolute onset generated by this analysis should therefore be understood to be biased toward early estimation, particularly in the case of the large frontal effect. However, because the same filters are applied to all data, differences in onset observed between conditions are minimally impacted. Moreover, in the particular circumstance observed in the current results—where the larger effect follows the smaller effect—this impact of filtering will make estimates of latency differences between conditions conservative.
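The onset criterion described above (a significant sample followed by four more) can be sketched as a search for the first run of five consecutive samples below threshold. The sketch assumes that a p value has already been computed for every sample and that all significant samples in a run share the same direction of difference; the function name and the example p values are hypothetical.

```python
def first_sustained_onset(p_values, times_ms, alpha=0.01, run_length=5):
    """Time of the first sample that starts run_length consecutive samples with p < alpha."""
    run_start = None
    for i, p in enumerate(p_values):
        if p < alpha:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= run_length:
                return times_ms[run_start]
        else:
            run_start = None
    return None  # no sustained difference found


# Illustrative p values at 1-msec resolution from 100 to 111 msec.
times = list(range(100, 112))
p_values = [0.5, 0.005, 0.005, 0.2, 0.004, 0.003, 0.002, 0.001, 0.001, 0.3, 0.5, 0.5]
onset = first_sustained_onset(p_values, times)
```

In this made-up series the two significant samples at 101 and 102 msec are not sustained, so the onset falls at 104 msec, where the first run of five begins.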

Results

Behavioral Results

When the target appeared at a lateral location and the distractor was central, mean RT was 585 msec (SD = 108 msec) and miss rate was 21% (SD = 16%). Participants were reliably faster (539 msec, SD = 105 msec; p < 10⁻⁵, d = 0.424) and more accurate (2.2% misses, SD = 2.4%; p < 10⁻⁵, d = 1.18) when the target was central and the distractor was lateral. The slowest responses occurred when the target was absent from the scene (642 msec, SD = 119 msec) where a false alarm rate of 7.9% was observed (SD = 4.7%).

Electrophysiological Results

In the lateral target/central distractor condition, when the distractor was presented at fixation, signed area analysis revealed a target-elicited N2pc that began at approximately 200 msec poststimulus (p = .001; Figure 4A). As in Experiment 1, this was preceded by early, positive polarity activity that potentially reflects an imbalance in the visual properties of the scene, here emerging at around 100 msec, but this did not emerge reliably from signed area analysis (p = .589). Similar results were observed in the lateral distractor/no-target condition, with the distractor rather than the target as the eliciting stimulus. A robust distractor-elicited N2pc emerged at approximately 180 msec (p = .002; Figure 4B) and was preceded by a positive effect at around 100 msec that did not reach significance (p = .616). In the lateral distractor/central target condition, when the target was presented at fixation, the lateral ERP shows no hint of a distractor-elicited N2pc, but the early positivity sustains until later in the ERP and signed area analysis identified a reliable Pd (p = .001; Figure 4C, 4D).

Figure 4. 

Waveforms from Experiment 2 as recorded at PO7/8, where both Pd and N2pc were maximal. (A) ERPs observed ipsilateral and contralateral to a lateral target when a distractor appeared at fixation. Topographic map reflects lateral voltage difference from 239 to 289 msec poststimulus, which reflects a 50-msec interval centered at the N2pc peak. (B) ERPs observed ipsilateral and contralateral to a lateral distractor when no target appeared in the scene. Topographic map reflects lateral voltage difference from 189 to 239 msec poststimulus, which reflects a 50-msec interval centered at the N2pc peak. (C) ERPs observed ipsilateral and contralateral to a lateral distractor when the target appeared at fixation. Topographic map reflects lateral voltage difference from 160 to 210 msec poststimulus, which reflects a 50-msec interval centered at the Pd peak. (D) Contralateral-minus-ipsilateral difference waves.

Onset analysis showed that the distractor-elicited N2pc in the lateral distractor/no-target condition emerged 54 msec earlier than the target-elicited N2pc in the lateral target/central distractor condition (p < 10⁻⁴). Analysis of offset latency demonstrated that the early Pd sustained in the lateral distractor/central target condition 34 msec longer than in the lateral target/central distractor condition (p = .005), which itself sustained 36 msec longer than in the lateral distractor/no-target condition (p = .020).

As illustrated in Figure 5B, the peak of early Pd amplitude in the lateral distractor/no-target condition (identified in Figure 5 as the target-absent condition) appears of greater magnitude than in the lateral distractor/central target condition (identified in Figure 5 as the target-present condition). To test this, we determined the peak latency of positive lateral effect when these conditions were collapsed (121 msec) and subsequently extracted mean amplitude for each condition over a 40-msec interval centered at this time. The early distractor-elicited Pd was reliably larger when the target was absent from the scene (p = .033, d = 0.590).

Figure 5. 

(A) Frontocentral ERP as a function of whether target was present or absent in eliciting scene. (B) Lateral occipital contralateral-minus-ipsilateral difference wave as a function of whether target was present or absent in eliciting scene. Note that this panel reproduces Figure 4D for the purpose of illustration and comparison. Samples showing statistically significant difference between the waveforms are identified by shaded background (p < .01). No earlier latency difference was identified when a more liberal criterion was adopted (p < .05).

To determine if the early Pd preceded or followed nonspatial evidence of target presence in frontal cortex, we contrasted target-present and target-absent scenes to extract the frontal ERP measure of ultrafast target detection first reported by Thorpe et al. (1996). This ERP effect expresses as an increase in amplitude when the eliciting scene does not contain a target and is therefore thought to reflect emergence of a control process involved in stopping a prepotent “target-present” response when this would be an error (Thorpe et al., 1996). Consistent with this interpretation, it has latency and distribution similar to that of the “no-go N2” (Pfefferbaum, Ford, Weller, & Kopell, 1985; see Folstein & Van Petten, 2008, for a review). We measured this effect across a set of frontal midline electrodes (F1, Fz, F2, FC1, FCz, and FC2) where it is known to appear first and where it is maximal (Thorpe et al., 1996). As illustrated in Figure 5A, statistical onset analysis determined that target presence emerged in this signal beginning at 150 msec poststimulus, closely replicating the 152-msec onset identified by Thorpe et al. (1996). In contrast, target presence impacted the early Pd from 111 msec poststimulus (Figure 5B).

To probe the functional role of the early Pd, we conducted a median split of conditional results based on trial RTs. As illustrated in Figure 6, the early Pd appeared larger when the target was absent from the scene and participants were quick to correctly report its absence. To test the reliability of this pattern, we extracted mean Pd amplitude for each of the four waveforms illustrated in Figure 6 using the same 40-msec window described above. In the target-absent condition, the early Pd was larger when participants responded quickly (p = .010, d = 0.642). The interaction of target presence and participant response speed was marginally significant (p = .065). Note that, although amplitude of the early Pd varied as a function of response latency, neither the latency nor amplitude of the subsequent distractor-elicited N2pc showed any difference in this analysis.
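The median split and window-based amplitude measure can be illustrated with a short sketch. It is a hedged example and not the pipeline used in the study. The trial dictionary keys (`"rt"`, `"diff"`), the function names, and the window center passed as `center_ms` are assumptions introduced for illustration; the only detail taken from the text is the 40-msec window width.

```python
from statistics import mean, median


def mean_amplitude(wave, times_ms, center_ms, width_ms=40):
    """Mean of `wave` over samples inside a window of `width_ms` centered on `center_ms`."""
    half = width_ms / 2
    window = [v for v, t in zip(wave, times_ms) if center_ms - half <= t <= center_ms + half]
    return mean(window)


def average_wave(waves):
    """Sample-by-sample average across trials."""
    return [mean(samples) for samples in zip(*waves)]


def median_split_pd(trials, times_ms, center_ms):
    """Split trials at the median RT and measure the mean amplitude of each half.

    Each trial is a dict with hypothetical keys:
      "rt"   - response time in msec
      "diff" - contralateral-minus-ipsilateral voltage at each sample
    Trials with RT equal to the median are assigned to the fast half.
    """
    cutoff = median(t["rt"] for t in trials)
    fast = [t["diff"] for t in trials if t["rt"] <= cutoff]
    slow = [t["diff"] for t in trials if t["rt"] > cutoff]
    return {
        "fast": mean_amplitude(average_wave(fast), times_ms, center_ms),
        "slow": mean_amplitude(average_wave(slow), times_ms, center_ms),
    }
```

Running this separately for target-present and target-absent trials gives the four amplitude values that enter the comparison of target presence and response speed.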

Figure 6. 

Contralateral-minus-ipsilateral difference waves as a function of target presence and median split based on trial RT.

GENERAL DISCUSSION

What we know of visual search from studies employing synthetic stimuli arrays does not entirely match what we know of visual search from studies employing natural scenes. One conspicuous inconsistency is the speed of naturalistic search. Classic theories of search—developed from work with synthetic stimuli—suggest search for naturalistic targets in scenes should be slow because scenes contain so many objects. But naturalistic search is blazingly quick.

This disparity has been explained in two ways. One possibility is that detection of naturalistic targets—examples of real-world object categories—relies on the ability to detect any of a set of diagnostic midlevel features. By this, we can quickly infer the presence of a car in a scene because we have detected the combined presence of midlevel features like metallic sheen, a set of wheel-like circles, or a prototypical arrangement of headlights in a grill. This gives us a probabilistic notion that a car is present, with details and information about location filled in only when attention is deployed (Evans & Treisman, 2005).

This account depends on the plausible idea that extended experience of real-world object categories might render our visual system sensitive to midlevel visual features in the same way it is sensitive to low-level features. As a result, detection of overlearned midlevel features may not require attention, just as detection of low-level features—the color red, vertical orientation, movement—does not require attention (Treisman & Gelade, 1980). For the purpose of this study, the critical prediction of this account is that evidence of ultrafast object detection, like the frontal ERP effect reported by Thorpe et al. (1996), will precede evidence of spatial object processing such as distractor suppression.

The alternative is that detection of naturalistic targets requires selective processing, as predicted by classical theory, but that naturalistic search is more efficient than synthetic search. This efficiency could be supported by scene context, which might constrain search to locations likely to contain a target (e.g., Wolfe, Võ, et al., 2011; Torralba et al., 2006). The prediction from this account is that ultrafast object detection should follow spatially localized object processing and distractor suppression in particular, but that such localized object processing will emerge very quickly in naturalistic vision.

We used ERPs to differentiate between these accounts. In Experiment 1, we identified an index of early, spatially localized object processing that emerges during discrimination of naturalistic targets in scenes. This expressed as a rapid positive increase in visual ERP activity contralateral to a salient distractor. Importantly, it emerged under circumstances where the presence of the target itself could not contribute to the lateral ERP. Earlier research has shown that this early Pd reflects the suppression of distractors (Kerzel et al., 2018; Sawaki & Luck, 2010). For example, in concurrent EEG and eye-tracking work, the early distractor-elicited Pd precedes eye movements to targets, is absent when the eyes are captured to distractors, and has magnitude that predicts how close to the center of the target the eyes will land (Weaver et al., 2017).

The early Pd we observe in Experiment 1, though reliable in the sample, is unarguably small. We believe this is because the target in the eliciting scenes appeared at fixation and was therefore very easy to find. This design feature was adopted to maintain naturalism in the scene and to reduce potential confounds, but it is likely to have rendered the distractor less conspicuous and therefore less demanding of distractor suppression. Some support for this notion is provided by results from Experiment 2, where the early Pd is larger when the target is absent from the scene and attention is more heavily taxed.

The early Pd identified in Experiment 1 constitutes an early index of spatially localized distractor suppression during search. Experiment 2 builds from this finding to test the involvement of this distractor suppression in the detection of naturalistic targets. A core goal of Experiment 2 was to determine if this kind of object processing was evident in the visual ERP before or after evidence of target presence emerged in frontal brain activity.

Detection of a naturalistic target in Experiment 2 was associated with a robust N2pc, but this component emerged later than the onset of information about target presence in frontal ERPs and therefore could not play the causative role in target detection that is suggested by classic attention theory. However, frontal evidence of target presence, emerging at 150 msec poststimulus, was preceded by sensitivity to target presence in the early Pd at 111 msec. The early Pd was larger when the eliciting scene did not contain a target, suggesting an increase in distractor suppression when the target was not easily found. This effect was accentuated when participants quickly reported the absence of the target, consistent with prior observations of an association between Pd and response latency (Weaver et al., 2017; Gaspar & McDonald, 2014).

The Pd was briefer when the target was absent from the display, with subsequent emergence of a distractor-elicited N2pc reflecting attentional selection of this stimulus. One possibility here is that participants attentionally selected the distractor in the process of confirming the absence of the target from the scene. However, if this distractor-elicited N2pc were involved in evidence accumulation leading to a target-absent response, it might be expected to vary as a function of response latency. But no hint of this emerged in data analysis. The functional role of this distractor-elicited N2pc thus remains an open question: It may be involved in instrumental decision-making, but it does not vary according to expectations derived from this interpretation.

Our results are consistent with fMRI evidence from studies of naturalistic vision demonstrating a role for distractor suppression in visual search (Seidl, Peelen, & Kastner, 2012). However, the idea that ultrafast object detection relies on spatial operations acting on object representations in retinotopic cortex is at odds with recent conclusions drawn by Battistoni, Kaiser, Hickey, and Peelen (2018) from multivariate analysis of magnetoencephalographic data. Participants in the study of Battistoni et al. (2018) completed two tasks. In the first, they detected the presence of a synthetic target rendered salient through manipulation of global display characteristics. In the second, they detected the presence of lateral examples of real-world object categories in photographs of natural scenes. Magnetoencephalography was recorded in both tasks, and a multivariate classifier trained on the synthetic search task was used to determine the timing of selection in the naturalistic search task. Results showed that reliable decoding of target location emerged only after reliable decoding of target presence, leading the authors to conclude that target detection preceded the action of selective attention. But the effect identified in the current paper, in which target presence changes the action of a spatial mechanism of distractor suppression, does not provide information about target location. The dependent measure employed by Battistoni et al. (2018) was therefore insensitive to the effect identified in the current paper. Future work employing multivariate classification could identify the earliest emergence of distractor suppression in naturalistic vision.

In summary, we provide evidence that ultrafast object detection involves ultrafast suppression of salient distractors. The quick distractor suppression we identify is consistent with the notion that global scene information can be used to rapidly suppress salient objects that are unlikely to be targets (or perhaps locations that are unlikely to hold targets). This mechanism improves efficiency by constraining the search space, giving us the ability to very rapidly detect the presence of targets in natural scenes.

Reprint requests should be sent to Clayton Hickey, School of Psychology, University of Birmingham, B15 2TT, Birmingham, United Kingdom, or via e-mail: c.m.hickey@bham.ac.uk.

REFERENCES

Battistoni, E., Kaiser, D., Hickey, C., & Peelen, M. V. (2018). The time course of spatial attention during naturalistic visual search. Cortex. doi: 10.1016/j.cortex.2018.11.018.

Bell, A. J., & Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7, 1129–1159.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.

Das, K., Guo, F., Geisbrecht, B., & Eckstein, M. P. (2010). Predicting contextual locations in natural scenes from neural activity. Journal of Vision, 10, 1295.

Doallo, S., Patai, E. Z., & Nobre, A. C. (2013). Reward associations magnify memory-based biases on perception. Journal of Cognitive Neuroscience, 25, 245–257.

Evans, K. K., & Treisman, A. (2005). Perception of objects in natural scenes: Is it really attention free? Journal of Experimental Psychology: Human Perception and Performance, 31, 1476–1492.

Fabre-Thorpe, M., Delorme, A., Marlot, C., & Thorpe, S. (2001). A limit to the speed of processing in ultra-rapid visual categorization of novel natural scenes. Journal of Cognitive Neuroscience, 13, 171–180.

Folstein, J. R., & Van Petten, C. (2008). Influence of cognitive control and mismatch on the N2 component of the ERP: A review. Psychophysiology, 45, 152–170.

Gaspar, J. M., & McDonald, J. J. (2014). Suppression of salient objects prevents distraction in visual search. Journal of Neuroscience, 34, 5658–5666.

Hickey, C., Di Lollo, V., & McDonald, J. J. (2009). Electrophysiological indices of target and distractor processing in visual search. Journal of Cognitive Neuroscience, 21, 760–775.

Hickey, C., McDonald, J. J., & Theeuwes, J. (2006). Electrophysiological evidence of the capture of visual attention. Journal of Cognitive Neuroscience, 18, 604–613.

Hickey, C., & Peelen, M. V. (2015). Neural mechanisms of incentive salience in naturalistic human vision. Neuron, 85, 512–518.

Kerzel, D., Barras, C., & Grubert, A. (2018). Suppression of salient stimuli inside the focus of attention. Biological Psychology, 139, 106–114.

Li, F. F., VanRullen, R., Koch, C., & Perona, P. (2002). Rapid natural scene categorization in the near absence of attention. Proceedings of the National Academy of Sciences, U.S.A., 99, 9596–9601.

Luck, S. J., & Hillyard, S. A. (1994a). Electrophysiological correlates of feature analysis during visual search. Psychophysiology, 31, 291–308.

Luck, S. J., & Hillyard, S. A. (1994b). Spatial filtering during visual search: Evidence from human electrophysiology. Journal of Experimental Psychology: Human Perception and Performance, 20, 1000–1014.

Oliva, A., & Torralba, A. (2007). The role of context in object recognition. Trends in Cognitive Sciences, 11, 520–527.

Patai, E. Z., Doallo, S., & Nobre, A. C. (2012). Long-term memories bias sensitivity and target selection in complex scenes. Journal of Cognitive Neuroscience, 24, 2281–2291.

Peelen, M. V., & Kastner, S. (2014). Attention in the real world: Toward understanding its neural basis. Trends in Cognitive Sciences, 18, 242–250.

Pfefferbaum, A., Ford, J. M., Weller, B. J., & Kopell, B. S. (1985). ERPs to response production and inhibition. Electroencephalography and Clinical Neurophysiology, 60, 423–434.

Potter, M. C. (1975). Meaning in visual search. Science, 187, 965–966.

Rousselet, G. A., Fabre-Thorpe, M., & Thorpe, S. J. (2002). Parallel processing in high-level categorization of natural images. Nature Neuroscience, 5, 629–630.

Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157–173.

Sawaki, R., Geng, J. J., & Luck, S. J. (2012). A common neural mechanism for preventing and terminating the allocation of attention. Journal of Neuroscience, 32, 10725–10736.

Sawaki, R., & Luck, S. J. (2010). Capture versus suppression of attention by salient singletons: Electrophysiological evidence for an automatic attend-to-me signal. Attention, Perception, & Psychophysics, 72, 1455–1470.

Seidl, K. N., Peelen, M. V., & Kastner, S. (2012). Neural evidence for distracter suppression during visual search in real-world scenes. Journal of Neuroscience, 32, 11812–11819.

Thorpe, S. J., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381, 520–522.

Torralba, A., Oliva, A., Castelhano, M. S., & Henderson, J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113, 766–786.

Treisman, A. (2006). How the deployment of attention determines what we see. Visual Cognition, 14, 411–443.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.

VanRullen, R., & Thorpe, S. J. (2001). The time course of visual processing: From early perception to decision-making. Journal of Cognitive Neuroscience, 13, 454–461.

Weaver, M. D., van Zoest, W., & Hickey, C. (2017). A temporal dependency account of attentional inhibition in oculomotor control. Neuroimage, 147, 880–894.

Wolfe, J. M., Alvarez, G. A., Rosenholtz, R., Kuzmova, Y. I., & Sherman, A. M. (2011). Visual search for arbitrary objects in real scenes. Attention, Perception, & Psychophysics, 73, 1650–1671.

Wolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15, 419–433.

Wolfe, J. M., Võ, M. L.-H., Evans, K. K., & Greene, M. R. (2011). Visual search in scenes involves selective and nonselective pathways. Trends in Cognitive Sciences, 15, 77–84.

Woodman, G. F., & Luck, S. J. (2003). Serial deployment of attention during visual search. Journal of Experimental Psychology: Human Perception and Performance, 29, 121–138.