The human cognitive system is highly efficient in extracting information from our visual environment. This efficiency is based on acquired knowledge that guides our attention toward relevant events and promotes the recognition of individual objects as they appear in visual scenes. The experience-based representation of such knowledge contains not only information about the individual objects but also about relations between them, such as the typical context in which individual objects co-occur. The present EEG study aimed at exploring the availability of such relational knowledge in the time course of visual scene processing, using oscillatory evoked gamma-band responses as a neural correlate for a currently activated cortical stimulus representation. Participants decided whether two simultaneously presented objects were conceptually coherent (e.g., mouse–cheese) or not (e.g., crown–mushroom). We obtained increased evoked gamma-band responses for coherent scenes compared with incoherent scenes beginning as early as 70 msec after stimulus onset within a distributed cortical network, including the right temporal, the right frontal, and the bilateral occipital cortex. This finding provides empirical evidence for the functional importance of evoked oscillatory activity in high-level vision beyond the visual cortex and, thus, gives new insights into the functional relevance of neuronal interactions. It also indicates the very early availability of experience-based knowledge that might be regarded as a fundamental mechanism for the rapid extraction of the gist of a scene.
When pictures are presented as briefly as 125 msec in rapid succession, we are nevertheless able to detect an object category reliably (Potter, 1975), demonstrating the outstanding speed of visual perception. Several studies have investigated how fast such object knowledge is activated using ERPs or intracranial recordings and have shown it to be available roughly 100–150 msec after stimulus onset (Liu, Agam, Madsen, & Kreiman, 2009; VanRullen & Thorpe, 2001; Thorpe, Fize, & Marlot, 1996). When scenes contain multiple objects, not only the knowledge related to individual objects comes into play but also the knowledge reflecting the general meaning of the scene, the so-called gist (Bar, 2004; Davenport & Potter, 2004; Hochstein & Ahissar, 2002; Henderson & Hollingworth, 1999). The extraction of gist from a visual scene is based on the conceptual coherence between its constituting elements. Although it has been demonstrated that physiostatistical properties of the visual input, such as the spatial frequencies in the picture of a scene, contribute to the extraction of gist (e.g., Oliva & Torralba, 2001, 2007; Schyns & Oliva, 1994), the present study focused on the semantic relation between objects that fundamentally contributes to this mechanism.
The coherence between objects arises from several semantic scene properties, such as the probability of co-occurrence of objects in a common context or their relative position and size. Several studies have shown that the relational embedding of a target object in a scene facilitates its detection (e.g., Biederman, Mezzanotte, & Rabinowitz, 1982), whereas the embedding of the object in an unrelated scene (e.g., an octopus in a farm scene) hinders its detection. For example, objects that are not integrated in a scene are fixated more often and longer (Henderson, Weeks, & Hollingworth, 1999; Loftus & Mackworth, 1978) and attract more attention (Gordon, 2004) because they require a higher demand in processing. The coherence of objects also affects the processing of task-irrelevant context objects. For example, a naming study demonstrated that context objects in coherent scenes are more likely to be processed up to a lexical–phonological level (Oppermann, Jescheniak, & Schriefers, 2008; see also Oppermann, Jescheniak, Schriefers, & Görges, 2010). Overall, these studies demonstrate that the gist provided by semantic relations of objects in a scene promotes visual processing.
The present study focused on (1) the temporal dynamics of gist extraction and (2) the cortical origins of gist perception. We hypothesized that the conceptual relations between multiple depicted objects are effective as early as incoming sensory information matches existing semantic memory traces in a first feed-forward sweep (Lamme & Roelfsema, 2000). To tap into the processing of these relations, we used high-density EEG (128 electrodes) and analyzed the so-called evoked gamma-band response (eGBR). eGBRs reflect cortical oscillatory activity in a frequency range above approximately 25 Hz. They occur at a latency range of approximately 100 msec after stimulus onset and are precisely phase-locked to stimulus onset (e.g., Tallon-Baudry & Bertrand, 1999). Cortical oscillatory activity in the gamma-band frequency range is supposed to reflect the activation of distributed cortical stimulus representations (Fries, 2005; Engel & Singer, 2001). This activation can be influenced by the existence of stimulus-specific memory representations (Roye, Schröger, Jacobsen, & Gruber, 2010; Herrmann, Lenz, Junge, Busch, & Maess, 2004; Herrmann, Munk, & Engel, 2004). For example, the study by Herrmann, Lenz, et al. (2004) demonstrated that real-world objects, which are represented in long-term memory, evoked a stronger eGBR at occipital electrodes than nonsense objects without a long-term memory representation. Hence, the authors argued that the eGBR reflects the integration of incoming sensory information and existing memory templates.
Our experiment contrasted stimulus displays, in which two objects shared a semantic–conceptual relation (coherent scene, e.g., mouse–cheese) with stimulus displays in which the two objects had no obvious relation (incoherent scene, e.g., crown–mushroom; see also Figure 1). We expected a stronger eGBR elicited by coherent scenes, reflecting the processing of relational information in addition to the processing of individual object information.
Twenty-two healthy, right-handed adults participated in the experiment. All of them had normal or corrected-to-normal vision.
Materials, Design, and Procedure
Three hundred twenty (320) line drawings of single objects were paired to form scenes consisting of two objects. Half the objects were combined to form a coherent context (coherent condition); the other half were arbitrarily combined (incoherent condition, 80 scenes per condition). Pictures of scenes were presented as white line drawings on a black screen (see Figure 1 for an example). Pictures of each scene were sized to fill an imaginary square of about 8 × 8 cm and hence were the same size in both conditions. The number of filled pixels was matched between conditions. Furthermore, we analyzed the energy level across the spatial frequency range of our stimuli, because spatial frequency information of visual scenes has been linked to the speed and to different pathways of processing (e.g., Bar, 2003) and may convey different aspects of scene information (e.g., Schyns & Oliva, 1994). There were no significant differences between conditions when analyzing the complete frequency range, only low spatial frequency information (below four cycles per degree of visual angle), or only high spatial frequency information (above six cycles per degree of visual angle), all Fs < 1, and no interaction with spatial frequency, Fs < 1.
The sequence of the pictures was randomized for each participant. All objects were used only once, either in the coherent or in the incoherent condition. This procedure prevents participants from becoming familiar with any specific object and building up expectations with respect to the topic of any upcoming scene.
Participants were instructed to decide whether the object pairs shared a general meaning (coherent condition) or not (incoherent condition) by giving a push-button response as fast and as accurately as possible. The left-to-right assignment of response keys was counterbalanced across participants. Each trial lasted between 2900 and 3200 msec. First, a fixation cross appeared at the center of the screen placed 1.5 m in front of the participants (frame rate = 70 Hz). After a variable interval of 500–800 msec, the scene was shown for 700 msec (visual angle of approximately 5°). Picture onset was synchronized to the vertical retrace of the monitor. After the disappearance of the picture, the fixation cross remained on the screen for another 800 msec and was followed by a blank screen (900 msec).
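The trial timing described above can be checked with a few lines of arithmetic. The following is a minimal sketch; the durations are taken from the text, whereas the variable and function names are illustrative only.

```python
# Durations in msec as stated in the text; names are illustrative.
FIXATION_RANGE = (500, 800)   # variable fixation interval before the scene
SCENE = 700                   # scene presentation
POST_FIXATION = 800           # fixation cross after scene offset
BLANK = 900                   # blank screen
FRAME_RATE_HZ = 70            # monitor frame rate


def trial_duration(fixation_ms):
    """Total trial length in msec for a given fixation interval."""
    return fixation_ms + SCENE + POST_FIXATION + BLANK


shortest = trial_duration(FIXATION_RANGE[0])
longest = trial_duration(FIXATION_RANGE[1])
# Number of monitor frames covered by the scene presentation.
scene_frames = SCENE * FRAME_RATE_HZ / 1000

print(shortest, longest, scene_frames)
```

The four phases sum to the stated trial length of 2900 to 3200 msec, and the 700-msec scene duration corresponds to a whole number of frames at 70 Hz.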
The EEG was recorded continuously from 128 active electrodes using a BioSemi Active Two amplifier system (sampling rate = 512 Hz). Eye movements and blinks were monitored by recording the horizontal and vertical EOG. Two additional electrodes (CMS = common mode sense and DRL = driven right leg; cf. www.biosemi.com/faq/cms&drl.htm) were used as recording reference and ground. For further analysis, the average reference was used.
Artifact correction was performed by means of statistical correction of artifacts in dense array studies (Junghöfer, Elbert, Tucker, & Rockstroh, 2000). Furthermore, incorrect responses were excluded from further analyses (approximately 18 trials per participant). The average rejection rate after artifact correction was approximately 20% of the epochs. Next, the EEG was off-line averaged time-locked to stimulus onset (one epoch of 1500 msec included a 500-msec prestimulus baseline interval).
Spectral changes in oscillatory activity were analyzed by means of Morlet wavelet transformations (Bertrand & Pantev, 1994) using a width of seven cycles per wavelet. This method provides a time-varying magnitude of the signal in each frequency band, leading to a Time × Frequency (TF) representation of the data. To determine a suitable time and frequency window for the statistical analysis, a TF plot averaged across all conditions and all electrodes was used. Furthermore, TF plots of typical participants were generated to document the stability of eGBRs at a single-subject level. These analyses resulted in the selection of a TF window of 70–130 msec and 30–50 Hz. Next, we submitted the mean eGBR amplitudes (70–130 msec, 30–50 Hz) within six regional electrode means (see Figure 2B) to a repeated measure ANOVA involving the three fixed variables Coherence (coherent vs. incoherent object pairs), Caudality (anterior vs. central vs. posterior), and Hemisphere (left vs. right).
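To illustrate the kind of time-frequency decomposition described above, the following dependency-free sketch convolves a single averaged (evoked) channel with complex Morlet wavelets. It is not the analysis code of the study. In particular, it assumes that "seven cycles" means a ratio f0/σ_f = 7 between center frequency and spectral standard deviation, and the normalization, the truncation at three standard deviations, and the synthetic 40-Hz test burst are likewise assumptions made for illustration. In practice, an array library would be used instead of plain Python loops.

```python
import cmath
import math


def morlet(f0, fs, n_cycles=7.0, n_sd=3.0):
    """Complex Morlet wavelet centred on f0 (Hz), sampled at fs (Hz).

    Assumption: n_cycles = f0 / sigma_f, so sigma_t = n_cycles / (2*pi*f0).
    """
    sigma_t = n_cycles / (2.0 * math.pi * f0)       # temporal SD in seconds
    half = int(round(n_sd * sigma_t * fs))           # half-length in samples
    norm = 1.0 / math.sqrt(sigma_t * math.sqrt(math.pi))
    return [norm * math.exp(-(k / fs) ** 2 / (2.0 * sigma_t ** 2))
            * cmath.exp(2j * math.pi * f0 * k / fs)
            for k in range(-half, half + 1)]


def tf_magnitude(signal, fs, freqs, n_cycles=7.0):
    """Return {frequency: [magnitude per sample]} for one averaged channel."""
    out = {}
    n = len(signal)
    for f0 in freqs:
        w = morlet(f0, fs, n_cycles)
        half = len(w) // 2
        row = []
        for t in range(n):
            acc = 0j
            for k, wk in enumerate(w):
                idx = t + k - half
                if 0 <= idx < n:                     # zero-padding at the edges
                    acc += signal[idx] * wk.conjugate()
            row.append(abs(acc) / fs)                # dt = 1/fs approximates the integral
        out[f0] = row
    return out


# Synthetic "evoked" signal: a 40-Hz burst centred 100 msec after onset,
# in a 1500-msec epoch with a 500-msec prestimulus baseline at 512 Hz.
fs = 512.0
onset = 256
times = [(i - onset) / fs for i in range(768)]
burst = [math.exp(-((t - 0.1) ** 2) / (2 * 0.02 ** 2))
         * math.cos(2 * math.pi * 40 * t) for t in times]

tf = tf_magnitude(burst, fs, freqs=[20.0, 40.0, 60.0])
peak_freq = max(tf, key=lambda f: max(tf[f]))
peak_time = times[max(range(len(times)), key=lambda i: tf[40.0][i])]
print(peak_freq, round(peak_time, 3))
```

Because the wavelets are applied to the average across trials, only activity that is phase-locked to stimulus onset survives, which is what distinguishes the evoked response from induced activity.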
Finally, to validate the oscillatory nature of the eGBR, we band-pass filtered the ERP from 30 to 50 Hz and compared the topographies of successive peaks and troughs of the resulting signal in the time domain (see Footnote 1).
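One way to quantify such a comparison is to correlate the scalp maps at successive extrema of the filtered signal. The sketch below is an assumption about how this could be done, not the procedure used in the study: it takes an already band-pass filtered ERP, finds local peaks and troughs on a reference channel, and returns the absolute spatial correlation between successive maps. The channel weights and the synthetic source are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def local_extrema(trace):
    """Indices of local maxima and minima of a 1-D trace."""
    return [i for i in range(1, len(trace) - 1)
            if (trace[i] - trace[i - 1]) * (trace[i + 1] - trace[i]) < 0]


def topography_stability(data, ref_channel):
    """data[channel][sample] holds a band-pass filtered ERP.

    Returns absolute spatial correlations between maps at successive
    peaks and troughs; the absolute value removes the polarity
    reversal between a peak and the following trough.
    """
    extrema = local_extrema(data[ref_channel])
    maps = [[ch[i] for ch in data] for i in extrema]
    return [abs(pearson(a, b)) for a, b in zip(maps, maps[1:])]


# Hypothetical example: one 40-Hz source projected with a fixed scalp pattern.
fs = 512.0
weights = [1.0, 0.5, -0.3, -0.8]
source = [math.exp(-((i / fs - 0.1) ** 2) / (2 * 0.03 ** 2))
          * math.sin(2 * math.pi * 40 * i / fs) for i in range(103)]
data = [[w * s for s in source] for w in weights]

correlations = topography_stability(data, ref_channel=0)
print(min(correlations))
```

For a single oscillating generator the maps differ only in sign and amplitude, so all correlations are close to 1. A sequence of distinct non-oscillatory components would instead produce changing maps and lower values.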
To localize the cortical generators of the statistically significant eGBR differences between coherent and incoherent stimulus pairings, we applied variable resolution electromagnetic tomography (VARETA; Bosch-Bayard et al., 2001). This procedure provides the spatial intracranial distribution of primary current densities (PCDs) in source space, which is best compatible with the amplitude distribution in electrode space. In particular, the eGBR was transformed into the frequency domain as described above (wavelet analysis), and VARETA was applied to the complex wavelet coefficients (cf. Gruber, Trujillo-Barreto, Giabbiconi, Valdes-Sosa, & Müller, 2006). Because of the linear relationship between EEG and PCD, the complex source reconstructions can be interpreted as an estimate of the wavelet coefficients of the PCD (complex inverse solution; Trujillo-Barreto, Aubert-Vazquez, & Valdes-Sosa, 2004). As possible sources of the signal, 3244 grid points (“voxels”) of a 3-D grid (7-mm grid spacing) were used. This grid and the arrangement of 128 electrodes were placed in registration with the average probabilistic MRI atlas (“average brain”) produced by the Montreal Neurological Institute (MNI; Evans et al., 1993). Statistical comparisons were carried out by means of Hotelling's T² tests to localize differences in activation between coherent and incoherent stimuli. Activation threshold corrections to account for spatial dependencies between voxels were calculated by means of random field theory (Worsley et al., 1996). For all statistical parametric maps (SPMs), the results were thresholded at a significance level of p < .01. Finally, the outcomes were depicted as 3-D activation images constructed on the basis of the MNI average brain.
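For orientation, the textbook form of Hotelling's T² statistic for n paired observations is given below. The symbols are introduced here for illustration only and do not appear in the original text: d̄ is the mean of the p-dimensional difference vectors (coherent minus incoherent) and S their sample covariance matrix. How the dimensionality p is configured per voxel in the VARETA analysis is not stated in the text, so this should be read as a generic reminder rather than a description of the actual implementation.

```latex
T^2 = n\,\bar{\mathbf{d}}^{\top}\mathbf{S}^{-1}\bar{\mathbf{d}},
\qquad
\frac{n-p}{p\,(n-1)}\,T^2 \sim F_{p,\,n-p}
\quad \text{under } H_0:\ \boldsymbol{\mu}_d = \mathbf{0}
```

For p = 1 the statistic reduces to the square of the paired t statistic.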
Before all ERP analyses, a 25-Hz low-pass filter was applied to the data. Furthermore, the averaged signal in a baseline period (500 msec prestimulus) was subtracted from all samples. To rule out perceptual differences in early visual processing between the two experimental conditions, we compared the following early ERP components: P1 (100–120 msec) and N1 (155–175 msec). Condition differences within these components were analyzed by means of paired t tests (averaged amplitudes across posterior electrodes). Additionally, we analyzed a late component (L1, 300–600 msec) at the six regional means presented in Figure 2B. Moreover, the ERP was examined within the same time window as the eGBR (70–130 msec), thereby using the same repeated measure ANOVA model as for the eGBR.
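The baseline subtraction and the extraction of mean amplitudes in a component window can be sketched as follows. This is a minimal illustration under stated assumptions: the sampling rate (512 Hz), baseline length (500 msec), and P1 window (100–120 msec) are taken from the text, whereas the function names and the synthetic epoch are hypothetical.

```python
def baseline_correct(epoch, fs, baseline_ms):
    """Subtract the mean of the prestimulus interval from every sample."""
    n_base = int(round(baseline_ms * fs / 1000.0))
    base = sum(epoch[:n_base]) / n_base
    return [v - base for v in epoch]


def window_mean(epoch, fs, baseline_ms, start_ms, end_ms):
    """Mean amplitude between start_ms and end_ms relative to stimulus onset."""
    first = int(round((baseline_ms + start_ms) * fs / 1000.0))
    last = int(round((baseline_ms + end_ms) * fs / 1000.0))
    segment = epoch[first:last + 1]
    return sum(segment) / len(segment)


# Synthetic epoch: 1500 msec at 512 Hz (768 samples) with a constant
# offset of 2.0 and a deflection of 1.0 in the P1 window.
fs = 512.0
epoch = [2.0] * 768
for i in range(307, 318):          # samples covering 100-120 msec after onset
    epoch[i] += 1.0

corrected = baseline_correct(epoch, fs, baseline_ms=500)
p1 = window_mean(corrected, fs, baseline_ms=500, start_ms=100, end_ms=120)
print(p1)
```

The same window function could be applied to the 70–130 msec range to obtain the ERP amplitudes that are compared with the eGBR.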
Behavioral data revealed faster RTs for coherent compared with incoherent object pairs (coherent: mean = 743 msec, SEM = 33 msec; incoherent: mean = 846 msec, SEM = 35 msec; mean difference = 103 msec, SEM = 8 msec; t(21) = 13.91; p < .0001).
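The statistic reported above is a paired t test, that is, the mean of the within-participant differences divided by its standard error, with n − 1 degrees of freedom (21 for 22 participants). The following sketch shows the computation. The RT values in the example are invented for illustration and are not data from the study.

```python
import math


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)   # sample variance
    sem_d = math.sqrt(var_d / n)                              # SEM of the difference
    return mean_d / sem_d, n - 1


# Hypothetical RTs (msec) for four participants.
incoherent = [850, 900, 800, 870]
coherent = [750, 790, 710, 760]

t, df = paired_t(incoherent, coherent)
print(round(t, 2), df)
```

The same relation (mean difference divided by its SEM) underlies the planned comparisons reported for the eGBR and the late ERP component.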
In Figure 2A, the TF plot of the stimulus-locked activity over all participants averaged across conditions and all electrodes is presented. Figure 3 shows TF plots of six typical participants. On the basis of these plots, an eGBR peak in a TF window of 30–50 Hz and 70–130 msec was selected for further analyses (see Footnote 2).
We found an interaction of Coherence and Hemisphere (F(1, 21) = 6.91, p < .02, MSE = .303), although no other effects involving the factor Coherence reached significance. Subsequent planned comparisons (t tests) revealed that the eGBR was significantly increased for coherent compared with incoherent object pairs in the right hemisphere (coherent: mean = 0.170 μV, SEM = 0.027 μV; incoherent: mean = 0.128 μV, SEM = 0.025 μV; mean difference = 0.042 μV, SEM = 0.016 μV; t(21) = 2.66, p < .02), although no difference was observable in the left hemisphere (coherent: mean = 0.135 μV, SEM = 0.026 μV; incoherent: mean = 0.133 μV; SEM = 0.025 μV; mean difference = 0.003 μV, SEM = 0.012 μV; t(21) < 1; Figure 2C).
In Figure 4, the topographies of successive peaks and troughs of the ERP band-pass filtered from 30 to 50 Hz are presented. The result showed a stable topography across these peaks and troughs, indicating the oscillatory nature of the eGBR effect reported here.
The right-hemispheric distribution of the coherence effect (Figure 2B) was supported by the VARETA analysis, which estimates the sources of the intracranial density distributions compatible with the observed scalp voltage topographies (see also Gruber et al., 2006). This analysis revealed an activated network comprising the right middle frontal gyrus, the right middle and superior temporal gyrus, the precentral gyrus, and the bilateral occipital lobes as the source of the effect (Figure 5).
In the comparison of the coherent and incoherent condition at posterior electrodes, none of the amplitude differences of the early ERP components (P1, N1) reached significance (P1: t(21) < 1; N1: t(21) = 1.20, p > .24; see also Figure 6).
Regarding the L1 component (300–600 msec), the ANOVA resulted in a significant interaction of Coherence and Caudality (F(2, 42) = 83.84, p < .0001, MSE = .324). Subsequent planned comparisons (t tests) revealed that the L1 component was significantly more negative for the coherent compared with the incoherent condition at posterior electrodes (see Figure 6; posterior coherent: mean = 4.07 μV, SEM = 0.65 μV; posterior incoherent: mean = 5.37 μV, SEM = 0.65 μV; mean difference = 1.30 μV, SEM = 0.12 μV; t(21) = 10.89, p < .0001), whereas the coherent condition was significantly more positive at central and anterior electrodes (central coherent: mean = −1.02 μV, SEM = 0.18 μV; central incoherent: mean = −1.35 μV, SEM = 0.17 μV; mean difference = 0.33 μV, SEM = 0.06 μV; t(21) = 5.57, p < .0001; anterior coherent: mean = −3.44 μV, SEM = 0.68 μV; anterior incoherent: mean = −4.25 μV, SEM = 0.66 μV; mean difference = 0.81 μV, SEM = 0.12 μV; t(21) = 7.04, p < .0001). This ANOVA result is underlined by the difference topography (coherent minus incoherent) in Figure 6 (right topography).
The ERP for the same regional means and the same latency range (70–130 msec) as the eGBR revealed no effects: in the ANOVA involving the three fixed variables Coherence, Caudality, and Hemisphere, neither the main effect of Coherence nor any interaction involving Coherence reached significance (all F < 2.3, p > .1). Furthermore, Figure 6 (left topography) depicts the topography of the ERP averaged across both coherence conditions in the latency range from 70 to 130 msec to allow for a comparison with the topographies depicted in Figure 4.
In summary, our analyses show that the results regarding the eGBR are not accompanied by changes in early ERP components. They are specific to the gamma-band range. Thus, the transformation in the frequency domain reveals effects that are not visible in conventional ERP analyses.
The present study investigated the temporal availability of semantic knowledge in the processing of visual scenes by using oscillatory activity in the gamma-band range as a neural correlate of the formation of a coherent scene representation. We observed a very early effect in the eGBR (30–50 Hz) in a latency range of about 70–130 msec. Within this time window, the eGBR was increased for scenes in which objects were semantically related compared with unrelated objects. Thus, semantic scene knowledge is activated at early stages within the visual processing hierarchy. The results suggest that the difference of eGBRs between the coherent and the incoherent condition reflects the facilitated binding of two objects into a coherent scene representation.
However, before accepting this far-reaching conclusion, we have to rule out alternative explanations of this effect. Within the experiment, an individual object occurred only once, either in the coherent or in the incoherent condition. Because different objects were used in each condition, one might argue that, in fact, some physical differences between object sets used in both conditions may have caused the observed eGBR effect. Indeed, there are studies which have shown that eGBR measurements are sensitive to physical stimulus characteristics (e.g., Schadow et al., 2007; Busch, Debener, Kranczioch, Engel, & Herrmann, 2004; for a review, see Herrmann, Fründ, & Lenz, 2010). For example, a study by Busch et al. (2004) demonstrated that the eGBR increases with the size of a stimulus. Such variation in the eGBR is accompanied by concurrent changes in the early ERPs (P1 and N1; see also Martinovic, Gruber, & Müller, 2008; Busch, Herrmann, Müller, Lenz, & Gruber, 2006) and, most importantly, can be attributed to visual areas from the scalp topography. In our study, several reasons argue against the objection that physical stimulus characteristics between conditions might have caused the difference in eGBRs. In each condition, we used 160 different individual objects. To start with, two objects each were paired, forming object configurations that had the same size and contained an equal number of pixels in both conditions (see Materials). Furthermore, we observed no differences in early ERPs (P1, N1), which should have been affected by physical stimulus differences between picture sets. Finally, the eGBR effect in the present study has been localized to a network, including the right middle and superior temporal gyrus and the right middle frontal gyrus (see Figure 2B), whereas eGBR differences caused by physical stimulus properties exhibit occipital loci and show no lateralization. 
Taken together, all these arguments speak against the interpretation of our effect in terms of differences in physical stimulus properties. Furthermore, preliminary data from a study in which stimuli were counterbalanced across conditions by combining the same individual objects into coherent and incoherent pairs revealed an eGBR effect similar to the one observed in the present study (Oppermann, Hassler, & Gruber, in preparation). Taking all these points together, we can rule out that physical stimulus characteristics caused the observed effect. Rather, it has to be concluded that the semantic relation between objects is responsible for our finding.
A further point to address is the oscillatory nature of the observed effect in the gamma-band range. The TF plots of individual participants in Figure 3 show clear peaks of activity within the gamma-band range. This demonstrates that the effect is a genuine gamma-band effect and not only a residual of activity in a lower frequency range as suggested by the averaged TF plot in Figure 2A. This argument is further supported by the analyses of other frequency ranges in which no differences between coherent and incoherent conditions were found (see Footnote 2).
To corroborate the oscillatory nature of our eGBR effect, we have to demonstrate that the effect was not caused by successive peaks of activation of non-oscillatory events. This possibility was already discussed in previous studies for the auditory modality (Müller, Keil, Kissler, & Gruber, 2001; Pantev et al., 1993; Başar, Rosen, Başar-Eroglu, & Greitschus, 1987). According to this debate, the eGBR could reflect band-pass filtered portions of the middle-latency components. Thus, the eGBR would functionally equal an evoked response in the time domain. This alternative seems unlikely given the stability of the topographies across successive peaks and troughs of the band-pass filtered ERP in the gamma-band frequency range from 30 to 50 Hz (see Figure 4), because successive non-oscillatory events would be accompanied by changes of the topography. This finding supports the oscillatory nature of our eGBR effect.
Given all these arguments, our results demonstrate that early eGBRs reflecting cortical oscillatory activity are modulated by the semantic coherence of objects embedded in scenes. This finding provides first evidence that effects in eGBR measures are not confined to early visual areas reflecting the processing of physical stimulus characteristics or the top–down modulation of this processing by attention or stimulus anticipation (for a current review, see Herrmann et al., 2010). Rather, eGBRs might also reflect a first matching of visual input with stored long-term memory representations (see Herrmann, Munk, et al., 2004, for a related view) accompanied by the activation of a distributed network in areas beyond the visual cortex. Although the existence of visual eGBRs in structures beyond visual areas (with a central peak) was already demonstrated in previous studies, they were not shown to be sensitive to variations in stimulus type or other manipulations so far (e.g., Tallon-Baudry, Bertrand, Delpuech, & Pernier, 1997; Tallon-Baudry, Bertrand, Wienbruch, Ross, & Pantev, 1997). Thus, our data suggest that the presentation of multiple objects triggers the activation of a conjoint network, which represents the semantic coherence of objects. This conclusion supports the assumption that oscillatory activity of neuronal networks reflects the integration of cortically distributed representations into a coherent whole (Singer, 1999; Singer & Gray, 1995; von der Malsburg & Schneider, 1986). In the present case, this indicates the extraction of the gist of a visual scene. This binding by synchronization may reflect the preferred mechanism to bind individual object representations that can occur in various constellations to a coherent scene representation (Singer, 1999). 
Whether this binding mechanism only becomes effective when the coherence between objects is task-relevant and attention is directed to this aspect or whether it is independent of the task relevance needs to be explored in further studies.
In recent years, there has been growing evidence that visual processing can be affected by contextual memory at about or even before 100 msec after stimulus appearance (Chaumon, Drouet, & Tallon-Baudry, 2008; Meeren, Hadjikhani, Ahlfors, Hamalainen, & de Gelder, 2008; Pourtois, Rauss, Vuilleumier, & Schwartz, 2008). These effects were reflected in ERPs of the EEG or magnetic fields of the magnetoencephalogram and were located in the visual cortex. Why these processes appear in ERPs in the time domain whereas the coherence of scenes is indicated in oscillatory eGBRs needs to be addressed in future studies. However, it may be speculated that the effects in the ERPs are based on perceptual familiarity, which is stored in a kind of permanent memory representation (e.g., visual search in familiar and unfamiliar textures or upright and upside-down presented images). The individual objects of coherent and incoherent object configurations in the current study did not differ in their familiarity to the participants. Therefore, we did not expect any difference between conditions at this level. The difference between conditions in our study might arise at a more abstract level of scene perception at which the semantic relation between the objects is reflected. Considering the type of stimuli used (see Figure 1) and given that participants had not seen the configurations of the objects before, it is likely that the number of activated neurons did not differ between the two conditions at early sensory processing stages; such a difference would have been reflected in ERP differences. Because the effect was present in the eGBRs, our data support the functional importance of oscillatory activity for the formation of coherent stimulus representations in the brain.
The source reconstruction of the present eGBR effect suggests that the extraction of scene information is achieved in a widely distributed network that involves temporal, precentral, frontal, and occipital cortical areas. The center of activity was localized to temporal areas of the right hemisphere. This finding is in line with studies showing that temporal areas are involved in the representation of global categorical and relational knowledge in humans as well as primates (e.g., Gronau, Neta, & Bar, 2008; Sugase, Yamane, Ueno, & Kawano, 1999). Furthermore, the spatio-temporal characteristic of our effect in temporal areas provides evidence for a processing that is mainly based on a feed-forward architecture in the ventral stream (e.g., Meeren et al., 2008; Serre, Oliva, & Poggio, 2007; Lamme & Roelfsema, 2000). That applies in particular to the nonrecurring presentation of our stimuli because, in contrast to previous studies (e.g., Liu et al., 2009; VanRullen & Thorpe, 2001; Thorpe et al., 1996), our participants could not develop any expectation of the upcoming scene content that may have facilitated its processing in a top–down guided manner. In contrast, the involvement of occipital areas is known from previous studies (e.g., Herrmann, Lenz, et al., 2004) and is argued to reflect the integration of the incoming sensory information and the matched memory templates. Whether this integration into the network goes along a connection to temporal areas reflecting matched memory representations (Herrmann et al., 2010) or is forwarded by frontal areas (Strüber, Basar-Eroglu, Hoff, & Stadler, 2000; Tallon-Baudry, Bertrand, Delpuech, et al., 1997; Tallon-Baudry, Bertrand, Wienbruch, et al., 1997) cannot be decided based on our data. The involvement of the middle frontal gyrus in this network suggests a link to the attention network and might reflect the detection of a behaviorally relevant stimulus (Corbetta, Patel, & Shulman, 2008; Corbetta & Shulman, 2002). 
Alternatively, this frontal activation could also reflect a fast mechanism facilitating object recognition in a top–down guided manner (Bar et al., 2006; Bar, 2003). According to this mechanism, the partially analyzed global scene information is directly transmitted from the visual cortex to the pFC. On the basis of this global information, expectations are formed there about the most likely interpretations of the visual input, which are then transmitted to the temporal cortex (Bar et al., 2006). In the temporal cortex, this top–down signal is integrated with the incoming bottom–up signal from the visual cortex. According to this assumption, the activation in the temporal cortex would not only reflect pure bottom–up activation in a feed-forward architecture but also the matching with a top–down process from frontal areas. Notably, all this activation is driven by the stimuli themselves without any specific (semantic) expectation of the upcoming content. All mechanisms discussed here suggest a fast connectivity pattern of involved areas in our activated network.
What might be the reason for the observed hemispheric disparity? To speculate, the difference might stem from a general preference for analyzing global structural information in the right hemisphere and for analyzing local aspects in the left hemisphere (e.g., Malinowski, Hübner, Keil, & Gruber, 2002; Robertson & Ivry, 2000; Fink et al., 1996). This assumption is supported by a study suggesting that the right occipito-temporal cortex is involved in an initial and coarse processing of natural scenes whereas the left occipito-temporal cortex is involved in a more detailed processing of the parts of a scene (Peyrin et al., 2005). In the present study, participants directed their attention toward the general coherence of the scene, which might be part of an initial and coarse processing, and thus, a right hemispheric source of our effect would be expected. Furthermore, the hemispheric disparity could also reflect that the effects are associated with the ventral attention network, which is largely lateralized to the right hemisphere (Corbetta & Shulman, 2002).
Finally, we want to address the late effect in the ERPs (300–600 msec). The observed L1 might be composed of several subcomponents (e.g., P3a and P3b). Thus, the effect might reflect attentional orienting and stimulus evaluation processes. Moreover, the L1 could also reflect a motor-response-related shift in ERPs, given that the behavioral response is considerably faster in the coherent compared with the incoherent condition. However, detailed analyses of ERP components in this late time range are not motivated by the question of the present study, which focused on early effects of visual scene processing. Therefore, we refrain from a detailed interpretation of this component.
In summary, our data provide evidence for a powerful neuronal mechanism that may account for the integrated visual processing of scenes on the basis of the fast retrieval of semantic knowledge. The increase of cortical oscillatory activity here might indicate the extraction of the gist of a scene. It suggests the dynamic binding of neuronal assemblies, which establish a coherent high-level visual representation within no more than 100 msec.
This work was supported by grants from the German Research Council. We thank S. Boigs for help in picture preparation and A. Roye for helpful discussions.
Reprint requests should be sent to Frank Oppermann, Department of Psychology, University of Leipzig, Seeburgstrasse 14-20, D-04103 Leipzig, Germany, or via e-mail: email@example.com.
1. We thank an anonymous reviewer for directing our attention to this point.
2. Analyses of other frequency ranges (5–15 and 15–25 Hz) in the time range around their peak of evoked oscillatory activity (5–15 Hz for the time range of 100–250 msec and 15–25 Hz for the time range of 50–150 msec) revealed no significant effects involving the factor Coherence (no main effect of Coherence and no interactions with the factor Coherence in the ANOVA involving the three fixed variables Coherence, Caudality, and Hemisphere; all F < 1.6, p > .2).