Humans are rapid in categorizing natural scenes. Electrophysiological recordings reveal that scenes containing animals can be categorized within 150 msec, which has been interpreted to indicate that feedforward flow of information from V1 to higher visual areas is sufficient for visual categorization. However, recent studies suggest that recurrent interactions between higher and lower levels in the visual hierarchy may also be involved in categorization. To clarify the role of recurrent processing in scene categorization, we recorded EEG and manipulated recurrent processing with object substitution masking while the participants performed a go/no-go animal/nonanimal categorization task. The quality of visual awareness was measured with a perceptual awareness scale after each trial. Masking reduced the clarity of perceptual awareness, slowed down categorization speed for scenes that were not clearly perceived, and reduced the electrophysiological difference elicited by animal and nonanimal scenes after 150 msec. The results imply that recurrent processes enhance the resolution of conscious representations and thus support categorization of stimuli that are difficult to categorize on the basis of the coarse feedforward representations alone.
The roles of linear feedforward processing from primary visual cortex to higher brain areas and recurrent feedback processing between higher and earlier brain areas in visual perception have been the focus of recent research on the biological basis of higher-level cognitive processing and visual awareness. This research has used rigorously controlled simple stimuli such as geometrical shapes, gratings, bars, dots (Koivisto & Silvanto, 2012; Boehler, Schoenfeld, Heinze, & Hopf, 2008; Silvanto, Lavie, & Walsh, 2005), or flashes of light induced by TMS (Pascual-Leone & Walsh, 2001). The results converge in suggesting that recurrent processing is necessary for rich and detailed conscious perception. Different theoretical approaches agree that feedforward processing results in a coarse, low-resolution representation, whereas vivid, conscious, high-resolution perception requires an additional stage of recurrent or feedback processing (Lamme, 2004; Bar, 2003; Hochstein & Ahissar, 2002; Bullier, 2001; Di Lollo, Enns, & Rensink, 2000).
Research with more complex and ecologically valid stimuli has emphasized the capability of the feedforward sweep. The visual categorization of mammals, birds, fishes, or insects in complex natural scenes into the “animal” category seems to occur rapidly and effortlessly (Fize, Fabre-Thorpe, Richard, Doyon, & Thorpe, 2005; Li, VanRullen, Koch, & Perona, 2002; Thorpe, Fize, & Marlot, 1996). Because Thorpe et al. (1996) showed with electrophysiological recordings that complex natural scenes containing animals could be discriminated from distracters by the brain already at 150 msec after scene presentation, it has been assumed that scene categorization can be performed in a purely feedforward manner (Fabre-Thorpe, 2011; Thorpe & Fabre-Thorpe, 2001). This view has been challenged by two recent studies (Koivisto, Railo, Revonsuo, Vanni, & Salminen-Vaparanta, 2011; Camprodon, Zohary, Brodbeck, & Pascual-Leone, 2010) applying TMS to disrupt brain activity during scene perception. Camprodon et al. (2010) showed that TMS to the occipital pole impaired discrimination between birds and mammals in two time windows, 100 msec and 220 msec after the image. They suggested that the first time window corresponded to feedforward processing, whereas the later time window reflected recurrent processing. However, the task called for discrimination at the basic level of category hierarchy, which requires detailed processing of the image and a longer categorization time than the animal/nonanimal discrimination at the superordinate level (Macé, Joubert, Nespoulous, & Fabre-Thorpe, 2009). Koivisto et al. (2011) found that the activity of V1/V2 played a functional role in the superordinate animal/nonanimal scene categorization after the higher areas in the visual hierarchy (i.e., the lateral occipital cortex at 150 msec) had been activated. This suggests a role for recurrent processing or at least for prolonged information uptake time in early visual areas during scene categorization at the superordinate level.
These results do not exclude the possibility that rapid scene categorization is possible on the basis of the first feedforward sweep under optimal viewing conditions. However, in everyday vision, we encounter objects in a wide variety of situations. The feedforward signal may be noisy because of partial occlusion or low contrast of the stimulus; in such situations, it may be important that recurrent feedback from high-level visual areas reinforces and strengthens neural responses in lower-level areas (Wyatte, Curran, & O'Reilly, 2012). Thus, recurrent processing might be needed to increase the quality and resolution of aware perception when the target object is not clearly perceived. We performed a scene categorization experiment to explore the role of recurrent processing and awareness in scene categorization.
Several studies have used backward masking to manipulate the success of recurrent processing (e.g., Wyatte et al., 2012; Fahrenfort, Scholte, & Lamme, 2007). In backward masking, the target stimulus is typically followed by a mask either in the same position as the target (e.g., a pattern mask) or spatially surrounding it (a metacontrast mask). The temporal gap between the target and the mask gives a lead for the feedforward processing of the target, and the mask is assumed to interfere primarily with the recurrent stage of processing (e.g., Lamme & Roelfsema, 2000). We manipulated the success of recurrent processing during natural scene categorization with a specific type of backward masking: object substitution masking (OSM; Di Lollo et al., 2000; Enns & Di Lollo, 1997). In the most typical version of OSM, the target and a mask (e.g., four dots surrounding the target) appear simultaneously, but the mask persists after the offset of the target. This impairs perception of the target, as compared with a condition in which the target and mask offset simultaneously. The object substitution theory (Di Lollo et al., 2000) explains this effect by assuming that the representation of the target is substituted by that of the mask during recurrent processing. First, the combination of the target and the mask is processed in a feedforward manner, resulting in a vague or ambiguous representation. This is followed by a reentrant or recurrent phase of processing in which the tentative representation is compared with the initial pattern of activity. The processing of the target continues normally if the stimulus display has not changed, but if the reentrant information does not match the information present at the lower level, as is the case in the delayed mask offset condition, a new tentative representation involving only the mask emerges.
In the present experiment, two natural scenes were presented simultaneously, and the target scene image was surrounded by dots. In masked trials, the dots remained visible for a short period after the offset of the target. After having categorized the target scene at the superordinate level (animal vs. nonanimal), the participants rated the quality of their conscious perception on a 4-point perceptual awareness scale (Ramsøy & Overgaard, 2004). This allowed us to study categorization performance as a function of awareness. To make sure that the masking effects could not be explained by feedforward models (Francis & Hermens, 2002), the scenes were presented for long enough (100 msec) that the feedforward activation should have reached the highest areas in the ventral stream (i.e., the temporal cortex) by the time of scene offset and masking onset (Liu, Agam, Madsen, & Kreiman, 2009; Lamme & Roelfsema, 2000). EEG was measured to track the effects of OSM on the differential electrophysiological activity elicited by scenes containing animals and nonanimals. If the differential activity or the behavioral responses are influenced by OSM, the hypothesis that recurrent processing is involved in scene categorization would be supported.
Twenty healthy, right-handed participants with normal or corrected-to-normal vision were tested (five men; mean age = 23.7 years, range = 19–34 years). The experiment was undertaken with the understanding and written consent of each participant. The study was accepted by the ethical committee of the University of Turku and was conducted in accordance with the Declaration of Helsinki.
Stimuli and Procedure
The visual stimuli were color photographs of natural scenes from the same sources that were used in a previous TMS study (Koivisto et al., 2011). The images of animals and nonanimals represented a mixture of general views and close-ups. Within both categories, they varied in luminance, color, and spatial frequency, so that the categorization task could not be done on the basis of low-level visual features. The images of animals (n = 256) represented one or more animals in their natural environments. The nonanimal images (n = 256) displayed landscapes, plants, fruits, vegetables, buildings, vehicles, and other human-made objects. The participants had never before seen the photographs. In addition, 99 nonanimal images served as fillers (in the opposite visual field relative to that of the target), and 48 additional animal and nonanimal images were used only in practice trials.
The stimulus images (2.4° × 2.4°) were presented on a 19-in. CRT monitor with 1278 × 768 resolution (85 Hz) from a distance of 120 cm (Figure 1B). Each trial began with the text “Seuraava” [next trial] in the center of the field. When the participant pressed the response button, a fixation cross appeared in the center of the screen for 1000 msec. It was followed by two images, one in the left visual field (LVF) and one in the right visual field (RVF), for 100 msec. The distance of the center of the images from the fixation cross was 3.2°. One of the images was the target, which was surrounded by eight dots (0.75° in diameter), centered 1.2° away from the edge of the images. The other image was a randomly selected nonanimal filler. In half of the trials, the target was an image representing an animal. The target appeared randomly in the LVF or RVF. In simultaneous offset (i.e., unmasked) trials, the dots appeared and disappeared simultaneously with the images. In delayed offset (i.e., masked) trials, the dots remained on the screen for 300 msec after the offset of the images. The side and masking condition of each target image were counterbalanced across participants so that each target appeared equally often in the LVF and RVF and in the simultaneous and delayed offset conditions.
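As a worked illustration of the display geometry described above, the visual angles can be converted to approximate physical sizes at the 120-cm viewing distance. This is only a sketch using the standard visual-angle relation; the function and variable names are hypothetical and are not taken from the experiment software.

```python
import math

def visual_angle_to_size_cm(angle_deg, distance_cm):
    """Extent (cm) that subtends angle_deg when centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

VIEWING_DISTANCE_CM = 120.0  # viewing distance reported in the text

# Side length of a 2.4 deg x 2.4 deg stimulus image
image_side_cm = visual_angle_to_size_cm(2.4, VIEWING_DISTANCE_CM)

# Diameter of one 0.75 deg dot
dot_diameter_cm = visual_angle_to_size_cm(0.75, VIEWING_DISTANCE_CM)

# The 3.2 deg eccentricity is measured from fixation to the image center,
# so a one-sided tangent is used here (an assumption about the geometry).
image_center_offset_cm = VIEWING_DISTANCE_CM * math.tan(math.radians(3.2))

print(f"image side:   {image_side_cm:.2f} cm")
print(f"dot diameter: {dot_diameter_cm:.2f} cm")
print(f"eccentricity: {image_center_offset_cm:.2f} cm")
```

Under these assumptions, each image is roughly 5 cm wide and is centered about 6.7 cm from fixation.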
The task was to release the response button as quickly and accurately as possible if the target represented an animal. The response hand in this go/no-go task was balanced across participants. After each trial, the participants rated the quality of their subjective aware perception on a Perceptual Awareness Scale from 0 to 3 (0 = I did not see anything [in the area surrounded by the dots], 1 = I saw a glimpse of something but my response was a pure guess, 2 = I saw the animal/object with weak clarity, 3 = I saw the animal/object clearly). The subjective rating was indicated by pressing one of the buttons on the top of the pad with the thumb of the right hand.
EEG was recorded using Ag–AgCl sintered ring electrodes attached to an EASYCAP recording cap (EASYCAP GmbH, Herrsching-Breitbrunn, Germany) at the international 10–20 system sites Fp1, Fp2, F3, F4, F7, F8, Fz, P3, P4, Pz, C3, C4, Cz, T3, T4, T5, T6, O1, and O2. An electrode on the nose was used as reference, and an electrode in front of Fz was used as ground. An electrode placed below the right eye was used for monitoring vertical eye movements and blinks, and an electrode 1.5 cm to the right of the right eye was used for monitoring horizontal eye movements. EEG was amplified (SynAmps) using a band pass of 0.05–100 Hz, with a sampling rate of 500 Hz. A 50-Hz notch filter was used. The impedance was kept below 5 kΩ. Offline analysis of EEG was conducted with Brain Vision Analyzer (Brain Products GmbH, Gilching, Germany). Baseline was corrected to the activity in the 100 msec preceding the target stimuli (−100 to 0 msec). Trials with artifacts (>60 μV) in any of the electrodes were rejected offline, and eye movements were corrected with the Gratton and Coles algorithm (Gratton, Coles, & Donchin, 1983). The data were filtered with 0.1-Hz high-pass and 30-Hz low-pass filters.
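The baseline correction and artifact rejection steps can be sketched as follows. This is a minimal illustration and not the Brain Vision Analyzer pipeline: it assumes each epoch starts 100 msec before stimulus onset, that the 60-μV criterion is applied to absolute amplitude after baseline correction, and it omits filtering and ocular correction. All names are hypothetical.

```python
SAMPLING_RATE_HZ = 500        # sampling rate reported in the text
BASELINE_MS = 100             # prestimulus baseline window (-100 to 0 msec)
ARTIFACT_THRESHOLD_UV = 60.0  # rejection criterion reported in the text

# Number of samples in the baseline window (50 at 500 Hz)
N_BASELINE = SAMPLING_RATE_HZ * BASELINE_MS // 1000

def baseline_correct(epoch_uv, n_baseline=N_BASELINE):
    """Subtract the mean of the prestimulus samples from every sample."""
    baseline_mean = sum(epoch_uv[:n_baseline]) / n_baseline
    return [v - baseline_mean for v in epoch_uv]

def has_artifact(epoch_uv, threshold_uv=ARTIFACT_THRESHOLD_UV):
    """True if any sample exceeds the threshold in absolute amplitude."""
    return any(abs(v) > threshold_uv for v in epoch_uv)

def preprocess_trial(trial):
    """trial maps channel name -> list of samples (microvolts).

    Returns the baseline-corrected trial, or None if any channel
    contains an artifact (the whole trial is then rejected).
    """
    corrected = {ch: baseline_correct(samples) for ch, samples in trial.items()}
    if any(has_artifact(samples) for samples in corrected.values()):
        return None
    return corrected

# Toy example: a clean trial and a trial with a large deflection on O1
clean_trial = {"Cz": [2.0] * 50 + [12.0] * 10, "O1": [1.0] * 50 + [5.0] * 10}
noisy_trial = {"Cz": [2.0] * 50 + [12.0] * 10, "O1": [1.0] * 50 + [90.0] * 10}

clean_result = preprocess_trial(clean_trial)
noisy_result = preprocess_trial(noisy_trial)
```

Rejecting the whole trial when a single channel exceeds the criterion follows the description "in any of the electrodes"; whether the criterion was applied before or after baseline correction is not stated in the text.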
Statistical analyses of behavioral results did not show any significant differences in masking between the left- and right-side target stimuli. Therefore, we report all the behavioral results collapsed across visual fields.
A Category (2) × Masking (2) ANOVA on the proportion of trials in which the participants reported having awareness of the contents of the scenes (“weak clarity” or “clear” ratings; Figure 2A) showed that awareness was reduced by masking (F(1, 19) = 8.89, p = .008, ηp2 = .319). The masking effect did not differ between animal and nonanimal scenes (F < 1). We analyzed further the distribution of awareness ratings by calculating for each target image the frequency with which it was rated according to the four alternatives. A Rating (4) × Masking (2) ANOVA with repeated measures revealed a Rating × Masking interaction (F(3, 1530) = 2.93, p = .044, ηp2 = .006), showing that the clarity of perceptual awareness was influenced by OSM. Masking reduced the frequency of clear perception (from 42% to 40%, F(1, 510) = 3.98, p = .046, ηp2 = .008) and increased the frequency of seeing nothing (from 4% to 5%, F(1, 510) = 4.64, p = .032, ηp2 = .009) or seeing a glimpse of something (from 20% to 22%, F(1, 510) = 3.98, p = .047, ηp2 = .008).
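For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to values reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of masking on awareness: F(1, 19) = 8.89
masking_effect = partial_eta_squared(8.89, 1, 19)

# Rating x Masking interaction in the item analysis: F(3, 1530) = 2.93
rating_by_masking = partial_eta_squared(2.93, 3, 1530)

print(round(masking_effect, 3))
print(round(rating_by_masking, 3))
```

Both values agree with the reported effect sizes (.319 and .006) after rounding.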
The mean accuracy rates were 0.88 (SD = 0.06) for unmasked and 0.86 (SD = 0.06) for masked animal scenes and 0.69 (SD = 0.18) and 0.67 (SD = 0.02) for unmasked and masked nonanimal scenes, respectively. The accuracy rates were analyzed as a function of rated perceptual awareness. Because of the small number of the lowest “seeing nothing” ratings, the results from this category were pooled with those in the second lowest category (“glimpse of something”) for the analyses of accuracy and response times. According to Logit Loglinear analysis, a nonparametric method allowing analysis of interactions in categorical data, the accuracy of categorization increased from 0.54 (“seeing something”) to 0.80 (“weak clarity”) and 0.91 (“clear perception”) as rated awareness became clearer (z = 13.76, p < .001). Responses to animal scenes were more accurate than those to nonanimal scenes (z = 10.27, p < .001), but masking did not have any effect on accuracy.
The median response time to animal images was 480 msec (SD = 95 msec) for unmasked and 479 msec (SD = 93 msec) for masked (Figure 2B) trials. In the item analysis of correct go-responses, univariate ANOVA involving masking and perceptual awareness as factors showed an interaction between rating and masking (F(2, 4431) = 3.49, p = .031, ηp2 = .002). The speed of categorization was slowed down by OSM in the trials in which nothing or a glimpse of something was seen (F(1, 599) = 4.02, p = .045, ηp2 = .007) but not in the trials with clearer conscious perception.
To assess how fast the categorization could be done, we calculated the index of minimal response time (MinRT; Fabre-Thorpe, Richard, & Thorpe, 1998), which was defined as the latency of the 10-msec bin at which correct go-responses to animal targets started to significantly outnumber incorrect go-responses to nonanimal scenes (Figure 2C). The MinRT was 330 msec (χ2 = 8.82, p = .003). When the unmasked and masked trials were examined separately, the MinRT was 330 msec for unmasked stimuli (χ2 = 8.97, p = .003) and 350 msec for masked stimuli (χ2 = 4.85, p = .028), but the 20-msec difference between the masking conditions was not statistically significant.
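The MinRT computation can be sketched as follows. This is a simplified illustration, not the exact procedure of Fabre-Thorpe et al. (1998): it assumes a single-bin criterion in which a one-degree-of-freedom chi-square test compares the counts of correct and incorrect go-responses within each 10-msec bin against equal expected frequencies, with a critical value of 3.841 (p < .05). The function names and the toy response times are hypothetical.

```python
from collections import Counter

CHI2_CRIT_DF1_P05 = 3.841  # critical chi-square value for df = 1, p < .05

def bin_counts(rts_ms, bin_ms):
    """Count response times per bin, keyed by the bin's start latency (msec)."""
    return Counter(int(rt // bin_ms) * bin_ms for rt in rts_ms)

def min_rt(hit_rts_ms, false_alarm_rts_ms, bin_ms=10, crit=CHI2_CRIT_DF1_P05):
    """Earliest bin in which hits significantly outnumber false alarms.

    hit_rts_ms: response times of correct go-responses to animal targets.
    false_alarm_rts_ms: response times of incorrect go-responses to nonanimals.
    Returns the bin start latency in msec, or None if no bin qualifies.
    """
    hits = bin_counts(hit_rts_ms, bin_ms)
    false_alarms = bin_counts(false_alarm_rts_ms, bin_ms)
    for start in sorted(set(hits) | set(false_alarms)):
        h, f = hits[start], false_alarms[start]
        # Goodness-of-fit chi-square against equal expected counts
        chi2 = (h - f) ** 2 / (h + f)
        if h > f and chi2 > crit:
            return start
    return None

# Toy data (not from the experiment)
toy_hits = [305, 312, 318, 330, 331, 332, 334, 336, 338, 341]
toy_false_alarms = [306, 315, 345]
toy_min_rt = min_rt(toy_hits, toy_false_alarms)
```

In the toy data, the 330-msec bin is the first in which hits (6) significantly outnumber false alarms (0), so the function returns 330. A stricter variant could require the difference to remain significant over several consecutive bins.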
In addition, the distribution of the response times was explored separately for aware trials (seeing the target with “weak clarity” or “clearly”) and unaware trials (“seeing nothing” or “something”). Figure 2D shows that the correct go-responses to animals were accompanied by awareness already in the fastest trials. Correct aware go-responses started to outnumber correct unaware go-responses to animals at 200 msec (χ2 = 4.46, p = .035) and incorrect aware go-responses to nonanimals at 220 msec (χ2 = 11.64, p = .001).
The ERPs in response to (unmasked) animal images were characterized by an increase in negativity relative to the ERPs elicited by nonanimals, starting about 150 msec after stimulus onset (Figure 3). This difference was observed over a wide area over the scalp and especially when the target stimulus was presented to the RVF (Figure 4). A Category (animal vs. nonanimal) × Visual field × Masking ANOVA on mean amplitudes from all electrodes in 150–200 msec latency range confirmed these impressions. It revealed a Category × Visual field × Masking interaction (F(1, 18) = 4.51, p = .048, ηp2 = .20), showing that masking influenced the difference between ERPs to animals and nonanimals, especially in response to images that were presented to the RVF. Unmasked animal scenes in the RVF elicited larger negativity than unmasked nonanimals (F(1, 18) = 5.76, p = .027, ηp2 = .24), whereas there was no difference in masked conditions between ERPs to animal and nonanimal images. This pattern was still present in the 200- to 250-msec time window (F(1, 18) = 4.64, p = .045, ηp2 = .21), but not 250–300 msec after the stimulus onset (F(1, 18) = 2.87, p = .107, ηp2 = .10).
After 300 msec, a strong positive potential (P3) emerged, and animal scenes elicited larger positivity than nonanimal scenes, particularly over parietal and occipito-temporal sites (300–400 msec: F(1, 18) = 19.39, p < .001, ηp2 = .519). This positive difference did not depend on masking and occurred for images in both the LVF and RVF. This late positive enhancement cannot play a causal role in fast categorization because the MinRT was 330 msec (see above). The positive difference probably reflects postperceptual processes, such as updating of working memory (Donchin & Coles, 1988) or higher-order conscious evaluation processes (Koivisto & Revonsuo, 2010).
Behavioral Control Experiment
Because masking did not have any effect on the accuracy of categorization at the superordinate level between animal and nonanimal scenes, we conducted a behavioral control study to show that the masking manipulation was strong enough to interfere also with accuracy when access to more fine-grained representations was needed. Eight independent observers participated (age = 18–38 years, one man). The sequence of visual stimulation and the masking procedure were identical to those in the main experiment, but the categorization was made at a subordinate level. Here, the target images (n = 64) were four-legged mammals, and the nontargets (n = 64) were nonmammals (insects, fishes, reptiles). The filler stimuli (in the visual field opposite to the target) were nonanimal scenes or images of birds. All the images were taken from the main experiment. The task was to release the response button as quickly and accurately as possible if the target represented a mammal. After each trial, the participants rated the quality of their subjective aware perception using a scale similar to that in the main experiment.
Similar to the main experiment, the proportion of aware trials (“weak clarity” or “clear” ratings) was higher in the unmasked condition (80.1%, SD = 10.6%) than in the masked condition (74.6%, SD = 13.8%), showing that awareness was impaired by masking (F(1, 7) = 13.72, p = .008, ηp2 = .66), without any difference in the magnitude of masking between the target and nontarget categories (F < 1). Response times did not differ between the masked (652 msec) and unmasked (655 msec) conditions (F < 1). Most importantly, the results showed a masking effect for accuracy (F(1, 7) = 11.08, p = .013, ηp2 = .613): The categorization accuracy was 76.2% (SD = 8.3%) in the unmasked trials and 69.6% (SD = 6.8%) in the masked trials. An ANOVA comparing accuracy levels between the control experiment and the main (ERP) experiment showed a significant Experiment × Masking interaction (F(1, 26) = 8.68, p = .007, ηp2 = .25), indicating that masking was stronger in the control experiment than in the main experiment. The accuracy levels in unmasked trials did not differ between the experiments (t(26) = 0.55, p = .585), but in the masked trials, the accuracy level was lower in the control experiment (t(26) = 1.78, p = .045). Thus, although different observers participated in the experiments, they did not differ in the “baseline” performance in the unmasked condition, but the difference between the experiments occurred specifically because of impaired performance in the masked condition of the control experiment.
As only a subset of the stimuli of the main experiment was used, the number of trials was lower in the control experiment, and therefore, the more detailed analyses based on rating levels in the awareness scale were not possible. In any case, this experiment shows that the OSM (with the same settings as in the main experiment, which did not show masking of accuracy) is powerful enough to have robust effects on accuracy when the categorization requires discrimination at subordinate level between mammals and nonmammals and, therefore, the construction of a more fine-grained visual representation to support successful categorization.
We used OSM to manipulate the success of recurrent processing while the participants categorized natural scenes at the superordinate level and rated their perceptual awareness. The results revealed that masking decreased the subjective clarity of perceptual awareness of the content of the scenes but did not influence the accuracy of categorization. In addition, masking slowed down categorization in trials with low perceptual awareness. Without masking, electrophysiological responses to animal scenes started to differ from those to nonanimal scenes 150 msec after stimulus onset. However, masking eliminated this differential activity between 150 and 250 msec.
The behavioral results suggest that recurrent processing does not play a critical role in superordinate level categorization of clearly perceived images. Such images were responded to fast and accurately, and their categorization speed was not influenced by masking. The MinRT needed for correct behavioral categorization responses in our experiment (330 msec) was similar to that in the two-image condition (320 msec) of Rousselet, Thorpe, and Fabre-Thorpe (2004). We were also able to analyze the distribution of response latencies as a function of perceptual awareness. The fastest correct categorization responses were almost invariably accompanied by awareness, as shown by the finding that correct aware go-responses started to outnumber correct unaware go-responses to animals at 200 msec. Given that the time taken by decision making, motor programming, and motor output are all included in the behavioral response time, it is likely that the rapid responses were based on coarse magnocellular information conveyed by the feedforward sweep (Fabre-Thorpe, 2011). Although clearly perceived images were categorized rapidly and accurately and response speed to such images was not influenced by masking, the categorization speed in trials in which the participants reported at best having seen only a glimpse of something was further slowed down by masking, suggesting that recurrent processing contributed under suboptimal conditions by reinforcing visual processing of images that were difficult to categorize.
The clarity of visual awareness depended on recurrent processing: the probability of clear conscious perception was decreased by masking. The dependence of visual awareness on recurrent processing is consistent with several theories, which assume that the resolution and clarity of perception are modified by feedback signals from parietal (Bullier, 2001), frontal (Bar, 2003), or ventral areas (Campana & Tallon-Baudry, 2013; Lamme, 2004; Hochstein & Ahissar, 2002; Di Lollo et al., 2000). OSM sometimes masked the visibility of the target scene completely, and the participants reported not having seen anything at all. However, the magnitude of the masking effect on awareness was, on average, small. This could be expected because each stimulus display contained only two images. OSM is known to be stronger the more items the display contains (Enns, 2004; Di Lollo et al., 2000). We restricted the set size to two items, because previous research has shown that two scenes can be categorized in parallel (Rousselet et al., 2004). With a larger set size, the effects of masking would have been stronger, perhaps also affecting categorization accuracy and speed. However, with a larger number of images in the displays, the nature of the categorization process would have changed (Rousselet et al., 2004), and the results would no longer be comparable with the previous work on ultra-rapid scene categorization that started from the work of Thorpe et al. (1996). Although accuracy at the superordinate level (animal/nonanimal) of categorization was not affected by masking, we showed in the behavioral control experiment that accuracy was decreased by the same masking settings when more fine-grained categorization was required between mammals and nonmammals. This finding shows that the masking procedure was effective and suggests that recurrent processes play a role in scene categorization when higher resolution vision is needed, whereas categorization at the superordinate level can proceed successfully on the basis of the coarser representation activated by the feedforward processes that are not affected by OSM.
There is substantial evidence for the involvement of recurrent processing in figure–ground segregation (Lamme & Roelfsema, 2000). In this study, the animal scenes that were most strongly affected by the mask seemed to be those in which the animal shape was more difficult to segregate from the background than in less-masked scenes, either because of its similar color to the background or low contrast in relation to the surroundings. In an additional foveal task, this was confirmed by showing the 20 most strongly masked images and the 20 least masked images to five independent participants. They were asked to rate each scene in terms of how easily the animal could be segregated from the background (scale 1–9). In the strongly masked scenes, the animals were more difficult to segregate from the surroundings (M = 5.2, SD = 1.7) than in the less-masked scenes (M = 6.8, SD = 0.9; t(38) = 3.61, p = .001). This finding suggests that fast categorization on the basis of the feedforward sweep is restricted to scenes in which the target animal is easily segregated. Recurrent processing contributes to categorization and awareness when figure–ground segregation is more demanding and a low-resolution representation is not sufficient for categorization. These latter situations are common in our natural environments, in which we encounter a wealth of information in a wide variety of viewing conditions.
The results from our electrophysiological measurements show that recurrent processes occur during natural scene categorization. However, one should note that the electrophysiological effects in our study may not be directly comparable with those reported by Thorpe's group (Fize et al., 2005; Rousselet et al., 2004; Thorpe et al., 1996) because of the smaller size of our stimuli, the presence of the dots indicating the target scene, the requirement to rate perceptual awareness in each trial, and the different EEG reference electrode locations. In the present experiment, the scenes containing animals started to elicit a larger negativity than distracter scenes 150 msec after stimulus onset, particularly over the left hemisphere in response to scenes in the contralateral visual field. The cause of the visual field/hemispheric asymmetry in this differential activity cannot be determined on the basis of the present experiment, and because the observed lateralization of this response is not theoretically important in the current context, we will not analyze or speculate further on it here. The central finding is that the differential activity was completely suppressed by OSM between 150 and 250 msec, suggesting that it reflected recurrent interactions between higher and lower brain areas. The scalp distribution of the negative differential activity and its time course (Figure 4) fit well with this scenario, possibly reflecting recurrent feedback from frontal areas to the ventral stream (Dux, Visser, Goodhew, & Lipp, 2010; Bar, 2003).
The conclusion that 150-msec differential activity between animal and nonanimal images in scene categorization is not based on the early feedforward signals but reflects later stages of recurrent processing is consistent with the estimates of the speed of neural processing. Single cell and electrophysiological recordings in humans and animals have shown that the earliest magnocellular signals reach temporal cortex within 100 msec and the whole ventral stream is activated within ∼100 msec (Liu et al., 2009; Boehler et al., 2008; Lamme & Roelfsema, 2000). This conclusion is also in line with the results from saccadic latency tasks where two images are presented and the observers are asked to make an eye movement to the image that contains an animal. Here, the fastest correct saccades can be made about 120 msec after stimulus (Kirchner & Thorpe, 2006). Because these responses occur earlier than the electrophysiological differential activity begins, the differential activity most probably reflects recurrent processes taking place after the feedforward sweep. Our results suggest that these later processes do not play a causal role in fast categorization but enhance the resolution of conscious representations and thus support categorization performance for stimuli that are difficult to categorize on the basis of the coarse feedforward representations alone.
The viability of our interpretations depends partially on the assumption that the masking manipulation interfered specifically with recurrent processing and not with the feedforward activation of the target. There are reasons why this assumption is justified. Recent theories of visual awareness and masking propose that, in backward masking (e.g., in pattern masking, metacontrast, or OSM), the feedforward signal from the mask interrupts the recurrent processing of the target (Breitmeyer & Öğmen, 2006; Di Lollo et al., 2000; Lamme & Roelfsema, 2000). This explanation has the advantage that one does not have to assume that the feedforward processing of the following mask somehow catches up with the feedforward processing of the preceding target. The OSM that we used can be argued to interfere with recurrent processing in an even more selective manner than other types of backward masking such as pattern masking or metacontrast. Unlike in pattern masking, the mask in our study did not spatially overlap with the position of the target, making any explanations in terms of integration masking (in which the target and mask fuse into a composite representation) improbable. Nor did the sparse dot mask share contours with the target image, as in metacontrast masking. Therefore, feedforward lateral inhibition was unlikely to occur (Di Lollo et al., 2000). Note also that the masking in our study began 100 msec after the scene onset, allowing the feedforward activation to spread to the highest areas in the ventral stream (Liu et al., 2009; Lamme & Roelfsema, 2000) before the masking started. Thus, there was no chance for the feedforward activation of the mask-only stimulus to catch up with the feedforward processing of the target. The viable explanation for our masking effects is the process of object substitution, in which the initial content of the perception (target + mask) is updated to correspond to that of the mask, which is the only stimulus available during recurrent processing.
Taken together, our results show that recurrent processing starts relatively early and is involved in generating the electrophysiological differential activity around 150 msec. However, such processing does not play a critical role in the categorization of scenes that are perceived clearly. Recurrent processes are related to the resolution of conscious perception and facilitate categorization of scenes in which the target object is difficult to perceive, for example, because of difficulty in segregating it from the scene. Thus, the coarse information conveyed by the feedforward sweep is sufficient for fast categorization at the superordinate level under optimal viewing conditions. The slower recurrent processing modulates the resolution of perception and contributes to visual categorization under more demanding situations, for example, when the feedforward signal is weak, noisy, or ambiguous or when the categorization must be performed at a semantically subordinate (and thereby visually more fine-grained) level.
This work was supported by the Academy of Finland (grants 125175 and 218272).
Reprint requests should be sent to Mika Koivisto, Centre for Cognitive Neuroscience, 20014 University of Turku, Finland, or via e-mail: email@example.com.