Abstract

Traditionally, it has been theorized that the human visual system identifies and classifies scenes in an object-centered approach, such that scene recognition can only occur once key objects within a scene are identified. Recent research points toward an alternative approach, suggesting that the global image features of a scene are sufficient for the recognition and categorization of a scene. We have previously shown that disrupting object processing with repetitive TMS to object-selective cortex enhances scene processing possibly through a release of inhibitory mechanisms between object and scene pathways [Mullin, C. R., & Steeves, J. K. E. TMS to the lateral occipital cortex disrupts object processing but facilitates scene processing. Journal of Cognitive Neuroscience, 23, 4174–4184, 2011]. Here we show the effects of TMS to the transverse occipital sulcus (TOS), an area implicated in scene perception, on scene and object processing. TMS was delivered to the TOS or the vertex (control site) while participants performed an object and scene natural/nonnatural categorization task. Transiently interrupting the TOS resulted in significantly lower accuracies for scene categorization compared with control conditions. This demonstrates a causal role of the TOS in scene processing and indicates its importance, in addition to the parahippocampal place area and retrosplenial cortex, in the scene processing network. Unlike TMS to object-selective cortex, which facilitates scene categorization, disrupting scene processing through stimulation of the TOS did not affect object categorization. Further analysis revealed a higher proportion of errors for nonnatural scenes that led us to speculate that the TOS may be involved in processing the higher spatial frequency content of a scene. This supports a nonhierarchical model of scene recognition.

INTRODUCTION

Within the scene perception literature, there are two dominant theories that have been proposed to explain how the human visual system engages in scene processing. The first of these two theories is an object-centered approach. In this viewpoint, recognition of a real-world scene occurs following the identification of one or more of its prominent objects and the meaning or gist of the scene is derived from the particular arrangement and cooccurrence of these objects (De Graef, Christaens, & d'Ydewalle, 1990; Biederman, 1981, 1987; Friedman, 1979). In opposition to this view is a second scene-centered approach based on research demonstrating that scene processing can occur rapidly and accurately even when the image is presented too quickly to allow a thorough investigation of the objects within the scene (Rousselet, Joubert, & Fabre-Thorpe, 2005; Oliva & Schyns, 1997, 2000; Biederman, Mezzanotte, & Rabinowitz, 1982; Potter, 1975; Biederman, 1972). In light of these findings, this theory claims that detailed information about object shape and identity is not necessary for scene processing, but rather the global gist of a scene can be processed independently of its objects (Greene & Oliva, 2009a, 2009b; Vogel & Schiele, 2007; Oliva & Torralba, 2001, 2002, 2006; Fei-Fei & Perona, 2005; Renninger & Malik, 2004; Torralba & Oliva, 2002, 2003; Oliva & Schyns, 2000). Specifically, this explanation suggests that a scene can be rapidly identified based on a set of perceptual dimensions reflecting scene structure (mean depth, openness, and expansion), scene constancy (transience, temperature), and scene function (concealment, navigability). Additionally, low-level features such as orientation, texture, and color, as well as more complex spatial layout properties including perspective, naturalness, roughness, size, diagonal plane, symmetry, and contrast, also aid in global gist-based scene processing. 
In other words, these dominant spatial structures are what define the overall layout or shape of a scene and facilitate scene processing.

Functional neuroimaging has revealed an area in the posterior medial-temporal lobe that plays an active role in scene processing (Kohler, Crane, & Milner, 2002; Maguire, Frith, & Cipolotti, 2001; Epstein & Kanwisher, 1998). Scene images, as compared with face or object images, produce stronger activation in a region of the parahippocampal gyrus now known as the “parahippocampal place area” (PPA; Epstein & Kanwisher, 1998). The PPA shows preferential activation for scenes whether they contain objects, are indoor or outdoor, or the environment is natural or nonnatural. Patient research also indicates that the PPA plays a role in scene processing as damage to this region results in a host of behavioral deficits, such as difficulties in recognizing scenes, landmarks, and places (Epstein, DeYoe, Press, Rosen, & Kanwisher, 2001; Habib & Sirigu, 1987; Landis, Cummings, Benson, & Palmer, 1986; Whiteley & Warrington, 1978). Interestingly, a recent neuroimaging study demonstrated that the magnitude of activation within the PPA is higher when an object is present rather than absent (Harel, Kravitz, & Baker, 2012). Together, these findings suggest that the PPA responds to information about the layout of local space and that its response may be modulated by object information (Kravitz, Peng, & Baker, 2011; Park, Intraub, Yi, Widders, & Chun, 2007; Epstein, Harris, Stanley, & Kanwisher, 1999; Epstein & Kanwisher, 1998). Further, the response modulation in the PPA with object images complements research suggesting that the lateral occipital cortex (LO), an object-selective region (Grill-Spector, Kourtzi, & Kanwisher, 2001; Malach et al., 1995), contributes to scene recognition (Kim & Biederman, 2011; MacEvoy & Epstein, 2011).

The notion of functional connectivity between object and scene regions is supported by behavioral studies showing an influence of salient objects on scene background categorization (Joubert, Rousselet, Fize, & Fabre-Thorpe, 2007; Rousselet et al., 2005; Fabre-Thorpe, Delorme, Marlot, & Thorpe, 2001). Object perception and scene gist recognition can both occur equally rapidly (Joubert et al., 2007; Gegenfurtner & Rieger, 2000; Schyns & Oliva, 1994). As a result, it appears that scene and object processing are parallel yet interactive processes with similar temporal dynamics (Joubert et al., 2007).

The notion of parallel processing of objects and scenes is supported by a case study of a patient with visual form agnosia, consequent to bilateral damage to area LO. Despite an inability to recognize objects based on their shape, the patient was capable of categorizing scenes. fMRI showed that the patient produced activation within the PPA for scene images that was modulated by color and texture (Steeves et al., 2004), which are global scene image properties. This study indicates that scene processing can occur despite a lack of ability to process objects. Consistent with this patient study, the application of functionally guided TMS to area LO impairs object but not scene processing (Mullin & Steeves, 2011). Moreover, disrupting area LO actually facilitates scene processing. These findings may represent a release of inhibitory connections between object-selective area LO and the scene processing pathway, which further suggests that scene and object processing may operate on separate but interactive pathways. This interpretation is consistent with previous research showing that object processing can interfere with scene categorization (Joubert et al., 2007) and that the presence of objects within a scene modulates BOLD signal in the PPA (Harel et al., 2012).

In this study, we sought to further investigate the relationship between object and scene processing by administering TMS to the transverse occipital sulcus (TOS) in an attempt to disrupt scene processing. The TOS is caudal to the parieto-occipital fissure within the superior portion of the occipital region (Iaria & Petrides, 2007), which makes it easily accessible to TMS, unlike the PPA, which is located much deeper in the brain.

The TOS has been shown to be involved in perceiving scenes that do not contain obvious objects (Grill-Spector, 2003), in processing familiar spatial layouts (Epstein, Higgins, Jablonski, & Feiler, 2007), and in the recognition of buildings (Levy, Hasson, Harel, & Malach, 2004; Hasson, Harel, Levy, & Malach, 2003). We predicted that administering TMS to the TOS would disrupt scene categorization if its role is essential in the scene processing network. We also asked whether object categorization might be facilitated, given that stimulation of object cortex facilitates scene processing (Mullin & Steeves, 2011).

METHODS

Participants

Eight healthy participants (four women, four men, ages 23–41 years) completed this study. All participants had normal or corrected-to-normal vision and reported no contraindications to TMS or fMRI. Informed consent was obtained, and all experimental procedures were conducted in accordance with the York University Office of Research Ethics, which follows the guidelines outlined by the Declaration of Helsinki.

Image Acquisition

Functional and anatomical images were acquired with a 3-T Siemens Magnetom Tim Trio magnetic resonance scanner at York University's Sherman Health Sciences Research Centre (Toronto, Canada). ROIs for TMS stimulation were localized using fMRI, and functional volumes were acquired using the GE 32-channel high-resolution brain array coil. Functional images were acquired with EPI with a T1-weighted sequence of 32 contiguous axial slices (in-plane resolution = 2.5 × 2.5 mm, slice thickness = 3 mm, imaging matrix 96 × 96, repetition time = 2000 msec, echo time = 30 msec, flip angle = 90°, field of view = 24 × 24 cm). Structural images were acquired with a T1 MPRAGE imaging sequence (in-plane resolution = 2.0 × 2.0 mm, imaging matrix = 122 × 122, repetition time = 8300 msec, echo time = 100 msec, flip angle = 90°, field of view = 24 × 24 cm), recording 176 slices at a slice thickness of 2.0 mm. The functional localizer used a 1-back paradigm and was composed of three different stimulus categories: faces, scenes, and objects. Stimuli were presented with a rear-projection system (Avotec, Inc., Stuart, FL) in two separate functional runs (6 min 52 sec). Each run began and finished with a fixation cross for 16 sec. Six repetitions of three 16-sec blocks of the three categories of stimuli were presented in a random order with 16 sec of fixation between each repetition. Each block contained 16 stimuli presented for 1 sec each.

fMRI Analysis

All preprocessing and statistical analyses were carried out with BrainVoyager QX (Brain Innovation, Maastricht, the Netherlands). Functional data underwent motion correction for small interscan head movements as well as linear trend removal to exclude scanner related signal drift and high-pass filtering to remove temporal frequencies lower than three cycles/run. The functional data were analyzed using a general linear model and averaged over the two runs. Functional images were then coregistered to the anatomical images.

The left TOS (see Figure 1) was defined by determining the peak scene-selective activation within this area in response to a linear balanced contrast of scenes versus objects. Using these criteria, the left TOS was functionally identified in four of the eight participants. Although some studies suggest that both hemispheres process scenes equally (Peyrin, Chauvin, Chokron, & Marendaz, 2003; Goldberg & Costa, 1981), there is evidence that participants show higher TOS volumes (cc) within the left hemisphere compared with the right hemisphere (Iaria & Petrides, 2007). The majority of our participants demonstrated greater activation in the left hemisphere than the right in response to scenes. For these reasons, we restricted the application of TMS to the left hemisphere in an attempt to maximize any effects. For the remaining four participants who did not show scene-selective activation in the left TOS, this region was defined by the position and structure of the sulcus. The inability to functionally localize the TOS in all participants has also been observed by others who have found that its position tends to be more variable across participants (Amit, Mehoudar, Trope, & Yovel, 2012; Konkle & Oliva, 2012). The location of the TOS for both the functionally and anatomically defined participants was confirmed by standardizing the brain with a Talairach transformation (Talairach & Tournoux, 1988) (averaged across n = 8 [x −27 ± 3; y −82 ± 5; z 19 ± 8]) and comparing the obtained coordinates of the TOS to those previously reported (e.g., Hasson et al., 2003).

Figure 1. 

(A) Cluster of peak scene-selective cortex (scenes vs. objects contrast) in the TOS region for the four functionally defined participants. (B) Location of the TOS for the four anatomically defined participants. Cross-hairs in each image indicate coil placement. Coordinates are in standardized Talairach units (Talairach & Tournoux, 1988).

TMS Stimulation and Functional Stereotaxy

A Magstim Super Rapid2 Stimulator (Magstim; Whitland, UK) and a figure-of-eight coil with a diameter of 70 mm were used to deliver the stimulation pulses. The coil was held tangential to the scalp surface with the handle pointed downward. TMS pulse onset was externally triggered and synchronized to the stimulus image onset by VPixx custom presentation software and DATAPixx hardware (VPixx Technologies, Inc.; www.vpixx.com). TMS trials and no-TMS trials were randomized within each run and across stimulus category (i.e., nonnatural or natural); 48% of the trials were no-TMS trials and 52% were TMS trials. Each coil placement site (i.e., left TOS and vertex) was targeted in separate blocks, and the order of the blocks was counterbalanced across participants. A 10-Hz double pulse was delivered coincident with the onset of the stimulus at 60% of maximum stimulator output based on previous studies (e.g., Mullin & Steeves, 2011; Pitcher, Charles, Devlin, Walsh, & Duchaine, 2009). The frequency, intensity, and duration of the TMS train were well within the safety limits of stimulation (Rossi, Hallett, Rossini, & Pascual-Leone, 2009; Wassermann, 1998). Earplugs were worn to reduce the noise associated with TMS coil discharge. Participants were encouraged to take breaks between testing sessions.

To ensure that the coil's position was maintained over the area of interest, its position was continually monitored with the Brainsight image-guided stereotaxic system (Rogue Research, Inc., Montréal, Canada), which allows for coregistration of the MR images with the stimulation hardware. Each participant's anatomical image was used to guide the TMS coil to the precise location of interest relative to the head and brain surfaces.

The research design consisted of two coil placement sites: (1) left TOS and (2) the vertex. The vertex, a point at the center of the top of the head, is defined as a point midway between the inion and the nasion and equidistant from the left and right intertragal notches. This location controls for potential nonspecific effects of TMS to the brain as well as the auditory and sensory artifacts (i.e., clicking sounds and tapping sensations on the scalp).

Stimuli

In both stimulation conditions, participants were presented scenes or objects and were instructed to categorize the stimuli as “natural” or “nonnatural” as quickly and accurately as possible. One hundred forty object images were taken from the Bank of Standardized Stimuli (Brodeur, Dionne-Dostie, Montreuil, & Lepage, 2010) and 140 scene images from the SUN database for Scene Recognition (Xiao, Hays, Ehinger, Oliva, & Torralba, 2010). None of the images were repeated across the two coil placement sites. All stimuli were rendered grayscale and resized to subtend a visual angle of approximately 7.6° × 7.6°.

Experimental Procedure

Participants sat 75 cm from the display, and stimuli were presented centrally on the computer display. On each trial, a fixation dot appeared for 1000 msec, followed immediately by a stimulus image for 33 msec. This was directly followed by a mask consisting of a static noise pattern, which remained on screen until participants responded. Between each trial, there was a 7000-msec wait period to allow for recovery from TMS (see Figure 2). Participants completed a total of four blocks (i.e., 140 stimuli) with stimulation at the TOS and four blocks (i.e., 140 stimuli) with stimulation at the vertex. Block order and order of coil placement site were counterbalanced across participants within a testing session. Just over half (i.e., 52%) of the stimuli presented at each coil placement site were paired with a double pulse of TMS. Half of the stimuli were objects and half of the stimuli were scenes. Further, all of the stimuli were split evenly between natural and nonnatural categories. All images were presented in random order within a block. Participants categorized stimuli as natural or nonnatural by pressing one of two designated buttons on a response box (RESPONSEPixx, VPixx Technologies, Inc.; www.vpixx.com).
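The reported stimulus size (approximately 7.6° × 7.6°) and viewing distance (75 cm) imply a physical image size on the display. The following is a minimal sketch of that conversion using the standard visual-angle relation; the function names are illustrative and not part of the original experimental software.

```python
import math

def size_from_visual_angle(angle_deg, distance_cm):
    """Physical extent (cm) that subtends angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

def visual_angle(size_cm, distance_cm):
    """Visual angle (deg) subtended by an extent of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Values taken from the Methods: 7.6 deg stimuli viewed from 75 cm.
side_cm = size_from_visual_angle(7.6, 75.0)
print(f"Stimulus side length: {side_cm:.2f} cm")  # roughly 10 cm
```

Under these values each image would measure roughly 10 cm on a side; converting this to pixels would additionally require the display's pixel pitch, which is not reported.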

Figure 2. 

Schematic overview of trial sequence. Example of natural and nonnatural scene and object stimuli. Each trial began with a central fixation point for 1000 msec, followed by a stimulus for 33 msec, which was then masked by a static noise pattern that was present until participants responded. This was followed by a 7000-msec wait period between each trial to allow the effects of the double pulse TMS to be completely abolished.

RESULTS

Scene categorization accuracy was impaired during TMS to the TOS relative to no-TMS and vertex conditions (see Figure 3). A 2 × 2 repeated-measures ANOVA of Coil Placement site (TOS or Vertex) and Stimulation Application (TMS or no-TMS) for scene accuracy indicated a significant interaction, F(1, 7) = 8.959, p = .02, η2 = .561.

Figure 3. 

Accuracy scores (percent correct) for categorization tasks performed under each of the four stimulation conditions: TMS to TOS, no-TMS to TOS, TMS to vertex, and no-TMS to vertex. Asterisks indicate a significant difference between the specified conditions (Bonferroni, *p < .05, **p < .01). The Y axis is truncated to better illustrate the effects. Error bars represent the SEM.

Bonferroni post hoc analysis revealed a significant reduction in scene categorization accuracy during TMS to the TOS relative to no-TMS to the TOS (p = .042) and TMS to the vertex (p = .008). There were no significant differences during TMS to the vertex relative to no-TMS to the vertex (p = .123) and no-TMS to the TOS relative to no-TMS to the vertex (p = .06).

The same 2 × 2 repeated-measures ANOVA was conducted on the object data, which revealed no significant differences in object categorization accuracy during TMS to the TOS relative to no-TMS to the TOS (p = .338) and TMS to the vertex (p = .756). There were also no significant differences during TMS to the vertex relative to no-TMS to the vertex (p = .331) and no-TMS to the TOS relative to no-TMS to the vertex (p = .235).

With regard to RTs, neither the main effects (i.e., Stimulation Application and Coil Placement site) nor the interaction was significant for either scene (ps = .308, .064, and .594, respectively) or object categorization (ps = .733, .153, and .473, respectively).

Further, a 2 × 2 repeated-measures ANOVA of Stimulation Application (TMS or no-TMS) and Stimulus Category (scenes or objects) on accuracy scores when the coil was applied to the TOS indicated a significant interaction, F(1, 7) = 7.656, p = .028, η2 = .522. A Bonferroni post hoc analysis indicated that during TMS to the TOS, there was a significant reduction in scene categorization accuracy relative to object categorization accuracy (p = .026; see Figure 3). During no-TMS to the TOS, there were no significant differences between scene and object categorization accuracy (p = .715). Further, there was a significant reduction in scene categorization accuracy during TMS to the TOS relative to no-TMS to the TOS (p = .042). There were no significant differences between TMS and no-TMS for object categorization accuracy (p = .338). With regard to RTs, neither the main effects (i.e., Stimulation Application and Stimulus Category) nor the interaction was significant (ps = .936, .733, and .109, respectively).
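The effect sizes reported for the two significant interactions can be cross-checked against their F statistics. The sketch below assumes that η2 denotes partial eta squared (the text does not state which variant was used) and applies the standard identity relating it to F and the degrees of freedom; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Coil Placement site x Stimulation Application interaction for scene accuracy: F(1, 7) = 8.959
print(round(partial_eta_squared(8.959, 1, 7), 3))  # 0.561

# Stimulation Application x Stimulus Category interaction at the TOS: F(1, 7) = 7.656
print(round(partial_eta_squared(7.656, 1, 7), 3))  # 0.522
```

Both values agree with the reported effect sizes (.561 and .522), which is consistent with the partial eta squared assumption.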

The same 2 × 2 repeated-measures ANOVA was conducted on accuracy scores when the coil was applied to the vertex and indicated a nonsignificant interaction, F(1, 7) = 2.710, p = .144. For RTs, neither the main effects (i.e., Stimulation Application and Stimulus Category) nor the interaction was significant (ps = .782, .337, and .953, respectively).

To further investigate the significant decrease in scene categorization accuracy when TMS was applied to the TOS, we examined whether there may be differences in the type of images that were miscategorized. We calculated an asymmetry score for each participant as the proportion of natural errors subtracted from the proportion of nonnatural errors, divided by the sum of these two error proportions (i.e., (nonnatural errors − natural errors)/(nonnatural errors + natural errors)). Asymmetry scores ranged from −1 (i.e., 100% errors resulting from incorrectly categorizing natural scenes as nonnatural) to +1 (i.e., 100% errors resulting from incorrectly categorizing nonnatural scenes as natural), with 0 representing 50% natural and 50% nonnatural errors. A paired t test revealed that the mean scene error asymmetry score during TMS to TOS trials (M = .5462, SE = .212) was significantly higher than the mean of the scene error asymmetry score during no-TMS to TOS trials (M = .0946, SE = .215; p = .008, r = .814; see Figure 4). Further, according to a Wilcoxon signed-rank test, the mean object asymmetry score during TMS to TOS trials (M = .2589, SD = .521) was not significantly higher than the mean of the object asymmetry score during no-TMS to TOS trials (M = −.1012, SD = .771), z = −1.367, p = .172 (Figure 5).
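The asymmetry score defined above can be written as a short function. This is a minimal sketch: the function name is illustrative, and returning None when a participant made no errors is an assumption, because the text does not say how that case was handled.

```python
def error_asymmetry(nonnatural_errors, natural_errors):
    """(nonnatural - natural) / (nonnatural + natural), ranging from -1 to +1.

    Inputs are the error proportions (or counts) for nonnatural and natural
    stimuli. Returns None when there are no errors at all (assumed handling;
    the score is undefined in that case).
    """
    total = nonnatural_errors + natural_errors
    if total == 0:
        return None
    return (nonnatural_errors - natural_errors) / total

# Illustrative inputs only, not data from the study.
print(error_asymmetry(0.2, 0.0))  # 1.0: every error was a nonnatural stimulus called natural
print(error_asymmetry(0.0, 0.3))  # -1.0: every error was a natural stimulus called nonnatural
print(error_asymmetry(0.1, 0.1))  # 0.0: errors split evenly
```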

Figure 4. 

Error asymmetry scores (proportion error) for scene categorization during TMS versus no-TMS to the TOS. Values ranged from −1 (i.e., 100% errors resulting from incorrectly categorizing natural scenes as nonnatural) to +1 (i.e., 100% errors resulting from incorrectly categorizing nonnatural scenes as natural), with 0 representing 50% natural and 50% nonnatural errors. Error bars represent the SEM.

Figure 5. 

Error asymmetry scores (proportion error) for object categorization during TMS versus no-TMS to the TOS. Values ranged from −1 (i.e., 100% errors resulting from incorrectly categorizing natural objects as nonnatural) to +1 (i.e., 100% errors resulting from incorrectly categorizing nonnatural objects as natural), with 0 representing 50% natural and 50% nonnatural errors. Error bars represent the SEM.

DISCUSSION

The current experiment employed a novel approach to examine the role of the TOS in scene processing and its relationship to object processing. Contrary to our predictions, temporarily disrupting scene processing did not facilitate object categorization. However, administering TMS to the TOS significantly impaired scene categorization performance, demonstrating causally, for the first time, that the TOS plays an essential role in the scene processing network.

Although a number of neuroimaging studies have demonstrated that the TOS activates more reliably to scene compared with object images (Epstein et al., 2007; MacEvoy & Epstein, 2007; Epstein, Higgins, & Thompson-Schill, 2005), we know little about the response properties of this region (Levy et al., 2004). We also found a significantly higher proportion of errors toward nonnatural scenes, which may be the result of differences in spatial image characteristics between the two scene types considering that nonnatural and natural images vary greatly with respect to their spatial frequency content. Natural scenes tend to be composed of undulating contours (e.g., rolling landscape; Barton, Press, Keenan, & O'Connor, 2002) and are generally defined by low spatial frequencies (Torralba & Oliva, 2003; Webster & Miyahara, 1997), whereas nonnatural scenes are typically characterized by high spatial frequencies (Joubert et al., 2007; Torralba & Oliva, 2003) because of the abundance of sharp contours, vertical lines, right angles, and defining edges (e.g., buildings, walls, windows).

This pattern of results could potentially be explained by research suggesting that the left hemisphere may preferentially process high spatial frequencies, whereas the right hemisphere may preferentially process low spatial frequencies for centrally presented stimuli (Han et al., 2002; Evans, Shedden, Hevenor, & Hahn, 2000; Robertson & Ivry, 2000; Fink, Marshall, Halligan, & Dolan, 1999; Proverbio, Minniti, & Zani, 1998; Fink, Halligan, Marshall, Frith, & Frackowiak, 1996, 1997; Martinez et al., 1997; Heinze, Johannes, Munte, & Magun, 1994). Given that we administered TMS to the left TOS, we speculate that this region may be tuned to the higher spatial frequency aspects of scenes or to the vertical/horizontal orientations within the scene. Consistent with this notion, functional neuroimaging has suggested that the TOS likely contains neurons with smaller receptive field (RF) sizes than those in the PPA (MacEvoy & Epstein, 2007). Given the different properties ascribed to these two regions, it is plausible that these structures within the scene processing network respond to distinct aspects of a scene. For instance, if the PPA is characterized by larger RFs, it may process information about overall spatial layouts, such as the surfaces, features, and objects characterizing a scene. Conversely, if the TOS supports smaller RFs it may be involved in processing the detailed image features or high spatial frequency content within a scene. As a result, it is possible that the TOS and the PPA may be responsible for different aspects of scene processing. The PPA may be involved in ultra rapid encoding of overall topographical information into memory and not preferentially involved in the perceptual analysis, identification, or recall of topographical materials (Epstein et al., 2001), which is consistent with an early global stage of scene processing. 
It may be that the PPA processes global spatial layout information and feeds back to the TOS for fine detail scene processing to give rise to the rich and full percept of a scene.

This model is consistent with the nonhierarchical model of face processing based on research demonstrating a flow of information from “higher” to “lower” cortical areas through reentrant connections (Jiang et al., 2011; Steeves et al., 2006; Rossion, Caldara, Seghier, Schuller, Lazeyras, & Mayer, 2003). In this nonhierarchical model, the global gist of a face is processed first, followed by further processing of more detailed face identity information. The scene processing network may operate in a similar nonhierarchical fashion with initial global scene processing in the “higher” cortical area, the PPA, followed by feedback to the “lower” cortical area, the TOS, for a more detailed analysis of scene information.

In summary, TMS to the TOS resulted in a significant decrease in scene categorization accuracy providing the first causal evidence for the significant role of the TOS in scene processing. Further, we have shown that the majority of scene categorization errors during TMS to the TOS were made for nonnatural images, which are known to contain a different spatial frequency profile (higher spatial frequencies) than that for natural scene category images. This pattern of results suggests that the TOS could potentially be involved in processing higher spatial frequency content within a scene, although this hypothesis warrants further study. Specifically, future research will need to directly manipulate and measure spatial frequency content by conducting a Fourier transformation on scene stimuli. Together with the PPA and retrosplenial cortex, the TOS forms a network of scene processing regions whose distinct perceptual properties allow for the representation of a rich visual environment.

Acknowledgments

This research was supported by grants from the Canada Foundation for Innovation, the Ontario Research Fund, and the Natural Sciences and Engineering Research Council of Canada.

Reprint requests should be sent to Jennifer K. E. Steeves, 1032 Sherman Health Science Research Centre, York University, Toronto, Ontario M3J 1P3, Canada, or via e-mail: steeves@yorku.ca.

REFERENCES

Amit, E., Mehoudar, E., Trope, Y., & Yovel, G. (2012). Do object-category selective regions in the ventral visual stream represent perceived distance information? Brain and Cognition, 80, 201–213.
Barton, J. J. S., Press, D. Z., Keenan, J. P., & O'Connor, M. (2002). Lesions of the fusiform face area impair perception of facial configuration in prosopagnosia. Neurology, 58, 71–78.
Biederman, I. (1972). Perceiving real-world scenes. Science, 177, 77–80.
Biederman, I. (1981). On the semantics of a glance at a scene. In M. Kubovy & J. Pomerantz (Eds.), Perceptual organization (pp. 213–263). Hillsdale, NJ: Erlbaum.
Biederman, I. (1987). Recognition by components: A theory of human image understanding. Psychological Review, 94, 115–147.
Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14, 143–177.
Brodeur, M. B., Dionne-Dostie, E., Montreuil, T., & Lepage, M. (2010). The Bank of Standardized Stimuli (BOSS), a new set of 480 normative photos of objects to be used as visual stimuli in cognitive research. PLoS One, 5, e10773.
De Graef, P., Christaens, D., & d'Ydewalle, G. (1990). Perceptual effects of scene context on object identification. Psychological Research, 52, 317–329.
Epstein, R., DeYoe, E. A., Press, D. Z., Rosen, A. C., & Kanwisher, N. (2001). Neuropsychological evidence for a topographical learning mechanism in parahippocampal cortex. Cognitive Neuropsychology, 18, 481–508.
Epstein
,
R.
,
Harris
,
A.
,
Stanley
,
D.
, &
Kanwisher
,
N.
(
1999
).
The parahippocampal place area: Recognition navigation, or encoding?
Neuron
,
23
,
115
125
.
Epstein
,
R. A.
,
Higgins
,
J. S.
,
Jablonski
,
K.
, &
Feiler
,
A. M.
(
2007
).
Visual scene processing in familiar and unfamiliar environments.
Journal of Neurophysiology
,
97
,
3670
3683
.
Epstein
,
R. A.
,
Higgins
,
J. S.
, &
Thompson-Schill
,
S. L.
(
2005
).
Learning places from views: Variation in scene processing as a function of experience and navigational ability.
Journal of Cognitive Neuroscience
,
17
,
73
83
.
Epstein
,
R.
, &
Kanwisher
,
N.
(
1998
).
A cortical representation of the local environment.
Nature
,
392
,
598
601
.
Evans
,
M.
,
Shedden
,
J.
,
Hevenor
,
S.
, &
Hahn
,
M.
(
2000
).
The effect of variability of unattended information on global and local processing: Evidence for lateralization at early stages of processing.
Neuropsychologia
,
38
,
225
239
.
Fabre-Thorpe
,
M.
,
Delorme
,
A.
,
Marlot
,
C.
, &
Thorpe
,
S.
(
2001
).
A limit to the speed of processing in ultra-rapid visual categorization of novel natural scenes.
Journal of Cognitive Neuroscience
,
13
,
171
180
.
Fei-Fei
,
L.
, &
Perona
,
P.
(
2005
).
A Bayesian hierarchical model for learning natural scene categories.
IEEE Proceedings in Computer Vision and Pattern Recognition
,
2
,
524
531
.
Fink
,
G.
,
Halligan
,
P.
,
Marshall
,
J.
,
Frith
,
C.
, &
Frackowiak
,
R.
(
1996
).
Where in the brain does visual attention select the forest and the trees?
Nature
,
382
,
626
628
.
Fink
,
G.
,
Halligan
,
P.
,
Marshall
,
J.
,
Frith
,
C.
, &
Frackowiak
,
R.
(
1997
).
Neural mechanisms involved in the processing of global and local aspects of hierarchical organized visual stimuli.
Brain
,
120
,
1779
1791
.
Fink
,
G.
,
Marshall
,
J.
,
Halligan
,
P.
, &
Dolan
,
R.
(
1999
).
Hemispheric asymmetries in global/local processing are modulated by perceptual salience.
Neuropsychologia
,
37
,
31
40
.
Friedman
,
A.
(
1979
).
Framing pictures: The role of knowledge in automatized encoding and memory for gist.
Journal of Experimental Psychology
,
108
,
316
355
.
Gegenfurtner
,
K. R.
, &
Rieger
,
J.
(
2000
).
Sensory and cognitive contributions of color to the recognition of natural scenes.
Current Biology
,
10
,
805
808
.
Goldberg
,
E.
, &
Costa
,
L. D.
(
1981
).
Hemisphere differences in the acquisition and use of descriptive systems.
Brain and Language
,
14
,
144
173
.
Greene
,
M. R.
, &
Oliva
,
A.
(
2009a
).
Recognition of natural scenes from global properties: Seeing the forest without representing the trees.
Cognitive Psychology
,
58
,
137
176
.
Greene
,
M.
, &
Oliva
,
A.
(
2009b
).
The briefest of glances: The time course of natural scene understanding.
Psychological Science
,
20
,
464
472
.
Grill-Spector
,
K.
(
2003
).
The neural basis of object perception.
Current Opinion in Neurobiology
,
13
,
159
166
.
Grill-Spector
,
K.
,
Kourtzi
,
Z.
, &
Kanwisher
,
N.
(
2001
).
The lateral occipital complex and its role in object recognition.
Vision Research
,
41
,
1409
1422
.
Habib
,
M.
, &
Sirigu
,
A.
(
1987
).
Pure topographical disorientation: A definition and anatomical basis.
Cortex
,
23
,
73
85
.
Han
,
S.
,
Weaver
,
J.
,
Murray
,
S.
,
Kang
,
X.
,
Yund
,
W.
, &
Woods
,
D.
(
2002
).
Hemispheric asymmetries in global and local processing: Effects of stimulus position and spatial frequency.
Neuroimage
,
17
,
1290
1299
.
Harel
,
A.
,
Kravitz
,
D.
, &
Baker
,
C.
(
2012
).
Deconstructing visual scenes in cortex: Gradients of object and spatial layout information.
Cerebral Cortex
,
1
11
.
doi: 10.1093/cercor/bhs091
.
Hasson
,
U.
,
Harel
,
M.
,
Levy
,
I.
, &
Malach
,
R.
(
2003
).
Large-scale mirror-symmetry organization of human occipito-temporal object areas.
Neuron
,
37
,
1027
1041
.
Heinze
,
H. J.
,
Johannes
,
S.
,
Munte
,
T. F.
, &
Magun
,
G. R.
(
1994
).
The order of global- and local-level information processing: Electrophysiological evidence for parallel perception processes.
In H. Heinze, T. Muente, & G. Mangun (Eds.)
,
Cognitive electrophysiology
(pp.
1
25
).
Boston
:
Birkhaeuser
.
Iaria
,
G.
, &
Petrides
,
M.
(
2007
).
Occipital sulci of the human brain: Variability and probability maps.
The Journal of Comparative Neurology
,
501
,
243
259
.
Jiang
,
F.
,
Dricot
,
L.
,
Weber
,
J.
,
Righi
,
G.
,
Tarr
,
M.
,
Goebel
,
R.
,
et al
(
2011
).
Face categorization in visual scenes may start in a higher order area of the right fusiform gyrus: Evidence from dynamic visual stimulation in neuroimaging.
Journal of Neurophysiology
,
106
,
2720
2736
.
Joubert
,
O. R.
,
Rousselet
,
G. A.
,
Fize
,
D.
, &
Fabre-Thorpe
,
M.
(
2007
).
Processing scene context: Fast categorization and object interference.
Vision Research
,
47
,
3286
3297
.
Kim
,
J. G.
, &
Biederman
,
I.
(
2011
).
Where do objects become scenes?
Cerebral Cortex
,
21
,
1738
1746
.
Kohler
,
S.
,
Crane
,
J.
, &
Milner
,
B.
(
2002
).
Differential contributions of the parahippocampal place area and the anterior hippocampus to human memory for scenes.
Hippocampus
,
12
,
718
723
.
Konkle
,
T.
, &
Oliva
,
A.
(
2012
).
A real-world size organization of object responses in occipitotemporal cortex.
Neuron
,
74
,
1114
1124
.
Kravitz
,
D. J.
,
Peng
,
C. S.
, &
Baker
,
C. I.
(
2011
).
Real-world scene representations in high-level visual cortex: It's the spaces more than the places.
Journal of Neuroscience
,
31
,
7322
7333
.
Landis
,
T.
,
Cummings
,
J. L.
,
Benson
,
D. F.
, &
Palmer
,
E. P.
(
1986
).
Loss of topographic familiarity: An environmental agnosia.
Archives of Neurology
,
43
,
132
136
.
Levy
,
I.
,
Hasson
,
U.
,
Harel
,
M.
, &
Malach
,
R.
(
2004
).
Functional analysis of the periphery effect in human building related areas.
Human Brain Mapping
,
22
,
15
26
.
MacEvoy
,
S. P.
, &
Epstein
,
R. A.
(
2007
).
Position selectivity in scene- and object-responsive occipitotemporal regions.
Journal of Neurophysiology
,
98
,
2089
2098
.
MacEvoy
,
S. P.
, &
Epstein
,
R. A.
(
2011
).
Constructing scenes from objects in human occipito-temporal cortex.
Nature Neuroscience
,
14
,
1323
1329
.
Maguire
,
E. A.
,
Frith
,
C. D.
, &
Cipolotti
,
L.
(
2001
).
Distinct neural systems for the encoding and recognition of topography and faces.
Neuroimage
,
13
,
743
750
.
Malach
,
R.
,
Reppas
,
J.
,
Benson
,
R.
,
Kwong
,
K.
,
Jiang
,
H.
,
Kennedy
,
W.
,
et al
(
1995
).
Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex.
Proceedings of the National Academy of Sciences, U.S.A.
,
9
,
8135
8139
.
Martinez
,
A.
,
Moses
,
P.
,
Frank
,
L.
,
Buxton
,
R.
,
Wong
,
E.
, &
Stiles
,
J.
(
1997
).
Hemispheric asymmetries in global and local processing: Evidence from fMRI.
NeuroReport
,
8
,
1685
1689
.
Mullin
,
C. R.
, &
Steeves
,
J. K. E.
(
2011
).
TMS to the lateral occipital cortex disrupts object processing but facilitates scene processing.
Journal of Cognitive Neuroscience
,
23
,
4174
4184
.
Oliva
,
A.
, &
Schyns
,
P. G.
(
1997
).
Coarse blobs or fine edges? Evidence that information diagnosticity changes the perception of complex visual stimuli.
Cognitive Psychology
,
34
,
72
107
.
Oliva
,
A.
, &
Schyns
,
P. G.
(
2000
).
Diagnostic colors mediate scene recognition.
Cognitive Psychology
,
41
,
176
210
.
Oliva
,
A.
, &
Torralba
,
A.
(
2001
).
Modeling the shape of the scene: As holistic representation of the spatial envelope.
International Journal of Computer Vision
,
42
,
145
175
.
Oliva
,
A.
, &
Torralba
,
A.
(
2002
).
Scene-centered description from spatial envelope properties.
In H. Bulthof, S. W. Lee, T. Poggio, & C. Wallraven (Eds.)
,
Proceedings of 2nd International Workshop on Biologically Motivated Computer Vision
(pp.
263
272
).
Tuebingen, Germany
:
Springer-Verlag
.
Oliva
,
A.
, &
Torralba
,
A.
(
2006
).
Building the gist of a scene: The role of global image features in recognition.
Brain Research
,
155
,
23
36
.
Park
,
S.
,
Intraub
,
H.
,
Yi
,
D. J.
,
Widders
,
D.
, &
Chun
,
M. M.
(
2007
).
Beyond the edges of a view: Boundary extension in human-selective visual cortex.
Neuron
,
54
,
335
342
.
Peyrin
,
C.
,
Chauvin
,
A.
,
Chokron
,
S.
, &
Marendaz
,
C.
(
2003
).
Hemispheric specialization for spatial frequency processing in the analysis of natural scenes.
Brain and Cognition
,
53
,
278
282
.
Pitcher
,
D.
,
Charles
,
L.
,
Devlin
,
J. T.
,
Walsh
,
V.
, &
Duchaine
,
B.
(
2009
).
Triple dissociation of faces, bodies and objects in the extrastriate cortex.
Current Biology
,
19
,
319
324
.
Potter
,
M. C.
(
1975
).
Meaning in visual search.
Science
,
187
,
965
966
.
Proverbio
,
A.
,
Minniti
,
A.
, &
Zani
,
A.
(
1998
).
Electrophysiological evidence of a perceptual precedence of global vs. local visual information.
Cognitive Brain Research
,
6
,
321
340
.
Renninger
,
L.
, &
Malik
,
J.
(
2004
).
When is scene identification just texture recognition?
Vision Research
,
44
,
2301
2311
.
Robertson
,
L. C.
, &
Ivry
,
R.
(
2000
).
Hemispheric asymmetries: Attention to visual and auditory primatives.
Current Directions in Psychological Science
,
9
,
59
63
.
Rossi
,
S.
,
Hallett
,
M.
,
Rossini
,
P. M.
, &
Pascual-Leone
,
A.
(
2009
).
Safety of TMS Consensus Group. Safety, ethical considerations, and application guidelines for the use of transcranial-magnetic stimulation in clinical practice and research.
Clinical Neurophysiology
,
120
,
2008
2039
.
Rossion
,
B.
,
Caldara
,
R.
,
Seghier
,
M.
,
Schuller
,
A. M.
,
Lazeyras
,
F.
, &
Mayer
,
E.
(
2003
).
A network of occipito-temporal face-sensitive areas besides the right middle fusiform gyrus is necessary for normal face processing.
Brain
,
126
,
2381
2395
.
Rousselet
,
G. A.
,
Joubert
,
O. R.
, &
Fabre-Thorpe
,
M.
(
2005
).
How long to get to the “gist” of real-world natural scenes?
Visual Cognition
,
3
,
852
877
.
Schyns
,
P. G.
, &
Oliva
,
A.
(
1994
).
From blobs to boundary edges: Evidence for time- and spatial-scale-dependent scene recognition.
Psychological Science
,
5
,
195
200
.
Steeves
,
J.
,
Culham
,
J.
,
Duchaine
,
B.
,
Pratesi
,
C.
,
Valyear
,
K.
,
Schindler
,
I.
,
et al
(
2006
).
The fusiform face area is not sufficient for face recognition: Evidence from a patient with dense prosopagnosia and no occipital face area.
Neuropsychologia
,
44
,
594
609
.
Steeves
,
J. K. E.
,
Humphrey
,
G. K.
,
Culham
,
J. C.
,
Menon
,
R. S.
,
Milner
,
A. D.
, &
Goodale
,
M. A.
(
2004
).
Behavioral and neuroimaging evidence for a contribution of color and texture information to scene classification in a patient with visual form agnosia.
Journal of Cognitive Neuroscience
,
16
,
955
965
.
Talairach
,
J.
, &
Tournoux
,
P.
(
1988
).
Co-planar stereotaxic atlas of the human brain.
New York, NY
:
Thieme Medical Publishers
.
Torralba
,
A.
, &
Oliva
,
A.
(
2002
).
Depth estimation from image structure.
IEEE Pattern Analysis and Machine Intelligence
,
24
,
1226
1238
.
Torralba
,
A.
, &
Oliva
,
A.
(
2003
).
Statistics of natural image categories.
Network: Computation in Neural Systems
,
14
,
391
412
.
Vogel
,
J.
, &
Schiele
,
B.
(
2007
).
Semantic scene modeling and retrieval for content-based image retrieval.
International Journal of Computer Vision
,
72
,
133
157
.
Wassermann
,
E. M.
(
1998
).
Risk and safety of repetitive transcranial magnetic stimulation: Report and suggested guidelines from the International Workshop on the Safety of Repetitive Transcranial Magnetic Stimulation, June 5–7, 1996.
Electroencephalography and Clinical Neurophysiology
,
108
,
1
16
.
Webster
,
M. A.
, &
Miyahara
,
E.
(
1997
).
Contrast adaptation and the spatial structure of natural images.
Optical Society of America
,
14
,
2355
2366
.
Whiteley
,
A. M.
, &
Warrington
,
E. K.
(
1978
).
Selective impairment of topographical memory: A single case study.
Journal of Neurology, Neurosurgery, and Psychiatry
,
41
,
575
578
.
Xiao
,
J.
,
Hays
,
J.
,
Ehinger
,
K.
,
Oliva
,
A.
, &
Torralba
,
A.
(
2010
).
SUN database: Large-scale scene recognition from abbey to zoo.
Paper presented at the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3485–3492), San Francisco, CA, June 13–18
.