Traditionally, it has been theorized that the human visual system identifies and classifies scenes in an object-centered approach, such that scene recognition can only occur once key objects within a scene are identified. Recent research points toward an alternative approach, suggesting that the global image features of a scene are sufficient for the recognition and categorization of a scene. We have previously shown that disrupting object processing with repetitive TMS to object-selective cortex enhances scene processing possibly through a release of inhibitory mechanisms between object and scene pathways [Mullin, C. R., & Steeves, J. K. E. TMS to the lateral occipital cortex disrupts object processing but facilitates scene processing. Journal of Cognitive Neuroscience, 23, 4174–4184, 2011]. Here we show the effects of TMS to the transverse occipital sulcus (TOS), an area implicated in scene perception, on scene and object processing. TMS was delivered to the TOS or the vertex (control site) while participants performed an object and scene natural/nonnatural categorization task. Transiently interrupting the TOS resulted in significantly lower accuracies for scene categorization compared with control conditions. This demonstrates a causal role of the TOS in scene processing and indicates its importance, in addition to the parahippocampal place area and retrosplenial cortex, in the scene processing network. Unlike TMS to object-selective cortex, which facilitates scene categorization, disrupting scene processing through stimulation of the TOS did not affect object categorization. Further analysis revealed a higher proportion of errors for nonnatural scenes that led us to speculate that the TOS may be involved in processing the higher spatial frequency content of a scene. This supports a nonhierarchical model of scene recognition.
Within the scene perception literature, there are two dominant theories that have been proposed to explain how the human visual system engages in scene processing. The first of these two theories is an object-centered approach. In this viewpoint, recognition of a real-world scene occurs following the identification of one or more of its prominent objects and the meaning or gist of the scene is derived from the particular arrangement and cooccurrence of these objects (De Graef, Christaens, & d'Ydewalle, 1990; Biederman, 1981, 1987; Friedman, 1979). In opposition to this view is a second scene-centered approach based on research demonstrating that scene processing can occur rapidly and accurately even when the image is presented too quickly to allow a thorough investigation of the objects within the scene (Rousselet, Joubert, & Fabre-Thorpe, 2005; Oliva & Schyns, 1997, 2000; Biederman, Mezzanotte, & Rabinowitz, 1982; Potter, 1975; Biederman, 1972). In light of these findings, this theory claims that detailed information about object shape and identity is not necessary for scene processing, but rather the global gist of a scene can be processed independently of its objects (Greene & Oliva, 2009a, 2009b; Vogel & Schiele, 2007; Oliva & Torralba, 2001, 2002, 2006; Fei-Fei & Perona, 2005; Renninger & Malik, 2004; Torralba & Oliva, 2002, 2003; Oliva & Schyns, 2000). Specifically, this explanation suggests that a scene can be rapidly identified based on a set of perceptual dimensions reflecting scene structure (mean depth, openness, and expansion), scene constancy (transience, temperature), and scene function (concealment, navigability). Additionally, low-level features such as orientation, texture, and color, as well as more complex spatial layout properties including perspective, naturalness, roughness, size, diagonal plane, symmetry, and contrast, also aid in global gist-based scene processing. 
In other words, these dominant spatial structures are what define the overall layout or shape of a scene and facilitate scene processing.
Functional neuroimaging has revealed an area in the posterior medial-temporal lobe that plays an active role in scene processing (Kohler, Crane, & Milner, 2002; Maguire, Frith, & Cipolotti, 2001; Epstein & Kanwisher, 1998). Scene images, as compared with face or object images, produce stronger activation in a region of the parahippocampal gyrus now known as the “parahippocampal place area” (PPA; Epstein & Kanwisher, 1998). The PPA shows preferential activation for scenes whether they contain objects, are indoor or outdoor, or the environment is natural or nonnatural. Patient research also indicates that the PPA plays a role in scene processing as damage to this region results in a host of behavioral deficits, such as difficulties in recognizing scenes, landmarks, and places (Epstein, DeYoe, Press, Rosen, & Kanwisher, 2001; Habib & Sirigu, 1987; Landis, Cummings, Benson, & Palmer, 1986; Whiteley & Warrington, 1978). Interestingly, a recent neuroimaging study demonstrated that the magnitude of activation within the PPA is higher when an object is present rather than absent (Harel, Kravitz, & Baker, 2012). Together, these findings suggest that the PPA responds to information about the layout of local space and that its response may be modulated by object information (Kravitz, Peng, & Baker, 2011; Park, Intraub, Yi, Widders, & Chun, 2007; Epstein, Harris, Stanley, & Kanwisher, 1999; Epstein & Kanwisher, 1998). Further, the response modulation in the PPA with object images complements research suggesting that the lateral occipital cortex (LO), an object-selective region (Grill-Spector, Kourtzi, & Kanwisher, 2001; Malach et al., 1995), contributes to scene recognition (Kim & Biederman, 2011; MacEvoy & Epstein, 2011).
The notion of functional connectivity between object and scene regions is supported by behavioral studies showing an influence of salient objects on scene background categorization (Joubert, Rousselet, Fize, & Fabre-Thorpe, 2007; Rousselet et al., 2005; Fabre-Thorpe, Delorme, Marlot, & Thorpe, 2001). Object perception and scene gist recognition can both occur equally rapidly (Joubert et al., 2007; Gegenfurtner & Rieger, 2000; Schyns & Oliva, 1994). As a result, it appears that scene and object processing are parallel yet interactive processes with similar temporal dynamics (Joubert et al., 2007).
The notion of parallel processing of objects and scenes is supported by a case study of a patient with visual form agnosia, consequent to bilateral damage to area LO. Despite an inability to recognize objects based on their shape, the patient was capable of categorizing scenes. fMRI showed that the patient produced activation within the PPA for scene images that was modulated by color and texture (Steeves et al., 2004), which are global scene image properties. This study indicates that scene processing can occur despite a lack of ability to process objects. Consistent with this patient study, the application of functionally guided TMS to area LO impairs object but not scene processing (Mullin & Steeves, 2011). Moreover, disrupting area LO actually facilitates scene processing. These findings may represent a release of inhibitory connections between object-selective area LO and the scene processing pathway, which further suggests that scene and object processing may operate on separate but interactive pathways. This interpretation is consistent with previous research showing that object processing can interfere with scene categorization (Joubert et al., 2007) and that the presence of objects within a scene modulates BOLD signal in the PPA (Harel et al., 2012).
In this study, we sought to further investigate the relationship between object and scene processing by administering TMS to the transverse occipital sulcus (TOS) in an attempt to disrupt scene processing. The TOS is caudal to the parieto-occipital fissure within the superior portion of the occipital region (Iaria & Petrides, 2007), which makes it easily accessible to TMS, unlike the PPA, which is located much deeper in the brain.
The TOS has been shown to be involved in perceiving scenes that do not contain obvious objects (Grill-Spector, 2003), in processing familiar spatial layouts (Epstein, Higgins, Jablonski, & Feiler, 2007), and in the recognition of buildings (Levy, Hasson, Harel, & Malach, 2004; Hasson, Harel, Levy, & Malach, 2003). We predicted that administering TMS to the TOS would disrupt scene categorization if its role is essential in the scene processing network. We also asked whether TMS to the TOS might facilitate object categorization, given that stimulation of object-selective cortex facilitates scene processing (Mullin & Steeves, 2011).
Eight healthy participants (four women, four men, ages 23–41 years) completed this study. All participants had normal or corrected-to-normal vision and reported no contraindications to TMS or fMRI. Informed consent was obtained, and all experimental procedures were conducted in accordance with the York University Office of Research Ethics, which follows the guidelines outlined by the Declaration of Helsinki.
Functional and anatomical images were acquired with a 3-T Siemens Magnetom Tim Trio magnetic resonance scanner at York University's Sherman Health Sciences Research Centre (Toronto, Canada). ROIs for TMS stimulation were localized using fMRI, and functional volumes were acquired using the GE 32-channel high-resolution brain array coil. Functional images were acquired with EPI with a T1-weighted sequence of 32 contiguous axial slices (in-plane resolution = 2.5 × 2.5 mm, slice thickness = 3 mm, imaging matrix 96 × 96, repetition time = 2000 msec, echo time = 30 msec, flip angle = 90°, field of view = 24 × 24 cm). Structural images were acquired with a T1 MPRAGE imaging sequence (in-plane resolution = 2.0 × 2.0 mm, imaging matrix = 122 × 122, repetition time = 8300 msec, echo time = 100 msec, flip angle = 90°, field of view = 24 × 24 cm), recording 176 slices at a slice thickness of 2.0 mm. The functional localizer used a 1-back paradigm and was composed of three different stimulus categories: faces, scenes, and objects. Stimuli were presented with a rear-projection system (Avotec, Inc., Stuart, FL) in two separate functional runs (6 min 52 sec). Each run began and finished with a fixation cross for 16 sec. Six repetitions of three 16-sec blocks of the three categories of stimuli were presented in a random order with 16 sec of fixation between each repetition. Each block contained 16 stimuli presented for 1 sec each.
All preprocessing and statistical analyses were carried out with BrainVoyager QX (Brain Innovation, Maastricht, the Netherlands). Functional data underwent motion correction for small interscan head movements as well as linear trend removal to exclude scanner related signal drift and high-pass filtering to remove temporal frequencies lower than three cycles/run. The functional data were analyzed using a general linear model and averaged over the two runs. Functional images were then coregistered to the anatomical images.
The left TOS (see Figure 1) was defined by determining the peak scene-selective activation within this area in response to a linear balanced contrast of scenes versus objects. Using these criteria, the left TOS was functionally identified in four of the eight participants. Although some studies suggest that both hemispheres process scenes equally (Peyrin, Chauvin, Chokron, & Marendaz, 2003; Goldberg & Costa, 1981), there is evidence that participants show higher TOS volumes (cc) within the left hemisphere compared with the right hemisphere (Iaria & Petrides, 2007). The majority of our participants demonstrated greater activation in the left hemisphere than the right in response to scenes. For these reasons, we restricted the application of TMS to the left hemisphere in an attempt to maximize any effects. For the remaining four participants who did not show scene-selective activation in the left TOS, this region was defined by the position and structure of the sulcus. The inability to functionally localize the TOS in all participants has also been observed by others who have found that its position tends to be more variable across participants (Amit, Mehoudar, Trope, & Yovel, 2012; Konkle & Oliva, 2012). The location of the TOS for both the functionally and anatomically defined participants was confirmed by standardizing the brain with a Talairach transformation (Talairach & Tournoux, 1988) (averaged across n = 8 [x −27 ± 3; y −82 ± 5; z 19 ± 8]) and comparing the obtained coordinates of the TOS to those previously reported (e.g., Hasson et al., 2003).
TMS Stimulation and Functional Stereotaxy
A Magstim Super Rapid2 Stimulator (Magstim; Whitland, UK) and a figure-of-eight coil with a diameter of 70 mm were used to deliver the stimulation pulses. The coil was held tangential to the scalp surface with the handle pointed downward. TMS pulse onset was externally triggered and synchronized to the stimulus image onset by VPixx custom presentation software and DATAPixx hardware (VPixx Technologies, Inc.; www.vpixx.com). Delivery of TMS trials and no-TMS trials was randomized within each run and across stimulus category (i.e., nonnatural or natural); 48% of the trials were no-TMS trials and 52% were TMS trials. Each coil placement site (i.e., left TOS and vertex) was targeted in separate blocks, and the order of the blocks was counterbalanced across participants. A 10-Hz double pulse was delivered coincident with the onset of the stimulus at 60% of maximum stimulator output based on previous studies (e.g., Mullin & Steeves, 2011; Pitcher, Charles, Devlin, Walsh, & Duchaine, 2009). The frequency, intensity, and duration of the TMS train were well within the safety limits of stimulation (Rossi, Hallett, Rossini, & Pascual-Leone, 2009; Wassermann, 1998). Earplugs were worn to reduce the noise associated with TMS coil discharge. Participants were encouraged to take breaks between testing sessions.
To ensure that the coil's position was maintained over the area of interest, its position was continually monitored with the Brainsight image-guided stereotaxic system (Rogue Research, Inc., Montréal, Canada), which allows for coregistration of the MR images with the stimulation hardware. Each participant's anatomical image was used to guide the TMS coil to the precise location of interest relative to the head and brain surfaces.
The research design consisted of two coil placement sites: (1) left TOS and (2) the vertex. The vertex, a point at the center of the top of the head, is defined as a point midway between the inion and the nasion and equidistant from the left and right intertragal notches. This location controls for potential nonspecific effects of TMS to the brain as well as the auditory and sensory artifacts (i.e., clicking sounds and tapping sensations on the scalp).
In both stimulation conditions, participants were presented scenes or objects and were instructed to categorize the stimuli as “natural” or “nonnatural” as quickly and accurately as possible. One hundred forty object images were taken from the Bank of Standardized Stimuli (Brodeur, Dionne-Dostie, Montreuil, & Lepage, 2010) and 140 scene images from the SUN database for Scene Recognition (Xiao, Hays, Ehinger, Oliva, & Torralba, 2010). None of the images were repeated across the two coil placement sites. All stimuli were rendered grayscale and resized to subtend a visual angle of approximately 7.6° × 7.6°.
Participants sat 75 cm from the display, and stimuli were presented centrally on the computer display. On each trial, a fixation dot appeared for 1000 msec, followed immediately by a stimulus image for 33 msec. This was directly followed by a mask consisting of a static noise pattern, which remained on screen until participants responded. Between trials, there was a 7000-msec wait period to allow for recovery from TMS (see Figure 2). Participants completed a total of four blocks (i.e., 140 stimuli) with stimulation at the TOS and four blocks (i.e., 140 stimuli) with stimulation at the vertex. Block order and order of coil placement site were counterbalanced across participants within a testing session. Just over half (i.e., 52%) of the stimuli presented at each coil placement site were paired with a double pulse of TMS. Half of the stimuli were objects and half of the stimuli were scenes. Further, all of the stimuli were split evenly between natural and nonnatural categories. All images were presented in random order within a block. Participants categorized stimuli as natural or nonnatural by pressing one of two designated buttons on a response box (RESPONSEPixx, VPixx Technologies, Inc.; www.vpixx.com).
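The stimulus geometry described above (images subtending approximately 7.6° at a viewing distance of 75 cm) can be converted to a physical on-screen size with the standard visual-angle relation, size = 2 × distance × tan(angle/2). The snippet below is a minimal sketch of that conversion; the function name is hypothetical and this is not the presentation code used in the study.

```python
import math


def stimulus_size_cm(visual_angle_deg, viewing_distance_cm):
    """Physical extent (cm) of a stimulus subtending a given visual angle.

    Uses size = 2 * d * tan(theta / 2), which holds for a flat display
    viewed head-on with the stimulus centered on the line of sight.
    """
    half_angle_rad = math.radians(visual_angle_deg) / 2.0
    return 2.0 * viewing_distance_cm * math.tan(half_angle_rad)


# Values taken from the text: 7.6 degrees at 75 cm.
side_cm = stimulus_size_cm(7.6, 75.0)
print(f"Each image side spans about {side_cm:.2f} cm on the display")
```

Under these values each side of the image comes to just under 10 cm. Converting further to pixels would require the monitor's physical width and resolution, which are not reported here.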
Scene categorization accuracy was impaired during TMS to the TOS relative to no-TMS and vertex conditions (see Figure 3). A 2 × 2 repeated-measures ANOVA of Coil Placement site (TOS or Vertex) and Stimulation Application (TMS or no-TMS) for scene accuracy indicated a significant interaction, F(1, 7) = 8.959, p = .02, η2 = .561.
Bonferroni post hoc analysis revealed a significant reduction in scene categorization accuracy during TMS to the TOS relative to no-TMS to the TOS (p = .042) and TMS to the vertex (p = .008). There were no significant differences during TMS to the vertex relative to no-TMS to the vertex (p = .123) and no-TMS to the TOS relative to no-TMS to the vertex (p = .06).
The same 2 × 2 repeated-measures ANOVA was conducted on the object data, which revealed no significant differences in object categorization accuracy during TMS to the TOS relative to no-TMS to the TOS (p = .338) and TMS to the vertex (p = .756). There were also no significant differences during TMS to the vertex relative to no-TMS to the vertex (p = .331) and no-TMS to the TOS relative to no-TMS to the vertex (p = .235).
With regard to RTs, neither the main effects (i.e., Stimulation Application and Coil Placement site) nor the interaction was significant for either scene (ps = .308, .064, and .594, respectively) or object categorization (ps = .733, .153, and .473, respectively).
Further, a 2 × 2 repeated-measures ANOVA of Stimulation Application (TMS or no-TMS) and Stimulus Category (scenes or objects) on accuracy scores when the coil was applied to the TOS indicated a significant interaction, F(1, 7) = 7.656, p = .028, η2 = .522. A Bonferroni post hoc analysis indicated that during TMS to the TOS, there was a significant reduction in scene categorization accuracy relative to object categorization accuracy (p = .026; see Figure 3). During no-TMS to the TOS, there were no significant differences between scene and object categorization accuracy (p = .715). Further, there was a significant reduction in scene categorization accuracy during TMS to the TOS relative to no-TMS to the TOS (p = .042). There were no significant differences between TMS and no-TMS for object categorization accuracy (p = .338). With regard to RTs, neither the main effects (i.e., Stimulation Application and Stimulus Category) nor the interaction was significant (ps = .936, .733, and .109, respectively).
The same 2 × 2 repeated-measures ANOVA was conducted on accuracy scores when the coil was applied to the vertex and indicated a nonsignificant interaction, F(1, 7) = 2.710, p = .144. For RTs, neither the main effects (i.e., Stimulation Application and Stimulus Category) nor the interaction was significant (ps = .782, .337, and .953, respectively).
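As a consistency check on the effect sizes reported for the two significant interactions, the η2 values agree with partial eta squared recovered from each F ratio and its degrees of freedom. The following is a minimal sketch, assuming the standard identity partial η2 = (F × df1)/(F × df1 + df2); the function name is hypothetical, and the snippet is an illustration rather than the analysis code used in the study.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# F ratios and degrees of freedom as reported in the text.
site_by_stimulation = partial_eta_squared(8.959, 1, 7)      # scene accuracy
stimulation_by_category = partial_eta_squared(7.656, 1, 7)  # coil over the TOS

print(round(site_by_stimulation, 3), round(stimulation_by_category, 3))
```

Both values round to the reported effect sizes (.561 and .522), which suggests, without confirming, that the reported η2 is the partial form.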
To further investigate the significant decrease in scene categorization accuracy when TMS was applied to the TOS, we examined whether there may be differences in the type of images that were miscategorized. We calculated an asymmetry score for each participant as the proportion of natural errors subtracted from the proportion of nonnatural errors, divided by the sum of these two error proportions (i.e., (nonnatural errors − natural errors)/(nonnatural errors + natural errors)). Asymmetry scores ranged from −1 (i.e., 100% errors resulting from incorrectly categorizing natural scenes as nonnatural) to +1 (i.e., 100% errors resulting from incorrectly categorizing nonnatural scenes as natural), with 0 representing 50% natural and 50% nonnatural errors. A paired t test revealed that the mean scene error asymmetry score during TMS to TOS trials (M = .5462, SE = .212) was significantly higher than the mean scene error asymmetry score during no-TMS to TOS trials (M = .0946, SE = .215; p = .008, r = .814; see Figure 4). Further, according to a Wilcoxon signed-rank test, the mean object asymmetry score during TMS to TOS trials (M = .2589, SD = .521) was not significantly higher than the mean object asymmetry score during no-TMS to TOS trials (M = −.1012, SD = .771), z = −1.367, p = .172 (Figure 5).
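The asymmetry score defined above can be written as a short function. This is a minimal sketch, assuming that errors are supplied as counts or proportions for each scene type; the function name and the choice to return None when a participant makes no errors are illustrative assumptions, not details taken from the study.

```python
def asymmetry_score(nonnatural_errors, natural_errors):
    """Error asymmetry: (nonnatural - natural) / (nonnatural + natural).

    +1 means every error was a nonnatural image categorized as natural,
    -1 means every error was a natural image categorized as nonnatural,
    and 0 means errors were split evenly between the two types.
    """
    total = nonnatural_errors + natural_errors
    if total == 0:
        return None  # undefined when there are no errors (assumed handling)
    return (nonnatural_errors - natural_errors) / total


# Hypothetical error counts, for illustration only.
print(asymmetry_score(6, 2))  # more nonnatural errors: positive score
print(asymmetry_score(2, 6))  # more natural errors: negative score
print(asymmetry_score(4, 4))  # balanced errors: zero
```

Because the score is a ratio, it gives the same result whether errors are entered as raw counts or as proportions, provided both inputs use the same unit.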
The current experiment employed a novel approach to examine the role of the TOS in scene processing and its relationship to object processing. Contrary to our predictions, temporarily disrupting scene processing did not facilitate object categorization. However, administering TMS to the TOS resulted in a significant negative effect on scene categorization performance, demonstrating causally, for the first time, that the TOS plays an essential role in the scene processing network.
Although a number of neuroimaging studies have demonstrated that the TOS activates more reliably to scene compared with object images (Epstein et al., 2007; MacEvoy & Epstein, 2007; Epstein, Higgins, & Thompson-Schill, 2005), we know little about the response properties of this region (Levy et al., 2004). We also found a significantly higher proportion of errors toward nonnatural scenes, which may be the result of differences in spatial image characteristics between the two scene types considering that nonnatural and natural images vary greatly with respect to their spatial frequency content. Natural scenes tend to be composed of undulating contours (e.g., rolling landscape; Barton, Press, Keenan, & O'Connor, 2002) and are generally defined by low spatial frequencies (Torralba & Oliva, 2003; Webster & Miyahara, 1997), whereas nonnatural scenes are typically characterized by high spatial frequencies (Joubert et al., 2007; Torralba & Oliva, 2003) because of the abundance of sharp contours, vertical lines, right angles, and defining edges (e.g., buildings, walls, windows).
This pattern of results could potentially be explained by research suggesting that the left hemisphere may preferentially process high spatial frequencies, whereas the right hemisphere may preferentially process low spatial frequencies for centrally presented stimuli (Han et al., 2002; Evans, Shedden, Hevenor, & Hahn, 2000; Robertson & Ivry, 2000; Fink, Marshall, Halligan, & Dolan, 1999; Proverbio, Minniti, & Zani, 1998; Fink, Halligan, Marshall, Frith, & Frackowiak, 1996, 1997; Martinez et al., 1997; Heinze, Johannes, Munte, & Magun, 1994). Given that we administered TMS to the left TOS, we speculate that this region may be tuned to the higher spatial frequency aspects of scenes or to the vertical/horizontal orientations within the scene. Consistent with this notion, functional neuroimaging has suggested that the TOS likely contains neurons with smaller receptive field (RF) sizes than those in the PPA (MacEvoy & Epstein, 2007). Given the different properties ascribed to these two regions, it is plausible that these structures within the scene processing network respond to distinct aspects of a scene. For instance, if the PPA is characterized by larger RFs, it may process information about overall spatial layouts, such as the surfaces, features, and objects characterizing a scene. Conversely, if the TOS supports smaller RFs it may be involved in processing the detailed image features or high spatial frequency content within a scene. As a result, it is possible that the TOS and the PPA may be responsible for different aspects of scene processing. The PPA may be involved in ultra rapid encoding of overall topographical information into memory and not preferentially involved in the perceptual analysis, identification, or recall of topographical materials (Epstein et al., 2001), which is consistent with an early global stage of scene processing. 
It may be that the PPA processes global spatial layout information and feeds back to the TOS for fine detail scene processing to give rise to the rich and full percept of a scene.
This model is consistent with the nonhierarchical model of face processing based on research demonstrating a flow of information from “higher” to “lower” cortical areas through reentrant connections (Jiang et al., 2011; Steeves et al., 2006; Rossion, Caldara, Seghier, Schuller, Lazeyras, & Mayer, 2003). In this nonhierarchical model, the global gist of a face is processed first, followed by further processing of more detailed face identity information. The scene processing network may operate in a similar nonhierarchical fashion with initial global scene processing in the “higher” cortical area, the PPA, followed by feedback to the “lower” cortical area, the TOS, for a more detailed analysis of scene information.
In summary, TMS to the TOS resulted in a significant decrease in scene categorization accuracy providing the first causal evidence for the significant role of the TOS in scene processing. Further, we have shown that the majority of scene categorization errors during TMS to the TOS were made for nonnatural images, which are known to contain a different spatial frequency profile (higher spatial frequencies) than that for natural scene category images. This pattern of results suggests that the TOS could potentially be involved in processing higher spatial frequency content within a scene, although this hypothesis warrants further study. Specifically, future research will need to directly manipulate and measure spatial frequency content by conducting a Fourier transformation on scene stimuli. Together with the PPA and retrosplenial cortex, the TOS forms a network of scene processing regions whose distinct perceptual properties allow for the representation of a rich visual environment.
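One way the Fourier-based measurement proposed above might be set up is sketched below: the share of an image's contrast energy that lies above a radial spatial frequency cutoff. This is a minimal sketch under stated assumptions, not a method used in the present study. It assumes NumPy is available; the function names, the 8 cycles-per-image cutoff, and the synthetic gratings standing in for scene photographs are all hypothetical choices made for illustration.

```python
import numpy as np


def high_sf_power_fraction(image, cutoff_cycles_per_image):
    """Fraction of non-DC spectral power above a radial frequency cutoff."""
    img = np.asarray(image, dtype=float)
    img = img - img.mean()  # drop mean luminance (the DC term)
    power = np.abs(np.fft.fft2(img)) ** 2  # 2-D power spectrum
    rows, cols = img.shape
    # Frequencies in cycles per image along each axis (same bin order as fft2).
    fy = np.fft.fftfreq(rows) * rows
    fx = np.fft.fftfreq(cols) * cols
    radius = np.hypot(fy[:, None], fx[None, :])  # radial frequency of each bin
    total = power.sum()
    if total == 0:
        return 0.0  # uniform image: no contrast energy at any frequency
    return float(power[radius > cutoff_cycles_per_image].sum() / total)


def vertical_grating(size, cycles):
    """Synthetic test pattern: sinusoidal grating with `cycles` per image."""
    x = np.arange(size)
    row = np.sin(2 * np.pi * cycles * x / size)
    return np.tile(row, (size, 1))


# A coarse grating should score near 0 and a fine grating near 1.
coarse = high_sf_power_fraction(vertical_grating(64, 2), 8)
fine = high_sf_power_fraction(vertical_grating(64, 20), 8)
print(coarse, fine)
```

For stimuli that subtend 7.6° of visual angle, as in the present task, dividing a frequency in cycles per image by 7.6 gives the corresponding value in cycles per degree. Comparing this fraction between natural and nonnatural scene sets, or filtering images to equate it, would be one possible route to testing the spatial frequency account.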
This research was supported by grants from the Canada Foundation for Innovation, the Ontario Research Fund, and the Natural Sciences and Engineering Research Council of Canada.
Reprint requests should be sent to Jennifer K. E. Steeves, 1032 Sherman Health Science Research Centre, York University, Toronto, Ontario M3J 1P3, Canada, or via e-mail: firstname.lastname@example.org.