The occipito-temporal cortex is strongly implicated in carrying out the high-level computations associated with vision. In human neuroimaging studies, focal regions are consistently found within this broad region that respond strongly and selectively to faces, bodies, or objects. A notable feature of these selective regions is that they are found in pairs. In the posterior-lateral occipito-temporal cortex, focal selectivity is found for faces (occipital face area), bodies (extrastriate body area), and objects (lateral occipital). These three areas are found bilaterally and at close quarters to each other. Likewise, in the ventro-medial occipito-temporal cortex, three similar category-selective regions are found, also in proximity to each other: for faces (fusiform face area), bodies (fusiform body area), and objects (posterior fusiform). Here we review some of the extensive evidence on the functional properties of these areas with two aims. First, we seek to identify principles that distinguish the posterior-lateral and ventro-medial clusters of selective regions but that apply generally within each cluster across the three stimulus kinds. Our review identifies and elaborates several principles by which these relationships hold. In brief, the posterior-lateral representations are more primitive, local, and stimulus-driven relative to the ventro-medial representations, which in contrast are more invariant to visual features, global, and linked to the subjective percept. Second, because the evidence base of studies that compare both posterior-lateral and ventro-medial representations of faces, bodies, and objects is still relatively small, we seek to provoke more cross-talk among the research strands that are traditionally separate. We identify several promising approaches for such future work.
Many lines of evidence implicate the occipito-temporal cortex in high-level aspects of visual perception. A salient feature of this broad swath of cortex is the presence of several focal regions that respond highly selectively in neuroimaging experiments to particular stimulus categories. One aspect of this category selectivity that is often noted, yet remains poorly understood, is that these regions do not appear as singletons. For example, fMRI reveals distinct, focal, and selective brain regions for faces (Pitcher, Walsh, & Duchaine, 2011; Kanwisher & Yovel, 2006) and bodies (Downing & Peelen, in press) in both posterior-lateral and ventro-medial occipito-temporal cortex. Furthermore, areas responding to complex object shape (relative to simple visual elements) are similarly divided between these two general regions (Grill-Spector et al., 1999). This division of labor is reminiscent of the multiple maps that have been identified elsewhere in the brain that support other sensory or motor systems, such as the multiple retinotopic maps in occipital cortex (e.g., Engel, Glover, & Wandell, 1997; DeYoe et al., 1996; Sereno et al., 1995), or the multiple maps of body space in frontal and parietal cortices (e.g., Medina & Coslett, 2010; Graziano & Gross, 1998).
In this article, we propose a general distinction between two groups of functionally defined regions: Posterior-lateral face, body, and object regions provide a relatively primitive, stimulus-bound representation of their preferred stimuli, compared with the ventro-medial representations, which are, in contrast, relatively high level and more closely related to the subjective percept. Below we review the current evidence for this distinction. It remains incomplete, in that many of the key tests remain to be performed for one or more categories. This partially reflects artificial boundaries in the literature between research on faces, bodies, and objects, and an aim of this article is to suggest that future work take a more inclusive approach. But there is sufficient evidence to allow a sketch that will provoke new experiments.
Why consider face- and body-selective regions, which respond preferentially to a single class of stimuli, together with regions that respond broadly to object shape? In line with the functional and anatomical similarities that are reviewed below, we consider it a reasonable possibility (developed further in the Discussion) that body- and face-selective regions reflect specialized “branches” of a more general object system, devoted to these highly relevant animate objects. Thus, it may be fruitful to survey the evidence on faces, bodies, and object form together. Conversely, why limit the scope of this proposal to faces, bodies, and objects, when there are also strongly selective responses to scenes and buildings in parahippocampal and retrosplenial cortices? Although these regions also form a pair along a posterior–anterior axis, they are found medially at some distance from the regions considered here. Furthermore, the general view is that focal scene-selective responses form part of a system for spatial navigation (e.g., Epstein, 2008) rather than for object perception, and so we do not include them in the framework discussed here.
fMRI EVIDENCE FOR FACE, BODY, AND OBJECT SELECTIVITY IN OCCIPITOTEMPORAL CORTEX
This section provides a brief summary of fMRI evidence for focal face, body, and object form representations in human extrastriate cortex (Figure 1). More extensive reviews of these areas can be found elsewhere (e.g., faces: Pitcher et al., 2011; Kanwisher & Yovel, 2006; bodies: Downing & Peelen, in press; Minnebusch & Daum, 2009; Peelen & Downing, 2007a; objects: Grill-Spector, Kourtzi, & Kanwisher, 2001).
Faces, relative to objects, scenes, and other parts of the body, elicit selective fMRI activity in several areas of human occipito-temporal cortex (Avidan, Hasson, Malach, & Behrmann, 2005; Gauthier, Tarr, et al., 2000; Haxby, Hoffman, & Gobbini, 2000; Halgren et al., 1999; Kanwisher, McDermott, & Chun, 1997; Puce, Allison, Asgari, Gore, & McCarthy, 1996). Of primary interest here are activations in the inferior occipital gyrus (occipital face area, OFA) and the lateral fusiform gyrus (fusiform face area, FFA). Many studies have examined the face-related response properties of these regions, considering such factors as the presence of realistic face features and the configuration of elements (Liu, Harris, & Kanwisher, 2010; Rhodes, Michie, Hughes, & Byatt, 2009), the response to nonhuman faces and face caricatures (Tong, Nakayama, Moscovitch, Weinrib, & Kanwisher, 2000), face-like stimulus symmetry (Caldara & Seghier, 2009), and the like.
Human fMRI studies reveal two occipito-temporal regions that respond more strongly to human bodies or body parts than to objects, object parts, faces, scenes, and other stimuli (Downing & Peelen, in press; Peelen & Downing, 2007a). One region, labeled the extrastriate body area (EBA; Downing, Jiang, Shuman, & Kanwisher, 2001), is found in the posterior inferior temporal sulcus/middle temporal gyrus. The other, the fusiform body area (FBA: Peelen & Downing, 2005; Schwarzlose, Baker, & Kanwisher, 2005), is found in the lateral fusiform gyrus. Both of these areas respond selectively to abstract depictions of the body (stick figures) as well as photorealistic images and respond more to nonhuman animals than to inanimate objects (Wiggett, Pritchard, & Downing, 2009; Downing, Chan, Peelen, Dodds, & Kanwisher, 2006).
The lateral occipital complex (LOC; Grill-Spector et al., 2001; Malach et al., 1995) is typically defined as a region that, in fMRI studies, responds preferentially to images of objects or large segments of objects compared with textures formed of finely “scrambled” images of objects. LOC can be subdivided (Grill-Spector et al., 1999) into a posterior-lateral region (LO) and a more medial and anterior region (posterior fusiform, pFs). An important characteristic of LOC is that its responses to images of objects are, to a large extent, independent of shape-defining visual features such as luminance and motion (Grill-Spector et al., 1999) and texture (Malach et al., 1995) and robust to partial occlusion (Kourtzi & Kanwisher, 2001).
EVIDENCE FROM OTHER METHODS
Aside from fMRI evidence, there are findings from a wide range of techniques that support the existence and selective nature of the regions described above. For example, studies of neurological patients show deficits in processing faces (Busigny & Rossion, 2010; Barton, 2008; Bouvier & Engel, 2006), bodies (Moro et al., 2008), and objects (Karnath, Ruter, Mandler, & Himmelbach, 2009; James, Culham, Humphrey, Milner, & Goodale, 2003) following lesions to the general areas reviewed above (although not necessarily with sufficient specificity to distinguish posterior-lateral and ventro-medial counterparts). Likewise, the presence of face- and body-selective cortical regions is supported by intracranial recordings in human patients (e.g., Pourtois, Peelen, Spinelli, Seeck, & Vuilleumier, 2007; Allison, Puce, Spencer, & McCarthy, 1999). And TMS has been used to selectively interfere with the functioning of posterior-lateral representations of faces (Pitcher, Walsh, Yovel, & Duchaine, 2007), bodies (Urgesi, Berlucchi, & Aglioti, 2004), and objects (Chouinard, Whitwell, & Goodale, 2009), and indeed TMS dissociates among all three of these (Pitcher, Charles, Devlin, Walsh, & Duchaine, 2009).
The posterior-lateral regions discussed here fall close together on the lateral surface of the cortex, with foci describing two mirror-symmetrical curves in the left and right hemispheres (Figure 2 and Table 1). Considered in Talairach space (Talairach & Tournoux, 1988)—which operates over voxels rather than cortical surface, and so may underestimate distances—the foci of OFA, EBA, and LO typically fall within approximately 1 cm of each other in the X axis, with OFA medial and inferior to the other regions and EBA superior to LO. In fMRI studies, depending on statistical thresholds, there will typically be overlap among the active voxels of these regions, even when defined at the single-subject level (Downing, Wiggett, & Peelen, 2007).
| Region | Left X | Left Y | Left Z | Right X | Right Y | Right Z |
|---|---|---|---|---|---|---|
| pFs | −36 (1.8) | −46 (1.7) | −16 (0.7) | 34 (1.3) | −43 (1.5) | −17 (0.9) |
| FFA | −39 (0.9) | −46 (1.9) | −18 (1.2) | 38 (0.7) | −48 (1.9) | −16 (0.9) |
| FBA | −41 (1.1) | −46 (1.7) | −18 (1.3) | 40 (1.0) | −46 (1.5) | −18 (1.0) |
| LO | −45 (1.0) | −70 (1.6) | −6 (1.6) | 43 (1.3) | −70 (1.2) | −8 (1.6) |
| OFA | −37 (1.2) | −72 (1.7) | −14 (1.3) | 37 (1.6) | −72 (1.0) | −14 (1.1) |
| EBA | −48 (1.0) | −68 (1.6) | 3 (1.6) | 47 (1.1) | −67 (1.3) | −1 (1.4) |
Values in parentheses denote SE.
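As a rough arithmetic check on the coordinates above, the following sketch computes the spread along the X axis and the pairwise straight-line distances among the posterior-lateral foci in each hemisphere. The coordinates are the mean peaks from Table 1 (SE omitted); the function names are illustrative only. Note that straight-line distances in Talairach volume space are not distances along the cortical surface.

```python
from itertools import combinations
from math import dist

# Mean peak Talairach coordinates (x, y, z), copied from Table 1.
PEAKS = {
    "left": {"LO": (-45, -70, -6), "OFA": (-37, -72, -14), "EBA": (-48, -68, 3)},
    "right": {"LO": (43, -70, -8), "OFA": (37, -72, -14), "EBA": (47, -67, -1)},
}


def x_spread(foci):
    """Range of X coordinates across a set of foci."""
    xs = [coord[0] for coord in foci.values()]
    return max(xs) - min(xs)


def pairwise_distances(foci):
    """Euclidean distance between every pair of foci, keyed by sorted name pair."""
    return {
        (a, b): dist(foci[a], foci[b])
        for a, b in combinations(sorted(foci), 2)
    }


for hemi, foci in PEAKS.items():
    print(hemi, "X spread:", x_spread(foci))
    for pair, d in pairwise_distances(foci).items():
        print("  ", pair, round(d, 1))
```

The X spread comes out at 11 (left) and 10 (right) in Talairach units, in line with the statement that the foci fall within approximately 1 cm of each other along that axis.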
The anatomical picture in ventro-medial regions is similar. Here, however, there is closer overlap among selective regions (Figure 2 and Table 1): means of peak coordinates are separated by ≤3 mm in the Y and Z axes. This entwinement of representations may allow greater crosstalk between cortical units analyzing different visual categories in ventro-medial relative to posterior-lateral regions. Close overlap notwithstanding, it has proven possible to distinguish these activations at the individual level. For example, Grill-Spector, Knouf, and Kanwisher (2004) distinguished object from face-selective fusiform regions. Schwarzlose et al. (2005) demarcated face and body selectivity in FFA and FBA with high-resolution fMRI (see also Peelen, Wiggett, & Downing, 2006), and they can also be dissociated on the basis of developmental trajectory (Peelen, Glaser, Vuilleumier, & Eliez, 2009).
To varying degrees, the selective regions identified here are found in roughly mirror-symmetrical locations in both the left and right hemispheres (Figure 2 and Table 1). Face perception has long been associated with a right hemisphere bias (Rhodes, 1985), and this is reflected in general fMRI activation strength (e.g., more significant, more selective, or larger right hemisphere activations; e.g., Yovel & Kanwisher, 2004). Body-selective activations are also typically biased for the right hemisphere (e.g., Taylor, Wiggett, & Downing, 2007, 2010; Peelen & Downing, 2005; Downing et al., 2001). Interestingly, laterality of face (FFA) and body (EBA) representations in fMRI is associated with handedness, such that right-handers have right hemisphere biased representations whereas left-handers do not (Willems, Peelen, & Hagoort, 2010). In general, though, we conceive of the posterior-lateral to ventro-medial axis discussed here as orthogonal to the dimension of laterality and we do not discuss laterality further.
POSTERIOR-LATERAL VERSUS VENTRO-MEDIAL REPRESENTATIONS
We next review evidence from fMRI studies of faces, bodies, and objects. Our review is selective, focusing on studies that have the potential to identify commonalities or differences between the posterior-lateral and ventro-medial foci for one or more of these categories. The aim is to reveal general functional properties that hold across categories but that are different across the two “families” of regions.
Invariance across Location
Classically, one hallmark of a high-level visual representation is that it generalizes, at least to some degree, across retinal locations (Biederman & Cooper, 1991; Gross, Rocha-Miranda, & Bender, 1972; but see Kravitz, Kriegeskorte, & Baker, 2010; Kravitz, Vinson, & Baker, 2008). Early visual areas are organized as retinotopic maps and respond primarily to contralateral stimuli. To what extent do the lateral and ventral representations of faces, bodies, and objects encode retinal location, and to what extent are they location invariant?
One of the earliest sources of evidence for the distinction between LO and pFs came from a blocked design fMRI adaptation experiment (Grill-Spector et al., 1999). In that study, when objects were repeated over a block, but changed in size or location from trial to trial, adaptation in pFs was stronger relative to LO. This indicates that the representation of objects in the former region is more invariant to these dimensions than in the latter.
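The logic of such adaptation comparisons can be made concrete with a small sketch. It assumes a simple fractional adaptation index and a ratio that expresses how much adaptation survives a change in size or location; both measures, the function names, and the response values are hypothetical illustrations, not the measures or data reported by Grill-Spector et al. (1999).

```python
def adaptation_index(novel, repeated):
    """Fractional response reduction for repeated relative to novel stimuli.

    0 means no adaptation; 1 means the response to repeats is abolished.
    """
    if novel <= 0:
        raise ValueError("novel response must be positive")
    return (novel - repeated) / novel


def invariance_ratio(novel, repeated_identical, repeated_transformed):
    """Share of the full adaptation effect that survives a transformation.

    Values near 1 suggest the representation generalizes across the
    transformation; values near 0 suggest it does not.
    """
    full = adaptation_index(novel, repeated_identical)
    partial = adaptation_index(novel, repeated_transformed)
    if full <= 0:
        raise ValueError("no adaptation to identical repeats")
    return partial / full


# Hypothetical percent signal change values, not taken from any study.
regions = {
    "more invariant region": dict(
        novel=1.0, repeated_identical=0.5, repeated_transformed=0.6),
    "less invariant region": dict(
        novel=1.0, repeated_identical=0.5, repeated_transformed=0.9),
}

for name, responses in regions.items():
    print(name, round(invariance_ratio(**responses), 2))
```

On this scheme, a region whose adaptation survives changes in size or location yields a ratio closer to 1 than a region that recovers when those dimensions change.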
Object- and face-selective areas show a bias for contralateral over ipsilateral stimuli in terms of response selectivity. Importantly, this bias is stronger for LO and OFA (posterior-lateral regions) than for pFs and FFA (ventro-medial regions) (Hemond, Kanwisher, Op de Beeck, & Ferrari, 2007). Similarly, FFA adapts equally to contralateral and ipsilateral presentations of faces, whereas OFA adapts only when presentations are in the contralateral field (Kovacs, Cziraki, Vidnyanszky, Schweinberger, & Greenlee, 2008). OFA and FFA are also relatively retinotopic in the sense of showing greater selectivity (vs. buildings) for stimuli in or near the fovea (Goesaert & Op de Beeck, 2010; Levy, Hasson, Avidan, Hendler, & Malach, 2001), although this pattern is more pronounced in OFA (Levy et al., 2001, Figure 3), indicating a more retinotopically biased representation in that region.
Finally, many recent fMRI studies have begun to use multivoxel pattern analyses (MVPA) to reveal how patterns of BOLD activity can distinguish among kinds of stimuli and mental states in ways that gross activation levels cannot (Oosterhof, Wiggett, Diedrichsen, Tipper, & Downing, 2010; Peelen & Downing, 2007b; Haynes & Rees, 2006; Kamitani & Tong, 2006; Kriegeskorte, Goebel, & Bandettini, 2006; Norman, Polyn, Detre, & Haxby, 2006; Haxby et al., 2001). A study using this approach has shown that when objects, faces, or bodies are presented in several different retinal locations, more information about the location of the stimulus is revealed by MVPA of activity in the posterior-lateral areas than the ventro-medial areas (Schwarzlose, Swisher, Dang, & Kanwisher, 2008). This finding is consistent with a stronger representation of location in the former region relative to the latter.
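To illustrate the general logic of such pattern analyses, the sketch below decodes stimulus location from simulated voxel patterns with a correlation-based nearest-template classifier. This is an assumption-laden toy: the simulation, the parameter values, and the function names are hypothetical, and it is not the analysis pipeline of Schwarzlose et al. (2008).

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def mean_pattern(patterns):
    """Voxel-wise mean across a list of patterns."""
    return [sum(voxel) / len(voxel) for voxel in zip(*patterns)]


def decode(train, test):
    """Classify each test pattern by its best-correlated training template.

    train and test map a label to a list of voxel patterns.
    Returns the proportion of test patterns classified correctly.
    """
    templates = {label: mean_pattern(p) for label, p in train.items()}
    correct = total = 0
    for label, patterns in test.items():
        for pattern in patterns:
            guess = max(templates, key=lambda k: pearson(pattern, templates[k]))
            correct += guess == label
            total += 1
    return correct / total


def simulate(signal, n_voxels=50, n_trials=20, seed=0):
    """Simulated train/test data: location pattern scaled by signal, plus noise."""
    rng = random.Random(seed)
    prototypes = {
        loc: [rng.gauss(0, 1) for _ in range(n_voxels)]
        for loc in ("left", "right")
    }

    def sample():
        return {
            loc: [[signal * v + rng.gauss(0, 1) for v in proto]
                  for _ in range(n_trials)]
            for loc, proto in prototypes.items()
        }

    return sample(), sample()


for name, signal in (("strong location signal", 1.0),
                     ("weak location signal", 0.1)):
    train, test = simulate(signal)
    print(name, decode(train, test))
```

In this toy setting, a region that carries more location information would be expected to yield higher decoding accuracy, which is the sense in which pattern analyses can reveal information that gross activation levels do not.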
In summary, there is good evidence that, considered across categories, the high-level representations in the posterior-lateral regions are more closely linked to retinal location than those in the ventro-medial regions.
Invariance over Other Visual Dimensions
There is converging evidence that posterior-lateral object representations may be less abstract, and defined to a greater extent by local physical structure, than ventro-medial representations. For example, LO shows more sensitivity to changes in size and viewpoint, relative to pFs, for faces and objects (Grill-Spector et al., 1999). Similarly, LO shows adaptation to repeated presentations of shapes that have the same two-dimensional outline but different three-dimensional structures, whereas pFs distinguishes these, suggesting that integration of three-dimensional shape features is emphasized in the latter region (Kourtzi, Erb, Grodd, & Bulthoff, 2003).
Less is known about cue invariance in the perception of bodies. Cross, Mackie, Wolford, and de C. Hamilton (2010) compared adaptation effects to repeated versus changed whole-body postures depicted across viewpoint changes of approximately 90°. They found, in a whole-brain analysis, view-invariant adaptation effects in a region similar to FBA, but not in EBA. However, in a two-shot rapid adaptation paradigm (Kourtzi & Kanwisher, 2000), Taylor et al. (2010) found comparable levels of view invariance for bodies in EBA and FBA up to 45° rotations in depth. Both areas show selective responses to intact versus scrambled “point light” biological motion animations (Peelen et al., 2006), with somewhat greater selectivity in FBA than EBA—perhaps reflecting a relative bias in the former region for depictions of the whole body over body parts (Taylor et al., 2007).
Wholes and Parts
Lerner, Hendler, Ben-Bashat, Harel, and Malach (2001; see also Grill-Spector et al., 1998) measured the responses of retinotopic areas and LOC to images that had been “scrambled” to varying degrees into component elements. They observed a progressive increase in sensitivity to the amount of scrambling from early retinotopic areas through to LOC. More relevant to the present discussion, pFs, relative to LO, was more sensitive to scrambling, suggesting there is a hierarchy within LOC whereby the more anterior counterpart is more strongly involved in representing larger and more complex shape features. Indeed, when comparing the pattern of findings for cars and for faces, the authors found a similar posterior-to-anterior trend in both cases, leading them to suggest (consistent with the current proposal) a single hierarchical organization for different visual categories.
Lerner, Hendler, and Malach (2002) compared the responses of LO and pFs to images of intact objects, the same objects in which some local features were occluded by parallel bars, and objects in which global features were disrupted by occlusion. Local occlusion caused significant decreases in both LO and pFs, relative to intact stimuli, but only in pFs did global disruption lead to a significant additional decrease, suggesting a stronger representation of the whole figure in that region than in LO.
Drucker and Aguirre (2009) used both fMRI adaptation and MVPA to examine the relationship between neural activity in the subregions of LOC and a shape space defined over simple closed curves. Adaptation revealed a fine-scaled neural space in pFs, but not LO, that mapped onto global shape similarity. Moreover, MVPA results were consistent with a model in which local features of shapes are represented in LO and whole-shape structure is integrated in pFs. The authors also noted a possible correspondence between the representation of parts and wholes in paired object-, body-, and face-selective regions, although faces and bodies were not examined in that study.
Inverting faces disrupts configural representations while leaving the representation of individual parts relatively intact (Yin, 1969). Many studies have examined the response of FFA to face inversion (e.g., Epstein, Higgins, Parker, Aguirre, & Cooperman, 2006; Mazard, Schiltz, & Rossion, 2006; Haxby et al., 1999; Kanwisher, Tong, & Nakayama, 1998). The findings of Yovel and Kanwisher (2005) are particularly relevant for the present discussion: BOLD signal was lower for inverted (versus upright) faces in FFA, but not OFA, and only activity in FFA was positively correlated with behavioral measures of the face inversion effect across subjects. Similarly, both FFA and OFA respond to changes in the spatial relations among the features of upright faces, but only OFA responds to these changes in inverted faces (Rhodes et al., 2009). There is also an inversion effect for perception of bodies (Cook & Duchaine, 2011; Reed, Stone, Grubb, & McGoldrick, 2006; Reed, Stone, Bozova, & Tanaka, 2003). In some tasks, this depends on the presence of the head in the image (Yovel, Pelc, & Lubetzky, 2010) and may be related to neural activity in face-selective rather than body-selective regions (Brandman & Yovel, 2010), but the relative roles of OFA and FFA have not been determined.
Other studies have used different approaches to unpick the role of face-selective areas in part-based versus holistic representations. Schiltz and Rossion (2006) used the “face composite effect” (Young, Hellawell, & Hay, 1987) to examine the integration of face elements. This is the tendency of parts of faces (e.g., the top half) to produce a different subjective percept when aligned with different counterparts (e.g., the bottom half). Repetitions of the same top half of a face produced less adaptation in OFA and FFA when paired with different bottom halves, an effect that was stronger in FFA. Liu et al. (2010) manipulated faces to test for the effects of the presence of real face features (as opposed to geometric placeholders) and of the presence of the typical feature configuration. Although OFA responded more to real face features, its response did not depend on a correct face configuration; in contrast, FFA was sensitive to both feature and configuration. Other evidence (e.g., Betts & Wilson, 2010; Harris & Aguirre, 2008, 2010) similarly favors a model in which configural information is preferentially represented in FFA, but parts are represented in both OFA and FFA.
Taylor et al. (2007) used fMRI to examine the representations in EBA and FBA of whole bodies and individual body parts. In EBA, significant selectivity (vs. nonbody controls) was found for the smallest part tested (a single finger). In contrast, in FBA the response selectivity was not significant for fingers and hands in isolation. This suggests that EBA carries a relatively part-based representation of the body. Accordingly, TMS over EBA amplifies the body inversion effect (Urgesi, Calvo-Merino, Haggard, & Aglioti, 2007), presumably because inversion has disrupted holistic analysis, leaving only a part-based mechanism to perform the task (but see Brandman & Yovel, 2010). Likewise, Chan, Kravitz, Truong, Arizpe, and Baker (2010), using MVPA techniques, found above-chance levels of information about body parts in right EBA but not in FBA.
The broad picture that emerges is that the representational emphasis shifts qualitatively between posterior-lateral and ventro-medial regions. The former are concerned with representation of more local, part-based aspects of their preferred stimuli, whereas the latter, relatively speaking, construct a representation of the whole stimulus from larger-scale assemblies of parts.
Specificity and Selectivity
The proposal that the posterior-lateral family of regions is selective for relatively local and/or part-based visual features implies that these regions may be more strongly engaged by “non-preferred” categories than their ventro-medial counterparts (Lerner et al., 2001). This follows from the logic that where neurons are sensitive to simpler, more locally defined features, these features are more likely to be present in a wider range of stimuli. This prediction has not been extensively tested, but it is borne out for faces, for example, in studies that show weaker selectivity in OFA than in FFA for faces against objects (Andrews & Ewbank, 2004; Halgren et al., 1999) or against bodies (Andrews, Davies-Thompson, Kingstone, & Young, 2010). A similar pattern was found by Gilaie-Dotan and Malach (2007) and Grill-Spector et al. (2004). More recently, Silvanto, Schwarzkopf, Gilaie-Dotan, and Rees (2010) used TMS to show that interference with activity in OFA modulated priming effects on a task requiring perception of simple, symmetrical, two-dimensional nonface shapes, implying a role for this region that is not strictly limited to faces.
For the case of objects more generally, the proposal of differential selectivity in LOC subregions may fit with the findings of Lerner, Epshtein, Ullman, and Malach (2008). These authors assessed the responses of pFs and LO to image fragments that were selected on computational grounds (cf. Ullman, 2007) to be highly diagnostic of object kind. These fragments were compared with image fragments that were randomly selected. Both pFs and LO responded more to informative than random fragments, but this pattern was somewhat stronger in pFs. This suggests that representations in LO are less closely tied to specific object kinds and therefore contribute more broadly than those in pFs to the representation of multiple visual classes.
Note that predictions about relative selectivity between pairs of regions must be considered alongside measurement issues that may arise when examining nearby areas. For example, because of the close overlap between FBA and FFA, there will be, depending on how ROIs are constructed, a relatively strong response to faces in FBA (Schwarzlose et al., 2005). So the selectivity of EBA (for bodies versus faces) could appear to be stronger than in FBA, contrary to the proposal described here. High-resolution techniques, exclusive definition of nonoverlapping ROIs (Schwarzlose et al., 2005) and MVPA (Downing et al., 2007) could help ameliorate this concern. Also, pragmatically, selectivity might be better defined in these situations against categories to which closely neighboring regions do not respond strongly.
Relationship between Neural Activity and the Subjective Percept
Some studies indicate that activity in the ventro-medial regions is more closely related to the subjective percept of a stimulus than to the objective features of the stimulus per se. In its simplest version, this question can be addressed by comparing events in which a conscious percept (e.g., of a face) occurs or does not occur with the stimulus held constant.
Hasson, Hendler, Ben Bashat, and Malach (2001) compared OFA and FFA responses to a variant of Rubin's ambiguous “face–vase” illusion. A face percept elicited stronger responses in both OFA and FFA than a vase percept; however, in ratio terms, the difference between conditions was greater in FFA than in OFA. A later study (Hesselmann, Kell, Eger, & Kleinschmidt, 2008) showed that prestimulus activity in FFA predicted a later percept of a face in the “face–vase” illusion; this effect was not significant for OFA. Similarly, Andrews and Schluppeck (2004; see also Kanwisher et al., 1998; Dolan et al., 1997) measured the responses of face- and object-selective cortical areas as a function of whether participants perceived Mooney (1957) stimuli as faces or not. Only in the FFA was activity related to subjective reports of face perception. (Note, however, that McKeeff & Tong, 2007, found suggestive evidence that activity in both OFA and FFA relates to perceptual decisions about ambiguous Mooney stimuli.) Summerfield, Egner, Mangels, and Hirsch (2006) found that responses in FFA, but not OFA, were elevated on trials in which a heavily degraded house stimulus was incorrectly perceived as a face. Similarly, FFA but not OFA responses are increased when faces are erroneously reported to be present in pure noise stimuli (Zhang et al., 2008).
Using an fMRI-adaptation paradigm, Large, Cavina-Pratesi, Vilis, and Culham (2008) examined the correlates of change detection for faces. Although both OFA and FFA showed adaptation rebound when a change was detected, OFA alone rebounded when a change occurred but went undetected. Complementary findings are reported by Schiltz, Dricot, Goebel, and Rossion (2010). This study examined adaptation to the composite face illusion in OFA and FFA. FFA (but not OFA) recovered from adaptation when upper face parts were perceived as having changed, even when no change was present in the stimulus. Both studies suggest that FFA tracks conscious perception of faces, whereas OFA is more sensitive to physical stimulus properties.
The relationship between percept and neural activity can also be addressed more indirectly by asking whether neural effects measured by BOLD correspond better to the physical or the subjective aspects of stimuli (see Edelman, Grill-Spector, Kushnir, & Malach, 1998, for an important early step in this direction). For example, OFA responds more to pairs of faces taken from a continuum created by “morphing” two faces than to the repeated presentation of a single face from the continuum. In FFA, this release from adaptation is stronger when the two faces straddle either side of the sharp categorical boundary perceived at the “morph” midpoint (Rotshtein, Henson, Treves, Driver, & Dolan, 2005; Harnad, 1990). This suggests that OFA is principally concerned with structural aspects of the face rather than more abstract (and more psychologically relevant) identity distinctions.
Haushofer, Livingstone, and Kanwisher (2008) parameterized the physical similarities among a set of artificial shapes and also behaviorally measured their subjective perceptual similarity. The similarity metrics were then compared with multivoxel activation patterns in both LO and pFs. These measures showed a closer relationship between activation patterns in LO and the physical shape of the stimuli, whereas activation patterns in pFs more closely reflected behavioral assessment of subjective similarity for the same stimuli. In contrast, another recent MVPA study relying on similar logic to the Haushofer et al. (2008) study found a significant correlation with perceived shape similarity in both LO and pFs (Op de Beeck, Torfs, & Wagemans, 2008). Many differences between these studies (e.g., 2-D versus 3-D shapes; different methods of assessing objective and subjective similarity; the nature of the between- and within-category differences in the shapes) make it difficult to draw firm conclusions from this evidence (see Op de Beeck, in press, for a discussion of these studies).
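The general logic of comparing neural activation patterns with physical and perceptual similarity can be sketched as follows. The sketch computes a neural dissimilarity (1 minus the Pearson correlation) for every pair of stimuli and rank-correlates it with each similarity measure. The stimulus names, voxel patterns, and distance values are hypothetical, and the choice of dissimilarity measure and rank correlation is an assumption; this is not the pipeline of Haushofer et al. (2008) or Op de Beeck et al. (2008).

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def ranks(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation."""
    return pearson(ranks(a), ranks(b))


def stimulus_pairs(names):
    """All stimulus pairs in a fixed, sorted order."""
    return list(combinations(sorted(names), 2))


def neural_dissimilarity(patterns):
    """1 - Pearson r between the voxel patterns of each stimulus pair."""
    return [1 - pearson(patterns[a], patterns[b])
            for a, b in stimulus_pairs(patterns)]


# Hypothetical voxel patterns for four shapes in one region of interest.
patterns = {
    "s1": [0.9, 0.1, 0.4, 0.7],
    "s2": [0.8, 0.2, 0.5, 0.6],
    "s3": [0.2, 0.9, 0.6, 0.1],
    "s4": [0.1, 0.8, 0.7, 0.3],
}
# Hypothetical pairwise distances in physical and perceptual shape space.
physical = {("s1", "s2"): 1.0, ("s1", "s3"): 2.0, ("s1", "s4"): 3.0,
            ("s2", "s3"): 1.0, ("s2", "s4"): 2.0, ("s3", "s4"): 1.0}
perceptual = {("s1", "s2"): 0.5, ("s1", "s3"): 3.0, ("s1", "s4"): 3.2,
              ("s2", "s3"): 2.8, ("s2", "s4"): 3.1, ("s3", "s4"): 0.4}

neural = neural_dissimilarity(patterns)
pairs = stimulus_pairs(patterns)
print("physical fit:", spearman(neural, [physical[p] for p in pairs]))
print("perceptual fit:", spearman(neural, [perceptual[p] for p in pairs]))
```

Running the same comparison separately on patterns from LO and from pFs would indicate which similarity measure each region tracks more closely, which is the contrast at issue in the studies above.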
Other efforts to map the relationship between neural and psychological face “spaces” have focused on the role of central prototypes (Loffler, Yourganov, Wilkinson, & Wilson, 2005; Wilson, Loffler, & Wilkinson, 2002; Leopold, O'Toole, Vetter, & Blanz, 2001). More recently, Panis, Wagemans, and Op de Beeck (2011) took this approach in the domain of objects. They found evidence that in pFs, but not in LO, shapes were coded with reference to a central prototype that was experimentally determined by controlling participants' exposure to a novel set of shapes. Specifically, in pFs, responses were lower to stimuli the closer they were to the prototype, suggesting that, for more “distant” stimuli, additional neural activity reflects differences from the prototype. In contrast, there was no effect of distance from prototype in LO.
In summary, there is good evidence from studies of faces and objects to suggest that activity in the ventro-medial region is more likely to be related to participants' subjective percept of a stimulus whereas, in contrast, the posterior-lateral region relates more closely to its physical properties.
Relation to Performance on Detection, Identification, and Discrimination Tasks
A closely related question to those considered in the previous section is how activity in posterior-lateral and ventro-medial regions relates to success or failure on the detection, identification, and discrimination of visual stimuli. To date, several studies have considered the performance/activity relationship in this way, and the evidence for a gross distinction between posterior-lateral and ventro-medial regions is decidedly mixed.
Grill-Spector, Kushnir, and Malach (2000) used a blocked design to examine how recognition of cars, birds, and faces relates to activity in LOC. Both LO and pFs showed roughly equivalent relationships between activity and successful performance, and roughly equivalent effects of extensive training with a subset of the stimuli. However, in a similar but event-related paradigm, Bar et al. (2001) found that activity in pFs was more closely related to successful identification of briefly presented objects than activity in LO. In a study that focused on faces, Grill-Spector et al. (2004) found that activity levels in both OFA and FFA correlated trial-wise to roughly the same extent with successful detection and identification of faces. Nestor, Vettel, and Tarr (2008) measured OFA and FFA responses to face fragments that were selected based on how well they supported behavioral success on either detection or identification tasks. Gross changes in activity were seen in both OFA and FFA for stimuli associated with successful detection, and MVPA revealed patterns in FFA, but not OFA, that distinguished fragments associated with successful identification. And Williams, Dang, and Kanwisher (2007) showed that in an object classification task, between-category information contained in LO (about chairs vs. bottles) correlated with behavioral performance, whereas no category information was found in pFs.
Finally, we note a recent study taking a somewhat different approach, in which activity in object-selective cortex was measured during formation of an attentional set. Stokes, Thompson, Cusack, and Duncan (2009) used MVPA to identify areas in which the preparation to search for a given target shape, in advance of seeing the target itself, modulated population activity in visual cortex. Participants were cued with a tone to search for one of two targets (X or O) hidden in visual noise. After the cue but before the target was presented, patterns in LOC distinguished which of the targets was being prepared by the participant. Thus, the intended target was represented by evoking, top–down, activity in the populations that would represent the targets if they were perceived. There was a positive correlation in pFs, but not in LO, between successful performance on the target task and the attentional biasing of activation patterns. Thus, the neural activity in pFs appeared to be particularly relevant, at least in that task, for setting up a successful target template.
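The decoding step in this kind of design can be sketched with a simple correlation-based classifier. This is an illustrative assumption and not the classifier used by Stokes et al. (2009); the voxel patterns below are invented. Templates stand in for the patterns evoked when each target is actually perceived, and each preparatory (post-cue, pre-target) pattern is assigned to the template with which it correlates most strongly.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def classify(pattern, templates):
    """Return the label of the template most correlated with the pattern."""
    return max(templates, key=lambda label: pearson(pattern, templates[label]))


# Hypothetical 4-voxel templates from trials on which each target was perceived.
templates = {
    "X": [1.0, 0.2, 0.9, 0.1],
    "O": [0.1, 1.0, 0.2, 0.8],
}

# Hypothetical preparatory patterns, each labeled with the cued target.
prep_trials = [
    ("X", [0.8, 0.3, 0.7, 0.2]),
    ("O", [0.2, 0.9, 0.3, 0.6]),
    ("X", [0.6, 0.5, 0.9, 0.3]),
    ("O", [0.4, 0.7, 0.1, 0.9]),
]

correct = sum(classify(pattern, templates) == cued for cued, pattern in prep_trials)
accuracy = correct / len(prep_trials)
print(f"decoding accuracy for preparatory patterns: {accuracy:.2f}")
```

Above-chance accuracy on preparatory patterns would indicate that the cue alone biases activity toward the representation of the expected target. Computing this accuracy per participant and correlating it with task performance would give a brain-behavior measure of the kind reported for pFs.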
All told, although occipito-temporal activity generally does seem to relate to successful performance on detection, categorization, or identification tasks, it does not appear that there is a clear segregation between activity in the posterior-lateral and ventro-medial regions.
Familiarity and Experience
How do the functional characteristics of the posterior-lateral and ventro-medial regions arise, and what is the role of experience in shaping these properties? A focus in the literature has formed around a debate about the role of extensive visual expertise in shaping the response to nonface objects in FFA; we do not attempt to revisit that debate here (see e.g., Harley et al., 2009; McKone, Kanwisher, & Duchaine, 2007; Kanwisher & Yovel, 2006; Xu, 2005; Grill-Spector et al., 2004; Gauthier, Skudlarski, Gore, & Anderson, 2000). There is less evidence on OFA, although it has also been implicated in such expertise (Gauthier, Skudlarski, et al., 2000).
Effects of familiarity and experience can also be examined by considering the difference between novel and familiar examples of a region's “preferred” stimulus class (e.g., Henson, Shallice, & Dolan, 2000). A review of recent studies produces mixed evidence. For faces, for example, Ewbank and Andrews (2008) found a dissociation between OFA and FFA in the adaptation to faces as a function of face familiarity: In FFA, adaptation was found to both unfamiliar and familiar faces (and was view invariant up to 8° rotation for the latter), whereas in OFA, adaptation was found only for unfamiliar faces. In contrast, Andrews et al. (2010) found similar adaptation in these two regions for familiar faces, but weaker adaptation in OFA than FFA for unfamiliar faces. Finally, Davies-Thompson, Gouws, and Andrews (2009) found similar levels of adaptation for familiar and unfamiliar images in both areas.
A similar mixture of evidence is found in studies on body perception. Comparing images of the self (which is presumably highly familiar) to unfamiliar others, Hodzic, Kaas, Muckli, Stirn, and Singer (2009) found no difference in right EBA but an increased response to the self in left EBA and right FBA. In a similar study, no difference was found between self and unfamiliar other in either region (Hodzic, Muckli, Singer, & Stirn, 2009), although Vocks et al. (2010) reported small increases to the self, relative to unfamiliar others, in right EBA and FBA. Note that across these and similar studies, familiarity or identity effects tend to be found only where the distinction is made explicit to participants and/or part of their task, suggestive of attentional effects rather than of familiarity per se (see Downing & Peelen, in press). Finally, intriguingly, EBA (but not FBA) shows a preference for viewing body parts in the “typical” location (e.g., right side of the body in the left visual field), suggesting a bias in favor of commonly experienced configurations (Chan et al., 2010).
The initial reports of object-selective responses in the LOC showed no difference between familiar and novel, meaningless objects (Kanwisher, Woods, Iacoboni, & Mazziotta, 1997; Malach et al., 1995). Studies of training have produced mixed results with regard to posterior-lateral versus ventro-medial representations. Grill-Spector et al. (2000) scanned participants before and after training on a task requiring recognition of briefly presented, masked familiar objects and unfamiliar faces. Signal changes in both LO and pFs were amplified to a roughly similar degree by training. A similar equivalence between subregions was found by Gillebert, Op de Beeck, Panis, and Wagemans (2009), who trained participants to perform two different tasks on outlines of already-familiar objects. Op de Beeck, Baker, DiCarlo, and Kanwisher (2006) used high-resolution fMRI to scan subjects while they viewed novel object classes. Subjects were scanned before and after training to discriminate between exemplars within one of these object classes. The training index was significantly larger in LO than in pFs. In addition, the training effect in LO, but not in pFs, was significantly correlated across subjects with the behavioral improvement subjects showed during training.
Overall, then, although there is evidence for changes in posterior-lateral and ventro-medial cortices in response to increasing familiarity with faces, bodies, and objects, we do not find a clear pattern favoring modulation in one or the other of these broad areas.
Timescales of Adaptation
Many of the results described above rely on the logic of fMRI adaptation (also referred to as repetition suppression) to infer the properties of extrastriate regions (Grill-Spector & Malach, 2001; Grill-Spector et al., 2001; Henson et al., 2000; Buckner et al., 1998; but see Sawamura, Orban, & Vogels, 2006). Some recent findings indicate that adaptation effects may be different across different time scales and further, that these effects may not be the same for posterior-lateral and ventro-medial regions. For example, Fang, Murray, and He (2007) found different patterns of invariance to changing views of faces with short (300 msec) or long (25 sec) adapting stimuli: Long adapters elicited increasingly large responses from increasingly large viewpoint changes, whereas short adapters did not. Furthermore, in OFA (unlike FFA) long-duration adapters did not produce adaptation at 30° rotations, which the authors attributed to a part-based representation in that region (see above). Likewise, Kovacs et al. (2008) found differential effects of adaptation duration to faces: FFA effects were independent of the duration of the adapting stimulus (500 msec or 4500 msec), whereas in OFA effects were seen only at the longer duration; and adaptation effects were generally stronger in FFA. Similarly, Betts and Wilson (2010) found a generally greater level of adaptation in FFA than in OFA with relatively long-lasting adapters (5000 msec). Note that hemodynamic response nonlinearities are roughly equivalent in OFA and FFA, suggesting that differences in adaptation between these regions are not because of differences in extended poststimulus neural activity in the two areas (Mukamel, 2004).
The disparate methodologies employed in these studies and the focus on faces make it difficult to draw strong conclusions about the relationship between anatomy and duration effects on adaptation. However, the evidence suggests that there are differences in both the adapting timescale and the poststimulus duration of adaptation after-effects between posterior-lateral and ventro-medial subregions, and these should be explored further.
Connectivity and Processing Architecture
The above considerations suggest a simple model in which more elemental, stimulus-based information about faces, bodies, and objects is assembled in posterior-lateral regions, and this information is then passed on to the anterior ventro-medial areas. To examine connectivity among face-selective brain areas, Fairhall and Ishai (2007) used dynamic causal modeling (DCM; Friston, Harrison, & Penny, 2003). Comparing across a family of possible connectivity models, their evidence favored a unidirectional influence of OFA on FFA (see also Pitcher et al., 2011). Similarly, Rotshtein, Vuilleumier, Winston, Driver, and Dolan (2007) found supporting evidence in connectivity analyses for projection of several occipital areas (likely including OFA) onto FFA. However, Rossion (2008) has argued from a combination of fMRI and neuropsychological evidence (e.g., Steeves et al., 2006; Rossion et al., 2003) for a contrasting model, in which FFA receives direct input from early visual areas to represent the whole face, and OFA responses are subsequently refined by mutual interactions with FFA.
Although the focus has been on faces, a recent report used DCM to examine the functional connectivity between EBA and FBA (Ewbank et al., 2011). Using repetition suppression as an index, the authors found evidence for qualitatively different directional influences: the “bottom–up” EBA to FBA connection was modulated only by exact stimulus repetition, whereas the “top–down” connection showed more complex properties and was modulated by repetitions across changes in size and viewpoint.
A related question is whether nonvisual areas (e.g., in pFC) differentially influence posterior-lateral or ventro-medial areas: Is one set of areas more subject to external influences? Summerfield et al. (2006) required participants to adopt a mental set to detect, among distractor items, either face or house targets. Although stimulus properties were held constant between the two conditions, DCM showed that adopting a face set increased the influence of ventral medial frontal cortex on FFA, but not on OFA. In contrast, however, Li et al. (2010) found modulation of OFA, but not FFA, by OFC in a task that measured observers' false reports of the presence of faces in visual noise.
At present, the evidence base on the connectivity architecture for occipito-temporal regions is still quite sparse and mainly focused on faces. It is too early to determine whether posterior-lateral and ventro-medial regions participate in different ways in functional brain networks, but the existing studies suggest approaches to studying these systems more broadly.
ASSESSMENT AND SYNTHESIS
The above evidence supports a general distinction between two broad occipito-temporal areas that are selective for faces, bodies, and object form. In this framework, the posterior-lateral areas, relative to the ventro-medial areas, perform different analyses of their preferred stimuli. With reasonably high confidence, we can argue that, relatively speaking, the posterior-lateral representations are based more on local features of the stimulus; are more spatially confined to the contralateral hemifield and generally more retinotopic; are less invariant to differences in location, lighting, viewpoint, and size; and show weaker selectivity for preferred versus nonpreferred stimuli. In contrast, ventro-medial representations are more strongly tuned to global and configural properties, are more invariant to location and other visual properties, and show stronger selectivity for preferred stimuli.
We further propose that activity in the posterior-lateral regions more closely reflects the stimulus itself rather than the subjective percept that it produces whereas ventro-medial regions show the opposite pattern. Although several lines of evidence support this idea, the related idea that there is a preferential role for ventro-medial over posterior-lateral areas in constructing representations that are directly related to behavioral performance (i.e., in detection or identification tasks) has mixed evidence. Note that previous studies on this behavior–activity relationship have sought to detect correlations between accuracy and activity in general. In the main, however, they have not formulated specific hypotheses in advance about how ventro-medial and posterior-lateral regions may differ in their representations, hence their ability to support behavior on particular overt behavioral tasks. Our prediction is that activity in posterior-lateral and/or ventro-medial regions (equally for faces, bodies, or objects) will relate to success on behavioral tasks when the representational demands created by the specific combination of task and stimulus match the basic functional properties of either or both of these areas.
A similar understanding arises when considering the influences of familiarity and experience on the properties of these regions. The present framework does not make strong predictions that either ventro-medial or posterior-lateral regions should preferentially show the effects of experience on their preferred stimuli. Rather, the locus of changes will depend on the nature of the representations in a given area and how well matched they are to the requirements of the to-be-learned task, which may not necessarily be organized focally nor exclusively found in the regions that respond maximally to a stimulus category (Kourtzi, 2010; Op de Beeck & Baker, 2010). Note also that any studies of familiarity, particularly involving the use of images of the participants' own faces or bodies, may be influenced by variations in attentional salience of the different conditions (Downing & Peelen, in press).
The great majority of the evidence reviewed above comes from fMRI studies that treat the regions in question as unitary, homogenous entities, and we have followed that convention. However, there are recent reports that further fractionate some of them into subregions. To give some examples: Bracci, Ietswaart, Peelen, and Cavina-Pratesi (2010) report an area that is adjacent to, but distinct from, EBA, that responds selectively to human hands, compared with human feet, other body parts, and objects. Likewise, Weiner and Grill-Spector (2011) used high-resolution fMRI to show a reliable substructure of three areas within EBA (see also Weiner & Grill-Spector, 2010). Larsson and Heeger (2006) identified retinotopic maps of the contralateral field that overlapped with the LOC. And finally, Eger, Kell, and Kleinschmidt (2008) found gradients within each subregion of LOC, across which sensitivity to image size changed gradually along a posterior–anterior axis.
Although each of those results has been cast in different ways—for example, Bracci et al. (2010) describe their findings as a possible new region, Weiner & Grill-Spector (2011) as fractionation of an existing area, and Eger et al. (2008) as gradients within existing regions—these findings all represent further steps in ongoing efforts to “deblur” our picture of category-selectivity in the human ventral stream. Increasingly fine-grained measurements (interpreted in parallel with fMRI-guided single-unit studies; e.g., Tsao, Freiwald, Tootell, & Livingstone, 2006) will likely reveal more details about the substructure of the gross posterior-lateral and ventro-medial regions that we focus on here. Our prediction is that this finer-grained picture will not make obsolete the present framework but rather refine it and make it more precise. That is, although gradients or subregions may be identified within the body-, face-, and object-selective areas, they will generally conform to the general properties identified here, depending on their location on the posterior-lateral/ventro-medial axis.
We speculate that some of the familial similarities among face, body, and object representations may be explained by positing that the category-specific foci arise as specialized “branches” off of a generic object–vision system. The response properties of LOC suggest underlying neural populations that are suited to representing the complex conjunctions of simple visual features that can, in combination, represent a variety of forms (and can generalize across viewpoint, size, etc.). Among the space of object shapes that human observers may need to represent, the shapes of bodies (and body parts), and faces, have several unique properties. To name a few: they are highly familiar; they are significant sources of socially relevant information; the shapes they take are highly consistent across exemplars; and they are articulated in complex ways allowing specific characteristic patterns of movement. Furthermore, it may be that other brain networks make differential use of information about faces, bodies, and objects, leading to different patterns of interconnectivity with the occipito-temporal regions, which may influence their spatial organization (Mahon & Caramazza, 2011; Martin, 2007). Any or all of these properties and influences could have the effect of leading the representations of these kinds to specialize within the general object-vision system, resulting in the adult pattern of adjacent, highly focal islands of selectivity. New studies of developing occipito-temporal organization in children (e.g., Peelen et al., 2009; Pelphrey, Lopez, & Morris, 2009; Golarai et al., 2007) may shed some light on how this pattern arises.
OPPORTUNITIES AND OPEN QUESTIONS
To date, the bulk of relevant evidence for the present discussion is on faces, with somewhat less for objects and far less for bodies. Experimental paradigms that have been tested on one stimulus category could in many cases simply be adapted to the others. Furthermore, many studies that could potentially reveal differences between posterior-lateral and ventro-medial subregions were conducted only on one of these regions. To give just one example, Yin, Shimojo, Moore, and Engel (2002) showed that the activity produced in pFs by line drawings of objects was reduced, in line with behavioral performance, when the same objects were viewed anorthoscopically (as if seen through a narrow slit). Data from LO were not reported but could be revealing about the relative role of LO and pFs in shape integration and the relationship of neural activity to the subjective perception of whole objects.
Beyond applying previously developed protocols to new categories and more complete inclusion of ROIs, some questions and techniques seem to us particularly promising for new work. First, although fMRI adaptation designs have contributed greatly to the evidence accrued above, there appears to be further untapped potential in using different timescales of adaptation (e.g., short vs. long adapters; short lag vs. long lag adaptation) as a way of unpicking the properties of posterior-lateral and ventro-medial regions. Second, more MVPA studies that relate patterns of local activity to performance on a variety of tasks and that compare the neural, subjective, and objective “spaces” that represent visual kinds will be valuable. Finally, the evidence on areal interconnections in human occipito-temporal areas, and their projections to other networks, remains indirect and contradictory. Connectivity analyses, whether functional (e.g., using DCM or TMS in combination with fMRI) or anatomical (e.g., fiber reconstruction based on diffusion tensor imaging; Thomas et al., 2009), will add greatly to the current poor understanding of how these areas interact with each other and with other brain regions.
How far does the proposed two-part representational scheme extend? Notably, many studies of occipito-temporal cortex have revealed strong and selective responses to visual depictions of places, including scenes and buildings (Aguirre, Zarahn, & D'Esposito, 1998; Epstein & Kanwisher, 1998). These have some of the same properties (e.g., strong focal responses; modulation by attention) as the areas reviewed here and are often studied together with the same methods (e.g., Levy et al., 2001). However, the spatial layouts that are characteristic of places are different in many respects from objects (including faces and bodies), and neural analysis of these properties supports different kinds of tasks such as navigation and spatial orienting (Epstein, 2008). It is on these grounds that we have not attempted to integrate scene perception and associated focal cortical areas into the present framework. Nonetheless, when considering the organization of scene/place-selective focal brain regions, it is notable that two regions are routinely implicated, with a more ventral, anterior aspect (parahippocampal place area) and a more posterior (although also medial) aspect (retrosplenial cortex). Several studies implicate the former of these in view-specific analysis, and the latter in integration of views for representation of the environment more broadly (Park & Chun, 2009; Epstein, 2008). Whether this division of labor has parallels to the scheme for objects, faces, and bodies developed here could be explored further.
The evidence reviewed here only includes studies that test the visual (or visual imagery) responses of the occipito-temporal cortex. Recent evidence, however, implicates at least some part of these broad regions in tactile processing. For example, Amedi, Malach, Hendler, Peled, and Zohary (2001; see also James et al., 2002) report that a small portion of the LOC also responds selectively to the haptic exploration of objects (compared with textures). More generally, research with blind participants (e.g., Mahon, Anzellotti, Schwarzbach, Zampini, & Caramazza, 2009) raises the possibility that some aspects of occipito-temporal organization are modality-general and arise even in the absence of visual stimulation. How well the present scheme will accommodate the findings of tactile selectivity remains to be seen. One recent finding is promising: Costantini, Urgesi, Galati, Romani, and Aglioti (2011) found selectivity for tactile exploration of model body parts in EBA, but not FBA, consistent with the part-based representation for the former region proposed above (see also Kitada, Johnsrude, Kochiyama, & Lederman, 2009).
In this article, we have developed a proposal for the organization of part of the occipito-temporal cortex, in which highly selective, focal regions fall into two families, each of which shares some general properties across their specific category-selective response. Posterior-lateral regions, relative to ventro-medial regions, are more primitive in their representations and tied more directly to the stimulus than to the subjective percept. It is likely that the patterns of response and selectivity seen in the normal, adult extrastriate cortex reflect the combined operation of many forces and constraints, operating on both evolutionary and developmental time frames (Aflalo & Graziano, 2011). Visual perception is a complex, multidimensional problem, solved by a system that occupies a two-dimensional cortical sheet. Accordingly, when examined from different conceptual viewpoints, instantiated in neuroimaging experiments with specific stimuli and contrasts, different broad patterns will emerge—and these are reflected in the numerous attempts that have been made by researchers to explain these patterns (e.g., Bell, Hadj-Bouziane, Frihauf, Tootell, & Ungerleider, 2009; Kriegeskorte et al., 2008; Op de Beeck, Haushofer, & Kanwisher, 2008; Hasson, Harel, Levy, & Malach, 2003; Beauchamp, Lee, Haxby, & Martin, 2002; Malach, Levy, & Hasson, 2002; Haxby et al., 2001). It may be inappropriate to conceive of one scheme as correct and the others as incorrect, and better instead to view each of these as a necessarily simplified, and perhaps slightly myopic, view of a complex underlying truth. We add the present proposal to the literature, without attempting to argue that it replaces or excludes preceding schemes. Our main hope is that it will provoke and make predictions for many new studies that directly compare faces, bodies, and other objects to test the model outlined here and ultimately to better identify the principles underlying their cortical representation.
Reprint requests should be sent to Paul E. Downing, Brigantia Building, School of Psychology, Bangor University, Bangor, Gwynedd LL57 2AS, UK, or via e-mail: firstname.lastname@example.org.