Abstract

Ventral occipito-temporal cortex is known to play a major role in visual object recognition. Still unknown is whether object familiarity and semantic domain are critical factors in its functional organization. Most models assume a functional locus where exemplars of familiar categories are represented: the structural description system. On the assumption that familiarity should modulate the effect of visual noise on form recognition, we attempted to isolate the structural description system by scanning healthy subjects while they looked at familiar (living and nonliving things) and novel 3-D objects, with either increasing or decreasing visual noise. Familiarity modulated the visual noise effect (particularly when familiar items were living things), revealing a substrate for the structural description system in right occipito-temporal cortex. These regions also responded preferentially to living as compared to nonliving items. Overall, these results suggest that living items are particularly reliant on the structural description system.

INTRODUCTION

In the past two decades, functional neuroimaging has greatly improved our insight into the functional organization of high-level visual processing in the human ventral visual stream (Goodale, Milner, Jakobson, & Carey, 1991). In particular, PET and fMRI studies revealed category-selective regions for faces (fusiform face area, FFA; Kanwisher, McDermott, & Chun, 1997), body parts (extrastriate body area, EBA; Downing, Jiang, Shuman, & Kanwisher, 2001), and scenes (parahippocampal place area, PPA; Epstein & Kanwisher, 1998), partially overlapping with an object-selective area, the lateral occipital complex (LOC), which responds more to 3-D shapes than to textures or scrambled images (Malach et al., 1995). More recently, Downing, Chan, Peelen, Dodds, and Kanwisher (2006), in an fMRI study using stimuli drawn from 20 different object categories, confirmed the presence of category-selective regions for faces, scenes, and body parts in the already identified regions (FFA, PPA, and EBA, respectively); moreover, when stimuli were collapsed across nonliving categories (e.g., clothes, chairs) and contrasted with stimuli drawn from living categories (e.g., mammals, fruit), the nonliving domain yielded an activation pattern in the medial aspects of the ventral stream, which partially overlapped the PPA, whereas the living domain activation pattern partially overlapped the FFA and the EBA on the lateral aspects of the ventral stream. Further evidence for a medial versus lateral preferential activation of the ventral stream, for nonliving and living things, respectively, has already been provided in neuroimaging studies (Zannino et al., 2010; Chao, Weisberg, & Martin, 2002; Whatmough, Chertkow, Murtha, & Hanratty, 2002; Chao, Haxby, & Martin, 1999; see Gerlach, 2007 and Joseph, 2001, for reviews), but there is still debate on the consistency of this finding and on its interpretation.
In particular, the issue of whether category/domain selectivity reflects a modular organization of human vision or can be more parsimoniously accounted for by reductionist hypotheses is still unresolved (Op de Beeck, Haushofer, & Kanwisher, 2008). In the first case, the human brain is assumed to comprise different substrates, each carrying out the same process (i.e., visual recognition) on a different kind of stimulus (e.g., faces, body parts, places, and possibly also living and nonliving categories other than those already mentioned). By contrast, in a reductionist view (Chao et al., 2002), selectivity is an epiphenomenon, due to the unbalanced cross-category distribution of properties that are lower in level than category membership.

A related issue in the investigation of the functional architecture of visual recognition concerns the actual existence and the nature of the structural description system which, according to most models of visual processing (Humphreys & Forde, 2001; Humphreys, Riddoch, & Quinlan, 1988), stores long-term memories of what exemplars of known categories look like. According to these models, the structural description system represents the highest level in the visual processing stream, where incoming percepts are matched to structural representations before accessing the semantic system. The existence of a structural description system received empirical support from a few single-case reports of agnosic patients who were impaired at providing semantic information about visually perceived objects but were still able to perform an object decision task (see, for example, Carlesimo, Casadio, Sabbadini, & Caltagirone, 1998; Hillis & Caramazza, 1995; Riddoch & Humphreys, 1987). In object decision tasks, subjects have to select a figure depicting a familiar object among foils depicting chimeras composed by arranging parts of real objects (e.g., the body of a dog with the head of a cow). Agnosic patients who succeed in this task have been interpreted as having a deficit in accessing semantics from a spared structural description system. Although many researchers recognize the usefulness of a structural description system in the functional architecture of vision, neuropsychological investigations, owing to the scarcity of suitable patients, have not yielded a widely agreed localization for it (Hovius, Kellenbach, Graham, Hodges, & Patterson, 2003). Functional neuroimaging studies, for their part, have so far provided only scarce and inconsistent evidence on this topic.
Generally, in functional neuroimaging studies, the distinction between familiar objects (which are expected to be represented in the structural description system) and chimeras (which are expected to lack a structural representation) has been operationalized in terms of familiar objects and unfamiliar, novel 3-D shapes. Significantly higher activation levels for familiar than for novel 3-D objects have been reported throughout the ventral stream, spanning from striate cortex to the parahippocampal and fusiform gyri (Vuilleumier, Henson, Driver, & Dolan, 2002; Martin, Wiggs, Ungerleider, & Haxby, 1996). However, in the face of such positive findings, the alternative hypothesis that they rely on lower-level differences in the visual appearance of familiar and novel objects, rather than on familiarity per se, cannot be ruled out. To circumvent this confound, early functional imaging studies attempted to localize the structural description system by contrasting activation patterns to familiar and to novel 3-D objects within the LOC, that is, within a high-level visual region proven to be selective for processing items with a clear shape interpretation (Kanwisher, Woods, Iacoboni, & Mazziotta, 1997; Malach et al., 1995). The bulk of evidence, however, was inconsistent with this hypothesis. Indeed, when familiar and novel objects were contrasted with scrambled objects in this region, largely overlapping activation patterns were found. These negative results have been generally interpreted as arguing for a role of the LOC in shape analysis irrespective of familiarity (Grill-Spector, Kourtzi, & Kanwisher, 2001). However, a few functional studies, which successfully circumvented low-level confounds using priming paradigms, partially argued against this conclusion (Vuilleumier et al., 2002; Bar et al., 2001; Grill-Spector, Kushnir, Hendler, & Malach, 2000).

In the present work, we approach these open issues by testing the hypothesis that domain specificity reflects (over and above the effects of possible lower-level factors) a disproportionate reliance of living things (as compared to nonliving things) on the structural description system. As suggested by Chao et al. (2002), domain selectivity might be due to the unbalanced cross-domain distribution of several lower-level factors. These factors are not necessarily visual features (as proposed by those authors), as they could also be processing strategies (Farah, 1997). Here we suggest that a recognition strategy based on a match between an incoming percept and a stored structural description could be particularly suitable for the recognition of living things because, as a rule (see Price & Humphreys, 1989), exemplars of living categories (e.g., two individual ducks) look much more like one another than exemplars of manmade categories (e.g., two individual houses). Conversely, because exemplars of nonliving categories may differ substantially in their appearance, recognition of artifacts is likely to rely more heavily on an analytic processing strategy. Indeed, when reference to a prototype is a less straightforward route to recognition, considering a single part may be more helpful for making inferences about the function and identity of the whole object; moreover, considering single surface features, such as texture, may help the identification process, allowing inferences about the material an object is made of (Cavina-Pratesi, Kentridge, Heywood, & Milner, 2009).

Our hypothesis makes a clear-cut prediction regarding the topographical relationship between domain-selective activation patterns and a locus for the structural description system. In our view, the substrate of the structural description system should be located within regions showing selectivity for living things. This follows directly from the assumption that a preferential response to living things as compared to artifacts depends on a disproportionate involvement of the structural description system in the recognition of living items. Conversely, given the assumption that the ventral stream is primarily organized by domains, one would expect that neural activity arising from the match between an incoming percept and the corresponding stored structural description should be located in the living-sensitive or in the nonliving-sensitive substrate, depending on the domain to which the to-be-recognized item belongs. In fact, a truly domain-specific organization predicts that distinct sets of structural representations (comprising either living or nonliving categories) are stored separately in domain-specific substrates. It is worth noting that a disproportionate reliance of living (as compared to nonliving) things on the structural description system cannot account for the entire cross-domain differential activation pattern. Indeed, only a subregion of the living-specific activation pattern may result from a disproportionate reliance on the structural description system. By contrast, other living-specific regions, as well as the entire nonliving-specific activation pattern, need to be accounted for by other mechanisms, including possible cross-domain differences in the distribution of low-level visual features.

In order to test our hypothesis, we had to address several points. First, we had to localize the structural description system; that is, we had to identify a substrate preferentially responding to familiar objects, circumventing the confound of low-level features. Second, we had to replicate the (controversial) finding of a differential activation pattern across domains. Finally, we had to demonstrate the predicted topographical relationship between the structural description system and the activation patterns driven by living things. With these aims in mind, we devised a new neuroimaging paradigm. Black-and-white photographs of living things, artifacts, and novel 3-D shapes (abstract sculptures) were shown in two conditions: scrambling versus unscrambling. In the scrambling condition, full pictures were progressively degraded by randomly replacing an increasing number of blocks; in the unscrambling condition, each scrambling sequence was shown in reverse order. Subjects were instructed to press a key as soon as they were able/no longer able (in the unscrambling and scrambling conditions, respectively) to perceive the 3-D shape of the depicted object. We reasoned that familiarity should modulate the effect of the viewing condition, leading to a Familiarity (familiar vs. novel objects) by Condition (scrambling vs. unscrambling) interaction in regions that are sensitive to the former factor. As we argued above, the sole finding of a main effect of familiarity does not allow us to reliably identify familiarity-sensitive regions because familiar and novel objects may differ at multiple levels of the visual processing stream.
By contrast, if we have a hypothesis on how familiarity should interact with another factor and we find evidence for the predicted interaction pattern in a particular locus, then it is very unlikely that this locus is not directly sensitive to familiarity but instead responds to some correlated lower-level factor (unless, of course, there is a good candidate factor that correlates with familiarity and can also modulate the viewing condition effect in the predicted way). In particular, we expected that, in the scrambling condition, shape representations of familiar objects should be more resilient against progressively increasing noise than representations of novel objects, and that this could lead to higher activation in familiarity-sensitive regions for familiar than for novel objects. On the other hand, we expected the unscrambling condition to be, overall, the more demanding viewing condition because competing shape representations need to be discarded as visual noise progressively decreases. We expected the modulatory effect of familiarity on the unscrambling > scrambling effect to produce a smaller increase of activity for familiar objects (as compared to novel objects) in the unscrambling (as compared to the scrambling) condition, as familiar shapes are easier to disambiguate, leading to less activity arising from concurrent shape representations.

As for domain selectivity, because both living things and artifacts are presented, this paradigm allows us to attempt to replicate the previously reported differential activation for living things and artifacts in the medial and lateral aspects of the ventral stream. Furthermore, it allows us to pursue our final and more interesting goal, that is, to test the hypothesis that some aspects of the living versus nonliving differential activation patterns depend on an unbalanced reliance upon the structural description system. In this view, taking the factor domain into account, more fine-grained predictions can be made. First, the Familiarity × Viewing condition interaction should be more evident in contrasts involving living things and novel objects across scrambling and unscrambling conditions than in contrasts involving nonliving things and novel objects across viewing conditions. Second, the location of the Familiarity × Viewing condition interaction should lie within cortical regions showing a main effect of living things over artifacts. Third, the site of the Familiarity × Viewing condition interaction should be the same irrespective of whether the familiar items involved in the interaction effect are drawn from the living or nonliving domain. This latter prediction needs some further qualification. In our view, nonliving things place smaller demands on the structural description system; however, to the extent that this system is shared by all the objects to be recognized, its activation during recognition of nonliving things should be colocalized with that driven by the processing of living things.

METHODS

Subjects

Thirteen right-handed (according to the Edinburgh Inventory; Oldfield, 1971) subjects (mean age = 24 years, range = 20–30 years, 2 men) participated in the fMRI study. All subjects had normal or corrected-to-normal vision, had no history of neurological or psychiatric disease, and were not taking vasoactive or psychotropic medication. All gave their written informed consent. The study was conducted in accordance with the Declaration of Helsinki and was approved by the independent Ethics Committee of the Fondazione Santa Lucia (Scientific Institute for Research, Hospitalisation and Health Care).

Stimuli

Stimulus presentation was controlled by a script using Cogent software (Cogent 2000, Functional Imaging Laboratory, Wellcome Department of Imaging Neuroscience, UCL, London) within the MATLAB environment. Participants lay in the scanner in a dimly lit environment and were asked to view, via a mirror system, stimuli shown by a DLP projector on a screen behind the head coil. Overall, 34 animals and 34 vehicles, matched for familiarity (a variable expressing how usual a real item is in one's realm of experience), were selected from two normative databases (McRae, Cree, Seidenberg, & McNorgan, 2005; Dell'Acqua, Lotto, & Job, 2000). An equal number of living things and artifacts were drawn from each corpus. Living things and artifacts were also matched for prototypicality (Boccardi & Cappa, 1997), a variable expressing the goodness of membership of an item (e.g., lion) for a given category (e.g., animals). For each of the 68 items, six black-and-white photographs (JPEG format), each depicting a different exemplar of the corresponding concept (e.g., 6 individual lions), were selected. Overall, these 408 pictures constituted the familiar items of our experiment. As novel items, that is, items not depicting exemplars of known categories, 240 black-and-white photographs of abstract sculptures were used. Photographs were shown at the center of the screen with a height ranging between 3.7° and 7.8° and a width ranging between 5.1° and 10.1°. Mean dimensions of the photographs were balanced across the three categories [F(2, 645) = 1.19, p = .304].

fMRI Experiment

The paradigm used in the present experiment was aimed at investigating the neural substrate of the structural description system. A main effect of familiarity, by itself, is not conclusive for identifying familiarity-sensitive regions because various differences between familiar and novel objects may occur at multiple levels of the visual processing stream. One way to circumvent this problem was to look for regions where the effect of increasing/decreasing visual noise was modulated by familiarity.

In the present study, we manipulated two experimental factors orthogonally in a 2 × 2 factorial design: familiarity, using pictures of familiar items (FAM) and novel items (NOVEL), and viewing condition, using scrambling (scr) and unscrambling (uns) sequences of the same images. In the scrambling condition, visual noise progressively increased. This was obtained by presenting 11 consecutive images of the same picture (each lasting 300 msec); across the images of a sequence, a linearly increasing number of blocks (10% per image) of 0.2° × 0.2° was randomly exchanged in position (Figure 1). In the unscrambling condition, each scrambling sequence was shown in reverse order.
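
The construction of a scrambling sequence described above can be illustrated with a minimal sketch. This is not the Cogent/MATLAB script used in the experiment; the function names are hypothetical, the picture is reduced to a flat list of block labels, and two assumptions are made that the text does not state: that the 11 frames run from 0% to 100% of exchanged blocks, and that the set of exchanged blocks grows cumulatively from frame to frame.

```python
import random

FRAMES_PER_TRIAL = 11   # from the text: 11 consecutive images per sequence
FRAME_MS = 300          # from the text: each image lasts 300 msec


def scrambling_sequence(blocks, rng):
    """Return 11 frames in which 0%, 10%, ..., 100% of blocks swap places.

    `blocks` is a flat list standing in for the 0.2 x 0.2 deg image blocks.
    Assumption: blocks are recruited for exchange in one fixed random order,
    so the exchanged set of each frame contains that of the previous frame.
    """
    n = len(blocks)
    order = rng.sample(range(n), n)          # random recruitment order
    frames = []
    for i in range(FRAMES_PER_TRIAL):
        k = round(n * i / (FRAMES_PER_TRIAL - 1))
        chosen = order[:k]                   # positions taking part in the exchange
        targets = chosen[:]
        rng.shuffle(targets)                 # a block may land back where it started
        frame = list(blocks)
        for src, dst in zip(chosen, targets):
            frame[dst] = blocks[src]
        frames.append(frame)
    return frames


def unscrambling_sequence(blocks, rng):
    """The unscrambling condition shows the same frames in reverse order."""
    return scrambling_sequence(blocks, rng)[::-1]


if __name__ == "__main__":
    picture = list(range(100))               # hypothetical 100-block picture
    frames = scrambling_sequence(picture, random.Random(0))
    print(len(frames), "frames,", len(frames) * FRAME_MS, "msec per sequence")
```

Because every frame is a permutation of the same blocks, low-level content such as mean luminance is identical across frames in this sketch; only the spatial arrangement changes.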

Figure 1. 

Schematic representation of a sequence of images in a trial of the scrambling condition (A) and the unscrambling condition (B). Each image represents a single frame of a sequence of 11 images, each one lasting 300 msec. The percentage reported refers to the relative quantity of squares exchanged in each frame.

Because we were also interested in detecting any differential brain activity between living (L) and nonliving (NL) things, the familiar items of the experiment were living and nonliving things. To test for domain specificity, we used a further experimental design with a single domain factor with two levels: living and nonliving things. Finally, to check for a disproportionate reliance of living things on the structural description system, we used two separate 2 × 2 factorial designs crossing familiarity with viewing condition, with living or nonliving items considered as familiar items and novel items as unfamiliar items.

The experimental design comprised a total of six different trial types: living, nonliving, and novel objects, each in the scrambling and unscrambling viewing conditions, presented in each scanning session. Subjects were investigated in six consecutive scanning sessions, each lasting about 8.5 min, for a total of about 52 min of scanning. Each trial lasted for a total of 4810 msec (a sequence of images presented for 3300 msec and an intertrial interval of 1510 msec). Subjects were instructed to fixate their gaze on a permanent central fixation point (a red cross of about 0.5° × 0.5°) corresponding to the center of each image of the sequences. They were asked to press a key of a response box with their right forefinger: in the scrambling condition, when they were no longer able to perceive the 3-D form of the image they were observing; in the unscrambling condition, when they became able to perceive the 3-D form of the image they were observing. Key pressing time was recorded. Each session comprised 108 trials, each one showing an image of (i) a different exemplar of the 34 living concepts, (ii) a different exemplar of the 34 nonliving concepts, and (iii) 40 novel 3-D sculptures. Images were presented in random order in an event-related design. Viewing condition (scrambling/unscrambling) was balanced across sessions between the images of exemplars of the same living and nonliving concept, and within each session. Prior to the fMRI experiment, subjects were trained on the task in a practice session consisting of 18 trials, which was administered outside the scanner in a psychophysics room. Subjects were informed that the purpose of the experiment was to study the neural correlates of visual form processing.
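
As a quick consistency check on the design figures given above, the following sketch recomputes the trial and picture counts and the time taken up by trials. All input numbers come from the text; the variable names are ours, and the computed durations cover trials only (any additional time at the start or end of a session is not specified and is not modeled).

```python
# All constants below are taken from the Methods text.
SESSIONS = 6
LIVING_CONCEPTS = 34
NONLIVING_CONCEPTS = 34
NOVEL_PER_SESSION = 40
STIMULUS_MS = 3300       # 11 frames x 300 msec
INTERTRIAL_MS = 1510

# One exemplar per concept per session, plus the novel sculptures.
trials_per_session = LIVING_CONCEPTS + NONLIVING_CONCEPTS + NOVEL_PER_SESSION
trial_ms = STIMULUS_MS + INTERTRIAL_MS

# Six exemplars per familiar concept, one shown in each of the six sessions.
familiar_pictures = (LIVING_CONCEPTS + NONLIVING_CONCEPTS) * SESSIONS
novel_pictures = NOVEL_PER_SESSION * SESSIONS

session_min = trials_per_session * trial_ms / 60000
total_min = session_min * SESSIONS

print("trials per session:", trials_per_session)
print("trial duration (msec):", trial_ms)
print("familiar / novel pictures:", familiar_pictures, "/", novel_pictures)
print("trial time per session (min): %.2f" % session_min)
print("trial time overall (min): %.2f" % total_min)
```

The counts reproduce the 108 trials per session, the 4810-msec trial, and the 408 familiar and 240 novel pictures reported in the Stimuli section; trial time alone comes to roughly 8.7 min per session and about 52 min overall, in line with the approximate durations reported.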

Image Acquisition

Images were acquired with a T2*-weighted gradient-echo, echo-planar imaging sequence on a 3-T Siemens Allegra scanner (Siemens Medical Systems, Erlangen, Germany) with a standard head coil. Head movements were restricted by mild restraint and cushioning. Thirty-two axial slices, aligned with the bicommissural plane, were acquired in each functional volume using BOLD imaging (repetition time = 2.08 sec, echo time = 30 msec, in-plane resolution = 3 × 3 mm, slice thickness = 2.5 mm, interslice distance = 1.25 mm), covering the entire cortex, including ventral temporal cortex.

Image Processing and Statistical Analysis

Image preprocessing and statistical analysis were performed in SPM5 (Wellcome Department of Cognitive Neurology; www.fil.ion.ucl.ac.uk/spm/) as implemented in MATLAB 7.1 (The MathWorks, Natick, MA, USA). For each participant, we acquired 1548 functional volumes: 258 for each of the six sessions of the experimental task. The first four volumes of each run were discarded to minimize saturation effects. All remaining volumes were realigned to the first volume for motion correction, and slice-acquisition delays were corrected using the middle slice as reference (the 16th slice). All volumes were normalized to the standard SPM5 echo-planar imaging template, resampled to 2 mm isotropic voxel size, and finally spatially smoothed using an isotropic Gaussian kernel of 8 mm full-width half-maximum. Data were analyzed with the general linear model for event-related designs, using a random effects approach (Penny & Holmes, 2004). Data were first analyzed for single subjects (first-level analysis), modeling the six trial types (L scr; L uns; NL scr; NL uns; NOVEL scr; NOVEL uns). Motion parameters were included in the design matrix as covariates of no interest. A high-pass filter with a cutoff period of 128 sec was used. Data were best fitted at every voxel using a combination of effects of interest. These were box functions aligned with the onset of the trial, with a duration corresponding to the time of stimulus presentation (i.e., 3300 msec), convolved with the SPM5 hemodynamic response function. Linear contrasts were used to determine responses corresponding to eight effects of interest, averaging across the six fMRI runs. Six effects of interest corresponded to the six experimental conditions (L scr, L uns, NL scr, NL uns, NOVEL scr, NOVEL uns). Moreover, we computed two additional effects corresponding to familiar scrambling and familiar unscrambling (FAM scr, FAM uns), averaging living and nonliving items in each viewing condition.
This resulted in eight contrast images per participant. The contrast images then underwent two separate second-level analyses (within-subjects ANOVA; implemented in SPM5). The first analysis tested the interaction between familiarity and viewing condition. Thus, the ANOVA modeled the effect of four conditions (FAM scr, FAM uns, NOVEL scr, NOVEL uns; i.e., averaging living and nonliving items; see description of the first-level contrasts above). The second analysis tested the main effect of domain, the main effect of viewing condition, and the interactions between familiarity and viewing condition, now separately considering living or nonliving items as the familiar items (unfamiliar items were always the novel items). Thus, the ANOVA modeled the effect of six conditions (L scr, L uns, NL scr, NL uns, NOVEL scr, NOVEL uns) using separate contrast images for living and nonliving items (see description of the first-level contrasts above). Correction for nonsphericity accounted for any differences in error variance across conditions and any nonindependent error terms for the repeated measures (Friston et al., 2002). The threshold for statistical testing was set at the cluster level to p < .05, corrected for multiple comparisons (family-wise error, FWE), with cluster size estimated at p-unc. < .001. The activation of the structural description system, as indexed by the interaction between familiarity and viewing condition within domain-selective regions of the ventral stream, was also tested using regions of interest (ROIs). This was done by averaging the BOLD signal (MarsBar 0.41, “MARseille Boite A Région d'Intéret” SPM toolbox) of each of the four clusters showing domain selectivity for living or nonliving items in the ventral stream. Within these ROIs, we tested the interaction between familiarity and viewing condition separately for living and nonliving things. A Bonferroni correction for the number of ROIs tested was applied.
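
To make the logic of the interaction test concrete, the sketch below writes the Familiarity × Viewing condition interaction as a set of contrast weights over the four second-level conditions, in the direction reported in Table 1, (NOVEL uns − NOVEL scr) > (FAM uns − FAM scr), and shows the Bonferroni-adjusted threshold for four ROIs. It illustrates the arithmetic only and is not the SPM5 or MarsBar implementation; the function names and the example estimates are hypothetical.

```python
# Contrast weights for (NOVEL uns - NOVEL scr) > (FAM uns - FAM scr).
INTERACTION = {
    "FAM scr": 1,
    "FAM uns": -1,
    "NOVEL scr": -1,
    "NOVEL uns": 1,
}


def contrast_value(weights, estimates):
    """Weighted sum of condition estimates (e.g., mean ROI signal per condition)."""
    return sum(weights[name] * estimates[name] for name in weights)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical condition estimates, for illustration only (not data from the study).
    example = {"FAM scr": 1.0, "FAM uns": 1.2, "NOVEL scr": 0.5, "NOVEL uns": 1.5}
    print("interaction contrast:", contrast_value(INTERACTION, example))
    print("per-ROI alpha for 4 ROIs:", bonferroni_alpha(0.05, 4))
```

A positive contrast value means that the unscrambling-minus-scrambling increase is larger for novel than for familiar items, which is the pattern predicted for familiarity-sensitive regions. The weights sum to zero, so the contrast is insensitive to any signal change common to all four conditions.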

RESULTS

Behavioral Data

A 3 × 2 repeated-measures ANOVA was performed with recognition time as the dependent measure. This was the time during which subjects were able to perceive the 3-D form of the item (in the scrambling condition, the time from the beginning of the trial to the keypress; in the unscrambling condition, the time from the keypress to the end of the trial). The ANOVA factors were stimulus type (L, NL, NOVEL) and viewing condition (scr, uns). The analysis revealed a significant effect of viewing condition [F(1, 12) = 85.6, p < .001], with the scrambling condition allowing a longer recognition time than the unscrambling one (mean recognition time: scr 1981 msec, uns 608 msec). Neither the main effect of Stimulus type [F(2, 24) = 2.49, p = .104] nor the Stimulus type × Viewing condition interaction [F(2, 24) = 2.14, p = .139] was significant. This result confirmed that shape processing was more effortful during the unscrambling condition than during the scrambling condition. Contrary to our expectation, familiarity did not increase recognition times (the stimulus type main effect and the interaction between stimulus type and viewing condition were both not significant).
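
The definition of recognition time given above can be written out as a short sketch. The function name is hypothetical, and one assumption is made that the text does not settle: "the end of the trial" is taken to be the end of the 3300-msec image sequence rather than the end of the full 4810-msec trial.

```python
STIMULUS_MS = 3300  # duration of the image sequence (from the Methods)


def recognition_time(condition, keypress_ms, stimulus_ms=STIMULUS_MS):
    """Time (msec) during which the 3-D form was reported as perceivable.

    condition   -- "scr" (scrambling) or "uns" (unscrambling)
    keypress_ms -- keypress latency measured from sequence onset
    Assumption: the trial "ends" when the image sequence ends.
    """
    if not 0 <= keypress_ms <= stimulus_ms:
        raise ValueError("keypress must fall within the image sequence")
    if condition == "scr":
        # Form is visible from onset until the subject reports losing it.
        return keypress_ms
    if condition == "uns":
        # Form is visible from the keypress until the sequence ends.
        return stimulus_ms - keypress_ms
    raise ValueError("condition must be 'scr' or 'uns'")


if __name__ == "__main__":
    print(recognition_time("scr", 1981))
    print(recognition_time("uns", 2692))
```

Under this assumption, a mean recognition time of 608 msec in the unscrambling condition would correspond to a mean keypress about 2692 msec after sequence onset.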

Neuroimaging Data

Effect of Domain: Living and Nonliving Things

A focus of our study was to observe and document any differential brain activity between the presentation of living and nonliving things. Figure 2 and Table 1 show the regions demonstrating an enhanced response during living versus nonliving things presentation collapsed across scrambling and unscrambling viewing conditions ([L scr + L uns] > [NL scr + NL uns]) and vice versa ([NL scr + NL uns] > [L scr + L uns]). Living things presentation increased activity bilaterally in occipito-temporal cortex, extending dorsally in the middle and inferior occipital gyri, and ventrally from the inferior occipital gyrus to the middle fusiform gyrus. In both hemispheres, the region of enhanced activity for living things lay anterolaterally to the retinotopic V4 (Hadjikhani, Liu, Dale, Cavanagh, & Tootell, 1998) and corresponded to the area defined by the three vertices (dorsal–posterior, ventral–posterior, ventral–anterior) of the LOC according to Grill-Spector et al. (1999). The posterior portion of the cluster included the EBA bilaterally, whereas the FFA partially overlapped the anterior portion of the right cluster (see Downing et al., 2006).

Figure 2. 

Brain areas showing enhanced brain activity for living (living > nonliving things) (hot color) and nonliving things (nonliving > living things) (green color) displayed on the lateral surfaces of the brain and on an axial view at z = −14. Enhanced activity for living things is shown on the lateral surface bilaterally in the middle occipital gyrus (MOG) and in the inferior occipital gyrus (IOG), and in the right hemisphere in the middle frontal gyrus (MFG). Moreover, enhanced brain activity for living things is also shown in the central axial view in the bilateral fusiform gyrus (FG). Enhanced activity for nonliving things is shown in the lateral surface of the left hemisphere in the superior occipital gyrus (SOG) and in the central axial view in the bilateral lingual gyrus (LG) and the calcarine sulcus (CS).

Table 1. 

Main Effects and Interactions in the Main Experiment

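| Contrast | Region | Hemisphere | BA | p(corrected) | Extent (voxels) | Z-score | MNI (x y z) |
|---|---|---|---|---|---|---|---|
| Living > Nonliving | IOG | Right | 19 | <.001 | 3432 | Inf | 48 −78 −6 |
| | MOG | Right | 19 | | | 7.72 | 52 −76 |
| | FGm | Right | 37 | | | 7.03 | 44 −50 −24 |
| | FGp | Right | 19 | | | 6.75 | 42 −66 −18 |
| | MTGp | Right | 39 | | | 6.3 | 46 −62 12 |
| | MOG | Left | 19 | <.001 | 2654 | 7.25 | −48 −84 |
| | FGm | Left | 37 | | | 7.01 | −44 −52 −22 |
| | IOG | Left | 19 | | | 5.91 | −40 −76 −8 |
| | IFG | Right | 44 | .001 | 500 | 4.37 | 54 14 42 |
| | IFG | Right | 45 | | | 3.69 | 50 20 18 |
| | MFG | Right | | | | 3.35 | 40 38 |
| Nonliving > Living | LG | Left | 37 | <.001 | 3013 | Inf | −24 −44 −12 |
| | Precuneus | Left | 31 | | | 5.25 | −18 −64 20 |
| | PhG | Left | | | | 3.97 | −22 −16 −24 |
| | LG | Right | 19 | <.001 | 2671 | Inf | 30 −48 −10 |
| | Precuneus | Right | 31 | | | 5.51 | 24 −60 18 |
| | PhG | Right | | | | 3.29 | 26 −18 −26 |
| | CS | Left | 17 | <.001 | 1862 | 6.7 | −12 −92 −10 |
| | CS | Right | 17 | | | 6.34 | 12 −94 −2 |
| | SOG | Left | 19 | .003 | 395 | 5.62 | −34 −82 34 |
| Scrambling > Unscrambling | SPL | Left | | <.001 | 11,241 | 6.54 | −16 −62 68 |
| | SMG | Left | 40 | | | 6.33 | −58 −26 50 |
| | MTG | Left | 21 | | | 5.56 | −64 −52 |
| | SMG | Right | 40 | | | 5.93 | 56 −34 52 |
| | Precuneus | Right | | | | 5.87 | −66 60 |
| | MTG | Right | 21 | <.001 | 553 | 4.72 | 64 −38 −2 |
| | Cuneus | Right | 19 | .017 | 274 | 3.62 | −84 28 |
| Unscrambling > Scrambling | FGp | Left | 19 | <.001 | 8325 | Inf | −32 −70 −12 |
| | IOG | Left | 19 | | | Inf | −34 −74 −10 |
| | FGm | Left | 37 | | | Inf | −28 −56 −14 |
| | MOG | Left | 19 | | | Inf | −32 −80 24 |
| | PhG | Left | 35 | | | 5.21 | −24 −36 −22 |
| | IOG | Right | 19 | <.001 | 9921 | Inf | 40 −84 −10 |
| | MOG | Right | 19 | | | Inf | 32 −86 18 |
| | PhG | Right | 35 | | | 7.54 | 30 −38 −24 |
| | SC | Right | | | | 6.52 | −30 −4 |
| | SC | Left | | | | 5.97 | −4 −30 −6 |
| | IFG | Right | 44 | <.001 | 2501 | 7.41 | 46 12 28 |
| | IFG | Right | 45 | | | 6.75 | 44 26 22 |
| | Insula | Right | | | | 6.06 | 32 26 −4 |
| | IFG | Left | 44 | <.001 | 1839 | 6.72 | −40 28 |
| | Insula | Left | | | | 6.16 | −32 24 |
| | IFG | Left | 45 | | | 5.17 | −38 30 −16 |
| | Cerebellum | Left | | .001 | 457 | 6.21 | −6 −76 −26 |
| | Cerebellum | Right | | | | 5.2 | 10 −80 −26 |
| | SFGm | Right | | <.001 | 870 | 5.72 | 16 54 |
| | SFGm | Left | | | | 5.11 | −6 14 52 |
| (NOVEL uns − NOVEL scr) > (FAM uns − FAM scr) | OTS | Right | 37 | .009 | 312 | 3.76 | 44 −66 −6 |
| (NOVEL uns − NOVEL scr) > (L uns − L scr) | OTS | Right | 37 | <.001 | 776 | 4.00 | 44 −68 −8 |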

MTGp = posterior middle temporal gyrus; IOG = inferior occipital gyrus; MOG = middle occipital gyrus; FGm = middle fusiform gyrus; FGp = posterior fusiform gyrus; MTG = middle temporal gyrus; IFG = inferior frontal gyrus; MFG = middle frontal gyrus; LG = lingual gyrus; PhG = parahippocampal gyrus; CS = calcarine sulcus; SOG = superior occipital gyrus; SPL = superior parietal lobe; SMG = supramarginal gyrus; SC = superior colliculus; SFGm = medial superior frontal gyrus; OTS = occipito-temporal sulcus; Lower case “p” and “m” indicate posterior and middle. All reported clusters were obtained using a cluster-defining voxelwise threshold of p < .001 (uncorrected), and whole-brain FWE-corrected for significance using cluster extent, p < .05.

Enhanced activity was also found in the right hemisphere, specifically in the posterior inferior and middle frontal gyri. Presentation of nonliving things elicited increased activation bilaterally in medial ventral regions, including the lingual gyrus, the parahippocampal gyrus, the precuneus, and the calcarine sulcus. In both hemispheres, the region of enhanced activity for nonliving things included the PPA (see Downing et al., 2006).

Effect of Viewing Condition: Scrambling and Unscrambling Condition

The experimental paradigm used in this study was designed to modulate activity within ventral stream regions dedicated to object recognition (see Grill-Spector et al., 2001). In particular, we expected increased activation of these regions in the unscrambling condition, where pictures were progressively recomposed, leading to the simultaneous activation of many competing representations. Table 1 shows the regions demonstrating enhanced activity during the unscrambling versus scrambling condition, collapsed across living and nonliving familiar objects and novel objects ([L uns + NL uns + NOVEL uns] > [L scr + NL scr + NOVEL scr]). As expected, increased activation was observed in a bilateral and symmetrical network, including dorsoventral occipito-temporal cortex (middle and inferior occipital gyrus, posterior and middle fusiform gyrus, parahippocampal gyrus; see bar plots in Figure 3). Further foci of increased activity were found in the insula and prefrontal cortex (inferior frontal gyrus and mesial superior frontal gyrus).

Figure 3. 

Axial view of the brain at z = −14 showing the regions of the extrastriate ventral stream activated in the living versus nonliving main effect (red areas) and nonliving versus living main effect (green areas). These clusters were selected as ROIs to test the interaction between familiarity (separately for living and nonliving things) and viewing condition. On both sides of the figure, signal plots show the estimated activity for each cluster (in arbitrary units, mean-adjusted ± standard error of the mean [SEM]) for each of the three categories: living (L), nonliving (NL), and novel (N) in both viewing conditions, scrambling (SCR) and unscrambling (UNS).

Interaction between Familiarity and Viewing Condition

A main aim of this study was to identify the structural description system, that is, the system involved in the visual processing of exemplars of familiar items. To this end, we looked for brain regions showing a Familiarity × Viewing condition interaction rather than a main effect of familiarity. This strategy is preferable because familiarity effects could also reflect differences in lower-level features between familiar (animals and vehicles) and unfamiliar items (abstract sculptures). In particular, we expected that the presence of a structural representation for familiar objects should attenuate the modulation by viewing condition (i.e., uns > scr) for familiar (FAM, i.e., L + NL) objects relative to novel objects ([NOVEL uns − NOVEL scr] > [FAM uns − FAM scr]). Moreover, in the scrambling condition, familiar objects should show greater activity than abstract objects because the activity of the structural representation should be more resilient against scrambling. For this reason, we report only voxels of the interaction within regions where the simple effect (FAM scr > NOVEL scr) was present (p < .05, uncorrected), by applying a mask to our interaction contrast. In keeping with these predictions, our analysis revealed a cluster of enhanced activity in the depth of the posterior occipito-temporal sulcus at the level of the preoccipital incisura between the occipital and temporal lobes (Table 1; Figure 4). This region was located between the two vertices of the caudal–dorsal portion of the LOC (Grill-Spector et al., 1999, p. 196), partially overlapping the EBA (Downing et al., 2006).
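As an illustration only, the interaction contrast and its mask can be written as weight vectors over the six condition regressors. The sketch below is not the analysis code used in the study: the condition ordering, the function names, and the choice to average the familiar domains (weights of 0.5 each when FAM = L + NL) are assumptions made for the example.

```python
import numpy as np

# Hypothetical ordering of the six condition regressors (an assumption).
CONDITIONS = ["L_scr", "L_uns", "NL_scr", "NL_uns", "NOVEL_scr", "NOVEL_uns"]


def interaction_weights(familiar):
    """Weights for (NOVEL uns - NOVEL scr) > (FAM uns - FAM scr).

    `familiar` lists the familiar domains entering the contrast,
    e.g. ["L", "NL"] for all familiar items or ["L"] for living only.
    Familiar domains are averaged (assumed scaling).
    """
    w = dict.fromkeys(CONDITIONS, 0.0)
    w["NOVEL_uns"] += 1.0
    w["NOVEL_scr"] -= 1.0
    for dom in familiar:
        w[f"{dom}_uns"] -= 1.0 / len(familiar)
        w[f"{dom}_scr"] += 1.0 / len(familiar)
    return np.array([w[c] for c in CONDITIONS])


def mask_weights(familiar):
    """Weights for the simple effect FAM scr > NOVEL scr used as a mask."""
    w = dict.fromkeys(CONDITIONS, 0.0)
    w["NOVEL_scr"] -= 1.0
    for dom in familiar:
        w[f"{dom}_scr"] += 1.0 / len(familiar)
    return np.array([w[c] for c in CONDITIONS])


# All familiar items collapsed (FAM = L + NL).
fam_interaction = interaction_weights(["L", "NL"])
fam_mask = mask_weights(["L", "NL"])

# Interaction weights sum to zero, as expected for a difference of differences.
print(dict(zip(CONDITIONS, fam_interaction)), fam_interaction.sum())
```

Calling `interaction_weights(["L"])` or `interaction_weights(["NL"])` gives the corresponding domain-specific version of the same contrast.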

Figure 4. 

The left side of the figure presents the brain area (hot color) showing significant enhanced brain activity for the interaction between familiarity and viewing condition driven by familiar things (FAM) ([NOVEL uns − NOVEL scr] > [FAM uns − FAM scr]), masked for the simple effect in the scrambling condition of familiar versus abstract (FAM scr > NOVEL scr) (p < .05, unc). The significant active cluster within the ventral occipito-temporal region is displayed on the right lateral surface of the brain and on an axial view at z = −14. On the right side of the figure, signal plots show the estimated activity (in arbitrary units, mean-adjusted ± standard error of the mean [SEM]) for the whole cluster for familiar and novel objects in both viewing conditions, scrambling (SCR) and unscrambling (UNS).

Interaction between Familiarity and Viewing Condition across Living and Nonliving Domains

According to our hypothesis, part of the living things selectivity in the ventral stream reflects a greater reliance of this domain on the structural description system, as compared to the nonliving domain. In this view, we expected that the above reported interaction should be driven primarily by living items. Furthermore, we expected such a pattern of Familiarity × Viewing condition interaction to be composed of living things sensitivity regions irrespective of whether the involved familiar items are living or nonliving things. With this purpose, we first performed two different interaction analyses for living and nonliving things by viewing condition at the whole-brain level. Table 1 shows the brain area with a greater modulation of novel objects by viewing condition compared to living things ([NOVEL uns − NOVEL scr] > [L uns − L scr]), masked for the simple effect in the scrambling condition of living versus novel objects (L scr > NOVEL scr) (p < .05, uncorrected). This area comprises the one previously obtained, collapsing all familiar items and showing the same pattern of activity. No significant activations were obtained when the same interaction was performed with nonliving things ([NOVEL uns − NOVEL scr] > [NL uns − NL scr]), masked for the simple effect in the scrambling condition of nonliving versus novel objects (NL scr > NOVEL scr) (p < .05, uncorrected).

In our view, the processing of nonliving things relies, to a lesser extent, on the structural description system located in the living-specific brain regions. A whole-brain analysis could fail to detect it for lack of statistical power. For this reason, we performed the same interaction between nonliving things and abstract objects within four different ROIs corresponding to: the left and right ventrolateral occipito-temporal cluster for living things (Table 1; cluster size 3432 for the right hemisphere and 2654 for the left hemisphere) and the left and right extrastriate mesial occipito-temporal cluster for nonliving things (Table 1; cluster size 2671 for the right hemisphere and 3013 for the left hemisphere) (see Figure 3). ROI analysis revealed a significant interaction for nonliving things within the living things selective clusters bilaterally (right cluster: T = 2.65, p = .020; left cluster: T = 2.85, p = .012). No significant interaction was observed within the nonliving selective clusters bilaterally.
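The logic of such an ROI-level interaction test can be sketched as follows, assuming that each subject contributes one mean parameter estimate per condition within the ROI. This is an illustrative reconstruction rather than the procedure actually used; the function name, the one-sample t formulation, and the synthetic numbers are all hypothetical.

```python
import numpy as np


def roi_interaction_t(novel_uns, novel_scr, fam_uns, fam_scr):
    """One-sample t statistic on per-subject interaction scores.

    Each argument holds one ROI-averaged estimate per subject.
    The score is (NOVEL uns - NOVEL scr) - (FAM uns - FAM scr);
    a positive t indicates a larger viewing-condition modulation
    for novel than for familiar items.
    Returns the t value and its degrees of freedom (n - 1).
    """
    d = (np.asarray(novel_uns, float) - np.asarray(novel_scr, float)) - (
        np.asarray(fam_uns, float) - np.asarray(fam_scr, float)
    )
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1


# Synthetic example values (not data from the study).
rng = np.random.default_rng(0)
n_subjects = 12  # hypothetical sample size
novel_scr = rng.normal(1.0, 0.5, n_subjects)
novel_uns = novel_scr + rng.normal(1.0, 0.5, n_subjects)
nl_scr = rng.normal(1.3, 0.5, n_subjects)
nl_uns = nl_scr + rng.normal(0.6, 0.5, n_subjects)

t_value, dof = roi_interaction_t(novel_uns, novel_scr, nl_uns, nl_scr)
print(f"t({dof}) = {t_value:.2f}")
```

Restricting the test to a few predefined ROIs reduces the number of comparisons relative to a whole-brain search, which is the gain in sensitivity the paragraph above relies on.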

DISCUSSION

Three main issues were addressed in this work: the first concerns the characteristics of domain-specific activation patterns in the visual ventral stream; the second, the presence of familiarity-sensitive regions in the same substrate, which are expected on the assumption that a structural description system does exist; and finally, we wanted to verify the hypothesis that part of the living things selective activation pattern depends upon a greater reliance of living items on the structural description system.

Before discussing the imaging results with respect to those three goals, we want to briefly discuss the behavioral data. In keeping with our expectations, the recognition time, that is, the period of time during which subjects were able to perceive the 3-D shape of the stimulus items, was longer in the scrambling than in the unscrambling condition. Contrary to our expectations, however, familiarity did not increase recognition times and did not interact with viewing condition. Indeed, these expectations were based on the assumption that reaction times should express the time courses of the neural processes taking place in the structural description system. Within this system, in keeping with our hypothesis, the activity pattern corresponding to familiar shapes should persist longer as visual noise increases (in the scrambling condition); on the other hand, familiar patterns should inhibit their competitors sooner as visual noise decreases (in the unscrambling condition). Reaction times, however, do not reflect the time course of a single step in a cognitive processing stream such as visual recognition; on the contrary, they result from the contribution of all the subsequent stages. It is thus possible that different cross-categories imbalances in the time courses of other processing stages may have obscured the expected pattern.

Domain-specific Activation Patterns in the Ventral Stream

Our results confirm that pictures of living things and artifacts differentially activate occipito-temporal ventral cortex. In particular, living things as compared to artifacts activated a lateral region of occipito-temporal cortex, comprising the most lateral aspects of the middle and posterior fusiform gyrus, on the ventral surface of the brain, and extending in a caudal–dorsal direction to the middle and inferior occipital gyri. These activations were represented bilaterally, but with a predominance in the right hemisphere. By contrast, nonliving as compared to living things activated bilaterally more medial regions of ventral occipito-temporal cortex, comprising the lingual gyri and extending anteriorly to the parahippocampal gyri and posteriorly to striate cortex. Similar patterns of differential activation across semantic domains in the ventral visual stream have already been observed in a few neuroimaging studies using pictures of living and nonliving things (e.g., Zannino et al., 2010; Downing et al., 2006; Chao et al., 1999, 2002; Whatmough et al., 2002). As previously noted, the living-specific activation pattern partially overlapped with the EBA and the FFA (Downing et al., 2006) and was largely superposable with the LOC. On the other hand, the nonliving-specific activation pattern partially overlapped with the PPA.

The presence of a double dissociation per se does not strongly argue in favor of a truly domain-specific organization of the ventral stream, being also compatible with several reductionist accounts (see Joseph, 2001, p. 128, for a discussion on this point). We will discuss our own proposal in the last section, when commenting upon the results showing an interplay between the factor domain and the observed Familiarity × Viewing condition interaction. However, also without taking familiarity into account, the above reported results pose some constraints to the factors that might underlie the epiphenomenon of domain-specific activation patterns.

To account for the inconsistency in the finding of domain-specific activation patterns between studies, Gerlach (2007) proposed that differences in task requirements may account for part of this variability. In the same vein, Joseph (2001) pointed out that “the cognitive demands of a particular recognition task are as predictive of cortical activation patterns as is category membership” (Joseph, 2001, p. 119). Our results suggest that the possible role of task differences as a confounding factor in the emergence of domain-specific activation patterns should be scaled down. As a matter of fact, in our previous study, we used a completely different task (a word–picture matching task) from the one used here; nevertheless, the domain-related activation patterns were highly superposable in the two studies (see Figure 2 in the present study and Figure 4 in Zannino et al., 2010; to facilitate comparison, both transverse sections are selected at z = −14 in MNI space). The other studies documenting a living versus nonliving double dissociation mostly used naming tasks (Chao et al., 1999, 2002; Whatmough et al., 2002); however, in addition to the naming task, Chao et al. (1999) also used a passive viewing and a matching-to-sample task, and Downing et al. (2006) only used a passive viewing paradigm. Thus, in summary, approximately the same differential activation patterns have been found using five different tasks (naming, word–picture matching, passive viewing, matching to sample, and the task used in the present study). We believe that the bulk of evidence strongly argues in favor of some factor driving the observed differential activations across domains irrespective of the cognitive task at hand.

A second point regards the role of two further potential confounding factors pointed out by Gerlach (2007). According to this author, a cross-domain imbalance of stimulus complexity and familiarity could be responsible for apparent category-specific activations (in this context, the term familiarity is used in a different sense than elsewhere in this article; here it refers to a continuous variable expressing how usual a real item is in one's realm of experience; see Methods). As for familiarity, because this factor was balanced across domains in the stimulus material used in the present work, we were able to rule out its impact on our domain-specific activation patterns. As for complexity, Gerlach (2007; see also Joseph, 2001) noticed that regions responding preferentially to living things were found when more complex living things (i.e., animals, which have many component parts) were contrasted with less complex artifacts (i.e., tools, with only few component parts). By contrast, when more complex artifacts, such as vehicles, were used, cross-domain differences disappeared. As a matter of fact, three of the five previous studies reporting a double dissociation in the ventral stream used animals and tools when contrasting living and nonliving things (Chao et al., 1999, 2002; Whatmough et al., 2002). However, in the present study, we used the more visually complex category of vehicles to represent nonliving things, and in our previous study, items drawn from the categories of furniture and vehicles in the nonliving domain were contrasted with living things comprising fruits, vegetables, and animals. Finally, Downing et al. (2006) used items spanning 10 living and 10 nonliving categories. Thus, the present study, the previous one by our group, and that by Downing et al. all argue against a major role of visual complexity in the finding of a double dissociation of domain-specific activation patterns in the ventral stream.

Although the results discussed above suggest that particular task requirements and cross-domain differences in visual complexity and item familiarity probably play only a minor role in the emergence of category-specific activation patterns, this does not mean that other low-level factors may not contribute to the observed category-specific picture. Indeed, it is highly probable that several different factors underlie category specificity in different regions of the ventral stream; thus, for example, our finding of nonliving-specific activity in striate cortex is very likely to depend on a different cross-domain distribution of low-level visual features in the experimental items. By contrast, the living things-specific activation of the anterolateral portions of the ventral stream might depend on higher-level cross-domain differences; in particular, as we hypothesize in this work (see the last Discussion section), the living effect in this region may result from a greater reliance of living items on the structural description system.

Familiarity-sensitive Regions in the Ventral Stream

A further focus of this work was the investigation of possible regions showing a selective response to familiar items, that is, to visually presented exemplars of known categories. As reported in the Introduction, this characteristic should be a property of the structural description system, where incoming percepts are matched to long-term stored structural representations specifying the visual appearance of familiar items. Neuroimaging studies have faced two major problems when looking for the substrate of the structural description system. First, because familiar and novel items are necessarily different at the physical level, familiarity-sensitive activations may reflect the action of some lower-level visual factor rather than familiarity itself. Second, when the search for familiarity-sensitive regions was restricted to high-level nonretinotopic regions (i.e., the LOC) to limit the effect of low-level perceptual differences, as a rule, no differences between familiar items and novel 3-D shapes were found (Grill-Spector et al., 2001). In the present study, we controlled for low-level confounding factors by devising a paradigm able to reveal familiarity-sensitive regions in terms of an interaction between familiarity (familiar vs. novel) and viewing condition (scrambling vs. unscrambling). We reasoned that in the scrambling condition, where clear images progressively fragment, familiar items should be more resilient against increasing noise because of their stored structural representations. We also expected that the processing of form should be, overall, more effortful in the unscrambling condition, where 3-D shapes build up progressively, because of the high number of competitors; this should lead to a main effect of unscrambling in the ventral stream. However, we expected the activity increase for familiar items to be smaller than for novel ones, because stored structural representations make it easier to rule out competing activation patterns.

Our results were in keeping with these expectations. The main effect of viewing condition (unscrambling > scrambling) activated a large expanse of ventral temporo-occipital cortex, broadly overlapping with both domain-specific main effects, thus suggesting that this viewing condition poses greater demands for almost the entire extrastriate ventral stream. More interestingly, a significant Familiarity × Viewing condition interaction was found. The pattern of activity within the significant cluster of the interaction (see Figure 4) was in keeping with our predictions; indeed, familiar items showed greater activity as compared to novel items in the scrambling condition, and the modulation of the viewing condition was larger for novel than for familiar items (due to the fact that the increment of activity in the unscrambling condition was lower for familiar than for novel items). The region showing this interaction was located in the right caudal–dorsal portion of the LOC (Grill-Spector et al., 1999), around the edge between ventral and dorsal cortex at the level of the preoccipital incisura. These results argue in favor of a location of the structural description system within these regions. It should be noted, however, that our findings do not enable us to definitely rule out the possibility that some factors, other than familiarity per se, showing an unbalanced distribution across familiar and unfamiliar items, may have contributed to the activity patterns we predicted based on our hypothesis. Thus, to support our conclusions, congruent findings, already obtained by other groups using different paradigms, are particularly valuable. Using an entirely different approach, Gerlach, Law, Gade, and Paulson (1999) reached similar conclusions about the localization of the structural description system in a PET study. In that study, subjects performed an object decision task in an easy and a difficult condition. 
In the easy condition, unfamiliar items consisted of novel objects completely unknown by the subjects; in the difficult condition, unfamiliar items consisted of chimeras exchanging parts belonging to familiar objects. These authors reasoned that the difficult condition should pose greater demands to the structural description system leading to a greater activation of the corresponding substrate. A differential activation for difficult > easy condition was found in a right ventrolateral region (peak at x = 54, y = −62, z = −12 in Talairach space), which is very close to the site where we found a Familiarity × Viewing condition interaction in the present study. Our study extends these results because we are able to rule out the effect of possible low-level confounding factors. By contrast, stimuli in the easy and difficult condition in Gerlach's study were not identical at the perceptual level, thus the effect of a low-level confounding factor cannot be ruled out in that study.

In an fMRI object decision paradigm devised to prevent possible low-level confounds, Vuilleumier et al. (2002, Experiment 1) found a bilateral (left skewed) region within the LOC (fusiform/inferior temporal gyri) showing “stronger response and repetition decrease for real objects” than for novel 3-D objects. These authors argued that, although object selectivity per se could be attributed to physical differences between real and nonreal items, the Repetition decrease × Familiarity interaction is likely to reflect the reactivation of preexisting structural representations of real objects. In the same vein, Bar et al. (2001) and Grill-Spector et al. (2000) searched for differential activations related to recognition versus vision without recognition of identical visual stimuli. Both studies found greater activation for recognized than for nonrecognized items within the LOC bilaterally. Because recognized and nonrecognized stimuli were identical in physical features and image duration, the different activation patterns are likely related to a more or less successful match with previously stored representations in the structural description system. Vuilleumier et al. only used pictures of man-made objects, whereas the latter two studies used stimuli drawn from the living and the nonliving domains without distinguishing between them. Our results confirm these suggestions of a central role of the LOC in high-level recognition processes, but improve on previous work by providing some insight into the interplay between familiarity and domain sensitivity in the ventral stream.

Familiarity × Viewing Condition Interaction in the Living vs. Nonliving Domain

Our interest in demonstrating familiarity-sensitive regions in the ventral stream goes beyond the scope of finding the possible substrate for the structural description system. We were also interested in verifying our hypothesis about the factors underlying domain specificity. In our view, an important role in the emergence of living-specific activation patterns may be ascribed to the greater reliance of living things on the structural description system. According to this hypothesis, the expectations regarding the Familiarity × Viewing condition interaction become more fine-grained. In particular, the Familiarity × Viewing condition interaction should be more evident when living, rather than nonliving, things are contrasted with novel objects across viewing conditions; furthermore, any Familiarity × Viewing condition interaction should be localized within the regions of the cortex showing a main effect of living things, irrespective of the domain the familiar items are drawn from.

To ascertain that the Familiarity × Viewing condition interaction we found when collapsing living and nonliving things was actually driven by living things, the same interaction was tested separately for living versus novel objects and nonliving versus novel objects. As expected, only in the first case was a significant interaction found, notably in the same region as when all familiar items were entered into the analysis without distinguishing between living and nonliving things. Finally, we wanted to verify that, although nonliving things rely less than living things on the structural description system, to the extent that their structural representations are processed during recognition, this process takes place in the same substrate where living things structural representations were processed. To this aim, we tested for a Familiarity × Viewing condition interaction for nonliving things in four ROIs, corresponding to the two bilateral extrastriate areas showing a main effect for either living or nonliving things. A significant interaction was found only within those ROIs showing a main effect of living things; that is, in the right ventrolateral region, where the Familiarity × Condition interaction was found collapsing living and nonliving things, and in a homologue site in the contralateral ROI. By contrast, no interaction was found in regions showing a significant main effect of nonliving things. Overall, these results suggest that an important factor underlying domain specificity in the ventral stream may be a different degree of reliance of living and nonliving things on different subsystems of visual recognition. The main finding arguing for this conclusion is that the interaction revealing a substrate sensitive to familiarity is composed of the regions showing a main effect of living things, irrespective of whether the familiar items involved in the interaction analysis are living or nonliving things.

In the above quoted paper, Gerlach et al. (1999) made a similar attempt to demonstrate that living and nonliving things differentially tax the structural description system. These authors hypothesized a more critical role of the structural description system for recognizing living than nonliving things. This expectation was based on the assumption that living things are harder to recognize because they “tend to be globally more visually similar and share more common parts with other member of their categories than other objects” (Gerlach et al., 1999, p. 2160). This between-items similarity, which regards the visual appearance of different members of a semantic category (e.g., a dog and a cat in the category of animals and a hammer and a saw in the category of tools) should not be confused with the within-item similarity, regarding the visual appearance of different exemplars of the same concepts (e.g., different cats or different saws). In our account, a major degree of within-item similarity is the reason why living things rely more heavily on the structural description system (see Introduction).

To verify their hypothesis by means of the above-described object decision task, familiar items were selected half from the living and half from the nonliving domain. These authors, however, only found scarce evidence supporting their hypothesis in the imaging data. First, because no main effect of domain was found, they were not able to ascertain if the effect revealing the substrate for structural description system (i.e., the contrast between easy and difficult task condition; see above) was located in a region more crucial for living things processing. By contrast, in keeping with the assumption of a major reliance of living things on the structural description system, we found the expected Familiarity × Viewing condition interaction within a region showing a living things main effect. Second, these authors did not find a Domain × Task difficulty interaction, and simple main effects analyses only revealed a trend for a larger effect of task difficulty in the living domain. Namely, the contrast difficult living > easy living was marginally significant at the cluster level, whereas in the artifact domain it did not reach significance at all.

Our reductionist account of domain selectivity in the ventral stream assumes that (at least) two different processing systems are simultaneously involved in object recognition: the structural description system, located in the right LOC, which is particularly taxed by living things; and a more analytic processing system, located in the medial aspects of the ventral stream, which is more heavily taxed by nonliving things. Although our experimental paradigm was specifically devised to investigate the structural description system and not the more analytic system that has been proposed to be particularly involved in recognizing nonliving items (Farah, 1997), we wish to devote some concluding remarks to the latter system. As we proposed in the Introduction, an important aspect of an analytic approach to visually perceived stimuli may consist in the appreciation of their surface properties (color, lightness, and particularly, texture) for obtaining information on the material from which an object is made. Recent functional neuroimaging evidence suggests a division of labor between the LOC and medial regions of the ventral stream surrounding the anterior collateral sulcus for processing shapes and surface textures, respectively (Cant, Arnott, & Goodale, 2009; Cavina-Pratesi et al., 2009). In particular, Cant et al. (2009), in an fMRI adaptation experiment, demonstrated texture-sensitive regions in the collateral sulcus bilaterally and in right parahippocampal cortex. It is worth noting that, in our experiment, these regions fall within those showing the main effect of nonliving things. Furthermore, because the texture-sensitive area also “overlaps regions that have been shown to be specialized for processing of scenes,” namely, the PPA (Cant et al., 2009, p. 399), it has been hypothesized that processing surface features might be particularly helpful for the “rapid categorization of scenes” (Cavina-Pratesi et al., 2009, p. 433).

Overall, the bulk of the evidence suggests that the medial anterior regions of the ventral stream are sensitive to scenes, artifacts, and textures; conversely, the more lateral regions are sensitive to living things, faces, body parts, and global shape/structural description (see the Introduction for the parallelism between category-specific and domain-specific activation patterns). This picture is clearly in keeping with our proposal of a differential reliance, across domains and categories, on two segregable processing systems that cooperate in object recognition, as hypothesized, for example, by Farah (1997). In this framework, the relationship between regions processing global shapes and regions storing structural descriptions (and therefore being sensitive to familiarity) needs some further qualification. We believe that, within the LOC, complex 3-D shapes might be represented as patterns of more elementary shape components such as the geons in Biederman’s (1990) approach. Within the LOC, familiar shapes (i.e., shapes of exemplars of familiar categories) are not represented in a different substrate from unfamiliar shapes. The difference between a pattern of geons representing a dog and a pattern of geons representing an abstract sculpture likely depends on a different weighting of the synapses between the neural populations supporting familiar and unfamiliar geon patterns. In this context, the fact that only familiar items possess structural representations, whereas unfamiliar 3-D shapes do not, only means that the patterns of visual features making up familiar shapes are characterized by a more efficient weighting of the synapses involved. A possible epiphenomenon of this different weighting might be that real objects, as compared with unreal ones, are characterized by a greater resilience to increasing noise (as in our scrambling condition) and a relatively less effortful pop-out of the complete pattern as perceptual noise is progressively reduced (as in our unscrambling condition). These characteristics, in our view, account for the reported Familiarity × Viewing condition interaction.

Reprint requests should be sent to Gian Daniele Zannino, Clinical and Behavioural Neurology Laboratory, IRCCS Fondazione Santa Lucia, Via Ardeatina 306, Rome, 00179, Italy, or via e-mail: g.zannino@hsantalucia.it.

REFERENCES

Bar, M., Tootell, R. B. H., Schacter, D. L., Greve, D. N., Fischl, B., Mendola, J. D., et al. (2001). Cortical mechanisms specific to explicit visual object recognition. Neuron, 29, 529–535.

Biederman, I. (1990). Higher-level vision. In D. N. Osherson, S. M. Kosslyn, & J. M. Hollerbach (Eds.), Visual cognition and action: An invitation to cognitive science (Vol. 2, pp. 41–63). Cambridge, MA: MIT Press.

Boccardi, M., & Cappa, S. F. (1997). Valori normativi di produzione categoriale per la lingua italiana. Giornale Italiano di Psicologia, 24, 425–436.

Cant, J. S., Arnott, S. R., & Goodale, M. A. (2009). fMR-adaptation reveals separate processing regions for the perception of form and texture in the human ventral stream. Experimental Brain Research, 192, 391–405.

Carlesimo, G. A., Casadio, P., Sabbadini, M., & Caltagirone, C. (1998). Associative visual agnosia resulting from a disconnection between intact visual memory and semantic systems. Cortex, 34, 563–576.

Cavina-Pratesi, C., Kentridge, R. W., Heywood, C. A., & Milner, A. D. (2009). Separate processing of texture and form in the ventral stream: Evidence from fMRI and visual agnosia. Cerebral Cortex, 20, 433–446.

Chao, L. L., Haxby, J. V., & Martin, A. (1999). Attribute-based neural substrates in temporal cortex for perceiving and knowing about objects. Nature Neuroscience, 2, 913–919.

Chao, L. L., Weisberg, J., & Martin, A. (2002). Experience-dependent modulation of category-related cortical activity. Cerebral Cortex, 12, 545–551.

Dell'Acqua, R., Lotto, L., & Job, R. (2000). Naming times and standardized norms for the Italian PD/DPSS set of 266 pictures: Direct comparisons with American, English, French, and Spanish published databases. Behavior Research Methods, Instruments & Computers, 32, 588–615.

Downing, P. E., Chan, A. W. Y., Peelen, M. V., Dodds, C. M., & Kanwisher, N. (2006). Domain specificity in visual cortex. Cerebral Cortex, 16, 1453–1461.

Downing, P. E., Jiang, Y., Shuman, M., & Kanwisher, N. (2001). A cortical area selective for visual processing of the human body. Science, 293, 2470–2473.

Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environment. Nature, 392, 598–601.

Farah, M. J. (1997). Distinguishing perceptual and semantic impairments affecting visual object recognition. Visual Cognition, 4, 199–206.

Friston, K. J., Glaser, D. E., Henson, R. N., Kiebel, S., Phillips, C., & Ashburner, J. (2002). Classical and Bayesian inference in neuroimaging: Applications. Neuroimage, 16, 484–512.

Gerlach, C. (2007). A review of functional imaging studies on category specificity. Journal of Cognitive Neuroscience, 19, 296–314.

Gerlach, C., Law, I., Gade, A., & Paulson, O. B. (1999). Perceptual differentiation and category effects in normal object recognition: A PET study. Brain, 122, 2159–2170.

Goodale, M. A., Milner, A. D., Jakobson, L. S., & Carey, D. P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature, 349, 154–156.

Grill-Spector, K., Kourtzi, Z., & Kanwisher, N. (2001). The lateral occipital complex and its role in object recognition. Vision Research, 41, 1409–1422.

Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., & Malach, R. (1999). Differential processing of objects under various conditions in the human lateral occipital complex. Neuron, 24, 187–203.

Grill-Spector, K., Kushnir, T., Hendler, T., & Malach, R. (2000). The dynamics of object-selective activation correlate with recognition performance in humans. Nature Neuroscience, 3, 837–843.

Hadjikhani, N., Liu, A. K., Dale, A. M., Cavanagh, P., & Tootell, R. B. H. (1998). Retinotopy and color sensitivity in human visual cortical area V8. Nature Neuroscience, 1, 235–241.

Hillis, A. E., & Caramazza, A. (1995). Cognitive and neural mechanisms underlying visual and semantic processing: Implications from “Optic Aphasia”. Journal of Cognitive Neuroscience, 7, 457–478.

Hovius, M., Kellenbach, M.-L., Graham, K. S., Hodges, J. R., & Patterson, K. (2003). What does the object decision task measure? Reflections on the basis of evidence from semantic dementia. Neuropsychology, 17, 100–107.

Humphreys, G. W., & Forde, E. M. E. (2001). Hierarchies, similarity, and interactivity in object recognition: “Category-specific” neuropsychological deficits. Behavioral and Brain Sciences, 24, 453–509.

Humphreys, G. W., Riddoch, M. J., & Quinlan, P. T. (1988). Cascade processes in picture identification. Cognitive Neuropsychology, 5, 67–103.

Joseph, J. (2001). Functional neuroimaging studies of category specificity in object recognition: A critical review and meta-analysis. Cognitive, Affective, & Behavioral Neuroscience, 1, 119–136.

Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17, 4302–4311.

Kanwisher, N., Woods, R. P., Iacoboni, M., & Mazziotta, J. C. (1997). A locus in human extrastriate cortex for visual shape analysis. Journal of Cognitive Neuroscience, 9, 133–142.

Malach, R., Reppas, J. B., Benson, R. R., Kwong, K. K., Jiang, H., Kennedy, W. A., et al. (1995). Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proceedings of the National Academy of Sciences, U.S.A., 92, 8135–8139.

Martin, A., Wiggs, C. L., Ungerleider, L. G., & Haxby, J. V. (1996). Neural correlates of category-specific knowledge. Nature, 379, 649–652.

McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, Instruments & Computers, 37, 547–559.

Oldfield, R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97–113.

Op de Beeck, H. P., Haushofer, J., & Kanwisher, N. G. (2008). Interpreting fMRI data: Maps, modules and dimensions. Nature Reviews Neuroscience, 9, 123–135.

Penny, W., & Holmes, A. (2004). Random effects analysis. In R. Frackowiak, K. Friston, C. D. Frith, R. J. Dolan, C. J. Price, S. Zeki, et al. (Eds.), Human brain function (Vol. 2, pp. 843–851). San Diego, CA: Elsevier.

Price, C. J., & Humphreys, G. W. (1989). The effects of surface detail on object categorization and naming. Quarterly Journal of Experimental Psychology, A, 41, 797–828.

Riddoch, M. J., & Humphreys, G. W. (1987). Visual object processing in optic aphasia: A case of semantic access agnosia. Cognitive Neuropsychology, 4, 131–185.

Vuilleumier, P., Henson, R. N., Driver, J., & Dolan, R. J. (2002). Multiple levels of visual object constancy revealed by event-related fMRI of repetition priming. Nature Neuroscience, 5, 491–499.

Whatmough, C., Chertkow, H., Murtha, S., & Hanratty, K. (2002). Dissociable brain regions process object meaning and object structure during picture naming. Neuropsychologia, 40, 174–186.

Zannino, G. D., Buccione, I., Perri, R., Macaluso, E., Lo Gerfo, E., Caltagirone, C., et al. (2010). Visual and semantic processing of living things and artifacts: An fMRI study. Journal of Cognitive Neuroscience, 22, 554–570.