Although object perception involves encoding a wide variety of object properties (e.g., size, color, viewpoint), some properties are irrelevant for identifying the object. The key to successful object recognition is having an internal representation of the object identity that is insensitive to these properties while accurately representing important diagnostic features. Behavioral evidence indicates that the formation of these kinds of invariant object representations takes many years to develop. However, little research has investigated the developmental emergence of invariant object representations in the ventral visual processing stream, particularly in the lateral occipital complex (LOC) that is implicated in object processing in adults. Here, we used an fMR adaptation paradigm to evaluate age-related changes in the neural representation of objects within LOC across variations in size and viewpoint from childhood through early adulthood. We found a dissociation between the neural encoding of object size and object viewpoint within LOC: by 5–10 years of age, LOC demonstrates adaptation across changes in size, but not viewpoint, suggesting that LOC responses are invariant to size variations, but that adaptation across changes in view is observed in LOC much later in development. Furthermore, activation in LOC was correlated with behavioral indicators of view invariance across the entire sample, such that greater adaptation was correlated with better recognition of objects across changes in viewpoint. We did not observe similar developmental differences within early visual cortex. These results indicate that LOC acquires the capacity to compute invariance specific to different sources of information at different time points over the course of development.
Although the neural circuitry subserving face processing takes many years to reach adult-like maturity and selectivity (Scherf, Behrmann, & Dahl, 2012; Cohen Kadosh, Henson, Cohen Kadosh, Johnson, & Dick, 2010; Golarai et al., 2007; Scherf, Behrmann, Humphreys, & Luna, 2007), the functional profile of the neural regions mediating object processing, such as the lateral occipital complex (LOC), appears more adult-like by 5–8 years (e.g., Grill-Spector, Golarai, & Gabrieli, 2008; Scherf et al., 2007). Specifically, in studies that measure the magnitude of category selectivity for objects compared with either scrambled images (Golarai et al., 2007) or other object categories (e.g., houses, faces; Scherf et al., 2007), even young children appear to exhibit adult levels of object selectivity in ventral visual cortex. However, these studies have not pursued a more fine-grained exploration of this neural profile, nor have they investigated whether the computational principles underlying the recognition of specific exemplars of objects (i.e., not just categories of objects) change developmentally in the LOC. This is a central question given that recognition of individual visual objects is enabled by a wide variety of visual computations across many different object properties that might develop along different developmental trajectories. For example, it is commonly held that humans recognize objects across dramatic changes in their appearance, including changes in lighting, position, size, configuration, and pose, a characteristic that is referred to as “invariance.”
Here, we explore the developmental trajectory of both size and view invariance within the LOC, asking when object representations are coded in a size- and view-invariant manner developmentally. Moreover, we evaluate whether invariance in the neural basis of object recognition emerges as a unitary property of LOC. The alternative is that invariance to different sources of image variability arises at different times during development, which would suggest that the LOC acquires the capacity to compute invariance specific to different sources of information at different time points over the course of development.
Development of Object Recognition
Many previous studies examining developmental changes in the neural basis of visual recognition have done so by measuring neural responses to specific object classes—in particular, faces (e.g., Pascalis et al., 2011; Cantlon, Pinel, Dehaene, & Pelphrey, 2010). Within such studies, common objects have typically been considered as control stimuli, rather than the stimulus category of interest. To the extent that the developmental trajectory of nonface objects has been studied, the most common manipulation has been a contrast between the neural responses arising from viewing nonface objects and scrambled objects (e.g., Golarai et al., 2007) or nonface objects and other visual objects known to elicit category-selective activation, such as faces or scenes (Scherf et al., 2007). One of the converging results of these studies has been the observation that ventral visual object-selective neural activation in both children (ages 5–11 years) and adolescents (ages 11–16 years) is comparable, in both magnitude and extent, to the neural activation observed for adults (Golarai et al., 2007; Scherf et al., 2007), particularly in the object-selective LOC.
In contrast, the development of object recognition has been explored behaviorally in considerable detail from infancy through adolescence. Within the first months of age, infants can recognize 3-D shapes (Kaufman & Needham, 2010; Mash, Novak, Berthier, & Keen, 2006) and demonstrate some understanding of shape parts (Haaf et al., 2003). By 18–24 months, infants can recognize a familiar object simply from its prototypical shape without the aid of color or textural surface information (Pereira & Smith, 2009). These types of form recognition continue to improve rapidly along with more general visual abilities, such that by 6–9 years, acuity (Carkeet, Levi, & Manny, 1997; Ciner, Schanel-Klitsch, & Herzberg, 1996; Zanker, Mohn, Weber, Zeitler-Driess, & Fahle, 1992), contrast sensitivity (Bradley & Freeman, 1982), and perception of global form in glass patterns (Gunn et al., 2002) are all adult-like. However, recognition performance for complex objects appears to improve with age from young childhood to adolescence (e.g., Uttal, Gentner, Liu, & Lewis, 2008; Rentschler, Juttner, Osman, Muller, & Caelli, 2004). Relevant to the topic of this article, some of the difficulty in recognizing complex objects earlier in development may arise from the fact that the more complex the object, the more different it is likely to appear under varying viewing conditions. To compute object identity, regions in ventral visual cortex have to develop representations that are invariant to these varying viewing conditions.
Size invariance (i.e., the ability to perceive that an object does not change in size despite changes in retinal size with changes in viewing distance) appears to develop early. Infants as young as 4–5 months of age demonstrate knowledge that objects do not suddenly change in size. Wilcox (1999) tested infants' ability to use size information to individuate objects in an occlusion task. She found that 4.5-month-old infants looked longer (i.e., were surprised) at the visual display when an object appeared to change in size after passing behind an occluder than when the object appeared to maintain the same size. The results suggested that, when the occluder was removed, infants expected to observe multiple objects of different sizes as opposed to a single object that transformed in size behind the occluder, indicating that these young infants exhibit some size constancy for objects.
The demonstration of size constancy in infancy might lead to the prediction that this ability is entirely in place before adulthood; however, size constancy is not an all-or-none phenomenon. Specifically, many studies have shown increased RTs when the size of the to-be-recognized object has changed between viewings (e.g., Ellis, Allport, Humphreys, & Collis, 1989; Jolicoeur, 1987; Besner, 1983). One explanation is that the specific size of an object is encoded when we view it, and when there is a change in size in the subsequent viewing, some scaling transformation process is required to match the new image with the encoded image (e.g., Ullman, 1989). However, the nature of the experimental paradigm appears to affect the extent to which changes in size affect response times: When the task is to assess whether the shape is old/new, there appears to be a delay with a size change (Jolicoeur, 1987), but when the task is to name the shape, there appears to be no cost associated with a size change (Biederman & Cooper, 1992). Therefore, task demands influence the extent to which we observe size-dependent effects, and these task-based influences may be exaggerated in younger observers.
In contrast to size invariance, view invariance appears to take much longer to develop behaviorally. Although no study to date has manipulated 3-D rotation of individual objects and tested the extent of children's view invariance directly, one study that used facial stimuli revealed that younger children were less accurate than older children and adults in matching faces across changes in viewpoint (Mondloch, Geldart, Maurer, & de Schonen, 2003). A second study examined children's ability to learn unfamiliar 3-D objects made entirely of spheres. When recognition of individual objects was based solely on the configuration of the individual spheres making up the 3-D object, children (aged 8–10 years) exhibited worse recognition performance as compared with either adolescents (aged 13–14 years) or adults (Rentschler et al., 2004). Therefore, children appear to have some difficulty in processing configural information—a finding that suggests that they may also have difficulty compensating for changes in viewpoint in that rotations in depth also alter the visible configuration of an object's parts.
Whether adults consistently demonstrate view invariance—and under what circumstances—is also still unclear. Some studies report increased RTs when recognizing objects with increasing degrees of rotation from the original view (Lawson, Humphreys, & Watson, 1994; Humphrey & Khan, 1992; Jolicoeur, 1985; Corballis & McLaren, 1984), whereas other studies report recognition performance that is relatively independent of viewpoint (e.g., Biederman & Bar, 1999). Further complicating matters, it may also be the case that view invariance is achieved through view-dependent mechanisms. For example, one study trained monkeys to recognize 3-D objects and then later tested their ability to recognize the objects across changes in viewpoint (Logothetis, Pauls, Bulthoff, & Poggio, 1994). The findings revealed that one exemplar from a single viewpoint may not be sufficient for view-invariant recognition, but that as few as three key viewpoints (120° apart) may be sufficient to support view-independent recognition (Logothetis et al., 1994).
Thus, the existing findings suggest that view-invariant object recognition is accomplished by an interactive process that involves both view-dependent and view-independent processes (e.g., Hayward, 2003; Biederman & Bar, 1999; Tarr & Bulthoff, 1998). One likely possibility is that view dependence arises under conditions in which observers rely primarily on unstable features, such as more global spatial configurations of parts or image-based information, whereas view independence arises when observers rely on more stable features or structural properties of objects (e.g., Hayward, 2003; O'Toole, Edelman, & Bulthoff, 1998; Murray, Jolicoeur, McMullen, & Ingleton, 1993). Therefore, the extent to which any experimental paradigm reveals view invariance or view dependence depends on the chosen stimuli, task(s), viewpoints, and how such parameters tap into different sources of information present in the image. Critically, it is also the case that the developmental time course(s) for these multiple processes has not been investigated.
In summary, based on what is known about the development of behavioral object recognition abilities, the functional basis of the neural regions supporting object recognition may not be fully developed in late childhood or even adolescence. Note that, in a somewhat different domain, Scherf, Luna, Avidan, & Behrmann (2011) observed that, although adolescents evince adult-like category selectivity for faces in the ventral visual pathway, the way these face-selective brain regions encode information about facial identity continues to mature through early adulthood. Similarly, we address whether the functional properties of LOC remain constant across childhood through adulthood or whether, despite a consistent overall BOLD response to objects as a broad category, there are underlying developmentally driven changes in how the LOC encodes the properties of object size and viewpoint.
We chose to focus on age-related changes in the neural responses to two common visual object transformations, size and viewpoint changes, because both appear to be mediated—at least in part—by LOC (Sawamura, Georgieva, Vogels, Vanduffel, & Orban, 2005). Moreover, we chose to focus on LOC because previous studies have identified LOC as a locus where object recognition relates directly to the neural response (Grill-Spector, Kushnir, Hendler, & Malach, 2000) and recent studies have uncovered robust LOC activation in children (e.g., Grill-Spector et al., 2008; Scherf et al., 2007). In adult humans, several different experimental methods have been used to characterize the nature of object encoding in LOC, including adaptation/priming studies and multivoxel pattern analysis (e.g., Eger, Ashburner, Haynes, Dolan, & Rees, 2008; Grill-Spector et al., 1999). The method we adopt here, fMR adaptation (“fMR-a”), capitalizes on the fact that, with repeated presentations of the same image, the fMR signal is reduced (also known as “BOLD suppression”). fMR-a typically involves a two-step paradigm in which the neural response is adapted (attenuated) by repeated presentation of a single stimulus, and then recovery from adaptation is assessed when some property of the stimulus is changed; for example, the size or viewpoint of the repeatedly presented object. If a given cortical region encodes information that is not invariant over the changed property, then the BOLD signal will not be reduced further and may even show recovery in response to the new stimulus. If, however, that cortical region encodes information that is invariant to the changed property—that is, that brain area is either directly or indirectly involved in the derivation of object representations that are insensitive to the property in question—the changed stimulus will be treated as another instance of the previously repeated stimuli and the BOLD response will not recover and may even show further reduction.
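The logic of fMR-a described above can be made concrete with a normalized index. The following is a minimal sketch for illustration only, not the analysis reported in this article: the function name, the index definition, and the example values are all assumptions. It compares the response to a changed stimulus (e.g., the same object at a new size or view) against two reference conditions, identical repetitions and different objects.

```python
def adaptation_index(identical, changed, different):
    """Hypothetical normalized adaptation index (illustrative assumption).

    identical: mean response to identical repetitions of one stimulus
    changed:   mean response when one property (size or view) changes
    different: mean response to a series of different objects

    Returns 1.0 when the changed condition is as suppressed as identical
    repetition (consistent with invariance) and 0.0 when the response
    recovers fully to the different-objects level (no invariance).
    """
    full_effect = different - identical
    if full_effect <= 0:
        # Without adaptation to identical repeats the index has no meaning.
        raise ValueError("no adaptation to identical repeats; index undefined")
    return (different - changed) / full_effect


# Made-up response values in arbitrary units.
print(adaptation_index(identical=0.5, changed=0.5, different=1.0))   # full adaptation
print(adaptation_index(identical=0.5, changed=1.0, different=1.0))   # full recovery
```

Values between 0 and 1 would correspond to partial recovery, which is one reason the choice of reference conditions matters when comparing adaptation across age groups.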
In adults, the LOC shows significant adaptation to the same object presented in different sizes, indicating that the information encoded about objects is largely size-invariant (Konen & Kastner, 2008; Grill-Spector, Kourtzi, & Kanwisher, 2001; Grill-Spector et al., 1999). Whether LOC is view-invariant—that is, shows adaptation to the same object irrespective of viewpoint—is less clear. Some studies have reported a lack of (or only weak) evidence for view invariance (Andresen, Vinberg, & Grill-Spector, 2009; Grill-Spector & Malach, 2001; Grill-Spector et al., 1999), whereas others have offered stronger evidence of view adaptation in LOC (e.g., Konen & Kastner, 2008; James, Humphrey, Gati, Menon, & Goodale, 2002). The discrepancy presumably stems from the nature of the stimuli, as some stimuli may elicit greater view dependence relative to other stimuli (e.g., a donut has only a small number of “aspects,” but a complex object such as a car may give rise to a wide variety of geometrically different views, as well as “catastrophic” or unstable accidental views; Tarr & Kriegman, 2001). Thus, as is the case when assessing view invariance behaviorally, the magnitude of adaptation will be contingent on the type of stimuli selected.
Given that object perception and recognition behavior continue to develop beyond early childhood and even into adolescence, our study explored view and size invariance in LOC in both children and adolescents. To this end, we created a set of novel objects for use as stimuli in experiments designed to assess developmental changes in behavioral and neural sensitivity to size and viewpoint changes. We chose to use images of novel objects to equate the amount of visual experience across age groups because, necessarily, adults have more visual experience with most common objects as compared with children—experience being one factor that affects the neural coding of objects (Gauthier, Tarr, Anderson, Skudlarski, & Gore, 1999). Moreover, given the inconsistencies in the adult literature regarding view invariance within LOC, we also considered whether LOC adaptation may be contingent on the geometry of the objects by characterizing the behavioral view dependency of each individual novel object for adults before selecting the stimuli used in the developmental fMR-a.
Given the extant literature (e.g., Pereira & Smith, 2009), our central prediction was that observers of all ages would exhibit both behavioral and neural (within LOC) size invariance, but that younger children and perhaps even adolescents would not exhibit behavioral or neural invariance across object view. That is, we predicted that we would observe early sensitivity to viewpoint changes, with an increase in both behavioral and neural view invariance with increasing age. This pattern of results would support the notion that there is a prolonged developmental trajectory within the human ventral visual pathway that is associated with the onset of increasingly more invariant visual object representations. In addition, we examined a control brain area, early visual cortex (EVC), to assess whether size and view invariance, if present, are properties specific to LOC.
Experiment 1 examined whether the object stimuli to be used in the functional imaging experiment do, in fact, elicit either view/size-dependence or view/size invariance. View or size dependency is indexed by a decrease in perceptual performance—reduced accuracy or longer RTs—proportional to an increase in the angular or size disparity, respectively, between two objects. In contrast, view or size invariance is indexed by the absence of any significant change in perceptual performance across different angular or size disparities. As mentioned earlier, beyond generally assessing dependence/invariance, measuring the psychophysical properties of each individual stimulus object is important in light of the fact that behavioral and neural responses can vary significantly depending on specific object geometry (e.g., Andresen et al., 2009).
Sixteen college-aged individuals (5 men and 11 women, mean age = 19 years) participated in the study for course credit or $7. Each session took approximately 1 hr. All participants had normal or corrected-to-normal vision and provided written consent to a protocol approved by the institutional review board at Carnegie Mellon University.
Fifteen novel 3-D objects were created to have a unique central part and two smaller parts, with no two objects having the same two appendages (Figure 1). The two appendages were placed so that neither was obscured even in the two most extreme views of 60° and −60° rotation. Each object subtended 10° of visual angle. The stimuli were designed to ensure that the chosen manipulation did not result in, for example, strongly unusual views or occlusion of parts.
To assess size invariance, each object was resized to be 25%, 50%, 75%, 100%, 125%, and 150% of the original size. To assess view invariance, each object was rotated from −60° to 60° in the depth plane and captured in 10° increments (i.e., −60°, −50°, −40°, −30°, −20°, −10°, 0°, 10°, 20°, 30°, 40°, 50°, 60° rotation). In addition, scrambled versions of the original 15 objects were created by dividing the images into a 25 × 25 grid and randomly rearranging the cells of the grid.
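The grid-scrambling procedure can be sketched as follows. This is a minimal illustration under stated assumptions rather than the software actually used: the image is represented as a plain list of pixel rows, its dimensions are assumed to be divisible by the grid size, and the function name `scramble_image` and the seeding scheme are hypothetical.

```python
import random


def scramble_image(pixels, grid=25, seed=None):
    """Return a copy of an image with its grid cells randomly rearranged.

    pixels: list of rows, each row a list of pixel values.
    grid:   number of cells along each side (25 x 25 in the text).
    """
    height, width = len(pixels), len(pixels[0])
    if height % grid or width % grid:
        raise ValueError("image dimensions must be divisible by the grid size")
    cell_h, cell_w = height // grid, width // grid

    # Cut the image into grid * grid cells, stored in row-major order.
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            rows = pixels[gy * cell_h:(gy + 1) * cell_h]
            cells.append([row[gx * cell_w:(gx + 1) * cell_w] for row in rows])

    # Randomly permute the cells (seeded so a scramble can be reproduced).
    random.Random(seed).shuffle(cells)

    # Paste the permuted cells back into an image of the original size.
    out = [[None] * width for _ in range(height)]
    for index, cell in enumerate(cells):
        gy, gx = divmod(index, grid)
        for dy, cell_row in enumerate(cell):
            out[gy * cell_h + dy][gx * cell_w:(gx + 1) * cell_w] = cell_row
    return out
```

Because the cells are only permuted, the scrambled image keeps the pixel values of the original while its global shape is destroyed.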
The behavioral test required observers to determine whether two objects, presented sequentially, were the same or different despite changes in size or view. Changes in size and view were tested in separate blocks, the order of which was counterbalanced across participants. Each trial began with a fixation cross (750 msec), followed by the first object (750 msec), then a scrambled object (500 msec), followed by the second object, which remained present until a response was made. Participants were instructed to decide with a button press whether the objects were of the same identity or not.
In the size block, there were three conditions: the same-object-same-size condition, the same-object-different-sizes condition, or the different objects condition in which two different objects were shown. For the same-object-different-sizes condition, object pairs were created with a 25%, 50%, or 75% size difference (e.g., for a 25% difference, the first object would be shown at 100% and the second object would be shown at 125% of the original size). Each object was shown at each level of size difference (four levels: 0%, 25%, 50%, or 75%) twice, for a total of 120 same trials (same-object-same-size, same-object-different-sizes) and 120 different (different objects) trials. The exact pairings of a size difference (e.g., a 25% difference could be 25–50% or 125–150% or any pair in between) were counterbalanced across objects.
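The size pairings and trial counts in this block can be checked with a short sketch. The size levels and counts come from the text; the function names are hypothetical, and the sketch assumes that the smaller size is always shown first, as in the 100% to 125% example, which the text does not state as a general rule.

```python
SIZES = [25, 50, 75, 100, 125, 150]  # percent of original size, from the text


def size_pairs(difference):
    """All (first, second) size pairs that differ by `difference` percent.

    Assumption: the smaller size is listed first.
    """
    return [(s, s + difference) for s in SIZES if s + difference in SIZES]


def count_same_trials(n_objects=15, levels=(0, 25, 50, 75), repeats=2):
    """Number of 'same' trials: every object at every size difference, twice."""
    return n_objects * len(levels) * repeats


print(size_pairs(25))        # candidate pairings for a 25% difference
print(count_same_trials())   # 15 objects x 4 levels x 2 repeats
```

Under this scheme a 75% difference leaves only three candidate pairings, whereas a 25% difference leaves five, which is why counterbalancing the exact pairings across objects is needed.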
In the view block, there were also three conditions: the same-object-same-view condition (identical image repeated), the same-object-different-views condition in which the same object was shown in different views, and the different objects condition. For same trials, each of the 15 objects was shown at 0° rotation (same-object-same-view) and at 30°, 60°, or 90° rotation (same-object-different-views) four times for a total of 240 same trials (15 objects × 4 views × 4 times). To equate the number of “same” versus “different” trials, there were also 240 different trials in which two different objects were shown. Because there has been some discrepancy in past studies about view invariance in LOC, we used twice as many trials in the view block to ensure that we could adequately capture the individual variability in performance across a wide range of views.
Data were analyzed separately for each block. As expected, overall accuracy was high for the Size block. Mean accuracy for the different trials (i.e., two different objects presented sequentially) was 97% correct. For “same” trials (i.e., same object with or without a size change), accuracy was 97% for 0% size difference (i.e., same-object-same-size condition), and for the three levels of the same-object-different-sizes condition, accuracy was 98% for 25% size difference, 94% for 50% size difference, and 95% for 75% size difference. Mean RT did not increase linearly with increasing size difference, F(1, 14) = 1.66, p = .22, which is consistent with invariance across size (see Figure 2).
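A linear trend of this kind can be tested with a within-subject polynomial contrast. The sketch below shows one standard way to do so and is not necessarily the exact procedure used here: each participant's mean RTs at the four size differences are weighted by the linear contrast (−3, −1, 1, 3), and the resulting scores are tested against zero, with F equal to the squared one-sample t. The function name and the RT values are made up for illustration.

```python
import math
import statistics

# Orthogonal polynomial weights for a linear trend over 4 equally spaced levels.
LINEAR_WEIGHTS = (-3, -1, 1, 3)


def linear_trend_F(rt_by_subject):
    """Linear-trend F for a one-factor within-subject design.

    rt_by_subject: one sequence per participant holding the mean RT at each
    of the four levels, ordered from smallest to largest disparity.
    Returns (F, (df_effect, df_error)).
    Note: raises ZeroDivisionError if all contrast scores are identical.
    """
    scores = [sum(w * rt for w, rt in zip(LINEAR_WEIGHTS, rts))
              for rts in rt_by_subject]
    n = len(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    t = statistics.mean(scores) / standard_error
    return t * t, (1, n - 1)


# Made-up mean RTs (msec) for three hypothetical participants.
example = [(500, 510, 520, 530), (600, 600, 600, 600), (550, 560, 555, 570)]
F, df = linear_trend_F(example)
print(F, df)
```

A nonsignificant F in such a test indicates only that no reliable linear increase was detected; it does not by itself establish that performance is identical across levels.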
Overall accuracy was also high for the View block, with a mean accuracy of 94% on different trials and for same trials, 96% for 0° rotation, 94% for 30° rotation, and 91% for 60° and 90° rotation between the two presentations of the same object. Across all the objects, the mean RT of the participants did not increase linearly with increasing degrees of rotation, F(1, 14) = 2.39, p = .15. However, closer examination of the data on an object-by-object basis revealed that some objects were more view-dependent than others. As a result, we categorized the objects into three groups based on visual inspection by plotting RT against degree of rotation for each individual object (see Figure 3). Objects that elicited a linear increase in RT with greater rotation, F(1, 14) = 12.58, p < .01, were classified as “view-dependent.” Those that did not elicit an increase in RT with rotation were classified as “view-independent.” These classifications were used in the subsequent fMR-a study (Figure 3B). The remaining five objects had a less interpretable pattern of RT results, and these five objects were used in the same-object-same-view condition in Experiment 2, in which the same stimulus was shown repeatedly so that there was no manipulation of viewpoint (Figure 4). All 15 objects were used in the different-objects condition, however, in a unique view not shown in any of the other conditions.
As alluded to earlier, object geometry, particularly with respect to the appearance and disappearance of object features over changes in viewpoint, has a significant impact on the degree to which observers exhibit view dependency or view invariance (Hayward & Williams, 2002; Tarr & Kriegman, 2001; Hayward & Tarr, 1997; Edelman & Bülthoff, 1992). At the same time, different configurations of image contours and parts appear to be more or less stable across changes in viewpoint (Tarr, Bülthoff, Zabinski, & Blanz, 1997). As a result, it is difficult, if not impossible, to objectively quantify the degree to which these different aspects of objects impact view invariance. Therefore, the best method for assessing a given object's view stability is to do so empirically—by measuring the degree to which observers exhibit view dependency when recognizing that object. Following this approach, our empirical results from Experiment 1 allowed us to classify the novel objects as more “view-dependent” or “view-independent” and then carefully assess how object invariance emerges in LOC during development.
Having developed and assessed a set of object stimuli that, in adults, give rise to viewpoint-dependent or viewpoint-independent behavior, we used these stimulus objects to examine the neural basis of size and view invariance across multiple age groups.
Participants included 15 adults (10 men and 5 women, age range = 18–34 years, mean age = 21.7 years; none of whom participated in the behavioral study reported above), 16 adolescents (9 boys and 7 girls, age range = 11–17 years, mean age = 14.4 years), and 10 children (6 boys and 4 girls, age range = 5–10 years, mean age = 7.5 years). Only participants who completed both the fMRI localizer and at least one adaptation task in their entirety, with minimal motion (no more than 4 mm or 1.33 voxels in either task), were included in the analyses. As a result, an additional two adolescents and five children who participated in the study were excluded from the final data analysis because of excessive motion. All participants were healthy with no history of neurological or psychiatric disorders in themselves or in their first-degree relatives, as determined in an interview with the participants or their parents. All were right-handed and had normal or corrected vision. Participants and/or their legal guardians provided informed consent before participating in the study. All experimental procedures complied with the standards of the University of Pittsburgh and Carnegie Mellon University institutional review boards. Participants were paid $30/hr for their participation.
General Procedure and Imaging Parameters
Immediately before the scanning session, all participants were trained to lie still for 15 min in a mock scanner that simulated the noise and confinement of an actual MR scanner. During the scanning session, the stimuli were displayed on a rear-projection screen located inside the MR scanner. EPI BOLD images were acquired in 35 AC–PC aligned slices on a Siemens 3T Allegra scanner (Washington, DC), covering most of the brain and all of the occipital and temporal lobes (repetition time = 3000 msec; echo time = 35 msec; 64 × 64 matrix; 3 mm slice thickness; 3.203 × 3.203 mm in-plane resolution). Anatomical images were acquired using a 3-D MP-RAGE pulse sequence with 192 1-mm, T1-weighted, straight sagittal slices. The data were analyzed using Brain Voyager QX (Brain Innovation, Maastricht, Netherlands). Preprocessing of functional images included 3-D motion correction, filtering out of low frequencies, and resampling the voxels to 1 mm³. None of the functional data were smoothed.
Participants who moved more than 4 mm (1.33 voxels) were not included in the analyses. For the remaining participants, separate one-way ANOVAs on each of the six motion dimensions in each task revealed no age group differences in motion artifacts (all ps > .05), except for translation in the z axis during the LOC localizer experiment, F(2, 41) = 4.60, p = .02, and rotation in the x axis in the view adaptation experiment, F(2, 41) = 4.25, p = .02. In both instances, the children's group (LOC = 0.61 mm, view adaptation = 0.45 mm) showed greater motion than the adults (LOC = 0.31 mm, view adaptation = 0.21 mm) and adolescents (LOC = 0.21 mm, view adaptation = 0.23 mm). To minimize the contribution of residual motion artifacts to group differences in activation patterns, the relevant motion parameters were used as covariates in all subsequent analyses involving the LOC localizer (translation in the z axis) and the view adaptation experiment (rotation in the x axis).
The time-series images for each brain volume in each participant were analyzed for stimulus category and/or experimental condition differences in a fixed-factor general linear model (GLM). The GLM was computed on the z-normalized raw signal in each voxel. Each of the categories/conditions was defined as a separate predictor and modeled with a box-car function, which was convolved with a canonical hemodynamic response function to accommodate the delay in the BOLD response. The time-series images were then spatially normalized into Talairach space, an approach that has been validated in previous developmental studies (Burgund et al., 2002), before being analyzed for age group differences.
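The construction of a single GLM predictor can be sketched as follows. This is an illustration under stated assumptions, not the Brain Voyager implementation: the double-gamma parameters below are a commonly used parameterization of the canonical hemodynamic response and are not taken from the text, the block onset is made up, and all function names are hypothetical. Only the repetition time (3 sec) and the 12-sec block duration come from the article.

```python
import math

TR = 3.0  # seconds; repetition time reported in the text


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF at time t (sec). Parameters are assumed defaults."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * math.exp(-t) / math.gamma(under_shape)
    return peak - ratio * under


def boxcar(n_volumes, block_onsets_s, block_duration_s, tr=TR):
    """1.0 for volumes acquired during a block of the condition, else 0.0."""
    series = [0.0] * n_volumes
    for onset in block_onsets_s:
        for v in range(n_volumes):
            if onset <= v * tr < onset + block_duration_s:
                series[v] = 1.0
    return series


def convolve_with_hrf(series, tr=TR, hrf_length_s=30.0):
    """Discrete convolution of a box-car with the HRF sampled once per TR."""
    kernel = [double_gamma_hrf(k * tr) for k in range(int(hrf_length_s / tr) + 1)]
    return [sum(series[v - k] * kernel[k]
                for k in range(len(kernel)) if v - k >= 0)
            for v in range(len(series))]


def zscore(series):
    """z-normalize a time course (population standard deviation)."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / sd for x in series]


# One 12-sec block starting 12 sec into a hypothetical 20-volume run.
predictor = convolve_with_hrf(boxcar(20, [12.0], 12.0))
print([round(p, 3) for p in predictor])
```

The convolved predictor rises and peaks several seconds after block onset, which is the delay the box-car alone cannot capture. In a full analysis, one such predictor per condition would be fit to each voxel's z-normalized time course.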
LOC Localizer Task
The LOC localizer task allowed us to identify independently the LOC in each hemisphere of each individual participant. The LOC localizer included 10 blocks of common objects (e.g., clock, airplane, binoculars, etc.) and 10 blocks of scrambled objects, the order of which was randomized for each participant. The color images of 120 objects were downloaded from the Internet and were all sized to 300 pixels × 300 pixels (8° of visual angle). Scrambled images were created by dividing the object images into a 25 × 25 grid and then randomly rearranging the cells of the grid. Each block consisted of 12 trials, and in each trial, an image of an object or scrambled object was shown for 800 msec followed by 200 msec of a fixation cross. The task began with a 12-sec block of fixation, followed by a 12-sec block of abstract pattern stimuli, which was excluded from the statistical analyses, and ended with a 15-sec block of fixation. In each block, 2 of the 12 trials contained a large red circle around the object or around the scrambled image. Participants were asked to press a button, on a customized glove button-box, whenever the red circle appeared (two trials per block). This procedure was designed to engage attention to the center of the screen in participants of all ages. The LOC localizer was always run before the adaptation experiments.
Size and View Adaptation Experiments
Both adaptation experiments were modeled after that used by Avidan and colleagues (2005) in which blocks of 12 stimuli were interleaved with blocks of fixation (a central fixation cross). The stimulus blocks lasted 12 sec and the interleaving fixation blocks were 6 sec. Within a stimulus block, each item was presented for 800 msec followed by 200 msec of fixation. The stimuli were offset by 1° visual angle above, below, to the left, or to the right of the center of the screen in a pseudorandom order so that stimuli were not identical even in the same-object-same-size/view conditions. The task began with a 21-sec block of fixation, followed by a 12-sec block of abstract pattern stimuli, which was excluded from the statistical analyses, and ended with a 15-sec block of fixation. As in the Localizer task, we engaged attention across all blocks by instructing participants to indicate via button press whenever a red circle was present in the display. Importantly, the red circle encompassed the entire stimulus with the result that attending to the location of the red circle enhanced perception of the stimulus itself. The Size and View Adaptation experiments were executed in separate runs, with order counterbalanced across participants.
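The within-block trial structure can be sketched as follows. The 800-msec and 200-msec durations, the 12 items per block, and the four 1° offsets come from the text; the balancing of offset directions within a block, the function name, and the trial record format are assumptions made for illustration.

```python
import random

# Offsets of 1 deg of visual angle from screen center, as (x, y).
OFFSETS_DEG = {"above": (0, 1), "below": (0, -1), "left": (-1, 0), "right": (1, 0)}


def build_block(stimuli, seed=None, stim_ms=800, fix_ms=200):
    """Return (trials, block_duration_ms) for one stimulus block.

    Assumption: offset directions are used equally often within a block and
    then shuffled; the text says only that the order was pseudorandom.
    """
    rng = random.Random(seed)
    names = list(OFFSETS_DEG)
    directions = names * (len(stimuli) // len(names))
    # Fill any remainder with distinct randomly chosen directions.
    directions += rng.sample(names, len(stimuli) - len(directions))
    rng.shuffle(directions)

    trials, onset = [], 0
    for stimulus, direction in zip(stimuli, directions):
        trials.append({
            "stimulus": stimulus,
            "offset_deg": OFFSETS_DEG[direction],
            "onset_ms": onset,
        })
        onset += stim_ms + fix_ms  # stimulus followed by brief fixation
    return trials, onset


# A same-object block: one hypothetical object label repeated 12 times.
trials, duration_ms = build_block(["object_A"] * 12, seed=0)
print(duration_ms)  # 12 trials x (800 + 200) msec
```

With 12 items of 1 sec each, a block occupies exactly four volumes at the 3-sec repetition time.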
Note that we specifically chose to adopt a block design to test the representational capabilities of LOC in developing populations. Such a design is considered to generate stronger adaptation (compared with an event-related design), and we wanted to ensure that the absence of adaptation, especially in the younger children, could not be attributed to an experimental design that lacked sufficient power (see Scherf et al., 2011).
Size adaptation experiment
The stimuli for the size adaptation experiment included the novel objects (from Experiment 1) that were resized from 4° visual angle to 16° visual angle in 1° increments (i.e., 4°, 5°, 6°, 7°, 8°, 9°, 10°, 11°, 12°, 13°, 14°, 15°, 16° visual angle). There were three conditions presented in separate blocks: same-object-same-size, same-object-different-sizes, and different-objects. In the same-object-same-size block, the same object was shown at the same size 12 times (but in slightly different locations on the screen). In the same-object-different-sizes block, the same object was shown in different sizes 12 times. In the different-objects block, 12 different objects were shown at a range of random sizes. There were five blocks per condition for a total of 15 size blocks.
View adaptation experiment
The view adaptation stimuli included the novel objects (10° visual angle), but they were rotated from −60° to 60° in 10° increments (i.e., −60°, −50°, −40°, −30°, −20°, −10°, 0°, 10°, 20°, 30°, 40°, 50°, 60° rotation). There were four conditions (see Figure 4): same-object-same-view, same-object-different-views (separate view-dependent and view-independent blocks), and different-objects. In the same-object-same-view block, the same object was shown from the same view 12 times (at slightly different positions on the screen). In the same-object-different-views blocks, the same object was shown in different views 12 times, but we presented “view-dependent” versus “view-independent” objects (as classified in Experiment 1) in separate blocks. In the different-objects block, 12 different objects were shown from previously unseen views. There were five blocks per condition (20 blocks total; four object conditions and fixation).
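As a consistency check on the design parameters reported for the two adaptation experiments, the following sketch recomputes the block duration, the number of size and rotation levels, and the block counts from the values given in the text. The variable names are illustrative only.

```python
# Values below are taken from the text; the names are illustrative only.
STIM_MS = 800            # stimulus duration per trial
FIX_MS = 200             # fixation following each stimulus
TRIALS_PER_BLOCK = 12
BLOCKS_PER_CONDITION = 5

# Duration of one stimulus block in milliseconds.
block_ms = TRIALS_PER_BLOCK * (STIM_MS + FIX_MS)

# Stimulus levels: 4 to 16 deg in 1-deg steps, and -60 to 60 deg in 10-deg steps.
sizes_deg = list(range(4, 17))
rotations_deg = list(range(-60, 61, 10))

size_conditions = [
    "same-object-same-size",
    "same-object-different-sizes",
    "different-objects",
]
view_conditions = [
    "same-object-same-view",
    "same-object-different-views (view-dependent)",
    "same-object-different-views (view-independent)",
    "different-objects",
]

size_blocks = BLOCKS_PER_CONDITION * len(size_conditions)
view_blocks = BLOCKS_PER_CONDITION * len(view_conditions)

print(block_ms, len(sizes_deg), len(rotations_deg), size_blocks, view_blocks)
```

The computed values agree with the reported 12-sec stimulus blocks, 15 size blocks, and 20 view blocks. Each experiment has 13 stimulus levels but only 12 trials per block, so a single block cannot contain every level; how levels were sampled within a block is not stated in the text.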
Behavioral Recognition Task
Following the scan, a subset of the participants (10 adults, 11 adolescents, and 7 children) performed a sequential same/different behavioral task (modeled after Experiment 1) on a laptop computer using the same stimuli from the fMRI adaptation experiments. Not all participants were available to participate in the behavioral task either directly after their scan or on a separate day because of scheduling difficulties. This additional data collection was done to obtain independent evidence for the perceptual functions of this group of participants. For the majority of the participants, this task occurred on the same day as the fMRI experiment, but some participants completed the behavioral portion at a subsequent time (within a month of the scan). Participants saw two objects presented sequentially: The first object appeared for 750 msec, followed by a scrambled object mask for 500 msec, and then the second object appeared and remained on the screen until a response was made. Participants were required to decide whether the objects had the same or different identity, despite any changes in size or view. There were 112 size trials (56 same) and 112 view trials (56 same), which were executed in separate blocks, the order of which was counterbalanced across participants. Accuracy and RT scores from the behavioral experiment were used to evaluate potential correlations between behavioral and neural responses (in magnitude) in the LOC during the fMR adaptation experiments.
Group level LOC maps
To identify regions of object selectivity, separate group maps of object-related activation were computed for each age group. The time-series images from each participant within an age group were submitted to a whole-brain voxelwise random effects GLM in which the category (object, scrambled) was a fixed factor and participant was a random factor. Object selectivity was defined by the contrast: objects–scrambled images. Each group-level contrast map was corrected for multiple comparisons (p < .01) separately using a Monte Carlo simulation, which estimates the likelihood of obtaining clusters of significantly active neighboring voxels of different sizes. The implemented solution for cluster-level thresholding is based on the approach described by Forman and colleagues (1995) but has been extended and generalized from 2-D to 3-D statistical maps (Goebel, Esposito, & Formisano, 2006). In combination with relaxed single-voxel thresholds, calculated cluster extent thresholds are applied to the statistical map ensuring that a global error probability of p < .05 is met. This approach does not require spatial smoothing. The computation of the minimum cluster threshold is accomplished via Monte Carlo simulation of the random process of image generation, followed by the injection of spatial correlations between neighboring voxels, voxel intensity thresholding, and cluster identification. The product is a minimum cluster size threshold that yields 1% (or less) protection against false-positive detection at the cluster level. For adolescents and adults, this required eight contiguous voxels at a t value ≥ 3.0, and for children, it required eight contiguous voxels at a t value ≥ 3.3. Figure 5 shows that all three groups exhibited strong, consistent bilateral object-related activation in the LOC.
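The logic of a Monte Carlo cluster extent threshold can be sketched in simplified form. This is not the implementation used in the analysis software: it assumes independent voxels (the injection of spatial correlations described above is omitted), face-connected clusters, and a small toy volume. The function names and default parameters are hypothetical.

```python
import random
from collections import deque

# Face-adjacent neighbors in 3-D (an assumption; other connectivity rules exist).
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def max_cluster_size(active):
    """Size of the largest face-connected cluster among (x, y, z) voxels."""
    remaining = set(active)
    largest = 0
    while remaining:
        queue = deque([remaining.pop()])
        size = 1
        # Breadth-first search over neighbors that are still unvisited.
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS:
                nb = (x + dx, y + dy, z + dz)
                if nb in remaining:
                    remaining.remove(nb)
                    queue.append(nb)
                    size += 1
        largest = max(largest, size)
    return largest


def cluster_extent_threshold(shape, voxel_p, alpha, n_sim=100, seed=0):
    """Smallest cluster size k such that, in simulated noise volumes, the
    proportion of volumes containing a cluster of at least k voxels is <= alpha."""
    rng = random.Random(seed)
    nx, ny, nz = shape
    maxima = []
    for _ in range(n_sim):
        # Each voxel independently exceeds the single-voxel threshold with
        # probability voxel_p (no spatial smoothness is modeled here).
        active = [(x, y, z)
                  for x in range(nx) for y in range(ny) for z in range(nz)
                  if rng.random() < voxel_p]
        maxima.append(max_cluster_size(active))
    k = 1
    while sum(m >= k for m in maxima) / n_sim > alpha:
        k += 1
    return k


strict = cluster_extent_threshold((12, 12, 12), voxel_p=0.01, alpha=0.05)
lenient = cluster_extent_threshold((12, 12, 12), voxel_p=0.05, alpha=0.05)
```

The sketch shows the trade-off the text refers to: a more relaxed single-voxel threshold lets more noise voxels through, so a larger minimum cluster extent is needed to hold the cluster-level false-positive rate at the same level.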
Defining individual LOC ROIs
The right and left LOC were defined separately in each participant individually, using a standard contrast of objects versus scrambled images (e.g., Grill-Spector et al., 1998) from the Localizer task. In each participant, the contrast was computed on the z-transformed raw signal and was corrected for multiple comparisons using the false discovery rate (FDR) procedure (Genovese, Lazar, & Nichols, 2002) with q < .001 for all adults and adolescents and eight children. However, two children had a statistically discernible LOC that was measurable only at q < .01, and for those children, solely for the purpose of defining the LOC, we used this more lenient threshold criterion. The LOC included the set of contiguous object-selective voxels in the posterior fusiform gyrus and lateral occipitotemporal gyrus. Once these regions were defined, an ROI-based GLM was computed on the time-series data for each participant separately with condition as the fixed factor, which generated betaweights that corresponded to the object and scrambled images categories. The magnitude of selectivity (betaweights), size (number of voxels), and location of the resulting ROIs were compared for age-group differences. Previous studies have verified the feasibility of making direct statistical comparisons in hemodynamic response time courses between children and adults (Kang, Burgund, Lugar, Petersen, & Schlaggar, 2003).
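The FDR procedure referred to above is commonly implemented as the Benjamini-Hochberg step-up rule. The sketch below assumes that form, with the correction constant set to 1, and uses made-up p-values. The function name `fdr_threshold` is hypothetical and does not correspond to any function in the analysis software.

```python
def fdr_threshold(p_values, q):
    """Largest p-value that survives Benjamini-Hochberg FDR control at level q.

    Returns None when no p-value survives.
    """
    ordered = sorted(p_values)
    n = len(ordered)
    threshold = None
    # Step-up rule: keep the largest ordered p-value with p_(i) <= (i / n) * q.
    for rank, p in enumerate(ordered, start=1):
        if p <= rank / n * q:
            threshold = p
    return threshold


# Hypothetical voxelwise p-values, for illustration only.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
cutoff = fdr_threshold(p_vals, q=0.05)
```

All voxels with p-values at or below the returned cutoff are declared active. A larger q (e.g., .01 rather than .001) yields a cutoff that is at least as permissive, which is why it can reveal an ROI that is not detectable at the stricter level.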
Defining individual EVC ROIs
As a control region, we chose to define EVC in each individual in each hemisphere separately by contrasting activation to scrambled objects > objects, using an FDR threshold of q < .01 for the majority of participants; this contrast allowed us to define areas that are active to any visual stimulus while excluding LOC. When activation could not be identified at q < .01, we relaxed the threshold to q < .05 (four children, one adolescent, three adults). EVC was identified in each hemisphere separately in the most ventral axial slice of the occipital pole in which the calcarine sulcus could be visualized. We selected activation posterior to the calcarine sulcus on the axial slice and included contiguous voxels of activation that extended a maximum of 10 mm above or below the axial slice in which the calcarine sulcus was identified. We used the transverse occipital sulcus as the lateral boundary for activation.
ROI-based adaptation analyses
Within each individual participant, an ROI-based GLM was computed on the time-series data from each adaptation experiment separately in the individually defined right and left LOC and EVC ROIs, with condition as the fixed factor. This generated a set of betaweights for all conditions for each participant (Size: same–same, same–different, different; View: same–same, same–diff-dependent, same–diff-independent, different). These betaweights were then submitted to mixed model GLMs with the factors of condition and group to determine the presence of any age differences in the fMR adaptation effect.
The group-defined LOC is shown separately for each age group in Figure 5. The centers of mass of the LOC in Talairach coordinates were as follows for the different groups: Adults LH: x = −37, y = −65, z = −5; Adults RH: x = 35, y = −63, z = −4; Adolescents LH: x = −37, y = −67, z = −5; Adolescents RH: x = 34, y = −63, z = −5; Children LH: x = −35, y = −65, z = −5; Children RH: x = 31, y = −64, z = −7. We examined whether there were any group differences in the size of the individually defined LOC. The number of voxels in each participant's LOC was compared in an ANOVA with Hemisphere (left vs. right) as a within-subject variable and Group as a between-subject variable. There was no main effect of Hemisphere, F(1, 37) = 0.2, p = .7, no main effect of Group, F(2, 37) = 1.6, p = .2, and no Hemisphere × Group interaction, F(2, 37) = 1.7, p = .2. Therefore, when defined individually, all groups showed LOC activation of comparable size. Any subsequent group differences, therefore, cannot be attributed to a reduced statistical estimate for an ROI in the children compared with the other groups.
To examine the difference in the BOLD signal in LOC in response to viewing objects versus scrambled objects across age, a GLM analysis was performed on the time-series data within each participant's individually defined LOC. This resulted in betaweights for each condition in each ROI. These betaweights were submitted to an omnibus repeated-measures ANOVA with Hemisphere (left vs. right) and Condition (objects vs. scrambled objects) as within-subject variables and Group (adults, adolescents, and children) as a between-subject variable. There was a significant main effect of Condition, F(1, 38) = 352.9, p < .001, with stronger activation when viewing objects compared with scrambled objects. There was also a significant main effect of group, F(2, 38) = 7.3, p < .01. Tukey's HSD post hoc analysis revealed that the magnitude of activation was significantly larger for the adults and adolescents than the children (all p < .05), with no difference between adults and adolescents (p = .8; see Figure 6). Importantly, there were no significant interactions with the group variable: Hemisphere × Group, F(2, 38) = 1.0, p = .4, Condition × Group, F(2, 38) = 1.3, p = .3, Hemisphere × Condition × Group, F(2, 38) = 1.2, p = .3. Therefore, although overall BOLD signals were weaker in children than in adults and adolescents, there were no group differences in the differential sensitivity of LOC to objects versus scrambled objects (i.e., no Group × Condition interaction).
To examine the effect of object size on the magnitude of the BOLD signal, a GLM analysis was performed on the time-series data within each participant's individually defined LOC. This resulted in betaweights for each condition for each person in each ROI. These betaweights were submitted to an omnibus repeated-measures ANOVA with Hemisphere (left vs. right) and Condition (same-object-same-size, same-object-different-sizes, and different-objects) as within-subject variables and Group (adults, adolescents, and children) as a between-subjects variable. As is evident in Figure 7, there was a significant main effect of Condition, F(2, 35) = 30.3, p < .001, with stronger activation for the different-objects condition compared with both same-object conditions. In other words, in all three age groups, the neural response to the two same-object conditions (i.e., same size and different sizes) elicited a more adapted response than did the different objects conditions.
This pattern of results indicates that adults, adolescents, and children all encoded similarly the identity of the objects across variations in size in the LOC in both the right and left hemispheres. There was also a significant main effect of Hemisphere, F(1, 36) = 5.4, p < .05, reflecting larger betaweights (i.e., stronger activation) in the right compared with left hemisphere across all three age groups. This is consistent with earlier findings of right hemispheric asymmetry in visual object representations in adults (Konen, Behrmann, Nishimura, & Kastner, 2011) and extends it to show that children and adolescents evince the same asymmetry.
Critically, there was no main effect of Group, F(2, 36) = 0.0, p = 1.0, and no two-way interactions involving Group [Condition × Group interaction, F(4, 72) = 0.5, p = .7; Hemisphere × Group, F(2, 36) = 0.8, p = .4]. However, the three-way interaction of Hemisphere × Condition × Group was significant, F(4, 72) = 2.65, p < .05. Visual inspection of the betaweights suggested that the three-way interaction arises from hemispheric differences across groups and how these differences might interact with the condition effect. To investigate this possibility, we compared betaweights across hemispheres for each of the three age groups separately. There was a main effect of Condition in all age groups (all ps < .01), but the Hemisphere × Condition interaction was only significant in adolescents, F(2, 30) = 4.9, p < .05. It appears that adolescents showed a greater adaptation effect in the left LOC than in the right LOC, although this pattern was not observed in adults and children.
A separate GLM analysis was performed on the time-series data for the view adaptation experiment within each participant's individually defined LOC. The resulting betaweights were submitted to an omnibus repeated-measures ANOVA with Hemisphere (left vs. right) and Condition (same-object-same-view [SameSame], same-object-different-views-independent [SameDiff_Indep], same-object-different-views-dependent [SameDiff_Dep], different-objects [Different]) as within-subject variables and Group (adults, adolescents, and children) as a between-subject variable (Figure 8). In contrast to the Size adaptation results, there was a significant main effect of Group, F(2, 37) = 4.9, p < .05, reflecting the greater overall activation in adults and adolescents, who did not differ from each other, relative to children. There was also a significant main effect of Hemisphere, F(1, 37) = 5.2, p < .05, reflecting greater overall activation in the right than left hemisphere. Finally, there was a main effect of Condition, F(3, 111) = 10.3, p < .001, with greater activation, on average, in the Different condition. In addition, there was a significant Condition × Group interaction, F(6, 111) = 2.4, p < .05. No other interactions were significant [Hemisphere × Group, F(2, 37) = 0.05, p = 1.0, Hemisphere × Condition, F(3, 111) = 0.9, p = .4, and Hemisphere × Condition × Group, F(6, 111) = 0.7, p = .7].
To further examine the key Condition × Group interaction, separate ANOVAs were run within each age group to evaluate the simple effect of condition in the right and left LOC. In adult right LOC, there was a significant main effect of Condition, F(3, 42) = 3.3, p < .01. Pairwise post hoc comparisons with a Bonferroni correction revealed significantly less activation in the SameSame than in the Different object condition (p < .01), revealing sensitivity to the repetition of object shape. There was also significantly less activation in the same-object-different-views-independent (SameDiff_Indep) condition than in the Different objects condition (p < .01) and in the same-object-different-views-dependent (SameDiff_Dep) condition than in the Different objects condition (p < .01). However, there was equivalent suppression of the SameSame and the SameDiff_Dep conditions (p = .7), reflecting adaptation when adults viewed the same object from different views. This pattern of results was nonetheless counterintuitive, as we see adaptation (low BOLD) for the view-dependent objects but not for the view-independent objects (view-independent objects should be easier to perceive as the same object).
There may be several reasons for this unexpected finding; aside from the fact that the objects were classified in Experiment 1 using a different group of individuals from those who completed the imaging study in Experiment 2, the task performed in the two experiments also differed (necessarily, to achieve the BOLD profile). Therefore, view-dependent effects may be task-dependent to a certain extent. We will return to this point below when assessing behavioral performance. Importantly, adults showed adaptation to the same objects shown from different views, revealing some evidence of view invariance in adult LOC. The pattern of results was identical when the analysis was conducted with the signal from the left LOC (see Figure 8B).
In the right LOC of adolescents, the main effect of Condition was significant, F(3, 45) = 5.7, p < .01. Similar to adults, there was significant adaptation as revealed in the difference between the SameSame and the Different objects condition. No other pairwise comparisons were significant (all ps > .08). Therefore, we did not find any conclusive evidence of view invariance in adolescent LOC. The same pattern held in the left LOC although, here, adolescents showed release from adaptation in the SameDiff_Dep condition, similar to the adult response. This last result may reflect the initial stages of a more adult-like response in left LOC.
Among the children, the one-way ANOVA investigating the main effect of Condition was not significant, F(3, 24) = 0.4, p = .8, indicating that there was no differentiation in children's LOC response regardless of condition. The same held true for the left LOC analysis. As a result, we did not find any conclusive evidence of view invariance in children's LOC.
Taken together, the data indicate that adults showed the strongest evidence of view invariance in LOC representations of objects, adolescents exhibited weaker evidence of such invariance, and children did not show any evidence of view invariance in their object representations at all. They did not even evince release from adaptation in the Different objects condition, which suggests that LOC does not have adult-like sensitivity even for object shape representations per se. This pattern of results is also demonstrated in Figure 9 by replotting the adaptation index (AI) by age.
Accuracy and median RTs from a same–different recognition task conducted outside the scanner were compared across the age groups (adults, adolescents, and children) in a one-way ANOVA (see Figure 10). Results showed that there was a significant main effect of Age, ps < .001, on all behavioral measures: size change accuracy, F(2, 25) = 12.0, size change RT, F(2, 25) = 17.8, view change accuracy, F(2, 25) = 11.4, view change RT, F(2, 25) = 15.9. Post hoc analyses with a Bonferroni correction revealed that there were no differences between adults and adolescents, but that in both the size change and view change experiments, children were less accurate and slower to respond than both adolescents and adults.
We also calculated an inverse efficiency score for each participant. The inverse efficiency score (expressed in msec) is equal to the median RT divided by the proportion of correct responses, calculated separately for each condition and each participant. Lower values on this measure indicate better performance (Christie & Klein, 1995; Akhtar & Enns, 1989; Townsend & Ashby, 1983). Inverse efficiency scores were used because they can discount possible speed–accuracy tradeoffs in performance that may be exaggerated in children and because they offer a single behavioral measure to be correlated with brain activity, leading to simpler and potentially more robust data analyses, especially in children.
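The inverse efficiency score defined above (median RT divided by proportion correct) can be computed as in the following sketch. The function name and the trial data are hypothetical, and the sketch assumes the median is taken over all trials in a condition; the text does not specify whether error trials were excluded from the RT median.

```python
from statistics import median


def inverse_efficiency(rts_ms, correct):
    """Median RT (msec) divided by the proportion of correct responses."""
    if not rts_ms or len(rts_ms) != len(correct):
        raise ValueError("need one RT and one accuracy flag per trial")
    accuracy = sum(correct) / len(correct)
    if accuracy == 0:
        raise ValueError("inverse efficiency is undefined at 0% accuracy")
    return median(rts_ms) / accuracy


# Hypothetical trials for one participant in one condition.
rts = [600, 700, 800, 900]
hits = [True, True, True, False]
ies = inverse_efficiency(rts, hits)  # median 750 msec / accuracy 0.75
```

A participant who responds with a median RT of 750 msec at 75% accuracy receives a score of 1000 msec, the same score as a slower participant (1000 msec) who is perfectly accurate; this is how the measure folds speed and accuracy into a single value.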
In analogous fashion, we also calculated an AI of the betaweights for the size and view fMR adaptation experiments using the following formula: (Different − Same)/(Different + Same) (e.g., Dricot, Sorger, Schiltz, Goebel, & Rossion, 2008; Konen & Kastner, 2008). We calculated a different AI for the two “same” conditions in the size experiment (SameSame and SameDiff_Size) and the three “same” conditions in the view experiment (SameSame, SameDiff_Indep, SameDiff_Dep). To determine whether there was any relationship between the cortical response profile and the behavioral performance, we ran a correlation between behavioral inverse efficiency scores and the AI scores from the size and view fMR-a experiments.
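The AI formula given above, (Different − Same)/(Different + Same), can be written directly as a small function. The betaweight values below are invented for illustration and are not data from the study; the function name is likewise hypothetical.

```python
def adaptation_index(different, same):
    """(Different - Same) / (Different + Same) for a pair of betaweights."""
    denominator = different + same
    if denominator == 0:
        raise ValueError("adaptation index is undefined when the betaweights sum to zero")
    return (different - same) / denominator


# Hypothetical betaweights for one participant's ROI in the view experiment.
betas = {"SameSame": 0.5, "SameDiff_Indep": 0.75, "SameDiff_Dep": 0.5, "Different": 1.5}

# One AI per "same" condition, each computed against the Different condition.
ai = {
    name: adaptation_index(betas["Different"], beta)
    for name, beta in betas.items()
    if name != "Different"
}
```

With positive betaweights the index ranges from −1 to 1, with values near 0 indicating no adaptation and larger positive values indicating a stronger reduction in the "same" condition relative to the Different condition. The ratio becomes hard to interpret when betaweights are negative or when their sum is close to zero.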
First, the results showed that inverse efficiency scores from the size change and view change experiments were highly correlated, r(28) = 0.95, p < .001. There were no significant correlations between the behavioral measures and AIs from the size adaptation experiment. However, there was a significant correlation between inverse efficiency scores and the AIs computed during the view adaptation experiment, in the right hemisphere same-object-same-view condition, r(28) = −0.48, p < .01, right hemisphere same-object-different-views-dependent condition, r(28) = −0.51, p < .01, left hemisphere same-object-same-view condition, r(28) = −0.51, p < .01, and left hemisphere same-object-different-views-dependent condition, r(28) = −0.51, p < .01 (Figure 11). Because we used inverse efficiency scores, lower values indicate better performance, and therefore, these results reveal that better behavioral performance was correlated with larger fMR adaptation seen in both the left and right hemispheres, but only in the view experiment.
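The direction of the brain-behavior relationship reported above can be made concrete with a short Pearson correlation sketch. The per-participant values are fabricated purely to illustrate the sign convention and are not the study's data; the function name is an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical values: inverse efficiency (msec; lower is better) and AI.
inverse_efficiency_ms = [700, 800, 900, 1000, 1200]
adaptation_idx = [0.30, 0.25, 0.22, 0.15, 0.05]
r = pearson_r(inverse_efficiency_ms, adaptation_idx)
```

Because lower inverse efficiency scores mean better performance, a negative r in this arrangement corresponds to better recognition going together with stronger adaptation, which is the pattern described for the view experiment.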
One unexpected finding was that there was greater adaptation to the same-object-different-views-dependent condition than the same-object-different-views-independent condition. In theory, it should be easier to perceive view-independent objects as being the same object; therefore, we expected greater adaptation to view-independent objects. We hypothesize that this unexpected finding may arise from the fact that view-dependence varied significantly among individual participants. Therefore, we replotted RT as a function of degree of rotation (as in Experiment 1) for the “view-dependent” and “view-independent” objects for each individual participant. We then examined the slope of the regression line. If the view-independent objects were truly perceived to be view-independent, we would expect a slope of zero, and if the view-dependent objects were truly perceived as view-dependent, we would expect a positive slope (because RT increases as a function of degree of rotation). As can be seen from Figure 12, it is not the case that the slope is zero for all participants in the “view-independent” objects trials and positive for the “view-dependent” objects trials. To assess whether this individual variability in view dependence could account for the unexpected finding in the neural data, we correlated the slope values with AI. We did not find any significant correlation in the “view-independent” condition in either left or right LOC, ps > .80; however, the slope of the regression line in the “view-dependent” condition was significantly correlated with AI in left LOC, r(28) = −0.44, p = .02, and approached significance in right LOC, r(28) = −0.36, p = .06. These findings suggest that indeed, the extent to which LOC evinces view invariance is related (with no claim towards causation) to an individual's measured behavioral demonstration of view invariance.
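The per-participant slope analysis described above amounts to an ordinary least-squares regression of RT on degree of rotation. The sketch below uses invented RT values and a hypothetical function name, and it assumes rotation is coded as a non-negative angular difference; the text does not specify the exact coding.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


rotation_deg = [0, 10, 20, 30, 40, 50, 60]

# Hypothetical participant whose RT rises 2 msec per degree (view-dependent).
rt_dependent = [500 + 2 * angle for angle in rotation_deg]
# Hypothetical participant whose RT is flat across rotation (view-independent).
rt_independent = [520 for _ in rotation_deg]

slope_dependent = ols_slope(rotation_deg, rt_dependent)
slope_independent = ols_slope(rotation_deg, rt_independent)
```

A slope near zero indicates recognition that does not slow with rotation, whereas a positive slope indicates a cost of rotation. These per-participant slopes are the quantities that were then correlated with AI.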
To assess whether size and view adaptation effects are specific to area LOC, we also conducted analogous analyses in a control region of EVC. As shown in Figure 13, for both the size and view adaptation experiments, there were no significant main effects and no significant interactions with age group (all ps > .05). The only significant interaction was a Hemisphere × Condition interaction in the size adaptation experiment, F(2, 42) = 7.01, p < .01. By visual inspection it appears that in left EVC, adults are showing a pattern consistent with size adaptation, but such a pattern was not observed in adolescents or in the right EVC.
The goal of our study was a fine-grained exploration of potential age-related changes in how the human LOC encodes invariant representations of objects. As in previous studies (e.g., Golarai et al., 2007; Scherf et al., 2007), we found that by the age of 5 years, LOC responds preferentially to images of whole versus scrambled objects, and that this ROI is of similar size to that observed for adolescents and adults. This result is consistent with previous findings which have been interpreted to indicate that LOC processing is adult-like at least in terms of defining category-level responses for objects, even in rather young children (Golarai et al., 2007; Scherf et al., 2007).
To go beyond this category-level investigation of developmental changes in the cortical profile of LOC, we incorporated two novel experimental design parameters. First, we deliberately used novel 3-D images, instead of common objects, so that we could hold previous experience with the stimuli constant for all age groups. Moreover, we empirically assessed the degree of viewpoint dependence associated with objects of different geometries, thereby independently characterizing the stimuli as to their viewpoint sensitivity. Second, we used fMR-a to more effectively interrogate the neural coding within the LOC in children and adolescents.
In a developmental neuroimaging study incorporating these features, we found a dissociation between the neural encoding of object size and object viewpoint within LOC: By age 5–10 years, area LOC demonstrates adaptation across changes in size, suggesting that LOC responses are invariant to size variations, but that LOC only demonstrates adaptation across changes in view much later in development. This pattern indicates that LOC is not functionally adult-like, both in terms of whether invariance is achieved across different object properties and in terms of its developmental trajectory. Indeed, this trajectory is somewhat longer than has been considered to date. Critically, it is not that LOC is across-the-board immature but, rather, that viewpoint invariance—typically assumed to be more challenging relative to size invariance—has a more protracted developmental signature. Finally, that we observed no age differences in the responses of EVC suggests that LOC is uniquely linked to object processing and develops into early adulthood.
The finding that the LOC in young children demonstrates size adaptation is consistent with previous research suggesting that size invariance is present in the behavior of infants, perhaps even at birth (Slater, Mattock, & Brown, 1990; Granrud, 1987; Day & McKenzie, 1981). The early emergence of size invariance developmentally is consistent with findings that demonstrate the rapid emergence of size invariance examined at different timescales. For example, one study showed rapid emergence of size invariance in the context of perceptual learning that occurs across many trials (Furmanski & Engel, 2000). In that study, adult observers were trained to correctly identify common objects that were accompanied by backward-masking; although learning did not transfer to a set of untrained objects, learning did transfer to the trained objects shown in different sizes from the training size. Another study using magnetoencephalography examined the emergence of size invariance on the timescale of a single trial and showed that (in adults) activation for size invariance appears earlier than for other transformations such as position invariance (Isik, Meyers, Leibo, & Poggio, 2014). Furthermore, they showed that the peak of this invariance was located at magnetoencephalography sensors in the vicinity of the posterior occipital lobe, consistent with our findings in LOC using fMRI. These studies suggest that size invariance is a key property of object processing that is established rapidly, both in terms of real-time processing and development.
In contrast with the adult-like size adaptation, we did not obtain any evidence for adaptation to viewpoint in children's LOC, yet in adults we see evidence of view adaptation in LOC. Although previous research has suggested that significant changes to object processing mechanisms occur in the first 2 years of life (e.g., Pereira & Smith, 2009), our findings suggest that some aspects of object processing mechanisms continue to improve beyond late childhood. This slow developmental trajectory of a neural correlate for complex object representations is consistent with findings from studies that examine the development of face representations. Many studies have demonstrated that infants and children are notoriously poor at recognizing faces from different viewing angles (e.g., Gliga & Dehaene-Lambertz, 2006; Mondloch et al., 2003), perhaps as a result of immature processing of the spatial relations among facial features (Mondloch, Le Grand, & Maurer, 2002). Indeed, in younger children, similarity judgments about objects appear to rely more on the shape of salient features than on the spatial arrangement of those features (Mash, 2006; Thompson & Markson, 1998), and sensitivity to the spatial arrangement of features continues to improve into adolescence (Scherf, Behrmann, Kimchi, & Luna, 2009; Rentschler et al., 2004). Our current findings suggest that the processing of spatial and relational information may be a critical skill not only for face identification but also for object recognition more generally.
Although configuration and relational processing are both important aspects of object perception, it is also possible that age-effects are a result of improvements in more general visual abilities associated with form perception. Examples of these more general visual abilities include contour integration (Gunn et al., 2002; Kovacs, Kozma, Feher, & Benedek, 1999) and/or cognitive limitations in the ability to attend to and/or remember multiple features simultaneously (Pereira & Smith, 2009; Uttal et al., 2008). All of these aspects may play a role in how well observers are able to extrapolate between familiar views to fully understand the shape of the object in 3-D. It is also unclear exactly what mechanisms need to be in place to allow children to acquire these skills and whether they may even be taught. A possible extension of this study that would permit an analysis of this issue might require showing children dynamic stimuli in which objects rotate across 360° and then examining their ability to extrapolate to novel views. Correlating this acquisition with LOC competence as well as other general perceptual and cognitive skills might allow us to begin adjudicating between these alternative hypotheses.
The key to successful object recognition is having an internal representation of the object identity that is sensitive to relevant differences among objects but that is tolerant of irrelevant changes. Research in machine vision suggests that different recognition tasks represent a trade-off between specificity and invariance and that this balance is achieved depending on the task demands and previous view-specific samples of the objects in question (Riesenhuber & Poggio, 2000). Thus, object recognition is accomplished by an interactive process of both view-dependent and view-independent processing, with the brain optimizing the relative contribution of the two processes to accomplish the task at hand. The available stimulus information and task demands will both affect the weighting of the two processes. In one well-known study, monkeys' learning of viewpoint was exquisitely sensitive to the view in which the object was trained, but if monkeys were trained with as few as three views of the object 120° apart, such that they could interpolate all 360° views of the object, their recognition was less view-dependent (Logothetis et al., 1994). Therefore, the critical developmental skill may be the efficiency with which the brain extrapolates to new visual angles given the trained view of an object.
Models of face and object recognition suggest that because neurons that have size- and view-invariant responses have been found in macaque IT, invariant recognition is established by a hierarchically organized system with convergent connectivity, capturing the available invariant information at each stage in the hierarchy (Wallis & Rolls, 1996). However, not all objects are created equal and variations in 2-D versus 3-D may pose different challenges for the visual system. Whereas size change is a 2-D variation because a different size can be estimated from just one object view, a 3-D variation, such as 3-D rotation, cannot be estimated accurately from just one view. In computer vision, the success of object recognition despite 3-D variations depends on the size and composition of the training set and how much variability is covered by the training examples (Riesenhuber & Poggio, 2000). By extension, perhaps humans require many years of visual experience to establish a training set with sufficient variability to be able to easily recognize novel objects from novel views. It is the progress in deriving object representations over these years that is elucidated in the current article.
This work was funded by NSF grant SBE-0542013 to the Temporal Dynamics of Learning Center, NSF Science of Learning Center, and National Science Foundation grant BCS0923763 to M. B. The authors thank Dan Elbich and Sara Barth for help with data analysis.
Reprint requests should be sent to Marlene Behrmann, Department of Psychology, Carnegie Mellon University, Forbes Avenue, Pittsburgh, PA 15213-3890, or via e-mail: email@example.com.