Many everyday activities, such as driving on a busy street, require the encoding of distinctive visual objects from crowded scenes. Given resource limitations of our visual system, one solution to this challenging task is to first select individual objects from a crowded scene (object individuation) and then encode their details (object identification). Using functional magnetic resonance imaging, two distinctive brain mechanisms were recently identified that support these two stages of visual object processing. While the inferior intraparietal sulcus (IPS) selects a fixed number of about four objects via their spatial locations, the superior IPS and the lateral occipital complex (LOC) encode the features of a subset of the selected objects in great detail (object shapes in this case). Thus, the inferior IPS individuates visual objects from a crowded display and the superior IPS and higher visual areas participate in subsequent object identification. Consistent with the prediction of this theory, this study shows that, even when only object shape identity but not its location is task relevant, object individuation in the inferior IPS treats four identical objects similarly to four objects that are all different, whereas object shape identification in the superior IPS and the LOC treats four identical objects as a single unique object. These results provide independent confirmation of the dissociation between visual object individuation and identification in the brain.
Previous behavioral research and theoretical conceptualizations have argued that there exist two distinctive and sequential stages of visual processing when multiple objects are encoded. Namely, there exists an object individuation stage where objects are selected based on their spatial/temporal information, and an object identification stage where the full object representation becomes available. For example, Sagi and Julesz (1984) noticed that observers were faster in detecting targets among distractors in a visual search task but were considerably slower in subsequent identification of the targets. Kahneman, Treisman, and Gibbs (1992) proposed in their well-known “object file” theory that spatial and temporal information allows an “object file” to be created or assigned (corresponding to object individuation), which can then be filled with object features to allow objects to be identified (corresponding to object identification). Similarly, Pylyshyn (1989, 1994) argued in his FINST theory that there is a preattentive stage of visual processing where a fixed number of about four objects are indexed via their spatial locations. Featural information is only available at a later attentive stage of processing. In infant research, the distinction between object individuation and identification has also been noted and it is argued that the development of object individuation precedes object identification by a few months (e.g., Leslie, Xu, Tremoulet, & Scholl, 1998).
In a recent functional magnetic resonance imaging (fMRI) study, Xu and Chun (2006) discovered possible neural mechanisms that may support object individuation and identification in the brain. The study was motivated by a debate in the visual short-term memory (VSTM) literature regarding what determines VSTM capacity. Although some researchers argued that VSTM capacity is fixed at about four objects (e.g., Luck & Vogel, 1997), others argued that VSTM capacity is flexible and modulated by object complexity (Xu, 2002a, 2002b, 2006; Alvarez & Cavanagh, 2004). Xu and Chun (2006) asked observers to retain variable numbers of object shapes in VSTM and examined fMRI activations in the inferior and the superior intraparietal sulcus (IPS) and the lateral occipital complex (LOC). These three brain areas were chosen as regions of interest (ROIs) because responses in the LOC have been shown to reflect object shape processing and conscious object perception (e.g., Moutoussis & Zeki, 2002; Bar et al., 2001; Hasson, Hendler, Ben Bashat, & Malach, 2001; Kourtzi & Kanwisher, 2000, 2001; Grill-Spector, Kushnir, Hendler, & Malach, 2000; Grill-Spector et al., 1999; Grill-Spector, Kushnir, Edelman, Itzchak, & Malach, 1998; Malach et al., 1995), responses in the superior IPS have been correlated with the number of objects stored in VSTM (Todd & Marois, 2004¹; see also Todd & Marois, 2005; Vogel & Machizawa, 2004), and, finally, responses in the inferior IPS have been linked to attention-related processing (Kourtzi & Kanwisher, 2000; Wojciulik & Kanwisher, 1999).
When objects were presented simultaneously at different spatial locations, responses from all three brain regions increased as display set size increased and then plateaued. Critically, inferior IPS activation plateaued at about set size 4 regardless of object complexity, whereas superior IPS and LOC activations plateaued at about the maximal number of objects held in VSTM as determined by object complexity (e.g., at set size 3 for simple shapes and at set size 2 for complex shapes). When objects were presented sequentially at the same spatial location, while superior IPS and LOC responses still increased and plateaued corresponding to the maximal number of objects held in VSTM, the inferior IPS response did not vary with set size (Xu & Chun, 2006).
These results indicate that VSTM capacity may be determined both by a fixed number of objects and by object complexity. Moreover, relating to previous behavioral literature on object individuation and identification, these fMRI results support the existence of two distinctive stages of processing in the brain when multiple visual objects are encoded. Corresponding to object individuation, the inferior IPS selects a fixed number of about four objects via their spatial locations; and corresponding to object identification, the superior IPS and the LOC encode the shape features of a subset of the selected objects in great detail.
Because object individuation mainly concerns object location and not identity, and object identification mainly concerns object identity, an object individuation–identification account of the results of Xu and Chun (2006) would make the following critical predictions. When identical objects are presented simultaneously at different spatial locations, the brain area involved in object individuation (the inferior IPS) should treat them as multiple entries to the system. Thus, four identical and four different objects should be represented similarly at this stage of processing. In contrast, brain areas involved in representing detailed object features (the superior IPS and the LOC for object shapes) should treat multiple identical objects as a single unique object because the demand to represent the features of multiple identical objects is the same as that of a single unique object. Thus, four identical objects should be represented similarly to one object at this stage of processing. The aim of the present study was to test these predictions.
Jonides, Lacey, and Nee (2005) provided strong evidence arguing that the same brain mechanisms participating in perception are also involved in VSTM information storage. That is, VSTM information storage involves extending the neural representation formed at perception for a prolonged period of time. Thus, although the present study and that of Xu and Chun (2006) used a VSTM paradigm, the results can be used to inform the neural mechanisms underlying visual object perception in general.
Ten paid observers (4 women) were recruited from the Yale University community with informed consent. They were between 18 and 35 years of age, were right-handed, and had normal or corrected-to-normal visual acuity and normal color vision. The study was approved by the Yale University School of Medicine Human Investigation Committee.
Design and Procedure
In each trial, observers viewed a sample display containing one, four identical, or four different black objects around the central fixation. After a brief blank delay, they judged whether the identity of the centrally presented probe object in the test display matched one of the sample objects (Figure 1). To facilitate object individuation and to prevent grouping when objects appeared next to each other, eight dark squares, marking all possible object locations, were always present in the display. Seven distinctive object shapes, as in Xu and Chun (2006, Experiment 4), were used (see Figures 1 and 2B). All displays subtended 13.7° × 13.7° of visual angle and were presented on a light gray background. A given object subtended maximally 3.1° × 3.1° of visual angle. Each trial lasted 6 sec and consisted of fixation (1000 msec), sample display (200 msec), blank delay (1000 msec), test display/response period (2500 msec), and response feedback (1300 msec) as either a happy face (correct response) or a sad face (incorrect response) at fixation. There were also blank fixation trials in which only a fixation dot was present throughout the 6-sec trial duration. Trial presentation order was pseudorandom and balanced in a run (Xu & Chun, 2006; Todd & Marois, 2004). Each observer was tested with two runs, each containing 15 trials for each display condition and lasting 7 min and 9 sec.
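The trial structure described above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the table name `TRIAL_PHASES_MS`, its keys, and the helper `phase_onsets` are hypothetical labels and are not taken from the experiment code.

```python
# Phase durations in msec, as listed in the trial description.
# The dictionary name and keys are illustrative labels only.
TRIAL_PHASES_MS = {
    "fixation": 1000,
    "sample_display": 200,
    "blank_delay": 1000,
    "test_display_response": 2500,
    "feedback": 1300,
}


def phase_onsets(phases):
    """Return each phase's onset (msec from trial start) and the total duration."""
    onsets = {}
    elapsed = 0
    for name, duration in phases.items():
        onsets[name] = elapsed
        elapsed += duration
    return onsets, elapsed


onsets, total_ms = phase_onsets(TRIAL_PHASES_MS)
print(onsets)
print(total_ms / 1000, "sec per trial")
```

The phase durations sum to the stated 6-sec trial length, with the test display appearing 2200 msec after trial onset.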
To define the superior IPS ROI, a VSTM color experiment was conducted as in Xu and Chun (2006; see also Todd & Marois, 2004). A given sample display contained two, three, four, or six colored squares around the center fixation (Figure 2A). The probe color in the test display either matched the color at the same location in the sample display for no-change trials (Figure 2A, top), or it was a color present elsewhere in the sample display for change trials (Figure 2A, bottom). Seven colors (red, green, blue, cyan, yellow, white, and magenta) were used. As in the main experiment, all displays subtended 13.7° × 13.7° and were presented on a light gray background. Each colored square subtended 2.0° × 2.0°. Each observer was tested with two runs, each containing 12 trials for each display set size and lasting 5 min and 12 sec.
To define the LOC and the inferior IPS ROIs, observers viewed blocks of black object images and blocks of noise images (Figure 2B) as in Xu and Chun (2006). Each object image contained six black shapes, created by the same algorithm used to generate the displays in the main experiment (Figure 2B, top). This procedure ensured that only brain regions involved in processing the type of visual objects used in the VSTM experiment were localized. Each image was presented for 500 msec, followed by a 300-msec blank interval before the next image appeared. To ensure attention on the displays, observers fixated at the center and detected a slight spatial jitter, occurring randomly in one out of every 10 images. Each observer was tested with two runs, each containing 160 black object images and 160 noise images. Each run lasted 4 min and 40 sec. Displays used in this localizer scan had the same spatial extent as those in the main experiment.
Observers lay on their backs inside a Siemens Trio 3-T scanner and viewed the back-projected LCD display with a mirror mounted inside the head coil. Stimulus presentation and behavioral response collection were controlled by an Apple Powerbook G4 running MATLAB with Psychtoolbox extensions (Brainard, 1997; Peli, 1997). Standard protocols were followed to acquire the anatomical images. A gradient-echo pulse sequence (TE = 25 msec, flip angle = 90°, matrix = 64 × 64) was used for both the main experiment and the localizer scans, with TRs of 1.5 and 2.0 sec, respectively, for the event-related runs and the blocked runs. Twenty-four 5-mm-thick (3.75 mm × 3.75 mm in-plane, 0 mm skip) axial slices parallel to the AC–PC line were collected.
fMRI data collected were analyzed using BrainVoyager QX (www.brainvoyager.com). Data preprocessing included slice acquisition time correction, 3-D motion correction, linear trend removal, and Talairach space transformation (Talairach & Tournoux, 1988).
A multiple regression analysis was performed separately on each observer on the data acquired in the color VSTM task. The regression coefficient for each set size was weighted by the corresponding behavioral K estimate from that observer for that set size (Todd & Marois, 2004). The superior IPS ROI was defined as the voxels that showed a significant activation in the regression analysis (false discovery rate q < 0.05) and whose Talairach coordinates matched those reported previously (Todd & Marois, 2004). As in our previous study (Xu & Chun, 2006), the LOC and the inferior IPS ROIs were defined as regions in the ventral and lateral occipital cortex and in the inferior IPS, respectively, whose activations were higher for the black objects than for the noise images (false discovery rate q < 0.05).
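The false discovery rate threshold used for ROI definition can be illustrated with a small sketch. The text does not say which FDR procedure the analysis software applied, so the Benjamini-Hochberg step-up rule below is an assumption, and the function name and the toy p values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Mark which tests survive FDR control at level q (Benjamini-Hochberg).

    Assumed procedure for illustration; not confirmed by the source.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Toy per-voxel p values (made up for illustration).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(toy_p))
```

With these toy values only the two smallest p values survive at q < 0.05. In an actual ROI definition the surviving voxels would additionally be restricted by anatomical location, as described above.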
These ROIs were overlaid onto the data from the main VSTM experiment and time courses from each observer were extracted. As in previous studies (Xu & Chun, 2006; Todd & Marois, 2004; Kourtzi & Kanwisher, 2000), these time courses were converted to percent signal change for each stimulus condition by subtracting the corresponding value for the fixation trials and then dividing by that value. Following prior convention (Xu & Chun, 2006; Todd & Marois, 2004), peak responses were derived by collapsing the time courses of all the conditions and determining the time point of greatest signal amplitude in the averaged response. This was done separately for each observer in each ROI. The resulting peak responses were then averaged across observers.
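The two steps above, conversion to percent signal change and peak selection from the collapsed time course, can be sketched as follows for a single observer and ROI. This is a minimal reading of the description rather than the analysis code actually used; the function names and the toy time courses are hypothetical.

```python
def percent_signal_change(condition_tc, fixation_tc):
    """Subtract the fixation value at each time point, divide by it, scale to percent."""
    return [100.0 * (c - f) / f for c, f in zip(condition_tc, fixation_tc)]


def peak_responses(psc_by_condition):
    """Collapse all conditions, find the time point of greatest amplitude,
    and read out each condition's response at that time point."""
    conditions = list(psc_by_condition)
    n_points = len(psc_by_condition[conditions[0]])
    collapsed = [
        sum(psc_by_condition[c][t] for c in conditions) / len(conditions)
        for t in range(n_points)
    ]
    peak_index = max(range(n_points), key=lambda t: collapsed[t])
    return peak_index, {c: psc_by_condition[c][peak_index] for c in conditions}


# Toy raw time courses (arbitrary units), made up for illustration.
fixation = [100.0, 100.0, 100.0, 100.0, 100.0]
raw = {
    "one": [100.0, 100.2, 100.5, 100.4, 100.1],
    "four_identical": [100.0, 100.3, 100.6, 100.5, 100.1],
    "four_different": [100.0, 100.4, 101.0, 100.8, 100.2],
}
psc = {cond: percent_signal_change(tc, fixation) for cond, tc in raw.items()}
peak_index, peaks = peak_responses(psc)
print(peak_index, peaks)
```

Because the peak time point is chosen from the average of all conditions, it is not biased toward any single condition.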
To evaluate behavioral responses, Cowan's (2001) K formula was used to transform change detection accuracies to VSTM capacity estimates (Ks) as a function of display set size. Ks were 0.96 (SE = 0.02) for the one, 0.98 (0.01) for the four-identical (counted as one object), and 2.71 (0.21) for the four-different condition. Thus, although performance was at ceiling for the one and the four-identical conditions, only about three objects were retained in the four-different condition. Reaction times (RTs) for the correctly responded trials were 595 msec (49), 590 msec (45), and 884 msec (55), respectively, for the three conditions. There was no difference between the one and the four-identical conditions in either K [F(1, 9) = 1.98, p > .19] or RT (F < 1).
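For reference, Cowan's K is commonly written as K = (hit rate + correct rejection rate − 1) × N, where N is the number of objects in the display. The sketch below applies that form; the function name and the example rates are hypothetical and are not data from this study.

```python
def cowan_k(hit_rate, correct_rejection_rate, set_size):
    """Cowan's K: capacity estimate from single-probe change detection accuracy."""
    return (hit_rate + correct_rejection_rate - 1.0) * set_size


# Made-up accuracies for a four-object display (illustration only).
k_four = cowan_k(hit_rate=0.85, correct_rejection_rate=0.83, set_size=4)
# Perfect performance with one object gives the ceiling value of 1.
k_one = cowan_k(hit_rate=1.0, correct_rejection_rate=1.0, set_size=1)
print(round(k_four, 2), k_one)
```

Note that K cannot exceed N, which is why K values near 1 indicate ceiling performance in the one and the four-identical conditions, where the display was counted as containing one object.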
Peak fMRI responses from the three ROIs are plotted in Figure 3 (top). In the inferior IPS, response was lower for the one than for either the four-identical [F(1, 9) = 11.52, p < .01] or the four-different condition [F(1, 9) = 17.95, p < .01]. Responses for the latter two did not differ (F < 1). In the LOC, response for the four-different condition was greater than for either the one [F(1, 9) = 16.06, p < .01] or the four-identical condition [F(1, 9) = 13.05, p < .01]. Responses for the latter two did not differ (F < 1). Responses in the superior IPS were similar to those in the LOC, such that response for the four-different condition was greater than for either the one [F(1, 9) = 28.05, p < .001] or the four-identical condition [F(1, 9) = 42.89, p < .001], and responses for the latter two did not differ [F(1, 9) = 1.45, p > .25]. In pairwise comparisons, response difference between the one and the four-identical conditions was greater in the inferior IPS than in either the LOC [F(1, 9) = 25.65, p < .01] or the superior IPS [F(1, 9) = 6.33, p < .05]. Meanwhile, response difference between the two four-object conditions was smaller in the inferior IPS than in either the LOC [F(1, 9) = 7.48, p < .05] or the superior IPS [F(1, 9) = 22.13, p < .01], with this difference being greater in the superior IPS than in the LOC [F(1, 9) = 25.42, p < .01]. Lastly, response difference between the one and the four-different conditions was greater in the superior IPS than in either the inferior IPS [F(1, 9) = 12.25, p < .01] or the LOC [F(1, 9) = 21.59, p < .01].
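With 10 observers and two conditions, a within-observer comparison with degrees of freedom (1, 9) is equivalent to a paired t test, with F equal to the squared t statistic. The sketch below shows that computation under this assumption; the text does not describe the exact test procedure, and the function name and the toy peak responses are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_f(cond_a, cond_b):
    """F statistic for a two-level within-observer contrast (squared paired t)."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    # Paired t: mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t * t, (1, n - 1)


# Toy peak responses for 10 observers (made up for illustration).
four_different = [0.3, 0.4, 0.5, 0.6, 0.7, 0.3, 0.4, 0.5, 0.6, 0.7]
one = [0.2] * 10
f_value, dof = paired_f(four_different, one)
print(round(f_value, 2), dof)
```

The same function could be applied to difference scores (for example, the four-identical minus one difference in two ROIs) to express a between-ROI interaction as a single paired contrast.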
Because baseline fMRI responses differed among the three ROIs examined, responses were also normalized. This was done by anchoring the averaged responses in each ROI for the one and the four-different conditions to be 0.5 and 1.0, respectively, and then linearly scaling each observer's response in each ROI and in each display condition accordingly. The results are plotted in Figure 3 (bottom). This linear transformation of the data did not change the statistical results for the comparisons carried out within each ROI. For the pairwise interactions between the different ROIs, virtually the same statistical results were obtained with the normalized data as with the raw fMRI data. Specifically, response difference between the one and the four-identical conditions was greater in the inferior IPS than in either the LOC [F(1, 9) = 15.09, p < .01] or the superior IPS [F(1, 9) = 17.07, p < .01]. Meanwhile, response difference between the two four-object conditions was smaller in the inferior IPS than in either the LOC [F(1, 9) = 12.83, p < .01] or the superior IPS [F(1, 9) = 6.14, p < .05], and this response difference did not differ between the latter two ROIs (F < 1).
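The normalization described above amounts to one linear transformation per ROI. The sketch below is one reading of that description, assuming the slope and intercept are derived from the group-averaged responses and then applied to every observer's value; the function name and toy responses are hypothetical.

```python
from statistics import mean


def normalize_roi(responses):
    """Linearly rescale one ROI's responses so that the group mean of the
    'one' condition maps to 0.5 and that of 'four_different' maps to 1.0.

    responses: dict mapping condition name -> list of per-observer peak responses.
    """
    m_one = mean(responses["one"])
    m_diff = mean(responses["four_different"])
    slope = (1.0 - 0.5) / (m_diff - m_one)
    intercept = 0.5 - slope * m_one
    return {
        cond: [slope * x + intercept for x in values]
        for cond, values in responses.items()
    }


# Toy peak responses for three observers in one ROI (made up for illustration).
toy_roi = {
    "one": [0.20, 0.30, 0.40],
    "four_identical": [0.25, 0.30, 0.50],
    "four_different": [0.60, 0.70, 0.80],
}
normalized = normalize_roi(toy_roi)
print({cond: round(mean(vals), 3) for cond, vals in normalized.items()})
```

Because the same slope and intercept are applied to every value within an ROI, F ratios for comparisons within that ROI are unchanged, whereas comparisons across ROIs can change because each ROI receives its own scaling.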
To examine if there was any processing bias between the left and the right hemispheres, the raw fMRI peak responses for the left and the right LOC, the left and the right inferior IPS, and the left and the right superior IPS were analyzed separately. Similar response patterns were obtained in each hemisphere as those obtained averaged across the two hemispheres. There was no overall effect of hemisphere bias (F < 1) and no interaction between hemisphere and brain region (F < 1) or stimulus condition (F < 1). The three-way interaction of hemisphere, brain region, and stimulus condition was not significant either (F < 1).
Peak fMRI response latencies were also examined. The averaged peak latencies and standard errors for the three ROIs were: LOC = 6.05 sec (0.23), inferior IPS = 5.30 sec (0.30), and superior IPS = 6.05 sec (0.23). Peak latency was marginally significantly shorter in the inferior IPS than in either the LOC [F(1, 9) = 5, p = .052] or the superior IPS [F(1, 9) = 5, p = .052], with no differences between the latter two (F < 1).
Results of this study show that although four identical object shapes are treated similarly to four different objects in the inferior IPS, they are treated as a single unique object in the LOC and the superior IPS. Thus, even in a task in which only object shape identity, but not its location, is task relevant, the inferior IPS represents objects by their locations, whereas the superior IPS and the LOC represent objects by their identities. These findings confirm the predictions of the object individuation–identification theory, and support object individuation in the inferior IPS and object identification in the LOC and the superior IPS (for object shapes in this case).
Because overall task load and task difficulty were well matched between the one and the four-identical conditions (as shown by the matching response accuracy and speed between these two conditions), the difference observed between these two conditions in the inferior IPS is unlikely to reflect a difference in overall attentional allocation, especially given that such a general attentional effect was absent in the LOC and in the superior IPS.
Raw fMRI response amplitude differences between the four-different and the other two conditions were greater in the superior IPS than in the LOC. A similar response amplitude difference has also been observed in Xu and Chun (2006). This suggests that the superior IPS may be more sensitive than the LOC to the number and the identity of the visual objects present in the display, indicating possibly more robust information representation in the superior IPS than in the LOC. This is consistent with the role of the parietal cortex in general information processing and decision making as suggested by other studies (e.g., Huk & Shadlen, 2005; Toth & Assad, 2002; Shadlen & Newsome, 2001; Platt & Glimcher, 1999). Nevertheless, despite these absolute response amplitude differences, the overall response pattern was similar between the LOC and the superior IPS as shown in the normalized fMRI responses, indicating a similar role that these two brain areas may play during object shape identification.
Because the inferior IPS selects a fixed number of about four visual objects during object individuation, an object individuation–identification theory predicts that lesions to this brain area would reduce an observer's ability to simultaneously select and process multiple visual objects. Consistent with this prediction, in neuropsychological studies, it has long been known that after bilateral parietal–occipital lesions, patients could perceive only a single object when confronted with two or more objects (e.g., Friedman-Hill, Robertson, & Treisman, 1995; Coslett & Saffran, 1991; Balint, 1909). The bilateral parietal–occipital lesions that are necessary to cause these deficits coincide well with the bilateral inferior IPS region examined in this study and in Xu and Chun (2006) (although lesions in patients are usually more extensive). The peculiar visual deficits resulting from bilateral lesions to the inferior IPS thus further confirm the role of this brain area in object individuation. Importantly, these lesion studies highlight the importance of object individuation in visual perception, such that, without it, normal visual perception is severely impaired. Although the task used in the present study only involved object identity but not its location, the inferior IPS still responded to the number of objects present regardless of their identities, further suggesting that object individuation may be obligatory during visual object processing, consistent with the patient findings.
In a separate VSTM study, Xu and Chun (2007) found that grouping between visual elements influenced object individuation in the inferior IPS such that grouped shapes elicited lower fMRI responses than ungrouped shapes in the inferior IPS, even when grouping was task irrelevant. This relative ease of representing grouped shapes allowed more shape information to be passed onto later stages of visual processing such as information storage in the superior IPS when the task became more demanding. This grouping effect observed in the inferior IPS, however, is not restricted to VSTM tasks. When observers passively viewed the displays, inferior IPS response was higher when shapes were disconnected than when shapes were connected, even though shape dispersion was equated between the two conditions (Xu, 2008). This inferior IPS grouping effect provides additional support for the neural dissociation between object individuation and identification. In addition, it provides a good account for why grouped visual elements may be easier to perceive than ungrouped ones after parietal brain lesions (e.g., Friedman-Hill et al., 1995; Coslett & Saffran, 1991; Balint, 1909).
Time Course of Object Individuation and Identification
The object individuation–identification theory predicts that object individuation always precedes object identification. This is because, for a set of objects to be identified, they need to first be selected among other competing objects in the visual display. Thus, object identification can only operate after object selection and individuation have been completed. In behavioral studies, it has been shown that the capacity limitation of object individuation (about four) determines that of object identification. For example, without item grouping, the maximal VSTM capacity is about four items. Similarly, patients with Balint's syndrome had great difficulty perceiving the presence of multiple visual objects. This again shows that object identification depends on the output from object individuation, rather than the two processes being independent of each other.
Consistent with this notion that object individuation precedes object identification, in the latency analysis of the present experiment, fMRI signals were found to peak sooner (marginally significant) in the inferior IPS than in either the LOC or the superior IPS. Because of the poor temporal resolution of the fMRI signal and because fMRI peak latency could be affected by differences in local vascular structures (e.g., the inferior IPS may be closer to a major blood supply than the other two brain regions), fMRI latency differences between different brain regions should be interpreted with caution. Other neurophysiological measures with finer temporal resolution, such as single-unit recording, ERP, or MEG measures, are needed to verify this result and examine the temporal response properties of the neural substrates underlying object individuation and identification.
Although object individuation occurs before object identification, in real-world object perception, these two stages of processing likely interact with each other. For example, initial object individuation would lead to some coarse object identification, which then feeds back to object individuation to allow better object selection, which in turn allows better object identification. Further research is needed to understand how object individuation and identification operate in real-world object perception.
Results of this study and those of Xu and Chun (2006) bring new insights into understanding visual object processing in the brain. Traditionally, it is believed that separate neural mechanisms are involved in localizing objects and in identifying what they are, with such “where” and “what” processing mapped onto distinctive dorsal and ventral visual processing streams, respectively (Ungerleider & Mishkin, 1982). The present results and those of Xu and Chun (2006), however, indicate the involvement of the parietal cortex not only in the “where” but also in the “what” processing of visual objects, suggesting that object feature processing may involve neural substrates beyond the ventral visual cortex (see also Konen & Kastner, 2008; Sereno & Maunsell, 1998). In addition, these fMRI results reveal possible functional subdivisions within the parietal cortex, which, up to now, have remained largely unknown despite the involvement of this brain area in a variety of cognitive tasks (e.g., Gottlieb, 2007; Wojciulik & Kanwisher, 1999).
Together with Xu and Chun (2006), the present results also revive and enrich a theoretical framework that describes the cognitive mechanisms involved in visual object perception when a large number of objects compete for limited processing resources. More than a decade ago, researchers argued for the existence of two distinctive stages of visual processing involving object individuation and identification (e.g., Leslie et al., 1998; Pylyshyn, 1989, 1994; Kahneman et al., 1992). Mapping object individuation and identification onto distinctive neural mechanisms (this study and Xu & Chun, 2006) allows this theory to be applied more broadly to visual object processing, for example, to resolve the debate in VSTM regarding what limits VSTM capacity (Xu & Chun, 2006). These new findings add to the growing evidence suggesting that object individuation and subsequent object identification are two critical steps in visual perception whenever the processing of multiple visual objects is required. In this regard, object individuation and identification may be keys to understanding visual object representation in both the mind and the brain.
This research was supported by NSF grants 0518138 and 0719975 to Y. X. I thank Marvin M. Chun for his generous support of this project and his valuable comments on early drafts of this article. I also thank Jenika Beck for assistance in fMRI subject recruiting.
Reprint requests should be sent to Yaoda Xu, Vision Sciences Laboratory, Department of Psychology, Harvard University, Cambridge, MA 02138.
1. Although the IPS region reported by Todd and Marois (2004) encompassed both the inferior and the superior IPS, the mean Talairach coordinates reported for this brain region were located in the superior IPS.
Now at Harvard University.