An important component of perception, attention, and memory is the structuring of information into subsets (“objects”), which allows some parts to be considered together but kept separate from others. Portions of the posterior parietal lobe respond proportionally to the number of objects in the scope of attention and short-term memory, up to a capacity limit of around four, suggesting they have a role in this important process. This study investigates the relationship of discrete object representation to other parietal functions. Two experiments and two supplementary analyses were conducted to evaluate responsivity in parietal regions to the number of objects, the number of spatial locations, attention switching, and general task difficulty. Using transparent motion, it was found that a posterior and inferior parietal response to multiple objects persists even in the absence of a change in visual extent or the number of spatial locations. In a monitoring task, it was found that attention switching (or task difficulty) and object representation have distinct neural signatures, with the former showing greater recruitment of an anterior and lateral intraparietal sulcus (IPS) region, and the latter of a posterior and medial region. A dissociation was also seen between selectivity for object load across tasks in the inferior IPS and feature or object-related memory load in the superior IPS.
The structuring of information into discrete parts that can be independently manipulated or combined is a prerequisite for all but the most elementary of cognitive functions. As an example, a vast array of information arrives from our senses, which must be structured, with some parts considered together but kept separate from other parts: To judge the speed and direction of a car, information from many motion-sensitive neurons should be combined, but kept separate from those responding to the road surface and street furniture. This structuring into discrete objects¹ affects many cognitive operations, modifying what we perceive, such as the direction of motion (in a formal analogue of the car example; Stoner, Albright, & Ramachandran, 1990) and, congruent with this, the response of motion-direction-tuned neurons in the macaque (Stoner & Albright, 1992) and the human (Castelo-Branco et al., 2002). Discrete object formation also affects attention, as it is easier to attend to two features from a single object than to a single feature from two objects (Duncan, 1984). More recently, it has been shown that the capacity of short-term memory in general (Cowan, 2001) or, more specifically, visual short-term memory (VSTM), is determined entirely (Zhang & Luck, 2008; Awh, Barton, & Vogel, 2007; Luck & Vogel, 1997) or in part (Alvarez & Cavanagh, 2004) by a limit in the number of objects rather than their complexity. In tasks where multiple moving items are to be tracked (Pylyshyn & Storm, 1988), it is the number of perceptual objects rather than the number of spatial locations that limits performance (Scholl, Pylyshyn, & Feldman, 2001). Overall, there is broad evidence that multiple discrete objects can be represented simultaneously and that this level of structure has a strong effect on many cognitive processes.
A range of mental functions is affected by structuring into discrete objects, but do these functions share a common cognitive component? Kahneman, Treisman, and Gibbs (1992) proposed the influential idea of “object files,” representations that could play a common role across many different tasks, allowing objects to be tracked across space and time and identical items to be represented simultaneously. Many other psychological models also emphasize the need for discrete object representations, such as Trick and Pylyshyn's (1994) “fingers of instantiation,” Cavanagh and Alvarez's (2005) targets of multifocal attention, Cowan's (2001) “chunks,” or the “tokens” of Chun (1997) and Kanwisher (1987). These theoretical constructs are supported by behavioral evidence consistent with a common cognitive component responsible for object representation across disparate tasks. This evidence comes from the similarity, across a number of different tasks, both in the units of the capacity limit (i.e., objects, not features) and in its numerical value (around four) (Irwin & Andrews, 1996). These tasks include visual (Sperling, 1960) and auditory (Cowan, 2001; Darwin, Turvey, & Crowder, 1972) short-term memory, and multiple-object tracking (Scholl et al., 2001; Pylyshyn & Storm, 1988).
Is there any evidence from cognitive neuroscience that a particular brain region may perform this role? Cusack (2005) proposed that a region in posterior parietal cortex (PPC) might play a role in the representation of object files. There has long been a suggestion that some region in the parietal lobe is involved in the representation of discrete objects, following the discovery that simultanagnosia, a difficulty in attending to multiple objects, can occur following bilateral lesions (Dehaene & Cohen, 1994; Balint, 1909). In such patients, even when only one object is perceived, it can be an incorrect combination of the features in the display (Friedman-Hill, Robertson, & Treisman, 1995). More recently, neuroimaging studies have implicated particular regions in PPC in the representation of objects in short-term memory. Todd and Marois (2004) used a task requiring short-term memory for stimuli comprising a number of colored discs. They found that during the maintenance period, the BOLD fMRI signal in PPC tracked the number of items in VSTM, increasing with the number of items in the display up to around four, but then leveling off as VSTM capacity was reached. In a later report, they showed that across participants, PPC activity correlated with individual differences in VSTM capacity (Todd & Marois, 2005). Both of these VSTM findings have been supported by an analogous paradigm using EEG (Vogel & Machizawa, 2004). In a refinement of this proposal, Xu (2007) and Xu and Chun (2006) have used fMRI to distinguish between a more inferior intraparietal sulcus (IPS) region that responds with the number of objects in VSTM, and a more superior IPS region that responds with the total quantity of information (number of features) in VSTM. This distinction was further supported by Xu and Chun (2007), who manipulated the perceptual grouping of the items to be remembered.
Mitchell and Cusack (2008) have generalized the involvement of PPC beyond VSTM, finding that for tasks with similar stimuli to Todd and Marois (2004), but without a working memory requirement, activation also followed the number of objects presented, up to a limit of around four items. A number of other tasks that recruit PPC in the absence of working memory requirements might also have done so because they require the maintenance of multiple objects. These include multiple-object tracking, which is disrupted following PPC lesions (Battelli et al., 2001) and activates PPC in fMRI (Culham, Cavanagh, & Kanwisher, 2001; Jovicich et al., 2001; Culham et al., 1998); the perception of discrete rather than continuous stimuli (Castelli, Glaser, & Butterworth, 2006); the binding of visual features (Shafritz, Gore, & Marois, 2002); and the perception of multiple auditory streams (Cusack, 2005).
In summary, there is substantial evidence that portions of the posterior parietal lobe respond with the number of discrete objects in VSTM or perceptual tasks. But, how does this relate to other functions of the parietal lobe? The aim of the current study is to compare the neural machinery of this function to that of attention switching, task difficulty, and spatial representation.
It is well established that regions in PPC are activated by attention switching, and here we study the extent to which this dissociates from discrete object representation. We are careful to distinguish attention switching and object number, as in many of the paradigms discussed above they were confounded. For example, in VSTM tasks, it has been proposed that maintenance might proceed through rehearsal in which individual items are successively brought into the focus of attention (Awh, Vogel, & Oh, 2006; Smith, Jonides, Marshuetz, & Koeppe, 1998; Smyth & Scholey, 1994). In the analogous case of phonological rehearsal, it is argued that each object decays at a fixed rate (Baddeley, 1986), thus requiring a rehearsal rate that increases proportionally to the number of items. For rehearsal in VSTM, if the rate of switches also increased proportionally to the number of objects being remembered, neural markers of attention switching might be confused with those of short-term memory maintenance. A similar confusion might occur during multiple-object tracking, as it has been suggested that attention may be switched between the targets to be tracked (Oksama & Hyona, 2004; but see Pylyshyn & Storm, 1988).
The “multiple demands” (MD) fronto-parietal network, which includes a core component in PPC, is recruited by a broad range of tasks, whether they involve working memory, selective attention, or simple perception (Duncan, 2006). In the current study, we contrast the response of parietal regions to general task performance with their response to discrete object representation, and we dissociate task difficulty from object numerosity. In many studies discussed earlier in the Introduction (e.g., Shafritz et al., 2002), the conditions with a greater number of objects were also more difficult, and so some of the PPC activity could reflect greater activity in generalized processing regions rather than any specific role in the representation of objects.
It has long been argued that parietal cortex (e.g., Colby & Goldberg, 1999), and PPC more specifically (Silver, Ress, & Heeger, 2005), has a role in representing space. Although in some studies (Xu & Chun, 2007; Cusack, 2005) the representation of discrete objects persisted in the absence of a difference in the number of spatial locations, the opposite has also been shown (Xu & Chun, 2006, Experiment 4). Moreover, in many of the other studies discussed above, the number of objects and the number of spatial locations were confounded, so some or all of the parietal activity observed might be due to the representation of space. There is behavioral evidence against a strong role for space in VSTM (Lee & Chun, 2001), although this does not rule it out as a driving factor in the parietal response.
The aim of Experiment 1 is to identify PPC discrete object representations, while controlling for the spatial extent of the stimuli, and to contrast this with the MD network recruited more generally by task demands. Experiments 2A and 2B then use similar stimuli to compare discrete object representation, attention switching, and task difficulty within a single paradigm. The generality of the results is then tested by extending the analysis of previously published fMRI data: to auditory stimuli in Experiment 3, and to spatially distributed visual stimuli in VSTM and perceptual tasks in Experiment 4.
There is conflicting evidence on the extent to which regions in PPC are driven by the number of spatial locations versus by the number of discrete objects per se (Xu & Chun, 2006, Experiment 4; Xu & Chun, 2007; Cusack, 2005). The aim of this first experiment was to contrast the perception of two objects with one object, in the absence of a difference in the number of spatial locations, and to evaluate a stimulus type that might be developed in the following experiment to distinguish object representation, attention switching, and task difficulty. To do this, we exploited the phenomenon of transparent motion, which has been shown to be an effective substrate for the deployment of object-based attention (Rodriguez, Valdes-Sosa, & Freiwald, 2002; Valdes-Sosa, Cobo, & Pinilla, 1998) and allows matching of spatial location. On some trials, we presented one surface, and on other trials, two overlapping surfaces. The size and the position of the surfaces were the same across conditions.
Stimuli and Task
On each trial, a set of 1000 white dots was presented on a black background for 1.2 sec, viewed through a circular window of diameter 500 pixels (approximately 11° of visual angle). There were two conditions, schematically illustrated in Figure 1. In Condition 1, all of the dots oscillated along a single axis chosen at a random angle. The temporal frequency of the oscillatory motion was 4 Hz, and the peak-to-peak amplitude was 24 pixels. In Condition 2, half of the dots moved along one axis in a similar manner to Condition 1, and the other half moved along the orthogonal axis. In this transparent motion condition, two surfaces are clearly perceived, but they are entirely overlapping. The two conditions differ in the number of objects, but not in the number of spatial locations. After each 1.2-sec stimulus, there was a 1.5-sec intertrial interval. To ensure participant vigilance, on each trial, the participant was asked to press one of two buttons with the right hand to indicate whether one or two surfaces were perceived. The button mapping was counterbalanced across subjects. Two blocks of 192 trials were presented. One third of the trials were one surface (Condition 1), one third two surfaces (Condition 2), and one third null trials in which just a fixation cross was presented.
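The two conditions can be sketched as follows. This is a minimal illustration, not the stimulus code used in the study: the 75-Hz frame rate (borrowed from the monitor described for Experiment 2A), the uniform placement of dots, the in-phase oscillation of both surfaces, the alternation of dots between surfaces, and the function name make_surfaces are all assumptions, and clipping by the circular aperture is omitted. With the same random seed, both conditions start from identical dot positions, which illustrates that they differ in the number of motion axes (surfaces) but not in spatial extent.

```python
import numpy as np

def make_surfaces(n_surfaces, n_dots=1000, radius_px=250.0, freq_hz=4.0,
                  p2p_px=24.0, duration_s=1.2, frame_rate_hz=75.0, seed=0):
    """Dot positions (n_frames, n_dots, 2) in pixels relative to the window centre."""
    rng = np.random.default_rng(seed)
    # Base positions uniform over the circular window (sqrt gives uniform area density).
    r = radius_px * np.sqrt(rng.random(n_dots))
    theta = 2 * np.pi * rng.random(n_dots)
    base = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

    # One axis at a random angle; a second surface (if any) uses the orthogonal axis.
    angle = rng.uniform(0.0, np.pi)
    axes = np.array([[np.cos(angle), np.sin(angle)],
                     [-np.sin(angle), np.cos(angle)]])
    direction = axes[np.arange(n_dots) % n_surfaces]   # alternate dots between surfaces

    # Sinusoidal oscillation: amplitude is half the peak-to-peak excursion.
    t = np.arange(int(round(duration_s * frame_rate_hz))) / frame_rate_hz
    offset = 0.5 * p2p_px * np.sin(2 * np.pi * freq_hz * t)
    return base[None, :, :] + offset[:, None, None] * direction[None, :, :]

one, two = make_surfaces(1), make_surfaces(2)
print(one.shape, np.allclose(one[0], two[0]))  # → (90, 1000, 2) True
```

Because the random draws do not depend on n_surfaces, the one-surface and two-surface displays share every base dot position; only the axis along which each dot oscillates differs.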
Data were acquired from 16 subjects using a 3-T Bruker Medspec scanner with a head gradient set at the Wolfson Brain Imaging Centre, Cambridge, UK. Two 8′30″ blocks of 204 EPI acquisitions (TA = 1.1 sec; TR = 2.5 sec) were acquired. The first eight scans were discarded to allow for T1 equilibrium. Each volume was 21 slices (Gaussian profile, 4 mm thickness, 1 mm gap) acquired in an ascending interleaved order, each of matrix size 64 × 64, giving a resolution of 3.75 × 3.75 mm. The TE was 30 msec and the flip angle was 65°. We also acquired fieldmaps using a phase contrast sequence (complex subtraction of a pair of GE acquisitions, TE = 7 and 16.1 msec, matrix size 64 × 256 × 64, resolution 4 × 1 × 4 mm), and a whole-brain T1-weighted structural image using an SPGR sequence giving approximately 1 mm³ resolution.
The data were analyzed using SPM2 with the automatic analysis (aa) library (http://imaging.mrc-cbu.cam.ac.uk/imaging/AutomaticAnalysisIntroduction) for scripting. Sinc interpolation through time was used to correct for the acquisition of different brain slices at different times (SPM's slice timing correction). Bulk motion of the head through the time series was then estimated. The pattern of inhomogeneity in the magnetic field measured using a fieldmap was used to correct for distortion in the EPIs (Cusack, Brett, & Osswald, 2003). The EPI mean was then coregistered with the structural using a mutual information cost function. Nonlinear warping to MNI space was then accomplished using SPM normalization on the structural image. A single spatial reslicing stage with sinc interpolation applied the transformations of motion correction, undistortion, and normalization to the EPI images. The normalized images were smoothed using a Gaussian kernel of FWHM 10 mm.
Each subject's data were analyzed using a multiple regression model as specified by an SPM design matrix. The model comprised three event-related columns (predictors): one for the fixation trials, one for the trials with a single surface, and one for the trials with two surfaces. Each event began at the onset of the stimuli and lasted for a single scan (2.5 sec). Boxcar functions representing the time course of these events were convolved with the canonical hemodynamic response to form the predictors for the BOLD fMRI data at each voxel. The six parameters derived from motion correction were also included as regressors to partial out the first-order effects of distortion during motion.
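A sketch of how such predictors can be constructed, assuming a double-gamma approximation to SPM's canonical HRF (not SPM's own code), a 0.1-sec modeling resolution, a hypothetical sequence of 180 evenly spaced trials, random placeholder motion parameters, and a constant term. One-scan boxcars are convolved with the HRF and sampled once per TR; ordinary least squares then gives a β for each column.

```python
import numpy as np
from math import gamma

def canonical_hrf(dt, length_s=32.0):
    """Double-gamma HRF approximating SPM's canonical shape (assumption): peak ~5 s, undershoot ~15 s."""
    t = np.arange(0.0, length_s, dt)
    gpdf = lambda x, k: x ** (k - 1) * np.exp(-x) / gamma(k)   # gamma pdf with unit scale
    h = gpdf(t, 6.0) - gpdf(t, 16.0) / 6.0
    return h / h.sum()

def event_regressor(onsets_s, duration_s, n_scans, tr, dt=0.1):
    """Boxcar at fine time resolution, convolved with the HRF, sampled once per scan."""
    n_fine = int(round(n_scans * tr / dt))
    box = np.zeros(n_fine)
    for onset in onsets_s:
        box[int(round(onset / dt)):int(round((onset + duration_s) / dt))] = 1.0
    conv = np.convolve(box, canonical_hrf(dt))[:n_fine]
    return conv[::int(round(tr / dt))][:n_scans]

rng = np.random.default_rng(1)
TR, N_SCANS = 2.5, 196                      # 204 acquisitions minus 8 discarded scans
onsets = np.arange(180) * 2.7               # 1.2-s stimulus + 1.5-s ITI (hypothetical sequence)
labels = rng.permutation(np.repeat(["fixation", "one", "two"], 60))
X = np.column_stack(
    [event_regressor(onsets[labels == c], TR, N_SCANS, TR) for c in ("fixation", "one", "two")]
    + [rng.normal(size=(N_SCANS, 6)), np.ones(N_SCANS)]   # placeholder motion parameters + constant
)

beta_true = np.array([0.5, 1.0, 1.5] + [0.0] * 6 + [100.0])   # hypothetical voxel
y = X @ beta_true                                             # noiseless, for illustration only
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Each event regressor lasts one scan (2.5 sec) as in the text; with real data, y would be a voxel's time series and the fit would leave residual noise.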
Estimation (fitting) of the regression model gave for each voxel a regression coefficient (β) for each of the predictors. The contributions of the events in these predictors to the BOLD signal were then probed with two orthogonal contrasts. One was the main effect of the stimulus and task, β[one surface] + β[two surfaces] − 2β[fixation], and the other the effect of the number of surfaces, β[two surfaces] − β[one surface]. For each contrast at the first level, group random-effects analyses were then calculated using parametric statistics and reported at a corrected (FDR p < .05) threshold. To visualize the relative strength of each of the contrasts, we calculated the ratio of the t values of the two contrasts (e.g., t[number of surfaces]/t[main effect of stimulus and task]) for the voxels that survived corrected thresholds in either contrast.
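The two contrasts, the random-effects group statistic, and the t-ratio map can be sketched in a few lines. The column order [fixation, one surface, two surfaces] and the helper names are assumptions for illustration; the group statistic shown is a plain one-sample t test on per-subject contrast values, which is what a second-level random-effects t test on a single contrast amounts to. The dot product of the two weight vectors is zero, which is the sense in which the contrasts are orthogonal.

```python
import numpy as np

# Weights over the columns [fixation, one surface, two surfaces] (order assumed here).
c_task = np.array([-2.0, 1.0, 1.0])     # β[one surface] + β[two surfaces] − 2β[fixation]
c_number = np.array([0.0, -1.0, 1.0])   # β[two surfaces] − β[one surface]
print(c_task @ c_number)                 # → 0.0

def group_t(contrast_values):
    """Random-effects one-sample t across subjects' first-level contrast estimates."""
    x = np.asarray(contrast_values, dtype=float)
    return x.mean() / (x.std(ddof=1) / np.sqrt(x.size))

def t_ratio(t_number, t_task, sig_number, sig_task):
    """t[number of surfaces] / t[task] in voxels significant in either map; NaN elsewhere."""
    mask = sig_number | sig_task
    out = np.full(t_number.shape, np.nan)
    out[mask] = t_number[mask] / t_task[mask]
    return out
```

Restricting the ratio to voxels that pass threshold in at least one contrast avoids displaying ratios of two noise values.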
To quantify the effects of task performance and of the number of surfaces across different parietal areas, we conducted a region-of-interest (ROI) analysis. All ROIs were spheres of radius 1 cm, constructed using MarsBar for SPM (http://marsbar.sourceforge.net). Coordinates are given in MNI152 space, converted from Talairach space where necessary using tal2mni (http://imaging.mrc-cbu.cam.ac.uk/downloads/MNI2tal/tal2mni.m). Two-tailed statistics were used to calculate p values in t tests. In none of the experiments do we have lateralized stimuli, and we did not expect (or see) lateralized responses; thus, all ROI analyses are collapsed across hemispheres. As summarized in the Introduction, a region close to the parietal/occipital boundary in the inferior IPS has been implicated in object representations (Xu, 2007; Xu & Chun, 2006, 2007; Cusack, 2005). This region has also been found to topographically map attention in a paradigm similar to retinotopic mapping, but with attention shifted around the display rather than with a changing stimulus (Silver et al., 2005). To test whether regions previously shown to be topographic are activated to a greater degree by multiple objects even in the absence of any manipulation of the spatial distribution, the peak coordinates in the inferior IPS from Silver et al. (2005) were selected as the centers of two ROIs (IPS1: ±23, −80, 38).
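A 1-cm spherical ROI amounts to selecting the voxels whose centers lie within 10 mm of the ROI center. The sketch below does this directly with NumPy rather than with MarsBar; the 2-mm MNI grid (shape and affine) is an assumption, and averaging the left and right spheres reflects the decision to collapse across hemispheres.

```python
import numpy as np

# Hypothetical 2-mm MNI grid (the common 91 x 109 x 91 MNI152 layout).
AFFINE = np.array([[-2.0, 0, 0, 90], [0, 2.0, 0, -126], [0, 0, 2.0, -72], [0, 0, 0, 1]])
SHAPE = (91, 109, 91)

def sphere_mask(center_mm, radius_mm=10.0, affine=AFFINE, shape=SHAPE):
    """Voxels whose centres lie within radius_mm of center_mm (world/MNI coordinates)."""
    ijk = np.indices(shape).reshape(3, -1).T              # voxel indices, one row per voxel
    xyz = ijk @ affine[:3, :3].T + affine[:3, 3]          # voxel centres in mm
    dist = np.linalg.norm(xyz - np.asarray(center_mm, dtype=float), axis=1)
    return (dist <= radius_mm).reshape(shape)

def bilateral_roi_mean(volume, x, y, z, radius_mm=10.0):
    """Mean of the left and right sphere means (collapsed across hemispheres)."""
    left = sphere_mask((-abs(x), y, z), radius_mm)
    right = sphere_mask((abs(x), y, z), radius_mm)
    return 0.5 * (volume[left].mean() + volume[right].mean())
```

For example, bilateral_roi_mean(con_image, 23, -80, 38) would summarize a contrast image in the Silver-IPS1 ROIs, where con_image is a hypothetical array on the same grid.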
To characterize activity in more general, non-task-specific, parietal regions (the MD network), we extended the meta-analysis of Duncan and Owen (2000) to the parietal lobe and used a kernel method to summarize the peak activation coordinates. The peaks of the activations from the studies described by Duncan and Owen appeared symmetric, and so those in the left hemisphere were reflected onto the right. A single point was placed at each coordinate, and the resulting image was smoothed (15 mm FWHM) and then thresholded at 3.5 times the height of the smoothed peak from a single point. The final thresholded regions were then mirrored onto both hemispheres. In addition to the lateral and medial frontal regions reported by Duncan and Owen, inclusion of parietal activations revealed clear foci around the IPS at (±37, −53, 40). As shown in Figure 2, these are more lateral and anterior to the inferior IPS ones, and were used to define another pair of ROIs.
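The kernel summary can be sketched as a sum of Gaussians, one per reported peak, each scaled to unit height so that the 3.5 threshold is expressed in units of a single smoothed point. This is an illustrative reimplementation under stated assumptions (evaluation at an arbitrary list of points rather than over a smoothed image; hypothetical peak coordinates in the usage example), not the original analysis code. Reflection is implemented by taking the absolute value of x for both the peaks and the evaluation points, which yields the mirrored, bilateral map.

```python
import numpy as np

def mirrored_kernel_map(peaks_mm, grid_xyz, fwhm_mm=15.0, k=3.5):
    """Kernel summary of activation peaks; returns (density, mask) at the grid points.

    Each peak contributes a Gaussian of height 1, so the threshold k is in units of
    'the smoothed peak from a single point'.
    """
    peaks = np.asarray(peaks_mm, dtype=float).copy()
    peaks[:, 0] = np.abs(peaks[:, 0])                 # reflect left-hemisphere peaks onto the right
    grid = np.asarray(grid_xyz, dtype=float).copy()
    grid[:, 0] = np.abs(grid[:, 0])                   # mirror the map onto both hemispheres
    sigma = fwhm_mm / np.sqrt(8.0 * np.log(2.0))      # FWHM -> Gaussian standard deviation
    d2 = ((grid[:, None, :] - peaks[None, :, :]) ** 2).sum(axis=2)
    density = np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=1)
    return density, density >= k

peaks = [[37, -53, 40], [-36, -52, 41], [38, -54, 39], [-37, -53, 40], [10, 20, 30]]  # hypothetical
density, mask = mirrored_kernel_map(peaks, [[37, -53, 40], [-37, -53, 40], [10, 20, 30]])
```

In this toy example, four peaks that nearly coincide after reflection exceed the threshold in both hemispheres, whereas an isolated peak does not.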
In addition to these key parietal ROIs, we also investigated the response in visual regions (the lateral occipital complex [LOC], taken from Xu & Chun, 2006: −44, −71, 5 and 42, −69, 0) and in frontal executive control regions taken from the MD kernel analysis (dorsolateral prefrontal cortex [DLPFC]: ±42, 24, 25). Finally, for direct comparison with Xu and Chun's (2006) studies, we summarized the response in their “inferior IPS” region on the occipital/parietal boundary (−21, −89, 24 and 26, −85, 28), which lies close to Silver's IPS1, and in their “superior IPS” region (−21, −70, 42 and 23, −56, 46), which lies closest to the MD regions.
The task was easy and performance was excellent (mean accuracy was 97%, with a standard error across subjects of 0.7%). Neuroimaging revealed that presentation of the stimulus and performance of the task, when contrasted with fixation, recruited a broad range of regions, including occipital visual areas, left motor cortex, and regions in the fronto-parietal MD network (Figure 3A). The contrast of two surfaces minus one surface revealed several regions in common with this (Figure 3B). Both contrasts activated regions in the posterior parietal lobe to some extent. To visualize the relative strength of the response to the contrast between two surfaces and one surface and to the contrast of task versus fixation, we calculated the ratio of the t values of these two contrasts (Figure 3C) for all voxels that were significant in either of the whole-brain corrected contrasts.
The response in the parietal lobe and other regions was quantified using ROI analyses (Figure 4A). There was a significant effect of the number of surfaces in the inferior IPS but not in the MD-IPS [Silver-IPS1: t(15) = 2.70, p < .02; MD-IPS: t(15) = 0.80, ns]. The opposite pattern was seen for the general task demand, with no evidence to reject the null hypothesis in the inferior IPS, but a significant effect in MD-IPS [Silver-IPS1: t(15) = 1.42, ns; MD-IPS: t(15) = 3.14, p < .01]. A repeated-measures ANOVA confirmed this Task × Region interaction [F(1, 15) = 8.23, p < .02].
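For a 2 × 2 within-subject design such as the Task × Region interaction reported above, the interaction F(1, n − 1) equals the square of a one-sample t on each subject's double difference, which is why the interaction has 1 and 15 degrees of freedom with 16 subjects. The sketch below computes it that way; the function names, the mapping of arguments to cells, and the simulated values are illustrative only.

```python
import numpy as np

def one_sample_t(x):
    x = np.asarray(x, dtype=float)
    return x.mean() / (x.std(ddof=1) / np.sqrt(x.size))

def interaction_2x2(a1b1, a1b2, a2b1, a2b2):
    """F and df for the A x B interaction in a 2 x 2 within-subject design.

    Each argument holds one value per subject, e.g. A = contrast (objects, task),
    B = region (inferior IPS, MD-IPS); this mapping is illustrative.
    """
    d = (np.asarray(a1b1, float) - np.asarray(a1b2, float)) \
        - (np.asarray(a2b1, float) - np.asarray(a2b2, float))
    return one_sample_t(d) ** 2, (1, d.size - 1)

rng = np.random.default_rng(0)
cells = rng.normal(size=(4, 16))          # hypothetical per-subject values, 16 subjects
F, df = interaction_2x2(*cells)
print(df)  # → (1, 15)
```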
As shown in Figure 5A and B, the LOC and inferior IPS responded during performance of the task [t(15) = 7.82, p < .001 and t(15) = 5.49, p < .001, respectively] and showed greater recruitment by two surfaces than one [t(15) = 5.09, p < .001; t(15) = 4.47, p < .001]. In contrast, DLPFC was not significantly activated by task [t(15) = 1.65, ns] or by the number of surfaces [t(15) = 0.32, ns]. Finally, the superior IPS region showed a pattern similar to that of MD-IPS, being significantly activated by performance of the task [t(15) = 2.89, p < .02] but not by the number of surfaces [t(15) = 0.72, ns].
Regions in PPC responded differentially. The more inferior IPS region (Silver-IPS1) showed selectivity for two surfaces rather than one, whereas the MD-IPS region did not. Conversely, the MD-IPS region responded to the performance of the task in general more strongly than the inferior region. This is consistent with the presence of object-related processing in the inferior IPS (Xu & Chun, 2006; Cusack, 2005), and distinct general processing resources in the MD-IPS (Duncan, 2006). That the object effect was seen even though the two surfaces did not differ in spatial location shows that a difference in the number of locations or spatial extent is not required, as found in the auditory study of Cusack (2005). This is not incompatible with a topographic representation in these regions: They might be topographically mapped (cf. Swisher, Halko, Merabet, McMains, & Somers, 2007; Silver et al., 2005), but over and above this, respond with the number of objects in the display.
Note that this effect of object representation in the absence of spatial differences is at odds with the conclusions drawn from Xu and Chun's (2006) Experiment 4, which used memory for sequentially presented shapes at the same location or at different locations, and concluded that objects at different spatial locations are required to modulate the inferior IPS. However, it is consistent with Xu and Chun (2007), who found that differences in grouping modulate inferior IPS activity without differences in the number of relevant spatial locations. At this stage, it is not clear what the critical aspect is of the difference between Xu and Chun's (2006) Experiment 4 and our experiment.
There was a low-level visual difference in the degree of motion coherence between the two-surface and one-surface conditions, which probably contributed directly to the contrast in visual regions. It is also possible that some of these differences in visual regions were mediated by the number of objects: It has been shown before that the response in MT can be modified by the number of objects in the presence of only small visual differences (Stoner & Albright, 1992). Conversely, it is also possible, although less likely, that regions in the parietal lobe were modulated directly by the visual differences (see General Discussion, which also considers the possible role of eye movements).
This experiment and the next investigate the pattern of recruitment of the parietal lobe by attention switching and its relationship to discrete object representations. In previous studies and Experiment 1 of the current study, when a greater number of objects were perceived, there may have been increased endogenous attention switching between them. This might be true even for the maintenance period of VSTM tasks (e.g., Todd & Marois, 2004), as attention switching between memorized items has been suggested as a rehearsal mechanism (Smyth, 1996; Smyth & Scholey, 1994). This is one explanation for the link reported between selective attention and VSTM capacity (Vogel, McCollough, & Machizawa, 2005). The parietal activity observed in Experiment 1 in response to multiple objects might reflect this switching and so here we isolate activity due to attention switching from that due to the number of objects. In Experiment 2A, we evaluate a paradigm behaviorally, and in Experiment 2B, we describe an fMRI experiment.
Ten participants were tested in a quiet room. Three 10-min blocks of 40 trials (each 15 sec in duration) were presented. To further investigate the role of object number without differences in spatial distribution, the stimuli were again transparent motion dot surfaces, but presented for much longer (13 sec) than in Experiment 1. The number of objects and requirement for switching were independently manipulated. To manipulate attention, in this experiment, participants were given the task of detecting targets. As in Experiment 1, surfaces were formed by coherently oscillating a set of dots: To generate one surface, all of the dots were oscillated along a single axis, and to generate two, half of the dots were oscillated along one axis and the other half in the perpendicular direction. The targets to be detected were occasional “ripples” in a surface, formed by a ripple of motion with a sinusoidal pattern across space in the direction perpendicular to the surface's primary axis of movement. This is illustrated in Figure 6. These targets were chosen to encourage attention to the whole surface rather than a small part of it, and to encourage selective attention to a surface, as the subtle ripple motion would have been in the same direction as the primary axis of movement of the other surface. In this experiment, there was an explicit examination of the effect of task difficulty, with two different types of target, one obvious (“easy”) and one more subtle (“hard”), that differed in the amplitude of the ripple motion.
Stimuli were presented on a monitor of resolution 1024 × 768 with a color depth of 32 bits and a refresh rate of 75 Hz. The total number of points on each surface was 350. They were viewed through a rectangular window of size 240 × 240 pixels, an approximate visual angle of 8°. To allow cueing to a particular surface when two were present, they were given two different colors (light blue and yellow). The 13 sec duration of each trial gave sufficient time for attention switching between the surfaces when required. The oscillation period of the two surfaces was different (41 frames and 57 frames) to avoid temporal coherence that might encourage grouping. The orientation of one axis was randomly chosen, and the other (if present) was perpendicular to this. The initial phase of oscillation was random. Which color was faster was randomly chosen on each trial. Before each trial, task instructions were displayed in the center of the screen. In all types of trial, there were three tone pips (duration 100 msec with a cosine squared amplitude envelope, frequency 440 Hz, approximately 75 dB SPL). The timing of these was chosen from a uniform random distribution, with the constraint that no tone could come closer to another than 2 sec. Either two or four targets were presented on each trial. The target timing was also chosen from a uniform random distribution, but constrained so that they could never be closer than 1 sec apart. The easy target had a maximum displacement of 6 pixels, and the hard, 3 pixels. The ripple motion had parabolic amplitude across time, smoothly appearing and then disappearing. It had a sinusoidal shape in space, with a spatial frequency of 1.5 cycles across the rectangle.
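The timing constraints and the ripple target can be sketched as follows. Drawing times uniformly and redrawing until every gap meets the minimum is one simple way to sample from a uniform distribution subject to a spacing constraint; the text does not say how the constraint was enforced, so this is an assumption, as is sampling over the 13-sec stimulus. The ripple function assumes that the displacement is perpendicular to the surface's axis of motion and varies sinusoidally with position along that axis, and the ripple duration (1 sec here) is a hypothetical value because it is not given in the text. The 4u(1 − u) envelope is one parabola that rises from zero to a peak at mid-duration and returns to zero.

```python
import numpy as np

def sample_event_times(n, min_gap_s, trial_s=13.0, rng=None):
    """Uniform random times in [0, trial_s], redrawn until all are >= min_gap_s apart."""
    rng = np.random.default_rng() if rng is None else rng
    while True:
        t = np.sort(rng.uniform(0.0, trial_s, size=n))
        if n < 2 or np.diff(t).min() >= min_gap_s:
            return t

def ripple_displacement(pos_along_px, t_s, onset_s, peak_px, dur_s=1.0,
                        width_px=240.0, cycles=1.5):
    """Displacement perpendicular to a surface's axis for dots at pos_along_px (0..width_px).

    peak_px is 6 for easy and 3 for hard targets; dur_s is a hypothetical value.
    """
    u = (t_s - onset_s) / dur_s
    envelope = 4.0 * u * (1.0 - u) if 0.0 <= u <= 1.0 else 0.0   # parabola: 0 at ends, 1 at middle
    return peak_px * envelope * np.sin(2.0 * np.pi * cycles * np.asarray(pos_along_px) / width_px)

rng = np.random.default_rng(3)
tones = sample_event_times(3, min_gap_s=2.0, rng=rng)     # three tone pips, >= 2 s apart
targets = sample_event_times(4, min_gap_s=1.0, rng=rng)   # four targets, >= 1 s apart
```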
The main purpose of this behavioral pilot was to assess the stimuli, and to confirm that the task could not be performed by attending globally to both surfaces simultaneously and then filtering targets at a postperceptual level. We independently manipulated the requirement for switching, the number of surfaces, and task difficulty. There were five conditions: C1e, one surface with easy targets; C1h, one surface with hard targets; C2f, two surfaces, with attention focused on one throughout; C2s, two surfaces, with attention switched at each tone (three per trial); and C2b, two surfaces, with attention paid to both throughout. In all of the two-surface conditions, the targets were of the “easy” amplitude. In all but condition C2s, participants were asked to ignore the tone pips. Condition C2b was introduced to examine the extent to which it was possible to attend to both surfaces simultaneously: If the task could be done by monitoring some global feature, so that selection was not required, then we would not be able to measure attention switching.
Results and Discussion
Behavioral performance is shown in Figure 7A. The proportion of hits was 88%, 66%, 75%, 68%, and 43% for conditions C1e, C1h, C2f, C2s, and C2b, respectively. Signal detection theory was used to correct for response bias. A repeated-measures ANOVA on d′ scores revealed a main effect of condition [F(4, 36) = 35.4, p < .001]. Planned comparisons showed, first, that the difficulty manipulation was effective, with lower accuracy in C1h than C1e [t(9) = 4.65, p < .001]. Second, participants were reasonably good at doing the task in the presence of another surface, with performance similar between C1h and the conditions in which attention had to be focused on one surface throughout (C2f) or switched by tones (C2s). Attention switching was performed reasonably well, although performance was a little worse in C2s than in C2f [t(9) = 2.59, p < .05]. Third, participants did not seem to be able to monitor for targets on both surfaces (C2b), as they were much worse than in C2f [t(9) = 9.00, p < .001] or C2s [t(9) = 6.21, p < .001]. The number of targets identified in C2b was approximately equal to the number that could be identified monitoring just a single surface alone (C1e), even though C2b contained twice as many potential targets.
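Correcting for response bias with signal detection theory means converting hit and false-alarm rates into d′ = z(H) − z(F), where z is the inverse of the standard normal cumulative distribution. The sketch uses the standard-library inverse normal CDF; the log-linear correction for rates of 0 or 1 and the example counts are assumptions, since the text does not specify how false alarms were counted in this continuous monitoring task. The final line reproduces the arithmetic behind the comparison of C2b with C1e.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction (assumed here)."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: same hit rate, different false-alarm rates, so d' differs.
print(round(d_prime(45, 15, 5, 55), 2), round(d_prime(45, 15, 20, 40), 2))

# C2b: 43% of twice as many potential targets, compared with 88% of one set in C1e.
print(round(2 * 0.43, 2), 0.88)
```

Because d′ discounts a tendency to respond liberally, two conditions with the same hit rate can still differ in sensitivity if their false-alarm rates differ.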
Our transparent dot-motion surfaces were an effective substrate for selective attention, as in the paradigms of Rodriguez and Valdes-Sosa (2006) and Valdes-Sosa et al. (2000). That participants did poorly at monitoring for targets on both surfaces simultaneously suggests that, in this paradigm, attentional selection was necessary.
The paradigm used in Experiment 2A separately manipulated the number of objects and the degree of attention switching required. In this fMRI experiment, we use this paradigm to distinguish the neural components of each cognitive process. There were four conditions, corresponding to C1e, C1h, C2f, and C2s. C2b was omitted, as the behavioral results of Experiment 2A suggested this was similar to C2s, except with attention switching of unknown frequency and timing.
Scanning was performed on a 3-T Bruker Medspec machine at the Wolfson Brain Imaging Centre. Sixteen participants were tested. Two blocks of 745 EPI volumes were acquired, each comprising 21 slices in an interleaved ascending order with matrix size 64 × 64, a TE of 27 msec, and a TA/TR of 1.1 sec. An SPGR sequence was used to acquire a T1-weighted anatomical. As with Experiment 1, SPM2 and the automatic analysis (aa) scripting system were used for preprocessing. The first eight dummy scans of each EPI session were discarded. The data were slice timing corrected, motion corrected, and then the mean coregistered to the T1 anatomical using the mutual-information cost function. The mapping from each subject's space to a standard (MNI) template space was derived from the T1 anatomical using normalization in SPM. This mapping was applied to the EPIs and the images were spatially smoothed using a spherical 10-mm FWHM Gaussian filter. SPM5 was used for statistical modeling. A regression model, as specified with an SPM design matrix, was used to partition components of the BOLD response. There were 11 columns of interest for each session. At the start of each 13-sec trial, we hypothesized that there might be distinct cognitive processes, such as in the selection of the single relevant surface in the two surface conditions. Given the extended time course of the trials, it is possible to distinguish the transient initial response from the sustained one, and so each of the four conditions was modeled with two predictors: delta functions at onset, and boxcars for the duration of each block. Each of these predictors was convolved with the hemodynamic response. Furthermore, three predictors were used to model each button press: a delta function convolved with the canonical hemodynamic response, its temporal derivative (to allow the modeling of slightly earlier or later responses), and a dispersion term (to allow the modeling of slightly broader or narrower responses). 
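The onset/sustained decomposition can be sketched by building, for each condition, a single-bin “delta” and a 13-sec boxcar at fine time resolution, convolving both with an HRF, and sampling at the 1.1-sec TR. The double-gamma HRF, the 15-sec trial spacing, and the cyclic condition order are assumptions for illustration; the three button-press columns (SPM's HRF plus temporal and dispersion derivatives) are only listed, not built. Counting the columns recovers the 11 columns of interest, and the fact that each onset regressor is not collinear with its sustained counterpart is what allows the two components to be estimated separately.

```python
import numpy as np
from math import gamma

DT, TR, TRIAL_S, N_SCANS = 0.1, 1.1, 13.0, 737   # 745 volumes minus 8 dummy scans

def hrf(dt=DT, length_s=32.0):
    """Double-gamma approximation of SPM's canonical HRF (assumption)."""
    t = np.arange(0.0, length_s, dt)
    gpdf = lambda x, k: x ** (k - 1) * np.exp(-x) / gamma(k)
    h = gpdf(t, 6.0) - gpdf(t, 16.0) / 6.0
    return h / h.sum()

def regressor(onsets_s, duration_s, n_scans=N_SCANS):
    """duration_s == 0 gives a single-bin 'delta'; otherwise a boxcar of that length."""
    n_fine = int(round(n_scans * TR / DT))
    x = np.zeros(n_fine)
    for on in onsets_s:
        i0 = int(round(on / DT))
        i1 = max(i0 + 1, int(round((on + duration_s) / DT)))
        x[i0:i1] = 1.0
    conv = np.convolve(x, hrf())[:n_fine]
    return conv[::int(round(TR / DT))][:n_scans]

conditions = ["C1e", "C1h", "C2f", "C2s"]
onsets = np.arange(0.0, N_SCANS * TR - 20.0, 15.0)   # hypothetical 15-s trial spacing
cond_of_trial = np.resize(conditions, onsets.size)    # hypothetical cyclic condition order
columns, cols = [], []
for c in conditions:
    on = onsets[cond_of_trial == c]
    cols += [regressor(on, 0.0), regressor(on, TRIAL_S)]
    columns += [f"{c}_onset", f"{c}_sustained"]
# Button presses: HRF + temporal derivative + dispersion derivative (listed only, not built here).
columns += ["press_hrf", "press_temporal_deriv", "press_dispersion_deriv"]
X = np.column_stack(cols)
print(len(columns), X.shape)  # → 11 (737, 8)
```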
Movement parameters and session means were also added to the model to remove effects of no interest.
Estimating (fitting) the model gave a regression coefficient (β) for each predictor at each voxel. To reveal the effect of attention switching, we calculated the contrast βC2s − βC2f for both the onset and block regressors. These two conditions are identical in stimuli and equivalent in the task being performed, the number of targets, and the type of stimuli attended to. The only difference is that in C2s, attention switching is required each time a tone is heard, whereas in C2f, the tones are irrelevant. A contrast revealing the effect of the number of objects in the absence of attention switching [βC2f − (βC1e + βC1h)/2] was also calculated for the onset and block regressors. Finally, an effect of task difficulty (βC1h − βC1e) was calculated for the block regressors. Group random-effects analyses were calculated using parametric statistics and reported at a corrected threshold (FDR p < .05). An ROI analysis was conducted using the same regions as in Experiment 1.
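The contrasts listed above are weighted sums of condition betas. The sketch below writes the three contrasts as weight dictionaries and computes a group random-effects one-sample t statistic across subjects. The beta values and function names are hypothetical; it illustrates the arithmetic only, is not the SPM pipeline, and omits FDR correction across voxels.

```python
import math
from statistics import mean, stdev

# Contrast weights over the four condition betas (onset or block regressors).
CONTRASTS = {
    "attention_switching": {"C2s": 1.0, "C2f": -1.0},
    "number_of_objects": {"C2f": 1.0, "C1e": -0.5, "C1h": -0.5},
    "task_difficulty": {"C1h": 1.0, "C1e": -1.0},
}


def contrast_value(betas, weights):
    """Weighted sum of one subject's condition betas, e.g. beta_C2s - beta_C2f."""
    return sum(w * betas[cond] for cond, w in weights.items())


def one_sample_t(values):
    """Group random-effects t statistic (and df) testing mean contrast != 0."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n)), n - 1


# Hypothetical betas for three subjects in one voxel or ROI.
subjects = [
    {"C1e": 1.0, "C1h": 2.0, "C2f": 3.0, "C2s": 5.0},
    {"C1e": 0.5, "C1h": 1.5, "C2f": 2.0, "C2s": 3.5},
    {"C1e": 1.2, "C1h": 1.8, "C2f": 2.6, "C2s": 4.4},
]
switching = [contrast_value(b, CONTRASTS["attention_switching"]) for b in subjects]
t_stat, df = one_sample_t(switching)
```

Each weight set sums to zero, so a constant offset shared by all conditions does not contribute to any contrast.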
Behavioral performance is shown in Figure 7B. A repeated-measures ANOVA revealed a main effect of condition [F(1, 15) = 46.3, p < .001]. As in Experiment 2A, the difficulty manipulation with one surface was effective [C1e vs. C1h: t(15) = 11.1, p < .001]. Performance on the two-surface conditions was intermediate between that on the two one-surface conditions. There was no significant detriment in performance when attention was switched between the two surfaces [C2f vs. C2s: t(15) = 1.00, ns].
Attention switching (βC2s − βC2f) led to a sustained response over a broad network including frontal and parietal regions (Figure 8A). However, over and above this sustained response, there was no additional response at the onset of each 13-sec block. Two surfaces rather than one [βC2f − (βC1e + βC1h)/2] led to more posterior activation, in parietal and occipital regions, with a different time course: a sustained component (Figure 8C) was supplemented by an additional onset component (Figure 8B). Thus, attention switching and discrete object representation recruit the parietal lobe with different time courses. They also appear to have distinct spatial patterns. To investigate whether these patterns are genuinely distinct or merely the same activation at different strengths, we calculated the ratio of the t statistics for the onset or sustained two-minus-one-surface contrast to those for the best measure of attention switching (switching minus focused, sustained regressor). These ratio maps, shown in Figures 8D and 8E, demonstrate visually that the patterns are distinct, with the surface contrast activating occipital and parietal regions more strongly than attention switching does. To confirm these differences statistically, an ROI analysis was conducted.
The ROI results from the key parietal regions are shown in Figure 4B. There was an effect of the number of objects at onset both in the more inferior region [Silver-IPS1: t(15) = 5.15, p < .001] and in the MD-IPS region [t(15) = 3.30, p < .005], and also an effect of attention switching in both regions [Silver-IPS1: t(15) = 7.06, p < .001; MD-IPS: t(15) = 7.17, p < .001]. A repeated-measures ANOVA with two factors (region, contrast), each with two levels, showed that the inferior region was relatively more selective for the object contrast than the MD-IPS region [Region × Contrast interaction: F(1, 15) = 15.86, p < .001]. The ROI analysis also revealed a sustained effect of two minus one surfaces throughout the block in both regions [Silver-IPS1: t(15) = 5.34, p < .001; MD-IPS: t(15) = 2.65, p < .02].
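For a 2 × 2 within-subject design such as the Region × Contrast ANOVA above, the interaction F with (1, n − 1) degrees of freedom equals the square of a one-sample t statistic computed on each subject's difference of differences. A minimal sketch of that identity follows; the ROI values are hypothetical, and the function name is illustrative rather than taken from any statistics package.

```python
import math
from statistics import mean, stdev


def interaction_F(r1_c1, r1_c2, r2_c1, r2_c2):
    """Region x Contrast interaction for a 2 x 2 within-subject design.

    Each argument is a list of per-subject values (same subject order).
    With one numerator df, the repeated-measures F equals the square of a
    one-sample t on the per-subject difference of differences.
    """
    scores = [(a - b) - (c - d) for a, b, c, d in zip(r1_c1, r1_c2, r2_c1, r2_c2)]
    n = len(scores)
    t = mean(scores) / (stdev(scores) / math.sqrt(n))
    return t * t, (1, n - 1)


# Hypothetical per-subject ROI values: object and switching contrasts in two regions.
silver_obj = [1.5, 2.5, 3.5]
silver_switch = [0.5, 0.5, 0.5]
md_obj = [0.4, 0.4, 0.4]
md_switch = [0.4, 0.4, 0.4]
F, dfs = interaction_F(silver_obj, silver_switch, md_obj, md_switch)
```

A positive mean score indicates that the first region is relatively more selective for the first contrast, which is the direction of the Silver-IPS1 versus MD-IPS result reported above.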
Switching relative to focused attention also activated occipital, frontal, and other parietal regions (Figure 5A), and these effects were all significant [Xu-LOC: t(15) = 3.88, p < .002; Xu-infIPS: t(15) = 3.60, p < .005; Xu-supIPS: t(15) = 4.67, p < .001; DLPFC: t(15) = 8.43, p < .001]. Similarly, the difference between two surfaces and one surface (Figure 5B) recruited all but the frontal region [Xu-LOC: t(15) = 6.23, p < .001; Xu-infIPS: t(15) = 3.88, p < .002; Xu-supIPS: t(15) = 4.68, p < .001; DLPFC: t(15) = 1.70, p = .11].
To confirm that the effect of the number of objects was not an effect of difficulty, we compared Condition 2f with Condition 1h, which had a very similar level of performance (see Figure 7B). Even in this contrast, activity was greater in 2f than in 1h in both Silver-IPS1 [t(15) = 5.99, p < .001] and MD-IPS [t(15) = 5.05, p < .001], and this difference was significantly larger in Silver-IPS1 [F(1, 15) = 11.0, p < .005]. Furthermore, neither the onset nor the sustained regressors showed a significant effect when Condition 1h was compared with Condition 1e.
A dissociation was found between the response to attention switching (Condition 2s − Condition 2f) and the response to the number of objects (Condition 2f − Conditions 1e/1h). This dissociation was spatial, in that DLPFC and MD-IPS were most strongly recruited by attention switching and Silver-IPS1 by the number of objects. It was also temporal: the object difference was enhanced at onset, whereas attention switching was best explained by a single regressor persisting throughout the block. That attention-switching activity persisted throughout the block might be expected, as the tone pips signaling switches were randomly scattered throughout the block.
There are a number of possible reasons why the object activity was stronger at the start of each block. First, it might be an effect of attention, with the activity reflecting the number of objects that are attended rather than merely present in the display, which would decline as selection takes place. Second, there might be some activity associated with the process of generating or selecting among discrete object representations, over and above the activity from maintaining them. Third, there might be some adaptation to the visual stimulus. The first possibility, that only attended objects matter, is consistent with some prior data. EEG studies of VSTM use a paradigm (bilateral presentation with cueing) that depends entirely upon the response being modulated by attention (Vogel et al., 2005; Vogel & Machizawa, 2004). Furthermore, Mitchell and Cusack (2008) found that object load–related activity in parietal regions was modulated by task rather than being automatic. To characterize the effect of the number of attended objects, an fMRI experiment with a condition in which multiple items are attended would be useful. However, Experiment 2A showed that, for the current stimuli, participants were unable to monitor both surfaces (C2b), suggesting that they attend to only a single surface even when asked to monitor both. This might be because, for these overlapping stimuli and targets, conflict between the two stimuli encouraged winner-takes-all attention. In Experiment 4, described later, stimuli with less conflict between items were used, and multiple objects could be attended simultaneously. Future experiments might use such stimuli with longer trials to understand the time course of attention.
The second possibility, that there is activity involved in generating discrete object representations or selecting one of them, might be investigated by varying the ambiguity of the perceptual arrangement, thus modulating the amount of work that must be done to create them or select one. Relevant to the third possibility is the consideration in the General Discussion of the extent to which a basic visual response contributes to the parietal activity we observe.
EXTENDING THE ROI ANALYSIS TO EARLIER STUDIES
To relate the current studies to previous work, we conducted ROI analyses of two published datasets from our laboratory that have examined object representations, in the same parietal regions used for Experiments 1 and 2. The conditions and analyses examined are a subset of those presented in the original papers, but the ROI analyses allow the patterns of the response in the parietal lobe to be probed in detail, and understood in the context of the results above.
Experiment 3: Auditory Streaming
A “stream” is the result of the perceptual grouping of sequential sounds and a unit in audition that is the target of selective attention (Bregman, 1990; Bregman & Campbell, 1971), and in this sense, it is similar to the concept of an “object” in vision (Duncan, 1984). Cusack (2005) used sequences of sounds that have an ambiguous percept and could be heard as one stream or two. Although the physical characteristics of the sequences do modulate the propensity of listeners to hear one or two streams, for suitably designed stimuli the percept is bistable and switches randomly from one percept to the other (Pressnitzer & Hupé, 2006; Anstis & Saida, 1985). The use of bistable ambiguous sequences conveniently allowed comparison of the neural correlates of a two-stream percept with those of a one-stream percept in the absence of physical differences in the stimuli.
Readers are referred to Cusack (2005) for a detailed description of the experiment. Two important fMRI contrasts were summarized in the ROI analysis here. In direct analogy with Experiment 1, one contrast revealed the regions recruited by presentation of the stimuli and task demands, relative to a resting baseline; the other revealed the difference between the percept of two streams and that of one stream, with the physical stimulus parameters (timing and frequency separation) controlled. The response in these contrasts was quantified for the same regions as in Experiments 1 and 2.
Results and Discussion
The key results of the ROI analysis are shown in Figure 4. The more inferior IPS region (Silver-IPS1) was recruited by two streams more than by one stream [t(17) = 3.60, p < .005], as was the MD-IPS region [t(17) = 2.25, p < .05]. However, whereas the MD-IPS region was activated by the task in general [t(17) = 3.45, p < .005], the inferior IPS region was weakly deactivated [t(17) = −2.17, p < .05]. A repeated-measures ANOVA revealed a significant Contrast × Region interaction [F(1, 17) = 29.9, p < .001], showing that the responses in these two regions are dissociated by these contrasts.
The response in other regions to the task in general is shown in Figure 5A. There was suppression in some regions [Xu-LOC: t(17) = −3.33, p < .005; Xu-infIPS: t(17) = −3.40, p < .005], but recruitment of the MD network [MD-IPS: t(17) = 3.48, p < .005; MD-IFS: t(17) = 3.45, p < .005]. The difference between two streams and one stream recruited all but the MD-IFS [Figure 5B; Xu-LOC: t(17) = 2.23, p < .05; Xu-infIPS: t(17) = 2.64, p < .05; Xu-supIPS: t(17) = 2.12, p < .05; MD-IPS: t(17) = 2.25, p < .05; MD-IFS: t(17) = 1.79, ns].
In the Silver-IPS1 region, the pattern of results parallels that observed in Experiments 1 and 2, which used transparent visual motion. This suggests that (as argued in the discussion of Cusack, 2005 and by Cowan, 2001) the representation of multiple objects and the performance of tasks in general recruit common subsets of parietal cortex across sensory modalities.
Another salient feature of the results is that performance of the auditory task (Figure 5A) suppressed activity in Xu-LOC and Xu-infIPS (and to a lesser degree in Silver-IPS1), in contrast with the broad enhancement seen in the other experiments. This might reflect modality-specific attention in these occipital and inferior parietal regions. No visual stimulus was projected, but the scanner bore was not dark and participants were not asked to close their eyes. During the periods of silent baseline, they may well have attended to their visual environment more than during the auditory task. Like Experiment 2, Experiment 3 recruited amodal frontal regions; the smaller activation of these regions in Experiment 1 is likely to be because its task was easy.
Experiment 4: Perceptual and Visual Short-term Memory Load
As discussed in the Introduction, Mitchell and Cusack (2008) extended a paradigm used by Todd and Marois (2004) and found that, up to a capacity limit of around three to four items, posterior parietal representations follow the number of objects, and that thereafter the response ceases to increase, both in tasks that require VSTM and in those that do not. The stimuli in this study were quite different from those in Experiments 1 and 2, comprising a number of static colored discs.
Again, the reader is referred to the published work for a detailed description of the methods. Here, we conduct an ROI analysis to quantify object-related activity in a VSTM task and a perceptual task. The VSTM task required change detection as used by Todd and Marois (2004), with a sample display of colored discs, a maintenance period, and then a probe display comprising a single colored disc. Subjects had to report whether this disc had been present at the same location in the sample display. The perceptual task we report here (extended spatial and temporal attention) used an identical sample display, which remained visible throughout what had been the maintenance period, removing the requirement for VSTM. On half of the trials, at offset, one disc disappeared slightly earlier than the others, and subjects had to report whether this had happened. As in Todd and Marois (2004), an auditory task enveloped both tasks to prevent rehearsal through covert articulation. Set sizes of one, four, or eight discs were used in both tasks. In each task, two regressors were used to model the neural response: one that followed set size, and one that followed load as estimated from the VSTM condition, increasing with set size up to a capacity limit of around four items and then leveling off. This experiment was designed to concentrate power on distinguishing different tasks and set sizes and had little power to identify regions involved in general task performance, so no contrast of this type is reported here. The responses to these regressors were quantified for the same regions as in Experiments 1 and 2.
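The two parametric regressors can be sketched as follows. This assumes a fixed capacity of four items for illustration, whereas the study estimated load from VSTM performance; the function names are hypothetical. The sketch also computes the Pearson correlation between the two regressors across the three set sizes to show how strongly they overlap.

```python
CAPACITY = 4.0  # assumed fixed VSTM capacity ("around four items")


def set_size_regressor(set_sizes):
    """Parametric regressor that grows linearly with the number of discs."""
    return [float(s) for s in set_sizes]


def load_regressor(set_sizes, capacity=CAPACITY):
    """Regressor that follows set size up to capacity and then levels off."""
    return [float(min(s, capacity)) for s in set_sizes]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


set_sizes = [1, 4, 8]
linear = set_size_regressor(set_sizes)
capped = load_regressor(set_sizes)
r = pearson_r(linear, capped)
```

With these set sizes the two regressors differ only at a set size of eight, so the largest set size carries the information that separates a load-following response from a set-size-following one.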
As shown in Figure 4D, in the inferior IPS region (Silver-IPS1), the response followed load in both the perceptual task [t(13) = 2.70, p < .02] and the one requiring VSTM [t(13) = 3.30, p < .01]. In the MD-IPS region, neither reached significance although both showed a trend [perceptual: t(13) = 1.91, p = .08; VSTM: t(13) = 1.95, p = .07]. A repeated-measures ANOVA with two factors, task (perceptual vs. VSTM) and region (Silver-IPS1 vs. MD-IPS), showed no effect of task [F(1, 13) = 0.88, ns], a weak trend toward an effect of region [F(1, 13) = 2.42, p = .14] and no interaction [F(1, 13) = 0.01, ns].
Figure 5 shows the response in other regions. In Xu-supIPS, the response to load was strong in both the perceptual [t(13) = 4.28, p < .001] and VSTM [t(13) = 4.98, p < .001] tasks. There was also a response in Xu-LOC [VSTM: t(13) = 4.25, p < .001; perceptual: t(13) = 6.10, p < .001] but not in Xu-infIPS [VSTM: t(13) = 1.76, ns; perceptual: t(13) = 0.82, ns] and little in DLPFC [VSTM: t(13) = 1.87, p = .08; perceptual: t(13) = −0.70, ns].
There was an object load response in Silver-IPS1 in both VSTM and perceptual tasks. This increased from Set Size 1 to 4, and plateaued thereafter, mirroring typical behavioral estimates of the number of items remembered in VSTM tasks. This is consistent with Experiments 1 to 3. Unlike Experiments 1 to 3, however, there was only a weak trend toward a greater response in Silver-IPS1 than in MD-IPS. This might be because of the particular paradigm, or because of the absence of a “general task” control condition.
Xu (2007) and Xu and Chun (2006) proposed that during VSTM, the superior IPS responds in proportion to the total number of remembered features. As can be seen from Figure 5B, in Experiment 4 the superior IPS showed notably greater activity than other regions in both the VSTM and perceptual tasks, but this was not the case for Experiments 1 to 3. In Experiment 4, the number and complexity of features that had to be encoded was substantial: several colored discs were presented, each with a color chosen from a set of 10 and at one of nine spatial locations. In contrast, Experiments 1 to 3 involved a maximum of only two objects, with few, low-dimensional features to encode (orthogonal movement, or low/high pitch). Taken together, the experiments here are consistent with Xu's proposal and extend it to perceptual tasks without a short-term memory requirement.
The response in the posterior and inferior IPS was modulated by the number of surfaces (Experiments 1 and 2), auditory streams (Experiment 3), or colored discs (up to a maximum of around four; Experiment 4). This modulation was observed in the absence of a difference in extent or number of spatial locations (Experiments 1 to 3). The pattern of activity was different from that for attention switching, which recruited frontal (DLPFC) and more anterior parietal (MD-IPS) regions more strongly than posterior, inferior parietal regions (Experiment 2). It was also different from the network recruited by general task performance (Experiments 1 to 3), and was present even when contrasting conditions matched for difficulty (Experiment 2).
In Experiments 1 and 2, the surfaces were created using transparent motion, and there was a difference in the number of movement directions present across the one- and two-surface conditions. Is it parsimonious to conclude that these visual differences alone led to the differential IPS activity? Three results suggest it is not. First, the inferior IPS responses have been seen in experiments that modulated the transparent motion percept using quite different visual manipulations that controlled for the direction and magnitude of local movement (Muckli, Singer, Zanella, & Goebel, 2002). Second, the response tracked the perceptual load of objects even with quite different visual stimuli that were static (Experiment 4), showing that motion is not required. Third, it occurred even in the absence of any difference in visual stimulation at all (Experiment 3).
Another potential confound that must be considered is that of eye movements. Although Experiments 1 and 2 did not require them, observing one surface might have inadvertently led to a different eye movement pattern from observing two. In Experiment 4, eye movements could have been modulated by set size. However, previous authors (Xu & Chun, 2006; Todd & Marois, 2004) have shown that parietal activity persists through the maintenance period of VSTM tasks, even when the visual stimulus has disappeared. Furthermore, Experiment 3 was auditory and there was no difference in visual stimulation. A further piece of evidence against eye movements being the cause of the activity is that the pattern of activation seen with eye movements is similar to that generated by attention switching (Corbetta, 1998), yet in Experiment 2 it was found that the object-related response is quite different from that to attention switching.
What role, then, might the inferior IPS region play? Within some constraints, we suggest three possibilities that are compatible with our results. First, it might play a role in the perceptual grouping that generates object representations, perhaps by acting as a coordinating node for a broad network of regions responsible for grouping. This might happen through phase synchronization (Lisman & Idiart, 1995; Singer & Gray, 1995) or some other mechanism. A related idea is that it maintains the separation between representations of distinct objects so that they do not blur together (as has been suggested by Zhang & Luck, personal communication; Shafritz et al., 2002). Another variant is that it controls the degree of task-dependent fragmentation of the perceptual scene, as seems to be required in audition and vision (Cusack, Deeks, Aikman, & Carlyon, 2004; Hochstein & Ahissar, 2002). One constraint on this role is that, because task affects the inferior IPS response (Mitchell & Cusack, 2008), a purely perceptual bottom–up operation can be ruled out. It is also logically possible that the region plays no role in perceptual organization but instead performs some other function that yields a BOLD response modulated by the state of grouping.
Two other possible cognitive roles proposed in Kahneman et al. (1992) are compatible with our data. One is that the inferior IPS holds the “tokens” or “tags” required whenever the representation of multiple simultaneous identical items is necessary (see also Chun, 1997; Kanwisher, 1987). In this model, the ventral stream represents “what” is in the visual scene, and when multiple items are present that have distinct features, the ventral stream alone could represent them, and tokens or tags would not necessarily be required. But when multiple items of the same kind are present, an additional token representation is required to keep them separate, which we (along with others; Humphreys, 1998) place in the dorsal stream. Although the individual objects in our experiments did differ in their features (and so presumably their ventral response), they also had strong similarities, and so token representations might have been useful to keep them distinct. One constraint the current data would place upon this model is that the tokens in the inferior IPS could not be purely spatial in nature (Experiment 1).
The second role suggested by Kahneman et al. (1992) is that the creation of an “object file” is necessary for a chunk of sensory input to enter awareness. This idea is strongly related to the ideas of “multifocal attention” proposed by Cavanagh and Alvarez (2005) and of “scope of attention” from Cowan et al. (2005) and Cowan (2001). It fits naturally with the effect of task on the activation (Experiment 4) and the time course of the inferior IPS response in Experiment 2B. Future research might investigate whether the inferior IPS response is necessary and sufficient for awareness.
Using a variety of stimuli, a number of authors have contrasted visual stimuli that are perceived as organized wholes with scrambled controls (Fang & He, 2005; Muckli et al., 2002; Braddick et al., 2001; Vanduffel et al., 2001). All demonstrated an inferior IPS component, which could correspond to a discrete object representation response. Interestingly, Vanduffel et al. (2001) found that 3-D structure from motion led to activation in several IPS regions in the human but not in the macaque, using a similar paradigm and imaging method. This might be because of the difficulty of identifying homologous parietal regions across the macaque and human (Van Essen, 2004) or because of differences in functional architecture. Human parietal regions may have additional functionality, as there is some evidence of disproportionate expansion of intraparietal regions from macaque to human relative to other posterior brain regions (Orban, Van Essen, & Vanduffel, 2004; Simon, Mangin, Cohen, Le Bihan, & Dehaene, 2002). However, it is hard to imagine that macaques have no object file representation, particularly given their ability to spontaneously make number comparisons (Hauser, Carey, & Hauser, 2000) and to individuate objects (Santos, Sulkowski, Spaepen, & Hauser, 2002).
What about other parietal regions? The dissociations we found among regions mirror other successes in partitioning PPC (Rushworth, Behrens, & Johansen-Berg, 2006; Silver et al., 2005; Van Essen, 2004; Simon et al., 2002). The MD-IPS region is recruited by general task demand, responding in a similar way to lateral frontal cortex. The inferior IPS region (Silver-IPS1) encodes the number of objects (Xu & Chun, 2006, 2007; Cusack, 2005). The superior IPS region (Xu-supIPS) possibly follows the number of features (Xu, 2007; Xu & Chun, 2006). To further understand the functions of different regions of parietal cortex, we have examined connectivity during task performance across a broad set of fMRI data (Cusack & Owen, 2008). We found that the more anterior (MD) and more posterior (inferior IPS) parietal regions differ in important ways. As might be expected, the MD parietal region shows greater connectivity to lateral and medial frontal regions than the inferior IPS does. In contrast, the inferior IPS shows more connectivity to lateral occipital cortex and other visual regions. Connectivity of an object file region to visual regions might serve a feedforward role (helping to demarcate objects), a feedback role (implementing the effect of object demarcation on perceptual interpretation), or both. We do not presently have a suggestion for the role of the medial frontal connectivity.
In summary, we have characterized the roles of the posterior parietal lobe and found that distinct regions encode the presence of multiple objects, are recruited by complex tasks in general or by attention switching, and encode the number of features in the display. Future work will extend our understanding of the roles of each of these regions and their interplay.
Reprint requests should be sent to Rhodri Cusack, MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge CB2 7EF, UK, or via e-mail: firstname.lastname@example.org.
We shall use the word “object” to refer to each discrete subset formed by perceptual organization, following the usage in the attention and working memory literature.