A hallmark of human cognition is the ability to rapidly assign meaning to sensory stimuli. It has been suggested that this fast visual object categorization ability is accomplished by a feedforward processing hierarchy consisting of shape-selective neurons in occipito-temporal cortex that feed into task circuits in frontal cortex computing conceptual category membership. We performed an EEG rapid adaptation study to test this hypothesis. Participants were trained to categorize novel stimuli generated with a morphing system that precisely controlled both stimulus shape and category membership. We subsequently performed EEG recordings while participants performed a category matching task on pairs of successively presented stimuli. We used space–time cluster analysis to identify channels and latencies exhibiting selective neural responses. Neural signals before 200 msec on posterior channels demonstrated a release from adaptation for shape changes, irrespective of category membership, compatible with a shape- but not explicitly category-selective neural representation. A subsequent cluster with anterior topography appeared after 200 msec and exhibited release from adaptation consistent with explicit categorization. These signals were subsequently modulated by perceptual uncertainty starting around 300 msec. The degree of category selectivity of the anterior signals was strongly predictive of behavioral performance. We also observed a posterior category-selective signal after 300 msec exhibiting significant functional connectivity with the initial anterior category-selective signal. In summary, our study supports the proposition that perceptual categorization is accomplished by the brain within a quarter second through a largely feedforward process culminating in frontal areas, followed by later category-selective signals in posterior regions.
The primate sensory system assigns meaning to visual images with swift proficiency. This ability is mediated by a “simple-to-complex” hierarchy of brain areas, the so-called ventral visual stream (Kravitz, Saleem, Baker, Ungerleider, & Mishkin, 2012), in which the complexity of neurons' preferred features gradually increases along a hierarchy. This processing stream begins with selectivity to oriented edges in primary visual cortex (V1), extends along intermediate visual areas V2 and V4, and culminates in lateral occipital and ventral temporal regions that exhibit selectivity for complex natural objects such as faces, cars, and words. The stimulus representations in visual cortex are then hypothesized to provide input to task-relevant circuits in pFC. Monkey electrophysiology (Freedman, Riesenhuber, Poggio, & Miller, 2003; Op de Beeck, Wagemans, & Vogels, 2001; Thomas, Van Hulle, & Vogels, 2001) as well as human fMRI studies (Jiang et al., 2007) have provided support for a two-stage model of perceptual category learning (Panis, Vangeneugden, Op de Beeck, & Wagemans, 2008; Thomas et al., 2001; Riesenhuber & Poggio, 2000; Ashby & Lee, 1991; Nosofsky, 1986) involving a perceptual learning stage in extrastriate visual cortex in which neurons come to acquire sharper tuning with a concomitant higher degree of selectivity for the training stimuli. These stimulus-selective neurons provide input to task modules located in higher cortical areas, such as pFC, mediating behavioral processes such as the identification, discrimination, or categorization of stimuli. A computationally appealing property of this hierarchical model is that the high-level perceptual representation in visual cortex can be used in support of other tasks involving the same stimuli (Riesenhuber & Poggio, 2002), permitting transfer of learning to novel tasks.
To probe the neural bases of perceptual categorization, it is crucial to distinguish the physical shape of a stimulus from its conceptual attributes. Note that “categorization” here refers to the assignment of a semantic label that is abstracted from the physical shapes of the category members. This capability requires that even physically similar stimuli (such as apple and tennis ball) can be assigned to different categories, whereas physically dissimilar stimuli (such as apple and banana) can belong to the same category (“fruit”). Populations of neurons computing the conceptual attributes of a stimulus are expected to respond similarly to physically dissimilar stimuli from the same category and to respond dissimilarly to physically similar stimuli from different categories, whereas populations of neurons selective for physical shape differences lack the sharp response transition at the category boundary and generalization within each category that are the hallmarks of perceptual categorization.
This dissociation between the shape and conceptual attributes of a stimulus can be difficult to achieve with natural categories (Gotts, Milleville, Bellgowan, & Martin, 2011), as stimuli in one category are commonly visually more similar to each other than to images from another category (e.g., faces vs. places). This difficulty has motivated the use of morphed stimulus spaces (Van der Linden, Van Turennout, & Indefrey, 2010; Gillebert, Op de Beeck, Panis, & Wagemans, 2009; Freedman et al., 2003; Freedman, Riesenhuber, Poggio, & Miller, 2001; Op de Beeck et al., 2001) that make it possible to precisely define categories and to control shape similarity within and across categories. Studies using such morphed spaces have revealed that, although categorization training sharpened the selectivity of neurons in temporal cortex for trained stimuli (Van der Linden et al., 2010; Gillebert et al., 2009; Jiang et al., 2007; Freedman, Riesenhuber, Poggio, & Miller, 2006; Freedman et al., 2003; Thomas et al., 2001), these neurons appeared to respond to physical shape similarity only, rather than to conceptual category membership. Instead, neurons selective for the conceptual categories were observed in pFC (Jiang et al., 2007; Freedman et al., 2001; see also Gotts et al., 2011).
However, the temporal dynamics of how the human brain translates physical stimuli into conceptual labels are less clear. In particular, recordings from monkey IT and pFC in the same animals showed that neurons in IT responded to stimulus shape with a mean latency of approximately 100 msec whereas neurons in pFC responded to more abstract parameters, including stimulus category membership, with a mean latency of nearly 200 msec (Freedman et al., 2003). In humans, surface intracranial field potentials indicated that object-selective information accumulated as early as 100 msec poststimulus onset in temporal cortex (Liu, Agam, Madsen, & Kreiman, 2009), but no recordings have probed the representation of conceptual category information and its latency in frontal cortex. Complicating the picture, some human fMRI studies (Van der Linden et al., 2010; Li, Ostwald, Giese, & Kourtzi, 2007) have reported posterior category selectivity, along with frontal selectivity. However, fMRI methods have poor temporal resolution because they only capture hemodynamic responses that integrate over thousands of milliseconds, an order of magnitude slower than the underlying cognitive processes. This low temporal resolution makes fMRI poorly suited to investigate the nuanced temporal dynamics of conceptual categorization, for instance, to address whether posterior categorical information manifests in the first feedforward processing pass following stimulus onset or is driven by top–down modulations that are invoked only after the first feedforward pass of information processing is complete.
In contrast, EEG offers millisecond-level temporal resolution combined with whole-brain coverage, making it possible to precisely study the time course of the neural signals underlying visual categorization in humans. Several potentially relevant signal components have been identified in prior EEG studies of human object recognition. The N1 (also referred to as the N170, particularly in the context of face stimuli), the first negative-going ERP observed on posterior electrodes, has been linked to both stimulus detection (Bentin, Allison, Puce, Perez, & McCarthy, 1996) and individual stimulus encoding (Jacques & Rossion, 2006) in the domain of faces as well as objects of expertise in general (Scott, Tanaka, Sheinberg, & Curran, 2006, 2008). A substantial literature using EEG has shown that both real-world expertise and training on specific object recognition tasks lead to a selectively enhanced N1 (e.g., Scott et al., 2006, 2008; Rossion, Gauthier, Goffaux, Tarr, & Crommelinck, 2002; Tanaka & Curran, 2001). Given the lateral occipital origins of the N1 (Schweinberger, Pickering, Jentzsch, Burton, & Kaufmann, 2002; Shibata et al., 2002), this ERP response component is a good candidate for the manifestation of shape-selective neural responses predicted by the two-stage model of visual object categorization.
It is less clear which EEG components might correspond to explicit categorization. Following the N1 response, candidate ERP response components include the posterior N250 and the more anterior P300. However, the posterior topography of the N250 (Scott et al., 2006, 2008) is inconsistent with the anterior neural source of concept-selective signals observed in recent monkey physiology and human neuroimaging studies, and the long latency of the P300 (Polich & Margala, 1997) is inconsistent with the fast behavioral RTs observed for explicit categorization of single stimuli (Fabre-Thorpe, 2011).
A major challenge in probing the temporal dynamics of human object categorization with EEG is that the magnitude of neural responses alone is insufficient to isolate whether a population of neurons responds to the shape of the stimulus or the concept it represents. To disentangle shape and concept-selective signals, we used EEG-rapid adaptation (RA) techniques. Although previous EEG studies have exploited adaptation effects to probe the selectivity of neuronal representations underlying particular signal components, for example, the N1 (Zimmer & Kovács, 2011; Caharel, d'Arripe, Ramon, Jacques, & Rossion, 2009), RA techniques have been mostly used in fMRI (but see, e.g., Vizioli, Rousselet, & Caldara, 2010; Heisz, Watter, & Shedden, 2006). The RA approach is motivated by findings from IT monkey electrophysiology experiments reporting that the second presentation of a stimulus (within a short time period) evokes a smaller neural response than the first (Miller, Gochin, & Gross, 1993). It has been shown that this adaptation can be measured using fMRI and that the degree of adaptation depends on stimulus similarity in line with recent monkey electrophysiology results (De Baene & Vogels, 2010), with repetitions of the same stimulus causing the greatest suppression and dissimilar stimuli causing progressively less adaptation. We (Jiang et al., 2006, 2007) as well as others (Panis et al., 2008; Fang, Murray, & He, 2007; Gilaie-Dotan & Malach, 2007; Murray & Wojciulik, 2004) have provided evidence that parametric variations in visual object parameters (shape, orientation, or viewpoint) are reflected in systematic modulations of the fMRI-RA signal and can thus be used as an indirect measure of neural population tuning (Grill-Spector, Henson, & Martin, 2006). In fMRI-RA, two stimuli are presented in rapid succession in each trial, with the similarity between the two stimuli in each trial varied to investigate neuronal tuning along the dimensions of interest. 
We applied the same technique to EEG with the goal of probing the selectivity of shape and conceptual category-selective representations in the human brain and how they are dynamically engaged in participants trained on a novel categorization task over morphed shapes.
Twenty individuals participated in the study. One participant was excluded from the group analysis because of inability to learn the task (behavioral performance more than 2 standard deviations below the mean), resulting in a sample of 19 participants (18 right-handed, 11 men, mean age = 24.0 years, range = 18–33 years). Georgetown University's Institutional Review Board approved experimental procedures, and written informed consent was obtained from all participants before the experiment.
Participants were trained on a visual categorization task involving car stimuli generated by a morphing system that was capable of finely and parametrically manipulating stimulus shape (Shelton, 2000). By morphing different amounts of the four car prototypes, we could generate thousands of unique images, continuously vary shape, and precisely define category boundaries (Figure 1A and B). The category of each stimulus was defined by whichever category prototypes contributed more (>50%) to a given morph (Jiang et al., 2007; Freedman et al., 2003). Thus, two sample stimuli could be similar yet situated on opposite sides of the category boundary, whereas stimuli that belonged to the same category could be dissimilar. This careful control of physical similarity within and across categories allowed us to disentangle the neural signals explicitly representing category membership from neural signals related to physical stimulus shape. The grayscale car images were presented on a white background for training and a neutral gray background for testing. Training images varied in size (between 200 and 320 pixels wide) and had different resolutions to prevent participants from focusing on local cues and to discourage a strategy based on latching on to individual local image differences. Images composed of blends of three or four prototypes were used for label training and spanned a four-dimensional morph space excluding a corridor of distances less than 5% from the category boundary. For category label testing, 21 images were positioned at 5% increments from each of the four distinct one-to-one prototype morph lines. The results of category testing were used to select four images from each morph line for the EEG testing (see Figure 1B). These four images were positioned at distances of 0%, 33%, 67%, and 100% relative to participants' subjective category boundaries, as determined by each individual's categorization test. 
The morph space extended from −20% to +120% for all morph lines, permitting the extraction of a balanced quadruplet for each morph line even when an individual category boundary deviated slightly from 50%. Thus, for a participant whose perceptual categorization results placed the category boundary for a given morph line at 45%, for instance, the EEG testing stimulus quadruplet for that particular morph line and participant would consist of images from positions −5%, 28%, 62%, and 95%.
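The quadruplet selection described above can be illustrated with a minimal sketch. It assumes that the four nominal positions (0%, 33%, 67%, 100%) are simply shifted by the offset between a participant's subjective boundary and the nominal 50% boundary, which reproduces the worked example in the text; the function name `eeg_quadruplet` is hypothetical and not taken from the study's scripts.

```python
def eeg_quadruplet(boundary_pct, nominal_boundary=50.0):
    """Return the four morph-line positions (in %) for one participant.

    Assumption: the nominal positions 0, 33, 67, 100 are shifted by the
    difference between the subjective and the nominal (50%) boundary.
    """
    offset = boundary_pct - nominal_boundary
    positions = [p + offset for p in (0.0, 33.0, 67.0, 100.0)]
    # The morph space only extends from -20% to +120%.
    if min(positions) < -20.0 or max(positions) > 120.0:
        raise ValueError("boundary too far from 50% for a balanced quadruplet")
    return positions


# Boundary at 45%, as in the example in the text.
print(eeg_quadruplet(45.0))
```

Under this assumption, boundaries between 30% and 70% can be accommodated within the −20% to +120% morph range.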
Participants completed category label training remotely with a Web-based implementation. Participants learned to categorize stimuli into two categories labeled “SOVOR” and “ZUPUD.” A single training trial consisted of a test stimulus presented for 400 msec, followed by a 300-msec mask, followed by both category labels randomly positioned on the left and right sides of the screen for each trial (illustrated in Figure 1C) to avoid spatial associations with the category labels. Participants had up to 3 sec to indicate the correct label of the test stimulus with the left or right arrow key. Incorrect responses elicited auditory feedback along with a display containing the correct label adjacent to the test stimulus that participants viewed as long as desired. The difficulty of the categorization task was increased by introducing morphs with increasingly greater contributions from the other category until participants could reliably (performance >80%) identify the category membership of randomly chosen images composed of up to 40% of the other category (similar to Jiang et al., 2007).
Scalp voltages were measured using an Electrical Geodesics (EGI, Eugene, OR) 128-channel Hydrocel geodesic sensor net and Net Amps 300 amplifier. Incoming data were digitally low-pass filtered at 200 Hz and sampled at 500 Hz using common mode rejection with vertex reference. Impedances were set below 40 kΩ before recording began and maintained below this threshold throughout the recording session with an impedance check during each break between blocks. During the experiment, participants performed a categorization task on sequentially presented pairs of stimuli (Figure 1D). As in our previous fMRI-RA study (Jiang et al., 2007), stimulus pairs were drawn from individual morph lines according to one of four conditions: M0, corresponding to the presentation of the same stimulus twice; M3-within (hereafter abbreviated as M3w), corresponding to the presentation of two stimuli differing by a 33% shape change along the morph line from the same category (i.e., 0% and 33%; or 67% and 100% positions along the morph line relative to participants' category boundary); M3-between (hereafter abbreviated as M3b), corresponding to the presentation of two stimuli from different categories differing by 33% (i.e., 33% and 67% positions along the morph line); and M6, corresponding to the presentation of two stimuli differing by a 67% shape change from different categories, as illustrated in Figure 1B. The four conditions occurred with equal frequency in randomized order. Participants responded with their right index or middle finger to indicate whether the pair of images belonged to the same or different categories. In a fifth trial type, only one car was presented; participants did not make a response in those trials. The results of these trials are not considered further in this article.
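The mapping from a stimulus pair to its condition label can be made explicit with a short sketch. It assumes that positions are expressed as 0, 33, 67, or 100 relative to the participant's category boundary (located at 50) and that both stimuli come from the same morph line; `pair_condition` is a hypothetical helper introduced only for illustration.

```python
def pair_condition(pos1, pos2):
    """Label a stimulus pair from one morph line as M0, M3w, M3b, or M6.

    Positions are 0, 33, 67, or 100 (% along the morph line, relative to the
    participant's category boundary, which is assumed to sit at 50).
    """
    same_category = (pos1 < 50) == (pos2 < 50)
    distance = abs(pos1 - pos2)
    if distance == 0:
        return "M0"
    if distance in (33, 34):  # 0-33 and 67-100 differ by 33; 33-67 by 34
        return "M3w" if same_category else "M3b"
    if distance == 67 and not same_category:
        return "M6"  # 0-67 or 33-100
    raise ValueError("pair is not one of the four conditions")
```

Note that M3w and M3b involve nominally the same amount of shape change and differ only in whether the pair straddles the category boundary, which is what allows shape and category selectivity to be dissociated.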
Trial timing was 500 msec fixation, blank screen for 500–1000 msec, stimulus 1 for 200 msec, blank for 200 msec, stimulus 2 for 200 msec, blank until participant response or 2300 msec had passed (see Figure 1D). Participants completed five blocks of 240 trials or eight blocks of 160 trials, for a total of 1200 or 1280 trials per participant with equal numbers of trials per condition in all cases. Breaks between blocks of trials were used to maintain impedances below the 40-kΩ threshold.
Data processing and statistical analyses were performed using EEGLAB (Delorme & Makeig, 2004), FieldTrip (Oostenveld, Fries, Maris, & Schoffelen, 2011) versions 20120204 and 20121212, and custom scripts in Matlab 7.10.0 (R2010a). Data were high-pass filtered at 0.1 Hz and low-pass filtered at 30 Hz using two-way least-squares FIR filtering in EEGLAB, epoched on the interval [−600 800] msec relative to Stimulus 2 onset for ERP visualization and truncated to the time interval [−200 400] relative to Stimulus 2 onset for statistical analyses, and baselined on the interval [−200 0] msec relative to stimulus onset to compare responses to the second stimulus across conditions. Bad channels were identified by visual inspection and replaced by the average of their neighbors through interpolation (Oostenveld et al., 2011). Trials containing artifacts or blinks were rejected if the recorded signal changed more than 75 μV within a trial on four vertical EOG channels, as in Scott et al. (2008). ERPs reflect trials for which participants responded correctly.
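As an illustration of two of the preprocessing steps described above, the following minimal sketch shows baseline correction and amplitude-based trial rejection for data held in plain Python lists. It is not the EEGLAB/FieldTrip pipeline used in the study. Two assumptions are made here: the 75-μV criterion is interpreted as the voltage range (maximum minus minimum) within a trial, and a trial is rejected if any EOG channel exceeds it. All function names are hypothetical.

```python
def baseline_correct(epoch, times_ms, window=(-200, 0)):
    """Subtract the mean voltage in the baseline window from one channel's epoch."""
    base = [v for v, t in zip(epoch, times_ms) if window[0] <= t < window[1]]
    mean = sum(base) / len(base)
    return [v - mean for v in epoch]


def exceeds_threshold(eog_channels, threshold_uv=75.0):
    """True if any EOG channel's voltage range within the trial exceeds the threshold.

    Assumption: "changed more than 75 uV" is read as max - min within the trial.
    """
    return any(max(ch) - min(ch) > threshold_uv for ch in eog_channels)


# At 500 Hz, the statistical window [-200 400) msec has one sample every 2 msec.
times_ms = list(range(-200, 400, 2))
flat_epoch = [5.0] * len(times_ms)           # hypothetical constant 5-uV signal
corrected = baseline_correct(flat_epoch, times_ms)
```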
EEG signals were analyzed for stimulus selectivity using cluster-based permutation testing (Maris & Oostenveld, 2007). This approach permits the consideration of all the EEG data without imposing a priori constraints regarding which channels and time points reflect experimental manipulations while controlling for multiple comparisons. The space–time clusters were identified by subjecting every (channel, time point) pair between two conditions to a paired two-tailed t test to identify points where the conditions differed at p < αthresh before correcting for multiple comparisons. These points were then grouped into space–time clusters based on both temporal and spatial adjacency. Two points were considered temporally adjacent if they occurred at consecutive time points; spatial neighbors were constrained by triangulation, resulting in an average of 7.5 neighbors per channel (minimum 5, maximum 10 neighbors). For each cluster, a single statistic was extracted based on the sum of all t values in the cluster. The significance of each cluster was computed by recalculating each cluster statistic for 10,000 random partitions of the trial conditions. The overall significance of each cluster was calculated using the proportion of permutations for which the resulting cluster statistic was greater than the statistic calculated with the correct labels, resulting in a probability measure controlled for Type I error (Maris & Oostenveld, 2007).
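The logic of the cluster-based permutation test can be sketched in a deliberately simplified form. The version below is not the FieldTrip implementation: it treats a single channel, so that clusters are formed by temporal adjacency only. It also assumes a within-participant design in which random partitions are realized by randomly swapping the two condition labels within each participant (equivalent to flipping the sign of that participant's difference wave), and it compares each observed cluster against the permutation distribution of the largest cluster statistic. All names, the critical t value passed in, and the simulated data are illustrative.

```python
import math
import random


def paired_t(diffs):
    """Paired t statistic for one list of per-participant differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)


def clusters(tvals, t_crit):
    """Sum t values over runs of consecutive same-sign suprathreshold samples."""
    sums, current, sign = [], 0.0, 0
    for t in tvals:
        s = (t > t_crit) - (t < -t_crit)  # +1, -1, or 0
        if s != 0 and s == sign:
            current += t                  # extend the running cluster
        else:
            if sign != 0:
                sums.append(current)      # close the previous cluster
            current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        sums.append(current)
    return sums


def cluster_permutation_test(diffs, t_crit, n_perm=1000, seed=0):
    """diffs[p][k]: condition difference for participant p at time point k.

    Returns a list of (cluster_sum, p_value) pairs for the observed clusters.
    """
    rng = random.Random(seed)
    n_times = len(diffs[0])

    def t_series(rows):
        return [paired_t([r[k] for r in rows]) for k in range(n_times)]

    observed = clusters(t_series(diffs), t_crit)
    null_max = []
    for _ in range(n_perm):
        # Swapping condition labels within a participant flips the sign of
        # that participant's whole difference wave.
        flipped = []
        for row in diffs:
            sgn = rng.choice((-1.0, 1.0))
            flipped.append([sgn * v for v in row])
        sums = clusters(t_series(flipped), t_crit)
        null_max.append(max((abs(s) for s in sums), default=0.0))
    # Proportion of permutations whose largest cluster exceeds the observed one.
    return [(s, sum(m > abs(s) for m in null_max) / n_perm) for s in observed]


# Hypothetical demo: 19 simulated participants, 50 samples, effect on samples 20-34.
rng = random.Random(1)
demo = [[rng.gauss(0, 1) + (2.0 if 20 <= k < 35 else 0.0) for k in range(50)]
        for _ in range(19)]
results = cluster_permutation_test(demo, t_crit=2.101, n_perm=200)
```

In the demo, 2.101 is the two-tailed critical t value at α = .05 for 18 degrees of freedom. Extending the sketch to space–time clusters would additionally require a channel neighborhood structure, which is omitted here.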
Stimulus-selective space–time clusters over all 128 channels and selected time windows were identified by contrasting the M0 and M6 conditions to identify clusters that broadly showed adaptation to the stimuli used. For these primary contrasts, the two M3 conditions (M3w and M3b) were not involved in the cluster identification process, making it possible to independently assess the shape or conceptual category-tuning of the identified clusters by comparing the relative amplitudes of M3w and M3b in the stimulus-selective clusters identified by the M0 versus M6 contrast. Similar to our previous fMRI study (Jiang et al., 2007), we reasoned that signals elicited by populations of neurons responsive to a particular stimulus shape would have equivalent response levels to both M3 conditions, as the stimuli in each pair for the two conditions differed by an equivalent amount of physical shape change. The neuronal activation patterns caused by the second stimulus were therefore expected to have similar degrees of overlap and thus exhibit similar levels of adaptation to the M3w and M3b conditions. Unlike the shape-tuned neurons, populations of neurons explicitly selective for stimulus category (i.e., showing conceptual tuning) were predicted to exhibit different response levels in the two M3 conditions because M3w trials, which contained two stimuli that belonged to the same category, should repeatedly stimulate neurons tuned to the same category, causing adaptation. In contrast, because M3b trials contained two stimuli belonging to different categories, the two stimuli should activate different groups of category-selective neurons and therefore cause a stronger, unadapted response. Secondary, follow-up cluster searches were conducted by contrasting “same category” (M0 and M3w) versus “different category” (M3b and M6) conditions and M3w and M3b conditions. These contrasts served to identify the spatial topography and temporal extent of category selectivity.
We performed two cluster searches based on a priori hypotheses. The first was an unconstrained cluster search on the entire 400-msec time window following the onset of the second stimulus to identify signatures of stimulus-selective processing across the whole brain and processing time window (with the [0 400] msec latency window motivated by the fastest participant's median RT for the M0 condition, which was 429.4 msec). The second was a more temporally focused analysis that targeted the N1 component using a 50-msec time interval from 150 to 200 msec relative to Stimulus 2 onset. Given the lack of universal consensus regarding which channels to include in the analysis of the N1 (e.g., Eimer, Gosling, Nicholas, & Kiss, 2011; Caharel et al., 2009; Scott et al., 2008; Schweinberger et al., 2002), the N1 cluster search was initially conducted over all 128 channels using a cluster-identification threshold of αthresh = 0.05. Follow-up cluster searches were performed with lower αthresh values to probe the specificity of observed effects, as indicated in Results.
Functional connectivity was assessed by a coherence calculation in FieldTrip. Four groups of channels (left and right hemisphere N1, P2, and posterior category-selective clusters) were identified by cluster analysis, and coherence was calculated pairwise for each channel combination between the posterior category-selective cluster and the N1 cluster channels and between the posterior category-selective cluster channels and the P2 cluster channels. To limit the spatial extent of the P2 cluster channels to a size comparable to the extent of the posterior category-selective cluster (10 channels), channels belonging to the P2 cluster at cluster onset were considered (see Figure 5). To preserve temporal resolution while adequately capturing power changes over time, the time window for the coherence calculation varied by frequency from 2 to 30 Hz in steps of 2 Hz by a duration of 1/f, such that the time window at 2 Hz was 500 msec, at 4 Hz was 250 msec, and so on down to a 33-msec duration window at 30 Hz. Coherence was calculated in 10-msec steps, averaged over frequency bands from 2 to 30 Hz and normalized relative to a 200-msec prestimulus baseline to give percent change in coherence over time. Significance was assessed by contrasting each coherence time course with a null hypothesis of 0% coherence change relative to baseline. Significance is marked where two or more consecutive time windows reach p < .01.
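The frequency-dependent window lengths and the baseline normalization described above can be illustrated with a small sketch. It covers only the bookkeeping, not the coherence estimate itself, and it assumes that the 200-msec prestimulus baseline corresponds to the interval [−200, 0) msec; both function names are hypothetical.

```python
def coherence_windows_ms(f_min=2, f_max=30, step=2):
    """Window duration 1/f (in msec) for each analysis frequency (in Hz)."""
    return {f: 1000.0 / f for f in range(f_min, f_max + 1, step)}


def percent_change(series, times_ms, baseline=(-200, 0)):
    """Express a coherence time course as % change from its prestimulus mean."""
    base = [c for c, t in zip(series, times_ms) if baseline[0] <= t < baseline[1]]
    mean = sum(base) / len(base)
    return [100.0 * (c - mean) / mean for c in series]


windows = coherence_windows_ms()
for f in (2, 4, 30):
    print(f, "Hz ->", round(windows[f]), "msec")
```

Scaling the window as 1/f keeps exactly one cycle per window at every frequency, so temporal resolution improves as frequency increases.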
Correlations with behavior were calculated by extracting the mean signal differences between the M3b and M3w conditions within clusters identified by contrasting the M0 and M6 conditions and calculating Pearson's r between these signal differences and the participants' mean accuracy on the M3w and M3b conditions.
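For reference, the brain–behavior correlation amounts to a standard Pearson correlation across participants. The sketch below computes it from two equal-length lists; the input values are invented solely to show the call and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-participant values: M3b - M3w amplitude difference (uV)
# and mean accuracy on the M3w and M3b conditions (proportion correct).
signal_diff = [0.2, 0.5, 0.1, 0.8, 0.4]
accuracy = [0.70, 0.78, 0.66, 0.85, 0.74]
r = pearson_r(signal_diff, accuracy)
```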
Participants completed the Web-based category label training in 4.63 ± 0.84 hr (mean ± SEM). Following training, each participant's categorization ability was tested in the lab, and their performance was fit to a sigmoid function including parameters for the location of the category boundary, β, and the slope of the change at the boundary, t (see Methods). The slope parameter t, indicating the rate of change across the boundary, was 0.80 ± 0.22 across participants and morph lines, indicating that, on average, for a category boundary positioned at 50%, participants indicated 93% category A for stimuli composed of 60% category A and 7% category A for stimuli composed of 60% category B. Figure 2 shows accuracy and RTs on the category-matching task performed during EEG recordings. Accuracy across conditions showed a U-shaped profile, as would be expected, given that M3w and M6 conditions each involved one difficult category decision at the boundary, whereas M3b involved two difficult category decisions at the boundary. Correspondingly, RTs showed an inverted-U profile.
ERPs elicited by both stimuli for conditions M0, M3w, M3b, and M6 are shown in Figure 3 for all 128 channels, segregated into two broad topographical sections, anterior and posterior. An average of 11.3 ± 2.8% (mean ± SEM) of the trials were rejected because of artifacts for each participant. Classical ERP responses including the P1 and N1 can be seen in response to the first and second stimuli. Adaptation effects are also visible in these mean ERPs. For instance, although the N1 over posterior channels in response to the first stimulus did not differ across conditions, the N1 in response to the second stimulus showed evidence of adaptation, with an attenuation of M0, M3w, and M3b conditions relative to M6.
To probe for shape- and conceptual category-selective adaptation effects, we focused our analyses on the condition-dependent amplitude modulations emerging in response to the second stimulus. One space–time cluster was identified by contrasting M0 versus M6 trials on all 128 channels over the interval [150 200] msec using αthresh = 0.05. This lenient threshold revealed a cluster broadly distributed across bilateral posterior electrodes with temporal extent [172 200] msec and p = .036, as shown in Figure 4A. Figure 4B shows the ERP for each condition on the union of all channels in this cluster, with the temporal extent of the cluster highlighted in yellow. The nature of the stimulus selectivity exhibited by this space–time cluster was evaluated by extracting the response amplitude in the independent conditions, M3w and M3b, within the clusters identified by contrasting M0 and M6. Responses in the M0 and M6 conditions differed significantly as expected because of the cluster definition, but the key contrast of interest is how M3w and M3b differed relative to each other and also to M0 and M6. The mean response amplitude on this space–time cluster is shown in Figure 4C, with pairwise comparisons between conditions as noted. The M0, M3w, and M3b conditions were all significantly attenuated relative to the M6 condition, and M3w and M3b were equivalent. This response profile indicates that this cluster was broadly shape selective, showing release from adaptation for large shape changes (M6) and no explicit conceptual category selectivity.
To address the possibility that the stimulus selectivity profile observed in this time window depended on the broad spatial extent of the cluster, we re-ran the cluster search over the interval [150 200] msec using a more stringent threshold of αthresh = 0.01 and searching only the posterior 65 channels instead of all 128 channels. This cluster search revealed two bilateral space–time clusters with topography comparable to the N1 in previous studies (Bentin et al., 1996; Bötzel, Schulze, & Stodieck, 1995). One cluster corresponded to the right hemisphere N1 response (Figure 4D) with temporal extent [188 200] msec and cluster level significance p = .027, and a second cluster corresponded to the left hemisphere N1 (Figure 4G) with temporal extent [184 194] msec and cluster significance p = .030. The ERPs on the union of all channels within the clusters in each hemisphere are illustrated in Figure 4E and H. The mean response amplitudes within each cluster are plotted in Figure 4F and I: For both hemispheres, M0, M3w, and M3b were all significantly attenuated relative to the M6 condition, and M3w and M3b were equivalent. This response profile indicates that the N1 was broadly shape selective, showing release from adaptation for large shape changes, with no explicit conceptual category selectivity, and this selectivity profile was robust as the cluster was constrained in space and time by decreasing the αthresh level for cluster identification.
We next probed selectivity over the full [0 400] msec interval. Space–time cluster analysis was performed by contrasting M0 versus M6 over all 128 channels using αthresh = 0.01 and revealed one significant cluster (M6 > M0, p < .001), with temporal extent [210 400] msec. The topographical evolution of this cluster over time is illustrated in Figure 5A. A comparison of responses in the M3w and M3b conditions (which were independent of the conditions used to define the cluster) revealed a significant difference between these conditions, t(18) = −2.361, p = .030 (see Figure 5D), but not between M3b and M6, t(18) = 0.453, p = .656, indicating a conceptual category selectivity of the underlying neural source(s) generating this differential signal. Another contrast, comparing “same category” (M0 and M3w) versus “different category” (M3b and M6) conditions identified a markedly similar cluster (Figure 5B) with temporal extent [216 400] msec and cluster significance p < .001. This cluster was also categorical because responses for M3w versus M3b differed significantly, t(18) = −3.098, p = .006 (Figure 5E). We finally tested selectivity using the M3w versus M3b contrast. No cluster was identified before 300 msec at αthresh = 0.01, but a search with the more lenient criterion of αthresh = 0.05 returned a cluster largely overlapping with the M0 versus M6 and same versus different search clusters (Figure 5C, F, p = .030), again with temporal extent [216 400] msec. Note that the N1 component did not elicit a significant space–time cluster when searching the entire [0 400] msec time range as its temporal extent was too limited to lead to significance when searching over the whole 400-msec interval. This is not surprising given that cluster-based nonparametric tests are optimized for the identification of widespread, sustained effects (Maris & Oostenveld, 2007).
The broad spatiotemporal extent of the categorical response cluster identified by searching the time window from 0 to 400 msec raises the question of whether it might span more than one cognitive process. For instance, although monkey (Freedman et al., 2001, 2003) and human studies (Gotts et al., 2011; Jiang et al., 2007) have established the presence of perceptual categorization circuits in pFC, a number of studies have shown that other frontal areas, including OFC (Kepecs, Uchida, Zariwala, & Mainen, 2008; Hsu, Bhatt, Adolphs, Tranel, & Camerer, 2005; Critchley, Mathias, & Dolan, 2001) and insula (Sarinopoulos et al., 2010; Grinband, Hirsch, & Ferrera, 2006) as well as medial frontal gyrus and ACC (Stern, Gonzalez, Welsh, & Taylor, 2010), encode categorization uncertainty with enhanced responses to stimuli at the category boundary. To probe for a dissociation of perceptual category selectivity and signals related to categorization uncertainty, we analyzed the temporal evolution of signals in the anterior category-selective cluster. Specifically, we reasoned that signals reflecting adaptation of neurons in circuits performing the perceptual categorization task should show significant response differences between M3w and M3b as well as between M3w and M6 (as both cases involve a “same category” vs. “different category” contrast). On the other hand, neural sources coding for the uncertainty of this categorization would be expected to show significant response differences between M3w and M3b conditions (as the former contains only one stimulus at the boundary, whereas the latter contains two stimuli, and is therefore associated with a higher level of uncertainty) but not between M3w and M6 (as both conditions include one stimulus at the boundary and one at the endpoint).
We therefore conducted paired t tests comparing responses in these conditions at 2-msec increments over the course of the entire original cluster that was identified using M0 versus M6 with αthresh = 0.01 over [0 400] msec. This analysis revealed that early on ([232 256] msec; see Figure 6A, B, D), signals in the cluster showed a left-lateralized response profile compatible with perceptual categorization because M3w was significantly different from both M3b and M6. Later in the cluster ([328 384] msec; see Figure 6A, C, E), the response profile shifted to one compatible with reflecting a mixture of category selectivity and categorization uncertainty, with responses in the M3w and M3b conditions still significantly different, but responses for M3w and M6 not being significantly different anymore and a broader distribution of selective signals. This analysis suggests that perceptual categorization was accomplished first, within 250 msec, followed by a computation of perceptual uncertainty within the subsequent 100 msec.
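The time-resolved comparison can be illustrated with a short sketch. It assumes cluster-averaged amplitudes stored per condition as participant-by-time lists, and it labels each time point by whether M3w differs from M3b and from M6. The data layout, the function names, and the use of a fixed critical value are assumptions made for illustration (2.101 is the two-tailed critical t for df = 18 at α = .05; the significance criterion actually applied at each time point is the one described in the Methods).

```python
import math

def paired_t(a, b):
    """Paired t statistic for two equal-length lists of per-participant values."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    if var == 0:  # degenerate case: identical differences for all participants
        return math.copysign(math.inf, mean) if mean != 0 else 0.0
    return mean / math.sqrt(var / n)

def label_profile(m3w, m3b, m6, t_crit):
    """Classify one time point from per-participant amplitudes."""
    w_vs_b = abs(paired_t(m3w, m3b)) > t_crit
    w_vs_6 = abs(paired_t(m3w, m6)) > t_crit
    if w_vs_b and w_vs_6:
        return "categorization"          # M3w differs from both M3b and M6
    if w_vs_b:
        return "category + uncertainty"  # M3w differs from M3b but not from M6
    return "neither"

def profile_over_time(data, times_ms, t_crit=2.101):
    """data[condition][participant][time_index] -> list of (time_ms, label)."""
    out = []
    for i, t in enumerate(times_ms):
        col = {c: [row[i] for row in data[c]] for c in ("M3w", "M3b", "M6")}
        out.append((t, label_profile(col["M3w"], col["M3b"], col["M6"], t_crit)))
    return out

if __name__ == "__main__":
    # Hypothetical amplitudes for 19 participants at three time points.
    jitter = [(s % 5 - 2) * 0.1 for s in range(19)]
    data = {
        "M3w": [[1 + j, 1 + j, 1 + j] for j in jitter],
        "M3b": [[2 + 2 * j, 2 + 2 * j, 1 + 2 * j] for j in jitter],
        "M6":  [[2 + 2 * j, 1 + 2 * j, 1 + 2 * j] for j in jitter],
    }
    for t, label in profile_over_time(data, times_ms=[240, 350, 100]):
        print(t, label)
```

Running the tests pointwise in this way involves many comparisons, so a sketch like this is best read as a description of the response profile within an already significant cluster rather than as an independent significance test.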
To further probe the selectivity of the spatially and temporally extensive categorical cluster illustrated in Figure 5A during its initial lateralized stage, subsequent bilateral stage, as well as the later posterior component separately, we performed two additional cluster analyses over all 128 channels constrained to the time windows [200 300] msec and [300 400] msec using the contrast M0 versus M6. We reduced αthresh to 0.005 to separate the spatially expansive clusters into subclusters. The [200 300] msec interval search still revealed only one anterior cluster, but the [300 400] msec search revealed two distinct clusters: an anterior cluster extending over the entire interval and a second cluster exhibiting a strongly categorical response on channels situated directly over parietal cortex, extending from 330 to 360 msec. Figure 7 shows the M6–M0 (part A) and M3b–M3w (part B) topographies within both clusters with the posterior cluster circled in blue. Figure 7C shows the ERP on the union of all channels in this posterior cluster with the temporal extent of the cluster highlighted, and Figure 7D shows the mean signal by condition for this cluster. Figure 7E shows the topography of functional connectivity between this posterior cluster and all other channels.
Functional connectivity was assessed by calculating the coherence between the posterior seed interval (circled in blue in Figure 7A and B, with its temporal extent highlighted in Figure 7C) and the N1 and P2 cluster channels (Figure 8A, see Methods). Following the onset of the first stimulus, functional connectivity is generally higher between the posterior category cluster channels and the P2 cluster channels than between the posterior category cluster channels and the N1 cluster channels. Moreover, significant increases in coherence between the N1 and posterior category cluster channels begin earlier than those between the P2 and the posterior category cluster channels, and this coherence decreases for both regions following Stimulus 2 onset. Coherence subsequently rebounds, however, between the posterior category cluster and the P2 cluster at the latency at which the posterior categorical signal is observed in the amplitude domain ERPs, supporting the idea that input from anterior category-selective circuits might underlie the subsequent category-selective response modulations of posterior signals. We additionally analyzed coherence between the left N1 and P2 as well as between the right N1 and the P2 cluster to assess the lateralization of stimulus processing (see Figure 8B). Both N1 clusters exhibited significant connectivity with the left lateralized P2 beginning around 140 msec after Stimulus 1 onset, gradually decreasing, and then starting to rebound around the latency of the N1 associated with the second stimulus, supporting functional connectivity between the N1 and P2 clusters as postulated by the feedforward model.
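For readers unfamiliar with the measure, the sketch below computes the textbook magnitude-squared coherence between two channels at a single frequency, estimated across trials. It is a generic illustration only: the estimator, time-frequency decomposition, and statistical thresholds actually used are those described in the Methods, and the function names, sampling rate, and test signals here are hypothetical.

```python
import cmath
import math

def fourier_coefficient(signal, freq_hz, srate_hz):
    """Complex Fourier coefficient of one trial at a single frequency."""
    return sum(x * cmath.exp(-2j * math.pi * freq_hz * k / srate_hz)
               for k, x in enumerate(signal))

def coherence(trials_x, trials_y, freq_hz, srate_hz):
    """Magnitude-squared coherence across trials:
    |sum_k X_k * conj(Y_k)|^2 / (sum_k |X_k|^2 * sum_k |Y_k|^2), in [0, 1]."""
    fx = [fourier_coefficient(t, freq_hz, srate_hz) for t in trials_x]
    fy = [fourier_coefficient(t, freq_hz, srate_hz) for t in trials_y]
    cross = sum(a * b.conjugate() for a, b in zip(fx, fy))
    power_x = sum(abs(a) ** 2 for a in fx)
    power_y = sum(abs(b) ** 2 for b in fy)
    return abs(cross) ** 2 / (power_x * power_y)

def sinusoid(freq_hz, srate_hz, n_samples, phase, amplitude=1.0):
    """Hypothetical single-trial signal used only for the demonstration."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * k / srate_hz + phase)
            for k in range(n_samples)]

if __name__ == "__main__":
    srate, freq, n, n_trials = 500.0, 10.0, 100, 20
    phases = [0.3 * k for k in range(n_trials)]
    x = [sinusoid(freq, srate, n, p) for p in phases]
    # Constant phase lag across trials: the two channels are phase locked.
    y_locked = [sinusoid(freq, srate, n, p + math.pi / 4, 0.5) for p in phases]
    # Phase lag that rotates uniformly across trials: no consistent relation.
    y_random = [sinusoid(freq, srate, n, p + 2 * math.pi * k / n_trials)
                for k, p in enumerate(phases)]
    print(coherence(x, y_locked, freq, srate))
    print(coherence(x, y_random, freq, srate))
```

The measure is high when the phase lag and amplitude ratio between two channels are consistent from trial to trial, regardless of the size of the lag, and it falls toward zero when the lag varies unsystematically.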
Finally, to test the predicted mechanistic relationship between the observed neural signals indicating category selectivity and participants' ability to perform the perceptual categorization task, we examined the correlation of the response difference in the M3 conditions in these clusters (hypothesizing that stronger response differences between the M3b and M3w conditions reflected more sharply tuned category circuits, which should predict higher categorization performance) to the average of the behavioral categorization accuracy in those conditions, predicting a positive correlation between the two variables. Indeed, the signal difference between the M3b and M3w conditions within the cluster found by contrasting M0 versus M6 over the interval [200 300] msec, with an extent of [222 300] msec (Figure 9A), was found to strongly correlate with behavioral performance (r = 0.627, p = .004; Figure 9B). Likewise, the M3b–M3w signal difference on the anterior cluster for the subsequent 100 msec (temporal extent [300 400] msec, cluster significance p < .001; Figure 9C) was strongly correlated with behavior (r = 0.702, p < .001; Figure 9D). On the other hand, the posterior cluster identified during the [300 400] msec interval was not significantly correlated with behavior, with r = 0.317 and p = .186.
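The brain-behavior relationship is a Pearson correlation across participants, which can be sketched in a few lines. The function names and example values below are hypothetical, and the sample size of 19 is an assumption inferred from the degrees of freedom of the paired tests reported above, t(18). Under that assumption, r = 0.627 corresponds to a t statistic of about 3.32 with 17 degrees of freedom, in line with the reported p = .004.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

if __name__ == "__main__":
    # Hypothetical per-participant values: M3b minus M3w amplitude difference
    # in a cluster, and mean categorization accuracy in the M3 conditions.
    signal_diff = [0.2, 0.5, 0.1, 0.9, 0.7, 0.4]
    accuracy = [0.71, 0.80, 0.69, 0.93, 0.85, 0.78]
    r = pearson_r(signal_diff, accuracy)
    print(r, t_from_r(r, len(signal_diff)))
    # Assuming n = 19, the reported r = 0.627 gives t(17) of about 3.32.
    print(t_from_r(0.627, 19))
```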
Taken together, our results, identifying a shape-selective N1 over posterior channels and a subsequent, anterior category-selective cluster of channels, support the two-stage feedforward accounts of perceptual categorization (Panis et al., 2008; Thomas et al., 2001; Riesenhuber & Poggio, 2000; Ashby & Lee, 1991; Nosofsky, 1986) and are in agreement with the model positing that primate object categorization is mediated by frontal category-selective circuits that receive input from object shape-specific representations in occipitotemporal cortex (Jiang et al., 2007; Freedman et al., 2003; Riesenhuber & Poggio, 2000).
The N1 response profile observed in our RA experiment is consistent with a neurophysiological signal elicited by populations of neurons broadly encoding stimulus shape, independent of semantic attributes or category labels, as the M0 and M3 conditions are attenuated while the M6 condition exhibits a release from adaptation. Whereas previous fMRI experiments using the same stimuli demonstrated sharper tuning for shape after comparable amounts of training (Jiang et al., 2007), the broader response profile suggested by the results in our EEG-RA study is not surprising because EEG records the neural responses from multiple neural sources simultaneously and, in our case, likely included populations of neurons that specifically encode the shape of stimuli used in this experiment (such as the car shape-selective neurons in LOC identified in Jiang et al., 2007) as well as neurons less selective for (but still responsive to) the stimuli of interest. This shape selectivity of the N1 was found bilaterally. Category effects were more lateralized, with a dominance of the left hemisphere. This left hemisphere lateralization of the initial conceptual category effect contrasts with the predominantly right hemispheric category selectivity found in the fMRI experiment (Jiang et al., 2007). This difference in lateralization could arise because this study, in contrast to Jiang et al. (2007), used verbal labels for the object categories, which might have led to a lateralization of category circuits in the language-dominant hemisphere. Interestingly, left lateralization of category circuits was also found in another study in which participants were trained on categories with verbal labels defined in a morphed stimulus space (Van der Linden et al., 2010).
Left lateralization of category circuits is also consistent with evidence that left posterior dorsolateral pFC plays a key role in perceptual decision-making (Heekeren, Marrett, Ruff, Bandettini, & Ungerleider, 2006). This anterior category selectivity showed significant functional connectivity with shape-selective responses in both hemispheres, consistent with previous reports showing cross-hemispheric connectivity of right high level visual cortex and left pFC by way of the anterior corpus callosum (Tomita, Ohbayashi, & Nakahara, 1999; Gazzaniga, 1995), in addition to ipsilateral connections.
Our study reveals that information flow in perceptual categorization follows three distinct phases: posterior shape selectivity up to 200 msec following stimulus onset, anterior category selectivity after 200 msec following stimulus onset, and posterior category selectivity after 300 msec following stimulus onset. The spatially and temporally expansive conceptual category-selective cluster observed in our data was initially categorical on anterior channels and subsequently, around 330 msec poststimulus onset, exhibited response modulations consistent with perceptual uncertainty. The complex scalp topographies observed after 300 msec almost certainly reflect multiple neural processes taking place in parallel. An anterior signal is spatially dissociable from a posterior signal, but the anterior signal itself likely reflects the convolution of multiple processes. The equivalent response levels for the M3w and M6 conditions, both of which reflect the categorization of one stimulus near the category boundary and one prototype stimulus, may reflect general processes related to uncertainty. Indeed, a number of studies have provided evidence that activity in particular frontal areas (Kepecs et al., 2008; Grinband et al., 2006; Hsu et al., 2005; Critchley et al., 2001) reflects uncertainty, but it is less clear how this uncertainty is computed. Our data, with their higher temporal resolution compared with fMRI, are compatible with the simple model that categorization is performed first, and uncertainty estimates are then derived from the level of activation of category units. That is, strong activation of units coding for just one category would be associated with low uncertainty, whereas activation of units for both categories (e.g., for more ambiguous stimuli close to the boundary) would be associated with higher degrees of uncertainty.
Around the same time that the signal consistent with decision certainty was observed on frontal channels, a strongly categorical response was observed on posterior channels. We can rule out the possibility that this posterior categorical signal arises from the opposite side of an anterior dipole because the direction of attenuation (“same category” < “different category”) was identical across the entire scalp topography. These posterior category-selective responses demonstrated strong functional connectivity with the preceding anterior signals (see Figure 7E). The selective coupling observed between anterior channels at the initial onset of category selectivity and the later posterior category signal reflects the synchronization of phases as well as covariation of power of the neural signals during these intervals (Fell & Axmacher, 2011). Such a “feedback” category signal may serve purposes proposed by other groups, including conscious awareness of the stimulus (Fahrenfort, Scholte, & Lamme, 2007), stimulus learning (Seitz & Dinse, 2007), and the directing of task-driven attention (Buschman & Miller, 2007). Interestingly, although showing strongly categorical responses (Figure 7C, D), the category selectivity of the posterior signal did not correlate significantly with individual behavioral categorization ability (Figure 9F). Although this might be due to the smaller spatiotemporal extent of the posterior cluster compared with the anterior clusters (Figure 9A–D), the observed lack of a correlation is in agreement with the postulated postdecisional status of the posterior signal (which would be expected to produce strongly categorical signals on correct trials, as in our analysis).
The anterior-to-posterior sequence of conceptual category effects observed in this study is consistent with similar task effects reported elsewhere. For example, a recent monkey electrophysiology study (Goodwin, Blackman, Sakellaridi, & Chafee, 2012) found rule-dependent category signals in pFC with earlier latencies than similar signals in parietal cortex. In contrast, another recent monkey electrophysiology study (Swaminathan & Freedman, 2012), using a motion categorization task, found that parietal concept-selective signals preceded the onset of such signals in pFC. Although this might reflect a difference between shape and motion categorization (with motion categorization preferentially engaging the dorsal pathway, leading to earlier activation of parietal areas), another intriguing possibility is that categorization circuits in parietal cortex develop as a result of extensive task experience, leading to an anterior-to-posterior shift of task-selective activation, as observed in fMRI and EEG studies (Rivera, Reiss, Eckert, & Menon, 2005; Staines, Padilla, & Knight, 2002; Sakai et al., 1998). It will be interesting to probe in future studies which task elements influence the order of information flow between anterior and posterior brain regions and the engagement of these areas in categorization.
This work was supported by the National Science Foundation (0449743 and 1232530 to M. R. and a Graduate Research Fellowship to C. A. S.) and the Research Program in Applied Neuroscience. We would like to thank Brian Jucha for implementing the Web-based training and Tim Curran for helpful advice.
Reprint requests should be sent to Clara A. Scholl, Georgetown University, 3970 Reservoir Rd, Washington, DC 20007.