Research on the spatio-temporal dynamics of visual object recognition suggests a recurrent, interactive model whereby an initial feedforward sweep through the ventral stream to prefrontal cortex is followed by recurrent interactions. However, critical questions remain regarding the factors that mediate the degree of recurrent interactions necessary for meaningful object recognition. The novel prediction we test here is that recurrent interactivity is driven by increasing semantic integration demands as defined by the complexity of semantic information required by the task and driven by the stimuli. To test this prediction, we recorded magnetoencephalography data while participants named living and nonliving objects during two naming tasks. We found that the spatio-temporal dynamics of neural activity were modulated by the level of semantic integration required. Specifically, source reconstructed time courses and phase synchronization measures showed increased recurrent interactions as a function of semantic integration demands. These findings demonstrate that the cortical dynamics of object processing are modulated by the complexity of semantic information required from the visual input.
Humans are able to recognize tens of thousands of meaningful objects with incredible speed and accuracy. The neural infrastructure that supports this key cognitive function has been extensively investigated in nonhuman primates, resulting in a well-developed model of hierarchically organized object processing in ventral occipito-temporal cortex (Felleman & Van Essen, 1991; Ungerleider & Mishkin, 1982) that also supports the semantic processing of visual objects (Moss, Rodd, Stamatakis, Bright, & Tyler, 2005; Tyler et al., 2003, 2004; Chao, Haxby, & Martin, 1999). Although this research has been key in identifying the neural architecture that supports the recognition of visual inputs as meaningful objects, a comprehensive model of the neural mechanisms involved in processing meaningful objects requires the characterization of dynamic interactions along the ventral stream. Research using time-sensitive methodologies (e.g., EEG, magnetoencephalography [MEG], intracranial recordings) provides detailed accounts of the temporal dynamics during object processing, but has been less frequently used to address the issue of how the meaning of visual objects evolves over time.
Theories of object recognition that incorporate temporal information can be broadly divided into two types. Feedforward accounts emphasize that bottom–up processing largely supports recognition (Riesenhuber & Poggio, 1999), whereas recurrent, interactive accounts consider top–down influences necessary for object recognition (Bar, 2003; Bullier, 2001; Ullman, 1995). On both accounts, visual processing is initially composed of a feedforward propagation of activity from posterior occipital cortex anteriorly along the ventral temporal lobe (Bullier, 2001; Lamme & Roelfsema, 2000). Connectivity and single-unit recording studies suggest that at each level of this ventral hierarchy, the afferent information from the previous level is integrated to reflect an incrementally more complex representation, thus providing a mechanism where neurons respond to increasingly complex visual features at increasing levels of the hierarchy (Kobatake & Tanaka, 1994; Felleman & Van Essen, 1991). Responses generated by the initial feedforward sweep have been recorded in prefrontal cortex of both human and nonhuman primates after approximately 100 to 150 msec, which show a category-level response selectivity (Liu, Agam, Madsen, & Kreiman, 2009; Huang, Kreiman, Poggio, & DiCarlo, 2005; Freedman, Riesenhuber, Poggio, & Miller, 2001). For example, Freedman et al. (2001) found that neuronal responses in monkey prefrontal cortex distinguished between two learned object categories (in this case cats and dogs), but not between more fine-grained, within-category level objects (i.e., different exemplars of cats). These findings, as well as neuronally inspired computational models (Serre, Oliva, & Poggio, 2007; VanRullen & Thorpe, 2002), show that a purely feedforward mechanism can generate a representation detailed enough for broad categorizations. Therefore, the neural information accrued during initial feedforward processing provides an initial basis for rapid and coarse-grained recognition.
However, although feedforward models provide an account of rapid, category-level recognition, both anatomical and experimental evidence suggest that feedback, recurrent connections play a vital role supporting more fine-grained, higher-level, meaningful representations (Schendan & Stern, 2008; Bar et al., 2001; Bullier, 2001). The abundance of feedforward, feedback, and lateral connections between brain regions provides the anatomical infrastructure necessary for dynamic interactions between cortical regions. The functional relevance of these cortical dynamics has been highlighted by single-unit recordings and connectivity methods. Nonhuman primate research has not only shown that there are rapid responses in prefrontal cortex during visual tasks (Huang et al., 2005; Bullier, 2001; Freedman et al., 2001; Lamme & Roelfsema, 2000) but has also provided direct evidence that responses from prefrontal cortex feed back to activate ventral temporal cortex (Tomita, Ohbayashi, Nakahara, Hasegawa, & Miyashita, 1996). Bar et al.'s (2006) recent study highlighted the importance of feedback responses during object recognition. They compared trials where objects were successfully identified against those where the objects remained unrecognizable (due to backward masking) and found that successful recognition was accompanied by enhanced activity in the left orbito-frontal region of prefrontal cortex that was followed by activity in the posterior fusiform. Crucially, they also showed that the relative phase of the anterior and posterior signals was consistent across trials, further supporting the functional importance of feedback responses for successful object recognition. These studies provide evidence that feedback signals modulate activity in more posterior regions, with the implication that detailed recognition of objects is accomplished through a recurrent, interactive process involving prefrontal and ventral temporal cortex.
Although these studies provide evidence that higher-level representations are established through recurrent interactions, what remains to be determined are the factors that drive these recurrent processes during the evolution of meaningful object representations. Here, we propose that recurrent interactions are modulated by semantic integration demands, which are defined in terms of the task and statistical properties associated with different object representations.
Distributed accounts of semantic knowledge claim that object concepts are represented by their individual semantic properties in an interconnected, property-based system (Tyler & Moss, 2001; McRae, de Sa, & Seidenberg, 1997). To recognize an object requires that the relevant semantic property information is extracted from the visual input and integrated to perform the task at hand. In the current study, we manipulate the degree of semantic integration required in two ways: (1) by using different task conditions and (2) by using objects from different domains (i.e., living and nonliving things). Tasks that require broad distinctions, such as whether an item is living or nonliving (domain decision), can be achieved by integrating simpler property information compared to unique, basic-level identification, which requires the additional integration of the more detailed object-specific attributes. For example, whether an object is living or nonliving (domain decision) can be determined based on properties such as “has eyes,” “breathes,” and “has legs,” but for more fine-grained basic-level naming, additional distinctive information such as “has a hump” also needs to be integrated into the representation (e.g., to uniquely identify a camel). Therefore basic, relative to domain, decisions require a greater degree of semantic integration (Taylor, Salamoura, Randall, Moss, & Tyler, 2008; Tyler et al., 2004).
A similar proposal was put forth by Humphreys, Price, and Riddoch (1999) who argue that conceptual representations are accessed following the processing of structural (visual) descriptions of the object, and these structural descriptions influence domain and basic-level naming in different ways. Objects with a large degree of structural similarity with other objects (as is the case for living things) will activate overlapping structural descriptions in many other objects. In the context of domain decisions, as structurally similar items are typically from the same object domain, the overlapping activation will facilitate responses as a function of similarity. For basic-level identification, the activation of many structurally similar items requires additional processing to discriminate between similar items, and therefore, structural similarity between items slows basic-level decisions.
Whether the speed of object processing is driven by purely structural (visual) similarity, or semantic similarity, both structural description and distributed accounts claim that information about an object can become available before a complete semantic representation is determined (Humphreys, Lamote, & Lloyd-Jones, 1995). This is also consistent with reaction time data showing both object decisions and domain decisions can be made faster than basic-level decisions (Moss et al., 2005; Lloyd-Jones & Humphreys, 1997; Humphreys et al., 1995; Vitkovitch & Tyrrell, 1995; Potter & Faulconer, 1975), which is consistent with the claim that basic, relative to domain, decisions require a greater degree of semantic integration to disambiguate the object concept from conceptually similar items.
Within basic-level naming, we also used a more subtle manipulation of semantic integration demands by contrasting information processing associated with objects from different semantic domains (i.e., living and nonliving things). Living things (especially animals) tend to have many semantic properties that are highly correlated (or frequently co-occur) with other properties of the concept, and are also shared with many other living things (e.g., many animals have four legs, eyes, fur), while having relatively few, weakly correlated distinguishing properties that are critical for determining the unique identity of the object. In contrast, nonliving things tend to have fewer properties overall, but crucially have relatively more distinguishing properties that are also more highly correlated with the other properties of the concept, making nonliving things less confusable (Taylor et al., 2008; Moss et al., 2005; Randall, Moss, Rodd, Greer, & Tyler, 2004; Cree & McRae, 2003; Tyler & Moss, 2001). For example, property norm data (Randall et al., 2004; Moss, Tyler, & Devlin, 2002; Greer et al., 2001) have shown that living things have relatively more shared than distinctive properties compared with nonliving things [% shared (relative to distinctive) properties: animals 67%, fruit/vegetables 75% vs. vehicles 39%, tools 53%; see Moss et al., 2005, p. 618]. In addition, the distinctive properties of nonliving things were found to be more highly correlated than those of living things [0.58 vs. 0.50, respectively; t(369) = 6.7, p < .001; Randall et al., 2004]. Thus, property norm data indicate that living things tend to have fewer, as well as less correlated, distinctive properties and more shared information, which enhances the conceptual similarity between different living things (see also Taylor et al., 2008).
Due to their increased within-domain similarity, living compared to nonliving things require a greater degree of semantic integration in order to discriminate one object concept from similar exemplars. As a result, basic relative to domain decisions, and the basic-level identification of living compared to nonliving things, require a greater degree of semantic integration.
The novel and central prediction we test here is that increasing semantic integration demands—driven by task or the conceptual structure of the object—generates increasing recurrent interactions within the object processing system. Our prediction is, in part, supported by previous studies showing that after the completion of the feedforward sweep (i.e., ∼150 msec), responses are modulated by recognition success (Schendan & Maher, 2009; Bar et al., 2006; Schendan & Kutas, 2002), by basic-level categorization (Johnson & Olshausen, 2003), and by naming specificity (Martinovic, Gruber, & Müller, 2009; Tanaka, Luu, Weisbrod, & Kiefer, 1999), suggesting that subsequent recurrent interactions are necessary. However, the important further issue of what factors drive the extent of these recurrent interactions remains unknown.
Predictions regarding the neural sources that modulate recurrent processing can be derived from previous studies. The anterior temporal lobe at the endpoint of the hierarchical object processing stream is hypothesized to support the fine-grained analyses associated with high semantic integration demands. For example, fMRI studies with human participants have demonstrated that the activity along the ventral stream is modulated through a greater response in anteromedial temporal cortex for conditions that required a greater degree of semantic integration, that is, basic- compared to domain-level object naming and basic-level naming of living compared to nonliving things (Moss et al., 2005; Tyler et al., 2004). We therefore predicted that regions of the anterior temporal lobe will drive recurrent interactions with more posterior object-sensitive regions as a function of increasing semantic integration demands.
To test our predictions, we recorded MEG data while participants named the same set of living and nonliving objects at two different levels of specificity: domain-level naming and basic-level naming (Figure 1A). To test for modulations of recurrent processing, we first contrasted MEG responses for the two levels of naming specificity using a mass-univariate approach to localize effects across space and time, followed by a cortical ROI analysis from a priori locations based on previous studies (Bar et al., 2006; Tyler et al., 2004; see Figure 1B, Table 1, and Methods). These analyses served to establish the temporal order of effects across space and time. Following previous studies (e.g., Bar et al., 2006), we predicted that increases in recurrent activity would appear as increased anterior and then posterior responses, and that these increases would be greater for basic- than for domain-level naming. Finally, as a more stringent test for modulation of recurrent interactions, we determined the phase-locking between our cortical ROIs. Increased recurrent interactions would be indicated by greater phase-locking, which we predicted for basic- compared to domain-level naming. As a second manipulation, we compared basic-level naming for living and nonliving things to determine whether recurrent interactions are also modulated by a more fine-grained manipulation of semantic integration demands. Because this contrast is more subtle, modulations of recurrent interactions were predicted for the phase-locking analysis rather than the cortical ROI data.
|Region of Interest||Left Hemisphere MNI (x, y, z)||Right Hemisphere MNI (x, y, z)||Color|
|Occipital||−26, −90, −2||22, −98, −6||Black|
|Fusiform||−34, −56, −24||−34, −48, −18||Blue|
|Anterior temporal||−22, −6, −20||22, −6, −20||Orange|
|Orbito-frontal||−36, 23, −14||36, 23, −14||Green|
Ten healthy participants (8 men), all right handed with normal or corrected-to-normal vision, took part in the study. The average age was 22.7 years (range = 19–31 years). All participants gave informed consent, and the study was approved by the Suffolk local ethics committee.
The study used 110 color images on a white background from two object domains: 65 living items (composed of 31 animals and 34 fruits/vegetables) and 45 nonliving items (composed of 19 vehicles and 26 tools), and were identical to those used in a previous fMRI study (Moss et al., 2005). All images were photographic pictures of the items taken from a canonical viewpoint. Living and nonliving items were matched on measures of concept agreement, typicality, exemplarity, age of acquisition, familiarity, visual complexity, and frequency (all p > .05; for details, see Moss et al., 2005).
Each trial consisted of a centrally presented fixation cross for 600 msec, followed by a picture lasting 500 msec, before a blank screen lasting between 2400 and 2700 msec. The participants' task was to overtly name the object they saw as accurately as possible, while keeping all movements to a minimum. Two naming tasks were employed: (1) basic-level naming, which requires the subject to produce the unique name of the object (e.g., cow, hammer), and (2) domain-level naming, which requires subjects to name the object as a “living” or “manmade” object. Critically, the same object pictures were used in both tasks. All participants performed both naming tasks, and task order was counterbalanced across participants to minimize effects of stimulus repetition. Stimuli were presented in one of two pseudorandom orders, counterbalanced across subjects, with the proviso that consecutive objects neither had a verbal label which began with the same sound, nor were from the same stimulus category (e.g., animals).
The presentation and timing of stimuli were controlled using E-Prime software (www.pstnet.com). Vocal response latencies were recorded via an optical microphone, and naming accuracy was recorded by the experimenter during data acquisition.
Naming latencies from incorrectly named trials were excluded from the analysis, as were trials in which objects were incorrectly identified by more than half of the participants. Latencies below 250 msec were excluded as they were likely to reflect recording problems rather than genuine naming. Trials that were noted to include other sources of nonnaming noise were also excluded. Naming latencies from the remaining 96% of trials (range = 91–99%) were inverse transformed to reduce the effects of outliers on the data (Ulrich & Miller, 1994). Each participant's mean inverse transformed latency in each naming task, factored by object domain, was used in the statistical analyses. To facilitate interpretation and comparability with other studies, retransformed latencies are reported.
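The inverse transformation and retransformation of latencies described above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the example values are hypothetical (they are not data from the study), and only the 250-msec cutoff is taken from the text.

```python
from statistics import mean


def retransformed_mean_latency(latencies_ms, min_latency_ms=250.0):
    """Mean latency computed on the inverse (1/RT) scale, reported back in msec."""
    # Drop implausibly fast responses (the 250-msec cutoff comes from the text).
    kept = [rt for rt in latencies_ms if rt >= min_latency_ms]
    if not kept:
        raise ValueError("no valid latencies")
    # Inverse transform: very long latencies are compressed toward zero,
    # so they pull the mean less than they would on the raw scale.
    mean_inverse = mean(1.0 / rt for rt in kept)
    # Retransform so the summary is again expressed in msec.
    return 1.0 / mean_inverse


# Hypothetical latencies (msec) for one participant in one condition.
example = [200.0, 500.0, 1000.0, 1000.0]
print(round(retransformed_mean_latency(example), 1))
```

The retransformed value is the harmonic mean of the retained latencies, which is why a single slow response shifts it less than it would shift the arithmetic mean.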
Continuous MEG data were recorded using a whole-head 306-channel Vectorview system (Elekta Neuromag, Helsinki, Finland) at a sampling rate of 1000 Hz, with a band-pass filter from 0.03 to 125 Hz at the MRC Cognition and Brain Sciences Unit, Cambridge, UK. Eye movements and blinks were monitored with EOG electrodes placed around the eyes, and four Head-Position Indicator (HPI) coils were used to record the head position (every 200 msec) for subsequent movement compensation. The participants' head shape was digitally recorded using a 3-D digitizer (Fastrak Polhemus, Colchester, VT), along with the positions of the EOG electrodes, HPI coils, and fiducial points (nasion, left and right preauricular).
Static bad channels were detected using the MaxFilter program (Elekta Neuromag) and were excluded from all subsequent analyses. Compensation for head movements (as measured by HPI coils) and a temporal extension of the signal–space separation (SSS; Taulu, Simola, & Kojola, 2005) technique were applied to the data every 4 sec with MaxFilter (Elekta Neuromag). To facilitate accurate source reconstruction, T1-weighted MP-RAGE scans with a 1 × 1 × 1 mm voxel size were acquired for each participant with a Siemens 3-T Tim Trio located at the MRC Cognition and Brain Sciences Unit, Cambridge, UK.
MEG Data Analysis
All MEG data analyses used the 204 planar gradiometer channels. The data were low-pass filtered at 40 Hz and epoched from −100 to 450 msec, with respect to the onset of the picture. Baseline correction was applied by subtracting the average response of the 100 msec prior to the onset of the picture from all data points throughout the epoch. Trials were excluded if they contained an EOG amplitude exceeding 200 μV, or if the value on any gradiometer channel exceeded 2000 fT/m. Incorrectly named trials were excluded from the data analysis, although synonyms were accepted (e.g., lorry, truck). On average, 6% of trials were excluded from each participant's data. The remaining trials were averaged according to condition.
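The baseline correction and amplitude-based trial rejection described above can be illustrated with a minimal sketch. The function names are hypothetical, the example values are arbitrary, and the text does not state whether the rejection criterion was applied to absolute or peak-to-peak amplitude, so the use of absolute amplitude here is an assumption.

```python
def baseline_correct(epoch, times_ms):
    """Subtract the mean of the prestimulus samples (time < 0) from every sample."""
    prestimulus = [v for v, t in zip(epoch, times_ms) if t < 0]
    if not prestimulus:
        raise ValueError("epoch has no prestimulus samples")
    offset = sum(prestimulus) / len(prestimulus)
    return [v - offset for v in epoch]


def exceeds_threshold(epoch, threshold):
    """True if any sample's absolute value is above the rejection threshold.

    Assumption: absolute amplitude is tested; the source does not specify this.
    """
    return any(abs(v) > threshold for v in epoch)


# Arbitrary single-channel epoch sampled at -100, -50, 0, 50, and 100 msec.
times = [-100, -50, 0, 50, 100]
corrected = baseline_correct([2.0, 4.0, 3.0, 10.0, 5.0], times)
print(corrected)
print(exceeds_threshold(corrected, threshold=6.0))
```

In practice a trial would be dropped if this check succeeded on any channel, and the threshold would be expressed in the units of that channel type.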
Global Field Power
The global field power (GFP) was estimated for each condition and statistically compared using paired-sample t tests at every time point (every 2 msec). To control for multiple comparisons, cluster-mass permutation corrections were used (Maris & Oostenveld, 2007). This method calculates the cluster size of an effect (in this case, the number of contiguous significant effects) that exceeds the alpha level using 10,000 permutations of the data.
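A simplified sketch of this procedure is given below. It rests on several assumptions that are not taken from the source: GFP is computed as the root-mean-square across sensors (one common definition), the cluster statistic is the sum of t values over contiguous suprathreshold time points, permutations are implemented as random sign flips of each participant's condition difference, and only positive clusters are tested. All function names are hypothetical.

```python
import math
import random


def global_field_power(sensor_values):
    """Root-mean-square across sensors at one time point (assumed GFP definition)."""
    return math.sqrt(sum(v * v for v in sensor_values) / len(sensor_values))


def paired_t(diffs):
    """t statistic for paired data, computed on the per-subject differences."""
    n = len(diffs)
    m = sum(diffs) / n
    var = sum((d - m) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        return 0.0 if m == 0.0 else math.copysign(math.inf, m)
    return m / math.sqrt(var / n)


def max_cluster_mass(t_values, threshold):
    """Largest sum of t values over a run of contiguous points with t > threshold."""
    best = current = 0.0
    for t in t_values:
        current = current + t if t > threshold else 0.0
        best = max(best, current)
    return best


def cluster_permutation_p(diffs_by_subject, threshold, n_perm=10000, seed=0):
    """diffs_by_subject[s][k]: GFP difference for subject s at time point k.

    Returns the observed maximum cluster mass and its permutation p value.
    """
    rng = random.Random(seed)
    n_times = len(diffs_by_subject[0])

    def mass(rows):
        t_vals = [paired_t([row[k] for row in rows]) for k in range(n_times)]
        return max_cluster_mass(t_vals, threshold)

    observed = mass(diffs_by_subject)
    exceed = 0
    for _ in range(n_perm):
        # Under the null hypothesis the sign of each subject's difference is arbitrary.
        flipped = []
        for row in diffs_by_subject:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * v for v in row])
        if mass(flipped) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Because only the largest cluster in each permutation enters the null distribution, the resulting p value is corrected for the number of time points tested.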
3-D (Topography × Time) SPM
A 3-D (topography × time) SPM analysis (www.fil.ion.ucl.ac.uk/spm/) of the sensor data provides a mass-univariate test for effects across space and time. The sensor data were transformed with a root-mean-square function, and their topographic distribution was transformed into a 2-D space by linear interpolation to a 32 × 32 pixel grid, which extended through 276 time samples (2-msec samples). The 3-D (topography × time) images for each subject were used to create t-statistic images that were thresholded at a voxel level of p < .005, and for extent at p < .05 using the nonstationarity toolbox (Hayasaka, Phan, Liberzon, Worsley, & Nichols, 2004) for implementing random field theory.
Structural MRI images were segmented to construct a representation of the cortical surface using FreeSurfer software (http://surfer.nmr.mgh.harvard.edu/). The inflated representation of the cortical surface was used to visualize the results of the MEG source reconstructions. The MEG forward model was calculated with a boundary element model using the inner skull surface segmented from the MRI data. The MEG sensor data were aligned to the MRI skull shape using the digitized head and fiducial points.
The cortical representation provided by FreeSurfer was decimated to contain ∼12,000 dipoles per hemisphere, distributed across the cortical surface, providing activity estimates for ∼24,000 dipoles every 5 msec. Minimum norm estimates (MNEs; Hämäläinen & Ilmoniemi, 1994) were calculated applying depth-weighting and using a loose-orientation constraint (0.3; as recommended in Lin, Belliveau, Dale, & Hämäläinen, 2006) to improve the spatial accuracy of localization. The noise covariance matrix was calculated from the 100-msec prestimulus period, and was used to normalize the data within each subject, resulting in dynamic statistical parametric mapping (dSPM; Dale et al., 2000). This produces a signal-to-noise ratio based activation map for each subject, while equalizing the point-spread function across spatial locations (i.e., activity will be spatially uniform across all locations). Each of these activation maps was normalized across subjects by subtracting the minimum value from the whole epoch and dividing all values by the new maximum value. This has the effect of scaling each subject's data between zero and one (Bar et al., 2006). The minimum and maximum values were obtained for the data from all conditions pooled together, so that differences and similarities between conditions within a subject remained. The normalized current maps were then averaged across subjects to produce group activation maps for display purposes.
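The across-subject normalization step can be sketched as below. The function name, the dictionary layout, and the example values are hypothetical; the sketch only illustrates why pooling the minimum and maximum over conditions preserves within-subject condition differences.

```python
def normalize_subject(maps_by_condition):
    """Scale one subject's values to [0, 1] using the min and max pooled over conditions."""
    pooled = [v for values in maps_by_condition.values() for v in values]
    lowest = min(pooled)
    new_max = max(pooled) - lowest  # maximum after the minimum has been subtracted
    if new_max == 0:
        raise ValueError("constant data cannot be scaled")
    return {
        condition: [(v - lowest) / new_max for v in values]
        for condition, values in maps_by_condition.items()
    }


# Arbitrary activation values for one subject in two conditions.
subject = {"basic": [2.0, 6.0, 10.0], "domain": [2.0, 4.0, 6.0]}
print(normalize_subject(subject))
```

Had each condition been scaled separately, both would span the full range from zero to one and the larger response in the first condition would be lost.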
Cortical ROI Analysis
The ROI analyses were conducted on a set of a priori defined regions based on previous fMRI data using similar stimuli and the same tasks. ROI locations were defined using the MNI coordinates from significant peak activations from the contrast of basic-level naming versus baseline (fixation cross) (see Tyler et al., 2004 and Table 1). As this study did not reveal prefrontal activity, an additional orbito-frontal ROI was defined as the peak coordinates from a related MEG study (Bar et al., 2006). All regions were defined bilaterally. Each ROI was created using a sphere of 15-mm radius produced using MarsBar (http://marsbar.sourceforge.net), and coregistered with the FreeSurfer group-average brain using SPM5 (www.fil.ion.ucl.ac.uk/spm). The ROIs could then be visualized on the group-average inflated cortex produced by FreeSurfer.
The activation time course of each ROI was extracted from the normalized activation maps and averaged across all dipoles within the ROI. The normalized current for each subject was calculated every 10 msec and baseline-corrected (using the prestimulus baseline period). For statistical evaluation, the data were averaged across a time window identified by the sensor analysis before performing repeated measures ANOVAs with factors of naming task and ROI. Mauchly's test was used to assess the assumption of sphericity, and Greenhouse–Geisser corrected degrees of freedom were used if the assumption was violated.
Cortical Phase-locking Analysis
The degree of phase synchronization between the cortical ROIs was determined using a phase-locking analysis (Lachaux, Rodriguez, Martinerie, & Varela, 1999). First, each ROI was mapped onto the individual subjects' cortical surface. The continuous data were then projected onto the cortical surface using the inverse solution operator calculated during the MNE source reconstruction stage to extract a single time series for each ROI. These data were then epoched between −300 and 600 msec and were baseline-corrected, providing single-trial data for each ROI.
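For readers unfamiliar with the measure, the phase-locking value of Lachaux et al. (1999) at time t is the magnitude of the trial-averaged unit vector exp(i[φ1(t, n) − φ2(t, n)]), where φ1 and φ2 are the instantaneous phases of the two signals on trial n. The sketch below is a minimal illustration, not the analysis pipeline used here: the function names are hypothetical, phase is extracted by convolution with a complex Morlet wavelet, and the seven-cycle wavelet width is an illustrative assumption.

```python
import cmath
import math


def morlet_phase(signal, freq_hz, sfreq_hz, n_cycles=7.0):
    """Instantaneous phase at freq_hz from convolution with a complex Morlet wavelet."""
    sigma_t = n_cycles / (2.0 * math.pi * freq_hz)  # temporal width in seconds
    half = int(round(3.0 * sigma_t * sfreq_hz))     # wavelet truncated at 3 sigma
    offsets = range(-half, half + 1)
    wavelet = [
        cmath.exp(2j * math.pi * freq_hz * (k / sfreq_hz))
        * math.exp(-((k / sfreq_hz) ** 2) / (2.0 * sigma_t ** 2))
        for k in offsets
    ]
    phases = []
    for i in range(len(signal)):
        acc = 0j
        for k, w in zip(offsets, wavelet):
            j = i - k
            if 0 <= j < len(signal):  # samples outside the epoch are treated as zero
                acc += signal[j] * w
        phases.append(cmath.phase(acc))
    return phases


def phase_locking_value(phases_a, phases_b):
    """phases_x[trial][time] -> one value per time point, from 0 (random) to 1 (locked)."""
    n_trials = len(phases_a)
    n_times = len(phases_a[0])
    plv = []
    for t in range(n_times):
        total = sum(
            cmath.exp(1j * (phases_a[n][t] - phases_b[n][t])) for n in range(n_trials)
        )
        plv.append(abs(total) / n_trials)
    return plv
```

A value near one indicates that the phase difference between the two ROIs is consistent across trials, whatever that difference is; the amplitude of either signal does not enter the measure.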
To determine the influence of naming specificity and object domain on naming latencies, a repeated measures ANOVA, with the factors naming task (basic, domain) and object domain (living, nonliving), was conducted. This showed a highly significant effect of naming task [F(1, 9) = 32.20, p < .001], reflecting faster domain-level naming (696 msec) than basic-level naming (967 msec). There was no main effect of object domain [F(1, 9) < 1], indicating no overall difference between living and nonliving items, whereas the interaction of naming task and object domain was marginally significant [F(1, 9) = 4.90, p = .052]. To determine the factors that were driving this interaction, paired-sample t tests were conducted comparing latencies to living and nonliving items during basic- and domain-level naming. No significant differences were found between living and nonliving items during either basic- or domain-level naming. The interaction was due to slower responses to living compared to nonliving things during basic-level naming [living = 983 msec, nonliving = 949 msec; t(9) = 2.04, p = .069] in conjunction with faster responses to living compared to nonliving things during domain-level naming [living = 682 msec, nonliving = 719 msec; t(9) = 1.60, p = .130].
The sensor space analysis was conducted to test our hypothesis that basic-level naming will elicit increased activity, relative to domain-level naming, and this will involve both anterior and posterior sensor regions. We first used GFP which provides a measure of the total power averaged across all sensors and indicates when effects occur in time. We found that basic-level naming was associated with significantly greater GFP than domain-level naming between 170 and 230 msec (p < .05, cluster-mass corrected, p = .015). The topographical distribution of these differences suggests they were most pronounced over the left hemisphere (Figure 2), suggesting the increased recruitment of these areas when more specific information is required. At no time point was GFP significantly greater during domain- than basic-level naming.
To determine the topographic locations of any significant effects, a 3-D sensor (Topography × Time) SPM analysis was conducted (Table 2, Figure 3A). Basic-level naming was associated with a significantly greater response than domain-level naming between 192 and 258 msec, located over left posterior sensors. In addition, two other clusters approached significance: a left anterior cluster spanning 170 to 196 msec, and a second right posterior cluster from 174 to 206 msec. At no time was there a significantly greater response for domain-level compared to basic-level naming. Taken in conjunction with the GFP findings, these results suggest that basic-level naming generates larger responses than domain-level naming beginning after 170 and continuing until 258 msec.
|Time Window (msec)|
|Basic > Domain|
|Domain > Basic|
|No suprathreshold clusters|
Time refers to the peak difference in time; x and y reflect the location of the peak difference; p (corrected) is the cluster-level p value after correction. The time window shows the extent of activity, in time, of each cluster.
In summary, the sensor data show that activity in posterior and anterior left hemisphere sensors is modulated as a function of the semantic integration demands required by the task. However, because the analysis of the sensor data can only approximately localize effects, analyses of the time course and phase-locking were conducted using source reconstructed data.
Next, the cortical responses during basic- and domain-level naming were estimated and contrasted. The contrast between the two conditions during the time window 170 to 258 msec shows increased activity associated with basic-level naming beginning in left ventral prefrontal cortex, followed by increases in left anterior temporal and posterior ventral regions (Figure 3B). To characterize the temporal pattern of effects, the estimated signal strength at each time point was extracted from a set of a priori ROIs derived from Bar et al. (2006) and Tyler et al. (2004). The time course of the difference between basic- and domain-level naming shows increased responses for basic-level naming that are most pronounced between 150 and 300 msec, primarily in left hemisphere ROIs (Figure 3C). This is consistent with the sensor data which showed effects concentrated between 170 and 258 msec that were also predominantly left-lateralized.
Repeated measures ANOVAs with factors of naming (basic or domain) and ROI (occipital, fusiform, anterior temporal, orbito-frontal) were conducted on the time-course data averaged across the window 170 to 255 msec. For the left hemisphere ROIs, there was a significant main effect of naming [F(1, 9) = 8.37, p = .018], reflecting a greater response during basic- than domain-level naming, and a significant main effect of ROI [F(3, 27) = 3.06, p = .045], reflecting the varying levels of responses across the ROIs. The interaction of naming and ROI was also significant [F(3, 27) = 6.39, p = .032], suggesting that the responsiveness of the ROIs varied between the two naming tasks. Additional paired-samples t tests showed basic-level naming was associated with significantly greater activity in the orbito-frontal [t(9) = 2.48, p = .035] and anterior temporal cortices [t(9) = 2.31, p = .046], as well as marginal effects in the fusiform ROI [t(9) = 2.22, p = .054]. In the right hemisphere, there was a significant main effect of naming [F(1, 9) = 6.39, p = .032], showing greater responsiveness during basic- compared to domain-level naming over all ROIs. However, the effect of ROI location [F(3, 27) = 1.23] and the interaction of naming and ROI [F(3, 27) < 1] were not significant.
This analysis shows that basic-level naming is associated with increased activity in the a priori defined ROIs. Further, this increase occurs between 170 and 255 msec and is associated with left hemisphere orbito-frontal, anterior temporal, and fusiform cortices. Although the onsets of the difference time courses in the left hemisphere appear simultaneous, the differences peak earlier in the orbito-frontal and anterior temporal ROIs than in the more posterior fusiform (Figure 3C).
The analysis so far highlights the presence of effects in three left hemisphere regions but does not address whether each region is responding to potentially independent effects, or whether the responses of these regions are functionally related. To address this issue, we used a phase-locking analysis to determine whether the anterior and posterior effects are functionally related, and to test whether any interactions are modulated according to naming specificity.
Only the phase-locking between the left anterior temporal and left fusiform ROIs was significantly modulated by naming specificity (p < .05, cluster-mass corrected, p = .033). This effect occurred between approximately 120 and 220 msec, and 30 to 40 Hz (Figure 4C). There was an additional period of increased phase-locking beginning after 220 msec and continuing until approximately 260 msec, although this cluster did not survive multiple comparisons correction (Figure 4B). The two clusters showing increased phase-locking between left anterior temporal and left fusiform could be construed as a single effect, indicating increased phase-locking during basic-level naming between 120 and 260 msec in the gamma-band frequency range.
In summary, the analysis contrasting different naming tasks shows that greater semantic integration demands are associated with increased recurrent interactions. An analysis of the cortical phase-locking further showed that these effects are driven by interactions between left anterior temporal and left fusiform regions. Our findings support the notion that the magnitude of semantic integration demands modulates recurrent interactions within the object processing system. As a further test of this hypothesis, semantic integration demands were also manipulated by contrasting spatio-temporal patterns of activity associated with processing objects in different domains. As mentioned in the Introduction, living things tend to be more similar to one another than nonliving things, thus requiring a more fine-grained analysis for their unique identification compared to nonliving things (Taylor et al., 2008; Moss et al., 2005; Tyler & Moss, 2001; McRae et al., 1997). Therefore, we predict that increased recurrent interactions should be associated with living things, over and above that of nonliving things.
The Effect of Object Domain
As the contrast of naming specificity found effects prior to 300 msec, the sensor analysis contrasting living and nonliving things was restricted to this time frame. The 3-D (Topography × Time) sensor SPM analysis revealed no significant differences between living and nonliving things named at the basic level. Despite this, modulations of recurrent interactions may still be seen in the ROI phase-locking analysis, as the measure of phase-locking used does not depend on amplitude modulations but only on the synchronization of phase angles between the two signals. As such, the phase-locking analysis was conducted using the whole epoch.
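The amplitude independence of the measure can be illustrated with a short sketch. It assumes the common across-trial definition of the phase-locking value (the magnitude of the mean unit vector of the phase difference between two signals), which this section does not spell out; the function names and the synthetic signals are hypothetical.

```python
import cmath
import math
import random


def phase_locking_value(phases_a, phases_b):
    """PLV across trials at one time-frequency point (phases in radians)."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("need two equal-length, non-empty phase lists")
    # Each trial contributes a unit vector at its phase difference,
    # so signal amplitude never enters the computation.
    vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(vectors) / len(vectors))


def plv_from_complex(z_a, z_b):
    """PLV from complex (e.g., wavelet-transformed) values, one per trial."""
    return phase_locking_value([cmath.phase(z) for z in z_a],
                               [cmath.phase(z) for z in z_b])


rng = random.Random(0)
n_trials = 100

# Hypothetical ROI signals: the absolute phase varies from trial to trial.
phase_a = [rng.uniform(-math.pi, math.pi) for _ in range(n_trials)]
z_a = [cmath.exp(1j * p) for p in phase_a]
z_locked = [cmath.exp(1j * (p - 0.5)) for p in phase_a]      # constant lag
z_random = [cmath.exp(1j * rng.uniform(-math.pi, math.pi))   # no fixed lag
            for _ in range(n_trials)]

plv_locked = plv_from_complex(z_a, z_locked)
plv_random = plv_from_complex(z_a, z_random)
# Scaling one signal's amplitude tenfold leaves the PLV unchanged.
plv_scaled = plv_from_complex(z_a, [10 * z for z in z_locked])
print(round(plv_locked, 3), round(plv_random, 3), round(plv_scaled, 3))
```

A consistent phase lag across trials yields a value near 1 and an inconsistent lag a value near 0, regardless of response amplitude. This is why a phase-locking difference can be present even when the amplitude-based sensor analysis shows no effect.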
The degree of phase-locking was calculated between the anterior (orbito-frontal, anterior temporal) and posterior (fusiform) regions across both hemispheres to test whether recurrent interactions are modulated by object domain. We found that living things showed significantly greater phase-locking compared to nonliving things between left anterior temporal and left fusiform ROIs (p < .05, cluster-mass corrected p = .037). This modulation of phase-locking by object domain was centered at 200 msec and around 50 to 80 Hz (Figure 5B). All other comparisons were nonsignificant.
In addition, the left anterior temporal and fusiform regions displayed greater phase-locking for living, compared to nonliving things, in the time range 300 to 500 msec (Figure 6A). To explore this further, the PLVs for each subject were averaged between 50 and 80 Hz (defined by the significant effect ∼200 msec) before comparing the differences in phase-locking between living and nonliving things. This showed two periods of significantly increased phase-locking for living over nonliving things. The first effect, also seen in the initial analysis, was between 176 and 202 msec (p < .05, cluster-mass corrected p = .050), and a later effect was between 296 and 340 msec (p < .05, cluster-mass corrected p = .008) (Figure 6B). Together these results indicate the presence of both early and later modulatory effects of object domain on phase-locking between the left anterior temporal and left fusiform, providing further support to the claim that recurrent interactions during meaningful object recognition are mediated by semantic integration demands.
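The band-averaging step and a cluster-mass correction over time can be sketched as below. This is an illustration under stated assumptions rather than the procedure used in the study: the sign-flip permutation scheme, the one-sided threshold (t = 1.833, df = 9, alpha = .05), the function names, and the synthetic data are all hypothetical choices.

```python
import math
import random


def band_average(plv_map, freqs, lo=50.0, hi=80.0):
    """Average a [frequency][time] PLV map over lo..hi Hz (one value per time point)."""
    rows = [row for f, row in zip(freqs, plv_map) if lo <= f <= hi]
    if not rows:
        raise ValueError("no frequencies fall inside the band")
    return [sum(col) / len(rows) for col in zip(*rows)]


def t_values(diffs):
    """One-sample t statistic per time point; diffs is [subject][time]."""
    n = len(diffs)
    out = []
    for col in zip(*diffs):
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def clusters(tvals, threshold):
    """Return (start, stop, mass) for each run of consecutive t > threshold."""
    found, start, mass = [], None, 0.0
    for i, t in enumerate(list(tvals) + [float("-inf")]):  # sentinel closes the last run
        if t > threshold:
            if start is None:
                start, mass = i, 0.0
            mass += t
        elif start is not None:
            found.append((start, i, mass))  # stop index is exclusive
            start = None
    return found


def cluster_mass_test(diffs, threshold, n_perm=500, seed=0):
    """Sign-flip permutation test; returns [(start, stop, mass, p), ...]."""
    rng = random.Random(seed)
    observed = clusters(t_values(diffs), threshold)
    null_max = []
    for _ in range(n_perm):
        # Flipping a subject's sign is equivalent to swapping condition labels.
        flipped = []
        for row in diffs:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * x for x in row])
        masses = [m for _, _, m in clusters(t_values(flipped), threshold)]
        null_max.append(max(masses, default=0.0))
    results = []
    for start, stop, mass in observed:
        exceed = sum(m >= mass for m in null_max)
        results.append((start, stop, mass, (exceed + 1) / (n_perm + 1)))
    return results


# Synthetic band-averaged PLV differences (living minus nonliving):
# 10 hypothetical subjects, 40 time points, effect planted at samples 10-19.
rng = random.Random(1)
diffs = [[rng.gauss(0.15 if 10 <= t < 20 else 0.0, 0.05) for t in range(40)]
         for _ in range(10)]
results = cluster_mass_test(diffs, threshold=1.833)
for start, stop, mass, p in results:
    print(f"samples {start}-{stop - 1}: mass = {mass:.1f}, p = {p:.3f}")
```

Comparing each observed cluster against the distribution of the largest cluster mass per permutation controls the family-wise error rate across time points. Under such a scheme a short cluster can still survive correction if its summed t values are large.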
The aim of the present study was to determine whether the degree of recurrent activity within the object processing system is mediated by semantic integration demands. By contrasting the level of specificity at which the items were named, we manipulated the relative degree of semantic integration through tasks requiring different types of information. Domain-level naming can be performed by integrating simpler information, as compared to basic-level naming that additionally requires the integration of object-specific attributes.
Contrasting the MEG responses for basic- and domain-level naming showed an increase in the phase-locking between left anterior temporal and left fusiform sites associated with basic-level naming between 120 and 260 msec in the gamma-band frequency range. Consistent with our central prediction, these results indicate that recurrent interactivity in the object processing system is underpinned by semantic integration demands.
As an additional manipulation of semantic integration demands, we also contrasted basic-level naming for living and nonliving things, as the more fine-grained analysis needed to uniquely identify living things requires a greater degree of semantic property integration. Consistent with the effect of naming specificity, this contrast showed increased phase-locking for living things between the left anterior temporal and left fusiform cortices in gamma-band frequencies, at approximately 170 to 200 msec and again at 300 to 340 msec. Although this effect could be construed as category-specific, within the context of our cognitive model of conceptual knowledge (Taylor et al., 2008; Tyler & Moss, 2001), we interpret the effect as being driven by the degree of conceptual information required, providing further evidence that the degree of recurrent interactions in the object processing system is underpinned by semantic integration demands.
The results of the current study show that manipulations of both naming specificity and object domain modulate phase-locking between the left anterior temporal and left fusiform ROIs. Further, both effects were in the predicted direction, consistent with the hypothesis that recurrent interactions are mediated by semantic integration demands. In addition, both manipulations modulated phase-locking in gamma-band frequencies and at approximately equivalent latencies, and so it is unlikely that the results can be accounted for by a generic effect of “task difficulty.” This is because, although there are multiple possible lexical outputs during basic-level naming and the lexical output is more restricted during domain-level naming, the same cannot be said of the comparison of living and nonliving things, which were all named at the same level of specificity. In addition, the current experiment used a matched set of living and nonliving things (see Methods and Moss et al., 2005, for details), as well as the same items for both tasks; we therefore attribute the observed effects to the manipulations of task and conceptual structure, rather than to task difficulty or stimulus properties.
The cortical ROIs used in this study were defined a priori to ensure the same data were not used for ROI selection and testing. This approach avoids any issues of circularity, although our ROIs may be considered as part of a larger network. This implies that functional connections between other regions may play some role; however, the interaction we observe between anterior and posterior temporal sites appears key to forming complex meaningful representations.
The findings of the current study add an extra dimension to previously proposed models that incorporate recurrent activity as a mechanism for object recognition. Two such models are the two-state interactive theory (2SI; Schendan & Maher, 2009; Schendan & Stern, 2008; Schendan & Kutas, 2002) and the top–down facilitation model (Bar et al., 2006; Bar, 2003).
The 2SI theory proposes that an initial feedforward sweep of responses along both the ventral and dorsal streams acts to establish a coarse categorization of the object. Subsequently, between 200 and 500 msec, recurrent interactions involving prefrontal and posterior object-sensitive regions act to disambiguate the identity of the object, thus facilitating recognition. Bar (2003) suggests a similar functional role of recurrent activity for hypothesis testing, in which feedback responses from orbito-frontal cortex to more posterior object-sensitive regions facilitate recognition by modulating the activity in posterior regions by biasing processing only to the most likely candidates. This is achieved by a mechanism where low spatial frequency information is rapidly projected from visual to orbito-frontal cortex via a dorsal magnocellular pathway. This information acts as a basis for predicting the most likely interpretations of the image, which is then fed back to posterior fusiform regions where it is integrated with (or “matched” against) the incoming bottom–up signal, thereby facilitating recognition. In an elaboration of this model, Kveraga, Ghuman, and Bar (2007) have suggested that if recognition is not established by this mechanism, the cycle of prediction and matching continues in an error reducing, iterative process.
Although both of the above models claim that recurrent interactions underpin a process of “prediction and test” until the identity of the object is known, neither addresses the important further issue of what factors drive the amount of recurrent interaction. The research reported here provides valuable new insights, showing that variations in the degree to which semantic integration is required for object recognition under different task conditions modulate recurrent interactions within the ventral stream and orbito-frontal cortex.
In addition, the above models suggest a mechanism where feedback responses from prefrontal cortex to more posterior ventral temporal regions facilitate recognition (Schendan & Stern, 2008; Bar et al., 2006; Bar, 2003). Posterior regions of the ventral temporal lobe are routinely found to be active during visual object processing (Haxby et al., 2001; Grill-Spector et al., 1998; Malach et al., 1995), and these same regions are implicated in processing the semantic information about objects (Bright, Moss, & Tyler, 2004; Tyler et al., 2003; Chao et al., 1999). The neurons in these regions have been found to respond to moderately complex information about an object (Tanaka, 1996; Kobatake & Tanaka, 1994), whereas whole objects are represented through distributed patches of neural activation (Tsunoda, Yamane, Nishizaki, & Tanifuji, 2001). Moreover, regions of prefrontal cortex are involved in visual recognition (Bar et al., 2001; Freedman et al., 2001; Parker, Wilding, & Akerman, 1998), and they exert top–down effects on regions of the ventral processing stream during recognition (Ghuman, Bar, Dobbins, & Schnyer, 2008; Kveraga et al., 2007; Bar et al., 2006; Tomita et al., 1996). Feedback signals from prefrontal cortex are claimed to facilitate recognition by biasing/constraining processing in more posterior regions toward the predicted or most likely identity of the input (Miller & D'Esposito, 2005; Miller, Freedman, & Wallis, 2002).
Structures in the anterior temporal lobe, and specifically perirhinal cortex, are hypothesized to process complex feature-conjunctions allowing for more fine-grained discriminations (Bussey, Saksida, & Murray, 2002, 2003; Buckley, Booth, Rolls, & Gaffan, 2001), and disrupting the backward connections from perirhinal cortex to more posterior temporal cortex impairs the identification of visual pair-associates (Miyashita, Okuno, Tokuyama, Ihara, & Nakajima, 1996). Human fMRI data have also shown that anteromedial temporal cortex (including perirhinal cortex) displays a modulatory response to increasing levels of semantic integration (Moss et al., 2005; Tyler et al., 2004). These studies implicate anteromedial temporal cortex as performing a vital role in enabling the integration of semantic information that is coded in the posterior temporal lobe.
This account is also consistent with Simmons and Barsalou (2003), who suggest, in an elaboration of Damasio's (1989) convergence zone theory, that perirhinal cortex performs the most complex feature-conjunctions, and that when identification is difficult (although the meaning of difficult is not defined), feedback from anterior conjunctive neurons to more posterior object representation sites, such as the fusiform gyrus, “re-enacts” a visual representation of the object to guide further processing. Similarly, Patterson, Nestor, and Rogers (2007) highlight the importance of the anterior temporal lobe, although they place their emphasis on the lateral as opposed to the medial portion of anterior temporal cortex. They suggested that the anterior temporal lobes act as a semantic hub, enabling the processing of associations regarding different types of information (e.g., shape, color), both within and across sensory modalities. Under this account, the lateral anterior temporal lobe integrates information from a distributed semantic network and encodes the similarities between conceptual entities. Taken together, these accounts suggest that the anterior temporal lobe region becomes increasingly important as the task requires the disambiguation between similar conceptual entities, although different accounts place a different emphasis on the importance of lateral and medial structures.
Despite this, these accounts point to the interaction between anterior and posterior temporal lobe structures, which may act to disambiguate the representation of the objects, and as our results suggest, this interaction becomes more crucial in situations where greater semantic integration is needed.
Finally, the results of our phase-locking analysis identified a modulation of anterior and posterior synchronization in the gamma-band frequencies (25–100 Hz). Enhanced activity in the gamma-band has been reported in a distributed set of regions during object identification (Martinovic, Gruber, Hantsch, & Müller, 2008; Gruber, Trujillo-Barreto, Giabbiconi, Valdes-Sosa, & Müller, 2006; Lachaux et al., 2005; Tallon-Baudry & Bertrand, 1999). Further, long-range gamma-band synchronization is associated with visual perception, and has been shown to be modulated as a function of whether perception is successful or not (Melloni et al., 2007; Rodriguez et al., 1999), as well as by the familiarity of the item (Supp, Schlogl, Trujillo-Barreto, Müller, & Gruber, 2007; Gruber et al., 2006). Our results further show that long-range synchronization in the gamma-band is modulated by semantic factors.
We note that although the modulation of phase-locking as a function of object domain was statistically significant after correction for multiple comparisons, the duration of these effects was relatively short. One possibility is that the transient effects are associated with an artifactual signal; however, this seems improbable for two reasons. First, as living and nonliving stimuli were intermixed in the presentation order, it is improbable that an artifact affected one object domain more than another. A second reason concerns the lateralized nature of the effect. Although anterior–posterior connectivity was tested between all ROIs, only connectivity between the left anterior temporal and left fusiform was significant. If the effects of object domain were artifactual (e.g., caused by ocular or muscular sources), effects would most likely have been bilateral. Another possibility is that the effects of object domain are short-lived due to the subtle nature of the manipulation and the reduced number of trials used in the estimation of the PLV (∼50 trials per condition as opposed to ∼100 trials in the contrast of naming specificity). Some support for this conclusion comes from the present findings that the effects of naming specificity and object domain occurred at similar times and in similar frequencies.
In conclusion, the results of the current study support the hypothesis that recurrent interactions within the object processing system are mediated by the semantic integration demands required during the evolution of meaningful object representations. Further to this, we found that the functional interaction between the left anterior temporal and left fusiform cortices was modulated by our manipulations of semantic integration. These findings support an interactive model of meaningful object recognition that relies on the coordinated interaction between the anterior and posterior temporal lobe, providing a mechanism to disambiguate the appropriate meaningful representation from the visual input.
We thank the MEG operators and radiographers at the MRC Cognition and Brain Sciences Unit. This work was supported by a Medical Research Council (UK) programme grant (G0500842 to L. K. T.) and a Medical Research Council (UK) Doctoral Training Grant (G0700025 to A. C.).
Reprint requests should be sent to Lorraine K. Tyler, Centre for Speech, Language and the Brain, Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge, CB2 3EB, United Kingdom, or via e-mail: firstname.lastname@example.org.