Abstract
We manipulated the degree of structural similarity between objects that had to be matched either according to whether they represented the same object (perceptual matching) or belonged to the same category (conceptual matching). Behaviorally, performance improved as a linear function of increased structural similarity during conceptual matching but deteriorated as a linear function of increased structural similarity during perceptual matching. These effects were mirrored in fMRI recordings where activation in several ventral posterior areas exhibited a similar interaction between match type and structural similarity. Our findings provide direct support for the notion that structural similarity exerts opposing effects on classification depending on whether objects are to be perceptually differentiated or categorized—a notion that has been based on rather circumstantial evidence. In particular, the finding that structural similarity plays a major role in categorization of instances according to taxonomy challenges the view that the organization of superordinate categories is not driven by shared structural features.
INTRODUCTION
Similarity plays a central part in classification of instances because objects are often assigned category membership based on their shared characteristics (Sloutsky, 2009). Similarity in terms of shape (structural similarity [SS]) seems especially important in this respect as reflected, for example, by the fact that shape similarity forms one of the ontogenetically earliest and dominating bases for classification (Sloutsky, 2009; Mandler, 2000).
Evidence supporting the notion that SS is an important parameter in classification has also come from studies of brain-damaged patients with so-called category-specific disorders. Typically, these disorders affect the recognition or comprehension of natural objects (e.g., animals and plants), whereas the recognition or comprehension of artifacts (e.g., furniture and tools) is relatively preserved, although the reverse pattern has also been reported, albeit less frequently (for reviews, see Capitani, Laiacona, Mahon, & Caramazza, 2003; Humphreys & Forde, 2001; Gainotti, 2000; Caramazza, 1998). Such observations suggest that natural objects and artifacts may be processed differently, and it has been proposed that the underlying cause of at least some cases of category-specific disorders relates directly to differences in similarity between members of the categories of natural objects and artifacts (Gerlach, 2009; Humphreys, Riddoch, & Quinlan, 1988), with natural objects being more visually and semantically similar to each other than artifacts (McRae & Cree, 2002; Tranel, Logan, Frank, & Damasio, 1997; Humphreys et al., 1988). This difference in similarity is likely to cause different category effects depending on the task at hand.

To appreciate this, consider two cases of classification: object identification, where the object has to be classified as a particular instance (say, "a fox terrier"), and object categorization, where the object has to be classified as a member of a broader class of objects (say, "animals"). These two cases clearly differ in how demanding they are in terms of object differentiation. During categorization, the stimulus need not be individuated with respect to other members of its category; you need not decide whether the target item is a cow, a dog, or a horse to categorize it as an animal. On the contrary, the more similar the stimulus is to other members of its category, and the less similar it is to members of other categories, the higher the probability that it belongs to that particular category rather than to another. During object identification, however, the stimulus (e.g., a fox terrier) needs to be differentiated from other members of its category; now, you must decide not only whether the stimulus is a cow, dog, horse, or something else but also which particular dog you are presented with. In this case, high similarity is harmful to performance. If natural objects are more similar than artifacts, we should thus expect natural objects to be identified less efficiently than artifacts but also expect artifacts to be categorized less efficiently than natural objects.

Although these effects have been reported across several studies (see, e.g., Gerlach, 2009), it remains unclear whether they were indeed driven by differences in similarity, as similarity was typically not under experimental control. Rather, an effect of similarity was inferred post hoc to explain the obtained results. This introduces a risk of circularity: objects that are categorized fast are taken to belong to categories with high similarity, while the degree of similarity is itself inferred from how quickly the objects are categorized. Nor has it been demonstrated convincingly that these effects are driven by similarity in structure rather than similarity in, for example, function, an arguably semantic aspect of an object. In this study, we address both of these outstanding questions directly.
Participants were presented with stimuli composed of two line drawings, which had to be compared. SS was manipulated parametrically in that the pairs could be of low, intermediate, or high similarity. In addition to SS, we also manipulated the type of matching to be performed so that it was either conceptual (“Do the stimuli belong to the same category?”) or perceptual (“Do the stimuli represent the same object?”; see Figure 1).
To judge whether two images represent the same object (perceptual matching), it is, in principle, sufficient to examine whether the two images map onto the same structural representation stored in visual long-term memory (VLTM). If they do, the images represent the same object. If they do not, the images are likely to represent different objects. In conceptual matching conditions, images also need to be matched with VLTM representations; however, this will not suffice to judge whether the objects belong to the same category as members of the same category can vary in structural composition and thus do not map onto the same VLTM representation. Evidence supporting these assumptions comes from studies of patients with visual agnosia. Some of these patients may have difficulties recognizing objects because of impaired VLTM representations (perceptual matching; Humphreys, Riddoch, & Boucart, 1992), whereas other patients seem capable of accessing such representations, for example, by being capable of matching objects seen from different viewpoints, although they cannot match objects according to functional similarity within the visual modality (conceptual matching; Ptak, Lazeyras, Di Pietro, Schnider, & Simon, 2014). Hence, conceptual matching necessitates access to a more abstract level of representation, which can be common even for objects that are dissimilar in shape. Typically, this level is considered semantic in nature and is often referred to as semantic memory (Gerlach, 2009). On the basis of this, we predicted that RTs would be longer on conceptual matching trials compared with perceptual matching trials.
If we assume that structurally similar objects are located nearer each other in psychological space than are structurally dissimilar objects and that discriminability increases as a function of distance in psychological space, then structurally dissimilar objects are easier to tell apart than structurally similar objects (Nosofsky, 1986). This "distance assumption" regarding similarity and psychological space may also apply to neural space. Imagine that (a) every perceived object gives rise to a unique pattern of activation in the brain; (b) the more similar two objects are, the more their respective brain activations will overlap; and (c) discriminability depends on how overlapping activation patterns are, with less overlap increasing discriminability. If so, discriminability can be understood as distance in neural rather than psychological space, and it so happens that visually similar objects do give rise to more overlapping activation patterns in ventral posterior brain areas than less similar objects do (Weber, Thompson-Schill, Osherson, Haxby, & Parsons, 2009). On the basis of this distance assumption, we predicted that the efficiency of perceptual matching would decrease as a linear function of increased SS between image pairs representing different objects. As an example, it should be easier to decide that an "apple" and a "banana" are different objects than that a "dog" and a "fox" are different objects. With respect to conceptual matching, we predicted that increasing SS would have the reverse effect on performance. This prediction is based on the observation that structurally similar objects often share similar functional features and cluster categorically (Hills, Maouene, Maouene, Sheya, & Smith, 2009; Rosch, 1999). For animals, for example, similar shape is a product of similar evolutionary constraints; four legs are good for movement on land but apparently rather hopeless for flying. For artifacts, similar shape also often implies similar function (Randall, Moss, Rodd, Greer, & Tyler, 2004; Rogers & McClelland, 2004). Hence, similarity in shape often conveys similarity in other respects too. This means that SS can act as a proxy for semantic category membership, making the conceptual matching process more efficient for structurally similar objects than for structurally dissimilar objects. As an example, because a dog and a fox are highly structurally similar, chances are that they belong to the same superordinate category "animals." Hence, it is not even necessary to identify each object on a structural level as a dog and a fox before they can be categorized as animals. It is sufficient to see that they are similar in overall shape or share a couple of features (e.g., legs). On the other hand, if two objects are not structurally similar, they must be structurally individuated before category assignment. As an example, because an apple and a banana are structurally dissimilar but still belong to the same category "fruits," each object must be identified structurally as a particular object before they can be properly categorized. These predictions regarding opposing effects of SS on perceptual and conceptual matching are entirely consistent with the literature mentioned above concerning category effects in visual object processing.
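For concreteness, the distance assumption can be sketched as follows; this is a minimal illustration of an exemplar-style similarity function in which similarity decays exponentially with distance (Nosofsky, 1986), with made-up "shape space" coordinates and decay constant rather than values estimated in this study.

```python
# Minimal sketch of the distance assumption: similarity falls off with
# distance in psychological space, so discriminability rises with distance.
# The 2-D feature coordinates and decay constant c are illustrative only.
import numpy as np

def similarity(a, b, c=1.0):
    """Exemplar-style similarity: exp(-c * Euclidean distance)."""
    return float(np.exp(-c * np.linalg.norm(a - b)))

apple, banana = np.array([0.0, 0.0]), np.array([3.0, 2.0])
dog, fox = np.array([5.0, 5.0]), np.array([5.4, 5.3])

print(similarity(apple, banana))  # low similarity: easy to tell apart
print(similarity(dog, fox))       # high similarity: hard to differentiate,
                                  # but a good cue to shared category
```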
In addition to behavior, we also explored the neural correlates of perceptual and conceptual matching by means of fMRI recordings during the tasks. Given that both perceptual and conceptual matching necessitate structural processing of the stimuli and given that the same stimuli were presented during perceptual and conceptual matching, we did not expect perceptual matching to cause greater activation than conceptual matching across all SS levels in ventral posterior areas of the brain (defined here as regions posterior to y = −40 and inferior to z = 15; Evans et al., 1993; including the calcarine sulcus, lingual gyrus, inferior and middle occipital gyri, and the posterior/middle fusiform gyrus). However, because conceptual matching necessitates access to a more abstract level of representation (semantic memory) than perceptual matching, we did expect conceptual matching to cause greater activation than perceptual matching across all similarity levels in areas associated with semantic processing such as the inferotemporal cortex (Gerlach, Law, Gade, & Paulson, 2000) and/or left inferior frontal gyrus (Thompson-Schill, 2003; see also Binder, Desai, Graves, & Conant, 2009).
Besides this effect of conceptual matching, we also expected to find a positive relationship between increasing levels of SS and activation in ventral posterior parts of the brain during perceptual matching. This follows from the behavioral predictions and is further based on the assumption that ventral posterior parts of the brain are involved in structural rather than semantic processing. This assumption is supported by several lines of evidence: (a) these regions are not involved in semantic processing of both words and pictures—as would be assumed if they supported conceptual knowledge—but seem to respond to the similarity structure of pictures only (Devereux, Clarke, Marouchos, & Tyler, 2013), (b) they are sensitive to changes in shape but not to changes in basic-level semantic categories associated with those shape changes (Kim, Biederman, Lescroart, & Hayworth, 2009), (c) they are implicated in visual agnosia (e.g., Ptak et al., 2014)—a disorder characterized by impaired processing of object structure concurrently with preserved semantic knowledge, (d) they are involved in structural differentiation of objects (Gerlach, 2009; Liu, Steinmetz, Farley, Smith, & Joseph, 2008), and (e) they do indeed exhibit a positive correlation between SS and activation (Collins, Zhu, Bhatt, Clark, & Joseph, 2012; Liu et al., 2008; Joseph & Farley, 2004; Joseph & Gathers, 2003). In the present formulation, the ventral posterior regions process structural information about objects and accumulate evidence for differences in structure to make perceptual decisions but accumulate evidence for similarity in structure to make conceptual decisions. For low similarity pairs, evidence for differences in structure is high so the decision that they are different objects is easier compared with high similarity pairs for which evidence about differences is low. In this case, additional structural processing is needed to differentiate the objects. Therefore, the posterior ventral regions will be engaged more for high similarity than for low similarity pairs for perceptual matching.
In contrast, we expected to find the inverse relationship between SS and activation in the same areas during conceptual matching, that is, increasing degrees of activation as SS diminishes. This hypothesis is based on the rationale that, although conceptual matching is ultimately a semantic task, it also necessitates structural processing. Consequently, if high SS can act as a proxy to category membership, in that structurally similar objects are more likely to belong to the same category than structurally distinct objects, then conceptual matching can largely be based on structural information. However, as described above, if conceptual matching cannot be based largely on SS, then objects will need to be identified at the semantic level. In turn, this will require additional access to structural information, which will engage posterior ventral regions even more strongly. As a result, the low similarity pairs will induce more posterior ventral activation than high similarity pairs for conceptual matching.
In summary: In terms of behavior, we predicted that performance would deteriorate as a function of increased SS during perceptual matching but improve as a function of increased SS during conceptual matching. In terms of brain activation, we predicted that both perceptual and conceptual matching would activate ventral posterior parts of the brain because processing of VLTM representations is required for both types of matching. In addition, we expected activation in these areas to (a) increase during perceptual matching as the SS of the objects to be compared increased and (b) decrease during conceptual matching as the SS of the objects increased (an interaction between task and SS; see Figure 2). This outcome would indicate not only that these regions are integral for processing VLTM representations (i.e., activation is modulated by the similarity manipulation) but also that these regions process VLTM representations based on higher-level task demands of perceptual versus conceptual matching (i.e., differential modulation by similarity in the same region). An alternative account is that the processing in these regions is involved in comparing image-based descriptions, that is, the physical similarity between stimuli rather than the similarity between representations stored in VLTM. Two images that are not similar will not have as many features in common as two images that are more similar, which will have more image features in common. If these ventral posterior regions are simply computing similarity based on image information, then similarity modulation of fMRI signal in these regions would be the same regardless of the higher-level task demand given that the same images are used in the perceptual and conceptual conditions (see Figure 2). Finally, we expected the left inferolateral temporal cortex and/or left inferior frontal gyrus to be associated with a main effect of conceptual matching, as an index of semantic processing.
Although our prediction concerning a negative influence of SS on perceptual differentiation is based on many prior findings, we are unaware of any prior attempts to directly test whether SS can affect superordinate categorization positively. If it can, it will challenge the standard view that the organization of basic level categories may be driven by shared structural features among their members, but superordinate categories are not (Hills et al., 2009; Cutzu & Tarr, 1997). Indeed, Hills et al. (2009) note that the existence of superordinate categories has been taken as prima facie evidence in favor of more abstract and theory-like representations of categories over representations in terms of mere feature distributions.
METHODS
Participants
Twenty-four right-handed healthy adults participated. They all had normal or corrected-to-normal vision, and none of them reported neurological or psychiatric diagnoses or pregnancy. All participants provided informed consent before participating, and all procedures were approved by the local institutional review board.
Because of excessive head movement, data from two participants had to be excluded from the analyses of behavioral and imaging data. The mean age of the remaining 22 participants was 22.5 years (SD = 4.1 years, range = 18–31 years; 12 men). In addition, RTs were not recorded appropriately for six participants because of technical failure. Hence, these participants also had to be excluded from the RT analyses (but not the analyses based on error rates), causing RT analyses to be based on 16 participants only.
Tasks and Stimuli
In all tasks, the participants had to compare two stimuli. In the conceptual matching task, they had to decide whether the stimuli came from the same category (animals that were not birds [mammals, fish, reptiles, amphibians, or insects], birds, fruits, or vegetables), whereas in the perceptual matching task, the participants had to decide whether the stimuli represented the same object (e.g., two different images of a dog). For conceptual matching same-response trials (e.g., apple and banana) and perceptual matching different-response trials (e.g., apple and banana), the object pairs were identical. For conceptual matching different-response trials (e.g., apple and broccoli) and perceptual matching same-response trials (e.g., dog and dog), all stimulus pairs differed. For conceptual matching same-response trials and perceptual matching different-response trials, object pairs were characterized by different degrees of SS: either low SS1 (e.g., banana and apple), intermediate SS2 (e.g., toad and alligator), or high SS3 (e.g., elephant and rhinoceros; see Figure 1).

The stimuli were black and white line drawings of common objects, presented two at a time, one above and one below a fixation cross, which appeared in the middle of the screen. They were selected from previous norming studies (Joseph, 1997; Joseph & Proffitt, 1996) in which participants rated the similarity of the two items of a pair in terms of 3-D volumetric structure. In these prior studies, participants were instructed to consider the general volumetric configuration of the objects as opposed to simply the outline. They were also explicitly told to ignore stored knowledge about the objects such as texture, color, size, and taxonomic category. The rating scale was a horizontal bar spanning a length of 1000 pixels at the top of the screen, anchored by the labels "least similar" at the left end and "most similar" at the right end. A vertical marker appeared in the center of this bar (at 500 pixels from the left end) at the start of each rating trial. Participants moved this marker along the scale using the mouse and then clicked a mouse button to indicate the degree of similarity between the two objects. The number of pixels that the marker was displaced from the left end of the scale served as the similarity rating; this value could range from 0 to 1000 pixels, so high similarity was associated with values closer to 1000.
The distribution of SS ratings from the prior studies determined the assignment of object pairs to each of the three similarity levels (SS1–SS3) that characterized the conceptual matching same-response trials/perceptual matching different-response trials. A one-way ANOVA conducted across items revealed that the similarity ratings for the SS1, SS2, and SS3 stimuli were indeed significantly different from each other (F(2, 48) = 51.3, p = .0001), with SS1 having the lowest rating (M = 400.1, SD = 18.7), SS2 having a higher rating (M = 544.0, SD = 19.2), and SS3 having the highest rating (M = 677.0, SD = 19.9). There were no similarity ratings available for conceptual matching different-response trials or for perceptual matching same-response trials; however, SS can be assumed to be quite high for perceptual matching same-response trials as these stimulus pairs depict the same object, although in different versions.
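For illustration, this item-wise check can be sketched as follows; the rating arrays below are random placeholders standing in for the norming data (Joseph, 1997; Joseph & Proffitt, 1996), and the bin size of 17 pairs is inferred from the reported degrees of freedom.

```python
# Hedged sketch of the one-way ANOVA across items for the three SS bins.
# Placeholder ratings on the 0-1000 pixel scale; 17 pairs per bin is
# inferred from F(2, 48), i.e., 51 items in 3 groups.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
ss1 = rng.normal(400, 90, size=17)  # low similarity bin
ss2 = rng.normal(544, 90, size=17)  # intermediate bin
ss3 = rng.normal(677, 90, size=17)  # high similarity bin

F, p = f_oneway(ss1, ss2, ss3)
print(f"F(2, {ss1.size + ss2.size + ss3.size - 3}) = {F:.1f}, p = {p:.4g}")
```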
The assignment of objects to the four categories was based on the similarity ratings available from the original rating experiments (Joseph, 1997; Joseph & Proffitt, 1996). In these studies, participants completed pairwise ratings for subsets of the object pairs but not for all possible stimulus pairings. Consequently, ratings were available only for birds paired with other birds, not for birds paired with fruits or with other kinds of animals, and so forth.
Design and Procedures
The experiment was composed of six experimental conditions: conceptual matching SS1–SS3 and perceptual matching SS1–SS3. Each of the six experimental conditions was composed of 72 trials. In each of the three conceptual matching conditions, 48 of the 72 trials were same-response trials (e.g., apple and banana), and the remaining 24 trials were different-response trials (e.g., apple and broccoli). In each of the three perceptual matching conditions, 48 of the 72 trials were different-response trials (e.g., apple and broccoli), and the remaining 24 trials were same-response trials (e.g., dog and dog). The 432 trials were distributed across three functional runs, with 12 task blocks (2 matching conditions × 3 SS levels × 2 repetitions) interleaved with 11 rest blocks per run. The order of conditions within a run was counterbalanced across participants. Task blocks lasted 36 sec, and rest blocks lasted 12 sec each. Each task block included 12 trials (either eight perceptual matching different-response trials and four perceptual matching same-response trials or eight conceptual matching same-response trials and four conceptual matching different-response trials) presented in random order.
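The block and trial bookkeeping just described can be sketched as follows; the tuple encoding of trials is an assumption for illustration, but the counts match those reported above.

```python
# Sketch of the run structure: 12 task blocks per run, 12 trials per block,
# with an 8/4 majority/minority response split per block.
import itertools
import random

TASKS, SS_LEVELS = ["conceptual", "perceptual"], [1, 2, 3]

def make_block(task, ss):
    """One 36-sec task block: 8 majority- and 4 minority-response trials."""
    majority = "same" if task == "conceptual" else "different"
    minority = "different" if task == "conceptual" else "same"
    trials = [(task, ss, majority)] * 8 + [(task, ss, minority)] * 4
    random.shuffle(trials)  # trials presented in random order within a block
    return trials

# One run: each of the 6 task x SS conditions twice -> 12 task blocks.
conditions = list(itertools.product(TASKS, SS_LEVELS)) * 2
run = [make_block(task, ss) for task, ss in conditions]
random.shuffle(run)  # block order was counterbalanced across participants

assert len(run) == 12 and sum(len(b) for b in run) == 144  # x3 runs = 432 trials
```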
Each trial lasted for 3000 msec. It began with a query question (for 1 sec), either “Same category?” or “Same object?” depending on the matching condition. This was followed by the target object pairs, which were displayed for 400 msec, followed by a screen with a centrally presented “?” displayed for 1600 msec. The objects were presented one above the other with a fixation crosshair in the center. Each object subtended a vertical visual angle of 4°. Responses were collected via an MR-compatible response pad held in the right hand. Participants were instructed to press the “yes” button with their index finger if the objects in a pair came from the same category (conceptual matching trials) or depicted the same object (perceptual matching trials) or else press the “no” button with their middle finger.
Before entering the scanner, individuals were trained to identify which pairs of objects constituted a match or mismatch trial in the context of the conceptual and perceptual matching conditions and also which stimuli belonged to which categories (e.g., that tomato belonged to the category vegetables). First, participants were told that there were four categories (“animals” that were not birds, “birds,” “fruits,” and “vegetables”). They were then given practice viewing each object in each of its versions and were told the category assignment for those objects so that they could learn the association of each object with its category for the purposes of this experiment. Then, they practiced making decisions (“same object” and “same category”) with unlimited time to respond for 16 trials. Then, they completed 36 timed practice trials before the actual experiment in the scanner. During training and during the actual scanning session, participants were asked to respond as accurately and quickly as possible. No feedback was given on performance. During the scanning session, stimuli were presented using a high-resolution rear-projection system, and participants viewed the stimuli via a reflection mirror mounted on the head coil. A desktop computer running E-Prime (Version 1.1 SP3; Psychology Software Tools, Pittsburgh, PA) controlled stimulus presentation and the recording of responses. The timing of stimulus presentation was synchronized with the magnet trigger pulses.
Image Acquisition
A 3-T Siemens Trio magnetic resonance imaging system at the University of Kentucky Medical Center equipped for EPI was used for data acquisition. Four hundred forty-eight EPI images were acquired (repetition time = 3000 msec, echo time = 30 msec, flip angle = 81°), each consisting of 40 contiguous axial slices (matrix = 64 × 64, in-plane resolution = 3.5 × 3.5 mm2, thickness = 3.5 mm, gap = 0.6 mm). A high-resolution T1-weighted magnetization prepared rapid gradient echo anatomical set (192 sagittal slices, matrix = 224 × 256, field of view = 224 × 256 mm2, slice thickness = 1 mm, no gap, echo time = 2.93 msec, inversion time = 1100 msec, repetition time = 2100 msec) was collected for each participant.
Analysis of Behavioral Data
RT and error rates were recorded from participants performing the tasks in the scanner. To ensure that the RT variable was normally distributed, meeting the assumptions of a multivariate approach, the log transformation of individual RTs was used (LogRTs). LogRTs from individual trials more than 3 SDs from the overall group mean were considered outliers (no outliers emerged). Only correct LogRTs were submitted to analyses (89% of the data). Each dependent variable was subjected to a two-way repeated-measures ANOVA using a multivariate approach (O'Brien & Kaiser, 1985), with repeated factors Task (conceptual vs. perceptual matching) and SS level (low vs. intermediate vs. high SS). Because SS levels were only truly comparable across conceptual matching same-response trials and perceptual matching different-response trials, data from conceptual matching different-response trials and perceptual matching same-response trials were not included in these ANOVAs. Following Hertzog and Rovine's (1985) recommendation, we used multivariate planned comparisons to test whether the effect of similarity for each matching condition showed a significant linear trend. Where the linear trend was not significant, we also report any quadratic trends.
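A minimal sketch of this screening step follows, assuming a trial-level data frame with RT and accuracy columns; the column names and layout are assumptions, not the study's actual files.

```python
# Hedged sketch of the RT screening: keep correct trials, log-transform,
# and drop trials more than 3 SDs from the mean logRT.
import numpy as np
import pandas as pd

def screen_rts(trials: pd.DataFrame) -> pd.DataFrame:
    correct = trials[trials["accuracy"] == 1].copy()  # only correct trials analyzed
    correct["logRT"] = np.log10(correct["rt_ms"])     # log transform to normalize RTs
    m, sd = correct["logRT"].mean(), correct["logRT"].std()
    # Exclude trials more than 3 SDs from the (here: pooled) mean logRT.
    return correct[(correct["logRT"] - m).abs() <= 3 * sd]
```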
Analysis of fMRI Data
Preprocessing and statistical analysis used FMRIB Software Library (v. 4.1.7; FMRIB, Oxford University, Oxford, United Kingdom). For each participant, preprocessing included motion correction with MCFLIRT, brain extraction using BET, spatial smoothing with a 7-mm FWHM Gaussian kernel, and temporal high-pass filtering (cutoff = 100 sec). Statistical analyses were then performed at the single-subject level (general linear model, FEAT v. 5.98). Each scan was modeled with six explanatory variables (EVs; 2 matching types × 3 similarity levels) versus baseline, with the height of each EV determined by the average accuracy for that block, individualized for each participant. Each EV was convolved with a double-gamma hemodynamic response function, and its temporal derivative was added to the model. In addition, six head motion parameters (three translations and three rotations) were included to control for head motion confounds.
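For concreteness, the construction of one accuracy-weighted regressor can be sketched as follows. The TR, block duration, and scan length follow the Methods; the double-gamma shape below uses the common SPM-style parameters (FSL's default double gamma is similar but not identical), so treat this as an approximation.

```python
# Sketch of one accuracy-weighted block regressor for the GLM.
import numpy as np
from scipy.stats import gamma

TR, N_VOLS = 3.0, 448  # 448 EPI volumes, repetition time = 3 sec

def double_gamma_hrf(tr=TR, length=32.0):
    t = np.arange(0.0, length, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0  # response minus undershoot
    return h / h.max()

def block_regressor(onsets_s, duration_s, accuracies):
    """Boxcar whose height per block is that block's accuracy, HRF-convolved."""
    box = np.zeros(N_VOLS)
    for onset, acc in zip(onsets_s, accuracies):
        start = int(onset / TR)
        box[start:start + int(duration_s / TR)] = acc
    return np.convolve(box, double_gamma_hrf())[:N_VOLS]

# e.g., two 36-sec blocks of one condition with accuracies .90 and .85:
ev = block_regressor(onsets_s=[12.0, 300.0], duration_s=36.0, accuracies=[0.90, 0.85])
```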
For each participant, contrast maps were registered via the participant's high-resolution T1-weighted anatomical image to the adult Montreal Neurological Institute 152 template (12-parameter affine transformation; FLIRT), yielding images with 2-mm isotropic resolution. A mixed-effects group analysis (using FLAME) yielded the group-level statistical parametric map for each contrast. Higher-level maps were thresholded at p < .01 (false discovery rate [FDR] corrected). In accordance with the hypotheses presented in the Introduction, the following analyses were performed:
(1) To test the prediction that ventral posterior regions are strongly recruited for both perceptual and conceptual matching, we examined the overlap in activation between conceptual matching versus fixation and perceptual matching versus fixation.
(2) To examine main effects of Task type (conceptual vs. perceptual matching), we contrasted the perceptual matching task with the conceptual matching task and vice versa.
(3) To test the prediction that brain activation in ventral posterior areas would increase with increased SS during perceptual matching but decrease with increased SS during conceptual matching, we looked for interactions between task type and SS level. Because opposing effects of SS on conceptual and perceptual matching should be greatest at the most extreme ends of the similarity dimension, that is, at SS Levels 1 and 3, we set the weights for SS Level 2 to zero for each contrast when examining interaction effects. This should maximize our ability to detect interaction effects. Hence, interaction effects were modeled with the following contrast weights: [−1, 0, 1, 1, 0, −1] for perceptual matching SS Levels 1, 2, and 3 and conceptual matching SS Levels 1, 2, and 3, respectively. For completeness, we also looked for areas where activation increased during conceptual matching but decreased during perceptual matching ([1, 0, −1, −1, 0, 1]), although such interaction effects were not anticipated. In regions associated with interaction effects, we conducted post hoc trend analyses (IBM SPSS Statistics, Chicago, IL) examining the BOLD signal across all SS levels. For ROIs isolated from the interaction contrasts, percent signal change relative to fixation was extracted for each event type in each participant's first-level analysis (using FMRIB Software Library's Featquery tool). Percent signal change for 3 similarity levels × 2 matching types (conceptual and perceptual) for each participant and region was then submitted to repeated-measures ANOVAs. The motivation for this analysis was that previous evidence suggests that parametric manipulations are not necessarily linearly related to BOLD signal changes (Birn, Saad, & Bandettini, 2001). Such nonlinear effects have also been observed in manipulations of SS in prior studies (Liu et al., 2008; Joseph & Gathers, 2003). Consequently, although we expected that the voxelwise approach would be most sensitive in isolating differential activation between the two extreme ends of the similarity scale, the post hoc analysis would enable us to determine whether the similarity trends were linear or quadratic. In other words, the voxelwise approach did not model the intermediate similarity level, but the post hoc analysis examined the full effect of all three similarity levels.
(4) To test the alternative hypothesis that the ventral posterior cortex is involved in processing image-based similarity (i.e., that SS would exert the same effect in ventral posterior regions regardless of task type), we used a contrast that reflected both increasing similarity for perceptual matching and increasing similarity for conceptual matching ([−1, 0, 1, −1, 0, 1] for perceptual matching SS Levels 1, 2, and 3 and conceptual matching SS Levels 1, 2, and 3, respectively). If the ventral posterior cortex only processes image-based similarity, then this contrast will isolate activation in these regions, but the contrast described in (3) will not.
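For concreteness, the sketch below applies the contrast vectors from (3) and (4) to a hypothetical vector of EV estimates at a single voxel; the beta values are made up for illustration.

```python
# Illustrative application of the contrast weights to one voxel's EV
# estimates, ordered [perceptual SS1-SS3, conceptual SS1-SS3].
import numpy as np

beta = np.array([0.2, 0.5, 0.9, 1.0, 0.6, 0.3])  # hypothetical estimates

contrasts = {
    "task x SS interaction":  np.array([-1, 0, 1, 1, 0, -1]),  # contrast (3)
    "reverse interaction":    np.array([1, 0, -1, -1, 0, 1]),  # (3), not anticipated
    "image-based similarity": np.array([-1, 0, 1, -1, 0, 1]),  # contrast (4)
}
for name, c in contrasts.items():
    print(f"{name}: {c @ beta:+.2f}")  # contrast estimate entering group statistics
```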
To ensure that the contrasts described above (2–4) reflected activations rather than deactivations, we made the additional requirement that activation during the experimental conditions (perceptual and/or conceptual matching) should be significantly higher than the activation during fixation (p < .01, FDR corrected) by masking activation maps with maps of perceptual > fixation and/or with maps of conceptual > fixation depending on the specific contrast, using fslmaths.
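A minimal sketch of this masking step is given below; the image names are placeholders, but applying a binary mask with fslmaths -mas is standard FSL usage.

```python
# Sketch of restricting an interaction map to voxels surviving the
# task > fixation contrast; file names are hypothetical.
import subprocess

subprocess.run(
    ["fslmaths", "interaction_zstat",        # task x SS interaction map
     "-mas", "perceptual_gt_fixation_mask",  # keep voxels with perceptual > fixation
     "interaction_zstat_masked"],
    check=True,
)
```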
RESULTS
Behavioral Data
The analysis of errors revealed a main effect of Task (F(1, 21) = 5.3, p < .05) and a main effect of SS level (F(2, 20) = 15.4, p < .0001). These main effects were qualified by an interaction between task and SS level (F(2, 20) = 56.2, p < .0001). As trend analysis revealed a significant linear interaction (F = 108.9, p < .0001), simple trend tests were performed for perceptual and conceptual matching conditions separately. These analyses revealed significant linear trends across SS level for both matching types (F = 159, p < .0001, and F = 16.2, p < .001, for perceptual and conceptual matching, respectively). As seen in Figure 3A (left) and Table 1, the Task × SS level interaction reflects that the error rate increased as SS level increased for perceptual matching, whereas the error rate decreased as SS level increased for conceptual matching.
Table 1.

| | % Errors | RT (Log) |
| --- | --- | --- |
| **Conceptual matching** | | |
| Same responses, SS Level 1 | 21 (4) | 2.834 (.020) |
| Same responses, SS Level 2 | 16 (3) | 2.825 (.012) |
| Same responses, SS Level 3 | 12 (3) | 2.779 (.016) |
| Different responses across all levels | 26 (3) | 2.860 (.008) |
| **Perceptual matching** | | |
| Different responses, SS Level 1 | 10 (3) | 2.770 (.021) |
| Different responses, SS Level 2 | 18 (4) | 2.785 (.019) |
| Different responses, SS Level 3 | 31 (3) | 2.780 (.018) |
| Same responses across all levels | 15 (2) | 2.727 (.010) |
Within-subject 95% confidence intervals for error rates and RTs are given in parentheses.
The analysis of RT revealed a main effect of Task (F(1, 15) = 6.2, p < .05) and a main effect of SS level (F(2, 14) = 9.2, p < .001). These main effects were qualified by a Task × SS level interaction (F(2, 14) = 14.1, p < .0001). As trend analysis revealed a significant linear interaction (F = 23.9, p < .0001), simple trend tests were performed for perceptual and conceptual matching conditions separately. These analyses revealed a significant linear trend across SS level for conceptual (F = 25.8, p < .0001) but not for perceptual (F = 0.9, p = .34) matching. As seen in Figure 3B (left) and Table 1, the interaction reflects that RT decreased across conceptual matching conditions as SS level increased, whereas RTs were more constant across perceptual matching conditions and, in fact, did not differ significantly across SS level (F = 1.6, p = .22).
To test whether error rate or RT differed for conceptual matching different-response trials or for perceptual matching same-response trials as a function of SS level, trials from these conditions were subjected to four separate repeated-measures ANOVAs. As expected, none of these comparisons approached significance (all ps > .35), because the assignment of pairs in these conditions to similarity levels was arbitrary (see Figure 3A and B, right).
Imaging Data
We predicted that ventral posterior regions would be strongly recruited for both perceptual and conceptual matching. Figure 4A shows that, indeed, a large expanse of ventral posterior cortex was activated by both perceptual and conceptual matching, according to the fMRI voxelwise analysis. Of course, many other regions were also activated as would be expected from other task demands such as response selection and execution. We also predicted that conceptual matching would activate additional regions involved in semantic processing compared with perceptual matching. However, no areas were significantly more activated during conceptual than perceptual matching.
The primary hypothesis was that brain activation in ventral posterior areas would increase with increased SS during perceptual matching but decrease with increased SS during conceptual matching. This hypothesis was confirmed according to the voxelwise analysis using the contrast that represented the interaction of SS and task (i.e., [−1, 0, 1, 1, 0, −1] for perceptual matching SS Levels 1, 2, and 3 and conceptual matching SS Levels 1, 2, and 3, respectively), as shown in Table 2 and Figure 4B. These large and bilateral posterior and ventral activations were separated into fusiform, inferior, middle, and superior portions using regions in the automated anatomical labeling atlas (Tzourio-Mazoyer et al., 2002) as masks. In addition to these areas, the interaction was also associated with bilateral activation in the cuneus, bilateral activations in the parietal cortex (precuneus), and activation of the left paracingulate cortex. The opposite interaction (increasing fMRI signal as a function of increasing similarity for conceptual matching and decreasing fMRI signal as a function of increasing similarity for perceptual matching; i.e., the contrast [1, 0, −1, −1, 0, 1]) revealed no activation.
Table 2.

| Region^a | BA | Coordinates (x, y, z) | Cluster Volume^b | F Value^c | Perceptual Matching (Trend)^d | Conceptual Matching (Trend)^d |
| --- | --- | --- | --- | --- | --- | --- |
| **L. inferior occipital** | | | | | | |
| L. middle occipital | 18 | −30, −90, 5 | 773 | 8.4*** | 6.5** | 3.3* |
| L. fusiform | 37 | −36, −60, −16 | 651 | 6.4** | 3.9^e,* | 8.1*** |
| L. inferior occipital | 19 | −39, −76, −9 | 480 | 5.9** | 1.4 | 5.8** |
| L. superior occipital | 18 | −16, −99, 10 | 13 | 12.9*** | 9.3*** | 5.1** |
| **R. inferior occipital** | | | | | | |
| R. fusiform | 37 | 37, −58, −17 | 581 | 7.9** | 0.74 | 13.0*** |
| R. inferior occipital | 19 | 40, −77, −9 | 461 | 5.6** | 4.7^e,*** | 6.4** |
| R. middle occipital | 18 | 33, −87, 6 | 327 | 5.2** | 3.3^e,* | 6.0** |
| R. superior occipital | 18 | 23, −93, 9 | 62 | 6.2** | 3.2^e,* | 8.5*** |
| R. precuneus | 7 | 30, −55, 44 | 87 | 2.22 | 0.62 | 2.1 |
| L. precuneus | 7 | −27, −56, 45 | 83 | 7.5** | 3.1* | 4.3* |
| R. cuneus | 17 | 3, −76, 13 | 53 | 10.5*** | 2.6 | 9.3*** |
| L. cuneus | 17 | −10, −73, 10 | 45 | 5.1** | 3.1* | 2.6 |
| L. paracingulate | 32 | −1, 9, 48 | 44 | 1.8 | 0.008 | 1.8 |
Threshold was set at p < .01, FDR corrected. Also shown are the results from post hoc multivariate planned comparisons performed on percent signal change in the areas showing significant interactions in BOLD signal.
The degrees of freedom for all F values are (1, 21).
L = left; R = right.
^a Regions written in boldface designate the main peak activation within an area, and regions written in roman designate peaks within the region when separated into subregions.

^b Number of voxels comprising the region.

^c F value associated with post hoc ANOVAs examining linear trend interactions (Task type × Structural similarity level).

^d F value associated with post hoc multivariate planned comparisons examining simple linear trends across the three structural similarity levels for perceptual and conceptual matching, respectively.

^e The simple trend is quadratic.
*p < .10.
**p < .05.
***p < .01.
Because the voxel-level analysis revealed an interaction of SS and task but did not determine whether the SS trend was significant for perceptual matching, conceptual matching, or both, post hoc trend analyses were conducted in the regions in Table 2. Trend analyses were conducted in the context of the 3 (Similarity levels) × 2 (Matching tasks: conceptual and perceptual) repeated-measures ANOVAs. These analyses were based on percent signal change for each SS level relative to fixation, for each of the tasks. Most regions showed significant linear interactions, as expected, given that the voxel-level interaction contrast was used to isolate the regions. Two exceptions were the left paracingulate cortex and the right precuneus. Most importantly, however, in 8 of the 13 regions, activation showed a significant linear decrease as SS increased during conceptual matching. During perceptual matching, activation generally increased as SS increased. This effect was significantly linear for the left middle and left superior occipital gyri, significantly quadratic for the right inferior occipital cortex, and marginally quadratic in the left fusiform, right middle occipital, and right superior occipital gyri (see Table 2 and Figure 5).
The alternative hypothesis was that the ventral posterior cortex is involved in processing image-based similarity leading to the prediction that SS would exert the same effect in ventral posterior regions regardless of task type. This contrast, however, did not reveal any significant activation according to the voxel-level analysis, and the hypothesis is thus rejected.
Finally, we examined whether the activations revealed by the interaction of SS and task could be explained by task difficulty apart from the SS and task manipulations. In other words, is fMRI signal in the regions that showed similarity modulation driven by overall accuracy or RT? To address this, in each region that showed a significant linear or quadratic trend as a function of SS for either the perceptual or conceptual matching task, the average fMRI signal collapsed over SS level was correlated with average RT or error rate collapsed over SS level. None of these correlations were significant. Therefore, none of the regions that showed significant modulation by SS according to the trend analysis had a greater fMRI signal associated with longer RT or higher errors, apart from the task manipulation itself.
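This difficulty check can be sketched as follows; the data layout is an assumption for illustration.

```python
# Per region with an SS trend, correlate participants' mean signal
# (collapsed over SS level) with their mean RT or error rate.
from scipy.stats import pearsonr

def difficulty_correlations(signal_by_region, behavior):
    """signal_by_region: region -> (n_participants,) mean % signal change;
    behavior: (n_participants,) mean RT or error rate, collapsed over SS."""
    return {region: pearsonr(signal, behavior)  # (r, p) per region
            for region, signal in signal_by_region.items()}
```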
DISCUSSION
Although the main effects found in this study should be interpreted cautiously, as they were qualified by interactions, we note that RTs were generally longer for conceptual compared with perceptual matching trials, as predicted. This supports the assumption that the two matching conditions tap into only partially overlapping cognitive operations; both matching conditions require access to VLTM representations, but unlike perceptual matching, conceptual matching also necessitates access to semantic memory representations, which may cause RTs to be somewhat longer on conceptual than perceptual matching trials.
The interpretation offered here for the behavioral difference between perceptual and conceptual matching conditions is entirely compatible with the finding that SS exerted opposing effects on perceptual and conceptual matching. If we assume that structurally similar objects are located nearer each other in psychological space than are structurally dissimilar objects and that discriminability increases as a function of distance in psychological space, then structurally dissimilar objects are easier to tell apart than structurally similar objects (Nosofsky, 1986). On perceptual matching different-response trials, it is therefore relatively easy to decide that highly dissimilar images (e.g., banana and apple) must represent different objects because they map onto points far apart in psychological space. For perceptual matching different-response trials characterized by some degree of SS (SS Levels 2 and 3), some uncertainty regarding whether the images represent the same object may exist, as initial processing may yield activation of more closely located points in psychological space. This uncertainty can only be resolved by sampling more visual information, which will cause RT to increase. If sufficient information cannot be sampled, for example, because of short stimulus exposure duration (in the present experiment: 400 msec) or limited response time (in the present experiment: 2 sec), the consequence will be increased error rates. This interpretation is in accordance with the finding that RTs and error rates, as predicted, generally increased with increased SS level on perceptual matching trials. If high SS can act as a proxy to category membership, as we argue is the case, the interpretation offered above can also account for the finding that RTs and error rates decreased with increased SS level on conceptual matching trials.
As predicted, ventral posterior regions were strongly recruited for both perceptual and conceptual matching. On the basis of prior findings (Devereux et al., 2013; Kim et al., 2009; Liu et al., 2008; Joseph & Gathers, 2003) and the present finding that these regions were modulated by SS (discussed more below), we suggest that this activation reflects structural processing. We also found no areas that were more activated during perceptual matching than during conceptual matching across all SS levels. Although care should be exercised in concluding anything based on a null finding, we do note that this lack of effect is compatible with the assumption that perceptual matching draws on the same initial cognitive operations as does conceptual matching (access to VLTM representations). On the other hand, because conceptual matching, as opposed to perceptual matching, does require access to semantic knowledge in addition to VLTM representations, we did expect to find some areas (left inferolateral temporal cortex and/or left inferior frontal gyrus; see the Introduction) to be more activated during conceptual matching than during perceptual matching across all SS levels. This expectation was not borne out, as no regions were associated with higher activation during conceptual compared with perceptual matching. This is so despite the fact that conceptual matching generally took longer than perceptual matching, compatible with the assumption that conceptual matching requires an additional step of (semantic) processing compared with perceptual matching. Although we do not want to place much weight on this lack of effect—as it also constitutes a null finding—it may reflect that people cannot refrain from semantic processing during perceptual matching even though such processing is not required (Joseph, 1997; Joseph & Proffitt, 1996). In other words, semantic knowledge may have been accessed automatically following the operations necessary and sufficient for performing a perceptual match.
As opposed to the null findings reported above for the main effects of Task type, we found several areas exhibiting an interaction between task type and SS consistent with increased activation as a function of increased similarity for perceptual matching and decreased activation as a function of increased similarity for conceptual matching. As expected, most of these activations were located in ventral posterior brain regions (Brodmann's areas [BA] 17, 18, 19, and 37). However, we also found activations in more dorsal parts of the brain (the precuneus, BA 7) and the paracingulate cortex (BA 32), which were not anticipated.
Post hoc trend analysis revealed significant linear interactions between task type and SS level in all areas reported above except for the right precuneus and the left paracingulate cortex. Hence, the activations associated with this interaction generally reflected areas where activation decreased as a function of increasing SS across the three SS levels during conceptual matching but increased as a function of increasing SS across the three SS levels during perceptual matching. We note that the linear effects were not significant for all simple main effects, especially not for the perceptual matching conditions. However, in some cases, the simple effect of similarity for perceptual matching reflected quadratic trends that were also increasing, which is consistent with prior findings of similarity effects in some brain regions (Liu et al., 2008; Joseph & Gathers, 2003).
In terms of function, the ventral posterior areas are likely to mediate structural processing, that is, the buildup of visual representations and the matching of these with representations stored in VLTM (see Gerlach, 2009; Liu et al., 2008). Indeed, these areas have been found to exhibit a positive correlation between SS and degree of activation in tasks demanding perceptual differentiation (Liu et al., 2008; Joseph & Gathers, 2003). What we find in addition is that activation in these areas is also inversely affected by SS when objects are to be conceptually matched. The precuneus, in contrast, is not commonly associated with structural processing. It has, however, been implicated in both visuospatial processing and spatial attention (Cavanna & Trimble, 2006) and especially in visuospatial tasks that require shifts in attention between different object features (Nagahama et al., 1999) or attentional shifts to different levels of processing (global shape or details) of complex visual stimuli (Fink et al., 1997). These suggestions can rather easily account for our findings. When participants are to compare two visual stimuli that are only presented for a limited duration, these stimuli must be kept in visual STM (imagery) if a decision cannot be made before they disappear. Moreover, as the comparisons become harder during perceptual matching when SS increases and during conceptual matching when SS decreases, more information must be sampled to reach a judgment—a process that is likely to require attentional shifts from a global level (outline shape) to details or from some features to others. We note here that only the left and not the right precuneus exhibited a significant linear trend interaction. Whereas we can offer no explanation for this difference, it is interesting that Fink et al. (1997) found that only the left precuneus exhibited a significant positive correlation between regional CBF and the number of switches from one processing level to another (global/local).
The last area to be accounted for is the left paracingulate cortex (BA 32). According to the post hoc trend analysis, there were no significant linear or quadratic trends across SS levels during either perceptual or conceptual matching in this region. Moreover, this area is usually activated in tasks that demand a high level of attention or executive control (Niendam et al., 2012) and error monitoring (Bush, Luu, & Posner, 2000). Therefore, it seems likely that this region was recruited for more general aspects of attention and performance monitoring rather than being engaged specifically in processing structural VLTM representations.
We found no support for the alternative hypothesis that ventral posterior regions would simply be computing image-based similarity. Given that the exact same pairs of objects were used for perceptual and conceptual matching, this hypothesis was designed to isolate regions involved in comparing the image-based similarity of the two objects. However, no regions survived the contrast that predicted increasing activation for increasing similarity in both perceptual and conceptual matching. In other words, there was no evidence that image-based information was processed independently from the task at hand. Ventral posterior regions, instead, were apparently driven by processing the similarity of VLTM representations, which had a differential effect depending on whether the similarity facilitated the decision (as in conceptual matching) or interfered with the decision (as in perceptual matching). This suggests that the process of matching image-based representations is strongly influenced by task demands in a top–down manner.
Similarity among objects plays an important role in object classification because objects are often assigned category membership based on shared characteristics (Sloutsky, 2009; Nosofsky, 1986), and similarity in object structure seems especially important in this respect. Indeed, evidence suggests that SS can be used as a proxy for category assignment (Gerlach et al., 2000; Rosch, 1999) although categorization is ultimately a conceptual task. Hence, SS is beneficial for categorization in that objects that belong to categories with structurally similar objects are categorized faster than objects that belong to categories with structurally dissimilar objects (Gale, Laws, & Foley, 2006; Kiefer, 2001; Price & Humphreys, 1989). However, when objects need to be differentiated from similar objects, as is required during identification, SS is a disadvantage (Gerlach & Toft, 2011; Gerlach, 2009). Although prior studies suggest that similarity exerts opposing effects on categorization and identification, much of this evidence has been circumstantial. Instead of being the target of direct experimental manipulation, similarity has been invoked post hoc as an explanation for observed effects. This introduces the risk of circularity, where the degree of similarity among stimuli is inferred from the task effects, which in turn are explained with reference to underlying differences in similarity among the stimuli. Moreover, as similarity has typically not been under experimental control, it is also unclear which type of similarity may have been driving the effects. Is it similarity in terms of structure, semantics, or both? This is difficult to disentangle based on behavioral studies alone because objects that are similar in shape often tend to have similar functions (Randall et al., 2004).
In this study, we addressed both of these limitations by parametrically manipulating the degree of SS of the objects that were to be matched and by examining which areas were modulated by task type (perceptual or conceptual matching). On the basis of the evidence considered above, we predicted that objects with high SS would be categorized more efficiently than objects with low SS but that objects with high SS would be differentiated less efficiently than objects with low SS during perceptual matching. If such an interaction between task type and similarity level should indeed reflect structural rather than functional similarity, we would expect similar interaction effects in brain regions associated with structural processing. On the basis of prior studies implicating ventral processing stream regions as sites of structural rather than semantic processing (Ptak et al., 2014; Devereux et al., 2013; Gerlach, 2009; Kim et al., 2009; Liu et al., 2008), we predicted that interactions between task type and similarity level would be associated with ventral posterior brain regions. The results of the present experiment support both predictions. In terms of behavior, performance improved as a function of increased SS during categorization (conceptual matching) but deteriorated as a function of increased SS during perceptual differentiation (perceptual matching). This interaction was mirrored in the imaging data where activation in several ventral posterior areas (BAs 17, 18, 19, and 37) generally increased as a function of increasing SS during perceptual matching but decreased as a function of increasing SS during conceptual matching. Although some other areas also showed this interaction (cuneus, precuneus, and paracingulate cortex), the simple effects of similarity in these regions were, for the most part, not significant or only marginally significant (Figure 5).
We found no activation in areas usually associated with conceptual/semantic processing such as the anterior/lateral temporal lobes or the left inferior frontal gyrus, neither for the main effect of conceptual > perceptual matching nor for the interaction between match type and SS level. Although we do not want to place too much weight on these null findings, they are in stark contrast to the clear interaction effects found in ventral posterior brain regions. Given that these latter regions are associated with structural rather than conceptual/semantic processing, this suggests that it is indeed similarity in terms of structure that plays the dominant role in driving the behavioral effects we observed. What is also striking about the present results is that the same images were used for perceptual and conceptual matching tasks, yet similarity modulation of fMRI signal was in opposite directions for the two different tasks. Therefore, a processing account based solely on processing image-based similarity cannot explain the present findings.
In conclusion, the present findings provide strong support for the notion that SS among objects is a very significant factor underlying visual object processing performance and that it exerts opposing effects on classification depending on whether objects are to be perceptually differentiated or categorized. Although the negative impact of SS on perceptual matching concurs with previous findings, we are unaware of any prior studies demonstrating unequivocally that SS can impact positively on superordinate categorization. That it can suggests a tight coupling between SS and taxonomic structure. A similar finding has recently been reported by Dilkina and Lambon Ralph (2012), who examined feature lists distilled from four different domains: perceptual, functional, encyclopedic, and verbal. They found that, although these domains gave rise to different organizations of conceptual space, based on how different concepts (e.g., dog, tree, knife, boat) tend to cluster within the given domain in terms of shared features, clustering based on perceptual (mainly visual) features correlated highly with taxonomic structure (i.e., superordinate categories such as animal and vehicle) and much more so than did clustering based on features within the functional, encyclopedic, and verbal domains. Our finding concerning the role of SS in superordinate categorization takes this further by showing that the arguably static association between shared perceptual features and taxonomic structure found by Dilkina and Lambon Ralph (2012) is not only correlational but causal and dynamic in the sense that SS is used during "online" categorization of instances according to taxonomy (superordinate categories). Although we do not suggest that SS is all there is to (visually based) superordinate categorization, the present findings clearly suggest that SS is a major determinant in this process. Our findings thus challenge the standard view that the organization of superordinate categories is not driven by shared structural features (cf. Hills et al., 2009; Cutzu & Tarr, 1997).
Acknowledgments
C. G. would like to express his gratitude to F. F. Fakutsi for the stay at Nexø Neuroscience. This research was sponsored by the National Institutes of Health (R01 HD052724 and R01 MH063817).
Reprint requests should be sent to Jane E. Joseph, Department of Neurosciences, Medical University of South Carolina, 96 Jonathan Lucas St., CSB 325, MSC 606, Charleston, SC 29425-6160, or via e-mail: [email protected].