Abstract

Perceptual learning is the improvement in perceptual performance through training or exposure. Here, we used fMRI before and after extensive behavioral training to investigate the effects of perceptual learning on the recognition of objects under challenging viewing conditions. Objects belonged either to trained or untrained categories. Trained categories were further subdivided into trained and untrained exemplars and were coupled with high or low monetary rewards during training. After 3 days of training, object recognition was markedly improved. Although there was considerable transfer of learning to untrained exemplars within categories, an enhancing effect of reward reinforcement was specific to trained exemplars. fMRI showed that hippocampus responses to both trained and untrained exemplars of trained categories were enhanced by perceptual learning and correlated with the effect of reward reinforcement. Our results suggest a key role of the hippocampus in object recognition after perceptual learning.

INTRODUCTION

Intuitively, the recognition of objects around us seems like an easy task. Most objects have one or several characteristic features, we can use contextual cues, and we often have enough time to enable successful recognition. In many real-life situations, however, object recognition is complicated by impoverished sensory evidence, for example, through poor illumination, occlusion, or brief appearance. Yet, recognition performance in such challenging situations can be improved through training (Baeck & Op De Beeck, 2010; Furmanski & Engel, 2000; Grill-Spector, Kushnir, Hendler, & Malach, 2000). Although such perceptual learning and its neural underpinnings have been studied extensively for low-level visual features, such as contrast or orientation (see Fahle & Poggio, 2002, for a review), less is known about the mechanisms of learning in higher-level perceptual tasks, such as object recognition.

A hallmark of perceptual learning is the specificity of its effects for the features that have been learned, as reported for a number of low-level visual stimulus features (Fahle & Poggio, 2002). Previous work on object recognition learning showed improved performance specifically for those objects that were presented during training (Baeck & Op De Beeck, 2010; Furmanski & Engel, 2000; Grill-Spector et al., 2000). At the neural level, improved object recognition after training is associated with increased activation in object-selective lateral occipital complex (LOC; Grill-Spector et al., 2000).

From these findings, the important question arises whether perceptual learning of object recognition is confined to specific object exemplars or generalizes to other stimuli pertaining to the same category. Such a generalization at the category level would imply that learning is not solely based on specific low-level features of a particular object but involves a higher-level process that relies on the extraction of the features common to—and therefore constitutive of—a given category of objects.

Another important aspect of perceptual learning is the role of reward reinforcement, because of its implications for the underlying neural mechanisms, in particular with regard to an involvement of global neuromodulatory systems (e.g., the dopaminergic system; Roelfsema, van Ooyen, & Watanabe, 2010; Seitz & Dinse, 2007). At the behavioral level, it is evident that we need to be able to prioritize the learning of important stimuli in an environment with abundant sources of sensory information. One might expect, for example, that perceptual learning will be enhanced if the successful recognition of an object is associated with favorable consequences such as reward. In line with this notion, enhancing effects of reward on perceptual learning were reported for low-level orientation discrimination tasks (Baldassi & Simoncini, 2011; Seitz, Kim, & Watanabe, 2009), although it has remained unclear whether reward can also boost higher-level perceptual learning. We reasoned that, if reinforcement affected only early processing stages, the effect of reward should be limited to the particular object exemplars that are used for perceptual training, whereas a higher-level category-related mechanism should enhance performance also for untrained object exemplars belonging to the same category.

Here, we studied the behavioral and neural effects of category-related object recognition learning and their dependence on reward. Recognition of objects presented with backward masking and associated neural responses were examined with fMRI before and after extensive training. Object categories were coupled with either high or low reward during training. To assess the specificity of learning, the pretest and posttest additionally included a set of untrained exemplars of the same categories and an additional completely untrained object category.

As the aim of our study was to investigate the learning of object recognition under challenging viewing conditions rather than the learning of new categories or boundaries between them, we used object categories that were familiar to the participants beforehand. Objects were presented very briefly and with backward masking to reduce visibility (Baeck & Op De Beeck, 2010). In coping with those challenging viewing conditions, the brain may either optimize processing at the sensory stage, as reported in a previous study (Grill-Spector et al., 2000), or improve the interpretation of the sensory evidence by means of high-level mechanisms such as the integration of sensory priors (Seriès & Seitz, 2013) or perceptual completion (Pessoa & De Weerd, 2003).

We hypothesized that category-related learning should be reflected by a specific enhancement of posttraining recognition performance for trained object categories and, importantly, that it should extend to those exemplars of the trained categories that were not presented during training. Furthermore, we predicted enhanced learning for high-rewarded as opposed to low-rewarded trained categories and reasoned that whether the effects of reward generalized to untrained exemplars would be informative in regard to the processing level at which reward reinforcement takes effect. Specifically, if reinforcement affected only early processing stages (cf. Seitz et al., 2009), the effect of reward should be limited to trained exemplars, whereas a higher-level mechanism should enhance performance for the whole object category associated with reward. Finally, we hypothesized that category-related learning effects should lead to enhanced responses to trained object categories in the LOC (Grill-Spector et al., 2000) or in medial-temporal lobe structures, such as the perirhinal cortex and the hippocampus, that have been implicated in high-level perception (Aly, Ranganath, & Yonelinas, 2013; Mundy, Downing, Dwyer, Honey, & Graham, 2013; Lee, Yeung, & Barense, 2012; Graham, Barense, & Lee, 2010; Lee, Buckley, et al., 2005; Lee, Bussey, et al., 2005).

METHODS

Participants

Twenty healthy individuals participated in this fMRI experiment. Two participants were excluded from the analysis because of excessive head movement in the scanner (several shifts of more than 3 mm), leaving 18 valid participants (10 women, mean age ± SEM = 25.7 ± 1.1 years) for the subsequent behavioral and fMRI data analysis. In addition to a fixed allowance (€ 26), participants received monetary rewards proportional to the points they earned in the five test sessions (two fMRI sessions and three behavioral sessions). All participants gave their informed written consent, and the study was approved by the local ethics committee.

Task and Experimental Setup

Each participant came in on 5 successive days. Training took place during Days 2–4 in a darkened room, in which the participants sat in front of a 17-in. LCD monitor (LG Flatron L1750S, 60 Hz, at 1024 × 768; Englewood Cliffs, NJ) with a standard computer keyboard. The pretraining and posttraining sessions on Days 1 and 5 took place in the fMRI scanner with a beamer/projection screen setup (Sanyo PLC-XT21L, 60 Hz, at 1024 × 768; Moriguchi, Japan) and a response box.

In each trial of the object recognition task (Figure 1A), the participants were briefly presented with an object (17 msec), followed by either an ISI of 17 msec (learning condition) or an ISI of 0 or 33 msec (stimulating conditions) and a pattern mask (150 msec). The condition of interest was the learning condition with an intermediate ISI of 17 msec. The 0- and 33-msec conditions were included to stimulate learning in the 17-msec condition by providing participants with the occasional experience of higher visibility (33 msec) while concurrently challenging them at the threshold of awareness (0 msec). Indeed, a number of previous studies have demonstrated the benefit of training at high- and low-accuracy levels (Liu, Lu, & Dosher, 2012; Petrov, Dosher, & Lu, 2005, 2006). In addition, it has been shown that a mix of hard and easy trials is important to foster both specificity and generalizability of learning (Ahissar & Hochstein, 1997), which was of particular importance in our paradigm. In the framework of their reverse hierarchy model, Ahissar and Hochstein (2004) further conjecture that initial easy conditions (as in our first fMRI session) may lead to a “Eureka effect,” which in turn enables learning in more difficult conditions.
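The nominal durations in this trial description are consistent with whole refresh frames of the 60-Hz displays described above. The following sketch only illustrates that arithmetic. It assumes that every interval was realized as an integer number of frames, which is not stated explicitly in the text, and the helper name `frames_for` is hypothetical.

```python
REFRESH_HZ = 60                # refresh rate reported for both displays
FRAME_MS = 1000 / REFRESH_HZ   # duration of one frame, about 16.7 msec

def frames_for(duration_ms):
    """Nearest whole number of frames for a nominal duration (hypothetical helper)."""
    return round(duration_ms / FRAME_MS)

# Nominal durations from the trial description (msec)
durations = {
    "object": 17,
    "ISI short": 0,
    "ISI learning": 17,
    "ISI long": 33,
    "mask": 150,
}
frames = {name: frames_for(ms) for name, ms in durations.items()}
```

Under this assumption, the object and the learning-condition ISI each occupy a single frame, so the three ISI conditions differ in steps of one frame.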

Figure 1. 

Experimental procedure. (A) Schematic trial of the object recognition task. (B) Experimental design. For each participant, the category pairs were split into a trained pair with high reward, a trained pair with low reward, and an untrained pair. Trained categories were additionally split in half into a set of trained and a set of untrained exemplars. (C) The stimulus set consisted of six categories, each comprising 12 exemplars. Each category was assigned to one of three category pairs.

After a variable delay (2000 ± 1000 msec), in which a white fixation cross was shown, a response screen appeared, offering two response options left and right of the fixation cross (in randomized order). At the same time, the fixation cross turned green, indicating the beginning of the response phase. Participants had 1700 msec to make a choice. Different keys (buttons) were assigned to the options left and right of the fixation cross. After the response phase—and only during training—a feedback screen was shown for 1000 msec, indicating whether the participants responded correctly and how much reward they received (see below). During the intertrial interval (2000 ± 1000 msec), a white fixation cross was presented.

In total, there were three category pairs (shoe–fish, pot–bird, and hat–snail), with each category consisting of 12 exemplars. Participants always had to discriminate within a category pair; for example, when presented with a fish, the response screen offered the options shoe and fish (in counterbalanced order). The categories within a pair were carefully chosen to make the discrimination task challenging in the sense that the rough shape of the stimuli would not be sufficient to infer the category. Thus, by selecting categories with similar shapes and presenting them in random orientations, we forced the participants to recruit more fine-grained features of the stimuli.

For each participant, the category pairs were divided into two trained category pairs and one untrained category pair. Trained category pairs were presented both in the fMRI sessions and the training sessions, whereas the untrained category pair was only shown in the fMRI sessions. Across participants, we ensured counterbalancing of trained and untrained category pairs. The trained categories were further divided into six trained exemplars (presented in fMRI and training sessions) and six untrained exemplars (presented only in the fMRI sessions) of each category (see Figure 1B for a schematic overview).

Each fMRI and behavioral session consisted of eight runs. In each fMRI run, every exemplar of every category was presented once (adding up to 72 trials per run), whereas in runs of a behavioral session, each trained exemplar was presented twice (48 trials per run). Within each run, categories were counterbalanced across ISIs.
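The trial counts given above follow directly from the design. The short sketch below reproduces them as a consistency check. All constants are taken from the text, and the variable names are chosen here for illustration.

```python
N_CATEGORIES = 6              # three category pairs
EXEMPLARS_PER_CATEGORY = 12
N_TRAINED_CATEGORIES = 4      # two trained category pairs
TRAINED_EXEMPLARS = 6         # trained exemplars per trained category
REPETITIONS_TRAINING = 2      # each trained exemplar shown twice per behavioral run
RUNS_PER_SESSION = 8

# fMRI run: every exemplar of every category once
fmri_trials_per_run = N_CATEGORIES * EXEMPLARS_PER_CATEGORY

# Behavioral run: every trained exemplar twice
training_trials_per_run = (
    N_TRAINED_CATEGORIES * TRAINED_EXEMPLARS * REPETITIONS_TRAINING
)

fmri_trials_per_session = fmri_trials_per_run * RUNS_PER_SESSION
training_trials_per_session = training_trials_per_run * RUNS_PER_SESSION
```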

As a final manipulation, the trained category pairs were randomly assigned to high- and low-reward conditions. Participants received 5 points for correct responses to high-reward categories and 1 point in case of low-reward categories (0 points for incorrect responses). This assignment was stable for one participant throughout the experiment and counterbalanced across participants. Because participants received feedback only in the training phase, untrained categories were not associated with any reward. After the experiment, the sum of all collected points was converted to Euro cents (1:1 ratio, i.e., 1 point = 1 cent) and paid out to the participants.
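As an illustration of the point scheme, the sketch below converts a list of rewarded training trials into a payout. It covers only trials of the trained categories during training. How points were handled in the fMRI sessions is not modeled, and the function name `payout_cents` is hypothetical. The split of 24 high-reward and 24 low-reward trials per behavioral run follows from the design (two categories × six trained exemplars × two presentations per pair).

```python
POINTS_PER_CORRECT = {"high": 5, "low": 1}  # from the text; incorrect responses earn 0
CENTS_PER_POINT = 1                          # 1 point = 1 cent

def payout_cents(trials):
    """Cents earned for an iterable of (reward_condition, correct) pairs."""
    points = sum(POINTS_PER_CORRECT[cond] for cond, correct in trials if correct)
    return points * CENTS_PER_POINT

# Ceiling for one behavioral run: all 24 high-reward and 24 low-reward trials correct
perfect_run = [("high", True)] * 24 + [("low", True)] * 24
max_cents_per_run = payout_cents(perfect_run)
```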

Stimuli

The stimulus set (Figure 1C) consisted of 72 grayscale objects (450 × 450 pixels) subtending 10° × 10° of visual angle. We equalized the gray value histogram of all stimuli to match low-level properties as well as possible. The masks were generated on a trial basis by sampling a 450 × 450 pixel matrix of uniformly distributed values between 0 and 1, low-pass filtering with a Butterworth filter (cutoff frequency = 0.025 1/pixel), and thresholding at 0.5. To ensure equal physical luminance for stimulus presentation during fMRI, the parameters of all object and mask stimuli were adapted to the projection screen in the fMRI scanner based on luminance measurements. The timing of the stimulus presentation was inspected with a high-speed digital camera (1000 frames/s) and was confirmed as correct.
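The mask-generation steps described above can be sketched with NumPy as follows. This is a minimal illustration and not the code used in the study. The filter order and the random seed are assumptions (the text specifies only the image size, the cutoff frequency, and the threshold), and the filter is applied in the frequency domain as a radially symmetric Butterworth transfer function.

```python
import numpy as np

def make_mask(size=450, cutoff=0.025, order=2, seed=None):
    """Binary pattern mask from low-pass-filtered uniform noise.

    `order` and `seed` are assumptions; the source gives only the image
    size, the cutoff (cycles per pixel), and the 0.5 threshold.
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(0.0, 1.0, size=(size, size))

    # Radial spatial frequency (cycles per pixel) of every FFT coefficient
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    radius = np.sqrt(fx**2 + fy**2)

    # Butterworth low-pass transfer function; gain is 1 at zero frequency,
    # so the mean gray value of the noise (about 0.5) is preserved
    gain = 1.0 / (1.0 + (radius / cutoff) ** (2 * order))

    filtered = np.fft.ifft2(np.fft.fft2(noise) * gain).real
    return filtered > 0.5  # boolean image with two luminance levels

mask = make_mask(seed=0)
```

Because the filter leaves the mean of the noise unchanged, thresholding at 0.5 yields masks in which roughly half of the pixels fall on either side of the threshold.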

ROI Procedures

At the end of one of the fMRI sessions, we additionally performed a localizer run to map object-responsive regions of each participant (Malach et al., 1995). The localizer run consisted of 10 blocks of intact images and 10 blocks of scrambled images in a randomized order. Images were presented for 600 msec followed by a 200-msec blank screen. To maintain their attention, participants had to press a button whenever the same image appeared twice in a row.

We generated an ROI for the LOC (872 ± 4 voxels) based on the intersection of object-responsive voxels of the localizer (pFWE < .05) and an anatomical composite mask composed of the inferior and middle occipital gyrus, inferior temporal gyrus, and fusiform gyrus (derived from the Anatomical Automatic Labeling atlas; Tzourio-Mazoyer et al., 2002).

fMRI Data Acquisition and Preprocessing

Functional MRI data were acquired on a 3-T Siemens Trio (Siemens, Erlangen, Germany) scanner using a gradient EPI sequence and a 12-channel head coil. The experiment was composed of two fMRI sessions with eight runs of the main experiment on Days 1 and 5. In each run of the main experiment, 214 whole-brain volumes were acquired (repetition time = 2 sec, echo time = 25 msec, flip angle = 78°, 33 slices, descending acquisition, resolution = 3 mm isotropic, interslice gap = 0.75 mm). In addition, a functional LOC localizer run (233 volumes) was performed. In both sessions, a high-resolution T1-weighted magnetization prepared rapid gradient echo image was acquired as an anatomical reference (repetition time = 1.9 sec, echo time = 2.51 msec, flip angle = 9°, 192 slices, resolution = 1 mm isotropic) as well as a standard field map (Hutton et al., 2002). Preprocessing was performed by using SPM8 (www.fil.ion.ucl.ac.uk/spm) and included realignment, field map correction, smoothing (8-mm FWHM), and spatial normalization to a standard Montreal Neurological Institute template.

Statistical Inference in the fMRI Data Analysis

Statistical analysis of the fMRI data comprised two steps. At the first level, a general linear model was estimated for both fMRI sessions of each participant. The general linear models contained a regressor for the onset of the response screen and six motion regressors from the realignment analysis. Regressors for the experimental conditions modeled the effects of training (trained exemplars of trained categories, untrained exemplars of trained categories, and untrained categories), ISI (0, 17, and 33 msec), and reward (high reward and low reward). Although all three ISIs were modeled to account for associated variance in the data, only the learning condition (17 msec), as our condition of interest, was considered for all further statistical analyses. Regressors for the experimental conditions as well as one regressor for motor responses were modeled as stick functions convolved with a canonical hemodynamic response function. Contrast maps from each participant were submitted to a second-level random effects analysis, using a flexible factorial ANOVA with repeated measures. We considered effects statistically significant if FWE-corrected p values passed a significance threshold of <.05 at the cluster level (denoted as pcFWE; using a cluster-defining threshold of p < .001). If the search space was restricted to a small volume, we applied a threshold of p < .05 at the voxel level to FWE-corrected p values (denoted as pSVC). Where appropriate, we reported effects with uncorrected p values (denoted as punc) for responses in the homologous contralateral area.

Category-specific Training Effect

Estimated beta maps for trained categories (both trained and untrained exemplars; TC) and untrained categories (UC) of the learning condition were submitted to a repeated-measures group-level ANOVA with factors Training and Session (pretraining/posttraining). To obtain the category-specific training effect, we computed the interaction of Training and Session as follows: (TC_{post} − TC_{pre}) − (UC_{post} − UC_{pre}).
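The interaction contrast above is a difference of differences and can equivalently be written as a weight vector over the four cells of the Training × Session design. The sketch below shows both forms on made-up values. The cell order and the function names are choices made here for illustration and do not reflect the actual SPM design matrix.

```python
def interaction(tc_pre, tc_post, uc_pre, uc_post):
    """Training x Session interaction: (TC post - TC pre) - (UC post - UC pre)."""
    return (tc_post - tc_pre) - (uc_post - uc_pre)

# Equivalent contrast weights for the cell order [TC pre, TC post, UC pre, UC post]
WEIGHTS = [-1, 1, 1, -1]

def weighted_contrast(cells, weights=WEIGHTS):
    """Apply contrast weights to a list of cell means."""
    return sum(w * c for w, c in zip(weights, cells))

# Made-up beta values, for illustration only
cells = [0.25, 0.75, 0.5, 0.5]
effect_a = interaction(*cells)
effect_b = weighted_contrast(cells)
```

A positive value indicates a larger pre-to-post increase for trained than for untrained categories. The same weights apply when the trained-category cells are replaced by a subset of them, such as untrained exemplars of trained categories.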

Within-category Transfer

Estimated beta maps for untrained exemplars of trained categories (UE) and untrained categories (UC) of the learning condition were submitted to a repeated-measures group-level ANOVA with factors Training and Session (pretraining/posttraining). The within-category transfer was computed analogously to the category-specific training effect: (UE_{post} − UE_{pre}) − (UC_{post} − UC_{pre}).

Reward Effect

Because of the exemplar specificity of reward reinforcement at the behavioral level, we restricted the analysis at the group level to trained exemplars of trained categories (TE). Two different analyses were performed. In the first analysis, using repeated-measures ANOVA with factors Session (pretraining/posttraining) and Reward (low reward and high reward), we tested for an interaction of reward and session.

In a second post hoc analysis, we tested for a correlation of the behavioral and neural reward effects. At the neural level, we computed first-level contrasts of the reward effect as follows: (TE_{high,post} − TE_{high,pre}) − (TE_{low,post} − TE_{low,pre}). Analogously, we computed the behavioral reward effect for each participant as the performance improvement for high-rewarded trained exemplars minus the improvement for low-rewarded trained exemplars. This allowed us to correlate behavioral reward effects with neural reward effects. Because we were interested in modulatory effects of reward reinforcement on top of the training effects, we restricted this analysis to regions that showed a category-specific training effect. To this end, we generated post hoc spherical ROIs with 12-mm radius, centered at the peak voxels of the category-specific training effect in the bilateral inferior hippocampus.
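To give a sense of the size of such an ROI, the sketch below enumerates the voxels whose centers lie within 12 mm of a peak voxel. It assumes a 3-mm isotropic voxel grid, matching the acquisition resolution, and counts voxels on the boundary as inside. The actual ROI construction in the analysis software may differ in these details.

```python
from itertools import product

VOXEL_MM = 3.0    # assumed isotropic voxel size (acquisition resolution)
RADIUS_MM = 12.0  # ROI radius from the text

def sphere_offsets(radius_mm=RADIUS_MM, voxel_mm=VOXEL_MM):
    """Voxel offsets (i, j, k) whose centers lie within radius_mm of the peak."""
    reach = int(radius_mm // voxel_mm)
    span = range(-reach, reach + 1)
    return [
        (i, j, k)
        for i, j, k in product(span, repeat=3)
        if (i * i + j * j + k * k) * voxel_mm**2 <= radius_mm**2
    ]

offsets = sphere_offsets()
n_voxels = len(offsets)
```

Under these assumptions, each sphere contains 257 voxels, which defines the search volume for the correlation analysis in each hemisphere.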

RESULTS

Category-specific Perceptual Learning

Figure 2A shows the recognition performance for trained exemplars over the course of five sessions. A two-way ANOVA with repeated measures on the proportion of correct responses revealed a significant main effect of Session (F(4, 68) = 32.8, p < .001) and ISI (F(2, 34) = 166.5, p < .001) and a significant interaction (F(8, 136) = 5.9, p < .001). We found similar effects for response times (Figure 2B), which showed a main effect of Session (F(4, 68) = 3.8, p = .008), a main effect of ISI (F(2, 34) = 33.9, p < .001), and a Session × ISI interaction (F(8, 136) = 2.1, p = .043).

Figure 2. 

Perceptual learning across five sessions. Error bars represent SEM. (A) Learning curves for trained exemplars based on proportion of correct responses (referred to as performance). (B) RTs for trained exemplars.

To assess whether improvements in object recognition were category specific, we compared the pretraining and posttraining performance of trained and untrained categories. Performance improvements were significantly higher for trained categories, as evidenced by a Training (trained categories and untrained categories) × Session (pre, post) interaction for recognition performance (F(1, 17) = 7.7, p = .013). This demonstrates that the improvements were, to a large extent, specific to trained categories, henceforth referred to as the category-specific training effect. This effect was also present for RTs (F(1, 17) = 18.2, p < .001). However, the training effects were dependent on the ISI, as indicated by a three-way Session × Training × ISI interaction for both recognition performance (F(2, 34) = 4.1, p = .024) and RT (F(2, 34) = 4.1, p = .025). This was due to our design, in which we intended to stimulate perceptual learning by including both an easy condition (33 msec) and a difficult condition (0 msec; Liu et al., 2012). Category-specific training effects in those conditions were absent or only marginal, both in terms of recognition performance (0 msec: F(1, 17) = 0.006, p = .94; 33 msec: F(1, 17) = 3.5, p = .079) and RT (0 msec: F(1, 17) = 1.1, p = .31; 33 msec: F(1, 17) = 3.1, p = .094). We therefore confined all further analyses of neural and behavioral learning effects to the 17-msec condition. As expected, the 17-msec ISI showed a robust category-specific training effect for recognition performance (F(1, 17) = 13.3, p = .002; Figure 3A) and RT (F(1, 17) = 19.3, p < .001). In direct comparison with the other two ISIs, the 17-msec condition showed a stronger category-specific training effect both in terms of recognition performance (0 msec: F(1, 17) = 9.7, p = .006; 33 msec: F(1, 17) = 3.7, p = .070) and RT (0 msec: F(1, 17) = 6.7, p = .019; 33 msec: F(1, 17) = 4.9, p = .041).

Figure 3. 

Specificity of perceptual learning in the 17-msec condition. Performance improvements were computed as proportion of correct responses posttraining minus pretraining. Error bars represent SEM. (A) Trained and untrained categories. (B) Untrained exemplars of trained categories and untrained categories. (C) Trained and untrained exemplars of trained categories.

Within-category Transfer

To determine whether the category-specific training effect in the 17-msec condition was limited to trained exemplars or whether it generalized to untrained exemplars of the same category, we directly compared the improvements for untrained exemplars and untrained categories. As shown in Figure 3B, there was an advantage for untrained exemplars of trained categories relative to untrained categories (t(17) = 2.39, p = .029). Thus, perceptual learning was not limited to the particular object exemplars that were shown during training, indicating a transfer of learning to untrained exemplars within a category and, thus, a true category-specific (rather than exemplar-specific) training effect. Hereinafter, we refer to this effect as within-category transfer.

Beyond the within-category transfer, there was an additional advantage for trained compared with untrained exemplars (t(17) = 3.07, p = .007; Figure 3C), indicating a component of exemplar-specific perceptual learning.
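Both comparisons above are within-participant comparisons of improvement scores, for which a paired t test with n − 1 degrees of freedom (17 for 18 participants) is the natural choice. The sketch below shows the computation on made-up data. It assumes a standard paired t test, as the degrees of freedom suggest, and the function name `paired_t` is hypothetical.

```python
import math

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1

# Made-up improvement scores (posttraining minus pretraining) for four participants
untrained_exemplars = [0.10, 0.12, 0.08, 0.15]
untrained_categories = [0.05, 0.06, 0.07, 0.06]
t_value, df = paired_t(untrained_exemplars, untrained_categories)
```

A positive t value here corresponds to a larger improvement for untrained exemplars of trained categories than for untrained categories.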

Taken together, we found evidence for both category- and exemplar-specific effects of perceptual learning.

The Effect of Reward on Perceptual Learning

During training, one of the two trained category pairs was associated with a high monetary reward (€ 0.05) upon successful object discrimination, whereas the other category pair was associated with a low monetary reward (€ 0.01). Figures 4A and 4B show the improvement for trained and untrained exemplars depending on the reward association. In a two-way repeated-measures ANOVA with factors Training (trained exemplars and untrained exemplars) and Reward (high reward and low reward), the main effect of Reward was not significant (F(1, 17) = 2.96, p = .10). However, there was a significant main effect of Training (F(1, 17) = 10.1, p = .006) and a significant Training × Reward interaction (F(1, 17) = 4.75, p = .044). A post hoc t test revealed a Reward effect specifically for trained exemplars (t(17) = 2.78, p = .013) but not for untrained exemplars (t(17) = −0.15, p = .883; see Figure 4A). Thus, only those exemplars of trained categories that were seen during the reinforcement phase (Sessions 2–4) showed an effect of reward reinforcement.

Figure 4. 

Effect of reward reinforcement on perceptual learning in the 17-msec condition. Error bars represent SEM. (A) Trained exemplars associated with low or high reward. (B) Untrained exemplars associated with low or high reward.

A Signature of Category-related Perceptual Learning in the Hippocampus

The object stimuli, although presented at the threshold of visibility, reliably engaged LOC (pcFWE < .001, t(51) = 13.27; Figure 5), which is known to play a key role in object processing and object recognition (Grill-Spector, Kourtzi, & Kanwisher, 2001; Malach et al., 1995).

Figure 5. 

Stimulus-related brain activity in the object recognition task. The presentation of objects in the 17-msec condition reliably engaged the LOC. The depicted t maps represent the average response to the object stimuli across all conditions (against the implicit baseline) and are thresholded at punc. < .001.

To assess neural correlates of category-specific training over and above unspecific learning effects, we examined the interaction of training and session. We found a significant effect of category-specific training in the inferior hippocampus (peak in the right hippocampus at [22, −28, −12], t(51) = 5.17, pcFWE = .008; Figure 6A). At a more lenient threshold of punc. < .001, the left inferior hippocampus also showed an isolated cluster of activation (peak at [−26, −32, −12], t(51) = 3.84, punc. = .0001). When directly contrasting trained and untrained categories (trained > untrained) in the posttraining session, we likewise found significant effects in the right hippocampus (t(17) = 6.20, pcFWE = .005, peak at [28, −28, −12]) and, at a more lenient threshold, left hippocampus (t(17) = 4.17, punc. < .0001, peak at [−26, −32, −12]), both in proximity to the peaks of the category-specific training effect. Thus, our results cannot be attributed to baseline differences.

Figure 6. 

Neural correlates of perceptual training. Whole-brain t maps are thresholded at p < .005, uncorrected, for display. (A) Category-specific training effect. Voxels in the bilateral hippocampus showed an increase in activation for trained categories relative to untrained categories. (B) Session difference of hippocampal responses (peak voxels) for trained and untrained categories. Error bars represent SEM. (C) Correlation of the behavioral category-specific training effect (performance improvement for trained categories minus performance improvement of untrained categories) and the category-specific training effect at the neural level. (D) Within-category transfer. Bilateral hippocampal activation increased also for the subset of untrained exemplars (relative to untrained categories). (E) Neural correlate of reward reinforcement. T maps depict voxels that showed a significant correlation of the neural and behavioral reward effects within a spherical ROI (green circles) around the bilateral hippocampal peak voxels of the category-specific training effect.

These bilateral clusters of activation showed considerable overlap with a specific subfield of the hippocampus, the subiculum. At a liberal threshold of p < .01, most active voxels in both the left and right hippocampus were contained in an anatomical map of the subiculum (Eickhoff et al., 2005; left: 54% overlap; right: 59% overlap). When we considered only the most significant voxels (punc. < 10^−6), all 26 surviving voxels belonged to the map of the subiculum.

No other significant category-specific neural effects of training were observed, neither in the functionally localized LOC nor in any other brain region outside the hippocampus. To quantify changes for trained and untrained categories separately and to correlate individual category-specific training effects with behavior, we extracted the contrast estimates at the group-level peak voxels in the left and right hippocampus. Figure 6B shows that the interaction effect in both the left and right hippocampus was largely driven by an increase of activity for trained categories, whereas the response to untrained categories remained unchanged.

To assess the functional relevance of hippocampal activation for successful perceptual learning, we probed the association between the category-specific training effect in the hippocampus and the behavioral training effect. We found a significant correlation in the left hippocampus (rPearson = .51, p = .029). The correlation in the right hippocampus, although in the same direction, was not significant (rPearson = .34, p = .167; Figure 6C).
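For readers who wish to relate the reported correlation coefficients to a test statistic, a Pearson correlation across n participants can be converted to a t value with n − 2 degrees of freedom. The sketch below applies this standard conversion to the rounded coefficients reported above, assuming all 18 participants entered the correlation. It is an illustration and not the authors' analysis code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic with n - 2 degrees of freedom for a correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

N_PARTICIPANTS = 18                       # assumed sample size for the correlation
t_left = r_to_t(0.51, N_PARTICIPANTS)     # left hippocampus, reported r = .51
t_right = r_to_t(0.34, N_PARTICIPANTS)    # right hippocampus, reported r = .34
```

With 16 degrees of freedom, these values are approximately 2.37 and 1.45, in line with the significant and nonsignificant results reported for the left and right hippocampus, respectively.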

As in the behavioral analysis, we next asked whether training effects were limited to those exemplars that were shown during training or whether the effects generalized to untrained exemplars of the same categories. Indeed, when only considering the subset of untrained exemplars, we again found a correlate of category-specific learning in the same location in the inferior hippocampus (peak at [26, −26, −14], t(51) = 4.91, pcFWE = .007; Figure 6D). At a more liberal threshold of punc. < .001, we also found a cluster in the left inferior hippocampus (peak at [−26, −32, −12], t(51) = 3.44, punc. = .0005). Accordingly, a separate analysis directly comparing the training effect for trained and untrained exemplars within the trained category did not yield any significant activation in the hippocampus even at a liberal threshold of punc. < .01. Thus, also at the neural level, we found evidence for a within-category transfer of training-related changes.

Neural Correlates of Reward-related Perceptual Learning

Given the exemplar specificity of the reward effect at the behavioral level, we constrained the analyses on effects of reward reinforcement to trained exemplars. A whole-brain analysis did not show a significant interaction of reward reinforcement and session, that is, no brain region showed a significant BOLD increase for high-rewarded relative to low-rewarded exemplars. However, as the individual reward effects varied quite substantially at the behavioral level, we went on to investigate whether there were more subtle effects in terms of a correlation between the participants' behavioral and neural reward effects. We focused this post hoc analysis on the hippocampus, where we had found a strong category-specific training effect. To protect against type I error, we performed small-volume correction using a sphere four voxels (12 mm) in radius centered at the bilateral hippocampal peaks of the category-specific training effect. This yielded significant effects both in the left (peak at [−26, −20, −12], t(16) = 4.35, pSVC = .026) and right (peak at [26, −24, −18], t(16) = 4.6, pSVC = .018; Figure 6E) hippocampus. Thus, neural activity in the inferior hippocampus was modulated by monetary reward reinforcement during training.
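To make the size of the search volume concrete, the following sketch enumerates the voxel centers of a spherical ROI. It assumes 3-mm isotropic voxels (implied by "four voxels (12 mm)") and a grid aligned with the sphere center. The function name `sphere_roi` is hypothetical, and the center used below is the right hippocampal peak reported for the untrained-exemplar analysis, chosen for illustration only.

```python
def sphere_roi(center_mm, radius_mm, voxel_mm):
    """Return mm coordinates of voxel centers within radius_mm of center_mm.

    Assumes isotropic voxels on a grid aligned with the center.
    """
    steps = int(radius_mm // voxel_mm)
    cx, cy, cz = center_mm
    coords = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                # Keep voxels whose center lies inside or on the sphere.
                if (i * i + j * j + k * k) * voxel_mm ** 2 <= radius_mm ** 2:
                    coords.append(
                        (cx + i * voxel_mm, cy + j * voxel_mm, cz + k * voxel_mm)
                    )
    return coords


# Illustrative center; 3-mm voxels are an assumption (see lead-in).
roi_right = sphere_roi((26, -26, -14), radius_mm=12, voxel_mm=3)
print(len(roi_right), "voxels in the spherical ROI")
```

Restricting the multiple-comparison correction to a volume of this size is what makes the test more sensitive than a whole-brain correction, at the cost of only supporting inferences inside the sphere.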

DISCUSSION

We examined perceptual learning of object recognition under challenging perceptual conditions both at the behavioral and neural levels. Our design allowed us to isolate learning-related effects for specific object categories, to dissociate these effects from those related to particular object exemplars, and to assess the effects of reward on category-related perceptual learning.

Behaviorally, we found a marked improvement in object recognition performance over the course of 5 days. Crucially, this improvement was not limited to trained object exemplars but generalized to exemplars within the same object category that had not been presented during training. Thus, the training effect was not only due to the learning of specific low-level features of the particular objects shown during training but also reflected improved recognition based on category-related visual information. Furthermore, recognition performance was enhanced for stimuli associated with high reward during training, relative to stimuli linked to low reward. However, this effect of reward reinforcement did not generalize to those exemplars of the rewarded category that were not shown during training. A particular advantage of our stimulus presentation and task setup is that we can most likely exclude attention as an explanation for the observed reward effect. Participants did not know whether the upcoming stimulus would belong to a low-rewarded, high-rewarded, or unrewarded category. Furthermore, the stimulus presentation itself was extremely brief (17 msec), making it unlikely that attention or arousal processes influenced object recognition.

Analysis of fMRI data acquired before and after training revealed an increase of bilateral inferior hippocampus activation in response to trained categories, compared with untrained categories. Importantly, the inferior hippocampus showed an equally strong effect when the analysis was restricted to untrained exemplars, in analogy to the within-category transfer observed at the behavioral level.

Within-category Transfer of Perceptual Learning and Exemplar Specificity of Reward Reinforcement

Our finding of a category-specific training effect on recognition performance is in line with previous studies reporting enhanced recognition for trained compared with novel objects (Baeck & Op De Beeck, 2010; Furmanski & Engel, 2000; Grill-Spector et al., 2000). Our behavioral findings go beyond previous work by establishing within-category transfer for perceptual learning of object recognition, that is, generalization of the training-induced improvement in object recognition to untrained exemplars of the trained category. Of note, there was still an additional advantage for trained relative to untrained exemplars, suggesting that perceptual learning of object recognition is based not only on category-related features but also on specific features of the trained object exemplars. Transfer of improved recognition performance to untrained exemplars within a trained category has been reported previously in expertise paradigms (Scott, Tanaka, Sheinberg, & Curran, 2006; Tanaka, Curran, & Sheinberg, 2005). Importantly, our research question differed fundamentally from that of studies investigating visual expertise: we were interested in the recognition of known categories under conditions of reduced visibility, whereas the abovementioned expertise studies focused on learning of unfamiliar categories. Moreover, our objects were presented only very briefly under conditions of backward masking, whereas expertise tasks typically involve longer and unobstructed stimulus presentations. Thus, the challenge in our task was to quickly extract informative category-related features under challenging viewing conditions rather than to learn new category boundaries.

In contrast to the observed transfer of category-related learning to untrained exemplars, the additional impact of reward did not generalize to untrained exemplars. This suggests that the influence of reinforcement on perceptual learning is tightly linked to exemplar-specific features that involve early stages of visual processing rather than category-defining features that require higher-level processing. This is in line with earlier work showing that reward-related perceptual learning effects were limited to the eye to which the stimuli were presented, a hallmark of early visual processing (Seitz et al., 2009).

Neural Correlates of Perceptual Learning and Reward Reinforcement in the Hippocampus

The enhancement of activation in the inferior hippocampus in response to trained categories showed, in analogy to our observations at the behavioral level, within-category transfer to untrained exemplars. This suggests that the hippocampus is involved in high-level perceptual learning that is based on complex stimulus features constitutive of object categories. The effect maps to the subiculum, a subfield of the hippocampus that seems to play a role in the retrieval of memories (Eldridge, Engel, Zeineh, Bookheimer, & Knowlton, 2005; Gabrieli, 1997). Recently, subicular activation has been proposed to reflect a “match-enhancement signal” (Dudukovic, Preston, Archie, Glover, & Wagner, 2011; Duncan, Curtis, & Davachi, 2009) caused by a neuronal firing rate increase after presentation of a target that matches an actively retained sample (Otto & Eichenbaum, 1992). It should be noted, however, that the spatial resolution of our fMRI protocol did not allow us to isolate activity to the subiculum exclusively. Other subfields, in particular the neighboring dentate gyrus and CA1, might also be involved.

The functional impact of the match-enhancement signal has been interpreted either in terms of an attentional gain enhancement when a stimulus matches an internal representation (Muzzio, Kentros, & Kandel, 2009) or in terms of “pattern completion” whereby partial cues reinstate information that was present during encoding (Dudukovic et al., 2011).

In the light of these previous findings, the training-related hippocampal activation observed in our study is likely to play a role in perceptual decision-making by means of the aforementioned match-enhancement signal. Applied to our object recognition paradigm, a match-enhancement signal could be generated when the current sensory information about an object matches the learned template of the trained object category. Improved object recognition would then be directly linked to an increased match-enhancement signal, which might serve to generate a more complete percept from the impoverished sensory information through pattern completion (Dudukovic et al., 2011).

Alternatively, hippocampal activation could be a correlate of recognition memory or familiarity. Participants were more frequently exposed to trained categories than to untrained categories (which were only shown in the pretraining and posttraining fMRI sessions). This could have led to a stronger memory imprint of trained categories, in line with previous studies linking the subiculum to the retrieval of memories (Eldridge et al., 2005; Gabrieli, 1997). The observed within-category transfer would then suggest that those stored memories consist of more general, category-related features rather than specific representations of individual exemplars. However, if the hippocampal activation were based on recognition memory or familiarity, one would expect a stronger category-specific training effect in the 33-msec condition because of higher visibility. In fact, as we show in supplementary Figure S2, the effect in the 33-msec condition, although stronger than the 0-msec effect, is significantly weaker than the effect in the 17-msec condition. This pattern of results in the hippocampus is in close and statistically highly significant correspondence with the behavioral training effects for the different ISIs (supplementary Figure S1) and therefore corroborates the link to training effects as opposed to the frequency of exposure. Furthermore, the supplementary analysis showed that the hippocampus did not display a baseline difference between the 17- and 33-msec conditions in the pretraining session. The stronger effect in the 17-msec condition was therefore mainly based on a stronger posttraining engagement of the hippocampus for trained categories in the 17-msec condition (relative to the 33-msec condition), which fits our interpretation of a supporting role of the hippocampus in object recognition in cases where viewing conditions are particularly challenging.

Evidence from patients with hippocampal damage regarding the role of the hippocampus in high-level perception remains controversial (for a review, see Lee et al., 2012) with an emphasis on scene (Graham et al., 2006; Lee, Buckley, et al., 2005; Lee, Bussey, et al., 2005) and context (Chun & Phelps, 1999) processing. A number of studies also investigated perceptual learning in patients with hippocampal damage (Mundy et al., 2013; Graham et al., 2006; Zaki, Nosofsky, Nenette, & Unverzagt, 2003; Manns & Squire, 2001; Chun & Phelps, 1999; Reed, Squire, Patalano, Smith, & Jonides, 1999; Knowlton & Squire, 1993). Most studies reported normal perceptual learning in patients (but see Mundy et al., 2013, and Zaki et al., 2003). These findings appear to be at odds with our results. However, it is important to note that the aforementioned patient studies focused on a narrow range of perceptual learning tasks, namely, visual search and category learning. Visual search is a special case, as it mainly imposes demands on visual attention rather than on fine-tuning of visual processing. Category learning, on the other hand, is related to expertise tasks in that participants have sufficient time to scrutinize the stimuli (on the order of several seconds), which, in addition, are clearly visible. This is an important difference from our study, in which we investigated object recognition under challenging viewing conditions both in space and time. Our favored interpretation for the functional role of hippocampus in high-level perception, pattern completion, is not probed in visual search or category learning tasks. Those crucial differences in perceptual task demands might explain why a number of studies did not find perceptual learning deficits in patients with hippocampal damage. In addition, a cautionary note regarding early category learning studies was brought up by Zaki et al. (2003), who provided evidence that patients indeed did show deficits in category learning in a more challenging task (involving two categories instead of only a single prototype category as in previous studies). Finally, a more general caveat with regard to lesion studies is the uncertainty about the degree to which brain function might have been reorganized to compensate for lost abilities after damage to hippocampal tissue (e.g., Pascual-Leone, Bartres-Faz, & Keenan, 1999).

Interestingly, a recent patient study (Aly et al., 2013) found that the hippocampus was important for strength-based (related to the overall configuration of an image), but not state-based (related to local features), visual perception. This indicates that a hippocampal involvement in object recognition is likely based on the overall configural appearance of an object and not on local features, which is in line with the category-related (rather than exemplar-related) hippocampal training effect observed in our study.

With regard to the correlation of hippocampal activity and the behavioral reward effect, it is noteworthy that rodent studies have described the subiculum as an integrator of raw sensory information and its past emotional connotation (Behr, Wozny, Fidzinski, & Schmitz, 2009; Naber, Witter, & Lopes da Silva, 2000). According to those studies, the subiculum receives both raw sensory information (from perirhinal and postrhinal cortices) and an already processed or modulated version of the same information. Septal areas could modulate sensory input to the subiculum to signal the “emotional connotation in relation to the context of the situation where the organism was stimulated” (Behr et al., 2009). Thus, the correlation of hippocampal activity and the strength of the reward effect could reflect the learned reward association with respect to the currently perceived stimulus.

Contrary to a previous study (Grill-Spector et al., 2000), we did not find a training-specific increase of the overall BOLD signal in LOC. As an important difference, our participants were trained on a set of specific categories (rather than hundreds of different objects). It is possible that the extensive training with a small set of object categories in our study led to more efficient representations in LOC and thus potentially counteracted gain effects in this brain region. Another possibility is long-term adaptation (van Turennout, Ellmore, & Martin, 2000).

Conclusion

Our study provides evidence for category-specific perceptual learning in object recognition at the behavioral level as well as the neural level. Hippocampal activation was consistent with both the training specificity and the within-category generalization of behavioral improvements. Our data thus suggest an involvement of hippocampal function in the enhancement of higher-level object recognition after perceptual learning. Although the hippocampus and, specifically, the subiculum have primarily been linked to the retrieval of memories, our data indicate a role beyond memory function in terms of a more direct involvement in object recognition. Hippocampal activation likely corresponds to a match-enhancement signal that serves to generate a more complete percept by matching the limited sensory information to an internal object template, thereby making object recognition under constrained viewing conditions more efficient. Furthermore, the observed modulation of hippocampal activity through reward reinforcement suggests that the hippocampus is involved in signaling the behavioral relevance of an object.

Acknowledgments

This study was supported by the DFG Research Training Group (GRK 1589/1) and partly by DFG grants STE 1430/2-1 and STE 1430/6-1.

Reprint requests should be sent to Matthias Guggenmos, Bernstein Center for Computational Neuroscience, Philippstraße 13, Haus 6, 10115 Berlin, Germany, or via e-mail: matthias.guggenmos@bccn-berlin.de.

REFERENCES

Ahissar, M., & Hochstein, S. (1997). Task difficulty and the specificity of perceptual learning. Nature, 387, 401–406.
Ahissar, M., & Hochstein, S. (2004). The reverse hierarchy theory of visual perceptual learning. Trends in Cognitive Sciences, 8, 457–464.
Aly, M., Ranganath, C., & Yonelinas, A. P. (2013). Detecting changes in scenes: The hippocampus is critical for strength-based perception. Neuron, 78, 1127–1137.
Baeck, A., & Op De Beeck, H. P. (2010). Transfer of object learning across distinct visual learning paradigms. Journal of Vision, 10, 1–9.
Baldassi, S., & Simoncini, C. (2011). Reward sharpens orientation coding independently of attention. Frontiers in Neuroscience, 5, 1–11.
Behr, J., Wozny, C., Fidzinski, P., & Schmitz, D. (2009). Synaptic plasticity in the subiculum. Progress in Neurobiology, 89, 334–342.
Chun, M. M., & Phelps, E. A. (1999). Memory deficits for implicit contextual information in amnesic subjects with hippocampal damage. Nature Neuroscience, 2, 844–847.
Dudukovic, N. M., Preston, A. R., Archie, J. J., Glover, G. H., & Wagner, A. D. (2011). High-resolution fMRI reveals match enhancement and attentional modulation in the human medial temporal lobe. Journal of Cognitive Neuroscience, 23, 670–682.
Duncan, K., Curtis, C., & Davachi, L. (2009). Distinct memory signatures in the hippocampus: Intentional states distinguish match and mismatch enhancement signals. The Journal of Neuroscience, 29, 131–139.
Eickhoff, S. B., Stephan, K. E., Mohlberg, H., Grefkes, C., Fink, G. R., Amunts, K., et al. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage, 25, 1325–1335.
Eldridge, L. L., Engel, S. A., Zeineh, M. M., Bookheimer, S. Y., & Knowlton, B. J. (2005). A dissociation of encoding and retrieval processes in the human hippocampus. The Journal of Neuroscience, 25, 3280–3286.
Fahle, M. (2002). Perceptual learning. Cambridge, MA: MIT Press.
Furmanski, C. S., & Engel, S. A. (2000). Perceptual learning in object recognition: Object specificity and size invariance. Vision Research, 40, 473–484.
Gabrieli, J. D. (1997). Separate neural bases of two fundamental memory processes in the human medial temporal lobe. Science, 276, 264–266.
Graham, K. S., Barense, M. D., & Lee, A. C. H. (2010). Going beyond LTM in the MTL: A synthesis of neuropsychological and neuroimaging findings on the role of the medial temporal lobe in memory and perception. Neuropsychologia, 48, 831–853.
Graham, K. S., Scahill, V. L., Hornberger, M., Barense, M. D., Lee, A. C. H., Bussey, T. J., et al. (2006). Abnormal categorization and perceptual learning in patients with hippocampal damage. The Journal of Neuroscience, 26, 7547–7554.
Grill-Spector, K., Kourtzi, Z., & Kanwisher, N. (2001). The lateral occipital complex and its role in object recognition. Vision Research, 41, 1409–1422.
Grill-Spector, K., Kushnir, T., Hendler, T., & Malach, R. (2000). The dynamics of object-selective activation correlate with recognition performance in humans. Nature Neuroscience, 3, 837–843.
Hutton, C., Bork, A., Josephs, O., Deichmann, R., Ashburner, J., & Turner, R. (2002). Image distortion correction in fMRI: A quantitative evaluation. Neuroimage, 16, 217–240.
Knowlton, B. J., & Squire, L. R. (1993). The learning of categories: Parallel brain systems for item memory and category knowledge. Science, 262, 1747–1749.
Lee, A. C. H., Buckley, M. J., Pegman, S. J., Spiers, H., Scahill, V. L., Gaffan, D., et al. (2005). Specialization in the medial temporal lobe for processing of objects and scenes. Hippocampus, 15, 782–797.
Lee, A. C. H., Bussey, T. J., Murray, E. A., Saksida, L. M., Epstein, R. A., Kapur, N., et al. (2005). Perceptual deficits in amnesia: Challenging the medial temporal lobe “mnemonic” view. Neuropsychologia, 43, 1–11.
Lee, A. C. H., Yeung, L.-K., & Barense, M. D. (2012). The hippocampus and visual perception. Frontiers in Human Neuroscience, 6, 1–17.
Liu, J., Lu, Z.-L., & Dosher, B. A. (2012). Mixed training at high and low accuracy levels leads to perceptual learning without feedback. Vision Research, 61, 15–24.
Malach, R., Reppas, J. B., Benson, R. R., Kwong, K. K., Jiang, H., Kennedy, W. A., et al. (1995). Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proceedings of the National Academy of Sciences, U.S.A., 92, 8135–8139.
Manns, J. R., & Squire, L. R. (2001). Perceptual learning, awareness, and the hippocampus. Hippocampus, 11, 776–782.
Mundy, M. E., Downing, P. E., Dwyer, D. M., Honey, R. C., & Graham, K. S. (2013). A critical role for the hippocampus and perirhinal cortex in perceptual learning of scenes and faces: Complementary findings from amnesia and fMRI. The Journal of Neuroscience, 33, 10490–10502.
Muzzio, I. A., Kentros, C., & Kandel, E. (2009). What is remembered? Role of attention on the encoding and retrieval of hippocampal representations. The Journal of Physiology, 587, 2837–2854.
Naber, P. A., Witter, M. P., & Lopes da Silva, F. H. (2000). Networks of the hippocampal memory system of the rat. The pivotal role of the subiculum. Annals of the New York Academy of Sciences, 911, 392–403.
Otto, T., & Eichenbaum, H. (1992). Neuronal activity in the hippocampus during delayed non-match to sample performance in rats: Evidence for hippocampal processing in recognition memory. Hippocampus, 2, 323–334.
Pascual-Leone, A., Bartres-Faz, D., & Keenan, J. P. (1999). Transcranial magnetic stimulation: Studying the brain–behaviour relationship by induction of “virtual lesions.” Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 354, 1229–1238.
Pessoa, L. (2003). Filling-in: From perceptual completion to cortical reorganization. New York: Oxford University Press.
Petrov, A. A., Dosher, B. A., & Lu, Z.-L. (2005). The dynamics of perceptual learning: An incremental reweighting model. Psychological Review, 112, 715–743.
Petrov, A. A., Dosher, B. A., & Lu, Z.-L. (2006). Perceptual learning without feedback in non-stationary contexts: Data and model. Vision Research, 46, 3177–3197.
Reed, J. M., Squire, L. R., Patalano, A. L., Smith, E. E., & Jonides, J. (1999). Learning about categories that are defined by object-like stimuli despite impaired declarative memory. Behavioral Neuroscience, 113, 411–419.
Roelfsema, P. R., van Ooyen, A., & Watanabe, T. (2010). Perceptual learning rules based on reinforcers and attention. Trends in Cognitive Sciences, 14, 64–71.
Scott, L. S., Tanaka, J. W., Sheinberg, D. L., & Curran, T. (2006). A reevaluation of the electrophysiological correlates of expert object processing. Journal of Cognitive Neuroscience, 18, 1453–1465.
Seitz, A. R., & Dinse, H. R. (2007). A common framework for perceptual learning. Current Opinion in Neurobiology, 17, 148–153.
Seitz, A. R., Kim, D., & Watanabe, T. (2009). Rewards evoke learning of unconsciously processed visual stimuli in adult humans. Neuron, 61, 700–707.
Seriès, P., & Seitz, A. R. (2013). Learning what to expect (in visual perception). Frontiers in Human Neuroscience, 7, 668.
Tanaka, J. W., Curran, T., & Sheinberg, D. L. (2005). The training and transfer of real-world perceptual expertise. Psychological Science, 16, 145–151.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., et al. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 15, 273–289.
van Turennout, M., Ellmore, T., & Martin, A. (2000). Long-lasting cortical plasticity in the object naming system. Nature Neuroscience, 3, 1329–1334.
Zaki, S. R., Nosofsky, R. M., Nenette, J. M., & Unverzagt, F. W. (2003). Categorization and recognition performance of a memory-impaired group: Evidence for single-system models. Journal of the International Neuropsychological Society, 9, 394–406.