We used ERPs to investigate neural correlates of face learning. At learning, participants viewed video clips of unfamiliar people, which were presented either with or without voices providing semantic information. In a subsequent face-recognition task (four trial blocks), learned faces were repeated once per block and presented interspersed with novel faces. To disentangle face from image learning, we used different images for face repetitions. Block effects demonstrated that engaging in the face-recognition task modulated ERPs between 170 and 900 msec poststimulus onset for learned and novel faces. In addition, multiple repetitions of different exemplars of learned faces elicited an increased bilateral N250. Source localizations of this N250 for learned faces suggested activity in fusiform gyrus, similar to that found previously for N250r in repetition priming paradigms [Schweinberger, S. R., Pickering, E. C., Jentzsch, I., Burton, A. M., & Kaufmann, J. M. Event-related brain potential evidence for a response of inferior temporal cortex to familiar face repetitions. Cognitive Brain Research, 14, 398–409, 2002]. Multiple repetitions of learned faces also elicited increased central–parietal positivity between 400 and 600 msec and caused a bilateral increase of inferior–temporal negativity (>300 msec) compared with novel faces. Semantic information at learning enhanced recognition rates. Faces that had been learned with semantic information elicited somewhat less negative amplitudes between 700 and 900 msec over left inferior–temporal sites. Overall, the findings demonstrate a role of the temporal N250 ERP in the acquisition of new face representations across different images. They also suggest that, compared with visual presentation alone, additional semantic information at learning facilitates postperceptual processing in recognition but does not facilitate perceptual analysis of learned faces.
There is a striking discrepancy between our ability to process familiar and unfamiliar faces: Although familiar faces are recognized effortlessly from a range of pictorial variations, it is surprisingly difficult to recognize or even to match unfamiliar faces if different photographs are used (Bruce et al., 1999). For this reason, a qualitative difference between the processing of familiar and unfamiliar faces has been suggested (Megreya & Burton, 2006; Bruce, Henderson, Newman, & Burton, 2001; Hancock, Bruce, & Burton, 2000). But what happens when an unfamiliar face becomes familiar? Surprisingly little is known about how new representations of faces are formed. In particular, there are no satisfactory explanations for the transformation from image-specific coding of unfamiliar faces to more flexible representations of familiar faces. Using ERPs, the present study investigates neurophysiological correlates of the acquisition of such representations for initially unfamiliar faces in the course of learning.
Although there are a few studies using ERPs to investigate neural correlates of face encoding and learning (e.g., Tanaka, Curran, Porterfield, & Collins, 2006; Paller, Gonsalves, Grabowecky, Bozic, & Yamada, 2000; Sommer, Schweinberger, & Matt, 1991), these studies used identical images for learning and test. It is therefore unclear whether these effects would generalize to those processes of face learning that eventually support the recognition of a familiar face across a large range of image variations (Burton, Jenkins, Hancock, & White, 2005). To study face (rather than image) learning, we presented different exemplars of images of the same persons throughout the present study. Our main goal was to study effects of learning on N170, N250, and N400 ERP components that have been shown to be sensitive to face processing.
The occipito-temporal N170 is strongly influenced by face inversion (Schweinberger, Huddy, & Burton, 2004; Rossion et al., 2000; Bentin, Allison, Puce, Perez, & McCarthy, 1996) but is unaffected by the familiarity of a face. It is now widely agreed that the N170 is associated with the analysis of structural information in faces (or face-like configurations) prior to accessing identity-specific representations (Carbon, Schweinberger, Kaufmann, & Leder, 2005; Herzmann, Schweinberger, Sommer, & Jentzsch, 2004; Jemel, Pisani, Calabria, Crommelinck, & Bruyer, 2003; Schweinberger, Pickering, Burton, & Kaufmann, 2002; Schweinberger, Pickering, Jentzsch, Burton, & Kaufmann, 2002; Bentin & Deouell, 2000; Eimer, 2000; Bentin et al., 1996; George, Evans, Fiori, Davidoff, & Renault, 1996; but see also Caharel et al., 2002). In contrast, a subsequent right temporal N250r (Schweinberger, Pickering, Jentzsch, et al., 2002) has been shown to be more pronounced for identical and immediate repetitions of familiar as compared with unfamiliar faces (Pfutze, Sommer, & Schweinberger, 2002; Schweinberger, Pickering, Jentzsch, et al., 2002; Begleiter, Porjesz, & Wang, 1995; Schweinberger, Pfuetze, & Sommer, 1995). On the one hand, the N250r seems to be a relatively transient effect, in that its amplitude is significantly diminished if a number of stimuli intervene between repetitions (Pfutze et al., 2002). When repetitions are separated by many trials, N250r can even be completely abolished (Schweinberger, Pickering, Burton, et al., 2002). On the other hand, there is also evidence that numerous repetitions can produce longer-lasting N250r effects (Itier & Taylor, 2004), and it has been suggested that N250 should be sensitive to long-term face learning (Graham & Dawson, 2005).
A direct comparison of repetition effects for upright faces and other categories of objects has led to the suggestion that the right inferior temporal N250r may be a face-selective ERP response (Schweinberger et al., 2004). However, it needs to be noted that a smaller left temporal N250r has also been observed for repetitions of written names (Pickering & Schweinberger, 2003), and other studies found some evidence for ERP effects with a similar time course for the repetition of objects (Henson, Rylands, Ross, Vuilleumier, & Rugg, 2004). In addition, a recent study reports an increased N250 after subordinate-level discrimination training using pictures of birds, and the effects even generalized to new exemplars of learned species (Scott, Tanaka, Sheinberg, & Curran, 2006). This latter finding has been interpreted by the authors as evidence that N250 reflects expertise in the processing of within-category comparisons at the subordinate level. At present, although there is some evidence from direct comparison that the face- and the name-elicited N250r reflect the operation of at least partially different neural mechanisms (Pfutze et al., 2002), the extent to which these various ERP effects of repetition and expertise reflect the same or different neural mechanisms for objects and faces remains unclear and awaits a more direct comparison of the respective conditions in a single study.
Of particular relevance for the present study, a recent article demonstrated that the presentation of one's own face elicits a task-irrelevant N250 response in a non-immediate-repetition paradigm, whereas for repetitions of previously unfamiliar faces an N250 was only found for targets (Tanaka et al., 2006). For familiar faces, N250-like modulations were also found for nontarget faces in a modified Sternberg task (Boehm & Sommer, 2005), supporting the view that N250 reflects the activation of previously acquired identity-specific face representations and therefore possibly the access of “face-recognition units” described in cognitive models of face perception (Burton, Bruce, & Johnston, 1990; Bruce & Young, 1986). However, it remains unclear whether N250 long-term repetition effects for previously unfamiliar faces (Tanaka et al., 2006; Joyce & Kutas, 2005) are specific to the identical image or can be robust across repetitions of different images of the same person. If N250 reflects activation at the face-recognition unit level, one should expect at least some degree of image independence once the “qualitative shift” from an unfamiliar to a familiar face has been accomplished.
Finally, starting around 400 msec, ERPs for familiar faces are typically characterized by increased central–parietal positivity (Bentin & Deouell, 2000; Eimer, 2000; Paller et al., 2000). This component has been demonstrated to be modality independent (Schweinberger, 1996) and is reliably observed in long-term repetition priming paradigms (Schweinberger, Pickering, Burton, et al., 2002). Although it is not face specific and possibly also reflects the ease of discriminating between familiar and unfamiliar stimuli, it has also been interpreted as reflecting access to personal semantic information (Paller et al., 2000).
In real life, we often encounter novel faces along with information about the face bearer. The role of such semantic information on face learning remains largely unknown, but recently it has been suggested that semantic information may help to establish new face representations (Bonner, Burton, Jenkins, & McNeill, 2003; Paller et al., 2000). In addition to investigating effects of multiple face repetitions, we also contrasted learning for faces that were presented together with semantic information and learning for faces that were presented on their own.
In the learning phase of the present study, initially unfamiliar faces were either presented with or without semantic information. During learning, video clips included nonrigid and rigid facial movements. We used videos for face learning because we reasoned that this should promote the acquisition of flexible face representations that do not depend on a particular image. In each of four blocks of a subsequent face-recognition test, learned faces were repeatedly presented, intermixed with novel faces. In addition to behavioral data (accuracy, d′, and RTs), EEG was recorded during the face-recognition test.
The study had four major aims. First, we were interested in the learning curves for faces using different exemplars of the same persons. Second, we wanted to investigate whether any of the previously described ERP components associated with face perception are specifically modified by learning. Third, we tried to replicate reports that semantic information supports face learning and to extend these findings to face recognition from different stimuli. Finally, we aimed at using the high temporal resolution of ERPs to investigate at which processing level semantic information might influence the learning of novel faces.
A total of 24 participants (14 women and 10 men) aged between 18 and 37 years (M = 22.9, SD = 5.0) were paid £15 to contribute data for this study. All participants had normal or corrected-to-normal vision. Twenty-two participants were right-handed, one was left-handed, and one was ambidextrous, as assessed by the Edinburgh Handedness Inventory (Oldfield, 1971). Data from three additional participants were excluded due to an insufficient number of artifact-free EEG trials. Three further participants were excluded because they did not reach a minimum performance criterion of correctly recognizing at least 60% of the learned faces (M = 44% as compared with M = 79% for all other participants).
Stimuli and Apparatus
For the learning phase, 72 different video clips of unfamiliar faces (22 female and 50 male) were prepared on videotape. The duration of a single clip was 30 sec. Each face was preceded by a fixation cross presented for 2000 msec in the center of the screen. The duration of the whole tape was approximately 40 min. All faces displayed rigid and nonrigid head movements and were shown in color in front of a standardized background. In addition, 36 voice samples of 30 sec each were recorded from 36 different speakers. The voice samples provided semantic information, always including the person's forename, profession, residence, and seven additional distinct items of identity-specific information. For each voice, dialects and accents were matched to the given information, and sex and approximate age were matched to the face displayed. The videotape was presented on a 14-in. monitor.
For each face that had been presented during the learning phase, we prepared four different grayscale images, resulting in a total of 288 images of learned faces. All faces displayed a neutral expression. Images were cropped from raw video clips used to create the learning tapes. Only frames that had not been included in the learning clips were selected for the test phase to ensure that no identical images were repeated. The video-editing software was Final Cut Pro.
In addition to learned faces, images of 148 different unfamiliar faces were edited accordingly. The majority of unfamiliar face images were cropped from additional clips of the same video database that had been used for the production of the learning tape. All stimuli were digitally edited using Adobe Photoshop and adjusted with respect to size, background, brightness, and contrast. Any information from the neck downward, such as clothing accessories, was cropped (for examples of the stimuli, see Figure 1).
The stimuli in the test phase were presented on a black background in the center of a 19-in. monitor. The presentation software was ERTS (Beringer, 2000). Image resolution was 28.3 pixels/cm at a screen resolution of 800 × 600 pixels. The size of the stimuli was 6 × 7.6 cm at a viewing distance of 80 cm, resulting in a visual angle of 4.3° × 5.4°. Viewing distance was kept constant by using a chin rest.
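The reported visual angles follow from the stimulus size and viewing distance; the sketch below (illustrative only, not part of the original apparatus or analysis code) applies the standard formula:

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    # Angle subtended by a stimulus of a given extent at a given viewing
    # distance: 2 * atan(size / (2 * distance)), converted to degrees.
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

# 6 x 7.6 cm stimuli viewed at 80 cm, as in the test phase
width_deg = visual_angle_deg(6.0, 80.0)    # approximately 4.3 degrees
height_deg = visual_angle_deg(7.6, 80.0)   # approximately 5.4 degrees
```

Both values agree with the 4.3° × 5.4° reported above.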
Participants were informed that they were about to take part in a face-learning task that would be followed by a face-recognition test including an EEG recording. They all gave their informed consent to participate, and handedness was then determined using the Edinburgh Inventory (Oldfield, 1971).
The set of video clips was divided into two subsets of 36 faces each. Faces from one set (“semantic faces”) were presented simultaneously with a voice providing semantic information. Faces from the other set (“nonsemantic faces”) were presented without voices and therefore without semantic information. The pairing of face set and semantic information condition was counterbalanced across participants. Semantic and nonsemantic faces were presented intermixed and in two different random orders for each counterbalancing group, resulting in four different tapes. Participants were instructed to memorize the faces and, where present, to memorize the semantic information given.
After the learning phase, an electrode cap (EasyCap; Falk Minow Services, Herrsching-Breitbrunn, Germany) was applied. Approximately 40 min after the end of the videotape, participants performed a speeded dual-choice face-recognition task. Faces were classified as familiar or unfamiliar by pressing a key on a response time keypad (ERTSKEY) using the index fingers of both hands. The assignment of response hand and response alternative was counterbalanced across participants. Both speed and accuracy were stressed. The test phase comprised four blocks, each consisting of 35 semantic, 35 nonsemantic, and 35 novel faces presented in random order. Thus, across the complete testing session, each previously learned face was repeated four times using nonidentical exemplars. Different novel faces were used in each block and therefore never repeated. Within each block, faces were presented in random order and without sound.
After reading the instructions on the monitor and before the first block of the test phase, participants performed 16 practice trials comprising two additional learned and two additional unfamiliar faces that were not used again later in the experiment. For the practice trials, feedback was given for wrong and missing answers using two tones of 300-msec duration and 500 or 650 Hz, respectively. Practice trials were not analyzed. Participants were encouraged to ask questions in case anything remained unclear at this stage.
In all trials, a central white fixation cross on black background was shown for 1000 msec, followed by a face (2000 msec) and a blank screen (1000 msec). During the four experimental blocks, no feedback was given for single trials. However, after a maximum of 53 trials, there was a short break in which feedback on average response times and errors was provided. The end of the break was self-paced. The whole test phase lasted about 35 min. For each block, data were averaged across condition (semantic faces, nonsemantic faces, and novel faces), resulting in a maximum of 35 trials per block and condition.
After completion of the four blocks of the test phase, we determined how well participants remembered the semantic information provided at learning. For that purpose, all of the learned semantic and nonsemantic faces were individually presented again on the computer monitor in random order. Below each face, three short summaries of semantic information and the corresponding names were presented visually. The participants decided by key press whether—and if so which—description matched the presented person. There was no time limit for this decision.
The EEG was recorded in an electrically and acoustically shielded room at the Department of Psychology, University of Glasgow, UK. Data were recorded with sintered Ag/AgCl electrodes mounted on an electrode cap (EasyCap; Falk Minow Services) using SynAmps amplifiers (NeuroScan Labs, Sterling, VA), arranged according to the extended 10/20 system at the scalp positions Fz, Cz, Pz, Iz, Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T7, T8, P7, P8, FT9, FT10, P9, P10, PO9, PO10, F9, F10, F9′, F10′, TP9, and TP10. Note that the T7, T8, P7, and P8 locations are equivalent to T3, T4, T5, and T6 in the old nomenclature. The F9′ electrode was positioned 2 cm anterior to F9 at the outer canthus of the left eye, and the F10′ electrode was positioned 2 cm anterior to F10 at the outer canthus of the right eye. The positions TP9 and TP10 refer to inferior–temporal locations over the left and right mastoids, respectively. TP10 (right upper mastoid) served as initial common reference, and a forehead electrode (AFz) served as ground.
Impedances were kept below 10 kΩ and were typically below 5 kΩ. The horizontal EOG was recorded bipolarly from F9′ and F10′ at the outer canthi of both eyes. The vertical EOG was monitored bipolarly from an electrode above the right eye against an electrode below the right eye. All signals were recorded with direct current (40 Hz low pass, −6 dB attenuation, 12 dB/octave) and sampled at a rate of 250 Hz.
Preprocessing of ERP Data
Offline, epochs were generated, lasting 2200 msec and starting 200 msec before the onset of a face stimulus. Automatic artifact detection software (KN-Format) was run for an initial sorting of trials, and all trials were then visually inspected for artifacts of ocular (e.g., blinks, saccades) and nonocular origin (e.g., channel blocking or drifts). Trials with nonocular artifacts, trials with saccades, and trials with incorrect behavioral responses were discarded. For all remaining trials, ocular blink contributions to the EEG were corrected using the KN-Format algorithm (Elbert, Lutzenberger, Rockstroh, & Birbaumer, 1985). ERPs were averaged separately for each channel and for each experimental condition and block. Each averaged ERP was low-pass filtered at 10 Hz with a zero phase shift digital filter and recalculated to average reference, excluding the vertical EOG channel.
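The epoching, baseline handling, and averaging described above can be sketched as follows. This is a minimal illustration with hypothetical function names and array layout; it omits the artifact rejection, ocular correction, 10-Hz filtering, and average-reference steps of the actual pipeline:

```python
import numpy as np

def epoch_and_average(eeg, events, sfreq=250, tmin=-0.2, tmax=2.0):
    """Cut epochs around stimulus onsets and average them.

    Hypothetical helper, not the authors' KN-Format software.
    eeg: array (n_channels, n_samples); events: sample indices of onsets.
    Defaults mirror the paper: 250-Hz sampling, 2200-msec epochs starting
    200 msec before face onset.
    """
    pre = int(round(-tmin * sfreq))   # 50 samples of prestimulus baseline
    post = int(round(tmax * sfreq))   # 500 samples poststimulus
    epochs = np.stack([eeg[:, ev - pre:ev + post] for ev in events])
    # Subtract the mean of the 200-msec prestimulus baseline per channel
    baseline = epochs[:, :, :pre].mean(axis=2, keepdims=True)
    epochs = epochs - baseline
    return epochs.mean(axis=0)        # ERP: (n_channels, pre + post)
```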
Dipole Source Analysis
Dipole source analyses were conducted using Brain Electrical Source Analysis software (BESA, Version 5.1.4). For modeling, a four-shell ellipsoidal head model was used, which takes into account the different conductivities of brain, cerebrospinal fluid, bone, and scalp. Coordinates are reported according to the Talairach and Tournoux (1988) brain atlas.
RTs were recorded from the onset of face stimuli. To exclude anticipatory responses as well as extremely slow responses, we included correct responses in the analysis of RTs only when they were given within a time window from 150 to 2000 msec. Performance was analyzed by means of accuracies and sensitivities (d′) (Brophy, 1986). For the calculation of d′, hit rates of 1 and false alarm rates of 0 were adjusted according to the suggestions by Macmillan and Kaplan (1985).
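Sensitivity was computed as d′ = z(hit rate) − z(false-alarm rate). A minimal sketch follows, assuming the common convention of replacing rates of 0 and 1 by 1/(2N) and 1 − 1/(2N); the paper's exact implementation of the Macmillan and Kaplan (1985) adjustment may differ:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    Rates of exactly 0 or 1 are adjusted to 1/(2N) or 1 - 1/(2N)
    (one common reading of Macmillan & Kaplan, 1985; assumption).
    """
    z = NormalDist().inv_cdf

    def rate(k, n):
        r = k / n
        if r == 0.0:
            r = 1.0 / (2 * n)
        elif r == 1.0:
            r = 1.0 - 1.0 / (2 * n)
        return r

    h = rate(hits, hits + misses)
    f = rate(false_alarms, false_alarms + correct_rejections)
    return z(h) - z(f)
```

With 35 learned and 35 novel faces per block, a perfect block would otherwise yield an undefined z score, which is why the adjustment is needed.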
ERPs for faces were quantified by mean amplitude measures for the following components or time segments: P1 (110–130 msec), N170 (160–190 msec), N250 (240–280 msec), 300–400 msec, 400–600 msec, and 700–900 msec. The first two time windows were chosen to correspond with distinct peaks in the waveforms identified from grand means across all blocks and conditions (P1 = 128 msec at O1 and O2; N170 = 184 msec at P9 and P10); the time range between 240 and 280 msec corresponds to the N250r, a component that has been shown to be modulated by immediate repetitions of faces and that has been associated with face identity recognition (Schweinberger et al., 2004; Schweinberger, Pickering, Jentzsch, et al., 2002). The later time segments were chosen after visual inspection of ERPs. All amplitude measures were taken relative to a 200-msec baseline preceding the target stimulus.
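Per channel, such mean amplitude measures reduce to averaging the samples within a window defined relative to stimulus onset. A sketch (hypothetical helper, assuming the 250-Hz sampling rate and 200-msec prestimulus baseline reported above):

```python
import numpy as np

def mean_amplitude(erp, sfreq=250, t0=-0.2, win=(0.24, 0.28)):
    """Mean amplitude in a latency window (default: 240-280 msec for N250).

    erp: array (..., n_samples) with the epoch starting at t0 seconds
    relative to stimulus onset. Hypothetical helper for illustration.
    """
    i0 = int(round((win[0] - t0) * sfreq))
    i1 = int(round((win[1] - t0) * sfreq))
    return erp[..., i0:i1].mean(axis=-1)
```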
As the main interest of this study was to investigate the effects of multiple repetitions and semantic information on face learning, we had included novel faces only to create the task demands; data for novel faces were therefore not included in the initial analysis. However, because effects of block might be influenced by face repetitions and time on task, block effects for learned faces were later validated by contrasting them with block effects for novel faces.
For behavioral and ERP data, where appropriate, Epsilon corrections for heterogeneity of covariances were performed with the Huynh–Feldt method (Huynh & Feldt, 1976) throughout. If additional ANOVAs were conducted to further investigate significant interactions, α levels were Bonferroni-corrected.
The extent to which semantic information was explicitly remembered was tested immediately after the EEG session. The results revealed that names and semantic information for the semantic faces were correctly recognized in 69% of the trials. For nonsemantic faces, nonassociated semantic information was correctly rejected in 75% of the trials (chance performance would be 25% in both cases).
Mean sensitivities (d′), correct RTs, and accuracies from the test phase are displayed in Figure 2. Behavioral effects were analyzed by performing an ANOVA with repeated measurements on Learning condition (semantic vs. nonsemantic) and Block (1 to 4), where block number is equivalent to the number of repetitions after the learning phase.
A main effect of Learning condition, F(1, 23) = 6.25, p < .05, demonstrated superior recognition of faces presented with semantic information (Figure 2A). There was also a main effect of Block, F(3, 69) = 63.65, p < .0001. Learning condition and Block did not interact, F(3, 69) < 1. The effect of Block was further investigated by means of polynomial contrast analysis, which revealed significant linear, F(1, 23) = 102.85, p < .0001, and quadratic trends, F(1, 23) = 79.27, p < .0001, with no higher-order trends.
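Polynomial contrast analysis of a four-level factor such as Block can be illustrated with the standard orthogonal contrast weights for equally spaced levels; the weights and helper below are textbook values for this design, not taken from the paper:

```python
import numpy as np

# Orthogonal polynomial contrast weights for four equally spaced levels
# (Blocks 1-4); standard textbook values, assumed for illustration.
CONTRASTS = {
    "linear": np.array([-3.0, -1.0, 1.0, 3.0]),
    "quadratic": np.array([1.0, -1.0, -1.0, 1.0]),
    "cubic": np.array([-1.0, 3.0, -3.0, 1.0]),
}

def contrast_scores(block_means):
    """Per-participant contrast scores; block_means: (n_subjects, 4).

    Each score could then be tested against zero (e.g., with an F or
    one-sample t test) to assess the corresponding trend.
    """
    block_means = np.asarray(block_means)
    return {name: block_means @ w for name, w in CONTRASTS.items()}
```

For perfectly linear block means, only the linear contrast is nonzero, which is the pattern a pure linear trend would produce.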
In contrast to d′, response times for semantic and nonsemantic faces did not differ, F(1, 23) < 1. Overall, RTs decreased across blocks, F(3, 69) = 93.46, p < .0001. As for d′, polynomial contrast analysis revealed significant linear, F(1, 23) = 139.35, p < .0001, and quadratic trends, F(1, 23) = 50.23, p < .001, with no higher-order trends. There was no interaction between Learning condition and Block, F(3, 69) = 1.15, p > .33.
Effects of Learning Condition and Block
Effects were quantified using predefined regions of interest (ROIs) in which effects were expected. ROIs were selected based on previous studies investigating face learning and repetition priming (Schweinberger, Pickering, Burton, et al., 2002; Schweinberger, Pickering, Jentzsch, et al., 2002; Paller et al., 2000; Paller, Bozic, Ranganath, Grabowecky, & Yamada, 1999) and were further validated by visual inspection. They included occipital electrodes for P1 (Iz, O1/O2) and inferior–temporal sites for N170 and N250 (TP9/TP10, P9/P10, PO9/PO10). For all later time windows starting at 300 msec, three ROIs were analyzed. These consisted of frontal (Fz, F3/F4), central–parietal (Cz, Pz, P3/P4), and inferior–temporal (TP9/TP10, P9/P10, PO9/PO10) electrodes (see also Figure 3). For the inferior–temporal ROI, which did not include midline electrodes, the additional factor Hemisphere was introduced.
First, as for the behavioral data, we initially performed ANOVAs with repeated measurements on the factors Learning condition (semantic vs. nonsemantic) and Block (1–4) plus the additional factor electrode site for each time segment. In the interest of stringency and readability, we will report significant results only when the factors Learning condition or Block are included.
The P1 component was modulated neither by Block, F(3, 69) = 1.04, p > .37, nor by Learning condition, F < 1, and there were no interactions including either of the two factors.
A main effect of Block, F(3, 69) = 20.05, p < .0001, suggested an increase in N170 negativity (Ms = 0.11 μV, −.69 μV, −1.13 μV, and −1.06 μV in Blocks 1–4, respectively). This effect was further qualified by a two-way interaction with the factor Site, F(6, 138) = 4.94, p < .01. Inspection of Figure 4 suggests that this was due to larger block effects at the more posterior locations PO9 and PO10. N170 was not affected by Learning condition, F(1, 23) = 1.38, p > .25.
Similar to N170, N250 was characterized by increasing negativity across blocks, F(3, 69) = 43.83, p < .0001. Again, this block effect interacted with Site, F(6, 138) = 10.58, p < .0001. Inspection of Figure 4 suggests somewhat larger modulations at more posterior sites, in particular over the right hemisphere. This is also suggested by a trend toward an interaction between Block and Hemisphere, F(3, 69) = 2.81, p = .07. There was no effect of Learning condition, F(1, 23) < 1.
As for N170 and N250, inferior–temporal negativity also increased across blocks in this time window, F(3, 69) = 29.48, p < .001. Again, block effects were qualified by an interaction with Site, F(6, 138) = 10.58, p < .0001, suggesting larger increases in negativity at more posterior sites (Figure 4).
At central–parietal ROIs, the analysis yielded a significant increase in positivity across blocks, F(3, 69) = 18.04, p < .001. This main effect was further specified by a two-way interaction with the factor Site, F(9, 207) = 5.18, p < .0001. Inspection of Figure 5 suggests the strongest increases in positivity at Pz.
Finally, there was a significant main effect of Block at frontal sites, F(3, 69) = 6.4, p < .001. Inspection of Figure 6 suggests this was due to more negative amplitudes in Block 1 compared with later blocks. No effects or interactions involving the factor Learning condition reached significance level.
In this time window, too, inferior–temporal negativity increased across blocks, F(3, 69) = 14.25, p < .001 (see also Figure 4). At central–parietal ROIs, positivity increased with block number, F(3, 69) = 45.61, p < .001 (see also Figure 5). As for the previous time window, block effects at central–parietal electrodes were further qualified by an interaction with the factor Site, F(9, 207) = 3.74, p < .01, with numerically largest increases at electrode Pz. There were no effects of Learning condition at any ROI.
In this late time window as well, inferior–temporal negativity increased across blocks, F(3, 69) = 12.54, p < .001. At central–parietal sites, positivity still increased across blocks, F(3, 69) = 18.61, p < .001 (see also Figure 5). The two-way interaction between Block and Site, F(9, 207) = 2.5, p < .05, suggested that increases were now largest at electrode Cz.
This time window was the first to show any involvement of the factor Learning condition: The analysis revealed a significant two-way interaction between Learning condition and Hemisphere at inferior–temporal sites, F(1, 23) = 5.71, p < .05. Inspection of Figure 7 suggests more positive ERPs for semantic faces over the left hemisphere. The interaction was further explored by two post hoc ANOVAs for the left and the right hemispheres. Both analyses included the factors Learning condition (semantic vs. nonsemantic) and Electrode (P9, TP9, and PO9 for the left hemisphere; P10, TP10, and PO10 for the right hemisphere). The α level was adjusted to .025. According to these analyses, differences between semantic and nonsemantic faces were marginally significant over the left hemisphere, F(1, 23) = 5.48, p = .028. Over the right hemisphere, there was no evidence for differences between semantic and nonsemantic faces, F < 1.
In sum, the most prominent effects of block manifested themselves as increasing negativity at inferior–temporal sites, starting in the time range of the N170 and lasting at least until 900 msec poststimulus onset. At central–parietal sites, starting at about 300 msec, we observed a sustained increase in positivity. Between 300 and 400 msec, there was a decrease in negativity across blocks at frontal sites. Finally, the factor Learning condition did not influence ERPs before 700 msec poststimulus onset.
Comparisons of Learned versus Novel Faces
As can be seen in Figure 8, ERPs for novel faces were also modulated to some extent by the factor Block. Because novel faces were never repeated, this suggests that the reported block effects for learned faces were not entirely caused by face repetitions and that time on task had a more general influence on ERPs elicited by all faces.
To separate effects of face repetitions from effects of time on task, we contrasted block effects for learned faces with block effects for novel faces. To this end, we calculated difference waves by subtracting ERPs in Block 1 from those in Block 4 for each learning condition (semantic, nonsemantic, and novel faces). Separate ANOVAs comparing novel faces with semantic and nonsemantic faces were performed for each time interval and ROI that had yielded significant block effects in the initial analyses of semantic and nonsemantic faces. Factors were Learning condition (semantic vs. novel or nonsemantic vs. novel) and Site (identical to the analyses above). Figure 9 depicts topographies of differences between Block 1 and Block 4 for all face categories and relevant time intervals.
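The difference-wave computation itself is simple; as a sketch (with a hypothetical data layout of per-block ERP averages):

```python
import numpy as np

def repetition_difference(erps_by_block):
    """Block-4 minus Block-1 difference wave, per condition.

    erps_by_block: dict mapping condition name -> array of shape
    (n_blocks, n_channels, n_samples); layout is an assumption
    for illustration, not the authors' data format.
    """
    return {cond: e[3] - e[0] for cond, e in erps_by_block.items()}
```

The resulting difference topographies are what the Learning condition × Site ANOVAs described above operate on.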
This component was not affected by face repetitions: Although overall amplitudes of N170 increased from Block 1 to Block 4, this did not differ between learned and novel faces, F(1, 23) = 1.73, p > .20, and F(1, 23) = 1.55, p > .22, for the comparison of semantic versus novel and nonsemantic versus novel faces, respectively.
By contrast, N250 was affected by face repetitions (see also Figure 10). Increases in inferior–temporal negativity between Blocks 1 and 4 were larger for semantic (Mdiff = −2.49 μV) and nonsemantic faces (Mdiff = −2.19 μV) than for novel faces (Mdiff = −1.38 μV), F(1, 23) = 6.78, p < .05, and F(1, 23) = 7.10, p < .05, respectively.
We further analyzed the development of N250 across all blocks by separate ANOVAs for semantic, nonsemantic, and novel faces with repeated measurements on the factors Block (block 1 to block 4), Site, and Hemisphere. These analyses yielded significant N250 block effects both for semantic and nonsemantic faces, F(3, 69) = 26.06, p < .0001, and F(3, 69) = 21.46, p < .0001, respectively. According to polynomial contrast analyses, these were attributed to linear, F(1, 23) = 55.14, p < .0001, and F(1, 23) = 45.07, p < .0001, for semantic and nonsemantic faces, respectively, and quadratic trends, F(1, 23) = 17.46, p < .01, and F(1, 23) = 4.49, p < .05, respectively. For novel faces, there was also a significant effect of Block, F(3, 69) = 8.76, p < .001. In contrast to semantic and nonsemantic faces, the main effect of Block interacted with the factor Hemisphere, F(3, 69) = 6.68, p < .01. Separate ANOVAs for left and right inferior–temporal sites only revealed significant increases of N250 for novel faces over the right hemisphere, F(3, 69) = 12.54, p < .0001. According to polynomial contrasts, this effect was due to a significant linear trend, F(1, 23) = 23.83, p < .0001, with no higher order contributions. At left inferior–temporal sites, the main effect of block did not reach significance, F(3, 69) = 2.28, p = .10 (see also Figure 11A and B).
For this interval, ERPs for learned faces were characterized by a larger increase of central–parietal positivity in comparison to novel faces, F(1, 23) = 5.96, p < .05, for semantic versus novel and F(1, 23) = 8.61, p < .01, for nonsemantic versus novel faces, respectively (see also Figures 12 and 13).
In addition, there was some evidence for a more bilateral increase of inferior–temporal negativity for learned faces, whereas for novel faces block effects seemed to be smaller over the left hemisphere (see also Figures 9 and 10). Significant two-way interactions between the factors Learning condition and Hemisphere were found both for the ANOVA including semantic versus novel faces, F(1, 23) = 5.01, p < .05, and for that including nonsemantic versus novel faces, F(1, 23) = 6.12, p < .01. These were further evaluated by separate ANOVAs for semantic, nonsemantic, and novel faces, including the factor Hemisphere at electrodes PO9 and PO10. For novel faces, this analysis revealed stronger increases over the right hemisphere, F(1, 23) = 6.91, p < .016, whereas more bilateral increases in negativity were found for both semantic and nonsemantic faces: Neither of the two ANOVAs revealed an effect of Hemisphere, F(1, 23) = 2.46, p = .13, and F(1, 23) = 1.48, p = .23, for semantic and nonsemantic faces, respectively.
The only significant finding for this time segment was a two-way interaction between Learning condition and Hemisphere for the comparison of semantic versus novel faces, F(1, 23) = 5.28, p < .05. However, neither of the two separate post hoc tests for the left and right hemispheres, including the factors Learning condition (semantic vs. novel) and Electrode (P9, TP9, and PO9 for the left hemisphere; P10, TP10, and PO10 for the right hemisphere), showed differences between the two learning conditions, F(1, 23) = 1.52, p = .23, and F(1, 23) = 1.43, p = .24, for the left and right hemispheres, respectively.
To derive the source model for N250, we used grand mean waveforms of the differences between Block 4 and Block 1 for each face category. A dipole source analysis was performed for the time interval 200–300 msec. Spatial PCA was employed to estimate the minimum number of dipoles that should be included in the model. In an iterative process, the time window was then reduced until a minimum number of components explained a maximum of the variance. For semantic faces, between 272 and 280 msec, one PC explained more than 99.5% of the variance. We subsequently fitted one pair of dipoles, with no constraints on location or orientation other than a symmetry constraint (Schweinberger, Pickering, Jentzsch, et al., 2002). This resulted in a solution with a goodness of fit of 97%. Source coordinates were compatible with generators in the fusiform gyrus (x = ±42.6, y = −66.9, z = −6.7). For nonsemantic faces, more than 99.5% of the variance between 260 and 272 msec was explained by one PC, and one pair of symmetrical sources (x = ±48.8, y = −75.6, z = −8.3) led to a goodness of fit of 97.1%. Figure 14 depicts the sources of N250 for semantic and nonsemantic faces projected onto a standard brain anatomy.
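The spatial PCA criterion described above (shrinking the window until one component explains more than 99.5% of the variance) can be sketched as follows; the array shapes and names are illustrative assumptions, not taken from the actual analysis pipeline:

```python
import numpy as np

def explained_variance_first_pc(segment: np.ndarray) -> float:
    """Fraction of variance captured by the first spatial principal
    component of a windowed grand-mean difference wave.

    segment: (n_channels, n_timepoints) array, e.g., the Block 4 minus
    Block 1 difference between 272 and 280 msec.
    """
    centered = segment - segment.mean(axis=1, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values
    return float(s[0] ** 2 / np.sum(s ** 2))

# A rank-1 topography (one fixed scalp map scaled over time) is fully
# captured by a single component, hence the 99.5% criterion signals
# that one symmetric dipole pair may suffice.
topography = np.random.default_rng(1).normal(size=(32, 1))
time_course = np.linspace(-1.0, 1.0, 5)[None, :]
print(round(explained_variance_first_pc(topography @ time_course), 3))  # → 1.0
```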
Location Differences between N250 for Semantic and Nonsemantic Faces
By visual inspection, N250 sources for semantic faces appeared to be located somewhat more medially and anteriorly than those for nonsemantic faces. Testing statistically for location differences would require separate model fits for each subject, but individual data suffer from a lower signal-to-noise ratio. We therefore applied a jack-knife procedure, which has been demonstrated to increase statistical power in analyses of LRP onset latency (Miller, Patterson, & Ulrich, 1998) and which has also been used to compare dipole source localizations (Schweinberger, Pickering, Jentzsch, et al., 2002; Leuthold & Jentzsch, 2001). Using the jack-knife procedure, 24 subsamples were calculated for each difference wave (Block 4 minus Block 1) for semantic and nonsemantic faces; for each subsample, the data from a different participant were omitted. The average waveform for each subsample was then fitted as described earlier, except that a fixed time window of 240–263 msec was used for all conditions, ensuring that the same components were compared. The mean location differences in dipole coordinates (x = lateral–medial, y = anterior–posterior, and z = superior–inferior) were calculated for the grand-average waveforms. Finally, one-tailed t tests were performed with the jack-knife-based standard error replacing the usual standard error calculated from individual location differences (Leuthold & Jentzsch, 2001; Miller et al., 1998). Mean source coordinates were x = ±44.4, y = −65.6, z = −10.0 for semantic and x = ±51.4, y = −76.2, z = −11.3 for nonsemantic faces. None of the comparisons between semantic and nonsemantic faces approached significance.
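A minimal sketch of the jack-knife-based standard error underlying these t tests (following Miller, Patterson, & Ulrich, 1998); the simulated data are illustrative only:

```python
import numpy as np

def jackknife_se(loo_estimates: np.ndarray) -> float:
    """Jack-knife standard error from n leave-one-out estimates.
    In the analysis above, each estimate would be a dipole-coordinate
    difference fitted on one of the 24 leave-one-out grand averages."""
    n = loo_estimates.shape[0]
    dev = loo_estimates - loo_estimates.mean()
    return float(np.sqrt((n - 1) / n * np.sum(dev ** 2)))

# Sanity check: for leave-one-out sample means, the jack-knife SE equals
# the classical standard error of the mean.
rng = np.random.default_rng(2)
sample = rng.normal(0.0, 1.0, 24)
loo = np.array([np.delete(sample, i).mean() for i in range(24)])
print(np.isclose(jackknife_se(loo), sample.std(ddof=1) / np.sqrt(24)))  # → True
```

The one-tailed t statistic is then the mean location difference divided by this jack-knife standard error, evaluated against t with n − 1 = 23 degrees of freedom.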
For both the behavioral and the ERP data, highly significant block effects were found, characterized by increasing performance and increasing ERP amplitudes starting with N170. For RTs and d′, multiple repetitions of learned faces resulted in an asymptotic learning curve. The analysis of ERPs showed that, apart from occipital P1, mean amplitudes in all analyzed time intervals were modulated and typically increased across blocks. However, this finding was not entirely due to face repetitions: A comparison of amplitude modulations between the first and the last block for learned and novel faces revealed main effects of learning only for inferior–temporal N250 (extending until 400 msec) and for central–parietal positivity in the time range between 400 and 600 msec. These findings are in line with studies associating N250 with the activation of stored perceptual face representations (Tanaka et al., 2006; Schweinberger et al., 2004; Schweinberger, Pickering, Jentzsch, et al., 2002) and central–parietal N400 effects with postperceptual representations of persons (Schweinberger, Pickering, Burton, et al., 2002; Eimer, 2000; Paller et al., 2000; Schweinberger, 1996), and they show how these components build up as face representations improve with repetition. A recent study on face learning by Tanaka et al. (2006) suggests that increases of N250 amplitude for repeatedly presented faces are larger for target than for nontarget faces. As the focus of the present study was not on effects of task relevance, all learned faces were targets. It is therefore quite possible that modulations of N250 due to face repetitions are to some extent task dependent (cf. also Trenner, Schweinberger, Jentzsch, & Sommer, 2004) and that N250 effects are larger for intentional as opposed to incidental learning conditions.
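For readers less familiar with the sensitivity measure, d′ is derived from hit and false-alarm rates via the inverse normal CDF; the snippet below shows the standard signal-detection formula (not code from the present study, and the example rates are made up):

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Signal-detection sensitivity: d' = z(hits) - z(false alarms).
    Rates of exactly 0 or 1 would require a correction (not shown)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Example: 84% hits with 16% false alarms yields d' of about 1.99.
print(round(d_prime(0.84, 0.16), 2))  # → 1.99
```

Because d′ separates sensitivity from response bias, it is better suited than raw accuracy for tracking the asymptotic learning curve across blocks.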
We consider the use of nonidentical images a particular strength of this study, as it makes mere picture learning highly unlikely, in contrast to other studies (e.g., Tanaka et al., 2006; Paller et al., 2000). It should be noted, however, that although pictures were not identical, strong variations of emotional expression or visual angle were avoided; otherwise, the task would simply have been too difficult. Given accuracy levels far below ceiling, we suppose that representations of learned faces may not yet have reached the same degree of picture independence as those of highly overlearned familiar faces.
In contrast to Schweinberger, Pickering, Burton, et al. (2002), who did not find N250 effects in a long-term repetition paradigm using familiar faces, the present study revealed an N250 for repetitions of learned faces despite many intervening stimuli. The most plausible interpretation for this discrepancy is that the present effects are contingent on the higher number of repetitions used in this study, an idea that is also in accord with other recent studies (Tanaka et al., 2006; Itier & Taylor, 2004). More evidence for long-term repetition effects in the time range of N250 for previously unfamiliar faces comes from a recent ERP study (Joyce & Kutas, 2005), but because that study did not record ERPs from inferior–temporal electrodes at which the N250 response is usually largest, it is difficult to compare those findings directly to the present ones.
Although there are clearly limitations regarding source localizations using EEG, sources obtained for N250 in this experiment are consistent with bilateral activity in fusiform gyrus and are remarkably close to those reported in a previous study using immediate repetition priming of familiar faces. The observation of nondifferent N250 source localizations for semantic and nonsemantic faces is consistent with the view that additional semantic information at learning facilitates postperceptual processing in recognition but does not facilitate perceptual analysis of learned faces.
The increase of N250 amplitudes across repetitions strikingly mirrors the behavioral data, suggesting that N250 is a correlate of accessing acquired face representations and is related to face familiarity (Caharel et al., 2002). However, although increases in N250 were greater for learned faces, there was some evidence for increased N250 across blocks for nonrepeated novel faces as well. At present, we can only speculate about the origin of the modulation for novel faces, but in this context it is interesting that a recent study applying discrimination training to nonfacial images (birds) reported increases of N250 that even generalized to nonpresented exemplars (Scott et al., 2006). This finding was interpreted as indicating that N250 reflects enhanced discrimination at the subordinate level due to training. How could these interpretations be reconciled? One possibility is that increased negativity in the time range of N250 is caused by two closely related processes: The early stage of N250 (starting at about 220 msec) might be a correlate of enhanced subordinate-level discrimination using structural information, increasing with training and time on task for learned and novel exemplars. The later aspect of N250, peaking around 270 msec, seems to emerge when incoming structural information is successfully matched with stored representations. In our opinion, there are not yet enough data to firmly put forward this speculative notion; however, it would be in line with the observation that, compared with familiar human faces, N250r is visible but smaller for immediate repetitions of (unfamiliar) ape faces (Schweinberger et al., 2004), and it also receives some support from the present study, as N250 was significantly smaller and peaked about 50 msec earlier for novel in comparison to learned faces. This question might be worth addressing in future research.
One limitation of the present design is that we cannot say with certainty whether differences between learned and novel faces were due to the initial learning stage or the fact that learned faces were repeated across blocks and therefore received priming whereas novel faces did not. Although there are claims that face learning and face priming are basically mediated by the same processes (e.g., Burton, 1994), their precise relationship is still unclear.
The finding that increases in N170 across blocks did not differ between learned and novel faces is compatible with the view that this component reflects processing prior to face identity retrieval (Tanaka et al., 2006; Jemel et al., 2003; Bentin & Deouell, 2000; Eimer, 2000; Bentin et al., 1996; but see Kloth et al., 2006, for M170, the magnetic counterpart of N170). Whereas a decrease in N170 has been described for repetitions of faces (Itier & Taylor, 2004), we found an N170 increase for novel and repeated faces across blocks. It is tempting to interpret these differences as reflecting repetition suppression, a process that is invoked in the case of same-image repetitions (Itier & Taylor, 2004), and repetition enhancement, a process that may more likely be elicited by different-image repetitions as was the case in the present study (for related ideas, see Henson, Shallice, & Dolan, 2000). This issue deserves more systematic investigation.
Finally, for learned faces, we observed a more bilateral increase of late inferior–temporal negativity, which appeared to begin in the time range of the N250. For repeatedly presented learned faces, the analyses of polynomial contrasts suggested bilateral increases in the negative amplitude of N250, whereas for novel faces an increase of N250 seemed to be limited to the right hemisphere. This finding is compatible with the view that, during the process of learning, stimuli are stored in increasingly extended and more bilateral neural assemblies. There is now ample evidence for a bilateral processing advantage for familiar faces and words, which is absent for matched unfamiliar stimuli (Schweinberger, Baird, Blümler, Kaufmann, & Mohr, 2003; Mohr, Landgrebe, & Schweinberger, 2002; Pulvermüller & Mohr, 1996). In particular, the left hemisphere might be crucial in generating more abstract representations of the visual world (Schweinberger, Ramsay, & Kaufmann, 2006; Burgund & Marsolek, 2000), a process that seems to depend on learning and stimulus variability.
The behavioral data of the present study demonstrate that rich semantic information at learning can enhance subsequent face recognition. Although there were no differences in RTs, overall accuracies and d′ values were higher for semantic than for nonsemantic faces. This finding is in line with previous reports of benefits of nonvisual information for face learning (Bonner et al., 2003; Paller et al., 2000). However, the results of the present experiment differ in some respects from the study of Paller et al. (2000). First, overall recognition rates were somewhat lower in our study, in particular for the first block, which is the only one that can be directly compared with Paller et al. (2000) because that study applied only one test block. The lower performance levels in our study may be due to the fact that we used nonidentical images at learning and test and excluded any paraphernalia such as clothing accessories, resulting in a more difficult task. There are also differences with respect to the ERP data. Whereas Paller et al. (2000) reported increased positivity at frontal sites for semantically learned faces between 300 and 600 msec, which they interpreted as a potential correlate of accessing semantic information, we found evidence for differences in the processing of semantic and nonsemantic faces at later time intervals, between 700 and 900 msec poststimulus onset, at inferior–temporal sites. The interaction between Hemisphere and Learning condition suggests that, starting at about 700 msec, semantic learning resulted in somewhat more positive ERPs over left hemispheric inferior–temporal electrodes. This late left temporal effect might be associated with the activation of semantic information. In this context, it is interesting to note that left temporal ERP correlates of semantic activation have been reported in tasks involving the semantic matching of famous faces (Huddy, Schweinberger, Jentzsch, & Burton, 2003).
An alternative explanation might be that, during learning, semantic faces were attended more closely because of the more engaging presentation, which included a voice and a short story. We also cannot rule out that it was the voice itself, rather than the content of the semantic information, that facilitated face learning, as face and voice perception have been claimed to be strongly intertwined (von Kriegstein, Kleinschmidt, Sterzer, & Giraud, 2005).
In any case, the latency of the left temporal effect of semantic learning suggests that semantic information facilitates face learning at a postperceptual level of processing. This interpretation is further corroborated by the finding of nondifferent N250 sources for semantic and nonsemantic faces. We would therefore suggest that ERP differences between semantic and nonsemantic faces are mainly due to concurrent memory retrieval of semantic information (possibly also shifting the response criterion toward a more liberal one) rather than to better consolidation and a better mental representation of semantic faces. However, it has to be noted that the correlation analyses between ERP components for semantic faces and behavioral performance in the semantic memory test do not provide unequivocal support for this interpretation. Although, for the time range between 700 and 900 msec, there was a positive correlation of r = .28 between left inferior–temporal positivity in the familiarity task and the percentage of correctly retrieved semantic information in the subsequent semantic information test, this correlation only approached significance. The small correlations between performance in the semantic memory test and ERP responses in the familiarity task might be due to the nature of our task, to interindividual ERP variability, or to a combination of both factors. As our study was not primarily aimed at this question, further research is needed here. A related and promising approach might be to directly compare neural activity at encoding for items depending on subsequent memory performance (Schott, Richardson-Klavehn, Heinze, & Duzel, 2002; Wagner, Koutstaal, & Schacter, 1999).
To summarize, the present study demonstrates that extended participation in a difficult face-recognition task results in a modulation of ERP components starting with N170. Modulations of N170 were not affected by face repetitions and probably reflect overall changes in the efficiency of processing configural face information. In contrast, the learning of individual faces was associated with a stronger increase of inferior–temporal N250 and extended inferior–temporal negativity, as well as with a central–parietal positive component between 400 and 600 msec. Semantic information yielded effects only at later time intervals, not before 700 msec poststimulus onset, suggesting that semantics enhance face learning by facilitating postperceptual processing rather than by enhancing visual representations.
This study was supported by a British Academy Postdoctoral Fellowship to J. M. Kaufmann (BAPDF0407) and grant 17/S14233 from the Biotechnology and Biological Sciences Research Council (UK) to S. R. Schweinberger and A. M. Burton. We gratefully acknowledge the help of Franziska Plessow, Romi Zäske, and Hanna Nölting in stimulus editing and data collection and all the volunteers who kindly agreed to have their faces and voices recorded for this study.
Reprint requests should be sent to Jürgen M. Kaufmann, Department of Psychology, Friedrich-Schiller-University of Jena, Am Steiger 3, 07743 Jena, Germany, or via e-mail: email@example.com.
However, significant main effects of block were found in separate post hoc analyses for both the left, F(3, 69) = 20.32, p < .0001, and the right hemisphere, F(3, 69) = 40.0, p < .0001.
Following the suggestion of an anonymous reviewer, we also calculated Pearson product–moment correlations (α = .05) between amplitudes of ERP components for semantic faces in the familiarity task and performance in the semantic memory task (percent correct). To this end, we calculated mean amplitudes across the four blocks of the familiarity task. For each ROI, data were averaged across included electrodes (for the inferior–temporal ROI, separate means were calculated for the left and the right hemisphere). Moderate correlations were found between the semantic memory test and the right hemispheric inferior–temporal N170, r = .43, p < .05, indicating that better semantic test performance was associated with smaller (less negative) N170 in the familiarity task. Correlations with the semantic memory test approached significance for the left inferior–temporal N250, r = .29, p < .10, for central–parietal amplitude in the 400- to 600-msec segment, r = .31, p < .10, and for left inferior–temporal amplitude in the 700- to 900-msec segment, r = .28, p < .10. None of the other correlations with performance in the semantic memory test approached significance (occipital P100, r = .04; left inferior–temporal N170, r = .25; right inferior–temporal N250, r = .06; 300–400 msec, left inferior–temporal, r = .22, right inferior–temporal, r = .04, frontal, r = −.03, central–parietal, r = .13; 400–600 msec, left inferior–temporal, r = .06, right inferior–temporal, r = −.09, frontal, r = .14; 700–900 msec, right inferior–temporal, r = .00, frontal, r = .06, central–parietal, r = .08).
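The correlations reported here are plain Pearson product–moment coefficients; as a sketch (illustrative, not the original analysis script), each could be computed as:

```python
import numpy as np

def pearson_r(x, y) -> float:
    """Pearson product-moment correlation between, e.g., per-subject
    ERP amplitudes and percent correct in the semantic memory test."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x - x.mean(), y - y.mean()
    return float(np.sum(xm * ym) / np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2)))

# Perfectly linear data give r = 1 (or -1 for a decreasing relation).
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
print(pearson_r([1, 2, 3, 4], [4, 3, 2, 1]))  # → -1.0
```

With n = 24 participants, a correlation of about r ≈ .40 is needed to reach two-tailed significance at α = .05, which is why the r ≈ .28 to .31 values above only approached significance.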