Rapid identification of a familiar face requires an image-invariant representation of person identity. A varying sample of familiar faces is necessary to disentangle image-level from person-level processing. We investigated the time course of face identity processing using a multivariate electroencephalography analysis. Participants saw ambient exemplars of celebrity faces that differed in pose, lighting, hairstyle, and so forth. A name prime preceded a face on half of the trials to preactivate person-specific information, whereas a neutral prime was used on the remaining half. This manipulation helped dissociate perceptual- and semantic-based identification. Two time intervals within the post-face onset electroencephalography epoch were sensitive to person identity. The early perceptual phase spanned 110–228 msec and was not modulated by the name prime. The late semantic phase spanned 252–1000 msec and was sensitive to person knowledge activated by the name prime. Within this late phase, the identity response occurred earlier in time (300–600 msec) for the name prime with a scalp topography similar to the FN400 ERP. This may reflect a matching of the person primed in memory with the face on the screen. Following a neutral prime, the identity response occurred later in time (500–800 msec) with a scalp topography similar to the P600f ERP. This may reflect activation of semantic knowledge associated with the identity. Our results suggest that processing of identity begins early (110 msec), with some tolerance to image-level variations, and then progresses in stages sensitive to perceptual and then to semantic features.
We are easily able to recognize familiar faces, even in very blurry or noisy photos (Jenkins & Burton, 2011). In striking contrast, we experience difficulty recognizing faces of unfamiliar individuals (Johnston & Edmonds, 2009). For example, different face photos of the same unfamiliar person are more likely to be interpreted as belonging to different people than different photos of the same familiar person (Jenkins, White, Van Montfort, & Burton, 2011). Furthermore, when matching pairs of unfamiliar faces, accuracy and RTs are similar to those obtained when matching pairs of inverted faces; matching familiar faces is completed much faster and with higher accuracy (Megreya & Burton, 2006). Such findings suggest a transformation from noisy image-based processing for unfamiliar faces to selective person-based processing for familiar faces. The neural computations underlying familiar face recognition are unknown, but we believe that an understanding of the time course of familiar face processing would be a useful starting place for their explication.
Cognitive models of face recognition (Burton, Jenkins, & Schweinberger, 2011; Bruce & Young, 1986) typically comprise a series of processing stages organized in a feed-forward hierarchy whereby person recognition progresses from initial processing of physical face features to later stages where semantic information is accessed. Neural models of face processing based on ERP measurements have typically associated processing stages derived from such cognitive models with a temporal sequence of distinct ERP components. For example, the perceptual analysis stage involves the initial encoding of the visual properties of a face and the generation of a viewpoint-invariant representation of that face. This stage has been associated with the appearance of an N170 ERP, a negative voltage deflection maximal in amplitude over the right temporoparietal scalp and peaking 140–180 msec after face onset (Eimer, 2011; Bentin, Allison, Puce, Perez, & McCarthy, 1996). A subsequent face recognition stage occurs when the perceptual face representation is matched with a stored memory trace of that face. This cognitive processing stage has been associated with two ERP components with latencies of ∼250 msec, the N250 and N250r, which are largest over the inferior temporal scalp (Caharel, Ramon, & Rossion, 2014; Zheng, Mondloch, & Segalowitz, 2012; Gosling & Eimer, 2011; Tacikowski, Jednoróg, Marchewka, & Nowicka, 2011; Bindemann, Burton, Leuthold, & Schweinberger, 2008; Neumann & Schweinberger, 2008; Martens, Schweinberger, Kiefer, & Burton, 2006; Herzmann, Schweinberger, Sommer, & Jentzsch, 2004; Schweinberger, Pickering, Jentzsch, Burton, & Kaufmann, 2002). Once the face is recognized, a person identification stage links the face to semantic information about the person (e.g., their name).
This cognitive processing stage has been associated with the appearance of the N400f, a centroparietal ERP occurring in the 300–500 msec range (Bentin & Deouell, 2000; Eimer, 2000a, 2000b), and the P600f, a centroparietal ERP occurring in the 400–600 msec range (Taylor, Shehzad, & McCarthy, 2016; Curran & Hancock, 2007; Boehm & Paller, 2006).
This hierarchical processing model has been challenged by recent studies using a variety of methodologies that report higher level face processing occurring at latencies presumed previously to be dominated by low-level visual processing. For example, multivariate pattern analysis (MVPA) performed on electroencephalography (EEG) recordings from subdural electrodes located on the fusiform face area revealed face selectivity beginning at 50 msec after the onset of a face stimulus compared with nonface stimuli (Ghuman et al., 2014). An MVPA analysis of magnetoencephalography (MEG) recordings presumed to originate in the fusiform face area also found that facial identity can be decoded at ∼50 msec (Vida, Nestor, Plaut, & Behrmann, 2017), and MVPA analysis of EEG recordings found that facial identity can be decoded as early as 70 msec (Nemrodov, Niemeier, Mok, & Nestor, 2016). Repetitive TMS applied at 60 and 100 msec and presumed to disrupt the occipital face area was shown to affect the accurate discrimination of face parts (Pitcher, Walsh, Yovel, & Duchaine, 2007). In the scalp-recorded ERP literature, face-specific processing has been found earlier than the N170 (Herrmann, Ehlis, Ellgring, & Fallgatter, 2005; Batty & Taylor, 2003; Ito & Urland, 2003); for instance, face inversion and face race have been shown to influence ERPs at ∼100 msec (P100) even when controlling for low-level image properties (Colombatto & McCarthy, 2017). These ERP differences occur in the same time range as involuntary “fast saccades” toward human faces demonstrated in eye-tracking studies (Crouzet, Kirchner, & Thorpe, 2010). The evidence of higher order face processing at short latencies across these studies suggests the existence of a nonhierarchical neural architecture where higher order processes emerge early and potentially without input from lower order processes (Rossion, Dricot, Goebel, & Busigny, 2011; Rossion, 2008).
Differences in methodology may be partly responsible for the discrepancy between the earlier ERP literature, which suggests that face recognition begins at ∼170–250 msec (Marzi & Viggiano, 2007; Caharel, Courtay, Bernard, Lalonde, & Rebaï, 2005; Jemel, Pisani, Calabria, Crommelinck, & Bruyer, 2003; Caharel et al., 2002; Bentin & Deouell, 2000; Eimer, 2000b), and the more recent studies that challenge that conclusion. Studies examining specific ERP components have typically relied upon univariate analyses that average activity across trials and then across participants. Although recordings are usually acquired at many scalp electrodes, measurement and statistical analysis of particular components usually occur only at a subset of scalp locations. That is, although traditional ERP analysis usually describes the average scalp distribution of voltage associated with a particular ERP component, the full distribution is often not analyzed.
In contrast, MVPA evaluates patterns of electrical activity across all electrodes to build statistical models that classify individual ERP trials as belonging to particular categories (e.g., face/nonface, familiar/unfamiliar face). These models are optimized for each individual participant, and thus, the patterns resulting in optimal classification performance need not be the same for different participants (Pernet, Sajda, & Rousselet, 2011; Rousselet, Gaspar, Wieczorek, & Pernet, 2011). Moreover, in MVPA, it is significant classification accuracies, rather than mean amplitude differences, that are aggregated across participants (Norman, Polyn, Detre, & Haxby, 2006). For these reasons, MVPA has been found to be more sensitive than univariate analyses in fMRI research (Jimura & Poldrack, 2012). MVPA is finding increased utility in ERP research of familiar faces; for example, an ERP study that used MVPA to discriminate famous and novel faces found significant differences in classification beginning as early as 140 msec (Barragan-Jason, Cauchoix, & Barbeau, 2015).
In examining person identity, prior ERP studies have ignored the natural variability in faces encountered every day, which includes variation in pose, lighting, expression, and age. This variation in the so-called “ambient images” (Jenkins et al., 2011) can be considerable and must be accounted for to determine a person's identity. Most studies remove such variation from the stimulus set and use only a single photo to represent a particular person. Without a sufficiently variable sample of familiar faces, studies cannot accurately capture the invariance to image-level differences that defines person identity.
In this study, we applied MVPA to the EEG signal and used a large selection of ambient images from well-known celebrities to study familiar face recognition. Our goal was to determine the minimum latency for reliable neural signatures of person identity. Participants were shown 18 ambient images for each of three female celebrities and indicated the identity of the celebrity shown on each trial with a button press. On half the trials, we manipulated access to person identity with a semantic prime consisting of the name of the celebrity to be shown (name prime) or a nonidentifying general cue (neutral prime). We first identified the time points where the patterns of EEG activity across 64 scalp locations were associated with identity. We then identified the electrodes at those time points, which were driving the effect. To assess identity processing, we measured variation in activity between identities across various photos. We distinguished between the timing of low-level (perceptual) and high-level (semantic) features related to identity by considering the effects of the prime on our measure of identity processing. Because the name prime consisted of semantic information, we considered semantic processing of identity to occur whenever the name prime modulated identity processing. The validity of the semantic information conveyed by the name prime allowed us to also test if identity processing could be shifted in time due to a prior expectation.
Sixteen right-handed, healthy young adults (13 women, average age = 19.6 years) with normal or corrected vision participated in the study in exchange for compensation. Because of technical difficulties, data for one participant were not available for analysis. All procedures were approved by Yale University's institutional review board, the Human Studies Committee. Informed consent was obtained from all participants.
We identified nine celebrities that were well known to all participants. We collected photos of each celebrity from the FaceScrub (Ng & Winkler, 2014) and PubFig (Kumar, Berg, Belhumeur, & Nayar, 2009) data sets supplemented by images obtained from a Google Image search. We chose photos in which a face could be detected and landmarks (e.g., nose, eyes) could be accurately identified using an automated algorithm in the dlib toolbox (King, 2009). The selected photos for each identity had considerable image variability reflecting natural variation in pose, luminance, expression, content, hairstyle, makeup, and age. Figure 1 demonstrates that variability in the photos of one celebrity. In this experiment, three celebrities served as targets (Angelina Jolie, Jennifer Aniston, and Julia Roberts), and six celebrities served as distractors (Anne Hathaway, Ellen DeGeneres, Kate Winslet, Keira Knightley, Sandra Bullock, and Tina Fey). Eighteen photos were shown for each target stimulus, whereas two photos were shown for each distractor stimulus.
We calculated the mean brightness (luminance) of each photo for use as a low-level vision control. Each photo was converted from RGB to HSL color space, and the mean of the lightness channel was taken as the mean brightness of the image.
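The brightness measure described above can be sketched as follows (a minimal illustration, not the authors' code; pixel channels are assumed to be scaled to [0, 1]):

```python
# Sketch of the mean-brightness measure (assumed implementation).
# HSL lightness of an RGB pixel is (max(R, G, B) + min(R, G, B)) / 2.
def mean_brightness(pixels):
    """pixels: iterable of (r, g, b) tuples with channels in [0, 1]."""
    pixels = list(pixels)
    total = sum((max(p) + min(p)) / 2.0 for p in pixels)  # HSL lightness
    return total / len(pixels)
```

Averaging the lightness channel in this way yields a single scalar per photo that can enter the multivariate model as a nuisance regressor.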
Participants viewed faces of celebrities while continuous EEG was recorded. Each trial began with text displayed for 500 msec, followed by a fixation cross for 500 msec (Figure 2). In the name prime condition, the text indicated the name of the celebrity to be shown. In the neutral prime condition, the word “Face” was shown. One second after the onset of the text, a photo of a celebrity was displayed for 500 msec. Photos were resized to have the same area (256 × 256 pixels) while maintaining their native aspect ratios. To increase participant uncertainty and to ensure attention to the faces, photos of distractor celebrities were displayed on 10% of name prime trials and 10% of neutral prime trials. Participants were instructed to press a specific button for each of the three target celebrities and withhold a response for a distractor face. These relatively few invalid trials were not intended for further analyses and will not be discussed further. Trials were separated by an intertrial interval of 1900–2700 msec. Participants were instructed to maintain fixation at the center of the screen. Participants completed 10 runs with 72 trials in each run.
Given the large variation present in the ambient images, different features might be relevant in discriminating between identities. In particular, we sought to assess whether low-level visual features could be used to discriminate between the three identities. We applied a computational model that reproduces the population behavior of V1 (Güçlü & van Gerven, 2014) to our celebrity photos. The model uses independent component analysis (ICA) to learn topographically organized simple and complex cell receptive fields from natural images (code can be found at https://github.com/artcogsys/Neural-coding). For each photo, the model outputs the activation associated with the simulated simple and complex cells. Because of redundancies in the simulated V1 activity, we applied PCA and selected the components with significant eigenvalues (see section on Multivariate Analyses below for our approach), which reduced the dimensionality of the model output. Then, to test whether there are consistent differences in the physical features between the identities, we applied a multivariate analysis (description below) to examine whether the pattern of V1 responses generated by the PCA-reduced model discriminates between photos of the three identities.
We conducted a two-way repeated-measures ANOVA on accuracy (i.e., in identifying the target identity) and RTs using the R statistical package (Version 3.3.1). We included name versus neutral prime condition as a factor and the three celebrity identities as a second factor. We also conducted a separate one-way repeated-measures ANOVA on false alarms (i.e., when participants failed to inhibit a response to a distractor stimulus) and included only name versus neutral prime as a factor.
EEG data were acquired continuously using a Compumedics Synamps RT amplifier system. The EEG was recorded from Ag/AgCl electrodes using a 64-channel Neuroscan Quik-Cap (consisting of all the electrodes in the 10–20 system along with interposed electrodes). The participant's nose was used as a reference. Each EEG channel was amplified with a gain of 2200, digitized at 500 Hz with 24-bit resolution, and passed through a 0.05–100 Hz band-pass filter. Additional horizontal and vertical EOG channels were placed to monitor eye movements and blinks.
Artifact Removal and Trial Epochs
We used ICA to remove artifacts (e.g., eye blinks) from the continuous EEG signal. Because ICA can be sensitive to bad channels, we first identified such channels (seven participants had one to three bad channels) and excluded them from the ICA and subsequent analyses. Continuous data from the different experimental runs were concatenated together, and ICA was then run using the infomax algorithm in EEGlab v14.1 (Delorme & Makeig, 2004). Artifact components were automatically detected using the SASICA plugin to EEGlab (Chaumon, Bishop, & Busch, 2015). We detected noisy components with low autocorrelation, focal spatial activity, focal temporal activity, and time courses that correlated with the EOG channels. We also used the algorithms ADJUST and FASTER to identify artifactual components that reflected blinks, eye movements, or generic discontinuities (Mognon, Jovicich, Bruzzone, & Buiatti, 2011; Nolan, Whelan, & Reilly, 2010). The rejected components included high-frequency noise such as muscle activity and power line noise. We confirmed the selected components were artifacts by manually checking the component topography and time series. Any components that appeared to be signal were retained, whereas the remaining noise components were removed. For each participant, an average of 22 components were removed (range = 13–31).
Single trial epochs were extracted from the artifact-cleaned continuous EEG data consisting of the interval beginning 100 msec before the presentation of each face to 1000 msec afterward. The average of the 100-msec prestimulus period was subtracted from all data points for that epoch and thus served as a baseline for amplitude measurements. These single trial epoch data were then used for all subsequent analyses.
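The epoching and baseline-correction step can be sketched as follows (a minimal illustration, not the authors' pipeline; the channels-by-samples array layout is an assumption, and the sampling rate follows the 500 Hz stated above):

```python
import numpy as np

def extract_epoch(eeg, onset, sfreq=500, pre=0.1, post=1.0):
    """Cut one trial epoch from continuous EEG and baseline-correct it.

    eeg: channels x samples array; onset: face onset in samples.
    The mean of the 100-msec prestimulus window is subtracted from
    every time point, as described in the text.
    """
    n_pre, n_post = int(pre * sfreq), int(post * sfreq)
    epoch = eeg[:, onset - n_pre:onset + n_post].astype(float)
    baseline = epoch[:, :n_pre].mean(axis=1, keepdims=True)
    return epoch - baseline  # channels x (n_pre + n_post)
```

Stacking one such epoch per trial yields the trials × channels × time array that the subsequent multivariate analyses operate on.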
Our analytic approach was adapted from prior fMRI work (Shehzad et al., 2014; Reiss, Stevens, Shehzad, Petkova, & Milham, 2010) and examined whether variation in patterns of activity in the EEG signal could be explained by the prime and person identity. We defined patterns of activity as the topographic map (i.e., across 64 scalp electrodes) at each time point for each EEG trial. There were four steps in our analysis, with the first three steps done separately for each participant. The patterns of activity contained much redundant information so, in the first step, we used PCA to reduce data dimensionality at each time point (64 electrodes × 648 trials). We retained for further analysis those components with eigenvalues that were greater than chance (p < .05). To obtain a null distribution of eigenvalues, we first separately permuted the trial indices at each electrode and then recomputed the PCA based on the shuffled data. This procedure was repeated 25 times. The p value for each component was calculated by comparing the original eigenvalue to the distribution of 25 permuted eigenvalues. For a typical time point, the dimensionality of activity patterns reduced from 64 electrodes to three to six components (mean variance explained: 90 ± 6%).
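The permutation-based component selection might look like the following sketch (an assumed implementation, not the authors' code; the default of 25 permutations mirrors the procedure described above):

```python
import numpy as np

rng = np.random.default_rng(0)

def significant_components(X, n_perm=25, alpha=0.05):
    """Keep PCA components whose eigenvalues beat a permutation null.

    X: trials x electrodes at one time point. The null is built by
    shuffling trial indices independently within each electrode, which
    destroys cross-electrode correlation while preserving marginals.
    """
    def eigvals(M):
        s = np.linalg.svd(M - M.mean(axis=0), compute_uv=False)
        return s ** 2  # PCA eigenvalues (up to a constant factor)
    obs = eigvals(X)
    null = np.empty((n_perm, len(obs)))
    for i in range(n_perm):
        shuffled = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])])
        null[i] = eigvals(shuffled)
    p = (null >= obs).mean(axis=0)  # per-component permutation p value
    return np.where(p < alpha)[0]
```

Components surviving this test capture shared structure across electrodes; the rest are treated as noise and dropped before the distance computation.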
In the second step, we assessed variation in patterns of activity at each time point and across trials by calculating the distance between topographic maps for all pairs of trials—resulting in a distance matrix. We computed the Mahalanobis distance, which measures the number of standard deviations between two topographic maps based on the distribution of pairwise trial-by-trial distances. It is similar to the Euclidean distance but takes into account correlations within the data set. In previous work in which this approach was applied to fMRI data, the Mahalanobis distance was the most sensitive distance metric among an array of metrics tested (Shehzad et al., 2014).
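The pairwise Mahalanobis distance computation can be sketched as follows (illustrative only; assumes a trials × components input with an invertible covariance, which the PCA reduction above guarantees):

```python
import numpy as np

def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances between rows of X (trials x components).

    Like Euclidean distance, but whitened by the inverse covariance so
    that correlated dimensions are not double-counted.
    """
    Xc = X - X.mean(axis=0)
    VI = np.linalg.inv(np.cov(Xc, rowvar=False))  # inverse covariance
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        d = Xc[i] - Xc  # differences from trial i to every trial
        D[i] = np.sqrt(np.einsum('ij,jk,ik->i', d, VI, d))
    return D
```

One such distance matrix is computed per time point and passed to the regression step below.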
For the third step, we applied multidimensional distance matrix regression (MDMR) to determine if the trial labels for prime and person identity can explain the distances between trial-by-trial activity patterns. Four factors were examined: image luminance (low-level vision control), the effects of name versus neutral prime, identity, and the interaction of Prime × Identity. For each factor, MDMR yields a pseudo-F statistic analogous to an F statistic from a standard ANOVA model (for details, see Shehzad et al., 2014; Reiss et al., 2010). Because the pseudo-F statistic does not have an F distribution under the null hypothesis, its significance is assessed with a permutation test. The null distribution was simulated by applying a random permutation to the trial indices for each variable of interest (e.g., identity) 10,000 times and recomputing the pseudo-F statistic each time. This resulted in a separate null distribution for each variable of interest. The p value was calculated by comparing the original pseudo-F statistic for each variable to its associated null distribution.
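The pseudo-F statistic and its permutation test can be illustrated with a generic Gower-centered (PERMANOVA-style) sketch for a single categorical factor; this is not the authors' code, and the full multifactor model is given in Shehzad et al. (2014):

```python
import numpy as np

rng = np.random.default_rng(1)

def pseudo_f(D, labels):
    """MDMR pseudo-F: variance in pairwise distances D (n x n) explained
    by a categorical factor (one label per trial)."""
    n = len(labels)
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J  # Gower-centered inner-product matrix
    X = np.array([labels == u for u in np.unique(labels)], float).T
    H = X @ np.linalg.pinv(X)    # projection onto the design space
    m = X.shape[1] - 1           # degrees of freedom for the factor
    num = np.trace(H @ G @ H) / m
    den = np.trace((np.eye(n) - H) @ G @ (np.eye(n) - H)) / (n - m - 1)
    return num / den

def mdmr_pvalue(D, labels, n_perm=1000):
    """Permutation p value: shuffle trial labels, recompute pseudo-F."""
    obs = pseudo_f(D, labels)
    null = np.array([pseudo_f(D, rng.permutation(labels))
                     for _ in range(n_perm)])
    return (np.sum(null >= obs) + 1) / (n_perm + 1)
```

Because only the labels are permuted, the distance matrix is computed once per time point and reused across all 10,000 permutations.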
Finally, in the fourth step, we combined the statistics across participants to obtain group statistics. For each factor, we averaged the pseudo-F statistics across the 15 participants for the original and permuted data. The group-averaged pseudo-F statistic from the original data was referred to the simulated distribution to obtain a p value. Thus, at each time point, we had a group-averaged pseudo-F statistic and p value for each factor.
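The group-level combination might look like the following sketch (an assumed implementation; it requires that permutations be aligned across participants so that each column of the permuted statistics forms one group-level null sample):

```python
import numpy as np

def group_pvalue(obs_f, perm_f):
    """Combine MDMR statistics across participants.

    obs_f: per-participant pseudo-F values, shape (n_subjects,).
    perm_f: permuted pseudo-F values, shape (n_subjects, n_perm).
    Averaging each column across participants yields one sample of the
    group-level null distribution.
    """
    group_obs = obs_f.mean()
    group_null = perm_f.mean(axis=0)
    return (np.sum(group_null >= group_obs) + 1) / (len(group_null) + 1)
```

This is applied independently at every time point and for every factor, producing the per-time-point p values that enter the cluster correction below.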
We controlled for family-wise error rate by using a permutation-based cluster correction (time point threshold of p < .05 and cluster size threshold p < .05). We split the permuted data into halves (Sets A and B). Permutation Set B was used to obtain p values at each time point for the original data and permutation Set A. Permutation Set A was then used to obtain cluster p values. We thresholded the original data and permutation Set A (p < .05) and identified clusters as contiguous time points with significant effects. To determine cluster significance, the null distribution was simulated by taking the maximum cluster size for each permutation in Set A. We referred each cluster in the original data to this simulated distribution to obtain a cluster p value, which was then thresholded at p < .05.
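The max-cluster-size correction can be sketched as follows (illustrative; the thresholds follow the p < .05 values stated above, and the split into permutation halves is omitted for brevity):

```python
import numpy as np

def clusters(sig):
    """Contiguous runs of True in a boolean sequence -> (start, length)."""
    runs, start = [], None
    for i, s in enumerate(list(sig) + [False]):
        if s and start is None:
            start = i
        elif not s and start is not None:
            runs.append((start, i - start))
            start = None
    return runs

def significant_clusters(p_obs, p_perm, alpha=0.05):
    """Max-cluster-size permutation correction.

    p_obs: time-point p values for the original data, shape (n_time,).
    p_perm: time-point p values per permutation, shape (n_perm, n_time).
    A cluster of sub-threshold time points survives if its length is
    rarely matched by the largest cluster in any permutation.
    """
    null_max = np.array(
        [max((ln for _, ln in clusters(row < alpha)), default=0)
         for row in p_perm])
    surviving = []
    for start, ln in clusters(p_obs < alpha):
        p_clu = (np.sum(null_max >= ln) + 1) / (len(null_max) + 1)
        if p_clu < alpha:
            surviving.append((start, ln, p_clu))
    return surviving
```

Taking only the maximum cluster size from each permutation is what controls the family-wise error rate across all time points simultaneously.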
Post hoc Univariate Analyses
MDMR served as an omnibus test to identify time points with significant effects for each factor. Post hoc univariate analyses were then employed to determine the direction of the effect and the particular electrodes involved. We selected relevant time points based on peaks in the MDMR analysis and examined the topographic maps. We used linear mixed-effects analysis or ANOVAs to measure how activity at each electrode varied with our factors of interest. Based on the topographic results, we selected relevant electrodes to examine the effects of each factor over time. In comparing the univariate and multivariate results, it is possible to find time points with significant multivariate effects, but no significant univariate effects. This reflects the increased sensitivity of multivariate relative to univariate analyses for the reasons described in the Introduction.
We sought to understand whether the early phase of face identity could be based on low-level visual features that were consistent for an identity, despite the variation present in the ambient images. Using our MDMR analysis, we found greater-than-chance discrimination of the three celebrity identities based solely on features from a model simulating V1 responses to natural images (p < .05). The correlations between the pattern of V1 responses for each photo were higher within identity (mean r = .19) than between identities (mean r = .09). For each identity, the photos were most similar for Julia Roberts (mean r = .20), less similar for Jennifer Aniston (mean r = .18), and least similar for Angelina Jolie (mean r = .17). This suggests that the early effects for identity processing might be based on consistent differences in low-level physical features between identities.
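The within- versus between-identity comparison can be sketched as follows (illustrative only; assumes a photos × features activation matrix from the V1 model and one identity label per photo):

```python
import numpy as np

def within_between_corr(features, labels):
    """Mean pairwise Pearson correlation of feature vectors within and
    between identities. features: photos x features array."""
    C = np.corrcoef(features)                # photo-by-photo correlations
    same = np.equal.outer(labels, labels)    # same-identity pairs
    off = ~np.eye(len(labels), dtype=bool)   # drop self-correlations
    return C[same & off].mean(), C[~same].mean()
```

A within-identity mean exceeding the between-identity mean indicates that the V1 features carry identity-consistent structure despite image variation.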
We collected three behavioral measures during the EEG recording: accuracy, RT, and numbers of false alarms.
We found a significant main effect of Name versus Neutral prime, F(1, 67) = 6.2, p < .05, on accuracy and a marginally significant effect of Person identity, F(2, 67) = 3.1, p = .05. Participants were more accurate at identifying the target identity for the name prime (M = 97.9%) than neutral prime (M = 95.3%) trials. Responses were least accurate for Julia Roberts (M = 94.8%) compared with Angelina Jolie (M = 97.1%) and Jennifer Aniston (M = 97.8%), suggesting that participants were less familiar with Julia Roberts.
We also found a significant main effect of the Name versus Neutral prime, F(1, 67) = 40.9, p < .05, and Person identity, F(2, 67) = 71.3, p < .05, on RT (due to a technical error, RTs for one participant were missing). RTs were, on average, faster for the name prime (M = 523 msec) than neutral prime (M = 667 msec) trials. In terms of the person identity, RTs were faster for Angelina Jolie (M = 523 msec) than Jennifer Aniston (M = 638 msec) and slowest for Julia Roberts (M = 713 msec), suggesting that participants were most familiar with photos of Angelina Jolie and least familiar with Julia Roberts.
Finally, we found a significant main effect of the Name versus Neutral prime, F(1, 13) = 8.7, p < .05, on the number of False alarms. Participants were more likely to respond to a distractor face on name prime (M = 7.1%) than neutral prime (M = 2.2%) trials. This result provided a manipulation check in that it demonstrated that participants had a greater expectancy for the name primed face. In terms of person identity, we did not find a significant difference between identities on the proportion of false alarms, F(2, 67) = 0.97, p = .36. However, the distribution of false alarms by identity parallels the results for RTs. Participants had the lowest proportion of false alarms for Angelina Jolie (M = 12%), followed by Jennifer Aniston (M = 14%), and the most false alarms for Julia Roberts (M = 15%).
Figure 3 presents the outcome of the MDMR analysis. The pseudo-F statistic representing the degree of explained variance in the scalp EEG activity patterns is plotted at each time frame for the luminance, name versus neutral prime, and identity manipulations. The final plot represents the interaction of the prime and identity manipulations (i.e., time points at which the identity manipulation was modulated by the prime manipulation). The pink-shaded regions indicate time frames that reached statistical significance using the cluster correction methods described above. Variation in each of our manipulations explained variation in the EEG activity patterns over different time frames. Notably, both the prime and identity manipulations explained significant variation in the EEG activity patterns over long time intervals beginning at very short latencies. We note that RTs were systematically different between the three identities—not a surprising effect as familiarity strongly affects RT. RT differences could, however, introduce spurious differences in the multivariate analyses unrelated to face identity. When we removed the trial-by-trial variation in RTs (Todd, Nystrom, & Cohen, 2013), we found a similar pattern of multivariate results, which suggests that our findings reflect actual processes underlying face identity rather than differences in motor responses. In what follows, we will drill down into these significant time regions. MDMR will help us first identify the particular time points with significant effects. We then follow up with univariate analyses (Figures 4–7) to examine the scalp topographies at particular latencies of interest and time courses at particular electrodes of interest.
We included a measure of image luminance as a low-level vision control that would also provide a plausibility check for the MDMR results. We found the mean luminance in each image explained variation in the activity pattern maps between 98 and 234 msec with a prominent peak at 128 msec (Figure 3). We further investigated the significant MDMR results with univariate analyses (t statistics) to examine the direction and location of the luminance effect. The topographic map (right) for the univariate effect of luminance at 128 msec showed a more positive response for images with higher luminance in posterior occipitotemporal regions and a more negative response for images with higher luminance in frontal regions (Figure 4). The ERP amplitude at 128 msec (left) showed a left dominant but bilobed positive ERP distribution over posterior scalp, reminiscent of a P100 ERP distribution.
Name versus Neutral Prime
The topographic patterns of the Name and Neutral prime condition were significantly different for nearly the entirety of the epoch beginning at 62 msec before face onset (p < .05; Figure 3). We noted an increase in the explained variance (in the MDMR's pseudo-F statistic) for name versus neutral prime at 158 msec and a maximal peak in the pseudo-F statistic at 566 msec. At 158 msec, a canonical N170 ERP scalp distribution with a right posterior temporoparietal focus was clearly evident (Figure 5A). A similar scalp distribution was found for the difference between Name and Neutral prime at 158 msec using univariate analyses. In centroparietal regions, the response was greater for Neutral versus Name prime trials, whereas in temporoparietal regions, the response was greater for Name versus Neutral prime trials. However, the difference was not significant at individual electrodes (univariate analysis) unlike the multivariate result. In addition, when examining the time course of the PO8 electrode, we observed an N170 peaking at 156 msec but did not observe significant differences between name and neutral prime until 389–740 msec (p < .05; Figure 5B). At the peak of the MDMR prime effect at 566 msec, the topography was maximal over centroparietal scalp peaking at the Cz electrode. The topography of the univariate prime effect was similar, but slightly posterior, and peaked at the Pz electrode (Figure 5A). When examining the time course of the Pz electrode, we observed a more positive response for the neutral versus name prime condition throughout the epoch with a sustained and significant difference between 242 and 834 msec (p < .05; Figure 5B).
Variation in the activity patterns evoked by the photographs was significantly explained by the three celebrity identities in two time windows: 110–228 and 252–1000 msec (Figure 3). Within these intervals, we examined the scalp topographies using univariate analyses associated with four peaks in the MDMR's pseudo-F statistic at 180, 408, 488, and 614 msec. The earliest peak (180 msec) showed an effect of Identity in posterior bilateral temporoparietal regions and in frontal regions (Figure 6A). At 408 msec, variation in activity related to identity was centered over right frontal regions. At 488 msec, activity related to identity extended bilaterally in frontal regions (not shown in Figure 6). The maximal peak of the Identity effect using MDMR occurred at 614 msec, and the associated topographic map for identity at that time showed a posterior centroparietal focus, similar to the topographic distribution of the prime effect at 566 msec. Time courses at select electrodes showed significant variation related to Identity before about 500 msec in frontal electrodes (230–514 msec at F5 and 238–514 msec at F6) and after about 500 msec in posterior electrodes (552–924 msec at Pz and 652–926 msec at PO8; Figure 6B). At PO8, we also observed significant effects related to Identity intermittently between 158 and 218 msec, which suggests that the N170 ERP is modulated by between-person identity.
Interaction of Prime and Between-person Identity
We explored whether priming might modulate identity processing. We found that the interaction of prime and identity significantly explained variation in topographic patterns between 290–814 and 908–1000 msec (Figure 3). In other words, before 300 msec, identity processing occurred independently of the name prime (e.g., 110–228 msec). After 300 msec, semantic (name) priming modulated identity processing.
To better understand this interaction, we separately examined the effects of identity in name and neutral prime trials using univariate analyses. We examined univariate topographic maps at three time points to cover the range of the interaction effect found initially with MDMR (Figure 7A). At 304 msec, we found an effect of Identity in left posterior frontal electrodes for name prime trials but no significant effect for neutral prime trials. At 630 msec, we observed similar topographic distributions for the effect of identity in both name and neutral prime conditions within posterior centroparietal regions. At 714 msec, we found no significant effect of Identity for name prime trials but a significant effect for neutral prime trials in posterior centroparietal regions. Time courses at FC5, Pz, and PO8 supported this distinction of an earlier effect of identity processing in name prime trials but a later effect in neutral prime trials (Figure 7B).
We analyzed the multivariate signal derived from the EEG scalp distribution at sequential time frames to investigate the timing of face identity processing. MDMR analysis demonstrated above-chance discrimination among three celebrity identities from patterns of brain activity. Identity discrimination was obtained from brain activity evoked by a variety of ambient images of each identity and thus was tolerant to natural variations in a face. Our analyses revealed two intervals over which identity information could be discriminated and which we will argue are functionally dissociable. The early phase began at 110 msec and extended until 228 msec. Our analysis thus indicates that some face identity information is available as early as 110 msec, which is earlier than previously reported in the ERP literature. The three identities could also be significantly discriminated in a second, later, and more extended phase beginning at 252 msec and peaking at 614 msec.
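For readers unfamiliar with the statistic, MDMR tests whether a categorical factor (here, identity) explains variance in the pairwise distances among multivariate observations (here, trialwise scalp topographies at one time frame). The following is a minimal illustrative sketch, not the authors' exact pipeline; the electrode count, Euclidean distance metric, and dummy coding are assumptions for illustration.

```python
import numpy as np

def mdmr_pseudo_f(topos, labels):
    """MDMR pseudo-F for a categorical factor at one time frame.

    topos:  (n_trials, n_electrodes) scalp topographies.
    labels: (n_trials,) categorical label (e.g., identity) per trial.
    """
    n = topos.shape[0]
    # Pairwise Euclidean distances between trial topographies.
    d = np.linalg.norm(topos[:, None, :] - topos[None, :, :], axis=-1)
    # Gower-centered inner-product matrix: G = J (-0.5 D^2) J.
    a = -0.5 * d ** 2
    j = np.eye(n) - np.ones((n, n)) / n
    g = j @ a @ j
    # Design matrix: intercept plus dummy codes for all but one category.
    cats = np.unique(labels)
    x = np.column_stack(
        [np.ones(n)] + [(labels == c).astype(float) for c in cats[:-1]]
    )
    h = x @ np.linalg.solve(x.T @ x, x.T)  # hat (projection) matrix
    m = len(cats)
    # Pseudo-F: explained vs residual trace, scaled by degrees of freedom.
    num = np.trace(h @ g @ h) / (m - 1)
    den = np.trace((np.eye(n) - h) @ g @ (np.eye(n) - h)) / (n - m)
    return num / den
```

In practice this statistic would be computed at every time frame and its significance assessed by permuting trial labels, yielding the time-resolved curves summarized in Figure 3.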
We suggest that the early and late phases of identity information processing are functionally dissociable based on their interaction with our semantic name priming manipulation. The name versus neutral prime manipulation was intended to differentially activate person information related to facial identity. Significant name versus neutral prime discrimination was evident ∼62 msec before the onset of the face, suggesting that our manipulation was successful in activating person knowledge even before the face was displayed. The identity information in the early phase (110–228 msec) was not modulated by the presence of the name prime and thus may represent consistent featural aspects of the celebrity identities rather than activation of semantic information associated with each identity. In contrast, the later phase of identity information processing was modulated by the name prime (from 300 msec onward), suggesting that this later phase involved activation of person knowledge associated with each identity.
Early Phase of Identity Processing
The early phase of identity processing may reflect activity in primary and secondary visual cortices typically associated with visual feature processing. This is consistent with a prior study from our lab, in which source modeling of the scalp distribution of ERP differences between upright and inverted faces occurring at 100 msec was consistent with a neural source in the pericalcarine extrastriate cortex (e.g., V3–V5; Colombatto & McCarthy, 2017). Patterns of activity obtained from fMRI suggested that voxels in area V1 can discriminate between facial expressions and gender (Petro, Smith, Schyns, & Muckli, 2013). A recent MEG study suggests that neural sources localized to V1 can discriminate between facial identities over 50–100 msec (Vida et al., 2017).
The early phase of face identity processing could be based on low-level visual features that were consistent for an identity, despite the variation present in the ambient images. We applied a computational model that reproduces the population behavior of V1 (Güçlü & van Gerven, 2014) to our celebrity photos. We then submitted the features generated by the model for each photo to the MDMR analysis. We found greater-than-chance discrimination of the three celebrity identities based on these features alone. This suggests that any identity-level representations that exist in V1 are based on consistent differences in physical features between the identities.
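The model used above is that of Güçlü and van Gerven (2014). As a rough, simplified illustration of the kind of image-computable V1-like features involved, one can pool the energy of a small Gabor filter bank over an image; the filter sizes, spatial frequencies, and global pooling below are illustrative assumptions, not the parameters of the actual model.

```python
import numpy as np

def gabor_kernel(size, theta, freq, sigma):
    """Cosine-phase Gabor: oriented grating under a Gaussian envelope."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    env = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * freq * xr)

def v1_like_features(img, n_thetas=4, freqs=(0.1, 0.2)):
    """Pooled Gabor-energy features (simplified stand-in for a V1 model).

    img: 2-D grayscale image array. Returns one energy value per
    orientation x frequency channel.
    """
    feats = []
    for f in freqs:
        for t in np.linspace(0, np.pi, n_thetas, endpoint=False):
            k = gabor_kernel(15, t, f, sigma=4.0)
            # Convolve via the FFT (kernel zero-padded to image size).
            resp = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k, img.shape)).real
            feats.append(np.mean(np.abs(resp)))  # global energy pooling
    return np.array(feats)
```

Feature vectors of this kind, computed per photo, can be submitted to the same distance-based analysis as the EEG topographies to test whether identities differ in consistent low-level image statistics.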
Our ability to detect face identity processing in the early interval depended on our multivariate approach. No effects of identity were evident at 110 msec in our univariate analyses using standard ERP signal-averaging approaches. A likely explanation for this discrepancy is that, although identity representations are consistent within each observer, they differ across observers. Univariate ERP signal averaging would require the ordering of differences in activity between identities to be the same across participants (e.g., the amplitude for Angelina Jolie must be consistently higher than that for Jennifer Aniston, which in turn must be higher than that for Julia Roberts, in every participant). In contrast, multivariate analyses fit a model to each participant individually and require only that the differences in activity among identities be consistent across trials within that participant; the direction of the identity differences can therefore differ between participants.
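This distinction can be made concrete with a toy simulation (the trial counts, channel counts, and nearest-class-mean decoder below are illustrative assumptions): two simulated participants carry equally strong but opposite-signed identity effects, so the grand-average univariate difference cancels to near zero while within-participant multivariate decoding remains high.

```python
import numpy as np

rng = np.random.default_rng(2)

def per_subject_decode(x, y):
    """Leave-one-out nearest-class-mean decoding within one participant."""
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        mus = [x[mask & (y == c)].mean(axis=0) for c in np.unique(y)]
        pred = np.argmin([np.linalg.norm(x[i] - mu) for mu in mus])
        correct += pred == y[i]
    return correct / len(y)

# Two simulated participants with opposite-signed identity effects.
y = np.repeat([0, 1], 40)                       # 40 trials per identity
effect = np.where(y == 0, 1.0, -1.0)[:, None]   # +1 vs -1 shift, all channels
subj_a = rng.normal(size=(80, 16)) + effect     # identity 0 higher amplitude
subj_b = rng.normal(size=(80, 16)) - effect     # identity 0 lower amplitude

# Univariate grand average: the opposite-signed effects cancel.
grand = (subj_a + subj_b) / 2
grand_diff = grand[y == 0].mean() - grand[y == 1].mean()

# Per-participant multivariate decoding still succeeds for both.
acc_a = per_subject_decode(subj_a, y)
acc_b = per_subject_decode(subj_b, y)
```

Here `grand_diff` hovers near zero even though both participants carry a strong, decodable identity signal, mirroring why the univariate averages at 110 msec were silent while the multivariate analysis was not.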
Following 110 msec, we found a peak in the identity effect for MDMR at 180 msec. This period overlaps with the N170 ERP, which is thought to be sensitive to higher-order visual features (such as the distance between the eyes). Univariate analyses at this time point showed significant variation in signal between identities in posterior temporoparietal regions, consistent with the N170. This supports findings that the N170 is sensitive to face familiarity (Barragan-Jason et al., 2015; Wild-Wall, Dimigen, & Sommer, 2008; Marzi & Viggiano, 2007; Caharel et al., 2002, 2005; Jemel et al., 2003). However, most studies have failed to find a difference between familiar and unfamiliar faces at the N170 and suggest that the N170 indexes face individuation for both familiar and unfamiliar faces (Caharel et al., 2014; Rossion & Caharel, 2011; George, Jemel, Fiori, Chaby, & Renault, 2005; Itier & Taylor, 2004). Although our study did not use unfamiliar faces, we used many exemplars for each identity and can therefore separate person-level individuation from image-level individuation. For the variation in signal between our three identities to be significant, it had to be greater than the variation in signal among the different photos of each identity (i.e., the noise level).
Early to Late Phase of Identity Processing
Unlike the early phase of identity processing, the late phase was strongly influenced by the prime manipulation. Our multivariate results indicated that the sensitivity of identity processing to priming began at ∼252 msec, whereas our univariate results suggested an even earlier onset of ∼200 msec. These results are consistent with prior work demonstrating perceptual processing of face identity occurring first (∼100 msec and onward), followed by later semantic processing of face identity (∼300 msec and onward). For instance, Vida et al. (2017) used MEG to measure the timing of face identity processing. Consistent with our two phases of face identity processing, they found a time window of 100–200 msec for image-based processing of face identity and a later window (onset of ∼200–300 msec) specific to representations of identity. Similarly, Dobs, Isik, Pantazis, and Kanwisher (2019) used MEG to measure face identity processing and found identity information as early as 91 msec. However, when they compared responses between familiar and unfamiliar faces, they found a much later difference from 400 msec onward. In support of later semantic processing, a prior study from our laboratory indicated that the earliest moment at which person knowledge associated with a face can be accessed is between 200 and 300 msec (Taylor et al., 2016). Finally, Kietzmann, Gert, Tong, and König (2017) found early effects related to head orientation (60 msec), but near-complete viewpoint invariance was observed by 280 msec. Taken together, these findings suggest that by ∼300 msec processing of perceptual features for a face is complete and the resulting viewpoint-invariant representation is used for additional semantic processing of the face, consistent with the timing of identity processing in our findings.
Late Phase of Identity Processing
Within the late phase, the temporal response of identity processing differed between the name and neutral prime conditions. When the name prime caused an expectation for the appearance of a particular celebrity, participants showed significant effects of identity earlier in time (∼300–600 msec). In contrast, in the neutral prime condition when no particular celebrity photo was expected, participants showed a later identity processing response (∼500–1000 msec). The effect of identity processing after the name prime may occur earlier than after the neutral prime because all the details associated with identity would have been retrieved after seeing the name of the person and before seeing the face. Consequently, in the name prime condition, participants may have engaged in a superficial matching of the person in memory with the face on the screen. This interpretation would be consistent with a familiarity judgment traditionally found in explicit memory tasks. Familiarity is associated with a frontal scalp distribution between 300 and 500 msec, called the FN400, which is similar to the response observed in our study for both the name and neutral prime conditions (Figure 7; Curran & Hancock, 2007). The later response in the neutral prime condition may reflect additional and more detailed retrieval of person identity information occurring at that moment. This would be consistent with a process of recollection found in explicit memory tasks. Recollection is associated with a positive response recorded over posterior parietal electrodes between 500 and 800 msec, which is similar to the response observed in our study largely for the neutral prime condition. Our results then suggest that outside the context of an explicit memory judgment (old vs. new), we can find similar responses in relation to judgments about person identity that rely on memory or knowledge about a well-known person.
The time course of identity effects supports the preactivation of face recognition units (FRUs) by the name prime. In models of face processing, FRUs contain abstract face representations that can be matched to any view or expression of a face (Bruce & Young, 1986). Our results suggest that the name primes activate such abstract representations. First, the name primes modulated the N250, a proposed marker of FRUs (Schweinberger, 1996). At ∼300 msec, ERPs for name primes showed a less positive response in occipitoparietal electrodes when compared with ERPs for neutral primes. Second, name primes elicited identity effects with a topography similar to the FN400, which, as mentioned, reflects familiarity-based recognition of a face (Curran & Hancock, 2007). Taken together, the N250 and FN400 imply that long-term memory traces of a familiar face are activated (i.e., in an FRU) and then matched to online face representations.
Prediction Error Signals in Identity Processing
Our work did not investigate feedback of top–down information from the name prime resulting in a prediction error. Face identities incongruent with the name prime occurred on too few trials (10%) to be analyzed. It is possible that our identity measure, which was modulated by the semantic (name) prime only after 300 msec, would have been affected by an incongruent name prime or identity mismatch earlier than 300 msec. In line with this prediction, Johnston, Overell, Kaufman, Robinson, and Young (2016) found that, when a particular famous face identity was expected, a change in that identity was detected at the N170. The authors attributed this result to the generation of a prediction error signal rather than to a full analysis of the face identity. Future work will therefore need to separate stimulus processing from prediction error signals when examining familiar face processing.
Unfamiliar Face Processing
In the present experiment, we excluded unfamiliar faces because these faces are not associated with any person-specific knowledge and cannot be easily primed with a name. However, future work should consider comparing identity effects between familiar and unfamiliar faces. Unfamiliar faces with many ambient images for each identity could be matched to familiar faces in terms of low-level visual features. Based on prior work, we expect that such a familiarity comparison would demonstrate that early effects of identity processing (110 msec and onward) are specific to familiar faces. When Dobs et al. (2019) measured face identity processing as a function of familiarity, they found significant effects for familiar faces (starting at 96 msec) but no significant effects for unfamiliar faces. Prior knowledge of face identity may thus help tune perceptual features for more accurate face recognition (e.g., at 110 msec and onward in our study).
We sought to identify the time course of face identity processing using multivariate EEG signals and a large selection of ambient images of well-known celebrities. Our results both support and challenge standard feed-forward models of face processing. In support of feed-forward models, we found two phases of identity processing in which earlier perceptual processing of face features (110–228 msec) is followed by activation of semantic information associated with a face (300 msec and onward). However, the finding that perceptual face identity processing can occur as early as 110 msec suggests that higher-level processing, such as face recognition, may occur earlier than previously thought. In addition, knowing the identity to be shown via a name prime shifted higher-level semantic processing earlier in time (300–600 msec instead of 500–1000 msec), suggesting top–down influences on the processing of person identity information. Overall, our work suggests that genuine "semantic" face identity processing starts relatively late, following initial perceptual processing, and can be modulated by prior knowledge.
This work was supported by the National Institute of Mental Health (MH-005286 to G. M.).
Reprint requests should be sent to Gregory McCarthy, Department of Psychology, Yale University, P.O. Box 208205, New Haven, CT 06520-8205, or via e-mail: firstname.lastname@example.org.
Currently at the Zuckerman Mind Brain Behavior Institute, Columbia University.