Research on the neural basis of face recognition has investigated the timing and anatomical substrates of different stages of face processing. Scalp-recorded ERP studies of face processing have focused on the N170, an ERP with a peak latency of ∼170 msec that has long been associated with the initial structural encoding of faces. However, several studies have reported earlier ERP differences related to faces, suggesting that face-specific processes might occur before N170. Here, we examined the influence of face inversion and face race on the timing of face-sensitive scalp-recorded ERPs by examining neural responses to upright and inverted line-drawn and luminance-matched white and black faces in a sample of white participants. We found that the P100 ERP evoked by inverted faces was significantly larger than that evoked by upright faces. Although this inversion effect was statistically significant at 100 msec, the inverted-upright ERP difference peaked at 138 msec, suggesting that it might reflect activity in neural sources that overlap with P100. Inverse modeling of the inversion effect difference waveform suggested possible neural sources in pericalcarine extrastriate visual cortex and lateral occipito-temporal cortex. We also found that the inversion effect difference wave was larger for white faces. These results are consistent with behavioral evidence that individuals process faces of their own race more configurally than faces of other races. Taken together, the inversion and race effects observed in the current study suggest that configuration influences face processing by at least 100 msec.
The apparent ease with which humans recognize and remember faces has suggested the operation of specialized brain processes. This view has been supported by studies of prosopagnosia caused by acquired brain lesions (Rezlescu, Pitcher, & Duchaine, 2012; Busigny, Graf, Mayer, & Rossion, 2010; McNeil & Warrington, 1993) or abnormal development (Duchaine & Nakayama, 2006). Psychological models of face processing have proposed a number of discrete processes associated with face processing. For example, the influential Bruce and Young (1986) model posits a series of dissociable face processes that begin with the structural encoding of faces and include the analysis of face expression and the activation of face recognition units and personal identity nodes.
Much neural research has implicitly or explicitly incorporated staged face processing models similar to Bruce and Young (1986) and has used fMRI to identify the anatomical substrate of the different stages and used ERPs to identify the timing of these processes. fMRI research has identified an extended brain network (Haxby, Hoffman, & Gobbini, 2000) of face-specific and face-sensitive regions including the occipital face area (Gauthier et al., 2000) and the fusiform face area (FFA; Kanwisher, McDermott, & Chun, 1997; Puce, Allison, Gore, & McCarthy, 1995) in the ventral occipito-temporal cortex (VOTC) and the posterior STS (pSTS) in the lateral occipito-temporal cortex (LOTC; Puce et al., 1995). Recordings from subdural electrodes in patients have revealed a spatially focal face-specific ERP with a peak latency of ∼192 msec in the fusiform gyrus and adjacent VOTC and a second focus in LOTC (Allison, Puce, Spencer, & McCarthy, 1999; Allison et al., 1994). Scalp recordings have revealed a face-specific negative ERP at ∼170 msec (N170) with maximal amplitude at (primarily right) posterior-temporal electrodes (Bentin, Allison, Puce, Perez, & McCarthy, 1996).
Whereas other, longer-latency face-sensitive ERPs have been reported, including N250 (Begleiter, Porjesz, & Wang, 1995; Schweinberger, Pfutze, & Sommer, 1995), N400f (Bentin & Deouell, 2000; Eimer, 2000a), and P600f (Eimer, 2000b), the vast majority of face processing studies have focused on N170, which is larger for faces than for other stimulus categories or control stimuli (Itier & Taylor, 2004a; Rossion et al., 2000). The N170 has been considered by many to reflect the initial structural encoding stage of face processing (Eimer, 2011) based in part on evidence that it is larger for the same face viewed from different angles compared with two different faces shown from the same perspective (Caharel, d'Arripe, Ramon, Jacques, & Rossion, 2009).
However, several studies have provided evidence that face processing might occur earlier than N170. Batty and Taylor (2003) found that emotional face expressions influenced ERP amplitudes as early as 94 msec. A study by Liu, Harris, and Kanwisher (2002) using MEG identified a face-selective response at ∼100 msec that correlated with correct face recognition. However, in a simultaneous EEG-fMRI study, Sadeh and colleagues did not replicate this early face-selective peak at the group level, showing instead that face selectivity at 110-msec poststimulus was correlated with face selectivity in the occipital face area (Sadeh, Podlipsky, Zhdanov, & Yovel, 2010). Relevant to the current study, several reports have shown that the race of the face influences ERPs at ∼100 msec (P100; Wiese, 2012; Stahl, Wiese, & Schweinberger, 2010; He, Johnson, Dovidio, & McCarthy, 2009; Walker, Silvert, Hewstone, & Nobre, 2008; Kubota & Ito, 2007; Ito & Urland, 2003, 2005). However, the influence of race on P100 is still controversial, as a number of studies failed to find this early difference (Chen, Pan, Wang, Xiao, & Zhao, 2013; Montalan et al., 2013; Wiese, 2013; Caharel et al., 2011; Herzmann, Willenbockel, Tanaka, & Curran, 2011; Vizioli, Foreman, Rousselet, & Caldara, 2010; Stahl, Wiese, & Schweinberger, 2008; Herrmann et al., 2007), whereas other studies failed to either test or report effects earlier than N170 (Ofan, Rubin, & Amodio, 2011, 2014; Brebner, Krigolson, Handy, Quadflieg, & Turk, 2011; Balas & Nelson, 2010; Gajewski, Schlegel, & Stoerig, 2008; Dickter & Bartholow, 2007; Caldara, Rossion, Bovet, & Hauert, 2004; Caldara et al., 2003). Most important, all extant ERP studies that did find a P100 difference between faces of different races used photographs of faces as stimuli. Thus, low-level image properties such as luminance and contrast may have introduced a systematic confound between the face stimulus sets.
Individual features contained in a face (e.g., eyes, nose, mouth) are not processed in isolation but rather in relation to all other features. Faces are thus processed as a configural whole (Maurer, Le Grand, & Mondloch, 2002). For example, two identical top halves of a face are usually misperceived as different when presented in a composite with different bottom halves irrelevant to the task, a phenomenon known as the “composite face effect” (Hole, 1994; Young, Hellawell, & Hay, 1987). Prior research provides evidence that own-race faces are processed more configurally compared with faces of other races (Tanaka, Kiefer, & Bukach, 2004). For example, white participants showed better recognition of whole faces than individual parts for white faces compared with Asian faces (whole-face advantage; Michel, Caldara, & Rossion, 2006; Rhodes, Hayward, & Winkler, 2006). Participants also showed greater interference from one half of a face when processing the other half (the composite-face effect) when studying faces of their own race compared with faces of other races (Michel, Rossion, Han, Chung, & Caldara, 2006). The influence of race on holistic face processing is particularly interesting as it is thought to underlie differences in recognition and discrimination. According to this hypothesis, own-race faces are generally more easily recognized and discriminated because they engage holistic processing mechanisms to a greater extent. Other-race faces are instead less fully integrated into a whole and thus less accurately identified and remembered (a phenomenon known as the other-race effect; Meissner & Brigham, 2001; Malpass & Kravitz, 1969).
Holistic processing of faces is disrupted when faces are inverted (Rossion, 2008; Farah, Tanaka, & Drain, 1995). Inversion not only changes the orientation of local features but also diminishes the influence of configuration on the processing of an individual feature (e.g., Civile, McLaren, & McLaren, 2016; Suzuki & Cavanagh, 1995) and disrupts processing of face features (McKone & Yovel, 2009), resulting in a significant impairment in recognizing and discriminating faces (Yin, 1969). These inversion effects are much larger for faces than for other objects such as houses (Leder & Carbon, 2006), animals (Minnebusch, Suchan, & Daum, 2009), and landscapes (Diamond & Carey, 1986; see McKone & Robbins, 2011, for a review). The inversion effect has thus been used as a marker of specialized face processing mechanisms in investigations of its anatomical substrate and developmental trajectory (e.g., Carey & Diamond, 1977). Despite notable differences in ease of recognition and discrimination, upright and inverted faces have identical low-level stimulus properties, making face inversion an ideal manipulation to study the time course and anatomical loci of face processing. Several fMRI studies have found that face inversion causes increased activation in extrastriate regions and lateral occipital (LO) object area (Epstein, Higgins, Parker, Aguirre, & Cooperman, 2006; Aguirre, Singh, & D'Esposito, 1999) especially in the right hemisphere (Haxby et al., 1999), but not in the FFA (but see Yovel & Kanwisher, 2005).
The face inversion effect has been explored previously in ERP studies of face processing. Most ERP studies focused on N170 and found that inversion delays N170 latency and increases its amplitude (e.g., Rossion et al., 1999, 2000; Bentin et al., 1996), regardless of the familiarity of the faces or their relevance to the participant's task (Anaki, Zion-Golumbic, & Bentin, 2007). Indeed, much of the literature on the effect of face inversion on ERPs addresses whether neural processes giving rise to the N170 component are involved in configural processing (Eimer, 2011; Latinus & Taylor, 2006). Most of these studies on face inversion did not report earlier ERP amplitude effects (Itier & Taylor, 2004a). Results of the few studies that investigated the effect of inversion on P100 are largely inconclusive: Some studies found delayed latencies for inverted faces (Wiese, 2013; Itier & Taylor, 2002; Linkenkaer-Hansen et al., 1998), some found larger amplitude differences (Feng, Martinez, Pitts, Luo, & Hillyard, 2012; Marzi & Viggiano, 2007), some found both (Minami, Nakajima, Changvisommid, & Nakauchi, 2015; Itier & Taylor, 2004b, 2004c, 2004d), and some reported no differences (e.g., Kloth, Itier, & Schweinberger, 2013; Rossion et al., 1999).
Three studies investigated face inversion using faces with differing qualities. Marzi and Viggiano (2007) showed participants upright and inverted faces of either famous or nonfamous individuals. P100 was larger for inverted than upright faces, but there was no interaction with familiarity. In Minami et al. (2015), participants were shown upright and inverted faces of differing skin tones. Both inverted faces and naturally colored faces elicited a larger P100 amplitude, and inverted faces also showed a delayed P100 latency. However, this face coloring manipulation is problematic for ERP research, as systematic low-level stimulus differences between faces of different skin tones could confound the P100 results. Wiese (2013) showed participants upright and inverted white and Asian faces and found that inverted faces elicited a larger P100. In this study, however, Asian faces showed happier expression than white faces, thus confounding race with emotions, which also has been shown to affect P100 amplitude (Batty & Taylor, 2003).
In the current study, we varied two factors thought to affect the configural processing of faces—orientation and race—and observed their influence on the ERP timeline. We used luminance-matched line-drawn faces to minimize systematic low-level differences between faces of different races. We focused on P100 as our goal was to establish with two manipulations whether configuration influenced the neural processing of faces before N170 and thus earlier in time than previously established. As we reviewed above, some prior studies have observed variation related to face processing at P100, including a prior study from our laboratory that used photographs uncontrolled for low-level stimulus properties (He et al., 2009). On the basis of prior behavioral findings, we hypothesized that faces of other races are not processed as configurally as own-race faces, and thus, strong evidence for configural processing would be manifested as an interaction between inversion and race.
If configuration affected face processing before N170, we then planned to model the neural sources. The neural source or sources of the scalp-recorded P100 are largely unknown, but intracranial recordings from patients have not revealed a P100 ERP in the mid-fusiform gyrus commonly referred to as the FFA but rather a later ERP with a peak latency of ∼192 msec (N200; Allison et al., 1994). Thus, a convincing demonstration of the effect of configuration on scalp ERPs could help resolve the timescale of the initial processing of faces, and source modeling could direct future research to brain regions involved in that initial processing.
Data were collected from 31 participants, aged 18–34 years (M = 23.67 years, SD = 4.80 years; 13 men, two left-handed). All participants were healthy with normal or corrected-to-normal vision. Participants were drawn from Yale University and from the community of New Haven, CT, and were compensated at an hourly rate. Although our participant recruitment was unrestricted by race, we restricted our analysis to white participants to simplify the analysis of same- and other-race effects on P100. We completed participant recruitment when 20 white participants were enrolled (18–33 years old, M = 23.85 years old, SD = 4.18 years old; seven men, one left-handed). Of the remaining participants, nine were Asian or Asian American and two were African or black. The Yale human subjects committee approved the experimental protocol, and all participants provided informed consent.
A set of 100 white and 100 black face stimuli was created with FaceGen 3.5 software (Singular Inversions, Toronto, Ontario, Canada). All faces depicted young adult men with typical symmetry, no hair, and no accessories. Stimuli were then converted to black-and-white drawings using Photoshop and equated for luminance using the SHINE toolbox (Willenbockel et al., 2010). Upright faces were rotated 180° to form inverted stimuli. To reduce homogeneity within face races, we randomly increased or decreased the size of each face over a ±10% range of its original size. This ensured that regions of differential contrast (e.g., face contours) would not always appear in the same location on the screen. Because ERPs can be influenced by luminance asymmetries in vertical hemifields (Gunter, Wijers, Jackson, & Mulder, 1994), we centered each face on the screen at its center of mass, such that the luminance would always be equal above and below fixation.
Participants were tested in a single session. Each of the 400 faces (100 white upright, 100 black upright, 100 white inverted, 100 black inverted) was presented once in a random order. Each trial consisted of a face presented on a white background for 300 msec, and then a response interval with a cross hair in the middle of the screen (see Figure 1 for the task schematic) jittered between 1250 and 1750 msec. Participants were instructed to respond to the race of the face when the cross hair appeared on the screen by pressing “b” or “w” on a keyboard with the index finger of the right and left hands, respectively. All stimuli were presented using the Psychtoolbox v3.0 package for MATLAB (The MathWorks, Natick, MA; Brainard, 1997) on an LCD positioned ∼60 cm from the participants.
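The trial structure described above can be sketched as follows. This is a minimal illustration in plain Python, not the Psychtoolbox code used in the experiment. The function name, the dictionary keys, and the seed are hypothetical, and the uniform jitter is an assumption, because the text gives only the 1250 to 1750 msec range.

```python
import random

# Design values taken from the task description.
CONDITIONS = [
    ("white", "upright"),
    ("black", "upright"),
    ("white", "inverted"),
    ("black", "inverted"),
]
FACES_PER_CONDITION = 100
FACE_DURATION_MS = 300
RESPONSE_INTERVAL_MS = (1250, 1750)


def build_trial_list(seed=0):
    """Return 400 trials in random order, each face shown once (hypothetical helper)."""
    rng = random.Random(seed)
    trials = [
        {"race": race, "orientation": orientation, "face_id": i}
        for race, orientation in CONDITIONS
        for i in range(FACES_PER_CONDITION)
    ]
    rng.shuffle(trials)
    for trial in trials:
        trial["face_ms"] = FACE_DURATION_MS
        # Assumption: the jitter is drawn uniformly from the stated range.
        trial["response_ms"] = rng.uniform(*RESPONSE_INTERVAL_MS)
    return trials


trials = build_trial_list()
```

Fixing the seed makes the order reproducible for a given participant; a real session would presumably use a different seed per participant.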
EEG Data Acquisition
The EEG data were acquired continuously from 64 channels using a Neuroscan Quik-Cap (Charlotte, NC). The channels incorporated the full 10–20 system along with interposed electrodes. An electrode was placed on the participants' nose and used as a reference, and a ground electrode was placed above the participants' forehead. Additional EOG channels were attached beside each eye to detect horizontal eye movements and below each eye to detect vertical eye movements and blinks. Each EEG channel was amplified with a gain of 2010 using a Synamps RT amplifier system (Compumedics USA, Charlotte, NC), digitized at a 500-Hz sampling rate with 24-bit resolution and passed through a 0.05- to 100-Hz band-pass filter. Impedances were kept below 5 kΩ. Digital codes for each event in the trial were sent via a separate digital channel to identify the onset and condition of each stimulus.
EEG Data Processing
Single-trial epochs beginning 100 msec before and extending to 500 msec after each face were extracted offline from the continuous EEG recordings. For each trial, the average of the 100-msec prestimulus period was subtracted from all data points, thus serving as the baseline for amplitude measurements. Epochs contaminated with eye movements and blinks were identified using MATLAB scripts incorporating functions from the Fieldtrip analysis library (Oostenveld, Fries, Maris, & Schoffelen, 2011). Trials with a vertical or horizontal EOG peak-to-peak amplitude over 200 μV, a z score over 4, or any EEG channel's z score over 20 within the first 1000-msec poststimulus were rejected from further analysis. A total of 3.18% of all trials across all participants were rejected using these criteria. The ERP data were then smoothed using a 40-Hz low-pass filter before measurement.
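The baseline subtraction and the peak-to-peak EOG criterion described above can be sketched as follows. This is a simplified illustration in plain Python, not the MATLAB and Fieldtrip scripts used in the study. The function names and the toy traces are hypothetical, and the z-score criteria are omitted.

```python
from statistics import mean

SAMPLING_RATE_HZ = 500        # from the acquisition description
BASELINE_MS = 100             # prestimulus interval used as baseline
EOG_PEAK_TO_PEAK_UV = 200.0   # rejection threshold from the text


def baseline_correct(epoch, sampling_rate_hz=SAMPLING_RATE_HZ, baseline_ms=BASELINE_MS):
    """Subtract the mean of the prestimulus samples from every sample.

    `epoch` is a list of voltages (microvolts) whose first sample lies
    `baseline_ms` before stimulus onset.
    """
    n_baseline = int(baseline_ms * sampling_rate_hz / 1000)
    baseline = mean(epoch[:n_baseline])
    return [v - baseline for v in epoch]


def exceeds_peak_to_peak(eog_epoch, threshold_uv=EOG_PEAK_TO_PEAK_UV):
    """Return True when the max-minus-min range of the EOG trace exceeds the threshold."""
    return (max(eog_epoch) - min(eog_epoch)) > threshold_uv


# Toy EEG epoch (hypothetical): 50 baseline samples at 2 uV, then 250 samples at 5 uV.
toy_epoch = [2.0] * 50 + [5.0] * 250
corrected = baseline_correct(toy_epoch)

# Toy EOG trace (hypothetical) with a 250 uV blink-like deflection.
toy_eog = [0.0] * 100 + [250.0] * 10 + [0.0] * 190
reject = exceeds_peak_to_peak(toy_eog)
```

At 500 Hz, the 100-msec baseline corresponds to the first 50 samples of each epoch, which is why the toy epoch is built that way.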
Several issues complicate the measurement and analysis of ERPs. An experimental manipulation may influence the amplitude and/or latency of a particular ERP, thus confounding these measures. At every instant of time, the ERP at a given scalp site reflects a nonlinear summation of all active neural sources that is dependent on their locations and orientations in the brain relative to each electrode. This severely complicates the issue of multiple comparisons, as measurements at each electrode are differentially correlated for each source. Finally, although the prior literature has focused our analysis on the P100 ERP, our experimental manipulations might result in the addition or subtraction of neural sources that overlap in time with the prominent P100 peak but are better captured by a difference wave measurement.
To address these concerns, we restricted our primary analyses to P100 over occipital electrodes implicated in face processing in a prior study from our laboratory (He et al., 2009) and to N170 over the right lateral occipito-parietal scalp, which has been shown to be face sensitive (Bentin et al., 1996). We defined a priori measurement intervals of 100 ± 20 msec for P100 and 170 ± 20 msec for N170 based on the prior literature. The average amplitude over these intervals for P100 and N170, collapsed across participants and conditions, reached a maximum at O2 for P100 and PO8 for N170. Although statistical analyses were restricted to these electrodes, we visually inspected neighboring and contralateral electrodes (OZ and O1 and then P8 and PO7, respectively) to confirm that amplitudes at the electrodes selected a priori were representative and consistent. Because P100 and N170 latencies might be systematically influenced by our experimental manipulations (thus confounding amplitude measurement), we then measured the peak latencies of P100 (at O2) and N170 (at PO8) for each participant and condition over these a priori intervals and used these latencies to measure average amplitudes at the 20-msec interval about peak latencies for each participant and condition. We also calculated mean amplitudes at the interval corresponding to the FWHM of the peak response, that is, at time points surrounding the peak latency with amplitudes equal or larger than half the peak amplitude. Finally, after completing our analyses of P100 and N170 as described above, we computed difference waveforms between the relevant conditions and examined the mean scalp distributions of the difference waves.
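The three amplitude measures described above (peak within an a priori window, mean over a fixed 20-msec window about the peak, and mean over the FWHM window) can be sketched as follows. This is a minimal illustration in plain Python under simplifying assumptions: a single positive-going peak, a hypothetical triangular toy waveform, and hypothetical function names. It is not the authors' analysis code.

```python
from statistics import mean


def peak_in_window(times_ms, erp_uv, center_ms, half_width_ms):
    """Return (latency, amplitude) of the largest value inside the window."""
    candidates = [
        (v, t) for t, v in zip(times_ms, erp_uv)
        if center_ms - half_width_ms <= t <= center_ms + half_width_ms
    ]
    amplitude, latency = max(candidates)
    return latency, amplitude


def mean_amplitude(times_ms, erp_uv, center_ms, half_width_ms):
    """Mean amplitude over center_ms +/- half_width_ms (inclusive)."""
    return mean(v for t, v in zip(times_ms, erp_uv) if abs(t - center_ms) <= half_width_ms)


def fwhm_window(times_ms, erp_uv, peak_latency_ms):
    """Return (start_ms, end_ms) of the contiguous run around the peak
    whose amplitudes are at least half the peak amplitude."""
    i_peak = times_ms.index(peak_latency_ms)
    half = erp_uv[i_peak] / 2
    lo = i_peak
    while lo > 0 and erp_uv[lo - 1] >= half:
        lo -= 1
    hi = i_peak
    while hi < len(erp_uv) - 1 and erp_uv[hi + 1] >= half:
        hi += 1
    return times_ms[lo], times_ms[hi]


# Toy waveform (hypothetical): 2-msec samples from -100 to 500 msec with a
# triangular 4 uV peak at 106 msec that returns to zero 20 msec either side.
times = list(range(-100, 501, 2))
erp = [max(0, 40 - 2 * abs(t - 106)) / 10 for t in times]

latency, amplitude = peak_in_window(times, erp, center_ms=100, half_width_ms=20)
fixed_mean = mean_amplitude(times, erp, center_ms=latency, half_width_ms=10)
fwhm_start, fwhm_end = fwhm_window(times, erp, latency)
fwhm_mean = mean(v for t, v in zip(times, erp) if fwhm_start <= t <= fwhm_end)
```

For a negative component such as N170, the same functions could be applied to the sign-inverted waveform. In this toy case the FWHM window happens to coincide with the fixed window; with real data the two generally differ.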
Statistical analyses were performed using SPSS (SPSS, Inc., Chicago, IL) and R Statistical Package Version 0.98.1087. The four different face types, namely, upright black, upright white, inverted black, and inverted white, resulted in a 2 (Face race: white vs. black) × 2 (Face orientation: upright vs. inverted) design. Because each participant received all conditions, we used two-way repeated-measures ANOVA to analyze both behavioral and neural data.
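In a 2 × 2 design in which both factors are within participants, each effect has one numerator degree of freedom, so its F ratio equals the square of a one-sample t statistic computed on per-participant contrast scores. The sketch below illustrates this equivalence in plain Python; it is not the SPSS or R procedure used in the study. The cell labels, function names, and toy data are hypothetical. The effect sizes reported in the Results are numerically consistent with partial eta squared, F / (F + df_error); for example, 16.94 / (16.94 + 19) ≈ .47.

```python
from math import sqrt
from statistics import mean, stdev


def contrast_f(scores):
    """Return (F, df_error) for a one-df within-participant effect,
    computed as the squared one-sample t on contrast scores."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))
    return t * t, n - 1


def partial_eta_squared(f_value, df_error):
    """Partial eta squared for an effect with one numerator df."""
    return f_value / (f_value + df_error)


def two_by_two_rm_anova(cells):
    """`cells` is a list of per-participant dicts with hypothetical keys
    'WU', 'WI', 'BU', 'BI' (white/black x upright/inverted)."""
    contrasts = {
        "orientation": [(p["WI"] + p["BI"]) / 2 - (p["WU"] + p["BU"]) / 2 for p in cells],
        "race": [(p["WU"] + p["WI"]) / 2 - (p["BU"] + p["BI"]) / 2 for p in cells],
        "interaction": [(p["WI"] - p["WU"]) - (p["BI"] - p["BU"]) for p in cells],
    }
    return {name: contrast_f(scores) for name, scores in contrasts.items()}


# Toy amplitudes in microvolts for four hypothetical participants.
toy_cells = [
    {"WU": 1.0, "WI": 4.0, "BU": 2.0, "BI": 3.0},
    {"WU": 1.5, "WI": 5.0, "BU": 1.5, "BI": 3.5},
    {"WU": 0.5, "WI": 3.5, "BU": 2.0, "BI": 3.0},
    {"WU": 1.0, "WI": 4.5, "BU": 1.5, "BI": 4.0},
]
results = two_by_two_rm_anova(toy_cells)
eta_example = partial_eta_squared(16.94, 19)
```

With 20 participants, the error degrees of freedom for every effect are 19, which matches the F(1, 19) values reported in the Results.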
We investigated significant differences by localizing their neural generators with sLORETA (standardized low-resolution brain electromagnetic tomography) software (Pascual-Marqui, 2002). sLORETA is a method for source estimation that provides an approximate solution to the inverse problem based on current density estimates given by the minimum norm solution. The solution space is restricted to cortical gray matter as determined by a probabilistic brain atlas (Lancaster et al., 2000), and the intracerebral volume is partitioned into 6239 voxels at a 5-mm spatial resolution. Its algorithm has been widely used in ERP research (e.g., Schiller et al., 2016; Zumsteg, Friedman, Wieser, & Wennberg, 2006; Vitacco, Brandeis, Pascual-Marqui, & Martin, 2002). To estimate the standardized electric activity at each voxel for the difference between inverted and upright faces, we calculated the estimated current density for the difference wave calculated in the ERP analyses. The data underlying this difference wave had therefore undergone artifact rejection, baseline subtraction, and smoothing with a 40-Hz low-pass filter. We then approximated the neuroanatomical location of the resulting activity clusters by finding the position of the maximum difference in the Montreal Neurological Institute space and identifying its corresponding Brodmann's area.
On each trial, participants were asked to indicate the race of the face. A 2 (Face race: white vs. black) × 2 (Face orientation: upright vs. inverted) repeated-measures ANOVA revealed a main effect for Face orientation, F(1, 19) = 21.61, p < .001, η2 = .53 (Figure 2A), such that race classification accuracy was better for upright (M = 84.70, SD = 1.42) than inverted (M = 79.12, SD = 2.16) faces. There was no main effect of Race on accuracy (F(1, 19) = 0.51, p = .483, η2 = .03) and no interaction between Face orientation and Race (F(1, 19) = 1.20, p = .288, η2 = .06). Thus, accuracy was lower for inverted faces compared with upright faces regardless of face race.
The analysis of RTs for correct trials also showed a main effect of Face orientation (F(1, 19) = 7.05, p = .016, η2 = .27), whereby RTs were shorter for upright (M = 346, SD = 18) compared with inverted (M = 364, SD = 19) faces. There was a nonsignificant trend toward an effect of Race (F(1, 19) = 4.01, p = .060, η2 = .17) such that RTs were shorter for black faces (M = 341, SD = 18) than for white faces (M = 369, SD = 21). The interaction between Face race and Face orientation was not significant (F(1, 19) = 0.407, p = .531, η2 = .02; Figure 2B).
A prominent positive component (P100) was observed at posterior electrodes that peaked ∼100 msec after face onset. Figure 3A shows the grand-averaged ERPs elicited by upright and inverted faces at electrode O2. Using our time window of 100 ± 20 msec defined a priori, we found that the latency of the P100 peak for inverted faces was longer than for upright faces (F(1, 19) = 10.48, p = .004, η2 = .36). There were no effects of Face race on P100 latency (F(1, 19) = 2.17, p = .158, η2 = .10). Given this systematic effect of Face orientation on latency, we adjusted our amplitude measurement window to 106 ± 10 msec for black upright and white upright faces and 112 ± 10 msec for black inverted and white inverted faces. Table 1 contains, for each condition, the latency and amplitude at the peak, the average amplitude over the 20-msec window surrounding the peak, and the latency and average amplitude at the FWHM window.
Table 1. P100 measurements for each condition. Columns: Peak Latency, Peak Amplitude, Fixed Window Amplitude, FWHM Window Latency, FWHM Window Amplitude.
As predicted, there was a significant main effect for Face inversion on P100 (F(1, 19) = 16.94, p = .001, η2 = .47), with inverted faces eliciting a larger P100 (M = 3.89 μV) than upright faces (M = 1.38 μV; Figure 3A). The parieto-occipital P100 scalp distribution of the inversion effect difference wave (inverted minus upright) resembled a typical P100 distribution (Figure 4A; Heinze et al., 1994). The sLORETA inverse model revealed two main sources of the difference waves. The first was an extensive pericalcarine source, along the medial surface of the occipital lobe, over the cuneus and lingual gyrus in BAs 17 and 18 (x = −3, y = −88, z = 1; Figure 4A). The second source was located on the lateral brain surface over the pSTS in BA 22, at the boundary with the anterior portion of the LO cortex in BA 19 (LO: x = 53, y = −61, z = 15). This pattern was more pronounced in the right hemisphere.
The inversion effect difference waveform was significant at the peak P100 latency—the focus of our study. However, the difference waveform reached its maximum amplitude at 138 msec. The shape of the scalp distribution and the inverse modeling results at the 138-msec peak of the difference wave were very similar to those described above for the peak of the P100 (Figure 4A).
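The dissociation between the peak of an ERP and the peak of a difference wave can be illustrated with a small sketch. The values below are hypothetical toy numbers chosen only to show the logic, not data from this study, and the function name is likewise an assumption.

```python
def peak_latency(times_ms, waveform_uv):
    """Return the latency at which the waveform reaches its maximum."""
    amplitude, latency = max(zip(waveform_uv, times_ms))
    return latency


# Toy waveforms in microvolts (hypothetical, coarse 10-msec sampling).
times = [100, 110, 120, 130, 140, 150]
upright = [1.0, 1.4, 1.0, 0.5, 0.2, 0.0]
inverted = [2.0, 3.9, 3.6, 3.4, 3.3, 1.0]

# Difference wave: inverted minus upright, sample by sample.
difference = [i - u for i, u in zip(inverted, upright)]

inverted_peak_ms = peak_latency(times, inverted)
difference_peak_ms = peak_latency(times, difference)
```

In this toy example the inverted-face ERP peaks at 110 msec while the difference wave peaks at 140 msec, because the later samples of the inverted waveform stay elevated while the upright waveform declines. The same logic allows a difference wave to peak later than the component on which the effect was first measured.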
The P100 amplitude difference for face orientation was observed for both white faces (inverted: 4.34 μV, upright: 1.02 μV) and black faces (inverted: 3.45 μV, upright: 1.75 μV). However, we also found a significant interaction between face race and orientation (F(1, 19) = 10.82, p = .004, η2 = .36; Figure 3B). The analysis of the inversion effect difference waves (black inverted − black upright compared with white inverted − white upright; Figure 4B) showed that the inversion effect was larger for white faces (M = 3.32 μV) compared with black faces (M = 1.70 μV). This effect was present in 18 of the 20 participants who took part in the experiment (Figure 5).
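The relationship between the four cell means, the per-race inversion effects, and the marginal means can be checked with simple arithmetic. The sketch below uses only the mean amplitudes reported in the text; the variable and function names are hypothetical.

```python
# Mean P100 amplitudes (microvolts) as reported in the text.
cell_means = {
    ("white", "inverted"): 4.34,
    ("white", "upright"): 1.02,
    ("black", "inverted"): 3.45,
    ("black", "upright"): 1.75,
}


def inversion_effect(race):
    """Inverted minus upright mean amplitude for one face race."""
    return cell_means[(race, "inverted")] - cell_means[(race, "upright")]


def marginal_mean(orientation):
    """Mean amplitude for one orientation, averaged over face race."""
    return (cell_means[("white", orientation)] + cell_means[("black", orientation)]) / 2


white_effect = inversion_effect("white")            # about 3.32
black_effect = inversion_effect("black")            # about 1.70
interaction_contrast = white_effect - black_effect  # about 1.62
inverted_marginal = marginal_mean("inverted")
upright_marginal = marginal_mean("upright")
```

The two inversion effects reproduce the reported difference-wave means (3.32 μV and 1.70 μV), and the marginal means fall within rounding of the values reported for the main effect of inversion (3.89 μV and 1.38 μV).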
The same effects were obtained by analyzing the mean amplitude within the window of FWHM of the peak amplitude: There was a significant main effect for stimulus orientation on mean amplitude (F(1, 19) = 14.50, p = .001, η2 = .43), with inverted faces eliciting a larger P100 (M = 4.81 μV) than upright faces (M = 2.92 μV). There was also an interaction between Face race and Orientation (F(1, 19) = 9.67, p = .006, η2 = .34), such that the effect was greater for own-race (Mupright = 2.63 μV, Minverted = 5.10 μV) compared with other-race (Mupright = 3.20 μV, Minverted = 4.52 μV) faces.
We found a negative ERP over right lateral posterior electrodes ∼170 msec after stimulus presentation that evinced a typical N170 scalp distribution (Figure 6). We measured its amplitude for each participant and condition at the time window of 170 ± 20 msec determined a priori surrounding the peak amplitude. As for P100, we found a main effect of Face orientation (F(1, 19) = 11.89, p = .003, η2 = .39; Figure 6A and C) on N170 latency with longer latencies for inverted faces. There were no differences in latency because of face race (F(1, 19) = 2.09, p = .165, η2 = .10; Figure 6B). We therefore adjusted our amplitude measurement windows to 162 ± 20 msec for upright faces and 168 ± 20 msec for inverted faces.
The two-way (Face race and Face orientation) ANOVA at PO8 revealed no significant main effect of Race, F(1, 19) = 0.89, p = .357, η2 = .05, or Orientation, F(1, 19) = 0.514, p = .482, η2 = .03, and no interaction between Race and Orientation, F(1, 19) = 1.63, p = .217, η2 = .08. Analyses of the mean amplitude within the window of FWHM of the minimum amplitude also revealed no significant effects of Race, F(1, 19) = 0.25, p = .621, η2 = .01, or Orientation, F(1, 19) = 0.18, p = .679, η2 = .01, and no interaction between Race and Orientation, F(1, 19) = 0.01, p = .934, η2 = .00.
This study investigated the influence of configuration on the time course of face processing by comparing ERPs evoked by upright and inverted faces that were of the same race as the participants (white) or of a different race (black). Upright faces elicited a smaller P100 than inverted faces, indicating that configuration affects face processing as early as 100 msec after face onset. Our face inversion results are consistent with prior studies suggesting that category classification might begin within 100 msec of stimulus presentation (Braeutigam, Bailey, & Swithenby, 2001; Mouchetant-Rostaing, Giard, Delpuech, Echallier, & Pernier, 2000; Schendan, Ganis, & Kutas, 1998; Seeck et al., 1997). The results of those prior studies, however, were complicated by low-level features present in different stimulus categories, such as contrast evoked by different skin tones (He et al., 2009). Because inversion preserves low-level features such as spatial frequency, contrast, and luminance, the difference in P100 amplitude in this study cannot be due to confounds in low-level features. Our finding instead suggests that face-specific processes occur at these earlier latencies.
Single-cell recording studies in monkeys have reported activity in face-specific neurons at 70 msec, about 30 msec after the earliest visual spikes occur (Oram & Perrett, 1992). The earliest visual ERPs recorded in humans occur at around 50–60 msec (Yoshor, Bosking, Ghose, & Maunsell, 2006; Huettel et al., 2004). Our results suggest that face-specific activity occurs by at least 40 msec afterward. Face-specific processing at these early latencies is also consistent with eye-tracking studies showing that fast saccades toward human faces occur at 100–110 msec (Crouzet, Kirchner, & Thorpe, 2009).
The current results also expand our understanding of the anatomical location of category-specific processing in the brain. Whereas face processing research has long focused on a specific network of face-sensitive regions in VOTC and LOTC (Puce et al., 1995), the inverse modeling of our inversion effect difference wave suggests a neural source in the pericalcarine extrastriate cortex. This modeling result is consistent with PET and simultaneous EEG and fMRI studies showing that P100 originates from the extrastriate cortex (Heinze, Hinrichs, Scholz, Burchert, & Mangun, 1998; Woldorff et al., 1997). Recent data from our laboratory have also shown nonconfigured faces (with internally rearranged components) elicit increased fMRI activity in an extensive swath of ventral occipital cortex regions when compared with normally configured faces (Engell et al., under review). The inverse modeling of the difference between inverted and upright faces also localized activity to the pSTS, which is also commonly found active in response to faces (Puce, Allison, Asgari, Gore, & McCarthy, 1996). Indeed, our source map consisting mainly of both extrastriate cortex and pSTS activity largely overlaps with patterns of activation evoked by viewing faces, as described by the atlas for social agent perception, a probabilistic map from a large fMRI sample (Engell & McCarthy, 2013).
Our results for N170 are consistent with the prior literature—that is, N170 latency was delayed for inverted compared with upright faces, although we did not find an increase in amplitude (Anaki et al., 2007; Itier, Latinus, & Taylor, 2006; Itier & Taylor, 2002; Rossion, Gauthier, Goffaux, Tarr, & Crommelinck, 2002; Rossion et al., 2000). Some authors have argued that the increase in N170 amplitude after inversion reflects additional recruitment of object-selective areas (Rossion & Gauthier, 2002; Rossion et al., 1999). In particular, evidence from neuroimaging suggests that, although both upright and inverted faces elicit activity in face perception systems, inverted faces recruit additional regions in the ventral extrastriate cortex (Haxby et al., 1999).
An alternative explanation for the observed effect of inversion on ERP amplitude at ∼100 msec is that disruption of configuration enhances processing of single features, such as eyes, that are otherwise inhibited by the fully configured face. This possibility would be supported by evidence that the perception of configuration interferes with the identification of constituent parts (Suzuki & Cavanagh, 1995; Young et al., 1987; Mermelstein, Banks, & Prinzmetal, 1979). The hypothesis that enhanced amplitudes for nonconfigured stimuli reflect the involvement of eye-selective regions has in fact been proposed as an explanation for the effects of inversion at ∼170 msec: On the basis of the finding that the N170 is larger in response to isolated eyes than faces containing eyes, Bentin and colleagues (1996) suggested that the N170 component reflected the activity of an “eye processor” responsible for processing gaze information. Determining whether the effect of inversion on ERP amplitude at ∼100 msec arises from the recruitment of neural sources dedicated to feature-specific processing (otherwise inhibited by holistic processing in upright faces) or object-specific processing (otherwise replaced by face-specific processing in upright faces) may help elucidate the mechanisms of early face specificity.
We also found an effect of race on the P100 difference between upright and inverted faces, confirming the prediction that this difference should be larger for own-race faces. This finding is consistent with behavioral data showing that inversion impairs our ability to readily detect and recognize faces, especially own-race faces, which suggests that own-race faces are processed more holistically (Hancock & Rhodes, 2008; Rhodes, Brake, Taylor, & Tan, 1989). Previous ERP results on this issue instead focused on N170, and the results were mixed: Some studies found larger N170 inversion effects for own-race faces (Montalan et al., 2013; Caharel et al., 2011), whereas others found no difference between races (Chen et al., 2013). We note that the main difference between these experiments is task demands: In studies that found effects of race on the inversion effect, the task involved a specific response to the race of faces, whereas in studies that found no effect of race, race was not task relevant and participants instead responded to orientation or occasional nonface targets (see Wiese, 2013, for a review; Taylor, Shehzad, & McCarthy, 2016, for a demonstration of task-dependent ERP differences).
One outstanding issue is whether the effect of race on the P100 inversion effect observed here would also be observed in non-white participants. The other-race effect may be at least partially attributable to the greater experience people have seeing and interacting with others of their own race than with those of other races (Rossion & Michel, 2011). The view that enhanced recognition and memory for own-race faces result from familiarity is supported by evidence for reduced bias in participants with greater contact with other races (Hancock & Rhodes, 2008) and by reversed biases in participants who were raised in countries where other races are predominant (Sangrigoli, Pallier, Argenti, Ventureyra, & de Schonen, 2005). Facilitated face processing resulting from experience has been shown in the case of improved face memory in participants from populous towns who are thus exposed to many different faces (Balas & Saville, 2015). It has also been shown with the own-age bias for faces of the same age (Anastasi & Rhodes, 2005), which reverses with experience as in the case of adult preschool teachers (Kuefner, Cassia, Picozzi, & Bricolo, 2008) or young geriatric nurses (Wiese, Wolff, Steffens, & Schweinberger, 2013).
Our a priori analyses focused on amplitude and latency measurements at the peak latencies of P100 and N170 based on previous research. Indeed, the difference between inverted and upright faces strongly influenced the amplitude of the P100 peak and was significant at 100 msec as predicted. However, the difference wave continued to grow in amplitude until it reached a peak at 138 msec, all the while maintaining a stable scalp distribution. This discrepancy suggests that the difference between experimental conditions might be partially but not entirely captured by designating it a P100 effect. Indeed, P100 likely consists of the summed signal from multiple neuronal sources that may carry out parallel yet distinct functional processes. Our results exemplify how the excellent temporal resolution of electrical recordings might be especially helpful in identifying partially overlapping activations that give rise to the components typically observed and described in the literature.
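The logic of this observation can be illustrated with a minimal sketch in Python. It is not the analysis pipeline used in this study, and all waveform values are synthetic assumptions chosen for illustration: a hypothetical P100-like component shared by both conditions, plus a hypothetical additional source for inverted faces that peaks at 138 msec. The function name `gaussian` and every amplitude and width parameter are invented for this example.

```python
import math

def gaussian(t, peak_ms, width_ms, amp):
    """Gaussian-shaped deflection used as a stand-in for an ERP component."""
    return amp * math.exp(-0.5 * ((t - peak_ms) / width_ms) ** 2)

# Time axis in msec after stimulus onset (assumed 2-msec sampling).
times = list(range(0, 301, 2))

# Synthetic condition averages (assumed values, not the study's data):
# both conditions share a P100-like component; inverted faces add a
# second, later source that overlaps with it in time.
upright = [gaussian(t, 100, 15, 4.0) for t in times]
inverted = [gaussian(t, 100, 15, 4.0) + gaussian(t, 138, 20, 1.5) for t in times]

# Difference wave: inverted minus upright, sample by sample.
difference = [i - u for i, u in zip(inverted, upright)]

# Latency at which the difference wave is largest.
peak_index = max(range(len(times)), key=lambda k: difference[k])
peak_latency = times[peak_index]

# Size of the difference at the nominal P100 latency.
diff_at_100 = difference[times.index(100)]

print(f"difference at 100 msec: {diff_at_100:.3f}")
print(f"difference wave peak latency: {peak_latency} msec")
```

In this toy example, the difference is already nonzero at 100 msec but continues to grow until the later source reaches its maximum, which is the pattern one would expect if a distinct source overlaps in time with P100.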
This study contributes to our current understanding of the structural encoding of faces by individuating face-specific components starting at around 100 msec after stimulus presentation. Furthermore, our findings suggest that differences in configural processing of same- and different-race faces are represented in the earliest stages of face processing.
The authors thank Marcia Johnson and two anonymous reviewers for thoughtful comments on an earlier version of this manuscript. They also thank Maria Robbins for help with data collection, Adam Chekroud for technical advice, and members of the Yale Human Neuroscience Laboratory for helpful discussion. This work was supported by the National Institute of Mental Health grant MH-005286 to Gregory McCarthy.
Reprint requests should be sent to Gregory McCarthy, Department of Psychology, Yale University, PO Box 208205, New Haven, CT 06520, or via e-mail: firstname.lastname@example.org.