Electrophysiological and fMRI-based investigations of the ventral temporal cortex of primates provide strong support for regional specialization for the processing of faces. These responses are most frequently found in or near the fusiform gyrus, but there is substantial variability in their anatomical location and response properties. An outstanding question is the extent to which ventral temporal cortex participates in processing dynamic, expressive aspects of faces, a function usually attributed to regions near the superior temporal cortex. Here, we investigated these issues through intracranial recordings from eight human surgical patients. We related several different aspects of face processing (static and dynamic faces; happy, neutral, and fearful expressions) to power in the high-gamma band (70–150 Hz) obtained from a spectral analysis. Detailed mapping of the response characteristics as a function of anatomical location was conducted in relation to the gyral and sulcal pattern on each patient's brain. The results document strong responses to static or dynamic faces, often with abrupt changes in response properties between spatially close recording sites and with idiosyncratic patterns across subjects. Notably, strong responses to dynamic facial expressions can be found in the fusiform gyrus, just as responses to static faces can. The findings suggest a more complex, fragmented architecture of ventral temporal cortex around the fusiform gyrus, one that includes focal regions of cortex that appear relatively specialized for either static or dynamic aspects of faces.
How the brain is able to decode identity, gender, emotion, and other attributes of faces with such apparent efficiency has been a major topic of investigation. An early and influential model postulated a “divide-and-conquer” approach to the problem, with different aspects of facial information processed by functionally separate streams (Bruce & Young, 1986), which are now known to map onto neural pathways that are partly neuroanatomically segregated. Such segregation has been proposed in particular for dynamic (changeable) and static (unchangeable) face information (Haxby, Hoffman, & Gobbini, 2000). Here, static features refer to those things about an individual's face that do not change quickly, such as identity, race, and gender, and changeable features refer to emotion, gaze, and mouth movements, which all participate in social communication. According to this model, motivated primarily by results from fMRI studies, the lateral part of the fusiform gyrus, which contains the face-selective fusiform face area (FFA), processes static aspects of faces (Kanwisher, McDermott, & Chun, 1997; McCarthy, Puce, Gore, & Allison, 1997), whereas the lateral temporal cortex around the STS processes changeable information (Hoffman & Haxby, 2000).
A number of behavioral and functional imaging studies, however, support some form of interaction between these two processing streams (Vuilleumier & Pourtois, 2007; Ishai, Pessoa, Bikle, & Ungerleider, 2004; Baudouin, Gilibert, Sansone, & Tiberghien, 2000; Schweinberger & Soukup, 1998), but it remains unclear where this might happen. Direct electrophysiological recordings from the human brain offer the spatial resolution to investigate these issues. Intracranial ERP studies have revealed responses to static faces in fusiform cortex (Allison, Puce, Spencer, & McCarthy, 1999; Allison, Ginter, et al., 1994; Allison, McCarthy, Nobre, Puce, & Belger, 1994). On the other hand, functional imaging studies have shown that face motion can also activate this region (Schultz & Pilz, 2009; Sato, Kochiyama, Yoshikawa, Naito, & Matsumura, 2004; LaBar, Crupain, Voyvodic, & McCarthy, 2003). Analyzing the same data set as the one in this study, we previously found responses to both unchangeable and changeable aspects of faces that could be decoded better from ventral than lateral temporal cortex using spectral decoding (Tsuchiya, Kawasaki, Oya, Howard, & Adolphs, 2008). Given the different approaches used, it remains unclear to what extent neurons in the ventral temporal lobe respond to static and dynamic faces, and whether these aspects of faces are coded by the same neuronal populations or are represented in different subregions. Here, we addressed this issue by recording intracranial responses from the fusiform gyrus while participants viewed static as well as dynamic facial expressions, allowing us to investigate the differential responses seen to the two classes of stimuli within the same person and same neural region. Our results suggest that ventral temporal cortex around the fusiform gyrus is relatively fragmented into subregions that respond best to either unchangeable or changeable aspects of faces.
Participants were eight neurosurgical patients with medically intractable epilepsy who were undergoing clinical invasive seizure monitoring to localize seizure foci. The research protocol was approved by the institutional review board of the University of Iowa, and all subjects signed informed consent before participation. The data analyzed here have been previously used in another study that focused on spectral decoding (Tsuchiya et al., 2008).
Stimuli were made from grayscale pictures of neutral, happy, and fearful expressions of four individuals (two women) selected from the Ekman and Friesen set (Figure 1; Ekman & Friesen, 1976). Each face was equated for size, mean brightness, mean contrast, and position and framed in an elliptical window using MATLAB (Mathworks, Natick, MA). The faces subtended 7.5° × 10° of visual angle. Intermediate morphs during the dynamic phase of stimulation were created from 28 evenly spaced linear interpolations between the initial neutral face and the ending emotional face using morphing software (Morph 2.5, Gryphon Software, San Diego, CA). The interpolations were based on the starting and ending positions of manually selected fiducial points and were made with respect to both warping (pixel position) and pixel luminance. During the dynamic phase, intermediate morphs were incremented at a frame rate of 60 Hz, creating the impression of smooth facial motion changing from neutral face to either a happy face (morph-to-happy) or a fearful face (morph-to-fear) over 500 msec (Figure 1). Dynamic nonface comparison stimuli (control trial) were generated from a radial checker pattern with black/white square wave modulation at around 0.25°/cycle framed in an elliptical window (Figure 1). The pattern was presented statically for 1 sec, followed by a 0.5-sec dynamic period in which the luminance boundaries moved radially, expanding or contracting at a velocity of 0.5°/sec. We presented the stimuli using the Psychophysics Toolbox version 2.55 (Brainard, 1997; Pelli, 1997) and MATLAB 5.2 on a PowerMac G4 running OS 9 (Apple, Cupertino, CA).
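The timing of the dynamic phase can be checked with a small sketch. It assumes that the neutral starting face and the final emotional face each occupy one frame alongside the 28 intermediate morphs, which is an interpretation and not something stated explicitly above. The function names are hypothetical, and only the luminance cross-fade is shown; the positional warping based on fiducial points, performed by the morphing software, is omitted.

```python
def morph_weights(n_intermediate=28):
    """Blend weights for the intermediate morphs, excluding both endpoints."""
    steps = n_intermediate + 1
    return [k / steps for k in range(1, n_intermediate + 1)]


def blend_luminance(start, end, w):
    """Linear cross-fade between two equally sized lists of pixel luminances."""
    return [(1 - w) * a + w * b for a, b in zip(start, end)]


weights = morph_weights()

# Assumption: the two endpoint faces each count as one frame.
n_frames = len(weights) + 2
frame_rate_hz = 60
duration_ms = n_frames / frame_rate_hz * 1000

# Toy two-pixel "images" (hypothetical values) blended at the midpoint.
midpoint = blend_luminance([0.0, 100.0], [100.0, 0.0], 0.5)
```

Under this assumption, 30 frames at 60 Hz span the 500 msec quoted for the dynamic phase.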
Each session consisted of 200 trials, including 80 trials of morph-to-fear (20 for each identity), 80 trials of morph-to-happy, and 40 trials of nonface control (20 expanding and 20 contracting). A session was divided into 20 blocks of 10 trials. Within each block, 10 different stimulus types (morph-to-fear and morph-to-happy of each of four individuals and expanding and contracting movements of checker pattern) were presented once in random order. Blocks followed one another without any intervening delay. Therefore, each stimulus type appeared 20 times in each session in a pseudorandom order. Immediately before a session began, we instructed subjects which feature, either emotion or gender, they had to attend and respond to. Each participant completed two sessions, an emotion discrimination session and a gender discrimination session. Five participants underwent an emotion discrimination session first followed by a gender discrimination session, and the remaining three participants underwent a gender discrimination session first. The order of sessions was arbitrary, determined by an experimenter. A trial began with a static rectangular checker pattern for 1 sec, followed either by a still image of faces with neutral expression or by a radial checker pattern. After 1 sec of the still images, the dynamic phase of each stimulus began and lasted for 500 msec. The last frame in the morph movie stayed on for another 1 sec. After the stimulus was extinguished, participants were prompted to make a response to discriminate the stimulus (gender or emotion, depending on the task). A prompt reminded participants of the three alternatives: 1 = happy, 2 = other, and 3 = fear in the emotion discrimination sessions and 1 = woman, 2 = other, and 3 = man in the gender discrimination sessions. They were asked to answer “other” if they saw a checker pattern instead of a face. After the response, the next trial started. 
We did not put any time constraint on the response time and did not instruct participants whether to put priority on speed or accuracy of responses.
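The block structure described above can be sketched as follows. This is an illustrative reconstruction, not the presentation code used in the study (which ran in MATLAB with the Psychophysics Toolbox); the identity labels, function names, and seed are hypothetical.

```python
import random

# Hypothetical labels for the four individuals in the stimulus set.
IDENTITIES = ["id1", "id2", "id3", "id4"]


def stimulus_types():
    """The 10 stimulus types: 2 morphs x 4 identities, plus 2 control motions."""
    faces = [(f"morph-to-{emotion}", identity)
             for emotion in ("fear", "happy")
             for identity in IDENTITIES]
    controls = [("control", "expanding"), ("control", "contracting")]
    return faces + controls


def make_session(n_blocks=20, seed=None):
    """Concatenate n_blocks blocks, each a fresh shuffle of all 10 types."""
    rng = random.Random(seed)
    session = []
    for _ in range(n_blocks):
        block = stimulus_types()
        rng.shuffle(block)
        session.extend(block)
    return session


session = make_session(seed=0)
```

Because every block contains each stimulus type exactly once, the trial counts per session (80 morph-to-fear, 80 morph-to-happy, 40 control) follow directly from the design, while the order within a block remains random.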
Anatomical Location of the Electrodes
Participants had several subdural and depth electrodes implanted (Ad-Tech Medical Instrument Corporation, Racine, WI) with up to 188 contacts. The location and number of electrodes varied depending on clinical consideration. We analyzed data recorded from contacts on the ventral temporal cortex around the fusiform gyrus. Electrodes were either four-contact strip electrodes or 2 × 8 contact strip-grid electrodes with interelectrode distance of 1 cm and 5 mm, respectively. Three participants had 16 contacts each in the right hemisphere (R), and five participants had 4–16 contacts (mean = 10.4) in the left hemisphere (L). In summary, a total of 48 contacts on R and 52 contacts on L made a grand total of 100 contacts across all participants. Each contact was a 4-mm-diameter disc made of platinum–iridium embedded in a silicone sheet with an exposed diameter of 2.3 mm.
For each participant, we obtained structural T1-weighted MRI volumes on a 3-T TIM Trio (Siemens, Erlangen, Germany) both before and after implantation, as well as CT scans (postimplantation only). For the MRI scans, coronal slices were obtained with 1-mm slice thickness and 0.78 × 0.78 mm in-plane resolution. Axial slices of the CT scans were obtained with 1-mm slice thickness and 0.47 × 0.47 mm in-plane resolution. Postimplantation CT scans and preimplantation MRI were rendered into 3-D volumes and coregistered using AFNI (NIMH, Bethesda, MD) and Analyze software (version 7.0, AnalyzeDirect, Stilwell, KS) with mutual information maximization. Postimplantation CT scans were used to identify the coordinates of the contacts. We transferred these coordinates onto the high-resolution preoperative MRI and obtained 2-D projections of the MRI from ventral views using in-house programs in MATLAB 7. We manually identified anatomical landmarks around the ventral temporal surface, including the inferior temporal gyrus (ITG), lateral and medial fusiform gyrus (LFG and MFG, respectively), and inferior lingual gyrus (ILG).
The electrical potential at each electrode was referenced to an electrode placed under the scalp near the vertex of the skull. The impedances of the electrodes were 5–20 kΩ. Signals from the brain were filtered (1.6 Hz–1 kHz), digitized, and recorded using the Multichannel Neurophysiology Workstation (Tucker-Davis Technologies, Alachua, FL) and analyzed off-line using custom programs in MATLAB. In the first two subjects, we used an LCD display (Multisync LCD 1760V, NEC, Tokyo, Japan) for stimulus presentation and recorded the electrophysiological signal at a sampling rate of 1 kHz. In the remaining six subjects, we used another LCD display (VX922, ViewSonic, Walnut, CA) and recorded the signal at 2 kHz. In both cases, the display refresh rate was 60 Hz. To measure the precise timing of visual stimulation, we presented a small white rectangle on the top left corner of the display at the onset of the stimulus and recorded changes of luminance with a photodiode along with the electrocorticography (ECoG).
We discarded any trial containing absolute ECoG potentials that exceeded the mean + 3 SD on raw data and high-pass filtered data (cutoff frequency = 24 Hz). We applied rejection on high-pass filtered data to remove small amplitude spikes that might go undetected in the raw data but can appear as wide-band noise after time–frequency analysis. Noisy trials were rejected on a contact-by-contact and trial-by-trial basis using an automated custom MATLAB program. Therefore, the number of trials that went into analysis for each stimulus category differed between contacts (see insets of Figures 2 and 3). Mean rejection rates for each stimulus category across all 100 ventral temporal contacts were 6.0%, 6.6%, and 4.5% for morph-to-fear, morph-to-happy, and nonface control trials, respectively, which were not significantly different from each other (p = .57, Kruskal–Wallis test). None of the cortical areas included in this study were within a seizure focus.
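A minimal sketch of an amplitude-based rejection rule of this kind is given below. The text does not specify over which samples the mean and SD were computed, so the sketch makes an explicit assumption: each trial is summarized by its peak absolute potential, and the threshold is the mean + 3 SD of those peaks across the trials of one contact. The function names are hypothetical, and the 24-Hz high-pass filtering step is not shown.

```python
from statistics import mean, stdev


def peak_abs(trial):
    """Largest absolute potential within one trial (a list of samples)."""
    return max(abs(v) for v in trial)


def keep_mask(trials, n_sd=3.0):
    """True for trials whose peak stays at or below mean + n_sd * SD.

    Assumption: the mean and SD are taken over per-trial peak amplitudes
    for a single contact; the original criterion may have been computed
    differently.
    """
    peaks = [peak_abs(t) for t in trials]
    threshold = mean(peaks) + n_sd * stdev(peaks)
    return [p <= threshold for p in peaks]


def combine(mask_raw, mask_highpass):
    """Keep a trial only if it passes on both raw and high-pass data."""
    return [a and b for a, b in zip(mask_raw, mask_highpass)]


# Toy data: 19 clean trials and one trial with a large transient.
trials = [[1.0, -0.5, 0.8] for _ in range(19)] + [[0.2, 100.0, -3.0]]
mask = keep_mask(trials)
```

Applying `keep_mask` separately to the raw and the high-pass filtered version of the same trials and merging the results with `combine` would mirror the two-stage rejection described above.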
In the epoch-based analysis, we investigated the effect of face and emotion during static and dynamic stimulus periods by setting five epochs (Figure 1): (1) baseline (−550 to −250 msec before onset of static stimulus), (2) early static (150–450 msec after onset of static stimulus), (3) late static (550–850 msec after onset of static stimulus), (4) dynamic (150–450 msec after onset of dynamic stimulus), and (5) postdynamic (50–350 msec after offset of dynamic stimulus). We performed Wilcoxon rank sum tests to contrast the means of face and control trials and fearful and happy trials for each contact and for each epoch. Resultant p values were pooled across all contrasts, contacts, and epochs within each subject, and the level of statistical significance (q) was set at a false discovery rate (FDR) of <0.05 (Benjamini & Hochberg, 1995).
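The FDR step can be illustrated with the Benjamini and Hochberg (1995) step-up procedure. The sketch below is a generic implementation, not the authors' MATLAB code; the function name and the example p values are hypothetical. In the analysis described above, the input would be the p values pooled across all contrasts, contacts, and epochs within one subject.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return one boolean per p value: True if it survives FDR control at q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k_max = rank

    # Every test ranked at or below k_max is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            significant[idx] = True
    return significant


# Hypothetical pooled p values from one subject.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example, q=0.05)
```

Note that the rule is applied to the ranked list as a whole: a p value above its own rank threshold can still be declared significant if a larger p value further down the list meets its threshold.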
We defined the face-responsive ERBP to static face stimuli as the response that satisfied the following three criteria: (1) Mean ERBP responses of face trials were significantly greater in early and/or late static epochs than in the baseline epoch. (2) The mean ERBP elicited by the static faces was also significantly greater than the mean ERBP elicited by checkerboard control stimulus. (3) The maximum ERBP elicited by static face stimuli was at least 50% and 1 dB larger than the maximum ERBP elicited by control stimuli during the 1-sec period after onset of static faces. Similarly, we defined face-responsive ERBP in response to dynamic face stimuli as follows: (1) Mean ERBP responses of face trials were significantly larger than baseline in dynamic and/or postdynamic epochs. (2) The mean ERBP elicited by dynamic face stimuli was significantly larger than the mean ERBP elicited by control stimuli. (3) The maximum ERBP elicited by dynamic face stimuli was at least 50% and 1 dB larger than the maximum ERBP elicited by control stimuli during the 1-sec period after onset of dynamic faces.
The effect of emotional facial motion on ERBP responses was tested only with face trials because there was no emotional content in the control trials. We based significant emotional modulation on the comparison between the mean ERBP elicited by morph-to-fear trials and morph-to-happy trials in either the dynamic or postdynamic epochs. We investigated emotional modulation across all 100 contacts regardless of the magnitude of ERBP responses and the face responsiveness at that contact to obtain a broad and unbiased assessment.
To coordinate electrode locations across the eight subjects, contacts were localized in relation to the anatomy of the ventral temporal cortical surface. In the medial–lateral orientation, their location was specified by gyri on which electrodes resided. Location in the anterior–posterior orientation was specified according to the position in 10 equally divided segments from temporal pole to occipital pole, with the first segment being the most anterior and the tenth segment being the most posterior (cf. Figure 4). We chose this localization method instead of a numerical coordinate system given the known close relationship between cortical function and gyral–sulcal anatomy and given that the anatomy of the cortical surface is quite variable from subject to subject, especially in the ventral temporal cortex, precluding automated coregistration procedures (Spiridon, Fischl, & Kanwisher, 2006).
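The anterior–posterior part of this localization scheme amounts to binning each contact into one of 10 equal segments. The sketch below assumes a hypothetical one-dimensional coordinate (for example, in millimeters along the ventral projection) for the contact and for the two poles; the function name and the coordinates are illustrative and are not taken from the study.

```python
def ap_segment(position, temporal_pole, occipital_pole, n_segments=10):
    """Segment index of a contact: 1 = most anterior, n_segments = most posterior.

    All three coordinates lie on the same (hypothetical) anterior-posterior
    axis; the axis may increase in either direction.
    """
    frac = (position - temporal_pole) / (occipital_pole - temporal_pole)
    if not 0.0 <= frac <= 1.0:
        raise ValueError("contact lies outside the pole-to-pole range")
    # The occipital pole itself (frac == 1) belongs to the last segment.
    return min(int(frac * n_segments) + 1, n_segments)


# A contact 55% of the way from temporal pole to occipital pole.
segment = ap_segment(55.0, temporal_pole=0.0, occipital_pole=100.0)
```

Together with the gyrus label for the medial–lateral direction, this index gives a subject-independent location that does not rely on a numerical coordinate system shared across brains.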
To investigate the time course of modulation of the ERBP by expressive facial motion, we performed serial Wilcoxon rank sum tests comparing the averaged ERBP of fear trials and happy trials at every time point on 23 contacts with significant ERBP modulation by face motion. Resultant p values were pooled across all 23 contacts and across all time points over a 4-sec period starting from 1 sec before onset of static faces, and the level of significance was then corrected at FDR < 0.05. To show common tendencies in the time course of the response across contacts, p values at each time point were plotted for all 23 contacts as an overlapping time series (Figure 5).
We applied receiver operating characteristic (ROC) analysis to assess how well ERBP responses to each category of stimulus can be separated on a single-trial basis. We performed ROC analyses for binary classification between ERBP of preferred and nonpreferred stimuli by sliding a threshold over the whole range of ERBP at each peristimulus time point. We computed area under the curve (AUC; Figure 6D and E). If distributions of ERBP of preferred and nonpreferred stimuli completely overlap, AUC equals 0.5. The more the ERBP distributions of the two stimuli separate, the more AUC deviates from 0.5; when more ERBP values of preferred stimuli lie above those of nonpreferred stimuli, AUC approaches 1, and in the opposite case, it approaches 0. For discrimination of face from nonface control, face is the preferred stimulus. For discrimination of fear from happy, we regarded morph-to-fear as the preferred stimulus and morph-to-happy as the nonpreferred stimulus and vice versa for discrimination of happy from fear. As can be seen in Figure 6E, the AUC value was above 0.5 when the response to fear was larger than that to happy, and it was below 0.5 when the response to fear was smaller than that to happy. We report the maximum AUC between 50 and 900 msec after the onset of static and dynamic stimuli for discrimination of face from nonface stimuli across 24 and 27 contacts that were face responsive during early and late static epochs and dynamic and postdynamic epochs, respectively. For discrimination of fear from happy and happy from fear, we report the maximum AUC between 50 and 900 msec after the onset of dynamic stimuli across 20 and 4 contacts whose ERBPs were fear > happy and happy > fear, respectively. The distribution of maximum AUCs for the discrimination of faces from 24 static and 27 dynamic face-responsive contacts was statistically contrasted against that of 76 and 73 not face-responsive contacts, respectively, using Wilcoxon rank sum tests. 
Similarly, the distribution of maximum AUCs for discrimination between fear and happy of 20 fear > happy and 4 happy > fear contacts was statistically tested against that of 80 and 96 contacts that did not respond selectively to emotions, respectively (Figure 6F–I). To see AUC of baseline activity, we computed the maximum AUC of all 100 contacts between 900 and 150 msec before the onset of static stimuli.
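The threshold-sliding construction of the ROC curve can be sketched as follows. This is a generic illustration with hypothetical function names and toy values, not the analysis code used in the study. Each input to `roc_auc` is the list of single-trial ERBP values at one peristimulus time point, and `max_auc` assumes the trials have already been restricted to the time window of interest.

```python
def roc_auc(preferred, nonpreferred):
    """Area under the ROC curve obtained by sliding a threshold over all values."""
    thresholds = sorted(set(preferred) | set(nonpreferred), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every value: nothing detected
    for th in thresholds:
        hit_rate = sum(v >= th for v in preferred) / len(preferred)
        false_alarm_rate = sum(v >= th for v in nonpreferred) / len(nonpreferred)
        points.append((false_alarm_rate, hit_rate))
    # Trapezoidal integration of hit rate over false-alarm rate.
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2
    return auc


def max_auc(pref_trials, nonpref_trials):
    """Maximum AUC over time; each trial is a list of ERBP samples."""
    n_time = len(pref_trials[0])
    return max(
        roc_auc([t[i] for t in pref_trials], [t[i] for t in nonpref_trials])
        for i in range(n_time)
    )


# Toy single-trial values at one time point (hypothetical, in dB).
separated = roc_auc([3.0, 4.0, 5.0], [0.0, 1.0, 2.0])
overlapping = roc_auc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
reversed_case = roc_auc([0.0, 1.0, 2.0], [3.0, 4.0, 5.0])
```

The three toy cases correspond to the situations described above: fully separated distributions with the preferred stimulus larger, identical distributions, and fully separated distributions with the preferred stimulus smaller.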
Responses to Static and Dynamic Faces
Our stimuli of both faces and checker patterns elicited robust ERBP and ERP responses in the ventral temporal cortex (Figures 2 and 3; Supplementary Figures S2, S4, and S5). Face-responsive ERP sites were found distributed across ventral temporal cortex around the fusiform gyrus, consistent with previous reports (Allison et al., 1999). Following the onset of the static neutral face at the beginning of the trial, we observed the previously described positive–negative–positive (P150, N200, and P290) waveform (Allison, Ginter, et al., 1994; Allison, McCarthy, et al., 1994). However, unlike ERP responses that were found primarily for static stimuli but not for dynamic stimuli, robust ERBP responses were elicited by dynamic stimuli as well as static stimuli (Figures 2 and 3; Supplementary Figures S2, S4, and S5).
In each of our eight participants, we recorded face-responsive ERBPs in at least one electrode contact responding to either static or dynamic face stimulus or both (Figure 3; Supplementary Figures S4 and S5). The total numbers of face-responsive electrode contacts across eight participants responding to static and dynamic faces were 24 and 27, respectively, of 100 contacts. The distribution of face-responsive ERBP between R and L was not significantly different for static (R: 14/48 contacts, L: 10/52 contacts; Fisher's exact test, p = .35) or dynamic (R: 15/48, L: 12/52; Fisher's exact test, p = .38) faces (Figure 4). We did not see any difference in the overall distribution of static face-responsive sites and dynamic face-responsive sites across participants, except for slightly more dynamic face-responsive sites across both hemispheres. We found contacts responsive primarily to static faces, primarily to dynamic faces, and equally to both: Face-responsive ERBP were elicited only by static faces in 11 contacts (R: 5/48, L: 6/52), only by dynamic faces in 14 contacts (R: 6/48, L: 8/52), and by both static and dynamic stimuli in 13 of 100 contacts (R: 9/48, L: 4/52).
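The hemispheric comparisons above rest on Fisher's exact test for a 2 × 2 table (hemisphere by responsiveness). The sketch below is a generic two-sided implementation that sums the probabilities of all tables no more likely than the observed one; whether the original analysis used exactly this two-sided convention is an assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Static face-responsive contacts: R 14 of 48, L 10 of 52.
p_static = fisher_exact_two_sided(14, 48 - 14, 10, 52 - 10)
# Dynamic face-responsive contacts: R 15 of 48, L 12 of 52.
p_dynamic = fisher_exact_two_sided(15, 48 - 15, 12, 52 - 12)
```

The rows of each table are the two hemispheres and the columns are responsive and unresponsive contacts, so the row totals are fixed at 48 and 52 contacts.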
The existence of static-only and dynamic-only face-responsive contacts suggests that there might be partly separate neural systems involved in processing static and dynamic faces. Contacts with similar response properties, whether they were responsive to static faces, dynamic faces, or both, tended to cluster together as seen in Contacts 1–4, 9, and 10 of Figure 3B; Contacts 2, 3, 5, and 9 of Figure 3C; and Contacts 3, 10, and 11 of Supplementary Figure S4A. Transition from one type of response property to the other is often abrupt between clusters as seen between Contacts 9 and 10 and surrounding contacts of Figure 3B, where face responsiveness to dynamic faces steeply declined within 5 mm. On the other hand, some response changes were more gradual, such as the response to static faces in Contacts 1–4 of Figure 3B. These findings suggest that there are separate regions of cortex in the ventral temporal lobe, some more activated by static than dynamic faces and some showing the opposite responsiveness.
Responses to Different Emotions
Next, we investigated whether dynamic expressions of different emotions affect ERBP. Modulation of ERBP by expressive face motion was seen in 23 (R: 19/48, L: 4/52; p = .0002, Fisher's exact test) of 100 contacts in six subjects. The majority of cortical sites where ERBP was modulated by dynamic face expressions showed greater ERBP responses for morph-to-fear than morph-to-happy faces. Such fear > happy response was seen in 20 contacts (R: 17/48, L: 3/52, six participants; Figure 4B). In only four contacts (R: 3/48, L: 1/52, three participants) did happy expressions elicit larger ERBPs than fearful expressions (Figure 3C; Supplementary Figures S3C and S4A). The happy > fear modulation was spatially limited such that it was found in isolation surrounded by cortical sites showing fear > happy modulation or no modulation (Figure 3C; Supplementary Figures S3C and S4A). In total, modulation of ERBP by expressive face motion was seen in 16 of 38 face-responsive contacts (Figure 3; Supplementary Figures S4A and S5A) and in 7 of 62 contacts that did not have face-responsive ERBP responses in either of the epochs (Contact 6 of Figure 3B; Contacts 6, 7, and 10 of Figure 3C and Supplementary Figure S3C; and Contact 1 of Supplementary Figure S4A).
We examined the time course of ERBP evoked by fearful and happy dynamic facial expressions in 23 contacts that had a significantly different response to the two emotions. Latencies to the development of differences in ERBPs evoked by dynamic faces of different emotions were as brief as 120 msec after stimulus onset. We found that early differences, which developed within 300 msec, were mostly because of responses elicited by happy as compared with fearful dynamic faces (Figure 5).
Single-trial- and Single-contact-based Analysis
Next, we examined face versus control or fearful motion versus happy motion responses on a single-trial, single-contact basis. In the contact in the right LFG shown in Figure 2 and Contact 3 of Figure 3C, most ERBPs responding to face stimuli in both static and dynamic epochs were larger than ERBPs elicited by nonface control stimuli (Figure 6A and B). The AUC from our ROC analysis reached almost 1 in both epochs (Figure 6D, maximum AUC of 0.99 for static and 0.99 for dynamic), demonstrating that maximum ERBPs from single trials can almost perfectly distinguish responses to faces from those to control checkerboards. Discrimination of morph-to-fear versus morph-to-happy was more difficult, as one might expect (Figure 6C and E). In this contact, maximum AUC for fear > happy reached 0.63, and maximum AUC for happy > fear was 0.70. The average of maximum AUC for detection of faces was 0.89 (0.72–1; Figure 6F) in static epochs across 24 static face-responsive contacts and 0.84 (0.65–1; Figure 6G) in dynamic epochs across 27 dynamic face-responsive contacts. Maximum AUCs of these contacts were significantly different from those of face-unresponsive contacts (Wilcoxon rank sum test, p < 10^−12, 76 unresponsive contacts in static epochs; p < 10^−5, 73 in dynamic epochs). The average of maximum AUC for discrimination of fear from happy was 0.67 (0.60–0.79; Figure 6H) with 20 fear > happy contacts, and happy from fear was 0.64 (0.61–0.70; Figure 6I) with four happy > fear contacts. Maximum AUCs of these contacts were significantly different from those of contacts that did not respond to emotional facial motion (p < 10^−10, 80 unresponsive contacts for detection of morph-to-fear; p < .002, 96 for detection of morph-to-happy).
Stimuli used in this study consisted of two distinct epochs within each trial: presentation of a static neutral face and dynamic change of expression from neutral to either fearful or happy. In the dynamic part, a specific aspect of changeable features (i.e., the emotional expression) was being changed, whereas unchangeable features of faces (their identity) were held constant. Unchangeable features refer to those things about an individual's face that do not change quickly, such as identity, race, and gender, and changeable features refer to those that typically come into play during an emotional expression (Haxby et al., 2000). We employed a movie with gradual expression change from neutral to either fearful or happy in part because it is more natural to see facial expressions changing dynamically from neutral to an emotion than to see a static emotional face abruptly appearing.
Using the same set of data, we previously analyzed the power modulation of the intracranial EEG across wide frequency bands using a novel decoding approach and found that EEG components in the frequency range from 50 to 150 Hz carried information that discriminated faces from control geometric patterns as well as fearful from happy expressions. Importantly, we also found that decoding performance was highest around the MFG (Tsuchiya et al., 2008). Therefore, in this study, we focused our analysis on high-gamma band components in the fusiform gyrus to further elucidate how face information is represented there.
The ERBP in the high-gamma band elicited by static and dynamic faces provides evidence that human ventral temporal cortex around the fusiform gyrus processes not only unchangeable but also changeable aspects of faces. This region appears to be functionally divided into smaller heterogeneous subregions that can be differentially specialized for processing dynamic or static faces or indeed nonface stimuli. Latencies for the development of significant differences between responses evoked by fearful and happy face motions were as brief as 120 msec, suggesting that at least part of the response to dynamic face stimuli may be bottom–up (as opposed to requiring feedback from structures such as the amygdala or the STS, which would be expected to require longer latencies). To summarize the key conclusions from our findings:
There are small regional areas of cortex in the human ventral temporal lobe with face-responsive properties, a finding in line with electrophysiological and neuroimaging studies in monkeys as in humans (Freiwald, Tsao, & Livingstone, 2009; Moeller, Freiwald, & Tsao, 2008; Pinsk, DeSimone, Moore, Gross, & Kastner, 2005; Tsao, Freiwald, Knutsen, Mandeville, & Tootell, 2003; Allison et al., 1999; McCarthy, Puce, Belger, & Allison, 1999; Puce, Allison, & McCarthy, 1999; Allison, Ginter, et al., 1994; Perrett et al., 1985; Desimone, Albright, Gross, & Bruce, 1984; Perrett, Rolls, & Caan, 1982).
The precise location of these face-responsive regions varies from individual to individual.
Responses in ventral temporal cortex encompass selectivity for unchangeable as well as changeable aspects of faces, with different small subregions relatively specialized for one or the other or responding equally to both.
ERBP Elicited by Faces
The lateral part of the fusiform gyrus, the so-called FFA, is preferentially activated by faces, and a large volume of electrophysiological (Allison et al., 1999; McCarthy et al., 1999; Puce et al., 1999; Allison, Ginter, et al., 1994; Allison, McCarthy, et al., 1994) and imaging (Kanwisher et al., 1997; McCarthy et al., 1997; Puce, Allison, Asgari, Gore, & McCarthy, 1996) studies has confirmed this area's involvement in face processing. In agreement with this literature, we recorded face-responsive ERPs with a typical waveform from ventral temporal cortex around the fusiform gyrus responding to static faces (Figure 2).
ERBP is widely used for investigations of local neuronal activity. Higher-frequency components of the EEG that are measured with the ERBP have been implicated in various cognitive functions in humans (Edwards et al., 2010; Nourski et al., 2009; Vidal, Chaumon, O'Regan, & Tallon-Baudry, 2006; Lachaux et al., 2005; Tanji, Suzuki, Delorme, Shamoto, & Nakasato, 2005; Pfurtscheller, Graimann, Huggins, Levine, & Schuh, 2003; Crone, Boatman, Gordon, & Hao, 2001; Crone, Miglioretti, Gordon, & Lesser, 1998). The spatial distribution of the ERBP in the gamma range is typically more focal than for electrophysiological measures in lower frequency bands, and functional maps inferred from the ERBP correspond well to the topographic maps derived from electrical cortical stimulation (Crone, Boatman, et al., 2001; Crone, Hao, et al., 2001; Crone et al., 1998). In nonhuman primates, power increases in ERBP correlate better with multiunit neuronal firing than power modulation in lower frequency bands (Whittingstall & Logothetis, 2009; Ray, Crone, Niebur, Franaszczuk, & Hsiao, 2008; Steinschneider, Fishman, & Arezzo, 2008).
It is important to note that our use of the term “face responsiveness” in this study is not meant to imply face selectivity in a more general sense but only the relative selectivity of responses to faces over those to checker patterns, without a more exhaustive comparison of responses to other object categories (which we did not undertake in this study).
Functional Specialization in FFA
An emerging view of the face processing system holds that face information is processed in multiple interconnected and locally specialized brain regions in a coordinated manner (Moeller et al., 2008; Fairhall & Ishai, 2007; Calder & Young, 2005; Adolphs, 2002; Haxby et al., 2000; Ishai, Ungerleider, Martin, Schouten, & Haxby, 1999) rather than within strictly segregated pathways. Neurons responding selectively to faces have been found in the monkey inferior temporal cortex and cortex around the STS (Gross & Sergent, 1992; Desimone et al., 1984; Rolls, 1984; Perrett et al., 1982). Patches of cortex specialized for face processing are found in the ventral and lateral temporal cortex in nonhuman primates and humans (Bell, Hadj-Bouziane, Frihauf, Tootell, & Ungerleider, 2009; Pinsk et al., 2005, 2009; Hadj-Bouziane, Bell, Knutsen, Ungerleider, & Tootell, 2008; Tsao, Moeller, & Freiwald, 2008; Tsao, Freiwald, Tootell, & Livingstone, 2006; Tsao et al., 2003). Neural responses in the FFA have been reported to be stronger to dynamic faces than to static faces (Schultz & Pilz, 2009; Sato et al., 2004; LaBar et al., 2003). Regions responding to static or dynamic faces are mutually interconnected and capable of modulating one another (Rajimehr, Young, & Tootell, 2009; Moeller et al., 2008). Such distributed representations of objects including faces can be established with surprisingly short latencies and have been used to successfully decode stimulus categories from intracranial EEG recordings (Liu, Agam, Madsen, & Kreiman, 2009; Tsuchiya et al., 2008). An architecture such as this might explain the findings of interactions between the processing of emotion and identity that have been reported earlier (Ganel, Valyear, Goshen-Gottstein, & Goodale, 2005; Ishai et al., 2004; Dolan, Morris, & de Gelder, 2001; Baudouin et al., 2000; Schweinberger & Soukup, 1998).
We found that static and dynamic faces elicited significant ERBP modulation within discrete but partially overlapping cortical sites around the fusiform gyrus. This region may thus serve a more general function in extracting information from faces based on low-level features, which precedes the extraction of higher level information such as emotional cues (Tsuchiya et al., 2008). Such a system might exist in parallel with alternate visual routes that direct coarse visual information to cortical areas involved in emotional and attentional modulation (Rudrauf et al., 2008; Vuilleumier, Armony, Driver, & Dolan, 2003; Winston, Vuilleumier, & Dolan, 2003; Morris, Ohman, & Dolan, 1999).
FFA Responses to Facial Expression
Modulation of FFA responses by facial expression has been suggested to reflect feedback, which serves to enhance the processing of emotionally salient information (Vuilleumier & Pourtois, 2007). Possible candidate origins of such feedback are the amygdala and the pFC. Our findings do not rule out such a mechanism, but they place an upper bound on its latency. Previous intracranial ERP studies in the ventral temporal lobe using static stimuli identified the earliest differential responses to emotion with latencies exceeding 300 msec (Pourtois, Spinelli, Seeck, & Vuilleumier, 2010; Puce et al., 1999), supporting the notion of a delayed feedback signal. In the present case, we observed the emergence of emotion category discrimination in the ERBP by 120 msec. Although such an early response does not by itself rule out a role for rapid feedback (Kirchner, Barbeau, Thorpe, Régis, & Liégeois-Chauvel, 2009), it is also very much consistent with a feed-forward mechanism given that the category discrimination we observed emerges at the onset of the response and follows a time course similar to other, presumably feed-forward, object and face-selective responses in adjacent cortex (Agam et al., 2010; Liu et al., 2009; Serre et al., 2007; Thorpe, Fize, & Marlot, 1996; Perrett et al., 1982). In addition, the observation that modulation by facial expression appeared in isolated contacts, rather than as a global phenomenon encompassing all face-responsive contacts, implies that any effect of feedback modulation would have to be directed to specific cortical areas.
This finding does not fit the picture of a more diffuse feedback-dependent modulation that has emerged from the functional imaging literature (Vuilleumier & Pourtois, 2007), although it remains possible that feedback modulation acts selectively on specific subregions of face-responsive cortex or that the modulation measured with BOLD-fMRI is distinct from the modulation measured with direct electrophysiological recordings, at least in the frequency range we analyzed in our study. A number of functional imaging studies have identified a selective enhancement of FFA responses to fearful faces (Ishai et al., 2004; Vuilleumier, Richardson, Armony, Driver, & Dolan, 2004; Vuilleumier, Armony, Driver, & Dolan, 2001), which has been argued to depend on feedback from the amygdala (Vuilleumier & Pourtois, 2007). In agreement with this pattern, we found a predominance of emotion-discriminating responses that showed enhanced ERBP to the fearful morph over the happy morph. This predominance of fear-preferring responses emerged late in the dynamic phase of the stimulation and may thus reflect a contribution from such a feedback mechanism. As noted, however, only a subset of face-responsive contacts showed emotional modulation, suggesting that any feedback modulation affected specific subregions of the responsive cortex. In addition, we also observed a higher ERBP response to happy morphs at a few locations. These responses occurred in the early dynamic period, making them seemingly inconsistent with feedback modulation.
Because of limitations in collecting data from neurosurgical patients, such as time, attention span, and fatigue, we used emotional expressions as the sole facial dynamic stimuli, which made it impossible to separate face motion from face emotion. It therefore remains possible that the selectivity for fearful or happy dynamic expressions reflects distinctions between particular motion cues rather than distinctions between emotions. It will be important in future studies to determine the responsiveness of these cortical regions to specific face movement components, such as changes in eye gaze or mouth movements, to understand exactly how the temporal cortex constructs representations of facial emotion.
We thank all patients for their participation in this study; John Brugge, Mitchell Steinschneider, Jeremy Greenlee, Paul Poon, Rick Reale, and Rick Jenison for advice and comments; Haiming Chen, Chandan Reddy, Fangxiang Chen, Nader Dahdaleh, and Adam Jackson for data collection and care of participants; and Yota Kimura and Joe Hitchon for help with visual stimuli.
Reprint requests should be sent to Hiroto Kawasaki, Department of Neurosurgery, University of Iowa, 200 Hawkins Drive, Iowa City, IA 52242, or via e-mail: firstname.lastname@example.org.