Faces provide a wealth of information, including the identity of the seen person and social cues, such as the direction of gaze. Crucially, different aspects of face processing require distinct forms of information encoding. Another person's attentional focus can be derived based on a view-dependent code. In contrast, identification benefits from invariance across all viewpoints. Different cortical areas have been suggested to subserve these distinct functions. However, little is known about the temporal aspects of differential viewpoint encoding in the human brain. Here, we combine EEG with multivariate data analyses to resolve the dynamics of face processing with high temporal resolution. This revealed a distinct sequence of viewpoint encoding. Head orientations were encoded first, starting after around 60 msec of processing. Shortly afterward, peaking around 115 msec after stimulus onset, a different encoding scheme emerged. At this latency, mirror-symmetric viewing angles elicited highly similar cortical responses. Finally, about 280 msec after visual onset, EEG response patterns demonstrated a considerable degree of viewpoint invariance across all viewpoints tested, with the noteworthy exception of the front-facing view. Taken together, our results indicate that the processing of facial viewpoints follows a temporal sequence of encoding schemes, potentially mirroring different levels of computational complexity.
Faces are among the most important categories of visual stimuli, providing a rich set of information that is essential to our everyday behavior. One aspect of facial processing that has attracted considerable interest in the past is the cortical encoding of 3-D viewpoints resulting from rotations in depth. The variety of possible viewpoints presents a considerable challenge to the visual system, as different computational goals may rely on specific viewpoints and different levels of viewpoint invariance. For example, invariant face identification mechanisms must generalize across a large range of viewpoints, whereas shared attention must distinguish different gaze directions. Head orientation provides a strong cue for the recognition of another person's attentional focus, and therefore, different head orientations, too, need to be distinguishable from each other (Haxby, Hoffman, & Gobbini, 2000; Perrett, Hietanen, Oram, & Benson, 1992). Finally, the frontal viewpoint has important social and emotional relevance because it frequently co-occurs with direct eye contact (Carlin, Calder, Kriegeskorte, Nili, & Rowe, 2011; Senju & Johnson, 2009), whereas seeing a face from all other viewpoints usually signals an averted gaze.
Befitting the complexity and variety of the underlying computational demands, the primate brain contains a large network of face-selective regions (Pitcher, Walsh, & Duchaine, 2011; Freiwald & Tsao, 2010; Yovel & Kanwisher, 2005; Gauthier et al., 2000; Haxby et al., 2000; Kanwisher, McDermott, & Chun, 1997), and distinct cortical subsystems may exist that support different aspects of face processing (Freiwald & Tsao, 2010; Hoffman & Haxby, 2000). Despite our increasing understanding of the selectivity in the various nodes of this network (Anzellotti, Fairhall, & Caramazza, 2013; Axelrod & Yovel, 2012; Kietzmann, Swisher, König, & Tong, 2012; Carlin et al., 2011; Freiwald & Tsao, 2010; Natu et al., 2010; Pourtois, Schwartz, Seghier, Lazeyras, & Vuilleumier, 2005; Tong, Nakayama, Moscovitch, Weinrib, & Kanwisher, 2000) and a large body of research focusing on electrophysiological signatures of face processing (Caharel, Collet, & Rossion, 2015; Rossion, Prieto, Boremanse, Kuefner, & Van Belle, 2012; Eimer, 2000, 2011; Rossion & Jacques, 2008; Schweinberger, Kaufmann, Moratti, Keil, & Burton, 2007; Itier, Herdman, George, Cheyne, & Taylor, 2006; Joyce, Schyns, Gosselin, Cottrell, & Rossion, 2006; Liu, Harris, & Kanwisher, 2002; McCarthy, Puce, Belger, & Allison, 1999; Bentin, Allison, & Puce, 1996), temporal aspects of differential viewpoint encoding schemes have not been explored in detail. The latencies at which different types of information emerge from the cortical network can ultimately provide valuable insights into the underlying cortical mechanisms and could mirror different levels of computational complexity and task importance. That is, later effects might either reflect more complex visual inference that requires longer computation time or result from lower prioritization by the visual system, which processes other aspects first.
To investigate the temporal sequence of viewpoint processing in the human brain, we used EEG to record cortical responses while participants viewed faces shown from 37 different viewpoints spanning a total of 180° (Figure 1A). We applied multivariate pattern analyses to the visually evoked responses to extract fine-grained information about the representation of facial viewpoints at different temporal latencies. Crucially, the high temporal resolution of the EEG measurements allowed us to investigate fast changes in the underlying representational structures, allowing us to determine the point in time at which the visual system exhibits different viewpoint encoding schemes and invariances.
To analyze the visually evoked potentials, we employed representational similarity analysis (RSA; Kriegeskorte, Mur, & Bandettini, 2008), a multivariate approach to study similarities between cortical activation patterns and their relation to the structure of experimental conditions. The resulting representational similarity structures were analyzed using data-driven visualization techniques, followed by a model-driven approach that explicitly tested three different encoding schemes, each based on a different computational demand for viewpoint encoding. The first model tested how well different viewing directions could be distinguished, as would be required for mechanisms of face perception and shared attention. This model predicted that similar viewpoints lead to similar neural responses, whereas increasing angular differences between viewpoints lead to larger differences in the visually evoked response. It should be noted that cortical activity sensitive to viewpoint similarity could either reflect face-specific processes or processes sensitive to low-level differences, because these increase as a function of viewpoint difference. Contrasting this separation of different viewpoints, our second model focused on partial viewpoint invariance by investigating the effects of viewpoint symmetry. This effect describes the neural preference for select viewpoints as well as their horizontally mirror-symmetric counterparts (e.g., 30° and −30° rotated away from the front view are processed similarly, compared with 30° and the intermediate angles). Such joint viewpoint selectivity would be particularly efficient for bilaterally symmetric objects, such as faces, for which a computationally simple mirror operation would allow the system to either effectively reduce the number of viewpoints required to achieve full invariance or increase the signal-to-noise ratio. 
Consequently, the effects of mirror-symmetric response tuning were suggested to constitute an important intermediate computational step in building a fully viewpoint-invariant cortical representation (Vetter, Poggio, & Bülthoff, 1994). Neural effects of mirror symmetry were first observed in the macaque temporal lobe (Dubois, de Berker, & Tsao, 2015; Freiwald & Tsao, 2010; Logothetis, Pauls, & Poggio, 1995; Perrett et al., 1991). In humans, the effects of viewpoint symmetry have been reported across a wide range of higher-level visual areas in studies using multivariate analyses of fMRI data (Axelrod & Yovel, 2012; Kietzmann et al., 2012) and TMS (Kietzmann et al., 2015), highlighting the importance of this computational step. As the third model of interest, we investigated whether the frontal viewpoint is processed differently compared with neighboring, slightly oblique views, as the former frequently co-occurs with direct eye contact and might consequently hold a special status.
Nineteen healthy participants (aged 19–29 years, seven women) took part in the experiment. All participants had normal or corrected-to-normal visual acuity. They were informed about their right to withdraw from the experiment at any time and gave written informed consent to participate. Because of poor task performance or technical error, 3 of the 19 participants had to be excluded from the analyses, leaving 16 participants. The study was approved by the institutional review board of Osnabrück University and conformed to the Declaration of Helsinki.
The stimulus set was created using FaceGen (Singular Inversions, Inc., Toronto, Canada). It included four individuals (two women), shown from 37 different angles ranging from −90° to 90° in steps of 5°, resulting in 37 experimental conditions (Figure 1A). The stimuli were presented in grayscale on a gray background. The luminance histograms of all stimuli were matched using the SHINE toolbox (Willenbockel et al., 2010). To match previous experimental work from the literature, the face models did not include hair.
To better understand the similarity structure of the stimulus space, we followed our previous approach (Kietzmann et al., 2012) and estimated a low-level stimulus similarity matrix based on a computational model of V1 simple cells. The model consists of a set of 2-D Gabor functions with 17 spatial scales and four orientations (Serre & Riesenhuber, 2004; Lee, 1996). Once applied to the experimental stimuli, the model yields a high-dimensional response vector for each stimulus, which can be used to compute the corresponding similarity matrix using a Pearson product–moment correlation (Figure 1B). To visualize the similarity structure across all viewpoints to be expected from low-level stimulus features alone, the resulting V1 similarity matrix was projected into a 2-D space using multidimensional scaling (MDS; Figure 1C).
Experimental Setup and Design
Each experimental session consisted of 16 blocks, with 312 trials each (4992 trials in total). Within each block, each identity and viewpoint combination was shown twice (296 trials) in pseudorandomized order, while preventing the direct repetition of identical viewpoints. Each trial included 400 msec of stimulus presentation, followed by a random ISI of 300–500 msec. The presentation of the experimental blocks was self-paced by the participants.
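The trial order within a block can be illustrated with a minimal sketch. It covers only the 296 face trials, and the greedy draw-with-restart strategy, the function name `make_block`, and the tuple layout are assumptions made for illustration; the original randomization procedure is not specified beyond the constraints stated above.

```python
import random

def make_block(n_views=37, n_ids=4, n_reps=2, seed=None):
    """Return a list of (viewpoint, identity) trials in which no two
    consecutive trials share the same viewpoint (hypothetical helper)."""
    rng = random.Random(seed)
    trials = [(v, i) for v in range(n_views)
              for i in range(n_ids) for _ in range(n_reps)]
    while True:
        pool = trials[:]
        order = []
        while pool:
            last_view = order[-1][0] if order else None
            # indices of remaining trials that do not repeat the last viewpoint
            candidates = [k for k, t in enumerate(pool) if t[0] != last_view]
            if not candidates:
                break  # dead end: restart with a fresh pool
            order.append(pool.pop(rng.choice(candidates)))
        if not pool:
            return order
```

With the defaults, `make_block(seed=1)` returns 37 × 4 × 2 = 296 trials; adding the 16 target trials per block gives the 312 trials reported above.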
To keep their attention on the stimulus display, participants were instructed to maintain fixation on a central target and to perform a color change detection task at fixation, reporting rare target events (16 trials in each block) via button press. This distractor task was orthogonal to the experimental question. Target trials and trials including button presses by the participants were excluded from the analyses.
During the EEG measurements, the participants were seated in a dark room. Stimuli were presented using a 24-in. BenQ screen (Model 2420T, Taipei, Taiwan) running at a resolution of 1920 × 1080 pixels and a refresh rate of 120 Hz. The average latency between EEG trigger and stimulus onset was 7.5 msec. The data shown were corrected for this delay. The distance to the screen was 80 cm, yielding a stimulus display size of approximately 7.9° × 9.7° of visual angle.
In addition to the EEG recordings, we measured the eye movements of our participants using an EyeLink 1000 remote eye tracker (SR Research Ltd., Mississauga, Ontario, Canada), running at a sampling frequency of 500 Hz. The eye tracker was calibrated before the experiment, and recalibration was conducted after every fourth experimental block. Throughout the experiment, the eye-tracking validation error was kept below an average of 0.5° visual angle.
Data Acquisition and Preprocessing
Electrophysiological data were recorded using a 128-electrode Ag/AgCl system (ANT Neuro, Enschede, The Netherlands), with electrodes placed on a Waveguard cap according to the 5% electrode system (Oostenveld & Praamstra, 2001). The data were recorded at a sampling rate of 1024 Hz. Before each measurement, we ensured that the impedances of all electrodes were below 10 kΩ.
Data were preprocessed using the EEGLAB toolbox (Delorme & Makeig, 2004) in the following order: First, the data were down-sampled to 500 Hz and subsequently filtered using a 1-Hz high-pass filter and a 120-Hz low-pass filter as well as a notch filter around 50 and 100 Hz. Channels exhibiting either excessive noise or strong drifts were removed (four channels on average). After this, the continuous data were manually cleaned, rejecting data sequences including jumps, muscle artifacts, and other sources of noise (on average, 17% of data were rejected for each participant). To remove eye movement-related artifacts, an independent component analysis (based on the AMICA algorithm; Palmer, Makeig, Kreutz-Delgado, & Rao, 2008) was computed on the cleaned data. The independent components corresponding to eye, heart, or muscle activity were manually selected and removed, based on criteria matching previous work on component selection (Plöchl, Ossandón, & König, 2012), before transforming the data back into the original sensor space. The initially removed channels were interpolated based on the activity of their neighboring channels, selected via channel triangulation. Subsequently, the continuous data were divided into epochs for each trial by including data from 100 msec prestimulus to 700 msec poststimulus, using the time window between −100 msec and stimulus onset for baseline correction. The resulting temporal resolution of the multivariate pattern analysis is 2 msec. ERP topographies were plotted using Fieldtrip (Oostenveld, Fries, Maris, & Schoffelen, 2011) and custom code. To aid visibility, ERP topographies (Figure 3) and overview similarity matrices (Figure 4) are based on down-sampled data (100 Hz).
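The epoching and baseline-correction step can be sketched as follows. This is not the EEGLAB code used in the study: the function name, the array layout, and the onset representation are assumptions chosen for illustration. At 500 Hz, the window from −100 to 700 msec corresponds to 400 samples spaced 2 msec apart.

```python
import numpy as np

def epoch_and_baseline(data, onsets, sfreq=500.0, tmin=-0.1, tmax=0.7):
    """Cut continuous data into epochs and subtract the prestimulus mean.

    data   : (n_channels, n_samples) continuous recording (assumed layout)
    onsets : sample indices of stimulus onsets; each onset must leave room
             for the full window on both sides
    Returns epochs of shape (n_trials, n_channels, n_times) and the time
    axis in seconds.
    """
    start = int(round(tmin * sfreq))  # -50 samples at 500 Hz
    stop = int(round(tmax * sfreq))   # 350 samples at 500 Hz
    times = np.arange(start, stop) / sfreq
    epochs = np.stack([data[:, o + start:o + stop] for o in onsets])
    # baseline: mean over the window from tmin up to (excluding) stimulus onset
    baseline = epochs[:, :, times < 0].mean(axis=2, keepdims=True)
    return epochs - baseline, times
```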
Multivariate Analyses of EEG Data
To test for the temporal development of facial viewpoint encoding, we performed a multivariate pattern analysis on the visually evoked potentials, computing representational similarity matrices across all experimental conditions. Each entry in a given similarity matrix was computed using a Pearson product–moment correlation (Khaligh-Razavi & Kriegeskorte, 2014; Kriegeskorte, Mur, Ruff, et al., 2008) on the respective ERP activation patterns across all EEG channels. The resulting matrices were ordered based on the rotational angle of the faces, ranging from 90° to −90°. Neighboring cells in each similarity matrix therefore represent neighboring viewing angles, and the resulting matrix depicts the overall similarity structure across all conditions. Elements along the main diagonal were estimated by comparing the same viewpoint across different identities (comparisons of the same identity and viewpoint yield a correlation of 1 by definition and were therefore excluded). Computed for every point in time and participant, the resulting 3-D structure (Condition × Condition × Time) reflects the temporal development of representational similarity across all experimental conditions (Cichy, Pantazis, & Oliva, 2014; Carlson, Tovar, Alink, & Kriegeskorte, 2013).
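A minimal sketch of the similarity computation at a single time point is given below. It assumes that every cell, not only the main diagonal, is averaged over all pairs of different identities; the text specifies this only for the diagonal, so the treatment of off-diagonal cells, the function name, and the array layout are illustrative assumptions.

```python
import numpy as np

def rsm_across_identities(erp):
    """Pearson similarity matrix at one time point.

    erp : (n_identities, n_conditions, n_channels) ERP patterns (assumed layout)
    Each cell (i, j) is the mean correlation between viewpoint i of one
    identity and viewpoint j of a different identity.
    """
    n_id, n_cond, _ = erp.shape
    # center and normalize each pattern across channels, so that a dot
    # product between two patterns equals their Pearson correlation
    z = erp - erp.mean(axis=2, keepdims=True)
    z = z / np.linalg.norm(z, axis=2, keepdims=True)
    rsm = np.zeros((n_cond, n_cond))
    n_pairs = 0
    for a in range(n_id):
        for b in range(n_id):
            if a == b:
                continue  # same-identity comparisons are excluded
            rsm += z[a] @ z[b].T
            n_pairs += 1
    return rsm / n_pairs
```

Applying the function to every time point and stacking the results yields the Condition × Condition × Time structure described above.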
Representational similarity matrices allow for data-driven analyses as well as comparisons with model predictions, such as low-level similarity, control conditions, and other behavioral measures (Kietzmann et al., 2012; Kriegeskorte, Mur, & Bandettini, 2008). To investigate the dynamics of viewpoint encoding across time, we performed data- and model-driven analyses on the spatiotemporal similarity matrices. For the data-driven approach, we first visualized the average empirical similarity at each point in time by projecting the data into a 2-D space using nonmetric MDS. The resulting 2-D arrangement of conditions closely resembles the similarity structure in the original data but is visually more accessible. To account for the temporal smoothness of the underlying data, each MDS optimization was seeded with the MDS result computed for the previous time point (Carlson et al., 2013). The initial MDS solution was based on a random position seed.
As noted earlier, different computational goals likely rely on different viewpoint encoding schemes. On the basis of the respective requirements, distinct predictions about the optimal representational similarity structure can be derived and used as model predictions that are then matched against the empirical data. In addition to the data-driven analyses, three encoding models were defined and tested: The viewpoint similarity model predicts decreasing similarity values with increasing angular difference, the viewpoint symmetry model additionally predicts high similarity values for mirror-symmetric viewing angles, and the frontal viewpoint model predicts distinct EEG patterns for the front-facing view as compared with adjacent views (Figure 2).
Viewpoint similarity: Communication and shared attention require the recognition of another person's direction of gaze. For this, the overall head orientation is a reliable indicator, raising the question where and when different facial viewpoints are neurally separated. Potentially based on low-level stimulus features, early visual areas in macaque and human cortex have been shown to allow for reliable discrimination between different face viewpoints (Carlin, Rowe, Kriegeskorte, Thompson, & Calder, 2012; Kietzmann et al., 2012; Freiwald & Tsao, 2010). In contrast, higher-level visual areas are increasingly invariant to changes in viewpoint (Axelrod & Yovel, 2012, 2015; Anzellotti et al., 2013; Kietzmann et al., 2012; Freiwald & Tsao, 2010; Natu et al., 2010; Kriegeskorte, Formisano, Sorger, & Goebel, 2007). Nevertheless, it is possible that different viewpoints remain separable, even in higher visual areas, if different readout strategies are used (DiCarlo & Cox, 2007). To assess at what latencies head orientations can be discriminated, we designed a model that predicts that neighboring viewpoints elicit similar neural responses, whereas decreasing similarities are expected with increasing viewpoint difference. Maximal similarity is expected for identical viewpoints (tested across facial identities), that is, across the main diagonal of the similarity matrix. The model implicitly tests the extent to which disparate viewing angles can be distinguished by the activity patterns they evoke (Figure 2A). Because rank-based methods are employed to estimate the fit between model and empirical data (detailed below), the viewpoint similarity model does not make specific predictions about the exact rate of similarity falloff across viewpoints and matches any strictly decreasing similarity function. 
The model is therefore more general than the predictions of the Gabor model used to visualize the stimulus space and is potentially better suited to smooth EEG data.
Viewpoint symmetry: Whereas some cortical computations on face stimuli require a separation of differently oriented viewpoints, others require information that is viewpoint invariant. Achieving invariance is a computationally challenging task, as faces seen from different perspectives differ largely in their retinal projections. To cope with this complexity, the visual system relies on multiple view-specific representations that span the whole space of possible viewpoints. Together, these units support invariant recognition (Perrett, Oram, & Ashbridge, 1998; Ullman, 1998; Logothetis et al., 1995; Bülthoff & Edelman, 1992). In addition to this view-based code, however, it has been noted that bilaterally symmetric objects, such as faces, allow for a computational shortcut, because mirror-symmetric viewpoints are left/right mirror images of each other (Vetter et al., 1994). Indeed, corresponding effects were reliably shown using single-cell recordings (Freiwald & Tsao, 2010; Perrett et al., 1991) and fMRI (Dubois et al., 2015) in the macaque. In humans, the effects of viewpoint symmetry have also been reliably observed in behavioral (Vetter et al., 1994), fMRI (Axelrod & Yovel, 2012; Kietzmann et al., 2012), and TMS (Kietzmann et al., 2015) experiments. To investigate the temporal latency of the effect, we included a viewpoint symmetry model in our RSA predictors. This model predicts that directly neighboring viewpoints as well as mirror-symmetric views will elicit similar cortical response patterns. Because the experimentally tested viewpoints are sorted from +90° to −90° in the corresponding representational similarity matrices, values around the principal diagonal (top left to bottom right) indicate similarity across directly neighboring views. The secondary diagonal (top right to bottom left) defines mirror-symmetric viewpoints, that is, the similarity between selected viewpoints and their mirror-symmetric counterparts. 
Because of this layout, the effects of viewpoint symmetry predict increased similarity values along both diagonals (X shape, shown in Figure 2B). To derive the model prediction, we used the viewpoint similarity model and added a horizontally mirrored version to define the secondary diagonal.
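The two model matrices can be sketched as follows. The linear falloff and the use of an elementwise maximum to combine the similarity model with its mirrored version are assumptions made for illustration; because the models are compared with the data by rank, only the ordering of the predicted values matters.

```python
import numpy as np

# 37 viewpoints sorted from +90 to -90 degrees, as in the similarity matrices
angles = np.arange(90, -95, -5)

def viewpoint_similarity_model(angles):
    """Similarity falls off with angular difference (linear falloff assumed)."""
    diff = np.abs(angles[:, None] - angles[None, :])
    return 1.0 - diff / diff.max()

def viewpoint_symmetry_model(angles):
    """Similarity model combined with its horizontally mirrored version.

    Reversing the column order maps each viewpoint onto its mirror-symmetric
    counterpart; taking the maximum (an assumption) yields the X shape.
    """
    sim = viewpoint_similarity_model(angles)
    return np.maximum(sim, sim[:, ::-1])
```

In the symmetry model, both the principal and the secondary diagonal carry the maximal value of 1.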
Frontal viewpoint: Compared with oblique viewpoints, a direct frontal view of a face has distinct implications for communication and shared attention. It is horizontally symmetric and frequently co-occurs with direct eye contact. In the current experiment, the directly frontal viewpoint was the only condition that exhibited direct eye contact, whereas neighboring viewpoints, differing by only a 5° rotation, signaled an averted gaze. In line with this observation, the frontal viewpoint model predicts that the 0° (frontal) viewpoint elicits a neural response that is distinct from the response to all other viewpoints. The most conservative approach to testing this prediction is to contrast the frontal with its directly neighboring, slightly oblique viewpoints (+5° and −5°), which function as controls. Specifically, a “special status” of the frontal view predicts that the average representational similarity between the 0° and all other viewpoints is smaller than the average similarity between ±5° face views and all other viewpoints. Accordingly, the model matrix contains negative weights for all similarity estimates involving the frontal viewpoint (shown in blue in Figure 2C) and positive weights for correlations of slightly leftward- and rightward-facing viewpoints (shown in yellow). Please note that, although the viewpoints differ in the direction of gaze (direct vs. averted), the overall visual differences between the tested conditions are small.
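One possible encoding of the frontal viewpoint model as a weight matrix is sketched below. Cells that compare the frontal and the ±5° views with one another are left undefined here; this choice, the weights of ±1, and the use of NaN to mark cells without a prediction are assumptions, as the text does not specify them.

```python
import numpy as np

angles = np.arange(90, -95, -5)  # +90 to -90 degrees in steps of 5

def frontal_viewpoint_model(angles):
    """Negative weights for cells involving 0 deg, positive for +/-5 deg."""
    n = len(angles)
    model = np.full((n, n), np.nan)  # NaN: cell not part of the prediction
    frontal = np.flatnonzero(angles == 0)
    control = np.flatnonzero(np.abs(angles) == 5)
    focus = np.concatenate([frontal, control])
    others = np.setdiff1d(np.arange(n), focus)
    for idx, weight in ((frontal, -1.0), (control, 1.0)):
        model[np.ix_(idx, others)] = weight
        model[np.ix_(others, idx)] = weight
    return model
```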
To statistically evaluate the different models, we computed Spearman's rank correlation (Khaligh-Razavi & Kriegeskorte, 2014; Kriegeskorte, Mur, & Bandettini, 2008) between the model prediction and the empirical similarity matrices of each participant. This approach assumes only that model and data share ordinal relationships and avoids assuming a linear relationship between them. For every point in time, we then computed a t test across participants to determine whether the correlation values (Fisher z transformed) were significantly different from zero.
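The model evaluation can be sketched in a few lines. The helper names are hypothetical, and restricting the comparison to the upper triangle (including the diagonal) of the symmetric matrices is an assumption; NaN entries in a model matrix are treated as cells without a prediction.

```python
import numpy as np

def rankdata(x):
    """1-based ranks; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return sums[inverse] / counts[inverse]

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return np.corrcoef(rankdata(x), rankdata(y))[0, 1]

def model_fit(model, rsm):
    """Rank correlation between model and data over the predicted cells."""
    upper = np.triu(np.ones(model.shape, dtype=bool))  # assumption: upper triangle
    mask = np.isfinite(model) & upper
    return spearman(model[mask], rsm[mask])

def group_t(fits):
    """One-sample t value against zero for every time point.

    fits : (n_participants, n_times) Spearman correlations
    """
    z = np.arctanh(fits)  # Fisher z transform
    n = z.shape[0]
    return z.mean(axis=0) / (z.std(axis=0, ddof=1) / np.sqrt(n))
```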
To control for the multiple statistical comparisons performed, one for each time point, we applied a cluster-based permutation test (Maris & Oostenveld, 2007). All connected time points exhibiting a p value of <.01 were considered as empirical candidate clusters. That is, only time points that were individually significant at p < .01 were included for further analysis. The cluster null distribution was computed by randomly flipping the sign of the correlation values of each participant (in line with the prediction of a zero correlation; Good, 2013). In each iteration, the sign flip for a given participant was applied to the whole time series, that is, to all time points, to preserve the temporal smoothness of the original data, as clusters are defined in the temporal domain. For every permutation (100,000 in total), we followed the same analysis steps as before and estimated the positive and negative clusters to be expected under the null hypothesis. In every iteration, only the strongest positive and negative clusters were kept for the null distribution (max sum t statistic). The originally observed, empirical clusters were then compared with this cluster null distribution, computing the probability of observing a cluster equal to or larger than the empirical one by chance alone. Only clusters with a p value of <.05 are reported in the following (because we tested separately for positive and negative clusters, a threshold of p < .025 was applied to either side, similar to a two-sided t test).
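The sign-flip cluster test can be sketched as follows for positive clusters; negative clusters can be tested by passing the negated data. The cluster-forming threshold is supplied as a t value: for 16 participants (15 degrees of freedom), a two-sided p < .01 corresponds to t ≈ 2.947, although whether a one- or two-sided criterion was used for cluster inclusion is an assumption. Function names and the default number of permutations are illustrative.

```python
import numpy as np

def t_values(z):
    """One-sample t value per time point; z is (n_participants, n_times)."""
    n = z.shape[0]
    return z.mean(axis=0) / (z.std(axis=0, ddof=1) / np.sqrt(n))

def cluster_sums(t, thresh):
    """Summed t values of each run of connected supra-threshold time points."""
    sums, current = [], 0.0
    for value in t:
        if value > thresh:
            current += value
        elif current:
            sums.append(current)
            current = 0.0
    if current:
        sums.append(current)
    return sums

def sign_flip_cluster_test(z, thresh, n_perm=1000, seed=0):
    """Return (cluster sum, p value) for every observed positive cluster."""
    rng = np.random.default_rng(seed)
    observed = cluster_sums(t_values(z), thresh)
    null = np.zeros(n_perm)
    for p in range(n_perm):
        # one random sign per participant, applied to the whole time series
        signs = rng.choice([-1.0, 1.0], size=(z.shape[0], 1))
        # keep only the strongest cluster of this permutation (max sum)
        null[p] = max(cluster_sums(t_values(z * signs), thresh), default=0.0)
    return [(s, float(np.mean(null >= s))) for s in observed]
```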
Noise Ceiling Estimates
To estimate an upper bound for model performance, we computed the noise ceiling of the RSA data (Nili et al., 2014) across time. Given an (unknown) true similarity structure that underlies data generation for all participants, this approach uses the empirically observed similarity matrices to estimate the extent to which noise present in the data might limit the maximal level of performance that can be achieved by any model prediction. The noise ceiling consists of a lower bound and an upper bound. The lower bound is computed by a leave-one-participant-out approach, in which the average similarity matrix of all but one participant is correlated with the data of the left-out participant. The upper bound is computed by correlating each participant's data with the average similarity matrix of all participants; because this average includes the participant's own data, the estimate is overfitted. Because the noise ceiling is estimated from the empirical similarity matrices, it depends on the cells that are included in the analyses. Thus, if models predict different parts of the similarity matrix, they require different noise ceiling estimates. Viewpoint similarity and viewpoint symmetry both rely on the whole matrix and therefore share a noise ceiling estimate. In contrast, the frontal viewpoint model provides a prediction for only a subset of cells in the similarity matrix and therefore requires its own noise ceiling.
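The two bounds can be illustrated with a simplified sketch for a single time point. It rank-transforms each participant's matrix, assumes that the empirical values contain no ties, and correlates the ranks with the averaged ranks; this approximates, but is not guaranteed to match, the procedure of Nili et al. (2014). The optional `mask` argument (a hypothetical name) selects the cells a model makes predictions for.

```python
import numpy as np

def ranks(x):
    """Ordinal ranks (assumes no tied values)."""
    return np.argsort(np.argsort(x)).astype(float)

def noise_ceiling(rsms, mask=None):
    """Lower and upper noise ceiling bounds.

    rsms : (n_participants, n_cond, n_cond) similarity matrices
    mask : optional boolean (n_cond, n_cond) selection of cells
    """
    n = rsms.shape[0]
    if mask is None:
        mask = np.ones(rsms.shape[1:], dtype=bool)
    vecs = np.array([ranks(r[mask]) for r in rsms])
    grand = vecs.mean(axis=0)
    # upper bound: the average includes the participant's own data
    upper = np.mean([np.corrcoef(v, grand)[0, 1] for v in vecs])
    # lower bound: the average excludes the left-out participant
    lower = np.mean([
        np.corrcoef(vecs[i], np.delete(vecs, i, axis=0).mean(axis=0))[0, 1]
        for i in range(n)
    ])
    return lower, upper
```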
Sensor Searchlight Analysis
To estimate the spatial origin of the effects observed, we implemented a sensor-based searchlight analysis. For each EEG channel, we defined a set of neighboring sensors (i.e., searchlight window within a distance of 2 cm in the 2-D spatial layout; 19.8 neighbors on average per channel). All channels in the local searchlight were then used together to estimate the local representational similarity matrices for each point in time. The resulting similarity structure has size nchan × ntime × ncond × ncond. Once computed, the different model predictions (viewpoint similarity, viewpoint symmetry, and frontal viewpoint) were tested for each searchlight location, storing the corresponding correlation values in the respective searchlight center. Please note that neural dipoles can also project to distal sensor groups, which are not necessarily spatial neighbors. The results of the current sensor searchlight analysis are therefore limited to spatially connected regions exhibiting a selected effect, neglecting effects that could be observed across more distant sensors.
Because the effect latencies were known for the individual models from our previous analyses including all sensors, the searchlight analysis was applied specifically for these predefined temporal windows of interest.
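The searchlight construction can be sketched as follows. For brevity, the sketch correlates condition-averaged patterns directly rather than comparing across identities, and the coordinate units, function names, and array layout are assumptions. Each searchlight includes its own center channel and must contain at least two channels for the correlation to be defined.

```python
import numpy as np

def searchlight_neighbors(pos, radius):
    """Indices of all sensors within `radius` of each sensor.

    pos : (n_channels, 2) sensor coordinates in the 2-D layout
    """
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    return [np.flatnonzero(row <= radius) for row in dist]

def searchlight_rsms(erp, neighbors):
    """Local similarity matrices for every searchlight and time point.

    erp : (n_cond, n_channels, n_times) condition-averaged ERPs
    Returns an array of shape (n_channels, n_times, n_cond, n_cond).
    """
    n_cond, n_chan, n_times = erp.shape
    out = np.zeros((n_chan, n_times, n_cond, n_cond))
    for c, idx in enumerate(neighbors):
        for t in range(n_times):
            # rows are conditions, columns the channels of this searchlight
            out[c, t] = np.corrcoef(erp[:, idx, t])
    return out
```

Each local matrix can then be compared with the model predictions, storing the resulting value at the searchlight center.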
Our participants were instructed to maintain fixation on a central target while performing a color change detection task at fixation. We nevertheless tested whether eye movements might have contributed to any of the observed effects. This was accomplished by computing a similarity matrix from the recorded eye-tracking data of the respective experimental conditions. We first computed a smoothed, 2-D probability distribution (fixation density map) for every participant, condition, and time point. Similar to the EEG activity and the V1 model, we then correlated the eye-tracking data of each condition with every other condition. This resulted in a 3-D correlation matrix (Condition × Condition × Time), which describes the similarity structure of the eye-tracking data for each individual participant and time point. As a next step, we again focused on previously established temporal clusters of significant model fits and used each participant's eye-tracking similarity matrices as control data for a partial Spearman's rank correlation analysis. Specifically, for a given temporal window of interest, we tested the fit between model and EEG similarity matrix, while using the eye-tracking similarity matrix as control. The (Fisher z transformed) partial correlation values were then subject to a t test across participants, while controlling for multiple statistical comparisons via Bonferroni correction.
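A partial Spearman correlation can be computed from the three pairwise rank correlations, as in the sketch below. The inputs are the vectorized model, EEG, and eye-tracking similarity matrices; the function names are hypothetical.

```python
import numpy as np

def rankdata(x):
    """1-based ranks; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return sums[inverse] / counts[inverse]

def partial_spearman(x, y, z):
    """Rank correlation between x and y after controlling for z."""
    r = np.corrcoef([rankdata(x), rankdata(y), rankdata(z)])
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
```

For example, `partial_spearman(model_vec, eeg_vec, eye_vec)` gives the model fit that remains once the eye-tracking similarity structure is accounted for; with k temporal windows tested, the Bonferroni-corrected threshold is α/k.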
The basis of all analyses performed was the transformation of the high-dimensional EEG data into representational similarity matrices, which depict the pattern similarity of the visually evoked responses across all conditions. Similarity matrices were computed and statistically evaluated for each time point individually at a temporal resolution of 2 msec, allowing us to investigate fast changes in the representational similarity structure, indicative of changes in the underlying viewpoint encoding. Exemplary EEG topographies, underlying the multivariate analyses, are shown in Figure 3.
To explore the temporal development of viewpoint processing, we visualized the average similarity matrix at each point in time and performed an MDS analysis on the data to project it into 2-D space.
The similarity matrices indicated distinct and temporally separated stages of face processing (Figure 4). Initially, the similarity matrices appear unstructured and random in their patterns. Then, starting around 60 msec after stimulus onset, the principal diagonal exhibits enhanced correlation values. At this point in time, neighboring viewing angles exhibit similar evoked activity patterns, in line with the response properties of lower-level retinotopic visual areas (Figure 4A and B). After this, at around 120 msec, we observed strong effects of viewpoint symmetry: Mirror-symmetric viewpoints exhibited increased correlations compared with intermediate viewpoints (Figure 4C). At a later point in time, around 300 msec, an effect specific to frontal viewpoints was observed. Whereas all other viewpoints exhibited comparable similarity in the evoked response patterns, the cortical responses to frontal views were decidedly different. The same pattern of results can be observed in the MDS projection. The alignment of conditions during baseline is largely random, whereas the arrangement after about 120 msec of processing is almost perfectly ordered with respect to overall rotational angle, with the addition of close proximity for symmetric viewpoints in this multidimensional space. At a considerably later point in time, the order of almost all viewpoints seems mostly random with one marked exception—frontal viewpoints. They appear at a large distance to all other conditions (Figure 4D), including viewpoints that differ by only small rotation angles, such as ±5°.
The two later effects observed in the empirical similarity structures, viewpoint symmetry and the special status of the frontal view, are distinctly different from the representational similarity obtained from a model of V1 responses to our face stimuli, which was analyzed using the same processing stream as the EEG data. For the V1 model, neighboring viewpoints exhibited strong similarity, which decreased with increasing angular difference (Figure 1B and C). For more extreme viewpoints, a slightly increased similarity for mirror-symmetric viewpoints can be observed. This effect has been reported previously for a similar stimulus set of faces without hair (Kietzmann et al., 2012). However, the effect is small, compared with the strong effects of viewpoint symmetry observed in the EEG data. A more detailed statistical analysis of the contribution of low-level similarity to the effects observed is provided below in the model-based analyses.
To better understand the dynamics of viewpoint encoding, we extended our data-driven analyses with a model-based approach. Three models were defined in close agreement with the predictions of different task requirements: viewpoint similarity, an encoding scheme contributing to mechanisms of shared attention; viewpoint symmetry, supporting viewpoint-invariant mechanisms of face identification; and frontal viewpoint, focusing on the special status of frontal views, presumably differentiating direct from averted gaze.
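To make the three model predictions concrete, the sketch below builds one illustrative similarity matrix per model. The viewpoint set (here -90° to 90° in 5° steps), the linear fall-off with angular difference, and the function names are assumptions made for illustration; they are not the exact model definitions used in the analyses.

```python
import numpy as np

# Assumed viewpoint set in degrees (hypothetical; 0 is the frontal view).
angles = np.arange(-90, 91, 5)

def viewpoint_similarity_model(angles):
    """Similarity decreases linearly with angular difference (main diagonal)."""
    diff = np.abs(angles[:, None] - angles[None, :])
    return 1.0 - diff / diff.max()

def symmetry_only_model(angles):
    """Similarity is highest for mirror-symmetric pairs (anti-diagonal)."""
    diff = np.abs(angles[:, None] + angles[None, :])
    return 1.0 - diff / diff.max()

def viewpoint_symmetry_model(angles):
    """Both diagonals combined: similar if close in angle or mirror-symmetric."""
    return np.maximum(viewpoint_similarity_model(angles),
                      symmetry_only_model(angles))

def frontal_viewpoint_model(angles):
    """Frontal view is dissimilar to all others; all others are alike."""
    front = int(np.flatnonzero(angles == 0)[0])
    model = np.ones((angles.size, angles.size))
    model[front, :] = 0.0
    model[:, front] = 0.0
    model[front, front] = 1.0
    return model

similarity = viewpoint_similarity_model(angles)
symmetry = viewpoint_symmetry_model(angles)
frontal = frontal_viewpoint_model(angles)
```

Each model matrix can then be compared with the empirical similarity matrix at every time point, for example by rank-correlating their off-diagonal entries.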
The results of the three model fits are presented in Figure 5 and Table 1. All statistical analyses were cluster-corrected at a threshold of p < .05 (cluster inclusion subject to p < .01 for each individual time point). The analysis of the viewpoint similarity model (Figure 5A) revealed multiple significant clusters. The earliest cluster emerged at around 60 msec after stimulus onset and remained statistically significant throughout the whole period analyzed (see Table 1 for exact latencies and cluster p values). These results indicate that viewpoints remain distinguishable from early to late periods of visual processing. Effects of viewpoint symmetry (Figure 5B) are expressed slightly later, starting after around 80 msec of visual processing and peaking around 115 msec. Finally, differences in similarity structure between frontal and directly neighboring views (±5°) were observed in two later clusters (Figure 5C), starting at around 280 msec and ending around 420 msec. Taken together, our model-based analysis revealed a temporally distinct sequence of viewpoint encoding stages in the human visual system. Effects of viewpoint similarity occurred first, followed by effects of viewpoint symmetry, which were in turn followed by differences between frontal and oblique viewpoints at comparably late stages of processing. Interestingly, whereas the viewpoint symmetry model approaches the noise ceiling (i.e., the maximal expected correlation for the true model given the noise in the data) during the early phase of processing, the frontal viewpoint model reaches the noise ceiling at later stages. This may indicate a true change in the underlying neural encoding scheme from early to late face processing.
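The logic of cluster-based correction over time can be illustrated with a simplified sign-flip permutation test. The sketch below is an assumption-laden illustration, not the procedure used here: it handles positive clusters only, uses the summed t value as the cluster statistic, and runs on simulated model correlations for 16 hypothetical participants.

```python
import numpy as np
from scipy import stats

def find_clusters(mask):
    """Return (start, stop) pairs for contiguous True runs; stop is exclusive."""
    padded = np.concatenate(([False], mask, [False]))
    edges = np.diff(padded.astype(int))
    return list(zip(np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)))

def max_cluster_stat(data, t_thresh):
    """Largest summed t value over supra-threshold positive clusters."""
    t_vals = stats.ttest_1samp(data, 0.0, axis=0).statistic
    clusters = find_clusters(t_vals > t_thresh)
    sums = [t_vals[a:b].sum() for a, b in clusters]
    return max(sums, default=0.0), clusters, t_vals

def cluster_permutation_test(data, sample_p=0.01, n_perm=1000, seed=0):
    """data: (n_participants, n_timepoints) array of model correlations."""
    n_sub = data.shape[0]
    # One-sided threshold for including a time point in a cluster.
    t_thresh = stats.t.ppf(1.0 - sample_p, df=n_sub - 1)
    _, clusters, t_vals = max_cluster_stat(data, t_thresh)
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Randomly flip the sign of each participant's whole time course.
        signs = rng.choice([-1.0, 1.0], size=(n_sub, 1))
        null[i] = max_cluster_stat(data * signs, t_thresh)[0]
    results = []
    for a, b in clusters:
        stat = t_vals[a:b].sum()
        p = (np.sum(null >= stat) + 1) / (n_perm + 1)
        results.append((a, b, stat, p))
    return results

# Simulated example: noise plus a sustained effect at samples 40-59.
rng = np.random.default_rng(1)
data = rng.standard_normal((16, 100))
data[:, 40:60] += 3.0
results = cluster_permutation_test(data, n_perm=500)
```

A two-sided variant would repeat the procedure for negative clusters and evaluate each tail at p < .025, matching the criterion stated for Table 1.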
| Time | Cluster Statistic | Cluster p |
This table shows the latencies and statistics for all significant temporal clusters. Clusters with a p value smaller than .025 were considered significant (applied to both negative and positive clusters, this yields an overall cluster-corrected value of p < .05). Cluster inclusion was contingent on an individual sample statistic of p < .01.
By definition, the viewpoint symmetry model contains aspects related to viewpoint similarity. It is therefore possible that the effects of viewpoint similarity contribute to the observed effects of viewpoint symmetry. To differentiate these two representational aspects, we split the viewpoint symmetry model into its two constituent diagonals: One part corresponds to viewpoint similarity (\); the other corresponds to the horizontally flipped counterpart (/), called symmetry-only in the following. We then used a partial Spearman's rank correlation to estimate the effect of symmetry-only while controlling for effects of viewpoint similarity. This revealed a later effect of viewpoint symmetry, ranging from 111 to 149 msec after stimulus onset (cluster p = .003, cluster statistic = 88.2). This more conservative measure of viewpoint symmetry reaches statistical significance considerably later than viewpoint similarity, whose onset occurs around 50 msec earlier, and therefore indicates two separate representational stages. The effects of frontal viewpoint remained unchanged after using viewpoint similarity as a control in a corresponding partial correlation analysis.
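A partial Spearman's rank correlation of this kind can be sketched as follows: all variables are rank-transformed, the control variable is regressed out of the other two, and the residuals are correlated. The function name and the simulated vectors are assumptions for illustration; in practice, `x`, `y`, and `z` would hold the off-diagonal entries of the EEG similarity matrix, the symmetry-only model, and the viewpoint similarity model, respectively.

```python
import numpy as np
from scipy import stats

def partial_spearman(x, y, z):
    """Spearman correlation of x and y, controlling for z.

    Ranks all three variables, removes the linear effect of the ranks of z
    from the ranks of x and y, and correlates the residuals.
    """
    rx, ry, rz = (stats.rankdata(v) for v in (x, y, z))
    design = np.column_stack([np.ones_like(rz), rz])
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

# Simulated example: x and y are related only through the shared factor z.
rng = np.random.default_rng(0)
z = rng.standard_normal(2000)
x = z + 0.5 * rng.standard_normal(2000)
y = z + 0.5 * rng.standard_normal(2000)

plain = stats.spearmanr(x, y)[0]      # high, driven by z
partial = partial_spearman(x, y, z)   # close to zero once z is controlled
```

In the simulated case, the plain rank correlation is high because both variables depend on `z`, whereas the partial correlation is close to zero. This is the sense in which a control model can "explain away" an effect.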
Having demonstrated a distinct representational sequence of viewpoint encoding schemes based on the whole set of EEG sensors, we performed a more fine-grained, sensor-based searchlight analysis to spatially localize the effects observed (Figure 6). The effects of viewpoint similarity were localized in the earliest time window (60–80 msec) to ensure that the effects of viewpoint symmetry are not (yet) present. In line with the early latency, the resulting effect topography suggests an occipital generator. Next, we localized the effects of viewpoint symmetry in a more conservative time window (111–149 msec). This revealed a more lateralized topography, in line with stronger neural responses to faces in the right hemisphere. Finally, the special status of the frontal viewpoint (281–307 and 389–419 msec) was observed most strongly in central electrode locations.
We also performed a control analysis to ensure that any residual effects of eye position, which might have occurred despite the color change detection task at fixation, could not account for the results. Using the eye-tracking data recorded during the experimental trials, we estimated 2-D fixation density maps, one for every participant, condition, and time point. These were then used to compute eye-tracking similarity matrices for each participant (Figure 7). These individually defined matrices were then used as controls in a partial Spearman's rank correlation between model prediction and EEG data. That is, we asked whether the explanatory power of our models could be explained away by differential eye movements during the respective temporal window of interest. Focusing on previously established windows of interest and testing partial model correlations against zero, all model predictions remained statistically significant (viewpoint similarity: p < .0001, t(15) = 5.8; viewpoint symmetry: p < .0001, t(15) = 9.1; frontal viewpoint: p < .005, t(15) = 3.9; p < .0005, t(15) = 5.4). Residual effects of eye position are therefore unlikely to explain the effects observed.
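The construction of an eye-tracking similarity matrix can be sketched as below. The binning scheme, the spatial extent, the use of Pearson correlation between density maps, and the simulated gaze samples are assumptions for illustration; any smoothing applied to the actual fixation density maps is not reproduced.

```python
import numpy as np

def fixation_density(gaze_xy, bins=20, extent=2.0):
    """2-D histogram of gaze samples, normalized to sum to 1.

    gaze_xy: array of shape (n_samples, 2), in degrees relative to fixation.
    Samples outside [-extent, extent] on either axis are ignored.
    """
    edges = np.linspace(-extent, extent, bins + 1)
    hist, _, _ = np.histogram2d(gaze_xy[:, 0], gaze_xy[:, 1],
                                bins=[edges, edges])
    total = hist.sum()
    return hist / total if total > 0 else hist

def eye_similarity_matrix(gaze_by_condition, **kwargs):
    """Correlate flattened fixation density maps across conditions."""
    maps = np.array([fixation_density(g, **kwargs).ravel()
                     for g in gaze_by_condition])
    return np.corrcoef(maps)

# Simulated gaze data: 5 hypothetical conditions, 500 samples each.
rng = np.random.default_rng(0)
gaze_by_condition = [0.3 * rng.standard_normal((500, 2)) for _ in range(5)]
density = fixation_density(gaze_by_condition[0])
eye_sim = eye_similarity_matrix(gaze_by_condition)
```

The off-diagonal entries of such a matrix, computed per participant and time window, would then serve as the control variable in the partial correlation between model prediction and EEG similarity.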
Finally, to rule out explanations based on differences in attentional load introduced by the distractor task, we tested the behavioral performance of our participants across all viewpoint conditions using a repeated-measures ANOVA. We found no significant differences in d′ (p = .34, F(36, 15) = 1.09), hit rate (p = .30, F(36, 15) = 1.12), or false alarm rate (p = .53, F(36, 15) = 0.96) across experimental conditions, indicating that differences in attentional load cannot account for the observed EEG effects.
We investigated the representational dynamics of face viewpoint encoding in the human brain by applying multivariate analyses to high-dimensional EEG data, recorded at high temporal resolution. Our data-driven and model-based analyses indicate a temporal sequence of distinct viewpoint encoding stages in the EEG activation patterns. Neural responses that reflected viewpoint similarity occurred first, followed by strong effects of viewpoint symmetry. At a later processing stage, the frontal view led to significantly different activation patterns compared with all other viewpoints. Ruling out alternative explanations for the effects observed, we showed that viewpoint symmetry and specialized processing of front-facing views cannot be explained in terms of viewpoint similarity and that none of the effects observed can be explained in terms of residual eye movements or differences in distractor task performance.
The distinction between different head orientations was present early and lasted throughout the entire analyzed time window of 700 msec. The early onset of this viewpoint similarity effect and the results of the sensor-based searchlight analysis suggest that the initial discrimination of head orientation was mainly driven by low-level stimulus properties. A possible explanation of the fact that the effects lasted throughout the whole trial is that low-level stimulus properties remain an important factor in visually evoked responses, even at later temporal stages of visual processing. These results are in line with the results of recent fMRI studies demonstrating residual low-level effects in higher-level visual areas (Rice, Watson, Hartley, & Andrews, 2014; Wardle & Ritchie, 2014; Kietzmann et al., 2012; Yue, Cassidy, Devaney, Holt, & Tootell, 2011). A second, nonexclusive explanation is that the initial sensory coding of head orientation with a specific similarity structure was transformed to action-oriented activity patterns that obey identical relations. Indeed, the mapping of head orientation to the location of joint attention is continuous, that is, similar visual stimuli indicate similar locations for joint attention. Thus, this interpretation is also compatible with an action-oriented interpretation of cortical representations (Engel, Maye, Kurthen, & König, 2013).
Using multivariate analyses of fMRI data, the effects of viewpoint symmetry were previously shown to be prevalent across a large range of higher-order visual areas (Kietzmann et al., 2012). In addition to replicating the general pattern of these results with a different method for measuring neural activity, our current findings extend these results by providing an estimate of the latency of these face processing effects. In general agreement with electrophysiological data from macaque monkeys (Freiwald & Tsao, 2010), the effects of viewpoint symmetry were found to occur at around 110 msec after stimulus onset. This finding constrains possible models of viewpoint symmetry and renders explanations based on extended, recurrent processing less likely. Viewpoint symmetry was observed later than the effects of head orientation but significantly earlier than one might expect for fully viewpoint-invariant effects of face identification, as has been estimated in previous experimental work. For instance, behavioral data from a rapid go/no-go face identification task suggest that people need about 260 msec to identify a face, when accounting for the time required to perform the corresponding motor response (Barragan-Jason, Besson, Ceccaldi, & Barbeau, 2013). Electrophysiological experiments suggest a similar latency. Although the N170 component, with an onset of around 130 msec, was previously shown to exhibit effects of face identification (Jacques & Rossion, 2006), the N170 does not represent a fully viewpoint-invariant code (Caharel et al., 2015; Ewbank, Smith, Hancock, & Andrews, 2008; Miyakoshi, Kanayama, Nomura, Iidaka, & Ohira, 2008). Studies of a later face-selective N250 component, however, do show evidence of viewpoint-invariant processing (Caharel et al., 2015; Schweinberger, Pickering, Jentzsch, Burton, & Kaufmann, 2002).
Finally, the estimated latency of viewpoint symmetry directly matches the typically observed latencies of the higher-level visual areas (Ghuman et al., 2014; Parvizi et al., 2012; Liu, Agam, Madsen, & Kreiman, 2009), which were previously found to exhibit viewpoint symmetry using fMRI (Kietzmann et al., 2012). Taken together, these results are in line with the proposal that viewpoint symmetry might act as an intermediate step in achieving full viewpoint invariance, after the effects of head orientation and preceding a fully invariant code (Freiwald & Tsao, 2010).
In line with the vast majority of face processing studies, we decided to present the face stimuli foveally to study cortical responses under more natural viewing conditions and to obtain an improved signal-to-noise ratio. Recently, studies have evaluated the extent to which viewpoint encoding effects can generalize across changes in retinal position in the fusiform face area (FFA). Testing across peripheral stimulus positions along the vertical meridian, no position-invariant effects of viewpoint symmetry were observed (Ramírez, Cichy, Allefeld, & Haynes, 2014). Although further experiments are required to fully understand the origins of this symmetry null effect, an explanation can be given based on the fact that higher-level visual representations are not position-invariant (Hong, Yamins, Majaj, & DiCarlo, 2016; Golomb & Kanwisher, 2012; Kravitz, Vinson, & Baker, 2008; Hung, Kreiman, Poggio, & DiCarlo, 2005). Furthermore, FFA exhibits a much reduced signal-to-noise ratio in response to stimuli presented in the periphery (Hasson, Levy, Behrmann, Hendler, & Malach, 2002; Lerner, Hendler, & Malach, 2002; Levy, Hasson, Avidan, Hendler, & Malach, 2001) and is better tuned for foveally presented stimuli (Kay, Weiner, Kay, & Weiner, 2015). Both factors strongly affect the ability to find effects driven by distinct activation patterns.
In addition to these effects of viewpoint symmetry, we observed distinct cortical activation patterns for the front-facing viewpoint at a much later processing stage. As a possible explanation for this effect, we noted that the front view differed from the others in terms of direct versus averted gaze. Such effects, which are also known as the “eye contact effect,” were previously studied in the context of social attention (Nummenmaa & Calder, 2009) and social neuroscience (Itier & Batty, 2009; Senju & Johnson, 2009). Interestingly, we observed such effects while our participants performed a color change detection task at fixation, consistent with behavioral studies suggesting that automatic processing of direct gaze can occur, even when attention is drawn away from the face stimulus (Yokoyama, Sakai, Noguchi, & Kita, 2014). Despite the importance of direct eye contact for social cognitive inferences (Nummenmaa & Calder, 2009; Baron-Cohen, Wheelwright, Hill, Raste, & Plumb, 2001), the results of previous electrophysiological studies on the topic are mixed. Whereas one study suggested that the effects of eye contact occur after only 160 msec of processing (Conty, N'Diaye, Tijus, & George, 2007), others have not found differences between direct and averted gaze at similar latencies (Taylor, Itier, Allison, & Edmonds, 2001). The latter result is in line with univariate analyses performed in developmental studies that indicate diminishing effects of eye contact with adulthood (Grice, Halit, Farroni, & Baron-Cohen, 2005). Finally, data from intracranial recordings suggest that the effects of direct versus averted gaze occur during later stages of processing (>200 msec; Pourtois & Spinelli, 2010), in agreement with the current observations. Given the strength of the effects observed here, it would be interesting to revisit studies that produced negative results and to reanalyze the data using multivariate pattern analyses. 
As indicated above, the front-facing view is a reliable predictor for direct eye contact. It is not yet clear, however, which mechanisms could aid the visual system in detecting this particular viewpoint. A potential factor could be the reflectional symmetry of front-view faces. Studies of visuospatial regularity report a sustained posterior negativity for reflectionally symmetric dot patterns (Makin, Wilton, Pecchinenda, & Bertamini, 2012). Although previous effects of eye contact were shown based on nonsymmetric stimuli (Pourtois & Spinelli, 2010), demonstrating that the effects of eye contact exist independently of reflectional symmetry, it is conceivable that the reflectional symmetry of frontal face views contributes to their detection in a visual scene.
In search of a cortical origin of gaze direction effects, converging evidence from neurophysiology (Perrett et al., 1992; Perrett & Smith, 1985) and fMRI experiments has emphasized the role of the STS. In addition to the work by Carlin et al. (2011), experiments by Hoffman and Haxby (2000) suggest that the processing of facial information is separated between the STS and FFA. Whereas the FFA was suggested to process information related to facial identity (Axelrod & Yovel, 2015), the STS was found to be involved in the processing of changeable aspects of faces, including the perception of gaze direction (Haxby et al., 2000). In line with this, Calder et al. (2007) used an fMRI adaptation paradigm to study the encoding of gaze direction and found that the anterior STS and inferior parietal cortex contain information that allows for a separation of different gaze directions.
It should be noted that head orientation and gaze direction were congruent in the currently used stimulus set. It was therefore not possible to separate cortical signals related to gaze and head orientation in this study. Evidence in favor of such a separation was provided by Carlin et al. (2011), who demonstrated that the STS contains finely graded information about the direction of gaze, independently of head orientation and physical image features. Such evidence of a high-level encoding of gaze direction is in line with data from a behavioral adaptation paradigm, in which effects of gaze direction were demonstrated despite changes in size and head direction (Jenkins, Beaver, & Calder, 2006). Moreover, electrophysiological studies using the same paradigm found late effects, starting around 250 msec after stimulus onset (Kloth & Schweinberger, 2010; Schweinberger, Kloth, & Jenkins, 2007). In the context of the current analyses and results, it would be interesting to combine both methodologies, gaze adaptation and spatiotemporal pattern similarity, to investigate how gaze adaptation affects the dynamics of viewpoint encoding. Future work might further benefit from the currently used analysis approach to investigate the temporal development and separability of head and gaze direction signals.
In summary, our multivariate analyses of visually evoked potentials revealed that the cortical representations of facial viewpoints traverse a distinct sequence, expressing different encoding schemes at different latencies. Such representational stages may reflect the complexity of the underlying task and the priority that the brain devotes to the respective computation.
The work was supported by the Research and Innovation Programs of the European Union (FP7-ICT-270212, H2020-FETPROACT-2014, and SEP-210141273), the European Research Council (ERC-2010-AdG #269716), and a DFG Postdoctoral Fellowship for Tim C. Kietzmann.
Reprint requests should be sent to Tim C. Kietzmann, Cognition and Brain Sciences Unit, Medical Research Council, 15 Chaucer Rd., Cambridge, United Kingdom CB2 7EF, or via e-mail: Tim.Kietzmann@mrc-cbu.cam.ac.uk.