Face inversion effects occur for both behavioral and electrophysiological responses when people view faces. In EEG, inverted faces are often reported to evoke an enhanced amplitude and delayed latency of the N170 ERP. This response has been attributed to the indexing of specialized face processing mechanisms within the brain. However, inspection of the literature revealed that, although N170 is consistently delayed to a variety of face representations, only photographed faces invoke enhanced N170 amplitudes upon inversion. This suggests that the increased N170 amplitudes to inverted faces may have other origins than the inversion of the face's structure. We hypothesize that the unique N170 amplitude response to inverted photographed faces stems from multiple expectation violations, over and above structural inversion. For instance, rotating an image of a face upside–down not only violates the expectation that faces appear upright but also lifelong priors about illumination and gravity. We recorded EEG while participants viewed face stimuli (upright vs. inverted), where the faces were illuminated from above versus below, and where the models were photographed upright versus hanging upside–down. The N170 amplitudes were found to be modulated by a complex interaction between orientation, lighting, and gravity factors, with the amplitudes largest when faces consistently violated all three expectations. These results confirm our hypothesis that face inversion effects on N170 amplitudes are driven by a violation of the viewer's expectations across several parameters that characterize faces, rather than a disruption in the configurational disposition of its features.
Early behavioral studies uncovered a phenomenon in which recognition of a face was delayed and less accurate when the face was turned upside–down, compared to when it was upright (Yin, 1969). Because this phenomenon has not been observed with similar intensity for non-face stimuli (cf. Johnston, Molyneux, & Young, 2015), it was termed the face inversion effect (FIE; Bentin, Allison, Puce, Perez, & McCarthy, 1996). The existence of the FIE is widely supported by subsequent studies using a variety of different face stimuli. For example, Rossion et al. (1999) found that participants were slower and significantly less accurate at determining if a new face image matched an upright face target image when that new image was inverted compared to when it was upright. Similarly, another matching-task study using intact and scrambled photographed faces, presented both upright and inverted, found that participant responses were most inaccurate and delayed when faces were scrambled and inverted (Bauser & Suchan, 2013).
Moreover, the FIE is also supported by studies using “Thatcherized” faces (i.e., images of faces in which the mouth and eyes have been rotated 180°). Thatcherized faces appear grotesque when presented in an upright orientation, but not when inverted (Thompson, 1980), indicating that expectations of the structural composition of a face are more recognizably violated in an upright orientation, whereas, in a less commonly experienced upside–down face, violations of structural expectations are harder to discern. For example, inverted and “Thatcherized” familiar faces induce slower and less accurate responses, when participants are asked to determine whether the face was Thatcherized or normative on a matching task (Carbon, Grüter, Weber, & Lueschow, 2007). Overall, these studies among others evaluating behavioral responses to face inversion have led to the suggestion that the visual modality has developed orientation-specific face representations, which are disrupted upon the inversion of a face (Bentin et al., 1996).
Electrophysiological Response to Face Inversion
A wealth of data has now accumulated in the field of EEG/ERP, suggesting that an important early step in the stream of neural events involved in face processing occurs between 130 and 200 msec. Indeed, the first ERP component differentiating faces from other objects peaks at 170 msec and is characterized by greater negativity over the lateral-occipital scalp, leading to its designation as the N170. This component, initially described by Bentin et al. (1996), was found to be greater for faces compared to scrambled faces, or to other object categories such as cars, furniture, hands, or animal faces (although ape faces were later reported to produce an N170 of similar amplitude; Carmel & Bentin, 2002). Subsequent investigations comparing a broader set of categories, including mushrooms, flowers, houses, lions, tools, road signs, and textures (Itier & Taylor, 2004), further confirmed the enhanced N170 for faces (Johnston et al., 2015).
Surprisingly, face inversion is found to produce a rather counterintuitive effect. Indeed, one might plausibly expect face inversion to lead to a decrease in the N170 amplitude, to the extent that this component reflects the appropriate encoding of the face stimulus. However, the reverse phenomenon is commonly observed, namely, that inverting a face elicits an increase in N170 amplitude (e.g., Sadeh & Yovel, 2010; Itier, Alain, Sedore, & McIntosh, 2007; Jacques, d'Arripe, & Rossion, 2007; Marzi & Viggiano, 2007; Caharel, Fiori, Bernard, Lalonde, & Rebaï, 2006; Itier, Latinus, & Taylor, 2006; Righart & de Gelder, 2006; Itier & Taylor, 2004; de Haan, Pascalis, & Johnson, 2002; Eimer, 2000; Rossion et al., 1999, 2000; Bentin et al., 1996).
N170 latency has also been found to be sensitive to the orientation of face stimuli, with the ERP peaking significantly later for inverted than upright faces (Rousselet, Husk, Bennett, & Sekuler, 2008; Itier & Taylor, 2004; Rossion et al., 2000; Bentin et al., 1996). The latency FIE is consistent across varying face stimuli, such as naturally photographed faces (Eimer, 2000; Rossion et al., 1999), two-tone Mooney faces preceded by a training period (George, Jemel, Fiori, Chaby, & Renault, 2005), and simple line-drawn schematic faces (Latinus & Taylor, 2005, 2006). Rossion et al. (1999) proposed that inversion may disrupt the configural information of the face, which subsequently increases the difficulty of face processing and contributes to a delayed N170. Thus, the delay of the N170 peak to inverted faces may stem from the disruption of orientation-specific, structural encoding processes, driven by configural features of the face.
Interestingly, though, the enhanced N170 amplitude to inversion is not consistently evident across different types of stimuli, with discrepancies in amplitudes elicited by schematic, photographed, and Mooney faces. A significantly enlarged N170 upon inversion has been consistently found for naturally photographed faces (Itier & Taylor, 2002; Rossion et al., 2000; Bentin et al., 1996). In contrast, when participants are presented with schematic faces, inversion has the opposite effect, reducing N170 amplitude (Henderson, McCulloch, & Herbert, 2003; Sagiv & Bentin, 2001). Similar results have been reported in studies using Mooney faces. When two-tone, high-contrast Mooney images with incomplete features were perceived as faces, they elicited a larger N170 in response to an upright image than to an inverted image (Latinus & Taylor, 2005, 2006; George et al., 2005). Latinus and Taylor's (2006) comparative study of photographic, schematic, and Mooney faces further demonstrated that, when the three types of face stimuli were inverted, only photographed faces elicited a significantly enhanced N170 amplitude. Taking these findings together, it appears that, although N170 latency delays occur to inversion of all face stimuli, the increased N170 amplitude to face inversion is unique to photographed face stimuli, which reflect how faces naturally appear in the world.
Several ideas have been offered to explain why the amplitude FIE is unique to naturally photographed faces. Itier et al. (2007) suggested that the enhancement of the N170 reflects the additional activation of eye-selective neurons within the STS, which occurs when face stimuli are inverted. As schematic and Mooney faces lack the complex eye information within naturally photographed faces, Itier et al. (2007) suggested that they do not sufficiently activate the eye-selective neurons and, in turn, elicit a reduced N170. Some support for the importance of the eye region comes from Kloth, Itier, and Schweinberger (2013), who showed that N170 amplitude enhancements to inverted faces only occurred when the eye region was intact. Alternatively, Sagiv and Bentin (2001) posited that when a naturally photographed face is inverted, holistic processing is disrupted, so physiognomic feature processing is recruited. Because schematic faces are not perceived as containing physiognomic components, they are processed with the holistic pathway (Sagiv & Bentin, 2001). Therefore, Sagiv and Bentin (2001) proposed that the smaller N170 amplitude to inverted schematic faces reflects the inhibition of both the holistic and featural pathways within a face-specific processing system.
These face-specific processing theories, which propose that the enhanced N170 to inversion stems from the activation of eye-selective neurons (Itier et al., 2007) or the recruitment of physiognomic feature processes (Sagiv & Bentin, 2001), are not wholly supported by the literature. Although the N170 has been described as a face-sensitive component, it has been found to be sensitive to a number of other non-face-related factors, including contrast (Itier & Taylor, 2002), spatial frequency (Goffaux, Gauthier, & Rossion, 2003), Gaussian noise (Jemel et al., 2003), attention (Holmes, Vuilleumier, & Eimer, 2003), and interstimulus variance (Thierry, Martin, Downing, & Pegna, 2007).
A recent study by Johnston et al. (2017) also demonstrated that the N170 was sensitive to expectancy violations across a range of stimulus categories and attributes. In their study, the authors presented a sequence of five static images, with the first four images establishing an implied trajectory of either facial expression change, head or body rotation, or face and shape location within the visual field (Johnston et al., 2017). The final image in each sequence either conformed to the expected trajectory or deviated from the trajectory predicted, subsequently violating contextually induced expectations. Johnston et al. (2017) found that, across all three experiments, irrespective of stimulus type, the N170 amplitude was robustly greater to violated trajectories, than predictable trajectories. These findings suggest that the N170 is a surprise signal, which is responsive to the presence or absence of expectation violations. Further studies, using a similar experimental paradigm, showed that the N170 amplitude was dose dependent as a function of both the strength of prior expectations and the degree to which those expectations are violated (Robinson, Breakspear, Young, & Johnston, 2018) and that N170 latency surprise error signals were generated in different regions of the visual cortex when different attributes (head orientation and person identity) of the same stimulus faces violated expectations (Robinson et al., 2020). These findings suggest that the scalp N170 may (at least in part) reflect prediction error signals across a range of predicted stimulus attributes.
In line with these findings, we propose that the enhanced N170 to face inversion may stem from surprising violations of the contextual expectations that inform daily visual perception: that faces appear upright, light comes from above, and gravity pulls downward. When a naturally occurring face appears upside–down, it is subject to the contextual influences of lighting and gravity, with the face (usually) lit from above by overhead lighting or the sun, and earth's gravity pulling the loose components of the face downward. In contrast, when an upright photograph of a face is inverted to appear upside–down, expectations of how faces commonly appear in the world are not met. Upon inversion, the face appears to be illuminated from below and as though gravity is pulling upward on it. Thus, the N170 may be enhanced in response to unmet or violated expectations of contextual influences, such as orientation, direction of lighting, and effect of gravity. This proposed explanation stems from research that suggests that humans have developed an upright orientation, light from above, and gravity bias.
As faces are regularly viewed upright and very rarely upside–down, it is plausible that the brain generates upright expectations of how faces are orientated in the external world. This suggestion is supported by Itier and Taylor's (2004) developmental behavioral and EEG study, which found a progressive tuning throughout childhood and late adolescence to upright faces. A behavioral matching task revealed that, from 8 to 16 years of age, children grew progressively faster and more accurate at recognizing upright faces, over inverted faces (Itier & Taylor, 2004). This FIE was further supported by EEG recorded across temporal-parietal electrode sites, demonstrating that the N170 latency was delayed for inverted faces, over upright faces, consistently for all ages (Itier & Taylor, 2004). Differences in N170 amplitude size, elicited in adolescents and adults, suggested that, with age, there is a progressive enhancement of the N170 amplitudes to inverted faces and a gradual reduction in the amplitudes evoked by upright faces (Itier & Taylor, 2004). These findings suggest a gradual consolidation of upright expectations, with the N170 prediction error signal progressively enhanced with age, to inverted face stimuli, which violated orientation expectations. This may be attributed to consistent exposure to faces throughout development and establishment of mature contextual expectations within adulthood. Thus, we would surmise that, when a face is inverted, expectations of orientation are violated, resulting in the enhanced N170 component of the ERP.
Inversion of a photographed face violates a lifelong contextual bias that illumination comes from above. In the field of object recognition, seminal studies have demonstrated the expectation in viewers that light must originate from above. For example, Kleffner and Ramachandran (1992) modified the shading of two-dimensional stimuli (disks) producing subjective impressions of perspective and depth, where light sources would be situated to the left or right sides, or vertically, below or above. Their findings revealed that the visual system assumes the presence of a single light source illuminating the visual scene and, importantly, the constraint that this source must be situated above the object. In the field of face processing, behavioral studies have found that recognition of faces illuminated from below is significantly less accurate than recognition of faces illuminated from above, which suggests that we expect faces to be illuminated from above, as they are regularly top lit by the sun or overhead lighting (Johnston, Hill, & Carman, 2013; McMullen, Shore, & Henderson, 2000).
A magnetoencephalography study by Brodski, Paasch, Helbling, and Wibral (2015) provided insight into the effect of lighting violations on prediction error signaling. They examined the response in gamma-band activity (GBA) to Mooney faces, which appeared upright illuminated from above, upright illuminated from below, inverted illuminated from above, and inverted illuminated from below. Brodski et al. (2015) found that, within the high-frequency range (68–144 Hz), GBA was increased when faces appeared to be illuminated from below and when faces were inverted, indicating that surprise signaling within GBA is responsive to violations of lighting and orientation expectations. As well as confirming that visual experience with illumination from above generates expectations of how a face is illuminated, these observations demonstrate how violated light direction expectations activate surprise signaling within the brain when lighting is directed from below, leading to an enhanced N170.
Gravity constitutes a prevailing contextual influence on the way faces appear, with the consistent gravitational pull of earth pulling the loose components of the face downward. From lifelong exposure to earth's gravitational environment, it is suggested humans have established strong priors of how gravity influences visual perception (Jörges & López-Moliner, 2017). A virtual reality study that manipulated the trajectory of a ball, to comply with normal gravity acceleration (1 g) or without gravitational influence (0 g), found that, for 0-g trials, participants chose a consistently incorrect point of intercept, which was consistent with the intercept point for 1 g (Russo et al., 2017). This suggests that an internal model of earth's gravitational effects is influential in predicting intercept location, with consistent exposure to earth's gravitational pull generating top–down driven predictions of trajectory (Russo et al., 2017). Furthermore, an EEG study conducted aboard the International Space Station found that, without a gravitational reference, the perception of 3-D images within a virtual reality navigation task evoked significantly different EEG signals than the same task performed without weightlessness on earth (Cheron et al., 2014). This suggested that gravity influences multisensory perception inherently on earth, with top–down engagement of gravitational expectations evident, whereas, in space, there is evidence to suggest some degree of adaptation to weightlessness (Cheron et al., 2014). Anecdotally, astronauts have reported that the familiar faces of fellow astronauts look odd in space (de Schonen, Leone, & Lipshits, 1998). This may be attributed to the absence of earth's gravity effects, which we have grown accustomed to pulling the loose components of the face toward the earth.
Although the influence of gravity has not been formally examined with faces, from the available research, it is a reasonable assumption that prior beliefs about the directional pull of gravity might influence perception. Although we acknowledge that viewing other people hanging upside–down is rare, we offer the following observation. It is our belief that a casual viewer observing, side-by-side, two people who were hanging upside–down, one of whom was subject to normal effects of gravity on the face (i.e., pulling downward toward the center of the earth), the other of whom was subject to “inverted gravity” (i.e., pulling skyward) would easily be able to (1) observe a difference between the two faces and (2) understand which face was versus was not subject to normal physical laws. We expect that prior knowledge about gravity will impact the way faces are processed, with violation of expectations enhancing the N170.
In this study, we examined how the direction of lighting and gravity influence the processing of upright and inverted faces, and whether these contextual cues influence the FIE on the N170 amplitudes. Full front photographs of faces were taken of models positioned either upright or hanging upside–down, to vary the effect of gravity systematically. In addition, light sources were varied with lighting directed to illuminate the face from above or below. The photographs themselves were then presented upright or inverted. By examining the event-related responses to faces in this 2 × 2 × 2 design, we aimed to determine whether the N170 was affected by inversion alone, or whether, as we surmised, gravity and lighting cues also might play a role.
Moreover, we further hypothesized that these factors would interact such that faces violating expectations of orientation, lighting, and gravity would evoke a larger N170 amplitude, and faces that conformed to expectations would evoke a smaller amplitude. In addition, as research has shown the latency of the N170 is consistently delayed to inverted faces across stimulus types (Latinus & Taylor, 2006; Rossion et al., 1999), we hypothesized that the latency of the N170 would be influenced solely by orientation effects, with the latency significantly more delayed to inverted faces than upright faces.
Twenty-eight participants met inclusion criteria and were tested. Two participants were excluded from analyses, one because of high impedance and the other because of extreme outlying scores, resulting in 25 participants contributing to the analyses (21 women; age range = 18 to 30 years, M = 22.1, SD = 4.3). Participants were recruited through Queensland University of Technology's online research recruitment system SONA. Participants were allocated either a $10 Coles Myer gift card or 2% of course credit for their participation. Participants were required to have normal or corrected-to-normal vision and no history of neurological disorders. Before the experiment, participants were asked to read an information document outlining the study's design and associated risks and provide written consent to take part. Ethical approval for the study was granted by Queensland University of Technology's Human Research Ethics Committee (Approval No. 1500000236).
A three-way repeated-measures design was applied to investigate the N170 response to photographed face stimuli that violated or did not violate prior expectations of orientation, lighting, and gravity. Each of the three independent variables had two levels of manipulation: Orientation was normal (depicted upright) or violated (inverted), lighting was normal (illuminated from above) or violated (illuminated from below), and gravity was normal (pulling downward) or violated (pulling upward). Each participant was exposed to a randomized presentation of faces while the evoked fluctuations in postsynaptic potentials emitted by the brain and detectable at the scalp were recorded using EEG. The dependent variables were the amplitude and latency of the N170, a minimum peak occurring between 140 and 220 msec post stimulus onset and measured at left electrode sites (P7, P9) and right electrode sites (P8, P10).
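To make the factorial structure concrete, the following minimal sketch enumerates the eight cells of the 2 × 2 × 2 design. The labels follow the condition codes used in Table 1 (O = orientation, L = lighting, G = gravity; N = normal, V = violated); the function and variable names are illustrative and not taken from the study materials.

```python
from itertools import product

# Factor initials and levels, following the codes in Table 1.
FACTORS = ("O", "L", "G")   # Orientation, Lighting, Gravity
LEVELS = ("N", "V")         # N = normal (expectation met), V = violated

def condition_codes():
    """Return the eight condition labels, from 'ON LN GN' to 'OV LV GV'."""
    return [
        " ".join(factor + level for factor, level in zip(FACTORS, combo))
        for combo in product(LEVELS, repeat=len(FACTORS))
    ]

codes = condition_codes()
for code in codes:
    print(code)
```

The labels come out in the same order as the rows of Table 1, with orientation varying slowest and gravity fastest.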
The face stimuli were derived from static images of four male and four female models, photographed with a closed mouth, neutral expression, and from a frontal viewpoint. The photographs were taken against a black backdrop within four varied settings: illuminated from above with the model standing upright, illuminated from below with the model standing upright, illuminated from above with the model hanging upside–down, and illuminated from below with the model hanging upside–down. Soft box lights with a diffuse filter and 135W bulb were positioned equidistant from the face within each setting, 110 cm away at a 45° angle. Photographs were taken at eye level using a Canon EOS SLR with an 18- to 55-mm lens, positioned on an adjustable height tripod 150 cm from the face.
A series of photo manipulations were undertaken using Adobe Photoshop CC 2018. The four types of images were converted to grayscale, and a black oval frame was placed over each face to remove the ears, neck, and hair. The images were scaled to size (W = 481 pixels, H = 500 pixels), and the resolution altered to 72 pixels per inch. Each image was aligned by the inner eye cornice to ensure the faces did not deviate during the sequence and contribute to a pop out effect. Faces were intensity normalized to ensure equal mean and variance of grayscale scores. Because of discrepancies in appearance after intensity normalizing, each image was normalized to each other's image to yield four intensities for each setting, resulting in 256 total images. All four versions of intensity were used in the study to ensure it was correctly balanced. The four image types were also rotated 180° to produce a reciprocal upright or inverted image, creating eight conditions composed of different orientation, lighting, and gravity interactions (see Figure 1). A second set of duplicate images was produced with a small red dot in the center of each face, to form the stimuli for the red dot attention task.
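The intensity normalization step can be illustrated with a short sketch. The source does not state the exact procedure or software settings, so this is an assumption: each image is linearly rescaled so that its grayscale mean and standard deviation match those of a reference image. The function name and the random arrays standing in for face images are hypothetical.

```python
import numpy as np

def match_mean_std(image, target_mean, target_std):
    """Linearly rescale grayscale values to a target mean and standard deviation."""
    img = np.asarray(image, dtype=float)
    std = img.std()
    if std == 0:
        # A flat image has no contrast to rescale; return it at the target mean.
        return np.full_like(img, target_mean)
    return (img - img.mean()) / std * target_std + target_mean

# Hypothetical stand-ins for two 481 x 500 pixel face images (rows = height).
rng = np.random.default_rng(0)
face_a = rng.uniform(0, 255, size=(500, 481))
face_b = rng.uniform(50, 200, size=(500, 481))

# Normalize face_a to the intensity statistics of face_b.
face_a_matched = match_mean_std(face_a, face_b.mean(), face_b.std())
```

This sketch does not clip the result to the displayable 0 to 255 range; a real pipeline would need to handle out-of-range values, which may be one source of the appearance discrepancies mentioned above.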
Each participant was fitted with an electrode cap, and an electro-conductive gel was applied at each of the 64 scalp sites. To allow the brain's electrophysiological response to the stimuli to be recorded, electrodes were then applied at each of the 64 corresponding points on the cap. Participants were instructed to remain still and fixate their gaze at the center of the screen while they passively viewed a randomized sequence of stimuli. To ensure visual attention was maintained, participants were required to complete a red dot vigilance task. When a red dot was visible on the screen, participants were instructed to respond by pressing the space bar.
The randomized sequence of stimuli was presented in two blocks, with an intervening 60-sec break provided between the two blocks of images to allow the participant to rest their eyes. Each block contained 288 images, composed of 256 trial images (32 images for each condition) and 32 red dot task images, which were later excluded from analysis. Each image was presented for 600 msec, preceded by a 400-msec interstimulus fixation point (see Figure 2). PsychoPy software (Version 2) was used to deliver the task, with the stimuli sequence presented on an HP widescreen monitor with 1920 × 1080 pixel resolution. Participants were seated in a darkened room approximately 100 cm from the monitor.
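The block structure described above can be sketched as follows. This is an illustration of the trial counts and timing only, not the PsychoPy script used in the study; the generic "red dot" label is an assumption, because the source does not say how red dot images were distributed across conditions.

```python
import random

CONDITIONS = [
    "ON LN GN", "ON LN GV", "ON LV GN", "ON LV GV",
    "OV LN GN", "OV LN GV", "OV LV GN", "OV LV GV",
]
IMAGES_PER_CONDITION = 32   # trial images per condition in each block
RED_DOT_IMAGES = 32         # vigilance-task images per block
STIMULUS_MS = 600           # image duration
FIXATION_MS = 400           # fixation point preceding each image

def build_block(seed=None):
    """Return a shuffled list of (label, is_red_dot) tuples for one block."""
    trials = [(c, False) for c in CONDITIONS for _ in range(IMAGES_PER_CONDITION)]
    # Assumption: red dot trials carry a generic label in this sketch.
    trials += [("red dot", True)] * RED_DOT_IMAGES
    random.Random(seed).shuffle(trials)
    return trials

block = build_block(seed=1)
block_duration_s = len(block) * (STIMULUS_MS + FIXATION_MS) / 1000
print(len(block), "images per block,", block_duration_s, "seconds per block")
```

With 288 images at 1 sec per fixation-plus-image cycle, each block lasts 288 sec, and the two blocks together provide 64 trial images per condition before any artifact rejection.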
EEG Recording and Data Analysis
A BioSemi Active Two Acquisition System (ActiView Version 7.06, BioSemi, 2013) with 64 channels and a sample rate of 1024 Hz was used to collect the EEG recordings. The international 10–20 electrode montage system was implemented to determine scalp sites, with common mode sense and driven right leg used as reference channels.
Brain Vision Analyzer 2.0 software was used to process data. A bandpass filter from 0.1 to 40 Hz was applied to remove high-frequency and low-frequency artifacts. A notch filter at 50 Hz was also applied to remove Australian power-line noise. The detection and attenuation of eye blinks was accomplished with independent component analysis. Topographic interpolation was also carried out on channels with high impedance to remove and replace the noisy signal with a weighted average from other surrounding electrodes. Data for each of the eight conditions were then divided into segmented epochs, which were time-locked from −100 to 600 msec, with the period preceding stimulus presentation (−100 to 0 msec) functioning as a baseline. An averaged ERP waveform was created for each participant at each of the eight conditions across a left hemisphere electrode cluster (P7, P9) and a right hemisphere electrode cluster (P8, P10). These electrodes were selected as they corresponded to lateral occipitotemporal electrode sites commonly reported within the literature when considering the N170 (Robinson et al., 2020; Johnston, Overell, Kaufman, Robinson, & Young, 2016; Rossion & Caharel, 2011). The N170 amplitude was calculated as the average value across ±10 msec around the largest minimum between 140 and 220 msec post stimulus onset (Luck, 2014). Data consisted of the amplitude of the N170 measured in microvolts (μV); the latency of the N170 was measured in msec.
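The N170 measurement rule can be sketched in a few lines. This is a minimal illustration on a synthetic waveform, not the Brain Vision Analyzer procedure itself: the function name, the Gaussian test signal, and the choice to apply the baseline to an already averaged waveform are all assumptions made for the example.

```python
import numpy as np

def n170_measure(erp_uv, times_ms, search=(140.0, 220.0),
                 half_width=10.0, baseline=(-100.0, 0.0)):
    """Return (mean amplitude in uV, peak latency in msec) for one ERP waveform."""
    erp = np.asarray(erp_uv, dtype=float)
    t = np.asarray(times_ms, dtype=float)
    # Baseline correction: subtract the mean of the pre-stimulus interval.
    erp = erp - erp[(t >= baseline[0]) & (t < baseline[1])].mean()
    # Locate the most negative sample inside the search window.
    window_idx = np.flatnonzero((t >= search[0]) & (t <= search[1]))
    peak_idx = window_idx[np.argmin(erp[window_idx])]
    latency = t[peak_idx]
    # Average the waveform over +/- half_width msec around that minimum.
    around = (t >= latency - half_width) & (t <= latency + half_width)
    return erp[around].mean(), latency

# Synthetic waveform sampled at 1024 Hz: a negative deflection centred on 175 msec.
times = np.arange(-100.0, 600.0, 1000.0 / 1024.0)
waveform = -3.0 * np.exp(-0.5 * ((times - 175.0) / 15.0) ** 2)
amplitude, latency = n170_measure(waveform, times)
print(f"amplitude = {amplitude:.2f} uV, latency = {latency:.1f} msec")
```

Averaging around the minimum rather than taking the single most negative sample makes the amplitude less sensitive to high-frequency noise, which is why the returned mean is slightly less negative than the −3 μV trough of the synthetic signal.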
N170 amplitude and latency measures were analyzed by four-way repeated-measures ANOVAs with factors of lateralization (left vs. right), orientation (normal vs. violated), lighting (normal vs. violated), and gravity (normal vs. violated). Lateralization did not have major effects on either measure, and therefore, unless otherwise noted, the averaged amplitude and latency collapsed over left and right electrodes are reported below.
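Because every factor in this design has two levels, each main effect and interaction has one degree of freedom, and its repeated-measures F equals the squared one-sample t statistic of a per-participant contrast score. The sketch below uses that equivalence. It is not the authors' analysis script (the statistics software is not named in the source), and the simulated amplitudes and the injected 1.5-μV orientation effect are purely hypothetical.

```python
import numpy as np
from itertools import product

def rm_anova_2k(data):
    """Repeated-measures ANOVA for two-level within-subject factors.

    data: array shaped (n_subjects, 2, 2, ..., 2), one axis per factor.
    Returns {effect: (F, df_error, partial_eta_squared)}, where effect is a
    tuple of booleans marking which factors enter the effect.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    k = data.ndim - 1
    results = {}
    for included in product((False, True), repeat=k):
        if not any(included):
            continue  # all-False is the grand mean, not an effect
        contrast = data
        # Reduce axes from last to first so remaining axis indices stay valid.
        for axis in range(k, 0, -1):
            if included[axis - 1]:
                # Difference between the two levels of an included factor.
                contrast = np.take(contrast, 1, axis=axis) - np.take(contrast, 0, axis=axis)
            else:
                # Average over the levels of a factor not in the effect.
                contrast = contrast.mean(axis=axis)
        t = contrast.mean() / (contrast.std(ddof=1) / np.sqrt(n))
        f_value = t ** 2
        df_error = n - 1
        results[included] = (f_value, df_error, f_value / (f_value + df_error))
    return results

# Hypothetical data: 25 participants x lateralization x orientation x lighting x gravity.
rng = np.random.default_rng(0)
amplitudes = rng.normal(size=(25, 2, 2, 2, 2))
amplitudes[:, :, 1] -= 1.5  # make inverted faces more negative (simulated effect)
effects = rm_anova_2k(amplitudes)
f_orientation, df_error, eta_orientation = effects[(False, True, False, False)]
print(f"Orientation: F(1, {df_error}) = {f_orientation:.2f}, partial eta squared = {eta_orientation:.2f}")
```

With 25 participants the error degrees of freedom are 24 for every effect, which matches the F(1, 24) values reported in the results.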
The mean and standard deviation of the N170 amplitude and latency, for each condition, are provided in Table 1. Overall, these descriptive statistics are consistent with the hypothesis, with the mean amplitude of the N170 largest when orientation, lighting, and gravity expectations were all violated (OV LV GV), and smallest when orientation, lighting, and gravity expectations were not violated (ON LN GN). However, visual depiction of means (see Figure 3) suggested that this was not a linear effect as the amplitudes did not consistently increase with the violation of more factors.
| Condition | Amplitude M (μV) | Amplitude SD (μV) | Latency M (msec) | Latency SD (msec) |
| --- | --- | --- | --- | --- |
| ON LN GN | −0.00 | 1.62 | 179.36 | 14.70 |
| ON LN GV | −0.86 | 2.05 | 178.40 | 13.83 |
| ON LV GN | −0.31 | 1.65 | 178.00 | 14.09 |
| ON LV GV | −0.43 | 1.70 | 177.92 | 16.37 |
| OV LN GN | −1.82 | 2.24 | 184.30 | 13.40 |
| OV LN GV | −1.26 | 2.31 | 183.08 | 12.41 |
| OV LV GN | −1.55 | 2.35 | 181.50 | 10.41 |
| OV LV GV | −1.92 | 2.33 | 184.66 | 12.98 |
OV LV GV indicates violation of orientation, lighting, and gravity, respectively, whereas ON LN GN indicates no violation on any of these parameters.
More specifically, results showed that violation of orientation and gravity enhanced N170 amplitude, but these effects primarily occurred when lighting was normal. The four-way repeated-measures ANOVA yielded the following significant main effects and interactions: the main effect of Orientation, F(1, 24) = 21.67, p < .001, ηp² = .47; the main effect of Gravity, F(1, 24) = 4.29, p = .049, ηp² = .15; the interaction between Orientation and Gravity, F(1, 24) = 5.91, p = .023, ηp² = .20; and the interaction between Orientation, Lighting, and Gravity, F(1, 24) = 10.76, p = .003, ηp² = .31. Among these, the three-way interaction is important as it qualifies the other significant main effects and interaction. That is, the three-way interaction occurred because orientation and gravity significantly interacted only when lighting was normal: Tests of simple interaction effects indicated that the two-way interaction between Orientation and Gravity was significant when lighting was normal, F(1, 24) = 7.43, p = .011, ηp² = .22, but not when lighting was violated, F(1, 24) = 0.54, p = .47, ηp² = .02.
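For readers who want to relate the F ratios to the effect sizes, partial eta squared can be recovered from F and its degrees of freedom as (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this standard identity to the four omnibus effects listed above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (F value, reported partial eta squared) for the omnibus effects, each F(1, 24).
reported = {
    "Orientation": (21.67, 0.47),
    "Gravity": (4.29, 0.15),
    "Orientation x Gravity": (5.91, 0.20),
    "Orientation x Lighting x Gravity": (10.76, 0.31),
}

recomputed = {
    name: round(partial_eta_squared(f_value, 1, 24), 2)
    for name, (f_value, _) in reported.items()
}
for name, value in recomputed.items():
    print(f"{name}: {value:.2f}")
```

The recomputed values agree with the reported effect sizes for these four effects to two decimal places.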
Focusing on conditions in which lighting was normal, inspection of marginal means suggested that the two-way interaction between Orientation and Gravity emerged because the effect of Orientation was evident when gravity was normal but not when gravity was violated (ON GN: M = −0.00 μV; SD = 1.62 μV; OV GN: M = −1.82 μV; SD = 2.24 μV; ON GV: M = −0.83 μV; SD = 2.05 μV; OV GV: M = −1.26 μV; SD = 2.31 μV). Indeed, tests of simple main effects of Orientation in the LN conditions were significant only when gravity was normal, F(1, 24) = 14.47, p < .001, ηp² = .38 (ON: M = −0.00 μV; SD = 1.62 μV; OV: M = −1.82 μV; SD = 2.24 μV), not when gravity was violated, F(1, 24) = 0.84, p = .37, ηp² = .01 (ON: M = −0.83 μV; SD = 2.05 μV; OV: M = −1.26 μV; SD = 2.31 μV).
In summary, these results showed that orientation, lighting, and gravity all contributed to N170 amplitude. Orientation and gravity exerted their effects both through interactions and on their own. Lighting modulated the amplitude by interacting with orientation and gravity.
In addition, effects of lateralization were found in the omnibus ANOVA as a significant interaction between Lateralization and Orientation, F(1, 24) = 4.61, p = .042, ηp² = .16, and a marginally significant four-way interaction between Lateralization, Orientation, Lighting, and Gravity, F(1, 24) = 4.17, p = .052, ηp² = .15. These interactions occurred because effects of Orientation, Lighting, and Gravity were generally clearer in the right hemisphere. For example, the four-way interaction trend was driven by a strong three-way interaction between Orientation, Lighting, and Gravity in the right hemisphere. Tests of simple interaction effects examining the three-way interaction within each hemisphere were significant in the right hemisphere, F(1, 24) = 14.47, p < .001, ηp² = .33, but not in the left, F(1, 24) = 0.84, p = .37, ηp² = .05. However, because the four-way interaction was only marginally significant, detailed assessment of the three-way interaction was carried out by collapsing data over left and right electrode sites, as described in the foregoing paragraphs. Similarly, inversion of faces caused a greater change in N170 amplitude in the right hemisphere than in the left: Tests of simple main effects of Orientation were significant in the right hemisphere, F(1, 24) = 12.07, p = .002, ηp² = .33, but not in the left, F(1, 24) = 3.82, p = .06, ηp² = .14 (ON: M = −0.39 μV; SD = 1.76 μV; OV: M = −1.564 μV; SD = 2.31 μV). These lateralization effects did not alter interpretations of the main findings about orientation, lighting, and gravity.
All the other main effects and interactions not reported above were nonsignificant in the omnibus ANOVA.
The mean and standard deviation of the N170 latency for each condition, recorded across the left and right hemisphere electrode clusters, are provided in Table 1. These means provide support for the hypothesis that inverted faces evoke a more delayed N170 latency than upright faces. The four-way repeated-measures ANOVA on latency showed that the main effect of Orientation was significant, F(1, 24) = 27.31, p < .001, ηp2 = .53 (ON: M = 178.42 msec; SD = 14.75 msec; OV: M = 183.39 msec; SD = 12.30 msec). There were no other significant main effects or interactions.
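The partial eta squared values reported alongside each F statistic can be recovered from the F value and its degrees of freedom through the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The following is a minimal sketch of that calculation; the function name and the dictionary labels are illustrative and are not part of the original analysis pipeline.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values with (1, 24) degrees of freedom, copied from the Results text.
# The labels are descriptive only (hypothetical names, not from the source).
reported = {
    "Orientation (latency)": 27.31,
    "Lateralization x Orientation (amplitude)": 4.61,
    "Orientation, right hemisphere (amplitude)": 12.07,
}

for label, f_value in reported.items():
    # Round to two decimals to match the reporting convention in the text.
    print(f"{label}: {partial_eta_squared(f_value, 1, 24):.2f}")
```

For these three tests, the sketch returns .53, .16, and .33, which match the effect sizes reported in the text.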
Our aim was to investigate how multiple, orthogonal violated expectations contribute to the enhancement of the N170 in the FIE. To this end, the N170 ERP was measured while participants viewed upright or upside–down faces, which violated expectations of the direction of lighting and gravity. We hypothesized that faces violating more expectations would evoke a larger N170 than faces that conformed to expectations, and that the N170 latency would be influenced by the effects of orientation alone.
The largest N170 amplitudes were observed when orientation, gravity, and lighting were all violated, and the smallest when none of these attributes were violated. However, this was not the consequence of linear addition of the effects of the three factors. Rather, they modulated N170 amplitude largely through interactions. Specifically, although violation of orientation or gravity alone led to increased N170 amplitudes, these factors also significantly interacted such that the violation of gravity reliably increased N170 amplitude only when orientation was normal. Furthermore, this interaction occurred only when lighting was normal. Thus, the current results showed that all three factors are important for understanding ERP correlates of the FIEs: Violating expectations about orientation or gravity enlarged N170s, but simultaneous violation of both factors did not necessarily result in further enhancement of N170 amplitude; and for N170 amplitude to be reliably modulated, faces had to be lit from above. Finally, as predicted, the N170 latencies were influenced by the effects of orientation alone, with the N170 significantly more delayed to inverted faces than upright faces. Overall, these findings support the notion that the N170 is modulated by the violation of prior expectations.
These results challenge the prevalent view within the literature that the enhanced N170 to an inverted face stems simply from the disruption of face-specific, structural encoding processes, which adapt to inversion by recruiting alternate physiognomic feature processes (Sagiv & Bentin, 2001) or eye-selective neurons (Itier et al., 2007). In contrast, the current findings suggest that, when an image of an upright face, photographed under normal circumstances (illumination from above, with gravitational pull from below), is inverted, multiple expectations are violated. Upon rotating the image upside–down, the face now appears illuminated from below and as though gravity is pulling from above, which is a simultaneous violation of orientation, lighting, and gravity expectations (Jörges & López-Moliner, 2017; Brodski et al., 2015; Cheron et al., 2014; Itier & Taylor, 2004).
The current findings are consistent with research by Johnston et al. (2017), with both studies finding the amplitude of the N170 was responsive to violated expectations. The Johnston et al. (2017) study induced expectations within each trial by displaying a progressive, implied trajectory, with the final stimuli confirming or violating expectations of trajectory. Contrastingly, the current study used a randomized design, with expectations of orientation, lighting, and gravity assumed to be established from consistent, lifelong exposure to the effects of each factor. Evidence suggests a gradual tuning to upright faces over inverted faces (Itier & Taylor, 2004), an expectation that light originates from above (Brodski et al., 2015; Kleffner & Ramachandran, 1992) and a sophisticated intuitive physics model of gravitational effects (Russo et al., 2017; Cheron et al., 2014). Thus, the current study expands upon Johnston et al. (2017), demonstrating that the N170 appears to be enhanced to violations of well-established lifelong priors, as well as to violations of immediate contextual-based expectancies.
A predictive coding account of our findings is attractive to us for a number of reasons. First, our own work revealing that N/M170 amplitudes are strongly modulated by expectation violations demonstrates the importance of contextually bound expectations in visual ERPs/ERFs (Robinson et al., 2018, 2020; Johnston et al., 2017; Simpson et al., 2015). This led us to ask whether FIEs on the N170 amplitude might, in part, reflect expectation violations. It also led us to identify the issue that almost all FIE studies involving photographic stimuli are confounded: in addition to violating expectations about orientation, the intended focus of study, they simultaneously violate expectations about lighting and gravity. This observation was the inspiration for the current study.
Second, predictive coding may provide a unifying framework within which the gamut of cognitive phenomena relating to receiving information, transmitting information, understanding, planning, and acting may all be understood. In a recent article, Trapp, Schweinberger, Hayward, and Kovács (2018) suggest not only that predictive coding may provide a framework for complementing existing cognitive models of face processing but also that the field of face processing provides an excellent testbed for honing our understanding of the mechanisms by which predictive neurocognition are instantiated. This is precisely because of the wealth of data and the rich detail of the cognitive models in the field. We endorse this view and believe that the current study offers valuable insights in this regard.
A simplistic interpretation of a predictive coding account would be that the effects of violation across multiple orthogonal stimulus attributes would be additive. That is to say that the more attributes that are violated, the larger the combined prediction error signal should be. Our results partly conform to this pattern: The N170 is smallest when none of the stimulus attributes are violated and largest when all three are violated. However, it is not simply the case that the violation of two attributes leads to greater N170 amplitudes than when only one attribute is violated. Indeed, our data clearly demonstrate that, although orientation, lighting, and gravity all interact in modulating the N170 amplitude, only orientation and gravity have independent effects, and the effects of orientation are clearly the largest.
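The departure from additivity can be illustrated with the lighting-normal cell means reported in the Results. The sketch below is a descriptive calculation on group means only, not a substitute for the ANOVA. It assumes that "additive" means summing the separate orientation and gravity effects relative to the no-violation cell; the function and variable names are hypothetical.

```python
# Mean N170 amplitudes (microvolts) for the lighting-normal conditions,
# copied from the Results. Keys are (orientation, gravity):
# "N" = expectation met, "V" = expectation violated.
cell_means = {
    ("N", "N"): -0.00,
    ("V", "N"): -1.82,
    ("N", "V"): -0.83,
    ("V", "V"): -1.26,
}


def additive_prediction(means):
    """Predict the double-violation cell by summing the two single-violation effects."""
    baseline = means[("N", "N")]
    orientation_effect = means[("V", "N")] - baseline
    gravity_effect = means[("N", "V")] - baseline
    return baseline + orientation_effect + gravity_effect


def interaction_contrast(means):
    """Observed double-violation mean minus the additive prediction."""
    return means[("V", "V")] - additive_prediction(means)


predicted = additive_prediction(cell_means)
contrast = interaction_contrast(cell_means)
print(f"additive prediction:  {predicted:.2f} uV")
print(f"observed:             {cell_means[('V', 'V')]:.2f} uV")
print(f"interaction contrast: {contrast:+.2f} uV")
```

Under this assumption, additivity would predict a mean of −2.65 μV when both orientation and gravity are violated, whereas the observed mean is −1.26 μV. The double violation therefore produces a smaller (less negative) N170 than the sum of the two single-violation effects would suggest.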
It is interesting to speculate on the reasons why the effects of violations of orientation influence the N170 amplitude more so than violations of gravity or lighting. As adult humans, it is very rare for us to look at an inverted face. It is also very rare to see faces where normal expectations about lighting are violated, and only a few hundred humans (i.e., astronauts) have been subjected to conditions where the normal expectations with respect to the effects of gravity on a face are violated. Let us focus first, however, on the inverted face. We might occasionally see such a thing; for instance, when strolling along a beach, or park, we might pass another person lying with their body in such a position that their face is “upside–down” relative to our own position. However, if we were to engage socially with the “upside–down” person, social norms dictate that we would, almost certainly, immediately resolve a mutual positioning that facilitated face-to-face interaction. Similarly, we might see a face on a magazine cover that is “upside–down” relative to our position. However, if we wished to look at the face, we would, almost certainly, reorient the magazine. The key insight here is that, in the real world, expectation violations with respect to face orientation are likely to trigger actions that aim to resolve the “error” and place the faces in an “upright” orientation. Such considerations may hold true for other objects for which there is a canonical orientation. For instance, if we wish to look at a map (that is, understand), rather than simply seeing it (see Johnston, Baker, Stone, & Kaufman, 2014), we are likely immediately to act so as to reorient it to its canonical viewpoint. If we see an upside–down car, we might be prompted to ensure that people are safe. In such circumstances, orientation violations prompt behavioral responses.
What of violations of expectations with respect to the effects of lighting and gravity upon the face? It is far less clear that they signal the need for or desirability of action in the same way that expectation violations with respect to orientation do. Rather, such signals may reflect that a contextual attribute of the face has fallen outside the “normal” bounds of expected statistical likelihood. Although a little surprising as a consequence, they represent a need to slightly update the ongoing statistical model of those contextual attributes rather than to take immediate action, unless such anomalies lead to rapid real-world consequences that merit a greater weighting and thus encoding as a paradigm-changing event. To our knowledge, there have been no previous studies looking at the effects of gravity on FIEs, but our results indicate that it may be an important factor that warrants further attention. Interestingly, a previous article (Enns & Shore, 1997) has examined the effects of lighting direction and orientation in a series of behavioral experiments. These researchers reported that the effects of these two factors were sometimes additive and sometimes interacting, dependent upon the specific task demands. This suggests that the effects of lighting may be modulated by context, an idea that fits well with a predictive coding framework.
As mentioned earlier, a naive and simplistic interpretation of predictive coding might lead to the prediction that the effects of expectancy violations across multiple orthogonal stimulus attributes might be additive. However, even the briefest reflection reveals that the predictive coding framework in no way presupposes such linear additivity. It considers our perceptions as being resolved through the interaction, cooperation, and competition of many multidimensional statistical models within dynamic hierarchical or nested hierarchical contexts. Perhaps when faces are inverted, error signals to other attributes may be modulated, attenuated, or suppressed, so as to prioritize the error signal that promotes appropriate action. Alternatively, our cognitive systems may have multiple overlapping, partially overlapping, and nonoverlapping statistical models relating to how lighting and gravity affect nonrigid objects, and all or any of these might interact with a statistical model of “upside–down faces.” We do not suggest here that either of these particular cases is responsible for the pattern of results that we observe; we simply propose them as examples of the types of complexities that conceivably come into play.
As such, it is unclear to us that a lack of linear additivity with respect to the effects of multiple orthogonal expectation violations offers a strong argument against a predictive coding account. By the same token, we are also unsure that the presence of linearly additive effects would represent a definitive demonstration of predictive coding. Notwithstanding, our results clearly demonstrate that two previously generally overlooked factors, lighting and gravity, contribute to the face N170 amplitude and to the N170 FIE. Moreover, we demonstrate that expectation violations with respect to either of two attributes on its own (orientation or gravity) significantly increase N170 amplitude. For lighting, the trend is in the same direction, and it is possible that the failure to observe a significant independent effect of lighting may reflect a lack of statistical power. These are novel and potentially important findings in an area where much is still unknown, and they open a new vista from which to consider the issue. From the perspective that “all models are wrong, but they may be more or less useful,” we propose that the predictive coding model may be useful in this context because it serves as an “intuition pump” for generating new ideas and testable hypotheses.
The current study demonstrated that the N170 amplitude is modulated by a complex interaction of multiple expectations that influence how a face is perceived: When lifelong contextual priors were violated, a larger N170 was elicited. These findings offer a plausible explanation for the FIE and provide insight into the role of contextual priors in visual perception.
Yasmin Allen-Davidian: Investigation; Methodology; Project administration; Resources; Writing – Original Draft; Writing – Review & Editing. Manuela Russo: Supervision; Writing – Review & Editing. Naohide Yamamoto: Formal analysis; Writing – Review & Editing. Jordy Kaufman: Conceptualization; Writing – Review & Editing. Alan J. Pegna: Conceptualization; Supervision; Writing – Review & Editing. Patrick Johnston: Conceptualization; Project administration; Supervision; Writing – Original Draft; Writing – Review & Editing.
We would like to acknowledge the gentle encouragement of the Oily Rag Foundation (EN10006).
Reprint requests should be sent to Patrick Johnston, School of Psychology and Counselling, Queensland University of Technology, Brisbane, QLD 4059, Australia, or via e-mail: email@example.com.