The direction of others' gaze is a strong social signal to their intentions and future behavior. Pioneering electrophysiological research identified cell populations in the primate visual cortex that are tuned to specific directions of observed gaze, but the functional architecture of this system is yet to be precisely specified. Here, we develop a computational model of how others' gaze direction is flexibly encoded across sensory channels within the gaze system. We incorporate the divisive normalization of sensory responses—a computational mechanism that is thought to be widespread in sensory systems but has not been examined in the context of social vision. We demonstrate that the operation of divisive normalization in the gaze system predicts a surprising and distinctive pattern of perceptual changes after sensory adaptation to gaze stimuli and find that these predictions closely match the psychophysical effects of adaptation in human observers. We also show that opponent coding, broadband multichannel, and narrowband multichannel models of sensory coding make distinct predictions regarding the effects of adaptation in a normalization framework, and we find evidence in favor of broadband multichannel coding of gaze. These results reveal the functional principles that govern the neural encoding of gaze direction and support the notion that divisive normalization is a canonical feature of nervous system function. Moreover, this research provides a strong foundation for testing recent computational theories of neuropsychiatric conditions in which gaze processing is compromised, such as autism and schizophrenia.
The ease with which we can identify another's direction of gaze belies the computational challenge that the brain faces in extracting this information from the visual stream. Our perception of gaze direction is derived from multiple sensory characteristics of the eye region and head (Otsuka, Mareschal, Calder, & Clifford, 2014; Langton, 2010; Wollaston, 1824) and rests upon sensory mechanisms that are sensitive to the recent history of stimulation (Jenkins, Beaver, & Calder, 2006), fluctuations in the reliability of sensory information (Mareschal, Calder, & Clifford, 2013), and higher-order expectations about the social context (Teufel et al., 2009). Gaze processing is of critical importance to social behavior and development in humans and other primate species (Rosati & Hare, 2009; Baron-Cohen, 1997) and is implicated in neuropsychiatric conditions including autism, schizophrenia, and social anxiety disorder (Green, Horan, & Lee, 2015; Senju, 2013; Den Boer, 2000). Moreover, the representation of gaze direction serves as a model system for the emerging field of social neuroscience, as it is enacted by neural circuitry that is relatively well localized in the primate brain (Carlin & Calder, 2013) and entails a stimulus property that can be defined parametrically.
There is evidence that the direction of others' gaze is encoded in the brain across cell populations that are tuned to distinct angles of gaze. Pioneering electrophysiological recordings in the macaque cortex identified cells in the anterior STS that were maximally sensitive to specific directions of eye gaze and head rotation (Perrett, Hietanen, Oram, & Benson, 1992; Perrett et al., 1985, 1990). The presence of distinct cell populations in the human brain that are tuned to leftward- and rightward-averted gaze has been demonstrated using sensory adaptation. Prolonged exposure to leftward-averted gaze causes an apparent shift away from the left in the perceived gaze direction of subsequent face stimuli (Calder, Jenkins, Cassel, & Clifford, 2008; Jenkins et al., 2006; Seyama & Nagayama, 2006) as well as a reduction in the BOLD response in anterior STS to leftward- compared with rightward-averted gaze stimuli (Calder et al., 2007). The complementary pattern of perceptual and physiological effects is observed after adaptation to rightward-averted gaze, indicating the existence of cell populations that can be selectively adapted based on their differential response to leftward- and rightward-averted gaze directions.
An outstanding question in social neuroscience, however, is how the brain encodes parameters such as gaze direction across an ensemble of cell populations. Here, we take a psychophysical approach to gaze direction that has previously revealed the functional architecture of systems in the occipital cortex that encode orientation and spatial frequency (Regan & Beverley, 1985; Blakemore & Campbell, 1969). We implement, for the first time, a computational model of gaze direction, in which information is combined across sensory channels tuned to different directions of gaze. A key functional mechanism that we implement in modeling this cross-channel response is divisive normalization, which is a form of gain control that has been proposed as a canonical computational feature of neural responses in the visual cortex (Carandini & Heeger, 1994, 2012) but is yet to be demonstrated for gaze processing or social vision more broadly. This mechanism is thought to play a fundamental role in the neural coding of stimulus properties by controlling for extraneous contextual influences on sensory responses. A widespread failure of divisive normalization in sensory processes has very recently been proposed to occur in autism (Rosenberg, Patterson, & Angelaki, 2015), highlighting the importance of examining this mechanism in the context of social function.
Importantly, by proposing a specific computational structure for the neural system encoding gaze direction, we are able to identify a set of novel quantitative predictions regarding the effects of sensory adaptation on the represented direction of gaze. Most notably, we demonstrate that the interaction between neural adaptation and normalization mechanisms predicts a distinctive pattern of changes in the encoded stimulus after sensory adaptation that is yet to be observed empirically. This allows us to test for the operation of divisive normalization in gaze processing by comparing model predictions to the profile of perceptual aftereffects seen in human observers when tested on stimuli that span the continuum of physically realizable gaze directions. We hypothesized that perceived gaze direction in human participants after sensory adaptation would match the predicted cross-channel response of the divisive normalization model in the relative magnitude and direction of perceptual aftereffects. We also compare the predicted effects of sensory adaptation between models of gaze perception that rest on opponent coding (i.e., sensory channels tuned to left vs. right gaze), broadband multichannel coding (e.g., three channels tuned broadly to left, right, and direct gaze), and narrowband multichannel coding (i.e., a continuum of narrowly tuned sensory channels).
Overview of Experimental Methods
To test the predictions of the computational models of gaze perception (described below), we examined in human participants how the perception of gaze direction is altered after adaptation to gaze stimuli. A limitation of existing research on gaze adaptation is the use of categorical measures of perceived gaze direction, meaning that sensory adaptation is only apparent when responses cross category boundaries (e.g., from averted gaze to direct gaze). As a result, the effects of adaptation have only been tested on a narrow range of gaze deviations centered on direct gaze, and the magnitudes of these effects have not been directly quantified. In contrast, this study employs a continuous measure of perceived gaze direction, allowing us to measure the effects of adaptation on the perception of stimuli across the full range of horizontal eye deviations and to quantify the magnitude and direction of these effects. As we will see, this expanded paradigm for measuring the effects of adaptation is critical to discern the operation of divisive normalization in gaze processing and to distinguish between opponent coding and multichannel models.
Participants were six healthy adults (three women, three men), who completed all procedures in a repeated-measures design. This included the two authors and four naive observers. The UNSW Sydney human research ethics committee approved experimental procedures, and all participants gave written informed consent.
All stimuli were computer-generated faces seen from a frontal view (Figure 1). Face textures and 3-D face models were generated using FaceGen Modeller 3.5 and manipulated using Blender 2.70, as described previously (Otsuka, Mareschal, & Clifford, 2015). The eyes were modeled separately from other face components, and their rotation was set to precisely control both the direction of eye gaze relative to the viewer and the vergence of the two eyes. The head rotation and vertical eye gaze angle were constant across all stimuli and directed at the viewer.
Stimuli were generated for 15 horizontal gaze directions ranging between 35° leftward gaze and 35° rightward gaze in 5° intervals. For each direction of gaze, stimuli were generated for each of the three female and three male face identities. To control for any left–right asymmetries in the stimuli, horizontally flipped versions of each image were also used.
Images were sized such that there was an interpupillary distance of 6.3 cm when head and gaze directions were direct. This is consistent with the average interpupillary distance in adults (e.g., Fesharaki, Rezaei, Farrahi, Banihashem, & Jahanbkhshi, 2012); thus, test images were displayed at approximate life size. The vergence of the stimulus eyes was such that the implied fixation distance of the stimuli was approximately equal to the participant's viewing distance (50 cm). This was designed so that stimuli with direct gaze would appear to be fixating on the participant rather than looking in front of or behind them.
The angles of gaze for adaptor stimuli were 25° leftward gaze, 25° rightward gaze, and direct gaze. This follows previous studies of gaze adaptation (Calder et al., 2008; Jenkins et al., 2006) and ensured that we could test the effects of adaptation on subsequent gaze directions that were both less averted (i.e., <25°) and more averted (i.e., >25°) than the adapted direction. The effects of adaptation were tested across the full range of gaze directions generated (Figure 1D).
It has been demonstrated previously that gaze aftereffects are due at least in part to adaptation of systems representing gaze direction specifically, rather than being limited to an effect of adaptation to lower-level features of the image (Jenkins et al., 2006; Seyama & Nagayama, 2006) or a general effect on spatial bias or perceived object rotation (Jenkins et al., 2006; Fang & He, 2005). In this study, several measures were taken to ensure that adaptor stimuli and test stimuli had very different lower-level image properties, such that the aftereffects observed could be attributed to adaptation of represented direction of gaze. This included presenting adaptor stimuli at a different size from the test stimuli (with a ratio of areas of 3:4) and jittering the spatial position of the test stimuli randomly between trials (by a maximum of 50 pixels in the horizontal and vertical dimensions). Moreover, in each session, differently gendered images were used for the adaptor and test stimuli—thus, the effect of sensory adaptation was always tested on face identities different from those used for adaptation. The stimulus gender used for adaptation versus test alternated between participants.
Stimuli were presented on Viewsonic (Walnut, CA) Graphics Series G90f and DiamondDigital CRT monitors (1280 × 1024 pixels, approx. 35 pixels per cm, 85 Hz) using the Psychophysics Toolbox for MATLAB (Brainard, 1997).
Each participant completed three adaptation conditions (leftward gaze, rightward gaze, and direct gaze) in separate sessions conducted on separate days. The order of conditions was counterbalanced across participants.
Each session began with a baseline test of gaze perception, in which participants viewed a series of face images and responded by indicating the direction in which each face was looking (Figure 1A). In each trial, the test stimulus was presented for 500 msec. The stimulus was ramped on and then off over the presentation period by applying a raised-cosine envelope to stimulus contrast. The computer produced a short beep upon onset of the test stimulus. Blank-screen interstimulus intervals (ISIs) were included before and after presentation of the test stimulus, lasting 200 and 300 msec, respectively.
Responses were made along a continuous dimension, using a spherical on-screen pointer that could be rotated in the horizontal plane between 90° leftward and 90° rightward around direct. The initial position of the pointer in each trial was randomized within this range. The pointer was displayed subsequent to the test stimulus, and participants were given 4 sec to make their response using the mouse. The duration of the interval between trials was fixed such that the period between the beginning of the response period in a given trial and the beginning of the next trial was always 4 sec. If participants failed to respond within 4 sec, the trial was repeated at the end of the block.
Participants completed two blocks for the baseline period. Each block consisted of 90 trials (15 gaze directions × 6 face identities) presented in a random order, giving 180 trials in total. In each trial, the rotation of the pointer response was recorded as a continuous angle ranging from 90° leftward to 90° rightward around direct. For each participant, pointer responses were averaged across the 12 repetitions of each gaze direction. The baseline period was conducted in each of the three sessions that participants completed.
After the baseline period, participants completed an adaptation period that also consisted of two blocks. Depending on the session, participants were adapted to a specific direction of gaze: either 25° leftward gaze, 25° rightward gaze, or direct gaze.
Participants were shown an initial series of adaptor stimuli at the beginning of each block (Figure 1B). This consisted of 15 images displayed for 4 sec each (for a total duration of 60 sec). The adaptor images for each condition (described in Stimuli) were each presented an equal number of times during this period, in a random order. Participants were instructed to attend to the eyes when observing the adaptor images. To ensure that this was the case, participants were required to carry out a task during the adaptation period. For 20% of the presented images, the iris color turned blue for 200 msec at a random point within the 4 sec that the image was presented. Participants were instructed to press a button each time this occurred.
After adaptation, the perception of gaze direction was again tested using a similar procedure to the baseline period (Figure 1C). The only difference during the postadaptation test was that an adaptor stimulus was presented for 4 sec at the beginning of each trial. This usage of “top-up” adaptor stimuli follows the procedure of previous gaze adaptation studies (Calder et al., 2008; Jenkins et al., 2006) and is intended to maintain the level of sensory adaptation throughout the testing period. The adaptor stimuli were the same as those presented during the initial adaptation period and were presented in a random order, with each image shown an equal number of times across the block.
To compute a profile of sensory aftereffects for each session, the mean pointer response for each of the 15 gaze directions tested during the baseline period was subtracted from that of the adaptation period. An example of individual participant data is displayed in Figure 1E.
Three-channel Model of the Gaze System
Sensory channels are a well-established concept in psychophysics, referring to cell populations in the nervous system that are selectively responsive and differentially tuned along a given stimulus dimension (e.g., to ranges of orientation or spatial frequency). Typically, the percept is modeled as a weighted combination of the stimulus attribute represented by those channels, where the weighting corresponds to the channel activation (Salinas & Abbott, 1994; Georgopoulos, Schwartz, & Kettner, 1986). As described in the Introduction, evidence for this form of functional segregation in the primate brain has been reported in the context of stimulus gaze direction (Calder et al., 2007; Perrett et al., 1985). Importantly, the perceptual effects of selective sensory adaptation can indicate how a stimulus property is represented across a system of sensory channels (i.e., population coded). Psychophysical data reported by Calder and colleagues (2008) suggest that horizontal gaze direction is coded by at least three sensory channels, rather than by a simpler (two-channel) opponent system. In its simplest form, a multichannel system for gaze representation entails channels that are broadly tuned to rightward, leftward, and direct gaze, respectively. Thus, in the current article, we focus first on a three-channel model, which is an example of broadband multichannel coding. For comparison, we also implement an opponent model and a narrowband multichannel model, described in Results. We implement the three-channel model of the coding of horizontal gaze direction as follows.
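To make this scheme concrete, the readout stage can be sketched as follows. The Gaussian tuning shape, channel centers, and width below are assumptions for demonstration only; the fitted sensitivity profiles in the model are governed by the parameters g and s described in the text.

```python
import math

# Illustrative three-channel readout with divisive normalization. The
# Gaussian tuning shape, channel centers, and width are assumptions for
# demonstration; the fitted sensitivity profiles in the text are governed
# by the parameters g and s.
CENTERS = (-30.0, 0.0, 30.0)  # assumed preferred directions: left, direct, right (deg)
WIDTH = 20.0                  # assumed tuning width (deg)

def channel_response(theta, center, gain=1.0):
    """Sensitivity of one channel to gaze direction theta (deg)."""
    return gain * math.exp(-((theta - center) ** 2) / (2 * WIDTH ** 2))

def encoded_direction(theta, gains=(1.0, 1.0, 1.0), semisat=0.01):
    """Channel activations weight each channel's preferred direction
    (numerator) and are pooled to divisively normalize the readout
    (denominator, with a small semisaturation constant)."""
    acts = [channel_response(theta, c, g) for c, g in zip(CENTERS, gains)]
    return sum(a * c for a, c in zip(acts, CENTERS)) / (semisat + sum(acts))
```

In this sketch, attenuating the leftward channel's gain, as adaptation would, shifts the readout for a leftward test stimulus toward the right: `encoded_direction(-10.0, gains=(0.5, 1.0, 1.0))` exceeds `encoded_direction(-10.0)`.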
We first fitted the model to an ensemble of the data reported by Calder and colleagues (2008) using the MATLAB fminsearch function to optimize parameters g and s determining the channel sensitivity profiles, σN and b governing the transformation into a forced-choice decision, and α determining the degree of adaptation. This included unadapted data and data after adaptation to several different directions of gaze: 25° averted gaze, 10° averted gaze, direct gaze, and interleaved left–right 25° averted gaze. Optimization was implemented as the minimization of the residual (unexplained) variance across the entire data set.
Model predictions were generated for the effects of sensory adaptation to 25° leftward-averted gaze, 25° rightward-averted gaze, and direct gaze, on stimuli that ranged across the spectrum of horizontal gaze directions. The final step was to fit model predictions to the continuous response data collected in this study. The channel parameters (g, s) were fixed to the values that best fit the data reported by Calder and colleagues (2008), whereas the decision parameters (b, σN) were not relevant to the present data set as responses were continuous rather than categorical. We allowed the degree of adaptation, α, to vary given that differences in the experimental protocol between this study and the study by Calder and colleagues (2008) likely resulted in different degrees of sensory adaptation. As above, the fminsearch function was used to minimize the sum of squared errors between the scaled multichannel responses and mean participant pointer responses, in each case using the difference between unadapted and adapted states. This was done simultaneously across the 15 tested directions of gaze and summed across the three adaptation conditions. Thus, the fit of model predictions to the experimental data collected in this study was, conservatively, a single-parameter fit.
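The logic of this single-parameter fit can be sketched with a stand-in prediction function: candidate values of the adaptation parameter are scanned, and the value minimizing the summed squared error against the observed aftereffects is retained. A simple grid scan substitutes here for MATLAB's fminsearch, and the prediction function and data are purely hypothetical.

```python
def sum_squared_error(alpha, predict_fn, thetas, observed):
    """Summed squared error between predicted and observed aftereffects."""
    return sum((predict_fn(t, alpha) - o) ** 2 for t, o in zip(thetas, observed))

def predict(theta, alpha):
    """Hypothetical stand-in for the model's predicted aftereffect at test
    direction theta under adaptation strength alpha (NOT the fitted model)."""
    return alpha * max(0.0, 10.0 - abs(theta))

thetas = [-15, -10, -5, 0, 5, 10, 15]
observed = [predict(t, 0.51) for t in thetas]  # synthetic stand-in "pointer" data

# Grid scan over alpha in [0, 1]; fminsearch performs this search adaptively.
best_alpha = min((i / 100.0 for i in range(101)),
                 key=lambda a: sum_squared_error(a, predict, thetas, observed))
```

With the channel parameters held fixed, the entire fit to the new data reduces to a one-dimensional search of this kind over α.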
Generating Quantitative Predictions of Sensory Aftereffects for Gaze
The three-channel model of encoded gaze direction was first fit to the categorical response data reported by Calder and colleagues (2008). The resultant channel sensitivities in the unadapted (baseline) state are displayed in Figure 2A. The fit between modeled categorical responses and the corresponding experimental data is shown in Figure 2B–D. The model fits the data very closely, accounting for 96.3% of the total variance. The parameters that best fit the model to the data were as follows: g = 7.78°, s = 5.57°, b = 0.28, σN = 0.15, and α = 0.69.
The fitted channel sensitivities were then used to generate predictions regarding the direction and magnitude of perceptual aftereffects across stimulus gaze directions ranging from 35° leftward to 35° rightward. Specifically, we modeled the effects of adaptation to 25° leftward gaze, 25° rightward gaze, and direct gaze. Channel sensitivities after adaptation are shown in the top row of Figure 3. The multichannel response (i.e., encoded gaze direction, computed by combining activation across channels) is shown in the center row of Figure 3, both after adaptation and in the unadapted state. The predicted effect of sensory adaptation on perceived gaze direction (i.e., the perceptual aftereffect) was taken as the difference in the multichannel response after adaptation compared with that in the unadapted state. The predicted direction and magnitude of aftereffects are shown in the bottom row of Figure 3. As can be seen, a characteristic profile of sensory aftereffects emerges for adaptation to each of leftward, rightward, and direct gaze. Importantly, the magnitude of perceptual aftereffects depends on the gaze direction of the stimulus in each case, with a surprising nonuniform and nonmonotonic relationship between these variables (discussed in detail below).
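The prediction pipeline just described can be sketched in miniature. The Gaussian-tuned left, direct, and right channels, their centers and width, and the rule that each channel's gain is reduced in proportion to its response to the adaptor are illustrative assumptions rather than the fitted model.

```python
import math

# Minimal sketch of the aftereffect predictions. The Gaussian tuning,
# channel centers, width, and the gain-reduction adaptation rule are
# illustrative assumptions, not the fitted model from the text.
CENTERS = (-30.0, 0.0, 30.0)
WIDTH = 20.0

def response(theta, center):
    return math.exp(-((theta - center) ** 2) / (2 * WIDTH ** 2))

def readout(theta, gains, semisat=0.01):
    """Divisively normalized multichannel response (encoded direction)."""
    acts = [g * response(theta, c) for g, c in zip(gains, CENTERS)]
    return sum(a * c for a, c in zip(acts, CENTERS)) / (semisat + sum(acts))

def aftereffect(theta, adaptor, alpha=0.5):
    """Adapted minus unadapted readout; each channel's gain is reduced
    in proportion to its response to the adaptor."""
    adapted = [1.0 - alpha * response(adaptor, c) for c in CENTERS]
    return readout(theta, adapted) - readout(theta, (1.0, 1.0, 1.0))
```

Even this toy version yields a nonuniform profile: adaptation to leftward gaze produces rightward aftereffects whose size depends on the test direction, rather than a constant shift applied to all test stimuli.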
Fit between Model Predictions and Continuous Response Data
The present section reports the fit between model predictions generated in the previous section and the continuous response data collected in our experimental task. The value of the scaling parameter, S, that optimized the fit between the multichannel response and perceived gaze direction in the unadapted (baseline) state was 37.0. The degree of adaptation, α, that optimized the fit between the scaled multichannel response and perceptual aftereffects across the three adaptation conditions was 0.51. This corresponds to 74% of the degree of adaptation modeled above for the experimental data reported by Calder and colleagues (2008), suggesting a somewhat lesser degree of adaptation in this study.
The single-parameter fit between model predictions and pointer response data is shown in Figure 4. The model fits the data closely, accounting for 66.7% of the total variance. It is notable that several qualitative aspects of the model predictions are apparent in the experimental data, including (i) the tendency for peak aftereffects in the averted adaptation conditions to occur between the direction of the adaptor stimulus and direct gaze, (ii) the tendency for the magnitude of aftereffects to approach nil both for gaze stimuli more averted than the adaptor stimulus and for gaze stimuli on the opposite side to that adapted, (iii) a degree of normalization of the adapted direction itself after adaptation to averted gaze, (iv) the inverse direction of aftereffects in the leftward and rightward adaptation conditions, and (v) the lesser magnitude of aftereffects in the direct adaptation condition compared with the averted adaptation conditions.
The Effect of Divisive Normalization on Sensory Aftereffects
It is initially surprising that a nonmonotonic relationship exists between stimulus gaze direction and the perceptual aftereffects predicted by the model (and which are apparent in the experimental data). This is because adaptation has a monotonic effect on channel sensitivity, which one might expect to translate to a similarly monotonic effect on the multichannel response such that attenuation of leftward channel sensitivity (for instance) would produce a uniform or global shift in subsequent perception toward rightward gaze. Indeed, the perceptual aftereffects reported by previous studies after adaptation to averted gaze resemble a uniform shift in encoded gaze direction away from the adapted direction (e.g., Calder et al., 2008; Jenkins et al., 2006; Seyama & Nagayama, 2006). However, these previous studies only assessed the effects of adaptation on categorical judgments regarding a limited range of stimulus gaze directions and thus are not definitive regarding how the magnitude and direction of aftereffects present across the spectrum of test deviations. As different gaze deviations engage the three sensory channels to differing degrees, the extent to which a given degree of channel adaptation influences subsequent perception (i.e., the magnitude of the perceptual aftereffect) may depend on the particular test stimulus presented, leading to a nonuniform profile of aftereffects. In addition, it has recently been highlighted that the effects of adaptation on neural responses can be complicated by interactions with both excitatory and suppressive mechanisms in neural systems (Solomon & Kohn, 2014). In this section and the next, we illustrate that the specific profile of aftereffects observed after adaptation is diagnostic of how information is combined across sensory channels in the brain to encode the stimulus parameter.
First, the nonuniform and nonmonotonic pattern of aftereffects predicted by our model of gaze direction comes about because of the operation of divisive normalization in computing the multichannel response from channel activations. Importantly, increased activation of a given channel has both a driving influence on the represented gaze direction (via the numerator of the response function described in the previous section, Three-channel Model of the Gaze System) and an attenuating influence on the represented gaze direction (via the denominator of the response function). The relative strength of these competing influences differs across test stimuli, resulting in the distinctive profile of perceptual aftereffects predicted across the spectrum of test gaze directions.
The effects of adaptation on channel sensitivities and the multichannel response are shown in Figure 7, together with the resulting aftereffects predicted by this opponent model. Previous work has identified that opponent and multichannel systems make different predictions regarding the effect of adaptation to direct gaze (Calder et al., 2008). This is apparent in the results of the modeling reported here by comparing the central column in Figures 3 and 7; specifically, the opponent model predicts no effect of direct gaze adaptation, whereas the three-channel model predicts a small degree of local repulsion around direct gaze (Figure 3). Surprisingly, however, we find that the two- and three-channel models make even more notably different predictions regarding the effects of adaptation to averted gaze. Specifically, adaptation to averted gaze in the opponent model tends to result in a profile of aftereffects that peaks near to direct gaze and falls off in magnitude at approximately the same rate on either side of direct gaze. In contrast, adaptation to averted gaze in the three-channel model, as we have already seen, is associated with peak aftereffects between the point of the adaptor and direct gaze (i.e., an asymmetrical profile around direct gaze).
The fit between the opponent model and the data collected in this study is shown in Figure 8. Qualitatively, it is clear that the aftereffects that we observe after adaptation to averted gaze resemble those predicted by the three-channel model rather than those predicted by the opponent model. Most notably, peak aftereffects occur between the point of the adaptor and the point of direct gaze, rather than showing the pattern centered around direct gaze that is predicted by the opponent model. The opponent model accounted for 45.35% of the total variance, which is notably less than that of the three-channel model. Thus, the profile of aftereffects that we observe in our sample supports a three-channel model over an opponent model.
Why do opponent and three-channel coding systems predict different effects of adaptation to averted gaze? This is not easy to intuit and, again, results from the simultaneous effect of adaptation on the driving and normalizing components of the multichannel response (i.e., the numerator and denominator, plotted in Figure 9 for the opponent model). As displayed in Figures 3 and 7, adaptation to averted gaze tends to affect the sensitivity of one channel much more so than the other channels. In a model that incorporates normalization, the strongest effects of adaptation will occur for test stimuli that engage channels that are differentially affected by the adaptor. This will tend to occur at the point of overlap between the strongly adapted channel and the relatively unadapted channel(s). For the opponent system, this point of overlap occurs between the left and right channels, leading to peak aftereffects centered near direct gaze. This is the case for either leftward or rightward adaptation. In contrast, in the three-channel system, the relevant point of overlap occurs between the left and direct channels (for leftward adaptation) or between the right and direct channels (for rightward adaptation). This results in peak aftereffects that are located away from direct gaze, toward the point of the adapting stimulus.
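This account can be checked in a minimal two-channel sketch, again under illustrative assumptions (Gaussian tuning at assumed centers; gains reduced in proportion to each channel's response to the adaptor): the aftereffect after averted adaptation peaks near direct gaze and falls away on both sides.

```python
import math

# Two-channel (opponent) sketch: channels tuned to leftward and rightward
# gaze only. The tuning shape, centers, and adaptation rule are illustrative.
CENTERS = (-30.0, 30.0)
WIDTH = 20.0

def response(theta, center):
    return math.exp(-((theta - center) ** 2) / (2 * WIDTH ** 2))

def readout(theta, gains, semisat=0.01):
    """Divisively normalized two-channel readout."""
    acts = [g * response(theta, c) for g, c in zip(gains, CENTERS)]
    return sum(a * c for a, c in zip(acts, CENTERS)) / (semisat + sum(acts))

def aftereffect(theta, adaptor, alpha=0.5):
    """Aftereffect: adapted readout minus unadapted readout."""
    adapted = [1.0 - alpha * response(adaptor, c) for c in CENTERS]
    return readout(theta, adapted) - readout(theta, (1.0, 1.0))
```

After adaptation to 25° leftward gaze (an adaptor of −25° in this sketch), the aftereffect peaks close to 0° and declines on both sides of direct gaze, in contrast to the asymmetric, adaptor-shifted peak of the three-channel model.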
Narrowband Multichannel Coding
The three-channel model discussed so far is an example of broadband multichannel coding, with sensory channels tuned broadly across a range of stimulus gaze directions. We focus on this model as the most parsimonious system that accounts for the evidence of a direct sensory channel reported in Calder et al. (2008). However, one might imagine multichannel systems with more than three channels. Thus, we have also simulated perceptual aftereffects in systems with many narrowly tuned sensory channels.
In Figure 10, we display a simulation of 17 sensory channels. The peak sensitivities of these channels are spaced in 5° intervals from −40° to 40°. The sensitivity of each channel was simulated as a raised cosine squared function that responds over a range ±10° from that channel's peak. The multichannel response was computed as the sum of channel activations, each weighted by the gaze direction that the channel is most sensitive to, normalized by the summed activity across channels. The effect of adaptation was modeled as described previously.
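The simulation just described can be sketched as follows; one plausible reading of the raised cosine squared tuning is cos²(π(θ − c)/20) within ±10° of the peak c, and the proportional gain-reduction rule for adaptation is an assumption.

```python
import math

# Sketch of the narrowband model: 17 channels with peaks from -40 to 40 deg
# in 5-deg steps. The cos^2 tuning below is one plausible reading of the
# raised cosine squared profile, and the proportional gain-reduction
# adaptation rule is an assumption.
CENTERS = [5.0 * c for c in range(-8, 9)]

def response(theta, center):
    """Raised-cosine-squared tuning, responsive within +/-10 deg of the peak."""
    d = theta - center
    return math.cos(math.pi * d / 20.0) ** 2 if abs(d) < 10.0 else 0.0

def readout(theta, gains):
    """Activations weighted by each channel's preferred direction and
    normalized by the summed activity across channels (small constant
    guards against division by zero outside the covered range)."""
    acts = [g * response(theta, c) for g, c in zip(gains, CENTERS)]
    return sum(a * c for a, c in zip(acts, CENTERS)) / (1e-9 + sum(acts))

def aftereffect(theta, adaptor, alpha=1.0):
    """Aftereffect under full-strength adaptation by default: each channel's
    gain falls in proportion to its response to the adaptor."""
    adapted = [1.0 - alpha * response(adaptor, c) for c in CENTERS]
    return readout(theta, adapted) - readout(theta, [1.0] * len(CENTERS))
```

Under full adaptation, this sketch yields locally odd-symmetric aftereffects around the adaptor, no shift at the adaptor direction itself, and a peak magnitude just under 3°, consistent with the ∼3° figure reported in the text.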
As Figure 10 shows, the effects of adaptation in a narrowband model tend to produce local odd-symmetric aftereffects around the point of the adapting stimulus (i.e., local repulsion). Moreover, the magnitude of these aftereffects tends to be equal after adaptation to direct and averted gaze. The narrowband models also tend to predict a lack of effect on test stimuli that are the same as the adapting stimulus (i.e., 0° or 25° in our case). Finally, the effect of full adaptation (i.e., the maximum possible adaptation strength) in the 17-channel model results in predicted aftereffects of only ∼3°. These four qualitative features do not fit with our empirical data, in which (i) adaptation to averted gaze is associated with aftereffects of clearly greater magnitude than adaptation to direct gaze, (ii) adaptation to averted gaze is associated with aftereffects that are clearly greater for test stimuli on one side of the adapting stimulus than the other (i.e., they are not odd-symmetric around the adaptor), (iii) we tend to see an effect of adaptation to averted gaze on test stimuli with the same gaze direction as the adapting stimulus, and (iv) after adaptation to averted gaze, we see aftereffects that are consistently stronger than the predicted peak effects of ∼3°.
Thus, the predictions of the narrowband model are not qualitatively consistent with the collected empirical data. This is not to say that a more complex model could not be made to fit the data, but under the basic assumptions that we have made (e.g., regarding how adaptation affects channel sensitivities), our results support the three-channel model over opponent coding or narrowband systems.
This article implemented, for the first time, a computational model of gaze processing, in which the horizontal direction of gaze of a seen face is encoded across a set of gaze-selective sensory channels, incorporating the divisive normalization of sensory responses. We demonstrate that this form of sensory encoding predicts a distinctive pattern of perceptual changes after sensory adaptation, with aftereffects that differ across the spectrum of stimulus gaze deviations. We then find that these model predictions closely match psychophysical data recorded in human observers when adapted to leftward, rightward, and direct gaze. We focus on a model with sensory channels tuned broadly to leftward, rightward, and direct gaze and find evidence in favor of this form of sensory coding over opponent or narrowband multichannel systems.
The present findings are consistent with electrophysiological research in the macaque visual cortex, which identified individual cells in the anterior STS tuned to distinct directions of gaze (Perrett et al., 1985, 1990, 1992). Specifically, these studies detected cells that were differentially selective for gaze stimuli directed at the participant, averted horizontally by 45°, or averted vertically by 45°. Psychophysical analyses of gaze perception and functional neuroimaging of gaze processing have built on these electrophysiological findings by demonstrating the existence of neural mechanisms in humans tuned specifically to leftward and rightward gaze (Calder et al., 2007, 2008; Jenkins et al., 2006; Seyama & Nagayama, 2006). The advance of the present article is in demonstrating a plausible scheme for the functional interaction between such cell populations that results in an encoded angle of gaze.
Calder and colleagues (2008) proposed that gaze direction is represented via multichannel encoding rather than a simpler, opponent coding system, as the operation of at least three sensory channels is most consistent with the effects of sensory adaptation on gaze perception that they observed. However, these previous data were unable to distinguish between a system with three broadly tuned sensory channels and a continuum of more narrowly tuned channels. Narrowband multichannel coding is associated with the representation of spatial frequency and orientation (Suzuki, 2005), whereas an opponent coding system is associated with the representation of color, at least at the early stages of processing (Webster, 2015). Here, we show that opponent, three-channel, and narrowband multichannel systems predict different profiles of aftereffects after adaptation to averted gaze. We find that three broadly tuned channels cohere most closely with both the psychophysical data reported in this study and those reported by Calder and colleagues (2008).
A key functional mechanism that our findings point to is the operation of divisive normalization in the encoding of gaze direction across sensory channels. Divisive normalization is proposed to be a canonical computational feature of sensory processing in the brain (Carandini & Heeger, 2012), accounting, for instance, for the response rates of neurons in the retina (e.g., adaptation to global light intensity; Normann & Perlman, 1979), V1 (e.g., encoding of position and orientation; Carandini & Heeger, 1994), and higher visual areas (e.g., MT, encoding speed and direction of motion; Simoncelli & Heeger, 1998). This computation has also been demonstrated outside the visual system, such as in olfactory neurons (Olsen, Bhandawat, & Wilson, 2010) and primary auditory cortex (Rabinowitz, Willmore, Schnupp, & King, 2011). It has recently been highlighted that the effect of adaptation on sensory coding and neural responses is complicated by its potential to act on both driving and suppressive mechanisms, including those that implement divisive normalization (Solomon & Kohn, 2014). Correspondingly, in the current article we demonstrate that a fine-grained analysis of perceptual aftereffects can reveal the operation of divisive normalization in sensory coding. In this study, we model encoded gaze direction as being normalized by pooled activity across leftward, rightward, and direct sensory channels. The surprising correspondence between model predictions and the profile of aftereffects observed psychophysically is strong evidence that the neural encoding of gaze direction rests on the divisive normalization of sensory responses.
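The canonical form of this computation, and the point made by Solomon and Kohn (2014) about adaptation acting on suppressive as well as driving mechanisms, can be illustrated with a minimal sketch; the semisaturation constant, pool weighting, and driving inputs below are hypothetical values for illustration only.

```python
import numpy as np

def normalized(drive, pool_gain=1.0, sigma=0.1):
    """Canonical divisive normalization: each response is divided by a constant
    (sigma) plus the pooled activity of the population; values are illustrative."""
    return drive / (sigma + pool_gain * drive.sum())

drive = np.array([0.2, 0.9, 0.4])  # hypothetical driving inputs to three channels

# Adaptation can weaken the suppressive pool as well as the driving inputs
# (Solomon & Kohn, 2014): a weakened pool disinhibits every normalized response.
assert np.all(normalized(drive, pool_gain=0.6) > normalized(drive))
```

This is why fine-grained aftereffect profiles are diagnostic: changes to the driving inputs and changes to the normalization pool push the population response in qualitatively different directions.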
This is an exciting insight in part because of the recent trend toward computational approaches to neuropsychiatric conditions in which gaze processing is implicated, including autism and schizophrenia (Friston, Stephan, Montague, & Dolan, 2014). Indeed, a very recent proposal is that symptoms across sensory, cognitive, and social domains in autism reflect a widespread reduction of divisive normalization in neural processing, corresponding to an increased ratio of neural excitation to inhibition (Lawson, Friston, & Rees, 2015; Rosenberg et al., 2015). The role of divisive normalization in sensory coding is to control for the influence of extraneous contextual factors on sensory responses, and thus, dysfunction in the neural mechanisms that implement this computation suggests that contextual interference will be more disruptive to sensory coding and behavior. The appeal of computational approaches to psychiatry is the possibility of quantitatively describing information processing mechanisms that underlie mental function, helping to bridge the cognitive and behavioral levels of description that currently define the diagnosis of these conditions and the biological mechanisms that would ideally subserve diagnosis and treatment. Atypical gaze-based behaviors are a cardinal feature of autism (Pell et al., 2016; Lai, Lombardo, & Baron-Cohen, 2014; American Psychiatric Association, 2013), and thus, the present article provides a foundation for directly testing this computational approach to autism in a domain central to the core diagnostic symptoms.
It is a remarkable fact that the direction in which we perceive a face to be looking can be so markedly affected by the recent history of sensory stimulation. This phenomenon of sensory adaptation to eye gaze direction was first reported in 2006 (Jenkins et al., 2006; Seyama & Nagayama, 2006) and has since been demonstrated for other cues to social attention, namely, head and body rotation (Lawson & Calder, 2016; Lawson, Clifford, & Calder, 2009, 2011; Fang & He, 2005). Here, we report effects of adaptation on a continuous measure of perceived gaze direction, demonstrating that adaptation is not limited to an effect on categorical judgments (as used in all previous studies of adaptation to social cues). We find that these quantitative effects can be plausibly modeled as habituation of the sensory channels encoding gaze direction, consistent with the interpretation that this phenomenon is perceptual in nature, rather than reflecting a change in a more cognitive factor like the response criterion used by participants when making categorical judgments about gaze direction.
Adaptation to direct gaze has been reported to increase the tendency to categorize slightly averted gaze directions as being averted rather than direct (Calder et al., 2008). In this study, however, we do not observe clear evidence for an effect of adaptation to direct gaze on perceived gaze direction (Figure 4, center). This may be a matter of sensitivity: The three-channel model predicts that aftereffects after adaptation to direct gaze are notably smaller in magnitude than those induced by adaptation to averted gaze (Figure 3), and this is consistent with the subtle effects of direct gaze adaptation reported previously. Alternatively, however, the effect of the adaptation protocol on gaze judgments could in principle reflect something other than pure adaptation effects (e.g., an effect on the criteria for what constitutes “direct” gaze in categorical gaze judgments). Such effects may not be apparent in the present paradigm due to the use of a continuous measure of perceived gaze direction. Together, our modeling analyses and empirical data suggest that the effects of adaptation to averted gaze are more robustly diagnostic of the underlying channel structure.
A natural hypothesis is that the encoding of gaze direction that we model here is implemented in the anterior STS. The evidence that this region is central to encoding others' direction of gaze converges across experimental approaches (Carlin & Calder, 2013; Langton, 2010). Consistent with the electrophysiological research reviewed earlier, monkeys with bilateral ablation of the STS demonstrate impaired gaze processing, without showing pronounced impairments in face processing more generally (Heywood & Cowey, 1992; Campbell, Heywood, Cowey, Regard, & Landis, 1990). Accordingly, increased activation in STS regions is consistently observed during functional neuroimaging of humans viewing gaze stimuli (e.g., Pelphrey, Viola, & McCarthy, 2004; Hoffman & Haxby, 2000; Puce, Allison, Bentin, Gore, & McCarthy, 1998), and the pattern of activity across voxels in anterior STS carries information about the direction of others' gaze, independent of the particular combination of head and eye rotation used to signal this gaze direction (Carlin, Calder, Kriegeskorte, Nili, & Rowe, 2011). Finally, sensory adaptation to gaze direction in humans modulates the BOLD response to face stimuli in the anterior STS region (Calder et al., 2007). There is evidence that this region sends upstream connections to regions such as posterior STS, lateral parietal cortex, and medial pFC, which implement social–cognitive functions that are based on encoded gaze direction, such as joint attention (Carlin & Calder, 2013).
In summary, the present article builds on existing electrophysiological, neuroimaging, and psychophysical data that demonstrate the existence of cell populations in anterior STS that are tuned to specific directions of observed gaze. We implement a functional model of encoded gaze direction across gaze-selective sensory channels, in which a multichannel response is computed based on the difference in activation between channels normalized by the pooled activation across the channel population. Importantly, normalization of the sensory response is associated with a distinctive and surprising set of predictions regarding the effects of sensory adaptation on encoded gaze direction. These predictions closely match the profile of perceptual aftereffects exhibited by human observers after adaptation to leftward, rightward, and direct gaze, when tested across a broad range of gaze stimuli. This indicates a role for divisive normalization in the sensory encoding of perceived gaze direction and further demonstrates that the effects of recent sensory stimulation on encoded gaze direction can be accurately modeled as a reduction in the gain on individual channels that is proportional to how strongly the channel is activated by the prior stimulus. Our results also suggest that gaze is encoded across at least three broadly tuned sensory channels, rather than an opponent channel system or a bank of more narrowly tuned channels. This establishes a credible functional architecture for the encoding of gaze direction, which may be important for identifying a computational basis to the aberrant gaze mechanisms that occur in conditions such as autism, schizophrenia, and social anxiety disorder, particularly in relation to the role of divisive normalization in sensory computations (Rosenberg et al., 2015) and the modulation of neural responses by recent sensory history (Pellicano, Rhodes, & Calder, 2013).
This work was supported by Australian Research Council Discovery Project DP160102239.
Reprint requests should be sent to Colin J. Palmer, School of Psychology, UNSW Sydney, Sydney 2052, New South Wales, Australia, or via e-mail: Colin.Palmer@unsw.edu.au.