In daily life, efficient perceptual categorization of faces occurs in dynamic and highly complex visual environments. Yet the role of selective attention in guiding face categorization has predominantly been studied under sparse and static viewing conditions, with little focus on disentangling the impact of attentional enhancement and suppression. Here we show that attentional enhancement and suppression exert a differential impact on face categorization supported by the left and right hemispheres. We recorded 128-channel EEG while participants viewed a 6-Hz stream of object images (buildings, animals, objects, etc.) with a face image embedded as every fifth image (i.e., OOOOFOOOOFOOOOF…). We isolated face-selective activity by measuring the response at the face presentation frequency (i.e., 6 Hz/5 = 1.2 Hz) under three conditions: Attend Faces, in which participants monitored the sequence for instances of female faces; Attend Objects, in which they responded to instances of guitars; and Baseline, in which they performed an orthogonal task on the central fixation cross. During the orthogonal task, face-specific activity was predominantly centered over the right occipitotemporal region. Actively attending to faces enhanced face-selective activity much more evidently in the left hemisphere than in the right, whereas attending to objects suppressed the face-selective response in both hemispheres to a comparable extent. In addition, the time courses of attentional enhancement and suppression did not overlap. These results suggest the left and right hemispheres support face-selective processing in distinct ways: the right hemisphere is mandatorily engaged by faces, whereas the left hemisphere is more flexibly recruited to serve current task demands.
Throughout our waking moments, the visual system is constantly bombarded by dynamically changing sensory input from the environment. Remarkably, however, perceptual categorization within this overwhelming data stream happens rapidly and accurately. This is particularly true in the case of human faces—which, as objects of high ecological relevance, enjoy a privileged status in our visual system (Jonas & Rossion, 2016; Sergent, Ohta, & Macdonald, 1992). Effective social interaction depends critically on our ability to discriminate faces from a wide range of other perceptual categories (e.g., animals, plants, bodies), a complex and meaningful categorization that the brain achieves between 100 and 200 msec (Crouzet & Thorpe, 2011; Crouzet, Kirchner, & Thorpe, 2010; Rousselet, Mace, & Fabre-Thorpe, 2003). Although indisputably efficient, however, perceptual categorization is far from capacity-free (Schneider & Shiffrin, 1977; Broadbent, 1958). Rather, visual information processing is limited, such that multiple objects present in a scene must compete for neural representation (Kastner & Ungerleider, 2000). Selective attention allows us to cope with this visual competition, prioritizing processing of information relevant to our current behavioral goals. Yet despite the critical role that selective attention plays in guiding behavior in dynamic and complex environments, investigations of selective attention and face processing have often employed sparse and static viewing conditions. Typically, perceptual discriminations in these studies are binary (e.g., faces vs. 
houses), and stimuli are often spatially and/or temporally isolated (Baldauf & Desimone, 2014; Engell & McCarthy, 2010; Yi, Kelley, Marois, & Chun, 2006; Williams, McGlone, Abbott, & Mattingley, 2005; Lueschow et al., 2004; Holmes, Vuilleumier, & Eimer, 2003; Pessoa, McKenna, Gutierrez, & Ungerleider, 2002; Downing, Liu, & Kanwisher, 2001; Eimer, 2000; Vuilleumier, Armony, Driver, & Dolan, 2001; Wojciulik, Kanwisher, & Driver, 1998; Haxby et al., 1994). In this way, existing studies do not impose the twin constraints that characterise effective generic face categorization in the real world—namely, speed and high categorical diversity, and as such, are limited in what they can tell us about how selective attention modulates face categorization in natural vision.
Not only do extant studies of faces/attention utilize a simplified form of face categorization, they also operationalize selective attention at a relatively coarse level. That is, although selectively attending to a specific visual feature is known to both enhance the neural response to the attended feature and suppress the response to unattended features (Ho, Brown, Abuyo, Ku, & Serences, 2012; Cohen & Maunsell, 2011; Martinez-Trujillo & Treue, 2004; Kastner & Ungerleider, 2000; Treue & Trujillo, 1999), few face processing studies have attempted to disentangle these two attentional components. Instead, the classic approach has been to contrast face processing under maximally and minimally attended conditions by presenting two stimulus types concurrently (e.g., superimposed face and house images) and having participants selectively attend to one category at a time (Baldauf & Desimone, 2014; Engell & McCarthy, 2010; Furey et al., 2006; Yi et al., 2006; Williams et al., 2005; Lueschow et al., 2004; Holmes et al., 2003; Pessoa et al., 2002; Downing et al., 2001; Vuilleumier et al., 2001; Eimer, 2000; Wojciulik et al., 1998; Haxby et al., 1994). Because attending to one category under these conditions necessarily involves actively ignoring the other, this approach can provide no insight into how the enhancement and suppression aspects of attention contribute to the overall effect. To separately characterize these two attentional components, the maximally and minimally attended conditions must be contrasted with a third condition, in which neither category is actively attended to (or ignored), that is, an attentional baseline. 
Yet where such contrasts are commonplace in attention studies using low-level stimuli (Martinez-Trujillo & Treue, 2004; Treue & Trujillo, 1999), comparatively few face studies have taken this approach (but see Chadick & Gazzaley, 2011; Zanto, Hennigan, Östberg, Clapp, & Gazzaley, 2010; Gazzaley et al., 2008; Gazzaley, Cooney, Rissman, & D'Esposito, 2005).
That most studies of faces index selective attention at a global level has had a direct impact on the nature of the conclusions drawn about the role of this important cognitive mechanism in face perception. One such concerns the similarity in attentional modulation between the left and right face-processing networks. It is now well established that, although normal observers process face information via a bilateral network of specialized regions in occipitotemporal cortex, this response to faces is stronger in the right hemisphere than in the left (Frassle et al., 2016; Jonas & Rossion, 2016; Rossion, Hanseeuw, & Dricot, 2012; Kanwisher, McDermott, & Chun, 1997; Sergent et al., 1992), a pattern of lateralization that emerges very early in life (de Heering & Rossion, 2015). One possible consequence of the right hemisphere's specialization for face perception could be a differential benefit of attentional allocation between the left and right face networks. Specifically, we might predict that processing supported by the less-efficient left hemisphere should be more sensitive to attentional enhancement than that supported by face-dominant right hemisphere.1 Studies that operationalize attention at a global level are unlikely to detect such a nuanced pattern of differential attentional sensitivity, and indeed existing investigations have by and large reported identical (or uncompared) effects of selective attention in the two hemispheres (Baldauf & Desimone, 2014; Müsch et al., 2014; Engell & McCarthy, 2010; Furey et al., 2006; Holmes, Kiss, & Eimer, 2006; Yi et al., 2006; Williams et al., 2005; Lueschow et al., 2004; Holmes et al., 2003; Pessoa et al., 2002; Vuilleumier et al., 2001; Eimer, 2000; O'Craven, Downing, & Kanwisher, 1999; Wojciulik et al., 1998; Haxby et al., 1994).
In this article, we clarify these outstanding issues concerning the role of selective attention in face categorization. Specifically, we sought to separately quantify attentional enhancement and suppression of face categorization that is subject to the same constraints that underlie effective perceptual categorization in the real world—namely, speed and categorical diversity. To this end, we recorded high-density EEG while participants viewed a continuous stream of object images taken from many different perceptual categories (e.g., animals, vehicles, man-made objects, trees). In this so-called fast periodic visual stimulation (FPVS), images appeared at a rapid and strictly periodic rate of exactly 6 Hz (i.e., 6 stimuli/sec), allowing just a single fixation per 167 msec stimulus. Critically, we embedded a second periodicity in the sequence by inserting highly variable face stimuli as every fifth image, giving a face presentation frequency of 6 Hz/5 or 1.2 Hz2 (Figure 1). In line with previous studies using this approach, we expect this stimulation sequence to elicit two specific responses in the EEG spectrum: One at 6 Hz, reflecting visual processing common to both object and face images (referred to here as the common response), and one at 1.2 Hz (i.e., 6 Hz/5), reflecting the differential response to faces as compared with objects (Jacques, Retter, & Rossion, 2016; Retter & Rossion, 2016; Rossion, Torfs, Jacques, & Liu-Shuang, 2015). We refer to this 1.2 Hz response as the face-selective response, as it can only arise if the neural response evoked by each briefly presented face consistently differs from that evoked by the many other object categories appearing in the sequence. In this way, the 1.2-Hz signal captures high-level face-selective processing without the need for conditional subtraction (e.g., face activity − object activity). 
We compared this 1.2-Hz face-selective response under three conditions of task-based attention: On Baseline trials, participants performed the same orthogonal task used in all previous studies (Retter & Rossion, 2016; Rossion et al., 2015), in which they responded each time the central fixation cross overlaid on the images changed color. On Attend Faces trials, participants covertly monitored the face images in the sequence and responded each time they detected a female face. On Attend Objects trials, they monitored the object images and responded each time they detected a guitar. The probability of presentation for each target type was constant in every sequence (always five of each target type), such that only the focus of participants' task-based attention changed on each trial. Critically, from these conditions we calculated indices of attentional enhancement (Attend Faces − Baseline) and attentional suppression (Attend Objects − Baseline) for occipitotemporal ROIs in both the left and right hemispheres.
To anticipate our results, we found that when observers were engaged in an orthogonal task (i.e., Baseline), natural face images in our sequences activated the preferred (typically right) hemisphere much more strongly than the nonpreferred hemisphere. Actively attending to faces enhanced the face-selective response, much more evidently in the left hemisphere than in the right hemisphere. Actively attending to a stimulus category other than faces served to suppress the face-selective response in both hemispheres to the same extent. Interestingly, the temporal dynamics of attentional enhancement and suppression differed across the temporal unfolding of the face-selective response. These results suggest that face-selective regions in the two hemispheres may support face categorization in distinct ways: whereas the right hemisphere may be mandatorily engaged by faces, the left hemisphere appears to be flexibly recruited to serve current task demands.
Twenty adults participated in this study in exchange for monetary compensation. Three were excluded due to technical issues during EEG recording, and two were excluded due to low behavioral performance (<80% accuracy in one of the conditions). The final sample consisted of 15 participants (age = 22 ± 2.63 years, seven women). All were right-handed and had normal or corrected-to-normal vision. None reported any history of neurological or psychiatric disease. We obtained written informed consent before testing in accordance with the guidelines set out by the biomedical ethical committee of the University of Louvain.
The stimulus set consisted of 44 face images in total and 250 images of various nonface objects in total (animals, plants, man-made objects, and houses), all collected from the Internet and used in previous studies (Retter & Rossion, 2016; de Heering & Rossion, 2015; Rossion et al., 2015). Each image was converted to grayscale, resized to 200 × 200 pixels, and equalized in terms of mean pixel luminance and RMS contrast in MATLAB (The MathWorks, Natick, MA). Both faces and objects were left embedded in their original naturalistic background and varied in their size, position, viewpoint, and lighting. Target images were female faces (13 individual exemplars) and guitars (15 individual exemplars). The full image set can be downloaded from our website (face-categorisation-lab.webnode.com/resources/natural-face-stimuli/).
The current design was similar to that reported in previous studies (Rossion et al., 2015). We used PsychToolbox running on MATLAB R2009a to present stimuli at a periodic rate of exactly 6 Hz (i.e., 6 images/sec). Each stimulation cycle lasted 167 msec and began with a uniform gray background from which an image gradually appeared and disappeared as its contrast respectively increased and decreased (0%–100%–0%) (see Figure 1). We used a sinusoidal contrast modulation because it can be described with a single parameter (SOA) and gives a smoother, virtually continuous visual stimulation, with only one frame (8.33 msec) per cycle in which the contrast is at 0% (Movie 1; Rossion et al., 2015; Liu-Shuang, Norcia, & Rossion, 2014). Throughout the sequence, a small black fixation cross overlaid the images. Each 60-sec stimulation sequence consisted of randomly selected object images (without guitars) with a randomly selected male face interleaved as every fifth image. Thus, faces appeared periodically at a frequency of exactly 6 Hz/5 (i.e., 1.2 Hz). Periodic EEG responses at the 1.2-Hz frequency and its harmonics reflect the differential response to faces as compared with objects (i.e., face categorization), whereas responses at the base stimulation frequency of 6 Hz and harmonics reflect visual processing common to all stimuli (for an overview of how harmonic responses arise in the FPVS frequency spectrum, see Retter & Rossion, 2016). On any given sequence, participants performed one of three behavioral tasks (conditions). In the Baseline condition, participants attended to the central fixation cross and were instructed to press the spacebar whenever it changed color from black to red (duration = 200 msec). In the Attend Faces condition, participants monitored the face images in the stimulation sequence and responded whenever they saw a “female” face. 
We reasoned that this task would increase attention to all faces in the sequence, because to discriminate between male and female faces, an observer must first identify that a stimulus is a face. Finally, in the Attend Objects condition, participants monitored the object images in the stimulation sequence and responded when they saw an image of a “guitar.” Female face targets randomly replaced male faces, and guitar targets randomly replaced another object image. Targets were distributed throughout the whole sequence (time range between consecutive targets = 6.88–15.98 sec) to ensure that observers maintained attention for the entirety of the 60-sec sequences. Importantly, all target types occurred in every sequence the same number of times (i.e., each contained five fixation cross changes, five female faces, and five guitars), ensuring equal probability of each target type on each trial. In this way, only the participant's active task differed from trial to trial, whereas the visual stimulation itself was held constant. There were 4 × 60 sec trials per condition, making for 12 trials in total (total testing time = 15–20 min, including breaks). There were several pseudorandom trial orders assigned to each participant in a counterbalanced order (Figure 1).
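The timing of the stimulation sequence described above can be sketched in a few lines. This is an illustrative sketch only, not the presentation code used in the study: the raised-cosine contrast formula and the function names `contrast_per_frame` and `category_sequence` are assumptions introduced here, chosen so that contrast runs 0%–100%–0% within each cycle and reaches 0% on exactly one video frame.

```python
import math

REFRESH_HZ = 120   # monitor refresh rate reported in the text
BASE_HZ = 6        # image presentation rate
FACE_EVERY = 5     # a face appears as every fifth image
SEQUENCE_SEC = 60  # duration of one stimulation sequence


def contrast_per_frame(refresh_hz=REFRESH_HZ, base_hz=BASE_HZ):
    """Contrast (0..1) on each video frame of one stimulation cycle.

    Assumed form: a raised cosine, which starts at 0, peaks at 1 mid-cycle,
    and returns toward 0 by the end of the cycle.
    """
    frames = refresh_hz // base_hz  # 120 / 6 = 20 frames per 167-msec cycle
    return [(1 - math.cos(2 * math.pi * k / frames)) / 2 for k in range(frames)]


def category_sequence(seconds=SEQUENCE_SEC, base_hz=BASE_HZ, face_every=FACE_EVERY):
    """Category label of every image in one sequence: 'O' object, 'F' face."""
    n_images = seconds * base_hz
    return ["F" if (i + 1) % face_every == 0 else "O" for i in range(n_images)]


if __name__ == "__main__":
    seq = category_sequence()
    print("".join(seq[:15]))                 # start of the O/F pattern
    print(seq.count("F") / SEQUENCE_SEC)     # faces per second
```

Under these assumptions a 60-sec sequence contains 360 images, of which 72 are faces, which corresponds to the 1.2-Hz face presentation frequency.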
The experiment was run in a quiet, low-lit room. Participants sat 80 cm away from an LED monitor (BenQ XL2420T) with a 1920 × 1080 resolution and a 120-Hz refresh rate. A curtain isolated the participant from the experimenter; participant behavior was monitored with a webcam. Stimuli appeared centrally and subtended 3.93° of visual angle. We used the ActiveTwo Biosemi system (Biosemi, Amsterdam, The Netherlands) to acquire high-density 128-channel EEG at a 512 Hz sampling rate. The magnitude of the offset of all electrodes, referenced to the common mode sense, was held below 50 μV. Four additional flat-type active electrodes recorded vertical and horizontal EOG: two above and below the participant's right eye and two lateral to the external canthi.
We calculated RTs relative to target onset and considered responses to be accurate if they occurred between 250 and 1500 msec following target onset. Only accurate responses were taken into account for the RT analysis. We also calculated an inverse efficiency score (correct RT/accuracy) to take into account a speed–accuracy trade-off.
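As a minimal sketch of this scoring scheme, the snippet below computes accuracy, mean correct RT, and the inverse efficiency score for one condition. It assumes that accuracy is expressed as a proportion and that the first response falling inside a target's window counts as the response to that target; the function name `score_condition` and the example values are hypothetical.

```python
from statistics import mean

RESPONSE_WINDOW_MS = (250, 1500)  # accepted latency range after target onset


def score_condition(target_onsets_ms, response_times_ms):
    """Return (accuracy, mean correct RT in msec, inverse efficiency score)."""
    window_lo, window_hi = RESPONSE_WINDOW_MS
    correct_rts = []
    for onset in target_onsets_ms:
        # Latencies of all responses that fall inside this target's window.
        latencies = [r - onset for r in response_times_ms
                     if window_lo <= r - onset <= window_hi]
        if latencies:
            correct_rts.append(min(latencies))  # assumption: first response counts
    accuracy = len(correct_rts) / len(target_onsets_ms)
    mean_rt = mean(correct_rts) if correct_rts else float("nan")
    # Inverse efficiency: correct RT divided by accuracy (proportion correct).
    ies = mean_rt / accuracy if accuracy > 0 else float("inf")
    return accuracy, mean_rt, ies


if __name__ == "__main__":
    onsets = [1000, 10000, 20000, 30000, 40000]   # five targets (made-up times)
    presses = [1500, 10400, 20600, 32000]         # last press is too late
    print(score_condition(onsets, presses))
```

Because consecutive targets were separated by several seconds, the response windows of different targets do not overlap, so a single key press cannot be credited to two targets.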
We analyzed the EEG data using open source software (Letswave5 www.nocions.org/letswave/) running in MATLAB R2012b, with similar processing steps and parameters as in previous studies (e.g., Retter & Rossion, 2016; Rossion et al., 2015). We first band-pass filtered the EEG data between 0.1 and 100 Hz using a fourth-order Butterworth filter and downsampled it to 256 Hz for faster processing. We then segmented the continuous EEG trace relative to the starting trigger of each trial, including an additional 2 sec before and after each sequence. We removed blink artifacts using independent components analysis performed with a square mixing matrix (Jung et al., 2000). For each participant, we removed the single component corresponding to blinks based on the visual inspection of the topography and time course. We removed additional artifacts by interpolating bad channels with the three neighboring channels. No more than 5% of channels were interpolated for any given participant (i.e., maximum 6 channels out of 128). We re-referenced the clean data to the average of all scalp channels and averaged each participant's trials by condition. Electrode labels were changed to closely match a more conventional 10/20 system (see Rossion et al., 2015, Figure S2, for exact relabeling).
Frequency Domain Analysis
To avoid spectral leakage, we resegmented the preprocessed EEG data into epochs containing an integer number of cycles of the face presentation frequency (i.e., 1.2 Hz = 0.833 sec/cycle). We discarded the first and last 2 sec of each trial to remove eye movements and transients related to the abrupt onset and offset of the flickering stimuli. The final cropped epochs were 55.84 sec long and contained 67 face presentation cycles. We subjected these to fast Fourier transformation and extracted the amplitude spectra with a frequency resolution of 0.018 Hz (i.e., 1/55.84).
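The cropping arithmetic can be checked with a short sketch. It assumes 56 sec of usable data per trial (60 sec minus 2 sec at each end) and the 256-Hz sampling rate that follows downsampling; the function name `crop_to_integer_cycles` is hypothetical, and the exact sample count used by the analysis software may differ by a few samples from the rounding applied here.

```python
import math


def crop_to_integer_cycles(usable_sec, cycle_hz, sample_rate_hz):
    """Largest whole number of cycles that fits, with the implied epoch length.

    Returns (n_cycles, n_samples, duration_sec, frequency_resolution_hz).
    """
    # Small epsilon guards against floating-point error just below an integer.
    n_cycles = math.floor(usable_sec * cycle_hz + 1e-9)
    # One 1.2-Hz cycle lasts 1 / 1.2 sec, i.e. about 0.833 sec.
    n_samples = round(n_cycles / cycle_hz * sample_rate_hz)
    duration_sec = n_samples / sample_rate_hz
    return n_cycles, n_samples, duration_sec, 1.0 / duration_sec


if __name__ == "__main__":
    cycles, samples, duration, resolution = crop_to_integer_cycles(56, 1.2, 256)
    print(cycles, samples, round(duration, 2), round(resolution, 3))
```

With an integer number of cycles in the epoch, the face presentation frequency and its harmonics fall at the centre of a frequency bin, which is what keeps their energy from leaking into neighbouring bins.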
To establish the presence of significant periodic EEG responses at the relevant stimulation frequencies, both in individual participants and at the group level, we pooled the amplitude spectra across all scalp channels and calculated Z scores. The Z scores at a given frequency were computed as the difference in amplitude between that frequency and the mean of the 20 neighboring frequency bins, divided by the standard deviation of the 20 neighboring bins. The 20 neighboring bins represented a frequency range of 0.36 Hz (0.18 Hz on either side) and excluded the two immediately adjacent frequency bins (Retter & Rossion, 2016; Rossion et al., 2015; Rossion, Prieto, Boremanse, Kuefner, & Van Belle, 2012; Srinivasan, Russell, Edelman, & Tononi, 1999). As per previous studies (Jacques et al., 2016), we considered Z scores greater than 3.1 (p < .001, one-tailed, i.e., signal > noise) to be significant.
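The Z-score computation just described might be sketched as follows. The snippet takes 10 bins on each side of the frequency of interest and skips the single closest bin on each side. The use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify it, and the function names are hypothetical.

```python
from statistics import mean, stdev


def neighbor_bins(spectrum, index, n_each_side=10, n_skip=1):
    """Amplitudes of the surrounding bins, skipping the closest bin on each side."""
    lo = index - n_skip - n_each_side
    hi = index + n_skip + n_each_side
    if lo < 0 or hi >= len(spectrum):
        raise ValueError("frequency bin is too close to the edge of the spectrum")
    left = spectrum[lo:index - n_skip]
    right = spectrum[index + n_skip + 1:hi + 1]
    return left + right


def z_score(spectrum, index):
    """(amplitude at bin - mean of neighbors) / standard deviation of neighbors."""
    noise = neighbor_bins(spectrum, index)
    return (spectrum[index] - mean(noise)) / stdev(noise)  # stdev: n - 1 (assumed)


if __name__ == "__main__":
    # Synthetic spectrum: alternating noise floor with one clear peak at bin 30.
    spectrum = [1.0 if i % 2 == 0 else 1.2 for i in range(100)]
    spectrum[30] = 5.0
    print(z_score(spectrum, 30) > 3.1)  # peak exceeds the significance criterion
```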
We quantified the size of periodic EEG responses in two steps. First, we applied a baseline correction to the raw amplitude spectra. For each frequency bin, we subtracted the mean amplitude of the 20 surrounding frequency bins (again excluding the two immediately adjacent bins) from the amplitude at that frequency (Jacques et al., 2016; Retter & Rossion, 2016). This enabled us to quantify the magnitude of the response at each individual relevant harmonic frequency.³ Second, we assessed responses at the global level by summing the baseline-corrected amplitudes across the relevant frequencies (Retter & Rossion, 2016). We calculated a face-selective response by summing the response at the first nine harmonics of the face presentation frequency, excluding the fifth harmonic, which coincides with the 6-Hz base stimulation frequency (i.e., 1.2, 2.4, 3.6, 4.8, 7.2, 8.4, 9.6, and 10.8 Hz). We selected these harmonics of 1.2 Hz as they were the most consistently present in all participants. We also calculated a common response by summing the response amplitudes across the first three harmonics of the base stimulation frequency (i.e., 6, 12, and 18 Hz; note that responses at harmonics above 20 Hz were much reduced).
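A minimal sketch of this two-step quantification is given below: it lists the face harmonics that do not coincide with the base stimulation rate, subtracts the local noise estimate at each one, and sums the results. The function names, the synthetic spectrum, and its 0.1-Hz resolution are illustrative assumptions, not the values or code used in the study.

```python
from statistics import mean


def face_harmonics(face_hz=1.2, base_hz=6.0, n_harmonics=9):
    """Harmonics of the face frequency that do not fall on a base-rate harmonic."""
    ratio = round(base_hz / face_hz)  # every 5th face harmonic equals a 6-Hz harmonic
    return [round(k * face_hz, 6) for k in range(1, n_harmonics + 1) if k % ratio != 0]


def baseline_corrected(spectrum, index, n_each_side=10, n_skip=1):
    """Amplitude at a bin minus the mean of 20 neighbors (closest bins skipped)."""
    left = spectrum[index - n_skip - n_each_side:index - n_skip]
    right = spectrum[index + n_skip + 1:index + n_skip + 1 + n_each_side]
    return spectrum[index] - mean(left + right)


def summed_response(spectrum, resolution_hz, frequencies_hz):
    """Sum of baseline-corrected amplitudes over the requested frequencies."""
    bins = [round(f / resolution_hz) for f in frequencies_hz]
    return sum(baseline_corrected(spectrum, b) for b in bins)


if __name__ == "__main__":
    resolution = 0.1                       # Hz per bin (illustrative only)
    spectrum = [1.0] * 200                 # flat noise floor from 0 to 20 Hz
    for f in face_harmonics():
        spectrum[round(f / resolution)] += 0.5   # face-selective peaks
    spectrum[round(6.0 / resolution)] += 2.0     # common response at 6 Hz
    print(face_harmonics())
    print(summed_response(spectrum, resolution, face_harmonics()))
```

Summing across harmonics in this way collapses a response that is spread over several frequencies into a single amplitude value per condition and channel.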
Following inspection of scalp topographies and on the basis of several of our previous studies that have used the exact same stimulation parameters (Jacques et al., 2016; Rossion et al., 2015), we defined one occipitotemporal ROI for each hemisphere: Right channels were P8, P10, PO8, PO10, and PO12, and left channels were P7, P9, PO7, PO9, and PO11. Given that studies using this paradigm often identify a few observers whose face-selective response is left-lateralized (roughly two to three participants out of 16, e.g., Retter & Rossion, 2016), we took account of individual lateralization patterns by identifying each participant's preferred and nonpreferred hemisphere, where preferred refers to the hemisphere with the strongest face-selective response. We then computed indices of attentional enhancement (Attend Faces − Baseline) and attentional suppression (Baseline − Attend Objects) within each ROI. We carried out statistical analyses using repeated-measures ANOVAs with Greenhouse–Geisser corrections applied to degrees of freedom whenever the assumption of sphericity was violated. We used pairwise t tests for post hoc comparisons, and unless specified otherwise, all p values were two-tailed. We used a Bonferroni correction to control for multiple comparisons where necessary.
Time Domain Analysis
We also inspected periodic EEG responses in the time domain (Jacques et al., 2016; Rossion et al., 2015). Here we low-pass filtered the re-referenced data with a 30-Hz cutoff (fourth-order zero-phase Butterworth filter) and cropped each sequence to contain an integer number of cycles of the face presentation frequency (0–58 sec, 14,722 bins = 69 face presentation cycles). To remove aspects of the data common to both face and object processing, we applied a multinotch filter (width = 0.1 Hz) that encompassed five harmonics of the base stimulation frequency (i.e., 6–36 Hz). We then averaged sequences by condition and, starting from 3 sec after the start of the sequence, segmented smaller epochs containing five stimulation cycles (≈832 msec), corresponding to four objects and one face presentation (OFOOO). After averaging across these resulting smaller epochs, we performed a baseline correction relative to the first object cycle (−167 to 0 msec). As a final step, we calculated indices of attentional enhancement (Attend Faces − Baseline) and attentional suppression (Baseline − Attend Objects) within (i) the ROIs from the frequency domain analysis (standard ROIs) and (ii) larger ROIs including more dorsal and medial occipital channels selected based on visual inspection of time domain components (referred to here as wide ROIs). We identified statistically significant time points of enhancement and suppression by asking whether the 99% confidence interval at each time point excluded zero. Only clusters containing a minimum of five consecutive significant time points (∼19 msec) between 100 and 600 msec poststimulus were considered.
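The cluster criterion at the end of this procedure can be sketched as a simple run-length search. The snippet assumes that the lower and upper bounds of the 99% confidence interval have already been computed for each time point (how they were obtained is not specified here); the function name `significant_clusters` is hypothetical.

```python
def significant_clusters(times_ms, ci_low, ci_high, min_points=5, window_ms=(100, 600)):
    """Return (start_ms, end_ms) for each run of time points whose CI excludes zero.

    Only time points inside window_ms are eligible, and a run is kept only if it
    contains at least min_points consecutive time points.
    """
    clusters, run = [], []
    for t, lo, hi in zip(times_ms, ci_low, ci_high):
        in_window = window_ms[0] <= t <= window_ms[1]
        excludes_zero = lo > 0 or hi < 0
        if in_window and excludes_zero:
            run.append(t)
            continue
        if len(run) >= min_points:
            clusters.append((run[0], run[-1]))
        run = []
    if len(run) >= min_points:  # close a run that reaches the end of the epoch
        clusters.append((run[0], run[-1]))
    return clusters


if __name__ == "__main__":
    times = [i * 1000 / 256 for i in range(200)]   # 256-Hz sampling, in msec
    low, high = [-1.0] * 200, [1.0] * 200          # CI spans zero by default
    for i in range(40, 50):                        # a 10-point effect near 170 msec
        low[i], high[i] = 0.5, 1.5
    print(significant_clusters(times, low, high))
```

Note that this simple version does not distinguish the sign of the effect, so an immediately adjacent positive and negative run would be merged; a stricter variant would track the two signs separately.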
We inspected the phase of the individual harmonics of the face-selective response in MATLAB using the Circular Statistics Toolbox (Berens, 2009). For each participant, we averaged the complex values output of the fast Fourier transformation within each ROI and then grand-averaged across participants to plot the mean amplitude and phase of the relevant face-selective response harmonics. Because the first three harmonics were the strongest and because phase estimation depends on signal strength, we focused on this subset of frequencies (i.e., 1.2, 2.4, and 3.6 Hz) to examine phase shifts across attentional conditions. For each harmonic, we computed the difference in phase between the Attend Faces versus Baseline conditions (attentional enhancement) and the Baseline versus Attend Objects conditions (attentional suppression) for each participant. These scores were then averaged with the circ_mean function and converted to degrees. We used the same function to calculate the grand-averaged phase shift across harmonics.
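For readers unfamiliar with circular averaging, the sketch below shows the underlying computation: phase differences are averaged as unit vectors rather than as plain numbers, so that angles on either side of the 0°/360° wrap-around average sensibly. It follows the standard definition of the circular mean (the direction of the mean resultant vector) and is not the toolbox code itself; `phase_shift_deg` is a hypothetical helper.

```python
import cmath
import math


def circ_mean(angles_rad):
    """Circular mean: direction of the sum of unit vectors at the given angles."""
    resultant = sum(cmath.exp(1j * a) for a in angles_rad)
    return cmath.phase(resultant)


def phase_shift_deg(fft_a, fft_b):
    """Mean phase difference (condition A minus condition B) in degrees.

    fft_a and fft_b hold one complex Fourier coefficient per participant,
    taken at the same harmonic and ROI in the two conditions.
    """
    diffs = [cmath.phase(a) - cmath.phase(b) for a, b in zip(fft_a, fft_b)]
    return math.degrees(circ_mean(diffs))


if __name__ == "__main__":
    # 350 deg and 10 deg average to 0 deg on the circle, not to 180 deg.
    wrapped = [math.radians(350), math.radians(10)]
    print(round(math.degrees(circ_mean(wrapped)), 6))
```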
Behavioral performance is summarized in Figure 2. Repeated-measures ANOVAs with Condition (Baseline, Attend Faces, Attend Objects) as a within-subject factor showed a significant difference between conditions in terms of RT, F(1.45, 20.28) = 8.61, p < .004, partial η2 = 0.38, and accuracy, F(1.44, 20.18) = 5.37, p < .02, partial η2 = 0.28. In both cases, there was no significant difference between the Attend Faces and Attend Objects conditions: RT, t(14) = 0.33, p = 1; accuracy, t(14) = −0.34, p = 1. Although accuracy in the Baseline condition was at ceiling relative to the Attend Faces and Attend Objects conditions (Baseline vs. Attend Faces: t(14) = 2.94, p < .03; Baseline vs. Attend Objects: t(14) = 3.95, p < .003), RT was also significantly higher in this condition compared with the other two (Baseline vs. Attend Faces: t(14) = 3.86, p < .006; Baseline vs. Attend Objects: t(14) = 2.93, p < .03). Hence, these differences were driven by a speed–accuracy trade-off in the Baseline condition, as demonstrated by the lack of difference between conditions when using inverse efficiency scores, F(2, 28) = 0.37, p = .69, partial η2 = 0.03. As such, there was no evidence that participants' behavioral performance varied meaningfully across conditions.
Periodic EEG Responses: Frequency Domain
There were large peaks at the frequencies of base stimulation (6 Hz) and face presentation (1.2 Hz), as well as at the harmonics (i.e., integer multiples) of these frequencies (see Figure 3A). To avoid task- or channel-related biases, we averaged across all conditions and channels before determining the range of frequencies to consider for quantification. At both the group and individual participant level, we observed significant responses at multiple harmonics of the base stimulation frequency and the face presentation frequency. General visual responses were mostly distributed over the first three harmonics (6, 12, and 18 Hz), and face categorization responses were most consistent across participants within the range of the first nine harmonics (1.2–10.8 Hz, excluding 6 Hz). Further analyses therefore concentrated on these frequencies.
Having identified the relevant frequency range, we quantified the overall face categorization response in each condition by summing responses at the first nine harmonics (excluding the fifth harmonic, 6 Hz, which is confounded with the base stimulation frequency). The magnitude of this face-selective response fluctuated across conditions. First, we analyzed the data minimizing spatial bias by considering the scalp-averaged response (128 channels). Grouped this way, the data showed a clear and significant effect of Condition, F(1.16, 16.25) = 30.73, p < .001, as face-selective responses were increased in the Attend Faces condition relative to the Baseline condition, t(14) = 4.22, p < .003 (i.e., enhancement), and decreased in the Attend Objects condition relative to the Baseline condition, t(14) = 6.01, p < .0001 (i.e., suppression). As expected, responses in the Attend Faces condition were also significantly larger than the Attend Objects condition, t(14) = 6.75, p < .0001.
Next, we used an ROI approach by focusing on channels where the face-selective response was maximal. Collapsing across conditions, the face-selective response exhibited a stable bilateral occipitotemporal topography with a clear right hemisphere preference (Figure 3B), with the exception of two participants (S07 and S15) who showed a left-lateralized face-selective response (Figure 4; Table 1). As such, we created two ROIs composed of right (P8, P10, PO8, PO10, PO12) and left (P7, P9, PO7, PO9, PO11) occipitotemporal channels. On average, the response in the right ROI was 21% stronger than the response in the left ROI (30% stronger when the two left-lateralized participants were excluded). When considered as a function of condition, visual inspection of the group level topographies revealed that participants' task modulated the relative contribution of the two hemispheres to the face-selective response. The right hemisphere dominance was most evident in the Baseline condition, both at the group level and for the majority of the individual participants (Figure 4; Table 1). This pattern is in line with previous studies that have used the same orthogonal task in this paradigm (Jacques et al., 2016; Retter & Rossion, 2016; Rossion et al., 2015; see Jonas & Rossion, 2016, for intracerebral recording evidence), as well as the general right hemispheric dominance of face processing (Kanwisher et al., 1997; Sergent et al., 1992). In contrast, actively attending to faces generated a more bilateral response profile (Figure 4).
|Participant||Average All Conditions||Baseline||Attend Faces||Attend Objects|
We extracted the face-selective response in each ROI to statistically test the differential effects of attention within each hemisphere (Figure 5A). Because of interindividual differences in response lateralization, we defined the ROI with the strongest face-selective response (averaging across all conditions) as the preferred hemisphere (the right hemisphere for 13/15 participants) and the contralateral ROI as the nonpreferred hemisphere. A repeated-measures ANOVA with Condition (Baseline, Attend Faces, Attend Objects) and Hemisphere (preferred, nonpreferred) as within-subject factors showed significant main effects of Condition, F(2, 28) = 24.36, p < .001, partial η2 = 0.63, and Hemisphere, F(1, 14) = 20.39, p < .001, partial η2 = 0.59, which were qualified by a significant Condition × Hemisphere interaction, F(2, 28) = 9.21, p < .001, partial η2 = 0.40.⁴ This interaction was due to the presence of differential attentional modulation in the two hemispheres. In the preferred hemisphere, responses in the Attend Faces and Baseline conditions did not differ, t(14) = −0.12, p = 1, but the response in both these conditions was larger than that in the Attend Objects condition (Attend Objects vs. Baseline: t(14) = 4.61, p < .002; Attend Objects vs. Attend Faces: t(14) = 3.87, p < .01). In the nonpreferred hemisphere, the Baseline, t(14) = 6.70, p < .001, and Attend Faces conditions, t(14) = 7.15, p < .001, also both evoked a larger response than the Attend Objects condition. However, here the response in the Attend Faces condition was larger than the response in the Baseline condition, t(14) = −4.07, p < .007. In other words, when attention was selectively allocated to faces, responses increased relative to baseline over the nonpreferred hemisphere, but not over the preferred hemisphere.
To further test the differential effect of attention in each hemisphere, we computed indices of attentional modulation (Figure 5B). Attentional enhancement (Attend Faces − Baseline) was stronger for the nonpreferred hemisphere than the preferred hemisphere, t(14) = −4.46, p < .001. In the nonpreferred hemisphere, face-selective responses increased by around 40% on average when participants explicitly attended to faces within the stimulation sequence. On the other hand, attentional suppression (Attend Objects − Baseline) did not differ across the two hemispheres, t(14) = −1.58, p = .146. On average, the face-selective response in both the left and right ROI decreased by around 25% when attention was directed toward a category other than faces (i.e., toward objects).
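The hemisphere assignment and the two modulation indices described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, the amplitudes are invented values chosen only so that the nonpreferred ROI mirrors the approximate 40% and 25% changes reported in the text, and expressing each index as a percentage of the Baseline response is an assumption.

```python
# Illustrative sketch only: all amplitudes below are made-up numbers,
# not data from the study. Function names are hypothetical.

CONDITIONS = ("Baseline", "Attend Faces", "Attend Objects")


def assign_hemispheres(left, right):
    """Label the ROI with the larger mean response across conditions as preferred.

    Ties are resolved in favor of the right ROI (an arbitrary choice).
    """
    mean_left = sum(left[c] for c in CONDITIONS) / len(CONDITIONS)
    mean_right = sum(right[c] for c in CONDITIONS) / len(CONDITIONS)
    if mean_right >= mean_left:
        return {"preferred": right, "nonpreferred": left}
    return {"preferred": left, "nonpreferred": right}


def modulation_indices(roi):
    """Enhancement = Attend Faces - Baseline; suppression = Attend Objects - Baseline."""
    base = roi["Baseline"]
    enhancement = roi["Attend Faces"] - base
    suppression = roi["Attend Objects"] - base
    return {
        "enhancement": enhancement,
        "suppression": suppression,
        # Assumption: percentages are expressed relative to the Baseline response.
        "enhancement_pct": 100.0 * enhancement / base,
        "suppression_pct": 100.0 * suppression / base,
    }


# Hypothetical summed amplitudes (arbitrary units) for one participant.
left_roi = {"Baseline": 1.0, "Attend Faces": 1.4, "Attend Objects": 0.75}
right_roi = {"Baseline": 2.0, "Attend Faces": 2.0, "Attend Objects": 1.5}

rois = assign_hemispheres(left_roi, right_roi)
indices = {name: modulation_indices(roi) for name, roi in rois.items()}
for name, idx in indices.items():
    print(name, round(idx["enhancement_pct"], 1), round(idx["suppression_pct"], 1))
```

With these invented values, the right ROI is labeled preferred, enhancement appears only in the nonpreferred ROI, and suppression is proportionally the same in both ROIs, which is the qualitative pattern the paired comparisons above describe.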
Responses at the base stimulation frequency and its harmonics represent visual processing common to all images, both objects and faces. In this way, this general visual response provides a baseline measure of how well stimuli were perceived across different task conditions (Figure 6). The spatial topography of the base stimulation frequency harmonics was variable, as in previous studies (Jacques et al., 2016; Rossion et al., 2015). Because we were primarily interested in determining whether the common response differed across attentional task conditions, we considered responses summed over the first three harmonics of the base stimulation rate (6, 12, and 18 Hz). We first tested for potential common response differences in the same occipitotemporal ROIs as those used in the analysis of the face-selective response. A repeated-measures ANOVA with condition (Baseline, Attend Faces, Attend Objects) and hemisphere (preferred, nonpreferred) as within-subject factors did not yield any significant main effects of condition, F(1.37, 19.16) = 1.68, p = .21, partial η2 = 0.11, or of hemisphere, F(1, 14) = 4.1, p = .06, partial η2 = 0.23, nor were there any significant Condition × Hemisphere interactions, F(1.31, 18.33) = 0.76, p = .43, partial η2 = 0.05. However, given that in contrast to the face-selective response, the common response was located over more (right) dorsal and medial occipital channels, we also ran a second analysis within a medial occipital ROI defined specifically for the common response (PPO6, PO8, PO10, POO6, O2, POI2, Oz, Oiz). This additional analysis also gave no evidence that common response amplitudes were modulated by attentional task, F(1.37, 19.19) = 3.0, p = .088, partial η2 = 0.18. Hence, stimuli appeared to be equally well perceived in all conditions, that is, regardless of the attentional task.
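To make the idea of summing the response over harmonics concrete, the following Python sketch builds a synthetic signal and reads out the amplitude at the base-rate harmonics (6, 12, and 18 Hz) separately from the face presentation frequency (1.2 Hz). It is a simplified illustration under stated assumptions: the signal, sampling rate, epoch length, and component amplitudes are invented, the helper names are hypothetical, and no noise correction or channel pooling is attempted.

```python
import cmath
import math


def amplitude_at(signal, fs_hz, freq_hz):
    """Amplitude of one frequency via direct DFT projection.

    Assumes the epoch holds an integer number of cycles of freq_hz,
    so the frequency falls exactly on a DFT bin.
    """
    n = len(signal)
    k = freq_hz * n / fs_hz
    if abs(k - round(k)) > 1e-9:
        raise ValueError("frequency does not fall on a DFT bin for this epoch")
    k = int(round(k))
    coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
    return 2.0 * abs(coeff) / n


def summed_amplitude(signal, fs_hz, freqs_hz):
    """Sum of amplitudes over a set of harmonics (phase is discarded)."""
    return sum(amplitude_at(signal, fs_hz, f) for f in freqs_hz)


# Synthetic 5-s epoch sampled at 100 Hz (hypothetical values, arbitrary units).
FS_HZ = 100.0
N_SAMPLES = 500
signal = []
for i in range(N_SAMPLES):
    t = i / FS_HZ
    signal.append(
        2.0 * math.sin(2 * math.pi * 6.0 * t)     # base rate
        + 1.0 * math.sin(2 * math.pi * 12.0 * t)  # second harmonic
        + 0.5 * math.sin(2 * math.pi * 18.0 * t)  # third harmonic
        + 0.8 * math.sin(2 * math.pi * 1.2 * t)   # face presentation frequency
    )

common = summed_amplitude(signal, FS_HZ, (6.0, 12.0, 18.0))
face = amplitude_at(signal, FS_HZ, 1.2)
print(round(common, 3), round(face, 3))
```

Because the two response types occupy different frequency bins, the common response can be compared across task conditions without contamination from the face-selective response, and vice versa.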
Periodic EEG Responses: Time Domain
Next, we sought to examine the temporal dynamics of attentional modulation by analyzing the face-selective response in the time domain. We removed the base stimulation frequency (6 Hz and harmonics) from these data using notch filtering, such that the resulting waveforms isolate the differential responses to faces relative to objects (Retter & Rossion, 2016). Importantly, the latency of all observed components should be interpreted taking into account the sinusoidal contrast modulation. The epochs here were cropped relative to face onset, corresponding to the start of the sine cycle at 0% contrast, where the face is invisible. Taking 30% contrast as a reference point for when the faces became visible to observers, the “true” latencies of the current time domain components are shifted ∼30 msec earlier in time (i.e., a sinusoidal stimulation at 6 Hz with a 120-Hz screen refresh rate reaches 30% contrast ∼30 msec after stimulus onset; see Retter & Rossion, 2016, for comparison between sinewave and squarewave stimulation).
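The ∼30 msec latency correction can be checked with a short calculation. The sketch below assumes that contrast follows a raised cosine, c(t) = 0.5 × (1 − cos(2πft)), which starts at 0% at image onset and peaks mid-cycle; this functional form is an assumption about the stimulation, and the function names are hypothetical.

```python
import math


def contrast(t_sec, rate_hz=6.0):
    """Contrast (0..1) at time t after image onset, assuming a raised-cosine cycle."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * rate_hz * t_sec))


def time_to_contrast(level, rate_hz=6.0):
    """Earliest time (s) on the rising half-cycle at which contrast reaches `level`."""
    return math.acos(1.0 - 2.0 * level) / (2.0 * math.pi * rate_hz)


def first_frame_at_or_above(level, rate_hz=6.0, refresh_hz=120.0):
    """Index of the first screen refresh whose contrast is at or above `level`."""
    frames_per_cycle = int(round(refresh_hz / rate_hz))
    for k in range(frames_per_cycle + 1):
        if contrast(k / refresh_hz, rate_hz) >= level:
            return k
    return None


t30_ms = 1000.0 * time_to_contrast(0.30)
frame = first_frame_at_or_above(0.30)
print(round(t30_ms, 1), frame, round(1000.0 * frame / 120.0, 1))
```

Under this assumption, the continuous contrast function crosses 30% roughly 31 msec after onset, and the first 120-Hz frame at or above 30% is the fourth refresh after onset (about 33 msec), both in line with the ∼30 msec shift quoted above.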
The waveform of the face-selective response contained multiple components (Figure 7) similar to those we have described in previous studies (Jacques et al., 2016; Retter & Rossion, 2016; Rossion et al., 2015). A first positivity (“P1-face,” peaking at ∼170 msec) arose over medial and lateral occipital channels, followed by a large negativity (“N1-face,” ∼245 msec) with a bilateral distribution. This negativity was prolonged when observers explicitly attended to faces (second peak at ∼337 msec). Finally, a large positivity (“P2-face,” ∼427 msec) was apparent over bilateral occipitotemporal channels, but more ventral to the first negativity.
To compute indices of attentional enhancement and suppression, we calculated conditional difference waves (i.e., Attend Faces − Baseline; Baseline − Attend Objects; Figure 8). We observed consistent differences between the time course of attentional enhancement and suppression in both types of ROI analyses. Attentional enhancement was present during an early time window (150–200 msec), corresponding to the “P1-face” component, and between 300 and 400 msec, during the prolongation of the “N1-face” component. By contrast, attentional suppression occurred between 200 and 300 msec, during the “N1-face” component, and later at 400–500 msec, during the “P2-face” time window. Hence, these two opposing attentional modulations unfolded over largely nonoverlapping time windows in both the preferred and nonpreferred hemispheres.
One aspect of these analyses that may be surprising at first is the apparent discrepancy between the frequency domain and time domain analyses in terms of attentional enhancement in the preferred hemisphere. Specifically, where enhancement in this hemisphere was evident in the time domain difference waveforms (i.e., Attend Faces − Baseline; see Figure 8), there was no evidence of enhancement in the frequency domain quantification (see Figure 5B). This divergence is underpinned by the nature of the response quantification in each analysis. According to Fourier theory, any signal in time can be reconstructed as a sum of sinusoids at different frequencies, amplitudes, and phases. For our frequency domain analysis, responses were quantified as the sum of amplitudes, discarding the phase information in the process. In contrast, phase information is preserved in the time domain analysis. Two response waveforms can be described by the same frequencies at the same amplitudes, but their respective phases will determine the shape of their waveforms and therefore any potential differences in temporal dynamics (Figure 9). Hence, as illustrated in Figure 10, although the amplitude of each harmonic of the face presentation frequency was similar between the Baseline and Attend Faces conditions in the preferred hemisphere, there was a large (±20°) phase shift on the first three harmonics between these two conditions (see Table 2). The observed amplitude enhancement of the time domain response in the Attend Faces condition is likely driven by this phase shift. Conversely, in the nonpreferred hemisphere, there was both a phase shift and an amplitude difference between the Baseline and Attend Faces conditions, making it easier to detect a difference regardless of whether phase information is taken into account (i.e., in both the frequency and time domains).
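The argument that a pure phase shift can produce a time domain difference without any change in summed amplitude can be demonstrated numerically. In the Python sketch below, two waveforms are built from the first three harmonics of 1.2 Hz with identical amplitudes, and one has every harmonic shifted by −20°. The harmonic amplitudes are invented, the uniform −20° shift is a simplification loosely modeled on the reported values, the sign convention for phase is an assumption, and the function names are hypothetical.

```python
import cmath
import math

F_FACE_HZ = 1.2        # face presentation frequency
HARMONICS = (1, 2, 3)  # 1.2, 2.4, and 3.6 Hz
N = 600                # samples spanning exactly one 1.2-Hz cycle


def synthesize(amps, phases_deg):
    """One cycle of a waveform built from cosine harmonics of F_FACE_HZ."""
    wave = []
    for n in range(N):
        t = n / (N * F_FACE_HZ)
        wave.append(sum(
            a * math.cos(2 * math.pi * h * F_FACE_HZ * t + math.radians(p))
            for h, a, p in zip(HARMONICS, amps, phases_deg)
        ))
    return wave


def harmonic_amplitude(wave, h):
    """Amplitude of the h-th harmonic by direct DFT projection (phase discarded)."""
    coeff = sum(x * cmath.exp(-2j * math.pi * h * n / N) for n, x in enumerate(wave))
    return 2.0 * abs(coeff) / N


amps = (1.0, 0.6, 0.3)  # hypothetical amplitudes, arbitrary units
baseline = synthesize(amps, (0.0, 0.0, 0.0))
attend = synthesize(amps, (-20.0, -20.0, -20.0))

sum_baseline = sum(harmonic_amplitude(baseline, h) for h in HARMONICS)
sum_attend = sum(harmonic_amplitude(attend, h) for h in HARMONICS)
max_diff = max(abs(a - b) for a, b in zip(attend, baseline))

print(round(sum_baseline, 6), round(sum_attend, 6), round(max_diff, 3))
```

The summed amplitudes of the two synthetic waveforms agree to within numerical precision, yet their difference wave is clearly nonzero. A frequency domain measure that discards phase would therefore report no effect, while a point-by-point time domain comparison would.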
Interestingly, we note that attending to a category other than faces appears to reduce the face-selective response amplitude but does not modify its phase (i.e., the shapes of the response waveforms are comparable). This again suggests that two different processes underlie attending toward faces and attending away from faces. Overall, it appears that explicitly attending to faces does impact face-selective processing in the preferred hemisphere, albeit at a much more subtle level that is evident only when the temporal dynamics of attentional modulation are taken into account.
|Harmonic (phase shift, °)||Preferred Hemisphere: Attentional Enhancement||Preferred Hemisphere: Attentional Suppression||Nonpreferred Hemisphere: Attentional Enhancement||Nonpreferred Hemisphere: Attentional Suppression|
|1F/5 = 1.2 Hz||−20.44||−3.66||−17.53||−3.29|
|2F/5 = 2.4 Hz||−28.21||8.90||−22.37||4.37|
|3F/5 = 3.6 Hz||−19.97||−9.14||−22.55||−13.87|
|Mean phase shift||−22.87||−1.31||−20.82||−4.26|
Efficient perceptual categorization in daily life occurs in dynamic and highly complex visual environments. Yet the role of selective attention in guiding meaningful categorization has predominantly been studied under sparse and static viewing conditions. Here we asked how task-based attention modulates face categorization that is characterized by the same temporal and complexity constraints typical of effective perceptual categorization in real-world vision. We provide the first evidence that attentional enhancement and suppression exert a differential impact on face processing supported by the left and right hemispheres. Relative to an attentional baseline, actively attending to faces enhances the face-selective neural response much more evidently in the left hemisphere than in the right, whereas attending to a stimulus category other than faces suppresses the face-selective response in both hemispheres to an equal extent.
Task-based Attentional Modulation of Face Categorization
Hemispheric Differences in Attentional Sensitivity
Our results in the frequency domain indicate that categorization of highly variable face images in a rapid, dynamic visual stream is efficient and robust, unfolding in under 167 msec over predominantly right occipitotemporal regions. Although a perceptual discrimination response for faces versus objects was evident under all conditions of task-based attention, this response was nevertheless still sensitive to attentional modulation. Critically, however, this attentional effect differed between the two hemispheres. When faces and objects were equally task-irrelevant (i.e., in the orthogonal task condition), the face-selective response was ∼20–30% stronger in the right ROI compared with the left ROI. This right hemispheric dominance is consistent with previous studies that have shown task-irrelevant faces in dynamic visual streams preferentially engage the right occipitotemporal region (Jacques et al., 2016; Retter & Rossion, 2016; de Heering & Rossion, 2015; Rossion et al., 2015). Actively attending to faces in the sequence enhanced the face-selective response much more prominently in the left (nonpreferred) hemisphere (43% increase) than in the right (preferred) hemisphere (3%). In contrast, selectively attending to objects (i.e., ignoring faces) produced an attentional suppression effect that was largely comparable in the two hemispheres (∼24% reduction). Taken together, these frequency domain results suggest that selective attention exerts a differential influence on face categorization processes supported by the left and right face perception networks. These findings are in line with our prediction that face processing supported by the right occipitotemporal region—the functional core of the face perception network (Jonas & Rossion, 2016; de Heering & Rossion, 2015; Zhen et al., 2015; Rossion, Prieto, et al., 2012; Sergent & Signoret, 1992; Sergent et al., 1992; Michel, Poncet, & Signoret, 1989)—should be comparatively robust to attentional enhancement. 
To the best of our knowledge, this study provides the first empirical demonstration of a clear hemispheric difference in attentional sensitivity in face categorization.
Distinct Time Courses of Attentional Enhancement and Suppression
An interesting and unexpected aspect of our results concerns the differing time course of attentional enhancement and suppression of face-selective processing. In the time domain, attentional enhancement was reflected in an increased amplitude for the P1-face (100–200 msec) and in a prolongation of the N1-face (300–400 msec). In contrast, attentional suppression was evident in an amplitude decrease for the N1-face and P2-face (400–500 msec; see Figure 8). To our knowledge, an alternating influence of enhancement and suppression during the unfolding of the face-selective response has not been reported before. Thus far, enhancement and suppression of face-related activity have been documented only in N1 (120–220 msec) latency shifts and P1 (50–150 msec) amplitude differences (Zanto et al., 2010; Gazzaley et al., 2005, 2008). Although this intriguing finding should be interpreted conservatively until it can be replicated, it is nevertheless interesting to consider the theoretical implications of the pattern we report here. Specifically, if a particular face-selective component is consistently robust to attentional enhancement, does this suggest that this aspect of face processing is automatically engaged to its saturation point? Inversely, if some components cannot be suppressed by the focus of task-based attention, what does this say about the compulsory/automatic nature of the face processing carried by these components?
Although at first glance the presence of attentional enhancement in the right (preferred) hemisphere in the time domain might appear to conflict with the absence of such an effect in the frequency domain, results from these two analyses are in fact entirely complementary and reveal different aspects of the same response. Periodic visual presentation of face images at a given frequency elicits a periodic face-selective response in the brain at this same frequency. This face-selective response is a complex waveform composed of multiple components (see Retter & Rossion, 2016). Frequency domain analysis allows us to easily identify and quantify the overall magnitude of the face-selective responses and compare them between conditions, with the drawback of not knowing how exactly waveforms in the two conditions differ. Conversely, time domain analysis provides exactly this information, giving a detailed picture of how responses unfold, but with less power to detect differences that are spread across time. Consequently, it is entirely possible for an effect to be present in one type of analysis and absent in the other (or vice versa). In our case, the combined information from the frequency and time domains suggests that attentional enhancement is stronger overall for the left hemisphere, where it both modulates the temporal dynamics of the response as well as its amplitude, whereas it only impacts response shape in the right hemisphere.
On balance, both the frequency and time domain results suggest a differential sensitivity to attention in the left and right face perception networks, a finding that represents a critical step forward in our understanding of the role of selective attention in face categorization. Moreover, the data here point to the interesting possibility that, at least for face categorization in the right hemisphere, the global attentional effect that has been so frequently reported in the literature is predominantly underpinned by suppression, rather than enhancement.
Advantages of the Current Design
The present design has several important advantages over existing paradigms that enabled us to observe hemispheric differences in attentional sensitivity where others have not. First and most importantly, our design here employed an attentional baseline condition that directly enabled the separate quantification of attentional enhancement and suppression, an approach that remains surprisingly rare in studies of high-level object perception (Chadick & Gazzaley, 2011; Zanto et al., 2010; Gazzaley et al., 2005, 2008). Had we taken the conventional approach of simply contrasting the face-selective response elicited under maximally and minimally attended conditions (e.g., “Attend Face” vs. “Attend House”; Baldauf & Desimone, 2014; Engell & McCarthy, 2010; Sreenivasan, Goldstein, Lustig, Rivas, & Jha, 2009; Yi et al., 2006; Williams et al., 2005; Lueschow et al., 2004; Holmes et al., 2003; Downing et al., 2001; Vuilleumier et al., 2001; O'Craven et al., 1999; Wojciulik et al., 1998), we would have entirely missed this interesting finding. Moreover, the attentional baseline used here was an orthogonal task that actively constrained observers' task-based attention, potentially providing a more stable attentional baseline against which to observe enhancement and suppression effects. This is in contrast to a handful of studies that have used a “passive-viewing” attentional baseline condition, in which participants are at liberty to, deliberately or otherwise, preferentially direct their attention to one stimulus category over the other (cf. Chadick & Gazzaley, 2011; Zanto et al., 2010; Gazzaley et al., 2005, 2008). Given that faces are thought to capture attentional resources more than other stimulus categories (for a discussion, see Palermo & Rhodes, 2007), participants in these studies may well have attended preferentially to the faces presented during passive viewing.
Second, the paradigm used here to index face categorization imposes the twin constraints that characterize effective perceptual categorization in the real world, that is, speed and high categorical diversity (Crouzet & Thorpe, 2011; Crouzet et al., 2010; Rousselet et al., 2003). Our dynamic rapid display (360 images/min at 6 Hz) places considerable processing strain on the visual system, which may have helped to pull face processing “off the ceiling,” making it easier in turn to detect attentional benefits (Lavie, 2005). Importantly, this paradigm also minimizes the contribution of low-level differences to perceptual categorization, enabling us to target attentional modulation of high-level face categorization processes (Gao, Gentile, & Rossion, 2017; Rossion et al., 2015). Because observers are presented with a large number of unsegmented images that vary widely in composition, lighting, viewing angle, and so forth, the face-selective response necessarily reflects both successful discrimination of faces from the many other object types and successful generalization across multiple varied face exemplars. In this way, we are able to objectively quantify attentional modulation of high-level face categorization processes (projected to 1.2 Hz and harmonics) in isolation from more general visual processing common to both faces and objects (projected to 6 Hz and harmonics). Attentional modulations may be more readily detected in the context of such a truly face-selective response, rather than activity that is simply face-related (e.g., responses elicited by faces), as is common in studies using standard EEG or fMRI approaches.
Given how the experimental framework of the current study differs from previous studies, it is important to consider whether our main results and conclusions may be explained solely by design-related factors. One concern is that the observed patterns of hemispheric lateralization are driven by the stimulation method itself, that is, by presenting a fast stream of natural images. However, there is strong evidence against the possibility that fast periodic presentation of any image type should give rise to a right-lateralized response profile. Indeed, not only is the right lateralization of the periodic responses to faces in the baseline condition consistent with the well-documented specialization of this hemisphere for face processing (see above), but we also have evidence that such periodic responses and their lateralization inherently represent functionally selective perceptual processing. More precisely, the periodic response to faces at 1.2 Hz can only arise from the detection of face images among object images and therefore reflects selective visual processing. Other visual category contrasts measured with this approach elicit vastly different response topographies. For instance, presenting words among letter strings (Lochy, Van Belle, & Rossion, 2015) elicits a left-lateralized periodic response at the word frequency that is consistent with the recruitment of the left hemisphere for specialized language processing. In another example, presenting faces, houses, or body parts among a stream of other objects leads to distinct response topographies despite identical presentation frequencies, with only faces leading to a significant right lateralization (Jacques et al., 2016). Hence, in this study, we have good reason to assert that the periodic face-selective responses in each condition directly relate to how the specialized face network processes the face stimuli in each case.
In other words, the response lateralizations are not spurious but functionally relevant.
A separate concern is whether the hemisphere differences we observed might be an artifact of temporal attention, that is, participants attending to the periodic frequency of stimulation rather than its contents. Regarding this, we have recently demonstrated that temporal expectation does not influence the face-selective response (Quek & Rossion, 2017). However, if we consider that participants were indeed exploiting periodicity to complete the Attend Faces and Attend Objects tasks, the image presentation rate (6 Hz) would in fact be a more salient temporal cue than the embedded face presentation rate. In this case, both attentional conditions would be equally aided by periodicity, because participants would use the 6-Hz “beat” to focus attention. Alternatively, had participants been able to attend to the frequency specific to each task (e.g., 1.2 Hz for attending to faces and 6 Hz for attending to objects), there should have been similar attentional enhancement of the left hemisphere for both frequencies. In other words, we should have observed an increase of the face-selective response over the left hemisphere when participants were attending to faces and an increase of the common response when participants were attending to objects. However, there was no such response profile in the common response. Given that the ultimate goal here was to investigate task-based attention, we would argue that the current findings remain informative about how attention differentially modulates face processing across hemispheres, regardless of the exact mechanism by which attention was selectively engaged to the task-relevant stimuli.
A final question is whether directing participants' attention specifically to face gender could have somehow driven the pronounced attentional enhancement effect over the left hemisphere. Yet such a task-specific effect would seem unlikely given that the existing literature does not suggest face gender itself is processed in the left hemisphere (Wiese, Schweinberger, & Neumann, 2008; Sergent et al., 1992; Sergent & Corballis, 1989). Note that although some reports suggest an interaction between response lateralization and participant gender (e.g., Lovén, Svärd, Ebner, Herlitz, & Fischer, 2013), our pattern of results was consistent across individual participants regardless of their gender (see Figure 4). Still another possibility might be that the gender task we employ depends on local feature processing (Yamaguchi, Hirukawa, & Kanazawa, 2013; Dupuis-Roy, Fortin, Fiset, & Gosselin, 2009; Brown & Perrett, 1993) and that this local processing drives the left hemisphere recruitment in the Attend Faces condition (Bourne, Vladeanu, & Hole, 2009; Hillger & Koenig, 1991; Parkin & Williamson, 1987). However, several important factors undermine this argument. First and most importantly, our face stimuli were grayscale and highly variable in their lighting, pose, size, position, and external facial features (e.g., ears, hair, accessories, …). In the absence of color cues or systematic overlap of features, local details would not be efficient diagnostic cues for gender. For example, participants cannot reliably monitor the mouth of each face because the position of the mouth changes across each face presentation. Moreover, the short image presentation duration (SOA = 167 msec) prevented participants from making multiple saccades across the faces to inspect individual features. As such, regardless of exactly how participants completed the face gender discrimination task (which they did with high accuracy), it is highly unlikely they relied on local processing to do so. 
In summary, we believe that the current findings are neither artifacts of the experimental design nor the specific task used but rather reflect functionally relevant hemispheric differences in face categorization.
Conclusion and Future Research
Selective attention guides behavior in dynamic and complex visual environments, yet its role in face categorization has not yet been examined under conditions that enforce the strong processing constraints that characterize effective perceptual categorization in the real world. Using an original dynamic visual stimulation approach, we uncovered the hitherto unknown finding that selective attention influences face categorization in the left and right face perception networks differently. Where the right hemisphere is mandatorily activated by faces and benefits little from the allocation of attention, the left hemisphere appears to be flexibly recruited to serve current task demands. In addition, we show that attentional enhancement and suppression occur over distinct time windows during the face-selective response. An outstanding question is whether this pattern of differential attentional sensitivity across hemispheres extends to other high-level object categories—for instance, the opposite pattern (i.e., greater attentional enhancement of right hemisphere responses compared with left) might be predicted for word stimuli, the processing of which is left-lateralized. Similarly, for a visual category with bilateral responses, such as objects, we might expect the attentional enhancement effect to be equally distributed across hemispheres.
The authors thank Talia Retter for assistance with data acquisition and three anonymous reviewers for their helpful comments on a previous version of this article. This work was supported by a cofunded initiative by the University of Louvain and the Marie Curie Actions of the European Commission awarded to G. L. Q. (grant F211800012), an FRSM-FNRS grant awarded to D. N. (grant 3.4601.12), a European Research Council grant awarded to B. R. (grant facessvep 284025), and an FSR-FNRS postdoctoral grant awarded to J. L. S. (grant FC 91608).
Reprint requests should be sent to Genevieve Quek, Psychological Sciences Research Institute and Institute of Neuroscience, University of Louvain, 10 Place du Cardinal Mercier, Louvain-la-Neuve, 1348, Belgium, or via e-mail: email@example.com.
Note that we are not the first to employ a frequency-tagging approach to the study of selective attention; however, this work has focused almost exclusively on low-level stimuli and simple features (Norcia, Appelbaum, Ales, Cottereau, & Rossion, 2015; Wang, Clementz, & Keil, 2007; Müller et al., 2006; Keil, Moratti, Sabatinelli, Bradley, & Lang, 2005; Chen, Seth, Gally, & Edelman, 2003; Müller & Hübner, 2002; Müller et al., 1998; Morgan, Hansen, & Hillyard, 1996).
Note that the same analysis comparing the left and right hemispheres directly yielded identical results: A repeated-measures ANOVA with Condition (Baseline, Attend Faces, Attend Objects) and Hemisphere (right, left) as within-subject factors showed significant main effects of Condition, F(2, 28) = 24.36, p < .001, partial η2 = 0.63, and Hemisphere, F(1, 14) = 8.82, p < .01, partial η2 = 0.39, as well as a significant Condition × Hemisphere interaction, F(2, 28) = 7.83, p < .002, partial η2 = 0.36. Within the left hemisphere, planned pairwise comparisons indicate that all conditions differed significantly (ps = .001–.005), with the largest responses occurring in the Attend Faces condition. However, in the right hemisphere, response amplitudes were similar in the Baseline and Attend Faces conditions, t(14) = 0.25, p = 1, and both were larger than in the Attend Objects condition (ps = .001–.006).
Shared first authorship.