Perceptual expectations can change how a visual stimulus is perceived. Recent studies have shown mixed results in terms of whether expectations modulate sensory representations. Here, we used a statistical learning paradigm to study the temporal characteristics of perceptual expectations. We presented participants with pairs of object images organized in a predictive manner and then recorded their brain activity with magnetoencephalography while they viewed expected and unexpected image pairs on the subsequent day. We observed stronger alpha-band (7–14 Hz) activity in response to unexpected compared with expected object images. Specifically, the alpha-band modulation occurred as early as the onset of the stimuli and was most pronounced in left occipito-temporal cortex. Given that the differential response to expected versus unexpected stimuli occurred in sensory regions early in time, our results suggest that expectations modulate perceptual decision-making by changing the sensory response elicited by the stimuli.
Perception can be understood as a process of probabilistic inference, in which the brain actively generates predictions and compares them with the sensory input it receives (de Lange, Heilbron, & Kok, 2018; Friston, 2005). When sensory input is inconsistent with the current predictions, the system generates a prediction error signal, which is fed forward to update predictions. When sensory input is consistent with current predictions, prediction errors are minimal, resulting in an attenuated sensory response (Summerfield & de Lange, 2014; Alink, Schwiedrzik, Kohler, Singer, & Muckli, 2010; Murray, Kersten, Olshausen, Schrater, & Woods, 2002).
On the implementation level of predictive perception, it has been proposed that low- and high-frequency oscillations may carry feedback predictions and feedforward prediction errors, respectively (Michalareas et al., 2016; Bastos et al., 2012). This proposal is supported by neurophysiological evidence showing that oscillations in the alpha (7–14 Hz) and gamma (40–90 Hz) range propagate in a feedback and feedforward fashion, respectively (Bonaiuto et al., 2018; van Kerkoerle et al., 2014). In addition, many studies have shown that these oscillatory signals correlate with behavior at different stages of perceptual decision-making. Oscillatory alpha-band activity has been shown to index the allocation of attentional resources (Haegens, Händel, & Jensen, 2011; Thut, Nietzel, Brandt, & Pascual-Leone, 2006; Worden, Foxe, Wang, & Simpson, 2000), the employment of sensory expectations (Mayer, Schwiedrzik, Wibral, Singer, & Melloni, 2016; Sherman, Kanai, Seth, & VanRullen, 2016), and other top–down cognitive control functions (for a review, see Sadaghiani & Kleinschmidt, 2016). Generally speaking, alpha power decreases with task engagement and increases when an area is disengaged (Jensen, Bonnefond, & VanRullen, 2012; Haegens, Nácher, Luna, Romo, & Jensen, 2011). Gamma-band activity, on the other hand, has been shown to increase when sensory expectations are violated (Bauer, Stenner, Friston, & Dolan, 2014; Arnal, Wyart, & Giraud, 2011), consistent with the idea that prediction errors may be fed forward to update the system's predictions.
Beyond the question of how predictive perception is implemented, recent studies have also asked how predictions (or perceptual expectations) are used by the brain. Some researchers have suggested that expectations facilitate perception by enhancing the sensory representation per se: When a stimulus is expected, its sensory representation is more precise (Kok, Jehee, & de Lange, 2012), and participants' sensitivity to the signal is enhanced compared with an unexpected stimulus (Cheadle, Egner, Wyart, Wu, & Summerfield, 2015; Wyart, Nobre, & Summerfield, 2012). Others, however, have suggested that expectations facilitate perception by shifting the decision criterion without changing the sensory representation. For example, Bang and Rahnev (2017) compared conditions in which expectation cues were provided either before or after stimulus presentation and found no difference in sensitivity (d′) between these conditions. Another recent study used EEG to track neural markers of sensory processing and found no expectation modulations of sensory processing, although expectations significantly modulated behavioral RTs (Rungratsameetaweemana, Itthipuripat, Salazar, & Serences, 2018). Taken together, it is still debated which stage of the perceptual decision-making process is modulated by prior expectations.
Statistical learning is a commonly used paradigm to study predictive perception in both human (Richter & de Lange, 2019; Manahova, Mostert, Kok, Schoffelen, & de Lange, 2018; Richter, Ekman, & de Lange, 2018; Turk-Browne, Scholl, Johnson, & Chun, 2010) and nonhuman (Ramachandran, Meyer, & Olson, 2017; Meyer, Ramachandran, & Olson, 2014; Meyer & Olson, 2011) primates. Instead of relying on explicit expectation cues and instructions to expect upcoming stimuli, statistical learning paradigms allow expectations to be learned and employed in a more automatic and implicit manner (Batterink, Reber, Neville, & Paller, 2015; Kim, Seitz, Feenstra, & Shams, 2009). Modulatory effects of expectations based on learned statistical regularities are therefore less likely to be confounded by top–down goal-directed attention, making statistical learning a useful paradigm for studying predictive perception. Studies using this paradigm have consistently shown that participants respond faster and more accurately when objects are presented in an expected order (i.e., following the same arrangement as during prior exposure) than when they are presented in an unexpected order. Moreover, the neural response in object-selective cortex is typically stronger for unexpected than for expected object stimuli (Richter & de Lange, 2019; Kaposvari, Kumar, & Vogels, 2018; Manahova et al., 2018; Richter et al., 2018; Ramachandran et al., 2017; Meyer & Olson, 2011). This stronger response to unexpected stimuli is consistent with electrophysiological studies showing that unexpected stimuli enhance the amplitude of ERP components such as the N170 (Robinson, Breakspear, Young, & Johnston, 2018; Johnston et al., 2017) and the visual mismatch negativity (see Stefanics, Kremláček, & Czigler, 2014, for a review).
The current study aimed to address whether expectations modulate sensory processing. We used a statistical learning paradigm to characterize the temporal profile of the brain's response to expected and unexpected object images. More specifically, we characterized the strength of (phase-locked) ERFs and (non-phase-locked) neural oscillations to explore their modulation in predictive perception. To preview, although we observed no differences in ERFs between the different expectation conditions, we found stronger low-frequency oscillatory activity over occipito-temporal cortex in response to an image when it was unexpected than when it was expected. The temporal and spatial profiles of the observed activity differences suggest that expectations derived from statistical regularities modulate perception at an early sensory processing stage.
All data and code used for stimulus presentation and analysis are available online at the Donders Repository at hdl.handle.net/11633/aacvnzlx.
Our target sample size was a priori set to 34, providing us with 80% power to detect two-sided experimental effects that had at least a medium effect size (Cohen's d > 0.5). Power analysis was conducted with G*Power (Faul, Erdfelder, Lang, & Buchner, 2007). Thirty-nine healthy adult participants were recruited online via the SONA system. All participants reported normal (or corrected-to-normal) vision. Five participants were excluded from analysis because of technical errors during data recording or dropout after the first session, resulting in the planned sample size of 34 participants (21 women; mean age = 23.4 years, SD = 3.1 years) in the reported analysis. The study was approved by the local ethics committee (CMO Arnhem-Nijmegen). All participants gave informed consent before the experiment and received monetary compensation for their participation.
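The sample-size calculation above can be approximated outside G*Power. A minimal sketch, assuming that the planned tests were two-sided paired (within-participant) t tests with α = .05, so that Cohen's d refers to the standardized mean of the paired differences (d_z); the function names are illustrative only.

```python
import numpy as np
from scipy import stats


def paired_t_power(n, d, alpha=0.05):
    """Two-sided power of a paired t test with n participants and effect size d (d_z)."""
    df = n - 1
    ncp = d * np.sqrt(n)                      # noncentrality parameter under H1
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value under H0
    # Probability that the noncentral t statistic lands in either rejection region
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)


def required_n(d, target_power=0.80, alpha=0.05, n_max=1000):
    """Smallest sample size whose power reaches target_power."""
    for n in range(2, n_max + 1):
        if paired_t_power(n, d, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within n_max participants")


print(required_n(0.5))  # → 34
```

Under these assumptions, 33 participants fall just short of 80% power, and 34 is the first sample size that reaches it.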
A set of 80 object images of electronic and nonelectronic items was chosen from an image database (cvcl.mit.edu/MM/uniqueObjects.html; Brady, Konkle, Alvarez, & Oliva, 2008) as stimuli. For each participant, we randomly selected nine images from the set as leading images and six images as trailing images (three of electronic items and three of nonelectronic items). Color images were presented on a gray (RGB = [128, 128, 128]) background, spanning approximately 5° × 5° of visual angle (dva) on the screen. A bull's eye (outer black ring = 0.5° × 0.5°, innermost black dot = 0.25° × 0.25°) served as the fixation point and was presented throughout each run. The stimuli were displayed on an LCD screen during behavioral training and on a semitranslucent screen (1920 × 1080 pixel resolution, 120-Hz refresh rate) back-projected by a PROpixx projector (VPixx Technologies) during magnetoencephalography (MEG) recordings. The experiment was programmed with Psychtoolbox (Brainard, 1997) in MATLAB (The MathWorks, Inc.).
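The conversion between visual angle and on-screen size follows from simple trigonometry: an object of angular size θ viewed at distance D spans 2D·tan(θ/2). The sketch below turns this into pixels. The viewing distance and the physical width of the projected image are hypothetical values, because they are not reported here; 57 cm is a conventional choice at which 1 cm on the screen subtends roughly 1°.

```python
import math


def deg_to_px(size_deg, distance_cm, screen_width_cm, screen_width_px):
    """Convert a stimulus size in degrees of visual angle to screen pixels."""
    size_cm = 2 * distance_cm * math.tan(math.radians(size_deg) / 2)  # physical extent on screen
    return size_cm * screen_width_px / screen_width_cm                # cm -> pixels


# Hypothetical geometry: 57 cm viewing distance, 48-cm-wide image spanning 1920 pixels
print(round(deg_to_px(5.0, 57.0, 48.0, 1920)))   # → 199 (object image)
print(round(deg_to_px(0.5, 57.0, 48.0, 1920)))   # → 20 (outer fixation ring)
```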
Participants came to the laboratory on 2 consecutive days: for a training session on Day 1 and an MEG recording session on Day 2. The training session on Day 1 served to familiarize participants with the task and the predictive relationships between the stimuli. During training, participants completed eight blocks of 92 trials of the main task. On Day 2, participants performed six blocks of the main task and six blocks of the functional localizer task while their brain activity was recorded with MEG. After the MEG recording, they completed a 20-min behavioral test consisting of the categorization task and the questionnaire. Depending on whether participants' T1-weighted anatomical scans were available in our institute's database, some participants returned for a third session, during which we obtained their T1-weighted anatomical scans.
Participants were presented with two object images in quick succession. Each image was presented for 500 msec with a 300-msec ISI in between, and trials were separated by an intertrial interval of 900–1000 msec (Figure 1A shows a schematic of an example trial). Fifteen images were shown in different image pairs throughout the experiment, of which nine always appeared first within a pair (“leading images”) and the remaining six always appeared second (“trailing images”). The participants' task was to press a key as fast as possible when they saw an image presented upside down, which occurred randomly in 88 (training session) and 90 (MEG session) trials, with equal probability for the leading and the trailing image.
The predictive relationships among the 15 images differed subtly between the training and MEG sessions. During training, six of the nine leading images were always followed by the same trailing image (i.e., 100% predictive; expected pairs), and the other three were followed equally often by each of the six trailing images (i.e., 16.7% predictive; neutral pairs). Each expected pair was presented 72 times, and each neutral pair 12 times, resulting in 648 nonoddball trials during training. Before seeing any images, participants were told that the image sequence contained a predictive structure, but not what the exact pairwise relationships between images were.
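The training-session counts can be checked by constructing the trial list explicitly. A minimal sketch: the image labels and the assignment of trailing images to the expected-pair leading images are hypothetical (in the study, images were drawn at random for each participant), and oddball trials are omitted.

```python
import random
from collections import Counter


def training_trials(leading, trailing, seed=0):
    """Nonoddball training trials as (leading, trailing, condition) tuples.

    leading: 9 labels (the first 6 form expected pairs, the last 3 are neutral)
    trailing: 6 labels
    """
    assert len(leading) == 9 and len(trailing) == 6
    trials = []
    # Expected pairs: each of the first six leading images has one fixed partner (100% predictive)
    for lead, trail in zip(leading[:6], trailing):
        trials += [(lead, trail, "expected")] * 72
    # Neutral pairs: each remaining leading image is followed by every trailing image equally often
    for lead in leading[6:]:
        for trail in trailing:
            trials += [(lead, trail, "neutral")] * 12
    random.Random(seed).shuffle(trials)
    return trials


trials = training_trials([f"L{i}" for i in range(9)], [f"T{j}" for j in range(6)])
print(len(trials))                              # → 648
print(Counter(cond for _, _, cond in trials))   # 432 expected, 216 neutral
```

In this design, every leading image occurs 72 times in both conditions (72 × 1 for expected pairs, 12 × 6 for neutral pairs), so the frequency of a leading image does not by itself reveal its condition.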
During the MEG session, unexpected trials were introduced by changing the probabilistic relationship between the leading and trailing images (Figure 1D): Each leading image of the expected pairs was followed in 58.3% of the trials by the trailing image paired with it during training and in 41.7% of the trials by one of the five other trailing images (8.3% for each). In total, 432 nonoddball trials were presented during the MEG session. The predictive relationships of the neutral pairs remained unchanged. The neutral pairs served as a baseline, enabling us to ask whether the neural response was suppressed when a stimulus was expected and/or enhanced when a stimulus was unexpected. At the beginning of the MEG session, participants were informed that there would be a small change in how images were paired, but the exact predictive structure was not explained.
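The percentages above correspond to simple fractions (58.3% ≈ 7/12, 8.3% ≈ 1/12, 16.7% ≈ 1/6). Under the additional assumption, not stated in the text, that the 432 nonoddball trials were spread evenly over the nine leading images (48 presentations each), the sketch below derives the implied number of trials per condition.

```python
from fractions import Fraction

N_PER_LEADING = 432 // 9   # assumption: equal presentations per leading image (48)


def meg_condition_counts(n_per_leading=N_PER_LEADING):
    """Implied nonoddball trial counts per condition in the MEG session."""
    p_paired = Fraction(7, 12)   # trained partner of an expected-pair leading image
    p_other = Fraction(1, 12)    # each of the five other trailing images
    p_neutral = Fraction(1, 6)   # each trailing image after a neutral leading image
    assert p_paired + 5 * p_other == 1 and 6 * p_neutral == 1
    counts = {
        "expected": 6 * n_per_leading * p_paired,
        "unexpected": 6 * 5 * n_per_leading * p_other,
        "neutral": 3 * 6 * n_per_leading * p_neutral,
    }
    assert all(c.denominator == 1 for c in counts.values()), "counts must be whole trials"
    return {name: int(c) for name, c in counts.items()}


print(meg_condition_counts())  # → {'expected': 168, 'unexpected': 120, 'neutral': 144}
```

That all three counts come out as whole numbers is consistent with, though not proof of, the equal-presentation assumption.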
To identify the sensors most responsive to the experimental stimuli, independent of perceptual expectations, functional localizer trials were included in the MEG session. Only the six trailing images were used as stimuli in localizer trials. Each image was presented 110 times (10 of which were upside down) in a pseudorandom order. Each trial started with a 500-msec fixation period followed by a 500-msec image presentation, and trials were separated by an intertrial interval of 900–1100 msec (Figure 1B). Participants were asked to respond as fast as possible (while maintaining fixation) when an image was presented upside down.
To assess participants' knowledge of the predictive relationships between images, we asked them to perform a categorization task after the MEG recording. We reasoned that, if participants had learned the relationships between images and could use this knowledge to guide behavior, they would respond faster and more accurately on expected trials. Participants were instructed to report as fast as possible on every trial whether the trailing image depicted an electronic item or not. The trial structure and stimuli were the same as in the MEG main task, except that the number of presentations of each image pair was halved to reduce participants' fatigue. A response window of 1.2 sec was used to encourage participants to prioritize response speed.
After the categorization task, participants were tested on their explicit awareness of the predictive relationships between images. Their task was to indicate on a 4-point scale how frequently they had seen a given image pair in the MEG main task (1 = most frequent, 4 = least frequent). They were shown 24 image pairs: the six expected pairs, six randomly selected neutral pairs, six randomly selected unexpected pairs, and six “swapped” pairs (i.e., image pairs in which the presentation order of the leading and trailing images was reversed).
Whole-head MEG data were acquired at 1200 Hz with a 275-channel CTF MEG system with axial gradiometers (CTF MEG Systems, VSM MedTech Ltd.) in a magnetically shielded room. Three fiducial coils were placed at the participant's nasion and both ear canals to provide online monitoring of head position and offline anatomical landmarks for coregistration. Eye position was recorded using an eye tracker (EyeLink, SR Research Ltd.) during the MEG recordings.
Anatomical MRIs were obtained during a third session or retrieved from the center's database if available. To improve coregistration of the MRIs and MEG data, earplugs with a drop of vitamin E were placed in participants' ear canals during MRI acquisition. These anatomical scans were used for source reconstruction of the MEG signals. Note that the source analysis reported here was based on 33 participants, as the anatomical MRI of one participant was not available because of dropout after the MEG session.
MEG data were preprocessed offline and analyzed using the FieldTrip toolbox (Oostenveld, Fries, Maris, & Schoffelen, 2011) and custom-built MATLAB scripts. Trials of the main task and the localizer were segmented and processed separately, given their different trial lengths. After a notch filter was applied to remove line noise and its harmonics (50, 100, and 150 Hz), the data were down-sampled to 400 Hz. Bad channels and trials were rejected via visual inspection before independent component analysis (ICA). The resulting components were visually inspected, and those representing eye and heart artifacts were projected out of the data. From the cleaned data, outlier trials with extreme variance and trials in which participants blinked during image presentation were further removed.
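The filtering, down-sampling, and variance-based trial rejection steps can be sketched with SciPy. This sketch substitutes zero-phase IIR notch filters and `scipy.signal.decimate` for FieldTrip's own routines; the quality factor and the variance threshold (z > 3) are hypothetical choices, not parameters used in the study. Down-sampling from 1200 to 400 Hz is an integer factor of 3, which allows simple decimation after anti-alias filtering.

```python
import numpy as np
from scipy import signal


def remove_line_noise(data, fs, freqs=(50.0, 100.0, 150.0), quality=30.0):
    """Zero-phase IIR notch at the line frequency and harmonics (time on the last axis)."""
    cleaned = np.asarray(data, dtype=float)
    for f0 in freqs:
        b, a = signal.iirnotch(f0, quality, fs=fs)
        cleaned = signal.filtfilt(b, a, cleaned, axis=-1)
    return cleaned


def downsample(data, fs_in=1200, fs_out=400):
    """Low-pass (anti-aliasing) filter and decimate by an integer factor."""
    factor, remainder = divmod(fs_in, fs_out)
    if remainder:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    return signal.decimate(data, factor, axis=-1, zero_phase=True)


def reject_high_variance(trials, z_thresh=3.0):
    """Drop trials (trials x channels x time) whose overall variance is an outlier."""
    var = trials.reshape(len(trials), -1).var(axis=1)
    z = (var - var.mean()) / var.std()
    return trials[z < z_thresh]


# Simulated 10-sec recording: 10-Hz signal plus 50- and 100-Hz line noise
fs = 1200
t = np.arange(10 * fs) / fs
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t) + 0.3 * np.sin(2 * np.pi * 100 * t)
y = downsample(remove_line_noise(x, fs))        # 400-Hz data

rng = np.random.default_rng(0)
trials = rng.normal(size=(50, 4, 100))
trials[7] *= 10                                  # one artifact-laden trial
kept = reject_high_variance(trials)
print(y.shape, kept.shape)                       # → (4000,) (49, 4, 100)
```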
Before calculating the ERFs, single-trial data were baseline-corrected using an interval of [−0.5, 0] sec for the main task trials and [−0.2, 0] sec for the localizer trials. Because combining the planar gradients is a nonlinear (squaring) operation, residual noise does not average out and inflates amplitudes more when fewer trials are averaged; to avoid this confound, trial numbers were equated across conditions via subsampling when computing ERFs. Specifically, before averaging, we randomly subsampled from each condition as many trials as were available in the condition with the fewest trials. Planar gradients of the MEG field distribution were then calculated, which makes the sensor-level data easier to interpret and enables comparing and averaging ERF topographies across participants. We repeated this procedure 10 times per condition, to ensure that every trial was used at least once, and then averaged over all corresponding planar-combined averages to obtain the ERF per condition.
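The rationale for equating trial numbers can be illustrated with simulated noise. A minimal sketch, assuming that horizontal and vertical planar-gradient components have already been computed for every trial (the array layout and function names are hypothetical; FieldTrip derives these components from the axial gradiometers using neighboring sensors). Because the planar transform is linear, it commutes with averaging, whereas combining the two components (a root sum of squares) does not, so the combination is applied after averaging.

```python
import numpy as np


def combine_planar(horizontal, vertical):
    """Combined planar gradient: root of the summed squares of both components."""
    return np.sqrt(horizontal ** 2 + vertical ** 2)


def balanced_erfs(trials_by_condition, n_repeats=10, seed=0):
    """Trial-number-matched, planar-combined ERFs per condition.

    trials_by_condition maps a condition name to an array of shape
    (n_trials, 2, n_channels, n_times); axis 1 holds the two planar components.
    """
    rng = np.random.default_rng(seed)
    n_min = min(len(trials) for trials in trials_by_condition.values())
    erfs = {}
    for name, trials in trials_by_condition.items():
        repeats = []
        for _ in range(n_repeats):
            idx = rng.choice(len(trials), size=n_min, replace=False)  # subsample without replacement
            avg = trials[idx].mean(axis=0)                              # average first ...
            repeats.append(combine_planar(avg[0], avg[1]))              # ... then combine
        erfs[name] = np.mean(repeats, axis=0)
    return erfs


# Pure noise: without matching, the condition with fewer trials looks "stronger"
rng = np.random.default_rng(3)
few, many = rng.normal(size=(50, 2, 20, 50)), rng.normal(size=(200, 2, 20, 50))
naive = {"few": combine_planar(*few.mean(axis=0)).mean(),
         "many": combine_planar(*many.mean(axis=0)).mean()}
matched = {name: erf.mean() for name, erf in balanced_erfs({"few": few, "many": many}).items()}
print(naive)     # "few" is about twice "many"
print(matched)   # roughly equal
```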
Time–frequency representations (TFRs) of each trial were calculated by applying a fast Fourier transform to short sliding time windows. For low frequencies (4–30 Hz), a Hanning-tapered 300-msec sliding time window was used in time steps of 50 msec and frequency steps of 1 Hz. High-frequency (30–120 Hz) power was estimated using a discrete prolate spheroidal sequences (DPSS) multitaper approach with a 200-msec sliding time window in time steps of 50 msec and frequency steps of 2 Hz, with ±10-Hz smoothing (obtained by using three tapers). Single-trial TFRs were then averaged per condition. Power in the resulting average TFRs was expressed relative to a baseline, defined as [−0.6, −0.15] and [−0.4, −0.1] sec relative to stimulus onset for low and high frequencies, respectively. These baseline windows end half a window length (150 and 100 msec, respectively) before stimulus onset, so that no sliding window centered within the baseline extends past stimulus onset and poststimulus activity cannot leak into the baseline estimate.
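The two spectral estimates described above can be sketched for a single channel with NumPy and SciPy. This sketch assumes direct evaluation of the tapered Fourier sum at each requested frequency (FieldTrip's implementation differs in detail) and a relative baseline, that is, power divided by mean baseline power; the exact baseline normalization is not specified in the text. The taper count follows from the window length T and smoothing W as K = 2TW − 1 = 2 × 0.2 × 10 − 1 = 3.

```python
import numpy as np
from scipy.signal import windows


def sliding_tfr(x, fs, freqs, tapers, step_sec):
    """Taper-averaged power of a 1-D signal in sliding windows.

    tapers: array (n_tapers, n_window). Returns power (freqs x windows) and
    window-center times in seconds from the first sample.
    """
    n_win = tapers.shape[1]
    step = int(round(step_sec * fs))
    starts = np.arange(0, len(x) - n_win + 1, step)
    kernels = np.exp(-2j * np.pi * np.outer(freqs, np.arange(n_win) / fs))  # freqs x window
    power = np.empty((len(freqs), len(starts)))
    for j, s in enumerate(starts):
        spectra = (tapers * x[s:s + n_win]) @ kernels.T                    # tapers x freqs
        power[:, j] = np.mean(np.abs(spectra) ** 2, axis=0)
    return power, (starts + (n_win - 1) / 2) / fs


def hanning_tfr(x, fs, freqs, win_sec=0.3, step_sec=0.05):
    """Low frequencies: single Hann taper, 300-msec window."""
    n_win = int(round(win_sec * fs))
    return sliding_tfr(x, fs, freqs, windows.hann(n_win)[np.newaxis, :], step_sec)


def multitaper_tfr(x, fs, freqs, win_sec=0.2, smooth_hz=10.0, step_sec=0.05):
    """High frequencies: DPSS tapers, 200-msec window, +/- smooth_hz smoothing."""
    n_win = int(round(win_sec * fs))
    nw = win_sec * smooth_hz                        # time-half-bandwidth product (2 here)
    n_tapers = int(round(2 * nw - 1))               # 3 tapers for 200 msec and +/-10 Hz
    return sliding_tfr(x, fs, freqs, windows.dpss(n_win, nw, n_tapers), step_sec)


def relative_to_baseline(power, times, t_start, t_end):
    """Divide each frequency by its mean power over windows centered in the baseline."""
    mask = (times >= t_start) & (times <= t_end)
    return power / power[:, mask].mean(axis=1, keepdims=True)


# Simulated trial from -1 to 2 sec (400 Hz) with 10- and 60-Hz bursts at 0.8-1.2 sec
fs = 400
t = np.arange(1200) / fs - 1.0
rng = np.random.default_rng(0)
burst = (t >= 0.8) & (t < 1.2)
x = 0.01 * rng.normal(size=t.size) + burst * (np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 60 * t))

low_f = np.arange(4, 31)                            # 1-Hz steps
low_pow, centers = hanning_tfr(x, fs, low_f)
low_times = centers - 1.0                           # stimulus-locked window centers
low_rel = relative_to_baseline(low_pow, low_times, -0.6, -0.15)

high_f = np.arange(30, 121, 2)                      # 2-Hz steps
high_pow, centers_hi = multitaper_tfr(x, fs, high_f)
high_times = centers_hi - 1.0
high_rel = relative_to_baseline(high_pow, high_times, -0.4, -0.1)
```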
Source Reconstruction of Frequency-Domain Data
The partial canonical coherence beamformer approach (Gross et al., 2001) was used to localize the sources of the observed sensor-level TFR differences between the expected and unexpected conditions. Volume conduction models were constructed based on a single-shell model of each participant's anatomical MRI (Nolte, 2003). These models were then used to construct participant-specific search grids (6-mm resolution), which were subsequently normalized to a template in Montreal Neurological Institute space. For each grid point, lead fields were computed with a reduced rank, which removes the sensitivity to the direction perpendicular to the surface of the volume conduction model. We extracted 400-msec data segments ([0.8, 1.2] sec relative to leading-image onset, i.e., the first 400 msec after trailing-image onset) from trials of both conditions and computed cross-spectral density matrices using the multitaper method centered at 10 (±6) Hz. From the cross-spectral density matrices and the lead fields, a common spatial filter (i.e., common to both conditions) was constructed for each grid point and participant. Using this common spatial filter, the spatial distribution of power was then estimated separately for the expected and unexpected conditions. To visualize the difference between conditions at the source level, a t statistic was computed for each grid point as a proxy for that source's contribution to the difference.
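The central idea of the source analysis, a common spatial filter applied to condition-specific cross-spectral densities, can be sketched with NumPy. This sketch uses a standard linearly constrained (DICS-style) frequency-domain beamformer as a stand-in for the partial canonical coherence variant named above; the regularization parameter, the toy lead field, and all function names are assumptions. The rank reduction drops the orientation to which the sensors are least sensitive (for MEG, approximately the radial one). Because the weights have unit gain and are shared by both conditions, a power difference at a grid point reflects a difference in the data rather than in the filter.

```python
import numpy as np
from scipy import stats


def reduce_rank(leadfield, rank=2):
    """Project an (n_channels x 3) lead field onto its `rank` dominant orientations."""
    _, _, vt = np.linalg.svd(leadfield, full_matrices=False)
    return leadfield @ vt[:rank].T


def common_filter(leadfield, csd_common, reg=0.05):
    """Unit-gain beamformer weights (rank x n_channels), i.e., W @ L = I."""
    n_ch = csd_common.shape[0]
    lam = reg * np.real(np.trace(csd_common)) / n_ch       # hypothetical diagonal loading
    c_inv = np.linalg.inv(csd_common + lam * np.eye(n_ch))
    lh = leadfield.conj().T
    return np.linalg.solve(lh @ c_inv @ leadfield, lh @ c_inv)


def source_power(weights, csd):
    """Power passed by the filter for one condition's cross-spectral density."""
    return float(np.real(np.trace(weights @ csd @ weights.conj().T)))


def grid_t(power_unexpected, power_expected):
    """Paired t statistic per grid point; inputs are participants x grid points."""
    return stats.ttest_rel(power_unexpected, power_expected, axis=0).statistic


# Toy example: one source whose power is higher in the "unexpected" condition
rng = np.random.default_rng(0)
L = reduce_rank(rng.normal(size=(30, 3)))
q = np.array([1.0, 0.5])                          # source orientation in the reduced space
topo = L @ q
csd = {s: s * np.outer(topo, topo) + np.eye(30) for s in (2.0, 3.0)}
W = common_filter(L, (csd[2.0] + csd[3.0]) / 2)   # one filter for both conditions
print(source_power(W, csd[3.0]) - source_power(W, csd[2.0]))   # ≈ (3 - 2) * |q|^2 = 1.25
```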
Behavioral results of the post-MEG tasks allowed us to evaluate whether participants had learned and used (implicitly or explicitly) the transitional relations between images. Post-MEG behavioral data reported here are based on 33 participants, as data from one participant were not recorded because of a hardware error during testing. For the categorization task, mean RTs and accuracy of each condition were computed per participant, and a within-participants repeated-measures ANOVA was applied to each measure separately. For the questionnaire, the median rating of each condition was computed per participant and entered into a repeated-measures ANOVA.
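The Greenhouse–Geisser-corrected repeated-measures ANOVAs can be sketched in a few lines. This is a generic one-way implementation written for illustration, not the software used for the reported analyses. The correction factor ε is estimated from the double-centered covariance matrix of the conditions and scales both degrees of freedom; with three conditions and 33 participants, for example, ε ≈ .77 turns the nominal F(2, 64) into approximately F(1.54, 49). The RT data below are simulated.

```python
import numpy as np
from scipy import stats


def rm_anova_gg(data):
    """One-way repeated-measures ANOVA with Greenhouse-Geisser correction.

    data: participants x conditions. Returns F, epsilon, corrected df1, df2, and p.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * np.sum((data.mean(axis=0) - grand) ** 2)
    ss_subj = k * np.sum((data.mean(axis=1) - grand) ** 2)
    ss_err = np.sum((data - grand) ** 2) - ss_cond - ss_subj   # condition x participant residual
    df_cond, df_err = k - 1, (n - 1) * (k - 1)
    f_val = (ss_cond / df_cond) / (ss_err / df_err)
    # Greenhouse-Geisser epsilon from the double-centered covariance of the conditions
    s = np.cov(data, rowvar=False)
    s_c = s - s.mean(axis=0) - s.mean(axis=1)[:, None] + s.mean()
    eps = np.trace(s_c) ** 2 / ((k - 1) * np.sum(s_c ** 2))
    df1, df2 = eps * df_cond, eps * df_err
    return f_val, eps, df1, df2, stats.f.sf(f_val, df1, df2)


# Simulated RTs (msec) for 33 participants x 3 conditions
rng = np.random.default_rng(0)
rts = rng.normal(480, 40, size=(33, 1)) + rng.normal([472, 480, 484], 15, size=(33, 3))
print(rm_anova_gg(rts))
```

With only two conditions, ε is exactly 1 and F equals the square of the paired t statistic, which offers a quick sanity check.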
Statistical significance was evaluated using cluster-based permutation tests (Maris & Oostenveld, 2007). The time interval of interest was defined as 0.8–1.6 sec after leading-image onset (i.e., the 800-msec window after the onset of the trailing image, during which expectations could be either confirmed or violated). For reference, we also considered the 0–0.8 sec interval (i.e., the 800-msec window after the onset of the leading image, during which expectations about the trailing image may form). Pairwise permutation tests between conditions were conducted separately for the two intervals. For ERFs, data at each sensor and time point were compared univariately between two conditions and were used for clustering when the corresponding two-tailed paired t test yielded a p value smaller than .05. A similar procedure was applied to TFRs, with the only difference being that clustering took place in three dimensions (frequency, sensor, and time). The sum of the t values within a cluster was then computed as the cluster-level statistic, and the cluster with the maximum sum was used as the test statistic. By randomizing the data across the two conditions and recalculating the test statistic 5000 times, we obtained a reference distribution of maximum cluster t values against which the statistic of the actual data was evaluated. The cluster of interest was considered significant when its statistic fell outside 95% of the reference distribution (i.e., p < .05).
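The logic of the cluster-based permutation test can be illustrated in one dimension (time only). A minimal sketch, assuming a paired design in which randomizing the data across the two conditions within a participant is implemented as a random sign flip of that participant's difference wave. Clusters are contiguous runs of supra-threshold t values of the same sign, and the maximum absolute cluster sum serves as the test statistic. FieldTrip additionally clusters over sensors and frequencies and evaluates positive and negative clusters separately, so this is a simplification rather than a reimplementation.

```python
import numpy as np
from scipy import stats


def contiguous_runs(mask):
    """(start, stop) index pairs of contiguous True runs in a 1-D boolean array."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs


def paired_t(diff):
    """One-sample t statistic of participant differences at every time point."""
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(diff.shape[0]))


def max_cluster_sum(t_vals, t_crit):
    """Largest absolute sum of t values over same-sign supra-threshold clusters."""
    best = 0.0
    for sign in (1.0, -1.0):
        for start, stop in contiguous_runs(sign * t_vals > t_crit):
            best = max(best, abs(t_vals[start:stop].sum()))
    return best


def cluster_permutation_test(cond_a, cond_b, n_perm=5000, alpha=0.05, seed=0):
    """Paired cluster-based permutation test over time; inputs are participants x time."""
    diff = cond_a - cond_b
    n_subj = diff.shape[0]
    t_crit = stats.t.ppf(1 - alpha / 2, df=n_subj - 1)   # two-tailed cluster-forming threshold
    observed = max_cluster_sum(paired_t(diff), t_crit)
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Swapping condition labels within a participant flips the sign of the difference
        flips = rng.choice([-1.0, 1.0], size=(n_subj, 1))
        null[i] = max_cluster_sum(paired_t(diff * flips), t_crit)
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value


# Simulated data: 34 participants, 100 time points, effect in samples 20-39 of condition A
rng = np.random.default_rng(1)
cond_a, cond_b = rng.normal(size=(34, 100)), rng.normal(size=(34, 100))
cond_a[:, 20:40] += 1.5
print(cluster_permutation_test(cond_a, cond_b, n_perm=1000))
```

Because only the single largest cluster of each permutation enters the reference distribution, the test controls the family-wise error rate across all time points without a separate multiple-comparison correction.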
During the MEG main task, participants had to press a button in response to upside-down oddball images. They detected 99.3% of the oddballs (range = 97.3–100%), with a mean RT of 518 msec (between-participants SD = 44 msec), confirming that they remained engaged in the task.
During the categorization task (after the MEG recording), RTs differed significantly across the three types of image pairs, F(1.54, 49.13) = 6.619, p = .006 (Greenhouse–Geisser corrected). Specifically, RTs were significantly shorter for expected than for unexpected image pairs (M = 472 vs. 484 msec; t(32) = −3.044, p = .005, uncorrected), with RTs for neutral pairs (M = 480 msec) in between (expected vs. neutral: t(32) = −2.168, p = .037, uncorrected; neutral vs. unexpected: t(32) = −1.857, p = .073, uncorrected). Accuracy showed a similar but nonsignificant pattern, F(1.78, 57.05) = 2.983, p = .064 (Greenhouse–Geisser corrected), possibly because of a ceiling effect (average accuracy across all conditions = 94.50%). Overall, participants' performance in the categorization task indicates that they learned the predictive relationships between images and benefited from this knowledge when expectations were relevant to the task.
Finally, we presented image pairs in expected, neutral, unexpected, and swapped order and asked participants to rate how frequently each pair had occurred (lower ratings indicate higher perceived frequency). Frequency ratings differed significantly across the four pair types, F(2.75, 87.99) = 49.43, p < .001 (Greenhouse–Geisser corrected). Post hoc tests showed that participants rated the expected pairs as most frequent (M = 1.91; uncorrected p < .01 for expected vs. neutral, expected vs. unexpected, and expected vs. swapped) and the swapped pairs as least frequent (M = 3.45; uncorrected p < .001 for swapped vs. expected, swapped vs. neutral, and swapped vs. unexpected), with the neutral (M = 2.31) and unexpected (M = 2.53) pairs in between (neutral vs. unexpected: p = .055, uncorrected). These results further suggest that participants learned the predictive relationships between images.
No ERF Amplitude Difference between Conditions
To investigate whether expectations modulated the response to the trailing image, we contrasted the evoked response to each of the three conditions (using three pairwise permutation tests) in an 800-msec window starting at the onset of the trailing image. We observed no significant differences between any of the conditions (p > .5 for all three pairwise permutation tests; Figure 2A and C, bottom row). As expected, there were also no differences between conditions for the evoked response to the leading image (i.e., pairwise permutation tests for an 800-msec window aligned to the onset of leading image; Figure 2C, top row). It could be argued that the statistical approach that we used, a cluster-based permutation test that controls the Type I error rate at the whole-brain level (involving 275 sensors), has reduced sensitivity compared with a more focused analysis that tests for differences in a subset of sensors of interest (e.g., occipital sensors). Therefore, we repeated our analysis for a subset of sensors reported to show different activations to expected and unexpected stimuli in a previous study by Manahova et al. (2018). Results for this ROI analysis showed no significant differences between the three conditions (p = 1 for all three pairwise permutation tests).
Unexpected Object Images Induce Stronger Low-Frequency Power
Although we found no significant ERF differences between conditions, we did observe a significant difference in low-frequency power between the expected and unexpected conditions. When contrasting TFRs during the processing of the trailing image, we found a significant spectro-temporal cluster with stronger power for unexpected than for expected object stimuli, centered around 10 Hz (time window: 0.8–1.2 sec after leading-image onset, i.e., the first 400 msec after trailing-image onset; frequency range: 6–26 Hz; p = .006, Bonferroni corrected; Figure 3B and C).
We used beamformer analysis to localize the source of this power difference and found that it stemmed mainly from the left occipito-temporal cortex (Figure 4). Pairwise comparisons between the neutral condition and the other two conditions were not significant (p > .3 for both permutation tests).
No Evidence for Phase-Locked Expectation Modulation of Low-Frequency Activity
The diverging patterns of our ERF and TFR results—that is, lack of differences in the evoked response versus significantly different oscillatory patterns—suggest that the low-frequency power differences between expected and unexpected stimuli are unlikely to be phase locked. Note that, although our TFR analysis mainly reflected the induced activity, it might also have captured some evoked/phase-locked responses. To address this further, we estimated the phase-locked TFRs (Figure 5A) by computing the time–frequency dynamics of the average ERFs for each condition (see Swettenham, Muthukumaraswamy, & Singh, 2009, for a detailed description of the method). We then repeated the comparisons between conditions, specifically focusing on the contrast between the expected and unexpected conditions. There were no significant differences between conditions for the phase-locked TFRs (p > .18; Figure 5B). We additionally computed the phase-locked TFRs using another method (Cohen, 2014) and compared the resulting TFRs between conditions. This additional analysis also did not indicate a difference between expected and unexpected conditions in the phase-locked component (p > .8). Together, these analyses suggest that expectations modulated endogenous oscillations in a non-phase-locked manner, rather than these oscillatory patterns being evoked by the external stimulus.
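The distinction between total and phase-locked power can be illustrated with simulated trials. This is a simplified single-window stand-in for the approach described above: phase-locked power is taken as the power of the trial average (the ERF), total power as the average of single-trial power, and their difference as non-phase-locked (induced) power. A 10-Hz oscillation with a consistent phase across trials survives averaging, whereas the same oscillation with a random phase on each trial largely cancels in the ERF while remaining fully visible in single-trial power. The sampling rate, window length, and noise level are arbitrary choices.

```python
import numpy as np


def power_at(trials, fs, freq):
    """Hann-tapered power at one frequency for each row of `trials` (trials x samples)."""
    n = trials.shape[-1]
    kernel = np.hanning(n) * np.exp(-2j * np.pi * freq * np.arange(n) / fs)
    return np.abs(trials @ kernel) ** 2


def total_and_phase_locked(trials, fs, freq):
    """Mean single-trial power and power of the trial average at one frequency."""
    total = power_at(trials, fs, freq).mean()
    erf = trials.mean(axis=0, keepdims=True)
    phase_locked = power_at(erf, fs, freq)[0]
    return total, phase_locked


fs, n_trials = 400, 100
t = np.arange(200) / fs                                  # 500-msec window
rng = np.random.default_rng(0)
noise = 0.5 * rng.normal(size=(n_trials, t.size))
locked = np.sin(2 * np.pi * 10 * t) + noise
phases = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
random_phase = np.sin(2 * np.pi * 10 * t + phases) + noise

for name, trials in (("phase-locked", locked), ("random phase", random_phase)):
    total, pl = total_and_phase_locked(trials, fs, 10.0)
    print(f"{name}: phase-locked/total = {pl / total:.2f}, induced = {total - pl:.1f}")
```

A condition difference that appears in total power but not in the power of the ERF, as reported above, is what this toy model would predict for a non-phase-locked effect.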
No Difference between Conditions in High-Frequency Power
Although we observed significant high-frequency activity in response to both the leading and trailing images (p < .01 for response to leading image vs. baseline and trailing image vs. baseline; see Figure 3A, top row), our cluster-based permutation tests revealed no significant differences between conditions for either time interval (for all permutation tests, p > .3).
In the current study, we investigated the consequences of prior expectations for the neural response to object stimuli using MEG. Specifically, we aimed to address whether expectations modulate perceptual processing by changing early sensory representations and to characterize the oscillatory activity induced by (violations of) expectations. Using a statistical learning paradigm in which participants implicitly acquired knowledge about stimulus transitions, we found a modulatory effect of perceptual expectations on low-frequency activity centered at around 10 Hz, in the form of increased power in response to unexpected relative to expected items.
No Effect of Expectation on ERFs
Somewhat surprisingly, we did not observe any expectation modulations in the ERFs, in apparent contrast with earlier electrophysiological studies in nonhuman primates (Ramachandran et al., 2017; Meyer & Olson, 2011) that used similar stimuli and probabilistic transition rules. Of note, though, the monkeys in these studies were exposed to the object images extensively over many days, which may have caused object-selective neurons to alter their selectivity profiles (Li & DiCarlo, 2008, 2010). Hence, it is plausible that, in these studies, expectations modulated neuronal activity differently than in our study, in which exposure to each image was limited. In addition, attention was typically not controlled in these studies, as the monkeys were simply trained to maintain fixation during stimulus presentation.
The lack of ERF amplitude differences that we observe also appears at odds with earlier electrophysiological studies in human participants, where multiple ERP components have been shown to index prediction error or surprise. For example, using predictable image sequences, Johnston et al. (2017) showed that violation of expectations robustly modulated the N170 ERP component across several stimulus categories. More generally, multiple studies have shown that surprising visual events can elicit a so-called “visual mismatch negativity” response (Kimura, Schröger, Czigler, & Ohira, 2010; Kimura, Widmann, & Schröger, 2010; see Stefanics et al., 2014, for a review), although the effects of perceptual expectation are not always separable from stimulus adaptation in these designs. It is possible that these seemingly inconsistent results are attributable to a difference in how expectations were elicited in different studies and whether the predictive relationship was task relevant (Richter & de Lange, 2019; Larsson & Smith, 2012).
Expectations Modulate Perception at an Early Sensory Processing Stage
It is still debated whether expectations modulate perception by enhancing early sensory signals (Cheadle et al., 2015; Kok et al., 2012; Wyart et al., 2012) or whether this modulation is restricted to later decision-related stages (Rungratsameetaweemana et al., 2018; Bang & Rahnev, 2017). Our findings support the notion that expectations modulate the perceptual decision-making process at an early sensory stage, as indicated by both the temporal and the spatial profile of the expectation modulations we observed. Specifically, the modulatory effect of expectation coincided with trailing-image onset and was short-lived (about 400 msec), suggesting a modulation during the sensory processing of the bottom–up input. In addition, source localization showed that object-selective sensory regions (in particular, occipito-temporal cortex, most prominently in the left hemisphere) were most strongly modulated, supporting the idea that top–down expectations modulate the activity of neuronal populations representing the sensory stimuli.
Note that, in addition to modulating sensory processes, expectations might also modulate later processes such as setting the decision criterion. As the current study used an oddball detection task where oddballs occurred randomly throughout the experiment, we were unable to address whether expectations modulate decisional stages beyond sensory processing. Furthermore, the use of clearly visible object images instead of threshold-level stimuli (cf. Bang & Rahnev, 2017) resulted in ceiling performance for all participants, thus minimizing our opportunities to observe any modulation in decision criterion. Therefore, future studies using threshold-level stimuli may address whether expectations modulate decisional stages in addition to modulation of sensory processes.
Violations of Sensory Expectations Result in Stronger Endogenous Alpha-Band Oscillations
Unexpected compared with expected images elicited a power increase in band-limited low-frequency activity that overlapped largely with the alpha band. It has recently been proposed that alpha-band oscillations emerge from recurrent interactions in a hierarchical network, in which the higher-order region tries to predict the signal received from the lower-order region (Alamia & VanRullen, 2019). Indeed, stimulus-induced alpha oscillations have been observed when computing the visual impulse response function to randomly varying (unpredictable) visual input (VanRullen & Macdonald, 2012). In contrast with the “canonical microcircuits” model of predictive coding, which emphasizes the role of alpha-band oscillations as carriers of feedback predictions (Bastos et al., 2012), Alamia and VanRullen (2019) proposed that alpha-band traveling waves propagate in both the feedforward direction (during visual stimulation) and the backward direction (in the absence of visual input). It is conceivable that the unexpected condition required more cycles of recurrent activity, because the mismatch between sensory expectation and input requires an update of sensory expectations, thereby leading to stronger power in the alpha band. At present, however, this proposal is speculative and requires further empirical support.
Alternatively, the difference in alpha-band power might be interpreted as stronger alpha power suppression for expected stimuli. It has been proposed that alpha oscillations gate information processing by inhibiting task-irrelevant brain regions, and that this inhibition is modulated by attention (Jensen & Mazaheri, 2010). Statistical regularities have also been reported to bias attention toward the regular input (Zhao, Al-Aidroos, & Turk-Browne, 2013). One might therefore hypothesize that the observed difference between expected and unexpected trials stems from stronger attentional engagement with the expected stimuli. Although participants paid only limited attention to the stimuli, because they were engaged in an oddball detection task designed to minimize the task relevance of the nonoddball images, we cannot fully rule out a contribution of attention to the observed neural difference.
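Both interpretations concern the same measured quantity: band-limited power in the 7–14 Hz alpha range. As a minimal illustration of what such a comparison involves, the Python sketch below computes mean periodogram power in that band for two simulated single-channel signals and expresses their difference in decibels. The sampling rate, 10-Hz amplitudes, noise level, and the helper name `band_power` are hypothetical choices for illustration; this is not the MEG analysis pipeline used in this study.

```python
import numpy as np

def band_power(x, fs, fmin=7.0, fmax=14.0):
    """Mean periodogram power of signal x in the [fmin, fmax] Hz band (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                     # remove the DC offset
    power = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)  # periodogram; no one-sided doubling, fine for comparing conditions
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)          # frequency (Hz) of each rfft bin
    in_band = (freqs >= fmin) & (freqs <= fmax)
    return power[in_band].mean()

# Hypothetical simulation: 1 s of data at 1000 Hz, a 10-Hz component plus white noise.
fs = 1000.0
t = np.arange(int(fs)) / fs
rng = np.random.default_rng(0)
expected = 1.0 * np.sin(2 * np.pi * 10 * t) + rng.normal(0.0, 1.0, t.size)
unexpected = 1.5 * np.sin(2 * np.pi * 10 * t) + rng.normal(0.0, 1.0, t.size)

p_exp = band_power(expected, fs)
p_unexp = band_power(unexpected, fs)
# Condition difference in decibels (positive = more alpha power for unexpected).
diff_db = 10 * np.log10(p_unexp / p_exp)

print(f"alpha power, expected:   {p_exp:.4f}")
print(f"alpha power, unexpected: {p_unexp:.4f}")
print(f"difference: {diff_db:.2f} dB")
```

A real analysis would estimate power over many trials and time windows; this single-window example only shows that a positive contrast says nothing, by itself, about whether one condition rose or the other fell.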
Alpha power in the neutral condition, in which all trailing images were equally expected, was intermediate between that for expected and unexpected images but did not differ significantly from either condition (Figure 3D). This null result may reflect a lack of statistical power, given that these differences are expected to be more subtle than the difference between expected and unexpected trials (Ramachandran et al., 2017).
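The statistical-power argument can be made concrete with a normal-approximation calculation for a two-sided paired comparison. The sketch below assumes, hypothetically, that the neutral condition lies exactly halfway between the expected and unexpected conditions, so that each neutral comparison has half the standardized effect size of the expected-versus-unexpected contrast. The effect size of 0.6 and the sample size of 30 are illustrative values, not estimates from this study, and the normal approximation slightly overstates power relative to an exact t-test.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def paired_power(d, n, alpha=0.05):
    """Approximate power of a two-sided paired test (normal approximation).

    d: standardized effect size of the paired differences (Cohen's d_z).
    n: number of participants.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n)
    # Probability that the test statistic falls beyond either critical value.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

def required_n(d, power=0.8, alpha=0.05):
    """Approximate sample size for the desired power (ignores the small opposite-tail term)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_crit + z_power) / d) ** 2)

# Hypothetical numbers for illustration only.
n_participants = 30
d_full = 0.6          # expected vs. unexpected
d_half = d_full / 2   # neutral vs. either condition, if neutral lies halfway

print(f"power, full effect: {paired_power(d_full, n_participants):.2f}")
print(f"power, half effect: {paired_power(d_half, n_participants):.2f}")
print(f"n for 80% power: {required_n(d_full)} vs. {required_n(d_half)}")
```

Because the required sample size scales with 1/d², halving the effect size roughly quadruples the number of participants needed to reach the same power.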
Although we observed significant expectation modulation of oscillatory activity in the alpha band, we did not observe any modulation in the high-frequency range. This appears at odds with the proposal that low- and high-frequency oscillations reflect predictions and prediction errors, respectively (Bastos et al., 2012). One factor that may have precluded such an effect is the stimulus set we used. Although we observed an increase in high-frequency power time-locked to our stimuli, this increase reflected a broadband evoked response rather than the narrow-band gamma oscillatory activity that is typically regarded as the feedforward information carrier in the predictive coding framework. This observation is consistent with previous reports that narrow-band gamma oscillations are elicited only by certain types of stimuli (Hermes, Miller, Wandell, & Winawer, 2015). Future research may shed light on whether high-frequency oscillatory responses to expectation violations depend on the stimulus.
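The distinction between a broadband power increase and a narrow-band gamma oscillation can be illustrated with a simple heuristic: divide the post-stimulus spectrum by the pre-stimulus spectrum and ask whether the ratio within the 40–90 Hz gamma range is flat or peaked. The sketch below applies this to two synthetic spectra. The 1/f baseline, the 30% broadband increase, the 60-Hz bump, and the `gamma_peakiness` metric are all hypothetical and serve only to illustrate the concept; they are not the analysis used in this study.

```python
import numpy as np

def gamma_peakiness(post, pre, freqs, fmin=40.0, fmax=90.0):
    """Ratio of the maximum to the median post/pre power ratio in the gamma band.

    Values near 1 indicate a flat (broadband) change; values well above 1
    indicate a localized (narrow-band) peak. Hypothetical heuristic.
    """
    in_band = (freqs >= fmin) & (freqs <= fmax)
    ratio = post[in_band] / pre[in_band]
    return ratio.max() / np.median(ratio)

# Synthetic spectra on a 1-Hz grid from 1 to 150 Hz.
freqs = np.arange(1.0, 151.0)
pre = 1.0 / freqs                                    # 1/f-like baseline spectrum

broadband_post = pre * 1.3                           # uniform 30% increase at every frequency
bump = 1.5 * np.exp(-0.5 * ((freqs - 60.0) / 5.0) ** 2)
narrowband_post = pre * (1.0 + bump)                 # extra power concentrated around 60 Hz

print(f"broadband:  {gamma_peakiness(broadband_post, pre, freqs):.2f}")
print(f"narrowband: {gamma_peakiness(narrowband_post, pre, freqs):.2f}")
```

Both synthetic post-stimulus spectra contain more high-frequency power than the baseline, but only the narrow-band case produces a spectral peak; in real data the ratio would be noisy and would need to be estimated across trials.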
To summarize, our study demonstrates that perceptual expectations modulate perception at an early stage of sensory processing. Importantly, this modulatory effect of expectation is expressed as a power increase in low-frequency oscillatory activity in response to unexpected stimuli. These findings extend our understanding of how perceptual expectations are implemented in the human brain.
This work was supported by the European Union Horizon 2020 Program (ERC Starting Grant 678286, “Contextvision” awarded to F. P. d. L.), the Chinese Scholarship Council (CSC20170800036 awarded to Y. J. Z.), the Netherlands Organization for Scientific Research Vidi grant (NWO 016.Vidi.185.137 awarded to S. H.), and the Beatriu de Pinòs 2017-BP-00213 (AGAUR, awarded to A. P. B.). We thank Mats van Es, Jan-Mathijs Schoffelen, Matthias Fritsche, and Eelke Spaak for helpful discussions.
Reprint requests should be sent to Ying Joey Zhou, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Kapittelweg 29, Nijmegen, Gelderland 6525 EN, The Netherlands, or via e-mail: firstname.lastname@example.org.