Humans live in a volatile environment, subject to changes occurring at different timescales. The ability to adjust internal predictions accordingly is critical for perception and action. We studied this ability with two EEG experiments in which participants were presented with sequences of four Gabor patches, simulating a rotation, and instructed to respond to the last stimulus (target) to indicate whether or not it continued the direction of the first three stimuli. Each experiment included a short-term learning phase in which the probabilities of these two options were very different (p = .2 vs. p = .8, Rules A and B, respectively), followed by a neutral test phase in which both probabilities were equal. In addition, in one of the experiments, prior to the short-term phase, participants performed a much longer long-term learning phase where the relative probabilities of the rules predicting targets were opposite to those of the short-term phase. Analyses of the RTs and P3 amplitudes showed that, in the neutral test phase, participants initially predicted targets according to the probabilities learned in the short-term phase. However, whereas participants not pre-exposed to the long-term learning phase gradually adjusted their predictions to the neutral probabilities, for those who performed the long-term phase, the short-term associations were spontaneously replaced by those learned in that phase. This indicates that the long-term associations remained intact whereas the short-term associations were learned, transiently used, and abandoned when the context changed. The spontaneous recovery suggests independent storage and control of long-term and short-term associations.
The ability to make predictions about future states of the environment allows humans to adapt their perception and optimize their behavior. According to predictive coding models (Friston, 2005; Rao & Ballard, 1999), predictions are represented in the brain as probability distributions that are continuously compared with actual evidence and adjusted correspondingly. Predictions are primarily based on general knowledge and experience, that is, on global probabilities about how events succeed one another in a given context. These probabilities can be learned over time, improving our adaptation to the environment as we are exposed to it. Such an adaptation may be easily achieved under stable contextual conditions. However, we live in a mutable environment in which the relationships between predictive and predicted events are subject to changes. Moreover, these changes occur at different timescales, from very transient to very long-lasting, and the ability to adapt our predictions accordingly is critical for optimizing perception and action. The aim of this study was to shed light on the adaptability of predictions by investigating the dynamics of the acquisition of relationships between predictive and predicted events learned at different timescales.
The general idea that the nervous system is adapted to the statistical properties of the environment is a long-standing principle in neuroscience (Fiser, Berkes, Orbán, & Lengyel, 2010). This principle is related to an equally long tradition emphasizing the role of prediction in perception and cognition, a view that has roots in von Helmholtz's work in the late 19th century (Swanson, 2016; von Helmholtz, 2013; Dayan, Hinton, Neal, & Zemel, 1995). Over the last two decades, this “predictive brain” view has regained considerable strength, leading to the emergence of theoretical proposals such as the hierarchical predictive coding models (Friston, 2005; Rao & Ballard, 1999). Predictive coding models characterize the brain as an organ fundamentally dedicated to actively inferring the causes of the inputs it receives and to predicting future inputs accordingly. Furthermore, these models claim that all brain-based behaviors, including those of high-level cognition, can be explained in terms of suitably organized hierarchical prediction processes (Thornton, 2017; Adams, Shipp, & Friston, 2013). Predictive processes are generative, that is, they are based on prior beliefs about how causes interact and lead to a particular input. This information is estimated from sensory data and thus needs to be learned. This learning process seeks to infer the causes of the sensory inputs by minimizing the difference between the actual sensory data and the sensory data predicted on the basis of preceding events, that is, prediction error (PE). In predictive coding, backward projections from one hierarchical level to its subordinate level furnish predictions of the lower level's representation, whereas reciprocal forward projections convey PEs that report the difference between the representation and the prediction (Bastos et al., 2012). Error signals received by the higher level are then used to correct its representation so that its predictions progressively improve.
This recurrent exchange of signals continues until PE is minimized, at which point the hierarchy contains a representation as accurate as possible of the causes of sensory input.
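The error-correction loop described above can be illustrated with a deliberately minimal sketch. It assumes a single scalar input and a single level rather than a full hierarchy, and the function name, learning rate, and tolerance are illustrative choices that do not come from the cited models.

```python
def minimize_prediction_error(observation, prediction=0.0, rate=0.2,
                              tol=1e-6, max_steps=1000):
    """Toy single-level loop: correct the prediction until the error is negligible.

    All parameter names and default values are assumptions made for illustration.
    """
    errors = []
    for _ in range(max_steps):
        error = observation - prediction   # forward signal: prediction error (PE)
        errors.append(abs(error))
        if abs(error) < tol:               # PE minimized: stop exchanging signals
            break
        prediction += rate * error         # higher level corrects its representation
    return prediction, errors


final_prediction, error_trace = minimize_prediction_error(observation=1.0)
```

With a fixed input, the absolute error in `error_trace` shrinks on every step, which mirrors the claim that predictions progressively improve under stable conditions.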
Gathering evidence to uncover the probabilistic structure of the environment and reduce PE takes time. A consequence of the continuous PE-reduction process is that, the more stable the context is, the faster an optimal predictive state is achieved, that is, the faster the associations between predictive and predicted inputs are learned, the faster PE is reduced. In experimental settings, the time needed for learning can indeed be very short, taking only a few trials under particularly stable conditions (Hsu, Le Bars, Hämäläinen, & Waszak, 2015; Todorovic & de Lange, 2012; Bekinschtein et al., 2009; Garrido et al., 2009; Schadow, Lenz, Dettler, Fründ, & Herrmann, 2009), but increases significantly when apprehending the probabilistic structure of the environment requires more complex computations (Bidet-Caulet et al., 2012; Domenech & Dreher, 2010). Once learned, the associations between predictive and predicted events are strengthened in the long term when participants are exposed to a given context beyond what is necessary for unveiling the probabilistic relationships between those events. Moreover, our experience depends not only on immediate information from the environment but also on our prior knowledge and expectations. Human behavior is shaped by past experience on multiple timescales, and successful performance in our dynamic environment critically depends on the brain's capacity to adapt its predictions based not only on current contextual information but also on past experience (Sohoglu & Davis, 2016). In other words, the brain needs to be able to learn from transient conditions and to adapt its predictions accordingly without weakening the associations between predicted and predictive inputs that hold valid on a longer-term basis.
The current study addresses this issue by assessing PE in a dynamic learning environment as described in detail below. To do so, we harness the fact that a number of ERP components have been linked to PE (Stefanics & Czigler, 2012; Todorovic & de Lange, 2012; Garrido et al., 2009). This is the case for the P3, a positive deflection in the ERP waveform peaking between 300 and 800 msec and with a broad but varying topographical distribution depending on the task employed. P3 has been related to a wide variety of processes, including context updating (Donchin & Coles, 1988), decision confidence (Sawaki & Katayama, 2006), evidence accumulation, and the updating of perceptual evidence (O'Connell, Dockree, & Kelly, 2012), all arguably related to prediction processing. In the context of prediction research, modulations of P3 amplitude have been extensively related to mismatch, surprise, and novelty processing, and consequently to PE (Ehinger, König, & Ossandón, 2015; Kolossa, Kopp, & Fingscheidt, 2015; Feldman & Friston, 2010; Mars et al., 2008; Waszak & Herwig, 2007) and to the updating of an internal prediction model (Fischer & Ullsperger, 2013). More specifically, P3 has been linked to high-level error associated not with specific sensory features but with the processing of global contextual deviations (Bekinschtein et al., 2009). Mars et al. (2008), for instance, instructed participants to learn the associations between four arbitrary stimuli and four response buttons and later presented them with a series of experimental blocks in which participants were required to respond to each stimulus with the previously associated button as quickly and accurately as possible. Unbeknown to participants, the probability of the occurrence of each event was manipulated between blocks such that the relative probabilities of events were either low, medium, or high.
Their results demonstrated that trial-by-trial fluctuations in P3 amplitudes could be explained in terms of participants keeping track of the global probabilities of visual events, so that its amplitude was reduced or enhanced as a function of surprise or, in other words, PE. In another experiment, Wacongne et al. (2011) employed an auditory paradigm to dissociate two types of predictions, based on local probabilities versus global rules. In a given block, a frequent sequence of five tones was presented (in 75% of trials), interspersed with rare violations (in 15%) in which the frequency of the fifth tone deviated from the expected one, and with rare omissions (10%) in which the fifth tone was simply omitted. The authors found that, in blocks in which the frequent sequences consisted of four identical tones followed by a deviant one, this last tone elicited a mismatch negativity (MMN), that is, an early PE response related to local regularity violations, but not a P3. However, in sequences in which the last tone was identical to the four preceding ones, this last tone did not elicit an MMN, but an enhanced P3 response. The authors concluded that two different levels of predictions operate in that context: a first low-level expectation, based on local transition probabilities and reflected by the MMN, and a second, higher-level prediction, based on the knowledge about the global overall rule or pattern followed by the stimuli, which, when violated, elicits a PE response reflected in the P3 component (Wacongne et al., 2011). Hence, because of its sensitivity to PE, P3 amplitude offers insight into the predictions that participants are making at any given moment and, consequently, into their learning about the statistical structure of the environment and the use they make of that information to adapt to changes in that environment over time.
Here, we used ERPs and RTs to investigate the adaptability of predictions learned at different timescales. Specifically, we focused on the P3 ERP component as a measure of PE and thus as a means to track the learning and adaptation of participants to different contexts. The study comprised two experiments in which participants were presented with sequences of four successive Gabor patches, simulating either a clockwise or an anticlockwise rotation, and were instructed to respond to the last stimulus (target) by pressing one of two buttons to indicate whether or not it followed the direction indicated by the first three stimuli. Each experiment included a short-term learning phase in which the probabilities of the targets following or not following the direction indicated by the first three stimuli were very different (p = .2 vs. p = .8, designated as predictive Rules A and B, respectively), followed by a neutral test phase in which both rules predicted the targets with equal probability. In addition, in one of the experiments, participants performed a much longer long-term learning phase, prior to the short-term learning and test phases, in which the relative probabilities of the two possible rules predicting the targets were opposite to those of the short-term learning phase (p Rule A = .8, p Rule B = .2). In every learning phase, targets presented according to the least predictive rule should generate a larger PE response, measured as the amplitude of the P3 component, and yield slower RTs. Moreover, if long-term predictions remain intact while participants learn the short-term associations, in the neutral test phase, a recovery of the PE response observed in the long-term learning phase should occur for participants who were pre-exposed to that phase. For participants not pre-exposed, however, differences in PE responses should simply disappear in the neutral test phase.
This pattern of results would parallel those demonstrated in perceptual (Bao & Engel, 2012b; Vul, Krizay, & MacLeod, 2008), motor (Kording, Tenenbaum, & Shadmehr, 2007; Smith, Ghazizadeh, & Shadmehr, 2006), and Pavlovian (Bouton, 1993) learning, and would indicate that predictions valid at different timescales are independently stored and controlled.
Forty participants (27 women, 26.2 ± 3.8 years of age) took part in the study and received monetary compensation for their participation. All participants were right-handed students with normal or corrected-to-normal vision and reported no history of neurological or psychiatric disorders. Informed consent was obtained from all of them. Experimental procedures were undertaken in accordance with the Declaration of Helsinki and with the approval of the Comité de Protection des Personnes Ile de France II. Participants were randomly assigned to either Experiment 1 (20 participants, 13 women, 25.7 ± 3.9 years of age) or Experiment 2 (20 participants, 14 women, 26.4 ± 3.6 years of age).
Stimuli and Trials
Visual stimuli were presented on a 27-in. 60-Hz LCD monitor. Stimuli were presented against a gray background at the center of the screen and consisted of displays containing a Gabor patch (.4 Michelson contrast, 7 cycles, 4° visual angle), presented within the limits of a central fixation box (5° visual angle) and presented with one of eight possible orientations with respect to the vertical meridian (0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°). A small dark gray dot was placed on the internal edge of each Gabor patch to increase the possible number of the rotation's starting and ending points (otherwise, there would be only four possible orientations, given that 0° and 180°, 45° and 225°, 90° and 270°, and 135° and 315° would be identical), to help participants differentiate between supplementary orientation angles, and to favor their engagement in the task (Figure 1). Trials consisted of sequences of four successive Gabor patches. The first three followed one another rapidly (duration 100 msec, 166 msec SOA) and were presented so that they simulated either a clockwise or anticlockwise rotation in 45° steps. Both rotation directions were equally probable. The initial point of this rotation (i.e., the orientation of the first stimulus in the sequence) was randomized. These first three Gabor patches were followed after a 1000-msec delay by a fourth one that could either be presented 45° following the same rotation direction indicated by the first three or be presented 45° against the rotation direction. Participants were instructed to respond to that fourth stimulus by pressing one of two buttons to indicate whether or not this target stimulus followed the direction indicated by the first three. Participants were encouraged to respond as fast as possible and given a maximum of 1000 msec to do so. The association between response button (left, right) and response meaning (same, opposite direction) was counterbalanced between participants.
Intertrial interval, defined as the interval between the presentation of a target on a given trial and the presentation of the first stimulus in the sequence of the next trial, was set to 2500 msec. The stimuli were created, and the experiments written, using the Psychophysics Toolbox (Psychtoolbox-3) extensions (Kleiner et al., 2007; Brainard, 1997).
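The trial structure described above can be sketched as follows. This is only an illustration of the four-stimulus sequence logic, not the Psychtoolbox code used in the study; the function name and the convention that a positive step stands for one rotation direction are assumptions.

```python
import random

ORIENTATIONS = [0, 45, 90, 135, 180, 225, 270, 315]  # degrees, as listed in the text


def make_trial(follows_direction, rng=random):
    """Return the four orientations (in degrees) of one trial.

    follows_direction: True if the target continues the rotation of the first
    three stimuli, False if it steps 45 degrees against it.
    The sign convention of `step` is an assumption made for illustration.
    """
    start = rng.choice(ORIENTATIONS)   # randomized starting orientation
    step = rng.choice([45, -45])       # both rotation directions equally probable
    cues = [(start + i * step) % 360 for i in range(3)]
    target_step = step if follows_direction else -step
    target = (cues[-1] + target_step) % 360
    return cues + [target]


example_trial = make_trial(follows_direction=True, rng=random.Random(0))
```

Note that a target presented against the rotation direction necessarily lands on the orientation of the second stimulus in the sequence.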
Experiment 1 consisted of two phases, containing 600 trials each. In the first phase (short-term learning), targets could be presented either following the direction indicated by the predictive sequence (Rule A) or the opposite direction (Rule B). The predictive values of Rules A and B were different (0.2 and 0.8, corresponding to 120 and 480 trials, respectively). In the second phase (test), Rules A and B predicted targets with equal probability (0.5, 300 trials each). EEG activity was recorded while participants performed in both phases.
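As a small worked example of the trial counts above, the following sketch builds the rule labels for each phase of Experiment 1. The helper name is hypothetical, and the random shuffling of trial order is an assumption, since the text does not specify how trials were ordered within a phase.

```python
import random


def make_phase(n_trials, p_rule_a, seed=None):
    """Return a list of rule labels with a fixed proportion of Rule A trials.

    Shuffling the order is an assumption made for illustration.
    """
    n_a = round(n_trials * p_rule_a)
    rules = ["A"] * n_a + ["B"] * (n_trials - n_a)
    random.Random(seed).shuffle(rules)
    return rules


short_term = make_phase(600, 0.2, seed=1)   # 120 Rule A trials, 480 Rule B trials
test_phase = make_phase(600, 0.5, seed=2)   # 300 trials per rule
```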
Experiment 2 was similar to Experiment 1 regarding the short-term learning and test phases explained above, but in this case, those phases were preceded by a long-term learning phase. This phase was much longer than the short-term learning and test phases and was divided into two sessions to reduce participants' fatigue. All through the long-term learning phase (Sessions I and II), the predictive values of Rules A and B were the opposite of those in the short-term learning phase. In the first session (long-term learning Session I), participants were presented with 1000 trials (800 presented according to one rule, 200 according to the other), and the EEG was not recorded. The second session (long-term learning Session II) started 90 min after the end of the first one. In this second session, participants were presented with 600 additional trials (480 according to one rule, 120 according to the other), while their EEG activity was recorded. After this two-session long-term learning phase, participants performed a short-term learning phase and a test phase similar to those in Experiment 1.
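To make the relative amount of exposure in Experiment 2 concrete, the sketch below tabulates the phases and computes the cumulative proportion of Rule A trials a participant has seen by the end of each phase. The trial counts come from the text, with Rule A as the frequent rule in the long-term phase; the cumulative-share calculation itself is an illustrative addition and not an analysis reported in the study.

```python
# (phase name, Rule A trials, Rule B trials), in the order run in Experiment 2.
EXPERIMENT_2 = [
    ("long-term learning, Session I", 800, 200),
    ("long-term learning, Session II", 480, 120),
    ("short-term learning", 120, 480),
    ("test", 300, 300),
]


def cumulative_rule_a_share(schedule):
    """Proportion of all trials so far that followed Rule A, after each phase."""
    seen_a = seen_total = 0
    shares = []
    for name, n_a, n_b in schedule:
        seen_a += n_a
        seen_total += n_a + n_b
        shares.append((name, seen_a / seen_total))
    return shares


for phase_name, share in cumulative_rule_a_share(EXPERIMENT_2):
    print(f"{phase_name}: {share:.3f}")
```

Under these counts, Rule A still accounts for most of the trials experienced overall at the end of the short-term learning phase (1400 of 2200), even though Rule B dominates within that phase.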
EEG Recording and Preprocessing
Continuous EEG data (0.1- to 250-Hz band-pass) were collected from 60 actiCAP EEG electrodes (BrainProducts GmbH) mounted on an elastic cap. EEG electrodes were placed following the extended 10–10 position system (Acharya, Hani, Cheek, Thirumala, & Tsuchida, 2016) and were referenced to right mastoid. Four additional electrodes were placed above and below the left eye and on the outer canthi of both eyes to monitor blinks and eye movements. EEG and EOG data were collected using the PyCorder system and actiCHamp amplifiers (BrainProducts GmbH) in direct current recording mode with a sampling rate of 2000 Hz.
EEG data were processed using EEGLAB (Swartz Center for Computational Neurosciences: http://www.sccn.ucsd.edu/eeglab) running under MATLAB R2012b (The Mathworks). Preprocessing was performed as follows. EEG data were rereferenced off-line to linked mastoids. Bad channels were then identified by visual inspection and excluded from processing. Epochs for each stimulus type were extracted from −2000 to +2000 msec with respect to the target stimulus in each trial, and were inspected for nonstereotyped artifacts and removed if present (4.87% ± 4.22 of trials removed). Stereotyped artifacts, including blinks, eye movements, and muscle artifacts, were removed via independent component analysis (ICA) using the extended infomax algorithm (Bell & Sejnowski, 1995). The average number of independent components removed was 2.71 (± 1.27 SD). The remaining components were then projected back into electrode space. After ICA, channels that were deemed bad were reintroduced by interpolating data between neighboring electrodes using spherical spline interpolation (Perrin, Pernier, Bertrand, Giard, & Echallier, 1987).
ERP analyses were performed on ICA-corrected epochs time-locked to the onset of each target (−200 to +1000 msec). To minimize the influence of individual differences in topographies as well as the effects of performing multiple statistical comparisons, the analysis of the P3 was performed on a central cluster including C1, Cz, C2, CP1, CPz, and CP2 in a 20-msec window with regard to the most positive point in the latency range of 315–375 msec (356 msec in the long- and short-term learning phases and 340 msec in the test phase). The cluster and window were selected on the basis of both visual detection, in the grand average, of the electrodes showing maximal P3 amplitude and the topographical distribution of the activity on the scalp (see Figures 8 and 10). Baseline was designated from −200 to 0 msec relative to stimulus onset.
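The P3 measurement described above can be sketched for a single cluster-averaged waveform as follows. This is not the EEGLAB pipeline used in the study: the function names are hypothetical, the 20-msec window is assumed to be centered on the peak (±10 msec), and the waveform is synthetic.

```python
import math
from statistics import mean


def find_peak_ms(times_ms, signal_uv, lo=315, hi=375):
    """Latency of the most positive point within the search range."""
    candidates = [(v, t) for t, v in zip(times_ms, signal_uv) if lo <= t <= hi]
    return max(candidates)[1]


def p3_amplitude(times_ms, signal_uv, peak_ms, half_width_ms=10,
                 baseline=(-200, 0)):
    """Mean baseline-corrected amplitude in a window around peak_ms.

    Centering the window on the peak is an assumption about the procedure.
    """
    base = mean(v for t, v in zip(times_ms, signal_uv)
                if baseline[0] <= t < baseline[1])
    window = [v - base for t, v in zip(times_ms, signal_uv)
              if peak_ms - half_width_ms <= t <= peak_ms + half_width_ms]
    return mean(window)


# Synthetic waveform: constant offset plus a positive bump peaking at 356 msec.
times = list(range(-200, 1001, 2))
wave = [1.0 + 4.0 * math.exp(-((t - 356) / 50) ** 2 / 2) for t in times]

peak = find_peak_ms(times, wave)
amplitude = p3_amplitude(times, wave, peak)
```

In the study the peak latency was taken from the grand average, so `find_peak_ms` would be applied to the grand-average waveform and the resulting latency reused for every participant and condition.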
Results were analyzed with a Bayesian linear mixed model (LMM) analysis using the package brms (Bürkner, 2017), a high-level interface to Stan (Carpenter et al., 2017) in R (R Core Team, 2016). Plots were made using brms and ggplot2 (Wickham, 2016). An advantage of LMMs over traditional approaches such as repeated-measures ANOVA and paired t tests is that a single model can take all sources of variance into account simultaneously, and comparisons between conditions can be implemented in a single model. LMMs (of which t tests and ANOVA are specific examples) allow for modeling complex data structures, interacting continuous and categorical variables, and taking inherent correlations in data structures into account. Bayesian LMMs give insight into the range of possible effect sizes and, as such, allow for direct comparison of parameter estimates. Bayesian statistics is theoretically distinct from frequentist statistics in its inferences. The coefficient estimates are expressed in credible intervals, which reflect the intuitive notion of the value of a parameter falling within that interval with a given probability, 95% in this case.
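As an illustration of what a 95% credible interval summarizes, the sketch below computes an equal-tailed interval from posterior draws. It assumes that the interval is taken between the 2.5th and 97.5th percentiles of the draws; the draws here are simulated and are not results from the study.

```python
import random
from statistics import quantiles


def credible_interval_95(draws):
    """Equal-tailed 95% interval: 2.5th and 97.5th percentiles of the draws."""
    cuts = quantiles(draws, n=40, method="inclusive")  # cut points every 2.5%
    return cuts[0], cuts[-1]


# Simulated posterior draws for one coefficient (made-up values).
rng = random.Random(42)
draws = [rng.gauss(0.3, 0.1) for _ in range(8000)]

low, high = credible_interval_95(draws)
inside = sum(low <= d <= high for d in draws) / len(draws)
```

By construction, about 95% of the draws fall between `low` and `high`, which is the sense in which the parameter lies within the interval with 95% probability given the model and data.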
Main analyses focused on comparing the two experiments, and therefore only behavioral and EEG data obtained in the short-term learning and test phases were included. Behavioral and EEG data from the long-term learning phase in Experiment 2 were analyzed independently (see the end of this section), both as a data sanity check and for descriptive purposes. For the main analyses, we used a predefined model reflecting our experimental design (Barr, Levy, Scheepers, & Tily, 2013), and we kept this model structure the same for the behavior and amplitude models. In the behavioral model, participant RTs in milliseconds were log-transformed in order to normalize the residuals (Baayen & Milin, 2010). In the P3 amplitude model, participant P3 amplitudes were normally distributed and did not need transformation. Both dependent variables were scaled to z scores (making parameter estimates interpretable as effect sizes) for ease of interpretation and comparison. In the model (Figures 3 and 6), we implemented our hypothesis with a full interaction of Experiment (1, 2), Rule (A, B) and Trial order (1:599, scaled) in each phase (short-term learning, test). The model additionally included individual participant intercepts and slopes of Rule (A, B) in order to account for individual variation. Contrasts of all categorical factors were centered (Baayen, 2008), so the intercept of the model represents the grand mean. We tested specific hypotheses with the “hypothesis” function in brms. We used a generic weakly informative normally distributed prior with mean 0 and 1 SD for each fixed parameter (Lemoine, 2019) and kept all other priors at default (see Appendix for the full prior specification of all models). This way, our models are explicit about expected small effect sizes, conservative, and robust to unrealistically large effects caused by noise, that is, Type I errors.
We furthermore used four chains of 3000 iterations each per model, of which 1000 per chain were used for warm-up only, a maximum tree depth of 15 and a target acceptance rate (adapt delta) of .95. Convergence was verified through visual inspection of trace plots and the Rhat of 1.00 for each parameter.
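The variable preparation and sampler settings described above can be sketched as follows. This is not the brms code used in the study: the RT values are made up, the ±0.5 coding is only one common way to center a two-level factor and is an assumption, and all names are hypothetical.

```python
import math
from statistics import mean, stdev


def zscore(values):
    """Scale values to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Made-up RTs in msec, for illustration only (not data from the study).
rts_ms = [512.0, 478.0, 530.0, 495.0, 610.0, 450.0]
log_rt_z = zscore([math.log(rt) for rt in rts_ms])   # log-transform, then z-score

# Trial order 1..599 as stated in the text, scaled to z scores.
trial_scaled = zscore(list(range(1, 600)))

# Assumed centered coding for a two-level factor.
RULE_CONTRAST = {"A": -0.5, "B": 0.5}

# Sampler settings from the text: 4 chains, 3000 iterations, 1000 warm-up each.
CHAINS, ITERATIONS, WARMUP = 4, 3000, 1000
post_warmup_draws = CHAINS * (ITERATIONS - WARMUP)   # draws kept per model
```

With these settings, each model retains 8000 post-warm-up draws per parameter, which is the sample on which medians and credible intervals are computed.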
Participants were excluded if they had less than 50% accuracy in the test phase. Two participants were additionally excluded for presenting periods of no response. Our final sample consisted of 18 participants in Experiment 1 and 17 participants in Experiment 2.
For the analysis of data from the long-term learning phase of Experiment 2, we used the same model as specified above, with the following differences. There was no variable “experiment”, because the data were only collected for Experiment 2, and there was no variable “phase” in the model of the EEG data, because EEG data were only collected in long-term learning phase Session II. The variable Session was included in the model of the behavioral data, where it had two levels: long-term learning Session I and long-term learning Session II.
In order to show the differences observed between Experiments 1 and 2, this section focuses on describing behavioral and ERP results obtained in the short-term learning and test phases, present in both experiments. Behavioral and ERP results corresponding to the long-term learning phase, present only in Experiment 2, are described together at the end of the Results section.
Short-Term Learning and Test Phases: Behavioral Results
Results obtained in the short-term learning and test phases are illustrated in Figure 2 (left and right, respectively). Medians and credible intervals of values of each parameter can be found in Figures 3 and 4. The model showed that RTs were shorter (Figure 3) in the test (mean = 504 msec) than in the short-term learning phase (mean = 510 msec), presumably reflecting a general effect of practice. Below, we describe the different phases separately.
Short-Term Learning Phase
Results obtained in the short-term learning phase are illustrated in Figure 2 (left) and described in Figures 3 (general model) and 4 (grouped by Experiment). A main effect of Experiment revealed slower RTs in Experiment 1 (mean RT = 534 msec) than in Experiment 2 (mean RT = 480 msec). This probably reflects the effect of practice, because participants in Experiment 2 performed a similar task in the long-term learning phase prior to the short-term learning and test phases. A main effect of Trial showed that RTs progressively decreased over this phase. However, the significant interaction of Experiment and Trial revealed that RTs indeed decreased over trials in Experiment 1, but actually increased, albeit slightly, in Experiment 2 (Figure 4, left). Taken together, these results suggest that, although there was apparently a general effect of practice in Experiment 1, with RTs getting faster over the course of the phase, such an effect was not present in Experiment 2, presumably because of some interference caused by the long-term learning phase that participants in this experiment, but not those in Experiment 1, performed prior to the short-term learning phase. This interference could partly consist of a ceiling effect resulting from previously performing the extensive long-term learning phase in Experiment 2. As suggested in the Experiment 2 long-term learning phase result description (see the end of the Results section and Figure 10), RT improvement over the long-term learning phase seems to reach a ceiling throughout the second part of the long-term learning phase. This ceiling effect seems to have consolidated and can also be seen in Figure 2, where it is evident that RTs in the short-term learning phase are from the very beginning much faster in Experiment 2 than in Experiment 1.
Finally, a main effect of Rule revealed slower RTs to Rule A (mean RT = 525 msec) than to Rule B (mean RT = 498 msec). This effect was expected, because it indicates that participants learned about the relative probabilities of the two rules in both experiments and consequently responded faster to targets presented according to the most predictive one (Figure 3). Interestingly, an interaction of Rule and Trial further characterized this effect by showing that RTs decreased over this phase for targets presented according to Rule B whereas RTs to targets presented according to Rule A did not change. This would indicate that targets presented according to the less probable rule remained relatively surprising all through this phase, whereas participants progressively learned to anticipate targets following the most probable rule.
Test Phase
Results obtained in the test phase are illustrated in Figure 2 (right) and described in Figures 3 and 4. No main effects were found in the test phase. There was, however, an interaction of Experiment and Trial showing decreasing RTs over the course of this phase in Experiment 1 but not in Experiment 2 (Figure 4, left). Furthermore, an interaction of Rule and Trial was also observed, with decreasing RTs over trial for Rule A, and increasing RTs over trial for Rule B, as shown in Figure 2. Last and crucially, results showed a three-way interaction of Experiment, Rule, and Trial (Figure 3). This difference between experiments consisted of a stronger interaction of rule and trial in Experiment 2 than Experiment 1, as well as a main effect of Trial indicating that RTs decreased over trial in Experiment 1 but not in Experiment 2 (Figure 4, left). Visual inspection of the RT plots (see Figure 2) suggests that this difference lies in RTs to Rules A and B converging over trial in Experiment 1, but showing an X-shaped development in Experiment 2. To investigate the significance of this difference, we probed the effect of Rule at different stages of each experiment. To this end, we made simple-slope comparisons referencing the trial-predictor to the start and end (Figure 4, right) of each phase. Results revealed that, at the start of the test phase, there was a main effect of Rule in both Experiments 1 and 2, with longer RTs to Rule A than B, whereas at the end of this phase an effect of Rule was observed in Experiment 2, but not in Experiment 1. This effect showed an opposite pattern to that observed at the beginning of the phase, with longer RTs to Rule B than Rule A. Altogether, these results suggest that, in the neutral context of the test phase, participants started performing according to the relative probabilities they learned in the immediately preceding short-term learning phase. 
Over the course of the phase, however, RT differences between rules disappeared in Experiment 1, but reverted in Experiment 2 to show faster RTs to targets presented according to Rule A than to Rule B, a pattern similar to that observed in the long-term learning phase, as we describe at the end of the Results section.
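The simple-slope logic used above, in which the trial predictor is re-referenced to the start or end of a phase, can be sketched as follows. It assumes a linear model with a Rule term, a Trial term, and their interaction; the coefficient values are invented for illustration and are not estimates from the study.

```python
def rule_effect_at(trial_value, b_rule, b_rule_by_trial):
    """Effect of Rule when the (scaled) trial predictor equals trial_value.

    In a model y = b0 + b_rule*Rule + b_trial*Trial + b_rule_by_trial*Rule*Trial,
    re-referencing Trial to trial_value turns the Rule coefficient into
    b_rule + b_rule_by_trial * trial_value.
    """
    return b_rule + b_rule_by_trial * trial_value


# Invented coefficients on the z-scored scale (not results from the study).
B_RULE = 0.0             # Rule A minus Rule B at the middle of the phase
B_RULE_BY_TRIAL = -0.2   # change in that difference per unit of scaled trial
START, END = -1.73, 1.73  # approximate scaled values of the first and last trial

effect_at_start = rule_effect_at(START, B_RULE, B_RULE_BY_TRIAL)
effect_at_end = rule_effect_at(END, B_RULE, B_RULE_BY_TRIAL)
```

With these invented values the Rule difference is positive at the start and negative at the end, which is the X-shaped pattern described for Experiment 2; a converging pattern like that of Experiment 1 would instead show a positive difference at the start that shrinks toward zero by the end.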
Short-Term Learning and Test Phases: ERP Results
Short-Term Learning Phase
Results obtained in the short-term learning phase are illustrated in Figure 5 (left), and medians and credible intervals of parameter values can be found in Figures 6 (general model) and 7 (grouped by experiment). A main effect of Trial showed that the amplitude of P3 decreased over the course of the phase. However, an interaction of Experiment and Trial further clarified this effect by showing that amplitude decreased in Experiment 2, but not in Experiment 1 (see Figure 7, left), hence suggesting that participants in Experiment 2 were somehow influenced by their pre-exposure to the long-term learning phase. This effect seems to be essentially driven by the pronounced reduction of P3 amplitude to targets presented according to the most predictive rule (B) in Experiment 2, which presumably reflects a practice effect produced by participants' previous exposure to the extensive long-term learning phase. That previous experience may have allowed them to learn about the general structure of the experiment (one rule is much more predictive than the other). As a consequence, the predictability of targets presented according to the most predictive rule would increase at a higher rate over the phase for participants pre-exposed to the long-term learning phase than for participants not pre-exposed. This difference would be reflected in P3 amplitude reduction.
Results also revealed a main effect of Rule (Figure 6), with higher amplitudes for Rule A (mean amplitude = 3.75 μV) than B (mean amplitude = 1.14 μV), that is, higher for targets presented according to the less predictive rule and therefore more surprising or unexpected. This anticipated effect was further explained by an interaction of Rule and Trial, with decreasing amplitudes for Rule B but not A, suggesting that participants became progressively better at anticipating targets presented according to the most predictive rule, while targets presented according to the alternative rule remained surprising throughout the phase. These results parallel those observed in the analysis of RTs. The interaction of Experiment and Rule, showing a bigger difference in amplitudes between rules in Experiment 2 (mean amplitude Rule A = 4.69 μV; mean amplitude Rule B = 0.98 μV) than in Experiment 1 (mean amplitude Rule A = 2.86 μV; mean amplitude Rule B = 1.29 μV), suggests again the influence of the long-term learning phase on participants' performance in the short-term learning phase in Experiment 2. ERP waveforms obtained in response to Rules A and B in Experiments 1 and 2 in the short-term learning phase are shown for illustration purposes in Figure 8A.
Test Phase
Results obtained in the test phase are illustrated in Figures 5 (right) and 8; means and credible intervals of parameter values can be found in Figures 6 and 7. A main effect of Trial was observed, showing that amplitudes decreased over the course of the phase. Crucially, results revealed a three-way interaction of Experiment, Rule, and Trial (Figure 6). The difference between experiments (see Figure 7, left) consisted of an interaction of Rule and Trial in Experiment 2 and no interaction in Experiment 1. To investigate the significance of this difference, we proceeded the same way as we did for the RTs, with simple-slope comparisons. No effect of Rule was observed in either experiment at the beginning of this phase. Over the course of the phase, however, differences in P3 amplitude between rules emerged in Experiment 2 but not in Experiment 1. Interestingly, as observed in RT analyses, these differences showed a pattern similar to that observed in the long-term learning phase, as we describe in the following section. ERP waveforms obtained in response to Rules A and B in Experiments 1 and 2 in the test phase are shown in Figure 8B.
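The logic of a simple-slope comparison can be illustrated with a minimal sketch. It assumes a linear model with dummy-coded predictors (Experiment 1 = 0, Experiment 2 = 1; Rule A = 0, Rule B = 1), which is not necessarily the coding used in the reported models. The coefficient values are hypothetical numbers chosen only so that the slopes of the two rules diverge in one experiment and not in the other; they are not fitted estimates.

```python
# Simple slopes of Trial for each Experiment x Rule cell, derived from the
# coefficients of a model of the form y ~ Experiment * Rule * Trial.
# ASSUMPTIONS: dummy coding (Experiment 1 = 0, Experiment 2 = 1; Rule A = 0,
# Rule B = 1) and hypothetical coefficient values, not the fitted estimates.


def simple_slope(coefs, experiment, rule):
    """Return the slope of Trial at the given dummy-coded Experiment and Rule."""
    return (
        coefs["trial"]
        + coefs["experiment:trial"] * experiment
        + coefs["rule:trial"] * rule
        + coefs["experiment:rule:trial"] * experiment * rule
    )


def rule_slope_difference(coefs, experiment):
    """Trial slope for Rule B minus Trial slope for Rule A in one experiment."""
    return simple_slope(coefs, experiment, 1) - simple_slope(coefs, experiment, 0)


# Hypothetical coefficients (change in amplitude per unit of Trial).
hypothetical_coefs = {
    "trial": -0.10,
    "experiment:trial": 0.00,
    "rule:trial": 0.00,
    "experiment:rule:trial": 0.30,
}

for experiment in (0, 1):
    difference = rule_slope_difference(hypothetical_coefs, experiment)
    print(f"Experiment {experiment + 1}: slope(B) - slope(A) = {difference:+.2f}")
```

In a Bayesian analysis, the same arithmetic would be applied to every posterior draw of the coefficients, so that each simple slope and each slope difference has its own posterior distribution and credible interval.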
Long-Term Learning Phase
Behavioral results obtained in the long-term learning sessions of Experiment 2 are described in Figure 9 (left) and illustrated in Figure 10A. There was a main effect of Session, with longer RTs in Session I (mean RT = 466 msec) than in Session II (mean RT = 427 msec), indicating a general effect of practice. There was also an effect of Rule in both Sessions I and II, with longer RTs to Rule B than A, showing that participants learned about the relative predictive value of the two rules and anticipated the most probable targets. In addition, in Session I, there was an effect of Trial, with decreasing RTs over the course of the session. No such effect was observed in Session II. This would be indicative of a ceiling effect on learning. Last, there was an interaction of Rule and Trial in Session I, with a sharper decrease of RTs in response to Rule A than to Rule B. Again, there was no such interaction in Session II. On the one hand, this interaction would also be indicative of the ceiling effect suggested above. On the other hand, it would show that participants became better over the course of the session at anticipating targets presented according to the most predictive rule, whereas targets presented according to the alternative rule remained relatively surprising.
ERP results obtained in the long-term learning Session II of Experiment 2 are described in Figure 9 (right) and illustrated in Figure 10B. There was a main effect of Rule, showing higher amplitudes in response to the least predictive Rule B (mean amplitude = 6.76 μV) than to Rule A (mean amplitude = 0.75 μV). There was also a main effect of Trial, revealing that amplitudes decreased over the course of the phase. This effect was further clarified by an interaction of Rule and Trial, showing decreasing amplitudes over the course of the phase in response to targets presented according to Rule B, with no amplitude modulations for targets presented according to Rule A.
Here, we investigated the adaptability of predictions to different timescales. We measured RT and P3 amplitude as behavioral and neural correlates of PE. In the two experiments reported above, we found effects on both measures, showing probability-related differences between targets following the most and the least predictive rules in every learning phase of both experiments. Over and above these effects, we found that, for participants who performed the long-term learning phase prior to the short-term learning and test phases, both RTs and P3 amplitude showed a spontaneous transition from the short-term to the long-term learned associations in the test phase: Differences observed in the short-term learning phase faded throughout the test phase and were replaced by those learned in the long-term learning phase. No such transition was observed in participants that were not pre-exposed to the long-term learning phase. This pattern of results indicates that long-term associations remained intact whereas short-term associations were learned and transiently used while valid, and then no longer used when the context changed. The spontaneous recovery suggests independent storage and control of long-term and short-term associations. Before we further consider the implications of our main finding, we will briefly discuss the pattern observed in the long- and short-term learning phases of the two experiments.
P3 amplitude was larger, and RTs slower, in response to targets presented according to the less probable rule in every learning phase of both experiments (short-term in both experiments, long-term in Experiment 2). Furthermore, these differences increased throughout every learning phase, indicating that participants learned about the probabilities and consequently predicted the most probable outcome, whereas targets presented according to the alternative rule remained surprising throughout each of those phases. At a behavioral level, this interpretation is supported by numerous studies showing that RTs are faster for targets that can be anticipated and slower for unpredicted, surprising stimuli (Meyniel, Maheu, & Dehaene, 2016; Huettel, Mack, & McCarthy, 2002; Hyman, 1953). Regarding ERP modulations, previous studies on P3 have shown that its amplitude is sensitive to subjective probability, that is, to estimation of the environment on the basis of previous observations and learning (Mars et al., 2008; Donchin & Coles, 1988). In other words, P3 is sensitive to the discrepancy between the predicted state of the environment and its actual state, that is, to PE. However, in contrast to other error-related components, such as N1 or the MMN (Stefanics & Czigler, 2012; Todorovic, van Ede, Maris, & de Lange, 2011), which are related to sensory processing, P3 is considered a high-level error correlate, associated not with specific sensory features but with the processing of global contextual deviations (Bekinschtein et al., 2009) and with the update of perceptual evidence (Ehinger et al., 2015; Kelly & O'Connell, 2013; O'Connell et al., 2012; Sutton, Braren, Zubin, & John, 1965). The modulations of RTs and P3 amplitude observed in the long-term and short-term learning phases of the present experiments are therefore consistent with these views.
Such modulations would reflect the violation of an abstract rule (same or opposite direction), rather than a mismatch between the physical attributes of the predicted and actual targets, which are different on a trial basis and thus would not create a strong sensory model on the basis of which physical characteristics of the target could be predicted. This differs from most other studies, in which sensory predictions are based on strong contextual regularities that favor the creation of such a model (Hsu et al., 2015; Todorovic & de Lange, 2012; Bekinschtein et al., 2009; Garrido et al., 2009; Schadow et al., 2009). This explains why no modulations in components related to sensory prediction, such as N1, were found.
Our results are also congruent with previous works that have studied the relationship between P3 amplitude and stimulus improbability expressed in terms of statistical surprise (Kolossa et al., 2015; Mars et al., 2008; Strange, Duggins, Penny, Dolan, & Friston, 2005). In this regard, Mars et al. (2008) demonstrated that trial-by-trial fluctuations in P3 amplitude could be explained in terms of participants keeping track of the global probabilities of visual events, so that its amplitude was reduced or enhanced as a function of surprise. Although the design used in this study is much simpler in terms of probabilities, our results fit this interpretation: P3 amplitudes in response to targets presented according to the infrequent rule were systematically larger on a trial basis, and P3 amplitudes in response to targets presented according to the dominant rule decreased throughout every single learning phase as participants learned about their high probability. Results observed in the test phase of Experiment 1, where differences in P3 amplitude rapidly disappeared as participants learned that both rules were equally predictive, are also consistent with this interpretation.
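To make the notion of statistical surprise concrete, the following minimal sketch computes the surprise of each target as the negative base-2 logarithm of its estimated probability, where the estimate is a running frequency count over previous trials. The counting observer, the pseudo-count of 1 per rule, and the fixed 80/20 trial sequence are assumptions made for illustration only; the cited studies used more elaborate observer models, and no analysis of this kind was run on the present data.

```python
import math

# Trial-by-trial surprise under a simple frequency-counting observer.
# ASSUMPTIONS: the observer, the pseudo-count, and the trial sequence are
# illustrative and are not taken from the study's analyses.


def running_surprise(outcomes, pseudo_count=1.0):
    """Return the surprise (in bits) of each outcome given all earlier outcomes."""
    counts = {"A": pseudo_count, "B": pseudo_count}
    surprises = []
    for outcome in outcomes:
        probability = counts[outcome] / (counts["A"] + counts["B"])
        surprises.append(-math.log2(probability))
        counts[outcome] += 1.0  # update the estimate only after the outcome is seen
    return surprises


def mean_surprise_by_rule(outcomes, surprises):
    """Average the surprise values separately for Rule A and Rule B trials."""
    means = {}
    for rule in ("A", "B"):
        values = [s for o, s in zip(outcomes, surprises) if o == rule]
        means[rule] = sum(values) / len(values)
    return means


# Hypothetical short-term learning phase: Rule B on 80% of trials, Rule A on 20%.
sequence = list("BBBBA" * 20)
surprises = running_surprise(sequence)
means = mean_surprise_by_rule(sequence, surprises)
print(f"mean surprise, Rule A: {means['A']:.2f} bits")
print(f"mean surprise, Rule B: {means['B']:.2f} bits")
```

Under these assumptions, targets following the infrequent rule carry more surprise than targets following the dominant rule, which is the qualitative ordering observed for P3 amplitude.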
Despite the similarities, there were clear differences between experiments when comparing the short-term learning phases. In Experiment 1, RTs became faster throughout the phase in response to targets presented according to either rule, and the growing difference between rules arose because RTs to targets that followed the dominant rule sped up at a higher rate. In Experiment 2, the RT difference between rules had a different origin. Indeed, rather than speeding up, RTs overall slowed down, and the growing difference between rules relied on RTs to surprising targets becoming progressively slower, whereas RTs to predictable targets became slightly but significantly faster. This pattern suggests that, in Experiment 2, the previous exposure to the long-term learning phase influenced performance in the new context participants encountered in the short-term learning phase. This influence may be twofold. First, it may reflect a general effect of practice, so that participants reached a performance level close to ceiling throughout the long-term learning phase and consequently there was very little room for further RT improvement in the short-term learning phase. This would in part explain the fact that RTs did not further decrease during this phase. Second, and most importantly, the RT increase in response to targets presented according to the infrequent rule in Experiment 2 suggests a more direct interference of the associations learned in the long-term learning phase with performance in the short-term learning phase. Specifically, the rule that was the most predictive in the long-term learning phase is, in the short-term learning phase, the least likely to predict the target, and participants need to learn about that.
This learning process relies on practice and on accumulating enough contextual data (Bogacz, 2007; Gold & Shadlen, 2007), which takes time. While not enough information was yet available, participants could have relied on their recent experience in the long-term learning phase to make their predictions. They would then have needed to make an effort to inhibit their responses to the formerly most, now least, predictive rule, which would be reflected in the RT increase.
The pattern observed in the long- and short-term learning phases allows us to interpret our main findings, namely, the differences between experiments in the test phase and, specifically, the spontaneous recovery of long-term associations in Experiment 2. As described in the Methods section, the test phase was identical in the two experiments, where targets could be predicted according to either of the two possible rules with equal probability. Results showed that participants rapidly learned about these contextual changes, as indicated by the immediate cancellation of the differences between rules in P3 amplitude and by the progressive fading and eventual suppression of RT differences. The existence of differences in RT at the beginning of this phase in the two experiments would indicate that the associations learned in the immediately preceding short-term learning phase were still being used by participants to anticipate the targets, until enough information about the new rules' relative probabilities was collected. Importantly, whereas in Experiment 1 differences in both variables disappeared after a number of trials, indicating that the new relative probabilities were learned, in Experiment 2 results reverted to a pattern similar to that observed in the long-term learning phase, that is, faster RTs and reduced P3 amplitude in response to targets presented according to the rule that most likely predicted them in that phase. As in the short-term learning phase, this result can only be explained by the previous exposure to the long-term learning phase of participants in Experiment 2. More specifically, it would reflect the system's attempt to interpret contextual information and predict upcoming events according to the rule that dominated in the long-term learning phase once the alternative rule, valid in the short-term learning phase, could not effectively predict targets anymore.
As said above, gathering evidence to uncover the relative probabilities of events and to adjust predictions accordingly takes time, and until this was achieved, the system might have tried to optimize this process by exploiting all sources of information available. This would include not only the sensory information extracted from the environment while performing the test phase but also the knowledge built upon similar past experiences (Domenech & Dreher, 2010), that is, during the long-term learning phase. This look back into previous knowledge would have been additionally impelled by the significantly larger number of trials included in the long-term learning phase: From a participant's perspective, the associations learned during the short-term learning phase might have been perceived as transiently valid only, leading participants to rely on those learned in the long-term learning phase until enough new information was available. Consequently, if the test phase extended in time, we would expect participants to eventually learn about the new relative probabilities and differences between the two possible targets to disappear, showing a similar pattern to that observed in Experiment 1.
The spontaneous recovery of the associations learned in the long-term learning phase indicates that these were not permanently replaced by short-term associations, but remained accessible to be used when the transiently useful short-term associations were no longer valid. This suggests independent control of short- and long-term associations. The question at this point would be whether or not this independent control requires the operation of independent mechanisms. There is increasing evidence that learning invokes multiple processes working at different timescales. In the Pavlovian learning literature, it has been shown that long-term associations between stimulus and behavior spontaneously recover after being temporarily replaced by alternative stimulus–behavior associations conditioned on a shorter timescale (see for instance Bouton, 1993, for a review), which would indicate the participation of at least two mechanisms, one implementing the short-term association for transient adaptation and the other preserving the long-term association to be reused when conditions revert to the most common state. Working on perceptual learning, Bao and Engel (2012a, 2012b) ran a series of experiments in which participants were adapted to high contrast for a relatively long time to be later deadapted to a lower contrast for a shorter period. They found that, although the short phase initially produced deadaptation, adaptation effects spontaneously recovered when participants were tested in a neutral environment. They concluded that a single controlling mechanism could not account for the recovery effects, and that perceptual learning was possibly controlled by a continuum of mechanisms acting over a large range of timescales (Bao & Engel, 2012a).
Along the same lines, Vul et al. (2008) had previously demonstrated that the McCollough effect, an orientation-contingent color aftereffect, is a product of two distinct and separable timescales of learning in early visual cortex. Similar examples can also be found in motor learning research, where Smith et al. (2006) showed that, in motor learning of reach, adaptation depends on two distinct processes with different characteristics that, importantly, operate at different learning rates, favoring motor adaptation to disturbances in the environment or within the motor system.
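How two processes with different learning rates can produce spontaneous recovery may be easier to see in a small simulation. The sketch below implements a generic two-state learner in the spirit of the motor learning account just described: a fast process that learns and forgets quickly, and a slow process that learns and forgets slowly, both driven by the same error. The parameter values, the phase lengths, and the use of a zero-error test phase are assumptions chosen for illustration. This is not a model fitted to the present data, and the neutral test phase of our experiments is not an error clamp.

```python
# Two-state learner: a fast and a slow process updated by the same error.
# ASSUMPTIONS: parameter values, phase lengths, and the zero-error test phase
# are illustrative choices, not quantities estimated in this study.


def simulate_two_state(schedule, a_fast=0.59, b_fast=0.21, a_slow=0.992, b_slow=0.02):
    """Return the net output on each trial.

    schedule holds the value to be learned on each trial; None marks a trial
    on which the error is forced to zero, so both states only decay.
    """
    x_fast = 0.0
    x_slow = 0.0
    outputs = []
    for target in schedule:
        output = x_fast + x_slow  # net prediction on this trial
        outputs.append(output)
        error = 0.0 if target is None else target - output
        x_fast = a_fast * x_fast + b_fast * error  # forgets fast, learns fast
        x_slow = a_slow * x_slow + b_slow * error  # forgets slowly, learns slowly
    return outputs


n_long, n_short, n_test = 400, 15, 60
schedule = [1.0] * n_long + [-1.0] * n_short + [None] * n_test
outputs = simulate_two_state(schedule)

end_long = outputs[n_long - 1]             # last trial of the long phase
end_short = outputs[n_long + n_short - 1]  # last trial of the short, opposite phase
peak_test = max(outputs[n_long + n_short:])

print(f"end of long phase:  {end_long:+.2f}")
print(f"end of short phase: {end_short:+.2f}")
print(f"peak in test phase: {peak_test:+.2f}")
```

With these settings, the net output is reversed by the short phase and then drifts back toward the sign acquired in the long phase once the error is removed, because the fast state decays while the slow state has barely changed.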
Taken together, those works suggest that the participation of multiple operators subserving learning at distinct timescales might be a widely implemented resource, providing humans with the necessary flexibility to successfully adapt to ever-changing contexts. However, although hypotheses proposing that different independent operators control learning and adaptation at different timescales fit well in simple scenarios like those described above, they are problematic when explaining more complex situations. More specifically, in our study, such an explanation would imply the contribution of as many operators as sets of associations learned. That is, the associations participants would eventually learn in the test phase of Experiment 2 would call for a third operator, a hypothetical additional phase with different associations presented after the test phase would require the operation of a fourth mechanism, and so on. Alternatively, we think that the pattern of results found in our study requires that the associations learned at different timescales be stored and controlled separately but does not, strictly speaking, require the operation of separate mechanisms. In this regard, although this study does not allow us to draw conclusions about the neural mechanisms underlying the operators supporting predictions at different timescales, we believe that our data fit quite well with previous findings about the biological bases of implicit statistical learning (Christiansen, 2019). There is strong evidence that implicit statistical learning activates the declarative and nondeclarative memory systems of the brain, essentially dependent on the medial temporal lobe (MTL) and the striatum, respectively (Batterink, Paller, & Reber, 2019; Reber, 2008, 2013; Squire, 1992). Specifically, neuroimaging results suggest a competitive interaction of the two systems in implicit statistical learning paradigms.
The MTL system, supporting more flexible and abstract learning that can be generalized to new retrieval contexts, would play a role in learning during early stages of training, supporting rapid initial acquisition of higher order associations in complex sequences, and making predictions about possible outcomes (Bornstein & Daw, 2012). As learning progresses and the statistical structure of the context is unraveled, the associations between stimuli would become more predictable, which would lead to the progressive disengagement of the MTL system and to the striatal system, which supports more specific and context-dependent learning, subsequently taking control (Poldrack et al., 2001).
The interaction between the MTL and striatal systems could hypothetically support the operation of predictions at different timescales observed in our study. Learning in each phase would be initially supported by the MTL system extracting the rules from the predictive sequences. Once the rules are learned, the activity in the MTL system would decrease and the striatal system would take control of the predictive activity. Importantly, the information learned in every phase would be stored by the MTL system. When there are no previously learned associations available, statistical learning would simply proceed as explained above. In the test phase of both experiments, however, the initial involvement of the MTL system would not only support the extraction of statistical regularities, but also activate previously learned sets of associations, preferentially the most recently valid for target prediction (i.e., those learned in the short-term phase), as indicated by the RT and P3 amplitude pattern of results. As these associations fail to accurately predict targets, in Experiment 1, participants would eventually learn that both rules predict targets with equal probability and, consequently, the RT and P3 amplitude differences between rules would disappear. In Experiment 2, however, the long-term set of associations would be retrieved. This retrieval might be responsible for the spontaneous recovery observed in RTs and P3 amplitudes. Because these rules do not efficiently predict targets either, we speculate that participants would finally generate a new set of rules based on what they learn about the environment, so that RT and amplitude differences between rules would eventually disappear, as observed in Experiment 1. Unfortunately, the limited length of the test phase does not allow us to check this hypothesis.
In summary, although our behavioral and EEG data present obvious limitations to gain insight into the neural mechanisms supporting predictions at different timescales, we consider that the pattern of results obtained in our study could be better explained on the basis of the competitive interaction between the MTL and striatal systems, rather than on the contribution of separate, independent mechanisms operating at different timescales. This interaction would provide the system with the necessary flexibility to allow multiple sets of rules to be latently available to respond to changes in the environment when needed. The long-term/short-term results presented here would thus represent just a particular sample of such dynamics.
Finally, as explained above, our results suggest that, in the test phases of both experiments, instead of immediately learning about the regularities of the new context and applying that knowledge to anticipate the targets, at first, participants used the associations that had been proven to correctly predict sensory inputs in the preceding experimental phases. This result is consistent with well-established findings on associative learning, particularly from interference paradigms (Bouton, 1991; Miller, Kasprow, & Schachtman, 1986; Spear, 1978, 1981; Thomas, 1981), according to which different sets of associations can be learned and maintained through changing conditions, remaining more or less available for future retrieval (Bouton, 1993). However, the ensuing question would be what determines which one, among the different sets of associations stored, would be preferentially retrieved. Data obtained in previous experiments suggest that the relative availability of the different sets depends on variables such as the degree of match between the context in which the sets were learned and the current context, the time that has passed since the associations were learned, and the relative strength of those associations, which in turn relies on several factors: Associative learning is incremental, so that, with practice and consistency, associations strengthen and performance becomes faster and more accurate (Schacter & Wagner, 2013).
Hence, the pattern of results obtained in the test phase, and particularly the spontaneous recovery of long-term associations in Experiment 2, would be explained as follows. Participants would initially apply the set of associations learned in the immediately preceding short-term learning phase, as those are the most recently acquired. Once these failed to consistently predict targets, participants would either generate a new set of associations, as results from Experiment 1 suggest, or first resort to the more extensively practiced and consolidated long-term associations, as shown by results in Experiment 2. As we conjectured before, in Experiment 2, participants would also finally learn the probabilistic structure of the test phase, eventually causing RT and P3 amplitude differences between rules to disappear, as in Experiment 1, if the phase were prolonged long enough.
In order to successfully operate in an ever-changing environment, the brain needs to be able to flexibly adapt its predictions on the basis of the information it gathers over time. This is particularly important when a stable context is transiently modified: The brain needs to adapt to changes quickly, but without weakening the previous representations, as the new associations are only transient. Accordingly, our data show that the long-term associations were not simply overwritten by the short-term ones, because they spontaneously recovered in a subsequent neutral context different from that in which they were learned, as the pattern of RT and P3 amplitude modulations in the test phases suggests. The results of our experiments indicate that when the context changes and the current associations turn out not to be valid anymore, participants retrieve and use alternative associations they have previously learned in order to adapt their predictions, at least until enough contextual information is available to generate a new set of associations. The fact that the long-term associations recovered spontaneously, without needing to be relearned, indicates that they remained intact while transiently replaced by the short-term ones and therefore supports the notion that they are stored and controlled independently from each other. Future research should attempt to further explore the generation, maintenance, and possible interaction of predictions through different successive contexts by systematically manipulating their temporal and probabilistic relationships and, particularly, to determine the underlying neural bases supporting those processes.
APPENDIX: PRIORS USED IN EACH MODEL
| Parameter | Parameter code (brms) | Prior | Prior decision |
| --- | --- | --- | --- |
| Fixed effects | b | Normal (0, 1) | Weakly informative prior of null effect |
| Intercept | Intercept | student_t (3, 0, 2.5) | Default |
| Cholesky factors of correlations random effects | L | lkj_corr_cholesky (1) | Default |
| Random intercept and random slopes | sd, sigma | student_t (3, 0, 2.5) | Default |
This work was supported by the French Agence Nationale de la Recherche (ANR) within the Programme franco-allemand en Sciences humaines et sociales (FRAL) 2016 (PROJECT ID: ANR-16-FRAL-0008 to F. W.). This research was partly supported by IdEx Université de Paris ANR-18-IDEX-0001.
Reprint requests should be sent to Álvaro Darriba, Integrative Neuroscience and Cognition Center (INCC), CNRS (UMR 8002) – Université de Paris, 45 rue des Saints-Pères, 75006 Paris, France, or via e-mail: firstname.lastname@example.org.
Álvaro Darriba: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Resources; Software; Visualization; Writing—Original draft; Writing—Review & editing. Sandrien Van Ommen: Data curation; Formal analysis; Methodology; Software; Visualization; Writing—Review & editing. Yi-Fang Hsu: Conceptualization; Writing—Review & editing. Florian Waszak: Conceptualization; Funding acquisition; Project administration; Supervision; Writing—Review & editing.
Florian Waszak, Agence Nationale de la Recherche (http://dx.doi.org/10.13039/501100001665), grant number: ANR-16-FRAL-0008.
Diversity in Citation Practices
A retrospective analysis of the citations in every article published in this journal from 2010 to 2020 has revealed a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .408, W(oman)/M = .335, M/W = .108, and W/W = .149, the comparable proportions for the articles that these authorship teams cited were M/M = .579, W/M = .243, M/W = .102, and W/W = .076 (Fulvio et al., JoCN, 33:1, pp. 3–7). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance.
Credible intervals depict the probability that a population effect falls within a given interval. As such, when an interval does not cross the 0 line, the probability is high that the effect found with the sample is a real effect.
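As an illustration of this reading, the following minimal sketch computes an equal-tailed 95% credible interval from a set of posterior draws and checks whether it excludes zero. The draws are simulated from a normal distribution with arbitrary mean and spread, standing in for the posterior of a single model coefficient. The equal-tailed definition, the interpolation rule for quantiles, and all function names are assumptions; the intervals reported in the figures may have been computed differently.

```python
import random

# Equal-tailed credible interval from posterior draws.
# ASSUMPTIONS: simulated draws, equal-tailed interval, linear interpolation
# between order statistics; not the study's actual posterior or method.


def quantile(sorted_values, q):
    """Return the q-th quantile of already sorted values by linear interpolation."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def credible_interval(draws, mass=0.95):
    """Return the (lower, upper) bounds that leave equal mass in both tails."""
    ordered = sorted(draws)
    tail = (1.0 - mass) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


def excludes_zero(interval):
    """Return True when the whole interval lies on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0


# Simulated posterior draws for one hypothetical coefficient.
rng = random.Random(1)
simulated_draws = [rng.gauss(0.5, 0.2) for _ in range(4000)]
interval = credible_interval(simulated_draws)
print(f"95% interval: [{interval[0]:.2f}, {interval[1]:.2f}]")
print("excludes zero:", excludes_zero(interval))
```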