Mechanisms of motor-sensory prediction are dependent on expectations regarding when self-generated feedback will occur. Existing behavioral and electrophysiological research suggests that we have a default expectation for immediate sensory feedback after executing an action. However, studies investigating the adaptability of this temporal expectation have been limited in their ability to differentiate modified expectations per se from effects of stimulus repetition. Here, we use a novel, within-participant procedure that allowed us to disentangle the effect of repetition from expectation and allowed us to determine whether the default assumption for immediate feedback is fixed and resistant to modification or is amenable to change with experience. While EEG was recorded, 45 participants completed a task in which they repeatedly pressed a button to produce a tone that occurred immediately after the button press (immediate training) or after a 100-msec delay (delayed training). The results revealed significant differences in the patterns of cortical change across the two training conditions. Specifically, there was a significant reduction in the cortical response to tones across delayed training blocks but no significant change across immediate training blocks. Furthermore, experience with delayed training did not result in increased cortical activity in response to immediate feedback. These findings suggest that experience with action–sensation delays broadens the window of temporal expectations, allowing for the simultaneous anticipation of both delayed and immediate motor-sensory feedback. This research provides insights into the mechanisms underlying motor-sensory prediction and may represent a novel therapeutic avenue for psychotic symptoms, which are ostensibly associated with sensory prediction abnormalities.
The ability to determine the origin of sensations is important for our adaptive behavior. For example, the sound of footsteps in a dark parking lot is not alarming if we determine that we have created the sound ourselves but could be a cause for alarm if we determine that the footsteps are generated externally by another agent (Stetson, Cui, Montague, & Eagleman, 2006). A critical distinction that our perceptual system makes when attempting to distinguish between self-generated and externally generated sensations relates to their predictability. In contrast to externally generated sensations, self-generated sensations are typically more predictable in nature. This inherent predictability means that the sensory consequences of self-initiated movements can be anticipated before their occurrence (Roussel, Hughes, & Waszak, 2013; Bays, Flanagan, & Wolpert, 2006; Wolpert, Ghahramani, & Jordan, 1995). This predictability is associated with a decrease in the perceived salience of sensations compared with equivalent sensations that are externally produced, a phenomenon known as sensory attenuation (Hughes, Desantis, & Waszak, 2013a; Ford & Mathalon, 2004; Blakemore, Wolpert, & Frith, 1998). Sensory attenuation has been formalized in feed-forward models of motor control, which postulate that, when an action is self-initiated, an efference copy of the motor command is produced and used to predict the sensory consequences of that action (Crapse & Sommer, 2008; Wolpert & Miall, 1996; Wolpert et al., 1995; von Holst, 1954). This efference copy acts to attenuate the (predictable) sensory experience resulting from a self-generated action, relative to when the sensory experience is produced by an external agent and hence is less predictable.
An assumption of the forward model is that self-produced sensations that are not accurately predicted in terms of their physical and temporal characteristics will be subject to less attenuation than self-produced sensations that are better predicted (Roussel et al., 2013; Blakemore, Frith, & Wolpert, 1999). There is substantial evidence in support of this hypothesis (Hughes, Desantis, & Waszak, 2013b; Behroozmand, Liu, & Larson, 2011; Bäss, Jacobsen, & Schröger, 2008; Heinks-Maldonado, Nagarajan, & Houde, 2006; Bays, Wolpert, & Flanagan, 2005; Heinks-Maldonado, Mathalon, Gray, & Ford, 2005; McClure, Berns, & Montague, 2003; Houde, Nagarajan, Sekihara, & Merzenich, 2002; Blakemore et al., 1999). One aspect of sensory prediction involves anticipating when sensations will occur (Arnal & Giraud, 2012; McClure et al., 2003). Several studies have investigated how modifying the onset of self-initiated sensations alters the degree of sensory attenuation (Oestreich et al., 2016; Whitford et al., 2011; Aliu, Houde, & Nagarajan, 2009; Bäss et al., 2008; Bays et al., 2005; Blakemore et al., 1999). In these studies, participants pressed a button to either produce a sound (Oestreich et al., 2016; Whitford et al., 2011; Aliu et al., 2009; Bäss et al., 2008) or receive tactile stimulation (Bays et al., 2005; Blakemore et al., 1999). The onset of sensory feedback was manipulated so that it occurred either immediately after the button press (i.e., 0-msec delay) or after a delay ranging from 50 to 500 msec. Consistently across these studies, imposing a delay between the action and resultant sensation reduced the amount of sensory attenuation compared with immediate feedback, with larger delays associated with lower levels of sensory attenuation.
These results suggest that we have a default expectation that sensory feedback will follow immediately from self-generated actions and that deviations from this expectation result in prediction errors that can produce an increase in the salience of the experienced sensations (van Laarhoven, Stekelenburg, & Vroomen, 2017; Bendixen, SanMiguel, & Schröger, 2012).
The aim of this study was to investigate whether the expectation that sensations follow immediately from actions is fixed or whether it can be modified with experience. Predictive coding accounts of sensory attenuation argue that prediction errors, such as those caused by delayed sensory feedback, are critical teaching signals that facilitate new learning and recalibrate or update our sensory expectations (Bogacz, 2017; Brown, Adams, Parees, Edwards, & Friston, 2013; Friston, 2005; Rao & Ballard, 1999). According to such accounts, repeated exposure to temporally delayed sensory feedback could lead to a recalibration of the expectation that “sensations follow immediately from actions.” However, this raises the question of whether all sensory expectations are malleable as a function of experience or whether more weight is placed on certain “fundamental” predictions (such as the prediction that sensations follow immediately from actions), such that these predictions are resistant to modification on the basis of experience.
On the one hand, there seems to be clear value in the perceptual system updating its expectations in response to changes in perceptual experience. For example, visuomotor recalibrations take place via trial-and-error learning when reaching for objects or when walking while wearing laterally displacing prism glasses; this also results in a negative aftereffect when the glasses are removed (movement biases opposite to the displacement of the glasses), which again results in visuomotor adjustment with experience to maintain goal-directed actions (Hatada, Miall, & Rossetti, 2006; Morton & Bastian, 2004). Furthermore, with regard to the temporal association between actions and sensations, Stetson et al. (2006) have shown that repeated exposure to a delay between a button press and a subsequent visual stimulus results in a change in temporal order judgments, such that participants were more likely to judge a visual stimulus that occurred immediately after an action as occurring before the button press. In other words, participants appeared to adopt a new baseline for their temporal expectations consistent with their experience of delayed feedback from their actions. Stetson et al. (2006) suggested that these temporal recalibrations are adaptive and occur across the life span. For example, in the haptic system, the delay between actions and resultant sensations changes as we grow and our limbs elongate (e.g., when touching your thumb to a surface, the resultant reafference takes longer to reach the brain if your arm is 50 cm long than if it is 10 cm long). The implication is that accurate and efficient sensory modulation would benefit from updating predictions regarding the timing of sensory signals across the course of development.
An alternate hypothesis is that we hold relatively rigid expectations regarding certain key predictions, such as the prediction that “sensations follow immediately from actions.” Such predictions develop from a lifetime of experience and, as such, might be expected to be less susceptible to change. There is evidence to suggest that we can hold fixed heuristics to optimize information processing. For example, in the perceptual domain, the axiom that “light shines from above” seems to be inherent or fixed as it governs perceptual experience early in development and is resistant to modification with experience (Champion & Adams, 2007; Kleffner & Ramachandran, 1992; Hershberger, 1970; Hess, 1950). Research has shown that newborn chicks reared in an environment where light shone from above or light shone from below preferentially pecked at pictures of seeds where the shadows were consistent with a light source shining from above, irrespective of the environment they were reared in and when controlling for learning to peck based on depth cues (Hershberger, 1970; Hess, 1950). Furthermore, Champion and Adams (2007) used a visual search task in humans to show that visual–haptic training of stimuli with different lighting orientations did not result in a recalibration of the prior belief that “light shines from above,” as evidenced by no change in visual search performance after training. This suggests that a prior of “light shines from above,” by default, governs perception at the preattentive stage of processing, despite training experience to the contrary. By extension, it is conceivable that the expectation that “sensations follow immediately from actions” may be an unmalleable perceptual rule that governs sensory experience and that persists even despite evidence to the contrary.
As such, the current study aimed to determine whether expectations regarding the timing of self-generated sensations are modifiable with training. Participants completed a task in which they repeatedly pressed a button to produce a tone, while EEG was continuously recorded. All participants underwent two training conditions, the order of which was randomized between participants. In the first condition, the tone was presented immediately subsequent to the button press; in the second condition, the tone was presented 100 msec subsequent to the button press. The dependent variable in this study was the amplitude of the N1 component of the ERP elicited by the tone. This component was of particular interest because prior research has shown that (other factors being equal) louder sounds evoke larger N1 amplitudes than softer sounds (Simmons, Nathan, Berger, & Allen, 2011; Mulert et al., 2005; Näätänen & Picton, 1987). The implication is that N1 amplitude provides a proxy measure of the perceived loudness of an auditory sensation and has been used as a standard measure of sensory suppression in past research (Timm, Schönwiesner, Schröger, & SanMiguel, 2016; SanMiguel, Todd, & Schröger, 2013; Bäss, Horváth, Jacobsen, & Schröger, 2011; Ford et al., 2001). Given that the focus of the current study is not on sensory suppression per se (i.e., a difference in N1 amplitude from a task where participants passively listen to tones) but rather differences in the predictability of self-generated sensations (i.e., a difference in N1 amplitude between delayed and immediate tones), it is also worth noting that N1 has been used to index sensory expectations, whereby unexpected stimuli typically evoke a larger N1 than expected stimuli (van Laarhoven et al., 2017; Oestreich et al., 2016; Hughes et al., 2013a; Lange, 2011; Bäss et al., 2008; Heinks-Maldonado et al., 2006; Heinks-Maldonado et al., 2005).
It was hypothesized that, before training, temporally delayed tones would be unpredicted by the motor-perceptual system, in keeping with the default assumption that “sensations follow immediately from actions.” This would be reflected in a larger N1 amplitude elicited by delayed tones compared with immediate tones, before training. However, if participants updated their sensory expectations with experience, then over the course of training, participants should learn to expect a delayed outcome after their actions, resulting in more accurate predictions on the timing of sensory feedback. This would be reflected in reduced N1 amplitude over the course of training. In contrast, if the expectation that “sensations immediately follow actions” was fixed and unmalleable, then delayed tones should remain poorly predicted even after extensive training, and correspondingly, N1 amplitude would be expected to remain relatively constant over training. The immediate training condition provides a necessary control for the possibility that N1 amplitude changes purely as a function of repeated exposure to a tone (Todorovic & de Lange, 2012).
Fifty University of New South Wales Sydney students participated for course credit. Five participants were excluded because of equipment failures (n = 2) and low signal-to-noise ratios (n = 3). The remaining 45 participants had a mean age of 19.4 years (SD = 3.2 years, range = 17–28 years), with 24 women. There were 37 right-handed, 3 ambidextrous, and 5 left-handed individuals, as measured by the Edinburgh Handedness Inventory (Ransil & Schachter, 1994). This study was approved by the University of New South Wales Sydney Human Research Ethics Advisory Panel (Psychology).
Stimuli and Procedure
The experiment was composed of a button-press-for-tone task (Figure 1A). Participants were required to press a button on a response pad (Model RB-530; Cedrus Corporation) with their dominant hand at a time of their choosing so long as it was subsequent to the presentation of a fixation cross on the monitor. The purpose of the fixation cross was to standardize the block length and to limit the occurrence of eyeblinks that were coincident with the tone (see below); it was emphasized to participants that the task was not speeded and that they should not press the button as soon as the fixation cross appeared. Pressing the button produced a pure tone (500 Hz, 100-msec duration, 5-msec rise-and-fall time), which occurred 17 msec after the button press (immediate tone) or 117 msec after the button press (delayed tone). The tone played binaurally over headphones (Sennheiser HD 201) at approximately 70 dB SPL. The fixation cross disappeared from the screen 1500 msec after tone offset, signaling the end of the trial. The next trial began after a random interval of between 1000 and 3000 msec. Each of the first 10 blocks of the experiment consisted of 60 button-press-for-tone trials. Each block was homogeneous in that it contained only immediate or delayed tones. Each block took approximately 6 min to complete.
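The timing of a single trial can be summarized in a short sketch. This is only an illustration of the intervals described above, not the MATLAB/Psychophysics Toolbox code used in the experiment; the function name, the dictionary keys, the uniform sampling of the intertrial interval, and the assumption that this interval starts at fixation offset are ours.

```python
import random

# Values taken from the task description.
TONE_LATENCY_MS = {"immediate": 17, "delayed": 117}
TONE_DURATION_MS = 100
FIXATION_OFFSET_AFTER_TONE_MS = 1500


def trial_timeline(press_time_ms, tone_type, rng=random):
    """Return event times (msec from fixation onset) for one trial.

    press_time_ms is the self-chosen button-press time; tone_type is
    "immediate" or "delayed".
    """
    tone_onset = press_time_ms + TONE_LATENCY_MS[tone_type]
    tone_offset = tone_onset + TONE_DURATION_MS
    fixation_offset = tone_offset + FIXATION_OFFSET_AFTER_TONE_MS
    # Assumption: the 1000-3000 msec interval is drawn uniformly and is
    # counted from fixation offset; the text does not specify either point.
    next_trial_onset = fixation_offset + rng.uniform(1000, 3000)
    return {
        "tone_onset": tone_onset,
        "tone_offset": tone_offset,
        "fixation_offset": fixation_offset,
        "next_trial_onset": next_trial_onset,
    }
```

For a button press 800 msec after fixation onset in a delayed block, the tone would start at 917 msec and the fixation cross would disappear at 2517 msec.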
Figure 1B shows the experimental design. Participants were initially given verbal instructions on how to perform the task and were encouraged to time their eyeblinks to when the fixation cross was not on screen. They then completed five practice trials of pressing for an immediate tone. After this, participants completed 10 training blocks, consisting of five blocks with immediate tones (immediate training condition) and five blocks with delayed tones (delayed training condition). Participants who were randomly assigned to the Imm-Del group (n = 24) completed five immediate tone blocks followed by five delayed tone blocks; participants in the Del-Imm group (n = 21) completed five delayed tone blocks followed by five immediate tone blocks. Participants took brief, self-paced breaks between blocks, and reminder instructions were displayed on-screen before beginning each block (these instructions were identical for immediate and delayed tone blocks and made no reference to any delay between button press and tone). After training, all participants underwent one block (60 trials) of a motor control condition. The motor control condition was identical to the immediate and delayed blocks, except that pressing the button did not result in a tone being played. The instructions and fixation cross were presented on a 24-in. BenQ XL2420T monitor (1920 × 1080 resolution, 120-Hz refresh rate). The background color of the screen was middle gray throughout the experiment; the text instructions and the fixation cross were white. All stimuli were presented using MATLAB R2012b with Psychophysics Toolbox extensions (Kleiner, Brainard, & Pelli, 2007; Brainard, 1997; Pelli, 1997).
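The block structure for the two training orders can be written out as follows. This is a minimal sketch of the design as described, and the function name and block labels are hypothetical.

```python
def block_sequence(group):
    """Return the ordered list of block types for one participant.

    group is "Imm-Del" or "Del-Imm". Each training phase has five blocks,
    and every participant finishes with one motor control block.
    """
    if group == "Imm-Del":
        first, second = "immediate", "delayed"
    elif group == "Del-Imm":
        first, second = "delayed", "immediate"
    else:
        raise ValueError(f"unknown group: {group!r}")
    return [first] * 5 + [second] * 5 + ["motor_control"]
```

Under this layout, the switch between training types falls between the fifth and sixth blocks for both groups.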
EEG Acquisition and Processing
EEG data were collected using a BioSemi ActiView system with a 2048-Hz sample rate, a 417-Hz bandwidth (3 dB), and a 24-dB/octave roll-off (Biosemi, 2012). The Ag/AgCl electrodes were connected to all 64 cap channels, with additional electrodes attached to the mastoids and nose as well as electrodes placed 1 cm from the outer canthi of both eyes and 1 cm under the left eye to monitor horizontal and vertical eye movements. Online referencing was to sensors located in the parietal region of the cap (CMS and DRL). The DC offsets were kept below 25 mV.
EEG data were processed using BrainVision Analyzer (v.2.1). The data were rereferenced offline to the average of linked mastoids and were then passed through a 0.5- to 30-Hz Butterworth bandpass filter. For all tone conditions, in each trial, an 800-msec epoch was created, which covered 200 msec before tone onset to 600 msec after tone onset. The same was done for the motor condition, except that the epoch was created around the button press. Eyeblink artifacts were corrected using the method of Gratton, Coles, and Donchin (1983), and epochs were rejected if peak amplitudes exceeded ±100 μV or if the maximum gradient exceeded 50 μV. The remaining epochs were then averaged to create a waveform for each block. A fast Fourier independent component analysis correction was applied to headphone-induced artifacts, which manifest as a sharp positive deflection at approximately 15 msec after tone onset. The average waveform of the motor control block was subtracted from the immediate tone blocks to produce a motor-corrected waveform to the tone alone, consistent with previous sensory suppression literature (Mifsud, Beesley, Watson, & Whitford, 2016; Ford, Palzes, Roach, & Mathalon, 2014; SanMiguel et al., 2013; Bäss et al., 2011). The same procedure was applied to the data from delayed blocks, except that the motor control waveform was time-shifted 100 msec before applying the motor correction, to match the timing of the onset of the button press, as has been done previously (Elijah, Le Pelley, & Whitford, 2016). Waveforms were baseline corrected using the prestimulus interval of −200 to 0 msec.
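The averaging, motor correction, and baseline steps can be illustrated with a simplified single-channel sketch. It is not the BrainVision Analyzer pipeline: filtering, ocular correction, the gradient criterion, and the headphone-artifact correction are omitted, and all function names are hypothetical. The direction of the time shift (pairing each tone-locked sample with a later press-locked sample) and the decision to subtract nothing where the shifted motor epoch runs out are our assumptions.

```python
def average_epochs(epochs, reject_uv=100.0):
    """Average epochs sample by sample, dropping any epoch whose
    absolute peak exceeds reject_uv (peak criterion only)."""
    kept = [e for e in epochs if max(abs(v) for v in e) <= reject_uv]
    if not kept:
        raise ValueError("all epochs were rejected")
    n = len(kept)
    return [sum(samples) / n for samples in zip(*kept)]


def motor_correct(tone_wave, motor_wave, shift_samples=0):
    """Subtract the press-locked motor waveform from a tone-locked waveform.

    With shift_samples > 0, tone-locked sample i is paired with
    press-locked sample i + shift_samples, which aligns the button press
    in the two waveforms when the tone follows the press by that lag.
    """
    corrected = []
    for i, value in enumerate(tone_wave):
        j = i + shift_samples
        # Assumption: beyond the end of the motor epoch, subtract nothing.
        motor = motor_wave[j] if 0 <= j < len(motor_wave) else 0.0
        corrected.append(value - motor)
    return corrected


def baseline_correct(wave, n_baseline):
    """Subtract the mean of the first n_baseline samples (prestimulus)."""
    base = sum(wave[:n_baseline]) / n_baseline
    return [v - base for v in wave]
```

At the 2048-Hz sample rate and without downsampling, a 100-msec shift would correspond to roughly 205 samples; immediate blocks would use a shift of 0, as in the procedure described above.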
Scalp topographies (Figure 4B) indicated that the N1 component was maximal at frontocentral sites, which is consistent with previous auditory suppression research (Whitford et al., 2017; Ford et al., 2014; Saupe, Widmann, Trujillo-Barreto, & Schröger, 2013). We analyzed data from an array of electrodes at which activation was maximal (Fz, FCz, and Cz), as is common practice in auditory ERP studies (Bednark, Poonian, Palghat, McFadyen, & Cunnington, 2015; Ford et al., 2014; Hughes et al., 2013b; Saupe et al., 2013; Luck, 2005). The amplitude of the auditory N1 was defined for each participant (for each condition) as the mean activity in a 20-msec time window centered around the most negative point on each participant's average waveform, in the time window of 50–150 msec. Similarly, scalp topographies were generated by extracting activity over a 20-msec time window around the most negative point between 50 and 150 msec on grand-averaged waveforms for each condition. The time windows for delayed tones for Blocks 1–5 averaged across all participants were 92–112, 90–110, 93–113, 92–112, and 94–114 msec, respectively. The time windows for immediate tones for Blocks 1–5 were 92–112, 92–112, 92–112, 88–108, and 94–114 msec, respectively. In all figures plotting N1 amplitude, the within-participant SEM was used to represent variability (Cousineau, 2005).
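The N1 measure described above (mean activity in a 20-msec window centered on the most negative point between 50 and 150 msec) can be sketched as follows. The function name and arguments are hypothetical, and treating both window edges as inclusive is our assumption.

```python
def n1_amplitude(wave, times_ms, search=(50, 150), half_width_ms=10):
    """Return (peak latency, mean amplitude) for the N1.

    wave and times_ms are equal-length sequences for one averaged
    waveform. The peak is the most negative sample inside the search
    window; the amplitude is the mean of all samples within
    half_width_ms of that peak (edges inclusive, an assumption).
    """
    candidates = [i for i, t in enumerate(times_ms)
                  if search[0] <= t <= search[1]]
    peak_index = min(candidates, key=lambda i: wave[i])
    peak_time = times_ms[peak_index]
    window = [v for v, t in zip(wave, times_ms)
              if abs(t - peak_time) <= half_width_ms]
    return peak_time, sum(window) / len(window)
```

Averaging over a window around the peak, rather than taking the single peak sample, makes the measure less sensitive to high-frequency noise in individual waveforms.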
Figure 2 displays the mean N1 amplitude elicited by immediate and delayed tones across the five blocks of training, separated by training order. Mean N1 amplitude was analyzed using a 2 × 2 × 5 mixed ANOVA with factors of Training order (Imm-Del or Del-Imm), Training type (delayed or immediate training), and Block (Training Blocks 1–5). Given that we were interested in the change in N1 across multiple blocks, this analysis used orthogonal linear trend contrasts.
There was a significant main effect of Training type, F(1, 43) = 4.160, p = .048, ηp2 = .088, and Block, F(1, 43) = 9.084, p = .004, ηp2 = .174. Critically, the Training type × Block interaction was significant, F(1, 43) = 7.683, p = .008, ηp2 = .152, indicating that the influence of training on N1 amplitude depended on the delay experienced between button press and tone. Figure 2 shows that N1 amplitude tended to decrease across the course of training with delayed tones but remained relatively stable across training with immediate tones. Notably, this pattern of change across blocks did not depend on the order of training, as indicated by a nonsignificant three-way interaction (Training order × Training type × Block: F(1, 43) = 0.24, p = .63, ηp2 = .005). That said, the order of training did have an impact on N1 amplitude: There was a significant main effect of Training order, F(1, 43) = 4.317, p = .044, ηp2 = .091, that was qualified by a Training order × Training type interaction, F(1, 43) = 5.644, p = .022, ηp2 = .116. Further exploration of this interaction with simple effects revealed that, collapsing across training blocks, whereas N1 amplitude to delayed tones did not differ significantly as a function of Training order, F(1, 43) = 0.261, p = .612, ηp2 = .006, N1 amplitude to immediate tones was significantly greater in the Imm-Del group than in the Del-Imm group, F(1, 43) = 13.180, p = .001, ηp2 = .235 (see Figure 2). The Training order × Block interaction was not significant, F(1, 43) = 0.731, p = .397, ηp2 = .017.
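For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The helper below is a small illustrative sketch with a hypothetical name, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

Applied to the Training type × Block interaction, F(1, 43) = 7.683, this gives a value that rounds to .152, matching the reported effect size.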
Because the three-way interaction was not significant, we collapsed across the two training order groups (Del-Imm and Imm-Del) in a follow-up analysis, which was aimed at unpacking the Training type × Block interaction. Figure 3 shows the resulting mean N1 amplitudes for immediate and delayed tones across training blocks, and Figure 4 shows the corresponding ERPs (A) and scalp topographies (B). Consistent with the pattern observed in Figure 3, trend analysis revealed a significant linear decrease in N1 amplitude across delayed training blocks, F(1, 44) = 14.293, p < .001, ηp2 = .245, but no significant linear trend in N1 amplitude across immediate training blocks, F(1, 44) = 0.324, p = .572, ηp2 = .007. Furthermore, a targeted comparison of data at the first block of training between each tone type revealed a significantly higher N1 amplitude for delayed tones compared with immediate tones, F(1, 44) = 11.182, p = .002, ηp2 = .203. There was no significant difference between the two tone types at the final block of training, F(1, 44) = 0.108, p = .744, ηp2 = .002.
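A linear trend test of this kind, for a single training condition collapsed across groups, can be sketched as a one-sample test on per-participant contrast scores. The weights (−2, −1, 0, 1, 2) are the standard orthogonal polynomial linear coefficients for five equally spaced levels; the function name is hypothetical, and this sketch does not reproduce the full mixed ANOVA.

```python
LINEAR_WEIGHTS = (-2, -1, 0, 1, 2)  # linear contrast for five blocks


def linear_trend_f(block_means_by_participant):
    """Return (F, (df_effect, df_error)) for a within-participant
    linear trend across five blocks.

    Each row holds one participant's mean N1 amplitude in Blocks 1-5.
    The contrast score is the weighted sum of the row; F is the squared
    one-sample t statistic testing the mean score against zero.
    """
    scores = [sum(w * x for w, x in zip(LINEAR_WEIGHTS, row))
              for row in block_means_by_participant]
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    f_value = mean ** 2 / (variance / n)
    return f_value, (1, n - 1)
```

With 45 participants, this yields 1 and 44 degrees of freedom, as in the trend analyses reported above. Because N1 is a negative-going component, a positive mean contrast score corresponds to a decrease in N1 magnitude across blocks.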
To further investigate the change in sensory modulation when switching from one timing expectation to another, a secondary analysis was run that focused on the period of transition between training types. Figure 5 shows N1 amplitude to the tone in the fifth training block (the final block of Phase 1 of the task, before the delay type switched) and in the sixth training block (the first block of Phase 2, after the delay type switched). These data were analyzed using a 2 × 2 mixed ANOVA with factors of Block (the final block of Phase 1 vs. the first block of Phase 2) and Training order (Del-Imm vs. Imm-Del). This revealed a significant main effect of Training order, F(1, 43) = 7.911, p = .007, ηp2 = .155, with a larger N1 amplitude in the Imm-Del group. There was no significant main effect of Block, F(1, 43) = 0.018, p = .894, ηp2 < .001. Importantly, there was a significant Training order × Block interaction, F(1, 43) = 6.472, p = .015, ηp2 = .131. Analysis of simple effects found no significant difference in N1 amplitude between delayed and immediate tones in the final block of Phase 1, F(1, 43) = 0.537, p = .468, ηp2 = .012, but a significantly smaller N1 amplitude for immediate tones compared with delayed tones in the first block of Phase 2, F(1, 43) = 18.298, p < .001, ηp2 = .299. Orthogonal comparisons revealed that the change in N1 amplitude from Phase 1 to Phase 2 fell short of significance for both the Del-Imm group, F(1, 20) = 3.87, p = .063, ηp2 = .162, and the Imm-Del group, F(1, 23) = 2.793, p = .108, ηp2 = .108.
There is substantial evidence that auditory stimuli that follow immediately from self-initiated actions normatively evoke smaller N1 amplitudes compared with physically identical stimuli that occur after a delay (Elijah et al., 2016; Oestreich et al., 2016; Whitford et al., 2011; Aliu et al., 2009; Bäss et al., 2008). Consistent with the literature, our results showed that, at the beginning of the relevant training phase, participants' N1 amplitudes were larger to delayed tones compared with immediate tones (see Figure 3), suggesting that delayed tones were unpredicted relative to immediate tones. This finding adds to the growing body of literature that suggests that, by default, we have a prior expectation for immediate sensory feedback after our own willed actions (Friston, 2005; Niemi & Näätänen, 1981). The primary purpose of the current study was to determine whether this expectation that “sensations follow immediately from actions” is amenable to modification with training or whether it is fixed and unchangeable with experience. We demonstrated that, with repeated exposure to a delayed action–sensation contingency (i.e., a willed button press for a tone presented 100 msec later), there was a significant decrease in the N1 amplitude elicited by the tone over the course of training. This suggests that participants were able to learn to anticipate delayed feedback from their actions, which was reflected in the extent to which the cortical response to these tones was attenuated with training. In other words, we suggest that there was an update of the sensory expectation as to “when” auditory sensations were predicted to result from a willed action.
Notably, the reduction in N1 amplitude over the course of delayed training cannot simply be a consequence of habituation due to repeated exposure to the tone (Todorovic & de Lange, 2012). If habituation were the sole source of any change in N1 amplitude, then we would expect a similar decrease in N1 amplitude over the course of training to both immediate and delayed tones. However, we found a significant interaction between the training tone (immediate vs. delayed) and exposure (Training Blocks 1–5), whereby the reduction in N1 amplitude with repeated exposure to the tone was more marked in the delayed training condition compared with the immediate training condition, which showed no significant reduction in N1 amplitude over training blocks (shown most clearly in Figure 3). In other words, the reduction in N1 amplitude in the delayed training condition was apparent over and above any change that occurred in the immediate training condition. This lack of significant change with immediate training is critical, as it suggests that repetition per se was not the determinant of change; rather, we can conclude that participants' temporal expectations were modified with the experience of delayed feedback. Further evidence for the importance of the action–sensation duration came from a secondary analysis whereby the change in N1 amplitude with a switch in training depended critically on the type of training before and after the switch (see Figure 5). After repeated experience with a particular type of tone (delayed or immediate), there was no difference in N1 amplitude between delayed and immediate tones; however, after a switch in the experienced tone type, a difference in N1 amplitude between delayed and immediate tones emerged. This result suggests that our recent sensory experience (in this case, for delayed or immediate feedback after the button press) modulates subsequent sensory attenuation. 
Taken together, these results provide strong evidence that sensory expectations regarding the timing of self-generated sensations are, at least to some extent, malleable given new temporal information.
The current findings add to previous research that has investigated temporal adaptation to delayed, self-initiated sensations (Cao, Veniero, Thut, & Gross, 2017; Elijah et al., 2016; Aliu et al., 2009). Aliu et al. (2009) investigated auditory suppression changes with repeated exposure to delayed sensory feedback using MEG. Participants underwent four blocks of pressing a button for a 100-msec delayed tone, and suppression to these tones was calculated as the difference between the cortical response to passively presented tones and delayed, self-initiated tones. Aliu et al. found an increase in cortical suppression of the m100 component (the MEG equivalent of the N1 component in EEG) across repeated exposure to the delayed tone. Cao et al. (2017) showed the same training effect on the m100 component with delayed action effects and showed that this temporal adaptation was modulated by the cerebellum. These existing studies are consistent with the idea that expectations for delayed sensory feedback from an action can be updated.
An important methodological issue in the sensory attenuation literature relates to the validity of comparing “active” and “passive” conditions. Most previous studies in the field have focused on comparing an “active” condition, in which tones are generated by a self-initiated motor action, with a “passive” condition, in which tones are externally generated without any action. This active-versus-passive comparison is problematic, as it introduces potential confounds that may render findings equivocal (Horváth, 2015; Hughes et al., 2013a). For example, attention may differ between conditions: In a passive condition, participants' attention is solely focused on the tone, whereas in an active condition, attention may be divided between the action and the tone. In light of previous research indicating that attention can influence N1 amplitude, whereby the more attention given to a stimulus, the larger the N1 it elicits (Saupe et al., 2013; Timm, SanMiguel, Saupe, & Schröger, 2013), it is possible that studies employing active-versus-passive comparisons may be tapping changes in attention rather than changes in temporal expectations. Furthermore, the fact that an active condition involves motor-related activity (from the button press) whereas the passive condition does not is also potentially problematic. Although movement-related activity is typically corrected during EEG processing (by measuring the motor-evoked potential generated in the motor-only condition and subtracting this from the active condition), this correction is suboptimal as it makes the (potentially erroneous) assumption that the auditory- and motor-evoked activity is additive in nature (Horváth, 2015). As such, any observed differences in N1 amplitude between active and passive conditions could potentially be due to differences in motor-related activity (or the correction of such activity), rather than differences in motor–auditory prediction per se.
In contrast, the current study relied on a comparison between two active conditions (in which participants pressed a button for either immediate or delayed tones). In this design, factors such as attention and motor activity are more closely matched for the critical comparison, and thus we can be more confident that the observed difference between the two conditions reflects a difference in temporal expectations.
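The additivity assumption behind the motor correction discussed above can be made concrete with a toy calculation. The following is a minimal sketch, not an analysis from this study: the waveforms, the `interaction` term, and the function name `motor_corrected` are all hypothetical, and the amplitudes are invented values in microvolts. It shows that subtracting a motor-only waveform recovers the auditory response exactly when the two sources sum linearly, but leaves a residual when they do not.

```python
def motor_corrected(active, motor_only):
    """Subtract the motor-only waveform from the active waveform, sample by sample."""
    if len(active) != len(motor_only):
        raise ValueError("waveforms must have the same number of samples")
    return [a - m for a, m in zip(active, motor_only)]


# Hypothetical amplitudes in microvolts (illustrative values, not real data).
auditory = [0.0, -2.0, -4.0, -1.0]    # "true" auditory-evoked response
motor = [0.5, 1.0, 0.5, 0.0]          # motor-evoked response (button press only)
interaction = [0.0, 0.5, 1.0, 0.0]    # assumed non-additive component

# Case 1: auditory and motor activity sum linearly in the active condition.
additive_active = [a + m for a, m in zip(auditory, motor)]

# Case 2: the active condition also contains an interaction term.
nonadditive_active = [a + m + i for a, m, i in zip(auditory, motor, interaction)]

recovered_additive = motor_corrected(additive_active, motor)
recovered_nonadditive = motor_corrected(nonadditive_active, motor)

print(recovered_additive)     # equals `auditory` when additivity holds
print(recovered_nonadditive)  # differs from `auditory` by the interaction term
```

In the second case, the most negative sample of the corrected waveform is smaller in magnitude than that of the assumed auditory response, so an apparent N1 "attenuation" would arise from the correction itself. Comparing two active conditions avoids relying on this subtraction for the critical contrast.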
The current findings are also consistent with previous work from our own laboratory in which we investigated people's ability to modify temporal expectations with delayed training (Elijah et al., 2016). Like the current study, this previous research was based on a comparison between two active conditions. However, Elijah et al. (2016) used a fully between-participant design: One group of participants received training in which button presses produced a delayed tone, whereas another group was trained with immediate tones. Pretraining and posttraining measures of N1 amplitude elicited by delayed and immediate tones were used to assess the effectiveness of training in each group. We showed that training with delayed tones led to an elimination of the baseline difference in N1 amplitude between immediate and delayed tones. Training with immediate tones, in contrast, resulted in these baseline differences being retained posttraining. As a product of the between-participant design, however, differences in the type of training received (i.e., immediate or delayed tones) were confounded with the order in which the critical test blocks were administered. Because different types of test trials were therefore confounded with participants' level of experience with the task, a direct statistical comparison of the effects of immediate versus delayed training was precluded. In contrast, the within-participant design of the current study allowed us to directly compare the effects of training with immediate or delayed tones, because all participants completed both training conditions (in counterbalanced order). As such, this study is, to our knowledge, the first to directly compare the differential effects of training with temporally expected (immediate) and unexpected (delayed) tones on sensory attenuation.
This direct comparison demonstrates that attenuation of N1 amplitude is indeed dependent on temporal experience, rather than merely reflecting habituation, or attentional and/or motor-related differences between conditions. More precisely, we suggest that the specific reduction in N1 amplitude that resulted from training with delayed tones, but not immediate tones, reflected the updating of temporal expectations, which enabled the anticipation of delayed feedback from self-initiated actions.
A further advantage of the current within-participant design over previous designs is that it allowed us to examine more closely the influence of switching participants' temporal experience—from immediate action effects to delayed action effects and vice versa—on their temporal expectations. At a theoretical level, it could be the case that experience of action–sensation delays results in the change of the expectation that “sensations follow immediately from actions” to an expectation of “sensations follow from actions after a specific delay” (i.e., there is no longer a default expectation for immediate feedback). Another possibility is that the experience of action–sensation delays leads to the formation of a new expectation (that “delayed sensations result from actions in the current context”), and this new expectation comes to dominate over the existing, default “immediate” expectation (that remains intact and unchanged) based on contextual information (i.e., recent experience of delayed feedback). In either of these cases, we would expect to see an increase in N1 amplitude to immediate feedback (indicating a violation of expectations) after repeated experience of delayed feedback. However, this was not the pattern we observed, as illustrated in Figure 5. On the contrary, in an analysis targeted at the period of transition between the two training types, we found evidence that a switch in training type from delayed to immediate tones did not result in a larger N1 amplitude for immediate tones. Indeed, when analyzing over the whole training period, we found that N1 amplitude elicited by immediate tones was significantly smaller when participants had received previous training with delayed tones, compared with when they had not, as shown in Figure 2.
These findings are inconsistent with an account under which the effect of training with delays is simply to shift the mean expectation of action–sensation delays, such that immediate tones are unexpected after prior training with delayed tones. At a general level, our data suggest that prior experience of delays results in greater sensory attenuation of both delayed and immediate tones in the future. One possibility is that training with delays widens the range of delays that is expected, rather than shifting the mean expectation (see also van Laarhoven et al., 2017; Shadmehr, Smith, & Krakauer, 2010). More specifically, our results are consistent with an account in which experience with delayed feedback triggers a recalibration of action–sensation expectations, which nevertheless retains a privileged status for immediate sensory feedback. This account can be described as follows. All participants begin the experiment with the default expectation that “sensations follow immediately from actions,” and immediate sensations will be subject to greater sensory attenuation. For the Imm-Del group, the first training phase involves immediate tones, and there is no violation of the default expectation and thus no trigger to recalibrate it. When this group then encounters delayed tones in the second training phase, the default expectation is violated, and recalibration is triggered, resulting in increased sensory attenuation of delayed tones as training proceeds, as we observed empirically. In contrast, participants in the Del-Imm group experience expectancy-violating delayed tones during the first training phase. This immediately triggers recalibration, thus increasing sensory attenuation of delayed tones as training proceeds (resulting again in a steadily decreasing N1 amplitude over training with delayed tones).
Critically, our data suggest that the perceptual system may accomplish this via a “global” increase in suppression over the range of the delay: Additional suppression is applied over the whole range from 0 to 100 msec (as illustrated in the Figure 6 schematic). This has the effect of attenuating the now-predictable delayed tones. However, as a side effect, when immediate tones are subsequently presented in the second training phase for the Del-Imm group, they are now subject to both the attenuation resulting from the default expectation and the additional suppression that occurred as a result of the recalibration process. Hence, for these participants, the N1 response to immediate tones will be especially small, as we observed empirically.
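The contrast between a "shift" account and the "broadening" account described above can be expressed as a toy model. This is a schematic sketch under loudly stated assumptions, not a fitted model: the constants `BASELINE_N1`, `DEFAULT_SUPPRESSION`, and `TRAINED_SUPPRESSION` are arbitrary illustrative numbers, N1 is treated as a positive magnitude for simplicity, and both function names are hypothetical.

```python
BASELINE_N1 = 10.0          # hypothetical unattenuated N1 magnitude (arbitrary units)
DEFAULT_SUPPRESSION = 3.0   # assumed attenuation from the default "immediate" expectation
TRAINED_SUPPRESSION = 2.0   # assumed extra attenuation acquired through delayed training
TRAINED_DELAY_MS = 100      # delay used in the delayed training condition


def n1_shift_account(delay_ms, trained):
    """Shift account: the expectation moves from 0 ms to the trained delay."""
    expected_delay = TRAINED_DELAY_MS if trained else 0
    suppression = DEFAULT_SUPPRESSION if delay_ms == expected_delay else 0.0
    return BASELINE_N1 - suppression


def n1_broadening_account(delay_ms, trained):
    """Broadening account: the default expectation is retained, and delayed
    training adds suppression over the whole 0 to 100 ms range."""
    suppression = DEFAULT_SUPPRESSION if delay_ms == 0 else 0.0
    if trained and 0 <= delay_ms <= TRAINED_DELAY_MS:
        suppression += TRAINED_SUPPRESSION
    return BASELINE_N1 - suppression


for name, model in (("shift", n1_shift_account), ("broadening", n1_broadening_account)):
    for delay in (0, TRAINED_DELAY_MS):
        before = model(delay, trained=False)
        after = model(delay, trained=True)
        print(f"{name:10s} delay={delay:3d} ms  before={before}  after={after}")
```

Under the shift account, the modeled N1 to immediate tones grows after delayed training, whereas under the broadening account it shrinks for both immediate and delayed tones. Only the second pattern matches the direction of the effects reported here.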
The idea that we still expect immediate sensations from our actions despite experience with delayed feedback raises the question of whether expectations for immediate feedback are “hardwired” or occur because of a lifetime of experience. For example, as noted earlier, previous research suggests that raising chicks in an environment where light shines from below does not seem to influence chicks' prior belief that “light shines from above,” suggesting that this prior is innate and resistant to modification (Hershberger, 1970; Hess, 1950). It could be the case that the same rigidity is apparent in our belief that sensations follow immediately from actions. That said, it should be noted that the delay manipulation in our task altered only a subset of the participants' experiences. Specifically, although we introduced a delay between button press and tone, actions peripheral to this task would have still produced immediate sensory feedback. For example, if the participants shifted in their seat, this would create immediate auditory and somatosensory feedback; likewise, head movements would create immediate visual feedback. In this sense, the expectation that “sensations follow immediately from actions” would continue to receive partial confirmation even during the delayed training condition. Hence, it seems plausible that, under these conditions, participants could retain a privileged status for immediate auditory feedback, while simultaneously adjusting their expectations for delayed auditory feedback (as illustrated in Figure 6). This implies that our procedure, if anything, probably underestimates the ability of the sensorimotor system to adapt to “pure” changes in experienced action–sensation delays (i.e., training in which there are no “disconfirmatory” trials). 
Finally, the present findings raise the question of whether the adaptation of sensory attenuation that is produced by training is specific to the action–sensation relationship learned in the training context (e.g., a button press results in a 500-Hz tone in the psychophysiology laboratory) or whether the effects of training will generalize to other stimuli or contexts and possibly even other sensory modalities. This remains a question for future research.
In the discussion above, we have offered a simplistic account of the mechanisms that might underlie adaptability of temporal expectations. In reality, the situation is undoubtedly more nuanced; for example, the neural processing engaged by tasks such as this would be much more complex than simply categorizing temporal information as “delayed” or “immediate” based solely on past experience and using this information as a basis for sensory prediction. Instead, noise in the system, and in the world more broadly, would mean that people would expect a range of delays over a narrow window, based on what they have experienced in the past (Shadmehr et al., 2010; Wolpert & Miall, 1996; Wolpert et al., 1995). For example, during development, the conduction delays associated with nerve impulses initiated by peripheral tactile stimulation would change with limb growth (Campbell, Ward, & Swift, 1981). To account for such variation in the system, it is likely that the normative level of expected delay falls within a certain range. As such, it may only be when a set of experiences consistently falls outside this range (e.g., 100-msec delayed auditory feedback in the case of the current experiment) that a recalibration process is triggered. Nevertheless, predictive coding accounts suggest that priors are based on probability estimates of the likelihood of outcomes given the current state and the context, not only on past experience (Bogacz, 2017; Brown et al., 2013; Friston, 2005; Rao & Ballard, 1999). Consequently, recalibration of the expectation for delayed feedback may be very specific to the action–effect association that is learnt and may not generalize to experiences outside this experimental context.
As such, it may be the case that, for recalibration to trigger a more permanent expansion of the normative temporal window, the physical system must change (e.g., through limb growth or disease processes) so that most feedback from actions consistently occurs within a new time frame. Otherwise, recalibration may be specific to the context in which the temporal association between the action and the event is learnt.
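The suggestion that recalibration is triggered only when experienced delays consistently fall outside a normative range can be sketched as a simple rule. This is purely illustrative: the window bounds, the `run_length` threshold, and the function name `should_recalibrate` are assumptions introduced here, and the source does not specify the width of the window or how many violations would be required.

```python
def should_recalibrate(delays_ms, window_ms=(0.0, 50.0), run_length=5):
    """Return True once `run_length` consecutive delays fall outside the window.

    `window_ms` and `run_length` are hypothetical values chosen for illustration.
    """
    low, high = window_ms
    consecutive = 0
    for delay in delays_ms:
        if low <= delay <= high:
            consecutive = 0          # an in-window delay resets the count
        else:
            consecutive += 1
            if consecutive >= run_length:
                return True
    return False


# Consistent 100-ms delays trigger recalibration under this rule.
print(should_recalibrate([100.0] * 5))
# Delays that alternate with immediate feedback never form a long enough run.
print(should_recalibrate([100.0, 0.0, 100.0, 0.0, 100.0, 0.0]))
```

A rule of this kind tolerates occasional outliers from noise while still responding to a sustained change in action–sensation timing.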
The current finding that sensory attenuation is, to some extent, modifiable with experience may provide a potential avenue for treating clinical populations who show dysfunctions in sensory attenuation, such as patients with schizophrenia. Research has consistently shown that patients with schizophrenia show subnormal differences in N1 amplitude between self-generated and externally generated sounds (Ford et al., 2014; Whitford et al., 2011; Ford & Mathalon, 2004). This deficit in sensory suppression provides a potential explanation for some of the most characteristic symptoms of schizophrenia, such as delusions of control whereby misplaced salience attributed to willed actions leads to misinterpretation of these self-initiated movements as being controlled by an external force (Fletcher & Frith, 2009; Frith, 2005; Feinberg & Guazzelli, 1999; Feinberg, 1978). There is further evidence to suggest that this disruption in sensory attenuation may be due to deficits in predicting the timing of self-generated sensations (Oestreich et al., 2016; Whitford et al., 2011), possibly due to structural damage to white matter (Whitford et al., 2011). If temporal expectations are modifiable with training—as suggested by the results of this study—retraining patients' expectations regarding the timing of self-generated sensations may be a viable avenue of treatment for normalizing their sensory attenuation deficits. Furthermore, research suggests that neurocognitive dysfunctions in schizophrenia can be remediated with cognitive and behavioral training (Dale et al., 2016; Penadés et al., 2013; Subramaniam et al., 2012; Vinogradov, Fisher, & de Villers-Sidani, 2012). If the characteristic symptoms of schizophrenia indeed reflect sensory attenuation abnormalities, then normalizing these abnormalities by means of behavioral training might be expected to be clinically therapeutic and thus worthy of investigation in future studies (Whitford, Ford, Mathalon, Kubicki, & Shenton, 2012).
In conclusion, this study provides evidence that the sensory expectation that “sensations follow immediately from actions” can be modified with experience. However, this flexibility is conditional; the current data suggest that, although it is possible to learn to expect delayed feedback from our actions with training, we concurrently continue to expect immediate feedback, despite this training. These findings are consistent with the idea that the experience of temporally delayed sensations after actions results in a broadening of the temporal window over which sensations are expected to occur after a self-generated action.
We thank the two anonymous reviewers for their helpful and insightful comments. Ruth B. Elijah is supported by an Australian Postgraduate Award. The study was funded in part by Discovery Projects from the Australian Research Council (DP140104394 and DP170103094) awarded to Thomas J. Whitford and Mike E. Le Pelley. Thomas J. Whitford is supported by an NHMRC Career Development Fellowship (APP1090507).
Reprint requests should be sent to Ruth B. Elijah, School of Psychology, The University of New South Wales, Sydney, NSW 2052, Australia, or via e-mail: email@example.com.