Sensory responses to stimuli that are triggered by a self-initiated motor act are suppressed compared with the responses to the same stimuli triggered externally, a phenomenon referred to as motor-induced suppression (MIS) of sensory cortical feedback. Studies in the somatosensory system suggest that such suppression might be sensitive to delays between the motor act and the stimulus onset, and a recent study in the auditory system suggests that such MIS develops rapidly. In three MEG experiments, we characterize the properties of MIS by examining the M100 response from the auditory cortex to a simple tone triggered by a button press. In Experiment 1, we found that MIS develops for zero delays but does not generalize to nonzero delays. In Experiment 2, we found that MIS developed for 100-msec delays within 300 trials and occurs in excess of auditory habituation. In Experiment 3, we found that unlike MIS for zero delays, MIS for nonzero delays does not exhibit sensitivity to sensory, delay, or motor-command changes. These results are discussed in relation to suppression of responses to self-produced speech and to a general model of sensorimotor processing and control.
A key goal in neuroscience is to understand the complex interplay between the brain's sensory and motor systems, and a phenomenon where this interplay is readily observed is the suppressed sensory response to self-produced sensations. In human auditory cortex, this suppression is observed during speech (speaking-induced suppression, or SIS), where it manifests properties that elucidate how auditory feedback is processed: A speaker's auditory cortex responds to the sound of their own speech with an activation that is suppressed relative to the larger response during passive listening to playback of that speech (Eliades & Wang, 2003; Houde, Nagarajan, Sekihara, & Merzenich, 2002). Such suppression is highly specific to the auditory speech feedback: Responses to additional tone pips occurring during speech are not suppressed beyond that expected for acoustic masking, and if the subject's auditory feedback is artificially altered, the response to speech is restored to the levels observed during passive listening (Houde et al., 2002). This suppression profile suggests that the auditory cortex compares incoming auditory feedback to a prediction of expected feedback. Such a comparison is crucial because the brain is continuously assailed with sensory stimuli originating both externally and internally (self-produced), and it must accurately and continuously distinguish self-produced stimuli, which can generally be discarded, from external stimuli, which might require a response for proper interaction with the environment.
It has been postulated that this distinction is guided by a central monitor (Frith, 1992) or an internal forward model (Blakemore, Rees, & Frith, 1998; Wolpert, 1997; Wolpert, Ghahramani, & Jordan, 1995), which learns and predicts the sensory consequences of self-produced actions using a copy of the motor commands underlying those actions [variously referred to as "efference copy" (von Holst, 1954) or "corollary discharge" (Sperry, 1950)]. During speech, the efference copy enables the forward model to produce an accurate prediction of auditory feedback, resulting in a small prediction error, which translates to a minimal response in the auditory cortex. During passive listening, where the efference copy is unavailable, the forward model is unable to generate an accurate prediction of auditory feedback, resulting in a larger prediction error, which translates to a larger response in the auditory cortex. Under this hypothesis, a larger prediction error can be artificially created during speech by distorting the auditory feedback perceived by the subject, which again translates to a larger response in the auditory cortex.
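The forward-model account above can be sketched as a toy computation. This is an illustrative sketch only: the function names, the scalar "loudness" feedback, and the numeric values are assumptions for exposition, not part of the study.

```python
def forward_model(efference_copy):
    # Learned mapping from a motor command (efference copy) to predicted
    # sensory feedback; with no efference copy (passive listening), the
    # model effectively predicts silence.
    if efference_copy is None:
        return 0.0
    return efference_copy["predicted_loudness"]

def auditory_response(actual_feedback, efference_copy):
    # Cortical response modeled as proportional to the prediction error:
    # |actual feedback - predicted feedback|.
    predicted = forward_model(efference_copy)
    return abs(actual_feedback - predicted)

# Speaking with intact feedback: accurate prediction -> small response.
speech = {"predicted_loudness": 1.0}
suppressed = auditory_response(1.0, speech)

# Passive listening: no efference copy -> large response.
unsuppressed = auditory_response(1.0, None)

# Speaking with artificially altered feedback: prediction error returns.
altered = auditory_response(1.6, speech)
```

In this caricature, the comparator alone reproduces the three cases described above: self-produced feedback is cancelled, external stimuli are not, and distorting the feedback restores the response.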
A similar suppression phenomenon has been observed in the somatosensory system where responses to self-produced tactile stimuli are weaker relative to externally generated tactile stimuli (Blakemore, Wolpert, & Frith, 1998, 2000), and such suppression is sensitive to delays in stimulus delivery. These observations suggest that, akin to the auditory cortex, the somatosensory cortex processes sensory feedback by comparing it against an efference copy-based prediction of said sensory feedback. The similarities in suppression phenomena observed in the auditory and somatosensory cortex suggest that suppression observed in the auditory cortex during speech is not unique to the act of speaking, but instead, is a special case of a more general property of the auditory cortex: that it processes auditory feedback from any motor act by comparing incoming feedback against a prediction of that feedback derived from an efference copy of the motor command that produced the feedback, where this comparison results in motor-induced suppression (MIS).
Indeed, the auditory cortex does exhibit suppression for a more arbitrary pairing of a motor act and an auditory stimulus: EEG and MEG experiments have demonstrated that the auditory response to self-triggered tones is suppressed relative to the response while passively listening to the same tones. Schafer and Marcus (1973) demonstrated that the vertex EEG response was attenuated for self-generated auditory stimuli when compared to machine-generated stimuli. More recently, Martikainen, Kaneko, and Hari (2005) reported similar findings: MEG responses arising from the auditory cortex were attenuated for self-triggered tones, and this attenuation developed rapidly, within a block of 60 trials.
If MIS in the auditory cortex arises in the same way that SIS is hypothesized to arise, that is, from a comparison with an efference copy derived prediction, then it should have characteristics resulting from the properties of predictions. First, MIS in the auditory cortex should be a learned phenomenon, that is, it should not be immediately present, but require practice trials to develop. This follows from the hypothesis that MIS arises from comparison with a prediction, and that the prediction comes from a forward model that must be learned.
Second, a full sensory prediction should have two dimensions: It should specify not only what the predicted sensation will be (e.g., a tone), but also when the sensation will arrive (e.g., the tone is heard X msec after the button press). It is conceivable that the expectation of sensory type is learned separately from the expectation of sensory timing; that is, perhaps expecting that something (anything) will happen as a result of an action is learned separately from expecting what will happen because of that action. To test this properly, it is necessary to probe the response to sensory stimuli (the what) as well as the response to the same sensory stimuli in a paradigm where they are not immediately delivered (the when). One way to achieve this is to introduce a delay between the trigger (a motor act, in our case) and sensory stimulus onset. Arrival time should be an important part of learning a prediction because there are intrinsic time delays in the processes between the efferent motor command going out (e.g., to the finger muscles, which have a response latency) and the transduced sensory feedback coming in (e.g., via the auditory pathway to the auditory cortex, which includes transmission and synapse delays).
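The two-dimensional prediction described above can be made concrete with a small sketch in which a prediction is a (what, when) pair and either dimension can mismatch independently. The field names and tolerance values are illustrative assumptions, not quantities from the study.

```python
def prediction_error(actual, predicted, freq_tol_hz=50.0, time_tol_ms=50.0):
    """Toy two-dimensional comparator: a prediction specifies both *what*
    (tone frequency) and *when* (arrival time relative to the motor act).
    Returns (what_mismatch, when_mismatch) as booleans."""
    what_mismatch = abs(actual["freq_hz"] - predicted["freq_hz"]) > freq_tol_hz
    when_mismatch = abs(actual["onset_ms"] - predicted["onset_ms"]) > time_tol_ms
    return what_mismatch, when_mismatch

# A prediction learned from a zero-delay button-tone pairing.
learned = {"freq_hz": 1000.0, "onset_ms": 0.0}

# The same tone delivered 100 msec late: the *what* matches but the *when*
# does not, so a prediction learned at zero delay fails to cancel it.
what_bad, when_bad = prediction_error(
    {"freq_hz": 1000.0, "onset_ms": 100.0}, learned)
```

Separating the two checks makes explicit why a delayed but otherwise identical tone could escape suppression even when the stimulus identity is perfectly predicted.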
Two predictions about MIS in the auditory cortex follow from this. First, MIS in the auditory cortex should be sensitive to feedback delays, as indeed appears to be true for MIS in the somatosensory cortex (Blakemore et al., 2000; Blakemore, Frith, & Wolpert, 1999). Second, MIS should develop for different, artificially produced, feedback delays—a prediction that is easier to test with button-generated tones than with speaking. This follows from the assertion that arrival time is an essential component of a prediction, and that due to intrinsic neural processing delays, a nonzero arrival time must be learned even without additional artificial feedback delay.
Why might one expect a different response profile in a zero-delay condition versus a non-zero-delay condition? For one, the zero-delay condition is encountered frequently in the course of everyday living, such as during speech, and might be special as a consequence. It is reasonable to expect zero-delay MIS to require minimal training, because it is, in effect, an extension of everyday experience, and to exhibit characteristics distinct from non-zero-delay MIS. Furthermore, because a key element of our hypotheses is a learned internal model that accommodates sensory delays (Whitney, Murakami, & Cavanagh, 2000; Whitney & Murakami, 1998), it is important to examine the development of MIS when additional delays are introduced into the sensory feedback of self-triggered actions. We hypothesized that because zero-delay sensory feedback is encountered so frequently in everyday life, there may be an inherently higher threshold that training must surpass under delayed conditions, and perhaps different characteristics of MIS specificity under those conditions.
In the present study, we characterized the properties of MIS in three MEG experiments. In Experiment 1, we examined whether MIS in response to tones triggered by a button press is a learned phenomenon, and we further explored the specificity of this button–tone MIS for time delays between the motor act (the button press) and the sensory stimulus (the tone feedback). In Experiment 2, we investigated the hypothesis that nonzero delays between the motor act and the resulting sensory stimuli can induce MIS. Finally, in Experiment 3, we investigated whether MIS learned with nonzero tone delays exhibits specificity to the sensory stimulus, motor act, and delay on which it was trained.
The MEG M100 response was measured in 13 healthy right-handed subjects (8 men, 5 women; aged 20–40 years). All subjects provided informed consent as approved by the Committee on Human Research at our institution. The experiment consisted of six blocks: a training block, four test blocks, and a control block. In the training block, subjects pressed a plastic button with their right thumb at a self-paced rate of 0.5 Hz and heard a simple tone (1 kHz, 100 msec long, 5 msec rise/fall ramp, 80 dB SPL, binaural), immediately after the button press at 0 msec delay. Subsequent to this training block, in four test blocks subjects pressed a button and heard a simple tone at delays of either 0 (delay0s), 100 (delay0.1s), 300 (delay0.3s), or 500 (delay0.5s) msec. Note that the 0-msec block was identical to the training block. Finally, in a control block, subjects passively listened to a simple tone, identical to the tone in the training and test blocks, presented once every 2 sec. The training block was presented first and the control block last. The order of the intervening four test blocks was randomized across subjects. Each block consisted of 100 trials and a short break of 1 to 2 min was provided between blocks.
Seven healthy right-handed subjects (4 men, 3 women; aged 22–40 years) participated in this experiment with informed consent. Five of these subjects also participated in Experiment 1. Experiment 2 consisted of six blocks as well: two control blocks and four training blocks. The first and sixth blocks were controls, where subjects passively listened to a simple tone presented once every 2 sec, similar to the control block in Experiment 1. Blocks 2 to 5 were training blocks identical to each other where subjects pressed a button and heard a tone afterward at a constant 100-msec delay (delay0.1s). Each block consisted of 100 trials.
Thirteen healthy right-handed subjects (8 men, 5 women; aged 20–40 years) participated in this experiment with informed consent, five of whom also participated in Experiments 1 and 2. The experiment consisted of four training blocks followed by four test blocks and a control block. The four training blocks were identical to the training blocks of Experiment 2, thus also serving as a replication: subjects pressed a button with their right thumb once every 2 sec and heard a simple tone after a 100-msec delay (delay0.1s). Subsequent to these four training blocks, four test blocks were conducted. In one test block, subjects were asked to press the button with their left thumb and heard the 1-kHz tone after a 100-msec delay (motor; the hand switch signifies an alteration in the motor act). In a second test block (sensory), subjects pressed a button with their right thumb and heard a tone at a 100-msec delay but with a different frequency (0.5 kHz), the lower-frequency tone signifying an alteration in the sensory stimulus. In two other test blocks, subjects pressed a button with their right thumb and the delay between the button press and the tone was changed to either 0 or 200 msec (delay0s and delay0.2s) while the carrier frequency of the tones was fixed at 1 kHz. In a final control block, subjects passively listened to simple tones with frequencies of 0.5 or 1 kHz, presented randomly once every 2 sec. The order of presentation of the four test blocks was randomized across subjects. Each training and test block consisted of 100 trials. The control block consisted of 200 trials, with 100 randomly distributed trials for each tone frequency.
Data Acquisition and Analysis
MEG recordings (band-pass filtered from 0 to 300 Hz, sampling rate 1200 Hz) were obtained from the whole head in a magnetically shielded room using an Omega 275 biomagnetometer (VSM MedTech, Port Coquitlam, Canada). We were particularly interested in the evoked M100 response, which typically occurs ∼100 msec poststimulus (Farrell, Tripp, Norgren, & Teyler, 1980; Hari, Aittoniemi, Järvinen, Katila, & Varpula, 1980). Therefore, we created epochs time locked to the auditory stimulus (−300 to 500 msec). For each block, 100 responses were epoched off-line using CTF MEG software (VSM MedTech). During MEG recordings, subjects were fitted with position indicator sensors at anatomic landmarks (nasion, right and left auricle). These sensors were used to quantify motion, and in aligning MEG and MRI coordinate systems. Structural MR images were generally acquired on a 1.5-T GE Signa scanner (GE Healthcare, Milwaukee, WI) using a T1-weighted three-dimensional gradient-echo (3-D SPGR, 3-D spoiled gradient recalled acquisition in a steady state): flip angle = 40°, TR/TE = 27/6 msec, matrix = 256 × 256, slice thickness = 1.5 mm.
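The stimulus-locked epoching described above can be sketched as follows. This is a simplified stand-in for the CTF software's epoching, assuming a single channel and sample-indexed stimulus events; the synthetic data in the demo are illustrative.

```python
def epoch(continuous, event_samples, fs_hz=1200, tmin_ms=-300, tmax_ms=500):
    """Cut stimulus-locked epochs (-300 to 500 msec at 1200 Hz) out of a
    continuous single-channel recording. Events that fall too close to the
    recording edges are skipped."""
    pre = int(-tmin_ms * fs_hz / 1000)    # samples before the stimulus (360)
    post = int(tmax_ms * fs_hz / 1000)    # samples after the stimulus (600)
    epochs = []
    for ev in event_samples:
        if ev - pre >= 0 and ev + post <= len(continuous):
            epochs.append(continuous[ev - pre:ev + post])
    return epochs

# Demo on synthetic data: only the middle event has full pre/post context.
data = list(range(3000))
eps = epoch(data, [100, 1000, 2900])
```

Each retained epoch spans 960 samples (360 pre-stimulus plus 600 post-stimulus at the 1200-Hz sampling rate).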
M100 responses were evaluated separately for the left and right hemispheres in each subject for each block. Root mean square (RMS) M100 amplitudes and latencies were derived from waveforms of sensors located in the left and right temporal regions. Sources of the M100 (Q) were estimated as equivalent current dipoles (ECDs) using CTF MEG software (VSM MedTech). A spherical head model was used and optimized based on MR images. ECDs explaining the most dominant signals from left and right temporal regions were determined for the block with the best signal-to-noise ratio, which was usually the control block, with goodness-of-fit over 80%. Once found, these dipoles were used to model the responses of the other blocks, keeping the location and orientation of the dipoles fixed while permitting the dipole moment strengths to vary temporally and across conditions (Hämäläinen, Hari, Ilmoniemi, Knuutila, & Lounasmaa, 1993). The M100-RMS values and the M100-Q, that is, the dipole moment amplitude corresponding to the M100 response, were then subjected to statistical analysis across experimental conditions. Subjects for whom reliable sources could not be estimated were excluded from statistical analysis. We present both the M100-RMS and M100-Q results for each experiment. The former is an assumption-free, model-independent measure of activity in the sensor array that, although dominated by auditory cortical sources, could also contain contributions from nonauditory sources. The M100-Q is a model-dependent measure of response strength dominated by the auditory cortical activity, restricted to a single dipole for each hemisphere's auditory cortex.
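The sensor-level RMS measure above is straightforward to sketch. The search window and the synthetic two-sensor epoch below are illustrative assumptions; the actual analysis used the CTF tools.

```python
import math

def rms_time_course(sensor_epochs):
    """Root-mean-square across sensors at each sample.
    sensor_epochs: list of per-sensor waveforms of equal length."""
    n_sensors = len(sensor_epochs)
    n_samples = len(sensor_epochs[0])
    return [math.sqrt(sum(ch[t] ** 2 for ch in sensor_epochs) / n_sensors)
            for t in range(n_samples)]

def m100_peak(rms, fs_hz=1200, window_ms=(70, 130)):
    """Peak RMS amplitude and latency within a post-stimulus search window
    (samples assumed to start at stimulus onset)."""
    lo = int(window_ms[0] * fs_hz / 1000)
    hi = int(window_ms[1] * fs_hz / 1000)
    amp = max(rms[lo:hi])
    latency_ms = (lo + rms[lo:hi].index(amp)) * 1000 / fs_hz
    return amp, latency_ms

# Synthetic two-sensor epoch with a deflection 100 msec post-stimulus
# (sample 120 at 1200 Hz).
wave = [0.0] * 600
wave[120] = 1.0
amp, lat = m100_peak(rms_time_course([wave, wave]))
```

The same peak-picking logic applies unchanged to the dipole moment time course (Q) once the fixed-dipole model has been fit.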
Statistical analysis was performed using SPSS version 16 (SPSS, Chicago, IL). Whenever possible, repeated measures ANOVA was performed on RMS amplitudes, latencies, and dipole source strengths with factors block and hemisphere, block being the repeated measure. Post hoc tests and t tests were performed for specific contrasts.
Figure 1A and B illustrates representative dipole localizations during the control (tone alone) block in Experiment 1. Temporal region sensors were used in estimating current dipoles. Current dipoles generally localized to the supratemporal auditory cortex as expected (Picton et al., 1999; Reite et al., 1994). Figure 2A and B displays representative left-hemisphere sensor waveforms during the control and zero-delay test blocks, respectively, in Experiment 1. These figures show the development of MIS: Sensor waveform amplitudes are reduced during the zero-delay test block (where subjects pushed a button and then heard a tone) relative to the control block. This effect was not observed in the right hemisphere, as can be seen in Figure 2C and D, which displays right-hemisphere sensor waveforms during the control and zero-delay test blocks, respectively. The RMS amplitude and dipole source strength time courses also document MIS development. Figure 2E and F shows left-hemisphere RMS amplitude and source strength time courses for the control and zero-delay test block in a representative subject during Experiment 1. These figures show that the RMS amplitude and source strength time courses corresponding to the zero-delay block are reduced relative to those for the control block. As with the sensor waveforms, this effect was not observed in the right hemisphere (Figure 2G and H).
Suppression (normalized difference from the control block) for M100-RMS responses and M100-Q in the training and test blocks is displayed in Figure 3A and B. Repeated measures ANOVA on sensor waveform magnitude (M100-RMS) revealed a main effect of block [F(4, 88) = 4.178, p < .004] and an interaction between block and hemisphere [F(4, 88) = 3.362, p < .013]. Post hoc testing showed the zero-delay block to be significantly suppressed (p < .020) relative to the control block in the left hemisphere. Repeated measures ANOVA on source strengths (M100-Q) revealed a main effect of block [F(4, 72) = 3.061, p < .022]. Post hoc testing showed the zero-delay block to be significantly suppressed (p < .023) relative to the control block in the left hemisphere. No significant differences were observed between the training and control blocks for either M100-RMS or M100-Q, suggesting that at least one training block is necessary for MIS development.
Peak response latencies were generally later in the left hemisphere (92–95 msec) than in the right hemisphere (88–90 msec) for all blocks, although latency differences between the two hemispheres were not significant. However, a repeated measures ANOVA on latencies revealed an effect of block [F(4, 88) = 4.866, p < .001]. Post hoc testing showed that latencies in all test blocks, independent of delay, were significantly greater (by ∼3 msec, p < .019) than in the control block in the left hemisphere; no such differences were observed in the right hemisphere.
It is unclear from the results of Experiment 1 whether MIS can be developed for nonzero delays. Here, we investigated the time course of MIS development for a 100-msec button-to-tone delay. We also sought to ascertain that MIS is “true” suppression, distinct from adaptation or habituation processes. Because control blocks were conducted preceding and succeeding training blocks, any differences between the control blocks would be considered generalized adaptation.
Repeated measures ANOVA on M100-RMS data revealed a main effect of block [F(4, 48) = 12.666, p < .0001]. Post hoc testing showed that training blocks were significantly suppressed relative to control blocks in the right (p < .029) and left hemisphere (p < .048). Repeated measures ANOVA on M100-Q data revealed a main effect of block [F(4, 48) = 7.036, p < .0001]. Post hoc testing showed that training blocks were significantly suppressed relative to control blocks in the right (p < .020) and left (p < .024) hemispheres.
To correct for adaptation, we subtracted the difference between the first and second control blocks from the observed suppression in each block. Upon correcting for adaptation, training blocks were still significantly suppressed relative to the control block, in the left hemisphere, for both M100-RMS and M100-Q. Figure 4A and B shows that within four blocks of training with nonzero delays, 31% MIS is observed in M100-RMS and 34% MIS is observed in M100-Q. We contrasted the difference between the two control blocks with the difference between the first control block and the last training block (where MIS is maximally developed) and found this difference to be significant in the left hemisphere both for M100-RMS [t(6) = 2.25, p < .040] and M100-Q [t(6) = 4.54, p < .002]. We observed only 14% adaptation in M100-RMS and 10% adaptation in M100-Q, suggesting that adaptation can account for only a portion (less than 50%) of MIS. Neither M100-RMS nor M100-Q results showed a statistically significant difference between the control blocks and the first training block. This finding agrees with Experiment 1 in suggesting that MIS requires at least one block of training to develop.
Although responses generally occurred later in the left hemisphere by ∼3 msec, there were no significant latency differences between blocks in either hemisphere.
Having established that MIS develops with a nonzero delay between button and tone, in Experiment 3 we examined the specificity of non-zero-delay MIS. The beginning of Experiment 3 was like Experiment 2: four training blocks with a 1-kHz tone onset 100 msec after each button press. Then, in four test blocks, we varied the motor act (left instead of right thumb pushing the button in the motor block), sensory stimulus (a 0.5-kHz instead of 1-kHz tone in the sensory block) and tone delays (delay0s, delay0.2s), examining MIS in each of these blocks. Due to insufficient degrees of freedom (missing and unreliable data for individual blocks in three subjects), we were unable to test MIS specificity using an ANOVA model. We opted to test MIS (normalized suppression) for each block against a null hypothesis that suppression is zero, using one-tailed t tests with modified Bonferroni corrections (Holland & Copenhaver, 1998; Jaccard & Wan, 1996; Hochberg, 1988; Hommel, 1988). Figure 5A and B shows suppression in the training (averaged) and test blocks in Experiment 3. We compared the average of the training blocks with the control block and found the training blocks to be suppressed for M100-RMS [t(7) = 2.23, p < .040] and M100-Q [t(8) = 3.44, p < .005], in the left hemisphere, replicating our finding in Experiment 2 that MIS extends to nonzero delays. In four subjects, MIS was not observed after four blocks of training and data from these subjects were excluded from further analysis of MIS specificity, effectively bringing the number of subjects in Experiment 3 to nine.
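Of the modified Bonferroni procedures cited above, Hochberg's (1988) step-up method is the simplest to state: order the p-values, find the largest k such that the k-th smallest p-value is at most α/(m − k + 1), and reject the k hypotheses with the smallest p-values. A minimal sketch (the p-values in the demo are hypothetical, not the study's):

```python
def hochberg(p_values, alpha=0.05):
    """Hochberg's (1988) step-up modified Bonferroni procedure.
    Returns a list of rejection decisions in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k in range(m, 0, -1):              # step up from the largest p-value
        if p_values[order[k - 1]] <= alpha / (m - k + 1):
            for i in order[:k]:            # reject the k smallest p-values
                reject[i] = True
            break
    return reject

# Hypothetical example: no p-value clears its step-up threshold.
rejections = hochberg([0.02, 0.04, 0.06])
```

Hochberg's procedure is uniformly more powerful than the classical Bonferroni correction while still controlling the familywise error rate under the independence-like conditions discussed in the cited papers.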
RMS results (Figure 5A) show MIS in the left hemisphere for all test blocks. M100-Q results (Figure 5B) show MIS for all test blocks (motor, sensory, and delays) in both the left and right hemispheres, suggesting that MIS learned at a nonzero delay generalizes across motor acts and hemispheres, generalizes across the sensory stimuli induced by the motor act, and lacks delay specificity.
Similar to the previous two experiments, responses generally occurred later in the left hemisphere by ∼3 msec. However, there were no significant latency differences between blocks in either hemisphere.
The neural mechanisms underlying MIS were examined in three MEG experiments. In Experiment 1, we found that MIS develops for zero delays but does not generalize to nonzero delays. In Experiment 2, we found that MIS developed for 100-msec delays within 300 trials and occurs in excess of auditory habituation. In Experiment 3, we found that unlike MIS for zero delays, MIS for nonzero delays does not exhibit sensitivity to sensory, delay, or motor-command changes. Results for each of these experiments are first discussed separately in relation to suppression of responses to self-produced speech. Subsequently, we discuss these results in the context of a general model of sensorimotor control.
In Experiment 1, we found MIS development in the left hemisphere for zero delay between button and tone (10.0% as measured by RMS and 18.6% as measured by dipole fit Q), which did not generalize to nonzero delays (Figure 3A and B) and did not extend to the right hemisphere.
It is instructive to compare our results to a recent study by Martikainen et al. (2005). Similar to our study, Martikainen et al. examined the auditory response to a tone generated by self-produced keypresses. There was no delay between keypresses and tone onset, effectively making their study a zero-delay paradigm and most comparable to our Experiment 1. They reported significant MIS development in the left (24 ± 7%) and right (18 ± 4%) hemispheres. As in our study, subjects pressed the button with their right hand, and the authors did find greater MIS in the left hemisphere than in the right, but state that this difference was not statistically significant. Furthermore, we observed latency differences between control and MIS conditions, which were not reported by Martikainen et al.
Procedural differences distinguish our study from the Martikainen et al. study. The ISI, that is, the time between successive tones, in our study was approximately 2 sec. Each block consisted of 100 button presses with an ISI of 2 sec, resulting in approximately 200 sec of button–tone association time during the training block. We measured MIS not in the training block but in a subsequent test block similar to the training block, adding another 200 sec of button–tone association time. In contrast, Martikainen et al. took their measurements in two sessions where the number of button presses was 60 and the ISI approximately 5 sec, yielding about 720 sec of association time, or 320 sec more (720 sec minus 400 sec) than in our study. It is possible that total exposure time is more critical for MIS development than the frequency of, or the total number of, button–tone associations experienced. The notion that the intertrial interval might influence learning rate is consistent with many studies in classical conditioning (Yeo, 1976; Prokasy, Grant, & Myers, 1958; Baron, 1952; Grant, Schipper, & Ross, 1952; Spence & Norris, 1950) and might explain why the Martikainen et al. study demonstrated greater suppression relative to ours (24% vs. 19% in the left hemisphere).
Another distinction between our study and the Martikainen et al. study is that they do not account for adaptation as we have in Experiment 2. It is possible that upon correcting for adaptation, their observed MIS would be less and their finding of MIS in the right hemisphere might be different.
It is equally instructive to compare our results to analogous experiments in a different modality. Blakemore, Wolpert, et al. (1998) reported suppression of self-produced ticklishness relative to externally produced ticklishness of the left hand (by the right hand or an external robot) at zero delay. Functional imaging confirmed that self-produced stimulation creates bilateral suppression of activity in the secondary somatosensory cortex, relative to externally produced stimulation. In subsequent experiments, Blakemore et al. demonstrated similar suppression of self-produced ticklishness relative to externally produced ticklishness, but this time to right-hand stimulation (by the left hand or an external robot) at zero and nonzero delays (Blakemore et al., 1999, 2000). To the extent that we also demonstrate suppression to self-produced stimuli at zero delay and nonzero delay (in subsequent experiments), our results are consistent with the Blakemore et al. results. However, our results also highlight a potentially unique feature of MIS at zero delay in the auditory cortex: suppression was unilateral (manifested only in the left hemisphere) and did not extend to nonzero delays. Another distinction between the Blakemore et al. study and ours concerns the nature of the action–consequence pairing: Results of the Blakemore et al. study support a forward model that learns the correspondence between the subject-triggered tickle act (action) and tickle sensation (consequence). This action–consequence pairing is more direct and natural than the pairing in our study of a button press (action) and tone feedback (consequence), which is more indirect and unnatural.
A key element of our hypotheses is a learned internal model that accommodates sensory delays (Whitney et al., 2000; Whitney & Murakami, 1998). We tested this in Experiment 2, where subjects' button presses did not immediately produce a tone: tones were delayed by 100 msec. We predicted that MIS would develop for this delayed tone as subjects practiced the button presses and, indeed, this was the case: Over successive 100-trial blocks, MIS developed significantly in both the left and right hemispheres. This finding, in contrast to Experiment 1, where we noted significant MIS development only in the left hemisphere, raises questions about the role of delay in measured MIS. One explanation follows from our postulation that the left auditory cortex is specialized for processing zero-delay MIS owing to its role in language processing. This account would suggest that at nonzero delays, the tone is sufficiently distant from the motor act that language processing is not as critical to MIS development, and thus, the form of suppression observed is more general and extends to the right hemisphere.
Nevertheless, there are apparent hemispheric differences present in Experiment 2. MIS is consistently larger in the left hemisphere across training blocks. Furthermore, by the fourth block of training, the magnitude of MIS in the right hemisphere approaches an asymptote, whereas the left hemisphere appears to continue increasing, suggesting that MIS in the left hemisphere could continue to develop with more blocks of training. In fact, upon adjusting for general adaptation effects (by comparing the initial and final control sessions), we find that MIS is only significant in the left hemisphere, with adaptation accounting for a 14% change in RMS and a 10% change in Q during the experiment. Thus, MIS in the left hemisphere is 30.7% − 14% = 16.7% in terms of RMS, and 34.1% − 10% = 24.1% in terms of Q, which is still nearly double the MIS observed in the left hemisphere for the zero-delay condition in Experiment 1. Would the zero-delay MIS have been greater than 30% if training had continued for four blocks? This is possible, but it should also be noted that even by the second training block (i.e., the total exposure in Experiment 1), left-hemisphere MIS for the 100-msec delay condition amounted to 21.6%—still more than double the MIS observed in Experiment 1. Further experimentation will be required to investigate this issue.
Also interesting is a comparison of the results in Experiment 2 with an earlier study of SIS of the auditory cortex. In that study (Houde et al., 2002), the authors reported an SIS value similar to the left-hemisphere MIS of Experiment 2: M100 response to self-produced speech was 30% less than the response to tape-playback of that speech in the left hemisphere and 15% less in the right hemisphere.
In Experiment 1, we demonstrated MIS at zero delay; in Experiment 2, we confirmed MIS at nonzero delays; in Experiment 3, we found that MIS for nonzero delays does not exhibit sensitivity to sensory, delay, or motor-command changes. For both RMS and Q, the trained MIS generalized to other conditions: left hand, 0.5-kHz tone, zero delay, and 200-msec delay. Although MIS manifested bilaterally in all test blocks according to the Q results, the RMS results were slightly different in that MIS was not observed for all test blocks in the right hemisphere. To understand this apparent discrepancy, it is instructive to examine the results of MIS training in Experiment 3. Our hypothesis for MIS development relies on an internal forward model that is trained to make predictions about sensory feedback. Our finding in all three experiments that the first training block(s) does not differ statistically from the control block lends credence to the notion that training plays a critical role in MIS development. As shown in Figure 5B, MIS training was not as effective in the right hemisphere, and as such, it is not entirely surprising that MIS did not develop for all test blocks in the right hemisphere. This finding underscores the role of training in MIS development.
A prominent difference between Experiment 1 and Experiment 3 is that MIS learned at zero delay did not manifest at nonzero delays, whereas MIS learned at a nonzero delay did manifest at zero delay. This difference raises an intriguing question: Is the profile of MIS trained at zero delay different from that of MIS trained at nonzero delays? Although this possibility cannot be ruled out, a key component of our hypothesis for MIS development involves training an internal forward model. As such, it is also possible that a lack of adequate training in Experiment 1 accounts for why MIS trained at zero delay did not manifest at nonzero delays: unlike Experiment 2 and Experiment 3, where subjects were exposed to multiple blocks of training, subjects in Experiment 1 received a single training block of 100 trials.
Overall, the results of this study confirm the basic finding of Schafer and Marcus's original study and Martikainen et al.'s follow-up study: that it is possible to observe suppression in the response of the auditory cortex to tones triggered by a subject's own button presses. However, our study also extends that basic result in several ways that advance our understanding of the relationship between sensory prediction and motor output.
First, a key difference between our study and that of Martikainen et al. is that in their study responses over the entire exposure time contributed to the MIS measure, whereas in our study responses were separated into a training block and a test block. This allowed us to examine the effect of learning, which the Martikainen et al. study did not address. Owing to this difference, we have a different interpretation of why MIS arises. Martikainen et al. report that their results "support the existence of a forward model that predicts the auditory consequences of the subject's own motor acts on the environment—even with a tool—and thereby enables discrimination between self-produced and external sounds." However, they do not address the development of this forward model. Because we do not see an immediate MIS effect (no significant MIS in the training block) but do see MIS in the subsequent test block, our results suggest a different hypothesis: that MIS is not an intrinsic property of motor-generated sensations, but instead develops as an internal model is trained to predict those sensations.
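This hypothesis — that suppression emerges gradually as an internal model learns to predict the sensory consequence of the motor act — can be illustrated with a toy sketch. The code below is our illustrative caricature, not a model proposed by any of the studies discussed; the class name, learning rate, and the use of a scalar residual as a stand-in for response magnitude are all assumptions made for the example.

```python
class ForwardModel:
    """Toy forward model: learns a scalar prediction of the tone's
    intensity; the unpredicted residual stands in for the magnitude
    of the cortical response (illustrative assumption)."""

    def __init__(self, learning_rate=0.1):
        self.prediction = 0.0  # no expectation before training
        self.learning_rate = learning_rate

    def trial(self, feedback):
        # Response shrinks as the prediction improves (suppression).
        response = abs(feedback - self.prediction)
        # Nudge the prediction toward the observed feedback.
        self.prediction += self.learning_rate * (feedback - self.prediction)
        return response

model = ForwardModel()
tone = 1.0  # fixed self-triggered tone intensity (arbitrary units)
responses = [model.trial(tone) for _ in range(100)]  # one 100-trial block
```

In this caricature, early trials produce a full-sized response (no suppression, as in the training blocks), while later trials are strongly suppressed (as in the test blocks), with the suppression entirely attributable to learning rather than to any intrinsic property of self-triggering.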
Second, our study also examined the effects of introducing a delay between motor output (the button press) and its sensory consequence (the tone). In Experiment 2, we showed that MIS still develops when there is a 100-msec delay between button press and tone; in Experiment 3, we examined how this 100-msec delayed MIS generalizes to other tones, hands, and delays, and here we found interesting differences between 100-msec delayed MIS and the zero-delay MIS examined in Experiment 1. On one hand, Experiment 1 did not investigate whether zero-delay MIS generalizes to different tones and hands; hence, it is possible that the generalization of 100-msec delay-trained MIS to different tones and hands is, in fact, characteristic of MIS in general. Further experimentation is necessary to investigate this possibility. On the other hand, the generalization of 100-msec delay-trained MIS to other delays does conflict with Experiment 1, where zero-delay-trained MIS failed to generalize to nonzero delays. How, then, do we explain the difference in generalization pattern between zero and nonzero delays? One account starts from the consideration that the zero-delay case may be special: adaptation to zero-delay sensory feedback (i.e., only internal sensory delays) is overlearned because it is continually encountered in everyday life. More generally, the sensory timing model may be learned separately from the sensory type/quality model, and the timing model may take longer to learn. Alternatively, it is possible that MIS developed at nonzero delays merely reflects a generalized sensory expectation effect due to foreknowledge of the occurrence of the incoming sensory stimulus.
Although foreknowledge has been shown to reduce sensory responses in a nonspecific manner (Begleiter, Porjesz, Yerre, & Kissin, 1973; Ritter, Vaughan, & Costa, 1968; Sutton, Tueting, Zubin, & John, 1967; Sutton, Braren, Zubin, & John, 1965), few studies have examined the brain areas subserving such generalized expectation-induced suppression. Further experimentation is necessary to investigate the relationship between MIS and generalized expectation-induced suppression, together with their associated neural substrates.
It is interesting that we observed different hemispheric effects in the development of zero- versus nonzero-delay MIS. A plausible explanation for the development of zero-delay MIS in the left hemisphere alone is that the left hemisphere is more adept at auditory suppression because of its role in language processing. Martikainen et al. also found greater suppression in the left hemisphere, and a study of SIS (Houde et al., 2002) found twice as much suppression in the left hemisphere as in the right. It is also plausible that the left hemisphere is specialized for zero-delay MIS, which is most analogous to speech, whereas nonzero-delay MIS is more akin to suppression in other modalities. Analogous studies in the somatosensory system, where self-produced tactile stimuli were perceived as less ticklish than the same stimuli generated externally, demonstrate suppressed fMRI activation in both right and left hemispheres (Blakemore, Rees, et al., 1998), and to our knowledge there are no published accounts of hemisphere-specific or delay-specific MIS in the somatosensory system. Zero-delay auditory MIS might therefore be a unique condition to which the left hemisphere is specially attuned by virtue of its role in language processing, whereas nonzero-delay auditory MIS behaves more like suppression in other modalities.
On a final note, we believe our study also has potential implications for cross-modal interactions between brain systems and for the plasticity of those interactions. Cumulatively, our experiments suggest that, through coupling between the motor and auditory systems, an internal forward model can be recruited within a reasonable time span and can adapt to systematic perturbations (delays, frequency shifts, alterations in the motor act). Although these results were derived from motor-sensory interactions, it is reasonable to suppose that they generalize to other brain systems. An area of particular interest and potential impact is persons who have lost the use of a sensory modality: our results hold promise for recruiting compensatory interactions between the remaining brain systems and for the plasticity of such interactions. Future experiments will be needed to explore these possibilities.
This work was supported by NIH R01 grants DC006435 and DC004855. We thank Susanne Honma and Anne Findlay for assistance in data collection and analyses.
Reprint requests should be sent to Srikantan S. Nagarajan, Biomagnetic Imaging Laboratory, Department of Radiology, Box 0628, 513, Parnassus Avenue, S362, San Francisco, CA 94143-0628, or via e-mail: firstname.lastname@example.org.