In the absence of visual information, our brain is able to recognize the actions of others by representing their sounds as a motor event. Previous studies have provided evidence for a somatotopic activation of the listener's motor cortex during perception of the sound of highly familiar motor acts. The present experiments studied (a) how the motor system is activated by action-related sounds that are newly acquired and (b) whether these sounds are represented with reference to extrinsic features related to action goals rather than with respect to lower-level intrinsic parameters related to the specific movements. TMS was used to measure the correspondence between auditory and motor codes in the listener's motor system. We compared the corticomotor excitability in response to the presentation of auditory stimuli void of previous motor meaning before and after a short training period in which these stimuli were associated with voluntary actions. Novel cross-modal representations became manifest very rapidly. By disentangling the representation of the muscle from that of the action's goal, we further showed that passive listening to newly learnt action-related sounds activated a precise motor representation that depended on the variable contexts to which the individual was exposed during testing. Our results suggest that the human brain embodies a higher-order audio-visuo-motor representation of perceived actions, which is muscle-independent and corresponds to the goals of the action.
When we listen to the sound of an action, such as knocking on a door or clapping, our brain may activate the neural processes involved in executing the perceived motor act. The notion of a functional equivalence between action execution and action perception was originally proposed by William James. As he said, “every mental representation of a movement awakens to some degree the actual movement which is its object” (James, 1890). Neurophysiological studies in nonhuman primates have provided direct evidence of such an intriguing mechanism by identifying a class of neurons (“mirror neurons”) that discharge when the monkey performs a particular motor act as well as when it observes another individual executing it (reviewed in Rizzolatti & Craighero, 2004). Recent work has demonstrated that mirror neurons also discharge in response to action-related sounds, independent of whether the action that generated them is visible (Keysers et al., 2003; Kohler et al., 2002).
A homologous mechanism has been found in healthy humans, where TMS and imaging studies (reviewed in Aglioti & Pazzaglia, 2010) have indicated that a left fronto-parietal network, comparable with that of the monkey, is activated by both audiovisual and motor aspects of actions. For instance, a rare intracranial EEG recording study in an epileptic 12-year-old girl demonstrated that the sound of finger-clicks excited the functionally defined hand motor area (Lepage et al., 2010). A recent lesion study has provided further support for a causal correspondence between action execution and action perception by showing that the inability to execute specific body movements may interfere with their recognition (Pazzaglia, Pizzamiglio, Pes, & Aglioti, 2008). In this study, patients with left fronto-parietal damage and limb and/or buccofacial apraxia (inability to perform specific gestures) were also particularly unable to match pictures of mouth and face-related actions with their sounds. Beyond cortical injuries, temporary inactivation of the left primary motor representations of the lips and tongue (D'Ausilio et al., 2009) or lips and hands (Möttönen & Watkins, 2009) may also selectively impair correct categorization of the corresponding sounds.
So far, a number of studies have described body part-specific (or somatotopic) motor activation during the perception of the sounds of familiar actions, such as hand clapping or finger tapping (Caetano, Jousmäki, & Hari, 2007; Aziz-Zadeh, Wilson, Rizzolatti, & Iacoboni, 2006; Gazzola, Aziz-Zadeh, & Keysers, 2006; Hauk, Shtyrov, & Pulvermuller, 2006; Pizzamiglio et al., 2005; Aziz-Zadeh, Iacoboni, Zaidel, Wilson, & Mazziotta, 2004) as well as while listening to speech (Buccino et al., 2005; Tettamanti et al., 2005; Wilson, Saygin, Sereno, & Iacoboni, 2004; Watkins, Strafella, & Paus, 2003; Fadiga, Craighero, Buccino, & Rizzolatti, 2002) and to piano playing (Lahav, Saltzman, & Schlaug, 2007; D'Ausilio, Altenmüller, Olivetti Belardinelli, & Lotze, 2006; Bangert et al., 2005; Bangert & Altenmüller, 2003; Lotze, Scheler, Tan, Braun, & Birbaumer, 2003; Haueisen & Knösche, 2001). For instance, a certain degree of audio-motor mirroring has been revealed when people read words or phrases depicting actions performed by different effectors (Aziz-Zadeh et al., 2006; Hauk, Johnsrude, & Pulvermüller, 2004) and while they listen to action-related sentences (Tettamanti et al., 2005). Yet, there is no consensus about the degree of somatotopicity and the way that actions become represented in the human auditory–motor network is not entirely understood. One concern is that numerous studies of the monkey brain have suggested that the concept of somatotopicity may be insufficient to explain the organization of the motor cortex, hence, to account for the mapping of perceived actions onto the motor system (e.g., Graziano & Aflalo, 2007; Graziano, 2006; Schieber, 2001; Fadiga, Fogassi, Gallese, & Rizzolatti, 2000; see also Fernandino & Iacoboni, 2010).
Indeed, the motor cortex contains a number of discontinuous and multiple representations for the same body part that significantly overlap with maps of other body parts. Moreover, beside that of the body, the functional organization of the motor cortex represents the target location in peri-personal space and a map of the motor repertoire that may involve more than one effector (Graziano & Aflalo, 2007; Graziano, 2006; Graziano, Taylor, & Moore, 2002).
Another concern is that the activity in fronto-parietal regions is highly sensitive to previous perceptuomotor experience (e.g., if the sound heard is an everyday action or speech), is particularly tuned in expert brains (e.g., Aglioti, Cesari, Romani, & Urgesi, 2008; Calvo-Merino, Grèzes, Glaser, Passingham, & Haggard, 2006; Cross, Hamilton, & Grafton, 2006), and may be influenced by semantic differences (e.g., between ripping paper and vocalization) and by anticipatory mechanisms (e.g., the predictability of the sequence of notes while listening to a rehearsed piano melody; e.g., Lahav et al., 2007; D'Ausilio et al., 2006). Therefore, to investigate how actions are mapped onto the auditory–motor network, it is necessary to use sounds that have no previous motor, verbal, or semantic meaning.
In the present TMS study, we aimed to further explore the relationship between action-related sounds and their motor representation. Unlike previous studies, we investigated whether and how the motor system responds to well-controlled and unfamiliar sounds (see above). In recent years, TMS has become a valuable tool to measure cortical motor activation. The amplitude of the motor-evoked potentials (MEPs) recorded from a selected muscle in response to a single TMS pulse applied over the primary motor cortex (M1) is an index of the corticospinal excitability under a given experimental condition and provides an extremely sensitive measure of motor activation at the time the TMS pulse was applied. Measuring TMS-induced MEPs has some advantages over other techniques (e.g., RTs). For instance, it allows subthreshold effects at the motor level to be measured, opening up the possibility of sophisticated experimentation.
In the first experiment, we studied whether the human motor system can learn to respond to action-related sounds by associating self-generated actions with arbitrary sounds that do not have any motor meaning per se. During a simple and established paradigm, participants were trained to generate two tones by pressing two buttons with their right index and little finger, respectively (Figure 1B; see also, Elsner & Hommel, 2001). Before and after this training (Figure 1A and C, Experiment 1), we tested the sound-related modulations in the listeners' motor cortex during passive tone perception. We predicted that plastic changes of the motor system would associate each sound with the visuo-motor representation of the finger that previously produced it, possibly through a basic associative learning mechanism (Hebb, 1949; see also Casile, Caggiano, & Ferrari, in press; Bonaiuto & Arbib, 2010; Heyes, 2010; Keysers & Perrett, 2004; Bi & Poo, 2001). The neural formation of novel audio-visuo-motor associations would become detectable as a completely new pattern of finger-specific motor facilitation, with characteristics similar to those observed in previous studies that have reported the activation of the motor system during the perception of others' actions (e.g., Aziz-Zadeh et al., 2004; Watkins et al., 2003; Fadiga et al., 2002).
In the second experiment, we asked how the motor cortex represents newly acquired action-related sounds. To date, the experimental findings have consistently shown a somatotopic mapping of perceived actions in the motor system (for some exceptions, see Cattaneo, Caruana, Jezzini, & Rizzolatti, 2009; Galati et al., 2008; Lewis, Brefczynski, Phinney, Janik, & DeYoe, 2005). However, acting in the environment often implies that kinematically different motor acts result in the same sensory effect. Therefore, it would often be more advantageous to internally represent how the objects must be acted upon to obtain the effect (or goal) desired in different contexts, independent of the muscle that might be used to do so. To investigate this, we dissociated the representation of the button–tone association (each button generates a specific tone) from that of the muscle–tone association (each tone is triggered by the movement of a particular muscle) by reversing the finger–button mapping after associative learning (Figure 1C, Experiment 2). In this way, the position of the fingers on the buttons was opposite to what was previously learnt whereas the mapping of the button to the tones remained the same (see Methods).
On the basis of the available nonhuman literature (e.g., Graziano, 2006; Iriki, Iriki, Tanaka, & Iwamura, 2006; Fogassi et al., 2005; Kakei, Hoffman, & Strick, 1999), in this second experiment, we expected two possible scenarios during passive perception of each tone. Either the perceived actions will be represented in intrinsic coordinate frames (e.g., taking into account the motor pattern that was associated with the sound heard) or in extrinsic coordinate frames (e.g., in reference to the position of the buttons in space and the association between each button and its corresponding tone). In the former case, each tone will facilitate the muscle representation that has previously generated the tone (e.g., Tone 1 will facilitate the index finger motor representation, to which it was associated during the learning phase [LP]; see Figure 1). In the other case, we would expect that each tone will facilitate the action needed to generate that tone in the new context (reversed finger–button mapping), regardless of which muscle was previously used to generate it (e.g., Tone 1 will facilitate the little finger motor representation and, thus, will evoke a muscle activity opposite to the trained one). Some recent evidence has suggested that the latter case may be more likely. Indeed, a behavioral experiment investigating the contribution of button–tone and movement–tone associations to the selection of appropriate movements found that the auditory effects of voluntary actions became associated with the sensory representation of the button rather than with the motor representation of the movement (Hoffmann, Lenhard, Sebald, & Pfister, 2009). This study used RT measurements to disentangle the representation of the former from the latter, with the finger–button mapping for the two hands changed for half of the participants after training and the subjects required to actively respond to the sounds heard.
To address these questions, we applied single-pulse TMS over the left primary motor cortex of healthy volunteers during passive listening to newly learnt action-related sounds. We recorded TMS-elicited MEP from the hand muscles specifically involved in the action–effect learning: the abductor digiti minimi (ADM; little finger muscle) and first dorsal interosseous (FDI; index finger muscle). The TMS technique allowed us to measure the corticomotor excitability in response to the presentation of auditory stimuli before and after associative learning and thus the correspondence between auditory and motor codes—as established in the associative acquisition phase—in the listener's motor system.
Experiment 1 (Same Mapping)
Participants and Experimental Protocol
Twenty-four volunteers (mean age = 25.7 years, SD = 2.9 years, 12 women) participated in the experiment, which was approved by the ethics committee of the University of Leipzig. Informed written consent was obtained from each participant. All were right-handed and naive to the purpose of the experiment. They were comfortably seated on a chair with their right forearm relaxed on a table and their right hand always positioned between two buttons (starting position): The right button was located on the right side of the little finger, the left one was on the left of the index finger (Figure 1).
The experiment consisted of five sessions: prelearning test (PreT), LP1, Postlearning Test 1 (PostT1), LP2, and PostT2. During training in LP1 and LP2, the participants performed free-choice button presses by abducting either the index finger (to press the left button) or little finger (to press the right button) from the starting position. After each button press and before beginning another abduction movement, the finger was repositioned to the starting position. These movements maximized the activity of the ADM (little finger) and FDI (index finger) muscles, respectively. Each button press was contingently followed (SOA of 0 msec) by one of two tones of different pitch (MIDI tones lasting 200 msec; instrument marimba: either 400 or 800 Hz; presented binaurally through headphones). The assignment of tones (high vs. low) to each button (left vs. right) was counterbalanced across participants.
During each LP (lasting approximately 5 min), participants performed 200 button presses by voluntarily choosing which button to press. We asked subjects to press both buttons about equally often and in random order. In this experiment, voluntary self-generated actions were used with the assumption that only intention-based actions (vs. motor responses following the stimulus) would result in stronger action effect learning (Herwig & Waszak, 2009; Herwig, Prinz, & Waszak, 2007). In the test blocks (PreT, PostT1, and PostT2), participants passively listened to the same tones that were generated during the previous LP, presented in random order to reduce anticipation effects (cf. D'Ausilio et al., 2006). The participants were asked to observe their right hand throughout the experiment. We investigated participants' corticospinal excitability during the presentation of tones in the test blocks by measuring MEPs to single TMS pulses delivered over the hand motor cortex of the left hemisphere.
Stimulation and Recording
In the test blocks, 192 tones were presented (96 in PreT and 96 in PostT phases). A focal single TMS pulse (Magstim 200, Magstim Company, Whitland, U.K.; 70 mm figure-of-eight stimulation coil) was randomly delivered at a variable interval (50, 150, or 300 msec) from tone onset in 144 trials (72 in PreT and 72 in PostT phases). The coil was positioned tangentially over the left motor cortex with the handle pointing backward and laterally 45° away from the midline. We recorded TMS-induced MEPs simultaneously from the ADM and FDI finger muscles of the right hand using self-adhesive disposable Ag–AgCl surface electrodes. A ground 1.5 cm metal electrode was placed on the dorsal surface of the wrist. The EMG signal was amplified 1000 times, digitized at 5 kHz, and band-pass filtered (between 10 and 1000 Hz) with a mains hum notch filter at 50 Hz. The optimal scalp position from which MEPs with maximal amplitude were elicited in both resting ADM and FDI finger muscles was detected by moving the coil over the left motor cortex while delivering TMS pulses at constant intensity. The TMS intensity was set at 120% of each subject's resting motor threshold (rMT) and ranged from 31% to 58% (mean = 43.4%, SD = 7.4%) of the maximum stimulator output. rMT was defined as the lowest stimulator output that evoked at least 5 of 10 successive MEPs with an amplitude greater than 50 μV in the higher threshold muscle (the ADM) while the subject's hand was relaxed. The intertrial interval between the TMS pulses ranged from 5 to 7.5 sec. Muscular contraction in both muscles was always visually monitored and full muscular relaxation was obtained. To ensure an adequate level of attention, the participants were required to press a pedal with their left foot at the appearance of a predetermined tone (high or low, balanced across subjects) in randomly selected trials without TMS pulse (24 in PreT and 24 in PostT phases). 
Before the experiment, a screen provided participants with visual feedback of their muscle relaxation.
Individual peak-to-peak MEP amplitudes were calculated as the absolute distance between the minimum and maximum values observed within a search window starting at 10 msec and ending at 80 msec after the TMS pulse. Trials with detectable background activity preceding the TMS pulse, an MEP amplitude smaller than four times the EMG in the 50 msec before the TMS pulse, or an MEP amplitude deviating by more than 2 SD from the mean of each block were discarded (5.3% of the total). For each condition and each muscle (see below), mean values in the test sessions before (PreT) and after (PostT1 + PostT2) learning were obtained from at least 31 MEPs. For each participant and muscle, we separately normalized (z scores) the MEP amplitudes recorded before (PreT) and after (PostT1 + PostT2) learning to the total MEPs of the respective test session(s). We computed separate ANOVAs for PreT and PostT on the mean normalized MEP amplitudes with two within-subjects factors, each with two levels: Congruency (congruent or incongruent condition) and Muscle (FDI or ADM). The factor Congruency refers to the different muscle–tone combinations. In the congruent condition, the presented tone was the one associated (during the LP) with the muscle from which MEPs were recorded. In the incongruent condition, the tone was the one previously associated with the opposite finger. In other words, in each test trial, one of the (simultaneously measured) muscles was congruent with the presented tone, and the other was incongruent. The ANOVA on PreT was conducted to verify that the tones used as stimuli in this experiment did not elicit a specific pattern of motor facilitation before the training per se (i.e., they were not already motorically represented in the subjects' brains). Before the LP, the factor Congruency refers to the future congruency of the muscle–tone combination in the post-LP. A significance threshold of p < .05 was set for all statistical tests.
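The preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the "EMG in the 50 msec before the TMS pulse" is assumed to mean the prestimulus peak-to-peak amplitude, and the sample standard deviation is assumed for both the 2 SD rule and the z scores. The 5 kHz sampling rate and the 10 to 80 msec window are taken from the text.

```python
from statistics import mean, stdev
from typing import List, Sequence

FS_HZ = 5000  # digitization rate stated in the text


def _ms_to_samples(ms: float, fs: int = FS_HZ) -> int:
    return int(round(ms * fs / 1000.0))


def peak_to_peak(emg: Sequence[float], pulse_idx: int,
                 start_ms: float = 10.0, end_ms: float = 80.0) -> float:
    """Max minus min of the EMG within 10-80 msec after the TMS pulse."""
    lo = pulse_idx + _ms_to_samples(start_ms)
    hi = pulse_idx + _ms_to_samples(end_ms)
    window = emg[lo:hi + 1]
    return max(window) - min(window)


def passes_prestimulus_check(emg: Sequence[float], pulse_idx: int,
                             pre_ms: float = 50.0,
                             ratio: float = 4.0) -> bool:
    """Keep a trial only if the MEP is at least 4x the prestimulus EMG.

    Assumption: prestimulus EMG is measured as peak-to-peak amplitude.
    """
    pre = emg[pulse_idx - _ms_to_samples(pre_ms):pulse_idx]
    background = max(pre) - min(pre)
    return peak_to_peak(emg, pulse_idx) >= ratio * background


def reject_outliers(amplitudes: Sequence[float],
                    n_sd: float = 2.0) -> List[float]:
    """Drop amplitudes more than n_sd SDs away from the block mean."""
    m, s = mean(amplitudes), stdev(amplitudes)
    return [a for a in amplitudes if abs(a - m) <= n_sd * s]


def zscore(amplitudes: Sequence[float]) -> List[float]:
    """Normalize amplitudes to the mean and SD of one test session."""
    m, s = mean(amplitudes), stdev(amplitudes)
    return [(a - m) / s for a in amplitudes]
```

Normalizing within participant and muscle removes overall differences in MEP size between muscles, so that only the relative modulation by tone enters the ANOVA.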
In addition to the ANOVA, we analyzed whether the number of button presses executed by each finger influenced the strength of the action–tone association (i.e., stronger association would result in higher MEPs). To do so, we computed a regression between the number of the index and little finger button presses during the LP and the normalized MEP amplitudes during passive listening to the sounds after training.
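For a regression with a single predictor, as described above, the proportion of explained variance equals the squared Pearson correlation between the two variables. The sketch below computes it directly; the function name and argument names are hypothetical, and the sketch assumes one value per participant for each variable.

```python
from statistics import mean
from typing import Sequence


def r_squared(presses: Sequence[float], mep_z: Sequence[float]) -> float:
    """R^2 of a simple linear regression of mep_z on presses.

    With one predictor this equals the squared Pearson correlation.
    """
    if len(presses) != len(mep_z):
        raise ValueError("both variables need one value per participant")
    mx, my = mean(presses), mean(mep_z)
    sxy = sum((x - mx) * (y - my) for x, y in zip(presses, mep_z))
    sxx = sum((x - mx) ** 2 for x in presses)
    syy = sum((y - my) ** 2 for y in mep_z)
    return (sxy ** 2) / (sxx * syy)
```

A value near zero would indicate that the number of button presses does not predict the size of the normalized MEPs.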
Experiment 2 (Reverse Mapping)
Participants and Experimental Protocol
Twelve right-handed volunteers (mean age = 27.9 years, SD = 3.4 years, five women) took part in the experiment, which consisted of three experimental sessions (Figure 1): PreT, LP (about 100 trials per finger, overall about 3 min), and PostT. Importantly, to dissociate the representation of the button–tone relationship (each button generates a specific tone) from that of the muscle–tone relationship (each tone is triggered by the movement of a particular muscle), after the LP, the participants moved to the other side of the table (i.e., they rotated 180° from the LP position, as depicted in Figure 1C, Experiment 2). In this new position, their hands were again placed between the two buttons, but this time the finger–button mapping was opposite to that trained in the LP. Participants underwent no further training according to their new seating position. Instead, they passively listened to the tones generated during the previous LP, presented in random order. In Experiment 2 only, the buttons were differently colored (blue and yellow) to make the fact that the finger–button mapping had changed in PostT more explicit. For further details, see Experiment 1, above.
Stimulation and Recording
In each of the two test blocks (PreT and PostT), each tone was repeated 48 times for a total of 96 stimuli. TMS was delivered 36 times (18 times per condition) in each test block. The TMS intensity was set at 120% of each subject's rMT and ranged from 34% to 55% (mean = 44.6%, SD = 7.4%) of the maximum stimulator output. In randomly selected trials without TMS pulse (12 for each test block), the participants were required to press a pedal with their left foot at the appearance of a predetermined tone (high or low, balanced across subjects). For other TMS and recording parameters, see Experiment 1.
Data processing was the same as in Experiment 1. Mean values were obtained from at least 15 MEPs per condition (6.3% of MEPs were discarded). For one subject's ADM muscle, only 12 MEPs were used in PreT. Statistical analysis was performed on mean MEP amplitudes as dependent variable in an ANOVA with two within-subject factors, each with two levels: Congruency (congruent or incongruent condition) and Muscle (FDI or ADM), separately for each test session (PreT and PostT). As in Experiment 1, in the congruent condition, the tone presented was the one associated during the LP with the muscle from which MEPs were recorded. However, it should be noted that in the reverse mapping of Experiment 2, the finger that was associated with the perceived tone is close to the button opposite the one that generated the tone in the LP. For further details, see Experiment 1 above.
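In a 2 × 2 within-subject design such as the one described above, the main effect of a two-level factor is equivalent to a paired t test on the per-participant condition means averaged over the other factor, with F(1, n − 1) = t². The sketch below uses this equivalence to compute the Congruency main effect; the data layout (one dictionary of cell means per participant) and all names are assumptions for illustration, not a description of the software the authors used.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple

Cell = Tuple[str, str]  # (condition, muscle), e.g. ("congruent", "FDI")


def congruency_main_effect(
    participants: Sequence[Dict[Cell, float]]
) -> Tuple[float, Tuple[int, int]]:
    """F value and degrees of freedom for the Congruency main effect.

    Each participant contributes four cell means; Congruency is tested
    on the difference between marginal means averaged over Muscle.
    """
    diffs = []
    for cells in participants:
        congruent = (cells[("congruent", "FDI")]
                     + cells[("congruent", "ADM")]) / 2.0
        incongruent = (cells[("incongruent", "FDI")]
                       + cells[("incongruent", "ADM")]) / 2.0
        diffs.append(congruent - incongruent)
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # paired t statistic
    return t ** 2, (1, n - 1)
```

With 12 participants this yields the degrees of freedom (1, 11), and with 24 participants (1, 23).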
Figure 2A depicts the normalized MEPs recorded from the FDI and ADM muscles in the PostT (postlearning test in Figure 1C, Experiment 1) during passive listening to the tones trained in LP (learning phase in Figure 1B). The ANOVA showed a significant main effect of Congruency (F1, 23 = 5.683, p = .026). In particular, for each muscle recorded, mean z values (z scores: mean ± SEM) were larger for the congruent condition (i.e., when the tone heard was the one generated by the muscle tested during LP; FDI: 0.020 ± 0.020; ADM: 0.014 ± 0.021) than in the incongruent condition (i.e., when the tone heard was the one produced by the other muscle; FDI: −0.021 ± 0.020; ADM: −0.015 ± 0.022). This pattern of corticospinal modulation indicated that, during the LP, the actual finger movements were rapidly associated with the corresponding effects and that a completely new auditory–motor mapping was successfully learnt.
No other main effects or interactions were significant [muscle (F1, 23 = 0.013, p = .911), interactions between muscle and congruency (F1, 23 = 0.021, p = .885)]. Notably, we found no modulations before training (prelearning test in Figure 1A). The ANOVA showed no significant main effect of the factors Muscle (F1, 23 = 0.86, p = .363) and Congruency (F1, 23 = 0.186, p = .670) nor a significant interaction between them (F1, 23 = 0.553, p = .465). The values in the congruent condition were −0.005 ± 0.019 for FDI and 0.019 ± 0.026 for ADM. In the incongruent condition, the values were 0.005 ± 0.019 for FDI and −0.019 ± 0.026 for ADM. The absence of tone-related modulations of participants' corticospinal excitability before associative learning demonstrated that the tones were not previously represented in the motor cortex and confirmed that the motor facilitation observed after learning constituted a newly acquired auditory–motor representation. By freely choosing which button to press during the LP, the participants executed a comparable number of button presses with each finger, as instructed by the experimenter (mean ± SEM of number of button presses: index finger, 196.8 ± 6.5; little finger, 198.1 ± 5.7, t23 = −0.61, p = .55, two tailed). Consequently, the number of each index and little finger button presses did not predict the strength of motor facilitation during passive listening to the sounds after training (index finger button presses to FDI congruent condition: R2 = .04 (1,23), p = .34; index finger button presses to FDI incongruent condition: R2 = .035, p = .38; little finger button presses to ADM congruent condition: R2 = .006, p = .72; index finger button presses to ADM incongruent condition: R2 = .009, p = .66).
Figure 2B depicts the modulation of normalized MEPs recorded in the PostT (postlearning test in Figure 1C) when participants passively listened to the tones after having rotated 180° around the buttons (thus reversing the previously learnt finger–button mapping). The ANOVA yielded a significant main effect of congruency (F1, 11 = 4.844, p = .050). However, the pattern of corticospinal excitability was opposite to that of Experiment 1. Indeed, the mean normalized amplitudes were now larger in the incongruent condition (i.e., when the tone heard was the one generated by the other muscle during LP; FDI: 0.030 ± 0.036; ADM: 0.112 ± 0.055) as compared with the congruent one (FDI: −0.030 ± 0.038; ADM: −0.109 ± 0.052). Most interestingly, this result indicates that each tone facilitated the representation of the motor program that would have generated the tone in the current (reversed) context. Indeed, the motor representation covertly activated was the reverse of the one associated with the tone during training. In our view, this result is evidence of high-level coding of action-related sounds (e.g., Lewis et al., 2005).
In PostT, neither the factor muscle (F1, 11 = 0.266, p = .616) nor the interaction between muscle and congruency (F1, 11 = 1.454, p = .253) was significant. The ANOVA on the MEPs recorded in PreT revealed neither a significant main effect of the factors Muscle (F1, 11 = 0.829, p = .382) and Congruency (F1, 11 = 1.95, p = .19) nor a significant interaction between them (F1, 11 = 1.98, p = .187). The values in the congruent condition were −0.014 ± 0.059 for FDI and 0.104 ± 0.043 for ADM. In the incongruent condition, the values were 0.015 ± 0.060 for FDI and −0.099 ± 0.044 for ADM. As in Experiment 1, the absence of specific MEP modulation in PreT showed that the sounds used were unfamiliar to the subjects. Therefore, the corticospinal modulation after associative learning is evidence of a high-level new auditory–motor representation established during training. As in Experiment 1, the participants executed a comparable number of button presses for each finger (mean ± SEM of number of button presses: index finger, 98.7 ± 2.1; little finger, 100.1 ± 2.6, t11 = −1.21, p = .25, two tailed). The number of each index and little finger button presses did not predict the strength of motor facilitation (index finger button presses to FDI congruent condition: R2 = .07 (1,11), p = .43; index finger button presses to FDI incongruent condition: R2 = .076, p = .39; little finger button presses to ADM congruent condition: R2 = .01, p = .76; index finger button presses to ADM incongruent condition: R2 = .01, p = .76).
The present TMS study aimed to elucidate how novel sounds become represented in the human motor system. It has two major findings. First, in adult participants, voluntary self-executed actions were rapidly associated with arbitrary audible effects. After training, passive listening to the associated sounds specifically facilitated the motor program that generated them, even if actions were not overtly executed (Figure 2A). For instance, passive perception of Tone 1, which was associated with a left index finger button press (Figure 1B), now elicited larger MEPs in the muscle of the index finger. This result indicates that cross-modal plasticity of the motor system transformed arbitrary effects, which were not at all associated with an action beforehand, into action-related sounds. Second, when the finger–button mapping was the reverse of the mapping in the training (the position of the hand over the buttons that triggered the sounds was rotated 180°; see Figure 1C, Experiment 2), each sound selectively facilitated the hand muscle that would produce it in the current context rather than the muscle associated with the sound during training (Figure 2B). Therefore, the pattern of activation was the reverse of that observed in Experiment 1 (Figure 2A). Indeed, passive perception of Tone 1 (which was associated with a left index finger button press during learning) now elicited larger MEPs in the muscle of the little finger. Of particular relevance here is the fact that passive perception of sounds now facilitated kinematically different movements that would be executed for the purpose of generating the same auditory effect in the present context.
In our view, these results indicate that during the training, the brain acquired knowledge about the permanent features of the experimental setup, that is the audio-visual association (which remains constant across the two positions) between the sound of newly learnt actions and the corresponding buttons. This audio-visual representation is muscle-independent and takes extrinsic coordinate frames into account for the accomplishment of the goal of the action (i.e., for the act of generating a particular tone by pressing a specific button).
This raises the question of which physiological mechanisms could rapidly map arbitrary stimuli onto the motor system during training. Nonhuman mammal studies have demonstrated that horizontal neuronal connections, which are modified during the learning of new skills, lead to novel patterns of connectivity and consequently to experience-dependent reorganization of the primary motor cortex (Sanes & Donoghue, 2000). In humans, dynamic changes in the primary motor cortex representation may also follow the acquisition of new motor skills (Münte, Altenmüller, & Jäncke, 2002; Pantev, Engelien, Candia, & Elbert, 2001; Pascual-Leone et al., 1995; Pascual-Leone, Grafman, & Hallett, 1994). Less is known about perceptuomotor plastic changes during associative learning. In the rhesus monkey, premotor neurons have been shown to modulate their activity during the acquisition of new visuomotor associations after sensorimotor training (Mitz, Godschalk, & Wise, 1991). One may argue that these associations become even stronger when self-produced actions are associated with their sensory outcomes (cf. Haggard, 2008). This association may be used in the reverse direction when a movement can be induced by anticipating or perceiving sensory effects (Herwig et al., 2007; Schütz-Bosbach & Prinz, 2007; Hommel, Müsseler, Aschersleben, & Prinz, 2001; James, 1890). Recent literature has suggested that a Hebbian associative process, which implies the contiguity between two neural events (“neurons that fire together wire together”; Hebb, 1949), may be critical for the development of a mechanism in the motor system that mirrors the actions of others (e.g., Casile et al., in press; Bonaiuto & Arbib, 2010; Heyes, 2010; Del Giudice, Manera, & Keysers, 2009; Keysers & Perrett, 2004; Oztop & Arbib, 2002).
An alternative view indicates that contiguity (events occur together in time) and contingency (events occur in a predictive relationship) are equally necessary for establishing new sensorimotor associations (Cook, Press, Dickinson, & Heyes, 2010).
Our data provide significant support for the second possibility by showing that, in a few minutes of associative learning, the concurrent and contingent activation of visual and acoustic patterns and motor programs during self-executed actions contributes to the development of motor-sensory circuits that will discharge during the perception of newly acquired action-related sounds. This effect is detectable as a new pattern of sound-related and finger-specific motor facilitation. Moreover, the results show that the physical nature of the stimulus that can be mapped onto the motor cortex during voluntary actions is not restricted to sounds that are intrinsically motor-related (e.g., the sound of speech). This proposal is supported by previous TMS experiments showing that motor facilitation may originate from a wide range of sensorimotor contingencies, for instance, when action (Catmur, Walsh, & Heyes, 2007) and nonaction (Petroni, Baguear, & Della-Maggiore, 2010) visual cues are repeatedly paired with succeeding finger movements. To our knowledge, this is the first study to validate the idea that arbitrary sounds (without a previous motor or verbal meaning and not embedded in a sequence of tones) can be rapidly mapped onto the motor system.
In the second experiment, we further demonstrated that the motor cortex causally linked the sound heard to the corresponding button rather than to the specific movement that was used to generate it. The evidence of such an abstract representation of the goal of the action has a possible correspondence with the monkey brain. Electrophysiological studies in monkeys have reported motor cortex neurons coding for extrinsic reference frames for coordinated actions involving more than one effector (Graziano & Aflalo, 2007; Graziano, 2006; Graziano et al., 2002) and for the goal that the animal aims to achieve, regardless of which motor pattern has to be activated (Umiltà et al., 2001, 2008; Kakei, Hoffman, & Strick, 2001, 2003; Kakei et al., 1999; Alexander & Crutcher, 1990a, 1990b; Rizzolatti et al., 1988).
This leads to the question of where the abstract button–tone relationship tested in our study is implemented in the human brain. TMS experiments have repeatedly demonstrated that the primary motor cortex, or M1, is modulated by action execution as well as by action perception (e.g., Fadiga, Fogassi, Pavesi, & Rizzolatti, 1995). Similarly, a magnetoencephalography study has shown that the oscillatory activity originating in M1 after movement execution (cf. Salmelin & Hari, 1994; Murthy & Fetz, 1992) was also present after visual and auditory presentation of actions (Caetano et al., 2007). However, most of the neurons in M1 code the action in intrinsic (e.g., specific muscle, joint, and digit movements) rather than extrinsic (e.g., the relative position of the target and the hand in space) coordinate frames (Kakei et al., 1999, 2001, 2003; Kurata & Hoshi, 2002). Indeed, the modulation of M1 activity as well as corticospinal excitability measured in this study may also be driven by premotor cortices to which M1 has strong reciprocal connections (Dum & Strick, 2005; Shimazu, Maier, Cerri, Kirkwood, & Lemon, 2004; Matelli, Camarda, Glickstein, & Rizzolatti, 1986).
Therefore, a plausible candidate answer to the question posed above could be the pars opercularis of the inferior frontal gyrus (POp), the human homolog of the monkey premotor cortex (PM; area F5), where mirror neurons were originally discovered. Some experiments (reviewed in Fernandino & Iacoboni, 2010; Gentilucci & Dalla Volta, 2008) have reported a functional distinction between the POp and PM in action recognition. Whereas the POp appears to encode extrinsic features related to action goals, the PM seems to process lower-level intrinsic parameters related to the specific movements.
Another area that is involved in both action execution and action perception may also be responsible for our results: the inferior parietal lobule (IPL). In a recent fMRI study, Jastorff, Begliomini, Fabbri-Destro, Rizzolatti, and Orban (2010) presented participants with video clips of an agent executing different motor acts with different effectors (foot, hand, and mouth). While the clusters of activation in the PM were grouped according to the effectors performing the actions observed, in the IPL, the observed motor acts were coded in terms of their behavioral significance and as a function of the relationship between the agent and object (e.g., bringing the object toward the agent or moving it away), regardless of the effectors used. Fogassi and colleagues (2005) reported that, during action execution, monkey IPL neurons coding for a particular motor act (e.g., grasping) discharged only if a consecutive action followed (e.g., placing). Interestingly, some of these neurons had mirror properties and, during action observation, discharged differently when the same motor act was embedded in different meaningful actions (e.g., grasping for eating, but not for placing). The current interpretation is that these cells would allow recognition of the observed motor act and prediction of which action the agent will perform next (Fogassi et al., 2005; for a review, see Rizzolatti & Sinigaglia, 2010). More recently, Peeters et al. (2009) demonstrated that, besides the other parieto-frontal circuits involved in grasping, the motor programs required to operate tools may activate the rostral part (anterior SMG or aSMG) of the human IPL. It is important to note that this activation was also observed during the perception of the sounds of tool use (e.g., cutting paper with scissors, cleaning metal with a brush) and during tool manipulation with the right (dominant) hand (Lewis et al., 2005).
Unfortunately, our data cannot establish the source of the modulation observed in M1. Nonetheless, we argue that the results may be reasonably explained by assuming that the perception of the tones activated a higher-level audio-visuo-motor network that, instead of embodying a somatotopic representation of trained muscle activations, controls a broader range of movement parameters and is activated according to the causal relationships between the tool–button and the perceivable consequences obtained by using it. Because each cortical point is connected to many muscles and each muscle receives input from many cortical sites (for a review, see Graziano, 2006), it is not surprising that the muscles of different fingers were facilitated by the same sounds in different conditions. Support for our results is provided by a semantic priming experiment in which Galati and co-workers (2008) found no effector-dependent activation of the prefrontal and premotor cortices, suggesting that the embodiment of the sounds of actions may indeed be abstract and high level.
An issue here is why cortical activity would code the goal of the action rather than the movement used to achieve it. Under ecological circumstances, sounds are often produced by interacting with objects (in our case, pressing a certain button generated a tone), and multiple body movements will often result in the same auditory effect. Hence, it is more likely that the brain will prioritize a representation of the object-sound association (here defined as the goal of the action) rather than of the movement-sound association.
So far, the somatotopic mapping of action-related sounds has been largely emphasized. Our experiment seems to represent an exception with respect to previous TMS experiments that have typically revealed an exact replication of the action-related sounds in the listener's motor cortex. We believe that this difference stems from the fact that previous experimental paradigms were not designed to disentangle the movements from the goals of the action (D'Ausilio et al., 2006; Buccino et al., 2005; Aziz-Zadeh et al., 2004; Watkins et al., 2003; Fadiga et al., 2002).
It is tempting to speculate that audio-visuo-motor associations, as demonstrated in this study, are involved in action selection. For instance, ideomotor theory claims that voluntary actions are guided by the internal anticipation of the action's sensory consequences (Herwig et al., 2007; Prinz, 1997). In agreement with this notion, our results may be taken to suggest that actions are selected with respect to their expected sensory outcomes at a subthreshold level of motor activation. These results, therefore, corroborate theories claiming that actions are not purely coded in terms of motor output but rather in terms of action goals (e.g., Fogassi et al., 2005; Umiltà et al., 2001).
In conclusion, our study indicates that the adult audio-visuo-motor system is highly plastic and sensitive to the individual's immediate sensorimotor experience. Thus, motor facilitation in response to sound can develop rapidly when individuals associate their actions with any kind of auditory effect. We further demonstrate that the motor system holds a higher-level representation of these newly acquired audio-visuo-motor associations and that it takes into account the variable contexts to which the individual is exposed. These results extend previous findings in the domain of action listening and support the idea of a preferential involvement of the motor system in the recognition of higher-level features of behavior rather than in a process of action mirroring via somatotopic mapping.
We acknowledge support from the DFG (Schu2471/1-1) and the ANR (ANR-08-FASHS-13). We thank Pedro Cardoso-Leite, Antje Gentsch, Giacomo Novembre, and two anonymous reviewers for their comments on the manuscript. We would also like to thank Jan Bergmann and Jeanine Auerswald for their technical assistance and help with data collection and Rosie Wallis for editing this article.
Reprint requests should be sent to Luca F. Ticini, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, D-04103 Leipzig, Germany, or via e-mail: firstname.lastname@example.org or Florian Waszak, Laboratoire Psychologie de la Perception, Université Paris Descartes, 45 rue des Saints Pères, 75270, Paris Cedex, France, or via e-mail: email@example.com.