Embodied theories hold that understanding what another person is doing requires the observer to map that action directly onto his or her own motor representation and simulate it internally. The human motor system may, thus, be endowed with a “mirror matching” device through which the same motor representation is activated when the subject is either the performer or the observer of another's action (“self-other shared representation”). It is suggested that understanding action verbs relies upon the same mechanism; this implies that motor responses to these words are automatic and independent of the subject of the verb. In the current study, participants were requested to read silently and decide on the syntactic subject of action and nonaction verbs, presented in first (1P) or third (3P) person, while TMS was applied to the left hand primary motor cortex (M1). TMS-induced motor-evoked potentials were recorded from hand muscles as a measure of cortico-spinal excitability. Motor-evoked potentials increased for 1P, but not for 3P, action verbs or 1P and 3P nonaction verbs. We provide a novel demonstration that motor simulation is triggered only when the conceptual representation of a word integrates the action with the self as the agent of that action. This questions the core principle of “mirror matching” and opens the way to alternative interpretations of the relationship between conceptual and sensorimotor processes.
The simulationist view of embodied cognition holds that action understanding results from the automatic mapping of a perceived action onto the perceiver's motor system, where a simulation of that action is carried out (Rizzolatti, Fogassi, & Gallese, 2001). “Self” and “other” must, therefore, share representation, as the actions made by either agent rely on the same neuronal underpinning (Decety & Chaminade, 2003). Studies of the macaque brain provided a physiological foundation to this hypothesis, revealing the existence of neurons in the motor circuitry that respond to the action, regardless of whether it is being executed by the macaque itself, or it is observed, while performed by a third agent (Gallese, Fadiga, Fogassi, & Rizzolatti, 1996; Rizzolatti, Fadiga, Gallese, & Fogassi, 1996; di Pellegrino, Fadiga, Fogassi, Gallese, & Rizzolatti, 1992). In humans, although the existence of neurons with such “mirror” properties is controversial (Lingnau, Gesierich, & Caramazza, 2009; Turella, Pierno, Tubaldi, & Castiello, 2009; Dinstein, 2008; Dinstein, Thomas, Behrmann, & Heeger, 2008), a similar mirror matching mechanism (direct matching is used synonymously), endowed in the motor system, has been inferred for action understanding.
The investigation of the human mirror matching mechanism focuses on the involvement of the neural correlates of motor simulation in processing action-related stimuli. One line of research maintains that simulation, the internal execution of a motor act, is not limited to rehearsing early stages of action, but it recruits the whole system for execution, for it involves “everything that is involved in an overt action, except for the muscular contractions and the joint rotations” (Jeannerod, 2003; see also Porro et al., 1996; Jeannerod & Decety, 1995). Thus, the activation of the primary motor cortex (M1) is taken as a reliable correlate of simulation and its association with the processing of action-related stimuli as one direct demonstration in favor of a simulation-based mechanism for action understanding (Rizzolatti & Sinigaglia, 2010).
Following evidence for motor and premotor activations (typically lateralized to the left hemisphere), not only when an action was observed (Fadiga, Fogassi, Pavesi, & Rizzolatti, 1995) but also when it was implied by a word meaning (Hauk, Johnsrude, & Pulvermüller, 2004), the mirror matching has been generalized to language. One influential—among the many—hypothesis of embodied cognition (for a historical overview, see Scheerer, 1984) proposes that the lexical–semantic processing of action words, just like the processing of any other action–stimulus, depends upon the early (within 250 msec) and automatic activation of the same action–representation circuitry for execution and observation via motor simulation (Pulvermüller, 2005). Moreover, following the principle grounding the mirror matching that self and others share action representations, it is conceivable to assume that motor areas respond equally to words describing oneself and another's action. Thus, determining whether motor activation is automatic or is modulated by contextual factors, such as the agent/subject performing the action, is crucial to assess whether a mirror matching mechanism operates during action word processing in humans but also whether this mechanism actually serves lexical–semantic processing. Early (within 250 msec) and automatic (context-independent) motor activation is indeed considered a strong point in support of the thesis that simulation is not simply a by-product but is essential for the lexical–semantic encoding of action language (Chersi, Thill, Ziemke, & Borghi, 2010; Pulvermüller, 2005).
Although the motor facilitation for self versus others' action has been widely studied in action observation (Decety & Chaminade, 2003), only Tomasino, Werner, Weiss, and Fink (2007) have explored this phenomenon in action–language understanding. The authors reported no difference in BOLD signal in M1 when participants imagined the content of action phrases adopting the first person (1P) or third person (3P) perspective. However, as participants were instructed to report whether the mental scene took place indoors or outdoors, it is possible that they failed to focus on the subject of the verb and used the egocentric perspective by default (see Willems, Hagoort, & Casasanto, 2010).
Soliciting individuals to represent the subject together with the meaning of an action verb can crucially contribute to establish how self and others' actions may modulate motor activity. The syntactic subject (or the person) of a verb plays a powerful role in determining the perspective the individuals adopt, when they mentally represent an implied language event. Brunyé, Ditman, Mahoney, Augustyn, and Taylor (2009) demonstrated that individuals were faster in matching a 1P sentence (e.g., I am slicing the tomato) with a picture representing an action in the first person than in the third-person perspective, whereas the opposite was true for 3P sentences.
The use of linguistic material may possibly serve the investigation of motor facilitation related to 1P and actions related to 3P better than visual stimuli. This consideration is well captured by the fMRI data, showing a greater activation within sensorimotor regions when participants viewed static body parts from the egocentric perspective (i.e., the subject looks at her own toes) than from the allocentric perspective (i.e., the subject looks at the toes of someone in front of her; Saxe, Jamal, & Powell, 2006). Thus, the mere visual difference between 1P and 3P actions can, in itself, produce differential activation in the sensorimotor system, bringing a confounding factor into the interpretation of the agency effect (see also Schütz-Bosbach, Mancini, Aglioti, & Haggard, 2006). In contrast, words enjoy the advantage of holding visual inputs associated with either action perspective constant, thus preventing visual effects from influencing the motor system.
In this study, TMS was combined with a linguistic task to test word-related motor facilitation. Italian participants were engaged in a referential judgment task: They were requested to read and report the subject of Italian action and nonaction verbs presented in 1P (afferr-o, I grasp) or 3P (afferr-a, he grasps). In each trial, only the verb inflected to the 1P or the 3P of the present tense was presented (root [afferr] + suffix [-o] for 1P and [-a/e] for 3P). In fact, Italian is a pro-drop language, in which the subject of the verb does not need to be expressed and can be inferred from the verb suffix, so that an inflected form corresponds to a full sentence (or a sentence fragment). TMS was applied to the left hand M1 to measure cortico-spinal excitability, defined by the amplitude of TMS-induced motor-evoked potentials (MEPs) in peripheral muscles responding to the stimulated area (i.e., hand muscles). This approach allowed us to investigate two hierarchical questions. First, are motor responses to action verbs automatic and independent of the subject of the action? A positive answer to this question would maintain that a mirror matching mechanism contributes to action language processing. A negative one would then prompt a second question: Is motor activation greater for self-referential or for other-referential action verb meaning?
Although measuring motor facilitation for 1P and 3P action verbs was our primary objective, participants' RTs and accuracy were also collected. TMS delivery to a brain area temporarily disrupts its activity (Harris, Clifford, & Miniussi, 2008); therefore, possible changes in the behavioral performance following M1 TMS could provide insight into the question as to whether this region is causally involved in the current task. Combining physiological (MEPs) and behavioral measures is required because activation of a brain area does not imply immediately that that area is necessary for task performance (Price & Friston, 2002).
Sixteen native Italian university graduates and undergraduates (eight women, 20–35 years) participated in the experiment. All were right-handed (laterality quotient = 80–100; Oldfield, 1971) with normal or corrected-to-normal vision. Before the experiment, they received information about TMS, compiled a questionnaire to ensure they were clear of contraindications (Wassermann, 1998), and confirmed their voluntary participation in writing. The study was approved by Scuola Internazionale di Studi Superiori Avanzati (International School for Advanced Studies) Ethics Committee.
Sixty-four Italian verbs denoting hand actions (“mescolare,” to stir) and 64 nonaction verbs (state/psychological verbs; “meditare,” to wonder) were used, each presented both in the 1P (“mescolo,” I stir; “digito,” I type; “scrivo,” I write) and in the 3P (“mescola,” he stirs; “digita,” he types; “scrive,” he writes) of the present tense, yielding a total of 256 items. All hand action verbs denoted meaningful transitive actions (the hand acts upon an object) with the exception of two (“salutare,” waving goodbye, and “indicare,” pointing at), which implied an intransitive action (the hand action is performed without an object). Verbs were chosen from a database used in a previous study (Papeo, Vallesi, Isaja, & Rumiati, 2009), where 375 verbs were first classified according to the criteria of the linguistic tradition that distinguishes, on the basis of their causal structure, eventive verbs (e.g., to kiss, pick, kick), which imply a change from an initial situation to a resulting one and are typically used to describe actions, from stative verbs (e.g., to love, belong, and contain), which entail a single stable situation (Jackendoff, 1990; Taylor, 1977; Vendler, 1967). These items were then presented to a panel of 10 judges who evaluated the strength of the semantic association of each verb with action and the body part involved (Papeo et al., 2009). All 128 experimental stimuli selected for this study were presented both during TMS (List 1) and sham stimulation (List 2), but verbs that appeared in the 1P form in List 1 (50%) were presented in the 3P form in List 2 and vice versa. The two lists of verbs were matched for length (number of graphemes), written frequency of the lexeme (Laudanna, Thornton, Brown, Burani, & Marconi, 1995), and degree of agreement across the 10 judges rating the semantic variables.
Moreover, within each list, the same psycholinguistic and semantic variables were matched across stimuli in the four experimental conditions (1P action, 1P nonaction, 3P action, and 3P nonaction verbs). Comparisons were performed with two-tail t tests (all ps > .05). The inflectional morphology of the present tense was regular, as described above (i.e., root + suffix [-o or -a/e]), for all verbs.
Participants sat on a height-adjustable chair 1 m from an LCD screen that displayed the stimuli (font: Arial 4 pt); they had to read each verb silently and state whether the syntactic subject was the 1P or 3P. They were encouraged to read each verb accurately to perform a successive recognition test. The instruction to read words for delayed recognition is held to trigger deep levels of processing, whereby depth is intended as the extent to which meaningfulness is extracted from the stimulus (Lockhart & Craik, 1990). The recognition test was, therefore, included to ascertain that participants processed the whole verbs and not just the suffix that, in Italian, could be sufficient for extracting information about the person. Each trial began with a 1500-Hz pure tone followed by a 100-msec blank screen, after which a central fixation was displayed for 1450 msec. The screen went blank again for 50 msec, then the verb appeared in the center for 350 msec. Afterward, three dots were displayed for 4075 msec, soliciting the participants' response. On conclusion of this cycle, the next trial began. Each trial lasted 6025 msec, a time sufficiently long to prevent interaction between consecutive TMS pulses (Robertson, Théoret, & Pascual-Leone, 2003). Half of the participants were instructed to respond /ba/ when the subject was 1P and /da/ when the subject was 3P. The remaining participants were given opposite instructions. These syllables were preferred to the more obvious “Io” (I) and “lui” (he) to avoid response bias because of phonological difference between the two responses. The voice-onset time was recorded as a measure of RTs using a microphone connected to the external response box of an E-prime PC-controlled system (Psychology Software Tools, Inc., Pittsburgh, PA). Response accuracy was recorded on-line by an experimenter pressing the right mouse key for 1P responses and the left for 3P responses. 
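The trial timeline described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the event labels and helper names are hypothetical, and the duration of the tone itself is not reported, so it is left out of the sum.

```python
# Durations (msec) of the trial events listed in the text, in order.
# The tone's own duration is not given in the source, so it is not modeled.
# Event labels and function names are hypothetical, chosen for illustration.
TRIAL_EVENTS = [
    ("blank_after_tone", 100),
    ("fixation", 1450),
    ("blank_before_verb", 50),
    ("verb", 350),
    ("response_prompt", 4075),
]


def trial_duration(events):
    """Total duration of one trial, i.e. the interval between TMS pulses."""
    return sum(duration for _, duration in events)


def onset_of(events, label):
    """Onset (msec from the start of the first event) of the named event."""
    elapsed = 0
    for name, duration in events:
        if name == label:
            return elapsed
        elapsed += duration
    raise ValueError(f"unknown event: {label}")
```

Summing the listed durations reproduces the 6025-msec trial length reported in the text, which is also the spacing between consecutive pulses when one pulse is delivered per trial.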
Each participant performed a training phase (eight trials) and then two blocks, one with M1 TMS and one with sham stimulation, each comprising a list of 128 items (32 1P action verbs, 32 3P action verbs, 32 1P nonaction verbs, and 32 3P nonaction verbs) for a total of 256 randomized trials. Each stimulation block was associated with a different list of stimuli, thus yielding the following four possible combinations [(i) Block 1: TMS List 1, Block 2: Sham List 2; (ii) Block 1: TMS List 2, Block 2: Sham List 1; (iii) Block 1: Sham List 1, Block 2: TMS List 2; (iv) Block 1: Sham List 2, Block 2: TMS List 1], which were evenly distributed across all participants. Within each block, a pause was allowed every 32 consecutive trials. The experimental design was 2 × 2 × 2 with conditions TMS (M1 TMS and sham), person (1P and 3P), and verb category (hand action and nonaction), all manipulated within subjects.
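A minimal sketch of how the four block/list combinations could be spread evenly over participants. The round-robin rule used here is an assumption for illustration; the text states only that the combinations were evenly distributed, not how participants were assigned.

```python
from collections import Counter

# The four (Block 1, Block 2) combinations listed in the text.
COMBINATIONS = [
    (("TMS", "List 1"), ("Sham", "List 2")),
    (("TMS", "List 2"), ("Sham", "List 1")),
    (("Sham", "List 1"), ("TMS", "List 2")),
    (("Sham", "List 2"), ("TMS", "List 1")),
]


def assign_combinations(n_participants):
    """Assign combinations in round-robin order (hypothetical rule)."""
    return [COMBINATIONS[i % len(COMBINATIONS)] for i in range(n_participants)]


# With 16 participants, each combination is used equally often.
assignment_counts = Counter(assign_combinations(16))
```

Each combination pairs one stimulation condition with one list per block, so both stimulation order and list assignment are balanced across the sample.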
Site and Intensity
Single-pulse TMS was applied to the left M1, using a Magstim 200 stimulator (Magstim Company, Whitland, UK) connected to a figure-of-eight coil, positioned over the participant's cortical representation of the right-hand first dorsal interosseous muscle (FDI). This site was mapped starting from the Cz reference point of the international 10–20 EEG system (Jasper, 1958) and moving approximately 6 cm leftward, that is, toward position C3. The optimal scalp position for the induction of maximum MEP amplitude in the right FDI muscle was identified and marked on each participant with a cosmetic pencil. The coil, tangential to the scalp surface, was maintained in position by an articulated arm. The TMS intensity was adjusted to 120% of the motor threshold at rest (mean threshold = 40 ± 2.8% of the maximum stimulator output), defined as the minimum intensity to evoke MEPs with ≥50 μV peak-to-peak amplitude in the relaxed FDI, in at least three of five consecutive pulses (Rossini et al., 1994). Participants were instructed to keep their right arm or hand and head motionless and were monitored throughout the experiment for muscle relaxation and the presence of a muscle twitch (i.e., an abduction movement of the right forefinger) after each M1 TMS delivery. The same magnetic pulse intensity was used for both TMS and sham stimulation. In sham, the coil was held perpendicularly to the scalp surface over the left M1. This condition provides a control for nonspecific effects of TMS, as it mimics the characteristic TMS noise and the mechanical vibration of the coil, although magnetic stimulation does not reach the scalp (Robertson et al., 2003). The order of the two stimulation conditions was counterbalanced across the participants, who were not informed whether they were going to receive TMS or sham stimulation.
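The threshold criterion and the intensity setting can be expressed as a short sketch. The function names and the input format (a list of peak-to-peak amplitudes in μV) are hypothetical; only the 50 μV criterion, the three-of-five rule, and the 120% factor come from the text.

```python
def meets_threshold_criterion(amplitudes_uv, min_amp_uv=50.0,
                              min_hits=3, n_pulses=5):
    """True if at least `min_hits` of `n_pulses` consecutive MEPs
    reach `min_amp_uv` peak-to-peak (criterion stated in the text)."""
    if len(amplitudes_uv) != n_pulses:
        raise ValueError(f"expected {n_pulses} pulses")
    hits = sum(1 for amp in amplitudes_uv if amp >= min_amp_uv)
    return hits >= min_hits


def stimulation_intensity(threshold_pct, factor=1.2):
    """Stimulation intensity (% of maximum stimulator output),
    set to 120% of the resting motor threshold."""
    return threshold_pct * factor
```

For the reported mean threshold of 40% of the maximum stimulator output, this rule gives a stimulation intensity of about 48%.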
TMS-induced MEPs were recorded by a pair of gold surface electrodes placed over the FDI (active electrode) and the metacarpophalangeal joint of index finger (reference electrode). The ground electrode was placed on the ventral surface of the right wrist. The EMG signal was amplified and filtered (bandpass 20–2000 Hz) through a Grass amplifier (P122 series) and recorded with the Biopac system (MP150 model) at a sampling rate of 5 kHz. EMG data were transferred to a personal computer for off-line analyses with Matlab (MathWorks, Natick, MA).
In both stimulation conditions, TMS was delivered 250 msec after the stimulus-onset. This timing was set empirically on the basis of a pilot test. In this pilot (Pilot 1), the approximate timing for the retrieval of the referential information was estimated implementing the same experimental paradigm as in the TMS experiment, except that TMS was not used and participants (nine right-handed native Italians, seven women, 20–35 years) were instructed to respond /io/ (I) to 1P verbs and /lui/ (he) to 3P verbs. The mean RT for referential judgments across all conditions was 288 msec. This delay is fairly consistent with electrophysiological responses associated with referential processes (i.e., a negative deflection emerging before 300 msec after word onset; Van Berkum, Koornneef, Otten, & Nieuwland, 2007). Taking the result from Pilot 1 as a general indication of the timing when individuals retrieve the referential information (i.e., the person) conveyed by the verbs, the delay of TMS delivery in the main experiment was then set at 250 msec postword onset.
It is worth noting that this timing falls fairly within the interval associated with the word comprehension process, which ERP studies have related to a sustained activity that onsets around 200 msec, peaks around 400 msec, and offsets around 500 msec (Kutas & Federmeier, 2000). The virtual identity between motor simulation (reflected in motor activity) and word comprehension, posed by the simulationist hypothesis, implies a temporal overlap between the two. Accordingly, motor facilitation implicitly triggered by the processing of action language has been reported in a time window ranging from ∼200 msec (Glenberg et al., 2008; see also Pulvermüller, 2005; Hauk et al., 2004) to ∼500 msec (Papeo et al., 2009; Oliveri et al., 2004). By stimulating at 250 msec, we could, thus, measure M1 activity in the critical interval when participants were retrieving both the verb meaning and its referential information.
The results from Pilot 1 also revealed that 1P verbs were processed faster than 3P verbs, F(1, 8) = 9.71, p = .01. To establish whether this result reflected a true advantage in the processing of 1P items or was instead biased by phonological difference in vocal responses (i.e., /io/, I, /lui/, he), a second pilot (Pilot 2) was run. In this pilot, everything was identical to the former, except that the eight new participants (five women, 20–31 years) were instructed to respond with the syllables /ba/ and /da/, as in the main TMS experiment, the mapping of each syllable to the intended response (Io-lui) being counterbalanced across participants. The analysis confirmed the temporal advantage in processing 1P over 3P verbs, F(1, 7) = 22.87, p < .01, thus excluding that the results of Pilot 1 were affected by any response bias. Note that RTs in Pilot 2 were on average ∼118 msec longer than those in Pilot 1, presumably reflecting additional processing required to map the intended responses (Io-lui) into the appropriate response label (/ba/-/da/ or vice versa). This extra stage was indeed the only difference of Pilot 2 relative to Pilot 1, which enjoyed the highest stimulus–response compatibility mapping (i.e., participants responded /Io/ or /lui/ when the intended response was Io or lui, respectively).
Participants performed a recognition test at the end of the TMS session. The experimental setup was the same as before, except that the TMS coil was removed from the participant's head and the stimulator was turned off. Eighty verbs were randomly presented, one at a time, on the computer screen in their infinitive form: 40 (20 action and 20 nonaction) were selected from the experimental stimulus set (“old” list); the remaining 40 were new action (n = 20) and nonaction (n = 20) verbs (“new” list). The old and new verb lists were matched for frequency and length. Each trial began with a fixation cross remaining on the screen for 400 msec and followed by the verb, shown up to 20 sec. Participants were instructed to read each verb and decide whether it had been presented during the TMS session by pressing the right mouse key for yes responses and the left one for no responses. They were encouraged to favor response accuracy over speed and received feedback about the correctness of their response on each trial.
The criterion for including a participant in the off-line analysis was successful performance on the recognition test. Therefore, a binomial test was performed on each individual performance (number of correct responses) to check that it was significantly above the chance level (50%). This led to the exclusion of three participants (ps > .05).
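The inclusion criterion can be illustrated with an exact binomial test against chance. This is a sketch under assumptions: the text does not say whether the test was one- or two-sided or which software was used, so the one-sided form below (probability of scoring at least as many correct responses by guessing) and the function names are illustrative only.

```python
from math import comb


def binomial_p_upper(n_correct, n_trials, p_chance=0.5):
    """Exact one-sided p value: P(X >= n_correct) under chance guessing."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


def passes_recognition_test(n_correct, n_trials=80, alpha=0.05):
    """True if performance is significantly above chance (assumed one-sided)."""
    return binomial_p_upper(n_correct, n_trials) < alpha
```

With the 80 recognition trials described above, a participant at exactly 40 correct responses would not pass, whereas a participant with 60 correct responses would.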
In the referential judgment task during the TMS session, the remaining 13 participants achieved at least 90% accuracy (M = 97%) and were all included in the off-line analysis. In the MEP analysis, the peak-to-peak amplitude (mV) of each of the 128 MEPs obtained from each participant was computed with Matlab. Trials in which participants provided an incorrect response were discarded (6%). Using the individual mean and SD of each condition, we calculated the z scores of the remaining MEPs to discard values 2 SD above or below the individual condition mean. The remaining MEP values (mV) were then subjected to a 2 × 2 repeated measures ANOVA with Person (1P vs. 3P) and Verb Category (hand action vs. nonaction) as within-subject factors.
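The MEP preprocessing steps can be sketched as follows. The authors report using Matlab; this Python version is only an illustration, and the use of the sample SD (n − 1 denominator) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def peak_to_peak(samples):
    """Peak-to-peak amplitude of one EMG epoch (same units as the input)."""
    return max(samples) - min(samples)


def reject_outliers(values, n_sd=2.0):
    """Keep values whose z score, computed from the mean and SD of the
    same condition, lies within +/- n_sd. Uses the sample SD (assumption)."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        # All values identical: nothing can be an outlier.
        return list(values)
    return [v for v in values if abs((v - m) / s) <= n_sd]
```

In this sketch, `reject_outliers` would be applied separately to each participant's MEPs in each of the four conditions, after removing trials with incorrect responses.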
In the RT analysis, including both TMS and sham conditions, trials in which participants provided an incorrect response (3%) and those with RTs 2 SD above or below the individual condition mean (5% of correct responses) were discarded. Mean accuracy and RTs were entered in a 2 × 2 × 2 repeated measures ANOVA, with factors TMS (M1 TMS vs. sham), Person (1P vs. 3P), and Verb Category (hand action vs. nonaction). Post hoc comparisons were carried out using Fisher's least significant difference test (α ≤ 0.05).
The Person × Verb category interaction was significant F(1, 12) = 8.27, p = .01 (Figure 1): Motor facilitation was greater for 1P action verbs than for 3P action verbs (p < .03), whereas no difference was found between 1P and 3P nonaction verbs (p > .1). Motor facilitation for 1P action verbs was also greater relative to 1P and 3P nonaction verbs (ps < .05) but did not differ for 3P action verbs and 1P and 3P nonaction verbs (ps > .1). The main effects of Person and Category were not significant (ps > .1).
The analysis revealed significant effects of TMS, F(1, 12) = 5.29, p = .04, and Verb Category, F(1, 12) = 7.61, p = .01. Participants were more accurate during sham TMS than during M1 TMS delivery and performed better with nonaction verbs than with hand action verbs. However, the two factors did not interact, F(1, 12) < 1. The main effect of Person was not significant, F(1, 12) < 1. The same effects (i.e., effect of TMS: F(1, 12) = 8.85, p = .01; effect of Verb Category: F(1, 12) = 7.16, p = .02; no significant interaction) were replicated when signal detection methods were used to compute d′ (Green & Swets, 1989) as a measure of sensitivity to 1P verbs.
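For reference, d′ is the difference between the z-transformed hit rate and false-alarm rate. The sketch below assumes that "1P" responses to 1P verbs count as hits and "1P" responses to 3P verbs count as false alarms; it also applies a log-linear correction (adding 0.5 to each count) to avoid infinite values at rates of 0 or 1. Both choices are assumptions, as the text does not describe them.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction is applied to both rates (assumption)."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A participant who cannot discriminate 1P from 3P verbs has equal hit and false-alarm rates and a d′ of zero; better discrimination yields larger positive values.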
The effect of Person was significant, F(1, 12) = 6.43, p = .02: Participants' responses were faster to 1P than to 3P verbs. The TMS × Person interaction approached significance, F(1, 12) = 4.36, p = .05 (Figure 2): During sham, participants were faster when processing 1P verbs than 3P verbs (p < .01); this advantage was abolished during M1 TMS delivery (p > .1). Neither the main effect of TMS nor the main effect of Verb Category was significant, F(1, 12) < 1.
Enhanced MEPs for action verbs (compared with nonaction verbs) were found only in the 1P condition but not in the 3P condition. Furthermore, the RT results showed that, in the sham condition, 1P verbs were processed overall faster than 3P verbs. Thus, it is possible that we failed to observe motor facilitation for 3P action verbs because the processing of these items was completed, on average, after 250 msec, the timing when M1 activity was recorded. If this were the case, enhanced MEPs should be associated with those 3P action verbs (as opposed to 3P nonaction verbs) that were processed as fast as the 1P items.
This was tested by carrying out a quartile analysis in which we analyzed only the MEPs corresponding to 3P trials with the shortest RTs, that is, with RTs comparable with those of the 1P condition. For each participant, we selected 3P trials whose RTs fell within the first and the second quartile of the individual 3P RT distribution; we then tested, for these trials only, any MEP difference between action and nonaction verbs. The mean RTs of the first quartile trials were 422 msec (SEM = 47 msec) and 427 msec (SEM = 53 msec) for 3P action and nonaction verbs, respectively. These values fell below the mean RTs to 1P action (586 ± 63 msec) and nonaction verbs (568 ± 60 msec). The two-tail t test revealed no difference in MEPs between 3P action and 3P nonaction trials, t(12) = −1.49, p > .16; moreover, the MEP amplitude was qualitatively greater for 3P nonaction verbs (3.3 ± 0.44 mV) than for 3P action verbs (3.1 ± 0.44 mV). The same analysis was repeated on trials belonging to the second quartile of each participant's RT distribution. The mean RTs of these trials (3P action: 531 ± 56 msec; 3P nonaction: 529 ± 55 msec) closely approached the mean RTs to 1P verbs. Again, we found that the MEP amplitude did not differ for 3P action and 3P nonaction verbs, t(12) = −0.6, p > .55, and was qualitatively greater for the latter (3.5 ± 0.54 mV) than for the former (3.4 ± 0.47 mV).
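The trial selection used in the quartile analysis can be sketched as below. The data layout (a list of RT and MEP pairs for one participant's 3P trials) and the function name are hypothetical, and the cut points rely on the default method of Python's `statistics.quantiles`; the text does not say how quartile boundaries were computed.

```python
from statistics import quantiles


def split_fast_trials(trials):
    """Split one participant's 3P trials into first- and second-quartile
    subsets by RT. `trials` is a list of (rt_msec, mep_mv) pairs."""
    rts = [rt for rt, _ in trials]
    q1, q2, _q3 = quantiles(rts, n=4)  # three quartile cut points
    first = [t for t in trials if t[0] <= q1]
    second = [t for t in trials if q1 < t[0] <= q2]
    return first, second
```

The mean MEP amplitude of action and nonaction trials within each subset would then be computed per participant and compared across participants with a paired t test, as in the analysis reported above.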
Thus, we found no indication of motor facilitation associated with the processing of 3P verbs, even when considering those trials that were processed as fast as the 1P ones. This suggests that the overall slower RTs to 3P items (vs. 1P items) cannot explain the lack of motor facilitation associated with them: By extension, motor facilitation could be said to be greater, and not just earlier, for 1P action verbs than 3P action verbs.
With the present study, we provide a novel demonstration about the specificity of motor facilitation during action verb processing by healthy participants. We compared TMS-induced MEPs when they processed 1P or 3P hand action verbs and nonaction verbs and found that M1 activity increased for action (as opposed to nonaction) verbs, only when presented in 1P. Significantly lower excitability was found when the very same verbs were presented in 3P; in this case, no difference between action and nonaction verbs was found. Crucially, the analysis of participants' RTs showed that the temporary alteration of M1 activity (and forwardly connected regions; see Bestmann, Baudewig, Siebner, Rothwell, & Frahm, 2004) through TMS delivery did not affect selectively the processing of action verbs but impacted on the processing of the verbs' person, regardless of the verb category. In particular, during sham, participants exhibited a temporal advantage in processing 1P over 3P verbs, which was abolished when TMS interfered with M1 activity. Thus, although TMS-induced MEPs revealed a 1P effect that was specific for action verbs, TMS delivery influenced the RTs to 1P items in a similar way for action and nonaction words.
To the best of our knowledge, this is the first demonstration that motor facilitation does not occur automatically following any action word, but only after words implying self-referential action meanings. Neither the verb category nor the person factors can, alone, account for this result: A verb category effect would have yielded increased M1 activity also for 3P action verbs; likewise, an exclusive effect of the person would have resulted in increased M1 activity for 1P nonaction verbs too. It is worth remarking that 1P and 3P items were morphologically identical (i.e., they shared the same root), except for the last phoneme (the suffix) carrying the person information, and were comparable for frequency and length. Thus, the reported difference in motor facilitation for 1P and 3P action verbs can hardly be attributed to any item-specific or morpho-syntactic difference other than the person. Finally, because we compared motor facilitation on randomly interleaved trials, the effect appears immediate and not attributable to long-term contextual fluctuations of participants' cortical states, which can threaten block designs. We can also exclude that the task-related cognitive effort, which may cause increased cortical excitability (Scott, McGettigan, & Eisner, 2009), accounts for this interaction: During sham stimulation, participants were faster in responding to 1P than to 3P verbs and, although they were overall less accurate on action compared with nonaction verbs, there is no indication that 1P hand action verbs were more difficult to process than 3P hand action verbs. Our results have critical implications for when and to what extent M1 is engaged in action verb processing and what its role may be.
The first theoretical implication is that the engagement of motor simulation (as reflected in M1 activity) in the lexical–semantic processing of words is not automatic; instead, it is constrained by contextual factors, such as the word having an action meaning but also the agent of the implied action. If the motor facilitation reflected the encoding of the motor attributes of a word and if such motor encoding was necessary for word comprehension, as previously suggested (Pulvermüller, 2005), we should have found equally enhanced motor facilitation for 1P and 3P action verbs as the same attributes applied to both. Furthermore, the behavioral results showed no effect of M1 TMS on the participants' processing of action verbs. Thus, the combination of MEPs and behavioral data suggests that the language-induced M1 activation, granted by the current and many other studies, does not necessarily imply that this region is making a causal contribution to the lexical–semantic processing of action words.
The difference in M1 activity between 1P and 3P action verbs also questions the basic principle of the simulationist hypothesis that action understanding relies on shared neural representations for one's own and others' actions. The motor cortex seems capable of distinguishing between self and other by representing action contents in a subject-specific (1P) rather than subject-neutral format. The hypothesis that self and other actions share neural representation has been already challenged in the domain of action observation. For instance, using TMS, Maeda, Kleiner-Fisman, and Pascual-Leone (2002) found that action observation modulated the left M1 activity maximally when the observed action matched the observer's orientation. Likewise, Jackson, Meltzoff, and Decety (2006) reported greater BOLD responses in the left motor system for 1P than for 3P views of actions (see also Meltzoff, 2007). With our study, the role of the acting subject in modulating motor activity has been extended to language processing.
A general role of M1 in self–other distinction is suggested by the abolition of the participants' advantage in processing 1P over 3P verbs when TMS stimulated M1. This advantage, found in both the pilots and the main experiment (with sham stimulation), can relate to the so-called “self-reference” effect, whereby self-related words are easier (e.g., faster RTs) to process than other-related words (Markus, 1977). Comparison between participants' behavior in the TMS condition and their behavior in the sham condition suggests a role of M1, which may relate to the corollary discharge mechanism for discriminating self-caused events from events caused by others (Sperry, 1950). In this perspective, self-attribution (the recognition of self as the agent), and by extension the self–other distinction, depends upon the interpretation, within specialized structures, of internal and external (sensory) signals associated with a movement, including feed-forward signals from M1 (Frith, Blakemore, & Wolpert, 2000). M1 forward connections may inform about the presence of 1P action verbs, which may contribute positively to the selection of a response during sham and may provide confounding information during M1 TMS. It is, however, important to observe that M1 activity was selective for action verbs, whereas the 1P advantage (and its abolition following M1 TMS) applied to both action and nonaction verbs. Thus, although altered information about the presence of 1P items, because of M1 TMS, might efficiently explain the disappearance of the 1P advantage for action verbs, it cannot account for the presence of a similar effect for the nonaction verbs. Further research is needed to clarify the mechanism for self-attribution and self–other distinction and the role of M1 and of other areas possibly involved in it.
In summary, our results show that processing action words facilitates the motor system, depending on the agent to whom the implied action is attributed: 1P action verbs facilitate M1 activity, whereas 3P action verbs do not (or, at least, significantly less). It follows that motor simulation is not triggered automatically by the word's action content itself, but only when the conceptual representation of a word integrates the action with the self as the agent of that action.
Intriguingly, these constraints on the involvement of motor simulation in language parallel those that typically elicit kinaesthetic motor imagery. Imagery, the internal rehearsal of a motor act, is accompanied by kinaesthetic sensations and motor activations similar to those of the action execution, when people generate and transform images of their own body movements (Solodkin, Hlustik, Chen, & Small, 2004). Activity in M1, premotor, and somatosensory regions is greater when people represent actions in 1P than in 3P perspective (Jackson et al., 2006) and when they imagine rotating objects with their own hands rather than when they imagine objects rotating on their own (Kosslyn, Ganis, & Thompson, 2001). Following these results, a virtual identity has been established between kinaesthetic motor imagery and first-person imagery.
An open question is whether motor simulation, implicitly triggered by (1P) action language, is truly the same as the explicit employment of kinesthetic motor imagery. Although the latter is often considered the conscious counterpart of the former, the extent to which the two processes overlap remains unclear (Jeannerod & Decety, 1995). Recently, an fMRI study by Willems, Toni, Hagoort, and Casasanto (2010) compared the neural activity associated with the implicit simulation triggered during lexical decision on hand action verbs and the explicit imagery of the content of those verbs. Univariate and multivariate analyses focusing on the left M1 revealed that both tasks activated this region, although in nonoverlapping portions, suggesting the recruitment of different neuronal populations (similar results were obtained by analyzing premotor regions). If one embraces such a distinction between implicit simulation and explicit simulation (or imagery), it is most likely that the motor facilitation we measured was a reflection of the former. In our study, participants were not explicitly cued to imagine the content of the verbs but were engaged in an active task driving their attention toward the referential information. Moreover, M1 activity was recorded at 250 msec poststimulus, that is, within the interval associated with language-related motor simulation (Hauk, Shtyrov, & Pulvermüller, 2008), and much earlier than the slow and long-lasting interval for constructing conscious images (from 400 to 750 msec or more; Iwaki, Ueno, Imada, & Tonoike, 1999; Kosslyn & Ochsner, 1994). Thus, the parallelism between language-induced simulation and explicit imagery may be limited to the constraints, at stimulus level, that trigger the two processes (i.e., the presence of a self-related action content). Nevertheless, we cannot conclusively exclude the involvement of explicit motor imagery in the MEP effect we observed.
Our objective here was to investigate the relationship between language and motor simulation; whether the latter corresponded to an implicit or an explicit process will be an issue for upcoming studies.
The idea that motor simulation induced by an action word is not the same as the lexical–semantic encoding of that word best accommodates the recent realization that motor activation arises under certain, but not all, conditions of action word processing. TMS and imaging studies reported increased motor activity only during tasks that focused participants' attention on the motor attributes of words (Papeo et al., 2009; Tomasino et al., 2007) or when action words were presented in a context (a sentence) that emphasized the representation of body movements (Raposo, Moss, Stamatakis, & Tyler, 2009). Moreover, although motor facilitation following action language is the most consistent result across TMS studies (Papeo et al., 2009; Glenberg et al., 2008; Tomasino, Fink, Sparing, Dafotakis, & Weiss, 2008), one study by Buccino et al. (2005) reported decreased M1 activity when participants listened to action sentences and another by Oliveri et al. (2004) found no difference between action and nonaction verbs when delivering single-pulse TMS to M1. Because verbs were presented in the 3P in both studies, our finding that increased M1 activity is most likely to occur following 1P action language may now help clarify apparently conflicting data.
Nothing we propose here implies that 3P action language can never trigger motor simulation. If people do not routinely attend to the referential information and use the egocentric perspective as default in mental simulation (see Willems, Hagoort, et al., 2010), the motor aspects of word meaning to which the 1P perspective is applied could themselves be sufficient to elicit motor activity. By contrast, Brunyé et al. (2009) showed that, when the linguistic elements in a sentence clearly cued the 1P or the 3P as the performer of an action, readers mentally represented or “visualized” the implied events in the egocentric or in the allocentric perspective, respectively. Those findings first suggested that the subject/agent (1P or 3P) to whom an implied action is attributed may trigger nonequivalent types of action representation. Our findings add that the 1P action representations are motor in nature (i.e., they are mediated by structures, such as M1), whereas the representations triggered by 3P action meanings do not, or are less likely to, engage motor processes.
Although there is evidence that motor simulation can be implicitly triggered by any stimulus or task with sensorimotor components (Kosslyn, Behrmann, & Jeannerod, 1995), only the simulationist account of action understanding predicts that it is necessarily and obligatorily engaged in processing action-related words. The current study reveals the limit of this view: Simulation is not the same as the word comprehension process. It could instead serve to establish an interface between conceptual (abstract) and more basic (sensory or motor) representations, which can be functional to the comprehension in itself, by enriching concepts with a physical instantiation (Rumiati, Papeo, & Corradi-Dell'Acqua, 2010; Mahon & Caramazza, 2008). Otherwise, or in addition, simulation following comprehension can serve proper motor functions, like action anticipation (Csibra, 2007); the enhancement of motor activity can have the adaptive value of bringing the system close to threshold for actual execution when words are intended as motor commands (Postle, McMahon, Ashton, Meredith, & de Zubicaray, 2008). Although conjectural at this point in time, these interpretations have the potential to stimulate a productive way of thinking about how two distinct representational levels, such as the language and the sensorimotor ones, may interact.
This research was supported by a PRIN to R. I. R. We are very grateful to Patrick Haggard for commenting on an early version of the project and to Alfonso Caramazza for his valuable help with an earlier version of the manuscript. We thank Alessio Isaja for his assistance in all technical aspects.
Reprint requests should be sent to Liuba Papeo, Cognitive Neuroscience Sector, Scuola Internazionale di Studi Superiori Avanzati (International School for Advanced Studies), Via Bonomea, 265, Trieste, Italy, or via e-mail: email@example.com.
Mean RTs to the verbs in the two experiments were tightly correlated (n = 256; r = 0.24, p > .0001).
Notice that, although our stimulation technique provided a sensitive eye on M1 activity and not on earlier stages of motor processing, it is still possible that the cortico-spinal excitability measured with MEPs is the result of neuronal activation in regions outside M1, such as premotor cortex, supplementary and cingulate motor areas, which have direct projections to the spinal cord (Morecraft et al., 2002; Fadiga et al., 1995; Luppino, Matelli, Camarda, & Rizzolatti, 1994).