Interactions between the visual system and the motor system during action observation are important for functions such as imitation and action understanding. Here, we asked whether such processes might be influenced by the cognitive context in which actions are performed. We recorded ERPs in a delayed go/no-go task known to induce bidirectional interference between the motor system and the visual system (visuomotor interference). Static images of hand gestures were presented as go stimuli after participants had planned either a matching (congruent) or nonmatching (incongruent) action. Participants performed the identical task in two different cognitive contexts: In one, they focused on the visual image of the hand gesture shown as the go stimulus (image context), whereas in the other, they focused on the hand gesture they performed (action context). We analyzed the N170 elicited by the go stimulus to test the influence of action plans on action observation (motor-to-visual priming). We also analyzed movement-related activity following the go stimulus to examine the influence of action observation on action planning (visual-to-motor priming). Strikingly, the context manipulation reversed the direction of the priming effects: We found stronger motor-to-visual priming in the action context compared with the image context and stronger visual-to-motor priming in the image context compared with the action context. Taken together, our findings indicate that neural interactions between motor and visual processes for executed and observed actions can change depending on task demands and are sensitive to top–down control according to the context.
The relationship between perception and action is a key factor in unravelling the current debate over how we understand others' intentions (Rizzolatti & Sinigaglia, 2010), how we are able to imitate (Iacoboni, 1999, 2005; Buccino et al., 2004), and how we select our actions in social contexts (Bonini & Ferrari, 2011). Whereas traditional views considered perception and action as two independent systems working serially, recent neurophysiological and behavioral data have shown that the visual and motor systems are directly linked and interdependent (Schütz-Bosbach & Prinz, 2007). Compelling evidence for this new conceptualization has been found in studies of action observation. It is well known that observing another's actions influences activity in the observer's own motor system (visual-to-motor priming; Kilner, Paulignan, & Blakemore, 2003; Fadiga, Fogassi, Pavesi, & Rizzolatti, 1995), so that he or she is faster to initiate movements that match (or are congruent with) a concurrently observed action than to initiate movements that are mismatched (incongruent; Craighero, Bello, Fadiga, & Rizzolatti, 2002). What is less widely appreciated is that motor plans can also modulate the perceptual processing of observed actions (Bortoletto, Mattingley, & Cunnington, 2011; Cattaneo et al., 2010; Press, Gherri, Heyes, & Eimer, 2010) and influence perceptual judgments (motor-to-visual priming; Schütz-Bosbach & Prinz, 2007).
In primates, networks of parietal and premotor cortical areas provide a link between visual and motor systems. The discovery of mirror neurons in these regions, that is, neurons that respond during both action execution and action observation, suggests that mirror neurons provide a potential neurophysiological substrate for the interaction between action and perception (Gallese, Fadiga, Fogassi, & Rizzolatti, 1996). Mirror mechanisms might therefore support sensory–motor transformations of actions during action observation and mediate visuomotor interference effects, as many researchers have proposed (Gallese, Gernsbacher, Heyes, Hickok, & Iacoboni, 2011; Heyes, 2011; Longo & Bertenthal, 2009; Schütz-Bosbach & Prinz, 2007; Kilner et al., 2003; Craighero et al., 2002).
A crucial question is whether the link between perception and action, that is, visual–motor interaction, can be regulated by higher cognitive processes. Human studies on action observation suggest that attention can modulate interactions between the visual and motor systems. Indeed, directing attention to an observed action (Spengler, Brass, Kühn, & Schütz-Bosbach, 2010; Chong, Cunnington, Williams, & Mattingley, 2009; Bach, Peatfield, & Tipper, 2007) or to specific action-related features of a stimulus (Longo & Bertenthal, 2009; Longo, Kosobud, & Bertenthal, 2008) increases visuomotor interference effects and affects cortical activity associated with action observation (Schuch, Bayliss, Klein, & Tipper, 2010; Chong, Cunnington, Williams, Kanwisher, & Mattingley, 2008). Moreover, it has been shown that motor cortex activity during action observation, as indexed by mu-rhythm suppression (Muthukumaraswamy, Johnson, & McNair, 2004; Pineda, Allison, & Vankov, 2000), can be modulated by involvement in social interactions (Perry, Stein, & Bentin, 2011; Oberman, Pineda, & Ramachandran, 2007). Nevertheless, Cook, Bird, Lünser, Huck, and Heyes (2012) have shown that the tendency to imitate others' actions is automatic, in that it occurs even when imitative behavior is unhelpful, suggesting that visuomotor interference (i.e., automatic imitation) may not be completely inhibited by actors' intentions.
Here, we explored top–down regulation of visuomotor interference by varying the cognitive context in which participants performed hand gestures that they executed while viewing congruent or incongruent gesture images. On the basis of previous work (Teufel, Fletcher, & Davis, 2010; Vogt, Taylor, & Hopkins, 2003; Craighero et al., 2002), we reasoned that the direction of any visual–motor interaction, that is, either as visual-to-motor priming or motor-to-visual priming, should change depending on contingent cognitive demands. Specifically, when participants are required to observe and attend to actions performed by someone else, visual-to-motor priming should be most salient and thus observed actions should exert a robust influence on the preparation and execution of one's own actions. Conversely, when participants are required to monitor their own actions, motor-to-visual priming should become more apparent and thus intended actions should have a greater influence on the visual processing of observed actions. Such evidence would provide a crucial test of accounts that assume that top–down control in response to task demands dynamically alters bidirectional interactions between visual and motor systems during action observation.
Here, we distinguished visual-to-motor and motor-to-visual priming using a novel behavioral task combined with simultaneous recording of ERPs. Visual ERPs evoked by action observation, during concurrent action planning, permit examination of the influence of the motor system on early visual processing of observed actions (motor-to-visual priming). Previous studies have shown a negative ERP component, occurring around 170 msec after the presentation of images of body parts, that is referred to either as an N170 (Kovacs et al., 2006) or N1 (Taylor, Roberts, Downing, & Thierry, 2010) component. This N170 peak corresponds to the structural encoding of the stimulus (Kovacs et al., 2006; Eimer, 2000; Bentin, Allison, Puce, Perez, & McCarthy, 1996) and is modulated by the congruency between performed actions and observed actions (Bortoletto et al., 2011). The N170 therefore provides an index of neural processes associated with visual perception of observed gestures (Peelen & Downing, 2007). Similarly, movement-related potentials preceding the motor response to a visual stimulus can be employed to examine the influence of a visual stimulus on neural processes associated with planning for action (visual-to-motor priming; Eimer, 1998; Gratton et al., 1990). Indeed, the neural processes associated with the preparation and selection for action involve activity lateralized to the motor cortex contralateral to the side of movement and are reflected in lateralized readiness potentials (LRPs; Leuthold, Sommer, & Ulrich, 2004; Kutas & Donchin, 1980). Here, we examined the N170 and LRP to index visual and motor processes as participants observed and executed actions concurrently, with the aim of investigating whether visual–motor interactions during action observation are modulated by cognitive context.
Twenty-four right-handed healthy volunteers (12 women, aged 18–39 years, mean age = 22.5 years) gave their written informed consent to participate in the study. The study was approved by the medical research ethics committee of The University of Queensland. Data from four participants were excluded from analyses because of excessive artifacts in the EEG signal.
Participants were comfortably seated in a dimly illuminated room, facing an LCD monitor placed 70 cm in front of them. Their hands rested in a comfortable position on a table, with response boxes positioned adjacent to each hand. Their right arm was placed beneath a cardboard occluder to prevent them from seeing their own hand movements.
The experiment consisted of a delayed go/no-go task (Figure 1). In each trial, participants were required to prepare one of four possible hand gestures as indicated by a word cue presented in the center of the display (“OK,” “peace,” “thumbs-up,” or “point”) and to perform the gesture as quickly as possible upon presentation of the go stimulus. The go stimulus consisted of a photograph of one of the four possible hand gestures performed by the same actor, depicted from an allocentric (third-person) perspective (Figure 2A). All gesture images were presented centrally at fixation, within approximately 10° of visual angle. The gesture image either matched the action being prepared (congruent trials) or was one of the other nonmatching gestures (incongruent trials). Because gesture images depicted a right hand from an allocentric perspective, observed actions and performed actions did not share spatial features. To avoid a disparity in the frequency of presentation of particular actions as “congruent” or “incongruent,” each gesture was paired with one of the other gestures (counterbalanced between participants), and for incongruent trials, only the paired incongruent gesture was used. Participants responded to the go stimulus with their right hand by releasing a button and performing the prepared action as quickly as possible. Performed actions were monitored by the experimenter via an infrared camera mounted inside the occluding box around the participant's hand. In one third of the trials, a no-go stimulus was presented, represented by a “stop” hand gesture, and participants were required to withhold their response.
To manipulate the cognitive context, participants were instructed at the start of a block of trials that they would be required to perform an “action recall task” and report either the hand action they had performed (action context) or the hand action they had seen (image context). Following the go/no-go stimulus (a variable delay of 700–1300 msec), a horizontal array of all four hand gesture images was presented, and participants were required to select the appropriate gesture by pressing the corresponding key with their left hand. The cognitive context changed between blocks.
At the beginning of the experiment, all participants undertook a short training session to familiarize themselves with the actions employed in the task. Initially, participants simply watched and imitated the four gestures eight times each to learn the gestures and their associated word-cue labels. Next, participants performed two practice blocks of 24 trials of the go/no-go task, one for each context. In these blocks, the selection of the action to be remembered was followed by feedback (correct/incorrect) on responses. For the full experiment, participants performed the go/no-go task in 10 blocks of 52 trials, alternating for each cognitive context, with the order counterbalanced between participants. The first four trials of each block were discarded from analyses to avoid possible task-switching effects. (These excluded trials were also balanced across all possible task conditions so as not to affect the counterbalancing of the analyzed trials.) Four hundred eighty trials were analyzed, consisting of 80 trials for each condition (congruent, incongruent, and no-go) in each context (image vs. action).
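The trial counts reported for the full experiment can be verified with a short calculation. The Python sketch below is purely an arithmetic illustration of the design described above; the variable names are hypothetical and are not taken from the experimental software.

```python
# Trial bookkeeping for Experiment 1 (illustrative arithmetic only).
blocks = 10                # full-experiment blocks
trials_per_block = 52
discarded_per_block = 4    # first four trials of each block were discarded

analyzed = blocks * (trials_per_block - discarded_per_block)

conditions = ["congruent", "incongruent", "no-go"]
contexts = ["image", "action"]
trials_per_cell = 80       # trials per condition in each context

expected = trials_per_cell * len(conditions) * len(contexts)

# Both routes should give the same number of analyzed trials.
print(analyzed, expected)
```

Both quantities come to 480, and the 80 no-go trials out of 240 per context correspond to the one third of trials on which a response had to be withheld.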
Recording and Analyses
A 64-channel EEG and vertical and horizontal EOGs were recorded continuously using a BioSemi Active Two EEG system, at a sampling rate of 512 Hz. EEG data were rereferenced off-line to an average reference.
We analyzed visual ERPs time-locked to the presentation of the gesture image, focusing specifically on the N170 component, and premovement potentials time-locked to the onset of movement (as indexed by button release), during the delay following the go signal. For the N170 analyses, the EEG signal was band-pass filtered at 1–45 Hz, segmented into epochs from 100 msec before to 500 msec after the go stimulus onset, and baseline corrected relative to the first 100 msec of the epoch. For the premovement ERPs, the signal was band-pass filtered at 0.1–30 Hz, segmented into epochs from 800 msec before to 500 msec after the movement onset, and baseline corrected relative to the first 300 msec of the epoch. If the EEG signal exceeded ±120 μV or if eye blink, eye movement, or other artifacts were present, epochs were rejected from the analyses. Average ERPs were calculated separately for congruent and incongruent trials in the action context and the image context.
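The segmentation, baseline correction, and amplitude-based rejection steps described above can be sketched for a single channel as follows. This is a minimal illustration in plain Python, not the analysis software used in the study: the function name `extract_epoch` and its parameters are hypothetical, filtering and ocular artifact detection are omitted, and applying the ±120 μV criterion after baseline correction is an assumption, because the order of these steps is not specified in the text.

```python
def extract_epoch(signal, event_idx, sfreq=512, tmin=-0.1, tmax=0.5,
                  baseline_dur=0.1, reject_uv=120.0):
    """Cut one epoch around an event, baseline-correct it, and apply
    an absolute amplitude criterion.

    signal       : list of voltages in microvolts for one channel
    event_idx    : sample index of the time-locking event
    tmin, tmax   : epoch limits in seconds relative to the event
    baseline_dur : length in seconds of the baseline at the epoch start
    Returns the corrected epoch, or None if it is rejected or incomplete.
    """
    start = event_idx + round(tmin * sfreq)
    stop = event_idx + round(tmax * sfreq)
    if start < 0 or stop > len(signal):
        return None  # epoch would run past the edge of the recording

    epoch = signal[start:stop]

    # Subtract the mean of the first baseline_dur seconds of the epoch.
    n_base = round(baseline_dur * sfreq)
    baseline = sum(epoch[:n_base]) / n_base
    corrected = [v - baseline for v in epoch]

    # Reject the epoch if any sample exceeds the amplitude threshold
    # (assumed here to be applied after baseline correction).
    if max(abs(v) for v in corrected) > reject_uv:
        return None
    return corrected


# Stimulus-locked epoch for the N170 analysis (toy flat signal).
toy_signal = [5.0] * 1000
n170_epoch = extract_epoch(toy_signal, event_idx=500)
```

Under the same assumptions, a movement-locked epoch for the premovement analysis would use `tmin=-0.8`, `tmax=0.5`, and `baseline_dur=0.3`.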
We measured the N170 amplitude in each individual as the maximum negative peak recorded between 130 and 200 msec at the electrode showing maximal amplitude (PO7) and on the corresponding electrode in the opposite hemisphere (PO8). For premovement ERPs, we analyzed lateralized activity over the left motor cortex preceding movement onset, which we will refer to as the left LRP (L-LRP). This activity was calculated in the same way as conventional LRPs (Gratton, Coles, Sirevaag, Eriksen, & Donchin, 1988) by subtracting activity over the motor cortex ipsilateral to the side of movement from activity of the motor cortex in the contralateral hemisphere. Because all movements were performed with the right hand, we calculated the difference waveform between electrodes C3 and C4. The mean amplitude calculated over the last 300 msec before movement onset was used to quantify the L-LRP.
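The two ERP measures defined above can be illustrated with a short Python sketch. It assumes averaged waveforms stored as plain lists sampled at 512 Hz, with the stimulus-locked epoch starting 100 msec before stimulus onset and the movement-locked epoch starting 800 msec before movement onset. The function names are hypothetical, and taking the most negative sample in the window is a simplification of peak detection.

```python
def sample_index(t_ms, epoch_start_ms, sfreq=512):
    """Convert a latency in msec (relative to the event) to a sample index."""
    return round((t_ms - epoch_start_ms) * sfreq / 1000)


def n170_peak(erp, epoch_start_ms=-100, win_ms=(130, 200), sfreq=512):
    """Most negative amplitude within the N170 window (inclusive)."""
    lo = sample_index(win_ms[0], epoch_start_ms, sfreq)
    hi = sample_index(win_ms[1], epoch_start_ms, sfreq)
    return min(erp[lo:hi + 1])


def l_lrp(c3, c4, epoch_start_ms=-800, win_ms=(-300, 0), sfreq=512):
    """Mean C3 minus C4 difference over the 300 msec before movement onset.

    For right-hand movements C3 is contralateral and C4 ipsilateral, so
    the ipsilateral signal is subtracted from the contralateral one.
    """
    diff = [a - b for a, b in zip(c3, c4)]
    lo = sample_index(win_ms[0], epoch_start_ms, sfreq)
    hi = sample_index(win_ms[1], epoch_start_ms, sfreq)
    window = diff[lo:hi]
    return sum(window) / len(window)
```

In this sketch, `n170_peak` would be applied separately to the PO7 and PO8 waveforms of each condition, and `l_lrp` to the C3 and C4 waveforms of the movement-locked average.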
Separate repeated-measures ANOVAs were conducted on the N170 amplitude, L-LRP amplitude, and RTs. For the N170, the ANOVA included factors of Context (action context, image context), Congruency of gesture images (congruent, incongruent), and Electrode (PO7, PO8). For the L-LRP and RTs, the ANOVAs included factors of Context (action context, image context) and Congruency of gesture images (congruent, incongruent). In all analyses, significant interactions were followed up with paired t tests as post hoc tests. We also conducted correlation analyses between the behavioral index of the priming effect and the neurophysiological indices. For each participant, we calculated the RT difference between incongruent and congruent trials and correlated this separately with the N170 amplitude difference and the L-LRP amplitude difference for incongruent versus congruent trials.
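The correlation analysis can be illustrated with a minimal Python sketch. It computes per-participant congruency effects (incongruent minus congruent) and a Pearson correlation between two such difference scores. The function names are hypothetical, the numbers are made-up toy values rather than data from the study, and the use of the Pearson coefficient is an assumption, because the type of correlation is not stated.

```python
from math import sqrt


def congruency_effect(incongruent, congruent):
    """Per-participant difference score: incongruent minus congruent."""
    return [i - c for i, c in zip(incongruent, congruent)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy values for four hypothetical participants (not data from the study).
rt_effect = congruency_effect([700, 720, 690, 710], [650, 680, 660, 655])
n170_effect = congruency_effect([-4.0, -5.5, -3.0, -6.0],
                                [-5.0, -6.0, -3.5, -7.5])

r = pearson_r(rt_effect, n170_effect)
```

The same two functions would be applied to the L-LRP amplitude differences to obtain the second correlation.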
Errors in participants' performance were assessed by measuring anticipations (i.e., responses initiated before the go cue), misses, or wrongly performed actions for the go cue; false alarms for the no-go cue; and incorrect actions reported.
Results and Discussion
Mean amplitudes of the N170 (Figure 3A), which reflect the visual processing of observed action images, revealed that motor-to-visual priming was modulated by cognitive context. The N170 was significantly larger for congruent than incongruent gestures [main effect of Congruency, F(1, 19) = 5.08, p < .05] and also larger in the image context than in the action context [main effect of Context, F(1, 19) = 7.90, p < .05]. Crucially, there was a significant interaction between the congruency effect and the cognitive context [Context × Congruency interaction, F(1, 19) = 4.42, p < .05]. In the action context, paired t tests showed that the N170 was significantly larger for congruent gestures compared with incongruent gestures (p < .05); by contrast, in the image context, there was no significant difference in the N170 amplitude between congruent and incongruent gestures (p > .05). Moreover, the N170 for incongruent gestures was significantly smaller in the action context than in the image context (p < .005). To summarize, the influence of motor plans on visual processing of observed actions, that is, the congruency effect on the N170, was significant only in the action context, when participants were required to focus on their intended actions. By contrast, when participants focused on the gesture images themselves, there was no effect of congruency on the N170 component.
The N170 was also more prominent over the left hemisphere than over the right hemisphere [main effect of Electrode, F(1, 19) = 11.42, p < .005]. Previous studies have suggested that the N170 is lateralized to the left hemisphere for the visual processing of hand images (Wheaton, Pipingas, Silberstein, & Puce, 2001; McCarthy, Puce, Belger, & Allison, 1999). Also, in a previous study, we found that the N170 was significantly larger over the hemisphere contralateral to a planned action, suggesting an enhanced responsiveness of the neural generators of the N170 to planned actions in the hemisphere involved in preparing the action (Bortoletto et al., 2011). The laterality of the N170 did not interact with any other factor.
Results for the L-LRP (Figure 3B), which reflects motor processes for action preparation, revealed that the cognitive context influenced visual-to-motor priming in a complementary way to that observed for the N170. The L-LRP was significantly larger for congruent than incongruent gestures [main effect of Congruency, F(1, 19) = 9.84, p < .01], and this congruency effect was again modulated by the cognitive context [Context × Congruency interaction, F(1, 19) = 5.00, p < .05]. Paired t tests revealed that, in the image context, the L-LRP was significantly larger for congruent trials than for incongruent trials (p < .005); by contrast, in the action context, the L-LRP was not significantly different between congruent and incongruent trials (p > .05). Because the L-LRP measure was baseline-corrected to a time interval that overlapped the presentation of the gesture image, we performed a second analysis in which the baseline was set between −800 and −600 msec (corresponding, on average, to the 200 msec preceding the presentation of the gesture image). This additional analysis yielded the same statistical results, confirming that differences in L-LRP amplitudes were not simply caused by differences in gesture image processing occurring during the L-LRP baseline interval. To summarize, the influence of observed actions on neural processes associated with motor planning (the congruency effect on the L-LRP) was significant only in the image context, when participants focused on the observed actions. By contrast, when participants focused on their performed actions, there was no effect of congruency on the L-LRP.
Correlation analyses performed across cognitive contexts showed no significant association between effects of congruency on RTs and congruency effects on either the N170 or L-LRP amplitudes (highest r = .27, p = .24).
Taken together, the analyses of the N170 and L-LRP revealed stronger motor-to-visual priming in the action context and stronger visual-to-motor priming in the image context, consistent with the hypothesis that visual–motor interactions are sensitive to context during action observation (Teufel et al., 2010).
Mean RTs to the go signal replicated previous findings of visuomotor interference (for a review, see Heyes, 2011; Figure 4). RTs were significantly shorter when the depicted action and the executed action were congruent than when they were incongruent [main effect of Congruency, F(1, 19) = 19.36, p < .005]. Interestingly, there was also a significant interaction between congruency and cognitive context [Context × Congruency interaction, F(1, 19) = 7.24, p < .05]. Paired t tests showed that, for incongruent trials, RTs were significantly longer in the image context than in the action context (p < .05), whereas there was no significant difference between contexts for congruent trials (p > .05). In summary, incongruency between an observed action and a performed action significantly slowed the time required to initiate a response, and this effect was larger in the image context than in the action context.
Participants made very few errors across conditions and contexts. The mean percentage of errors for reporting gestures was 2.31% in the image context (reporting the action seen) and 1.81% in the action context (reporting the action performed). This difference was not significant (paired t test: t(19) = .94, p = .36). Likewise, the percentage of errors for the go stimulus was low, with less than 5% anticipations, misses, and wrongly performed actions combined and only 11% false alarms to the no-go stimulus.
Overall, Experiment 1 showed that visual processing and motor preparation in a visuomotor interference paradigm are modulated depending on the cognitive context, suggesting that top–down control can influence the mechanisms of interaction between the motor system and the visual system. In general, the congruency between observed and executed actions did not affect neural processes for that aspect of the task that was most relevant and specifically attended; it was only the nonrelevant aspect of the task that was affected by congruency. For example, when participants focused on observed actions, the visual processing of those actions (N170) was not affected by the congruency of planned actions, whereas motor preparatory activity (L-LRP) was significantly increased for congruent actions. Conversely, when participants focused on their performed actions, motor preparatory activity (L-LRP) was not affected by the congruency of observed actions, whereas N170 amplitudes during early visual processing were significantly smaller for incongruent than congruent actions.
It is not possible at this stage to identify the level at which interactions between the motor and visual systems (as defined in Heyes, 2011) are modulated by cognitive context: the input level, that is, the processing of the visual stimulus; the level of visual–motor associations; or the output level, that is, the motor program. Rather, our data suggest that the direction of the visual–motor interaction, that is, either as visual-to-motor priming or motor-to-visual priming, can change according to contingent demands.
The direction of the congruency effect we observed for the N170 is at odds with previous findings. In the current study, the N170 was larger for congruent gestures than for incongruent gestures in the action context, whereas previous studies (Bortoletto et al., 2011; Press et al., 2010) reported that the N170 was smaller for congruent gestures than for incongruent gestures. Although this “inversion of the congruency effect” has been shown to depend on spatial compatibility between observed and executed actions (Press et al., 2010), it is unlikely that spatial congruency, that is, the matching of spatial features between the performed actions and the observed actions, can explain our results. Indeed, we controlled for spatial compatibility effects by carefully constructing the images so that they did not have more critical features on the left than the right side and so that they had roughly equal pixels left and right. Moreover, the same stimuli were used in our previous study (Bortoletto et al., 2011) in which the N170 was larger for incongruent gestures compared with congruent gestures. Therefore, the inversion of the effect cannot be attributed to specific spatial features of the stimuli.
Alternatively, this inversion of the congruency effect might depend on the observer's point of view, that is, an egocentric perspective (the observer sees the action as if it were performed by himself or herself) versus an allocentric perspective (as if the observer faces someone else performing the action). Only Press et al. (2010) have previously investigated the visual processing of observed actions when action images are presented as the go stimulus, as in our Experiment 1, and they included images in an egocentric perspective. Interestingly, our Experiment 1 suggests that, when observed actions, as the go cue, are seen from an allocentric perspective, the congruency effect is reversed relative to when actions are observed from an egocentric perspective (Press et al., 2010). In a purely behavioral study, we recently reported that the effect of cognitive context on visuomotor interference is indeed perspective-dependent, that is, visuomotor interference in the action context is stronger for gesture images from an egocentric perspective than for those from an allocentric perspective (Bortoletto, Mattingley, & Cunnington, 2013). Hand images in the egocentric perspective better match the representation of the action outcome that is created at the time of movement initiation (Kilner, Friston, & Frith, 2007; Hommel, Müsseler, Aschersleben, & Prinz, 2001) and may therefore trigger different matching processes between predicted and observed action outcomes (Bortoletto et al., 2013). To verify this effect of perspective and to provide further support for the conclusions drawn from Experiment 1, we replicated the initial experiment in an independent sample. Specifically, in Experiment 2, we repeated the action context of Experiment 1 in which congruency effects on the N170 were found. This time, however, we varied the perspective of the observed actions to directly compare congruency effects on the N170 for gesture images presented from egocentric compared with allocentric perspectives.
Experiment 2 involved 18 right-handed healthy volunteers (10 women, aged 19–28 years, mean age = 22.3 years). None of them had taken part in Experiment 1. Participants gave their written informed consent to participate, and the study was approved by the medical research ethics committee of The University of Queensland.
The setup and stimuli from Experiment 1 were used again for Experiment 2. After a brief training session, participants performed the same delayed go/no-go task in the action context as in Experiment 1 but with the presented hand gesture images differing in terms of the perspective from which the gesture images were depicted: egocentric (first person) perspective and allocentric (third person) perspective (Figure 2). Participants were required to prepare a hand gesture as indicated by an initial word cue and to perform the gesture as quickly as possible upon presentation of the go stimulus. Participants responded to the go stimulus with their right hand by releasing a button and performing the prepared action as quickly as possible. In one third of the trials, a no-go stimulus was presented, represented by a “stop” hand gesture, and participants were required to withhold their response. As Experiment 2 was conducted entirely in the action context, following the response to the go/no-go stimulus, participants were required to report the hand action they had performed by selecting one of the four gesture images presented on the screen.
Gesture images in the egocentric and allocentric perspectives were presented in random order in all blocks. Participants completed 13 blocks of 40 trials each. The first block was used as practice and excluded from analyses. As in Experiment 1, 480 trials were analyzed, consisting of 80 trials for each condition (congruent, incongruent, and no-go) in each perspective condition (egocentric, allocentric).
ERPs and Statistical Analyses
EEG recording and data preprocessing of ERPs, time-locked to the presentation of the gesture image, were performed exactly as in Experiment 1. For the data analysis, the N170 was measured as the peak amplitude in the time window between 130 and 210 msec following presentation of the gesture image at the electrode showing maximal amplitude (P7) and the corresponding electrode in the opposite hemisphere (P8). Statistical analyses on the N170 were conducted using a three-way repeated-measures ANOVA including factors of Perspective (egocentric, allocentric), Congruency of gesture images (congruent, incongruent), and Hemisphere (P7 left, P8 right). Interactions involving Congruency were followed up with paired t tests to compare ERPs to congruent and incongruent hand gesture images.
RTs to the go stimulus were recorded and analyzed using a two-way repeated-measures ANOVA that included factors of Perspective (egocentric, allocentric) and Congruency of gesture images (congruent, incongruent). Participants' compliance with task instructions was assessed by measuring anticipations (i.e., responses initiated before the go cue), misses, or wrongly performed actions for the go cue; false alarms for the no-go cue; and incorrect actions reported.
Results and Discussion
As can be seen in Figure 5, the N170 amplitude was modulated by the congruency between observed action and performed action for stimulus images presented both in egocentric and allocentric perspectives. Overall, the N170 was significantly larger over the left hemisphere than over the right hemisphere [main effect of Hemisphere, F(1, 17) = 19.23, p < .05]. Crucially, the ANOVA revealed a three-way interaction between Hemisphere, Congruency, and Perspective [F(1, 17) = 10.86, p < .05]. In the left hemisphere, post hoc tests showed that the N170 was significantly larger for congruent trials than incongruent trials in the allocentric perspective [t(17) = 2.14, p < .05]. By contrast, in the egocentric condition, the N170 showed an opposite tendency, being slightly larger for incongruent than congruent trials, although this difference was not statistically significant [t(17) = 1.40, p = .18]. No congruency effects were found over the right hemisphere.
These results replicate our results from Experiment 1, showing that visual processing of observed actions is modulated when a concurrent motor plan is task relevant (action context). Specifically, the N170 was larger for congruent trials compared with incongruent trials in the allocentric perspective condition. Moreover, the perspective of the observed action seems to determine the direction of the congruency effect on the N170. In the egocentric perspective, although the difference between congruent and incongruent trials was not statistically significant, the direction was reversed and consistent with that reported previously by Press et al. (2010) for gesture images in an egocentric perspective.
RTs to initiate the hand actions were significantly faster in congruent trials than in incongruent trials [main effect of congruency: F(1, 17) = 13.10, p < .005]. Crucially, the congruency effect was modulated by the perspective of observed actions [Perspective × Congruency interaction: F(1, 17) = 4.58, p < .05]. This interaction arose from a much larger congruency effect for images presented in the egocentric perspective (congruent: 646 ± 159 msec, incongruent: 726 ± 197 msec) compared with images presented in the allocentric perspective (congruent: 661 ± 172 msec, incongruent: 701 ± 184 msec). These results replicate those reported in a recent study in which we examined the effects of perspective and context on visuomotor interference (Bortoletto et al., 2013) and are consistent with previous studies showing that action images presented from an egocentric perspective trigger stronger visuomotor interference than images from an allocentric perspective (Bruzzo, Borghi, & Ghirlanda, 2008; Jackson, Meltzoff, & Decety, 2006).
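The size of the congruency effect in each perspective follows directly from the mean RTs reported above. The short Python sketch below simply reproduces that subtraction; the dictionary layout and variable names are illustrative only.

```python
# Mean RTs in msec, as reported above for Experiment 2.
mean_rt = {
    "egocentric": {"congruent": 646, "incongruent": 726},
    "allocentric": {"congruent": 661, "incongruent": 701},
}

# Congruency effect = incongruent minus congruent, per perspective.
effect = {p: v["incongruent"] - v["congruent"] for p, v in mean_rt.items()}

# Difference between the two congruency effects (the interaction contrast).
contrast = effect["egocentric"] - effect["allocentric"]

print(effect)    # {'egocentric': 80, 'allocentric': 40}
print(contrast)  # 40
```

On these means, the congruency effect is 80 msec in the egocentric perspective and 40 msec in the allocentric perspective.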
Error rates on the task were low and did not differ significantly between conditions, suggesting that participants performed the task well and according to instructions. The percentages of error trials on the go stimulus for the egocentric and allocentric conditions, respectively, were 0.10% and 0.20% for anticipations [t(17) = 4.20, p > .05] and 2.50% and 2.95% for misses [t(17) = 4.20, p > .05]. False alarms to the no-go stimulus were 5.07%. The mean percentage of errors in reporting the gestures performed was 4.65% in the egocentric condition and 4.31% in the allocentric condition [t(17) = 0.53, p > .05].
We suggest that observing actions from an egocentric or allocentric perspective may lead to different identification of the actions' agent, that is, the participants themselves or others, respectively (Bortoletto et al., 2013; David et al., 2006; Vogt et al., 2003). If an incongruent action is observed from an egocentric perspective, visual processing of the action may trigger detection of a mismatch between intended actions and current external cues, which could be important for action updating at the time of movement initiation (Bortoletto et al., 2013; Press et al., 2010). By contrast, images seen from an allocentric perspective, which would be viewed as actions performed by another, may be less relevant for action control at the time of movement initiation and thus may not trigger the same “mismatch mechanisms.” The visual processing of congruent actions from an allocentric perspective may therefore be facilitated because such actions represent the same action goal but do not conflict directly with the intended action because of the different perspective and agency implied.
Moreover, this mechanism may also depend on the timing of the presentation of the gesture image. In the early stages of motor planning, the representation of an action's consequences may be sufficiently abstract to ignore the perspective from which the action is observed, and incongruent observed actions both from egocentric and allocentric perspectives may trigger “mismatch” mechanisms. This would fit with our previous study (Bortoletto et al., 2011) in which observed actions were presented early during motor planning rather than at the time of movement initiation. Further studies on these complex mechanisms are warranted.
To summarize, Experiment 2 replicated and extended our findings in Experiment 1. Taken together, the results of the two experiments suggest that, when performed actions are most relevant, the motor plan for intended actions modulates the visual processing of concurrently observed actions. In addition, the direction of this effect, that is, whether the N170 is increased for congruent or incongruent observed actions, is influenced by the viewer's perspective when images are presented at the time of movement initiation.
Overall, our findings indicate that visual–motor interactions during action observation can change depending on cognitive context, suggesting that interactions between the motor system and the visual system are influenced by top–down (intentional) control. When performed actions were task relevant, the motor system had a stronger influence on the visual system (motor-to-visual priming). Conversely, when observed actions were task relevant, the visual system had a stronger influence on the motor system (visual-to-motor priming).
It should be noted that both the action and image contexts produced similar effects on RTs to initiate movements. Indeed, in both context conditions, participants were significantly faster to initiate movements that were congruent with the observed action than to initiate movements that were incongruent, although this effect was stronger in the image context. This is not surprising given that RTs presumably reflect a combination of perception, selection, and motor planning processes before the initiation of responses and therefore may not be sensitive to the effects of context on different stages of processing. Crucially, by using ERPs, we were able to show that visual–motor interactions during action observation were the result of two different underlying processes operating in different contexts. In future work, it would be interesting to examine how a behavioral measure based purely on perceptual discrimination, as opposed to RT, is influenced by context and congruency of concurrently planned actions.
In the action context, the N170 amplitude was significantly modulated by the congruency between the observed action and the performed action, suggesting that the participant's planned or intended action interfered with the visual processing of the incongruent observed gesture. Previous studies have shown that the amplitude of the N170 to observed actions is modulated when participants concurrently prepare or execute an action (Bortoletto et al., 2011; Press et al., 2010). Moreover, we have shown that more general semantic aspects of action representation, and holding the action representation in working memory, do not induce effects on the N170; instead, this modulation is specific to action planning (Bortoletto et al., 2011). Importantly, in these previous studies, the content or meaning of the observed action was irrelevant to the participant's task, equivalent to the action context in the current study. Here, we have shown that, when participants must encode the observed action for later report, there is no influence of the concurrently planned action on visual processing (N170). It is only in the action context, when the observed action is task irrelevant, that the N170 is influenced by concurrent motor plans.
In the image context, motor preparatory activity (L-LRP) was significantly larger in amplitude for congruent than incongruent gestures, suggesting that observed actions caused a modulation of motor activity for the preparation of congruent motor responses. Such an increase in motor activity suggests that action observation may have activated motor representations of the same action, thereby facilitating movement-related activity in the motor system in the contralateral hemisphere. It should be noted, however, that the L-LRP reflects lateralized activity associated with action selection and preparation but differs from more typical LRPs because it was calculated with a single subtraction method in the left hemisphere only. Such an approach permits examination of lateralized activity, but it does not necessarily eliminate potential confounds related to hemispheric differences (Oostenveld, Stegeman, Praamstra, & Van Oosterom, 2003). Nevertheless, differences in amplitude of the L-LRP across conditions and contexts in this study clearly provide an index of the modulation of motor-related activity before movement initiation.
The lack of effects on the N170 in the image context may at first seem surprising considering that the observed action was held in working memory until a response was required, that is, during the execution of the movement. However, it is unlikely that the maintenance of the gesture image in working memory alone would directly induce an effect on the N170. Indeed, a previous study on working memory for human body forms showed that processes associated with the early encoding of the stimulus (i.e., the N170) are not modulated by the working memory load in the encoding phase (Bauser, Mayer, Daum, & Suchan, 2011), corresponding to the time of presentation of the gesture image in our study. Moreover, several studies have reported the P300 to be the earliest ERP component modulated by working memory in the encoding phase (Kok, 2001; Fabiani, Karis, & Donchin, 1986).
Similarly, it is unlikely that our findings are explained by the semantic representation of actions alone or that the sensory–motor interference we observed arose from a more abstract or semantic-level representation of the gestures evoked by the initial word cue rather than the specific planning of action. In a previous study, in which we used the same meaningful, symbolic gestures in a similar paradigm, we found no effects of semantics on sensory–motor interference (Bortoletto et al., 2011). In that study, we included a control condition in which participants were required to remember and maintain the cued gesture in working memory for later recall, rather than preparing the gesture for execution. Effects of image congruency on N170 were found only when participants concurrently planned actions and not when maintaining gestures in working memory for later recall. This previous finding, that congruency effects are only seen during the planning and execution of actions (Bortoletto et al., 2011), indicates that sensory–motor interference represents influences of the motor system on visual processing (and vice versa) rather than any other effects of semantic or lexical representation in working memory. Therefore, the modulation of congruency effects by cognitive context that we report here most likely represents a top–down consequence of visual–motor interactions, rather than any direct effects of prefrontal or executive processes on visual and motor systems separately.
Overall, our results are consistent with the suggestion that visual–motor interactions are flexible and may adapt to meet contingent demands. Previous studies have shown that the strength of visuomotor interference, as indexed by congruency effects in RTs, can be modulated by higher cognitive functions such as attention and social interaction (Perry et al., 2011; Chong et al., 2009; Bach et al., 2007; Oberman et al., 2007). Attention has also been shown to modulate neural activity in areas involved in action observation and associated with visual–motor interactions. For example, activity in the inferior frontal gyrus is decreased when attention is diverted from action observation (Chong et al., 2008). Similarly, activity in the motor cortex during action observation depends on attention to those actions (Schuch et al., 2010). Moreover, participants' intentions, for example, to understand an action or to identify physical features, are associated with specific patterns of cortical activation during action observation (Molenberghs, Hayward, Mattingley, & Cunnington, 2011). This study extends these findings by showing that top–down regulation changes not only the strength of interactions between the visual system and the motor system but also their direction.
Visual and motor systems are linked via parietal and premotor brain networks. Regions of the posterior inferior frontal gyrus and the rostral inferior parietal lobe are activated both for action observation and for action execution and, together with the STS, have been included in an “action observation network” (Molenberghs, Cunnington, & Mattingley, 2012; Caspers, Zilles, Laird, & Eickhoff, 2010; Grèzes & Decety, 2001). Changes in the direction of visual–motor interference revealed in our study indicate that cortico–cortical interactions within this network are state dependent. Therefore, the way in which actions are represented and the manner in which information is exchanged between visual and motor areas evidently change depending on the current task or goal of the actor/observer and his or her cognitive contingencies. This is consistent with studies that have shown state-dependent changes in network connectivity (for a review, see Silvanto, Muggleton, & Walsh, 2008). Such results have been found both in the motor system during voluntary action production and observation (Koch et al., 2010; Koch & Rothwell, 2009; Bestmann et al., 2008) and in the visual system (Marreiros, Kiebel, & Friston, 2008; Ruff et al., 2008). Moreover, there is evidence that connectivity between the action observation network and the pFC during action observation depends on task instructions and whether participants are making inferences about or merely viewing others' actions passively (Schippers & Keysers, 2011).
In future studies, it will be important to determine which neural regions exert top–down control and modulate these interactions between visual and motor systems. The pFC is a central structure for executive behavioral control. Neurons of the lateral pFC are involved in multiple executive functions and, crucially, appear to be related to the attentional selection of task relevant information and actions (Tanji & Hoshi, 2008). Moreover, activity in this area can represent contingent rules and bias activity in connected areas in line with task requirements (Miller, 2000). Interestingly, recent anatomical studies in monkeys have revealed connections between regions of the lateral pFC, that is, the medial part of BA 12r and BA 46v, and a fronto-parietal network underlying the control of goal-directed actions and action observation (Borra, Gerbella, Rozzi, & Luppino, 2011; Nelissen et al., 2011; Barbas, 1988). These studies therefore suggest possible involvement of lateral prefrontal regions in modulating visual–motor interactions according to task requirements or contexts. Nevertheless, other mechanisms may be involved. A possible alternative is that the “action observation network” exerts top–down influence on early visual areas and motor areas and that these connections are modulated by cognitive context. In this case, the modulation of visual–motor interaction by cognitive context would not require the involvement of prefrontal structures but would rather take place through changes within the visuomotor system.
Although our results clearly suggest a link between motor and visual systems during perception of action, we can only speculate on the neural mechanisms that provide this link. It has been suggested that the mirror neuron system in humans might provide a mechanism for sensory–motor transformation of actions that can be employed, in conjunction with higher-order cognitive systems, to facilitate action understanding (Rizzolatti & Sinigaglia, 2010), imitation (Iacoboni, 1999, 2005; Buccino et al., 2004), and the selection of motor responses during social interactions (Bonini & Ferrari, 2011). The mirror neuron system has also been proposed to mediate visual–motor interaction effects during action observation (Gallese et al., 2011; Heyes, 2011; Longo & Bertenthal, 2009; Schütz-Bosbach & Prinz, 2007; Kilner et al., 2003; Craighero et al., 2002). Although our results do not speak to whether visual–motor interaction effects arise from mirror mechanisms or from some other form of interaction, they may contribute to the debate on the function of mirror neurons. Indeed, motor-to-visual priming and visual-to-motor priming may reflect different functions. Enhanced visual-to-motor priming during action observation may be important when observing actions for imitation. Moreover, our finding that motor-related activity is particularly increased when participants are actively engaged in observing an action is consistent with involvement of the mirror neuron system in mapping observed actions to the motor system for action understanding (Rizzolatti & Sinigaglia, 2010). By contrast, greater motor-to-visual priming during action performance may be useful when monitoring the outcome of our own intended actions. 
In line with the ideomotor theory of action (Greenwald, 1970; James, 1890) and with the common event-coding theory (Prinz, 2005; Hommel et al., 2001), according to which actions are also represented by their expected perceptual effects, motor-to-visual priming may be important for motor learning and for anticipation of the expected sensory outcomes from intended actions.
In summary, we suggest that visual–motor interactions during action observation are influenced top–down by higher cognitive processes, and therefore depend on current goals or task requirements. Such modulation allows the motor system to influence the visual processing of observed actions when the execution of movement is most relevant and allows the visual processing of observed actions to influence the motor system when perception of action is most relevant.
We thank Doug Fraser and Quirine Tordoir for their assistance in conducting experiments. This work was supported by a grant from the Australian Research Council (DP110103285). R. C. was supported by an Australian Research Council Future Fellowship (FF0991468). J. B. M. was supported by an Australian Research Council Laureate Fellowship (FL110100103).
Reprint requests should be sent to Marta Bortoletto, Cognitive Neuroscience Section, IRCCS San Giovanni di Dio Fatebenefratelli, Via Pilastroni 4, 25123 Brescia, Italy, or via e-mail: email@example.com.