Evidence implicates ventral parieto-premotor cortices in representing the goal of grasping independent of the movements or effectors involved [Umilta, M. A., Escola, L., Intskirveli, I., Grammont, F., Rochat, M., Caruana, F., et al. When pliers become fingers in the monkey motor system. Proceedings of the National Academy of Sciences, U.S.A., 105, 2209–2213, 2008; Tunik, E., Frey, S. H., & Grafton, S. T. Virtual lesions of the anterior intraparietal area disrupt goal-dependent on-line adjustments of grasp. Nature Neuroscience, 8, 505–511, 2005]. Modern technologies that enable arbitrary causal relationships between hand movements and tool actions provide a strong test of this hypothesis. We capitalized on this unique opportunity by recording activity with fMRI during tasks in which healthy adults performed goal-directed reach and grasp actions manually or by depressing buttons to initiate these same behaviors in a remotely located robotic arm (arbitrary causal relationship). As shown previously [Binkofski, F., Dohle, C., Posse, S., Stephan, K. M., Hefter, H., Seitz, R. J., et al. Human anterior intraparietal area subserves prehension: A combined lesion and functional MRI activation study. Neurology, 50, 1253–1259, 1998], we detected greater activity in the vicinity of the anterior intraparietal sulcus (aIPS) during manual grasp versus reach. In contrast to prior studies involving tools controlled by nonarbitrarily related hand movements [Gallivan, J. P., McLean, D. A., Valyear, K. F., & Culham, J. C. Decoding the neural mechanisms of human tool use. Elife, 2, e00425, 2013; Jacobs, S., Danielmeier, C., & Frey, S. H. Human anterior intraparietal and ventral premotor cortices support representations of grasping with the hand or a novel tool. Journal of Cognitive Neuroscience, 22, 2594–2608, 2010], however, responses within the aIPS and premotor cortex exhibited no evidence of selectivity for grasp when participants employed the robot. 
Instead, these regions showed comparable increases in activity during both the reach and grasp conditions. Despite equivalent sensorimotor demands, the right cerebellar hemisphere displayed greater activity when participants initiated the robot's actions versus when they pressed a button known to be nonfunctional and watched the very same actions undertaken autonomously. This supports the hypothesis that the cerebellum predicts the forthcoming sensory consequences of volitional actions [Blakemore, S. J., Frith, C. D., & Wolpert, D. M. The cerebellum is involved in predicting the sensory consequences of action. NeuroReport, 12, 1879–1884, 2001]. We conclude that grasp-selective responses in the human aIPS and premotor cortex depend on the existence of nonarbitrary causal relationships between hand movements and end-effector actions.
Technology fundamentally alters the relationship between our actions and their consequences in the world by enabling goals to be accomplished through otherwise ineffective behaviors. For instance, distant objects may become accessible when a plier is used to extend the range of our reach-to-grasp. Remarkably, with training, neurons in monkey ventral premotor cortex (vPMC; area F5) that code grasping objects with the hands come to represent the same action undertaken with this nonbiological effector. Critically, some of these units respond similarly regardless of whether the plier is conventional or reversed, such that opening the hand causes the tool's jaws to close on the object (Umilta et al., 2008). This result is considered strong evidence for the hypothesis that these premotor neurons are coding the goal of grasping independent of the specific movements or effectors involved in its realization (Rizzolatti et al., 1988). Response properties consistent with goal representation have also been reported within hand manipulation neurons of macaque inferior parietal cortex, specifically in area PFG (Bonini et al., 2011) and the anterior intraparietal region (Gardner et al., 2007).
Cortex located at the junction of the human postcentral and anterior intraparietal sulci (aIPS; Frey, Vinton, Norlund, & Grafton, 2005; Culham et al., 2003; Binkofski et al., 1998) and vPMC (Binkofski et al., 1999) exhibits grasp-selective activity resembling that of hand manipulation neurons. Importantly, responses in the aIPS (Hamilton & Grafton, 2006; Tunik, Frey, & Grafton, 2005) and vPMC (Hamilton & Grafton, 2006; Johnson-Frey et al., 2003) are independent of the specific sensorimotor demands involved in grasp execution. The interpretation of these effects as evidence for goal-dependent action representations is bolstered by data indicating that both regions exhibit increased activity when planning grasping actions that involve use of the hands or of a handheld tool (Gallivan, McLean, Valyear, & Culham, 2013; Martin, Jacobs, & Frey, 2011; Jacobs, Danielmeier, & Frey, 2010).
A potential confound in both macaque and human tool use studies is the existence of a nonarbitrary causal relationship between movements of the hand and the actions of the handheld instrument. By its very nature, a handheld tool must be grasped, and extending or retracting the hand has the same effect on the end-effector of the tool. The same is true for wrist adduction/abduction and forearm pronation/supination. Although transformed by the mechanical properties of the tools (Arbib, Bonaiuto, Jacobs, & Frey, 2009), finger extension or flexion still served to open or close the tools' end-effectors in these studies (Gallivan et al., 2013; Martin et al., 2011; Jacobs et al., 2010; Umilta et al., 2008). Rather than abstract goal-dependent representations, this nonarbitrary causal relationship between hand movements and tool actions may explain aIPS and vPMC involvement in grasping with the hands or with a tool.
Modern technology enables previously arbitrary movements to be harnessed as control signals for the actions of a wide variety of tools and devices in peripersonal, extrapersonal, and even extraterrestrial space. It is possible, for example, to learn to reach for and grasp objects with a robotic arm controlled through the press of a button, manipulation of a joystick, or even directly through brain activity (Hochberg et al., 2012; Schwartz, Cui, Weber, & Moran, 2006; Andersen, Burdick, Musallam, Pesaran, & Cham, 2004; Wolpaw & McFarland, 2004; Carmena et al., 2003; Nicolelis, 2001). Conversely, these same inputs can be used to control a diversity of tools and actions. The arbitrary causal relationship between our movements and tools' actions enabled by these technologies provides an unprecedented chance to test the hypothesis that the aIPS and/or vPMC are coding the goal of grasping actions independent of specific sensorimotor demands.
We capitalize on this opportunity by recording whole-brain activity with fMRI during two tasks in which the same participants planned and then performed object-oriented grasp or reach actions. To functionally localize grasp-related areas within the aIPS and potentially also in the premotor cortex, participants undertook a manual task (MT) wherein these behaviors are performed naturally, with the hand in peripersonal space. In the robot task (RT) we eliminate the nonarbitrary causal relationship between hand and tool by requiring participants to use individual fingers to initiate grasp or reach actions of a robotic arm located in remote extrapersonal space through button presses. If the human aIPS and vPMC represent the goal of grasping independent of the demands associated with sensorimotor control, then we expect these areas to exhibit similar grasp-selective responses in both the MT and RT. Alternatively, if the selective involvement of these areas in grasping depends on a nonarbitrary causal relationship between hand movements and tool actions, then we anticipate no differences in activity during grasping versus reaching with the robotic arm. Findings from a recent electroencephalography study suggest that aIPS and premotor activity depends on participants' perceiving that they are causing subsequently observed grasping actions (Bozzacchi, Giusti, Pitzalis, Spinelli, & Di Russo, 2012). If so, then we predict greater aIPS and premotor activity when participants press a button to launch the robot's actions and observe the ensuing consequences versus when they press a button known to be nonfunctional and watch the very same actions undertaken autonomously by the robot.
Eighteen volunteers (mean age = 24.4 years, range = 18.7–39.5, six men) participated in the study, which was approved by the University of Oregon institutional review board. All participants were right-handed as measured by the Edinburgh Handedness Inventory (Oldfield, 1971) and had normal or corrected-to-normal vision. Two participants completed only the MT.
Stimulus presentation and response recording for the MT were controlled by custom LabView software (www.ni.com/labview/), whereas the RT used Presentation (www.neurobs.com/). Trial orders in both experiments were optimized with Optseq2 (surfer.nmr.mgh.harvard.edu/optseq/). A central white fixation point was visible throughout the entirety of both experiments, and participants were asked to maintain fixation. Compliance was monitored by the experimenters through the live video feed from an MR-compatible eye-tracking camera (www.asleyetracking.com/Site/). If the participant was not fixating or showed signs of drowsiness, verbal feedback was provided as needed by the experimenter between runs.
The MT and RT were undertaken within a single session, and the order of administration was counterbalanced across participants. In both tasks, the workspace consisted of a 30.5 × 61 cm board and a 50-mm diameter, circular opening. Likewise, the target objects for reach or grasp movements were red or white 25 × 25 × 50 mm wooden blocks. In both the MT and RT, the workspace was captured by a digital video camera and projected onto a screen located at the rear of the scanner bore. This screen was viewed through a mirror mounted to the head coil. To ensure that participants were attending similarly to actions in both tasks, a red block replaced the standard white block on approximately 25% of the total number of trials. Participants were instructed to keep count of the number of red blocks used in the Grasp condition only and to verbally report this value after the end of each run. Because of an error, block counts were only recorded for the RT.
The workspace was placed across the participant's lap. A five-button response pad was positioned at a comfortable distance on the midsagittal plane. Participants viewed a live video of the workspace, captured from bird's eye view, creating a perspective similar to what they would experience if seated and looking down at their laps. The identity of the premovement cues and timing of the premovement phase was identical across tasks. However, the execution phases differed in length. On the basis of piloting, we found that the time required to perform the movements in the MT appeared extremely fast when replicated with the robot. On the basis of this observation, the RT execution phase was lengthened.
At the onset of each trial, a 500-msec visual instructional word cue (“Reach” or “Grasp”) indicated which movement would be involved. This was followed by a variable 3000- or 4000-msec delay interval. The premovement phase consisted of the 3500-msec period starting with the onset of the instructional cue and included the subsequent 3000-msec delay. Next, a 4500-msec execution phase began with the onset of the live video stream of the workspace. This signaled participants to initiate their movements and included the subsequent 4500-msec movement period (Figure 1).
In the execution phase, visual feedback of the hands was provided by live video feed. In the Reach condition, the fingertips remained together throughout the movements. On each trial, participants released the start button, reached forward and touched the 6-mm radius circle located on the top of the target block with the fingertips, moved the hand laterally to the circular opening, returned to the start position, and depressed the button. In the Grasp condition, participants released the start button, reached toward and grasped the target object with the fingertips, transported it laterally, and dropped it into the circular opening before returning to the start location and depressing the button.
The session consisted of 90 trials, 30 each of grasp, reach, and null conditions. Null conditions consisted of the fixation cross against a black screen and served both as a rest condition and induced temporal variation necessary for deconvolution of the event-related hemodynamic responses (Buckner, 1998). For each condition, the average trial length was 9 sec. The session was approximately 14 min in length including 15 sec of fixation at the beginning of the session to orient the participant and 15 sec of fixation at the end to capture the delayed hemodynamic response from the final trial. Trials were presented in two optimal orders (see above) that were administered in counterbalanced fashion across participants. Participants practiced the MT for approximately 5 min during the acquisition of the MRI structural scan.
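The approximate session length quoted above follows directly from the trial counts and fixation periods. A minimal sketch of that arithmetic, using only values stated in the text (the variable and function names are illustrative, not taken from the stimulus software):

```python
# Manual task (MT) session timing, from the values given in the text.
N_TRIALS = 90          # 30 grasp + 30 reach + 30 null
MEAN_TRIAL_SEC = 9     # average trial length
LEAD_IN_SEC = 15       # initial fixation
LEAD_OUT_SEC = 15      # final fixation

def session_duration_sec(n_trials, mean_trial_sec, lead_in_sec, lead_out_sec):
    """Total session time: all trials plus the two fixation periods."""
    return n_trials * mean_trial_sec + lead_in_sec + lead_out_sec

total_sec = session_duration_sec(N_TRIALS, MEAN_TRIAL_SEC, LEAD_IN_SEC, LEAD_OUT_SEC)
print(f"{total_sec} sec = {total_sec / 60:.1f} min")
```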
The procedure was similar to that of the MT (Figure 1).
On the day before fMRI testing, participants were shown the robotic arm and introduced to controlling reaching and grasping actions through button presses in our behavioral laboratory. They were told that the next day's fMRI task would involve controlling the robot remotely. They then practiced controlling the robot live in the following four experimental conditions: (1) Reach: Pressing the button beneath the middle finger initiated the robot to perform the same actions as in the MT Reach condition, but with the “fingers” of the tools' end-effector closed and simply contacting the top of the wooden block and then moving laterally to the circular opening (Figure 2). (2) Grasp: Pressing the button under the index finger initiated the robot to pick up the wooden block, transport it laterally, and place it in the circular opening (Figure 2), similar to the MT Grasp condition. (3) Press: Pressing the button beneath the ring finger was ineffective; the blank screen and fixation cross remained visible for the duration of the trial. This condition was a control for activity related to the motor response. (4) Watch: No buttons were pressed, and participants observed the robot performing either the Grasp or Reach actions autonomously. Counterbalanced trial orders in the training session differed from those used during testing on the following day.
fMRI Testing Session
Unbeknownst to participants, during the fMRI experiment, they viewed prerecorded videos of the robot movements rather than an actual live video feed of the robot. In all other respects, the fMRI testing session was identical to the training session. To reinforce the impression of live video, reach and grasp actions of the robot were recorded from four different camera angles to create a total of 16 different videos of the robotic arm: 4 perspectives × 2 movement types (reach, grasp) × 2 block colors (red, white).
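The video set is a full crossing of three factors. As a sketch, it can be enumerated as follows; the perspective labels are hypothetical placeholders, since the text gives only the number of camera angles:

```python
from itertools import product

# Hypothetical labels: the text states four camera angles but does not name them.
PERSPECTIVES = ["angle1", "angle2", "angle3", "angle4"]
MOVEMENTS = ["reach", "grasp"]
BLOCK_COLORS = ["red", "white"]

# One prerecorded clip per combination of perspective, movement, and block color.
videos = list(product(PERSPECTIVES, MOVEMENTS, BLOCK_COLORS))
print(len(videos))
```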
Each 12-sec trial began with a 500-msec visual instructional cue consisting of either the word “Grasp” or “Reach.” The instructional cue was followed by a variable duration delay period of 2000, 2500, 3000, or 3500 msec during which time participants were instructed to prepare to press the associated button. During the delay period, the omnipresent white fixation point was displayed against a black background (Figure 2). The 2500-msec premovement phase began with the onset of the instructional cue and concluded at the end of the shortest (2000 msec) delay interval. At the end of the delay interval, a movement cue appeared consisting of either the word “Go”, “Press”, or “Watch.” The 6750-msec execution phase began with the onset of the movement cue and concluded after the end of the video clip in the Go or Watch conditions (or fixation period in the case of the Press condition). After a “Go” movement cue, the participant was instructed to push either the “Grasp” or “Reach” button, depending on the identity of the preceding instructional cue. If issued within 750 msec of movement cue onset, a correct button press response would launch a video of the robot either grasping or reaching as described above. Likewise, issuing a correct “Press” response would result in a blank screen with central fixation cross through the end of the trial. If the participant did not press a button within 750 msec of the movement cue, feedback “too slow” was displayed for 6 sec. For the “Watch” movement cue, the participant was instructed to refrain from issuing any response and instead merely watch the robot autonomously perform the reach or grasp actions as indicated by the preceding instructional cue. To reinforce the sense of control, pressing an incorrect button in the training and experimental sessions resulted in observing the robot perform the corresponding, incorrect action.
The two instruction cues (Grasp, Reach) and the three movement cues (Go, Press, Watch) defined six unique trial types. The experiment consisted of eight predefined runs presented in counterbalanced order across participants. Every run contained 29 trials in optimally counterbalanced order (12 with the instructional cue reach [4 trials followed by Go, 4 by Watch, and 4 by Press], 12 grasp [4 trials followed by Go, 4 by Watch, and 4 by Press], and 5 null [black screen with central fixation cross]; Figure 2).
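The composition of a single run can be sketched as below. This only reproduces the trial counts stated above; the actual ordering was produced by Optseq2 and is not modeled here.

```python
from collections import Counter
from itertools import product

INSTRUCTION_CUES = ["Grasp", "Reach"]
MOVEMENT_CUES = ["Go", "Watch", "Press"]
TRIALS_PER_TYPE = 4
N_NULL = 5

# Six trial types (2 instruction cues x 3 movement cues), four trials each,
# plus five null trials. Order is left unoptimized in this sketch.
run = [
    (instruction, movement)
    for instruction, movement in product(INSTRUCTION_CUES, MOVEMENT_CUES)
    for _ in range(TRIALS_PER_TYPE)
] + [("Null", None)] * N_NULL

counts = Counter(instruction for instruction, _ in run)
print(len(run), dict(counts))
```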
On the day of the fMRI experiment, participants completed a single refresher run using a trial order from the previous day's training session. At the beginning of each run, a 15-sec fixation screen was presented to allow the participant to become oriented, and a 15-sec fixation screen was shown at the end of each run to capture the BOLD response related to the last trial presented. The total time of each run of trials was 6:03.
All MRI scans were performed on a Siemens (Erlangen, Germany) 3T Allegra MRI scanner at the Robert and Beverly Lewis Center for Neuroimaging located at the University of Oregon. BOLD echo-planar images were collected using a T2*-weighted gradient-echo sequence, a standard birdcage radiofrequency coil, and the following parameters: repetition time = 2500 msec, echo time = 30 msec, flip angle = 80°, 64 × 64 voxel matrix, field of view = 200 mm, 42 contiguous axial slices acquired in interleaved order, thickness = 4 mm, in-plane resolution = 3.125 × 3.125 mm, and bandwidth = 2605 Hz/pixel. High-resolution T1-weighted structural images were also acquired using the 3-D MP-RAGE pulse sequence: repetition time = 2500 msec, echo time = 4.38 msec, inversion time = 1100 msec, flip angle = 8.0°, 256 × 256 voxel matrix, field of view = 256 mm, 176 contiguous axial slices, thickness = 1 mm, and in-plane resolution = 1 × 1 mm. DICOM image files were converted to NIFTI format using MRIConvert software (lcni.uoregon.edu/jolinda/MRIConvert/).
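The in-plane resolutions reported above are simply the field of view divided by the matrix size. A short check, assuming a square field of view and matrix as the parameters imply (the function name is illustrative):

```python
def in_plane_resolution_mm(field_of_view_mm, matrix_size):
    """Voxel edge length in mm for a square field of view and matrix."""
    return field_of_view_mm / matrix_size

epi_mm = in_plane_resolution_mm(200, 64)      # functional (BOLD) images
t1_mm = in_plane_resolution_mm(256, 256)      # MP-RAGE structural images
print(epi_mm, t1_mm)
```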
fMRI Data Analyses
For both MT and RT, fMRI data processing was carried out using FEAT (FMRI Expert Analysis Tool) Version 4.14, part of FSL (FMRIB's Software Library, www.fmrib.ox.ac.uk/fsl). The following prestatistics processing was applied: motion correction using MCFLIRT (Jenkinson, Bannister, Brady, & Smith, 2002); nonbrain removal using BET (Smith, 2002); spatial smoothing using a Gaussian kernel of FWHM 5 mm; high-pass temporal filtering (Gaussian-weighted least-squares straight line fitting, with sigma = 100 sec). Time-series statistical analysis was carried out using FILM with local autocorrelation correction (Woolrich, Ripley, Brady, & Smith, 2001). Delays and undershoots in the hemodynamic BOLD response were accounted for by convolving the model with a double-gamma basis function. Registration to high-resolution structural and/or standard space images (MNI-152) was carried out using FNIRT (Smith et al., 2004). Localization of cortical (Eickhoff et al., 2005) and cerebellar (Diedrichsen, Balsters, Flavell, Cussans, & Ramnani, 2009) responses was determined through use of probabilistic atlases in FSLview and visual inspection. For surface visualization, data were mapped to a 3-D brain using CARET's Population-Average, Landmark- and Surface-based atlas using the Average Fiducial Mapping algorithm (Van Essen, 2005).
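To illustrate what convolving the model with a double-gamma function does, the sketch below builds a generic double-gamma response and convolves it with a boxcar the length of the 4500-msec MT execution phase. The shape parameters (peak shape 6, undershoot shape 16, undershoot ratio 1/6) are assumed, commonly used illustrative values and are not taken from the FSL configuration used in the study.

```python
import math

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1 / 6):
    """Positive response minus a scaled, later undershoot (assumed parameters)."""
    return gamma_pdf(t, peak_shape) - undershoot_ratio * gamma_pdf(t, undershoot_shape)

def convolve(signal, kernel, dt):
    """Plain discrete convolution, scaled by the sampling step."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k * dt
    return out

DT = 0.1                                    # sampling step in seconds
hrf = [double_gamma_hrf(i * DT) for i in range(321)]   # 0 to 32 sec
boxcar = [1.0] * 45                         # 4.5-sec execution phase
predicted = convolve(boxcar, hrf, DT)       # predicted BOLD time course

peak_time = max(range(len(hrf)), key=hrf.__getitem__) * DT
print(f"HRF peaks near {peak_time:.1f} sec")
```

The delayed peak and the negative dip later in the response are the "delays and undershoots" that the convolution accounts for.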
Manual Task Analyses
The experimental conditions for each run were modeled separately at the first level of analysis for each individual participant. Five explanatory (i.e., predictor) variables (EVs) were modeled along with their temporal derivatives. Four EVs coded the experimental conditions formed by crossing Phase (Plan, Execute) with movement Type (Reach, Grasp). A fifth EV coded the 9000-msec null (resting baseline) trials. Orthogonal contrasts (one-tailed t tests) were used to test separately for differences between combinations of the four experimental conditions and between combinations of the four experimental conditions and resting baseline. The resulting first-level contrasts of parameter estimates (COPEs) then served as input to higher-level analyses carried out using FLAME Stage 1 (Woolrich, Behrens, Beckmann, Jenkinson, & Smith, 2004; Beckmann, Jenkinson, & Smith, 2003) to model and estimate random-effects components of mixed-effects variance. Z (Gaussianized T/F) statistic images were thresholded using clusters determined by Z > 2.3 and a (corrected) cluster significance threshold of p = .05 (Worsley, 2001).
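For orientation, the cluster-forming threshold of Z > 2.3 corresponds to a one-tailed voxelwise probability of roughly .01 under the standard normal distribution. A minimal sketch of that conversion follows; it covers only the voxelwise step and does not implement the cluster-extent correction itself.

```python
import math

def z_to_one_tailed_p(z):
    """Upper-tail probability of a standard normal variate exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_voxel = z_to_one_tailed_p(2.3)
print(f"Z > 2.3 corresponds to one-tailed p = {p_voxel:.4f}")
```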
First, a whole-brain analysis was undertaken to identify the cerebral areas that responded significantly to the experimental conditions when compared with resting baseline at the group level. The first-level COPEs were averaged across participants (second level). Second, to test for the main effects of Phase and Type and for the interaction between these two factors, a standard 2 (Phase: plan, execute) × 2 (Type: reach, grasp) repeated-measures ANOVA (F tests) was carried out on first-level COPEs. The sensitivity of this analysis was increased by restricting it to only those voxels that showed a significant increase in activity in at least one of the four experimental conditions compared with resting baseline at the group level in the whole-brain analysis (Z > 2.3, corrected cluster significance threshold of p = .05).
Robot Task Analyses
Trials were excluded from the fMRI analysis if participants made any of the following errors: pressed a button when none was expected, pressed an incorrect button or more than one button, or pressed the correct button <200 msec or >750 msec from the onset of the movement cue. The preprocessing and data modeling steps were identical to those described earlier for the MT. The execution phase EVs included the 750 msec during which the movement cue was presented and the subsequent 6000 msec of either stimulus video (in the Watch or Go conditions) or fixation point (in the Press condition). A ninth EV coded the 12,000-msec null trials that were used as resting baseline.
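The exclusion rules above can be expressed as a single validity check. The sketch below is illustrative only: the function name, button labels, and the representation of responses as (button, reaction time) pairs are assumptions, not a description of the software actually used.

```python
MIN_RT_MSEC = 200
MAX_RT_MSEC = 750

def trial_is_valid(movement_cue, expected_button, presses):
    """Return True if a robot-task trial passes the stated exclusion rules.

    movement_cue:    "Go", "Press", or "Watch"
    expected_button: hypothetical label of the correct button (None for Watch)
    presses:         list of (button, rt_msec) tuples recorded on the trial
    """
    if movement_cue == "Watch":
        # No response is expected; any press is an error.
        return len(presses) == 0
    if len(presses) != 1:
        # Missing response, or more than one button pressed.
        return False
    button, rt_msec = presses[0]
    if button != expected_button:
        return False
    # The correct press must fall within 200 to 750 msec of movement cue onset.
    return MIN_RT_MSEC <= rt_msec <= MAX_RT_MSEC

print(trial_is_valid("Go", "grasp", [("grasp", 420)]))
print(trial_is_valid("Go", "grasp", [("grasp", 150)]))
```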
To functionally define the aIPS, we compared activation for the manual grasp versus reach conditions of the MT (Frey et al., 2005; Binkofski et al., 1998). A 5-mm radius sphere was centered on the x, y, z coordinates of the group mean peak activation located at the intersection of the left IPS and postcentral sulcus. Mean percent signal change (PSC) for each condition of the RT was then computed using FSL's Featquery across significantly activated voxels for the contrast of grasp versus reach conditions in the MT that were located within the boundaries of the sphere. This was done separately for each participant and for the Planning versus Execution phases of the RT. Mean PSC was analyzed using a 2 (Instruction cue: Grasp, Reach) × 3 (Movement cue: Go, Press, Watch) repeated-measures ANOVA.
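As a rough illustration of the spherical ROI, the sketch below enumerates the voxel offsets that fall within 5 mm of a center voxel. The 2-mm isotropic grid is an assumption (the voxel size of the standard-space images is not stated in the text), and the final ROI described above additionally keeps only voxels that were significant in the MT grasp versus reach contrast, which is not modeled here.

```python
from itertools import product

def sphere_offsets(radius_mm, voxel_mm):
    """Integer voxel offsets whose centers lie within radius_mm of the origin."""
    reach = int(radius_mm // voxel_mm)
    span = range(-reach, reach + 1)
    return [
        (i, j, k)
        for i, j, k in product(span, repeat=3)
        if (i * i + j * j + k * k) * voxel_mm ** 2 <= radius_mm ** 2
    ]

# Assumed 2-mm isotropic standard-space grid.
offsets = sphere_offsets(radius_mm=5.0, voxel_mm=2.0)
print(len(offsets), "voxels in the sphere before masking by the MT contrast")
```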
All trials were completed within the time limits, as indexed by button release at movement initiation and button press when the hand returned to the starting location. Video of the movements was reviewed, and a total of 16 trials across participants were removed before analysis because the grasp or reach actions were incorrectly executed.
A direct comparison between reach and grasp conditions following the onset of the instructional word cues (“Reach” or “Grasp”) failed to detect any areas exhibiting significant differences, and the data were therefore pooled for further analysis. Relative to resting baseline, we detected significant premovement increases in activity within the occipital cortex, extending dorsally into the medial superior parietal lobule, as well as in left vPMC and dorsal premotor cortices (dPMC; Figure 3A). These premotor increases are consistent with prior work reporting similar responses in association with processing action verbs (Pulvermuller, Hauk, Nikulin, & Ilmoniemi, 2005; Hauk, Johnsrude, & Pulvermuller, 2004). Bilateral activity was detected at the temporoparietal junction (TPJ), a region considered to be part of the ventral attention network (Corbetta & Shulman, 2002; Shulman, d'Avossa, Tansy, & Corbetta, 2002) that may also play a role in the prediction of upcoming actions (Carter, Bowling, Reeck, & Huettel, 2012). In the left hemisphere, this cluster of increased activity extended into the caudal left middle temporal gyrus (cMTG), which shows increased activity when planning manual object-oriented actions (Marangon, Jacobs, & Frey, 2011; Martin et al., 2011; Jacobs et al., 2010).
Relative to resting baseline, execution of either the grasp or reach conditions resulted in significant increases in a distributed network associated with closed-loop sensorimotor control, including bilateral posterior parietal and premotor cortex and subcortical regions (cerebellum and basal ganglia [BG]). Direct comparisons of grasp versus reach conditions revealed increases in activity in the contralateral aIPS (peak location at MNI coordinates: −54, …; Figure 3B). As in prior fMRI investigations (Frey et al., 2005; Culham et al., 2003; Binkofski et al., 1998), the result of this contrast was used to functionally define the aIPS (see Methods) for ROI analyses elaborated below. These differences likely reflect the increased sensorimotor control demands of the Grasp condition, which include preshaping the hand to engage the object, integrating sensory feedback associated with these movements, as well as with object contact and transport. Increased sensory feedback related to object contact may account for the finding of bilaterally increased activity of the parietal operculum (putative secondary somatosensory cortex) during the grasp condition (−56, −32, 12; 56, −28, 26), as has been reported previously (Frey et al., 2005; Grafton, Fagg, Woods, & Arbib, 1996). We also detected increased activity in the lateral convexity of the right occipital cortex (48, −68, 24). This is the vicinity of area MT+, a complex known to be involved in processing visual motion (Ferber, Humphrey, & Vilis, 2003; Dukelow et al., 2001), including that of the hands (Whitney et al., 2007; Oreja-Guevara et al., 2004). This increase may therefore reflect greater motion of the fingers and object during the grasp, as compared with the reach, condition. 
Lateral occipital cortex (LOC) is also involved in processing object structure (Kourtzi & Kanwisher, 2001), and its greater engagement during the grasp condition could reflect the additional processing needed to derive structural properties for the specification of hand shape. As with the majority of prior studies, the grasp > reach comparison failed to reveal grasp-selective activity within the premotor cortex (Grafton, 2010; Castiello & Begliomini, 2008).
The overall error rate was 7%, and subsequent analyses were based exclusively on correctly performed trials (see Methods for details). Participants correctly identified the number of red blocks in the grasp condition on 91.1% of the runs.
As in the MT, direct comparison of grasp versus reach conditions failed to reveal any significant differences, and data were therefore pooled for subsequent analyses. Although subsequent movements only involved pressing a button to launch the reach or grasp actions of the robot, we again detected increased left premotor activity (relative to resting baseline) in response to the instruction cues (Figure 4A). This too is possibly related to the processing of action verbs (Pulvermuller et al., 2005; Hauk et al., 2004). In the RT, however, premovement responses were more expansive, possibly due to the collection of more data. Areas of increased activity included the entirety of the left intraparietal sulcus (IPS), extending rostrally through the postcentral gyrus into the central sulcus and onto the precentral gyrus (Figure 4A). Left-lateralized vPMC and IPS activity has been reported previously during the planning of grasping actions with a handheld novel tool that had a nonarbitrary relationship to hand movements (Martin et al., 2011; Jacobs et al., 2010). In the present case, these increases may be associated with the demands of solving the arbitrary mapping between finger movements (required to depress the correct buttons) and the actions of the tool (reach or grasp). This account may also explain increased activity in the cingulate gyrus extending dorsally into the pre-SMA, regions known to be involved in motor cognition (Macuga & Frey, 2012; Frey & Gerry, 2006) and movement inhibition (Sharp et al., 2010). Subcortical increases were present in the BG and the right cerebellum, which are functionally interconnected with one another and with the cerebral cortex (Bostan, Dum, & Strick, 2010; Bostan & Strick, 2010). 
Structures within the BG contribute to a variety of motor-related functions including motor learning and the modulation of reward-related motor activity (Turner & Desmurget, 2010), whereas the cerebellum participates in a variety of cognitive and motor functions including timing and feed-forward control (Manto et al., 2012; Fuentes & Bastian, 2007; Wolpert, Miall, & Kawato, 1998; Ivry & Baldo, 1992; Keele & Ivry, 1990).
Relative to pressing an ineffective response button (Press), when participants pressed designated buttons to initiate grasp (Grasp Go) or reach (Reach Go) with the robotic arm and then observed the resulting actions, activity increased throughout bilateral occipital, posterior parietal, premotor, and lateral prefrontal regions that have frequently been reported during observation of manual actions (Macuga & Frey, 2012; Frey & Gerry, 2006; Grafton, Arbib, Fadiga, & Rizzolatti, 1996). Increases in activity were also detected bilaterally in the BG and cerebellum (Figure 4B). Contrary to what is expected if the aIPS selectively represents the goal of grasp, however, we failed to detect any significant differences between grasp and reach execution in either direction (Grasp Go vs. Reach Go), and the data were therefore pooled for subsequent analysis.
Relative to resting baseline, passively observing the robot autonomously reach (Reach Watch; Figure 5A) or grasp (Grasp Watch; Figure 5B) was associated with a widespread pattern of bilateral cortical and subcortical increases in activity, closely resembling the results of comparing the Go versus Press conditions (cf. Figures 4B and 5A, B). Increases in activity within inferior parietal and ventral premotor regions that are considered part of the mirror neuron system (Rizzolatti & Craighero, 2004) during observation of the robot's movements are consistent with some prior work (Gazzola, Rizzolatti, Wicker, & Keysers, 2007; Ferrari, Rozzi, & Fogassi, 2005). Together, these findings suggest that involvement of this network in action perception is not exclusive to behaviors involving biological effectors. Likewise, these responses do not appear to depend on the observer perceiving that they have caused these actions. Comparison of the observation of grasping (Grasp Watch) versus reaching (Reach Watch) revealed significant bilateral increases in the LOC and cMTG (Figure 5C). Similar to what was mentioned in regard to the MT, LOC involvement may reflect greater motion of the robot's hand and/or the object (Whitney et al., 2007; Oreja-Guevara et al., 2004), or greater processing of object structure during the Grasp condition (Kourtzi & Kanwisher, 2001). Given its involvement in processing the motions of objects and tools (Beauchamp, Lee, Haxby, & Martin, 2002, 2003), the adjacent left cMTG responses may also reflect the additional grasp-related motion. However, this cannot explain the presence of the Grasp versus Reach difference in these regions during passive observation, but not when participants initiate the very same actions through button presses (i.e., Grasp Go vs. Reach Go). As will be discussed, this suggests that responses within these areas may be suppressed when the actor causes these behaviors. 
The inverse contrast (Reach Watch > Grasp Watch) failed to reveal any differences.
Neither the aIPS nor premotor cortex demonstrated sensitivity to the perception of self-initiated causal actions. Instead, we detected a robust response within the right cerebellum that traversed areas V and VI and a smaller cluster within the vermis. Cerebellar activity was greater when participants pressed a button to initiate movements of the robotic arm and observed the resulting reaching or grasping actions compared with when they passively watched the same actions performed autonomously by the robot (Go conditions > Watch conditions; Figure 6A). Critically, use of an inclusive contrast mask revealed that most voxels in these cerebellar clusters also exhibited significantly greater activity when participants launched and observed the robot's subsequent actions versus when they pressed the nonfunctional button (Go conditions > Watch conditions and Go conditions > Press conditions; Figure 6B). Responses within cerebellar voxels surviving this conjunction cannot be attributed either to the perception of the robot's actions (identical in both Go and Watch conditions) or to the motor demands of the button press (identical in the Go and Press conditions). As will be discussed shortly, the right cerebellum appears to be related to the actors' perceptions that their pressing the correct button controlled the behavior of the robot, or to the generation of predictions concerning the sensory feedback that is expected to follow pressing of the functional “Grasp” or “Reach” buttons.
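The logic of the inclusive contrast mask can be illustrated with a minimal sketch. It assumes that each contrast is available as an array of voxelwise statistics and that a single threshold defines significance. The function name, the toy values, and the threshold of 2.3 are hypothetical and are not taken from the analysis pipeline used in this study.

```python
import numpy as np


def inclusive_mask_conjunction(stat_a, stat_b, threshold):
    """Return a boolean map of voxels where BOTH contrasts exceed threshold.

    stat_a, stat_b: voxelwise statistic maps of identical shape
    (e.g., Go > Watch and Go > Press). The threshold is an assumed,
    illustrative value, not the one used in the study.
    """
    stat_a = np.asarray(stat_a, dtype=float)
    stat_b = np.asarray(stat_b, dtype=float)
    if stat_a.shape != stat_b.shape:
        raise ValueError("statistic maps must have the same shape")
    # A voxel survives only if it is suprathreshold in both contrasts.
    return (stat_a > threshold) & (stat_b > threshold)


# Toy example with five hypothetical voxels.
go_vs_watch = np.array([3.5, 1.0, 4.2, 2.9, 0.3])
go_vs_press = np.array([3.1, 3.8, 0.5, 3.3, 0.1])
surviving = inclusive_mask_conjunction(go_vs_watch, go_vs_press, threshold=2.3)
print(surviving)
```

Only voxels that are suprathreshold in both maps survive, which is why activity in the surviving voxels cannot be explained by either the visual input alone or the button press alone.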
ROI Analysis in the Functionally Defined aIPS
More sensitive ROI analyses were conducted on the mean percent BOLD signal change (PSC) in the aIPS, defined functionally on the basis of results from the grasp > reach execution contrast in the MT (Figure 3B). For the MT, there was a main effect of Movement Phase [F(1, 17) = 5.2168, p < .001], which reflects greater activity during execution (mean PSC = 1.92, SD = 1.01) than planning (mean PSC = −0.062, SD = 0.34). During the RT, the difference between execution (mean PSC = 0.28, SD = 0.37) and planning (mean PSC = 0.29, SD = 0.31) was nonsignificant, p = .94. There was a main effect of Action Type [F(1, 15) = 4.727, p = .046], due to responses during the Go conditions (mean PSC = 0.56, SD = 0.09) being greater than in either the Watch (mean PSC = 0.43, SD = 0.55, p = .04) or Press (mean PSC = 0.29, SD = 0.56, p = .005) conditions (Figure 7). Importantly, unlike the MT, there was no difference between mean responses in the grasp (Grasp Go: mean PSC = 0.378, SD = 0.39) and reach (Reach Go: mean PSC = 0.378, SD = 0.48) execution conditions, p = .99. As will be discussed shortly, this is contrary to what is expected if the aIPS is representing the goal of grasping independent of the movements and effectors involved.
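For readers unfamiliar with the measure, mean percent signal change within an ROI can be computed along the following lines. This is a sketch under stated assumptions: PSC is taken here as the voxelwise difference between the condition and baseline signal, expressed as a percentage of baseline and then averaged over the ROI voxels. The function name and the toy values are hypothetical, and the exact definition used in the present analyses may differ.

```python
import numpy as np


def roi_percent_signal_change(condition_signal, baseline_signal, roi_mask):
    """Mean percent signal change over the voxels selected by roi_mask.

    Assumed definition: 100 * (condition - baseline) / baseline per voxel,
    averaged across ROI voxels. Inputs are same-shaped arrays.
    """
    condition_signal = np.asarray(condition_signal, dtype=float)
    baseline_signal = np.asarray(baseline_signal, dtype=float)
    roi_mask = np.asarray(roi_mask, dtype=bool)
    if not roi_mask.any():
        raise ValueError("ROI mask selects no voxels")
    psc = 100.0 * (condition_signal - baseline_signal) / baseline_signal
    return float(psc[roi_mask].mean())


# Toy example: four hypothetical voxels, the first two lie inside the ROI.
baseline = np.array([100.0, 200.0, 100.0, 50.0])
condition = np.array([101.0, 202.0, 110.0, 50.0])
mask = np.array([True, True, False, False])
mean_psc = roi_percent_signal_change(condition, baseline, mask)
print(mean_psc)
```

Because the mask is defined from an independent contrast (here, grasp > reach execution in the MT), the resulting values can be compared across RT conditions without circularity.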
Our primary objective was to clarify the roles of the aIPS and/or vPMC in representing the goals of object-oriented actions independent of the demands associated with sensorimotor control. This was approached through use of a task in which grasping and reaching with a robotic arm were initiated with button presses, movements bearing an arbitrary causal relationship to the actions they controlled. Consistent with existing evidence (Frey et al., 2005; Culham et al., 2003; Binkofski et al., 1998), we found that the comparison of manually performed grasp versus reach yielded significantly increased activity at the intersection of the IPS and postcentral sulcus (i.e., the functionally defined aIPS). By contrast, when these actions were undertaken with a robotic arm controlled by button press, the aIPS, vPMC (and indeed all other regions exhibiting increased activity) responded equivalently during both grasp and reach. This is inconsistent with what is expected if these areas represent the goal of grasping independent of the demands associated with sensorimotor control (Tunik, Rice, Hamilton, & Grafton, 2007; Hamilton & Grafton, 2006; Tunik et al., 2005; Johnson-Frey et al., 2003). As will be discussed, these findings instead support the hypothesis that grasp-specific responses in aIPS depend on the existence of a nonarbitrary causal relationship between the sensorimotor demands of hand movements and resulting manual or tool actions.
Grasp-selective Responses in the aIPS Depend on Nonarbitrary Causal Relationships between Hand Movements and End-effectors
Manual reach and grasp actions were associated with significant increases in activity throughout a widespread network of regions involved in sensorimotor control, including bilateral posterior parietal and premotor cortices. As in past research, we detected greater activity when contrasting the execution of manual grasping versus reaching conditions near the intersection of the IPS and postcentral sulcus, the functionally defined aIPS (Frey et al., 2005; Culham et al., 2003; Binkofski et al., 1998). When using the robot, activity in posterior parietal and premotor cortices also increased significantly above resting baseline. However, no brain regions exhibited grasp-selective increases in activity when these actions were undertaken with the robotic arm via button presses.
Prior fMRI research, including from our own lab, reports that grasping with the hands or with a handheld tool engages the aIPS and premotor cortex (Gallivan et al., 2013; Jacobs et al., 2010). As noted in the Introduction, however, the causal relationship between hand movements and the actions of the tools' end-effectors was nonarbitrary in all of these studies. This is also true of the elegant investigation of F5 neurons by Umilta and colleagues (2008). Regardless of whether pliers are normal or reversed, grasping with the tool still requires finger flexion or extension movements naturally involved in grasping. Importantly, the introduction of a truly arbitrary causal relationship between hand movements (single degree-of-freedom button presses) and the actions of the robotic arm abolished any grasp-selective responses in these areas. On the basis of these findings, we conclude that the selective involvement of the aIPS and premotor cortex in grasp depends on hand movements that bear a nonarbitrary causal relationship to natural manual grasping. In the absence of such a relationship, we failed to detect any evidence for grasp-selective responses in the human brain, other than those exhibited during passive observation, as discussed shortly.
An alternative interpretation is that the aIPS does code the goal of grasping, but only when it is the immediately forthcoming (proximal) objective of the actor (Grafton, personal communication). This predicts that the aIPS will be involved not only when manually grasping an object but also when a handheld tool is grasped, as was the case in all previous studies. If, as in the RT, grasp is not involved in wielding and controlling the tool, then the aIPS will not respond selectively even when the ultimate (distal) goal of the action is grasping. Further work is required to determine if these closely related alternatives can be empirically disambiguated.
In this initial investigation, we elected to use a simple button press to launch reach or grasp actions, rather than have participants engage in online sensorimotor control of the robot. This design was chosen to equate the sensorimotor demands between these two movement conditions perfectly. Future research should focus on comparing neural representations involved in controlling such devices through the use of both arbitrary and nonarbitrary control signals. The challenge will be to equate these conditions for their sensorimotor demands. If our hypothesis is correct, then grasp selectivity in the aIPS (and perhaps also premotor cortex) will be apparent only when there is a nonarbitrary causal relationship between hand movements and tool actions.
Lateral Occipital Cortex and cMTG Differentiate between Passively Observed Grasping versus Reaching Actions of the Robot
We did find differences in responses to passive observation of the robot autonomously grasping versus reaching. Relative to resting baseline, watching both types of actions was associated with widespread increases in cortical and subcortical activity that included inferior parietal and premotor regions considered to be part of the mirror neuron system (Rizzolatti & Craighero, 2004). This is contrary to what is expected if this network is exclusive to actions undertaken with biological effectors (Tai, Scherfler, Brooks, Sawamoto, & Castiello, 2004). Instead, it is consistent with prior evidence of responses in macaque F5 neurons during the observation of grasping with tools, actions not in their motor repertoire (Ferrari et al., 2005), and with increased parieto-frontal activity detected in humans during the observation of a robot's actions (Gazzola et al., 2007).
In both hemispheres, the lateral aspect of the lateral occipital cortex, extending rostrally into cMTG, exhibited increased activity during the observation of the robot grasping versus reaching. One possible reason for this difference concerns the greater motion in the Grasp condition of both the robot's hand and the target blocks during pickup, transport, and release. The lateral aspect of the occipital lobe (putative MT/MT+) plays a key role in processing visual motion (Whitney et al., 2007; Oreja-Guevara et al., 2004; Ferber et al., 2003; Dukelow et al., 2001) and object form (Kourtzi & Kanwisher, 2001), which may receive greater attention when grasping because of its relevance to end-effector preshaping. Relatedly, Beauchamp and colleagues have demonstrated selective responses within the adjacent cMTG for processing the motions of objects and tools (Beauchamp et al., 2002, 2003). Alternatively, earlier work revealed very similar responses when participants viewed causal interactions between objects. Lateral occipital and caudal temporal activity was increased bilaterally when one ball appeared to cause movements of another through collision, as compared with when similar actions occurred in the absence of contact (Blakemore, Fonlupt, et al., 2001). It is possible that the effects exhibited here are similarly driven by the observation of causal contact between the hand and object during Grasp, but not during Reach. None of these accounts, however, explains the absence of these differences when participants initiated the very same actions through button presses (i.e., Grasp Go vs. Reach Go). It is possible that responses within these areas, whether driven by motion, form, and/or perceived causal interactions between tool and object, may be suppressed when the actor causes these actions. 
We speculate that corollary discharge associated with the button press that initiated these actions modulates responses within these regions and that this could play a role in the perception that these events are causally related to one's own movements. More work is clearly needed to replicate and clarify this effect.
Cerebellum Responds Selectively to Self-generated Actions of the Robot
Earlier electroencephalographic work suggests that the aIPS and premotor cortex are sensitive to the perception that an observed action is under one's own control (Bozzacchi et al., 2012). If so, then greater activity should be detected in these areas when participants press a button to launch the robot's actions and observe the ensuing consequences versus when they press a button known to be nonfunctional and watch the very same actions undertaken autonomously by the robot. Conventional whole-brain analyses failed to detect the predicted effect in any cortical region. A more sensitive ROI analysis within the functionally defined aIPS did, however, reveal significantly greater increases in activity in the Go conditions than in either the Press or Watch conditions. It appears that the aIPS may have modest sensitivity to the perception that an action is self-initiated. This more subtle effect may be responsible for the results reported by Bozzacchi et al. using ERPs and a paradigm that involved videos closely resembling the participants' own hands.
Unexpectedly, whole-brain analyses did detect greater activity in the right cerebellar hemisphere (V, VI, and vermis) when button presses led to the robot's actions than during the Watch or Press conditions. It is important to appreciate that this difference was observed in a comparison between conditions with identical motor responses (button presses) and visual stimulation (the same prerecorded video clips of the robot reaching or grasping). There is a considerable literature arguing for involvement of the cerebellum in generating predictions about the sensory feedback that will result from one's motor commands (Nowak, Topka, Timmann, Boecker, & Hermsdorfer, 2007; Flanagan, Vetter, Johansson, & Wolpert, 2003; Blakemore, Frith, & Wolpert, 2001; Wolpert et al., 1998). From this perspective, this cerebellar increase might reflect the prediction of the forthcoming sensory events that are expected to follow pressing of the “Grasp” or “Reach” buttons. This would explain why a comparable response is not detected during passive observation of the robot's actions or when the button known to be nonfunctional is pressed. This account is supported by evidence from an elegant study by Blakemore and colleagues that introduced delays between movements of the right hand and the experience of touch on the left palm. They found that responses within the right lateral cerebellar hemisphere increased as a function of the delay interval and concluded that this was attributable to sensory prediction (Blakemore, Frith, et al., 2001). Likewise, damage to the cerebellum impairs the ability to anticipate the sensory consequences of one's own movements (Diedrichsen, Verstynen, Lehman, & Ivry, 2005).
Alternatively, asymmetric involvement of the cerebellum, ipsilateral to the hand involved in motor or sensory functions, is well documented (Diedrichsen, Wiestler, & Krakauer, 2013; Yan et al., 2006; Sakai et al., 1998), and it is possible that this accounts for the right-lateralized responses observed here. However, this alone cannot explain why this region exhibited greater activity when a button press with the right hand controlled the behavior of the robot versus when an ineffective button was pressed (see Figure 6B).
In conclusion, grasp-selective responses in the aIPS appear to depend on the existence of a nonarbitrary causal relationship between hand movements and consequent actions. When this relationship exists, as in our MT and prior investigations of grasping with the hands or with tools, the aIPS exhibits grasp-selective responses. Conversely, when hand movements bear an arbitrary causal relationship to tool actions, we fail to detect grasp-selective responses in the aIPS. If the aIPS is truly involved in goal-dependent representations of grasp, then we would instead expect grasp-selective responses that are independent of the relationship between hand movements and tool actions. This work may have important implications for our understanding of neural representations involved in the use of advanced technologies including brain-controlled interfaces, neural prosthetics, and assistive technologies, whose control signals can be very flexibly related to their actions. In turn, this understanding may engender neurally inspired control systems that exploit organizing principles of the existing biological architecture for action (Sadtler et al., 2014; Leuthardt, Schalk, Roland, Rouse, & Moran, 2009).
This work was supported by grants to S. H. F. from ARO/ARL (49581-LS) and NIH/NINDS (NS053962). The authors thank Bill Troyer for technical assistance, Ken Valyear for thoughtful discussions and input on the manuscript, and anonymous reviewers for constructive feedback.
Reprint requests should be sent to Scott H. Frey, 205a Melvin H. Marx Building, 1416 Carrie Francke Drive, Columbia, MO 65211, or via e-mail: firstname.lastname@example.org.