We investigated the neural substrates involved in visuo-haptic neuronal convergence using an additive-factors design in combination with fMRI. Stimuli were explored under three sensory modality conditions: viewing the object through a mirror without touching (V), touching the object with eyes closed (H), or simultaneously viewing and touching the object (VH). This modality factor was crossed with a task difficulty factor, which had two levels. On the basis of an idea similar to the principle of inverse effectiveness, we predicted that increasing difficulty would increase the relative level of multisensory gain in brain regions where visual and haptic sensory inputs converged. An ROI analysis focused on the lateral occipital tactile–visual area found evidence of inverse effectiveness in the left lateral occipital tactile–visual area, but not in the right. A whole-brain analysis also found evidence for the same pattern in the anterior aspect of the intraparietal sulcus, the premotor cortex, and the posterior insula, all in the left hemisphere. In conclusion, this study is the first to demonstrate visuo-haptic neuronal convergence based on an inversely effective pattern of brain activation.
There has been growing interest in the study of multisensory integration for the last few decades, leading researchers to investigate the neural substrates involved in visual and haptic object recognition (for reviews, see James, Kim, & Fisher, 2007; Amedi, von Kriegstein, van Atteveldt, Beauchamp, & Naumer, 2005). Results of these studies suggest that vision and touch share similar neural substrates for processing the macrogeometric properties of objects (i.e., form/shape) in occipito-temporal (Stilla & Sathian, 2008; Peltier et al., 2007; Pietrini et al., 2004; Zhang, Weisser, Stilla, Prather, & Sathian, 2004; Stoesz et al., 2003; James et al., 2002; Amedi, Malach, Hendler, Peled, & Zohary, 2001) and intraparietal cortices (Stilla & Sathian, 2008; Zhang et al., 2004; Grefkes, Weiss, Zilles, & Fink, 2002; Culham & Kanwisher, 2001) as well as in cerebellum (Naumer et al., 2010; Stevenson, Kim, & James, 2009). These studies also suggest that information from vision and haptics converges at these cortical sites for the purpose of visuo-haptic multisensory perception and/or action (Kim & James, 2011; James & Kim, 2010; Dijkerman & de Haan, 2007; James et al., 2007; Reed, Klatzky, & Halgren, 2005). In general, most of these previous fMRI studies have assessed the overlap of visual and haptic representations. Overlap of inputs from two or more sensory systems is indicative of ‘areal’ convergence. In the case of vision and haptics, areal convergence would imply the comingling in one brain region (or voxel cluster) of visual neurons and haptic neurons. Areal convergence, however, does not imply the presence of multisensory neurons in the region. The presence of multisensory neurons in a region is termed ‘neuronal’ convergence; multisensory neurons receive converging inputs from two or more sensory systems. 
Finding overlap of fMRI activation in a brain region with visual and haptic stimuli is evidence that that region may be a site of areal convergence, that is, an area that has both visual and haptic neurons. However, overlap alone is not enough to imply neuronal convergence (James & Stevenson, 2011; James, Stevenson, & Kim, in press; Stevenson et al., 2009). Thus, despite the number of studies investigating haptic and visuo-haptic object processing, there is very little evidence concerning whether the occipito-temporal and intraparietal cortices are sites of ‘neuronal’ convergence (Kim & James, 2010; Tal & Amedi, 2009).
The principle of inverse effectiveness has been employed to investigate multisensory integration for almost three decades, since Meredith and Stein (1983) described the inverse relationship between unisensory effectiveness and multisensory enhancement in cat superior colliculus cells. The principle of inverse effectiveness states that multisensory gain increases as the responsiveness to the constituent unisensory stimuli decreases. Inverse effectiveness is typically (but not necessarily) evaluated by degrading the quality of the stimuli. Degraded stimuli generally produce less activation than less degraded stimuli, providing a gradient of activation along which one can assess inverse effectiveness. The principle of inverse effectiveness has been widely used to investigate multisensory integration in non-human animals (Perrault, Vaughan, Stein, & Wallace, 2005; Meredith & Stein, 1983, 1986b). More recently, researchers have also started to study inversely effective patterns of activation using human neuroimaging (James et al., in press; Kim & James, 2010; Stevenson & James, 2009; Werner & Noppeney, 2009; Kayser, Petkov, Augath, & Logothetis, 2005), but inverse effectiveness has yet to be shown in relation to visuo-haptic multisensory object processing.
In a previous study (Kim & James, 2010) using an additive-factors design (Sternberg, 1969), we attempted to relate the principle of inverse effectiveness to the study of visuo-haptic object recognition. As with typical studies of inverse effectiveness, stimuli of different quality were presented and inverse effectiveness was assessed across the levels of quality. Unexpectedly, the study showed evidence for ‘enhanced effectiveness’ in three distinct object-selective brain regions: the left lateral occipital tactile–visual area (LOtv), the left intraparietal sulcus (IPS), and the anterior aspect of the left fusiform gyrus. Enhanced effectiveness is an increase in multisensory enhancement as the effectiveness of the constituent unisensory stimuli increases, which is the opposite of inverse effectiveness. Although this effect is not the same as inverse effectiveness, finding an effect in that direction also implies neuronal convergence of multisensory inputs (Kim & James, 2010; Stevenson et al., 2009).
In that study, novel objects made up of simple shape features were presented visually, haptically, or visuo-haptically at different levels of stimulus quality while participants performed a shape categorization task. Visual stimuli were pictures of the objects and were degraded by adding noise and reducing contrast. Tangible haptic stimuli were explored by subjects with their hands and were degraded by having subjects wear gloves. In the visuo-haptic condition, the procedures for degrading the constituent unisensory stimuli led to incongruencies in spatial location and temporal synchrony between visual and haptic inputs. Spatial and temporal incongruencies are known to influence firing rates in multisensory neurons (Meredith, Nemitz, & Stein, 1987; Meredith & Stein, 1986a; King & Palmer, 1985) as well as BOLD signals in audiovisual multisensory neuronal populations (Stevenson, VanDerKlok, Pisoni, & James, 2011; Stevenson, Altieri, Kim, Pisoni, & James, 2010; Miller & D'Esposito, 2005). Although the exact mechanism remains unclear, it is quite possible that a combination of these incongruencies in the previous study may have altered the integration of visual and haptic signals such that it gave rise to enhanced effectiveness instead of inverse effectiveness in the population measurement.
On the basis of this explanation of the previous findings, the first goal of this study was to develop a procedure for investigating visuo-haptic object recognition that would reduce incongruencies in stimulus presentation parameters and produce optimal multisensory integration of visual and haptic shape information. We chose two candidate factors for optimization, temporal and spatial congruency. Instead of viewing pictures of objects while touching tangible objects, participants in the current study were able to view the tangible objects and view their hand touching them through a mirror. This change to the procedure lessened the spatial incongruency caused by viewing and touching the object in different locations. A recent fMRI study employed a similar procedure of visual and tactile stimulation where participants looked directly at their hand being touched by objects (Gentile, Petkova, & Ehrsson, 2011). In that study, however, participants did not actively explore the stimuli but rather passively felt them being stroked on their index finger and were not asked to make any perceptual or cognitive judgments on the shape of the stimuli, whereas the participants in the current study actively touched the stimuli while carrying out perceptual judgment tasks. Participants in the current study were also trained and specifically asked to open their eyes only when they began touching the object and to close their eyes only when they finished touching the object. This change to the procedure lessened the temporal incongruency caused by the difference in time required to move the hand compared with opening/closing the eyes.
To implement this new protocol, it was necessary to alter the task from the previous study. In the previous study, effectiveness was manipulated by degrading the stimuli, which is highly typical in studies of inverse effectiveness (Kim & James, 2010; Stevenson & James, 2009; Werner & Noppeney, 2009; Kayser et al., 2005; Perrault et al., 2005). To allow subjects to view the object and their hand touching the object simultaneously, we manipulated effectiveness by changing the level of similarity among the objects and thus the difficulty of object recognition. There is evidence to suggest that increasing the level of similarity produces changes in effectiveness in object-selective brain regions in the desired direction for assessing inverse effectiveness (Joseph & Farley, 2004). Thus, in the present experiment, we varied the level of behavioral performance and BOLD activation effectiveness by changing the similarity between objects, rather than by degrading them.
Fourteen volunteers (seven women and seven men, age = 20–34 years) participated in the study with monetary compensation. All participants were strongly right-handed (mean = 98.98, SD = 3.82) according to a revised Edinburgh Handedness Inventory (Oldfield, 1971). Three problematic items among the 10 original items of the Edinburgh Handedness Inventory were excluded to improve its measurement properties (Dragovic, 2004). All participants had normal or corrected-to-normal visual acuity, normal sensation of touch, and no history of neurological disorders. One participant was excluded because of excessive head motion (see below for criteria; final n = 13). The study was approved by the Indiana University Institutional Review Board. Written informed consent was obtained from all participants before the experiments.
Stimuli and Procedures
Different sets of stimuli were used in localizer runs and experimental runs. Fifteen (15) 3-D objects and 15 textures were used in the visual object localizer run. Objects and textures were tangible stimuli with a size of approximately 2 × 2 × 2 cm for objects and 2 × 2 cm for textures made of white acrylonitrile butadiene styrene (ABS) plastic (Figure 1). They were presented on a custom-made table placed over the participant's abdomen and viewed through a mirror. The same stimuli were used in the haptic object localizer run. Participants explored the stimuli with their right hand with their eyes closed in the haptic runs. The use of the right hand was chosen based on a study showing that BOLD activation during haptic object exploration in higher cortical areas such as LOtv is bilateral, regardless of the hand of use (Amedi, Raz, Azulay, Malach, & Zohary, 2010).
Stimuli used in the experimental runs were 3-D tangible objects with a size of 2 × 2 × 2 cm, made of white ABS plastic. The top of each object varied in its curvature such that the least curved object was a square shape, and the most curved object was a circular shape (Figure 2). Stimuli were explored under three experimental conditions: viewing the object through a mirror without touching (V), touching the object with eyes closed (H), or viewing the object through a mirror while touching the object (VH). The possibility of head movements evoked by touching movements was limited by having participants use only their right index finger to touch the objects and use only small movements of the finger and wrist (i.e., no elbow or shoulder movement). Participants performed a two-alternative forced-choice (2AFC) task in which they judged whether each presented object was the more circular (half-cylinder-like) or the more square (cube-like) one.
Difficulty was manipulated in the experimental runs by varying the distinctness of the curvature of the objects. Two difficulty levels were used. On low-difficulty trials, the 2AFC decision was unambiguous, that is, objects were clearly more circular or more square. On high-difficulty trials, the 2AFC decision was more ambiguous, because the curvature of the objects was more similar or closer together along the perceptual dimension of curvature (Figure 2). In summary, sensory modality and task difficulty were two independent variables in a 3 × 2 factorial design. It should be noted that the stimulus quality was equivalent for all conditions, which is a departure from the typical design of a study investigating inverse effectiveness. Instead, it was expected that the pattern of inverse effectiveness could be assessed over the predicted changes in effectiveness in BOLD activation produced by the manipulation of object similarity.
All 3-D stimuli were designed in Rhinoceros 3.0 (Robert McNeel & Associates, Seattle, WA), were made into tangible objects using 3-D printing on a STRATASYS Prodigy Plus (Stratasys, Inc., Eden Prairie, MN) rapid prototyping machine, and were rendered to 2-D images for Figure 2 using Flamingo 1.0 (Robert McNeel & Associates, Seattle, WA).
fMRI Imaging Procedures
Before fMRI imaging sessions, participants were trained in an fMRI simulator until they were fully familiarized with the task. Each fMRI imaging session began with two visual object localizer runs and two haptic object localizer runs. The order of these localizer runs was randomized across participants. The localizer runs were conducted using a blocked design. Each localizer run contained 10 stimulation blocks, consisting of five blocks of an object condition and five blocks of a texture condition. The stimulation blocks were interleaved with 16-sec rest periods. Each stimulation block had four stimulus presentations (either objects or textures, depending on the block type) with each stimulus presented for 3 sec followed by a 1-sec ISI. Participants performed a one-back matching task on each stimulus by pressing the left index finger button (same as the previous stimulus) or middle finger button (different from the previous stimulus). The order of object blocks and texture blocks was randomized across runs and participants. Runs also had 16-sec rest periods at the beginning and at the end. Across the four localizer runs, there were 40 stimulus blocks divided equally among four stimulus conditions (VObject, VTexture, HObject, and HTexture), resulting in 10 blocks per stimulus condition. During the localizer runs, objects or textures were placed on a custom-made “table” on the participant's abdomen by the experimenter. During visual runs, participants viewed the objects and textures through a mirror mounted on the head coil. During the haptic runs, participants were instructed to keep their eyes closed and touch the objects or textures on the table with all of the digits of their right hand. Auditory cues were presented during haptic and visual localizer runs to indicate stimulus onset and offset so that participants knew when to start and stop exploring the stimuli.
Ambient lighting for the visual conditions was provided by the MRI bore light, which was located at the rear of the bore, behind the subject's head. All other lights in the MRI room and control room were turned off. The experimenter could identify the stimuli in the dark with glow-in-the-dark marks on the back of each stimulus that were not visible to subjects. The experimenter received the same auditory cues as the subject to control stimulus presentation time.
In the experimental runs, stimuli were presented in a rapid event-related design, and each trial was pseudorandomly chosen from a cell in a 3 × 2 experimental design that crossed sensory modality (V, H, and VH) and task difficulty level (low and high). Each stimulus was presented for 2 sec, followed by a variable ISI. The duration and number of ISIs were pseudorandomly chosen from among 4, 6, and 8 sec. Each run contained 28 trials of stimulus presentation, with 16-sec rest periods at the beginning and at the end. The total number of trials per condition was 42 across nine runs. Participants performed a 2AFC task based on the curvature of the object stimulus and responded whether the stimulus was circular or square. Task difficulty was manipulated by changing the degree of distinctness of the stimulus, that is, how circular or square it was. For the three stimulus modality conditions, participants either viewed the objects without touching (V), touched the objects while their eyes were closed (H), or viewed and touched the objects simultaneously (VH). In the H and VH conditions, participants were instructed to explore the stimulus by moving their right index finger pad across its surface. In the V condition, participants were asked to view the stimulus while mimicking their right index finger sweeping motion from the H and VH condition, but not actually touching the stimulus. It should be specifically noted that in the V and VH conditions, participants were able to see their finger. By having participants mimic the finger sweeping motion during the V condition, motor activation elicited by finger movement was controlled across all three modality conditions. Furthermore, the visual input produced by finger movement was also controlled across the V and VH conditions. Subjects in pilot testing verified that seeing one's finger touch the objects produced an extremely strong sense of spatial congruence between the visual and tactile perceptions.
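The trial structure described above can be sketched in a few lines of Python. This is a minimal illustration only: the source does not specify how the pseudorandomization was carried out, so shuffling all trials across the session and drawing each ISI independently are assumptions, and `build_schedule` is a hypothetical helper name. The sketch simply confirms that 28 trials in each of nine runs yield 42 trials for each of the six cells of the 3 × 2 design.

```python
import random

MODALITIES = ("V", "H", "VH")
DIFFICULTIES = ("low", "high")
ISI_CHOICES_SEC = (4, 6, 8)   # variable ISI values reported in the text
TRIALS_PER_RUN = 28
N_RUNS = 9


def build_schedule(seed=0):
    """Return a list of runs; each run is a list of (modality, difficulty, isi_sec).

    Assumption: trials are shuffled across the whole session and each ISI is
    drawn independently. The original randomization scheme is not described.
    """
    rng = random.Random(seed)
    cells = [(m, d) for m in MODALITIES for d in DIFFICULTIES]
    total_trials = TRIALS_PER_RUN * N_RUNS        # 252 trials in total
    per_cell = total_trials // len(cells)         # 252 // 6 = 42 per condition
    trials = cells * per_cell
    rng.shuffle(trials)
    runs = []
    for r in range(N_RUNS):
        chunk = trials[r * TRIALS_PER_RUN:(r + 1) * TRIALS_PER_RUN]
        runs.append([(m, d, rng.choice(ISI_CHOICES_SEC)) for m, d in chunk])
    return runs
```

Note that this sketch balances the cells across the session, not within each run; whether the original design balanced conditions within runs is not stated.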
During experimental runs, participants were specifically asked to begin and terminate visual and haptic stimulus exploration simultaneously in the VH condition to better control the temporal synchrony of the input between sensory modalities. Before the imaging sessions, participants practiced until they were able to consistently achieve simultaneous onset and offset. An auditory beep was presented 2 sec before the stimulus onset to alert participants, and the task instruction (V, H, or VH) was given with the stimulus onset, followed by another auditory beep for the offset of the stimulus after 2 sec of exploration. The task instructions were given by a female speaker saying either “look” for V, “touch” for H, or “together” for VH condition. Behavioral responses were made with the left hand, with a left index finger button press for circular and a left middle finger button press for square. For the trials in which participants failed to follow instructions (e.g., accidentally opened their eyes in the H condition), they were asked to withhold a response. Such trials, whether due to not following instructions or not being able to respond, were coded as ‘no response’ and were removed from further analyses.
Pilot testing showed that the experimenter required approximately two additional seconds per trial to present the stimuli in a specific predetermined order that combined the factor of difficulty with the factor of curvature (i.e., more circular versus more square) compared with when the stimuli were presented in a specific predetermined order that used only the factor of difficulty. Pilot testing also showed that subjects were more accurate and faster to respond with the less similar pair of objects than the more similar pair. Thus, to maximize the number of trials per condition per subject in the allotted time, the objects were presented in a predetermined order based on difficulty condition, but the decision of whether to present the more circular or more square object on a given trial was made on-line by the experimenter. The use of this presentation order precluded the calculation of accuracy for the trials presented in the scanner but still allowed for the measurement of RTs.
Over the course of the scanning sessions, participants were instructed to limit their movements and trained to minimize their arm and shoulder movements. Before their first scanning session, participants were trained with feedback in an MRI simulator on how to produce the appropriate movements. During imaging, each participant's head was restrained tightly with foam padding in the head coil within the limit to which the foam padding did not cause discomfort. Each participant's elbow was supported by a foam pad to limit arm fatigue and reduce movement of the elbow and shoulder joints, which could also have caused incidental head movements.
Imaging Parameters and Analysis
Imaging was carried out using a Siemens Magnetom TIM Trio 3T whole-body scanner with an eight-channel phased-array head coil. Auditory cues and instructions were presented through headphones connected to a Macintosh computer operated by Mac OS 10 (Apple Computer, Inc., Cupertino, CA). The whole-brain functional volumes were acquired with a field of view of 220 × 220 mm, an in-plane resolution of 64 × 64 pixels, and 33 axial slices with 3.4-mm thickness and 0-mm slice gap, resulting in a voxel size of 3.4 × 3.4 × 3.4 mm. Readout interactions between slices were managed by collecting slices in an interleaved ascending order. Functional images were collected using a relatively standard gradient-echo EPI pulse sequence (echo time = 25 msec, repetition time = 2000 msec, flip angle = 70°). The number of EPI volumes per session was 176 and 116 in the localizer and experimental runs, respectively. High-resolution T1-weighted anatomical volumes with 160 sagittal slices (voxel size = 1 × 1 × 1 mm) were acquired using Turbo-flash 3-D (TI = 1100 msec, echo time = 3.93 msec, repetition time = 14.375 msec, flip angle = 12°).
Imaging data were analyzed using BrainVoyager QX (Brain Innovation, Maastricht, Netherlands) run on a PC operated by Windows XP Professional (Microsoft Corporation, Redmond, WA). Anatomical imaging data were transformed into a standard space corresponding to Talairach's coplanar stereotaxic atlas of the human brain (Talairach & Tournoux, 1988) using an eight-parameter affine transform. Functional imaging data were aligned to the first volume of the last run (the run acquired closest in time to the anatomical data acquisition), registered to the transformed anatomical data, and preprocessed. The preprocessing procedure included 3-D motion correction, slice scan time correction, 3-D spatial Gaussian smoothing (FWHM = 6 mm), and linear trend removal. Trials for which the participant did not respond were excluded from the analyses. The number of ‘no response’ trials per condition was less than 2 (mean = 1.95, SD = 3.85) of 42 trials per participant. Functional runs in which transient head movements exceeded 1 mm and/or gradual drift of the head exceeded 2 mm were excluded from the analyses. Only one individual was excluded based on these criteria.
For the localizer runs, a random effects general linear model (GLM) was conducted on the data of the whole group and a fixed-effects GLM on the data of each individual. Group and individual SPMs were created from the intersection (i.e., conjunction) of three GLM contrasts: (VObject > VTexture), (HObject > HTexture), and (HObject > VTexture). In previous studies, the intersection of only the first two contrasts was used to isolate visuo-haptic object-selective brain regions (Kim & James, 2010; Amedi et al., 2001). Here, using that approach uncovered several clusters in the visual cortex that produced more activation with visual textures than with haptic objects. Thus, the third contrast (HObject > VTexture) was included to ensure that the localized clusters produced more activation with each object condition than with either texture condition. In other words, the third contrast was included to ensure that the cluster was clearly object selective. A fourth contrast (VObject > HTexture) could have been added to the conjunction, but it was deemed unnecessary, because no clusters were found with the three-contrast conjunction in which visual object stimuli produced less activation than haptic texture stimuli.
Experimental runs were analyzed using individual-based ROI analyses, with the ROIs selected from the independent localizer runs. Functional time courses were extracted from each participant's unique ROIs. The individual ROI analysis ensured that the functional time courses for each subject were taken from a region with similar functional specialization. Although the primary interest was the pattern of activation during the experimental runs, for descriptive purposes, the percent BOLD signal change was also calculated for the localizer time courses as the average percent signal change across a time window that began 6 sec after the onset of the stimulus block and ended at the end of the block, and for the experimental time courses as the average percent signal change across a time window between 4 and 10 sec after the onset of the stimulus trial.
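The window-averaging step for the experimental time courses can be illustrated with a short Python sketch. It assumes that the time course is already expressed in percent signal change, that it is sampled once per TR (2 sec, as reported in the imaging parameters), and that both ends of the 4 to 10 sec window are inclusive; none of these details is stated explicitly in the text, and `window_mean_psc` is a hypothetical helper name.

```python
TR_SEC = 2.0  # repetition time reported for the EPI sequence


def window_mean_psc(timecourse, onset_sec, start_sec=4.0, end_sec=10.0, tr=TR_SEC):
    """Average percent signal change in a post-onset window.

    timecourse: sequence of percent-signal-change values, one per volume.
    onset_sec:  trial onset relative to the first volume, in seconds.
    Assumption: volumes at both window edges are included.
    """
    first = int(round((onset_sec + start_sec) / tr))
    last = int(round((onset_sec + end_sec) / tr))
    window = timecourse[first:last + 1]
    if not window:
        raise ValueError("window falls outside the time course")
    return sum(window) / len(window)
```

For the localizer runs, the same function could be called with `start_sec=6.0` and `end_sec` set to the block duration.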
Figure 3 shows the mean response time from 14 participants. A two-way ANOVA was performed on response times using an alpha level of .05, and the sphericity assumption for within-subjects ANOVA was tested using Mauchly's test. Under the assumption of sphericity, the ANOVA showed significant effects of Sensory Modality [F(2, 26) = 6.41, p = .005] and Task Difficulty [F(1, 13) = 11.37, p = .005]. Post hoc t tests showed significant differences in response time between low- and high-difficulty levels in V [t(13) = 3.41, p = .002], H [t(13) = 2.91, p = .006], and VH [t(13) = 2.14, p = .026] conditions. The results demonstrate that manipulating the similarity of the stimuli influenced difficulty in the predicted direction. Although the effect of Difficulty appeared to be weaker for the multisensory VH condition compared with unisensory V and H conditions, this observation was not borne out statistically, as the interaction between Modality and Difficulty was not significant [F(2, 26) = 1.46, p = .252].
The VH response time was longer than the V response time, which would not be predicted based on multisensory facilitation. Differences in response time between modality conditions were not considered meaningful because of differences in the instructions for the V, H, and VH conditions. For instance, in the VH condition, participants were specifically instructed and trained to open their eyes when they made contact with the stimulus. On the other hand, in the V condition, no contact with the stimulus was made and participants opened their eyes at cue onset. Thus, the longer response time with the VH condition compared with the V condition is attributable to the extra time taken for the finger to travel from the start position and make first contact with the stimulus in the VH condition.
The results from the independent localizer runs are shown in Figure 4 with Figure 4A and B showing the results from a whole-brain group analysis and Figure 4C showing individually defined ROIs. Figure 4A and B illustrates visual, haptic, and visuo-haptic object-selective brain regions defined in a group-averaged whole-brain map on a cortical model of a representative participant's brain. These group maps are purely illustrative and were thresholded at a voxel-wise p value of .003 and a minimum cluster size of 10 voxels. Visual object-selective brain regions are shown in red and haptic object-selective brain regions are shown in blue. Green represents the intersection of the two object-selective maps, that is, brain regions that responded to both visual and haptic objects more than visual and haptic textures.
In addition to the group-averaged map, maps of the same contrast were generated for each participant. Because cluster size varied considerably across individuals at a fixed statistical threshold, we adopted a procedure for active ROI selection that used a different threshold for each participant. The threshold was chosen for each participant based on two criteria. First, a minimum acceptable threshold t value (t = 1.0) was adopted to ensure that the data from both hemispheres of as many participants as possible were included in the ROI analysis. If no clusters greater than 300 mm³ in size were found with this threshold, then the participant was considered to not have an ROI in that area. Second, to limit the extent of each ROI to only the most statistically significant voxels, the threshold t value for excessively large clusters was increased until the size of the cluster was less than 1000 mm³ (maximum t = 3.4). Figure 4C shows individually defined, visuo-haptic, object-selective brain regions on an anatomically averaged brain of all 13 participants (see Table 1 for the Talairach coordinates of individual ROIs). Each color patch represents an individual's LOtv ROI. Among the 13 participants, LOtv ROIs were found in 10 participants in the left hemisphere and in seven participants in the right. Averaged percent BOLD signal changes for unisensory objects and textures are shown in Figure 5 (for descriptive purposes only).
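The two-criterion threshold selection can be sketched as a small search over t values. This is a hedged illustration, not the procedure as implemented in BrainVoyager: the step size of 0.1 is an assumption, the reported maximum t = 3.4 is treated here as an upper cap on the search (the text may instead mean the largest threshold actually used), and `cluster_volume_at` is a hypothetical callable that returns the volume in mm³ of the candidate cluster at a given threshold.

```python
MIN_T = 1.0                # minimum acceptable threshold from the text
MAX_T = 3.4                # treated as a cap here (an assumption)
MIN_CLUSTER_MM3 = 300.0    # smaller clusters mean "no ROI"
MAX_CLUSTER_MM3 = 1000.0   # larger clusters trigger a stricter threshold


def select_roi_threshold(cluster_volume_at, step=0.1):
    """Return the per-participant t threshold, or None if no ROI is found.

    cluster_volume_at: hypothetical function mapping a t threshold to the
    volume (mm^3) of the candidate cluster at that threshold.
    """
    t = MIN_T
    # Criterion 1: a cluster larger than 300 mm^3 must exist at the minimum t.
    if cluster_volume_at(t) <= MIN_CLUSTER_MM3:
        return None
    # Criterion 2: raise t until the cluster is smaller than 1000 mm^3.
    while cluster_volume_at(t) >= MAX_CLUSTER_MM3 and t < MAX_T:
        t = min(round(t + step, 10), MAX_T)
    return t
```

A participant whose cluster never exceeds 300 mm³ at t = 1.0 returns `None`, which corresponds to the dashes in Table 1.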
Table 1. Talairach Coordinates (x, y, z) of Individual LOtv ROIs
|Participant|Left LOtv|Right LOtv|
|---|---|---|
|P1|−43, −60, 10|54, −59, 5|
|P3|–|59, −53, 8|
|P4|−51, −55, −6|52, −50, −1|
|P6|Excluded because of excessive head motion||
|P7|−45, −49, −4|42, −55, −2|
|P8|−52, −55, −5|47, −59, −3|
|P9|−48, −64, −2|52, −70, −2|
|P10|−43, −64, 7|36, −59, 2|
|P11|−42, −50, −1|–|
|P12|−43, −53, 0|–|
|P13|−43, −67, 17|–|
|P14|−46, −61, 2|–|
Subsequent to locating each individual subject's ROIs, BOLD time courses from the experimental runs were extracted from their respective ROIs. Figure 6 shows the BOLD percent signal change data as a function of sensory modality and task difficulty in left and right LOtv. A three-way 3 × 2 × 2 repeated-measures ANOVA was performed using an alpha level of .05, and the sphericity assumption for within-subjects ANOVA was tested using Mauchly's test. Missing values, which occurred for subjects for whom either the left or right LOtv could not be properly localized, were replaced using a standard missing values analysis approach. In all cases, the best fitting parameter was the mean value across the other subjects for that same hemisphere. Thus, all cases of missing values were systematically replaced with this mean value. Under the assumption of sphericity, the ANOVA showed significant effects of Sensory Modality [F(2, 20) = 24.94, p < .001] and Task Difficulty [F(1, 10) = 12.44, p = .005] on BOLD activation, but no effect of Brain Region [F(1, 10) = 2.65, p = .135] on BOLD activation between left and right LOtv. No interactions were found between Modality and Difficulty [F(2, 20) = 2.11, p = .148], Brain Region and Modality [F(2, 20) = 3.42, p = .053], and Brain Region and Difficulty [F(1, 10) = 4.14, p = .069], but a significant three-way interaction was found among Modality, Difficulty, and Brain Region [F(2, 20) = 3.69, p = .043]. Post hoc t tests were performed without replacing missing values and significant differences in BOLD activation were found between low- and high-difficulty levels only in left LOtv in the V [t(9) = 2.23, p = .022] and H [t(9) = 2.45, p = .015] conditions. The results demonstrate that changes in stimulus similarity (and thus difficulty) had the predicted influence on BOLD activation in the left LOtv, but not the right LOtv. 
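The missing-value replacement described above amounts to mean imputation within a hemisphere. The following Python sketch illustrates that step under the assumption that missing ROIs are marked with `None`; `impute_hemisphere_mean` is a hypothetical helper name, not part of any statistics package used in the study.

```python
def impute_hemisphere_mean(values):
    """Replace missing entries (None) with the mean of the observed entries.

    values: BOLD measures for one hemisphere and one condition, one entry
    per subject, with None where the ROI could not be localized.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]
```

Mean imputation leaves the group mean unchanged but reduces the variance, which is one reason the post hoc t tests reported here were run without replaced values.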
In the left LOtv, increased similarity led to a significant decrease in BOLD activation (or effectiveness) with the unisensory conditions, but not with the multisensory condition. Comparing levels of effectiveness across the levels of difficulty and across the unisensory and multisensory conditions showed a pattern in the left LOtv that is similar in direction to the principle of inverse effectiveness seen previously with manipulations of stimulus quality. Effectiveness in the right LOtv did not show reliable effects based on changes in similarity with the unisensory or multisensory conditions.
To compare the strength of inverse effectiveness across hemispheres more directly, we calculated a difficulty effect metric based on differences in BOLD activation between low- and high-difficulty conditions (Figure 7). A two-way 2 × 2 repeated-measures ANOVA was performed with Difficulty Effect as the dependent measure and Hemisphere (left and right) and Modality (unisensory and multisensory) as independent factors. Unisensory activation was calculated as the sum of the two unisensory conditions (see Stevenson et al., 2009, for discussion). A significant interaction was found between Hemisphere and Modality [F(1, 10) = 6.62, p = .028]. Post hoc t tests showed that ΔVH was significantly less than the sum of ΔV and ΔH in left LOtv [t(9) = 2.14, p = .026], but not in right LOtv [t(6) = 0.164, p = .436]. In other words, multisensory gain in left LOtv increased with decreasing effectiveness, which is consistent with inverse effectiveness. Right LOtv, however, showed no evidence of differential changes in effectiveness between unisensory and multisensory conditions (i.e., no evidence of inverse effectiveness or enhanced effectiveness).
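The difficulty effect metric can be sketched in a few lines of Python. The sketch assumes that each Δ is the low-difficulty minus the high-difficulty BOLD percent signal change and that the unisensory effect is the sum ΔV + ΔH, as described above. The function names and the example values are hypothetical and are not taken from the data.

```python
import math

def difficulty_effect(low, high):
    """Delta for one modality: low-difficulty minus high-difficulty BOLD."""
    return low - high

def unisensory_minus_multisensory(bold):
    """Return (dV + dH) - dVH for one ROI.

    `bold` maps each modality ("V", "H", "VH") to a (low, high) pair of
    percent-signal-change values. A positive result means the difficulty
    effect was smaller for VH than for the summed unisensory conditions,
    the pattern described in the text as resembling inverse effectiveness.
    """
    d_v = difficulty_effect(*bold["V"])
    d_h = difficulty_effect(*bold["H"])
    d_vh = difficulty_effect(*bold["VH"])
    return (d_v + d_h) - d_vh

# Hypothetical single-ROI values, for illustration only.
example = {"V": (0.50, 0.30), "H": (0.60, 0.35), "VH": (0.90, 0.85)}
index = unisensory_minus_multisensory(example)
```

In this illustration the unisensory conditions lose activation with increasing difficulty while the multisensory condition barely changes, so the index is positive.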
Group Whole-brain SPM Analysis
The ROI analysis on area LOtv provided a targeted assessment of brain activation and its relation to the principle of inverse effectiveness in that one functionally defined area. To assess how brain activation more generally matched a pattern resembling inverse effectiveness, a group-averaged, whole-brain, random effects GLM was performed on data from the experimental runs from 13 participants. Cortical regions demonstrating a pattern similar to inverse effectiveness were defined with a conjunction of four GLM contrasts. The first contrast, all conditions > rest, was included to limit the search to voxels with sensory responses. The second and third contrasts, (VLow > VHigh) and (HLow > HHigh), were included to limit the search to voxels that showed significant changes in activation with changes in object similarity. The fourth contrast, (VLow > VHigh) + (HLow > HHigh) − (VHLow > VHHigh), assessed the remaining voxels for a pattern of activation that resembled inverse effectiveness. Each of the four resulting contrast maps was thresholded using a voxel-wise p value of .04 (t = 2.3) and a cluster size threshold of 10 voxels. When combined using conjunction (logical AND operation across contrasts/maps), this produced an equivalent voxel-wise p value of 2.5 × 10⁻⁶ (assuming the four contrasts are independent), which is slightly more liberal than the Bonferroni-corrected voxel-wise p value. Clusters that passed all four significance tests were found in the anterior aspect of the intraparietal sulcus (aIPS; x, y, z: −49, −26, 44), the premotor area (x, y, z: −49, 5, 24), and the posterior insula/parietal operculum (x, y, z: −32, −30, 19), all in the left hemisphere (Figure 8).
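The conjunction logic can be illustrated with a short Python sketch. It assumes independent contrasts, so that the joint voxel-wise p value is the single-map p value raised to the number of maps, and it treats each map as a flat list of t values. The function names and t values are hypothetical, and the cluster size threshold is omitted.

```python
import math

def conjunction_p(p_single, n_maps):
    """Joint voxel-wise p value for n independent maps, each thresholded at p_single."""
    return p_single ** n_maps

def conjunction_mask(t_maps, t_threshold):
    """Logical AND across maps: True where every map exceeds the threshold.

    `t_maps` is a list of equal-length lists of voxel t values.
    """
    return [all(t > t_threshold for t in voxel) for voxel in zip(*t_maps)]

joint_p = conjunction_p(0.04, 4)  # about 2.56e-6

# Hypothetical t values for three voxels in each of four contrast maps.
maps = [
    [3.1, 2.5, 1.0],
    [2.9, 2.4, 3.0],
    [2.6, 2.2, 2.8],
    [2.4, 2.6, 2.7],
]
mask = conjunction_mask(maps, 2.3)
```

With four maps thresholded at .04, the product is approximately 2.56 × 10⁻⁶, in line with the value of 2.5 × 10⁻⁶ given above. Only the first hypothetical voxel survives, because it is the only one above threshold in all four maps.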
We assessed multisensory neuronal convergence in visuo-haptic object-selective brain regions using the principle of inverse effectiveness in fMRI BOLD signals. We predicted that multisensory gain would increase as the effectiveness of unisensory stimuli decreased based on the principle of inverse effectiveness. With the stimulus salience held constant, the similarity of the stimuli was manipulated to influence behavioral difficulty and neural effectiveness across unisensory (V, H) and multisensory (VH) conditions. Independent visuo-haptic object localizer scans were used to find the LOtv. In the left LOtv, the multisensory gain increased with increasing object similarity (reduced effectiveness). A whole-brain analysis found that the aIPS, premotor area, and posterior insula/parietal operculum of the left hemisphere showed the same activation pattern as left LOtv. The results demonstrate the first evidence of a pattern of BOLD activation resembling inverse effectiveness during visuo-haptic multisensory object perception. The presence of inversely effective activation implies neuronal convergence of visual and haptic object information in these cortical areas.
Inverse effectiveness was predicted in the current study despite the fact that a previous study by our own group (Kim & James, 2010) found the opposite effect, enhanced effectiveness. Although the patterns of brain activation in the previous study and the current study were different (the previous study found enhanced effectiveness and the current study found inverse effectiveness), both results imply the presence of neuronal convergence. Thus, the previous and current results are consistent in suggesting that LOtv and other visuo-haptic object-selective cortical areas are sites of neuronal convergence during visuo-haptic object recognition.
It was suggested that enhanced effectiveness was found in the previous study because of spatial and temporal incongruencies between the visual and haptic stimulus presentations brought about by the stimulus presentation procedures. It has been shown in behavioral studies that spatial congruency and temporal synchrony are important factors that influence visuo-haptic multisensory integration (Helbig & Ernst, 2007; Gepshtein, Burge, Ernst, & Banks, 2005). In the current study, the procedures for presenting visual and haptic stimuli were designed deliberately to enhance the amount of spatial and temporal congruency. Although the influence of spatial and temporal congruency was not tested directly, the discrepancy between the findings of the previous and current studies suggests that visuo-haptic object-selective brain regions are highly sensitive to changes in these factors. Furthermore, it serves to highlight the importance of the naturalness and ecological validity of the stimulus presentation procedures when studying multisensory phenomena.
Although the principle of inverse effectiveness has been widely employed to investigate multisensory integration in non-human animals (Perrault et al., 2005; Meredith & Stein, 1983) and more recently in humans (James et al., in press; Kim & James, 2010; Stevenson & James, 2009; Werner & Noppeney, 2009; Kayser et al., 2005), a few potential issues in its application have been raised. The concerns are (1) post hoc conditionalization based on effectiveness, which could lead to regression to the mean; (2) the recruitment of unisensory neurons at the floor and ceiling levels of responsiveness, which could lead to immeasurable multisensory responses; and (3) the use of a relative measurement of multisensory integration, which could raise the chance of finding inverse effectiveness (see Holmes, 2007, 2009, for discussion). These potential issues, however, were avoided through the experimental design of the current study by (1) using a priori experimental factors, (2) employing only middle range effectiveness levels that produce no floor or ceiling effects, and (3) using both relative and absolute measures.
Both the whole-brain analysis and the ROI analysis of the current study revealed significant inverse effectiveness in the left hemisphere only. This effect was most striking in the ROI analysis, where it was shown that difficulty had little effect on activation in the right hemisphere ROI. Although bimodal visuo-haptic activation tends to be found bilaterally in most individuals, there is growing evidence that when bimodal activation is not found bilaterally, it is usually found in the left hemisphere (Kim & James, 2010; James, Servos, Kilgour, Huh, & Lederman, 2006; Kilgour, Kitada, Servos, James, & Lederman, 2005; Grefkes et al., 2002; Banati, Goerres, Tjoa, Aggleton, & Grasby, 2000). Similarly, a recent fMRI adaptation study showed bilateral visuo-haptic repetition suppression effects in LOtv and aIPS, although the suppression effects were stronger in the left hemisphere than in the right (Tal & Amedi, 2009).
Not all of the evidence supports the hypothesis of a left hemisphere bias for visuo-haptic convergence. For instance, some researchers have found bimodal activation in right insula (Hadjikhani & Roland, 1998) and in right lateral occipital complex (Stilla & Sathian, 2008; Prather, Votaw, & Sathian, 2004). The same studies have, however, shown bimodal activation in left IPS (Stilla & Sathian, 2008; Prather et al., 2004; Grefkes et al., 2002), which is consistent with our results. In addition to the left IPS, our finding of left insula involvement in visuo-haptic multisensory integration is supported by an earlier PET study that showed left-lateralized insula activation for visuo-tactile multisensory integration (Banati et al., 2000). It should be noted, however, that all of these studies examined multisensory areal convergence, not necessarily multisensory neuronal convergence. The current study found multisensory areal convergence in both hemispheres, similar to the previous studies, but found multisensory neuronal convergence, as demonstrated by the presence of inverse effectiveness, only in the left hemisphere. This result is similar to that of a recent fMRI adaptation study that also aimed to examine visuo-haptic neuronal convergence (Tal & Amedi, 2009) and found stronger neuronal convergence in the left hemisphere than in the right hemisphere.
Considering that most fMRI studies of haptic exploration have right-handed participants palpate with their right hand, left lateralization of multisensory neuronal convergence could simply be considered a consequence of the contralateral representation of right-handed exploration. However, several previous studies have demonstrated that the hand of use during haptic exploration does not influence activation in higher-level cortical areas such as the lateral occipital complex. Amedi and colleagues (2010) compared left- and right-handed palpation during tactile exploration of objects and showed that the activation in LOtv was bilateral, irrespective of the hand of use. Furthermore, left-lateralized activation has been found in other studies where participants explored objects with either their left hand (James et al., 2006; Kilgour et al., 2005) or both hands (Kim & James, 2010). In summary, previous studies combined with the current study suggest that visuo-haptic object-selective activation in the LOtv and possibly other brain regions is generally bilateral but may be biased to be stronger or more reliable in the left hemisphere than in the right. All of the previous studies, however, used right-handed participants; therefore, although it seems clear that the hand of use does not contribute to the bias, it is possible that handedness may. Further study is needed to test this possibility.
In the current study, analysis of the object-selective localizer data found significant voxel clusters in the location of LOtv, but not in the location of IPS. Our previous study found significant voxel clusters in both regions using a similar statistical contrast (Kim & James, 2010). The IPS has been suggested to be a site of multisensory convergence for visuo-haptic object recognition (Stilla & Sathian, 2008; James et al., 2007; Zhang et al., 2004; Grefkes et al., 2002; Culham & Kanwisher, 2001) and to be involved in processing visual shape information, particularly for visually guided reaching and grasping (James, Culham, Humphrey, Milner, & Goodale, 2003). There are several possible reasons why significant bimodal object-selective activation was not found in IPS in the current study. First, in the current study, subjects viewed tangible objects directly, whereas in previous studies, subjects viewed pictures of objects. Second, the size (visual angle of 2.57° × 2.57°) of the tangible stimuli used in the current study was smaller than the size of the pictures used in our previous study (visual angle of 12° × 12°). Third (and most likely), participants' hand movements were more restricted in the current study compared with the previous study. Participants were trained to make an almost automatic finger movement to the stimulus. By contrast, in most previous studies, participants were required to perform a ballistic reaching or grasping movement to the object with one or both hands to begin exploration. The lack of a need for action planning may have limited the recruitment of IPS in the current study, relative to previous studies. Although IPS was not found in the localizer data, a group-averaged SPM from the experimental data revealed the involvement of aIPS in visuo-haptic neural convergence of shape information.
Besides LOtv and IPS, the cerebellum has also been found to be involved in multisensory visual and haptic object recognition in some human neuroimaging studies (Gentile et al., 2011; Naumer et al., 2010; Stevenson et al., 2009). There is also growing evidence over the last few decades indicating that the cerebellum plays a role in perception and cognition, not merely in motor control (Strick, Dum, & Fiez, 2009; Gao et al., 1996). Neither the ROI analysis nor the whole-brain analysis, however, showed evidence of cerebellar involvement in the current study, implying a potential discrepancy between the underlying multisensory networks recruited in the previous studies and the current study. Further study is certainly needed to investigate the precise function of the cerebellum in visuo-haptic multisensory object recognition and multisensory integration. One speculative explanation for the discrepancy between our study and previous studies is the involvement of a deliberate hand operation during exploration of stimuli. Participants in the previous studies (Gentile et al., 2011; Naumer et al., 2010; Stevenson et al., 2009) used their whole hand to palpate the objects or viewed their whole hand while part of it was touched by an object. In contrast, participants in the current study palpated the object with one finger and were constrained to using the same, repetitive, rather automatic sweeping movement on all trials throughout the experiment. If the cerebellum-related activity during visuo-haptic multisensory processing in other studies is related to coordination of sensory input from the body (in this case the hand) and sensory inputs from other objects, then the cerebellum may not have been recruited differentially in the current study, because the finger movement aspects of the study were so closely controlled across conditions.
Some studies have shown that eyes-opened and eyes-closed states without external stimulation have a different impact on brain activation patterns in sighted (Marx et al., 2003, 2004) and blind subjects (Hufner et al., 2009), suggesting that the choice of state as rest condition may lead to different interpretations of results. According to these studies, the eyes-closed state enhances brain activation in various sensory areas including visual, somatosensory, auditory, and vestibular systems, whereas the eyes-opened state enhances attentional and ocular motor system activities. Because participants in the current study had eyes opened or closed, depending on the condition (eyes closed in H condition; eyes opened in V and VH conditions), the changes of state may have been a confounding factor. The choice of rest condition, however, stayed consistent throughout the whole session in the current study, and all experimental conditions were compared with the same type of rest condition, eyes-closed. In addition to the homogeneous rest state, having object-selective brain regions selected by subtracting the texture condition from the object condition in the localizer runs should have canceled out the effect of rest condition in the end. Although it is possible that H condition with eyes closed may have induced increased BOLD activation in visual and somatosensory cortical areas during that condition compared with V or VH condition with eyes open, the effect is not seen in percent change of BOLD signal in Figure 6. H conditions did not produce increased BOLD activation in either low- or high-difficulty conditions compared with V and VH conditions. Hence, the state of the eyes did not seem to have a considerable impact on the interpretation of our results.
In conclusion, the neural substrates involved in visuo-haptic neuronal convergence were investigated using an additive-factors design. An ROI analysis on the object-selective brain regions that responded more to both visual and haptic objects than to textures found evidence of inverse effectiveness in the left LOtv. A whole-brain analysis also found evidence of inverse effectiveness in aIPS, premotor, and posterior insular cortices of the left hemisphere. This study provides the first evidence of inverse effectiveness in the human brain during visuo-haptic object recognition.
This research was supported in part by the Indiana METACyt Initiative of Indiana University and funded in part by a major grant from the Lilly Endowment, Inc. and by a grant to T. W. James from Indiana University's Faculty Research Support Program administered by the Office of the Vice Provost for Research. We also gratefully acknowledge Daniel Eylath for stimulus presentation; Thea Atwood and Rebecca Ward for their technical support; Karin Harman James and the Indiana University Neuroimaging Group for their insights on this study; and June Yong Lee and Laurel Stevenson for their support.
Reprint requests should be sent to Sunah Kim, 360 Minor Hall, University of California, Berkeley, Berkeley, CA 94720, or via e-mail: firstname.lastname@example.org.