Abstract

Two fMRI experiments explored the neural substrates of a musical imagery task that required manipulation of the imagined sounds: temporal reversal of a melody. Musicians were presented with the first few notes of a familiar tune (Experiment 1) or its title (Experiment 2), followed by a string of notes that was either an exact or an inexact reversal. The task was to judge whether the second string was correct or not by mentally reversing all its notes, thus requiring both maintenance and manipulation of the represented string. Both experiments showed considerable activation of the superior parietal lobe (intraparietal sulcus) during the reversal process. Ventrolateral and dorsolateral frontal cortices were also activated, consistent with the memory load required during the task. We also found weaker evidence for some activation of right auditory cortex in both studies, congruent with results from previous simpler music imagery tasks. We interpret these results in the context of other mental transformation tasks, such as mental rotation in the visual domain, which are known to recruit the intraparietal sulcus region, and we propose that this region subserves general computations that require transformations of a sensory input. Mental imagery tasks may thus have both task- or modality-specific components and components that supersede any specific codes and instead represent amodal mental manipulation.

INTRODUCTION

Many people experience auditory imagery in everyday life, such as imagining the sound of a familiar voice or the roar of the crowd during a remembered sports event. Musical auditory imagery is particularly common, as anyone who has experienced a “tune stuck in the head” will attest. However, musical imagery, often referred to as “audiation” by music educators, is also an important and useful skill for musicians. It has been shown, for instance, that brass, wind, and string players as well as singers imagine the pitch of an upcoming entrance to facilitate tuning (Trusheim, 1991). Conductors and arrangers who study scores in silence also must imagine pitches, as well as timbre, rhythm, etc. (Mountain, 2001). Better audiation has been related to higher musical skill, shown both by correlational (Highben & Palmer, 2004; Brodsky, Henik, Rubinstein, & Zorman, 2003) and training studies (Humphreys, 1986). In addition to general audiation skills, musicians may use auditory imagery specifically for mental practice. Several studies with musicians (pianists, guitarists, and singers) showed a benefit of mental practice over a nonpractice control condition (Pascual-Leone, 2003; Theiler & Lippman, 1995; Lim & Lippman, 1991; Coffman, 1990).

Musical imagery may be utilized not only for representing or experiencing previously heard events but also, and perhaps more importantly, for generating novel musical structures and for evaluating what novel combinations of, or transformations of, previously experienced auditory events might sound like. Such skills are obviously necessary for the creation of new music; many composers and songwriters compose directly by writing down their musical thoughts based on their imagery. Perhaps the most extreme example of this phenomenon is provided by composers such as Beethoven or Smetana, who despite becoming deaf later in their lives nonetheless continued composing extraordinary musical works. An ability to manipulate as well as represent new sounds would clearly be necessary in such circumstances.

The neural underpinnings of musical imagery have so far been studied largely in the context of tasks requiring imagery for previously experienced musical materials but not for manipulations of such material. The present study sought to fill this gap. Given the evidence that good imagery is associated with good musicianship, we reasoned that trained musicians would be the appropriate sample in which to explore this phenomenon. Prior functional neuroimaging studies have found that auditory imagery is associated with activation of secondary auditory cortex as well as SMA and inferior frontal areas (for a review, see Zatorre & Halpern, 2005). These studies have used tasks such as judging the relative height of the pitches of two lyrics in an imagined song (Zatorre, Halpern, Perry, Meyer, & Evans, 1996), imagining the continuation of a familiar melody given its opening notes (Halpern & Zatorre, 1999) or during gaps in an ongoing melody (Kraemer, Macrae, Green, & Kelley, 2005), and comparing the similarity of the timbres of two imagined instruments (Halpern, Zatorre, Bouffard, & Johnson, 2004). Auditory cortical involvement is also indicated by MEG data showing an early preattentive response to unexpected incorrect continuations of imagined melodies (Herholz, Lappe, Knief, & Pantev, 2008). Finally, the role of auditory cortex was supported by lesion data showing deficits in musical imagery following excision of right temporal cortex (Zatorre & Halpern, 1993).

All of these prior musical imagery studies have used relatively straightforward tasks involving retrieval and/or comparison of auditory information. The manipulation of auditory information would presumably require additional processes beyond retrieval, such as maintaining and monitoring the information in working memory and performing computations associated with transforming the sounds. Such work has not yet been done in the auditory domain, but there is relevant research in vision. For example, some investigators (Kozhevnikov, Kosslyn, & Shepard, 2005) found that the ability to call up visual images holistically is behaviorally and psychometrically distinct from the ability to manipulate those images (artists excelling at the former and engineers at the latter). These two types of abilities also show different neural signatures. In particular, a great deal of research with visual stimuli shows distinct contributions from dorsolateral frontal and posterior parietal cortices for monitoring versus manipulation, respectively (for a recent and particularly clear demonstration of this dissociation, see Champod & Petrides, 2007).

To challenge musicians with a mental manipulation task, we devised a novel task that required mental time reversal of tunes. Listeners were presented with the first few notes of a familiar tune (Experiment 1) or the title of a familiar tune (Experiment 2) followed by another string of notes; that second string was either an exact reversal (retrograde, to use the musical term) of the target tune or a false reversal, in which several of the latter notes in the string were incorrect. The task was to reverse mentally the second string and to respond whether the reversed string matched the target item exactly or not. The task clearly has a large working memory requirement because six to eight notes of the reversal have to be retained, but more importantly for our purposes, it has a critical manipulation requirement because the tones in the second string have to be reversed mentally before a correct answer can be obtained. We assume that because the to-be-reversed strings were all derived from highly familiar tunes, the listener would need to compare the contents of working memory to the long-term memory trace.

The tune reversal task can be seen as an auditory analogue of visual mental rotation: Instead of rotation in space, the musical task requires reversal in time. Mental rotation is a well-established way of capturing the kind of active manipulation ability that we were seeking in the auditory domain. A number of neuroimaging studies have looked at two- and three-dimensional visual mental rotation tasks. Among the most commonly activated areas is the parietal lobe (often the intraparietal sulcus; IPS), most often on the right, but sometimes bilaterally (Zacks, 2008; Koshino, Carpenter, Keller, & Just, 2005; O'Boyle, 2005; Harris et al., 2000; Alivisatos & Petrides, 1997). These findings from the visual domain led to the hypothesis that, in contrast to previous auditory imagery studies, which did not require manipulation of a mental image, we would see parietal-lobe activation in the present task. Support for this hypothesis comes from a recent fMRI study that used reversed word strings and that asked listeners to reverse those strings mentally to identify the parent word (Rudner, Rönnberg, & Hugdahl, 2005). Activation in the word reversal task, minus a control of a rhyme judgment task, revealed involvement of posterior parietal lobes bilaterally, as well as right inferior frontal cortex.

Another prediction was that superior temporal regions would be recruited, if, as claimed in previous studies, auditory imagery depends critically on activity in auditory cortex. We also predicted activation of the SMA, as it has been a consistent finding in all our previous studies on auditory imagery. Although none of our previous studies has required or even allowed motor activity except for button presses, we have found the SMA (or in some cases pre-SMA as well) to be active in both pitch and timbre tasks and whether the queried tune has lyrics or not. This aspect of the motor network may support or at least reflect interactions with the parts of the auditory system involved in imagery, similar to interactions found between auditory perception and the motor system (for a review, see Zatorre, Chen, & Penhune, 2007). Whether it plays a specific role in mental transformation is unknown, however. Finally, due to the working memory component of the task, which is greater than that of previous, simpler tasks, we predicted that dorsolateral frontal regions would be recruited in keeping with the literature in other working memory domains, such as visual and verbal, in which monitoring and manipulation of stimulus information is important (Champod & Petrides, 2007; Petrides, 2005; Postle, Berger, & D'Esposito, 1999; Smith & Jonides, 1998).

METHODS

EXPERIMENT 1

Subjects

Volunteers were 12 (7 women) healthy right-handed young adults (mean age = 23 years), all of whom had had a minimum of 8 years of formal musical training (mean = 16 years). All subjects gave written informed consent to participate according to guidelines approved by the Montreal Neurological Institute (MNI) Research Ethics Board.

Questionnaires

Subjects were administered two auditory imagery questionnaires: The Bucknell Auditory Imagery Questionnaire (BAIS) has been used in several of our studies to assess vividness and ability to control auditory imagery of voices, music, and environmental sounds. Respondents rate their ability to imagine vividly sounds such as a children's choir or a cheering crowd at a baseball game and to change the sounds at will. Answers were given on a 1 to 7 scale, with higher ratings meaning more vivid or more easily controlled imagery. Items are averaged within a subscale. The Audiation questionnaire was devised for this study and asked for self-ratings on ability to carry out various musical tasks involving auditory imagery, such as comparing and contrasting two musical themes in one's head. Items were rated on a scale ranging from 1 = strongly agree to 6 = strongly disagree, with lower ratings meaning more self-rated success at the task. We will report an overall score and also scores on the two items that ask about ability to carry out musical retrogrades in one's head. Subjects were also asked if they used visual imagery strategies during the task at all.

Apparatus

Scanning was conducted with a Siemens Magnetom Trio 3-T MR scanner using an eight-channel high-resolution head coil array. Auditory stimuli were presented via a high-fidelity system designed for the MR environment (MR Confon, Magdeburg, Germany). The headphones contained electrodynamic transducers for a broad, flat frequency response and construction-grade Peltor earmuffs for passive damping of gradient noise. For each subject, the loudness was adjusted to a comfortable hearing level by playing a sample tune stimulus continuously. All responses involved a button press with the right hand from a scanner-compatible optical response pad.

Stimuli

We used the opening notes of 24 familiar tunes in the experiment. All tunes had been rated as being highly familiar in previous experiments and included children's tunes, Christmas carols, and movie themes. Half had lyrics associated with them (e.g., “White Christmas”) and half did not (e.g., theme from “Pink Panther”); but all stimuli used consisted only of tones. Pilot testing ensured that all tunes were highly identifiable from their first five to eight notes and also that all phrases could be reversed without sounding like another familiar tune. Tunes were synthesized in piano timbre using a MIDI interface and ranged in duration from 1.9 to 4.9 sec (mean = 3.4 sec).

For each tune, we created a reverse version, which was simply the notes in reverse temporal order, without changing the durations (rhythmic values). For each reverse tune, we also created a false reverse. A false reverse changed one note (or, for some trials, two notes) of each true reverse tune. The task was to judge whether the second string was a real or a false reverse of the first string. The changed notes were always in key and never the first one or two sounded notes (so that listeners would need to consider all or most of the pattern to make a true–false decision) and were always drawn from the same pitch range as the target melody.

Finally, we created a control item for each tune, which consisted of the notes of the false reverse strings permuted into a random order. These strings were not identifiably similar to the parent string but maintained the same pitch and time values. All four types of auditory stimuli were equalized for energy content.
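The three derived stimulus types can be illustrated with a minimal sketch. It assumes that a tune is represented as a list of (MIDI pitch, duration) pairs and that reversal and permutation operate on whole pairs; the function names, the example notes, and the replacement pitch are hypothetical and do not reproduce the procedure actually used to build the stimuli (which also required changed notes to be in key).

```python
import random

Note = tuple[int, float]  # (MIDI pitch, duration in sec); hypothetical representation


def true_reverse(tune: list[Note]) -> list[Note]:
    """Return the notes in reverse temporal order, each keeping its own duration."""
    return list(reversed(tune))


def false_reverse(tune: list[Note], position: int, new_pitch: int) -> list[Note]:
    """Return the true reverse with the pitch at `position` (0-based) replaced.

    The changed note may not be one of the first two sounded notes and must
    stay within the pitch range of the target melody, as described in the text.
    """
    if position < 2:
        raise ValueError("changed note may not be among the first two sounded notes")
    rev = true_reverse(tune)
    lowest = min(pitch for pitch, _ in tune)
    highest = max(pitch for pitch, _ in tune)
    if not lowest <= new_pitch <= highest:
        raise ValueError("replacement pitch must stay within the tune's pitch range")
    if new_pitch == rev[position][0]:
        raise ValueError("replacement pitch must differ from the original pitch")
    rev[position] = (new_pitch, rev[position][1])
    return rev


def control_item(false_rev: list[Note], seed: int = 0) -> list[Note]:
    """Return the notes of a false reverse permuted into a random order."""
    shuffled = list(false_rev)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical six-note excerpt (not one of the study's stimuli).
tune = [(60, 0.5), (62, 0.5), (64, 1.0), (65, 0.5), (67, 0.5), (72, 1.0)]
rev = true_reverse(tune)
false_rev = false_reverse(tune, position=3, new_pitch=62)
ctrl = control_item(false_rev)
```

Reversing the true reverse recovers the original excerpt, the false reverse differs from it in exactly one note, and the control item contains the same pitch and duration values as the false reverse in a different order.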

Conditions

The experiment consisted of four conditions: silence, forward, reverse, and control. The forward and the reverse conditions were always presented in a pair, in that order (see Figure 1); thus, each tune excerpt presented in one frame was paired with either its correct or incorrect reversal in the following frame. In all conditions except silence, the presentation of a stimulus occurred 3 sec following the start of the previous frame acquisition. The end of the stimulus was followed by a mean of approximately 5.6 sec of silence before the next acquisition, based on the mean stimulus duration of 3.4 sec (because the stimuli varied in length by a maximum of 3 sec, the range of silences following the end of each stimulus was 4.1–7.1 sec). The position of the stimulus within the long repetition time (TR) frame (Figure 1) was selected so as to maximize the likelihood that the scan would be sensitive to the mental reversal and to minimize the likelihood that the scan would be sensitive to the presence of the stimulus itself. This was desirable because we wanted to pick up activation associated with imagery and not driven by the physical auditory event per se.
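The silent intervals quoted above follow directly from the frame layout. The arithmetic can be sketched as follows, using only values given in the text (a 12-sec TR, stimulus onset 3 sec into the frame, and stimulus durations of 1.9 to 4.9 sec with a mean of 3.4 sec); the variable and function names are illustrative.

```python
TR = 12.0     # sec between the starts of successive acquisitions
ONSET = 3.0   # sec from the start of the previous acquisition to stimulus onset


def silence_before_next_scan(stimulus_duration: float) -> float:
    """Silent gap (sec) between stimulus offset and the next acquisition."""
    return TR - ONSET - stimulus_duration


mean_gap = silence_before_next_scan(3.4)      # mean stimulus duration
shortest_gap = silence_before_next_scan(4.9)  # longest stimulus
longest_gap = silence_before_next_scan(1.9)   # shortest stimulus
print(round(mean_gap, 1), round(shortest_gap, 1), round(longest_gap, 1))
# prints: 5.6 4.1 7.1
```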

Figure 1. 

Timeline of events in Experiment 1. The target, a real melody (in the example, “Greensleeves”), was sounded in the first frame, followed always in the second frame by a comparison that was either a true or an incorrect temporal reversal of that melody (the item illustrated is a correct reversal). The mental reversal, depicted by the “cloud,” occurs sometime between the end of the reversed melody presentation and the scan. Acquisitions were spaced (TR = 12 sec) so as to capture neural activity associated with processing the cue and doing mental transformations, but not activity associated with simply hearing the melody. The histogram below depicts the distribution of response times to the task, indicating that responses occurred in the predicted time frame, and the dotted line illustrates the presumed hemodynamic function associated with performing the mental reversal task, based on the modal value of the response time distribution.

In the forward condition, the subject's task was to listen to and retain the familiar tune excerpt until the next stimulus was presented in the following frame, the reverse condition, at which point he or she was asked to reverse this second string of tones mentally so as to recover the original familiar melody and then respond whether it was the correct or incorrect reversal. Each of the 24 familiar tunes in the forward/reverse condition pairs was presented in two separate blocks, once paired with the correct reversal and once with an incorrect reversal, randomly ordered, for a total of 96 trials. The control condition, in which the subject was simply instructed to listen, consisted of three blocks of eight stimuli each for a total of 24 trials. In the full run, the forward/reverse (FR) and control (C) blocks were ordered in the following way: C, FR, C, FR, and C. The order of stimuli within each block was randomized. Also, 12 silence trials were randomly interspersed among the five stimulus blocks. A short verbal instruction was presented to indicate the start of each block. Subjects were instructed to keep their eyes closed during the whole experiment. The complete scanning session for each subject consisted of two runs of 138 trials each. Audio examples of the stimuli and the task are available at www.zlab.mcgill.ca.

Before the MR scanning session, listeners were familiarized with the experimental tasks in three ways. First, they were tested in the laboratory for purposes of screening several days ahead of their fMRI scanning session on a forward/reverse task consisting of 48 pair presentations similar to the ones presented in the scanner, including similar temporal parameters. Only subjects showing performance above 80% were retained for the experiment. (Eighteen potential subjects were initially tested, of whom four performed below this cutoff and hence were not invited to participate; of the remainder, one was excluded because he was left-handed, and another was not available for scanning.) Second, immediately before the scanning session, volunteers were presented with an abbreviated version of the laboratory test consisting of 10 pair presentations, thus ensuring that the task and the stimuli were fresh in their minds. Third, subjects were informed of the ordering of the task blocks before the experiment.

fMRI Scanning Protocols and Analysis

First, a high-resolution anatomical scan was obtained for each subject (voxel size = 1 × 1 × 1 mm). Second, two series of echo-planar images of BOLD signal were acquired, each in 40 slices using a 64 × 64 × 40 matrix (voxel size = 3.5 × 3.5 × 3.5 mm) aligned with the Sylvian fissure and covering the whole brain. BOLD signal images were smoothed using a 12-mm Gaussian kernel, corrected for motion within runs, and transformed into standard stereotaxic space using in-house software (Collins, Neelin, Peters, & Evans, 1994). Clustered volume acquisitions and sparse sampling were used, such that each volume (40 slices) was acquired in 2.4 sec, and the TR was set at 12 sec. The long interacquisition time ensured low signal contamination by noise artifacts related to image acquisition (Belin, Zatorre, Hoge, Evans, & Pike, 1999; Hall et al., 1999). We positioned the auditory stimuli within the frame at 3 sec, which we reasoned would be approximately at the optimal position to capture the mental reversal but far enough away from the onset of scanning (9 sec later) so as to avoid any contamination of an imagery response with a hemodynamic signal elicited by the presence of the acoustic input itself (Figure 1). To verify this latter assumption, several additional trials were included for three subjects where the control stimulus was presented at intervals varying from 3.5 to 9 sec before the onset of scanning, which allowed us to reconstruct the hemodynamic function for auditory activation.

Image analyses were performed with fMRISTAT, which involves a series of MATLAB scripts that utilize the general linear model for analyses (Worsley et al., 2002). The general linear model Y = Xβ + ɛ expresses the response variable (BOLD signal) Y in terms of a linear combination of the explanatory variable (condition) X, the parameter estimates (effects of interest) β, and the error term ɛ. Temporal drift is modeled as cubic splines and then removed because it can be confounded with the BOLD response. The first MATLAB script fmridesign created the design matrix within each run, where each column contained the explanatory variables and each row represented a scan. The program fmrilm then fitted the linear model with the fMRI time series and solved for the parameter estimates β with a least-squares solution, yielding estimates of effects, standard errors, and t statistics for each contrast and for each run. An effect of interest was specified by a vector of contrast weights that gave a weighted sum of parameter estimates referred to as a contrast. We entered our planned comparisons into the analyses, for example, the reverse versus forward or the reverse versus silence contrasts. The third MATLAB program multistat combined runs together within subjects. Before group statistical maps for each contrast of interest were generated, in-house software was used to linearly transform anatomical and functional images from each subject into standard MNI/Talairach stereotaxic coordinate space, using the MNI 305 template (Collins et al., 1994; Talairach & Tournoux, 1988). A mixed-effects linear model was subsequently specified in multistat to average data across subjects; the data were smoothed with a Gaussian filter so that the ratio of the random-effects variance divided by the fixed-effects variance results in approximately 100 degrees of freedom.
The threshold for significance was set at t = 4.55 (p < .05) for a whole-brain search using a discrete local maxima approach (Worsley, 2005). Values below this threshold are reported in cases where there is evidence for bilateral activation (i.e., we report subthreshold activity when a homologous region in the opposite hemisphere reaches significance) or to test a priori hypotheses concerning the auditory cortices. In such cases, we report values of t > 3.00, corresponding to an uncorrected p < .0017. Finally, the skull and other nonbrain tissue were masked out of the anatomical MRI scans, and this same mask was applied to the functional data for display purposes.
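The estimation step can be illustrated with a generic ordinary least-squares sketch in NumPy. This is not the fMRISTAT code (which is written in MATLAB and also handles temporal drift); the toy design matrix, the simulated single-voxel data, the hypothetical effect sizes, and the function name `glm_contrast` are assumptions made only to show how a vector of contrast weights yields an effect, a standard error, and a t statistic.

```python
import numpy as np


def glm_contrast(Y, X, c):
    """Fit Y = X @ beta + error by least squares and test the contrast c @ beta.

    Returns (effect, standard_error, t_statistic, degrees_of_freedom).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    c = np.asarray(c, dtype=float)
    beta, _, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ beta
    df = X.shape[0] - rank
    sigma2 = residuals @ residuals / df            # residual variance estimate
    effect = c @ beta                              # weighted sum of parameter estimates
    se = np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))
    return effect, se, effect / se, df


# Toy design for one voxel: 40 scans (rows) and one indicator column per
# condition (forward, reverse, control), with no separate intercept.
rng = np.random.default_rng(0)
labels = np.tile([0, 1, 2, 1], 10)                 # condition index of each scan
X = np.eye(3)[labels]
true_beta = np.array([1.0, 2.0, 0.5])              # hypothetical effects
Y = X @ true_beta + rng.normal(scale=0.1, size=labels.size)

# Contrast weights for "reverse minus control".
effect, se, t, df = glm_contrast(Y, X, c=[0.0, 1.0, -1.0])
```

With these simulated values the estimated effect lies close to the true difference of 1.5, and the t statistic is the effect divided by its standard error on 37 degrees of freedom.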

Behavioral Control Task

To collect additional behavioral evidence in favor of the hypothesis that mental reversal was occurring, we implemented a control task conducted outside of the scanner, which was run on seven additional listeners who met the same criteria as in the fMRI study. We reasoned that if people truly are mentally reversing, then when an incorrect note is found relatively early, they should respond more quickly than when an incorrect note is found toward the end. That is, if the notes of the target melody and its reversal correspond to 1,2,3,4,5,6,7:7,6,5,4,3,2,1, then responses should be faster if the probe note is at Position 2 than at Position 5 of the reversal (second string) because Note 2 would be encountered first if one starts at the end and scans backward toward the beginning. Trials with no incorrect notes should take longest of all, by the same logic. To this end, we selected the 19 most familiar tunes from the original set and created new versions, in which only a single incorrect note was present. This incorrect note could either occur in Position 2 or 3 or in Position 5 or 6. This yielded three trial types, early, late, or correct reversals. All of these were presented in a random order to listeners using parameters similar to those used in the screening task described earlier, except that the time between forward and reverse stimuli was set to 10 sec. Thus, on each trial, subjects heard the target melody and simultaneously saw its title, followed by the reversed stimulus. Subjects were instructed to mentally reverse the second item to determine if it was exactly correct or contained one wrong note and to respond as soon as they detected the incorrect note.
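The ordinal prediction can be made concrete with a toy serial-scan model. The sketch assumes that positions are numbered as in the target melody (so Note 1 is the last note of the heard reversal) and that the listener compares one note per step, starting from the end of the second string; both the model and the function names are illustrative simplifications, not a claim about the actual cognitive process.

```python
def notes_checked(target, heard_reversal):
    """Scan the heard string from its last note backward, comparing each note
    with the target melody in forward order, and stop at the first mismatch.

    Returns (number_of_notes_checked, is_exact_reversal).
    """
    for step, (expected, heard) in enumerate(
        zip(target, reversed(heard_reversal)), start=1
    ):
        if expected != heard:
            return step, False
    return len(target), True


target = [1, 2, 3, 4, 5, 6, 7]
exact = target[::-1]                     # 7, 6, 5, 4, 3, 2, 1


def with_error(position):
    """Exact reversal with the note at target position `position` (1-based)
    replaced by a wrong value (0)."""
    out = list(exact)
    out[len(out) - position] = 0
    return out


early = notes_checked(target, with_error(2))   # (2, False)
late = notes_checked(target, with_error(5))    # (5, False)
correct = notes_checked(target, exact)         # (7, True)
```

Under this model an early error is detected after fewer comparisons than a late error, and a correct reversal requires checking every note, which is the ordering of response times the task was designed to test.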

Results

Behavioral data

In keeping with their pre-MRI screening test performance, all subjects performed the task very well during scanning (Figure 2) for both Runs 1 and 2, with values ranging from 71% to 100% correct on any given run. Analysis of response times (measured from stimulus onset) across the different trial types showed that subjects took slightly but significantly (p < .05) more time to respond to the true reverses (mean across runs = 4.83 sec) than to the false reverses (mean = 4.60 sec), presumably because participants were truly reversing the strings and could make a negative decision for false items as soon as the first incorrect note was detected (see additional evidence from control task). In addition, response times for true and false reversals across items (measured from stimulus offset to account for different durations of target items) were significantly correlated (r = .49, p < .01), suggesting that both true and false reversals were processed similarly for each tune (it was thus unlikely that false reversal items were easy to dismiss without mentally reversing them because of salient incorrect notes). Finally, the response times measured on-line set an upper limit on when the mental events of interest were occurring; the distribution of these response times across all stimuli during scanning (shown in the histogram in Figure 1) brackets an interval which indicates that the mental reversal was occurring within the appropriate time frame with respect to the scan acquisition, which happened, on average, 4.7 sec after the response. Thus, the cognitive event of interest was likely well sampled by the TR parameters selected, if one assumes a typical hemodynamic lag on the order of 5 sec, which is roughly what we obtained in the control data shown in Figure 4 (see below).

Figure 2. 

Performance on the behavioral tasks in Experiments 1 and 2, collected in the laboratory, and during fMRI scanning (presented separately for each of the two runs).

Behavioral control task

Average percent correct for this task was 84.8%. For calculation of response times, we excluded values from any trials that a given listener did not get correct or from any tunes they judged to be unfamiliar. The mean response times for Positions 2/3 and 5/6 and the correct reversals were 3.79, 4.81, and 4.88 sec, respectively. An ANOVA showed a significant effect, F(2, 18) = 4.45, p = .02, indicating that the response times to Position 2/3 were significantly faster than those to either the Position 5/6 or the correct reversals, as predicted. The response times were also in keeping with the times measured in the main study.

Questionnaire data

On the BAIS, the participants indicated that their auditory imagery was typically quite vivid (M = 5.48, SD = 0.77) on a 7-point scale and that they could change it at will (M = 5.56, SD = 0.83). These two measures correlated positively, r(10) = .74, p < .05. On the audiation questionnaire, the mean over all items was 1.91 (SD = 0.44) on a 6-point scale, indicating moderate agreement that the respondents could carry out the musical imagery tasks described. Two items most similar to the current task asked for how well the person could mentally reverse short familiar tunes (M = 2.33, SD = 0.85) and unfamiliar tunes (M = 2.50, SD = 0.76), indicating, on average, between moderate and weak endorsement of this ability.

Five of the 12 musicians indicated that they had used visual imagery strategies on at least some trials. Performance for this subgroup did not differ significantly from the other seven individuals who reported using exclusively auditory imagery approaches. These two subgroups did not differ on their scores on the vividness or the control sections of the BAIS either.

Imaging data

To determine the pattern of hemodynamic responses occurring during mental reversal of melodies, which was the principal question of interest, we contrasted the reverse trials with the control trials, in which no imagery or reversal would be expected to occur. A pattern of activations emerged that included prominent activation of posterior parietal cortex, together with dorsolateral frontal and ventrolateral frontal areas, and cingulate cortex (Figure 3A, Table 1). Closer inspection of this contrast revealed that the parietal and the frontal areas were bilaterally recruited, but more strongly on the left for the parietal and more strongly on the right for the dorsolateral frontal areas. The pattern was similar overall when contrasting the reverse–forward conditions, although the left IPS peak of activity was somewhat more anterior (Figure 3A, Table 2). In both of these contrasts, analysis of BOLD signal decreases revealed widespread deactivation (data not shown) in medial anterior and posterior regions, as often found in active versus passive tasks (Gusnard & Raichle, 2001), as well as within visual cortical regions, also a common finding in studies containing auditory stimulation, either real (Johnson & Zatorre, 2005) or imagined (Halpern et al., 2004).

Figure 3. 

(A) Main imaging results from Experiment 1. Each vertical panel shows horizontal, sagittal, and coronal views, respectively, of the averaged functional data superimposed on the averaged anatomical MRI; in this and subsequent figures, the right side of the brain is depicted on the right side of the figure in horizontal and coronal sections; in sagittal sections, the left-hemisphere images face left and vice versa. Contrasting the reverse condition with the control condition (first two vertical panels) resulted in activation within posterior parietal cortex and several frontal areas; similar areas were recruited in the reverse–forward conditions (third panel). The bar graph depicts BOLD percent signal change relative to silence extracted from the parietal foci (red circles) and shows significant activation in both forward and reverse conditions, relative to control, but stronger in the reverse condition, with a relative left-hemisphere bias. (B) Main imaging results from Experiment 2, from the reverse − control subtraction. The parietal activation is depicted in the top left image, showing a horizontal section through the maximal activity peak; the bar graph depicts percent BOLD signal change relative to silence. Parietal activation is highest during the reverse condition, as in Experiment 1. The images on the right show the significant activation of cortex within the right STS; this activation is most prominent in the reverse condition, as seen in the bar graph.

Table 1. 

Experiment 1: Reverse − Control

x y z t Area
−48 −40 56 7.94 L IPS 
−36 −28 68 5.83 L IPS 
46 −48 56 4.02 R IPS 
−30 16 12 5.38 L dorsolateral frontal 
−42 26 34 4.90 L dorsolateral frontal 
34 18 6.84 R ventrolateral frontal 
50 20 −6 5.69 R ventrolateral frontal 
52 26 30 6.24 R dorsolateral frontal 
52 10 42 5.68 R dorsolateral frontal 
26 44 −4 4.84 R orbito-frontal 
24 38 7.91 Ant cingulate 
−10 20 28 5.37 Ant cingulate 
−8 42 6.44 Ant cingulate 
−60 −24 26 5.73 L postcentral gyrus 
66 −32 30 4.68 R postcentral gyrus 
−16 22 5.10 L caudate 
10 10 14 6.42 R caudate 
−34 −66 −26 4.61 L cerebellum 
30 −42 −30 5.23 R cerebellum 
−16 −18 4.55 L thalamus 
Table 2. 

Experiment 1: Reverse − Forward

x y z t Area
−48 −28 54 6.81 L IPS 
48 14 46 6.95 R dorsolateral frontal 
48 18 −4 6.85 R ventrolateral frontal 
38 58 −12 7.84 R orbito-frontal 
34 58 20 6.17 R frontal pole 
36 40 38 5.48 R frontal pole 
42 12 6.62 R frontal operculum 
30 36 8.17 Ant cingulate 
−6 −4 42 5.04 Ant cingulate 
−22 32 4.94 Postcingulate 
62 −34 34 6.82 R supramarginal g 
−38 −18 6.50 L insula 
38 −12 −6 6.31 R insula 
−50 −26 20 5.39 L parietal operculum 
−10 −82 −24 4.87 L cerebellum 
12 −66 36 4.60 R precuneus 

Focusing specifically on the strong BOLD signal increases observed within parietal regions during mental reversal, we extracted the BOLD signal values for both left and right activation sites (using target voxels derived from the reverse–control contrast, as it yielded the largest response magnitude) and computed the percent signal change with respect to the silent baseline condition. ANOVA on these values (shown in the bar graph of Figure 3A) confirmed that parietal activation was stronger in the reverse than in the control or the forward conditions (averaged across the two sides), t(11) = 3.56, p < .005, although it was significantly present in both forward and reverse conditions. This analysis also confirmed a significant asymmetry in favor of the left hemisphere, F(1, 11) = 5.97, p < .05; there was no interaction of hemisphere with condition, however, indicating that the asymmetry was not specifically associated with the reversal process per se.
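As a rough illustration of the percent signal change measure referred to above, the following Python sketch computes the change of a condition relative to a silent baseline. The function name and all signal values are hypothetical and chosen only for illustration; the actual extraction and averaging procedure used in the study may have differed.

```python
from statistics import mean


def percent_signal_change(condition_values, baseline_values):
    """Percent change of the mean condition signal relative to the mean baseline."""
    baseline = mean(baseline_values)
    return 100.0 * (mean(condition_values) - baseline) / baseline


# Hypothetical raw signal values (arbitrary scanner units) at one parietal focus.
silence = [1000.0, 1002.0, 998.0]   # silent baseline frames
forward = [1005.0, 1006.0, 1004.0]  # forward-condition frames
reverse = [1012.0, 1010.0, 1008.0]  # reverse-condition frames

psc_forward = percent_signal_change(forward, silence)
psc_reverse = percent_signal_change(reverse, silence)
print(f"forward: {psc_forward:.2f}%  reverse: {psc_reverse:.2f}%")
```

In this toy example both conditions lie above the silent baseline, with the larger change in the reverse condition, which is the qualitative pattern described for the parietal foci.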

We also carried out a voxelwise analysis contrasting the subgroup who indicated that they had used visual strategies versus those who did not. No significant differences were found between the groups in either the forward or the reverse conditions. Comparing the trials in which the songs had lyrics associated with them versus those which did not yielded several regions of BOLD increase. In the forward condition, additional activity was observed bilaterally in posterior auditory cortices. In the reverse condition, this was also true (right: 50, −42, 16; t = 5.41; left: −58, −36, 14; t = 4.36); but in this case, there were also peaks of activation in the anterior cingulate, the left postcentral gyrus, and perhaps of greatest relevance given the verbal processing likely elicited by the melodies with verbal association, in the left inferior frontal gyrus close to what may be considered Broca's area (−56, −2, 16; t = 4.89). The only areas showing more BOLD signal in the trials for melodies without associated lyrics were located in the medial frontal cortex. No differences within the parietal cortex were seen according to the verbal content of the melodies.

Despite predictions that auditory cortex activity during mental reversal would be found, no significant BOLD changes in auditory areas were detected in either forward or reverse conditions when contrasted either with the control condition or with the silence condition. The timing of the stimuli within the frame was optimized, as explained earlier, to capture the mental reversal but to avoid any signal coming from the stimulus itself. The latter assumption was verified by reconstructing the hemodynamic response from the two foci of greatest activity located in left and right auditory cortex in data from three individuals (Figure 4), which shows, as expected based on prior data (Belin et al., 1999), that a signal maximum is reached when the stimulus is presented 4–5 sec before scan, but that no signal is detected when the stimulus is presented 9 sec earlier. This observation validates our assumptions and also demonstrates that the experimental design had sufficient power to detect activity within auditory cortex, even with the reduced sample size of 3. However, it is still possible that a response within auditory cortex related to imagery may have occurred, but that we failed to observe it. We hypothesized that this might have occurred if an imagery-related response peaked earlier than expected and our scan acquisition timing was too late; Experiment 2 was designed in part to test for this possibility by using a shorter TR value.

Figure 4. 

Hemodynamic response in two foci within right and left auditory cortex to the sounded target positioned at different time points within the TR frame, relative to a silent control condition. Note that the signal is essentially at zero when measured at the 9-sec poststimulus onset.

We also considered whether the lack of activation in auditory areas was related to individual differences in imagery vividness, such that auditory responses may have only occurred in people with strong imagery. To test this possibility, we conducted a voxelwise analysis of BOLD signal as a function of total BAIS score in the forward–control and reverse–control contrasts, specifically focused on auditory cortices. This analysis did yield an area of correlated BOLD signal within right auditory cortex (x = 64, y = −30, z = 18; t = 3.42; r = .71, p = .01) in the forward–control contrast; this focus lies within the planum temporale according to the probability map of this structure (Westbury, Zatorre, & Evans, 1999). A similar analysis in the reverse–control contrast did not yield any correlations within auditory regions (bilateral responses were observed in regions that were just dorsal to auditory cortex). However, this analysis on the data from the reverse condition did yield one result of interest, which is that the BAIS score correlated positively with BOLD signal within the right IPS (x = 42, y = −36, z = 50; t = 3.29; r = .72, p < .005).

EXPERIMENT 2

Experiment 1 yielded strong indications that a set of regions including the IPS, along with frontal lobe areas, were involved in mental reversal of tunes. Several questions remained open, however, and Experiment 2 was designed to address these points. First, we wanted to determine the robustness of the effects by simply replicating the finding. Second, as explained earlier, we were concerned that auditory cortex activity during either the forward or the reverse phase may have been missed because of the long TR, which had been implemented so that any imagery-related auditory cortex response would not be contaminated by a stimulus-driven auditory cortex response. Whereas the 12-sec frame duration was evidently successful in avoiding such contamination, it may also have been so long as to lose any imagery-related effect (Figure 4). Hence, in Experiment 2, we opted for a 10-sec TR. To avoid any stimulus-driven (as opposed to imagery-driven) activity within auditory cortex, in the forward condition of Experiment 2, we used a visual cue to the target melody rather than the auditory melody itself; thus, this condition now contained no auditory stimulation. In addition, we jittered the presentation of the stimulus during the reversal condition to better capture any potential auditory cortical response. Finally, another reason for presenting only the title in the first frame was to discourage the possible strategy of mentally reversing the forward tune during Frame 1, which may have occurred in Experiment 1. With only a title at presentation, it would be difficult to remember where a given tune excerpt “stops” so as to begin a reversal in anticipation of matching with the second note string.

Subjects

A subset of seven of the subjects (four women) from Experiment 1 took part in the second experiment (mean age = 23 years, with a mean of 15 years of formal education). This experiment took place several months after Experiment 1 was completed.

Apparatus

The scanning equipment was identical to that used in Experiment 1, with the addition of a screen for visual stimulus presentation placed at the rear of the scanner bore and viewed via a mirror.

Stimuli

The auditory stimuli were identical to those used in Experiment 1. The visual stimuli consisted of the titles of the familiar tune excerpts, presented in the center of the screen in the forward condition (e.g., “Greensleeves” or “Pink Panther”), and a centered solid dot in the control and reverse conditions.

Conditions

The experiment consisted of four main conditions: forward, reverse, control, and silence, as before. Each condition consisted of presenting a stimulus that was visual, auditory, or both, presented at a time interval varying between 2.5 and 3.5 sec following the start of the previous frame acquisition. When an auditory stimulus was presented, the end of the stimulus was followed by a mean 3.6 sec of silence before the next acquisition, as in Experiment 1. When a visual stimulus was presented, it lasted until the start of the next acquisition, at which point the screen was blanked until the onset of the next visual stimulus.

Each tune title presented in one frame was again paired with either its correct or incorrect reversal in the following frame, as in Experiment 1. However, in the forward condition, the subject's task consisted of reading the title and retrieving the familiar tune excerpt that it indicated (no auditory stimulus was used here until the reverse stimulus was presented in the following acquisition; Figure 5). In the reverse condition, the subject was to respond whether the stimulus was the correct or the incorrect tune reversal, as before, but a solid dot was also presented at the same time as the auditory stimulus to control for visual input. The block arrangement was identical to that of Experiment 1. The only modification was that another 12 silence trials were added: Half the silence trials were presented with the visual text “Silent Trial” on the screen, and the other half with a solid dot; the trials with the text were meant to control for the presence of the melody titles included in the forward condition, and the trials with the dot were meant to control for the visual input provided during the reverse condition. The scanning session for each subject consisted of two runs of 150 trials.

Figure 5. 

Timeline of events in Experiment 2. Conditions were identical to Experiment 1, except that the target was the written title of the song and the TR was shortened to 10 sec.

fMRI Scanning Protocols and Analysis

The acquisition protocols were identical to those in Experiment 1, except that the TR used was 10 sec. In Experiment 2, we used nonlinear registration to improve the alignment of the volumes, and hence the power of the analyses, given that there were fewer subjects than in Experiment 1. Nonlinear registration was carried out by using the brain model corresponding to the ICBM152 (Mazziotta, 2001) as a target and each individual T1-weighted anatomical MRI native image volume as a source. The procedure was done in two optimization steps, the first accounting for the linear component of the transformation function and the second for the nonlinear component. The first step was accomplished by linearly combining nine parameters, three each for rotation, translation, and scale, to find a transformation matrix as described in Collins et al. (1994). In the second step, to account for nonlinear morphological differences between the target and the source volumes not accounted for by the linear transformation, deformations (in the x, y, and z directions) were applied at each voxel of the target volume to optimize local alignment with the source volume. The optimizations were carried out on progressively less blurred data, with each fit initialized by the results of the previously computed deformation fields. In this experiment, the initial Gaussian blurring kernel resolution was 8 mm, followed by 4 mm. Therefore, the full nonlinear transformation for a native T1-weighted MRI volume was defined by the linear transformation matrix coupled with a volume storing the voxelwise (4 × 4 × 4 mm³) deformation fields. Other aspects of the statistical analysis and image processing were carried out as in Experiment 1.
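To make the linear step more concrete, the following Python sketch shows one way in which nine parameters (three translations, three rotations, three scales) can be combined into a single 4 × 4 homogeneous transformation matrix. The function names, the Euler-angle convention, and the order of composition (scale, then rotation, then translation) are assumptions made for illustration only; they are not taken from Collins et al. (1994) or from the registration software used in the study, and the optimization that estimates the parameters is not shown.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def nine_param_matrix(translation, rotation_deg, scale):
    """Build a 4x4 matrix from 3 translations, 3 rotations (degrees), 3 scales.

    Assumed composition order: scale first, then rotation about x, y, z,
    then translation.
    """
    tx, ty, tz = translation
    rx, ry, rz = (math.radians(angle) for angle in rotation_deg)
    sx, sy, sz = scale
    t = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]
    s = [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]
    rot_x = [[1, 0, 0, 0],
             [0, math.cos(rx), -math.sin(rx), 0],
             [0, math.sin(rx), math.cos(rx), 0],
             [0, 0, 0, 1]]
    rot_y = [[math.cos(ry), 0, math.sin(ry), 0],
             [0, 1, 0, 0],
             [-math.sin(ry), 0, math.cos(ry), 0],
             [0, 0, 0, 1]]
    rot_z = [[math.cos(rz), -math.sin(rz), 0, 0],
             [math.sin(rz), math.cos(rz), 0, 0],
             [0, 0, 1, 0],
             [0, 0, 0, 1]]
    rotation = matmul(rot_z, matmul(rot_y, rot_x))
    return matmul(t, matmul(rotation, s))


def apply_transform(matrix, point):
    """Apply a 4x4 homogeneous transform to an (x, y, z) point."""
    vec = (point[0], point[1], point[2], 1.0)
    return tuple(sum(matrix[i][k] * vec[k] for k in range(4)) for i in range(3))


# Hypothetical parameters: 10 mm shift in x, 90 degree rotation about z,
# and a doubling of scale along x.
m = nine_param_matrix((10.0, 0.0, 0.0), (0.0, 0.0, 90.0), (2.0, 1.0, 1.0))
moved = apply_transform(m, (1.0, 0.0, 0.0))
```

With these illustrative parameters, the point (1, 0, 0) is first scaled to (2, 0, 0), then rotated to approximately (0, 2, 0), and finally shifted to approximately (10, 2, 0). The nonlinear step would then add a voxelwise deformation on top of this global transform.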

Results

Behavioral data. The behavioral data collected during scanning in Experiment 2 indicated high levels of performance (Figure 2). Although accuracy was slightly lower than in Experiment 1, perhaps due to the fact that only the title of the target melody was presented in the first frame rather than the melody itself as in Experiment 1, comparison of the data from the seven subjects who participated in both experiments indicated no significant difference in either percent correct or response times.

Imaging data. In this experiment, unlike Experiment 1, we cannot search for imagery-related activation by comparing forward to control conditions because the control condition contained an auditory stimulus whereas the forward condition did not. Therefore, we examined the forward–silence contrast for this purpose because both these conditions involve no auditory stimulation. The results yielded considerable activation within dorsolateral frontal regions, primarily on the left side, as well as ventrolateral frontal areas bilaterally, SMA, and anterior cingulate (data not shown). The left IPS region was also significantly active. A directed search for auditory cortex response yielded one site (x = −54, y = −38, z = 22; t = 4.51) in the posterior reach of the left planum temporale according to the probability map of this structure (Westbury et al., 1999).

To examine processing associated with mental reversal, we computed the same contrast as in Experiment 1, reverse–control, which is most appropriate here because both these conditions contained matched levels of auditory and visual stimuli. The results were largely comparable to what was observed in Experiment 1 (Table 3), with high levels of activation in IPS bilaterally, in ventrolateral frontal areas bilaterally, and in dorsolateral frontal areas primarily on the right side. ANOVA on the extracted BOLD signal relative to silence within the parietal region (Figure 3B) yielded a pattern similar to that of Experiment 1. In particular, there was significant recruitment of the intraparietal region during both forward and reverse conditions, but the activity was significantly greater in reverse than in forward (p = .02); also as in Experiment 1, there was no significant side by condition interaction, indicating again a general tendency for greater response on the left, not linked to condition specifically. Finally, we also observed activity via directed search in auditory-related areas, within the right STS (Figure 3B: x = 48, y = −28, z = −4; t = 4.12) in this contrast. Correlational analyses using the BAIS scores were also performed on the data of Experiment 2, but no significant effects were observed, likely due to the smaller sample size and concomitantly reduced range of behavioral scores.

Table 3. 

Experiment 2: Reverse − Control

x y z t Area
−42 −50 60 6.51 L IPS 
−40 −36 32 5.64 L IPS 
−28 −54 46 5.26 L IPS 
36 −38 38 7.14 R IPS 
44 −54 54 5.87 R IPS 
−28 18 8.42 L ventrolateral frontal 
34 20 8.73 R ventrolateral frontal 
44 38 18 6.74 R dorsolateral frontal 
50 10 20 6.69 R dorsolateral frontal 
44 44 −10 5.05 R orbito-frontal 
−44 30 6.73 L premotor 
−36 −6 56 5.63 L dorsal premotor 
42 38 5.18 R premotor 
28 62 5.99 R dorsal premotor 
24 42 9.92 Ant cingulate 
−6 56 7.28 Pre-SMA 
−36 −50 −34 5.43 L cerebellum 
24 −44 −30 4.72 R cerebellum 
−14 −14 8.12 L thalamus 
16 −10 10 7.12 R thalamus 
12 −70 44 5.15 R precuneus 

DISCUSSION

The two experiments taken together give a first indication of the role played by several cortical regions in mental manipulation of imagined music. The behavioral data from both the fMRI and the control task lend validity to the mental transformation process that was the main focus of the research. The reverse condition, in which we assumed mental transformation would be required, consistently yielded high levels of hemodynamic response within the IPS, together with ventrolateral and dorsolateral frontal cortices. Some evidence for right auditory cortical involvement was also obtained in both experiments, and pre-SMA activity was also seen in Experiment 2, consistent with most prior auditory imaging studies. We note that most results were replicated from Experiment 1 to Experiment 2, despite a change in behavioral paradigm and differences in some imaging parameters, indicating that the findings are quite robust. We discuss these findings in the light of previous studies of auditory imagery and more broadly in terms of mental manipulation tasks in other modalities.

First, we note that our choice of behavioral task seemed to fulfill several important requirements. Musicians were able to perform at high levels overall but did not find the task trivial, as indicated by the fact that performance was not at 100%. More importantly, the latency data from both the main experiment and especially the control behavioral task are consistent with a mental transformation taking place. This validation is important because it could be argued that the task could be accomplished in some other way, not requiring mental reversal (e.g., noticing that a tone contained in the reversal was not present in the target melody). The use of such a strategy cannot be entirely excluded, but the control task demonstrated significantly shorter response times when the probe note was inserted toward the end of the reversal than toward the beginning (i.e., it would be encountered first only if the listener were truly reversing, starting from the end and going back toward the beginning). These chronometric data constitute important ancillary evidence that the mental event we intended to study was actually taking place, a point which has been raised in prior behavioral studies of musical imagery (Halpern, 1988); similar arguments have been made for chronometric data in the visual domain, which help to validate the reality of the mental transformation (Finlay, Motes, & Kozhevnikov, 2007; Zacks, Vettel, & Michelon, 2003; Shepard & Metzler, 1971; for a review, see Kosslyn, Thompson, & Ganis, 2006). We also note that the overall self-assessment of auditory imagery as elicited on the BAIS appears to reflect some real individual differences in neural processing, given the correlations in Experiment 1 of the BAIS score with BOLD signal in both the forward (right PT) and the reverse (right IPS) conditions. The uniform task performance precluded our testing whether this extra recruitment might correlate with better performance, given a more difficult task.

Many prior imaging and lesion studies involving manipulation of visual information have identified a network of frontal and posterior parietal areas similar to that observed in the present study. The emerging view is that there are functional dissociations between these regions, such that dorsolateral and ventrolateral areas are most important for working memory components of the task, whereas regions within the IPS are most important for the manipulation component of the task (Champod & Petrides, 2007). It has been proposed that experimental tasks requiring retrieval and monitoring of information in working memory will recruit ventrolateral and dorsolateral cortices, respectively (Petrides, 2005). The tasks used here would certainly require both these cognitive components because the stimulus information would need to be encoded and maintained in working memory during both the presentation phase (forward trials) and the manipulation phase (reverse trials). The interfering noise provided by the fMRI acquisition occurring in between the two stimuli would likely add to the working memory demands. However, we would expect a greater role for both retrieval and monitoring in the reverse condition because the reordering of tones from last to first would presumably require constant monitoring of the contents of working memory as well as retrieval of the correct version of the melody for comparison. In line with this expectation, both ventral and dorsal frontal regions in both experiments were more active when comparing reverse to forward conditions.

It is notable that involvement of posterior parietal cortex has not been reported in any of the musical imagery studies described in the introduction. The recruitment of the parietal region does not, therefore, appear to be a general characteristic of musical imagery per se but is more likely linked to the active manipulation of information. Although we did find significant parietal activity in both experiments within the forward condition, in both cases the parietal recruitment was greater during the reverse phase of the task. It is possible that the parietal response observed during the forward phase was related to listeners' mental preparation for the reverse condition, which, given their pretraining, they knew followed each forward item presentation.

The argument that parietal mechanisms are important for manipulation of imagined information, as opposed to simply representing such information, has been made most clearly in the visual cognition literature. Using visual mental rotation tasks, numerous studies have demonstrated that mental rotation consistently elicits activity within the posterior parietal cortex, typically within the IPS (for meta-analysis of this literature, see Zacks, 2008). The mean location of this response is typically more posterior than that found in the present study, raising the possibility of some segregation of function in this region; however, even within the visual domain there is considerable variation in the location within the IPS that is recruited for mental rotation, so that there is overlap between the location of the present findings and those reported in visual studies. This response within the IPS has been tied directly to models of parietal neuronal function based on neuroanatomical and neurophysiological findings, which posit that neurons within this region compute transforms across coordinate frames. For example, a task such as viewing and reaching for an object requires computation of several visuomotor transformations (Culham & Valyear, 2006; Grefkes & Fink, 2005). In the context of visual cognition, similar transformations of coordinate frames are thought to occur when mental rotation of a visual image is required, as mentioned earlier. Further evidence for a link between visual mental rotation and auditory transformations comes from behavioral findings: Cupchik, Phillips, and Hill (2001) showed an association across individuals between their behavioral ability in mental rotation and a melody reversal task. Conversely, a recent article on persons with amusia (Douglas & Bilkey, 2007) showed an association between impaired musical ability and impaired visual mental rotation. 
Our melody imagery task also required the transformation of a mental image: reversing the tones of the melody in time, which may be thought of as roughly analogous to the transformation of a visual image in space. In both cases, the relationship between the individual elements of the perceptual representation must be maintained, but a novel arrangement of them must be computed (spatial position in one case, temporal position in the other).
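The temporal transformation described above can be stated compactly in code. The following Python sketch is only a schematic of the judgment that listeners were asked to make, not a model of how they made it: it represents a melody as a list of note numbers (a hypothetical encoding; the stimuli in the study were sounded tones) and checks whether a probe string is an exact reversal of the target.

```python
def is_exact_reversal(target, probe):
    """Return True if probe contains the notes of target in reverse order."""
    return list(probe) == list(reversed(target))


def first_mismatch(target, probe):
    """Index in the probe of the first note that deviates from the reversal.

    Returns None if the probe is an exact reversal of the target.
    """
    expected = list(reversed(target))
    if len(probe) != len(expected):
        return min(len(probe), len(expected))
    for index, (heard, wanted) in enumerate(zip(probe, expected)):
        if heard != wanted:
            return index
    return None


# Hypothetical five-note target, written as MIDI-style note numbers.
target = [60, 62, 64, 65, 67]
exact_probe = [67, 65, 64, 62, 60]
inexact_probe = [67, 65, 63, 62, 60]  # third note altered

print(is_exact_reversal(target, exact_probe))    # True
print(is_exact_reversal(target, inexact_probe))  # False
print(first_mismatch(target, inexact_probe))     # 2
```

The elements themselves are unchanged by the operation; only their temporal positions are recomputed, which is the sense in which reversal is analogous to a spatial rearrangement.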

The results of Rudner et al. (2005), who performed an auditory–verbal temporal reversal task, are consistent with these conclusions. These authors report that posterior parietal areas, together with ventral and dorsal frontal cortices, are active when listeners judge whether a heard word constitutes a time-reversed version of a target word in comparison to a control rhyme condition. This task is similar in nature to the one used here and thus suggests some commonality in the processes used for temporal manipulations in the auditory domain, although the use of verbal materials in the study of Rudner et al. introduces a different level of cognitive analysis. The present conclusions are also consistent with recent fMRI data on a different type of auditory mental transformation: transposition. Melodies are readily recognized based on the pitch intervals (frequency ratios) formed between their constituent notes and not based on the pitch values themselves. Hence, when all pitches in a melody are changed by a constant factor or transposed, the melody remains constant. Foster and Zatorre (submitted) asked listeners to make same/different judgments on pairs of melodies in which the second item was transposed, as contrasted with judgments based on untransposed melody pairs. They obtained prominent neural activity within the IPS bilaterally, spatially overlapping the location of the present findings. Moreover, IPS activation in that study correlated with behavioral performance, such that better performance was associated with higher activity. These data, taken together with the present findings, strongly suggest that mental manipulations of auditory information, whether with verbal or musical stimuli and whether with temporal or pitch-based transformations, involve neural systems within the posterior parietal cortex. 
This conclusion is consistent with the visual literature reviewed earlier but expands the models that exist to date, which are largely based on visuomotor or visuospatial information manipulations. We argue, therefore, that the computations carried out in parietal cortex, particularly within the IPS, are of a similar nature whether they operate over visual or auditory information.
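The invariance that underlies transposition, as described above, can be illustrated with a short Python sketch. It treats a melody as a list of frequencies and compares the ratios between successive notes; multiplying every frequency by a constant factor leaves those ratios unchanged. The function names and the frequency values are hypothetical, and the sketch is not a description of the stimuli or analysis used by Foster and Zatorre.

```python
import math


def interval_ratios(freqs):
    """Frequency ratios between each pair of successive notes."""
    return [later / earlier for earlier, later in zip(freqs, freqs[1:])]


def transpose(freqs, factor):
    """Transpose a melody by multiplying every frequency by a constant factor."""
    return [f * factor for f in freqs]


def same_interval_pattern(melody_a, melody_b, rel_tol=1e-9):
    """True if two melodies share the same sequence of frequency ratios."""
    ratios_a = interval_ratios(melody_a)
    ratios_b = interval_ratios(melody_b)
    return len(ratios_a) == len(ratios_b) and all(
        math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(ratios_a, ratios_b)
    )


# Hypothetical four-note melody in Hz, and a version transposed by a factor of 1.5.
melody = [440.0, 495.0, 550.0, 440.0]
transposed = transpose(melody, 1.5)
altered = [660.0, 742.5, 880.0, 660.0]  # third note changed after transposition

print(same_interval_pattern(melody, transposed))  # True
print(same_interval_pattern(melody, altered))     # False
```

A same/different judgment on transposed pairs therefore cannot rely on matching absolute pitch values and instead requires comparing the relations between notes.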

It could be argued that the prominent activation of parietal cortex reflects the use of a visuospatial strategy for completion of the mental reversal task. Although this explanation cannot be conclusively ruled out based on the current findings, it appears highly unlikely to account for our data. First, we did interview listeners to ask about the use of visual strategies. Fewer than half of the subjects reported using visual imagery (e.g., visualization of the notes of the song on a musical staff); moreover, analysis of the data from the two subgroups who did and did not report using visualization revealed no difference in the overall pattern of activation. There was also no difference between these subgroups in the degree to which they experienced vivid auditory images. Second, it is relevant to note that in Experiment 1 we obtained a correlation between BAIS scores, which indicate vividness of auditory (not visual) imagery, and activation of right parietal cortex. This effect would appear to reflect greater reliance on parietal cortex amongst those subjects most likely to use an auditory strategy. Third, numerous neuroimaging studies have been performed using auditory stimuli in which parietal cortex is recruited, typically in tasks requiring spatial processing (Zatorre, Bouffard, Ahad, & Belin, 2002; Alain, Arnott, Hevenor, Graham, & Grady, 2001; Maeder et al., 2001; Weeks et al., 2000), but also in certain nonspatial tasks (Cusack, 2005), indicating that visual input is at least not essential to activate parietal regions. Finally, it is well established that posterior parietal cortex receives direct anatomical inputs from multiple sensory modalities, including unimodal auditory cortex (Frey, Campbell, Pike, & Petrides, 2008; Schroeder & Foxe, 2002; Lewis & Van Essen, 2000). 
It therefore seems most parsimonious to assume that the role of the parietal cortex is linked not to the modality of input but rather to the computation required to carry out the mental manipulation task used in our study.

Although one of the principal aims of the present set of experiments was to elucidate the contribution of auditory cortex to mental imagery and transformation, overall, the evidence points to a less prominent involvement of auditory cortex in the present imagery task than anticipated. Although most prior imagery studies have reported auditory cortex involvement, as indicated in the introduction, at least one recent fMRI study involving anticipation of musical paired-associates did not (Leaver, Van Lare, Zielinski, Halpern, & Rauschecker, 2009). There were nonetheless two findings that do support some role of auditory cortex in the present circumstances: First, in the forward condition of Experiment 1, we observed a significant correlation between score on the BAIS scale and BOLD signal within the right planum temporale, an area previously shown by many studies to be important for melodic processing and other auditory tasks (Griffiths & Warren, 2002). We take this finding as evidence that this auditory cortical area contributes to imagery to a greater extent among those individuals whose imagery is more vivid or who tend to use imagery to a greater extent. Notwithstanding a role for auditory cortex in people with high imagery scores, it is to be noted that all musicians in the present study performed the task equally well, including those who did not evince any activity within auditory regions. We speculate, therefore, that activity in auditory cortex likely reflects the degree to which one is able to represent the sound of the stimulus phenomenologically, which may be useful in general, but is not necessarily required to solve the mental reversal task. The manipulative requirements of the task may take precedence in this instance over the mental imagery of the sound quality itself.

The second piece of evidence favoring a role for auditory cortex in mental reversal of melodies is that in Experiment 2 we found two significant foci of activity within secondary auditory areas. In the forward–silence comparison, there was recruitment of the left planum temporale. In the reverse–control contrast, the right STS was active (six of the seven subjects showed a positive response in this region compared with silence; there was no correlation with BAIS score, however). Recall that Experiment 1 was designed with a long TR to avoid contaminating any imagery-related response in auditory cortex with a response elicited by the actual stimulus (Figure 4 indicates that this experimental manipulation was successful). However, as argued earlier, the 12-sec interval may have been too long to pick up any imagery-related activation. Both auditory cortical regions seen in Experiment 2 may therefore be taken as indicators of imagery-related recruitment during forward trials (which required retrieval from memory because only verbal titles were shown) as well as during reverse trials. Despite this evidence of auditory cortex recruitment in this task, we cannot exclude the possibility that the left PT activity in the forward trials reflects verbal processing (of the titles, for example) rather than musical imagery, nor can we exclude the possibility that the right STS response in the reverse trials is related to the structure inherent in the reverse melody, as opposed to the randomly organized sequence of tones presented in the control condition (Peretz et al., in press). Hence, the contribution of auditory cortex to imagery in this task remains to be demonstrated conclusively.

Conclusion

The data from the two fMRI studies, together with the behavioral findings, constitute evidence that mental manipulation of imagined auditory events relies on mechanisms similar to those that have been described in other domains. In particular, we show that reversing a melody mentally is readily accomplished by musicians, that the chronometry of their responses is consistent with a mental reversal process, and that this manipulation recruits a neural circuit including the cortex within the IPS. The latter structure is known to play a critical role in various types of sensory-motor transformations in which the relationship between individual elements of a stimulus must be kept constant while computing these relationships within a new frame of reference. We propose that the manipulation of auditory information relies on this same mechanism, which suggests that the mechanism is likely to be an amodal processing system.

Acknowledgments

We thank the staff of the McConnell Brain Imaging Centre for their technical assistance, Anish Jain for help in administering the questionnaires and analyzing data, and Brittney McGill for generating the stimuli and pilot testing the task. This research was supported by a grant to the first two authors from the Grammy Foundation and by funding to the first author from the Canadian Institutes of Health Research and the Natural Sciences and Engineering Research Council.

Reprint requests should be sent to Robert J. Zatorre, Montreal Neurological Institute, McGill University, 3801 University, Montreal, Quebec, Canada H3A 2B4, or via e-mail: robert.zatorre@mcgill.ca.

REFERENCES

Alain, C., Arnott, S. R., Hevenor, S., Graham, S., & Grady, C. L. (2001). “What” and “where” in the human auditory system. Proceedings of the National Academy of Sciences, U.S.A., 98, 12301–12306.

Alivisatos, B., & Petrides, M. (1997). Functional activation of the human brain during mental rotation. Neuropsychologia, 35, 111–118.

Belin, P., Zatorre, R. J., Hoge, R., Evans, A. C., & Pike, B. (1999). Event-related fMRI of the auditory cortex. Neuroimage, 10, 417–429.

Brodsky, W., Henik, A., Rubinstein, B. S., & Zorman, M. (2003). Auditory imagery from musical notation in expert musicians. Perception and Psychophysics, 65, 602–612.

Champod, A. S., & Petrides, M. (2007). Dissociable roles of the posterior parietal and the prefrontal cortex in manipulation and monitoring processes. Proceedings of the National Academy of Sciences, U.S.A., 104, 14837–14842.

Coffman, D. D. (1990). Effects of mental practice, physical practice, and knowledge of results on piano performance. Journal of Research in Music Education, 38, 187–196.

Collins, D., Neelin, P., Peters, T., & Evans, A. (1994). Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. Journal of Computer Assisted Tomography, 18, 192–205.

Culham, J. C., & Valyear, K. F. (2006). Human parietal cortex in action. Current Opinion in Neurobiology, 16, 205–212.

Cupchik, G. C., Phillips, K., & Hill, D. S. (2001). Shared processes in spatial rotation and musical permutation. Brain and Cognition, 46, 373–382.

Cusack, R. (2005). The intraparietal sulcus and perceptual organization. Journal of Cognitive Neuroscience, 17, 641–651.

Douglas, K. M., & Bilkey, D. K. (2007). Amusia is associated with deficits in spatial processing. Nature Neuroscience, 10, 915–921.

Finlay, C. A., Motes, M. A., & Kozhevnikov, M. (2007). Updating representations of learned scenes. Psychological Research, 71, 265–276.

Foster, N. E. V., & Zatorre, R. J. (submitted). A role for the intraparietal sulcus in transforming musical information. Cerebral Cortex. [doi: 10.1093/cercor/bhp199].

Frey, S., Campbell, J. S. W., Pike, G. B., & Petrides, M. (2008). Dissociating the human language pathways with high angular resolution diffusion fiber tractography. Journal of Neuroscience, 28, 11435–11444.

Grefkes, C., & Fink, G. R. (2005). The functional organization of the intraparietal sulcus in humans and monkeys. Journal of Anatomy, 207, 3–17.

Griffiths, T. D., & Warren, J. D. (2002). The planum temporale as a computational hub. Trends in Neurosciences, 25, 348–353.

Gusnard, D. A., & Raichle, M. E. (2001). Searching for a baseline: Functional imaging and the resting human brain. Nature Reviews Neuroscience, 2, 685–694.

Hall, D., Haggard, M., Akeroyd, M., Palmer, A., Summerfield, A., Elliott, M., et al. (1999). “Sparse” temporal sampling in auditory fMRI. Human Brain Mapping, 7, 213–223.

Halpern, A. R. (1988). Mental scanning in auditory imagery for tunes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 434–443.

Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation of auditory imagery for familiar melodies. Cerebral Cortex, 9, 697–704.

Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural correlates of perceived and imagined musical timbre. Neuropsychologia, 42, 1281–1292.

Harris, I. M., Egan, G. F., Sonkkila, C., Tochon-Danguy, H. J., Paxinos, G., & Watson, J. D. G. (2000). Selective right parietal lobe activation during mental rotation. A parametric PET study. Brain, 123, 65–73.

Herholz, S. C., Lappe, C., Knief, A., & Pantev, C. (2008). Neural basis of music imagery and the effect of musical expertise. European Journal of Neuroscience, 28, 2352–2360.

Highben, Z., & Palmer, C. (2004). Effects of auditory and motor mental practice in memorized piano performance. Bulletin of the Council for Research in Music Education, 159, 58–65.

Humphreys, J. T. (1986). Measurement, prediction, and training of harmonic audiation and performance skills. Journal of Research in Music Education, 34, 192–199.

Johnson, J. A., & Zatorre, R. J. (2005). Attention to simultaneous unrelated auditory and visual events: Behavioral and neural correlates. Cerebral Cortex, 15, 1609–1620.

Koshino, H., Carpenter, P. A., Keller, T. A., & Just, M. A. (2005). Interactions between the dorsal and the ventral pathways in mental rotation: An fMRI study. Cognitive, Affective & Behavioral Neuroscience, 5, 54–66.

Kosslyn, S., Thompson, W. L., & Ganis, G. (2006). The case for mental imagery. Oxford: Oxford University Press.

Kozhevnikov, M., Kosslyn, S., & Shepard, J. (2005). Spatial vs. object visualizers: Characterization of visual cognitive style. Memory and Cognition, 33, 710–726.

Kraemer, D. J. M., Macrae, C. N., Green, A. E., & Kelley, W. M. (2005). Musical imagery: Sound of silence activates auditory cortex. Nature, 434, 158.

Leaver, A. M., Van Lare, J., Zielinski, B., Halpern, A. R., & Rauschecker, J. P. (2009). Brain activation during anticipation of sound sequences. Journal of Neuroscience, 29, 2477–2485.

Lewis, J. W., & Van Essen, D. C. (2000). Corticocortical connections of visual, sensorimotor, and multimodal processing areas in the parietal lobe of the macaque monkey. Journal of Comparative Neurology, 428, 112–137.

Lim, S., & Lippman, L. G. (1991). Mental practice and memorization of piano music. Journal of General Psychology, 118, 21–30.

Maeder, P., Meuli, R., Adriani, M., Bellmann, A., Fornari, E., Thiran, J.-P., et al. (2001). Distinct pathways involved in sound recognition and localization: A human fMRI study. Neuroimage, 14, 802–816.

Mazziotta, J. (2001). A probabilistic atlas and reference system for the human brain: International Consortium for Brain Mapping (ICBM). Proceedings of the National Academy of Sciences, U.S.A., 356, 1293–1322.

Mountain, R. (2001). Composers & imagery: Myths & realities. In R. I. Godøy & H. Jorgensen (Eds.), Musical imagery (pp. 271–288). Florence, KY: Routledge.

O'Boyle, M. W. (2005). Mathematically gifted male adolescents activate a unique brain network during mental rotation. Cognitive Brain Research, 25, 583–587.

Pascual-Leone, A. (2003). The brain that plays music and is changed by it. In I. Peretz & R. Zatorre (Eds.), The cognitive neuroscience of music (pp. 396–412). Oxford: Oxford University Press.

Peretz, I., Gosselin, N., Belin, P., Zatorre, R. J., Plailly, J., & Tillmann, B. (in press). Musical lexical networks: The cortical organization of music recognition. Annals of the New York Academy of Sciences.

Petrides, M. (2005). Lateral prefrontal cortex: Architectonic and functional organization. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 360, 781–795.

Postle, B. R., Berger, J. S., & D'Esposito, M. (1999). Functional neuroanatomical double dissociation of mnemonic and executive control processes contributing to working memory performance. Proceedings of the National Academy of Sciences, U.S.A., 96, 12959–12964.

Rudner, M., Rönnberg, J., & Hugdahl, K. (2005). Reversing spoken items-mind twisting not tongue twisting. Brain and Language, 92, 78–90.

Schroeder, C., & Foxe, J. (2002). The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Cognitive Brain Research, 14, 187–198.

Shepard, R. N., & Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171, 791–793.

Smith, E. E., & Jonides, J. (1998). Neuroimaging analyses of human working memory. Proceedings of the National Academy of Sciences, U.S.A., 95, 12061–12068.

Talairach, J., & Tournoux, P. (1988). Co-planar stereotaxic atlas of the human brain. New York: Thieme Medical Publishers, Inc.

Theiler, A. M., & Lippman, L. G. (1995). Effects of mental practice and modeling on guitar and vocal performance. Journal of General Psychology, 122, 329–343.

Trusheim, W. H. (1991). Audiation and mental imagery: Implications for artistic performance. Quarterly Journal of Music Teaching and Learning, 2, 139–147.

Weeks, R., Horwitz, B., Aziz-Sultan, A., Tian, B., Wessinger, C., Cohen, L., et al. (2000). A positron emission tomographic study of auditory localization in the congenitally blind. Journal of Neuroscience, 20, 2664–2672.

Westbury, C. F., Zatorre, R. J., & Evans, A. C. (1999). Quantifying variability in the planum temporale: A probability map. Cerebral Cortex, 9, 392–405.

Worsley, K. (2005). An improved theoretical P-value for SPMs based on discrete local maxima. Neuroimage, 28, 1056–1062.

Worsley, K., Liao, C., Aston, J., Petre, V., Duncan, G., Morales, F., et al. (2002). A general statistical analysis for fMRI data. Neuroimage, 15, 1–15.

Zacks, J. M. (2008). Neuroimaging studies of mental rotation: A meta-analysis and review. Journal of Cognitive Neuroscience, 20, 1–19.

Zacks, J. M., Vettel, J. M., & Michelon, P. (2003). Imagined viewer and object rotations dissociated with event-related fMRI. Journal of Cognitive Neuroscience, 15, 1002–1018.

Zatorre, R. J., Bouffard, M., Ahad, P., & Belin, P. (2002). Where is “where” in the human auditory cortex? Nature Neuroscience, 5, 905–909.

Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory–motor interactions in music perception and production. Nature Reviews Neuroscience, 8, 547–558.

Zatorre, R. J., & Halpern, A. R. (1993). Effect of unilateral temporal-lobe excision on perception and imagery of songs. Neuropsychologia, 31, 221–232.

Zatorre, R. J., & Halpern, A. R. (2005). Mental concerts: Musical imagery and auditory cortex. Neuron, 47, 9–12.

Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the mind's ear: A PET investigation of musical imagery and perception. Journal of Cognitive Neuroscience, 8, 29–46.