Abstract

We investigated the functional characteristics of brain regions implicated in processing of speech melody by presenting words spoken in either neutral or angry prosody during a functional magnetic resonance imaging experiment using a factorial habituation design. Subjects judged either affective prosody or word class for these vocal stimuli, which could be heard for the first, second, or third time. Voice-sensitive temporal cortices, as well as the amygdala, insula, and mediodorsal thalami, responded more strongly to angry than to neutral prosody. These stimulus-driven effects were not influenced by the task, suggesting that these brain structures are automatically engaged during processing of emotional information in the voice and operate relatively independently of cognitive demands. By contrast, the right middle temporal gyrus and the bilateral orbito-frontal cortices (OFC) responded more strongly during emotion than during word classification, but were also sensitive to anger expressed by the voices, suggesting that some perceptual aspects of prosody are also encoded within these regions subserving explicit processing of vocal emotion. The bilateral OFC showed a selective modulation by emotion and repetition, with particularly pronounced responses to angry prosody during the first presentation only, indicating a critical role of the OFC in detection of vocal information that is both novel and behaviorally relevant. These results converge with previous findings obtained for angry faces and suggest a general involvement of the OFC in recognition of anger irrespective of the sensory modality. Taken together, our study reveals that different aspects of voice stimuli and perceptual demands modulate distinct areas involved in the processing of emotional prosody.

INTRODUCTION

Modulation of speech melody (prosody) provides a powerful cue to express one's affective state. Previous neuroimaging studies demonstrated that voice-sensitive regions (Belin, Zatorre, Lafaille, Ahad, & Pike, 2000) in the middle part of the right superior temporal gyrus (STG) exhibit stronger responses to a variety of vocally expressed emotions than to neutral prosody (Ethofer et al., 2007; Ethofer, Anders, Wiethoff, et al., 2006; Grandjean et al., 2005). Another brain structure for which differential responses to vocally expressed emotions have often been reported is the amygdala (Quadflieg, Mohr, Mentzel, Miltner, & Straube, 2008; Fecteau, Belin, Joanette, & Armony, 2007; Ethofer, Anders, Erb, Droll, et al., 2006; Ethofer, Pourtois, & Wildgruber, 2006; Sander et al., 2005; Morris, Scott, & Dolan, 1999; Phillips et al., 1998), although its role in processing of vocal emotions is still controversial: different studies found increased (Quadflieg et al., 2008; Fecteau et al., 2007; Sander et al., 2005; Phillips et al., 1998), decreased (Morris et al., 1999), or no differential (Wildgruber et al., 2005) amygdala activity in response to affective prosody or nonlinguistic vocalizations. In addition, it has been demonstrated that the orbito-frontal cortex (OFC) displays a particular sensitivity to angry expressions in faces (Blair, Morris, Frith, Perrett, & Dolan, 1999) and probably also in voices (Hornak et al., 2003). So far, however, the specific role of these different regions remains unclear, and the exact aspects of affective vocal information encoded in the amygdala, the OFC, and other related brain regions are still largely unknown.

Explicit evaluation of affective prosody has been linked to both right posterior middle temporal gyrus/superior temporal sulcus (MTG/STS; Ethofer, Anders, Erb, Herbert, et al., 2006; Wildgruber et al., 2005; Mitchell, Elliott, Barry, Cruttenden, & Woodruff, 2003) and the bilateral inferior frontal cortex/orbito-frontal cortex (IFG/OFC; Ethofer, Anders, Erb, Herbert, et al., 2006; Wildgruber et al., 2004; Wildgruber, Pihan, Ackermann, Erb, & Grodd, 2002; Imaizumi et al., 1997), as these brain regions show task-related brain activity with stronger responses during judgment of affective prosody than a variety of control tasks including judgment of emotional word content (Ethofer, Anders, Erb, Herbert, et al., 2006; Mitchell et al., 2003), phonetic identification (Wildgruber et al., 2005), evaluation of linguistic prosody (Wildgruber et al., 2004), and speaker identification (Imaizumi et al., 1997). The suggestion that these brain regions cooperate during voice processing is supported by functional magnetic resonance imaging (fMRI) data testing for their functional (Obleser, Wise, Alex Dresner, & Scott, 2007) and effective (Ethofer, Anders, Erb, Herbert, et al., 2006) connectivity, as well as by results from diffusion tensor imaging demonstrating structural connectivity between these areas (Glasser & Rilling, 2008).

Besides emotional content, novelty constitutes an important factor determining the behavioral relevance of a stimulus, as automatic attention to novel stimuli is crucial for detecting potentially significant changes in the environment (Sokolov, 1963). Repeated exposure to a stimulus leads to faster and more efficient processing of that stimulus. This phenomenon, called perceptual priming, reflects an implicit form of memory (Tulving & Schacter, 1990). At the neural level, stimulus repetition leads to a decrease in neural responses, an effect termed "repetition suppression" (for a review, see Grill-Spector, Henson, & Martin, 2006). This neural adaptation to repeated stimulus exposure is a robust effect and has been investigated both at the neuronal level in monkeys, using single-cell recordings (e.g., Sobotka & Ringo, 1996; Li, Miller, & Desimone, 1993), and at the hemodynamic level in humans, using fMRI (e.g., Grill-Spector et al., 1999; Buckner et al., 1995). Across various studies, this effect has been exploited to investigate different aspects of face processing (Henson & Mouchlianitis, 2007; Bunzeck, Schutze, & Duzel, 2006; Hasson, Nusbaum, & Small, 2006; Yi, Kelley, Marois, & Chun, 2006; Soon, Venkatraman, & Chee, 2003; Henson, Shallice, Gorno-Tempini, & Dolan, 2002; Henson, 2000) and voice processing (Dehaene-Lambertz et al., 2006; Hasson et al., 2006; Orfanidou, Marslen-Wilson, & Davis, 2006; Cohen, Jobert, Le Bihan, & Dehaene, 2004; Belin & Zatorre, 2003). Concerning emotional communication, Ishai, Pessoa, Bikle, and Ungerleider (2004) used an adaptation paradigm to examine whether habituation in face-sensitive regions (Haxby et al., 2001; Kanwisher, McDermott, & Chun, 1997) is modulated by the emotion displayed in facial expressions. They found that face-sensitive regions exhibit stronger habituation for emotional than for neutral faces. So far, similar adaptation paradigms have not been used to investigate processing of affective prosody.

In the present study, we investigated whether brain regions showing differential activity to distinctive stimulus features (i.e., angry vs. neutral prosody) are additionally modulated by task demands (i.e., when prosody is relevant vs. incidental); and vice versa, whether regions displaying task-dependent activity are also modulated by stimulus-driven factors. We chose anger as a representative and well-discriminated category of emotion expressed by prosody with strong social relevance. Based on previous neuroimaging studies, we hypothesized that the bilateral STG (Ethofer et al., 2007; Ethofer, Anders, Wiethoff, et al., 2006; Grandjean et al., 2005), the amygdala (Quadflieg et al., 2008; Sander et al., 2005; Morris et al., 1999; Phillips et al., 1998), and the OFC (Blair et al., 1999) should exhibit sensitivity to anger as expressed by prosody, whereas a distinct set of cortical regions comprising the right posterior MTG/STS (Ethofer, Anders, Erb, Herbert, et al., 2006; Wildgruber et al., 2005; Mitchell et al., 2003) and the bilateral IFG/OFC (Ethofer, Anders, Erb, Herbert, et al., 2006; Wildgruber et al., 2002, 2004; Imaizumi et al., 1997) was expected to show task-related brain activity.

Furthermore, we wanted to clarify whether task- and stimulus-driven components of the brain structures subserving prosody recognition are influenced by the novelty of the stimuli and whether habituation in these regions is modulated by task instructions or expressed emotion. To this end, we presented adjectives and nouns spoken in angry or neutral prosody during an event-related fMRI experiment and instructed our participants either to judge the emotion expressed in the voice or to classify the spoken word as adjective or noun. Each stimulus was presented three times. Thus, the main fMRI experiment conformed to a 3 × 2 × 2 factorial adaptation design with presentation (first, second, third), emotion (angry, neutral), and task (emotion classification, word classification) as separate factors. In addition, we also performed an fMRI localizer scan to define voice-sensitive brain regions (Belin et al., 2000) and to verify that stimulus-driven effects due to the expressed affective prosody occurred within these regions.

METHODS

Subjects

Twelve subjects (6 women, 6 men, mean age = 23.2 years) took part in the behavioral study and 24 subjects (12 women, 12 men, mean age = 26.3 years) were included in the fMRI study. All participants were right-handed native speakers of German and had no history of neurological or psychiatric diseases. Right-handedness was assessed using the Edinburgh Inventory (Oldfield, 1971). The study was approved by the Ethical Committee of the University of Tübingen and conducted according to the Declaration of Helsinki.

Stimulus Material

The stimulus set used in the present study comprised 16 adjectives and 16 nouns spoken by six professional actors (3 men, 3 women) in either neutral or angry prosody (8 adjectives and 8 nouns per prosodic category). Stimulus examples are provided as supplemental material. These stimuli were selected from a larger corpus and rated in several prestudies (see Ethofer et al., 2007) to ensure that their word content was emotionally neutral and low arousing, that they represented the emotional category intended by the actors, and to evaluate the emotional valence and arousal expressed by prosody. The gender of the speakers was balanced across prosodic categories (angry, neutral) and word classes (adjectives, nouns). For each stimulus, the mean intensity (I), variability of intensity (standard deviation [SD] of I, normalized to mean I), mean fundamental frequency (F0), and variability of F0 (SD of F0, normalized to mean F0) were calculated from I and F0 contours determined over the whole stimulus duration using Praat software (version 4.3.20, www.praat.org; Boersma, 2001). Because a previous fMRI study of our group revealed that mean I, number of syllables, and stimulus duration can influence activation within voice-sensitive regions (Wiethoff et al., 2008), stimulus selection aimed at minimizing differences in these parameters across prosodic categories (angry, neutral) and word classes (adjectives, nouns). Valence (ranging from 1 = highly negative to 9 = highly positive) and arousal (ranging from 1 = very calming to 9 = highly arousing) expressed by word content and affective prosody, number of syllables, and acoustic parameters of the stimuli are presented in Table 1. To verify that repeated presentation of these stimuli results in priming with shortening of reaction times, and that the tasks to be used in the main fMRI experiment are comparable with respect to difficulty, we evaluated our stimuli in a behavioral prestudy.
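
As an illustration of these acoustic measurements, the following is a minimal Python sketch using the parselmouth bindings to Praat; the original study used Praat itself, and the file name below is a hypothetical placeholder.

```python
import parselmouth

def acoustic_parameters(wav_path):
    # Load the stimulus and extract contours over the whole duration.
    sound = parselmouth.Sound(wav_path)

    # Intensity contour; SD of I is normalized to mean I (in percent).
    intensity = sound.to_intensity()
    i_values = intensity.values.flatten()
    mean_i = i_values.mean()
    sd_i_pct = 100.0 * i_values.std() / mean_i

    # F0 contour; Praat marks unvoiced frames with 0 Hz, so drop them.
    pitch = sound.to_pitch()
    f0 = pitch.selected_array['frequency']
    f0 = f0[f0 > 0]
    mean_f0 = f0.mean()
    sd_f0_pct = 100.0 * f0.std() / mean_f0  # SD of F0, normalized to mean F0

    return {'mean_I': mean_i, 'SD_I_%': sd_i_pct,
            'mean_F0_Hz': mean_f0, 'SD_F0_%': sd_f0_pct,
            'duration_s': sound.get_total_duration()}

print(acoustic_parameters('stimulus_01.wav'))  # hypothetical file name
```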

Table 1. 

Valence and Arousal as Expressed by Word Content and Affective Prosody, Number of Syllables, and Acoustic Parameters of the Stimuli


                 Valence          Arousal          Valence     Arousal     Number of
                 (Word Content)   (Word Content)   (Prosody)   (Prosody)   Syllables
Adjectives (AP)  4.9 ± 0.5        3.2 ± 0.4        2.9 ± 0.5   5.9 ± 0.5   2.8 ± 0.5
Adjectives (NP)  5.1 ± 0.5        3.3 ± 0.3        4.8 ± 0.2   2.7 ± 0.4   2.5 ± 1.2
Nouns (AP)       5.1 ± 0.4        2.0 ± 0.5        3.0 ± 0.6   6.0 ± 1.2   3.1 ± 1.0
Nouns (NP)       5.2 ± 0.4        1.8 ± 0.5        5.1 ± 0.3   3.0 ± 0.4   3.0 ± 1.1

                 Mean I [a.u.]    SD of I [%]      Mean F0 [Hz]    SD of F0 [%]   Duration [sec]
Adjectives (AP)  78.6 ± 0.3       10.9 ± 3.3       154.7 ± 39.7    15.3 ± 6.9     0.84 ± 0.35
Adjectives (NP)  78.6 ± 0.3       12.7 ± 2.1       143.0 ± 42.6     8.9 ± 4.0     0.82 ± 0.25
Nouns (AP)       78.5 ± 0.3       10.4 ± 2.3       164.8 ± 47.8    18.3 ± 12.2    0.86 ± 0.31
Nouns (NP)       78.6 ± 0.3       10.9 ± 3.0       150.1 ± 46.6    12.8 ± 6.3     0.81 ± 0.32

All values represent mean ± standard deviation.

AP = angry prosody; NP = neutral prosody; a.u. = arbitrary units; Hz = Hertz.

Experimental Design

In both the behavioral and the main fMRI experiment, stimuli were presented in an event-related design. To maximize behavioral priming and repetition suppression effects, we limited the maximal time interval between repetitions by clustering the stimuli into eight segments of 12 trials each. Each segment contained two adjectives and two nouns (one of each word class spoken in angry and one in neutral prosody), which were presented three times. The order of the 12 trials within each segment and the order of the eight segments were fully randomized. Each trial started with a visual display of the judgment scale that informed the subject whether the next stimulus should be classified according to affective prosody (negative, neutral) or word class (adjective, noun). This display was presented for 6 sec, and the voice stimuli were presented 2 sec after onset of the scale. Within each segment, subjects had to classify half of the stimuli with respect to affective prosody and the other half according to word class. The task did not change across the three presentations of each stimulus. Word classification was chosen as the nonemotional control task because its difficulty is close to that of emotion classification (see below, results of the behavioral prestudy), whereas gender classification is usually much easier to perform and can introduce biases due to different levels of cognitive effort (e.g., Bach et al., 2008). For the sake of statistical power, and to have an equal number of response alternatives in the emotion classification and the word classification task, we restricted the present study to angry and neutral prosody. To reduce synchronization of scanner noise and acoustic stimulation, trial onsets were jittered relative to scan onset in steps of 500 msec, and the intertrial interval ranged from 8.85 to 12.85 sec. Subjects were instructed to classify the stimuli as accurately and quickly as possible and were allowed to report their decision during stimulus presentation.
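
A minimal sketch of how such a segmented, randomized trial list could be generated; the labels, helper function, and dictionary fields are our own and not taken from the original experiment code.

```python
import random

def build_trial_list(seed=0):
    rng = random.Random(seed)
    itis = [8.85 + 0.5 * k for k in range(9)]  # 8.85 ... 12.85 sec in 500-msec steps
    trials = []
    for segment in range(8):
        # Four stimuli per segment: adjective/noun x angry/neutral prosody.
        stimuli = [('adjective', 'angry'), ('adjective', 'neutral'),
                   ('noun', 'angry'), ('noun', 'neutral')]
        # Half of the stimuli are judged for prosody, half for word class;
        # the task stays fixed across the three presentations of a stimulus.
        tasks = ['emotion', 'emotion', 'word', 'word']
        rng.shuffle(tasks)
        segment_trials = [
            dict(segment=segment, word_class=wc, prosody=pr, task=task,
                 presentation=rep, iti=rng.choice(itis))
            for (wc, pr), task in zip(stimuli, tasks)
            for rep in (1, 2, 3)]
        rng.shuffle(segment_trials)  # randomize the 12 trials within the segment
        trials.extend(segment_trials)
    return trials

for trial in build_trial_list()[:3]:
    print(trial)
```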

In the fMRI localizer scan, a block design with 32 auditory stimulation blocks (8 sec duration) and 16 silent blocks (8 sec duration) was employed. The distribution of the silent blocks was pseudorandomized and presentation of auditory stimulation blocks was fully randomized. Auditory stimulation blocks included 16 blocks with human voices (e.g., speech, sighs, laughs), 8 blocks with animal sounds (cries of various animals), and 8 blocks with modern human environmental sounds (e.g., doors, bells, telephones, cars). (For a detailed description of the stimulus material, see Belin et al., 2000.) Subjects were instructed to close their eyes and to listen to the stimuli. In both fMRI sessions, stimuli were presented binaurally via magnetic resonance-compatible headphones with piezoelectric signal transmission (Jancke, Wustenberg, Scheich, & Heinze, 2002).

Analysis of Behavioral Data

Accuracy rates during the two tasks were compared using two-tailed paired t tests. Reaction times were submitted to a three-factorial ANOVA with repetition (first, second, third), emotion (angry, neutral), and task (emotion classification, word classification) as within-subject factors. All values are given as mean ± standard error of the mean.
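
For readers who want to reproduce this kind of analysis, a minimal sketch in Python (pandas, scipy, statsmodels) is given below, assuming reaction times and accuracies stored in long-format CSV files with hypothetical column names; the original analyses were not necessarily run with these tools.

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format table: one row per subject and condition,
# with columns subject, repetition, emotion, task, rt.
df = pd.read_csv('reaction_times.csv')

# Three-factorial repeated-measures ANOVA on reaction times
# (aggregate_func averages multiple observations per cell).
anova = AnovaRM(df, depvar='rt', subject='subject',
                within=['repetition', 'emotion', 'task'],
                aggregate_func='mean').fit()
print(anova)

# Two-tailed paired t test comparing accuracy rates of the two tasks.
acc = pd.read_csv('accuracy.csv')  # columns: subject, acc_emotion, acc_word
t_val, p_val = stats.ttest_rel(acc['acc_emotion'], acc['acc_word'])
print(f't = {t_val:.2f}, p = {p_val:.3f}')
```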

Image Acquisition

Structural and functional imaging data were acquired using a 1.5-T whole-body scanner (Siemens AVANTO, Erlangen, Germany). A magnetization-prepared rapid acquisition gradient-echo sequence was employed to acquire high-resolution (1 × 1 × 1 mm³) T1-weighted structural images (TR = 1980 msec, TE = 3.09 msec, TI = 1100 msec). Functional images were obtained using a multislice echo-planar imaging (EPI) sequence (26 axial slices acquired in descending order, slice thickness = 4 mm + 1 mm gap, TR = 2.17 sec, TE = 40 msec, field of view [FOV] = 192 × 192 mm², 64 × 64 matrix, flip angle = 90°, bandwidth = 1532 Hz/Px). EPI time series consisted of 490 images for the main experiment and 255 images for the localizer scan. For off-line correction of EPI image distortions, a static field map (36 slices acquired in descending order, slice thickness = 3 mm + 1 mm gap, TR = 487 msec, TE(1) = 5.28 msec, TE(2) = 10.04 msec, FOV = 192 × 192 mm², 64 × 64 matrix) was acquired prior to the functional measurements.

Data Analysis

For both the main experiment and the localizer scan, the first five fMRI volumes were discarded from further analysis to exclude measurements that preceded T1 equilibrium. Functional images were analyzed using statistical parametric mapping software (SPM5, Wellcome Department of Imaging Neuroscience, London, UK). For both sessions, preprocessing comprised realignment to the first volume of the time series, unwarping by use of a static field map (Andersson, Hutton, Ashburner, Turner, & Friston, 2001), normalization into MNI space (Montreal Neurological Institute; Collins, Neelin, Peters, & Evans, 1994; resampled voxel size: 3 × 3 × 3 mm³), and spatial smoothing with an isotropic Gaussian filter (10 mm full width at half maximum). Statistical analysis relied on a general linear model (Friston et al., 1994). For the main experiment, separate regressors were defined for each of the 12 conditions of the 3 × 2 × 2 factorial design using a stick function convolved with the hemodynamic response function. Events were time-locked to stimulus onset. For the localizer scan, three regressors were defined for presentation of human voices, animal voices, and modern human environmental sounds using box-car functions with a length of 8 sec that were convolved with the hemodynamic response function. To remove low-frequency components, a high-pass filter with a cutoff frequency of 1/128 Hz was used. Serial autocorrelations were accounted for by modeling the error term as a first-order autoregressive process with a coefficient of 0.2 (Friston et al., 2002) plus a white noise component (Purdon & Weisskoff, 1998).
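
To make the regressor construction concrete, here is a minimal numpy/scipy sketch of a single event regressor: a stick function at stimulus onsets convolved with an SPM-style canonical hemodynamic response function. The onset times and the HRF parametrization are illustrative assumptions; the actual analysis used SPM5's implementation.

```python
import numpy as np
from scipy.stats import gamma

TR = 2.17       # repetition time in seconds (main experiment)
n_scans = 490   # volumes in the main experiment
dt = 0.1        # temporal resolution for convolution

def canonical_hrf(t):
    # Difference of two gamma densities (SPM-style canonical HRF):
    # a response peaking around 6 sec plus a late undershoot.
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

t_hires = np.arange(0, n_scans * TR, dt)
sticks = np.zeros_like(t_hires)
for onset in [10.0, 22.85, 35.35]:   # hypothetical onsets of one condition
    sticks[int(round(onset / dt))] = 1.0

hrf = canonical_hrf(np.arange(0, 32, dt))
regressor = np.convolve(sticks, hrf)[:len(t_hires)]

# Downsample to one value per scan, time-locked to volume onsets.
regressor = regressor[(np.arange(n_scans) * TR / dt).astype(int)]
```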

The following contrast images were calculated for the main experiment:

  • (1) 

    To examine brain regions showing stimulus-driven effects, we compared activation to angry prosody relative to neutral prosody (anger > neutral).

  • (2) 

    To investigate brain regions showing task-driven effects, we compared activation during judgment of affective prosody relative to judgment of word class (emotion > word class). To rule out that task-related differences in brain activation could be solely explained by differences in task difficulty, parameter estimates were extracted from the most significantly activated voxel of every brain region with stronger responses during the emotion- than the word-classification task and then submitted to a regression analysis with reaction time as independent variable. The regression residuals obtained from these analyses were subsequently tested in separate paired t tests to investigate whether task-related brain activity remained significant after neutralizing possible contributions of task difficulty.

  • (3) 

    To identify brain regions showing repetition suppression effects, we compared activation to the first versus second presentation and to the second versus the third presentation of voice stimuli (first > second and second > third, respectively).

  • (4) 

    To identify brain regions where repetition suppression is modulated by affective prosody, we computed the interaction between repetition and emotion. To maximize sensitivity, we used the responses to the first and third presentations for this analysis ([angry prosody, first presentation > angry prosody, third presentation] > [neutral prosody, first presentation > neutral prosody, third presentation]); a sketch of how such contrast vectors can be constructed is given after this list.

  • (5) 

    To investigate brain regions where repetition suppression is modulated by task demands, we computed the interaction between repetition and task. Again, to maximize sensitivity, we used the first and third presentation for this analysis ([judgment of emotion, first presentation > judgment of emotion, third presentation] > [judgment of word class, first presentation > judgment of word class, third presentation]).
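
As referenced in contrast (4) above, the following is a minimal sketch of how contrast vectors over the 12 conditions of the 3 × 2 × 2 design can be constructed; the condition ordering is our own convention and need not match the ordering used in the original SPM design matrix.

```python
import numpy as np

# Conditions ordered as (presentation, emotion, task).
presentations = [1, 2, 3]
emotions = ['angry', 'neutral']
tasks = ['emotion', 'word']
conditions = [(p, e, t) for p in presentations for e in emotions for t in tasks]

def contrast(weight_fn):
    return np.array([weight_fn(p, e, t) for (p, e, t) in conditions])

# (1) Main effect of emotion: angry > neutral.
c_emotion = contrast(lambda p, e, t: 1 if e == 'angry' else -1)

# (3) Repetition suppression: first > second presentation.
c_repetition = contrast(lambda p, e, t: {1: 1, 2: -1, 3: 0}[p])

# (4) Emotion x repetition interaction:
# (angry first > angry third) > (neutral first > neutral third).
sign_rep = {1: 1, 2: 0, 3: -1}
c_interaction = contrast(
    lambda p, e, t: sign_rep[p] * (1 if e == 'angry' else -1))

print(c_emotion, c_repetition, c_interaction, sep='\n')
```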

Statistical inference was based on second-level random effects analyses. Activations are reported descriptively at a height threshold of p < .005 and an extent threshold of k > 8 voxels. Significance was assessed at the cluster level with an extent threshold of p < .05 (corresponding to a minimal cluster size of 180 voxels), corrected for multiple comparisons across the whole brain. For the activations within the amygdala and the OFC, the search volume was restricted to these brain regions as defined by the automatic anatomical labeling toolbox (Tzourio-Mazoyer et al., 2002) and correction for multiple comparisons was carried out using small volume correction (Worsley et al., 1996).

To examine whether brain regions showing task-driven or stimulus-driven effects are modulated by repeated presentation, parameter estimates from these regions were submitted to repeated measures three-factorial ANOVAs with repetition (first, second, third), emotion (angry, neutral), and task (emotion classification, word classification) as within-subject factors.

To investigate whether stimulus-driven effects due to affective prosody are located within voice-sensitive regions, brain activation maps during perception of human voices were contrasted with brain activation maps during perception of animal sounds and environmental sounds. Statistical parametric t maps computed at the single-subject level were thresholded at p < .005 (uncorrected), and probability maps showing a differential activation at this threshold in (i) at least 50% and (ii) at least 75% of the subjects were calculated across the whole brain.
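
A minimal sketch of this subject-overlap (probability map) computation, assuming each subject's t map is available as a NIfTI file; the file names and the single-subject error degrees of freedom are hypothetical placeholders.

```python
import numpy as np
import nibabel as nib
from scipy.stats import t as t_dist

n_subjects = 24
df = 249                              # error df of the single-subject model (assumed)
t_crit = t_dist.ppf(1 - 0.005, df)    # one-sided threshold at p < .005

# Binarize each subject's t map (voices > animal/environmental sounds).
binary_maps = []
for s in range(1, n_subjects + 1):
    t_map = nib.load(f'sub{s:02d}_voices_gt_other_tmap.nii').get_fdata()
    binary_maps.append(t_map > t_crit)

# Voxel-wise proportion of subjects exceeding the threshold.
prob_map = np.mean(binary_maps, axis=0)
overlap_50 = prob_map >= 0.50   # activated in at least 50% of subjects
overlap_75 = prob_map >= 0.75   # activated in at least 75% of subjects
```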

RESULTS

Behavioral Data

In the behavioral prestudy, participants identified stimuli during the word classification task with a slightly higher accuracy (92.4 ± 1.8%) than during the emotion classification task (90.1 ± 2.1%). This difference in accuracy was not significant. A three-factorial ANOVA with repetition (first, second, third), emotion (angry, neutral), and task (emotional classification, word classification) as within-subject factors and reaction time as dependent variable yielded a significant main effect of repetition [F(2, 10) = 29.6, p < .001]. This was due to an acceleration of reaction times between the first (1.30 ± 0.11 sec) and the second presentations [1.03 ± 0.09 sec, paired t(11) = 7.8, p < .001, one-tailed; see Figure 1A]. No such acceleration occurred between the second and the third presentations (1.03 ± 0.10 sec). No main effect of task or emotion was found. A significant interaction between emotion and task [F(1, 11) = 17.3, p < .01] was due to faster reaction times for angrily than for neutrally spoken words during the emotion classification task, whereas the reverse pattern with longer reaction times for angry than for neutral prosody was observed during the word classification task. No other first- or second-order interaction was significant.

Figure 1. 

Reaction times during the behavioral prestudy (A) and the fMRI experiment (B).

In the fMRI study, participants judged the stimuli with accuracy rates similar to those in the behavioral prestudy. However, the difference in accuracy rates between the word classification task (92.8 ± 0.7%) and the emotion classification task (89.0 ± 1.6%) was now slightly more pronounced and reached significance [paired t(23) = 2.6, p < .05, two-tailed]. A similar pattern of reaction times was found in the fMRI study as in the behavioral prestudy (see Figure 1B). The three-factorial ANOVA on reaction times yielded a significant main effect of repetition [F(2, 22) = 105.6, p < .001]: Again, there was an acceleration between the first (1.51 ± 0.07 sec) and second presentations [1.19 ± 0.06 sec, paired t(23) = 12.0, p < .001, one-tailed], but also between the second and third presentations [1.13 ± 0.06 sec, paired t(23) = 3.3, p < .05, one-tailed]. Furthermore, the main effect of task was also significant in the fMRI study [F(1, 23) = 4.5, p < .05], with slightly faster reaction times for word classification (1.23 ± 0.05 sec) than for emotion classification [1.32 ± 0.07 sec, paired t(23) = 2.2, p < .05, two-tailed]. There was no main effect of emotion, and no first- or second-order interaction was significant in the fMRI experiment.

fMRI Results: Functional Voice Localizer

In the localizer scan, a consistent pattern of stronger responses to human voices than to animal voices or environmental sounds was found exclusively in the bilateral STG (see Figure 2B). This selective activation in the bilateral STG is consistent with previous studies on voice-specific areas in the temporal cortex (Ethofer et al., 2007; Ethofer, Anders, Wiethoff, et al., 2006; Grandjean et al., 2005; Belin et al., 2000).

Figure 2. 

(A) Brain regions showing stimulus-driven effects (angry prosody > neutral prosody) rendered on the surface of a brain template and a transversal slice obtained at z = −15 with a threshold of p < .005 (uncorrected). Parameter estimates are presented for the right STG (upper left panel), the left STG (upper right panel), the right IFG/OFC (lower left panel), and the left IFG/OFC (lower right panel). (B) Voice-sensitive regions obtained in the localizer scan, showing consistent activation at a height threshold of p < .005 (uncorrected) in at least 50% (blue) or 75% (red) of the subjects rendered on a transversal slice at z = −3. (C) For comparison, stimulus-driven effects (angry prosody > neutral prosody) in the main fMRI experiment are illustrated at a threshold of p < .005 (uncorrected) on the same transversal slice at z = −3.

fMRI Results: Main Experiment

In the main experiment on task- and stimulus-dependent effects, we first compared the brain responses to angry and neutral prosody. This contrast revealed strong activations within the bilateral STG, IFG/OFC, insula, amygdala, and mediodorsal thalami (see Figure 2A and Table 2). The three-factorial repeated measures ANOVA on activation parameters (betas) extracted from these regions revealed a significant main effect of repetition for all these brain regions [all F(2, 22) > 7, p < .01]. In addition, the bilateral IFG/OFC showed a main effect of task [F(1, 23) = 6.1, p < .05 and F(1, 23) = 32.3, p < .001 for the right and left side, respectively], as well as an interaction between emotion and repetition [F(2, 22) = 7.7, p < .01 and F(2, 22) = 3.5, p < .05 for the right and left side, respectively]. The activation clusters in the bilateral STG showing stronger responses to angry than neutral prosody overlapped precisely with the region previously defined by the voice localizer experiment (compare Figure 2B and Figure 2C).

Table 2. 

Brain Regions Showing a Main Effect of Emotion, Main Effect of Task, or an Interaction between Emotion and Repetition

Anatomical Definition       MNI Coordinates (x, y, z)   Z Score   Cluster Size

Main Effect of Emotion (Angry > Neutral)
Right STG                   57, 30, −3                  5.53      593*
Left STG                    −57, −15, −3                4.74      387*
Right IFG/OFC               48, 30, −9                  3.82      310*
Left IFG/OFC                −48, 27, −9                 4.83      267*
Left insula                 −24, 15, −18                3.85      107
Right insula                30, 6, −18                  3.72      70
Mediodorsal thalamus        6, 3, 6                     3.46      74
Left amygdala               −18, −9, −15                3.87      43**
Right amygdala              18, −9, −18                 2.90      9**

Main Effect of Task (Emotional Classification > Word Classification)
Left IFG/OFC                −45, 36, −9                 5.66      357*
Right IFG/OFC               45, 39, −6                  4.20      275*
Right DMPFC                 12, 27, 48                  4.53      173
Right posterior MTG/STS     51, −33, 3                  3.09      17

Interaction between Emotion and Repetition ([Angry First > Angry Third] > [Neutral First > Neutral Third])
Right OFC                   54, 30, −9                  3.93      69***
Left OFC                    −42, 30, −15                3.04      31***

Height threshold: p < .005 (uncorrected), extent threshold k > 8 voxels.

IFG = inferior frontal gyrus; OFC = orbito-frontal cortex; DMPFC = dorsomedial prefrontal cortex; STG = superior temporal gyrus; MTG = middle temporal gyrus.

* p < .05 (corrected at cluster level for multiple comparisons across the whole brain).

** p < .05 (corrected at cluster level for multiple comparisons within the amygdala).

*** p < .05 (corrected at cluster level for multiple comparisons within the OFC).

We then examined task-related effects by comparing activations during the emotion classification relative to the word classification task. This contrast showed increases in the bilateral IFG/OFC, the right dorsomedial prefrontal cortex, and the right posterior MTG/STS (see Figure 3 and Table 2). These effects were not simply due to task difficulty: residuals obtained from regression analyses, using reaction time as independent variable and activation parameters (betas) from each of these regions as dependent variable, were still significantly larger for the emotion classification than for the word classification task [paired t(23) > 2.2 for all four regions, p < .05, one-tailed]. No significant activations were found in the reverse contrast. In addition, the three-factorial repeated measures ANOVA on responses in these regions revealed a significant main effect of repetition for the bilateral IFG/OFC [F(2, 22) = 24.3, p < .001 and F(2, 22) = 8.5, p < .01 for the right and left side, respectively] and for the right posterior MTG/STS [F(2, 22) = 20.9, p < .001]. Moreover, a significant main effect of emotion was also found for the bilateral IFG/OFC [F(1, 23) = 18.8, p < .001 and F(1, 23) = 8.6, p < .01 for the right and left side, respectively] and the right posterior MTG [F(1, 23) = 23.2, p < .001]. The interaction between emotion and repetition was marginally significant for the bilateral IFG/OFC [F(2, 22) = 3.4, p = .051 and F(2, 22) = 3.3, p = .06 for the right and left side, respectively]. No other first- or second-order interaction was significant in brain regions showing task-related differences in activation.

Figure 3. 

Brain regions showing task-driven effects (emotion classification > word classification) rendered on the surface of a brain template and a sagittal slice obtained at x = 6 with a threshold of p < .005 (uncorrected). Parameter estimates are presented for the right MTG/STS (upper left panel), the right dorsomedial prefrontal cortex (DMPFC; upper right panel), the right IFG/OFC (lower left panel), and the left IFG/OFC (lower right panel).

Comparison of event-related responses during the first and second presentations revealed widespread activations in cortical areas including the bilateral STG, the middle and inferior frontal cortex, the parietal and occipital cortex, as well as subcortical structures, such as the thalamus and the caudate nucleus (see Figure 4A, brain areas marked in red and yellow). Regions that showed additionally significant habituation of their hemodynamic responses between the second and third presentations were located in the bilateral inferior frontal and superior temporal cortex (see Figure 4A, brain areas marked in blue and light blue).

Figure 4. 

(A) Brain regions showing habituation between the first and second presentations (yellow/red) and additionally between the second and the third presentations (light blue/blue) rendered on the surface of a brain template and a transversal slice obtained at z = 3 with a threshold of p < .005 (uncorrected). (B) Brain regions showing an interaction between stimulus-driven effects and repetition ([angry prosody, first presentation > angry prosody, third presentation] > [neutral prosody, first presentation > neutral prosody, third presentation]) rendered on the surface of a brain template and a transversal slice obtained at z = −12 with a threshold of p < .005 (uncorrected). Parameter estimates are presented for the right OFC (lower left panel) and the left OFC (lower right panel).

A whole-brain analysis revealed that the right and left OFC were the only brain areas showing an interaction between emotion and repetition. This interaction was due to stronger repetition decreases for angry than for neutral prosody (see Figure 4B and Table 2). A similar whole-brain analysis on interactions between task and repetition did not yield any suprathreshold clusters.

DISCUSSION

The present neuroimaging study allowed us to disentangle the differential contributions of emotion, task instruction, and stimulus repetition (or novelty) to cerebral responses evoked by affective prosody. Brain imaging results converged with behavioral data to indicate differential effects of stimulus and task factors on voice processing.

Behavioral Results

We found reliable evidence for priming, with acceleration of reaction times between the first and second presentations during the behavioral prestudy. This acceleration of the subjects' responses did not interact with task instructions or the emotion expressed by prosody, in agreement with previous behavioral findings for facial expressions (Ishai et al., 2004). Moreover, similar hit rates and reaction times for the two tasks indicated a similar degree of difficulty. However, there was an interaction between emotion and task, reflecting the fact that subjects responded faster to words spoken in angry than in neutral prosody during emotion classification, whereas angry prosody resulted in longer reaction times during word classification. A possible explanation for this pattern might be that, in some cases, the emotional intention of a stimulus can already be inferred from the first few syllables, whereas the decision that a stimulus is neutral requires listening to the whole word before excluding any emotionality. Conversely, slower reaction times for angry than for neutral prosody during word classification might reflect an involuntary shift of the subjects' attention to the emotional information in the voice (Grandjean, Sander, Lucas, Scherer, & Vuilleumier, 2007; Wambacq, Shea-Miller, & Abubakr, 2004).

In general, the same pattern of behavioral responses and reliable priming effects were observed during fMRI scanning. Again, the facilitation of responses with repetition did not interact with task instructions or vocally expressed emotion. However, the difference in hit rates and reaction times between the two tasks was slightly more pronounced in the fMRI study, suggesting that the scanner noise might have impeded judgment of affective prosody more than judgment of word class. Furthermore, the interaction between emotion and task failed to reach significance in the fMRI study, mostly due to a lack of prolongation of reaction times to angrily spoken words during word classification. Importantly, however, hit rates for emotion discrimination were similar in the behavioral and the fMRI study (90.1% and 89.0%, respectively), indicating that this was not due to a general incomprehensibility of affective prosody in the presence of scanner noise. We speculate that the higher level of attention the subjects had to maintain during scanning to successfully complete the task made them less susceptible to involuntary shifts of attention induced by affective prosody.

Brain Areas Showing Stimulus-driven Activity

Brain regions responding more strongly to angry than to neutral prosody were situated bilaterally in the middle part of the STG, the IFG/OFC, the amygdala, the insula, and the mediodorsal thalamus. Stimulus-driven effects in the bilateral mid-STG strongly overlapped with sensitivity to human voices relative to animal voices or modern environmental sounds. These combined results confirm that a selective enhancement of neural responses by affective prosody occurs within voice-sensitive regions of the associative auditory cortex (Belin et al., 2000). The mid-STG has previously been reported to show stronger responses to a variety of emotional prosodic categories (Ethofer et al., 2007; Ethofer, Anders, Wiethoff, et al., 2006; Grandjean et al., 2005). As in previous studies, this enhancement was particularly prominent in the right hemisphere. However, in the present study, the left mid-STG also showed significant stimulus-driven effects that survived correction for multiple comparisons across the whole brain, suggesting that associative auditory cortices of both hemispheres are involved in decoding affective information from the voice, with the right-hemispheric areas showing only a relative superiority. This interpretation is supported by results from studies of patients with unilateral brain lesions, demonstrating that left-hemispheric lesions produce impairments in comprehension of affective prosody that are intermediate between healthy controls and patients with right-hemispheric lesions (Kucharska-Pietura, Phillips, Gernand, & David, 2003).

Remarkably, our results reveal that brain activity in the bilateral mid-STG was not influenced by task demands. In keeping with previous results (Ethofer, Anders, Wiethoff, et al., 2006), this insensitivity to task instructions accords with the suggestion that the mid-STG constitutes a relatively early stage of the perceptual analysis of voices, whose activity is mainly driven by stimulus features irrespective of whether the subjects' attention is focused on affective prosody or some other aspect of the stimuli (Wildgruber, Ackermann, Kreifelts, & Ethofer, 2006). Brain activity in the bilateral mid-STG showed clear repetition suppression effects; however, there was no effect of task instructions or emotion on habituation in this brain region.

Other brain regions that showed a similar response pattern, with clear sensitivity to stimulus-driven factors but insensitivity to task demands, and repetition suppression effects unaffected by either task or emotion, included the bilateral amygdala, insula, and mediodorsal thalami. Neuroimaging evidence for an emotional modulation of voice processing in these brain regions is less established than for the mid-STG. In particular, for the amygdala, conflicting results have been reported, with either increased (Phillips et al., 1998) or decreased (Morris et al., 1999) responses to fearful vocalizations (e.g., screams). However, the present study is in line with previous studies reporting stronger responses to angry relative to neutral prosody in the amygdala (Quadflieg et al., 2008; Sander et al., 2005) and the anterior insula (Quadflieg et al., 2008). Furthermore, increased activity to affective prosody within the mediodorsal thalamus for a variety of emotional categories including anger has been recently reported (Kreifelts, Ethofer, Grodd, Erb, & Wildgruber, 2007), suggesting that some enhancement of neural activity might already occur at early processing levels downstream to the auditory cortex.

The limited temporal resolution of fMRI impedes direct inference on the precise temporal distribution of brain activity. However, based on their shared functional properties including stimulus-driven brain activity, insensitivity to task demands, and marked repetition suppression without influences of task or emotion, we speculate that the mediodorsal thalamus, the amygdala, the insula, and the mid-STG constitute an early processing stage during recognition of vocal emotions, which responds to anger prosody in a largely automatic manner.

Brain Areas Showing Task-related Activity

Brain regions showing stronger responses during the emotional classification than during the word classification task were located in the right posterior MTG, in the bilateral IFG/OFC, and in the right medial prefrontal cortex. Task-related activation within the right posterior MTG is consistent with results from previous fMRI studies, in which participants made explicit judgments on the emotional category (Wildgruber et al., 2005; Mitchell et al., 2003) or emotional valence (Ethofer, Anders, Erb, Herbert, et al., 2006) of auditory stimuli. It has been suggested (Wildgruber et al., 2006) that activation in the right posterior MTG might reflect effortful extraction of suprasegmental features (Poeppel et al., 2004; Meyer, Alter, Friederici, Lohmann, & von Cramon, 2002; Zatorre, 2001) that are characteristic for emotional information in the voice.

Enhanced responses within the bilateral IFG/OFC during explicit categorization of affective prosody are also consistent with previous neuroimaging results (Ethofer, Anders, Erb, Herbert, et al., 2006; Wildgruber et al., 2004; Imaizumi et al., 1997), and with lesion studies demonstrating that damage to either the right or left IFG/OFC can impair comprehension of affective prosody (Hornak et al., 2003). It has been suggested that the involvement of the prefrontal cortex during processing of emotional prosody might reflect engagement of working memory components, because activity in the lateral prefrontal cortex increases with working memory load (Mitchell, 2007). However, this effect was not specific to emotional prosody because it occurred in a similar manner during lexico-semantic processing. A recent fMRI study demonstrated activation of an auditory–motor mirror system in the premotor cortex during passive listening to various affective vocalizations (Warren et al., 2006). No differential task- or stimulus-dependent responses were found in this brain region in the present study. Suppression of spontaneous orofacial gestures due to the explicit categorization task may offer a possible explanation for this discrepancy.

To our knowledge, no imaging study so far has indicated that the right medial prefrontal cortex is implicated in active judgment of affective prosody. Although this effect is therefore only reported descriptively here, it converges with the view that medial prefrontal regions are critically involved in more subjective appraisal of emotional significance of sensory events, as already found in the visual domain (Posner et al., 2008; Vuilleumier, Armony, & Dolan, 2003; Lane, Fink, Chau, & Dolan, 1997).

Reaction times obtained for the emotion classification task were slightly longer than those for the word classification task, raising the question of whether task-related effects might be biased by differences in task difficulty. However, using additional regression analyses, we were able to remove all variance correlating with reaction time and thus to demonstrate that, for all regions showing these differences, the task-related activation was still significantly larger for emotion than for word classification. Therefore, it is unlikely that different levels of difficulty can account for the task-related activations found in our study.

In addition to task-related differences, the right posterior MTG/STS and the bilateral IFG/OFC also exhibited stimulus-driven effects, with stronger responses to angry than to neutral prosody. Thus, these results lend support to the notion that neural activity within these cortical areas is not exclusively driven by task demands but is also modulated by stimulus properties. Furthermore, these findings are in agreement with previous reports on the right MTG/STS and the IFG/OFC showing differential responses to vocally expressed happiness and anger (Johnstone, van Reekum, Oakes, & Davidson, 2006). The presence of stimulus-driven effects within task-dependent brain areas, albeit at a lower level of significance than in the mid-STG, suggests that enhanced neural responses to affective prosody spread through the neural system beyond early cortical areas engaged by voice analysis, and that emotional features may also boost activity in higher-order cortical areas underlying explicit judgment of prosody.

General Effects of Stimulus Repetition and Specific Interactions between Repetition and Affective Prosody

The most significant decrease of hemodynamic responses across the three stimulus presentations was found within the left inferior frontal cortex, a region that has been reported to show repetition suppression effects for various types of stimuli including faces/scenes (Bunzeck et al., 2006), objects (Vuilleumier, Schwartz, Duhoux, Dolan, & Driver, 2005), visually presented words (Meister, Buelte, Sparing, & Boroojerdi, 2007), spoken words/pseudowords (Orfanidou et al., 2006), and spoken sentences (Dehaene-Lambertz et al., 2006; Hasson et al., 2006). A significant habituation of brain responses across the three presentations was also found within bilateral superior temporal cortices in keeping with studies reporting repetition suppression effects in the auditory cortex for repeated exposure to acoustic stimuli (Dehaene-Lambertz et al., 2006; Hasson et al., 2006; Saad et al., 2006). However, comparison of brain responses to the first and second stimulus presentations revealed widespread activations in cortical areas including bilateral sensory cortices in the temporal and occipital lobes, prefrontal and motor cortices, as well as subcortical structures. These findings strongly suggest that the phenomenon of repetition suppression is not restricted to sensory areas, but presumably occurs for many brain regions engaged by a certain stimulus in the context of a given task.

To disentangle habituation effects that are specific to processing of angry prosody, we presented our stimuli within a factorial design. Such factorial habituation designs represent an effective way of isolating habituation effects attributable to certain cognitive factors from other confounding components that might also show habituation (such as phonological, lexical, semantic, working memory, or motor components), as these confounding factors are eliminated in the interaction between repetition and the cognitive factor of interest. The only brain region that showed a specific habituation, with an interaction between repetition and the emotion expressed by prosody, was the bilateral OFC. This interaction reflected the fact that hemodynamic responses in the OFC strongly habituated for stimuli spoken in angry prosody, whereas habituation for neutral trials was much less pronounced. Thus, the OFC showed particularly strong responses to stimuli that were both novel and emotional, suggesting that this region may be essential for prompting the organism to evaluate and respond to stimuli with affective value when these have not been encountered before.

This functional property makes the OFC an ideal site for detection of socially relevant stimuli and adds to lesion studies suggesting that the OFC is critical for normal social functioning (Beer, John, Scabini, & Knight, 2006), especially for correct recognition of and appropriate reaction to the emotion of anger (Blair & Cipolotti, 2000). Furthermore, these novel results on vocally expressed anger are in line with a previous neuroimaging study (Blair et al., 1999) showing enhanced responsiveness of the OFC to angry relative to neutral faces. Taken together, these data point to a more general role of the OFC in the detection of negative emotional information irrespective of modality, a hypothesis that is anatomically well supported by the fact that the OFC receives highly processed afferents from all sensory cortices (for a review, see Rolls, 2004) and that has received recent support from an fMRI study demonstrating differential responses of this region to valence information expressed by word content (Alia-Klein et al., 2007).

In summary, our results shed new light on several functional properties of the brain regions implicated in the processing of affective prosody. The bilateral amygdala, insula, and mediodorsal thalami, as well as voice-sensitive regions in the mid-STG, demonstrated stimulus-driven effects with stronger responses to angry than to neutral prosody. These stimulus-driven effects were not influenced by task demands, suggesting that these brain regions represent a processing stage for identification of emotional cues in the voice that operates relatively independently of task-related control. In contrast, the right MTG/STS and the bilateral IFG/OFC showed significant task-related increases, but also responded more strongly to angry than to neutral prosody, indicating that some features from early processing levels are also encoded in higher-order cortices that subserve explicit recognition of affective prosody. Habituation of responses due to repeated stimulus presentation was observed in widespread cortical and subcortical regions, demonstrating that repetition suppression is not limited to perceptual stages but involves a distributed system recruited by processing of emotional voices. Importantly, the OFC was the only area that showed an emotion-specific habituation, with enhanced responses during initial presentations of angry prosody that strongly decreased across repetitions. This unique pattern of pronounced responses to stimuli that were both novel and emotional points to this region as a key structure for appraisal of socially relevant information and, together with previous findings for angry facial expressions, suggests a major role for the OFC in processing of anger across different sensory modalities. Increased OFC activations to vocal anger in patients with social phobia (Quadflieg et al., 2008) indicate that these findings might be of clinical relevance for diagnosis and, possibly, monitoring of treatment in patients with social disorders.

Acknowledgments

This study was supported by the Deutsche Forschungsgemeinschaft (Sonderforschungsbereich 550-B10).

Reprint requests should be sent to Thomas Ethofer, Laboratory for Behavioral Neurology & Imaging of Cognition, Department of Neurosciences and Clinic of Neurology, University Medical Center of Geneva, 1 Rue Michel-Servet, 1211 Geneva, Switzerland, or via e-mail: Thomas.Ethofer@medecine.unige.ch.

REFERENCES

Alia-Klein, N., Goldstein, R. Z., Tomasi, D., Zhang, L., Fagin-Jones, S., Telang, F., et al. (2007). What is in a word? No versus Yes differentially engage the lateral orbitofrontal cortex. Emotion, 7, 649–659.
Andersson, J. L., Hutton, C., Ashburner, J., Turner, R., & Friston, K. J. (2001). Modeling geometric deformations in EPI time series. Neuroimage, 13, 903–919.
Bach, D., Grandjean, D., Sander, D., Herdener, M., Strik, W. K., & Seifritz, E. (2008). The effect of appraisal level on processing of emotional prosody in meaningless speech. Neuroimage, 42, 919–927.
Beer, J. S., John, O. P., Scabini, D., & Knight, R. T. (2006). Orbitofrontal cortex and social behavior: Integrating self-monitoring and emotion–cognition interactions. Journal of Cognitive Neuroscience, 18, 871–879.
Belin, P., & Zatorre, R. J. (2003). Adaptation to speaker's voice in right anterior temporal lobe. NeuroReport, 14, 2105–2109.
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403, 309–312.
Blair, R. J., & Cipolotti, L. (2000). Impaired social response reversal. A case of "acquired sociopathy". Brain, 123, 1122–1141.
Blair, R. J., Morris, J. S., Frith, C. D., Perrett, D. I., & Dolan, R. J. (1999). Dissociable neural responses to facial expressions of sadness and anger. Brain, 122, 883–893.
Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5, 341–345.
Buckner, R. L., Petersen, S. E., Ojemann, J. G., Miezin, F. M., Squire, L. R., & Raichle, M. E. (1995). Functional anatomical studies of explicit and implicit memory retrieval tasks. Journal of Neuroscience, 15, 12–29.
Bunzeck, N., Schutze, H., & Duzel, E. (2006). Category-specific organization of prefrontal response-facilitation during priming. Neuropsychologia, 44, 1765–1776.
Cohen, L., Jobert, A., Le Bihan, D., & Dehaene, S. (2004). Distinct unimodal and multimodal regions for word processing in the left temporal cortex. Neuroimage, 23, 1256–1270.
Collins, D. L., Neelin, P., Peters, T. M., & Evans, A. C. (1994). Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. Journal of Computer Assisted Tomography, 18, 192–205.
Dehaene-Lambertz, G., Dehaene, S., Anton, J. L., Campagne, A., Ciuciu, P., Dehaene, G. P., et al. (2006). Functional segregation of cortical language areas by sentence repetition. Human Brain Mapping, 27, 360–371.
Ethofer, T., Anders, S., Erb, M., Droll, C., Royen, L., Saur, R., et al. (2006). Impact of voice on emotional judgment of faces: An event-related fMRI study. Human Brain Mapping, 27, 707–714.
Ethofer, T., Anders, S., Erb, M., Herbert, C., Wiethoff, S., Kissler, J., et al. (2006). Cerebral pathways in processing of affective prosody: A dynamic causal modeling study. Neuroimage, 30, 580–587.
Ethofer, T., Anders, S., Wiethoff, S., Erb, M., Herbert, C., Saur, R., et al. (2006). Effects of prosodic emotional intensity on activation of associative auditory cortex. NeuroReport, 17, 249–253.
Ethofer, T., Pourtois, G., & Wildgruber, D. (2006). Investigating audiovisual integration of emotional signals in the human brain. Progress in Brain Research, 156, 345–361.
Ethofer, T., Wiethoff, S., Anders, S., Kreifelts, B., Grodd, W., & Wildgruber, D. (2007). The voices of seduction: Cross-gender effects in processing of erotic prosody. Social Cognitive and Affective Neuroscience, 2, 334–337.
Fecteau, S., Belin, P., Joanette, Y., & Armony, J. L. (2007). Amygdala responses to nonlinguistic emotional vocalizations. Neuroimage, 36, 480–487.
Friston, K. J., Glaser, D. E., Henson, R. N., Kiebel, S., Phillips, C., & Ashburner, J. (2002). Classical and Bayesian inference in neuroimaging: Applications. Neuroimage, 16, 484–512.
Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J. P., Frith, C. D., & Frackowiak, R. S. J. (1994). Statistical parametric maps in neuroimaging: A general linear approach. Human Brain Mapping, 2, 189–210.
Glasser, M. F., & Rilling, J. K. (2008). DTI tractography of the human brain's language pathways. Cerebral Cortex. Epub ahead of print.
Grandjean, D., Sander, D., Lucas, N., Scherer, K. R., & Vuilleumier, P. (2007). Effects of emotional prosody on auditory extinction for voices in patients with spatial neglect. Neuropsychologia, 46, 487–496.
Grandjean, D., Sander, D., Pourtois, G., Schwartz, S., Seghier, M. L., Scherer, K. R., et al. (2005). The voices of wrath: Brain responses to angry prosody in meaningless speech. Nature Neuroscience, 8, 145–146.
Grill-Spector, K., Henson, R., & Martin, A. (2006). Repetition and the brain: Neural models of stimulus-specific effects. Trends in Cognitive Sciences, 10, 14–23.
Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., & Malach, R. (1999). Differential processing of objects under various viewing conditions in the human lateral occipital complex. Neuron, 24, 187–203.
Hasson, U., Nusbaum, H. C., & Small, S. L. (2006). Repetition suppression for spoken sentences and the effect of task demands. Journal of Cognitive Neuroscience, 18, 2013–2029.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293, 2425–2430.
Henson, R. (2000). Neuroimaging evidence for dissociable forms of repetition priming. Science, 287, 1269–1272.
Henson, R. N., & Mouchlianitis, E. (2007). Effect of spatial attention on stimulus-specific haemodynamic repetition effects. Neuroimage, 35, 1317–1329.
Henson, R. N., Shallice, T., Gorno-Tempini, M. L., & Dolan, R. J. (2002). Face repetition effects in implicit and explicit memory tests as measured by fMRI. Cerebral Cortex, 12, 178–186.
Hornak, J., Bramham, J., Rolls, E. T., Morris, R. G., O'Doherty, J., Bullock, P. R., et al. (2003). Changes in emotion after circumscribed surgical lesions of the orbitofrontal and cingulate cortices. Brain, 126, 1691–1712.
Imaizumi, S., Mori, K., Kiritani, S., Kawashima, R., Sugiura, M., Fukuda, H., et al. (1997). Vocal identification of speaker and emotion activates different brain regions. NeuroReport, 8, 2809–2812.
,
H.
, et al
(
1997
).
Vocal identification of speaker and emotion activates different brain regions.
NeuroReport
,
8
,
2809
2812
.
Ishai
,
A.
,
Pessoa
,
L.
,
Bikle
,
P. C.
, &
Ungerleider
,
L. G.
(
2004
).
Repetition suppression of faces is modulated by emotion.
Proceedings of the National Academy of Sciences, U.S.A.
,
101
,
9827
9832
.
Jancke
,
L.
,
Wustenberg
,
T.
,
Scheich
,
H.
, &
Heinze
,
H. J.
(
2002
).
Phonetic perception and the temporal cortex.
Neuroimage
,
15
,
733
746
.
Johnstone
,
T.
,
van Reekum
,
C. M.
,
Oakes
,
T. R.
, &
Davidson
,
R. J.
(
2006
).
The voice of emotion: An FMRI study of neural responses to angry and happy vocal expressions.
Social Cognitive and Affective Neuroscience
,
1
,
242
249
.
Kanwisher
,
N.
,
McDermott
,
J.
, &
Chun
,
M. M.
(
1997
).
The fusiform face area: A module in human extrastriate cortex specialized for face perception.
Journal of Neuroscience
,
17
,
4302
4311
.
Kreifelts
,
B.
,
Ethofer
,
T.
,
Grodd
,
W.
,
Erb
,
M.
, &
Wildgruber
,
D.
(
2007
).
Audiovisual integration of emotional signals in voice and face: An event-related fMRI study.
Neuroimage
,
37
,
1445
1456
.
Kucharska-Pietura
,
K.
,
Phillips
,
M. L.
,
Gernand
,
W.
, &
David
,
A. S.
(
2003
).
Perception of emotions from faces and voices following unilateral brain damage.
Neuropsychologia
,
41
,
1082
1090
.
Lane
,
R. D.
,
Fink
,
G. R.
,
Chau
,
P. M.
, &
Dolan
,
R. J.
(
1997
).
Neural activation during selective attention to subjective emotional responses.
NeuroReport
,
8
,
3969
3972
.
Li
,
L.
,
Miller
,
E. K.
, &
Desimone
,
R.
(
1993
).
The representation of stimulus familiarity in anterior inferior temporal cortex.
Journal of Neurophysiology
,
69
,
1918
1929
.
Meister
,
I. G.
,
Buelte
,
D.
,
Sparing
,
R.
, &
Boroojerdi
,
B.
(
2007
).
A repetition suppression effect lasting several days within the semantic network.
Experimental Brain Research
,
183
,
371
376
.
Meyer
,
M.
,
Alter
,
K.
,
Friederici
,
A. D.
,
Lohmann
,
G.
, &
von Cramon
,
D. Y.
(
2002
).
fMRI reveals brain regions mediating slow prosodic modulations in spoken sentences.
Human Brain Mapping
,
17
,
73
88
.
Mitchell
,
R. L.
(
2007
).
fMRI delineation of working memory for emotional prosody in the brain: Commonalities with the lexico-semantic emotion network.
Neuroimage
,
36
,
1015
1025
.
Mitchell
,
R. L.
,
Elliott
,
R.
,
Barry
,
M.
,
Cruttenden
,
A.
, &
Woodruff
,
P. W.
(
2003
).
The neural response to emotional prosody, as revealed by functional magnetic resonance imaging.
Neuropsychologia
,
41
,
1410
1421
.
Morris
,
J. S.
,
Scott
,
S. K.
, &
Dolan
,
R. J.
(
1999
).
Saying it with feeling: Neural responses to emotional vocalizations.
Neuropsychologia
,
37
,
1155
1163
.
Obleser
,
J.
,
Wise
,
R. J.
,
Alex Dresner
,
M.
, &
Scott
,
S. K.
(
2007
).
Functional integration across brain regions improves speech perception under adverse listening conditions.
Journal of Neuroscience
,
27
,
2283
2289
.
Oldfield
,
R. C.
(
1971
).
The assessment and analysis of handedness: The Edinburgh inventory.
Neuropsychologia
,
9
,
97
113
.
Orfanidou
,
E.
,
Marslen-Wilson
,
W. D.
, &
Davis
,
M. H.
(
2006
).
Neural response suppression predicts repetition priming of spoken words and pseudowords.
Journal of Cognitive Neuroscience
,
18
,
1237
1252
.
Phillips
,
M. L.
,
Young
,
A. W.
,
Scott
,
S. K.
,
Calder
,
A. J.
,
Andrew
,
C.
,
Giampietro
,
V.
, et al
(
1998
).
Neural responses to facial and vocal expressions of fear and disgust.
Proceedings of the Royal Society of London, Series B, Biological Sciences
,
265
,
1809
1817
.
Poeppel
,
D.
,
Guillemin
,
A.
,
Thompson
,
J.
,
Fritz
,
J.
,
Bavelier
,
D.
, &
Braun
,
A. R.
(
2004
).
Auditory lexical decision, categorical perception, and FM direction discrimination differentially engage left and right auditory cortex.
Neuropsychologia
,
42
,
183
200
.
Posner
,
J.
,
Russell
,
J. A.
,
Gerber
,
A.
,
Gorman
,
D.
,
Colibazzi
,
T.
,
Yu
,
S.
, et al
(
2008
).
The neurophysiological bases of emotion: An fMRI study of the affective circumplex using emotion-denoting words.
Human Brain Mapping.
Epub ahead of print.
Purdon
,
P. L.
, &
Weisskoff
,
R. M.
(
1998
).
Effect of temporal autocorrelation due to physiological noise and stimulus paradigm on voxel-level false-positive rates in fMRI.
Human Brain Mapping
,
6
,
239
249
.
Quadflieg
,
S.
,
Mohr
,
A.
,
Mentzel
,
H. J.
,
Miltner
,
W. H.
, &
Straube
,
T.
(
2008
).
Modulation of the neural network involved in the processing of anger prosody: The role of task-relevance and social phobia.
Biological Psychology
,
78
,
129
137
.
Rolls
,
E. T.
(
2004
).
The functions of the orbitofrontal cortex.
Brain and Cognition
,
55
,
11
29
.
Saad
,
Z. S.
,
Chen
,
G.
,
Reynolds
,
R. C.
,
Christidis
,
P. P.
,
Hammett
,
K. R.
,
Bellgowan
,
P. S.
, et al
(
2006
).
Functional imaging analysis contest (FIAC) analysis according to AFNI and SUMA.
Human Brain Mapping
,
27
,
417
424
.
Sander
,
D.
,
Grandjean
,
D.
,
Pourtois
,
G.
,
Schwartz
,
S.
,
Seghier
,
M. L.
,
Scherer
,
K. R.
, et al
(
2005
).
Emotion and attention interactions in social cognition: Brain regions involved in processing anger prosody.
Neuroimage
,
28
,
848
858
.
Sobotka
,
S.
, &
Ringo
,
J. L.
(
1996
).
Mnemonic responses of single units recorded from monkey inferotemporal cortex, accessed via transcommissural versus direct pathways: A dissociation between unit activity and behavior.
Journal of Neuroscience
,
16
,
4222
4230
.
Sokolov
,
E. N.
(
1963
).
Higher nervous functions; the orienting reflex.
Annual Review of Physiology
,
25
,
545
580
.
Soon
,
C. S.
,
Venkatraman
,
V.
, &
Chee
,
M. W.
(
2003
).
Stimulus repetition and hemodynamic response refractoriness in event-related fMRI.
Human Brain Mapping
,
20
,
1
12
.
Tulving
,
E.
, &
Schacter
,
D. L.
(
1990
).
Priming and human memory systems.
Science
,
247
,
301
306
.
Tzourio-Mazoyer
,
N.
,
Landeau
,
B.
,
Papathanassiou
,
D.
,
Crivello
,
F.
,
Etard
,
O.
,
Delcroix
,
N.
, et al
(
2002
).
Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain.
Neuroimage
,
15
,
273
289
.
Vuilleumier
,
P.
,
Armony
,
J. L.
, &
Dolan
,
R. J.
(
2003
).
Reciprocal links between emotion and attention.
In: R. S. J. Frackowiak, J. Ashburner, W. Penny, S. Zeki, K. J. Friston, C. D. Frith, et al. (Eds.),
Human brain function
(2nd ed., pp.
419
444
).
San Diego
:
Academic Press
.
Vuilleumier
,
P.
,
Schwartz
,
S.
,
Duhoux
,
S.
,
Dolan
,
R. J.
, &
Driver
,
J.
(
2005
).
Selective attention modulates neural substrates of repetition priming and “implicit” visual memory: Suppressions and enhancements revealed by fMRI.
Journal of Cognitive Neuroscience
,
17
,
1245
1260
.
Wambacq
,
I. J.
,
Shea-Miller
,
K. J.
, &
Abubakr
,
A.
(
2004
).
Non-voluntary and voluntary processing of emotional prosody: An event-related potentials study.
NeuroReport
,
15
,
555
559
.
Warren
,
J. E.
,
Sauter
,
D. A.
,
Eisner
,
F.
,
Wiland
,
J.
,
Dresner
,
M. A.
,
Wise
,
R. J.
, et al
(
2006
).
Positive emotions preferentially engage an auditory–motor “mirror” system.
Journal of Neuroscience
,
26
,
13067
13075
.
Wiethoff
,
S.
,
Wildgruber
,
D.
,
Kreifelts
,
B.
,
Becker
,
H.
,
Herbert
,
C.
,
Grodd
,
W.
, et al
(
2008
).
Cerebral processing of emotional prosody—Influence of acoustic parameters and arousal.
Neuroimage
,
39
,
885
893
.
Wildgruber
,
D.
,
Ackermann
,
H.
,
Kreifelts
,
B.
, &
Ethofer
,
T.
(
2006
).
Cerebral processing of linguistic and emotional prosody: fMRI studies.
Progress in Brain Research
,
156
,
249
268
.
Wildgruber
,
D.
,
Hertrich
,
I.
,
Riecker
,
A.
,
Erb
,
M.
,
Anders
,
S.
,
Grodd
,
W.
, et al
(
2004
).
Distinct frontal regions subserve evaluation of linguistic and emotional aspects of speech intonation.
Cerebral Cortex
,
14
,
1384
1389
.
Wildgruber
,
D.
,
Pihan
,
H.
,
Ackermann
,
H.
,
Erb
,
M.
, &
Grodd
,
W.
(
2002
).
Dynamic brain activation during processing of emotional intonation: Influence of acoustic parameters, emotional valence, and sex.
Neuroimage
,
15
,
856
869
.
Wildgruber
,
D.
,
Riecker
,
A.
,
Hertrich
,
I.
,
Erb
,
M.
,
Grodd
,
W.
,
Ethofer
,
T.
, et al
(
2005
).
Identification of emotional intonation evaluated by fMRI.
Neuroimage
,
24
,
1233
1241
.
Worsley
,
K. J.
,
Marrett
,
S.
,
Neelin
,
P.
,
Vandal
,
A. C.
,
Friston
,
K. J.
, &
Evans
,
A. C.
(
1996
).
A unified statistical approach for determining significant signals in images of cerebral activation.
Human Brain Mapping
,
4
,
74
90
.
Yi
,
D. J.
,
Kelley
,
T. A.
,
Marois
,
R.
, &
Chun
,
M. M.
(
2006
).
Attentional modulation of repetition attenuation is anatomically dissociable for scenes and faces.
Brain Research
,
1080
,
53
62
.
Zatorre
,
R. J.
(
2001
).
Neural specializations for tonal processing.
Annals of the New York Academy of Sciences
,
930
,
193
210
.