Abstract

The embodied view of language processing proposes that comprehension involves multimodal simulations, a process that retrieves a comprehender's perceptual, motor, and affective knowledge through reactivation of the neural systems responsible for perception, action, and emotion. Although evidence in support of this idea is growing, the contemporary neuroanatomical model of language suggests that comprehension largely emerges as a result of interactions between frontotemporal language areas in the left hemisphere. If modality-specific neural systems are involved in comprehension, they are not likely to operate in isolation but should interact with the brain regions critical to language processing. However, little is known about the ways in which language and modality-specific neural systems interact. To investigate this issue, we conducted a functional MRI study in which participants listened to stories that contained visually vivid, action-based, and emotionally charged content. Activity of neural systems associated with visual-spatial, motor, and affective processing was selectively modulated by the relevant story content. Importantly, when functional connectivity patterns associated with the left inferior frontal gyrus (LIFG), the left posterior middle temporal gyrus (pMTG), and the bilateral anterior temporal lobes (aTL) were compared, both LIFG and pMTG, but not the aTL, showed enhanced connectivity with the three modality-specific systems relevant to the story content. Taken together, our results suggest that language regions are engaged in perceptual, motor, and affective simulations of the described situation, which manifest through their interactions with modality-specific systems. On the basis of our results and past research, we propose that the LIFG and pMTG play unique roles in multimodal simulations during story comprehension.

INTRODUCTION

According to the embodied view of language, comprehension involves multimodal simulations in which the comprehender's contextually relevant perceptual, motor, and emotional knowledge is retrieved through partial reactivation of modality-specific brain systems (Barsalou, 2003). These systems are associated with distinct neural substrates and are thought to be specialized for perceptual, motor, and affective processing. Brain imaging studies have shown that regions within these neural systems are selectively activated during sentence processing (e.g., Desai, Binder, Conant, & Seidenberg, 2010) and story comprehension (Speer, Reynolds, Swallow, & Zacks, 2009; Ferstl & von Cramon, 2007; Ferstl, Rinck, & von Cramon, 2005) in response to different types of content, which suggests their involvement in processing the meaning of language. Neuropsychological research has further demonstrated that modality-specific systems are not simply associated with, but may be directly involved in, semantic processing (Willems, Labruna, D'Esposito, Ivry, & Casasanto, 2011; Boulenger et al., 2008; Neininger & Pulvermuller, 2003). Nevertheless, the precise mechanism by which these systems facilitate semantic processing is unclear (Meteyard, Cuadrado, Bahrami, & Vigliocco, 2012; Mahon & Caramazza, 2008).

At the same time, decades of research on the neuroscience of language have shown that there exists a network of brain regions consistently activated and functionally connected during language comprehension, regardless of content (Binder & Desai, 2011; Lau, Phillips, & Poeppel, 2008; Tyler & Marslen-Wilson, 2008; Vigneau et al., 2006; Price, 2000). Furthermore, there is substantial evidence demonstrating that the activation, manipulation, and integration of semantic information are associated with a set of regions within this network. These include the classical core language regions: the left inferior frontal gyrus (LIFG; Lau et al., 2008; Badre, Poldrack, Pare-Blagoev, Insler, & Wagner, 2005; Hagoort, 2005) and the left posterior middle temporal gyrus (pMTG; Turken & Dronkers, 2011; Hickok & Poeppel, 2007). In addition to these two core language regions, recent studies have shown that the bilateral anterior temporal lobes (aTL) play an important role in language processing, particularly semantic integration (Lambon Ralph, Sage, Jones, & Mayberry, 2010; Patterson, Nestor, & Rogers, 2007; Jung-Beeman, 2005). Together, the LIFG, pMTG, and aTL are not only engaged in processing the meaning of words and sentences but are also consistently activated during discourse comprehension (Mar, 2011; Ferstl, Neumann, Bogler, & von Cramon, 2008). If modality-specific neural systems play an integral role in processing the meaning of language, they are not likely to operate in isolation but should instead work in concert with these language regions.

Recent theories have proposed that an additional set of heteromodal areas may play an important role in discourse comprehension. These include the medial prefrontal cortex (MPFC), posterior cingulate cortex (PCC), and bilateral inferior parietal lobules (IPL; Binder & Desai, 2011; Ferstl et al., 2008; Mason & Just, 2006). However, unlike the language regions we defined in the previous paragraph, it is unclear whether these areas are directly involved in building up modality-specific representations or whether they serve other functions such as monitoring coherence. Moreover, this network has also been associated with a wide variety of other cognitive functions such as theory of mind and prospection (e.g., Buckner, Andrews-Hanna, & Schacter, 2008). Considering the functional uncertainty that surrounds this network of regions and the fact that we know little about how language and modality-specific neural systems relate during narrative comprehension, we focused our investigation on the LIFG, pMTG, and aTL in this study.

To examine the interactions between the language regions and modality-specific systems, we used fMRI to evaluate participants' responses while they listened to stories. These stories included paragraphs containing visually vivid, action-based, or emotionally charged content. The three content types were manipulated in the first, second, and third paragraphs of a story, respectively. We chose to map the content types onto the paragraph structure in this way because it fits with the canonical narrative structure, which begins with a setting, describes actions, and ends with an emotional resolution. To rule out the possibility that any effect of story content could be driven by paragraph order, we included control paragraphs for each of the three paragraph positions. These control paragraphs contained minimal description of perceptual, action, or emotional content. Comparing the vivid perception, action, or emotion paragraphs to the sequence-matched control paragraphs makes it possible to remove the potential effect of sequence (see Methods for details). Given these controls, we expected that visually vivid, action-based, or emotionally charged content would selectively engage neural systems devoted to visual-spatial, motor, and affective processing, respectively. Crucially, with these activations identified, we employed connectivity methods to examine the relationships between language regions and these modality-specific regions.

More specifically, we hypothesized that the language ROIs (i.e., LIFG, pMTG, and aTL) would be activated independent of the type of content being processed, but that their interactions with the modality-specific systems would be differentially enhanced in a content-dependent fashion. In other words, interactions between the language regions and visual-spatial, motor, and affective systems should be selectively modulated by the visually vivid, action-based, or emotionally charged content. More importantly, we expected that the language regions might interact with the modality-specific systems in different ways, perhaps reflecting their distinct functional roles in simulating sensorimotor or emotional information. Observing and identifying these differences may make it possible to incorporate the functions of modality-specific neural systems into the traditional account of language processing.

METHODS

Participants

Participants were 24 native speakers of English (11 women and 13 men; mean age = 25.5 years, SD = 2.7 years, range = 21–33 years) with no history of neurological or psychiatric disease. All were right-handed according to the Edinburgh Handedness Inventory (Oldfield, 1971). Informed written consent was obtained from each participant in accordance with the protocol approved by the NIH CNS Institutional Review Board, and participants were compensated for their participation. The data from two additional participants were excluded because of poor performance on the comprehension task administered after each fMRI run (response accuracies below 75%, more than three SDs below the mean accuracy of the remaining participants: 88.5%, SD = 4.3%), indicating that they might not have paid attention to the stories.

Story Stimuli

Our stimuli consisted of 18 stories written by the authors. Each story consisted of three paragraphs, conforming to a canonical narrative structure: (1) the introduction and description of the setting and the main protagonist, (2) the unfolding of narrative events, and (3) the outcomes and reactions of the protagonist. Because the three content types of interest fit naturally with this structure, story content was manipulated as follows. Twelve stories contained vivid descriptions of a scene in the first paragraph (Perception condition), 12 stories contained rich descriptions of a protagonist's actions in the second paragraph (Action condition), and 12 stories vividly described a happy or sad conclusion in the third paragraph (Emotion condition). In 12 stories, one or two control paragraphs served as Control conditions; across the stimulus set, control paragraphs appeared in all three paragraph positions. These control paragraphs described settings, portrayed typical sequences of everyday events, or presented a coherent resolution of the actions, but contained only muted descriptions of visual, motor, or emotional content. The control paragraphs were therefore coherent with the rest of the story, conformed to the canonical narrative structure, and ensured that a discourse-level context accumulated, but without vivid description of the three content types of interest. They also served as fillers that limited participants' expectations about content type. The experimental conditions and associated topics for each story are listed in Table 1. Example stories are shown in Table 2, and the full set of stimuli is available on-line at www.nidcd.nih.gov/research/scientists/pages/brauna.aspx.

Additionally, three traditional nursery rhymes (i.e., Mary Had a Little Lamb, Jack Be Nimble, and Humpty Dumpty), to which all participants had been exposed earlier in life, were presented. Like the stories, these contain phonological, lexical, and syntactic structure, making it possible to control for low-level language processing. It is important to point out, however, that this baseline condition has some shortcomings. For one, nursery rhymes might be considered a type of short narrative, so there may have been excessive control for the creation of a discourse model (i.e., both the experimental and control conditions have narrative elements). Moreover, the nursery rhymes differed from our experimental narratives in their familiarity, their length, and the fact that they are presented in verse. Our reasoning, however, was that the familiar and overlearned nature of these nursery rhymes, which were repeated multiple times in the experiment to match the duration of the story stimuli, would make it unlikely that rich discourse models would be created for them. The nursery rhyme baseline should therefore allow us to identify language regions that participate in the creation of such models. In light of the aforementioned concerns, however, the contrast between the experimental stories and nursery rhymes should be interpreted with some caution.

Table 1. 

The Experimental Conditions and Topics of the Stories

Each story is listed with the condition (and topic) of its first, second, and third paragraphs; control paragraphs had no designated topic.

Story 1: Perception (Theatre hall); Action (Playing the piano); Emotion (Winning a prize on stage)
Story 2: Perception (New York City); Action (Renovating apartment); Emotion (Confrontation with neighbors)
Story 3: Perception (Paris); Action (Playing with a toddler); Control
Story 4: Perception (Basketball stadium); Action (Playing basketball); Emotion (Winning a game)
Story 5: Perception (Mountain ranges); Action (Jogging); Emotion (Being robbed)
Story 6: Perception (Decorative church); Action (Conducting music); Control
Story 7: Control; Action (Baking); Emotion (Getting compliments from others)
Story 8: Control; Action (Watching a football game); Emotion (Being dumped by your romantic partner)
Story 9: Control; Action (Painting walls); Control
Story 10: Control; Action (Fixing TV and cables); Emotion (Getting compliments from your boss)
Story 11: Control; Action (Dancing); Emotion (Getting injured in a car accident)
Story 12: Control; Action (Assembling furniture); Control
Story 13: Perception (Disco); Control; Emotion (A close relative dying in a hospital)
Story 14: Perception (3-D movie); Control; Emotion (Meeting personal idol)
Story 15: Perception (Biochemical lab); Control; Control
Story 16: Perception (London); Control; Emotion (Being cheated on by your romantic partner)
Story 17: Perception (Disneyland); Control; Emotion (Marriage proposal)
Story 18: Perception (Los Angeles); Control; Control

Table 2. 

Example Story Stimuli

Story 1

Perception: From her spot backstage, Hannah caught a glance of the spacious theater hall where the music competition was being held. The hall's ornamented balconies hung over the heads of those sitting on the lower level. The buzz of their talking was loud, and yet they seemed small, almost miniature, in their seats. She heard orchestra players tuning their instruments and felt butterflies in her stomach. It was the first time she would be competing while accompanied by an entire orchestra.

Action: Hannah slowly walked out onto the stage. The theater-goers abruptly fell silent, before clapping their hands enthusiastically. She sat at the piano stool, placing her hands on the piano keys, ready to begin. The conductor waved both his hands, vigorously, and the orchestra started playing. Her fingers moved swiftly across the keys and her feet lightly pressed the piano pedals. Her body swayed gently back and forth, as if the very sound of the music was moving her.

Emotion: Only when she was in the backstage room did Hannah let herself feel the joy. With the thunderous applause still in her ears, she felt a wave of pleasure overcome her. Before long, she was being called back to the stage by the judges and she walked out unsteadily, her face flushed with excitement. With tears of joy in her eyes, she accepted a large bouquet of flowers and a check that was awarded to her. She felt the thrill of victory.

Story 9

Control: Two weeks of vacation time were almost gone before Alice got around to her home renovations. The walls of the apartment where she had been living for several years were in disrepair. Still, it was so easy to busy herself with other things—vacation time was always too short to do everything that needed to be done. But with only a few days left of her vacation, she was finally going to get to work fixing up her place.

Action: Walking through the local hardware store, Alice grabbed supplies and dropped them in her basket. Then, at home, she pushed the furniture into the center of each room and threw large plastic sheets over each piece. She then laid down tape along the edges of the walls to prevent smudges on the floorboards. Finally, she started painting—first slowly dabbing the corners with special brushes, then more rapidly rolling paint on large parts of the wall with her rollers.

Control: In the evening, Alice was about half-way done with painting. She left all the windows open to air the place out. She took a shower, got dressed, and went out to find some food. She ate at one of the local restaurants, only 5 min away from her home. When she got back home, the apartment still smelled of paint and was chilly too. Alice called up her best friend Jane and asked to crash at her place.

Stories contained perceptual content in the first paragraph, action content in the second, and emotional content in the third. As a form of control, some stories had control paragraphs with minimal perception, action, or emotion content (e.g., Story 9, Paragraphs 1 and 3). The full set of stimuli is available on-line at www.nidcd.nih.gov/research/scientists/pages/brauna.aspx.

Stories were constructed so that various linguistic features were closely matched across all conditions. These features included the number of syllables per paragraph, the number of words per paragraph, the number of words per sentence, the Flesch-Kincaid readability score per paragraph (Kincaid, Fishburne, Rogers, & Chissom, 1975), the average word frequency (occurrences per million words) of content words per paragraph, and the average imageability rating (the likelihood of a word evoking mental images) of content words per paragraph. For word frequency and imageability, only open-class words available in the MRC Psycholinguistic Database (Coltheart, 1981) were considered, to exclude the extreme values associated with closed-class words that often bias such measures. Summary statistics for the paragraphs in each condition are listed in Table 3. For each linguistic feature, the difference between the Perception, Action, and Emotion conditions was less than 10%, and no comparison among the three conditions was significant (t tests, Bonferroni-corrected p < .05), except for readability between the Perception and Emotion conditions (the Emotion material being somewhat easier to read). This measure was therefore included in the general linear model (GLM) analysis as a covariate to remove its potential effects from our primary interest in story content (see Activation Analysis below).

Table 3. 

Mean and Standard Deviation (in Parentheses) of the Linguistic Measures of the Paragraphs in Each Condition

Condition | Number of Syllables | Number of Words | Words per Sentence | Readability Index (a) | Average Word Frequency (b) | Average Word Imageability Rating (c)
Perception | 101.0 (8.1) | 77.1 (2.7) | 16.7 (2.0) | 69.3 (7.5) (d) | 109.5 (20.0) | 506.8 (18.2)
Action | 109.4 (6.6) | 78.3 (2.0) | 16.9 (2.4) | 71.4 (7.1) | 124.9 (30.7) | 473.4 (33.0)
Emotion | 105.6 (5.4) | 77.9 (3.0) | 15.6 (2.1) | 76.3 (5.0) | 133.0 (30.4) | 490.5 (22.1)
Control | 108.3 (6.9) | 76.7 (2.1) | 16.2 (3.0) | 70.9 (8.6) | 209.8 (67.9) (e) | 455.7 (33.9) (e)

(a) The readability index indicates the comprehension difficulty of a text (Kincaid et al., 1975). Higher values indicate that a text is easier to read; an index of 60–70 corresponds approximately to an eighth-grade level.

(b) Occurrences in a corpus of 1.014 million words (Kucera & Francis, 1967). Only open-class words available in the MRC Psycholinguistic Database (Coltheart, 1981) were considered.

(c) Rating of the likelihood of a word evoking mental images (range 100–700). Only open-class words available in the MRC Psycholinguistic Database (Coltheart, 1981) were considered.

(d) Indicates a significant difference (p < .05) from the Emotion condition.

(e) Indicates that the Control condition is significantly different (p < .05) from all the other conditions.
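
The readability index in Table 3 behaves like the Flesch Reading Ease score (higher values mean easier text, with 60–70 corresponding roughly to an eighth-grade level). Assuming that interpretation, the sketch below shows one way such a score could be computed for a paragraph; the crude syllable counter is a hypothetical stand-in, not the tooling the authors actually used.

```python
# Minimal sketch of a Flesch Reading Ease calculation (higher = easier to read).
# The syllable counter below is a crude heuristic used only for illustration.
import re

def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups; a dictionary-based counter would be more accurate."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

print(flesch_reading_ease("She sat at the piano stool. The orchestra started playing."))
```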

The stories and nursery rhymes were spoken by a male native English speaker (aged 35) in a neutral tone at a speech rate of approximately 155 words per minute. All stimuli were recorded digitally in a sound-isolated booth at a sampling rate of 44.1 kHz. For each story paragraph and nursery rhyme, the duration was normalized to exactly 30 sec and the loudness was equalized using audio editing software (Audition, Adobe Systems, Inc.). Because each story contained three paragraphs, the total duration of each story was 90 sec.

Stimulus Norming

To confirm that story content had been manipulated as intended, the story paragraphs were normed in a prestudy with 19 participants (3 men and 16 women, aged 22–34 years). None of these participants were scanned in the subsequent fMRI study. Individuals were asked to rate the content of each story segment on the three dimensions we manipulated: (1) vividness of the described scene, (2) vividness of the described bodily action, and (3) intensity of the protagonists' emotion. Ratings were provided on a scale of 1 (not vivid/intense) to 5 (highly vivid/intense). The responses were analyzed using a repeated-measures ANOVA with Content Type and Rated Dimension as factors. Both Content Type, F(2, 36) = 22.0, p < .05, and Rated Dimension, F(2, 36) = 32.0, p < .05, were statistically significant. Importantly, their interaction was also significant, F(4, 144) = 74.2, p < .05. To characterize these effects further, t tests were performed to compare ratings between pairs of content types, with p values corrected for multiple comparisons using the Bonferroni procedure. The results are listed in Table 4. Importantly, the comparisons between the Perception, Action, and Emotion story segments showed that, for each condition, only the designated dimension was rated significantly more vivid than in the other two conditions (p < .05 in all cases). The participants in the fMRI experiment also rated the stories using the same scales after scanning, and their ratings were congruent with the norming study: for each condition, only the designated dimension was rated significantly more vivid than in both of the other two conditions (p < .05 in all cases).

Table 4. 

Mean and Standard Error (in Parentheses) of Vividness Ratings in Each Condition

Condition | Vividness of the Described Scene | Vividness of the Described Bodily Action | Intensity of Emotion
Perception | 4.2 (0.1)* (a) | 3.1 (0.1) | 2.7 (0.1) (b)
Action | 3.6 (0.2) | 4.1 (0.1)* (a) | 2.2 (0.1)
Emotion | 3.5 (0.2) | 3.5 (0.1) (c) | 4.4 (0.1)* (a)
Control | 2.9 (0.2) (a) | 2.7 (0.1) (a) | 2.1 (0.1) (a)

Asterisks mark the ratings that were expected to be significantly higher than in the other conditions.

(a) Indicates significant differences from all the other conditions.

(b) Indicates a significant difference (p < .05) from the Action condition.

(c) Indicates a significant difference (p < .05) from the Perception condition.
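
As an illustration of the kind of repeated-measures model used for the norming ratings above, the sketch below fits a two-way within-subject ANOVA with statsmodels. The long-format column names (subject, content_type, rating_dim, rating) and the input file are hypothetical placeholders, not the authors' actual analysis script.

```python
# Sketch of a repeated-measures ANOVA on norming ratings (hypothetical column names).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One row per subject x content type x rated dimension, e.g. loaded from a file
ratings = pd.read_csv("norming_ratings.csv")  # columns: subject, content_type, rating_dim, rating

aov = AnovaRM(
    data=ratings,
    depvar="rating",
    subject="subject",
    within=["content_type", "rating_dim"],
).fit()
print(aov.anova_table)  # F and p values for the main effects and the interaction
```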

Experimental Design

All 18 stories were presented in random order without repetition across the six runs of an fMRI session. In each run, three stories and one nursery rhyme were presented binaurally via a SilentScan 3100 pneumatic headphone system (Avotec, Stuart, FL). To allow the hemodynamic response to return to baseline, paragraphs within a story were separated by a 12-sec interval; between stories, or between a story and a nursery rhyme block, the interval was 16 sec. Participants were instructed to listen to the stories carefully and to do nothing during the interstimulus intervals. To assess whether participants had paid attention to the stimuli, they were asked to answer three yes–no comprehension questions after each run, responding via button press.

Data Acquisition

Whole-brain gradient-echo EPI data were acquired on a 3-T GE Signa scanner with an eight-channel head coil (repetition time = 1500 msec, echo time = 30 msec, flip angle = 90°, 64 × 64 matrix, field of view = 224 mm, 30 sagittal slices acquired in interleaved order, slice thickness = 5 mm, acceleration factor = 2). High-resolution anatomical images were acquired using MP-RAGE (voxel size = 0.81 × 0.81 × 1.0 mm).

Data Analysis

fMRI Data Preprocessing

Using AFNI (Cox, 1996), the functional images of each participant were aligned to the first images of each run on a slice-by-slice basis, timing differences in the acquisition of each slice were corrected, and each volume was aligned to the first volume of the first run. To remove physiological and motion-related artifacts, spatial independent component analysis (ICA; McKeown et al., 1998) was applied to the motion- and slice-time-corrected functional data of each participant using the GIFT toolbox (mialab.mrn.org/software/gift/index.html). In spatial ICA, each functional image is treated as a mixture of multiple spatially independent signal and noise components. The number of components in each data set was estimated using the minimum description length (MDL) criterion (Li, Adali, & Calhoun, 2007). Classification of components as artifactual or neuronal was based on their degree of spatial clustering, the location of major positively weighted clusters, and the neighborhood connectedness between positively and negatively weighted clusters. Using these criteria, noise components were identified by human experts, and their variance was then subtracted from the original data set. The functional and anatomical images of each participant were coregistered and normalized to the stereotaxic space of the Montreal Neurological Institute (MNI) using SPM8 (www.fil.ion.ucl.ac.uk/spm/); all coordinates are reported in MNI space. The functional images were resampled to a voxel size of 3 × 3 × 3 mm and smoothed with an isotropic 6-mm FWHM Gaussian kernel.
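
The ICA denoising itself was performed with GIFT. Purely to illustrate the general logic (decompose each run into spatially independent components, flag artifactual ones, and reconstruct the data without them), a minimal sketch using scikit-learn's FastICA is given below. The file name, component count, and noise-component indices are placeholders, and the expert, criteria-based classification described above is reduced to a hard-coded list.

```python
# Illustrative spatial-ICA denoising sketch (not the GIFT pipeline used in the study).
import numpy as np
from sklearn.decomposition import FastICA

# data: (n_timepoints, n_voxels) matrix for one run, after slice-time and motion correction
data = np.load("run1_timeseries.npy")        # hypothetical file name
n_components = 30                            # in the study this was chosen by the MDL criterion

# Spatial ICA: treat voxels as samples so the estimated components are spatial maps
ica = FastICA(n_components=n_components, random_state=0)
spatial_maps = ica.fit_transform(data.T)     # (n_voxels, n_components)
time_courses = ica.mixing_                   # (n_timepoints, n_components)

# Placeholder for the expert classification based on spatial clustering and location criteria
noise_idx = [3, 7, 12]                       # hypothetical indices of artifactual components
keep = [i for i in range(n_components) if i not in noise_idx]

# Reconstruct the run without the noise components and restore (time x voxel) orientation
denoised = (spatial_maps[:, keep] @ time_courses[:, keep].T + ica.mean_).T
```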

Activation Analysis

The activation analysis was conducted within the framework of the GLM implemented in SPM8. For the subject-level analysis, separate regressors were constructed by convolving the box-car function of each condition with the canonical hemodynamic response function. The model also included regressors of no interest to account for variance due to low-frequency (<1/128 Hz) scanner drift and global signal fluctuations. Additionally, to focus on our primary interest in the content types, we removed potential effects of no interest associated with the auditory presentation: the means and standard deviations of the fundamental frequency and intensity of the auditory stimuli, along with the readability index of each paragraph, were included as individual regressors. The estimated hemodynamic responses for each condition and participant were entered into a repeated-measures ANOVA model, implemented in SPM8, to make statistical inferences at the population level (i.e., a random-effects analysis). Contrasts between conditions were computed using paired t tests.
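
As a schematic of how one column of such a design matrix is formed, the sketch below convolves a 30-sec boxcar for one condition with a simple double-gamma hemodynamic response function. The onsets, run length, and HRF parameters are illustrative rather than taken from the authors' SPM8 model.

```python
# Sketch of one GLM regressor: a condition boxcar convolved with a double-gamma HRF.
import numpy as np
from scipy.stats import gamma

TR = 1.5                                     # repetition time in seconds (as in the study)
n_scans = 400                                # hypothetical run length in volumes
frame_times = np.arange(n_scans) * TR

def double_gamma_hrf(t):
    """Positive response peak followed by a smaller, later undershoot (SPM-like shape)."""
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

# Boxcar for one condition: hypothetical onsets (s), each paragraph lasting 30 s
onsets, duration = [20.0, 150.0, 280.0], 30.0
boxcar = np.zeros(n_scans)
for onset in onsets:
    boxcar[(frame_times >= onset) & (frame_times < onset + duration)] = 1.0

hrf = double_gamma_hrf(np.arange(0, 32, TR))
regressor = np.convolve(boxcar, hrf)[:n_scans]   # one column of the subject-level design matrix
```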

To identify the regions that responded selectively to each content type, we used a conjunction analysis similar to that described in a previous study (Simmons, Reddish, Bellgowan, & Martin, 2010). Selective activation was defined as a cluster exhibiting reliably greater activity for a particular content type than for the other two. To control for the potential corresponding effect of sequence, for each comparison, the difference between two conditions had to be significantly stronger than the difference between the corresponding sequence-matched control paragraphs (i.e., an interaction between story content and paragraph order). For example, to qualify as a selective activation for perceptual content, the activity of a region had to be significantly greater in two separate statistical tests of interaction: (Perception > Action) > (first > second control paragraphs) and (Perception > Emotion) > (first > third control paragraphs). The alpha threshold for each of the individual tests of interaction was set at p < .0075 one-tailed with a minimum cluster size of 58 voxels, corresponding to p < .05 (family-wise error corrected) based on Monte Carlo simulations (computed by 3dClustSim in AFNI). Because the individual tests were corrected for multiple comparisons, the regions surviving the conjunction analysis are regarded as reliable. However, small intersections between clusters from different statistical tests could arise from spatial smoothing and resampling (Simmons et al., 2010). To rule out this possibility, an additional cluster-extent threshold of at least 10 voxels was applied to the conjoined areas.
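
The conjunction step itself reduces to intersecting the two thresholded interaction maps and applying the 10-voxel extent criterion to the intersection. A minimal sketch, assuming the two interaction contrasts have already been computed and thresholded into boolean maps, is shown below (file names are hypothetical).

```python
# Sketch of the conjunction step for perception-selective voxels.
import numpy as np
from scipy.ndimage import label

# Boolean 3-D maps from the two thresholded interaction contrasts (hypothetical files):
#   (Perception > Action)  > (first > second control paragraphs)
#   (Perception > Emotion) > (first > third control paragraphs)
mask_pa = np.load("perc_gt_action_interaction_mask.npy")
mask_pe = np.load("perc_gt_emotion_interaction_mask.npy")

conjunction = mask_pa & mask_pe

# Keep only conjoined clusters of at least 10 voxels, guarding against small
# intersections that arise purely from smoothing and resampling
labels, n_clusters = label(conjunction)
sizes = np.bincount(labels.ravel())
valid = [i for i in range(1, n_clusters + 1) if sizes[i] >= 10]
perception_selective = np.isin(labels, valid)
```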

In addition to comparing the Perception, Action, and Emotion conditions, we also compared the control paragraphs using the same conjunction method described above (i.e., first–second ∩ first–third, second–first ∩ second–third, and third–first ∩ third–second). Doing so allows us to examine the effect of paragraph order across the stories, independent of story content.

To identify the regions activated in all story conditions, we compared each story condition (including the Control conditions) to the nursery rhyme condition separately. Conjunctions between these contrasts were identified using the same threshold described above.

Because the size of the amygdala is small compared with the spatial extent threshold used for family-wise error correction, an anatomical ROI approach was used to detect increased hemodynamic responses in this region. For each condition, the average responses of all voxels located in the left and right amygdalae, defined according to a standard atlas (Tzourio-Mazoyer et al., 2002), were calculated. The average responses were compared between conditions using paired t tests, and the threshold of significance was set at p < .05.
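
A minimal sketch of this ROI approach, in which condition-wise response estimates are averaged over atlas-defined amygdala voxels and compared across participants with a paired t test, is given below. The mask and beta files are hypothetical stand-ins for the atlas-defined ROI and the subject-level estimates.

```python
# Sketch of the anatomical-ROI comparison for the amygdala (illustrative names and shapes).
import numpy as np
from scipy.stats import ttest_rel

amygdala_mask = np.load("left_amygdala_mask.npy").astype(bool)   # hypothetical atlas-based mask
beta_emotion = np.load("betas_emotion.npy")      # (n_subjects, x, y, z) response estimates
beta_action = np.load("betas_action.npy")

# Average over all voxels in the ROI, separately for each participant
roi_emotion = beta_emotion[:, amygdala_mask].mean(axis=1)
roi_action = beta_action[:, amygdala_mask].mean(axis=1)

t, p = ttest_rel(roi_emotion, roi_action)        # paired t test across participants
print(f"Emotion vs. Action in left amygdala: t = {t:.2f}, p = {p:.3f}")
```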

Functional Connectivity Analysis

Additional preprocessing steps were carried out for the functional connectivity analyses. For each participant, we removed from the preprocessed data the variance explained by the task regressors and by the regressors of no interest in the GLM. The residual time series of each voxel was then band-pass filtered with cutoff frequencies of 0.03 and 0.3 Hz, ensuring that the connectivity between regions was not affected by high-frequency physiological noise or low-frequency scanner drift. The epochs of each condition were concatenated according to their onsets and offsets, with a 6-sec delay accounting for the sluggishness of the hemodynamic response, generating a data set for each condition (Bokde, Tagamets, Friedman, & Horwitz, 2001). The seed locations were defined as the peak activations in the LIFG (−51, 24, 18; t = 6.1), left pMTG (−51, −42, 0; t = 8.9), left aTL (−51, −9, −18; t = 13.1), and right aTL (54, −6, −18; t = 11.6). These peaks were determined by comparing the averaged response across all story conditions (i.e., the Perception, Action, Emotion, and Control conditions) with the nursery rhyme baseline (p < .05, corrected). For each condition, the eigenvector of all voxels within a sphere (radius of 5 mm) centered at each seed's coordinates was computed. A correlation map was generated by calculating Pearson's correlation coefficient (r) between a given seed's time series and the time series of each voxel in the brain. Fisher's transformation was then applied to convert these correlation coefficients to Z values. For each seed, the participants' Fisher's Z maps were subjected to a repeated-measures ANOVA model to make statistical inferences at the population level. Contrasts between conditions were computed using paired t tests. To identify connectivity changes specific to the Perception, Action, and Emotion conditions, we employed the same conjunction analysis method used in the activation analyses. The alpha threshold for each of the individual tests of interaction was set at p < .015 one-tailed with a minimum cluster size of 79 voxels, corresponding to p < .05 (family-wise error corrected) based on Monte Carlo simulations (computed by 3dClustSim in AFNI). These analyses revealed clusters whose correlation with a seed region increased for a particular type of content to a degree greater than that observed for both of the other two types of story content and for the sequence-matched control paragraphs, in three separate statistical tests.
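
The core computation for one seed and one condition (band-pass filtering the GLM residuals, concatenating the condition's epochs with a 6-sec shift, correlating the seed time series with every voxel, and Fisher-transforming the result) could be sketched as follows. The file names, epoch indices, and filter order are illustrative, and the seed-eigenvector extraction is assumed to have been done already.

```python
# Sketch of the seed-based functional connectivity computation for one condition.
import numpy as np
from scipy.signal import butter, filtfilt

TR, delay = 1.5, 6.0                              # repetition time and hemodynamic delay (s)

resid = np.load("residual_timeseries.npy")        # (n_timepoints, n_voxels) GLM residuals (hypothetical file)
seed_ts = np.load("lifg_seed_timeseries.npy")     # (n_timepoints,) seed time series (hypothetical file)

# Band-pass filter at 0.03-0.3 Hz
b, a = butter(2, [0.03, 0.3], btype="bandpass", fs=1.0 / TR)
resid_f = filtfilt(b, a, resid, axis=0)
seed_f = filtfilt(b, a, seed_ts)

# Concatenate this condition's epochs, shifted by the hemodynamic delay
shift = int(round(delay / TR))
epochs = [(20, 40), (180, 200)]                   # hypothetical (onset, offset) scan indices
idx = np.concatenate([np.arange(s + shift, e + shift) for s, e in epochs])

# Seed-to-voxel Pearson correlation, then Fisher's z transform
seed_e, resid_e = seed_f[idx], resid_f[idx]
r = (resid_e - resid_e.mean(0)).T @ (seed_e - seed_e.mean()) / (
    resid_e.std(0) * seed_e.std() * len(idx))
z_map = np.arctanh(np.clip(r, -0.999999, 0.999999))   # one Fisher z value per voxel
```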

RESULTS

Behavioral Results

In general, participants accurately answered the comprehension questions at the end of each run. The mean accuracy of the 24 participants included in the analyses was 88.5 ± 4.3% (SD), indicating that they paid attention to the story stimuli.

Activations Modulated by Story Content

Content-specific activations were defined as clusters of voxels exhibiting reliably greater activity for a particular content type than for the other two types of content while controlling for the paragraph-sequence effect, when tested individually (see Methods). A composite map of content-specific activations is shown in Figure 1 (see also Table 5). As predicted, perceptual, action-based, and emotionally charged story content modulated responses in the related modality-specific brain regions. In the Perception condition, we observed robust activations in regions associated with visual-spatial processing, including the left occipitoparietal cortex, the precuneus, the retrosplenial cortex, the parahippocampal gyrus, and the fusiform gyrus. The Action condition was associated with selective activation of the left anterior intraparietal (AIP) area, extending into the somatosensory cortex, as well as the left dorsal premotor cortex (PMd). Emotionally charged story content elicited responses in paralimbic and prefrontal areas frequently associated with affective processing, social concept processing, and mentalizing, including the ventral and dorsal MPFC, the PCC, the superior temporal sulcus (STS), the temporoparietal junction (TPJ), and the orbital frontal gyri bilaterally. Emotionally charged content also elicited significantly greater responses than the other two content types in the bilateral amygdalae, according to our ROI analysis (Figure 2).

Figure 1. 

Brain regions selectively activated in processing visually vivid (pink), action-based (red), and emotionally charged (blue) story content. A region was considered reliably activated for a condition only if the activation in that condition was significantly stronger than in the other two conditions in two separate comparisons controlling for the paragraph-sequence effect (see Methods). All individual comparisons were corrected for family-wise error (p < .05), and the results are rendered on a single-subject anatomical image.


Table 5. 

Regions Activated Specifically for Visually Vivid, Action-based, and Emotionally Charged Story Content

Cluster | Hemisphere | Peak MNI (x, y, z) | Min. t (a) | Volume (cm3)

Perception
Parahippocampal/fusiform gyri | L | −27, −33, −18 | 5.4 | 3.8
Parahippocampal/fusiform gyri | R | 30, −33, −18 | 5.1 | 2.2
Retrosplenial cortex/precuneus | R | 12, −51, 12 | 5.2 | 12.1
Retrosplenial cortex/precuneus | L | −12, −54, 12 | 5.0 | 3.5
Middle cingulate cortex | L | −9, −36, 27 | 5.5 | 1.5
Middle occipital lobe | L | −30, −84, 33 | 4.6 | 4.3

Action
Anterior intraparietal area/somatosensory cortex | L | −54, −33, 36 | 4.0 | 2.1
Dorsal premotor cortex | L | −18, n/a, 57 | 3.2 | 1.8

Emotion
Superior temporal sulcus | L | −54, −24, n/a | n/a | 41.6
Temporoparietal junction | L | −51, −60, 24 | n/a | n/a
Middle temporal gyrus | L | −54, −42, n/a | n/a | n/a
Orbital frontal gyrus | L | −54, 21, n/a | n/a | n/a
Superior temporal sulcus | R | 48, −30, n/a | n/a | 16.6
Superior temporal sulcus | R | 60, −30, −6 | n/a | n/a
Temporoparietal junction | R | 57, −60, 30 | n/a | n/a
Orbital frontal gyrus | R | 54, 21, n/a | n/a | n/a
Cerebellum | L | −24, −78, −36 | 8.6 | 6.9
Cerebellum | R | 30, −78, −36 | 8.1 | 8.0
Cerebellar vermis | L/R | −6, −60, −48 | 7.1 | 4.0
Medial prefrontal cortex | L | −6, 54, 30 | 8.3 | 6.3
Medial prefrontal cortex | L/R | n/a, 60, −15 | 5.2 | 1.6
Posterior cingulate cortex/precuneus | L/R | n/a, −54, 33 | 5.7 | 7.2
Medial orbital frontal cortex | L/R | n/a, 60, −15 | 5.2 | 1.6
Middle frontal gyrus | L | −48, 51, n/a | 5.2 | 3.3

(a) Min. t is the minimum t statistic of the individual comparisons between a condition and each of the other two conditions.

Figure 2. 

Activity in the left and right amygdalae in response to visually vivid, action-based, and emotionally charged story content. This ROI analysis showed that the responses to emotionally charged content were significantly stronger than the responses to the other two content types (indicated by bracketed asterisks; p < .05). Error bars in both panels indicate the standard error in each condition.


To examine the effect of paragraph sequence, independent of story content, we compared the first, second, and third control paragraphs using the same conjunction analysis method and threshold described above. When the first paragraph was compared with the second and the third paragraphs in two separate tests, only the pre-SMA and the left anterior insula were significantly activated. The visual-spatial processing regions that were activated in the Perception condition (the occipital lobe, retrosplenial cortex, and parahippocampal gyrus) were not detected in this comparison. No significant activations were detected when the second paragraph was compared with the first and the third paragraphs, nor when the third paragraph was compared with the first and the second. These analyses provide further evidence that the modality-specific activations associated with the three content types were not driven by the paragraph sequence within a story.

Activations Common to All Story Conditions

To identify regions activated by story comprehension per se, independent of content type, we compared each story condition individually to the nursery rhyme condition to control for low-level language processing. Figure 3 shows regions that were commonly activated across all individual comparisons. As expected, language areas were activated, including the LIFG, the posterior superior and middle temporal gyri, and the bilateral aTL. Importantly, the seed regions representing the language regions selected for the connectivity analyses overlapped with the activated clusters in this analysis (Figure 3). In other words, the seed regions were activated in every story condition when each was compared with listening to nursery rhymes. Moreover, additional regions that have been associated with narrative comprehension were also active, including the left dorsolateral prefrontal cortex, the MPFC, the PCC, the parahippocampal gyrus, and the fusiform gyrus (Mar, 2011; Ferstl et al., 2008; Yarkoni, Speer, & Zacks, 2008; Xu, Kemeny, Park, Frattali, & Braun, 2005). Extensive activations were also observed in the lingual gyrus and the cerebellum.

Figure 3. 

Activations common to all story conditions when each was compared individually to the nursery rhyme condition, and the approximate locations of the seed regions selected for the functional connectivity analyses (indicated by green dots). The minimum t statistics for the individual comparisons are presented using a yellow–orange color scale and rendered on a single-subject anatomical image. Each comparison was corrected for family-wise error (p < .05).


Modulation of Functional Connectivity with the LIFG, pMTG, and aTL

We next examined how functional connectivity with the language areas was influenced by the presentation of different types of story content. The LIFG showed selective increases in functional connectivity with the modality-specific regions activated for perceptual, action-based, and emotionally charged content in the Perception, Action, and Emotion conditions, respectively (Figure 4 and Table 6). In other words, when perceptual content was presented in the story, there was greater connectivity between the LIFG and areas associated with perception, and so forth. When the Perception condition was compared with the Action, Emotion, and Control conditions, the LIFG was more strongly connected with the left and right parahippocampal gyri. These regions overlapped with the clusters that exhibited increased responses selective to perceptual content, as reported in the previous section. The Action condition selectively modulated connectivity between the LIFG and a cluster in the left AIP area, the same region that showed increased activity for action-based content. However, neither the LIFG nor the other regions tested (i.e., pMTG and bilateral aTL) showed enhanced connectivity with the left PMd in the Action condition, a region that had been selectively activated for action-based content. Lastly, LIFG activity was more strongly correlated with the right IPL (extending to the TPJ) in the Emotion condition, compared with the Perception, Action, and Control conditions. The same anatomical regions were strongly activated when emotionally charged story segments were contrasted with the other two types of content.

Figure 4. 

Regions that exhibited selective increases in connectivity with the LIFG seed (A), the left pMTG seed (B), the left aTL seed (C), and the right aTL seed (D) for visually vivid (pink), action-based (red), and emotionally charged (blue) story content. The green dots indicate the approximate locations of the seed regions. An increase in connectivity was considered reliable for a condition only if the increase in that condition was significantly larger than in the other two conditions in two separate comparisons controlling for the paragraph-sequence effect (see Methods). All individual comparisons were corrected for family-wise error (p < .05), and the results are rendered on a single-subject anatomical image.


Table 6. 

Regions that Exhibited Selective Increase in Connectivity with LIFG, Left pMTG, Left aTL, and Right aTL for Visually Vivid, Action-based, and Emotionally Charged Story Content


For each seed region, the clusters showing selective connectivity increases are listed with the condition in which the increase occurred, the peak MNI coordinates (x, y, z), the minimum t statistic of the individual comparisons (a), and the cluster volume (cm3).

LIFG seed
Perception: Parahippocampal gyrus (R) | 24, −33, −15 | 3.8 | 1.5
Perception: Parahippocampal gyrus (L) | −30, −18, −18 | 3.7 | 2.8
Action: Anterior intraparietal area (L) | −45, −42, 45 | 4.0 | 4.3
Emotion: Inferior parietal lobule (R) | 48, −36, 36 | 3.2 | 1.6

Left pMTG seed
Perception: Parahippocampal gyrus (L) | −21, −36, −15 | 2.6 | 0.5
Action: Inferior temporal sulcus (L) | −48, −60, −6 | 4.1 | 1.4
Action: Anterior intraparietal area (L) | −57, −33, 39 | 3.0 | 0.6
Emotion: Posterior cingulate cortex/precuneus | −6, −60, 30 | 3.5 | 4.3

Left aTL seed
Emotion: Inferior parietal lobule (L) | −57, −60, 36 | 3.3 | 0.9
Emotion: Inferior parietal lobule (R) | 54, −60, 36 | 3.0 | 1.3
Emotion: Posterior cingulate cortex/precuneus | −3, −51, 30 | 3.0 | 2.2

Right aTL seed
Emotion: Inferior parietal lobule (L) | −45, −57, 33 | 3.7 | 2.4
Emotion: Inferior parietal lobule (R) | 51, −60, 33 | 3.6 | 2.4
Emotion: Posterior cingulate cortex/precuneus | n/a, −45, 27 | 3.6 | 5.9
Emotion: Medial prefrontal cortex | n/a, 48, 39 | 2.8 | 2.8


(a) Min. t is the minimum t statistic of the individual comparisons between a condition and each of the other two conditions.

A parallel analysis focusing on the pMTG revealed a similar pattern to that observed with the LIFG in the sense that all three types of content selectively modulated connectivity with the modality-specific regions that had been identified in the previous activation analysis (Figure 4). However, the specific associations observed were slightly different. During the Perception condition, the pMTG was more strongly connected to the left parahippocampal gyrus. During the Action condition, there was a significant increase in connectivity between the pMTG and both the left posterior inferior temporal sulcus and the AIP area. During the Emotion condition, the pMTG was more strongly connected to the PCC, but not the IPL/TPJ.

With respect to aTL connectivity, the pattern of connectivity enhancement was markedly different, in that it was confined to the Emotion condition (Figure 4). While listening to emotionally charged content, the right aTL showed strong connectivity enhancement with the MPFC, PCC, bilateral IPL/TPJ, and right orbital frontal gyrus. A similar connectivity pattern was observed for the left aTL, which was more connected to the bilateral IPL/TPJ and PCC. Neither the left nor the right aTL showed a significant increase in connectivity with any of the regions associated with the Perception and Action conditions.

Lastly, we evaluated the connectivity patterns associated with the first, second, and third control paragraphs using the LIFG, pMTG, and aTL seed regions. However, none of these comparisons yielded any significant clusters at the specified threshold.

DISCUSSION

Our goal was to understand how the language regions interact with modality-specific systems during naturalistic, spoken language comprehension. Consistent with the embodied view of language comprehension and previous studies on story content (Speer et al., 2009; Ferstl & von Cramon, 2007; Ferstl et al., 2005), we demonstrated that modality-specific activations are selectively modulated by the content of a story. To investigate the interactions between language and modality-specific regions, we examined the modulatory effects of content on the connections between language regions and modality-specific regions. As predicted, the functional connectivity analyses illustrated that the language regions interacted selectively with modality-specific systems during the comprehension of different types of story content, but the connectivity patterns associated with each of the language ROIs were markedly different. Although the LIFG, pMTG, and bilateral aTL were strongly activated for all stories, regardless of content, only the LIFG and pMTG showed increased functional connectivity with all three of the modality-specific neural systems in response to the corresponding story content.

Our results suggest that sensorimotor and affective simulations during narrative comprehension do not rely solely on the modality-specific system relevant to the content; rather, these multimodal simulations likely manifest through collaboration between the modality-specific neural systems and language areas. Because both the LIFG and pMTG interacted with modality-specific regions in a content-dependent fashion, these two regions may subserve a general mechanism that activates the comprehender's perceptual, motor, and affective knowledge and integrates it into a coherent mental representation of the described situation, known as a situation model (Zwaan & Radvansky, 1998; Kintsch, 1988).

Neural Responses to Different Forms of Story Content

Visually vivid descriptions of scenes evoked strong responses in major components of the parietomedial temporal pathway for visual-spatial processing (Kravitz, Saleem, Baker, & Mishkin, 2011), including regions critical for processing visual scenes, construction of mental scenes, and spatial navigation (Vann, Aggleton, & Maguire, 2009; Epstein, 2008). This implies that generating a mental scene of the described situation may involve a reactivation of past personal perceptual experience triggered by the text (Mar & Oatley, 2008).

Descriptions of actions elicited robust responses in a different set of brain regions, those that play an important role in the planning and control of complex movements (Gazzola & Keysers, 2009; Filimon, Nelson, Hagler, & Sereno, 2007; Buccino et al., 2001). Increased activation in these motor areas during the processing of action-based content suggests that motor and somatosensory representations of the described actions are simulated during narrative comprehension, consistent with past work on single-word and sentence processing (Tettamanti et al., 2005; Hauk, Johnsrude, & Pulvermuller, 2004).

With respect to emotionally charged content, we found that these story segments elicited selective responses in the left and right amygdalae. Activation of the amygdala has been associated with the processing of affective stimuli (Phelps, 2006), including emotional words (Herbert et al., 2009; Isenberg et al., 1999). However, understanding a protagonist's emotional reactions during story comprehension is more complicated than processing simple emotional stimuli because it requires understanding the complex social situations narrated in the story. This requires the use of social concepts (Frith, 2007; Dalgleish, 2004) and inferring the mental states of the protagonist (e.g., beliefs and intentions; Olsson & Ochsner, 2008). Consistent with this view, we observed increased activations in a set of heteromodal regions during the Emotion condition, the so-called mentalizing network (including the MPFC, PCC, STS, and IPL). Although the role of the amygdala during mentalizing is not clear, a previous meta-analysis demonstrated that this structure is reliably activated across studies of mentalizing (Mar, 2011). In addition, it has been shown that patients who acquired a lesion in the amygdala early in life were impaired in the ability to reason about the mental states of others (Shaw et al., 2004). Although we cannot rule out the possibility that activations in the amygdala merely reflect the participants' own arousal triggered by the stories' emotional content, in light of the evidence mentioned above, the amygdala may play a role in evaluating the protagonist's emotional state (Thom et al., 2012; Dapretto et al., 2006; Keysers & Gazzola, 2006; Ruby & Decety, 2004) and may also act in concert with the mentalizing network to interpret the intentions underlying the protagonists' actions (Olsson & Ochsner, 2008).

Neural Responses to Narrative Sequence Independent of Content

In our analyses, the control paragraphs were used to rule out the possibility that condition-dependent effects might be due to the sequence in which the experimental conditions were presented. These control paragraphs can also be used to examine the effect of accumulating context independent of story content. Xu and colleagues (2005) showed that the endings of Aesop's fables were associated with increased hemodynamic responses in the MPFC, PCC, and bilateral IPL. However, when we compared the third control paragraph with the first and second, none of these regions was significantly activated. An important difference in story materials might help to explain these diverging results, as Xu and colleagues did not take story content into consideration. It is plausible that the fables used in that study were more emotional at their endings than at their beginnings, or involved moral teachings and social concepts that were only presented at the end of the fables. If this is the case, the increased responses in the MPFC, PCC, and bilateral IPL at the end of the fables reported by Xu and colleagues (2005) would be consistent with the results of this study; we observed activity in this same set of regions in response to emotional content.

Interactions between Language and Modality-specific Systems

In addition to identifying activations modulated by the three content types, we interrogated the data with respect to functional connectivity. The foregoing activation analysis identified the extent to which activity in each voxel, reflected in the hemodynamic response, was sustained between the onset and offset of paragraphs in each condition. The functional connectivity analyses, in contrast, quantified the degree of synchronization between transient activity in spatially segregated regions. Although connectivity analyses cannot determine how this transient activity is triggered within a task condition, such synchronization reflects a direct or indirect functional relationship between two regions. Examining functional connectivity is particularly important for understanding higher-level cognition such as narrative comprehension, because these complex functions are generally assumed to emerge from interactions between regions with specialized functions (Friston, 2002). Because the language regions examined in our study are all robustly activated during discourse comprehension (in our data and in previous studies; Binder et al., 2011; Mar, 2011; Ferstl et al., 2008), the activation analysis is not ideally suited to differentiating their roles in processing different types of story content. Using a connectivity analysis, we showed that the language regions exhibit different degrees of functional connectivity with modality-specific regions and that this connectivity is selectively enhanced in response to the different types of story content.
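To make the distinction concrete, the sketch below illustrates the basic logic of content-dependent, seed-based functional connectivity; it is not the analysis pipeline used in this study, and all variable names and the toy data are hypothetical. The time courses of a language seed and a modality-specific target are correlated separately within each story condition, and the Fisher z-transformed coefficients can then be contrasted across conditions.

```python
# Minimal sketch (hypothetical data, not the study's pipeline):
# content-dependent seed-to-target connectivity as within-condition correlation.
import numpy as np

def fisher_z(r):
    """Fisher z-transform so correlations can be compared across conditions."""
    return np.arctanh(r)

def condition_connectivity(seed_ts, target_ts, condition_mask):
    """Pearson correlation between seed and target time courses,
    restricted to the volumes belonging to one condition."""
    r = np.corrcoef(seed_ts[condition_mask], target_ts[condition_mask])[0, 1]
    return fisher_z(r)

# Toy example: 300 volumes, three blocked conditions (labels are hypothetical).
rng = np.random.default_rng(0)
n_vols = 300
seed_ts = rng.standard_normal(n_vols)    # e.g., mean time course of a language seed
target_ts = rng.standard_normal(n_vols)  # e.g., mean time course of a modality-specific region
labels = np.repeat(["scene", "action", "emotion"], n_vols // 3)

z_by_condition = {
    cond: condition_connectivity(seed_ts, target_ts, labels == cond)
    for cond in ["scene", "action", "emotion"]
}
print(z_by_condition)  # condition-wise connectivity estimates to contrast
```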

While listening to visually vivid content, both the LIFG and pMTG were more functionally connected with regions responsible for visual-spatial processing. Similar patterns were observed during the comprehension of action-based and emotionally charged content. Both areas demonstrated significantly greater connectivity to regions responsible for motor processing and mentalizing during action and emotion paragraphs, respectively. This pattern of content-specific connectivity enhancement suggests that simulations instantiated in modality-specific systems are not independent of the language regions. On the contrary, they are functionally coupled in a systematic way, with multimodal simulations possibly emerging from the interactions between language and modality-specific systems.

The connectivity results for the aTL differed markedly from those observed for the LIFG and pMTG, in that modulation of connectivity for the aTL was observed only during the Emotion condition. During the presentation of emotional content, the correlations between the aTL and elements of the mentalizing system were significantly enhanced relative to the presentation of the other two types of content. This pattern of results can inform the current debate on the role of the aTL in semantic processing. Past research on semantic dementia patients suggests that the aTL is involved in a wide variety of semantic tasks (Lambon Ralph et al., 2010; Davies, Halliday, Xuereb, Kril, & Hodges, 2009; Rogers et al., 2006; Williams, Nestor, & Hodges, 2005). This has led a number of researchers to propose that the aTL is a “semantic hub,” binding multimodal information into amodal, domain-general representations (Lambon Ralph et al., 2010; Ferstl et al., 2008; Patterson et al., 2007; Jung-Beeman, 2005). Evidence for the semantic hub hypothesis comes primarily from tasks that rely on single words or pictures as stimuli, so it is unclear whether the aTL behaves similarly when people are presented with discourse. It is possible that in this situation the aTL also acts as a semantic hub, integrating modality-specific information into a coherent discourse representation. If this were the case, we would expect the aTL to interact with all three modality-specific systems to integrate information and form these representations. In contrast, in our data connectivity with the aTL was modulated only by emotional content. This indicates that the functional role of the aTL in processing words or pictures may differ from its role during discourse comprehension. In narrative comprehension, the aTL may play a more specialized role in comprehending protagonists' emotions and retrieving social concepts, consistent with past research that has associated the aTL with social semantic processing (Binder & Desai, 2011; Simmons et al., 2010; Simmons & Martin, 2009). It is also possible that different regions within the aTL are devoted to different processes: the ventral portion of the aTL may be responsible for “semantic hub” functions (Binney, Embleton, Jefferies, Parker, & Ralph, 2010), in contrast to the superior portion where our seeds were located. Lastly, the anterior temporal regions may not have been adequately sampled during our data acquisition, because these areas of the brain are quite vulnerable to susceptibility artifacts (Visser, Jefferies, & Lambon Ralph, 2010). Further studies are needed to clarify these issues.

Both the LIFG and pMTG showed enhanced connectivity with elements of all three modality-specific systems in a content-specific manner. This indicates that the LIFG and pMTG might contribute to a general mechanism for multimodal simulations through which perceptual, motor, and affective representations of the described situation are constructed. Incorporating our connectivity results into contemporary models of language processing (Lau et al., 2008; Tyler & Marslen-Wilson, 2008; Hagoort, 2005), we postulate that the pMTG and LIFG play distinct roles in activating the comprehender's text-relevant perceptual, motor, and affective knowledge and in using this knowledge to form coherent, multimodal mental representations.

Brain imaging and lesion studies have shown that the pMTG plays an important role in accessing conceptual knowledge (Hickok & Poeppel, 2007; Dronkers, Wilkins, Van Valin, Redfern, & Jaeger, 2004). Extending the language processing models proposed by Lau and colleagues (2008) and Hickok and Poeppel (2007), we propose that the pMTG serves as an interface between lexical and modality-specific representations. Through this interface, the comprehender's text-related knowledge is activated and represented in perceptual, motor, and affective neural systems. This would account for the modulation of connectivity between the pMTG and the modality-specific regions observed in our study. In contrast, the interactions between the LIFG and modality-specific regions may reflect the selection of activated information based on the context established by preceding portions of the text (Thompson-Schill, D'Esposito, Aguirre, & Farah, 1997), along with the integration of this information into modality-specific representations (Hagoort, 2005). As the story unfolds, perceptual, motor, and affective representations are updated through these activation and integration processes (Zwaan, 2003), forming a multimodal situation model of the narrative.

Taken together, our results provide a more complete understanding of the neural mechanism underlying language comprehension. Comprehension emerges not only from the interactions between frontal and temporal brain systems, as proposed by contemporary models of language processing, but also from content-dependent interactions between these language regions and modality-specific brain systems.

Acknowledgments

The authors would like to thank Maja Djikic for her assistance in story composition. We also thank Alex Martin for helpful discussions and the NIH Fellows Editorial Board for editing an earlier version of this paper. This work was supported by the Intramural Research Program of the NIDCD.

Reprint requests should be sent to Ho Ming Chow, National Institutes of Health, 10/5C410, 10 Center Dr., Bethesda, MD 20892, or via e-mail: chowh@nidcd.nih.gov.

REFERENCES

Badre, D., Poldrack, R. A., Pare-Blagoev, E. J., Insler, R. Z., & Wagner, A. D. (2005). Dissociable controlled retrieval and generalized selection mechanisms in ventrolateral prefrontal cortex. Neuron, 47, 907–918.
Barsalou, L. W. (2003). Situated simulation in the human conceptual system. Language and Cognitive Processes, 18, 513–562.
Binder, J. R., & Desai, R. H. (2011). The neurobiology of semantic memory. Trends in Cognitive Sciences, 15, 527–536.
Binder, J. R., Gross, W. L., Allendorfer, J. B., Bonilha, L., Chapin, J., Edwards, J. C., et al. (2011). Mapping anterior temporal lobe language areas with fMRI: A multicenter normative study. Neuroimage, 54, 1465–1475.
Binney, R. J., Embleton, K. V., Jefferies, E., Parker, G. J., & Ralph, M. A. (2010). The ventral and inferolateral aspects of the anterior temporal lobe are crucial in semantic memory: Evidence from a novel direct comparison of distortion-corrected fMRI, rTMS, and semantic dementia. Cerebral Cortex, 20, 2728–2738.
Bokde, A. L., Tagamets, M. A., Friedman, R. B., & Horwitz, B. (2001). Functional interactions of the inferior frontal cortex during the processing of words and word-like stimuli. Neuron, 30, 609–617.
Boulenger, V., Mechtouff, L., Thobois, S., Broussolle, E., Jeannerod, M., & Nazir, T. A. (2008). Word processing in Parkinson's disease is impaired for action verbs but not for concrete nouns. Neuropsychologia, 46, 743–756.
Buccino, G., Binkofski, F., Fink, G. R., Fadiga, L., Fogassi, L., Gallese, V., et al. (2001). Action observation activates premotor and parietal areas in a somatotopic manner: An fMRI study. European Journal of Neuroscience, 13, 400–404.
Buckner, R. L., Andrews-Hanna, J. R., & Schacter, D. L. (2008). The brain's default network: Anatomy, function, and relevance to disease. Annals of the New York Academy of Sciences, 1124, 1–38.
Coltheart, M. (1981). The MRC psycholinguistic database. Quarterly Journal of Experimental Psychology: Section A-Human Experimental Psychology, 33, 497–505.
Cox, R. W. (1996). AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages. Computers in Biomedical Research, 29, 162–173.
Dalgleish, T. (2004). The emotional brain. Nature Reviews Neuroscience, 5, 583–589.
Dapretto, M., Davies, M. S., Pfeifer, J. H., Scott, A. A., Sigman, M., Bookheimer, S. Y., et al. (2006). Understanding emotions in others: Mirror neuron dysfunction in children with autism spectrum disorders. Nature Neuroscience, 9, 28–30.
Davies, R. R., Halliday, G. M., Xuereb, J. H., Kril, J. J., & Hodges, J. R. (2009). The neural basis of semantic memory: Evidence from semantic dementia. Neurobiology of Aging, 30, 2043–2052.
Desai, R. H., Binder, J. R., Conant, L. L., & Seidenberg, M. S. (2010). Activation of sensory-motor areas in sentence comprehension. Cerebral Cortex, 20, 468–478.
Dronkers, N. F., Wilkins, D. P., Van Valin, R. D., Jr., Redfern, B. B., & Jaeger, J. J. (2004). Lesion analysis of the brain areas involved in language comprehension. Cognition, 92, 145–177.
Epstein, R. A. (2008). Parahippocampal and retrosplenial contributions to human spatial navigation. Trends in Cognitive Sciences, 12, 388–396.
Ferstl, E. C., Neumann, J., Bogler, C., & von Cramon, D. Y. (2008). The extended language network: A meta-analysis of neuroimaging studies on text comprehension. Human Brain Mapping, 29, 581–593.
Ferstl, E. C., Rinck, M., & von Cramon, D. Y. (2005). Emotional and temporal aspects of situation model processing during text comprehension: An event-related fMRI study. Journal of Cognitive Neuroscience, 17, 724–739.
Ferstl, E. C., & von Cramon, D. Y. (2007). Time, space and emotion: fMRI reveals content-specific activation during text comprehension. Neuroscience Letters, 427, 159–164.
Filimon, F., Nelson, J. D., Hagler, D. J., & Sereno, M. I. (2007). Human cortical representations for reaching: Mirror neurons for execution, observation, and imagery. Neuroimage, 37, 1315–1328.
Friston, K. (2002). Functional integration and inference in the brain. Progress in Neurobiology, 68, 113–143.
Frith, C. D. (2007). The social brain? Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 362, 671–678.
Gazzola, V., & Keysers, C. (2009). The observation and execution of actions share motor and somatosensory voxels in all tested subjects: Single-subject analyses of unsmoothed fMRI data. Cerebral Cortex, 19, 1239–1255.
Hagoort, P. (2005). On Broca, brain, and binding: A new framework. Trends in Cognitive Sciences, 9, 416–423.
Hauk, O., Johnsrude, I., & Pulvermuller, F. (2004). Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41, 301–307.
Herbert, C., Ethofer, T., Anders, S., Junghofer, M., Wildgruber, D., Grodd, W., et al. (2009). Amygdala activation during reading of emotional adjectives-An advantage for pleasant content. Social Cognitive and Affective Neuroscience, 4, 35–49.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8, 393–402.
Isenberg, N., Silbersweig, D., Engelien, A., Emmerich, S., Malavade, K., Beattie, B., et al. (1999). Linguistic threat activates the human amygdala. Proceedings of the National Academy of Sciences, U.S.A., 96, 10456–10459.
Jung-Beeman, M. (2005). Bilateral brain processes for comprehending natural language. Trends in Cognitive Sciences, 9, 512–518.
Keysers, C., & Gazzola, V. (2006). Towards a unifying neural theory of social cognition. Progress in Brain Research, 156, 379–401.
Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count, and Flesch reading ease formula) for navy enlisted personnel. Research Branch Report 8-75. Chief of Naval Technical Training. Memphis, TN: Naval Air Station.
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163–182.
Kravitz, D. J., Saleem, K. S., Baker, C. I., & Mishkin, M. (2011). A new neural framework for visuospatial processing. Nature Reviews Neuroscience, 12, 217–230.
Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence: Brown University Press.
Lambon Ralph, M. A., Sage, K., Jones, R. W., & Mayberry, E. J. (2010). Coherent concepts are computed in the anterior temporal lobes. Proceedings of the National Academy of Sciences, U.S.A., 107, 2717–2722.
Lau, E. F., Phillips, C., & Poeppel, D. (2008). A cortical network for semantics: (De)constructing the N400. Nature Reviews Neuroscience, 9, 920–933.
Li, Y. O., Adali, T., & Calhoun, V. D. (2007). Estimating the number of independent components for functional magnetic resonance imaging data. Human Brain Mapping, 28, 1251–1266.
Mahon, B. Z., & Caramazza, A. (2008). A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. Journal of Physiology Paris, 102, 59–70.
Mar, R. A. (2011). The neural bases of social cognition and story comprehension. Annual Review of Psychology, 62, 103–134.
Mar, R. A., & Oatley, K. (2008). The function of fiction is the abstraction and simulation of social experience. Perspectives on Psychological Science, 3, 173–192.
Mason, J., & Just, M. A. (2006). Neuroimaging contributions to the understanding of discourse processes. In M. T. M. A. Gernsbacher (Ed.), Handbook of psycholinguistics (pp. 765–799). Amsterdam: Elsevier.
McKeown, M. J., Makeig, S., Brown, G. G., Jung, T. P., Kindermann, S. S., Bell, A. J., et al. (1998). Analysis of fMRI data by blind separation into independent spatial components. Human Brain Mapping, 6, 160–188.
Meteyard, L., Cuadrado, S. R., Bahrami, B., & Vigliocco, G. (2012). Coming of age: A review of embodiment and the neuroscience of semantics. Cortex, 48, 788–804.
Neininger, B., & Pulvermuller, F. (2003). Word-category specific deficits after lesions in the right hemisphere. Neuropsychologia, 41, 53–70.
Oldfield, R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97–113.
Olsson, A., & Ochsner, K. N. (2008). The role of social cognition in emotion. Trends in Cognitive Sciences, 12, 65–71.
Patterson, K., Nestor, P. J., & Rogers, T. T. (2007). Where do you know what you know? The representation of semantic knowledge in the human brain. Nature Reviews Neuroscience, 8, 976–987.
Phelps, E. A. (2006). Emotion and cognition: Insights from studies of the human amygdala. Annual Review of Psychology, 57, 27–53.
Price, C. J. (2000). The anatomy of language: Contributions from functional neuroimaging. Journal of Anatomy, 197, 335–359.
Rogers, T. T., Hocking, J., Noppeney, U., Mechelli, A., Gorno-Tempini, M. L., Patterson, K., et al. (2006). Anterior temporal cortex and semantic memory: Reconciling findings from neuropsychology and functional imaging. Cognitive, Affective & Behavioral Neuroscience, 6, 201–213.
Ruby, P., & Decety, J. (2004). How would you feel versus how do you think she would feel? A neuroimaging study of perspective-taking with social emotions. Journal of Cognitive Neuroscience, 16, 988–999.
Shaw, P., Lawrence, E. J., Radbourne, C., Bramham, J., Polkey, C. E., & David, A. S. (2004). The impact of early and late damage to the human amygdala on “theory of mind” reasoning. Brain, 127, 1535–1548.
Simmons, W. K., & Martin, A. (2009). The anterior temporal lobes and the functional architecture of semantic memory. Journal of the International Neuropsychological Society, 15, 645–649.
Simmons, W. K., Reddish, M., Bellgowan, P. S., & Martin, A. (2010). The selectivity and functional connectivity of the anterior temporal lobes. Cerebral Cortex, 20, 813–825.
Speer, N. K., Reynolds, J. R., Swallow, K. M., & Zacks, J. M. (2009). Reading stories activates neural representations of visual and motor experiences. Psychological Science, 20, 989–999.
Tettamanti, M., Buccino, G., Saccuman, M. C., Gallese, V., Danna, M., Scifo, P., et al. (2005). Listening to action-related sentences activates frontoparietal motor circuits. Journal of Cognitive Neuroscience, 17, 273–281.
Thom, N. J., Johnson, D. C., Flagan, T., Simmons, A. N., Kotturi, S. A., Van Orden, K. F., et al. (2012). Detecting emotion in others: Increased insula and decreased medial prefrontal cortex activation during emotion processing in elite adventure racers. Social Cognitive and Affective Neuroscience. doi: 10.1093/scan/nst029.
Thompson-Schill, S. L., D'Esposito, M., Aguirre, G. K., & Farah, M. J. (1997). Role of left inferior prefrontal cortex in retrieval of semantic knowledge: A reevaluation. Proceedings of the National Academy of Sciences, U.S.A., 94, 14792–14797.
Turken, A. U., & Dronkers, N. F. (2011). The neural architecture of the language comprehension network: Converging evidence from lesion and connectivity analyses. Frontiers in Systems Neuroscience, 5, 1.
Tyler, L. K., & Marslen-Wilson, W. (2008). Fronto-temporal brain systems supporting spoken language comprehension. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 363, 1037–1054.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., et al. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 15, 273–289.
Vann, S. D., Aggleton, J. P., & Maguire, E. A. (2009). What does the retrosplenial cortex do? Nature Reviews Neuroscience, 10, 792–802.
Vigneau, M., Beaucousin, V., Herve, P. Y., Duffau, H., Crivello, F., Houde, O., et al. (2006). Meta-analyzing left hemisphere language areas: Phonology, semantics, and sentence processing. Neuroimage, 30, 1414–1432.
Visser, M., Jefferies, E., & Lambon Ralph, M. A. (2010). Semantic processing in the anterior temporal lobes: A meta-analysis of the functional neuroimaging literature. Journal of Cognitive Neuroscience, 22, 1083–1094.
Willems, R. M., Labruna, L., D'Esposito, M., Ivry, R., & Casasanto, D. (2011). A functional role for the motor system in language understanding: Evidence from theta-burst transcranial magnetic stimulation. Psychological Science, 22, 849–854.
Williams, G. B., Nestor, P. J., & Hodges, J. R. (2005). Neural correlates of semantic and behavioural deficits in frontotemporal dementia. Neuroimage, 24, 1042–1051.
Xu, J., Kemeny, S., Park, G., Frattali, C., & Braun, A. (2005). Language in context: Emergent features of word, sentence, and narrative comprehension. Neuroimage, 25, 1002–1015.
Yarkoni, T., Speer, N. K., & Zacks, J. M. (2008). Neural substrates of narrative comprehension and memory. Neuroimage, 41, 1408–1425.
Zwaan, R. A. (2003). The immersed experiencer: Toward an embodied theory of language comprehension. The Psychology of Learning and Motivation, 44, 35–62.
Zwaan, R. A., & Radvansky, G. A. (1998). Situation models in language comprehension and memory. Psychological Bulletin, 123, 162–185.