The embodied view of language processing proposes that comprehension involves multimodal simulations, a process that retrieves a comprehender's perceptual, motor, and affective knowledge through reactivation of the neural systems responsible for perception, action, and emotion. Although evidence in support of this idea is growing, the contemporary neuroanatomical model of language suggests that comprehension largely emerges from interactions between frontotemporal language areas in the left hemisphere. If modality-specific neural systems are involved in comprehension, they are unlikely to operate in isolation and should instead interact with the brain regions critical to language processing. However, little is known about how language and modality-specific neural systems interact. To investigate this issue, we conducted a functional MRI study in which participants listened to stories that contained visually vivid, action-based, and emotionally charged content. Activity in the neural systems associated with visual-spatial, motor, and affective processing was selectively modulated by the relevant story content. Importantly, when functional connectivity patterns associated with the left inferior frontal gyrus (LIFG), the left posterior middle temporal gyrus (pMTG), and the bilateral anterior temporal lobes (aTL) were compared, both the LIFG and the pMTG, but not the aTL, showed enhanced connectivity with the three modality-specific systems relevant to the story content. Taken together, our results suggest that language regions are engaged in perceptual, motor, and affective simulations of the described situation, which manifest through their interactions with modality-specific systems. On the basis of our results and past research, we propose that the LIFG and pMTG play unique roles in multimodal simulations during story comprehension.