Understanding language always occurs within a situational context and, therefore, often implies combining streams of information from different domains and modalities. One such combination is that of spoken language and visual information, which are perceived together in a variety of ways during everyday communication. Here we investigate whether and how words and pictures differ in terms of their neural correlates when they are integrated into a previously built-up sentence context. This is assessed in two experiments looking at the time course (measuring event-related potentials, ERPs) and the locus (using functional magnetic resonance imaging, fMRI) of this integration process. We manipulated the ease of semantic integration of word and/or picture to a previous sentence context to increase the semantic load of processing. In the ERP study, an increased semantic load led to an N400 effect which was similar for pictures and words in terms of latency and amplitude. In the fMRI study, we found overlapping activations to both picture and word integration in the left inferior frontal cortex. Specific activations for the integration of a word were observed in the left superior temporal cortex. We conclude that despite obvious differences in representational format, semantic information coming from pictures and words is integrated into a sentence context in similar ways in the brain. This study adds to the growing insight that the language system incorporates (semantic) information coming from linguistic and extralinguistic domains with the same neural time course and by recruitment of overlapping brain areas.