Abstract

In the absence of sensory information, we can generate meaningful images and sounds from representations in memory. However, it remains unclear which neural systems underpin this process and whether tasks requiring the top–down generation of different kinds of features recruit similar or different neural networks. We asked people to internally generate the visual and auditory features of objects, either in isolation (car, dog) or in specific and complex meaning-based contexts (car/dog race). Using an fMRI decoding approach, in conjunction with functional connectivity analysis, we examined the role of auditory/visual cortex and transmodal brain regions. Conceptual retrieval in the absence of external input recruited sensory and transmodal cortex. The response in transmodal regions—including anterior middle temporal gyrus—was of equal magnitude for visual and auditory features yet nevertheless captured modality information in the pattern of response across voxels. In contrast, sensory regions showed greater activation for modality-relevant features in imagination (even when external inputs did not differ). These data are consistent with the view that transmodal regions support internally generated experiences and that they play a role in integrating perceptual features encoded in memory.

INTRODUCTION

In the absence of sensory information, the mind produces experiences with rich sensorimotor features through the retrieval of information from memory (Mason et al., 2007; Antrobus, Singer, & Greenberg, 1966; Singer, 1966). For instance, in everyday life we regularly hear voices and music in the mind's ear when no sound is delivered (e.g., Alderson-Day & Fernyhough, 2015; Halpern, 2001), and studies suggest more than one third of our time is spent engaged in thoughts and experiences that are unrelated to the ongoing environment (Killingsworth & Gilbert, 2010; Kane et al., 2007).

Although attempts have been made to understand how the brain retrieves memories in the absence of input (Vetter, Smith, & Muckli, 2014; Albers, Kok, Toni, Dijkerman, & de Lange, 2013; Daselaar, Porat, Huijbers, & Pennartz, 2010), we lack an account of the critical neurocognitive processes. One prominent framework suggests that neurons with long-range patterns of connectivity provide a “global workspace” that can potentially interconnect multiple distributed and specialized brain areas in a coordinated although variable manner (Dehaene, Kerszberg, & Changeux, 1998). However, it remains unclear whether this varies with respect to the modality of the memories being retrieved and how these processes combine to support more complex multidimensional aspects of cognition. For instance, studies of imagination have almost entirely focused on a constrained ROI analysis, which may not adequately represent the rich involvement of multiple brain regions distributed across the cortex. Moreover, they have seldom attempted to differentiate between different forms of imagery, with the majority of studies focusing solely on visual imagery (Dijkstra, Zeidman, Ondobaka, van Gerven, & Friston, 2017; Coutanche & Thompson-Schill, 2015; Vetter et al., 2014; Albers et al., 2013; Lee, Kravitz, & Baker, 2012; Reddy, Tsuchiya, & Serre, 2010; Stokes, Thompson, Cusack, & Duncan, 2009; Ishai, Ungerleider, & Haxby, 2000). As such, there is limited understanding of the neural signature of top–down retrieval within different modalities (e.g., visual versus auditory) and whether different forms of internally generated conceptual retrieval share similar or unique neural representations. Notably, studies that have compared visual and auditory imagery within the same experiment have been criticized for not employing comparable task conditions (see Daselaar et al., 2010; Halpern, Zatorre, Bouffard, & Johnson, 2004).

A wealth of evidence supports the view that regions of unimodal sensory cortex are important for modality-specific elements of memory retrieval during imagination. Visual cortex is activated by mental images (de Borst & de Gelder, 2017; Vetter et al., 2014; Albers et al., 2013; Reddy et al., 2010; Ishai et al., 2000), and auditory cortex is activated by imagined sounds (de Borst & de Gelder, 2017; Zvyagintsev et al., 2013; Daselaar et al., 2010; Halpern & Zatorre, 1999). These findings are consistent with embodied cognition accounts, which propose that sensory regions important for perception and action also support mental processes such as comprehension and imagery (for discussion, see Kiefer & Pulvermüller, 2012; Barsalou, 1999, 2008; Patterson, Nestor, & Rogers, 2007). Notably, the majority of studies find recruitment of sensory association cortices during visual (Amedi, Malach, & Pascual-Leone, 2005; Ishai et al., 2000; Knauff, Kassubek, Mulack, & Greenlee, 2000) and auditory mental imagery (Bunzeck, Wuestenberg, Lutz, Heinze, & Jancke, 2005; Zatorre & Halpern, 2005). Moreover, a recent fMRI study showed that both secondary sensory regions and top–down mechanisms are necessary in visual imagery for enhancing the relevant representations in early sensory areas (Dijkstra et al., 2017). Some studies have also found imagery-induced activation in primary sensory cortex (Slotnick, Thompson, & Kosslyn, 2005; Kosslyn, Ganis, & Thompson, 2001; Kosslyn et al., 1999), and the extent to which primary and/or secondary sensory regions are recruited during different modalities of imagery remains a source of contention (Daselaar et al., 2010; Kosslyn et al., 2001). By directly comparing visual and auditory imagery under equivalent conditions in the same experiment, this study can elucidate the role of sensory cortex in mental imagery.

Contemporary accounts of semantic cognition suggest that memory retrieval also relies on abstract representations that are largely invariant to the input modality. A prominent theory of conceptual representation, known as the hub-and-spoke account, suggests that input-invariant concepts draw on a convergence zone in the ventrolateral anterior temporal lobe (ATL), which extracts deep semantic similarities across multiple unimodal features (Lambon Ralph, Jefferies, Patterson, & Rogers, 2017; Patterson et al., 2007). Support for this account comes from a recent fMRI study utilizing multivoxel pattern analysis (MVPA), which demonstrated that anterior inferior and middle temporal gyrus (MTG) support modality-invariant patterns of activity corresponding to meaning. In contrast, superior temporal voxels held patterns of activity that reflected sensory input modality (Murphy et al., 2017). If ventrolateral ATL represents abstract conceptual representations, as expected for a transmodal brain region (Margulies et al., 2016; Mesulam, 2012; Patterson et al., 2007), it may be critical for stimulus-independent cognition regardless of the modality that is being brought to mind.

In line with this broad perspective, studies have revealed ventrolateral ATL activation during the retrieval of concepts across different input modalities (e.g., Murphy et al., 2017; Reilly, Garcia, & Binney, 2016; Rice, Hoffman, & Lambon Ralph, 2015; van Ackeren & Rueschemeyer, 2014; Visser, Jefferies, & Lambon Ralph, 2010; Gabrieli, Brewer, Desmond, & Glover, 1997). Coutanche and Thompson-Schill (2015) also found that the left ATL could successfully decode the properties of an imagined object. In this study, classifiers in visual regions related to the shape (in V1) and color (in V4) of the object predicted classification of the specific imagined object in ATL. This is consistent with the hypothesis that information from sensory cortex is integrated in ATL to form modality-invariant conceptual representations that are critical for perceptually decoupled semantic retrieval, arguably facilitating sensory cortex in a top–down manner by priming the relevant features in sensory brain regions. However, this previous study only examined visual features, whereas connectivity and task activation data suggest ATL acts as a convergence zone across different sensory modalities, including auditory features (Lambon Ralph et al., 2017; Visser et al., 2010; Patterson et al., 2007). Because the convergence of these different modalities is thought to be graded (Lambon Ralph et al., 2017), it is assumed that ventrolateral ATL retains some degree of differential connectivity to auditory and visual cortex. A key question, therefore, is whether transmodal portions of ATL play a common or distinct role in the representation of information about different modalities when imagining visual and auditory features in the absence of input.

Furthermore, it remains unclear which brain regions are recruited during more complex multimodal imagery. Baron and Osherson (2011) found that conceptual combinations were represented in the left ATL: Decoding accuracy was related to classification accuracy for the constituents (boy = young + man). ATL can also show a stronger response to conceptual combinations, perhaps because these combinations require more specific patterns of semantic retrieval (Bemis & Pylkkänen, 2013). However, recent studies have shown that complex mental events are associated with a broader transmodal network, including medial pFC (mPFC; Hartung, Hagoort, & Willems, 2017) and attentional mechanisms (Berger, 2015). Taken together, this literature suggests that the heteromodal regions recruited to support simple semantic imagery across visual and auditory features may not be sufficient when imagination is more complex: Additional regions may come into play to support our capacity to flexibly maintain and integrate multiple features in specific and diverse ways.

We addressed these issues using a range of neuroimaging methods to identify neural patterns that support different aspects of conceptual knowledge at the whole-brain level. Using a constant source of visual and auditory noise as a baseline, participants were asked to imagine information under three different conditions: visual (e.g., what a dog looks like), auditory (what a dog sounds like), and contextual (e.g., imagining a dog in a specific context, such as a race dog). This latter condition combines features from multiple modalities in a complex way (e.g., imagining a race dog may involve the visual properties of a greyhound and race track, as well as the auditory properties of dogs panting and crowds cheering). Figure 1 presents a schematic of the experimental design. We compared the time points during which participants were explicitly instructed to imagine a given concept while observing visual and auditory noise to those in which participants only observed visual and auditory noise (baseline). Our paradigm, therefore, permitted us to investigate the mechanisms involved in imagery while controlling for sensory input across our conditions.

Figure 1. 

Experimental design. Participants were presented with written cues embedded in visual and auditory noise that referred to items they must imagine. Cues referred to one of three tasks (thinking about the sound of a concept; thinking about the visual properties of a concept; thinking about a concept in a particular complex context, i.e., at the races) for one of two concepts (Dogs; Cars). This yielded six experimental conditions (Sound Car, Sound Dog, Visual Car, Visual Dog, Context Car [e.g., Race Car], Context Dog [e.g., Race Dog]). Cues were followed by blocks of pure noise that lasted 6–12 sec. During this presentation of noise, participants were explicitly told to sustain imagery of the cued concept (e.g., the sound of a dog barking) until an item appeared through the noise. Each block ended with either an image or a sound embedded in noise that was either congruent to the cue (e.g., greyhound for the context cue “Race Dog”) or incongruent (e.g., elephant trunk for the visual cue “Visual Dog”). Participants responded with a yes/no response to whether the target trial matched the cue. Time points of interest are highlighted in red: These refer to pure-noise trials where participants were thinking about the relevant cue (e.g., thinking about what a dog sounded like). Cues, each pure-noise image, and targets were shown for 3 sec each.

To understand patterns of common and distinct neural activity that are important for our experimental conditions (auditory features, visual features, and complex conceptual combinations), we first used MVPA to identify regions that code for each condition. Second, we performed conjunctions of these MVPA maps to identify distinct regions representing the presence or absence of a specific condition. Third, we interrogated the univariate activation of our conjunction maps to identify the BOLD response in each region. Fourth, we seeded these maps in an independent resting-state cohort to identify the intrinsic networks that these fall within. Finally, we performed a conjunction of these resting-state maps to identify potential common regions within the large-scale networks necessary for all forms of imagery. To complement the resting-state analyses, we performed a meta-analysis of these spatial maps to provide a quantitative description of the types of cognitive processes that these patterns are linked to.

Using this analysis pipeline, this study examined three questions that emerged from a common and distinct account of semantic retrieval in the absence of meaningful input. First, we examined whether different types of sensory cortex play a specific role in top–down conceptual retrieval in the absence of sensory input. For example, auditory cortex should be recruited more for thinking about what a dog sounds like than what it looks like; moreover, the patterns of activity in this region should be able to decode between thinking about auditory features and other forms of imagery (e.g., visual or context conditions). Given that the majority of the literature highlights the recruitment of sensory association cortex, we predicted that secondary sensory regions would be recruited more extensively than primary sensory regions during imagery. Second, we investigated the contribution of transmodal regions, including ATL, to different forms of imagery. If these regions combine information from different modalities in a graded fashion, differential connectivity might allow these regions to classify imagined visual and auditory features. Finally, using resting-state fMRI, we characterized the intrinsic connectivity of regions identified in our MVPA analysis to understand the neural networks they are embedded in. We anticipated that these regions would show functional connectivity to regions of transmodal cortex implicated in abstract forms of cognition, as well as to relevant portions of sensory cortex (i.e., visual cortex during visual imagery). Together these different analytic approaches permit the investigation of both similarities and differences in the networks recruited when semantic retrieval is internally generated.

METHODS

Functional Experiment

Participants

Twenty participants were recruited from the University of York. One participant's data were excluded because of excessive motion artifacts, leaving 19 participants in the final analysis (11 women; mean age = 23.67 years, range = 18–37 years). Participants were native British speakers, right-handed, and had normal or corrected-to-normal vision. Participants gave written informed consent and were reimbursed for their time. The study was approved by the York Neuroimaging Centre ethics committee at the University of York.

Design

The functional experiment contained six experimental conditions in a 2 (Concept; dog, car) × 3 (Type of imagery; auditory, visual, conceptually complex context) design (see Supplementary Material A2 for the full list of experimental conditions).

Stimuli

Experimental stimuli consisted of (i) six verbal conceptual prompts, one for each of our six experimental conditions (e.g., Dog Sound, which cued participants to imagine what a dog sounded like); (ii) visual and auditory noise that was presented throughout experimental conditions and rest periods (Gaussian visual noise was generated with PsychoPy, 2.7, and auditory white noise was generated with Audacity software, Version 2.0.0); and (iii) target images/sounds. The targets used in this paradigm were piloted before fMRI scanning on a separate group of participants (n = 24) to determine the average length of time taken to detect a target (image or sound) emerging through noise (see Supplementary Material A1 and Table A2 for a full description of the pilot experiment). From this pilot, 10 targets were selected for each of our six experimental conditions (Dog Visual-Features, Car Visual-Features, Dog Sound, Car Sound, Dog Context, and Car Context) based on statistically similar RTs for detecting the item emerging through noise (see Supplementary Material A3 for a full list of stimuli). Images were detected, on average, at 2861 msec and sounds at 2912 msec (see Table 1). The fMRI scan, therefore, allowed 3000 msec for participants to detect whether an item emerging through noise matched the content of their imagery.
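For illustration only, noise stimuli of this kind could be generated as in the sketch below; this is not the authors' script, and the window size, noise parameters, and file names are assumptions.

```python
# Hypothetical sketch of the noise stimuli described above (not the original scripts).
import numpy as np
from scipy.io import wavfile
from psychopy import visual

win = visual.Window(size=(1024, 768), color=0, units='pix')

# Gaussian visual noise, clipped into PsychoPy's [-1, 1] image range
frame = np.clip(np.random.normal(loc=0.0, scale=0.5, size=(256, 256)), -1, 1)
noise_stim = visual.ImageStim(win, image=frame, size=(512, 512))
noise_stim.draw()
win.flip()

# Broadband (white) auditory noise, written out as a 3-sec wav file
sample_rate = 44100
white_noise = np.random.normal(0.0, 0.1, sample_rate * 3).astype(np.float32)
wavfile.write('white_noise.wav', sample_rate, white_noise)
```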

Table 1. 
Behavioral Scores
fMRI Experiment
Condition   | RT (msec)  | Acc (%)
Car sound   | 2748 (713) | 82.11 (16.53)
Dog sound   | 2753 (552) | 76.84 (12.04)
Car visual  | 2704 (204) | 83.68 (11.64)
Dog visual  | 2620 (241) | 82.63 (9.91)
Car context | 2754 (211) | 76.76 (12.62)
Dog context | 2569 (250) | 79.61 (14.71)

Standard deviation in parentheses. Acc = percentage accuracy.

Task Procedure

Before being scanned, participants completed a practice session, identical to one scanning run. After this practice run, participants were asked to describe what they had been focused on during the pure-noise trials to ensure they were imagining the relevant concepts. For the in-scanner task, stimuli were presented in four independent runs. Within each scanning run, participants were presented with a cue word (e.g., Sound DOG) and instructed to imagine this concept in the presence of visual and auditory noise; for instance, they were told to imagine the sound of a dog barking, growling, yelping, and so forth. They were asked to continue imagining the relevant visual or auditory properties until the stimulus appeared. Task instructions were presented for 3 sec. A variable number of images then followed, each displaying visual and auditory noise (see Figure 1). Within the blocks, the pure-noise images were each shown for 3 sec. Following a variable length of time (between 6 and 12 sec after the initial cue), a target image or sound began to emerge through the noise (at the rate outlined in the pilot experiment described above). Participants were instructed to indicate with a button press (yes/no) whether a target item emerging through visual and auditory noise was related to what they had been imagining based on the cue word. Participants were given 3000 msec to respond to this item. The block automatically ended after this item. This design afforded us the high signal sensitivity found with block designs, combined with unpredictability to keep participants cognitively engaged.

The basic design of the task, in which stimuli appeared through noise, was adapted from a previous fMRI study examining internally generated conceptual retrieval (see Coutanche & Thompson-Schill, 2015). One advantage of this method is that it required participants to sustain and search for a particular concept over time, ensuring that there were several seconds of data available per trial for the decoding analyses. In addition, because the onset of the stimulus was subtle within the visual and auditory noise, participants were encouraged to engage in semantically driven retrieval before the presentation of the target.

Each experimental condition (e.g., “Dog Sound”) occurred twice in a run (giving eight blocks for each condition across the experiment). Blocks were presented in a pseudorandomized order so the same cue did not immediately repeat, and blocks were separated by 12-sec fixation. During the fixation period, the visual noise and auditory noise were also presented to create an active baseline. Fifty percent of the items emerging through noise did not match the preceding cue (i.e., four of eight were foils), ensuring that participants focused on the specific target. To encourage participants to pay attention from the very start of every block, an additional short block was included in each run, in which an item emerged through noise after only 3 sec, followed by 12 sec of fixation. These blocks were disregarded in the analysis.

Acquisition

Data were acquired using a GE 3T HD Excite MRI scanner at the York Neuroimaging Centre, University of York. A Magnex head-dedicated gradient insert coil was used in conjunction with a birdcage radio frequency coil tuned to 127.4 MHz. A gradient-echo EPI sequence was used to collect data from 38 bottom–up axial slices aligned with the temporal lobe (repetition time [TR] = 2 sec, echo time [TE] = 18 msec, field of view [FOV] = 192 × 192 mm, matrix size = 64 × 64, slice thickness = 3 mm, slice gap = 1 mm, flip angle = 90°). Voxel size was 3 × 3 × 3 mm. Functional images were coregistered onto a T1-weighted anatomical image from each participant (TR = 7.8 sec, TE = 3 msec, FOV = 290 mm × 290 mm, matrix size = 256 × 256, voxel size = 1.13 mm × 1.13 mm × 1 mm) using linear registration (FLIRT, FMRIB Software Library [FSL]). This sequence was chosen because previous studies employing it have produced an adequate signal-to-noise ratio in regions prone to signal dropout, such as ATL (e.g., Murphy et al., 2017; Coutanche & Thompson-Schill, 2015).

To ensure that our ROIs had sufficient signal to detect reliable fMRI activation, the temporal signal-to-noise ratio (tSNR) for each participant was calculated by dividing the mean signal in each voxel by the standard deviation of the residual error time series in that voxel (Friedman, Glover, & The FBIRN Consortium, 2006). tSNR values were averaged across the voxels in both ATL and mPFC, regions that suffer from signal loss and distortion due to their proximity to air-filled sinuses (Jezzard & Clare, 1999). Mean tSNR values, averaged across participants, were as follows: ATL, 82.85; mPFC, 97.14. The percentage of voxels in each ROI that had “good” tSNR values (>20; Binder et al., 2011) was above 97% for all ROIs: ATL, 97.19%; mPFC, 99.24%. These values indicate that the tSNR was sufficient to detect reliable fMRI activation in all ROIs (Binder et al., 2011).
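A minimal sketch of this tSNR computation, assuming the mean functional image, GLM residuals, and ROI masks have already been written to NIfTI files (file names hypothetical):

```python
import nibabel as nib
import numpy as np

def tsnr_map(mean_func_file, residuals_file):
    """Voxelwise tSNR = mean signal / SD of the residual error time series."""
    mean_signal = nib.load(mean_func_file).get_fdata()
    residuals = nib.load(residuals_file).get_fdata()        # shape (x, y, z, t)
    resid_sd = residuals.std(axis=-1)
    with np.errstate(divide='ignore', invalid='ignore'):
        tsnr = np.where(resid_sd > 0, mean_signal / resid_sd, 0.0)
    return tsnr

def roi_tsnr_summary(tsnr, roi_mask_file, good_threshold=20):
    """Mean tSNR within an ROI and the percentage of 'good' voxels (tSNR > 20)."""
    roi = nib.load(roi_mask_file).get_fdata() > 0
    vals = tsnr[roi]
    return vals.mean(), 100.0 * (vals > good_threshold).mean()
```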

Preprocessing

Imaging data were preprocessed using the FSL toolbox (www.fmrib.ox.ac.uk/fsl). Images were skull-stripped using a brain extraction tool (Smith, 2002) to remove nonbrain tissue from the image. The first five volumes (10 sec) of each scan were removed to minimize the effects of magnetic saturation, and slice-timing correction was applied. Motion correction (MCFLIRT; Jenkinson, Bannister, Brady, & Smith, 2002) was followed by temporal high-pass filtering (cutoff = 0.01 Hz). Individual participant data were first registered to their high-resolution T1 anatomical image and then into a standard space (Montreal Neurological Institute MNI152); this process included trilinear interpolation of voxel sizes to 2 × 2 × 2 mm. For univariate analyses, data were additionally smoothed (Gaussian FWHM 6 mm).
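For illustration, approximate nipype wrappers for the main FSL steps listed above are sketched below; the study used FSL directly, and all file names here are hypothetical.

```python
# Approximate nipype equivalents of the FSL preprocessing steps (illustrative only).
from nipype.interfaces import fsl

fsl.BET(in_file='func.nii.gz', functional=True,
        out_file='func_brain.nii.gz').run()                      # skull stripping
fsl.ExtractROI(in_file='func_brain.nii.gz', t_min=5, t_size=-1,
               roi_file='func_trim.nii.gz').run()                # drop first 5 volumes
fsl.SliceTimer(in_file='func_trim.nii.gz', time_repetition=2,
               out_file='func_st.nii.gz').run()                  # slice-timing correction
fsl.MCFLIRT(in_file='func_st.nii.gz', save_plots=True,
            out_file='func_mc.nii.gz').run()                     # motion correction
fsl.FLIRT(in_file='func_mc.nii.gz', reference='T1_brain.nii.gz',
          dof=6, out_matrix_file='func2struct.mat').run()        # register to T1
```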

Multivariate Pattern Analysis

Analysis was focused on the moments when participants were imagining the target cues (e.g., thinking about what a dog looked like or what a car sounded like). The condition onset and duration were taken from the first pure-noise trial in each block (after the initial cue) to the end of the last pure-noise trial (before the item began to emerge through the noise). The response to each of the six conditions was contrasted against the active rest baseline (periods of auditory and visual noise where participants were not cued to imagine concepts). Box-car regressors for each condition, for each run, in the general linear model were convolved with a double gamma hemodynamic response function (FEAT, FSL). Regressors of no interest were also included to account for head motion within scans. MVPA was conducted on spatially unsmoothed data to preserve local voxel information. For each voxel in the brain, we trained a linear support vector machine (LIBSVM; with a fixed regularization hyperparameter, C = 1) using fourfold (leave-one-run-out) cross-validation, implemented in custom Python scripts using the pyMVPA software package (Hanke et al., 2009). A support vector machine was chosen to combat overfitting by limiting the complexity of the classifier (Lewis-Peacock & Norman, 2014). The classifier was trained on three runs and tested on the independent fourth run; the testing set was then alternated for each of four iterations. Classifiers were trained and tested on individual participant data transformed into MNI standard space. The functional data were first z-scored per voxel within each run. The searchlight analysis was implemented by extracting the z-scored β values from spheres (6-mm radius) centered on each voxel in the masks. A sphere of this size included ∼123 3-mm voxels (when not restricted by the brain's boundary; Kriegeskorte, Goebel, & Bandettini, 2006). Classification accuracy (proportion of correctly classified trials) for each sphere was assigned to the sphere's central voxel to produce accuracy maps. The resulting accuracy maps were then smoothed with a Gaussian kernel (6 mm FWHM). To determine whether accuracy maps were above chance levels (50%), individual accuracy maps were entered into a higher-level group analysis (mixed effects, FLAME; www.fmrib.ox.ac.uk/fsl), testing the accuracy values across participants against chance for each voxel. Voxel inclusion was set at z = 2.3, with a cluster significance threshold at FWE p < .01.
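As an illustration of this logic (rather than the authors' pyMVPA pipeline), the following sketch runs a comparable whole-brain searchlight with nilearn and scikit-learn; the file names, label vector, and run vector are hypothetical placeholders.

```python
# Minimal searchlight sketch with nilearn/scikit-learn (the study used pyMVPA).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut
from nilearn.decoding import SearchLight
from nilearn.image import new_img_like, smooth_img

labels = np.repeat(['auditory', 'visual'], 8)   # one label per beta map (placeholder)
runs = np.tile([1, 2, 3, 4], 4)                 # run membership for leave-one-run-out CV

searchlight = SearchLight(
    mask_img='brain_mask.nii.gz',
    radius=6,                                   # mm, matching the 6-mm searchlight spheres
    estimator=LinearSVC(C=1),                   # linear SVM, fixed C = 1
    cv=LeaveOneGroupOut(),
    n_jobs=-1,
)
searchlight.fit('beta_maps_4d.nii.gz', labels, groups=runs)

# scores_ holds per-voxel cross-validated accuracy; smooth before group statistics
acc_img = new_img_like('brain_mask.nii.gz', searchlight.scores_)
acc_img_smoothed = smooth_img(acc_img, fwhm=6)
```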

The following classification tests were performed: (1) Car versus Dog classifier: This examined whether patterns of activity conveyed information about conceptual identity by training a classifier to discriminate between periods of noise where participants were thinking about a dog and periods of noise where participants were thinking about a car. We were not able to successfully classify the semantic class (dog vs. car) in our data set at the whole-brain level. As this analysis revealed no regions across the cortex could successfully decode this information, the remaining classification tests combined car and dog trials. (2) Auditory versus visual classifier: This examined whether patterns of activity conveyed information regarding the modality of imagery by training a classifier to discriminate between periods of noise where participants were thinking about the visual properties of objects and periods of noise where participants were thinking about the auditory properties of objects. (3) Visual versus context classifier: Here, a classifier was trained to discriminate between periods of noise where participants were thinking about the visual properties of objects and periods of time when participants were thinking about objects in more complex conceptual contexts. (4) Auditory versus context classifier: Here, a classifier was trained to discriminate between periods of noise where participants were thinking about the auditory properties of objects and periods of time when participants were thinking about objects in complex contexts. Unthresholded maps from all analyses are uploaded on Neurovault: neurovault.org/collections/2671/.

Next, we identified regions where patterns of activity consistently informed the classifier for each of our three conditions (visual, auditory, and context) by running a formal conjunction on the uncorrected searchlight maps (using the FSL easythresh command). For visual patterns, we looked at the conjunction of the two searchlight maps that decoded visual properties (visual vs. auditory and visual vs. context). Because regions that contributed to both of these searchlight maps were able to decode simple visual features in imagination, relative to both auditory features and more complex contexts, we reasoned that their pattern of activation related to simple visual features. Next, we looked at the conjunction of the two searchlight maps that decoded the auditory condition (auditory vs. visual and auditory vs. context) to identify brain regions containing patterns of activation relating to simple auditory properties in imagination. Finally, we looked at the conjunction of the two searchlight maps that decoded context properties (context vs. visual and context vs. auditory). This identified brain regions containing activation patterns relating to complex conceptual contexts, as distinct from both simple visual and auditory features. All analyses were cluster-corrected using a z-statistic threshold of 2.3 to define contiguous clusters. Multiple comparisons were controlled using Gaussian random field theory at a threshold of p < .01.
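A minimum-statistic conjunction of this kind can be sketched as follows; this illustrative snippet uses nibabel/numpy rather than FSL's easythresh, omits cluster-level correction, and assumes hypothetical file names for the group z-maps.

```python
# Minimum-statistic conjunction of two group z-maps (illustrative sketch).
import nibabel as nib
import numpy as np

z1 = nib.load('zstat_visual_vs_auditory.nii.gz')
z2 = nib.load('zstat_visual_vs_context.nii.gz')

conjunction = np.minimum(z1.get_fdata(), z2.get_fdata())  # a voxel survives only if
conjunction[conjunction < 2.3] = 0                        # both maps exceed z = 2.3

nib.save(nib.Nifti1Image(conjunction, z1.affine), 'conjunction_visual.nii.gz')
```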

Univariate Analysis

We examined univariate activation to further characterize the response within our unimodal and transmodal regions defined by MVPA. The percent signal change was extracted for each condition from ROIs defined by the MVPA conjunctions (see above).
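An illustrative way to extract such condition-wise values is sketched below using nilearn spheres centered on the conjunction peaks reported in the Results; the 4-D input file is hypothetical, and because the sphere size is reported as 6 mm (diameter in one section, sphere size elsewhere), treating it as a radius here is an assumption.

```python
# Sketch: extract per-condition signal from spheres around the conjunction peaks.
from nilearn.maskers import NiftiSpheresMasker   # nilearn >= 0.9 (older: nilearn.input_data)

peaks = [(-48, -70, 2),    # visual conjunction peak
         (-48, -60, 0),    # context conjunction peak
         (-52, -8, -10)]   # sound conjunction peak

masker = NiftiSpheresMasker(seeds=peaks, radius=6)
# 'psc_maps_4d.nii.gz' (hypothetical): one percent-signal-change map per condition
roi_values = masker.fit_transform('psc_maps_4d.nii.gz')   # shape (n_conditions, n_seeds)
```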

Resting-state fMRI

Participants

This analysis was performed on a separate cohort of 157 healthy participants at York Neuroimaging Centre (89 women; mean age = 20.31 years, range = 18–31 years). Participants completed a 9-min functional connectivity MRI scan during which they were asked to rest in the scanner with their eyes open. Using these data, we examined the resting-state fMRI connectivity of our conjunction regions that were informative to decoding visual imagery, auditory imagery, and contextual imagery to investigate whether these regions fell within similar or distinct networks. The data from our resting-state scans have been used in prior published works from the same lab (e.g., Murphy et al., 2017, 2018; Villena-Gonzalez et al., 2018; Wang et al., 2018; Poerio et al., 2017; Sormaz et al., 2017; Vatansever et al., 2017).

Acquisition

As with the functional experiment, a Magnex head-dedicated gradient insert coil was used in conjunction with a birdcage radio frequency coil tuned to 127.4 MHz. For the resting-state data, a gradient-echo EPI sequence was used to collect data from 60 axial slices with an interleaved (bottom–up) acquisition order with the following parameters: TR = 3 sec, TE = minimum full, volumes = 180, flip angle = 90°, matrix size = 64 × 64, FOV = 192 × 192 mm, voxel size = 3 × 3 × 3 mm. A minimum full TE was selected to optimize image quality (as opposed to selecting a value less than minimum full, which, for instance, would be beneficial for obtaining more slices per TR). Functional images were coregistered onto a T1-weighted anatomical image from each participant (TR = 7.8 sec, TE = 3 msec, FOV = 290 mm × 290 mm, matrix size = 256 × 256, voxel size = 1 mm × 1 mm × 1 mm).

Preprocessing

Data were preprocessed using the FSL toolbox (www.fmrib.ox.ac.uk/fsl). Before conducting the functional connectivity analysis, the following prestatistics processing was applied to the resting-state data: motion correction using MCFLIRT to safeguard against motion-related spurious correlations; slice-timing correction using Fourier space time-series phase shifting; nonbrain removal using a brain extraction tool; spatial smoothing using a Gaussian kernel of FWHM 6 mm; grand mean intensity normalization of the entire 4-D data set by a single multiplicative factor; high-pass temporal filtering (Gaussian-weighted least squares straight line fitting, with sigma = 100 sec); Gaussian low-pass temporal filtering, with sigma = 2.8 sec.
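For clarity, the filter widths quoted in seconds correspond to the following values in volumes, the units expected by FSL's fslmaths -bptf (arithmetic only):

```python
# Converting the Gaussian filter sigmas from seconds to volumes for fslmaths -bptf
TR = 3.0                           # resting-state TR (sec)
highpass_sigma_vols = 100.0 / TR   # ≈ 33.3 volumes (sigma = 100 sec)
lowpass_sigma_vols = 2.8 / TR      # ≈ 0.93 volumes (sigma = 2.8 sec)
# e.g., fslmaths rest.nii.gz -bptf 33.33 0.93 rest_filtered.nii.gz
```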

Low-level Analysis

For each conjunction site, we created spherical seed ROIs, 6 mm in diameter, centered on the peak conjunction voxel: the visual conjunction site in the left inferior lateral occipital cortex [−48 −70 −2], the auditory conjunction site in the left superior temporal gyrus [−48 −12 −10], and the context conjunction site in the left lateral occipital cortex [−48 −60 0] (see Supplementary Table A2). This ensured that we assessed the functional connectivity of a key site when the searchlight conjunction revealed a large cluster or multiple clusters. The time series of these regions were extracted and used as explanatory variables in a separate participant-level functional connectivity analysis for each seed. Participant-specific nuisance regressors were determined using a component-based noise correction (CompCor) approach (Behzadi, Restom, Liau, & Liu, 2007). This method applies PCA to the fMRI signal from participant-specific white matter and CSF ROIs. In total, there were 11 nuisance regressors: five components from CompCor and a further six motion regressors estimated during motion correction with MCFLIRT. These nuisance components were then removed from the fMRI data through linear regression. The white matter (WM) and CSF covariates were generated by segmenting each individual's high-resolution structural image (using FAST in FSL; Zhang, Brady, & Smith, 2001). The default tissue probability maps, referred to as prior probability maps, were registered to each individual's high-resolution structural image (T1 space), and the overlap between these prior probability maps and the corresponding CSF and WM maps was identified. These maps were then thresholded (40% for the CSF and 66% for the WM), binarized, and combined. The six motion parameters were calculated in the motion correction step during preprocessing. Movement in each of the three Cartesian directions (x, y, z) and rotational movement around three axes (pitch, yaw, roll) were included for each individual.
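The logic of this nuisance regression can be sketched as follows, assuming a precomputed combined WM/CSF mask and MCFLIRT motion parameters; this is an illustration of the approach, not the exact pipeline.

```python
# Minimal CompCor-style nuisance regression sketch (numpy/scikit-learn).
import numpy as np
from sklearn.decomposition import PCA

def compcor_components(func_data, noise_mask, n_components=5):
    """func_data: (x, y, z, t) array; noise_mask: boolean WM+CSF mask."""
    ts = func_data[noise_mask].astype(float)                    # (voxels, t)
    ts -= ts.mean(axis=1, keepdims=True)
    ts /= ts.std(axis=1, keepdims=True) + 1e-8                  # avoid divide-by-zero
    return PCA(n_components=n_components).fit_transform(ts.T)   # (t, 5)

def remove_nuisance(voxel_ts, compcor, motion):
    """Regress intercept + 5 CompCor + 6 motion regressors out of a time series."""
    X = np.column_stack([np.ones(len(voxel_ts)), compcor, motion])
    beta, *_ = np.linalg.lstsq(X, voxel_ts, rcond=None)
    return voxel_ts - X @ beta
```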

High-level Analysis

At the group level, the data were processed using FEAT Version 5.98 within FSL (www.fmrib.ox.ac.uk/fsl), and the analyses were carried out using FMRIB's Local Analysis of Mixed Effects (FLAME) Stage 1 with automatic outlier detection. No global signal regression was performed. The z-statistic images were then thresholded using clusters determined by z > 2.3 and a cluster-corrected significance threshold of p = .01. Finally, to determine whether our connectivity maps overlapped with one another, we calculated the number of overlapping voxels for our three conjunction site connectivity maps.
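The overlap count can be computed with a few lines of numpy, assuming the three cluster-corrected maps have been saved with zeros outside significant clusters (file names hypothetical):

```python
# Counting voxels common to the three thresholded connectivity maps (sketch).
import nibabel as nib
import numpy as np

files = ('conn_visual_thresh.nii.gz',
         'conn_sound_thresh.nii.gz',
         'conn_context_thresh.nii.gz')
masks = [nib.load(f).get_fdata() > 0 for f in files]
print('overlapping voxels:', int(np.logical_and.reduce(masks).sum()))
```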

RESULTS

Behavioral Results

To determine whether our experimental conditions were well matched at the behavioral level, accuracy and RTs for the fMRI session were calculated for each participant (n = 19). All participants were engaged in the correct task (e.g., thinking about the sound of a dog) as indicated by a mean accuracy score above 75% for all experimental conditions (Table 1). A 2 (Semantic category; car, dog) × 3 (Condition type; auditory, visual, context) repeated-measures ANOVA revealed no differences in Accuracy across the three types of conditions (auditory, visual, conceptually complex context; F(2, 36) = 2.32, p = .11) and no effect of Concept (car, dog; F(1, 18) = 1.95, p = .66). RT scores were also well matched across our experimental conditions (Table 1). A 2 × 3 repeated-measures ANOVA revealed there was no difference in RT between the three experimental tasks (auditory, visual, conceptually complex context; F(2, 36) = 0.46, p = .64), no effect of concept (car, dog; F(1, 18) = 2.61, p = .09), and no interaction between Condition types and Concept (F(2, 36) = 1.17, p = .37). Furthermore, the in-scan RT data were close to the RT in our pilot study (see Supplementary Table A2), suggesting that participants required the same amount of time to detect stimuli both in and out of the scanner (mean RT for images = 2660 msec, SD = 233 msec, mean RT for sounds = 2763 msec, SD = 616 msec).
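A minimal sketch of such a 2 × 3 repeated-measures ANOVA using statsmodels, assuming a long-format table with hypothetical file and column names:

```python
# Sketch of the 2 (concept) x 3 (condition) repeated-measures ANOVA on RT.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv('behavioural_long.csv')   # columns: subject, concept, condition, rt, acc
rt_model = AnovaRM(df, depvar='rt', subject='subject',
                   within=['concept', 'condition']).fit()
print(rt_model.summary())                  # F and p for both main effects and the interaction
```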

MVPA Decoding Results

To test which brain regions held patterns of activity related to the type of internally generated conceptual retrieval, we examined brain regions that could classify between conditions during the presentation of auditory and visual noise. For example, the auditory versus visual classifier was trained on the distinction between thinking about auditory and visual properties of concepts (collapsed across both cars and dogs) and tested on the same distinction in unseen data using a cross-validated approach. All results reported are above chance levels (50%, cluster-corrected, p < .01).

Notably, the whole-brain searchlight analysis for the distinction between semantic categories (car vs. dog) revealed no significant results. This finding is broadly consistent with previous decoding studies of internally generated thought, which have shown that specific-level concepts (e.g., lime vs. celery) can be decoded; however, categorical-level concepts (e.g., fruit vs. vegetable) were not successfully classified (Coutanche & Thompson-Schill, 2015). This may reflect the dynamic nature of conceptually driven internally generated thought; for instance, on one trial, participants may have been thinking about the exterior look of a car and on the next trial imagining the interior decor.

In contrast, the whole-brain searchlight analysis for the distinction between visual and auditory imagery revealed an extensive network of brain regions including sensory regions, such as bilateral inferior lateral occipital cortex, left fusiform, and left auditory cortex (encompassing planum polare and Heschl's gyrus extending more broadly into superior temporal gyrus), as well as transmodal brain regions that have been implicated in semantic processing, such as MTG, ATL (middle, inferior, fusiform, and parahippocampal portions) and, on the medial surface, anterior cingulate gyrus and thalamus (see Figure 2A; Table 2).

Figure 2. 

Results of the group-level whole-brain searchlight analysis with above-chance (50%) decoding projected in red (cluster-corrected p < .01). All panels reveal results from binary choice searchlight analyses decoding the content of thought while participants viewed visual and auditory noise. (A) Location of searchlights that could decode between thinking about the sound and thinking about the visual properties of concepts. (B) Location of searchlights that could decode between thinking about the visual properties of concepts and thinking about the same concepts in more complex contexts. (C) Location of searchlights that could decode between thinking about the sound of concepts and thinking about the same concepts in more complex contexts.

Table 2. 
Center Voxel Coordinates of Highest Decoding Sphere in the Searchlight Analyses
Condition | Cluster Peak | Extended Cluster Regions | Cluster Extent | Z Score | Acc (%) | x y z
Auditory vs. visual | L lateral occipital cortex, superior division | L lateral occipital cortex, inferior division, L occipital pole, L occipital fusiform gyrus | 975 | 4.13 | 75.00% | −36 −86 10
 | L thalamus | R thalamus | 599 | 4.18 | 66.25% | −12 −26
 | R lateral occipital cortex, inferior division | R MTG, temporo-occipital part | 431 | 4.43 | 68.75% | 54 −66 10
 | L planum polare | L superior temporal gyrus, posterior division, insular cortex, L Heschl's gyrus, anterior superior temporal gyrus | 226 | 3.77 | 70.75% | −40 −16 −8
 | L SMG, posterior division | L planum temporale, posterior superior temporal gyrus | 178 | 3.52 | 75.00% | −60 −42 16
 | R frontal operculum cortex | R frontal orbital cortex, R insular cortex | 156 | 3.37 | 68.25% | 40 22
 | L aPG | L temporal fusiform gyrus | 75 | 4.34 | 75.00% | −36 −18 −18
 | L aMTG | L anterior inferior temporal gyrus | 67 | 4.12 | 66.25% | −56 −6 −18
 | L anterior cingulate gyrus | | 49 | 3.82 | 58.36% | −4 34 −2
Visual vs. context | L lateral occipital cortex, inferior division | L MTG, temporo-occipital part, L occipital pole | 733 | 4.16 | 68.75% | −46 −72
Auditory vs. context | L lateral occipital cortex, inferior division | L temporal occipital fusiform cortex, L inferior temporal gyrus, temporo-occipital part | 312 | 3.81 | 76.49% | 48 −62 −6
 | R temporal occipital fusiform gyrus | R lateral occipital cortex, inferior division, R inferior temporal gyrus, temporo-occipital part, R MTG, temporo-occipital part | 118 | 3.17 | 68.75% | 34 −56 −20
 | R pMTG | R posterior superior temporal gyrus, R SMG, R anterior superior temporal gyrus | 90 | 2.92 | 68.75% | 56 −34 −2
 | R posterior superior temporal gyrus | R MTG, R planum polare, R planum temporale | 81 | 3.15 | 75.00% | 60 −22

Highest decoding accuracy clusters for each of our three classifiers analyzed separately. The auditory versus visual classifier was trained on the distinction between thinking about the sound of a concept versus thinking about what a concept looked like. The visual versus context classifier was trained on the distinction between thinking about what a concept looked like versus thinking about it in a specific meaning-based context. The sound versus context classifier was trained on the distinction between thinking about what a concept sounded like and thinking about it in a specific meaning-based context. All analyses were cluster corrected using a z-statistic threshold of 2.3 to define contiguous clusters. Multiple comparisons were controlled using a Gaussian random field theory at a threshold of p < .01. As well as peak accuracy (reported under the “Cluster Peak” column), the “Extended Cluster Regions” include all significant regions within each ROI. The unthresholded MVPA maps for each searchlight have been uploaded to the Neurovault database and can be found here neurovault.org/collections/2671/. L = left; R = right.

Next, we examined a visual versus context classifier, which identified regions that could classify the difference between thinking about the visual properties of concepts and thinking about the same concepts in complex conceptual contexts. This whole-brain searchlight analysis revealed a large region in the left occipital lobe that could decode between visual and context conditions at above chance levels (50%, cluster-corrected p < .01; Figure 2B; Table 2). Finally, we tested whether auditory versus context conditions could be decoded. This whole-brain searchlight analysis revealed a set of clusters in bilateral auditory cortex extending along the superior temporal gyrus into ATL and posterior occipital-temporal cortex that could decode between auditory and context conditions (50%, cluster-corrected p < .01; Figure 2C; Table 2).

To identify regions that could consistently decode visual, auditory, and context conditions, conjunction analyses were performed across the searchlight maps outlined in Figure 2A–C. The results of these conjunctions are presented in Figure 3A. For visual imagery, we looked at the conjunction of the two searchlight maps that involved decoding simple visual features (visual vs. auditory and visual vs. context). This revealed a left-lateralized cluster in the occipital pole extending into lateral occipital cortex, which reliably decoded the distinction between simple visual feature trials and both of the other conditions. For auditory imagery, we looked at the conjunction of the two searchlight maps that involved decoding auditory properties (auditory vs. visual and auditory vs. context). This analysis revealed left hemisphere regions, including primary auditory cortex, superior temporal gyrus, posterior MTG (pMTG), and occipital fusiform, which reliably decoded the distinction between simple auditory feature trials and both of the other conditions. For imagery driven by complex conceptual contexts, we looked at the conjunction of the two searchlight maps that involved decoding context (context vs. visual and context vs. auditory), which produced a cluster in the left lateral occipital cortex.

Figure 3. 

(A) Represents brain regions where patterns of activity consistently informed the classifier for each of our three tasks (visual, context, and sound). For visual patterns, we looked at the conjunction of the two searchlight maps that decoded visual properties (sound vs. visual and visual vs. context). For context patterns, we looked at the conjunction of the two searchlight maps that decoded context properties (visual vs. context and sound vs. context). For sound patterns, we looked at the conjunction of the two searchlight maps that decoded sound properties (sound vs. visual and sound vs. context). (B) Shows the univariate percent signal change for each of our three conditions taken from a 6-mm sphere centered on the peak conjunction point (visual [−48 −70 2], context [−48 −60 0], sound [−52 −8 −10]). (C = context, S = sound, V = visual). * Indicates a significant difference between conditions (p < .05). Error bars represent 95% confidence intervals. The unthresholded maps for each condition have been uploaded to the Neurovault database and can be found at neurovault.org/collections/2671/. (C) Gray panel illustrates the seven core intrinsic networks identified by Yeo et al. (2011): dark purple = visual network, light blue = somatosensory network, dark green = dorsal network, light pink = ventral network, white = limbic network, yellow/orange = frontoparietal network (FPN), and red = DMN. The black circles highlight where our peak conjunction sites fall with respect to these networks. Our peak visual conjunction fell within the visual network, the peak context conjunction fell within the dorsal network, and the peak sound conjunction site fell within the somatosensory network.

The conjunction of the MVPA searchlight maps revealed regions of sensory cortex that could decode different types of imagery (Figure 3A). As an additional complementary analysis, the percentage signal change was extracted for each condition from each of the three conjunction sites by placing a 6-mm sphere around the peak (Figure 3B). A 3 (Conjunction site; visual, sound, conceptually complex context) × 3 (Condition type: visual, sound, conceptually complex context) repeated-measures ANOVA revealed no significant main effect of Conjunction site, F(2, 36) = 0.48, p = .622, or Condition type, F(2, 36) = 2.30, p = .114; however, there was a significant interaction between Site and Condition type, F(4, 72) = 4.38, p = .003. Planned comparisons in the form of repeated-measures t tests revealed that our visual cluster showed significantly more activity for our visual condition than auditory, t(18) = 4.99, p < .001, and for context versus auditory conditions, t(18) = 4.61, p < .001, but there was no significant difference between the visual and context conditions, t(18) = 0.94, p = .36. Likewise, our auditory cluster showed significantly more activity for our auditory condition than visual, t(18) = 4.64, p < .001, and for the context versus visual conditions, t(18) = 5.602, p < .001, but no significant difference between auditory and context conditions, t(18) = −1.17, p = .25. Finally, our context cluster revealed significantly more activity for the context condition compared with both visual, t(18) = 5.56, p < .001, and auditory conditions, t(18) = 5.31, p < .001, but no significant difference between visual and auditory conditions, t(18) = −0.03, p = .97.

These univariate analyses demonstrate that regions that were able to classify particular aspects of internally driven conceptual retrieval also showed a stronger BOLD response to these conditions, that is, greater activation to visual or auditory imagery in “visual” and “auditory” classifier areas, and more activation to complex conceptual contexts in areas that could reliably classify this context condition. Regions that could decode visual and auditory conditions also responded to the context condition, consistent with the view that there is a multisensory response to complex conceptual contexts. Moreover, the context classifier region showed a response across both visual and auditory conditions, suggesting this region is transmodal; however, it also showed an increased response in the context condition, supporting the view that this region responds most strongly to the unique demands of the construction process imposed by complex contexts. Finally, to determine which distributed networks our conjunction findings fall within, we compared our results with seven large-scale networks as defined by Yeo et al. (2011) (Figure 3C). Both visual and sound conjunction clusters fell predominantly within unimodal sensory networks (visual and somatosensory respectively), whereas our context conjunction site was located within the dorsal attentional network.

Given our prior predictions regarding heteromodal cortex (e.g., ATL), we interrogated candidate heteromodal regions within the auditory versus visual classifier map. The brain regions labeled on Figure 4 are the peaks representing the highest decoding accuracy taken from Table 2, with the exclusion of peaks in unimodal cortex (determined by the conjunction results). This analysis included a distributed network of putative transmodal regions, including supramarginal gyrus (SMG) extending into pMTG, ventrolateral ATL (anterior MTG [aMTG] and anterior inferior temporal gyrus), thalamus, anterior parahippocampal gyrus (aPG), and anterior cingulate cortex (aCC; Figure 4A). As before, the percent signal change was extracted from each of these regions by placing a 6-mm sphere around each peak: SMG [−60 −42 16], aMTG [−56 −6 −18], aCC [−4 34 −2], thalamus [−12 26 2], and aPG [−36 −18 −18]. A 5 (Location; SMG, aMTG, aCC, thalamus, aPG) × 3 (Condition type: visual, sound, conceptually complex context) repeated-measures ANOVA revealed no significant main effect of Location, F(4, 72) = 0.34, p = .71, or Condition type, F(4, 72) = 2.02, p = .131, nor was there a significant interaction between Site and Condition type, F(8, 144) = 2.65, p = .102. This equivalency across conditions is consistent with the characterization of these regions as transmodal. Finally, to quantify which intrinsic networks our clusters fall within, we compared our results with seven large-scale networks as defined by Yeo et al. (2011) (Figure 4B). The majority of clusters fell within transmodal cortices, including the default mode network (DMN) and limbic system.

Figure 4. 

Heteromodal brain regions taken from the auditory versus visual classifier map (Figure 2A). (A) Labeled regions highlight the peaks of decoding accuracy from Table 2 (excluding those peaks in unimodal cortex highlighted in our conjunction analysis for sound and visual imagination): SMG [−60 −42 16], aMTG [−56 −6 −18], aCC [−4 34 −2], thalamus [−12 26 2], aPG [−36 −18 −18]. The bar graph shows the univariate percent signal change for each of our three conditions (C = context, S = sound, V = visual) extracted from a 6-mm sphere centered on each labeled peak. There was no significant difference between conditions across any of our ROIs (p > .05). Error bars represent 95% confidence intervals. The unthresholded maps can be found at neurovault.org/collections/2671/. (B) Gray panel illustrates the seven core intrinsic networks identified by Yeo et al. (2011): dark purple = visual network, light blue = somatosensory network, dark green = dorsal network, light pink = ventral network, white = limbic network, yellow/orange = frontoparietal network (FPN), and red = DMN. The black circles highlight where our peak sites fall with respect to these networks. SMG falls between the ventral and somatomotor networks; aMTG and aCC fall within the DMN; aPG falls within the limbic network. Subcortical regions (e.g., the thalamus) are not shown on the Yeo et al. (2011) networks.

Intrinsic Connectivity

To better understand the neural architecture that supported each of our experimental conditions, we explored the intrinsic connectivity of our unimodal conjunction sites (Figure 3) and transmodal sites (Figure 4), identified through MVPA, using resting-state fMRI. The results of the unimodal connectivity analysis are presented in Figure 5 and Supplementary Table A4. The visual and auditory conjunction sites, which peaked within visual and auditory cortex, respectively, coupled not only with the sensory areas surrounding the seed regions but also with areas of transmodal cortex, including ATL (particularly the left medial surface), pMTG, and precuneus. To aid the interpretation of the visual, context, and sound connectivity maps, we performed a decoding analysis using the automated fMRI meta-analysis software NeuroSynth (Figure 5, right). Meta-analytic decoding of these spatial maps revealed domain-specific networks and their associated functions. The visual connectivity map correlated with terms related to visual processing (e.g., visual, objects); likewise, our sound connectivity map correlated with terms related to auditory processing (e.g., speech, sound). The context connectivity map included both visual (e.g., objects) and higher-order terms (e.g., attention).
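The NeuroSynth decoding step is, at its core, a spatial correlation between each unthresholded connectivity map and a library of term-based meta-analytic maps, keeping the strongest positive matches. The sketch below reproduces that logic with generic tools; the file paths and the small term dictionary are placeholders (the study used the NeuroSynth database itself, with roughly 11,406 term maps).

```python
"""NeuroSynth-style decoding: spatially correlate an unthresholded seed
connectivity z-map with term-based meta-analytic maps and keep the top 10.
Paths and the tiny term dictionary are placeholders."""
import numpy as np
import nibabel as nib

term_maps = {"visual": "meta/visual.nii.gz",
             "objects": "meta/objects.nii.gz",
             "speech": "meta/speech.nii.gz"}   # in practice ~11,406 term maps

mask = nib.load("MNI152_brain_mask.nii.gz").get_fdata() > 0
zmap = nib.load("visual_seed_connectivity_zstat.nii.gz").get_fdata()[mask]

correlations = {}
for term, path in term_maps.items():
    meta = nib.load(path).get_fdata()[mask]
    correlations[term] = np.corrcoef(zmap, meta)[0, 1]   # spatial Pearson r

top10 = sorted(correlations.items(), key=lambda kv: kv[1], reverse=True)[:10]
for term, r in top10:
    print(f"{term}: r = {r:.2f}")   # word-cloud font size scales with r
```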

Figure 5. 

Resting-state connectivity maps of unimodal regions projected onto a rendered brain, displaying the left lateral and left medial views. Maps thresholded at z = 3.1, cluster-corrected p < .01. Visual maps seeded from the left inferior lateral occipital cortex [−48 −70 −2]. Context maps seeded from the left inferior lateral occipital cortex [−48 −60 0]. Sound maps seeded from the left superior temporal gyrus [−52 −8 −10]. Word clouds represent the decoded function of each connectivity map using automated fMRI meta-analysis software (NeuroSynth; Yarkoni, Poldrack, Nichols, Van Essen, & Wager, 2011). This software computed the spatial correlation between each unthresholded zstat mask and every meta-analytic map (n = 11,406) for each term/concept stored in the database. The 10 meta-analytic maps exhibiting the highest positive correlation with each connectivity map were extracted, and the term corresponding to each of these meta-analyses is shown on the right. The font size reflects the size of the correlation. This allows us to quantify the most likely reverse inferences that would be drawn from these functional maps by the larger neuroimaging community.

Finally, the results of the heteromodal connectivity analysis are presented in Figure 6 and Supplementary Table A4. Both the thalamus and SMG seeds coupled extensively with sensorimotor regions and with core portions of the DMN (thalamus: angular gyrus and posterior cingulate cortex; SMG: middle temporal gyrus and ATL). The three other seeds (aMTG, aPG, and aCC) all coupled with core transmodal networks (the DMN and limbic network). To aid the interpretation of these connectivity maps, we performed a decoding analysis using the automated fMRI meta-analysis software NeuroSynth (Figure 6, right). The thalamus connectivity map correlated with terms related to task demands and multisensory properties (e.g., anticipation, motivation, somatosensory). Likewise, the SMG connectivity map correlated with terms related to sensory processing (e.g., speech, sound). In contrast, the aMTG, aPG, and aCC connectivity maps all correlated with terms related to memory retrieval (e.g., semantic, memory, encoding, DMN).
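The seed-to-whole-brain connectivity maps summarized here follow the standard recipe of correlating a sphere-averaged seed time series with every voxel's time series and Fisher-transforming the result. The sketch below illustrates this for the aMTG peak; the file names, the nilearn maskers, and the omission of confound regression are our simplifications, not the exact preprocessing reported in the study.

```python
"""Seed-to-voxel resting-state connectivity for the aMTG peak, with Fisher
r-to-z transform. File names and nilearn maskers are illustrative; confound
regression and smoothing from the actual pipeline are omitted."""
import numpy as np
from nilearn.maskers import NiftiMasker, NiftiSpheresMasker

rest = "sub-01_task-rest_bold_preproc.nii.gz"   # hypothetical preprocessed run

seed_ts = NiftiSpheresMasker(seeds=[(-56, -6, -18)], radius=6,
                             standardize=True).fit_transform(rest)   # (time, 1)
brain = NiftiMasker(standardize=True)
voxel_ts = brain.fit_transform(rest)                                 # (time, voxels)

# With z-scored time series, the voxelwise Pearson r is approximately the
# mean product of the two series; then apply the Fisher r-to-z transform.
r = voxel_ts.T @ seed_ts[:, 0] / voxel_ts.shape[0]
z = np.arctanh(np.clip(r, -0.999999, 0.999999))
brain.inverse_transform(z).to_filename("aMTG_seed_connectivity_z.nii.gz")
```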

Figure 6. 

Resting-state connectivity maps of heteromodal regions projected onto a rendered brain, displaying the left lateral and left medial views. Maps thresholded at z = 3.1, cluster-corrected p < .01. Thalamus maps seeded from [−12 26 2]. SMG maps seeded from [−60 −42 16]. aMTG maps seeded from [−56 −6 −18]. aPG maps seeded from [−36 −18 −18]. aCC maps seeded from [−4 34 −2]. Word clouds represent the decoded function of each connectivity map using automated fMRI meta-analysis software (NeuroSynth; Yarkoni et al., 2011). This software computed the spatial correlation between each unthresholded zstat mask and every meta-analytic map (n = 11,406) for each term/concept stored in the database. The 10 meta-analytic maps exhibiting the highest positive correlation with each connectivity map were extracted, and the term corresponding to each of these meta-analyses is shown on the right. The font size reflects the size of the correlation. This allows us to quantify the most likely reverse inferences that would be drawn from these functional maps by the larger neuroimaging community.

DISCUSSION

Our study examined common and distinct components supporting top–down conceptual retrieval in the absence of relevant sensory input. Multivariate whole-brain decoding identified regions of secondary visual and auditory cortex (inferior lateral occipital cortex and superior temporal gyrus) in which the pattern of activation across voxels related to the modality of what was imagined. Using functional connectivity, we established that, at rest, these regions showed differential connectivity with auditory or visual cortex, suggesting that their recruitment reflected domain-specific aspects of imagination. We also identified several heteromodal regions (including ventrolateral ATL, aPG, and aCC) that were able to decode the difference between thinking about what a concept looked like and what it sounded like. Finally, a region within the dorsal attention network (inferior lateral occipital cortex) was differentially recruited during imagination of more complex contexts and could reliably decode the difference between all of our experimental conditions. Complementary investigation of the intrinsic connectivity of these regions confirmed their roles in unimodal and heteromodal processing. These findings are consistent with the view that imagination emerges from a combined response within unimodal and transmodal regions.

The current fMRI study is one of only a few (e.g., Vetter et al., 2014) to identify patterns of activity in both visual and auditory association cortices that reliably decode between different modalities of imagination (e.g., thinking about what a dog sounds like versus what it looks like) within the same participants. Our study is, to our knowledge, the first to investigate this issue while equating the visual and auditory input across conditions. Typically, neuroimaging studies of visual imagery have required participants to stare at a fixation cross while imagining an object, ensuring a consistent and simple visual input into the system (e.g., Dijkstra et al., 2017; Albers et al., 2013; Lee et al., 2012; Reddy et al., 2010; Ishai et al., 2000). In contrast, studies of auditory imagery typically require participants to imagine the sound of an object or piece of music in the presence of the auditory input created by scanner noise (e.g., Lima et al., 2015; Kraemer, Macrae, Green, & Kelley, 2005; Zatorre & Halpern, 2005). In this study, we presented both visual and auditory random noise, providing more comparable visual and auditory baselines. This methodological advance allows a purer test of common and distinct neural contributions to imagination within different modalities than has been possible in prior studies.

Domain-specific Contributions to Imagination

Our study provided evidence that neural recruitment occurs in primary sensory regions to support modality-specific imagery. However, the highest decoding accuracy and the location of our imagination conjunctions fell within secondary sensory regions (superior temporal gyrus and inferior lateral occipital cortex, respectively; Figure 3). Our functional connectivity analyses confirmed that, although these peak decoding sites fall outside the primary systems, at rest these regions are functionally coupled to primary visual and auditory cortex, respectively. These findings are in line with prior decoding and fMRI studies that have highlighted the relationship between imagery and secondary sensory regions (de Borst & de Gelder, 2017; Coutanche & Thompson-Schill, 2015; Vetter et al., 2014; Albers et al., 2013; Zvyagintsev et al., 2013; Lee et al., 2012; Daselaar et al., 2010; Reddy et al., 2010; Stokes et al., 2009; Halpern et al., 2004; Ishai et al., 2000; Chen et al., 1998). Notably, our results are broadly consistent with the “anterior shift” noted by Thompson-Schill (2003). She found that areas activated by semantic processing are not isomorphic to those used in direct experience but, rather, are shifted anterior to those areas (for a wider review, see Meteyard, Cuadrado, Bahrami, & Vigliocco, 2012; Binder & Desai, 2011; McNorgan, Reid, & McRae, 2011; Chatterjee, 2010).

Our whole-brain searchlight analysis also revealed patterns of activity supporting modality-specific imagination that extended beyond sensory cortex into semantic regions, including ATL (MTG, ventral and medial portions) and aCC (see Figure 4). Functional connectivity analysis indicated that the majority of these regions showed extensive connectivity to other temporal lobe regions, encompassing both medial and lateral sites. Three of these regions also showed prefrontal connectivity, primarily with connections to regions of the DMN (anterior inferior frontal gyrus and ventral and dorsal mPFC). Together, this pattern of functional connectivity suggests that these regions form a common network in the temporal lobe and at least some of these regions are closely allied at rest with regions within the DMN.

Domain-general Contributions to Imagination

We found a cluster in the left inferior lateral occipital cortex that showed stronger activation in the context condition and was able to classify the distinction between all three conditions. The left lateral occipital cortex is traditionally thought to support visual perception; however, this region predominantly falls within the dorsal attention network, as opposed to the visual network (Yeo et al., 2011). Although this "task-positive" network usually responds to demanding, externally presented decisions (for a review, see Corbetta & Shulman, 2002), in this study we see its engagement in a task in which imagery is generated internally from memory. This pattern of results demonstrates that imagery recruits not only transmodal regions associated with memory but also sites implicated in attention, when the features being retrieved have to be shaped to suit the context and/or when complex patterns of retrieval are required. One caveat is that our current experimental paradigm does not allow us to establish whether this response in lateral occipital cortex is driven by the need to generate rich heteromodal content (e.g., imagining "dog races" might involve the sound of a crowd cheering as well as the visual properties of a race track) or by the requirement to steer retrieval away from dominant features toward currently relevant information (because the fact that dogs go for walks is not pertinent to "dog races" and might need to be suppressed to allow contextually relevant information to come to the fore). Nevertheless, the findings do suggest that this region plays a greater role in supporting imagery of complex multimodal contexts than of single features.

Seeding from our "heteromodal" MVPA sites highlighted extensive functional coupling with core transmodal networks, including the DMN and limbic network (see Figure 6; Margulies et al., 2016; Yeo et al., 2011; Mesulam, 1998). Meta-analytic decomposition of these maps returned terms related to memory retrieval (e.g., semantic, memory, encoding, DMN). In addition, two of these sites (thalamus and SMG) also coupled with somatosensory and attentional networks. Thalamic influence has been reported during multisensory interplay (Driver & Noesselt, 2008), and its role in multimodal processing may explain why this region could decode between visual and auditory forms of imagination. Moreover, it has recently been suggested that SMG is crucial to the construction of mental representations (Benedek et al., 2018). As this region is connected to both attention and sensory networks, our findings converge with previous evidence suggesting that SMG integrates memory content in new ways and supports executively demanding mental simulations (Benedek et al., 2014, 2018; Fink et al., 2010).

Our analysis of semantic retrieval in the absence of meaningful input rests on the assumption that participants actively engaged in imagery during the pure-noise periods. Our task explicitly instructed participants to engage in imagery of the cued concept while actively searching the noise for this item. Despite these instructions, and steps taken before scanning to check that participants were following them, an alternative strategy might have been to hold the cue in working memory, using verbal rehearsal, until the target was presented, and then to judge whether the target matched the cue. However, the results appear to be inconsistent with this possibility. In our conjunction analyses, we found that auditory, but not visual or context, information was consistently decoded in auditory cortex, whereas visual, but not auditory, information was successfully decoded in visual cortex, even before the stimulus was presented. This specialization of sensory cortex for different types of conceptual information is consistent with top–down initiation of visual and auditory imagery, as opposed to verbal rehearsal in working memory. In addition, previous fMRI studies using similar paradigms (e.g., Coutanche & Thompson-Schill, 2015) have shown that, during moments when participants are required to imagine stimuli while actively searching for them in noise, decoding accuracy in brain regions that encode unimodal featural information (e.g., shape information in lateral occipital cortex and color information in V4) predicts decoding accuracy in transmodal semantic regions (ATL). Top–down conceptual retrieval within an imagery task can readily explain this finding (Dijkstra et al., 2017; Coutanche & Thompson-Schill, 2015; Kalkstein, Checksfield, Bollinger, & Gazzaley, 2011; Pearson, Rademaker, & Tong, 2011; Ganis & Schendan, 2008; Kosslyn, 2005; Ishai et al., 2000). Our findings are in line with Dehaene and colleagues' (1998) "global workspace" framework, in which distributed brain regions connect both heteromodal and unimodal regions in a coordinated and flexible manner, based on current task demands. By this view, when attention is allocated to the auditory domain (e.g., SOUND DOG), the global workspace recruits distributed regions that hold representations in memory (e.g., the DMN) and specialized brain areas pertinent to the task at hand (e.g., auditory cortex) to support imagination.

A second limitation of this study relates to our inability to decode semantic category (dog vs. car): the whole-brain searchlight analysis for the distinction between semantic categories revealed no significant results. This finding is broadly consistent with previous decoding studies of internally generated thought, which have shown that specific-level concepts (e.g., lime vs. celery) can be decoded, whereas categorical-level concepts (e.g., fruit vs. vegetable) are not successfully classified (Coutanche & Thompson-Schill, 2015). This might reflect the dynamic nature of conceptually driven internally generated thought; for instance, on one trial, participants may have been thinking about the exterior look of a car and, on the next trial, imagining the interior decor. This explanation seems plausible because fMRI decoding studies have shown that amodal semantic representations can be successfully decoded within isolated heteromodal regions (e.g., ATL; Murphy et al., 2017), so the null result is unlikely to reflect a general insensitivity of these regions to conceptual content. On the other hand, our success at classifying the task (visual vs. auditory vs. context), yet not the concept being imagined, raises the possibility that the classifiers were at least partly driven by attentional processes important in initiating and sustaining imagination. For example, changes in connectivity could allow tasks to be classified even in the absence of specific mental representations pertaining to the target. For this reason, it may be useful in the future to explore how conceptual categories are represented across the cortex in the absence of meaningful input.

Conclusion

In this investigation of semantic retrieval in the absence of meaningful stimuli in the external environment, we found extensive recruitment of sensory cortex, which was modulated by the modality of imagination required by the task. We also observed a role for transmodal brain regions in supporting internally generated conceptual retrieval. These findings are consistent with the view that different types of imaginative thought depend on patterns of common and distinct neural recruitment, reflecting the respective contributions of modality-specific and modality-invariant neural representations.

Acknowledgments

The research was supported by BBSRC grant BB/J006963/1. E. J. was supported by a grant from the European Research Council (SEMBIND - 283530). J. S. was supported by a grant from the European Research Council (Wandering Minds - 303701).

Reprint requests should be sent to Charlotte Murphy, Department of Psychology, York Neuroimaging Centre, University of York, York, YO10 5DD, United Kingdom, or via e-mail: charlotte.murphy@york.ac.uk.

REFERENCES

Albers, A. M., Kok, P., Toni, I., Dijkerman, H. C., & de Lange, F. P. (2013). Shared representations for working memory and mental imagery in early visual cortex. Current Biology, 23, 1427–1431.

Alderson-Day, B., & Fernyhough, C. (2015). Inner speech: Development, cognitive functions, phenomenology, and neurobiology. Psychological Bulletin, 141, 931–965.

Amedi, A., Malach, R., & Pascual-Leone, A. (2005). Negative BOLD differentiates visual imagery and perception. Neuron, 48, 859–872.

Antrobus, J. S., Singer, J. L., & Greenberg, S. (1966). Studies in the stream of consciousness: Experimental enhancement and suppression of spontaneous cognitive processes. Perceptual and Motor Skills, 23, 399–417.

Baron, S. G., & Osherson, D. (2011). Evidence for conceptual combination in the left anterior temporal lobe. Neuroimage, 55, 1847–1852.

Barsalou, L. W. (1999). Perceptions of perceptual symbols. Behavioral and Brain Sciences, 22, 637–660.

Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645.

Behzadi, Y., Restom, K., Liau, J., & Liu, T. T. (2007). A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. Neuroimage, 37, 90–101.

Bemis, D. K., & Pylkkänen, L. (2013). Basic linguistic composition recruits the left anterior temporal lobe and left angular gyrus during both listening and reading. Cerebral Cortex, 23, 1859–1873.

Benedek, M., Jauk, E., Fink, A., Koschutnig, K., Reishofer, G., Ebner, F., et al. (2014). To create or to recall? Neural mechanisms underlying the generation of creative new ideas. Neuroimage, 88, 125–133.

Benedek, M., Schües, T., Beaty, R. E., Jauk, E., Koschutnig, K., Fink, A., et al. (2018). To create or to recall original ideas: Brain processes associated with the imagination of novel object uses. Cortex, 99, 93–102.

Berger, B. (2015). Brain oscillatory signatures of working memory control process. Doctoral dissertation, University of Surrey.

Binder, J. R., & Desai, R. H. (2011). The neurobiology of semantic memory. Trends in Cognitive Sciences, 15, 527–536.

Binder, J. R., Gross, W. L., Allendorfer, J. B., Bonilha, L., Chapin, J., Edwards, J. C., et al. (2011). Mapping anterior temporal lobe language areas with fMRI: A multicenter normative study. Neuroimage, 54, 1465–1475.

Bunzeck, N., Wuestenberg, T., Lutz, K., Heinze, H.-J., & Jancke, L. (2005). Scanning silence: Mental imagery of complex sounds. Neuroimage, 26, 1119–1127.

Chatterjee, A. (2010). Disembodying cognition. Language and Cognition, 2, 79–116.

Chen, W., Kato, T., Zhu, X.-H., Ogawa, S., Tank, D. W., & Ugurbil, K. (1998). Human primary visual cortex and lateral geniculate nucleus activation during visual imagery. NeuroReport, 9, 3669–3674.

Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3, 201–215.

Coutanche, M. N., & Thompson-Schill, S. L. (2015). Creating concepts from converging features in human cortex. Cerebral Cortex, 25, 2584–2593.

Daselaar, S. M., Porat, Y., Huijbers, W., & Pennartz, C. M. A. (2010). Modality-specific and modality-independent components of the human imagery system. Neuroimage, 52, 677–685.

de Borst, A. W., & de Gelder, B. (2017). fMRI-based multivariate pattern analyses reveal imagery modality and imagery content specific representations in primary somatosensory, motor and auditory cortices. Cerebral Cortex, 27, 3994–4009.

Dehaene, S., Kerszberg, M., & Changeux, J.-P. (1998). A neuronal model of a global workspace in effortful cognitive tasks. Proceedings of the National Academy of Sciences, U.S.A., 95, 14529–14534.

Dijkstra, N., Zeidman, P., Ondobaka, S., van Gerven, M. A. J., & Friston, K. (2017). Distinct top–down and bottom–up brain connectivity during visual perception and imagery. Scientific Reports, 7, 5677.

Driver, J., & Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on 'sensory-specific' brain regions, neural responses, and judgments. Neuron, 57, 11–23.

Fink, A., Grabner, R. H., Gebauer, D., Reishofer, G., Koschutnig, K., & Ebner, F. (2010). Enhancing creativity by means of cognitive stimulation: Evidence from an fMRI study. Neuroimage, 52, 1687–1695.

Friedman, L., Glover, G. H., & The FBIRN Consortium. (2006). Reducing interscanner variability of activation in a multicenter fMRI study: Controlling for signal-to-fluctuation-noise-ratio (SFNR) differences. Neuroimage, 33, 471–481.

Gabrieli, J. D. E., Brewer, J. B., Desmond, J. E., & Glover, G. H. (1997). Separate neural bases of two fundamental memory processes in the human medial temporal lobe. Science, 276, 264–266.

Ganis, G., & Schendan, H. E. (2008). Visual mental imagery and perception produce opposite adaptation effects on early brain potentials. Neuroimage, 42, 1714–1727.

Halpern, A. R. (2001). Cerebral substrates of musical imagery. Annals of the New York Academy of Sciences, 930, 179–192.

Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation of auditory imagery for familiar melodies. Cerebral Cortex, 9, 697–704.

Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural correlates of perceived and imagined musical timbre. Neuropsychologia, 42, 1281–1292.

Hanke, M., Halchenko, Y. O., Sederberg, P. B., Hanson, S. J., Haxby, J. V., & Pollmann, S. (2009). PyMVPA: A python toolbox for multivariate pattern analysis of fMRI data. Neuroinformatics, 7, 37–53.

Hartung, F., Hagoort, P., & Willems, R. M. (2017). Readers select a comprehension mode independent of pronoun: Evidence from fMRI during narrative comprehension. Brain and Language, 170, 29–38.

Ishai, A., Ungerleider, L. G., & Haxby, J. V. (2000). Distributed neural systems for the generation of visual images. Neuron, 28, 979–990.

Jenkinson, M., Bannister, P., Brady, M., & Smith, S. (2002). Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage, 17, 825–841.

Jezzard, P., & Clare, S. (1999). Sources of distortion in functional MRI data. Human Brain Mapping, 8, 80–85.

Kalkstein, J., Checksfield, K., Bollinger, J., & Gazzaley, A. (2011). Diminished top–down control underlies a visual imagery deficit in normal aging. Journal of Neuroscience, 31, 15768–15774.

Kane, M. J., Brown, L. H., McVay, J. C., Silvia, P. J., Myin-Germeys, I., & Kwapil, T. R. (2007). For whom the mind wanders, and when: An experience-sampling study of working memory and executive control in daily life. Psychological Science, 18, 614–621.

Kiefer, M., & Pulvermüller, F. (2012). Conceptual representations in mind and brain: Theoretical developments, current evidence and future directions. Cortex, 48, 805–825.

Killingsworth, M. A., & Gilbert, D. T. (2010). A wandering mind is an unhappy mind. Science, 330, 932.

Knauff, M., Kassubek, J., Mulack, T., & Greenlee, M. W. (2000). Cortical activation evoked by visual mental imagery as measured by fMRI. NeuroReport, 11, 3957–3962.

Kosslyn, S. M. (2005). Mental images and the brain. Cognitive Neuropsychology, 22, 333–347.

Kosslyn, S. M., Ganis, G., & Thompson, W. L. (2001). Neural foundations of imagery. Nature Reviews Neuroscience, 2, 635–642.

Kosslyn, S. M., Pascual-Leone, A., Felician, O., Camposano, S., Keenan, J. P., Thompson, W. L., et al. (1999). The role of area 17 in visual imagery: Convergent evidence from PET and rTMS. Science, 284, 167–170.

Kraemer, D. J. M., Macrae, C. N., Green, A. E., & Kelley, W. M. (2005). Musical imagery: Sound of silence activates auditory cortex. Nature, 434, 158.

Kriegeskorte, N., Goebel, R., & Bandettini, P. (2006). Information-based functional brain mapping. Proceedings of the National Academy of Sciences, U.S.A., 103, 3863–3868.

Lambon Ralph, M. A., Jefferies, E., Patterson, K., & Rogers, T. T. (2017). The neural and computational bases of semantic cognition. Nature Reviews Neuroscience, 18, 42–55.

Lee, S.-H., Kravitz, D. J., & Baker, C. I. (2012). Disentangling visual imagery and perception of real-world objects. Neuroimage, 59, 4064–4073.

Lewis-Peacock, J. A., & Norman, K. A. (2014). Multi-voxel pattern analysis of fMRI data. In M. S. Gazzaniga & G. R. Mangun (Eds.), The Cognitive Neurosciences (5th ed., pp. 911–920). Cambridge, MA: MIT Press.

Lima, C. F., Lavan, N., Evans, S., Agnew, Z., Halpern, A. R., Shanmugalingam, P., et al. (2015). Feel the noise: Relating individual differences in auditory imagery to the structure and function of sensorimotor systems. Cerebral Cortex, 25, 4638–4650.

Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., et al. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences, U.S.A., 113, 12574–12579.

Mason, M. F., Norton, M. I., Van Horn, J. D., Wegner, D. M., Grafton, S. T., & Macrae, C. N. (2007). Wandering minds: The default network and stimulus-independent thought. Science, 315, 393–395.

McNorgan, C., Reid, J., & McRae, K. (2011). Integrating conceptual knowledge within and across representational modalities. Cognition, 118, 211–233.

Mesulam, M.-M. (1998). From sensation to cognition. Brain, 121, 1013–1052.

Mesulam, M.-M. (2012). The evolving landscape of human cortical connectivity: Facts and inferences. Neuroimage, 62, 2182–2189.

Meteyard, L., Cuadrado, S. R., Bahrami, B., & Vigliocco, G. (2012). Coming of age: A review of embodiment and the neuroscience of semantics. Cortex, 48, 788–804.

Murphy, C., Jefferies, E., Rueschemeyer, S.-A., Sormaz, M., Wang, H.-T., Margulies, D. S., et al. (2018). Distant from input: Evidence of regions within the default mode network supporting perceptually-decoupled and conceptually-guided cognition. Neuroimage, 171, 393–401.

Murphy, C., Rueschemeyer, S.-A., Watson, D., Karapanagiotidis, T., Smallwood, J., & Jefferies, E. (2017). Fractionating the anterior temporal lobe: MVPA reveals differential responses to input and conceptual modality. Neuroimage, 147, 19–31.

Patterson, K., Nestor, P. J., & Rogers, T. T. (2007). Where do you know what you know? The representation of semantic knowledge in the human brain. Nature Reviews Neuroscience, 8, 976–987.

Pearson, J., Rademaker, R. L., & Tong, F. (2011). Evaluating the mind's eye: The metacognition of visual imagery. Psychological Science, 22, 1535–1542.

Poerio, G. L., Sormaz, M., Wang, H.-T., Margulies, D. S., Jefferies, E. A., & Smallwood, J. (2017). The role of the default mode network in component processes underlying the wandering mind. Social Cognitive and Affective Neuroscience, 12, 1047–1062.

Reddy, L., Tsuchiya, N., & Serre, T. (2010). Reading the mind's eye: Decoding category information during mental imagery. Neuroimage, 50, 818–825.

Reilly, J., Garcia, A., & Binney, R. J. (2016). Does the sound of a barking dog activate its corresponding visual form? An fMRI investigation of modality-specific semantic access. Brain and Language, 159, 45–59.

Rice, G. E., Hoffman, P., & Lambon Ralph, M. A. (2015). Graded specialization within and between the anterior temporal lobes. Annals of the New York Academy of Sciences, 1359, 84–97.

Singer, J. L. (1966). Daydreaming: An introduction to the experimental study of inner experience. New York: Crown Publishing Group/Random House.

Slotnick, S. D., Thompson, W. L., & Kosslyn, S. M. (2005). Visual mental imagery induces retinotopically organized activation of early visual areas. Cerebral Cortex, 15, 1570–1583.

Smith, S. M. (2002). Fast robust automated brain extraction. Human Brain Mapping, 17, 143–155.

Sormaz, M., Jefferies, E. A., Bernhardt, B. C., Karapanagiotidis, T., Mollo, G., Bernasconi, N., et al. (2017). Knowing what from where: Hippocampal connectivity with temporoparietal cortex at rest is linked to individual differences in semantic and topographic memory. Neuroimage, 152, 400–410.

Stokes, M., Thompson, R., Cusack, R., & Duncan, J. (2009). Top–down activation of shape-specific population codes in visual cortex during mental imagery. Journal of Neuroscience, 29, 1565–1572.

Thompson-Schill, S. L. (2003). Neuroimaging studies of semantic memory: Inferring "how" from "where". Neuropsychologia, 41, 280–292.

van Ackeren, M. J., & Rueschemeyer, S.-A. (2014). Cross-modal integration of lexical-semantic features during word processing: Evidence from oscillatory dynamics during EEG. PLoS One, 9, e101042.

Vatansever, D., Bzdok, D., Wang, H.-T., Mollo, G., Sormaz, M., Murphy, C., et al. (2017). Varieties of semantic cognition revealed through simultaneous decomposition of intrinsic brain connectivity and behaviour. Neuroimage, 158, 1–11.

Vetter, P., Smith, F. W., & Muckli, L. (2014). Decoding sound and imagery content in early visual cortex. Current Biology, 24, 1256–1262.

Villena-Gonzalez, M., Wang, H.-T., Sormaz, M., Mollo, G., Margulies, D. S., Jefferies, E. A., et al. (2018). Individual variation in the propensity for prospective thought is associated with functional integration between visual and retrosplenial cortex. Cortex, 99, 224–234.

Visser, M., Jefferies, E., & Lambon Ralph, M. A. (2010). Semantic processing in the anterior temporal lobes: A meta-analysis of the functional neuroimaging literature. Journal of Cognitive Neuroscience, 22, 1083–1094.

Wang, H.-T., Poerio, G. L., Murphy, C., Bzdok, D., Jefferies, E. A., & Smallwood, J. (2018). Dimensions of experience: Exploring the heterogeneity of the wandering mind. Psychological Science, 29, 56–71.

Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature Methods, 8, 665–670.

Yeo, B. T. T., Krienen, F. M., Sepulcre, J., Sabuncu, M. R., Lashkari, D., Hollinshead, M., et al. (2011). The organization of the human cerebral cortex estimated by intrinsic functional connectivity. Journal of Neurophysiology, 106, 1125–1165.

Zatorre, R. J., & Halpern, A. R. (2005). Mental concerts: Musical imagery and auditory cortex. Neuron, 47, 9–12.

Zhang, Y., Brady, M., & Smith, S. (2001). Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging, 20, 45–57.

Zvyagintsev, M., Clemens, B., Chechko, N., Mathiak, K. A., Sack, A. T., & Mathiak, K. (2013). Brain networks underlying mental imagery of auditory and visual information. European Journal of Neuroscience, 37, 1421–1434.