Causal Contributions of the Domain-General (Multiple Demand) and the Language-Selective Brain Networks to Perceptual and Semantic Challenges in Speech Comprehension

Abstract
Listening to spoken language engages domain-general multiple demand (MD; frontoparietal) regions of the human brain, in addition to domain-selective (frontotemporal) language regions, particularly when comprehension is challenging. However, there is limited evidence that the MD network makes a functional contribution to core aspects of understanding language. In a behavioural study of volunteers (n = 19) with chronic brain lesions, but without aphasia, we assessed the causal role of these networks in perceiving, comprehending, and adapting to spoken sentences made more challenging by acoustic degradation or lexico-semantic ambiguity. We measured perception of and adaptation to acoustically degraded (noise-vocoded) sentences with a word report task before and after training. Participants with greater damage to MD but not language regions required more vocoder channels to achieve 50% word report, indicating impaired perception. Perception improved following training, reflecting adaptation to acoustic degradation, but adaptation was unrelated to lesion location or extent. Comprehension of spoken sentences with semantically ambiguous words was measured with a sentence coherence judgement task. Accuracy was high and unaffected by lesion location or extent. Adaptation to semantic ambiguity was measured in a subsequent word association task, which showed that availability of lower-frequency meanings of ambiguous words increased following their comprehension (word-meaning priming). Word-meaning priming was reduced for participants with greater damage to language but not MD regions. Language and MD networks make dissociable contributions to challenging speech comprehension: Using recent experience to update word meaning preferences depends on language-selective regions, whereas the domain-general MD network plays a causal role in reporting words from degraded speech.


INTRODUCTION
During speech comprehension, listeners are continually challenged by various aspects of the input, which leads to uncertainty at multiple levels of the linguistic hierarchy. For example, acoustic challenges arise when speech is quiet, in an unfamiliar accent, produced by a young child who has not yet mastered articulation, or otherwise degraded. In such cases, perception of the individual phonemes and lexical forms is more uncertain. Linguistic challenges arise when there is lexical-semantic or syntactic ambiguity, or complexity from low-frequency words or constructions, such that the intended meaning is unclear. To resolve these uncertainties during speech comprehension, listeners make use of diverse sources of information (Altmann & Kamide, 1999; Cutler et al., 1997; Garrod & Pickering, 2004; Hagoort et al., 2004; Münster & Knoeferle, 2018; Özyürek, 2014; Van Berkum, 2009; Zhang et al., 2021). Furthermore, listeners learn in response to their experiences: They show perceptual and semantic adaptation such that improvements in the perception and comprehension of different types of challenging speech can be observed over time (Rodd et al., 2013). In this paper, we consider the potential functional contributions of two distinct groups of cortical brain regions, the domain-selective language network and the domain-general multiple demand (MD) network, to successful perception and comprehension of different types of challenging speech, and to subsequent perceptual and semantic adaptation.
Sometimes, linguistic stimuli also activate a set of bilateral frontal, parietal, cingular, and opercular regions (see Diachek et al., 2020, for a large-scale fMRI investigation and relevant discussion), which together form the MD network (Duncan, 2010b, 2013). This network is domain-general, responding during diverse demanding tasks (Duncan & Owen, 2000; Fedorenko et al., 2012; Fedorenko et al., 2013; Hugdahl et al., 2015; Shashidhara et al., 2019) and has been linked to cognitive constructs such as executive control, working memory, selective attention, and fluid intelligence (Assem, Blank, et al., 2020; Cole & Schneider, 2007; Duncan & Owen, 2000; Vincent et al., 2008; Woolgar et al., 2018). Regions of the MD network show strongly synchronized activity and fluctuation patterns that dissociate sharply from those of the language network (Blank et al., 2014; Mineroff et al., 2018; Paunov et al., 2019). Moreover, damage to the MD network leads to patterns of cognitive impairment that differ from those observed in cases of language network damage (Duncan, 2010a; Fedorenko & Varley, 2016; Woolgar et al., 2010; Woolgar et al., 2018), confirming a functional dissociation between the two networks. (See Fedorenko & Blank, 2020, for a review focusing on the dissociation between subregions of Broca's area.) Recently, it has been argued that the MD network does not play a functional role in language comprehension (Blank & Fedorenko, 2017; Diachek et al., 2020; Shain et al., 2020; Wehbe et al., 2021; for reviews, see Campbell & Tyler, 2018). That is, activation of MD regions does not reflect core cognitive operations that are essential to language comprehension such as perceiving word forms and accessing word meanings. Instead, it is proposed that activation of MD regions reflects a general increase in effort, which is imposed by task demands in particular, or in some cases even mis-localisation of language-selective activity because of the proximity of the two systems in some parts of the brain (e.g., in the inferior frontal gyrus (IFG), Fedorenko et al., 2012; see Quillen et al., 2021, for evidence that increased linguistic and nonlinguistic task demands matched on difficulty differentially recruit language-selective versus domain-general regions).
Multiple demand (MD) network: A set of bilateral frontoparietal brain regions that respond to a range of diverse demanding tasks.
Language-selective network: A set of left-lateralised frontotemporal brain regions that respond selectively to linguistic stimuli.
However, existing evidence that domain-general MD regions do not contribute to language comprehension is limited in two ways. First, relevant studies have typically drawn conclusions about function based on the magnitude of neural activity (e.g., the BOLD fMRI response). The strongest causal inference about the necessity (and selectivity) of brain regions for particular cognitive processes comes from approaches that transiently disrupt neural functioning in the healthy brain (e.g., transcranial magnetic stimulation, or TMS) and measure the effects on behaviour, or from cases of acquired brain damage, either in case studies or multi-patient lesion-symptom mapping investigations that exploit inter-individual variability in behavioural and neural profiles to link specific brain systems to behavioural outcomes (Halai et al., 2017). A recent lesion study found that the extent of damage to the MD network predicted deficits in fluid intelligence; in contrast, MD lesions did not predict remaining deficits in verbal fluency after the influence of fluid intelligence was removed, which instead were predicted by damage to the language-selective network (Woolgar et al., 2018), in line with the dissociation discussed above. These results provide convincing evidence that the MD network but not the language network contributes to fluid intelligence, but suggest that the MD network contribution does not extend to language function. However, given that language function was assessed with a verbal fluency task (an elicited production paradigm that relies on a host of diverse cognitive operations), the question of whether the MD network causally contributes to specific aspects of language comprehension remains unanswered.
A second limitation of previous studies is their focus on the comprehension of clearly perceptible and relatively unambiguous language, whereas naturalistic speech comprehension typically involves dealing with noise and uncertainty in the input. For example, speech may be in an unfamiliar accent or contain disfluencies and mispronunciations; there may be background speech or other sounds or distractions; or the words and syntax may be ambiguous or uncommon. These features can make the core computations of comprehension, identifying words and inferring meaning, more difficult (for a review of different types of challenges to speech comprehension, see Johnsrude & Rodd, 2015). It therefore remains a possibility that the MD network is functionally critical for successful comprehension in these more challenging listening situations (Diachek et al., 2020).
Lexical-semantic ambiguity (for a review, see Rodd, 2018) challenges comprehension because of the competition between alternative meanings of a single word form during meaning access (Rayner & Duffy, 1986; Rodd et al., 2002; Seidenberg et al., 1982; Swinney, 1979), and because costly reinterpretation is sometimes required (Blott et al., 2021; Duffy et al., 1988; Rodd et al., 2010, 2012). Domain-general cognitive operations may be useful in responding to the challenge, as evidenced by the positive relationship between individuals' success in semantic ambiguity resolution and executive functioning skill (Gernsbacher et al., 1990; Gernsbacher & Faust, 1991; Khanna & Boland, 2010) and in dual-task studies showing that performance on nonlinguistic visual tasks is impaired during semantic reinterpretation (Rodd et al., 2010), but these domain-general operations may be plausibly generated by either language-selective or domain-general cortical regions.
Functional imaging studies show that semantic ambiguity resolution engages left-lateralised frontal and temporal brain regions typical of the language-selective network, specifically posterior parts of middle and inferior temporal lobe, anterior temporal lobe, and the posterior IFG (Bilenko et al., 2009; Musz & Thompson-Schill, 2017; Rodd et al., 2005; Vitello et al., 2014; Zempleni et al., 2007; for a review, see Rodd, 2020). The possibility that the IFG in particular plays a causal role is supported by the observation that individuals with Broca's aphasia have difficulties in using context to access subordinate word meanings (Hagoort, 1993; Swaab et al., 1998; Swinney et al., 1989), although patients in these studies were selected based on their language profile rather than lesion location.
Although subregions within the IFG form part of the language-selective network, as discussed above, there are also subregions that fall within the domain-general MD network. Indeed, IFG recruitment during ambiguity resolution has typically been accounted for by invoking domain-general constructs of cognitive control or conflict resolution (Novick et al., 2005; Thompson-Schill et al., 1997), which resolve competition between alternative meanings of ambiguous words (Musz & Thompson-Schill, 2017). Currently, the heterogeneity of the IFG makes activations within this region difficult to interpret functionally without careful anatomical identification of relevant components (Tahmasebi et al., 2012).
A range of studies show that listeners can adapt to the challenges of perceiving and comprehending acoustically degraded or semantically ambiguous sentences. Listeners' perception of degraded speech improves spontaneously over time with repeated exposure (Davis et al., 2005; Guediche et al., 2014; Hervais-Adelman et al., 2008; Loebach & Pisoni, 2008; Sohoglu & Davis, 2016; Stacey & Summerfield, 2008), so long as attention is directed to speech (Huyck & Johnsrude, 2012). This perceptual adaptation is facilitated by visual/auditory feedback presented concurrently or in advance, generalises across talkers (Huyck et al., 2017), is supported by lexical-level information such that learning through exposure to pseudowords is less effective than with real words (although in some cases, learning with pseudowords is possible; Hervais-Adelman et al., 2008), but does not additionally benefit from sentence-level semantic information (learning was as effective with meaningless syntactic prose; Davis et al., 2005).
Semantic ambiguity resolution: Selection of the appropriate meaning of a semantically ambiguous word.
Regarding adaptation to ambiguous words, research has shown that accessing a less frequent (subordinate) meaning of an ambiguous word is easier following exposure to the same meaning of an ambiguous word in a prime sentence. Although the cognitive operations underpinning this so-called word meaning priming effect remain somewhat underspecified, the effect can be described as a form of longer-term lexicosemantic learning since it can be observed tens of minutes or even hours after initial exposure, or perhaps longer if adaptation is consolidated by sleep (Betts et al., 2018; Gaskell et al., 2019; Rodd et al., 2013).

The Current Study
In the current study, we ask whether speech perception and comprehension in different challenging circumstances, and adaptation in response to these challenges, depend on the MD network or the language-selective network. To do this, we investigated the impact of lesions to these networks on behavioural measures of speech perception, comprehension, and adaptation.
We recruited participants (n = 19) on the basis of having long-standing lesions that either (1) had substantial overlap with the domain-selective language network, (2) had substantial overlap with the domain-general MD network, or (3) had overlap with neither the language nor the MD network. The participants performed behavioural tasks to assess the immediate effects and longer-term consequences of two types of listening challenge. For the first challenge (acoustic-phonetic), we measured perception of noise-vocoded spoken sentences in a word report task. Adaptation to this type of acoustic degradation was assessed in a subsequent word report task following a period of training. For the second challenge (lexicosemantic), we measured comprehension of spoken sentences that included low-frequency meanings of semantically ambiguous words, in a sentence coherence judgement task. Adaptation to semantic ambiguity was assessed in a word association task to measure the consequences of experience with the lower-frequency meanings for subsequent meaning access. Whilst all cognitive tasks will require the contribution of some general cognitive operations (e.g., attention, working memory), our tasks were chosen to be simple enough for participants to perform, thus minimising the demands on such general cognitive operations, in the absence of acoustic degradation or lexicosemantic ambiguity. These tasks are made more difficult by challenges to perceptual processes (e.g., acoustically degraded speech), or semantic processes (e.g., lexicosemantic ambiguity) that are a central part of language function. We acknowledge that perceptual and semantic challenges to language function may have secondary impacts on domain-general functions (e.g., due to increased working memory demand or a requirement that listeners use sentence context to support processing). However, we expect the same sorts of additional domain-general operations to apply both to degraded speech perception and semantic ambiguity comprehension, as well as to adaptation to these challenges. Thus, if brain lesions have a dissociable impact on accommodating these different challenges, then this would suggest a causal contribution to a specific aspect of language functioning rather than a contribution to domain-general processes.
Word meaning priming: The increase in ease of accessing a particular word meaning following exposure to that same meaning in a prime sentence, which can be considered a type of semantic adaptation or learning.
Noise vocoding: A method of acoustically degrading a speech signal in which spectral detail is reduced but slow amplitude modulations, which approximately reflect syllabic units, are retained.
Planned analyses focused on comparing behavioural performance measures across the three participant groups. However, the aetiologies (e.g., stroke, tumour excision) that lead to brain lesions do not respect functional boundaries of the two networks of interest, and therefore our primary analyses treated lesion volume in each network as a graded rather than a categorical factor. We report group identities of the participants in the demographics table, and in the lesion maps and data plots contained in the figures below, since this was the basis of our participant recruitment and so that interested readers can observe how individual participants (defined on the basis of lesion location) perform our various tasks. Group analyses can be found in the Supporting Information at https://doi.org/10.1162/nol_a_00081.
For each task, behavioural performance measures were associated with lesion location and extent by performing correlational analyses using probabilistic functional activation atlases (e.g., Woolgar et al., 2018). Finally, across-task analyses assessed potential dissociations between the contributions of these two networks for accommodating and adapting to different sources of listening challenge during speech comprehension.

Participants
Twenty-one right-handed native English speakers were recruited from the Cambridge Cognitive Neuroscience Research Panel (CCNRP), a database of volunteers who have suffered a brain lesion and have expressed interest in taking part in research. Participants were invited to take part in the current research on the basis that they had chronic lesions (minimum time since injury of 3 yr) to cortical areas falling predominantly in the language or MD networks (or lesions in other areas for control participants), but without knowledge of their behavioural profiles. Thus, volunteers were not recruited on the basis of a known language impairment or aphasia diagnosis. The two networks were broadly defined, based on previous functional imaging data from typical volunteers (described below), and linked to lesions traced on anatomical MRI scans for CCNRP volunteers. Participants gave written informed consent under the approval of the Cambridge Psychology Research Ethics Committee. Data from two participants were not included in the final analyses of either task (one participant was unable to complete either task due to fatigue and hearing difficulties; a second failed to complete the semantic ambiguity experimental tasks and also had difficulties accurately reporting back the words they heard in the degraded speech task, achieving only 68% word report accuracy for the clear speech condition across pre- and post-training test sessions; see task details below).
The remaining 19 participants (8 female, mean age 61 yr, range 37-75 yr) had brain lesions caused by tumour excision (n = 8), stroke (haemorrhagic: n = 6, ischaemic: n = 1), or a combination of these (tumour excision and haemorrhagic stroke: n = 1), with other causes being abscess excision (n = 1) or resection because of epileptic seizures (n = 1), and one of unknown cause (n = 1). Individual participant characteristics are detailed in Table 1. Two participants contributed data to just one of the tasks (md6 was excluded from the degraded speech experiment for not completing the task; md10 was excluded from the semantic ambiguity analyses for giving multiple responses during the word association task). Thus, data from 18 participants were included for each of the experiments analysed separately (see below) and from 17 participants for the cross-experimental analyses.
The National Adult Reading Test (NART; Nelson, 1982) was used to estimate premorbid IQ. The Test of Reception of Grammar (TROG-2; Bishop, 2003) was used as a background assessment of linguistic competence.
Note. NART = National Adult Reading Test (Nelson, 1982); TROG-2 = Test of Reception of Grammar (Bishop, 2003); lang/LANG = language; md/MD = multiple demand; L = left; R = right.
a Participant md6 was excluded from the degraded speech task analyses.
b Participant md10 was excluded from the semantic ambiguity task analyses.

Lesion Analysis
Lesion analysis followed procedures developed in previous research (Woolgar et al., 2018). Each participant had a structural MRI image (T1-weighted spoiled gradient echo MRI scans with 1 × 1 × 1 mm resolution) which included lesion tracing as part of previous participation in the CCNRP. From these images, we estimated the volume of lesion that overlapped with the language network, the MD network, or elsewhere. The two networks were defined from probabilistic fMRI activation maps constructed from large numbers of healthy participants (Language: n = 220, MD: n = 63), who performed tasks developed to localise language processing and domain-general executive processing (see Blank et al., 2014; Fedorenko, 2014; Fedorenko et al., 2013; Mahowald & Fedorenko, 2016). The activation maps for the language network contrasted data from participants reading or listening to sentences versus lists of pseudowords (neural responses in the language network are modality-independent; Fedorenko et al., 2010; Scott et al., 2017); those for the MD network contrasted data from participants performing a hard versus easy visuospatial working memory task (remembering 8 vs. 4 locations, respectively, in a 3 × 4 grid). The visuospatial task captures all major components of the MD network defined by overlap of multiple demands (Assem, Glasser, et al., 2020). Furthermore, defining the network with a non-auditory, non-language task makes relating the impact of damage to the network on spoken language functions potentially more noteworthy than using a task that targets auditory or language processing. Each individual participant's activation map for the relevant contrast (sentences > pseudowords, hard > easy spatial working memory) was thresholded at a p < 0.001 uncorrected level, binarised and normalised before the resulting images were combined in template space.
Thus, the language and MD networks are functionally defined for each individual separately before being combined (for discussion of the benefits of using an individual subject approach, see Fedorenko, 2021). The resulting probabilistic activation overlap maps (Figure 1A) contain information in each voxel about the proportion of participants who show a significant effect (at p < 0.001) for the contrast of interest. Following Woolgar et al. (2018), we thresholded the probabilistic map for each network at 5%, thus retaining voxels in which activation was present for at least 5% of the contributing participants.
We then calculated the lesion volume falling into each network (defined in the probabilistic map) for each of the 19 participants (Figure 1B). Participants were initially assigned to one of three broad groups (LANG, MD, or OTHER) based on the proportion and volume of their lesions falling into language and MD regions as well as the overall proportion of each network that was damaged (Figure 1C; see Supporting Information for further details of group assignment). However, since assignment of participants to groups is based on arbitrary lesion volume thresholds and because the group allocation for several participants was not clear-cut (e.g., lang2, lang3, md5 in Figure 1B), our main analyses correlate behavioural performance measures with lesion volume in the two key networks, thereby avoiding these arbitrary choices. We detail the group assignments in describing the participants and results so that the interested reader can track information about individual participants. Group analyses are included in the Supporting Information.
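The map-combination, thresholding, and lesion-overlap steps described above can be illustrated with a toy sketch. This is not the authors' pipeline: flat Python lists stand in for three-dimensional images in template space, the function names are hypothetical, and a voxel volume of 1 mm³ is assumed purely for illustration.

```python
# Toy sketch: each list is a flattened image; 1 = active (or lesioned) voxel.

def probabilistic_map(binary_maps):
    """Proportion of contributing participants active at each voxel."""
    n = len(binary_maps)
    return [sum(voxel) / n for voxel in zip(*binary_maps)]

def network_mask(prob_map, threshold=0.05):
    """Retain voxels active in at least `threshold` of participants."""
    return [p >= threshold for p in prob_map]

def lesion_volume_in_network(lesion, mask, voxel_volume_mm3=1.0):
    """Volume of lesioned voxels that fall inside the network mask."""
    overlap = sum(1 for l, m in zip(lesion, mask) if l and m)
    return voxel_volume_mm3 * overlap

# Hypothetical data: 4 localiser participants, 6 voxels.
maps = [
    [1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
prob = probabilistic_map(maps)
mask = network_mask(prob)        # 5% threshold, as in the text
lesion = [0, 1, 1, 1, 0, 0]      # hypothetical traced lesion
volume = lesion_volume_in_network(lesion, mask)
```

In this toy case the lesion covers three voxels, two of which fall inside the thresholded network, so the network-specific lesion volume is 2 mm³.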

Statistical Analysis
Analyses were performed using R statistical software (Version 3.6.1; R Core Team, 2019). For each task, the primary analyses assessed whether more extensive damage to the language and MD networks was associated with more impaired performance on the behavioural tasks, with one-tailed Pearson's r correlation coefficients. We compared the strength of different correlations within task (i.e., comparing the impact of damage to language and MD networks on a given behavioural measure) and between task (i.e., comparing the impact of damage to a given network on different behavioural measures) with two-tailed Meng's z tests (Meng et al., 1992) using the 'cocor' package (Diedenhofen & Musch, 2015). The between-task comparisons focused on the 17 participants for whom we had data for both the degraded speech and the lexical-semantic ambiguity tasks.
Figure 1. (A) The language and MD networks against which we compared participants' lesions. The images show probabilistic activation maps of the language network and the MD network based on fMRI data from large numbers of neurotypical participants (language: n = 220; MD: n = 63), which have been thresholded to show regions active in at least 5% of participants during the relevant functional task and plotted onto a volume rendering of the brain. (B) Volume of lesion falling into each network for each of the 19 participants in the present study. Solid line depicts an equal volume of each network affected by the lesion. Different colours/shapes indicate assignment of the participants into the LANGUAGE (LANG), MD, and OTHER groups upon which recruitment was based (for categorical group analyses, see Supporting Information). (C) Lesion overlap across participants depicted on volume renderings of the brain and on midline sagittal slices viewed as if from the left or right (Montreal Neurological Institute (MNI) space x coordinates of −8 and +8; cross-hairs show the location of y = 0, z = 0 in these slices). Images are shown separately for participants originally assigned to each of the three groups (see Supporting Information for group analyses). Two participants assigned to the MD group (md6, md10) contributed data to tasks for only one type of challenge and therefore images are shown separately for the two challenge types. Brighter colours reflect greater lesion overlap across participants.
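For readers unfamiliar with the comparison of dependent correlations used in the statistical analysis, the following is a minimal Python sketch of the z statistic described by Meng et al. (1992) for two correlations that share one variable. It is offered as an illustration only: the analyses reported here were run in R with the 'cocor' package, and the function names below are hypothetical.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def meng_z(r1, r2, r12, n):
    """Compare two dependent correlations that share one variable.

    r1, r2: correlations of the shared variable with each other variable
    r12:    correlation between the two non-shared variables
    n:      number of participants
    Returns the z statistic and a two-tailed p value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher transform
    mean_r_sq = (r1 ** 2 + r2 ** 2) / 2
    f = min((1 - r12) / (2 * (1 - mean_r_sq)), 1.0)
    h = (1 - f * mean_r_sq) / (1 - mean_r_sq)
    z = (z1 - z2) * math.sqrt((n - 3) / (2 * (1 - r12) * h))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

For a within-task comparison, the shared variable would be the behavioural measure and the other two variables the lesion volumes in the language and MD networks; for a between-task comparison, the shared variable would be lesion volume in one network and the other two variables the behavioural measures from the two tasks.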

Challenge 1. Acoustically Degraded Speech Perception and Adaptation
The first challenge increased speech comprehension difficulty at the acoustic-phonetic level of the input, by acoustic degradation of spoken sentences with noise vocoding (Shannon et al., 1995). Noise vocoding reduces the spectral detail in the speech signal but retains the slow amplitude modulations, which approximately reflect syllabic units, and the broadband spectral changes that convey speech content. These low frequency modulations and broadband spectral modulations have been shown to be most important for accurate speech perception (Elliott & Theunissen, 2009;Shannon et al., 1995). We selected the particular numbers of channels in the vocoder based on previous research, which established that intelligibility (as measured by word report: How many words of the sentence a participant can accurately report) increases with the logarithmic increase in number of channels (McGettigan et al., 2014). In healthy adults with good hearing, for short sentences of 6-13 words long, intelligibility is near 100% for 16-channel vocoded speech, near 0% for 1-channel vocoded speech, and at an intermediate level for 4-channel vocoded speech (Peelle et al., 2013). We assessed speech perception in terms of the logarithmic number of channels estimated as required to achieve 50% word report accuracy of these sentences and assessed adaptation by comparing performance before and after a training period.
Sentences were recorded by a female native speaker of British English and digitised at a sampling rate of 22050 Hz. We created three degraded versions of the sentences, of decreasing intelligibility, using 16, 8, and 4 channels in the vocoder. To do this, the frequency range 50-8000 Hz was divided into 16, 8, or 4 logarithmically spaced frequency channels. Each channel was low-pass filtered at 30 Hz and half-wave rectified to produce an amplitude envelope for each channel, which was then applied to white noise that was filtered in the same frequency band. Finally, the channels were recombined to create the noise-vocoded version of the sentence.
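As an illustration of the channel layout, the sketch below computes band edges for 16, 8, and 4 channels between 50 and 8000 Hz. It assumes that "logarithmically spaced" means a constant frequency ratio between adjacent band edges, which is an assumption about the exact spacing rule; the function name is hypothetical, and the filtering, envelope extraction, and noise modulation steps are not shown.

```python
def log_spaced_edges(n_channels, f_low=50.0, f_high=8000.0):
    """Band edges (Hz) with a constant ratio between adjacent edges."""
    ratio = f_high / f_low
    return [f_low * ratio ** (i / n_channels) for i in range(n_channels + 1)]

for n_channels in (16, 8, 4):
    edges = log_spaced_edges(n_channels)
    # Each channel spans one (lower, upper) pair of adjacent edges.
    bands = list(zip(edges[:-1], edges[1:]))
    print(n_channels, [round(e) for e in edges])
```

Under this assumption, halving the number of channels merges adjacent pairs of bands, so the 8- and 4-channel versions use a subset of the 16-channel band edges.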
The 40 sentences were grouped into eight sets of five sentences such that each set contained 45 words in total and were expected (based on previous word report data) to be approximately equally intelligible. Each participant heard all eight sentence sets, but assignment of sets to the different levels of degradation (clear, 16-, 8-, 4-channel vocoded) and to the pre- and post-training test (described below) was counterbalanced across participants.

Procedure
The experiment started with four practice trials to familiarise the participants with the stimuli and the word report task. Participants listened to four different sentences (not included in the test set) at increasing levels of degradation (clear, 16, 8, 4) and after each sentence had to repeat the sentence or as many words from the sentence as possible in the correct order. The experiment then followed a test-train-test format (cf. Sohoglu & Davis, 2016) with the 40 experimental sentences (eight sets of five sentences; see Stimuli section above for details of assignment of the sentences to the pre-test and post-test and to the different levels of degradation). In the initial test, participants listened to 20 of the sentences, five at each level of degradation (clear, 16, 8, 4; order randomised uniquely for each participant) and performed the word report task. This was followed by a training period in which participants listened passively to the same 20 sentences, each repeated four times at decreasing levels of degradation, whilst the written text of the sentence was presented visually on a computer screen. Following the training, participants listened to the other (previously unheard) 20 sentences, five at each level of degradation, and again performed the word report task.

Data processing and analysis
For each participant, we calculated the number and proportion of words correctly reported for each sentence at each level of degradation (clear, 16-, 8-, 4-channel vocoded) and for the pre- and post-training test. Words were scored as correct only if there was a perfect match with the spoken word from the sentence (morphological variants were scored as incorrect, but homonyms, even if semantically anomalous, were scored as correct). Words reported in the correct order were scored as correct even if intervening words were absent or incorrectly reported, but scored as incorrect if they were reported in the wrong order. To verify that decreasing the number of vocoded channels increased the challenge of speech perception and that training facilitated perception, we analysed differences in proportion of words correctly reported between the sentences with different numbers of channels (clear, 16, 8, 4) and pre- and post-test sentences, with a logistic mixed effects model using the lme4 package (Bates et al., 2015). The model had a single categorical fixed effects predictor for Training (pre-test or post-test) with deviation coding defining one planned contrast: pre-test = −1/2 versus post-test = 1/2. There was also a continuous fixed effect predictor of Log2Channels (log2 number of channels). The final model contained a by-subject random intercept, by-subject slopes for Training and Log2Channels, and a by-item random intercept for sentence.
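The order-sensitive scoring rule described above can be operationalised in more than one way. One possibility, sketched here as an assumption rather than as the scoring procedure actually used, is to count the longest common subsequence between the target and the reported word sequences, so that words count only when they preserve the target order. The function name is hypothetical, and exact string matching cannot credit homonyms that are spelled differently, which would need manual scoring.

```python
def words_in_order(target, response):
    """Count target words reported in the correct relative order.

    Uses the longest common subsequence (LCS) of the two word lists.
    Returns (number correct, number of target words).
    """
    t = target.lower().split()
    r = response.lower().split()
    # dp[i][j] holds the LCS length of t[:i] and r[:j].
    dp = [[0] * (len(r) + 1) for _ in range(len(t) + 1)]
    for i, target_word in enumerate(t, 1):
        for j, response_word in enumerate(r, 1):
            if target_word == response_word:   # exact match only
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(t)][len(r)], len(t)

correct, total = words_in_order("the cat sat on the mat",
                                "the dog sat the mat on")
proportion = correct / total
```

In this hypothetical example, "the", "sat", "the", and "mat" are credited, whereas "on" is reported out of order and "dog" does not match, giving four of six words correct.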
To quantify the relationship between acoustic degradation and speech perception performance in single participants we also fit a logistic psychometric function to the word report accuracy data separately for each participant, for averaged data, and for pre- and post-training tests separately using the quickpsy package (Linares & López-Moliner, 2006). The parameters of the logistic function were estimated using direct maximisation of the likelihood with the following equation:
$$\psi(\chi) = \gamma + (1 - \gamma - \lambda)\,\frac{1}{1 + e^{-\beta(\chi - \alpha)}}$$
During the fitting, we treated clear speech as equivalent to 32-channel vocoded speech and converted the number of channels vocoded at each level of degradation into their log equivalents (χ). From each fit, we obtained alpha (α), the number of channels estimated to give 50% accuracy on the word report task. This value, referred to as the threshold number of channels, was used for the subsequent analyses of the impact of lesion on performance (cf. McGettigan et al., 2014). Lower alpha values indicate that fewer channels were required to reach this threshold and thus reflect better performance or more accurate perception. Beta (β) corresponds to the slope or steepness of the curve. Gamma (γ) is the guess rate, which was fixed to 0 for this open set speech task. Lambda (λ) is the lapse rate, or expected proportion of errors as the number of channels reaches the highest levels. Lambda determines the upper horizontal asymptote (1 − λ) and was fixed at 1 minus the proportion of correct word report observed for clear speech for each participant separately. This was required as some participants did not achieve 100% word report for clear speech.
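The fitting procedure can be sketched in a few lines of Python. The sketch assumes a standard logistic psychometric function of the form γ + (1 − γ − λ) / (1 + exp(−β(χ − α))), with χ expressed as log2 number of channels, γ fixed at 0, and λ fixed from clear-speech accuracy. It replaces the optimiser used by quickpsy with a coarse grid search over α and β, so it illustrates the likelihood being maximised rather than reproducing the R analysis. All function names and the example counts are hypothetical.

```python
import math

def psychometric(x, alpha, beta, gamma=0.0, lam=0.0):
    """Logistic psychometric function with guess rate and lapse rate."""
    return gamma + (1 - gamma - lam) / (1 + math.exp(-beta * (x - alpha)))

def neg_log_likelihood(alpha, beta, lam, xs, n_correct, n_total):
    """Binomial negative log-likelihood of word report counts."""
    nll = 0.0
    for x, k, n in zip(xs, n_correct, n_total):
        p = psychometric(x, alpha, beta, 0.0, lam)
        p = min(max(p, 1e-9), 1 - 1e-9)        # guard against log(0)
        nll -= k * math.log(p) + (n - k) * math.log(1 - p)
    return nll

def fit_threshold(xs, n_correct, n_total, lam):
    """Grid-search maximum likelihood estimates of alpha and beta."""
    best = None
    for i in range(0, 601):                    # alpha: 0.00 to 6.00
        alpha = i * 0.01
        for j in range(1, 101):                # beta: 0.1 to 10.0
            beta = j * 0.1
            nll = neg_log_likelihood(alpha, beta, lam,
                                     xs, n_correct, n_total)
            if best is None or nll < best[0]:
                best = (nll, alpha, beta)
    return best[1], best[2]

# Hypothetical participant: 4-, 8-, 16-channel and clear (treated as 32).
xs = [math.log2(c) for c in (4, 8, 16, 32)]
n_total = [45, 45, 45, 45]                     # words per condition
n_correct = [12, 30, 41, 44]                   # made-up counts
lam = 1 - n_correct[-1] / n_total[-1]          # fixed from clear speech
alpha_hat, beta_hat = fit_threshold(xs, n_correct, n_total, lam)
threshold_channels = 2 ** alpha_hat            # back to number of channels
```

Because α is estimated on the log2 scale in this sketch, raising 2 to the power of the estimate converts the threshold back to a number of channels.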

Challenge 2. Semantically Ambiguous Speech Comprehension and Adaptation
The second challenge increased speech comprehension difficulty at the lexical-semantic level, by the inclusion of semantically ambiguous words, in sentence contexts that in most cases supported the lower frequency meaning. We assessed speech comprehension in terms of the speed and accuracy of judging the coherence of these sentences, which were interspersed with sentences without ambiguities and anomalous sentences. The coherence judgement task appeared well-suited for assessing competence at semantic ambiguity resolution for several reasons. Firstly, to respond accurately listeners must understand the whole sentence and not just identify one (or more) unusual words. For example, a sentence might initially make sense but then become anomalous only at the end ("It was a rainy day and the family were thinking to the banana") or might initially seem odd but would eventually make sense ("It was a terrible hand and the gambler was right to sit it out."). Secondly, because most of the meanings that we used in the sentences were the less frequent meanings, accurate performance relies on listeners utilising contextual cues to select the appropriate meaning rather than the higher-frequency, more accessible meaning. The use of lower-frequency word meanings also maximised our chance to observe word meaning priming effects, as described below. Thirdly, participants make a speeded judgement giving a continuous measure of performance in addition to accuracy.
To assess the increase in availability of low frequency word meanings in response to experience, we measured changes to meaning preferences in a word association task. This task provides a direct measure of how participants interpret ambiguous word forms in the absence of any sentence context. Specifically, using two counterbalanced sentence sets, we measured the increase in proportion of word association responses that were consistent with the (low frequency) meaning used in the sentence context for ambiguous words that had been heard (primed) compared to those that had not (unprimed). Counterbalanced assignment of sentences to primed and unprimed conditions for different participants ensured that differences in meaning frequency or dominance did not confound assessment of the word-meaning priming effect (for further discussion of word-meaning priming, see Rodd et al., 2013).

Stimuli
The stimuli for the coherence judgement task were 120 declarative sentences, selected from two previous studies (Davis et al., 2011; Rodd et al., 2005). Of these, 40 were high-ambiguity coherent sentences, 40 low-ambiguity coherent sentences, and 40 anomalous sentences. The high-ambiguity sentences each contained two ambiguous words that were disambiguated within the sentence (e.g., "The PITCH of the NOTE was extremely high."). The ambiguous words were not repeated across the set of 40 sentences. Prior dominance ratings (Gilbert & Rodd, 2022) indicated that in most of the sentences, the context biased the interpretation of the ambiguous words towards their subordinate (less frequent) meanings (mean dominance = 0.31; SD = 0.25). The low-ambiguity sentences were matched with the high-ambiguity sentences across the set for number of words, number of syllables, syntactic structure and naturalness but contained words with minimal ambiguity (e.g., "The pattern on the rug was quite complex."). These 80 coherent sentences were separated into two lists (List A and List B), each containing 20 high-ambiguity and 20 low-ambiguity sentences. Participants were presented with sentences from either list (List A or List B) and thus were exposed to half of the ambiguous words in this part of the experiment. Each list also contained all 40 anomalous sentences (i.e., the same sentences were presented to all participants) which had been created from the low-ambiguity sentences by randomly substituting content words matched for syntactic class, frequency of occurrence, and number of syllables (e.g., "There were tweezers and novices in her listener heat."). Thus, the anomalous sentences had identical phonological, lexical, and syntactic properties but lacked coherent meaning (see Table 2 for psycholinguistic properties of the 3 sentence types).
The stimuli for the word association task were the 80 ambiguous target words from the 40 high-ambiguity sentences. Given that participants had only heard half of the high-ambiguity sentences in the sentence coherence judgement task (List A or List B), for 40 of the ambiguous words, the subordinate meaning was primed (previously heard in a supportive sentence context) and for the other 40 the subordinate meaning was not primed.
Sentences and single words were recorded individually by a male native speaker of British English (M.H.D.) and sentences were equated for root mean square amplitude across conditions.

Procedure
The task consisted of two phases. In the first phase, participants listened to 80 sentences (20 high-ambiguity, 20 low-ambiguity, 40 anomalous) and had to judge as quickly and as accurately as possible the coherence of each sentence. They indicated their response by pressing a green button if the sentence made sense and a red button if it did not. Participants were given examples (not included in the test set) to encourage them to listen to the sentence in its entirety before making the judgement. Following the coherence judgement task, participants completed other behavioural tasks (not relevant to the current investigation) for 20-30 min before moving to the second phase: a word association task. In this phase, participants heard 80 ambiguous words presented in isolation, of which half had been presented in phase 1 (primed) and half were new (unprimed; counterbalanced across participants). For each word, participants had to repeat it and then say the first related word that came to mind. Responses were audio recorded and later coded as consistent with the subordinate meaning (e.g., "NOTE-music") or inconsistent with the subordinate meaning ("NOTE-write").

Data processing and analysis
There were 1,440 experimental trials (18 participants × 80 items). We excluded trials with very fast responses (more than 300 ms before the offset of the sentence), which were assumed to arise from accidental key presses or anticipatory responses. This resulted in the exclusion of two anomalous sentence trials and one low-ambiguity sentence trial.
For each participant, we first assessed whether they could discriminate the coherent sentences (high-ambiguity and low-ambiguity) from the incoherent sentences better than would be expected by chance, by calculating d-prime values for the high-ambiguity and low-ambiguity sentences separately:

d′ = z(Hits) − z(False alarms)

Hits correspond to the proportion of coherent sentences correctly judged as coherent. False alarms correspond to the proportion of incoherent sentences incorrectly judged as coherent.
To allow for calculation of the z-scores, hit rates of 1 were adjusted to 1 − 1/(2N) (i.e., to a value of 0.975) and false alarm rates of 0 were adjusted to 1/(2N) (i.e., to a value of 0.0125; Macmillan & Kaplan, 1985), where N is the number of trials contributing to the rate (20 coherent sentences per condition for hits; 40 anomalous sentences for false alarms).
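The d-prime calculation and the adjustment for extreme rates described above can be sketched as follows. This is an illustrative implementation using only the Python standard library; the function name `d_prime` and its argument names are hypothetical, and only the two adjustments described in the text (hit rate of 1, false alarm rate of 0) are handled.

```python
from statistics import NormalDist


def d_prime(hits: int, n_coherent: int, false_alarms: int, n_anomalous: int) -> float:
    """d' = z(hit rate) - z(false alarm rate), with the 1/(2N) adjustment.

    Assumption: only a hit rate of 1 and a false alarm rate of 0 are adjusted,
    as in the text; a hit rate of 0 or a false alarm rate of 1 would raise an error.
    """
    hit_rate = hits / n_coherent
    fa_rate = false_alarms / n_anomalous
    if hit_rate == 1.0:
        hit_rate = 1.0 - 1.0 / (2 * n_coherent)
    if fa_rate == 0.0:
        fa_rate = 1.0 / (2 * n_anomalous)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Perfect performance on 20 coherent and 40 anomalous sentences.
print(round(d_prime(20, 20, 0, 40), 2))
```

With 20 coherent and 40 anomalous sentences, the adjusted rates are 0.975 and 0.0125, so a participant with no errors receives the maximum attainable d-prime of approximately 4.20.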
As the false alarm rate was necessarily identical for high-ambiguity and low-ambiguity conditions (we only included a single set of incoherent sentences), differences in accuracy between the high-ambiguity and low-ambiguity sentences can be assessed using error rates when participants judged these coherent sentences to be anomalous. Therefore, for the main accuracy analyses we excluded the 40 anomalous sentence trials, leaving 719 trials (1 trial was excluded based on a fast response time; see above).
The response time analyses focused on ambiguous and unambiguous sentence trials. Of the 720 total experimental trials (18 participants × 40 items), we excluded trials incorrectly judged as incoherent (23 trials: 14 ambiguous, 9 unambiguous). For exclusions of trials based on response times, we followed the general principle of minimal trimming with model criticism (Baayen & Milin, 2010). We excluded trials with very fast response times (more than 300 ms before sentence offset; as for the accuracy analysis), which were assumed to reflect accidental key presses (1 trial), as well as trials with very slow response times (three trials with responses longer than 4,000 ms after sentence offset) because we were interested in speeded responses. Further exclusions were considered after first determining whether any transformation of the dependent variable was required to meet the assumptions of the linear mixed-effects models, namely homogeneity of residual variance and normally distributed residuals. Model diagnostic plots (quantile-quantile and histogram plots of the residuals) for the raw, log10-transformed and inverse-transformed response time data showed that log10 transformation best met the assumptions. Examination of the plots for outliers indicated that no further trimming was necessary; thus there were 693 correctly judged coherent trials included in the analyses.
We analysed differences in accuracy and response times between the high-ambiguity and low-ambiguity sentence trials with a logistic mixed effects model (accuracy) or a linear mixed effects model (log10 response times) using the lme4 package (Bates et al., 2015). The models had a single categorical fixed effects predictor for Sentence Type (High-ambiguity or Low-ambiguity) with deviation coding defining one planned contrast: High-ambiguity = 1/2 versus Low-ambiguity = −1/2. The final models each contained a by-subject and by-item random intercept.
The correlational analyses used the model residuals (comparing predictions to the data) to estimate the ambiguity response time effect (difference between responses for high-ambiguity and low-ambiguity sentence trials) for each participant. A positive residual difference indicates that the participant's ambiguity effect was larger than predicted by the model (response times were slower than estimated for the high-ambiguity condition and/or faster than estimated for the low-ambiguity condition). A negative residual difference means that their response time effect was smaller than predicted by the model (response times were faster than estimated for the high-ambiguity condition and/or slower than estimated for the low-ambiguity condition).
For the word association task, each response was independently coded for consistency with the subordinate meaning used in the priming sentence by two of the authors (LM and ZB), who were blind to the experimental condition (primed/unprimed) of the responses. For example, the word "ball" came from the sentence "The ball was organised by the pupils to celebrate the end of term," so responses such as "party" and "dance" were coded as consistent whereas responses such as "kick" and "round" were coded as inconsistent. The consistency scores for the unprimed words give a baseline measure of the preference for the dominant meaning. Response codes from the first author were used with the exception of one participant for whom data were lost and only the codings from the second rater were available; inter-rater reliability for the remainder of the responses was high (94% agreement from 1,360 responses, Cohen's Kappa = 0.862).
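For readers unfamiliar with the inter-rater statistic reported above, Cohen's kappa corrects raw agreement for the agreement expected by chance given each rater's label frequencies. The sketch below is a generic standard-library implementation; the function name and the toy codings are hypothetical and are not the study data.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters assigning one categorical label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in labels) / n ** 2
    if p_expected == 1.0:
        raise ValueError("Kappa is undefined when both raters use a single label.")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example: 10 responses coded as consistent ("c") or inconsistent ("i").
first_rater = ["c", "c", "i", "i", "c", "i", "c", "i", "i", "i"]
second_rater = ["c", "c", "i", "i", "c", "i", "i", "i", "i", "c"]
print(round(cohens_kappa(first_rater, second_rater), 3))
```

In the toy example the raters agree on 8 of 10 items (80%), but because chance agreement is 52%, kappa is considerably lower than the raw agreement.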
We analysed differences in the proportions of responses consistent with the subordinate meaning between primed and unprimed words (word meaning priming) with a logistic mixed effects model with a categorical fixed effect predictor for Priming Type (Primed or Unprimed) with deviation coding defining one planned contrast: Primed = 1/2 versus Unprimed = −1/2. There was also a continuous fixed effect predictor of Meaning Dominance (Gilbert & Rodd, 2022) and the associated interactions. The final model contained a by-subject and by-item random intercept and a by-subject random slope for Dominance.
In the main correlational analyses we used the model residuals (comparing predictions to the data) to estimate word priming effects (difference between response values for primed and unprimed words) for each participant. A positive residual difference indicates that the participant's priming effect was larger than predicted by the model (proportion of responses consistent with the subordinate meaning was underestimated for the primed condition and/or overestimated for the unprimed condition). A negative residual difference means that their priming effect was smaller than predicted by the model (proportion of responses consistent with the subordinate meaning was overestimated for the primed condition and/or underestimated for the unprimed condition).
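One way to read the residual difference measure described above is as the mean residual in the primed condition minus the mean residual in the unprimed condition, computed per participant. The sketch below illustrates that reading only; the function name, the tuple layout of the input, and the definition of a residual as observed minus predicted on the response scale are assumptions, and the actual computation may differ.

```python
from collections import defaultdict
from statistics import mean


def residual_priming_effects(trials):
    """Per-participant residual difference (primed minus unprimed).

    trials: iterable of (participant, condition, observed, predicted) tuples,
    where condition is "primed" or "unprimed".
    Assumption: residual = observed - predicted.
    """
    residuals = defaultdict(lambda: {"primed": [], "unprimed": []})
    for participant, condition, observed, predicted in trials:
        residuals[participant][condition].append(observed - predicted)
    return {
        participant: mean(by_cond["primed"]) - mean(by_cond["unprimed"])
        for participant, by_cond in residuals.items()
    }


# Hypothetical participant whose primed responses exceed the model's predictions.
example = [
    ("p01", "primed", 1, 0.4), ("p01", "primed", 0, 0.4),
    ("p01", "unprimed", 0, 0.3), ("p01", "unprimed", 0, 0.3),
]
print(residual_priming_effects(example))
```

A positive value arises when the model underestimates the primed condition and/or overestimates the unprimed condition, matching the interpretation given in the text.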

Challenge 1. Acoustically Degraded Speech Perception and Adaptation
Word report task
Figure 2A shows the mean proportion of words correctly reported for speech with different numbers of channels, for the pre- and post-training tests. Word report accuracy was near ceiling (100%) for the clear speech reflecting the participants' ability to perform the task, but was close to floor levels for the 4-channel vocoded condition, reflecting the challenge of the acoustic degradation. The mixed effect model confirmed that speech perception accuracy increased as the log2 number of channels increased (model coefficient: β = 3.199, SE = 0.236, z = 13.556, p < 0.0001). Accuracy was greater following training (model coefficient: β = 1.262, SE = 0.392, z = 3.217, p = 0.001), showing that participants were able to learn. There was no interaction between the level of degradation and training.
The outputs of fitting the data with a logistic psychometric function are shown in Figure 2B. Analyses to assess the impact of lesions on performance used the threshold number of channels (the estimated number of channels required for 50% word report accuracy) with lower values reflecting better perception (fewer channels needed to reach 50% accuracy). Figure 2C shows the mean performance before and after training for the group and for individual participants. Figure 3 shows the relationship between degraded speech perception performance and the extent and location of lesions. Correlational analyses showed that the mean threshold number of channels across pre- and post-training tests positively correlated with damage to the MD network (r = 0.427, p = 0.039) but not with damage to the language network (r = −0.152, p = 0.727), or with total damage (r = 0.216, p = 0.194). Comparisons of these correlations demonstrated that poorer speech perception was numerically more strongly predicted by damage to the MD network than to the language network, although this did not reach the p < 0.05 threshold of statistical significance (z = −1.954, p = 0.051). There was no evidence for MD network damage being more predictive of speech perception than total damage (z = −1.122, p = 0.262). There were no correlations between perceptual learning of degraded speech (i.e., change in threshold from pre- to post-training) and the volume of brain damage in the language network (r = −0.123, p = 0.727), MD network (r = 0.003, p = 0.504), or total damage (r = −0.006, p = 0.491). There was also no evidence that MD damage was more predictive of degraded speech perception than of degraded speech adaptation (z = −1.403, p = 0.161).
Figure 2. (A) … (Morey, 2008). (B) Psychometric logistic function fits separately for the pre-training (solid) and post-training (dashed) data for the mean across all 18 participants (black colour) and each participant separately (coloured by group). The horizontal line indicates the 50% word report accuracy threshold. Vertical lines indicate the estimated threshold number of channels corresponding to the 50% word report accuracy threshold for the mean fits across all 18 participants. (C) Estimated threshold number of channels (log scale) required for 50% accuracy in the word report task for the pre- and post-training tests separately. Bars show mean values across all 18 participants and error bars show ±1 SEM, adjusted to remove between-subject variance (Morey, 2008). Individual participant values are overlaid (colour and shape reflect participant group; see Supporting Information).

Sentence coherence judgement task
All participants showed d-prime values substantially above 0 indicating successful discrimination of the incoherent sentences from both the high-ambiguity (mean = 3.66, SD = 0.25, range = 3.0-4.20) and the low-ambiguity sentences (mean = 3.75, SD = 0.37, range = 2.72-4.20). Figure 4 shows the mean error rates (Figure 4A) and the mean response times (Figure 4B) for the different Sentence Types and for individual participants. Across all participants the proportions of correct responses were near ceiling (mean = 0.97, SD = 0.03, range = 0.92-1.0). The mixed effect model showed no effect of Sentence Type on accuracy (model coefficient: β = −0.440, SE = 0.538, z = −0.817, p = 0.414), and hence we have no evidence that sentences containing ambiguous words were less well understood. The mixed effect model for response times confirmed that high-ambiguity sentences were responded to more slowly than low-ambiguity sentences (β = 0.043, SE = 0.019, t(2.319) = 2.319, p = 0.023), showing that ambiguous words increased the challenge of sentence comprehension. However, there was no correlation between individual ambiguity response time effects (model residual difference measure) and extent of damage to the language network (r = −0.102, p = 0.656), the MD network (r = −0.038, p = 0.559) or overall damage (r = 0.076, p = 0.383).
Figure 3. Relationship between speech perception and lesion volume. Individual participant data for estimated threshold number of channels (log scale) required for 50% word report accuracy for the mean of pre- and post-training tests plotted separately against damage to the language network, to the multiple demand (MD) network, and total damage (colour and shape reflect participant group; see Supporting Information). Higher threshold number of channels indicates worse speech perception performance. The dashed line shows the linear best fit, and grey shaded areas show 95% confidence intervals.

Word association task
We excluded primed trials corresponding to words from sentences that were responded to incorrectly in the coherence judgement task. This resulted in exclusion of 28 trials (words from 14 ambiguous sentences: 1.94% of data) across the 18 participants, leaving 1,412 observations. For unprimed words (i.e., for ambiguous words that were not presented to participants in the coherence judgement task), the mean proportion of responses (across items and participants) that were consistent with the subordinate meaning of the word was 0.29 (SD = 0.09). This value, which gives a baseline measure of the preference for the dominant meaning, indicates that the sentence-primed meanings were indeed the subordinate or less preferred meanings (note that the value is similar to the one derived from an existing database; Gilbert & Rodd, 2022; see Stimuli section above). Figure 5 shows the mean proportion of responses consistent with the subordinate meaning for primed and unprimed words. We observed a main effect of Priming (β = 0.352, SE = 0.137, z = 2.565, p = 0.010), which reflects a higher proportion of responses consistent with the subordinate meaning for the primed compared to unprimed words. This finding demonstrates a change in word meaning preferences in response to recent experience of sentences containing ambiguous words. We also observed a main effect of meaning Dominance of the word (β = 1.112, SE = 0.111, z = 10.054, p < 0.0001), reflecting an increase in proportion of responses consistent with the subordinate meaning as the dominance of that meaning increased (became less subordinate and closer in frequency to the alternative dominant meaning). There was an interaction between Dominance and Priming (β = −0.300, SE = 0.171, z = −2.124, p = 0.034), reflecting a stronger Priming effect for meanings that were more subordinate.
Correlational analyses revealed a negative relationship between individual word meaning priming effects (model residual difference measure) and the extent of damage to the language network (r = −0.659, p = 0.001) but not the MD network (r = −0.035, p = 0.446) or total damage (r = −0.180, p = 0.237). Comparisons between these correlations showed that word meaning priming was more strongly predicted by damage to the language network than to the MD network (z = −2.182, p = 0.0291); a comparison with the correlation with total damage did not reach the conventional p < 0.05 threshold (z = −1.863, p = 0.062). There was also evidence that damage to the language network was more predictive of individual participants' word meaning priming than the ambiguity response time effect (z = −2.6523, p = 0.008). Figure 6 shows scatter plots of the correlations between word-meaning priming and damage to language network, MD network, and total damage. A further correlational analysis showed no relationship between participants' ambiguity response time effect and their word meaning priming effect (both measured using the model residuals: r = −0.26, p = 0.298, two-tailed), suggesting that the reduced word meaning priming effect could not be explained simply as due to poorer comprehension.
Table 3 summarises the correlations between the lesion volume in each network and measures of acoustically degraded speech perception, semantically ambiguous speech comprehension, and associated adaptation to each of the challenges. Table 3A displays lesion-behaviour correlations from the 18 participants tested for each challenge. Where significant lesion-behaviour correlations are observed, additional comparisons between correlations with behaviour for lesions to the two networks and within each challenge type are also shown. Table 3B displays lesion-behaviour correlations for the 17 participants for whom we have data for both types of challenge. This table shows comparisons between correlations for the two types of challenge.
As detailed in the task-specific results (summarised in Table 3), acoustically degraded speech perception was predicted by the degree of MD network damage and this correlation was in the opposite direction to, but not significantly stronger than, the correlation with language network damage volume (see Word report task section in Results). Conversely, semantically ambiguous speech adaptation (word meaning priming) was predicted by language network damage and this correlation was significantly stronger than the nonsignificant correlation with MD network damage volume (see Word association task section in Results). This double dissociation provides evidence for causal associations between the integrity of the MD network and abilities at degraded speech perception and between the language network and word meaning priming. Further comparisons of the strength of correlations between tasks within the same type of challenge (acoustic degradation or semantic ambiguity) showed that damage to the language network was more predictive of impaired word meaning priming than of comprehension of ambiguous sentences. This finding provides support for a specific contribution of the language network to adaptation that is independent of its role in comprehension (at least as measured here), which was shown to be largely independent of language or MD network lesions. However, there was no evidence that damage to the MD network was more strongly predictive of degraded speech perception than of degraded speech adaptation.
Figure 6. Relationship between word meaning priming and lesion volume. Individual participant data for word meaning priming effects estimated from the model residuals, plotted separately against damage to the language network and to the multiple demand (MD) network, and against total damage (colour and shape reflect participant group; see Supporting Information). The dashed line shows the linear best fit, and grey shaded areas show 95% confidence intervals.
Table 3. Summary of correlations between the extent of damage to the multiple demand (MD) and language networks and performance on the tasks measuring speech perception/comprehension and adaptation for the two types of challenge (acoustic degradation and semantic ambiguity). Note. Tasks measuring speech perception/comprehension and adaptation are labeled (1)-(4). Where significant lesion-behaviour correlations are observed, additional comparisons of the strength of these correlations are also shown. Correlations are shown for data from the 18 participants for each type of challenge and associated within-challenge (across-lesion, across-task) comparisons. Two participants (md6 and md10) performed tasks for only one challenge type. Across-challenge comparisons are shown for data from the 17 participants for whom we have data for both types of challenge. Significant lesion-behaviour correlations (in bold) are shown between the MD network and Task (1): acoustically degraded speech perception (turquoise), and between the language network and Task (4): adaptation to semantic ambiguity (word meaning priming; purple). In the case of Task (1) the finding that an increase in lesion volume is associated with worse behavioural outcomes is reflected in a positive correlation since higher perceptual thresholds indicate worse perception. In Task (4) the finding that an increase in lesion volume is associated with worse behavioural outcomes is reflected in a negative correlation since less word meaning priming indicates less adaptation. A dash indicates that analyses were not performed. NA = data are not applicable. a Excluding md6.
To further explore the specificity of the contribution of the MD and language networks to acoustically degraded speech perception and word meaning priming respectively, we also compared the strength of correlations between tasks using data from the 17 participants who performed both the acoustic degradation and the semantic ambiguity tasks (Table 3). There was no evidence that damage to the MD network was more predictive of degraded speech perception than of word meaning priming (z = −1.042, p = 0.300), or of comprehension of ambiguous sentences (z = 1.644, p = 0.100). Damage to the language network was more predictive of impaired word meaning priming than of degraded speech perception (z = −2.160, p = 0.031), although the comparison with degraded speech adaptation did not reach the p < 0.05 threshold (z = −1.678, p = 0.093).

DISCUSSION
We report two main findings. First, we show that damage to the domain-general MD network, but not the language-selective network, causes significant impairments to the perception of acoustically degraded speech. Word report accuracy for noise-vocoded sentences decreased as the number of channels in the vocoder decreased, reflecting an increased challenge to speech perception. The degree of perceptual impairment (i.e., the number of channels required for 50% correct word report) depends on the extent of damage to the MD network, but not damage to the language network (Figure 3; Table 3). Word recognition improved following a period of training, reflecting adaptation or perceptual learning for this form of acoustic degradation, but the degree of learning was not reliably predicted by lesion location or extent.
In contrast to these results with acoustically challenging speech, we found no evidence that semantically challenging speech comprehension was dependent on the MD system: All participants were highly accurate in judging the coherence of sentences and were no less accurate when the sentences contained ambiguous words, indicating an intact ability to access the typically less frequent (subordinate) word meanings used in our high-ambiguity sentences. Although participants were slower to make judgements for sentences which include ambiguous words, reflecting more effortful comprehension when words have multiple meanings, there was no significant association between response time slowing for ambiguous sentences and the extent of damage to the MD or language networks.
Our second main finding is that despite accurate comprehension of semantically ambiguous speech, damage to the language network, but not to the MD network, caused a significant reduction in updating of word meaning preferences following recent linguistic experience. As shown in previous studies of individuals without brain lesions, our participants were (as a group) more likely to generate a word associate related to the less frequent meaning of an ambiguous word when they had encountered this meaning in an earlier sentence (word meaning priming, as reported by Rodd et al., 2013). However, the magnitude of this word meaning priming effect was predicted by the extent of damage to the language network, but not to the MD network (Figure 6; Table 3), a dissociation that was supported by a statistically significant difference between the strength of these two correlations. The reduction in word meaning priming was not explained by sentence comprehension difficulties as there was no correlation between the magnitude of word meaning priming and increased response times when judging the coherence of sentences containing semantically ambiguous words. Furthermore, across-task comparisons showed that the damage to the language network was more predictive of impaired word meaning priming than impaired comprehension of ambiguous sentences or impaired perception of acoustically degraded speech.
Below, we discuss our two main findings in greater detail. First, we discuss possible cognitive operations performed by the MD network that are required for the perception of acoustically degraded speech. We then turn to the linguistic challenge of resolving lexical-semantic ambiguity. We discuss the functional contribution of the language-selective network in adaptation such that low frequency meanings of semantically ambiguous words become more accessible following recent exposure. In a final section we consider the dissociation between these different challenges to speech processing and explore implications for the neural basis of speech perception, comprehension, and adaptation.

The MD Network Makes a Causal Contribution to Perception of Acoustically Degraded Speech
Recently it has been argued that the MD network does not play a functional role in language comprehension (Blank & Fedorenko, 2017; Diachek et al., 2020; Shain et al., 2020; Wehbe et al., 2021; for reviews, see Campbell & Tyler, 2018; Fedorenko, 2014). According to such an account, activations observed during language comprehension within the MD network could reflect a generic increase in effortful processing or contributions to specific task demands (such as decision-making), rather than computations essential for language comprehension (e.g., identifying words; accessing word meanings). However, this line of research left open the possibility that MD contributions may be necessary when speech is acoustically degraded and challenging to perceive (Diachek et al., 2020). Here we provide novel evidence that the MD network indeed makes a causal contribution to perception of acoustically degraded speech by assessing the impact of damage to MD regions on performance in a word report task that indexes cognitive operations required for word identification.
Previous fMRI studies have shown that listening to acoustically challenging speech is associated with an increase in activation in prefrontal and motor regions that plausibly fall within the MD network (Adank, 2012; Davis & Johnsrude, 2003; Du et al., 2016; Erb et al., 2013; Hardy et al., 2018; Hervais-Adelman et al., 2012; Rysop et al., 2021; Vaden et al., 2013; Vaden et al., 2015; Wild et al., 2012). However, these studies did not explicitly define MD regions or test the necessity of MD contributions, and hence this association has not been firmly established. A substantial advance, then, comes from our finding that neural integrity of the MD network supports more successful word report for degraded speech, which allows us to infer a causal role of MD regions in degraded speech perception.
The MD network has previously been linked to a diverse range of domain-general cognitive constructs, including executive control, working memory, and fluid intelligence. These constructs may reflect a combination of different cognitive operations, including setting and monitoring of task goals; directing attention; and the storage, maintenance, integration, and inhibition of information across different time scales. It is therefore of interest to consider which of these operations, performed by the MD network, might be critical for the perception of acoustically degraded speech. For example, focused attention may be particularly important when the identities of specific phonemes or words are uncertain. Monitoring may be important for tracking the accuracy of phoneme perception and word recognition over time.
Future work can tease apart these possible distinct cognitive operations, either by focusing on the potential contributions of distinct subnetworks within the broader MD network, or by exploring correlations between these other functions of MD networks and perception of degraded speech. Given strong evidence of inter-regional correlations during naturalistic listening paradigms (Assem, Blank, et al., 2020; Blank et al., 2014; Mineroff et al., 2018; Paunov et al., 2019), we here treated the MD network as a functionally integrated system. However, other research concerned with domain-general cognitive processes has proposed that the MD network consists of at least two interconnected, but distinct subnetworks (one comprising lateral frontal and parietal areas and the other cingular and opercular areas), which may contribute differently to cognition (Dosenbach et al., 2007; Dosenbach et al., 2008; Nomura et al., 2010). In the context of effortful speech comprehension, Peelle (2018) proposes a three-way distinction between frontoparietal, premotor, and cingular-opercular contributions to attention, working memory, and performance monitoring processes respectively. Consistent with the proposed role of cingular-opercular regions are data showing that activation is associated with better word recognition on subsequent trials (Vaden et al., 2013; Vaden et al., 2015), which may reflect mechanisms for tracking the accuracy of phoneme perception and word recognition over time. Although the present data cannot adjudicate between bi- or tripartite views, further research using similar methods and data from a larger number of individuals could potentially dissociate the effect of lesions of these three subnetworks and establish underlying mechanisms. For example, we might predict that focal damage to cingular-opercular regions would result in a greater impairment in degraded speech perception when perceptual difficulty varies from trial to trial compared to cases in which trial difficulty is grouped into blocks.
Replicating a range of previous behavioural findings (Hervais-Adelman et al., 2008; Huyck & Johnsrude, 2012; Loebach & Pisoni, 2008; Peelle & Wingfield, 2005; Sohoglu & Davis, 2016), we showed that listeners adapt to acoustically degraded speech over time. This finding extends earlier observations of perceptual learning to individuals with lesions to language-selective and domain-general regions. We found no evidence that damage to either MD or language-selective networks led to reduced perceptual learning and hence cannot make causal claims about the contribution of either network to this form of learning. Future studies using similar methods would benefit from a larger number of participants, with more variable and more extensive lesions.

Ambiguous Speech
Semantically ambiguous words introduce a substantial challenge to speech comprehension because of the need to engage competition processes to select between alternative meanings and the cognitive cost of reinterpretation when initial selection fails (Rodd et al., 2002; Rodd et al., 2010, 2012). The presence of two or more ambiguous words in each of the high-ambiguity sentences used in our study made comprehension especially challenging. Nonetheless, we observed that comprehension, indicated by judging high-ambiguity sentences to be coherent, was ultimately successful (although slower than for low-ambiguity sentences) and that accuracy in judging coherence did not differ between high- and low-ambiguity sentences. Neither response time differences nor the accuracy of coherence judgements were associated with the degree of damage to MD or language-selective brain networks. Thus, our study does not provide evidence for a specific causal role of either of these brain networks for comprehension of sentences containing ambiguous words.
Despite intact comprehension of sentences containing semantically ambiguous words, we observed differential effects of lesion location and extent on learning mechanisms involved in adapting lexical-semantic processing after successful disambiguation. Previous research has established that recent exposure to low-frequency (subordinate) meanings of ambiguous words in a sentence context facilitates subsequent meaning access and selection of those meanings, a process termed word-meaning priming (Betts et al., 2018; Gaskell et al., 2019; Gilbert et al., 2018; Rodd et al., 2013; Rodd et al., 2016). Previous functional imaging studies have not studied neural activity associated with word-meaning priming and hence the present results make a novel contribution to understanding the neural basis of this adaptation process. We here replicated the standard word-meaning priming effect for the group of participants tested overall, but showed that the magnitude of the priming effect was significantly reduced by damage to the language-selective network but not to the MD network.
There is substantial anatomical overlap between the language network shown here to be critical for updating of word-meaning preferences following successful disambiguation, and the fronto-temporal brain regions previously shown to respond to semantic ambiguity resolution (Bilenko et al., 2009; Musz & Thompson-Schill, 2017; Rodd et al., 2005; Vitello et al., 2014; Zempleni et al., 2007; for a review, see Rodd, 2020), consistent with shared neural resources between semantic comprehension and subsequent adaptation. The absence of an effect of language-network damage on immediate comprehension, coupled with the observed impact of language-network damage on semantic adaptation, may therefore reflect the relatively high functioning of the volunteers, the limited severity of the language lesions, and/or the relative insensitivity of the comprehension task to distinguishing between these relatively unimpaired volunteers. It also remains possible that particular subregions of the language network are differentially important for immediate comprehension compared to subsequent adaptation. These issues could be explored in future work, which would clearly benefit from larger numbers of participants. A larger sample would also allow the use of alternative methods such as voxel-based lesion-symptom mapping to localise function more specifically within the network, for example, by contrasting frontal and temporal lobe lesions.
One striking illustration of the longevity of learning is that word-meaning priming has previously been observed 24 hr after a single exposure to an ambiguous word, especially if there is an intervening period of sleep (Gaskell et al., 2019). This latter finding, in combination with a wider literature on the role of consolidation processes that facilitate the acquisition of new lexical knowledge (Dumay & Gaskell, 2007; Gaskell & Dumay, 2003; Tamminen et al., 2010), led Gaskell et al. (2019) to suggest that word-meaning priming may involve a two-stage complementary systems account of learning (McClelland, 2013), as proposed for the acquisition of novel words (Davis & Gaskell, 2009). According to this account, short-term learning arises from hippocampally mediated binding of associations between words in the sentences, while these short-term changes are consolidated into long-term changes to word meaning preferences after sleep.
The present study constrains these complementary systems accounts of learning by revealing a causal contribution of language-selective cortical regions even for short-term adaptation of familiar word meanings. Future work could further consider the interaction of hippocampal and cortical regions in the learning and maintenance of meaning preferences over different timescales and the relationship between learning novel vocabulary and updating of existing lexical semantic knowledge (for recent meta-analyses of word form learning and consolidation, see Schimke et al., 2021; Tagarelli et al., 2019). We note that previous research has shown that individuals with aphasia (identified behaviourally) can learn novel vocabulary but that learning is highly variable (Kelly & Armstrong, 2009; Tuomiranta et al., 2011, 2014). The present work similarly shows variability in the impact of cortical lesions on adapting the meanings of familiar words. However, our participants were not recruited on the basis of language impairment and retained good comprehension both on a standardised measure of sentence comprehension (TROG2) and on the experimental measure of ambiguity resolution tested here. It might be that individuals with more extensive lesions to language-selective cortex, or more focal lesions of posterior temporal and inferior frontal regions that contribute to ambiguity resolution, would show a greater impairment to comprehension. Such a finding would suggest that a common set of cortical regions support comprehension and learning of ambiguous words in sentences. Further refinement of lesion definitions and tests of larger samples of individuals could also provide more detailed anatomical evidence concerning the relative contributions of language-selective and/or domain-general subregions of the IFG to semantic ambiguity resolution. These regions lie in close proximity and thus may appear to overlap in group studies, which may explain why they have not been dissociated in previous imaging research on semantic ambiguity resolution.

Neural Dissociation of Different Challenges to Speech Perception and Comprehension
Taken together, our findings provide a double dissociation indicating independent functional contributions of the MD and language-selective networks to responding to and adapting to different types of difficult-to-understand sentences. Specifically, we show that the challenge of perceiving acoustically degraded sentences (measured in terms of word report accuracy) is causally linked to the degree of damage to the MD network but not to the language-selective network (although the comparison of correlations was not statistically significant; see Table 3). Conversely, the challenge of post-comprehension adaptation to semantically ambiguous words in sentences (measured in terms of word meaning priming) causally depends on the integrity of the language-selective but not the MD network; moreover, in this case there is a reliable difference between the significant (language) and null (MD) correlations.
Here we tested a limited set of challenges to speech comprehension; thus, we cannot make general statements concerning dissociable contributions made by each of these cortical networks to all forms of perceptual or semantic challenge. However, our data provide initial evidence for the task specificity of causal contributions. Focusing first on the effect of language network damage, the correlation between language lesion volume and word-meaning priming was significantly different from the null correlation between lesion volume and coherence judgement response times for ambiguous sentences, indicating a greater sensitivity to the integrity of the language network for adaptation compared to initial comprehension of lexicosemantic ambiguity. It is important to note that our results are based on data from individuals without aphasia in whom lesions extended to a maximum of only approximately 11% of the language cortex (see Supporting Information). It is likely that more extensive lesions, or indeed more sensitive tests, would detect a contribution of the language network to comprehension. Furthermore, the reliable correlation between lesion volume and word-meaning priming could also be dissociated from the (null) effect of lesions on degraded speech perception, suggesting that the integrity of the language network is more important for lexicosemantic than for acoustic or perceptual challenges. We note, however, that our definition of the language network was derived from studies using both written and spoken language and hence likely excluded early auditory processing stages. It would be of interest to explore lesion definitions based on localising the speech perception system, which might reveal other systems that causally support abilities at degraded speech perception.
Equivalent across-task comparisons of the effect of MD network damage did not reach statistical significance. Despite a reliable correlation between MD lesion volume and impaired perception of acoustically degraded speech, this effect could not be clearly dissociated from the null effects of lesion volume on tasks involving semantic processing, or perceptual and semantic adaptation. We therefore cannot draw strong conclusions about the specificity of the contribution of the MD network to degraded speech perception based on the current data. For instance, it remains possible that the MD contribution is a secondary consequence of an increase in demands on domain-general cognition (e.g., working memory) that would affect all aspects of language functioning, but are emphasised by the word report task for degraded speech. While we had expected that additional demands on domain-general operations would be equivalent across each of our listening challenges and tasks, we do not have evidence that this was the case. It might be that more careful titration of task difficulty will be required if we are to demonstrate that network-specific lesions impair specific aspect(s) of language function (perception vs. comprehension vs. adaptation to challenge types) rather than apparently specific effects being mediated by domain-general processes.
Relatedly, a reviewer raised the possibility that unmeasured auditory and cognitive impairments may have had consequences for participants' task performances. Such impairments could affect performance even when participants achieved a high level of accuracy such as reporting words for clear speech or distinguishing coherent and anomalous sentences. Further, they could potentially account for the poorer task performance when challenge was introduced. This concern is based on previous research that demonstrates a variety of higher-level cognitive consequences of hearing impairment on tasks requiring speech perception and comprehension (for a review, see Humes et al., 2012). For example, a participant with a hearing impairment may experience increased demands on domain-general functions depending on the task, and the extent to which they can manage these demands may therefore depend on cognitive abilities. As above, we had no reason to suppose differential effects of any auditory or cognitive impairments on our different linguistic tasks, but we acknowledge the limitation and suggest that future studies should measure auditory and cognitive abilities more broadly alongside linguistic measures of interest.
Further studies exploring a wider range of challenges to speech comprehension and with larger samples of participants might specify the causal contributions identified here in more detail. For example, we might assess lesion correlates of perception and adaptation for other forms of perceptual challenges to speech comprehension such as those arising when speech is in background noise, or speech sounds are perceptually ambiguous (see Mattys et al., 2012, for a review of these listening challenges). Future studies might also consider whether other forms of semantic, syntactic, or lexical challenge to comprehension are also causally associated with the integrity of the MD or language networks. In this way, building on the current methods and findings, one could map the hierarchy of cognitive processes involved in speech perception and comprehension onto specific brain regions that support them. However, as mentioned above, larger samples of patients will be needed if we are to conduct more anatomically specific analyses at the level of individual voxels (e.g., using voxel-based lesion-symptom mapping) or functional subregions within the larger networks studied here.

Conclusions
Speech comprehension in naturalistic situations requires listeners to accommodate and learn in response to a range of perceptual and semantic challenges that make spoken sentences more difficult to recognise and understand. Behavioural data from individuals with lesions to language-selective and domain-general MD networks demonstrate different functional contributions of these two networks depending on the source of the listening challenge. In particular, the MD network appears to be necessary for the perception of acoustically degraded speech, whereas using recent experience to update meaning preferences for ambiguous words appears to depend on anatomically distinct, frontotemporal regions argued to form a specialised language network.
In this work we considered two specific challenges, but future work should consider whether differences in the ways in which acoustic degradation and lexical-semantic ambiguity engage and depend on the domain-general MD network and domain-selective language network translate to other perceptual and semantic challenges and to more naturalistic speech processing. For example, speech perception must be resilient in the face of unfamiliar accents, mispronunciations, and competing sounds. Comprehension processes must accommodate multiple forms of syntactically or semantically complex and ambiguous speech. Many of these are situations in which activation of inferior frontal regions has been observed (Blanco-Elorrieta et al., 2021; Boudewyn et al., 2015; January et al., 2009; Kuperberg et al., 2003; Novais-Santos et al., 2007) and an attribution to domain-general MD processing has sometimes been made. However, given the evidence for functionally distinct language-selective and domain-general subregions lying in close proximity within the IFG, and the individual variability in their precise locations, such conclusions may be premature. Further studies of individuals with focal lesions can be used to determine whether accommodating these other perceptual and semantic challenges to speech processing similarly depends on the integrity of domain-general or language-selective brain regions. These perceptual and semantic challenges are common for the noisy and ambiguous spoken language that listeners perceive and comprehend every day.