A Review on Treatment-Related Brain Changes in Aphasia

Numerous studies have investigated brain changes associated with interventions targeting a range of language problems in patients with aphasia. We strive to integrate the results of these studies to examine (1) whether the focus of the intervention (i.e., phonology, semantics, orthography, syntax, or rhythmic-melodic) determines in which brain regions changes occur; and (2a) whether the most consistent changes occur within the language network or outside, and (2b) whether these are related to individual differences in language outcomes. The results of 32 studies with 204 unique patients were considered. Concerning (1), the location of treatment-related changes does not clearly depend on the type of language processing targeted. However, there is some support that rhythmic-melodic training has more impact on the right hemisphere than linguistic training. Concerning (2), we observed that language recovery is not only associated with changes in traditional language-related structures in the left hemisphere and homolog regions in the right hemisphere, but also with more medial and subcortical changes (e.g., precuneus and basal ganglia). Although it is difficult to draw strong conclusions, because there is a lack of systematic large-scale studies on this topic, this review highlights the need for an integrated approach to investigate how language interventions impact on the brain. Future studies need to focus on larger samples preserving subject-specific information (e.g., lesion effects) to cope with the inherent heterogeneity of stroke-induced aphasia. In addition, recovery-related changes in whole-brain connectivity patterns need more investigation to provide a comprehensive neural account of treatment-related brain plasticity and language recovery.


INTRODUCTION
Aphasia is an acquired neurological language disorder affecting approximately 1 in 250 people (NIDCD, 2015). It is most commonly caused by a cerebrovascular accident in the language-dominant hemisphere, which is the left hemisphere in more than 90% of right-handed persons (Rasmussen & Milner, 1977). Aphasia results in impaired production and/or impaired comprehension of speech, reading, and/or writing. These communication impairments dramatically affect societal participation and integration, causing a substantial decrease in the quality of life (Dahlberg et al., 2006). Effective language treatment might be a crucial element to trigger recovery. Different types of interventions can be used to target the language problems of people with aphasia (PWA). Depending on the affected speech and language components, patients are trained on the production and/or comprehension of the meaning of words and sentences (semantics), the sound structure of words (phonology), written word forms (orthography), grammar (morphosyntax), and/or the melodic intonation patterns inherent to language (melody/rhythm). Other interventions tap communication as a whole (e.g., constraint-induced aphasia therapy; Pulvermüller et al., 2001) and/or social activities and societal participation (e.g., script training; Kaye & Cherney, 2016).
In the last two decades, research on the effectiveness of these therapies has increased, and has provided evidence that high dose speech-language therapy results in better functional communication and better language comprehension and production compared with no intervention. However, effect sizes are weak, inconsistent, and not necessarily evident at follow-up (Brady, Kelly, Godwin, Enderby, & Campbell, 2016). The effects of therapy have also been observed in structural and functional alterations in the brain. It is known that experience, learning, and training can strengthen synapses through the frequent sequential coactivation of connected neuronal assemblies (Tsumoto, 1992). This can consequently alter brain structure and functioning in younger (Scholz, Klein, Behrens, & Johnsen-Berg, 2009) as well as older (Boyke, Driemeyer, Gaser, Buchel, & May, 2008) healthy adults. Varley (2011), who translated known neuroscience principles to aphasia therapy, described that such neuroplasticity should occur after stroke when appropriate interventions, focusing on specific language behaviors, are provided with a sufficient dose, frequency, and intensity (Pulvermüller & Berthier, 2008). Recovery is achieved by either maximizing the capacity of a damaged neural language network or by linking new neural processing assemblies to fulfill a linguistic task (Murphy & Corbett, 2009).
A central question is which regions are considered to belong to "the neural language network." For centuries, researchers have investigated the neurobiological basis of language. The familiarity of Broca's and Wernicke's regions in the context of language is the result of the classic view on linguistic processing proposed by the "Broca-Wernicke-Lichtheim-Geschwind model" in the late 19th century (Geschwind, 1970/2010). According to this model, language is situated in the perisylvian area of the left hemisphere, more specifically in the middle and posterior superior temporal lobe for language comprehension, and in the inferior frontal lobe for language production. The connection between Broca's and Wernicke's regions is established by the well-known white matter pathway, the arcuate fasciculus (AF; Hagoort, 2014). However, there is still no clear and consistent definition of either of these two regions in terms of anatomical localization. Over the years, Wernicke's area has been located in almost every part of the posterior perisylvian cortex, including the superior temporal gyrus (STG), the middle temporal gyrus (MTG), and the inferior parietal cortex. In an online survey of specialists in the neurobiology of language, none of the seven anatomical definitions of Wernicke's region garnered more than 30% of the votes. In addition, in the same survey, only 50% of respondents agreed on the precise location of Broca's region in the triangular and opercular part of the left inferior frontal gyrus (IFG; Tremblay & Dick, 2016). Furthermore, the strict functional division between language production in Broca's region and language comprehension in Wernicke's region is not valid because many fMRI studies have demonstrated that both language modalities share neural resources (Menenti, Gierhan, Segaert, & Hagoort, 2011; Segaert, Menenti, Weber, Petersson, & Hagoort, 2012; Stokes, Venezia, & Hickok, 2019).
In the last two decades, many alternative models for language have been proposed, of which the dual-stream model for speech processing of Hickok and Poeppel (2007) is particularly well known. In this model, multiple regions in the perisylvian cortex, as well as a premotor region and more ventrally located areas, are assumed to underlie linguistic processing. More specifically, the first step in speech perception, the spectrotemporal analysis of the sounds, is assigned to the dorsal STG. Subsequently, phonological processing takes place in the mid-post superior temporal sulcus (STS). The model then proposes a left dorsal stream, underlying the mapping of phonological representations onto articulatory representations, in the parietotemporal junction, the posterior IFG, and a more dorsal premotor region. A ventral stream underlies the mapping of phonological representations onto meaning. Bilateral posterior regions in the ventral stream (posterior MTG and inferior temporal gyrus [ITG]) engage more in lexical semantics, whereas the left anterior regions of the ventral stream are also engaged in sentence level processing (Hickok & Poeppel, 2007).
Other recent language models have proposed an even more extended network, including medial and subcortical structures involved in sensory, motor, and higher-order cognitive processes that support linguistic functioning (e.g., Price, 2000, 2012; Price, Seghier, & Leff, 2010; Vigneau et al., 2006, 2011). The specific role of each of these regions according to these more elaborate language models is provided in Table 1 in the online supporting information located at https://www.mitpressjournals.org/doi/suppl/10.1162/nol_a_00019. Over time, various research groups have studied the healthy neural language network, and different subnetworks have been detected to support different linguistic (semantics, phonology, syntax, orthography) and rhythmic-melodic levels of language (Friederici, 2011; Price, 2000, 2010, 2012; Vigneau et al., 2006, 2011). Yet little is known about treatment-related brain changes in these networks in PWA.

Aims of the Present Study
In this review, we strive to summarize and integrate the results of recent research on the structural and/or functional changes associated with different language treatments in PWA. The aim is twofold, namely to investigate whether brain changes are (1) specific to the type of intervention received (i.e., phonological, semantic, orthographic, syntactic, or rhythmic-melodic); and (2a) whether the most consistent changes occur within the language network or outside, and (2b) whether these are related to individual differences in language outcomes. Concerning the first aim, we first discuss whether the linguistic interventions (i.e., phonology, semantics, orthography, or syntax) result in specific or similar brain changes. Next, we discuss whether nonlinguistic interventions focusing on rhythmic-melodic aspects rely more on the right hemisphere than linguistic-based interventions. As indicated in the model of Hickok and Poeppel, as well as in Table 1 in the online supporting information, different brain correlates have been related to each of the linguistic components (Hickok & Poeppel, 2004; Price, 2000, 2010, 2012). It is therefore plausible that, depending on the intervention, different subnetworks undergo changes over time. These differences in intervention effects are assumed to be partly responsible for the extreme heterogeneity in recovery patterns seen in PWA (Saur & Hartwigsen, 2012). However, at the same time, these linguistic components are highly interwoven. This makes it unlikely that only one kind of language processing is tapped during an intervention (e.g., one cannot train sentence production without involving the meaning of the sentence). In addition, there is considerable overlap in the neural networks for different linguistic components (see e.g., Vigneau et al., 2006) and each network presumably interacts with others to create our general language behavior.
To illustrate this, we overlaid the fMRI association test maps for semantic (blue), phonological (red), syntactic (orange), as well as orthographic (green) processing in Figure 1, based on the automated meta-analysis of previous fMRI studies provided by Neurosynth (http://neurosynth.org/). The association maps represent brain regions where blood oxygen level dependent (BOLD) changes occur more consistently for studies including the search term than for studies that do not mention the search term. According to Neurosynth, the overlap is the greatest in the left frontal and temporal lobe, with most of the phonological network (red) located dorsally, most of the semantic network (blue) located ventrally, and most of the orthographic network (green) located ventrally and posterior to the other language networks.
Based on these Neurosynth maps, we expected that (1) brain changes related to interventions focused on phonological processing are more frequently located in the phonological network (red), (2) brain changes related to interventions focused on semantic processing are more frequently located in the semantic network (blue), (3) brain changes related to interventions on orthographic processing are more frequently located in the orthographic network (green), and (4) brain changes related to interventions focused on syntactic processing are more frequently located in the syntactic network (orange). For all these language interventions, we expected a left-dominance of treatment-related changes. This does not imply that the right hemisphere is not involved in linguistic processing (e.g., see the bilateral ventral stream in Hickok & Poeppel, 2007), but only suggests that it does so to a lesser extent than the left hemisphere.
The rhythmic-melodic network is not depicted in Figure 1, because there was no meta-analysis available on Neurosynth for this keyword. However, it similarly involves brain regions in frontal and temporal lobes (see Table 1 and Figure 1 in the online supporting information), but it is generally considered the only component relying more on the right hemisphere than on the left hemisphere (Baum & Pell, 1999). Thus, we expected more reorganization in the right hemisphere after an intervention on the musical elements of speech, compared with the other linguistic components (i.e., phonology, semantics, orthography, and syntax), which preferentially target the left hemisphere. We also compared the neural effects of interventions targeting the right versus the left hemisphere. Our hypothesis was that brain changes related to interventions focused on rhythmic-melodic processing are more frequently located in frontal and temporal lobes of the right hemisphere.
Concerning the second goal, we first describe whether the most consistent treatment-related changes across therapies occur within the language network (i.e., a combination of response maps visualized in Figure 1) or outside (aim 2a). Although the focus of research on brain-language relations in aphasia recovery has been language centered, Cahana-Amitay and Albert (2015) argued in their review that other cognitive functions, for example, attention, short-term memory, and cognitive control, also contribute to aphasia recovery. In addition, more recent models on the neurobiology of language also have considered brain regions that are involved in multiple other functions (Price, 2000, 2010, 2012). Therefore, we hypothesized that the observed treatment-related brain changes are not restricted to regions classically associated with linguistic processing, but involve a variety of brain structures associated with nonlinguistic cognitive functions. In addition, we explored whether and how the most consistent treatment-related changes are associated with individual differences in language outcomes (aim 2b). In accordance with findings in Saur et al. (2006), we expected that, at least in the chronic phase post-stroke, normalization of activity to the left hemisphere is most associated with language improvement (restoration). However, because in patients with extensive lesions the left-hemispheric recovery potential is limited, associations between language improvement and brain changes in right-hemispheric regions are expected as well (compensation). For the second aim, we considered treatment-related regional changes as well as changes in connectivity patterns.

Figure 1. Association test maps for the keywords "semantic" (blue, 1,031 studies), "phonological" (red, 377 studies), "syntactic" (orange, 169 studies), and "orthographic" (green, 132 studies) processing, according to the meta-analysis of Neurosynth (http://neurosynth.org/). First row: left view, second row: right view. The figures were composed using Paraview software (version 5.4.1; https://www.paraview.org/) following the guidelines specified in Madan (2015).

Inclusion and Exclusion Criteria
We searched three databases: Pubmed (https://pubmed.ncbi.nlm.nih.gov/), Embase (https://www.embase.com), and Web of Science (https://www.webofknowledge.com) for studies exploring neuroanatomical and/or functional changes in patients with aphasia due to specific language interventions, published between January 2000 and April 2018. More specifically, in the keywords, we combined three main concepts, that is, (1) aphasia, (2) brain changes, and (3) intervention, using different words for each concept (Table 1). If all three concepts were present in the title and/or the abstract, the article was included for further consideration. By screening the reference lists of the articles collected in this way, we additionally included relevant papers published after 2000.

Table 1 note. The asterisks represent wildcards and can be replaced by one or more characters (e.g., the search term anatom* will look for terms anatomical, anatomy, etc.).
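The three-concept screening rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: apart from anatom*, which is the example given in the Table 1 note, the search terms below are hypothetical placeholders and not the terms actually used, and the function names are invented for this sketch. A trailing asterisk is treated here as "zero or more further word characters".

```python
import re

# Hypothetical term lists: only "anatom*" is taken from the text,
# all other terms are placeholders for illustration.
CONCEPTS = {
    "aphasia": ["aphasi*"],
    "brain changes": ["anatom*", "neuroplast*", "reorgani*"],
    "intervention": ["therap*", "treatment*", "intervention*"],
}

def wildcard_to_regex(term):
    """Turn a term such as 'anatom*' into a case-insensitive whole-word regex."""
    stem = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(r"\b" + stem + suffix + r"\b", re.IGNORECASE)

def matches_all_concepts(text, concepts=CONCEPTS):
    """True if at least one term of every concept occurs in the text."""
    return all(
        any(wildcard_to_regex(term).search(text) for term in terms)
        for terms in concepts.values()
    )

def build_query(concepts=CONCEPTS):
    """Combine terms with OR within a concept and AND across concepts."""
    return " AND ".join(
        "(" + " OR ".join(terms) + ")" for terms in concepts.values()
    )

# A record is kept for further consideration only if all three concepts are present.
keep = matches_all_concepts("Treatment-induced neuroplasticity in chronic aphasia")
drop = matches_all_concepts("Anatomy of aphasia")
```

The same structure (OR within a concept, AND across concepts) is what a database query string would express; the exact query syntax differs per database and is not reproduced here.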
We established the following inclusion criteria. First, the patients had to be adults (to ensure that language and brain development were complete), who had been diagnosed with aphasia as a consequence of a cerebral vascular accident (i.e., stroke). Second, the study had to statistically evaluate the effect of the treatment using measures collected through functional or structural neuroimaging. We decided not to exclude studies on the basis of MRI modality, given that training-induced neuroplasticity can be reflected in functional cortical changes as well as structural white matter changes, and that changes in each modality are related to each other (Honey et al., 2009). Third, each therapy investigated had to focus specifically on one, maximally two, linguistic domain(s): semantics, phonology, syntax, orthography, and/or melody/rhythm, to enable the identification of brain changes after training of these specific types of language processing (aim 1). For this reason, studies providing mixed conventional therapy (e.g., Aerts et al., 2015), intention treatment (e.g., Benjamin et al., 2014), action observation treatment (e.g., Gili et al., 2017), interventions on the activity/participation level, imitation therapy (e.g., Santhanam, Duncan, & Small, 2018), script training (e.g., Fridriksson, Hubbard, et al., 2012), or constraint-induced language therapy (e.g., McKinnon et al., 2017) were not considered. We also excluded studies that combined language therapy with noninvasive brain stimulation and/or drug trials, intervention studies in bilingual aphasia, non-peer-reviewed reports, and studies that were not available in English. Figure 2 represents the literature search process.

Characteristics of the Studies Included
In total, we identified 32 studies on treatment-related brain changes in PWA that met the inclusion criteria. All references are listed per imaging modality, in-scanner task, and targeted linguistic component in Table 2. For the sake of brevity, we refer to Appendix A for details on each study and to the Supplementary Information for in-depth information on the specific interventions that each study used. (Both can be found in the online supporting information for this article.) No specific constraint was set on the time post-stroke, but 94% of the studies (30 out of 32) included only patients who were in the chronic stage (≥6 months) post-stroke, to avoid the confounding effects from spontaneous recovery. The remaining two studies included PWA who were at least 4 months post-stroke. Figure 3 shows the number of studies (counts, on the y-axis) with different numbers of participants (PWA, on the x-axis). In total, 11 studies shared participants with one or two other studies. Seven of these studies were considered separately because they applied different, and mostly unrelated, analyses, that is, task-based fMRI versus resting-state fMRI versus diffusion-weighted imaging (DWI; van Hees et al., 2014; van Hees et al., 2014a, 2014b), voxel-wise whole-brain contrast analysis versus region of interest (ROI) based effective connectivity analysis (Vitali et al., 2007; Vitali et al., 2010), and univariate versus multivariate fMRI analysis (Fridriksson, 2010; Fridriksson, Richardson, et al., 2012). Two studies were considered separately because there was only minimal overlap in participants and different interventions were considered (Fridriksson, Morrow-Odom, Moser, Fridriksson, & Bayliss, 2006; Fridriksson et al., 2007).
Finally, the studies of Abel, Weiller, Huber, and Willmes (2014) and Abel, Weiller, Huber, Willmes, and Specht (2015) were considered as one study in this review, because the same voxel-wise whole-brain contrast analysis has been reported in both studies. More details for each study concerning the rationale behind these decisions can be found in the online supporting information. In total, 204 unique patients with aphasia were tested in the articles within the scope of this review.

Variability Across Included Studies
Differences across studies in method, participants, modality, task, and contrasts
To collect a sufficient number of studies on the topic of treatment-related brain changes in PWA, we included both studies reporting standard spatial coordinates for brain regions showing treatment-related neural plasticity and studies that only reported anatomical labels for these regions. For our analysis, we used the anatomical labeling according to the AAL-VOI atlas (Tzourio-Mazoyer et al., 2002). If the study reported standard stereotaxic coordinates, the anatomical labels for these coordinates were derived from the atlas. If the study did not report standard stereotaxic coordinates, the anatomical labels of the study were adopted if the labels corresponded to one of the labels in the atlas. If studies reported labels that did not correspond to a label in the atlas (e.g., [dorsolateral] prefrontal cortex), we assigned the region to one of the labels in the atlas where possible (e.g., middle frontal gyrus), or reported the labels in addition to the atlas labels (which was the case for the inferior frontal sulcus and the premotor area). There are two disadvantages to this choice. First, this review is descriptive because an insufficient number of studies reported standard spatial coordinates to support a quantitative meta-analysis. This, in combination with our choice to focus only on intervention studies targeting maximally two linguistic domains, restricted the number of studies for a meta-analysis. Second, the spatial resolution is reduced because it was not always possible to be sure which parts of an anatomical label were being referred to.
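The labeling rules described above amount to a small decision procedure, which can be sketched as follows. This is a hedged illustration only: the label sets are tiny hypothetical subsets written as plain anatomical names (not the atlas's own label strings), the coordinate lookup is a caller-supplied placeholder function, and `assign_label` is a name invented for this sketch.

```python
# Hypothetical, heavily reduced label sets for illustration.
ATLAS_LABELS = {"superior frontal gyrus", "middle frontal gyrus", "angular gyrus"}
# Example reassignment taken from the text: (dorsolateral) prefrontal cortex -> middle frontal gyrus.
MANUAL_MAPPING = {"dorsolateral prefrontal cortex": "middle frontal gyrus"}
# Labels the text says were reported in addition to the atlas labels.
EXTRA_LABELS = {"inferior frontal sulcus", "premotor area"}

def assign_label(reported_label=None, coords=None, lookup=None):
    """Return (label, rule) following the order of rules described in the text."""
    # Rule 1: coordinates were reported, so derive the label from the atlas.
    if coords is not None and lookup is not None:
        return lookup(coords), "atlas lookup"
    if reported_label is None:
        return None, "unresolved"
    label = reported_label.strip().lower()
    # Rule 2: the reported label is itself an atlas label, so adopt it.
    if label in ATLAS_LABELS:
        return label, "adopted"
    # Rule 3: reassign to an atlas label where possible.
    if label in MANUAL_MAPPING:
        return MANUAL_MAPPING[label], "reassigned"
    # Rule 4: keep the label alongside the atlas labels.
    if label in EXTRA_LABELS:
        return label, "extra label"
    return None, "unresolved"

# Placeholder lookup standing in for a real atlas query.
example = assign_label(coords=(-44, 20, 30), lookup=lambda xyz: "middle frontal gyrus")
```

Returning the applied rule together with the label makes it easy to see how many regions rest on coordinates and how many on reported or reassigned labels, which is the source of the reduced spatial resolution mentioned above.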
Even when studies used the same imaging modality (e.g., DWI or fMRI), there were substantial differences in methodology (for review, see Crosson et al., 2007). These include differences in fMRI tasks, research design (group study vs. multiple single-subject study), or analysis method (whole-brain vs. ROI analysis). Functional neuroimaging was used in 29 papers to investigate treatment-related changes in the brain, of which 26 applied task-based fMRI, one acquired resting-state fMRI, one applied positron emission tomography (PET), and one applied magnetoencephalography (MEG) used with MRI for source localization. Of these functional neuroimaging studies, 62% explored functional changes during an object-naming task (18 out of 29), and five of the included studies additionally concentrated on treatment-related changes in neural connectivity patterns. Finally, three papers applied a structural neuroimaging method, that is, DWI, investigating either local (one study) or distributed white matter changes (two studies). Although it has been demonstrated in a cohort of 132 stroke patients that language relies on highly localized brain regions as well as on bilateral brain networks and their connections, only seven studies in total focused on treatment-related changes in neural connectivity patterns (Kiran, Meier, Kapse, & Glynn, 2015; Marcotte et al., 2013; Sandberg, Bohland, & Kiran, 2015; Schlaug, Marchina, & Norton, 2009; van Hees et al., 2014a, 2014b; Vitali et al., 2010). Importantly, five of the seven connectivity studies used an ROI-approach in which the ROIs were chosen based on previous literature or a healthy control group (Marcotte et al., 2013; van Hees et al., 2014b; Vitali et al., 2010; Schlaug et al., 2009). This might have induced bias towards the language network.

Table 2 note. Some studies target multiple linguistic components and are therefore repeated. For example, several of the identified neuroimaging studies provide a semantic treatment alternated with a phonological treatment. However, they do not always differentiate between the two types of intervention when reporting BOLD changes or they additionally report general BOLD changes over the course of both treatments. The results of these kinds of studies are therefore listed under sem (separate results for the semantic treatment), phon (separate results for the phonological treatment), and sem+phon (mixed results after both treatments). rs-fMRI = resting-state fMRI, DWI = diffusion-weighted imaging, PET = positron emission tomography, ON = object naming, SFV = semantic feature verification, VN = verb naming, WJ = word judgment, SL = sentence listening, SR = sentence repetition, RD = rhyme detection, SYR = syllable repetition, SJ = semantic judgment, RJ = rhyme judgment, SPM = sentence-picture matching, SG = sentence generation, RCV = repetition of chanted vowel changes, RSW = repetition of spoken/sung words, NA = not applicable, BOLD = blood oxygen level dependent.
When comparing different functional imaging studies, the contrast used to identify a neural response pattern will naturally determine the voxels associated with a specific condition-dependent effect. Some studies applied lenient contrasts, for example, overt picture naming versus rest (e.g., Cornelissen et al., 2003), while others applied very stringent contrasts, such as overt picture naming versus saying "baba" to digitally distorted nonsense images (e.g., Marcotte & Ansaldo, 2010; Marcotte et al., 2012). This makes comparison of studies difficult, because the use of a lenient contrast will identify brain regions involved in a very wide range of processing (from lower to higher level). In addition, across studies, the statistical threshold applied to these contrasts of interest varied. This threshold was sometimes not reported (Marcotte & Ansaldo, 2010; Vitali et al., 2007), and frequently, it was not corrected for multiple comparisons (Abel et al., 2014, 2015; Haldin et al., 2018; Kiran et al., 2015; Marcotte et al., 2013, 2018; Menke et al., 2009; Thompson, Riley, den Ouden, Meltzer-Asscher, & Lukic, 2013; Vitali et al., 2010; Wan et al., 2014). This again complicates the comparison of response foci across studies.

Correct versus incorrect language behavior
The studies included differ in whether they included incorrect and/or absent language behavior in their analysis or not (see Appendix A, column "extra," in the online supporting information). Some studies included all responses in their analysis (correct, incorrect, and no-response items), while others contrasted trained items with correct items pretreatment (trained items > correctly named items pretreatment), as well as incorrect items pretreatment (trained items > incorrectly named items pretreatment). In the latter case, we only included the results of the contrast with correct naming pretreatment. This is because it is assumed that incorrect (language) behavior has a different neural signature (Meinzer et al., 2013) and activates an error network in the brain, including for example the anterior cingulate cortex (Stevens, Kiehl, Pearlson, & Calhoun, 2009), rather than the processing of interest (Price, Crinion, & Friston, 2006). However, it is important to note that this creates a source of variability between studies, as not all of them disambiguated correct and incorrect trials in their analyses. (For a more elaborate discussion of this matter, see Crosson et al., 2007 and Meinzer et al., 2013.)

Direction of brain change
Some papers found both upregulation and downregulation of neural activity (in the same brain regions) in different subjects who went through the same language intervention. Increased brain activity, in intensity or extent, might reflect the restoration of neural activity in the perilesional language areas, the engagement of homolog language regions, or compensatory strategies that involve brain regions that are not traditionally associated with language (Cabeza, Anderson, Locantore, & McIntosh, 2002). Increases in brain activity could, on the other hand, also point to inefficient use of neural resources or increased effort when performing language tasks (Fridriksson & Morrow, 2005).
In contrast, reduced brain activity accompanied by behavioral improvement could represent increased efficiency in the use of regions (Wierenga et al., 2006), consistent with the effect of practice during skill acquisition (for meta-analysis, see Chein & Schneider, 2005). Decreased activity could alternatively point to persistent malfunctioning, disconnection, or missing input due to the brain damage. Likewise, different recovery mechanisms might also occur simultaneously (Abel et al., 2015). Hence, it is important to relate the BOLD changes to behavioral changes, or to compare them with a healthy control group to interpret them correctly. If there is no relation between brain and behavior, the BOLD changes are hard to interpret, and all that can be generated are hypotheses to be tested in future research. For these reasons, we will not make a difference between upregulation and downregulation of neural activity for the fMRI results in this review, but generally refer to "changes" in the BOLD signal. The difference between increases and decreases in BOLD signal will only be considered below in Association with language outcomes, where we discuss how the brain changes are related to behavior, and the interpretation of the results.

Treatment-related brain changes
A general problem in neuroimaging reviews is the substantial variability in individual brain reorganization patterns, both within and across studies, which makes comparisons between studies very challenging. Throughout the review, we primarily referred to treatment-related brain changes, which encompassed neural plasticity in the language network, in homologous areas of the right hemisphere, or alternatively, the recruitment of supporting neural infrastructure (e.g., due to a strategy change) or brain dynamics related to (changes in) error processing. This choice reflects that, aside from neuroplasticity in the language network in the left and/or right hemisphere, several alternative processes can induce brain changes over treatment. For example, participants could have had different neural recruitment strategies during language processing before the stroke, related to a difference in task strategy. After the stroke, these differences may be further strengthened by the differential impact of the functional and structural lesions on the brain (Thompson, den Ouden, Bonakdarpour, Garibaldi, & Parrish, 2010). Similarly, participants could rely less or more on supporting cognitive processes after versus before the treatment (e.g., attention, executive control, and responsive inhibition; Kurland, Baldwin, & Tauer, 2010). In addition, behavioral improvement can manifest in different ways: as an increase in correct attempts or as a decrease in overall errors. Both means of recovery can lead to different brain response patterns (e.g., in the error network as explained in the previous section; Raichle et al., 1994). For comparison purposes, we have tried to include as much study-specific information as possible in Appendix A in the online supporting information.

Neurosynth
To answer our second research question, whether the most consistent treatment-related changes across therapies occur within the language network or outside, we used Neurosynth association test maps for the different linguistic components as a reference (Figure 1). It should be noted that this analysis combines highly variable studies. The association map represents a z-map for a two-way ANOVA testing for an association between the search term and voxel responses. Because a large number of studies contribute to the meta-analysis, it is assumed to provide a good estimate of the specific response patterns (Yarkoni, Poldrack, Nichols, van Essen, & Wager, 2011).
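To make the idea of an association between a search term and voxel responses concrete, the following sketch computes a z-score for a single voxel from a 2 x 2 table of study counts (studies that mention the term or not, crossed with studies that report activation at the voxel or not). It is a simplified illustration and not Neurosynth's implementation: it uses a plain two-proportion z-test, and all counts except the 1,031 "semantic" studies mentioned in the Figure 1 caption are hypothetical.

```python
import math

def association_z(active_with_term, n_with_term, active_without_term, n_without_term):
    """Two-proportion z-score: is activation at a voxel reported more
    consistently in studies that mention the term than in studies that do not?"""
    p_term = active_with_term / n_with_term
    p_other = active_without_term / n_without_term
    # Pooled proportion under the null hypothesis of no association.
    pooled = (active_with_term + active_without_term) / (n_with_term + n_without_term)
    variance = pooled * (1 - pooled) * (1 / n_with_term + 1 / n_without_term)
    if variance == 0:
        return 0.0  # no variability in activation, so no evidence either way
    return (p_term - p_other) / math.sqrt(variance)

# 1,031 "semantic" studies is taken from the Figure 1 caption;
# the remaining counts are hypothetical.
z_voxel = association_z(300, 1031, 1000, 10000)
```

A positive z indicates that the voxel is reported proportionally more often in studies using the term; repeating the computation for every voxel yields a z-map. For a 2 x 2 table, the square of this z-score equals the uncorrected chi-square statistic for independence.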

RESULTS AND DISCUSSION
The first aim of this review was to explore whether treatment effects are dependent on the focus of the therapy. The second aim was to explore (a) whether the brain regions/networks that most often show treatment-related changes are located within the language network or outside, and (b) whether these changes are associated with individual differences in language (improvement). In Appendix A in the online supporting information, the studies addressing local treatment-related brain changes in specific gray or white matter regions are depicted in italic font. The studies addressing distributed treatment-related brain changes in connectivity patterns are depicted in bold. There was an insufficient number of connectivity studies to investigate the first aim (where they need to be split according to the targeted linguistic component) and therefore these studies are only discussed within the second aim.
Does the Neural Effect Depend on the Focus of the Intervention?
Figure 4 shows, for all the brain regions in the left hemisphere undergoing intervention-related changes, which types of interventions (sem, phon, sem+phon, phon+orth, syntax, and r-m) have been associated with changes in that area. For example, the left superior frontal gyrus (SFG) was reported in one out of three studies targeting syntactic processing. For this region and this type of intervention, the corresponding proportion is 0.33, which is represented by an orange bar. In addition, the left SFG was reported in four out of 11 studies targeting phonological processing. Therefore, the red bar representing phonological processing has a height of 0.36. This proportion is calculated for every type of intervention, and consequently, for each region, the bar represents a stacked proportion, which can be greater than one. The higher a specific color in the stacked bar, the more the brain changes in that region were specific to the language component. The higher the stacked bar, the higher the number of studies that led to changes in that brain region. In Figure 5 the data for the right hemisphere are represented in a similar way. We also made this figure after excluding studies that did not use a naming task in the scanner (mostly rhythmic-melodic and syntactic processing). As the results are very similar for the other language domains, we will not further discuss this.
Figure 4. Proportion of studies reporting treatment-related brain changes in a specific brain region of the left hemisphere, relative to the total number of studies providing this type of intervention. The number of studies (#) and patients with aphasia (n) per type of intervention is reported in the figure legend. SFG/MFG/IFG = superior/middle/inferior frontal gyrus, IFS = inferior frontal sulcus, oper = opercular, tri = triangular, orb = orbital, med = medial, SMA = supplementary motor area, IPG/SPG = inferior/superior parietal gyrus, SPL = superior parietal lobule, SMG = supramarginal gyrus, AG = angular gyrus, STG/MTG/ITG = superior/middle/inferior temporal gyrus, SOG/MOG/IOG = superior/middle/inferior occipital gyrus, HC = hippocampus, PHCG = parahippocampal gyrus. Anatomical labels other than those included in the AAL-VOI atlas used by the included studies that did not report standard brain coordinates are IFS and SPL.
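The stacked proportions described for Figures 4 and 5 can be illustrated with a minimal Python sketch. It is not the procedure used to produce the figures; the function name `stacked_proportions` and the input layout are assumptions made for illustration. Only the left SFG counts (one of three syntax studies, four of 11 phonology studies) are taken from the text.

```python
from collections import defaultdict


def stacked_proportions(reports):
    """reports maps an intervention type to a list with one entry per study;
    each entry lists the regions in which that study reported changes.
    Returns (per-region proportions by intervention, stacked bar heights)."""
    proportions = defaultdict(dict)
    for intervention, studies in reports.items():
        counts = defaultdict(int)
        for regions in studies:
            for region in set(regions):  # a study counts once per region
                counts[region] += 1
        for region, count in counts.items():
            # proportion relative to all studies of this intervention type
            proportions[region][intervention] = count / len(studies)
    # stacked height = sum of the per-intervention proportions (can exceed 1)
    heights = {region: sum(parts.values()) for region, parts in proportions.items()}
    return dict(proportions), heights


# Toy input. Only the left SFG counts come from the text; the empty lists
# are placeholders for studies that did not report this region.
reports = {
    "syntax": [["left SFG"], [], []],
    "phon": [["left SFG"]] * 4 + [[]] * 7,
}
proportions, heights = stacked_proportions(reports)
for intervention, value in proportions["left SFG"].items():
    print(f"left SFG, {intervention}: {value:.2f}")
print(f"stacked height: {heights['left SFG']:.2f}")
```

With this input the per-intervention values round to 0.33 and 0.36, matching the worked example in the text. Because each intervention is normalized by its own number of studies, the stacked height is a sum of proportions, not itself a proportion.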

Neural differences within linguistic interventions
Our first hypothesis was that brain changes related to interventions focused on phonological processing are more frequent in regions associated with phonological processing, particularly the left posterior inferior frontal lobe, the dorsal premotor regions, and an area in the parietotemporal junction. Based on Figures 4 and 5 one can see that, although widespread, most brain changes after phonological interventions (red) occurred in the bilateral SFG, middle frontal gyrus (MFG), precuneus, cingulum, and cerebellum, the left supramarginal gyrus (SMG), superior parietal gyrus (SPG), MTG, precentral gyrus, and the right insula, calcarine gyrus, and basal ganglia. Except for the precentral gyrus and the left SMG, these regions are not typically associated with phonological processing. However, the SFG, the cerebellum, the precentral gyrus, and the insula are implicated in motor speech (Ackermann & Riecker, 2010; Stegemöller, 2017; Tourville & Guenther, 2011). PWA, especially the nonfluent subtype, frequently struggle with motor speech planning (Ogar, Slama, Dronkers, Amici, & Gorno-Tempini, 2005) and articulation, which complicates the differentiation of articulatory and phonological errors. Therefore, it could be possible that the phonological language interventions, all targeting speech production, indirectly affected speech-motor processes as well.
Figure 5. Treatment-related brain changes in the right hemisphere. The number of studies (#) and PWA (n) per type of intervention is reported in the legend. Anatomical labels other than those included in the AAL-VOI atlas used by the included studies that did not report standard brain coordinates are PMA and SPL. SFG/MFG/IFG = superior/middle/inferior frontal gyrus, oper = opercular, tri = triangular, orb = orbital, med = medial, PMA = presupplementary motor area, SMA = supplementary motor area, IPG/SPG = inferior/superior parietal gyrus, SPL = superior parietal lobule, SMG = supramarginal gyrus, AG = angular gyrus, STG/MTG/ITG = superior/middle/inferior temporal gyrus, SOG/MOG/IOG = superior/middle/inferior occipital gyrus, HC = hippocampus, PHCG = parahippocampal gyrus.
Our second hypothesis was that brain changes related to interventions focused on semantic processing are more frequent in regions associated with semantic processing, particularly the bilateral posterior MTG and ITG, and the left anterior temporal lobe. As with phonological interventions, therapies focusing on semantic processing (blue) were associated with changes in the bilateral (although left > right) frontal, temporal, and parietal lobes. However, there is also evidence that the temporal lobe was more influenced by semantic interventions, especially in the left hemisphere. More than 40% of semantic studies found brain changes in the left temporal lobe, compared with only 10-25% of phonological studies. Interventions combining semantic and phonological processing (purple) led to very mixed results bilaterally (left > right) in the frontal, temporal and parietal lobes, as well as in more medial/subcortical structures and the cerebellum.
The third hypothesis was that brain changes related to interventions involving orthographic processing (green) are more frequent in orthographic language networks. Those are situated more posteriorly and ventrally than the previous networks and include the posterior temporal lobe, the fusiform gyrus, the lingual gyrus, the calcarine gyrus, and the cuneus (also see Table 1 in the online supporting information). Again, the results are widespread (right > left), but there was relatively more involvement of the left inferior occipital gyrus (IOG) and the right fusiform gyrus compared with the other interventions. This might be related to the early visual processing of sublexical forms and reading processes that occur in these posterior ventral regions. However, one should keep in mind that these results are based on only two studies.
The fourth hypothesis (mainly based on Vigneau et al., 2006) was that syntactic processing (orange) engages inferior frontal regions as well as superior temporal regions, anteriorly as well as posteriorly. Based on Figures 4 and 5 this is clearly the case, although changes were not restricted to these areas. For example, there are also BOLD changes in the bilateral superior parietal lobule and in visual regions of the right hemisphere.
Although we expected that the linguistic interventions would mainly lead to brain changes in the left hemisphere, there is bilateral neural involvement. Due to within-study variation in lesion size and location in different PWA, perilesional activity might have been masked (Crosson et al., 2007). Fridriksson, Richardson, et al. (2012) created patient-specific ROIs of the perilesional cortex and residual naming areas in each lobe. They found that the best predictor of naming improvement was an increase in subject-specific perilesional activity in the frontal areas involved in naming, as well as frontal regions not recruited for naming by the control group. Thus, group studies most probably underestimate the involvement of the left hemisphere in the treatment-related recovery of aphasia. In addition, note that some studies did not show behavioral improvement in (some) participants (see columns "behavioral outcome" and "results" in Appendix A in the online supporting information) and/or included incorrect responses in their analysis (see Appendix A, column "extra"). Therefore, it is possible that some of the results (i.e., the involvement of the bilateral anterior cingulate region, insulae, right parietal lobe, medial temporal lobe, basal ganglia, and thalamus) are related to error processing (Stevens et al., 2009). However, most therapies did have a positive behavioral outcome. Another important limitation is that the amount of treatment-related brain changes in each brain region was not considered, because these data were not available for every study. This may have masked neural differences in the effect of different types of treatment, since the amount of brain activity (rather than the location) in the affected networks could be specific to the treatment. On the other hand, it could also be possible that treatment-related effects are not specific to the type of therapy administered.
In conclusion, treatment-related brain changes do not seem to be very treatment-specific. However, error processing effects and the fact that we were not able to quantify the degree to which brain regions were affected by each treatment, may have masked treatment specificity.

Neural differences between linguistic and rhythmic-melodic interventions
Here, we compared interventions targeting the left hemisphere with those targeting the right hemisphere. It is widely accepted that typical linguistic processing is more supported by the left hemisphere than the right hemisphere (e.g., Vigneau et al., 2006), whereas the right hemisphere is more involved in musical, prosodic, and metalinguistic processing (Leon, Rodriguez, & Rosenbek, 2015). More specifically, phonological, semantic, orthographic, and syntactic processing rely more on the left hemisphere. However, there is some involvement of the right temporal lobe in the processing of context, for example during sentence and discourse comprehension, which involves both syntactic and semantic processing (Vigneau et al., 2011). In contrast, there is evidence that linguistic prosody, encoded in the intonational contour of a sentence, relies on frontotemporal areas in both hemispheres. The less segmental information, the higher the involvement of the right hemisphere relative to the left one. This functional distinction between the left and right hemisphere begins during acoustic processing in the primary auditory cortex (A1; Friederici, 2011; Witteman, van Ijzendoorn, van de Velde, van Heuven, & Schiller, 2011). According to the asymmetric sampling in time hypothesis (Poeppel, 2003), left A1 is specialized in processing rapidly changing information with a time resolution of 20-40 ms (e.g., speech sounds), while right A1 prefers the processing of slowly changing information (150-250 ms), such as tonal pitch changes. Based on these findings, we expected that the right hemisphere would be more affected by rhythmic-melodic language interventions, such as melodic intonation therapy (MIT) and SIPARI, than by the other interventions. Similar to MIT, SIPARI combines singing, intonation, prosody, breathing (Atmung in German), rhythm, and improvisation (Jungblut, Huber, Mais, & Schnitker, 2014), and therefore places high demands on suprasegmental aspects of language. 
The initial focus is on vocal training of melodic speech segments assumed to be supported by the right hemisphere. Subsequently, the focus shifts to rhythmic chunking of these speech segments with different complexity levels to stimulate the left hemisphere.
We hypothesized that brain changes related to interventions focused on rhythmic-melodic processing (gray) are more frequent in the frontal and temporal lobes of the right hemisphere. When we compare effects in the left versus the right hemisphere, Figures 4 and 5 show that three out of four r-m studies reported brain changes in the right IFG (in both the triangular and opercular part), while only one out of four studies reported changes in the same structures in the left IFG. Moreover, all r-m studies reported changes in the right STG, while only one study showed changes in its left-hemispheric counterpart. Only in Jungblut et al. (2014), who provided SIPARI treatment, was there any evidence of left hemisphere involvement. This can be explained by the fact that SIPARI is theoretically structured in such a way that, in addition to the right hemisphere, the left hemisphere is increasingly stimulated over time by shifting the focus from singing to rhythmic chunking of speech. When linguistic and rhythmic-melodic interventions are compared, the proportion of studies reporting changes in the right STG and the opercular part of the IFG is higher (by at least 33%) for rhythmic-melodic interventions than for linguistic interventions. All r-m studies applied a whole-brain contrast analysis or a data-driven ROI-analysis, which precludes bias due to the methodological approach.

Conclusions aim one
From the above discussion in Neural differences within linguistic interventions, it seems that most brain regions with treatment-related changes were not specific to a particular type of language intervention (because most regions have bars in multiple colors and not one). On the other hand, there are some indications that some regions were more likely to show brain changes when training a specific aspect of language. For example, in the temporal lobe, changes related to a semantic intervention occurred more consistently than changes related to a phonological intervention. Moreover, the studies integrating phonological and orthographic processing led to more changes in the ventral posterior network compared with the other interventions. The interventions focusing on semantic, phonological, orthographic, and syntactic processing seemed to elicit brain changes in both hemispheres. This right-hemispheric involvement in treatments classically targeting the left hemisphere could point to compensatory mechanisms after left-hemispheric brain damage. (For a review see Cocquyt, De Ley, Santens, Van Borsel, & De Letter, 2017.) This right-hemispheric compensation typically takes place in brain regions homologous to language regions in the left hemisphere or in regions involved in more general cognitive functions (e.g., executive functioning; see the next section). From the above discussion in Neural differences between linguistic and rhythmic-melodic interventions, it seems that the language interventions focusing on rhythmic-melodic processing included in this review elicited more changes in the right hemisphere compared with the left hemisphere. This right dominance was not found for the linguistic interventions. In general, across treatments and subjects, the regions that are involved in language recovery are very diverse. It is hard to find similar patterns of brain changes between intervention studies targeting the same linguistic component.

Consistency of Location of Treatment-Related Brain Changes
The second aim of this literature review was to describe whether the most consistent treatment-related changes occur within the language network (i.e., Neurosynth response maps visualized in Figure 1) or outside. We summarized which brain regions showed consistent treatment-related changes across the included studies and investigated whether these ROIs are located within the linguistic maps visualized in Figure 1 (aim 2a). We then explored whether and how these consistent brain changes are related to (a change in) language behavior (aim 2b).

Which brain regions show treatment-related brain changes?
Table 3 summarizes which brain regions show treatment-related brain changes across the studies included in this review and how frequently each region is identified. Table 1 in the online supporting information indicates which type of linguistic functions (semantics, phonology, orthography, syntax, and/or rhythmic-melodic processing) have been associated with each of these areas. As shown in Table 3, the brain regions that were most frequently reported as showing treatment-related brain changes in PWA, across all kinds of language interventions, are the bilateral SFG, MFG, IFG, precentral gyri, superior STG, MTG, SPG, SMG, precuneus, basal ganglia, cingulum and cerebellum, the left ITG and inferior parietal gyrus, and the right insula. At least five of the 25 studies showed treatment-related brain changes in these ROIs. This choice reflects effects that are present in one out of five of all treatment-related intervention studies in PWA, which is an effect size reported to be much more common than higher effect sizes (Eickhoff et al., 2016). An important remark is that six of the 25 studies considered here used an ROI-approach in their analysis, which might lead to an overrepresentation of "classic" language areas. However, in five of them, the ROIs were chosen based on the results of a preceding whole-brain analysis, which minimizes the possible bias of the analysis choice.
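The consistency criterion used here (a region counts as consistent when at least five of the 25 studies report it) can be sketched as a simple tally. This is an illustrative sketch only: the function name `consistent_rois` and the toy study list are hypothetical and do not reproduce Table 3.

```python
from collections import Counter


def consistent_rois(studies, min_studies=5):
    """Count in how many studies each ROI is reported and keep the ROIs
    that reach the threshold (five studies, as in the text)."""
    counts = Counter(roi for rois in studies for roi in set(rois))
    return {roi: n for roi, n in counts.items() if n >= min_studies}


# Hypothetical toy data: 25 studies, each given as a list of reported ROI labels.
studies = (
    [["left IFG", "right insula"]] * 5
    + [["left IFG"]] * 3
    + [["left fusiform gyrus"]] * 4
    + [[]] * 13
)
result = consistent_rois(studies)
for roi, n in sorted(result.items()):
    print(f"{roi}: reported in {n} of {len(studies)} studies")
```

In this toy example the left fusiform gyrus is reported in four studies and therefore falls just below the threshold, which shows how sensitive a label-based tally is to the chosen cutoff.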
Consistent treatment-related changes (five of the 25 studies, in color) overlap with the language network (in black) mainly in the left IFG (the orbital, triangular, and opercular part), the left (ventral) precentral gyrus, the left SMG, IPG and ITG, and in the bilateral STG and MTG. Although the left insula, ITG, fusiform gyrus, and angular gyrus also belong to the traditional language network depicted in black, treatment-related changes have been less consistently found in these regions. Although the language network (in black) is more situated in the left hemisphere than the right hemisphere, treatment-related brain changes (in color) are not restricted to the left hemisphere. Gold and Kertesz (2000) stated that contributions of the right hemisphere are task-dependent and are larger in lexico-semantic processing than phonological processing. This arises because, in the healthy brain, lexico-semantic processing is less left-lateralized (i.e., more bilateral) than phonological processing. However, in Figure 5 it can be observed that all treatments evoked changes in right-hemispheric ROIs, not only interventions focusing on semantic processing. Structures in which treatment-related brain changes occurred consistently over different types of treatment, and which do not overlap well with the linguistic network (in black), were the lateral, orbital as well as medial part of the bilateral superior and middle frontal gyri, the left hippocampus, paracentral lobule and IOG, the bilateral precuneus, cingulate, cerebellum, SPG, and the right basal ganglia. However, for the right cerebellum and the left precuneus, there were some overlapping "dots." Figure 6 might overrepresent the frontal and cerebellar areas, because we were limited to visualizing ROI labels instead of specific MNI coordinates. Not all studies reported these coordinates, and these ROIs tend to encompass large areas of the brain.
Concerning connectivity, the included functional and effective connectivity studies (not represented in Table 3 or Figure 6) found modulations of the connectome in very similar regions. van Hees et al. (2014b) and Sandberg et al. (2015) demonstrated posttreatment modulations of resting-state and task-related functional connectivity strength, respectively. Upregulation of connectivity strength was found in the language network of PWA in and between both hemispheres, as well as in the bilateral SFG, left MFG, precuneus, and precentral gyrus (Sandberg et al., 2015). On the other hand, there was a downregulation of connectivity between bilateral language regions and the right basal ganglia, cingulum, and cerebellum (van Hees et al., 2014b). Kiran et al. (2015) and Vitali et al. (2010) showed treatment-related changes in task-related effective connectivity patterns. In agreement with the functional connectivity results, effective connectivity modulations existed throughout the language network in and between both hemispheres, as well as between the IFG and the MFG in both hemispheres. Furthermore, Marcotte et al. (2013) demonstrated that language interventions in PWA were able to normalize the amount of functional integration within the posterior default mode network (the bilateral MTG, the AG, the left ITG, the right middle cingulate, and the right cerebellum). This network is more active during conscious resting states of the brain than during the performance of a cognitive task (Cavanna & Trimble, 2006).
Figure 6. Comparison of Neurosynth language networks (black regions in the lower panel) with regions of interest (ROIs) most frequently associated with treatment-related brain changes (shown on the color map in the upper panel). In the upper panel, the color represents the number of selected studies in which the ROI was reported. In the lower panel, the black regions correspond to association test maps of semantic, phonological, syntactic, and orthographic processing, according to the Neurosynth meta-analysis (http://neurosynth.org/). The figures were composed using Paraview software (version 5.4.1; https://www.paraview.org/) following the guidelines specified in Madan (2015).

Association with language outcomes
We explored whether the measured brain changes in the regions visualized in Figure 6 show (positive) associations with language outcomes. As explained above in Direction of brain change, relating the brain changes to behavioral change is necessary to interpret the meaning of the activity patterns. Therefore, in this section, we will consider the difference between an increase and a decrease in the measured neural response over the course of the treatment.
Among the consistently identified ROIs (colored regions in Figure 6) overlapping with the language network, BOLD increases in the left IFG oper (Fridriksson, 2010), BOLD decreases in the left STG, MTG, and SMG (Abel et al., 2014), and BOLD increases as well as decreases in the IPG (Abel et al., 2014; Fridriksson, 2010; Raboyeau et al., 2008) have been related to language improvement across studies. These studies highlight the importance of the left hemisphere for aphasia recovery, although it is not clear why Abel et al. (2014) found BOLD decreases in these regions, in contrast to the other studies. However, they found a negative association between activity decrease and therapy gains, implicating the importance of continued reliance on the left hemisphere during the treatment. For the right hemisphere, there are two main hypotheses concerning its involvement in the post-stroke neural response pattern. In the first hypothesis, left-hemisphere damage is thought to induce pathological transcallosal disinhibition of the right-hemisphere homologs. As such, proponents of this hypothesis view right-hemispheric activity as detrimental (e.g., Heiss & Thiel, 2006; Price & Crinion, 2005; Saur et al., 2006). Support for this hypothesis comes from the positive association between language improvement and BOLD decreases in the right insula and IFG oper (Nardo et al., 2017), the right precentral gyrus and precuneus (Raboyeau et al., 2008), or a decrease in fractional anisotropy of the IFG oper (Wan et al., 2014). However, two other studies in this review found positive associations between language improvement and brain responses in different right hemisphere regions (MFG, SMA, fusiform gyrus, hippocampus, SPG, putamen and anterior cingulate). This suggests that the right hemisphere could also support language recovery in the chronic phase after stroke (Menke et al., 2009; Raboyeau et al., 2008).
Positive correlations between language improvement (picture naming or picture description) and response (changes) in regions that are not traditionally associated with language suggest that these structures may play a role in recovery from aphasia, particularly naming. More specifically, there were positive associations between naming improvement and BOLD signal in the bilateral SFG, BOLD increases in the bilateral MFG (Fridriksson, 2010; Raboyeau et al., 2008), BOLD decreases in the bilateral precuneus (Raboyeau et al., 2008), BOLD increases in the left precuneus (Fridriksson, 2010), posttreatment BOLD signal in the right precuneus (van Hees et al., 2014), BOLD decreases in the left paracentral lobule (Abel et al., 2014), BOLD increases in the left IOG, left cerebellum, bilateral hippocampus, and the right SPG (Menke et al., 2009), and BOLD increases in the right putamen and the right anterior cingulate (Raboyeau et al., 2008). In summary, across different studies, changes in left-hemispheric language regions, right-hemispheric homologs, as well as bilateral regions not traditionally associated with language have been associated with language improvement over treatment. It is very likely that part of this variability is caused by variations in lesion size and site, which determine whether there is still recovery potential in the left hemisphere, as well as premorbid lateralization patterns (Warburton, Price, Swinburn, & Wise, 1999) and time post-stroke (Saur et al., 2006).
Among the connectivity studies, Vitali et al. (2010) showed associations of modulations in task-related effective connectivity patterns throughout the language network in and between both hemispheres with correct picture naming of trained items. In contrast, the correlation between the amount of functional integration within the posterior default mode network and naming improvement was not significant in the study by Marcotte et al. (2013). Overall, these results suggest that modulation and normalization of functional and effective connections within and outside the language network are concurrent, but not necessarily correlated, with aphasia recovery. In addition to functional connectivity, recent research has suggested that intact structural connectivity is beneficial for successful aphasia recovery (Bonilha, Gleichgerrcht, Nesland, Rorden, & Fridriksson, 2016; Bonilha et al., 2017; Griffis, Nenert, Allendorfer, & Szaflarski, 2017; Yourganov, Fridriksson, Rorden, Gleichgerrcht, & Bonilha, 2016). Several studies have demonstrated the importance of the AF for language recovery after stroke, especially for improvement in speech production (Breier, Juranek, & Papanicolaou, 2011; Hosomi, Nagakane, & Yamada, 2009; Jang, 2013; Jang & Lee, 2014). This finding is consistent with the assumed function of this white-matter pathway in mapping acoustic representations of sounds to their motor representations (Saur et al., 2008). In the study by van Hees et al. (2014a) included in this review, the pre- and posttreatment mean generalized fractional anisotropy of the left AF, an indirect measure of fiber bundle characteristics, correlated with maintenance of (phonological) treatment gains, and the fractional anisotropy value increased over treatment. This observation complements that of Marchina et al. (2011) and Wang, Marchina, Norton, Wan, and Schlaug (2013), who showed that lesion load of the left AF significantly predicted the level of impairment in language production in chronic stroke patients, explaining more variance in language behavior than the functional gray matter lesion load. It also has been shown that better language performance after irreversible damage in the left hemisphere is associated with increased structural connectivity in the right AF (Forkel et al., 2014). Fridriksson et al. (2006) demonstrated significant naming recovery in a subject in whom some white matter connections in the inferior frontal lobe were spared, while a patient with more extensive white matter damage in this area did not recover. Because the AF is connected to the posterior part of the ventrolateral frontal lobe (Catani, Jones, & Ffytche, 2005), these white matter connections might have been part of it.

Conclusions aim two
In summary, across studies, many brain regions have been associated with treatment-related language recovery, although in a very inconsistent way, which makes it difficult to draw any firm conclusions. Although the left IFG, STG, MTG, and inferior parietal regions are typically considered to be involved in linguistic processing (Geschwind, 1970; Hickok & Poeppel, 2007), their right counterparts and the bilateral SFG, MFG, precentral gyri, SPG, precuneus, cerebellum, cingulum, right insula, and basal ganglia are not. The latter regions are mainly known for their nonlinguistic functions. Thus, it is possible that brain regions more medial and/or subcortical to what is generally studied in the context of language processing are additionally involved in the process of aphasia recovery. In addition, a variety of other cognitive functions have been attributed to the brain regions that are typically considered to be involved in language processing (Geschwind, 1970; Hickok & Poeppel, 2007), such as feedback mechanisms (motor and auditory); the planning, coordination, timing, execution, and control of (speech) movements; amodal semantic processing; learning; attention; and other higher-order executive functions.
Language improvement is evident not only in changes in distinct gray matter regions, but also in their functional connectivity patterns and in the white matter pathways enabling direct communication between these regions. More research on the role of brain connectivity in aphasia recovery is necessary to fully understand functional deficits beyond the lesion and to evaluate reorganization potential after stroke at the neural network level. Methodological shortcomings and variability between studies (e.g., in lesion size), as well as the large inconsistency in results across the different studies make it hard to present clear conclusions on treatment-related brain changes in PWA. We cannot put forward that language regions show consistent treatment-related changes, nor can we conclude that other regions involved in cognitive processing show consistent changes. There is not, therefore, more evidence for changes in the language network than for changes outside the language network. In the next section, we will discuss these issues in more detail and formulate suggestions for future research.

Limitations and Future Directions
Over the past two decades, an increasing number of studies have investigated treatment-related brain changes in PWA after stroke. In this review, we integrated the results in a descriptive way to provide the current state-of-the-art on this topic. We are well aware that even in this descriptive comparison one should be cautious when interpreting the results because of variability between as well as within studies. In this section, we will summarize the limitations that we came across throughout the review and formulate recommendations to enable quantitative and more reliable comparisons in the future. Table 4 provides a short overview of the limitations and recommendations, which will be elaborated on throughout the text.

Methodological limitations
Although the main aim of this research was to gain insight into the mechanisms underlying aphasia recovery, there are considerable differences in the operationalization of the experiments, the neuroimaging pipelines, and the statistical analyses among the included studies. In summary, there are differences in the imaging modality (PET, MEG, fMRI, DWI), research design (group study vs. [multiple] single-subject study), in-scanner language task (targeted linguistic component, production vs. comprehension, overt vs. covert, word vs. sentence level), assessed contrast (lenient vs. stringent, pre-post vs. trained-untrained), neuroimaging analysis approach (whole-brain vs. ROI-approach), modeling of the trials in the statistical model (including vs. excluding incorrect trials, collapsing data from multiple time points or not), and the statistical analysis itself (e.g., t-test, F-test, [multiple] regression analysis, partial least squares analysis, correlation analysis). The systematic integration of the diverging findings of these studies might overcome this variability to some extent. However, failing to report peak locations of treatment-related effects in stereotaxic reference space makes coordinate-based meta-analysis at present impossible. This is an important limitation and point of concern for future research, because meta-analyses could compensate for the highly variable results from the small-data studies, which are still very common in this field of research (Eickhoff et al., 2016).
Several methodological limitations are related to the brain lesion. To take into account the role of perilesional activity in language recovery over time, it is important to consider individual data points in the analysis (e.g., regression). In group analysis of PWA with diverse lesion patterns, there is a lack of power to detect changes in perilesional areas, even when the group is highly homogeneous (Crosson et al., 2007; Meinzer et al., 2013). This variability in lesion size and location also leads to a lower statistical power in the lesioned left hemisphere as compared with the structurally intact right hemisphere. In particular, the effect of lesion site and size on treatment-related recovery should be evaluated, because it is known to be one of the most predictive factors in aphasia prognosis (e.g., Plowman, Hentz, & Ellis, 2012). However, only a few studies included lesion information as a confounding variable in their correlation or regression analysis (parametrized as lesion volume or IFG lesion load; Brownsett et al., 2014; van Hees et al., 2014a; Wan et al., 2014). As shown by Wang et al. (2013), it might be interesting for future studies to additionally include a measure of AF-lesion load, because this explained more variance in language behavior than functional gray matter lesion load. Only two studies performed a separate analysis to specifically assess the effect of the lesion on recovery patterns (i.e., voxel-based lesion recovery analysis [Fridriksson, 2010] and joint posttreatment independent component analysis [Abel et al., 2015]). Crinion, Holland, Copland, Thompson, and Hillis (2013) provided guidelines for the quantification of brain lesions after stroke, since in this area, there is also substantial variability in methodology.

Table 4 (excerpt). Limitations and recommendations.
Topic | Limitation | Recommendation
Sample size | More than 70% of the studies have a sample size smaller than 10 | Perform studies of aphasia recovery in larger samples of patients
Treatment-related effects outside of the language network | Overlap between the language network and several non-language processes | Characterize the function, importance, and language specificity of regions that have been consistently identified in aphasia recovery (whole-brain analyses)
Connectivity studies | Focus on regional changes in functional response patterns | Explore treatment-related brain changes on the connectome level, structurally as well as functionally
Individual variability as meaningful information | Results are collapsed across the subject dimension | Consider data at the individual subject level
An important limitation and point of concern for future research is the potential unreliability of the hemodynamic response in stroke populations with cerebrovascular damage. This response is dependent on cerebral blood flow, cerebral blood volume, and oxygen consumption. There is evidence that the neurovascular coupling response, which underlies fMRI and is typically modeled by the hemodynamic response function (HRF), is reduced and delayed in stroke patients (Bonakdarpour, Parrish, & Thompson, 2007; Crinion & Leff, 2007; Lake, Bazzigaluppi, & Stefanovic, 2016; Thompson et al., 2010) and even in healthy aging (Nair, Raut, & Prabhakaran, 2017). Siegel, Snyder, Ramsey, Shulman, and Corbetta (2016) demonstrated that over one-third of stroke patients show hemodynamic lags two weeks post-stroke, dropping to 15% three months post-stroke and 10% one year post-stroke. Importantly, lag severity was correlated with lesion size and with the severity of deficits in multiple domains. Some studies included in the review tried to account for this by estimating patient-specific HRFs using long-trial fMRI, to enhance the detection of BOLD changes in brain regions with delayed HRFs (e.g., Thompson et al., 2010). Moreover, by conducting additional perfusion imaging, hypoperfused tissue could be identified. Indeed, Thompson and colleagues showed associations between decreased blood flow and an increased time-to-peak value of the HRF, and between perfusion levels and treatment-related BOLD changes. Importantly, this hypoperfusion was not limited to the perilesional area but extended to remote brain regions, even in the contralateral hemisphere, although perfusion values were generally higher there than in the affected hemisphere. Therefore, studies that limit their analysis to a canonical model of the HRF, based on healthy brain responses, might underestimate or completely fail to detect functional activity in affected regions (Bonakdarpour, Beeson, Demarco, & Rapcsak, 2015).
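To make the consequence of a delayed HRF concrete, the following minimal sketch compares a canonical prediction with the response that would arise if the same HRF were simply shifted in time. It is an illustration under stated assumptions, not the analysis used by any of the cited studies: the double-gamma shape is the common textbook form, the event onsets and the lags of 2 and 4 s are hypothetical, and a pure time shift is only one simple way to model a delayed response.

```python
import math

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return math.exp((shape - 1) * math.log(t) - t - math.lgamma(shape))

def double_gamma_hrf(t):
    """Textbook double-gamma HRF shape (peak near 5 s, undershoot near 15 s)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

def bold_prediction(onsets, lag, dt=0.5, duration=200.0):
    """Predicted signal for brief events when the HRF is delayed by `lag` seconds."""
    n_samples = int(duration / dt)
    return [
        sum(double_gamma_hrf(i * dt - onset - lag) for onset in onsets)
        for i in range(n_samples)
    ]

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical event onsets in seconds (one brief event every 20 s).
onsets = [10.0 + 20.0 * k for k in range(8)]
canonical = bold_prediction(onsets, lag=0.0)

# How well does the canonical regressor match a response delayed by 0, 2, or 4 s?
fit_by_lag = {
    lag: pearson(canonical, bold_prediction(onsets, lag))
    for lag in (0.0, 2.0, 4.0)
}
for lag, r in fit_by_lag.items():
    print(f"lag {lag:.0f} s: correlation with canonical regressor = {r:.2f}")
```

The correlation between the canonical regressor and the delayed response drops as the lag grows, which is the mechanism by which a canonical-only model can underestimate activity in tissue with a delayed hemodynamic response.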
Subject-level analyses allow for more careful consideration of not only the lesion but also differences between PWA in age, gender, education level, time post-stroke, previously received interventions, aphasic symptoms, task strategy, and response to treatment (see also Individual variability as meaningful information below). All these sources of variability add to the challenge of finding consistent results in aphasia recovery studies. In an ideal situation, insight into pre-stroke brain structure and function would greatly improve our understanding of neuroplasticity after stroke. This can only be achieved by conducting large-scale longitudinal studies of people at high risk for stroke, considering factors such as family history, arterial hypertension, hyperlipidemia, diabetes, nicotine, and/or alcohol (ab)use. Another option is to assess undamaged structures immediately after stroke, to be ahead of brain reorganization as much as possible.

Interpretation of results
The location of the treatment-related effects in each study is highly dependent on the large number of (subjective) choices on the specific implementation, as listed above. A crucial issue that is not typically addressed in intervention studies of PWA is the distinction between treatment-related brain changes and overall scan-rescan variability, especially the neural effect of task learning (Chein & Schneider, 2005). If PWA habituate to the imaging task from pre- to postintervention, neural activity in task-related regions will most probably decrease, independent of the effect of the treatment. Three possible options to disentangle the effects of the treatment from practice-related reductions in brain activity (Rapp, Caplan, Edwards, Visch-Brink, & Thompson, 2013) are (a) to compare the fMRI task of interest with a control task, (b) to compare treated with untreated items, and (c) to estimate the test-retest variability by performing multiple scan sessions pre- and/or posttreatment (Cornelissen et al., 2003; Fridriksson et al., 2007; Fridriksson, 2010; Fridriksson et al., 2006; Fridriksson, Richardson, et al., 2012; Sandberg et al., 2015; Schlaug et al., 2009). However, 11 studies did not perform multiple scan sessions, or mention or use an appropriate control task (such as null events, looking at a fixation cross, or rest; Abel et al., 2014, 2015; Jungblut et al., 2014; Marcotte et al., 2018; Menke et al., 2009; Raboyeau et al., 2008; Tabei et al., 2016; Thompson et al., 2010, 2013; Vitali et al., 2007, 2010), and only one of those 11 contrasted trained items with untrained items to compensate for this (Vitali et al., 2007).
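As an illustration of option (b), the sketch below computes an item-set by session interaction: the pre-to-post change for treated items minus the same change for untreated items. Practice-related changes should affect both item sets, so the difference between the two changes is a rough estimate of the treatment-specific part. All values are hypothetical activation estimates invented for this example (arbitrary units, one region, one patient); they do not come from any of the reviewed studies.

```python
from statistics import mean

# Hypothetical activation estimates (arbitrary units), three runs per cell.
activation = {
    ("treated", "pre"): [1.9, 2.1, 2.0],
    ("treated", "post"): [2.6, 2.4, 2.5],
    ("untreated", "pre"): [2.0, 1.8, 1.9],
    ("untreated", "post"): [1.6, 1.7, 1.5],
}

def change(item_set):
    """Mean post-treatment activation minus mean pre-treatment activation."""
    return mean(activation[(item_set, "post")]) - mean(activation[(item_set, "pre")])

naive_change = change("treated")       # mixes treatment and practice effects
practice_change = change("untreated")  # practice/habituation only (assumed)
treatment_specific = naive_change - practice_change

print(f"treated items, post - pre:   {naive_change:+.2f}")
print(f"untreated items, post - pre: {practice_change:+.2f}")
print(f"interaction (treatment-specific estimate): {treatment_specific:+.2f}")
```

In this toy case the untreated items show a decrease across sessions, so the naive pre-post difference for treated items would understate the treatment-specific change. The sketch assumes that practice effects are equal for both item sets, which need not hold in real data.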
The interpretation of the effects is further complicated by the variability in the direction of the change in BOLD response in both hemispheres across PWA and studies, as well as the diverging possible causes of treatment-related neural plasticity. In summary, some of the included studies reported correlations between increased activity in (homologous) right-hemispheric regions and language improvement (Menke et al., 2009; Raboyeau et al., 2008), which is frequently interpreted as compensation. Others reported associations between activity decreases in bilateral brain regions and better language performance, which is attributed to increased task processing efficiency (Nardo et al., 2017; Raboyeau et al., 2008). In contrast, other studies found associations between therapy-induced language improvement and increased (or less decreased) activity in bilateral brain regions, including perilesional and spared left-hemispheric areas. This might be related to increased task demands, successful reorganization patterns, or maladaptive plasticity (Abel et al., 2014; Fridriksson, 2010; Menke et al., 2009; Raboyeau et al., 2008). In the future, careful and individual consideration of lesion site and size, treatment strategies, response to treatment, and behavioral symptoms is needed to understand why and how the changes in the BOLD signal are related to behavioral improvement (Hartwigsen & Saur, 2019). Moreover, comparisons with intervention studies in PWA in the subacute phase after stroke are needed to take into account and to understand the dynamic nature of brain plasticity across the time course of recovery (Saur et al., 2006; Saur & Hartwigsen, 2012). At present, intervention studies have almost exclusively taken place in the chronic phase after stroke, although it is generally assumed that most language recovery takes place in the first days to weeks after stroke, and the mechanisms involved are most probably different.
Another issue that impacts the interpretation of the results concerns the inclusion criteria for the various interventions provided in the different studies. If specific behavioral symptoms were prerequisites to receive a specific treatment, the inclusion criteria of the different studies induced a systematic selection bias that ultimately also would have influenced the (generalizability of the) results. To explore this, we checked the patient-specific inclusion criteria of all included studies and found that, overall, the studies providing anomia treatments (sem, phon, sem+phon and phon+orth interventions) selected patients with at least a moderate naming deficit or did not mention any inclusion criteria. Remarkably, the majority of phonological studies included patients with nonfluent aphasia, while semantic (and sem+phon) studies included patients with various aphasia profiles or even fluent aphasia. The studies providing interventions on the syntactic level all included agrammatic patients with nonfluent aphasia. The studies providing rhythmic-melodic interventions also specifically included patients with (mostly moderate to severe) nonfluent aphasia. Because the difference in aphasia symptoms/profiles across studies may translate to a difference in lesion patterns, this selection bias could have systematically influenced the amount of change that was possible in the different linguistic subnetworks.

Sample size
Due to the specific patient population and practical concerns, studies in PWA frequently have to deal with small sample sizes. The sample size of the included studies varies from one to 29 patients, with more than 70% of the studies (23/32) having a sample size smaller than 10. Figure 3 shows the number of studies (on the y-axis) including a certain number of PWA (on the x-axis). From this figure, it is clear that more than 70% of the included studies of treatment-related changes in PWA include 10 participants or fewer. As Ramus, Altarelli, Jednoróg, Zhao, and Scotto di Covella (2018) described in their review, neuroimaging studies with such limited sample sizes are statistically underpowered to detect group differences. As a consequence, uncorrected (or not adequately corrected) results are frequently reported, which are more likely to be unreliable (Ramus et al., 2018). Thus, in this case, one problem (underpowered studies) might lead to another (reporting spurious results), preventing the field from moving forward.
In particular, when looking for differences with small to medium effect sizes, which is the case for treatment-related brain changes in PWA, a large sample is needed to detect within-group differences with adequately narrow confidence intervals (Ramus et al., 2018). For example, to have a power of 80% to detect small within-subject differences (η² = 0.02) using within-subjects repeated measurements ANOVA (4 measurements), one needs to include at least 69 participants. The number of PWA should be even higher than this to deal with fewer measurements, the large heterogeneity in the aphasia population, for example in lesion pattern and symptomatology, and other confounds mentioned earlier. This number stands in great contrast with the number of participants in the studies included in this review and in studies that are still being published in the field of aphasia recovery, with few of them reaching this standard. It is, therefore, crucial to perform studies of aphasia recovery in larger samples of patients to ensure sufficient statistical power to demonstrate a treatment effect, to allow for generalization to other (sub)populations, and to account for methodological limitations (e.g., smoothing and the need to correct for multiple comparisons; see Poldrack, Mumford, & Nichols, 2011). Because large-scale studies in a population of patients with aphasia are practically very challenging, data sharing might be a suitable alternative to increase sample size. We refer the reader to Poldrack and Gorgolewski (2014) and Meyer (2018) for an overview of data-sharing efforts in the field of fMRI and practical tips for ethical data sharing, respectively.
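The sample-size figure above can be approximated with a short power calculation. The sketch below is one way to do this, not necessarily the procedure the authors used. It assumes that the quoted effect size is a (partial) eta-squared of 0.02, a single group, a correlation of 0.5 among the repeated measures, no sphericity correction, and an alpha of 0.05. Only the effect size, the four measurements, and the 80% target are taken from the text; the remaining settings are assumptions that mirror common power-analysis defaults. The noncentral F distribution is evaluated with standard-library code only.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        terms = (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        )
        for num in terms:
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(f, df1, df2, noncentrality=0.0):
    """CDF of the (noncentral) F distribution as a Poisson mixture of beta CDFs."""
    x = df1 * f / (df1 * f + df2)
    half = noncentrality / 2.0
    weight = math.exp(-half)
    total = 0.0
    for j in range(200):
        if weight == 0.0:
            break
        total += weight * betainc(df1 / 2.0 + j, df2 / 2.0, x)
        weight *= half / (j + 1)
    return total

def f_critical(alpha, df1, df2):
    """Critical value of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def rm_anova_power(n, eta_sq, n_meas, rho=0.5, alpha=0.05):
    """Power of the within-subject effect in a one-group repeated-measures ANOVA."""
    f_sq = eta_sq / (1.0 - eta_sq)          # Cohen's f squared
    lam = f_sq * n * n_meas / (1.0 - rho)   # noncentrality (assumes sphericity)
    df1 = n_meas - 1
    df2 = (n - 1) * (n_meas - 1)
    return 1.0 - f_cdf(f_critical(alpha, df1, df2), df1, df2, lam)

for n in (10, 30, 69, 100):
    print(f"n = {n:3d}: power = {rm_anova_power(n, 0.02, 4):.2f}")
```

Under these assumptions, the power for 69 participants should land close to the 80% target, whereas samples of 10 or fewer, which are typical of the reviewed studies, fall far short of it. A lower correlation among the repeated measures or fewer measurement points would push the required sample size higher.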

Treatment-related effects outside the language networks
Our review has shown that recovery from aphasia engages brain regions that are outside those traditionally associated with language functions (Figure 6). Such effects might relate to other functions (such as cognitive control) that are not specific to language but indirectly support language. It has already been suggested that neural activity in the right IFG, a region that has been consistently found to be involved in language recovery (Turkeltaub, Messing, Norise, & Hamilton, 2011), including in our own review, is more related to top-down cognitive control by a cingulo-opercular network than to a dynamic language-specific process. Given the overlap between the language network and several nonlanguage processes, such as cognitive control or episodic memory (see, for example, Chein and Schneider, 2005; Geranmayeh, Brownsett, and Wise, 2014; and Humphreys and Lambon Ralph, 2015), it seems worthwhile to characterize the function, importance, and language specificity of other regions that have been consistently identified in aphasia recovery, including subcortical brain regions. The relative contribution of these regions to language recovery, compared with the perisylvian regions traditionally associated with linguistic processing (Geschwind, 1970; Hickok & Poeppel, 2007), is presumably even more important than what we can derive from the results of this review. Approximately one-third of the included studies applied an ROI analysis, half of which focused on exploring the role of language regions in the left hemisphere and their right-hemispheric counterparts in the response to aphasia interventions. Therefore, we believe that whole-brain analyses are required to fully understand the neural correlates of the cognitive mechanisms that support treatment-related language recovery. Moreover, it might be interesting to further explore the effect of targeting these cognitive functions during language interventions on aphasia recovery (Cahana-Amitay & Albert, 2015). For example, it would be useful to know whether these treatments (e.g., working memory training) are as effective as language interventions at boosting language recovery and whether PWA could benefit from a combination of both.

Connectivity studies
Another striking finding is that the majority of the included research (around 80%) focused on regional changes in the brain, while only seven studies investigated treatment-related changes in connectivity patterns. As language and other cognitive functions are linked to large-scale neural networks, formed by interconnected cortical and subcortical areas, stroke-induced damage to that network can lead to distributed dysfunction far beyond the lesion (Carter, Shulman, & Corbetta, 2012). Indeed, experimental animal studies have shown that stroke unavoidably affects the brain connectome within minutes of onset (Silasi & Murphy, 2014). Therefore, treatment-related (or spontaneous) regional alterations should be considered in the context of these brain-wide connections, which might explain more of the recovery process in PWA than considering discrete brain regions in isolation. Due to recent advances in neuroimaging and computational sciences, more and more studies could and should explore treatment-related brain changes on the connectome level, structurally as well as functionally. Again, for these kinds of analyses, a large sample size is very important. In addition, analyses on the network level might be more sensitive to distinct effects of interventions targeting different linguistic components. It is very likely that treatments, to some extent, do not differ in the location of their effect, but rather in the specific distribution of neural activity over the (same) language network, as well as other supporting cognitive networks. In addition, one can investigate functional connectivity patterns using resting-state fMRI, which has several advantages. First, it requires minimal participation, which makes it clinically more interesting compared with task-based fMRI, especially in the case of severe aphasia or in the acute phase post-stroke. Second, the interpretation of functional connectivity is not complicated by different task strategies, as in task-based fMRI (Siegel, Shulman, & Corbetta, 2017). In summary, investigating connectivity patterns offers a better understanding of the effects of the lesion on the networks in the brain (extended lesion effects), as well as a better understanding of how treatment influences the interaction in and between specific distributed networks (language, but also memory, executive function, etc.). We contend that a regional approach and a connectivity approach are complementary methods that together can answer different research questions (i.e., whether we are interested in treatment effects on specific regions or on how they work together to create language).
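In its simplest form, functional connectivity is estimated as the correlation between the time series of pairs of regions. The sketch below illustrates that computation on synthetic signals. The region labels are illustrative only, the sine-wave signals stand in for preprocessed ROI time series and are not real data, and Pearson correlation is just one of several possible connectivity measures.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def connectivity_matrix(timeseries):
    """Pairwise correlations between all ROI time series, keyed by ROI pair."""
    names = list(timeseries)
    return {(a, b): pearson(timeseries[a], timeseries[b]) for a in names for b in names}

# Synthetic stand-ins for preprocessed ROI time series (200 volumes, not real data).
n_vol = 200
shared = [math.sin(2 * math.pi * 0.02 * t) for t in range(n_vol)]       # shared slow fluctuation
separate = [math.sin(2 * math.pi * 0.05 * t + 1.0) for t in range(n_vol)]
fast = [math.sin(2 * math.pi * 0.31 * t) for t in range(n_vol)]         # region-specific component

rois = {
    "left_IFG": [s + 0.3 * f for s, f in zip(shared, fast)],
    "left_pSTG": [s - 0.3 * f for s, f in zip(shared, fast)],
    "precuneus": separate,
}

conn = connectivity_matrix(rois)
for pair in (("left_IFG", "left_pSTG"), ("left_IFG", "precuneus")):
    print(f"{pair[0]} - {pair[1]}: r = {conn[pair]:.2f}")
```

In a treatment study, such a matrix would be computed per patient before and after the intervention, and the difference between the two matrices would be the quantity of interest. Because the whole matrix is analyzed, the number of tests grows quickly with the number of regions, which is one reason these analyses call for large samples.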

Individual variability as meaningful information
Approximately half of the studies included a control group of healthy adults, mostly to provide normative data on (repeated) task-specific brain response patterns. The results for the healthy control group are therefore collapsed across the subject dimension and only the mean effects are considered. In the context of personalized medicine, it might be interesting not to treat the between-subject variance in the neuro-anatomical representation of language as noise, but rather as meaningful information. After all, interindividual variability in a healthy population (e.g., differences in learning or cognitive strategies to perform a given task) could explain differences between subjects in the speed and amount of aphasia recovery (Seghier & Price, 2018). As Seghier and Price (2018) suggest, we could derive the likelihood of recovery of PWA from the amount of variability in the functional response in a normal population. If a healthy population shows multiple ways to perform a certain (language) task, PWA should be able to compensate for a problem in that specific language task. Seghier and Price (2018) propose the method of covariance analysis to characterize interindividual variability, which typically is masked in group analyses. However, once again, this requires a large number of observations from a large number of individuals and thus a large study sample size.
In addition, measurement and characterization of interindividual differences at the neural level in a heterogeneous group of PWA would enable the tailoring of appropriate interventions to every individual. For example, in the context of transcranial direct-current stimulation, Shah-Basak et al. (2015) demonstrated that patients with more extensive lesions in the frontal lobe benefited more from left-hemispheric inhibition, while PWA with less frontal damage responded better to left-hemispheric facilitation. On the other hand, in large databases of stroke patients with and without aphasia, such as the Predicting Language Outcome and Recovery after Stroke (PLORAS) database (Seghier et al., 2016), machine learning approaches could be used to estimate the response to intervention in PWA. By comparing neuroimaging data of a new PWA with neuroimaging data of the stroke database and the treatment outcomes, the system could learn which treatment is generally effective for PWA with similar neuroimaging characteristics. This could be done with lesion data, as well as with structural and functional connectivity patterns (Silasi & Murphy, 2014).

Conclusion
Across treatments and participants, the regions that are involved in language recovery are very diverse. Intervention studies targeting the same linguistic component are not noticeably more similar to each other than intervention studies targeting different linguistic components. However, methodological shortcomings and variability between studies make it hard to present clear conclusions on treatment-related brain changes in PWA. It is possible that treatment-related brain changes associated with recovery of language after brain damage involve both regions traditionally involved in linguistic processing and regions involved in other cognitive functions in both hemispheres. If this is true, we should interpret recovery from aphasia as the result of the adaptive reorganization of functionally heterogeneous perilesional and bilateral neural networks not uniquely involved in language processing (Cahana-Amitay & Albert, 2015). Therefore, we argue for the interpretation of treatment-related language recovery in light of the concept of neural multifunctionality (as discussed in the review of Cahana-Amitay & Albert, 2015). This label highlights the constant and dynamic interactions between neural networks supporting linguistic as well as nonlinguistic functions, such as cognitive, emotional, and sensorimotor processing. In other words, the language network is most likely widely distributed over many different functionally and structurally connected brain regions that are activated interactively. Specific linguistic functions are fulfilled through the integration of neural activity in many regions subserving many functions (Price, 2012).