White matter microstructural plasticity associated with educational intervention in reading disability

Abstract
Children’s reading progress typically slows during extended breaks in formal education, such as summer vacations. This stagnation can be especially concerning for children with reading difficulties or disabilities, such as dyslexia, because it can widen the skills gap between them and their peers. Reading interventions can prevent skill loss and even lead to appreciable gains in reading ability during the summer. Longitudinal studies relating intervention response to brain changes can reveal educationally relevant insights into rapid learning-driven brain plasticity. The current work focused on reading outcomes and white matter connections, which enable communication among the brain regions required for proficient reading. We collected reading scores and diffusion-weighted images at the beginning and end of summer for 41 children with reading difficulties who had completed either 1st or 2nd grade. Children were randomly assigned to either receive an intensive reading intervention (n = 26; Seeing Stars from Lindamood-Bell, which emphasizes orthographic fluency) or be deferred to a wait-list group (n = 15), enabling us to analyze how white matter properties varied across a wide spectrum of skill development and regression trajectories. On average, the intervention group had larger gains in reading than the non-intervention group, whose reading scores declined. Improvements on a proximal measure of orthographic processing (but not on more distal reading measures) were associated with decreases in mean diffusivity within core reading brain circuitry (left arcuate fasciculus and left inferior longitudinal fasciculus) and increases in fractional anisotropy in the left corticospinal tract. Our findings suggest that responses to intensive reading instruction are related predominantly to white matter plasticity in tracts most associated with reading.


Introduction
Reading disabilities are the most common learning disability (Shaywitz, 1998), affecting as many as 20% of children (Wagner et al., 2020). Formal reading instruction begins at school entry (around 6 years old) for most children in the United States, and readers continue developing their skills in and out of school contexts. However, during extended breaks from formal education, such as summer vacation (typically an 8-9 week period in U.S. schools), reading progress tends to slow (Cooper et al., 1996; Entwisle et al., 1997; von Hippel et al., 2018). Extended suspension of formal schooling can widen achievement gaps between vulnerable readers who do and do not participate in reading instruction (Christodoulou et al., 2017), as well as between vulnerable readers and their typically reading peers, as was observed during COVID-19 school disruptions (Kuhfeld et al., 2023). Summer reading interventions can halt reading skill loss and even lead to appreciable gains in reading ability for struggling readers (Christodoulou et al., 2017; Donnelly et al., 2019).
Learning and memory are thought to drive long-term plasticity in white matter (Fields, 2015; Fields et al., 2014; Sampaio-Baptista & Johansen-Berg, 2017; Xin & Chan, 2020). Such changes may manifest as alterations to axonal geometry (e.g., diameter modulations or axonal pruning/branching), myelin remodeling driven by oligodendrocyte proliferation and differentiation, or changes in extra-axonal glial cells and vasculature (Sampaio-Baptista & Johansen-Berg, 2017). White matter micro- and macrostructural properties can be inferred non-invasively in vivo with diffusion-weighted imaging (DWI; Basser et al., 1994). Longitudinal DWI studies have related skill acquisition to white matter changes in behavior-relevant bundles in animals (Blumenfeld-Katzir et al., 2011; Sampaio-Baptista et al., 2013) and humans (Metzler-Baddeley et al., 2017; Scholz et al., 2009), providing evidence for DWI's utility in quantifying white matter plasticity. Most DWI studies of plasticity have used metrics from the diffusion tensor imaging (DTI) model, including fractional anisotropy (FA) and mean diffusivity (MD). FA measures the degree to which water diffusion is directionally dependent (1: water moves only along a single axis; 0: water moves equally well in all directions), while MD reflects the average magnitude of water diffusion across all directions. Higher FA and lower MD are thought to indicate well-myelinated white matter (however, see the Discussion for limitations surrounding these metrics). Given the high test-retest reliability of DTI measures and their modest rates of developmental change (Behler et al., 2021; Wu et al., 2022; Yu et al., 2020), observing significant and rapid changes in these metrics is encouraging, as it suggests that white matter alterations are occurring and manifest at a level resolvable by conventional MRI.
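To make the FA and MD definitions concrete, both can be written in terms of the three eigenvalues λ1, λ2, λ3 of the fitted tensor. MD is their mean. FA is √(3/2) times the root-sum-square deviation of the eigenvalues from MD, divided by their root-sum-square. The sketch below computes both for a single voxel with NumPy. It is illustrative only: the eigenvalues are hypothetical, and in this study the values were produced by FSL's dtifit rather than by code like this.

```python
import numpy as np


def dti_scalars(eigenvalues):
    """Return (FA, MD) for one voxel given the three diffusion-tensor eigenvalues."""
    lam = np.asarray(eigenvalues, dtype=float)
    md = lam.mean()  # mean diffusivity: average of the three eigenvalues
    denom = np.sqrt(np.sum(lam ** 2))
    if denom == 0.0:  # no diffusion at all: define FA as 0
        return 0.0, float(md)
    # FA: normalized spread of the eigenvalues around MD (0 = isotropic, 1 = single axis)
    fa = np.sqrt(1.5) * np.sqrt(np.sum((lam - md) ** 2)) / denom
    return float(fa), float(md)


# Hypothetical eigenvalues in mm^2/s (not study data)
print(dti_scalars([1.0e-3, 1.0e-3, 1.0e-3]))  # isotropic voxel, FA close to 0
print(dti_scalars([1.7e-3, 0.3e-3, 0.3e-3]))  # elongated, white-matter-like voxel
```

Because FA is a ratio, it is unitless and bounded between 0 and 1, whereas MD keeps the units of the eigenvalues.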
Reading is an appropriate and educationally relevant domain for investigating learning-driven neural plasticity. First, reading is a socio-cultural invention too recent to be a product of evolution or natural selection pressure. Second, reading must be explicitly taught, and highly stable measures exist to gauge reading performance (Torgesen et al., 2012; Woodcock, 2011). Longitudinal DWI studies of long-term reading development have shown that trajectories of white matter properties and reading skills are significantly linked, especially in the left arcuate fasciculus (AF), such that increases in tract volume (Myers et al., 2014) and FA (Roy et al., 2022; Van Der Auwera et al., 2021; Wang et al., 2017; Yeatman et al., 2012) accompany improvements in reading among children with diverse reading abilities, although the opposite trend for FA has also been reported for children with reading disabilities (Yeatman et al., 2012). A comparison of illiterate and ex-illiterate adults suggests that acquiring literacy is associated with higher FA in the left AF (Thiebaut de Schotten et al., 2014). Similarly, higher FA in core reading tracts has predicted better subsequent reading outcomes in children (Borchers et al., 2019; Davison et al., 2022; Gullick & Booth, 2015), as well as better reading-adjacent skills, such as phonological awareness, among pre-readers (Saygin et al., 2013; Zuk, Yu, et al., 2021).
In comparing populations with and without reading disabilities, research has focused not only on left-hemispheric core reading tracts, which have exhibited lower FA among pre-readers with familial risk of dyslexia (Langer et al., 2017; Vandermosten et al., 2015) and future diagnoses of dyslexia (Vanderauwera et al., 2017), but also on their right-hemispheric homotopes. Higher FA in the right superior longitudinal fasciculus has predicted future reading outcomes in children with dyslexia (Hoeft et al., 2011), and longitudinal FA increases in this tract relate to positive reading development in children with familial risk for dyslexia (Wang et al., 2017). These studies suggest the right hemisphere may provide a compensatory mechanism in reading disabilities. This notion is supported by a report that children with dyslexia show faster reading-related volumetric changes in the right, but not left, inferior frontal gyrus compared to typically reading children (Phan et al., 2021). Longitudinal designs, as described above, have yielded stronger results than analogous high-powered cross-sectional studies, which have suggested little-to-no significant relationship between DTI measures and individual differences in reading skills (Meisler & Gabrieli, 2022a; Moreau et al., 2018; Roy et al., 2022). Collectively, the extant literature implies that white matter infrastructure could have a causal and dynamic relationship with reading outcomes, as opposed to being a static, genetically predisposed foundation that reflects individual differences in such outcomes.
While longitudinal studies of long-term reading development seem to have converged on the importance of left-hemispheric reading circuitry in predicting and tracking reading outcomes, neuroimaging studies of short-term reading instruction (on the order of days to weeks) have yielded few and mixed findings on rapid anatomical correlates of reading remediation (as well as functional correlates; for reviews, see Barquero et al., 2014; Braid & Richlan, 2022; Perdue et al., 2022). Huber et al. (2018) found that decreases in MD across the brain, not limited to core reading circuitry, were related to greater reading intervention benefits across participants. A later reanalysis of a subset of these participants found that this plasticity was likely driven not by factors related to myelination, but rather by extra-axonal characteristics (Huber et al., 2021). One study reported that better intervention responses were related to increases in FA in the anterior left centrum semiovale (Keller & Just, 2009), a broad term for the white matter between the corpus callosum and the cortical surface that is not considered part of core reading circuitry. Another study found decreased MD in several left-hemispheric regions after reading intervention, although right-hemispheric regions were not reported (Richards et al., 2017). However, a different study concluded that white matter did not change with reading intervention, but that lower right-hemispheric dorsal white matter MD prior to intervention predicted better intervention outcomes (Partanen et al., 2021). In young pre-readers undergoing early literacy training, pre-to-post increases in FA in the left AF and inferior longitudinal fasciculus (ILF) were observed, but these were ultimately attributed to developmental, as opposed to intervention-driven, processes (Economou et al., 2022). In summary, while rapid white matter changes can be observed in the context of intensive reading intervention, it is unclear whether these changes are reproducible, domain-specific (that is, localized to tracts that typically support reading), or dissociable from broader developmental trajectories.
Inconsistent findings could be driven by a variety of factors, including publication bias toward positive findings, small sample sizes, and variation in participant characteristics, interventions, and neuroimaging acquisition and analysis protocols (Perdue et al., 2022; Roy et al., 2022; Schilling, Rheault, et al., 2021; Schilling, Tax, et al., 2021; Thornton & Lee, 2000). It is also possible that intervention-driven effects are not robust or generalizable, reflecting unique properties of the intervention used or cohort studied. Neuroimaging studies in general are prone to replicability issues (Button et al., 2013; Kharabian Masouleh et al., 2019; Marek et al., 2022; Turner et al., 2018). Notably, the present study bears strong similarity to another reading intervention DWI study (Huber et al., 2018), employing the same intervention curriculum among children within a similar age range and focusing on comparable DWI metrics. This resemblance offers a uniquely well-suited opportunity to attempt to reproduce prior results.
In the present study, we examined changes in reading skill and white matter properties before and after a six-week summer reading intervention among children with reading disabilities (RD). We focused on 7 white matter tracts: the left AF and ILF as core reading circuitry bundles; their right-sided homotopes as potential compensatory bundles; the bilateral corticospinal tracts (CST) as control bundles that are not thought to subserve reading; and the splenium of the corpus callosum, which may support reading by connecting bilateral visual cortices but has been shown to be microstructurally stable during reading intervention (Huber et al., 2018). We hypothesized one of two outcomes: (1) decreases in MD and/or increases in FA in all tracts (besides the splenium) would be related to better intervention responses (e.g., Huber et al., 2018), or (2) this effect would be localized to the left AF, consistent with multiple studies tracking reading development on longer time scales (Roy et al., 2022; Van Der Auwera et al., 2021; Yeatman et al., 2012).

Ethics Statement
This project was approved by the Massachusetts Institute of Technology's Committee on the Use of Humans as Experimental Subjects (protocol number: 1201004850). Informed written consent was obtained from parents or legal guardians, and informed written assent was obtained from the participants, all of whom were minors.

Participants
Participants in the present study were a subset of a larger sample, for which reading (Christodoulou et al., 2017) and gray matter morphometric (Romeo et al., 2018) findings have been previously reported. Forty-one participants passed all inclusion and quality control criteria and were analyzed in the present study (see Data Inclusion and Quality Control). All participants were between 7 and 9 years old and were entering the summer having completed grade 1 or 2. Inclusion criteria included a history of reading difficulty based on parental report and a manifestation of reading difficulty at study enrollment. Specifically, to be included in the study, participants had to score "At Risk" or "Some Risk" on the Dynamic Indicators of Basic Early Literacy Skills test (DIBELS; Good et al., 2002) and below the 25th percentile on at least 3 of the following 5 measures: the Elision and Nonword Repetition subtests of the Comprehensive Test of Phonological Processing, 2nd Edition (CTOPP-2; Wagner et al., 1999), and the Objects, Letters, and 2-Set Letters and Numbers subtests of the Rapid Automatized Naming and Rapid Alternating Stimulus Tests (RAN/RAS; Wolf & Denckla, 2005). Additionally, participants had to score at or above the 16th percentile on the Matrices subtest of the Kaufman Brief Intelligence Test, 2nd Edition (KBIT-2; Kaufman, 2004), a measure of nonverbal cognitive ability. All children were native English speakers.
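The score-based enrollment rules above amount to a small boolean check. The sketch below encodes them as described, under stated assumptions: the record layout and the measure keys (`CTOPP2_Elision`, `RAN_Objects`, and so on) are hypothetical names chosen for illustration, percentiles are assumed to be available for all five measures, and age, grade, and native-language requirements are left out.

```python
from dataclasses import dataclass

# Hypothetical keys for the five screening measures named in the text.
SCREEN_MEASURES = (
    "CTOPP2_Elision",
    "CTOPP2_NonwordRepetition",
    "RAN_Objects",
    "RAN_Letters",
    "RAS_2Set_LettersNumbers",
)


@dataclass
class Screening:
    """Hypothetical screening record for one child (illustrative field names)."""
    dibels_risk: str                # DIBELS risk category label
    percentiles: dict               # percentile rank for each key in SCREEN_MEASURES
    kbit_matrices_pct: float        # KBIT-2 Matrices percentile
    parent_reported_history: bool   # parent-reported history of reading difficulty


def meets_inclusion(s: Screening) -> bool:
    """Apply the score-based enrollment rules described in the Participants section."""
    at_risk = s.dibels_risk in {"At Risk", "Some Risk"}
    n_low = sum(s.percentiles[m] < 25 for m in SCREEN_MEASURES)  # measures below 25th pct
    return (
        s.parent_reported_history
        and at_risk
        and n_low >= 3               # at least 3 of the 5 measures
        and s.kbit_matrices_pct >= 16  # at or above the 16th percentile
    )


# Illustrative child (not study data): 3 of 5 measures below the 25th percentile
child = Screening("Some Risk", dict(zip(SCREEN_MEASURES, [10, 20, 40, 15, 60])), 30, True)
print(meets_inclusion(child))  # True
```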
Children were recruited from a local partner charter school and the Greater Boston area.

Reading Intervention
Participants were randomly assigned to either receive a reading intervention (n = 26) or be placed on a wait-list (n = 15). Comprehensive details of the intervention have been described previously (Christodoulou et al., 2017). Intervention participants completed intensive reading instruction following the Seeing Stars: Symbol Imagery for Fluency, Orthography, Sight Words, and Spelling program (Bell, 1997). Instruction was delivered by trained Lindamood-Bell teachers, who rotated classrooms hourly. The program ran 4 hours per day, 5 days per week, for 6 weeks; total intervention time was between 100 and 120 hours. Students received small-group instruction (3 to 5 students per group) to improve foundational reading skills, including phonological and orthographic processing. Children recruited from the local partner school received the intervention on-site at their school (n = 9), while children recruited from the community at large (n = 17) received the intervention in a dedicated space at the Massachusetts Institute of Technology.

Outcome Measures
Standardized reading scores were collected from all participants before and after the intervention period, regardless of whether they participated in the intervention. A proximal measure of intervention response, the Symbol Imagery Test (SIT), assessed phonological and orthographic processing in reading (Bell, 2010). During the SIT, participants briefly viewed cards with words or pseudowords for between 2 and 7 seconds and were then asked to report what they were shown. Cronbach's α values range from .86 to .88, and the test-retest reliability is .95 (Bell, 2010). A relatively distal composite reading index was calculated at each time point by averaging the following four standardized reading scores: Sight Word Efficiency (SWE) and Phonemic Decoding Efficiency (PDE) from the Test of Word Reading Efficiency, 2nd Edition (TOWRE-2; Torgesen et al., 1999), and Word Identification (WID) and Word Attack (WA) from the Woodcock Reading Mastery Tests, 3rd Edition (WRMT-3; Woodcock, 2011). Timed and untimed single-word reading skills were measured by SWE and WID, respectively, while timed and untimed pseudoword reading skills were measured by PDE and WA, respectively. For all four subtests, Form A was administered at the beginning of the study and Form B at the end of the study to avoid practice or familiarity effects. High alternate-form reliability has been reported for standardized test scores on both the WRMT-3 subtests (WID: r = 0.93, WA: r = 0.76; Woodcock, 2011) and the TOWRE-2 subtests (SWE: r = 0.90, PDE: r = 0.92; Torgesen et al., 2012). Age-normed scores for all tests were defined such that the population mean is 100, with a standard deviation of 15.
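The composite reading index described above is an unweighted mean of four age-normed standard scores. Its pre-to-post change is the quantity later entered into the longitudinal models. The following minimal sketch assumes each child's scores are stored in a dictionary under hypothetical keys. The scores shown are invented for illustration and are not study data.

```python
# Hypothetical keys for the four age-normed standard scores named in the text.
SUBTESTS = ("TOWRE2_SWE", "TOWRE2_PDE", "WRMT3_WID", "WRMT3_WA")


def composite_index(scores: dict) -> float:
    """Unweighted mean of the four standard scores (population mean 100, SD 15)."""
    return sum(scores[k] for k in SUBTESTS) / len(SUBTESTS)


def composite_change(pre: dict, post: dict) -> float:
    """Pre-to-post change in the composite; positive values indicate improvement."""
    return composite_index(post) - composite_index(pre)


# Illustrative scores for one child (not study data)
pre = {"TOWRE2_SWE": 82, "TOWRE2_PDE": 78, "WRMT3_WID": 85, "WRMT3_WA": 83}
post = {"TOWRE2_SWE": 84, "TOWRE2_PDE": 80, "WRMT3_WID": 86, "WRMT3_WA": 86}
print(composite_index(pre), composite_change(pre, post))  # 82.0 2.0
```

All four subtests share the same 100/15 scale, so a plain average needs no further standardization.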

Neuroimaging Acquisition
Participants were scanned at the Athinoula A. Martinos Imaging Center at the Massachusetts Institute of Technology using a 3 Tesla Siemens TimTrio scanner and a standard 32-channel head coil. During each session, a T1-weighted (T1w) MPRAGE image was acquired with the following parameters: TR = 2.53 s, TE = 1.64 ms, flip angle = 7°, 1 mm isotropic voxels. A diffusion-weighted image (DWI) was acquired with the following parameters: TR = 9.3 s, TE = 84 ms, flip angle = 90°, 2 mm isotropic voxels, and 10 b0 volumes followed by 30 non-collinear directions at b = 700 s/mm². Age-appropriate movies were shown during these scans to increase engagement and reduce head motion (Greene et al., 2018). Functional MRI data were also collected but are not discussed here. Before the first MRI session, participants were introduced to MRI by visiting the center's pediatric mock scanner, which allows children to become acclimated to the MRI noise and to lying still in the machine, improving scan compliance (de Bie et al., 2010; Gao et al., 2023).

MRI Preprocessing and Tract Segmentation
MRI preprocessing and tract segmentation were performed according to the longitudinal TRActs Constrained by UnderLying Anatomy (TRACULA) pipeline (Maffei et al., 2021; Yendiki et al., 2011, 2016), as part of FreeSurfer version 7.2 (Fischl, 2012; Fischl et al., 2002; Reuter et al., 2012). This method uses longitudinal anatomical priors to produce more plausible tracts than creating independent segmentations at each time point (Yendiki et al., 2016), and it leverages high-quality training data to help inform tract shapes on routine-quality DWI data (Maffei et al., 2021). To achieve this, FreeSurfer's longitudinal processing pipeline (Reuter et al., 2012) was run on each participant's pre and post T1w images to create an unbiased subject template image (Reuter & Fischl, 2011) using inverse consistent registration (Reuter et al., 2010).
Information from this template was used to initialize several steps of the recon-all pipeline, such as skull-stripping and anatomical segmentation (Reuter et al., 2012). DWI volumes from each scan were aligned to the first b0 image in that scan, and the b-matrix was rotated accordingly (Leemans & Jones, 2009). DWI images were corrected for motion and eddy currents with FSL's eddy command (Andersson & Sotiropoulos, 2016). Information from this process was used to generate four measures of head motion and image quality that inform the "total motion index" (Yendiki et al., 2014): mean volume-by-volume head rotation, mean volume-by-volume head translation, proportion of slices with signal dropout, and severity of signal dropout. The diffusion tensor was fitted using FSL's dtifit, and MD and FA were derived from the tensor. A GPU-accelerated ball-and-stick model was fit to each DWI image (Behrens et al., 2007; Hernández et al., 2013; Jbabdi et al., 2012). At each time point, a registration between the diffusion-weighted image and the T1w image (native space) was computed using an affine boundary-based registration algorithm (Greve & Fischl, 2009). This transformation was used to bring anatomical segmentations into DWI space. The DWI-to-T1w and T1w-to-template registrations were multiplied to obtain a DWI-to-template transformation. Information from high-resolution 7T training data (Maffei et al., 2021) was used to estimate endpoint ROIs and pathways for white matter tracts in each participant's template space. These data were then brought back into the native DWI space of each time point. The DWI ball-and-stick model and tract anatomical priors were used to calculate the probability density of each pathway. From these, we collected the average MD and FA from the cores of our tracts of interest ([MD|FA]_Avg_Center) to mitigate concerns about noise and about partial volume effects from fiber branching toward the exterior and extremities of the bundles. These tracts included the bilateral AF, ILF, and CST, as well as the splenium of the corpus callosum (Figure 1).

Statistics and Analysis
Analyses were prepared, run, and visualized using the Python packages Pandas 1.3.2 (McKinney, 2011), Statsmodels 0.13.5 (Seabold & Perktold, 2010), and Seaborn 0.12.1 (Waskom, 2021), respectively. We constructed a dataframe containing the following phenotypic fields for each subject: age at each scan (in months), sex (binary categorical factor), and reading measures at the pre and post time points, as well as their longitudinal pre-to-post differences. For each tract, the pre and post MD and FA values were added to the dataframe, along with their longitudinal pre-to-post differences. The total motion indices, calculated separately for each time point, were also added to the dataframe (see Data Inclusion and Quality Control). We used ordinary least squares linear models to run multiple regressions, allowing us to relate reading measures to white matter microstructure while controlling for confounds.
For the primary analyses, we created models predicting the pre-to-post change in a given tract microstructural measure from the pre-to-post change in a reading measure across all participants, with nuisance regressors for sex, age at first scan, and motion indices at both time points. We also ran the same models using only participants in the reading intervention group. As exploratory analyses to contextualize our main focus on longitudinal differences in tract microstructure, we also examined whether reading skill was related to tract FA and MD cross-sectionally at each time point across all participants.
For these additional models, at a given time point, the tract metric was predicted from the given standardized reading measure, controlling for sex, age, and motion at that time point. Across all models, effect sizes (ΔR²adj) were calculated as the difference in adjusted R² between the full model and a reduced model without the reading score term of interest.
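The ΔR²adj effect size can be reproduced with any OLS routine. The authors report using Statsmodels, whose fitted results expose adjusted R² as `rsquared_adj`. To stay dependency-light, this sketch swaps in plain NumPy least squares with an intercept. It uses the usual adjusted R² formula, 1 − (1 − R²)(n − 1)/(n − p), where p counts all columns including the intercept. The cohort is simulated and every number in it is hypothetical. Following the primary analysis, the tract-metric change is the outcome, the reading-score change is the term of interest, and sex, age at first scan, and both motion indices are nuisance regressors.

```python
import numpy as np


def adjusted_r2(y, X):
    """Fit OLS with an intercept and return the adjusted R^2."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    n, p = design.shape  # p counts the intercept column
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)


def delta_r2_adj(y, reading, nuisance):
    """Adjusted R^2 of the full model minus that of the model without the reading term."""
    full = np.column_stack([reading, nuisance])
    return adjusted_r2(y, full) - adjusted_r2(y, nuisance)


# Simulated cohort (hypothetical values, not study data)
rng = np.random.default_rng(0)
n = 41
sex = rng.integers(0, 2, n)
age_months = rng.normal(96, 8, n)
tmi_pre, tmi_post = rng.normal(0, 1, (2, n))
nuisance = np.column_stack([sex, age_months, tmi_pre, tmi_post])
d_reading = rng.normal(0, 10, n)                    # pre-to-post change in a reading score
d_md = -2e-6 * d_reading + rng.normal(0, 1e-5, n)   # pre-to-post change in tract MD
print(round(delta_r2_adj(d_md, d_reading, nuisance), 3))
```

Because adjusted R² penalizes each added regressor, ΔR²adj can be slightly negative when the reading term explains almost nothing.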

Data Inclusion and Quality Control
A total of 153 children were recruited as part of the larger overarching study. Of these, 52 participants had anatomical and DWI scans at both time points and completed the neuroimaging processing pipeline without errors, and 44 of those participants had the necessary phenotypic data. As a quality assurance metric, we computed the total motion index (TMI) as described in Yendiki et al. (2014). TMI combines four measures: rotation, translation, signal dropout prevalence, and signal dropout severity. For each scan, we calculated each motion metric's difference from the study population mean at the given time point, divided by the interquartile range of that metric. The TMI for each scan is the sum of these normalized values across the four motion metrics. Three subjects had outlying TMI values at either time point and were excluded. For further quality assurance, we confirmed that no remaining participant had any tract-averaged FA lower than 0.3, which could indicate some combination of white matter disorganization and partial volume effects from a tract branching into substantial amounts of gray matter or CSF. Thus, a total of 41 subjects (26 in the intervention group and 15 in the non-intervention group) were analyzed in the present study.
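The TMI computation described above can be sketched in a few lines of NumPy for one time point. It follows the description given here: each metric is mean-centered, scaled by its interquartile range, and summed across the four metrics. The motion values are hypothetical, and the sketch assumes every metric has a nonzero interquartile range. The text does not state the cutoff used to call a TMI value an outlier, so the sketch stops at computing the index rather than guessing a threshold.

```python
import numpy as np


def total_motion_index(metrics):
    """Sum of IQR-scaled deviations from the mean, one TMI per scan.

    metrics: array of shape (n_scans, 4) for a single time point, with columns
    rotation, translation, dropout prevalence, dropout severity.
    """
    m = np.asarray(metrics, dtype=float)
    center = m.mean(axis=0)                        # population mean of each metric
    q75, q25 = np.percentile(m, [75, 25], axis=0)
    iqr = q75 - q25                                # assumed nonzero for every metric
    return ((m - center) / iqr).sum(axis=1)


# Hypothetical motion summaries for five scans at one time point
motion = np.array([
    [0.20, 0.5, 0.00, 1.00],
    [0.30, 0.6, 0.01, 1.10],
    [0.25, 0.4, 0.02, 1.20],
    [0.40, 0.7, 0.01, 1.05],
    [1.20, 2.0, 0.10, 1.80],
])
tmi = total_motion_index(motion)
print(np.round(tmi, 2))  # the last scan, worst on every metric, has the largest TMI
```

Because each metric is mean-centered, the TMI values at a time point sum to zero; positive values indicate more motion or dropout than the cohort average.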

Cognitive and Phenotypic Data
Phenotypic summary statistics for the participant cohort are provided in Table 1.
Of note, the intervention and non-intervention groups were matched in sex (χ² test, p > 0.7) but not handedness (χ² test, p < 0.05). However, we did not include handedness as a regressor in our models given the lack of evidence for handedness-related asymmetry in white matter microstructure (López-Vicente et al., 2021). The two groups were also matched in age, socioeconomic status, and reading performance across all subtests cross-sectionally at each time point (two-sample t-test, p > 0.1 across all tests), with the exception that the intervention group had a significantly higher SIT score post-intervention (two-sample t-test, p < 0.005). Demonstrating the efficacy of the reading intervention, the intervention group showed larger pre-to-post differences in the SIT and composite reading index (two-sample t-test, p < 0.002 for both tests), driven by the non-intervention group regressing on both measures while the intervention group improved on the SIT and maintained its scores on the composite reading index (Figure 2). These results are consistent with those observed in the larger cohort from which the present subset was derived (Christodoulou et al., 2017).

Scores
Across all participants at the beginning of the summer, lower MD (p < 0.1, ΔR²adj = 0.058) and higher FA (p < 0.1, ΔR²adj = 0.064) in the left ILF were marginally associated with better SIT scores. Higher FA in the right ILF (p < 0.05, ΔR²adj = 0.106) and right CST (p < 0.05, ΔR²adj = 0.069), as well as lower MD in the right CST (p < 0.05, ΔR²adj = 0.079), were related to better initial composite reading index scores. At the end of the summer, better composite reading index scores were associated with higher FA (p < 0.05, ΔR²adj = 0.084) and lower MD (p < 0.05, ΔR²adj = 0.086) in the right CST. No tract microstructure values were associated with SIT scores at the end of the summer. No tests remained significant at α = 0.05 with a Bonferroni factor of 28 (at each time point, 7 tracts × 2 microstructural metrics × 2 reading scores).
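The Bonferroni factor of 28 reported above follows from the number of models per time point. Dividing α = 0.05 by that factor gives the per-test threshold that no model reached. The sketch below simply spells out that arithmetic. The constant names and the helper `survives_bonferroni` are introduced here for illustration.

```python
N_TRACTS, N_METRICS, N_READING = 7, 2, 2
N_TESTS = N_TRACTS * N_METRICS * N_READING   # 28 models per time point
ALPHA = 0.05


def survives_bonferroni(p: float, m: int = N_TESTS, alpha: float = ALPHA) -> bool:
    """True if p clears the Bonferroni-adjusted threshold alpha / m."""
    return p < alpha / m


print(N_TESTS, round(ALPHA / N_TESTS, 5))   # 28 0.00179
print(survives_bonferroni(0.04))            # False: p < 0.05 alone is not enough
```

Under this correction, a result reported as p < 0.05 would need a p-value below roughly 0.0018 to remain significant.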

Relationship Between Changes in White Matter Microstructure and Reading Scores
Across all participants, pre-to-post decreases in MD in the left AF and left ILF were related to improvements in SIT scores over the summer, with MD changes in the left AF accounting for ~9% of variance in reading score trajectories and those in the left ILF accounting for ~16% (Table 2, Figure 3). This effect was not present for the composite reading index. Decreasing splenium MD was marginally correlated with improvements in both reading measures (p < 0.1), accounting for ~5% of variance in each. Increasing FA in the left CST (p < 0.05) and, to a lesser extent, the left AF (p < 0.1) was related to improvements in SIT scores, but not in the composite reading index (Figure 3, Table 3). No tests remained significant at α = 0.05 after correction for multiple comparisons with a Bonferroni factor of 28.
When running the same models on only the 26 participants who completed the intervention, the relationships between pre-to-post decreases in left ILF MD and improvement in SIT scores (Figure 3, p < 0.05, ΔR²adj = 0.238) and between pre-to-post decreases in splenium MD and improvement in composite reading index scores (p < 0.1, ΔR²adj = 0.093) remained. Additionally, there was a modest relationship between pre-to-post decreases in right ILF FA and improvement in SIT scores (p < 0.1, ΔR²adj = 0.118). None of these additional models remained significant at α = 0.05 after Bonferroni correction with a factor of 28.

Discussion
In the present study, we investigated whether changes in white matter microstructure were related to changes in reading skill over the summer among 41 children with reading disabilities. Reading ability trajectories varied across a wide spectrum, ranging from score regression (the "summer slump") to intervention-driven improvement.
We focused on 7 tracts within and outside of core reading circuitry, using two microstructural measures (FA and MD) and two reading measures. One reading measure, the SIT, was closely related to the intervention, while the composite reading index was more distal to it. We found that longitudinal decreases in MD in core reading circuitry (the left AF and left ILF) were related to improved SIT scores. Longitudinal increases in FA in the left CST were also related to improved SIT scores. Notably, none of these associations were present for the composite reading index. We originally hypothesized a relationship between changes in white matter microstructure and reading scores either in just the left AF or more globally.
While neither of these hypothesized outcomes was explicitly met, the pattern of results, particularly for MD, the metric used in other studies of short-term reading intervention (Huber et al., 2018; Partanen et al., 2021), suggests that intervention effects were strongest (and in the hypothesized direction) within core reading circuitry when considering the reading measure most proximal to the intervention.
The design of our study most closely resembles that of Huber et al. (2018), with some important differences. The same intervention curriculum (Seeing Stars) was used in both studies, with similar instruction hours per week, but the present study had a 6-week intervention period rather than the 8 weeks in Huber et al. (2018). Notably, students in the present study were taught in small groups, whereas a 1-on-1 approach was used in Huber et al. (2018). The longer intervention duration and more intensive individualized instruction may have contributed to children in that study improving on their composite reading measure (composed of the same reading tests as in the present study), as opposed to only maintaining scores as in the present study.
Both studies had similar cohort sizes (41 and 43 children). However, participants in the present study fell within a narrower age range (7-9 years) than those in Huber et al. (2018) (7-12 years). Additionally, all participants in the present study were diagnosed with a reading disability, while 10 children in Huber et al. (2018) were typical readers. Both studies found that changes in MD in the left AF and ILF were related to reading outcomes over the course of the summer. However, Huber et al. (2018) found more widespread plasticity that did not include the splenium, while our cohort exhibited more domain-specific plasticity that included the splenium (marginally, p < 0.1). Beyond the contrasts between studies described above, additional variation in data acquisition, processing, and statistical techniques could have contributed to differences in results (Schilling, Rheault, et al., 2021; Schilling, Tax, et al., 2021).
It is encouraging that both the present study and Huber et al. (2018) found intervention effects in the left AF and ILF. This provides converging evidence that intensive educational instruction targets reading-relevant white matter tracts, albeit with different findings about plasticity in a broader range of tracts.
Future studies ought to address similar questions in different contexts to evaluate the reproducibility and generalizability of these findings. Presently, there are not enough studies of longitudinal neuroanatomical correlates of reading intervention (in either gray or white matter) to perform meaningful meta-analyses. In functional MRI, a meta-analysis of 8 studies with longitudinal neuroimaging and cognitive scores concluded that there were no consistent locations where longitudinal changes in reading-evoked BOLD signal covaried with intervention response (Perdue et al., 2022). However, individual studies, including some with only one neuroimaging session either before or after intervention, have found functional correlates of intervention response both in putative reading regions and more globally (reviewed in Barquero et al., 2014; Braid & Richlan, 2022; Perdue et al., 2022). One of the few studies of gray matter morphometric correlates of intervention response was conducted on the same participant pool as the present study (Romeo et al., 2018). That study concluded that children who improved exhibited significant cortical thickening in regions including the left middle temporal gyrus, right superior temporal gyrus, and bilateral middle-inferior temporal cortex, inferior parietal lobule, precentral cortex, and posterior cingulate cortex. While some of these regions are left-lateralized core reading areas, others extend well beyond the reading network. Given these spatially distinct patterns of results for gray and white matter, it remains unclear how to relate different anatomical measures of plasticity in response to reading intervention.
In exploratory analyses, we created cross-sectional models to investigate whether white matter microstructure and reading skills were associated at each time point. Lower MD and higher FA in the left ILF were associated with better reading scores at the beginning of the summer, consistent with the left ILF's critical role in supporting reading.
We also found that right ILF and right CST microstructure were significantly associated with reading scores. Notably, this is not consistent with a previous report of an inverse relationship between right ILF FA and reading outcomes among children with reading disabilities (Banfi et al., 2018), and in the present study, longitudinal microstructural trajectories in these right-sided homotopes were not linked with changes in reading scores. This might suggest that right-lateralized white matter serves as a static compensatory agent that reflects early reading outcomes in reading disabilities (e.g., Zuk, Dunstan, et al., 2021) but does not dynamically change with reading instruction.
However, given the small sample size and limited power of cross-sectional designs, this should be interpreted with caution.
The significance of the models including the CST, both cross-sectionally and longitudinally, was unexpected given that, as a primary motor tract, it has no obvious role in reading. Although effect sizes for these models were appreciably lower than those for the left ILF and AF, this suggests that intervention effects could still be detected to some extent outside of reading circuitry. The CST is not often the focus of studies of reading abilities, but one study found that volumes of the bilateral CST were informative in predicting dyslexia diagnoses in children (Cui et al., 2016). Additionally, other studies have found that FA in the left CST predicted future phonological skills (a critical pre-reading ability) in kindergarten (Zuk, Yu, et al., 2021), correlated with phonological processing in preschoolers (Walton et al., 2018), and was related to phonological encoding abilities in adults with brain damage (Han et al., 2016). This suggests that the left CST could be co-opted into reading circuitry in populations with deficient or not-yet-fully developed language abilities. This account departs from the more common focus on compensation by right-hemisphere homotopes of reading circuitry, such as the right ILF and AF. However, it is consistent with the left CST's importance in speech, evidenced by microstructural deficiencies in preterm children with poor oromotor outcomes (Northam et al., 2012) and in people who stutter (Connally et al., 2014; Kronfeld-Duenias et al., 2016), and with the relationship between speech and reading disabilities (Catts, 1993; Hayiou et al., 2010). It is also possible that the CST reconstructions largely overlapped with the nearby corticobulbar projections, as tractography is prone to such overlaps (Schilling et al., 2022). Corticobulbar projections target cranial nerve motor nuclei that control head and neck muscles, and properties of corticobulbar tracts have also been related to speech and language outcomes in preterm adolescents (Northam et al., 2019).
Associations between white matter microstructure and reading outcomes favored proximal, but not distal, measures. The proximal outcome measure (SIT) was directly aligned with the intervention program and is the metric one would predict to be most affected by the intervention. We expected, but did not find, statistically significant associations between white matter microstructure and the distal reading composite, which included four single-word reading measures (timed and untimed, real words and pseudowords). […] (Beaulieu, 2002; Genc et al., 2017), as opposed to axonal factors such as myelination, orientation coherence, and axonal density (Friedrich et al., 2020). This is consistent with other DWI studies of reading intervention (Huber et al., 2018, 2021). However, multimodal research at various spatial and temporal resolutions will need to be reconciled to perform the nontrivial task of ascribing such changes to biophysical mechanisms (Jelescu et al., 2020). While higher FA and lower MD are often thought to reflect "healthier" white matter, these metrics are biologically nonspecific because of the variety of factors that can influence water diffusion in voxel-sized regions and the confounding impact of crossing fibers (De Santis et al., 2014; Jones et al., 2013), which can affect as many as 90% of white matter voxels (Behrens et al., 2007; Jeurissen et al., 2013). Fiber-specific measures such as quantitative anisotropy (Yeh et al., 2013) and fixel-based metrics (Raffelt et al., 2017), as well as multi-compartment models such as NODDI (Zhang et al., 2012), can provide higher biological specificity and have shown promise in better resolving brain-behavior relationships in studies of reading abilities (Koirala et al., 2021; Meisler & Gabrieli, 2022b; Sihvonen et al., 2021). Unfortunately, the low angular resolution and weak single-shell diffusion weighting of the present DWI acquisition scheme were not well suited to these newer approaches (Genc et al., 2020), effectively limiting us to DTI metrics. Future work should therefore design data acquisition protocols that allow these techniques to be used in longitudinal study designs.
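To make the crossing-fiber problem concrete, the following minimal Python sketch computes FA and MD from diffusion tensor eigenvalues using the standard DTI definitions and compares a single coherent bundle with a toy crossing-fiber voxel. The eigenvalues are illustrative and are not values from this study. Modeling the crossing as the element-wise average of two tensors is a simplifying assumption made only for illustration; the measured DWI signal in a crossing voxel is a mixture of exponential decays, not an average of tensors.

```python
import math


def md(evals):
    """Mean diffusivity: the average of the three tensor eigenvalues."""
    return sum(evals) / 3.0


def fa(evals):
    """Fractional anisotropy: sqrt(3/2) * ||lambda - MD|| / ||lambda||."""
    m = md(evals)
    num = sum((lam - m) ** 2 for lam in evals)
    den = sum(lam ** 2 for lam in evals)
    return math.sqrt(1.5 * num / den)


# Eigenvalues in units of 10^-3 mm^2/s (illustrative, not study data).
single_fiber = (1.7, 0.3, 0.3)  # one coherent bundle oriented along x

# Toy crossing voxel (simplifying assumption): average of two identical
# bundles, one along x and one along y, expressed in the same frame.
bundle_x = (1.7, 0.3, 0.3)
bundle_y = (0.3, 1.7, 0.3)
crossing = tuple((a + b) / 2 for a, b in zip(bundle_x, bundle_y))

print(f"single fiber: FA={fa(single_fiber):.2f}, MD={md(single_fiber):.3f}")
print(f"crossing:     FA={fa(crossing):.2f}, MD={md(crossing):.3f}")
```

In this toy case FA falls from about 0.80 to about 0.48 while MD is unchanged, even though the two bundles are microstructurally identical to the single bundle. A voxel-level FA difference therefore does not by itself indicate a difference in axonal properties.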
Our results should be considered in the context of additional limitations. The small amount of within-participant data (two time points) precluded more sophisticated statistical models, such as the linear mixed-effects models used by Huber et al. (2018). It also limited our ability to infer when microstructural changes occurred over the course of the intervention and to characterize the temporal relationship between changes in reading and tract properties (in other words, whether tract microstructural changes preceded changes in reading scores, or vice versa). To address these points, future longitudinal studies of reading intervention should strive to include at least three data collection sessions (King et al., 2018). Our relatively small sample size of 41 reflects the challenges of collecting acceptable-quality cognitive and multimodal MRI data from children with learning disabilities, challenges that are particularly amplified in longitudinal studies (Davis et al., 2022). Even with these limitations, we found that white matter microstructural plasticity, predominantly in core reading circuitry, was related to changes in reading abilities over the summer in the context of short-term intensive educational intervention.

Data and Code Availability
Due to language used in the consenting process, we are not permitted to publicly share participants' MRI images. Images may be privately distributed upon reasonable request.
We share a CSV file containing all data necessary to replicate the present results, as well as the code to recreate the analyses and figures. All instructions and code for processing data and running the statistical analyses can be found at https://github.com/smeisler/Meisler_ReadingInt_DWI. To execute the FreeSurfer workflows, we ran a Docker container containing FreeSurfer 7.2 and FSL 6.0.4 with Singularity (3.9.5) (Kurtzer et al., 2017). The container can be obtained with either docker pull amirro/tracula:latest or singularity build tracula_container.img docker://amirro/tracula:latest. Ongoing development of this software may introduce improvements and bug fixes that should be used in future research, so we encourage using the latest stable releases.

Figure 1: Tracts produced by TRACULA analyzed in the present study, overlaid on a fractional anisotropy image. Only the left-hemisphere bundles are visualized for bilateral tracts. Pictured data come from a single representative participant.

Figure 2: Changes in Composite Reading Index (left) and SIT (right) scores for intervention (purple) and non-intervention (yellow) participants. Paired t-tests were used to compare pre and post scores within groups, and two-sample t-tests were used to compare scores across groups at a given time point. Significant tests (p < 0.05) are annotated in the figure. Abbreviations: SIT - Symbol Imagery Test.
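As a minimal sketch of the two test statistics named in the Figure 2 caption, the following standard-library Python computes a paired t statistic on within-participant differences and a two-sample t statistic. The caption does not say whether the two-sample test assumed equal variances; the pooled-variance (Student's) form used here is an assumption, and the scores in the usage lines are hypothetical. Two-sided p-values would then follow from the t distribution with the returned degrees of freedom (for example, 2 * scipy.stats.t.sf(abs(t), df)); scipy.stats.ttest_rel and scipy.stats.ttest_ind compute the same statistics directly.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic on within-participant differences (post - pre)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same participants")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1  # statistic, degrees of freedom


def two_sample_t(group_a, group_b):
    """Student's two-sample t statistic with pooled variance (assumed equal variances)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2
                  + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean(group_a) - mean(group_b)) / se
    return t, na + nb - 2  # statistic, degrees of freedom


# Hypothetical standard scores for illustration only (not study data).
pre = [85, 90, 78, 88, 92]
post = [91, 94, 80, 95, 96]
t, df = paired_t(pre, post)
print(f"paired t({df}) = {t:.2f}")
```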

Figure 3: Partial regression plots relating changes in tract microstructure to changes in standardized SIT scores. Confounds included age at first scan, sex, and motion indices at each time point. Values on axes are residuals after accounting for nuisance regressors in the model. Only tracts with an uncorrected p < 0.05 (denoted by *) across all participants are shown: the left AF (left), ILF (middle), and CST (right). No test reached this threshold with the Composite Reading Index. Purple dots represent intervention participants, and yellow dots represent non-intervention participants. The solid black line represents the best fit across all participants, and the dashed purple line represents the best fit for intervention participants only. Abbreviations: AF - Arcuate Fasciculus; ILF - Inferior Longitudinal Fasciculus; CST - Corticospinal Tract.
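The residual axes in Figure 3 correspond to a standard partial regression (added-variable) plot. The following minimal NumPy sketch shows how such coordinates can be obtained via the Frisch-Waugh-Lovell result: both the tract change and the score change are residualized on an intercept plus the confounds, and the slope of the residual-on-residual fit equals the tract coefficient in the full multiple regression. This is not the authors' code (which is in the linked repository). It assumes that the change in SIT score is the outcome and that the confounds enter linearly, and the data are simulated.

```python
import numpy as np


def residualize(y, confounds):
    """Residuals of y after OLS on an intercept plus the confound columns."""
    X = np.column_stack([np.ones(len(y)), confounds])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def partial_regression(delta_tract, delta_score, confounds):
    """Coordinates and slope for a partial regression (added-variable) plot."""
    x_res = residualize(delta_tract, confounds)
    y_res = residualize(delta_score, confounds)
    # Residuals are mean-zero (an intercept was included), so no intercept is needed.
    slope = (x_res @ y_res) / (x_res @ x_res)
    return x_res, y_res, slope


# Simulated data for illustration only; n = 41 mirrors the study's sample size.
rng = np.random.default_rng(0)
n = 41
confounds = np.column_stack([
    rng.uniform(6.5, 9.0, n),   # age at first scan (years), hypothetical range
    rng.integers(0, 2, n),      # sex, binary-coded
    rng.normal(0.5, 0.1, n),    # motion index, session 1
    rng.normal(0.5, 0.1, n),    # motion index, session 2
])
delta_md = rng.normal(0.0, 1.0, n)
delta_sit = -0.4 * delta_md + 0.2 * confounds[:, 0] + rng.normal(0.0, 1.0, n)

x_res, y_res, slope = partial_regression(delta_md, delta_sit, confounds)

# Frisch-Waugh-Lovell check: the slope matches the tract coefficient in the full model.
X_full = np.column_stack([np.ones(n), delta_md, confounds])
beta_full, *_ = np.linalg.lstsq(X_full, delta_sit, rcond=None)
print(np.isclose(slope, beta_full[1]))
```

Plotting y_res against x_res, with a line of the computed slope through the origin, reproduces the layout described in the caption. Refitting on the intervention subset alone would give the dashed line.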

Table 3: Multiple regression outcomes relating changes in age-standardized reading […] Inferior Longitudinal Fasciculus; CST - Corticospinal Tract; SIT - Symbol Imagery Test.