Within-person changes in the aging white matter microstructure and their modifiers: A meta-analysis and systematic review of longitudinal diffusion tensor imaging studies

This meta-analysis and systematic review synthesized data from 30 longitudinal diffusion tensor imaging (DTI) studies on the magnitude, direction, spatial patterns, and modifiers of naturally occurring within-person changes in healthy adult white matter (WM) microstructure. Results revealed: (1) significant within-person declines in fractional anisotropy (FA) in the whole WM (d = -0.12), genu (d = -0.16), and splenium (d = -0.13); (2) greater declines in FA associated with older age, longer follow-up times, and female sex; (3) a possible yet inconclusive vulnerability of late-myelinating WM (the “development-to-degeneration” gradient); and (4) factors decelerating (e.g., physical activity and social activities) and accelerating (e.g., vascular risk factors, biomarkers for Alzheimer’s disease, and alcohol consumption) age-related FA changes. Our findings encourage the consideration of WM as a new target for treatments and interventions against cognitive decline and lay the foundation for studying the plastic and regenerative potential of adult WM in clinical trials. Individual differences in WM changes could aid in the preclinical diagnosis of dementia, opening a window for earlier, more effective treatments.


INTRODUCTION
Human white matter (WM) consists mostly of myelinated axons, whose properties determine the speed and synchrony with which neural signals are transmitted in the brain (Chorghay et al., 2018). WM is supported by a complex network of glial cells (oligodendrocytes, astrocytes, and microglia), each of which plays a distinct role, ranging from myelination and energy supply to immune responses and neuron-glia interactions. WM is also highly susceptible to ischemic injury because its blood flow is limited compared with that of gray matter. More generally, WM is particularly sensitive to metabolic, inflammatory, and vascular dysfunction (Levit et al., 2020; Mendelow, 2015), all of which are hallmarks of brain aging, Alzheimer's disease and related dementias, and even neuropsychiatric disorders such as schizophrenia (Zhao et al., 2022). This vulnerability is accentuated by the metabolically demanding processes of myelin maintenance and long-distance axonal transport (Bartzokis, 2004; Nave, 2010), which are necessary for efficient action potential conduction and for the metabolic support of myelinated axons (Morrison et al., 2013). Postmortem studies in healthy older adults have shown that aging is associated with demyelination and with decreases in axonal density or diameter (Marner et al., 2003; Mason et al., 2001; Peters, 2002; Tse & Herrup, 2017). Similarly, failed myelin repair (Bartzokis, 2004, 2011) and defects in axonal structure and transport (Stokin et al., 2005) have been observed in the early stages of Alzheimer's disease, suggesting that gray matter pathology may be triggered or preceded by WM pathology. Specifically, the "myelin" hypothesis of Alzheimer's disease posits that proteinaceous deposits such as amyloid-β aggregates and tau tangles are byproducts of homeostatic myelin repair processes and of disrupted axonal transport (Bartzokis, 2011). Together, alterations in WM microstructure in both healthy aging and neurodegenerative processes result in a structural "disconnection" of distributed neural networks, which is considered one of the primary mechanisms underlying cognitive decline in healthy aging, Alzheimer's disease, and related dementias (Bartzokis, 2004; Nasrabady et al., 2018).
However, postmortem histopathological examinations provide no insight into how these WM changes unfold over time or into the extent to which the magnitude or pattern of within-person change differs between healthy and pathological aging. These questions can only be addressed in vivo, using non-invasive techniques such as magnetic resonance imaging (MRI). Therefore, this article aims to synthesize the evidence from longitudinal diffusion tensor imaging (DTI) studies on the magnitude, direction, spatial patterns, and possible modifiers of naturally occurring within-person changes in adult WM microstructure. Specifically, we addressed the following questions: What are the magnitude and direction of within-person changes in adult WM microstructure? Over what time periods can WM microstructural decline be detected in healthy adults? Do within-person changes in WM microstructure accelerate with age? Is there regional variability in WM change? What factors modify within-person changes in WM?
To date, WM microstructure in aging, Alzheimer's disease, and related dementias has been studied almost exclusively using diffusion MRI, predominantly with the DTI model (Harrison et al., 2020; Madden et al., 2012). DTI provides a voxel-wise estimate of the magnitude and directionality of water diffusion. Fractional anisotropy (FA) measures the directional dependence of diffusion, reflecting the coherence of fiber orientation within a voxel. Radial diffusivity (RD) and axial diffusivity (AD) represent diffusivity perpendicular and parallel to the main fiber direction, respectively. Finally, mean diffusivity (MD) reflects the overall magnitude of water diffusion within a voxel (Beaulieu, 2002). The magnitude of diffusion is determined by microstructural elements that may hinder diffusion in any direction, such as the density, permeability, and integrity of axonal and myelin membranes, glial activation, microvasculature, and the enlargement or tortuosity of extracellular spaces (Jones et al., 2013). This review focuses on DTI, the most widely used diffusion MRI technique, although we acknowledge that several more advanced diffusion acquisition and modeling methods have been applied in recent cross-sectional studies.
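The four scalar metrics defined above are all derived from the three eigenvalues of the fitted diffusion tensor. The sketch below uses the standard eigenvalue-based definitions and assumes the eigenvalues are sorted so that λ1 is the principal (axial) diffusivity; the function name `dti_metrics` and the example eigenvalues are illustrative only, not taken from the reviewed studies or from any particular software package.

```python
import math


def dti_metrics(l1: float, l2: float, l3: float) -> dict:
    """Scalar DTI metrics from the three diffusion-tensor eigenvalues.

    Assumes l1 >= l2 >= l3, with l1 the diffusivity along the principal
    (fiber) direction. Units follow the eigenvalues (e.g. mm^2/s).
    """
    md = (l1 + l2 + l3) / 3.0   # mean diffusivity: overall diffusion magnitude
    ad = l1                     # axial diffusivity: parallel to the main fiber direction
    rd = (l2 + l3) / 2.0        # radial diffusivity: perpendicular to the fiber
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # Fractional anisotropy: normalized dispersion of the eigenvalues, from 0 to 1
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}


# Illustrative eigenvalues for a coherent fiber bundle (in mm^2/s)
print(dti_metrics(1.7e-3, 0.3e-3, 0.3e-3))
```

For these illustrative values FA is about 0.80. Because FA depends on the spread of all three eigenvalues, the same FA decline can arise from an increase in RD, a decrease in AD, or both.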
The first aim of the study was to determine the magnitude and direction of within-person changes in DTI parameters of adult WM microstructure in older age. Age-comparative (cross-sectional) studies of aging consistently report lower FA, higher MD and RD, and bidirectional age differences in AD (Burzynska et al., 2010). These age differences have been attributed to a loss of "WM integrity," including loss of myelin and axons (Madden et al., 2012). Furthermore, cross-sectional studies have suggested nonlinear trajectories of diffusion parameters across the lifespan, consistent with protracted development or myelination until middle adulthood. Specifically, FA has been shown to peak between 20 and 42 years of age and to decline thereafter, whereas MD reaches a minimum at 18-41 years, followed by a steady increase from middle adulthood onwards (Lebel et al., 2012). An analysis of different diffusion parameters in 3,513 generally healthy people aged 45-77 years from the UK Biobank revealed predominantly nonlinear associations with age; specifically, increases in MD and decreases in FA typically accelerated after age 60 (Cox et al., 2016). Therefore, our central hypothesis was that within-person changes in middle and older age would predominantly involve declines in FA and increases in MD and RD. In addition, we expected these changes to accelerate after the age of 60.
Our second aim was to test the spatial gradients of WM aging. Cross-sectional findings have revealed that WM tracts differ in their susceptibility to aging, and several spatiotemporal gradients have been proposed to explain this selective vulnerability. The overarching model, called the development-to-degeneration, retrogenesis, or last-in-first-out hypothesis, posits that WM regions that myelinate later in development deteriorate earlier with age, possibly due to greater metabolic demands on late-differentiating oligodendrocytes (Bartzokis, 2004; Bartzokis et al., 2004). Cross-sectional DTI data have lent substantial support to the retrogenesis hypothesis (Brickman et al., 2012), with studies showing steeper age-related decline in prefrontal regions and association fibers than in projection fibers (Barrick et al., 2010; Burzynska et al., 2010) and steeper age-related decline in the most anterior sections of the corpus callosum (Bartzokis, 2004; Head et al., 2004; Salat et al., 2005; Sullivan et al., 2010). Therefore, we hypothesized that late-myelinating WM regions, such as the genu of the corpus callosum, would show the greatest within-person declines in FA and increases in RD, possibly reflecting demyelination. In contrast, regions that myelinate earlier, such as the corticospinal tract or posterior sections of the corpus callosum, might show FA declines only in later life (i.e., after age 70).
Our third aim was to explore the role of various modifiers of within-person change in adult WM. We expected chronological age to be the main moderator of declines in WM integrity, with older age associated with a greater magnitude of decline. Furthermore, given the role of sex hormones in promoting myelination and oligodendrocyte proliferation (Ghoumari et al., 2020; Jure et al., 2019; Mendell & MacLusky, 2018) and in modulating brain inflammation (Yilmaz et al., 2019), age-related WM decline might differ between the sexes. So far, cross-sectional DTI studies have reported either greater FA in men (Kochunov et al., 2012; Lebel et al., 2012; Ritchie et al., 2018) or no sex differences across the adult lifespan (Kennedy & Raz, 2009). Thus, our analyses of sex differences were exploratory. Other candidate modifiers of WM aging include hypertension (van Dijk et al., 2004), habitual physical activity (Burzynska et al., 2014; Sexton et al., 2016), and APOE genotype (Sudre et al., 2017). In addition, since people with mild cognitive impairment, subjective cognitive impairment, or risk of Alzheimer's disease show higher MD and lower FA than healthy older adults (Brueggen et al., 2019), we also discuss evidence of within-person change in these groups.
Studying within-person changes in adult WM is important because, for decades, WM was thought to play a passive role in brain function, merely relaying electrical signals between the gray matter regions where information processing occurs. In addition, adult WM was considered "static" after reaching maturity, that is, neither capable of nor involved in neuroplasticity and prone only to deterioration due to age or disease. Recently, rodent studies have shown that cognitive and motor learning in adult animals requires myelin plasticity (Gibson et al., 2014; Hines et al., 2015; Jeffries et al., 2016; McKenzie et al., 2014; Sampaio-Baptista et al., 2013). However, because evidence of training-induced changes in adult human WM microstructure is scarce and inconsistent, WM is still rarely considered a primary target for treatments and interventions against cognitive decline (Mendez Colmenares et al., 2021; Sampaio-Baptista & Johansen-Berg, 2017), which is a missed opportunity. We argue that understanding naturally occurring within-person changes in WM in older age will lay the foundation for studying the plastic and regenerative potential of adult WM in future clinical trials. In this review, we also examined evidence from clinical studies to assess the malleability of adult WM microstructure with experience and to identify the most promising interventions for inducing change.
Taken together, our overarching hypothesis was that WM microstructure undergoes significant within-person changes during adulthood and aging and that these changes can be captured noninvasively with DTI. We hypothesized that within-person changes in WM microstructure in older age: (a) predominantly involve declines in FA and increases in MD and RD; (b) accelerate with age, particularly after the age of 60; (c) follow the development-to-degeneration spatiotemporal pattern, with greater magnitudes of change in late-myelinating regions; and (d) are moderated by follow-up duration, sex, hypertension, lifestyle factors, and genetic risk factors for Alzheimer's disease, and are more pronounced in individuals with mild cognitive impairment or at risk of Alzheimer's disease. To test these hypotheses, we conducted a comprehensive qualitative review of longitudinal DTI studies and performed a meta-analysis on the subset of studies that provided sufficient data.

METHODS
Our study was pre-registered in the PROSPERO database as PROSPERO 2021 CRD42021273127.

Search strategy
A systematic search was performed in the Web of Science and PubMed electronic databases up to July 13, 2021. The main search strategy was based on four key components: longitudinal studies, white matter, diffusion tensor MRI, and healthy adult samples. PubMed was searched for the terms in the title or abstract, whereas Web of Science was searched for the terms in "topic," which includes title, abstract, and keywords. We searched for studies in peer-reviewed journals, applying no limitations on publication year or language. Because researchers use different terms to refer to DTI and may not use the DTI or MRI abbreviations in the abstract or title, we used the broad term "diffusion" in our search query. The PubMed query ("white matter"[Title/Abstract] AND "longitudinal"[Title/Abstract] AND "diffusion"[Title/Abstract] AND "adults"[Title/Abstract]) returned 283 hits. The Web of Science query ("white matter"(Topic) and longitudinal (Topic) and diffusion (Topic) AND "adults"(Topic)) returned 531 hits. On inspecting the results, we noticed that many hits for "longitudinal" were associated with the longitudinal fasciculus. Therefore, we added the term NOT "longitudinal fasciculus" to both queries, resulting in 126 hits in PubMed and 248 hits in Web of Science. In addition, the reference lists of included studies and relevant reviews were searched manually for additional eligible studies.

Study selection
A.M.C. and A.Z.B. independently screened the titles, abstracts, and, where appropriate, full texts of identified citations, and any disagreements were resolved by consensus. For studies to be included in the systematic review, the following criteria had to be met:
4. Included cognitively and neurologically healthy adults. Studies of healthy adults generally excluded participants taking anxiolytics, antidepressants, or antiepileptics and those consuming more than three alcoholic beverages daily. Some studies reported including participants with treated hypertension (Bender, Völkle, et al., 2016; Williams et al., 2019), but this information was not reported in all studies. Animal and patient populations (e.g., schizophrenia, autism, stroke, concussion, substance abuse, prehypertension) were excluded, except for studies involving people with mild cognitive impairment, Alzheimer's disease, and related dementias in older age groups, which were included in the qualitative review.
5. We excluded studies that did not report change (or the effect of time) in DTI parameters as a study outcome. These included studies that reported only differences in change between clinical and healthy populations (Fissler et al., 2017; Fletcher et al., 2013; Lampit et al., 2015; Racine et al., 2019) and studies that used change in DTI only as a correlate of change in cognition, brain perfusion, or baseline physical activity (Maltais et al., 2020; Raffin et al., 2021; Scott et al., 2017; Staffaroni et al., 2019). However, we listed these studies in Appendix A and mention them in the qualitative review of modifiers of WM change.
6. In addition, we excluded two studies with short follow-up times (<4 weeks) (Chen et al., 2020; Nilsson et al., 2021).

Data selection
The PRISMA flowchart provides an overview of the number of articles screened, included, and excluded (Fig. 1). We included a total of 30 studies in the systematic review, of which half had sufficient data to be included in the meta-analysis. Missing outcomes were requested from the corresponding authors: we contacted 25 authors whose original publications lacked sufficient data to calculate standardized mean differences or standard errors and received 13 responses.
Given the variability in the reporting of the four DTI parameters, we focused the meta-analyses on FA to maximize the number of included studies; the other DTI metrics are discussed in the qualitative review.
Of the 30 studies included in the review, the median year of publication was 2015 (range 2009-2021). The median sample size was 56 (range 11-2,125). The average baseline age was 65.3 years (range 18-103 years). The mean follow-up time was 27.7 months (range 2-58 months) (Fig. 2).
When two studies with overlapping samples examined the same aspect of WM structure (Kocevska, Cremers, et al., 2019; Kocevska, Tiemeier, et al., 2019), only one was retained, with preference given to the study with the larger sample size. One study reported multiple follow-up visits (Bender, Völkle, et al., 2016); for the meta-analysis, we used data from its longest follow-up. We included six randomized controlled trials with longitudinal DTI data and collected information from their healthy control groups (Burzynska et al., 2017; Cao et al., 2016; de Lange et al., 2017; Engvig et al., 2012; Lövdén et al., 2010; Voss et al., 2013). We excluded one randomized controlled trial without a control group (Clark et al., 2019).

Risk of bias (quality) assessment
A.M.C. and an external reviewer assessed the risk of bias with the NIH quality assessment tools for observational cohort studies, case-control studies, and pre-post studies with no control group (Study Quality Assessment Tools | NHLBI, NIH, 2013). Studies needed to have clearly defined aims, a clearly specified study population, an adequate description of inclusion criteria, ethical approval, and healthy adults recruited from the community (see Appendix B and Appendix C for more details). In addition, A.Z.B. and A.M.C. checked the quality of the reported MRI methodology and statistics.

Data extraction
A.Z.B. and A.M.C. independently extracted the following details using a structured data abstraction form: MRI method of WM microstructure quantification, study design (number and time between within-person measurements, longitudinal observational vs. intervention), anatomical specificity (global or regional measures of WM microstructure), participant demographics (sample size, age range, age at baseline, percentage of female participants), and results (statistically significant findings, measures of change, and their standard errors, Table 1).

Effect size estimation
Our meta-analyses focused on FA in two regions of interest, the whole WM (n = 12 studies) and the genu of the corpus callosum (n = 9), as these regions allowed us to include the largest number of studies. We included the splenium of the corpus callosum (n = 4) in exploratory analyses. We did not include MD, RD, AD, or other WM regions because too few studies overlapped in reporting these DTI metrics and WM regions (Table 1). We used the R package "metafor" to estimate the mean and standard deviation of the distribution of effect sizes with a random-effects model (Viechtbauer, 2010). As our effect size, we calculated Cohen's d, or the standardized mean difference (SMD), as the difference between the two means (i.e., post minus pre), standardized by the pooled within-sample estimate of the population SD:

$$d = \frac{M_2 - M_1}{SD_{pooled}}, \qquad SD_{pooled} = \sqrt{\frac{SD_1^2 + SD_2^2}{2}},$$

where M1 and SD1 are the mean and standard deviation of the baseline measurement and M2 and SD2 are the mean and standard deviation of the follow-up measurement.
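The effect-size definition above can be sketched as follows, assuming the pooled SD is the root mean square of the baseline and follow-up SDs (which presumes the same participants, and hence equal n, at both visits). The function names and the FA summary statistics in the usage line are hypothetical.

```python
import math


def pooled_sd(sd_baseline: float, sd_followup: float) -> float:
    """Pooled SD for a pre-post design with the same participants at both visits."""
    return math.sqrt((sd_baseline ** 2 + sd_followup ** 2) / 2.0)


def smd_prepost(mean_baseline: float, mean_followup: float,
                sd_baseline: float, sd_followup: float) -> float:
    """Standardized mean difference, follow-up minus baseline.

    Negative values indicate a decline over time (e.g. an FA decrease).
    """
    return (mean_followup - mean_baseline) / pooled_sd(sd_baseline, sd_followup)


# Hypothetical whole-WM FA summary statistics for one study
d = smd_prepost(mean_baseline=0.450, mean_followup=0.446,
                sd_baseline=0.030, sd_followup=0.032)
print(round(d, 3))  # -0.129
```

The post-minus-pre sign convention matches the negative d values reported for FA declines.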
We calculated the standard error of the SMD using a formula that accounts for the covariance between the two measurements, which provides a more accurate estimate of the precision of the SMD, as recommended in the Cochrane Handbook (Section 23.2.7.2).
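A minimal sketch of a correlation-adjusted standard error follows. It assumes the form SE(d) = sqrt(1/n + d²/(2n)) × sqrt(2(1 - r)), which we take to be the paired-design formula in the cited Cochrane Handbook section; the baseline-follow-up correlation r is rarely reported and otherwise has to be assumed. The function name and example values are hypothetical.

```python
import math


def se_smd_paired(d: float, n: int, r: float) -> float:
    """Standard error of a pre-post SMD, adjusted for the correlation r
    between baseline and follow-up measurements in the same n participants.

    Assumed form: sqrt(1/n + d^2 / (2n)) * sqrt(2 * (1 - r)).
    """
    if n < 2 or not -1.0 <= r < 1.0:
        raise ValueError("need n >= 2 and -1 <= r < 1")
    return math.sqrt(1.0 / n + d ** 2 / (2.0 * n)) * math.sqrt(2.0 * (1.0 - r))


# A higher baseline-follow-up correlation yields a smaller standard error
for r in (0.0, 0.5, 0.9):
    print(r, round(se_smd_paired(d=-0.12, n=50, r=r), 4))
```

When r is high, as expected for repeated scans of the same person, the adjusted standard error is considerably smaller, so longitudinal studies receive more weight in the inverse-variance pooling.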

Heterogeneity analysis
We estimated heterogeneity using the I² statistic, which represents the percentage of the total variability in effect sizes that is attributable to differences in true effect sizes across studies rather than to sampling error. Although there is no universal threshold for interpreting I², values of 25%, 50%, and 75% are commonly used to denote low, moderate, and high heterogeneity, respectively. However, I² estimates may be imprecise because they depend on the precision of the individual study effect sizes and on the presence of outliers (Ioannidis et al., 2007).
To address this potential issue, we calculated 95% confidence intervals for I² using the Q-profile method (Viechtbauer, 2007). The heterogeneity variance was estimated with the restricted maximum likelihood (REML) method (Langan et al., 2019). To further explore the heterogeneity of the effect sizes and the robustness of our meta-analysis, we used Graphical Display of Study Heterogeneity (GOSH) plots (Olkin et al., 2012), which display the pooled effect size and heterogeneity across subsets of studies. We then applied three unsupervised machine learning algorithms (k-means, DBSCAN, and Gaussian mixture models) to detect clusters in the GOSH plot data and identify outlying and influential studies. Lastly, to examine potential publication bias, we inspected funnel plots and performed Egger's regression tests for funnel plot asymmetry.
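The GOSH idea can be sketched as follows: fit the pooled model to every non-empty subset of studies and record the pooled effect and I² for each, which yields the point cloud that is then plotted or clustered. For simplicity, this sketch uses a fixed-effect (inverse-variance) estimate per subset and, in place of the k-means, DBSCAN, and Gaussian mixture clustering described above, a simple with-versus-without comparison of mean I² as a stand-in influence heuristic. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def fe_summary(effects, variances):
    """Fixed-effect (inverse-variance) estimate and Higgins' I^2 (%) for one subset."""
    w = [1.0 / v for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return mu, i2


def gosh(effects, variances):
    """Pooled estimate and I^2 for every non-empty subset (2^k - 1 fits)."""
    k = len(effects)
    results = []
    for size in range(1, k + 1):
        for idx in combinations(range(k), size):
            mu, i2 = fe_summary([effects[i] for i in idx],
                                [variances[i] for i in idx])
            results.append((idx, mu, i2))
    return results


def influence_shift(results, study):
    """Mean I^2 of subsets containing a study minus mean I^2 of those without it.

    A large positive value flags the study as a candidate driver of heterogeneity.
    """
    with_s = [i2 for idx, _, i2 in results if study in idx]
    without = [i2 for idx, _, i2 in results if study not in idx]
    return sum(with_s) / len(with_s) - sum(without) / len(without)


# Toy data: three similar studies and one discrepant study (index 3)
effects = [-0.10, -0.12, -0.11, 0.80]
variances = [0.01, 0.01, 0.01, 0.01]
res = gosh(effects, variances)
for s in range(len(effects)):
    print(s, round(influence_shift(res, s), 1))
```

The number of fits grows as 2^k - 1 (4,095 for 12 studies), which is still tractable; for larger meta-analyses, a random sample of subsets can be used instead.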

Analysis of modifiers of change using individual-level data
Lastly, we fitted linear mixed-effects models using the lme4 package in R to data from a subset of studies (n = 6 studies, n = 375 participants) that provided individual FA data (Beck et al., 2021; Bender, Völkle, et al., 2016; Burzynska et al., 2017; Rieckmann et al., 2016; Teipel et al., 2010; Voss et al., 2013). The models included a random intercept for study and fixed effects for time point, age, sex, time until follow-up, and the sex-by-age interaction. To obtain partially standardized regression coefficients, we standardized all quantitative variables but not factors. All analyses were conducted in R version 4.0.1, and statistical significance was set at p < 0.05 (two-tailed).
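Partial standardization as described above can be sketched in plain Python: quantitative columns are z-scored, while factors such as sex, time point, or study are left untouched. The column names and rows are hypothetical, and the model formula in the comment is our reading of the text rather than the authors' code.

```python
from statistics import mean, stdev


def partially_standardize(rows, quantitative):
    """Return copies of `rows` with the named quantitative columns z-scored.

    Factor columns (e.g. sex, time, study) are copied unchanged, so their
    coefficients stay on the original contrast scale.
    """
    out = [dict(r) for r in rows]
    for col in quantitative:
        values = [r[col] for r in rows]
        m, s = mean(values), stdev(values)
        for r in out:
            r[col] = (r[col] - m) / s
    return out


# Hypothetical individual-level records
rows = [
    {"study": "A", "time": "pre", "sex": "F", "age": 60.0, "followup": 12.0, "fa": 0.45},
    {"study": "A", "time": "post", "sex": "M", "age": 70.0, "followup": 24.0, "fa": 0.43},
    {"study": "B", "time": "pre", "sex": "F", "age": 80.0, "followup": 36.0, "fa": 0.41},
]
z = partially_standardize(rows, ["age", "followup", "fa"])
# The standardized data would then enter a mixed model with a random intercept
# for study, e.g. in lme4 notation: fa ~ time + age * sex + followup + (1 | study)
```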

Within-person changes in DTI parameters-a qualitative summary
To provide a qualitative summary of within-person changes in DTI parameters, we analyzed the 30 studies included in our systematic review (Table 1). FA was the most frequently reported metric (29 studies), and 77% of the 30 studies (n = 23) reported a decline in FA. Notably, earlier studies (i.e., published 2009-2014) tended to report no significant changes in FA. MD was the second most frequently reported metric (19 studies), and 53% (n = 16) reported an increase in MD. RD (18 studies) and AD (16 studies) were reported less often, with 43% (n = 13) and 33% (n = 10) of studies reporting increases in RD and AD, respectively. Table 2 provides a visual summary of the changes in each DTI parameter.

Within-person changes in FA of the whole WM-a meta-analysis
To account for the different ways in which effect sizes were reported across studies, we selected the subset of studies that provided sufficient data to calculate the d statistic, defined as the change between two time points divided by the pooled standard deviation. For the whole WM, we obtained data from 12 studies (Fig. 3). The pooled effect showed a significant decline in whole-WM FA (d = -0.1235, 95% CI: -0.21 to -0.03, p = 0.0086), both with and without adjustment for follow-up time as a moderator. Heterogeneity across studies was substantial (I² = 93.5% after adjusting for study follow-up time as a covariate).
To address the high heterogeneity, we performed diagnostic testing for influential cases (outliers) with GOSH plots, followed by sensitivity analyses, which identified two outlier studies (Kocevska, Cremers, et al., 2019; Staffaroni et al., 2018). Repeating the random-effects model without these two outliers confirmed the significant negative change in FA, with reduced heterogeneity (residual I² = 48%; Fig. 4; see Table 3 for model comparisons).
After this reduction, approximately 48% of the total variance in the observed effect sizes can be attributed to heterogeneity among studies, with the remaining 52% attributable to sampling variance. In sum, the model comparison indicated a robust and significant, yet small, within-person decline in whole-WM FA despite the heterogeneity among studies.
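The variance partition behind I² can be illustrated with a minimal random-effects sketch. It uses the closed-form DerSimonian-Laird estimator of the between-study variance τ² as a simpler stand-in for the REML estimator used in this study, so its estimates will generally differ somewhat from metafor's output. Under DerSimonian-Laird, the Q-based I² = (Q - df)/Q coincides with τ²/(τ² + s²), where s² is a "typical" within-study variance. Function names and toy data are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Inverse-variance random-effects pooling with the DerSimonian-Laird
    tau^2 estimator (a closed-form stand-in for REML)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                         # fixed-effect weights
    sum_w = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                            # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0      # Higgins' I^2 (%)
    w_re = [1.0 / (v + tau2) for v in variances]             # random-effects weights
    mu_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {"mu": mu_re, "se": se_re,
            "ci": (mu_re - 1.96 * se_re, mu_re + 1.96 * se_re),
            "tau2": tau2, "Q": q, "I2": i2}


# Toy example: three very precise but discordant studies -> high I^2
res = random_effects_dl([-0.5, 0.0, 0.5], [0.01, 0.01, 0.01])
print(round(res["tau2"], 3), round(res["I2"], 1))
```

In this toy example almost all observed variability is between-study (I² of 96%), whereas in the whole-WM analysis above, after removing the outliers, it is split roughly evenly between true heterogeneity and sampling error.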

Within-person changes in FA of the genu and splenium of the corpus callosum-a meta-analysis
For the genu of the corpus callosum, we obtained data from nine studies. The pooled effect among 550 participants (69.2 ± 6.8 years old) showed a significant negative change in FA (d = -0.1432, 95% CI: -0.22 to -0.06, p = 0.0003, Fig. 5). We noted a moderate level of heterogeneity.
For the splenium of the corpus callosum, we obtained data from four studies. The pooled effect among 176 participants (67.7 ± 4.0 years old) showed a non-significant negative change in FA (d = -0.1399, 95% CI: -0.2881 to 0.0084, p = 0.0644, Fig. 6), with no detectable heterogeneity (residual I² = 0%). However, the confidence interval for this I² estimate was wide (0% to 90.10%), indicating considerable uncertainty about the true level of heterogeneity.

The effect of follow-up time on change in FA
To understand the effect of follow-up time (i.e., the time elapsed between the two measurements) on FA change, we correlated the mean percentage change in FA in the whole WM and in the genu and splenium of the corpus callosum with the mean follow-up time of the studies included in the meta-analyses. Longer follow-up times tended to be associated with greater FA decline in the whole WM and genu, although neither correlation was statistically significant (whole WM: r = -0.28, 95% CI: -0.74 to 0.34, p = 0.361; genu of the corpus callosum: r = -0.53, 95% CI: -0.88 to 0.19, p = 0.134; Fig. 7). We found no correlation between mean percentage change and follow-up time in the splenium, and the wide confidence interval indicated high uncertainty (r = -0.02, 95% CI: -0.96 to 0.96, p = 0.977).
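The confidence intervals for these study-level correlations appear consistent with Fisher's z transformation. A minimal sketch follows; assuming n = 12 studies for the whole WM, it gives approximately -0.74 to 0.35 for r = -0.28, close to the reported interval (the small difference in the upper bound may reflect rounding of r). The function names are hypothetical, and the exact procedure used by the authors is not stated here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation via Fisher's z transform (needs n > 3)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lo, hi = fisher_ci(-0.28, 12)  # whole-WM correlation, assuming n = 12 studies
print(round(lo, 2), round(hi, 2))
```

With only a handful of studies, the standard error 1/sqrt(n - 3) is large, which is why the splenium interval (n = 4) spans almost the entire range from -1 to 1.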

The effect of age and sex on change in FA
To examine the effects of age (at the time of the first or baseline measurement) and sex on within-person changes in DTI parameters, we (1) conducted a qualitative analysis of the 14 studies that included age and sex as covariates in their analyses and (2) performed a quantitative analysis of the studies that provided individual FA data at both time points.

Effects of age: qualitative analysis
Of the 14 studies reporting relevant data, 10 found that older age at baseline was associated with a greater magnitude of FA decline (Beck et al., 2021; Bender, Prindle, et al., 2016; Bender, Völkle, et al., 2016; Burzynska et al., 2017; Pfefferbaum et al., 2014; Sexton et al., 2014; Song et al., 2018; Storsve et al., 2016; Voss et al., 2013; Williams et al., 2019), while one study reported no effect of age on FA change (Barrick et al., 2010) (Beck et al., 2021; Bender, Prindle, et al., 2016; Bender, Völkle, et al., 2016; Sexton et al., 2014; Storsve et al., 2016). Notably, two lifespan studies specifically reported an accelerated decline in FA after the fifth decade of life (Sexton et al., 2014; Storsve et al., 2016). Furthermore, Beck et al. (2021) showed that FA plateaued around the third decade, declined steadily after the age of ~40 years, and decreased at an accelerated rate in older age. MD, AD, and RD decreased until approximately 40-50 years of age and, after a period of stability, increased. This pattern is consistent with previous cross-sectional data (Bartzokis et al., 2004; Lebel et al., 2012), showing an inverted U-shape for FA and a U-shape for the other DTI metrics, with a turning point at approximately 40-50 years of age.
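One simple way to locate the turning point of an inverted-U (or U-shaped) age trajectory is to fit a quadratic in age and take its vertex, -b1/(2·b2). The sketch below does this by ordinary least squares on centered age; it is an illustration only, since the reviewed studies may have used other nonlinear models, and the function name and simulated values are hypothetical.

```python
def quadratic_turning_point(ages, values):
    """Least-squares fit of value = b0 + b1*c + b2*c**2 with c = age - mean(age).

    Returns (vertex_age, b2): the vertex is a peak if b2 < 0 (inverted U,
    as for FA) and a trough if b2 > 0 (U shape, as for MD).
    """
    n = len(ages)
    center = sum(ages) / n
    c = [a - center for a in ages]  # centering keeps the normal equations well conditioned
    cols = [[1.0] * n, c, [ci * ci for ci in c]]
    # Augmented normal equations [X'X | X'y]
    m = [[sum(u[k] * v[k] for k in range(n)) for v in cols]
         + [sum(u[k] * values[k] for k in range(n))] for u in cols]
    # Gaussian elimination with partial pivoting
    for p in range(3):
        piv = max(range(p, 3), key=lambda row: abs(m[row][p]))
        m[p], m[piv] = m[piv], m[p]
        for r in range(p + 1, 3):
            f = m[r][p] / m[p][p]
            for col in range(p, 4):
                m[r][col] -= f * m[p][col]
    # Back substitution
    b = [0.0, 0.0, 0.0]
    for p in (2, 1, 0):
        b[p] = (m[p][3] - sum(m[p][j] * b[j] for j in range(p + 1, 3))) / m[p][p]
    _, b1, b2 = b
    return center - b1 / (2.0 * b2), b2


# Simulated FA-like values peaking at age 45 (hypothetical, noise-free)
ages = list(range(20, 85, 5))
fa = [0.5 - 0.00005 * (a - 45) ** 2 for a in ages]
peak_age, curvature = quadratic_turning_point(ages, fa)
print(round(peak_age, 2), curvature < 0)
```

A quadratic forces the trajectory to be symmetric around its vertex, so it cannot represent a plateau followed by an accelerating decline; more flexible models (e.g., splines) are needed for that shape.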

Effects of sex: qualitative analysis
Of the seven studies that investigated sex differences in within-person changes in DTI parameters, five found no significant sex differences (Beck et al., 2021; Burzynska et al., 2017; Nicolas et al., 2020; Sexton et al., 2014; Teipel et al., 2010), and only two reported significant differences. Williams et al. (2019) found that women (aged 50-95 years) showed a greater decline in FA in the cingulum and a greater increase in MD in the genu of the corpus callosum. In contrast, in a study of very old adults (aged 81-103 years), Lövdén et al. (2014) found that women had a smaller decline in FA in the forceps minor than men.

Effects of age and sex: quantitative analysis in the whole WM
A linear mixed-effects model using individual-level FA data supplied by the authors of six studies (Beck et al., 2021; Bender, Völkle, et al., 2016; Burzynska et al., 2017; Rieckmann et al., 2016; Teipel et al., 2010; Voss et al., 2013) showed that older age, female sex, and longer follow-up time were associated with greater declines in FA. We also observed an interaction between age and sex: the negative effect of age on FA change was about 48% larger in females than in males (Table 4 and Fig. 8).
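Assuming treatment (dummy) coding with males as the reference group and a model in which age, sex, and their interaction enter as fixed effects, the reported 48% figure corresponds to the ratio of the interaction coefficient to the age coefficient. The coefficient symbols below are generic; the actual estimates are those reported in Table 4.

$$\text{slope}_{\text{male}} = \beta_{\text{age}}, \qquad \text{slope}_{\text{female}} = \beta_{\text{age}} + \beta_{\text{age}\times\text{sex}}$$

$$\frac{\text{slope}_{\text{female}} - \text{slope}_{\text{male}}}{\text{slope}_{\text{male}}} = \frac{\beta_{\text{age}\times\text{sex}}}{\beta_{\text{age}}} \approx 0.48$$

Because both coefficients are negative under this reading, their ratio is positive, and the female age slope is roughly 1.48 times the male slope.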
In subsequent exploratory analysis of individual-level FA data, greater baseline age and time until follow-up correlated with greater FA decline in both genu and splenium of the corpus callosum (Table 5).

Spatial patterns of within-person changes: qualitative summary
Due to the wide variability in defining regions of interest among the 30 studies in Table 1, we could not directly compare the effect sizes of FA change across different regions in a meta-analysis.Thus, we offer a qualitative summary of our findings (see Table 2).

Other modifiers of within-person changes in DTI parameters
This section highlights lifestyle, health, and genetic modifiers and correlates of changes in DTI parameters in adult WM investigated among the 30 studies (Table 1).

Health-related modifiers
3.7.1.1. General health indicators. Telomere attrition was associated with greater FA decrease and MD increase in the fornix, even after controlling for physical activity and vascular risk (Staffaroni et al., 2018). In contrast, sleep duration and quality were not related to DTI changes (Kocevska, Cremers, et al., 2019).
3.7.1.2. Cardiovascular risk factors. A higher baseline cumulative burden of vascular risk factors (i.e., hypertension, obesity, elevated cholesterol, diabetes, and smoking) was associated with greater declines in FA in the parahippocampal cingulum, fornix/stria terminalis, and splenium of the corpus callosum and with greater increases in MD in the splenium of the corpus callosum in otherwise cognitively healthy older adults (Williams et al., 2019).
Another study reported trend-level associations between diagnosed hypertension and greater within-person increase in AD and RD ( Bender, Völkle, et al., 2016).

Lifestyle modifiers
3.7.2.1. Physical activity and social activities. A 6-month dance intervention led to a slower decline in FA and a smaller increase in RD in the fornix compared with control and aerobic walking groups, and spending less time sedentary and more time in moderate-to-vigorous physical activity at baseline correlated with a smaller 6-month decline in prefrontal FA (Burzynska et al., 2017). Notably, adding a nutritional supplement (beta-alanine) to walking did not appear to affect within-person changes in WM compared with walking alone. Another 1-year RCT with the same aerobic walking intervention reported no group-level effects of exercise but found that greater gains in aerobic fitness correlated with more positive FA changes in the frontal and temporal lobes (Voss et al., 2013). Engagement in social leisure activities over a 3-year period was associated with increased FA in the corticospinal tract and improved processing speed in individuals older than 80 years (Köhncke et al., 2016). In sum, physical activity and aerobic fitness are promising protective lifestyle factors for slowing down or reversing age-related FA declines. Interestingly, studies considering genetic risk of Alzheimer's disease have reported mixed results: physical activity was associated with greater increases in MD and AD among healthy adults with the APOE ε4 genotype (Raffin et al., 2021) and with increased MD in patients with subjective cognitive impairment (Maltais et al., 2020). These findings suggest that the effects of physical activity on WM change may vary across clinical groups and warrant further investigation.
Engvig et al. (2012) reported a reduced decline in FA in the anterior WM after 12 weeks of memory training compared with controls. Lövdén et al. (2010) documented an increase in FA in the genu in older, but not younger, participants following 100 hours of cognitive training. Similarly, de Lange et al. (2017) found that older adults in the cognitive training group experienced less age-related decline in FA and smaller increases in MD, RD, and AD than the control group in areas including the corpus callosum and the corticospinal tract; these effects were not observed in younger participants. Other 12-week cognitive training interventions found no effects on within-person change (Cao et al., 2016; Lampit et al., 2015).

Genetic risk and neurological modifiers
3.7.3.1. Genetic risk and biomarkers of Alzheimer's disease. APOE ε4 carriers had a significantly greater decline in FA in the genu, body, and splenium of the corpus callosum than non-carriers but did not differ in rates of change in MD (Williams et al., 2019). Healthy older adults with a higher amyloid burden showed accelerated FA decline in the parahippocampal cingulum, body of the corpus callosum, and forceps minor compared with those with a low amyloid burden, even after controlling for hippocampal atrophy (Rieckmann et al., 2016). Other biomarkers, such as YKL-40 and amyloid-beta, have also been found to predict greater within-person changes in MD (Racine et al., 2019) and RD (Song et al., 2018).
3.7.3.2. Cognitive status. Fletcher et al. (2013) reported that greater within-person changes in AD in the fornix were associated with an increased risk of conversion to mild cognitive impairment in healthy older adults, whereas Teipel et al. (2010) found no difference in the magnitude of FA change but greater between-person variability in FA change in participants with mild cognitive impairment than in healthy controls.

DISCUSSION
Our meta-analysis and longitudinal models demonstrated that:
• WM microstructure undergoes significant changes throughout adulthood.
• Within-person changes can be captured noninvasively using DTI.
• The pooled effect size of FA declines was d = -0.12 in the whole WM, d = -0.16 in the genu, and d = -0.13 in the splenium of the corpus callosum.
• The magnitude of within-person changes increases with advancing age in the whole WM, genu, and splenium of the corpus callosum.
• Female sex is associated with greater decline in the whole WM.
• Longer follow-up times were associated with greater decline in FA in the whole WM and genu.
• Regarding the spatiotemporal pattern of WM changes, evidence for an anterior-to-posterior gradient in the corpus callosum remains inconclusive.
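The pooled estimates above are inverse-variance weighted averages of study-level effect sizes, and Fig. 4 refers to a random-effects model. As a minimal sketch of how such pooling works, the Python below implements the DerSimonian-Laird random-effects estimator. This estimator is assumed here for illustration only (the exact estimator and software are not specified in this section), and the input values are hypothetical, not data from the reviewed studies.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool study effect sizes with a DerSimonian-Laird random-effects model.

    effects: per-study standardized effect sizes (e.g., d for FA change)
    variances: per-study sampling variances of those effect sizes
    Returns (pooled_effect, standard_error, tau2).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights: larger SE means smaller weight.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q quantifies between-study heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add the between-study variance tau^2.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2

# Hypothetical per-study values (NOT data from the reviewed studies).
d_values = [-0.05, -0.20, -0.10, -0.15]
d_vars = [0.010, 0.020, 0.015, 0.012]
pooled, se, tau2 = dersimonian_laird(d_values, d_vars)
ci = (pooled - 1.96 * se, pooled + 1.96 * se)
print(f"pooled d = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), tau^2 = {tau2:.4f}")
```

When Q does not exceed its degrees of freedom (k - 1), tau^2 is truncated at zero and the random-effects estimate coincides with the fixed-effect one, as happens with these illustrative inputs.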
The outcomes of the systematic review suggest that:
• Changes observed in the adult WM predominantly include declines in FA and increases in MD and RD and, to a lesser extent, increases in AD.
• Changes in WM can be detected within a relatively short period of 6 months. However, results are more consistent when the follow-up time is longer.
Below, we discuss the main findings, limitations, and recommendations for future longitudinal studies on WM aging.

Magnitude of change
We found that the magnitude of within-person changes increased with advancing age. However, we did not express effect sizes per year, because follow-up times varied widely across studies (from 2 to 58 months) and only two studies had a 12-month follow-up (Cao et al., 2016; Voss et al., 2013). Annualizing the estimates would require the strong assumption that WM changes linearly at a common rate across all studies, which we do not consider reasonable given the large heterogeneity in effect sizes across studies. Our current understanding of the annual rate of FA decline in aging individuals therefore remains limited by the variability in follow-up durations and by potentially nonlinear trajectories of change. Future studies with more uniform follow-up durations are needed to estimate the magnitude of this decline more accurately.
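To make the linearity assumption concrete, the sketch below uses a hypothetical accelerating (quadratic) FA trajectory. Dividing the total change by the follow-up duration yields a "per-year" rate that depends on how long the follow-up was, so annualized effect sizes from studies with different follow-ups would not be directly comparable. The trajectory parameters are illustrative assumptions, not estimates from the reviewed studies.

```python
def fa_trajectory(years, fa0=0.45, slope=-0.002, curvature=-0.0005):
    """Hypothetical accelerating FA trajectory; parameter values are illustrative only."""
    return fa0 + slope * years + curvature * years ** 2

def annualized_change(follow_up_years):
    """Naive per-year change: total change divided by the follow-up duration."""
    delta = fa_trajectory(follow_up_years) - fa_trajectory(0.0)
    return delta / follow_up_years

for months in (12, 24, 48):
    rate = annualized_change(months / 12)
    print(f"{months:>2} months: {rate:+.4f} FA units per year")
```

Under this trajectory, the naive annual rate is slope + curvature × years, so it grows in magnitude with follow-up length; only a strictly linear trajectory (curvature = 0) would give the same annualized value regardless of follow-up.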

Is there regional variability in WM changes?
Our systematic review supports the notion of selective vulnerability to aging and neurodegeneration in late-myelinating WM regions, such as the fornix and genu of the corpus callosum (Lacalle-Aurioles & Iturria-Medina, 2023; Raghavan et al., 2020). Our meta-analysis showed negative changes in FA in the genu and splenium of the corpus callosum with comparable effect sizes. That the changes were significant in the genu but not in the splenium may reflect differences in statistical power between the two analyses (n = 550 vs. n = 176), which leaves the evidence for selective vulnerability of the anterior corpus callosum inconclusive. Given the range of sample sizes in the included studies (11 to 108 for the splenium and 11 to 201 for the genu), future studies should calibrate their sample sizes based on our effect size estimates. Larger samples would provide the power needed to test whether effects are greater in anterior commissural regions such as the genu, in line with current theories of aging.
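One rough way to use the pooled estimates for study planning is a normal-approximation sample-size calculation for a two-sided paired (within-person) test. The sketch below assumes that d is standardized by the SD of change scores, α = 0.05, and 80% power; if d was standardized differently (e.g., by the baseline SD), the required n will differ, and exact t-based calculations give slightly larger values. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def required_n_paired(d, alpha=0.05, power=0.80):
    """Approximate participants needed to detect a within-person standardized
    change d (assumed standardized by the SD of change scores) with a
    two-sided paired test, using the normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / abs(d)) ** 2)

# Pooled FA effect sizes reported in this meta-analysis.
for label, d in [("whole WM", -0.12), ("genu", -0.16), ("splenium", -0.13)]:
    print(f"{label:>8}: d = {d:+.2f} -> n ~ {required_n_paired(d)}")
```

Under these assumptions, detecting d = -0.16 requires roughly 307 participants, well above the largest splenium sample among the included studies.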
While our review did not aim to study WM lateralization, some studies, such as Cao et al. (2016), observed within-person change primarily in the left cingulum and superior longitudinal fasciculus. Additionally, Ritchie et al. (2015) found that modeling the correlation between bilateral tracts improved a single-factor model of cognition. Nevertheless, variance in the FA changes of specific tracts may provide additional information about cognitive decline (Ritchie et al., 2015). Future studies should directly examine WM lateralization in longitudinal samples of healthy aging adults.

Effect of follow-up time and time periods to detect within-person change in WM
In the whole WM and corpus callosum, the meta-analyses and linear mixed-effects models showed a significant effect of follow-up time on changes in FA. Notably, adjusting for follow-up time as a moderator in the meta-analysis had a minimal effect on the pooled effect size estimates. Including studies with widely varying follow-up durations (6-58 months) may have introduced heterogeneity that affected the overall effect size estimate. Nevertheless, our findings suggest that a longer interval between MRI measurements is associated with a greater decline in FA.
However, in the splenium region, despite a significant main effect of follow-up time in the linear mixed-effects model, we did not find a correlation between average follow-up time and FA percent change by study.This lack of correlation may be attributed to lower statistical power, as the meta-analyses and correlational analyses used aggregated effect sizes, whereas the linear mixed-effects models employed data at the individual level.
In our qualitative review, we found that earlier studies with shorter follow-up times and small sample sizes did not find significant within-person changes in WM, whereas more recent research has reported small yet significant effects at shorter follow-up times. Studies with follow-up times ranging from 6 to 58 months reported a decline in FA and increases in MD, RD, and AD over time, with a few exceptions (Kocevska, Cremers, et al., 2019) or mixed findings (Bender, Prindle, et al., 2016; Cao et al., 2016). This suggests that follow-up times shorter than 6 months might be insufficient to robustly detect within-person changes in WM microstructure in healthy adults, especially when sample sizes are limited. The lack of significant within-person changes in the first longitudinal DTI studies may be attributable to the lack of standardization in DTI preprocessing pipelines and the lower quality of diffusion sequences (Lövdén et al., 2010; Mielke et al., 2009; Sullivan et al., 2010). Conversely, advances in data processing and in artifact and noise removal (e.g., of susceptibility distortions or motion) have plausibly increased the sensitivity of DTI data to within-person changes.

Modifiers of within-person change
Our systematic review suggests that aerobic exercise and cognitive training may have subtle effects on WM changes as measured by DTI ( Burzynska et al., 2017;Engvig et al., 2012;Voss et al., 2013).
We also found that genetic factors, such as the APOE ε4 allele, and biomarkers of Alzheimer's disease pathology (amyloid-beta) and chronic inflammation (YKL-40) appear to be promising modifiers of within-person WM changes (Racine et al., 2019; Song et al., 2018; Williams et al., 2019), especially in the fornix, a WM region susceptible to aging and neurodegeneration (Lacalle-Aurioles & Iturria-Medina, 2023). Additionally, Fletcher et al. (2013) noted that greater within-person changes in WM in the fornix predicted conversion to mild cognitive impairment. Similarly, greater within-person declines in WM were predictive of decreased executive function, but not memory, in those with late mild cognitive impairment and dementia (Scott et al., 2017).
Despite the evidence that exercise, physical activity, and increased social engagement affect WM health, many important questions remain. From a practical perspective, we still need to learn how to design exercise interventions that mitigate WM decline. Future research might address questions such as: When is it best to begin lifestyle interventions? Does exercise that combines cognitive stimulation with social interaction (e.g., dancing) have more positive effects on WM? How do social and environmental interactions relate to modifiers of WM health? What types of exercise work best for people with comorbidities such as diabetes and hypertension? Overall, findings on modifiers of within-person change were subtle and inconclusive in several studies, possibly due to varying follow-up times and study protocols.

Heterogeneity among studies
Heterogeneity among the MRI studies included in this review and meta-analysis affects the observed effect sizes of within-person changes in WM DTI metrics over time. We identified several sources of heterogeneity. First, reporting was inconsistent: some studies did not report means and variances at baseline or follow-up, presented only baseline means, or provided latent change scores adjusted for other covariates without the corresponding raw means or unadjusted estimates. Variation in the statistical measures reported (e.g., M±SE, M±SD, median±SD, β) further complicated comparing and synthesizing the results across the 30 studies (see Table 1 for more details). Moreover, most studies examined distinct regions of interest and often lacked consistent data on modifiers of healthy aging, such as hypertension and lifestyle factors. This restricted our capacity to conduct a meta-analysis with higher statistical power and impeded our efforts to quantitatively assess the influence of these modifiers on within-person changes in WM.
Second, there were significant methodological differences in the acquisition and processing of DTI data. For example, although the great majority of the studies used the TBSS procedure, which is designed to minimize the effects of anatomical differences in samples with wide age ranges, several studies implemented various customizations of TBSS processing (Bender, Prindle, et al., 2016; Bender, Völkle, et al., 2016; Coelho et al., 2021; Engvig et al., 2012). Other studies employed alternative approaches, such as manual drawing of ROIs and other semiautomated segmentation methods to extract subregions of the corpus callosum (Charlton et al., 2010; Lövdén et al., 2010; Mielke et al., 2009; Song et al., 2018; Staffaroni et al., 2018; Williams et al., 2019), or tractography-based methods (Kocevska, Cremers, et al., 2019; Storsve et al., 2016; Sullivan et al., 2010; Vik et al., 2015). Different ways of extracting regional DTI data may affect not only the mean values of the DTI metrics but also their sensitivity to change. For example, skeletonized data are extracted from the centers of tracts, which omits areas of lower FA due to partial volume effects or white matter lesions, both of which may be confounded with aging. Consequently, skeletonization tends to overestimate FA and underestimate MD, RD, and AD compared with methods such as manual ROI delineation or atlas-based segmentation, which consider the whole tract.
Notably, DTI, particularly when processed using the standard TBSS pipeline, has shown robust stability over short periods. TBSS studies have indicated low test-retest variability in DTI metrics, with mean differences of less than 0.4%, variance below 1.2% across and within scanner locations, and a between-site intraclass correlation coefficient (ICC) exceeding 0.80 (Melzer et al., 2020). Similarly, a multisite study of healthy older adults found that FA had the strongest test-retest reliability, with reproducibility errors consistently within the 2-4% range (Jovicich et al., 2014). Another study of healthy older adults comparing conventional DTI measures with free water elimination diffusion MRI found that, irrespective of the diffusion analysis method, FA was the most reliable metric (ICC: 0.87 ± 0.05) and MD the least reliable (ICC: 0.81 ± 0.09) (Albi et al., 2017).
However, we recognize that different processing pipelines can introduce variability and that the reliability of TBSS can vary by region and by registration quality across images. We therefore recommend that future longitudinal studies, especially those employing varied diffusion models or preprocessing methods, evaluate test-retest reliability in a subset of participants at baseline. Thus, while the reviewed studies suggest that longitudinal changes in DTI primarily reflect aging rather than measurement error, the implications of methodological differences for estimates of within-person WM change remain unresolved and merit future investigation.
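As a minimal sketch of the recommended baseline test-retest check, the Python below computes ICC(2,1) (two-way random effects, absolute agreement, single measurement) and the mean absolute test-retest percent difference for a subset of participants scanned twice. The choice of ICC form is an assumption for illustration (the cited studies may have used other ICC variants), and the FA values are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: one row per participant, each holding that participant's value
            at every session (e.g., [test, retest]).
    """
    n = len(scores)        # participants
    k = len(scores[0])     # sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def mean_abs_percent_diff(pairs):
    """Mean absolute test-retest difference as a percentage of each pair's mean."""
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)

# Hypothetical FA test-retest pairs for a baseline subset (illustrative only).
fa_pairs = [[0.452, 0.455], [0.471, 0.468], [0.430, 0.433], [0.489, 0.487], [0.445, 0.447]]
print(f"ICC(2,1) = {icc_2_1(fa_pairs):.3f}")
print(f"mean |diff| = {mean_abs_percent_diff(fa_pairs):.2f}%")
```

Absolute agreement is used here because systematic session-to-session shifts (e.g., after a scanner upgrade) would otherwise masquerade as within-person change.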
Lastly, given that probabilistic tractography can have varying degrees of reproducibility and reliability (Maier-Hein et al., 2017) and that TBSS-ROI in standard space has shown excellent precision and reproducibility (Cai et al., 2021), we suggest that TBSS-ROI be the method of choice for comparing and pooling results in future meta-analyses. In summary, our results show consistent negative changes in FA despite heterogeneity in DTI protocols and analyses; however, we could not determine how each potential source of heterogeneity influenced the pooled effect size in our analyses.
Further recommendations for design and reporting in future longitudinal studies on WM:
- Use the standard TBSS-ROI procedure, at least as a point of reference. In doing so, use a study-specific "mean FA" and "skeleton" (rather than a standard template).
- Use a minimum follow-up time of 6 months to detect significant within-person changes in WM microstructure in healthy adults, especially when sample sizes are small.
- Include a properly designed control group when conducting an RCT. A crossover design, although beneficial for controlling for between-person variability, would not allow observation of the effects of time on natural within-person changes in WM.
- In both case-control studies and RCTs, report not only the effect of the intervention on within-person changes in WM but also the changes observed within the control groups.
- Acquire richer diffusion data, for example, by using more diffusion-weighted directions and b = 0 images, since a higher-quality diffusion sequence can increase the reproducibility of FA and MD in older adults.

Conclusions and future directions
This study is the first attempt to synthesize observational longitudinal changes in adult WM microstructure, providing estimates of the effect sizes, direction, regional variability, and modifiers of changes in DTI parameters. We also provided specific recommendations to ensure comparability and reproducibility in future longitudinal studies on WM. Our results can thus serve as a reference for the expected effect and sample sizes when designing observational studies or randomized clinical trials with DTI of the WM as the outcome variable. Furthermore, our results suggest that many protective and risk factors influence within-person deterioration in WM microstructure; targeting these factors in clinical trials aimed at slowing down or reversing this decline, through both lifestyle and pharmaceutical means, may therefore provide a good return on investment. Future longitudinal studies may also benefit from complementing DTI with more specific quantitative MRI techniques (Weiskopf et al., 2021), such as myelin-water imaging, advanced diffusion imaging, quantitative susceptibility mapping, or quantitative magnetization transfer. Despite its sensitivity to age-related differences and changes, DTI provides limited insight into the integrity of myelin and axons. Relatedly, FA and MD are both computed from the eigenvalues of the diffusion tensor, which also define axial and radial diffusivity. Thus, the different DTI metrics are not mathematically independent, and their correlations may differ as a function of both local microstructure and age-related processes (Burzynska et al., 2010). Such findings highlight the challenge of linking DTI metrics to specific biological processes. Longitudinal studies using more advanced techniques are currently emerging (Beck et al., 2021) and will help identify the biological underpinnings that drive changes in DTI metrics. This may facilitate the development of new non-pharmacological and pharmacological interventions targeting WM pathology to complement efforts focused on gray matter pathology in aging and dementia.
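The dependence among DTI metrics follows from their standard definitions in terms of the diffusion tensor eigenvalues λ1 ≥ λ2 ≥ λ3: AD = λ1, RD = (λ2 + λ3)/2, MD = (λ1 + λ2 + λ3)/3, and FA = sqrt(3/2) · sqrt(Σ(λi - MD)²) / sqrt(Σλi²). The sketch below, with hypothetical eigenvalues in units of 10⁻³ mm²/s, shows that a pure increase in radial diffusivity simultaneously lowers FA and raises MD while AD is unchanged.

```python
import math

def dti_scalars(l1, l2, l3):
    """Standard DTI scalar metrics from tensor eigenvalues (l1 >= l2 >= l3)."""
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity
    ad = l1                            # axial diffusivity
    rd = (l2 + l3) / 2.0               # radial diffusivity
    num = math.sqrt((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
    den = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    fa = math.sqrt(1.5) * num / den    # fractional anisotropy
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}

# Hypothetical eigenvalues: baseline vs. a follow-up with higher radial diffusion only.
base = dti_scalars(1.7, 0.3, 0.3)
later = dti_scalars(1.7, 0.4, 0.4)
for name in ("FA", "MD", "AD", "RD"):
    print(f"{name}: {base[name]:.3f} -> {later[name]:.3f}")
```

Because a single eigenvalue change moves FA, MD, and RD together, changes in these metrics across studies are expected to be correlated rather than independent lines of evidence.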

DATA AND CODE AVAILABILITY
This study primarily worked with summary effect sizes and individual-level data, which were obtained directly from the authors of the original studies or extracted from the original publications. In compliance with data sharing policies and out of respect for the confidentiality of individual-level data, we can provide a detailed summary of the effect sizes upon reasonable request.

Fig. 3. Forest plot showing standardized effect sizes of FA decline in the whole WM using summary statistics across 12 studies. Box size represents study weights. At the bottom, we display the final summary estimates with 95% CIs for unadjusted vs. adjusted models (accounting for study follow-up time as a moderator). Each study's weight is the inverse of the variance of its effect size estimate, so the larger the standard error of an estimate, the smaller its weight.

Fig. 4. Forest plot showing standardized effects of FA change in the whole WM using summary statistics across 10 studies (after omitting two outlier studies). Box size represents study weights. At the bottom, we display the final summary estimate with 95% CI for the random-effects model. Removed as outliers: Staffaroni et al., 2018 and Kocevska, Cremers, et al., 2019. Each study's weight is the inverse of the variance of its effect size estimate, so the larger the standard error of an estimate, the smaller its weight.

Fig. 5. Forest plot showing standardized effects of FA change in the genu of the corpus callosum across nine studies. Box size represents study weights. At the bottom, we display the final summary estimates with 95% CIs for unadjusted vs. adjusted models accounting for study follow-up time as a moderator. Each study's weight is the inverse of the variance of its effect size estimate, so the larger the standard error of an estimate, the smaller its weight.

Fig. 6. Forest plot showing standardized effects of FA change in the splenium of the corpus callosum across four studies.

Fig. 7. Mean study follow-up time was associated with greater decline in FA in the whole WM and genu (more negative mean % change). The regression lines represent the results of a linear model fitted to the data. The shaded area around each line represents the standard error. Points display the percent change for each study.

Fig. 8. Older age correlated with more negative change in the FA of the whole WM.Each point represents an individual's predicted FA change based on the linear mixed-effects analyses.The solid lines represent the linear regression line for each study.

4.3. Recommendations for design and reporting in future longitudinal studies on WM
Based on the challenges we encountered in conducting the meta-analyses, we suggest that future studies should strive to:
- Standardize the reporting of results. This should include, at a minimum, (a) parameter estimates for within-person changes in DTI, (b) standard deviations, (c) standard errors, and (d) pre- and post-measurement mean values. This information should always be included, for example, in Supplementary Materials.
- Report effect size estimates and 95% confidence intervals rather than p-values only.
- Consistently provide information on (a) mean follow-up times, (b) age at baseline and follow-up, (c) sample size at baseline and follow-up, and (d) correlations between pre- and post-measurements.
- Report null findings when appropriate.
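The pre-post correlation matters because, for within-person change, the standard deviation of change scores, and therefore any effect size standardized by it, depends on the correlation between baseline and follow-up measurements. The sketch below assumes the effect size is d_z (mean change divided by the SD of change) with the common large-sample variance approximation 1/n + d_z²/(2n); other standardizers (e.g., the baseline SD) are also used in the literature, and the summary statistics and function names are hypothetical.

```python
import math

def sd_of_change(sd_pre, sd_post, r):
    """SD of (follow-up minus baseline) scores, given the pre-post correlation r."""
    return math.sqrt(sd_pre ** 2 + sd_post ** 2 - 2.0 * r * sd_pre * sd_post)

def standardized_change(mean_pre, mean_post, sd_pre, sd_post, r, n):
    """Within-person standardized mean change d_z and its approximate variance.

    d_z = mean change / SD of change; var(d_z) ~ 1/n + d_z**2 / (2n).
    """
    d_z = (mean_post - mean_pre) / sd_of_change(sd_pre, sd_post, r)
    var = 1.0 / n + d_z ** 2 / (2.0 * n)
    return d_z, var

# Hypothetical summary statistics (illustrative only): identical means and SDs,
# but different pre-post correlations.
for r in (0.5, 0.8, 0.95):
    d_z, var = standardized_change(0.460, 0.455, 0.020, 0.020, r, n=100)
    print(f"r = {r:.2f}: d_z = {d_z:+.3f} (SE = {math.sqrt(var):.3f})")
```

The same raw FA change of -0.005 yields d_z of about -0.25 at r = 0.5 but about -0.79 at r = 0.95, so studies that omit r cannot be converted to a common within-person effect size without additional assumptions.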

Table 2. Within-person change in DTI metrics, moderators of change, and regional differences. The color coding in the heat map represents the direction of change in DTI-MRI parameters (FA, MD, RD, AD): a positive change is shown in red, a negative change in blue, and no change in gray. The following abbreviations are used in the table: ACR: anterior corona radiata, AD: axial diffusivity, ALIC: anterior limb of internal capsule, ATR: anterior thalamic radiation, BCC: body corpus callosum, CING: cingulum, CST: corticospinal tract, Fmin: forceps minor, FA: fractional anisotropy, FX: fornix, GCC: genu corpus callosum, MD: mean diffusivity, PLIC: posterior limb of internal capsule, RD: radial diffusivity, SLF: superior longitudinal fasciculus. Δ: change, NA: Not Applicable/Not Available, and ns: Not Significant.

Table 3. Meta-analysis of within-person declines in FA in the whole WM: comparison of the full model and the model excluding influential studies.

Table 4. Linear mixed-effects analysis of within-person change in the whole WM.

Table 5. Linear mixed-effects analysis of within-person change in the corpus callosum. We recorded 165 participants, with 330 observations at baseline and follow-up across three study groups. For the splenium, we documented 176 participants, with 352 observations at two time points across four study groups.