Abstract
Developmental cognitive neuroscience aims to shed light on evolving relationships between brain structure and cognitive development. To this end, quantitative methods that reliably measure individual differences in brain tissue properties are fundamental. Standard qualitative MRI sequences are influenced by scan parameters and hardware-related biases, and also lack physical units, making the analysis of individual differences problematic. In contrast, quantitative MRI can measure physical properties of the tissue but with the cost of long scan durations and sensitivity to motion. This poses a critical limitation for studying young children. Here, we examine the reliability of an efficient quantitative multiparameter mapping method—magnetic resonance fingerprinting (MRF)—in children scanned longitudinally. We focus on T1 values in white matter, since quantitative T1 values are known to primarily reflect myelin content, a key factor in brain development. Forty-nine children aged 8–13 years (mean 10.3 years ± 1.4) completed 2 scanning sessions 2–4 months apart. In each session, two 2-min 3D-MRF scans at 1 mm isotropic resolution were collected to evaluate the effect of scan duration on image quality and scan–rescan reliability. A separate calibration scan was used to measure B0 inhomogeneity and correct for bias. We examined the impact of scan time and B0 inhomogeneity correction on scan–rescan reliability of values in white matter, by comparing single 2-min and combined two 2-min scans, with and without B0 correction. Whole-brain voxel-based reliability analysis showed that combining two 2-min MRF scans improved reliability (Pearson’s r = 0.87) compared with a single 2-min scan (r = 0.84), while B0 correction had no effect on reliability in white matter (r = 0.86 and 0.83 4- vs. 2-min). Using diffusion tractography, we segmented major white matter fiber tracts and examined the profiles of MRF-derived T1 values along each tract. We found that T1 values from MRF showed similar or greater reliability compared with diffusion parameters. Lastly, we found that R1 (1/T1) values in multiple white matter tracts were significantly correlated with age. In sum, MRF-derived T1 values were highly reliable in a longitudinal sample of children and replicated known age effects. Reliability in white matter was improved by longer scan duration but was not affected by B0 correction, making it a quick and straightforward scan to collect. We propose that MRF provides a promising avenue for acquiring quantitative brain metrics in children and patient populations where scan time and motion are of particular concern.
1 Introduction
Developmental cognitive neuroscience aims to shed light on evolving relationships between brain structure and cognitive development. To this end, quantitative methods that reliably measure individual differences in the structure of brain tissue are fundamental. Commonly used qualitative MRI sequences, for example, T1-weighted and T2-weighted contrasts, are useful for delineating anatomy and detecting pathologies. However, these qualitative measures do not provide quantitative metrics that reliably index individual differences in tissue properties (Lerch et al., 2017). Qualitative MRI measures are influenced by scan parameters and hardware-related biases (for review, see Cheng et al., 2012) and also lack physical units that can accurately quantify differences among participants. Images also may appear dramatically different due to specific scanner and protocol settings, making it hard to repeat and compare across studies, and within the same individual over time.
In contrast, quantitative MRI (qMRI) techniques aim to quantify specific physical properties of the tissue that shed light on cellular structure. Specifically, quantitative longitudinal relaxation mapping (T1) has physical units (milliseconds) that are reliable across different hardware setups (Mezer et al., 2013) and have been reported to be primarily sensitive to myelin content in the white matter (Lutti et al., 2014; Stüber et al., 2014; Warntjes et al., 2017). This specificity is crucial to uncover the neurobiological mechanisms of brain development, degeneration, and disease. For example, changes in T1 values have been described in multiple sclerosis (Harper et al., 2023; Stevenson et al., 2000), reflecting alterations in axon and myelin composition. Similarly, quantitative T1 has been used to study aging (Eminian et al., 2018; Filo et al., 2019), and degenerative and inflammatory processes in clinical conditions (Gozdas et al., 2021; Menke et al., 2009; Moallemian et al., 2023), as well as lifespan maturation and degeneration of white matter (Yeatman, Wandell, et al., 2014). Combining quantitative T1 with diffusion measurements can provide complementary information and support inferences about biological processes. For example, T1 can differentiate between white matter tracts that have similar diffusion values (Schurr et al., 2019; Yeatman, Weiner, et al., 2014), while diffusion metrics seem to be more sensitive to learning induced plasticity (Huber et al., 2021).
Although quantitative T1 mapping can provide insights into neurobiological mechanisms, it is not commonly incorporated into studies of child development. Existing T1 protocols typically require long acquisition times, which vary from 10 up to 25 min, depending on the protocol type and resolution (e.g., Callaghan et al., 2014; Deoni et al., 2005; Gracien et al., 2017; Helms et al., 2008; Kecskemeti et al., 2016; Leutritz et al., 2020; Marques & Gruetter, 2013; Mezer et al., 2013; O’Muircheartaigh et al., 2019; Sanchez Panchuelo et al., 2021). This limits the feasibility for children and clinical populations that have difficulties staying in the scanner for long periods of time. Moreover, children tend to move while in the scanner (Barkovich et al., 2019; Greene et al., 2018), which is a major challenge for many quantitative measures that require multiple images to be collected in sequence. For all these reasons, loss of data is a major barrier to large-scale quantitative MRI studies in children. When scans are required for medical reasons, it is a common practice to sedate children in order to get good data quality, though sedation also has health implications (Artunduaga et al., 2021; Jaimes et al., 2018; Wilder et al., 2009) and is, therefore, rarely used in research. This makes the development of reliable sequences with short scan times of particular importance for pediatric research and practice.
Magnetic resonance fingerprinting (MRF) is an efficient quantitative MRI technique that has the potential to address these limitations by providing a rapid and high-resolution data acquisition, particularly with recent advancements that have further improved its acquisition efficiency (Cao et al., 2019, 2022; Liao et al., 2024; Ma et al., 2013, 2018). MRF is designed to simultaneously quantify different tissue properties in a single fast acquisition. With recent improvements, it can achieve whole-brain 1-mm isotropic mapping in 2 min, which paves the way for using quantitative MRI across a broad range of research and, eventually, in clinical practice. Several studies have assessed the reliability and reproducibility of MRF (Buonincontri et al., 2019; Fujita et al., 2023; Gracien et al., 2020; Körzdörfer et al., 2019; Wicaksono et al., 2023), and others reported excellent agreement between MRF values compared with gold standard quantitative sequences (Jiang et al., 2017; Liao et al., 2017; Ma et al., 2013; Statton et al., 2022; Zhou et al., 2023), but these analyses have been done in phantoms and healthy human adults. Despite the accumulating evidence validating MRF as a stable technique, the use of MRF in pediatric studies has been scarce (Chen et al., 2019; Kim et al., 2023; Liao et al., 2024). To date, the reliability and quality of MRF-based measurements in pediatric and clinical populations, where data quality is a major challenge, have never been evaluated. The aim of the current study is to address this gap by providing a thorough and rigorous assessment of MRF scan–rescan reliability in a sample of children. This is an essential first step for adopting MRF in developmental studies, since the reliability of a measurement is the upper limit for detecting any correlations with other measurements (e.g., brain–behavior correlations) and how they evolve over time.
To this end, in the current study we evaluate the scan–rescan reliability of MRF-estimated T1-maps in a group of children scanned twice, several months apart. We show that MRF-derived T1 values are (1) highly reliable, (2) precisely capture individual differences in tissue properties, and (3) replicate known age effects in white matter tracts. In addition, we examine different reconstruction pipelines and evaluate the influence of different pipeline choices on scan–rescan reliability. We conclude with recommendations for incorporating MRF acquisitions in studies of brain development in health and disease.
2 Methods
2.1 Participants
In total, 49 children aged 8–13 years (mean 10.3 years ± 1.4; 29/20 female/male) completed 2 scanning sessions 2–4 months apart at the Stanford University Center for Neurobiological Imaging, as part of an ongoing longitudinal study. All children provided informed assent to participate in the study and their parents provided written consent. The study protocol was approved by the Stanford University School of Medicine Institutional Review Board. Before the first scan, children were acclimated to the MRI environment with a mock MRI scanner. All data were acquired on a 3T GE Discovery MR750 UHP (GE Healthcare, Milwaukee, WI, USA), equipped with a Nova 32-channel head coil.
2.2 Data acquisition
2.2.1 MRF acquisition
In each session, we collected two 3D-MRFs and a separate “PhysiCal” calibration scan. MRF data were acquired using tiny golden-angle shuffling MRF with optimized spiral-projection trajectories as proposed in Cao et al. (2022). In the MRF sequence, an adiabatic inversion-prepared pulse is used with an inversion time of 15 ms, followed by acquisition groups. Each acquisition group contains 500 TRs with varying flip angles. A resting time of 1.2 s follows each acquisition group to allow for signal recovery before the next acquisition group. Different acquisition groups use different complementary spiral k-space trajectories. A total of 16 acquisition groups were used in each MRF sequence to obtain 1-mm whole-brain quantitative (2-min scan time). The field of view (FOV) was 220 × 220 × 220 mm3, with reconstruction matrix of 220 × 220 × 200. While the two MRF scans used the same parameters, they included complementary k-space acquisition trajectories to allow them to be combined synergistically to produce a 4-min scan with reduced k-t undersampling. We chose to combine two 2-min acquisition rather than a single 4-min acquisition due to evidence that shorter sequences with more breaks decrease motion and improve scan quality in children (Meissner et al., 2020). In addition, a unified, rapid calibration sequence called Physics Calibration (PhysiCal) (Iyer et al., 2020) was used to measure B0 inhomogeneity within the same FOV, with resolution of 2 × 2 × 2 mm3 in 25 s.
2.2.2 Diffusion-weighted acquisition
We collected multi-shell dMRI data with 180 diffusion-weighted volumes, distributed across 3 shells with the following gradient scheme: 30 directions with b = 1000 s/mm2, 60 directions with b = 2000 s/mm2, and 90 directions with b = 3000 s/mm2, as well as 14 reference volumes without diffusion weighting (b = 0 s/mm2). We used a repetition time (TR) of 3335 ms with echo time (TE) of 87 ms to obtain spatial resolution of 1.5 mm3 isotropic voxels in 84 axial slices. This was achieved using a hyperband acceleration with a slice acceleration factor of 4. The scan duration totaled 11 min and was broken down to two 5.5-min scans to allow children a short break. An additional scan of six non-diffusion-weighted volumes with a reversed phase encoding direction and the same parameters was also acquired to correct for echo-planar imaging (EPI) distortions.
2.2.3 Anatomical acquisition
A high-resolution T1-weighted (T1w) anatomical scan was acquired using GE’s BRAVO sequence, which is a fast spoiled gradient echo (SPGR) with a spatial resolution of 0.9 mm3 isotropic voxels and the following parameters: inversion time (TI) of 450 ms, flip angle of 12 degrees, receiver bandwidth of 41.67 kHz, field of view 23 cm, and a matrix of 256 × 256. The scan duration was 4:47 min.
2.3 Data processing
2.3.1 MRF image reconstruction
2.3.1.1 B0 field inhomogeneity estimation
From the PhysiCal data, highly under-sampled multi-echo images were reconstructed with parallel imaging and compressed sensing methods as described in Iyer et al. (2020). Subsequently, a robust multi-echo general linear modeling robustly recovered an artifact-free B0 map (Corbin et al., 2019).
2.3.1.2 Subspace reconstruction without B0 inhomogeneity correction
The reconstruction used a spatiotemporal subspace modeling with locally low rank (LLR) constraint based on previous literature (Cao et al., 2022). The MRF dictionary was pre-calculated using the extended phase graph method (Weigel, 2015). The T1 dictionary entries covered the range of 20 to 3000 ms, with a step size of 20 ms. The dictionary also covered higher values between 3000 and 5000 ms (corresponding to CSF), with a coarse step size of 200 ms. A singular value decomposition (McGivney et al., 2014) was applied on the pre-calculated dictionary to get the first five temporal principal components, termed as subspace basis Φ. The coefficient maps of the bases can be reconstructed by this formula:
where is the under-sampling pattern, is the nonuniform Fourier transform (NUFFT), is the coil sensitivity map, is the acquired k-space data, is LLR term, and is the regularization.
2.3.1.3 Subspace reconstruction with B0 inhomogeneity correction
With the presence of B0 field inhomogeneity, the acquired signal becomes and the equation becomes
where is the conjugate phase demodulation with acquisition time t accumulated during the acquisition trajectory at an off-resonance frequency . In this study, we incorporated a time-segmented B0 correction method (Irarrazabal et al., 1996; Noll et al., 1991) to correct the B0-inhomogeneity induced image blurring.
2.3.1.4 Template matching
Subspace-based compression was also applied to the pre-calculated dictionary, which can significantly reduce the computational load and memory requirements. Then, a cross-correlation template matching was applied to the reconstructed coefficient maps using the compressed dictionary to obtain quantitative T1 values, as described in Ma et al. (2013).
2.3.1.5 MRF reconstruction pipelines
In order to assess the scan time effects on image quality, data were reconstructed in four ways: (1) using a single 2-min MRF data. (2) Combining two 2-min MRF scans (4-min). (3) Using a single 2-min MRF in combination with B0 correction. (4) Combining two 2-min MRF scans with B0 correction. Importantly, to account for any motion between the two 2-min MRF scans, we applied the following procedure to coregister the second scan to the first one, before reconstructing the combined 4-min data: We first reconstructed the two 2-min MRFs separately and then estimated the rigid transformation motion parameters between them using AFNI 3dvolreg (Cox, 1996; Cox & Hyde, 1997). We then used these parameters to update the second MRF acquisition’s trajectory and data in k-space. Finally, we combined the k-space data and trajectories of the first and second 2-min acquisitions to reconstruct the combined 4-min data. This approach ensured that the combined 4-min data are in the exact same space as the first 2-min data to which it was compared in our subsequent analyses. This registration procedure improved the quality of the 4-min reconstruction and reduced blurring (as observed qualitatively).
These computations were performed on a Linux server using MATLAB R2022a (The MathWorks, Natick, MA, USA) and Python scripts (see https://github.com/SophieSchau/MRF_demo_ISMRM2022/), with the following specifications: Ubuntu 20.04 with 104 Core Intel Xeon Gold 2.20G CPUs, an Nvidia RTX A6000, and 503GB RAM. Reconstruction of each 2-min acquisition took about 40 min with B0 correction and 10-min without B0 correction. The 4-min data reconstructions took 57 min with B0 correction and 16 min without B0 correction.
2.3.2 dMRI preprocessing
Diffusion data were pre-processed using the default pipeline in QSIprep 0.19.1 (Cieslak et al., 2021). The T1w image was corrected for intensity non-uniformity using N4BiasFieldCorrection (Tustison et al., 2010; ANTs 2.4.3) and reoriented into AC-PC alignment. This image was used as an anatomical reference throughout the workflow. The diffusion data were denoised with MP-PCA denoising as implemented in MRtrix3’s dwidenoise (Tournier et al., 2019; Veraart et al., 2016) with a 5-voxel window. After denoising, the mean intensity of the diffusion-weighted series was adjusted so that the mean intensity of the b = 0 images matched across each separate DWI scanning sequence. B1 field inhomogeneity was corrected using dwibiascorrect from MRtrix3 with the N4 algorithm (Tustison et al., 2010). FSL’s Eddy (version 6.0.5.1:57b01774) was used for head motion correction and Eddy current correction (Andersson & Sotiropoulos, 2016). Eddy was configured with a q-space smoothing factor of 10, a total of 5 iterations, and 1000 voxels used to estimate hyperparameters. A linear first level model and a linear second level model were used to characterize Eddy current-related spatial distortion. q-Space coordinates were forcefully assigned to shells. Eddy’s outlier replacement was run (Andersson et al., 2016). Data were grouped by slice, only including values from slices determined to contain at least 250 intracerebral voxels. Groups deviating by more than 4 standard deviations from the prediction had their data replaced with imputed values. b = 0 reference images with reversed phase encoding directions were used along with an equal number of b = 0 images extracted from the primary diffusion scans. The susceptibility-induced off-resonance field was estimated using a method similar to that described in Andersson et al. (2003). The field maps were ultimately incorporated into the Eddy current and head motion correction interpolation. The final interpolation was performed using the jac method. Finally, the two diffusion-weighted series were concatenated and resampled to the anatomical AC-PC space with 1.5 mm isotropic voxels.
2.3.3 Fiber tractography
Voxel-level diffusion modeling and whole-brain tractography were carried out using QSIprep’s MRtrix3 (Tournier et al., 2019) reconstruction pipeline. Within each voxel, diffusion was modeled with constrained spherical deconvolution (CSD) using the dhollander algorithm for estimating the fiber response function (Dhollander et al., 2016, 2019). Whole-brain tractography was carried out using the iFOD2 algorithm with the multi-shell multi-tissue method (msmt-csd; Tournier et al., 2004, 2008). One million streamlines were generated and their length was limited to the range 50–250 mm, resulting in a whole-brain tractogram in each subject’s native space for each timepoint.
pyAFQ (Kruper et al., 2021; Yeatman, Dougherty, Myall, et al., 2012) was then applied to the whole-brain tractograms to (i) segment white matter tracts of interest and (ii) calculate tract profiles of variation in diffusion metrics along the trajectory of each tract. A standard cleaning procedure was used to remove spurious streamlines that do not follow the trajectory of the desired tract. Specifically, we removed streamlines that deviated from the tract core by more than 3 standard deviations, or that were longer than the mean tract length by more than 4 standard deviations. This process was repeated iteratively five times.
Diffusion metrics, namely, fractional anisotropy (FA) and mean diffusivity (MD), were calculated by pyAFQ based on the diffusion kurtosis imaging model (DKI; Jensen et al., 2005) as implemented in DiPy (Garyfallidis et al., 2014; Henriques et al., 2021). These metrics were then sampled onto 100 equidistant nodes along each tract, such that for each metric the value at the node represents the weighted average across the streamlines that comprise the tract. Here we limit our analyses to the middle 80 nodes of each tract, as nodes closer to the tract terminations typically suffer from partial volume as streamlines enter gray matter. To evaluate reliability across different regions of the brain, we segmented both inter- and intra-hemispheric tracts. First, we segmented the seven sub-bundles of the corpus callosum (CC): Orbital, Superior Frontal, Motor, Superior Parietal, Posterior Parietal, Temporal, and Occipital (Dougherty et al., 2007). Then, we also segmented seven intra-hemispheric tracts that have been frequently studied in developmental cognitive neuroscience: the direct and posterior branches of arcuate fasciculus (ARC, pARC), the superior longitudinal fasciculus (SLF), the inferior fronto-occipital fasciculus (IFOF), the inferior longitudinal fasciculus (ILF), the uncinate fasciculus (UF), and the corticospinal tract (CST). Each tract was segmented bilaterally (see Fig. 1).
Calculating T1 profiles along the trajectory of major white matter tracts. (A) Intra-hemispheric tracts (left) and callosal sub-bundles (right) of a representative subject, overlaid on their anatomical T1w image. (B) Left arcuate fasciculus (ARC) of two example subjects across scans at two timepoints. The tract streamlines are color coded by their corresponding T1 values. Tract shape is variable between subjects yet highly consistent within the same subject over time. (C) T1 profiles along the trajectory of the left ARC. Nodes are ordered from anterior to posterior position.
Calculating T1 profiles along the trajectory of major white matter tracts. (A) Intra-hemispheric tracts (left) and callosal sub-bundles (right) of a representative subject, overlaid on their anatomical T1w image. (B) Left arcuate fasciculus (ARC) of two example subjects across scans at two timepoints. The tract streamlines are color coded by their corresponding T1 values. Tract shape is variable between subjects yet highly consistent within the same subject over time. (C) T1 profiles along the trajectory of the left ARC. Nodes are ordered from anterior to posterior position.
2.4 Reliability analysis
We took three approaches to evaluate scan–rescan reliability in white matter: (1) voxel-wise analysis across the entire white matter, (2) ROI-based analysis of the corpus callosum (CC), and (3) tractometry-based comparisons in multiple white matter tracts (Fig. 1). In each of these three analyses, we compared values obtained from the first scanning session available for each child (“scan-1”) with the values obtained in their second scan, which took place 2–4 months apart (“scan-2”). When comparing the 2-min reconstruction pipelines, we always compared the first 2-min acquisition obtained in each scanning session. This provides a lower limit on scan–rescan reliability as there might be some developmental changes over this time scale. For each of these approaches, we calculate several measures of scan–rescan reliability: Pearson’s correlation coefficient (r), the coefficient of determination (R2), and the coefficient of variation (CV) between the two scans. The coefficient of variation is defined as the ratio between the standard deviation of two measurements, and their mean. Lower CV is, therefore, considered better as it indicates less variation in the measurement over repeating scans. We visualize the distribution of the differences in T1 values between the two scans using Bland–Altman plots. We repeat these calculations for T1 values estimated using each of the four pipelines to explore whether different reconstruction approaches impact reliability.
2.4.1 Voxel-wise and ROI-based comparison
Accurate registration of the two sessions is an essential step for assessing scan–rescan reliability. We took several steps to achieve high-quality registration (see Fig. 2): First, we registered and resampled the two scans into a “half-way” space using the ANTS unbiased pairwise registration tool (Tustison et al., 2021), to maintain inverse consistency, instead of resampling the source (the second scan) to the estimated target location (the first scan) (Reuter & Fischl, 2011; Reuter et al., 2010). Second, instead of registering MRF-generated T1 maps, we registered the reconstructed coefficient maps before template matching, which reduces the partial volume effects and thus improves the accuracy of T1 fitting (Q. Chen et al., 2023). The registration was calculated for the 4-min reconstruction pipeline and then applied to all pipelines. Then, to obtain accurate white matter masks in the “half-way” space, the two “half-way” space 4-min coefficient maps were averaged to generate a higher SNR image (equivalent to an 8-min scan), followed by template matching and synthetic T1w-MPRAGE generation. This high-quality image was generated using Bloch simulations (Wang et al., 2014) and used as input for the Freesurfer segmentation pipeline using the SynthSeg method (Billot et al., 2023), to obtain the white matter mask and corpus callosum (CC) ROIs (see Fig. 2). CC ROIs were visually inspected and manually edited to exclude corticospinal fluid (CSF) and fornix voxels.
Workflow for the half-way registration process. Initially, 4-min coefficient maps from two scans are presented in native space, where the c3 coefficient (highlighted in a red box) was chosen as the registration input due to its high contrast. Then, we applied the calculated transform matrices from each scan to the half-way space to all five coefficient maps. We then averaged the registered coefficient maps from both scans to increase SNR, before running the template matching to generate synthetic T1-weighted MPRAGE images. Lastly, masks were created using Freesurfer software for white matter and corpus callosum regions of interest.
Workflow for the half-way registration process. Initially, 4-min coefficient maps from two scans are presented in native space, where the c3 coefficient (highlighted in a red box) was chosen as the registration input due to its high contrast. Then, we applied the calculated transform matrices from each scan to the half-way space to all five coefficient maps. We then averaged the registered coefficient maps from both scans to increase SNR, before running the template matching to generate synthetic T1-weighted MPRAGE images. Lastly, masks were created using Freesurfer software for white matter and corpus callosum regions of interest.
The scan–rescan reliability was first assessed voxel-wise within the entire white matter mask. Pearson’s correlation coefficient was calculated among voxels of the two scan sessions for each participant. Coefficient of determination (R2) was further calculated with the fitting function y = x to assess the data consistency. We ran a linear mixed effect model using the lme4 package in R (Bates et al., 2015) to evaluate the contribution of scan duration and B0 correction to scan–rescan reliability. The model included age as a fixed effect as well as a random intercept for each subject to account for individual variability (full formula:
For the ROI-based analysis, we focused on five corpus callosum (CC) regions as the CC is an area with dense, highly myelinated axons (Aboitiz et al., 1992). The average T1 value in each ROI was calculated per subject and compared across the two scanning sessions. Again, we used Pearson’s correlation, coefficient of determination (R2), and coefficient of variation (CV) across all subjects as measures of scan–rescan reliability. While the main focus of the current study was white matter, we also analyzed several subcortical gray matter regions obtained with Freesurfer’s pipeline to provide an estimate of the T1 values and reliability in gray matter.
2.4.2 Tractography-based comparison
In order to evaluate T1 values along the tracts, we first had to register the MRF data to the native diffusion space. To achieve this, synthetic diffusion b = 0 images were generated based on the T1 and T2 maps from the 4-min MRF scan. These synthetic images were registered to the diffusion b = 0 images using Advanced Normalization Tools (ANTs) (Avants et al., 2009; Klein et al., 2009). A rigid linear transformation was chosen with optimizing a mutual information (MI) similarity metric. The transformation was subsequently applied to the MRF coefficient maps, followed by template matching to get T1 values mapped into the diffusion native space. T1 values obtained from each of the 4 different pipelines were then sampled onto 100 equidistant nodes along each tract using pyAFQ.
We evaluated scan–rescan reliability by comparing the mean T1 values of each tract across the two scans of each participant. We calculated Pearson’s correlation coefficient, coefficient of determination (R2), and coefficient of variation (CV) between the two scans, when T1 values were estimated using each of the four pipelines.
2.5 Age effects
To examine whether MRF-derived values are able to replicate known developmental effects (Yeatman, Wandell, et al., 2014), we calculate Pearson’s correlation between mean R1 values in each tract and participants’ age at the time of each scan. For this analysis we used R1, the inverse of T1 (1/T1), as our metric, since R1 linearizes the T1 scale and has been more extensively studied in relation to brain development (Travis et al., 2019; Yeatman, Wandell, et al., 2014). This was repeated for the four pipelines to test whether different processing choices affect the ability to detect expected age effects.
3 Results
3.1 Qualitative evaluation of reconstructed maps
Figure 3 shows two typical slices of reconstructed T1 maps from the four different pipelines. Visual inspection revealed that all the reconstructed 1 mm resolution images showed good contrast between gray matter, white matter, and CSF. Compared with the 2-min acquisition, the 4-min acquisition resulted in less noisy maps, irrespective of B0 correction (first row of Fig. 3A). B0 correction significantly reduced image blurring around the sinuses, which are air-filled spaces in the skull that induce high B0 inhomogeneity (Fig. 3A).
Example axial slices of reconstructed T1 maps using the four pipelines. (A) A slice passing through the sinuses, a region with high B0 inhomogeneity (as shown in the corresponding B0 map on the right). Applying B0 correction significantly sharpens the image in the inhomogeneous area. (B) A more superior slice where B0 is more homogeneous. Here, B0 correction does not show significant improvement to image quality in white matter. In both slices, maps generated from 4-min data are less noisy than those of 2-min data.
Example axial slices of reconstructed T1 maps using the four pipelines. (A) A slice passing through the sinuses, a region with high B0 inhomogeneity (as shown in the corresponding B0 map on the right). Applying B0 correction significantly sharpens the image in the inhomogeneous area. (B) A more superior slice where B0 is more homogeneous. Here, B0 correction does not show significant improvement to image quality in white matter. In both slices, maps generated from 4-min data are less noisy than those of 2-min data.
3.2 Voxel-wise comparison
Figure 4 shows scan–rescan reliability across all white matter voxels in a single subject (panel A) and the distribution of correlation coefficients across all participants in the sample (panel B; one subject was removed from the analysis due to severe blurring caused by motion). Individual participants showed high Pearson’s correlation for the 2-min data (r = 0.838 ± 0.041, range [0.744–0.918]) and coefficient of determination (R2 = 0.638 ± 0.114, range [0.278–0.835]), indicating that individual variation in T1 values across voxels is highly reliable across scans. As expected, reliability was higher for 4-min data, suggesting that longer scan duration improved SNR (r = 0.866 ± 0.043, range [0.772–0.927]; R2 = 0.693 ± 0.121, range [0.319–0.853]). B0 correction did not improve reliability compared with data without this correction (2-min: r = 0.838 vs. r = 0.834, R2 = 0.638 vs. 0.629 for without vs. with B0 correction; 4-min: r = 0.866 vs. 0.863, R2: 0.693 vs. 0.683 for without vs. with B0 correction). A linear mixed effect model confirmed that reliability was higher for 4-min data compared with the 2-min data (β = 0.028, t = 12.85, p < 1e-16), but was not affected by the B0 correction (β = -0.004, t = -1.705, p = 0.09). The interaction between duration and B0 correction was not significant (β = 0.0002, t = 0.065, p = 0.948) and neither was the effect of age (β = -0.006, t = -1.584, p = 0.12). Lastly, to examine reliability in the absence of developmental changes, we also compared the values of the two 2-min scans that were collected at the same scanning session. As expected, scan–rescan reliability was higher within session compared with the 2-min data across sessions (r = 0.841 ± 0.031, range [0.696–0.880]; R2 = 0.673 ± 0.065, range [0.365–0.755]).
Voxel-wise scan–rescan reliability. (A) 2D Histograms show T1 values in white matter voxels in the two scans of a single subject, for the four reconstruction pipelines. Color denotes voxel count in logarithmic scale. Gray lines denote the equality line. (B) Scan–rescan reliability across all subjects (N = 48), calculated as Pearson’s correlation (top panel) or coefficient of determination (bottom panel).
Voxel-wise scan–rescan reliability. (A) 2D Histograms show T1 values in white matter voxels in the two scans of a single subject, for the four reconstruction pipelines. Color denotes voxel count in logarithmic scale. Gray lines denote the equality line. (B) Scan–rescan reliability across all subjects (N = 48), calculated as Pearson’s correlation (top panel) or coefficient of determination (bottom panel).
3.3 ROI-based comparison
We next evaluated mean T1 values in five segments of the corpus callosum (CC), obtained in each subject’s halfway space using Freesurfer. We first observed that T1 values follow an inverted U-shape (Fig. 5B), in line with previous findings using traditional quantitative T1 methods and histology (Aboitiz et al., 1992; Berman et al., 2018; Hofer et al., 2015; Lynn et al., 2021; Stikov et al., 2011, 2015). Across the five segments, T1 values were highly reliable for all four reconstruction pipelines, with slightly higher reliability for 4-min data (Table 1; Fig. 5C). Similarly to the voxel-based analysis, B0 correction did not contribute to reliability in the CC segments. Supplementary Figure S1 shows the distribution of differences between T1 values in the two scans in the four pipelines. Reliability of T1 values in subcortical gray matter regions was also very high (mean Pearson’s R > 0.9, CV < 1.5%) and is reported in Supplementary Table S1 and Figure S2.
Scan–rescan reliability and T1 values in the corpus callosum (CC). (A) Five regions of interest (ROIs) of the CC overlaid on a synthetic T1w image of a representative subject. (B) T1 values exhibit an inverted U-shape along the anterior-posterior axis of the CC. Gray lines denote individual subjects. (C) Scan–rescan reliability in the CC ROIs is high across all pipelines, with subtle improvement for the 4-min pipelines.
Scan–rescan reliability and T1 values in the corpus callosum (CC). (A) Five regions of interest (ROIs) of the CC overlaid on a synthetic T1w image of a representative subject. (B) T1 values exhibit an inverted U-shape along the anterior-posterior axis of the CC. Gray lines denote individual subjects. (C) Scan–rescan reliability in the CC ROIs is high across all pipelines, with subtle improvement for the 4-min pipelines.
Scan–rescan reliability values for the corpus callosum (CC).
. | Coefficient of determination (R2) . | Coefficient of variation (CV, %) . | ||||||
---|---|---|---|---|---|---|---|---|
ROI | 2 min | 2 min + B0 | 4 min | 4 min + B0 | 2 min | 2 min + B0 | 4 min | 4 min + B0 |
Anterior | 0.777 | 0.784 | 0.804 | 0.794 | 1.58 | 1.52 | 1.37 | 1.44 |
MidAnterior | 0.616 | 0.584 | 0.680 | 0.684 | 2.02 | 2.04 | 1.81 | 1.80 |
Central | 0.605 | 0.516 | 0.663 | 0.621 | 1.66 | 1.79 | 1.65 | 1.74 |
MidPosterior | 0.521 | 0.510 | 0.534 | 0.517 | 2.17 | 2.21 | 1.84 | 1.89 |
Posterior | 0.505 | 0.469 | 0.691 | 0.684 | 2.46 | 2.56 | 1.84 | 1.89 |
. | Coefficient of determination (R2) . | Coefficient of variation (CV, %) . | ||||||
---|---|---|---|---|---|---|---|---|
ROI | 2 min | 2 min + B0 | 4 min | 4 min + B0 | 2 min | 2 min + B0 | 4 min | 4 min + B0 |
Anterior | 0.777 | 0.784 | 0.804 | 0.794 | 1.58 | 1.52 | 1.37 | 1.44 |
MidAnterior | 0.616 | 0.584 | 0.680 | 0.684 | 2.02 | 2.04 | 1.81 | 1.80 |
Central | 0.605 | 0.516 | 0.663 | 0.621 | 1.66 | 1.79 | 1.65 | 1.74 |
MidPosterior | 0.521 | 0.510 | 0.534 | 0.517 | 2.17 | 2.21 | 1.84 | 1.89 |
Posterior | 0.505 | 0.469 | 0.691 | 0.684 | 2.46 | 2.56 | 1.84 | 1.89 |
Reported are the coefficient of determination (R2) and coefficient of variation (CV).
3.4 Reliability of T1 values in white matter tracts
We excluded high-motion diffusion scans where the mean framewise displacement (FD) was greater than 0.8 mm, or the neighborhood correlation was lower than r = 0.6. This resulted in a sample size of 41 subjects that had good quality diffusion data in both timepoints.
We first evaluated scan–rescan reliability by comparing mean tract T1 values across the two scans of each participant, calculated using each of the four different pipelines. This analysis showed that across all tracts, the reliability was high even for the single 2-min scan without B0 correction (mean Pearson’s r = 0.827 across 21 tracts, range [0.751–0.887]). Increasing the scan duration to 4 min slightly increased reliability (mean r = 0.836, range [0.709–0.898]), while B0 correction hardly had an effect on reliability (mean r = 0.826, range [0.753–0.887]; mean r = 0.836, range [0.712–0.897] for 2 and 4 min, respectively). These results are consistent with what we reported above for the voxel-based and ROI-based analyses. Figures 6–7 and Supplementary Figure S3 show the variability in scan–rescan correlation values across different tracts, and Supplementary Table S2 reports the corresponding coefficients of determination and coefficients of variation (CV). Across tracts, the CV remains below 3%, and is either similar or smaller in the 4-min pipelines, suggesting that the increased SNR yields greater stability in measurement over time. Supplementary Figures S4–S7 show the distribution of differences between T1 values in the two scans in the four pipelines.
Scan–rescan reliability of callosal white matter tracts using the four reconstruction pipelines. (A) Mean T1 values for the first and second scan of each participant in the orbital sub-bundle (top) and posterior parietal sub-bundle of the corpus callosum (CC). Dashed lines represent the equality line. (B) Pearson’s r correlation coefficient for mean T1 values across the four pipelines for the seven CC sub-bundles. Error bars denote the 68% confidence interval calculated using a bootstrap permutation procedure. Diffusion metrics are shown for reference in gray (MD, mean diffusivity; FA, fractional anisotropy). Dashed line represents the median reliability across all tracts. (C) Reliability along the tract profile. Each line denotes the Pearson’s r correlation coefficient between T1 values from the two timepoints, across the length of the tract profile. Each line represents a single reconstruction pipeline. In each sub-bundle, nodes are ordered from left to right.
Scan–rescan reliability of callosal white matter tracts using the four reconstruction pipelines. (A) Mean T1 values for the first and second scan of each participant in the orbital sub-bundle (top) and posterior parietal sub-bundle of the corpus callosum (CC). Dashed lines represent the equality line. (B) Pearson’s r correlation coefficient for mean T1 values across the four pipelines for the seven CC sub-bundles. Error bars denote the 68% confidence interval calculated using a bootstrap permutation procedure. Diffusion metrics are shown for reference in gray (MD, mean diffusivity; FA, fractional anisotropy). Dashed line represents the median reliability across all tracts. (C) Reliability along the tract profile. Each line denotes the Pearson’s r correlation coefficient between T1 values from the two timepoints, across the length of the tract profile. Each line represents a single reconstruction pipeline. In each sub-bundle, nodes are ordered from left to right.
Scan–rescan reliability of left hemisphere white matter tracts using the four reconstruction pipelines. (A) Mean T1 values for the first and second scan of each participant in the left arcuate fasciculus (ARC, top) and left inferior fronto-occipital fasciculus (IFO, bottom). Dashed lines represent the equality line. (B) Pearson’s r correlation coefficient for mean T1 values across the four pipelines in left hemisphere tracts. Error bars denote the 68% confidence interval calculated using a bootstrap permutation procedure. Diffusion metrics are shown for reference in gray (MD, mean diffusivity; FA, fractional anisotropy). Dashed line represents the median reliability across all tracts. (C) Reliability along the tract profile. Each line denotes the Pearson’s r correlation coefficient between T1 values from the two timepoints, across the length of the tract profile. Each line represents a single reconstruction pipeline. In each tract, nodes are ordered from anterior to posterior position.
Scan–rescan reliability of left hemisphere white matter tracts using the four reconstruction pipelines. (A) Mean T1 values for the first and second scan of each participant in the left arcuate fasciculus (ARC, top) and left inferior fronto-occipital fasciculus (IFO, bottom). Dashed lines represent the equality line. (B) Pearson’s r correlation coefficient for mean T1 values across the four pipelines in left hemisphere tracts. Error bars denote the 68% confidence interval calculated using a bootstrap permutation procedure. Diffusion metrics are shown for reference in gray (MD, mean diffusivity; FA, fractional anisotropy). Dashed line represents the median reliability across all tracts. (C) Reliability along the tract profile. Each line denotes the Pearson’s r correlation coefficient between T1 values from the two timepoints, across the length of the tract profile. Each line represents a single reconstruction pipeline. In each tract, nodes are ordered from anterior to posterior position.
For reference, we also calculated the reliability of diffusion-based metrics for each tract, fractional anisotropy (FA) and mean diffusivity (MD), in the same manner. This analysis revealed that MRF-derived T1 values show comparable or higher reliability than that of widely used diffusion-based metrics (FA: mean r = 0.754 range [0.584–0.904; MD: median r = 0.795 range [0.548–0.926]). As shown in Figures 6–7 (and Supplementary Fig. S3), the improved reliability of MRF compared with diffusion stands out especially for tracts where diffusion metrics have relatively low reliability, like the Uncinate Fasciculus and Corticospinal tract. This comparison does not suggest that MRF may replace diffusion data, as they are sensitive to different tissue properties and provide complementary information (Huber et al., 2021; Travis et al., 2019; Yeatman, Wandell, et al., 2014). Instead we use this comparison to highlight that the reliability of MRF is just as good or even greater than that of measurements that are widely used in the field.
Lastly, we examined whether R1 (1/T1) values along different tracts replicate known age effects. We found that R1 was positively correlated with age in multiple intra-hemispheric tracts, as well as in the more frontal sub-bundles of the corpus callosum (Fig. 8; Supplementary Fig. S8), in line with previous findings (Yeatman, Wandell, et al., 2014). Repeating this analysis using the four pipelines revealed similar results. Interestingly, there was a trend where in some tracts age effects were larger when R1 values were calculated with 4-min pipelines, compared with the 2-min pipelines. While not statistically significant, this might suggest that using 4-min sequences increases SNR and enhances the ability to capture underlying effects of tissue development.
Tract mean R1 values positively correlate with age. (A) Scatterplots show R1 values calculated using the 4-min pipeline (without B0 correction) from the two scans of each participant (first scan N = 43, black; second scan N = 46, gray). Lines denote the best linear fit surrounded by the 95% confidence interval (shaded area). Shown are the left arcuate fasciculus (ARC), left Inferior fronto-occipital fasciculus (IFO) and left corticospinal tract (CST). (B) Correlation coefficients between age and R1 values in left hemisphere tracts when calculated using the four different pipelines (scatterplots in A correspond to the red bars). Error bars denote the 68% confidence interval calculated using a bootstrap permutation procedure. The dashed line denotes the FDR-corrected significance level at p < 0.05.
Tract mean R1 values positively correlate with age. (A) Scatterplots show R1 values calculated using the 4-min pipeline (without B0 correction) from the two scans of each participant (first scan N = 43, black; second scan N = 46, gray). Lines denote the best linear fit surrounded by the 95% confidence interval (shaded area). Shown are the left arcuate fasciculus (ARC), left Inferior fronto-occipital fasciculus (IFO) and left corticospinal tract (CST). (B) Correlation coefficients between age and R1 values in left hemisphere tracts when calculated using the four different pipelines (scatterplots in A correspond to the red bars). Error bars denote the 68% confidence interval calculated using a bootstrap permutation procedure. The dashed line denotes the FDR-corrected significance level at p < 0.05.
4 Discussion
In this study, we evaluated the reliability of a novel and rapid MRF sequence that can be used to obtain high-quality quantitative T1 data in young children. Our analyses showed that MRF-derived T1 values were highly reliable in white matter over time, and were able to capture known age effects. In addition, we evaluated the contribution of different acquisition and preprocessing choices on scan–rescan reliability.
Our analyses revealed that high scan–rescan reliability in white matter can be achieved with a single 2-min scan and that reliability increases with scan. While previous studies evaluated the reliability of MRF in phantoms and healthy adults (Buonincontri et al., 2019; Gracien et al., 2020; Körzdörfer et al., 2019; Wicaksono et al., 2023), here we assessed the scan–rescan reliability in a real life research scenario with young children. Our results provide a proof of concept for the feasibility of using MRF to measure tissue properties reliably over time with populations where data acquisition and quality are more challenging. These results hold important implications for the field of developmental cognitive neuroscience and pave the way for studies that focus on structural brain development in children.
We examined the reliability of MRF-derived T1 values in white matter using three complementary methodologies, namely, comparing individual voxels, analyzing specific ROIs, and using diffusion-based tractography. This three-pronged approach was intended to test MRF-derived values in the common analyses used to study white matter structure. Together our results show high scan–rescan reliability of MRF in white matter, independent of the downstream analysis method. The voxel-wise analysis allowed us to describe variation in reliability across individuals. The ROI approach was useful to capture the canonical U-shape of the corpus callosum and compare it with existing literature. Lastly, diffusion tractography has been extensively used to study the relationship between white matter development and cognitive functions (Roy et al., 2024; Wang et al., 2017; Yeatman, Dougherty, Ben-Shachar, et al., 2012). We show that using a standard automated pipeline for tractography, the reliability of MRF-derived tract profiles is remarkably high across the tracts examined, and in some cases exceeded the reliability of diffusion-based metrics.
In addition to establishing the reliability of the MRF sequence, the current work validates MRF-derived T1 values in two ways. We first found that T1 values in the corpus callosum follow an inverted U-shape, in line with prior observations using other quantitative methods and histology (Aboitiz et al., 1992; Berman et al., 2018; Hofer et al., 2015; Lynn et al., 2021; Stikov et al., 2011, 2015). This demonstrates that MRF-derived T1 values are sensitive enough to capture this variation in adjacent parts of the corpus callosum. In addition, we observed significant correlations between age and R1 values in multiple intra-hemispheric white matter tracts, replicating known age effects (Callaghan et al., 2014; Eminian et al., 2018; Yeatman, Wandell, et al., 2014). Interestingly, the only callosal sub-bundles that correlated with age were the anterior ones, in line with Yeatman, Wandell, et al. (2014). The observed age effects may reflect several developmental processes, since R1 has been shown to be sensitive to water content and iron composition (Fatouros et al., 1991; Filo et al., 2023; Gelman et al., 2001), in addition to its well-established relationship with myelin content (Lutti et al., 2014; Stüber et al., 2014). Importantly, these processes are not mutually exclusive through development, as increase in myelin content is tightly linked to decrease in water content in the same regions and may be accompanied by local changes in the molecular composition of the myelin sheath. One interpretation of the observed increase in R1 through development is as reflecting the prolonged myelination of white matter which lasts into adulthood. In our age range, we were able to capture linear growth in R1 values, yet studies with a wider age range will be needed to determine whether the relationship between age and R1 remains linear or plateaus later in adolescence. Together, this shows that MRF-derived T1 values capture similar effects to standard T1 values acquired with longer acquisitions. The short duration of the MRF sequence and its ability to capture multiple quantitative maps in a single acquisition make it feasible for future studies to combine several indices to disentangle the tissue processes underlying neurodevelopment, disease, and aging.
One of the principal strengths of this work lies in its implementation within a cohort of children scanned multiple times, which offers promise for quantitative MRI adoption in brain development research. A central challenge in imaging young children is motion during scans, which significantly compromises data quality and leads to high rates of data exclusion (Afacan et al., 2016; Nebel et al., 2022; Satterthwaite et al., 2012). Importantly, data exclusion based on motion may lead to biased samples that underrepresent younger children or those that suffer from more extreme phenotypes (Nebel et al., 2022; Wylie et al., 2014). The proposed MRF protocol overcomes this barrier by employing a fast and robust sequence with spiral readout, which is less likely to be affected by motion (Glover & Pauly, 1992). Using short sequences not only mitigates motion artifacts in acquired data, but also increases the probability of acquiring the data successfully, as it is not uncommon for children to end scanning sessions prematurely before completing the full sequence. With a protocol that takes 2–4 min, this risk is greatly reduced compared with standard protocols that take about 20 min to complete and require a sequence of images. This may also reduce the drop-out rate in follow-up visits which is a major challenge in longitudinal studies. Based on the pipeline comparison we conducted, below we provide a list of practical recommendations for researchers and clinicians interested in using MRF sequences in research and practice.
4.1 Practical recommendations
4.1.1 Scan duration
Our findings show that reliability increased when we combined two 2-min sequences compared with a single run. Importantly, even for the single 2-min sequence, scan–rescan reliability was greater than r = 0.7, and that for white matter tracts was comparable or better than the scan–rescan reliability of diffusion metrics. We recommend that if time allows, two repetitions will boost SNR and reliability, but for clinical studies and situations where time is a limiting factor, a single 2-min sequence could provide usable quantitative T1 data for many applications.
4.1.2 B0 correction
Our findings do not indicate a significant improvement in reliability of white matter T1 values by including a separate B0 correction. B0 correction will likely be more important for studies that target orbital regions in the vicinity of the sinuses, which may suffer more from inhomogeneity (see Fig. 2). B0 correction may also prove crucial for accurate T2 maps, but this remains to be assessed in future work.
4.1.3 Registration
In order to accurately evaluate the reliability of MRF-derived T1, we created a half-way space template image and simultaneously resampled both scan sessions into the half-way space (Reuter & Fischl, 2011; Reuter et al., 2010). The common practice of registering one scan to another creates an inherent asymmetry in the sense that the second scan is aligned, registered, and resampled to match the spatial configuration of the first scan, while the initial scans remain unaltered. The disparity introduces a systematic bias into the repeatability analysis, such that the values may differ depending on which scan serves as the reference image and which scan is undergoing the registration and resampling. We propose that registration to a half-way space resolves this asymmetry and leads to more accurate results.
4.2 Limitations and future directions
The current protocol and preprocessing pipeline do not provide information about motion during the scan. The evaluation of motion was done qualitatively based on the reconstructed images. Quantifying motion would allow us to explicitly quantify the effect of motion on MRF-derived T1 values, and potentially explain some of the variance in scan–rescan reliability across subjects. Another limitation of the current protocol is that it did not include B1 inhomogeneity mapping, which is an essential step for generating accurate T2 maps from the MRF sequence. Future versions of the MRF sequence may incorporate B1 quantification to fully exploit MRF capabilities to obtain multiple quantitative maps simultaneously (e.g., T2, proton density). Lastly, a technical challenge for implementing this protocol widely in clinical settings is that the reconstruction phase is currently lengthy and requires heavy computations. Future work can improve the efficiency of the reconstruction making it feasible in clinical settings (Zhou et al., 2024) while also including online motion correction (Nurdinova et al., 2024).
5 Conclusion
In sum, MRF provides a promising methodology for deriving reliable quantitative metrics of brain tissue structure in children and patient populations where scan time and motion are of particular concern. Coupled with the pipeline we propose which improves the repeatability, the current work paves the way for using MRF in longitudinal pediatric studies. This method would facilitate studies focusing on structural brain development by providing an easy and rapid way to quantify changes in myelin and other properties of white matter.
Data and Code Availability
Data associated with this publication is publicly available here: https://purl.stanford.edu/gz224th5011
Author Contributions
Maya Yablonski: software, formal analysis, visualization, investigation, writing—original draft, writing—review and editing. Zihan Zhou: methodology, software, formal analysis, investigation, writing—original draft, writing—review and editing. Xiaozhi Cao: software. Sophie Schauman: software, data curation. Congyu Liao: software. Kawin Setsompop: conceptualization, methodology, supervision, funding acquisition, writing—review and editing. Jason D. Yeatman: conceptualization, methodology, supervision, funding acquisition, writing—review and editing.
Declaration of Competing Interest
The authors declare no competing financial interests.
Acknowledgments
This work was supported by NICHD R01-HD095861 to J.D.Y., grant R01-MH116173 to K.S. and by the Stanford GSE/SOM postdoctoral fellowship to Z.Z. We thank Adam Kerr and Hua Wu from the Stanford CNI for their support and advice in implementing the protocol. We thank Jamie Mitchell, Hannah Stone, Mia Fuentes-Jimenez, and Jasmine Tran for their invaluable part in data collection.
Supplementary Materials
Supplementary material for this article is available with the online version here: https://doi.org/10.1162/imag_a_00470.
References
Author notes
Equal contribution