Abstract
Over the last few decades, diffusion MRI (dMRI) streamline tractography has emerged as the dominant method for in vivo estimation of white matter (WM) pathways in the brain. One key limitation to this technique is that modern tractography implementations require high angular resolution diffusion imaging (HARDI). However, HARDI can be difficult to collect clinically, limiting the reach of tractography analyses to research cohorts and thus limiting many WM investigations to certain populations and pathologies. As such, a clinically viable tractography solution applicable to wider patient populations scanned as a part of routine care would be of key significance in broadening WM analyses to underfunded or rarer diseases and to the clinical setting. Such a solution would require the ability to perform arbitrary tractography analyses, use only clinical imaging for input, and be open source and widely accessible and implementable. Thus, here we evaluate our recently developed, containerized, and open-source, T1-weighted (T1w) MRI-based deep learning model for streamline propagation. We empirically assess its performance against traditional dMRI-based and established atlas-based approaches in a healthy young population, an aging one, and in those with epilepsy, depression, and brain cancer. In the healthy young population, we find slightly increased error compared to traditional tractography with the deep learning model that falls within the bounds attributable to dMRI variability and is considerably less than the atlas-based approach. Further, seeking to replicate previously published dMRI tractography effects in the remaining cohorts as an initial assessment of clinical viability, we find this model successfully does so in some key cases—particularly in applications that rely on long-range streamlines including those not captured by the atlas-based approach—but importantly not all. 
These results suggest a deep learning-based approach to tractography with T1w MRI demonstrates promise within the limitations of our definition of clinical viability and especially over atlas-based approaches but requires refinement and more robust consideration of out-of-distribution effects prior to widespread clinical use. We also find these results raise additional questions regarding the differences in image content between dMRI and T1w MRI and their relationship to tractography. Further investigation of these questions will improve the field’s understanding of which features of the brain influence measured tractography effects.
1 Introduction
Diffusion MRI (dMRI) is a well-established method for making in vivo measurements of the brain (Jones, 2010). By estimating how the diffusion of water is restricted along different directions, dMRI signals capture the architectures of cell populations on the voxel level at millimeter scale (Jones, 2010). In the last few decades, many different models have arisen to fit these signals to allow investigators to better interpret changes. These include the diffusion tensor and fiber orientation distribution (FOD) models, among others (Tournier et al., 2004; Westin et al., 1997).
One shared theme among many dMRI models is the concept of orientation, especially in white matter (WM). Due to the highly anisotropic nature of axon fiber populations in typical WM, these models allow researchers to investigate the directionality of the underlying fiber populations, a unique feature of dMRI. This has led to the development of a core dMRI technique, streamline tractography, which has since facilitated advances in the understanding and treatment of neurological health and disease (Essayed et al., 2017; Johnson et al., 2023; Korgaonkar et al., 2012; Schilling et al., 2022). First proposed in the 1990s, traditional tractography propagates streamlines, or virtual estimates of axons, by stepping through the brain following the orientation of dMRI models at each step (A. L. Alexander, 2010; Basser, 1998; Mori et al., 1998, 1999). Then, once millions of streamlines are generated across the brain to produce a “tractogram,” or a virtual estimate of all the axons in the brain, subsequent analyses of WM can occur. One common analysis is the virtual dissection of the tractogram into known anatomical WM bundles and the investigation of their properties (Chandio et al., 2020; Yeh, 2020). Another is the analysis of how different parts of the brain are connected, known as network analyses or “structural connectomics” (Sporns et al., 2005).
One essential factor for tractography is a dMRI acquisition that allows it to be performed robustly. For instance, some voxels may contain single-fiber populations which are readily represented by simpler models, like diffusion tensors, that require measurements in only a few diffusion directions (Tournier et al., 2007). Other voxels, however, may contain multiple crossing fiber populations which necessitate more advanced models, like FODs, that require measurements in tens or hundreds of diffusion directions for accurate fits (D. C. Alexander & Seunarine, 2010; Tournier, 2010; Tournier et al., 2007). Thus, dMRI acquisitions known as high angular resolution diffusion imaging (HARDI) that measure diffusion signals in many directions and facilitate FOD fitting have become the standard for tractography analyses (Descoteaux, 1999). However, one key challenge remains. HARDI acquisitions are time-consuming and can suffer from the influence of noise (Jones & Cercignani, 2010). As such, they are rarely collected clinically. This has resulted in HARDI and tractography being limited primarily not only to the research realm but also to investigators with the expertise to robustly acquire and analyze the images. This renders a significant portion of neuroimaging data and the associated patients and disease states unavailable for tractography analyses, limiting advances in understanding of WM in the brain and the adoption of tractography in clinical workflows.
As such, short of rapid advances in HARDI acquisition schemes, an alternative approach to tractography would be key for clinical use as well as for studying patients with rare diseases, pathologies without significant research funding, large cohorts with only clinical imaging, or legacy data lacking modern HARDI. Such an approach would need to satisfy three key criteria:
Perform arbitrary tractography analyses in emulation of traditional streamline propagation (i.e., “plug-and-play”),
Take only data routinely acquired in clinical settings as input (i.e., T1-weighted MRI), and
Be open-source and widely accessible and implementable as a part of the research workflows necessary for clinical adoption.
One possible established solution for these criteria is the use of non-linear or deformable registration of an atlas, or population-based tractogram, to participant imaging. With this approach, one could perform any arbitrary tractography on the atlas and then move the resultant streamlines to subject space for further analysis. However, while many registration options exist, this approach would necessarily suffer from decreased subject specificity, especially in the subcortical white matter where cortical folding patterns between participants are highly variable.
As such, prior studies have explored alternative solutions. For instance, Yang et al. (2022) recently utilized deep learning to segment subject-specific WM bundle regions on T1-weighted (T1w) MRI that would typically be identified with tractography. Similarly, Alemán-Gómez et al. (2022) recently used population-based tractograms registered to structural imaging to facilitate a voxel-wise probabilistic atlas for structural connections which would typically be measured with streamlines. These approaches, however, are limited in that they do not allow for streamline-based analyses, only voxel-based ones, nor do they allow for analyses outside those predefined by the neural networks or atlases. Others have taken the approach of image synthesis, attempting to create dMRI and its derivatives directly from other types of imaging. For instance, Gu et al. (2019) and Chan et al. (2023) learned to synthesize dMRI scalar maps, such as fractional anisotropy and mean diffusivity maps, from T1w MRI and fluid attenuated inversion recovery (FLAIR) imaging, respectively. These approaches, while useful for estimating dMRI scalar maps, do not provide the necessary directionality needed for streamline propagation. To capture such directionality, Son et al. (2019) and Anctil-Robitaille et al. (2022) created networks to generate diffusion tensor models from T1w MRI directly, with the former also requiring functional MRI as input and the latter detailing methodology without an associated open-source implementation. Finally, Ren et al. (2021) provide a 2-dimensional slice-wise network for synthesizing arbitrary dMRI from b0, T1w MRI, and T2-weighted (T2w) MRI. While these data are routinely acquired clinically, they are typically done so at different resolutions, with T1w and T2w MRI acquired at 1–2 mm isotropic, but the b0 often with much thicker slices, rendering an isotropic synthetic dMRI volume suitable for tractography out of reach (Neher et al., 2013).
One approach that has yet to be investigated is the notion of streamline propagation directly on T1w MRI. With this approach, the tractography operation on dMRI would be directly substituted with an analogous operation on T1w MRI. We previously developed the first model of this kind and released a containerized implementation (Cai, Lee, Newlin, Kerley, et al., 2024). In this original proof-of-concept paper, the model was trained on 80, validated on 10, and tested on 10 healthy young adult research participants from the Human Connectome Project (HCP) (Cai, Lee, Newlin, Kerley, et al., 2024; Cai, Lee, Newlin, Kim, et al., 2023; Van Essen et al., 2013). As a benchmark in this small cohort, the model’s error relative to dMRI was found qualitatively to be similar to that of a repeat dMRI scan, or “rescan,” across WM bundles and connectomics. Though this model satisfies the above three criteria, the original paper was limited in two primary ways. The first was that the small test sample size precluded statistical investigation, and the second was that the original paper only looked at 10 healthy young adults, rendering the model’s clinical viability unknown.
In this work, we extend our investigation of this model to fill these two gaps. First, we repeat the benchmark analysis from the original proof-of-concept paper with a larger sample of healthy young adults withheld from training. Our hope is to elaborate on the original qualitative findings and enable statistical investigation of performance differences. Second, we seek to reproduce previously published tractography effects in clinically relevant cohorts where tractography-based analyses have become increasingly popular. These include an aging population to study WM degeneration, patients with major depressive disorder (MDD) to study brain disorganization, and patients with drug-resistant epilepsy (DRE) and brain cancer for presurgical evaluation. Throughout these experiments, we compare the performance of the T1w MRI-based tractography model with traditional dMRI and atlas-based approaches. Overall, our hypothesis is that this model will outperform atlas-based approaches and reproduce dMRI-based tractography effects in a variety of clinical scenarios.
By investigating this hypothesis, our goal is to provide an initial empirical assessment of the clinical viability of the T1w MRI-based tractography model. To define “clinical viability,” we consider the primary use of any clinical score, test, or tool: to differentiate patients in order to inform their care. For instance, suppose a patient comes to the doctor’s office after having an MRI with tractography to extract some measurement from their brain. From the clinician’s perspective, the most important thing is for this measurement to be able to distinguish this patient in some way in order to adjust their care. There is no requirement that this tractography tool must exactly match another existing technique. In fact, exact matches are uncommon in practice: different clinical scoring systems or tests that are used for similar purposes often have entirely different units, scales, and offsets. Instead, the measurement tool must simply be able to reflect a change with respect to some change in the patient’s condition (e.g., their age or disease status) such that a threshold can be studied in the appropriate population and implemented to identify patients who need intervention. As such, the first question for clinical viability for a new model like streamline propagation on T1w MRI is simple: does it reflect an expected change in brain state with respect to the relevant independent variable up to a scale and shift? If it does not, it cannot be used to differentiate patients clinically.
Thus, we seek to answer this initial question of viability by assessing whether the T1w MRI-based model can reproduce tractography effects that were previously studied and published with dMRI—up to a scale and shift bias. “Reproduction of tractography effects” is defined simply as statistically significant capture of effect direction in response to the appropriate independent variable or patient condition with comparison of effect sizes. Given expected biases between different clinical tools; those between scanner, acquisition, and software approaches to tractography studies; and the existence of the entire field of data harmonization dedicated to overcoming such biases, we do not prioritize a direct evaluation of agreement, though we provide an analysis in the Supplementary Materials (Cai, Yang, Kanakaraj, et al., 2021; Schilling, Tax, et al., 2021).
In addressing this initial hypothesis, we hope our study will move the field closer to one day having a viable tractography alternative for wide-scale clinical use and investigation of WM. Additionally, as this is one of the first investigations of a new form of tractography, we hope this study will promote discussion about the relationship between tractography, dMRI signal, and T1w MRI contrast.
2 Methods
2.1 Streamline propagation on T1w MRI
For this work, we characterize our previously published convolutional-recurrent neural network (CoRNN)-based model for propagating streamlines directly on T1w MRI (Cai, Lee, Newlin, Kerley, et al., 2024). The full training details, model architecture, and streamline propagation implementation are described for the “student model” in the original paper (Cai, Lee, Newlin, Kerley, et al., 2024). In short, this model functions on T1w MRI analogously to how the SD_STREAM deterministic propagator functions on dMRI, using the T1w MRI itself and derived anatomical context, including a FAST tissue-type mask, a spatially localized atlas network tiles (SLANT) whole-brain segmentation, and a WM Learning (WML) TractSeg segmentation, as input (Huo et al., 2019; Jenkinson et al., 2012; Tournier et al., 2012, 2019; Van Essen et al., 2013; Wasserthal et al., 2018; Yang et al., 2022; Zhang et al., 2001). We perform anatomically constrained tractography using tissue-type masks, a minimum/maximum streamline length of 50/250 mm, a maximum step angle of 60°, a step size of 1 mm, and random seeding in WM (Tournier et al., 2012, 2019). All computations are implemented on an Ubuntu 20.04 workstation with PyTorch 1.12.1 and performed on an NVIDIA RTX Quadro 5000 or RTX A6000 with CUDA 11.6.
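The propagation constraints listed above (1 mm step, 60° maximum step angle, 50 to 250 mm length range) can be illustrated with a minimal sketch of a deterministic stepping loop. This is not the CoRNN or SD_STREAM implementation: the `direction_at` callback is a hypothetical stand-in for whatever supplies the next unit direction (an FOD peak for dMRI, a network prediction for T1w MRI), and the sketch tracks in one direction only and omits the anatomical tissue-type constraints.

```python
import math


def propagate(seed, direction_at, step_mm=1.0, max_angle_deg=60.0,
              min_len_mm=50.0, max_len_mm=250.0):
    """Step from `seed` until a stopping rule fires; return points or None.

    `direction_at(point, previous_direction)` is a hypothetical callback
    that returns a unit 3-vector, or None when tracking should stop.
    """
    cos_limit = math.cos(math.radians(max_angle_deg))
    points = [tuple(seed)]
    previous = None
    length = 0.0
    while length + step_mm <= max_len_mm:
        direction = direction_at(points[-1], previous)
        if direction is None:
            break  # no valid direction, e.g. the streamline left white matter
        if previous is not None:
            cos_angle = sum(a * b for a, b in zip(direction, previous))
            if cos_angle < cos_limit:
                break  # turn sharper than the maximum step angle
        points.append(tuple(p + step_mm * c
                            for p, c in zip(points[-1], direction)))
        length += step_mm
        previous = direction
    # Streamlines shorter than the minimum length are rejected.
    return points if length >= min_len_mm else None
```

With a constant direction field the loop runs until the maximum length is reached, whereas a streamline that terminates early or turns too sharply is discarded.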
2.2 Contextualization against dMRI variability in healthy young adults
In the original paper, we qualitatively evaluated the CoRNN model in 10 healthy young adults from HCP withheld from training in two commonly used tractography applications, WM bundle analysis and structural connectomics, in order to contextualize the model’s performance in the setting of dMRI variability. This design was chosen because ground-truth comparisons would require ex vivo histological validation and would be infeasible (Sporns et al., 2005; Yeh, 2020). To extend this analysis to larger sample sizes in order to facilitate statistical quantification, we repeat this analysis with 38 HCP participants withheld from training. Each of these participants has two imaging sessions with dMRI and one T1w MRI. The dMRI images from the different sessions are denoted dMRI 1 and dMRI 2, respectively. Using these data, we first create a reference with tissue-type anatomically constrained SD_STREAM tractography to 1 million streamlines on dMRI 1 (Smith et al., 2012; Tournier et al., 2012, 2019). As with the CoRNN model, we seed randomly in WM, use a length range of 50 to 250 mm, a step size of 1 mm, and a maximum step angle of 60°. We then compute the same tractogram using the same SD_STREAM propagator on dMRI 2 and the CoRNN model on the T1w MRI. This study design allows us to investigate the error of the CoRNN model against the variability of scan-rescan dMRI with traditional tractography. Comparisons are made in Montreal Neurological Institute (MNI) 2 mm space to compare between dMRI and T1w MRI (Grabner et al., 2006). Importantly, we omit the dMRI deep learning-based tractography study (i.e., the “teacher model”) from the original paper as our focus here is on differences between dMRI and T1w MRI (Cai, Lee, Newlin, Kerley, et al., 2024).
To investigate bundles, we utilize the RecoBundlesX framework to identify 39 WM bundles from the whole-brain tractograms (Supplementary Table S1) (Garyfallidis et al., 2018; Rheault, 2020). We note these bundles are defined differently than those defined by the WML TractSeg framework used for anatomical context in the CoRNN input. We compare bundles across methodologies using the bundle adjacency streamlines metric, a distance metric representing the minimum pairwise direct-flip distance on bundle centroids in mm, and the Dice coefficient voxel-wise metric, a similarity metric representing the ratio of intersecting over total volume ranging from 0 to 1 (Garyfallidis et al., 2012, 2018; Rheault, 2020; Schilling, Rheault, et al., 2021). Notably, lower bundle adjacency and higher Dice indicate improved agreement, and the streamline-based metric provides additional insight into shape information that the voxel-wise one does not (Rheault, 2020). Additionally, we compare bundle characteristics as defined by Yeh (2020), namely the volume, length, span, and surface area.
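The two agreement metrics can be made concrete with a short sketch. It assumes bundles are given as sets of integer voxel indices for the Dice coefficient and that two centroids have already been resampled to the same number of points for the minimum direct-flip (MDF) distance. How per-centroid distances are aggregated into a single bundle adjacency value is not specified here and is left out.

```python
import math


def dice(voxels_a, voxels_b):
    """Dice coefficient between two collections of voxel indices (0 to 1)."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty bundles agree
    return 2.0 * len(a & b) / (len(a) + len(b))


def mdf(streamline_s, streamline_t):
    """Minimum direct-flip distance between two equally sampled streamlines."""
    if len(streamline_s) != len(streamline_t):
        raise ValueError("streamlines must have the same number of points")
    n = len(streamline_s)
    direct = sum(math.dist(p, q)
                 for p, q in zip(streamline_s, streamline_t)) / n
    flipped = sum(math.dist(p, q)
                  for p, q in zip(streamline_s, reversed(streamline_t))) / n
    # Taking the minimum makes the distance independent of point ordering.
    return min(direct, flipped)
```

Because the MDF distance takes the smaller of the direct and flipped comparisons, a streamline and its reversed copy are treated as identical.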
To investigate connectomics, we use 98 cortical regions defined by the SLANT BrainCOLOR framework (Huo et al., 2019). We note that these regions are also defined differently than those used as input to the CoRNN model because the latter are grouped together before training. We compute two types of structural connectomes, one whose edges are weighted by the streamline count between any two given regions and the other whose edges are weighted analogously by average streamline length (Sporns et al., 2005). To compare connectomes across methodologies, we compute the Pearson correlation between them. Additionally, we reduce each connectome to three scalar graph theory measures representing brain organization with the Brain Connectivity Toolbox for comparison (Rubinov & Sporns, 2010). From the connectomes weighted by streamline count, we compute the maximum modularity. This scalar represents the degree to which the nodes of the connectome may be subdivided into separate non-overlapping groups. From the connectomes weighted by average streamline length, we compute the average betweenness centrality and characteristic path length. The former represents the average fraction of shortest paths through the connectome in which the brain regions participate, and the latter represents the average shortest path between brain regions in mm.
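As an illustration of the connectome comparison, the sketch below computes the Pearson correlation between two connectomes stored as symmetric square matrices. Restricting the comparison to the upper triangle (each edge counted once, self-connections excluded) is an assumption made for the example, not a detail stated above.

```python
import math


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def connectome_correlation(connectome_a, connectome_b):
    """Correlate the edge weights of two connectomes (assumed symmetric)."""
    return pearson(upper_triangle(connectome_a), upper_triangle(connectome_b))
```

Because the Pearson correlation is invariant to scale and shift, two connectomes whose edge weights differ only by a linear rescaling correlate perfectly.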
Last, we provide an additional comparison against a common T1w MRI-only method of bundle and connectomics analysis: non-linear registration of streamlines from an atlas. We use the FOD and T1w MRI atlases provided by Lv et al. (2023), computing 1 million streamlines in atlas space with the same anatomically constrained SD_STREAM tractography as previously described for consistency. The T1w MRI atlas is subsequently deformably registered to each subject’s T1w MRI and the resultant transform is used to warp the tractogram streamlines from atlas space to subject space (Avants et al., 2008; Tustison et al., 2014). We investigate the agreement of bundles and connectomics generated by this atlas with the SD_STREAM reference to compare its performance to that of the CoRNN model.
We statistically characterize differences between the CoRNN model, the reference, and the atlas with pair-wise Wilcoxon rank-sum tests with Bonferroni multiple comparisons correction.
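For readers less familiar with this testing scheme, a minimal sketch follows. It assumes the normal approximation to the rank-sum statistic without a tie correction and a simple multiply-by-the-number-of-tests Bonferroni adjustment. The function names are illustrative and the statistical software actually used is not stated here.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation."""
    pooled = sorted((value, group)
                    for group, sample in enumerate((x, y))
                    for value in sample)
    rank_sum_x = 0.0
    i, n = 0, len(pooled)
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of tied ranks i+1 .. j
        rank_sum_x += average_rank * sum(1 for k in range(i, j)
                                         if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2.0
    spread = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - expected) / spread
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and rejection decisions."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj < alpha for p_adj in adjusted]
```

In a three-way comparison such as CoRNN versus reference, CoRNN versus atlas, and reference versus atlas, each raw p-value would be multiplied by three before being compared against the significance threshold.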
2.3 Changes in the forceps minor and brain organization in aging
Moving away from healthy young adults, we investigate the CoRNN model in a more clinically applicable, older, aging population by seeking to reproduce previously published age effects in WM bundles and structural connectomics. For bundles, Schilling et al. (2022) previously characterized decreases in the volume and surface area of the forceps minor with age in a large, multisite, longitudinal cohort. In connectomics, Newlin et al. (2023), among others, previously identified a positive association in maximum modularity with age, also in a multisite cohort (Dennis et al., 2012; Wang et al., 2022).
Of the sites used by both studies, one common dataset was derived from the Vanderbilt Memory and Aging Project (VMAP). As such, from this cohort we randomly selected 104 cognitively unimpaired participants with single-session imaging for simplicity to evaluate the CoRNN model. These participants include 45 males and 59 females and have an average age of 72.2 years (standard deviation 7.4 years). Twenty-one of the participants carry the apolipoprotein E e4 (APOEe4) gene, placing them at increased risk for dementia. Each participant is imaged once with T1w MRI acquired on a Philips Achieva (Best, The Netherlands) scanner at 3T (TE/TR = 4.6/8.9 ms) at 1 mm isotropic resolution with Institutional Review Board (IRB) approval under number 120158. After tracking with the CoRNN model, the forceps minor bundle and its volume and surface area are computed using the RecoBundlesX framework, as described previously (Garyfallidis et al., 2018; Rheault, 2020). The connectomes are computed with regions defined by nonlinear ANTs registration of the Desikan-Killiany atlas provided by FreeSurfer, and the maximum modularity measures are computed with the Brain Connectivity Toolbox, as previously described (Avants et al., 2008; Desikan et al., 2006; Fischl et al., 2004; Garyfallidis et al., 2018; Rheault, 2020; Rubinov & Sporns, 2010).
Newlin et al. (2023) modeled the effects of age, sex, and imaging site (not relevant in this case) on modularity in a cohort of non-APOEe4 carriers with and without cognitive impairment. As such, the statistically analogous model here for the CoRNN model is as follows, substituting cognitive impairment for APOEe4 status (Eq. 1).
Similarly, Schilling et al. (2022) modeled the forceps minor against age, sex, imaging site (also not relevant in this case), total intracranial volume (TICV), and CSF volume (CSFV) in non-APOEe4 carriers without cognitive impairment. Thus, we compute TICV and CSFV with FreeSurfer and use the following model with consideration of APOEe4 as done in Equation 1 to investigate age effects in the forceps minor (Eq. 2) (Buckner et al., 2004; Fischl, 2012). Of note, Schilling et al. (2022) included repeated measures (also not relevant in this case).
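Equations 1 and 2 are not reproduced in this text. One plausible reading of the two descriptions, offered only as an assumption, is a pair of additive linear models with no interaction terms. Here $Q_i$ is the maximum modularity of participant $i$, $y_i$ is the forceps minor volume or surface area, the $\beta$ and $\gamma$ coefficients are illustrative symbols, and $\epsilon_i$ is the residual.

```latex
% Assumed form of Eq. 1: age effect on maximum modularity
Q_i = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{sex}_i
      + \beta_3\,\mathrm{APOEe4}_i + \epsilon_i

% Assumed form of Eq. 2: age effect on forceps minor volume or surface area
y_i = \gamma_0 + \gamma_1\,\mathrm{age}_i + \gamma_2\,\mathrm{sex}_i
      + \gamma_3\,\mathrm{APOEe4}_i + \gamma_4\,\mathrm{TICV}_i
      + \gamma_5\,\mathrm{CSFV}_i + \epsilon_i
```

Under this reading, the age effects of interest are $\beta_1$ and $\gamma_1$. Imaging site and repeated-measures terms are omitted because the cohort is single-site and single-session.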
We test for significance at the 0.05 level for each of the three models (modularity, volume, and surface area). We plot the data alongside model predictions isolating the age effect with the 95% confidence interval and provide the estimates (after z-score normalization for volume and surface area, consistent with the original publication) and associated standard error alongside the previously published values.
Additionally, we provide comparisons against dMRI data and an atlas-based approach from the VMAP cohort. dMRI were acquired on a 3T Philips Achieva scanner at Vanderbilt University under IRB 120158 with 32 gradient directions at b = 1,000 s/mm² and 2 mm isotropic resolution (TE/TR = 60/10,000 ms) and preprocessed to remove artifacts (Cai, Yang, Hansen, et al., 2021; Lauzon et al., 2013). To investigate the forceps minor, the dMRI data are processed with one of the two bundle-based approaches, traditional TractSeg, as described in the original publication by Schilling et al. (2022). The volume and surface area of the forceps minor are computed as previously described (Wasserthal et al., 2018; Yeh, 2020). For the atlas-based approach, the same tractogram template generated from the atlas provided by Lv et al. (2023) is registered to each subject and the forceps minor and its volume and surface area are computed as previously described (Yeh, 2020). To investigate modularity, the measures from dMRI are processed as described by Newlin et al. (2023), and the T1w atlas-based comparison is performed as previously described with the Lv et al. (2023) template and modularity computation (Rubinov & Sporns, 2010). Finally, these data undergo the same statistical analyses and plots as described for the CoRNN method.
2.4 Brain connectivity differences in depression
Having evaluated the CoRNN model across different age groups, we turn to our first clinical disease model, MDD. Korgaonkar et al. (2012) previously investigated cortex-to-cortex structural connectivity differences in younger adults with MDD. They found that connectomes weighted by streamline count separated cases from age- and sex-matched controls with a linear discriminant analysis classifier. As such, we investigate the CoRNN model’s ability to reproduce these findings, defined as the existence of a classifier able to statistically separate participants with MDD from controls using only connectomes generated from the CoRNN model and T1w MRI.
We leverage a cohort of 62 geriatric participants sex- and age-matched to within 5 years with structural T1w MRI acquired at Vanderbilt University on a Philips Achieva scanner at 3T (TE/TR = 4.6/8.8 ms) at 0.8 × 0.8 × 1.2 mm resolution under IRB 141137. These include 34 females and 28 males with an average age of 66.7 years (standard deviation 5.7 years). As in previous sections, we generate the connectomes from cortical parcellations using the 98 regions defined by the SLANT framework with edges weighted by streamline count (Huo et al., 2019).
Importantly, we note the differences in age between the two cohorts, with Korgaonkar et al. (2012) investigating younger adults and us investigating a geriatric population. As such, we recognize this older cohort to be more heterogeneous, and accordingly modify our classifier approach from the linear discriminant analysis for stability (Eavani et al., 2018). To reduce the data dimensionality and the impact of outliers, we normalize the log edge weights of each connectome to their sum, z-score each across the cohort, and project each connectome to the 61-dimensional feature space defined by principal component analysis, similarly to Korgaonkar et al. (2012). On these features, we train a random forest classifier in 10-fold stratified cross-validation with MDD as the positive class and pool the predicted probabilities for all withheld participants across the folds. We plot these probabilities and evaluate for differences in them between the control and MDD groups with the Wilcoxon rank sum test at 0.05 significance.
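The first two preprocessing steps can be sketched as follows. Using `log(1 + w)` so that edges with zero streamlines stay finite, and using the sample standard deviation for the z-score, are both assumptions of this example, since neither detail is specified above. The subsequent principal component projection and random forest classifier are not shown.

```python
import math


def log_normalize(edge_weights):
    """Log-transform one connectome's edge weights and divide by their sum."""
    logged = [math.log1p(w) for w in edge_weights]  # assumption: log(1 + w)
    total = sum(logged)
    return [v / total for v in logged] if total > 0 else logged


def zscore_columns(rows):
    """Z-score each edge (column) across participants (rows)."""
    n = len(rows)
    normalized_columns = []
    for column in zip(*rows):
        mean = sum(column) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
        # Edges that are constant across the cohort carry no information.
        normalized_columns.append([(v - mean) / sd if sd > 0 else 0.0
                                   for v in column])
    return [list(row) for row in zip(*normalized_columns)]
```

Each participant's flattened connectome would pass through `log_normalize` first, after which `zscore_columns` is applied once to the stacked cohort matrix.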
Korgaonkar et al. (2012) also assessed which connections were important for their classifier. As such, we evaluate the connections identified in their paper for statistically significant differences between control and MDD in the present cohort with a 2-sample Student’s t-test at 0.05 significance, as done by the original authors, and tabulate the results (Korgaonkar et al., 2012).
For context and given the age differences between these cohorts, we also investigate these effects against a dMRI and an atlas-based approach. dMRI were acquired on a 3T Philips Achieva scanner at Vanderbilt University under IRB 141137 with 60 gradient directions at b = 2,000 s/mm² and 2.5 mm isotropic resolution (TE/TR = 75/6,525 ms) and preprocessed to remove artifacts (Cai, Yang, Hansen, et al., 2021). We produce dMRI connectomes weighted by streamline count for the same cortical regions using the Connectoflow pipeline, fit them with the same random forest classifier framework to predict MDD status, and evaluate the pooled predicted probabilities with the Wilcoxon rank sum test at 0.05 significance (Rheault et al., 2021). The T1w atlas-based connectomes are similarly analyzed after being produced with the Lv et al. (2023) template as previously described. As with the CoRNN approach, both of these approaches also undergo analysis of the specific connections identified by Korgaonkar et al. (2012).
2.5 Intracranial electrode-to-electrode connectivity in epilepsy
Next, we investigate the CoRNN model’s ability to capture structural connectivity abnormalities previously found in the neurosurgical planning for patients with DRE. Johnson et al. (2023) previously published a technique that specifically subsamples a dMRI-generated raw streamline tractogram to estimate the WM connectivity between depth electrodes implanted for intracranial electroencephalography (iEEG). This technique is called Subsampling Whole-brain tractography with iEEG Nearfield Dynamic Localization (SWiNDL). The authors observed that electrodes implanted in confirmed seizure onset zones (SOZs) and propagative zones (PZs) have a much higher SWiNDL-estimated structural connectivity compared to that of non-involved zones (NIZs) (Johnson et al., 2023).
To assess the CoRNN model’s ability to capture these findings, we compute the SWiNDL connectivity in SOZs, PZs, and NIZs as originally described by Johnson et al. (2023) from tractograms with one million streamlines generated with the CoRNN model in T1w MRI registered to each patient’s dMRI. We use the exact same patient cohort acquired at Vanderbilt University under IRB 170560 and 170681. To assess for differences, we use a one-way analysis of variance (ANOVA), as done by Johnson et al. (2023), with post-hoc Tukey tests to understand between which class of regions the differences were present. To provide a numerical assessment, we also investigate the effect sizes from the original data and compare them to those found presently, computed with Cohen’s d.
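As a reminder of how this effect size is obtained, the sketch below computes Cohen's d for two groups of connectivity values. It assumes the common pooled-standard-deviation form, since the exact variant used is not stated above.

```python
import math


def cohens_d(group_a, group_b):
    """Cohen's d between two samples using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    var_a = sum((v - mean_a) ** 2 for v in group_a) / (n_a - 1)
    var_b = sum((v - mean_b) ** 2 for v in group_b) / (n_b - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b)
                          / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd
```

A positive value indicates higher connectivity in the first group, for example SOZ electrodes compared with NIZ electrodes.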
As the exact corresponding dMRI data to this cohort were already published in Johnson et al. (2023), we add only an additional atlas-based approach for comparison. The whole-brain tractograms for each subject in this approach are generated using the Lv et al. (2023) template as previously described and analyzed with the same steps as the CoRNN model.
2.6 Neurosurgical planning of brain tumor resections
Last, we evaluate the CoRNN model against traditional tractography in a patient with a known mass-occupying tumor to assess its performance in brain cancer. We use open-source T1w MRI and dMRI acquired preoperatively from a 39-year-old male patient with a right frontotemporal anaplastic astrocytoma with insular involvement available from openneuro.org (Aerts et al., 2018).
As a baseline, we first perform traditional tractography. We preprocess the dMRI with PreQual, fit the data with FODs computed with constrained spherical deconvolution, and perform anatomically constrained SD_STREAM tractography to 1 million streamlines as previously described (Cai, Yang, Hansen, et al., 2021; Tournier et al., 2007, 2012). Analogously, we perform tractography with the CoRNN model on the T1w MRI to 1 million streamlines. The images are rigidly co-registered to allow for direct anatomical comparison.
As the tumor would distort streamline bundles compared to a traditional atlas for bundle dissection based on healthy subjects, we directly define bundles with the patient’s dMRI. With the SD_STREAM tractogram, we identify 186 streamline clusters with the QuickBundles framework (Garyfallidis et al., 2012). These clusters are then visually inspected to identify anatomically relevant WM bundles involving the right hemisphere and specifically the right temporal lobe: the superior longitudinal fasciculus (SLF), inferior longitudinal fasciculus (ILF), uncinate fasciculus (UF), and arcuate fasciculus (AF). Identified clusters are subsequently cleaned with outlier rejection and used as a template to identify the corresponding clusters in the CoRNN tractogram with the RecoBundlesX framework (Garyfallidis et al., 2018; Rheault, 2020).
For analysis, we visualize the bundles from both approaches and evaluate how well the model captures involvement of the tumor in different bundles compared to traditional tractography. At the same time, we evaluate how similar the resection approach suggested by the model is to that suggested by traditional tractography.
Of note, we do not provide a comparison to an atlas-based approach for this experiment, as registration of a template to a subject with additional tissue like a tumor is an open area of research.
3 Results
3.1 Contextualization against dMRI variability in healthy young adults
In a representative participant, we find small visual differences between whole-brain tractograms, the right arcuate fasciculus and left cingulum, and cortical connectomes (Fig. 1). We do not find these small differences to be prohibitive for further analysis.
To evaluate these differences quantitatively, we compute and plot the median bundle adjacency and Dice coefficients per bundle in Figure 2a and b, respectively. We observe a statistically significant increase in bundle adjacency with the CoRNN model compared to rescan with SD_STREAM, but by less than 0.5 mm; bundle adjacency with the CoRNN model is in turn about 0.5 mm lower than with the atlas. We find no statistically significant difference in Dice between the CoRNN model and rescan with SD_STREAM, with both falling between 0.6 and 0.7 across bundles, and both demonstrate statistically significant improvement over the 0.4–0.5 Dice of the atlas. The per-bundle breakdown of these results is plotted in Supplementary Figures S1 and S2, demonstrating that these trends are consistent across bundles of different sizes.
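As a rough illustration of the two bundle-overlap measures reported above, the sketch below computes a voxel-wise Dice coefficient and one common voxel-based form of bundle adjacency (the mean distance from non-overlapping voxels to the nearest voxel of the other bundle, averaged over both directions). The exact definitions used in this study are given in its methods and may differ; the function names and the isotropic `voxel_mm` parameter are assumptions.

```python
import math


def dice(a, b):
    """Dice coefficient between two bundles given as sets of voxel indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def mean_nearest_distance(src, dst, voxel_mm=1.0):
    """Mean distance (mm) from voxels only in src to their nearest voxel in dst."""
    src, dst = set(src), set(dst)
    only = src - dst
    if not only or not dst:
        return 0.0
    total = sum(min(math.dist(v, w) for w in dst) for v in only)
    return total / len(only) * voxel_mm  # assumes isotropic voxels


def bundle_adjacency(a, b, voxel_mm=1.0):
    """Symmetric bundle adjacency: average of the two directed distances."""
    return 0.5 * (mean_nearest_distance(a, b, voxel_mm)
                  + mean_nearest_distance(b, a, voxel_mm))
```

Higher Dice indicates more voxel overlap, whereas lower bundle adjacency indicates that the non-overlapping parts of the two bundles lie closer together. The brute-force nearest-voxel search is quadratic and is meant only to make the definition explicit.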
In Figure 2c, we plot the median absolute percent differences in bundle volume, length, span, and surface area per bundle and find larger differences with both T1w-based methods compared to dMRI rescan. These increased differences are more accentuated in the atlas compared to the CoRNN model in bundle volume and surface area. We observe these differences to be largely statistically significant overall.
In Figure 2d, we plot the median correlation for connectomes weighted by streamline count and length. We find statistically significant reductions in correlations with the CoRNN model and even larger statistically significant reductions with the atlas. In Figure 2e, we plot the absolute percent difference in connectome graph theory measures, finding no statistically significant differences between the CoRNN model and rescan dMRI tractography for modularity and average betweenness centrality. We find a small but statistically significant difference between the two in characteristic path length. All connectome measures demonstrate dramatic statistically significant differences between the atlas and the other two approaches.
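For readers unfamiliar with connectome correlation, a minimal sketch follows, assuming symmetric connectivity matrices and a Pearson correlation over the unique (upper-triangular) edges. The type of correlation and the function names are assumptions made for illustration and are not taken from the study's code.

```python
import math


def upper_triangle(matrix):
    """Return the unique off-diagonal edges of a symmetric connectome."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectome_correlation(c1, c2):
    """Edge-wise correlation between two connectomes of the same size."""
    return pearson(upper_triangle(c1), upper_triangle(c2))
```

Note that a correlation of this kind is insensitive to a global scale or shift in edge weights, so two connectomes whose streamline counts differ by a constant factor would still correlate perfectly.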
3.2 Evaluation in clinically-relevant cohorts
We plot the age effects identified by the CoRNN model from T1w MRI in the VMAP cohort for the forceps minor in Figure 3. We observe that even with increased error in the volume and surface area measures with the CoRNN model compared to dMRI rescan (Fig. 2c), we reproduce with statistical significance the negative associations with age previously published by Schilling et al. (2022), up to an expected bias factor. Additionally, we find close agreement of the effect sizes, with the reference values contained within one standard error of the CoRNN estimates (Table 1). Importantly, there is a noticeable difference in trend line fit in Figure 3: this is expected as the original figure by Schilling et al. (2022) plotted pure age-volume and age-surface area correlations for visualization purposes, whereas the effect plotted here is the isolated age effect from the multivariate models in equations 1 and 2. We also note that the original study included multiple datasets, whereas only one is used presently for simplicity. We find similar results with the dMRI data but no statistically significant capture of an age effect with the atlas-based approach.
| Study | Effect | Published | dMRI | CoRNN | Atlas |
|---|---|---|---|---|---|
| Schilling et al. (2022) () | Normalized Volume (Forceps Minor) vs. Age | -0.0278*** | -0.0187 ±0.0096^ | -0.0285 ±0.0092** | -0.0054 ±0.0064 |
| | Normalized Surface Area (Forceps Minor) vs. Age | -0.0254*** | -0.0250 ±0.0099* | -0.0291 ±0.0096** | -0.0046 ±0.0067 |
| Newlin et al. (2023) () | Maximum Modularity vs. Age | 0.0014* to 0.0025*** | 0.0009 ±0.0002*** | 0.0022 ±0.0005** | -0.0001 ±0.0002 |
| Johnson et al. (2023) (Cohen’s d) | SOZ vs. PZ | -0.02035 | | -0.3252 | 0.3539 |
| | SOZ vs. NZ | 1.7789*** | | 0.4197 | 1.4293** |
| | PZ vs. NZ | 1.5640*** | | 0.8238* | 0.5636 |
^p < 0.1, *p < 0.05, **p < 0.005, ***p < 0.0005.
Similarly, we reproduce with statistical significance the positive associations of modularity with age previously published by Newlin et al. (2023), up to an expected bias factor (Fig. 4). We also find close agreement of the effect sizes (Table 1). In their original figure, Newlin et al. (2023) plotted the age effect from their multivariate model, as did we. As with the results in Figure 3, we note that only one dataset is used presently for simplicity and that we find similar results with the dMRI but not with the atlas-based approach.
We plot the classifier-predicted probabilities of MDD against control in Figure 5. Compared to the original study by Korgaonkar et al. (2012), we find the connectomes weighted by streamline count are able to recapitulate the MDD effect, despite reduced correlation in the healthy young adult cohort compared to rescan dMRI (Fig. 2d). However, we note the effect size reproduced by CoRNN is smaller than the original. An analysis of the specific connections identified by Korgaonkar et al. (2012) is presented in Table 2. Of note, the CoRNN and original region definitions are not a 1:1 match, so the p-values of the possible corresponding regions with the largest effect sizes are reported. The left cuneus to right corpus callosum connection is omitted because the regional definitions used presently are all cortical gray matter. We find statistically significant capture of between-group differences in 3 of the 7 relevant connections. We find similar results in this cohort with the dMRI approach but failure of effect capture with the atlas-based approach. We also note that the atlas approach was unable to recover one of the connections predefined by Korgaonkar et al. (2012) (Table 2).
| Connected region 1 | Connected region 2 | Published | dMRI | CoRNN | Atlas |
|---|---|---|---|---|---|
| Left inferior temporal | Left post central | 0.003 | 0.1663 | 0.0344 | 0.3062 |
| Right inferior parietal | Right insula | 0.003 | 0.0051 | 0.0005 | 0.0134 |
| Left lingual | Right lateral occipital | 0.003 | 0.6347 | 0.0152 | 0.3213 |
| Left lingual | Right superior parietal | 0.006 | 0.5515 | 0.8683 | NF |
| Left pars triangularis | Left superior frontal | 0.006 | 0.0470 | 0.2895 | 0.4326 |
| Right inferior parietal | Right post central | 0.006 | 0.0640 | 0.1033 | 0.4303 |
| Left cuneus | Right corpus callosum | 0.008 | NA | NA | NA |
| Left lingual | Left superior temporal | 0.009 | 0.7778 | 0.2579 | 0.9002 |
NA = not applicable, NF = not found.
We plot the SWiNDL connectivity estimated by the CoRNN model and the atlas approach in Figure 6. We find the CoRNN model is able to recapitulate the increase in connectivity in PZs compared to NIZs with statistical significance, but not in SOZs. The corresponding effect sizes are reported in Table 1. In the PZ versus NZ effect, we see a decrease in effect size by about half with the CoRNN method, though both the original and CoRNN effect are considered to be large by the established Cohen’s d threshold at ≥0.8 (Sullivan & Feinn, 2012). On the other hand, we find the atlas approach is able to recapitulate the difference between SOZs and NIZs with statistical significance at a similar effect size to the original study, but not the difference between PZs and NIZs.
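For reference, a minimal sketch of how a Cohen’s d effect size can be computed from two groups of measurements is given below. It assumes the pooled-standard-deviation form of d; whether the original study used exactly this form is not stated here, and the function names are illustrative. The 0.2, 0.5, and 0.8 cutoffs are the conventional small, medium, and large thresholds.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d using the pooled sample standard deviation (assumed form)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


def effect_label(d):
    """Conventional qualitative label for the magnitude of d."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"
```

Under these conventions, both the published and the CoRNN PZ versus NZ values in Table 1 (1.5640 and 0.8238) fall in the "large" category, even though the latter is roughly half the former.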
We plot the right temporal bundles identified by the Quick- and RecoBundles frameworks in the patient with astrocytoma in Figure 7. We note that no right UF was identified by either method, suggesting full infiltration of the tumor into this bundle, as is logical given its location. We find recovery of the SLF, AF, and ILF by both SD_STREAM and the CoRNN model with synchronous fiber tract courses in relation to the tumor (Fig. 7b). We also find that both techniques identify ILF streamlines that terminate at the tumor border as well as ILF streamlines that wrap the lateral aspect of the insula, suggesting infiltration of the tumor into some but not all of this bundle (Fig. 7c, d). Both traditional tractography and the CoRNN model support that resection of this tumor may result in damage to the UF and/or ILF either through tumor involvement or during resection of the tumor, especially if supratotal tumor resection is planned. Both methods also suggest the SLF and AF are additional structures that may be at risk, especially with retraction of the frontal lobe.
4 Discussion
4.1 Commentary on results
Overall, in the training population of healthy young adults, our results suggest that performance of the CoRNN model ranges from statistically equivalent to inferior when compared to that of a repeat dMRI scan, but is superior to a traditional atlas-based approach. Further, prior imaging studies have estimated between-session dMRI variability to be on the order of <5–30% coefficient of variation, which translates to about 5–40% difference considering the conversion between them (Cai, Yang, Kanakaraj, et al., 2021). We are further reassured that the error of the CoRNN method falls within this range, despite the inferiority results. This provides important contextualization for performance of the CoRNN method against dMRI variability, though we note that it is not a direct comparison.
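The relationship between coefficient of variation (CoV) and percent difference can be sketched for the simplest case of two repeated measurements. This is an illustrative assumption about how such a conversion could work, not necessarily the one used in the cited study; the symbols $x_1$, $x_2$, $\bar{x}$, and $s$ are introduced here only for illustration, and a positive mean is assumed.

```latex
\bar{x} = \frac{x_1 + x_2}{2}, \qquad
s = \sqrt{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2} = \frac{|x_1 - x_2|}{\sqrt{2}}, \qquad
\mathrm{CoV} = \frac{s}{\bar{x}}
\;\Longrightarrow\;
\frac{|x_1 - x_2|}{\bar{x}} = \sqrt{2}\,\mathrm{CoV}
```

Under this assumption, a CoV of 5% corresponds to a difference of about 7% of the mean and a CoV of 30% to about 42%, which is roughly in line with the range quoted above.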
Moreover, in the aging populations and in the MDD cohort, we find that the CoRNN method reproduces some previously published tractography effects whereas the atlas-based approach does not. Specifically, in aging, we find strong reproductions of effects in the forceps minor and in brain modularity with the CoRNN method. In MDD, we also reproduce connectome effects with the CoRNN method, though at a smaller effect size than the original and with loss of signal in approximately half of the important connections previously published. However, we note that the reproduction with dMRI also exhibits this behavior, placing a limit on the effects identifiable by the CoRNN method. Thus, these findings could likely be explained by the cohort investigated presently being also older than the original, potentially introducing more heterogeneity into the study (Eavani et al., 2018).
In the DRE cohort, we find the CoRNN method recovers PZ and NIZ differences in WM connectivity between electrodes but not WM effects in SOZs. Importantly, however, the SOZ effect is the primary effect of interest in this scenario. We believe this identifies an important situation in which CoRNN results should be interpreted carefully. The original SWiNDL paper indicated that the SOZ effects were largely driven by short-range (i.e., 5–20 mm) cortically adjacent fibers, as shown in Figure 5b in Johnson et al. (2023). Taken together with the recovery of the bundle and whole-brain connectomics effects in aging and MDD, these results suggest that studies investigating these short-range cortically adjacent pathways should be mindful of fibers reconstructed by the CoRNN model. This is consistent with recent pilot explorations of streamline-based error of the CoRNN model (Yu et al., 2024). Future characterizations of the model should elucidate why this occurs, but one potential reason is that the recurrent architecture of CoRNN requires streamline “memory” for propagation which may not be available at shorter scales.
Further, we found the atlas approach identified a significant but reduced effect size between SOZs and NIZs. We do not believe this to be a surprising result for two reasons. The first is that short-range cortically adjacent streamlines are likely to exist in the template tractogram created from healthy populations, though registration may affect their precise localization on the subject-level in the DRE cohort. This is in contrast to the CoRNN approach where the model as trained may not have appropriately generated these streamlines at all. The second is that healthy populations are expected to have mesial temporal structural connectivity differences compared to the rest of the brain, and the DRE cohort is heavily biased toward mesial temporal lobe epilepsy patients (Johnson et al., 2023; Taylor et al., 2015). Thus, even with the expected reduction in localization fidelity, connectivity based on an atlas of the general population will likely identify differences in short-range connections between mesial and non-mesial structures as reflected in the SOZ versus NIZ effect presently.
Finally, we find traditional tractography and the CoRNN model suggest similar tumor involvement in WM and inform similar surgical approaches.
Expanding on these findings, the implication is that, as is, the CoRNN method may enable an approximation of tractography that is not perfect but far exceeds atlas-based approaches in long-range (i.e., >20 mm) bundle and whole-brain connectomics, and that may be sufficient for clinical use, depending on the question. For instance, if evaluating surgical tumor resection candidacy, structural WM markers for aging, or whole-brain connectivity changes in MDD, the model may be sufficient. However, it may not be sufficient for short-range local connectivity estimations in DRE or a more granular assessment of connectivity changes in MDD.
Fortunately, the CoRNN model is presently naively trained in a limited population with a low sample size, suggesting with refinement the method will only improve. For instance, the convolutional portion utilized consists only of one convolutional layer. Moreover, it was trained only on 80 healthy young adults. Further, the participants with MDD, epilepsy, and brain tumors were all omitted from training. However, there is no reason why these data or those like them cannot be included with retraining. This would widen the conditioning of the model from healthy young adults to a more diverse population and perhaps allow the model to recover the epilepsy effects missing presently. With the diversity of dMRI data available, this represents a major future direction for this work.
As such, we conclude that the error of the first CoRNN model for tractography is within the ballpark of error expected with dMRI variability, offers dramatic improvement over atlas-based approaches for long-range and whole-brain assessments, is already clinically viable in terms of patient differentiability in certain circumstances, but requires refinement before it would be able to make tractography studies more widely available in clinical imaging settings where T1w MRI, but not HARDI, are often acquired.
4.2 Implications of results
It is well known that dMRI and T1w MRI have different image contrasts. For instance, while voxels in dMRI effectively estimate cellular architecture and orientation, those in T1w MRI measure proton relaxation thought to be dependent on myelin content and other tissue properties (Harkins et al., 2016). However, traditionally, only dMRI has found established success as a basis for tractography given its dependence on WM orientation. Given the black box nature of neural networks, this raises a question: what exactly in T1w MRI is providing the signal for the CoRNN method? On one hand, perhaps WM orientation is somehow encoded in T1w MRI contrast (Schyboll et al., 2020). On the other, perhaps there exists an alternative signal in T1w MRI such as brain shape or gray/white matter boundaries capable of propagating streamlines, one that could be related to dMRI signal as a Bayesian prior distribution or be entirely independent. Or perhaps it is something in between.
Given the current understanding of tractography, our opinion is that it is likely the Bayesian prior explanation, one that is stronger or more subject-specific than population-based atlases for long-range applications, given the current results. Additionally, during training, we empirically observed that streamlines were often too straight and short when the contextual information provided by SLANT, FAST, and WML was removed, further suggesting that the contextual information was important for the model’s success—though both SLANT and WML (and other tissue segmentation operators like FAST) are learned operations and could in theory be removed with sufficient end-to-end retraining of the model. However, if it is true that there exists an alternative shape or boundary-based signal or context for tractography and that the CoRNN method is merely the first to attempt to characterize it, a second important question arises: how many, if any, known tractography effects are simply reflecting changes in this alternative signal, as opposed to changes to dMRI signal in WM as is typically assumed? For instance, perhaps Schilling et al. (2022), Newlin et al. (2023), and the present study all simply were measuring some change in brain shape over age using the two different tractography approaches—one initial thought could be simply brain atrophy, though the age effect was preserved in all cases even controlling for ventricle and vault size, and CoRNN regularly outperformed the atlas-based approach which in theory should also capture ventricle and vault information (Eq. 2). Further, perhaps such a hypothetical shape phenomenon could be more easily captured with a technique that can be more simply integrated clinically, or perhaps both dMRI signal and this alternative signal are surrogates for a third unifying one that has yet to be discovered. 
Finally, we want to carefully note that while the results herein may comment on the existence of such a shape phenomenon for tractography and associated measures (e.g., streamline count and bundle volume), they do not comment on dMRI microstructural measures (e.g., fractional anisotropy and mean diffusivity).
Thus, the implications of these results are broad and require further investigation beyond the scope of the present study. Moreover, the answers to the questions raised here will be important for understanding how tractography can be most efficiently used (or not) in studying the human brain. As such, we hope that as the CoRNN methodology is advanced and studied in future work, these questions will serve as guides for understanding how it can best be incorporated into clinical workflows.
4.3 Key limitations of results and experimental design
For this study, we use a broad and general definition of “clinical viability.” This decision was made as no assessment of this kind on a model of this kind has been done before. However, such a definition has limitations. Namely, it does not provide any ability to estimate a theoretical upper or lower bound for the general accuracy of the model. It also does not define which populations would do better or worse using this technique as opposed to traditional dMRI nor does it theoretically or empirically account for specific population distribution effects or confounders. As such, it does not provide a comprehensive assessment of viability in all clinical populations or tractography applications. However, we note that this definition does provide a framework for an initial empirical assessment of viability. In other words, it facilitates a “pilot” study where overwhelmingly negative results would indicate that the model—as trained—is not worth further investigation in clinical settings or populations. Thus, in this context, we believe this initial study provides a foundation for future work elaborating on the most appropriate, robust, and specific clinically relevant use cases.
Additionally, one key consideration to most assessments of deep learning models is potential confounding from out-of-distribution effects. For instance, deep learning models trained to segment the brain would logically produce skewed, misrepresented, or nonsense results on images of the abdomen. In this hypothetical context, any seemingly accurate abdominal segmentations would need to be taken with a grain of salt in that they could be due to an out-of-distribution effect masquerading as a positive result.
In this study, however, the out-of-distribution effects act more as experimental variables than as confounding ones because of the head-to-head comparison of the CoRNN model to traditional dMRI-based tractography in clinical populations. This is because the CoRNN model is evaluated in aging, DRE, MDD, and cancer but trained on young and healthy populations and because dMRI directly measures neurologic phenomena without concern for out-of-distribution confounding. In other words, our study can be described as measuring the ability of the T1w-based model to capture dMRI information in distributions it was not trained on, indirectly assessing the strength of these out-of-distribution effects on clinically meaningful information as opposed to being confounded by them.
That being said, it is important to note that this approach does not fully rule out or characterize these out-of-distribution effects. For instance, this study is a purely empirical assessment, and thus does not comment on the theoretical bounds of such effects or establish how strong they could or should be relative to what they are measured to be. It also does not imply anything about unseen distributions, nor does it completely establish the lack of such confounding. However, we are reassured that this initial study supports the need for further investigation given the more-or-less consistent performance of the model across these different clinical populations.
4.4 Additional limitations and future directions
Our results demonstrate the CoRNN method as trained captures at least some tractography effects, but it is clear that it does not capture all of them, like the DRE SOZ findings for instance. One potential reason for this, as alluded to in the previous sections, is that the CoRNN approach may be measuring an alternative signal that is not sensitive to all dMRI changes, especially those in short fibers near the cortex. Further, consider many common applications of tractography, including those investigated presently outside the DRE cohort: bundles, connections, and brain organization. These analyses are all derived from individual streamlines, but ultimately group them into long-range, macroscopically appreciable structures prior to analysis. Thus, it is possible that even if tractography on the streamline level is a reflection of dMRI signal changes, as is currently understood, the common uses of tractography could be so far removed from individual streamlines that their effects can be captured by a shape and boundary-based technique.
Therefore, perhaps the SOZ signal is not reflected in these larger changes appreciable on T1w MRI. Alternatively, the SOZ distribution under CoRNN demonstrated increased variability, so it could also just be due to noise or another methodological failure of CoRNN. Thus, two important future directions arise. One is to understand how the CoRNN model may perform under different scales, not just across bundle size as characterized presently, but also on the streamline level across the brain, locally near the cortex, and at short-range. A second is to not only better understand the signal captured by the CoRNN method, as discussed previously, but also better characterize its failures to understand when it would be an appropriate alternative to traditional tractography and when it would not be. As it is an artificial intelligence-based method, it would be prudent to ensure it was not “hallucinating” results. Thus, importantly, we note that this model is not yet ready to be a true “tractography replacement”—even though it likely will offer improved performance over an atlas-based approach in most traditional tractography applications.
Continuing down this line of thought, we note that a common use of dMRI is performing voxel-wise interrogation of models fit to the signals, as opposed to streamline-wise through the use of streamline count, for instance. For diffusion tensor models, this includes measurement of the mean diffusivity and fractional anisotropy (Westin et al., 1997). However, we note that while streamlines are propagated from FODs, FODs can also be reconstructed from streamlines. Thus, one future direction for better untangling the CoRNN signal is to use streamlines derived by the method to reconstruct diffusion tensors or FODs voxel-wise and compare them against those directly measured with dMRI or learned from structural imaging (Anctil-Robitaille et al., 2022; Chan et al., 2023; Gu et al., 2019; Ren et al., 2021). Investigations such as this would further elucidate the nuances in the CoRNN signal and its best use cases, including the interplay between other dMRI-derived measures and tractography.
One key aspect of long-term clinical viability is the ability of technology to adapt to heterogeneous conditions. As such, we note that the variability in imaging workflows, acquisitions, and software studied in the clinical cohorts presently represents a challenge that we view as an overall positive. We are reassured that many effects were identified despite this variability and that future studies ought to consider this heterogeneity when characterizing the methodology. One example of this was the decreased effect size identified in the MDD experiment, though in this case the heterogeneity also extended to the geriatric population itself and associated dMRI imaging (Eavani et al., 2018). As such, we used a slightly different classifier approach for stability, but again are reassured that the effect was identified despite this change.
Another example of heterogeneous conditions is the potential for anisotropic T1w acquisitions, especially those acquired slice-by-slice. As CoRNN was trained on 2 mm isotropic data and given the existing literature indicating dMRI tractography performs better under isotropic conditions, we anticipate similar degradations in CoRNN performance with anisotropic T1w MRI (Neher et al., 2013). However, if CoRNN is truly relying on larger shape effects, as opposed to local dMRI signal changes, it may be more robust to anisotropy. This is speculation, however, and additional investigation will be necessary to empirically establish how the performance of CoRNN changes under anisotropic conditions as compared to traditional tractography with HARDI.
One implication of this heterogeneity and decreased effect size is that future studies may potentially require larger samples in order to recover some tractography findings using T1w MRI compared to dMRI. As such, there may exist a tradeoff where the simplicity of using T1w MRI over HARDI is outweighed by the need to acquire more data. Therefore, power analyses should also be considered alongside failure assessments during future work, likely to be most useful after further refinement of the technique. Presently, we are reassured that the effects were reproduced with relatively low N (i.e., less than 100 or so) across the experiments.
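To make the sample-size tradeoff concrete, the sketch below estimates the number of participants per group needed to detect a given Cohen’s d in a two-sample comparison, using the standard normal approximation. The two-group design, equal group sizes, two-sided alpha of 0.05, and 80% power are assumptions for illustration and do not describe the design of any experiment in this study.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample test.

    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2 (normal approximation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Illustrative inputs: the published and CoRNN PZ vs. NZ effect sizes in Table 1.
n_published = n_per_group(1.5640)
n_cornn = n_per_group(0.8238)
```

Because the required sample size scales with 1/d², halving an effect size roughly quadruples the number of participants needed, which is the core of the tradeoff described above.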
Finally, we note that though dMRI tractography is well-established, it is not a “ground truth” per se, as ex vivo histological validation would be required. Thus, it is important to note that measurements made by tractography are relative and biases between methodologies are known to exist and can be accounted for with existing harmonization techniques (Fortin et al., 2017; Schilling, Tax, et al., 2021). As such, we expected scale and shift biases between CoRNN and SD_STREAM presently and investigated effects up to such bias factors. However, we note that future work characterizing what these biases look like and more importantly how they can be overcome in different populations, under different imaging workflows, and in different clinical settings would be critical prior to widespread adoption of this technique.
Data and Code Availability
We make the source code for the CoRNN method publicly available for evaluation along with a containerized implementation at github.com/MASILab/cornn_tractography. The HCP and tumor data used presently are also publicly available. The remaining data are not made publicly available for privacy reasons. For details regarding the access procedures of these data, please reach out to the corresponding author.
Ethics Statement
This work was conducted in accordance with approval and guidance from the Vanderbilt University and Vanderbilt University Medical Center Institutional Review Boards under policy numbers 160268, 170681, 162119, 170560, 141137, 130727, 181231, 182089, 120158, 010637, and 130859.
Author Contributions
L.Y.C. conceived and implemented the project; performed the experiments; analyzed the results; created the figures; and wrote the first draft of the manuscript. H.H.L. assisted in experimental design and results interpretation. G.W.J., N.R.N., K.R., D.B.A., J.P.B., B.D.B., V.N., S.C., L.B., and M.D. assisted in data gathering and results interpretation. M.E.K. assisted in experimental implementation and software development. T.J.H., A.L.J., W.D.T., V.L.M., D.J.E., L.E.C., B.M.D., and J.C.G. obtained funding and resources used for the project and provided guidance in data use and experimental design. F.R., D.C.M., and K.G.S. provided guidance in experimental design and results interpretation. B.A.L. obtained funding and resources used for the project and was the primary supervisor in all aspects. All authors contributed to revisions of the manuscript prior to submission.
Declaration of Competing Interest
The authors have no significant competing financial, professional, or personal interests that might have influenced the work described in this manuscript.
Acknowledgments
This work was conducted in part using the resources of the Advanced Computing Center for Research and Education at Vanderbilt University, Nashville, TN. A portion of the study data were obtained from the Vanderbilt Memory and Aging Project, collected, and processed by Vanderbilt Memory and Alzheimer’s Center investigators at Vanderbilt University Medical Center. This work was supported by the National Institutes of Health (NIH) under award numbers 5R01EB017230, 1U34DK123895-01, U34DK123894-01, P50HD103537, U54HD083211, U54HD083211-S1, K01EB032898, R01NS095291, R01MH102246, F31NS120401, R01NS112252, R01NS108445, R01NS110130, R01AG034962, R01AG056534, R01NS100980, R01AG062826, K24AG046373, UL1TR000445, UL1TR002243, S10OD023680, K01AG073584, and T32GM007347; by the National Science Foundation (NSF) under award number 2040462; and by the Alzheimer’s Association under award number IIRG-08-88733. This research was conducted with the support from the Intramural Research Program of the National Institute on Aging of the NIH. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or NSF.
Supplementary Materials
Supplementary material for this article is available with the online version here: https://doi.org/10.1162/imag_a_00259