Abstract
Recent years have seen a surge in the use of diffusion MRI to map connectomes in humans, paralleled by a similar increase in processing and analysis choices. Yet these different steps and their effects are rarely compared systematically. Here, in a healthy young adult population (n = 294), we characterized the impact of a range of analysis pipelines on one widely studied property of the human connectome: its degree distribution. We evaluated the effects of 40 pipelines (comparing common choices of parcellation, streamline seeding, tractography algorithm, and streamline propagation constraint) and 44 group-representative connectome reconstruction schemes on highly connected hub regions. We found that hub location is highly variable between pipelines. The choice of parcellation has a major influence on hub architecture, and hub connectivity is highly correlated with regional surface area in most of the assessed pipelines (ρ > 0.70 in 69% of the pipelines), particularly when using weighted networks. Overall, our results demonstrate the need for prudent decision-making when processing diffusion MRI data, and for carefully considering how different processing choices can influence connectome organization.
Author Summary
The increasing use of diffusion MRI for mapping white matter connectivity has been matched by a similar increase in the number of ways to process the diffusion data. Here, we assess how diffusion processing affects hubs across 1,760 pipeline variations. Many processing pipelines do not show a high concentration of connectivity within hubs. When present, hub location and distribution vary based on processing choices. The choice of probabilistic or deterministic tractography has a major impact on hub location and strength. Finally, node strength in weighted networks can correlate highly with node size. Overall, our results illustrate the need for prudent decision-making when processing and interpreting diffusion MRI data.
INTRODUCTION
A major priority for neuroscience is to robustly and accurately map the connections of the human brain (Sporns et al., 2005). These connections are thought to be distributed heterogeneously across different brain regions, with putative ‘hub’ areas having stronger and more frequent connections with other regions (Arnatkevičiūtė et al., 2021; Oldham et al., 2019; van den Heuvel & Sporns, 2013b). Hub connectivity is viewed as playing an integral role in supporting coordinated dynamics (Mišić et al., 2015; van den Heuvel & Sporns, 2013a). It has a distinct developmental trajectory (Fan et al., 2011; Oldham et al., 2022; Oldham & Fornito, 2019); is important for cognitive function (Fagerholm et al., 2015; Sleurs et al., 2021); is under strong genetic influence (Arnatkevičiūtė et al., 2018, 2019, 2021; Fulcher & Fornito, 2016); and is implicated in a diverse range of clinical disorders (Crossley et al., 2014; de Lange et al., 2019; Fornito et al., 2015; Gollo et al., 2018). In humans, the anatomical connectivity of hub and nonhub brain regions is most commonly mapped using tractographic analysis of diffusion magnetic resonance imaging (MRI) data (Betzel & Bassett, 2017; Sotiropoulos & Zalesky, 2019). One challenge of such analyses is that diffusion MRI data are noisy and the final generation of a tractographic estimate of connectivity—a tractogram—depends on many different processing steps, each relying on multiple user-selected options (Jones & Cercignani, 2010; Oldham et al., 2020; Sarwar et al., 2021). As a result, different investigators make different choices, resulting in connectome models that arise from data processed in different ways.
Among the numerous available processing choices, commonly varied steps include the following: the algorithms used to seed, propagate, and prune tractography streamlines in individuals (Jeurissen et al., 2019; Sarwar et al., 2019); the cortical parcellations used to delineate distinct regions and facilitate computational tractability (Lawrence et al., 2021); and the approaches used to generate a group-representative network (Betzel et al., 2019). One recent preliminary investigation found that changing preprocessing steps can shift the location of hubs from parietal/cingulate cortex to temporal cortex (Oldham et al., 2020). Other work has similarly found a high variability in the validity of streamline reconstruction between research groups utilizing different processing pipelines (Maier-Hein et al., 2017). Valid inferences about the structure and function of human connectome hubs critically depend on our ability to reliably identify them, but a detailed examination of precisely how variations in connectome-generation pipelines affect classifications of network hubs has not yet been conducted. Here, we evaluate how such variations influence hub identification, focusing on three key steps in the connectome-generation pipeline: tractography algorithm, cortical parcellation, and group reconstruction.
Tractography refers to the process by which white matter streamlines are generated based on anisotropic water diffusivity. An indirect marker of connectivity with many distinct steps (Jeurissen et al., 2019), it is dependent on user-defined parameters that include (among others) where the streamlines are seeded, how they propagate, and where they can terminate. One well-known, significant choice is between probabilistic and deterministic tractography. In some instances, probabilistic tractography has been shown to match more closely with ex vivo anatomical tract dissections than deterministic tractography (Lilja et al., 2014), while other work has reported that probabilistic tractography is more prone to generating false-positive connections (Sarwar et al., 2019). Indeed, there is a general trade-off between the sensitivity and specificity of different tractography algorithms, with probabilistic tractography being more sensitive but less specific compared to deterministic tractography (Thomas et al., 2014). Moreover, the use of a particular tractography algorithm may interact with other choices in diffusion MRI pipelines, further contributing to connectome variability. For instance, Li et al. (2012) found that 50% of hubs are recategorized when streamline seeds (the locations from where streamlines are propagated) are located at the gray matter–white matter interface rather than deep in the white matter. Methods to differentially retain anatomically probable streamlines have also been suggested (Schiavi et al., 2020; Smith et al., 2012), and manual inclusion/exclusion of streamlines for a given bundle have been shown to increase reconstruction accuracy from 73% to 91% compared to template-generated dissections (Schilling et al., 2020). As such, changing the parameters used for generating tractograms in individuals can result in connectomes with significantly different architecture, a phenomenon that has not been extensively characterized.
Cortical parcellations—the atlases used to define the boundaries between brain areas acting as network nodes—are also a source of variability in connectome architecture. Such parcellations have been undergoing continual revision since at least the time of Brodmann (Brodmann, 1909; Zilles, 2018), and the methods used to generate them are highly variable (Arslan et al., 2018). For example, parcellations have been generated using manual segmentation based on sulcal and gyral anatomy (Desikan et al., 2006); using network models based on functional connectivity (Schaefer et al., 2018); and on multimodal combinations of anatomical, microstructural, and functional features (Genon et al., 2018; Glasser et al., 2016; Wang et al., 2015). Parcellations are also difficult to compare due to differences in the number of regions delineated (Fornito et al., 2010; Zalesky et al., 2010), variability in the surface areas of regions (Van Essen et al., 2012), and interhemispheric (a)symmetry (Yan et al., 2023).
After the connectomes of individuals have been constructed, it is common practice to aggregate the data to derive a group-representative network (de Reus & van den Heuvel, 2013a; Yeh et al., 2016). At this stage it is important to define which connections (edges) should be maintained and how these connections should be weighted. Different methods have been proposed for the former, including retaining edges that are the strongest or most frequently occurring across individuals (de Reus & van den Heuvel, 2013a), retaining edges that are the least variable across people (Roberts et al., 2017), or retaining edges that preserve a specific proportion of connections in different distance bins (Betzel et al., 2019). Complicating the methodological differences of these approaches, the specific thresholds used are often chosen heuristically (Bordier et al., 2017), making it difficult to compare studies using different thresholds.
Here, we compare the effects of different choices at these three key steps—tractography, parcellation, and group reconstruction—on properties of hub connectivity in a sample of 294 healthy young adults. The different options examined at each step resulted in 1,760 different group-representative connectomes. We evaluate the effects of each of these choices on measures of binary and weighted node degree, given that these measures are fundamental to many other network measures and to the definition of network hubs. In particular, we focus on both the distribution of degree measures across nodes and their spatial topography, evaluating the consistency with which hubs are localized to the same anatomical positions.
RESULTS
The processing steps that we independently varied are summarized in Figure 1. In total, we compared 1,760 distinct pipelines. This set encompassed 40 pipelines for generating individual connectomes (10 tractography workflows and 4 parcellation schemes) as well as 44 pipelines for reconstructing group-representative connectomes (4 group aggregation methods and 11 density thresholds). The Results section is organized as follows: first, we examine the effects of different processing steps on statistical properties of the weighted degree distributions. Second, we compare the spatial distribution of node degrees across the different pipelines. Finally, we examine how specific properties of each parcellation are associated with node degree. We focus in the main text on analyses of weighted node degree (also called node strength) distributions and report results for unweighted (binary) distributions in the Supporting Information.
Statistical Properties of Node Degree Distributions
Figure 2 shows how properties of the node strength distributions vary as a function of parcellation and tractography parameters. For simplicity, we focus on networks thresholded at a connection density of 20% and aggregated using edge coefficient of variation (CV), since different density thresholds and aggregation methods did not substantially alter the shape of the distribution (Figures S1–S4). We focus on three key properties that quantify the tail decay of the empirical degree distributions in comparison to the exponential distribution: the right-tailedness (Jordanova & Petkova, 2017), the skewness, and the excess kurtosis (see Methods for details). The exponential distribution has been generally defined as the cutoff for heavy-tailed distributions (Foss et al., 2013), and is therefore a benchmark for assessing whether the concentration of connectivity in hub nodes exceeds that of a single-scale network (Amaral et al., 2000). The distribution of outliers (the right-tailedness) and the third and fourth standardized moments (the skewness and kurtosis) have been previously described to capture tail behavior in statistical distributions (DeCarlo, 1997; Jordanova & Petkova, 2017; Westfall, 2014). They are also parameter-invariant for the exponential distribution (right-tailedness ≈ 0.009, skewness = 2, and excess kurtosis = 6), allowing for an assessment of heavy-tailedness that is not biased by user-defined parameters.
Figure 2A indicates that all pipeline variations qualitatively show some evidence of skewness and a heavy tail. The DK68 and HCP360 parcellations show the largest positive skews, whereas the S200 and S500 parcellations show much smaller tails, consistent with a lower likelihood of finding very highly connected hubs. The exception is the use of workflow 3 (ACT/WM seeding/FACT), which shows an extended tail across all parcellations.
The right-tailedness and skewness of the node strength distributions are shown in Figures 2B and 2C, respectively, as functions of parcellation, workflow, and connection density; 692 of 1,760 pipelines (39%) have more right-sided outliers than an equivalent exponential degree distribution (Figure S2). The skewness is always positive (Figure S3), ranging between 0.42 and 6.04. However, only 25% of pipelines (432 of 1,760) demonstrate a greater skew than the exponential distribution (skewness = 2). Excess kurtosis (Figure S4) is greater than 0 (i.e., more kurtotic than the Gaussian distribution) in 94% of the pipelines (1,654 of 1,760) and greater than 6 in 25% of the pipelines (445 of 1,760). Thus, despite the widely held belief that connectomes contain network hubs, a property that should be reflected in a heavy-tailed degree distribution, only ∼25% to ∼40% of the processing pipelines examined here displayed distributions with properties that align with this hypothesis, depending on how heavy-tailedness is quantified.
There are three further key findings that are evident in Figure 2B and C. First, tractography algorithm has a major effect on the properties of the node strength distribution, with evidence of a skewed, heavy-tailed distribution only obtained when using specific processing steps in combination with deterministic tractography (FACT). More specifically, the most skewed distributions are observed when combining FACT with ACT (workflows 1, 3, and 5), with the additional use of white matter seeding yielding the highest skewness (workflow 3). This effect is apparent across connection densities and persists regardless of the method used for group aggregation (Figures S2–S4). In contrast, probabilistic tractography (iFOD2) only yields evidence of a right-tailed distribution when combined with the HCP360 parcellation.
The second key finding in Figure 2B and C is that parcellation type affects the strength distributions. The skewness, kurtosis, and right-tailedness of connectomes using the HCP360 parcellation are generally higher than other parcellations, regardless of the processing steps used. In part, this may be due to the known size discrepancy between regions of the HCP parcellation, which we examine in more detail in the section The Effect of Variations in Regional Surface Area. The skewness, kurtosis, and right-tailedness only exceed those of an exponential distribution when using the HCP360 parcellation for all pipelines using probabilistic tractography.
The third key finding in Figure 2B and C is that skewness and right-tailedness change minimally as connection density is varied. Thus, connection density does not have a large impact on the tails of the strength distributions of weighted connectomes.
The results for binarized connectomes show some differences relative to weighted connectomes (Figures S5–S8). Specifically, the skewness, right-tailedness, and kurtosis of binarized connectomes are more stable than weighted connectomes when different data processing parameters are varied. Only 2.1% of connectomes (37 of 1,760) are more skewed than the exponential distribution (all using the HCP360 parcellation; Figure S6). Similarly, 2.2% (39 of 1,760) are more kurtotic (Figure S7) and 6.2% (109 of 1,760; Figure S8) show evidence of greater right-tailedness than the exponential distribution. Notably, the skewness of the binarized connectomes was more sensitive to changes in connection density, particularly when edge consistency and CV-based thresholding were used with the HCP360 parcellation. In these specific cases, the distributions showed supra-exponential skewness and right-tailedness at thresholds of 5–10% but not at thresholds of 20–30%. Evidence of strong skewness, kurtosis, or right-tailedness in connectomes using parcellations other than HCP360 was weak and only occurred in rare instances.
Our analysis of strength distributions indicates that conclusions about the degree to which connectivity is concentrated in network hubs can vary substantially depending on how the data are processed, with tractography algorithm (i.e., deterministic or probabilistic) and parcellation type having particularly large impacts. We now turn our attention to how different processing choices affect the spatial embedding of degree. That is, we evaluate whether different pipelines produce network hubs localized to consistent anatomical regions.
Topographical Properties of Node Degree Sequences
For each parcellation separately, we first calculated the partial rank correlations between the degree distribution of each pair of pipelines, controlling for regional surface area. The resulting matrices (one for each parcellation) represent the similarity in spatial location of hubness between tractography workflows. Hierarchical agglomerative clustering of these matrices was used to group similar pipelines together (Figure 3A–D). Taking the S200 parcellation as an example, Figure 3B shows that there are substantial differences in the node strength correlation between pairs of workflows, spanning the range −0.11 < ρ < 1.00, with an average of 0.47 (Figure 3E). As per prior work (Oldham et al., 2020), two large clusters are evident, separating workflows using deterministic tractography from those using probabilistic tractography. The average correlation within the cluster corresponding to deterministic tractography is 0.64 (0.19 < ρ < 1.00) and is 0.67 within the probabilistic tractography cluster (0.32 < ρ < 0.99), with the average correlation between clusters being 0.30 (−0.11 < ρ < 0.57). Within the deterministic tractography cluster, there is a further split as a function of spatial constraint (i.e., ACT vs. GWM) with further subdivisions according to seeding strategy. Within the probabilistic tractography cluster, smaller clusters can also be defined as a function of spatial constraint and seeding strategy, but these subclusters are less homogeneous than those in the deterministic tractography cluster. The basic cluster structure was largely consistent across parcellations, with some minor variations. For instance, with the DK68 atlas, connectomes generated using dynamic seeding, probabilistic tractography, and a gray-white mask (workflow 7) formed their own subcluster. The group aggregation algorithm and threshold density have minimal impact on the clustering (Figure S9).
Figure 4 shows how the spatial distribution of node strengths varies across workflows and parcellations. First, for a fixed parcellation (e.g., the S200 parcellation), the location of putative hubs varies considerably across maps under different processing variations. When using deterministic tractography (FACT), the highest strength nodes are located in the vicinity of the paracentral lobule and supplementary motor area, compared to be located in primary visual areas when using probabilistic tractography (iFOD2). The enhanced skewness associated with the combination of ACT/WM/FACT (workflow 3) is also apparent in these maps. Notably, the DK68 and HCP360 atlas appear more robust to processing variations, which may be driven by the large variability in the size of the parcels comprising these atlases. We consider this issue in more detail in the next section.
Second, for a fixed workflow, Figure 4 shows variations across parcellations. Such comparisons across parcellations can only be performed qualitatively as the lack of region-to-region correspondence precludes direct comparison. Once again, conclusions about the locations of hub regions vary dramatically. The highest strength nodes for the DK68 atlas are located in the medial prefrontal cortex (PFC), whereas this area is associated with relatively low strength in the other parcellations. The S200, HCP360, and S500 parcellations show a greater degree of consistency, with higher strength nodes located in visual, lateral prefrontal, anterior insula, and inferior parietal regions. The major discrepancy between these parcellations is in the primary sensorimotor cortex, which has a high strength in HCP360, but not in S200 or S500 parcellations. For a given parcellation and tractography workflow, the group aggregation algorithm and threshold density have a small effect on node strength rankings (Figure S10).
Figures S11 and S12 compare the spatial distribution of node rank variability across tractography workflows, group reconstructions, and densities. As tractography workflow is altered, relative node rankings can vary drastically (Figure S11). While the middle nodes are the most variable, the rankings of the top 10–20% of nodes are also highly inconsistent. For example, across 30 density-matched group-reconstructed connectomes, the node that is ranked 4th on average can vary between ranks 1 to 43; similarly, the node that is ranked 10th on average can vary between ranks 2 to 54. We also assessed the node rank variability across variations in group reconstruction for an exemplar tractography workflow (Figure S12). While the findings are qualitatively similar to those in Figure S11, the magnitude of the variability is much lower in this analysis. When considered together with Figure 3, this result suggests that tractography workflow—rather than group reconstruction—drives much of the variance in node rankings.
Similarities between node degree distributions in binarized connectomes are shown in Figures S13 and S14. The results show a major difference between probabilistic and deterministic tractography across all parcellations (Figure S13). The locations of the strongest nodes are similarly variable: for example, using the S200 parcellation, the highest degree node is consistently found in the insula, but other high-degree nodes are located in the occipital cortex when using FACT and in temporal areas when using iFOD2 (Figure S14).
The effects of parcellation on node strength seem, in some cases at least, related to the node surface area (here, node surface area is defined as the average surface area of the given node across all participants). For instance, the most skewed strength distributions were observed for the DK68 and HCP360 parcellations, which have a much wider variance in regional surface areas than the S200 and S500 parcellations (Figure S15). Moreover, the medial PFC in the DK68 parcellation falls under the superior frontal gyrus anatomical label, which is the largest region in this parcellation. In the other parcellations, the medial PFC is subdivided into smaller parcels. It is also evident from Figure 4 that the degree sequences of the DK68 and HCP360 atlases are fairly robust to processing variations, which is notable since these are the atlases with the greatest variance in regional surface area. Areas with larger surface area will be able to accommodate more incoming and outgoing connections, and we should thus expect node strength/degree to be related with surface area. This raises the possibility that node degree will largely be driven by regional size variations, particularly in atlases with a high variance of parcel surface area. We therefore examine, in the next section, the degree to which the size of a node in a given parcellation determines its hubness by correlating node strength with surface area across parcellations, workflows, and group reconstruction methods.
The Effect of Variations in Regional Surface Area
Figure 5A shows spatial maps of node strengths obtained for two example parcellation and workflow combinations (S200 + GWM/dynamic seeding/iFOD2 and HCP360 + ACT/GMWMI/FACT) and Figure 5B shows the scatterplot of the association between node surface area and strength for each. Figure 5C shows the correlation coefficients for all tractography parameters and threshold densities for the S200 and HCP360 parcellations using edge CV (all tractograms/parcellations/group reconstructions are in Figures S16 and S17). Across all processing and parcellation combinations, the correlations between node strength and node surface area spanned the range 0.10 < r < 0.96, with a median of 0.82. Correlations for pipelines using probabilistic tractography (iFOD2) were all above r = 0.78 with a median correlation coefficient of 0.88. This high correlation persists regardless of thresholding algorithm or connection density (Figure S16). Correlations for pipelines using deterministic tractography (FACT) were somewhat lower, with a median value of 0.67 (0.10 < r < 0.91). The relationship between node strength and surface area was slightly weaker when using either of the Schaefer parcellations (S200 or S500) or the combination of ACT/WM/FACT (or both; Figures S16 and S17). Note that while node strength is highly correlated with node surface area, the same is not true of individual edges: Figure S18 shows that the weight of individual edges is not related to the total surface area of their endpoint nodes.
We next investigated whether removing the dependence of node strength on size changes the spatial distribution of the former measure. Figure 5D shows an example of the spatial distributions of the residual node strength values obtained after removing their dependence on regional surface area via linear regression. In the S200 parcellation, the nodes with the highest residuals tend to be those that are originally of medium-high strength (e.g., insula and inferior temporal gyrus). Thus, the locations of the most strongly connected nodes remain approximately similar. In contrast, in the HCP360 parcellation, the retrosplenial and presupplementary motor area cortices show disproportionately high strengths relative to surface area.
The relationship between the residuals and the original strength of each node is shown in Figure 5E. The residuals remain highly correlated with the original strengths (mean correlation across all pipelines r = 0.63 ± 0.20). While the distribution of residuals may change in location (mean) and scale (variance), the skewness, right-tailedness, and kurtosis are preserved (Figure 5F). Qualitatively similar results were obtained for all parcellations and group reconstructions (Figure S19).
Figure S20 shows the relationship between node surface area and degree in binarized connectomes. Similar to weighted node strength, the correlation is stronger when using probabilistic than deterministic tractography. In contrast with node strength, binary node degree generally has a lower correlation with surface area but a greater dependence on threshold density than in weighted connectomes (Figure S20). Across all parcellations and workflows, the median correlation was 0.44 (compared to 0.82 for the weighted connectomes). However, this relationship weakened as connection density increased. For example, in the Schaefer parcellations (S200 and S500), a correlation coefficient above 0.5 occurred only when the density was below 20%. Taken together, these findings suggest that atlas-specific variations in parcel size can influence, but not fully explain, statistical and topographical properties of node strength and degree.
DISCUSSION
We characterized the effects of several key processing steps of diffusion MRI on the distribution and location of the most strongly connected regions of the human connectome. In total, we examined 1,760 group connectomes (40 pipelines for individual connectome construction, and 44 group reconstruction schemes) which represent common choices and techniques in diffusion MRI processing. However, this analysis still encompasses only a fraction of the flexibility and variability that is possible in diffusion processing pipelines.
We found that, across all the investigated pipelines, evidence of concentrated connectivity in hubs (i.e., degree distribution properties that differ from the exponential case) was apparent in only a minor fraction of pipeline variations. When relying on node strength to define hubs, variations in tractography algorithm and parcellation had a much greater effect than changes in group reconstruction method and connection density. The use of binary degree yielded a less pronounced concentration of connectivity in network hubs and the resulting connectomes were more sensitive to connection density. When considering the spatial topography of hubs, the choice between probabilistic and deterministic tractography resulted in the largest difference and, in some circumstances, led to anti-correlated weighted degree sequences. Finally, although hubs were often the regions with the largest surface area, particularly in weighted connectomes, removal of this dependence of degree on region size generally retained a similar hub topography. Together, these findings raise concerns about the consistency with which hubs can be identified in the literature and suggest that careful consideration must be paid to processing choices when mapping connectomes with diffusion MRI.
The Effects of Tractography Algorithm
Degree distribution properties and hub strengths showed significant variations based on the tractography parameters used. Among the properties compared in our analysis, the choice of probabilistic versus deterministic tractography drove the greatest variation in degree distribution properties, as represented in the skewness, kurtosis, and right-tailedness of the degree distributions. In general, deterministic tractography resulted in more asymmetric distributions with heavier tails; in particular, the most skewed distributions in weighted connectomes resulted from the combination of white matter seeding, an anatomical streamline constraint, and deterministic tractography (ACT/WM/FACT). Given that these results were not consistently replicated across other workflows, the results of this combination of parameters may be atypical. Whether this atypicality reflects a unique sensitivity of this workflow combination in recovering the true underlying network architecture, or a result of interaction between processing steps, is unclear.
Changes in the shape of the degree distribution were also reflected in changes in the location of the strongest nodes and the relationship to node surface area. Probabilistic tractography showed a strong correlation between node strength and node surface area in weighted connectomes. This was observed across all parcellations, seeding strategies, spatial constraints, and group reconstructions. As such, the locations of hubs derived from probabilistic tractography were slightly more consistent, and degree distributions were generally more correlated between workflows.
The Effects of Cortical Parcellation
Many different parcellations have been used in the literature to map connectomes. These parcellations vary with respect to two key factors relevant to connectome mapping: their spatial resolution and their variance in parcel sizes. Spatial resolution naturally affects the precision with which the connectivity of regions can be resolved and can lead to differences in the spatial topography of hubs. For instance, the medial PFC was a prominent hub in the DK68 atlas but not in the other parcellations, where this area is subdivided into smaller regions. This variation is likely related to regional variations in surface area, since the medial PFC is among the largest in the DK68 parcellation. Such variations can interact with other processing choices; for instance, degree distributions were highly skewed and kurtotic (sub-exponential decay) when using probabilistic, but not deterministic, tractography with the HCP360 parcellation, for which the largest parcel is more than 1.5× larger than the largest parcel in the S200 parcellation.
The Effects of Regional Variation in Surface Area
To the extent that a given parcellation defines valid functional areas of the brain, the correlation between region size and degree may be an accurate reflection of biological reality—some regions may be more connected simply because of their size. However, it can be useful to determine whether a region’s hubness is simply a result of its surface area. It is somewhat reassuring that the relative degree rankings of different areas only changed moderately after controlling for the effects of size variations, but these effects should nonetheless be considered when drawing conclusions about the hub status of specific brain regions. Further work could consider the mechanisms by which the weight of individual edges (which are uncorrelated with node surface area) contribute to total node strength (which is often highly correlated with node surface area).
The Effects of Group Reconstruction and Connection Density
The specific method for aggregating individual connectomes into a group-averaged representation had minimal effect on node strength distributions or topographies. Binary degree was more susceptible to the effect of varying connectome density, which is likely because thresholding removes the weakest connections. Such connections make a small contribution to weighted degree but make an equal contribution to strong edges when estimating binary degree.
Limitations
We intentionally used model-free quantities to characterize network degree distributions to simplify and standardize measures across the various pipelines considered. An alternative is to fit specific distributions to the data. For example, previous studies have reported that weighted connectomes have a degree distribution that follows a power-law distribution (Varshney et al., 2011), a truncated power-law distribution (Modha & Singh, 2010), or a generalized Pareto distribution (Zucca et al., 2019). In the best case, these models can suggest a biological mechanism which may produce observed patterns of hub connectivity, but care should be taken in performing inference using such analyses (Clauset et al., 2009). Our approach offers a hypothesis-free way of quantifying the degree to which connectivity is concentrated in putative hub nodes, but future work could consider characterizing the precise forms of connectome degree distributions in more detail.
We focused here on cortex for simplicity, given the large number of processing pipelines that we examined. However, our conclusions are sufficiently general that they should not be significantly altered by the inclusion of non-cortical areas. Similarly, we focus on a normative cohort, given that applications to some clinical cohorts may require further steps and thereby exacerbate workflow variability (Martínez-Heras et al., 2015; Mochizuki et al., 2023; Richards et al., 2021; Sanvito et al., 2020; Shu et al., 2011). Future work may examine the effects of such additional variations (Gonzalez-Aquines et al., 2019; Horbruegger et al., 2019; Lipp et al., 2020).
The absence of a ground truth for diffusion MRI makes comparisons between pipelines challenging. Diffusion MRI results have been compared to tract tracing in animals (Calabrese et al., 2015; Girard et al., 2020) and to simulations (Farrher et al., 2012; Maier-Hein et al., 2017), but the field is yet to converge on a gold standard pipeline.
Finally, our analysis focused on group connectomes, as these are most commonly studied in the literature. Recent analyses of functional MRI data have suggested that there is considerable individual variability in network architecture that is behaviorally meaningful (Kong et al., 2019; Levakov et al., 2021; Sun et al., 2022). Developing better ways of capturing biologically meaningful individual differences, as distinct from measurement noise, remains an important challenge for the field.
Conclusions
Our findings indicate that different processing choices affect inferences about network hubs, and that evidence for a concentration of connectivity in hubs occurs in a minor fraction of pipeline variations. Thus, our analysis suggests that it can be quite difficult to identify network hubs in a consistent way, at least across different tractography algorithms and parcellations. However, not all pipeline choices are equal. Although no gold standard pipeline currently exists, some choices are preferred over others. Some denoising procedures, such as the use of eddy correction with outlier replacement and within-slice motion correction, show excellent denoising performance, and are thus recommended (Oldham et al., 2020). ACT (Smith et al., 2012) represents a principled, reasonable constraint on tractography that can be used to remove biologically implausible streamlines. Furthermore, certain parcellations yield parcels that are more functionally homogeneous than others, supporting their biological validity. In this respect, the Schaefer parcellations generally perform quite well with respect to diverse benchmarks (Bryce et al., 2021; Schaefer et al., 2018). However, whether one should choose deterministic or probabilistic tractography is a difficult question to answer definitively. Deterministic tractography is more conservative, but may miss real long-range connections that are important for mapping hub connectivity (Arnatkevičiūtė et al., 2021; Fulcher & Fornito, 2016; van den Heuvel et al., 2012). Probabilistic tractography is better able to resolve such connections but may be prone to false positives. Choices related to different seeding strategies may require more detailed investigation. The incorporation and improvement of sparsity constraints and filtering techniques (Schiavi et al., 2020; Smith et al., 2015b) will be important for improving the accuracy of these approaches. Ongoing assessment with respect to plausible phantoms may help adjudicate between these alternatives (e.g., Maier-Hein et al., 2017). Until then, investigators should assess the robustness of their results by analyzing dMRI data using multiple pipelines, and should be aware of the effects that the choices they exercise in processing their data have on their final results.
METHODS
Participants
The 294 healthy participants (mean age 23.12 ± 5.18 years, 162 females) were recruited at Monash University with informed consent. All participants self-reported right-handedness and had reported no significant neurological/psychiatric history (i.e., no personal history of neurological or psychiatric disorders, no loss of consciousness or memory due to head injury, and no history of drug use disorder). Further information on sample characteristics is provided elsewhere (Sabaroedin et al., 2019). The study was conducted in accordance with the Monash University Human Research Ethics Committee (reference number 2012001562).
Image Acquisition
T1-weighted (T1w) and diffusion MRIs were acquired on a Siemens (Munich, Germany) Skyra 3T scanner with a 32-channel head coil at Monash Biomedical Imaging in Clayton, Victoria, Australia. T1w structural scans were acquired with the following parameters: 1 mm3 isotropic voxels, TR = 2,300 ms, TE = 2.07 ms, TI = 900 ms, and a FOV of 256 mm. Diffusion scans were obtained using an interleaving acquisition with the following parameters: 2.5 mm3 isotropic voxels, TR = 8,800 ms, TE = 110 ms, FOV of 240 mm, 60 directions with b = 3,000 s/mm2, and seven b = 0 s/mm2 vol. In addition, a single b = 0 s/mm2 was obtained with reversed phase encoding direction for susceptibility field estimation.
Image Processing Common to All Pipelines
Imaging data were processed using the Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE) high-performance computing infrastructure (Goscinski et al., 2014) as described by Oldham et al. (2020). The analysis evaluated the efficacy of 240 different diffusion MRI processing pipelines in mitigating motion-related artifacts in connectivity estimates, with the pipelines generated by varying choices at each of seven steps (distortion correction, tractography algorithm, propagation constraints, streamline seeding, tractogram reweighting, edge weighting, and parcellation). We adopted recommendations of Oldham et al. (2020) for three of these (distortion correction, tractogram reweighting, and edge weighting), as specific options in these steps were shown to reduce the correlation between head movement and structural connectivity. We evaluated effects of the four remaining factors, three of which pertain to the tractography algorithm (probabilistic vs. deterministic algorithm, propagation constraints, streamline seeding) and the last of which pertains to parcellation. We further considered how these steps interact with different thresholding and group aggregation methods. A visual schematic of our pipeline variations is presented in Figure 1. Further details about the choices made at each step are provided in the following sections.
DWI and T1w preprocessing.
MRtrix version 3.0.15 (Tournier et al., 2019) and FSL version 5.0.11 (Jenkinson et al., 2012) were used to process the diffusion MRI data. First, FSL’s topup was used to estimate the susceptibility-induced off-resonance field using the forward and reverse phase-encoded b = 0 s/mm2 images (Andersson et al., 2003; Smith et al., 2004). Then, FSL’s eddy tool was used for motion and eddy current correction, which has been shown to successfully mitigate motion-related artifact in connectivity estimates (Oldham et al., 2020), and which incorporates both (i) a Gaussian process-based generative model for volume prediction and realignment (Andersson & Sotiropoulos, 2016) and (ii) reconstruction and replacement of slices with significant signal dropout (Andersson et al., 2016, 2017). The following parameters were used for slice-to-volume correction: temporal order of movement = 30, iterations = 5, strength of temporal regularization = 6, and trilinear interpolation. Finally, FAST in FSL was used to correct for B1 field inhomogeneities (Smith et al., 2004; Zhang et al., 2001).
The diffusion images were then coregistered to the T1w images via a rigid-body transformation using FSL’s FLIRT (Jenkinson et al., 2002; Jenkinson & Smith, 2001) and the inverse of this transformation was used to map the T1w image to the subject’s native diffusion space, where all tractography was performed. FreeSurfer version 5.3 (Fischl, 2012) was used to extract cortical surface models (gray/white matter surface and gray/CSF surface) from T1w images. All outputs were visually inspected and manually corrected, if required. Parcellation schemes (detailed in the section Parcellation) were applied to the cortical surface models; these were then projected to the T1w image grid and used to define network nodes.
Pipeline-Specific Image Processing
In this section, we outline the key pipeline variations considered in our analysis.
Streamline seeding algorithm.
Streamline seeding is the process by which voxels are selected to be the propagation points for streamlines. As in Oldham et al. (2020), we compare three streamline seeding algorithms:
White matter (WM): voxels coded as white matter are randomly chosen as streamline seeds.
Gray matter–white matter interface (GMWMI): voxels containing a gradient between gray matter and white matter are chosen as streamline seeds, with the aim of improving the tractography of shorter fibers (Smith et al., 2013, 2015a).
Dynamic: the relative difference between the predicted fiber density (based on the diffusion model) and the current density is used to inform the probability of choosing a particular location as a seed, with the aim of correcting for under- or oversampling of a given fiber tract (Smith et al., 2015b).
Streamline tractography algorithm.
Most tractography algorithms are classified as being either deterministic or probabilistic. Deterministic algorithms tend to be more conservative and thus prone to false negatives, while probabilistic algorithms are more sensitive but can be prone to false positives (Reveley et al., 2015; Sarwar et al., 2019; Thomas et al., 2014). We compared an exemplar of each class. Both approaches were implemented in MRtrix3 (Tournier et al., 2019):
Deterministic tractography was performed using the fiber assignment by continuous tractography (FACT) algorithm (Mori et al., 1999; Mori & van Zijl, 2002).
Probabilistic tractography was performed using second-order integration over fiber orientation distributions (iFOD2) (Tournier et al., 2007, 2010, 2012).
For both tractography algorithms, 2,000,000 streamlines were generated with a maximum length of 400 mm, a maximum curvature of 45° per step, the default step size (1.25 mm for FACT; 0.25 mm for iFOD2), and the default termination criterion (0.05 amplitude of the primary eigenvector for FACT; 0.05 FOD amplitude for iFOD2).
Streamline propagation constraint.
Tractography algorithms often track streamlines through anatomically implausible areas (e.g., ventricles), which can be addressed by imposing some constraints on streamline propagation. We examined two spatial constraints:
Gray and white matter masking (GWM), involving the use of a binary mask (combining the gray and white matter masks from the FreeSurfer segmentation) that ensures streamlines only travel through brain parenchyma.
Anatomically constrained tractography (ACT), which uses a multitissue segmentation (cortical gray matter, subcortical gray matter, white matter, and CSF) and a series of propagation rules to ensure that streamlines follow anatomically viable paths (Smith et al., 2012).
Because the implementation of GMWMI seeding in MRtrix3 requires ACT, pipelines combining GWM and GMWMI were excluded. The above combinations therefore resulted in a total of 10 different tractography workflows for comparison.
Parcellation.
A wide variety of parcellations has been used in the connectomics literature (Arslan et al., 2018; de Reus & van den Heuvel, 2013b; Lawrence et al., 2021). The specific parcellation used can affect various network properties (Eickhoff et al., 2015; Fornito et al., 2010; Zalesky et al., 2010). We compared four different cortical parcellation schemes derived using three different approaches:
The Desikan-Killiany parcellation (DK68), comprising 34 cortical nodes in each hemisphere delineated using sulcal and gyral landmarks (Desikan et al., 2006).
The Human Connectome Project MMP1 parcellation (HCP360), comprising 180 cortical nodes per hemisphere defined using a semiautomated pipeline that leverages information on regional cortical architecture, function, connectivity, and topography (Glasser et al., 2016).
The Schaefer et al. (Schaefer et al., 2018) 200 and 500 node parcellations (S200 and S500), generated based on local gradients of global profile similarities in regional functional coupling estimates.
These parcellations represent both (i) distinct technical and methodological approaches relying different biological properties, and (ii) diversity in the sizes and shapes of parcels produced. Each parcellation was generated using surface models estimated by FreeSurfer using fsaverage coordinates; these were registered to each individual’s surface and then projected out to the T1w volume. The combination of 10 tractography workflows and 4 parcellations resulted in a total of 40 pipelines for reconstructing individual connectomes.
Group aggregation.
Having generated individual connectomes using the above parameters, we compared four methods for aggregating the data to obtain a group-representative connectome:
Edge weight, which retains edges with the largest mean weight, up to a specified density.
Edge coefficient of variation (CV), which retains edges with the smallest CV across participants (Roberts et al., 2017), up to a specified density.
Edge consistency, which retains edges that are present (i.e., with nonzero weight) in the greatest number of participants (de Reus & van den Heuvel, 2013a), up to a specified density. While this this approach can be formulated by selecting a specific consistency threshold, here we equivalently specified density thresholds and retained the most consistent edges to ensure that connectomes are density-matched, which facilitates comparisons across pipelines (see section Group thresholding).
Edge distance–dependent binning, which bins edges according to their length, using a specified number of bins, and retains edges that are most frequently present within each bin (Betzel et al., 2019).
Note that for each method, the final weight of the retained edges is equal to the mean of the edge weights across all participants; it is only the choice of which edges to retain that changes.
Group thresholding.
Having generated a group connectome using one of the above approaches, we thresholded the resulting matrix at different levels using one of two approaches, depending on the aggregation method:
Density thresholds were used for group connectomes aggregated using edge weight, edge CV, and edge consistency, retaining the top-ranked edges according to each measure, evaluating densities spanning 5% to 30%, in increments of 2.5%.
The number of bins was used for the group connectome generated with edge distance–dependent binning, in which we changed the number of bins from 10 to 110, in increments of 10. In general, increasing the number of bins increases the density of the group connectome, resulting in networks with densities spanning 2% to 94%. Note that connectomes generated in this way were evaluated separately when evaluating how network properties depend on connectome density.
The combination of four group aggregation methods and 11 thresholds for each threshold resulted in a total of 44 group reconstruction regimes for comparison.
Statistical Analysis
We first evaluated how the above processing choices affect properties of the degree distribution of the connectome. The degree distribution defines the extent to which connectivity is concentrated in network hubs. Distributions with a heavy tail imply the existence of highly connected hubs, whereas distributions with an approximately exponential fall-off imply that the concentration of connections on putative hubs does not exceed the expectations of a random network (Fornito et al., 2016). We therefore concentrated on the properties of the distribution tails. Distributions of both binary and weighted node degree in brain networks have been previously described as heavy-tailed (taken here to mean that the tail decays subexponentially), but the precise distribution they follow has been the subject of debate (Buzsáki & Mizuseki, 2014; Fornito et al., 2016; Roberts et al., 2015; Zucca et al., 2019). Moreover, parametric modeling of these distributions is dependent on user-defined inputs, such as the choice of the models under consideration or the model fitting procedures used, resulting in another source of variability when comparing computational pipelines.
Fitting the empirical degree distribution to the generalized extreme value distribution and obtaining a tail decay index can mitigate these problems (Gomes & Guillou, 2015; de Haan & Ferreira, 2006; Hill, 1975). However, this approach often requires a large number of data points (Németh & Zempléni, 2020) and still depends on heuristic measures to define the start and end of the tail (Bauke, 2007; Gomes et al., 2009; Paulauskas & Vaičiulis, 2017). We therefore used the nonparametric approach described by Jordanova and Petkova (2017), which more directly focuses on the question of heavy-tailedness.
In brief, to determine if the distribution of a random variable X has a heavier right tail than the exponential distribution, we calculate the empirical first and third quartiles, 1 and 3, respectively, and the interquartile range, IR = 3 − 1. We then define the “right-tailedness” of the distribution as the probability that a random drawn observation from the distribution is greater than the value given by 3 + 3IR (i.e., pR(X) = P(X > 3 + 3IR)), utilizing the commonly used definition of extreme outliers (McGill et al., 1978). This value can be compared to the right-tailedness of the exponential distribution (X ∼ e−λx), which is invariant to the shape parameter λ, such that pR(X) = exp(−λ · ln(33 · 4)/λ) = 1/108 ≈ 0.009 (Jordanova & Petkova, 2017). This analytic solution offers a convenient threshold for determining whether a distribution has a heavier right tail than the exponential distribution, with heavy-tailedness implied if the empirical pR > 0.009.
Additionally, we quantified the asymmetry of the whole distribution using the skewness (the third standardized moment), which is also constant for the exponential distribution (skewness = 2). Finally, for completeness, we calculated the excess kurtosis (the fourth standardized moment), which provides an alternative method for capturing the behavior at the tails (DeCarlo, 1997; Westfall, 2014). This measure has been shown to be robust for detecting outliers in small samples (Hayes et al., 2007; Livesey, 2007) and is also independent of the shape parameter of the exponential distribution (excess kurtosis = 6). We note that other methods are available, including tail index estimation (Caers & Dyck, 1998; Németh & Zempléni, 2020), parametric fitting (Zucca et al., 2019), and skewness-free kurtosis measures (Critchley & Jones, 2008; Eberl & Klar, 2022; Jones et al., 2011; Oja, 1981). However, as with parametric modeling, these methods rely on user-defined algorithms or parameters (such as the number of quantiles to be used or the cutoff point for initialization of the tail), similarly making comparisons difficult.
After characterizing the statistical properties of the degree distribution, we examined the spatial distribution of interregional connectivity by considering the degree sequence. The degree sequence encodes the assignment of degree values to specific nodes, hence capturing the spatial position or topography of network hubs. Within parcellations, we compared the consistency of hub topography and the effects of surface area on hubness. Between parcellations, we examined the consistency of hub topography across different pipelines qualitatively, as the lack of region-to-region correspondence precludes a direct comparison.
AUTHOR CONTRIBUTIONS
Mehul Gajwani: Conceptualization; Formal analysis; Methodology; Writing – original draft. Stuart Oldham: Conceptualization; Formal analysis; Methodology; Software; Writing – review & editing. James C. Pang: Supervision; Writing – review & editing. Aurina Arnatkevičiūtė: Methodology; Writing – review & editing. Jeggan Tiego: Resources; Writing – review & editing. Mark A. Bellgrove: Funding acquisition; Writing – review & editing. Alex Fornito: Conceptualization; Formal analysis; Funding acquisition; Methodology; Supervision; Writing – original draft.
SUPPORTING INFORMATION
Supporting information for this article is available at https://doi.org/10.1162/netn_a_00324. All the data used in this study is openly available on Figshare at https://doi.org/10.26180/c.6352886.v1. Scripts to analyze these data are available on GitHub at https://github.com/BMHLab/DegreeVariability.
FUNDING INFORMATION
Alex Fornito, Sylvia and Charles Viertel Charitable Foundation (https://dx.doi.org/10.13039/100008717). Alex Fornito, National Health and Medical Research Council, Award ID: 1149292. Alex Fornito, National Health and Medical Research Council, Award ID: 1197431. Mark A. Bellgrove, NHMRC Senior Research Fellowship, Award ID: Level B. Alex Fornito, Australian Research Council (https://dx.doi.org/10.13039/501100000923), Award ID: DP200103509.
TECHNICAL TERMS
- Connectome:
A comprehensive map of connections between neural elements.
- Tractography:
The process of reconstructing the putative trajectories of axonal fibres, often based on the anisotropic diffusion of water.
- Parcellation:
A map delineating distinct neural elements.
- Group-representative network:
A single network that aggregates and represents connectivity data from multiple individuals.
- Hubs:
Nodes that have stronger or more numerous connections, and that may be viewed as more significant to network structure/function.
- Degree distribution:
The probability density function of node degree (either weighted or unweighted) in a given connectome.
- Right-tailedness:
The frequency of outliers in the right-sided tail of the distribution of a random variable.
- Structural connectivity:
Anatomical (rather than functional or statistical) connections between neural elements.
REFERENCES
Competing Interests
Competing Interests: The authors have declared that no competing interests exist.
Author notes
Handling Editor: Mikail Rubinov