Abstract
Despite the widespread exploration and availability of parcellations for the functional connectome, parcellations designed for the structural connectome are comparatively limited. Current research suggests that there may be no single “correct” parcellation and that the human brain is intrinsically a multiresolution entity. In this work, we propose the Continuous Structural Connectivity-based, Nested (CoCoNest) family of parcellations, a fully data-driven, multiresolution family of parcellations derived from structural connectome data. The CoCoNest family is created using agglomerative (bottom-up) clustering and error-complexity pruning, which strikes a balance between the complexity of each parcellation and how well it preserves patterns in vertex-level, high-resolution connectivity data. We draw on a comprehensive battery of internal and external evaluation metrics to show that the CoCoNest family is competitive with or outperforms widely used parcellations in the literature. Additionally, we show how the CoCoNest family can serve as an exploratory tool for researchers to investigate the multiresolution organization of the structural connectome.
Author Summary
In this work, we derive a family of structural connectivity-based parcellations, called the CoCoNest family, based on a continuous representation of structural connectivity. This family is created using agglomerative clustering to grow a fully binary tree and then error-complexity pruning to greedily derive subtrees that balance complexity and goodness of fit. The CoCoNest family is evaluated with a comprehensive collection of internal and external evaluation metrics across two independent datasets and compared with widely used parcellations. Our results show that members of the CoCoNest family are competitive with other parcellations and often achieve superior performance. We show that the CoCoNest family can serve as an exploratory tool for investigations into the multiscale nature of the structural connectome.
INTRODUCTION
The human connectome has long inspired critical research within the field of neuroscience (Brodmann, 1909; Elam et al., 2021; Hagmann et al., 2008; Leergaard & Bjaalie, 2022; Marin-Padilla, 1983; Sporns, Tononi, & Kötter, 2005; Yeo et al., 2011). The exploration of the human connectome has deepened in recent years due to advances in noninvasive brain imaging techniques. These advances have spurred the collection of massive brain imaging datasets (e.g., the Human Connectome Project (HCP) and the Adolescent Brain Cognitive Development (ABCD) study; Casey et al., 2018; Van Essen et al., 2013), which have enabled and enriched connectome research. A crucial step in connectome research involves parcellating the brain into discrete regions of interest (ROIs) that are both spatially and neurobiologically coherent. This collection of ROIs, also called a parcellation or an atlas, is essential because it reduces the complexity of the connectome data while preserving important features relevant to analyses. Parcellations enable researchers to easily explore the organization of the brain, unravel interactions between brain regions, and study how the connectome relates to the overall brain function and behavior (Glasser et al., 2016; Gosnell, Fowler, & Salas, 2019; Hirsiger et al., 2016; Yeo et al., 2011). Additionally, by viewing the ROIs as nodes and the connection strengths between them as edges, parcellations form a link between human connectome research and the expansive literature on network analysis (Fornito, Zalesky, & Bullmore, 2016; Sporns et al., 2005; J. Wang et al., 2010, 2015).
Connectomes are primarily categorized into two modalities: the functional connectome and the structural connectome. The functional connectome serves as a map of the dynamic organization of the brain, where dependencies between activation patterns, measured using functional MRI (fMRI), are used to quantify the connection between ROIs. The structural connectome serves as an anatomical map, where the presence of interconnecting pathways of white matter fiber tracts, uncovered using diffusion MRI (dMRI) and tractography, are used to quantify the connection between ROIs.
When analyzing the connectome, researchers typically choose between an anatomical and a connectivity-based parcellation. Anatomical-based parcellations (Desikan et al., 2006; Destrieux, Fischl, Dale, & Halgren, 2010) are facilitated by neuroanatomy experts who manually identify anatomical landmarks. These parcellations contain neurobiologically meaningful parcels that are relevant for clinical applications. However, such parcellations are often constructed from small cohorts of subjects that may limit their generalizability. Furthermore, because these parcellations do not directly use connectome data, they may be limited in capturing characteristics unique to the functional or structural connectome. These limitations, along with the collection of massive brain imaging datasets, have motivated the creation of connectivity-based parcellations that directly use functional or structural connectivity data. Recent work in connectivity-based parcellations has predominantly focused on the use of functional connectome data. However, there is a general consensus that the structural architecture of the brain plays a critical role in its functional dynamics (Van Essen, 2013; Passingham, Stephan, & Kötter, 2002). This notion, together with recent advances in structural connectome reconstruction (St-Onge, Daducci, Girard, & Descoteaux, 2018; Van Essen et al., 2013), has inspired deeper explorations of the structural connectome, necessitating the construction of parcellations from structural connectome data.
In this paper, we propose a method for constructing a rich, multiresolution family of parcellations from structural connectome data. This approach leverages recent advances in structural connectome reconstruction to uncover the white matter architecture, explicitly strikes a balance between the complexity of the connectome data and the preservation of high-resolution connectivity patterns, and facilitates tractable investigations of the multiresolution nature of the structural connectome. To achieve this, we start by leveraging a recently developed tractography algorithm called surface-enhanced tractography (SET), which has been shown to decrease gyral bias and better approximate the underlying white matter structure (St-Onge et al., 2018). Subsequently, we use a continuous representation of structural connectivity (SC) (Cole et al., 2021; Consagra, Cole, & Zhang, 2022; Gutman et al., 2014; Mansour L., Seguin, Smith, & Zalesky, 2022; Moyer, Gutman, Faskowitz, Jahanshad, & Thompson, 2017), which provides a rigorous statistical framework for modeling white matter fiber tract endpoints and constructs dense, high-resolution SC matrices. Starting with the high-resolution SC data, we use a conventional agglomerative (bottom-up) clustering algorithm to construct a full binary tree that aims to reflect the hierarchical organization of the structural connectome. We then use error-complexity pruning (Breiman, Friedman, Olshen, & Stone, 1984; Chou, Lookabaugh, & Gray, 1989) to iteratively remove branches from this tree in a greedy fashion, balancing the complexity of the tree with its fit to the high-resolution connectome data. This procedure creates a nested sequence of subtrees where each subtree corresponds to a member of our multiresolution parcellation family. This allows users not only to choose the desired complexity of the parcellation but also to explore the interactions and distinctions between multiple resolutions of the structural connectome.
In addition, we draw on a collection of internal and external evaluation metrics from the literature to assess the consistency of our parcellation with neurobiological intuition and its performance in typical downstream tasks across two independent datasets.
Our internal and external evaluation results demonstrate that members of our proposed family of parcellations, which we call the CoCoNest family, are competitive with or outperform several widely used parcellations in the literature. Additionally, we make CoCoNest easy to reconstruct and access in the most popular template spaces. The code and data used to create the CoCoNest family are freely available at https://github.com/sbci-brain/CoCoNest.
MATERIALS AND METHODS
Figure 1 illustrates the pipeline used to create the CoCoNest family. We began by downloading high-quality imaging data from the HCP (Van Essen et al., 2013). These imaging data were processed using a comprehensive pipeline called the Surface-Based Connectivity Integration (SBCI) pipeline, recently developed by Cole et al. (2021). For each subject, SBCI performed standard image preprocessing steps and output a high-resolution SC matrix based on a continuous representation of SC. Each element of these matrices quantifies the density of white matter fiber tracts between two vertices on the brain’s surface. These subject-specific, high-resolution SC matrices were then averaged to obtain a high-resolution, group-averaged SC matrix. This group-averaged matrix was then fed into an agglomerative clustering algorithm, which created a full binary tree. This tree aims to both group together vertices with similar SC patterns and capture the multiresolution nature of the group-averaged SC data. Finally, we pruned this tree by adopting an error-complexity pruning approach, developed by Breiman et al. (1984). This pruning procedure resulted in the nested, multiresolution sequence of parcellations that we call the CoCoNest family. The sections below explore each step in more detail.
Data and Preprocessing
Data.
CoCoNest was constructed using high-quality dMRI and structural MRI (sMRI) data from 897 healthy, young adults (ages 22–35 years) who participated in the HCP (Van Essen et al., 2013). The HCP provides rich subject-level data on brain connectivity, behavior, and genetics. Minimally preprocessed data from this study can be accessed at https://db.humanconnectome.org/. Details on the data collection and the minimal preprocessing steps can be found in Van Essen et al. (2013).
Image data processing and connectome extraction.
The dMRI and sMRI data were processed through a recently developed neuroimaging pipeline called the SBCI pipeline (Cole et al., 2021). This pipeline consists of dMRI preprocessing, cortical surface extraction, and structural connectome reconstruction steps. A brief overview of these steps is provided below.
Starting with minimally preprocessed dMRI and sMRI data, SBCI follows standard image preprocessing steps. The dMRI data are skull stripped, bias-field corrected, cropped, intensity normalized, and resampled to 1-mm isotropic resolution using tools from MRtrix3, Advanced Normalization Tools (ANTs), and the Sherbrooke Connectivity Imaging Lab toolbox (Scilpy). Subsequently, the sMRI data are registered to the dMRI space and then processed using FreeSurfer’s recon-all. SBCI then computes voxel-wise fiber orientation distribution function (fODF) estimates, using Dipy’s implementation of constrained spherical deconvolution (Tournier, Calamante, & Connelly, 2007). These fODF estimates are then fed into a recently developed probabilistic tractography algorithm called SET (St-Onge et al., 2018), which incorporates information about the geometry of the white surface to improve tractography results. The SET algorithm has been shown to reduce gyral bias in SC and limit the amount of false-positive connections output from tractography.
Let Ω be the union of the two white matter surfaces corresponding to the left and right hemispheres of the brain (as shown in the top panel of Figure 2A). In this context, the set Ω × Ω represents all possible pairs of endpoints for white matter tracts that interface with the white matter boundary. The output of the SET algorithm is then a subset of Ω × Ω consisting of q endpoint pairs, where each pair of points indicates the ending positions of a white matter tract on Ω. Since each hemisphere's surface is homeomorphic to a sphere, SBCI parametrizes both surfaces using spherical coordinates. Let (p1, p2) be the image of an endpoint pair under this homeomorphism. The reparameterized endpoint pairs then lie on the union of two spheres, which, with a slight abuse of notation, we continue to denote by Ω, as shown in the bottom panel of Figure 2A.
In our parcellation pipeline, we excluded 446 vertices from the matrix S. These vertices correspond to the corpus callosum, as identified by Glasser et al. (2016), which is often treated separately in connectivity analyses.
Thus, we obtained a 3,675 × 3,675 high-resolution SC matrix for each of the 897 subjects in HCP. The connectivity values in these matrices were highly skewed, with a large number of weak connections and very few strong connections. To prevent our method, and several of the evaluation metrics, from being dominated by these extreme values, we scaled and log-transformed each of the connectivity values, S′ = log(10^5 × S + 1). These transformed SC matrices were then averaged to produce a high-resolution, group-level SC matrix (see Figure 1C). Averaging serves to enhance the signal-to-noise ratio of the SC values and yields a group-level SC representation.
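The scaling, log transformation, and averaging described above can be sketched in a few lines. This is a minimal illustration rather than the SBCI or CoCoNest implementation: it reads the scale factor as 10^5 and the logarithm as the natural logarithm (both assumptions), and the function names and toy 2 × 2 matrices are hypothetical.

```python
import math


def log_transform(sc, scale=1e5):
    """Apply S' = log(scale * S + 1) elementwise (natural log assumed)."""
    return [[math.log(scale * value + 1.0) for value in row] for row in sc]


def group_average(matrices):
    """Elementwise mean of equally sized square matrices, one per subject."""
    n_subjects = len(matrices)
    size = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / n_subjects for j in range(size)]
        for i in range(size)
    ]


# Two toy "subjects" with made-up connectivity values.
subject_a = [[0.0, 1e-5], [1e-5, 0.0]]
subject_b = [[0.0, 3e-5], [3e-5, 0.0]]

group_sc = group_average([log_transform(subject_a), log_transform(subject_b)])
```

Adding 1 inside the logarithm keeps zero connectivity at zero after the transform, so absent connections stay absent in the group average.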
For more details on the SBCI pipeline, see Cole et al. (2021) and the accompanying GitHub repositories at https://github.com/sbci-brain/.
Tree Creation
Using the group-averaged SC matrix, we measured the similarity of SC patterns across the cortical surface at multiple levels of granularity, that is, resolutions. At the highest resolution, we considered the similarity between SC patterns at the level of individual vertices. We then iteratively merged these vertices into parcels to measure the similarity of SC patterns at coarser resolutions. This was carried out using a standard agglomerative clustering algorithm. In the first stage of this algorithm, each of the 3,675 vertices is assigned to its own distinct parcel. In subsequent stages, the two most similar parcels are merged, until all parcels are merged into a single parcel. The result of this algorithm is a full binary tree, or dendrogram, where each parcel has either zero or two children (Rosen, 2011). The final parcel, with no parent, is called the root node; parcels with two children are called internal nodes, while parcels with no children are called terminal nodes. Figure 3A illustrates these concepts.
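The merging procedure can be illustrated with a small, naive sketch. The dissimilarity used here (average-linkage Euclidean distance between rows of a toy connectivity matrix) is an assumption chosen for illustration and is not necessarily the measure used to build CoCoNest; the function names and toy profiles are hypothetical, and the cubic-time search is only practical for very small inputs.

```python
import math


def profile_distance(a, b):
    """Euclidean distance between two connectivity profiles (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, members_a, members_b):
    """Mean pairwise distance between the vertices of two parcels."""
    total = sum(
        profile_distance(profiles[i], profiles[j])
        for i in members_a
        for j in members_b
    )
    return total / (len(members_a) * len(members_b))


def build_tree(profiles):
    """Merge the two closest parcels until a single parcel remains.

    Returns a dict mapping node id -> {"members", "children", "height"}.
    Ids 0..n-1 are terminal nodes; the last id created is the root.
    """
    n = len(profiles)
    nodes = {
        i: {"members": [i], "children": None, "height": 0.0} for i in range(n)
    }
    active = list(range(n))
    next_id = n
    while len(active) > 1:
        best = None
        for x in range(len(active)):
            for y in range(x + 1, len(active)):
                a, b = active[x], active[y]
                d = average_linkage(
                    profiles, nodes[a]["members"], nodes[b]["members"]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        nodes[next_id] = {
            "members": nodes[a]["members"] + nodes[b]["members"],
            "children": (a, b),
            "height": d,
        }
        active = [k for k in active if k not in (a, b)] + [next_id]
        next_id += 1
    return nodes


# Four toy vertices with made-up connectivity profiles.
toy_profiles = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
tree = build_tree(toy_profiles)
root_id = max(tree)
```

With n starting vertices the tree always has 2n − 1 nodes, and every node has either zero or two children, which is the full binary tree property described above.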
Tree Pruning
In order to derive a family of parcellations from Tmax, we make use of tree pruning—a popular technique that has been widely studied in classification and regression trees, vector quantization, and signal compression (Breiman et al., 1984; Chou et al., 1989; Gersho & Gray, 1991; Quinlan, 1993).
We define the branch Tt as the subset of Tmax consisting of the node t and all of its descendant nodes. To prune Tt from Tmax, we delete all descendants of t from Tmax, leaving t itself in place. After pruning, t becomes a terminal node, and a new tree Tmax − Tt is formed. Figure 3C illustrates a single branch being pruned from a tree. If a tree T is obtained by iteratively pruning branches from Tmax, then T is called a “pruned subtree” of Tmax, and this relationship is written as T ≼ Tmax. A parcellation is derived from a pruned subtree T by treating the vertices in a terminal node of T as a distinct parcel. Thus, by iteratively pruning Tmax, we can create a set of parcellations.
A dendrogram visually represents the results of a hierarchical clustering algorithm: The height of each node indicates the similarity between the two clusters, called parcels in this paper, that are being merged at that node. The most popular method of tree pruning is carried out by horizontally “cutting” the dendrogram at a fixed height, that is, collapsing all of the nodes below this fixed height into terminal nodes. This cutoff height effectively sets a limit on the maximum D(Ei, Ej) allowable for merging parcels. The limit is often determined by the desired number of parcels, that is, the user will choose a cutoff height such that no more than K parcels remain.
However, this form of tree pruning does not actively balance the core goals of a parcellation: reducing the complexity of the connectome data while preserving meaningful patterns in the data. Instead, to prune Tmax, we adopt the error-complexity pruning method introduced by Breiman et al. (1984) and further generalized by Chou et al. (1989). This method iteratively prunes a tree in a greedy fashion, reducing its complexity while seeking to minimize its loss of fit, or error, to the data. We will first discuss how we quantify the fit and complexity of a tree T ≼ Tmax; then, using these concepts, we will provide a brief overview of how we carry out error-complexity pruning to derive a nested, multiresolution family of parcellations.
Since it is computationally infeasible to explore every combination of pruning decisions to find the subtree that minimizes Rα(T), a method called weakest link cutting is used to facilitate error-complexity pruning.
Tt* is then pruned to create a new tree T1 = Tmax − Tt*. Subsequently, the method finds a new weakest link in T1 and prunes the corresponding branch. This process is repeated until only the root node remains. More details on this algorithm can be found in Breiman et al. (1984).
The result of this algorithm is a nested sequence of subtrees Tmax ≻ T1 ≻ T2 ≻ … ≻ root. The terminal nodes of each subtree in this sequence yield a distinct parcellation of the cortical surface. We call the resulting nested, multiresolution sequence of parcellations the CoCoNest family. The pruning procedure tends to remove two terminal nodes at a time during the early stages of the algorithm, while larger cuts are made in the later stages, as illustrated in Figure 9. Thus, although a parcellation of any fixed size (number of terminal nodes) K is not guaranteed, the CoCoNest sequence still spans a wide range of resolutions.
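Weakest link cutting can be sketched on a toy tree as follows. The sketch uses the standard criterion from Breiman et al. (1984), g(t) = (R(t) − R(Tt)) / (|leaves(Tt)| − 1), where R(Tt) is the summed error of the terminal nodes in the branch. The per-node errors below are made-up numbers standing in for the error measure used in this work, all function names are hypothetical, and ties are broken by taking the first minimizer rather than pruning all tied branches at once.

```python
def leaves(children, t):
    """Terminal nodes of the branch rooted at node t."""
    if children[t] is None:
        return [t]
    left, right = children[t]
    return leaves(children, left) + leaves(children, right)


def descendants(children, t):
    """All nodes strictly below node t."""
    if children[t] is None:
        return []
    found = []
    for child in children[t]:
        found.append(child)
        found.extend(descendants(children, child))
    return found


def weakest_link(children, error):
    """Internal node with the smallest error increase per removed leaf."""
    best_t, best_g = None, None
    for t, kids in children.items():
        if kids is None:
            continue
        branch_leaves = leaves(children, t)
        branch_error = sum(error[leaf] for leaf in branch_leaves)
        g = (error[t] - branch_error) / (len(branch_leaves) - 1)
        if best_g is None or g < best_g:
            best_t, best_g = t, g
    return best_t, best_g


def prune(children, t):
    """Return a new tree in which node t has become a terminal node."""
    removed = set(descendants(children, t))
    pruned = {k: v for k, v in children.items() if k not in removed}
    pruned[t] = None
    return pruned


def pruning_sequence(children, error, root):
    """Terminal-node sets of the nested subtrees, from the full tree to the root."""
    sequence = [sorted(leaves(children, root))]
    while children[root] is not None:
        t, _ = weakest_link(children, error)
        children = prune(children, t)
        sequence.append(sorted(leaves(children, root)))
    return sequence


# Toy tree: leaves 0-3, internal nodes 4 = (0, 1) and 5 = (2, 3), root 6 = (4, 5).
toy_children = {0: None, 1: None, 2: None, 3: None, 4: (0, 1), 5: (2, 3), 6: (4, 5)}
toy_error = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 4.0, 6: 6.0}

sequence = pruning_sequence(toy_children, toy_error, root=6)
```

In this toy example the sequence goes from four terminal nodes to three and then directly to one, so no member with exactly two parcels exists. This mirrors the observation that a parcellation of an arbitrary fixed size K is not guaranteed.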
Quantifying Parcellation Performance
In this section, we use several metrics for quantifying the quality of a given parcellation. We divide these metrics into two categories: internal evaluation and external evaluation. Internal evaluation metrics measure the performance of a parcellation based on properties intrinsic to the parcels themselves, such as the size of a parcel, without reference to external information. In contrast, external evaluation metrics rely on auxiliary or external information, like a subject’s performance on a cognitive test. In the Results section, we used these metrics to assess the performance of members of the CoCoNest family and to compare its performance with other parcellations commonly used in the literature.
Internal Evaluation Metrics.
We considered seven internal evaluation metrics: (a) approximation error, (b) 1-Wasserstein distance, (c) parcel homogeneity, (d) Calinski-Harabasz index, (e) proportion of contiguous parcels, (f) entropy, and (g) stability. Their definitions are given below.
Approximation error.
A parcellation merges the connectivity patterns of vertices by grouping them into parcels. This merging reduces the dimension of the high-resolution data at the cost of losing vertex-level connectivity information. Ideally, a parcellation should partition the high-resolution data in a way that minimizes this information loss. To quantify this loss, we created a metric called approximation error. Recall that each parcellation yields a parcel-based SC matrix. Ideally, this parcel-based connectivity should resemble the connectivity patterns present in the full-resolution, group-averaged connectivity data. However, a direct comparison between the two matrices is difficult due to their differing dimensions: the parcel-based matrix is K × K, whereas the full-resolution matrix is 3,675 × 3,675.
1-Wasserstein distance.
Parcel homogeneity.
The Calinski-Harabasz index.
Proportion of connected parcels.
The structural connectome is known to be spatially contiguous, that is, parcels should not be internally disconnected or split across the cortex. A parcellation with many disconnected parcels does not properly capture this property of the structural connectome. Given the adjacency matrices associated with the surface meshes introduced in the Image Data Processing and Connectome Extraction section (see Figure 2A), we also computed the proportion of connected parcels within a parcellation using a Depth-First Search (DFS) algorithm. For each parcel in the parcellation, the DFS algorithm starts at an arbitrary vertex within the parcel and iteratively explores its neighbors, via the surface mesh adjacency matrix. If every vertex within a given parcel can be reached from any starting vertex via DFS, then the parcel is considered connected. By performing this analysis, we can quantify the proportion of connected parcels in the parcellation. A higher value indicates that the parcellation contains a greater number of connected parcels.
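The connectedness check can be sketched as follows. This is an illustrative version under stated assumptions: the mesh is given as an adjacency list (a dict of neighbor lists) rather than an adjacency matrix, the function names are hypothetical, and the five-vertex path "mesh" is a toy example.

```python
def is_connected(parcel, adjacency):
    """Iterative DFS restricted to the vertices of a single parcel."""
    parcel = set(parcel)
    if not parcel:
        return True
    start = next(iter(parcel))
    seen = {start}
    stack = [start]
    while stack:
        vertex = stack.pop()
        for neighbor in adjacency[vertex]:
            # Only follow mesh edges that stay inside the parcel.
            if neighbor in parcel and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == parcel


def proportion_connected(labels, adjacency):
    """Fraction of parcels whose vertices form one connected component."""
    parcels = {}
    for vertex, label in enumerate(labels):
        parcels.setdefault(label, []).append(vertex)
    n_connected = sum(is_connected(vs, adjacency) for vs in parcels.values())
    return n_connected / len(parcels)


# Toy mesh: five vertices on a path 0-1-2-3-4.
toy_adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

all_connected = proportion_connected([0, 0, 0, 1, 1], toy_adjacency)
half_connected = proportion_connected([0, 0, 1, 1, 0], toy_adjacency)
```

In the second labeling, vertex 4 belongs to parcel 0 but can only be reached through parcel 1, so parcel 0 counts as disconnected and the proportion drops to one half.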
Entropy of parcel sizes.
Adjusted mutual information.
Stability.
A method for parcellating the brain should be stable across homogeneous groups of subjects. To quantify the stability of CoCoNest, we randomly created 10 batches of 100 subjects from the 897 subjects considered. Each of these batches was then fed into the CoCoNest pipeline (see Figure 1) to create 10 nested families of parcellations. We then used the AMI, as detailed above, to quantify how similar these nested families were to each other and to the CoCoNest family constructed using all 897 subjects.
External Evaluation Metrics.
We use the following two external evaluation metrics to assess parcellation performance.
Trait prediction.
An important task in connectome studies is to understand the relationship between brain connectivity and various demographic, behavioral, or physiological traits (Gosnell et al., 2019; Hirsiger et al., 2016; Smith, 2016; Zimmermann, Griffiths, & McIntosh, 2018). A derived brain parcellation should facilitate such analysis tasks. To evaluate this, we predicted selected behavioral traits available from the HCP beginning with different parcellations. These traits included score on the reading recognition test (reading), number of correct responses on the Penn Matrix test (fluid intelligence), number of correct responses on the Penn Line Orientation test (spatial orientation), number of correct responses on the Penn Word Memory test (verbal episodic memory), score on the openness to experiences section of the Five Factor Model (openness), and gender. The variable names in parentheses follow the convention of Ooi et al. (2022). Since our primary goal was to compare all parcellations under a straightforward and efficient predictive model, rather than identifying the optimal model for linking brain connectivity to human traits, we opted to use three simple models: principal component regression (PCR), ridge regression, and support vector regression (SVR) with a linear kernel. For PCR, a ridge regression model was used to predict y with the derived PC scores.
In our experiments, for each parcellation, each subject’s high-resolution SC matrix was converted into a parcel-based SC matrix. The log transformation was omitted to achieve better prediction results, and thus, S(x, y) replaced S′(x, y) in Equation 8. We followed the standard convention and set self-connections in the parcel-based SC matrix to 0. The parcel-based SC matrix for each subject was then vectorized to derive an N × K(K − 1)/2 matrix, where N is the number of subjects and K is the number of parcels in the considered parcellation. To predict a selected trait y, the data were randomly split 50 times into training (80%) and testing (20%) sets. In each split, each of the three considered models was trained, with its hyperparameters (e.g., the number of PC scores to retain for PCR or the penalty parameter in ridge) chosen using fivefold cross-validation. Subsequently, the testing set was fed into the trained model to predict the values of the response y. The model’s performance was evaluated using the Pearson correlation between the measured and predicted y values. The performance of a parcellation across the 50 data splits is summarized by the mean and the standard error of the mean of the correlations. To assess a parcellation’s performance in predicting gender, we used a logistic regression (LR) model with PC scores (PCLR), L2-regularized LR, and a linear support vector classifier (SVC). The performance was measured using the area under the receiver operating characteristic (ROC) curve (AUC).
A parcellation that is well-suited for exploring the relationship between brain connectivity and subject-level traits should have a strong predictive power for those traits.
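The data handling around the predictive models can be sketched with three small helpers: vectorizing a parcel-based SC matrix into K(K − 1)/2 features, drawing a random 80/20 split, and scoring predictions with the Pearson correlation. The models themselves (PCR, ridge, SVR) and the cross-validation are omitted; all function names, the seed handling, and the toy numbers are hypothetical and are not taken from the CoCoNest code.

```python
import math
import random


def vectorize_upper(matrix):
    """Flatten the strict upper triangle, which drops self-connections."""
    k = len(matrix)
    return [matrix[i][j] for i in range(k) for j in range(i + 1, k)]


def split_indices(n_subjects, test_fraction=0.2, seed=0):
    """Random train/test split of subject indices."""
    indices = list(range(n_subjects))
    random.Random(seed).shuffle(indices)
    n_test = round(n_subjects * test_fraction)
    return indices[n_test:], indices[:n_test]


def pearson(x, y):
    """Pearson correlation between measured and predicted values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy parcel-based SC matrix with K = 3 parcels (made-up values).
parcel_sc = [[0.0, 2.0, 5.0], [2.0, 0.0, 1.0], [5.0, 1.0, 0.0]]
features = vectorize_upper(parcel_sc)

train_idx, test_idx = split_indices(10)

# Made-up measured and predicted trait values for four test subjects.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = pearson(measured, predicted)
```

Repeating the split with 50 different seeds and averaging the resulting scores would give the mean and standard error summary described above.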
Test–retest identifiability.
SC has been shown to be highly reproducible (Bonilha et al., 2015; Prčkovska et al., 2016; Zhang et al., 2018), that is, SCs generated by the same individual are notably more similar than SCs derived from different individuals. We draw on a metric, known as neural identifiability (NI), introduced by Mansour L., Tian, Yeo, Cropley, and Zalesky (2021) to assess how well a parcellation preserves this inherent reproducibility of SC. To define NI, we made use of the imaging data from 897 subjects from the HCP, where 37 of these subjects underwent an additional imaging session at a later date. We designate the SC data derived from the initial scans from the 897 subjects as the test dataset and the SC data derived from the subsequent second scans from the subset of 37 subjects as the retest dataset. To calculate NI for a given parcellation, we first constructed a parcel-based SC matrix, using Equation 8, from each of the full-resolution test and retest datasets.
Validation on an external dataset.
A good parcellation of the structural connectome should capture meaningful SC patterns across diverse populations. To assess this, we evaluated the performance of CoCoNest on the ABCD study dataset (Casey et al., 2018).
To do this, we first downloaded dMRI and sMRI data from 493 subjects who participated in the ABCD study. Details on the minimal processing pipelines and acquisition protocols can be found in Casey et al. (2018) and Hagler et al. (2019). To avoid interscanner effects, we only considered subjects who were scanned using a Siemens-manufactured scanner. The downloaded ABCD imaging data were then processed through the SBCI pipeline (see the Image Data Processing and Connectome Extraction section). As before, the SBCI pipeline performed standard preprocessing steps and output a high-resolution SC matrix for each subject based on a continuous representation of SC.
The performance of a parcellation was then measured using a subset of the internal evaluation metrics introduced in the Quantifying Parcellation Performance section, namely, approximation error, 1-Wasserstein distance, the Calinski-Harabasz index, and parcel homogeneity. For external evaluation, we predicted selected behavioral traits available from the ABCD study. These traits included vocabulary, fluid intelligence, fluid cognition, crystallized cognition, reward responsiveness, and gender. These variable names follow the convention of Ooi et al. (2022).
RESULTS
We used high-quality imaging data from the HCP to construct a multiresolution family of CoCoNest parcellations based on structural connectome data.
Figure 1E shows the subtrees corresponding to three members of the CoCoNest family with 9, 68, and 360 terminal nodes. Figure 1F displays the parcellations derived from these subtrees along with the corresponding parcel-based representation of the group-averaged SC matrix, derived using Equation 8.
We used the internal and external evaluation metrics described above to evaluate the performance of members of the CoCoNest family, and to compare CoCoNest with the widely used parcellations in the literature, including the anatomical-based Desikan (Desikan et al., 2006) and Destrieux (Destrieux et al., 2010) parcellations; the structural connectome-based Brainnetome parcellation (Fan et al., 2016); the functional connectome-based Yeo-17 (Yeo et al., 2011), Gordon (Gordon et al., 2016), and multiresolution Schaefer (Schaefer et al., 2018) parcellations; and the multimodal Glasser parcellation (Glasser et al., 2016).
Internal Evaluation Results
The comparison results for the internal evaluation metrics are shown in Figure 4. Higher values in the proportion of contiguous parcels, entropy, and CH index indicate better performance; these are shown in the top panel. Conversely, lower values of W1 and logAE indicate better performance; these are shown in the bottom panel.
The results show that members of the CoCoNest family containing more than 100 parcels possess several desirable characteristics. Firstly, they feature homogeneous, uniformly sized, and connected parcels, as indicated by the proportion of connected parcels and the entropy of parcel sizes. Secondly, these parcellations preserve high-resolution SC patterns in both the group-averaged and the subject-specific high-resolution SC data, as seen in the log AE measure. Thirdly, they have a much higher ratio of between-parcel variation to within-parcel variation (as measured by CH) compared with the other parcellations considered. Lastly, CoCoNest members outperform the other parcellations of similar size in both the mean per-subject AE and the mean per-subject W1 metrics, indicating that CoCoNest members better preserve high-resolution SC patterns in subject-specific data.
We found small deviations from connectedness in several parcels in the considered parcellations from the literature (e.g., a single parcel in the Desikan parcellation). These deviations are a result of small errors incurred during downsampling from the high-resolution white matter surfaces (see the Image Data Processing and Connectome Extraction section). Examples of such small deviations are given in the Examples of Non-connected Clusters section of the Supporting Information.
Figure 5A shows the AMI between selected members of the CoCoNest family and other parcellations with a similar number of parcels. The Yeo-17 and Gordon parcellations show the most dissimilarity to the other parcellations, which is expected since they were constructed using fMRI data. We found moderate similarity (AMI ≈ 0.60) between CoCoNest family members and the Desikan, Destrieux, Brainnetome, Glasser, and Schaefer parcellations. Figure 5B overlays members of the CoCoNest family (outlined in black) on top of the Glasser parcellation. The color scheme of the Glasser parcellation allowed us to visualize how parcels from the CoCoNest family overlap with known anatomical and functional regions. For example, the CoCoNest member with 17 parcels contains large parcels covering the visual and sensorimotor regions of the brain. As the number of parcels increases, CoCoNest members are characterized by elongated parcels along the sensorimotor regions of the brain, similar to the Glasser parcellation, and larger parcels in the frontal lobe. Additionally, members of the CoCoNest family show a high degree of symmetry, with parcels on the left hemisphere mirroring the shape and size of similarly located parcels on the right hemisphere.
Figure 6 shows the stability of the CoCoNest procedure, as measured by the AMI between members of CoCoNest families derived from different subsets of the 897 subjects. Here, we include results for CoCoNest-5, CoCoNest-50, CoCoNest-100, CoCoNest-250, and CoCoNest-1000. The parcellations were remarkably similar, with the AMI value exceeding 0.9 for every parcellation size greater than 50 parcels. These results show that CoCoNest parcellations are robust and stable across homogeneous groups of subjects.
External Evaluation Results
Figure 7 shows the trait prediction and neural identifiability performance of members of the CoCoNest family and the other parcellations considered. With a similar number of parcels, CoCoNest family members consistently outperformed the other parcellations considered across all prediction models. Additionally, beyond 250 parcels, the performance of CoCoNest members tends to plateau. From the NI metric, we found that members of the CoCoNest family were competitive in reflecting the high reproducibility of SC, and thus in creating a more distinguishable SC fingerprint for an individual. It is interesting to note that although the Yeo-17 parcellation was constructed using functional connectome data, it provides an SC fingerprint competitive with CoCoNest members and the Desikan atlas.
Validation on the ABCD Dataset
Figure 8 shows the trait prediction results using data downloaded from the ABCD study. Echoing the results of the HCP external validation, we found that members of the CoCoNest family were competitive with the other parcellations considered and often showed superior performance across all prediction models. Additionally, as in the HCP external validation, we found that the CoCoNest member with 250 parcels consistently achieved the best, or close to the best, performance across the considered resolutions. All parcellations considered were competitive in the internal evaluation metrics, with CoCoNest achieving marginal improvements in performance. These internal validation results can be found in Supporting Information Section S1.
Additionally, a natural question is whether a family of CoCoNest parcellations derived from the ABCD data, referred to as CoCoNest_ABCD, outperforms the proposed CoCoNest family, derived from the HCP data, when validated using the ABCD data. Our findings indicate that CoCoNest_ABCD parcellations indeed show moderately better performance across both internal and external evaluation metrics. These results are reported in Supporting Information Section S1.2.
CoCoNest as an Exploratory Framework
The nested structure of the parcellations in the CoCoNest family can provide a valuable framework for exploratory analyses into the organization of the structural connectome. To give an example, consider the final stages of the error-complexity pruning procedure shown in Figure 9. Notably, CoCoNest does not separately partition the left and right hemispheres as might be expected. Rather, it merges the frontal and parietal lobes of both hemispheres into one parcel (Parcel 3 in Figure 9), while merging the temporal and occipital lobes of the left hemisphere separately from those on the right hemisphere (Parcels 1 and 2 in Figure 9). This approach suggests that the tree creation and pruning procedures identify similar SC patterns in the frontal and parietal lobes across both hemispheres, but disparate patterns in the occipital and temporal lobes. This division of lobes is consistent with existing literature that highlights functional differences between the left and right temporal lobes (Chan et al., 2009; Scott, Blank, Rosen, & Wise, 2000). As this example illustrates, by navigating the family of CoCoNest parcellations, researchers can gain a better understanding of the structural variations between brain regions and can investigate how these variations might correlate with functional roles.
The error-complexity pruning algorithm used to construct the CoCoNest family aims to strike a balance between the complexity of a parcellation and how well it represents the high-resolution SC data. For each terminal node in a subtree corresponding to a CoCoNest member, we calculated the number of branches pruned from the full CoCoNest tree to arrive at that node. A large number of branches pruned in a brain region suggests that many splits were eliminated. This elimination is done to balance complexity with accuracy. In other words, the higher the number of pruned branches in a region, the less distinct the SC profiles in that region are. Figure 10 shows this calculation for members of the CoCoNest family with roughly 10, 100, 250, 500, and 1,000 parcels.
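The pruned-branch count can be illustrated with a short sketch. It assumes the full tree is stored as a hypothetical dictionary mapping each internal node to its two children, and that a "pruned branch" is counted as one eliminated split; in a full binary tree, a node with m leaves beneath it has m - 1 splits beneath it. The node names are illustrative only.

```python
def leaves_below(children, node):
    """Count the leaves of the full tree under `node` (iterative traversal)."""
    count, stack = 0, [node]
    while stack:
        current = stack.pop()
        if current in children:      # internal node: descend into both children
            stack.extend(children[current])
        else:                        # leaf of the full tree
            count += 1
    return count


def pruned_splits(children, terminal_nodes):
    """Number of splits removed beneath each terminal node of a pruned subtree.

    Assumption: one pruned branch = one eliminated split, so a terminal node
    covering m leaves of the full tree has m - 1 pruned splits.
    """
    return {t: leaves_below(children, t) - 1 for t in terminal_nodes}


# Toy full binary tree with five leaves (v1, v2, a2, v4, v5).
full_tree = {
    "root": ("A", "B"),
    "A": ("a1", "a2"),
    "a1": ("v1", "v2"),
    "B": ("v4", "v5"),
}

# A pruned subtree whose terminal nodes (parcels) are A and B.
print(pruned_splits(full_tree, ["A", "B"]))  # {'A': 2, 'B': 1}
```

In this toy case, parcel A absorbs more splits than parcel B, which under the interpretation above would suggest less distinct connectivity profiles within A.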
We observe that, across various resolutions, the CoCoNest pruning procedure favors pruning parcels in regions including the occipital, precentral, postcentral, and temporal lobes. This suggests that combining the SC patterns in these lobes reduces the complexity of the connectome data while still preserving high-resolution SC patterns. When conducting SC analyses using the CoCoNest family, researchers can use the number of pruned branches below a parcel to inform hypotheses about which parcels possess crucial SC patterns.
As mentioned earlier, a parcellation serves as a natural bridge between neuroscience and network science. By treating the parcels as nodes and the structural connections as edges, the parcel-level SC matrix can be viewed as a weighted network with K nodes. In order to showcase how members of the CoCoNest family can be used as an exploratory tool in network-based analyses, we used them to investigate the rich club effect in the structural connectome. To carry out a more interpretable analysis, the log transformation introduced in the Image Data Processing and Connectome Extraction section was omitted for this task.
Recall that the degree of a node in a network is measured by the number of outgoing connections that it has to other nodes. Similarly, the weighted degree of a node is the sum of the weights of these outgoing connections. Rich clubs in networks are high degree nodes that are more strongly connected to each other than nodes of lower degrees. Previous work has shown that the structural connectome strongly displays the rich club effect (Grayson et al., 2014; Liu et al., 2021; van den Heuvel, Kahn, Goñi, & Sporns, 2012; van den Heuvel & Sporns, 2011). We follow (van den Heuvel & Sporns, 2011) and use the normalized, weighted rich club coefficient to probe for the rich club effect in the structural connectome. A normalized rich club coefficient greater than 1 for successive values of k indicates the presence of the rich club effect.
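The quantities defined above can be sketched on a toy network. The sketch below assumes a symmetric weight matrix with zeros for absent edges and uses the ranked-weight form of the weighted rich club coefficient. For normalization it uses a simple null model that reshuffles weights over the existing edges; this is a lightweight stand-in chosen for brevity and is not the degree-preserving random rewiring used in the cited work. All function names are hypothetical.

```python
import random


def degrees(W):
    """Number of connections of each node (nonzero entries per row)."""
    return [sum(1 for w in row if w > 0) for row in W]


def weighted_degrees(W):
    """Sum of connection weights of each node."""
    return [sum(row) for row in W]


def weighted_rich_club(W, k):
    """Weight among nodes of degree > k, relative to the strongest weights overall."""
    n, deg = len(W), degrees(W)
    club = [i for i in range(n) if deg[i] > k]
    club_w = [W[i][j] for a, i in enumerate(club) for j in club[a + 1:] if W[i][j] > 0]
    if not club_w:
        return None  # no connections among nodes of degree > k
    ranked = sorted((W[i][j] for i in range(n) for j in range(i + 1, n) if W[i][j] > 0),
                    reverse=True)
    return sum(club_w) / sum(ranked[:len(club_w)])


def shuffled_weights(W, rng):
    """Null network: same edges and degrees, weights permuted across edges."""
    n = len(W)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if W[i][j] > 0]
    weights = [W[i][j] for i, j in edges]
    rng.shuffle(weights)
    R = [[0] * n for _ in range(n)]
    for (i, j), w in zip(edges, weights):
        R[i][j] = R[j][i] = w
    return R


def normalized_rich_club(W, k, n_null=100, seed=0):
    """Observed coefficient divided by its mean over the null networks."""
    phi = weighted_rich_club(W, k)
    if phi is None:
        return None
    rng = random.Random(seed)
    null = [weighted_rich_club(shuffled_weights(W, rng), k) for _ in range(n_null)]
    return phi / (sum(null) / len(null))


# Toy network: nodes 0-2 are strongly interconnected hubs, nodes 3-4 are peripheral.
W = [[0, 5, 5, 1, 1],
     [5, 0, 5, 1, 1],
     [5, 5, 0, 1, 1],
     [1, 1, 1, 0, 0],
     [1, 1, 1, 0, 0]]

print(weighted_degrees(W))
print(normalized_rich_club(W, k=3))
```

In the toy network, the three hubs carry the three strongest weights among themselves, so the normalized coefficient at k = 3 exceeds 1, which is the signature of the rich club effect described above.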
Figure 11 shows the weighted degrees of each parcel on the cortical surface, the normalized rich club coefficients, and the identified rich club nodes for networks derived from CoCoNest-150, CoCoNest-250, and CoCoNest-500. In general, we found that parcels in the frontal and parietal lobes showed lower weighted degrees, whereas parcels in the occipital lobe showed higher weighted degrees. For all three members, we found evidence of the rich club effect. Notably, the rich club nodes across these members include parcels that overlap with the superior frontal cortex, somatosensory regions, and the inferior parietal lobe across both hemispheres. These findings are similar to those of previous work (Grayson et al., 2014; Liu et al., 2021; van den Heuvel et al., 2012; van den Heuvel & Sporns, 2011). We also found similar results using the other considered external parcellations (see Supporting Information Section S3). Additionally, we found that nodes in the anterior portion of the frontal lobe and the posterior portion of the occipital lobe showed variable rich club status across the three resolutions. For instance, while no rich club nodes were found in the anterior portion of the frontal lobe when analyzing the network created by CoCoNest-250, rich club nodes were present in these areas when analyzing the networks created by CoCoNest-150 and CoCoNest-250. Further details on this rich club analysis can be found in Supporting Information Section S3. These observations highlight how network-based insights into the structural connectome can vary with resolution, demonstrating CoCoNest’s usefulness for conducting tractable, multiscale analyses of the structural connectome.
Selection and Sharing of CoCoNest
Since each member of the CoCoNest family corresponds to a unique subtree within the full CoCoNest tree, the CoCoNest family enables researchers to explore the multiresolution nature of the structural connectome. While the hierarchical structure may offer enhanced insights into brain organization, researchers may be interested in using just a single resolution, or a CoCoNest member, for their analyses. According to the internal and external validation results in Figures 4 and 7, larger CoCoNest parcellations tend to show diminishing improvements in performance after about 250 parcels. Therefore, if one is seeking a single-resolution parcellation for the HCP data considered above, a natural candidate is CoCoNest-250 (shown in Figure 12) which performs well on both the internal and external evaluation metrics.
Every member of the CoCoNest family can be freely accessed at https://github.com/sbci-brain/CoCoNest. Additionally, the repository contains scripts to convert any member of the CoCoNest family to the fsLR-32k, fsaverage, and MNI152 template spaces. The repository also contains the scripts used to create the CoCoNest family from SBCI-processed brain connectivity data.
DISCUSSION
In this work, we leveraged recent advances in structural connectome reconstruction, as well as existing techniques from hierarchical clustering and error-complexity pruning, to construct a nested, multiresolution family of parcellations, called the CoCoNest family. The CoCoNest family provides researchers with insights into the structural connectome across multiple resolutions. The multiresolution nature of the CoCoNest family is naturally aligned with the current consensus in the literature on the multiscale nature of the brain. Through an extensive battery of internal and external evaluation metrics, we have shown that the CoCoNest family is competitive with widely used parcellations in the literature, including the Yeo-17, Desikan, Destrieux, Brainnetome, Gordon, Schaefer, and Glasser parcellations (Desikan et al., 2006; Destrieux et al., 2010; Fan et al., 2016; Glasser et al., 2016; Gordon et al., 2016; Schaefer et al., 2018; Yeo et al., 2011). In particular, CoCoNest members with a similar number of parcels as these widely used parcellations often show superior performance in a number of unsupervised and predictive metrics.
Additionally, there are several alternative methods for creating a connectome-based parcellation. These methods can be broadly categorized into gradient-based methods (Cohen et al., 2008; Gordon et al., 2016; Schaefer et al., 2018; Wig et al., 2014), which identify abrupt changes in connectivity across the cortical surface, and statistical clustering methods, which cluster regions based on similar connectivity patterns. Widely used statistical clustering methods include mixture models (Baldassano, Beck, & Fei-Fei, 2015; Golland, Golland, & Malach, 2007; Kong et al., 2019; Lashkari et al., 2012; Moyer, Gutman, Jahanshad, & Thompson, 2017; Roca et al., 2010; Yeo et al., 2011), k-means clustering (Flandin et al., 2002; Kahnt, Chang, Park, Heinzle, & Haynes, 2012; Salehi, Karbasi, Shen, Scheinost, & Constable, 2018), and spectral clustering (Chen et al., 2013; Craddock et al., 2012; Fan et al., 2016; Thirion, Varoquaux, Dohmatob, & Poline, 2014). These methods typically produce a single parcellation with a predefined number of parcels. However, recent work has revealed the multiscale nature of human brain connectivity, where the interactions and distinctions between scales are thought to be critical to the overall functionality of the brain (Bassett & Siebenhühner, 2013; Betzel & Bassett, 2017; Meunier, Lambiotte, & Bullmore, 2010; R. Wang et al., 2019). Rather than running the aforementioned methods over a sequence of predefined numbers of parcels, hierarchical clustering approaches, like the proposed CoCoNest, have been used to better represent the multiscale nature of the connectome (Blumensath et al., 2013; Cammoun et al., 2012; Diez et al., 2015; Eickhoff et al., 2011; Gallardo, Wells, Deriche, & Wassermann, 2018; Kurmukov et al., 2020; Michel et al., 2012; Moreno-Dominguez, Anwander, & Knösche, 2014). This approach represents parcels as being composed of smaller subparcels, thus capturing both the interactions and distinctions between scales.
Here, we call this spatial, multiscale representation of the connectome a multiresolution representation to highlight its nested structure and to distinguish it from the temporal and topological scales of brain networks often studied (Betzel & Bassett, 2017).
The main decisions in these hierarchical parcellation methods center on creating a hierarchical structure from connectome data and deriving parcellations from this structure. A simple approach is to iteratively merge nearby vertices in high-resolution structural connectome data to form a hierarchical structure (Cammoun et al., 2012). However, this method does not allow the structural connectome data itself to directly influence the resulting hierarchical structure. Modularity maximization algorithms, particularly the Louvain algorithm, have also been commonly used to create hierarchical structures from structural connectome data (Diez et al., 2015; Kurmukov et al., 2020). Yet, this approach has been shown to be limited in capturing partitions with small clusters, especially when used with dense, high-resolution data (Fortunato & Barthélemy, 2007). Additionally, implementations of hierarchical clustering for structural connectome data predominantly focus on subject-level parcellations (Kurmukov et al., 2020; Moreno-Dominguez et al., 2014). However, aggregating individual hierarchical structures or computing a consensus hierarchy is challenging and often avoided, making group-level analysis of the organization of the structural connectome difficult. Finally, many existing parcellations have not undergone rigorous validation by both internal and external metrics across independent datasets. In this work, we have shown that parcellations derived using the CoCoNest framework effectively address the above limitations.
Despite the advantages of the CoCoNest framework, there are several limitations. First, CoCoNest relies solely on the density of white matter fiber tracts between brain regions. In order to fully characterize the structural connectome, it may be important to include additional information about the SC between vertices. For example, considering the shape and curvature of these tracts could offer a more nuanced characterization of each connection. Second, CoCoNest is built on the population-averaged high-resolution SC matrix. Such averaging may distort SC characteristics found at the subject level. Exploring more refined methods to construct a consensus SC matrix for a population might enhance the ability of CoCoNest to characterize the structural connectome. Similarly, incorporating measures of uncertainty arising from the variability of SC may yield a more rigorous notion of similarity between SC patterns. Lastly, CoCoNest is completely deterministic, which restricts our ability to study the error in the CoCoNest tree or explore alternative trees that were likely to be generated from the data (Peixoto, 2023). Leveraging tools from the literature on hierarchical random graphs (Briercliffe, 2023) or nested stochastic block models (Peixoto, 2014) may provide avenues for further analyses and facilitate statistical inference on nested parcellations like the CoCoNest family.
As the CoCoNest family was constructed using dMRI data, it is important to note that numerous challenges exist in accurately uncovering the underlying white matter architecture, which could impact any subsequent analysis, including our proposed CoCoNest family. Tractography algorithms strive to infer global connectivity from local diffusion information. This approach will inevitably produce false-positive fiber tracts and omit true-positive ones (Maier-Hein et al., 2017; Rheault, Poulin, Caron, St-Onge, & Descoteaux, 2020). Additionally, the complex geometry of these tracts may produce diffusion signals that generate biases in tractography. For example, overlapping fiber bundles with independent endpoints have been shown to produce invalid structural connections (Maier-Hein et al., 2017), a phenomenon also called the bottleneck effect (Rheault et al., 2020). Moreover, the reconstructed fiber tracts can have arbitrary endpoints inside of the brain rather than on the cerebral cortex, the number of short-range fiber tracts may be overestimated, and significant gyral bias can occur (Reveley et al., 2015; Rheault et al., 2020; St-Onge et al., 2018). To mitigate the influence of these biases on the CoCoNest family, we have used high-quality imaging data from the HCP and SET, which has been shown to reduce such biases. However, we acknowledge that our method is not entirely immune to these limitations. As brain imaging technology and tractography algorithms evolve, it is important to reevaluate parcellations derived from any method of structural connectome reconstruction.
We have made all members of the CoCoNest family freely available, along with the code used to construct the entire family, at https://github.com/sbci-brain/CoCoNest. In general, the methodology presented in this work can be used to create a “CoCoNest”-like parcellation from any SC matrix. The choice of similarity and linkage functions and the error-complexity objective function are flexible. However, we believe our method works best with dense, high-resolution SC matrices (as produced from the SBCI pipeline; Cole et al., 2021).
ACKNOWLEDGMENTS
Data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. Nobel’s research was supported by NSF Grant DMS 2113676. Zhang gratefully acknowledges support from the Ralph E. Powe Junior Faculty Enhancement Award and NIH Grant R25DA058940.
SUPPORTING INFORMATION
Supporting information for this article is available at https://doi.org/10.1162/netn_a_00409.
AUTHOR CONTRIBUTIONS
Adrian Allen: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Software; Validation; Visualization; Writing – original draft; Writing – review & editing. Zhengwu Zhang: Conceptualization; Data curation; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Resources; Software; Supervision; Validation; Writing – original draft; Writing – review & editing. Andrew Nobel: Conceptualization; Formal analysis; Investigation; Methodology; Project administration; Resources; Supervision; Validation; Writing – original draft; Writing – review & editing.
DATA AVAILABILITY STATEMENT
Data and code are available at https://github.com/sbci-brain. This repository contains code to extract members of the CoCoNest family and project them to popular template spaces.
TECHNICAL TERMS
- Parcellation:
A partitioning of the brain into discrete regions of interest called parcels.
- Structural connectome:
A comprehensive map of interconnecting white matter fiber tracts between brain regions.
- Diffusion MRI (dMRI):
An imaging technique that maps the diffusion of water molecules in biological tissue.
- Tractography:
A method that leverages dMRI data to reconstruct white matter fiber tracts.
- Continuous representation of structural connectivity:
A representation of structural connectivity as a continuous function across the brain’s surface.
- Full binary tree:
A hierarchical, tree-based data structure in which each node has either zero or two children nodes.
- Internal parcellation evaluation:
A method for evaluating a parcellation based solely on the intrinsic properties of the parcels.
- External parcellation evaluation:
A method for evaluating a parcellation using auxiliary or external information.
- Tree pruning:
A technique for removing branches from a tree data structure.
Competing Interests
The authors have declared that no competing interests exist.
Author notes
Handling Editor: Marcus Kaiser