Mapping of human brain structural connectomes via diffusion magnetic resonance imaging (dMRI) offers a unique opportunity to understand brain structural connectivity and relate it to various human traits, such as cognition. However, head displacement during image acquisition can compromise the accuracy of connectome reconstructions and subsequent inference results. We develop a generative model to learn low-dimensional representations of structural connectomes invariant to motion-induced artifacts, so that we can link brain networks and human traits more accurately, and generate motion-adjusted connectomes. We apply the proposed model to data from the Adolescent Brain Cognitive Development (ABCD) study and the Human Connectome Project (HCP) to investigate how our motion-invariant connectomes facilitate understanding of the brain network and its relationship with cognition. Empirical results demonstrate that the proposed motion-invariant variational autoencoder (inv-VAE) outperforms its competitors in various aspects. In particular, motion-adjusted structural connectomes are more strongly associated with a wide array of cognition-related traits than connectomes obtained without motion adjustment.

To comprehensively understand brain function and organization, enormous efforts have been dedicated to imaging and analyzing the brain structural connectome, defined as a collection of white matter fiber tracts connecting different brain regions. Recent advancements in noninvasive neuroimaging techniques have led to a rapid increase in brain imaging datasets, such as the Human Connectome Project (HCP) (Glasser et al., 2016) and the Adolescent Brain Cognitive Development (ABCD) study (Bjork et al., 2017). These developments have inspired a substantial body of literature (Zhang, Allen, et al., 2019; Zhang et al., 2018) that focuses on analyzing structural connectomes derived from diffusion-weighted magnetic resonance imaging (dMRI) data. However, these studies often encounter challenges due to subject head displacement during image acquisition, resulting in image misalignment and artifacts that hinder unbiased connectome reconstructions and analysis (Andersson et al., 2016). Addressing these biases is crucial, which has motivated us to develop novel statistical methods capable of modeling and analyzing brain structural connectomes in an invariant manner (Achille & Soatto, 2018; Zemel et al., 2013).

Our goal is to learn a low-dimensional representation of brain connectomes invariant to motion artifacts in the data, thereby facilitating prediction and inference on the relationship between brain structure and human traits such as cognitive abilities. A common motion correction technique in neuroimaging involves registering motion-corrupted images to a geometrically correct template (Bhushan et al., 2015; Kochunov et al., 2006). However, in dMRI, motion frequently causes geometric distortion and signal dropout, significantly complicating registration (Ben-Amitay et al., 2012) and impacting subsequent brain network analyses (Le Bihan et al., 2006). Current motion correction methods often utilize the FSL eddy tool (Andersson & Sotiropoulos, 2016) to estimate and correct for eddy current-induced distortions and subject movements. Recent advancements include techniques such as SHORELine (Cieslak et al., 2021) and SHARD (Christiaens et al., 2021) for correcting multishell dMRI sequences. Despite these improvements, studies have shown that even after motion correction in diffusion signals, in-scanner head motion can still have notable impacts on the recovered structural connectivity (Baum et al., 2018). Moreover, not all sources of image misalignment are due to motion (Yendiki et al., 2014). Various imperfections in the imaging system can lead to artifacts causing image misalignment. Given these challenges, our major motivation is to develop a method that addresses motion-induced artifacts and adjusts for residual misalignment during the brain network modeling process, rather than in the data preprocessing stage. Here, motion-induced artifacts refer to the in-scanner head displacement that can be approximately quantified using image registration algorithms (Andersson & Sotiropoulos, 2016). Residual misalignment encompasses some remaining artifacts measured after eddy motion correction. By incorporating these considerations into our modeling approach, we aim to improve the robustness and reliability of brain connectome analysis in the presence of motion-induced artifacts.

In biomedical imaging, removing unwanted effects in high-dimensional data has been an important problem for at least the last decade. Without proper adjustment, the association between the signal of interest and one or more nuisance variables can degrade the ability to infer the signal accurately. This issue is particularly evident in brain imaging studies, and various tools have been developed. A popular matrix factorization method (Alter et al., 2000) uses a singular value decomposition of the original matrix to filter out the singular vector associated with the nuisance variable, and reconstructs an adjusted matrix with the remaining singular vectors. Similarly, the Sparse Orthogonal to Group (SOG) (Aliverti et al., 2021) proposes an adjusted dataset based on a constrained form of matrix decomposition. In addition, model-based methods are popular alternatives for removing unwanted associations in the data. For example, distance-weighted discrimination (Benito et al., 2004) adapts Support Vector Machines (SVM) for batch effect removal; the ComBat method (Johnson et al., 2007) adjusts for batch effects in data using empirical Bayes. Our work falls under the model-based framework, and we are particularly interested in incorporating batch effect removal into a deep learning framework so that the nonlinear batch effect can be modeled and removed.

Removing unwanted information in the deep learning literature is formulated as an invariant representation learning problem. Existing deep learning methods learn invariant representations through either adversarial or nonadversarial methods. Adversarial learning methods, prevalent in domain adaptation, involve a generative adversarial network (GAN) (Creswell et al., 2018) where the generator extracts features to deceive the discriminator, while the discriminator aims to distinguish between different data domains. However, adversarial learning has limitations, including GANs’ mode collapse issues and sensitivity to hyperparameter choices (Papernot et al., 2016). Conversely, nonadversarial invariant methods, including recent group invariance techniques such as contrastive learning (Chen et al., 2020; He et al., 2020; Khosla et al., 2020) and conditionally invariant VAE (Aliee et al., 2023), aim to develop predictors invariant to a group’s action on the input space. Despite their strengths, these methods face a notable limitation: they are designed to handle categorical nuisance variables and may encounter challenges when confronted with continuous-valued nuisance variables. Some deep learning methods aim to mitigate the impact of continuous-valued nuisances, often incorporating mutual information loss. For instance, Greenfeld and Shalit (2020) utilize the Hilbert Schmidt Independence Criterion (HSIC), while Yu et al. (2021) replace HSIC with matrix-based mutual information, yielding performance improvements. Another variant, squared-loss MI (SMI) (Y. Liu et al., 2021), is also popular in this context. However, these mutual information-based methods focus solely on capturing dependencies in data in an unsupervised setting and do not ensure that the learned representations are task relevant. In our work, we advocate for learning motion-invariant representations guided by the information bottleneck principle (Alemi et al., 2016; Moyer et al., 2018). Our approach removes the mutual information between latent representations and undesirable artifacts, with latent representations learned via graph variational autoencoders (VAE) (M. Liu et al., 2021). Situated in the information bottleneck literature, our work addresses the challenge of finding representations that maximize the relationship with the target response while minimizing mutual information with nuisance variables such as subject motion and residual image misalignment. This framework not only addresses the noted limitations of existing methods but also correlates learned invariant representations with human behavior and cognition-related attributes.

We aim to model brain connectomes and learn their representations invariant to nuisance factors, particularly motion-related artifacts. Our analysis of ABCD and HCP data, preprocessed with FSL eddy software (Andersson & Sotiropoulos, 2016), reveals that structural connectome data are significantly affected by motion. This impact is observed not only in the motion quantified by FSL eddy but also in residual misalignment in the diffusion MRI data after FSL eddy correction. Notably, individuals exhibiting substantial head displacement during data acquisition tend to have fewer reconstructed fiber connections across nearly all brain regions. To address this issue, we develop a generative model that learns the distribution of brain connectomes while adjusting for undesirable artifacts induced by motion. This model also learns low-dimensional representations of structural connectomes that are invariant to motion. We achieve this by introducing a penalized VAE objective to eliminate mutual information between connectome representations and motion measures. The learned invariant representation enhances our understanding of motion’s impact on structural connectivity reconstructions. It can also be used to produce motion-adjusted connectomes for downstream analysis tasks, such as predicting behavior- and cognition-related traits and inferring structural connections between brain regions. Our analysis of ABCD and HCP data demonstrates that these motion-invariant representations improve the prediction of many cognitive traits by 5% to 25% compared with connectome representations without motion adjustment.

The rest of the paper is organized as follows: Section 2 introduces the dMRI datasets used for deriving structural brain connectomes and outlines the motion measures utilized in this study. In Section 3, we present our method for adjusting for motion-induced artifacts during the modeling of structural brain networks. Section 4 details a simulation study verifying the effectiveness of our method in removing unwanted information. Section 5 applies our proposed method to two large neuroimaging studies, demonstrating its efficacy in correcting for motion-induced artifacts and enabling a more accurate analysis of the relationship between structural connectomes and cognitive traits. Section 6 concludes with a discussion. In the Appendix, we present additional results that treat the residual misalignment remaining after FSL eddy processing as an alternative source of artifacts, and we expand the simulation study to assess the robustness of our method to extreme motion-related artifacts.

2.1 Structural connectomes and cognitive traits

The ABCD study tracks the brain development of over 10,000 adolescents in the United States. We download and process the diffusion MRI (dMRI) data for 8,646 subjects from the NIH Data Archive. Details about data acquisition and preprocessing can be found in Casey et al. (2018). The Human Connectome Project (HCP) characterizes brain connectivity in more than 1,000 adults; see Van Essen et al. (2012) for details about data acquisition. The latest dMRI data from the HCP can be accessed through ConnectomeDB. We download and process 1,065 subjects from the HCP.

To extract structural connectomes from the raw dMRI data, a reproducible tractography algorithm (Maier-Hein et al., 2017) is used to generate the whole-brain tractography for each subject. We use the Desikan–Killiany parcellation (Desikan et al., 2006) to define brain regions of interest (ROIs) corresponding to different nodes in the brain network. The Desikan–Killiany atlas contains 68 ROIs with 34 ROIs belonging to each hemisphere. We use Freesurfer (Iannopollo et al., 2019) to perform brain registration and parcellation. We extract streamlines connecting ROI pairs from the tractography data and the brain parcellation. Fiber count is often used to measure the coupling strength between ROI pairs in the current literature (Fornito et al., 2013; Zhang, Allen, et al., 2019), and, therefore, we summarize each brain connection with its fiber count.

ABCD and HCP use reliable measures to assess a wide range of human behaviors and functions (Gershon et al., 2013). We focus on inferring the relationship between structural connectomes and cognition-related traits, for example, scores from the picture vocabulary and oral reading recognition tests.

2.2 Motion quantification

Our study considers two sources of motion artifacts: head motion and residual misalignment. Head motion refers to the displacement of the head in the scanner estimated by the eddy tool (Andersson & Sotiropoulos, 2016). Residual misalignment, on the other hand, refers to any misalignment remaining after applying the eddy correction tool. While we analyze both types of artifacts, our primary focus is on head motion because we are interested in its impact on the diffusion-weighted images used for tractography and the subsequent construction of structural connectivity. We provide additional analysis of the residual misalignment in Appendix G. This section describes how head motion estimates are obtained and how head motion impacts brain structural connectomes.

We primarily rely on the FSL eddy tool to estimate head motion. The eddy software corrects head motion in diffusion MRI data through a multistep process that involves modeling and correcting for both eddy currents and subject movement. It outputs a file named “my_eddy_output.eddy_movement_rms,” which provides summaries of the total movement in each volume. This is calculated by determining the displacement of each voxel, averaging the squares of those displacements across all intracerebral voxels, and finally taking the square root of that average. The file contains two columns: the first column lists the RMS (root mean square) movement relative to the first volume, and the second column lists the RMS movement relative to the previous volume. We use the second measure as our head motion metric in this paper (as it is the only head motion measure accessible in the ABCD data).
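As a concrete illustration, the subject-level motion summary can be extracted from the eddy output with a few lines of Python; this is a minimal sketch assuming the two-column format described above, and the file path is illustrative:

```python
import numpy as np

def mean_rms_motion(path="my_eddy_output.eddy_movement_rms"):
    """Summarize head motion from FSL eddy's movement RMS file.

    Each row corresponds to one volume; column 0 is the RMS displacement
    relative to the first volume, and column 1 is relative to the previous
    volume. We average column 1 across volumes as the subject-level head
    motion measure c_i used in this paper.
    """
    rms = np.loadtxt(path)      # shape: (n_volumes, 2)
    return rms[:, 1].mean()
```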

Figure 1(A) shows distributions of head motion for the ABCD and HCP studies. We notice that the motion distribution of ABCD subjects is more right skewed, with more subjects experiencing larger motion during image acquisition. This disparity in motion measure distributions between the ABCD and HCP studies could be attributed to demographic differences in their respective cohorts. Our hypothesis is that the HCP study focuses on adults, who tend to follow instructions better and move less during image acquisition than the adolescents in the ABCD study.

Fig. 1.

Head motion analysis for subjects in the ABCD and HCP datasets and the impact of motion on brain structural connectome data. (A) Distribution histograms of head motion for ABCD and HCP subjects, with the blue and orange lines representing the 10th and 90th percentiles. Subjects with head motion below the blue line are classified into the small motion group, while those with head motion above the orange line belong to the large motion group. (B) Mean network differences between the large and small motion groups (large–small). (C) Circular plots representing the data in panel (B). The color scale corresponds to that in (B), with the display limited to the 15 edges showing the largest fiber count differences.


To explore the impact of head motion on structural connectomes, we assign individuals to a large and a small motion group corresponding to each movement type. Figure 1 shows the cutoff lines for both ABCD and HCP data. Specifically, in the ABCD data, individuals whose motion estimates are ranked among the top (bottom) 5 percent are assigned to the large (small) motion group. In the HCP data, we use the top and the bottom 10 percent as the threshold. We then quantify brain connection differences between groups. We subtract edge-specific means of the small motion group from those of the large motion group and show the results as adjacency matrices in Figure 1(B). In addition, we visualize between-group differences using a circular layout in Figure 1(C) to examine how connection patterns and strength differ across ROIs. Note that the dMRI data used here are preprocessed with the FSL eddy tool and its motion correction procedures. Figure 1(B and C) reveals that brain networks of large-motion subjects show systematic differences in fiber counts across multiple ROIs compared with those of small-motion subjects, implying that motion artifacts still affect structural connection reconstruction even after motion correction in the preprocessing stage (Andersson & Sotiropoulos, 2016; Andersson et al., 2016). Motivated by these findings, we propose the motion-invariant graph variational autoencoder in Section 3 to remove such systematic artifacts from the connectome data and downstream analyses.
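The grouping and between-group comparison can be sketched as follows; this is a simplified illustration in which the array shapes and default percentile cutoffs are our assumptions:

```python
import numpy as np

def motion_group_difference(A, motion, lo=0.10, hi=0.90):
    """Edge-specific mean difference between large- and small-motion groups.

    A      : (N, V, V) array of adjacency matrices (streamline counts)
    motion : (N,) per-subject head motion summaries
    lo, hi : percentile cutoffs defining the small/large motion groups
    """
    q_lo, q_hi = np.quantile(motion, [lo, hi])
    small_mean = A[motion <= q_lo].mean(axis=0)   # small motion group mean
    large_mean = A[motion >= q_hi].mean(axis=0)   # large motion group mean
    return large_mean - small_mean                # shown as a matrix in Fig. 1(B)
```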

The brain connectome for an individual is represented as an undirected graph G = (V, E) with node set V and edge set E. The collection of nodes V refers to the partitioned brain regions, and the set of edges E represents connections between pairs of ROIs. The ROIs are prealigned so that each node in V corresponds to the same brain region across different subjects. We use the symmetric adjacency matrix A to represent G, with Auv denoting the number of streamlines connecting ROIs u and v; Auu = 0 since we do not consider self-connections. For N subjects, we denote the network data as Ai, i = 1, 2, …, N. Also, denote the cognitive trait score for an individual as yi, and the amount of motion as ci. Note that ci can be a scalar quantifying overall head displacement or a vector (ci ∈ ℝ^C) with each dimension measuring a different aspect of motion, such as the amplitude of translation or rotation. Moreover, ci can represent other undesirable artifacts, such as the residual misalignment, depending on the research goal.

We aim to infer the relationship between individuals’ structural connectomes, recovered from neuroimaging data, and their measured cognitive traits, while adjusting for motion-induced artifacts during data acquisition. The key to achieving this goal is to learn a low-dimensional feature representation of Ai as zi that is invariant to motion ci, and then relate this feature zi to yi. Inspired by the variational autoencoder (VAE) framework in D. P. Kingma (2013), we achieve our goal in three steps: (1) an encoder module (inference model) that learns the mapping from the observation Ai to the motion-invariant latent variable zi; (2) a decoder module (generative model) that specifies how to construct Ai from zi and motion ci; and (3) a regression module to link the motion-invariant zi to the cognitive trait yi; see Figure 2 for the model architecture. Intuitively, in the training process, our approach seeks to maximize the joint likelihood of (Ai,yi) conditional on ci while minimizing the dependence between zi and ci. This is achieved by characterizing the joint likelihood through the brain network generative model in Section 3.1 and addressing the dependence through mutual information between the low-dimensional representations of brain networks and motion, as detailed in Section 3.2.

Fig. 2.

(A) Structure of the standard VAE model. The encoder learns motion-affected latents from motion-affected brain networks, and the decoder reconstructs networks from these latents. (B) Architecture of our proposed motion-invariant VAE. The encoder learns motion-invariant latents from the motion-affected networks, and the decoder reconstructs motion-adjusted networks from the invariant latents and the user-specified amount of subject motion. The invariant latents additionally serve as inputs to a regressor for predicting cognition-related traits.


In the following sections, we start by introducing our graph generative model and then discuss the encoder and regression modules.

3.1 Brain network generative model

For each individual i, we assume the edges Ai[uv] are conditionally independent given zi and ci. The likelihood of the set of edges in Ai is

$$p_\theta(A_i \mid z_i, c_i) = \prod_{u < v} p_\theta\big(A_i[uv] \mid z_i, c_i\big), \tag{1}$$

where pθ(Ai|zi,ci) is the generative model for Ai given zi and ci, parameterized by θ learned via neural networks. We assume Ai to be generated from the following process:

$$z_i \sim \mathcal{N}(0, I_K), \tag{2}$$
$$A_i[uv] \mid z_i, c_i \sim \mathrm{Poisson}\big(\lambda_{i[uv]}(z_i, c_i)\big), \tag{3}$$

where K is the dimension of the latent feature space. The fiber count between ROIs u and v is modeled using a Poisson distribution with a parameter λi[uv](zi,ci), which relates to both the encoded feature zi and motion ci.

To capture the varying effect of motion across individuals and brain connections, we further model λi[uv](zi,ci) as

$$\tilde{z}_i = z_i \oplus c_i, \tag{4}$$
$$\lambda_{i[uv]}(z_i, c_i) = \exp\big(\xi_{uv} + \psi_{uv}(\tilde{z}_i)\big), \tag{5}$$

where ⊕ denotes the concatenation operator and z̃i ∈ ℝ^{K+C} is the concatenated vector representing motion-affected latent features. ξuv is a baseline parameter controlling the population connection strength between regions u and v, representing shared structural similarity across individuals, and ψuv(z̃i) characterizes individual connection strength after accounting for motion.

We model ψuv(z̃i) based on the latent space model of Hoff et al. (2002). The key idea is that the connection strength (fiber count) between regions u and v depends on their distance in a network latent space ℝ^R. In this network latent space, each brain region is represented by a vector in ℝ^R, with the distance between two points reflecting the degree of association between the corresponding brain regions: the closer two points are in this space, the stronger their connection is expected to be. The network latent space is different from the latent space of z̃i, which is the low-dimensional space into which the original high-dimensional brain structural connectomes are encoded.

We construct a mapping X from ℝ^{K+C} to ℝ^{V×R} that maps the motion-affected feature z̃i to the network latent space capturing the latent positions of all brain regions, that is, X(z̃i) = (X1(z̃i), …, XV(z̃i)) with Xu(z̃i) = (Xu1(z̃i), …, XuR(z̃i)) ∈ ℝ^R for brain region u. Specifically, we assume each node u ∈ V for individual i lies in the latent space ℝ^R and represent ψuv(z̃i) as

$$\psi_{uv}(\tilde{z}_i) = \sum_{r=1}^{R} \alpha_r\, X_{ur}(\tilde{z}_i)\, X_{vr}(\tilde{z}_i), \tag{6}$$

where αr > 0 is a weight controlling the importance of the r-th dimension. A large positive inner product between Xu(z̃i) and Xv(z̃i) implies a large fiber count between this ROI pair. Therefore, Xu(z̃i) incorporates the geometric collaborations among nodes and broadcasts the impact of motion to all brain connections. M. Liu et al. (2021) proposed a graph convolutional network (GCN) to learn a nonlinear mapping from the encoded feature zi to (X1(zi), …, XV(zi)), leveraging unique geometric features of brain networks. We adopt the same procedure in this paper, with the additional adjustment for motion ci, and defer the detailed GCN framework to Appendix A.
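To make the decoder concrete, here is a minimal sketch of computing the Poisson rates in (4)-(6) from the node positions produced by the GCN; the exponential link combining ξuv and ψuv follows our reading of (5), and the tensor shapes are illustrative:

```python
import torch

def poisson_rates(x, alpha, xi):
    """Poisson rates λ_i[uv] from node latent positions (cf. Eqs. (4)-(6)).

    x     : (V, R) node positions X(z~_i) output by the GCN decoder
    alpha : (R,) positive weights α_r for the latent dimensions
    xi    : (V, V) baseline parameters ξ_uv shared across subjects
    """
    # ψ_uv = Σ_r α_r X_ur X_vr, a weighted inner product of node positions
    psi = torch.einsum("ur,r,vr->uv", x, alpha, x)
    return torch.exp(xi + psi)   # assumed exponential link, cf. Eq. (5)

# Log-likelihood of an observed network A under the Poisson model of Eq. (3):
# torch.distributions.Poisson(poisson_rates(x, alpha, xi)).log_prob(A),
# summed over the upper triangle (u < v).
```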

3.2 Motion-invariant brain network encoding and traits prediction

We are interested in studying the relationship between brain structural networks and human traits while correcting for motion-related artifacts. The key is to find an optimal encoder that produces zi from Ai invariant to ci, and then formulate a regression of the trait yi with respect to the latent feature vector zi. We represent the inference model (encoder) as qϕ(zi|Ai), the generative model (decoder) as pθ(Ai|zi,ci), and the regression as pθ(yi|zi). Our invariant encoding and trait prediction task is to find qϕ(zi|Ai) and pθ(Ai|zi,ci) that maximize E_{Ai,yi,ci}[log pθ(Ai,yi|ci)], subject to zi being independent of ci. Here pθ(Ai,yi|ci) is the joint likelihood of Ai, yi conditional on ci. Since correlation is a limited notion of statistical dependence that does not capture nonlinear dependencies, we instead use the mutual information between zi and ci to quantify their dependency, similar to Moyer et al. (2018), and formulate the objective function as follows:

$$\mathcal{L}(A_i, y_i, c_i; \theta, \phi) = \mathbb{E}_{A_i, y_i, c_i}\big[\log p_\theta(A_i, y_i \mid c_i)\big] - \lambda\, I(z_i, c_i), \tag{7}$$

where I(zi,ci) is the mutual information between zi and ci, and λ is a trade-off parameter between the joint likelihood and the mutual information. ϕ are parameters of the encoder that are learned via neural networks.

To simplify our model, we assume (1) the human trait yi and the brain connectivity Ai are conditionally independent given the latent representation zi for individual i and (2) only Ai is affected by ci, so yi is not. Assumption (2) indicates that motion occurs randomly in the population under study and is unrelated to the human traits (e.g., cognitive ability) being considered. In certain scenarios, this assumption might not hold. For example, in older populations, people with mild cognitive impairment (MCI) tend to move more than normal controls (Haller et al., 2014), so if yi is an indicator of MCI, Assumption (2) fails. Assumption (1) indicates that the encoded feature zi contains all the information from Ai needed to predict yi, and the residuals are independent of each other. Intuitively, we assume the motion ci explains part of the variability in brain networks Ai, but this part of the variability does not relate to yi. Of course, we can modify our model to add ci to the prediction of yi when ci is not independent of yi, but doing so would make the final results harder to interpret since we are mostly concerned with the relationship between Ai and yi.

Under Assumptions (1) and (2), we can express the conditional likelihood of Ai,yi given ci as

$$\log p_\theta(A_i, y_i \mid c_i) \ge \mathbb{E}_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(A_i \mid z_i, c_i)\big] + \mathbb{E}_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(y_i \mid z_i)\big] - \mathrm{KL}\big(q_\phi(z_i \mid A_i)\,\|\,p(z_i)\big). \tag{8}$$

The derivation is based on Jensen’s inequality; see Appendix B. Intuitively, the lower bound of log pθ(Ai,yi|ci) in (8) consists of three parts. The first term, log pθ(Ai|zi,ci), is a reconstruction error that uses the log-likelihood to measure how well the generative model in (3) reconstructs Ai. The second term, log pθ(yi|zi), measures the trait prediction accuracy. The third term, a KL divergence, is a regularizer that pushes qϕ(zi|Ai) to be close to its prior N(0, IK) so that we can sample zi easily.
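For concreteness, here is a one-sample Monte Carlo sketch of the three terms; the Gaussian regression head anticipates the module defined later in this section, and the variable names are illustrative:

```python
import torch

def lower_bound_terms(A_loglik, z, mu, logvar, y, beta, b, log_sigma2):
    """One-sample estimate of the three terms in the bound (8).

    A_loglik   : scalar log p_θ(A_i | z_i, c_i) from the Poisson decoder
    z          : (K,) latent sampled as z = μ + ε ⊙ σ
    mu, logvar : (K,) encoder outputs parameterizing q_ϕ(z_i | A_i)
    y          : trait score, with p_θ(y | z) = N(zᵀβ + b, σ²)
    """
    # Trait log-likelihood log p_θ(y_i | z_i), up to an additive constant
    resid = y - (z @ beta + b)
    trait_loglik = -0.5 * (log_sigma2 + resid ** 2 / log_sigma2.exp())
    # KL(q_ϕ(z|A) || N(0, I_K)), closed form for a diagonal Gaussian
    kl_prior = -0.5 * torch.sum(1.0 + logvar - mu ** 2 - logvar.exp())
    return A_loglik + trait_loglik - kl_prior
```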

For the second term in the objective function (7), we can upper bound I(zi,ci) as

$$I(z_i, c_i) \le \mathbb{E}_{A_i}\big[\mathrm{KL}\big(q_\phi(z_i \mid A_i)\,\|\,q_\phi(z_i)\big)\big] - \mathbb{E}_{A_i, c_i}\,\mathbb{E}_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(A_i \mid z_i, c_i)\big] - H(A_i \mid c_i), \tag{9}$$

which consists of two parts that depend on the model parameters: the KL divergence between qϕ(zi|Ai) and its marginal qϕ(zi), which ensures less variability across the input data Ai, and the reconstruction error; the conditional entropy H(Ai|ci) is a constant that does not affect optimization. See Appendix C for detailed derivations of (9).
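The first part of (9) involves the intractable aggregate posterior qϕ(zi). One common minibatch surrogate, which we sketch here as an assumption in the spirit of Moyer et al. (2018), treats qϕ(z) as the mixture of the batch's Gaussian posteriors and averages pairwise KL divergences:

```python
import torch

def marginal_kl_penalty(mu, logvar):
    """Minibatch surrogate for E[KL(q_ϕ(z|A) || q_ϕ(z))].

    mu, logvar : (B, K) encoder outputs for a batch of B subjects.
    Averages KL(N(μ_i, σ_i²) || N(μ_j, σ_j²)) over all pairs (i, j); by
    convexity this upper bounds the KL to the batch mixture.
    """
    var = logvar.exp()                                        # (B, K)
    kl_pairs = 0.5 * (
        logvar.unsqueeze(0) - logvar.unsqueeze(1)             # log σ_j² - log σ_i²
        + (var.unsqueeze(1) + (mu.unsqueeze(1) - mu.unsqueeze(0)) ** 2)
        / var.unsqueeze(0)                                    # divided by σ_j²
        - 1.0
    ).sum(-1)                                                 # (B, B) pairwise KLs
    return kl_pairs.mean()
```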

Combining the log-likelihood and the mutual information terms, the training objective in (7) can be bounded from below as

$$\mathcal{L}(A_i, y_i, c_i; \theta, \phi) \ge (1+\lambda)\, \mathbb{E}_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(A_i \mid z_i, c_i)\big] + \mathbb{E}_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(y_i \mid z_i)\big] - \mathrm{KL}\big(q_\phi(z_i \mid A_i)\,\|\,p(z_i)\big) - \lambda\, \mathbb{E}_{A_i}\big[\mathrm{KL}\big(q_\phi(z_i \mid A_i)\,\|\,q_\phi(z_i)\big)\big]. \tag{10}$$

Our goal is to maximize ℒ(Ai,yi,ci;θ,ϕ) through its lower bound (10). In practice, the expectation in (10) is intractable, and we employ a Monte Carlo approximation for computation; see Appendix D. We define the approximated objective as ℒ̃(Ai,yi,ci;θ,ϕ) and implement a stochastic variational Bayesian algorithm (Algorithm 1) to update the parameters using minibatch training. For the regression module, we consider pθ(yi|zi) to be univariate Gaussian, that is, yi ∼ N(zi⊤β + b, σ²), where β, b, σ² ∈ θ are parameters to be learned. We list the model parameters and architecture in Appendix E.

Algorithm 1. Motion-Invariant Variational Autoencoders
Input: {Ai}i=1..N, {yi}i=1..N, {ci}i=1..N
 Initialize θ, ϕ
 while not converged do
  Sample a batch of {Ai}i=1..m, denoted as Am
  for all Ai ∈ Am do
   Sample εi ∼ N(0, IK)
   Compute zi = μϕ(Ai) + εi ⊙ Σϕ(Ai),
   where ⊙ denotes the element-wise product operator.
   Compute the gradients ∇θ ℒ̃(Ai, yi, ci; θ, ϕ)
   and ∇ϕ ℒ̃(Ai, yi, ci; θ, ϕ) with zi
  end for
  Average the gradients across the batch.
  Update θ, ϕ using the gradients.
 end while
 Return θ, ϕ
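In code, one pass of Algorithm 1 might look like the following sketch; the `model` interface with `encode` and an `objective` method computing ℒ̃ is hypothetical, and the released implementation is at https://github.com/yzhang511/inv-vae:

```python
import torch

def train_inv_vae(model, loader, lam=1.0, epochs=100, lr=1e-3):
    """Minibatch training loop mirroring Algorithm 1 (hypothetical interface)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for A, y, c in loader:                    # minibatch {A_i, y_i, c_i}
            mu, logvar = model.encode(A)          # q_ϕ(z_i | A_i)
            eps = torch.randn_like(mu)            # ε_i ~ N(0, I_K)
            z = mu + eps * (0.5 * logvar).exp()   # reparameterization trick
            loss = -model.objective(A, y, c, z, mu, logvar, lam)  # -ℒ̃
            opt.zero_grad()
            loss.backward()                       # gradients w.r.t. θ and ϕ
            opt.step()                            # update θ, ϕ
    return model
```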

We first conduct a simulation study to verify the efficacy of our method in removing unwanted information from the data. We simulate random graphs with community structure (Nowicki & Snijders, 2001) using the Python package NetworkX. Random graphs with two communities are considered: nodes in the same group are connected with probability 0.25, and nodes in different groups are connected with probability 0.01. We simulate 1,000 such networks {Ai}, where each Ai is a V × V adjacency matrix with V = 68 nodes and n = 1,000. Figure 3(A) displays the average network across the 1,000 simulated networks. Next, we simulate a nuisance variable {si} by sampling 20% of its elements from N(0.6, 0.05) and the rest from N(1, 0.01). We generate the nuisance-affected networks Ãi by propagating si across all edges of Ai as follows:

$$\tilde{A}_i[uv] = s_i\, A_i[uv]. \tag{11}$$

Fig. 3.

(A) Edge-specific means of simulated networks, nuisance-affected networks, reconstructed networks by GATE, and nuisance-corrected networks by inv-VAE. (B) Edge-specific differences between networks affected by large and small nuisance artifacts. The order of plots follows that in (A). (C) PCA projections of simulated networks, nuisance-affected networks, latent embeddings learned by GATE, and invariant embeddings from inv-VAE colored by the amount of nuisance artifacts.


Intuitively, si ∼ N(0.6, 0.05) introduces large artifacts, since its multiplication with Ai in (11) yields an Ãi with reduced edge values; si ∼ N(1, 0.01) generates small artifacts, as the multiplication in (11) changes the edges of Ai to a lesser extent. The average network across 1,000 such nuisance-affected networks is displayed in Figure 3(A). To visualize the impact of the simulated nuisance variable, we compute the edge-specific differences between the large and small nuisance groups for both the simulated networks {Ai} and the nuisance-affected networks {Ãi} (Fig. 3(B)). We observe apparent differences in edge values between the large and small nuisance groups of {Ãi}. We further apply principal component analysis (PCA) to both {Ai} and {Ãi}, and visualize their projections onto the first two principal components (PCs) in Figure 3(C). In the first two PCs of {Ai}, the two groups are indistinguishable. Note that these simulated networks are nuisance free, even though we color their projections with the simulated nuisance for visualization purposes. On the contrary, we observe an apparent separation between the large and small nuisance groups of {Ãi} after introducing unwanted associations between the nuisance variable and the simulated networks.
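The simulation settings above can be reproduced with a short NetworkX sketch; the random seeds and the reading of the second Gaussian argument as a standard deviation are our assumptions:

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n, sizes = 1000, [34, 34]                  # V = 68 nodes, two communities
p = [[0.25, 0.01], [0.01, 0.25]]           # within/between-group edge probabilities

# Simulated networks {A_i}: two-community stochastic block models
A = np.stack([
    nx.to_numpy_array(nx.stochastic_block_model(sizes, p, seed=int(s)))
    for s in rng.integers(0, 2**31 - 1, size=n)
])

# Nuisance variable {s_i}: 20% large artifacts, 80% small artifacts
s = np.where(rng.random(n) < 0.2,
             rng.normal(0.6, 0.05, n),
             rng.normal(1.0, 0.01, n))

# Nuisance-affected networks via Eq. (11), applied edgewise
A_tilde = s[:, None, None] * A
```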

To demonstrate the effectiveness of our method in removing artifacts introduced by the nuisance variable, we further compare inv-VAE with the graph autoencoder (GATE) developed in M. Liu et al. (2021). GATE has the same generative model as inv-VAE (see Section 3) except without adjusting for ci. Specifically, GATE assumes the following generative process for the brain network Ai:

$$z_i \sim \mathcal{N}(0, I_K), \qquad A_i[uv] \mid z_i \sim \mathrm{Poisson}\big(\lambda_{i[uv]}(z_i)\big), \tag{12}$$

Moreover, GATE learns the latent vector zi using the following objective

$$\mathbb{E}_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(A_i \mid z_i)\big] + \mathbb{E}_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(y_i \mid z_i)\big] - \mathrm{KL}\big(q_\phi(z_i \mid A_i)\,\|\,p(z_i)\big), \tag{13}$$

instead of the motion-invariant objective in (10). We train GATE on {Ãi} to learn latent representations of the nuisance-affected networks. The learned latents are used by the generative model in (12) to reconstruct the networks in Figure 3(A). Figure 3(B) and (C) show the reconstructed edge-specific differences between the large and small nuisance groups and the first two PCs of the learned latent variables, respectively.

Next, we train inv-VAE on {Ãi} to learn the parameters ϕ and θ of the inference and generative models. The inference model then learns nuisance-invariant latent representations, from which the generative model produces nuisance-corrected networks (Fig. 3(A)) by setting si = 1. As previously defined, si ≈ 1 corresponds to small nuisance artifacts, whereas a small si represents large nuisance artifacts: multiplying Ai by a small si in (11) shrinks its edge values substantially, whereas multiplying by si ≈ 1 leaves the edge values roughly unchanged. Between-group edge-specific differences of the nuisance-corrected networks and the first two PCs of the invariant latents are shown in Figure 3(B, C). Together, they suggest that GATE reconstructs the artifacts introduced by the nuisance variable and that its learned latent embeddings are affected by nuisance artifacts. On the contrary, such edge-specific differences between the large and small nuisance groups do not exist in the nuisance-corrected networks from inv-VAE, and the projections of the invariant embeddings of the two groups are indistinguishable. This observation implies that inv-VAE can remove unwanted information from the data.

5.1 Understanding and removing motion artifacts

A major motivation of our work is to understand how motion affects learning a low-dimensional representation of the structural connectome and to remove motion-induced artifacts from the connectome data. For this purpose, we apply our inv-VAE to the ABCD (8,646 subjects) and HCP data (1,065 subjects), and compare inv-VAE with its competitors on two tasks: (1) learning a latent connectome representation that is uninformative of motion and (2) generating a motion-adjusted connectome. This section focuses on the FSL eddy head motion. We provide an additional analysis in Appendix G on addressing residual misalignment after eddy motion correction.

We first average the motion-affected brain networks Ai over all individuals in the study to obtain the motion-affected edge-specific means (see Section 2.2) of the ABCD and HCP datasets, respectively. We call the resulting matrices “motion-affected,” meaning no motion adjustment has been done. The first columns of Figure 4(A, D) show the motion-affected edge-specific means in both datasets. We assign subjects to a small and a large motion group for each type of movement; see Section 2.2 for the cutoff thresholds for the large and small motion groups in both datasets. The first columns of Figure 4(B, E) show the motion-affected edge-specific differences, obtained by subtracting the motion-affected edge-specific means of the small motion group from those of the large motion group. For both datasets, we notice apparent fiber count differences between the two motion groups in almost all pairs of brain regions. Such brain connection differences due to motion artifacts are more prominent when we visualize the first two PCs of brain networks between the two groups. In the first columns of Figure 4(C, F), each point represents an individual colored by the corresponding amount of motion. In the first column of Figure 4(C), the PCA projections of the large and small motion subjects are closer together compared with those in the first column of Figure 4(F). This may be explained by the amount of motion-induced artifacts in the data: participants of the HCP study are young adults who may move less during data acquisition relative to the young adolescent ABCD participants.

Fig. 4.

(A) From left to right: edge-specific means of ABCD brain networks, reconstructed brain networks by GATE, and motion-adjusted brain networks by inv-VAE, color coded according to fiber count. (B) From left to right: edge-specific differences (large group subtracts small group) for raw network data, GATE reconstructed network data, and inv-VAE reconstructed data, respectively, with color indicating differences in fiber count. (C) From left to right: the first two principal component scores of ABCD brain networks, latent embeddings learned by GATE, and invariant embeddings from inv-VAE for observations in the large- and small-motion groups, colored by the amount of head motion. (D)–(F) replicate the same content as (A)–(C) but emphasize head motion in the HCP data. Head motion is quantified in millimeters (mm).


Next, we apply GATE (M. Liu et al., 2021) to the ABCD and HCP data. For each dataset, GATE is trained using all brain networks Ai’s to learn the model parameters θ and ϕ. Equipped with well-trained estimators, GATE obtains the learned latent variables from the inference model and reconstructs brain networks from the generative model in (12). We denote the reconstructed brain network as A^i. The second columns of Figure 4(A, D) show the reconstructed edge-specific means averaged across all A^i’s in both datasets. In the second columns of Figure 4(B, E), reconstructed edge-specific differences between the large and small motion group are shown. Although GATE can generate brain networks that resemble the observed data, motion artifacts are also undesirably reconstructed, as shown in the second columns of Figure 4(B, E). The second columns of Figure 4(C, F) show the first two PCs of the learned latents by GATE from the large and small motion groups. The gap between the two groups of latents suggests that representations learned by GATE are also corrupted by motion artifacts, which will further affect prediction and inferences.

For our inv-VAE, we use ci to represent the i-th subject’s head motion, with 0 representing the smallest amount of motion. To adjust for motion artifacts, we first train the inv-VAE model using the ABCD and HCP data separately. The inference model of inv-VAE learns a set of motion-invariant representations zi. Following the generative model in Section 3.1, zi are subsequently used to generate motion-adjusted brain networks, denoted as Ai*, by setting ci to 0 to remove motion artifacts. For each dataset, we average across all Ai* to obtain the motion-adjusted edge-specific means shown in the third columns of Figure 4(A, D), which resemble the observed edge-specific means. The third columns of Figure 4(B, E) show the motion-adjusted edge-specific differences by taking the difference between the motion-adjusted edge-specific means of the large and small motion groups. Compared with the color scale of the observed between-group edge-specific differences, the colors of the motion-adjusted between-group edge-specific differences are noticeably lighter, suggesting that brain connection differences between the large and small motion groups become smaller after motion adjustment. The third columns of Figure 4(C, F) show the first two PCs of invariant representations zi for the ABCD and HCP data. The gap between large and small motion subjects is smaller after motion adjustment compared with the projections of the observed brain networks. Although the HCP subjects have smaller head movement and their connectomes are less affected by motion-induced artifacts, we still observe that motion adjustment minimizes the reconstructed structural connection difference between the large and small motion groups in the third column of Figure 4(F).
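Schematically, generating motion-adjusted networks amounts to decoding the invariant latents with the motion input zeroed out; this is a sketch with a hypothetical `encode`/`decode` interface:

```python
import torch

def motion_adjusted_networks(model, A_batch, motion_dim=1):
    """Generate motion-adjusted networks A_i* by setting c_i = 0."""
    with torch.no_grad():
        mu, _ = model.encode(A_batch)                       # invariant latents z_i
        c_zero = torch.zeros(A_batch.shape[0], motion_dim)  # c_i = 0: no motion
        return model.decode(mu, c_zero)                     # motion-adjusted A_i*
```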

Motion can significantly impact the acquired diffusion data, introducing signal loss, geometric distortion, misaligned volumes, and compromised diffusion measures. Many studies have examined how motion affects the diffusion MRI signal and the constructed structural connectomes. For example, Andersson and Sotiropoulos (2016) highlight the susceptibility of diffusion MRI to artifacts caused by subject movements, resulting in both signal loss and geometric distortion. Additionally, Baum et al. (2018) demonstrate that motion has a noticeable impact on a considerable percentage of edges in structural brain networks, an effect that extends beyond network edges to node strength and total network strength. In this work, we aim to understand how motion impacts our understanding of structural brain connectivity. For both the ABCD and HCP studies, we first obtain the motion-affected edge-specific means by averaging the Ai’s across all individuals in the dataset. After training inv-VAE on all Ai’s, we generate motion-adjusted networks Ai*’s from the invariant embeddings. We then average across all Ai*’s to construct the motion-adjusted edge-specific means. The differences between the motion-adjusted and motion-affected edge-specific means are shown in Figure 5(A, B). In both datasets, we observe more connections between the cingulate and other brain regions in both hemispheres after motion adjustment, implying that brain connections involving the cingulate are more severely affected by motion artifacts. There are also notable impacts in other regions, including the frontal, temporal, occipital, and parietal lobes, where we observe significant changes in connection strength after motion adjustment.

Fig. 5.

Structural connectivity differences between motion-adjusted and motion-affected ABCD (A) and HCP (B) brain networks; see Section 5.1 for definitions. Each line connecting a pair of brain regions is colored by the corresponding fiber count difference after motion adjustment. We show only the 15 most changed brain connections.


5.2 Relating motion-adjusted connectomes to cognition-related traits

Our method can effectively adjust for motion-induced artifacts in the connectome data. A natural question is whether motion-invariant representations further our understanding of the relationship between brain connectomes and cognition-related traits. From the ABCD dataset, we extract the following cognitive traits as yi: (1) picture vocabulary test score for language comprehension; (2) oral reading recognition test score for language decoding; (3) fluid composite score for new learning; (4) crystallized composite score for past learning; and (5) cognition total composite score for overall cognition capacity. From the HCP dataset, we consider similar traits: (1) picture vocabulary test score; (2) pattern completion test score for processing speed; (3) picture sequence test score for episodic memory; (4) list sorting test score for working memory; and (5) fluid intelligence score; see Gershon et al. (2013) for details about these traits. All traits are age corrected.

According to Section 3.2, we assume that motion ci is independent of the trait of interest yi. However, there might be significant correlations between clinically and scientifically interesting traits and subject motion. To mitigate the concern of erroneously attributing part of the connectome differences to motion rather than underlying traits, we preprocess trait yi by regressing out ci. For our analysis, we predict the residuals of yi after regressing out ci and examine the relationship between motion-adjusted connectomes and cognition-related traits.
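Concretely, this preprocessing step regresses each trait on the motion summary and keeps the residuals; a minimal sketch:

```python
import numpy as np

def residualize(y, c):
    """Residuals of trait y after regressing out motion c (with an intercept)."""
    X = np.column_stack([np.ones_like(c), c])   # design matrix [1, c]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta                         # motion-regressed trait
```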

To investigate whether our method outperforms other approaches in relating structural connectomes to traits, we compare inv-VAE with the following competitors: (1) LR-PCA (motion adjusted) that uses PCA to obtain a lower dimensional projection from each individual’s brain matrix, and then links yi to the projection via a linear regression (LR). Translation and rotation are covariates that LR adjusts for; (2) Sparse Orthogonal to Group (SOG) (Aliverti et al., 2021) that utilizes matrix decomposition to produce an adjusted dataset that is statistically independent of the motion variable. We then regress yi onto the obtained low-dimensional factorizations via LR; (3) ComBat (Johnson et al., 2007) that uses empirical Bayes for batch-effect removal, and returns a motion-adjusted dataset. We then apply PCA to the adjusted data to obtain low-dimensional projections as input to LR; (4) GATE (M. Liu et al., 2021) that has the same architecture as inv-VAE, but does not adjust for motion; see Section 4 for details about the model specification.

We use the above competing methods to produce 68-dimensional connectome representations for each individual, consistent with the recommended latent dimension K in inv-VAE; see Section 3. To assess trait prediction quality, we use the Pearson correlation coefficient to measure the correlation between the observed and predicted traits, and perform fivefold cross-validation (CV). A large correlation coefficient means the predicted trait scores more closely match the observed ones, indicating a stronger association between the low-dimensional representations of structural connectomes and the cognition-related traits.
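The evaluation loop can be sketched as follows; the `fit_predict` callback stands in for any of the compared methods, and the names are illustrative:

```python
import numpy as np
from sklearn.model_selection import KFold

def cv_trait_correlation(fit_predict, A, y, n_folds=5, seed=0):
    """Fivefold CV: Pearson correlation of observed vs. predicted traits.

    fit_predict(A_train, y_train, A_test) -> predicted traits for A_test.
    """
    corrs = []
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(A):
        preds = fit_predict(A[train_idx], y[train_idx], A[test_idx])
        corrs.append(np.corrcoef(y[test_idx], preds)[0, 1])
    return np.mean(corrs), np.std(corrs)
```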

A comparison of inv-VAE with the other approaches with respect to the trait-prediction quality is displayed in Figure 6, which shows that inv-VAE outperforms LR-PCA, ComBat, and SOG over all studied traits in both datasets, and either outperforms or is comparable with GATE over most selected traits. The prediction quality of SOG is comparable with GATE and inv-VAE for some cognitive traits: oral reading recognition and fluid intelligence score.

Fig. 6.

A comparison of all methods on trait prediction in the ABCD (A) and HCP (B) data. The error bars show the min, max, mean, and standard deviation of correlation coefficients from fivefold CV. We preprocess the target trait yi by regressing out head motion. For each fold, we train inv-VAE on the training data. For the test data, we obtain trait predictions after setting ci to 0 to remove head motion.


Another way to assess the predictive performance is to visualize the associations between low-dimensional representations of structural connectomes and cognition-related traits. Both inv-VAE and GATE produce latent features zi for each individual, which can be used for visualization. In particular, for the ABCD study, we consider the picture vocabulary test, oral reading recognition test, and crystallized composite score; for the HCP study, we consider the picture vocabulary test, oral reading recognition test, and dimensional change card sort test. Both inv-VAE and GATE are trained with brain networks of all individuals in each dataset to obtain the zi’s for the yi’s. For each trait, latent features from 200 subjects are selected, with the first group of 100 having the lowest trait scores and the second group of 100 having the highest scores. Next, we visualize the first three PCs of the posterior means of zi colored with their corresponding trait scores in Figure 7. For both inv-VAE and GATE, Figure 7 shows that representations from the two trait groups are separated, suggesting that structural connectomes differ among individuals with different cognitive abilities. We highlight that the motion-invariant zi’s of the two groups are more separated than the motion-affected zi’s from GATE, particularly for the oral reading recognition test. This implies that motion-invariant structural connectomes are more strongly associated with cognition-related traits than representations obtained without motion adjustment.

Fig. 7.

Relationships between learned embeddings and cognition-related traits in ABCD (A) and HCP (B) datasets. We preprocess the target trait yi by regressing out head motion. Both inv-VAE and GATE are trained with brain networks of all individuals in each dataset to obtain the latent zi’s for the yi’s. For each trait, latent features from 200 subjects are selected, with the first group of 100 having the lowest trait scores and the second group of 100 having the highest scores. The first three PCs of the posterior means of zi colored with their corresponding trait scores are displayed for each trait yi. Latent embeddings produced by GATE and motion-invariant embeddings from inv-VAE are compared.


We develop a motion-invariant variational autoencoder (inv-VAE) for learning low-dimensional, motion-adjusted representations of structural connectomes. We apply inv-VAE to the Adolescent Brain Cognitive Development (ABCD) and Human Connectome Project (HCP) data and discover noticeable motion artifacts in both, despite the incorporation of motion correction procedures in preprocessing. This observation reinforces the need for effective motion mitigation strategies in connectome analysis. By being invariant to motion artifacts during the connectome modeling phase, inv-VAE shows improved performance in understanding the correlation between structural connectomes and cognition-related traits. While our inv-VAE is motivated by the motion confounder, it can be easily extended to handle other confounders, such as site batch effects and variation due to machine settings. Compared with popular batch-effect removal models such as ComBat, the advantages of our inv-VAE are that (1) it can handle nonlinear objects (brain network data) and (2) it can deal with nonlinear batch effects. Therefore, we believe that our inv-VAE framework can be extended to broader brain imaging analysis tasks, highlighting the importance of confounder and batch effect adjustment methodologies.

Note that inv-VAE is developed under a generative model framework, which allows us to simulate brain networks under various conditions. The simulation of brain networks is crucial since brain network data are generally not widely available, and the extraction of brain networks from raw imaging data is far from straightforward. Our inv-VAE enables us to simulate (Ai,ci,yi)—that is, the brain network, undesirable artifacts, and cognition ability measures—simultaneously. Such simulated data can be freely shared with biostatisticians and data scientists who wish to develop statistical models for brain networks, but prefer not to get involved in the intricate process of brain imaging preprocessing.

Our method can model the spatially heterogeneous impact of motion: edges that share the same node can have correlations under our model since they are functions of both zi and ci. In this work, we simplify ci to encode only a summary of subject-level motion artifacts. One possible future direction is to explore the representations of motion that can improve our generative model.

Our proposed inv-VAE currently models the motion-affected structural connectome using a Poisson generative model in conjunction with a GCN. Future work could incorporate pre-existing neuroscience knowledge regarding the effects of motion-induced artifacts on specific brain regions or connections to refine our model and improve the reconstruction of motion-adjusted structural connectomes. This could be achieved by leveraging empirical evidence, neuroscientific findings, or theoretical frameworks that shed light on the disparate impacts of motion on individual brain regions or connections. Through this integration of knowledge, we aim to boost the precision and accuracy of our motion artifact mitigation process, enabling us to better capture the subtle influences of motion on brain connectivity.

Another future direction is to leverage recent developments in generative modeling frameworks, such as variational diffusion models (D. Kingma et al., 2021), score-based generative models (Song & Ermon, 2020), and hierarchical variational autoencoders (HVAE) (Sønderby et al., 2016). These models can be understood under the general Evidence Lower Bound (ELBO) approach used in this paper, but with different constraints in the latent space (Luo, 2022). Moreover, we use the mean motion estimate across frames, as computed by the FSL eddy tool, to represent motion during diffusion MRI acquisition, ignoring longitudinal and higher order information in the motion. This simplification may not fully represent the impacts of motion on diffusion MRI data, and thus the motion-adjusted connectomes from our inv-VAE may not be completely invariant to motion. Future developments may incorporate more complex summaries of dynamic motion.

The data used in this study contain publicly available datasets from the Human Connectome Project and the Adolescent Brain Cognitive Development Study. The model training and testing code is available at https://github.com/yzhang511/inv-vae.

Conceptualization and Study Design: Yizi Zhang, Meimei Liu, Zhengwu Zhang. Data curation: Zhengwu Zhang. Methodology, Formal Analysis, Visualizations: Yizi Zhang, Meimei Liu. Manuscript Writing (original draft): Yizi Zhang. Manuscript Writing (review & editing): Yizi Zhang, Meimei Liu, Zhengwu Zhang, David Dunson. Supervision: David Dunson. All authors have substantively revised the work, reviewed the manuscript, approved the submitted version, and agreed to be personally accountable for their contributions.

The datasets were anonymized and not collected by the investigators, in which case the work is classified as nonhuman research.

The authors declare no competing interests.

This work is supported by grants from the National Science Foundation (NSF-DMS-2124535).

References

Achille, A., & Soatto, S. (2018). Emergence of invariance and disentanglement in deep representations. Journal of Machine Learning Research, 19(50), 1–34. https://doi.org/10.1109/ita.2018.8503149

Alemi, A. A., Fischer, I., Dillon, J. V., & Murphy, K. (2016). Deep variational information bottleneck. arXiv preprint arXiv:1612.00410. https://doi.org/10.1038/534602a

Aliee, H., Kapl, F., Hediyeh-Zadeh, S., & Theis, F. J. (2023). Conditionally invariant representation learning for disentangling cellular heterogeneity. arXiv preprint arXiv:2307.00558. https://doi.org/10.48550/arXiv.2307.00558

Aliverti, E., Lum, K., Johndrow, J. E., & Dunson, D. B. (2021). Removing the influence of group variables in high-dimensional predictive modelling. Journal of the Royal Statistical Society Series A: Statistics in Society, 184(3), 791–811. https://doi.org/10.1111/rssa.12613

Alter, O., Brown, P. O., & Botstein, D. (2000). Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences of the United States of America, 97(18), 10101–10106. https://doi.org/10.1073/pnas.97.18.10101

Andersson, J. L., Graham, M. S., Zsoldos, E., & Sotiropoulos, S. N. (2016). Incorporating outlier detection and replacement into a non-parametric framework for movement and distortion correction of diffusion MR images. NeuroImage, 141, 556–572. https://doi.org/10.1016/j.neuroimage.2016.06.058

Andersson, J. L., & Sotiropoulos, S. N. (2016). An integrated approach to correction for off-resonance effects and subject movement in diffusion MR imaging. NeuroImage, 125, 1063–1078. https://doi.org/10.1016/j.neuroimage.2015.10.019

Baum, G. L., Roalf, D. R., Cook, P. A., Ciric, R., Rosen, A. F., Xia, C., Elliott, M. A., Ruparel, K., Verma, R., Tunç, B., Gur, R. C., Gur, R. E., Bassett, D. S., & Satterthwaite, T. D. (2018). The impact of in-scanner head motion on structural connectivity derived from diffusion MRI. NeuroImage, 173, 275–286. https://doi.org/10.1016/j.neuroimage.2018.02.041

Ben-Amitay, S., Jones, D. K., & Assaf, Y. (2012). Motion correction and registration of high b-value diffusion weighted images. Magnetic Resonance in Medicine, 67(6), 1694–1702. https://doi.org/10.1002/mrm.23186

Benito, M., Parker, J., Du, Q., Wu, J., Xiang, D., Perou, C. M., & Marron, J. S. (2004). Adjustment of systematic microarray data biases. Bioinformatics, 20(1), 105–114. https://doi.org/10.1093/bioinformatics/btg385

Bhushan, C., Haldar, J. P., Choi, S., Joshi, A. A., Shattuck, D. W., & Leahy, R. M. (2015). Co-registration and distortion correction of diffusion and anatomical images based on inverse contrast normalization. NeuroImage, 115, 269–280. https://doi.org/10.1016/j.neuroimage.2015.03.050

Bjork, J. M., Straub, L. K., Provost, R. G., & Neale, M. C. (2017). The ABCD study of neurodevelopment: Identifying neurocircuit targets for prevention and treatment of adolescent substance abuse. Current Treatment Options in Psychiatry, 4, 196–209. https://doi.org/10.1007/s40501-017-0108-y

Casey, B. J., Cannonier, T., Conley, M. I., Cohen, A. O., Barch, D. M., Heitzeg, M. M., Soules, M. E., Teslovich, T., Dellarco, D. V., Garavan, H., Orr, C. A., Wager, T. D., Banich, M. T., Speer, N. K., Sutherland, M. T., Riedel, M. C., Dick, A. S., Bjork, J. M., Thomas, K. M., … ABCD Imaging Acquisition Workgroup. (2018). The adolescent brain cognitive development (ABCD) study: Imaging acquisition across 21 sites. Developmental Cognitive Neuroscience, 32, 43–54. https://doi.org/10.1016/j.dcn.2018.03.001

Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, 119, 1597–1607. https://proceedings.mlr.press/v119/chen20j.html

Christiaens, D., Cordero-Grande, L., Pietsch, M., Hutter, J., Price, A. N., Hughes, E. J., Vecchiato, K., Deprez, M., Edwards, A. D., Hajnal, J. V., & Tournier, J. D. (2021). Scattered slice shard reconstruction for motion correction in multi-shell diffusion MRI. NeuroImage, 225, 117437. https://doi.org/10.1016/j.neuroimage.2020.117437

Cieslak, M., Cook, P. A., He, X., Yeh, F.-C., Dhollander, T., Adebimpe, A., Aguirre, G. K., Bassett, D. S., Betzel, R. F., Bourque, J., Cabral, L. M., Davatzikos, C., Detre, J. A., Earl, E., Elliott, M. A., Fadnavis, S., Fair, D. A., Foran, W., Fotiadis, P., … Satterthwaite, T. D. (2021). QSIPrep: An integrative platform for preprocessing and reconstructing diffusion MRI data. Nature Methods, 18(7), 775–778. https://doi.org/10.1038/s41592-021-01185-5

Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1), 53–65. https://doi.org/10.1109/msp.2017.2765202

Desikan, R. S., Ségonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., Buckner, R. L., Dale, A. M., Maguire, R. P., Hyman, B. T., Albert, M. S., & Killiany, R. J. (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3), 968–980. https://doi.org/10.1016/j.neuroimage.2006.01.021

Fornito, A., Zalesky, A., & Breakspear, M. (2013). Graph analysis of the human connectome: Promise, progress, and pitfalls. NeuroImage, 80, 426–444. https://doi.org/10.1016/j.neuroimage.2013.04.087

Gershon, R. C., Wagster, M. V., Hendrie, H. C., Fox, N. A., Cook, K. F., & Nowinski, C. J. (2013). NIH toolbox for assessment of neurological and behavioral function. Neurology, 80(11 Supp. 3), S2–S6. https://doi.org/10.1212/wnl.0b013e3182872e5f

Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L., Auerbach, E. J., Behrens, T. E., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The human connectome project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

Greenfeld, D., & Shalit, U. (2020). Robust learning with the Hilbert-Schmidt independence criterion. In Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, 119, 3759–3768. https://proceedings.mlr.press/v119/greenfeld20a.html

Haller, S., Monsch, A. U., Richiardi, J., Barkhof, F., Kressig, R. W., & Radue, E. W. (2014). Head motion parameters in fMRI differ between patients with mild cognitive impairment and Alzheimer disease versus elderly control subjects. Brain Topography, 27, 801–807. https://doi.org/10.1007/s10548-014-0358-6

He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9729–9738). https://openaccess.thecvf.com/content_CVPR_2020/html/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.html

Hoff, P. D., Raftery, A. E., & Handcock, M. S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460), 1090–1098. https://doi.org/10.1198/016214502388618906

Iannopollo, E., Plunkett, R., & Garcia, K. (2019). Surface-based analysis of cortical thickness and volume loss in Alzheimer’s disease. Proceedings of IMPRS, 2(1). https://doi.org/10.18060/23524

Johnson, W. E., Li, C., & Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8(1), 118–127. https://doi.org/10.1093/biostatistics/kxj037

Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., & Krishnan, D. (2020). Supervised contrastive learning. Advances in Neural Information Processing Systems, 33, 18661–18673. https://doi.org/10.48550/arXiv.2004.11362

Kingma, D., Salimans, T., Poole, B., & Ho, J. (2021). Variational diffusion models. Advances in Neural Information Processing Systems, 34, 21696–21707. https://doi.org/10.48550/arXiv.2107.00630

Kingma, D. P. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114. https://doi.org/10.48550/arXiv.1312.6114

Kingma, D. P. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. https://doi.org/10.48550/arXiv.1412.6980

Kochunov, P., Lancaster, J. L., Glahn, D. C., Purdy, D., Laird, A. R., Gao, F., & Fox, P. (2006). Retrospective motion correction protocol for high-resolution anatomical MRI. Human Brain Mapping, 27(12), 957–962. https://doi.org/10.1002/hbm.20235

Le Bihan, D., Poupon, C., Amadon, A., & Lethimonnier, F. (2006). Artifacts and pitfalls in diffusion MRI. Journal of Magnetic Resonance Imaging, 24(3), 478–488. https://doi.org/10.1002/jmri.20683

Liu, M., Zhang, Z., & Dunson, D. B. (2021). Graph auto-encoding brain networks with applications to analyzing large-scale brain imaging datasets. NeuroImage, 245, 118750. https://doi.org/10.1016/j.neuroimage.2021.118750

Liu, Y., Yamada, M., Tsai, Y.-H. H., Le, T., Salakhutdinov, R., & Yang, Y. (2021). LSMI-Sinkhorn: Semi-supervised mutual information estimation with optimal transport. In Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings, Part I (pp. 655–670). https://doi.org/10.1007/978-3-030-86486-6_40

Louizos, C., Swersky, K., Li, Y., Welling, M., & Zemel, R. (2015). The variational fair autoencoder. arXiv preprint arXiv:1511.00830. https://doi.org/10.48550/arXiv.1511.00830

Luo, C. (2022). Understanding diffusion models: A unified perspective. arXiv preprint arXiv:2208.11970. https://doi.org/10.48550/arXiv.2208.11970

Maier-Hein, K. H., Neher, P. F., Houde, J.-C., Côté, M.-A., Garyfallidis, E., Zhong, J., Chamberland, M., Yeh, F.-C., Lin, Y.-C., Ji, Q., Reddick, W. E., Glass, J. O., Chen, D. Q., Feng, Y., Gao, C., Wu, Y., Ma, J., He, R., Li, Q., … Descoteaux, M. (2017). The challenge of mapping the human connectome based on diffusion tractography. Nature Communications, 8(1), 1349. https://doi.org/10.1038/s41467-017-01285-x

Moyer, D., Gao, S., Brekelmans, R., Galstyan, A., & Ver Steeg, G. (2018). Invariant representations without adversarial training. Advances in Neural Information Processing Systems, 31.

Nowicki, K., & Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455), 1077–1087. https://doi.org/10.1198/016214501753208735

Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., & Swami, A. (2016). The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P) (pp. 372–387). https://doi.org/10.1109/EuroSP.2016.36

Sønderby, C. K., Raiko, T., Maaløe, L., Sønderby, S. K., & Winther, O. (2016). Ladder variational autoencoders. In Advances in Neural Information Processing Systems, 29. https://doi.org/10.48550/arXiv.1602.02282

Song, Y., & Ermon, S. (2020). Improved techniques for training score-based generative models. Advances in Neural Information Processing Systems, 33, 12438–12448. https://doi.org/10.48550/arXiv.2006.09011

Van Essen, D. C., Ugurbil, K., Auerbach, E., Barch, D., Behrens, T. E., Bucholz, R., Chang, A., Chen, L., Corbetta, M., Curtiss, S. W., Penna, S. D., Feinberg, D., Glasser, M. F., Harel, N., Heath, A. C., Larson-Prior, L., Marcus, D., Michalareas, G., Moeller, S., … WU-Minn HCP Consortium. (2012). The human connectome project: A data acquisition perspective. NeuroImage, 62(4), 2222–2231. https://doi.org/10.1016/j.neuroimage.2012.02.018

Yendiki, A., Koldewyn, K., Kakunoori, S., Kanwisher, N., & Fischl, B. (2014). Spurious group differences due to head motion in a diffusion MRI study. NeuroImage, 88, 79–90. https://doi.org/10.1016/j.neuroimage.2013.11.027

Yu, S., Alesiani, F., Yu, X., Jenssen, R., & Principe, J. (2021). Measuring dependence with matrix-based entropy functional. Proceedings of the AAAI Conference on Artificial Intelligence, 35(12), 10781–10789. https://doi.org/10.1609/aaai.v35i12.17288

Zemel, R., Wu, Y., Swersky, K., Pitassi, T., & Dwork, C. (2013). Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, 28(3), 325–333. https://proceedings.mlr.press/v28/zemel13.html

Zhang, Z., Allen, G. I., Zhu, H., & Dunson, D. (2019). Tensor network factorizations: Relationships between brain structural connectomes and traits. NeuroImage, 197, 330–343. https://doi.org/10.1016/j.neuroimage.2019.04.027

Zhang, Z., Descoteaux, M., & Dunson, D. B. (2019). Nonparametric Bayes models of fiber curves connecting brain regions. Journal of the American Statistical Association, 114(528), 1505–1517. https://doi.org/10.1080/01621459.2019.1574582

Zhang, Z., Descoteaux, M., Zhang, J., Girard, G., Chamberland, M., Dunson, D., Srivastava, A., & Zhu, H. (2018). Mapping population-based structural connectomes. NeuroImage, 172, 130–145. https://doi.org/10.1016/j.neuroimage.2017.12.064

Appendix

This section provides model details and additional numerical studies. The appendix is organized as follows:

  • Section A: Specification of GCN details used in both inv-VAE and GATE.

  • Section B: Derivation of the joint likelihood of brain networks Ai and cognition traits yi conditional on the nuisance factor ci as defined in Equation (8).

  • Section C: Derivation of the mutual information between latent variables zi and ci as defined in Equation (7).

  • Sections D and E: Implementation details, including Monte Carlo approximation of the training objective and model architecture details of our developed inv-VAE.

  • Section F: Investigation of correlation between motion score and outlier measure to assess the relationship between head motion, residual misalignment, and outlier data.

  • Section G: Analysis of the effect of residual misalignment adjustment on brain connectome reconstruction and trait prediction.

  • Section H: Expanded simulation study to evaluate our method’s robustness in handling extreme motion-induced artifacts.

  • Section I: Methods for extracting residual misregistration representations.

Appendix A. GCN model specification

To exploit the local collaborative patterns among brain regions, M. Liu et al. (2021) propose to learn each region's representation by propagating node-specific k-nearest neighbor information. We use the relative distance between a pair of brain regions to represent their locality. The relative distance is measured through the length of the white matter fiber tract connecting the pair and stored in a matrix $D \in \mathbb{R}^{V \times V}$, where $D_{uv}$ denotes the fiber length between regions $u$ and $v$. For region $u$, its $k$-nearest neighbors (kNNs) are defined according to $D$. Graph convolution is used to learn the $r$-th latent coordinate $X_r(\tilde{z}_i)$ for subject $i$. In particular, an $M$-layer GCN is defined recursively as $X_r^{(m)}(\tilde{z}_i) = h_m\big(W_r^{(m)} X_r^{(m-1)}(\tilde{z}_i) + b_m\big)$ for $2 \le m \le M$, with $X_r^{(1)}(\tilde{z}_i) = h_1\big(W_r^{(1)} \tilde{z}_i + b_1\big)$. Here $X_r^{(m)}(\tilde{z}_i)$ denotes the output of the $m$-th GCN layer, $h_m(\cdot)$ is the activation function of the $m$-th layer, and $W_r^{(m)}$ is a weight matrix characterizing the convolution operation at the $m$-th layer. The embedding feature of each region at the $m$-th layer is determined by the weighted sum of itself and its nearest neighbor regions at the $(m-1)$-th layer. When $m = 1$, $W_r^{(1)} \in \mathbb{R}^{V \times (K+C)}$ maps $\tilde{z}_i \in \mathbb{R}^{K+C}$ to the latent space $X_r^{(1)}(\tilde{z}_i) \in \mathbb{R}^{V}$. When $m \ge 2$, $W_r^{(m)}$ is a $V \times V$ matrix whose $u$-th row $w_u^{(r,m)}$ satisfies $w_{uv}^{(r,m)} > 0$ if $v = u$ or $v \in k_r\mathrm{NN}(u)$, and $w_{uv}^{(r,m)} = 0$ otherwise, where $k_r\mathrm{NN}(u)$ denotes the $k$ nearest neighbors of region $u$ in the $r$-th dimension of the latent space.
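To make the masked convolution concrete, the following is a minimal numpy sketch of one kNN-masked GCN layer. It is an illustration under our reading of the layer definition above, not the authors' implementation; the helper names (`knn_mask`, `gcn_layer`), the toy dimensions, and the sigmoid activation (chosen to match Table A1) are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def knn_mask(D, k):
    """Boolean V x V mask keeping each region itself plus its k nearest
    neighbors, where nearness is the fiber-length distance in D."""
    V = D.shape[0]
    mask = np.eye(V, dtype=bool)
    for u in range(V):
        order = np.argsort(D[u])                     # closest regions first
        neighbors = [v for v in order if v != u][:k]
        mask[u, neighbors] = True
    return mask

def gcn_layer(X, W, b, mask=None, activation=sigmoid):
    """One GCN layer X^(m) = h(W X^(m-1) + b); for m >= 2 the rows of W are
    zeroed outside each region's kNN neighborhood via the mask."""
    if mask is not None:
        W = np.where(mask, W, 0.0)
    return activation(W @ X + b)

# Toy usage: V regions, latent input of size K + C.
V, K, C, k = 68, 68, 1, 32
rng = np.random.default_rng(0)
D = rng.uniform(1.0, 100.0, size=(V, V)); D = (D + D.T) / 2.0
z_tilde = rng.standard_normal(K + C)
X1 = gcn_layer(z_tilde, rng.standard_normal((V, K + C)),
               rng.standard_normal(V))                                # m = 1
X2 = gcn_layer(X1, rng.standard_normal((V, V)),
               rng.standard_normal(V), knn_mask(D, k))                # m = 2
```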

Appendix B. Derivation of the joint likelihood of $(A_i, y_i)$ conditional on $c_i$ in Equation (8)

The joint likelihood of $A_i$ and $y_i$ given $c_i$ is

$$\log p_\theta(A_i, y_i \mid c_i) = \log \int p_\theta(A_i \mid z_i, c_i)\, p_\theta(y_i \mid z_i)\, p_\theta(z_i \mid c_i)\, dz_i.$$

By Jensen's inequality,

$$\log p_\theta(A_i, y_i \mid c_i) \ge \mathbb{E}_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(A_i \mid z_i, c_i) + \log p_\theta(y_i \mid z_i)\big] - \mathrm{KL}\big(q_\phi(z_i \mid A_i) \,\|\, p_\theta(z_i \mid c_i)\big).$$

Assuming that $z_i$ is independent of $c_i$, we have $p_\theta(z_i \mid c_i) = p(z_i)$. We then have

$$\log p_\theta(A_i, y_i \mid c_i) \ge \mathbb{E}_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(A_i \mid z_i, c_i) + \log p_\theta(y_i \mid z_i)\big] - \mathrm{KL}\big(q_\phi(z_i \mid A_i) \,\|\, p(z_i)\big).$$

Appendix C. Derivation of the mutual information $I(z_i, c_i)$ defined in Equation (7)

From mutual information properties, we have that

$$I(z_i, c_i) = I(z_i, c_i \mid A_i) + I(z_i, A_i) - I(z_i, A_i \mid c_i),$$

where $I(z_i, c_i \mid A_i) = H(z_i \mid A_i) - H(z_i \mid A_i, c_i) = H(z_i \mid A_i) - H(z_i \mid A_i) = 0$, because the distribution of $z_i$ depends solely on $A_i$ and not on $c_i$. Using mutual information properties, we write $I(z_i, c_i)$ as

$$I(z_i, c_i) = I(z_i, A_i) - I(z_i, A_i \mid c_i) = I(z_i, A_i) - H(A_i \mid c_i) + H(A_i \mid z_i, c_i).$$

Using the variational inequality, we have

$$I(z_i, c_i) \le \mathbb{E}_{A_i}\big[\mathrm{KL}\big(q_\phi(z_i \mid A_i) \,\|\, q_\phi(z_i)\big)\big] - \mathbb{E}\big[\log p_\theta(A_i \mid z_i, c_i)\big] - H(A_i \mid c_i),$$

where $H(A_i \mid c_i)$ is a constant that can be ignored.

Appendix D. Monte Carlo approximation of the inv-VAE objective

We use Monte Carlo variational inference (D. P. Kingma, 2013) to approximate the intractable expectation in our invariant objective. Using the reparametrization trick, for each $\ell = 1, \ldots, L$, we sample $z_i^{(\ell)} \sim q_\phi(z_i \mid A_i) = N(\mu_\phi(A_i), \Sigma_\phi(A_i))$, where $\mu_\phi(A_i) \in \mathbb{R}^{K}$ and $\Sigma_\phi(A_i) \in \mathbb{R}^{K \times K}$. With $z_i^{(\ell)}$, we decompose the objective in (10) into three parts. (1) The reconstruction error,

$$\mathcal{L}_{\mathrm{recon}} = \mathbb{E}_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(A_i \mid z_i, c_i) + \log p_\theta(y_i \mid z_i)\big],$$

can be approximated by $\tilde{\mathcal{L}}_{\mathrm{recon}} = \frac{1}{L} \sum_{\ell=1}^{L} \big(\log p_\theta(A_i \mid z_i^{(\ell)}, c_i) + \log p_\theta(y_i \mid z_i^{(\ell)})\big)$. (2) Because both $q_\phi(z_i \mid A_i)$ and $p_\theta(z_i)$ are normally distributed, the KL divergence between them,

$$\mathcal{L}_{\mathrm{prior}} = \mathrm{KL}\big(q_\phi(z_i \mid A_i) \,\|\, p_\theta(z_i)\big),$$

can be approximated by $\tilde{\mathcal{L}}_{\mathrm{prior}} = \frac{1}{2} \sum_{k=1}^{K} \big(\mu_k^2 + \sigma_k^2 - 1 - \log \sigma_k^2\big)$, where $\mu_k, \sigma_k$ are the $k$-th elements of $\mu_\phi(A_i)$ and $\Sigma_\phi(A_i)$. (3) $q_\phi(z_i)$ is the marginal distribution of $q_\phi(z_i \mid A_i)$, and both are Gaussians. We can approximate

$$\mathcal{L}_{\mathrm{marg}} = \mathbb{E}\big[\mathrm{KL}\big(q_\phi(z_i \mid A_i) \,\|\, q_\phi(z_i)\big)\big]$$

with the following pairwise Gaussian KL divergence (Louizos et al., 2015):

$$\mathrm{KL}\big(N(\mu_1, \Sigma_1) \,\|\, N(\mu_0, \Sigma_0)\big) = \frac{1}{2}\Big[\log \frac{|\Sigma_0|}{|\Sigma_1|} - K + \mathrm{tr}\big(\Sigma_0^{-1} \Sigma_1\big) + (\mu_0 - \mu_1)^\top \Sigma_0^{-1} (\mu_0 - \mu_1)\Big],$$

where $\mu_0$ and $\Sigma_0$ parametrize $q_\phi(z_i)$, and $\mu_1$ and $\Sigma_1$ parametrize $q_\phi(z_i \mid A_i)$. The approximated invariant objective, $\tilde{\mathcal{L}}(A_i, y_i, c_i; \theta, \phi) = (1 + \lambda)\, \tilde{\mathcal{L}}_{\mathrm{recon}} - \lambda\, \tilde{\mathcal{L}}_{\mathrm{marg}} - \tilde{\mathcal{L}}_{\mathrm{prior}}$, is now differentiable with respect to $\theta$ and $\phi$.
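As a concrete illustration, the three terms and their combination can be sketched in numpy as below. This is a minimal sketch under the diagonal-covariance case; the function names and the caller-supplied log-likelihoods `log_lik_A` and `log_lik_y` are our own placeholders, not the released implementation.

```python
import numpy as np

def kl_to_std_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ): the L_prior term."""
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - np.log(sigma**2))

def kl_diag_gaussians(mu1, sigma1, mu0, sigma0):
    """KL( N(mu1, diag(sigma1^2)) || N(mu0, diag(sigma0^2)) ): the pairwise
    Gaussian KL used to approximate the marginal term L_marg."""
    return 0.5 * np.sum(np.log(sigma0**2 / sigma1**2)
                        + (sigma1**2 + (mu1 - mu0)**2) / sigma0**2 - 1.0)

def inv_vae_objective(log_lik_A, log_lik_y, mu, sigma, mu_marg, sigma_marg,
                      lam=1.0, L=1, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of (1 + lam) * L_recon - lam * L_marg - L_prior.
    log_lik_A(z) and log_lik_y(z) are caller-supplied log-likelihoods."""
    recon = 0.0
    for _ in range(L):
        z = mu + sigma * rng.standard_normal(mu.shape)  # reparametrization
        recon += log_lik_A(z) + log_lik_y(z)
    recon /= L
    marg = kl_diag_gaussians(mu, sigma, mu_marg, sigma_marg)
    prior = kl_to_std_normal(mu, sigma)
    return (1.0 + lam) * recon - lam * marg - prior
```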

Appendix E. Architecture of inv-VAE model

The architecture of our inv-VAE is presented in Table A1. Cross-validation is used to determine the best model for both the ABCD and HCP datasets. The latent dimension size and network depth are chosen to achieve the minimal training loss. We use the Adam optimizer (D. P. Kingma, 2014) to train both inv-VAE and GATE on a GPU device. During model training, the parameters in Table A1 are set to be the same for both inv-VAE and GATE for a fair comparison. The learning rate is set to $5 \times 10^{-6}$ in the simulation study and $2 \times 10^{-5}$ in applications to the two datasets. We use a minibatch size of 32; batches are sampled uniformly at random at each epoch, and training runs for 200 epochs in both the simulation study and the real applications.

Table A1

Model architecture of inv-VAE.

Inference model (encoder) | Generative model (decoder)
W1: 68×256                | k-NN: 32
W2: 256×68                | K = 68
b1: 256×1                 | M = 2
b2: 68×1                  | R = 5
φ1: ReLU                  | h1: Sigmoid
φ2: Linear                | h2: Sigmoid

K is the latent dimension of $z_i$; M is the number of GCN layers; R is the latent dimension of $X(z_i)$; W., b., φ. are the weights, biases, and activation functions in the encoder; h. are the activation functions in the decoder.

When the experiment's objective is to adjust for motion-induced artifacts, as demonstrated in Section 5.1, 90% of subjects are included in the training set and 10% in the validation set used to select optimal model parameters. When the goal is to predict cognition-related traits, as outlined in Section 5.2, we use fivefold cross-validation (CV): 80% of the data serve as training and validation data (of which 90% train the model and 10% serve as the validation set for choosing model parameters) and 20% as the test set for evaluating model predictions. A sketch of this nested splitting scheme is given below.
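A minimal sketch of the splitting with scikit-learn; `n_subjects` and the fit/tune/evaluate steps are placeholders to be filled in with the actual data and model.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

n_subjects = 1000                 # placeholder; use the actual subject count
subjects = np.arange(n_subjects)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_val_idx, test_idx in kf.split(subjects):
    # 20% of subjects form the held-out test fold; the remaining 80% are
    # split 90/10 into training and validation sets.
    train_idx, val_idx = train_test_split(train_val_idx, test_size=0.1,
                                          random_state=0)
    # ... fit inv-VAE on train_idx, tune on val_idx, evaluate on test_idx
```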

Appendix F. Correlation between motion score and outlier measure

To clarify the specific aspects of image quality captured by our proposed motion measure, we conduct a comparison between the head motion estimate from the eddy tool, the residual misalignment (translation and rotation), and a summary of the outlier measure. In Figure A1, we examine the correlation between the proposed artifact measures and the number of outlier slices in the HCP dataset. The analysis reveals no significant relationship between outliers and either motion or residual misalignment, suggesting that the proposed measures capture other distinct artifacts present in the data.

Fig. A1. Pearson’s correlation between head motion, residual misalignment (translation and rotation), and outlier measure in the HCP dataset. Head motion and residual translation are measured in millimeters (mm), while residual rotation is measured in angles.

Appendix G. Inv-VAE with residual misalignment adjustment

In this section, we apply our proposed inv-VAE to residual misalignment adjustment. Residual misalignment refers to the remaining distortions and imperfections in image registration following FSL eddy correction. Our analyses of the ABCD and HCP datasets indicate that, although residual misalignment exhibits a low correlation with head motion obtained from FSL eddy (Fig. A3), it can still impact the constructed brain networks. Therefore, addressing this artifact is crucial as it enables us to correct for sources of artifacts that are not correlated with head motion.

To measure the residual misalignment, we rely on b=0 images (images acquired without diffusion weighting). In the ABCD data, each individual has 7 b=0 frames evenly distributed throughout the diffusion scan, while each HCP subject has 18 frames with b=0. We use the FSL FLIRT tool to align other b=0 images to the first b=0 image with 6 degrees of freedom (DOF = 6, rigid body). Our analysis employs FSL version 5.0.9 for both the eddy and FLIRT tools. To quantify residual misalignment, we use the total amplitudes of rotation and translation. For each individual, we calculate the average translation and rotation displacements across all frames, providing a measure of residual misalignment. The detailed process for extracting these representations of residual translation and rotation is described in Appendix I.
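For illustration, the per-frame rigid registration and the displacement summaries could be scripted as below. This is a hedged sketch: the file names are hypothetical, only standard FLIRT options (-in, -ref, -dof, -omat) are used, and the rotation angle is recovered from the affine's 3×3 block via the trace identity used in Appendix I.

```python
import subprocess
import numpy as np

def register_b0(moving, reference, out_mat):
    """Rigid-body (6 DOF) alignment of one b=0 frame to the reference b=0
    frame with FSL FLIRT; the 4x4 affine is written to out_mat."""
    subprocess.run(["flirt", "-in", moving, "-ref", reference,
                    "-dof", "6", "-omat", out_mat], check=True)

def displacement_from_affine(mat_file):
    """Translation amplitude (mm, assuming mm scaling) and rotation angle
    (radians) of a rigid-body FLIRT transform."""
    T = np.loadtxt(mat_file)                  # 4x4 affine matrix
    translation = np.linalg.norm(T[:3, 3])    # L2 norm of the shift
    R = T[:3, :3]
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    return translation, angle

# Per-subject summary: average displacement across all b=0 frames
# (hypothetical file names; e.g., 7 frames per ABCD subject).
frames = [f"b0_{t}.nii.gz" for t in range(1, 7)]
results = []
for t, frame in enumerate(frames, start=1):
    register_b0(frame, "b0_0.nii.gz", f"xform_{t}.mat")
    results.append(displacement_from_affine(f"xform_{t}.mat"))
mean_translation, mean_rotation = np.mean(results, axis=0)
```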

The residual translation and rotation in the ABCD and HCP datasets demonstrate a relatively low correlation with head motion (see Fig. A3), motivating us to further examine the relationship between residual misalignment and the brain structural network. The results are displayed in Figure A2(A)–(C). In column (A) of Figure A2, we display the histograms of residual translation and rotation for ABCD and HCP subjects, with the blue and orange lines marking the lower and upper percentile cutoffs (5th/95th for ABCD and 10th/90th for HCP; see the Fig. A2 caption). Subjects with residual translation (rotation) below the blue line are classified into the small translation (rotation) group, while those above the orange line belong to the large translation (rotation) group. In column (B), we show the mean network differences between the large and small translation (rotation) groups (large − small). Column (C) shows circular plots representing the data in column (B), limited to the 15 edges showing the largest fiber count differences.

Fig. A2. (A) Distributions of residual translation in millimeters (mm) and residual rotation in angles for the ABCD and HCP datasets. In ABCD data, the 5th and 95th percentiles are marked by the blue and orange lines. In HCP data, the blue and orange lines indicate the 10th and 90th percentiles. Subjects whose residual translation (rotation) falls below the blue line are assigned to the small translation (rotation) group, and those falling above the orange line are in the large translation (rotation) group. (B) Differences of mean networks between the large and small (large − small) translation (rotation) groups. (C) Plotting column (B) in circular plots. The color scale is the same as that in (B), but we only show the 15 edges with the largest fiber count differences for display.
Fig. A3. Pearson’s correlation between head motion and the residual translation and rotation in each dataset. Head motion is quantified in millimeters (mm) by computing the average displacement of each intracerebral voxel, then calculating the square root of the average squared displacements. Residual translation is measured in millimeters, while residual rotation is measured in angles.

We further conduct an analysis parallel to Section 5.1 “Understanding and Removing Motion Artifacts” to demonstrate the impact of residual misalignment and mitigate its effects on both the ABCD and HCP datasets. First, we average the residual-affected brain network $A_i$ across all individuals in each study, obtaining the residual-affected edge-specific means for both datasets. The first columns of Figures A4 and A5(A) illustrate these residual-affected edge-specific means in both datasets. The first columns of Figures A4 and A5(B, C) display the differences in residual-affected edges in ABCD and HCP data, determined by subtracting the means of the small translation (rotation) group from those of the large translation (rotation) group. Additionally, we visualize the first two principal components (PCs) of brain networks in the first columns of Figures A4 and A5(D, E), with each point representing an individual colored according to their corresponding residual translation (rotation) displacement.

Fig. A4. (A) From left to right: observed edge-specific means of ABCD brain networks, reconstructed brain networks by GATE, and residual-adjusted brain networks by inv-VAE, color coded by fiber count. (B) and (C): edge-specific differences (large group subtract small group) for residual translation and rotation, respectively, colored by differences in fiber count. Ordering follows that in (A). (D) and (E): from left to right—the first two PC scores of ABCD brain networks, latent embeddings learned by GATE, and invariant embeddings from inv-VAE for observations in the large and small translation (rotation) groups, colored by the amount of residual translation (D) and rotation (E), respectively. Residual translation is in millimeters (mm) and residual rotation is in angles.
Fig. A5. (A) From left to right: observed edge-specific means of HCP brain networks, reconstructed brain networks by GATE, and residual-adjusted brain networks by inv-VAE, color coded by fiber count. (B) and (C): edge-specific differences (large group subtract small group) for residual translation and rotation, respectively, colored by differences in fiber count. Ordering follows that in (A). (D) and (E): from left to right—the first two PC scores of HCP brain networks, latent embeddings learned by GATE, and invariant embeddings from inv-VAE for observations in the large and small translation (rotation) groups, colored by the amount of residual translation (D) and rotation (E), respectively. Residual translation is in millimeters (mm) and residual rotation is in angles.

We then apply GATE (M. Liu et al., 2021) to both the ABCD and HCP datasets. The second columns of Figures A4 and A5(A) illustrate the reconstructed edge-specific means averaged across all reconstructed networks $\hat{A}_i$ for both datasets. The second columns of Figures A4 and A5(B, C) show reconstructed edge-specific differences between the large and small translation (rotation) groups. The first two PCs of the learned GATE latents from the large and small translation (rotation) groups are displayed in the second columns of Figures A4 and A5(D, E) for residual translation (rotation).

To address artifacts due to residual misalignment, we train inv-VAE separately on the ABCD and HCP data. We generate residual-adjusted brain networks by setting $c_i = (c_{i1}, c_{i2})$ to 0, where $c_{i1}$ and $c_{i2}$ represent residual translational and rotational displacements, to eliminate residual artifacts. Averaging across all residual-adjusted networks for each dataset yields the residual-adjusted edge-specific means shown in the third columns of Figures A4 and A5(A). The third columns of Figures A4 and A5(B, C) illustrate the residual-adjusted edge-specific differences obtained by subtracting the residual-adjusted edge-specific means of the small translation (rotation) group from those of the large group. The third columns of Figures A4 and A5(D, E) depict the first two PCs of the invariant representations $z_i$ for both datasets. The gap between the large and small translation (rotation) groups is reduced after residual adjustment compared with the projections of the observed brain networks. Figures A4 and A5 display the reconstructed fiber count differences between subject groups with low and high translation and rotation residual artifacts when $c_i = 0$. To demonstrate how removing residual artifacts (by setting $c_i = 0$) impacts the reconstructed brain connectomes across different population groups, Figure A6 presents comparisons between residual-adjusted networks and original networks for each subject group, categorized by their varying levels of residual artifacts.

Fig. A6. When setting ci=0 and applying the generative model, we observe small differences in the connectomes for the population with low residual artifacts, whereas large differences are observed in the connectomes of the populations with high residual artifacts. Edge-specific differences between the residual-adjusted networks and the original ABCD (A) and HCP (B) networks are averaged across all subjects in the small and large residual translation (rotation) groups. Colors represent the change in fiber counts after residual misalignment correction. The histograms depicting the scale of correction in fiber counts in the ABCD (C) and HCP (D) data are also presented. To improve clarity in visualization, only brain connections with a change in fiber counts exceeding 100 are displayed.

We further investigate how residual artifacts influence our understanding of structural brain connectivity. For both ABCD and HCP datasets, we first compute the residual-affected edge-specific means by averaging Ai’s across all subjects in the dataset. After training inv-VAE using all Ai’s, we generate residual-adjusted networks Ai*’s from the invariant embeddings. We then average across all Ai*’s to obtain the residual-adjusted edge-specific means. Figure A7(A,B) illustrates the differences between the residual-adjusted and residual-affected edge-specific means.

Fig. A7. Structural connectivity differences between residual-adjusted and residual-affected ABCD (A) and HCP (B) brain networks. Each line connecting a pair of brain regions is colored by the corresponding fiber count difference after adjusting for residual misalignment. We only show the 15 most changed brain connections.

Parallel to Section 5.2, we explore whether residual-invariant representations can improve our understanding of the relationship between brain connectomes and cognition-related traits. To prevent mistakenly attributing connectome differences to residual misalignment rather than underlying traits, we predict the residuals of trait yi after regressing out the residual artifacts, ci=(ci1,ci2), where ci1 and ci2 are the mean translational and rotational displacements across all frames. Figure A8 compares inv-VAE’s trait-prediction quality with other baselines described in Section 5.2. The results demonstrate that inv-VAE outperforms LR-PCA, ComBat, and SOG across all studied traits in both datasets, while either surpassing or matching GATE’s performance for most selected traits.
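A minimal sketch of this residualization step, assuming the artifact covariates are stored as a two-column array (translation, rotation); `residualize_trait` is our own illustrative helper, not code from the released repository.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def residualize_trait(y, c):
    """Return the residuals of trait y after regressing out the artifact
    covariates c (columns: residual translation, residual rotation)."""
    model = LinearRegression().fit(c, y)
    return y - model.predict(c)

# Toy usage with hypothetical arrays: y has shape (n,), c has shape (n, 2).
rng = np.random.default_rng(0)
c = rng.normal(size=(500, 2))
y = 0.3 * c[:, 0] - 0.1 * c[:, 1] + rng.normal(size=500)
y_resid = residualize_trait(y, c)   # target used for trait prediction
```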

Fig. A8. A comparison of all methods on trait prediction in the ABCD (A) and HCP (B) data. The error bars show the minimum, maximum, mean, and standard deviation of correlation coefficients from a fivefold CV. We preprocess the target trait yi by regressing out residual misalignment. For each fold, we train inv-VAE on the training data. For the test data, we obtain trait predictions after setting ci to 0 to remove residual misalignment.

We also assess predictive performance by visualizing associations between latent representations of structural connectomes and cognitive traits. For both ABCD and HCP studies, we select specific traits and use inv-VAE and GATE to produce latent features zi for each subject. We then choose 200 subjects (100 with the lowest scores, 100 with the highest) for each trait and visualize the first three PCs of the posterior means of their zi, colored by trait scores (Fig. A9). These plots show separation between high- and low-trait groups, indicating structural connectome differences related to cognitive abilities. Notably, inv-VAE’s residual-invariant zi’s show clearer separation than GATE’s residual-affected zi’s, especially for the oral reading recognition test. This suggests residual-invariant structural connectomes have stronger associations with cognitive traits compared with approaches without residual misalignment adjustment.

Fig. A9. Relationships between learned embeddings and cognition-related traits in ABCD (A) and HCP (B) datasets. We preprocess the target trait yi by regressing out residual misalignment. Both inv-VAE and GATE are trained with brain networks of all individuals in each dataset to obtain the latent zi’s for the yi’s. For each trait, latent features from 200 subjects are selected, with the first group of 100 having the lowest trait scores and the second group of 100 having the highest scores. The first three PCs of the posterior means of zi colored with their corresponding trait scores are displayed for each trait yi. Latent embeddings produced by GATE and residual-invariant embeddings from inv-VAE are compared.

Appendix H. Extreme motion and breakdown point

Given that HCP and ABCD data undergo multiple curation steps to exclude instances of very high motion, the threshold at which our proposed method might encounter limitations and fail to correct for extreme subject motion remains unclear. To address this, we conduct tests on our motion-invariant method using simulated data with varying levels of nuisance variables, evaluating the method’s robustness.

We expand the simulation study outlined in Section 4 with varying motion levels. In particular, we simulate networks $\{A_i\}_{i=1}^{n}$, where $A_i \in \mathbb{R}^{V \times V}$ with $V = 68$ nodes and $n = 200$. We then introduce a nuisance variable, $\{s_i\}_{i=1}^{n}$, by sampling its elements from $N(\mu_j, \mu_j)$ with $\mu_j \in \{0.5, 0.1, 0.01, 0.001, 0.0001\}$. To keep the notation consistent, we use $c_i$ to denote the simulated motion and set $c_i = 1 - s_i$. Motion-affected networks $\tilde{A}_i$ are generated by propagating $c_i$ across all edges in $A_i$ using the expression

$$\tilde{A}_i = (1 - c_i)\, A_i. \tag{14}$$

Intuitively, large $c_i$'s introduce large artifacts to the simulated networks, as the multiplication of $1 - c_i$ with $A_i$ in Equation (14) results in a $\tilde{A}_i$ with greatly reduced edge values; small $c_i$'s generate smaller artifacts, as the multiplication in Equation (14) changes the edges of $A_i$ to a lesser extent. Figure A10(A) displays the first two principal components of the simulated brain networks colored by motion group (motion free in blue, motion affected in orange), illustrating that as the simulated motion $c_i$ increases, the distance between the two groups of networks becomes larger, making it more challenging for our proposed method to mitigate distortions from such artifacts.
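A minimal numpy sketch of this generative scheme. It is our own illustration: the toy Poisson count networks and the reading of $N(\mu_j, \mu_j)$ as mean $\mu_j$ and variance $\mu_j$ are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, n = 68, 200

def simulate_motion_affected(A, mu):
    """Draw s_i ~ N(mu, mu) (mean mu, variance mu assumed), set c_i = 1 - s_i,
    and scale every edge of A by (1 - c_i) = s_i, as in Equation (14)."""
    s = rng.normal(loc=mu, scale=np.sqrt(mu))
    c = 1.0 - s
    return (1.0 - c) * A, c

# One motion level per group of simulated networks.
for mu in [0.5, 0.1, 0.01, 0.001, 0.0001]:
    A = rng.poisson(lam=50.0, size=(V, V))   # toy symmetric count network
    A = np.triu(A, 1); A = A + A.T
    A_tilde, c = simulate_motion_affected(A, mu)
```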

Fig. A10. Analysis of the robustness of the proposed method under extreme motion and motion invariance test. (A) The first two PCs of the nuisance-free (blue) and the nuisance-affected (orange) networks under various levels of nuisances. (B) The first two PCs of the learned invariant latents for the nuisance-free (blue) and the nuisance-affected (orange) networks.

We then train inv-VAE on the simulated motion-free and motion-affected networks to learn the parameters $\phi$ and $\theta$ and obtain the motion-invariant latent representations (see Fig. A10(B)). As defined earlier, $c_i \approx 0$ corresponds to small motion artifacts, whereas large $c_i$'s represent large motion artifacts. If our method cannot accurately recover the motion-free $z_i$, the PCs of the invariant $z_i$'s from the motion-free and motion-affected groups will be separated. In Figure A10(B), we visually display the invariant $z_i$'s of the two groups. Notably, when $c_i$ is small, our proposed method performs well in recovering the motion-free latents. However, when $c_i$ is large, our proposed method starts to break down, indicated by the clear separation between the two groups of latent representations.

We have also conducted a simple motion-invariance test using both ABCD and HCP data. In panels (D–E) of Figures 4 and 5 in the manuscript, we present the first two principal component (PC) scores of observed brain networks, latent embeddings acquired through GATE, and invariant embeddings obtained from inv-VAE for brain networks in both the large- and small-motion groups. Notably, both the PCs of the observed brain networks and the GATE latents are correlated with the subject motion $c_i$. In contrast, the invariant $z_i$'s appear independent of $c_i$.

Additionally, we conduct a simulation study to see whether the same subject, imaged with varying amounts of motion, would have the same $z_i$ vector. For a given $A_i$, we can simulate various motion-affected networks $\tilde{A}_i$ with different motion parameters $c_i$. To check whether identical $z_i$ can be obtained from these $\tilde{A}_i$'s, we generate $\tilde{A}_i$ with a distinct $c_i = 1 - s_i$ and examine whether the embeddings match those obtained when $c_i = 0$. To test whether identical networks yield the same $z_i$ under different motion levels, we use the symmetric Kullback-Leibler (KL) divergence to quantify the distance between the learned $z_i$'s. This choice is motivated by the fact that each $z_i$ follows a multivariate normal distribution characterized by means and variances learned by our inference model:

$$\mathrm{KL}_{\mathrm{sym}}(P, Q) = \tfrac{1}{2}\big[\mathrm{KL}(P \,\|\, Q) + \mathrm{KL}(Q \,\|\, P)\big],$$

where $P$ and $Q$ are the distributions of the motion-free $z_i$ and motion-invariant $z_i$, respectively. If our method (inv-VAE) cannot accurately recover the same $z_i$, the network will have a large KL value.
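For diagonal Gaussian posteriors, the symmetric KL can be computed as below; this is our own sketch, not code from the released repository.

```python
import numpy as np

def kl_diag(mu1, s1, mu0, s0):
    """KL( N(mu1, diag(s1^2)) || N(mu0, diag(s0^2)) ) for diagonal Gaussians."""
    return 0.5 * np.sum(np.log(s0**2 / s1**2)
                        + (s1**2 + (mu1 - mu0)**2) / s0**2 - 1.0)

def symmetric_kl(mu_p, s_p, mu_q, s_q):
    """0.5 * [ KL(P || Q) + KL(Q || P) ] between the motion-free and
    motion-invariant posteriors P and Q."""
    return 0.5 * (kl_diag(mu_p, s_p, mu_q, s_q)
                  + kl_diag(mu_q, s_q, mu_p, s_p))
```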

In our simulation, we set $c_i = 0$ to simulate motion-free networks, and simulate motion-affected networks with five levels of motion $c_i = 1 - s_i \in \{0.5, 0.9, 0.99, 0.999, 0.9999\}$. Each motion group comprises 200 networks. We construct a distance matrix to measure the KL divergence between each pair of $z_i$'s within and between the corresponding motion groups. In Figure A11, each entry of the distance matrix represents the KL divergence between a pair of networks, either within the same or different motion groups. For visualization purposes, we normalize the KL divergence across network pairs within each submatrix in Figure A11. Examining the first row (column) of the distance matrix allows us to assess the motion invariance of $z_i$'s across various motion conditions, ranging from no motion ($c_i = 0$) to increased levels of motion ($c_i \in \{0.5, 0.9, 0.99, 0.999, 0.9999\}$). Remarkably, when motion is not large, for example, $c_i \in \{0.5, 0.9\}$, our proposed method successfully recovers the same $z_i$ vector, evident from the distinct dark diagonal line in the submatrices in the first column (row) of Figure A11 (indicating low KL divergence, approximately 0). However, as motion levels ($c_i$) increase, our proposed method exhibits limitations, illustrated by the increased KL values between the motion-free $z_i$ and the learned motion-invariant $z_i$.

Fig. A11. We construct a distance matrix to measure the KL divergence between each pair of $z_i$'s within and between the corresponding motion groups. Here, $c_i = 0$ denotes the simulated motion-free networks. We simulate motion-affected networks with five levels of motion $c_i = 1 - s_i \in \{0.5, 0.9, 0.99, 0.999, 0.9999\}$. Each motion group comprises 200 networks. Each matrix entry represents the KL divergence between a pair of networks, either within the same or different motion groups. The colors in the matrix reflect the normalized KL divergence.

Appendix I. Representation of residual misregistration

For residual translation, represented as a 3D vector, we compute its amplitude using the L2 norm. For residual rotation, we utilize the method introduced in Zhang, Descoteaux, et al. (2019). More specifically, let $I_3$ denote the identity element of the rotation group $SO(3)$. The tangent space at $I_3$, $T_{I_3}(SO(3))$, forms a Lie algebra, usually denoted $\mathfrak{so}(3)$. The exponential map, $\exp: \mathfrak{so}(3) \to SO(3)$, provides a mapping from the tangent space $T_{I_3}(SO(3))$ to $SO(3)$; its inverse is called the log map. $\mathfrak{so}(3)$ is the set of $3 \times 3$ skew-symmetric matrices. We use the following notation to denote any matrix $P_v \in \mathfrak{so}(3)$:

$$P_v = \begin{pmatrix} 0 & -v_3 & v_2 \\ v_3 & 0 & -v_1 \\ -v_2 & v_1 & 0 \end{pmatrix},$$

where $v = [v_1, v_2, v_3]^\top \in \mathbb{R}^3$. The exponential map for $\mathfrak{so}(3)$ is given by Rodrigues' formula,

$$\exp(P_v) = I_3 + \frac{\sin \alpha}{\alpha} P_v + \frac{1 - \cos \alpha}{\alpha^2} P_v^2,$$

where $\alpha = \sqrt{\tfrac{1}{2}\,\mathrm{tr}(P_v^\top P_v)} = \|v\| \in [0, \pi)$. The log map for a matrix $X \in SO(3)$ is a matrix in $\mathfrak{so}(3)$, given by

$$\log(X) = \frac{\alpha}{2 \sin \alpha}\,\big(X - X^\top\big),$$

where $\alpha$ satisfies $\mathrm{tr}(X) = 2 \cos(\alpha) + 1$. Define a mapping $\Phi$ to embed an element of $\mathfrak{so}(3)$ into $\mathbb{R}^3$, $\Phi: \mathfrak{so}(3) \to \mathbb{R}^3$, $\Phi(P_v) = v$. The vector $v$ is used in our paper to denote the residual rotation.

This is essentially an axis-angle representation of a rotation, which converts a rotation matrix into a compact vector form that is easier to work with in many applications. To link this mapping to Euler angles, one would need additional transformations, because Euler angles are a different way of representing 3D rotations: they specify three sequential rotations around the axes of a coordinate system, typically referred to as roll, pitch, and yaw.
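A numpy sketch of the exponential and log maps above, using the standard hat (vector to skew-symmetric matrix) and vee (its inverse) conventions; the helper names are our own.

```python
import numpy as np

def hat(v):
    """Map v in R^3 to the skew-symmetric matrix P_v in so(3)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(v):
    """Rodrigues' formula: exponential map so(3) -> SO(3)."""
    a = np.linalg.norm(v)
    if a < 1e-12:
        return np.eye(3)
    P = hat(v)
    return np.eye(3) + (np.sin(a) / a) * P + ((1 - np.cos(a)) / a**2) * (P @ P)

def log_so3(X):
    """Log map SO(3) -> R^3: the rotation vector v with ||v|| = alpha,
    where tr(X) = 2 cos(alpha) + 1."""
    a = np.arccos(np.clip((np.trace(X) - 1.0) / 2.0, -1.0, 1.0))
    if a < 1e-12:
        return np.zeros(3)
    P = (a / (2.0 * np.sin(a))) * (X - X.T)
    return np.array([P[2, 1], P[0, 2], P[1, 0]])  # vee: recover v from P_v

# Round trip: v -> rotation matrix -> v.
v = np.array([0.1, -0.2, 0.3])
assert np.allclose(log_so3(exp_so3(v)), v)
```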

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.