We investigate the utility of Wishart processes (WPs) for estimating time-varying functional connectivity (TVFC), which is a measure of changes in functional coupling as the correlation between brain region activity in functional magnetic resonance imaging (fMRI). The WP is a stochastic process on covariance matrices that can model dynamic covariances between time series, which makes it a natural fit to this task. Recent advances in scalable approximate inference techniques and the availability of robust open-source libraries have rendered the WP practically viable for fMRI applications. We introduce a comprehensive benchmarking framework to assess WP performance compared with a selection of established TVFC estimation methods. The framework comprises simulations with specified ground-truth covariance structures, a subject phenotype prediction task, a test-retest study, a brain state analysis, an external stimulus prediction task, and a novel data-driven imputation benchmark. The WP performed competitively across all the benchmarks. It outperformed a sliding window (SW) approach with adaptive cross-validated window lengths and a dynamic conditional correlation (DCC)-multivariate generalized autoregressive conditional heteroskedasticity (MGARCH) baseline on the external stimulus prediction task, while being less prone to false positives in the TVFC null models.

The human brain is a dynamic system of interacting regions. Higher-order cognition and complex behavior arise from the spatiotemporal integration, reorganization, and segregation of brain region activity (Bressler & Menon, 2010; Deco et al., 2011; Shine et al., 2016). This intrinsic functional architecture or “connectome” exhibits complex and modular system structures (Deco et al., 2015). Understanding the dynamics of brain region interconnectivity and its changes, for example, during disease, development, or learning, is a critical goal in modern neuroscience.

Functional connectivity (FC) is the study of temporal coactivation across distinct brain regions. This is inferred from the blood-oxygen-level-dependent (BOLD) neural activity signal detected using functional magnetic resonance imaging (fMRI) (Bijsterbosch et al., 2021). These connectivity maps can be used to “fingerprint” subjects, and many inter-individual differences have been identified (Finn et al., 2015; Liégeois et al., 2019). FC is characterized by statistical dependencies (i.e., correlations) between neural activity in brain regions, often assuming a stationary covariance structure (Wang et al., 2014; Zalesky et al., 2012). However, this assumption neglects the complex temporal structure of interregional relationships (Hutchison et al., 2013; Lurie et al., 2020). Therefore, the study of time-varying FC (TVFC) has become increasingly popular (Cohen, 2018; Vidaurre et al., 2021). Instead of estimating a single correlation matrix for an entire scan, correlation is estimated as a function of time during a scan.

Approaches for estimating TVFC include sliding window (SW) methods (Chang & Glover, 2010; Sakoğlu et al., 2010), multiplication of temporal derivatives (Shine et al., 2015), phase coherence models (Glerean et al., 2012), multivariate generalized autoregressive conditional heteroskedasticity (MGARCH) processes (Hakimdavoodi & Amirmazlaghani, 2020; Lindquist et al., 2014), customized Bayesian models (Ebrahimi et al., 2023; Li, Pluta, et al., 2019; Taghia et al., 2017; Warnick et al., 2018), wavelet-based methods (Zhang et al., 2016), and phase synchronization models (Demirtaş et al., 2016; Varela et al., 2001). Another popular approach is to use state-based models, such as the Hidden Markov Model (HMM). Instead of estimating a full covariance structure, each scanning volume is assigned to one of several canonical correlation matrices or “brain states” (Ahrends et al., 2022; Vidaurre et al., 2018). However, all the aforementioned methods exhibit a large degree of variation in their TVFC estimates (Foti & Fox, 2019; Lurie et al., 2020).

This study proposes to estimate TVFC using Wishart processes (WPs). This stochastic process model learns a distribution over (covariance) matrix-valued functions. This is analogous to the Gaussian process (GP) that learns a distribution over scalar-valued functions (Rasmussen & Williams, 2005). Therefore, it is a natural candidate for TVFC modeling. It has recently become possible to adopt this class of models in neuroimaging, thanks to advances in approximate inference routines and easy-to-use computational libraries (Heaukulani & van der Wilk, 2019; Matthews et al., 2017). We discuss the motivation, assumptions, and background of this model; how to construct it; how to employ Bayesian inference to fit its parameters; and its potential advantages and limitations.

Evaluating TVFC estimation methods remains difficult because the ground-truth covariance structure in real data is unobserved. Therefore, estimation methods coexist, despite the large degree of variation in their estimates. To evaluate the application of WPs to covariance estimation between fMRI time series, we designed a comprehensive collection of benchmarks that serve as proxies for the direct observation of true TVFC. Each benchmark comprises a competitive prediction task in which the WP is compared with other TVFC estimation methods (Leenings et al., 2022; Yarkoni & Westfall, 2017). We study established prediction tasks including simulation benchmarks, where time series are generated using prespecified underlying covariance structures (Hindriks et al., 2016; Thompson et al., 2018) and where the prediction task is to “reconstruct” this ground-truth covariance structure from the simulated observations, resting-state fMRI (rs-fMRI) benchmarks, where we study the predictive power of estimated TVFC of subject measures and their test-retest robustness, and task-based fMRI (tb-fMRI) benchmarks, where the goal is to predict the presence of an external stimulus (Gonzalez-Castillo & Bandettini, 2018; Sahib et al., 2018). Additionally, we introduce a data-driven imputation benchmark, in which the prediction task is to infer TVFC at a left-out test set of observations using observations from the remaining training dataset.

In Section 2, we describe the existing TVFC estimation methods that serve as baselines for comparison with the WP. These include the static FC (sFC) estimate, an improved version of the SW approach using cross-validated window lengths (SW-CV), and the dynamic conditional correlation (DCC) MGARCH model. Next, we discuss how we constructed the WP and inferred its model parameters. We then discuss the benchmarks that were studied. After presenting the results in Section 3, we summarize the insights into WP performance across these benchmarks in Section 4. We discuss limitations and provide promising avenues for future work to address these.

In this section, we describe methods for time-varying covariance estimation, how to extract features from estimated covariance structures, and how to compare estimation method performance through benchmarking.

2.1 Baseline TVFC estimation methods

2.1.1 Static functional connectivity (sFC)

Static connectivity estimates assume that the covariance between the random variables of interest does not change across any arbitrary scanning session length. The multivariate BOLD observations at time step (i.e., scanning volume) 1nN for activity time series from D regions (or nodes) are denoted as ynD. As sample means are assumed to be zero, the (unbiased) covariance matrix estimator is

Σ^=1N1n=1NynynT.
(2. 1)

The respective (Pearson) correlation matrix is obtained by dividing each covariance matrix element by the product of the standard deviations of the respective variables.

2.1.2 Sliding window (SW) functional connectivity

SW methods slide a time window of a certain length ω and shape (typically rectangular or Gaussian) across observations. The covariance is estimated within this window for each step (Shakil et al., 2016). The window length is typically between 30 and 60 seconds (Shirer et al., 2012). In this study, we implemented a rectangular window with a step size of a single volume. Time series were high-pass filtered (5th-order Butterworth) to remove the frequency components below 1/ω prior to TVFC estimation (Leonardi & Van De Ville, 2015). Zeros were padded to the start and end of the time series to allow for estimation around those locations.

Window lengths affect the estimated TVFC. Small windows can detect fast-changing aspects of the data but often return spurious covariance, whereas large windows are more robust to outliers but are unable to detect faster dynamics. Real datasets contain unknown temporal structures, and each scan may require a different window length. Strategies exist to determine the window length a priori or to circumvent the choice of window length (Nielsen et al., 2017; Vergara et al., 2019; Yaesoubi et al., 2018), such as choosing a window length to avoid 1/ω exceeding the Nyquist frequency (Zalesky et al., 2014). However, a consensus strategy for selecting this hyperparameter is lacking. In this study, we posited that good methods for inferring TVFC will also be able to predict TVFC at unseen time points. We implemented a data-driven approach for optimal a priori window length selection using cross-validation (Fig. 1). The evaluation data points were obtained one at a time from the time series as a validation set. The average log likelihood of these observations under a zero-mean multivariate Gaussian distribution was then computed as

Fig. 1.

Schematic of determining the optimal window length for sliding window (SW) TVFC estimation using cross-validation. A range of proposed window lengths was used to estimate TVFC around a range of evaluation data points. The optimal window length is determined as the one whose estimates yield the highest likelihood under a zero-mean Gaussian distribution, averaged over all evaluation data points.

Fig. 1.

Schematic of determining the optimal window length for sliding window (SW) TVFC estimation using cross-validation. A range of proposed window lengths was used to estimate TVFC around a range of evaluation data points. The optimal window length is determined as the one whose estimates yield the highest likelihood under a zero-mean Gaussian distribution, averaged over all evaluation data points.

Close modal
=1NenNe[D2log2π12log|Σ^n|12ynTΣ^n1yn],
(2. 2)

where 1nNe is an evaluation data point, Ne denotes the number of evaluation points (equal to the total number of time steps N minus the maximum proposal window length ωmax), and Σ^n (a function of ω) is the covariance matrix estimate for all observations in the surrounding window of length ω (excluding the evaluation data point). This procedure was repeated for a range of proposed window lengths. The optimal window length ω^ maximizes the log likelihood:

ω^=arg maxωminωωmax1NenNe[D2log2π12log|Σ^n|12ynTΣ^n1yn].
(2. 3)

The minimum and maximum window lengths to explore (ωmin and ωmax) were set to 20 (Leonardi & Van De Ville, 2015) and 180 seconds, respectively. For example, assuming a repetition time (TR) of 2 seconds, a window would include 10–90 data points. We empirically verified that this approach consistently outperformed hard-coded window lengths (see Supplementary Materials Sections S.1 and S.7).

2.1.3 MGARCH: Dynamic conditional correlation (DCC)

The MGARCH framework, which is popular in both econometrics and neuroimaging, describes a family of models that considers a zero-mean vector stochastic process ynD with a time-varying covariance structure:

yn=n1/2ηn,
(2. 4)

where ηn is an i.i.d. white noise vector process. The variances of the individual time series are assumed to follow a vector autoregression process; therefore, the dynamic covariance process n is conditioned on all observations up until time n1. In this study, we considered the DCC-GARCH model (Engle, 2002). Specifically, we implemented a first-order DCC(1,1) model as commonly used in FC studies (Choe et al., 2017; Lindquist et al., 2014). Parameter inference starts by fitting a univariate GARCH model (Bollerslev, 1986; Engle, 1982) to each time series individually, for which we used a first-order GARCH(1,1) process, meaning each state only considers both the GARCH and ARCH terms from the preceding state.

When estimating the covariance structure between a set of D>2 time series, we could either model time series jointly or loop over all pairs of time series, training D(D1)/2 models and thus learning more parameters. Whether different parameters should be learned for each node pair (or edge) depends on whether we expect significantly different dynamic characteristics between edges. In FC studies, running DCC in a pairwise manner is common (Choe et al., 2017). However, DCC models were designed to model all dimensions jointly. The optimal implementation remains unclear. Therefore, we included both: throughout this paper, “DCC-J” indicates a (multivariate) jointly trained model and “DCC-BL” refers to bivariate loop (pairwise) training.

2.2 The Wishart process

2.2.1 Model construction

We constructed the WP as a single statistical model that describes the time-varying covariance structures between time series. To start, the Wishart distribution defines a probability density function over positive definite matrices :

f(|V,ν)=1Z||(νD1)/2exp(12tr(V1)),
(2. 5)

where V is a D×D positive definite scale matrix, νD is the number of degrees of freedom, and with normalization constant Z=2νD/2|V|ν/2ΓD(ν/2), where ΓD(·) is the multivariate gamma function. This distribution is a multivariate extension of the chi-squared distribution and is used extensively in the estimation of covariance matrices in multivariate statistics. Crucially, we can construct a Wishart-distributed random (matrix-valued) variable from a collection of i.i.d. zero-mean multivariate Gaussian random variables. The sum of the outer products of such variables is Wishart distributed:

=i=1νuiuiTWD(V,ν),
(2. 6)

where ui are i.i.d. N(0,V) distributed, D-dimensional random variables, and WD(V,ν) denotes a Wishart distribution with a scale matrix V and ν degrees of freedom.

The WP is a continuous time stochastic process on symmetric positive definite matrices, which makes it suitable for modeling covariance matrices (Bru, 1991; Wilson & Ghahramani, 2011). We let Y:=(yn,1nN) denote a sequence of BOLD measurements in D, where N is the number of scan volumes and D is the number of time series. Index locations are denoted as X:=(xn,1nN) in as the times at which the measurement yn is observed, considered in a fixed interval of [0, 1] during training and prediction. These index locations were univariate in our case, but could be multivariate when other covariates, such as head motion, would be included. We chose the conditional likelihood of observations to be a multivariate Gaussian:

yn|μn,nN(μn,n),
(2. 7)

where n is a D×D covariance matrix and μn=0 in our context. The random and unobserved process 1,2,,N constitutes TVFC. Analogous to constructing a Wishart-distributed random variable from a collection of Gaussian random variables, we constructed the Wishart process from i.i.d. collections of Gaussian processes. Let

fd,kGP(0,K(,;θ)),
(2. 8)

for 1dD and 1kν, where K(,;θ) denotes a kernel function. Let Fn,d,k:=fd,k(Xn) denote the evaluation of the GP at Xn. We wrote Fn for the aggregate D×ν matrix (Fn,d,k)1dD,1kν, which has entries Fn,d,k indexed by d and k. Analogues to Eq. (2.6), we construct

n=AFnFnTAT
(2. 9)

as a Wishart-distributed random matrix at time 1nN. This construction allows us to query covariance matrices continuously throughout an fMRI scan, because the underlying GPs can be queried at any time. It also handles missing data naturally because data are not expected in a grid-like pattern. Matrix AD×D is restricted such that the scale matrix AAT is positive definite, and it is trained as part of the inference routine. This scale matrix provides the expected value of for all n. The log likelihood from Eq. (2.7) with this construction of n plugged in is

logp(yn|A,Fn)=D2log(2π)12log|AFnFnTAT|12ynT(AFnFnTAT)1yn.
(2. 10)

Heaukulani and van der Wilk (2019) suggested adding white noise for robust inference, which we validated empirically. The covariance matrix construction is updated to

n=AFnFnTAT+Λ,
(2. 11)

with additive noise matrix Λ: a diagonal D×D matrix with positive entries, whose values are trained with the rest of the parameters (off-diagonal values remain zero).

2.2.2 Bayesian inference

After defining our model, we needed a procedure to fit it. Doing so requires computing p(Y)=p(Y|F)p(F)dF in the posterior p(F|Y)=p(Y,F)p(Y), which is intractable. Therefore, we used variational inference, a technique that approximates the probability density through optimization (Blei et al., 2017). This contrasts with Markov Chain Monte Carlo methods, which are often employed in such situations. Variational inference posits a family of distributions q(F) over the latent variables and then finds the member of that family that is close in terms of Kullback-Leibler (KL) divergence to the target distribution (the true posterior). Importantly, model parameters and covariance structure aspects are learned directly from the data instead of having fixed hyperparameters representative of inductive biases on behalf of the practitioner.

To make our WP models more computationally viable, we used sparse underlying variational GPs as WP building blocks. The sparse variant introduces the concept of inducing points: learned auxiliary data points (Bauer et al., 2016; Titsias, 2009). The number of inducing points Z:=(zm,1mM) is smaller than N. We write Um,d,k:=fd,k(Zm) for sparse GP evaluations. We chose a fully factorized Gaussian prior over Ud,k. We collectively denote Ud,k:=(Um,d,k,mM) and chose its variational approximation as

q(Ud,k)N(Ud,k;μd,k,Sd,k),
(2. 12)

for variational parameters μd,kM and Sd,kM×M a real, symmetric, positive definite matrix. For inference, we iteratively maximized the evidence lower bound (ELBO)

ELBO=n=1NEq(Fn)[logp(yn|Fn)]d=1Dk=1νKL(q(Ud,k)||p(Ud,k))
(2. 13)

as our objective function using gradient descent, where (approximate) gradients were computed based on samples (Monte Carlo estimates) of our objective function. Intuitively, the first ELBO term is a likelihood (data fit) term and the second is a regularizing term.

2.2.3 Implementation

The properties of the underlying GPs are determined by a kernel function, for which we used a (stationary, isotropic) Matérn 5/2 kernel

k(x,x)=σ2(1+5r+53r2)exp(5r),
(2. 14)

with r=xxl, and where l and σ2 are the kernel length scales and variance parameters, respectively. This length scales parameter is easily interpretable and defines how quickly the GP (and, therefore, TVFC as a WP) can change over time by how much weight the model places on observations farther from the evaluation index location. Therefore, we hypothesized that this parameter may be analogous to the window length in SW (see Supplementary Materials Section S.7).

Parameters were updated using gradient descent with the Adam optimizer (Kingma & Ba, 2015). The estimated parameters include the variational parameters, kernel hyperparameters, the lower Cholesky decomposition A of the scale matrix, and the diagonal values of the additive noise matrix Λ. We set ν=D and M=200. All figures include the 95% confidence interval of the mean estimate plus/minus two standard deviations, based on 3,000 samples from the posterior. Although this model is complex in its description, and thus introduces a complexity penalty, its implementation is lean and robust, relying heavily on open-source libraries. Further, while there is no theoretical limit to the number of time series modeled, the computational demand for training this model on high-dimensional data (over 100 time series) is high, taking days or even weeks for a single scan. In practice, this means researchers working with fMRI may need access to sizeable compute clusters (e.g., parallelizing training bivariate models for each edge) or GPUs. Alternatively, they may run a dimensionality reduction algorithm before fitting a WP to their data.

2.3 Benchmarking TVFC estimation methods

2.3.1 Simulation benchmarks

We considered simulated data from seven underlying deterministic synthetic covariance structures (Fig. 2): null covariance (time series are uncorrelated), static constant covariance of σij=0.8, two periodic covariance structures (a slow and a fast oscillating sine wave defined by one and three periods, respectively) that model transient changes in coupling, stepwise covariance that models two large change points in covariance, a covariance structure that mimics a series of state transitions (Thompson et al., 2018), and a covariance structure inspired by a boxcar external task design. For each of these, we sampled the observations ynD at time steps 1nN from a zero-mean Gaussian distribution:

Fig. 2.

Synthetic covariance structures as a function of time for simulation benchmarks. This set was designed to capture a wide range of edge cases and potentially realistic underlying covariance structures. Synthetic activity time series were generated from these covariance structures.

Fig. 2.

Synthetic covariance structures as a function of time for simulation benchmarks. This set was designed to capture a wide range of edge cases and potentially realistic underlying covariance structures. Synthetic activity time series were generated from these covariance structures.

Close modal
ynN(0,n),
(2. 15)

for the respective covariance matrices. We tested on pairwise (i.e., bivariate) and trivariate datasets. The covariance structure for generating bivariate data was

n=[1σ(n)σ(n)1],
(2. 16)

where the covariance term σ(n) varied with time according to the synthetic covariance structures (Fig. 2). For the trivariate data, both a dense, fully covarying covariance structure

n=[1σ(n)σ(n)σ(n)1σ(n)σ(n)σ(n)1],
(2. 17)

and a sparse version where only the first two time series are correlated

n=[1σ(n)0σ(n)10001],
(2. 18)

were implemented. The range of σ(n) was updated to [−0.5, 1] to maintain the covariance matrices as positive semidefinite in the dense case. The time series were individually normalized to have a mean of zero and a unit standard deviation. Model performance was averaged across T=200 trials. We report the results of N=400 scanning volumes. The results for higher dimensions can be found in Supplementary Materials Section S.2.

Noise with a signal-to-noise ratio (SNR) of 2 was added to these time series. The noisy yn* was constructed as a linear mix of noiseless signal yn (Eq. 2.15) and (independent) noise time series ϵn:

yn*=αyn+(1α)ϵn,
(2. 19)

where 0<α<1 and the SNR is equal to α1α. Noisy time series more realistically mimic fMRI data, especially when the noise contains properties as found in fMRI data (Deco et al., 2009; Welvaert & Rosseel, 2013). Therefore, we used data from the Human Connectome Project (HCP) (Elam et al., 2021; Smith, Beckmann, et al., 2013) as ϵn; a source of noise. A detailed description of HCP data can be found in Appendix A.2. The time series obtained from the HCP were representative of an independent component analysis (ICA) component of whole-brain activity. For each D-dimensional set of simulated time series (i.e., individually for each covariance structure and trial), we selected the time series from D randomly selected ICA components from a randomly selected scan of different HCP subjects. As they were obtained from different subjects, these noise time series could be assumed to be uncorrelated. This ensured that no additional and adversarial (i.e., other than the predetermined “ground truth”) covariance structure was introduced. These noise time series were preferred over other types of noise, such as white noise, as they preserve realistic fMRI data characteristics such as spatial and temporal autocorrelation (Arbabshirani et al., 2014; Honari et al., 2019; Keilholz et al., 2017). The addition of noise affects the ground-truth variances and covariances, which were updated accordingly (Appendix A.1). The root mean square error (RMSE) between the estimated and ground-truth correlations served as the performance metric (Lindquist et al., 2014; Wilson & Ghahramani, 2011). For the trivariate cases, we computed the RMSE across all elements of the full correlation matrices.

2.3.2 Resting-state fMRI benchmarks

We studied the capability of predicting subject measures, test–retest robustness, and a brain state analysis using HCP rs-fMRI data. Each activity time series in this dataset represented an ICA component of whole-brain activity. For interpretable features, we computed edgewise (i.e., independently for each connection between nodes) summary measures of TVFC estimates across time: the mean (analogous to sFC estimates; considered the connection “strength”)

μi,j=1Nn=1Nn,i,j,
(2. 20)

the variance (Choe et al., 2017; Hutchison et al., 2013), sometimes interpreted as connectivity “stability” or “flexibility” (Allen et al., 2014)

σi,j2=1Nn=1N(n,i,jμi,j)2,
(2. 21)

and we proposed an additional summary measure, the TVFC rate-of-change, defined as the mean absolute relative difference between subsequent time steps

ri,j=1N1n=2N|n,i,jn1,i,j1|,
(2. 22)

for the edge between nodes 1iD and 1jD. The latter summary measure captured how smoothly the estimated TVFC changed over time.

The estimated TVFC was related to subject measures using a morphometricity analysis (Sabuncu et al., 2016), but instead of anatomical variation between subjects, we studied variation in TVFC. We posited the linear mixed effects model

y=Xβ+a+ϵ,
(2. 23)

where y is a column vector of length equal to the number of subjects studied containing some quantitative subject measure, X is a design matrix of nuisance variables (covariates) weighted by β, a~N(0,σa2Ka) is a random effects vector, and ϵ~N(0,σe2) is a noise vector. Entries in the symmetric matrix Ka encoded how globally “similar” the TVFC estimates between the respective subjects were. This similarity was defined using the three summary measures (including two dynamic measures) of estimated whole-brain TVFC, taking the lower triangular values as a vector of size D(D1)/2. The similarity “distance” between subject vectors was defined by (nonlinear) Gaussian kernel k(x,x)=exp((xx)2). Morphometricity was computed as the proportion of phenotypic variation that could be explained by intersubject variation:

m2=σa2σa2+σe2=σa2σy2,
(2. 24)

with σy2 the phenotypic variance. Parameters σa2 and σe2 were estimated using the restricted maximum likelihood method (Harville, 1977). The analysis was repeated for 15 subject measures, including age, gender, and cognitive task scores (Kong et al., 2019; Li, Kong, et al., 2019). Age and gender were included as nuisance variables (when they were not predicted).

Further, we compared the robustness of the TVFC estimates across scans from the same subject (Abrol et al., 2016, 2017; Choe et al., 2017; Elliott et al., 2020; Noble et al., 2019). The test–retest reliability of TVFC summary measures was computed as interclass correlation coefficients (ICCs), individually for each edge (Chen et al., 2018; Shrout & Fleiss, 1979):

ICC=σX2σX2+σU2,
(2. 25)

with σX2 the between-subject and σU2 the within-subject variance. We computed image intraclass correlation coefficients (I2C2) as a whole-brain reliability assessment (Shou et al., 2013):

I2C2=tr(KX)tr(KX+KU),
(2. 26)

with KXD(D1)/2×D(D1)/2 the within-subjects covariance (of “true” FC) and KUD(D1)/2×D(D1)/2 the covariance of the measurement error vector (the difference between measurement and “true” FC). These matrices were computed using method of moments estimators (Carroll et al., 2006; Shou et al., 2013). Values for both ICC and I2C2 lie between 0 and 1, where 0 represents complete retest independence and 1 represents perfect measurement reliability.

Finally, TVFC estimates are commonly characterized as switching between recurring FC correlation matrices or “brain states” (Keilholz et al., 2017; Kringelbach & Deco, 2020). At any point in time, a subject’s functional organization is defined by one of k brain states, between which a subject traverses during the scanning session. In this study, for each session separately, all estimated correlation matrices for all N time steps and all subjects were concatenated, and brain states were extracted using the k-means algorithm (Lloyd, 1982). We extracted k=3 brain states, following Choe et al. (2017). We also computed the number of brain state change points (i.e., switches between states within a scan). The brain state analysis was not considered as a benchmark for method selection, because it was unclear how to determine the superior TVFC estimation method from this analysis.

2.3.3 Task-based fMRI benchmarks

We assessed which method’s TVFC estimates could best predict an external task stimulus of brain activity, using an elegantly simple visual task experiment from the Rockland dataset (Nooner et al., 2012). More details on this dataset and the task paradigm can be found in Appendix A.3. Predicting the presence or absence of the visual stimulus was framed using a generalized linear model (GLM)

yg=Xβg+eg,
(2. 27)

where ygN is the estimated TVFC for a given method, which was run independently for all 1gG edges of interest (i.e., the connectivity between the primary visual cortex (V1) and five other ROIs; V2, V3, V4, mPFC, and M1), XRxN is the design matrix with R=3 regressors (a single experimental regressor and two nuisance regressors) weighted by βR, and egN are the error terms. The experimental regressor was the boxcar (block design) model of the visual stimulus convolved with a hemodynamic response function (HRF) (Glover, 1999). The two nuisance regressors were first-order polynomials, where the polynomial (drift) order was determined as the TR multiplied by N150 (Worsley et al., 2002). We inferred β from the observations, using ordinary least squares (OLS) to minimize the error terms (Worsley & Friston, 1995). The magnitudes of β parameters were then used to determine which TVFC estimates best captured the presence of the visual stimulus.

2.3.4 The imputation benchmark

Our proposed imputation benchmark is a data-driven benchmark for comparing TVFC estimation methods that only uses the observed time series. Again, we posited that good TVFC estimation methods can predict TVFC at unseen time points. Training and test sets were defined as data points taken from the time series under a leave-every-other-out (LEOO) split, resulting in equally sized training and test sets (Fig. 3). Each candidate method was tasked with estimating the covariance structure at the test locations using only observations from the training set. Except for the WP, where interpolation follows naturally, the test location covariance matrix estimates were elementwise and linearly interpolated between the estimates at the enclosing training locations. The performance metric was the mean log likelihood of observing the test set observations under a zero-mean Gaussian distribution defined by the estimated covariance matrices. We applied this benchmark to all datasets enclosed in this study.

Fig. 3.

Schematic of the imputation benchmark. The time series were split into equally sized training and test sets. Training set data points were used to infer TVFC at the test set locations. Benchmark performance was the log likelihood of observing the test set under a zero-mean Gaussian distribution defined by the inferred TVFC, averaged over all test set data points.

Fig. 3.

Schematic of the imputation benchmark. The time series were split into equally sized training and test sets. Training set data points were used to infer TVFC at the test set locations. Benchmark performance was the log likelihood of observing the test set under a zero-mean Gaussian distribution defined by the inferred TVFC, averaged over all test set data points.

Close modal

3.1 Simulation benchmarks

First, we investigated the ability of the TVFC estimation methods to recover known covariance structures from simulated multivariate time series. Simulation benchmarks address a significant obstacle in assessing TVFC in fMRI data: the lack of ground-truth knowledge of the covariance structure. We tested against a battery of synthetic covariance structures that could plausibly be encountered in an fMRI scan (Fig. 2): null covariance, constant static covariance, periodic (slow and fast) covariance structures of an oscillating sine wave (slow, low frequency—one period; fast, high frequency—three periods), stepwise covariance that models two large change points, a covariance structure that mimics a series of state transitions (Thompson et al., 2018), and a covariance structure inspired by a blocked-design boxcar tb-fMRI study that mimics the presence and absence of an external stimulus, as modeled by convolving a square wave stimulus time series with an HRF.

To investigate the qualitative differences between the estimation methods, the bivariate TVFC estimates for a single trial for all covariance structures are shown in Figure 4. We quantified the performance by computing the RMSE between the estimated and ground-truth correlation terms (i.e., the off-diagonal term; Figure 5, top row). Differences between the methods were statistically tested using two-tailed t-tests with Bonferroni correction. Of note is the spurious covariance detected by SW-CV in the static null and constant cases (Hindriks et al., 2016; Lindquist et al., 2014). The WP and DCC methods performed comparably to sFC, with the SW-CV performing worse than the WP (t=22.00, p=9.41×1071, and t=9.67, p=5.3×1020, for the null and constant covariance structures, respectively). For the two periodic (slow and fast) covariance structures, the smooth nature of the WP was particularly suitable, in contrast to the DCC estimates, where large, spurious jumps were observed (t=25.87, p=2.9×1087, and t=34.76, p=1.2×10122, for the slow and fast periodic covariance structures, respectively). However, the WP estimates were too smooth to capture the sudden changes in the stepwise covariance structure. The SW-CV estimates suffered from a similar problem, where static parts require longer windows, but the change points require shorter windows. While DCC could model the change points, it returned spurious covariance estimates in the static parts of the time series. All methods failed to reconstruct the state transition covariance structure, as evidenced by the fact that none of the methods were able to significantly outperform the sFC estimate (Fig. 5, top row). This was interpreted as none of the methods was sensitive enough to detect subtle yet sudden changes in the covariance structure. All methods could model the boxcar covariance structure; however, performance was reduced when noise was added. The SW-CV performed better than the WP on this dynamic covariance structure (t=13.77, p=1.4×1035).

Fig. 4.

Simulation benchmarks single trial bivariate TVFC estimates from the methods considered for N=400 time steps. The estimation methods have distinct failure modes. All methods struggle with the change points in the covariance structures. WP, Wishart process; DCC, dynamic conditional correlation; SW-CV, cross-validated sliding windows.

Fig. 4.

Simulation benchmarks single trial bivariate TVFC estimates from the methods considered for N=400 time steps. The estimation methods have distinct failure modes. All methods struggle with the change points in the covariance structures. WP, Wishart process; DCC, dynamic conditional correlation; SW-CV, cross-validated sliding windows.

Close modal
Fig. 5.

Simulation benchmarks quantified results for bivariate (top row), dense trivariate (middle row), and sparse trivariate (bottom row) scenarios. Performance was quantified as the root mean squared error (RMSE) between the TVFC estimates and ground truth for all prespecified covariance structures. The performance of sFC estimates is included for reference. Shown are the results for the simulated data with added rs-fMRI noise (SNR of 2) for N=400 time steps. Means and standard deviations are shown across T=200 trials. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) and bivariate loop (“-BL”) fashion; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Fig. 5.

Simulation benchmarks quantified results for bivariate (top row), dense trivariate (middle row), and sparse trivariate (bottom row) scenarios. Performance was quantified as the root mean squared error (RMSE) between the TVFC estimates and ground truth for all prespecified covariance structures. The performance of sFC estimates is included for reference. Shown are the results for the simulated data with added rs-fMRI noise (SNR of 2) for N=400 time steps. Means and standard deviations are shown across T=200 trials. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) and bivariate loop (“-BL”) fashion; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Close modal

We undertook further benchmarking in the trivariate case (Fig. 5, middle and bottom row). Both a dense version, where all time series edges share the same correlation structure, and a sparse case, where the third time series is uncorrelated with the first two time series, were implemented. Competitive performance, measured as the RMSE between all elements of the estimated and ground-truth correlation matrices, was observed for WP estimates. Similar to the bivariate case, SW-CV estimates return a time-varying structure when the ground-truth covariance is static (e.g., comparing with the WP on the null covariance structure; t=5.04, p=.003 and t=5.06, p=.003, for the dense and sparse cases, respectively).

The comparison between the dense and sparse cases was of interest. Whereas the pairwise DCC-BL and the jointly trained DCC-J performed similarly for the dense case, DCC-BL performed better in the sparse case for the covariance structures with time-varying structure (e.g., comparing these two on the slow periodic covariance structure; t=0.14, p=1.00, and t=9.66, p=1.5×108, for the dense and sparse cases, respectively). The SW-CV estimates performed worse in the sparse case relative to the WP (e.g., on the fast periodic covariance structure; t=1.46, p=1.00 and t=7.76, p=3.8×107, for the dense and sparse cases, respectively). This suggests that it may also benefit from learning different parameters (window lengths in this case) for each edge, instead of a globally optimal window length.

3.2 Resting-state fMRI benchmarks

Under real-world conditions, the ground-truth covariance structure between brain components is unknown, making the assessment of TVFC estimation methods difficult. Next, we framed indirect prediction tasks as proxies to shed light on which method’s TVFC estimates were closer to the ground truth. We analyzed the time-varying covariance between ICA-identified networks for 812 subjects from the HCP. The estimates for several edges (i.e., covariances between two network time series) of a single scan of a single subject are shown in Figure 6. These estimates varied radically among the methods considered, underscoring the importance of the choice of estimation method. Estimates for training DCC in a joint (“-J”) or pairwise (“-BL”) manner strongly overlapped; therefore, the results for DCC-BL are omitted. The characteristics and differences in the estimates could be reasonably captured by three TVFC summary measures: mean, variance, and rate-of-change (Fig. 7). To illustrate, DCC-J estimates changed rapidly within a narrow range of correlation estimates, which was captured by the low variance and high rate-of-change.

Fig. 6.

TVFC (correlation) estimates on ICA-identified time series from resting-state fMRI benchmark data from the Human Connectome Project, for a selection of edges for a single scan of a single subject. Estimates varied radically across methods, where Wishart process (WP) estimates resembled a smoothed version of the cross-validated sliding window (SW-CV) estimates. ICA components were mapped to functional networks: V, visual with lateral (L), medial (M), occipital (O) subsets; AUD, auditory; CBM, cerebellum; SM, sensorimotor; EC, executive control; FP, frontoparietal with cognition-language (CL) subset. DCC, dynamic conditional correlation, trained in joint (“-J”) and bivariate loop (“-BL”) fashion.

Fig. 6.

TVFC (correlation) estimates on ICA-identified time series from resting-state fMRI benchmark data from the Human Connectome Project, for a selection of edges for a single scan of a single subject. Estimates varied radically across methods, where Wishart process (WP) estimates resembled a smoothed version of the cross-validated sliding window (SW-CV) estimates. ICA components were mapped to functional networks: V, visual with lateral (L), medial (M), occipital (O) subsets; AUD, auditory; CBM, cerebellum; SM, sensorimotor; EC, executive control; FP, frontoparietal with cognition-language (CL) subset. DCC, dynamic conditional correlation, trained in joint (“-J”) and bivariate loop (“-BL”) fashion.

Close modal
Fig. 7.

Edgewise TVFC summary measures for the first scan (1A) for resting-state fMRI benchmark data from the Human Connectome Project, averaged over 812 subjects. The three summary measures captured distinct characteristics of each TVFC estimation method. For interpretation, ICA components were mapped to functional networks: V, visual with medial (M), occipital (O), lateral (L) subsets; DMN, default mode network; CBM, cerebellum; SM, sensorimotor; AUD, auditory; EC, executive control; FP, frontoparietal with cognition-language (CL) subset. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows.

Fig. 7.

Edgewise TVFC summary measures for the first scan (1A) for resting-state fMRI benchmark data from the Human Connectome Project, averaged over 812 subjects. The three summary measures captured distinct characteristics of each TVFC estimation method. For interpretation, ICA components were mapped to functional networks: V, visual with medial (M), occipital (O), lateral (L) subsets; DMN, default mode network; CBM, cerebellum; SM, sensorimotor; AUD, auditory; EC, executive control; FP, frontoparietal with cognition-language (CL) subset. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows.

Close modal

TVFC estimates were used to predict subject measures using a morphometricity analysis (Fig. 8) (Sabuncu et al., 2016). Strong heterogeneity was observed across the estimation methods and the predictive power of subject measures. For example, for the subject measure of Sustained Attention Sensitivity, none of the variance across all TVFC summary measures and estimation methods could be explained, indicating that the signatures of this task may not be captured by FC. Subject age was better predicted by the mean and variance of TVFC estimates than by the rate-of-change. To illustrate, Hutchison and Morton (2015) found that FC variability over the length of a scan correlates positively with age. We found that the WP and SW-CV methods, but not DCC-J, were predictive of age. The only significant differences in the average variance explained across all subject measures (the rightmost column in Fig. 8) were the comparison between the mean of DCC-J estimates and the sFC estimates (t=2.17, p=3.8×102) and for the variance summary measure between WP and DCC-J (t=4.17, p=2.7×104) and between DCC-J and SW-CV (t=5.48, p=1.0×105). This means that the WP and SW-CV methods exhibited similar performance. The complete lack of predictive power of the variance of DCC-J estimates was particularly noteworthy. Another popular way to frame this type of prediction task is as an out-of-sample predictive modeling study (Dhamala et al., 2021; Greene et al., 2018; Smith et al., 2015; Zamani Esfahlani et al., 2022). Such an analysis has been added to Supplementary Materials Section S.4 for comparison. Our findings replicate those of prior studies, and the relative performance of the estimation methods is similar. However, DCC-J performed relatively better in the out-of-sample prediction task.

Fig. 8.

Subject measure prediction (morphometricity) scores for resting-state fMRI benchmark on data from the Human Connectome Project. The analysis was run separately for the TVFC summary measures of the mean (top), variance (middle), and rate-of-change (bottom row). Static functional connectivity (sFC) was added as a reference in the TVFC mean plot. The right-most column displays the mean across all subject measures. The error bars display the standard error, but the explained variance never exceeds one. Subject measures exhibited varying degrees of predictability from TVFC. Notable was the lack of predictive power of the variance of dynamic conditional correlation (DCC-J) estimates. A mapping between the interpretable item names printed here and the HCP-coded item names can be found in Supplementary Table S1. WP, Wishart process; SW-CV, cross-validated sliding windows.

Fig. 8.

Subject measure prediction (morphometricity) scores for resting-state fMRI benchmark on data from the Human Connectome Project. The analysis was run separately for the TVFC summary measures of the mean (top), variance (middle), and rate-of-change (bottom row). Static functional connectivity (sFC) was added as a reference in the TVFC mean plot. The right-most column displays the mean across all subject measures. The error bars display the standard error, but the explained variance never exceeds one. Subject measures exhibited varying degrees of predictability from TVFC. Notable was the lack of predictive power of the variance of dynamic conditional correlation (DCC-J) estimates. A mapping between the interpretable item names printed here and the HCP-coded item names can be found in Supplementary Table S1. WP, Wishart process; SW-CV, cross-validated sliding windows.

Close modal

The test–retest benchmark examined the reliability of TVFC estimates across four scans of the same subjects, separately for each TVFC summary measure. The edgewise ICC scores are shown in Figure 9A. The mean of TVFC estimates was more robust than their dynamic summary measures (variance and rate-of-change). Some edges were consistently more robust than others, irrespective of the estimation method. The robustness of mean TVFC was similar across estimation methods (no significant differences were found), but the variance of DCC-J estimates was more robust compared with the other two methods (t-test between the mean ICC score over all edges, p=4.63×1017 and p=6.01×1014, respectively, in comparison with the WP and SW-CV scores), replicating Choe et al. (2017). This finding was perhaps unexpected, as the variance of DCC-J estimates was previously found to have no predictive power of subject measures. Further, the robustness of the rate-of-change of the SW-CV estimates was higher than that of the other two methods (p=9.71×105 and p=1.18×102, in comparison with the WP and DCC-J scores, respectively). The I2C2 scores for whole-brain reliability of the TVFC summary measures were in broad agreement with the ICC scores (Fig. 9B). For perspective, the ICC scores for the learned WP kernel parameters were 0.29 for the kernel variance and 0.48 for the kernel length scales (Eq. 2.14).

Fig. 9.

Test–retest robustness results for resting-state fMRI benchmark on ICA-identified data from the Human Connectome Project. (A) Edgewise ICC scores of estimated TVFC across four scans. Higher scores indicate a method’s ability to produce more robust subject-specific estimates. For interpretation, ICA components were mapped to functional networks: V, visual with medial (M), occipital (O), lateral (L) subsets; DMN, default mode network; CBM, cerebellum; SM, sensorimotor; AUD, auditory; EC, executive control; FP, frontoparietal with cognition-language (CL) subset. (B) Whole-brain test–retest results. I2C2 scores for mean, variance, and rate-of-change TVFC summary measures. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Fig. 9.

Test–retest robustness results for resting-state fMRI benchmark on ICA-identified data from the Human Connectome Project. (A) Edgewise ICC scores of estimated TVFC across four scans. Higher scores indicate a method’s ability to produce more robust subject-specific estimates. For interpretation, ICA components were mapped to functional networks: V, visual with medial (M), occipital (O), lateral (L) subsets; DMN, default mode network; CBM, cerebellum; SM, sensorimotor; AUD, auditory; EC, executive control; FP, frontoparietal with cognition-language (CL) subset. (B) Whole-brain test–retest results. I2C2 scores for mean, variance, and rate-of-change TVFC summary measures. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Close modal

The k=3 extracted brain states from the WP, DCC-J, and SW-CV estimates are shown in Supplementary Materials Section S.5. Apart from the extracted brain states, we were interested in the dynamics of the transitions between brain states. The number of brain state change points for each method for each subject is shown in Figure 10. The median number of brain state switches during a scanning session was one, zero, and nine for the WP, DCC-J, and SW-CV methods, respectively. Intuitively, given that SW-CV estimates have a higher rate-of-change (Fig. 7), it estimates more brain state switches than the WP.

Fig. 10.

Brain state switch counts from brain state analysis of resting-state fMRI data from the Human Connectome Project. The raincloud plot of the number of brain state change points (switches between brain states) per TVFC estimation method includes the distribution of the number of switches; a boxplot showing the median, quartiles, and outliers; and a scatterplot below each boxplot representing individual participants. The total number of time points is N=1,200. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows.

Fig. 10.

Brain state switch counts from brain state analysis of resting-state fMRI data from the Human Connectome Project. The raincloud plot of the number of brain state change points (switches between brain states) per TVFC estimation method includes the distribution of the number of switches; a boxplot showing the median, quartiles, and outliers; and a scatterplot below each boxplot representing individual participants. The total number of time points is N=1,200. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows.

Close modal

3.3 Task-based fMRI benchmarks

We used a tb-fMRI benchmark to investigate whether estimates of the dynamic coupling (TVFC) of V1 with five other brain regions could predict the presence of an external visual stimulus. Similar to the rs-fMRI case, ground-truth knowledge of the true underlying covariance structures was absent. However, in contrast to rs-fMRI, external stimulation can shape the coupling of regional brain activity in a more predictable and controllable manner. We observed (as expected) that the correlation between V1 and the other regions decreased as we moved up and away from the visual cortex hierarchy (Di et al., 2015): connectivity strength with V2 was the highest, followed by V3, V4, mPFC; and it was lowest with M1 (see Supplementary Materials Section S.6).

Next, as we expected that this external stimulus would induce periodic covariance between brain regions, we investigated the predictive utility of each covariance structure estimate for recovering the stimulus waveform. The relationship between the external stimulus and TVFC estimates is plotted in Figure 11, where the trend and offset were removed from the estimates and the estimates were averaged across subjects (Fig. 11A). The extracted β parameters from the GLM are shown to quantify this relationship (Fig. 11B). Most of the edges were anticorrelated to the external stimulus, as evidenced by the negative learned values. The WP model had the largest (absolute) estimated task-related β parameters and had thus captured most of the external task. The within-visual cortex edges had the least predictive power, and the V1-mPFC and V1-M1 edges had the most. This may indicate that connectivity between distant brain regions is more informative of external task conditions, which may be related to the stronger connectivity between visual regions (Supplementary Fig. S11B). Additional coupling effects in these edges may be more difficult to detect than in those with lower baseline connectivity.

Fig. 11.

External stimulus prediction task-based fMRI benchmark results. (A) TVFC estimates for five edges between regions of interest (ROIs), averaged over 286 subjects from the Rockland dataset, after removing the (linear) trend and offset. Shaded areas indicate the presence of the external visual stimulus. (B) GLM β (beta) parameters per TVFC estimation method, indicating the learned weights. Values farther from zero indicate that the GLM used the stimulus regressor more. Wishart process (WP) estimates were the most useful for predicting the presence of the external stimulus. Significance one-sided test p-values (after Bonferroni correction); *: p<.05, **: p<.01, ***: p<.001. DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows; V1-V4, visual cortex regions; mPFC, medial prefrontal cortex; M1, primary motor cortex.

Fig. 11.

External stimulus prediction task-based fMRI benchmark results. (A) TVFC estimates for five edges between regions of interest (ROIs), averaged over 286 subjects from the Rockland dataset, after removing the (linear) trend and offset. Shaded areas indicate the presence of the external visual stimulus. (B) GLM β (beta) parameters per TVFC estimation method, indicating the learned weights. Values farther from zero indicate that the GLM used the stimulus regressor more. Wishart process (WP) estimates were the most useful for predicting the presence of the external stimulus. Significance one-sided test p-values (after Bonferroni correction); *: p<.05, **: p<.01, ***: p<.001. DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows; V1-V4, visual cortex regions; mPFC, medial prefrontal cortex; M1, primary motor cortex.

Close modal

3.4 Imputation benchmarks

A major difficulty in evaluating TVFC estimation methods is the lack of ground-truth knowledge of the covariance structures underlying observed time series. The imputation benchmark proposed here used observed time series directly as a ground truth to circumvent this issue. Data were split into equally sized training and test sets by removing every other scanning volume. Performance was assessed as the ability to predict unobserved test volumes from observed training volumes. Across the different datasets, we obtained results on this benchmark which corresponded to the performance metrics computed with knowledge of a ground truth or proxy thereof.

For the simulated data, we examined two cardinal cases using bivariate noiseless data. For the null covariance case (Fig. 12A), all methods performed similarly well on this benchmark (e.g., t-test between WP and sFC; p=.47). However, for a nonstatic covariance structure, such as the slowly oscillating periodic structure (Fig. 12B), we found that the sFC estimates performed much worse on the imputation benchmark than the other methods (e.g., t-test between WP and sFC; p=1.75×10106). These findings correspond to the prior results shown for the simulation benchmarks, where the prespecified ground-truth covariance structure was reconstructed.

Fig. 12.

Imputation benchmark results for the simulated data. The raincloud plot of test log likelihoods for bivariate data for N=400 time steps shows the distribution of test log likelihoods; a boxplot showing the median, quartiles, and outliers; and a scatterplot below each boxplot representing one of the T=200 trials. Higher test likelihoods were preferred. (A) Null covariance. (B) Periodic (slow) covariance. WP, Wishart process; DCC, dynamic conditional correlation; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Fig. 12.

Imputation benchmark results for the simulated data. The raincloud plot of test log likelihoods for bivariate data for N=400 time steps shows the distribution of test log likelihoods; a boxplot showing the median, quartiles, and outliers; and a scatterplot below each boxplot representing one of the T=200 trials. Higher test likelihoods were preferred. (A) Null covariance. (B) Periodic (slow) covariance. WP, Wishart process; DCC, dynamic conditional correlation; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Close modal

Next, we ran the imputation benchmark on rs-fMRI data from the HCP. The WP and SW-CV methods outperformed DCC-J and sFC on the whole-brain (all edges) imputation benchmark (Fig. 13). DCC-J estimates performed worse than sFC estimates, indicating a poor model fit. This would underwrite the interpretation of the high test–retest scores for its variance as a result of numerical instability rather than being indicative of having modeled the data well.

Fig. 13.

Imputation benchmark results on resting-state fMRI data from the Human Connectome Project. The raincloud plot of test log likelihoods, where higher test likelihoods are preferable, shows the distribution of test log likelihoods; a boxplot showing the median, quartiles, and outliers; and a scatterplot below each boxplot representing individual participants. The table on the right shows the effect sizes (Cohen’s D) and the t-test significance p-values. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Fig. 13.

Imputation benchmark results on resting-state fMRI data from the Human Connectome Project. The raincloud plot of test log likelihoods, where higher test likelihoods are preferable, shows the distribution of test log likelihoods; a boxplot showing the median, quartiles, and outliers; and a scatterplot below each boxplot representing individual participants. The table on the right shows the effect sizes (Cohen’s D) and the t-test significance p-values. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Close modal

There may be certain edges where WP performance was similar to that of the static approach (i.e., static edges), and some where it outperformed it (i.e., dynamic edges). Therefore, we also ran the imputation benchmark individually for each edge. These results (averaged over all subjects) are shown in Figure 14. Certain edges had high performance on this imputation benchmark, which may be because they were static and thus easier to fit. For example, test likelihoods were high for one of the FP(CL)-FP edges, which can be explained by the low variance and rate-of-change of this edge (see Fig. 7). However, we were interested in comparing estimation methods. The WP outperformed sFC more strongly for certain edges. This may be because these edges changed more over time, as indicated by the higher variance and rate-of-change summary measures for these edges (see Fig. 7).

Fig. 14.

Imputation benchmark edgewise results on resting-state fMRI data from the Human Connectome Project. The equivalent of mean test log likelihoods from Figure 13 is shown for each edge individually. Higher test likelihoods were preferred. For interpretation, ICA components were mapped to functional networks: V, visual with medial (M), occipital (O), lateral (L) subsets; DMN, default mode network; CBM, cerebellum; SM, sensorimotor; AUD, auditory; EC, executive control; FP, frontoparietal with cognition-language (CL) subset. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Fig. 14.

Imputation benchmark edgewise results on resting-state fMRI data from the Human Connectome Project. The equivalent of mean test log likelihoods from Figure 13 is shown for each edge individually. Higher test likelihoods were preferred. For interpretation, ICA components were mapped to functional networks: V, visual with medial (M), occipital (O), lateral (L) subsets; DMN, default mode network; CBM, cerebellum; SM, sensorimotor; AUD, auditory; EC, executive control; FP, frontoparietal with cognition-language (CL) subset. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Close modal

Applying the imputation benchmark to the tb-fMRI Rockland data showed strong performance for the WP and weak performance for DCC-J (Fig. 15). SW-CV outperformed sFC, but with a small effect size, indicative of its failure to capture the dynamics in these data.

Fig. 15.

Imputation benchmark results for task-based fMRI Rockland data. The raincloud plot of test log likelihoods for 286 subjects, where higher test likelihoods were preferred, shows the distribution of test log likelihoods; a boxplot showing the median, quartiles, and outliers; and a scatterplot below each boxplot representing individual participants. The table on the right shows the effect sizes (Cohen’s D) and the t-test significance test p-values. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Fig. 15.

Imputation benchmark results for task-based fMRI Rockland data. The raincloud plot of test log likelihoods for 286 subjects, where higher test likelihoods were preferred, shows the distribution of test log likelihoods; a boxplot showing the median, quartiles, and outliers; and a scatterplot below each boxplot representing individual participants. The table on the right shows the effect sizes (Cohen’s D) and the t-test significance test p-values. WP, Wishart process; DCC, dynamic conditional correlation, trained in joint (“-J”) fashion; SW-CV, cross-validated sliding windows; sFC, static functional connectivity.

Close modal

We investigated the utility of WPs as a natural method for TVFC estimation. Further, we introduced a data-driven method to determine the optimal window length in SW approaches. Method comparison and selection are intrinsically difficult because of the lack of access to the true covariance structure underlying neural activity. We proposed a range of prediction tasks to benchmark TVFC estimation methods and proposed a data-driven imputation benchmark. We found that the WP performed competitively.

We simulated BOLD time series representing putative covariance structures present in fMRI. We showed that the WP does not produce spurious estimates of covariance dynamics when the true covariance is static, while we replicated this well-known failure mode of the SW and DCC approaches (Lindquist et al., 2014; Lurie et al., 2020). This may be due to the WP’s strong prior over static covariance structures; its estimates will be static unless provided with sufficient evidence of the contrary. Further, SW methods estimate in a pairwise fashion and have less inherent protection against false positive estimation. They view each pair of time series independently from all other time series. The WP estimates outperformed the other methods in the case of a smoothly changing covariance structure. However, the WP was not particularly well suited for modeling covariance structures with sudden changes in connectivity strength. This could be attributed to the smoothness of the underlying GPs. The other methods considered (DCC and SW-CV) also failed in such cases. We found that the WP performed competitively when scaling to three dimensions, especially when the edges between the simulated time series had distinct covariance structures. Nonetheless, simulation studies have limited value as it remains unclear what covariance structures are present in real fMRI data.

We studied benchmarks using rs-fMRI data from the HCP. We found large qualitative differences in the TVFC estimates among the different estimation methods. We related summary measures of TVFC estimates with subject measures through a morphometricity analysis. The results for sFC (as a reference null model; the alternative hypothesis that FC is static) and the first summary measure (mean across time) of TVFC estimates replicated those of a previous study (Li, Kong, et al., 2019). The time-varying summary measures (variance and rate-of-change) contained information about subject measures, which justifies the study of TVFC beyond sFC. However, these simple summary measures may be limited characterizations of the dynamics of TVFC, and a more sophisticated mapping between TVFC estimates and subject measures may lead to an improved benchmark. WP and SW-CV estimates had the highest predictive power. However, they performed differently for different subject measures, which may indicate the capture of distinct elements of covariance structures. In terms of test–retest robustness, all methods performed similarly, but the variance of the DCC estimates and the rate-of-change of the SW-CV estimates were more robust across scanning sessions. However, the high robustness of the DCC estimates may be because the raw numerical values of the variance of the DCC estimates were close to zero. The test–retest measure may be unstable in cases where both within-subject and between-subject variances are small. Lastly, the brain state analysis demonstrated that the choice of TVFC estimation method impacts the number of brain state switches during a scan.

We used a tb-fMRI dataset from the Rockland sample with an external visual task that induced a covariance structure in the participants. The ability of covariance estimates to detect putative changes in interregion correlations induced by the stimulus was determined by regressing the stimulus onto the TVFC estimates. The WP estimates recovered the external stimulus better compared with the other methods, witnessed by learning larger weights for the stimulus in a GLM setup. This outperformance could be attributed to the small number of scan volumes. Bayesian methods generally perform well in small samples because they use priors, DCC is known to struggle in small samples, and SW-CV may not have seen enough data to find a reasonable window length, and is even more sensitive to outliers in such regimes. The highest predictive performance was found for the V1-M1 and V1-mPFC edges. V1 and M1 are anatomically farther apart than V1 and the other visual areas. They underlie functionally distinct processes, which may explain their anticorrelated interaction (Fox et al., 2005; Murphy et al., 2009). This may explain why the V1-M1 edge was most (anti)correlated with the external task conditions. It is possible that visual stimulus onset disrupts coactivity between V1 and M1 (Di et al., 2015). Further, Figure S11B indicates that the overall FC between V1 and the other visual areas was much stronger than with the mPFC and M1. The baseline strong static correlation between V1 and other visual areas may have masked any potential dynamic changes in connectivity induced by the stimulus. The same could be argued for the mPFC, potentially to a lesser degree. To investigate TVFC estimation beyond the simple external stimulus studied here, other tb-fMRI benchmarks with richer task structures should be included (Xie et al., 2019).

We found a correspondence between the estimation method performance on each benchmark (simulations, rs-fMRI, and tb-fMRI) and their respective performance on the respective imputation benchmarks. In other words, the WP performs well on this benchmark when it does so on the matched benchmark, and vice versa. This marks its exciting promise as a benchmark that can be used in realistic situations where no ground-truth covariance structure or proxies thereof (e.g., concurrent information, expert labeling, or additional collected participant information) are available.

Based on the simulation benchmark results, we posit that a method’s outperformance of the sFC estimate can be considered an indication of how time-varying the covariance structure is (but only if the method has successfully modeled the data). The TVFC estimation method performance was similar to that of sFC when either there was no or little dynamic structure present, or the methods failed to model it. However, the TVFC estimation methods outperformed sFC when there was a dynamic structure (if the method had successfully modeled it). In other words, we posit that outperformance over the sFC estimate is a proxy for a statistical test of whether there is any time-varying structure (Hindriks et al., 2016; Zalesky & Breakspear, 2015) and can be considered a null model study (Liégeois et al., 2021; Miller et al., 2018; Novelli & Razi, 2022). The imputation benchmark finding is consistent with this interpretation. Outperformance of the sFC estimate on this benchmark can also be considered a null model study. Therefore, practitioners are recommended to conduct this study prior to any further analysis, not only when in doubt about how to estimate TVFC, but also whether their data contain a strong dynamic component.

In addition to competitive benchmarking performance, the WP offers qualitative advantages, which may also explain its competitive performance. First, unlike SW, it is a model-based approach, which can lead to increased predictive power (Bassett et al., 2011), allowing for easier specification of inductive biases (Foti & Fox, 2019), and uncertainty modeling (Kudela et al., 2017). In general, a model that makes fewer assumptions about the data and extracts relevant hyperparameters automatically is preferred.

4.1 Limitations and future work

We must still consider the possibility of covariance structures in real data being organized as point processes of change points, where connectivity can suddenly and drastically change. All the considered models performed poorly on such data, for example, the simulated “state transition” dataset. State-based models such as HMMs are expected to perform better on such covariance structures. Change point detection algorithms can help determine whether such sudden changes exist (Cribben et al., 2012; Robinson et al., 2010). Within the WP framework, change point kernels can be included in models (Saatçi et al., 2010; Wilson, 2013). A key question is whether state transitions are expected in the covariance structures of real data. If methods that are better suited for detecting states outperform those that cannot capture change points well on real data benchmarks designed for this purpose, this would provide evidence for their existence.

Another limitation is that the utility of (rs-fMRI) test–retest studies is questionable. The classical measurement error framework assumes an immutable subject-specific property, whereas in FC studies, at least part of the functional architecture is expected to be different in different sessions, and thus less robust than more stable characteristics such as anatomical properties. One estimation method could extract subject-specific traits, such as participant age, whereas another could extract participant states, such as stress during the scan. It is not clear whether a TVFC estimation method that extracts stable subject- or session-specific information is preferred. This makes it a poorly framed exercise, and its interpretation in relation to the optimal method choice is unclear. Further, viewing this as a prediction problem, the test–retest problem can be stated as, when given the first scan, how well we can predict which scan is that subject’s subsequent scan. In this light, the method with the best test–retest score from any model feature can be considered stronger, such as the WP’s learned kernel parameters, for which we found high ICC scores. Although test–retest reliability may be desired, optimizing it for its own sake overlooks the goal of TVFC estimation. It has recently been argued that (behaviorally) predictive connectomes are more important than reliable connectomes (Finn & Rosenberg, 2021).

The WP implemented in this study can be augmented in several ways. First, one of the disadvantages of this model is its computational cost (Meng et al., 2023). To mitigate the computational cost due to large values of N, innovations in sparse GPs that remove the need for inducing points can be ported to WP models (Adam et al., 2020; Wilkinson et al., 2021). However, this may be more relevant in applications to EEG data, where the number of time steps is much larger. The model scales in computational time with respect to D in O(D3), owing to the required inversion of a D×D matrix. A low rank (or factored) implementation can help in situations where D is large (Fox et al., 2015; Heaukulani & van der Wilk, 2019). This assigns each time series in a weighted manner to KD clusters, and only learns the correlation structure between these clusters. Apart from reducing the number of model parameters, this can be considered as a dimensionality reduction step that posits an inherent network organization onto the brain. Second, while assuming the mean function μn to be zero(s) is fair if we subtract the empirical mean from the data, models that include an estimate of the mean function may yield better covariance estimates (Lan et al., 2020). Finally, by specifying the similarity between observations, the kernel choice imposes assumptions (inductive bias) on the behavior of the observed time series, such as covariance structure smoothness (Foti & Fox, 2019; Fyshe et al., 2012). Expressive kernels can be designed to incorporate domain knowledge (Gönen & Alpaydin, 2011). For example, a periodic kernel could model TVFC where scanning volumes at a certain periodic interval are more related, which could be useful in the case of periodic task stimuli. Although our simple kernel choice in this work resembles a tapered SW approach, more complex kernels can introduce structures that are unattainable by other methods.

Having a comprehensive benchmarking framework is crucial because it allows for rapid iterative improvements and verification of model adjustments. Additional benchmarks will increase robustness (e.g., across data collection routines, scanners, and demographics) and deepen our knowledge of the utility of estimation methods. As a priority, a carefully tailored prediction task could help assess the degree to which brain states are artifacts of the estimation method choice or valid constructs that accurately describe underlying brain dynamics. Finally, we must accept that perfect benchmarking is unattainable in practice and that a single estimation method will not outperform in all configurations. One way to increase the robustness of our scientific conclusions is to formalize the TVFC estimation method choice (and other choices) using a multiverse analysis framework (Bell & Kampman, 2021; Steegen et al., 2016). Such approaches have recently been demonstrated in neuroimaging data processing pipeline choices (Dafflon et al., 2022) and for exploring machine learning model hyperparameter spaces (Bell et al., 2022). This views each decision “universe” in parallel and repeats an analysis in each universe. If a conclusion holds in most universes, it can be considered more robust.

This study assessed the utility of WP models for TVFC estimation. We also proposed a more principled method for running SW by cross-validating its window length. We carefully considered how to compare TVFC estimation methods by designing a comprehensive suite of benchmarks, including simulations, subject measure prediction, test–retest studies, brain state analyses, and external task prediction. A novel imputation benchmark based on cross-validation that can be run on any dataset was introduced. We showed how the WP performed competitively in relation to baselines, including both DCC and SW approaches, outperforming them especially on the external task prediction benchmark while estimating fewer false positives in the TVFC null model. Finally, we recommend running the imputation benchmark on any newly collected dataset to determine the optimal TVFC estimation method.

Researchers can use the pip-installable, open-source Python package FCEst (https://github.com/OnnoKampman/FCEst) for TVFC estimation using WPs and the baselines discussed. Code for reproducing the benchmarking and their results can be found at https://github.com/OnnoKampman/FCEst-benchmarking. We used open-source Python libraries nilearn (Abraham et al., 2014), pingouin (Vallat, 2018), and GPflow (Matthews et al., 2017) and the R package rmgarch (Galanos, 2022). Human Connectome Project data are available at https://db.humanconnectome.org/ (requiring free registration). Rockland data are available at http://fcon_1000.projects.nitrc.org/indi/pro/nki.html (requiring free registration).

O.P.K. carried out the conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing, and editing. J.Z. carried out the conceptualization, data curation, formal analysis, methodology, validation, writing, and editing. S.A. carried out the conceptualization, data curation, formal analysis, methodology, software, validation, and editing. M.v.d.W. carried out the conceptualization, methodology, software, supervision, and editing. Z.K. carried out the conceptualization, funding acquisition, project administration, resources, supervision, and editing.

This work was supported by grants to Z.K. from the Wellcome Trust (grants 205067/Z/16/Z, 223131/Z/21/Z, 221633/Z/20/Z), the Royal Society (INF\R2\202107), the Biotechnology and Biological Sciences Research Council (grants H012508 and BB/P021255/1). This work was further supported by the European Union’s Horizon 2020 Marie Skłodowska-Curie Actions Initial Training Network (H2020-MSCA-ITN-2017) “DyViTo: Dynamics in Vision and Touch” (grant agreement 765121).

None.

This work used open-access data from the HCP and the NKI-Rockland sample. No identifying information related to participants from either dataset has been published or shared.

We thank Gustavo Deco, Emmanuel Stamatakis, Creighton Heaukulani, and Vasilis Karlaftis for their helpful feedback. We would also like to thank the reviewers for their helpful comments. Data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. For the purpose of open access, the authors have applied for a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Supplementary material for this article is available with the online version here: https://doi.org/10.1162/imag_a_00184.

Abraham
,
A.
,
Pedregosa
,
F.
,
Eickenberg
,
M.
,
Gervais
,
P.
,
Mueller
,
A.
,
Kossaifi
,
J.
,
Gramfort
,
A.
,
Thirion
,
B.
, &
Varoquaux
,
G.
(
2014
).
Machine learning for neuroimaging with scikit-learn
.
Frontiers in Neuroinformatics
,
8
,
14
. https://doi.org/10.3389/fninf.2014.00014
Abrol
,
A.
,
Chaze
,
C.
,
Damaraju
,
E.
, &
Calhoun
,
V. D.
(
2016
).
The chronnectome: Evaluating replicability of dynamic connectivity patterns in 7500 resting fMRI datasets
.
The 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS)
,
5571
5574
. https://doi.org/10.1109/EMBC.2016.7591989
Abrol
,
A.
,
Damaraju
,
E.
,
Miller
,
R. L.
,
Stephen
,
J. M.
,
Claus
,
E. D.
,
Mayer
,
A. R.
, &
Calhoun
,
V. D.
(
2017
).
Replicability of time-varying connectivity patterns in large resting state fMRI samples
.
NeuroImage
,
163
,
160
176
. https://doi.org/10.1016/j.neuroimage.2017.09.020
Adam
,
V.
,
Eleftheriadis
,
S.
,
Durrande
,
N.
,
Artemev
,
A.
, &
Hensman
,
J.
(
2020
). Doubly sparse variational Gaussian processes.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics
,
108
,
2874
2884
. https://proceedings.mlr.press/v108/adam20a.html
Ahrends
,
C.
,
Stevner
,
A.
,
Pervaiz
,
U.
,
Kringelbach
,
M. L.
,
Vuust
,
P.
,
Woolrich
,
M. W.
, &
Vidaurre
,
D.
(
2022
).
Data and model considerations for estimating time-varying functional connectivity in fMRI
.
NeuroImage
,
252
,
119026
. https://doi.org/10.1016/j.neuroimage.2022.119026
Allen
,
E. A.
,
Damaraju
,
E.
,
Plis
,
S. M.
,
Erhardt
,
E. B.
,
Eichele
,
T.
, &
Calhoun
,
V. D.
(
2014
).
Tracking whole-brain connectivity dynamics in the resting state
.
Cerebral Cortex
,
24
(
3
),
663
676
. https://doi.org/10.1093/cercor/bhs352
Arbabshirani
,
M. R.
,
Damaraju
,
E.
,
Phlypo
,
R.
,
Plis
,
S.
,
Allen
,
E. A.
,
Ma
,
S.
,
Mathalon
,
D.
,
Preda
,
A.
,
Vaidya
,
J. G.
,
Adalı
,
T.
, &
Calhoun
,
V. D.
(
2014
).
Impact of autocorrelation on functional connectivity
.
NeuroImage
,
102
(
P2
),
294
308
. https://doi.org/10.1016/j.neuroimage.2014.07.045
Barch
,
D. M.
,
Burgess
,
G. C.
,
Harms
,
M. P.
,
Petersen
,
S. E.
,
Schlaggar
,
B. L.
,
Corbetta
,
M.
,
Glasser
,
M. F.
,
Curtiss
,
S.
,
Dixit
,
S.
,
Feldt
,
C.
,
Nolan
,
D.
,
Bryant
,
E.
,
Hartley
,
T.
,
Footer
,
O.
,
Bjork
,
J. M.
,
Poldrack
,
R.
,
Smith
,
S.
,
Johansen-Berg
,
H.
,
Snyder
,
A. Z.
, &
Van Essen
,
D. C.
(
2013
).
Function in the human connectome: Task-fMRI and individual differences in behavior
.
NeuroImage
,
80
,
169
189
. https://doi.org/10.1016/j.neuroimage.2013.05.033
Bassett
,
D. S.
,
Wymbs
,
N. F.
,
Porter
,
M. A.
,
Mucha
,
P. J.
,
Carlson
,
J. M.
, &
Grafton
,
S. T.
(
2011
).
Dynamic reconfiguration of human brain networks during learning
.
Proceedings of the National Academy of Sciences of the United States of America
,
108
(
18
),
7641
7646
. https://doi.org/10.1073/pnas.1018985108
Bauer
,
M.
,
van der Wilk
,
M.
, &
Rasmussen
,
C. E.
(
2016
).
Understanding probabilistic sparse Gaussian Process approximations.
Advances in Neural Information Processing Systems
,
29
,
1533
1541
. https://doi.org/10.48550/arXiv.1606.04820
Beckmann
,
C. F.
, &
Smith
,
S. M.
(
2004
).
Probabilistic independent component analysis for functional magnetic resonance imaging
.
IEEE Transactions on Medical Imaging
,
23
(
2
),
137
152
. https://doi.org/10.1109/TMI.2003.822821
Bell
,
S. J.
, &
Kampman
,
O. P.
(
2021
).
Perspectives on machine learning from psychology’s reproducibility crisis
.
Proceedings of the 9th International Conference on Learning Representations
,
1
7
. https://doi.org/10.48550/arXiv.2104.08878
Bell
,
S. J.
,
Kampman
,
O. P.
,
Dodge
,
J.
, &
Lawrence
,
N. D.
(
2022
).
Modeling the machine learning multiverse
.
Advances in Neural Information Processing Systems
,
35
,
18416
18429
. https://doi.org/10.48550/arXiv.2206.05985
Bijsterbosch
,
J. D.
,
Valk
,
S. L.
,
Wang
,
D.
, &
Glasser
,
M. F.
(
2021
).
Recent developments in representations of the connectome
.
NeuroImage
,
243
,
118533
. https://doi.org/10.1016/j.neuroimage.2021.118533
Blei
,
D. M.
,
Kucukelbir
,
A.
, &
McAuliffe
,
J. D.
(
2017
).
Variational inference: A review for statisticians
.
Journal of the American Statistical Association
,
112
(
518
),
859
877
. https://doi.org/10.1080/01621459.2017.1285773
Bollerslev
,
T.
(
1986
).
Generalized autoregressive conditional heteroskedasticity
.
Journal of Econometrics
,
31
(
3
),
307
327
. https://doi.org/10.1016/0304-4076(86)90063-1
Bressler
,
S. L.
, &
Menon
,
V.
(
2010
).
Large-scale brain networks in cognition: Emerging methods and principles
.
Trends in Cognitive Sciences
,
14
(
6
),
277
290
. https://doi.org/10.1016/j.tics.2010.04.004
Bru
,
M. F.
(
1991
).
Wishart processes
.
Journal of Theoretical Probability
,
4
(
4
),
725
751
. https://doi.org/10.1007/BF01259552
Carroll
,
R. J.
,
Ruppert
,
D.
,
Stefanski
,
L. A.
, &
Crainiceanu
,
C. M.
(
2006
).
Measurement error in nonlinear models
.
Chapman and Hall/CRC
. https://doi.org/10.1201/9781420010138
Chang
,
C.
, &
Glover
,
G. H.
(
2010
).
Time-frequency dynamics of resting-state brain connectivity measured with fMRI
.
NeuroImage
,
50
(
1
),
81
98
. https://doi.org/10.1016/j.neuroimage.2009.12.011
Chen
,
G.
,
Taylor
,
P. A.
,
Haller
,
S. P.
,
Kircanski
,
K.
,
Stoddard
,
J.
,
Pine
,
D. S.
,
Leibenluft
,
E.
,
Brotman
,
M. A.
, &
Cox
,
R. W.
(
2018
).
Intraclass correlation: Improved modeling approaches and applications for neuroimaging
.
Human Brain Mapping
,
39
(
3
),
1187
1206
. https://doi.org/10.1002/hbm.23909
Choe
,
A. S.
,
Nebel
,
M. B.
,
Barber
,
A. D.
,
Cohen
,
J. R.
,
Xu
,
Y.
,
Pekar
,
J. J.
,
Caffo
,
B. S.
, &
Lindquist
,
M. A.
(
2017
).
Comparing test-retest reliability of dynamic functional connectivity methods
.
NeuroImage
,
158
,
155
175
. https://doi.org/10.1016/j.neuroimage.2017.07.005
Cohen
,
J. R.
(
2018
).
The behavioral and cognitive relevance of time-varying, dynamic changes in functional connectivity
.
NeuroImage
,
180
,
515
525
. https://doi.org/10.1016/j.neuroimage.2017.09.036
Cribben
,
I.
,
Haraldsdottir
,
R.
,
Atlas
,
L. Y.
,
Wager
,
T. D.
, &
Lindquist
,
M. A.
(
2012
).
Dynamic connectivity regression: Determining state-related changes in brain connectivity
.
NeuroImage
,
61
(
4
),
907
920
. https://doi.org/10.1016/j.neuroimage.2012.03.070
Dafflon
,
J.
,
Costa
Da
,
F.
P.
,
Váša
,
F.
,
Monti
,
R. P.
,
Bzdok
,
D.
,
Hellyer
,
P. J.
,
Turkheimer
,
F.
,
Smallwood
,
J.
,
Jones
,
E.
, &
Leech
,
R.
(
2022
).
A guided multiverse study of neuroimaging analyses
.
Nature Communications
,
13
(
1
),
1
13
. https://doi.org/10.1038/s41467-022-31347-8
Deco
,
G.
,
Jirsa
,
V. K.
, &
McIntosh
,
A. R.
(
2011
).
Emerging concepts for the dynamical organization of resting-state activity in the brain
.
Nature Reviews Neuroscience
,
12
(
1
),
43
56
. https://doi.org/10.1038/nrn2961
Deco
,
G.
,
Jirsa
,
V. K.
,
McIntosh
,
A. R.
,
Sporns
,
O.
, &
Kötter
,
R.
(
2009
).
Key role of coupling, delay, and noise in resting brain fluctuations
.
Proceedings of the National Academy of Sciences of the United States of America
,
106
(
25
),
10302
10307
. https://doi.org/10.1073/pnas.0901831106
Deco
,
G.
,
Tononi
,
G.
,
Boly
,
M.
, &
Kringelbach
,
M. L.
(
2015
).
Rethinking segregation and integration: Contributions of whole-brain modelling
.
Nature Reviews Neuroscience
,
16
(
7
),
430
439
. https://doi.org/10.1038/nrn3963
Demirtaş
,
M.
,
Tornador
,
C.
,
Falcón
,
C.
,
López-Solà
,
M.
,
Hernández-Ribas
,
R.
,
Pujol
,
J.
,
Menchón
,
J. M.
,
Ritter
,
P.
,
Cardoner
,
N.
,
Soriano-Mas
,
C.
, &
Deco
,
G.
(
2016
).
Dynamic functional connectivity reveals altered variability in functional connectivity among patients with major depressive disorder
.
Human Brain Mapping
,
37
(
8
),
2918
2930
. https://doi.org/10.1002/hbm.23215
Dhamala
,
E.
,
Jamison
,
K. W.
,
Jaywant
,
A.
,
Dennis
,
S.
, &
Kuceyeski
,
A.
(
2021
).
Distinct functional and structural connections predict crystallised and fluid cognition in healthy adults
.
Human Brain Mapping
,
42
(
10
),
3102
3118
. https://doi.org/10.1002/hbm.25420
Di
,
X.
,
Fu
,
Z.
,
Chan
,
S. C.
,
Hung
,
Y. S.
,
Biswal
,
B. B.
, &
Zhang
,
Z.
(
2015
).
Task-related functional connectivity dynamics in a block-designed visual experiment
.
Frontiers in Human Neuroscience
,
9
,
543
. https://doi.org/10.3389/fnhum.2015.00543
Ebrahimi
,
M.
,
Calarco
,
N.
,
Hawco
,
C.
,
Voineskos
,
A.
, &
Khisti
,
A.
(
2023
).
Time-resolved fMRI shared response model using Gaussian process factor analysis
.
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
. https://doi.org/10.1109/ICASSP49357.2023.10096325
Elam
,
J. S.
,
Glasser
,
M. F.
,
Harms
,
M. P.
,
Sotiropoulos
,
S. N.
,
Andersson
,
J. L. R.
,
Burgess
,
G. C.
,
Curtiss
,
S. W.
,
Oostenveld
,
R.
,
Larson-Prior
,
L. J.
,
Schoffelen
,
J. M.
,
Hodge
,
M. R.
,
Cler
,
E. A.
,
Marcus
,
D. M.
,
Barch
,
D. M.
,
Yacoub
,
E.
,
Smith
,
S. M.
,
Ugurbil
,
K.
, &
Van Essen
,
D. C.
(
2021
).
The Human Connectome Project: A retrospective
.
NeuroImage
,
244
,
118543
. https://doi.org/10.1016/j.neuroimage.2021.118543
Elliott
,
M. L.
,
Knodt
,
A. R.
,
Ireland
,
D.
,
Morris
,
M. L.
,
Poulton
,
R.
,
Ramrakha
,
S.
,
Sison
,
M. L.
,
Moffitt
,
T. E.
,
Caspi
,
A.
, &
Hariri
,
A. R.
(
2020
).
What is the test-retest reliability of common task-functional MRI measures? New empirical evidence and a meta-analysis
.
Psychological Science
,
31
(
7
),
792
806
. https://doi.org/10.1177/0956797620916786
Engle
,
R. F.
(
1982
).
Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation
.
Econometrica
,
50
(
4
),
987
1007
. https://doi.org/10.2307/1912773
Engle
,
R. F.
(
2002
).
Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models
.
Journal of Business and Economic Statistics
,
20
(
3
),
339
350
. https://doi.org/10.1198/073500102288618487
Finn
,
E. S.
, &
Rosenberg
,
M. D.
(
2021
).
Beyond fingerprinting: Choosing predictive connectomes over reliable connectomes
.
NeuroImage
,
239
,
118254
. https://doi.org/10.1016/j.neuroimage.2021.118254
Finn
,
E. S.
,
Shen
,
X.
,
Scheinost
,
D.
,
Rosenberg
,
M. D.
,
Huang
,
J.
,
Chun
,
M. M.
,
Papademetris
,
X.
, &
Constable
,
R. T.
(
2015
).
Functional connectome fingerprinting: Identifying individuals using patterns of brain connectivity
.
Nature Neuroscience
,
18
(
11
),
1664
1671
. https://doi.org/10.1038/nn.4135
Foti
,
N. J.
, &
Fox
,
E. B.
(
2019
).
Statistical model-based approaches for functional connectivity analysis of neuroimaging data
.
Current Opinion in Neurobiology
,
55
,
48
54
. https://doi.org/10.1016/j.conb.2019.01.009
Fox
,
E. B.
,
Dunson
,
D. B.
, &
Airoldi
,
E. M.
(
2015
).
Bayesian nonparametric covariance regression
.
Journal of Machine Learning Research
,
16
,
2501
2542
. https://www.jmlr.org/papers/v16/fox15a.html
Fox
,
M. D.
, &
Raichle
,
M. E.
(
2007
).
Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging
.
Nature Reviews Neuroscience
,
8
(
9
),
700
711
. https://doi.org/10.1038/nrn2201
Fox
,
M. D.
,
Snyder
,
A. Z.
,
Vincent
,
J. L.
,
Corbetta
,
M.
,
Van Essen
,
D. C.
, &
Raichle
,
M. E.
(
2005
).
The human brain is intrinsically organized into dynamic, anticorrelated functional networks
.
Proceedings of the National Academy of Sciences of the United States of America
,
102
(
27
),
9673
9678
. https://doi.org/10.1073/pnas.0504136102
Fyshe
,
A.
,
Fox
,
E. B.
,
Dunson
,
D. B.
, &
Mitchell
,
T.
(
2012
).
Hierarchical latent dictionaries for models of brain activation
.
Journal of Machine Learning Research
,
22
,
409
421
. http://proceedings.mlr.press/v22/fyshe12.html
Galanos
,
A.
(
2022
).
Multivariate GARCH models (R package)
. https://cran.r-project.org/web/packages/rmgarch/rmgarch.pdf
Giorgio
,
J.
,
Karlaftis
,
V. M.
,
Wang
,
R.
,
Shen
,
Y.
,
Tino
,
P.
,
Welchman
,
A.
, &
Kourtzi
,
Z.
(
2018
).
Functional brain networks for learning predictive statistics
.
Cortex
,
107
,
204
219
. https://doi.org/10.1016/j.cortex.2017.08.014
Glasser
,
M. F.
,
Smith
,
S. M.
,
Marcus
,
D. S.
,
Andersson
,
J. L. R.
,
Auerbach
,
E. J.
,
Behrens
,
T. E. J.
,
Coalson
,
T. S.
,
Harms
,
M. P.
,
Jenkinson
,
M.
,
Moeller
,
S.
,
Robinson
,
E. C.
,
Sotiropoulos
,
S. N.
,
Xu
,
J.
,
Yacoub
,
E.
,
Ugurbil
,
K.
, &
Van Essen
,
D. C.
(
2016
).
The Human Connectome Project’s neuroimaging approach
.
Nature Neuroscience
,
19
(
9
),
1175
1187
. https://doi.org/10.1038/nn.4361
Glasser
,
M. F.
,
Sotiropoulos
,
S. N.
,
Wilson
,
J. A.
,
Coalson
,
T. S.
,
Fischl
,
B.
,
Andersson
,
J. L. R.
,
Xu
,
J.
,
Jbabdi
,
S.
,
Webster
,
M.
,
Polimeni
,
J. R.
,
Van Essen
,
D. C.
, &
Jenkinson
,
M.
(
2013
).
The minimal preprocessing pipelines for the Human Connectome Project
.
NeuroImage
,
80
,
105
124
. https://doi.org/10.1016/j.neuroimage.2013.04.127
Glerean
,
E.
,
Salmi
,
J.
,
Lahnakoski
,
J. M.
,
Jääskeläinen
,
I. P.
, &
Sams
,
M.
(
2012
).
Functional magnetic resonance imaging phase synchronization as a measure of dynamic functional connectivity
.
Brain Connectivity
,
2
(
2
),
91
101
. https://doi.org/10.1089/brain.2011.0068
Glover
,
G. H.
(
1999
).
Deconvolution of impulse response in event-related BOLD fMRI
.
NeuroImage
,
9
(
4
),
416
429
. https://doi.org/10.1006/nimg.1998.0419
Gönen
,
M.
, &
Alpaydin
,
E.
(
2011
).
Multiple kernel learning algorithms
.
Journal of Machine Learning Research
,
12
,
2211
2268
. https://www.jmlr.org/papers/volume12/gonen11a/gonen11a.pdf
Gonzalez-Castillo
,
J.
, &
Bandettini
,
P. A.
(
2018
).
Task-based dynamic functional connectivity: Recent findings and open questions
.
NeuroImage
,
180
,
526
533
. https://doi.org/10.1016/j.neuroimage.2017.08.006
Greene
,
A. S.
,
Gao
,
S.
,
Scheinost
,
D.
, &
Constable
,
R. T.
(
2018
).
Task-induced brain state manipulation improves prediction of individual traits
.
Nature Communications
,
9
(
1
),
2807
. https://doi.org/10.1038/s41467-018-04920-3
Griffanti
,
L.
,
Salimi-Khorshidi
,
G.
,
Beckmann
,
C. F.
,
Auerbach
,
E. J.
,
Douaud
,
G.
,
Sexton
,
C. E.
,
Zsoldos
,
E.
,
Ebmeier
,
K. P.
,
Filippini
,
N.
,
Mackay
,
C. E.
,
Moeller
,
S.
,
Xu
,
J.
,
Yacoub
,
E.
,
Baselli
,
G.
,
Ugurbil
,
K.
,
Miller
,
K. L.
, &
Smith
,
S. M.
(
2014
).
ICA-based artefact removal and accelerated fMRI acquisition for improved resting state network imaging
.
NeuroImage
,
95
,
232
247
. https://doi.org/10.1016/j.neuroimage.2014.03.034
Hakimdavoodi
,
H.
, &
Amirmazlaghani
,
M.
(
2020
).
Using autoregressive-dynamic conditional correlation model with residual analysis to extract dynamic functional connectivity
.
Journal of Neural Engineering
,
17
(
3
),
035008
. https://doi.org/10.1088/1741-2552/ab965b
Harville
,
D. A.
(
1977
).
Maximum likelihood approaches to variance component estimation and to related problems
.
Journal of the American Statistical Association
,
72
(
358
),
320
338
. https://doi.org/10.1080/01621459.1977.10480998
Heaukulani
,
C.
, &
van der Wilk
,
M.
(
2019
).
Scalable Bayesian dynamic covariance modeling with variational Wishart and inverse Wishart processes
.
Advances in Neural Information Processing Systems
,
32
,
4584
4594
. http://arxiv.org/abs/1906.09360
Hindriks
,
R.
,
Adhikari
,
M. H.
,
Murayama
,
Y.
,
Ganzetti
,
M.
,
Mantini
,
D.
,
Logothetis
,
N. K.
, &
Deco
,
G.
(
2016
).
Can sliding-window correlations reveal dynamic functional connectivity in resting-state fMRI
?
NeuroImage
,
127
,
242
256
. https://doi.org/10.1016/j.neuroimage.2015.11.055
Honari
,
H.
,
Choe
,
A. S.
,
Pekar
,
J. J.
, &
Lindquist
,
M. A.
(
2019
).
Investigating the impact of autocorrelation on time-varying connectivity
.
NeuroImage
,
197
,
37
48
. https://doi.org/10.1016/j.neuroimage.2019.04.042
Hutchison
,
R. M.
, &
Morton
,
J. B.
(
2015
).
Tracking the brain’s functional coupling dynamics over development
.
Journal of Neuroscience
,
35
(
17
),
6849
6859
. https://doi.org/10.1523/JNEUROSCI.4638-14.2015
Hutchison
,
R. M.
,
Womelsdorf
,
T.
,
Gati
,
J. S.
,
Everling
,
S.
, &
Menon
,
R. S.
(
2013
).
Resting-state networks show dynamic functional connectivity in awake humans and anesthetized macaques
.
Human Brain Mapping
,
34
(
9
),
2154
2177
. https://doi.org/10.1002/hbm.22058
Jenkinson
,
M.
,
Beckmann
,
C. F.
,
Behrens
,
T. E. J.
,
Woolrich
,
M. W.
, &
Smith
,
S. M.
(
2012
).
FSL
.
NeuroImage
,
62
(
2
),
782
790
. https://doi.org/10.1016/J.NEUROIMAGE.2011.09.015
Keilholz
,
S. D.
,
Caballero-Gaudes
,
C.
,
Bandettini
,
P. A.
,
Deco
,
G.
, &
Calhoun
,
V. D.
(
2017
).
Time-resolved resting-state functional magnetic resonance imaging analysis: Current status, challenges, and new directions
.
Brain Connectivity
,
7
(
8
),
465
481
. https://doi.org/10.1089/brain.2017.0543
Kingma
,
D. P.
, &
Ba
,
J. L.
(
2015
).
Adam: A method for stochastic optimization
.
Proceedings of the 3rd International Conference on Learning Representations
,
1
15
. https://arxiv.org/pdf/1412.6980
Kong
,
R.
,
Li
,
J.
,
Orban
,
C.
,
Sabuncu
,
M. R.
,
Liu
,
H.
,
Schaefer
,
A.
,
Sun
,
N.
,
Zuo
,
X. N.
,
Holmes
,
A. J.
,
Eickhoff
,
S. B.
, &
Yeo
,
B. T. T.
(
2019
).
Spatial topography of individual-specific cortical networks predicts human cognition, personality, and emotion
.
Cerebral Cortex
,
29
(
6
),
2533
2551
. https://doi.org/10.1093/cercor/bhy123
Kringelbach
,
M. L.
, &
Deco
,
G.
(
2020
).
Brain states and transitions: Insights from computational neuroscience
.
Cell Reports
,
32
(
10
),
108128
. https://doi.org/10.1016/j.celrep.2020.108128
Kudela
,
M.
,
Harezlak
,
J.
, &
Lindquist
,
M. A.
(
2017
).
Assessing uncertainty in dynamic functional connectivity
.
NeuroImage
,
149
,
165
177
. https://doi.org/10.1016/j.neuroimage.2017.01.056
Laird
,
A. R.
,
Fox
,
P. M.
,
Eickhoff
,
S. B.
,
Turner
,
J. A.
,
Ray
,
K. L.
,
Mckay
,
D. R.
,
Glahn
,
D. C.
,
Beckmann
,
C. F.
,
Smith
,
S. M.
, &
Fox
,
P. T.
(
2011
).
Behavioral interpretations of intrinsic connectivity networks
.
Journal of Cognitive Neuroscience
,
23
(
12
),
4022
4037
. https://doi.org/10.1162/jocn_a_00077
Lan
,
S.
,
Holbrook
,
A.
,
Elias
,
G. A.
,
Fortin
,
N. J.
,
Ombao
,
H.
, &
Shahbaba
,
B.
(
2020
).
Flexible Bayesian dynamic modeling of correlation and covariance matrices
.
Bayesian Analysis
,
15
(
4
),
1199
1228
. https://doi.org/10.1214/19-ba1173
Leenings
,
R.
,
Winter
,
N. R.
,
Dannlowski
,
U.
, &
Hahn
,
T.
(
2022
).
Recommendations for machine learning benchmarks in neuroimaging
.
NeuroImage
,
257
,
119298
. https://doi.org/10.1016/j.neuroimage.2022.119298
Leonardi
,
N.
, &
Van De Ville
,
D.
(
2015
).
On spurious and real fluctuations of dynamic functional connectivity during rest
.
NeuroImage
,
104
,
430
436
. https://doi.org/10.1016/j.neuroimage.2014.09.007
Li
,
J.
,
Kong
,
R.
,
Liégeois
,
R.
,
Orban
,
C.
,
Tan
,
Y.
,
Sun
,
N.
,
Holmes
,
A. J.
,
Sabuncu
,
M. R.
,
Ge
,
T.
, &
Yeo
,
B. T. T.
(
2019
).
Global signal regression strengthens association between resting-state functional connectivity and behavior
.
NeuroImage
,
196
,
126
141
. https://doi.org/10.1016/j.neuroimage.2019.04.016
Li
,
L.
,
Pluta
,
D.
,
Shahbaba
,
B.
,
Fortin
,
N. J.
,
Ombao
,
H.
, &
Baldi
,
P.
(
2019
).
Modeling dynamic functional connectivity with latent factor Gaussian processes
.
Advances in Neural Information Processing Systems
,
32
,
8263
8273
. https://doi.org/10.1214/19-ba1173
Liégeois
,
R.
,
Li
,
J.
,
Kong
,
R.
,
Orban
,
C.
,
Van De Ville
,
D.
,
Ge
,
T.
,
Sabuncu
,
M. R.
, &
Yeo
,
B. T. T.
(
2019
).
Resting brain dynamics at different timescales capture distinct aspects of human behavior
.
Nature Communications
,
10
(
1
),
1
9
. https://doi.org/10.1038/s41467-019-10317-7
Liégeois
,
R.
,
Yeo
,
B. T. T.
, &
Van De Ville
,
D.
(
2021
).
Interpreting null models of resting-state functional MRI dynamics: Not throwing the model out with the hypothesis
.
NeuroImage
,
243
,
118518
. https://doi.org/10.1016/j.neuroimage.2021.118518
Lindquist
,
M. A.
,
Xu
,
Y.
,
Nebel
,
M. B.
, &
Caffo
,
B. S.
(
2014
).
Evaluating dynamic bivariate correlations in resting-state fMRI: A comparison study and a new approach
.
NeuroImage
,
101
,
531
546
. https://doi.org/10.1016/j.neuroimage.2014.06.052
Lloyd
,
S. P.
(
1982
).
Least squares quantization in PCM
.
IEEE Transactions on Information Theory
,
28
(
2
),
129
137
. https://doi.org/10.1109/TIT.1982.1056489
Lurie
,
D. J.
,
Kessler
,
D.
,
Bassett
,
D. S.
,
Betzel
,
R. F.
,
Breakspear
,
M.
,
Kheilholz
,
S.
,
Kucyi
,
A.
,
Liégeois
,
R.
,
Lindquist
,
M. A.
,
McIntosh
,
A. R.
,
Poldrack
,
R. A.
,
Shine
,
J. M.
,
Thompson
,
W. H.
,
Bielczyk
,
N. Z.
,
Douw
,
L.
,
Kraft
,
D.
,
Miller
,
R. L.
,
Muthuraman
,
M.
,
Pasquini
,
L.
, …
Calhoun
,
V. D.
(
2020
).
Questions and controversies in the study of time-varying functional connectivity in resting fMRI
.
Network Neuroscience
,
4
(
1
),
30
69
. https://doi.org/10.1162/netn_a_00116
Matthews
,
A. G. de G.
,
van der Wilk
,
M.
,
Nickson
,
T.
,
Fujii
,
K.
,
Boukouvalas
,
A.
,
León-Villagrá
,
P.
,
Ghahramani
,
Z.
, &
Hensman
,
J.
(
2017
).
GPflow: A Gaussian process library using TensorFlow
.
Journal of Machine Learning Research
,
18
,
1
6
. http://jmlr.org/papers/v18/16-537.html
Meng
,
R.
,
Yang
,
F.
, &
Kim
,
W. H.
(
2023
).
Dynamic covariance estimation via predictive Wishart process with an application on brain connectivity estimation
.
Computational Statistics and Data Analysis
,
185
,
107763
. https://doi.org/10.1016/j.csda.2023.107763
Miller
,
R. L.
,
Abrol
,
A.
,
Adalı
,
T.
,
Levin-Schwarz
,
Y.
, &
Calhoun
,
V. D.
(
2018
).
Resting-State fMRI dynamics and null models: Perspectives, sampling variability, and simulations
.
Frontiers in Neuroscience
,
12
,
551
. https://doi.org/10.3389/fnins.2018.00551
Murphy
,
K.
,
Birn
,
R. M.
,
Handwerker
,
D. A.
,
Jones
,
T. B.
, &
Bandettini
,
P. A.
(
2009
).
The impact of global signal regression on resting state correlations: Are anti-correlated networks introduced
?
NeuroImage
,
44
(
3
),
893
905
. https://doi.org/10.1016/J.NEUROIMAGE.2008.09.036
Nielsen
,
S. F. V.
,
Madsen
,
K. H.
,
Schmidt
,
M. N.
, &
Morup
,
M.
(
2017
).
Modeling dynamic functional connectivity using a Wishart mixture model
.
International Workshop on Pattern Recognition in Neuroimaging (PRNI)
,
1
4
. https://doi.org/10.1109/PRNI.2017.7981505
Noble
,
S.
,
Scheinost
,
D.
, &
Constable
,
R. T.
(
2019
).
A decade of test-retest reliability of functional connectivity: A systematic review and meta-analysis
.
NeuroImage
,
203
,
116157
. https://doi.org/10.1016/j.neuroimage.2019.116157
Nooner
,
K. B.
,
Colcombe
,
S. J.
,
Tobe
,
R. H.
,
Mennes
,
M.
,
Benedict
,
M. M.
,
Moreno
,
A. L.
,
Panek
,
L. J.
,
Brown
,
S.
,
Zavitz
,
S. T.
,
Li
,
Q.
,
Sikka
,
S.
,
Gutman
,
D.
,
Bangaru
,
S.
,
Schlachter
,
R. T.
,
Kamiel
,
S. M.
,
Anwar
,
A. R.
,
Hinz
,
C. M.
,
Kaplan
,
M. S.
,
Rachlin
,
A. B.
, …
Milham
,
M. P.
(
2012
).
The NKI-Rockland sample: A model for accelerating the pace of discovery science in psychiatry
.
Frontiers in Neuroscience
,
6
,
152
. https://doi.org/10.3389/fnins.2012.00152
Novelli
,
L.
, &
Razi
,
A.
(
2022
).
A mathematical perspective on edge-centric brain functional connectivity
.
Nature Communications
,
13
(
1
),
2693
. https://doi.org/10.1038/s41467-022-29775-7
Pruim
,
R. H. R.
,
Mennes
,
M.
,
van Rooij
,
D.
,
Llera
,
A.
,
Buitelaar
,
J. K.
, &
Beckmann
,
C. F.
(
2015
).
ICA-AROMA: A robust ICA-based strategy for removing motion artifacts from fMRI data
.
NeuroImage
,
112
,
267
277
. https://doi.org/10.1016/j.neuroimage.2015.02.064
Rasmussen
,
C. E.
, &
Williams
,
C. K. I.
(
2005
).
Gaussian processes for machine learning
.
The MIT Press
. https://doi.org/10.7551/mitpress/3206.001.0001
Robinson
,
L. F.
,
Wager
,
T. D.
, &
Lindquist
,
M. A.
(
2010
).
Change point estimation in multi-subject fMRI studies
.
NeuroImage
,
49
(
2
),
1581
1592
. https://doi.org/10.1016/j.neuroimage.2009.08.061
Saatçi
,
Y.
,
Turner
,
R.
, &
Rasmussen
,
C. E.
(
2010
).
Gaussian process change point models
.
Proceedings of the 27th International Conference on Machine Learning
,
927
934
. https://icml.cc/Conferences/2010/papers/170.pdf
Sabuncu
,
M. R.
,
Ge
,
T.
,
Holmes
,
A. J.
,
Smoller
,
J. W.
,
Buckner
,
R. L.
, &
Fischl
,
B.
(
2016
).
Morphometricity as a measure of the neuroanatomical signature of a trait
.
Proceedings of the National Academy of Sciences of the United States of America
,
113
(
39
),
E5749
E5756
. https://doi.org/10.1073/pnas.1604378113
Sahib
,
A. K.
,
Erb
,
M.
,
Marquetand
,
J.
,
Martin
,
P.
,
Elshahabi
,
A.
,
Klamer
,
S.
,
Vulliemoz
,
S.
,
Scheffler
,
K.
,
Ethofer
,
T.
, &
Focke
,
N. K.
(
2018
).
Evaluating the impact of fast-fMRI on dynamic functional connectivity in an event-based paradigm
.
PLoS One
,
13
(
1
),
1
15
. https://doi.org/10.1371/journal.pone.0190480
Sakoğlu
,
Ü.
,
Pearlson
,
G. D.
,
Kiehl
,
K. A.
,
Wang
,
Y. M.
,
Michael
,
A. M.
, &
Calhoun
,
V. D.
(
2010
).
A method for evaluating dynamic functional network connectivity and task-modulation: Application to schizophrenia
.
Magnetic Resonance Materials in Physics, Biology and Medicine
,
23
(
5–6
),
351
366
. https://doi.org/10.1007/s10334-010-0197-8
Salimi-Khorshidi
,
G.
,
Douaud
,
G.
,
Beckmann
,
C. F.
,
Glasser
,
M. F.
,
Griffanti
,
L.
, &
Smith
,
S. M.
(
2014
).
Automatic denoising of functional MRI data: Combining independent component analysis and hierarchical fusion of classifiers
.
NeuroImage
,
90
,
449
468
. https://doi.org/10.1016/j.neuroimage.2013.11.046
Shakil
,
S.
,
Lee
,
C. H.
, &
Keilholz
,
S. D.
(
2016
).
Evaluation of sliding window correlation performance for characterizing dynamic functional connectivity and brain states
.
NeuroImage
,
133
,
111
128
. https://doi.org/10.1016/j.neuroimage.2016.02.074
Shine
,
J. M.
,
Bissett
,
P. G.
,
Bell
,
P. T.
,
Koyejo
,
O.
,
Balsters
,
J. H.
,
Gorgolewski
,
K. J.
,
Moodie
,
C. A.
, &
Poldrack
,
R. A.
(
2016
).
The cynamics of functional brain networks: Integrated network states during cognitive task performance
.
Neuron
,
92
(
2
),
544
554
. https://doi.org/10.1016/j.neuron.2016.09.018
Shine
,
J. M.
,
Koyejo
,
O.
,
Bell
,
P. T.
,
Gorgolewski
,
K. J.
,
Gilat
,
M.
, &
Poldrack
,
R. A.
(
2015
).
Estimation of dynamic functional connectivity using Multiplication of Temporal Derivatives
.
NeuroImage
,
122
,
399
407
. https://doi.org/10.1016/j.neuroimage.2015.07.064
Shirer
,
W. R.
,
Ryali
,
S.
,
Rykhlevskaia
,
E.
,
Menon
,
V.
, &
Greicius
,
M. D.
(
2012
).
Decoding subject-driven cognitive states with whole-brain connectivity patterns
.
Cerebral Cortex
,
22
(
1
),
158
165
. https://doi.org/10.1093/cercor/bhr099
Shou
,
H.
,
Eloyan
,
A.
,
Lee
,
S.
,
Zipunnikov
,
V.
,
Crainiceanu
,
A. N.
,
Nebel
,
M. B.
,
Caffo
,
B.
,
Lindquist
,
M. A.
, &
Crainiceanu
,
C. M.
(
2013
).
Quantifying the reliability of image replication studies: The image intraclass correlation coefficient (I2C2)
.
Cognitive, Affective and Behavioral Neuroscience
,
13
(
4
),
714
724
. https://doi.org/10.3758/s13415-013-0196-0
Shrout
,
P. E.
, &
Fleiss
,
J. L.
(
1979
).
Intraclass correlations: Uses in assessing rater reliability
.
Psychological Bulletin
,
86
(
2
),
420
428
. https://doi.org/10.1037/0033-2909.86.2.420
Smith
,
S. M.
,
Beckmann
,
C. F.
,
Andersson
,
J. L. R.
,
Auerbach
,
E. J.
,
Bijsterbosch
,
J. D.
,
Douaud
,
G.
,
Duff
,
E.
,
Feinberg
,
D. A.
,
Griffanti
,
L.
,
Harms
,
M. P.
,
Kelly
,
M.
,
Laumann
,
T. O.
,
Miller
,
K. L.
,
Moeller
,
S.
,
Petersen
,
S.
,
Power
,
J.
,
Salimi-Khorshidi
,
G.
,
Snyder
,
A. Z.
,
Vu
,
A. T.
, …
Glasser
,
M. F.
(
2013
).
Resting-state fMRI in the Human Connectome Project
.
NeuroImage
,
80
,
144
168
. https://doi.org/10.1016/j.neuroimage.2013.05.039
Smith
,
S. M.
,
Fox
,
P. T.
,
Miller
,
K. L.
,
Glahn
,
D. C.
,
Fox
,
P. M.
,
Mackay
,
C. E.
,
Filippini
,
N.
,
Watkins
,
K. E.
,
Toro
,
R.
,
Laird
,
A. R.
, &
Beckmann
,
C. F.
(
2009
).
Correspondence of the brain’s functional architecture during activation and rest
.
Proceedings of the National Academy of Sciences of the United States of America
,
106
(
31
),
13040
13045
. https://doi.org/10.1073/pnas.0905267106
Smith
,
S. M.
,
Nichols
,
T. E.
,
Vidaurre
,
D.
,
Winkler
,
A. M.
,
Behrens
,
T. E. J.
,
Glasser
,
M. F.
,
Ugurbil
,
K.
,
Barch
,
D. M.
,
Van Essen
,
D. C.
, &
Miller
,
K. L.
(
2015
).
A positive-negative mode of population covariation links brain connectivity, demographics and behavior
.
Nature Neuroscience
,
18
(
11
),
1565
1567
. https://doi.org/10.1038/nn.4125
Smith
,
S. M.
,
Vidaurre
,
D.
,
Beckmann
,
C. F.
,
Glasser
,
M. F.
,
Jenkinson
,
M.
,
Miller
,
K. L.
,
Nichols
,
T. E.
,
Robinson
,
E. C.
,
Salimi-Khorshidi
,
G.
,
Woolrich
,
M. W.
,
Barch
,
D. M.
,
Uǧurbil
,
K.
, &
Van Essen
,
D. C.
(
2013
).
Functional connectomics from resting-state fMRI
.
Trends in Cognitive Sciences
,
17
(
12
),
666
682
. https://doi.org/10.1016/J.TICS.2013.09.016
Steegen
,
S.
,
Tuerlinckx
,
F.
,
Gelman
,
A.
, &
Vanpaemel
,
W.
(
2016
).
Increasing transparency through a multiverse analysis
.
Perspectives on Psychological Science
,
11
(
5
),
702
712
. https://doi.org/10.1177/1745691616658637
Taghia
,
J.
,
Ryali
,
S.
,
Chen
,
T.
,
Supekar
,
K.
,
Cai
,
W.
, &
Menon
,
V.
(
2017
).
Bayesian switching factor analysis for estimating time-varying functional connectivity in fMRI
.
NeuroImage
,
155
,
271
290
. https://doi.org/10.1016/j.neuroimage.2017.02.083
Thompson
,
W. H.
,
Richter
,
C. G.
,
Plavén-Sigray
,
P.
, &
Fransson
,
P.
(
2018
).
Simulations to benchmark time-varying connectivity methods for fMRI
.
PLoS Computational Biology
,
14
(
5
),
e1006196
. https://doi.org/10.1371/journal.pcbi.1006196
Titsias
,
M.
(
2009
).
Variational learning of inducing variables in sparse Gaussian processes
.
Journal of Machine Learning Research
,
5
,
567
574
. https://proceedings.mlr.press/v5/titsias09a.html
Vallat
,
R.
(
2018
).
Pingouin: Statistics in Python
.
Journal of Open Source Software
,
3
(
31
),
1026
. https://doi.org/10.21105/joss.01026
Varela
,
F.
,
Lachaux
,
J.-P.
,
Rodriguez
,
E.
, &
Martinerie
,
J.
(
2001
).
The Brainweb: Phase synchronization and large-scale integration
.
Nature Reviews Neuroscience
,
2
,
229
239
. https://doi.org/10.1038/35067550
Vergara
,
V. M.
,
Abrol
,
A.
, &
Calhoun
,
V. D.
(
2019
).
An average sliding window correlation method for dynamic functional connectivity
.
Human Brain Mapping
,
40
(
7
),
2089
2103
. https://doi.org/10.1002/hbm.24509
Vidaurre
,
D.
,
Abeysuriya
,
R.
,
Becker
,
R.
,
Quinn
,
A. J.
,
Alfaro-Almagro
,
F.
,
Smith
,
S. M.
, &
Woolrich
,
M. W.
(
2018
).
Discovering dynamic brain networks from big data in rest and task
.
NeuroImage
,
180
,
646
656
. https://doi.org/10.1016/j.neuroimage.2017.06.077
Vidaurre
,
D.
,
Llera
,
A.
,
Smith
,
S. M.
, &
Woolrich
,
M. W.
(
2021
).
Behavioural relevance of spontaneous, transient brain network interactions in fMRI
.
NeuroImage
,
229
,
117713
. https://doi.org/10.1016/j.neuroimage.2020.117713
Wang
,
H. E.
,
Bénar
,
C. G.
,
Quilichini
,
P. P.
,
Friston
,
K. J.
,
Jirsa
,
V. K.
, &
Bernard
,
C.
(
2014
).
A systematic framework for functional connectivity measures
.
Frontiers in Neuroscience
,
8
,
405
. https://doi.org/10.3389/fnins.2014.00405
Warnick
,
R.
,
Guindani
,
M.
,
Erhardt
,
E. B.
,
Allen
,
E. A.
,
Calhoun
,
V. D.
, &
Vannucci
,
M.
(
2018
).
A Bayesian approach for estimating dynamic functional network connectivity in fMRI data
.
Journal of the American Statistical Association
,
113
(
521
),
134
151
. https://doi.org/10.1080/01621459.2017.1379404
Welvaert
,
M.
, &
Rosseel
,
Y.
(
2013
).
On the definition of signal-to-noise ratio and contrast-to-noise ratio for fMRI data
.
PLoS One
,
8
(
11
),
e77089
. https://doi.org/10.1371/journal.pone.0077089
Wilkinson
,
W. J.
,
Solin
,
A.
, &
Adam
,
V.
(
2021
).
Sparse Algorithms for Markovian Gaussian Processes
.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics
,
130
,
1747
1755
. https://proceedings.mlr.press/v130/wilkinson21a.html
Wilson
,
A. G.
(
2013
).
The change point kernel
.
2010
,
2010
2013
. https://www.cs.cmu.edu/~andrewgw/changepoints.pdf
Wilson
,
A. G.
, &
Ghahramani
,
Z.
(
2011
).
Generalised Wishart processes
.
Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence
,
736
744
. https://doi.org/10.48550/arXiv.1101.0240
Worsley
,
K. J.
, &
Friston
,
K. J.
(
1995
).
Analysis of fMRI time-series revisited—Again
.
NeuroImage
,
2
(
3
),
173
181
. https://doi.org/10.1006/nimg.1995.1023
Worsley
,
K. J.
,
Liao
,
C. H.
,
Aston
,
J.
,
Petre
,
V.
,
Duncan
,
G. H.
,
Morales
,
F.
, &
Evans
,
A. C.
(
2002
).
A general statistical analysis for fMRI data
.
NeuroImage
,
15
(
1
),
1
15
. https://doi.org/10.1006/nimg.2001.0933
Xie
,
H.
,
Zheng
,
C. Y.
,
Handwerker
,
D. A.
,
Bandettini
,
P. A.
,
Calhoun
,
V. D.
,
Mitra
,
S.
, &
Gonzalez-Castillo
,
J.
(
2019
).
Efficacy of different dynamic functional connectivity methods to capture cognitively relevant information
.
NeuroImage
,
188
,
502
514
. https://doi.org/10.1016/j.neuroimage.2018.12.037
Yaesoubi
,
M.
,
Adalı
,
T.
, &
Calhoun
,
V. D.
(
2018
).
A window-less approach for capturing time-varying connectivity in fMRI data reveals the presence of states with variable rates of change
.
Human Brain Mapping
,
39
(
4
),
1626
1636
. https://doi.org/10.1002/hbm.23939
Yarkoni
,
T.
, &
Westfall
,
J.
(
2017
).
Choosing prediction over explanation in psychology: Lessons from machine learning
.
Perspectives on Psychological Science
,
12
(
6
),
1100
1122
. https://doi.org/10.1177/1745691617693393
Zalesky
,
A.
, &
Breakspear
,
M.
(
2015
).
Towards a statistical test for functional connectivity dynamics
.
NeuroImage
,
114
,
466
470
. https://doi.org/10.1016/j.neuroimage.2015.03.047
Zalesky
,
A.
,
Fornito
,
A.
, &
Bullmore
,
E. T.
(
2012
).
On the use of correlation as a measure of network connectivity
.
NeuroImage
,
60
(
4
),
2096
2106
. https://doi.org/10.1016/j.neuroimage.2012.02.001
Zalesky
,
A.
,
Fornito
,
A.
,
Cocchi
,
L.
,
Gollo
,
L. L.
, &
Breakspear
,
M.
(
2014
).
Time-resolved resting-state brain networks
.
Proceedings of the National Academy of Sciences of the United States of America
,
111
(
28
),
10341
10346
. https://doi.org/10.1073/pnas.1400181111
Zamani Esfahlani
,
F.
,
Faskowitz
,
J.
,
Slack
,
J.
,
Mišić
,
B.
, &
Betzel
,
R. F.
(
2022
).
Local structure-function relationships in human brain networks across the lifespan
.
Nature Communications
,
13
(
1
),
2053
. https://doi.org/10.1038/s41467-022-29770-y
Zhang
,
Z.
,
Telesford
,
Q. K.
,
Giusti
,
C.
,
Lim
,
K. O.
, &
Bassett
,
D. S.
(
2016
).
Choosing wavelet methods, filters, and lengths for functional brain network construction
.
PLoS One
,
11
(
6
),
e0157243
. https://doi.org/10.1371/journal.pone.0157243

A.1. Updating ground-truth covariances after noise addition

Although the distribution of the HCP time series used as noise ϵ in the simulation benchmarks is unknown, we know that E[ϵ]=0 and E[ϵ2]=1. First, we note that

E[yn*]=αE[yn]+(1α)E[ϵn]=0,
(A. 1)

so that the variance of the new (noisy) time series is

σyn*2=E[α2yn2+2α(1α)ynϵn+(1α)2ϵn2]=α2+(1α)2.
(A. 2)

Regarding the covariance terms, all cross-terms except within yn are zero; therefore, we are simply left with σyn,1*yn,2*=α2σyn,1yn,2. To maintain unit time series variances, the new (noisy) signal is

yn*=1α2+(1α)2[αyn+(1α)ϵn],
(A. 3)

so our new ground-truth covariance terms can be analytically updated as

σyn,1*yn,2*=α2α2+(1α)2σyn,1yn,2.
(A. 4)

A.2. Human connectome project (HCP) data description

HCP participants were young adults (twins and nontwin siblings), aged 22–35 years. We used the data from the HCP S1200 public release (published in March 2017), from the 812 subjects with four available scans processed using improved fMRI image reconstruction (“recon2”). These scans were acquired with a 3T Siemens connectome-Skyra using a multiband sequence. The readily available D=15 dimensional ICA-based node time series were used. For interpretability, each ICA component was mapped to a well-known functional network (FN) (Giorgio et al., 2018). Three neuroimaging researchers visually inspected and manually assigned each ICA component to 1 of 10 networks that resemble task-driven FNs (Fox & Raichle, 2007; Laird et al., 2011; Smith et al., 2009). The four scans were divided into two immediately consecutive scans on two separate days (scans 1A, 1B, 2A, and 2B). Scans were acquired with a TR of 0.72 seconds and a voxel size of 2 mm isotropic. Each scan contained N=1,200 images and was 15 minutes long. The data had been preprocessed using a (minimal) preprocessing pipeline (Glasser et al., 2013, 2016; Smith, Vidaurre, et al., 2013). This minimal preprocessing pipeline included motion and distortion correction, and anatomical registration to standard MNI space. Noise components had been filtered out during preprocessing using ICA+FIX (Griffanti et al., 2014; Salimi-Khorshidi et al., 2014). Further global signal regression was not performed. All time series had been demeaned, with variance normalized (Beckmann & Smith, 2004).

A.3. Rockland data description

The Rockland dataset contained fMRI scans of subjects (aged 18-35 years) alternating between fixating in the dark or seeing a checkerboard visual pattern consecutively (20 seconds in duration of both stimuli and rest periods). We used the dataset with a TR of 0.645 seconds (N=240). We only used data from the first baseline visit (coded as “ses-BAS1”). Data preprocessing was performed in FSL (Jenkinson et al., 2012) consisting of brain extraction and field-of-view fixation, data registration, motion correction, smoothing (5 mm Gaussian kernel), and detrending (high-pass, 0.01 Hz). No initial volumes were excluded from the scans. Finally, we denoised the data using ICA-AROMA (Pruim et al., 2015). The data were not prewhitened to ensure preservation of the visual task. Subjects with a correlation between their V1 BOLD time series and the stimulus time series (convolved with an HRF) of less than 0.4 were discarded, resulting in 286 subjects. Such low correlations indicated that the expected BOLD signature in V1 was not observed (e.g., owing to the subjects closing their eyes). This visual task is simpler than those recorded under the umbrella of the HCP, which makes it better suited for our benchmarking aims (Barch et al., 2013). We extracted BOLD activations from visual areas V1, V2, V3, V4, the medial prefrontal cortex (mPFC), and the primary motor cortex (M1). A plot of the time series, aligned across subjects, is shown in Figure S11A. As motivation for using this design, Di et al. (2015) showed that FC is not static across such visual stimuli and that FC between higher and lower visual regions is affected in a consistent manner with stimulus onset and offset.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.

Supplementary data