## Abstract

Functional connectivity (FC) of blood oxygen level-dependent (BOLD) fMRI time series can be estimated using methods that differ in sensitivity to the temporal order of time points (static vs. dynamic) and the number of regions considered in estimating a single edge (bivariate vs. multivariate). Previous research suggests that dynamic FC explains variability in FC fluctuations and behavior beyond static FC. Our aim was to systematically compare methods on both dimensions. We compared five FC methods: Pearson’s/full correlation (static, bivariate), lagged correlation (dynamic, bivariate), partial correlation (static, multivariate), and multivariate AR model with and without self-connections (dynamic, multivariate). We compared these methods by (i) assessing similarities between FC matrices, (ii) by comparing node centrality measures, and (iii) by comparing the patterns of brain-behavior associations. Although FC estimates did not differ as a function of sensitivity to temporal order, we observed differences between the multivariate and bivariate FC methods. The dynamic FC estimates were highly correlated with the static FC estimates, especially when comparing group-level FC matrices. Similarly, there were high correlations between the patterns of brain-behavior associations obtained using the dynamic and static FC methods. We conclude that the dynamic FC estimates represent information largely similar to that of the static FC.

## Author Summary

Functional connectivity (FC) methods differ in their sensitivity to temporal order (static vs. dynamic) and the number of regions used to estimate a single edge (bivariate vs. multivariate). Dynamic connectivity measures are sensitive to temporal order, whereas static ones are not. We compared the FC methods with respect to both dimensions and showed that the dynamic and static FC estimates represent largely similar information. Using simulations, we have shown that the similarity between dynamic and static FC estimates may be due to the convolution of the neural signal with the hemodynamic response function (HRF) alone. On the other hand, bivariate FC estimates showed better generalizability in cross-validation compared with multivariate FC estimates in the study of brain-behavior associations.

## INTRODUCTION

Brain functional connectivity (FC) is estimated by calculating statistical associations between time series of brain signal (Fornito, Zalesky, & Bullmore, 2016), which reflect functional relationships between brain regions (Reid et al., 2019). The investigation of FC has improved our understanding of brain function in health and disease and has been shown to be useful as a tool to predict interindividual differences, such as cognition, personality, or the presence of mental or neurological disorders (Tompson, Falk, Vettel, & Bassett, 2018; Wu et al., 2021). In functional magnetic resonance imaging (fMRI) studies, FC is most commonly estimated using the Pearson’s correlation coefficient between time series of pairs of regions. Although correlation is simple to understand and compute, it is insensitive to the temporal order of time points. Measures or models that are sensitive to the temporal order of time points are called *dynamic*, while measures that are insensitive to temporal order are measures of *static* FC. Given that the information flow in the brain is causally organized in time (Bolton, Van De Ville, Amico, Preti, & Liégeois, 2023; Cole, Ito, Bassett, & Schultz, 2016), dynamic connectivity models could be more informative in terms of understanding brain function and investigating brain-behavior associations.

In FC temporal information can be represented in two ways. First, the temporal order of the time points can be taken into account when computing the FC estimates. Models that are sensitive to temporal order are called *dynamic* models, whereas models that do not take temporal order into account are called *static* FC models. Second, the methods can investigate whether and how FC estimates change over time. A time series model is *stationary* (in a weak sense) if its first- and second-order statistics (mean and variance) do not vary as a function of time (Chatfield & Xing, 2019; Liégeois, Laumann, Snyder, Zhou, & Yeo, 2017). Importantly, the distinction between dynamic and static FC should not be confused with the distinction between stationary and nonstationary FC.

Nonstationarities are commonly estimated using measures of time-varying functional connectivity (TVFC), such as sliding window correlation (SWC) (Lurie et al., 2020; Preti, Bolton, & Van De Ville, 2017). In this method, we calculate connectivity (e.g., using correlation) in a time window of selected length around a given time point; this window is continuously being moved from the start to the end of the recording. Procedures such as autoregressive (AR) randomization or phase randomization can be used to construct surrogate time series, which can then be used to perform a statistical test of the null hypothesis that a time series is stationary, linear, and Gaussian (Liégeois, Yeo, & Van De Ville, 2021). Using these procedures Liégeois et al. (2017) and Hindriks et al. (2016) have shown that the hypothesis that FC is stationary cannot be rejected for most participants. Similarly, Laumann et al. (2017) concluded that variation in FC over time within a single session can be largely explained by sampling variability, head motion, and fluctuating sleep state. Furthermore, EEG FC has been shown to be largely stable during resting state (Daniel Arzate-Mena et al., 2022), during sleep (Olguín-Rodríguez et al., 2018), and even before, during, and after epileptic seizures (Müller, Rummel, Goodfellow, & Schindler, 2014). On the other hand, several studies rejected the stationarity hypothesis for certain connections (Chang & Glover, 2010; Handwerker, Roopchansingh, Gonzalez-Castillo, & Bandettini, 2012; Zalesky, Fornito, Cocchi, Gollo, & Breakspear, 2014). However, do note that Zalesky et al. (2014) found that on average only 4% of the connections are nonstationary.

The inability to reject the stationarity hypothesis does not imply the absence of brain states (Liégeois et al., 2017), nor does it preclude finding (behaviorally) relevant information using models of TVFC (Liégeois et al., 2021). However, if a simpler model (i.e., a more interpretable model with fewer parameters) can be used to describe FC dynamics, it should be preferred to a complex model (such as SWC) unless the simpler model cannot model some important aspect of the time series a researcher is interested in (Liégeois et al., 2017; Novelli & Razi, 2022). Indeed, recent work on resting-state fMRI FC in humans has shown that many of the properties of TVFC can be predicted from a static and stationary FC model (Ladwig et al., 2022; Laumann et al., 2017; Matsui, Pham, Jimura, & Chikazoe, 2022; Novelli & Razi, 2022). Similarly, Liégeois et al. (2017) showed that SWC fluctuations can be explained with a model of dynamic (and stationary) FC. Since many studies have shown that FC is largely stationary, we focused on the relationship between *dynamic* (as defined above) and *static* connectivity.

Dynamic FC can be estimated using measures of lag-based connectivity, such as lagged correlation or multivariate autoregressive (AR) model. In contrast to static FC, dynamic FC methods can be used to estimate the *directionality* of information flow based on temporal precedence (Smith et al., 2011). Although these methods have been commonly used, some studies (David et al., 2008; Friston, 2009, 2011; Seth, Chorley, & Barnett, 2013; Smith et al., 2011) have warned that the ability of these methods to accurately estimate the presence and directionality of connections is compromised due to the convolution of the neural signal with the hemodynamic response function (HRF) and the resulting blurring of the signal, due to interregional variability of HRF (David et al., 2008; Friston, 2009, 2011), noise (David et al., 2008; Friston, 2009; Smith et al., 2011), and/or downsampling of the neural signal in fMRI (Seth et al., 2013). Other studies (Gilson et al., 2018; Gilson, Moreno-Bote, Ponce-Alvarez, Ritter, & Deco, 2016; Jin et al., 2017; Pallarés et al., 2018) have shown that the measures of dynamic FC complement the measures of static FC. For example, lagged FC measures can improve discrimination between individuals and between tasks (Gilson et al., 2018; Pallarés et al., 2018), have better predictive value for PTSD compared to static FC (Jin et al., 2017), and can be used to improve effective connectivity estimates (Gilson et al., 2016). Furthermore, Liégeois et al. (2017) have shown that the multivariate AR model explains temporal FC fluctuations better than Pearson correlations.

In subsequent research Liégeois et al. (2019) showed that static FC and dynamic FC exhibit different patterns of brain-behavior associations. They concluded that dynamic FC explains additional variance in behavior beyond variance that can be explained by static FC. However, this comparison confounds two orthogonal properties of FC methods. Although Pearson’s correlation and multivariate AR models differ in their sensitivity to temporal reordering (i.e., static vs. dynamic), they also differ in terms of how many variables (brain regions) are taken into account during the estimation of a single edge (bivariate vs. multivariate). Hence, a more valid comparison between static and dynamic FC methods should consider both dimensions: the number of variables and the sensitivity to temporal reordering. Combining these two factors enabled us to differentiate between four basic classes of FC methods (see Figure 1B).

Our aim was to systematically compare the FC estimated by both dimensions, that is, the sensitivity to temporal reordering (static vs. dynamic) and the number of independent variables (bivariate vs. multivariate). We focused on five mathematically related methods: full/Pearson’s correlation, partial correlation, lagged correlation, and multivariate AR model with and without self-connections, where self-connections refer to autocorrelation of the region with itself (Arbabshirani et al., 2019; Liégeois, Santos, Matta, Van De Ville, & Sayed, 2020). We were interested in similarities of the FC estimates and patterns of brain-behavior associations. We compared FC methods (i) by assessing similarities between FC matrices, (ii) by comparing node centrality measures, and (iii) by comparing brain-behavior associations. In addition, to better understand the results obtained using different methods and the relationship between them, we generated and analyzed synthetic data in which we systematically varied the length of time series and the amount of noise.

We used empirical and simulated data to test several hypotheses. First, we predicted that dynamic and static FC methods will provide similar FC estimates due to autocorrelation of the fMRI time series. Autocorrelation is inherent to the fMRI signal and originates from two main sources: physiological noise and convolution of neural activity with HRF (Honari, Choe, Pekar, & Lindquist, 2019). We expected the degree of similarity between static and dynamic FC estimates to be similar to or greater than the average autocorrelation of the fMRI time series. Furthermore, we expected the similarity between dynamic and static FC to be lower when the fMRI time series is prewhitened (i.e., when autocorrelation is removed before computation of FC).

Second, we predicted that multivariate methods can improve inferences about causal relationships between regions, as they estimate *direct* connections by removing the confounding influence of indirect associations (Reid et al., 2019) as opposed to bivariate methods, which cannot separate *indirect* and *direct* connections (Liégeois et al., 2020). By providing more direct information on causal relationships between brain regions (Novelli & Lizier, 2021), multivariate methods could improve brain-behavior associations in terms of explained variance and/or brain-behavior correlation estimates. Existing research has shown inconsistent differences in behavior predictive accuracy between partial and full/Pearson’s correlations, favoring either partial (Cai, Zhu, & Yu, 2020; Cheng, Ji, Zhang, & Feng, 2012) or full correlation (Abraham et al., 2017) or showing negligible differences between them (Dadi et al., 2019).

Finally, the choice of FC method can affect the measures of network topology (e.g., Liang et al., 2012; Zalesky, Fornito, & Bullmore, 2012). To address this problem, we compared FC estimates using common node centrality measures, including strength, eigenvector centrality, PageRank centrality, and participation coefficient. Using centrality measures also allowed us to compare incoming and outgoing connections in dynamic FC estimates. Based on previous research showing that nodes are either receptors or feeders (i.e., they have predominantly incoming or outgoing connections), but not both (Gilson et al., 2016), we expected a negative correlation between in-degree and out-degree. We had no specific hypotheses regarding the similarity of node centrality measures between multivariate and bivariate methods or between static and dynamic FC methods.

## METHOD

### Participants

To address the research questions, the analyses were performed on publicly available deidentified data from 1,096 participants (*M*_{age} = 28.8, *SD*_{age} = 3.7, 596 women) included in the Human Connectome Project, 1,200 Subjects Release (Van Essen et al., 2013). Each participant took part in two imaging sessions over two consecutive days that included the acquisition of structural, functional (rest and task), and diffusion-weighted MR images. The study was approved by the Washington University institutional review board, and informed consent was signed by each participant.

### fMRI Data Acquisition and Preprocessing

Data were acquired in two sessions using the Siemens 3T Connectome Skyra tomograph. Structural MPRAGE T1w image (TR = 2,400 ms, TE = 2.14 ms, TI = 1,000 ms, voxel size = 0.7 mm isotropic, SENSE factor = 2, flip angle = 8°) and T2w image (TR = 3,200 ms, TE = 565 ms, voxel size = 0.7 mm isotropic) were acquired in the first session. The participants underwent four resting-state fMRI runs, two in each session (gradient echo EPI sequence, multiband factor: 8, acquisition time: 14 min 24 s, TR = 720 ms, TE = 33.1 ms, flip angle = 52°).

Initial preprocessing was performed by the HCP team and included minimal preprocessing (Glasser et al., 2013), ICA-FIX denoising (Salimi-Khorshidi et al., 2014), and MSMAll registration (Robinson et al., 2014). The data was then further processed using QuNex (Ji et al., 2023) to prepare them for functional connectivity analyzes. First, we identified frames with excessive movement and/or frame-to-frame signal changes. We marked any frame that was characterized by frame displacement greater than 0.3 mm or for which the frame-to-frame change in signal, computed as intensity normalized root-mean-squared difference (DVARS) across all voxels, exceeded 1.2 times the DVARS median across the time series, as well as one frame before and two frames after them. Marked frames were used for motion censoring, which is described in detail below. Next, we used linear regression to remove multiple nuisance signals, including six movement correction parameters and their squared values, signals from the ventricles, white matter and the whole brain, as well as the first derivatives of the listed signals. The previously marked frames were excluded from the regression, and all subsequent analysis steps were performed on the residual signal. No temporal filtering was applied to the data, except a very gentle high-pass filter at the cutoff of 2,000 s applied by the HCP team (Glasser et al., 2013), since temporal filtering could introduce additional autocorrelation (Arbabshirani et al., 2014) and inflate correlation estimates (Davey, Grayden, Egan, & Johnston, 2013; Honari et al., 2019).

Motion scrubbing is usually performed by removing frames thought to be affected by movement (i.e., *bad* frames) before calculating the correlation or related measure of static FC. This is not appropriate in the case of dynamic FC or autocorrelation, since removing time points disrupts the autocorrelation structure of time series. To overcome this limitation, a frame was considered bad if it was bad in either original or lagged time series. Frames at transition between concatenated time series (last frame in the first time series and first frame in the next time series) were also marked as bad in this case.

Only sessions with at least 50% useful frames after motion censoring were used in the further analysis, except where noted otherwise. This resulted in 1,003 participants with at least one session. Before FC analyses, all resting-state BOLD runs from available sessions were concatenated and parcellated using a multimodal cortical parcellation (MMP1.0) containing 360 regions (Glasser et al., 2016). Each parcel was represented by a mean signal across all the parcel grayordinates.

### Functional Connectivity Estimation

Functional connectivity was estimated using five methods: full (Pearson’s) correlation, partial correlation, lagged correlation, multivariate AR model (also called vector AR model), and multivariate AR model without self-connections. The listed methods differ in terms of the number of regions used to estimate the connectivity of a single edge (bivariate vs. multivariate) and in terms of sensitivity to temporal reordering of time points (static vs. dynamic) (see Figure 1B). A multivariate AR model without self-connections was included to test how much similarity between the multivariate AR model and partial correlation depends on self-connections (the diagonal terms in the autocovariance matrix).

*x*

_{i}be a demeaned

*T*× 1 vector of region

*i*time series (

*T*is the number of time points) and let

*X*= [

*x*

_{1}, …,

*x*

_{N}]′ be a

*N*×

*T*matrix of the demeaned region time series (

*N*is the number of regions). Then the sample covariance matrix

*C*can be estimated with

A correlation matrix can be obtained by standardizing the time series to zero mean and unit standard deviation (i.e., *z*-scores) beforehand.

*ρ*is an element of a partial correlation matrix,

*w*is an element of a precision matrix, and

*i*and

*j*are the indices of rows and columns, respectively (Pervaiz, Vidaurre, Woolrich, & Smith, 2020).

*X*

_{t}be an

*N*× (

*T*−

*p*) matrix of shortened time series with time points from 1 to

*T*−

*p*(

*p*is the lag/model order) and

*X*

_{t+p}be a similar matrix with time points from

*p*+ 1 to

*T*. Then,

*p*-th order autocovariance or lagged covariance matrix. Diagonal entries are called autocovariances or, sometimes, self-connections or self-loops (Arbabshirani et al., 2019; Liégeois et al., 2020). Off-diagonal entries of autocovariance matrix are also called cross-covariances. Note that the autocovariance matrix of lag 0 is equal to the ordinary covariance matrix. The autocorrelation matrix was obtained by standardizing time series before computing autocovariance.

Correlations, autocorrelations, and partial correlations were Fisher *z*-transformed for subsequent analyzes.

*Z*be an

*Np*× (

*T*−

*p*) matrix of stacked matrices of shortened time series,

*Z*= [$Xt+p\u22121\u2032$, …, $Xt+1\u2032$, $Xt\u2032$]′. The multivariate AR model can be written in matrix notation as:

*A*is an

*N*×

*Np*matrix of AR coefficients of the

*p*-th order model and

*E*is an

*N*× (

*T*−

*p*) matrix of zero-mean, independent, normally distributed residuals. The matrix

*A*can be estimated using the ordinary least squares (OLS) estimator:

*p*= 1 $A\u02c6$ equals:

*i*separately, such that we set

*i*-th row of matrix

*X*

_{t}to zero (the equation above applies for

*p*= 1 only, but the model could be extended to include higher lags as in Equation 4). Vectors

*x*

_{i,t+p}were taken from rows of the matrix

*X*

_{t+p}and included time points from

*p*+ 1 to

*T*. Vectors

*e*

_{i}represent normally distributed, zero-mean, independent residuals. FC matrix was constructed by organizing

*N*× 1 vectors

*a*

_{i}into the

*N*×

*N*matrix

*A*

_{1}= [

*a*

_{i}, …,

*a*

_{N}]′. This matrix is asymmetric with zeros on the diagonal. The coefficients of both multivariate AR models were estimated using the coordinate descent algorithm implemented in the GLMnet package for MATLAB (Friedman, Hastie, & Tibshirani, 2010).

All AR models were estimated for lag 1 only. This order was shown to be optimal for the multivariate AR model for resting-state fMRI data with a high number of regions (Ting, Seghouane, Khalid, & Salleh, 2015; Valdes-Sosa, 2004), and also in a study using HCP data (Casorso et al., 2019). There were no differences between the variance of order 1 and the higher order models explained by the first principal component of the null data generated from the multivariate AR model in a previous study (Liégeois et al., 2017); therefore, we did not consider higher order autoregressive models.

### Prewhitening

We expected that FC estimates based on AR models would be similar to static FC estimates due to autocorrelation present in the fMRI time series. To test the similarity between static and dynamic FC in the absence of autocorrelation, we computed connectivity from from nonprewhitened time series and prewhitened time series. The exception was the multivariate AR model, where the diagonal terms (self-connections) effectively act as prewhitening. The difference between the multivariate AR model with and without prewhitening is essentially the difference between the multivariate AR model with and without self-connections. We performed the prewhitening by taking the residuals of the regression of the “raw” time series on the lagged time series.

To retain frame sequence after prewhitening, frames that were marked as bad in any of the original or lagged time series were set to zero before computing residuals. For this reason, frames that were preceded by a bad frame in any of the 1 to *l* previous frames were not prewhitened. At higher orders, this resulted in fewer total prewhitened frames.

Prewhitening was performed on orders 1 to 3 (abbreviated AR1/2/3 prewhitened). Autocorrelations were already significantly reduced at order 1 and were additionally reduced at lags 2 and 3 (Figure 2B). Since the results were similar regardless of the prewhitening order, only the results for the prewhitening on order 1 are shown in the main text, and the results for higher orders are shown in the Supporting Information.

### Similarities Between FC Estimates Obtained Using Different Methods

We estimated similarities between the FC estimates by computing the correlation between vectorized FC matrices. We adjusted the vectorization for each pair of methods so that only unique elements were taken into account. For example, correlation and partial correlation matrices are symmetric; therefore, only the upper or lower triangular part of the matrix (without the diagonal) should be considered. On the other hand, the FC matrices derived from the AR models are not symmetric; therefore, the whole matrix must be vectorized. The exception is the multivariate AR model without self-connections, which does not contain any information on the diagonal, so in this case matrix without the diagonal needs to be vectorized. When comparing asymmetric and symmetric matrices, we computed and used the average of the upper and lower triangular parts of the matrix (using equation (*X* + *X*′)/2).

We estimated similarities in two ways: first, by computing correlations between connectivity estimates for each subject separately and then averaging the resulting correlations (mean correlations between individual-level FC matrices), and second, by averaging FC matrices over participants and then computing correlation between methods on group FC matrices (correlations between group-level FC matrices).

To test how similarity between FC estimates depends on data quality, we repeated analyses on a subset of 200 participants with the largest number of retained frames.

#### Correlation between edge similarity and test-retest reliability.

To better understand the origin of the similarities between the FC methods, we examined the relationship between the edge similarity of the FC estimates obtained using different methods and test-retest reliability at the edge level. If similarities between FC estimates depend on the signal-to-noise ratio (SNR), more reliable edges will be more similar across methods.

*y*is an estimate of an edge,

*p*indicates participant,

*d*day,

*r*run, and

*e*residual.

We computed the ICC as a ratio between between-subject variance (which included interaction terms pertaining to participants) and the total variance (L. Li, Zeng, Lin, Cazzell, & Liu, 2015). For this analysis, the runs were not concatenated.

Finally, we applied Fisher’s *z*-transformation to both edge similarity and ICC and computed the correlation between them. To reduce the number of comparisons, we only investigated the most relevant comparisons: full correlation versus lagged correlation, partial correlation versus multivariate AR1, and partial correlation versus multivariate AR1 without self-connections. Since we estimated test-retest reliability separately for each method in a pair, there were two correlations for each pair of methods. We averaged both correlations for each comparison.

### Node Centrality Measures

In the second part, we compared FC estimates using four different centrality measures: mean strength, eigenvector centrality, PageRank centrality, and participation coefficient. It is important to note that we did not use path-based methods, such as betweenness centrality or closeness centrality, because their interpretation is not clear for correlation-based networks. In such networks, a statistical association between two nodes does not necessarily indicate a path of information flow (Fornito et al., 2016; Rubinov & Sporns, 2010). Moreover, the correlation coefficient already captures the shortest path between two nodes (Fornito et al., 2016).

The mean strength was computed as a mean of the edge weights for each region and it is analogous to a degree (number of connections) in binary networks. The shortcoming of strength is that it gives equal weight to all connections—it gives equal importance to nodes that are connected to other important nodes and to nodes that are connected to unimportant nodes.

*i*as the

*i*-th entry of a principal eigenvector of the network’s adjacency matrix (Fornito et al., 2016; Newman, 2018). Using a recursive formula, the eigenvector centrality can be expressed as:

*x*

_{i}is the centrality of the

*i*-th node,

*λ*

_{1}is the principal eigenvalue, and

*A*

_{ij}are the elements of the adjacency matrix.

*β*(usually set to 1) is added to ensure that no node has zero centrality and

*x*

_{j}is divided by the out-degree of node

*j*($kjout$) to prevent high-degree nodes from having a disproportionate influence on other nodes (Fornito et al., 2016; Newman, 2018). The balance between the eigenvector and the constant term is controlled by the parameter

*α*, which is usually set to 0.85.

We computed both eigenvector and PageRank centrality using the implementations available in the Brain Connectivity Toolbox (Rubinov & Sporns, 2010).

The degree-based or strength-based measures may be biased in correlation networks based on Pearson correlation, as node strength tends to correlate with community size (Pedersen, Omidvarnia, Shine, Jackson, & Zalesky, 2020; Power, Schlaggar, Lessov-Schlaggar, & Petersen, 2013). To mitigate this bias, we additionally used the participation coefficient to characterize node importance. The participation coefficient measures the distribution of a node’s connections across different modules (Guimerà & Nunes Amaral, 2005). If the node’s connections are evenly distributed across modules, the participation coefficient approaches 1, while a participation coefficient of zero indicates that the node’s connections are completely restricted to its module.

Here, *M* is a set of modules (communities), *k*_{i} is the total degree of node *i*, *k*_{i}(*m*) is the intramodular degree for node *i* in module *m*. *k*_{i}(*m*)_{rand} represents a median intramodular degree for node *i*, obtained by generating randomized networks using the Maslov-Sneppen rewiring algorithm (Maslov & Sneppen, 2002). *B*_{0} is a multiplicative term used to constrain the range of *PC*_{norm} between 0 and 1 and was set to 0.5. The number of randomizations was set to 100 at the individual level and to 1,000 at the group level. The module definitions were taken from Ji et al. (2019).

We calculated centrality measures at the level of individual FC matrices and at the level of the group-averaged FC matrix. We then compared centrality measures based on different FC methods by computing Pearson’s correlation between the obtained centrality measures. For the comparison at the individual level, we averaged the obtained correlations. Additionally, to better understand the relationship between different centrality measures, we computed correlations between different centrality measures for selected FC methods (full correlation, partial correlation, and multivariate AR model).

In the case of dynamic FC estimates, all centrality measures were estimated separately for incoming and outgoing connections. The matrix of outgoing connections was obtained by transposing the original FC matrix. In addition, all centrality measures were estimated separately for positive and negative connections. For the sake of brevity, only the results for positive connections are presented in the main text; other results can be found in the Supporting Information. We refer to the strength of nodes based on incoming or outgoing connections as in-strength and out-strength, respectively.

### Brain-Behavior Associations

To compare the brain-behavior associations obtained by different FC measures, we used 58 behavioral measures (see Supporting Information Table S1) that included cognitive, emotion and personality measures and were previously used in other studies (Kashyap et al., 2019; J. Li et al., 2019; Liégeois et al., 2019).

#### Variance component model.

*Y*represents the

*N*×

*P*matrix (number of subjects × number of traits) of behavioral measures,

*C*represents shared effects, and

*E*represents unique effects. The model has the following assumptions:

*I*is the identity matrix.

*F*represents

*N*×

*N*matrix of similarities between participants, which were estimated with the Pearson’s correlation coefficient. Σ

_{c}and Σ

_{e}are

*P*×

*P*matrices, which are being estimated. The total variance explained is computed as:

*M*is analogous to the concept of heritability and can be interpreted as the amount of variance in behavior that can be explained with the variance in the connectome.

Before computing VCM, we imputed missing behavioral data using the R package missForest (Stekhoven & Buhlmann, 2012). There were 0.59% missing data points overall. Following the procedure of Liégeois et al. (2019), we applied quantile normalization to behavioral data. To remove potential confounding factors, we regressed age, gender, race, education, and movement (mean FD) using the procedure described in Ge et al. (2015, 2016).

We estimated *M* for each connectivity method separately. We compared patterns of explained variances by correlating the variance explained at the trait level between all methods.

Since the results of VCM are based on similarities between participants (matrix *F*), we tested the extent to which the similarities between participants, and thus the results of VCM, depend on the levels of noise in the data. To this end, we performed a simulation in which we added random Gaussian noise (mean 0, standard deviation 0–1 in steps of 0.1) to the standardized time series. To reduce complexity, we performed this analysis only for static FC methods.

#### Canonical correlation analysis.

Since VCM is rarely used to study brain-behavior associations, we repeated the analysis using canonical correlation (CCA). CCA is used to reveal the low-dimensional structure of the shared variability between two sets of variables (in our case, connectivity and behavior).

Let *X* and *Y* be *N* × *P* and *N* × *Q* matrices (*N* is the number of observations, *P* and *Q* are the number of variables), respectively.

Here, *U*_{N×K} and *V*_{N×K} represent matrices of canonical scores (or variables), and *A*_{P×K} and *B*_{Q×K} represent matrices of canonical weights. The objective is to maximize the correlation between pairs of columns from *U* and *V* with the same index. These correlations are known as canonical correlations. The solution to the above set of equations is found under the constraint *U*′*U* = *V*′*V* = *I*. The columns of the *U* and *V* matrices tell us the relative position of each observation in the canonical variables. Columns of the *A* and *B* matrices contain information on the relative contribution of each variable to each of the canonical variables. Additionally, one can calculate canonical loadings—the correlations between original data matrices and canonical scores. Canonical variables are ordered in descending order according to the size of canonical correlations. Usually, only the first or first few canonical components are of interest, as these explain most of the shared variance. Mathematical details on CCA can be found elsewhere (e.g., Helmer et al., 2021; Rencher, 2002; Wang et al., 2020; Winkler, Renaud, Smith, & Nichols, 2020; Zhuang, Yang, & Cordes, 2020).

We performed the CCA using the GEMMR package (Helmer et al., 2021). To prepare the data for CCA, we followed the procedure by Smith et al. (2015), including deconfouding using the same variables as for VCM. Prior to CCA, we reduced the dimensionality of both sets of variables to 20 components using principal component analysis (PCA). This number was chosen to optimize the number of samples per feature based on the recommendation by Helmer et al. (2021) under the assumption of a real first canonical correlation *r* = .30. We performed a fivefold cross-validation to assess the generalizability of the model. We only examined the first canonical correlation since it was shown that the first canonical variable explains the most shared variance, and it was the only statistically significant canonical variable in a previous study (Smith et al., 2015).

We repeated the CCA for all FC methods. The similarities between the methods were assessed by comparing the first canonical correlation obtained in the training and the test set. Next, we correlated the canonical weights and loadings related to behavior.

#### Principal least squares.

Finally, we used principal least squares (PLS) to estimate brain-behavior associations. PLS is similar to CCA, with the goal to maximize covariance rather than correlation between sets (Helmer et al., 2021; Mihalik et al., 2022). When the columns of *X* and *Y* are standardized, PLS gives the same results as CCA.

It has been shown that the first PLS component is biased toward the first principal component (PC) axis (Helmer et al., 2021). To assess the degree of bias in our data, we estimated the similarity between the PLS/CCA weights or loadings for behavior and the weights for the first behavioral PC.

#### Control analyses.

Participants in the HCP dataset are genetically and environmentally related, which can inflate between-subject similarities and influence the results related to interindividual differences. Therefore, we repeated all analyses related to brain-behavior associations on two subsamples of genetically unrelated participants (sample sizes 384 and 339).

### Simulation

We hypothesized that dynamic and static FC estimates would be similar due to autocorrelation of fMRI time series, which is partly the result of convolution of neural time series with HRF. In addition, an important source of similarities (or differences) between FC results obtained by different methods could be due to similar (or different) effects of the amount of noise and the amount of available data on the resulting FC matrices. To evaluate the impact of convolution with HRF, signal quality, and the amount of data on estimated similarities between results using different FC measures, we used numerical simulations of data with known covariance structure. We generated multivariate time series of events for 1,000 “participants.” Events were sampled from a multivariate normal distribution with a mean of zero. The covariances differed for each participant and were taken from experimental data parcellated using Schaefer’s local-global parcellation with 100 regions (Schaefer et al., 2018). We used this parcellation instead of MMP to reduce the computational burden and the size of the generated data. Events were not autocorrelated. The generated events were then convolved with HRF using the SimTB toolbox (Erhardt, Allen, Wei, Eichele, & Calhoun, 2012). TR was set to 0.72 s (the same as in HCP data), and HRF parameters were set equal for all participants and regions (delay of response: 6, delay of the undershoot: 15, dispersion of the response: 1, dispersion of the undershoot: 1, the ratio of response to the undershoot: 3, onset in seconds: 0, length of the kernel in seconds: 32). The resulting time series were standardized.

To estimate the effects of signal quality on FC estimates and on similarities between FC methods, we added Gaussian noise with zero mean and standard deviation ranging from 0 to 1 standard deviation in steps of 0.1. This translates to SNR from 10 to 1 (excluding time series without noise, which has infinite SNR). We varied the time series durations from 500 to 10,000 data points in steps of 500.

The first step in the analysis was to establish the ground truth for each method, that is, the results that would be obtained in an ideal situation. We defined the ground truth as FC at maximum length and without noise in the event time series. Note that because events were not autocorrelated, the ground truth for all autoregressive FC methods was a matrix with all zero entries.

Next, we compared results using different FC methods in the same manner as for experimental data for all noise level and signal length combinations on prewhitened and non-prewhitened data. We computed (1) correlations between ground truth FC matrices and simulated FC matrices for all FC methods and (2) correlations between FC estimates obtained using different methods. To reduce the number of comparisons, we only investigated the most relevant comparisons: full correlation versus lagged correlation, partial correlation versus multivariate AR, and partial correlation versus multivariate AR without self-connections.

## RESULTS

### Similarities Between FC Estimates Obtained Using Different Methods

To address our research questions, we first focused on estimating similarities between the results obtained with different FC methods using empirical data. Comparison of group-level FC matrices showed very high correlations between FC results obtained using bivariate methods (*r* ⩾ .87, Figure 2A), as well as between results obtained using multivariate methods (correlation between partial correlation [AR1 prewhitened] and multivariate AR model: *r* = .80). In contrast, the correlations between the bivariate and multivariate FC estimates were lower and ranged from .36 to .65.

When comparing and pooling results based on individual-level FC matrices, the mean correlation between FC matrices obtained using different methods was lower. The correlations between the bivariate methods were still very high (correlation between lagged and full correlation: *r* = .99, correlation between prewhitened lagged and prewhitened full correlation: *r* = .83), while the correlations between the multivariate methods were lower on average. In particular, the correlation between the partial correlation (AR1 prewhitened) and the multivariate AR model was .05, compared to .80 between the group-level FC matrices.

The correlations between the results obtained using static and dynamic FC methods were smaller after prewhitening, with the greatest differences when comparing individual-level FC matrices obtained using multivariate methods. Specifically, the correlation between the coefficients of the multivariate AR model and the partial correlation decreased from .40 to .05 in the individual-level FC and from .86 to .80 in the group-level FC. The order of prewhitening had minimal effect on the correlations between the methods (Supporting Information Figure S2A), except for the comparison of the results obtained using the multivariate AR model and the partial correlation at the individual-level FC, where the correlations increased from .05 to .12 (*r* = .15–.22 for the multivariate AR model without self-connections).

The correlations between the FC results obtained using different methods were slightly higher when the analysis was repeated on 200 participants with the highest data quality (Supporting Information Figure S3).

#### Autocorrelations of fMRI time series.

To test the prediction that the similarities between the dynamic and static FC estimates would be similar to or greater than the mean autocorrelation of the fMRI time series, the mean autocorrelation function was computed across all participants and regions. The autocorrelation before prewhitening was .40 at lag 1 (Figure 2B). This autocorrelation decreased to −.10 after AR1 prewhitening, and was essentially zero after AR2 and AR3 prewhitening. Thus, the similarities between the dynamic and static FC were always higher than the autocorrelation at lag 1. Interestingly, prewhitening at orders 1 and 2 reversed the sign of the autocorrelation at low lags.

#### Correlation between edge similarity and test-retest reliability.

We computed edge similarity between FC methods as a correlation over subjects at every edge for selected pairs of FC methods. We estimated test-retest reliability at every edge for each method separately. Next, we computed the correlation between edge similarity and test-retest reliability for each of selected pairs of FC methods. The correlation was moderate to high for pairs of multivariate methods (*r* = .47–.66) and high for pairs of bivariate methods (*r* = .55–.79, Figure 3). Prewhitening lowered the correlations.

### Similarity of Node Centrality Measures

In the second part, we compared methods for estimating FC by comparing four node centrality measures: strength, eigenvector centrality, PageRank centrality, and participation coefficient (Figure 4, Supporting Information Figure S4, Supporting Information Figure S5). Unless otherwise noted, we focus on the positive connections.

As before, we observed a clear distinction between bivariate and multivariate methods for computing FC matrices. Correlations between centrality measures based on bivariate FC methods were consistently high, regardless of the centrality measure (above .97 at the group level and above .96 at the individual level). In contrast, the correlations between multivariate FC methods were lower, ranging from .59 to .99 for strength, eigenvector centrality, and normalized participation coefficient at the group level. Correlations for PageRank centrality were generally lower for multivariate methods, ranging from −.21 to 1.00.

Similarities between multivariate and bivariate FC methods were moderately high for strength, PageRank centrality, and normalized participation coefficient, ranging from .32 to .79, except for PageRank centrality based on outgoing connections of the multivariate AR model, where the correlations were around .10. However, for eigenvector centrality, we found low similarities between multivariate and bivariate FC methods at the group level (ranging from −.15 to .25) and moderate similarities at the individual level (ranging from −.27 to .45).

Notably, we observed positive correlations between incoming and outgoing connections for strength-based centrality measures at the group level, but negative correlations at the individual level. This pattern was present only for multivariate dynamic methods. Additionally, when comparing partial correlation and multivariate AR models, we found that the correlations between strength-based centrality measures were positive for incoming connections and negative for outgoing connections. In contrast, all correlations were positive for the normalized participation coefficient.

We also found that our results were generally consistent when analyzing centralities computed on negative connections (Supporting Information Figure S5). More specifically, similarities between multivariate FC methods were smaller for normalized participation coefficient, and similarities between multivariate and bivariate methods were larger for eigenvector centrality.

Prewhitening reduced similarities between methods, especially for outgoing connections (Supporting Information Figure S4, Supporting Information Figure S5). This was evident for both positive and negative connections.

To better understand the similarities and differences between the FC methods, we plotted the distributions of the centrality measures on the cortical surface (Figure 5, Supporting Information Figure S8). For both static FC methods, the strength was highest in the parietal regions (Figure 5A). For partial correlation, eigenvector centrality was distributed similarly to strength, whereas for full correlation, the highest eigenvector centrality values were in visual and somatomotor cortex (Figure 5B). For partial correlation, participation coefficient values were lowest in visual and somatomotor cortex, and highest in frontal and parietal regions belonging to parts of the cingulo-opercular, dorsal attention, and multimodal networks (Figure 5C). For the full correlation, similar to the partial correlation, the participation coefficient values were lowest in medial frontal regions and parts of visual cortex, and highest in parts of parietal cortex.

We also plotted the distributions of centrality measures based on incoming and outgoing connections of a multivariate AR model (Figure 6, Supporting Information Figure S9). In-strength was highest in the parietal lobe, while out-strength was highest in the parts of the temporal lobe and in the medial parietal lobe (Figure 6A). Eigenvector centrality was similarly distributed for incoming and outgoing connections, with highest values in the frontal and parietal regions, mainly in parts of the frontoparietal network (Figure 6B). Similar to partial correlation, participation coefficient values were lowest in the somatomotor and visual cortex and highest in medial frontal and medial temporal cortex (Figure 6C). For incoming connections, the PageRank centrality was distributed similarly to the eigenvector centrality, while for outgoing connections, the values were highest in the medial parietal and medial temporal lobes, and lowest in the somatomotor and frontal regions (Figure 6D).

Since in the case of the multivariate AR model the results of group-averaged connectomes differ from individual connectomes, we also plotted the distributions of the centrality measures for a representative subject (Supporting Information Figure S10, Supporting Information Figure S11). In the individual case, the distribution of strength-based centralities for outgoing connections is the opposite of that for incoming connections, with the lowest values in the parietal and occipital cortex.

Next, we analyzed the correlations between the centrality measures for full and partial correlation (Figure 5D, Supporting Information Figure S6). The correlations between the strength-based centrality measures were generally very high (*r* ⩾ .90) within positive and within negative connections. Exceptions were the correlation between eigenvector centrality and strength (*r* = .80) and the correlation between eigenvector centrality and PageRank centrality (*r* = .64) for positive connections on connectomes based on full correlation. In both cases, examination of the scatter plots (Figure 5D) revealed two groups of nodes—one group of nodes had higher similarity between centrality measures, while the other had a lower similarity. This pattern was also observed for negative connections, but with a smaller difference between the two groups.

### Brain-Behavior Associations

Next, we compared patterns of brain-behavior associations derived from different FC methods.

#### Variance component model.

The results of the VCM show that bivariate methods explain about 30 percentage points less variance in behavior than multivariate methods (Figure 7A and B). Furthermore, the similarity of patterns of variance explained over behavioral measures was high between static and dynamic FC methods using the same number of variables, that is, between full correlation and lagged correlation (*r* = 1.00), and between partial correlation and multivariate AR models (*r* = .83–.86, Figure 7A and C). The pattern of similarities in behavioral variance explained between the FC methods was comparable to the direct comparison of the FC matrices (Figure 7C, cf. Figure 2A). Patterns of similarities between the FC methods were similar when the analysis was performed on subsamples of unrelated participants (Supporting Information Figure S12A and C); however, the differences in total variance explained between the bivariate and multivariate methods were smaller (Supporting Information Figure S12B).

Simulation of the effects of noise in which we added various levels of noise to the fMRI time series showed that noise affects estimates of the behavioral variance explained by the connectome. In particular, the mean of the variance explained increased with increasing noise for both the full correlation and the partial correlation, but the increase was more pronounced in the case of partial correlations (Figure 8B). This pattern was not equal for all behavioral variables—for some, the variance explained decreased and for others, it increased (Figure 8A). On the other hand, the similarity between the participants decreased with increasing noise (Figure 8C). This effect was more pronounced for partial correlation compared to full correlation.

#### Canonical correlation analysis.

The results of the similarity between the FC methods when investigating brain-behavior associations using CCA were comparable to those obtained using VCM. In particular, the correlations between the weights or loadings on behavioral measures between the FC methods were high when comparing the methods that use the same number of variables for the estimation of a single edge (*r* > .80) (Figure 9C). On the other hand, there was no discernible difference between dynamic and static FC estimates.

The first canonical correlation was around .35 in the training sample for the bivariate methods and around .30 for the multivariate methods (Figure 9B). Cross-validated R was much lower, around .25 for bivariate methods and around 0.05 for multivariate methods. Although these results differ from VCM (where multivariate methods explained more variance), the pattern of similarity between FC methods is the same.

The pattern of results was similar for the subsamples of unrelated participants (Supporting Information Figure S13B and D), but the differences between the training and test sets were larger (Supporting Information Figure S13A and C). The large difference between the performance of the model in training and test sets is indicative of overfitting, which is characteristic of CCA with a small number of samples per feature (Helmer et al., 2021).

#### Principal least squares.

Analysis of brain-behavior associations using PLS revealed higher similarities between FC methods compared to CCA (Supporting Information Figure S14A and C). Specifically, the correlations between loadings were consistently greater than .91 for all methods compared, and the correlations between weights were greater than .51. Consistent with all previous results, we observed a clear separation between multivariate and bivariate methods when comparing weights, and no difference between static and dynamic FC methods based on the same number of variables. PLS was less generalizable compared to CCA, with canonical correlations on the training sample around .15–.20 and canonical correlations on the test sample around 0–.05.

However, our results also suggest that the high similarities between FC methods in PLS may be due to the strong similarity between the first behavioral canonical component and the first behavioral principal component, as reported in a previous study (Helmer et al., 2021) (Figure 10). The correlations between loadings and the first behavioral principal component were around 1.00, while the correlations between weights and the first behavioral principal component were around .80. In contrast, for CCA, these correlations were about .60–.75 and .10–.25, respectively.

### Evaluation of Similarities Between Methods on Simulated Data

#### Relationship between FC estimates and ground truth.

Correlations of FC estimates with ground truth were greater than 0.8 for full correlation and between 0.25 and 0.9 for partial correlation (Figure 11B). Prewhitening decreased the correlation with ground truth. This effect was more pronounced for partial correlations. Longer time series also had higher correlations with ground truth (the difference was up to .5 for partial correlation and up to .3 for full correlation). The correlation with ground truth generally decreased with decreasing SNR (increasing noise), but in the case of partial correlation, these effects were not monotonic. In particular, for short time series, correlation with ground truth increased with low to moderate noise. Also in the case of partial correlation, prewhitening increased the correlation with ground truth at low noise. In contrast, prewhitening decreased the correlation with ground truth in the presence of high noise compared to the case without prewhitening.

#### Similarity between FC estimates.

The connectivity matrices computed on the simulated data were compared in the same manner as for the experimental data. For brevity, we focus only on the three most relevant comparisons (lagged correlation vs. full correlation, multivariate AR model vs. partial correlation, multivariate AR model without self-connections vs. partial correlation).

Estimates based on lagged and full correlation were highly similar (*r* ≈ 1 in the case without prewhitening) for all levels of noise and signal length (Figure 11C). The correlation between FC estimates was reduced for prewhitened data, especially for low signal lengths (< 1,000 frames).

The FC estimates of the multivariate AR model did not correlate with the FC estimates based on partial correlation when the noise was low (*r* = 0 for zero noise). However, with increasing noise and increasing signal length, FC estimates became very similar (up to *r* = .95), especially in the case without prewhitening and for long signal lengths.

Conversely, FC estimates based on a multivariate AR model without self-connections showed a high similarity to the FC estimates based on partial correlation at a low noise level (*r* > .95). For prewhitened data, there was a nonmonotonic relationship between FC estimates with increasing noise, but overall correlations remained high in conditions with high signal length.

For both multivariate AR models, the similarities to the partial correlation were negative for very short time series. This effect was more pronounced for higher levels of noise, but the relationship with noise was not monotonic.

To better understand the relationship between the multivariate FC methods we plotted the distribution of edge values as a function of noise separately for diagonal and off-diagonal terms (Supporting Information Figure S18). For brevity we did this only for the longest signal (10,000 frames). For the multivariate AR model, the diagonal terms (self-connections) were close to 1 at very high SNR and decreased with decreasing SNR. Conversely, the mean of the off-diagonal terms remained close to zero, regardless of the SNR, but the variability increased with increasing SNR. The opposite pattern was observed when the data were prewhitened. At maximum SNR (i.e., when no noise was added to the data), the diagonal terms were essentially equal to one and the off-diagonal terms were essentially zero, with very low variability compared to all other distributions. The distribution of values at maximum SNR was not affected by prewhitening. For the multivariate AR1 model without self-connections and partial correlations, the variability of the edges decreased with increasing SNR. Prewhitening reduced the variability of the edges.

#### Autocorrelation on the simulated data.

We computed the average autocorrelation function over all participants and regions (Figure 11D, Supporting Information Figure S1). In general, noise and prewhitening reduced the absolute autocorrelation. The shape of the autocorrelation function varied as a function of noise and prewhitening. In the case without prewhitening, the autocorrelation decreased monotonically, reaching 0 at lag 8. With AR1 prewhitening, the autocorrelation decreased to negative values after lag 4.

## DISCUSSION

In this study, we addressed the question of whether the temporal order of the BOLD fMRI time series contains information important for the study of the fMRI brain functional connectivity. To this end, we compared FC estimates between methods that differ in their sensitivity to temporal order, that is, static and dynamic measures of FC. We also compared methods that differed in the number of variables considered in estimating the connectivity of individual edges, that is, bivariate and multivariate. Our results suggest that dynamic FC connectivity methods provide similar connectivity estimates as static FC methods of the same type (bivariate or multivariate), whereas bivariate and multivariate methods differ in terms of the explanation of individual differences in behavior.

### Dynamic Functional Connectomes Represent Information Similar to Static Functional Connectomes

By directly comparing the FC matrices, we have shown that the estimates of the dynamic FC represent information similar to the estimates of the static FC. The similarity between estimates of FC, obtained by different methods, depended on several factors. First, there were high correlations between the FC estimates when the same number of variables was considered (Figure 2A).

Second, similarities between connectomes were greater when averages were compared at the group level than when correlations were aggregated across individual-level FC matrices. We believe that the differences between the group- and individual-level cases are mainly due to better SNR in the case of the group-level data. Two observations support this conclusion: first, similarities in FC estimates between methods were greater for participants with the highest data quality, and this effect was more pronounced when comparing individual-level matrices than at the group level. Second, edges with higher test-retest reliability (an indicator of SNR) were more similar between FC estimates obtained by different methods. Thus, we can conclude that SNR influences the similarity between FC estimates.

Using simulation, we tested the similarities between FC as a function of noise and signal length (Figure 11C). We have shown that the dynamic FC estimates resemble static FC estimates even in the absence of true lagged correlation. The similarity between the multivariate AR1 model and partial correlations can be partially explained by the fact that the multivariate AR1 coefficients are a product of the inverse covariance and the lagged covariance matrix. In the case of the multivariate AR model, the similarity to the partial correlation was actually higher when more noise was added to the data. This occurs because the self-connections (the diagonal term in the AR matrix) act as a prewhitening term. When the SNR was maximal, the self-connections were close to 1 and the off-diagonal terms were close to zero (Supporting Information Figure S18). In other words, the self-connections explained all the variance in the time series and there was no variability left to be explained by the off-diagonal terms. When noise was added to the data, the autocorrelations were reduced (Figure 11D) and the self-connections shrank (Supporting Information Figure S18). Consequently there was less prewhitening due to the self-connections and the off-diagonal elements became more similar to the partial correlations. For the same reason, estimates based on a multivariate AR1 model without self-connections were highly correlated with estimates based on partial correlation regardless of the noise level—there were no self-connections to explain the autocorrelation.

We also found a high similarity between the full and the lagged correlation. Therefore, the similarity between the multivariate AR1 model and the partial correlation cannot be explained solely by the inclusion of the precision matrix in the estimation of the coefficients of the multivariate AR model. Rather, the lagged covariance matrix also contributes to this effect.

We hypothesized that the similarities between the dynamic and static FC estimates originate from autocorrelation of the fMRI time series. We predicted that the similarities between the dynamic and static FC estimates would be at least as large as the average autocorrelation of the fMRI time series and that this similarity would be reduced after prewhitening. Both predictions were confirmed in experimental and simulated data. However, even when autocorrelation was reduced to virtually zero at all lags (this occurred at prewhitening order 3), similarities between estimates based on dynamic and static FC models remained high for group-level matrices and simulated data. This suggests that prewhitening (or even the presence of noise that reduces autocorrelation) does not completely eliminate the influence of convolution with HRF on the estimation of dynamic FC.

We conclude that even if AR models represent information that goes beyond the static FC, this cannot be claimed on the basis of a direct comparison of dynamic and static FC estimates. One of the main differences between static and dynamic FC methods is the ability of dynamic FC methods to estimate the directionality of connections (Smith et al., 2011). FC matrices based on dynamic FC methods are therefore asymmetric. To allow comparisons between static and dynamic FC matrices, the former were symmetrized and the information about the directionality of the connections was lost. To test the possibility that there is specific information in the dynamic FC estimates that could not be detected in a direct comparison of the FC matrices, we additionally compared the node centrality measures and the patterns of brain-behavior associations between the FC methods.

### Network Topology Is Affected by the Functional Connectivity Estimation Method

Examining node centrality measures allowed us to investigate how different FC methods affect network topology. First, we analyzed the similarity between FC estimates based on different FC methods for each centrality measure separately. Overall, the results were consistent with direct comparisons of FC matrices. We found a clear distinction between multivariate and bivariate FC methods, while the difference between static and dynamic FC estimates was rather small (with an important caveat regarding the difference between incoming and outgoing connections; see below). The similarities were also influenced by the choice of the node centrality measure. In particular, the similarities between multivariate and bivariate FC methods were relatively low for eigenvector centrality (from −.15 to .25 for the group-level comparison), while for other centrality measures the similarity between multivariate and bivariate methods was higher (e.g., around .70 for strength).

We explored this finding further by examining the similarities between the centrality measures. While the correlations between the centrality measures were predominantly positive, and in some cases close to 1, there were some exceptions, suggesting that the centrality measures are not redundant. Specifically, for the correlation between eigenvector centrality and strength computed on full correlation connectomes, we observed two groups of nodes, one with higher similarity between the two centrality measures and one with lower similarity (Figure 5D). This pattern has been observed before (Zuo et al., 2012) and suggests that one group of nodes is connected to other important nodes, while the other is mainly connected to less connected nodes. In other words, these two groups of nodes can be distinguished by jointly considering both eigenvector centrality, which measures how well a node is connected to other important nodes (i.e., nodes with many or strong connections), and strength, which is affected only by the number or strength of a node’s connections. Notably, however, we observed this pattern for full correlation, but not for partial correlation. This suggests that indirect connections have an important influence on the global position of nodes in functional connectomes estimated using full correlation. When indirect connections are removed (i.e., when partial correlation is used to estimate FC), the topological position (importance) of a node is the same regardless of the centrality measure. In summary, the choice of FC method has a different impact on the network topology depending on the centrality measure used.

Second, we were interested in the relationship between incoming and outgoing connections. For multivariate AR model estimates, we found a negative correlation between in-strength and out-strength when comparing at the individual level. However, when comparing group-averaged FC matrices, the correlations between in-strength and out-strength were positive. Interestingly, when comparing the partial correlation with the multivariate AR model, the correlations of strength from the partial correlation connectomes were positive with in-strength from the multivariate AR model, but negative with out-strength. The individual-level results confirm previous findings (Gilson et al., 2016), suggesting that brain regions are either feeders or receivers, but not both. However, this information is lost when FC matrices are averaged across subjects. In addition, there was a positive correlation between the in-strength and out-strength of bivariate dynamic FC estimates, regardless of the level of comparison. In FC analyses, individual-level matrices are often averaged, concatenated (e.g., Liégeois et al., 2020), or estimated using a group prior (e.g., Chong et al., 2017). Because group-level FC matrices may be qualitatively different from individual-level FC matrices, we recommend that researchers perform analyses and/or examine results at both the group and individual levels whenever possible and/or meaningful.

### Dynamic FC Models Do Not Explain Additional Variance in Behavior Over Static FC Models

We used the variance component model (VCM), canonical correlation analysis (CCA), and principal least squares (PLS) to estimate brain-behavior associations. The results of all methods showed that there were no large differences between the dynamic and static FC estimates in the patterns of associations with behavior. However, we found large differences between the bivariate and multivariate methods. These differences were specific to the method used to estimate brain-behavior associations.

In the case of CCA, the canonical correlations were higher for bivariate methods than for multivariate methods. The cross-validated canonical correlations for multivariate methods were around 0, indicating that the results were not generalizable. In contrast, the difference between the canonical correlations in the training and test sets was relatively small for the bivariate methods.

In the case of PLS, the similarities between the FC methods were extremely high, especially when we compared loadings. We showed that these results are most likely due to the high similarity of the behavioral loadings and weights to the first behavioral PC, confirming previous observations that the PLS loadings and weights are biased toward the first principal axis, especially at low sample-to-feature ratios (Helmer et al., 2021). Compared to PLS, CCA, on the other hand, shows much less bias toward the first principal axis. In addition, the canonical correlations based on PLS had negligible generalizability. Therefore, we advise users to be cautious when using PLS. We recommend that users perform cross-validation and examine the similarity between canonical weights/loadings and PCs.

In the case of VCM, the multivariate methods explained on average about 30 percentage points more variance in behavior than the bivariate methods. To better understand this observation, we examined the impact of intersubject similarities on VCM results. To this end, we added random noise to the data, reducing the similarities between subjects. Interestingly, full correlation and partial correlation explained more variance in behavior on average when we added random noise to the data. This may sound counterintuitive, but keep in mind that VCM was developed to estimate heritability (Ge et al., 2016), that is, the proportion of variance in phenotype that can be explained by variance in genotype. Holding the environment constant, higher genetic similarity would reduce the estimate of heritability. If all individuals within a sample had the same genotype, heritability would be zero because no variance in phenotype could be explained by variance in genotype. The input to VCM is a between-subject similarity matrix (usually a genetic similarity matrix or, in our case, a connectome similarity matrix). Participants were more similar when we used full correlation as an estimate of FC compared to partial correlation. This explains the observation that the partial correlation explained more variance in behavior.

Our second simulation showed that the partial correlation estimates are less stable and more affected by noise and signal length. This explains the apparent discrepancy between VCM and CCA. Our results show that when we add noise to the experimental data, participants become more dissimilar and, in the case of VCM, the proportion of behavioral variance explained by the variance in the connectome becomes larger. In the case of CCA, lower SNR leads to lower and less generalizable canonical correlations for multivariate FC methods. For this reason, we recommend that great care be taken when estimating brain-behavior associations with measures that are sensitive to noise.

Liégeois et al. (2019) have used VCM to compare brain-behavior associations between correlation and the multivariate AR model. They concluded that the dynamic FC explained variance in behavior beyond that explained by static FC. We have shown that these results are confounded by the mixing of two orthogonal properties of the FC methods: sensitivity to the temporal order of time points and the number of regions used to estimate a single edge. The difference between the explanatory value of the multivariate AR model and the full correlation is better explained by the difference between the multivariate and bivariate nature of the method than by the sensitivity to the temporal order of the time points.

### Relationship Between Static, Dynamic, and Time-Varying Functional Connectivity

As explained in the Introduction, dynamic and time-varying FC encode different aspects of temporal information in FC. Based on previous research investigating resting-state fMRI, which showed that FC is largely stationary (Liégeois et al., 2017) and independent of cognitive content (Laumann & Snyder, 2021), we assumed stationarity of FC time series, and chose models of stationary static and dynamic FC as the basis of our study (as opposed to models of TVFC). Nevertheless, stationary FC is not incompatible with the presence of meaningful FC fluctuations (Ladwig et al., 2022; Liégeois et al., 2017).

Brain states can be estimated using TVFC estimation methods such as hidden Markov models (HMM) or clustering of sliding window correlation (SWC) matrices (Lurie et al., 2020; Preti et al., 2017). Brain states derived in this way have been studied in the context of tracking ongoing cognition and behavior, and also for predicting trait aspects of behavior, such as personality, psychopathology, and performance on cognitive tests (see reviews in J. R. Cohen, 2018; Lurie et al., 2020; Preti et al., 2017). Commonly used metrics derived from brain states include transition matrix (a matrix that encodes the probabilities of transitioning from one state to another), fractional occupancy (proportion of time spent in each state), and switching rate (the frequency of switching between states) (Allen et al., 2014; Eichenbaum, Pappas, Lurie, Cohen, & D’Esposito, 2021; Vidaurre, Llera, Smith, & Woolrich, 2021). In addition, some studies have quantified TVFC using edge variability metrics, such as edge variance or standard deviation (Choe et al., 2017; Thompson & Fransson, 2015), amplitude of low-frequency fluctuations (ALFF) (Zhang, Baum, Adduru, Biswal, & Michael, 2018), and excursions from a median time-varying correlation (Zalesky et al., 2014). Edge variability has been shown repeatedly to be negatively correlated with the static FC (Choe et al., 2017; Thompson & Fransson, 2015; Zalesky et al., 2014; Zhang et al., 2018), suggesting that stronger edges have lower variability, and that variability of FC is partially redundant with the edge strength derived from static and stationary FC.

Several studies have compared TVFC with static and stationary FC in terms of behavioral prediction, showing that TVFC-derived metrics have differential or better predictive power over static/stationary FC and/or anatomical brain features (Eichenbaum et al., 2021; Jia, Hu, & Deshpande, 2014; Jin et al., 2017; Rashid, Damaraju, Pearlson, & Calhoun, 2014; Vidaurre et al., 2021), and also for the prediction of psychopathology. Jin et al. (2017) compared the predictive value of static, dynamic, and time-varying FC, and showed that TVFC had the best predictive value for PTSD. However, consistent with our findings, dynamic FC was only slightly better than static FC. Note that some of these studies suffer from methodological shortcomings, such as small sample sizes (Eichenbaum et al., 2021; Jia et al., 2014), and thus the results may have low generalizability (Helmer et al., 2021; Marek et al., 2022). Nevertheless, overall, the results suggest that TVFC does contain additional information beyond static or dynamic FC. Further studies are needed to reconcile these findings with the evidence that resting-state connectivity is largely stationary.

### Limitations and Future Directions

A number of limitations should be considered in drawing conclusions from our study. First, in our simulation, we generated noise using a multivariate normal distribution. We could have used more advanced noise modeling that incorporated specific noise components such as drift, moving average, physiological noise, and system noise (Ellis, Baldassano, Schapiro, Cai, & Cohen, 2020). Unlike white noise, these noise sources are autocorrelated and therefore could affect the (dynamic) FC estimates. We wanted to keep the model simple and interpretable. Even with the simplest noise model without autocorrelation in neural time series, we showed that AR models can be affected by convolution of the neural signal with HRF and that consequently the dynamic FC estimates resemble the static FC. However, more advanced noise modeling could be used for a more realistic assessment of the sources of similarities between different FC methods.

Similarly, we used a very simple procedure, prewhitening, to reduce autocorrelation. Other methods could also be used to reduce autocorrelation, such as advanced physiological modeling (Bollmann, Puckett, Cunnington, & Barth, 2018; Chen, Polimeni, Bollmann, & Glover, 2019) or deconvolution (Rangaprakash, Wu, Marinazzo, Hu, & Deshpande, 2018). Deconvolution can improve dynamic (David et al., 2008) and static FC estimates (Rangaprakash et al., 2018). However, Seth et al. (2013) have shown that sufficient sampling rate is more important for valid dynamic FC estimates. Unlike fMRI, electrophysiological measurements such as EEG and MEG have sufficient sampling rates and do not require deconvolution, so they could be used to study the relationship between static and dynamic FC (Tagliazucchi & Laufs, 2015). Note that in EEG, volume conduction can inflate zero-lag connectivity, so careful consideration is needed to disentangle true zero-phase lag connectivity from volume conduction effects (M. X. Cohen, 2014). Furthermore, because instantaneous (zero-lag) signal transmission is not physiologically plausible, zero-phase lag effects in EEG most likely reflect indirect (noncausal) connections, whereas lagged effects are influenced by both indirect and direct (causal) connections. Therefore, the comparison of static and dynamic FC measures in the EEG can be used to disentangle causal and noncausal effects.

### Conclusions

Our results show that the dynamic FC estimates represent information about connectivity that is broadly similar to the static FC. Moreover, we have shown that the similarity between dynamic and static FC is due, at least in part, to the convolution of neural time series with HRF. In contrast, we observed less similarity in the patterns of FC estimates between multivariate and bivariate methods. Multivariate FC methods were more sensitive to noise and CCA models based on multivariate methods were less generalizable. We also showed that the choice of FC methods affects the network topology, with a noticeable difference between multivariate and bivariate FC estimates, and only small differences between dynamic and static FC estimates. While dynamic FC estimates can still provide information about the directionality of the connections, careful inspection of the results is required, as this information may change after averaging the FC matrices across participants.

Although dynamic FC models are useful as a model for directed FC or for modeling the evolution of neural time series over time (Liégeois et al., 2017), our results suggest that estimates of the functional connectome change very little when temporal information is taken into account. Dynamic FC estimates also show strong similarity to static FC in terms of brain-behavior associations.

## ACKNOWLEDGMENTS

Data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.

## SUPPORTING INFORMATION

Supporting information for this article is available at https://doi.org/10.1162/netn_a_00325. Raw data are available as part of the Human Connectome Project (https://www.humanconnectome.org/). For CCA and PLS, we used the GEMMR package: https://github.com/murraylab/gemmr. Strength-based node centrality measures were computed using the Brain Connectivity Toolbox (https://sites.google.com/site/bctnet/). The code for the normalized participation coefficient is available in the repository: https://github.com/omidvarnia/Dynamic_brain_connectivity_analysis. All other relevant code is available in the Open Science Framework repository: https://dx.doi.org/10.17605/OSF.IO/XFTDH.

## AUTHOR CONTRIBUTIONS

Andraž Matkovič: Conceptualization; Formal analysis; Investigation; Methodology; Visualization; Writing – original draft; Writing – review & editing. Alan Anticevic: Conceptualization; Writing – review & editing. John D. Murray: Conceptualization; Writing – review & editing. Grega Repovš: Conceptualization; Funding acquisition; Project administration; Software; Supervision; Writing – original draft; Writing – review & editing.

## FUNDING INFORMATION

Grega Repovš, Javna agencija za znanstvenoraziskovalno in inovacijsko dejavnost RS (https://dx.doi.org/10.13039/501100004329), Award ID: J7-8275. Grega Repovš, Javna agencija za znanstvenoraziskovalno in inovacijsko dejavnost RS, Award ID: J5-4590. Simon Podnar, Javna agencija za znanstvenoraziskovalno in inovacijsko dejavnost RS (https://dx.doi.org/10.13039/501100004329), Award ID: P3-0338. Anja Podlesek, Javna agencija za znanstvenoraziskovalno in inovacijsko dejavnost RS (https://dx.doi.org/10.13039/501100004329), Award ID: P5-0110.

## COMPETING INTERESTS

J.D.M. and A.A. consult for and hold equity with Neumora (formerly BlackThorn Therapeutics), Manifest Technologies, and are co-inventors on the following patents: Anticevic A, Murray JD, Ji JL: Systems and Methods for Neuro-Behavioral Relationships in Dimensional Geometric Embedding (N-BRIDGE), PCT International Application No. PCT/US2119/022110, filed March 13, 2019 and Murray JD, Anticevic A, Martin, WJ:Methods and tools for detecting, diagnosing, predicting, prognosticating, or treating a neurobehavioral phenotype in a subject, U.S. Application No. 16/149,903 filed on October 2, 2018, U.S. Application for PCT International Application No. 18/054,009 filed on October 2, 2018. G.R. consults for and holds equity with Neumora (formerly BlackThorn Therapeutics) and Manifest Technologies. A.M. declares no conflict of interest.

## TECHNICAL TERMS

- Functional connectivity:
Represents statistical associations between signals from brain regions.

- Dynamic:
A model or measure is sensitive to temporal reordering of time points.

- Static:
A model or measure is insensitive to temporal reordering of time points.

- Autoregressive model:
A (linear) model in which the value of a next time point is predicted from previous time points.

- Hemodynamic response function:
A nonlinear function that describes the flow of oxygenated blood after activity in a brain region.

- Autocorrelation:
A correlation of a time series with its lagged version.

- Variance component model:
Estimates how much variance in a dependent variable can be explained by different sources of variability.

## REFERENCES

## Competing Interests

Competing Interests: See Competing Interests.

## Author notes

Handling Editor: Alex Fornito