Abstract

The brain consists of specialized cortical regions that exchange information with each other, reflecting a combination of segregated (local) and integrated (distributed) processes that define brain function. Functional magnetic resonance imaging (fMRI) is widely used to characterize these functional relationships, although it is an ongoing challenge to develop robust, interpretable models for high-dimensional fMRI data. Gaussian mixture models (GMMs) are a powerful tool for parcellating the brain, based on the similarity of voxel time series. However, conventional GMMs have limited parametric flexibility: they only estimate segregated structure and do not model interregional functional connectivity, nor do they account for network variability across voxels or between subjects. To address these issues, this letter develops the functional segregation and integration model (FSIM). This extension of the GMM framework simultaneously estimates spatial clustering and the most consistent group functional connectivity structure. It also explicitly models network variability, based on voxel- and subject-specific network scaling profiles. We compared the FSIM to standard GMM in a predictive cross-validation framework and examined the importance of different model parameters, using both simulated and experimental resting-state data. The reliability of parcellations is not significantly altered by the flexibility of the FSIM, whereas voxel- and subject-specific network scaling profiles significantly improve the ability to predict functional connectivity in independent test data. Moreover, the FSIM provides a set of interpretable parameters to characterize both consistent and variable aspects of functional connectivity structure. As an example of its utility, we use subject-specific network profiles to identify brain regions where network expression predicts subject age in the experimental data.
Thus, the FSIM is effective at summarizing functional connectivity structure in group-level fMRI, with applications in modeling the relationships between network variability and behavioral/demographic variables.

1  Introduction

The brain exhibits network organization, consisting of functionally distinct cortical regions that share information with each other. Thus, cognitive states are defined by two modes of brain function: specialized, local processing (segregation) and interactions between these segregated regions (integration) (Tononi, Sporns, & Edelman, 1994). A critical goal of neuroscience research is to learn the “segregative” and “integrative” properties of brain function that underlie healthy and impaired cognition (Friston, 2002; Fair et al., 2007; Rubinov & Sporns, 2010). Blood oxygenation level dependent functional magnetic resonance imaging (BOLD fMRI) provides a powerful technique for investigating the network properties of brain function, typically by measuring the covariance (or correlation) between BOLD time series of different brain regions. This approach to measuring functional connectivity has seen increasing use over the past decade, spurred by the identification of reliable functional networks in the brain (Raichle et al., 2001; Greicius, Krasnow, Reiss, & Menon, 2003), with the literature showing numerous applications, including aging (Grady, 2008), maturation (Power, Fair, Schlaggar, & Petersen, 2010), and characterizing clinical populations (Greicius, Srivastava, Reiss, & Menon, 2004; Garrity et al., 2007; Lowe et al., 2002; Grefkes et al., 2008).

Functional connectivity studies typically attempt to quantify integrative relationships between brain regions. However, it is an ongoing challenge to identify generalizable features of functional connectivity, as the number of brain voxels V usually exceeds the number of samples N by orders of magnitude, and we must summarize covariance relationships among a massive set of possible elements. Functional connectivity estimates may also be altered by noise confounds (Chang & Glover, 2009; Van Dijk, Sabuncu, & Buckner, 2012) and variability in functional networks, both within and between subjects. A simple modeling approach is to parcellate the cortex based on an anatomical atlas and measure functional connectivity between mean time series of these regions. However, atlas-based parcellations may be inconsistent, as they are highly sensitive to spatial normalization methods (Bohland, Bokil, Allen, & Mitra, 2009), and may poorly represent populations that deviate from the anatomical template, including child, aging, and clinical populations (Crinion et al., 2007; Rorden, Bonilha, Fridriksson, Bender, & Karnath, 2012). This is a major issue, as suboptimal parcels can lead to significant network modeling errors (Craddock, James, Holtzheimer, Hu, & Mayberg, 2012; Smith et al., 2013).

As an alternative, there is growing interest in unsupervised clustering models, which automatically parcellate brain regions based on the similarity of their BOLD time series. A variety of approaches have been successfully applied to fMRI data, including variants of k-means clustering (Goutte, Toft, Rostrup, Nielsen, & Hansen, 1999; Golland, Golland, Bentin, & Malach, 2008; Bellec, Rosa-Neto, Lyttelton, Benali, & Evans, 2010), spectral clustering (Van den Heuvel, Mandl, & Hulshoff Pol, 2008; Craddock et al., 2012), region-growing approaches (Lu, Jiang, & Zang, 2003; Bellec et al., 2006), hierarchical models (Cordes, Haughton, Carew, Arfanakis, & Maravilla, 2002; Thirion et al., 2006), and mixture models (Woolrich, Behrens, Beckmann, & Smith, 2005; Lashkari et al., 2012). Gaussian mixture models (GMMs) are particularly appealing, as they employ a probabilistic framework that does not require clustering threshold heuristics, and have a rich, well-developed theory for the analysis of large data sets (McLachlan & Peel, 2000; Fraley & Raftery, 2002; Melnykov & Maitra, 2010). This latent variable modeling approach assumes voxels originate from a set of K clusters, each with a mean time series and normally distributed variability.
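For illustration, the latent variable approach described above can be sketched in a few lines of NumPy. The toy example below (all names and dimensions are illustrative, not taken from the letter) fits a spherical GMM to simulated voxel time series by expectation-maximization and parcellates voxels by their top-responsibility cluster:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "voxel time series": V voxels of length T, drawn from K latent clusters.
V, T, K = 60, 40, 2
true_means = rng.standard_normal((K, T))
labels_true = rng.integers(0, K, size=V)
X = true_means[labels_true] + 0.3 * rng.standard_normal((V, T))

# Farthest-point initialization of cluster means (keeps the demo deterministic).
idx = [0]
for _ in range(K - 1):
    d = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1), axis=1)
    idx.append(int(np.argmax(d)))
mu = X[idx].copy()
pi = np.full(K, 1.0 / K)      # cluster priors
sig2 = np.ones(K)             # per-cluster (spherical) noise variance

for _ in range(50):
    # E-step: responsibilities via Bayes' rule, computed in log space.
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)          # (V, K)
    logp = np.log(pi) - 0.5 * (T * np.log(2 * np.pi * sig2) + d2 / sig2)
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: maximum-likelihood updates of priors, means, variances.
    nk = r.sum(axis=0)
    pi = nk / V
    mu = (r.T @ X) / nk[:, None]
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    sig2 = (r * d2).sum(axis=0) / (nk * T)

parcellation = r.argmax(axis=1)   # hard assignment: top-responsibility cluster
```

Note that this sketch estimates only "segregative" structure (cluster means and assignments); any between-cluster connectivity would have to be computed post hoc, which is exactly the limitation the FSIM addresses.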

When parcellating fMRI data, conventional GMMs have two significant limitations, which are common to all temporal clustering models. The first issue is that GMMs treat individual cluster means independently and only estimate “segregative” brain structure (i.e., within-cluster mean BOLD time series), requiring users to measure between-cluster functional connectivity as a post hoc exercise. This is potentially an oversimplified representation of the highly integrated human brain and does not exploit between-cluster functional connectivity information during model learning. The second issue is a lack of parametric flexibility in GMMs, as within-cluster mean BOLD time series (and post hoc connectivity estimates) represent the BOLD response across all voxels and subjects. They do not model voxel-dependent variability in signal amplitude or subject-dependent variations in functional connectivity. The inclusion of these parameters in GMMs would provide a more detailed description of functional structure in fMRI data and potentially make clustering models robust to outliers.

This letter addresses the above issues by extending the GMM framework to identify the most stable parcellation for group fMRI in terms of between-cluster functional connectivity. Termed the FSIM (functional segregation and integration model), the model reparameterizes cluster means using a constraint adapted from multilinear component modeling (Harshman, 1972; Kiers, Ten Berge, & Bro, 1999), in order to learn the most consistent group functional connectivity structure. Moreover, FSIM estimates voxel-specific and subject-specific network scaling profiles. This framework allows us to identify a core functional connectivity structure that is most consistent within the group, as well as spatial and subject-dependent variations in network scaling. Figure 1 depicts the features estimated in standard mixture models and compares them to the proposed FSIM framework.

Figure 1:

Schematic comparing (A) standard temporal clustering models to (B) the proposed FSIM. Whereas standard clustering models focus on the parcellations (clusters zkv), the FSIM simultaneously estimates clusters zkv and between-cluster connectivity matrix . It also estimates voxel scaling profiles (i.e., voxel-wise expression of ) and subject scaling profiles (i.e., network scaling of for subject s).

The scaling profile approach of FSIM also bridges the gap between discrete mixture modeling and linear component-based profile models, which are well established in the functional neuroimaging literature. This includes bilinear techniques such as scaled subprofile models (Moeller, Strother, Sidtis, & Rottenberg, 1987) and partial least squares (McIntosh, Chau, & Protzner, 2004), which characterize structured covariance across subjects and task conditions. Multilinear models such as parallel factor analysis (PARAFAC) have also been applied to functional neuroimaging (Andersen & Rayens, 2004; Miwakeichi et al., 2004; Martínez-Montes, Vega-Hernández, Sánchez-Bornot, & Valdés-Sosa, 2008), providing scaling profiles that describe voxel × time × subject structure in group data. A more flexible extension of PARAFAC, termed PARAFAC2, was proposed by Harshman (1972) and further refined by Kiers et al. (1999). It allows for component time series to be asynchronous across subjects, under the constraint of a consistent between-component correlation matrix. The PARAFAC2 model forms the basis for the proposed FSIM by applying a constraint of consistent between-cluster functional connectivity to the GMM. The discrete latent variable approach to modeling functional connectivity has less flexibility compared to component-based models (which represent voxel time series as a linear combination of components) but is able to define discrete cortical subdivisions, making it possible to fully describe data in terms of the extracted connectivity networks. This is of particular interest for research domains associated with network and graph-theoretic models of brain function.

In this letter, we develop the FSIM framework and an inference procedure based on expectation-maximization (EM). We then implement a cross-validation framework in order to determine how different parameters of the FSIM affect the model’s ability to predict between-voxel connectivity relationships and the stability of model parameters, including brain parcellations and the between-cluster connectivity matrix. The results are evaluated for both simulated and experimental resting-state data sets. In addition, the FSIM generates a set of parameters that can be used to characterize the functional organization of group fMRI data. As an example of the utility of these parameters, we demonstrate how network scaling profiles can be used to predict age for the experimental data set.

This letter focuses on the unsupervised modeling of functional connectivity, quantifying the linear dependencies between brain regions based on their BOLD time series. This is distinct from models of directed or effective connectivity, of the sort commonly applied in dynamic causal modeling (e.g., Penny, Stephan, Mechelli, & Friston, 2004). These approaches serve complementary roles, as functional connectivity methods are useful exploratory tools that are able to identify structure in high-dimensional data, whereas models of effective connectivity are better suited for directed hypothesis testing. (See section 4 for further review of these issues.)

2  Materials and Methods

The FSIM framework is established in section 2.1, based on the GMM formalism and solved via the expectation-maximization (EM) algorithm. Different possible FSIM implementations and parameter choices are reviewed in section 2.2, followed in section 2.3 by the cross-validation framework for measuring model generalizability. Finally, we describe the experimental data and preprocessing parameters used to evaluate the models in section 2.4. For this letter, the EM algorithm is used to optimize model log likelihood as the most direct extension of the PARAFAC2 fitting approach (Harshman, 1972; Kiers et al., 1999) to the mixture-modeling framework. This approach, combined with the proposed cross-validation framework, allows us to explore trade-offs in both predictive generalization and parameter reproducibility as a function of FSIM parameter choices. Alternative Bayesian approaches to model estimation, along with implementation challenges in the FSIM framework, are reviewed in section 4.

2.1  The FSIM Framework

2.1.1  Model Definition

For a group of subjects , we have a set of fMRI data matrices (V voxels, acquired over T time points), and it is assumed that time-series vectors are generated from a mixture of K multivariate gaussian distributions. Each voxel has a set of binary latent variables defining the cluster it originates from: for voxel v, a value indicates it originates from cluster k, and thus all other . In this model, subjects are governed by the same latent variables—that is, they have the same spatial clustering structure but different model parameters. Our goal is to estimate maximum-likelihood model parameters for the full observed data set by marginalizing zkv out of the likelihood of the observed data:
formula
where is the prior likelihood of the kth cluster and is the likelihood of voxel time-series originating from the kth cluster under an assumed Gaussian model. The mean cluster time-series are subject dependent. In analogy to component-based models, we also assume that the cluster means have a voxel-dependent scaling factor wkv. In this GMM representation, we now enforce a consistent covariance structure between the cluster means over all subjects. This is done by reparameterizing them using the approach devised by Harshman (1972) and Kiers et al. (1999) for component-based group covariance estimation in the PARAFAC2 tensor model. For each subject, the (column) matrix of cluster means is defined by
formula
where are constrained to be columnwise orthonormal (), is fixed across subjects, and is a diagonal matrix of cluster scalings. For mean-centered time series, the covariance between clusters is therefore .
This model identifies a fixed connectivity matrix across subjects while admitting subject-specific connectivity scaling via the diagonal matrix . Assembling terms, we define data likelihood to be
formula
2.1
To avoid scaling ambiguity between parameters, we rescale wkv and columns of to unit norm and absorb subject-specific scaling into the elements of . This defines an explicit network-driven clustering model, where individual clusters form network elements. We directly estimate the most stable intercluster correlation across subjects while also learning subject- and voxel-specific network scaling. The FSIM therefore clusters voxels into groups (segregation) in terms of consistent between-cluster correlations (integration).
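The shared-covariance constraint can be verified numerically. The sketch below assumes the notation described in the text (a columnwise-orthonormal subject basis, a connectivity-forming matrix fixed across subjects, and diagonal subject scalings; variable names are our own): it builds subject cluster means as the product of the three factors and checks that, once the diagonal scalings are normalized away, every subject has an identical between-cluster correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
T, K, S = 50, 4, 3

F = rng.standard_normal((K, K))   # shared connectivity-forming matrix (F^T F fixed)
corrs = []
for s in range(S):
    # Columnwise-orthonormal temporal basis per subject: U^T U = I
    U, _ = np.linalg.qr(rng.standard_normal((T, K)))
    C = np.diag(rng.uniform(0.5, 2.0, K))      # diagonal subject network scalings
    M = U @ F @ C                              # cluster mean time series (T x K)
    cov = M.T @ M                              # equals C (F^T F) C
    d = np.sqrt(np.diag(cov))
    corrs.append(cov / np.outer(d, d))         # correlation: scalings cancel
```

Because the orthonormal basis drops out of the product and the diagonal scalings cancel under correlation normalization, the subject-level covariances differ only by the scalings while the correlation structure is held fixed, which is the segregation-with-consistent-integration property of the FSIM.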

2.1.2  Expectation-Maximization Solution

For model inference, we consider the EM approach widely adopted in mixture modeling (Fukunaga, 2013; Duda & Hart, 1973; Dempster, Laird, & Rubin, 1977; McLachlan & Peel, 2000) to optimize the log likelihood of equation (2.1):
formula
2.2
This problem is analytically intractable; the EM algorithm incorporates latent variable matrix and uses Jensen’s inequality to establish that the expectation of log likelihood over observed and latent variables,
formula
is a lower bound on the full-data log likelihood (Dempster et al., 1977). We iteratively optimize this more tractable expression, which increases the log likelihood of equation 2.2 up to a stationary point. To perform EM, we take the expectation of the full-data log likelihood with respect to , which is
formula
where denotes model parameters. This is the auxiliary equation Q that we must iteratively optimize. If we define cluster responsibilities and replace with the joint gaussian probability over all subjects, this expands into the expression
formula
2.3
where . We now solve for the expectation and maximization steps of this function. The parameter update equations are defined below (refer to appendix A for full solution details). We first update cluster responsibilities (E-step), then compute maximum-likelihood estimates of all model parameters for the auxiliary function Q (see equation 2.3; M-step). The M-step updates are obtained by solving for each parameter. For notational convenience, the solutions to the update equations are simply referred to as below. Resolving each in turn:

E-Step

  1. Cluster responsibilities: The probability of assigning voxel time series to cluster k, conditional on the posterior expectations of . It is obtained via Bayes’ rule:
    formula
    Specifically, these values reflect the likelihood that the vth voxel belongs to cluster k. The estimates are used to parcellate brain voxels into K subregions by assigning each voxel to the cluster with the highest responsibility. (See the paragraphs following the M-step list for further details.)
M-Step
  1. Cluster priors: The average probability of cluster k computed over all brain voxels:
    formula
  2. Noise variance: Computed as the variance of each voxel time series, about the (scaled) cluster mean. Note that noise variance is estimated as a function of cluster, subject, and voxel:
    formula
  3. Voxel-specific network scaling: Computed as variance-weighted regression of voxel time series against cluster means (with all terms except wkv included):
    formula
    This measure therefore has a practical interpretation; it produces a set of K voxel scaling profiles (brain maps), reflecting the similarity between each voxel time series and the kth cluster mean.
  4. Subject-specific network scaling: The elements of the diagonal network scaling matrices can be solved independently. They are computed as variance-weighted regressions of voxel time series against cluster means ( with all terms except cks included):
    formula
    This subject scaling profile also has direct interpretation, as it quantifies the relative amplitude of network expression for each subject.
  5. Group connectivity matrix and orthonormal subject matrices: We define the following terms: , , , where, for example, refers to the K-dimensional vector of all values for a fixed v, converted into a K-dimensional diagonal matrix. If we also define 1 as a vector of ones, the group connectivity-forming matrix is given by
    formula
    For the orthonormal subject matrices, the singular value decomposition is obtained, such that the solution is given by
    formula
    with further details in appendix A. From the above equations, we see that the parameters have complex interdependencies. Due to EM convergence properties, it is sufficient to increase the likelihood of individual parameters for each cycle in order to converge to a (local) maximum, in an approach sometimes termed “generalized EM” (Laird, Lange, & Stram, 1987). To fit the FSIM, we iteratively perform an E-step update, followed by cyclic updating of the model parameters during the M-step.

To facilitate model interpretation at the group level, nonnegativity constraints are imposed on the voxel scaling wkv and subject scaling cks terms. Unlike component models, where individual elements are interdependent, necessitating numerical optimization methods (Gillis, 2014), the FSIM update equations show independence between elements of wkv and between elements of cks when keeping other parameters fixed. Thus, the maximum-likelihood solution at a stationary point x0 is given simply by .

For GMMs, the brain is parcellated into K regions, based on the assumption that each voxel originates from one of K latent clusters. The binary latent variables dictate cluster assignments, with indicating that voxel v originates from cluster k, and for all . However, the latent variables are unobservable quantities. Thus, GMMs typically use responsibilities to infer the most probable parcellation of brain voxels . Voxels are assigned to their highest-probability clusters: for the vth voxel, we set for the cluster k with the highest responsibility (and for all ). In principle, responsibilities freely range between 0 and 1. However, for fMRI data, we generally observe that responsibilities converge to binary values , which directly form the expected latent variable assignments. For notational convenience, we therefore denote the expected latent variable assignments of a clustering solution , although they are not the “true” latent variables.

2.1.3  Variance Prior and Model Convergence

One limitation of GMM clustering is that it is prone to model collapse if a singleton cluster is identified. In this case, noise variance shrinks to zero, giving infinite model likelihood . To control for this issue, a weak prior is imposed, based on the improper distribution:
formula
This leads to a modified M-step update for variance (derived in appendix B):
formula
In the limit where and for a fixed cluster k and voxel v, we now obtain a nonzero . In order to minimize the influence of the prior on our solution, the hyperparameter is set to a minimal value of . This provides a maximum a posteriori (MAP) solution to the problem, based on an augmented M-step. In more general variational Bayes settings, the prior would be naturally absorbed into the variational free energy.

2.2  Model Validation

We examined how the addition of flexible modeling parameters affects model convergence, predictive generalization of functional connectivity, and parameter reproducibility compared to standard GMMs. The following four models were examined, ordered by increasing number of fitted parameters:

  1. Segregative model, no voxel scaling (SX): The standard GMM signal model. Cluster means are distributed as .

  2. Segregative model, voxel scaling (SV): This model includes the voxel-specific network scaling parameter wkv. Cluster means are distributed as .

  3. Integrative model, no subject scaling (IX): The cluster means are now constrained to have a consistent connectivity matrix over all subjects. Cluster means are distributed as .

  4. Integrative model, subject scaling (IS): This model admits subject-specific network scaling terms . Cluster means are distributed as .

The optimization space for GMMs is nonconvex. While the continuous latent variable model of PARAFAC2 has been shown to find a global optimum under most conditions (Kiers et al., 1999), this is not guaranteed for FSIM. For all models, to reduce the risk of identifying local minima, clustering was initialized by 10 runs of k-means clustering with random seed initialization, and the maximum-likelihood k-means solution was used to initialize the considered mixture model, which was then iteratively updated until convergence (see section 2.2.1 for further details on convergence criteria). To further avoid local minima in the mixture analyses, we performed 10 restarts of the entire process (10 k-means restarts, followed by full mixture-model convergence), and selected the maximum-likelihood overall solution. To determine how parameter choices affect the performance of the mixture models, we examined three issues: model convergence properties, generalization of between-voxel covariance relationships, and parameter reproducibility.
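The restart procedure can be sketched as follows, with plain-NumPy Lloyd's k-means and within-cluster sum of squares (SSE) standing in as the selection criterion for the best run; all names, dimensions, and the number of iterations are illustrative:

```python
import numpy as np

def kmeans(X, K, rng, iters=50):
    """Lloyd's algorithm: return labels and within-cluster sum of squares (SSE)."""
    mu = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        lab = d.argmin(axis=1)
        for k in range(K):
            if np.any(lab == k):              # skip empty clusters
                mu[k] = X[lab == k].mean(axis=0)
    return lab, float(((X - mu[lab]) ** 2).sum())

rng = np.random.default_rng(2)
centers = 4.0 * rng.standard_normal((3, 10))
X = np.vstack([c + rng.standard_normal((50, 10)) for c in centers])

# 10 random-seed restarts; keep the best (lowest-SSE) solution to initialize EM.
runs = [kmeans(X, 3, np.random.default_rng(seed)) for seed in range(10)]
labels, sse = min(runs, key=lambda t: t[1])
```

In the letter's pipeline the winning k-means labeling would then seed the mixture model, and the whole restart-plus-EM process is itself repeated to further reduce the risk of local minima.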

2.2.1  Model Convergence

We compared the training model likelihoods and convergence rates for the different mixture models. The log likelihood was plotted as a function of the EM iteration step, for the 10 initializations of each mixture model. We also compared the average number of EM iterations until the fractional change in log likelihood fell below a fixed threshold of 10, and the average number of EM iterations until there were no measured changes in cluster assignments zkv. For all subsequent clustering results, EM iteration was terminated when both of these criteria were met.

We also measured how sensitive the spatial segregations are to different initializations. For each model, we compared spatial parcellations between the 10 restarts, which can be represented in a binary matrix , where column k indicates whether voxels belong to cluster k. For each pair of mixture model solutions , we computed the normalized mutual information (NMI) between and . This is a nonparametric measure of shared information, analogous to the Pearson correlation coefficient, and is invariant to permutations in the clustering partitions. It is computed as
formula
2.4
based on cluster entropies and , and unnormalized mutual information . Entropy measures were obtained by computing the discrete probabilities of each cluster in , and mutual information is given by the identity . We compared the average convergence rates and NMI values for different mixture models for a set of representative model orders K.
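Equation 2.4 can be computed directly from contingency counts. A minimal sketch, assuming the geometric-mean normalization NMI = I(Z1, Z2) / sqrt(H(Z1) H(Z2)) (one common convention; the letter's exact normalization is not shown here):

```python
import numpy as np

def nmi(a, b):
    """Normalized mutual information between two hard clusterings,
    invariant to permutations of the cluster labels."""
    a, b = np.asarray(a), np.asarray(b)
    joint = np.zeros((a.max() + 1, b.max() + 1))
    for i, j in zip(a, b):                    # contingency table of co-assignments
        joint[i, j] += 1.0
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = (joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz])).sum()
    ha = -(pa[pa > 0] * np.log(pa[pa > 0])).sum()
    hb = -(pb[pb > 0] * np.log(pb[pb > 0])).sum()
    return mi / np.sqrt(ha * hb)
```

Identical parcellations yield NMI = 1 even when the cluster labels are permuted, while statistically independent parcellations yield values near 0.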

2.2.2  Covariance Generalization and Model Reproducibility

The EM approach is prone to overfitting, and thus we cannot directly perform model comparison based on the (biased) likelihood defined in equation 2.2. Instead, split-half cross-validation is used to obtain unbiased estimates of model generalization error and parameter reproducibility. Generalization error quantifies the model’s ability to predict structure in independent data, and statistical models are often chosen to minimize generalization error, as this provides an optimal balance between bias and variance in parameter estimates (Efron & Tibshirani, 1986). Reproducibility determines the stability of the model parameters, which are used to visualize and interpret brain function. The proposed cross-validation framework allows us to evaluate both metrics, which exhibit complex trade-offs when fitting multivariate models of neuroimaging data (Rasmussen, Hansen, Madsen, Churchill, & Strother, 2012; Strother, Rasmussen, Churchill, & Hansen, 2015).

Covariance Generalization. We evaluated the generalization of clustering models, based on the predictive likelihood of subject covariance for independent test data, where denotes temporal degrees of freedom. This approach is similar to the one used for component subspace selection in Hansen et al. (1999), but here used to directly predict covariance matrices. We employed a within-subject split-half framework for group data, as individual subjects are expected to have different covariance structure. For a subject data set , each matrix was split into two halves containing an equal number of time points. The FSIM was fit on split-half 1 and used to predict covariance for split-half 2. The split halves were then exchanged, with FSIM fit to split-half 2 and covariance predicted for split-half 1. Afterward, prediction was averaged across the two splits. For a K-component clustering model, the predictive likelihood is defined as follows.

Consider the standard GMM. For a matrix of cluster means and latent variable matrix , the K-cluster representation of the signal subspace is . The covariance of the estimated signal matrix is
formula
To model total covariance, we also include noise estimated by the mixture models. For subject s, a diagonal matrix of voxel-wise noise estimates is defined . That is, for voxel v and subject s, we assign variance from the “true” latent class. To correctly account for degrees of freedom, the values are rescaled by , giving full-rank covariance estimate for subject s:
formula
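A schematic of this construction (shapes, names, and the degrees-of-freedom scaling are illustrative, following the description above loosely): hard assignments select each voxel's cluster mean to form a rank-K signal representation, and adding the diagonal voxel-noise matrix yields a full-rank covariance prediction.

```python
import numpy as np

rng = np.random.default_rng(3)
V, T, K = 30, 20, 3
M = rng.standard_normal((T, K))                  # cluster mean time series
Z = np.zeros((V, K))                             # hard latent assignments
Z[np.arange(V), rng.integers(0, K, size=V)] = 1.0
noise_var = rng.uniform(0.5, 1.5, size=V)        # voxel-wise noise variances

df = T - 1                                       # temporal degrees of freedom
signal = Z @ M.T                                 # V x T rank-K signal representation
Sigma_hat = (signal @ signal.T) / df + np.diag(noise_var)
```

The rank-K signal term alone would make the prediction singular; the diagonal noise term restores full rank, which is required for the predictive likelihood that follows.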
To model log likelihood of the ill-posed sample covariance matrix , we employ the singular Wishart distribution (Uhlig, 1994):
formula
2.5
where is the pseudo-determinant, given by the product of all nonzero singular values. This expression can be solved for high-dimensional fMRI data using a set of matrix identities (see appendix C for further details). We computed the total log likelihood per clustering model by summing over all subjects, which was then plotted as a function of model order K.

Parameter reproducibility. It is crucial to evaluate the reliability of GMM parameters, as they are used to interpret network structure in the brain. Reproducibility of spatial parcellations was evaluated using the split-half approach defined for covariance generalization. FSIM models were fitted independently for data split halves 1 and 2, and the resulting latent variable matrices were compared via normalized mutual information (NMI) as defined in equation 2.4. This quantifies the test-retest reliability of clustering models for a fixed group of subjects.

We also evaluated the reliability of between-cluster connectivity for the full FSIM in order to demonstrate that it produces robust network models. For this letter, we focused on the partial correlation matrix , which may be computed from (Marrelec et al., 2006). This measure is of particular interest for connectivity modeling, as it accounts for interdependencies between brain regions. Because split halves may have different parcellations, their connectivity values cannot be directly compared. As an alternative, we employed a within-split bootstrapping procedure. The FSIM was initially fit using all subject data and the resulting parcellation held fixed. Subjects were then sampled with replacement for iterations. For each bootstrap sample, we refit the FSIM for fixed , producing a new estimate of , which was used to compute . For each connectivity element (), we obtained an empirical significance estimate by converting bootstrapped partial correlations to normally distributed variables via Fisher’s Z-transform . From the set of transformed values , we computed the bootstrap ratio as the mean divided by the standard error, which has a z-score distribution (Efron & Tibshirani, 1986), allowing us to estimate p-values based on the two-tailed normal distribution. We estimated p-values for all elements of , adjusted for multiple comparisons at a false-discovery rate (FDR) threshold of 0.05.
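The significance procedure can be sketched as follows, using synthetic bootstrap draws in place of refit partial correlations; the Fisher transform, the bootstrap-ratio p-value, and a Benjamini-Hochberg FDR step follow the description above, with all numbers illustrative:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)

def bsr_pvalue(r_boot):
    """Fisher Z-transform bootstrapped correlations, form the bootstrap ratio
    (mean / bootstrap standard error), and return a two-tailed normal p-value."""
    z = np.arctanh(np.clip(r_boot, -0.999, 0.999))
    bsr = z.mean() / z.std(ddof=1)
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(bsr) / sqrt(2.0))))

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg: boolean mask of elements surviving FDR correction."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, len(p) + 1) / len(p)
    k = int(np.nonzero(passed)[0].max()) + 1 if passed.any() else 0
    keep = np.zeros(len(p), dtype=bool)
    keep[order[:k]] = True
    return keep

# Hypothetical bootstrapped partial correlations for three connectivity elements
p = [bsr_pvalue(rng.normal(m, 0.1, 1000)) for m in (0.4, 0.02, -0.35)]
surviving = fdr_bh(p, q=0.05)
```

Elements with consistently nonzero bootstrapped correlations survive the FDR threshold, while elements fluctuating around zero do not.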

2.3  Interpreting Model Parameters

This section focuses on the information provided by the full parametric FSIM implementation for experimental fMRI data. We demonstrated example results and interpreted (1) the segregative structure, including latent variable parcellations zkv and noise variance , and (2) integrative structure, including group connectivity-forming matrix , voxel-specific scaling profile wkv, and subject-specific scaling profile cks. These results were demonstrated in experimental resting-state data for the full FSIM model and representative clusters. To better visualize results, we performed k-means clustering directly on the connectivity matrix and grouped parcels into subclusters. For this step, the optimal number of clusters was based on the variance ratio criterion (Caliński & Harabasz, 1974).

We also demonstrated the utility of the subject-specific network scaling profile cks by using it to identify parcellations where network scaling is predictive of subject age. This was estimated by performing principal component regression (PCR) on the scaling profile matrix , using it to predict the vector of subject ages in a 10-fold cross-validation framework. For each cluster size K, we measured prediction using the statistic (Consonni, Ballabio, & Todeschini, 2010), which is the cross-validated equivalent of the coefficient of determination; significance was determined via bootstrapped confidence bounds on the estimates (1000 iterations). We plotted as a function of model order K, and for the model that had optimal prediction, we identified parcels showing a significant change in network expression with age. This was done by computing the bootstrap ratio on regression weights for each parcel (bootstrapped mean / standard error) and plotting significant parcels after multiple comparison correction for an FDR threshold of 0.05.
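The age-prediction analysis can be sketched as follows, using synthetic subject scaling profiles with a planted low-rank age effect; the principal component regression and Q² = 1 − PRESS/TSS definitions follow the description above, while the component count, fold assignment, and all names are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
S, K = 40, 8
latent = rng.standard_normal((S, 2))                 # low-rank subject structure
Csk = latent @ rng.standard_normal((2, K)) + 0.2 * rng.standard_normal((S, K))
age = 50.0 + 3.0 * latent[:, 0] - 2.0 * latent[:, 1] + 0.5 * rng.standard_normal(S)

def pcr_q2(X, y, n_comp=2, folds=10, seed=0):
    """Principal component regression with k-fold CV; returns Q^2 = 1 - PRESS/TSS."""
    idx = np.random.default_rng(seed).permutation(len(y))
    press = 0.0
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        mean = X[train].mean(axis=0)
        _, _, Vt = np.linalg.svd(X[train] - mean, full_matrices=False)
        P = Vt[:n_comp].T                            # top principal loadings
        A = np.c_[np.ones(len(train)), (X[train] - mean) @ P]
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.c_[np.ones(len(test)), (X[test] - mean) @ P] @ coef
        press += ((y[test] - pred) ** 2).sum()
    return 1.0 - press / ((y - y.mean()) ** 2).sum()

q2 = pcr_q2(Csk, age)
```

A Q² near 1 indicates that network scaling profiles carry predictive information about age in held-out subjects; values at or below 0 indicate no predictive value.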

2.4  Simulation Data

To validate the proposed metrics of covariance generalization and NMI of spatial parcellations, we analyzed simulated data with a known number of parcels. We simulated 2D brain slices consisting of six square brain parcels (864 pixels total), acquired at 100 time points per split half. For this study, simulated data were generated from the FSIM model (the IS model defined in section 2.2) by randomly sampling all parameters of the cluster means for each data set, under the appropriate constraints. They were generated as follows: the orthonormal basis was obtained by generating random normal vectors and applying sequential Gram-Schmidt orthogonalization; the connectivity-forming matrix was generated as a random normal matrix with unit-normalized columns; and the subject scaling profiles cks were generated as a random, uniformly distributed matrix with range [0, 1]. We also generated spatial weightings wkv within each parcel, using a square cosine basis to define relative scaling from 0.1 (edge) to 1 (center). Independent gaussian noise was also simulated for each pixel, time point, and subject. To account for the temporally smooth hemodynamic response in fMRI, both signal and noise time series were convolved with AFNI’s SPMG1 function (afni.nimh.nih.gov/pub/dist/doc/program_help/3dDeconvolve.html), using standard parameter settings and a TR of 2.0 s. In order to test FSIM performance for a range of signal strengths, background noise was scaled to fix the average signal-to-noise ratio (SNR), in regions of peak scaling across all clusters and subjects, at each tested level; averaging over all pixels, this corresponded to a lower mean SNR over the entire “brain slice.” This scheme was used to simulate data sets of 20 subjects, with 100 time points per split half.
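The sequential Gram-Schmidt step used to generate the orthonormal basis can be sketched as follows (a minimal pure-Python illustration; the function name is ours, and input vectors are assumed linearly independent, which holds almost surely for random normal draws):

```python
import math

def gram_schmidt(vectors):
    """Sequential Gram-Schmidt: orthonormalize a list of vectors in order.
    Each vector has the projections onto the previously accepted basis
    vectors subtracted, then is normalized to unit length."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis
```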

Along with results in the main text, appendix D shows the effects on model prediction and reproducibility of including or excluding different network variability parameters. It also shows the impact of between-subject variability in the underlying connectivity matrix, to determine whether the FSIM is robust in cases of inconsistent core connectivity. In addition, appendix E shows the performance of the FSIM for data generated under a continuous latent variable model, where independent components may be spatially overlapped; this is the basis for standard principal component analysis (PCA) and independent component analysis (ICA) models. These results demonstrate how the FSIM recovers and represents overlapped signals.

2.5  Experimental Data

To evaluate the different clustering models, we used a resting-state data set previously published in Andersen et al. (2014), consisting of 30 healthy subjects (median age 45, range 22–69; 15 female). The data were acquired during 20 minutes of eyes-closed resting-state scanning, using a Siemens Magnetom Trio 3T scanner. Scan volumes were obtained using echo-planar imaging at isotropic spatial resolution, with sequence parameters TR = 2.49 s, TE = 30 ms, FA = 90°, and 42 slices. This resulted in 480 acquired volumes per subject, which we preprocessed using the SPM8 software package (SPM8, rev. 5236). The standard preprocessing steps included rigid-body realignment to the subject mean, slice-timing correction with sinc interpolation, and spatial normalization to an MNI template using affine warping with discrete cosine transform basis functions (Ashburner & Friston, 1999). To further reduce noise and variability in the data, we performed spatial smoothing with an isotropic 6 mm FWHM gaussian kernel and corrected for head motion spikes using the algorithm established in Campbell, Grigg, Saverino, Churchill, and Grady (2013). Global signal fluctuations were removed by regressing out the first PC of each run (Carbonell, Bellec, & Shmuel, 2011), and residual variance due to nonrigid head motion effects was modeled using the first two principal components of the six rigid-body realignment parameters. Low-frequency signal fluctuations, potentially caused by hardware instability or gradual motion, were modeled using a third-order Legendre polynomial basis (chosen using AFNI’s selection heuristic; afni.nimh.nih.gov/pub/dist/doc/program_help/3dDetrend.html). We controlled for physiological fluctuations in the data by removing vascular, CSF, and white matter signal using the PHYCAA+ algorithm (Churchill & Strother, 2013). To reduce dependence between data split halves, temporal preprocessing steps were performed separately on the two halves.
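The Legendre polynomial detrending step can be sketched as follows. This is an illustrative pure-Python version, not AFNI's implementation: the function names are ours, the polynomials are sampled on [-1, 1] via the Bonnet recursion, and the sampled basis is orthonormalized before projection so the regressors can be removed independently.

```python
import math

def legendre_basis(n_timepoints, order=3):
    """Legendre polynomials P0..P_order sampled at n_timepoints on [-1, 1]."""
    xs = [-1.0 + 2.0 * t / (n_timepoints - 1) for t in range(n_timepoints)]
    polys = [[1.0] * n_timepoints, xs[:]]
    for k in range(2, order + 1):
        # Bonnet recursion: k P_k = (2k - 1) x P_{k-1} - (k - 1) P_{k-2}
        polys.append([((2 * k - 1) * x * p1 - (k - 1) * p0) / k
                      for x, p1, p0 in zip(xs, polys[-1], polys[-2])])
    return polys

def detrend(series, order=3):
    """Regress a Legendre polynomial basis out of one voxel time series."""
    basis = legendre_basis(len(series), order)
    # orthonormalize the sampled basis (Gram-Schmidt) so projections
    # onto each regressor are independent
    ortho = []
    for p in basis:
        w = p[:]
        for b in ortho:
            c = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - c * bi for wi, bi in zip(w, b)]
        nrm = math.sqrt(sum(wi * wi for wi in w))
        ortho.append([wi / nrm for wi in w])
    # subtract each regressor's projection from the series
    resid = list(series)
    for b in ortho:
        c = sum(ri * bi for ri, bi in zip(resid, b))
        resid = [ri - c * bi for ri, bi in zip(resid, b)]
    return resid
```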

3  Results

3.1  Simulation Data

Figure 2 plots the true simulation parameters (left column) and the parameters estimated by the four different clustering models, at one tested SNR level, for the true model order of K = 6. For spatial parcellations (row 1), all models except SX (the standard GMM) correctly identify clusters based on zkv, with few errors near the between-parcel boundaries, where signal scaling is lowest. The SX model fails due to an inability to account for voxel scaling wkv, allocating voxels with low scaling to incorrect clusters. In contrast, SV, IX, and IS successfully reconstruct the spatial scaling of cluster signals in the wkv maps (row 2). Both IX and IS also accurately model between-cluster connectivity (row 3), although IS appears more similar to the true model. Finally, the IS model estimates subject-specific network scaling via the cks profiles (row 4). These are highly consistent with the true subject scalings, showing a mean correlation of 0.92 ± 0.03 (average ± SD for the six clusters).

Figure 2:

Plot depicting the true parameter estimates used to generate simulated data (leftmost column) and the parameters obtained by the highest-likelihood solution of each clustering model: segregative model, no voxel scaling (SX); segregative model, voxel scaling (SV); integrative model, no subject scaling (IX); integrative model, subject scaling (IS).


Figure 3 depicts cross-validation metrics as a function of model order K for the simulated data. For all tested SNR levels, the predictive log likelihood saturates at the true model order of K = 6, and the simplest SX model (i.e., a standard GMM) shows the lowest prediction, followed by the highly constrained IX model. The full IS model and the highly flexible SV model show no significant differences in performance (Wilcoxon signed-rank test on subject prediction values), indicating that the more heavily parameterized IS approach is not significantly more biased. However, as shown in appendix D, in cases with inconsistent between-subject connectivity, the SV model has higher predictive log likelihood than IS. This is expected, as SV does not impose any consistency constraints on between-cluster connectivity. Examining reproducibility of the spatial parcellations, all data sets show a peak at the true model order K = 6, and there are no consistent differences between models. Both the predictive log likelihood and NMI decrease for data with lower SNR, as expected.

Figure 3:

Cross-validation measures of model performance, simulated data. For data generated under FSIM modeling assumptions of consistent between-cluster connectivity, (A) predictive generalization of voxel covariance and (B) reproducibility of spatial parcellations zkv are plotted. Similarly, for data with between-subject variability in cluster connectivity, (C) predictive generalization and (D) reproducibility of spatial parcellations are plotted. Results are shown for the maximum-likelihood solutions, for each clustering model and number of clusters K, with true model order circled in red.


The FSIM and cross-validation metrics are also robust in cases where simulated data do not conform to discrete latent variable modeling assumptions. Appendix E shows that the FSIM also provides interpretable results for data generated from a component-based model with significant spatial overlap, although it requires a higher model order than the intrinsic subspace dimensionality in order to fully represent functional connectivity structure.

3.2  Model Convergence

Figure 4A plots the 10 log-likelihood curves on training data for each of the different mixture models, at a representative cluster size. Log likelihood is plotted across EM iterations, showing that all models rapidly plateau within five iterations, with more gradual improvements beyond this point. The models with constrained covariance (IX, IS) have consistently lower training likelihood compared to the purely segregative models (SX, SV), whereas the addition of voxel scaling profiles (SV) and subject scaling profiles (IS) increases model likelihood. Figure 4B plots the 10 log-likelihood values at EM convergence as a function of K, demonstrating that training likelihood consistently grows with the number of clusters, and that models without constrained covariance (SX, SV) show a more rapid increase in likelihood than those with constrained covariance (IX, IS).

Figure 4:

Training data log-likelihood curves of the different models for experimental data. (A) Curves are shown for the 10 different EM runs at a representative cluster size for each clustering model. (B) Convergence log-likelihood values are plotted for the 10 different EM runs as a function of K. Models include segregative without voxel scaling (SX), segregative with voxel scaling (SV), integrative without subject scaling (IX), and integrative with subject scaling (IS).


Table 1 provides convergence statistics for the different models at K = 8, 60, and 100. The consistency of parcellations (based on pairwise NMI between clustering runs) is not significantly different between models (nonparametric Friedman rank test) and increases for higher numbers of clusters. In addition, the parcellations zkv tend to converge within five EM iterations for all models and across all tested cluster sizes. Conversely, the log likelihood converges relatively rapidly for the standard SX model but tends to require a greater number of iterations for the more complex models. However, no consistent differences were identified between models (Friedman test), due to the relatively high variability across EM runs.

Table 1:
Convergence Statistics for Selected Model Orders K = 8, 60, 100.

                                  SX             SV             IX             IS
K = 8    Parcellation NMI         0.672 ± 0.08   0.681 ± 0.08   0.693 ± 0.07   0.657 ± 0.04
         Convergence (zkv)        3.5 ± 0.8      5.6 ± 1.6      5.3 ± 1.5      6.1 ± 1.6
         Convergence (likelihood) 6.1 ± 0.3      14.1 ± 3.5     21.3 ± 6.7     21.3 ± 8.7
K = 60   Parcellation NMI         0.732 ± 0.01   0.738 ± 0.04   0.737 ± 0.04   0.732 ± 0.01
         Convergence (zkv)        3.9 ± 0.8      5.1 ± 1.3      5.0 ± 1.3      5.7 ± 1.3
         Convergence (likelihood) 7.3 ± 0.9      25.4 ± 11.2    15.5 ± 5.5     32.5 ± 9.3
K = 100  Parcellation NMI         0.735 ± 0.01   0.743 ± 0.04   0.741 ± 0.04   0.741 ± 0.04
         Convergence (zkv)        3.9 ± 0.6      4.2 ± 1.6      4.7 ± 1.5      5.6 ± 1.2
         Convergence (likelihood) 8.2 ± 1.2      24.4 ± 11.3    12.4 ± 2.3     26.8 ± 7.3

Notes: Parcellation NMI is the mean pairwise NMI between all 10 clustering runs. Convergence (zkv) is the mean number of iterations before cluster assignments no longer change. Convergence (likelihood) is the mean number of iterations before the change in log likelihood falls below a fixed tolerance. Results are shown ± standard deviation.
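The pairwise parcellation NMI reported above can be computed as sketched below. This is an illustrative pure-Python version; the paper does not specify its normalization, so the common geometric-mean normalization I(A; B) / sqrt(H(A) H(B)) is assumed, which lies in [0, 1] and equals 1 for identical parcellations.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two parcellations of the
    same voxels, using the geometric-mean normalization (assumed)."""
    n = len(labels_a)
    pa = Counter(labels_a)              # cluster sizes, parcellation A
    pb = Counter(labels_b)              # cluster sizes, parcellation B
    pab = Counter(zip(labels_a, labels_b))  # joint co-assignment counts
    mi = 0.0
    for (a, b), c in pab.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) )
        mi += (c / n) * math.log(c * n / (pa[a] * pb[b]))
    ha = -sum((c / n) * math.log(c / n) for c in pa.values())
    hb = -sum((c / n) * math.log(c / n) for c in pb.values())
    return mi / math.sqrt(ha * hb)
```

Note that NMI is invariant to relabeling, so the two split-half parcellations need not have matched cluster indices.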

3.3  Covariance Generalization and Model Reproducibility

Figure 5 depicts independent cross-validation metrics for the different clustering models. Predictive log likelihood is depicted in Figure 5A. Unlike the training log likelihood (see Figure 4), both the SV and IS models outperform SX (the standard GMM), with SV showing the highest likelihood, particularly for larger numbers of clusters. This is expected, since SV places no constraints on individual subjects’ covariance, allowing a more flexible subject fit. The models show a consistent ranking on prediction across K (Friedman test), where SV > IS > IX > SX (post hoc Nemenyi permutation test). The SV model consistently outperforms the full IS model in experimental data, which matches the simulation results with simulated between-subject variability in connectivity networks (refer to section 4 for more on this point). For all models, the predictive log-likelihood curves do not saturate within the tested range of clusters, indicating that high-order clustering models continue to identify generalizable covariance structure. Figure 5B plots reproducibility of the spatial parcellations. There is no significant ranking in reproducibility between the different models (Friedman test). While the reproducibility curves are somewhat noisy, they tend to increase until saturating, for all models, indicating that the stability of parcellations is consistently maximized at this plateau for the tested GMMs.
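The predictive log likelihood in Figure 5A evaluates held-out data under the covariance estimated from the training split. A minimal sketch of this evaluation for zero-mean (e.g., mean-centered) test samples is given below; the function names are ours, and a Cholesky factorization is used for the log-determinant and quadratic form.

```python
import math

def cholesky(a):
    """Cholesky factor L (lower triangular) of an SPD matrix, so a = L Lᵀ."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def gaussian_loglik(samples, cov):
    """Total log likelihood of zero-mean test samples under N(0, cov)."""
    d = len(cov)
    L = cholesky(cov)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    total = 0.0
    for x in samples:
        # solve L y = x by forward substitution; then xᵀ cov⁻¹ x = yᵀ y
        y = []
        for i in range(d):
            y.append((x[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
        quad = sum(v * v for v in y)
        total += -0.5 * (d * math.log(2 * math.pi) + logdet + quad)
    return total
```

A model whose estimated covariance generalizes better assigns higher log likelihood to the independent test split.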

Figure 5:

Cross-validation measures of model performance, experimental data. (A) Predictive generalization measured by the log likelihood of test data, given the covariance estimated in the clustering models. (B) Reproducibility of the spatial parcellations, obtained by comparing independent split halves using normalized mutual information (NMI).


Figure 6 displays the full network estimated by the FSIM for a representative cluster solution, including the spatial parcellations and between-cluster connectivity values, based on partial correlation estimates with bootstrapped significance. The FSIM produces significantly reliable correlations between the majority of brain parcellations, even after correcting for multiple comparisons at an FDR of 0.05.

Figure 6:

Full clustering and network of experimental data using the FSIM model, for K = 40 clusters. Spatial clustering is depicted as sagittal, coronal, and axial orthographic projections arranged in a circle. The connectivity between each pair of clusters is given by the partial correlation estimate, where color denotes partial correlation magnitude. Line width indicates the bootstrapped significance of partial correlation values, corrected for multiple comparisons over all 780 connections at an FDR threshold of 0.05.


3.4  Interpreting Model Parameters

Figure 7 depicts the segregative structure identified by the FSIM. Figure 7 (top) plots the whole-brain parcellation based on the zkv estimates for each voxel; each color corresponds to a different cluster k. We observe clear delineation of cortical subregions. At this clustering scale, many bilateral brain regions are combined into the same cluster, indicating strong coherence in BOLD fluctuations. Figure 7 (bottom) plots the noise variance at each voxel, obtained from its assigned cluster. This map quantifies how well individual voxels are represented by their cluster means. Many interior gray matter regions are highly consistent (e.g., the insula and thalamus), as are superior frontal regions. Conversely, dorsal areas show comparatively high noise variance, including the cuneus, middle cingulum, and parietal lobes, indicating relatively high signal heterogeneity within these regions.

Figure 7:

Segregative structure in experimental data using the FSIM model. (Top) Gray matter parcellations given by the estimated cluster latent variables zkv. (Bottom) Noise variance map, where the variance at each voxel is given by its assigned (i.e., most probable) cluster. The map shows the average variance estimates over subjects. Results are shown for the clustering repeat with the highest likelihood.


Figure 8 represents the integrative structure obtained by the FSIM. Figure 8A shows the most consistent group connectivity matrix, with strong correlations (positive and negative) and a clear grouping structure within this matrix. Figures 8B and 8C show representative parcels from two subgroups, illustrating the relationship between these brain regions. For each subgroup, we plot (left) the parcellation given by zkv and (right) the map of voxel scaling values wkv for network components. In Figure 8B, we observe elements of the default mode network, including parcels within the posterior cingulate, left and right middle temporal lobes, and ventromedial prefrontal cortex. Figure 8C depicts elements of the sensorimotor network, with parcels located in the supplementary motor area and both the precentral and postcentral gyri.

Figure 8:

Integrative relationships between clusters in experimental data using the FSIM model. (A) The consistent group-level connectivity matrix estimated by the FSIM. The similarity matrix was organized based on k-means clustering (for an optimal eight subgroups) to improve visualization. Black boxes denote distinct connectivity subgroups. We also show representative single-slice plots for parcels belonging to (B) the fourth cluster (–38) and (C) the sixth cluster (–51). For each group, we show (left) parcellations, delineated by latent variables zkv, and (right) the map of voxel scaling values wkv for network components.


The binary plots of zkv show the brain regions that have been assigned to each cluster. In contrast, the voxel scaling profiles wkv show nonzero values in voxels extending outside each cluster. These plots can be interpreted as regression maps, showing the relative association between the cluster prototype and all brain voxels. This has two uses: (1) determining which brain regions within a cluster of interest are well represented by the mean time series and (2) identifying brain regions from other clusters that show highly similar BOLD fluctuations. Given a specific cluster of interest, the latter may be used to identify other clusters in the brain that are likely to show strong associations in the integrative matrix.

Figure 9 plots the results of a regression analysis using the cluster scalings cks to predict subject age. One subject was identified as an outlier across all K, based on the Cook’s distance criterion (Cook, 1977), and was excluded (no Q² values were significantly greater than zero if this subject was included). The prediction coefficient Q² is plotted in Figure 9A as a function of K. For low K, we obtain negative Q² values, indicating worse prediction than a flat model that assigns all subjects the mean age. However, prediction improves at higher model orders before reaching a peak.
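The outlier criterion can be illustrated for the simple (one-predictor) regression case as below; this is a hedged sketch (function name ours), since the paper's multivariate implementation is not shown. Cook's distance combines each observation's residual with its leverage, D_i = e_i² h_ii / (p s² (1 − h_ii)²), flagging points that strongly pull the fit.

```python
def cooks_distance(x, y):
    """Cook's distance for each observation in a simple linear regression
    y ~ a + b x, with p = 2 parameters and s² the residual mean square."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    p = 2
    s2 = sum(e * e for e in resid) / (n - p)   # residual mean square
    dists = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - mx) ** 2 / sxx      # leverage of observation i
        dists.append(e * e * h / (p * s2 * (1 - h) ** 2))
    return dists
```

Observations with Cook's distance far above the rest (e.g., above a conventional cutoff such as 4/n) are candidate outliers.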

Figure 9:

Principal component regression of the subject network scaling profile against age for experimental data. (A) Prediction coefficients Q², plotted as a function of the number of clusters K, with bootstrapped standard error bars. (B) Subject scaling profile for the optimal model. (C) Predicted versus true age, based on cross-validated estimates. (D) Parcels showing significant regression weights reflecting age-related scaling. Significance is based on bootstrap ratios at an FDR threshold of 0.05.


Figure 9B depicts cks values for the optimal model, including the (excluded) outlier subject #22, showing both subject-specific and cluster-specific trends in network scaling. Figure 9C shows the predicted ages based on cks values, indicating good fit for most of the subjects. Figure 9D displays maps of the significant clusters. All significant parcels have negative bootstrap ratios, including the sensorimotor cortex and cingulum, indicating reduced network expression in these brain regions as a function of age.

4  Discussion

One of the principal goals of functional connectivity analysis is to learn the properties of brain function, both segregative and integrative, that underlie cognition. Unsupervised clustering models are increasingly applied to BOLD fMRI data, to parcellate the brain based on similarity of BOLD time series and measure functional connectivity between clusters (Li, Guo, Nie, Li, & Liu, 2009; Thirion, Varoquaux, Dohmatob, & Poline, 2014). However, these clustering models have limited parametric flexibility and do not account for integrative between-cluster relationships. In this letter, we propose FSIM (functional segregation and integration model), a major extension of GMMs that learns the most consistent between-cluster connectivity matrix in group fMRI data, along with profiles of voxel- and subject-specific network expression. This is the first attempt to jointly model segregative and integrative properties of brain networks in a mixture-model framework. The FSIM provides a descriptive model of the functional connectivity structure in fMRI data, with interpretable parameters. We also examined how different choices in model parameters affect covariance generalization and the stability of model parameters.

4.1  Effects of FSIM Model Parameters

The addition of constraints (e.g., consistent group connectivity) and flexibility terms (e.g., voxel and subject scaling) has a modest effect on rates of model convergence. Moreover, it does not affect the reproducibility of spatial parcellations across split halves, as there was no significant difference in NMI between the different clustering models. This indicates that the use of more complex clustering models does not significantly reduce the stability of spatial clustering.

However, measures of covariance generalization showed significant model effects. For example, the voxel scaling profile wkv appears to be critical for modeling group-level functional networks, as all models that include it (SV, IX, IS) outperform the model that does not (SX). Moreover, these spatial scaling values wkv must be relatively consistent across subjects and data splits to give high generalization. Similarly, the subject scaling profiles cks improve the generalization of IS compared to IX; this indicates that subjects have sufficiently reliable network scaling (i.e., between test-retest splits) that it improves generalization. Conversely, the fixed between-cluster connectivity matrix in IS leads to reduced generalization compared to the unconstrained SV. Taken together with the simulation data, this is evidence of significant between-subject variability in the network connectivity of experimental data, which is unsurprising. For example, Damoiseaux et al. (2006) found that many functional networks are highly variable across subjects. We emphasize that this reduced generalizability should not be considered a failing of the FSIM model. Rather, it demonstrates the inherent challenge of identifying consistent functional connectivity structure in heterogeneous group data. The FSIM provides an optimal parametric description of this connectivity structure, which is not estimated by standard clustering models but must instead be extracted as a postprocessing exercise.

Regarding the training log likelihood, standard GMMs show higher likelihood relative to the FSIM. However, this measure does not account for integrative relationships, whereas our predictive covariance model considers both within- and between-cluster relationships. The covariance log likelihood is a more generalizable metric, since it does not presume a fixed BOLD time series across test-retest sessions, an issue for resting-state tasks without temporally constrained stimuli. Although there is a consistent difference between the different GMMs, the log likelihoods are relatively similar; the choice of the number of clusters (K) has a larger impact than model parameterization. The reduced importance of model parameterization is also likely due to the flexible noise estimator, which absorbs voxel- and subject-specific variability.

4.2  Model Order and Cross-Validation Metrics

Although predictive likelihood saturated at the true model order for simulated data, it did not reach a maximum for experimental data within the tested range (up to K = 100) but increased continually. This suggests that there is generalizable covariance structure for cortical divisions well beyond the tested range of K. These findings agree with literature showing reliable cortical subdivisions in fMRI down to the millimeter scale, dictated by the spatial extent of the arterioles that irrigate neural tissue (Detre & Wang, 2002). This is corroborated by Thirion et al. (2014), who found that the predictive likelihood of clustering models saturates at high model orders for task-based fMRI data. However, this level of granularity is not always desirable, as higher model orders K trade better generalization against decreased model simplicity (and interpretability). Studies of cross-validated task fMRI have shown that optimization based solely on prediction tends to select excessively high-order models, whereas reproducibility often provides a better metric for choosing robust, interpretable models (Strother et al., 2015). In this study, this is supported by the NMI of spatial parcellations, which plateaus with no significant improvements beyond that point. Current results show reliable spatial parcellations at high model orders in experimental data (NMI ≈ 0.70) and highly robust functional connectivity structure based on bootstrapped partial correlations.

These findings also provide evidence that when FSIM clustering is performed, the criterion for model order selection may depend on the specific research aims. If the goal is simply to predict covariance structure, the user may prefer the highest attainable model order K, determined by available computational resources. If instead the goal is to define the most parsimonious segregative structure, NMI may be used to identify the lowest-order K at which reproducibility plateaus. Alternatively, if the goal is to use brain function to predict specific behavioral, demographic, or clinical correlates, the user may select the K that directly optimizes cross-validated regression scores, as in Figure 9.

4.3  Interpretation of FSIM Parameters

In this letter, we demonstrated that the FSIM provides a detailed parametric model of connectivity structure and subject-dependent variability. These parameters provide information about connectivity structure beyond standard GMMs. The group connectivity matrix provides a direct estimate of integrative structure without relying on post hoc connectivity estimation. The spatial scaling profiles wkv may be used to determine which brain regions within a cluster are well represented by the cluster means, and which voxels outside a cluster show strong associations and thus potentially high connectivity. The subject scaling profiles cks may be used to characterize between-subject differences in network expression and to identify potential outliers.

Importantly, FSIM model parameters are also relevant for studies of behavior and cognition, as they can be used to model the relationships between network modulation and behavioral measures. As an example, we demonstrated this by investigating changes in network scaling as a function of subject age. The analysis revealed age-related decreases in variance in elements of the sensorimotor cortex and cingulum. This is consistent with prior studies showing decreased BOLD signal as a hallmark of aging, with an emphasis on motor cortex (D’Esposito, Deouell, & Gazzaley, 2003), and prior studies in aging that reported decreased BOLD variance (Garrett, Kovacevic, McIntosh, & Grady, 2010; McIntosh et al., 2010). Interestingly, we did not observe decreases in visual cortex, which is also commonly reported (D’Esposito et al., 2003). This may be due to prior studies focusing on task-based activation, which may have a distinct profile from resting-state network expression.

4.4  Future Directions

While the current results are promising, further development of the FSIM will be an important area of future research. The maximum a posteriori (MAP) approach provides robust model solutions; however, such methods are prone to model overfitting, necessitating cross-validated model comparison. An appealing alternative is to extend the current model to fully Bayesian inference. Variational Bayes (VB) provides a natural extension of the current framework, in which a solution is obtained by minimizing the variational free energy (Corduneanu & Bishop, 2001). This would allow the computation of a bound on model evidence to compare between models and optimize the model order K. The VB approach represents an alternative to cross-validation, as it automatically balances model generalization against parsimony. It will be important to investigate the trade-offs between these approaches: VB methods have a larger sample on which to compute parameter estimates compared to split-half methods but may also be prone to underestimating parameter uncertainty.

The development of a complete VB model is beyond the scope of this letter, as this requires distributions over all model parameters, including the orthonormal matrices needed for consistent covariance estimation. There exist distributions on orthonormal matrices, typically based on the hypergeometric function of a matrix argument, including the von Mises-Fisher (or Langevin) and the more general matrix Bingham distributions (Turaga, Veeraraghavan, & Chellappa, 2008). An alternative to VB is the variational Laplace method (Friston, Mattout, Trujillo-Barreto, Ashburner, & Penny, 2007), which finesses many of the difficulties of VB. This method uses approximate forms to represent unknown variables, and priors need not be conjugate. Future work will attempt to develop a complete variational framework based on these distributions, building on methods proposed by Šmídl and Quinn (2007) for Bayesian PCA.

The FSIM characterizes linear dependence between regional BOLD responses in terms of unknown regional clusters and an unknown functional connectivity matrix that is conserved over subjects. As we have shown in this letter, this approach is effective at summarizing high-dimensional fMRI data in a low-rank representation. This linear gaussian model can be considered one approach among a range of possible dependency measures, chosen for its analytic tractability and robust results. In addition, there is evidence that linear measures can reliably account for the majority of fMRI signal (Hlinka, Paluš, Vejmelka, Mantini, & Corbetta, 2011) and provide a useful, reliable metric of brain function (Raichle et al., 2001; Greicius et al., 2003).

However, the fMRI signal arises from a network of nonlinear neural oscillators, mediated via a complex neurovascular response. As such, measures of linear dependency provide limited information about the neural architecture and causal processes that underlie these dependencies. In this respect, nonlinear effective connectivity models such as dynamic causal modeling (DCM) are well suited, identifying models based on Bayesian evidence (Penny et al., 2004; Stephan et al., 2010). However, due to their inherent complexity, these models are also highly dependent on the a priori selection of brain regions of interest. This suggests the utility of combining the two approaches: using the FSIM to identify linear dependencies in the brain and then applying models of effective connectivity to characterize the underlying neuronal structure.

Finally, the current FSIM framework adapts elements of the PARAFAC2 model (a continuous latent variable model) into the discrete latent variable framework. This involves significant trade-offs. Discrete models are able to parameterize data explicitly in terms of segregative structure (discrete parcellations) and integrative structure (interregional connectivity), which is particularly relevant to domains such as graph-theoretical network analysis. As shown in appendix E, we also successfully applied clustering models to simulated continuous latent variable (i.e., component-based) data. Nonetheless, clustering models tend to have a higher computational burden and a nonconvex optimization space, often making them more challenging to implement. To date, there has been limited direct comparison between these approaches to functional connectivity modeling, although they are equivalent under certain sparsity conditions (Vinnikov & Shalev-Shwartz, 2014), and in some cases clustering models may outperform component-based models in separating signal from correlated noise (Baumgartner et al., 2000). Further research is needed to compare the utility of these approaches across a wider range of cohorts and experimental conditions.

Appendix A:  Deriving the EM Update Equations

In this appendix, we provide full, explicit solutions for the parameter update equations of the EM algorithm. All M-steps (i.e., all solutions except cluster responsibilities) are obtained by finding the maximum-likelihood solution of the auxiliary equation, equation 2.3.

A.1  Cluster Responsibilities

In this E-step update, the conditional likelihoods, or responsibilities, are estimated for the current model parameter values. GMMs assume independence between voxels, conditioned on the cluster parameters. This model assumes that voxels within a “functional unit” (parcel) are all realizations of an underlying latent time series, with random neural noise accounting for within-cluster variability; thus, the only dependence structure of interest is between clusters. Due to the conditional independence of voxels, the complete-data log likelihood factorizes over v. This allows us to use Bayes’ rule to directly evaluate the conditional likelihood that each voxel belongs to cluster k:
\gamma_{kv} = p(z_v = k \mid \mathbf{x}_v) = \frac{\pi_k \, p(\mathbf{x}_v \mid z_v = k)}{\sum_{j=1}^{K} \pi_j \, p(\mathbf{x}_v \mid z_v = j)},
where the data likelihood p(\mathbf{x}_v \mid z_v = k) is given by marginalizing over the full latent space.

A.2  Cluster Priors

We simplify the optimization problem by neglecting terms in the auxiliary equation Q that are constant in the cluster priors \pi_k. This requires that we optimize the expression
Q_{\pi} = \sum_{v=1}^{V} \sum_{k=1}^{K} \gamma_{kv} \ln \pi_k.
This can be solved in the manner of standard GMM solutions (Dempster et al., 1977), defined as follows. Because the \pi_k are cluster probabilities, we have the constraint \sum_{k=1}^{K} \pi_k = 1. Using the method of Lagrange multipliers, this can be solved by the constrained optimization of
Q_{\pi} + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right).
We take the partial derivative with respect to \pi_k to find the optimum. This yields the expression
\sum_{v=1}^{V} \frac{\gamma_{kv}}{\pi_k} + \lambda = 0.
Multiplying through by \pi_k and summing over k on both sides of the equation, this solves for \lambda = -V. Resubstituting,
\pi_k = \frac{1}{V} \sum_{v=1}^{V} \gamma_{kv}.
Cluster priors are thus given by the average responsibility over all voxels.
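The E-step and the prior update above can be illustrated with a minimal numerical sketch for a generic isotropic GMM (the toy data and the names gamma, pi, sigma2 are ours, not the letter's; the full FSIM M-step involves additional parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: V voxel time series of length T, drawn from K clusters.
V, T, K = 200, 50, 3
mu = rng.standard_normal((K, T))                 # cluster mean time series
z_true = rng.integers(0, K, size=V)              # true cluster labels
X = mu[z_true] + 0.5 * rng.standard_normal((V, T))

pi = np.full(K, 1.0 / K)                         # cluster priors
sigma2 = np.ones(K)                              # isotropic noise variances

# E-step: responsibilities gamma[k, v] = p(z_v = k | x_v), via Bayes' rule.
sq = ((X[None, :, :] - mu[:, None, :]) ** 2).sum(axis=2)           # (K, V)
log_lik = -0.5 * sq / sigma2[:, None] - 0.5 * T * np.log(sigma2)[:, None]
log_post = np.log(pi)[:, None] + log_lik         # up to shared constants
log_post -= log_post.max(axis=0, keepdims=True)  # numerical stabilization
gamma = np.exp(log_post)
gamma /= gamma.sum(axis=0, keepdims=True)        # columns sum to 1

# M-step for the priors (appendix A.2): average responsibility over voxels.
pi = gamma.mean(axis=1)
```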

A.3  Noise Variance

We now discard terms in Q that are constant in the noise variance, which requires that we optimize the expression
formula
We take the partial derivative to find the stationary point:
formula
which is directly rearranged to solve for variance:
formula

A.4  Voxel-Specific Network Scaling

Discarding terms in Q that are constant in wkv requires that we optimize the expression
formula
We take the partial derivative to find the stationary point:
formula
Solving for wkv, we obtain the expression
formula

A.5  Subject-Specific Network Scaling

The elements of the diagonal network scaling matrices are independent, which allows us to solve for each diagonal element separately. Making the appropriate substitution and discarding terms of Q that are constant in cks, we obtain:
formula
Taking the partial derivative to find the stationary point, we obtain
formula
which rearranges to solve for cks as
formula

A.6  Group Connectivity Matrix

Making the appropriate substitution and discarding terms of Q that are constant in the group connectivity matrix, the following expression must be optimized:
formula
Unlike the matrices, where nonzero elements are fully independent (i.e., they each contribute to a unique cluster/subject), all elements of the matrix contribute across subjects and clusters. Thus, we must optimize the entire matrix simultaneously. This can be done by redefining the objective as an expression of matrix traces,
formula
where , , , and is a vector of ones. Expanding, and further dropping terms constant in , we obtain
formula
By cyclic permutation and reordering of the diagonal matrices, this can be reexpressed:
formula
Taking the partial derivative , we obtain
formula
Rearranging to solve for , we obtain
formula

A.7  Orthonormal Subject Matrices

As with the connectivity-defining matrix, the elements within the orthonormal subject matrices are all interdependent, as they contribute mutually to all clusters, and the column vectors must be orthonormal to each other. Thus, we solve for all elements simultaneously, in a similar fashion to the previous section. The auxiliary equation is of similar form, but we no longer need to sum across subjects, since they are solved independently:
formula
Expanding, we obtain:
formula
From this equation, it is evident that optimizing is equivalent to maximizing the cross-product term, as it is the only part of the initial expression that depends on :
formula
Applying cyclic permutation of matrices and moving the sum inside the trace now gives us
formula
We can optimize this expression subject to the constraint that is orthonormal, using the approach established by Green (1952) and also examined by Kiers et al. (1999). For the singular value decomposition , the cross-product is maximized for the orthonormal matrix by
formula
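Green's (1952) result is the orthogonal Procrustes solution: for an objective of the form trace(P^T M) over matrices P with orthonormal columns, the SVD M = U diag(s) V^T gives the maximizer P = U V^T, with optimal value equal to the sum of singular values. A small numerical check (the names M and P are illustrative, not the letter's notation):

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 20, 4                       # seek a T x K matrix with orthonormal columns

M = rng.standard_normal((T, K))    # stand-in for the cross-product matrix

# Orthogonal Procrustes: P = U @ Vt maximizes trace(P.T @ M).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
P = U @ Vt

assert np.allclose(P.T @ P, np.eye(K))         # columns are orthonormal
assert np.isclose(np.trace(P.T @ M), s.sum())  # optimum = sum of singular values

# Spot-check optimality against random orthonormal candidates.
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((T, K)))
    assert np.trace(Q.T @ M) <= np.trace(P.T @ M) + 1e-9
```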

Appendix B:  A Weak Prior on Noise Variance

For a gaussian mixture model with likelihood
formula
we can avoid noise variance collapsing to zero by imposing the improper prior function:
formula
Now, instead of maximizing the model likelihood, we seek to maximize the posterior distribution, where the latter term is the joint prior over all variance elements, with hyperparameter . The log likelihood is thus:
formula
We now perform M-step updating for each variance element using the modified auxiliary equation:
formula
Taking the partial derivative , we obtain stationary point
formula
Collecting terms and rearranging, we solve for variance:
formula
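As a concrete illustration, one improper prior with the stated effect is p(sigma^2) proportional to exp(-phi / (2 sigma^2)); this specific form is our assumption for the sketch, not necessarily the letter's choice. Under it, the stationary point simply adds phi to the responsibility-weighted sum of squares, so the variance is bounded away from zero even for a degenerate cluster:

```python
import numpy as np

def map_variance(resid_sq, gamma, phi):
    """MAP variance update under the assumed improper prior
    p(sigma^2) proportional to exp(-phi / (2 * sigma^2))."""
    S = np.sum(gamma * resid_sq)   # responsibility-weighted sum of squares
    N = np.sum(gamma)              # effective (soft) number of observations
    return (S + phi) / N           # stationary point of the penalized objective

rng = np.random.default_rng(2)
resid_sq = rng.standard_normal(100) ** 2
gamma = rng.random(100)

# phi = 0 recovers the maximum-likelihood update, which collapses to zero
# when a cluster shrinks onto a single point (all residuals zero).
assert map_variance(np.zeros(100), gamma, phi=0.0) == 0.0

# phi > 0 bounds the variance below by phi / sum(gamma), preventing collapse.
assert map_variance(np.zeros(100), gamma, phi=0.1) > 0.0
```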

Appendix C:  Solving the Singular Wishart Distribution

To reduce computational costs and issues of numerical stability, we diagonalize signal covariance via eigendecomposition and redefine the following terms in equation 2.5, based on Sylvester’s determinant theorem and the Woodbury matrix identity, respectively:
formula
This defines the predictive likelihood, based on estimated voxel covariance , and it is extended to FSIM by substituting into equation 2.5 and rescaling voxels of the signal-space matrix by the appropriate wkv value, that is, , where “” denotes element-wise multiplication.
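The computational benefit of these two identities can be checked numerically for a "low-rank plus isotropic" covariance of the form sigma^2 I + U U^T, which reduces the V x V determinant and inverse to rank-r computations (the names U, s2, and the shapes here are illustrative stand-ins for the signal-space terms):

```python
import numpy as np

rng = np.random.default_rng(3)
V, r = 300, 5                       # V voxels, rank-r signal subspace
U = rng.standard_normal((V, r))
s2 = 0.7                            # isotropic noise variance
Sigma = s2 * np.eye(V) + U @ U.T    # full V x V covariance

# Sylvester: det(s2 I_V + U U^T) = s2**V * det(I_r + U^T U / s2)
logdet_full = np.linalg.slogdet(Sigma)[1]
logdet_low = V * np.log(s2) + np.linalg.slogdet(np.eye(r) + U.T @ U / s2)[1]
assert np.isclose(logdet_full, logdet_low)

# Woodbury: (s2 I + U U^T)^{-1} = (I - U (s2 I_r + U^T U)^{-1} U^T) / s2,
# checked via the quadratic form x^T Sigma^{-1} x.
x = rng.standard_normal(V)
quad_full = x @ np.linalg.solve(Sigma, x)
M = np.linalg.solve(s2 * np.eye(r) + U.T @ U, U.T @ x)
quad_low = (x @ x - (U.T @ x) @ M) / s2
assert np.isclose(quad_full, quad_low)
```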

Appendix D:  Effects of Simulated Variability Parameters

To demonstrate the impact of various aspects of signal variability, we examined four simulation parameter settings, based on the initial simulation model described in section 2.4: (1) a data set without voxel or subject scaling variability and with a consistent core network (i.e., setting ); (2) a data set with voxel scaling only; (3) a data set with subject and voxel scaling; and (4) a data set with subject and voxel scaling, along with intersubject variability in the core network. Intersubject variability was simulated by generating subject-specific and computing cluster means based on the average . For each simulated data set, we measured covariance generalization as a function of K for the four clustering models.

The predictive likelihood curves are plotted in Figure 10. For data without signal variability, all models show comparable performance. The addition of spatial variability reduces generalization for SX (i.e., the standard GMM model), which assumes uniform spatial scaling, relative to the other models. The addition of subject variability reduces generalization for the highly constrained IX model, which assumes consistent subject scaling, relative to SV and IS. Finally, the addition of between-subject variability in connectivity (see Figure 10, rightmost panel) causes IS models to show worse performance compared to SV, which assumes no consistent connectivity structure.

Figure 10:

Predictive generalization measured by the log likelihood of test data, given the covariance estimated in the clustering models. Results are shown for four simulation models.


Appendix E:  Clustering and Continuous Latent Variable Data

This appendix demonstrates that discrete latent variable clustering models can also robustly describe data derived from a continuous latent variable model. As a simple example, we simulated data consisting of three temporally independent, spatially overlapped components (see Figure 11A), instead of the spatially disjoint parcels zkv and spatial weights wkv described in section 2.4. The components were temporally decorrelated by setting . Simulation parameters were otherwise the same as in the previous simulation modeling.

Figure 11:

Applying the discrete FSIM to data simulated from a continuous latent variable model, consisting of three temporally independent but spatially overlapped components (). (A) Spatially overlapped components. (B) Model predictive generalization. (C) Spatial reproducibility. (D) Estimated FSIM model parameters for the optimal clusters.


Figures 11B and 11C show that the predictive log likelihood saturates at clusters and NMI rapidly decreases beyond this point, indicating that this is the optimal model order. Figure 11D shows the parameters of the FSIM at . Parcels and form the parts of components 1 and 2 that do not overlap with other components; this is shown by the near-zero correlation between and in the connectivity matrix. Parcel identifies the overlapped region between components 1 and 3, which is reflected in the strong correlation of with and in the connectivity matrix. Here, overlap is represented as multiple (high) correlations between spatially disjoint parcels. A similar relationship is seen for parcel .

Thus, the discrete FSIM model gives a robust alternate representation of overlapped components, although it requires a higher model order than the intrinsic subspace dimensionality of 3 in order to fully represent the integrative structure. In general, for P overlapped components, clustering models may find up to disjoint regions with distinct connectivity structure. In practice, however, the optimal model order will depend on the effective spatial extent of the components and their signal-to-noise ratio (SNR).

Acknowledgments

This work was supported by the Lundbeckfonden (fellowship grant R105-9813 to M.M.). K.H.M. was supported by Lundbeckfonden (Grant of Excellence “ContAct” R59-5399 to Hartwig Roman Siebner) and by a Novo Nordisk Foundation Interdisciplinary Synergy Grant (NNF14OC0011413). The Magnetom Trio MR scanner was donated by the Simon Spies Foundation.

References

Andersen, A. H., & Rayens, W. S. (2004). Structure-seeking multilinear methods for the analysis of fMRI data. NeuroImage, 22(2), 728–739.

Andersen, K. W., Madsen, K. H., Siebner, H. R., Schmidt, M. N., Mørup, M., & Hansen, L. K. (2014). Non-parametric Bayesian graph models reveal community structure in resting state fMRI. NeuroImage, 100, 301–315.

Ashburner, J., & Friston, K. J. (1999). Nonlinear spatial normalization using basis functions. Human Brain Mapping, 7(4), 254–266.

Baumgartner, R., Ryner, L., Richter, W., Summers, R., Jarmasz, M., & Somorjai, R. (2000). Comparison of two exploratory data analysis methods for fMRI: Fuzzy clustering vs. principal component analysis. Magnetic Resonance Imaging, 18(1), 89–94.

Bellec, P., Perlbarg, V., Jbabdi, S., Pélégrini-Issac, M., Anton, J. L., Doyon, J., & Benali, H. (2006). Identification of large-scale networks in the brain using fMRI. NeuroImage, 29(4), 1231–1243.

Bellec, P., Rosa-Neto, P., Lyttelton, O. C., Benali, H., & Evans, A. C. (2010). Multilevel bootstrap analysis of stable clusters in resting-state fMRI. NeuroImage, 51(3), 1126–1139.

Bohland, J. W., Bokil, H., Allen, C. B., & Mitra, P. P. (2009). The brain atlas concordance problem: Quantitative comparison of anatomical parcellations. PLoS One, 4(9), e7200.

Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics–Theory and Methods, 3(1), 1–27.

Campbell, K., Grigg, O., Saverino, C., Churchill, N., & Grady, C. (2013). Age differences in the intrinsic functional connectivity of default network subsystems. Frontiers in Aging Neuroscience, 5, 73.

Carbonell, F., Bellec, P., & Shmuel, A. (2011). Global and system-specific resting-state fMRI fluctuations are uncorrelated: Principal component analysis reveals anticorrelated networks. Brain Connectivity, 1(6), 496–510.

Chang, C., & Glover, G. H. (2009). Relationship between respiration, end-tidal CO2, and BOLD signals in resting-state fMRI. NeuroImage, 47(4), 1381–1393.

Churchill, N. W., & Strother, S. C. (2013). PHYCAA+: An optimized, adaptive procedure for measuring and controlling physiological noise in BOLD fMRI. NeuroImage, 82, 306–325.

Consonni, V., Ballabio, D., & Todeschini, R. (2010). Evaluation of model predictive ability by external validation techniques. Journal of Chemometrics, 24(3–4), 194–201.

Cook, R. D. (1977). Detection of influential observation in linear regression. Technometrics, 19, 15–18.

Cordes, D., Haughton, V., Carew, J. D., Arfanakis, K., & Maravilla, K. (2002). Hierarchical clustering to measure connectivity in fMRI resting-state data. Magnetic Resonance Imaging, 20(4), 305–317.

Corduneanu, A., & Bishop, C. M. (2001). Variational Bayesian model selection for mixture distributions. In Proceedings of the 8th International Conference on Artificial Intelligence and Statistics (pp. 27–34). Burlington, MA: Morgan Kaufmann.

Craddock, R. C., James, G. A., Holtzheimer, P. E., Hu, X. P., & Mayberg, H. S. (2012). A whole brain fMRI atlas generated via spatially constrained spectral clustering. Human Brain Mapping, 33(8), 1914–1928.

Crinion, J., Ashburner, J., Leff, A., Brett, M., Price, C., & Friston, K. (2007). Spatial normalization of lesioned brains: Performance evaluation and impact on fMRI analyses. NeuroImage, 37(3), 866–875.

Damoiseaux, J. S., Rombouts, S. A. R. B., Barkhof, F., Scheltens, P., Stam, C. J., Smith, S. M., & Beckmann, C. F. (2006). Consistent resting-state networks across healthy subjects. Proceedings of the National Academy of Sciences, 103(37), 13848–13853.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1–38.

D’Esposito, M., Deouell, L. Y., & Gazzaley, A. (2003). Alterations in the BOLD fMRI signal with ageing and disease: A challenge for neuroimaging. Nature Reviews Neuroscience, 4(11), 863–872.

Detre, J. A., & Wang, J. (2002). Technical aspects and utility of fMRI using BOLD and ASL. Clinical Neurophysiology, 113(5), 621–634.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis (Vol. 3). New York: Wiley.

Efron, B., & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1, 54–75.

Fair, D. A., Dosenbach, N. U., Church, J. A., Cohen, A. L., Brahmbhatt, S., Miezin, F. M., … Schlaggar, B. L. (2007). Development of distinct control networks through segregation and integration. Proceedings of the National Academy of Sciences, 104(33), 13507–13512.

Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458), 611–631.

Friston, K. (2002). Functional integration and inference in the brain. Progress in Neurobiology, 68(2), 113–143.

Friston, K., Mattout, J., Trujillo-Barreto, N., Ashburner, J., & Penny, W. (2007). Variational free energy and the Laplace approximation. NeuroImage, 34(1), 220–234.

Fukunaga, K. (2013). Introduction to statistical pattern recognition. Orlando, FL: Academic Press.

Garrett, D. D., Kovacevic, N., McIntosh, A. R., & Grady, C. L. (2010). Blood oxygen level-dependent signal variability is more than just noise. Journal of Neuroscience, 30(14), 4914–4921.

Garrity, A. G., Pearlson, G. D., McKiernan, K., Lloyd, D., Kiehl, K. A., & Calhoun, V. D. (2007). Aberrant “default mode” functional connectivity in schizophrenia. American Journal of Psychiatry, 164(3), 450–457.

Gillis, N. (2014). The why and how of nonnegative matrix factorization. arXiv e-prints.

Golland, Y., Golland, P., Bentin, S., & Malach, R. (2008). Data-driven clustering reveals a fundamental subdivision of the human cortex into two global systems. Neuropsychologia, 46(2), 540–553.

Goutte, C., Toft, P., Rostrup, E., Nielsen, F. Å., & Hansen, L. K. (1999). On clustering fMRI time series. NeuroImage, 9(3), 298–310.

Grady, C. L. (2008). Cognitive neuroscience of aging. Annals of the New York Academy of Sciences, 1124(1), 127–144.

Green, B. F. (1952). The orthogonal approximation of an oblique structure in factor analysis. Psychometrika, 17(4), 429–440.

Grefkes, C., Nowak, D. A., Eickhoff, S. B., Dafotakis, M., Küst, J., Karbe, H., & Fink, G. R. (2008). Cortical connectivity after subcortical stroke assessed with functional magnetic resonance imaging. Annals of Neurology, 63(2), 236–246.

Greicius, M. D., Krasnow, B., Reiss, A. L., & Menon, V. (2003). Functional connectivity in the resting brain: A network analysis of the default mode hypothesis. Proceedings of the National Academy of Sciences, 100(1), 253–258.

Greicius, M. D., Srivastava, G., Reiss, A. L., & Menon, V. (2004). Default-mode network activity distinguishes Alzheimer’s disease from healthy aging: Evidence from functional MRI. Proceedings of the National Academy of Sciences, 101(13), 4637–4642.

Hansen, L. K., Larsen, J., Nielsen, F. Å., Strother, S. C., Rostrup, E., Savoy, R., … Paulson, O. B. (1999). Generalizable patterns in neuroimaging: How many principal components? NeuroImage, 9(5), 534–544.

Harshman, R. A. (1972). PARAFAC2: Mathematical and technical notes. UCLA Working Papers in Phonetics, 22, 30–44.

Hlinka, J., Paluš, M., Vejmelka, M., Mantini, D., & Corbetta, M. (2011). Functional connectivity in resting-state fMRI: Is linear correlation sufficient? NeuroImage, 54(3), 2218–2225.

Kiers, H. A., Ten Berge, J. M., & Bro, R. (1999). PARAFAC2-Part I. A direct fitting algorithm for the PARAFAC2 model. Journal of Chemometrics, 13(3–4), 275–294.

Laird, N., Lange, N., & Stram, D. (1987). Maximum likelihood computations with repeated measures: Application of the EM algorithm. Journal of the American Statistical Association, 82(397), 97–105.

Lashkari, D., Sridharan, R., Vul, E., Hsieh, P. J., Kanwisher, N., & Golland, P. (2012). Search for patterns of functional specificity in the brain: A nonparametric hierarchical Bayesian model for group fMRI data. NeuroImage, 59(2), 1348–1368.

Li, K., Guo, L., Nie, J., Li, G., & Liu, T. (2009). Review of methods for functional brain connectivity detection using fMRI. Computerized Medical Imaging and Graphics, 33(2), 131–139.

Lowe, M. J., Phillips, M. D., Lurito, J. T., Mattson, D., Dzemidzic, M., & Mathews, V. P. (2002). Multiple sclerosis: Low-frequency temporal blood oxygen level–dependent fluctuations indicate reduced functional connectivity—initial results. Radiology, 224(1), 184–192.

Lu, Y., Jiang, T., & Zang, Y. (2003). Region growing method for the analysis of functional MRI data. NeuroImage, 20(1), 455–465.

Marrelec, G., Krainik, A., Duffau, H., Pélégrini-Issac, M., Lehéricy, S., Doyon, J., & Benali, H. (2006). Partial correlation for functional brain interactivity investigation in functional MRI. NeuroImage, 32(1), 228–237.

McIntosh, A. R., Chau, W. K., & Protzner, A. B. (2004). Spatiotemporal analysis of event-related fMRI data using partial least squares. NeuroImage, 23(2), 764–775.

McIntosh, A. R., Kovacevic, N., Lippe, S., Garrett, D., Grady, C., & Jirsa, V. (2010). The development of a noisy brain. Archives Italiennes de Biologie, 148(3), 323–337.

McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.

Martínez-Montes, E., Vega-Hernández, M., Sánchez-Bornot, J. M., & Valdés-Sosa, P. A. (2008). Identifying complex brain networks using penalized regression methods. Journal of Biological Physics, 34(3–4), 315–323.

Melnykov, V., & Maitra, R. (2010). Finite mixture models and model-based clustering. Statistics Surveys, 4, 80–116.

Miwakeichi, F., Martínez-Montes, E., Valdés-Sosa, P. A., Nishiyama, N., Mizuhara, H., & Yamaguchi, Y. (2004). Decomposing EEG data into space–time–frequency components using parallel factor analysis. NeuroImage, 22(3), 1035–1045.

Moeller, J. R., Strother, S. C., Sidtis, J. J., & Rottenberg, D. A. (1987). Scaled subprofile model: A statistical approach to the analysis of functional patterns in positron emission tomographic data. Journal of Cerebral Blood Flow and Metabolism, 7, 649–658.

Penny, W. D., Stephan, K. E., Mechelli, A., & Friston, K. J. (2004). Comparing dynamic causal models. NeuroImage, 22(3), 1157–1172.

Power, J. D., Fair, D. A., Schlaggar, B. L., & Petersen, S. E. (2010). The development of human functional brain networks. Neuron, 67(5), 735–748.

Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proceedings of the National Academy of Sciences, 98(2), 676–682.

Rasmussen, P. M., Hansen, L. K., Madsen, K. H., Churchill, N. W., & Strother, S. C. (2012). Model sparsity and brain pattern interpretation of classification models in neuroimaging. Pattern Recognition, 45(6), 2085–2100.

Rorden, C., Bonilha, L., Fridriksson, J., Bender, B., & Karnath, H. O. (2012). Age-specific CT and MRI templates for spatial normalization. NeuroImage, 61(4), 957–965.

Rubinov, M., & Sporns, O. (2010). Complex network measures of brain connectivity: Uses and interpretations. NeuroImage, 52(3), 1059–1069.

Šmídl, V., & Quinn, A. (2007). On Bayesian principal component analysis. Computational Statistics and Data Analysis, 51(9), 4101–4123.

Smith, S. M., Vidaurre, D., Beckmann, C. F., Glasser, M. F., Jenkinson, M., Miller, K. L., … Barch, D. M. (2013). Functional connectomics from resting-state fMRI. Trends in Cognitive Sciences, 17(12), 666–682.

Stephan, K. E., Penny, W. D., Moran, R. J., den Ouden, H. E., Daunizeau, J., & Friston, K. J. (2010). Ten simple rules for dynamic causal modeling. NeuroImage, 49(4), 3099–3109.

Strother, S., Rasmussen, P., Churchill, N., & Hansen, L. (2015). Stability and reproducibility in fMRI analysis. In I. Rish, G. Cecchi, & A. Lozano (Eds.), Optimization for machine learning. Cambridge, MA: MIT Press.

Thirion, B., Flandin, G., Pinel, P., Roche, A., Ciuciu, P., & Poline, J. B. (2006). Dealing with the shortcomings of spatial normalization: Multi-subject parcellation of fMRI datasets. Human Brain Mapping, 27(8), 678–693.

Thirion, B., Varoquaux, G., Dohmatob, E., & Poline, J. B. (2014). Which fMRI clustering gives good brain parcellations? Frontiers in Neuroscience, 8.

Tononi, G., Sporns, O., & Edelman, G. M. (1994). A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proceedings of the National Academy of Sciences, 91(11), 5033–5037.

Turaga, P., Veeraraghavan, A., & Chellappa, R. (2008). Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–8). Piscataway, NJ: IEEE.

Uhlig, H. (1994). On singular Wishart and singular multivariate beta distributions. Annals of Statistics, 22, 395–405.

Van Den Heuvel, M., Mandl, R., & Hulshoff Pol, H. (2008). Normalized cut group clustering of resting-state fMRI data. PLoS One, 3(4), e2001.

Van Dijk, K. R., Sabuncu, M. R., & Buckner, R. L. (2012). The influence of head motion on intrinsic functional connectivity MRI. NeuroImage, 59(1), 431–438.

Vinnikov, A., & Shalev-Shwartz, S. (2014). K-means recovers ICA filters when independent components are sparse. In Proceedings of the 31st International Conference on Machine Learning (pp. 712–720). JMLR.

Woolrich, M. W., Behrens, T. E., Beckmann, C. F., & Smith, S. M. (2005). Mixture models with adaptive spatial regularization for segmentation with an application to fMRI data. IEEE Transactions on Medical Imaging, 24(1), 1–11.