## Abstract

Functional connectivity (FC) studies have predominantly focused on resting state, where ongoing dynamics are thought to reflect the brain’s intrinsic network architecture, which is considered broadly relevant because it persists across brain states (i.e., is state-general). However, it is unknown whether resting state is the optimal state for measuring intrinsic FC. We propose that latent FC, reflecting shared connectivity patterns across many brain states, better captures state-general intrinsic FC than measures derived from resting state alone. Using fMRI data from the Human Connectome Project, we estimated latent FC independently for each connection with leave-one-task-out factor analysis across seven highly distinct task states (24 conditions) and resting state. Compared with resting-state connectivity, latent FC improved generalization to held-out brain states, better explaining patterns of connectivity and task-evoked activation. Latent connectivity also improved prediction of behavior outside the scanner, indexed by the general intelligence factor (*g*). Our results suggest that FC patterns shared across many brain states, rather than resting state alone, better reflect state-general connectivity. This affirms the notion of “intrinsic” brain network architecture as a set of connectivity properties that persist across brain states, and it provides an updated conceptual and mathematical framework in which intrinsic connectivity is a latent factor.

## Author Summary

The initial promise of resting-state fMRI was that it would reflect “intrinsic” functional relationships in the brain free from any specific task context, yet this assumption had remained untested until recently. Here we propose a latent variable method for estimating intrinsic functional connectivity (FC) as an alternative to rest FC. We show that latent FC outperforms rest FC in predicting held-out FC and regional activation states in the brain. Additionally, latent FC better predicts a marker of general intelligence measured outside of the scanner. We demonstrate that the latent variable approach subsumes other approaches to combining data from multiple states (e.g., averaging) and that it outperforms rest FC alone in terms of generalizability and predictive validity.

## INTRODUCTION

A major goal in cognitive neuroscience in recent years has been to move away from characterizing brain activation and connectivity in specific task states towards understanding “intrinsic” or context-free brain activity. Such activity accounts for the more than 95% of the brain’s metabolic activity that remains unchanged across cognitive demands (Raichle, 2006). This ongoing brain activity persists across states and is not attributable to external stimuli or task demands. Efforts to understand intrinsic function have focused primarily on statistical associations between brain activity time series (functional connectivity; FC) during the resting state (Fox & Raichle, 2007; but see Finn et al., 2015, and Greene, Gao, Scheinost, & Constable, 2018, for task-based investigations). This work has revealed an intrinsic functional network architecture that recapitulates patterns of task-evoked brain activity (Cole, Ito, Bassett, & Schultz, 2016; Smith et al., 2009) and structural connectivity (Honey et al., 2009). As the name implies, however, resting state is just one state that the brain can occupy, and a truly “intrinsic” connectivity network should persist across the many different states a brain might assume; in other words, it should be a “state-general” intrinsic network. Despite its importance for understanding brain function, much remains uncertain about how best to estimate intrinsic FC. While some efforts have focused on the need to obtain longer resting-state scans (Anderson, Ferguson, Lopez-Larson, & Yurgelun-Todd, 2011; Elliott et al., 2019; Hacker et al., 2013; Laumann et al., 2015), more recent approaches have highlighted the advantages of combining resting-state and task data to analyze intrinsic activity.

This second set of approaches leverages functional data across different task (and rest) scans to improve the reliability of FC estimates and their predictive utility (Elliott et al., 2019). Because FC networks are relatively stable across task states (Cole, Bassett, Power, Braver, & Petersen, 2014; Gratton et al., 2018; Krienen, Yeo, & Buckner, 2014), combining data across task runs aims to identify what is common across a larger set of brain states. What is common is taken to reflect the intrinsic patterns of covariance in the brain, while variation between brain states is treated as noise in the combined data. However, this work largely relies on averaging data from multiple scans (Elliott et al., 2019). While this approach has proven useful and has the advantage of simplicity, it has potential theoretical limitations that may restrict its generalizability. Given its ubiquity and closed-form, arithmetic solution, the average is rarely thought of as a formal statistical model. However, recent work (McNeish & Wolf, 2020) has shown that the average can be treated as a restricted case of the more general factor analytic model. Embedding the average in a theoretically rich statistical framework is likely to aid the interpretation of results based on this measure and to offer insights into the measure itself.

Factor analysis has a long tradition in the behavioral sciences (Spearman, 1904; Thurstone, 1935) and is an invaluable tool in psychometrics and psychological measurement. Its key insight is that observed measures (e.g., behavioral responses or fMRI scans) are imperfect manifestations of an unobserved (i.e., latent) variable (Bollen, 2002). In the factor model, observed indicators ($y_{i,t}$; $i$ = individual, $t$ = task state) are modeled as dependent on the underlying latent factor ($\boldsymbol{\eta}$; Figure 1). Variability in the indicators is partitioned into common variance (transmitted through the factor loading matrix, $\boldsymbol{\Lambda}$) and unique variance ($\varepsilon_t$). In this model, latent FC represents an unmeasured, underlying brain state that is common to all observed brain states (i.e., the indicators: resting state, motor task, etc.), while the error terms explicitly model additional variance found only in each individual task state. Factor loadings for the individual task states (e.g., $\lambda_{11}$ for Rest) index how strongly each state reflects latent FC; in a standardized single-factor model, the squared loading is the proportion of variance in that state explained by latent FC (similar to $R^2$ in regression).
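For a single connection, the model can be written as $y_{i,t} = \lambda_t \eta_i + \varepsilon_{i,t}$. The Python sketch below fits such a one-factor model to simulated data using iterated principal axis factoring. The estimator, the function name `one_factor_paf`, and the simulated loadings are illustrative assumptions, not the estimation procedure used in this study.

```python
import numpy as np


def one_factor_paf(X, n_iter=500, tol=1e-10):
    """Fit a single-factor model by iterated principal axis factoring.

    X: (n_subjects, n_states) array; column t holds one connection's FC
       value in state t for every subject.
    Returns standardized loadings lambda_t and unique variances psi_t.
    """
    R = np.corrcoef(X, rowvar=False)              # state-by-state correlations
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))    # start: squared multiple correlations
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)           # communalities on the diagonal
        vals, vecs = np.linalg.eigh(R_reduced)    # eigenvalues in ascending order
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        if lam.sum() < 0:                         # the sign of a factor is arbitrary
            lam = -lam
        converged = np.max(np.abs(lam ** 2 - h2)) < tol
        h2 = lam ** 2
        if converged:
            break
    return lam, 1.0 - lam ** 2


# Simulated (hypothetical) data for one connection: 4 states with known loadings.
rng = np.random.default_rng(0)
true_lam = np.array([0.8, 0.7, 0.6, 0.5])
eta = rng.standard_normal(5000)                   # latent FC, one value per subject
eps = rng.standard_normal((5000, true_lam.size))  # state-specific (unique) parts
X = eta[:, None] * true_lam + eps * np.sqrt(1.0 - true_lam ** 2)

lam, psi = one_factor_paf(X)
print(np.round(lam, 2))        # should be close to true_lam
print(np.round(lam ** 2, 2))   # variance in each state explained by latent FC
```

Principal axis factoring was chosen here only because it needs nothing beyond NumPy; maximum likelihood estimation would be a common alternative.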

As can be seen in Figure 1, the factor analysis model of latent FC is a parameter-rich model that allows for differentially weighted relationships between the underlying latent connectivity and the measured connectivity in each specific state. McNeish and Wolf (2020) showed, however, that the average can be recovered from this model by setting all factor loadings (*λ*) equal to 1 and the unique variances to 0. Recasting the average as a special case of the factor model not only makes the assumptions of the average clearer but also enables a formal test of those assumptions. For instance, by setting all factor loadings equal, the average assumes that each observed FC state is equally (and positively) related to the underlying latent FC. To relax that assumption, the factor analytic model can be used to estimate an optimal weight (factor loading) for each state, allowing some observed states to be better (or worse) reflections of underlying latent FC. Factor loadings may even take negative values, which implies that an observed indicator is anticorrelated with the underlying latent FC. However, if equal, positive weighting is indeed appropriate, freely estimated factor loadings will converge towards equal values and approximate the average. In other words, the flexibility of freely estimated factor loadings does not preclude the average; instead, it offers a broader range of possibilities for deriving a measure of latent FC from heterogeneous data and can be used to test the validity of the average-FC assumption of equal, positive factor loadings across brain states.
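The equal-weighting assumption can be made concrete with regression (Thurstone) factor scores. In a one-factor model with model-implied correlation matrix $\mathbf{R} = \boldsymbol{\lambda}\boldsymbol{\lambda}^\top + \operatorname{diag}(\boldsymbol{\psi})$, the score weights are $\mathbf{w} = \mathbf{R}^{-1}\boldsymbol{\lambda}$, which are proportional to $\lambda_t/\psi_t$. The sketch below uses hypothetical loadings to check that equal loadings with equal unique variances give every state the same weight (so the factor score is a rescaled average), whereas unequal loadings give unequal weights. It illustrates the assumption; it is not a reproduction of McNeish and Wolf's (2020) derivation.

```python
import numpy as np


def score_weights(lam, psi):
    """Regression factor-score weights w = R^{-1} lambda for a one-factor model
    whose implied correlation matrix is R = lambda lambda^T + diag(psi)."""
    R = np.outer(lam, lam) + np.diag(psi)
    return np.linalg.solve(R, lam)


# Equal loadings and equal unique variances across 8 hypothetical states.
lam_equal = np.full(8, 0.7)
w_equal = score_weights(lam_equal, 1.0 - lam_equal ** 2)

# Unequal (hypothetical) loadings: weaker indicators receive smaller weights.
lam_unequal = np.array([0.9, 0.8, 0.7, 0.7, 0.6, 0.6, 0.5, 0.3])
psi_unequal = 1.0 - lam_unequal ** 2
w_unequal = score_weights(lam_unequal, psi_unequal)

print(np.round(w_equal / w_equal.sum(), 3))      # every weight is 1/8 = 0.125
print(np.round(w_unequal / w_unequal.sum(), 3))  # proportional to lambda / psi
```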

Here, we test the reliability of a factor analytic framework for modeling state-general brain connectivity: “intrinsic” FC that generalizes across a variety of brain states. First, we hypothesized that latent FC reflects a positive manifold (analogous to the positive correlations across intelligence tests in general intelligence research; Kovacs & Conway, 2016), in which all state-specific connectivity values are positively correlated with each other and so load positively onto the underlying latent variable. This would support the existence of a single common intrinsic functional network architecture across conscious brain states. Importantly, this differs from the idea that states are correlated (Finn et al., 2015; Gratton et al., 2018), because here between-subject variance is decomposed at each individual connection rather than by correlating FC patterns across connections. We further hypothesized that combining information across task states, as in the factor model, yields a more reliable measure of “intrinsic” connectivity than resting-state data alone (the current field standard). This would suggest that resting state is not necessarily the best state for estimating intrinsic FC, especially if resting state does not load more highly on the latent variable than other states. To test these hypotheses, we developed an analytic framework for estimating state-general, latent FC in whole-brain functional data. Using multitask fMRI data from the Human Connectome Project (HCP), we compare the ability of latent and resting-state FC to predict task-evoked activation and task-state FC for held-out brain states, as well as to explain individual differences in psychometric “*g*” (a measure of human intelligence derived with a similar factor analytic model). Results demonstrate the promise of the latent variable approach in functional neuroimaging, particularly for estimating intrinsic FC that generalizes beyond specific brain states (e.g., rest).
Finally, we demonstrate the relationship between freely estimated latent FC and the simpler average FC approach and discuss the theoretical advantages of casting both methods in the latent variable framework for future work.

## RESULTS

### Factor Analysis Model of Latent Connectivity

We ran independent factor analysis models for each connection, estimating the loading of each state onto the latent variable (i.e., latent FC). Latent FC captures the shared variance in FC across all states (see Figure 2). Factor analyses were run using all available data (i.e., the full time series and all states). All analyses were performed independently in the exploratory sample and then replicated in the validation sample (both *N* = 176; see the Participants section for additional information). Importantly, all factor analytic models were fit separately for each sample to avoid circularity when comparing results across samples.
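The per-connection, leave-one-task-out procedure can be sketched as follows. This minimal Python sketch assumes FC values are stored in an array of shape (subjects, states, connections), fits a separate one-factor model per connection by iterated principal axis factoring, and takes latent FC as regression factor scores. The function names, the estimator, and the scoring method are illustrative choices, not necessarily those used in this study.

```python
import numpy as np


def fit_one_factor(X, n_iter=200):
    """Iterated principal axis factoring with one factor; X is (subjects, states)."""
    R = np.corrcoef(X, rowvar=False)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # squared multiple correlations
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)
        vals, vecs = np.linalg.eigh(R_reduced)
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        lam = lam if lam.sum() >= 0 else -lam    # resolve the arbitrary sign
        h2 = lam ** 2
    return lam, R


def latent_fc_leave_one_out(fc, held_out):
    """fc: (subjects, states, connections). Fits one factor model per connection
    on every state except `held_out` and returns (subjects, connections) scores."""
    n_sub, n_states, n_conn = fc.shape
    keep = [s for s in range(n_states) if s != held_out]
    scores = np.empty((n_sub, n_conn))
    for c in range(n_conn):
        X = fc[:, keep, c]
        Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize each state
        lam, R = fit_one_factor(X)
        scores[:, c] = Z @ np.linalg.solve(R, lam)   # regression factor scores
    return scores


# Toy (simulated) data: 300 subjects, 9 states, 5 connections.
rng = np.random.default_rng(1)
eta = rng.standard_normal((300, 5))                  # simulated latent FC
fc = 0.7 * eta[:, None, :] + 0.5 * rng.standard_normal((300, 9, 5))
scores = latent_fc_leave_one_out(fc, held_out=0)     # state 0 never enters the model
```

Because the held-out state is excluded before fitting, any later comparison between these scores and that state's FC is not circular.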

Consistent with our hypothesis that there is a “positive manifold” demonstrating a common latent FC architecture across states, almost all factor loadings were positive (greater than 99%) across all connections and all states (see Table 1). Furthermore, 70.7% of all factor loadings were reasonably large in magnitude (factor loading ≥ 0.4), and 97.4% of connections had two or more states with factor loadings ≥ 0.4 in the full latent FC model. The emotion task had the fewest large factor loadings (47.3%), and the resting state had the most (92.6%; see Table 1 for full details).
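The percentages reported here can be computed from a matrix of per-connection loadings. A minimal sketch, assuming loadings are stored as an array of shape (connections, states); the function name and the toy values are hypothetical.

```python
import numpy as np


def loading_summary(loadings, threshold=0.4):
    """loadings: (n_connections, n_states) standardized factor loadings.

    Returns per-state % of loadings >= 0, per-state % of loadings >= threshold,
    and the % of connections with two or more states at or above threshold.
    """
    large = loadings >= threshold
    pct_nonneg = 100 * np.mean(loadings >= 0, axis=0)
    pct_large = 100 * np.mean(large, axis=0)
    pct_two_plus = 100 * np.mean(large.sum(axis=1) >= 2)
    return pct_nonneg, pct_large, pct_two_plus


# Toy example: 4 connections x 3 states (hypothetical values).
toy = np.array([[0.5, 0.30, -0.1],
                [0.6, 0.45, 0.2],
                [0.1, 0.20, 0.7],
                [0.4, 0.90, 0.5]])
pct_nonneg, pct_large, pct_two_plus = loading_summary(toy)
print(pct_nonneg, pct_large, pct_two_plus)
```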

**Table 1.**

| State | All data: % loadings ≥ 0 | All data: % loadings ≥ 0.4 | Controlling for no. of time points: % loadings ≥ 0 | Controlling for no. of time points: % loadings ≥ 0.4 |
|---|---|---|---|---|
| Rest | 99.9 | 92.6 | 99.0 | 31.6 |
| Emotion | 99.3 | 47.3 | 98.7 | 46.3 |
| Gambling | 99.6 | 65.0 | 99.1 | 62.2 |
| Motor | 99.8 | 68.0 | 99.4 | 54.8 |
| Reasoning | 99.5 | 62.1 | 99.1 | 62.3 |
| Social | 99.8 | 66.2 | 99.3 | 58.4 |
| Working memory | 99.7 | 67.0 | 99.2 | 64.9 |
| Math | 99.9 | 82.4 | 99.6 | 82.4 |
| Language | 99.9 | 86.0 | 99.7 | 86.3 |


To control for differences between states in the amount of data used to obtain state-specific FC estimates, we reran the factor analyses while matching the number of time points from rest and task data (2,112 TRs from rest and 264 TRs for each of the eight tasks). With this approach, resting state had the smallest proportion of high-magnitude factor loadings of all states: only 31.6% of resting-state connections had factor loadings ≥ 0.4. Thus, resting state had the highest factor loadings onto latent FC when a large amount of data were used to estimate resting-state FC, but the lowest factor loadings when less data were used. Controlling for the number of time points led to less pronounced changes in the factor loadings of the other states (see Figure 2), likely because there was no relationship between the number of TRs for a given task state and its average factor loading in the full-TR analysis (see Figure S1 in the Supporting Information). Note that this drop occurs even though rest still has substantially more TRs (8×) than any single task state in these analyses.
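A minimal sketch of estimating FC from a fixed number of time points. It assumes each scan is stored as a (time points, regions) array and that the first *n* TRs are kept; which TRs were actually retained is not specified here, so that selection rule is an assumption. The data are random placeholders. With 360 regions, the upper triangle of the correlation matrix holds 360 × 359 / 2 = 64,620 connections, and 2,112 rest TRs equal 8 × 264 task TRs.

```python
import numpy as np


def fc_edges(ts, n_trs=None):
    """Pearson-correlation FC for one scan.

    ts: (time points, regions) array. If n_trs is given, only the first n_trs
    time points are kept (an assumed selection rule). Returns the upper
    triangle of the region-by-region correlation matrix as a 1-D vector.
    """
    if n_trs is not None:
        ts = ts[:n_trs]
    r = np.corrcoef(ts, rowvar=False)
    rows, cols = np.triu_indices(r.shape[0], k=1)   # each connection counted once
    return r[rows, cols]


N_TASK_TRS = 264
N_REST_TRS = 8 * N_TASK_TRS          # 2,112 rest TRs = total TRs across 8 tasks

rng = np.random.default_rng(0)
rest_ts = rng.standard_normal((3000, 360))   # random placeholder, not real data
task_ts = rng.standard_normal((400, 360))    # random placeholder, not real data
rest_fc = fc_edges(rest_ts, n_trs=N_REST_TRS)
task_fc = fc_edges(task_ts, n_trs=N_TASK_TRS)
print(rest_fc.size)   # 360 * 359 / 2 = 64620 connections
```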

### Latent FC Improves Generalization to Connectivity of Held-Out States

We next sought to test our second hypothesis: A more reliable and generalizable measure of “intrinsic” connectivity can be estimated by combining information across task states, as in the factor model, than by using resting-state data alone (the current field standard). To test whether the measures of intrinsic FC persist across brain states, we quantified the generalizability of rest FC and latent FC to held-out brain states. To quantify the similarity of FC patterns (i.e., across 64,620 network connections), we computed the Pearson’s correlation of rest FC or latent FC with state FC for each individual subject, with Bonferroni correction for multiple comparisons. For latent FC, similarity was always computed for the state that was held out when fitting the factor analysis model. Compared with rest FC, latent FC exhibited significantly greater similarity with a variety of independent brain states (see Figure 3A). Similarity of each state with latent FC was comparable across states, with the greatest similarity to the WM task (*r* = 0.71) and the least similarity to the social task (*r* = 0.66) and resting state (*r* = 0.65). Rest FC exhibited the greatest similarity to the full resting-state data (*r* = 0.73), providing a measure of test-retest similarity of rest FC (i.e., how well the time-point-restricted data represent the correlation matrix computed on the complete resting-state data). Among the task states, rest FC had the greatest similarity to the motor task (*r* = 0.61) and the least similarity to the relational task (*r* = 0.56).
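A minimal sketch of the per-subject similarity analysis, assuming FC is stored as (subjects, connections) arrays. Comparing the two FC estimates with a paired *t* test on Fisher z-transformed correlations, and the Bonferroni helper, are illustrative assumptions about how the comparison could be run; the simulated data are hypothetical.

```python
import numpy as np


def pattern_similarity(a, b):
    """Per-subject Pearson r between two (subjects, connections) FC arrays."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))


def paired_t_fisher(r_1, r_2):
    """Paired t statistic (df = n - 1) on Fisher z-transformed per-subject correlations."""
    d = np.arctanh(r_1) - np.arctanh(r_2)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)


# Toy FC vectors for 50 subjects; latent FC is simulated closer to the held-out state.
rng = np.random.default_rng(0)
state = rng.standard_normal((50, 1000))
latent = state + 0.5 * rng.standard_normal((50, 1000))
rest = state + 1.0 * rng.standard_normal((50, 1000))
r_latent = pattern_similarity(latent, state)
r_rest = pattern_similarity(rest, state)
t = paired_t_fisher(r_latent, r_rest)
```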

When using the full time series (i.e., not controlling for the amount of data used to obtain the FC estimates across states), we still found greater similarity of latent FC relative to rest FC with the task states. However, latent FC now exhibited the greatest similarity to the resting state (*r* = 0.80) and the least similarity to the social task (*r* = 0.67; see Figure 3B). Together with the higher similarity values for all states in this analysis, this suggests that states may converge towards latent FC as substantially more data are sampled for any given state (e.g., for resting-state FC, 26 min of data per participant were included in the data-restricted analysis vs. 58 min in the unrestricted analysis). All findings were replicated in the validation dataset (Supporting Information, Figure S2).

### Latent FC Improves Prediction of Task Activation Patterns

We next sought to further test our hypothesis that latent FC is highly generalizable (relative to resting-state FC), this time by testing for generalization beyond FC to patterns of task-evoked activation. We began by using general linear models (GLMs) to estimate the pattern of task-evoked activation for each of 24 task conditions. We then used activity flow mapping (Figure 4A) to predict the pattern of task-evoked activation with a simple neural network model parameterized using either resting-state FC or latent FC, and we used Pearson’s correlation to quantify the similarity between predicted and actual task activations for each model. As a global measure of performance, we first correlated the predicted activation patterns from each activity flow model with the observed activations. Predictions from the model based on latent FC (*r* = 0.66) outperformed those based on resting-state FC (*r* = 0.56) in reproducing the observed beta activation patterns (Figure 4D). We then compared the two models at the region level (i.e., prediction for a given region across conditions) and at the condition level (i.e., prediction for a given condition across regions).
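Activity flow mapping (Cole et al., 2016) predicts a region's activation in a condition as the sum of all other regions' activations, each weighted by its connectivity with the target region. The sketch below assumes activations and FC are plain NumPy arrays; the function name is hypothetical, and the published method may include preprocessing steps not shown here.

```python
import numpy as np


def activity_flow(activations, fc):
    """Predict activations from connectivity-weighted activity of other regions.

    activations: (conditions, regions) observed task betas.
    fc: (regions, regions) connectivity matrix (e.g., rest FC or latent FC).
    pred[c, j] = sum over i != j of activations[c, i] * fc[i, j].
    """
    weights = np.array(fc, dtype=float)         # copy so the input is not modified
    np.fill_diagonal(weights, 0.0)              # a region never predicts itself
    return activations @ weights


# Tiny worked example: 1 condition, 3 regions (hypothetical values).
act = np.array([[1.0, 2.0, 3.0]])
fc = np.array([[1.0, 0.5, 0.0],
               [0.5, 1.0, 0.2],
               [0.0, 0.2, 1.0]])
pred = activity_flow(act, fc)   # approximately [[1.0, 1.1, 0.4]]
```

Zeroing the diagonal is what keeps the prediction for a region independent of that region's own observed activation.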

We first estimated predicted beta activations for each region (across conditions) using the activity flow models, reflecting the condition-dependent changes in activation within each region. For each region, we compared the beta activation predictions of the two activity flow models, and for each network we computed the percentage of regions with significantly improved predictions for either model. With the activity flow model based on latent FC, predictions were significantly improved (based on a corrected *t* test of z-transformed correlation coefficients) for 68% of brain regions (246 out of 360 total), including 33% of VIS1, 69% of VIS2, 64% of SMN, 73% of CON, 70% of DAN, 62% of LAN, 62% of FPN, 100% of AUD, 78% of DMN, 14% of PMM, 0% of VMM, and 33% of ORA regions. Activity flow based on rest FC significantly improved predictions in 1% of brain regions (4 out of 360 total), accounting for 7% of VIS2 and no other networks (Figure 4B).
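The region-wise and condition-wise comparisons can be sketched as follows, assuming per-subject arrays of predicted and actual activations with shape (conditions, regions) and assuming the paired *t* test is run across subjects on Fisher z-transformed correlations (the multiple-comparison correction is omitted). All names and simulated values are hypothetical.

```python
import numpy as np


def rowwise_r(a, b):
    """Pearson r between matching rows of two 2-D arrays."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))


def accuracy(pred, actual):
    """pred, actual: (conditions, regions) for one subject."""
    region_r = rowwise_r(pred.T, actual.T)   # each region, across conditions
    condition_r = rowwise_r(pred, actual)    # each condition, across regions
    return region_r, condition_r


def paired_t(z_a, z_b):
    """Paired t (df = n_subjects - 1) for each column of (subjects, units) arrays."""
    d = z_a - z_b
    return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(d.shape[0]))


# Simulated data: 20 subjects x 24 conditions x 360 regions.
rng = np.random.default_rng(0)
actual = rng.standard_normal((20, 24, 360))
pred_good = actual + 0.5 * rng.standard_normal(actual.shape)   # stand-in for one model
pred_poor = actual + 1.0 * rng.standard_normal(actual.shape)   # stand-in for the other
z_good = np.arctanh(np.array([accuracy(p, a)[0] for p, a in zip(pred_good, actual)]))
z_poor = np.arctanh(np.array([accuracy(p, a)[0] for p, a in zip(pred_poor, actual)]))
t_region = paired_t(z_good, z_poor)   # one t value per region
```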

When considering prediction accuracy for each task condition, we found that latent FC significantly improved the across-region predicted activations for all task conditions except the left-hand condition of the motor task, when comparing the relative activations across the topology of the brain within a condition (Figure 4C). The similarity between predicted and actual task activations varied by task condition. The activity flow model based on latent FC exhibited the greatest similarity for the two-back body condition of the WM task (*r* = 0.76) and the least for the math condition of the language task (*r* = 0.45). The activity flow model based on rest FC exhibited the greatest similarity for the matching condition of the relational task (*r* = 0.67) and the least for the math condition of the language task (*r* = 0.40). All findings were replicated in the validation dataset (Supporting Information, Figure S3).

### Latent FC Improves Prediction of General Intelligence

Our hypothesis that latent FC generalizes better than resting-state FC also predicts that latent FC should be more closely related to general cognition and behavior, even behavior independent of the particular tasks used to estimate the task-state FC that enters the latent FC estimates. We tested whether latent FC improves prediction of general intelligence, using psychometric *g* to capture many different behavioral and cognitive measures (Dubois, Galdi, Paul, & Adolphs, 2018; Gottfredson, 1997). We estimated general intelligence (psychometric *g*) using a factor analysis model on behavioral data from a range of cognitive tasks, then tested whether latent FC and/or rest FC measures could predict general intelligence. We combined the exploratory and validation samples to increase the number of participants to 352 for this analysis, given the need for additional participants (relative to the other analyses in this study) to achieve reasonable statistical power for individual difference correlations (Yarkoni, 2009). We then used ridge-regularized multiple linear regression to predict general intelligence from FC. However, one potential confound of simply pooling the full sample is that the estimated factor scores for latent FC and psychometric *g* would be influenced by the data of the to-be-predicted individuals, introducing circularity into these analyses. To avoid this, we implemented a between-sample cross-validation approach: we estimated factor models for latent FC and psychometric *g* scores in each subsample separately (i.e., exploratory and validation), and predictions for the exploratory subjects were generated from the validation sample regression model and vice versa.
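A minimal sketch of the between-sample scheme with ridge regression, written in dual form because there are far more connections (features) than subjects. It assumes a fixed, hypothetical penalty `alpha`; how the penalty was chosen in this study is not described here. The simulated features and *g* scores are placeholders, and in practice the factor models for latent FC and *g* would be fit within each sample before this step.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Ridge regression in dual form (efficient when features >> subjects).
    Features and target are centered, so the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    K = Xc @ Xc.T                                          # subjects x subjects
    beta = Xc.T @ np.linalg.solve(K + alpha * np.eye(K.shape[0]), y - y_mean)
    return beta, y_mean - x_mean @ beta


def between_sample_predictions(X_a, y_a, X_b, y_b, alpha):
    """Predict sample A with a model trained on B, and B with a model trained on A."""
    beta_b, b0_b = ridge_fit(X_b, y_b, alpha)
    beta_a, b0_a = ridge_fit(X_a, y_a, alpha)
    return X_a @ beta_b + b0_b, X_b @ beta_a + b0_a


def simulate(rng, n, pattern):
    """Placeholder data: 'connections' that weakly track a simulated g."""
    g = rng.standard_normal(n)
    return 0.1 * np.outer(g, pattern) + rng.standard_normal((n, pattern.size)), g


rng = np.random.default_rng(0)
pattern = rng.standard_normal(2000)
X_a, g_a = simulate(rng, 176, pattern)
X_b, g_b = simulate(rng, 176, pattern)
pred_a, pred_b = between_sample_predictions(X_a, g_a, X_b, g_b, alpha=1000.0)
print(np.corrcoef(pred_a, g_a)[0, 1], np.corrcoef(pred_b, g_b)[0, 1])
```

The dual form uses the identity $(\mathbf{X}^\top\mathbf{X} + \alpha\mathbf{I})^{-1}\mathbf{X}^\top = \mathbf{X}^\top(\mathbf{X}\mathbf{X}^\top + \alpha\mathbf{I})^{-1}$, so only a subjects-by-subjects system is solved rather than a connections-by-connections one.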

We found that predicted general intelligence was significantly correlated with actual general intelligence for models using both rest FC (*r* = 0.26, *p* = 5.46e−07) and latent FC (*r* = 0.35, *p* = 1.37e−11) (Figure 5A). Consistent with our hypothesis, the model using latent FC significantly improved prediction of general intelligence compared with the model using rest FC (Δ*r* = 0.09, *t* = 1.77, *p* = 0.04; see Eid, Gollwitzer, & Schmitt, 2011, for the correlation comparison method). This effect was substantial: the proportion of linear variance explained by latent FC (*R*^{2} = 0.123) was approximately twice that explained by rest FC (*R*^{2} = 0.067). Relative to the overall sample, the correlations and the difference in *R*^{2} were larger for the exploratory sample (Figure 5B), while the validation sample showed a difference in *R*^{2} more similar to that of the overall sample, despite lower correlations between predicted and actual psychometric *g* scores for both latent and rest FC (Figure 5C). A meta-analysis (Field, 2001) of the exploratory and validation samples indicated that the pooled correlation difference was significant (Δ*z*_{pooled} = 0.09, *p* = 0.016).
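One widely used test for comparing two correlations that share a variable (here, actual *g* correlated with each model's predictions) is Williams's *t*, as presented by Steiger (1980). We have not confirmed that this is the exact formula used via Eid et al. (2011), so treat it as an illustrative stand-in. It also requires the correlation between the two sets of predictions, which is not reported here; the value 0.7 below is purely hypothetical.

```python
import numpy as np


def williams_t(r_y1, r_y2, r_12, n):
    """Williams's t for H0: rho(y, x1) == rho(y, x2), where x1 and x2 are measured
    on the same n cases and correlate r_12 with each other. df = n - 3."""
    det_r = 1 - r_y1 ** 2 - r_y2 ** 2 - r_12 ** 2 + 2 * r_y1 * r_y2 * r_12  # |R| of the 3x3 matrix
    r_bar = (r_y1 + r_y2) / 2
    denom = 2 * (n - 1) / (n - 3) * det_r + r_bar ** 2 * (1 - r_12) ** 3
    return (r_y1 - r_y2) * np.sqrt((n - 1) * (1 + r_12) / denom)


# Reported correlations with g; r_12 = 0.7 is a hypothetical placeholder.
t = williams_t(0.35, 0.26, 0.7, 352)
print(round(float(t), 2))
```

The result depends strongly on `r_12`: the more the two sets of predictions agree with each other, the more precisely their difference in correlation with *g* can be estimated.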

### Comparing Latent and Average FC

While the factor model uses the covariance among the different states to compute optimal weights, a simpler approach to finding consensus among states is to take a simple average across states. This approach assumes that the weights (loadings) of all measured states are equal. Given that the estimated loadings for latent FC were relatively uniform across states, we judged this assumption to be reasonable in this case. This supports the use of average FC; nevertheless, we directly compared latent FC with average FC to assess whether either method had any advantages. To construct an average connectivity matrix, we computed the mean value of each edge across states. For all analyses, we controlled for the amount of data between rest and task. Results indicated that combining across states, regardless of the approach, substantially improves on using even the full resting-state data. Indeed, average FC appears to outperform latent FC (albeit only slightly) in generalizing to held-out connectivity states (Figure 6A). In the activity flow mapping results, however, latent FC consistently outperforms average FC in predicting regional activity patterns, showing better predictions in 348 out of 360 regions (97%), whereas average FC showed no improved predictions (Figure 6B). Similarly, latent FC outperformed average FC in condition-wise activity flow predictions in 22 out of 24 conditions (Figure 6C). Together these results suggest that the average FC approach (sometimes termed “general functional connectivity”) is a reasonable alternative to the more complex latent FC approach, so long as the optimal weights across states are close to equal (an assumption not made by latent FC). The difference between the methods would likely become more meaningful when a particular brain state is highly distinct from all others (e.g., deep sleep vs. conscious states) or when one or more states are much noisier than the others (which latent FC would weight lower but average FC would not).
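The last point can be checked in simulation. In this hypothetical setup, eight states are driven by the same latent FC and one of them carries much larger noise; the one-factor model assigns the noisy state a much smaller loading and weight, and its regression factor score tracks the simulated latent FC more closely than the unweighted average of the raw values. The estimator (iterated principal axis factoring) and all numbers are illustrative assumptions.

```python
import numpy as np


def one_factor_loadings(X, n_iter=200):
    """Iterated principal axis factoring with one factor; X is (subjects, states)."""
    R = np.corrcoef(X, rowvar=False)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # squared multiple correlations
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)
        vals, vecs = np.linalg.eigh(R_reduced)
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        lam = lam if lam.sum() >= 0 else -lam
        h2 = lam ** 2
    return lam, R


rng = np.random.default_rng(2)
n_sub, n_states = 2000, 8
eta = rng.standard_normal(n_sub)                  # simulated latent FC (one edge)
noise_sd = np.full(n_states, 0.6)
noise_sd[-1] = 3.0                                # hypothetical very noisy state
X = eta[:, None] + rng.standard_normal((n_sub, n_states)) * noise_sd

lam, R = one_factor_loadings(X)
w = np.linalg.solve(R, lam)                       # regression factor-score weights
Z = (X - X.mean(axis=0)) / X.std(axis=0)
latent = Z @ w                                    # latent FC estimate
avg = X.mean(axis=1)                              # average FC: equal weight per state

print(np.round(lam, 2))                           # last loading is much smaller
print(np.corrcoef(latent, eta)[0, 1], np.corrcoef(avg, eta)[0, 1])
```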

## DISCUSSION

Defining a map of task-independent, intrinsic functional connections in the brain is a major aim of basic research in cognitive neuroscience. Intrinsic FC persists across task states, making it a more reliable and generalizable measure of the underlying functional dynamics that shape cognition and behavior. As such, measures of intrinsic FC are better candidates to serve as stable biomarkers of important individual differences in behavioral outcomes (Elliott et al., 2019). We used a factor analytic approach, a well-developed technique from measurement psychometrics (Bollen, 2002), to define intrinsic FC as a latent variable derived from the common variance in FC across task states. We compared the factor model against the standard approach in the field, FC derived from resting state. The factor model not only shows enhanced measurement and predictive properties beyond measures of intrinsic FC derived from resting state, but also offers a unique theoretical perspective on the relationship between intrinsic and task-specific brain states. In a latent variable model, individual task states are viewed as observable sample realizations of the underlying intrinsic connectivity, and task-specific deviations from this baseline are modeled as unique errors arising from a combination of noise and state-specific properties. The factor modeling approach allows researchers not only to gain traction in defining intrinsic FC common among brain states, but also to separate and explore properties that are specific to individuals and states.

### Factor Analytic Model of Functional Connectivity

We began by building factor analysis models of latent FC using two approaches. In the first, we modeled latent FC using all available data. In this model, resting-state functional connections had the highest percentage of large (≥ 0.4) factor loadings of any condition. However, when controlling for the number of time points (by reducing the number of resting-state time points to equal the combined number of task time points), resting-state connections had the lowest percentage of large factor loadings. This property of the factor model highlights one of its strengths: higher precision measurements show higher fidelity to the underlying common latent factor than do lower precision measurements. Here, precision appears to be driven primarily by the amount of data. However, in the absence of stringent data quality control, the factor model can also down-weight poor-quality data (e.g., high motion, artifacts) relative to higher quality data when variability associated with noise does not replicate across task states.

Conversely, tasks that more closely represent underlying intrinsic FC will show stronger factor loadings, similar to how the Raven’s Progressive Matrices task loads highly onto the general intelligence factor (Dubois et al., 2018). Given its widespread use as a marker of intrinsic FC, we might have expected resting state to load highly onto the latent FC factor regardless of how much data went into its estimation. However, when controlling for the amount of data used to estimate FC, the resting-state loadings were lower than those of all other examined states, even though there were still many more TRs of resting state than of any single task state. Additionally, when the full amount of data was used to estimate rest FC, the factor loadings for resting state were similar to those for the story and math tasks, each of which was estimated with much less data (Figure 2; values in parentheses). These results suggest that resting state is not an especially good proxy for intrinsic FC, which aligns with its relatively poor performance compared with latent FC in predicting the patterns of connectivity and evoked brain activity observed for other states.

### Latent FC as a Reliable Measure of Intrinsic Connectivity

As mentioned previously, a marker of intrinsic connectivity is its persistence across task states (i.e., generalizability), as well as its ability to accurately recapitulate observed realizations of evoked brain activity and connectivity (Elliott et al., 2019; Elliott et al., 2020; Kragel, Han, Kraynak, Gianaros, & Wager, 2020; Parkes, Satterthwaite, & Bassett, 2020). Our results highlight the advantages of latent versus rest FC for reliably predicting independent connectivity and regional activations. When comparing patterns of connectivity, latent FC correlated more strongly than rest FC with held-out, task-specific connectivity states, with the sole exception of resting-state connectivity, where rest FC performed better. This pattern of results suggests that resting-state FC is less generalizable as a measure of intrinsic connectivity and that resting-state-specific factors shape the dynamics of rest FC that are not present in other states.

One potential explanation might be that tasks as a group reliably differ from rest FC’s more intrinsic profile, and the reduction in generalizability reflects deviations from a default state. Under this explanation, the latent FC advantage could simply reflect that there are more task indicators in the measurement model than rest (although note that even when controlling for the number of time points, the amount of rest data equals that of all the tasks combined), and we would predict that latent FC would be a poorer representation of rest FC patterns of connectivity. However, the correlation between latent and rest FC did not drop substantially compared with the correlations of latent FC with the various task FC patterns (blue bars, Figure 3). Indeed, it was when rest FC was used as the predictor that we observed reductions in its correlation with task connectivity, compared with resting state (red bars, Figure 3). This suggests that latent FC does a better job of representing common, stable variability in FC profiles across both resting and task states. Importantly, latent FC does so even though the task or rest condition being correlated is left out of the factor model for that specific comparison to avoid circularity. As such, the factor score is computed from a different set of indicators for each comparison, and it nevertheless still outperforms rest FC. Moreover, obtaining a better sample of the resting state by using the full time series resulted in the resting state having the highest factor loadings and the strongest correlation with latent FC, which suggests that, with more data, resting-state FC converges toward latent FC.

The advantages of latent FC are not, however, restricted to the connectivity space; the latent measure of intrinsic FC also outperforms rest FC in predicting state-specific activation patterns. Not only did latent FC support higher global prediction accuracy of task activation by the activity flow model (*r*_{latent} = 0.66 vs. *r*_{rest} = 0.56), it also showed condition-specific advantages in 23 out of 24 task conditions (Figure 4C). Rest FC, in comparison, displayed higher prediction accuracy in none of the task conditions (in the left-hand motor condition, latent and rest FC performed comparably; Figure 4C). When we examined predictions of region-specific patterns of activation, latent FC improved prediction over rest FC for 68% of all brain regions across a variety of distributed networks. In contrast, rest FC improved prediction for only 1.1% of regions, all of which were in the VIS2 network (and constituted only 7% of that network). As before, these improvements were not due to circularity in the analyses, as task predictions using latent FC relied on the leave-one-task-out approach in the factor model.

### Improving External Validity with Latent FC

While latent FC has demonstrable advantages for prediction within the brain, its utility for estimating brain-based biomarkers depends on its predictive validity for outcomes of interest. Here, connectivity values from latent FC predicted a metric of generalized intelligence (psychometric *g*) better than did rest FC connections. Although both rest FC and latent FC values significantly predicted individual differences in generalized intelligence, latent FC nearly doubled the percentage of explained variance in the outcome relative to rest FC (∼12% versus ∼7%). In measurement science, this is a hallmark advantage of the latent variable approach used in factor analysis. Methods that fail to account for measurement error tend to attenuate relationships between variables, whereas modeling state-specific error terms disattenuates those relationships (Schmidt & Hunter, 1996). Indeed, generalized intelligence is generally modeled with a factor analytic approach for precisely this reason. We demonstrate that the framework for improving the measurement properties of behavioral measures applies equally to measures derived from functional neuroimaging data. As such, factor analytic models are well suited to aiding the search for biomarkers across a wide range of individual difference outcomes. Furthermore, more reliable estimates of FC may aid modeling efforts that use intermediate network metrics (e.g., modularity, hub diversity) to predict participant behavior (e.g., Bertolero, Yeo, Bassett, & D’Esposito, 2018), and offer an exciting range of possible uses for latent FC in future work.
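To make the attenuation argument concrete, the following minimal Python simulation (our own illustration, not part of the reported analyses) assumes a hypothetical latent variable `eta`, an outcome correlated 0.5 with it, and indicators that equal `eta` plus independent unit-variance noise. A single noisy indicator shows an attenuated correlation with the outcome, averaging several indicators raises reliability and reduces the attenuation, and Spearman's classical correction for attenuation recovers approximately the true value. All parameter values are arbitrary.

```python
import numpy as np

def attenuation_demo(n=20000, n_indicators=8, error_sd=1.0, true_r=0.5, seed=0):
    """Simulate how measurement error attenuates a latent-outcome correlation.

    Hypothetical setup: each indicator = eta + independent noise (equal weights).
    """
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)  # error-free latent value per subject
    # Outcome correlates true_r with eta and has unit variance overall.
    y = true_r * eta + np.sqrt(1 - true_r**2) * rng.standard_normal(n)
    indicators = eta[:, None] + error_sd * rng.standard_normal((n, n_indicators))

    r_single = np.corrcoef(indicators[:, 0], y)[0, 1]
    r_composite = np.corrcoef(indicators.mean(axis=1), y)[0, 1]

    # Reliability of one indicator: var(eta) / (var(eta) + var(error)).
    rel_single = 1.0 / (1.0 + error_sd**2)
    # Spearman's correction for attenuation (outcome treated as error-free here).
    r_corrected = r_single / np.sqrt(rel_single)
    return r_single, r_composite, r_corrected
```

With these defaults, `r_single` is close to 0.5 × √0.5 ≈ 0.35, the eight-indicator composite is noticeably closer to 0.5, and the corrected value lands near 0.5.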

### State Aggregation Improves Predictive Performance

The performance of average FC suggests that aggregating information across states has advantages over longer resting-state scan sessions, regardless of the aggregation approach used. Interestingly, average FC does not perform uniformly relative to latent FC: it performs as well as or slightly better than latent FC in correlating with state-specific connectivity, but underperforms latent FC in predicting held-out activity in almost all regions. A few circumstances may predict when we would expect more or less pronounced differences between average and latent FC. First, data quality: We expect more pronounced differences for lower quality data and less pronounced differences for higher quality data. The HCP data used here are of extremely high quality, which reduces variability in noise between scans. This is reflected in the average factor loadings, which are relatively close in value across states (Figure 2). The closer in value the loadings are, the more similar average and latent FC will become (here, the connectivity values are correlated at *r* = 0.98). Second, the method of factor analysis used: Here, we opted to fit a single-factor model for each connection independently because of the large number of models required (i.e., a separate model for each connection). However, a single factor in isolation may not be the best fit for brain data (van Kesteren & Kievit, 2020), and the approach here might represent a floor on performance for latent FC relative to approaches that adopt a dependent model that tries to optimize the fit for each factor model.
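The convergence of average and latent FC under near-equal loadings can be checked analytically. The sketch below (an illustration under stated assumptions, not the authors' code) assumes standardized indicators, a one-factor model with known population loadings, and regression-method factor scores, and returns the population correlation between the unit-weighted average and the factor scores. The function name and example loadings are hypothetical.

```python
import numpy as np

def avg_vs_factor_score_corr(loadings):
    """Population correlation between a unit-weighted average of standardized
    indicators and regression-method factor scores under a one-factor model."""
    lam = np.asarray(loadings, dtype=float)
    psi = 1.0 - lam**2                          # unique variances
    sigma = np.outer(lam, lam) + np.diag(psi)   # model-implied correlation matrix
    w_factor = np.linalg.solve(sigma, lam)      # regression (Thurstone) weights
    w_avg = np.full(lam.size, 1.0 / lam.size)   # simple averaging weights
    cov = w_avg @ sigma @ w_factor
    var_avg = w_avg @ sigma @ w_avg
    var_factor = w_factor @ sigma @ w_factor
    return cov / np.sqrt(var_avg * var_factor)
```

With equal loadings the two sets of weights are proportional, so the correlation is exactly 1; with widely dispersed loadings such as `[0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.15, 0.1]` it falls to roughly 0.90.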

Finally, there appear to be differences depending on the type of dependent variable in question. For example, while the factor and average models converge in their correlation with connectivity for held-out states, we found that activity flow models that incorporated latent FC performed better. Average and factor models produced similar patterns of relative connectivity (i.e., highly correlated patterns of FC); however, the distributions of connectivity values differ. Latent FC estimates exhibited a sparser distribution of connectivity by zeroing out low and/or unstable connections, which may have improved the activity flow models by reducing the contributions of disconnected brain regions (see Figure S5 in the Supporting Information).

Despite the relatively small differences in performance between average and latent FC, there are theoretical reasons to prefer a latent variable perspective for FC estimation. The first, as mentioned before, is that while average FC must assume equal loadings, latent FC makes this a testable hypothesis. If loadings converge toward equal values, then average and latent FC will converge (as they nearly did here). This suggests that averaging will likely perform well under conditions similar to the HCP data (high-quality data from young adults). However, as the data diverge from this baseline, latent FC should have advantages by weighting each state's data according to how closely it reflects intrinsic functional states and contributes to the common variance across measures. If differences among measures increase (i.e., some measures reflect intrinsic FC better or worse than others), we would hypothesize that average and latent FC would diverge in their performance. We can see this in a small reproducible example (see supplemental code demonstrations in the Supporting Information), where more variable loadings reduce the ability of sum scores, but not factor scores, to predict a hypothetical outcome variable. Apart from these practical considerations, a latent variable model of FC is a good theoretical model for how state-specific functional connections emerge from underlying, intrinsic neural connectivity. Intrinsic connectivity is an unobserved state (Bollen, 2002) that gives rise to state-specific phenotypes based on combinations of common (i.e., the latent factor) and state-specific (i.e., the error) variance.
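The following Python sketch is in the spirit of that reproducible example but is our own illustration, not the supplemental code. It simulates a hypothetical latent variable, eight indicators with either equal or dispersed loadings, and an outcome driven by the latent variable, then compares how well unit-weighted sum scores and regression-method factor scores predict the outcome. For simplicity it uses the known population loadings rather than estimating them, so it isolates the effect of weighting rather than of estimation error.

```python
import numpy as np

def sum_vs_factor_scores(loadings, n=5000, outcome_r=0.5, seed=1):
    """Compare outcome prediction by sum scores vs. regression factor scores."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(loadings, dtype=float)
    psi = 1.0 - lam**2                      # unique (state-specific) variances
    eta = rng.standard_normal(n)            # hypothetical latent value per subject
    x = eta[:, None] * lam + rng.standard_normal((n, lam.size)) * np.sqrt(psi)
    y = outcome_r * eta + np.sqrt(1 - outcome_r**2) * rng.standard_normal(n)

    sum_scores = x.sum(axis=1)              # equal weights, as in averaging
    # Regression-method factor scores from the known population parameters.
    sigma = np.outer(lam, lam) + np.diag(psi)
    factor_scores = x @ np.linalg.solve(sigma, lam)

    r_sum = np.corrcoef(sum_scores, y)[0, 1]
    r_factor = np.corrcoef(factor_scores, y)[0, 1]
    return r_sum, r_factor
```

With dispersed loadings the factor scores correlate more strongly with the outcome than the sum scores do; with equal loadings the two scores are proportional and perform identically.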

### Conclusions

In summary, we used a factor analytic approach to derive intrinsic FC from multiple task and resting-state datasets. Our derived measure, termed latent FC, showed improved generalizability and reliability compared with a standard measure of resting-state FC. Not only did latent FC better reflect state-specific FC patterns across tasks, it also overwhelmingly improved predictions of regional activations when used in activity flow models. Finally, connectivity derived from latent FC nearly doubled the variance explained in an external measure of generalized intelligence (*g*) compared with connectivity from rest FC, highlighting its suitability for use in clinical and other individual difference research, where reliable biomarkers are needed. These results present compelling support for the use of factor analytic models in cognitive neuroscience, demonstrating the value of established tools from psychometrics for enhancing measurement quality in neuroscience.

## MATERIALS AND METHODS

For clarity, portions of the text in this section are from our prior publication using the same dataset and some identical analysis procedures: Ito et al. (2020).

### Participants

Data in the present study were collected as part of the Washington University–Minnesota Consortium of the Human Connectome Project (HCP; Van Essen et al., 2013). A subset of data (*n* = 352) from the HCP 1200 release was used for empirical analyses. Specific details and procedures of subject recruitment can be found in Van Essen et al. (2013). The subset of 352 participants was selected according to three criteria. First, any participant with a quality control flag was excluded, including flags for (a) focal anatomical anomalies found in T1w and/or T2w scans, (b) focal segmentation or surface errors, as output from the HCP structural pipeline, (c) data collected during periods of known problems with the head coil, and (d) data in which some of the FIX-ICA components were manually reclassified. Second, high-motion participants were excluded (participants who had any fMRI run in which more than 50% of TRs had greater than 0.25-mm framewise displacement). Third, participants were selected according to family relations: only unrelated participants were selected, and those with no genotype testing were excluded. A full list of the 352 participants used in this study will be included as part of the code release.

All participants were recruited from Washington University in St. Louis and the surrounding area. We split the 352 subjects into two cohorts of 176 subjects each: an exploratory cohort (99 women) and a validation cohort (84 women). The exploratory cohort had a mean age of 29 years (range = 22–36 years), and the validation cohort had a mean age of 28 years (range = 22–36 years). All subjects gave signed, informed consent in accordance with the protocol approved by the Washington University Institutional Review Board.

### Scan Acquisition

Whole-brain multiband echo-planar imaging data were acquired with a 32-channel head coil on a modified 3T Siemens Skyra scanner with TR = 720 ms, TE = 33.1 ms, flip angle = 52°, bandwidth = 2,290 Hz/Px, in-plane FOV = 208 × 180 mm, 72 slices, 2.0-mm isotropic voxels, and a multiband acceleration factor of 8. Data for each subject were collected over the span of two days. On the first day, anatomical scans were collected (including T1-weighted and T2-weighted images acquired at 0.7-mm isotropic voxels), followed by two resting-state fMRI scans (each lasting 14.4 min), and the session ended with a task fMRI component. The second day began with a diffusion imaging scan, followed by a second set of two resting-state fMRI scans (each lasting 14.4 min), and again ended with a task fMRI session.

Each of the seven tasks was collected over two consecutive fMRI runs. The seven tasks consisted of an emotion cognition task, a gambling reward task, a language task, a motor task, a relational reasoning task, a social cognition task, and a working memory task. Briefly, the emotion cognition task required making valence judgments on negative (fearful and angry) and neutral faces. The gambling reward task consisted of a card guessing game, in which subjects were asked to guess the number on the card to win or lose money. The language processing task consisted of two interleaved conditions: a story condition, which involved answering questions related to a story presented aurally, and a math condition, which involved answering basic arithmetic questions presented aurally. Note that we treated the two language task conditions as separate tasks, given the highly distinct nature of the conditions (other than that they were presented aurally). The motor task required subjects to tap their left/right fingers, squeeze their left/right toes, or move their tongue. The reasoning task required subjects to determine whether two sets of objects differed from each other in the same dimension (e.g., shape or texture). The social cognition task was a theory of mind task, in which objects (squares, circles, triangles) interacted with each other in a video clip, and subjects were subsequently asked whether the objects interacted in a social manner. Lastly, the working memory task was a variant of the *N*-back task. Further details on the resting-state fMRI portion can be found in Smith et al. (2013), and additional details on the task fMRI components can be found in Barch et al. (2013).

### Behavior: Data

To assess generalized intelligence (*g*), we drew 11 measures of cognitive ability from the HCP dataset, which are derived from the NIH Toolbox for Assessment of Neurological and Behavioral Function (https://www.nihtoolbox.org; Gershon et al., 2013) and the Penn Computerized Neurocognitive Battery (Gur et al., 2010). Tasks included picture sequence memory; dimensional card sort; flanker attention and inhibitory control; the Penn Progressive Matrices; oral reading recognition; picture vocabulary; pattern completion processing speed; variable short Penn line orientation test; Penn word memory test (number correct and median reaction time as separate variables); and list sorting. For all measures, the age-unadjusted score was used where applicable. For complete information regarding all measures, see the descriptions in the Cognition Category of the HCP Data Dictionary (https://wiki.humanconnectome.org/display/PublicData/HCP+Data+Dictionary+Public-+Updated+for+the+1200+Subject+Release).

### Behavior: Factor Analysis Model of Psychometric *g*

We then derived a general factor of intelligence using a multiple-indicator latent factor model. We specified a confirmatory factor analysis (CFA) model with a single factor underlying all individual cognitive tasks. Factor loadings were estimated using the *psych* R package (Revelle, 2017). Factor scores were computed using the regression method (Thurstone, 1935) to obtain manifest variables for prediction.
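For reference, the regression (Thurstone) method can be written as follows for a single-factor model, where $\mathbf{z}_i$ is participant $i$'s vector of standardized scores on the $k$ indicators, $\boldsymbol{\lambda}$ is the $k \times 1$ loading vector, $\mathbf{R}$ is the indicators' correlation matrix, and $\boldsymbol{\Psi}$ is the diagonal matrix of unique variances (this notation is ours, not the *psych* package's):

$$\hat{\eta}_i = \boldsymbol{\lambda}^{\top}\mathbf{R}^{-1}\mathbf{z}_i, \qquad \mathbf{R} = \boldsymbol{\lambda}\boldsymbol{\lambda}^{\top} + \boldsymbol{\Psi} \;\Rightarrow\; \hat{\eta}_i = \frac{\boldsymbol{\lambda}^{\top}\boldsymbol{\Psi}^{-1}\mathbf{z}_i}{1 + \boldsymbol{\lambda}^{\top}\boldsymbol{\Psi}^{-1}\boldsymbol{\lambda}}$$

The second form follows from the Sherman–Morrison identity when $\mathbf{R}$ equals the model-implied matrix. It shows that indicator $j$ is weighted in proportion to $\lambda_j / \psi_j$, so indicators with higher loadings and smaller unique variances contribute more; when all loadings are equal, the weights are equal and the factor score is proportional to the unweighted sum.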

### fMRI: Preprocessing

Minimally preprocessed data for both resting-state and task fMRI were obtained from the publicly available HCP data. Minimally preprocessed surface data were then parcellated into 360 brain regions using the Glasser atlas (Glasser et al., 2016). We performed additional preprocessing steps on the parcellated data for resting-state fMRI and task state fMRI to conduct neural variability and FC analyses. This included removing the first five frames of each run, de-meaning and de-trending the time series, and performing nuisance regression on the minimally preprocessed data (Ciric et al., 2017). Nuisance regression removed motion parameters and physiological noise. Specifically, six primary motion parameters were removed, along with their derivatives, and the quadratics of all regressors (24 motion regressors in total). Physiological noise was modeled using aCompCor on time series extracted from the white matter and ventricles (Behzadi, Restom, Liau, & Liu, 2007). For aCompCor, the first five principal components from the white matter and ventricles were extracted separately and included in the nuisance regression. In addition, we included the derivatives of each of those components, and the quadratics of all physiological noise regressors (40 physiological noise regressors in total). The nuisance regression model contained a total of 64 nuisance parameters. This was a variant of previously benchmarked nuisance regression models reported in Ciric et al. (2017).
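As a rough illustration of how the 64-regressor count arises (6 motion parameters and their 6 derivatives, plus the squares of all 12, gives 24; 10 aCompCor components from white matter and ventricles and their derivatives, plus the squares of all 20, gives 40), the sketch below builds such a design and regresses it out with NumPy. It is not the linked fmriNuisanceRegression code: the backward-difference derivative, the intercept and linear trend used for de-meaning and de-trending, and the function names are our assumptions, and dropping the first five frames is assumed to have happened beforehand.

```python
import numpy as np

def expand(regs):
    """Append backward-difference derivatives, then squares of all columns."""
    deriv = np.vstack([np.zeros((1, regs.shape[1])), np.diff(regs, axis=0)])
    base = np.hstack([regs, deriv])
    return np.hstack([base, base**2])

def nuisance_regress(data, motion, wm_pcs, vent_pcs):
    """data: (T, n_regions); motion: (T, 6); wm_pcs, vent_pcs: (T, 5) each.

    Returns z-normalized residuals and the 64-column nuisance matrix.
    """
    X = np.hstack([expand(motion), expand(np.hstack([wm_pcs, vent_pcs]))])
    # Intercept and linear trend implement de-meaning and de-trending.
    t = np.arange(data.shape[0], dtype=float)
    design = np.column_stack([np.ones_like(t), t, X])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    resid = data - design @ beta
    # z-normalize each region's residual time series (unit variance per run).
    resid = (resid - resid.mean(axis=0)) / resid.std(axis=0)
    return resid, X
```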

We excluded global signal regression (GSR), given that GSR can artificially induce negative correlations (Murphy, Birn, Handwerker, Jones, & Bandettini, 2009; Power et al., 2014), which could bias analyses of whether global correlations decrease during task performance. We included aCompCor as a preprocessing step given that aCompCor avoids the circularity of GSR (regressing out some global gray matter signal of interest) while retaining some of its benefits (some extracted components are highly similar to the global signal; Power et al., 2018). This logic is similar to that of a recently developed temporal-ICA-based artifact removal procedure that seeks to remove global artifacts without removing global neural signals, which contain behaviorally relevant information such as vigilance (Glasser et al., 2018; Wong, Olafsson, Tal, & Liu, 2013). We extended aCompCor to include the derivatives and quadratics of each of the component time series to further reduce artifacts. Python code (version 2.7.15) to perform this regression is publicly available online (https://github.com/ito-takuya/fmriNuisanceRegression). Following nuisance regression, the time series for each run (task state and rest state) were z-normalized so that variances across runs would be on the same scale (i.e., unit variance).

Task data for task FC analyses were additionally preprocessed using a standard general linear model (GLM) for fMRI analysis. For each task paradigm, we removed the mean evoked task-related activity for each task condition by fitting the task timing (block design) for each condition using a finite impulse response (FIR) model (Cole et al., 2019). (There were 24 task conditions across seven cognitive tasks.) We used an FIR model instead of a canonical hemodynamic response function given recent evidence suggesting that the FIR model reduces both false positives and false negatives in the identification of FC estimates (Cole et al., 2019). This is because of the FIR model’s ability to flexibly fit the mean evoked response across all blocks.

FIR-modeled task blocks were modeled separately for task conditions within each of the seven tasks. In particular, two conditions were fit for the emotion cognition task, where coefficients were fit to either the face condition or the shape condition. For the gambling reward task, one condition was fit to trials with the punishment condition, and the other condition was fit to trials with the reward condition. For the language task, one condition was fit to the story condition, and the other condition was fit to the math condition. For the motor task, six conditions were fit: (a) cue; (b) right-hand trials; (c) left-hand trials; (d) right-foot trials; (e) left-foot trials; and (f) tongue trials. For the relational reasoning task, one condition was fit to trials when the sets of objects were matched, and the other condition was fit to trials when the objects were not matched. For the social cognition task, one condition was fit to trials in which the objects were interacting socially (theory of mind), and the other condition was fit to trials in which objects were moving randomly. Lastly, for the working memory task, eight conditions were fit: (a) 2-back body trials; (b) 2-back face trials; (c) 2-back tool trials; (d) 2-back place trials; (e) 0-back body trials; (f) 0-back face trials; (g) 0-back tool trials; and (h) 0-back place trials. Since all tasks were block designs, each time point for each block was modeled separately for each task condition (i.e., FIR model), with a lag extending up to 25 TRs after task block offset.
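A minimal FIR design matrix of the kind described above might look like the following sketch. It assumes block onsets and durations given in TRs, one regressor per TR from block onset through 25 TRs after block offset, and regressors shared across blocks of the same condition; the function names and input format are hypothetical, and this is not the implementation from Cole et al. (2019).

```python
import numpy as np

def fir_design(n_tr, blocks_by_condition, lag=25):
    """Build an FIR design matrix.

    blocks_by_condition: dict of condition -> list of (onset, duration) in TRs.
    Each condition gets one regressor per post-onset TR, shared across blocks.
    """
    columns = []
    for blocks in blocks_by_condition.values():
        n_bins = max(duration for _, duration in blocks) + lag
        X = np.zeros((n_tr, n_bins))
        for onset, duration in blocks:
            for k in range(duration + lag):
                if onset + k < n_tr:
                    X[onset + k, k] = 1.0   # k-th TR after this block's onset
        columns.append(X)
    return np.hstack(columns)

def remove_evoked(data, design):
    """Regress the FIR-modeled mean evoked response out of (T, n_regions) data."""
    design = np.column_stack([np.ones(data.shape[0]), design])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta
```

Because every post-onset TR has its own free coefficient, the FIR model can absorb any consistent evoked shape, which is the flexibility the text credits for reducing false positives and negatives in task FC.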

### fMRI: Task Activation

We performed a standard task GLM analysis on fMRI task data to estimate evoked brain activity during task states. The task timing for each of the 24 task conditions was convolved with the SPM canonical hemodynamic response function to obtain task-evoked activity estimates (Friston et al., 1994). Coefficients were obtained for each parcel in the Glasser et al. (2016) cortical atlas for each of the 24 task conditions.
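The following sketch illustrates the activation GLM under explicit assumptions: the HRF is a TR-resolution approximation of SPM's canonical double-gamma function (gamma densities with shape parameters 6 and 16 and a 1:6 undershoot ratio), block timings are given in TRs, and ordinary least squares is fit per parcel with an intercept. SPM itself builds regressors at a finer microtime resolution, so coefficients would not match SPM exactly. Function names are hypothetical.

```python
import math
import numpy as np

TR = 0.72  # seconds, from the acquisition parameters

def canonical_hrf(tr=TR, duration=32.0):
    """Approximate SPM-style double-gamma HRF sampled at the TR."""
    t = np.arange(0.0, duration, tr)
    peak = t**5 * np.exp(-t) / math.gamma(6)           # gamma pdf, shape 6
    undershoot = t**15 * np.exp(-t) / math.gamma(16)   # gamma pdf, shape 16
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()

def task_glm(data, onsets_by_condition, tr=TR):
    """data: (T, n_parcels); onsets_by_condition: list (one per condition)
    of lists of (onset, duration) in TRs. Returns (n_conditions, n_parcels)."""
    n_tr = data.shape[0]
    hrf = canonical_hrf(tr)
    regressors = []
    for blocks in onsets_by_condition:
        boxcar = np.zeros(n_tr)
        for onset, duration in blocks:
            boxcar[onset:onset + duration] = 1.0
        regressors.append(np.convolve(boxcar, hrf)[:n_tr])
    design = np.column_stack([np.ones(n_tr)] + regressors)
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return beta[1:]  # drop the intercept
```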

### fMRI: Functional Connectivity Estimation

Residual time series from the rest and task nuisance regressions were used to estimate functional connectivity for each task and for rest. Connectivity values were estimated using zero-lag Pearson product-moment correlations. Time series were concatenated across separate runs of the same task to yield a single connectivity value per edge for a given task or resting-state condition. For each task scan, we used only TRs that corresponded to “on-task” time points. For instance, we extracted TRs from the working memory scan during *N*-back task blocks, excluding TRs from the interblock fixation periods. For the number of TRs included in the connectivity estimates for each condition and scan state, see Table S1 in the Supporting Information.
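A short sketch of this estimation step, assuming each state's residual runs arrive as NumPy arrays of shape (TRs, regions) with a boolean mask marking on-task TRs (all true for rest): runs are concatenated after masking, a zero-lag Pearson correlation matrix is computed, and the upper triangle is kept so each edge appears once. For the 360-region Glasser parcellation this yields 360 × 359 / 2 = 64,620 edges. Function and argument names are hypothetical.

```python
import numpy as np

def state_fc(runs, on_task_masks):
    """runs: list of (T_run, n_regions) residual time series for one state.
    on_task_masks: list of boolean (T_run,) arrays marking TRs to keep.
    Returns the upper-triangle edge vector of the Pearson correlation matrix."""
    ts = np.vstack([run[mask] for run, mask in zip(runs, on_task_masks)])
    fc = np.corrcoef(ts, rowvar=False)       # (n_regions, n_regions), zero lag
    iu = np.triu_indices(fc.shape[0], k=1)   # each edge once, no diagonal
    return fc[iu]
```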

### fMRI: Factor Analysis Model of Latent FC

Factor analysis for obtaining latent FC was conducted with the same approach used to obtain factor scores for generalized intelligence. FC estimates from each separate fMRI task were used as indicators on a unitary factor model, and factor scores were obtained using the regression method in the *psych* R package. A separate model was computed for each edge in the connectivity adjacency matrix. We took several approaches to test the predictive utility of latent FC for activation and behavior (detailed below).
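The per-edge model can be sketched in Python as below. This is an assumption-laden illustration rather than the *psych* implementation: loadings are estimated by iterated principal axis factoring (the *psych* default is minimum residual factoring, which usually gives a very similar one-factor solution), and factor scores use Thurstone's regression weights computed from the observed indicator correlation matrix, so scores are on a standardized scale. The input layout (subjects × states × edges) and all names are hypothetical.

```python
import numpy as np

def one_factor_scores(x, n_iter=100, tol=1e-6):
    """Fit a one-factor model by iterated principal axis factoring.

    x: (n_subjects, n_indicators), e.g., one edge's FC across states.
    Returns (loadings, regression-method factor scores).
    """
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    R = np.corrcoef(z, rowvar=False)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # start at squared multiple correlations
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)             # communalities on the diagonal
        vals, vecs = np.linalg.eigh(reduced)      # eigenvalues in ascending order
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        new_h2 = np.clip(lam**2, 0.0, 0.999)      # guard against Heywood cases
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    if lam.sum() < 0:                             # fix the eigenvector's arbitrary sign
        lam = -lam
    weights = np.linalg.solve(R, lam)             # Thurstone weights R^{-1} lambda
    return lam, z @ weights

def latent_fc(edge_fc):
    """edge_fc: (n_subjects, n_states, n_edges) -> (n_subjects, n_edges)."""
    n_sub, _, n_edges = edge_fc.shape
    out = np.empty((n_sub, n_edges))
    for e in range(n_edges):                      # a separate model for each edge
        _, out[:, e] = one_factor_scores(edge_fc[:, :, e])
    return out
```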

The first set of analyses tested two alternative measurement approaches for latent FC. The first used all available data from each functional scan to estimate factor scores for each edge. However, because the amount of scan time differed across functional runs (e.g., ∼58 min of resting-state vs. ∼10 min of working memory scans), we might expect indicators (i.e., scan types) with more data to dominate the measurement model in the factor analysis. To control for this potential confound, the second approach constrained the amount of data per indicator: each task indicator was estimated from the same number of TRs, and the rest indicator from as many TRs as all task indicators combined. The reasoning task had the fewest “on-task” TRs (264) and therefore served as the limiting factor for task scan data. As such, 264 TRs of each task (2,112 TRs of task in total) and a corresponding 2,112 TRs of rest were used in these analyses. All of these analyses modeled all available scan types in the same factor model.
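The TR-matching constraint can be sketched as follows, assuming each state's on-task time series has already been concatenated into a (TRs, regions) array. Keeping the first 264 TRs is our assumption; the text does not state which TRs were retained. The rest allocation equals the combined task allocation (2,112 / 264 = 8 task indicators). Function names are hypothetical.

```python
import numpy as np

N_TR = 264  # fewest on-task TRs across tasks (reasoning task), per the text

def equalize_trs(state_timeseries, n_tr=N_TR):
    """Truncate each task's concatenated on-task time series to n_tr TRs.

    Taking the *first* n_tr TRs is an illustrative assumption.
    """
    out = {}
    for state, ts in state_timeseries.items():
        if ts.shape[0] < n_tr:
            raise ValueError(f"{state} has only {ts.shape[0]} TRs (< {n_tr})")
        out[state] = ts[:n_tr]
    return out

def equalize_design(task_ts, rest_ts, n_tr=N_TR):
    """Match each task to n_tr TRs and rest to the combined task total."""
    tasks = equalize_trs(task_ts, n_tr)
    n_rest = n_tr * len(tasks)   # 264 x 8 task indicators = 2,112 TRs
    if rest_ts.shape[0] < n_rest:
        raise ValueError(f"rest has only {rest_ts.shape[0]} TRs (< {n_rest})")
    return tasks, rest_ts[:n_rest]
```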

For activity flow mapping (ActFlow) analyses (Cole et al., 2016; Ito et al., 2020), where activations in held-out regions were predicted using estimated activity flowing over estimated connections, latent FC was estimated independently for each connection by applying leave-one-state-out factor analysis (LOSO-FA) on the state FC estimates to prevent circularity in the predictive model. For instance, when predicting activation in the emotion task, FC estimates were obtained without including the emotion task as an indicator in the factor model. In all ActFlow analyses, we estimated predictions per subject and then pooled results (i.e., an estimate-then-average approach).
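The core activity flow computation and the leave-one-state-out step can be sketched as below (an illustration, not the API of any ActFlow toolbox). The prediction for each target region is the sum of all other regions' activations weighted by their connectivity to the target. The FC matrix passed in is assumed to come from a factor model that excluded the held-out state, which `leave_one_state_out` illustrates by deleting that state from a hypothetical subjects × states × edges array before fitting. All names are hypothetical.

```python
import numpy as np

def actflow_predict(activation, fc):
    """Activity flow prediction for one task condition.

    activation: (n_regions,) observed task activation (e.g., GLM betas).
    fc: (n_regions, n_regions) connectivity estimated WITHOUT the held-out state.
    Predicted activity in region j = sum over i != j of activation_i * fc[j, i].
    """
    fc = np.array(fc, dtype=float)   # copy so the caller's matrix is unchanged
    np.fill_diagonal(fc, 0.0)        # exclude each target region's own activity
    return fc @ activation

def actflow_accuracy(activation, fc):
    """Pearson r between predicted and observed activation across regions."""
    predicted = actflow_predict(activation, fc)
    return np.corrcoef(predicted, activation)[0, 1]

def leave_one_state_out(edge_fc, held_out):
    """Drop the held-out state's indicator before fitting the factor model.
    edge_fc: (n_subjects, n_states, n_edges)."""
    return np.delete(edge_fc, held_out, axis=1)
```

Following the estimate-then-average approach, `actflow_accuracy` would be computed separately for each subject and the resulting accuracies averaged across subjects.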

### Meta-Analysis Across Samples

To combine the results of the *g*-prediction analysis across the validation and exploratory samples, we computed r-to-z-score transformations of the individual coefficients and then combined them into a weighted z-score using the standard formula (Field, 2001):

$$\bar{z} = \frac{\sum_{i} w_i z_i}{\sum_{i} w_i}$$

where $z_i$ is the z-score for sample $i$ and the weight $w_i$ corresponds to that sample's size.
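A minimal sketch of this pooling step, assuming one prediction correlation per sample and the sample size as the weight. `np.arctanh` implements Fisher's r-to-z transformation, z = ½ ln((1 + r) / (1 − r)), and `np.tanh` converts the pooled value back to a correlation. Many fixed-effects meta-analyses (e.g., Hedges and Olkin's method) weight by n − 3 instead; with two equal samples of 176, the choice of weight does not change the result. The function name is hypothetical.

```python
import numpy as np

def combine_correlations(rs, ns):
    """Fisher-transform each r, take the sample-size-weighted mean z,
    and back-transform the pooled z to a correlation."""
    z = np.arctanh(np.asarray(rs, dtype=float))
    w = np.asarray(ns, dtype=float)
    z_bar = np.sum(w * z) / np.sum(w)
    return z_bar, np.tanh(z_bar)
```

For example, `combine_correlations([0.3, 0.3], [176, 176])` pools to a correlation of 0.3, while unequal correlations pool to a value between them.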

## ACKNOWLEDGMENTS

Data were provided by the Human Connectome Project, WU-Minn Consortium (principal investigators: D. Van Essen and K. Ugurbil; 1U54MH091657), funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. The authors acknowledge the Office of Advanced Research Computing (OARC) at Rutgers, the State University of New Jersey, for providing access to the Amarel cluster and associated research computing resources that have contributed to the results reported here. The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.

## SUPPORTING INFORMATION

Supporting information for this article is available at https://doi.org/10.1162/netn_a_00234.

## AUTHOR CONTRIBUTIONS

Ethan M. McCormick: Formal analysis; Methodology; Visualization; Writing – original draft; Writing – review & editing. Katelyn L. Arnemann: Conceptualization; Formal analysis; Methodology; Software; Visualization; Writing – original draft; Writing – review & editing. Takuya Ito: Conceptualization; Methodology; Resources; Writing – review & editing. Stephen José Hanson: Conceptualization; Methodology; Writing – review & editing. Michael W. Cole: Conceptualization; Funding acquisition; Supervision; Writing – review & editing.

## FUNDING INFORMATION

Michael W. Cole, National Institute on Aging (https://dx.doi.org/10.13039/100000049), Award ID: R01 AG055556. Michael W. Cole, National Institute of Mental Health (https://dx.doi.org/10.13039/100000025), Award ID: R01 MH109520. David C. Van Essen and Kamil Ugurbil, National Institute of Mental Health (https://dx.doi.org/10.13039/100000025), Award ID: 1U54MH091657.

## TECHNICAL TERMS

- Intrinsic connectivity:
The part of FC between a given set of brain regions that exists independently of current task demands.

- Functional connectivity (FC):
A measure of the degree of relationship between the functional time courses of two regions; here, measured with functional MRI.

- Factor analysis:
A statistical framework for estimating the properties of latent variables from a set of correlated observed variables.

- Latent variable:
A variable (often denoted *η*) that is not directly observed but inferred from patterns of covariation in the data.

- Factor loading:
A regression-based estimate of the relationship between latent and observed variables (often denoted *λ*).

- Positive manifold:
A positive manifold implies that all variables positively correlate with one another.

- Psychometric *g*:
A measure of generalized intelligence underlying performance on a set of diverse cognitive tasks.

- Bonferroni correction:
Correction for multiple comparisons in which the corrected *p*-value threshold is the original threshold divided by the number of tests performed.

- Activity flow mapping:
Computational model where activity in a target region reflects activity in other regions weighted by their FC with the target.

- Ridge regularization:
A penalized method for estimating a linear regression model, useful when the number of predictors exceeds the number of subjects.

- Meta-analysis:
A statistical analysis used to pool results across multiple samples.

## REFERENCES

## Author notes

Competing Interests: The authors have declared that no competing interests exist.

Denotes equal authorship.

Handling Editor: Shella Keilholz