Inferring the heritability of large-scale functional networks with a multivariate ACE modeling approach

Recent evidence suggests that the human functional connectome is stable at different timescales and is unique. These characteristics posit the functional connectome not only as an individual marker but also as a powerful discriminatory measure characterized by high intersubject variability. Among distinct sources of intersubject variability, the long-term sources include functional patterns that emerge from genetic factors. Here, we sought to investigate the contribution of additive genetic factors to the variability of functional networks by determining the heritability of the connectivity strength in a multivariate fashion. First, we reproduced and extended the connectome fingerprinting analysis to the identification of twin pairs. Then, we estimated the heritability of functional networks by a multivariate ACE modeling approach with bootstrapping. Twin pairs were identified above chance level using connectome fingerprinting, with monozygotic twin identification accuracy equal to 57.2% on average for whole-brain connectome. Additionally, we found that a visual (0.37), the medial frontal (0.31), and the motor (0.30) functional networks were the most influenced by additive genetic factors. Our findings suggest that genetic factors not only partially determine intersubject variability of the functional connectome, such that twins can be identified using connectome fingerprinting, but also differentially influence connectivity strength in large-scale functional networks.


Introduction 43
In the past few years, fMRI research has been living a paradigm shift, moving from 44 population inferences to the study of individual differences (Dubois & Adolphs, 2016;45 Seghier & Price, 2018). Previous studies have paved the way for the study of individual 46 variability in functional connectivity patterns of the human brain (Finn et al., 2015;47 Miranda-Dominguez et al., 2014; Mueller et al., 2013). In this context, resting-state fMRI 48 (rs-fMRI) showed to be particularly powerful in determining underlying differences in 49 the wiring patterns of functional connectome (FC) profiles. Indeed, connectome-based 50 individual predictions achieved identification accuracies as high as 99% when comparing 51 functional connectivity matrices (Finn et al., 2015). Hence, the endeavor to identify and 52 to characterize the individual functional connectivity architecture has been shown to have 53 an imperative place in the study of individual differences. 54 Our prediction was based on the selection of the highest correlation score (excluding the 168 correlation scores between functional connectivity matrices of the same individual) for 169 each 'target' participant vs. 'database' iteration. The mean whole-brain based prediction 170 accuracy was 57.2% (SD = 2.6%). This result indicates that the idiosyncratic FC profiles 171 might be genetically determined and they are sufficiently stable so one could identify 172 monozygotic twins well above chance. Indeed, we have performed a permutation test, by 173 exchanging twin pairs' identities 1,000 times, such that for each identification iteration, 174 a new twin pair identity was assigned. The maximum identification accuracy found 175 through these 1,000 permutations was 1.6%, indicating that the whole-brain based 176 identification performance is significantly different from the chance level (p-value < 177 0.001). 178 Later on, we investigated the ability of specific functional networks in discriminating a 179 twin pair from pairs of unrelated individuals ( Figure 1B). At this stage, the most 180 successful functional networks were the subcortical-cerebellum (28.6 ± 1.5%) and medial 9 frontal (21.1 ± 2.2%) networks. Noteworthy, the most successful functional networks on 182 twin identification were amongst the ones that best performed on individual 183 identifications. Nonetheless, a substantial decrease in the successful twin identification 184 rates was observed for functional networks when compared to the whole-brain 185 connectome, and these results were particularly affected by the number of nodes within 186 each network. The least successful functional networks on twin identification were the 187 ones with the least number of nodes, while the networks with a larger number of nodes 188 tended to present higher accuracies. The Pearson's correlation score between the number 189 of nodes of each network and its ability to correctly identify monozygotic twins was r = 190 0.95 (p-value = 6.3E-5; Supplementary Table 2), as opposed to a nonsignificant 191 correlation between the number of nodes and individual identification accuracy (r = 0.52, 192 p-value = 0.15). This implies that the ability of a priori defined functional networks to 193 capture similarities in the FC profiles of monozygotic twins differentially relies on the 194 amount of information provided (i.e., by the number of nodes). 195 Finally, we performed all the previous analyses for the identification of dizygotic twins. 196 At this time, we selected only the dizygotic individuals (n=134) within the 'target' set, 197 giving 134x380=50,920 comparisons. For the whole-brain based identification, the mean 198 prediction accuracy was 8.9% (SD = 2.3%; p-value <0.001). This abrupt change in twin 199 identification accuracy indicates that the functional connectivity patterns of monozygotic 200 twins are strictly more similar in comparison to dizygotic twins, which indicates the 201 relevance of shared genetic background. At the level of individual functional networks, 202 identification accuracies dropped even further ( Figure 1B), and they were also correlated 203 with the number of nodes of the networks (r = 0.92, p-value < 0.001). 204

Fingerprinting as a function of the number of edges 205
The previous results indicated that twin identification accuracy was correlated with the 206 number of nodes of functional networks, and hence with the number of edges. To further 207 investigate the relationship between the number of edges in connectome fingerprinting 208 and twin identification accuracy, we performed identification analyses using randomly 209 selected subsets of edges, with 100 random selections per subset size (Byrge & Kennedy,210 2018). Our results show that it is possible to identify an individual with high accuracy 211 using a random subset of edges ( Figure 2), with accuracy above 80% using only 500 212 random edges (a similar finding is reported at Byrge & Kennedy, 2018). However, 213 monozygotic twin identification only reaches near 50% accuracy using 10,000 random 214 edges, while dizygotic twin identification accuracy is equal to 8% on average with the 215 same subset size. Noteworthy, monozygotic twin identification accuracy with 500 216 random edges was on average equal to approximately 20%, similar to the prediction 217 accuracy using the medial frontal network (29 nodes and 406 unique edges). On the other 218 hand, prediction accuracy reached 32% with 1,000 random edges and 46% with 5,000 219 random edges. At a similar level, the prediction accuracy of the subcortical-cerebellum 220 network (90 nodes and 4,005 unique edges) was 28.6%. 221 Therefore, our findings suggest that while it is possible to identify twin pairs above 222 chance, differences seen across functional networks in twin pair identification may be 223 mostly driven by differences in the number of nodes/edges. However, the fact that twin 224 identification accuracy with subsets of random edges could outperform functional 225 networks with a similar amount of edges suggests that edges might be differently 226

248
As one could expect, the mean of the distributions of correlation scores from the SI group 249 is notably higher than the ones from the remaining groups. This is observed not only for 250 the whole-brain connectome but also for most of the functional networks, especially for 251 the medial frontal and frontoparietal functional networks. In order to characterize the 252 13 importance of the distance between these distributions -that is, the effect size -to 253 identification analyses, we determined identification accuracy as a function of effect size, 254 Cliff's delta (Cliff, 1993) ( Table  255 3). In Figure 4, we observe that high prediction accuracy is associated with high effect 256 size, while low prediction accuracy was associated with low effect size. This suggests 257 that high intersubject variability (which is related to low correlation between unrelated 258 individuals' connectivity matrices) and low intrasubject variability (high correlation 259 between the connectivity matrices of the same individual in different sessions) are crucial 260 for high prediction accuracy. Additionally, the higher similarity between monozygotic 261 twins in comparison to unrelated individuals (medium to high effect sizes) suggests that 262 a portion of this intersubject variability is heritable and differs across functional networks.

Narrow-sense heritability of functional connections 269
To further investigate these functional networks, we performed heritability analyses using 270 a multivariate ACE modeling approach with bootstrapping. High dimensionality is a 271 14 common hurdle when multivariate processing is considered for regression or inference 272 methods. Hence, univariate analyses are usually preferred to avoid the necessity of 273 increasing computational resources and time due to high dimensional multivariate 274 analyses trade-off, despite the fact that multivariate analyses tend to be more suitable for 275 complex data that includes several thousand of covariates. In neuroscience, the 276 heritability of functional networks is usually determined as the average heritability of shared environment (C) and external sources of variability (E). Then, we determined the 295 mean of A, C and E components across edges. This procedure was repeated with 296 reposition for 8,000 times for the whole-brain network and 1,000 times for each functional 297 network, which resulted in the final distributions of means for each component (A, C and 298 E) ( Figure 5B). Finally, null distributions were similarly obtained by randomly shuffling 299 monozygotic and dizygotic twin statuses at each iteration ( Figure 5C). 300 The heritability distributions with their respective null distributions for all functional 301 networks are illustrated in Figure 6A (Supplementary Figure 3). As expected, the mean 302 heritability of all null distributions was virtually equal to zero. Apart from that, all 303 heritability estimates distributions were significantly different from their respective null 304 distributions (independent t-test, p<.001). Among all functional networks, the visual II 305 has shown to be the most heritable with mean heritability of 0.37 (37% of the variance of 306 the phenotype is attributed to additive shared genetics; Supplementary Table 4), while the 307 subcortical-cerebellum was the least heritable with mean heritability of 0.20 (Figure 6B  308   and Supplementary Table 4, 5 and 6). Additionally, we compared the mean heritability 309 found for all functional networks using our approach with the mean estimates based on 310 univariate models ( Figure 6C). As expected, the mean heritability found using our 311 approach is nearly equal to the classic univariate heritability (Supplementary Table 7 Here, we found that the functional connectivity profiles of twin pairs were more similar 324 than of unrelated individuals, although the degree of similarity varied across functional 325 networks. Indeed, we demonstrated that functional networks have distinct discriminatory 326 power in connectome fingerprinting analyses, in both individual and twin identifications, 327 although in the latter differences in identification performances may be mostly driven by 328 differences in the number of nodes/edges. We also found that high intersubject variability 329 (i.e. variability of a trait between individuals) is crucial for connectome fingerprinting. 330 Finally, our multivariate ACE modeling approach suggests that the heritability of 331 functional networks are consistent throughout the brain, although our findings suggest 332 that functional networks are differentially influenced by additive genetic factors. 333 Altogether, we were able to establish the influence of genetic factors to intersubject 334 variability of functional networks by leveraging a multivariate ACE model in addition to 335 the multivariate connectome fingerprinting approach. 336

Intra and intersubject variability trade-off in connectome fingerprinting 337
Evidence suggests that the different levels of inter and intrasubject variability in 338 functional networks contribute to their distinctiveness, such that high intersubject 339

Genetic influence on functional networks 358
To investigate the impact of additive genetic factors in determining stable patterns of 359 intersubject variability, we performed an alternative approach to the univariate ACE 360 model. In our multivariate ACE model, a fixed number of edges were randomly and 361 iteratively selected to fit the model, and the mean heritability estimate was determined by 362 averaging individual edges heritability at each of those iterations. Therefore, 8,000 363 models were fitted to estimate the heritability of the whole-brain network, as opposed to 364 fitting 35,778 univariate models. In addition to that, 1,000 models were generated for 365 each functional network, totaling 16,000 models (8,000 models for the whole-brain 366 network + 8 * 1,000 models), which is still far less than fitting 35,778 univariate models. 367 We also observed a gain in statistical power with our approach (this is illustrated by the 368 19 narrower confidence intervals of the multivariate model -Supplementary Table 4 -as  369   opposed to the univariate version -Supplementary Table 7). Additionally, our modeling 370 approach provides a straightforward way for building null distributions by randomly 371 shuffling twin statuses at each iteration as the final step before heritability estimation. 372 Therefore, we believe that the contribution of this method is twofold: it reduces the 373 number of models to be fitted for the estimation of the heritability of functional networks 374 and it also provides a straightforward way for building null distributions. 375 We found that the functional networks that were the most influenced by additive genetic 376 factors were not the ones that best performed on twin identifications. This is particularly 377 prominent for the visual II and subcortical-cerebellum functional networks. The first has 378 shown to be highly influenced by additive genetic factors, but it had a poor performance 379 on monozygotic twin identification and individual identification. This indicates that the 380 intersubject variability was low, thus being difficult to discriminate between pairs of 381 connectomes from UN/Twin/SI groups. However, a great portion of this low intersubject 382 variability might be due to additive genetic factors. On the other hand, the subcortical-383 cerebellum network has shown lower heritability but the best performance on twin 384 identification (after whole-brain network). A possible explanation for this finding is that 385 a high intersubject variability allowed a better discrimination between unrelated 386 individuals versus twin pairs, even though a smaller portion of its intersubject variability 387 was due to additive genetic factors. Nonetheless, our findings also suggest that twin 388 identification accuracy of functional networks varies with the number of edges, indicating 389 that the inconsistency seen between twin identification accuracy and heritability is At the network-level, higher-order associative networks were particularly better at 413 discriminations. This result further supports that associative networks accommodate 414 higher intersubject variability in comparison to sensorimotor networks (Gratton et al., 415 2018). Despite that, we observed that the default mode network (DMN) defined by both 416 parcellation schemas differed in performance during identification analyses. For 417 "Gordon" parcels, the DMN figured among the most distinctive networks, similarly to 418 other associative networks. However, this pattern was not observed using " Shen" parcels,419 in which the defined DMN figured among the worst functional networks on individual 420 predictions. This distinction could be due to the different number of nodes attributed to 421 DMN in both schemas. Another finding is that the heritability level of functional networks 422 differed between parcellations, although the mean heritability of the whole-brain 423 functional network was 0.18 using "Gordon" parcels and 0.24 using "Shen" parcels 424 (Supplementary Table 4). This suggests that different brain areas definition greatly 425 impact on heritability estimates, which is a potential topic for further investigation. 426 Using "Gordon" parcellation, we found that the cingulo parietal and retrosplenial 427 temporal networks were the most influenced by additive genetic factors, whilst the 428 somato-sensory mouth and salience networks were the least ones. On the other hand, 429 Miranda-Dominguez et al. (2018) found that the retrosplenial temporal and somato-430 sensory mouth were the most heritable, and the visual and salience networks the least 431 heritable. Additionally, their heritability estimates ranged from 0.11 to 0.14, with the 432 heritability of the whole-brain network being equal to 0.20 (Miranda-Dominguez et al., 433 2018); while our estimates ranged from 0.47 to 0.12. These differences are likely due to 434 differences in heritability estimation approaches; whilst we used the conventional ACE 435 modeling approach, they used three-way repeated-measures ANOVAs. Although the 436 heritability estimates we obtained using "Shen" parcels were more homogeneous, we 437 were still able to capture the different levels of heritability of functional networks, 438 suggesting that our approach is suitable for capturing such differences. Therefore, although we found differences in how additive genetic factors may be 509 influencing intersubject variability of functional networks, such estimates are not definite. 510 Critically, different models' assumptions may potently lead to inconsistent findings of 511 heritability estimates for large-scale functional networks, and future refinements of such 512 estimates (using meta-analysis, for instance) should consider them. 513

Database and participant information 515
In this study, we used the dataset from the "1200 subjects data release" of the Human 516 From this subsample, we excluded the participants who did not have both resting-state 520 fMRI sessions (ICA-FIX versions) available, and who did not have the twin within the 521 group. Therefore, our final sample size was n=380.

Data acquisition 527
The acquisition protocol has been previously described (Van Essen et al., 2013). In 528 summary, functional and structural data were acquired in a 3T Siemens Skyra scanner 529 using a 32-channel head coil. Resting-state data were collected in two separated sessions 530 (REST1 and REST2) in different days, each session containing two runs of 15 minutes.  Table 1), in addition to 47 ROIs not assigned to any specific network. 557 The latter defines 268 ROIs clustered in 8 networks (Supplementary Table 1). 558

Equation 1 576
where e is the number of edges. In order to predict the identity of the target participant, 577 the maximal Pearson's correlation coefficient was selected ( Figure 1A). Additionally, we 578 also investigated the contribution of single networks to identification accuracy by sub-579 sectioning the functional connectivity matrices into sub-matrices of single networks. To 580 perform this, we selected only connection within a specified network. Then, we calculated 581 the Pearson's correlation coefficients, similarly to the previous approach. Results are 582 reported as mean ± SD. 583

Twin identification 584
The twin pair identification algorithm was based on the previous individual identification 585 analysis. At this stage, we removed the correlations corresponding to the same individual 586 in different sessions, that is the diagonal of individuals × individuals matrices, and then 587 performed a new set of identification analyses. In this condition, if the chosen maximum 588 correlation value belonged to the target subject's twin, the prediction was considered 589 correct. Monozygotic and dizygotic twins were analyzed separately, and all conditions 590 (RESTX × RESTY, where X and Y ∈ {1, 2}) were tested. Results are reported as mean 591 ± SD. 592

Statistical significance assessment 593
To assess the statistical significance of twin identification analyses, we performed a 594 permutation testing. To ensure the independence of the dataset, we permuted the twin 595 pairs' identities, such that for each row of the 'individuals vs. individuals' matrix ( Figure  596 29 1A) a new twin pair identity was assigned. The permutation process was repeated 1,000 597 times for each functional network. 598

Effect size 599
The distribution of correlation scores between pairs of connectivity matrices (i.e. 600 correlation among the vectorized form of the connectivity matrices) was determined by 601 grouping these scores based on familial relationship: 1) same individual -SI; 2) 602 monozygotic twins -MZ; 3) dizygotic twins -DZ and 4) unrelated individuals -UN. 603 Following that, the effect size of the differences between the distributions of correlation 604 values was measured through the calculation of Cliff's delta. This a non-parametric effect 605 size measure based on all pairwise differences (Cliff, 1993), which gives how often values 606 from one distribution are larger than the ones from a second distribution (Equation 2). 607 608 = :;< = * >= 2 .:;<(= * ?= 2 ) @ * @ 2 Equation 2 609 610 Therefore, the number of times that values from one group are higher than the ones from 611 a second group is calculated for all possible combinations of values between the two 612 groups (n 1 n 2 , where n 1 and n 2 are the number of values within the distribution 1 and 2, 613 respectively). The final Cliff's delta value is the difference between the previous 614 calculations divided by all possible combinations. Thus, a positive and high value of d 615 (d maximum = 1) mean that values within distribution 1 are mostly higher than the ones within 616 distribution 2, a negative and high absolute value of d (d minimum = -1) means the other way 617 round, that values within distribution 1 are mostly lower than the ones within distribution 618 2, and d = 0 means that the distribution 1 and 2 are equal. 619