Abstract
Functional and effective networks inferred from time series are at the core of network neuroscience. Interpreting properties of these networks requires inferred network models to reflect key underlying structural features. However, even a few spurious links can severely distort network measures, posing a challenge for functional connectomes. We study the extent to which micro- and macroscopic properties of underlying networks can be inferred by algorithms based on mutual information and bivariate/multivariate transfer entropy. The validation is performed on two macaque connectomes and on synthetic networks with various topologies (regular lattice, small-world, random, scale-free, modular). Simulations are based on a neural mass model and on autoregressive dynamics (employing Gaussian estimators for direct comparison to functional connectivity and Granger causality). We find that multivariate transfer entropy captures key properties of all network structures for longer time series. Bivariate methods can achieve higher recall (sensitivity) for shorter time series but are unable to control false positives (lower specificity) as available data increases. This leads to overestimated clustering, small-world, and rich-club coefficients, underestimated shortest path lengths and hub centrality, and fattened degree distribution tails. Caution should therefore be used when interpreting network properties of functional connectomes obtained via correlation or pairwise statistical dependence measures, rather than more holistic (yet data-hungry) multivariate models.
Author Summary
We compare bivariate and multivariate methods for inferring networks from time series, which are generated using a neural mass model and autoregressive dynamics. We assess their ability to reproduce key properties of the underlying structural network. Validation is performed on two macaque connectomes and on synthetic networks with various topologies (regular lattice, small-world, random, scale-free, modular). Even a few spurious links can severely bias key network properties. Multivariate transfer entropy performs best on all topologies for longer time series.
INTRODUCTION
Functional and effective network inference in neuroscience typically involves preprocessing the data, defining the parcellation, extracting the time series, inferring the links in the model network, and measuring network properties, for example, to compare patients and controls or to predict phenotype (Bassett & Sporns, 2017; Fornito, Zalesky, & Bullmore, 2016). Each step in the pipeline requires making modeling and analysis choices, whose influence on the final results is the subject of ongoing research (Aquino, Fulcher, Parkes, Sabaroedin, & Fornito, 2020; Cliff, Novelli, Fulcher, Shine, & Lizier, 2020; Zalesky et al., 2016). As part of this effort, we study how the choice of different inference algorithms affects the properties of the resulting network model in comparison to the underlying structural network, and whether the ability to accurately reflect these properties changes across different underlying structural networks.
The structure (or topology) of a network can be described at multiple scales (Bassett & Sporns, 2017): from the microscopic (individual links), to the mesoscopic (modules and motifs) and the macroscopic (summary statistics, such as average shortest path length and measures of small-worldness; Rubinov & Sporns, 2010). At each scale, the structure is associated with development, aging, cognition, and neuropsychiatric diseases (Xia et al., 2020). Previous studies have assessed the performance of different network inference algorithms in identifying the structural links at the microscale (Kim, Rogers, Sun, & Bollt, 2016; Novelli, Wollstadt, Mediano, Wibral, & Lizier, 2019; Razi, Kahan, Rees, & Friston, 2015; Runge, Nowack, Kretschmer, Flaxman, & Sejdinovic, 2018; Sun, Taylor, & Bollt, 2015). The goal of this work is to extend the assessment to all scales and to a variety of topologies, across a range of related network inference algorithms. We link the performance at the microscale to the resulting network properties at the macroscale, and describe how this changes as a function of the overall topology.
We compare bivariate and multivariate approaches for inferring network models, employing statistical dependence measures based on information theory (Shannon, 1948). These approaches include functional network inference, which produces models of networks of pairwise or bivariate statistical relationships between nodes, and can either quantify undirected statistical dependence, in the case of mutual information (MI; Cover & Thomas, 2005), or directed dependence, in the case of transfer entropy (TE; Bossomaier, Barnett, Harré, & Lizier, 2016; Schreiber, 2000). These approaches also include effective network inference, which is intended to produce the simplest possible circuit models that explain the observed responses (Aertsen, Gerstein, Habib, & Palm, 1989). In this class, we evaluate the use of multivariate TE, which, in contrast to the bivariate approaches, aims to minimize spurious links and infer minimal models of the parent sets for each target node in the network.
All of these inference techniques seek to infer a network model of the relationships between the nodes in a system. Different methods capture different aspects of these relationships and do not necessarily seek to replicate the underlying structural topology, nor do we expect them to in general (particularly in neuroimaging experiments, where aspects of the structure may be expressed more or less or not at all, depending on the cognitive task). In spite of that, in this paper we do seek to evaluate and indeed validate these methods in inferring microscopic, mesoscopic, and macroscopic features of the underlying network structure. Crucially, we perform this validation under idealized conditions—including full observability, stationarity, no subsampling, and so forth—that allow us to establish a hypothesis that effective networks should not just be complementary to the structural network but converge to it under these conditions, as our available data increase. Indeed, under these idealized conditions (specifically in the absence of hidden nodes, and other simplifying assumptions, including stationarity), effective networks inferred via multivariate TE are proven to converge to the underlying structure for sufficiently long time series (Runge, 2018; Sun et al., 2015). In gaining an understanding of these multivariate effective connectivity inference algorithms, it is important to validate that they perform to that expectation where it is applicable, and investigate how that performance varies with respect to factors such as sample size. In doing so, we also address the recent call for more extensive model diversity in testing multivariate algorithms: “To avoid biased conclusions, a large number of different randomly selected connectivity structures should be tested [including link density as well as properties such as small-worldness]” (Runge, 2018, p. 13).
Outside of these idealized circumstances though, we can no longer make a clear general hypothesis on how the effective network models are expected to reflect the underlying structure, yet a successful validation gives confidence that the directed statistical relationships they represent remain accurate as an effective network model at the microscale. Furthermore, it is at least desirable for not only effective networks but also functional networks to recognize important features in the underlying network structure: to track overall regime changes in the macroscopic structure reliably, and to reflect the mesoscopic properties of distinctive nodes (or groups of nodes) in the structure. The desire for recognition of important features in the network is applicable whether the inference is made under idealized conditions or not.
This motivates our validation study, which is primarily conducted under idealized conditions as above and based on synthetic datasets involving ground truth networks of 100–200 nodes with different topologies, from regular lattice to small-world, random, scale-free, and modular. Many of these structural properties are incorporated in the macaque connectomes analyzed in the last section. At the macroscale, we measure several fundamental and widely used properties, including shortest path length, clustering coefficient, small-world coefficient, betweenness centrality, and features of the degree distributions (Rubinov & Sporns, 2010). These properties of the inferred network models are compared with those of the real underlying structural networks in order to validate and benchmark different inference algorithms in terms of their ability to capture the key properties of the underlying topologies. At the microscale, the performance is assessed in terms of precision, recall, and specificity of the inferred model in classifying the links of the underlying structural network. As above, while we do not expect all approaches to strictly capture the microscale features, these results help to explain their performance at the macroscale.
For most of our experiments, the time series of node activity on these networks are generated by vector autoregressive (VAR) dynamics, with linearly coupled nodes and Gaussian noise. Both the VAR process and the inference algorithms are described in detail in the Methods section, where we also discuss how MI and the magnitude of Pearson correlation are equivalent for stationary VAR processes. This implies that the undirected networks obtained via the bivariate MI algorithm are equivalent to the widely employed undirected functional networks obtained via correlation, extending the implications of our results beyond information-theoretic methods. Further, our results based on TE extend to Granger causality, which is equivalent to TE for stationary VAR processes (Barnett, Barrett, & Seth, 2009). Networks inferred using bivariate TE are typically referred to as directed functional networks, to emphasize the directed nature of their links. The extension to multivariate TE for effective network inference can also be viewed as an extension to multivariate Granger causality for the stationary VAR processes here.
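For reference, the standard identities behind these equivalences for jointly Gaussian variables are (stated in our notation; they are not reproduced from the paper's numbered equations):

I(X; Y) = -\tfrac{1}{2}\,\ln\!\left(1 - \rho_{XY}^{2}\right), \qquad \mathcal{F}_{X \to Y} = 2\,\mathrm{TE}_{X \to Y},

where ρXY is the Pearson correlation coefficient and F denotes Granger causality, with both information-theoretic quantities measured in nats (Barnett et al., 2009). The first identity makes bivariate MI a monotonic function of |ρXY|, and the second makes the TE-based analyses directly comparable to Granger causality.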
We find that multivariate TE performs better on all network topologies at all scales, for longer time series. Bivariate methods can achieve better recall with a limited amount of data (shorter time series) in some circumstances, but the precision and the ability to control false positives are neither consistent nor predictable a priori. On the other hand, thanks to recent statistical improvements, multivariate TE guarantees high specificity and precision regardless of the amount of data available, and the recall steadily increases with more data. We discuss how the mesoscopic properties of the underlying structural network—particularly the network motifs—can influence the precision and recall of the model at the microscale. In turn, we show how the performance at the microscale affects the inferred network properties at the macroscale. We observe that bivariate methods are often unable to capture the most distinctive topological features of the networks under study (including path length, clustering, degree distribution, and modularity), largely because of their inability to control false positives at the microscale.
Our final section moves beyond the validation under idealized conditions to extend the experiments to time series from a neural mass model employed on the CoCoMac connectome. Although the incorporation of complexities such as nonlinear interactions and subsampled time series do somewhat reduce the performance of the methods, the superior performance of multivariate TE aligns with the results above from the validation in idealized conditions. While further and more wide-ranging experiments are required, our experiments provide substantial evidence for the validity of multivariate TE in providing effective network models, which still retain meaningful network insights in more realistic conditions.
METHODS
Generating Dynamics on Networks
Two models are used to generate time series dynamics on networks of coupled variables. Vector autoregressive processes are employed for validation studies under idealized conditions, and a neural mass model is used on the weighted CoCoMac connectome as a final investigation going beyond these conditions. The simulation and analysis pipeline is illustrated in Figure 1.
Networks of linearly coupled Gaussian variables.
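Although the full specification is given by Equation 1 in the original text, a minimal sketch of the type of dynamics described here, a first-order VAR process on a weighted coupling matrix with i.i.d. Gaussian noise, might look as follows; the function and variable names are ours and illustrative only.

```python
import numpy as np

def simulate_var(C, T, noise_std=1.0, seed=0):
    """Simulate a first-order VAR process on a coupling matrix C.

    C[x, y] is the weight of the directed link x -> y (including the
    self-couplings C[x, x]), so that each node Y is updated as
    Y_t = sum_X C[X, Y] * X_{t-1} + Gaussian noise.
    Illustrative sketch only; the paper's exact specification is its
    Equation 1, and stationarity requires the spectral radius of C
    to be below 1.
    """
    rng = np.random.default_rng(seed)
    N = C.shape[0]
    X = np.zeros((N, T))
    X[:, 0] = rng.normal(size=N)
    for t in range(1, T):
        X[:, t] = C.T @ X[:, t - 1] + noise_std * rng.normal(size=N)
    return X  # shape (N, T): one row of samples per node
```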
Neural mass model on the CoCoMac connectome.
To provide an extension beyond the linear VAR dynamics, neural activity in various brain regions is modeled (following Shine, Aburn, Breakspear, & Poldrack, 2018; Li et al., 2019) as an oscillating two-dimensional neural mass model derived by mode decomposition from the Fitzhugh-Nagumo single neuron model (FitzHugh, 1961). As previously presented (Li et al., 2019; Shine et al., 2018), the CoCoMac connectome (Kötter, 2004) is used to provide directed coupling between 76 regions, with axonal time delays between these regions based on the length of fiber tracts as estimated by diffusion spectrum imaging (Sanz Leon et al., 2013). The data, provided by Li et al. (2019) and Shine et al. (2018), were simulated using the open-source framework The Virtual Brain (Sanz Leon et al., 2013), with code implementing the model freely available at https://github.com/macshine/gain_topology (Shine, 2018).
Finally, the time series of membrane voltage Vi(t) (originally obtained with a 0.5-ms temporal resolution via stochastic Heun integration) are subsampled at 15 ms (selected as half the median time for the autocorrelation functions to decay to 1/e).
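As a rough illustration of this preprocessing step (a sketch under the stated parameters, with variable names of our choosing): subsampling a 0.5-ms-resolution series to 15 ms corresponds to keeping every 30th sample, and the autocorrelation decay time can be estimated as the first lag at which the autocorrelation function drops below 1/e.

```python
import numpy as np

def autocorrelation_time(x):
    """First lag at which the normalized autocorrelation of x drops below 1/e."""
    x = x - x.mean()
    acf = np.correlate(x, x, mode='full')[x.size - 1:]
    acf = acf / acf[0]
    below = np.where(acf < 1.0 / np.e)[0]
    return below[0] if below.size else x.size

# V: array of shape (n_regions, n_samples) of membrane voltages at 0.5-ms
# resolution (illustrative name). Half the median autocorrelation time sets
# the subsampling period:
#   period_samples = int(0.5 * np.median([autocorrelation_time(v) for v in V]))
#   V_sub = V[:, ::period_samples]
# With the paper's data this gives one sample per 15 ms (a factor of 30).
```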
Network Inference Algorithms
Bivariate mutual information for functional connectivity.
Mutual information (MI) is computed between all pairs of nodes independently, in a bivariate fashion, and only the measurements that pass a strict statistical significance test (described below) are interpreted as undirected links.
Bivariate transfer entropy for directed functional connectivity.
Transfer entropy (TE) is computed between all pairs of nodes independently, in a bivariate fashion, and only the measurements that pass a strict statistical significance test (described below) are interpreted as links.
In practice, in order to estimate the TE from the time series, Y<t is usually constructed as an embedding vector (Takens, 1981), and a maximum lag must be specified to build a finite embedding of the target’s past (using either uniformly or nonuniformly spaced variables; Faes, Nollo, & Porta, 2011; Kugiumtzis, 2013; Vlachos & Kugiumtzis, 2010). Here, using the IDTxl Python package (Wollstadt et al., 2019), a nonuniform embedding of the target’s past Y<t is built via iterative greedy selection of statistically significant elements of Y<t (as detailed in the next section). While similar multivariate embeddings and lags larger than one time step can be used for the source X in principle, here only a single sample at lag 1 for the source is used for the VAR model in alignment with its dynamics.
Analogously to the MI, the bivariate TE values are transformed into a directed network structure by testing their statistical significance. This is computed (using a theoretical null distribution, as summarized in Lizier, 2014) to reflect the probability of observing a larger TE from the same samples if the source samples were temporally decoupled from the target and its past. A critical level of α = 0.01/N is used as per MI.
Bivariate TE has found wide application in studies of directed functional connectivity (e.g., Honey, Kotter, Breakspear, & Sporns, 2007; Lizier, Heinzle, Horstmann, Haynes, & Prokopenko, 2011; Marinazzo, Wu, Pellicoro, Angelini, & Stramaglia, 2012; Orlandi, Stetter, Soriano, Geisel, & Battaglia, 2014; Stetter, Battaglia, Soriano, & Geisel, 2012; Wibral et al., 2011). Importantly, TE and Granger causality are equivalent for Gaussian variables (Barnett et al., 2009), which applies to the VAR processes considered here (Equation 1), suggesting that a Gaussian estimator for TE be employed (Bossomaier et al., 2016). The networks inferred via the bivariate TE algorithm with this estimator are equivalent (for any dynamics) to those obtained via Granger causality, which is also widely employed in neuroscience.
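For the lag-1 VAR setting described above, the Gaussian-estimator bivariate TE reduces to a log-ratio of residual variances from two nested linear regressions (i.e., half the Granger causality). The sketch below uses our own function names and assumes a single target-past and source-past sample; the paper itself uses the estimators and significance tests implemented in IDTxl.

```python
import numpy as np

def bivariate_te_gaussian(source, target):
    """Bivariate TE (nats) from source to target for lag-1 Gaussian/VAR dynamics.

    TE = 0.5 * ln( var(residual | target past) / var(residual | target past, source past) ),
    which is half the Granger causality for Gaussian variables.
    Illustrative sketch only.
    """
    y_t = target[1:]          # target present
    y_past = target[:-1]      # target past (single sample at lag 1)
    x_past = source[:-1]      # source past (single sample at lag 1)

    def residual_variance(y, regressors):
        A = np.column_stack([np.ones_like(y)] + regressors)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return np.var(y - A @ coef)

    var_reduced = residual_variance(y_t, [y_past])
    var_full = residual_variance(y_t, [y_past, x_past])
    return 0.5 * np.log(var_reduced / var_full)
```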
Multivariate transfer entropy for effective connectivity.
The multivariate TE network inference algorithm is described in full in Novelli et al. (2019), synthesizing together components of Faes et al. (2011), Lizier and Rubinov (2012), Montalto, Faes, and Marinazzo (2014), Sun et al. (2015), and Vlachos and Kugiumtzis (2010). A greedy approach is used to iteratively select the candidate variables to add to X<t from a candidate set {Xi, t−l} of lagged variables from the past of each source Xi ∈ Z ∖{Y}, up to some maximum lag l ≤ L (L = 1 for the VAR model and L = 4 for the neural mass model used here). Testing the statistical significance of the conditional mutual information for a new candidate source to be included at each step—conditioning on the previously selected sources—provides an adaptive stopping condition for the greedy algorithm. The family-wise error rate for each target is set to α = 0.01 using the max statistic (Novelli et al., 2019), meaning that for each target there is a 0.01 chance that, under the null hypothesis, at least one spurious parent node is selected. The conditioning on previously selected sources serves to prevent the selection of spurious sources that are correlated with true sources because of common driver as well as pathway or chain effects (referred to as holding redundant information only; Lizier & Rubinov, 2012; Stramaglia, Cortes, & Marinazzo, 2014). Such conditioning also enables capturing multivariate or synergistic effects on the target that cannot be detected by examining individual sources in isolation (Lizier & Rubinov, 2012; Stramaglia et al., 2014). Furthermore, in contrast to always conditioning on all potential sources, by conditioning only on previously selected sources the iterative approach defers overly high-dimensional analysis until it is genuinely required, buying statistical sensitivity (“recall”; Lizier & Rubinov, 2012; Novelli et al., 2019). Every node is studied as a target (in parallel on a computing cluster, using IDTxl; Wollstadt et al., 2019) and the results are then combined into a directed network describing the information flows in the system. Similar to the bivariate TE discussed above, a nonuniform embedding of the target’s past Y<t is built first (Vlachos & Kugiumtzis, 2010), before the second step of selecting sources via the same iterative greedy algorithm (Novelli et al., 2019). While multiple past samples of any given source can be considered (e.g., as has been done in Novelli et al., 2019), only one past value is examined here (L = 1) for the VAR experiments, in line with their known structure in Equation 1 and in order to focus on network structure effects only.
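A self-contained schematic of this greedy selection for a single target is sketched below, using a linear-Gaussian conditional MI estimator and a simple circular-shift surrogate test. All function and variable names are ours, and the actual algorithm in IDTxl includes additional elements that are omitted here (nonuniform embedding of the target past, the full maximum-statistic test over all remaining candidates, and subsequent pruning and omnibus tests).

```python
import numpy as np

def gaussian_cmi(y, x, z):
    """Conditional MI I(X; Y | Z) in nats for jointly Gaussian variables,
    computed from residual variances of nested linear regressions.
    y: (T,) target samples; x: (T,) candidate; z: (T, k) conditioning set."""
    def resid_var(y, regs):
        A = np.column_stack([np.ones_like(y)] + regs)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return np.var(y - A @ coef)
    z_cols = [z[:, i] for i in range(z.shape[1])]
    return 0.5 * np.log(resid_var(y, z_cols) / resid_var(y, z_cols + [x]))

def greedy_select_sources(data, target, alpha=0.01, n_perm=200, seed=0):
    """Schematic greedy parent selection for one target (lag-1 sources only).

    data: (N, T) array, one row per node. Candidates are the lag-1 pasts of all
    other nodes; the target's own lag-1 past is always conditioned on. At each
    step the candidate with maximum conditional MI (given the already-selected
    set) is tested against surrogates obtained by circularly shifting it.
    (In practice many more surrogates are needed to resolve p-values near 0.01.)
    """
    rng = np.random.default_rng(seed)
    N, T = data.shape
    y = data[target, 1:]
    cond = [data[target, :-1]]                      # target's own past
    candidates = {s: data[s, :-1] for s in range(N) if s != target}
    selected = []
    while candidates:
        Z = np.column_stack(cond)
        cmis = {s: gaussian_cmi(y, x, Z) for s, x in candidates.items()}
        best = max(cmis, key=cmis.get)
        surrogates = []
        for _ in range(n_perm):
            shift = rng.integers(1, T - 1)
            surrogates.append(gaussian_cmi(y, np.roll(candidates[best], shift), Z))
        p_value = np.mean(np.array(surrogates) >= cmis[best])
        if p_value > alpha:
            break                                   # adaptive stopping condition
        selected.append(best)
        cond.append(candidates.pop(best))           # condition on the new parent
    return selected
```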
Given that TE and Granger causality are equivalent for Gaussian variables (Barnett et al., 2009), using the Gaussian estimator with the multivariate TE algorithm can be viewed as extending Granger causality in the same multivariate/greedy fashion.
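For completeness, a basic IDTxl invocation of this algorithm with the Gaussian estimator is sketched below, following the package documentation; the settings names and defaults are our best understanding and should be checked against the installed IDTxl version, and the input array here is a placeholder.

```python
import numpy as np
from idtxl.data import Data
from idtxl.multivariate_te import MultivariateTE

# ts: NumPy array of shape (N nodes, T samples), e.g., from a VAR simulation.
ts = np.random.randn(5, 1000)  # placeholder data for illustration only

data = Data(ts, dim_order='ps')  # 'p' = processes (nodes), 's' = samples
settings = {
    'cmi_estimator': 'JidtGaussianCMI',  # linear-Gaussian estimator (requires the JIDT backend)
    'min_lag_sources': 1,
    'max_lag_sources': 1,                # L = 1 as in the VAR experiments
    'max_lag_target': 1,
    'alpha_max_stat': 0.01,              # significance level (name per IDTxl docs; verify locally)
}
results = MultivariateTE().analyse_network(settings=settings, data=data)
adjacency = results.get_adjacency_matrix(weights='binary', fdr=False)
```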
Evaluation Metrics
At the microscale (individual links), the network inference performance is evaluated against the known underlying network structure as a binary classification task, using standard statistics based on the number of true positives (TP, i.e., correctly classified existing links), false positives (FP, i.e., absent links falsely classified as existing), true negatives (TN, i.e., correctly classified absent links), and false negatives (FN, i.e., existing links falsely classified as absent). The following standard statistics are employed in the evaluation:
Precision = TP/(TP + FP);
Recall (true positive rate) = TP/(TP + FN);
Specificity (true negative rate) = TN/(TN + FP).
False positive rate = FP/(FP + TN).
Intuitively, the precision measures how often an inferred link is actually present in the underlying structure, the recall measures the proportion of true links that are detected, and the specificity measures the proportion of absent links that are correctly not inferred. For a properly controlled family-wise error rate (α), the expected specificity is 1 − α.
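A minimal helper for computing these statistics from a binary inferred adjacency matrix and the true structural adjacency matrix is sketched below (our own code; it assumes self-loops are excluded from the comparison and that each denominator is nonzero).

```python
import numpy as np

def classification_metrics(inferred, true):
    """Precision, recall, specificity, and false positive rate for directed links.

    inferred, true: boolean (N x N) adjacency matrices with entry [i, j] = True
    if the link i -> j is present. Self-loops are excluded from the comparison.
    """
    off_diag = ~np.eye(true.shape[0], dtype=bool)
    inf, tru = inferred[off_diag], true[off_diag]
    tp = np.sum(inf & tru)
    fp = np.sum(inf & ~tru)
    tn = np.sum(~inf & ~tru)
    fn = np.sum(~inf & tru)
    return {'precision': tp / (tp + fp),
            'recall': tp / (tp + fn),
            'specificity': tn / (tn + fp),
            'false_positive_rate': fp / (fp + tn)}
```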
At the macroscale, the performance is evaluated in terms of the accuracy in measuring network properties of interest on the inferred network, as compared with their real values when measured on the underlying structural network. These properties include the following (a code sketch for computing several of them follows the list):
Characteristic path length. The average shortest distance between all pairs of nodes (Rubinov & Sporns, 2010). A shorter average value is typically interpreted as an indication of the efficiency of the network in propagating information. The characteristic path length is only well defined for connected networks—a problem that is avoided by construction in this study by only generating connected networks. This limitation could be alternatively overcome by replacing the characteristic path length with the analogous global efficiency (Latora & Marchiori, 2001), reported in the Supporting Information for completeness.
Clustering coefficient. The clustering coefficient of a node is the fraction of the node’s neighbors that are also neighbors of each other (Fagiolo, 2007; Watts & Strogatz, 1998). The mean clustering coefficient hence reflects the average prevalence of clustered connectivity around individual nodes.
Small-worldness coefficient. Small-world networks are formally defined as networks that are significantly more clustered than random networks, yet have approximately the same characteristic path length as random networks (Watts & Strogatz, 1998). The small-worldness coefficient was proposed by Humphries and Gurney (2008) to capture this effect in a single statistic (although improvements on the original measure have recently been suggested; Neal, 2017; Zanin, 2015):

$$ S = \frac{C / C_{\mathrm{rand}}}{L / L_{\mathrm{rand}}} \qquad (9) $$

In Equation 9, C and L respectively denote the average clustering coefficient and the characteristic path length; analogously, Crand and Lrand denote the average clustering coefficient and the characteristic path length of Erdős-Rényi networks having the same size and number of links as the network under study.
Degree distribution. The probability distribution of the in- and out-degree over the whole network (where the in- and out-degree of a node are defined as the number of its incoming and outgoing connections, respectively).
Betweenness centrality. The fraction of all shortest paths in the network that pass through a given node (Rubinov & Sporns, 2010), excluding paths that start or end at that node.
Modularity. A measure of the separation of a network into specific groups (or modules), defined as the fraction of the edges that fall within the given groups minus the expected fraction if the edges were distributed at random (Newman & Girvan, 2004).
Rich-club coefficient. The “rich-club” phenomenon refers to the tendency of hubs (nodes with high degree) to form tightly interconnected communities (Colizza, Flammini, Serrano, & Vespignani, 2006). The rich-club coefficient was first proposed by Zhou and Mondragon (2004) to quantify this effect:

$$ \phi(k) = \frac{2\,E_{>k}}{N_{>k}\,(N_{>k} - 1)} \qquad (10) $$

where E>k denotes the number of edges among the N>k nodes having degree higher than a given value k.
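As referenced above, several of these measures can be computed with standard graph libraries. The sketch below uses NetworkX purely for illustration (the paper does not specify the software used for its graph measures); the small-worldness coefficient of Equation 9 is computed here against a single Erdős-Rényi reference network, whereas in practice one would average over several realizations, and the exact conventions (e.g., for directed clustering) may differ from the paper's.

```python
import networkx as nx

def macroscale_measures(G, seed=0):
    """A few of the macroscale measures listed above, for a directed graph G.

    Illustrative sketch only; assumes G is (strongly) connected, as guaranteed
    by construction for the networks studied in the paper.
    """
    G = nx.DiGraph(G)
    G.remove_edges_from(list(nx.selfloop_edges(G)))   # self-couplings are not topological links
    n, m = G.number_of_nodes(), G.number_of_edges()
    L = nx.average_shortest_path_length(G)            # characteristic path length
    C = nx.average_clustering(G)                      # mean clustering coefficient
    # Erdos-Renyi reference with the same number of nodes and directed links (Equation 9)
    G_rand = nx.gnm_random_graph(n, m, directed=True, seed=seed)
    while not nx.is_strongly_connected(G_rand):       # resample until path lengths are defined
        seed += 1
        G_rand = nx.gnm_random_graph(n, m, directed=True, seed=seed)
    S = (C / nx.average_clustering(G_rand)) / (L / nx.average_shortest_path_length(G_rand))
    return {'path_length': L,
            'clustering': C,
            'small_worldness': S,
            'betweenness': nx.betweenness_centrality(G),
            'in_degrees': dict(G.in_degree())}
```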
SMALL-WORLD NETWORKS
Numerical Simulations
The first experiment is aimed at testing the robustness of the three inference algorithms with respect to vast changes in network structure. The Watts-Strogatz model is used to generate a spectrum of topologies, ranging from regular lattices to random networks (similar to Erdős-Rényi networks, although not equivalent; Maier, 2019) through a small-world transition (Watts & Strogatz, 1998). Each simulation starts with a directed ring network of 100 nodes with uniform link weights Cij = Cii = 0.15 and fixed in-degree din = 4 (i.e., each node is linked to two neighbors on each side, as well as to itself via a self-loop). The source of each link is then rewired with a given probability p ∈ [0, 1], so as to change the overall network topology while keeping the in-degree of each node fixed. Only rewiring attempts that keep the network connected are accepted, in order to allow the measurement of the average shortest path length. The simulations for each p are repeated 10 times on different network realizations and with random initial conditions.
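A sketch of this source-rewiring procedure is given below. It differs from the standard Watts-Strogatz routine in that it rewires link sources while holding in-degrees fixed and rejects moves that disconnect the network; the implementation details (including the use of weak connectivity as the acceptance criterion) are ours and purely illustrative.

```python
import numpy as np
import networkx as nx

def ring_with_rewired_sources(n=100, neighbors_per_side=2, p=0.1,
                              weight=0.15, seed=0):
    """Directed ring lattice (plus self-loops) with link sources rewired with probability p.

    Each node initially receives links from its two nearest neighbors on each
    side and from itself; rewiring replaces the source of a (non-self) link
    with a random node, keeping the target's in-degree fixed and the network
    connected. Illustrative sketch; the paper's exact procedure may differ.
    """
    rng = np.random.default_rng(seed)
    G = nx.DiGraph()
    G.add_nodes_from(range(n))
    for tgt in range(n):
        G.add_edge(tgt, tgt, weight=weight)                    # self-loop
        for off in range(1, neighbors_per_side + 1):
            G.add_edge((tgt - off) % n, tgt, weight=weight)
            G.add_edge((tgt + off) % n, tgt, weight=weight)
    for src, tgt in list(G.edges()):
        if src == tgt or rng.random() >= p:
            continue
        candidates = [s for s in range(n) if s != tgt and not G.has_edge(s, tgt)]
        if not candidates:
            continue
        new_src = rng.choice(candidates)
        G.remove_edge(src, tgt)
        G.add_edge(new_src, tgt, weight=weight)
        if not nx.is_weakly_connected(G):                      # reject disconnecting moves
            G.remove_edge(new_src, tgt)
            G.add_edge(src, tgt, weight=weight)
    return G
```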
Results
At the microscale, the performance is evaluated in terms of precision, recall, and specificity in the classification of the links (present or absent) in the inferred network compared with the underlying structural network. In the case of bivariate MI, each undirected link in the inferred network is represented as two directed links in opposite directions. For longer time series of 10,000 samples, multivariate TE is the most accurate method, achieving optimal performance according to all metrics on all the network topologies generated by the Watts-Strogatz rewiring model (Figure 2, right column). Bivariate TE also achieves nearly optimal recall and high specificity on all topologies; however, despite the strict statistical significance level, the precision is significantly lower on lattice-like topologies (low rewiring probability) than on random ones (high rewiring probability). The opposite trend is shown by the bivariate MI algorithm, whose precision and recall drastically decrease with increasing rewiring probability. As expected, the recall of all methods decreases when shorter time series of 1,000 samples are provided (Figure 2, left column). However, the recall for multivariate TE is consistent across topologies, while it decreases with higher rewiring probability when bivariate methods are used. This results in the bivariate TE having larger recall for lattice-like topologies, while multivariate TE has larger recall than bivariate for more random topologies (i.e., for a rewiring probability larger than p = 0.2). A further interesting effect is that bivariate TE attains better precision on shorter time series than on longer ones.
At the macroscale, the three algorithms are tested on their ability to accurately measure three fundamental network properties relevant through the small-world transition, using the longer time series of 10,000 samples. Multivariate TE is able to closely approximate the real shortest path length on all the network topologies generated by the Watts-Strogatz rewiring model, while the bivariate MI and TE algorithms produce significant underestimates, particularly on lattice-like topologies (Figure 3). Similarly, multivariate TE is able to closely match the real mean clustering on all the network topologies generated by the Watts-Strogatz rewiring model, while the bivariate MI and TE algorithms consistently overestimate it (Figure 4). The related measure of local efficiency (Latora & Marchiori, 2001) is reported in the Supporting Information.
Given the above results on the characteristic path length and the mean clustering coefficient, it is not surprising that the bivariate MI and TE algorithms significantly overestimate the real small-worldness coefficient, while the multivariate TE method produces accurate estimates on all the network topologies generated by the Watts-Strogatz rewiring model (Figure 5). Equivalent results are found if the alternative measures of “small-world index” (Neal, 2017) or “double-graph normalized index” (Telesford, Joyce, Hayasaka, Burdette, & Laurienti, 2011) are computed instead (not shown).
Discussion
At the microscale, the results concerning the bivariate TE can be explained in the light of the recent theoretical derivation of TE from network motifs for VAR dynamics (Novelli, Atay, Jost, & Lizier, 2020). For a fixed in-degree, the TE decreases with the rewiring probability, making it harder for candidate links to pass the statistical significance tests when only short time series are available. This explains why the recall for the bivariate TE slightly drops with higher rewiring probability for T = 1000 (Figure 2). We can speculate that a similar mechanism could be responsible for the more drastic drop in the recall for the bivariate MI (via evidence from derivations for covariances from network structure for similar processes; Pernice, Staude, Cardanobile, & Rotter, 2011; Schwarze & Porter, 2020). The fact that bivariate TE is larger for regular lattice structures also explains why its recall is slightly higher than for multivariate TE here: The redundancy between close sources that elevates their bivariate TE is explicitly conditioned out of the multivariate TE for secondary sources. On the other hand, as the rewiring increases, the higher recall for multivariate TE must be due to this method capturing synergistic effects that (more disparate) multiple sources have on the target, which the bivariate method does not.
Comparing the results between shorter and longer time series raises another question: Why is the precision of the bivariate TE worse for longer time series than for shorter ones, especially for lattice-like topologies? More complex motifs involving common parents and multiple walks, which are more prevalent in regular lattice topologies, can result in nonzero TE on spurious links. These indirect effects are typically weak; however, for long enough time series, the low TE values can be distinguished from noise and thus pass the statistical significance tests. The resulting spurious links (false positives) decrease the precision and the specificity as the time series length is increased, with the effect being stronger in regular lattice topologies. In other words, the Bonferroni correction of the statistical significance level (i.e., dividing α by the network size N) does not result in a well-calibrated test for bivariate inference methods—the sources are correlated, and the tests on them are not independent. The differences in the specificity on the plots are subtle because the networks are sparse; however, they manifest in large differences in the precision. Crucially, this effect is not seen for the multivariate TE, which maintains specificity consistent with the requested α for all topologies and time series lengths. Thus, the lower recall achieved by multivariate TE on regular lattice networks for short time series (compared with bivariate TE) can be viewed as a compromise to control the specificity in a consistent fashion. A compelling argument in favor of controlling the specificity is provided by Zalesky et al. (2016, p. 407), who conclude that “specificity is at least twice as important as sensitivity [i.e., recall] when estimating key properties of brain networks, including topological measures of network clustering, network efficiency and network modularity.” Unfortunately, there is currently no consistent a priori way (and no reasonable candidate) to determine the optimal time series length for bivariate TE to attain high precision.
The transitivity property of Pearson’s correlation also constrains bivariate functional networks: for three variables, a strong positive correlation within two of the pairs implies a positive correlation within the third pair (specifically, ρ²_XY + ρ²_YZ > 1 implies ρ_XZ > 0; Langford et al., 2001). The problem was further investigated by Zalesky, Fornito, and Bullmore (2012), who showed that functional connectivity matrices of independent processes also exhibit small-world properties and that—in practice—the correlation between neighbors is much higher than the theoretical lower bound in Equation 11. These considerations on correlation extend to bivariate MI, given the one-to-one relationship between MI and the absolute value of Pearson’s correlation coefficient for the Gaussian variables considered in this study (see the Bivariate mutual information for functional connectivity section in the Methods section). This transitivity property results in more triangular cliques in functional networks, that is, an inflated clustering coefficient across the whole spectrum of networks in Figure 4. Together with the underestimate of the shortest path length discussed above, the outcome is an overestimate of the small-worldness coefficient (Figure 5). As shown, the limitations of bivariate methods can be overcome by multivariate TE, to a large degree for shorter time series and certainly when sufficiently long time series are available.
SCALE-FREE NETWORKS
Numerical Simulations
The linear preferential attachment algorithm without attractiveness (Barabási & Albert, 1999) is used to generate undirected scale-free networks of N = 200 nodes. Starting with two connected nodes, a new node is added at each iteration and linked bidirectionally to two existing nodes, selected with probability proportional to their current degree (via linear preferential attachment). This preferential mechanism makes high-degree nodes more likely to be selected and further increase their degree—a positive feedback loop that generates few highly connected hubs and many low-degree nodes. The resulting density is approximately 4/N = 0.02 (average in- and out-degrees being approximately 4), with hubs having degrees up to around 50 here. A constant uniform link weight CXY = CXX = 0.1 is assigned to all the links, achieving strong coupling but ensuring the stationarity of the VAR dynamics. For robustness, each simulation is repeated 10 times on different network realizations and with random initial conditions.
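A sketch of this construction using NetworkX’s Barabási-Albert generator is shown below, purely for illustration (the authors’ own generation code is linked in the Supporting Information section, and details of the seed graph in the library generator may differ slightly from the description above).

```python
import networkx as nx

def scale_free_var_network(n=200, m=2, weight=0.1, seed=0):
    """Undirected Barabasi-Albert graph converted to a bidirectional directed
    network with uniform link weights and self-loops, as described above.
    Illustrative sketch only."""
    G_und = nx.barabasi_albert_graph(n, m, seed=seed)   # linear preferential attachment
    G = nx.DiGraph()
    G.add_nodes_from(G_und.nodes())
    for u, v in G_und.edges():
        G.add_edge(u, v, weight=weight)
        G.add_edge(v, u, weight=weight)
    for u in G.nodes():
        G.add_edge(u, u, weight=weight)                  # self-coupling C_XX
    return G
```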
Results
At the microscale, the performance is evaluated in terms of precision and recall in the classification of the links. The outcome is qualitatively similar to the small-world case presented above: For longer time series (10,000 samples), multivariate TE is the most accurate method, achieving optimal performance according to all metrics (Figure 6, right column). Bivariate TE also achieves optimal recall; however, despite the strict statistical significance level, the precision is significantly lower than multivariate TE. The bivariate MI algorithm scores comparatively very poorly in terms of both precision and recall (< 40% on average). As expected, the recall of all methods decreases when shorter time series of 1,000 samples are provided (Figure 6, left column). Once more, bivariate TE attains better precision on shorter time series than on longer ones, and for these networks attains slightly better recall than multivariate TE on the shorter time series.
At the macroscale, the three algorithms are tested on their ability to accurately measure several relevant properties of scale-free networks. It is well known that the degree distribution of networks generated via this preferential attachment algorithm follows a power-law, with theoretical exponent β = 3 in the limit of large networks (Barabási & Albert, 1999). Fitting power-laws to empirical data requires some caution, such as adopting a logarithmic binning scheme (Virkar & Clauset, 2014), and the dedicated powerlaw Python package is employed for this purpose (Alstott, Bullmore, & Plenz, 2014). For sufficiently long time series (T = 10,000 in this study), multivariate TE is able to accurately recover the in-degrees of the nodes in our scale-free networks, while the bivariate MI and TE algorithms produce significant overestimates (Figure 7). As a consequence, the (absolute value of the) exponent of the fitted power-law is underestimated by the latter methods, as shown in Figure 8.
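A minimal example of the fitting step with the powerlaw package is given below; the exact options used in the paper (e.g., the treatment of xmin) are not specified in the text, so the defaults shown here are an assumption, and the degree sequence is a placeholder.

```python
import networkx as nx
import powerlaw  # Alstott, Bullmore, & Plenz (2014)

# Example degree sequence; in the paper this would be the in-degrees of the inferred network.
degrees = [d for _, d in nx.barabasi_albert_graph(200, 2, seed=0).degree()]

fit = powerlaw.Fit(degrees, discrete=True)   # xmin is estimated automatically here
print('estimated exponent:', fit.power_law.alpha)
print('estimated xmin:', fit.power_law.xmin)
```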
Hubs are a key feature of scale-free networks and have high betweenness centrality, since most shortest paths pass through them. However, their centrality is highly underestimated by bivariate methods, often making it indistinguishable from the centrality of peripheral nodes (Figure 9).
As in the small-world case, multivariate TE is able to very closely approximate the real mean clustering coefficient, while the bivariate MI and TE algorithms consistently overestimate it (Figure 10). The related measure of local efficiency (Latora & Marchiori, 2001) is reported in the Supporting Information.
A closer examination of the clustering of individual nodes (instead of the average) reveals that low clustering values are consistently overestimated by bivariate methods, while high clustering values are underestimated (Supporting Information). Finally, bivariate methods overestimate the rich-club coefficient (Supporting Information).
Discussion
Echoing the discussion of small-world networks above, the ability to control the false positives while building connectomes—exhibited only by multivariate TE—is also crucial for correctly identifying fundamental features of scale-free networks, such as the power-law degree distribution and the presence of hub nodes. Hubs are characterized by high degree and betweenness centrality. Unfortunately, the centrality of hubs is not robust with respect to false positives: The addition of spurious links causes strong underestimates of the betweenness centrality of real hubs, since additional links provide alternative shortest paths. For bivariate TE, the effect is so prominent that the inferred centrality of real hubs can be indistinguishable from the centrality of peripheral nodes, as shown in Figure 9. The in-degree is in principle more robust with respect to false positives; however, bivariate methods infer so many spurious incoming links into nonhubs that these become as connected as (or more connected than) the real hubs are inferred to be (Figure 7). Taken together, these effects on the in-degree and centrality greatly hinder the identification of real hubs when bivariate MI or TE are employed. The inflation of the in-degree of peripheral nodes also fattens the tail of the in-degree distribution (Figure 8), resulting in an underestimate of the exponent of the fitted power-law with respect to the theoretical value β = 3 (Barabási & Albert, 1999). This has severe implications for the synthetic networks used in this study, erroneously providing evidence against the simple preferential attachment algorithm used to generate them. The third distinctive characteristic of these networks is their low average clustering, which is also induced by the preferential attachment algorithm, whereby each new node is only connected to two existing ones. However, bivariate methods fail to capture this feature, producing a strong overestimate of the average clustering coefficient (Figure 10). This can be attributed to the transitivity property of Pearson’s correlation, which produces overabundant triangular cliques in functional networks (as previously discussed). Given the significant biases affecting all the distinctive properties of scale-free networks—in addition to the small-world networks presented above—it is evident that great caution should be used when applying bivariate inference methods (cross-correlation, MI, TE) to draw conclusions about the topological properties of real-world networks. In contrast, again, the multivariate TE was demonstrated to produce network models with microscopic and macroscopic topological properties consistent with those of the underlying structural scale-free networks.
MODULAR NETWORKS
Numerical Simulations
In order to study the performance of the three inference algorithms on modular topologies, networks of 100 nodes are generated and equally partitioned into five groups of 20. Initially, each node is directly linked to 10 random targets within its own group, such that the five communities are completely disconnected. The initial density is thus 50% within each group and 10% overall. Link targets are then gradually rewired from within to between groups, weakening the modular structure but preserving the overall density and keeping the out-degrees fixed. Eventually, the concepts of “within” and “between” groups are no longer meaningful—the links are equally distributed and the topology resembles a random Erdős-Rényi network of equal overall density. This happens when the rewiring is so prevalent that only 2 links are left within the initial groups and 8 out of 10 links are formed between them (for each node). Going even further, when all 10 links are formed between the initial groups and none within, the network becomes multipartite, that is, the nodes are partitioned into five independent sets having no internal connections. A constant uniform link weight CXY = CXX = 0.08 is assigned to all the links, achieving strong coupling but ensuring the stationarity of the VAR dynamics. Each simulation is repeated 10 times on different network realizations and with random initial conditions.
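A sketch of this construction is given below. Rather than rewiring an initial network step by step, the sketch parameterizes directly by the number of between-group links per node; the implementation and names are ours and illustrative only, and connectivity of the result is not checked here.

```python
import numpy as np

def modular_network(n=100, n_groups=5, out_degree=10, n_between=0,
                    weight=0.08, seed=0):
    """Weighted adjacency matrix A with A[i, j] = weight for a link i -> j.

    Each node sends `out_degree` links: `n_between` of them go to random targets
    outside the node's group and the rest to random targets within it, plus a
    self-loop on every node. Illustrative sketch of the procedure in the text.
    """
    rng = np.random.default_rng(seed)
    group_size = n // n_groups
    A = np.zeros((n, n))
    np.fill_diagonal(A, weight)                       # self-couplings
    for i in range(n):
        g = i // group_size
        within = [j for j in range(g * group_size, (g + 1) * group_size) if j != i]
        between = [j for j in range(n) if j // group_size != g]
        targets = list(rng.choice(within, size=out_degree - n_between, replace=False))
        targets += list(rng.choice(between, size=n_between, replace=False))
        for j in targets:
            A[i, j] = weight
    return A
```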
Results
At the microscale, we find that bivariate MI and TE infer more spurious links within the initial groups than between them for smaller between-group densities (Figure 11, left column). As the between-group density increases though, we find more spurious links between the initial groups than within them. The normalized false positive rate is also significantly higher within groups for smaller between-group densities (right column); however, the normalization sees the false positive rate become comparable between and within groups as the between-group density increases. The number of false positives produced by multivariate TE is comparatively negligible.
At the mesoscale, the modularity of the partition corresponding to the five disconnected communities is maximal in the absence of rewiring and decreases as more and more links are formed between groups rather than within them (Figure 12). Bivariate and multivariate TE produce accurate estimates of the real modularity, while bivariate MI often underestimates it, particularly for shorter time series (T = 1,000) and intermediate between-group densities.
Discussion
Our results on modular networks confirm and extend previous findings on correlation-based functional connectivity, stating that “false positives occur more prevalently between network modules than within them, and the spurious inter-modular connections have a dramatic impact on network topology” (Zalesky et al., 2016, p. 407). Indeed, the left column of Figure 11 shows that bivariate MI and TE infer a larger number of false positives between the initial groups than within them, once we have a midrange between-group link density in the underlying structure (which induces the transitive relationships). However, the same does not apply to the false positive rate (i.e., the normalized number of false positives) shown in the right column of Figure 11: Where an edge does not actually exist, it is more likely to be inferred if it is within rather than across groups (for up to midrange between-group link densities). As such, the higher number of false positives between modules is mostly due to the larger number of potential spurious links available between different communities compared with those within them. Nonetheless, the key message is that the modular structure (at the mesoscale level) affects the performance of bivariate algorithms in inferring single links (at the microscale level). This provides further empirical evidence for the theoretical finding that bivariate TE—despite being a pairwise measure—does not depend solely on the directed link weight between a single pair of nodes, but on the larger network structure they are embedded in, via the mesoscopic network motifs (Novelli et al., 2020). In particular, the abundance of specific “clustered motifs” in modular structures increases the bivariate TE, making links within each group easier to detect but also increasing the false positive rate within modules. Other studies have also related correlation-based functional connectivity to specific structural features, such as search information, path transitivity (Goni et al., 2014), and topological similarity (Bettinardi et al., 2017).
The underestimate of the modularity of the initial partition by bivariate MI (Figure 12, bottom panel) is a direct result of these higher numbers of spurious between-group links. This has important implications for the identification of the modules, since a lower score makes this partition less likely to be deemed optimal by popular greedy modularity maximization algorithms (Blondel, Guillaume, Lambiotte, & Lefebvre, 2008). We speculate that the spurious intermodular links would also hinder the identification of the modules when alternative approaches for community detection are employed (a thorough comparison of which is beyond the scope of this study).
MACAQUE CONNECTOME
Finally, the three inference algorithms are compared on two real macaque brain connectomes, using both linear VAR dynamics and a nonlinear neural mass model (pipeline illustrated in Figure 1).
Numerical Simulations
Linear VAR dynamics.
As a final validation study under idealized conditions, the linear VAR dynamics in Equation 1 is run on the connectome obtained via tract-tracing by Young (1993). This directed network consists of 71 nodes and 746 links (15% density) and incorporates multiple properties investigated in the previous sections, including a small-world topology and the presence of hubs and modules. The scaling of the performance is studied as a function of the cross-coupling strength (i.e., the sum of incoming link weights into each node, denoted as Cin and formally defined as Cin = ∑XCXY for each node Y). The coupling Cin is varied in the [0.3, 0.7] range, making CXY constant for each parent X for a given Y to achieve this, and the self-link weights are kept constant at CXX = 0.2 to ensure the stationarity of the VAR dynamics. For robustness, each simulation is repeated 10 times with random initial conditions.
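The weight assignment described here, dividing a total incoming coupling Cin uniformly among each node's structural parents while fixing the self-weight, can be sketched as follows (names and conventions are ours; the structural matrix is assumed to have a zero diagonal).

```python
import numpy as np

def assign_var_weights(structural, c_in=0.5, c_self=0.2):
    """Build a VAR coupling matrix from a binary structural adjacency matrix.

    structural[i, j] = 1 if there is a directed link i -> j (zero diagonal assumed).
    Each target's total incoming weight c_in is split equally among its structural
    parents (C_XY = c_in / in_degree(Y)), and self-weights are set to c_self.
    Illustrative sketch of the setup described in the text.
    """
    C = np.zeros_like(structural, dtype=float)
    in_degree = structural.sum(axis=0)                # column sums: parents of each target
    for target in range(structural.shape[0]):
        if in_degree[target] > 0:
            C[:, target] = structural[:, target] * (c_in / in_degree[target])
    np.fill_diagonal(C, c_self)
    return C
```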
Nonlinear neural mass model.
As a final experiment, we provide an initial investigation of whether the insights from the previous validation studies extend beyond the idealized conditions there. Specifically, in moving towards a more realistic setting, neural mass model dynamics are simulated on the CoCoMac connectome, as described in the Methods section. This network structure contains 76 nodes with 1,560 directed connections (27% density), which are weighted and have experimentally estimated coupling delays. Importantly, by incorporating nonlinear coupling, coupling delays, a distribution of coupling weights, and subsampling, this last study drops many of the simplifying assumptions made using the VAR dynamics in the previous sections. The linear Gaussian estimator is retained for our information-theoretic measures despite the nonlinear interactions here, so as to remain consistent with the previous studies. Dropping the assumption of sampling at the resolution of the real causal process adds a particular challenge, one often encountered in practice in modalities with low temporal resolution. To handle the variation in coupling delays, we consider sources at lags up to L = 4 time steps (60 ms) here. The longest time series analyzed (30,000 samples) corresponds to 7.5 min of data at one sample per 15 ms.
Results
At the microscale, the results for the linear and nonlinear dynamics (Figure 13 and Figure 14) are complementary and summarize the main findings presented so far. There exists a window—characterized by low cross-coupling strength and short time series—where bivariate TE attains similar or better performance compared with multivariate TE in terms of recall, specificity, and precision. For stronger coupling or longer time series, the recall of all methods increases, but the precision and specificity of the bivariate methods substantially drop while those of multivariate TE remain consistently high.
An intuitive visual representation of how these differences in precision, recall, and specificity affect the macroscopic inferred network is provided in Figure 15, where the inferred adjacency matrices are displayed beside the real connectome, with different colors indicating which links are correctly/incorrectly inferred or missed by each method.
The macroscale results (in terms of local and global efficiency measures) are reported in the Supporting Information.
Discussion
Interestingly, Figures 13 and 14 show how similar outcomes are produced by either stronger coupling (link weights) or longer time series. An explanation is readily available in the simple case of VAR dynamics: The bivariate TE on spurious links is typically lower than the TE on real links, and it increases with the coupling strength. Bivariate TE can be analytically derived from the network structure under the assumption of VAR dynamics, and spurious links are associated with larger motifs (e.g., longer chains), contributing to the TE at lower orders of magnitude (Novelli et al., 2020). Therefore, spurious links can only pass statistical significance tests when sufficiently long time series are available (in order for their weak TE values to be distinguished from noise); for the same reason, for shorter time series, spurious links can only be detected in the presence of strong enough coupling. Unfortunately, for real datasets, there is no consistent a priori way to determine the optimal window of time series lengths for bivariate TE, before the increasing false positive rate degrades precision and specificity.
It is crucial to note that, despite moving beyond the idealized conditions used for validation with the VAR model, the qualitative differences between the inference algorithms remain unchanged in the neural mass model study. That is, multivariate TE attains higher and more consistent precision and specificity than bivariate methods, testifying to a more effective control of false positives that enables more faithful representation of macroscale network features—an advantage that becomes increasingly important as longer time series are provided. Indeed, on a more intuitive level, Figure 15 provides immediate visual evidence of the importance of a reliable method for controlling the false positive rate. The large number of spurious interhemispheric links produced by bivariate methods can hinder the identification of important macroscopic features, starting from the very presence of the two hemispheres themselves and extending to other fundamental network properties (as shown for local and global efficiency measures in the Supporting Information). This issue becomes particularly problematic for longer time series, as in the case of T = 30,000 time samples shown in Figure 15.
With that said, the precision and specificity of the multivariate TE for this more realistic study in Figure 14 are noticeably lower compared with the previous experiments in idealized conditions, such as that shown in Figure 13. Specifically, the specificity is lower than would be expected from the proven well-controlled false positive rate under idealized conditions. This is potentially due to a number of factors in this study, including the subsampling, nonlinear dynamics, strong autocorrelation, and to a lesser extent coupling delays. While we retained the linear Gaussian estimator to be consistent with the previous experiments, we have previously demonstrated substantial performance enhancements for the multivariate TE algorithm when using a nonlinear estimator for studying nonlinear dynamics (Novelli et al., 2019). That could be expected to improve performance here as well. Similarly, we have recently demonstrated approaches to rigorously control the known inflation of false positive rates for MI and TE because of autocorrelation (Cliff et al., 2020), although this is not expected to have as dramatic an effect in this study because of the selection of subsampling time via the autocorrelation time. The subsampling itself though (by a factor of 30 on the original time series) is likely to have had a significant impact on performance. This is because subsampling obscures our view of the dynamics at the real interaction scale, and prevents us from properly conditioning on the past of the target (which is known to inflate the false positive rate; Wibral et al., 2011). While initial studies have suggested some level of robustness of TE to subsampling (Lizier et al., 2011), a more systematic analysis would be important future work to properly understand its effect.
CONCLUSION
We have sought to evaluate how well network models produced via bivariate and multivariate network inference methods capture features of underlying structural topologies. As outlined in the Introduction, these inference techniques seek to infer a network model of the relationships between the nodes in a system, and are not necessarily designed or expected to replicate the underlying structural topology in general. Our primary focus, however, was in evaluating the techniques under specific idealized conditions under which effective network models are proven to converge to the underlying structure. The focus on such conditions is important because they provide assumptions under which our evaluation becomes a validation study. The performance of these methods was evaluated at both the microscopic and the macroscopic scales of the network. While we may not expect the same performance in identifying links at the microscopic scale, we should expect all of the methods to identify relevant macroscopic features in the underlying network structure, as well as distinctive nodes or groups of nodes.
For longer time series, multivariate TE performs better on all network topologies (lattice-like, small-world, scale-free, modular, and the real macaque connectome). This enhanced performance is very clear at the microscale of single links, achieving high precision and recall, and consequently at the macroscale of network properties, accurately reflecting the key summary statistics of the ground truth networks used for validation.
Bivariate methods (directed and undirected) can exhibit higher recall (or sensitivity) for shorter time series for certain underlying topologies; however, as available data increase, they are unable to control false positives (that is, they have lower specificity). While decreasing statistical significance thresholds (critical α levels) for inferring links is a common strategy to reduce false positives, the bivariate measures simply cannot match the sensitivity of the multivariate approach at the same specificity (compare the precisions for the same recall level in Figure 6 as an example, or refer to the sample ROC curve in the Supporting Information for a more extensive comparison). At the macroscale, the comparatively larger number of false positives leads to overestimated clustering, small-world, and rich-club coefficients, underestimated shortest path lengths and hub centrality, and fattened degree distribution tails. The changes in these measures are partly due to the aforementioned transitivity property for bivariate measures (implying that false positives are often “close” to real links in the network), and partly due to higher density; untangling these effects is a topic for future work. In any case, caution should therefore be used when interpreting network properties of functional connectomes obtained via correlation or pairwise statistical dependence measures. Their use is advisable only when the limited amount of data does not allow the use of the more sophisticated but more accurate multivariate TE, which more faithfully tracks trends in the underlying structural topology. Further research is required to try to reliably identify—a priori—situations where bivariate TE will exhibit higher precision and recall (particularly in terms of time series length), as there is no clear candidate approach to do so at present. For now, the critical strength of the multivariate approach lies in its ability to appropriately control the false positive rate to meet the requested values.
Our evaluation of the inference techniques under idealized conditions considered several of the highest profile complex network topologies: lattice-like, small-world, scale-free, modular, and a mix of their features in a real macaque connectome. This complements previous work (Novelli et al., 2019; Sun et al., 2015) at a similar scale, which evaluated performance on random network structures and incorporated a study of the effect of network size and linear-versus-nonlinear dynamics and estimators. Obviously, we have only scratched the surface of examining the effects of the myriad combinations of network parameters that could be investigated, which could include larger variations in degree, distributions on edge weights, super or sublinear preferential attachment, nonuniform module sizes, or cross-module connection probabilities, and could also incorporate experiments across other types of dynamics. Thus far, our conclusions on how the multivariate TE approach performs against the bivariate measures were consistent across the variety of structures, and while it would be interesting to see how other variations in structure affect the performance, we do expect the general conclusions on the comparison between approaches to remain similar.
Of course, there is a computational time tradeoff, with the run-time of the multivariate TE algorithm requiring 𝒪(d) longer in comparison to the bivariate approach (where d is the average inferred in-degree). The runtime complexity is analyzed in detail and benchmarked for both linear and nonlinear estimators on similar scale experiments in Novelli et al. (2019; Supporting Information). Our experiments here on 10,000 time samples for up to 200 nodes, with the more efficient linear estimator, took less than 2 hr (single core) on average per target on the same hardware. Given the availability of parallel computing to analyze targets simultaneously, we believe the trade-off in runtime increase is justifiable for the performance increase demonstrated here.
Beyond idealized conditions, effective network inference techniques are not guaranteed to converge in such a manner to an underlying structure. This can be for many reasons, including hidden nodes (or lack of full observability), nonstationarity or short sample size, or subsampling obscuring the scale of interaction. Yet, our final experiment (examining time series dynamics of a neural mass model on the 76-node CoCoMac connectome) extended our investigations into the domain beyond idealized conditions and also demonstrated superior performance of the multivariate TE, aligning with the validation studies in idealized conditions. Importantly, this included visually revealing the characteristic hemispheric macroscopic structure of this connectome. With that said, the performance of multivariate TE in this example was certainly reduced in comparison to our experiments under idealized conditions. This appears to be due to various factors as discussed in that section, including the use of a linear estimator on nonlinear dynamics as well as the effect of subsampling. There is substantial scope for further study to understand the performance of inference techniques under nonideal conditions, and how the effective network models they infer are related to the underlying structure. This will involve further experiments on realistic neural dynamics; systematic study of the effect of subsampling in network inference (building on existing studies for Granger causality; Barnett & Seth, 2017); and assessing the ability of inference algorithms to capture key network features when only a subset of nodes is observed (i.e., in the presence of hidden nodes).
Finally, while we focus on functional brain networks, our conclusions and methods also apply to anatomical brain networks in which connectivity is measured using correlation in cortical thickness or volume (He, Chen, & Evans, 2007). Beyond neuroscience, they also extend to metabolite, protein and gene correlation networks (Gillis & Pavlidis, 2011; a similar validation study using synthetic networks was carried out in gene regulatory networks using bivariate MI and TE; Budden & Crampin, 2016).
ACKNOWLEDGMENTS
The authors thank Mac Shine and Daniele Marinazzo for useful discussions. The authors acknowledge the Sydney Informatics Hub and the University of Sydney’s high-performance computing cluster Artemis for providing the high-performance computing resources that have contributed to the research results reported within this paper. The authors thank Matthew Aburn for providing time-series data simulated from the neural mass model on the CoCoMac connectome from Li et al. (2019) and Shine et al. (2018).
SUPPORTING INFORMATION
Supporting Information for this article is available at https://doi.org/netn_a_00178. The network inference algorithms described in this paper are implemented in the open-source Python software package IDTxl (Wollstadt et al., 2019), which is freely available on GitHub (https://github.com/pwollstadt/IDTxl). The code used for the systematic exploration of network structures and inference methods is also publicly available (Novelli, 2020; https://github.com/LNov/infonet).
AUTHOR CONTRIBUTIONS
Leonardo Novelli: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Software; Validation; Visualization; Writing original draft. Joseph Lizier: Conceptualization; Funding acquisition; Investigation; Supervision; Writing review & editing.
FUNDING INFORMATION
Joseph Lizier, Australian Research Council, Award ID: DECRA Fellowship grant DE160100630. Joseph Lizier, University of Sydney, Award ID: Research Accelerator (SOAR) prize program.
TECHNICAL TERMS
- Brain parcellation:
The subdivision of the brain into “parcels” or regions of interest, serving as nodes in network-based models of the brain.
- Network topology:
The structure of the network describing the connections between its elements.
- Multivariate network inference:
Considering multiple time series at once (as opposed to bivariate inference, which considers each pair of nodes in isolation).
- IDTxl:
The Information Dynamics Toolkit xl is an open-source Python package for network inference, freely available on GitHub (https://github.com/pwollstadt/IDTxl).
- Bonferroni correction:
A strategy for controlling the error rate when multiple comparisons are performed. The significance threshold is divided by the number of comparisons.
- Nonuniform embedding:
A set of nonuniformly spaced time lags that capture the underlying state of the process, akin to a Takens’ embedding.
Author notes
Competing Interests: The authors have declared that no competing interests exist.
Handling Editor: Daniele Marinazzo