Abstract
Binary undirected graphs are well established, but when these graphs are constructed, a threshold is often applied to a parameter describing the connection between two nodes. Therefore, the use of weighted graphs is more appropriate. In this work, we focus on weighted undirected graphs. This implies that we have to incorporate edge weights in the graph measures, which requires generalizations of common graph metrics. After reviewing existing generalizations of the clustering coefficient and the local efficiency, we proposed new generalizations for these graph measures. To be able to compare different generalizations, a number of essential and useful properties were defined that ideally should be satisfied. We applied the generalizations to two real-world networks of different sizes. As a result, we found that not all existing generalizations satisfy all essential properties. Furthermore, we determined the best generalization for the clustering coefficient and local efficiency based on their properties and their performance when applied to two networks. We found that the best generalization of the clustering coefficient is , defined in Miyajima and Sakuragawa (2014), while the best generalization of the local efficiency is , proposed in this letter. Depending on the application and the relative importance of sensitivity and robustness to noise, other generalizations may be selected on the basis of the properties investigated in this letter.
1 Introduction
A complex system can be modeled as a graph or network, which is composed of nodes and edges connecting them. Analysis over a wide range of complex systems has led to a fundamental insight: many complex systems share certain topological characteristics, and these can be captured by graph-theoretical metrics (Barabasi & Oltvai, 2004; Amaral & Ottino, 2004; Zhang & Horvath, 2005; Bullmore & Sporns, 2009; Fornito, Zalesky, & Breakspear, 2013). The small-world topology, for example, has been found in many real-world networks (He, Chen, & Evans, 2007; Opsahl & Panzarasa, 2009; Batalle et al., 2012; Vandenberghe et al., 2013) and is an indication of the cost-efficiency of these networks.
While traditional graph analysis uses binary edges to enhance contrast between strong and weak connections, there is an increasing demand for using edge weights, which carry potentially important information. Incorporating edge weights in the graph analysis calls for generalizations of the graph metrics. While some of these measures can be naturally generalized to a weighted version (e.g., node degree to node strength), others cannot be generalized in a straightforward way. The generalization of the clustering coefficient and the local efficiency, used to quantify the small-world topology (Watts & Strogatz, 1998; Achard & Bullmore, 2007; Batalle et al., 2012), is far from trivial.
The clustering coefficient reflects the tendency of the neighbors of a node to also be neighbors of each other (Rubinov & Sporns, 2010). The clustering coefficient is high in small-world networks compared to random networks (Watts & Strogatz, 1998). Local efficiency is a measure of the fault tolerance of the system: it measures how efficient the communication is between neighbors of a node when that node is removed (Latora & Marchiori, 2003). A small-world network features a local efficiency intermediate to that of regular (lattice) and random networks (Achard & Bullmore, 2007; Batalle et al., 2012). The two measures are related in that the clustering coefficient in an undirected network is found to be a reasonable approximation of local efficiency (Latora & Marchiori, 2003).
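To make the two binary definitions concrete, the following minimal sketch computes both measures for one node of a small binary undirected graph. It assumes an adjacency matrix stored as a list of lists; the function names and the example graph are illustrative and are not taken from the cited works.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj, i):
    """Fraction of pairs of neighbors of node i that are themselves connected."""
    nbrs = [j for j in range(len(adj)) if j != i and adj[i][j]]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for j, h in combinations(nbrs, 2) if adj[j][h])
    return 2.0 * links / (k * (k - 1))

def _hop_distances(adj, nodes, source):
    """Breadth-first hop distances from source inside the subgraph on `nodes`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in nodes:
            if adj[u][v] and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def local_efficiency(adj, i):
    """Mean inverse shortest-path length between neighbors of i, with i removed."""
    nbrs = [j for j in range(len(adj)) if j != i and adj[i][j]]
    k = len(nbrs)
    if k < 2:
        return 0.0
    total = 0.0
    for j in nbrs:
        dist = _hop_distances(adj, nbrs, j)
        # Unreachable neighbors are absent from dist and contribute zero.
        total += sum(1.0 / d for h, d in dist.items() if h != j)
    return total / (k * (k - 1))

# Illustrative graph: node 0 is linked to 1, 2, 3; edges 1-2 and 2-3 exist.
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
]
print(clustering_coefficient(adj, 0), local_efficiency(adj, 0))
```

For node 0 in this example, the clustering coefficient is 2/3 and the local efficiency is 5/6: the indirect path between neighbors 1 and 3 (via node 2) contributes to the efficiency but not to the clustering coefficient.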
Although it is straightforward to find the neighbors of a node, the question of how to define their weighted surrogates is far from obvious. Several generalizations have been proposed (Barrat, Barthelemy, Pastor-Satorras, & Vespignani, 2004; Onnela, Saramäki, Kertész, & Kaski, 2005; Zhang & Horvath, 2005; Saramäki, Kivelä, Onnela, Kaski, & Kertesz, 2007; Opsahl & Panzarasa, 2009; Miyajima & Sakuragawa, 2014; Rubinov & Sporns, 2010). Different definitions capture slightly different aspects of the network, yet some of the generalizations are not designed for fully weighted networks (Rubinov & Sporns, 2010; Barrat et al., 2004; Onnela et al., 2005). These generalizations require the removal of the weak or noisy connections beforehand. A preferable solution is to adapt the equations in such a way that they can be used for a fully weighted network (Zhang & Horvath, 2005; Saramäki et al., 2007; Opsahl & Panzarasa, 2009; Miyajima & Sakuragawa, 2014).
In this letter, we first define a number of essential and useful properties that ideally should be satisfied when using a generalized graph measure and explain how we will evaluate them. Then we review the existing generalizations, and we propose new generalizations for the local efficiency for fully weighted undirected networks with no self-connections. Finally, we make a thorough comparison of the different generalizations and apply them to two real-world networks.
2 Methods
2.1 Properties of Generalized Graph Measures
A generalization of a binary graph measure should ideally satisfy some properties. Some of these properties are essential, while others may depend on the application. The essential properties are:
- General versatility. If the input is given as a binary network, the output of the generalizations should give the same results as the binary version: when (Miyajima & Sakuragawa, 2014).
- Continuity. The graph measure should be continuous, that is, in which except for one connection weight, which differs by (Miyajima & Sakuragawa, 2014).
- Sensitivity. The graph measure should be able to make a distinction between different cases for which the graph measure is designed. Because the clustering coefficient and the local efficiency are defined using triangles, we will evaluate the six possible cases in which the weight is either low or high in one of the connections of the triangle (see Figure 1).
- Robustness to noise. The graph measure for node should be robust when adding noise to the weights of the connections. If we assume a noise matrix , which represents additive noise on each connection weight , and define the mean relative error (in percent) with respect to the noise-free measures as then a small value of should lead to a small error value .
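The robustness criterion in the last item can be sketched in code. The sketch below assumes that the mean relative error is the average over nodes of |M_i(W + E) - M_i(W)| / |M_i(W)|, expressed in percent; this form, the names `measure`, `weights`, and `noise`, and the use of node strength as a stand-in measure are all illustrative assumptions.

```python
def mean_relative_error(measure, weights, noise):
    """Mean relative error (percent) of a nodal measure under additive noise.

    measure(w, i) is any nodal graph measure; nodes whose noise-free value
    is zero are skipped to avoid division by zero (an assumption).
    """
    n = len(weights)
    noisy = [[weights[i][j] + noise[i][j] for j in range(n)] for i in range(n)]
    errors = []
    for i in range(n):
        clean = measure(weights, i)
        if clean != 0:
            errors.append(abs(measure(noisy, i) - clean) / abs(clean))
    return 100.0 * sum(errors) / len(errors) if errors else 0.0

def node_strength(w, i):
    """Stand-in measure: sum of the weights attached to node i."""
    return sum(w[i])

weights = [[0.0, 1.0, 2.0], [1.0, 0.0, 4.0], [2.0, 4.0, 0.0]]
noise = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(mean_relative_error(node_strength, weights, noise))
```

Any of the generalized clustering or efficiency measures could be passed as `measure` in place of `node_strength`.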
Additional useful properties can be defined, but these properties depend on the application under investigation and should not be considered essential:
- Weight-scale invariance. The graph measure is invariant to a global scale factor for all edges: (Miyajima & Sakuragawa, 2014).
- Applicable to fully weighted networks. In some applications, it is beneficial to avoid any thresholding of the weights of the connections in a network. As a result, every node is connected with every other node, although the weights may be very small. A generalization of the clustering coefficient and the local efficiency should be applicable to such cases.
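Properties such as weight-scale invariance and general versatility can also be checked numerically for a given generalization. As an illustration, the sketch below uses the geometric-mean clustering coefficient in the form commonly attributed to Onnela et al. (2005), with weights normalized by the maximum weight in the network; whether this matches the numbered equations of this letter exactly is an assumption, and the function name is hypothetical.

```python
from itertools import combinations

def onnela_clustering(w, i):
    """Geometric-mean weighted clustering coefficient of node i.

    Each triangle contributes the cube root of the product of its three
    weights, after dividing every weight by the largest weight in w.
    """
    n = len(w)
    w_max = max(max(row) for row in w)
    nbrs = [j for j in range(n) if j != i and w[i][j] > 0]
    k = len(nbrs)
    if k < 2 or w_max == 0:
        return 0.0
    total = sum(
        (w[i][j] * w[i][h] * w[j][h]) ** (1.0 / 3.0)
        for j, h in combinations(nbrs, 2)
    )
    # Dividing by w_max once is equivalent to normalizing all three weights.
    return 2.0 * total / (w_max * k * (k - 1))

w = [[0.0, 0.5, 1.0], [0.5, 0.0, 0.25], [1.0, 0.25, 0.0]]
scaled = [[7.0 * x for x in row] for row in w]
binary = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(onnela_clustering(w, 0), onnela_clustering(scaled, 0))  # equal up to rounding
print(onnela_clustering(binary, 0))  # reduces to the binary value for a triangle
```

The normalization by the maximum weight is what makes this particular measure independent of a global scale factor.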
2.2 Generalizations
A good review of existing generalizations of the clustering coefficient and local efficiency can be found in Miyajima and Sakuragawa (2014). The authors investigated general versatility, weight-scale invariance, and continuity. Here, we investigate all criteria listed in the previous section. We first give a short description of the different generalizations currently available.
They proposed four methods to calculate : arithmetic mean (), geometric mean (), and maximum () and minimum () of the weights of the edges. We refer to the set of these four methods by .
Miyajima and Sakuragawa (2014) extended the generalization of the clustering coefficient of node to the case of weighted directed networks using different functions (multiplication, geometric mean, minimum, and harmonic mean). The case of multiplication in the context of an undirected network leads to the same generalization as the one from Holme et al. (2007). For undirected networks, we give their other extensions below:
As is the case for the clustering coefficient, multiple generalizations can be defined for the weighted local efficiency. We define three possible extensions that we compare with the current generalization defined by Rubinov and Sporns (2010).
Weighting by makes the shortest distance invariant to the weight scale.
2.3 Application to Two Real-World Networks
A number of essential (general versatility, continuity) and useful (weight-scale invariance and applicability to fully weighted networks) properties are evaluated on a theoretical basis. The essential property sensitivity will be assessed by studying the cases shown in Figure 1, while the fourth essential property robustness to noise will be assessed using two real-world networks.
2.3.1 The Associative-Semantic Network
We will evaluate the robustness to noise of the different generalizations by studying the situation in which we add noise to the -value (before the transformation to weights) of each (nonzero) connection using a standard normal distribution as noise model multiplied by some constant to model the amount of noise. After adding the noise, weights are calculated as before using equation 2.20. We will study the case of corresponding to different levels of noise ranging from weak to strong noise. The average error will be calculated over 10 noise realizations.
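The noise procedure described above can be sketched as follows. Because equation 2.20 is not restated here, the value-to-weight transformation is passed in as a hypothetical callable `to_weight`; the function names, the seed handling, and the use of node strength with `abs` in the example are likewise illustrative assumptions.

```python
import random

def add_symmetric_noise(values, sigma, rng):
    """Add sigma * N(0, 1) noise to every nonzero connection, keeping symmetry."""
    n = len(values)
    noisy = [row[:] for row in values]
    for i in range(n):
        for j in range(i + 1, n):
            if values[i][j] != 0:
                eps = sigma * rng.gauss(0.0, 1.0)
                noisy[i][j] = noisy[j][i] = values[i][j] + eps
    return noisy

def to_weights(values, to_weight):
    """Apply the value-to-weight transformation to nonzero entries only."""
    return [[to_weight(v) if v != 0 else 0.0 for v in row] for row in values]

def average_error(values, measure, to_weight, sigma, realizations=10, seed=0):
    """Mean relative error (percent), averaged over noise realizations."""
    rng = random.Random(seed)
    n = len(values)
    clean_w = to_weights(values, to_weight)
    clean = [measure(clean_w, i) for i in range(n)]
    per_run = []
    for _ in range(realizations):
        noisy_w = to_weights(add_symmetric_noise(values, sigma, rng), to_weight)
        rel = [abs(measure(noisy_w, i) - clean[i]) / abs(clean[i])
               for i in range(n) if clean[i] != 0]
        per_run.append(100.0 * sum(rel) / len(rel) if rel else 0.0)
    return sum(per_run) / len(per_run)

# Example with placeholder choices: node strength as measure, abs as transformation.
values = [[0.0, 2.0, 3.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
strength = lambda w, i: sum(w[i])
print(average_error(values, strength, abs, sigma=0.5))
```

Noise is drawn once per undirected connection so that the noisy matrix stays symmetric, and connections that are zero before the noise is added remain zero.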
Since this is a fully weighted network, we will also calculate the correlation of the different versions of the clustering coefficient and local efficiency between this fully weighted network and the soft-thresholded network (thresholded such that the density is 80%, 60%, or 40%). Often a threshold is used to remove connections with low weight. This is referred to as the creation of soft-thresholded weighted networks. We hypothesize that weighted graph measures calculated for the original weighted network and for the soft-thresholded weighted network are highly correlated, especially when the density of the latter network is high.
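A minimal sketch of this comparison is given below. It assumes that soft-thresholding keeps the strongest connections until the target density is reached and that agreement is measured with the Pearson correlation across nodes; both choices, and the function names, are assumptions made for illustration.

```python
import math

def soft_threshold(w, density):
    """Keep the strongest edges so that about `density` of all possible edges remain.

    Surviving edges keep their original weight; the rest are set to zero.
    """
    n = len(w)
    edges = sorted(
        ((w[i][j], i, j) for i in range(n) for j in range(i + 1, n) if w[i][j] > 0),
        reverse=True,
    )
    n_keep = int(round(density * n * (n - 1) / 2))
    out = [[0.0] * n for _ in range(n)]
    for weight, i, j in edges[:n_keep]:
        out[i][j] = out[j][i] = weight
    return out

def pearson(x, y):
    """Pearson correlation between two equally long sequences of nodal values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

w = [
    [0.0, 0.6, 0.5, 0.1],
    [0.6, 0.0, 0.4, 0.2],
    [0.5, 0.4, 0.0, 0.3],
    [0.1, 0.2, 0.3, 0.0],
]
half = soft_threshold(w, 0.5)  # keeps the three strongest of six edges
```

A nodal measure would then be evaluated on `w` and on the thresholded matrix, and the two resulting vectors passed to `pearson`.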
2.3.2 The Resting State fMRI Network
The second network is a functional connectivity network constructed from correlations between regional fMRI time series, measured at 638 nodal locations in 27 healthy volunteers scanned in resting state on a Siemens 3T scanner. The details of this experiment can be found in Crossley et al. (2013). The data of this network are publicly available (see Note). Unfortunately, the connection strengths (expressed as -scores) were already thresholded, and as a result, only the connections with were available. Therefore, we can only study the robustness to noise in this case. We will do this in the same way as for the first real-world network, including the transformation of -values to weights using equation 2.20.
3 Results
3.1 General Versatility
All the generalizations of the clustering coefficient, except , lead to the same equation in the case of binary undirected networks, as can be easily seen from equations 2.5 to 2.7 and 2.9 to 2.12, where , 0 or 1 and . The generalization does not show general versatility because in the denominator, there is no requirement that the two sides of a triangle should be different (i.e., when the triplet is not a triangle but a line).
The generalization of the local efficiency, introduced in equation 2.13, does not show general versatility because of the power of the distance compared to in the binary case. In contrast, the generalizations of the local efficiency proposed in this letter—equations 2.15, 2.17, and 2.18—do show general versatility.
The results of each generalization for this criterion are summarized in Table 1.
| Method | General Versatility | Continuity |
|---|---|---|
| | Yes | No |
| | Yes | No |
| | Yes | Yes |
| | No | Yes |
| | Yes | No |
| | Yes | No |
| | Yes | No |
| | Yes | No |
| | Yes | Yes |
| | Yes | Yes |
| | Yes | Yes |
| | No | No |
| | Yes | Yes |
| | Yes | Yes |
| | Yes | Yes |
3.2 Continuity
The node degree is a discontinuous function for a weighted network since any node connected by a nonzero weight is considered a neighbor irrespective of the amplitude. This implies that the value for the node degree will differ by 1 between the case in which an arbitrarily small weight is present for an edge between nodes and and the case in which this edge is not present (i.e., has zero weight). As a result, , and are not continuous. is also not continuous since in the numerator, only closed triangles will contribute, no matter how small the weight of the third connection in the triangle is, and the contribution depends only on the weights of the two other connections in the triangle. The other extensions (, and ) are all continuous.
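The jump in node degree can be illustrated with a three-node example. The sketch below, with hypothetical function names, compares node degree with node strength when an arbitrarily small weight is added to an edge that was previously absent.

```python
def degree(w, i):
    """Number of nodes connected to i by a nonzero weight."""
    return sum(1 for j, x in enumerate(w[i]) if j != i and x > 0)

def strength(w, i):
    """Sum of the weights attached to node i."""
    return sum(x for j, x in enumerate(w[i]) if j != i)

eps = 1e-9
without_edge = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
with_edge = [[0.0, 1.0, eps], [1.0, 0.0, 1.0], [eps, 1.0, 0.0]]

# The degree of node 0 jumps from 1 to 2, however small eps is ...
print(degree(without_edge, 0), degree(with_edge, 0))
# ... while its strength changes only by eps.
print(strength(with_edge, 0) - strength(without_edge, 0))
```

Any generalization that is normalized by the node degree inherits this jump, which is the reason given above for the loss of continuity.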
The results of each generalization for this criterion are summarized in Table 1.
3.3 Sensitivity
| Method | Number of Different Values (max = 6) | Minimum | Maximum |
|---|---|---|---|
| | 1 | 1 | 1 |
| | 4 | 0.1 | 1 |
| | 4 | 0.1 | 1 |
| | 4 | 0.0165 | 0.5 |
| | 1 | 1 | 1 |
| | 1 | 1 | 1 |
| | 1 | 1 | 1 |
| | 1 | 1 | 1 |
| | 2 | 0.3162 | 1 |
| | 2 | 0.1 | 1 |
| | 4 | 0.1818 | 1 |
| | 4 | 0.1 | 1 |
| | 5 | 0.0001 | 1 |
| | 5 | 0.001 | 1 |
| | 2 | 0.4642 | 1 |
3.4 Robustness to Noise
When adding different amounts of gaussian noise to both real-world networks, we observe that the generalizations and of the clustering coefficient perform best for both networks across all levels of noise used in this study with a mean error within 5%. Most of the other generalizations also perform reasonably well, with mean errors within 10% (see Figure 2).
All generalizations of the local efficiency have an acceptable mean error (i.e., less than 5%) in both networks when noise is not too large (see Figure 3). However, when noise increases, only and have mean errors below 10% in both networks.
3.5 Weight-scale Invariance and Applicability to Fully Weighted Networks
Weight-scale invariance is satisfied for all generalizations of the clustering coefficient as can be derived mathematically by multiplying every weight by a factor and observing that the result is independent of this factor. The generalizations of the local efficiency , , and also show weight-scale invariance, but this is not the case for .
From equation 2.5 for and equation 2.9 for , we see that for fully weighted networks, the clustering coefficient equals 1 for all nodes, and therefore we consider these generalizations not suitable for fully weighted networks (see Table 3). All other generalizations for the clustering coefficient and the local efficiency can be used for fully weighted networks.
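The saturation at 1 can be verified numerically. The sketch below assumes the standard form of the generalization by Barrat et al. (2004), in which each closed triangle at node i contributes the mean of its two adjacent weights and the sum is normalized by s_i(k_i - 1), with s_i the node strength and k_i the node degree; that this is the formula referred to in equation 2.5 is an assumption, and the function name is hypothetical.

```python
def barrat_clustering(w, i):
    """Weighted clustering coefficient of node i in the form of Barrat et al. (2004)."""
    n = len(w)
    nbrs = [j for j in range(n) if j != i and w[i][j] > 0]
    k = len(nbrs)
    if k < 2:
        return 0.0
    s = sum(w[i][j] for j in nbrs)
    # Sum over ordered pairs (j, h) of neighbors that are themselves connected.
    total = sum(
        (w[i][j] + w[i][h]) / 2.0
        for j in nbrs for h in nbrs
        if j != h and w[j][h] > 0
    )
    return total / (s * (k - 1))

full = [
    [0.0, 0.6, 0.5, 0.1],
    [0.6, 0.0, 0.4, 0.2],
    [0.5, 0.4, 0.0, 0.3],
    [0.1, 0.2, 0.3, 0.0],
]
# Same network with the edge between nodes 1 and 2 removed.
thresholded = [row[:] for row in full]
thresholded[1][2] = thresholded[2][1] = 0.0

print([barrat_clustering(full, i) for i in range(4)])
print(barrat_clustering(thresholded, 0))
```

When every pair of neighbors is connected, the sum over ordered pairs equals s_i(k_i - 1) regardless of the weight values, so the measure carries no information in a fully weighted network; removing one edge brings the value for node 0 below 1.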
| Method | Weight-Scale Invariance | Suitable for Fully Weighted Networks |
|---|---|---|
| | Yes | No |
| | Yes | Yes |
| | Yes | Yes |
| | Yes | Yes |
| | Yes | No |
| | Yes | No |
| | Yes | No |
| | Yes | No |
| | Yes | Yes |
| | Yes | Yes |
| | Yes | Yes |
| | No | Yes |
| | Yes | Yes |
| | Yes | Yes |
| | Yes | Yes |
A summary table with the main findings for the essential and useful properties is given in Table 4.
Property . | Clustering Coefficient . | Local Efficiency . |
---|---|---|
General versatility | , , , , , , | , , |
, , , | ||
Continuity | , , , , | , , |
Sensitivity | , , , , , , | , , , |
, , , , | ||
Robustness to noise | , , , , , | , , , |
Weight scale invariance | All generalizations | , , |
Applicable to fully | , , , , , | All generalizations |
weighted networks |
3.6 Soft-Thresholded Weighted Networks
For generalizations of the clustering coefficient that can be applied to fully weighted networks, we compared the values obtained in a fully weighted network with those obtained in a soft-threshold network. Since we have only unthresholded data for the associative-semantic network, the analysis is limited to this network. The hypothesis is that there will be a high correlation between both cases, especially when the density of the soft-thresholded network is high. In Figure 4 and Table 5, the results are shown for different soft-threshold values.
| Density (%) | 40 | 60 | 80 |
|---|---|---|---|
| | 0.21 | 0.57 | 0.81 |
| | 0.91 | 1.00 | 1.00 |
| | 0.91 | 1.00 | 1.00 |
| | 0.52 | 0.85 | 0.96 |
| | 0.71 | 0.97 | 1.00 |
| | 0.55 | 0.92 | 1.00 |
| | 0.51 | 0.68 | 0.82 |
| | 0.97 | 1.00 | 1.00 |
| | 0.92 | 0.99 | 1.00 |
| | 0.65 | 0.89 | 0.98 |
4 Discussion
In this letter, we have defined a set of essential and useful properties that should ideally be satisfied for a generalization of a graph measure when extending from a binary network to a (fully) weighted network. We have compared all of these properties for the generalizations for the clustering coefficient and the local efficiency found in the literature, as well as for new generalizations. Some of the generalizations are especially suited in the case of (fully) weighted undirected graphs.
4.1 Essential Properties of Generalizations of Binary Graph Measures
Generalizations from binary graph measures that are applicable to (fully) weighted graphs should ideally satisfy a number of properties. These properties can be subdivided into essential properties and useful properties. The latter class of properties depends on the application and should be considered relevant only in those applications.
The first essential property is general versatility, which refers to the fact that when applying the generalization to a binary graph, the result should be the same as the corresponding binary graph measure. This property is not satisfied for the generalization of the clustering coefficient and the generalization of the local efficiency. The second essential property is continuity, which means that an infinitesimally small change in one of the weights should lead to an infinitesimally small change in the graph measure. This is not the case for , , and , and it is also not satisfied for . It is important to note that the expressions for the local clustering coefficient (see equation 2.6) and the local efficiency (see equation 2.13), currently used in the Brain Connectivity Toolbox for weighted undirected graphs, do not satisfy continuity (Onnela et al., 2005; Rubinov & Sporns, 2010). The third essential property is sensitivity, the ability to capture the different cases for which the graph measures are designed. We have evaluated this property by looking at six possible cases for a simple triangle since both clustering coefficient and local efficiency are based on triangles. We found that the most sensitive generalizations for the clustering coefficient are , , , and , which could distinguish four of six cases. The best generalizations for the local efficiency are and , which were able to distinguish five of six cases. The fourth essential property is robustness to noise. We have investigated the robustness against different amounts of gaussian noise for two different real-world networks of different sizes. We found that and were the most robust generalizations for the clustering coefficient and and were the most robust generalizations for the local efficiency.
4.2 Useful Properties of Generalizations of Binary Graph Measures
Weight-scale invariance means that the graph measure is invariant to a global scale factor for all edges. In some cases, only relative connection strengths can be determined, and this property is especially useful in such cases. All generalizations of the clustering coefficient and the local efficiency are weight-scale-invariant except the generalization for the local efficiency. Furthermore, all generalizations of the local efficiency can be applied to fully weighted networks. This is also the case for most generalizations of the clustering coefficient except for and .
4.3 Fully Weighted Undirected Graphs
Most studies on graphs in neuroscience are related to binary undirected graphs (Sporns, Honey, & Kötter, 2007; He et al., 2007; He, Chen, & Evans, 2008; Van Wijk, Stam, & Daffertshofer, 2010; Vandenberghe et al., 2013). These graphs have either a connection or not between a pair of nodes, and they are easy to analyze. However, in order to obtain a binary graph, some measure of connectivity (often continuous) between nodes needs to be calculated and then thresholded on either amplitude or significance. The results of this procedure critically depend on the threshold used and do not take into account the strength of the connection. As a result, true connections that do not survive the threshold are removed, while false connections may sometimes be included. An alternative is the use of weighted graphs. In order to avoid taking into account spurious noisy connections, a soft threshold is sometimes applied, which removes these connections. The other connections are weighted (Wang, Li, Metzak, He, & Woodward, 2010; van den Heuvel, Mandl, Stam, Kahn, & Pol, 2010). This reduces the problem described above but does not solve it completely. The advantage of fully weighted graphs is that no thresholding is required and all connections are taken into account (Mumford et al., 2010). The weight should then reflect not only the strength of the actual underlying biological connection but also the probability of being a true connection. Therefore, it might be necessary to apply a transformation from connection strength (e.g., defined by the (partial) correlation between two nodes in fMRI-based functional networks) to weights. In the case of the associative-semantic network, we have shown that the generalizations that are applicable to fully weighted networks show a high correlation with the soft-thresholded network, especially when the density of the latter is high.
4.4 Choice of the Best Generalization
The choice of the best generalization is not always easy. The essential requirements of general versatility and continuity can be proven mathematically, and they are completely independent of the application. The relative importance of the two other essential properties, sensitivity and robustness to noise, depends on the application and most likely requires a trade-off between these two properties. In this study, we have tried to quantify sensitivity based on the values obtained for six possible triangles and robustness to noise by evaluating the behavior when adding different amounts of gaussian noise. Based on our results, we propose that the best generalization of the clustering coefficient is , which is more robust to noise compared to . The choice of the best generalization of the local efficiency is , but if robustness to noise is very important, can be selected; sensitivity, however, will clearly be lower compared to . If the noise in an application is nongaussian, a similar approach to the one we have shown can be taken with a noise model that is more appropriate for the application. Satisfying the useful properties can be important but depends on the application. Despite the difficulty in selecting the optimal generalization, we believe that readers can use the assessment of all properties for all generalizations to select the best generalization for the application under study.
5 Conclusion
In this letter, we have focused on the comparison of different generalizations for the clustering coefficient and local efficiency to the case of (fully) weighted networks by looking at different properties of these graph measures and studying the performance in two real-world networks of different sizes. The best generalization of the clustering coefficient is , defined in Miyajima and Sakuragawa (2014), while the best generalization of the local efficiency is proposed in our work. Depending on the application and the relative importance of sensitivity and robustness to noise, other generalizations may be selected on the basis of the properties investigated in this letter.
Acknowledgments
This work was supported by Research Foundation Flanders (FWO; G0660.09 and G0A0913N to R.V. and P.D.), KU Leuven (OT/12/097 to R.V. and P.D.), Federaal Wetenschapsbeleid belspo (IAP-VII P7/11), and Stichting voor Alzheimer Onderzoek (SAO11020 and 13007). Y.W. has a grant from the Chinese Scholarship Council, and R.V. is a senior clinical investigator. We thank both reviewers for their constructive comments, which greatly improved the letter.
References
Note
The data are available at https://sites.google.com/site/bctnet/datasets as GroupAverage_rsfMRI_matrix.mat.