## Abstract

A graph is a mathematical representation of a set of variables where some pairs of the variables are connected by edges. Common examples of graphs are railroads, the Internet, and neural networks. It is both theoretically and practically important to estimate the intensity of direct connections between variables. In this study, a problem of estimating the intrinsic graph structure from observed data is considered. The observed data in this study are a matrix with elements representing dependency between nodes in the graph. The dependency represents more than direct connections because it includes influences of various paths. For example, each element of the observed matrix represents a co-occurrence of events at two nodes or a correlation of variables corresponding to two nodes. In this setting, spurious correlations make the estimation of direct connection difficult. To alleviate this difficulty, a digraph Laplacian is used for characterizing a graph. A generative model of this observed matrix is proposed, and a parameter estimation algorithm for the model is also introduced. The notable advantage of the proposed method is its ability to deal with directed graphs, while conventional graph structure estimation methods such as covariance selection are applicable only to undirected graphs. The algorithm is experimentally shown to be able to identify the intrinsic graph structure.

## 1.  Introduction

A graph is a mathematical structure that represents the relationship of data. Typical examples of graph structures are railroads, the Internet, transmission networks, and neural networks. In recent years, pattern recognition and data mining based on graph structure have been studied. For example, the graph structure was used by Bunke and Riesen (2011) for document analysis, Conte, Foggia, Sansone, and Vento (2004) for biometric identification, and Washio and Motoda (2003) for data mining.

A graph is composed of nodes (vertices) and edges (arcs), where a connection between a pair of nodes is expressed by an edge. Consider a case of stock prices; nodes correspond to companies, and edges correspond to direct connectivity between companies. Influences of stock prices can transfer through several companies. As another example, in the case of neural networks, nodes correspond to nerve cells (neurons), and edges correspond to direct connectivity between neurons. If coincident multineuronal firings are observed, correlations between these observed neurons are stronger than others. However, these neurons may not be connected directly, and the observed coincident firings may be a result of influences from other neurons. This effect is called a pseudo-correlation, and it is problematic in data analysis.

There are two major graph estimation problems. One is a problem of estimating the importance of nodes when connectivities of whole edges are known—for example, web page ranking problems such as PageRank (Page, Brin, Motwani, & Winograd, 1997) and HITS (Kleinberg, 1999). The other problem is estimating edges based on certain statistics, which represent dependency between nodes—for example, cell-signaling data analysis (Friedman, Hastie, & Tibshirani, 2007). We deal with the latter problem in this letter. There are many different kinds of statistics for representing the relationship between nodes; therefore, we collectively refer to these statistics as dependency.

In graph structure estimation, it is typically assumed that a set of observed data is generated from a multivariate gaussian distribution. Under this assumption, the inverse covariance matrix is regarded as the intensity of direct connections between nodes. In many cases, the inverse covariance matrix is assumed to be sparse, so it is important to estimate the few intrinsic connections between nodes of the graph. This formulation is referred to as SICS (sparse inverse covariance selection; Dempster, 1972). In the framework of SICS, given the empirical covariance matrix $T$ and a regularization parameter $\rho$, the objective is to minimize the regularized negative log likelihood of a gaussian distribution:

$$\min_{X \succ 0}\; -\log\det X + \operatorname{tr}(TX) + \rho\,\|X\|_1, \tag{1.1}$$

where $X$ is the inverse covariance matrix, $X \succ 0$ denotes positive definiteness of the matrix, $\operatorname{tr}(TX) = \sum_{i,j} t_{ij} x_{ij}$, and $\|X\|_1$ is the element-wise $\ell_1$-norm of $X$. Typical algorithms are glasso (graphical lasso; Friedman et al., 2007), COVSEL (Banerjee, Ghaoui, & d'Aspremont, 2008), SINCO (Scheinberg & Rish, 2009), LOREC (Luo, 2011), and QUIC (Hsieh, Sustik, Dhillon, & Ravikumar, 2011). These algorithms are computationally efficient and scalable and can deal with large-scale graphs. However, they cannot be used for estimating asymmetric graph structures because they assume a multivariate gaussian distribution as a data generative model.
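As a concrete illustration, the objective of equation 1.1 can be evaluated as follows. This is a minimal sketch of our own (the function name is hypothetical); actual SICS solvers such as glasso minimize this quantity over $X$ rather than merely evaluating it:

```python
import numpy as np

def sics_objective(X, T, rho):
    """Regularized negative log likelihood of equation 1.1 (SICS).

    X   : candidate inverse covariance matrix (must be positive definite)
    T   : empirical covariance matrix
    rho : l1 regularization parameter
    """
    sign, logdet = np.linalg.slogdet(X)
    if sign <= 0:
        raise ValueError("X must be positive definite")
    # -log det X  +  tr(TX)  +  rho * elementwise l1 norm of X
    return -logdet + np.trace(T @ X) + rho * np.abs(X).sum()
```

The $\ell_1$ term is what produces exact zeros in the estimated $X$ and hence a sparse graph.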
Another method for graph structure estimation is based on the information-theoretic approach (Schneidman, Berry, Segev, & Bialek, 2006; Tang et al., 2008; Tatsuno, Fellous, & Amari, 2009). Schneidman et al. (2006) and Tang et al. (2008) considered how to measure the information content of the observed spike train data. They proposed using the information entropy of the firing probability of neurons for measuring the information content of the data. Using real spike train data, they showed that the pairwise interaction between neurons has dominant information. Tatsuno et al. (2009) considered the generative model of the spike train data and modeled the probability of coincident multineuronal firings by a second-order log-linear model,
$$p(x_1, \ldots, x_n) = \frac{1}{Z} \exp\Bigg(\sum_{i} \theta_i x_i + \sum_{i<j} \theta_{ij} x_i x_j\Bigg), \tag{1.2}$$

where $x_i$ is the binary variable representing the state of the firing of neuron $i$; $\theta_i$, $\theta_{ij}$ are the canonical parameters of the exponential model; and $Z$ is a normalization factor. They estimated the parameters $\theta_i$ and $\theta_{ij}$ to identify the neural network structure from the observed spike trains. As in the SICS case, these models assume a symmetric graph structure, up to second-order statistics. Moreover, there are hyperparameters to be determined (Tatsuno et al., 2009).

In this study, based on the random walk model on graphs, we propose a method that can extract only the important connections in a graph without any tuning parameters. Random walk models are often used in graph analyses. For example, Ribeiro and Towsley (2010) proposed using random walk models for sampling nodes and edges from a given graph to calculate network characteristics such as the degree distribution. Sarma, Gollapudi, and Panigrahy (2011) proposed approximating the importance index of a node in the PageRank (Page et al., 1997) scheme by using random walk modeling for a given graph. Although random walk models have been used for sampling from a fixed graph structure, to the best of our knowledge, there has been no attempt to use a random walk model on a graph to estimate the intrinsic graph structure. Although the proposed framework of graph structure estimation is different from SICS, we compare our method with several SICS methods, which are actively studied and considered to offer state-of-the-art accuracy.

The remainder of this letter is organized as follows. In section 2, we define the notion of graph structures and represent dependency as the total intensity of directed and undirected connections between nodes. Section 3 describes the digraph Laplacian, a simple matrix function that characterizes a graph and describes information transitions on it. In section 4, an algorithm for estimating graph structure is proposed. Section 5 compares graph structure estimation with our method and existing methods by using both synthetic and real-world data sets. The last section concludes and suggests directions for future research in this field.

## 2.  Relevant Background and Problem Setting

In this section, we explain the notion of graph structures (Chung, 1997) and introduce the framework for estimating graph structure. Let $V$ be a set of nodes and $E$ be a set of edges, where an edge in $E$ connects a pair of nodes $i$ and $j$. A graph is characterized by the pair $(V, E)$. There are two types of graphs: undirected and directed. When we are interested only in connectivity between nodes, we use undirected graphs, and when we are interested in the directions of edges, we use directed graphs.

Our aim is to estimate the intrinsic graph connectivity when a certain kind of dependency between nodes is observed with noise. The relationship between the intrinsic graph connectivity $W$ and the dependency $T^*$ between nodes is modeled by a function,

$$T^* = f(W). \tag{2.1}$$

The formal definitions of $W$ and $T^*$ are given later. By modeling information transition on a graph by a function $f$, we propose a framework for estimating the intrinsic graph structure from an observation $T$.1 As a concrete model of $f$, we introduce the random walk on graphs, which is characterized by the digraph Laplacian introduced in section 3. We show a conceptual diagram of graph estimation in Figure 1.
Figure 1:

Conceptual diagram of a graph estimation problem. We focus on a problem to estimate edges based on data observed at each node.


Let $W = (w_{ij})$ be an $n \times n$ connectivity matrix, where $n$ is the number of nodes, with $w_{ij} \geq 0$, $i, j = 1, \ldots, n$, which represents the connection intensity from node $i$ to node $j$; $w_{ij}$ can be zero if there is no edge between $i$ and $j$. For the example of stock prices, $w_{ij}$ represents direct connectivity between companies. We can deal with both undirected and directed graphs, but in this study, we mainly consider the directed graph, that is, generally $w_{ij} \neq w_{ji}$. In our representation, diagonal elements are always set to 0, that is, we assume $W$ does not include self-loops. We emphasize that the connectivity in this study indicates the direction and intensity of information transition via the edge, unlike graphical models (Pearl, 2000) in which edges represent causality.

Since we cannot observe $W$ directly, we introduce another parameter, which represents the dependent relationship between nodes. Let $T^* = (t^*_{ij})$ be an $n \times n$ dependency matrix with $t^*_{ij} \geq 0$, $i, j = 1, \ldots, n$, which can be represented as

$$T^* = a_0 I_n + a_1 W + a_2 W^2 + a_3 W^3 + \cdots, \tag{2.2}$$

where $a_0, a_1, a_2, \ldots$ are attenuation coefficients and $I_n$ is the unit matrix. We show the relationship between the parameters $W$ and $T^*$ in Figure 2. The first term of equation 2.2 represents external inputs. The second term represents direct information propagation between nodes $i$ and $j$, while the third term represents information propagation via one intermediate node $k$, and so on. A concrete example of equation 2.2 is shown in equation 3.11. For stock prices, $t^*_{ij}$ is supposed to be a certain quantity that represents how one company's stock price affects another company's stock price.
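A truncated version of the series in equation 2.2 is easy to compute numerically. The following sketch (our own illustration, with hypothetical coefficient values) shows how the second- and third-order terms pick up direct edges and two-step paths, respectively:

```python
import numpy as np

def dependency_matrix(W, coeffs):
    """Truncated polynomial model of equation 2.2:
    T* = a0 * I + a1 * W + a2 * W^2 + ...  (one term per entry of coeffs)."""
    n = W.shape[0]
    T_star = np.zeros((n, n))
    Wk = np.eye(n)              # W^0
    for a in coeffs:
        T_star += a * Wk
        Wk = Wk @ W             # next power of W
    return T_star

# Chain graph 0 -> 1 -> 2: node 0 reaches node 2 only through node 1.
W = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])
T_star = dependency_matrix(W, [1.0, 0.5, 0.25])
```

Here `T_star[0, 1] = 0.5` comes from the direct edge, while `T_star[0, 2] = 0.25` arises purely from the two-step path through node 1: a dependency appears between nodes that share no direct edge, which is exactly the spurious-correlation effect described above.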
Figure 2:

Relationship between $W$ and $T^*$. The superficial dependency is composed of the intrinsic connectivity between various nodes.


As shown in equation 2.2, $T^*$ includes the information propagating through various paths, and direct relationships between nodes are not clear. Therefore, we propose a framework for estimating the connectivity matrix $W$ from $T^*$. As described before, $W$ represents the essential structure of the graph. In principle, we could estimate $W$ from $T^*$ by taking the inverse of $f$. Therefore, we have to choose an $f$ that has an inverse $f^{-1}$. For computational and theoretical simplicity, we later propose using the matrix exponential map as the function $f$.
In algorithm 1, we show a general procedure for estimating $W$ when $f$ and the off-diagonal elements of $T^*$ are given. Algorithm 1 is the general framework of the proposed graph structure estimation; an algorithm with a concrete model for $f$ is given in algorithm 2. Since autodependency is not well defined in terms of equation 2.2, we consider the diagonal elements $t^*_{ii}$ as latent variables and replace them with an appropriate initial value. That is, the missing diagonal elements of $T^*$ are complemented by 0, and we then replace the diagonal elements with $r > 0$ so that all of the eigenvalues of the resulting matrix $\tilde{T}^*$ become positive.2 Then we obtain $\tilde{W} = f^{-1}(\tilde{T}^*)$. We impose certain constraints on $\tilde{W}$. One of the constraints is that the diagonal elements should be 0. In section 3, as an additional constraint, we restrict the maximum number of nonzero elements of $\tilde{W}$. Let $\mathcal{W}$ be the subspace of connectivity matrices composed of elements that satisfy these constraints. We denote the projection of $\tilde{W}$ to the constrained space by

$$\hat{W} = \operatorname{proj}_{\mathcal{W}}(\tilde{W}). \tag{2.3}$$

Thus, we obtain $\hat{T}^* = f(\hat{W})$. By using the diagonal elements of $\hat{T}^*$, we update $\tilde{T}^* \leftarrow \tilde{T}^* - \operatorname{diag}(\tilde{T}^*) + \operatorname{diag}(\hat{T}^*)$, where $\operatorname{diag}(M)$ is a matrix-valued function for a square matrix $M$ that replaces the off-diagonal elements of $M$ by 0. We denote this updating as a map from $\mathcal{T}$ to $\mathcal{T}'$:

$$g: \mathcal{T} \to \mathcal{T}', \qquad g(\tilde{T}^*) = \tilde{T}^* - \operatorname{diag}(\tilde{T}^*) + \operatorname{diag}\big(f(\hat{W})\big), \tag{2.4}$$

where $\mathcal{T}'$ denotes the subspace of $\mathcal{T}$ composed of elements whose off-diagonal elements are equal to those of $T^*$. We can estimate $W$ from the off-diagonal elements of $T^*$ by repeating the above procedure. A conceptual diagram of the general procedure for estimating $W$ from $T^*$ is shown in Figure 3.
Figure 3:

Conceptual diagram of the general procedure for estimating $W$.


## 3.  Model of Graph Structure

We propose using the exponential map as the function $f$ in equation 2.1 for theoretical and computational simplicity. Then we can simply obtain the inverse of $f$ as the matrix logarithm. In this section, we explain the digraph Laplacian (Li & Zhang, 2010) for considering the exponential map of directed graphs. We explain the notion of a random walk on a graph characterized by the digraph Laplacian and represent information propagation on a graph by a random walk.

### 3.1.  Digraph Laplacian.

The digraph Laplacian (Li & Zhang, 2010) is a function of a graph $(V, E)$. First, we consider a directed graph and assume the connectivity matrix $W$ to satisfy

$$w_{ij} \in \{0, 1\}, \quad i, j = 1, \ldots, n, \tag{3.1}$$

where such a $\{0,1\}$-valued $W$ is known as an adjacency matrix in the literature of graph theory. This assumption is reasonable when the connectivity takes two distinct values that represent the states "connected" and "unconnected." We expect that the estimation of $W$ becomes more robust to observation noise by this assumption.
Next, we define a digraph Laplacian $L$ as a function of $W$ as

$$L = D - W, \qquad D = \operatorname{diag}\Bigg(\sum_{j=1}^{n} w_{1j}, \ldots, \sum_{j=1}^{n} w_{nj}\Bigg). \tag{3.2}$$

From equation 3.2, all row-wise sums of $L$ are 0s, that is, $L\mathbf{1} = \mathbf{0}$, where $\mathbf{1}$ and $\mathbf{0}$ are $n$-dimensional vectors whose elements are all 1s and 0s, respectively.
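In code, the digraph Laplacian of equation 3.2 and its zero row-sum property look like this (a sketch, with numpy assumed):

```python
import numpy as np

def digraph_laplacian(W):
    """Digraph Laplacian of equation 3.2: L = D - W, where the diagonal
    matrix D holds the row-wise sums of the connectivity matrix W."""
    return np.diag(W.sum(axis=1)) - W

# An asymmetric (directed) connectivity matrix with no self-loops.
W = np.array([[0., 1., 1.],
              [0., 0., 1.],
              [1., 0., 0.]])
L = digraph_laplacian(W)
```

Every row of `L` sums to 0 ($L\mathbf{1} = \mathbf{0}$), but for a directed graph the column sums generally do not vanish and `L` is not symmetric.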
The graph Laplacian is defined as a special case of the digraph Laplacian with a symmetric connectivity matrix $W$:

$$w_{ij} = w_{ji}, \quad i, j = 1, \ldots, n. \tag{3.3}$$

The definition of the Laplacian matrix $L$ is the same as equation 3.2, and $L$ also becomes symmetric. By definition, all row-wise and column-wise sums of $L$ are 0s—that is, $L\mathbf{1} = \mathbf{0}$ and $\mathbf{1}^\top L = \mathbf{0}^\top$. The graph Laplacian plays a central role in spectral graph theory (Chung, 1997) and is utilized for various purposes of data analysis such as clustering (Shi & Malik, 2000; Belkin & Niyogi, 2001), manifold learning (Sahbi, Etyngier, Audibert, & Keriven, 2008), and web page ranking (Page et al., 1997; Kleinberg, 1999).

### 3.2.  Random Walk on Graph.

In this section, we consider a random walk on a graph characterized by $L$ (Kondor & Lafferty, 2002). Let $w_{ij}$ be the transition probability from node $i$ to node $j$ in a unit time. We focus on the transition probability in a short interval $\Delta t$. The transition probability matrix for this interval is written as

$$P_{\Delta t} = I_n - \Delta t\, L, \tag{3.4}$$

where $I_n$ is the unit matrix. Note that $P_{\Delta t}\mathbf{1} = \mathbf{1}$ holds. The element $\Delta t\, w_{ij}$ is the transition probability from node $i$ to $j$ in the interval $\Delta t$, and $1 - \Delta t \sum_{j \neq i} w_{ij}$ is the probability that a random walker remains in node $i$. We show a conceptual diagram of the transition probability that is represented by the matrix $P_{\Delta t}$ in Figure 4.
Figure 4:

Transition probability on a graph with the connectivity $W$.


Let $e_i$ be an $n$-dimensional unit vector defined as

$$e_i = (0, \ldots, 0, 1, 0, \ldots, 0)^\top, \tag{3.5}$$

where only the $i$th element is 1. The $j$th element of $e_i^\top P_{\Delta t}$ gives the probability of a one-step transition from node $i$ to $j$. In the same way, $e_i^\top P_{\Delta t}^k$ (i.e., the $i$th row of $P_{\Delta t}^k$) gives the $k$-step transition probability vector starting from node $i$. By considering the continuous time limit of the random walk, we define the transition probability in a unit time as

$$P = \lim_{\Delta t \to 0} \big(I_n - \Delta t\, L\big)^{1/\Delta t} = e^{-L}. \tag{3.6}$$

The digraph Laplacian characterizes a random walk on a directed graph, and the $ij$ element of the matrix $e^{-L}$ is regarded as the probability that the walker starting from node $i$ stays at node $j$ after a unit time. When $L$ is the graph Laplacian, the right-hand side of equation 3.6 is called the diffusion kernel (Kondor & Lafferty, 2002).
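The matrix exponential of equation 3.6 can be computed directly with scipy. Since $L\mathbf{1} = \mathbf{0}$, the resulting matrix is row-stochastic, that is, a valid transition probability matrix (a sketch under the $\{0,1\}$ connectivity assumption):

```python
import numpy as np
from scipy.linalg import expm

W = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])            # directed 3-cycle
L = np.diag(W.sum(axis=1)) - W          # digraph Laplacian, equation 3.2
P = expm(-L)                            # unit-time transition matrix, equation 3.6
```

Each row of `P` sums to 1 and all entries lie in $[0, 1]$; the $ij$ element is the probability that a walker starting at node $i$ sits at node $j$ after a unit time.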

### 3.3.  Observed Data Model.

In this section, we introduce a generative model of the observed data using the exponential map defined in equation 3.6. We show the relationship between variables in Figure 5.

Figure 5:

Relationship between variables. The dependency matrix $T^*$ is obtained by mapping the connectivity matrix $W$. The quantity $T = (t_{ij})$ is the observed dependency with noise $E$. We assume that $W$ is a $\{0,1\}$-valued matrix and approximate $T$ using the random walk.


In our model, $T^*$ is composed of the intrinsic connectivity $W$ modeled by the function $f$ in equation 2.1. Here we assume that the dependency is observed with an additive noise as

$$T = T^* + E, \tag{3.7}$$

where $E = (e_{ij})$ represents a noise term. Specifically, we consider $t^*_{ij}$ as the expected value of $t_{ij}$, and $T$ as an $n \times n$ matrix with the $ij$ element $t_{ij}$. That is, the quantity $t_{ij}$ is characterized by the dependency $t^*_{ij}$ with a certain probability model, and $T$ is observed instead of $T^*$. We can assume an arbitrary distribution for $t_{ij}$, but for simplicity, we assume the $t_{ij}$ are samples from gaussian distributions with means $t^*_{ij}$. The probability density function of the observed data $t_{ij}$ is written as

$$p(t_{ij}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\bigg(-\frac{(t_{ij} - t^*_{ij})^2}{2\sigma^2}\bigg), \tag{3.8}$$

where $\sigma^2$ is a common variance. The joint distribution of the observed data $T$ can be expressed as

$$p(T) = \prod_{i \neq j} p(t_{ij}). \tag{3.9}$$

We observe only the dependency with noise, though we can estimate the connectivity $W$ by the algorithm proposed in the next section. Figure 5 shows that the intrinsic connectivity is simplified to a $\{0,1\}$-valued matrix, and then the digraph Laplacian is calculated. The intrinsic connectivity leads to dependency via the function $f$. We approximate the dependency using the random walk, and we assume a matrix $T$ is observed as affected by additive noises.
With a multiplicative factor $\alpha$ and a scale parameter $\beta$ for the exponent in equation 3.6, the random walk on a graph is equivalent to the polynomial transformation model in equation 2.2, and we obtain

$$T^* = \alpha\, e^{-\beta L} = \alpha \sum_{k=0}^{\infty} \frac{(-\beta)^k}{k!} L^k \tag{3.10}$$

by a Taylor series expansion. The off-diagonal element ($i \neq j$) of $T^*$ is written as

$$t^*_{ij} = \alpha \sum_{k=1}^{\infty} \frac{(-\beta)^k}{k!} \big(L^k\big)_{ij} = \alpha\,\beta\, w_{ij} + \alpha \sum_{k=2}^{\infty} \frac{(-\beta)^k}{k!} \big(L^k\big)_{ij}, \tag{3.11}$$

where we used $l_{ij} = -w_{ij}$ for $i \neq j$ and the fact that the $k=0$ term vanishes off the diagonal. The last expression of equation 3.11 corresponds to equation 2.2,

$$T^* = f(W; \alpha, \beta), \tag{3.12}$$

where we extended the function $f$ to have the additional parameters $\alpha$ and $\beta$ of the random walk in equation 3.10. We can see the relationship between equations 2.2 and 3.11, that is, $a_0$ corresponds to $\alpha$, and $a_k$ corresponds to $\alpha(-\beta)^k/k!$ (with the powers of $W$ replaced by powers of $L$). Since $t^*_{ij} = \big(f(W; \alpha, \beta)\big)_{ij}$ by equation 3.12,

$$t_{ij} = \alpha \big(e^{-\beta L}\big)_{ij} + e_{ij}, \qquad e_{ij} \sim \mathcal{N}(0, \sigma^2), \tag{3.13}$$

where $\sigma^2$ is the variance of the gaussian noise for the observation. We have to calculate the inverse function of $f$ in equation 2.1 to estimate a graph structure $W$. Since $f$ is modeled by the exponential map $\alpha e^{-\beta L}$, we only have to calculate the matrix logarithm $\log(T^*/\alpha)$ by using fixed $\alpha$ and $\beta$.
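In the noiseless case, this inversion is an exact round trip: the matrix logarithm of $T^*/\alpha$ recovers $-\beta L$, and hence $W$. A sketch (the values of $\alpha$ and $\beta$ are arbitrary illustrations):

```python
import numpy as np
from scipy.linalg import expm, logm

alpha, beta = 2.0, 0.7                  # illustrative parameter values
W = np.array([[0., 1., 1.],
              [0., 0., 1.],
              [0., 0., 0.]])
L = np.diag(W.sum(axis=1)) - W          # digraph Laplacian
T_star = alpha * expm(-beta * L)        # noiseless dependency, equation 3.13

M = logm(T_star / alpha).real           # equals -beta * L
W_rec = M / beta                        # off-diagonal: (-beta * l_ij)/beta = w_ij
np.fill_diagonal(W_rec, 0.0)            # the diagonal carries -d_i, not part of W
```

With noisy observations the logarithm no longer reproduces $W$ exactly, which is why the thresholding and refitting steps of section 4 are needed.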

## 4.  Algorithm for Estimating Graph Structure

In this section, we propose an iterative algorithm for estimating the parameters $W$, $\alpha$, and $\beta$ of the proposed model in equation 3.13 from an observed matrix $T$. This procedure is summarized in algorithm 2. Since we assume the observation noises are gaussian, we consider the objective function as the sum of squared residuals,

$$\sum_{i \neq j} \Big(t_{ij} - \alpha \big(e^{-\beta L}\big)_{ij}\Big)^2, \tag{4.1}$$

and estimate $W$, $\alpha$, and $\beta$ that minimize equation 4.1. In the noiseless case, the $ij$ element of the logarithm of $T/\alpha$ is

$$\Big(\log \frac{T}{\alpha}\Big)_{ij} = -\beta\, l_{ij} = \beta\, w_{ij}, \quad i \neq j. \tag{4.2}$$

We can estimate $w_{ij}$ as

$$\hat{w}_{ij} = \frac{1}{\beta} \Big(\log \frac{T}{\alpha}\Big)_{ij}, \quad i \neq j. \tag{4.3}$$

We note that the diagonal elements of $T$ are not defined, and we update them while updating the parameters $W$, $\alpha$, and $\beta$ using the following iterative algorithm.
First, we initialize $\tilde{T}$ as

$$\tilde{t}_{ij} = \begin{cases} t_{ij}, & i \neq j, \\ r, & i = j, \end{cases} \tag{4.4}$$

where $r$ is chosen so that all eigenvalues of $\tilde{T}$ become positive. Next, we calculate the logarithm of $\tilde{T}$. For fixed $\alpha$ and $\beta$, we calculate an approximation of the matrix logarithm using the Jordan normal form (Higham, 2008) of $\tilde{T}$ to obtain $\log(\tilde{T}/\alpha)$, which corresponds to procedure A in algorithm 1. The maximum possible number of edges $m$ is incremented by a value of 1 starting from $m = 1$ in the iteration of the algorithm. Let $h_m$ be the $m$th largest value of $\big(\log(\tilde{T}/\alpha)\big)_{ij}$ for $i \neq j$. Then we set

$$\hat{w}_{ij} = \begin{cases} 1, & \big(\log(\tilde{T}/\alpha)\big)_{ij} \geq h_m, \\ 0, & \text{otherwise}. \end{cases} \tag{4.5}$$

By this thresholding operation, we choose the $m$ most likely connected edges. We estimate $\alpha$ and $\beta$ that minimize the objective function in equation 4.1 with $\hat{W}$,
$$(\hat{\alpha}, \hat{\beta}) = \operatorname*{arg\,min}_{\alpha, \beta} \sum_{i \neq j} \Big(t_{ij} - \alpha \big(e^{-\beta \hat{L}}\big)_{ij}\Big)^2, \tag{4.6}$$

and obtain

$$\hat{T}^* = \hat{\alpha}\, e^{-\hat{\beta} \hat{L}}, \tag{4.7}$$

where $\hat{L}$ is the digraph Laplacian of $\hat{W}$. We used the Nelder-Mead method (Lagarias, Reeds, Wright, & Wright, 1998) for optimizing $\alpha$ and $\beta$. The procedures shown in equations 4.6 and 4.7 correspond to procedure B in algorithm 1. We update the diagonal elements of $\tilde{T}$ by

$$\tilde{t}_{ii} \leftarrow \hat{t}^*_{ii}, \quad i = 1, \ldots, n, \tag{4.8}$$

which corresponds to procedure D in algorithm 1. For a fixed $m$, we repeat these procedures $c$ times. The stopping criterion of the inner loop can be defined by the gap between the previous and current update of the estimate, but for simplicity, we set the maximum number of iterations $c$. In our preliminary experiments, we observed that the estimate converged in at most two iterations; hence, we set $c = 2$ in our experiments. We denote the estimates obtained by these procedures as $\hat{W}^{(m)}$, $\hat{\alpha}^{(m)}$, and $\hat{\beta}^{(m)}$.
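The thresholding step of equation 4.5 keeps the $m$ largest off-diagonal entries of the matrix logarithm as edges. A minimal sketch (ties at the threshold may admit more than $m$ edges; handling them is left implicit here):

```python
import numpy as np

def threshold_top_m(S, m):
    """Equation 4.5: set w_ij = 1 for the m largest off-diagonal
    entries of S = log(T~ / alpha), and 0 elsewhere."""
    n = S.shape[0]
    off = ~np.eye(n, dtype=bool)             # mask of off-diagonal positions
    h_m = np.sort(S[off])[::-1][m - 1]       # the m-th largest off-diagonal value
    return ((S >= h_m) & off).astype(float)
```

The diagonal is excluded from both the ranking and the output, matching the no-self-loop constraint on $W$.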
We increase $m$ to $m + 1$ and repeat algorithm 2. After we obtain a set of estimates, $\{(\hat{W}^{(m)}, \hat{\alpha}^{(m)}, \hat{\beta}^{(m)})\}_{m \in S}$ with $S = \{1, \ldots, n(n-1)\}$, we select $\hat{m}$ as

$$\hat{m} = \operatorname*{arg\,min}_{m \in S} \sum_{i \neq j} \Big(t_{ij} - \hat{\alpha}^{(m)} \big(e^{-\hat{\beta}^{(m)} \hat{L}^{(m)}}\big)_{ij}\Big)^2, \tag{4.9}$$

and obtain the estimate $(\hat{\alpha}^{(\hat{m})}, \hat{\beta}^{(\hat{m})})$, which leads to the estimate of the intrinsic connectivity $\hat{W} = \hat{W}^{(\hat{m})}$. We note that when we deal with undirected graphs, $S = \{1, \ldots, n(n-1)/2\}$; the size of $S$ is reduced to $n(n-1)/2$.
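Putting the pieces together, the overall procedure can be sketched as follows. This is a simplified illustration, not the authors' reference implementation: it omits the inner diagonal-update loop, fixes $r = 1$, and scores every candidate $m$ by the residual of equation 4.9.

```python
import numpy as np
from scipy.linalg import expm, logm
from scipy.optimize import minimize

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

def estimate_graph(T, r=1.0, max_edges=None):
    """Simplified sketch of algorithm 2: for each candidate edge count m,
    threshold the matrix logarithm (equation 4.5), refit (alpha, beta) by
    Nelder-Mead (equation 4.6), and keep the best fit (equation 4.9)."""
    n = T.shape[0]
    off = ~np.eye(n, dtype=bool)
    T_tilde = T.copy()
    np.fill_diagonal(T_tilde, r)               # equation 4.4
    S = logm(T_tilde).real                     # procedure A: matrix logarithm
    best = None
    for m in range(1, (max_edges or n * (n - 1)) + 1):
        h_m = np.sort(S[off])[::-1][m - 1]
        W_hat = ((S >= h_m) & off).astype(float)
        L_hat = laplacian(W_hat)
        def residual(p):                       # equation 4.1 restricted to W_hat
            a, b = p
            return np.sum((T[off] - a * expm(-b * L_hat)[off]) ** 2)
        fit = minimize(residual, x0=[1.0, 1.0], method="Nelder-Mead")
        if best is None or fit.fun < best[0]:
            best = (fit.fun, W_hat, fit.x)
    return best[1], best[2]                    # (W_hat, (alpha, beta))
```

Because every candidate $m$ is scored by the same residual, no separate regularization parameter is needed: the sparsity level is selected by the fit itself.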

## 5.  Experiments

In this section, we show experimental results of graph structure estimation. In section 5.1, we focus on the estimation of undirected graphs. We compare results of the proposed method with results of SICS methods. Since the generative models of the proposed framework and the SICS framework are different, we estimate graph structures from observed data according to each generative model. Observed data are generated from the random walk in section 5.1.1. This condition is faithful to the proposed framework. Observed data are generated by the multidimensional gaussian distribution in section 5.1.2. This condition is faithful to the SICS framework. In section 5.2, we estimate directed graph structures using the proposed method only because SICS methods cannot be applied to directed graph structure estimation. In section 5.3, we apply the proposed method and SICS methods to real-world financial data sets and compare the results of these methods. As representative SICS methods, we adopt glasso (Friedman et al., 2007), QUIC (Hsieh et al., 2011), and LOREC (Luo, 2011). We use StARS (stability approach to regularization selection; Liu, Roeder, & Wasserman, 2010) to find an appropriate regularization parameter $\rho$ in SICS methods.

### 5.1.  Artificial Data of Undirected Graphs.

First, we estimate symmetric graph structures using both the proposed method and SICS methods and compare their performances. We set the number of nodes as n=20.

#### 5.1.1.  Data in Accordance with the Random Walk.

We compare the accuracy of different methods on data generated in accordance with equation 3.13, the random walk on the graph. To see the relationship between the accuracy and the density of the ground-truth graph $W$, we first define the density of $W$ as the probability $p = P(w_{ij} = 1)$ for $i \neq j$. A density is a value from 0 to 1. We use $W$ with different densities (0.2, 0.4, and 0.6) as case studies. The definition of accuracy is provided later. Next, we generate the observed data $T$ by

$$T = \alpha\, e^{-\beta L} + E, \tag{5.1}$$

where $\alpha$ and $\beta$ are randomly generated from a uniform distribution over $[0, 1]$.
In SICS methods, the diagonal elements of $T$ are set to a common value $r$ in the same way as equation 4.4. In all our experiments, $r$ is fixed at 1, and it always made the eigenvalues of $\tilde{T}$ positive. We regard $T$ as the covariance matrix of a gaussian distribution and solve the regularized gaussian MLE problem:

$$\min_{X \succ 0}\; -\log\det X + \operatorname{tr}(TX) + \rho\,\|X\|_1. \tag{5.2}$$

Then, to obtain $\{0,1\}$-valued connectivity, we transform the estimated $\hat{X} = (\hat{x}_{ij})$ by the following thresholding operation:

$$\hat{w}_{ij} = \begin{cases} 1, & \hat{x}_{ij} \neq 0 \text{ and } i \neq j, \\ 0, & \text{otherwise}. \end{cases} \tag{5.3}$$

We use StARS to find the regularization parameter $\rho$. StARS finds an appropriate value of $\rho$ based on stability estimated by random subsampling. In our study, the observed data $T$ are generated 100 times according to the assumed generative model for calculating stabilities in StARS.
The proposed method and SICS methods return the estimated graph structure $\hat{W}$ from the observed data $T$. In Figure 6, we show the estimation results, where the vertical axis and horizontal axis indicate the accuracy and the size of the observed noise, respectively. We define the accuracy of estimation as

$$\text{accuracy} = \frac{1}{n(n-1)} \sum_{i \neq j} \mathbb{I}(\hat{w}_{ij} = w_{ij}), \tag{5.4}$$

where $\hat{w}_{ij}$ is the $ij$ element of the estimated connectivity matrix and $\mathbb{I}(\cdot)$ is the indicator function. We also define the size of the observed noise as

$$\text{noise size} = \frac{\sigma}{\bar{t}^*}, \qquad \bar{t}^* = \frac{1}{n(n-1)} \sum_{i \neq j} t^*_{ij}, \tag{5.5}$$

that is, the size of noise is the coefficient of variation. Figures 6a to 6c are accuracies obtained by different methods when the densities of $W$ are 0.2, 0.4, and 0.6.
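The accuracy of equation 5.4 is simply the fraction of off-diagonal entries on which the estimated and ground-truth adjacency matrices agree:

```python
import numpy as np

def accuracy(W_hat, W_true):
    """Equation 5.4: fraction of correctly estimated off-diagonal entries."""
    off = ~np.eye(W_true.shape[0], dtype=bool)
    return np.mean(W_hat[off] == W_true[off])
```

For an $n$-node directed graph there are $n(n-1)$ off-diagonal entries, so a single wrong edge on a 3-node graph costs $1/6$ of the accuracy.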
Figure 6:

Accuracies of undirected graph structure estimations with different methods. Densities of the ground-truth graphs are (a) 0.2, (b) 0.4, and (c) 0.6. Error bars are standard deviations.


In Figure 6, we can see that the accuracies of SICS methods decrease as the densities increase. The proposed method is shown to obtain high accuracies in most cases.

In Figure 7, to see the effect of the regularization parameter in SICS methods, we show the average estimation accuracies of the proposed method and the glasso method with various regularization parameters. The vertical axis and horizontal axis of the graphs indicate the accuracy and the regularization parameter $\rho$, respectively. In this study, we fixed the density of $W$ to 0.2, but we observed a similar tendency for other densities. We prepared two levels of noise: 0.006 and 0.06. The proposed method has no tuning parameter, in contrast to SICS methods. Figure 7 shows that misspecification of the regularization parameter of glasso leads to inaccurate estimation results. In this study, the proposed method achieved almost the same accuracy as the optimal case of glasso.

Figure 7:

Averaged accuracies with 1 standard deviation error bars of undirected graph structure estimation. Dashed lines show the results of glasso using various regularization parameters, and solid lines show those of the proposed method with different noise levels: (a) 0.006 and (b) 0.06. The density of the ground-truth graph is fixed to 0.2. The observed matrix is generated 100 times in each setting.


#### 5.1.2.  Data Generated from the Multidimensional Gaussian Distribution.

In this section, we compare the accuracy of different methods on data generated from multidimensional gaussian distributions. We make the precision matrix $\Lambda$ from the ground-truth graph as $\Lambda = L + r I_n$, where $L$ is the graph Laplacian of $W$ and $r > 0$ guarantees positive definiteness. We sample $x_1, \ldots, x_N$ from the multidimensional gaussian distribution $\mathcal{N}(0, \Lambda^{-1})$, where $N$ is the number of samples. We consider the empirical covariance matrix of $x_1, \ldots, x_N$ as an observation:

$$T = \frac{1}{N} \sum_{k=1}^{N} x_k x_k^\top. \tag{5.6}$$
In Figure 8a, we show the relationship between the accuracy and the number of samples N when the density of is fixed to 0.2 to investigate how the performance is influenced by the number of samples. In Figure 8b, we show the accuracy when N=200 to investigate how the performance is influenced by the connection density.
Figure 8:

Accuracies of undirected graph structure estimations with different methods. (a) Density of the ground-truth graph is set to 0.2. The numbers of observed data are 200, 500, 1000, and 10,000. (b) Number of observed data is set to 200. Densities of the ground-truth graphs are 0.2, 0.4, and 0.6. Error bars are standard deviations.


From Figure 8a, we see that when the data are generated from a multivariate gaussian with the inverse covariance matrix constructed from the ground-truth graph, the proposed method accurately estimates that graph. From Figure 8b, we can see that the accuracy of every method decreases as the density increases. However, the accuracy of the proposed method decreases more slowly than that of the other methods and remains above 60% even when the density reaches 0.6.

There are two possible reasons for the better accuracies achieved by the proposed method shown in Figures 6 and 8. The first is the essential difference of the underlying models of the proposed and SICS frameworks. While the SICS framework considers only the second-order statistics, the proposed framework takes higher-order statistics into account, as shown in equations 3.10 and 3.11. The second reason is the difference of the regularization methods. We adopt StARS for defining the regularization parameter in SICS methods, while algorithm 2 has a mechanism to automatically determine an appropriate sparsity based on the fitness of the model.

### 5.2.  Artificial Data of Directed Graphs.

In this section, we estimate directed graphs represented by asymmetric adjacency matrices $W$. Since SICS methods cannot be used for estimating directed graphs, we use only the proposed method. We prepare several $W$ with densities 0.2, 0.4, and 0.6 and randomly generate $\alpha$ and $\beta$ from a uniform distribution over $[0, 1]$. We set the number of nodes to $n = 20$. We generate the observed data $T$ according to equation 3.13 in the same manner as in the experiments on undirected graphs. We show the relationship between the accuracy, defined by equation 5.4, and the size of the observed noise, defined by equation 5.5, in Figure 9. As shown in Figures 6 and 9, the proposed method is able to estimate the directed graph structure as well as it does the undirected one. The accuracy of estimation deteriorates as the density of the ground-truth graph and the size of noise increase. This tendency is observed in the experiments with both directed and undirected graphs.

Figure 9:

Averages of accuracies of directed graph structure with respect to different densities. The densities of the ground-truth graphs are 0.2, 0.4, and 0.6. Error bars are standard deviations. The observed matrix is generated 100 times in each setting.


Sections 5.1 and 5.2 are summarized as follows. We performed experiments with artificial data and showed that the proposed method is able to estimate the graph structures more accurately than the conventional SICS methods in most cases. The proposed method, which has no tuning parameter, achieved comparable or superior accuracies to the finely tuned glasso. Also, the proposed method is applicable to directed graphs, while the conventional SICS methods are applicable only to undirected graphs.

### 5.3.  Real-World Data.

In this section, we apply the proposed method and SICS methods to analyze a set of time-series data of stock prices downloaded from Yahoo! finance.3 First, we consider an undirected graph with nodes corresponding to companies to compare the results of the proposed method with the results of SICS methods. Next, we consider a directed graph to analyze the relationship of the companies in more detail by the proposed method. The data set is composed of the daily closing prices of 10 companies from November 11, 2010, to July 11, 2011, which is equivalent to 158 business days. We show the data in Figure 10. We are interested in how the relationships between companies changed around the day of the Great East Japan Earthquake on March 11, 2011.

Figure 10:

Stock prices of several companies. The vertical lines that go across all graphs represent the day of the Great East Japan Earthquake on March 11, 2011.


#### 5.3.1.  Undirected Graphs.

The edges of the graph represent a certain kind of relationship between the stock prices of companies. For example, if the stock prices of companies i and j behave similarly, it is natural to consider that there is an edge between these companies. We estimate the undirected connectivity matrix from an observation. Let Si(t) be the stock price of company i at time t. Since we focus on the existence of connections and do not consider whether connections are positive or negative, we consider the absolute value of the relative change rate Ri(t)=|Si(t+1)−Si(t)|/Si(t). We assume that virtual random walkers in the stock market carry uptick and downturn factors. They transit between nodes at rates determined by the intensities of the connections, and we assume that the rises and declines of the stock prices of companies can be explained by the transitions of these random walkers.

Given the stock prices of each company at each time tick, we first convert the prices at times t and t+1 into the change rates Ri(t) and Rj(t). The degree of synchrony of companies i and j is measured by using Kendall's correlation (Kendall & Gibbons, 1990):

Tij = 2/(n(n−1)) Σ_{s<t} sgn(Ri(s)−Ri(t)) sgn(Rj(s)−Rj(t)),    (5.7)

where n is the number of time points and sgn(·) denotes the sign function.
There is no need to assume a specific distribution on Ri(t) when we use Kendall's correlation. Using the observation matrix T defined by equation 5.7, we estimate connections between companies by the proposed method and SICS methods.
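As a sketch of this pipeline (not the authors' code; the three short price series and the helper names are invented for illustration), the observation matrix of equation 5.7 can be computed as follows:

```python
import itertools

def sign(v):
    return (v > 0) - (v < 0)

def kendall_tau(x, y):
    """Kendall's rank correlation between two equal-length sequences."""
    n = len(x)
    total = sum(sign(x[s] - x[t]) * sign(y[s] - y[t])
                for s, t in itertools.combinations(range(n), 2))
    return 2.0 * total / (n * (n - 1))

def change_rates(prices):
    """Absolute relative change rate R_i(t) = |S_i(t+1) - S_i(t)| / S_i(t)."""
    return [abs(prices[t + 1] - prices[t]) / prices[t]
            for t in range(len(prices) - 1)]

def observation_matrix(price_series):
    """Symmetric observation T with T[i][j] = tau(R_i, R_j), as in eq. 5.7."""
    rates = [change_rates(s) for s in price_series]
    k = len(rates)
    return [[kendall_tau(rates[i], rates[j]) for j in range(k)]
            for i in range(k)]

# Invented daily closing prices for three hypothetical companies.
prices = [
    [100.0, 102.0, 101.0, 105.0, 104.0],
    [50.0, 51.5, 50.9, 53.0, 52.4],
    [200.0, 199.0, 203.0, 202.0, 208.0],
]
T = observation_matrix(prices)  # 3 x 3, symmetric, diagonal entries equal 1
```

With the real data, each list would hold the 158 daily closing prices of one company; the quadratic-time helper could be replaced by scipy.stats.kendalltau.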

Ten large electronic companies (Sony, Panasonic, Sharp, Toshiba, Google, IBM, Apple, Microsoft, LG, and Samsung) are chosen and their daily closing stock prices Si(t) before and after the Great East Japan Earthquake (November 11, 2010–March 11, 2011 and March 11, 2011–July 11, 2011) are used to construct observation matrices. Figures 11 and 12 are the estimated graph structures before and after the earthquake, respectively.

Figure 11:

The estimated graphs correspond to before the Great East Japan Earthquake occurred. The data set is composed of daily closing prices from November 11, 2010, to March 11, 2011.


Figure 12:

The estimated graphs correspond to after the Great East Japan Earthquake occurred. The data set is composed of daily closing prices from March 11, 2011, to July 11, 2011.


From Figures 11 and 12, only a few edges were detected by the SICS methods, whose sparsity level was selected by StARS. From the results of the SICS methods, we cannot see any change in the relationship between companies around the day of the earthquake. Since the proposed method outperformed the SICS methods on the artificial data sets, its result can be considered more reliable. In addition, the proposed method reveals an interesting relationship between companies. From Figure 11a, we can see that only the proposed method extracts mutual connections between Japanese companies (Sony, Panasonic, and Toshiba) and U.S. companies (Apple, IBM, and Google). Comparing Figures 11a and 12a, the Japanese and U.S. companies were mutually connected before the earthquake, but the companies of the two countries became isolated from each other after it occurred. From this result, it is conceivable that the stock prices of the Japanese companies showed similar behavior, and this behavior was distinct from that of the U.S. companies.

#### 5.3.2.  Directed Graphs.

Finally, we estimated a directed graph representing the relationship between companies. To consider the direction of the effect from company i to company j, we calculate the time-shifted variant of Ri as R+i(t)=Ri(t+1) and define the observation T by

Tij = τ(Ri, Rj) + τ(R+i, Rj),    (5.8)

where τ(·, ·) denotes Kendall's correlation as used in equation 5.7.
By this definition of observation, effects both with and without delay can be taken into account, and the observation matrix T becomes asymmetric.

We show the estimated directed graph structures in Figure 13. Figures 13a and 13b show connectivity different from the structures in Figures 11a and 12a. Comparing the post-earthquake structures in Figures 13b and 12a, the connectivity is similar when directions are ignored, but directed dependencies between companies can be seen in Figure 13b.

Figure 13:

The estimated directed graphs correspond to estimates before and after the Great East Japan Earthquake. Both graphs are estimated by the proposed method.


## 6.  Conclusion

In this study, we proposed a method for estimating graph structures using the dependency between nodes. Since only the intrinsic connectivity of a graph is important in many cases, we represented a graph structure simply by an adjacency matrix and modeled the dependency between nodes using a probability matrix of transitions on a graph characterized by a digraph Laplacian. In SICS methods such as glasso, a regularization parameter must be tuned, whereas the proposed method has no parameter to tune. SICS methods are scalable and faster than the proposed method. However, when the graph size is of the same order as those in section 5, the proposed method is experimentally shown to be comparable to or more accurate than SICS methods with parameters optimized by StARS in estimating the intrinsic graph structure. Furthermore, the proposed method is applicable to estimating asymmetric graphs, while the conventional SICS methods are applicable only to symmetric graphs. As shown in the real-world experiment, the proposed method is able to capture an asymmetric structure that cannot be extracted by SICS methods, which assume symmetric structures.

Since we restrict each element of the connectivity matrix to be 0 or 1, we conjecture that our model based on the random walk on the graph is identifiable. Proving strict identifiability of the proposed model remains an important topic for future work. Also, the proposed general algorithm 1 has no theoretical guarantee of convergence to the ground-truth solution because it estimates the graph structure in a greedy manner. However, Figure 8a shows that the estimation accuracy approaches 100% as the size of the data increases.

The proposed method requires O(n^2) iterations for estimating the graph structure with n nodes. Although this order of computational complexity is linear in the number of unknown variables (the n^2 elements of the connectivity matrix), each iteration involves an inner optimization as well as computation of the matrix logarithm and exponential. To reduce the computational cost, it would be helpful to estimate the number of possible edges m in advance. The major computational issue in the proposed method is calculating the matrix logarithm in each iteration, which requires at least a computational cost of an order higher than O(n^5) (Roch & Villard, 1996), where n is the number of nodes. Stochastic optimization approaches and approximation algorithms would be applicable for reducing the computational cost. It is also possible to use another model for f whose inverse map can be calculated efficiently. Improving computational efficiency for application to large-scale graphs is an important direction for future study.
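For intuition about this cost, the sketch below computes a matrix logarithm and exponential through an eigendecomposition, itself an O(n^3) operation per call; it covers only the symmetric positive-definite case and is an illustration, not the implementation used in the proposed algorithm.

```python
import numpy as np

def logm_spd(A):
    """Matrix logarithm of a symmetric positive-definite matrix via
    eigendecomposition: A = V diag(w) V^T  =>  log A = V diag(log w) V^T."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def expm_sym(A):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

# Round trip on a small symmetric positive-definite matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5.0 * np.eye(5)   # symmetric positive definite by construction
L = logm_spd(A)                  # exp(L) recovers A up to rounding error
```

General non-symmetric matrices, as arise for digraph Laplacians, require a Schur-based routine such as scipy.linalg.logm instead, which is where the higher cost quoted above comes from.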

We designed our algorithm to estimate a {0, 1}-valued connectivity matrix in order to focus on estimating the existence of edges and obtaining the intrinsic connectivity. The proposed method can be extended to estimate a real-valued matrix by omitting the thresholding operation in equation 4.5. This would make the model more flexible and would yield a more detailed graph structure, but it would require an additional tuning parameter for controlling sparsity. In addition, for simplicity, the observation model in equation 3.8 is assumed to be gaussian, although it is possible to replace it with another density function.

When we apply the proposed method to graph structure estimation, the key issue is how to define the observed matrix T characterizing the dependency of nodes. In contrast to conventional Bayesian graph estimation methods based on linear structural equation models (e.g., Shimizu, Hoyer, Hyvärinen, & Kerminen, 2006), our proposed method assumes random information transitions on a graph and estimates its structure based on the matrix T, which represents the dependency between nodes. However, given a set of raw observations, the estimation results vary according to how the observation matrix T is generated from the raw data. In section 5, we showed an example of graph structure estimation using stock data in which the observation matrix is defined as a correlation matrix of stock prices. Further investigation is needed on how to define the observed matrix.

Finally, in the field of neuroscience, multielectrode recording has enabled us to monitor large numbers of neurons simultaneously. Analyzing neural activities with the proposed method, specifically before and after learning, could visualize changes in the network structure. We are now analyzing extracellularly recorded spike data sets with the proposed method to estimate the directed connectivity between neurons.

## Acknowledgments

We express special thanks to the editor and reviewers whose comments led to valuable improvements of the manuscript. H.H. is supported by JSPS KAKENHI grant number 25870811, and S.A. and N.M. are supported by JSPS KAKENHI grant number 25120011. M.T. is supported by the Innovates Center of Research Excellence grant (iCORE SCH001) and Natural Sciences and Engineering Research Council of Canada grant (NSERC DG 386522-2010).

## References

Banerjee, O., El Ghaoui, L., & d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data. Journal of Machine Learning Research, 9, 485–516.

Belkin, M., & Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems, 14 (pp. 585–591). Cambridge, MA: MIT Press.

Bunke, H., & Riesen, K. (2011). Recent advances in graph-based pattern recognition with applications in document analysis. Pattern Recognition, 44(5), 1057–1067.

Chung, F. R. K. (1997). Spectral graph theory. Providence, RI: American Mathematical Society.

Conte, D., Foggia, P., Sansone, C., & Vento, M. (2004). Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18(3), 265–298.

Dempster, A. (1972). Covariance selection. Biometrics, 28, 157–175.

Friedman, J., Hastie, T., & Tibshirani, R. (2007). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.

Higham, N. J. (2008). Functions of matrices: Theory and computation. Philadelphia: Society for Industrial and Applied Mathematics.

Hsieh, C., Sustik, M., Dhillon, I., & Ravikumar, P. (2011). Sparse inverse covariance matrix estimation using quadratic approximation. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. C. N. Pereira, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 24 (pp. 2330–2338). Red Hook, NY: Curran.

Kendall, M., & Gibbons, J. D. (1990). Rank correlation methods. London: Edward Arnold.

Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.

Kondor, R., & Lafferty, J. (2002). Diffusion kernels on graphs and other discrete input spaces. In Proceedings of the 19th International Conference on Machine Learning (pp. 315–322). San Francisco: Morgan Kaufmann.

Lagarias, J., Reeds, J., Wright, M., & Wright, P. (1998). Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM Journal on Optimization, 9(1), 112–147.

Li, Y., & Zhang, Z. (2010). Random walks on digraphs, the generalized digraph Laplacian and the degree of asymmetry. In R. Kumar & D. Sivakumar (Eds.), Algorithms and models for the web-graph (pp. 74–85). New York: Springer-Verlag.

Liu, H., Roeder, K., & Wasserman, L. (2010). Stability approach to regularization selection (StARS) for high dimensional graphical models. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in neural information processing systems, 23 (pp. 1432–1440). Red Hook, NY: Curran.

Luo, X. (2011). High dimensional low rank and sparse covariance matrix estimation via convex minimization. arXiv:1111.1133.

Page, L., Brin, S., Motwani, R., & Winograd, T. (1997). The PageRank citation ranking: Bringing order to the web (working paper). Stanford, CA: Stanford Digital Library.

Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press.

Ribeiro, B., & Towsley, D. (2010). Estimating and sampling graphs with multidimensional random walks. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (pp. 390–403). New York: ACM.

Roch, J., & Villard, G. (1996). Fast parallel computation of the Jordan normal form of matrices. Parallel Processing Letters, 6(2), 203–212.

Sahbi, H., Etyngier, P., Audibert, J., & Keriven, R. (2008). Manifold learning using robust graph Laplacian for interactive image search. In Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Mateo, CA: IEEE Computer Society.

Sarma, A., Gollapudi, S., & Panigrahy, R. (2011). Estimating PageRank on graph streams. Journal of the ACM, 58(3), 13–19.

Scheinberg, K., & Rish, I. (2009). SINCO—a greedy coordinate ascent method for sparse inverse covariance selection problem (Technical Report RC24837). IBM.

Schneidman, E., Berry, M., Segev, R., & Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440, 1007–1012.

Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.

Shimizu, S., Hoyer, P., Hyvärinen, A., & Kerminen, A. (2006). A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7, 2003–2030.

Tang, A., Jackson, D., Hobbs, J., Chen, W., Smith, J., Patel, H., … Beggs, J. (2008). A maximum entropy model applied to spatial and temporal correlations from cortical networks in vitro. Journal of Neuroscience, 28, 505–518.

Tatsuno, M., Fellous, J. M., & Amari, S. (2009). Information-geometric measures as robust estimators of connection strengths and external inputs. Neural Computation, 21, 2309–2335.

Washio, T., & Motoda, H. (2003). State of the art of graph based data mining. ACM SIGKDD Explorations Newsletter, 5(1), 59–68.

## Notes

1

Later in this study, we introduce the random walk model. In the model, a walker is regarded as an information carrier on the transit network represented by a graph.

2

Technically, this operation is to ensure that we can calculate the inverse of the matrix log function in algorithm 2, and we set r=1 when performing experiments.