## Abstract

A graph is a mathematical representation of a set of variables in which some pairs of variables are connected by edges. Common examples of graphs are railroads, the Internet, and neural networks. It is both theoretically and practically important to estimate the intensity of direct connections between variables. In this study, the problem of estimating the intrinsic graph structure from observed data is considered. The observed data in this study are a matrix with elements representing dependency between nodes in the graph. The dependency represents more than direct connections because it includes the influences of various paths. For example, each element of the observed matrix represents a co-occurrence of events at two nodes or a correlation of variables corresponding to two nodes. In this setting, spurious correlations make the estimation of direct connections difficult. To alleviate this difficulty, a digraph Laplacian is used for characterizing a graph. A generative model of this observed matrix is proposed, and a parameter estimation algorithm for the model is also introduced. The notable advantage of the proposed method is its ability to deal with directed graphs, while conventional graph structure estimation methods such as covariance selection are applicable only to undirected graphs. The algorithm is experimentally shown to be able to identify the intrinsic graph structure.

## 1. Introduction

A graph is a mathematical structure that represents the relationship of data. Typical examples of graph structures are railroads, the Internet, transmission networks, and neural networks. In recent years, pattern recognition and data mining based on graph structure have been studied. For example, the graph structure was used by Bunke and Riesen (2011) for document analysis, Conte, Foggia, Sansone, and Vento (2004) for biometric identification, and Washio and Motoda (2003) for data mining.

A graph is composed of nodes (vertices) and edges (arcs), where a connection between a pair of nodes is expressed by an edge. Consider a case of stock prices; nodes correspond to companies, and edges correspond to direct connectivity between companies. Influences of stock prices can transfer through several companies. As another example, in the case of neural networks, nodes correspond to nerve cells (neurons), and edges correspond to direct connectivity between neurons. If coincident multineuronal firings are observed, correlations between these observed neurons are stronger than others. However, these neurons may not be connected directly, and the observed coincident firings may be a result of influences from other neurons. This effect is called a pseudo-correlation, and it is problematic in data analysis.

There are two major graph estimation problems. One is the problem of estimating the importance of nodes when the connectivities of all edges are known—for example, web page ranking problems such as PageRank (Page, Brin, Motwani, & Winograd, 1997) and HITS (Kleinberg, 1999). The other is the problem of estimating edges based on certain statistics that represent dependency between nodes—for example, cell-signaling data analysis (Friedman, Hastie, & Tibshirani, 2007). We deal with the latter problem in this letter. There are many different kinds of statistics for representing the relationship between nodes; therefore, we collectively refer to these statistics as dependency.

One of the major approaches to the latter problem is sparse inverse covariance selection (SICS). Given an observed covariance matrix $T$ and a regularization parameter $\rho > 0$, the objective is to minimize the regularized negative log likelihood of a gaussian distribution:

$$\min_{X \succ 0} \; -\log \det X + \mathrm{tr}(TX) + \rho \|X\|_{1},$$

where $X$ is the inverse covariance matrix, $X \succ 0$ denotes positive definiteness of the matrix, $\mathrm{tr}(TX)$ is the trace of $TX$, and $\|X\|_{1}$ is the element-wise $\ell_{1}$-norm of $X$. Typical algorithms are glasso (graphical lasso; Friedman et al., 2007), COVSEL (Banerjee, Ghaoui, & d'Aspremont, 2008), SINCO (Scheinberg & Rish, 2009), LOREC (Luo, 2011), and QUIC (Hsieh, Sustik, Dhillon, & Ravikumar, 2011). These algorithms are computationally efficient and scalable and can deal with large-scale graphs. However, they cannot be used for estimating asymmetric graph structures because they assume a multivariate gaussian distribution as a data generative model.
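As a concrete illustration, the SICS objective above can be evaluated directly. The following numpy sketch (variable names and the penalty value are our own choices, not from the letter) computes the regularized negative log likelihood for a candidate precision matrix.

```python
# A minimal numpy sketch of the SICS objective for a candidate precision
# matrix X, an empirical covariance T, and penalty rho:
#   -log det X + tr(T X) + rho * ||X||_1 (element-wise).
import numpy as np

def sics_objective(X, T, rho):
    sign, logdet = np.linalg.slogdet(X)
    assert sign > 0, "X must be positive definite"
    l1 = np.abs(X).sum()               # element-wise l1 norm
    return -logdet + np.trace(T @ X) + rho * l1

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
S = A @ A.T / 4 + np.eye(4)            # a positive-definite "covariance"
X = np.linalg.inv(S)                   # unpenalized gaussian MLE
print(sics_objective(X, S, rho=0.1))
```

With $\rho = 0$, the minimizer of the first two terms is $X = T^{-1}$, the ordinary gaussian maximum likelihood estimate; the $\ell_1$ penalty shrinks off-diagonal entries of $X$ toward zero, which is what produces a sparse graph.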

In the field of neuroscience, log-linear models have been applied to multineuronal spike data. A second-order model takes the form

$$p(x) = \exp\Big( \sum_{i} \theta_{i} x_{i} + \sum_{i<j} \theta_{ij} x_{i} x_{j} - \psi \Big),$$

where $x_{i}$ is the binary variable representing the state of the firing of neuron $i$; $\theta_{i}$ and $\theta_{ij}$ are the canonical parameters of the exponential model; and $\psi$ is a normalization factor. The parameters $\theta_{i}$ and $\theta_{ij}$ are estimated to identify the neural network structure from the observed spike trains. Similar to the SICS case, these models assume a symmetric graph structure, up to second-order statistics. Moreover, there are hyperparameters to be determined (Tatsuno et al., 2009).

In this study, based on the random walk model on graphs, we propose a method that can extract only the important connections in a graph without any tuning parameters. Random walk models are often used in graph analyses. For example, Ribeiro and Towsley (2010) proposed using random walk models for sampling nodes and edges from a given graph to calculate network characteristics such as the degree distribution. Sarma, Gollapudi, and Panigrahy (2011) proposed approximating the importance index of a node in the PageRank (Page et al., 1997) scheme by using random walk modeling for a given graph. In these studies, a random walk model is used for sampling from a fixed graph structure; to the best of our knowledge, there has been no attempt to use a random walk model on a graph to estimate the intrinsic graph structure. Although the proposed framework of graph structure estimation is different from SICS, we compare our method with several SICS methods, which are actively studied and considered to offer state-of-the-art accuracy.

The remainder of this letter is organized as follows. In section 2, we define the notion of graph structures and represent dependency as the total intensity of directed and undirected connections between nodes. Section 3 describes the digraph Laplacian, a simple matrix function that characterizes a graph, and describes information transitions on a graph. In section 4, an algorithm for estimating graph structure is proposed. Section 5 shows the comparison of graph structure estimation with our method and existing methods by using both synthetic and real-world data sets. The last section concludes and suggests directions for future research in this field.

## 2. Relevant Background and Problem Setting

In this section, we explain the notion of graph structures (Chung, 1997) and introduce the framework for estimating graph structure. Let *V* be a set of nodes and *E* be a set of edges, where an edge connects a pair of nodes *i* and *j*. A graph is characterized by the pair (*V*, *E*). There are two types of graphs: undirected and directed. When we are interested only in connectivity between nodes, we use undirected graphs, and when we are interested in the directions of edges, we use directed graphs.

Assuming that the dependency between nodes is generated from the intrinsic connectivity through a function *f*, we propose a framework of estimating the intrinsic graph structure from an observation $T$.^{1} As a concrete model of *f*, we introduce the random walk on graphs, which is characterized by the digraph Laplacian introduced in section 3. We show a conceptual diagram of graph estimation in Figure 1.

Let $W = (w_{ij})$ be an $n \times n$ connectivity matrix with $w_{ij} \ge 0$, $i, j = 1, \ldots, n$, where $w_{ij}$ represents the connection intensity from node *i* to node *j*, and $w_{ij}$ can be zero if there is no edge between *i* and *j*. For the example of stock prices, $w_{ij}$ represents direct connectivity between companies. We can deal with both undirected and directed graphs, but in this study, we mainly consider the directed graph, that is, generally $w_{ij} \neq w_{ji}$. In our representation, diagonal elements are always set to 0; that is, we assume $W$ does not include self-loops. We emphasize that the connectivity in this study indicates the direction and intensity of information transition via the edge, unlike graphical models (Pearl, 2000) in which edges represent causality.

We assume that the dependency matrix $K$ is generated from the connectivity through a function *f*,

$$K = f(W), \tag{2.1}$$

and that *f* admits a series expansion of the form

$$f(W) = c_{0} I + c_{1} W + c_{2} W^{2} + \cdots, \tag{2.2}$$

where the second term represents direct information propagation between nodes *i* and *j*, while the third term represents information propagation via one intermediate node *k*, and so on. A concrete example of equation 2.2 is shown in equation 3.11. For stock prices, $K$ is supposed to be a certain quantity that represents how one company's stock price affects another company's stock price.

To estimate the connectivity $W$ from the dependency, we need to invert the map *f*. Therefore, we have to choose an *f* that has an inverse *f*^{−1}. For computational and theoretical simplicity, we later propose using the matrix exponential map as the function *f*.

Suppose that *f* and the off-diagonal elements of the observation $T$ are given. Algorithm 1 is the general framework of the proposed graph structure estimation; an algorithm with a concrete model for *f* is given in algorithm 2. Since auto-dependency is not well defined in terms of equation 2.2, we consider the diagonal of $T$ as a latent variable and replace it with an appropriate initial value: the missing diagonal elements of $T$ are set to a common value $r > 0$ chosen so that all of the eigenvalues of $T$ become positive.^{2} Then we obtain an estimate $\hat{W} = f^{-1}(T)$. We impose certain constraints on $\hat{W}$. One of the constraints is that the diagonal elements should be 0. In section 3, as an additional constraint, we restrict the maximum number of nonzero elements of $\hat{W}$. Let $\mathcal{W}$ be the subspace of connectivity matrices composed of elements that satisfy these constraints, and denote the projection onto this constrained space by $P_{\mathcal{W}}$. Thus, we obtain $\tilde{W} = P_{\mathcal{W}}(f^{-1}(T))$. By using the diagonal elements of $f(\tilde{W})$, we update the diagonal of $T$ as $\mathrm{diag}(T) \leftarrow \mathrm{diag}(f(\tilde{W}))$, where diag(*M*) is a matrix-valued function for a square matrix *M* that replaces the off-diagonal elements of *M* by 0. This updating is a map from $T$ to a matrix whose off-diagonal elements are equal to those of $T$. We can estimate $W$ from the off-diagonal elements of $T$ by repeating the above procedure. A conceptual diagram of the general procedure for estimating $W$ from $T$ is shown in Figure 3.
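The alternating procedure just described can be sketched compactly. The following is a minimal illustration, assuming *f* is the matrix exponential (as proposed in section 3) and using a simple keep-the-*m*-largest projection; function and variable names are ours, not the letter's.

```python
# Sketch of the general estimation loop (algorithm 1 in the text), assuming
# f = matrix exponential so that f^{-1} = matrix logarithm. The projection
# (zero diagonal, keep the m largest entries) is one concrete choice.
import numpy as np
from scipy.linalg import expm, logm

def estimate_graph(T, m, r=1.0, n_iter=2):
    T = T.copy().astype(float)
    np.fill_diagonal(T, r)                  # initialize the unknown diagonal
    for _ in range(n_iter):
        W_hat = np.real(logm(T))            # procedure A: apply f^{-1}
        np.fill_diagonal(W_hat, 0.0)        # constraint: no self-loops
        thresh = np.sort(W_hat.ravel())[-m] # keep the m largest entries
        W_proj = np.where(W_hat >= thresh, W_hat, 0.0)
        # procedure D: refresh the diagonal of T from the current estimate
        np.fill_diagonal(T, np.diag(expm(W_proj)))
    return W_proj
```

On noiseless data generated as $T = \exp(W)$ for a sparse $W$, a couple of iterations of this loop recover the nonzero pattern of $W$ exactly, which mirrors the fast convergence reported for algorithm 2 in section 4.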

## 3. Model of Graph Structure

We propose using the exponential map as the function *f* in equation 2.1 for theoretical and computational simplicity; the inverse of *f* is then simply the matrix logarithm. In this section, we explain the digraph Laplacian (Li & Zhang, 2010), which is needed to define the exponential map for directed graphs, introduce the notion of a random walk on a graph characterized by the digraph Laplacian, and represent information propagation on a graph by the random walk.

### 3.1. Digraph Laplacian.

Consider a graph (*V*, *E*) with *n* nodes. First, we consider a directed graph and restrict the connectivity matrix to be binary,

$$W = A, \quad a_{ij} \in \{0, 1\}, \tag{3.1}$$

where $A$ is known as an adjacency matrix in the literature of graph theory. This assumption is reasonable when the connectivity takes two distinct values that represent the states "connected" and "unconnected." We expect that the estimation of $A$ becomes more robust to observation noise under this assumption.

We define the digraph Laplacian *L* as a function of $A$ as

$$L = D - A, \quad D = \mathrm{diag}(A\mathbf{1}), \tag{3.2}$$

where $D$ is the diagonal matrix of out-degrees. From equation 3.2, all row-wise sums of $L$ are 0s, that is, $L\mathbf{1} = \mathbf{0}$, where **1** and **0** are *n*-dimensional vectors whose elements are all 1s and 0s, respectively.

When the graph is undirected, the adjacency matrix $A$ is symmetric, the definition of *L* is the same as equation 3.2, and *L* also becomes symmetric; this is the ordinary graph Laplacian. By definition, all row-wise and column-wise sums of $L$ are then 0s, that is, $L\mathbf{1} = \mathbf{0}$ and $\mathbf{1}^{\top} L = \mathbf{0}^{\top}$. The graph Laplacian plays a central role in spectral graph theory (Chung, 1997) and is utilized for various purposes of data analysis such as clustering (Shi & Malik, 2000; Belkin & Niyogi, 2001), manifold learning (Sahbi, Etyngier, Audibert, & Keriven, 2008), and web page ranking (Page et al., 1997; Kleinberg, 1999).
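A small numerical check of these definitions, assuming the out-degree form $L = D - A$ stated above (the example adjacency matrix is our own):

```python
# Sketch: the (out-degree) digraph Laplacian L = D - A for a directed graph;
# row sums of L are zero by construction, i.e., L @ 1 = 0.
import numpy as np

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)   # a small directed adjacency matrix

D = np.diag(A.sum(axis=1))               # diagonal matrix of out-degrees
L = D - A

print(L @ np.ones(3))                    # zero vector: rows of L sum to 0
```

Note that for a directed graph the column sums of $L$ are generally not zero; only the symmetric (undirected) case has both properties.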

### 3.2. Random Walk on Graph.

Consider a random walk on the graph. For a sufficiently small time interval $\Delta t > 0$, we define the one-step transition probability matrix as

$$P_{\Delta t} = I - \Delta t \, L,$$

where *I* is the unit matrix. Note that $P_{\Delta t}\mathbf{1} = \mathbf{1}$ holds. The element $(P_{\Delta t})_{ij}$, $i \neq j$, is the transition probability from node *i* to *j* in the interval $\Delta t$, and $(P_{\Delta t})_{ii}$ is the probability that a random walker remains in node *i*. We show a conceptual diagram of the transition probability represented by the matrix $P_{\Delta t}$ in Figure 4.

Let $e_{i}$ be the *n*-dimensional unit vector whose *i*th element is 1 and whose other elements are 0. The *j*th element of $e_{i}^{\top} P_{\Delta t}$ gives the probability of a one-step transition from node *i* to *j*. In the same way, $e_{i}^{\top} P_{\Delta t}^{k}$ (i.e., the *i*th row of $P_{\Delta t}^{k}$) gives the *k*-step transition probability vector starting from node *i*. By considering the continuous time limit of the random walk, we define the transition probability in a unit time as

$$P = \lim_{\Delta t \to 0} \left( I - \Delta t \, L \right)^{1/\Delta t} = \exp(-L). \tag{3.6}$$

The digraph Laplacian characterizes a random walk on a directed graph, and the *ij* element of the matrix $\exp(-L)$ is regarded as the probability that the walker starting from node *i* stays at node *j* after a unit time. When $L$ is the graph Laplacian, the right-hand side of equation 3.6 is called the diffusion kernel (Kondor & Lafferty, 2002).
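The unit-time transition matrix of equation 3.6 can be computed with an off-the-shelf matrix exponential. The sketch below, on an assumed toy directed cycle, verifies that each row of $\exp(-L)$ is a probability distribution:

```python
# Sketch: the unit-time transition matrix exp(-L) of the continuous-time
# random walk (equation 3.6). Because L @ 1 = 0, each row of exp(-L) sums
# to 1 and can be read as a distribution over destination nodes.
import numpy as np
from scipy.linalg import expm

A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)   # directed 3-cycle
L = np.diag(A.sum(axis=1)) - A

P = expm(-L)                             # transition probabilities in unit time
print(P.sum(axis=1))                     # each row sums to 1
assert (P >= 0).all()                    # entries are valid probabilities
```

Nonnegativity follows because $-L$ has nonnegative off-diagonal entries, so $\exp(-L)$ is entrywise nonnegative; the zero row sums of $L$ give the stochasticity.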

### 3.3. Observed Data Model.

In this section, we introduce a generative model of the observed data using the exponential map defined in equation 3.6. We show the relationship between variables in Figure 5.

The exponential map in equation 3.6 serves as the concrete model of the function *f* in equation 2.1. Here we assume that the dependency is observed with additive noise as

$$T = f(W) + \mathcal{E}, \tag{3.7}$$

where $\mathcal{E}$ represents a noise term. Specifically, we consider $T$ as an estimate of the dependency $K = f(W)$, and *T* as an $n \times n$ matrix with *ij* element $t_{ij}$. That is, the quantity $t_{ij}$ is characterized by the dependency $k_{ij}$ with a certain probability model, and *T* is observed instead of $K$. We can assume an arbitrary distribution for $t_{ij}$, but for simplicity, we assume $t_{ij}$ are samples from gaussian distributions with means $k_{ij}$. The probability density function of the observed datum $t_{ij}$ is written as

$$p(t_{ij}) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left( -\frac{(t_{ij} - k_{ij})^{2}}{2\sigma^{2}} \right), \tag{3.8}$$

where $\sigma^{2}$ is a common variance. The joint distribution of the observed data *T* can be expressed as

$$p(T) = \prod_{i,j} p(t_{ij}). \tag{3.9}$$

We observe only the dependency with noise, though we can estimate the connectivity by the algorithm proposed in the next section. Figure 5 shows that the intrinsic connectivity is simplified, and then the digraph Laplacian is calculated. The intrinsic connectivity leads to dependency via a function *f*. We approximate *f* using the random walk, and we assume a matrix *T* is observed as affected by additive noises.

We allow *f* to have additional parameters $\alpha > 0$ and $\beta > 0$ in the random walk,

$$K = f(W) = \alpha \exp(-\beta L), \tag{3.10}$$

which is expanded as

$$\alpha \exp(-\beta L) = \alpha \left( I - \beta L + \frac{\beta^{2}}{2} L^{2} - \cdots \right). \tag{3.11}$$

We can see the relationship between equations 2.2 and 3.11; that is, $c_{k}$ corresponds to $\alpha (-\beta)^{k}/k!$, and $W$ corresponds to the off-diagonal part of $-L$. Since the observation noise is gaussian, by equation 3.12 the observation is written as

$$T = \alpha \exp(-\beta L) + \mathcal{E}, \tag{3.13}$$

where $\sigma^{2}$ is the variance of the gaussian noise for the observation. We have to calculate the inverse function of *f* in equation 2.1 to estimate a graph structure. Since *f* is modeled by the exponential map $\alpha \exp(-\beta L)$, we only have to calculate $-\frac{1}{\beta}\log\frac{T}{\alpha}$ by using fixed $\alpha$ and $\beta$.
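The generative model and its inversion can be sketched in a few lines. The following assumes illustrative values of $\alpha$, $\beta$, and the noise level; when the noise is small, the matrix logarithm recovers $L$ (and hence the graph) accurately.

```python
# Sketch of the observation model T = alpha * exp(-beta * L) + noise
# (equation 3.13) and its inversion L_hat = -(1/beta) * log(T / alpha).
# alpha, beta, and the noise level sigma are illustrative assumptions.
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)
A = (rng.random((5, 5)) < 0.3).astype(float)   # random directed graph
np.fill_diagonal(A, 0.0)                       # no self-loops
L = np.diag(A.sum(axis=1)) - A                 # digraph Laplacian

alpha, beta, sigma = 0.8, 0.5, 1e-3
T = alpha * expm(-beta * L) + sigma * rng.normal(size=(5, 5))

L_hat = -np.real(logm(T / alpha)) / beta       # invert f with true alpha, beta
print(np.abs(L_hat - L).max())                 # small when the noise is small
```

In practice $\alpha$ and $\beta$ are unknown; section 4 estimates them together with the graph.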

## 4. Algorithm for Estimating Graph Structure

In this section, we propose an algorithm for estimating the adjacency matrix $A$ and the parameters $\alpha$ and $\beta$ from the observation *T*. This procedure is summarized in algorithm 2. Since we assume the observation noises are gaussian, we consider as the objective function the sum of squared residuals,

$$E(A, \alpha, \beta) = \sum_{i \neq j} \left( t_{ij} - \alpha \left[ \exp(-\beta L) \right]_{ij} \right)^{2}. \tag{4.1}$$

From equation 3.13, the *ij* element of the logarithm of $T/\alpha$ is

$$\left[ \log (T/\alpha) \right]_{ij} = -\beta \, l_{ij}, \tag{4.2}$$

so we can estimate $L$ as

$$\hat{L} = -\frac{1}{\beta} \log \frac{T}{\alpha}. \tag{4.3}$$

We note that the diagonal elements of $T$ are not defined, and we update them while updating the parameters $\alpha$ and $\beta$ using the following iterative algorithm.

First, the diagonal elements of *T* are set to a common value *r*, where *r* is chosen so that all eigenvalues of *T* become positive:

$$t_{ii} = r > 0, \quad i = 1, \ldots, n. \tag{4.4}$$

Next, we calculate the logarithm of $T/\alpha$. For fixed $\alpha$ and $\beta$, we calculate an approximation of the matrix logarithm using the Jordan normal form (Higham, 2008) of $T/\alpha$ to obtain $\hat{L}$, which corresponds to procedure A in algorithm 1. The maximum possible number of edges *m* is incremented by a value of 1 starting from *m*=1 in the iteration of the algorithm. Let $\lambda_{m}$ be the *m*th largest value of $-\hat{l}_{ij}$ for $i \neq j$. Then we set

$$\hat{a}_{ij} = \begin{cases} 1, & -\hat{l}_{ij} \ge \lambda_{m}, \\ 0, & \text{otherwise}. \end{cases} \tag{4.5}$$

By this thresholding operation, we choose the *m* most likely connected edges. We then estimate the $\alpha$ and $\beta$ that minimize the objective function in equation 4.1 with $\hat{A}$ fixed, and obtain

$$\hat{\alpha} = \arg\min_{\alpha} E(\hat{A}, \alpha, \beta), \tag{4.6}$$

$$\hat{\beta} = \arg\min_{\beta} E(\hat{A}, \alpha, \beta). \tag{4.7}$$

We used the Nelder-Mead method (Lagarias, Reeds, Wright, & Wright, 1998) for optimizing $\alpha$ and $\beta$. The procedures shown in equations 4.6 and 4.7 correspond to procedure B in algorithm 1. We update the diagonal elements of *T* by

$$t_{ii} \leftarrow \hat{\alpha} \left[ \exp\big(-\hat{\beta} L(\hat{A})\big) \right]_{ii}, \tag{4.8}$$

where $L(\hat{A})$ is the digraph Laplacian of $\hat{A}$; this corresponds to procedure D in algorithm 1. For a fixed *m*, we repeat these procedures a fixed number of times. The stopping criterion of the inner loop can be defined by the gap between the previous and current estimates, but for simplicity, we set a maximum number of iterations. In our preliminary experiments, we observed that the estimate converged in at most two iterations; hence, we set the maximum number of inner iterations to 2 in our experiments. We denote the estimates obtained by these procedures as $\hat{A}$, $\hat{\alpha}$, and $\hat{\beta}$.
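A condensed sketch of this estimation procedure is given below. It performs a single pass (threshold the matrix logarithm, then fit $\alpha$ and $\beta$ by Nelder-Mead on the squared-residual objective of equation 4.1) rather than the full loop over *m* and the diagonal updates; function and variable names are ours.

```python
# Condensed sketch of algorithm 2: pick the m most likely edges by
# thresholding the matrix log of T, then fit alpha and beta by Nelder-Mead.
import numpy as np
from scipy.linalg import expm, logm
from scipy.optimize import minimize

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

def fit(T, m, r=None):
    T = T.copy()
    if r is not None:
        np.fill_diagonal(T, r)              # initialize an unknown diagonal
    n = T.shape[0]
    L_hat = -np.real(logm(T))               # procedure A with alpha = beta = 1
    scores = -L_hat                         # large -l_ij suggests an edge
    np.fill_diagonal(scores, -np.inf)       # never select self-loops
    cutoff = np.sort(scores.ravel())[-m]    # m-th largest off-diagonal score
    A_hat = (scores >= cutoff).astype(float)

    mask = ~np.eye(n, dtype=bool)           # residuals on off-diagonal only
    def objective(p):
        a, b = p
        K = a * expm(-b * laplacian(A_hat))
        return ((T - K)[mask] ** 2).sum()

    res = minimize(objective, x0=[1.0, 1.0], method="Nelder-Mead")
    return A_hat, res.x
```

On noiseless data generated from the model, this single pass already recovers the edge set and the parameters; the full algorithm adds the outer loop over *m* and the refresh of diag(*T*) needed when the diagonal is unobserved.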

## 5. Experiments

In this section, we show experimental results of graph structure estimation. In section 5.1, we focus on the estimation of undirected graphs and compare the results of the proposed method with those of SICS methods. Since the generative models of the proposed framework and the SICS framework are different, we estimate graph structures from observed data generated according to each model: in section 5.1.1, observed data are generated from the random walk, a condition faithful to the proposed framework, and in section 5.1.2, observed data are generated from the multidimensional gaussian distribution, a condition faithful to the SICS framework. In section 5.2, we estimate directed graph structures using the proposed method only, because SICS methods cannot be applied to directed graph structure estimation. In section 5.3, we apply the proposed method and SICS methods to real-world financial data sets and compare the results of these methods. As representative SICS methods, we adopt glasso (Friedman et al., 2007), QUIC (Hsieh et al., 2011), and LOREC (Luo, 2011). We use StARS (stability approach to regularization selection; Liu, Roeder, & Wasserman, 2010) to find an appropriate regularization parameter in SICS methods.

### 5.1. Artificial Data of Undirected Graphs.

First, we estimate symmetric graph structures using both the proposed method and SICS methods and compare their performances. We set the number of nodes as *n*=20.

#### 5.1.1. Data in Accordance with the Random Walk.

We randomly generate ground-truth adjacency matrices $A$ and generate observed data *T* by

$$T = \alpha \exp(-\beta L) + \mathcal{E},$$

where $\alpha$ and $\beta$ are randomly generated from a uniform distribution over [0, 1]. The diagonal elements of *T* are set to a common value *r* in the same way as equation 4.4. In all our experiments, *r* is fixed at 1, and it always made the eigenvalues of *T* positive. For the SICS methods, we regard *T* as the covariance matrix of a gaussian distribution and solve the regularized gaussian MLE problem:

$$\hat{X} = \arg\min_{X \succ 0} \; -\log \det X + \mathrm{tr}(TX) + \rho \|X\|_{1}.$$

Then, to obtain $\{0, 1\}$-valued connectivity, we transform the estimated $\hat{X}$ by a thresholding operation that sets $\hat{a}_{ij} = 1$ if the off-diagonal element $\hat{x}_{ij}$ is nonzero and $\hat{a}_{ij} = 0$ otherwise. We use StARS to find the regularization parameter $\rho$; StARS finds an appropriate value of $\rho$ based on stability estimated by random subsampling. In our study, the observed data *T* are generated 100 times according to the assumed generative model for calculating stabilities in StARS.

We estimate graph structures from the observed data *T*. In Figure 6, we show the estimation results, where the vertical axis and horizontal axis indicate the accuracy and the size of the observed noise, respectively. We define the accuracy of estimation as the fraction of correctly estimated off-diagonal elements,

$$\mathrm{accuracy} = \frac{1}{n(n-1)} \sum_{i \neq j} \mathbb{1}\left[ \hat{a}_{ij} = a_{ij} \right], \tag{5.4}$$

where $\hat{a}_{ij}$ is the *ij* element of the estimated connectivity matrix. We also define the size of the observed noise as the coefficient of variation, that is, the ratio of the noise standard deviation $\sigma$ to the mean of the noiseless elements of the observation (equation 5.5). Figures 6a to 6c are accuracies obtained by different methods when the densities of $A$ are 0.2, 0.4, and 0.6.
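Our reading of the accuracy measure (equation 5.4) as the fraction of matching off-diagonal entries can be written as a short helper; the matrices below are illustrative:

```python
# Sketch of the accuracy measure used in the experiments: the fraction of
# off-diagonal entries of the estimated {0,1} connectivity that match the
# ground truth (our reading of equation 5.4).
import numpy as np

def accuracy(A_hat, A_true):
    n = A_true.shape[0]
    mask = ~np.eye(n, dtype=bool)            # ignore the diagonal
    return (A_hat[mask] == A_true[mask]).mean()

A_true = np.array([[0, 1], [0, 0]])
A_hat  = np.array([[0, 1], [1, 0]])
print(accuracy(A_hat, A_true))               # one of two off-diagonals matches
```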

In Figure 6, we can see that the accuracies of SICS methods decrease as the densities increase. The proposed method is shown to obtain high accuracies in most cases.

In Figure 7, to see the effect of the regularization parameter in SICS methods, we show the average estimation accuracies of the proposed method and of the glasso method with various regularization parameters. The vertical axis and horizontal axis of the graphs indicate the accuracy and the regularization parameter $\rho$, respectively. In this experiment, we fixed the density of $A$ to 0.2, but we observed a similar tendency for other densities. We prepared two levels of noise: 0.006 and 0.06. The proposed method has no tuning parameter, in contrast to SICS methods. Figure 7 shows that misspecification of the regularization parameter of glasso leads to inaccurate estimation results. The proposed method achieved almost the same accuracy as the optimal case of glasso.

#### 5.1.2. Data Generated from the Multidimensional Gaussian Distribution.

Next, data are generated from a multidimensional gaussian distribution whose inverse covariance matrix is determined by the ground-truth graph, where *N* is the number of samples. We consider the empirical covariance matrix of the samples as the observation *T*. In Figure 8a, we show the relationship between the accuracy and the number of samples *N* when the density of $A$ is fixed to 0.2, to investigate how the performance is influenced by the number of samples. In Figure 8b, we show the accuracy when *N*=200, to investigate how the performance is influenced by the connection density.

From Figure 8a, we see that when the data are generated from a multivariate gaussian with an inverse covariance matrix determined by the ground-truth graph, the proposed method accurately estimates that graph. From Figure 8b, we can see that the accuracy of every method decreases as the density increases. However, the accuracy of the proposed method decreases more slowly than that of the others and remains above 60% even when the density reaches 0.6.

There are two possible reasons for the better accuracies achieved by the proposed method shown in Figures 6 and 8. The first is the essential difference of the underlying models of the proposed and SICS frameworks. While the SICS framework considers only the second-order statistics, the proposed framework takes higher-order statistics into account, as shown in equations 3.10 and 3.11. The second reason is the difference of the regularization methods. We adopt StARS for defining the regularization parameter in SICS methods, while algorithm 2 has a mechanism to automatically determine an appropriate sparsity based on the fitness of the model.

### 5.2. Artificial Data of Directed Graphs.

In this section, we estimate directed graphs represented by asymmetric adjacency matrices $A$. Since SICS methods cannot be used for estimating directed graphs, we use only the proposed method. We prepare several $A$ with densities 0.2, 0.4, and 0.6, and randomly generate $\alpha$ and $\beta$ from a uniform distribution over [0, 1]. We set the number of nodes to *n*=20. We generate observed data *T* according to equation 3.13 in the same manner as in the experiments on undirected graphs. We show the relationship between the accuracy, defined by equation 5.4, and the size of the observed noise, defined by equation 5.5, in Figure 9. As shown in Figures 6 and 9, the proposed method is able to estimate the directed graph structure as accurately as in the undirected case. The accuracy of estimation deteriorates as the density of the ground-truth graph and the size of noise increase; this tendency is observed in the experiments with both directed and undirected graphs.

Sections 5.1 and 5.2 are summarized as follows. We performed experiments with artificial data and showed that the proposed method is able to estimate the graph structures more accurately than the conventional SICS methods in most cases. The proposed method, which has no tuning parameter, achieved comparable or superior accuracies to the finely tuned glasso. Also, the proposed method is applicable to directed graphs, while the conventional SICS methods are applicable only to undirected graphs.

### 5.3. Real-World Data.

In this section, we apply the proposed method and SICS methods to analyze a set of time-series data of stock prices downloaded from Yahoo! finance.^{3} First, we consider an undirected graph with nodes corresponding to companies to compare the results of the proposed method with the results of SICS methods. Next, we consider a directed graph to analyze the relationship of the companies in more detail by the proposed method. The data set is composed of the daily closing prices of 10 companies from November 11, 2010, to July 11, 2011, which is equivalent to 158 business days. We show the data in Figure 10. We take an interest in how the relationship of companies changed around the day of the Great East Japan Earthquake on March 11, 2011.

#### 5.3.1. Undirected Graphs.

The edges of the graph represent a certain kind of relationship of stock prices between companies. For example, if the stock prices of companies *i* and *j* behave similarly, it is natural to consider that there is an edge between these companies. We estimate the undirected connectivity matrix from an observation. Let $S_{i}(t)$ be the stock price of company *i* at time *t*. Since we focus on the existence of connections and do not consider whether connections are positive or negative, we consider the absolute value of the relative change rate $R_{i}(t) = |S_{i}(t+1) - S_{i}(t)| / S_{i}(t)$. We assume that virtual random walkers in the stock market have uptick and downturn factors. They transit between nodes according to the rate determined by the intensity of the connection, and we assume the rise and decline of the stock prices of companies can be explained by the transition of these random walkers.

We compute the change rates of companies *i* and *j* between times *t* and *t*+1 to obtain the sequences $\{R_{i}(t)\}$ and $\{R_{j}(t)\}$. The degree of synchrony of companies *i* and *j* is measured by using Kendall's correlation (Kendall & Gibbons, 1990):

$$t_{ij} = \tau\big( \{R_{i}(t)\}, \{R_{j}(t)\} \big). \tag{5.7}$$

There is no need to assume a specific distribution on $R_{i}(t)$ when we use Kendall's correlation. Using the observation matrix *T* defined by equation 5.7, we estimate connections between companies by the proposed method and SICS methods.
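Constructing the observation matrix from return series is straightforward with an off-the-shelf Kendall's tau. The sketch below uses synthetic random-walk prices as stand-ins for the stock data; variable names are ours:

```python
# Sketch: building the observation matrix T from absolute relative change
# rates R_i(t) using Kendall's rank correlation, as described in the text.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
# synthetic daily prices for 3 "companies" over 100 days (stand-in data)
S = np.cumsum(rng.normal(size=(100, 3)), axis=0) + 100.0

R = np.abs(np.diff(S, axis=0)) / S[:-1]      # R_i(t) = |S_i(t+1)-S_i(t)| / S_i(t)

n = R.shape[1]
T = np.eye(n)                                # tau of a series with itself is 1
for i in range(n):
    for j in range(i + 1, n):
        tau, _ = kendalltau(R[:, i], R[:, j])
        T[i, j] = T[j, i] = tau              # symmetric observation matrix
print(T.shape)
```

Because Kendall's tau is a rank statistic, this observation matrix is insensitive to monotone transformations of the change rates, which is why no distributional assumption on $R_{i}(t)$ is needed.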

Ten large electronic companies (Sony, Panasonic, Sharp, Toshiba, Google, IBM, Apple, Microsoft, LG, and Samsung) are chosen, and their daily closing stock prices $S_{i}(t)$ before and after the Great East Japan Earthquake (November 11, 2010–March 11, 2011, and March 11, 2011–July 11, 2011) are used to construct observation matrices. Figures 11 and 12 show the estimated graph structures before and after the earthquake, respectively.

From Figures 11 and 12, only a few edges were detected by the SICS methods, whose sparseness is decided by StARS. From the results of the SICS methods, we cannot see any changes in the relationship between companies around the day of the earthquake. The proposed method outperformed SICS methods on the artificial data sets; therefore, the result of the proposed method can be considered more reliable. In addition, the proposed method reveals an interesting relationship between companies. From Figure 11a, we can see that only the proposed method extracts some mutual connections between Japanese companies (Sony, Panasonic, and Toshiba) and U.S. companies (Apple, IBM, and Google) before the earthquake. From Figure 12a, we can see that after the earthquake occurred, the Japanese and U.S. companies became isolated from each other. From this result, it is conceivable that the stock prices of the Japanese companies showed similar behavior, and that behavior was distinct from that of the U.S. companies.

#### 5.3.2. Directed Graphs.

To capture the directed influence from company *i* to company *j*, we calculate the time-shifted variant of $R_{i}$ as $R^{+}_{i}(t) = R_{i}(t+1)$ and define the observation *T* by Kendall's correlation between the time-shifted sequence $\{R^{+}_{i}(t)\}$ and $\{R_{j}(t)\}$ in addition to the synchronous correlation of equation 5.7. By this definition of the observation, effects both with and without delay can be taken into account, and the observation matrix *T* becomes asymmetric.

We show the estimated directed graph structures in Figure 13. Figures 13a and 13b have different connectivity from the structures shown in Figures 11a and 12a. Comparing the estimated structures corresponding to after the earthquake shown in Figures 13b and 12a, connectivity ignoring directions is similar, but directed dependencies between companies can be seen from Figure 13b.

## 6. Conclusion

In this study, we proposed a method for estimating graph structures using the dependency between nodes. Since only the intrinsic connectivity of a graph is important in many cases, we simply represented a graph structure using an adjacency matrix and modeled the dependency between nodes using a probability matrix of transitions on a graph characterized by a digraph Laplacian. In SICS methods such as glasso, we must tune a regularization parameter, whereas there is no need to tune any parameter in our proposed method. SICS methods are scalable and faster than the proposed method. However, when the size of graphs is of the same order as those in section 5, the proposed method is shown experimentally to be comparable to or more accurate than SICS methods with parameters optimized by StARS in estimating the intrinsic graph structure. Furthermore, the proposed method is applicable to estimating asymmetric graphs, while the conventional SICS methods are applicable only to symmetric graphs. As shown in the real-world experiment, the proposed method is able to capture an asymmetric structure that was not extracted by SICS methods, which assume symmetric structures only.

Since we restricted the value of each element of the connectivity to be 0 or 1, our conjecture is that our model based on the random walk on the graph is identifiable. Proving strict identifiability of the proposed model remains an important topic for future work. Also, the proposed general algorithm 1 does not have a theoretical guarantee of convergence to the ground-truth solution because it estimates the graph structure in a greedy manner. However, from Figure 8a, we see that the estimation accuracy approaches 100% as the size of the data increases.

The proposed method requires *O*(*n*^{2}) iterations for estimating the graph structure with *n* nodes. Although this order of computational complexity is linear in the number of unknown variables, each iteration involves optimization with respect to $\alpha$ and $\beta$ and the computation of the matrix logarithm and exponential. To reduce the computational cost, it would be helpful to estimate the number of possible edges *m* in advance. The major computational issue in the proposed method is calculating the matrix logarithm in each iteration, which requires a computational cost of an order higher than *O*(*n*^{5}) (Roch & Villard, 1996), where *n* is the number of nodes. A stochastic optimization approach or an approximation algorithm would be applicable for reducing the computational cost. It is also possible to use another model for *f* whose inverse map can be efficiently calculated. Improving computational efficiency for application to large-scale graphs is an important future study.

We designed our algorithm to estimate a $\{0, 1\}$-valued connectivity matrix in order to focus on estimating the existence of edges and obtaining the intrinsic connectivity. The proposed method can be extended to estimate a real-valued matrix by omitting the thresholding operation in equation 4.5. This would make the model more flexible and provide a more detailed graph structure, but it would need an additional tuning parameter for controlling the sparsity. In addition, for simplicity, the observation model in equation 3.8 is assumed to be gaussian, although it is possible to replace it with another density function.

When we apply the proposed method to graph structure estimation, the most important thing is how to define the observed matrix *T* characterizing the dependency of nodes. In contrast to conventional Bayesian graph estimation methods based on linear structure equation models (e.g., Shimizu, Hoyer, Hyvärinen, & Kerminen, 2006), our proposed method assumes a random information transition on a graph and estimates its structure based on the matrix *T*, which represents dependency between nodes. However, given a set of raw observations, the estimation results vary according to how the observation matrix *T* is generated from the raw data. In section 5, we have shown an example of graph structure estimation using a set of stock data in which the observation matrix is defined as a correlation matrix of stock prices. Further investigation will be needed for defining the observed matrix.

Finally, in the field of neuroscience, multielectrode recording has enabled us to monitor large numbers of neurons simultaneously. Analysis of neural activities by the proposed method, specifically before and after learning, could provide visualization on the change of the network structure. We are now analyzing extracellularly obtained spike data sets by our proposed method to estimate the directed connectivity between neurons.

## Acknowledgments

We express special thanks to the editor and reviewers whose comments led to valuable improvements of the manuscript. H.H. is supported by JSPS KAKENHI grant number 25870811, and S.A. and N.M. are supported by JSPS KAKENHI grant number 25120011. M.T. is supported by the Innovates Center of Research Excellence grant (iCORE SCH001) and Natural Sciences and Engineering Research Council of Canada grant (NSERC DG 386522-2010).

## References

Luo, X. (2011). *High dimensional low rank and sparse covariance matrix estimation via convex minimization*. Unpublished manuscript.

## Notes

^{1}

Later in this study, we introduce the random walk model. In the model, a walker is regarded as an information carrier on the transit network represented by a graph.

^{2}

Technically, this operation is to ensure that we can calculate the inverse of the matrix log function in algorithm 2, and we set *r*=1 when performing experiments.