## Abstract

Detecting and characterizing causal interdependencies and couplings between different activated brain areas from functional neuroimaging time series measurements of their activity constitutes a significant step toward understanding brain function. In this letter, we make the simple point that all current statistics used to make inferences about directed influences in functional neuroimaging time series are variants of the same underlying quantity. This includes directed transfer entropy, transinformation, Kullback-Leibler formulations, conditional mutual information, and Granger causality. Crucially, in the case of autoregressive modeling, the underlying quantity is the likelihood ratio that compares models with and without directed influences from the past when modeling the influence of one time series on another. This framework is also used to derive the relation between these measures of directed influence and the complexity, or order, of the directed influence. These results provide a framework for unifying the Kullback-Leibler divergence, Granger causality, and the complexity of directed influence.

## 1. Introduction

With the widespread acceptance of the network description of brain function, the identification and quantification of directed interactions between brain structures that support the processing of specific brain functions (perceptual, cognitive, or motor) are fundamental issues in neuroscience. For example, information about the functioning or dysfunctioning of the brain can be inferred from the network structure. Current theories of schizophrenia emphasize that the core aspects of the pathophysiology are due to disconnection (the disconnection hypothesis; Friston, 1998) rather than to deficits in specific brain areas or neurotransmitter systems. Neuroimaging techniques such as electroencephalography (EEG), magnetoencephalography (MEG), functional near-infrared imaging (fNIR), and functional magnetic resonance imaging (fMRI) are commonly employed to generate time series measurements of brain structure activities to address questions of functional and effective connectivity that are fundamental for the description of these brain networks (Roebroeck, Formisano, & Goebel, 2005; Hesse, Moller, Arnold, & Schack, 2003). The key distinction between functional and effective connectivity pertains to whether one is assessing simple statistical dependencies (functional connectivity) or trying to infer the parameters of an underlying model of connectivity (effective connectivity). The interesting issue, from our perspective, is that Granger causality and related analyses allow the in vivo study of effective connectivity through the statistical analysis of neuroimaging time series. Although we have been motivated in this work by the analysis of neuroimaging time series, all the arguments presented in this letter apply to inference about directed causality in any time series.

Granger causality analysis (Granger, 1969; Geweke, 1982; Kim, Putrino, Ghosh, & Brown, 2011; Quinn, Coleman, & Kiyavash, 2011) and transfer entropy methods (Schreiber, 2000; Massey, 1990; Kamitake, Harashima, & Miyakawa, 1984; Amblard & Michel, 2011; Quinn, Coleman, Kiyavash, & Hatsopoulos, 2011) have become increasingly popular approaches for exploring directed influences between brain structures using functional neuroimaging data (Hinrichs, Heinze, & Schoenfeld, 2006; Seghouane, 2011). While both approaches target the same objective, measuring the effect that one brain structure has on another, transfer entropy specifically uses the Kullback-Leibler divergence to measure the influence of extra information. The general idea of Granger causality is defined in terms of improving predictability: if a signal X causes a signal Y, knowledge of the past of both X and Y should improve the prediction of the future of Y in comparison to knowledge of the past of Y only. The idea behind transfer entropy, in contrast, is defined in terms of the influence on conditional probabilities as measured with the Kullback-Leibler divergence (Kullback & Leibler, 1951): if a signal Y does not cause a signal X, then the probability density describing the future of X conditioned on its past should not differ from the probability density describing the future of X conditioned on its own past and the past of Y.

In this letter, we make the simple point that all current statistics used to make inferences about directed influences in functional neuroimaging time series are variants of the same underlying quantity. This includes directed transfer entropy, transinformation, Kullback-Leibler formulations, conditional mutual information, and Granger causality. In the autoregressive modeling framework, the relation between these information measures of directed influence, Granger causality, and the complexity of directed influence is derived in both the univariate and multivariate cases. The rest of this letter is organized as follows. After a brief formulation of the problem, the different measures of directed influence are introduced in section 2, and the relations between these measures are described. Autoregressive modeling is introduced in section 3, and the link with Granger causality and the complexity of directed influence is established for the univariate case. The multivariate case is treated in section 4. A simulation example is described in section 5, and the conclusion is given in section 6.

## 2. Problem Formulation and Measures of Directed Influence

In a typical fMRI experiment, several regions of interest (ROIs) are a priori identified in the brain. Each ROI is represented in the fMRI data set by multiple voxels, where each voxel is a variable comprising a single time series reflecting changes in the underlying metabolic signal. A standard approach used to assess directed influence between two ROIs is to derive a single time series for each ROI, either by averaging or by extracting a principal component (Zhou, Chen, Ding, Lu, & Liu, 2009); alternatively, repeated pairwise analyses can be performed on pairs of voxels.

Let $x = \{x_k\}_{k=1}^{N}$ and $y = \{y_k\}_{k=1}^{N}$ be two scalar-valued time series of length *N* sampled from *X* and *Y*, respectively, and corresponding to two ROIs between which some interaction exists. At time *k*, the discrepancy between the probability densities $p(x_k \mid x_{k-1},\ldots,x_{k-p})$ and $p(x_k \mid x_{k-1},\ldots,x_{k-p},\, y_{k-1},\ldots,y_{k-q})$ can be used to test the directed influence $Y \to X$. In the absence of interaction, *Y* has no influence on *X*, and these two densities are equal (Schreiber, 2000). The directed influence *Y* has on *X* can then be quantified by measuring the expectation of the discrepancy between these two conditional densities, which can be quantified using the Kullback-Leibler divergence (Kullback & Leibler, 1951) as follows:

$$
D_{Y\to X} = E\left[\log \frac{p(x_k \mid x_{k-1},\ldots,x_{k-p},\, y_{k-1},\ldots,y_{k-q})}{p(x_k \mid x_{k-1},\ldots,x_{k-p})}\right]. \tag{2.1}
$$

If the information set $J_n$ available at time *k* is $\{x_{k-j},\ j = 1,\ldots,p\}$, let $J'_n$ define the expanded information set: $J_n$ plus $\{y_{k-j},\ j = 1,\ldots,q\}$. Then, based on the definition of Granger causality (Granger & Newbold, 1986), $y$ does not cause $x_k$ with respect to $J'_n$ if $p(x_k \mid J_n) = p(x_k \mid J'_n)$ for all *k* > 0, so that the extra information in $J'_n$ does not affect the conditional distribution. Therefore, equation 2.1 is an extension of this definition where the Kullback-Leibler divergence is used to quantify the distance between the conditional distributions $p(x_k \mid J_n)$ and $p(x_k \mid J'_n)$.

Equation 2.1 can be rewritten in terms of entropies as

$$
\begin{aligned}
D_{Y\to X} &= H(x_k \mid x_{k-1},\ldots,x_{k-p}) - H(x_k \mid x_{k-1},\ldots,x_{k-p},\, y_{k-1},\ldots,y_{k-q}) \\
&= I(x_k;\, y_{k-1},\ldots,y_{k-q} \mid x_{k-1},\ldots,x_{k-p}),
\end{aligned} \tag{2.2}
$$

where *H*(.|.) and *I*(., .|.) represent the conditional entropy and the conditional mutual information, respectively. The measure described in equation 2.2 is linked to two other entropy-based measures introduced in the literature to describe the amount and the direction of interaction between two time series. For example, for *p* = *q* = *k* − 1, equation 2.2 becomes

$$
I(x_k;\, y_{k-1},\ldots,y_{1} \mid x_{k-1},\ldots,x_{1}), \tag{2.3}
$$

which resembles one of the first introduced directed interaction measures, the directed information (Massey, 1990), whose terms take the form

$$
I(x_k;\, y_{k},\, y_{k-1},\ldots,y_{1} \mid x_{k-1},\ldots,x_{1}). \tag{2.4}
$$

The only difference between equations 2.3 and 2.4 is the presence of the sample $y_k$ in the conditional mutual information expression. Equation 2.4 corresponds to finding the mutual information between the time series *Y* up to *k* (rather than *k* − 1 for equation 2.3) and the current sample of *X*, conditioned on the past *k* − 1 samples of *X*.

The second related measure is the transinformation (Kamitake, Harashima, & Miyakawa, 1984),

$$
T_{Y\to X} = I(y_k;\, x_{k+1},\ldots,x_{k+q} \mid x_{k-p},\ldots,x_{k},\, y_{k-p},\ldots,y_{k-1}), \tag{2.5}
$$

where $x_{k-p},\ldots,x_{k-1}$ and $y_{k-p},\ldots,y_{k-1}$ are the past *p* samples of *X* and *Y*, respectively, and $x_{k+1},\ldots,x_{k+q}$ are the *q* future samples of *X*, with *p* + *q* + 1 = *N*. This measure was used in Hinrichs et al. (2006) to evaluate effective connectivity in a patient with homonymous hemianopsia due to a posterior cerebral artery stroke. Relation 2.5 corresponds to finding the mutual information between the current sample of *Y* and the *q* future samples of *X*, conditioned on the past *p* samples of *X*, the past *p* samples of *Y*, and the current sample of *X*. The proposed measure, equation 2.2, measures the influence of the past samples of *Y* on the current sample of *X*, while equation 2.5 measures the influence of the current sample of *Y* on the future samples of *X*. In equation 2.2, only the past samples of *Y* are considered, rather than the past and current samples, because it is unlikely that the current measure of *Y* will instantaneously have a direct influence on the current measure of *X*. This also makes it easier to establish a link with transition probabilities and autoregressive modeling.

Relations 2.6 and 2.7 are similar to the conservation law (Massey & Massey, 2005). Their derivation is given in appendix A.
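For background, the conservation law of Massey and Massey (2005) referred to here decomposes the mutual information between two sequences into two directed components. In its usual statement (recalled here from the cited work, not reproduced from equations 2.6 and 2.7 themselves):

```latex
% Conservation law (Massey & Massey, 2005): the mutual information between
% X^N = (x_1, ..., x_N) and Y^N splits into two directed-information terms,
% where D(Y^N) = (0, y_1, ..., y_{N-1}) denotes the one-sample-delayed sequence.
I(X^N; Y^N) = I(X^N \to Y^N) + I\big(D(Y^N) \to X^N\big)
```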

The computation of the above measures requires estimating multivariate probability density functions. Two different approaches can be used to estimate these multivariate entropies from experimental data: nonparametric or parametric. The nonparametric approach requires the choice of a kernel and the estimation of its bandwidth (Hinrichs et al., 2006; Schreiber, 2000). The parametric approach requires the selection of an appropriate model and the estimation of its parameters. We adopt the parametric approach using autoregressive modeling in this letter because of its relation with Granger causality (Granger, 1969; Geweke, 1982).
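As an illustration of the nonparametric route, the conditional mutual information in equation 2.2 can be approximated with a simple plug-in histogram estimator. This is a minimal sketch, with equiprobable binning standing in for the kernel estimators cited above; the function names, bin count, and toy model are illustrative choices, not from the text:

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in nats) from a vector of joint-cell counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def transfer_entropy(x, y, bins=4, lag=1):
    """Plug-in estimate of I(x_k ; y_{k-lag} | x_{k-lag}), equation 2.2 with p = q = 1:
    TE = H(x_k, x_past) + H(x_past, y_past) - H(x_past) - H(x_k, x_past, y_past)."""
    def disc(z):
        # Discretize into `bins` (approximately) equiprobable symbols.
        edges = np.quantile(z, np.linspace(0, 1, bins + 1))[1:-1]
        return np.searchsorted(edges, z)
    xk, xp, yp = disc(x[lag:]), disc(x[:-lag]), disc(y[:-lag])
    def H(*vs):
        _, counts = np.unique(np.stack(vs, axis=1), axis=0, return_counts=True)
        return entropy(counts.astype(float))
    return H(xk, xp) + H(xp, yp) - H(xp) - H(xk, xp, yp)

# Toy system in which Y drives X with a one-sample delay.
rng = np.random.default_rng(0)
N = 5000
y = rng.standard_normal(N)
x = np.zeros(N)
for k in range(1, N):
    x[k] = 0.5 * x[k - 1] + 0.8 * y[k - 1] + 0.1 * rng.standard_normal()
te_yx = transfer_entropy(x, y)  # should be clearly positive
te_xy = transfer_entropy(y, x)  # should be near zero
```

The plug-in estimate is biased upward for small samples, which is one reason the parametric route below is attractive.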

## 3. Parametric Estimation of the Directed Influence

The dynamics of *X* can be modeled using an AR model (Kaminski et al., 2001),

$$
x_k = \sum_{i=1}^{p} a_i x_{k-i} + e_k, \tag{3.1}
$$

where the $e_k$ are identically and independently distributed (i.i.d.) $\mathcal{N}(0, \sigma_e^2)$, independent of $x_1,\ldots,x_{k-1}$, and $x = \{x_k\}$ is a scalar-valued time series sampled from *X*. When this model is used, the conditional probability density is

$$
p(x_k \mid x_{k-1},\ldots,x_{k-p}) = \frac{1}{\sqrt{2\pi\sigma_e^2}} \exp\left(-\frac{\big(x_k - \sum_{i=1}^{p} a_i x_{k-i}\big)^2}{2\sigma_e^2}\right). \tag{3.2}
$$

On the other hand, the ARX model,

$$
x_k = \sum_{i=1}^{p} a_i x_{k-i} + \sum_{i=1}^{q} b_i y_{k-i} + \varepsilon_k, \tag{3.3}
$$

where the $\varepsilon_k$ are i.i.d. $\mathcal{N}(0, \sigma_\varepsilon^2)$, independent of $y_{\max(p,q)+1-q},\ldots,y_{k-1}$ and $x_{\max(p,q)+1-p},\ldots,x_{k-1}$, and which incorporates the *q* past samples of *Y* to improve the prediction of *X* at the current time, can be used to model the conditional probability density

$$
p(x_k \mid x_{k-1},\ldots,x_{k-p},\, y_{k-1},\ldots,y_{k-q}) = \frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}} \exp\left(-\frac{\big(x_k - \sum_{i=1}^{p} a_i x_{k-i} - \sum_{i=1}^{q} b_i y_{k-i}\big)^2}{2\sigma_\varepsilon^2}\right). \tag{3.4}
$$

Equation 2.2 is well adapted for use with models 3.2 and 3.4, and in case max(*p*, *q*) = *p*, substituting the gaussian densities 3.2 and 3.4 into equation 2.2 gives

$$
D_{Y\to X} = \frac{1}{2}\log\frac{\sigma_e^2}{\sigma_\varepsilon^2}. \tag{3.5}
$$

In practice, the candidate models 3.1 and 3.3 are estimated from the data. Substituting the maximum likelihood estimates $\hat\sigma_e^2$ and $\hat\sigma_\varepsilon^2$ in the equation above for $\sigma_e^2$ and $\sigma_\varepsilon^2$, we obtain

$$
\hat D_{Y\to X} = \frac{1}{2}\log\frac{\hat\sigma_e^2}{\hat\sigma_\varepsilon^2}. \tag{3.6}
$$

This represents the Granger causality defined in terms of the predictive error variances introduced in Granger (1969) and Geweke (1982), up to a multiplicative constant. In this context, if the variability of the error $\varepsilon_k$ of the ARX model, equation 3.3, as measured by its variance $\sigma_\varepsilon^2$, is smaller than the variability of the error $e_k$ of the AR model, equation 3.1, as measured by its variance $\sigma_e^2$, then there is an improvement in the prediction of *X* due to *Y*. In contrast to Granger causality, the proposed directed influence measure is framed not in terms of prediction error but in terms of the discrepancy between conditional probabilities. In the framework of autoregressive modeling, the proposed directed influence measure is equivalent to Granger causality. Furthermore,

$$
E\left[N \log\frac{\hat\sigma_e^2}{\hat\sigma_\varepsilon^2}\right] = N\log\frac{\sigma_e^2}{\sigma_\varepsilon^2} + q + o(1), \tag{3.7}
$$

so that when *Y* has no directed influence on *X* (i.e., $\sigma_e^2 = \sigma_\varepsilon^2$), the expectation of the scaled statistic converges to *q*, the order of the extra-autoregressive part of the model.

The derivation is given in appendix B.

Therefore, the proposed directed influence measure (and the Granger causality) measures the assistance of *Y* in predicting the future of *X* by asymptotically measuring the order or complexity by which *Y* influences *X*. As a result, there is an attractive complementarity between the notions of complexity, information, and prediction when it comes to measuring the interaction or influence between time series. The parametric estimation of the proposed measure, equation 2.2, using models 3.1 and 3.3, requires estimating the parameters of these models as well as determining their orders. The parameter estimates can be obtained using the least squares method or the Yule-Walker equations (Brockwell & Davis, 1991). The orders *p* and *q* can be estimated using a model selection criterion (Seghouane & Amari, 2007; Seghouane & Bekara, 2004; Seghouane, 2010).
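As a minimal sketch of this parametric route, the estimate of equation 3.6 can be computed by fitting the AR and ARX models by least squares and comparing residual variances. The helper names, first-order models, and toy data are illustrative assumptions, not the paper's own code:

```python
import numpy as np

def fit_residual_variance(Z, t):
    """Ordinary least squares fit of t on the columns of Z; returns residual variance."""
    b, *_ = np.linalg.lstsq(Z, t, rcond=None)
    return np.mean((t - Z @ b) ** 2)

def directed_influence(x, y, p=1, q=1):
    """Estimate 0.5 * log(sigma_AR^2 / sigma_ARX^2), i.e., equation 3.6."""
    r = max(p, q)
    t = x[r:]
    # AR regressors: past p samples of x; ARX adds past q samples of y.
    Z_ar = np.column_stack([x[r - i:len(x) - i] for i in range(1, p + 1)])
    Z_arx = np.column_stack([Z_ar] + [y[r - i:len(y) - i] for i in range(1, q + 1)])
    return 0.5 * np.log(fit_residual_variance(Z_ar, t) / fit_residual_variance(Z_arx, t))

# Toy data: Y Granger-causes X, but not the reverse.
rng = np.random.default_rng(1)
N = 2000
y = rng.standard_normal(N)
x = np.zeros(N)
for k in range(1, N):
    x[k] = 0.4 * x[k - 1] + 0.6 * y[k - 1] + rng.standard_normal()
d_yx = directed_influence(x, y)  # positive: the past of Y helps predict X
d_xy = directed_influence(y, x)  # close to zero: no reverse influence
```

Because the AR regressors are nested in the ARX regressors, the estimate is always nonnegative, as a Kullback-Leibler divergence should be.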

## 4. The Multivariate Case

Instead of an approach that consists of deriving a single time series for each ROI, a possible alternative to assess the directed influence between two ROIs would be to consider multivariate random variables represented by vector time series characterizing the group of voxels comprising each ROI (Barrett, Seth, & Barnett, 2010). In this case, the autoregressive modeling requires the use of vector autoregressive models.

Let *X* and *Y* be *m*-dimensional stochastic processes reflecting changes in the metabolic signals of two ROIs, each comprising *m* voxels. Then the vector autoregressive modeling equivalent to equation 3.1 is given by

$$
\mathbf{x}_k = \sum_{i=1}^{p} A_i \mathbf{x}_{k-i} + \mathbf{e}_k, \tag{4.1}
$$

where $\mathbf{x}_k$ is an observed vector of measurements of brain structure activities at time *k*; $A_i$, $i = 1,\ldots,p$, are *m* × *m* coefficient matrices of unknown parameters; and the $\mathbf{e}_k$ are i.i.d. normal random vectors with mean zero and variance-covariance matrix $\Sigma_e$. The approximating VAR(*p*) model, equation 4.1, can be rewritten in regression form as

$$
X = Z\Theta + E, \tag{4.2}
$$

where $\Theta$ collects the coefficient matrices $A_1,\ldots,A_p$ and $Z$ is a known design matrix of full column rank. The vector autoregressive modeling equivalent to equation 3.3 is given by

$$
\mathbf{x}_k = \sum_{i=1}^{p} A_i \mathbf{x}_{k-i} + \sum_{i=1}^{q} B_i \mathbf{y}_{k-i} + \boldsymbol{\varepsilon}_k, \tag{4.3}
$$

where $B_i$, $i = 1,\ldots,q$, are *m* × *m* coefficient matrices of unknown parameters and the $\boldsymbol{\varepsilon}_k$ are i.i.d. normal random vectors with mean zero and variance-covariance matrix $\Sigma_\varepsilon$. In case max(*p*, *q*) = *p*, the approximating VARX(*p*, *q*) model, equation 4.3, can also be rewritten in regression form as

$$
X = Z'\Theta' + E', \tag{4.4}
$$

where $\Theta'$ collects the coefficient matrices $A_1,\ldots,A_p, B_1,\ldots,B_q$ and $Z'$ is a known design matrix of full column rank. The multivariate counterpart of equation 3.6 is then

$$
\hat D_{Y\to X} = \frac{1}{2}\log\frac{\det\hat\Sigma_e}{\det\hat\Sigma_\varepsilon}, \tag{4.5}
$$

where $\hat\Sigma_e$ and $\hat\Sigma_\varepsilon$ are the maximum likelihood estimates of the residual variance-covariance matrices of models 4.1 and 4.3. If $\det\hat\Sigma_\varepsilon < \det\hat\Sigma_e$, there is an improvement in the prediction of the *m*-dimensional stochastic process *X* due to *Y*. Although framed in terms of the discrepancy between conditional probabilities, in the context of vector autoregressive modeling the proposed directed influence measure, equation 2.2, is again equivalent to Granger causality, and

$$
E\left[N \log\frac{\det\hat\Sigma_e}{\det\hat\Sigma_\varepsilon}\right] = N\log\frac{\det\Sigma_e}{\det\Sigma_\varepsilon} + m^2 q + o(1). \tag{4.6}
$$

The derivation is given in appendix C.

In comparison to equation 3.7, note the presence of *m*^{2} on the right-hand side of equation 4.6. This represents the dimension of the coefficient matrices $A_i$, $i = 1,\ldots,p$, and $B_i$, $i = 1,\ldots,q$. In the univariate case, *m* = 1, equation 4.6 is equal to equation 3.7. Therefore, the proposed directed influence measure (and the Granger causality) measures the directed influence of *Y* on *X* by asymptotically measuring the complexity by which *Y* influences *X*, as measured by the number of required new parameters.

The parametric estimation of the proposed measure, equation 2.2, using models 4.1 and 4.3, requires the estimation of the parameters of these models as well as the determination of their orders. The parameter estimates can be obtained using the ordinary least squares method or the multivariate Yule-Walker equations (Brockwell & Davis, 1991). The orders *p* and *q* can be estimated using an appropriate model selection criterion (Hurvich & Tsai, 1993; Seghouane, 2006).
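A corresponding sketch for the multivariate case (again assuming ordinary least squares, first-order models, and illustrative names) fits the VAR and VARX regressions and compares the determinants of the residual covariance matrices:

```python
import numpy as np

def var_directed_influence(X, Y, p=1, q=1):
    """Estimate 0.5 * log(det(Sigma_AR) / det(Sigma_ARX)) for (N, m) vector series."""
    N = len(X)
    r = max(p, q)
    T = X[r:]  # targets, shape (N - r, m)
    # Stacked lagged blocks: past p of X, and for the VARX model also past q of Y.
    Z_ar = np.hstack([X[r - i:N - i] for i in range(1, p + 1)])
    Z_arx = np.hstack([Z_ar] + [Y[r - i:N - i] for i in range(1, q + 1)])
    def resid_cov(Z):
        B, *_ = np.linalg.lstsq(Z, T, rcond=None)
        E = T - Z @ B
        return E.T @ E / len(E)
    return 0.5 * np.log(np.linalg.det(resid_cov(Z_ar)) / np.linalg.det(resid_cov(Z_arx)))

# Toy 2-dimensional ROIs: Y drives X with a one-sample delay.
rng = np.random.default_rng(2)
N, m = 4000, 2
Y = rng.standard_normal((N, m))
X = np.zeros((N, m))
B1 = np.array([[0.5, 0.0], [0.0, 0.5]])
for k in range(1, N):
    X[k] = 0.3 * X[k - 1] + B1 @ Y[k - 1] + rng.standard_normal(m)
d_yx = var_directed_influence(X, Y)  # clearly positive
d_xy = var_directed_influence(Y, X)  # near zero; mean ~ m**2 * q / (2 * N) under no influence
```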

## 5. Simulation Examples

To verify relation 3.7 of proposition 2, we generated 50 data sets of size *N* using two models.

For each sample size *N*, the expectation in relation 3.7 was obtained by averaging relation 3.6 over the 50 realizations. For both models, each *N*, and each data set, the parameter estimates of the *AR* and *ARX* models were obtained by least squares. The results of the numerical estimations closely correspond to the theoretical value for model 1 (see Figure 1) and for model 2 (see Figure 2).
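A simulation in this spirit can be reproduced as follows, assuming relation 3.7 implies that, when *Y* has no influence on *X*, the average of the scaled statistic over realizations is close to *q*. The model below is an illustrative AR(1), not necessarily one of the two models used for the figures:

```python
import numpy as np

def lr_statistic(x, y, p=1, q=1):
    """N * log(sigma2_AR / sigma2_ARX): the scaled version of relation 3.6."""
    r = max(p, q)
    t = x[r:]
    Z_ar = np.column_stack([x[r - i:len(x) - i] for i in range(1, p + 1)])
    Z_arx = np.column_stack([Z_ar] + [y[r - i:len(y) - i] for i in range(1, q + 1)])
    def rss(Z):
        b, *_ = np.linalg.lstsq(Z, t, rcond=None)
        return np.sum((t - Z @ b) ** 2)
    return len(t) * np.log(rss(Z_ar) / rss(Z_arx))

rng = np.random.default_rng(3)
q = 1
stats = []
for _ in range(50):            # 50 independent data sets, as in the text
    N = 500
    y = rng.standard_normal(N)
    x = np.zeros(N)
    for k in range(1, N):      # AR(1) dynamics: y has no influence on x
        x[k] = 0.5 * x[k - 1] + rng.standard_normal()
    stats.append(lr_statistic(x, y, p=1, q=q))
mean_stat = np.mean(stats)     # expected to be close to q = 1
```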

## 6. Conclusion

Directed influence measures are finding increasing application in neuroscience. In this letter, the relations between all current statistics used to make inferences about directed influences have been derived. It has been established that these statistics are variants of the same underlying quantity: the likelihood ratio, the optimal statistic under the Neyman-Pearson lemma. In the context of autoregressive modeling, it has been shown that, asymptotically, both measures quantify the same information about the directed influence: the complexity of directed influence, as measured by the order of the extra-autoregressive part of the model used to characterize the influence of the past of one time series on another. In the multivariate case, this complexity of influence is represented by the order of the extra-autoregressive part of the model used to characterize the influence of the past of one multivariate time series on another, multiplied by the dimension of the coefficient matrices. This corresponds to the number of extra parameters needed to model the influence. There is therefore an attractive complementarity among complexity, information, and prediction when it comes to measuring directed influences from neuroimaging time series measurements.

## Appendix A: Proof of Proposition 1

## Appendix B: Proof of Proposition 2

## Appendix C: Proof of Proposition 3

## Acknowledgments

We thank the anonymous reviewers for their comments and suggestions, which helped to improve the quality and presentation of this letter. NICTA is funded by the Australian government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program. This work was partly supported by the Human Frontier Science Program (grant RGY 80/2008) and the Japanese Society for Promoting Science Fellowship (number S-10738).