Abstract

Accurate causal inference among time series helps to better understand the interactive scheme behind the temporal variables. For time series analysis, an unavoidable issue is the existence of time lag among different temporal variables. That is, past evidence would take some time to cause a future effect instead of an immediate response. To model this process, existing approaches commonly adopt a prefixed time window to define the lag. However, in many real-world applications, this parameter may vary among different time series, and it is hard to be predefined with a fixed value. In this letter, we propose to learn the causal relations as well as the lag among different time series simultaneously from data. Specifically, we develop a probabilistic decomposed slab-and-spike (DSS) model to perform the inference by applying a pair of decomposed spike-and-slab variables for the model coefficients, where the first variable is used to estimate the causal relationship and the second one captures the lag information among different temporal variables. For parameter inference, we propose an efficient expectation propagation (EP) algorithm to solve the DSS model. Experimental results conducted on both synthetic and real-world problems demonstrate the effectiveness of the proposed method. The revealed time lag can be well validated by the domain knowledge within the real-world applications.

1  Introduction

Temporal causal inference aims to reveal the causal relationships among temporal variables and time series. Much effort has gone into learning accurate temporal causal inference models for various applications, such as gene regulatory network discovery (Friedman, 2004; Liu, Niculescu-Mizil, Lozano, & Lu, 2011), social network analysis (Goldenberg & Moore, 2004), indicator analysis in business (Arnold, Liu, & Abe, 2007), climate data study (Chen, Liu, Liu, & Carbonell, 2010), and traffic analysis (Han, Song, Cong, & Xie, 2012).

For temporal causal modeling, an important factor is the time lag, which indicates how long one must look into the past to make accurate predictions of a current event. A real-world example is provided in Figure 1. Figure 1a shows the structure of a highway network; the red circle indicates a destination, and the blue circles denote the starting points. The traffic flow originating from the blue circles goes through the network, with a cost of some travel time, and finally arrives at the destination. Here, the travel time introduces a time lag between the time series recorded at the source points and the time series recorded at the destination. This lag information is captured in Figures 1b to 1d. Figure 1b records the number of vehicles from the source point A arriving at the destination between 10:00 and 10:15. We observe that these vehicles depart from A earlier: 9:15 to 10:00. By treating 15 minutes as a unit, the time lag from A to the destination thus takes a value within a range of units rather than a single fixed value. Similarly, Figures 1c and 1d record the lag information from starting points B and C to the destination, respectively. Commonly, the lag between a pair of time series can be described as a value in a range (e.g., the values in Figure 1). Hence, an accurate estimation of the range for each pair of time series can bring a better understanding of the causal relations and more accurate prediction performance.

Figure 1:

(a) A traffic network. (b–d) Lag intervals for different starting points.


Most of the existing causal inference methods treat the lag bounds as constants instead of variables to be learned. For example, previous approaches (Bahadori, Liu, & Xing, 2013; Cheng, Bahadori, & Liu, 2014; Lebre, Becq, Devaux, Stumpf, & Lelandais, 2010) fix the lower bound of the lag, which assumes that the current event is affected by the most recent history, and choose the window length either by cross-validation or as a predefined value. As we can see from the previous example, such a setting prevents the model from detecting the true time lags and causal relationships among temporal variables.

In this letter, we propose to learn the causal relationship and the time lag among different temporal variables at the same time. Specifically, we propose a probabilistic decomposed slab-and-spike (DSS) model in which we separate the inference by decomposing the slab-and-spike variable into a product of two latent variables: one designed for estimating the causal relationship among different temporal variables and the other designed to capture the optimal time lag among these variables in the form of an interval. For parameter inference, we propose an efficient expectation propagation algorithm to solve the DSS model. Experimental evaluations are performed on both synthetic and real-world problems, and the results demonstrate the effectiveness of the DSS model and that the recovered time lag matches the domain knowledge well.

The rest of the letter is organized as follows. Related work is discussed in section 2. We describe our model in section 3; then we give the detailed inference in section 4. We show experimental results on two data sets in section 5. We summarize the letter and conclude with future work in section 6.

2  Related Work

There has been a rich literature on temporal causal learning. Since the seminal work of Granger causality (Granger, 1969), where a statistical hypothesis test was proposed to determine whether one time series is useful in predicting another, many approaches have been developed for inferring the causal relationships among temporal variables. For example, the exhaustive graphical Granger method (Arnold, Liu, & Abe, 2007) applied the Granger causality test to each pair of temporal variables and established a causal relation graph for the variables; the vector autoregressive (VAR) model (Gilbert, 1995; Hatemi-J, 2004) fitted a linear regression by placing the current value of all the temporal variables as the response while treating the past values as covariates; the LASSO Granger method (Arnold et al., 2007; Qiu, Liu, Subrahmanya, & Li, 2012) employed the ℓ1 regularization to learn sparse causal relations among the variables. The SIN method (Arnold et al., 2007; Drton & Perlman, 2008) adopted a notion of the gaussian graphical model to test the causal relationships among temporal variables, and a greedy extension of this idea to deal with large-scale graphs was proposed in Han et al. (2012). Some causal inference models utilizing the hidden Markov random field regression were discussed in Liu, Kalagnanam, and Johnsen (2009) and Liu, Niculescu-Mizil, Lozano, and Lu (2010, 2011). Other causal inference models that aim to analyze irregular time series (Qiu et al., 2012; Bahadori & Liu, 2012), evolving relational graphs (Chen et al., 2010), and the switch of roles between cause and effect (Cheng et al., 2014) have been studied recently as well. Despite this rich literature, when considering the time lag, most of these methods fix its value as a constant before learning, leading to inaccurate estimates of the causal relations.

Very few studies have tried to learn the time lag from data. Dhurandhar (2010) proposed estimating the maximum lag for a grouped graphical Granger model. Knapp and Carter (1976) developed a maximum likelihood estimator to determine the time delay in correlation analysis. Unfortunately, these works focus only on inferring the lag and cannot infer the causal relationship at the same time. In contrast, we aim to learn the causal relationship and the lag among temporal variables simultaneously from data.

3  The Decomposed Slab-and-Spike Model

Throughout the letter, we use roman letters to denote scalars, boldface lowercase letters for vectors, and boldface uppercase letters for matrices. Denote the target time series as $\mathbf{y} = (y_1, \ldots, y_T)^\top$, where $T$ is the total length of the considered time. The input data contain $P$ features (candidate causal time series), and we assume the considered lag window elapses over $W$ time steps. The input data are converted to a new matrix $\mathbf{X}$ of size $N \times PW$, where $N$ is the number of samples; the $n$th row of $\mathbf{X}$ collects the $W$ most recent past values of each of the $P$ features. Now, we consider a linear regression model,
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \tag{3.1}$$
where $\boldsymbol{\beta} \in \mathbb{R}^{PW}$ is the regression coefficient vector, with $\beta_{ij}$ the coefficient of the $i$th variable at lag $j$, and $\boldsymbol{\epsilon}$ is gaussian noise.
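To make this preprocessing step concrete, here is a minimal sketch (our illustration, not the letter's code; the helper name `build_lagged_matrix` and the variable-major flattening order are assumptions) of how raw series become the matrix of lagged covariates:

```python
import numpy as np

def build_lagged_matrix(series, W):
    """Stack the W most recent past values of each of the P series.

    series : array of shape (T, P) -- raw temporal variables
    W      : maximum lag window
    Returns X of shape (T - W, P * W); row n holds series[n : n + W]
    flattened variable-major, so it predicts the value at time n + W.
    """
    T, P = series.shape
    N = T - W
    X = np.zeros((N, P * W))
    for n in range(N):
        # window of past values for each variable, flattened variable-major
        X[n] = series[n:n + W].T.reshape(-1)
    return X

# toy usage: 2 series of length 6, window 3 -> 3 samples with 6 lagged features
data = np.arange(12, dtype=float).reshape(6, 2)
X = build_lagged_matrix(data, W=3)
```

Each row then serves as one sample of lagged covariates for the regression in equation 3.1.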
The causal inference problem is to find the optimal coefficients $\boldsymbol{\beta}$, for which we propose a probabilistic model. The likelihood function is defined as
$$p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\beta}) = \mathcal{N}(\mathbf{y} \mid \mathbf{X}\boldsymbol{\beta}, \sigma_0^2 \mathbf{I}), \tag{3.2}$$
where $\mathcal{N}(\cdot \mid \mu, \sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$, and $\sigma_0^2$ is the noise variance.

Since we aim to learn the causal relationship and the time lag between the causes and the target simultaneously, we introduce two binary latent variables $z_i$ and $s_{ij}$, where $z_i$ controls the causal relationship between the $i$th candidate cause and the target and $s_{ij}$ contains the lag information. Specifically, $z_i = 1$ indicates that the $i$th variable is possibly causally related to the target; otherwise, if $z_i = 0$, the $i$th variable is not responsible for the target.

On the other hand, $s_{ij}$ determines whether a specific time point lies in the lag interval of the $i$th variable; here $j \in \{1, \ldots, W\}$ indexes the $j$th time point in the time window of variable $i$. That is, when $s_{ij} = 1$, the $j$th time point is regarded as belonging to the lag interval of variable $i$; hence, the corresponding past value could be a cause of the target, and $\beta_{ij}$ may be nonzero. Otherwise, when $s_{ij} = 0$, the $j$th time point is outside the lag interval and is removed from the model: $\beta_{ij} = 0$.

In order to capture the causalities and lag information in one unified model, we use the corresponding two latent variables to control the regression coefficient $\beta_{ij}$. The model coefficient should satisfy $\beta_{ij} \neq 0$ only if $z_i = 1$ and $s_{ij} = 1$, and $\beta_{ij} = 0$ if $z_i = 0$ or $s_{ij} = 0$. Therefore, the model coefficients admit a sparse structure determined by the two latent variables. To achieve this, we apply spike-and-slab priors, which have been used in the probabilistic sparse learning literature (Chipman, 1996; Hernández-Lobato, Hernández-Lobato, & Suárez, 2015; Hernández-Lobato, Hernández-Lobato, & Dupont, 2013). Specifically, we devise a probabilistic model based on the spike-and-slab prior by using the two binary variables to control the sparsity of $\beta_{ij}$ as
$$p(\beta_{ij} \mid z_i, s_{ij}) = z_i s_{ij}\, \mathcal{N}(\beta_{ij} \mid 0, \sigma^2) + (1 - z_i s_{ij})\, \delta(\beta_{ij}), \tag{3.3}$$
where $\sigma^2$ is the variance of a gaussian distribution (the slab) and $\delta(\cdot)$ is a Dirac delta function, also known as a point probability mass (the spike), centered at 0. The value of $\sigma^2$ controls the shrinkage of the coefficients that are different from zero: if $\sigma^2$ is large, the nonzero $\beta_{ij}$'s are barely regularized, whereas if $\sigma^2$ is small, the coefficients strongly shrink toward zero. According to equation 3.3, the model coefficient depends on two latent variables, which imply the causal relationship and the lag information, respectively. Therefore, we call our method the decomposed slab-and-spike (DSS) model.
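The decomposed prior in equation 3.3 can be illustrated with a short sampling sketch (our illustration; `sample_coefficients` is a hypothetical helper): a coefficient is drawn from the gaussian slab only when both indicators are 1, and is exactly zero otherwise.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_coefficients(z, S, sigma2=1.0, rng=rng):
    """Draw b_ij from the decomposed spike-and-slab prior:
    b_ij ~ N(0, sigma2) if z_i * s_ij == 1 (slab), else b_ij = 0 (spike).

    z : (P,) causal indicators; S : (P, W) lag-interval indicators.
    """
    mask = z[:, None] * S  # product of the two binary latent variables
    return mask * rng.normal(0.0, np.sqrt(sigma2), size=S.shape)

z = np.array([1, 0, 1])                          # variables 0 and 2 are causal
S = np.array([[0, 1, 1], [1, 1, 1], [1, 0, 0]])  # per-variable lag indicators
B = sample_coefficients(z, S)
# rows with z_i = 0 are entirely zero; z_i = 1 rows are zero outside the interval
```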
For the latent variable $z_i$, we apply a multivariate Bernoulli prior, which is commonly used for binary variables:
$$p(\mathbf{z}) = \prod_{i=1}^{P} \mathrm{Bern}(z_i \mid \rho_i), \tag{3.4}$$
where $\rho_i \in [0, 1]$ is the probability that $z_i$ is nonzero.
Now we focus on the binary variable $s_{ij}$, whose function is to indicate whether the $j$th time point should be included in the lag interval of the $i$th variable. Thus we apply a Bernoulli prior for $s_{ij}$:
$$p(s_{ij}) = \mathrm{Bern}(s_{ij} \mid q_{ij}), \tag{3.5}$$
where $q_{ij}$ is the probability of the $j$th time point lying in the lag interval of variable $i$.

Recall that we consider the lag in the form of an interval; therefore, we need to encode its two endpoints in the latent variables $s_{ij}$. To achieve this, we introduce two new variables, $a_i$ and $b_i$, for the left and right endpoints, so that the lag interval of variable $i$ can be represented as $[a_i, b_i]$.
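A small sketch of this encoding (our illustration, with a hypothetical helper name): given endpoints a and b, the binary lag indicators are simply membership tests of each time point in [a, b].

```python
import numpy as np

def interval_to_indicator(a, b, W):
    """Binary lag indicators s_j = 1 iff time point j lies in [a, b].

    a, b : interval endpoints (in lag units, 0-indexed), with a <= b
    W    : window length
    """
    j = np.arange(W)
    return ((j >= a) & (j <= b)).astype(int)

s = interval_to_indicator(2, 4, W=8)  # -> [0, 0, 1, 1, 1, 0, 0, 0]
```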

We assume that if the $j$th time point belongs to the lag interval of variable $i$, the probability $q_{ij}$ is positive; otherwise, $q_{ij} = 0$. The probability within the lag interval does not follow a fixed form, so we assume it is uniform over the interval:
$$p(s_{ij} = 1 \mid a_i, b_i) \propto \mathbb{1}[a_i \le j \le b_i]. \tag{3.6}$$
Without loss of generality, we apply normal priors for the variables $a_i$ and $b_i$ as follows:
$$p(\mathbf{a}) = \mathcal{N}(\mathbf{a} \mid \boldsymbol{\mu}_a, \boldsymbol{\Sigma}_a), \tag{3.7}$$
$$p(\mathbf{b}) = \mathcal{N}(\mathbf{b} \mid \boldsymbol{\mu}_b, \boldsymbol{\Sigma}_b), \tag{3.8}$$
where $\boldsymbol{\mu}_a$ and $\boldsymbol{\mu}_b$ are the mean vectors and $\boldsymbol{\Sigma}_a$ and $\boldsymbol{\Sigma}_b$ denote the covariance matrices, which are diagonal.
Now, by marginalizing out $z_i$ and $s_{ij}$, we have
$$p(\beta_{ij}) = \rho_i q_{ij}\, \mathcal{N}(\beta_{ij} \mid 0, \sigma^2) + (1 - \rho_i q_{ij})\, \delta(\beta_{ij}). \tag{3.9}$$

Based on equation 3.9, Figures 2a and 2b show example plots of the probability density under different choices of hyperparameters. The prior is a mixture of a gaussian distribution (the slab) and a point probability mass (the spike); the spike is displayed as an upward-pointing arrow whose height is the probability mass of the prior at zero. From Figure 2, we observe that the proposed spike-and-slab prior clearly discriminates between the coefficients modeled by the slab, whose probability density is barely changed, and the coefficients described by the spike, whose density has significantly different peaks around zero.

Figure 2:

Probability density of the coefficient prior under different choices of hyperparameters, with the mixing probability taking values 0.5, 1, and 0, respectively.

In many real-world applications, we may be aware of some a priori knowledge about the causal relationship and the time lag. For example, in traffic and climate analysis, the time series collected from spatially close locations are more likely to be causally related, and the time lag between them is relatively small compared with those from distant locations (see the example in Figure 1). Hence, we may choose the hyperparameters, such as $\rho_i$, $\boldsymbol{\mu}_a$, and $\boldsymbol{\mu}_b$, to match this knowledge and facilitate the learning, as we do in our experiments.

3.1  Prediction and Dependence

Given the observed data $\mathbf{X}$ and $\mathbf{y}$, we make inferences about the potential values of the latent variables using Bayes' theorem:
$$p(\boldsymbol{\beta}, \mathbf{z}, \mathbf{S}, \mathbf{a}, \mathbf{b} \mid \mathbf{y}, \mathbf{X}) = \frac{p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\beta})\, p(\boldsymbol{\beta} \mid \mathbf{z}, \mathbf{S})\, p(\mathbf{S} \mid \mathbf{a}, \mathbf{b})\, p(\mathbf{z})\, p(\mathbf{a})\, p(\mathbf{b})}{p(\mathbf{y} \mid \mathbf{X})}, \tag{3.10}$$
where $p(\mathbf{y} \mid \mathbf{X})$ is a normalization constant, known as the model evidence. This posterior distribution and the likelihood, equation 3.2, can be combined to compute a predictive distribution for the target associated with a new observation $\mathbf{x}_*$:
$$p(y_* \mid \mathbf{x}_*, \mathbf{y}, \mathbf{X}) = \int p(y_* \mid \mathbf{x}_*, \boldsymbol{\beta})\, p(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X})\, d\boldsymbol{\beta}. \tag{3.11}$$
The posterior distribution of $\mathbf{z}$ defines the probability of causal features:
$$p(z_i = 1 \mid \mathbf{y}, \mathbf{X}) = \sum_{\mathbf{z} : z_i = 1} p(\mathbf{z} \mid \mathbf{y}, \mathbf{X}). \tag{3.12}$$

A practical difficulty is that the exact computation of equations 3.10 to 3.12 is intractable for typical learning problems: all of these expressions involve sums over the configurations of the binary latent variables, whose number grows exponentially. Thus, they must be approximated in practice. As an efficient alternative, we employ expectation propagation (EP), a method for fast, approximate inference in Bayesian models.

4  Parameter Inference

In this section, we propose an efficient parameter inference algorithm to solve the DSS model.

EP is a deterministic method for carrying out approximate Bayesian inference. EP approximates the posterior distribution of the parameters of interest using a simpler parametric distribution $\mathcal{Q}$. The form of $\mathcal{Q}$ is chosen so that the integrals required to calculate expected values and marginal distributions with respect to $\mathcal{Q}$ can be obtained analytically in closed form. EP fixes the parameters of $\mathcal{Q}$ to approximately minimize the Kullback-Leibler divergence between the exact posterior and $\mathcal{Q}$ (Hernández-Lobato et al., 2013).
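The KL-projection step at the heart of EP has a simple closed form for gaussian approximating families: minimizing KL(p ∥ q) over gaussians q reduces to matching the mean and variance of p. The sketch below (ours, not from the letter) matches a gaussian to a two-component mixture resembling a spike-and-slab marginal:

```python
import numpy as np

def match_gaussian_to_mixture(weights, means, variances):
    """KL(p || q) is minimized over gaussians q by moment matching:
    the projection q = N(m, v) uses the mixture's exact mean and variance."""
    w = np.asarray(weights, float)
    mu = np.asarray(means, float)
    var = np.asarray(variances, float)
    m = np.sum(w * mu)
    v = np.sum(w * (var + mu ** 2)) - m ** 2  # E[x^2] - (E[x])^2
    return m, v

# mixture resembling a spike-and-slab marginal: narrow spike at 0, wide slab
m, v = match_gaussian_to_mixture([0.7, 0.3], [0.0, 0.0], [1e-4, 1.0])
```

The matched variance blends the spike and slab variances according to the mixing weights.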

The joint posterior distribution of all the latent variables can be written as a product of factors,
$$p(\boldsymbol{\beta}, \mathbf{z}, \mathbf{S}, \mathbf{a}, \mathbf{b} \mid \mathbf{y}, \mathbf{X}) \propto \prod_{k} f_k(\boldsymbol{\beta}, \mathbf{z}, \mathbf{S}, \mathbf{a}, \mathbf{b}), \tag{4.1}$$
corresponding to the likelihood and the priors in equations 3.2 to 3.8. EP approximates each exact factor $f_k$ by a simpler factor $\tilde{f}_k$, such that
$$\prod_k f_k \approx \prod_k \tilde{f}_k. \tag{4.2}$$
All approximate factors are assumed to belong to the same exponential family, and the posterior approximation is their normalized product, $\mathcal{Q} = Z^{-1} \prod_k \tilde{f}_k$, where $Z$ is a normalization term.

4.1  Approximation of the Posterior

In equation 4.1, the priors on $\mathbf{z}$, $\mathbf{a}$, and $\mathbf{b}$ can be represented by equations 3.4, 3.7, and 3.8, respectively. The likelihood factor refers to a multivariate gaussian distribution over $\boldsymbol{\beta}$ with mean and covariance matrix defined as
formula
Its approximate factor is set to be equivalent to the exact factor, and it is not iteratively reestimated through the EP algorithm. The corresponding parameters of the approximate factor for the prior on $\mathbf{z}$ are
formula
4.3
where $\mathrm{logit}(\cdot)$ is the logit function defined as
$$\mathrm{logit}(x) = \log \frac{x}{1 - x}. \tag{4.4}$$
We propose to derive approximations of the two remaining factors, the spike-and-slab prior on $\boldsymbol{\beta}$ and the prior on $\mathbf{S}$, using the EP method. In order to take advantage of the structure of the true posterior distribution, we assume that both approximate factors can be decomposed into independent terms. Specifically, let them be
formula
4.5
formula
4.6
where $\sigma(x) = 1/(1 + e^{-x})$ denotes the sigmoid function and $\mathrm{Bern}(\cdot \mid \cdot)$ denotes the Bernoulli probability. Each component of the first approximate factor is the product of a univariate gaussian distribution and two Bernoulli distributions, and each component of the second is the product of two univariate gaussian distributions and a Bernoulli distribution. Based on the approximations in equations 4.5 and 4.6, we obtain a joint variational approximation of the posterior:
formula
4.7
According to the quotient rules for gaussian and Bernoulli distributions (Hernndez-Lobato et al., 2013), the joint parameters are defined as
formula
4.8
formula
4.9
formula
4.10
where each variance term denotes the variance of the corresponding approximate factor.

4.2  EP Update Operations

Next, we perform the expectation propagation updates. Taking the first approximate factor as an example, we first compute the corresponding parameters of the marginal distributions of $\beta_{ij}$, $z_i$, and $s_{ij}$ under the cavity distribution, which is obtained by removing that factor from $\mathcal{Q}$. The parameters of the cavity distribution are obtained from the rules for the quotient between gaussian distributions and the quotient between Bernoulli distributions (Hernández-Lobato, 2009). Once the cavity distribution is computed, we can find the updated approximate factor, which minimizes the KL divergence between the exact and approximate posteriors. Once all the approximate factors have been updated in parallel, $\mathcal{Q}$ needs to be recomputed as the product of all the approximate factors. A similar calculation process can be found in Hernández-Lobato et al. (2013) and Hernández-Lobato (2009). Computations for the second approximate factor can be derived similarly. The procedure is depicted in algorithm 1.
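The quotient rule mentioned here is easiest to see in natural parameters: dividing one gaussian by another subtracts precisions and precision-weighted means, which is exactly how an EP cavity distribution is formed. A minimal sketch (ours; valid only when the resulting precision is positive):

```python
def gaussian_quotient(m1, v1, m2, v2):
    """Divide N(m1, v1) by N(m2, v2) using natural parameters:
    precisions subtract, precision-weighted means subtract. This is the
    cavity-distribution step used in EP updates. Requires 1/v1 > 1/v2
    so the result is a proper gaussian.
    """
    prec = 1.0 / v1 - 1.0 / v2
    v = 1.0 / prec
    m = v * (m1 / v1 - m2 / v2)
    return m, v

# removing an approximate factor N(0, 2) from a posterior N(1, 1)
m_cav, v_cav = gaussian_quotient(1.0, 1.0, 0.0, 2.0)
```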


5  Experiments

In this section, we report empirical results on both synthetic and real-world data sets. We compare the proposed DSS model with state-of-the-art temporal causal inference methods, including the Granger LASSO (L(G); Arnold et al., 2007), the standard LASSO (L(S); Lee, Lee, Abbeel, & Ng, 2006), the SIN Granger method (Drton & Perlman, 2008), the TLHL model (Zhou et al., 2015), and the VAR model (Gilbert, 1995). Specifically, the standard LASSO does not consider a time lag interval; it looks back only one time unit. Granger LASSO and SIN Granger are given a fixed time window as the lag interval. The TLHL model takes lags as adaptive parameters, but it assumes the regression coefficients follow an exponential form. The VAR model generalizes the univariate AR model to multiple time series. For the LASSO-based methods, we choose the regularization parameters from a candidate set by cross-validation, following Arnold et al. (2007).

5.1  Synthetic Data

5.1.1  Setting

We first evaluate the various methods on synthetic data, fixing the number of features, the window size, and the number of samples. The predictors are generated from the standard normal distribution. In order to simulate the causal relationship, we set the response variable to be affected by the first eight temporal variables. The corresponding probability matrix is set as shown in Figure 3a.

Figure 3:

(a) The true probability matrix. (b) The probability matrix learned by the DSS model.


For the model coefficients, we generate each entry from a gaussian distribution: the entries corresponding to the true causal variables within their lag intervals have a nonzero mean, and the coefficients are set to zero in all other cases. The variance is always set to 1. The response is finally generated according to the linear model in equation 3.1.
We select the regularization parameter in the standard LASSO based on cross-validation: we select the optimal value from a candidate set and use the best-performing one. The selection for the Granger LASSO is the same as that for the standard LASSO.
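As an illustration of this selection step (our sketch, not the authors' implementation; the candidate set, the ISTA solver, and all names are assumptions), the LASSO can be fit by iterative soft thresholding and the regularization parameter chosen by k-fold cross-validation:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by ISTA."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    step = 1.0 / L
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

def select_lambda(X, y, candidates, k=5):
    """Pick the candidate with lowest k-fold cross-validated squared error."""
    n = X.shape[0]
    folds = np.array_split(np.arange(n), k)
    def cv_error(lam):
        err = 0.0
        for f in folds:
            train = np.ones(n, bool)
            train[f] = False
            w = lasso_ista(X[train], y[train], lam)
            err += np.sum((y[f] - X[f] @ w) ** 2)
        return err
    return min(candidates, key=cv_error)

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))
w_true = np.zeros(10)
w_true[:2] = [3.0, -2.0]                 # sparse ground-truth coefficients
y = X @ w_true + 0.1 * rng.normal(size=80)
lam = select_lambda(X, y, candidates=[0.01, 0.1, 1.0])
w = lasso_ista(X, y, lam)
```

With low noise, cross-validation favors a small penalty and the fit recovers the sparse support.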

5.1.2  Evaluation

In order to evaluate the lag inference performance of the proposed method, we introduce some measurements. In our model, for each temporal variable $i$, we can determine the lag interval via the probability $q_{ij}$, that is, the probability of the $j$th time point lying in the lag interval of variable $i$: the smallest time point with high probability is regarded as the minimum lag, and the largest such time point as the maximum lag. Let $\hat{q}_{ij}$ denote the probability estimated by the DSS model and $q^*_{ij}$ the true probability based on the fixed minimum and maximum lags; if $\hat{q}_{ij} > 0.5$, time point $j$ is regarded as lying in the lag interval of variable $i$. Then the precision and recall for recovering the true lag intervals are defined as
$$\text{precision} = \frac{|\{(i,j): \hat{q}_{ij} > 0.5 \text{ and } q^*_{ij} = 1\}|}{|\{(i,j): \hat{q}_{ij} > 0.5\}|}, \qquad \text{recall} = \frac{|\{(i,j): \hat{q}_{ij} > 0.5 \text{ and } q^*_{ij} = 1\}|}{|\{(i,j): q^*_{ij} = 1\}|}.$$
Based on the precision and recall, the F1 score is defined as $F_1 = 2 \cdot \text{precision} \cdot \text{recall} / (\text{precision} + \text{recall})$, and we report the average F1 score. For measuring the causal recovery performance, we also use the F1 score, with the precision and recall computed analogously between the true causal indicator vector and the learned one. Moreover, we use the mean absolute percentage error (MAPE), defined as
$$\text{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|,$$
to evaluate the prediction performance of the different methods.
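The measurements above can be sketched directly (our illustration; the function names are hypothetical):

```python
import numpy as np

def f1_score(pred, truth):
    """F1 = 2PR/(P+R) for binary vectors (1 = in lag interval / causal)."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    tp = np.sum((pred == 1) & (truth == 1))
    precision = tp / max(np.sum(pred == 1), 1)
    recall = tp / max(np.sum(truth == 1), 1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mape(y_true, y_pred):
    """Mean absolute percentage error."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])  # tp=1, P=R=0.5
err = mape([2.0, 4.0], [1.0, 5.0])         # (0.5 + 0.25) / 2
```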

5.1.3  Results

We first evaluate the estimated parameters in the DSS model. Figures 3a and 3b show the true probability matrix and the one learned by the DSS model, respectively. Comparing the two panels, we can see that the learned probability matrix is close to the true matrix, especially for the time points where the probability equals 1. Thus, the learned probabilities clearly recover the true ones, which demonstrates that the proposed DSS model can accurately estimate the lag intervals for the different covariates.

Moreover, we can obtain the causal relations by observing the probability that each coefficient is different from zero. Figure 4 reports the learned probabilities. From the results, we observe that they match the ground truth well: the first eight variables are identified as the causal factors of the response.

Figure 4:

The learned probability from the DSS model.


Figure 5 shows the effect of the window size $W$, which we vary from 1 to 25 while evaluating the F1 score of the obtained lags. It is clear that when $W$ is larger than the maximum lag defined for all the features (i.e., 20), the F1 score reaches a high value around 0.8 and generally remains stable. This observation is reasonable, since the considered time window should be larger than the existing maximum lag so that all the lags in the variables can be discovered. Moreover, the results also imply that we do not need a very large $W$.

Figure 5:

The F1 score of lag estimation with varying W.


Next, we compare the DSS model with the other methods in terms of learning accuracy. The F1 score of causal relationship recovery is provided in Figure 6. From the result, we can conclude that the DSS model is much more stable than all the other methods. Because the standard LASSO with a fixed time lag considers only one lag value, it cannot capture lag fluctuations; thus its performance is worse.

Figure 6:

The F1 score of causal relation recovery.


Finally, based on the recovered dependency relationships, we evaluate the various models in terms of prediction performance, using the MAPE. Figure 7 provides the prediction results. We see that our model achieves the lowest prediction errors in the MAPE.

Figure 7:

The MAPE of different methods.


5.2  Traffic Data

In this section, we evaluate the proposed methods on a real-world traffic data set studied by Han et al. (2012).

5.2.1  Setting

The features in this traffic data set are observations collected from sensors located on stations in a traffic network. The observation of each station is the number of passed vehicles over 15 minutes. The data are collected from records over 92 days, and there are 236 traffic stations, so 236 time series were observed in this data set. The data values are normalized to [0,1]. The task here is to infer the causal relationship among different time series from two busy periods in each day: 7:00 a.m. to noon and midnight to 7:00 a.m. We select one month's data, with 600 samples as the training data.

In this traffic data set, there exists prior knowledge on the lag intervals. When a vehicle passes a traffic station, the time is recorded; as a consequence, we can compute the average travel time between any two locations. This travel time can be viewed as a rough guess of the lag, and we use it to define the hyperparameters in the DSS model. Specifically, we set the prior means of the lag-interval endpoints according to the corresponding average travel time.

5.2.2  Results and Discussion

Table 1 shows the predicted mean absolute percentage error for all the methods. From the results, we see that the DSS method outperforms the other methods. This demonstrates that learning both the causal relations and the lag intervals simultaneously from data can significantly improve prediction performance compared to the methods that fix the length of the lag.

Table 1:
MAPE of Various Methods on the Traffic Data Set.

Method   L(S)   VAR    L(G)   TLHL   DSS
MAPE     0.35   0.22   0.30   0.23   0.12

We also show the detailed causal relations that the DSS model learned from 7:00 a.m. to noon and from midnight to 5:00 a.m. in Figures 8a and 8b, respectively. In the two figures, the red circle indicates the target station (destination) and the blue circles the source stations (starting points). We observe that the causally related source stations are mostly located close to the target station from 7:00 a.m. to noon. However, from midnight to 5:00 a.m. (late night), the causally related source stations change dramatically: these source stations are generally distant from the target station.

This interesting phenomenon can be well explained by our domain experts. A large number of trucks travel through this traffic network late at night every day, and most of them travel a long distance for business reasons. In contrast, during the day, most traffic is composed of private cars, which usually travel a very short distance. This knowledge well explains the results in Figures 8a and 8b and demonstrates the effectiveness of the proposed method.

Figure 8:

Causal relations learned by the DSS model. The red circle indicates the target station and the blue circle the source station.


The lag intervals that the DSS model learned are shown in Figure 9. It is obvious that most of the ground-truth travel time values lie in the lag intervals the DSS model learned, which implies that the inferred lag intervals are very accurate. For example, from Figure 9a, we can see that the lag interval of station 10 is [9,15] while that of station 1 is [2,8], which is consistent with the fact that the distance between station 10 and the target station is much farther than the one between station 1 and the target station.

Figure 9:

The lag intervals learned by DSS and the corresponding mean travel time. (a) The lag intervals of dependency relationships for 7:00 a.m. to noon. (b) The lag intervals of dependency relationships for midnight to 5:00 a.m. The black curve denotes the travel time from the dependent station to the target station.


5.3  Air Quality Data

In this section, we apply our method to air quality data analysis. The problem of air quality has attracted increasing interest in recent years (Intergovernmental Panel on Climate Change, 2007). In this application, one of the most important challenges is to discover the dispersion path of air pollution (Seinfeld & Pandis, 2016), and learning the spatial causal relationships is very helpful for interpreting and understanding the spreading pattern of the pollution. We test all the methods on a real-world air quality data set to evaluate their effectiveness.

5.3.1  Data

The air quality data are collected from 14 air quality monitoring stations at different locations in Beijing, China. Each station measures hourly PM2.5 from December 1, 2014, to March 31, 2015. We extract the period from February 20, 2015, at 7:00 a.m. to February 22 at 7:00 p.m. from the entire time series, since the air was severely polluted during this interval. We choose the time series from four sensors located in four important areas as the target time series and then determine their causal factors from the time series collected by all the sensors separately.

5.3.2  Results and Discussion

Figure 11 shows the recovered causal relationships and the estimated lag intervals for the four sensors. We find that the estimated causal relationships are consistent with the direction of the pollution diffusion revealed in Figure 10. From the results, we can infer that this pollution incident originated in the southwest of Beijing. As our domain experts acknowledged, a number of chemical factories are located in these areas, and this evidence provides a reasonable explanation for our results.

Figure 10:

The three snapshots of the pollution process.


Figure 11:

The spatial dependencies and lag intervals of four sensors learned by the DSS model.


In addition, we also compare the error with other methods. Table 2 shows the MAPE of all methods on this data set. From the table, we see that our model obtains the lowest prediction error. Thus, we can conclude that causal inference with time lag helps to provide more accurate prediction of the future effects.

Table 2:
The MAPE of Various Methods on the Air Quality Data Set.

Method   L(G)   L(S)   VAR    TLHL   DSS
MAPE     0.29   0.64   0.51   0.29   0.24

6  Conclusion

Time lag is an important parameter in causal relationship learning. In this letter, we proposed a probabilistic decomposed slab-and-spike model to simultaneously estimate the causal relationship and the time lag interval within a unified framework. We developed an expectation propagation algorithm for the parameter inference. We evaluated our model on one synthetic data set and two real-world data sets. The experimental results demonstrate the effectiveness of the proposed model in learning the causal relationship as well as the time lag. Importantly, the results on the real data sets are well interpreted by domain experts.

It will be interesting to extend the static DSS model to learn a dynamic model, where the causal relationship and lags are time varying over the entire time series. We are also interested in learning the causal relationship and the time lag between the covariates and multiple responses at the same time.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61572041) and the Beijing Natural Science Foundation (4152023).

References

Arnold, A., Liu, Y., & Abe, N. (2007). Temporal causal modeling with graphical Granger methods. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 66–75). New York: ACM.

Bahadori, M. T., & Liu, Y. (2012). Granger causality analysis in irregular time series. In Proceedings of the 2012 SIAM International Conference on Data Mining (pp. 660–671). Philadelphia: Society for Industrial and Applied Mathematics.

Bahadori, M. T., Liu, Y., & Xing, E. P. (2013). Fast structure learning in generalized stochastic processes with latent factors. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 284–292). New York: ACM.

Chen, X., Liu, Y., Liu, H., & Carbonell, J. G. (2010, July). Learning spatial-temporal varying graphs with applications to climate data analysis. In Proceedings of the 24th AAAI Conference on Artificial Intelligence. Cambridge, MA: AAAI Press.

Cheng, D., Bahadori, M. T., & Liu, Y. (2014). FBLG: A simple and effective approach for temporal dependence discovery from time series data. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 382–391). New York: ACM.

Chipman, H. (1996). Bayesian variable selection with related predictors. Canadian Journal of Statistics, 24(1), 17–36.

Dhurandhar, A. (2010, December). Learning maximum lag for grouped graphical Granger models. In Proceedings of the 2010 IEEE International Conference on Data Mining Workshops (pp. 217–224). Piscataway, NJ: IEEE.

Drton, M., & Perlman, M. D. (2008). A SINful approach to gaussian graphical model selection. Journal of Statistical Planning and Inference, 138(4), 1179–1200.

Friedman, N. (2004). Inferring cellular networks using probabilistic graphical models. Science, 303(5659), 799–805.

Gilbert, P. D. (1995). Combining VAR estimation and state space model reduction for simple good predictions. Journal of Forecasting, 14(3), 229–250.

Goldenberg, A., & Moore, A. (2004). Tractable learning of large Bayes net structures from sparse data. In Proceedings of the Twenty-First International Conference on Machine Learning (p. 44). New York: ACM.

Granger, C. W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 37, 424–438.

Han, L., Song, G., Cong, G., & Xie, K. (2012). Overlapping decomposition for causal graphical modeling. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 114–122). New York: ACM.

Hatemi-J, A. (2004). Multivariate tests for autocorrelation in the stable and unstable VAR models. Economic Modelling, 21(4), 661–683.

Hernández-Lobato, D. (2009). Prediction based on averages over automatically induced learners: Ensemble methods and Bayesian techniques
.
Ph.D. diss., Universidad Autónoma de Madrid
.
Hernndez-Lobato
,
D.
,
Hernndez-Lobato
,
J. M.
, &
Dupont
,
P.
(
2013
).
Generalized spike-and-slab priors for Bayesian group feature selection using expectation propagation
.
Journal of Machine Learning Research
,
14
(
1
),
1891
1945
.
Hernndez-Lobato
,
J. M.
,
Hernndez-Lobato
,
D.
, &
Surez
,
A.
(
2015
).
Expectation propagation in linear regression models with spike-and-slab priors
.
Machine Learning
,
99
(
3
),
437
487
.
Intergovernmental Panel on Climate Change
. (
2007
).
Fourth Assessment Report: Climate Change 2007 (Working Paper 1325)
. http://wmo.insomnation.com/sites/default/files/documents/meetings/session20/doc2.pdf
Knapp
,
C.
, &
Carter
,
G.
(
1976
).
The generalized correlation method for estimation of time delay
.
IEEE Transactions on Acoustics, Speech, and Signal Processing
,
24
(
4
),
320
327
.
Lebre
,
S.
,
Becq
,
J.
,
Devaux
,
F.
,
Stumpf
,
M. P.
, &
Lelandais
,
G.
(
2010
).
Statistical inference of the time-varying structure of gene-regulation networks
.
BMC Systems Biology
,
4
(
1
),
130
.
Lee
,
S. I.
,
Lee
,
H.
,
Abbeel
,
P.
, &
Ng
,
A. Y.
(
2006
).
Efficient l 1 regularized logistic regression
. In
AAAI (Vol. 6
, pp.
401
408
).
Liu
,
Y.
,
Kalagnanam
,
J. R.
, &
Johnsen
,
O.
(
2009
).
Learning dynamic temporal graphs for oil-production equipment monitoring system
. In
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(pp.
1225
1234
).
New York
:
ACM
.
Liu
,
Y.
,
Niculescu-Mizil
,
A.
,
Lozano
,
A. C.
, &
Lu
,
Y.
(
2010
).
Learning temporal causal graphs for relational time-series analysis
. In
Proceedings of the 27th International Conference on Machine Learning
(pp.
687
694
).
New York
:
ACM
.
Liu
,
Y.
,
Niculescu-Mizil
,
A.
,
Lozano
,
A.
, &
Lu
,
Y.
(
2011
).
Temporal graphical models for cross-species gene regulatory network discovery
.
Journal of Bioinformatics and Computational Biology
,
9
(
2
),
231
250
.
Qiu
,
H.
,
Liu
,
Y.
,
Subrahmanya
,
N. A.
, &
Li
,
W.
(
2012
, December).
Granger causality for time-series anomaly detection
. In
IEEE 12th International Conference on Data Mining
(pp.
1074
1079
).
Piscataway, NJ
:
IEEE
.
Seinfeld
,
J. H.
, &
Pandis
,
S. N.
(
2016
).
Atmospheric chemistry and physics: From air pollution to climate change
.
Hoboken, NJ
:
Wiley
.
Zhou
,
X.
,
Hong
,
H.
,
Xing
,
X.
,
Huang
,
W.
,
Bian
,
K.
, &
Xie
,
K.
(
2015
, June).
Mining dependencies considering time lag in spatio-temporal traffic data
. In
Proceedings of the International Conference on Web-Age Information Management
(pp.
285
296
).
New York
:
Springer
.