Spectral dynamic causal modeling: A didactic introduction and its relationship with functional connectivity

Abstract We present a didactic introduction to spectral dynamic causal modeling (DCM), a Bayesian state-space modeling approach used to infer effective connectivity from noninvasive neuroimaging data. Spectral DCM is currently the most widely applied DCM variant for resting-state functional MRI analysis. Our aim is to explain its technical foundations to an audience with limited expertise in state-space modeling and spectral data analysis. Particular attention will be paid to cross-spectral density, which is the most distinctive feature of spectral DCM and is closely related to functional connectivity, as measured by (zero-lag) Pearson correlations. In fact, the model parameters estimated by spectral DCM are those that best reproduce the cross-correlations between all measurements—at all time lags—including the zero-lag correlations that are usually interpreted as functional connectivity. We derive the functional connectivity matrix from the model equations and show how changing a single effective connectivity parameter can affect all pairwise correlations. To complicate matters, the pairs of brain regions showing the largest changes in functional connectivity do not necessarily coincide with those presenting the largest changes in effective connectivity. We discuss the implications and conclude with a comprehensive summary of the assumptions and limitations of spectral DCM.


Introduction
Dynamic causal modelling (DCM) refers to the Bayesian fitting of state-space models to explain observed physiological signals in terms of hidden neuronal activity and connectivity (Friston et al., 2003; Triantafyllopoulos, 2021). The distinction between observed and hidden variables is particularly relevant in neuroscience because the signals recorded via non-invasive neuroimaging are not a direct measurement of neuronal states or connectivity. In fact, using observed recordings to infer unobserved neural interactions is the main purpose of DCM and the reason for its widespread adoption. It is also a distinctive feature that sets it apart from functional connectivity analysis, which simply characterises statistical dependencies in observed time series. Over the last 20 years, the versatility offered by state-space models has seen DCM applications in most neuroimaging modalities (Friston et al., 2003; Kiebel et al., 2008; Moran et al., 2009; Friston et al., 2014; Tak et al., 2015; Jung et al., 2019; Friston et al., 2019; Frässle et al., 2021), along with recent applications to epidemiology (Friston et al., 2022) and beyond (Bach et al., 2010). Navigating the vast and technical DCM literature, however, is by no means a trivial task, especially for the novice learner. Happily, there are excellent introductory resources on individual- and group-level analysis using deterministic versions of DCM, which are designed for neuroimaging experiments involving behavioural tasks (Stephan, 2004; Stephan et al., 2010; Zeidman et al., 2019a,b). A recent primer on variational Laplace explains how Bayesian inference is performed in DCM (Zeidman et al., 2022) using the SPM software (http://www.fil.ion.ucl.ac.uk/spm). However, there is a lack of introductory material on DCM for resting-state data analysis, despite the remarkable growth of the resting-state paradigm and the widespread uptake of these methods. Here, we fill this gap with a didactic introduction to spectral DCM that aims to explain its technical aspects (Friston et al., 2014; Razi et al., 2015).

Email address: leonardo.novelli@monash.edu (Leonardo Novelli)
What distinguishes spectral DCM from other DCM versions, and when should we choose it? Firstly, spectral DCM employs random differential equations instead of deterministic ones. These are used to model spontaneous endogenous fluctuations in neuronal activity, enabling resting-state analysis in the absence of experimental inputs. But its distinctive feature is the focus on modelling the measured cross-spectral density, which is a second-order summary statistic of the time-series data. This is closely related to Pearson's correlation, another second-order statistic and the most widely used measure of functional connectivity in neuroimaging (see diagram in Fig. 1; for mathematical relationships, also see (Friston et al., 2014, Fig. 1)). In fact, the correlation is obtained by normalising the covariance such that its values are restricted to the [−1, 1] interval. In turn, the covariance is a special case of the cross-covariance function between two time series, when there is no time lag between them. Finally, the Fourier transform of the cross-covariance function gives the cross-spectral density (under stationarity assumptions). In other words, the cross-spectral density is the equivalent representation of the cross-covariance function in the frequency domain instead of the time domain, an important relationship that we will unpack later in this article.

Figure 1: The distinctive feature of spectral DCM is the focus on modelling the measured cross-spectral density (top right), which is a second-order summary statistic of the time-series data. This is closely related to Pearson's correlation (bottom left), another second-order statistic and the most widely used measure of functional connectivity in neuroimaging. In fact, both can be derived from the cross-covariance function (top left). The cross-spectral density is obtained directly via the Fourier transform (horizontal arrow). As such, it is the equivalent representation of the cross-covariance function in the frequency domain instead of the time domain. The correlation is obtained in two steps (vertical arrows). We first compute the covariance as a special case of the cross-covariance function between two time series, by setting the time lag between them to zero. Finally, we normalise the covariance such that its values are restricted to the [−1, 1] interval to obtain the zero-lag correlation. For the mathematical relationships among these quantities, we refer readers to (Friston et al., 2014, Fig. 1).
Spectral DCM fits the parameters of a linear, continuous-time model to the observed cross-spectral density. The estimated parameters are those that best reproduce the cross-correlations between all variables, at all time lags. In particular, the estimated effective connectivity also reproduces the zero-lag correlations between the observed time series, the most common measure of functional connectivity in the literature. This would be appealing to researchers who are interested in both effective and functional connectivity. The nuanced relationship between effective and functional connectivity is explored in Section 3. Prior to that, in Section 2, we introduce and explain the various components of the generative model, i.e., the model that generates the cross-spectral density given a set of parameters. These basic building blocks are used routinely in signal processing and control theory and are often presented only briefly in the DCM literature. Here, we adopt an inclusive and slower pace for those who are not familiar with state-space models and spectral data analysis. That said, we count on the reader to fill in the gaps and look up concepts such as the Fourier transform (Oppenheim et al., 1997; Smith, 2002) or convolution (Smith, 2002), if needed. Moving from theory to practice, a step-by-step guide to running spectral DCM on a real resting-state functional MRI (fMRI) dataset is provided in Chapter 38 of the SPM12 manual (John Ashburner et al., 2020).
A final reason for choosing spectral DCM is its computational advantage compared to stochastic DCM (Li et al., 2011). It is important to note that the lower computational complexity and the resulting increase in speed rely on the assumption that the statistics of endogenous neuronal fluctuations are conserved over the experimental time window, making spectral DCM suitable for resting-state neuroimaging experiments. Experimental inputs can also be included via an additional term in the model, although applications to task experiments are infrequent in the literature. Introducing even stronger assumptions leads to even faster schemes, such as regression DCM, which can analyse hundreds of brain regions in minutes (Frässle et al., 2021). However, this method forgoes the strict separation between hidden and observed variables that is typical of state-space modelling and that we have used to define DCM herein. As the name suggests, regression DCM is more akin to the Bayesian fitting of a multivariate autoregressive model in the frequency domain.
The key assumptions made in spectral DCM are summarised in Section 4 with a discussion of the ensuing limitations.

Building the generative model, one element at a time
The signals recorded via non-invasive neuroimaging are not a direct measurement of neuronal activity. In the case of fMRI, the observed blood-oxygen-level-dependent (BOLD) signal captures changes in blood oxygenation that indirectly reflect neuronal activity. For this simple reason, spectral DCM models the neuronal and the observed variables separately (denoted by x and y, respectively). Such a distinction represents both the main strength and the challenge of the DCM framework.

Neuronal model
In spectral DCM, the neuronal model is defined by the linear random differential equation

ẋ(t) = A x(t) + v(t),    (1)

where x(t) is the N-dimensional state vector whose N scalar components represent different brain regions. These are called state variables in the state-space modelling literature (Durbin and Koopman, 2012; Williams II and Lawrence, 2007). The time derivative of the state vector is denoted as ẋ(t), where differentiation with respect to time is performed component-wise, i.e., ẋ(t) = [ẋ_1(t), . . ., ẋ_N(t)]⊺. The activity of the system is sustained by stochastic, non-Markovian, endogenous fluctuations denoted as v(t), which we will consider in Section 2.4. Let us first turn our attention to the A matrix, which defines and parameterises the effective connectivity.

Effective connectivity
The effective connectivity quantifies the directed effect of one brain region on another. To better understand its meaning, consider a simple deterministic system with two brain regions and no stochastic components. The matrix-vector notation in Eq. (1) can be unpacked into two scalar components

ẋ_1(t) = a_11 x_1(t) + a_12 x_2(t),    (2a)
ẋ_2(t) = a_21 x_1(t) + a_22 x_2(t),    (2b)

where a_jk corresponds to the element in the j-th row and k-th column of A. More explicitly, in this example, we have

A = [ a_11  a_12 ]
    [ a_21  a_22 ].    (3)

Let's initially assume that x_1 is inactive at time t_1 and set x_1(t_1) = 0 in Eq. (2a). We get

ẋ_1(t_1) = a_12 x_2(t_1),    (4)

stating that the instantaneous change in x_1 is proportional to the input from x_2. The effective connectivity a_12 is simply the coefficient that determines the rate of such change. Therefore, in DCM, the effective connectivity a_jk quantifies the instantaneous response rate of x_j caused by a change in x_k, in the ideal case where all other variables were kept fixed or set to zero.¹ Being a rate, effective connectivity is always measured in Hz (change per second). Fig. 2 shows the impact of a_12 on the response of x_1 to a constant input from x_2 with duration ∆t. Note that the effective connectivity determines the initial slope of the curve, which is steeper for high values of a_12. However, once the input from x_2 ceases, the magnitude and duration of the response in x_1 no longer depend on a_12; instead, they only depend on the self-connection a_11. That is, after the time interval ∆t, we have x_2(t) = 0 and Eq. (2a) becomes

ẋ_1(t) = a_11 x_1(t),    (5)

which has the simple exponential solution

x_1(t) = c e^(a_11 (t − t_1 − ∆t))    (6)

for each time t > (t_1 + ∆t), where the constant factor c is the value of x_1 when the input from x_2 ceases, i.e., c = x_1(t_1 + ∆t).

Figure 2: The role of the effective connectivity parameters in a deterministic linear system with two variables (the traces show the input x_2 and the responses x_1 with a_12 = 1, a_11 = −1 and with a_12 = 3, a_11 = −3). The parameter a_12 determines the instantaneous rate of response of x_1 to an input from x_2. In this example, the input is constant with duration ∆t. The initial slope of the response is steeper for higher values of a_12. Once the input from x_2 ceases, the activity of x_1 decays exponentially in the absence of other inputs. The decay rate is determined by the self-connection a_11, with larger negative values resulting in faster decay (shorter memory).
1 Readers who are familiar with multivariate calculus would recognise this as a partial derivative and the A matrix as a Jacobian.
It is useful, and biologically plausible, to impose a negativity constraint on the rate constant of the self-connections (i.e., a_11) to avoid instability and divergence to infinity. In this example, a negative value of a_11 guarantees that x_1(t) converges to zero.² In the multivariate case, the stability of a linear dynamical system is guaranteed when all the eigenvalues of the effective connectivity matrix A have a negative real part (Izhikevich, 2006).
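The behaviour illustrated in Fig. 2 is easy to reproduce numerically. The sketch below (a minimal example; the parameter values and the forward-Euler integration scheme are our own choices, mirroring the figure) integrates Eq. (2a) and checks the eigenvalue stability condition:

```python
import numpy as np

def simulate_x1(a11, a12, dt=0.001, t_input=1.0, t_total=5.0):
    """Euler integration of dx1/dt = a11*x1 + a12*x2, where x2 is a
    constant unit input that switches off after t_input seconds."""
    n = int(t_total / dt)
    x1 = np.zeros(n)
    for k in range(1, n):
        x2 = 1.0 if k * dt < t_input else 0.0
        x1[k] = x1[k - 1] + dt * (a11 * x1[k - 1] + a12 * x2)
    return x1

weak = simulate_x1(a11=-1.0, a12=1.0)    # shallow initial slope, slow decay
strong = simulate_x1(a11=-3.0, a12=3.0)  # steeper initial slope, faster decay

# Stability: all eigenvalues of A must have a negative real part.
A = np.array([[-1.0, 1.0], [0.0, -1.0]])
assert np.all(np.linalg.eigvals(A).real < 0)
```

The steeper initial slope with a_12 = 3 and the faster post-input decay with a_11 = −3 reproduce the qualitative behaviour described above.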

Power spectral density and cross-spectral density
A time-varying signal z(t) is a function of time. However, the same function could be represented as a sum of elementary sine waves, each characterised by a single frequency. This sum is weighted, with some frequencies carrying more weight than others (i.e., the sine waves can have different amplitudes). Every function z(t) is a unique mix of frequencies, some more pronounced, some less. This unique profile is called the Fourier frequency spectrum. The time- and frequency-domain representations of a function are two sides of the same coin: they are equally informative but reveal different and complementary aspects of the same data. The Fourier transform (F) is the mathematical tool that turns the coin over: it converts a function of time into the corresponding function of frequency (while the inverse Fourier transform does the opposite). If we denote the (angular) frequency by ω, then F turns the time function z(t) into the frequency function Z(ω). Mathematically, this transformation is achieved via the integral

Z(ω) = ∫ z(t) e^(−iωt) dt    (7)

(for time signals, we often only consider positive values of t and compute the integral over the [0, ∞) interval; this is equivalent to setting z(t) = 0 for all t < 0). The resulting Z(ω) is a function of the frequency ω and no longer depends on time. Somewhat ambiguously, the term "Fourier transform" is used to denote both the mathematical operation and the resulting function, Z(ω).
Note that Z(ω) typically returns complex values, due to the presence of the imaginary unit i in Eq. (7). Yet, the squared magnitude of a complex number (e.g. |Z(ω)|²) is a real number, defined as the square of its real part plus the square of its imaginary part. Therefore, the squared magnitude of the Fourier transform is a function that only returns real values, which makes it easier to understand and visualise. This function is the power spectral density of the signal. The simplicity of interpretation comes with a loss of information. After computing the squared magnitude to obtain the power spectral density, we cannot go back and recover the original complex-valued Fourier transform (similarly to what happens for real numbers, where the square produces a unique result but the square root has two solutions). A similar information loss affects all second-order properties of the time series, including the cross-spectral density and the (cross-)correlation, which are two key concepts that we will discuss and connect later.
Until now, we have only considered deterministic signals. However, spectral DCM is concerned with stochastic (non-deterministic) processes, such as the endogenous fluctuations that we will examine in the next section, which are a proxy for thoughts or mind-wandering-like processes during resting-state brain activity. A stochastic process is a sequence of random variables. If x is a stochastic process indexed by time, then x(t) is not a number but a random variable with a given probability distribution (see Appendix A for an illustration). The simplest example is the white-noise process, which follows a Normal distribution at each time point. According to stochastic calculus, the Fourier transform of white noise is also a stochastic process; however, it is indexed by frequency instead of time. Specifically, each frequency ω corresponds to a random variable that follows a Normal distribution with unit variance. Since the variance is the same for all frequencies, white noise has a flat power spectral density. The mathematically versed reader will have noticed that the squared magnitude of the Fourier transform of a stochastic process is also a stochastic process indexed by frequency. Therefore, in the case of stochastic processes, we will consider its expected value (E[|Z(ω)|²]) as the power spectral density, which is a number, i.e., a scalar function of frequency. Being the expectation of the squared magnitude, the power spectral density can also be understood as the variance of the Fourier transform of a stochastic process, if the latter has zero mean. We can equivalently express this concept via the equation

Var[Z(ω)] = E[|Z(ω)|²].    (8)

(Technical note: for simplicity, we assume that the Fourier transform exists. The general definition of the power spectral density involves a limit over the bounds of the Fourier integral (Miller and Childers, 2012).)
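The expectation in Eq. (8) can be approximated by averaging the squared Fourier magnitude over many realisations. The sketch below does this for white noise (the unitary 1/√n scaling is a choice made here so that the flat spectrum has unit height):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 256, 2000

# Draw many white-noise realisations and average |Z(w)|^2 over them.
z = rng.standard_normal((trials, n))
Z = np.fft.fft(z, axis=1) / np.sqrt(n)   # unitary scaling
psd = np.mean(np.abs(Z) ** 2, axis=0)    # approximates E[|Z(w)|^2]

# White noise has a flat power spectral density: every frequency bin
# fluctuates around the same value (here, 1).
```

With 2000 realisations the estimate is flat to within a few percent, as expected for white noise.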
From here, we can seamlessly transition to multivariate stochastic processes using the same mathematical tools. If x(t) is a vector with one element per brain region, its Fourier transform X(ω) is also a vector. This is important because it applies to the stochastic neuronal variable in Eq. (1). The multivariate analogue of the power spectral density is the cross-spectral density, defined as the covariance matrix

G_x(ω) = E[X(ω) X(ω)†],    (9)

where † indicates the conjugate transpose of the vector X(ω).
The outer product between the column vector X(ω) and its conjugate transpose is a square matrix. Specifically, G_x(ω) is an N × N matrix whose diagonal elements are the power spectral densities (variances) of individual neuronal variables, representing various brain regions. These are real positive numbers. Each off-diagonal element describes the cross-spectral density (covariance) between a different pair of variables. Unlike the diagonal elements, they generally take complex values. Admittedly, the definition of the cross-spectral density as a covariance in the frequency domain is quite abstract. A better intuition will develop after exploring the close relationship between cross-spectral density and functional connectivity. In Section 3.2, we'll see how the cross-spectral density is the Fourier transform of the cross-covariance function, which captures both the correlation matrix and its time-lagged extensions. For now, the power spectral density definition given above is sufficient to understand how endogenous fluctuations are modelled in spectral DCM.
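These properties of the cross-spectral density matrix can be verified numerically. The sketch below (an illustrative construction of our own; the 3-sample lag and noise level are arbitrary) estimates a 2 × 2 cross-spectral density for a pair of signals where the second is a lagged copy of the first:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 256, 500

G = np.zeros((n, 2, 2), dtype=complex)
for _ in range(trials):
    x1 = rng.standard_normal(n)
    x2 = np.roll(x1, 3) + 0.5 * rng.standard_normal(n)  # lagged copy + noise
    X = np.fft.fft(np.stack([x1, x2]), axis=1) / np.sqrt(n)
    G += np.einsum('ik,jk->kij', X, X.conj())           # X(w) X(w)^dagger
G /= trials

# Diagonal entries are power spectral densities (real, positive);
# off-diagonal entries are complex, and the time lag shows up in their phase.
```

The matrix comes out Hermitian at every frequency, with real positive diagonal entries and complex off-diagonal entries, exactly as described above.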

Endogenous fluctuations
Stable deterministic linear systems can only converge to a state of permanent equilibrium or produce an infinite sequence of identical oscillations. To overcome these limitations and add variability to the neuronal oscillations, we can introduce endogenous (intrinsic) fluctuations in the system, sometimes referred to as state noise. For example, adding the stochastic term v_1(t) to Eq. (5) gives:

ẋ_1(t) = a_11 x_1(t) + v_1(t).    (10)

At each time t, the random variable v_1(t) provides an endogenous input to the neuronal variable so that x_1(t) doesn't converge to zero despite the negative self-decay rate a_11. This holds true even in the absence of experimental inputs and inputs from other variables, as is the case in Eq. (10). In other words, the neuronal activity is now also modelled as an intrinsically fluctuating signal, i.e., a stochastic process. The addition of a stochastic term to a dynamical system is traditionally used to model noise, often assumed to be white (that is, serially uncorrelated and with a flat spectral density). Spectral DCM relaxes this assumption and allows the endogenous fluctuations to be temporally correlated, which makes them non-Markovian and smooth. Specifically, their power spectral density is modelled to follow a power-law decay as a function of the frequency ω:

G_vj(ω) = α_vj ω^(−β_vj).    (11)

The parameters α_vj and β_vj determine the amplitude and the decay rate of the power law and may differ between neuronal regions (j = 1, . . ., N). Note that the power-law family includes the flat spectrum (white noise) as a special case, where β_vj = 0.
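Fluctuations with the power-law spectrum in Eq. (11) can be generated by shaping white noise in the frequency domain. The sketch below is one common construction (the values of α and β are illustrative, and zeroing the DC component is our own choice to keep the signal finite):

```python
import numpy as np

def powerlaw_noise(n, alpha, beta, rng):
    """Generate noise whose power spectral density ~ alpha * w**(-beta)."""
    freqs = np.fft.rfftfreq(n)
    amp = np.zeros(freqs.size)
    amp[1:] = np.sqrt(alpha * freqs[1:] ** (-beta))  # sqrt: amplitude, not power
    phases = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
    return np.fft.irfft(amp * phases, n)

rng = np.random.default_rng(2)
v = powerlaw_noise(1024, alpha=1.0, beta=1.0, rng=rng)  # "1/f"-like fluctuations

# beta = 0 recovers white noise (flat spectrum); beta > 0 gives smooth,
# temporally correlated fluctuations dominated by slow frequencies.
```

Averaging the periodogram of many such realisations confirms that slow frequencies carry most of the power, unlike white noise.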
The endogenous fluctuations driving one neuronal variable are assumed to be independent of those driving the others. The result is that the cross-spectral density of the endogenous fluctuations vector v(t) = [v_1(t), . . ., v_N(t)]⊺ is a diagonal matrix. More precisely, the Fourier transform of v(t), denoted as the vector V(ω), is a multivariate Gaussian random variable with zero mean and diagonal covariance matrix

G_v(ω) = E[V(ω) V(ω)†] = diag(G_v1(ω), . . ., G_vN(ω)),    (12)

where each diagonal entry is the power spectral density of an endogenous fluctuation variable. We will return to this expression when assembling all the elements of the generative model.

Observation function
We motivated the use of state-space models by their ability to distinguish between hidden and observed variables. The function that relates the two is known as the observation function. Imagine hearing thunder, where the sound (observed variable) is generated by lightning (hidden variable). The role of the observation function is to describe the intensity and delay of the sound based on the distance from the lightning. The specific observation function used in fMRI is the hemodynamic response function (HRF), which links the neuronal activity to the observed BOLD signal. Similarly to the lightning and thunder example, there is a delay between the neuronal activity and the ensuing peak of the BOLD response. The profile of the response depends on several region-specific biophysical parameters and can be modelled mathematically (Stephan et al., 2007). For simplicity, we will denote the HRF of a brain region j as h_j(t), without explicitly indicating the biophysical parameters. The BOLD signal y_j(t) is obtained via convolution of the HRF with the neuronal activity:

y_j(t) = h_j(t) ∗ x_j(t) + e_j(t),    (13)

where j = 1, . . ., N and e_j(t) denotes the observation noise. By analogy with the endogenous fluctuations in Eq. (11), spectral DCM assumes that the power spectral density of the observation noise also follows a power-law decay:

G_ej(ω) = α_ej ω^(−β_ej).    (14)

The equivalent representation in vector notation is

y(t) = h(t) ∗ x(t) + e(t),    (15)

where h(t) is a diagonal matrix with diagonal elements h_j(t) for j = 1, . . ., N (as before, N is the number of regions). The noise terms in the vector e(t) = [e_1(t), . . ., e_N(t)]⊺ are assumed to be independent of each other, i.e., the noise in each region is independent of the noise in the other regions. Thus, the Fourier transform of e(t), denoted as E(ω), is a multivariate Gaussian random variable with zero mean and diagonal covariance matrix G_e(ω), whose diagonal entries are G_ej(ω), for all j = 1, . . ., N.
When working in the frequency domain, we can implement the haemodynamic response function as a filter, usually suppressing high frequencies, and implement delays by operating on the imaginary parts of the Fourier coefficients.
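The delay and smoothing induced by the observation function are easy to visualise by convolving a brief neuronal event with an HRF. The sketch below builds a canonical-style HRF as a difference of two gamma densities (the shape parameters 6 and 16 and the ratio 1/6 are conventional choices assumed here, not taken from the text):

```python
import numpy as np
from math import gamma

def hrf(t, a1=6.0, a2=16.0, c=1/6):
    """Difference of two gamma densities: main peak minus undershoot."""
    g1 = t ** (a1 - 1) * np.exp(-t) / gamma(a1)
    g2 = t ** (a2 - 1) * np.exp(-t) / gamma(a2)
    return g1 - c * g2

dt = 0.1
t = np.arange(0, 32, dt)
h = hrf(t)

x = np.zeros_like(t)
x[int(1.0 / dt)] = 1.0                 # brief neuronal event at t = 1 s
y = dt * np.convolve(x, h)[: t.size]   # BOLD-like response (no noise term)

# The response is delayed and smoothed: it peaks several seconds after
# the neuronal event, like thunder following lightning.
```

The low-pass character of this filter is what suppresses the high frequencies of the underlying neuronal activity in the observed signal.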

Putting it all together
The full state-space model used in spectral DCM is

ẋ(t) = A x(t) + v(t),
y(t) = h(t) ∗ x(t) + e(t),    (16)

where each equation has been introduced in its respective section above. Since DCM uses a Bayesian framework, all the model parameters are equipped with a prior distribution (Friston et al., 2014). Their posterior distribution is then computed via Bayesian inference using variational Laplace (Zeidman et al., 2022; Friston et al., 2007). In spectral DCM, this goal is achieved by fitting the generative model in Eq. (16) to the cross-spectral density of the data³, which is defined as the matrix

G_y(ω) = E[Y(ω) Y(ω)†].    (17)

Going "backwards" from the observed cross-spectral density to the parameter distribution is only possible once we specify the forward model. Specifying the forward model means deriving G_y(ω) as a function of the model parameters. We can start by invoking the convolution theorem, which states that the Fourier transform of a convolution of two functions is the (dot) product of their Fourier transforms. In the case of the observed signal y(t), we get

Y(ω) = H(ω) X(ω) + E(ω),    (18)

where the terms in capital letters are the Fourier transforms of the corresponding lowercase functions. Linear control theory gives us a useful expression for X(ω), obtained as the solution to the linear differential equation in Eq. (1) via the Laplace method:

X(ω) = (iωI − A)⁻¹ V(ω),    (19)

where I is the N-dimensional identity matrix (Pipes, 1968). Plugging Eqs. (18) and (19) into Eq. (17) yields

G_y(ω) = H(ω) E[X(ω) X(ω)†] H(ω)† + E[E(ω) E(ω)†] + H(ω) E[X(ω)] E[E(ω)†] + E[E(ω)] E[X(ω)†] H(ω)†.    (20)

The last two terms in Eq. (20) vanish since the noise term E(ω) is a Gaussian random variable with zero mean, i.e., E[E(ω)] = 0. Therefore,

G_y(ω) = H(ω) (iωI − A)⁻¹ E[V(ω) V(ω)†] [(iωI − A)⁻¹]† H(ω)† + E[E(ω) E(ω)†]    (21)
       = H(ω) (iωI − A)⁻¹ G_v(ω) [(iωI − A)⁻¹]† H(ω)† + G_e(ω).    (22)

In Eq. (22), we have substituted G_v(ω) = E[V(ω) V(ω)†] and G_e(ω) = E[E(ω) E(ω)†], as defined in Sections 2.4 and 2.5. We have now fully described the forward model used in spectral DCM. The predicted cross-spectral density can be computed using the following model parameters:

1. the effective connectivity parameters in the A matrix;
2. the power-law parameters (i.e., the amplitude and the exponent) describing the spectrum of the endogenous fluctuations and the observation noise (Eqs. (11) and (14));
3. the observation function parameters, e.g. the biophysical parameters of the BOLD balloon model.

Crucially, neither the neuronal state variables X(ω) nor the endogenous fluctuations V(ω) appear in Eq. (22), only the parameters describing their cross-spectral densities. This parametrisation allows spectral DCM to infer the model parameters listed above without inferring the hidden neuronal states. Inferring the state variables (neuronal time series) is a computationally harder problem addressed by stochastic DCM (Li et al., 2011).
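The forward model in Eq. (22) is straightforward to evaluate numerically. The sketch below uses illustrative parameter values, and the simple low-pass stand-in for H(ω) is an assumption of ours, not the DCM haemodynamic model:

```python
import numpy as np

def predicted_csd(A, omegas, alpha_v, beta_v, alpha_e, beta_e, H):
    """Evaluate G_y(w) = H (iwI - A)^-1 G_v [(iwI - A)^-1]^dag H^dag + G_e."""
    N = A.shape[0]
    I = np.eye(N)
    G = np.zeros((omegas.size, N, N), dtype=complex)
    for k, w in enumerate(omegas):
        Gv = np.diag(alpha_v * w ** (-beta_v))   # endogenous fluctuations
        Ge = np.diag(alpha_e * w ** (-beta_e))   # observation noise
        M = np.linalg.inv(1j * w * I - A)        # transfer from V(w) to X(w)
        Hw = H(w) * I                            # same filter for every region
        G[k] = Hw @ M @ Gv @ M.conj().T @ Hw.conj().T + Ge
    return G

H = lambda w: (1 + 1j * w) ** -6.0               # simple low-pass stand-in
A = np.array([[-1.0, 0.0], [0.5, -1.0]])         # x1 drives x2 (a_21 = 0.5)
omegas = np.linspace(0.01, 2.0, 50)
p = dict(alpha_v=np.full(2, 0.05), beta_v=np.full(2, 0.5),
         alpha_e=np.full(2, 0.01), beta_e=np.full(2, 0.5))
G = predicted_csd(A, omegas, H=H, **p)
```

Note that only the parameters enter the computation: no neuronal time series is ever generated, which is precisely the point made above.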

Simulated and empirical cross-spectral density
The forward model enables Bayesian inversion and also allows us to understand how different parameters affect the observed cross-spectral density. Fig. 3 shows how the cross-spectral density [G_y(ω)]_21 varies in a system with two hidden neuronal state variables, as a function of their effective connectivity strength. Specifically, the chosen effective connectivity matrix is

A = [ a_11  0    ]
    [ a_21  a_22 ].    (23)

The asymmetry in A indicates a directed effect of the first state variable on the second, but not vice-versa. The strength of the connection is determined by a_21. In this first simple example, increasingly large and positive values of a_21 generate increasingly large and positive cross-spectral density amplitudes. Similarly, negative values generate negative amplitudes (but we'll soon encounter more complex scenarios that violate this monotonic relationship). When a_21 = 0, the two neuronal state variables are independent of each other and the cross-spectral density is zero at all frequencies (Fig. 3; also see Appendix B for an explanation of the real and imaginary parts). Here, the cross-spectral densities of the endogenous fluctuations and of the observation noise are identical to each other and identical for both state variables (so that both diagonal entries are equal):

G_v1(ω) = G_v2(ω) = G_e1(ω) = G_e2(ω) = α ω^(−β).    (24)

The observation function is also identical for both state variables. Spectral DCM employs a HRF model with region-specific biophysical parameters (Stephan et al., 2007). However, for simplicity, the examples in this section are based on the canonical HRF gamma-mixture, whose Fourier spectrum is

H(ω) = (1 + iω)^(−a_1) − c (1 + iω)^(−a_2),    (25)

where a_1 and a_2 are the shape parameters of the two gamma densities in the mixture and c sets the relative amplitude of the undershoot. High frequencies are 'cut off' since the slow HRF smooths the faster neuronal activity (a typical feature of the BOLD signal).
Let's now extend the system in Eq. (23) by adding a third neuronal state variable (see Fig. 4) and a third row and column to the effective connectivity matrix:

A = [ a_11  0     0    ]
    [ a_21  a_22  0    ]
    [ a_31  a_32  a_33 ].    (26)

Figure 4: Network graph of the system described by the effective connectivity matrix A in Eq. (26). The strength of the directed connection from x_1 to x_2 is set via the parameter a_21. The third neuronal state variable receives an inhibitory influence from x_1 and an excitatory influence from x_2. All three variables have the same negative self-connections, represented by the diagonal elements of A.

As before, the strength of the directed connection from x_1 to x_2 is set via the parameter a_21. The third neuronal state variable receives an inhibitory influence from x_1 (a_31 < 0) and an excitatory influence from x_2 (a_32 > 0). All three state variables have the same negative self-connections, represented by the diagonal elements of A. The resulting cross-spectral density is plotted in Fig. 5, assuming the same canonical HRF defined in Eq. (25) and the same power-law parameters as in Eq. (24). The figure only shows the special case where a_21 = 0, but different values of a_21 would generate different sets of nine plots.
It is instructive to qualitatively compare the simulated cross-spectral density plots in Fig. 5 with the empirical plots in Fig. 6, obtained by fitting the spectral DCM model to a real resting-state dataset (Razi et al., 2015). Note that the empirical plots correspond to a system with four neuronal variables instead of three. Here, a four-region default mode network (DMN) is modelled with the posterior cingulate cortex (PCC), medial prefrontal cortex (mPFC), and bilateral inferior parietal cortices (IPC) as its nodes. We used a fully connected model where each state variable (or node) is connected to every other state variable. The figure is reproduced from Chapter 38 of the SPM12 manual (John Ashburner et al., 2020), which provides the link to the data and a step-by-step tutorial to replicate the DCM specification and estimation results using the SPM graphical interface. Despite the clear differences due to different data and parameter settings, the empirical cross-spectral density plots also feature a single large peak at low frequencies, followed by a decay at larger frequencies (with smaller fluctuations). If the data are too noisy, due to head motion and various physiological artifacts, additional large peaks may appear at higher frequencies. These empirical plots can be visualised using the review function (spm_dcm_fmri_csd_results()) in SPM. Another DCM diagnostics function (spm_dcm_fmri_check()) in SPM also reports the percentage of variance explained (R²), which is a useful performance metric to judge the quality and success of the model fit to the data (see Fig.

Figure 5: Simulated cross-spectral density computed via Eq. (22). The effective connectivity matrix is presented in Eq. (26). This figure only shows the special case where a_21 = 0, but using different values of a_21 would generate different sets of nine plots. Each plot corresponds to a pair of state variables and, if we chose a specific value for the frequency ω, we would obtain a 3 × 3 matrix (the cross-spectral density matrix at that specific frequency). The three plots on the diagonal represent the power spectral densities, which take real values (zero imaginary part). The same canonical hemodynamic response function defined in Eq. (25) is used for all variables in this example, although spectral DCM employs the BOLD balloon model with region-specific biophysical parameters. The power-law parameters of the endogenous fluctuations and of the observation noise are set as in Eq. (24). For simplicity, we have normalised the rows and columns of the cross-spectral density so that integrating over all frequencies generates the correlation matrix instead of the covariance matrix.

Effective and functional connectivity
The main difference between effective and functional connectivity is that the former characterises the interaction between neuronal state variables, while the latter describes statistical dependence between the observed variables (e.g. the BOLD signals in fMRI).Given its logical and biological precedence, effective connectivity can be used to derive functional connectivity.Specifically, the functional connectivity matrix of the observed time series can be obtained by integrating the cross-spectral density over all frequencies.The reason will become clear in the following sections and a mathematical proof will be given in Eq. ( 37).Let's first return to the simulated example depicted in Fig. 4 to understand the relevant implications.The effective connectivity matrix of the system is given in Eq. ( 26).Integrating the cross-spectral density in Fig. 5 over all frequencies gives us the 3 × 3 symmetric correlation matrix R, typically used to quantify the functional connectivity 4 : Crucially, all three pairwise correlations (ρ 21 , ρ 31 , ρ 32 ) explicitly depend on the effective connectivity parameter a 21 , as illustrated in Fig. 7 (for the analytic solutions, see Appendix C).This shows that a local variation in a single effective connectivity parameter in the A matrix can have a global impact across all functional connectivity values in the network.Let's examine each pair to unpack some of the many nuances involved.First, the symmetric nature of the correlation matrix requires that ρ k j = ρ jk , even if a k j a jk .This is a key difference between functional and effective connectivity: only the latter is directed and is able to differentiate between two bilateral connections.
Both correlations ρ21 and ρ31 increase monotonically with a21, but this is not the case for ρ32. Therefore, there is no one-to-one mapping between effective and functional connectivity that holds in general (Park and Friston, 2013). This poses a challenge for spectral DCM because it makes model fitting an ill-posed problem with multiple potential solutions. Technically, this problem is mitigated by using the cross-spectral density, which implicitly contains information about functional connectivity over all lags (we will unpack this below), and by using priors on the solutions implicit in the functional form of the DCM. Although model fitting remains an ill-posed problem, these two additional constraints allow spectral DCM to find better solutions, such that the model can reproduce a larger set of statistical relationships between the observed time series. One way to appreciate the amount of additional information and constraints provided by the cross-spectral density over the zero-lag correlation is to inspect Fig. 5 again. Zero-lag correlation only measures the area under the curves in the nine plots, regardless of their detailed shapes. On the other hand, spectral DCM fits all values taken by the cross-spectral density curves at different frequencies. This requirement narrows the parameter space in a useful way (note that the area under the curve is still reproduced as a consequence).

[4] Usually, integrating the cross-spectral density over all frequencies would produce the covariance matrix; normalising its rows and columns would then generate the correlation matrix as a second step. In Fig. 5, we have already normalised the rows and columns of the cross-spectral density so that integrating over all frequencies generates the correlation matrix directly. The nine integrals produce real numbers because all the imaginary parts are odd functions, so their integrals are equal to zero.
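The route from effective to functional connectivity can be illustrated numerically. For a linear stochastic system dx = Ax dt + dW driven by white noise, the stationary covariance S solves the Lyapunov equation AS + SAᵀ + Q = 0, and normalising S gives the correlation (functional connectivity) matrix. The sketch below uses a hypothetical 3-node A matrix (not the one in Eq. (26)) and works at the neuronal level, ignoring the hemodynamic observation model:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def correlation_from_A(A, Q=None):
    """Stationary correlation matrix of dx = A x dt + dW (white noise).

    Solves the Lyapunov equation A S + S A^T + Q = 0 for the stationary
    covariance S, then normalises S to a correlation matrix.
    """
    n = A.shape[0]
    Q = np.eye(n) if Q is None else Q
    S = solve_continuous_lyapunov(A, -Q)
    d = 1.0 / np.sqrt(np.diag(S))
    return d[:, None] * S * d[None, :]

# Hypothetical effective connectivity: stable self-connections on the
# diagonal; only the a21 entry is varied.
for a21 in (0.0, 1.0):
    A = np.array([[-1.0, 0.0, 0.0],
                  [a21, -1.0, 0.0],
                  [-0.5, 0.5, -1.0]])
    R = correlation_from_A(A)
    print(f"a21 = {a21}:")
    print(np.round(R, 3))
```

Comparing the two printed matrices shows that changing the single parameter a21 alters every off-diagonal correlation, including pairs not directly connected by a21.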
The sign of ρ31 (representing a positive or a negative correlation) either matches or contradicts the sign of the underlying effective connectivity a31 (representing an excitatory or an inhibitory connection), depending on how we set a21. There is a value of a21 such that ρ31 = 0, even though the underlying effective connectivity a31 is negative (i.e., inhibitory). Again, this shows that each pairwise functional connectivity value is a summary statistic (global property) of the system: given a pair of variables j and k, the correlation value ρkj does not only depend on the corresponding effective connectivity parameter akj but, potentially, on all the entries of the A matrix.
While here we have only discussed its dependence on the effective connectivity, functional connectivity also depends on the parameters characterising the endogenous fluctuations, the observation function, and the observation noise. This is because all these parameters appear in the forward model for the cross-spectral density derived in Eq. (22) and, in turn, determine the correlation matrix of the system. In fMRI, specific and reproducible spatial patterns of functional connectivity could simply arise from specific and reproducible variations in the hemodynamic response across brain regions, even in the absence of interregional effective connectivity (Rangaprakash et al., 2018). When comparing groups, differences in BOLD functional connectivity may reflect differences in the vasculature rather than in effective connectivity (or both), e.g. due to ageing or to a neurodegenerative or psychiatric disease (Tsvetanov et al., 2020). This important point is further discussed in (Friston, 2011), where it is shown that the correlation values are also influenced by different levels of observation noise: "one can see a change in correlation by simply changing the signal-to-noise ratio of the data. This can be particularly important when comparing correlations between different groups of subjects. For example, obsessive compulsive patients may have a heart rate variability that differs from normal subjects. This may change the noise in observed hemodynamic responses, even in the absence of neuronal differences or changes in effective connectivity." In fact, separating these confounding factors from the effective connectivity was one of the main motivations for the development of DCM. This is not to say that DCM is without issues: fitting a large model with many parameters, using the limited amount of data available in typical fMRI studies, is not guaranteed to produce optimal estimates. We'll discuss this and other limitations in Section 4.
Functional connectivity faces yet another challenge. Even though we observed that any variation in effective connectivity can propagate through the network and affect the correlation among several brain areas, one could hope that the pair of variables showing the largest functional connectivity change would coincide with the pair evincing the largest effective connectivity change. Alas, this is also not guaranteed. When a21 increases from zero to one, R shows the following changes:

ΔR = [ 0         0.513367  0.379316
       0.513367  0         0.782089
       0.379316  0.782089  0        ]

The largest functional connectivity change is observed in ρ32, not in ρ21 as we might have expected. The fact that the variable pairs showing the largest changes in functional connectivity do not necessarily coincide with the pairs with the largest changes in effective connectivity may hinder the use of functional connectivity as a quick and easy way of selecting regions of interest in DCM studies.
On the other hand, functional connectivity has proven valuable for fingerprinting, that is, identifying an individual based on their brain activity (Finn et al., 2015). It is also useful in studies aiming to differentiate between two groups or conditions (by detecting statistically significant changes in correlation between observed time series), rather than in identifying which effective connections between brain regions underlie that change. Moreover, being a global property of the system, each pairwise functional connectivity value naturally captures higher-order interactions, a topic of growing interest in complex systems, network science, and neuroimaging (Benson et al., 2016; Rosas et al., 2022).
So far, we have only examined the zero-lag correlation matrix because it is widely used to quantify the functional connectivity in neuroimaging. However, spectral DCM doesn't just explain the zero-lag correlation but also the cross-correlations between all variables at all time lags. To understand this point, in the next section we will invoke the elegant Wiener-Khinchin theorem, which links the cross-covariance function to the cross-spectral density.

Autocovariance and cross-covariance functions
Let's briefly revisit the role of the self-connections in the presence of the endogenous fluctuations. As discussed in Section 2.2, a self-connection determines the rate of decay of a variable. It can also be understood as quantifying the memory of a variable, i.e., whether inputs have a short-lived or long-lasting impact on its activity. Large negative self-connections reflect short memory: the variable "forgets" its past quickly and responds promptly to any new inputs. On the contrary, small negative values reflect long memory: the neuronal variable integrates and smooths out the endogenous fluctuations and other inputs, resulting in slower oscillations and lower frequencies.
The concept of a system having memory can be quantified via the autocovariance function. For a deterministic signal z(t), there is a simple intuition: the autocovariance function at a time lag ∆t measures the similarity (i.e., sample covariance) between the time series z(t) and a shifted version of itself, that is, z(t + ∆t). The agreement is perfect when there is no shift (∆t = 0) and it typically decreases with longer time lags, unless the signal is constant or periodic. However, spectral DCM is concerned with stochastic (non-deterministic) processes, as we discussed in Section 2.3 and illustrated in Appendix A. Let's consider the single stochastic neuronal variable x1. At any given time point, x1(t) is not a number but a random variable. If, after a time interval ∆t, the random variable x1(t + ∆t) is still positively correlated with its previous state x1(t), the autocovariance between the two time points would be positive. If, as the time interval further increases, x1 forgets its past state and becomes independent of it, the autocovariance would become zero at that point (for BOLD signals, time intervals or time lags are always multiples of the repetition time). In summary, the autocovariance function measures the covariance between the states of the same stochastic process at two different points in time. For the stationary processes considered here, the autocovariance only depends on the time interval ∆t:

σ11(∆t) = cov(x1(t), x1(t + ∆t)).

Dividing the autocovariance function by σ11(0) would produce the commonly used autocorrelation function, whose values are normalised to the [−1, 1] interval. We can compute the autocovariance explicitly in the simple scenario of Eq.
(10), where x1 is driven by endogenous fluctuations only, i.e., it doesn't receive any other inputs. For simplicity, assume that the endogenous fluctuations were modelled as a white-noise process with αv1 = 1 and βv1 = 0. The equation would then describe an Ornstein-Uhlenbeck process, which has the following autocovariance function (Vatiwutipong and Phewchean, 2019):

σ11(∆t) = −exp(a11 |∆t|) / (2 a11).

This confirms the intuition above: large negative values of a11 correspond to short memory; in this case, the decay is exponential.
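The analytic Ornstein-Uhlenbeck autocovariance can be checked numerically. The sketch below (with a hypothetical decay rate a11 = −1, unit-intensity white noise, and an arbitrary step size) simulates the process via Euler-Maruyama and compares the empirical autocovariance at a few lags against the exponential decay −e^(a11|∆t|)/(2 a11) implied by the stationary OU solution under these assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
a11, dt, n = -1.0, 0.01, 200_000  # hypothetical decay rate and step size

# Euler-Maruyama simulation of dx = a11 x dt + dW (unit-intensity white noise)
x = np.empty(n)
x[0] = 0.0
noise = np.sqrt(dt) * rng.standard_normal(n - 1)
for t in range(n - 1):
    x[t + 1] = x[t] + a11 * x[t] * dt + noise[t]

x = x[n // 10:]      # discard the initial transient
x = x - x.mean()

# Empirical autocovariance vs the analytic OU result at a few lags
for lag_steps in (0, 50, 100):  # lags of 0, 0.5 and 1.0 time units
    # `-0 or None` keeps the full array when the lag is zero
    emp = np.mean(x[:-lag_steps or None] * x[lag_steps:])
    theory = -np.exp(a11 * lag_steps * dt) / (2 * a11)
    print(f"lag {lag_steps * dt:.1f}: empirical {emp:.3f}, theory {theory:.3f}")
```

With a11 = −1 the stationary variance is 1/2 and the autocovariance halves roughly every 0.7 time units, matching the "short memory" reading of a large negative self-connection.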
There is another reason why the autocovariance function is relevant for spectral DCM and, more generally, for DCM applications to electroencephalography (EEG) and magnetoencephalography (MEG). These models and modalities are concerned with the power spectral density of signals, and the Wiener-Khinchin theorem proves that the power spectral density of a stationary process is the Fourier transform of its autocovariance function. In the case of x1, we get

g11(ω) = ∫ σ11(∆t) e^(−iω∆t) d∆t.

This offers an alternative way to compute the power spectral density compared to the definition given in Section 2.3. The two equivalent representations are often portrayed with arrow diagrams in spectral DCM papers (Friston et al., 2014; Razi and Friston, 2016). Here, a similar diagram is provided in Fig. 8 using the Ornstein-Uhlenbeck process as an example.
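The equivalence of the two routes can be verified numerically on a discrete signal, where the periodogram equals the discrete Fourier transform of the circular autocovariance (a sketch, not SPM code):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(512)   # any zero-mean stationary signal
n = len(x)
X = np.fft.fft(x)

# Route 1: definition -- the (scaled) squared magnitude of the Fourier transform
psd_direct = np.abs(X) ** 2 / n

# Route 2: Wiener-Khinchin -- the Fourier transform of the
# circular autocovariance function
autocov = np.array([np.mean(x * np.roll(x, -k)) for k in range(n)])
psd_wk = np.fft.fft(autocov).real

print(np.allclose(psd_direct, psd_wk))  # True
```

For finite discrete signals this is an exact algebraic identity, which is why the two routes agree to machine precision rather than approximately.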
In the presence of multiple state variables, the autocovariance function is replaced by the more general cross-covariance function Σx(∆t) = cov(x(t), x(t + ∆t)).
For any time lag ∆t, the cross-covariance function produces an N × N matrix with entries

σjk(∆t) = cov(xj(t), xk(t + ∆t)).

Each diagonal element in this matrix is the autocovariance function of a single neuronal variable, introduced in Eq. (29). The off-diagonal elements are the cross-covariance functions between different state variables. For deterministic processes, the visual intuition is analogous to the autocovariance case: for each pair of variables xj and xk, the cross-covariance function σjk(∆t) measures the similarity between the time series xj(t) and a version of xk shifted by a time lag ∆t. The analogy with memory can then be used to extend the intuition to stochastic processes. When there is no time lag (∆t = 0), the cross-covariance function produces the covariance matrix Σx(0), which is a symmetric matrix because σjk(0) = σkj(0). After normalisation, this yields the correlation matrix that is used to quantify the functional connectivity in neuroimaging.

Cross spectral density and functional connectivity
We can again invoke the Wiener-Khinchin theorem and obtain the cross-spectral density as the Fourier transform of the cross-covariance function:

Gx(ω) = ∫ Σx(∆t) e^(−iω∆t) d∆t.

We previously noted that the off-diagonal elements of the cross-spectral density matrix are complex numbers and that their definition provided in Eq. (9) doesn't lend itself to an intuitive understanding. However, now armed with the Wiener-Khinchin theorem, we gain another perspective. Each off-diagonal element is equivalent to the Fourier transform of the cross-covariance function between two variables. By modelling the cross-spectral density, spectral DCM doesn't just capture the zero-lag correlations but also the cross-correlations between all variables at all time lags. This is true for both the hidden neuronal variables and the observed (e.g. BOLD) variables. The equivalent of Eq. (34) for the observed variables is:

Gy(ω) = ∫ Σy(∆t) e^(−iω∆t) d∆t.

We opened Section 3 by stating that the functional connectivity matrix can be obtained by integrating the cross-spectral density over all frequencies. To understand why, we need to invert Eq. (35) using the (inverse) Fourier transform

Σy(∆t) = (1/2π) ∫ Gy(ω) e^(iω∆t) dω,

which retrieves the cross-covariance function from the cross-spectral density Gy(ω). When ∆t = 0, we have the desired result

Σy(0) = (1/2π) ∫ Gy(ω) dω.        (37)

The correlation matrix R is finally obtained by normalising the covariance matrix.
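The discrete analogue of this result can be checked directly: averaging a cross-periodogram over all frequencies recovers the zero-lag covariance (Parseval's theorem). The two signals below are hypothetical, constructed with some shared structure:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4096
# Two hypothetical zero-mean signals with a common component
s = rng.standard_normal(n)
x = s + 0.5 * rng.standard_normal(n)
y = 0.8 * s + 0.5 * rng.standard_normal(n)

# Cross-spectral density estimate (cross-periodogram)
X, Y = np.fft.fft(x - x.mean()), np.fft.fft(y - y.mean())
csd = X * np.conj(Y) / n

# Averaging the cross-spectral density over all frequencies recovers
# the zero-lag covariance -- the discrete analogue of integrating the
# cross-spectral density over all frequencies
cov_from_csd = np.mean(csd).real
cov_direct = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_from_csd, cov_direct)  # the two values match
```

Normalising the covariances obtained this way (dividing by the square roots of the variances) then yields the Pearson correlations used for functional connectivity.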

Assumptions and limitations
State-space modelling
The foundational hypothesis of the DCM framework is that the system of interest can be modelled using a state-space approach. This formulation separates the equations describing the temporal evolution of unobserved variables (e.g. the neuronal activity) from those describing the observed variables (e.g. the BOLD signal). Such a separation is important because some of the model parameters have a direct interpretation in terms of effective connectivity between unobserved neuronal populations rather than statistical dependency between observed variables. A subtle and important consequence of the distinction between functional and effective connectivity is that one can only estimate recurrent or self-inhibition using a state-space model. This is because the correlation of a time series with itself is always one (and the variance is always positive). Assessing self-connectivity in terms of excitability or disinhibition of a node can be empirically important (e.g. in estimating changes in excitation-inhibition balance or condition-specific changes in the 'gain' of a particular source or region).
Continuous-time formulation
Like most DCM approaches, spectral DCM treats time as a continuous quantity. Representing time as a sequence of discrete steps may seem more natural, especially for fMRI, where the observations are recorded at evenly spaced time intervals with a relatively low sampling rate. However, DCM also models the neuronal activity, which unfolds at a much faster time scale and in an asynchronous manner. These two features can be naturally modelled using differential equations in continuous time. The discrete formulation usually converges to the continuous one with faster acquisition times, as is the case for EEG and MEG.

Separation of time scales and macroscopic modelling
Spectral DCM assumes that a single macroscopic neuronal variable can capture the essential dynamical properties of a population of neurons. This is not just a pragmatic way to avoid billions of equations representing individual neurons.
It is an approach based on the separation of time scales that has a long history in dynamical systems theory (Carr, 1981; Haken, 1983). The assumption is that the neuronal activity can be separated into slow and fast modes: the fast modes decay quickly, so that the long-term behaviour of the system can be described by a few slow modes, or even a single one. Mathematically, the assumption (often satisfied in real systems) is that only a small number of eigenvalues of the Jacobian are near zero, while the rest are large and negative (Friston et al., 2011, 2021). Macroscopic modelling also relies on the mean-field assumption that the dynamics of one region are determined by the mean activity in another (Deco et al., 2008). Since the mean activity is dominated by the slow modes, it is possible to build a compact macroscopic model where only the slow modes are communicated among brain regions, whereas the fast endogenous fluctuations only affect the local activity. This is precisely the neuronal model in Eq. (1). It is important to note that, despite being grounded in dynamical and complex systems theory, this model is an abstraction. Biological details are necessarily omitted to enable the analytic treatment and faster numerical computations. A detailed critical review of the biophysical and statistical foundations of DCM is provided in (Daunizeau et al., 2011).
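The eigenvalue condition can be made concrete with a small numerical check; the Jacobian below is hypothetical, chosen so that one mode is slow and the others are strongly damped:

```python
import numpy as np

# Hypothetical 3-node Jacobian (not from the paper) with a separation of
# time scales: one eigenvalue near zero, two large and negative.
J = np.array([[-0.1, 0.05, 0.0],
              [0.2, -10.0, 1.0],
              [0.0, 1.0, -12.0]])

eigvals = np.sort(np.linalg.eigvals(J).real)
print(eigvals)          # two large negative eigenvalues, one near zero
print(-1.0 / eigvals)   # characteristic decay times: the slow mode dominates
```

The long-term behaviour of such a system is dominated by the slow mode, whose decay time is roughly two orders of magnitude longer than the fast modes here; this is the sense in which a few slow modes suffice for the macroscopic description.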
Stationarity
Among the DCM variants, spectral DCM has the most direct conceptual and analytical links to functional connectivity, which we have examined in Section 3. Both methods can be read as assuming that the observed processes are weakly stationary, i.e., that their covariance remains unchanged over the length of the experiment (Liégeois et al., 2017). In the case of functional connectivity, this assumption arises because the covariance is used as a proxy for connectivity, after being normalised to obtain the correlation matrix. Therefore, assuming weak stationarity is equivalent to assuming that the functional connectivity remains unchanged. Similarly, in spectral DCM, the stationarity assumption allows one to interpret the effective connectivity as remaining unchanged over the length of the time series (Moran et al., 2009; Friston et al., 2014). This is because the effective connectivity is inferred from the observed cross-spectral density, i.e., the Fourier transform of the cross-covariance function (see Section 3.2), which remains unchanged under stationarity assumptions. However, in contrast to functional connectivity, spectral DCM only treats the cross-covariance of the observed time series as a means to an end, where the end is to infer the effective connectivity between neuronal variables (and various other model parameters). In other words, spectral DCM looks "under the hood" of functional connectivity and beyond the observed variables.
The stationarity assumption can certainly be challenged, and it is generally untenable in task experiments; nonetheless, it is widely adopted in resting-state studies. Stationarity is not just typically assumed in functional connectivity and spectral DCM, but also in Granger causality analysis (Granger, 1969; Seth et al., 2013), transfer entropy (Bossomaier et al., 2016), and autoregressive modelling (Liégeois et al., 2017). That said, Granger causality and information-theoretic methods have been adapted for non-stationary processes (Dhamala et al., 2008; Lizier et al., 2008; Gómez-Herrero et al., 2015; Wollstadt et al., 2014; Paluš, 2017), and time-varying approaches to functional connectivity analysis have been rapidly gaining popularity (Lurie et al., 2020; Novelli and Razi, 2022). A time-varying extension of spectral DCM has also been developed (Park et al., 2018). All of these methods relax the stationarity assumption and allow the statistical properties of the system to change over the course of the experiment. Practically, in DCM, one appeals to something called an adiabatic approximation: the effective connectivity is assumed constant over a short timescale but can change over longer timescales. This means that one can apply spectral DCM to short segments of data and then examine (or model) slow fluctuations in effective connectivity (Jafarian et al., 2021; Rosch et al., 2019; Zarghami and Friston, 2020).
Linearity
The second assumption that spectral DCM (partially) shares with functional connectivity is linearity, although nonlinear extensions of both methods exist (Stephan et al., 2008; Kraskov et al., 2004). In DCM for fMRI, the use of a linear random differential equation to model the neuronal activity is motivated by the separation of time scales, whereby the neuronal variables represent the slow modes of the system, which are assumed to be linearly coupled, while the endogenous random fluctuations represent the fast modes (Friston et al., 2011). Spectral DCM also extends the linearity assumption to the observation function, which can be a nonlinear function of time and frequency (as in Eq. (25)) but is linearly convolved with the neuronal activity (Eq. (13)). In Section 2.4, we observed that deterministic linear models have a limited scope, since they can only describe a system that converges to equilibrium or generates a sequence of identical oscillations. Adding stochastic terms to the linear differential equations allows for a richer repertoire of behaviours.[5]

Stochastic fluctuations
The addition of a stochastic term to a dynamical system is traditionally used to model noise, often assumed to be white and temporally uncorrelated. The underlying assumption is that the noise is due to physical processes operating at a much faster time scale than the state variables, e.g. microscopic thermal fluctuations. Spectral DCM relaxes this assumption and allows the endogenous fluctuations to be temporally correlated, with a spectrum following a power-law decay in the frequency domain (see Section 2.4 for how this form includes white noise as a special case). This is easily motivated by noting that the endogenous fluctuations are themselves generated by dynamical processes within the source or region of interest. The ensuing temporal autocorrelation makes the endogenous fluctuations differentiable and smooth, in line with their characterisation as mixtures of fast modes of a dynamical system (Friston et al., 2011).
Gaussianity
In line with most DCM approaches, spectral DCM assumes a Gaussian distribution for the prior over model parameters (the non-negative parameters are transformed using the natural logarithm and are assumed to follow a log-Normal distribution). This enables a fast Bayesian inversion scheme called variational Laplace (Zeidman et al., 2022; Friston et al., 2007). As for most of the assumptions listed in this section, the Gaussian hypothesis can also be relaxed, and other (albeit more computationally intensive) inversion schemes can be used instead, e.g. Markov chain Monte Carlo methods (Friston et al., 2007; Aponte et al., 2022; Xie et al., 2023).
Many-to-one mapping
There is no one-to-one mapping between effective and functional connectivity (or cross-spectral density) that holds in general. This is a challenge for spectral DCM because it makes model inversion an ill-posed problem with multiple potential solutions, in the absence of any constraints on the way data are generated.
As with all ill-posed problems, this is addressed by placing prior constraints on the explanations, in the form of a model and prior densities over model parameters. When one does not know which priors to use, a weighted average of plausible priors is often performed in DCM analysis using Bayesian model averaging (Hoeting et al., 1999), where each set of priors corresponds to a separate model. A related challenge is that one cannot always use functional connectivity to identify the regions of interest to study using DCM. As we saw in Section 3, not even the regions showing the largest differences in correlation are guaranteed to coincide with the regions with a change in effective connectivity.
Computational complexity
Despite the simplifying assumptions mentioned above, the computational complexity of spectral DCM limits the number of brain regions that can be studied in a reasonable time. Selecting the regions of interest requires more upfront work than in functional connectivity analysis, which can be quickly performed across the whole brain. Given the large size of the parameter space, the specification of the model has been noted as a conceptual issue for DCM in the past (Friston et al., 2013). That said, a more recent theoretical advance now enables the exploration of a large model space using Bayesian model reduction (Seghier and Friston, 2013). Alternatively, one can introduce further assumptions to winnow the parameter space. Using functional connectivity to place prior constraints on the eigenvectors of the effective connectivity matrix enables spectral DCM analyses with dozens of brain regions (Razi et al., 2017). In fMRI, ignoring the spatial variability of the hemodynamics and removing the separation between hidden and observed variables leads to the regression DCM scheme, which can analyse hundreds of regions in minutes (Frässle et al., 2021). However, this method forgoes the state-space formulation and can be understood as a Bayesian multivariate regression in the frequency domain.

Figure 1: The distinctive feature of spectral Dynamic Causal Modelling (DCM) is the focus on modelling the measured cross-spectral density (top right), which is a second-order summary statistic of the time-series data. This is closely related to Pearson's correlation (bottom left), another second-order statistic and the most widely used measure of functional connectivity in neuroimaging. In fact, both can be derived from the cross-covariance function (top left). The cross-spectral density is obtained directly via the Fourier transform (horizontal arrow). As such, it is the equivalent representation of the cross-covariance function in the frequency domain instead of the time domain. The correlation is obtained in two steps (vertical arrows). We first compute the covariance as a special case of the cross-covariance function between two time series, by setting the time lag between them to zero. Finally, we normalise the covariance such that its values are restricted to the [−1, 1] interval to obtain the zero-lag correlation. For the mathematical relationships among these quantities, we refer readers to (Friston et al., 2014, Fig. 1).

Figure 5: Simulated BOLD cross-spectral density of the system in Fig. 4, generated via the forward model in Eq. (22). The effective connectivity matrix is presented in Eq. (26). This figure only shows the special case where a21 = 0, but using different values of a21 would generate different sets of nine plots. Each plot corresponds to a pair of state variables and, if we chose a specific value for the frequency ω, we would obtain a 3 × 3 matrix (the cross-spectral density matrix at that specific frequency). The three plots on the diagonal represent the power spectral densities, which take real values (zero imaginary part). The same canonical hemodynamic response function defined in Eq. (25) is used for all variables in this example, although spectral DCM employs the BOLD balloon model with region-specific biophysical parameters. The power-law parameters of the endogenous fluctuations and of the observation noise are set as in Eq. (24). For simplicity, we have normalised the rows and columns of the cross-spectral density so that integrating over all frequencies generates the correlation matrix instead of the covariance matrix.

Figure 6: BOLD cross-spectral density obtained via spectral DCM analysis of a real resting-state fMRI dataset. The dashed lines represent the predicted cross-spectral densities and the solid lines the observed ones. Reproduced with permission from Chapter 38 of the SPM12 manual (Ashburner et al., 2020).

Figure 7: Simulated BOLD functional connectivity of the system in Fig. 4, as a function of the effective connectivity parameter a21. The three curves corresponding to the correlation values are obtained analytically by integrating the cross-spectral density shown in Fig. 5 over all frequencies. Changing a single effective connectivity parameter, a21, has a global impact across all pairwise functional connectivity values in the network. Both correlations ρ21 and ρ31 increase monotonically with a21, but this is not the case for ρ32. Therefore, there is no one-to-one mapping between effective and functional connectivity that holds in general.

Figure 8: The two equivalent representations of the power spectral density. Starting from the Ornstein-Uhlenbeck process as an example (top left), the power spectral density can either be computed as the variance of the Fourier transform (top right, then bottom right), or as the Fourier transform of the autocovariance function (bottom left, then bottom right). The former uses the definition of the power spectral density; the latter uses the Wiener-Khinchin theorem. In the Ornstein-Uhlenbeck process, the endogenous fluctuations v1(t) are modelled as a white-noise process. Their Fourier transform, denoted as V1(ω), is a Normal random variable for each frequency ω.

Figure A.9: Illustration of an ensemble of realisations of an Ornstein-Uhlenbeck process. Each curve is a different realisation, independent of the others. The process is sliced at two time points, t = 1 and t = 2. Although the individual realisations (curves) take different values at these two time points, they collectively preserve the same Gaussian distribution. This property is known as stationarity and can be assumed when using spectral DCM. Non-stationary processes have different distributions at different time points and would require different models that allow for time-varying parameters, e.g. stochastic DCM or adiabatic DCM.
Cross-spectral density of the BOLD signal as a function of the effective connectivity between two neuronal state variables (a21). In this first example, increasingly large and positive values of a21 generate increasingly large and positive cross-spectral density amplitudes, while negative values generate negative amplitudes. However, this monotonic relationship doesn't hold in more realistic scenarios. When a21 = 0, the two neuronal state variables are independent and the cross-spectral density is zero at all frequencies.