## Abstract

This letter proposes an algorithm for linear whitening that minimizes the mean squared error between the original and whitened data without using the truncated eigendecomposition (ED) of the covariance matrix of the original data. This algorithm uses Lanczos vectors to accurately approximate the major eigenvectors and eigenvalues of the covariance matrix of the original data. The major advantage of the proposed whitening approach is its low computational cost when compared with that of the truncated ED. This gain comes without sacrificing accuracy, as illustrated with an experiment of whitening a high-dimensional fMRI data set.

## 1. Introduction

The univariate and multivariate linear regression models are simple and widely used parametric models for fMRI data analysis (Ashby, 2011; Lazar, 2010). In these models, the first (deterministic) part characterizes the activation response, in a single voxel for the univariate model or a group of voxels for the multivariate model, together with the baseline drift, whereas the second (stochastic) part of the model characterizes the noise from both physical and physiological processes (Ashby, 2011; Lazar, 2010).

On the one hand, the univariate linear regression model uses a hemodynamic response function (HRF) as a parameter vector, with extra parameters for the drift and a univariate temporally correlated noise, to model the fMRI time series at each voxel and infer task-related activations with estimates of the level of significance (Ardekani, Kershaw, Kashikura, & Kanno, 1999; Seghouane & Shah, 2012). As a consequence, the sensitivity of hypothesis-testing methods based on this model across a group of voxels is crucially dependent on the group correction postprocessing step (Genovese, Lazar, & Nichols, 2002).

On the other hand, multivariate linear regression models have been used mainly in data-driven methods (Esposito, Formisano, Seifritz, Goebel, Morrone, Tedeschi, & Di Salle, 2002; Calhoun, Adali, Pearlson, & Pekar, 2001b). They allow the exploitation of the relationships among voxels. Among the multivariate data-driven techniques used in fMRI, independent component analysis (ICA; Hyvärinen, Karhunen, & Oja, 2001; Jolliffe, 2002) has been widely applied to find components that are spatially independent (sICA; Esposito et al., 2002) or temporally independent (tICA; Calhoun, Adali, Pearlson, & Pekar, 2001a).

A common preprocessing step used with both models is prewhitening. In the case of the univariate model, it allows the generation of the best linear unbiased estimates of the parameter vector and therefore more accurate activation tests. In the ICA case, it has the advantage of reducing the complexity and improving the convergence of the ICA algorithm (Hyvärinen et al., 2001).

Prewhitening is obtained by multiplying the fMRI data by the square root of the inverse covariance matrix, which can be obtained from the Cholesky factorization or the eigendecomposition (ED) of the inverse covariance matrix. As is well known, however, the Cholesky factorization and the eigendecomposition are computationally expensive, scaling as *O*(*p*^{3}) for a dense covariance matrix. Although this is not a major inconvenience when dealing with univariate linear regression modeling of fMRI time series or sICA, it can be a major problem for tICA when the number of voxels considered is very large. In fMRI analysis, the spatial resolution is often at least 64 × 64 voxels per slice over 32 slices, resulting in 131,072 voxels, so prewhitening requires substantial computational work for the eigendecomposition. To overcome this inconvenience, an alternative approach for prewhitening large-dimensional multivariate vectors is proposed in this letter.

The proposed subspace method constructs an approximated whitened vector based on Krylov sequences of subspaces reachable from the unwhitened vector. The goal of the proposed algorithm is the same as the approach based on ED: to preserve the quality of the resulting product between the inverse square root of the covariance matrix and the data vector to be whitened in the major eigendirections of the covariance matrix. A Lanczos-based approach is used to achieve this goal by using a relatively small number *r* of Lanczos vectors. The advantage of the proposed approach is its computational complexity, which is *O*(*rp*^{2}) with *r* ≪ *p*, compared to the *O*(*p*^{3}) associated with the ED-based approach. Like the Cholesky factorization or the eigendecomposition, the proposed method involves an eigendecomposition (factorization), but it is implemented on a much smaller matrix, resulting in a computationally much cheaper whitening method than the *O*(*p*^{3}) alternatives. The proposed method is particularly appealing when a reduced-rank approximation of the covariance matrix is used for prewhitening.

The rest of the letter is organized as follows. The prewhitening preprocessing step of the multivariate temporal fMRI time series is reviewed in section 2. The proposed prewhitening algorithm is described in section 3. The choice of the dimension of the Krylov subspace is discussed in section 4. The performance of the proposed whitening method on real fMRI data is illustrated in section 5. Concluding remarks are given in section 6.

## 2. Prewhitening and the Multivariate fMRI Temporal Model

Consider an fMRI data set arranged as a matrix of *n* time samples across *p* voxels, generated as a linear mixture of *m* temporally independent sources of length *n*. In the prewhitening step, the mixed vector **y** is processed by

**z** = *V*^{−1/2}(**y** − **μ**),   (2.2)

where **μ** and *V* are the mean vector and covariance matrix of **y**, such that the covariance matrix of **z** is the identity. In classic prewhitening, **μ** and *V* are estimated in a batch way from the *n* available samples by **μ̂** = (1/*n*) ∑_{i=1}^{n} **y**_{i} and *V̂* = (1/*n*) ∑_{i=1}^{n} (**y**_{i} − **μ̂**)(**y**_{i} − **μ̂**)^{⊤}.
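A minimal NumPy sketch of this classic batch prewhitening on synthetic data (the sizes and the synthetic generator are illustrative assumptions, not the letter's fMRI setting):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n samples of a p-dimensional correlated signal.
n, p = 500, 20
A = rng.standard_normal((p, p))
Y = rng.standard_normal((n, p)) @ A.T       # rows are the observations y_i

# Batch estimates of the mean vector and covariance matrix.
mu = Y.mean(axis=0)
Yc = Y - mu
V = (Yc.T @ Yc) / n

# Whitening z = V^{-1/2} (y - mu), with V^{-1/2} = U S^{-1/2} U^T.
S, U = np.linalg.eigh(V)
V_inv_sqrt = U @ np.diag(1.0 / np.sqrt(S)) @ U.T
Z = Yc @ V_inv_sqrt

# The whitened samples have identity covariance (up to rounding).
cov_Z = (Z.T @ Z) / n
print(np.allclose(cov_Z, np.eye(p), atol=1e-8))   # True
```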

The inverse square root of *V*, assumed full rank and positive definite, is usually obtained from the eigendecomposition as

*V*^{−1/2} = *U S*^{−1/2}*U*^{⊤},   (2.3)

where *S* is the diagonal matrix of eigenvalues and *U* is the unitary matrix of eigenvectors. However, in many scientific fields, including tICA in fMRI, the number of samples *n* is much smaller than the number of variables *p*: although the data are presented in a high-dimensional space, they actually have a much lower intrinsic dimensionality *k* ≪ *p*. In this case, *V* is generally rank deficient, and the best rank-*k* approximation of *V* in the 2-norm (obtained by truncating the ED), *V*_{k} = *U*_{k}*S*_{k}*U*_{k}^{⊤}, is used instead of *V* in equation 2.2, giving

**z** = *U*_{k}*S*_{k}^{−1/2}*U*_{k}^{⊤}(**y** − **μ**),   (2.4)

where *U*_{k} consists of the first *k* columns of *U* and *S*_{k} is the *k*th principal submatrix of *S*. When a very large number of voxels is considered, prewhitening is computationally expensive since it requires a computational complexity of order *O*(*np*^{2}) for the covariance matrix and *O*(*p*^{3}) for the eigendecomposition. The usefulness of the latter is questionable since only the major eigenvalues and eigenvectors of *V* are needed in equation 2.4. Furthermore, for a centered vector **y**, the least squares residual associated with the whitened vector of equation 2.4 is

||**y** − *V*^{1/2}**z**||_{2}^{2} = ∑_{i=k+1}^{p} (**u**_{i}^{⊤}**y**)^{2}.   (2.5)

Relation 2.5 shows that the order in which the truncated ED includes components is unnatural: based on the least squares residual, the components should be chosen using the magnitudes of the components of *U*^{⊤}**y**. The singular values (principal components) are ordered in such a way that they reflect the directions of the largest variation in the covariance matrix *V*. However, the purpose of the whitening procedure is to reduce the least squares residual using a smaller-dimension projection subspace; the choice of the components should therefore be made according to the components of the vector *U*^{⊤}**y**.
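In the rank-deficient regime (*n* ≪ *p*), the truncated-ED whitening of equation 2.4 can be sketched as follows; the intrinsic dimension *k* is taken as known here, which is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Far fewer samples than variables: the sample covariance is rank deficient.
n, p, k = 50, 200, 20
Y = rng.standard_normal((n, k)) @ rng.standard_normal((k, p))   # intrinsic dim k
mu = Y.mean(axis=0)
Yc = Y - mu
V = (Yc.T @ Yc) / n

# Truncated ED: keep the k largest eigenpairs, V_k = U_k S_k U_k^T.
S, U = np.linalg.eigh(V)            # eigenvalues in ascending order
U_k, S_k = U[:, -k:], S[-k:]

# Whitening restricted to the dominant k-dimensional subspace (equation 2.4).
Z = Yc @ U_k @ np.diag(1.0 / np.sqrt(S_k))

cov_Z = (Z.T @ Z) / n
print(np.allclose(cov_Z, np.eye(k), atol=1e-6))   # True
```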

In what follows, based on Krylov sequences of subspaces reachable from the vector **y**, we propose a computationally efficient approach for generating the major eigenvectors and eigenvalues of *V* without computing its eigendecomposition. An eigendecomposition is still necessary but is applied to a much smaller matrix of dimension *r* × *r*, with *r* ≪ *p*.

## 3. Krylov Subspace Approximations for Prewhitening

Given the covariance matrix *V* and the vector **y**, a Krylov subspace is a subspace of the form

*K*_{r}(*V*, **y**) = span{**y**, *V***y**, *V*^{2}**y**, …, *V*^{r−1}**y**}.

Increasing the dimension *r* allows us to reach the entire subspace of the pair (*V*, **y**); here *r* is chosen such that *r* ≪ *p*. In our computation, we use not only the Krylov subspace *K*_{r}(*V*, **y**) but also an orthonormal basis for it. The Lanczos algorithm builds this orthonormal basis.

The Lanczos algorithm generates *r* vectors **q**_{i} that form an orthonormal basis of the Krylov subspace *K*_{r}(*V*, **y**). These vectors satisfy the three-term recurrence

β_{i+1}**q**_{i+1} = *V***q**_{i} − α_{i}**q**_{i} − β_{i}**q**_{i−1},

with **q**_{1} = **y**/||**y**||_{2}. The coefficients α_{i} and β_{i+1} are computed so as to ensure that **q**_{i+1} ⊥ **q**_{i} and ||**q**_{i+1}||_{2} = 1. The computational cost of this algorithm is *O*(*rp*^{2}) for a dense matrix *V*. If *Q*_{r} = [**q**_{1}, …, **q**_{r}], then an important equality that results from the algorithm is

*V Q*_{r} = *Q*_{r}*T*_{r} + β_{r+1}**q**_{r+1}**e**_{r}^{⊤},   (3.1)

where *T*_{r} is a tridiagonal matrix. The eigenvalues of *T*_{r} are called Ritz values, and for an eigenvector **w** of *T*_{r}, *Q*_{r}**w** defines the Ritz vector. As *r* increases, more Ritz values and vectors converge toward the eigenvalues and eigenvectors of *V* (Golub & Van Loan, 1996; Saad, 1992; Chen & Saad, 2009).
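A minimal NumPy sketch of this recurrence (with full reorthogonalization added for numerical stability, a practical detail beyond the three-term recurrence itself; all sizes are illustrative):

```python
import numpy as np

def lanczos(V, y, r):
    """r steps of the Lanczos recurrence on the symmetric matrix V.

    Returns Q (p x r), an orthonormal basis of K_r(V, y), and the
    tridiagonal matrix T_r = Q_r^T V Q_r of equation 3.1.
    """
    p = len(y)
    Q = np.zeros((p, r))
    alpha = np.zeros(r)
    beta = np.zeros(r)
    q = y / np.linalg.norm(y)          # q_1 = y / ||y||_2
    for i in range(r):
        Q[:, i] = q
        w = V @ q
        alpha[i] = q @ w               # alpha_i = q_i^T V q_i
        # Orthogonalize against all previous vectors (full reorthogonalization;
        # exact arithmetic would only need q_i and q_{i-1}).
        w -= Q[:, : i + 1] @ (Q[:, : i + 1].T @ w)
        beta[i] = np.linalg.norm(w)    # beta_{i+1} in the text's indexing
        q = w / beta[i]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q, T

rng = np.random.default_rng(0)
p, r = 100, 15
M = rng.standard_normal((p, p))
V = M @ M.T / p                        # symmetric positive definite
y = rng.standard_normal(p)

Q, T = lanczos(V, y, r)
print(np.allclose(Q.T @ Q, np.eye(r), atol=1e-10))   # orthonormal basis: True
print(np.allclose(Q.T @ V @ Q, T, atol=1e-10))       # tridiagonal restriction: True
# The Ritz values (eigenvalues of T_r) lie inside the spectrum of V.
print(np.linalg.eigvalsh(T)[-1] <= np.linalg.eigvalsh(V)[-1] + 1e-10)  # True
```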

The Krylov subspace *K*_{r}(*V*, **y**), reached from a small number *r* of Lanczos vectors, is used for the construction of an approximation of **z**. The eigendecomposition of *T*_{r}, defined in equation 3.1 and obtained from *K*_{r}(*V*, **y**), is easily computed in practice because *r* is assumed to be small:

*T*_{r} = *W D W*^{⊤}.

The eigenvalues in *D* are known to approximate the extremal eigenvalues of *V*, and the columns of *Q*_{r}*W* are the approximate eigenvectors.

We seek an approximation **ẑ** of **z** such that the mean squared error (MSE) *E*||**z** − **ẑ**||_{2}^{2} (equation 3.3; Eldar & Oppenheim, 2005) is minimum subject to the whiteness constraint. Assuming the vector **ẑ** belongs to *K*_{r}(*V*, **y**), it can be written as a combination of the basis vectors for some appropriate coefficient vector. The orthonormal basis *Q*_{r} generated by the Lanczos algorithm is used to represent this vector: in the basis *Q*_{r}, **ẑ** = *Q*_{r}**t** for some appropriate **t**. Since *U* is orthogonal, the MSE of equation 3.3 can be rewritten componentwise in the eigenbasis of *V* (equation 3.7), where the *i*th component is *N*(0, *s*_{i}) distributed. The minimum of equation 3.7 is obtained for the maximum of the last term of its right-hand side. Using the Cauchy-Schwarz inequality, this maximum is attained when **t** is aligned with the corresponding projection (equations 3.8 to 3.10). By replacing in equation 3.7 and minimizing with respect to the remaining scale factor (equations 3.11 and 3.12), **ẑ** becomes

**ẑ** = ||**y**||_{2} *Q*_{r}*T*_{r}^{−1/2}**e**_{1},   (3.13)

where *T*_{r}^{−1/2} = *W D*^{−1/2}*W*^{⊤}, **e**_{1} is the first column of the identity matrix, and the first vector of the basis *Q*_{r} is **q**_{1} = **y**/||**y**||_{2}. The approximate whitened vector is thus based on computing the inverse square root of a matrix of substantially smaller dimension *r* × *r*, which is inexpensive to compute exactly.

The above procedure is captured in algorithm 1.

For any value of *r*, equation 3.13 gives the orthogonal projection of the exact solution, equation 2.2, onto the Krylov subspace. When projected onto the subspace spanned by the major eigenvectors of *V*, this approximation is very close to the projection of **z** if *r* is sufficiently large.

The truncated-ED whitening of equation 2.4 can be viewed as a filtering estimator with a shrinkage function *f*_{1}(·) acting on the spectrum of *V* (equation 3.16). Since *f*_{1}(·) does not depend on **y**, that is, the estimator is linear in **y**, any discarded direction will increase the bias in the *i*th direction. The proposed whitening procedure in equation 3.11 is a filtering estimator as well. Its shrinkage function *f*_{2} is closely related to the Ritz values and vectors and is given by Van der Sluis and Van der Vorst (1997). In comparison to equation 3.16, *f*_{2} is nonlinear in **y**: the values *f*_{2}(*s*_{i}) depend on the eigenvalues of *T*_{r}, and *T*_{r} in turn depends on **y** in a complicated manner.

The function *f*_{2} is related to equation 2.5 through the Krylov subspaces *K*_{r}(*V*, **y**), which are subspaces of the space spanned by the vectors **u**_{i} for which **u**_{i}^{⊤}**y** ≠ 0.

*Q*_{r} is obtained through a Gram-Schmidt orthonormalization (GSO) of *K*_{r}. The values of *f*_{2}(*s*_{i}) automatically take into account both equation 2.5 and the variances in *V*, whereas *f*_{1} extracts only the directions **u**_{i} that explain the maximum variance in *V*. Examples of *f*_{2} for different values of *r* are presented in Figure 1 for a symmetric matrix of dimension 10.

The approximation error projected on **u**_{j}, the *j*th eigenvector of *V*, admits a bound in which *m* = *r* − *j*, the constants *c*_{j} > 0 are independent of *r*, and *T*_{m}(·) is the Chebyshev polynomial of degree *m*. The derivation of this inequality is given in the appendix. Since Chebyshev polynomials grow geometrically with their degree, the bound implies that the projected approximation error converges geometrically in *r*.

## 4. Choosing the Dimension r

Similar to the problem of determining the most appropriate dimension *k* when using equation 2.4, the dimension *r* of the Krylov subspace *K*_{r}(*V*, **y**) plays a crucial role in finding an accurate approximation of the major eigenvectors and eigenvalues of *V*. This number *r* also corresponds to the number of steps in algorithm 1. It can be chosen beforehand using a model selection criterion (Seghouane & Bekara, 2004; Seghouane & Cichocki, 2007) or determined by a convergence rule tested at the end of each iteration of the algorithm. In this latter case, the whitened vector is computed at the end of each step *i* of the algorithm.
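The convergence-rule variant can be sketched as follows: the whitened vector is recomputed at the end of each Lanczos step, and the iteration stops when its relative change falls below a tolerance (the tolerance, sizes, and synthetic covariance are illustrative assumptions, not algorithm 1 verbatim):

```python
import numpy as np

def whiten_adaptive(V, y, tol=1e-6, r_max=100):
    """Grow the Krylov subspace until the whitened vector stabilizes.

    After each Lanczos step i, the whitened vector z_i (equation 3.13,
    restricted to the current subspace) is recomputed; iteration stops
    when its relative change falls below tol.
    """
    p = len(y)
    ny = np.linalg.norm(y)
    Q = np.zeros((p, r_max))
    alpha, beta = [], []
    q = y / ny
    z_prev = None
    for i in range(r_max):
        Q[:, i] = q
        w = V @ q
        alpha.append(q @ w)
        w -= Q[:, : i + 1] @ (Q[:, : i + 1].T @ w)    # full reorthogonalization
        beta.append(np.linalg.norm(w))
        q = w / beta[-1]
        # Whitened vector from the current (i + 1)-dimensional subspace.
        T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
        D, W = np.linalg.eigh(T)
        z = ny * (Q[:, : i + 1] @ (W @ (W[0] / np.sqrt(D))))
        if z_prev is not None and np.linalg.norm(z - z_prev) <= tol * np.linalg.norm(z_prev):
            return z, i + 1                           # converged after i + 1 steps
        z_prev = z
    return z_prev, r_max

rng = np.random.default_rng(2)
p = 150
M = rng.standard_normal((p, p))
V = M @ M.T / p + 0.5 * np.eye(p)       # well-conditioned synthetic covariance
y = rng.standard_normal(p)

z_hat, r_used = whiten_adaptive(V, y)
S, U = np.linalg.eigh(V)
z_exact = U @ ((U.T @ y) / np.sqrt(S))
print(r_used, np.linalg.norm(z_hat - z_exact) / np.linalg.norm(z_exact))
```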

## 5. Application to fMRI Time Series Whitening

The data used to assess the performance of the proposed prewhitening method were generated from an fMRI experiment performed to investigate an event-related right-finger-tapping task. A 3.0 T functional MRI system was used to acquire the whole-brain BOLD/EPI images. Each acquisition consisted of 35 contiguous slices (3.44 mm × 3.44 mm × 4 mm voxels). The data were recorded for 650 s with TR = 2 s. First, 30 dummy scans were discarded. After the first 30 s of rest, the task-and-rest cycle, which consisted of a 14 s window, was repeated 40 times, followed by an additional 30 s of rest. The task period consisted of 2 s of right-finger tapping. For the resting period that follows the task, the interstimulus interval (ISI) ranged between 4 s and 20 s, with an average ISI of 12 s (Lee, Tak, & Ye, 2011).

Data sets of *p* = 1000, *p* = 5000, *p* = 10,000, and *p* = 40,000 voxels were constructed from both activated and nonactivated voxels. These data sets were whitened using the proposed algorithm and the truncated SVD approach implemented in the GIFT software (Calhoun, Adali, Pearlson, & Pekar, 2001a). Q-Q plots (with the Mahalanobis distance) were used to compare the underlying distribution of each whitened data set with the standard normal distribution. As can be seen from Figure 2, the Q-Q plots approximately follow the line *y* = *x* for the whitened data sets obtained with both methods (the second and third rows of the figure). This indicates that the underlying distribution of the whitened data sets on the vertical axis is consistent with the standard normal. On the other hand, when we look at the computational time, Table 1 indicates that the proposed whitening method is computationally more efficient than the classic approach of truncated ED or SVD.

Table 1: Computational Time for Whitening Each Data Set.

| Number of Voxels | 1000 | 5000 | 10,000 | 40,000 |
| --- | --- | --- | --- | --- |
| GIFT | 4.62 | 10.07 | 17.61 | 64.25 |
| Proposed method | 0.10 | 0.86 | 1.64 | 8.13 |


## 6. Conclusion

This letter introduces a computationally efficient method for whitening high-dimensional fMRI data sets based on Krylov subspaces. The gain in computational cost is achieved by avoiding the ED of *V*, which has a computational cost of *O*(*p*^{3}). Instead, a Lanczos-based approach is used to accurately approximate the major eigenvectors and eigenvalues of *V*. This has the advantage of avoiding the computation of the eigenvectors and eigenvalues that are not associated with the intrinsic dimension of the data, leading to a reduced computational cost of *O*(*rp*^{2}) with *r* ≪ *p*. As illustrated with an experiment of whitening high-dimensional fMRI data, this comes without sacrificing accuracy.