Abstract

This letter proposes an algorithm for linear whitening that minimizes the mean squared error between the original and whitened data without using the truncated eigendecomposition (ED) of the covariance matrix of the original data. This algorithm uses Lanczos vectors to accurately approximate the major eigenvectors and eigenvalues of the covariance matrix of the original data. The major advantage of the proposed whitening approach is its low computational cost when compared with that of the truncated ED. This gain comes without sacrificing accuracy, as illustrated with an experiment of whitening a high-dimensional fMRI data set.

1.  Introduction

The univariate and multivariate linear regression models are simple and widely used parametric models for fMRI data analysis (Ashby, 2011; Lazar, 2010). In these models, the deterministic (linear) part is used to characterize the activation response, in a single voxel for the univariate model or in a group of voxels for the multivariate model, together with the baseline drift, whereas the stochastic part of the model characterizes the noise from both physical and physiological processes (Ashby, 2011; Lazar, 2010).

On the one hand, the univariate linear regression model uses a hemodynamic response function (HRF) as a parameter vector, with extra parameters for the drift and a univariate temporally correlated noise, to model the fMRI time series at each voxel and to infer task-related activations with estimates of the level of significance (Ardekani, Kershaw, Kashikura, & Kanno, 1999; Seghouane & Shah, 2012). As a consequence, the sensitivity of hypothesis tests based on this model across a group of voxels depends crucially on the group correction postprocessing step (Genovese, Lazar, & Nichols, 2002).

On the other hand, multivariate linear regression models have been used mainly in data-driven methods (Esposito, Formisano, Seifritz, Goebel, Morrone, Tedeschi, & Di Salle, 2002; Calhoun, Adali, Pearlson, & Pekar, 2001b). They allow the exploitation of the relationships among voxels. Among the multivariate data-driven techniques used in fMRI, independent component analysis (ICA; Hyvarinen, Karhunen, & Oja, 2001; Jolliffe, 2002) has been widely applied to find components that are spatially independent (sICA; Esposito et al., 2002) or temporally independent (tICA; Calhoun, Adali, Pearlson, & Pekar, 2001a).

A common preprocessing step used with both models is prewhitening. In the case of the univariate model, it allows the generation of the best linear unbiased estimates of the parameter vector and therefore more accurate activation tests. In the ICA case, it has the advantage of reducing the complexity and improving the convergence of the ICA algorithm (Hyvarinen et al., 2001).

Prewhitening is obtained by multiplying the fMRI data by the square root of the inverse covariance matrix, which can be obtained from the Cholesky factorization or the eigendecomposition (ED) of the inverse covariance matrix. As is well known, however, the Cholesky factorization and the eigendecomposition are computationally expensive, scaling as O(p³) for a dense covariance matrix. Although this is not a major inconvenience when dealing with univariate linear regression modeling of fMRI time series or with sICA, it can be a major problem for tICA when the number of voxels considered is very large. In fMRI analysis, the spatial resolution is often at least 64 × 64 voxels over 32 slices, resulting in 131,072 voxels, so prewhitening requires substantial computational work for the eigendecomposition. To overcome this inconvenience, an alternative approach for prewhitening large-dimensional multivariate vectors is proposed in this letter.

The proposed subspace method constructs an approximated whitened vector based on Krylov sequences of subspaces reachable from the unwhitened vector. The goal of the proposed algorithm is the same as that of the approach based on the ED: to preserve the quality of the resulting product between the inverse square root of the covariance matrix and the data vector to be whitened in the major eigendirections of the covariance matrix. A Lanczos-based approach is used to achieve this goal with a relatively small number r of Lanczos vectors. The advantage of the proposed approach is its computational complexity, which is O(rp²) with r ≪ p, compared to the O(p³) associated with the ED-based approach. Like the Cholesky factorization or the eigendecomposition, the proposed method involves an eigendecomposition (factorization), but it is applied to a much smaller matrix, resulting in a whitening method that is computationally much cheaper than O(p³). The proposed method is particularly appealing when a reduced-rank approximation of the covariance matrix is used for prewhitening.

The rest of the letter is organized as follows. The prewhitening preprocessing step of the multivariate temporal fMRI time series is reviewed in section 2. The proposed prewhitening algorithm is described in section 3. The choice of the dimension of the Krylov subspace is discussed in section 4. The performance of the proposed whitening method on real fMRI data is illustrated in section 5. Concluding remarks are given in section 6.

2.  Prewhitening and the Multivariate fMRI Temporal Model

ICA has become the main model for multivariate data-driven fMRI analysis. With this model, an observed data vector y is modeled as a mixture of an unobservable source vector s as follows,
$$y = A s \tag{2.1}$$
where $A$ is a $p \times m$ mixing matrix. In tICA, a measurement matrix is formed by collecting fMRI time series of time length n across p voxels. The source matrix contains m temporally independent sources of length n. In the prewhitening step, the mixed vector y is processed by
$$z = V^{-1/2}(y - \mu) \tag{2.2}$$
where $\mu$ and $V$ are the mean vector and covariance matrix of y, such that z has zero mean and identity covariance. In classic prewhitening, they are estimated by
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \hat{V} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{\mu})(y_i - \hat{\mu})^\top$$
in a batch manner from the n available samples.
A prewhitened data set is necessary in some ICA algorithms because it reduces the complexity of the ICA problem (Hyvarinen et al., 2001). Assuming V is full rank and positive definite, $V^{-1/2}$ is usually obtained from the eigendecomposition as follows,
$$V^{-1/2} = U S^{-1/2} U^\top \tag{2.3}$$
where $S$ is the diagonal matrix of eigenvalues and $U$ is the unitary matrix of eigenvectors. However, in many scientific fields, including tICA in fMRI, the number of samples n is much smaller than the number of variables p: although the data are presented in a high-dimensional space, they actually have a much lower intrinsic dimensionality $k \ll p$. In this case, V is generally rank deficient, and the best rank-k approximation of V in the 2-norm (obtained by truncating the ED) is used instead of V,
$$V_k = U_k S_k U_k^\top \tag{2.4}$$
where $U_k$ consists of the first k columns of U and $S_k$ is the kth principal submatrix of S. When a very large number of voxels is considered, prewhitening is computationally expensive, since it requires a computational complexity of order O(np²) for the covariance matrix and O(p³) for the eigendecomposition. The usefulness of the latter is questionable, since only the major eigenvalues and eigenvectors of V are needed in equation 2.4. Furthermore, the least squares residual associated with the centered whitened vector obtained with equation 2.4 is given by
$$\left\| (y - \hat{\mu}) - U_k U_k^\top (y - \hat{\mu}) \right\|_2^2 = \sum_{i=k+1}^{p} \left( u_i^\top (y - \hat{\mu}) \right)^2 \tag{2.5}$$
Relation 2.5 shows that the order in which the truncated ED includes components is unnatural. Based on the least squares residual, the components should be chosen using the magnitude of the entries of $U^\top (y - \hat{\mu})$. The singular values (principal components) are ordered in such a way that they reflect the directions of largest variation in the covariance matrix V. However, the purpose of the whitening procedure is to reduce the least squares residual using a smaller-dimension projection subspace, so the components should be chosen according to the entries of the vector $U^\top (y - \hat{\mu})$.
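For concreteness, a minimal NumPy sketch of this classical truncated-ED whitening (equations 2.2 to 2.4) might look as follows; the function name, the data layout (one observation per column), and the choice of the rank k are illustrative assumptions rather than details given in the letter.

```python
import numpy as np

def truncated_ed_whitening(Y, k):
    """Classical whitening via the truncated eigendecomposition (eqs. 2.2-2.4).

    Y : (p, n) data matrix, one p-dimensional observation per column.
    k : assumed intrinsic dimension used to truncate the ED.
    The cost is dominated by the O(p^3) eigendecomposition of the p x p covariance.
    """
    p, n = Y.shape
    mu = Y.mean(axis=1, keepdims=True)            # sample mean
    Yc = Y - mu                                   # centered data
    V = (Yc @ Yc.T) / n                           # sample covariance, O(n p^2)
    s, U = np.linalg.eigh(V)                      # full ED, O(p^3)
    order = np.argsort(s)[::-1][:k]               # keep the k largest eigenvalues
    Uk, sk = U[:, order], s[order]
    return Uk @ ((Uk.T @ Yc) / np.sqrt(sk)[:, None])   # U_k S_k^{-1/2} U_k^T (y - mu)
```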

In what follows, based on Krylov sequences of subspaces reachable from the vector y, we propose a computationally efficient approach for generating the major eigenvectors and eigenvalues of V without computing its eigendecomposition. An eigendecomposition is still necessary, but it is applied to a much smaller matrix of dimension $r \times r$ with $r \ll p$.

3.  Krylov Subspace Approximations for Prewhitening

Given a symmetric matrix $V \in \mathbb{R}^{p \times p}$ and an initial vector y, a Krylov subspace is a subspace of the form
$$\mathcal{K}_r(V, y) = \mathrm{span}\{y, Vy, V^2 y, \ldots, V^{r-1} y\}.$$
Increasing the dimension r allows us to reach the entire subspace reachable from the pair (V, y); the dimension of $\mathcal{K}_r(V, y)$ is at most r. In our computation, we use not only the Krylov subspace $\mathcal{K}_r(V, y)$ but also an orthonormal basis for it. The Lanczos algorithm builds this orthonormal basis.
The Lanczos algorithm generates a set of r vectors $q_i$, $i = 1, \ldots, r$, that form an orthonormal basis of the Krylov subspace $\mathcal{K}_r(V, y)$. These vectors satisfy the three-term recurrence
$$\beta_{i+1} q_{i+1} = V q_i - \alpha_i q_i - \beta_i q_{i-1}$$
with $q_0 = 0$ and $q_1 = y/\|y\|_2$. The coefficients $\alpha_i$ and $\beta_{i+1}$ are computed so as to ensure that $q_{i+1} \perp q_i$, $q_{i+1} \perp q_{i-1}$, and $\|q_{i+1}\|_2 = 1$. The computational cost of this algorithm is O(rp²) for a dense matrix V. If $Q_r = [q_1, \ldots, q_r]$, then an important equality that results from the algorithm is
$$Q_r^\top V Q_r = T_r \tag{3.1}$$
where $T_r$ is a tridiagonal matrix. The eigenvalues of $T_r$ are called Ritz values, and for an eigenvector w of $T_r$, $Q_r w$ defines the corresponding Ritz vector. As r increases, more Ritz values and vectors converge toward the eigenvalues and eigenvectors of V (Golub & Van Loan, 1996; Saad, 1992; Chen & Saad, 2009).
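A compact sketch of this three-term Lanczos recurrence, in the same NumPy setting as above, is given below; the full reorthogonalization step and the breakdown tolerance are practical assumptions added for numerical robustness and are not prescribed by the letter.

```python
import numpy as np

def lanczos_tridiag(V, y, r, tol=1e-12):
    """Run r Lanczos steps on the symmetric matrix V with start vector y.

    Returns Q (p, r) with orthonormal columns spanning K_r(V, y) and the
    tridiagonal T (r, r) satisfying Q^T V Q = T (eq. 3.1).
    """
    p = y.shape[0]
    Q = np.zeros((p, r))
    alpha = np.zeros(r)
    beta = np.zeros(r)                 # beta[i] couples columns i and i+1 of Q
    q_prev = np.zeros(p)
    q = y / np.linalg.norm(y)          # q_1 = y / ||y||_2
    for i in range(r):
        Q[:, i] = q
        w = V @ q                      # O(p^2) per step for a dense V
        alpha[i] = q @ w
        w = w - alpha[i] * q - (beta[i - 1] * q_prev if i > 0 else 0.0)
        w = w - Q[:, :i + 1] @ (Q[:, :i + 1].T @ w)   # full reorthogonalization
        beta[i] = np.linalg.norm(w)
        if beta[i] < tol:              # invariant subspace reached: stop early
            break
        q_prev, q = q, w / beta[i]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q, T
```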
The Krylov subspace $\mathcal{K}_r(V, y)$, spanned by a small number r of Lanczos vectors, is used for the construction of an approximation of z. The eigendecomposition of $T_r$, defined in equation 3.1 and obtained from $\mathcal{K}_r(V, y)$, is easily computed in practice because r is assumed to be small:
$$T_r = W D W^\top. \tag{3.2}$$
The eigenvalues in D are known to approximate the extremal eigenvalues of V, and the columns of $Q_r W$ are the corresponding approximate eigenvectors.
The problem of linear whitening can be formulated as searching for the vector z such that the mean squared error (Eldar & Oppenheim, 2005)
$$\mathrm{MSE} = E\left[\,\|(y - \mu) - z\|_2^2\,\right] \tag{3.3}$$
is minimum subject to the whitening constraint
$$E\left[z z^\top\right] = I_p. \tag{3.4}$$
Assuming the vector , z can be written as
formula
3.5
for some appropriate . The orthonormal basis Qr generated by the Lanczos algorithm is used to represent this vector. In the basis Qr, can be written as
formula
3.6
for some appropriate and t is since U is orthogonal. With this, the MSE, equation 3.3, can be written as
formula
3.7
where $t_i$ is the ith entry of t and is N(0, $s_i$). The minimum of equation 3.7 is obtained for the maximum of the last term of the right-hand side of equation 3.7. Using the Cauchy-Schwarz inequality,
formula
3.8
the maximum of equation 3.7 is obtained for
formula
3.9
Therefore,
formula
3.10
and
formula
3.11
By replacing in equation 3.7 and minimizing with respect to , we have
formula
3.12
When equation 3.12 is used, z becomes
formula
3.13
where $e_1$ is the first column of the identity matrix and the first vector of the basis $Q_r$ is $q_1 = y/\|y\|_2$. The approximate whitened vector is then based on computing the inverse square root of a matrix of substantially smaller dimension $r \times r$, for which it is inexpensive to compute exactly.

The above procedure is captured in algorithm 1.

[Algorithm 1: Lanczos-based prewhitening.]
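As a sketch only, the overall procedure of algorithm 1 could be written as below, reusing the lanczos_tridiag sketch given earlier; starting the recurrence from the centered vector y − μ̂ and the small-eigenvalue guard are assumptions, not details stated in the letter.

```python
import numpy as np

def krylov_whiten(V, y, mu, r):
    """Sketch of the Lanczos-based whitening of equation 3.13:
    z ~= ||y - mu||_2 * Q_r * T_r^{-1/2} * e_1, with T_r^{-1/2} = W D^{-1/2} W^T.

    Assumes the Lanczos recurrence is started from the centered vector y - mu
    and that r is well below the rank of V (no early breakdown).
    """
    yc = y - mu
    Q, T = lanczos_tridiag(V, yc, r)          # O(r p^2) for a dense V
    d, W = np.linalg.eigh(T)                  # small r x r eigendecomposition (eq. 3.2)
    d = np.maximum(d, np.finfo(float).eps)    # guard against tiny or negative Ritz values
    e1 = np.zeros(r); e1[0] = 1.0
    t_inv_sqrt_e1 = W @ ((W.T @ e1) / np.sqrt(d))
    return np.linalg.norm(yc) * (Q @ t_inv_sqrt_e1)
```

For a full (p, n) data matrix, one would apply this vector by vector to each column, or use a block variant; this usage note is an assumption about deployment rather than a prescription from the letter.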
Since the whitened vector of interest is of the form $V^{-1/2}(y - \hat{\mu})$, the vector defined in equation 3.13 is a good alternative. Using
formula
3.14
where the last term is valid for sufficiently large r, equation 3.13 gives
formula
3.15
the orthogonal projection of the exact solution, equation 2.2, onto the Krylov subspace. When projected onto the subspace spanned by the major eigenvectors of V, this approximation is very close to the projection of z if r is sufficiently large.
The proposed approach, equation 3.11, as well as the truncated ED, equation 2.4, are two methods for biased estimation. Each of these methods generates a filtered version of z given in equation 2.2. In the case of the truncated ED, we have
formula
3.16
where the shrinkage function f1(si) is given by
formula
The bias is introduced in the directions that are responsible for the high variance. The hope is that the increase in bias is small compared to the decrease in variance, leading to a reduction in the mean squared error. If the shrinkage function f1(·) above does not depend on y, that is, is linear in y, any shrinkage will increase the bias in the ith direction. The proposed whitening procedure in equation 3.11 is a filtering estimator as well. The shrinkage function in this case is closely related to the Ritz values and vectors and is given by Van der Sluis and Van der Vorst (1986):
formula
In comparison to equation 3.16,
formula
is nonlinear in y. The values $f_2(s_i)$ depend on the eigenvalues of $T_r$, and $T_r$ in turn depends on y in a complicated manner. $f_2$ is related to equation 2.5 through the Krylov subspaces, which are subspaces of the space spanned by the vectors $u_i$ for which $u_i^\top y \neq 0$. $Q_r$ is obtained through a Gram-Schmidt orthonormalization (GSO) of $\mathcal{K}_r$:
formula
The values of f2(si) automatically take into account both equation 2.5 and the variances in V, whereas the value of f1 extracts only the directions ui that explain the maximum variance in V. Examples of f2 for different values of r are presented in Figure 1 for a symmetric matrix of dimension 10.
Figure 1: An illustration of the shrinkage function f2 for different values of r. The symmetric matrix V is of dimension 10. The shrinkage is high if r is small.

The convergence of the approximate whitened vector to z in the directions of the major eigenvectors of V as r increases can be analyzed using
formula
3.17

where $u_j$ is the jth eigenvector of V, $m = r - j$, $c_j > 0$ is a constant independent of r, and $T_m(\cdot)$ is the Chebyshev polynomial of degree m. Therefore, the approximation error projected on $u_j$ decays as the reciprocal of $T_m$. The derivation of this inequality is given in the appendix.

The Chebyshev polynomial $T_m(x)$ for x > 1 and large m behaves as
formula

Therefore, the approximation converges geometrically.

4.  Choosing the Dimension r

Similar to the problem of determining the most appropriate dimension k when using equation 2.4, the dimension r of the Krylov subspace $\mathcal{K}_r(V, y)$ plays a crucial role in finding an accurate approximation of the major eigenvectors and eigenvalues of V. This number r also corresponds to the number of steps in algorithm 1. It can be chosen beforehand using a model selection criterion (Seghouane & Bekara, 2004; Seghouane & Cichocki, 2007) or determined by a convergence rule tested at the end of each iteration of the algorithm. In the latter case, the whitened vector is computed at the end of each step i of the algorithm.

Different convergence rules can be used. In this letter, the relative norm of the difference between two consecutive approximations,
$$\frac{\|z_i - z_{i-1}\|_2}{\|z_{i-1}\|_2},$$
is used as the convergence criterion; algorithm 1 is stopped when this quantity falls below a certain threshold.
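A possible sketch of this adaptive choice of r is given below, again using the hypothetical krylov_whiten helper from section 3; the threshold value and the from-scratch recomputation at each step are simplifying assumptions.

```python
import numpy as np

def krylov_whiten_adaptive(V, y, mu, r_max=100, tol=1e-4):
    """Increase the Krylov dimension until two consecutive whitened vectors agree.

    Stops when ||z_i - z_{i-1}||_2 / ||z_{i-1}||_2 < tol (the criterion of section 4).
    For clarity, the whitened vector is recomputed from scratch at each step; an
    incremental update of Q_r and T_r would be cheaper in practice.
    """
    z_prev = krylov_whiten(V, y, mu, 2)
    for r in range(3, r_max + 1):
        z = krylov_whiten(V, y, mu, r)
        if np.linalg.norm(z - z_prev) / np.linalg.norm(z_prev) < tol:
            return z, r
        z_prev = z
    return z_prev, r_max
```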

5.  Application to fMRI Time Series Whitening

The data used to assess the performance of the proposed prewhitening method were generated from an fMRI experiment performed to investigate an event-related right-finger-tapping task. A 3.0 T functional MRI system was used to acquire whole-brain BOLD/EPI images. Each acquisition consisted of 35 contiguous slices (3.44 mm × 3.44 mm × 4 mm voxels). The data were recorded for 650 s with TR = 2 s, and the first 30 dummy scans were discarded. After the first 30 s of rest, the task-and-rest period, which consisted of a 14 s window, was repeated 40 times, followed by an additional 30 s of rest. The task period consisted of 2 s of right-finger tapping. For the resting period that follows the task, the interstimulus interval (ISI) ranged between 4 s and 20 s, with an average ISI of 12 s (Lee, Tak, & Ye, 2011).

Four data sets of size p = 1000, p = 5000, p = 10,000, and p = 40,000 voxels were constructed from both activated and nonactivated voxels. These data sets were whitened using the proposed algorithm and the truncated SVD approach implemented in the GIFT software (Calhoun, Adali, Pearlson, & Pekar, 2001a). Q-Q plots (with Mahalanobis distances) were used to compare the underlying distribution of each whitened data set with the standard normal distribution. As can be seen from Figure 2, the Q-Q plots approximately follow the line y = x for the whitened data sets obtained with both methods (the second and third rows of the figure), indicating that the underlying distribution of the whitened data sets is close to the standard normal. When computational time is considered, Table 1 indicates that the proposed whitening method is computationally more efficient than the classic approach based on the truncated ED or SVD.
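A simplified check in the same spirit could be sketched as follows; it pools the whitened entries and compares them with standard normal quantiles rather than reproducing the exact Mahalanobis-distance Q-Q plots used for Figure 2.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def qq_check(Z):
    """Q-Q plot of the pooled entries of the whitened data Z against N(0, 1).

    Z : (p, n) whitened data matrix.  If whitening succeeded, the points should
    lie close to the line y = x.
    """
    stats.probplot(Z.ravel(), dist="norm", plot=plt)
    plt.plot([-4, 4], [-4, 4], "k--", label="y = x")
    plt.legend()
    plt.show()
```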
Figure 2: Q-Q plots of the whitened data sets obtained from the two different methods versus the reference distribution.

Table 1: Computation Time in Seconds for the Whitening of the Four Data Sets, for the Method Implemented in the GIFT Software and the Proposed Method.

Number of Voxels     1,000    5,000    10,000    40,000
GIFT                  4.62    10.07     17.61     64.25
Proposed method       0.10     0.86      1.64      8.13

6.  Conclusion

This letter introduces a computationally efficient method for whitening high-dimensional fMRI data sets based on Krylov subspaces. The gain in computational cost is achieved by avoiding the ED of V, which has a computational cost of O(p³). Instead, a Lanczos-based approach is used to accurately approximate the major eigenvectors and eigenvalues of V. This has the advantage of avoiding the computation of the eigenvectors and eigenvalues that are not associated with the intrinsic dimension of the data, leading to a reduced computational cost of O(rp²) with r ≪ p. As illustrated with an experiment of whitening high-dimensional fMRI data, this comes without sacrificing accuracy.

Appendix

The bound on the angle between the jth eigenvector uj of V and the Krylov subspace defined by Qr is given by Saad (1980):
formula
Applying the above inequality to equation 3.17,
formula

References

Ardekani, B., Kershaw, J., Kashikura, K., & Kanno, I. (1999). Activation detection in functional MRI using subspace modeling and maximum likelihood estimation. IEEE Transactions on Medical Imaging, 18(2), 101-114.

Ashby, F. G. (2011). Statistical analysis of fMRI data. Cambridge, MA: MIT Press.

Calhoun, V. D., Adali, T., Pearlson, G. D., & Pekar, J. J. (2001a). Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms. Human Brain Mapping, 13(1), 43-53.

Calhoun, V. D., Adali, T., Pearlson, G. D., & Pekar, J. J. (2001b). A method for making group inferences from functional MRI data using independent component analysis. Human Brain Mapping, 14(3), 140-151.

Chen, J., & Saad, Y. (2009). Lanczos vectors versus singular vectors for effective dimension reduction. IEEE Transactions on Knowledge and Data Engineering, 21(8), 1091-1103.

Eldar, Y. C., & Oppenheim, A. V. (2005). MMSE whitening and subspace whitening. IEEE Transactions on Information Theory, 49(7), 1846-1851.

Esposito, F., Formisano, E., Seifritz, E., Goebel, R., Morrone, R., Tedeschi, G., & Di Salle, F. (2002). Spatial independent component analysis of functional MRI time series: To what extent do results depend on the algorithm used? Human Brain Mapping, 16(3), 146-157.

Genovese, C. R., Lazar, N. A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage, 15(4), 870-878.

Golub, G. H., & Van Loan, C. F. (1996). Matrix computations. Baltimore, MD: Johns Hopkins University Press.

Hyvarinen, A., Karhunen, J., & Oja, E. (2001). Independent component analysis. New York: Wiley.

Jolliffe, I. T. (2002). Principal component analysis. New York: Springer.

Lazar, N. A. (2010). The statistical analysis of functional MRI data. New York: Springer.

Lee, K., Tak, S., & Ye, J. C. (2011). A data-driven sparse GLM for fMRI analysis using sparse dictionary learning with MDL criterion. IEEE Transactions on Medical Imaging, 30(5), 1076-1089.

Saad, Y. (1980). On the rates of convergence of the Lanczos and block-Lanczos methods. SIAM Journal on Numerical Analysis, 17(5), 687-706.

Saad, Y. (1992). Numerical methods for large eigenvalue problems. New York: Halstead Press.

Seghouane, A. K., & Bekara, M. (2004). A small sample model selection criterion based on Kullback's symmetric divergence. IEEE Transactions on Signal Processing, 52(12), 3314-3323.

Seghouane, A. K., & Cichocki, A. (2007). Bayesian estimation of the number of principal components. Signal Processing, 87(3), 562-568.

Seghouane, A. K., & Shah, A. (2012). HRF estimation in fMRI data with unknown drift matrix by iterative minimization of the Kullback-Leibler divergence. IEEE Transactions on Medical Imaging, 31(2), 192-206.

Van der Sluis, A., & Van der Vorst, H. A. (1986). The rate of convergence of conjugate gradients. Numerische Mathematik, 48(5), 543-560.