This letter proposes an algorithm for linear whitening that minimizes the mean squared error between the original and whitened data without using the truncated eigendecomposition (ED) of the covariance matrix of the original data. This algorithm uses Lanczos vectors to accurately approximate the major eigenvectors and eigenvalues of the covariance matrix of the original data. The major advantage of the proposed whitening approach is its low computational cost when compared with that of the truncated ED. This gain comes without sacrificing accuracy, as illustrated with an experiment of whitening a high-dimensional fMRI data set.
The univariate and multivariate linear regression models are simple and widely used parametric models for fMRI data analysis (Ashby, 2011; Lazar, 2010). In these models, the first (deterministic) part characterizes the activation response (in a single voxel for the univariate model or in a group of voxels for the multivariate model) and the baseline drift, whereas the second (stochastic) part characterizes the noise arising from both physical and physiological processes (Ashby, 2011; Lazar, 2010).
On the one hand, the univariate linear regression model uses a hemodynamic response function (HRF) as a parameter vector, with extra parameters for the drift and a univariate temporally correlated noise, to model the fMRI time series at each voxel and infer task-related activations with estimates of their level of significance (Ardekani, Kershaw, Kashikura, & Kanno, 1999; Seghouane & Shah, 2012). As a consequence, the sensitivity of hypothesis-testing methods based on this model across a group of voxels depends crucially on the group correction postprocessing step (Genovese, Lazar, & Nichols, 2002).
On the other hand, multivariate linear regression models have been used mainly in data-driven methods (Esposito, Formisano, Seifritz, Goebel, Morrone, Tedeschi, & Di Salle, 2002; Calhoun, Adali, Pearlson, & Pekar, 2001b). They allow the exploitation of the relationships among voxels. Among the multivariate data-driven techniques used in fMRI, independent component analysis (ICA; Hyvärinen, Karhunen, & Oja, 2001; Jolliffe, 2002) has been widely applied to find components that are spatially independent (sICA; Esposito et al., 2002) or temporally independent (tICA; Calhoun, Adali, Pearlson, & Pekar, 2001a).
A common preprocessing step used with both models is prewhitening. In the case of the univariate model, this allows the generation of the best linear unbiased estimates of the parameter vector and therefore more accurate activation tests. In the ICA case, it has the advantage of reducing the complexity and improving the convergence of the ICA algorithm (Hyvärinen et al., 2001).
Prewhitening is obtained by multiplying the fMRI data by the square root of the inverse covariance matrix, which can be obtained from the Cholesky factorization or the eigendecomposition (ED) of the inverse covariance matrix. As is well known, however, the Cholesky factorization and the eigendecomposition are computationally expensive, scaling as O(p³) for a dense covariance matrix. Although this is not a major inconvenience when dealing with univariate linear regression modeling of fMRI time series or with sICA, it can be a major problem for tICA when the number of voxels considered is very large. In fMRI analysis, the spatial resolution is often at least 64 × 64 voxels per slice over 32 slices, resulting in 131,072 voxels, so prewhitening requires substantial computational work for the eigendecomposition alone. To overcome this inconvenience, an alternative approach for prewhitening large-dimensional multivariate vectors is proposed in this letter.
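For concreteness, the ED route described above can be sketched as follows (an illustrative Python fragment, not the letter's code; the names p, V, and y are placeholders for this toy example): from V = U Λ Uᵀ, the whitened vector is V^{-1/2} y = U Λ^{-1/2} Uᵀ y.

```python
import numpy as np

# Illustrative sketch (not the letter's code) of ED-based whitening:
# V = U diag(evals) U^T, and the whitened vector is V^{-1/2} y.
rng = np.random.default_rng(0)
p = 200                                    # tiny compared with fMRI voxel counts
A = rng.standard_normal((p, 3 * p))
V = A @ A.T / (3 * p)                      # a dense p x p covariance matrix
y = rng.standard_normal(p)

evals, U = np.linalg.eigh(V)               # the O(p^3) step
y_white = U @ (evals ** -0.5 * (U.T @ y))  # V^{-1/2} y

# Sanity check: applying V^{1/2} recovers the original vector.
y_back = U @ (evals ** 0.5 * (U.T @ y_white))
assert np.allclose(y_back, y)
```

The `eigh` call is the O(p³) bottleneck; for p on the order of 10⁵ voxels, this dominates the cost.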
The proposed subspace method constructs an approximated whitened vector based on Krylov sequences of subspaces reachable from the unwhitened vector. The goal of the proposed algorithm is the same as that of the ED-based approach: to preserve the quality of the product between the inverse square root of the covariance matrix and the data vector to be whitened in the major eigendirections of the covariance matrix. A Lanczos-based approach achieves this goal using a relatively small number r of Lanczos vectors. The advantage of the proposed approach is its computational complexity, which is O(rp²) with r ≪ p, compared to the O(p³) associated with the ED-based approach. Like the Cholesky factorization or the eigendecomposition, the proposed method involves an eigendecomposition (factorization), but it is applied to a much smaller matrix, resulting in a computationally much cheaper whitening method. The proposed method is particularly appealing when a reduced-rank approximation of the covariance matrix is used for prewhitening.
The rest of the letter is organized as follows. The prewhitening preprocessing step of the multivariate temporal fMRI time series is reviewed in section 2. The proposed prewhitening algorithm is described in section 3. The choice of the dimension of the Krylov subspace is discussed in section 4. The performance of the proposed whitening method on real fMRI data is illustrated in section 5. Concluding remarks are given in section 6.
2. Prewhitening and the Multivariate fMRI Temporal Model
In what follows, based on Krylov sequences of subspaces reachable from the vector y, we propose a computationally efficient approach for generating the major eigenvectors and eigenvalues of V without computing its eigendecomposition. An eigendecomposition is still necessary but is applied to a much smaller matrix of dimension r × r.
3. Krylov Subspace Approximations for Prewhitening
The above procedure is captured in algorithm 1.
In the resulting bound, u_j is the jth eigenvector of V, m = r − j, the c_j > 0 are constants independent of r, and T_m(·) is the Chebyshev polynomial of degree m. Since T_m grows geometrically with m for arguments greater than one, the approximation error projected on u_j decays geometrically with r. The derivation of this inequality is given in the appendix.
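This geometric decay is easy to verify numerically (the value of x below is a hypothetical stand-in for the eigenvalue-gap-dependent argument of the Chebyshev polynomial in such bounds, not a quantity from the letter):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Numerical check: for x > 1, T_m(x) grows geometrically in m,
# so a bound of the form c_j / T_m(x) decays geometrically.
x = 1.2                                  # hypothetical gap-dependent argument
rho = x + np.sqrt(x ** 2 - 1)            # asymptotic growth factor of T_m(x)
T10 = C.chebval(x, [0] * 10 + [1])       # T_10(x)
T20 = C.chebval(x, [0] * 20 + [1])       # T_20(x)
ratio = T20 / T10
# Ten extra degrees multiply T_m(x) by roughly rho**10.
assert abs(ratio / rho ** 10 - 1) < 0.05
```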
Therefore, the approximation error converges geometrically with r.
4. Choosing the Dimension r
Similar to the problem of determining the most appropriate dimension k when using equation 2.4, the dimension r of the Krylov subspace Kr(V, y) plays a crucial role in obtaining an accurate approximation of the major eigenvectors and eigenvalues of V. This number r also corresponds to the number of steps in algorithm 1. It can be chosen beforehand using a model selection criterion (Seghouane & Bekara, 2004; Seghouane & Cichocki, 2007) or determined by a convergence rule tested at the end of each iteration of the algorithm. In the latter case, the whitened vector is computed at the end of each step i of the algorithm.
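Such a convergence rule can be sketched as follows (a minimal illustration; the relative-change criterion, the tolerance, and the function name are assumptions for this example, not the letter's rule): the Lanczos basis is extended one step at a time, the whitened vector is recomputed, and the iteration stops when the vector no longer changes appreciably.

```python
import numpy as np

# Sketch of an adaptive choice of r (the stopping rule and tolerance
# are illustrative assumptions): extend the Lanczos basis one step at
# a time, recompute the whitened vector, stop when its relative
# change drops below tol.
def adaptive_whiten(V, y, tol=1e-8, r_max=100):
    p = len(y)
    Q = np.zeros((p, r_max))
    alpha, beta = [], []
    Q[:, 0] = y / np.linalg.norm(y)
    prev = None
    for j in range(r_max):
        w = V @ Q[:, j]
        alpha.append(Q[:, j] @ w)
        w -= alpha[j] * Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)   # reorthogonalize
        # Whitened vector from the current (j+1)-dimensional subspace.
        T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
        theta, S = np.linalg.eigh(T)
        e1 = np.zeros(j + 1)
        e1[0] = 1.0
        cur = np.linalg.norm(y) * (
            Q[:, : j + 1] @ (S @ (theta ** -0.5 * (S.T @ e1))))
        if prev is not None and \
                np.linalg.norm(cur - prev) <= tol * np.linalg.norm(cur):
            return cur, j + 1            # converged after j+1 Lanczos steps
        prev = cur
        if j + 1 < r_max:
            beta.append(np.linalg.norm(w))
            Q[:, j + 1] = w / beta[j]
    return cur, r_max
```

Recomputing the small eigendecomposition at every step adds only O(r⁴) work in total, which is negligible next to the O(rp²) matrix-vector products when r ≪ p.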
5. Application to fMRI Time Series Whitening
The data used to assess the performance of the proposed prewhitening method were generated from an fMRI experiment performed to investigate an event-related right-finger tapping task. A 3.0 T functional MRI system was used to acquire the whole brain BOLD/EPI images. Each acquisition consisted of 35 contiguous slices (3.44 mm × 3.44 mm × 4 mm voxels). The data were recorded for 650 s with TR = 2 s. First, 30 dummy scans were discarded. After the first 30 s of rest, the task-and-rest period, which consisted of a 14 s window, was repeated 40 times, followed by an additional 30 s of rest. The task period consisted of 2 s of right-finger tapping. For the resting period that follows the task, the interstimulus interval (ISI) ranged between 4 s and 20 s, with an average ISI of 12 s (Lee, Tak, & Ye, 2011).
[Table: Number of Voxels — 1000, 5000, 10,000, 40,000; table body not recovered]
This letter introduces a computationally efficient method for whitening high-dimensional fMRI data sets based on Krylov subspaces. The gain in computational cost is achieved by avoiding the ED of V, which has a computational cost of O(p³). Instead, a Lanczos-based approach is used to accurately approximate the major eigenvectors and eigenvalues of V. This has the advantage of avoiding the computation of the eigenvectors and eigenvalues that are not associated with the intrinsic dimension of the data, leading to a reduced computational cost of O(rp²) with r ≪ p. As illustrated with an experiment of whitening high-dimensional fMRI data, this comes without sacrificing accuracy.