Abstract

Sparse signal representations have gained much interest recently in both signal processing and statistical communities. Compared to orthogonal matching pursuit (OMP) and basis pursuit, which solve the L0 and L1 constrained sparse least-squares problems, respectively, least angle regression (LARS) is a computationally efficient method to solve both problems for all critical values of the regularization parameter λ. However, none of these methods is suitable for solving large multidimensional sparse least-squares problems, as they would require extensive computational power and memory. An earlier generalization of OMP, known as Kronecker-OMP, was developed to solve the L0 problem for large multidimensional sparse least-squares problems. However, its memory usage and computation time increase quickly with the number of problem dimensions and iterations. In this letter, we develop a generalization of LARS, tensor least angle regression (T-LARS), which could efficiently solve either large L0 or large L1 constrained multidimensional, sparse, least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter λ, with lower computational complexity and memory usage than Kronecker-OMP. To demonstrate the validity and performance of our T-LARS algorithm, we used it to successfully obtain different sparse representations of two relatively large 3D brain images, using fixed and learned separable overcomplete dictionaries, by solving both L0 and L1 constrained sparse least-squares problems. Our numerical experiments demonstrate that our T-LARS algorithm is significantly faster (46 to 70 times) than Kronecker-OMP in obtaining K-sparse solutions for multilinear least-squares problems. However, the K-sparse solutions obtained using Kronecker-OMP always have a slightly lower residual error (1.55% to 2.25%) than those obtained by T-LARS. Therefore, T-LARS could be an important tool for numerous multidimensional biomedical signal processing applications.

1  Introduction

Sparse signal representations have gained much interest recently in both signal processing and statistics communities. A sparse signal representation usually results in simpler and faster processing, in addition to lower memory storage requirement for fewer coefficients (Mallat, 2009; Wickramasingha, Sobhy, & Sherif, 2017). However, finding optimal sparse representations for different signals is not a trivial task (Mallat, 2009). Therefore, redundant signal representations using overcomplete dictionaries have been introduced to facilitate finding more sparse representations for different signals (Mallat & Zhang, 1993; Mallat, 2009; Pati, Rezaiifar, & Krishnaprasad, 1993; Wickramasingha et al., 2017). Undercomplete dictionaries could also be used to obtain approximate signal representations (Kreutz-Delgado et al., 2003; Malioutov, Çetin, & Willsky, 2005; Tosic & Frossard, 2011). We note that at their core such signal representation problems typically involve solving a least-squares problem (Donoho, 2006; Donoho & Elad, 2003).

A number of methods have been proposed to solve the sparse least-squares problem, including the Method of Frames (MOF; Daubechies, 1990), Matching Pursuit (MP; Mallat & Zhang, 1993), Orthogonal Matching Pursuit (OMP; Pati et al., 1993), Best Orthogonal Basis (BOB; Coifman & Wickerhauser, 1992), Lasso (also known as Basis Pursuit, BP; Chen, Donoho, & Saunders, 1998; Efron et al., 2004), and Least Angle Regression (LARS; Efron et al., 2004). MP and OMP obtain sparse signal representations by solving a nonconvex L0 constrained least-squares problem (Tropp, 2004). They are heuristic methods that construct sparse signal representations by sequentially adding atoms from a given dictionary in a greedy (i.e., nonglobally optimal) manner. BP relaxes the nonconvex L0 constrained optimization problem to solve a convex L1 constrained least-squares problem instead (Chen et al., 1998). In both problem formulations, a regularization parameter λ determines the trade-off between the representation error of the signal and its sparsity, as shown in the Pareto curve in van den Berg and Friedlander (2009). A common approach to obtaining a sparse signal representation using BP is to solve the optimization problem multiple times for different values of λ, before choosing the most suitable solution for the application at hand (van den Berg & Friedlander, 2009).

Compared to the above methods, LARS efficiently solves the L0 or, with a slight modification, the L1 constrained least-squares problem for all critical values of the regularization parameter λ (Efron et al., 2004). However, even LARS is not suitable for large-scale problems, as it would require multiplication and inversion of very large matrices (Yang, Peng, Xu, & Dai, 2009). For example, for m unknown variables and n equations, the LARS algorithm has O(m^3 + nm^2) computational complexity (Efron et al., 2004).

LARS and other algorithms that solve sparse least-squares problems are directly applicable to one-dimensional signals. Therefore, multidimensional signals that are represented by tensors (i.e., multidimensional arrays) would need to be vectorized first to enable the application of these methods (Acar, Dunlavy, & Kolda, 2011; Cichocki et al., 2015; Kolda, 2006). For I^N vectorized variables, a dictionary of size I^N × J^N, where J > I for an overcomplete dictionary, is required to solve the sparse linear least-squares problem using LARS and the other algorithms mentioned above. Here N is the order of the tensor, also known as the number of modes, and I is the dimension of each mode. Therefore, the number of vectorized variables I^N increases exponentially with the order of the tensor N, and such problems quickly become very large and computationally intractable. For example, a 3D tensor with 100 unknown variables in each mode has 1 million unknowns, which requires a dictionary with at least 1 trillion (10^12) elements; a 4D tensor with 100 unknown variables in each mode has 100 million unknowns, which requires a dictionary with at least 10 quadrillion (10^16) elements.

Mathematically separable signal representations (i.e., those using separable dictionaries) have typically been used for multidimensional signals because they are simpler and easier to obtain than nonseparable representations (Caiafa & Cichocki, 2012; Sulam, Ophir, Zibulevsky, & Elad, 2016). Caiafa and Cichocki (2012) introduced Kronecker-OMP, a generalization of OMP that could represent multidimensional signals, given as tensors, using separable dictionaries. They also developed the N-BOMP algorithm to exploit block-sparse structures in multidimensional signals. However, similar to OMP, Kronecker-OMP could obtain only an approximate, nonglobally optimal solution of the nonconvex L0 constrained sparse least-squares problem (Elrewainy & Sherif, 2019; van den Berg & Friedlander, 2009). Also, there is currently no computationally efficient method to obtain a sparse representation of a multidimensional signal by solving the convex L1 constrained sparse least-squares problem for all critical values of the regularization parameter λ. However, two of us, Elrewainy and Sherif, developed the Kronecker least angle regression (K-LARS) algorithm to efficiently solve either large L0 or large L1 sparse least-squares problems (overdetermined) with a particular Kronecker form, A ⊗ I, for all critical values of the regularization parameter λ (Elrewainy & Sherif, 2019). They used K-LARS to sparsely fit one-dimensional multichannel hyperspectral imaging data to a Kronecker model A ⊗ I.

In this letter, we develop a generalization of K-LARS, tensor least angle regression (T-LARS), which could efficiently solve either large L0 or large L1 multidimensional sparse least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter λ. The balance of the letter is organized as follows: Section 2 provides a brief introduction to tensors, tensor operations, and multilinear sparse least-squares problems. In section 3, we review LARS and describe our T-LARS algorithm in detail. Section 4 presents the computational complexity of our T-LARS algorithm and compares it with that of Kronecker-OMP. Section 5 provides results of experiments applying both T-LARS and Kronecker-OMP. We present our conclusions in section 6.

2  Problem Formulation

2.1  Tensors and Multilinear Transformations

The term tensor has a specific mathematical definition in physics, but it has been widely accepted in many disciplines (e.g., signal processing and statistics) to mean a multidimensional array. Therefore, a vector is a first-order tensor, and a matrix is a second-order tensor. An N-dimensional array is an Nth-order tensor whose N dimensions are also known as modes (Hawe, Seibert, & Kleinsteuber, 2013; Sidiropoulos et al., 2017). The Nth-order tensor X ∈ ℝ^(I_1 × ⋯ × I_n × ⋯ × I_N) has N modes, with dimensions I_1, I_2, …, I_N, where vectors along a specific mode n are called mode-n fibers.

Vectorization and mode-n matricization of tensors (Hawe et al., 2013; Sidiropoulos et al., 2017) are two important tensor reshaping operations. As the names imply, vectorization of a tensor generates a vector, and matricization of a tensor generates a matrix. Tensors are vectorized by stacking mode-1 fibers in reverse lexicographical order, where this vectorization is denoted vec(X):
\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_n \times \cdots \times I_N} \;\longrightarrow\; \mathrm{vec}(\mathcal{X}) \in \mathbb{R}^{I_N I_{N-1} \cdots I_1}.
In mode-n tensor matricization, mode-n fibers become the columns of the resulting matrix. We note that the ordering of these columns is not consistent across the literature (Kolda & Bader, 2009; Kolda, 2006). In this letter, we use the reverse lexicographical order I_N ⋯ I_{n+1} I_{n-1} ⋯ I_1 for the column ordering in mode-n tensor matricization. In such reverse lexicographical order, I_1 varies the fastest and I_N varies the slowest. Let X_(n) denote the mode-n matricization of a tensor X:
\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_n \times \cdots \times I_N} \;\longrightarrow\; \mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times I_N \cdots I_{n+1} I_{n-1} \cdots I_1}.
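As a concrete illustration of these two reshaping operations, the following Matlab sketch (our own, with illustrative variable names) vectorizes a small third-order tensor and forms its mode-2 matricization; Matlab's column-major storage already corresponds to the reverse lexicographical ordering used here.

    % Sketch: vectorization and mode-n matricization of a small 3rd-order tensor.
    X = randn(3, 4, 2);                          % a 3 x 4 x 2 tensor
    x = X(:);                                    % vec(X): mode-1 fibers stacked, I1 varies fastest
    n = 2;                                       % matricize along mode 2
    sz = size(X); N = ndims(X);
    order = [n, 1:n-1, n+1:N];                   % bring mode n to the front
    Xn = reshape(permute(X, order), sz(n), []);  % mode-2 matricization, a 4 x 6 matrix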
Another important operation that could be performed on a tensor is its mode-n product, that is, its multiplication by a matrix along one of its modes. Let X ∈ ℝ^(I_1 × ⋯ × I_n × ⋯ × I_N) be a tensor and Φ^(n) ∈ ℝ^(J_n × I_n) be a matrix; then their mode-n product, Y_n ∈ ℝ^(I_1 × ⋯ × J_n × ⋯ × I_N), is denoted as
\mathcal{Y}_n = \mathcal{X} \times_n \Phi^{(n)},
(2.1)
where all mode-n fibers of the tensor are multiplied by the matrix Φ^(n). Equation 2.1 could also be written as Y_(n) = Φ^(n) X_(n), where Y_(n) and X_(n) are mode-n matricizations of the tensors Y and X, respectively. A mode-n product could be thought of as a linear transformation of the mode-n fibers of a tensor. Therefore, a multilinear transformation of a tensor X could be defined as
\mathcal{Y} = \mathcal{X} \times_1 \Phi^{(1)} \times_2 \Phi^{(2)} \times_3 \cdots \times_N \Phi^{(N)},
(2.2)
where Φ^(n); n ∈ {1, 2, …, N} are matrices with dimensions Φ^(n) ∈ ℝ^(J_n × I_n) and Y ∈ ℝ^(J_1 × ⋯ × J_n × ⋯ × J_N) (Hawe et al., 2013; Sidiropoulos et al., 2017). This multilinear transformation could also be written as a product of the Kronecker product of the matrices Φ^(n); n ∈ {1, 2, …, N} and vec(X) (Hawe et al., 2013; Sidiropoulos et al., 2017):
\mathrm{vec}(\mathcal{Y}) = \left( \Phi^{(N)} \otimes \Phi^{(N-1)} \otimes \cdots \otimes \Phi^{(1)} \right) \mathrm{vec}(\mathcal{X}).
(2.3)
We note that equation 2.3 is a linear system relating x = vec(X) and y = vec(Y). If the matrix Φ represents a separable dictionary, that is,
\Phi = \Phi^{(N)} \otimes \Phi^{(N-1)} \otimes \cdots \otimes \Phi^{(2)} \otimes \Phi^{(1)},
(2.4)
then equation 2.3 describes a representation of y = vec(Y) using the dictionary Φ, where x = vec(X) represents its coefficients (y = Φx). Similarly, we could think of equation 2.2 as a representation of the tensor Y using the dictionaries Φ^(n); n ∈ {1, 2, …, N}, where the tensor X represents its coefficients.
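The equivalence between the multilinear transformation in equation 2.2 and the Kronecker-product form in equation 2.3 can be checked numerically. The Matlab sketch below is our own illustration (the helper name mode_n_product is ours); it applies mode-n products to a small random tensor and compares the vectorized result with the product of the Kronecker dictionary and vec(X), which agree to machine precision.

    % Sketch: verify vec(Y) = (Phi3 kron Phi2 kron Phi1) * vec(X) on a small example.
    function demo_kronecker_identity()
        I = [3 4 2];  J = [5 6 3];
        Phi1 = randn(J(1), I(1)); Phi2 = randn(J(2), I(2)); Phi3 = randn(J(3), I(3));
        X = randn(I);
        Y = mode_n_product(mode_n_product(mode_n_product(X, Phi1, 1), Phi2, 2), Phi3, 3);
        err = norm(Y(:) - kron(Phi3, kron(Phi2, Phi1)) * X(:));   % ~1e-13
        fprintf('discrepancy = %g\n', err);
    end

    function Y = mode_n_product(X, A, n)
    % Mode-n product Y = X x_n A, computed via mode-n matricization.
        sz = size(X); N = ndims(X);
        order = [n, 1:n-1, n+1:N];
        Xn = reshape(permute(X, order), sz(n), []);     % mode-n fibers as columns
        szY = sz; szY(n) = size(A, 1);
        Y = ipermute(reshape(A * Xn, szY(order)), order);
    end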

2.2  Sparse Multilinear Least-Squares Problem

A sparse multilinear representation of equation 2.3 could be obtained by rewriting it as an Lp minimization problem (Wickramasingha et al., 2017),
\tilde{x} = \arg\min_{x} \left\| \left( \Phi^{(N)} \otimes \cdots \otimes \Phi^{(n)} \otimes \cdots \otimes \Phi^{(1)} \right) \mathrm{vec}(\mathcal{X}) - \mathrm{vec}(\mathcal{Y}) \right\|_2^2 + \lambda \left\| \mathrm{vec}(\mathcal{X}) \right\|_p,
(2.5)
where λ is a regularization parameter.
Alternatively, using equations 2.2 and 2.3, equation 2.5 could be written as
\tilde{\mathcal{X}} = \arg\min_{\mathcal{X}} \left\| \mathcal{X} \times_1 \Phi^{(1)} \times_2 \Phi^{(2)} \times_3 \cdots \times_N \Phi^{(N)} - \mathcal{Y} \right\|_2^2 + \lambda \left\| \mathcal{X} \right\|_p,
(2.6)
where Y ∈ ℝ^(J_1 × ⋯ × J_n × ⋯ × J_N) and X ∈ ℝ^(I_1 × ⋯ × I_n × ⋯ × I_N).

For the same value of p, the Lp minimization problems in equations 2.5 and 2.6 are equivalent, even though their formulations are in vector and tensor forms, respectively. We note that for 0 ≤ p < 1, equations 2.5 and 2.6 are nonconvex optimization problems, while for p ≥ 1, they are convex optimization problems. The sparsest solution of equations 2.5 and 2.6 would be obtained for p = 0, that is, the L0 constrained problem, but sparsity would be reduced as p increases.

The L0 minimization problem (p = 0) is a nonconvex optimization problem whose vector formulation, problem 2.5, could be solved approximately (i.e., nonglobally) using OMP or LARS (van den Berg & Friedlander, 2009). The L1 minimization problem (p = 1) is a convex optimization problem whose vector formulation, problem 2.5, could be solved exactly (i.e., globally) using BP or LARS (Donoho & Tsaig, 2008). However, OMP, BP, and LARS share a serious drawback: they are not suitable for solving very large sparse least-squares problems because they involve multiplying and inverting very large matrices.

As a way of overcoming this drawback, Caiafa and Cichocki (2012) proposed Kronecker-OMP, a tensor-based generalization of OMP for solving sparse multilinear least-squares problems. However, similar to OMP, Kronecker-OMP could obtain only a nonglobally optimal solution of the nonconvex L0 constrained sparse least-squares problem (Donoho & Tsaig, 2008).

3  Tensor Least Angle Regression (T-LARS)

Least angle regression (LARS) is a computationally efficient method to solve the L0 or L1 constrained minimization problem in vector form, problem 2.5, for all critical values of the regularization parameter λ (Efron et al., 2004). In this letter, we develop a generalization of LARS, tensor least angle regression (T-LARS), to solve large sparse tensor least-squares problems, for example, to obtain sparse representations of multidimensional signals using a separable dictionary, as described by equation 2.3. As shown below, our T-LARS calculations are performed without explicitly generating or inverting large matrices, thereby keeping its computational complexity and memory requirements relatively low. Both the T-LARS and Kronecker-OMP algorithms use the Schur complement inversion formula to update the inverses of large matrices without explicitly recomputing them (Caiafa & Cichocki, 2012).

3.1  Least Angle Regression

LARS solves the L0 or L1 constrained minimization problem in equation 2.5 for all critical values of the regularization parameter λ. It starts with a very large value of λ that results in an empty active columns matrix Φ_I and a solution x̃_{t=0} = 0. The set I denotes the active set of the dictionary Φ, that is, the column indices where the optimal solution x̃_t at iteration t is nonzero, and I^c denotes its corresponding inactive set. Therefore, Φ_I contains only the active columns of the dictionary Φ, and Φ_{I^c} contains only its inactive columns.

At each iteration t, a new column is either added (L0) to the active set I or added or removed (L1) from the active set I, and λ is reduced by a calculated value δt*. As a result of such iterations new solutions with an increased number of coefficients that follow a piecewise linear path are obtained until a predetermined residual error ɛ is obtained. One important characteristic of LARS is that the current solution at each iteration is the optimum sparse solution for the selected active columns.

Initialization of LARS includes setting the active set to an empty set, I = {}, the initial solution vector x̃_0 = 0, the initial residual vector r_0 = y, and the initial regularization parameter λ_1 = max(c_1), where c_1 = Φ^T r_0. The optimal solution x̃_t at any iteration t must satisfy the following two optimality conditions:
\left\| \Phi_{I^c}^T r_t \right\|_\infty \leq \lambda_t,
(3.1)
\Phi_{I}^T r_t = -\lambda_t z_t,
(3.2)
where r_t is the residual vector at iteration t, r_t = y − Φx̃_t, and z_t is the sign sequence of c_t on the active set I.

The condition in equation 3.2 ensures that the magnitude of the correlation between all active columns and the residual is equal to |λt| at each iteration, and the condition in equation 3.1 ensures that the magnitude of the correlation between the inactive columns and the residual is less than or equal to λt.

For an L1 constrained minimization problem, at each iteration, if an inactive column violates condition 3.1, it is added to the active set, and if an active column violates condition 3.2, it is removed from the active set. For an L0 constrained minimization problem only the columns that violate condition 3.1 are added to the active set at each iteration.

For a given active set I, the optimal solution x̃_t could be written as
\tilde{x}_t = \begin{cases} \left( \Phi_{I_t}^T \Phi_{I_t} \right)^{-1} \left( \Phi_{I_t}^T y - \lambda_t z_t \right), & \text{on } I \\ 0, & \text{otherwise,} \end{cases}
(3.3)
where z_t is the sign sequence of c_t on the active set I, and c_t = Φ^T r_{t−1} is the correlation vector of all columns of the dictionary Φ with the residual vector r_{t−1} at iteration t. (See algorithm 1 for the LARS algorithm.)
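To make the LARS iteration described above concrete, the following Matlab sketch (our own, illustrative only) implements the L0 variant of the homotopy, in which columns are only added to the active set; it omits the column-removal step of the L1 path and the Schur-complement and tensor-based updates that distinguish T-LARS, and it assumes a small dense dictionary Phi with unit-norm columns.

    function x = lars_l0_sketch(Phi, y, K)
    % Minimal LARS homotopy sketch (L0 variant): returns an at-most-K-sparse x.
        [~, P] = size(Phi);
        x = zeros(P, 1);
        c = Phi' * y;                                     % initial correlations, c1 = Phi' * r0
        [lambda, k] = max(abs(c));                        % initial regularization parameter
        I = k;                                            % active set
        for t = 1:K
            d  = (Phi(:, I)' * Phi(:, I)) \ sign(c(I));   % direction vector (cf. eq. 3.11)
            v  = Phi' * (Phi(:, I) * d);                  % v_t = Phi' * Phi_I * d_t (cf. eq. 3.12)
            Ic = setdiff(1:P, I);
            cand = [(lambda - c(Ic)) ./ (1 - v(Ic)); ...  % candidate step sizes (cf. eq. 3.18)
                    (lambda + c(Ic)) ./ (1 + v(Ic))];
            cand(cand <= 1e-12) = inf;                    % keep positive steps only
            [delta, j] = min(cand);
            delta = min(delta, lambda);                   % never step past lambda = 0
            x(I) = x(I) + delta * d;                      % cf. eq. 3.21
            c = c - delta * v;                            % cf. eq. 3.17
            lambda = lambda - delta;                      % cf. eq. 3.22
            if lambda <= 1e-12 || numel(I) >= K, break; end
            I = [I, Ic(mod(j - 1, numel(Ic)) + 1)];       % activate the column that reached lambda
        end
    end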

3.2  Tensor Least Angle Regression Algorithm

T-LARS is a generalization of LARS to solve the sparse multilinear least-squares problem in equation 2.6 using tensors and multilinear algebra. Unlike LARS, T-LARS does not calculate large matrices such as the Kronecker dictionary, Φ in equation 2.4, which is required in vectorized sparse multilinear least-squares problems. Instead, T-LARS uses much smaller mode-n dictionaries for calculations. A mapping between column indices of dictionary Φ and column indices of mode-n dictionaries Φ(n);n1,,N is essential in T-LARS calculations (see appendix A).

Required inputs to the T-LARS algorithm are the tensor YRJ1××Jn××JN, mode-n dictionary matrices Φ(n);n{1,,N}, and the stopping criterion as a residual tolerance ɛ or the maximum number of nonzero coefficients K (K-sparse representation). The output is the solution tensor XRI1××In××IN.

First, normalize the tensor Y and columns of each dictionary Φ(n);n1,,N to have a unit L2 norm. Note that normalizing columns of each dictionary Φ(n);n1,,N ensures normalization of the separable dictionary Φ in equation 2.4 (see appendix B). For notational simplicity in the following sections, we will use Y to represent the normalized tensor and Φ(n) to represent normalized dictionary matrices.

Gram matrices are used in several steps of T-LARS. For a large separable dictionary Φ, its Gram matrix G = Φ^T Φ would be large as well. Therefore, explicitly building this Gram matrix and using it in computations could be very inefficient for large problems. Instead, T-LARS uses the Gram matrices of the mode-n dictionary matrices Φ^(1), Φ^(2), …, Φ^(N), denoted G^(1), G^(2), …, G^(N). We can obtain a Gram matrix G^(n); n ∈ {1, …, N} for each mode-n dictionary Φ^(n) by
G^{(n)} = \Phi^{(n)T} \Phi^{(n)}.
(3.4)
The tensor C_1 is the correlation between the tensor Y and the mode-n dictionary matrices Φ^(n); n ∈ {1, …, N}:
\mathcal{C}_1 = \mathcal{Y} \times_1 \Phi^{(1)T} \times_2 \cdots \times_n \Phi^{(n)T} \times_{n+1} \cdots \times_N \Phi^{(N)T}.
(3.5)
The tensor C_1 could be calculated efficiently as N mode-n products, and the initial correlation vector is obtained by vectorizing C_1: c_1 = vec(C_1) (see appendix C).

T-LARS requires several parameters to be initialized before starting the iterations. The regularization parameter λ1 is initialized to the maximum value of the correlation vector c1, and the corresponding most correlated column ϕI1 from the separable dictionary Φ is added to the initial active set I. The initial residual tensor R0 is set to Y and the initial solution vector x0 and the initial direction vector d0 are set to 0. Initial step size δ0* is also set to 0. T-LARS starts the iterations at t=1 to run until a stopping criterion is reached:

  • Initial residual tensor: R0=Y

  • Initial solution vector: x˜0=0

  • Initial direction vector: d0=0

  • Initial step size: δ0*=0

  • Initial regularization parameter: λ1=max(c1)

  • Active set: I=ϕI1

  • Start iterations at t=1
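A minimal Matlab sketch of this initialization for a third-order problem is shown below; it reuses the mode_n_product helper sketched in section 2.1, and all variable names are ours, for illustration only (the actual implementation is algorithm 2).

    % Sketch: T-LARS initialization for N = 3 (illustrative variable names).
    Y    = Y / norm(Y(:));                             % normalize the data tensor
    Phi1 = Phi1 ./ sqrt(sum(Phi1.^2, 1));              % unit-norm columns, mode-1 dictionary
    Phi2 = Phi2 ./ sqrt(sum(Phi2.^2, 1));
    Phi3 = Phi3 ./ sqrt(sum(Phi3.^2, 1));
    G1 = Phi1' * Phi1;  G2 = Phi2' * Phi2;  G3 = Phi3' * Phi3;   % mode-n Gram matrices (eq. 3.4)
    C1 = mode_n_product(mode_n_product(mode_n_product(Y, Phi1', 1), Phi2', 2), Phi3', 3);
    c1 = C1(:);                                        % initial correlation vector (eq. 3.5)
    [lambda1, k1] = max(abs(c1));                      % initial lambda and first active column
    R0 = Y;  x0 = zeros(numel(c1), 1);                 % initial residual tensor and solution vector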

The following calculations are performed at every iteration t=1,2, of the T-LARS algorithm until the stopping criterion is reached.

3.2.1  Obtain the Inverse of the Gram Matrix of the Active Columns of the Dictionary

We obtain the Gram matrix of the active columns of the dictionary, G_t = Φ_{I_t}^T Φ_{I_t}, at each iteration t. The size of this Gram matrix either increases (dictionary column addition) or decreases (dictionary column removal) with each iteration t. Therefore, for computational efficiency, we use the Schur complement inversion formula to calculate G_t^{-1} from G_{t−1}^{-1}, thereby avoiding its full calculation (Caiafa & Cichocki, 2012; Goldfarb, 1972; Rosário, Monteiro, & Rodrigues, 2016).

Updating the Gram matrix after adding a new column k_a to the active set. Let the column k_a ∈ I be the new column added to the active matrix. Given G_{t−1}^{-1}, the inverse of the Gram matrix, G_t^{-1}, could be calculated using the Schur complement inversion formula for a symmetric block matrix (Björck, 2015; Boyd & Vandenberghe, 2010; Hager, 1989),
G_t^{-1} = \begin{bmatrix} F_{11}^{-1} & \alpha b \\ \alpha b^T & \alpha \end{bmatrix},
(3.6)
where F_{11}^{-1} = G_{t−1}^{-1} + αbb^T, b = −G_{t−1}^{-1} g_a, and α = 1/(g_{k_a,k_a} + g_a^T b), and the vector g_a^T is given by
g_a^T = \begin{bmatrix} g(k_1, k_a) & \cdots & g(k_n, k_a) & \cdots & g(k_{a-1}, k_a) \end{bmatrix}_{1 \times (a-1)}.
The elements g(k_n, k_a) of g_a^T are elements of the Gram matrix G_t that are obtained using the mode-n Gram matrices G^(n); n ∈ {1, …, N},
g(k_n, k_a) = g^{(N)}(k_n^N, k_a^N) \cdots g^{(1)}(k_n^1, k_a^1),
where k_n^N, …, k_n^1 are the tensor indices corresponding to the column index k_n, and k_a^N, …, k_a^1 are the tensor indices corresponding to the column index k_a (see appendix A).
Updating the Gram matrix after removing a column k_r from the active set. Let the column k_r ∈ I be the column removed from the active set. We move column k_r and row k_r of G_{t−1}^{-1} to become its last column and last row, respectively. We denote this new matrix as Ĝ_{t−1}^{-1}. By using the Schur complement inversion formula for a symmetric block matrix, the inverse Ĝ_{t−1}^{-1} could be interpreted as
\hat{G}_{t-1}^{-1} = \begin{bmatrix} F_{11}^{-1} & \alpha b \\ \alpha b^T & \alpha \end{bmatrix}_{N \times N},
(3.7)
where F_{11}^{-1} = G_t^{-1} + αbb^T. Therefore, we could calculate the inverse of the Gram matrix at iteration t as (Goldfarb, 1972; Rosário et al., 2016)
G_t^{-1} = F_{11}^{-1} - \alpha b b^T.
(3.8)
Both F_{11}^{-1} and αbb^T could be easily obtained from Ĝ_{t−1}^{-1} as follows (Matlab notation):
F_{11}^{-1} = \hat{G}_{t-1}^{-1}(1:N-1, \, 1:N-1),
(3.9)
\alpha b b^T = \hat{G}_{t-1}^{-1}(1:N-1, \, N) \; \hat{G}_{t-1}^{-1}(N, \, 1:N-1) \, / \, \hat{G}_{t-1}^{-1}(N, N).
(3.10)
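A minimal Matlab sketch of these two inverse updates follows (function names are ours, for illustration); the caller is assumed to supply the cross inner products g_a and, for removal, to have already permuted the removed column and row of the previous inverse to the last position. The result of schur_add can be checked against inv([G, g_a; g_a', g_aa]) on a small random positive-definite example.

    function Ginv_new = schur_add(Ginv, g_a, g_aa)
    % Update inv(G) after appending one active column (eq. 3.6).
    %   Ginv : inverse of the current k x k Gram matrix of the active columns
    %   g_a  : k x 1 inner products of the active columns with the new column
    %   g_aa : inner product of the new column with itself
        b        = -Ginv * g_a;
        alpha    = 1 / (g_aa + g_a' * b);
        Ginv_new = [Ginv + alpha * (b * b'), alpha * b; ...
                    alpha * b',              alpha];
    end

    function Ginv_new = schur_remove(Ginv_hat)
    % Update inv(G) after removing one active column (eqs. 3.8-3.10), assuming the
    % removed column/row has already been moved to the last position of Ginv_hat.
        m        = size(Ginv_hat, 1);
        F11inv   = Ginv_hat(1:m-1, 1:m-1);                                    % eq. 3.9
        abbT     = Ginv_hat(1:m-1, m) * Ginv_hat(m, 1:m-1) / Ginv_hat(m, m);  % eq. 3.10
        Ginv_new = F11inv - abbT;                                             % eq. 3.8
    end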

3.2.2  Obtain Direction Vector dt

The direction vector along which the solution x̃ moves in a piecewise linear fashion, when an active column is added to or removed from the active set, is given by
d_t = G_t^{-1} z_t,
(3.11)
where z_t = sign(c_t(I)), that is, the sign sequence of the correlation vector over the active set.

3.2.3  Obtain vt

A vector v_t could be defined as
v_t = \Phi^T \Phi_{I_t} d_t.
(3.12)
This vector v_t could be efficiently obtained as a multilinear transformation of a direction tensor D_t by the Gram matrices G^(n); n ∈ {1, …, N},
\mathcal{V}_t = \mathcal{D}_t \times_1 G^{(1)} \times_2 \cdots \times_n G^{(n)} \times_{n+1} \cdots \times_N G^{(N)},
(3.13)
where vec(D_t)(I) = d_t and D_t(I^c) = 0. We note that vec(V_t) = v_t.
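For a third-order problem, this computation can be sketched in a few lines of Matlab (our own illustration; D_sz, active_idx, d_t, and the Gram matrices G1, G2, G3 are assumed inputs, and mode_n_product is the helper sketched in section 2.1).

    % Sketch: form v_t without ever building the separable dictionary Phi.
    D = zeros(D_sz);                     % D_sz = [I1 I2 I3], direction tensor
    D(active_idx) = d_t;                 % scatter the direction vector onto the active set
    V = mode_n_product(mode_n_product(mode_n_product(D, G1, 1), G2, 2), G3, 3);
    v_t = V(:);                          % equals Phi' * Phi_I * d_t (eq. 3.12)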

3.2.4  Obtain the Correlation Vector ct

Because c_1 would be obtained at initialization, the following calculations are needed only for iterations t ≥ 2. The correlation vector c_t is given by
c_t = \Phi^T \mathrm{vec}(\mathcal{R}_{t-1}),
(3.14)
where R_{t−1} is the residual tensor from the previous iteration.
Since
\mathrm{vec}(\mathcal{R}_{t-1}) = \mathrm{vec}(\mathcal{R}_{t-2}) - \delta_{t-1}^{*} \Phi_{I_{t-1}} d_{t-1},
(3.15)
we could update the correlation vector c_t by
c_t = \Phi^T \mathrm{vec}(\mathcal{R}_{t-2}) - \delta_{t-1}^{*} \Phi^T \Phi_{I_{t-1}} d_{t-1}.
(3.16)
Substituting equation 3.12 into equation 3.16, we obtain an update for the correlation vector:
c_t = c_{t-1} - \delta_{t-1}^{*} v_{t-1}.
(3.17)

3.2.5  Calculate Step Size δ*

The minimum step size for adding a new column to the active set is given by
\delta_t^{+} = \min_{i \in I^c} \left\{ \frac{\lambda_t - c_t(i)}{1 - v_t(i)}, \; \frac{\lambda_t + c_t(i)}{1 + v_t(i)} \right\}.
(3.18)
The minimum step size for removing a column from the active set is given by
\delta_t^{-} = \min_{i \in I} \left\{ \frac{-\tilde{x}_{t-1}(i)}{d_t(i)} \right\}.
(3.19)
Therefore, the minimum step size for the L1 constrained sparse least-squares problem is
\delta_t^{*} = \min\left( \delta_t^{+}, \delta_t^{-} \right).
(3.20)
For the L0 constrained sparse least-squares problem, only new columns are added to the active set at every iteration. Therefore, the minimum step size for the L0 constrained sparse least-squares problem is δ_t^* = δ_t^+.

3.2.6  Update the Solution x˜t, Regularization Parameter, and Residual

The current solution x̃_t = vec(X̃_t) is given by
\tilde{x}_t = \tilde{x}_{t-1} + \delta_t^{*} d_t.
(3.21)
Update λ_{t+1} and R_{t+1} for the next iteration:
\lambda_{t+1} = \lambda_t - \delta_t^{*}.
(3.22)
The residual tensor R_{t+1} for the next iteration (t + 1) can be calculated as
\mathcal{R}_{t+1} = \mathcal{R}_t - \delta_t^{*} \, \mathcal{D}_t \times_1 \Phi^{(1)} \times_2 \Phi^{(2)} \times_3 \cdots \times_N \Phi^{(N)}.
(3.23)

3.2.7  Check Stopping Criterion

Check whether either of the following stopping criteria has been reached:
\left\| \mathcal{R}_{t+1} \right\|_2 < \varepsilon
(3.24)
or length(I) ≥ K.
(3.25)

The T-LARS algorithm solves the sparse tensor least-squares problem in equation 2.6 to obtain a sparse solution X ∈ ℝ^(I_1 × ⋯ × I_n × ⋯ × I_N) for a tensor Y ∈ ℝ^(J_1 × ⋯ × J_n × ⋯ × J_N) using N mode-n dictionaries Φ^(n) ∈ ℝ^(J_n × I_n); n ∈ {1, …, N}. The tensor Y and the N mode-n dictionaries Φ^(1), Φ^(2), …, Φ^(N) are the inputs to the T-LARS algorithm, where N ≥ 1. The T-LARS algorithm can be used to solve underdetermined, square, or overdetermined sparse tensor least-squares problems, where the mode-n dictionaries Φ^(n) ∈ ℝ^(J_n × I_n); n ∈ {1, …, N} are overcomplete dictionaries (J_n < I_n), complete dictionaries (J_n = I_n), or undercomplete dictionaries (J_n > I_n), respectively.

The complete T-LARS algorithm is summarized in algorithm 2 (using Matlab notation). This T-LARS algorithm solves the Lp sparse separable least-squares problem when the T-LARS_mode is set to Lp.

4  Algorithm Computational Complexity

In this section, we analyze the computational complexity of our T-LARS algorithm and compare it with the computational complexity of Kronecker-OMP. We show that the computational complexity of T-LARS is significantly lower than that of Kronecker-OMP when solving sparse tensor least-squares problems.

4.1  The Computational Complexity of T-LARS

Let K be the number of iterations used in T-LARS, P = I_1 × ⋯ × I_n × ⋯ × I_N be the number of atoms in the Kronecker dictionary Φ, and Q = J_1 × ⋯ × J_n × ⋯ × J_N be the total number of elements in the tensor Y. In a typical sparse solution obtained by T-LARS, K ≪ P. In the following analysis, we refer to algorithm 2.

Step 1 of the T-LARS algorithm runs only once and has a computational complexity of
I_1 Q + \frac{I_1 I_2 Q}{J_1} + \cdots + \frac{P Q}{J_1 \cdots J_{n-1} \, I_{n+1} \cdots I_N} + \cdots + P J_N.
Step 10 of the T-LARS algorithm obtains the inverse Gram matrix using Schur complement inversion and has complexity O(Pk + k^3), where k is the iteration number, whose maximum value is K. The computational complexity of step 10 for column addition is PK + 4∑_{k=1}^{K} k^2, and the computational complexity of step 10 for column removal is 2∑_{k=1}^{K} k^2. Therefore, the maximum computational complexity of step 10 is given by
P K + 4 \sum_{k=1}^{K} k^2.
Steps 13 and 27 of the T-LARS algorithm involve multilinear transformations. In both steps, D_t is a sparse tensor with at most k nonzero entries at any iteration k. Therefore, for K iterations, the computational complexities of steps 13 and 27 are
2 \sum_{k=1}^{K} \sum_{n=1}^{N} k I_n
and
2 \sum_{k=1}^{K} \sum_{n=1}^{N} k J_n + 2 K Q,
respectively.

4.1.1  Case of Overcomplete mode-n Dictionaries

For overcomplete mode-n dictionaries Φ^(n) ∈ ℝ^(J_n × I_n); J_n < I_n, n ∈ {1, …, N}, step 13 of the T-LARS algorithm has a higher computational complexity than step 27. Therefore, the computational complexity of the most computationally intensive steps of the T-LARS algorithm is less than
I_1 Q + \frac{I_1 I_2 Q}{J_1} + \cdots + \frac{P Q}{J_1 \cdots J_{n-1} \, I_{n+1} \cdots I_N} + \cdots + P J_N + P K + 4 \sum_{k=1}^{K} k^2 + 2 \sum_{k=1}^{K} \sum_{n=1}^{N} k I_n + 2 K Q.
(4.1)

4.2  Comparison of Computational Complexities of Kronecker-OMP and T-LARS

Caiafa and Cichocki (2012) earlier analyzed the computational complexity of Kronecker-OMP to solve problem 2.6 given Y ∈ ℝ^(J_1 × ⋯ × J_n × ⋯ × J_N), J_n = J; n ∈ {1, …, N}, and mode-n dictionaries Φ^(n) ∈ ℝ^(J × I). From Caiafa and Cichocki (2012), after K iterations, the combined computational complexity of Kronecker-OMP is given by
2 I^N J \frac{1 - (J/I)^N}{1 - J/I} K + 2 I^N K + 7 \sum_{k=1}^{K} k^{2N} + 2 N J + N + 4 \sum_{k=1}^{K} k^N + N(N-1) + 3 K J^N.
(4.2)
To obtain the computational complexity of T-LARS to solve problem 2.6 given Y ∈ ℝ^(J_1 × ⋯ × J_n × ⋯ × J_N), J_n = J; n ∈ {1, …, N}, and mode-n dictionaries Φ^(n) ∈ ℝ^(J × I), we substitute I_n = I and J_n = J; n ∈ {1, …, N} in equation 4.1 to obtain
2 I^N J \frac{1 - (J/I)^N}{1 - J/I} + I^N K + 4 \sum_{k=1}^{K} k^2 + 2 N I \sum_{k=1}^{K} k + 2 K J^N.
(4.3)

Table 1 shows a term-by-term comparison of the computational complexity of Kronecker-OMP and T-LARS given in equations 4.2 and 4.3, respectively.

Table 1:
Term-by-Term Comparison of the Computational Complexity of Kronecker-OMP and T-LARS Given in Equations 4.2 and 4.3.
Term | Kronecker-OMP | T-LARS
First term | 2 I^N J (1 − (J/I)^N)/(1 − J/I) K | 2 I^N J (1 − (J/I)^N)/(1 − J/I)
Second term | 2 I^N K + 7 ∑_{k=1}^{K} k^{2N} | I^N K + 4 ∑_{k=1}^{K} k^2
Third term | 2NJ + N + 4 ∑_{k=1}^{K} k^N | 2NI ∑_{k=1}^{K} k
Fourth term | N(N − 1) + 3 K J^N | 2 K J^N

On comparing equations 4.2 and 4.3, we note that the first term of the computational complexity of T-LARS is K times lower than the first term of the computational complexity of Kronecker-OMP:
2 I^N J \frac{1 - (J/I)^N}{1 - J/I} < 2 I^N J \frac{1 - (J/I)^N}{1 - J/I} K.
(4.4)
On comparing equations 4.2 and 4.3, we note that the second term of the computational complexity of T-LARS is O(I^N K + K^3), while the second term of the computational complexity of Kronecker-OMP is O(I^N K + K^{2N+1}). Therefore, for N ≥ 2 and the same number of iterations,
I^N K + 4 \sum_{k=1}^{K} k^2 < 2 I^N K + 7 \sum_{k=1}^{K} k^{2N}.
(4.5)
On comparing equations 4.2 and 4.3, we note that the third term of the computational complexity of T-LARS is O(K^2), while the third term of the computational complexity of Kronecker-OMP is O(K^{N+1}). Therefore, for N ≥ 2 and the same number of iterations,
2 N I \sum_{k=1}^{K} k < 2 N J + N + 4 \sum_{k=1}^{K} k^N.
(4.6)
On comparing equations 4.2 and 4.3, we note that the fourth terms of the computational complexity of both T-LARS and Kronecker-OMP are O(J^N). Therefore,
2 K J^N < N(N-1) + 3 K J^N.
(4.7)
Therefore, from equations 4.4 to 4.7, we observe that the computational complexity of our T-LARS algorithm is significantly lower than that of Kronecker-OMP when solving sparse tensor least-squares problems with N ≥ 2 and the same number of iterations.

For multidimensional problems (N ≥ 2), typically K ≫ I; therefore, the second terms of the computational complexities of both T-LARS and Kronecker-OMP dominate all other terms. Therefore, for K iterations, the asymptotic computational complexities of T-LARS and Kronecker-OMP are O(I^N K + K^3) and O(I^N K + K^{2N+1}), respectively.
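A quick numerical comparison of these two asymptotic estimates can be sketched in Matlab; the sizes below are hypothetical and chosen only to illustrate how fast the gap grows with K and N.

    % Sketch: compare the dominant asymptotic terms from section 4.2.
    I = 100; N = 3; K = 1000;                 % hypothetical problem sizes
    tlars_dominant = I^N * K + K^3;           % O(I^N K + K^3)
    komp_dominant  = I^N * K + K^(2*N + 1);   % O(I^N K + K^(2N+1))
    ratio = komp_dominant / tlars_dominant;   % grows rapidly with K and N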

5  Experimental Results

In this section, we present experimental results to compare the performance of Kronecker-OMP and T-LARS when used to obtain sparse representations of 3D brain images using both fixed and learned mode-n overcomplete dictionaries.

5.1  Experimental Data Sets

For our computational experiments, we obtained a 3D MRI brain image and a 3D PET-CT brain image from publicly available data sets.

Our 3D MRI brain image consists of 175×150×10 voxels and was obtained from the OASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset for Normal Aging and Alzheimer's Disease (LaMontagne et al., 2018). This image shows a region in the brain of a 38-year-old male patient with a tumor in his right frontal lobe.

Our 3D PET-CT brain image consists of 180×160×10 voxels and was obtained from the Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection (Clark et al., 2013). This image shows a region in the brain of a 38-year-old female patient.

5.2  Experimental Setup

We compared the performance of T-LARS and Kronecker-OMP when used to obtain different sparse representations for our 3D MRI and PET-CT brain images by solving L0 and L1 constrained sparse multilinear least-squares problems using both fixed and learned overcomplete dictionaries.

Our fixed mode-n overcomplete dictionaries were a union of a discrete cosine transform (DCT) dictionary and a Symlet wavelet packet with four vanishing moments dictionary. In this case of using fixed mode-n dictionaries, we obtained the required 3D sparse representations by solving either the 3D L0 or L1 minimization problem.

Our learned mode-n overcomplete dictionaries were learned using the tensor method of optimal directions (T-MOD) algorithm (Roemer, Del Galdo, & Haardt, 2014). We used T-MOD to learn three overcomplete mode-n dictionaries, Φ^(1) ∈ ℝ^(32×38), Φ^(2) ∈ ℝ^(32×38), and Φ^(3) ∈ ℝ^(10×12), using random patches of 32×32×10 voxels with a 10% overlap from either one of the 3D brain images that we used, MRI or PET-CT (Elad & Aharon, 2006; Zhai, Zhang, Lv, Fu, & Yu, 2018). In this case of using learned dictionaries, we obtained the required 3D sparse representations by solving either a 4D (due to the use of image patches) L0 or L1 minimization problem. For a fair comparison of the performance of T-LARS and Kronecker-OMP, we generated our results using two learned dictionaries, Φ_KOMP and Φ_TLARS, that were obtained using Kronecker-OMP and T-LARS, respectively, as the sparse coding algorithm in T-MOD.

To compare the performance of T-LARS and Kronecker-OMP when used to solve L0 and L1 constrained sparse multilinear least-squares problems, we designed the following experiments to obtain sparse representations of our 3D brain images under different conditions:

  1. Experiment 1: Fixed mode-n dictionaries—3D L0 minimization problem

  2. Experiment 2: Learned mode-n dictionaries ΦKOMP—4D L0 minimization problem

  3. Experiment 3: Learned mode-n dictionaries ΦTLARS—4D L0 minimization problem

  4. Experiment 4: Fixed mode-n dictionaries—3D L1 minimization problem

  5. Experiment 5: Learned mode-n dictionaries ΦTLARS—4D L1 minimization problem

All of our experimental results were obtained using a Matlab implementation of T-LARS and Kronecker-OMP on an MS Windows machine: 2 Intel Xeon E5-2637 v4 CPUs at 3.5 GHz, 32 GB RAM, and an NVIDIA Tesla P100 GPU with 12 GB memory.

5.3  Experimental Results for 3D MRI Brain Images

In this section, we compare the performance of T-LARS and Kronecker-OMP in obtaining K-sparse representations of our 3D MRI brain image, Y, of 175×150×10 voxels, by solving the L0 constrained sparse tensor least-squares problem. We also obtained similar K-sparse representations using T-LARS by solving the L1 optimization problem. Table 2 summarizes our results for the five experiments described in section 5.2. In all experiments, the algorithms were stopped when the number of nonzero coefficients K reached 13,125, which is 5% of the number of elements in Y. We note that in Table 2, the number of iterations for the L1 optimization problems is larger than K because, as shown in algorithm 2, at each iteration T-LARS could either add or remove nonzero coefficients to or from the solution.

Table 2:
Summary of Experimental Results for Our 3D MRI Brain Image.
Experiment | Image Size | Optimization Problem | Dictionary Type | Number of Iterations | Computation Time (sec), K-OMP | Computation Time (sec), T-LARS
1 | 175 × 150 × 10 | L0 | Fixed | 13,125 | 20,144 | 434
2 | 32 × 32 × 10 × 36 | L0 | Learned Φ_KOMP | 13,125 | 25,002 | 394
3 | 32 × 32 × 10 × 36 | L0 | Learned Φ_TLARS | 13,125 | 22,646 | 400
4 | 175 × 150 × 10 | L1 | Fixed | 14,216 | -- | 495
5 | 32 × 32 × 10 × 36 | L1 | Learned Φ_TLARS | 14,856 | -- | 490

5.3.1  Experiment 1

Figures 1 and 2 show the experimental results for representing our 3D MRI brain image using three fixed mode-n overcomplete dictionaries, Φ^(1) ∈ ℝ^(175×351), Φ^(2) ∈ ℝ^(150×302), and Φ^(3) ∈ ℝ^(10×26), by solving the L0 minimization problem using both T-LARS and Kronecker-OMP. The residual error of the reconstructed 3D image obtained using T-LARS was ‖R‖_2 = 0.0839 (8.39%); for Kronecker-OMP, it was ‖R‖_2 = 0.0624 (6.24%).

Figure 1:

Experiment 1. Original 3D MRI brain image (a), its reconstruction using 5% nonzero coefficients (K=13,125) obtained by Kronecker-OMP (b), and T-LARS (c) using fixed mode-n overcomplete dictionaries.


Figure 2:

Experiment 1. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients obtained by applying Kronecker-OMP and T-LARS to our 3D MRI brain image and using fixed mode-n overcomplete dictionaries.


5.3.2  Experiments 2 and 3

Figures 3 and 4 show the experimental results for representing our 3D MRI brain image using our learned overcomplete dictionaries, Φ_KOMP and Φ_TLARS, by solving the L0 minimization problem using both T-LARS and Kronecker-OMP. For the Φ_KOMP dictionary, the residual error of the reconstructed 3D image obtained using T-LARS was ‖R‖_2 = 0.1368 (13.68%), and for Kronecker-OMP, it was ‖R‖_2 = 0.1143 (11.43%). For the Φ_TLARS dictionary, the residual error of the reconstructed 3D image obtained using T-LARS was ‖R‖_2 = 0.1127 (11.27%), and for Kronecker-OMP, it was ‖R‖_2 = 0.0955 (9.55%).

Figure 3:

Experiments 2 and 3. Original 3D MRI brain image (a), its reconstructions using 5% nonzero coefficients (K=13,125), (b–e), the difference images, (f–i) obtained using Kronecker-OMP and T-LARS, using our learned overcomplete dictionaries.


Figure 4:

Experiments 2 and 3. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying Kronecker-OMP and T-LARS to our 3D MRI brain image and using our learned overcomplete dictionaries.


5.3.3  Experiment 4

Figures 5 and 6 show the experimental results for representing our 3D MRI brain image using three fixed mode-n overcomplete dictionaries, Φ^(1) ∈ ℝ^(175×351), Φ^(2) ∈ ℝ^(150×302), and Φ^(3) ∈ ℝ^(10×26), by solving the L1 minimization problem using T-LARS. The residual error of the reconstructed 3D image obtained using T-LARS was ‖R‖_2 = 0.121 (12.1%).

Figure 5:

Experiment 4. Original 3D MRI brain image (a), its reconstruction using 5% nonzero coefficients (K=13,125) obtained by T-LARS using fixed mode-n overcomplete dictionaries (b), and the difference image (c).


Figure 6:

Experiment 4. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying T-LARS to our 3D MRI brain image and using fixed mode-n overcomplete dictionaries.


5.3.4  Experiment 5

Figures 7 and 8 show the experimental results for representing our 3D MRI brain image using our learned overcomplete dictionary Φ_TLARS, by solving the L1 minimization problem using T-LARS. The residual error of the reconstructed 3D image was ‖R‖_2 = 0.138 (13.8%).

Figure 7:

Experiment 5. Original 3D MRI brain image (a) and its reconstruction using 5% nonzero coefficients (K=13,125) obtained by T-LARS using our learned overcomplete dictionary (b) and the difference image (c).


Figure 8:

Experiment 5. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients obtained by applying T-LARS to our 3D MRI brain image and using our learned overcomplete dictionary.


5.4  Experimental Results for 3D PET-CT Brain Images

In this section, we compare the performance of T-LARS and Kronecker-OMP in obtaining K-sparse representations of our 3D PET-CT brain image, Y, of 180×160×10 voxels, by solving the L0 constrained sparse tensor least-squares problem. We also obtained similar K-sparse representations using T-LARS by solving the L1 optimization problem. Table 3 summarizes our results for the five experiments. In all experiments, the algorithms were stopped when the number of nonzero coefficients K reached 14,400, which is 5% of the number of elements in Y. We note that in Table 3, the number of iterations for the L1 optimization problems is larger than K because, as shown in algorithm 2, at each iteration T-LARS could either add or remove nonzero coefficients to or from the solution.

Table 3:
Summary of Experimental Results for Our 3D PET-CT Brain Image.
Experiment | Image Size | Optimization Problem | Dictionary Type | Number of Iterations | Computation Time (sec), K-OMP | Computation Time (sec), T-LARS
1 | 180 × 160 × 10 | L0 | Fixed | 14,400 | 29,529 | 505
2 | 32 × 32 × 10 × 42 | L0 | Learned Φ_KOMP | 14,400 | 33,453 | 476
3 | 32 × 32 × 10 × 42 | L0 | Learned Φ_TLARS | 14,400 | 31,083 | 490
4 | 180 × 160 × 10 | L1 | Fixed | 16,059 | -- | 591
5 | 32 × 32 × 10 × 42 | L1 | Learned Φ_TLARS | 18,995 | -- | 744

5.4.1  Experiment 1

Figures 9 and 10 show the experimental results for representing our 3D PET-CT brain image using three fixed mode-n overcomplete dictionaries, Φ^(1) ∈ ℝ^(180×364), Φ^(2) ∈ ℝ^(160×320), and Φ^(3) ∈ ℝ^(10×26), by solving the L0 minimization problem using both T-LARS and Kronecker-OMP. The residual error of the reconstructed 3D image obtained using T-LARS was ‖R‖_2 = 0.054 (5.4%), and for Kronecker-OMP, it was ‖R‖_2 = 0.0368 (3.68%).

Figure 9:

Experiment 1. Original PET-CT brain image (a) and its reconstruction using 5% nonzero coefficients (K=14,400) obtained by Kronecker-OMP (b) and T-LARS (c) using fixed mode-n overcomplete dictionaries.


Figure 10:

Experiment 1. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying Kronecker-OMP and T-LARS to our 3D PET-CT brain image and using fixed mode-n overcomplete dictionaries.


5.4.2  Experiments 2 and 3

Figures 11 and 12 show the experimental results for representing our 3D PET-CT brain image using our learned overcomplete dictionaries, Φ_KOMP and Φ_TLARS, by solving the L0 minimization problem using both T-LARS and Kronecker-OMP. For the Φ_KOMP dictionary, the residual error of the reconstructed 3D image obtained using T-LARS was ‖R‖_2 = 0.096 (9.6%), and for Kronecker-OMP, it was ‖R‖_2 = 0.077 (7.7%). For the Φ_TLARS dictionary, the residual error of the reconstructed 3D image obtained using T-LARS was ‖R‖_2 = 0.0877 (8.77%), and for Kronecker-OMP, it was ‖R‖_2 = 0.0722 (7.22%).

Figure 11:

Experiments 2 and 3. Original 3D PET-CT brain image (a), its reconstructions using 5% nonzero coefficients (K=14,400) (b–e), the difference images (f–i) obtained using Kronecker-OMP and T-LARS, using our learned overcomplete dictionaries.


Figure 12:

Experiments 2 and 3. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying Kronecker-OMP and T-LARS to our 3D PET-CT brain image and using our learned overcomplete dictionaries.


5.4.3  Experiment 4

Figures 13 and 14 show the experimental results for representing our 3D PET-CT brain image using three fixed mode-n overcomplete dictionaries, Φ^(1) ∈ ℝ^(180×364), Φ^(2) ∈ ℝ^(160×320), and Φ^(3) ∈ ℝ^(10×26), by solving the L1 minimization problem using T-LARS. The residual error of the reconstructed 3D image obtained using T-LARS was ‖R‖_2 = 0.0838 (8.38%).

Figure 13:

Experiment 4. Original 3D PET-CT brain image (a), its reconstruction using 5% nonzero coefficients (K=14,400) obtained by T-LARS using fixed mode-n overcomplete dictionaries (b), and the difference image (c).


Figure 14:

Experiment 4. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying T-LARS to our 3D PET-CT brain image and using fixed mode-n overcomplete dictionaries.


5.4.4  Experiment 5

Figures 15 and 16 show the experimental results for representing our 3D PET-CT brain image using our learned overcomplete dictionary Φ_TLARS, by solving the L1 minimization problem using T-LARS. The residual error of the reconstructed 3D image obtained using T-LARS was ‖R‖_2 = 0.106 (10.6%).

Figure 15:

Experiment 5. Original 3D PET-CT brain image (a) and its reconstruction using 5% nonzero coefficients (K=14,400) obtained by T-LARS using our learned overcomplete dictionary (b) and the difference image (c).


Figure 16:

Experiment 5. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients obtained by applying T-LARS to our 3D PET-CT brain image and using our learned overcomplete dictionary.


6  Conclusion

In this letter, we developed tensor least angle regression (T-LARS), a generalization of least angle regression to efficiently solve large L0 or large L1 constrained multidimensional (tensor) sparse least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter λ. An earlier generalization of OMP, Kronecker-OMP was developed to solve the L0 problem for large multidimensional sparse least-squares problems. To demonstrate the validity and performance of our T-LARS algorithm, we successfully used it to obtain different K-sparse signal representations of two 3D brain images, using fixed and learned separable overcomplete dictionaries, by solving 3D and 4D, L0 and L1 constrained sparse least-squares problems. Our numerical experiments demonstrate that our T-LARS algorithm is significantly faster (46 to 70 times) than Kronecker-OMP in obtaining K-sparse solutions for multilinear least-squares problems. However, the K-sparse solutions obtained using Kronecker-OMP always have a slightly lower residual error (1.55% to 2.25%) than those obtained by T-LARS. These numerical results confirm our analysis in section 4.2 that the computational complexity of T-LARS is significantly lower than the computational complexity of Kronecker-OMP. In future work we plan to exploit this significant computational efficiency of T-LARS to develop more computationally efficient Kronecker dictionary learning methods. A Matlab GPU-based implementation of our Tensor Least Angle Regression (T-LARS) algorithm, Algorithm 2, is available at https://github.com/SSSherif/Tensor-Kronecker-Least-Angle-Regression.

Appendix A:  Mapping of Column Indices of Dictionary Φ to Column Indices of Mode-n Dictionaries

T-LARS avoids the construction of large matrices such as the separable dictionary Φ in equation 2.4. Instead, T-LARS uses mode-n dictionaries for calculations. The following mapping between column indices of dictionary Φ and column indices of mode-n dictionaries Φ(n);n1,,N is essential in T-LARS calculations.

The arbitrary column Φ_k is the kth column of the separable dictionary Φ in equation 2.4, which is given by the Kronecker product of columns of the dictionary matrices Φ^(1), Φ^(2), …, Φ^(N):
\Phi_k = \phi_{i_N}^{(N)} \otimes \phi_{i_{N-1}}^{(N-1)} \otimes \cdots \otimes \phi_{i_1}^{(1)}.
(A.1)
The column indices (i_N, i_{N−1}, …, i_1) are the indices of the columns of the dictionary matrices Φ^(1), Φ^(2), …, Φ^(N). The column index k of the separable dictionary Φ is given by (Kolda, 2006)
k = i_1 + \sum_{n=2}^{N} (i_n - 1) \, I_1 I_2 \cdots I_{n-1},
(A.2)
where I_1, I_2, …, I_N are the numbers of columns of the dictionary matrices Φ^(1), Φ^(2), …, Φ^(N), respectively. The following proposition shows how to obtain the column indices of the dictionary matrices (i_N, i_{N−1}, …, i_1) that correspond to the column index k of the separable dictionary Φ.
Proposition 1.
Let k be the column index of the separable dictionary column vector Φ_k, and let I_n be the number of columns of each dictionary matrix Φ^(n); n ∈ {1, …, N}. In equation A.1, the corresponding column indices i_n; n ∈ {1, …, N} of each dictionary column φ_{i_n}^{(n)} are given by
i_n = \left\lceil \frac{k}{I_1 \times \cdots \times I_{n-1}} \right\rceil - \sum_{p=n+1}^{N} (i_p - 1) \prod_{q=n}^{p-1} I_q,
(A.3)
where ⌈·⌉ indicates the ceiling function. For example,
i_1 = k - (i_N - 1) I_{N-1} \times \cdots \times I_1 - \cdots - (i_2 - 1) I_1, \qquad i_{N-1} = \left\lceil \frac{k}{I_1 \times \cdots \times I_{N-2}} \right\rceil - (i_N - 1) I_{N-1}, \qquad i_N = \left\lceil \frac{k}{I_1 \times \cdots \times I_{N-1}} \right\rceil.
Proof.

We note that i_n; n ∈ {1, 2, …, N} are integers and 1 ≤ i_n ≤ I_n.

From equation A.2,
k = i_1 + (i_2 - 1) I_1 + \cdots + (i_N - 1) I_1 \times \cdots \times I_{N-1}.
(A.4)
Therefore,
i_N - 1 = \underbrace{\frac{k}{I_1 \times \cdots \times I_{N-1}}}_{S_N} - \underbrace{\frac{i_1 + (i_2 - 1) I_1 + \cdots + (i_{N-1} - 1) I_1 \cdots I_{N-2}}{I_1 \times \cdots \times I_{N-1}}}_{f_N}, \qquad i_N = S_N + 1 - f_N.
(A.5)
Since i_n \leq I_n; n ∈ {1, 2, …, N},
f_N \leq \frac{I_1 + (I_2 - 1) I_1 + \cdots + (I_{N-1} - 1) I_1 \cdots I_{N-2}}{I_1 \times \cdots \times I_{N-1}} = 1.
(A.6)
Also, since 1 ≤ i_n; n ∈ {1, 2, …, N}, we have
f_N \geq \frac{1 + (1 - 1) I_1 + \cdots + (1 - 1) I_1 \cdots I_{N-2}}{I_1 \times \cdots \times I_{N-1}} > 0.
(A.7)
Therefore, from equations A.6 and A.7,
0 < f_N \leq 1,
(A.8)
0 \leq 1 - f_N < 1.
(A.9)
Since i_N ∈ ℤ, from equations A.5 and A.9, we have
i_N = \lceil i_N - (1 - f_N) \rceil = \lceil S_N \rceil, \qquad \text{that is,} \qquad i_N = \left\lceil \frac{k}{I_1 \times \cdots \times I_{N-1}} \right\rceil,
(A.10)
where ⌈·⌉ indicates the ceiling function.
Similarly,
i_{N-1} - 1 = \underbrace{\frac{k}{I_1 \times \cdots \times I_{N-2}} - (i_N - 1) I_{N-1}}_{S_{N-1}} - \underbrace{\frac{i_1 + (i_2 - 1) I_1 + \cdots + (i_{N-2} - 1) I_1 \cdots I_{N-3}}{I_1 \times \cdots \times I_{N-2}}}_{f_{N-1}}, \qquad i_{N-1} = S_{N-1} + 1 - f_{N-1}, \qquad 0 \leq 1 - f_{N-1} < 1.
Since i_{N-1} ∈ ℤ and 0 ≤ 1 − f_{N−1} < 1,
i_{N-1} = \lceil i_{N-1} - (1 - f_{N-1}) \rceil = \lceil S_{N-1} \rceil, \qquad \text{that is,} \qquad i_{N-1} = \left\lceil \frac{k}{I_1 \times \cdots \times I_{N-2}} \right\rceil - (i_N - 1) I_{N-1}.
Similarly, for n ∈ {1, 2, …, N},
0 \leq 1 - f_n < 1, \qquad i_n = \lceil i_n - (1 - f_n) \rceil = \lceil S_n \rceil, \qquad i_n = \left\lceil \frac{k}{I_1 \times \cdots \times I_{n-1}} \right\rceil - \sum_{p=n+1}^{N} (i_p - 1) \prod_{q=n}^{p-1} I_q.
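In Matlab, this index mapping coincides with the built-in column-major index conversion, so it can be checked directly with ind2sub; the sizes and index below are illustrative only.

    % Sketch: mode-n column indices for a column k of the separable dictionary.
    I = [4, 3, 5];                                        % I1, I2, I3 (columns of the mode-n dictionaries)
    k = 37;                                               % a column index of Phi, 1 <= k <= prod(I)
    [i1, i2, i3] = ind2sub(I, k);                         % i1 = 1, i2 = 1, i3 = 4
    k_check = i1 + (i2 - 1)*I(1) + (i3 - 1)*I(1)*I(2);    % eq. A.2 recovers k = 37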

Appendix B:  Tensor and Dictionary Normalization

B.1  Normalization of the Tensor YRJ1××Jn××JN

Compute
\hat{\mathcal{Y}} = \frac{\mathcal{Y}}{\| \mathcal{Y} \|_2},
(B.1)
where \| \mathcal{Y} \|_2 = \sqrt{\langle \mathcal{Y}, \mathcal{Y} \rangle} = \left( \sum_{j_1=1}^{J_1} \cdots \sum_{j_N=1}^{J_N} y_{j_1 j_2 \cdots j_N}^2 \right)^{1/2}.

B.2  Normalization of Columns of the Separable Dictionary Φ to Have a Unit L2 Norm

The column Φk in equation A.1 is the kth column of the separable dictionary Φ. Normalization of each column vector Φk of the separable dictionary is given by
\hat{\Phi}_k = \frac{\Phi_k}{\| \Phi_k \|_2}.
(B.2)
Proposition 2.
Normalization of the column Φ_k in equation A.1 is given by the Kronecker product of the normalized dictionary columns φ_{i_N}^{(N)}, φ_{i_{N−1}}^{(N−1)}, …, φ_{i_1}^{(1)}:
\hat{\Phi}_k = \hat{\phi}_{i_N}^{(N)} \otimes \hat{\phi}_{i_{N-1}}^{(N-1)} \otimes \cdots \otimes \hat{\phi}_{i_1}^{(1)}.
(B.3)
Proof.
The L2 norm of the Kronecker product of vectors is the product of L2 norms of these vectors (Lancaster & Farahat, 1972):
\| \Phi_k \|_2^2 = \left\| \phi_{i_N}^{(N)} \otimes \phi_{i_{N-1}}^{(N-1)} \otimes \cdots \otimes \phi_{i_1}^{(1)} \right\|_2^2 = \| \phi_{i_N}^{(N)} \|_2^2 \times \| \phi_{i_{N-1}}^{(N-1)} \|_2^2 \times \cdots \times \| \phi_{i_1}^{(1)} \|_2^2.
(B.4)
From equations B.2 and B.3,
\hat{\Phi}_k = \frac{\Phi_k}{\| \Phi_k \|_2} = \frac{\phi_{i_N}^{(N)}}{\| \phi_{i_N}^{(N)} \|_2} \otimes \cdots \otimes \frac{\phi_{i_1}^{(1)}}{\| \phi_{i_1}^{(1)} \|_2}.
(B.5)
Therefore,
\hat{\Phi}_k = \hat{\phi}_{i_N}^{(N)} \otimes \hat{\phi}_{i_{N-1}}^{(N-1)} \otimes \cdots \otimes \hat{\phi}_{i_1}^{(1)}.
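This property is easy to confirm numerically; the small Matlab sketch below (sizes are arbitrary) builds a separable column from normalized mode-n columns and checks that it has unit L2 norm.

    % Sketch: the Kronecker product of unit-norm columns has unit norm.
    a = randn(5, 1); b = randn(7, 1); c = randn(3, 1);
    col = kron(c / norm(c), kron(b / norm(b), a / norm(a)));
    norm(col)                                  % equals 1 up to machine precision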

Appendix C:  Obtaining the Initial Correlation Tensor C1

In LARS, the initial correlation vector c_1 is obtained by taking the correlation between all columns of Φ and the vectorization of the tensor Y:
c_1 = \left( \Phi^{(N)T} \otimes \Phi^{(N-1)T} \otimes \cdots \otimes \Phi^{(1)T} \right) \mathrm{vec}(\mathcal{Y}).
(C.1)
We could also represent equation C.1 as a multilinear transformation of the tensor Y (Kolda, 2006):
\mathcal{C}_1 = \mathcal{Y} \times_1 \Phi^{(1)T} \times_2 \cdots \times_n \Phi^{(n)T} \times_{n+1} \cdots \times_N \Phi^{(N)T}.
(C.2)
The tensor C1 is the correlation between the tensor Y and the mode-n dictionary matrices Φ(n);n1,,N. The tensor C1 could be calculated efficiently as N mode-n products.

Appendix D:  Creating a Gram Matrix for Each mode-n Dictionary Φn

Gram matrices are used in several steps of T-LARS. For a large separable dictionary, Φ, its Gram matrix would be large as well. Therefore, explicitly building this Gram matrix and using it in computations could be very inefficient for large problems. Therefore, we developed T-LARS to use Gram matrices of mode-n dictionary matrices, Φ1,Φ2,,Φ(N), defined as G(n);n1,,N, instead of the Gram matrix ΦTΦ:
\Phi^T \Phi = \Phi^{(N)T} \Phi^{(N)} \otimes \cdots \otimes \Phi^{(n)T} \Phi^{(n)} \otimes \cdots \otimes \Phi^{(1)T} \Phi^{(1)}.
(D.1)
We can obtain the Gram matrix G^(n) for each mode-n dictionary Φ^(n) by
G^{(n)} = \Phi^{(n)T} \Phi^{(n)}.
(D.2)
The total sizes of the Gram matrices G(n);n1,,N would be much smaller than the Gram matrix G=ΦTΦ, thereby allowing faster calculations and requiring less computer storage.
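The factorization in equation D.1 follows from the mixed-product property of the Kronecker product, and it can be verified numerically with small random mode-n dictionaries, as in the Matlab sketch below (sizes are arbitrary).

    % Sketch: the Gram matrix of the separable dictionary factors into mode-n Gram matrices.
    Phi1 = randn(5, 8); Phi2 = randn(6, 9); Phi3 = randn(3, 4);
    Phi  = kron(Phi3, kron(Phi2, Phi1));
    G    = kron(Phi3' * Phi3, kron(Phi2' * Phi2, Phi1' * Phi1));
    norm(Phi' * Phi - G, 'fro')                % ~1e-12, i.e., equal up to round-off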

Acknowledgments

This work was supported by a Discovery Grant (RGPIN-2018-06453) from the National Sciences and Engineering Research Council of Canada.

References

Acar, E., Dunlavy, D. M., & Kolda, T. G. (2011). A scalable optimization approach for fitting canonical tensor decompositions. Journal of Chemometrics, 25, 67–86. https://doi.org/10.1002/cem.1335
Björck, Å. (2015). Numerical methods in matrix computations. Berlin: Springer.
Boyd, S., & Vandenberghe, L. (2010). Convex optimization. Cambridge: Cambridge University Press.
Caiafa, C. F., & Cichocki, A. (2012). Computing sparse representations of multidimensional signals using Kronecker bases. Neural Computation, 25, 1–35. https://doi.org/10.1162/neco_a_00385
Chen, S. S., Donoho, D. L., & Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1), 33–61. https://doi.org/10.1137/S1064827596304010
Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., & Phan, H. A. (2015). Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Processing Magazine, 32, 145–163. https://doi.org/10.1109/MSP.2013.2297439
Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., & Prior, F. (2013). The cancer imaging archive (TCIA): Maintaining and operating a public information repository. Journal of Digital Imaging, 26(6), 1045–1057. https://doi.org/10.1007/s10278-013-9622-7
Coifman, R. R., & Wickerhauser, M. V. (1992). Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, 38(2), 713–718. https://doi.org/10.1109/18.119732
Daubechies, I. (1990). The wavelet transform, time-frequency localization and signal analysis. IEEE Transactions on Information Theory, 36(5), 961–1005. https://doi.org/10.1109/18.57199
Donoho, D. L. (2006). For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 59(6), 797–829. https://doi.org/10.1002/cpa.20132
Donoho, D. L., & Elad, M. (2003). Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proceedings of the National Academy of Sciences, 100(5), 2197–2202. https://doi.org/10.1073/pnas.0437847100
Donoho, D. L., & Tsaig, Y. (2008). Fast solution of ℓ1-norm minimization problems when the solution may be sparse. IEEE Transactions on Information Theory, 54(11), 4789–4812. https://doi.org/10.1109/TIT.2008.929958
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., Ishwaran, H., Knight, K., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32(2), 407–499. https://doi.org/10.1214/009053604000000067
Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745. https://doi.org/10.1109/TIP.2006.881969
Elrewainy, A., & Sherif, S. S. (2019). Kronecker least angle regression for unsupervised unmixing of hyperspectral imaging data. Signal, Image and Video Processing, 14(2), 359–367. https://doi.org/10.1007/s11760-019-01562-w
Goldfarb, D. (1972). Modification methods for inverting matrices and solving systems of linear algebraic equations. Mathematics of Computation, 26(120), 829. https://doi.org/10.1090/S0025-5718-1972-0317527-4
Hager, W. W. (1989). Updating the inverse of a matrix. SIAM Review, 31(2), 221–239. https://doi.org/10.1137/1031049
Hawe, S., Seibert, M., & Kleinsteuber, M. (2013). Separable dictionary learning. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 438–445). Piscataway, NJ: IEEE. https://doi.org/10.1109/CVPR.2013.63
Kolda, T. G. (2006). Multilinear operators for higher-order decompositions (Technical Report SAND2006-2081). Livermore, CA: Sandia National Laboratories. https://doi.org/10.2172/923081
Kolda, T. G., & Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51(3), 455–500. https://doi.org/10.1137/07070111X
Kreutz-Delgado, K., Murray, J. F., Rao, B. D., Engan, K., Lee, T.-W., & Sejnowski, T. J. (2003). Dictionary learning algorithms for sparse representation. Neural Computation, 15(2), 349–396. https://doi.org/10.1162/089976603762552951
LaMontagne, P. J., Keefe, S., Lauren, W., Xiong, C., Grant, E. A., Moulder, K. L., & Marcus, D. S. (2018). OASIS-3: Longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer's disease. Alzheimer's and Dementia, 14(7S, pt. 2), P138. https://doi.org/10.1016/j.jalz.2018.06.2231
Lancaster, P., & Farahat, H. K. (1972). Norms on direct sums and tensor products. Mathematics of Computation, 26(118), 401–414.
Malioutov, D. M., Çetin, M., & Willsky, A. S. (2005). Homotopy continuation for sparse signal representation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 733–736). Piscataway, NJ: IEEE. https://doi.org/10.1109/ICASSP.2005.1416408
Mallat, S. S. (2009). A wavelet tour of signal processing. Amsterdam: Elsevier. https://doi.org/10.1016/B978-0-12-374370-1.X0001-8
Mallat, S. G., & Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12), 3397–3415. https://doi.org/10.1109/78.258082
Pati, Y. C., Rezaiifar, R., & Krishnaprasad, P. S. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Proceedings of the Asilomar Conference on Signals, Systems & Computers (Vol. 1, pp. 40–44). Piscataway, NJ: IEEE. https://doi.org/10.1109/acssc.1993.342465
Roemer, F., Del Galdo, G., & Haardt, M. (2014). Tensor-based algorithms for learning multidimensional separable dictionaries. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 3963–3967). Piscataway, NJ: IEEE. https://doi.org/10.1109/ICASSP.2014.6854345
Rosário, F., Monteiro, F. A., & Rodrigues, A. (2016). Fast matrix inversion updates for massive MIMO detection and precoding. IEEE Signal Processing Letters, 23(1), 75–79. https://doi.org/10.1109/LSP.2015.2500682
Sidiropoulos, N. D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E. E., & Faloutsos, C. (2017). Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65(13), 3551–3582. https://doi.org/10.1109/TSP.2017.2690524
Sulam, J., Ophir, B., Zibulevsky, M., & Elad, M. (2016). Trainlets: Dictionary learning in high dimensions. arXiv:1602.00212. https://doi.org/10.1109/TSP.2016.2540599
Tosic, I., & Frossard, P. (2011). Dictionary learning. IEEE Signal Processing Magazine, 28(2), 27–38. https://doi.org/10.1109/MSP.2010.939537
Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10), 2231–2242. https://doi.org/10.1109/TIT.2004.834793
van den Berg, E., & Friedlander, M. P. (2009). Probing the Pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing, 31(2), 890–912. https://doi.org/10.1137/080714488
Wickramasingha, I., Sobhy, M., & Sherif, S. S. (2017). Sparsity in Bayesian signal estimation. In J. P. Tejedor (Ed.), Bayesian inference. InTech. https://doi.org/10.5772/intechopen.70529
Yang, J., Peng, Y., Xu, W., & Dai, Q. (2009). Ways to sparse representation: An overview. Science in China, Series F: Information Sciences, 52(4), 695–703. https://doi.org/10.1007/s11432-009-0045-5
Zhai, L., Zhang, Y., Lv, H., Fu, S., & Yu, H. (2018). Multiscale tensor dictionary learning approach for multispectral image denoising. IEEE Access, 6, 51898–51910. https://doi.org/10.1109/ACCESS.2018.2868765
