## Abstract

Sparse signal representations have gained much interest recently in both the signal processing and statistics communities. Compared to orthogonal matching pursuit (OMP) and basis pursuit, which solve the $L_0$ and $L_1$ constrained sparse least-squares problems, respectively, least angle regression (LARS) is a computationally efficient method to solve both problems for all critical values of the regularization parameter $\lambda$. However, none of these methods is suitable for solving large multidimensional sparse least-squares problems, as they would require extensive computational power and memory. An earlier generalization of OMP, known as Kronecker-OMP, was developed to solve the $L_0$ problem for large multidimensional sparse least-squares problems. However, its memory usage and computation time increase quickly with the number of problem dimensions and iterations. In this letter, we develop a generalization of LARS, tensor least angle regression (T-LARS), that could efficiently solve either large $L_0$ or large $L_1$ constrained multidimensional sparse least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter $\lambda$, with lower computational complexity and memory usage than Kronecker-OMP. To demonstrate the validity and performance of our T-LARS algorithm, we used it to successfully obtain different sparse representations of two relatively large 3D brain images, using fixed and learned separable overcomplete dictionaries, by solving both $L_0$ and $L_1$ constrained sparse least-squares problems. Our numerical experiments demonstrate that our T-LARS algorithm is significantly faster (46 to 70 times) than Kronecker-OMP in obtaining $K$-sparse solutions for multilinear least-squares problems. However, the $K$-sparse solutions obtained using Kronecker-OMP always have a slightly lower residual error (1.55% to 2.25%) than those obtained by T-LARS.
Therefore, T-LARS could be an important tool for numerous multidimensional biomedical signal processing applications.

## 1  Introduction

Sparse signal representations have gained much interest recently in both the signal processing and statistics communities. A sparse signal representation usually results in simpler and faster processing, in addition to lower memory storage requirements because fewer coefficients need to be stored (Mallat, 2009; Wickramasingha, Sobhy, & Sherif, 2017). However, finding optimal sparse representations for different signals is not a trivial task (Mallat, 2009). Therefore, redundant signal representations using overcomplete dictionaries have been introduced to facilitate finding sparser representations for different signals (Mallat & Zhang, 1993; Mallat, 2009; Pati, Rezaiifar, & Krishnaprasad, 1993; Wickramasingha et al., 2017). Undercomplete dictionaries could also be used to obtain approximate signal representations (Kreutz-Delgado et al., 2003; Malioutov, Çetin, & Willsky, 2005; Tosic & Frossard, 2011). We note that at their core, such signal representation problems typically involve solving a least-squares problem (Donoho, 2006; Donoho & Elad, 2003).

A number of methods have been proposed to solve the sparse least-squares problem, including the Method of Frames (MOF; Daubechies, 1990), Matching Pursuit (MP; Mallat & Zhang, 1993), Orthogonal Matching Pursuit (OMP; Pati et al., 1993), Best Orthogonal Basis (BOB; Coifman & Wickerhauser, 1992), Lasso (also known as Basis Pursuit, BP; Chen, Donoho, & Saunders, 1998; Efron et al., 2004), and Least Angle Regression (LARS; Efron et al., 2004). MP and OMP obtain sparse signal representations by solving a nonconvex $L_0$ constrained least-squares problem (Tropp, 2004). They are heuristic methods that construct sparse signal representations by sequentially adding atoms from a given dictionary in a greedy (i.e., nonglobally optimal) manner. BP relaxes the nonconvex $L_0$ constrained optimization problem and solves a convex $L_1$ constrained least-squares problem instead (Chen et al., 1998). In both problem formulations, a regularization parameter $\lambda$ determines the trade-off between the representation error of the signal and its sparsity, as shown in the Pareto curve in van den Berg and Friedlander (2009). A common approach to obtaining a sparse signal representation using BP is to solve the optimization problem multiple times for different values of $\lambda$ before choosing the most suitable solution for the application at hand (van den Berg & Friedlander, 2009).

Compared to the above methods, LARS efficiently solves the $L_0$ or, with a slight modification, the $L_1$ constrained least-squares problem for all critical values of the regularization parameter $\lambda$ (Efron et al., 2004). However, even LARS is not suitable for large-scale problems, as it would require the multiplication and inversion of very large matrices (Yang, Peng, Xu, & Dai, 2009). For example, for $m$ unknown variables and $n$ equations, the LARS algorithm has $O(m^3 + nm^2)$ computational complexity (Efron et al., 2004).

LARS and other algorithms that solve sparse least-squares problems are directly applicable to one-dimensional signals. Therefore, multidimensional signals that are represented by tensors (i.e., multidimensional arrays) would need to be vectorized first to enable the application of these methods (Acar, Dunlavy, & Kolda, 2011; Cichocki et al., 2015; Kolda, 2006). For $I^N$ vectorized variables, a dictionary of size $I^N \times J^N$, where $J > I$ for an overcomplete dictionary, is required to solve the sparse linear least-squares problem using LARS and the other algorithms mentioned above. Here $N$ is the order of the tensor, also known as the number of modes, and $I$ is the dimension of each mode. Therefore, the number of vectorized unknown variables $I^N$ would increase exponentially with the order of the tensor $N$. Such problems would quickly become increasingly large and computationally intractable. For example, a 3D tensor with 100 unknown variables in each mode has 1 million unknowns, which requires a dictionary with at least 1 trillion $(10^{12})$ elements; a 4D tensor with 100 unknown variables in each mode has 100 million unknowns, which requires a dictionary with at least 10 quadrillion $(10^{16})$ elements.

Mathematically separable signal representations (i.e., those using separable dictionaries) have typically been used for multidimensional signals because they are simpler and easier to obtain than nonseparable representations (Caiafa & Cichocki, 2012; Sulam, Ophir, Zibulevsky, & Elad, 2016). Caiafa and Cichocki (2012) introduced Kronecker-OMP, a generalization of OMP that could represent multidimensional signals represented by tensors using separable dictionaries. They also developed the N-BOMP algorithm to exploit block-sparse structures in multidimensional signals. However, similar to OMP, Kronecker-OMP could obtain only an approximate, nonglobally optimal solution of the nonconvex $L_0$ constrained sparse least-squares problem (Elrewainy & Sherif, 2019; van den Berg & Friedlander, 2009). Also, there is currently no computationally efficient method to obtain a sparse representation of a multidimensional signal by solving the convex $L_1$ constrained sparse least-squares problem for all critical values of the regularization parameter $\lambda$. However, two of us, Elrewainy and Sherif, developed the Kronecker least angle regression (K-LARS) algorithm to efficiently solve either large $L_0$ or large $L_1$ sparse least-squares problems (overdetermined) with a particular Kronecker form $A \otimes I$, for all critical values of the regularization parameter $\lambda$ (Elrewainy & Sherif, 2019). They used K-LARS to sparsely fit one-dimensional multichannel hyperspectral imaging data to a Kronecker model $A \otimes I$.

In this letter, we develop a generalization of K-LARS, tensor least angle regression (T-LARS), which could efficiently solve either large $L_0$ or large $L_1$ multidimensional sparse least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter $\lambda$. The balance of the letter is organized as follows: Section 2 provides a brief introduction to tensors, tensor operations, and multilinear sparse least-squares problems. In section 3, we review LARS and describe our T-LARS algorithm in detail. Section 4 presents the computational complexity of our T-LARS algorithm and compares it with that of Kronecker-OMP. Section 5 presents the results of experiments applying both T-LARS and Kronecker-OMP. We present our conclusions in section 6.

## 2  Problem Formulation

### 2.1  Tensors and Multilinear Transformations

The term tensor has a specific mathematical definition in physics, but it has been widely accepted in many disciplines (e.g., signal processing and statistics) to mean a multidimensional array. Therefore, a vector is a first-order tensor, and a matrix is a second-order tensor. An $N$-dimensional array is an $N$th-order tensor whose $N$ dimensions are also known as modes (Hawe, Seibert, & Kleinsteuber, 2013; Sidiropoulos et al., 2017). The $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N}$ has $N$ modes with dimensions $I_1, I_2, \dots, I_N$, where vectors along a specific mode $n$ are called mode-$n$ fibers.

Vectorization and mode-$n$ matricization of tensors (Hawe et al., 2013; Sidiropoulos et al., 2017) are two important tensor reshaping operations. As the names imply, vectorization of a tensor generates a vector, and matricization of a tensor generates a matrix. Tensors are vectorized by stacking mode-1 fibers in reverse lexicographical order, where this vectorization is denoted $\mathrm{vec}(\mathcal{X})$:
$\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N} \longrightarrow \mathrm{vec}(\mathcal{X}) \in \mathbb{R}^{I_N I_{N-1} \cdots I_1}.$
In mode-$n$ tensor matricization, mode-$n$ fibers become the columns of the resulting matrix. We note that the ordering of these columns is not consistent across the literature (Kolda & Bader, 2009; Kolda, 2006). In this letter, we use reverse lexicographical order $I_N \cdots I_{n+1} I_{n-1} \cdots I_1$ for the column ordering in mode-$n$ tensor matricization. In such reverse lexicographical order, $I_1$ varies the fastest and $I_N$ varies the slowest. Let $X_{(n)}$ denote the mode-$n$ matricization of a tensor $\mathcal{X}$:
$\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N} \longrightarrow X_{(n)} \in \mathbb{R}^{I_n \times I_N \cdots I_{n+1} I_{n-1} \cdots I_1}.$
Another important operation that could be performed on a tensor is its mode-$n$ product, that is, its multiplication by a matrix along one of its modes. Let the tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N}$ and a matrix $\Phi^{(n)} \in \mathbb{R}^{J_n \times I_n}$; then their mode-$n$ product, $\mathcal{Y}_n \in \mathbb{R}^{I_1 \times \dots \times J_n \times \dots \times I_N}$, is denoted as
$\mathcal{Y}_n = \mathcal{X} \times_n \Phi^{(n)},$
(2.1)
where all mode-$n$ fibers of the tensor are multiplied by the matrix $\Phi^{(n)}$. Equation 2.1 could also be written as $Y_{(n)} = \Phi^{(n)} X_{(n)}$, where $Y_{(n)}$ and $X_{(n)}$ are mode-$n$ matricizations of tensors $\mathcal{Y}$ and $\mathcal{X}$, respectively. A mode-$n$ product could be thought of as a linear transformation of the mode-$n$ fibers of a tensor. Therefore, a multilinear transformation of a tensor $\mathcal{X}$ could be defined as
$\mathcal{Y} = \mathcal{X} \times_1 \Phi^{(1)} \times_2 \Phi^{(2)} \times_3 \dots \times_N \Phi^{(N)},$
(2.2)
where $\Phi^{(n)}; n \in \{1,2,\dots,N\}$ are matrices with dimensions $\Phi^{(n)} \in \mathbb{R}^{J_n \times I_n}$ and $\mathcal{Y} \in \mathbb{R}^{J_1 \times \dots \times J_n \times \dots \times J_N}$ (Hawe et al., 2013; Sidiropoulos et al., 2017). This multilinear transformation could also be written as a product of $\mathrm{vec}(\mathcal{X})$ and the Kronecker product of the matrices $\Phi^{(n)}; n \in \{1,2,\dots,N\}$ (Hawe et al., 2013; Sidiropoulos et al., 2017):
$\mathrm{vec}(\mathcal{Y}) = \left(\Phi^{(N)} \otimes \Phi^{(N-1)} \otimes \dots \otimes \Phi^{(1)}\right) \mathrm{vec}(\mathcal{X}).$
(2.3)
We note that equation 2.3 is a linear system relating $x = \mathrm{vec}(\mathcal{X})$ and $y = \mathrm{vec}(\mathcal{Y})$. If the matrix $\Phi$ represents a separable dictionary, that is,
$\Phi = \Phi^{(N)} \otimes \Phi^{(N-1)} \otimes \dots \otimes \Phi^{(2)} \otimes \Phi^{(1)},$
(2.4)
then equation 2.3 describes a representation of $y = \mathrm{vec}(\mathcal{Y})$ using the dictionary $\Phi$, where $x = \mathrm{vec}(\mathcal{X})$ represents its coefficients $(y = \Phi x)$. Similarly, we could think of equation 2.2 as a representation of tensor $\mathcal{Y}$ using the dictionaries $\Phi^{(n)}; n \in \{1,2,\dots,N\}$, where tensor $\mathcal{X}$ represents its coefficients.
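The equivalence between the mode-$n$ product form, equation 2.2, and the Kronecker form, equation 2.3, can be checked numerically. The sketch below is our own illustration (the `mode_n_product` helper and the tensor sizes are arbitrary choices, not from the letter): it applies three mode-$n$ products and compares the result against the explicit Kronecker dictionary, using column-major (Fortran-order) flattening so that $I_1$ varies fastest, matching the reverse lexicographical vectorization above.

```python
import numpy as np

def mode_n_product(X, Phi, n):
    # Multiply every mode-n fiber of X by Phi (eq. 2.1): Y_(n) = Phi @ X_(n)
    Y = np.tensordot(Phi, X, axes=([1], [n]))  # transformed mode becomes axis 0
    return np.moveaxis(Y, 0, n)

rng = np.random.default_rng(0)
I, J = (3, 4, 5), (2, 6, 3)
X = rng.standard_normal(I)
Phis = [rng.standard_normal((J[n], I[n])) for n in range(3)]

# Multilinear transformation (eq. 2.2)
Y = X
for n in range(3):
    Y = mode_n_product(Y, Phis[n], n)

# Equivalent Kronecker form (eq. 2.3); flattening in Fortran order makes
# I_1 vary fastest, as in the reverse lexicographical vec() above
Phi = np.kron(np.kron(Phis[2], Phis[1]), Phis[0])
assert np.allclose(Y.flatten(order='F'), Phi @ X.flatten(order='F'))
```

For realistically sized tensors, the mode-$n$ product path is the only affordable one; the Kronecker matrix `Phi` is formed here purely for verification.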

### 2.2  Sparse Multilinear Least-Squares Problem

A sparse multilinear representation of equation 2.3 could be obtained by rewriting it as an $L_p$ minimization problem (Wickramasingha et al., 2017),
$\tilde{x} = \arg\min_{x} \left\| \left(\Phi^{(N)} \otimes \dots \otimes \Phi^{(n)} \otimes \dots \otimes \Phi^{(1)}\right) \mathrm{vec}(\mathcal{X}) - \mathrm{vec}(\mathcal{Y}) \right\|_2^2 + \lambda \left\| \mathrm{vec}(\mathcal{X}) \right\|_p,$
(2.5)
where $\lambda$ is a regularization parameter.
Alternatively, using equations 2.2 and 2.3, equation 2.5 could be written as
$\tilde{\mathcal{X}} = \arg\min_{\mathcal{X}} \left\| \mathcal{X} \times_1 \Phi^{(1)} \times_2 \Phi^{(2)} \times_3 \dots \times_N \Phi^{(N)} - \mathcal{Y} \right\|_2^2 + \lambda \left\| \mathcal{X} \right\|_p,$
(2.6)
where $\mathcal{Y} \in \mathbb{R}^{J_1 \times \dots \times J_n \times \dots \times J_N}$ and $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N}$.

For the same values of $p$, the $L_p$ minimization problems in equations 2.5 and 2.6 are equivalent, even though their formulations are in vector and tensor forms, respectively. We note that for $0 \le p < 1$, equations 2.5 and 2.6 are nonconvex optimization problems, while for $p \ge 1$, they are convex optimization problems. The sparsest solution of equations 2.5 and 2.6 would be obtained for $p = 0$, that is, the $L_0$ constrained problem, but sparsity would be reduced as $p$ increases.

The $L_0$ minimization problem ($p = 0$) is a nonconvex optimization problem whose vector formulation, problem 2.5, could be solved approximately (i.e., nonglobally) using OMP or LARS (van den Berg & Friedlander, 2009). The $L_1$ minimization problem ($p = 1$) is a convex optimization problem whose vector formulation, problem 2.5, could be solved exactly (i.e., globally) using BP or LARS (Donoho & Tsaig, 2008). However, OMP, BP, and LARS share a serious drawback: they are not suitable for solving very large sparse least-squares problems because they involve multiplying and inverting very large matrices.

As a way of overcoming this drawback, Caiafa and Cichocki (2012) proposed Kronecker-OMP, a tensor-based generalization of OMP for solving sparse multilinear least-squares problems. However, similar to OMP, Kronecker-OMP could obtain only a nonglobally optimal solution of the nonconvex $L_0$ constrained sparse least-squares problem (Donoho & Tsaig, 2008).

## 3  Tensor Least Angle Regression (T-LARS)

Least angle regression (LARS) is a computationally efficient method to solve the $L_0$ or $L_1$ constrained minimization problem in vector form, problem 2.5, for all critical values of the regularization parameter $\lambda$ (Efron et al., 2004). In this letter, we develop a generalization of LARS, tensor least angle regression (T-LARS), to solve large sparse tensor least-squares problems and thereby, for example, obtain sparse representations of multidimensional signals using a separable dictionary, as described by equation 2.3. As shown below, our T-LARS calculations are performed without explicitly generating or inverting large matrices, thereby keeping its computational complexity and memory requirements relatively low. Both the T-LARS and Kronecker-OMP algorithms use the Schur complement inversion formula to update the inverses of large Gram matrices without explicitly inverting them (Caiafa & Cichocki, 2012).

### 3.1  Least Angle Regression

LARS solves the $L_0$ or $L_1$ constrained minimization problem in equation 2.5 for all critical values of the regularization parameter $\lambda$. It starts with a very large value of $\lambda$ that results in an empty active-columns matrix $\Phi_I$ and a solution $\tilde{x}_{t=0} = 0$. The set $I$ denotes the active set of the dictionary $\Phi$, that is, the column indices where the optimal solution $\tilde{x}_t$ at iteration $t$ is nonzero, and $I^c$ denotes the corresponding inactive set. Therefore, $\Phi_I$ contains only the active columns of the dictionary $\Phi$, and $\Phi_{I^c}$ contains only its inactive columns.

At each iteration $t$, a new column is either added to the active set $I$ ($L_0$), or added to or removed from the active set $I$ ($L_1$), and $\lambda$ is reduced by a calculated value $\delta_t^*$. As a result of these iterations, new solutions with an increasing number of coefficients are obtained along a piecewise linear path, until a predetermined residual error $\varepsilon$ is reached. One important characteristic of LARS is that the current solution at each iteration is the optimal sparse solution for the selected active columns.

Initialization of LARS includes setting the active set to an empty set, $I = \{\}$, the initial solution vector $\tilde{x}_0 = 0$, the initial residual vector $r_0 = y$, and the initial regularization parameter $\lambda_1 = \max(|c_1|)$, where $c_1 = \Phi^T r_0$. The optimal solution $\tilde{x}_t$ at any iteration $t$ must satisfy the following two optimality conditions:
$\left\| \Phi_{I^c}^T r_t \right\|_\infty \le \lambda_t,$
(3.1)
$\Phi_I^T r_t = -\lambda_t z_t,$
(3.2)
where $r_t$ is the residual vector at iteration $t$, $r_t = y - \Phi \tilde{x}_t$, and $z_t$ is the sign sequence of $c_t$ on the active set $I$.

The condition in equation 3.2 ensures that the magnitude of the correlation between all active columns and the residual is equal to $|λt|$ at each iteration, and the condition in equation 3.1 ensures that the magnitude of the correlation between the inactive columns and the residual is less than or equal to $λt$.

For an $L_1$ constrained minimization problem, at each iteration, if an inactive column violates condition 3.1, it is added to the active set, and if an active column violates condition 3.2, it is removed from the active set. For an $L_0$ constrained minimization problem, only columns that violate condition 3.1 are added to the active set at each iteration.

For a given active set $I$, the optimal solution $\tilde{x}_t$ could be written as
$\tilde{x}_t = \begin{cases} \left(\Phi_{I_t}^T \Phi_{I_t}\right)^{-1} \left(\Phi_{I_t}^T y - \lambda_t z_t\right), & \text{on } I, \\ 0, & \text{otherwise,} \end{cases}$
(3.3)
where $z_t$ is the sign sequence of $c_t$ on the active set $I$, and $c_t = \Phi^T r_{t-1}$ is the correlation vector of all columns of the dictionary $\Phi$ with the residual vector $r_{t-1}$ at iteration $t$. (See algorithm 1 for the LARS algorithm.)
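A defining property of the solution in equation 3.3 is that the residual stays equally correlated, with magnitude $\lambda_t$, with every active column. The sketch below is our own illustrative check (the dictionary, the choice of the three most correlated columns as an active set, and the value of $\lambda$ are arbitrary assumptions, not from the letter): it builds the solution of equation 3.3 directly and verifies the equicorrelation property.

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((20, 50))
Phi /= np.linalg.norm(Phi, axis=0)   # unit-norm dictionary columns
y = rng.standard_normal(20)
y /= np.linalg.norm(y)

# Initialization: correlations and lambda_1 = max(|c_1|)
c1 = Phi.T @ y
lam1 = np.abs(c1).max()

# Hypothetical active set: the three most correlated columns
I = np.argsort(-np.abs(c1))[:3]
z = np.sign(c1[I])
lam = 0.5 * lam1                     # some lambda below lambda_1

# Optimal solution restricted to the active set (eq. 3.3)
Phi_I = Phi[:, I]
x_I = np.linalg.solve(Phi_I.T @ Phi_I, Phi_I.T @ y - lam * z)

# Every active column is equally correlated with the residual: |corr| = lambda
r = y - Phi_I @ x_I
assert np.allclose(np.abs(Phi_I.T @ r), lam)
```

The magnitude equality holds for any $\lambda$ by construction; it is the inactive-column condition 3.1 that additionally restricts which values of $\lambda$ and which active sets occur on the actual LARS path.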

### 3.2  Tensor Least Angle Regression Algorithm

T-LARS is a generalization of LARS that solves the sparse multilinear least-squares problem in equation 2.6 using tensors and multilinear algebra. Unlike LARS, T-LARS does not calculate large matrices such as the Kronecker dictionary $\Phi$ in equation 2.4, which is required in vectorized sparse multilinear least-squares problems. Instead, T-LARS uses the much smaller mode-$n$ dictionaries for its calculations. A mapping between column indices of the dictionary $\Phi$ and column indices of the mode-$n$ dictionaries $\Phi^{(n)}; n \in \{1,\dots,N\}$ is essential in T-LARS calculations (see appendix A).

Required inputs to the T-LARS algorithm are the tensor $\mathcal{Y} \in \mathbb{R}^{J_1 \times \dots \times J_n \times \dots \times J_N}$, the mode-$n$ dictionary matrices $\Phi^{(n)}; n \in \{1,\dots,N\}$, and a stopping criterion given as either a residual tolerance $\varepsilon$ or a maximum number of nonzero coefficients $K$ ($K$-sparse representation). The output is the solution tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N}$.

First, we normalize the tensor $\mathcal{Y}$ and the columns of each dictionary $\Phi^{(n)}; n \in \{1,\dots,N\}$ to have a unit $L_2$ norm. Note that normalizing the columns of each mode-$n$ dictionary ensures normalization of the columns of the separable dictionary $\Phi$ in equation 2.4 (see appendix B). For notational simplicity in the following sections, we will use $\mathcal{Y}$ to represent the normalized tensor and $\Phi^{(n)}$ to represent the normalized dictionary matrices.

Gram matrices are used in several steps of T-LARS. For a large separable dictionary $\Phi$, its Gram matrix $G = \Phi^T \Phi$ would be large as well. Therefore, explicitly building this Gram matrix and using it in computations could be very inefficient for large problems. Instead, T-LARS uses the Gram matrices $G^{(1)}, G^{(2)}, \dots, G^{(N)}$ of the mode-$n$ dictionary matrices $\Phi^{(1)}, \Phi^{(2)}, \dots, \Phi^{(N)}$. We can obtain a Gram matrix $G^{(n)}; n \in \{1,\dots,N\}$ for each mode-$n$ dictionary $\Phi^{(n)}; n \in \{1,\dots,N\}$ by
$G^{(n)} = \Phi^{(n)T} \Phi^{(n)}.$
(3.4)
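Because the Kronecker product distributes over matrix multiplication, the Gram matrix of the separable dictionary factors into the mode-$n$ Gram matrices, $G = G^{(N)} \otimes \dots \otimes G^{(1)}$. The small check below (our own illustration, with arbitrary sizes) confirms this for $N = 2$; it is the reason T-LARS never needs to form $G$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(2)
Phi1 = rng.standard_normal((4, 6))   # mode-1 dictionary, J1 x I1
Phi2 = rng.standard_normal((5, 7))   # mode-2 dictionary, J2 x I2

# Mode-n Gram matrices (eq. 3.4)
G1 = Phi1.T @ Phi1
G2 = Phi2.T @ Phi2

# Gram of the separable dictionary equals the Kronecker product of the
# mode-n Gram matrices: (Phi2 kron Phi1)^T (Phi2 kron Phi1) = G2 kron G1
Phi = np.kron(Phi2, Phi1)
assert np.allclose(Phi.T @ Phi, np.kron(G2, G1))
```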
The tensor $\mathcal{C}_1$ is the correlation between the tensor $\mathcal{Y}$ and the mode-$n$ dictionary matrices $\Phi^{(n)}; n \in \{1,\dots,N\}$:
$\mathcal{C}_1 = \mathcal{Y} \times_1 \Phi^{(1)T} \times_2 \dots \times_n \Phi^{(n)T} \times_{n+1} \dots \times_N \Phi^{(N)T}.$
(3.5)
The tensor $\mathcal{C}_1$ could be calculated efficiently as $N$ mode-$n$ products, and the initial correlation vector is obtained by vectorizing $\mathcal{C}_1$, that is, $c_1 = \mathrm{vec}(\mathcal{C}_1)$ (see appendix C).
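Equation 3.5 computes exactly the correlation vector $c_1 = \Phi^T \mathrm{vec}(\mathcal{Y})$ without ever forming $\Phi$. A sketch of the check (our own illustration; the `mode_n_product` helper and the sizes are assumptions, not from the letter):

```python
import numpy as np

def mode_n_product(X, A, n):
    # Mode-n product: multiply every mode-n fiber of X by A
    return np.moveaxis(np.tensordot(A, X, axes=([1], [n])), 0, n)

rng = np.random.default_rng(3)
J, I = (4, 5, 3), (6, 7, 4)
Y = rng.standard_normal(J)
Phis = [rng.standard_normal((J[n], I[n])) for n in range(3)]

# C_1 = Y x1 Phi(1)^T x2 Phi(2)^T x3 Phi(3)^T  (eq. 3.5)
C1 = Y
for n in range(3):
    C1 = mode_n_product(C1, Phis[n].T, n)

# Same result as correlating with the full Kronecker dictionary
Phi = np.kron(np.kron(Phis[2], Phis[1]), Phis[0])
c1 = Phi.T @ Y.flatten(order='F')
assert np.allclose(C1.flatten(order='F'), c1)
```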

T-LARS requires several parameters to be initialized before starting the iterations. The regularization parameter $\lambda_1$ is initialized to the maximum absolute value of the correlation vector $c_1$, and the index of the corresponding most correlated column $\phi_{I_1}$ of the separable dictionary $\Phi$ is added to the initial active set $I$. The initial residual tensor $\mathcal{R}_0$ is set to $\mathcal{Y}$, and the initial solution vector $\tilde{x}_0$ and the initial direction vector $d_0$ are set to 0. The initial step size $\delta_0^*$ is also set to 0. T-LARS starts the iterations at $t = 1$ and runs until a stopping criterion is reached:

• Initial residual tensor: $\mathcal{R}_0 = \mathcal{Y}$

• Initial solution vector: $\tilde{x}_0 = 0$

• Initial direction vector: $d_0 = 0$

• Initial step size: $\delta_0^* = 0$

• Initial regularization parameter: $\lambda_1 = \max(|c_1|)$

• Active set: $I = \{I_1\}$, the index of the most correlated column $\phi_{I_1}$

• Start iterations at $t=1$

The following calculations are performed at every iteration $t=1,2,…$ of the T-LARS algorithm until the stopping criterion is reached.

#### 3.2.1  Obtain the Inverse of the Gram Matrix of the Active Columns of the Dictionary

We obtain the Gram matrix of the active columns of the dictionary, $G_t = \Phi_{I_t}^T \Phi_{I_t}$, at each iteration $t$. The size of this Gram matrix either increases (dictionary column addition) or decreases (dictionary column removal) with each iteration $t$. Therefore, for computational efficiency, we use the Schur complement inversion formula to calculate $G_t^{-1}$ from $G_{t-1}^{-1}$, thereby avoiding a full matrix inversion at each iteration (Caiafa & Cichocki, 2012; Goldfarb, 1972; Rosário, Monteiro, & Rodrigues, 2016).

Updating the Gram matrix after adding a new column $k_a$ to the active set. Let column $k_a \in I$ be the new column added to the active matrix. Given $G_{t-1}^{-1}$, the inverse of the Gram matrix $G_t$ could be calculated using the Schur complement inversion formula for a symmetric block matrix (Björck, 2015; Boyd & Vandenberghe, 2010; Hager, 1989),
$G_t^{-1} = \begin{bmatrix} F_{11}^{-1} & \alpha b \\ \alpha b^T & \alpha \end{bmatrix},$
(3.6)
where $F_{11}^{-1} = G_{t-1}^{-1} + \alpha b b^T$, $b = -G_{t-1}^{-1} g_a$, and $\alpha = 1/(g_{k_a,k_a} + g_a^T b)$, and the row vector $g_a^T$ is given by
$g_a^T = \begin{bmatrix} g_{k_1,k_a} & \cdots & g_{k_n,k_a} & \cdots & g_{k_{a-1},k_a} \end{bmatrix}_{1 \times (a-1)}.$
The elements $g_{k_n,k_a}$ of $g_a^T$ are elements of the Gram matrix $G_t$ that are obtained from the mode-$n$ Gram matrices $G^{(n)}; n \in \{1,\dots,N\}$ as the scalar product
$g_{k_n,k_a} = g^{(N)}_{k_n^N, k_a^N} \cdots g^{(1)}_{k_n^1, k_a^1},$
where $k_n^N, \dots, k_n^1$ are the tensor indices corresponding to the column index $k_n$, and $k_a^N, \dots, k_a^1$ are the tensor indices corresponding to the column index $k_a$ (see appendix A).
Updating the Gram matrix after removing a column $k_r$ from the active set. Let column $k_r \in I$ be the column removed from the active set. We move column $k_r$ and row $k_r$ of $G_{t-1}^{-1}$ to become its last column and last row, respectively. We denote this new matrix as $\hat{G}_{t-1}^{-1}$. By using the Schur complement inversion formula for a symmetric block matrix, the inverse $\hat{G}_{t-1}^{-1}$ could be interpreted as
$\hat{G}_{t-1}^{-1} = \begin{bmatrix} F_{11}^{-1} & \alpha b \\ \alpha b^T & \alpha \end{bmatrix}_{N \times N},$
(3.7)
where $F_{11}^{-1} = G_t^{-1} + \alpha b b^T$ and $N$ here denotes the number of active columns before the removal. Therefore, we could calculate the inverse of the Gram matrix at iteration $t$ as (Goldfarb, 1972; Rosário et al., 2016)
$G_t^{-1} = F_{11}^{-1} - \alpha b b^T.$
(3.8)
Both $F_{11}^{-1}$ and $\alpha b b^T$ could be easily obtained from $\hat{G}_{t-1}^{-1}$ as follows (Matlab notation):
$F_{11}^{-1} = \hat{G}_{t-1}^{-1}(1{:}N-1,\, 1{:}N-1),$
(3.9)
$\alpha b b^T = \frac{\hat{G}_{t-1}^{-1}(1{:}N-1,\, N)\; \hat{G}_{t-1}^{-1}(N,\, 1{:}N-1)}{\hat{G}_{t-1}^{-1}(N, N)}.$
(3.10)
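The column-addition update, equation 3.6, can be sketched as follows (our own illustration: the helper name, dictionary, and sizes are assumptions, not from the letter). Growing the inverse one active column at a time reproduces the directly computed inverse:

```python
import numpy as np

def add_column_inverse(G_inv, g_a, g_aa):
    """Update the inverse Gram matrix after adding one active column (eq. 3.6).

    G_inv : inverse of the (a-1)x(a-1) Gram matrix of the current active columns
    g_a   : Gram entries between the new column and the current active columns
    g_aa  : diagonal Gram entry of the new column (its squared norm)
    """
    b = -G_inv @ g_a
    alpha = 1.0 / (g_aa + g_a @ b)
    F11_inv = G_inv + alpha * np.outer(b, b)
    top = np.hstack([F11_inv, alpha * b[:, None]])
    bottom = np.hstack([alpha * b, [alpha]])
    return np.vstack([top, bottom])

rng = np.random.default_rng(4)
Phi = rng.standard_normal((30, 6))
G = Phi.T @ Phi                       # full 6x6 Gram matrix

# Grow the inverse one column at a time and compare with a direct inverse
G_inv = np.array([[1.0 / G[0, 0]]])
for a in range(1, 6):
    G_inv = add_column_inverse(G_inv, G[:a, a], G[a, a])
assert np.allclose(G_inv, np.linalg.inv(G))
```

The column-removal update, equations 3.7 to 3.10, is the same identity read in reverse: the last column, last row, and corner entry of the permuted inverse supply $\alpha b$ and $\alpha$, and hence $\alpha b b^T$, directly.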

#### 3.2.2  Obtain Direction Vector $dt$

The direction vector along which the solution $x$ moves in a piecewise linear fashion when an active column is added to or removed from the active set is given by
$d_t = G_t^{-1} z_t,$
(3.11)
where $z_t = \mathrm{sign}(c_t)_I$, that is, the sign sequence of the correlation vector over the active set.

#### 3.2.3  Obtain $vt$

A vector $v_t$ could be defined as
$v_t = \Phi^T \Phi_{I_t} d_t.$
(3.12)
This vector $v_t$ could be efficiently obtained as a multilinear transformation of a direction tensor $\mathcal{D}_t$ by the Gram matrices $G^{(n)}; n \in \{1,\dots,N\}$,
$\mathcal{V}_t = \mathcal{D}_t \times_1 G^{(1)} \times_2 \dots \times_n G^{(n)} \times_{n+1} \dots \times_N G^{(N)},$
(3.13)
where $\mathrm{vec}(\mathcal{D}_t)_I = d_t$ and $\mathrm{vec}(\mathcal{D}_t)_{I^c} = 0$. We note that $\mathrm{vec}(\mathcal{V}_t) = v_t$.
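Equations 3.12 and 3.13 agree because $\Phi^T \Phi = G^{(N)} \otimes \dots \otimes G^{(1)}$ and $\Phi_{I_t} d_t = \Phi\,\mathrm{vec}(\mathcal{D}_t)$. A numerical check (our own sketch; the helper, active indices, and sizes are arbitrary assumptions):

```python
import numpy as np

def mode_n_product(X, A, n):
    return np.moveaxis(np.tensordot(A, X, axes=([1], [n])), 0, n)

rng = np.random.default_rng(5)
J, I = (3, 4, 2), (5, 6, 3)
Phis = [rng.standard_normal((J[n], I[n])) for n in range(3)]
Grams = [P.T @ P for P in Phis]
Phi = np.kron(np.kron(Phis[2], Phis[1]), Phis[0])

# A sparse direction: d_t lives on a small active set, zero elsewhere
P = int(np.prod(I))
active = [2, 17, 41]
d = rng.standard_normal(len(active))
D = np.zeros(P)
D[active] = d
D = D.reshape(I, order='F')          # direction tensor D_t

# V_t = D_t x1 G(1) x2 G(2) x3 G(3)  (eq. 3.13)
V = D
for n in range(3):
    V = mode_n_product(V, Grams[n], n)

# vec(V_t) equals Phi^T Phi_I d_t (eq. 3.12), without ever forming Phi
assert np.allclose(V.flatten(order='F'), Phi.T @ (Phi[:, active] @ d))
```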

#### 3.2.4  Obtain the Correlation Vector $ct$

Because $c_1$ is obtained at initialization, the following calculations are needed only for iterations $t \ge 2$. The correlation vector $c_t$ is given by
$c_t = \Phi^T \mathrm{vec}(\mathcal{R}_{t-1}),$
(3.14)
where $\mathcal{R}_{t-1}$ is the residual tensor from the previous iteration.
Since
$\mathrm{vec}(\mathcal{R}_{t-1}) = \mathrm{vec}(\mathcal{R}_{t-2}) - \delta_{t-1}^* \Phi_{I_{t-1}} d_{t-1},$
(3.15)
we could update the correlation vector $c_t$ by
$c_t = \Phi^T \mathrm{vec}(\mathcal{R}_{t-2}) - \delta_{t-1}^* \Phi^T \Phi_{I_{t-1}} d_{t-1}.$
(3.16)
Substituting equation 3.12 into equation 3.16, we obtain an update for the correlation vector:
$c_t = c_{t-1} - \delta_{t-1}^* v_{t-1}.$
(3.17)

#### 3.2.5  Calculate Step Size $δ*$

The minimum step size for adding a new column to the active set is given by
$\delta_t^+ = \min_{i \in I^c}{}^{+} \left\{ \frac{\lambda_t - c_t(i)}{1 - v_t(i)}, \frac{\lambda_t + c_t(i)}{1 + v_t(i)} \right\},$
(3.18)
where $\min^{+}$ denotes the minimum over positive candidates only. The minimum step size for removing a column from the active set is given by
$\delta_t^- = \min_{i \in I}{}^{+} \left\{ -\frac{\tilde{x}_{t-1}(i)}{d_t(i)} \right\}.$
(3.19)
Therefore, the minimum step size for the $L_1$ constrained sparse least-squares problem is
$\delta_t^* = \min\left(\delta_t^+, \delta_t^-\right).$
(3.20)
For the $L_0$ constrained sparse least-squares problem, only new columns are added to the active set at every iteration. Therefore, the minimum step size for the $L_0$ constrained sparse least-squares problem is $\delta_t^* = \delta_t^+$.
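The step-size rules above can be sketched as a small helper (our own illustration; the function name, signature, and the worked numbers are assumptions, with $\min^{+}$ taken over strictly positive candidates as in equations 3.18 and 3.19):

```python
import numpy as np

def step_sizes(lam, c, v, x_prev, d, inactive, active):
    """Smallest positive step sizes (eqs. 3.18-3.20); a hedged sketch.

    delta_plus : step at which an inactive column reaches correlation lam
    delta_minus: step at which an active coefficient crosses zero
    """
    # Candidates from both signs; keep only strictly positive ones (eq. 3.18)
    cand = np.concatenate([(lam - c[inactive]) / (1 - v[inactive]),
                           (lam + c[inactive]) / (1 + v[inactive])])
    delta_plus = cand[cand > 0].min()

    # Zero-crossing candidates for active coefficients (eq. 3.19)
    cross = -x_prev[active] / d[active]
    pos = cross[cross > 0]
    delta_minus = pos.min() if pos.size else np.inf

    return delta_plus, delta_minus, min(delta_plus, delta_minus)

# Tiny worked example
lam = 1.0
c = np.array([0.5, -0.2, 0.9])
v = np.array([0.2, -0.1, 0.0])
x_prev = np.array([0.0, 0.0, 0.3])
d = np.array([0.0, 0.0, -0.6])
dp, dm, ds = step_sizes(lam, c, v, x_prev, d, inactive=[0, 1], active=[2])
assert np.isclose(dp, 0.625)   # (1 - 0.5) / (1 - 0.2)
assert np.isclose(dm, 0.5)     # -0.3 / -0.6
assert np.isclose(ds, 0.5)
```

In the $L_0$ mode, only `delta_plus` would be used; in the $L_1$ mode, the smaller of the two step sizes decides whether the next event is a column addition or a removal.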

#### 3.2.6  Update the Solution $x˜t$⁠, Regularization Parameter, and Residual

The current solution $\tilde{x}_t = \mathrm{vec}(\tilde{\mathcal{X}}_t)$ is given by
$\tilde{x}_t = \tilde{x}_{t-1} + \delta_t^* d_t.$
(3.21)
We then update $\lambda_{t+1}$ and $\mathcal{R}_{t+1}$ for the next iteration:
$\lambda_{t+1} = \lambda_t - \delta_t^*.$
(3.22)
The residual tensor $\mathcal{R}_{t+1}$ for the next iteration ($t+1$) can be calculated as
$\mathcal{R}_{t+1} = \mathcal{R}_t - \delta_t^* \, \mathcal{D}_t \times_1 \Phi^{(1)} \times_2 \Phi^{(2)} \times_3 \dots \times_N \Phi^{(N)}.$
(3.23)

#### 3.2.7  Check Stopping Criterion

Check whether either of the following stopping criteria has been reached:
$\left\| \mathcal{R}_{t+1} \right\|_2 < \varepsilon$
(3.24)
or
$\mathrm{length}(I) \ge K.$
(3.25)

The T-LARS algorithm solves the sparse tensor least-squares problem in equation 2.6 to obtain a sparse solution $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N}$ for a tensor $\mathcal{Y} \in \mathbb{R}^{J_1 \times \dots \times J_n \times \dots \times J_N}$ using $N$ mode-$n$ dictionaries $\Phi^{(n)} \in \mathbb{R}^{J_n \times I_n}; \forall n \in \{1,\dots,N\}$. The tensor $\mathcal{Y}$ and the $N$ mode-$n$ dictionaries $\Phi^{(1)}, \Phi^{(2)}, \dots, \Phi^{(N)}$ are the inputs to the T-LARS algorithm, where $N \ge 1$. The T-LARS algorithm can be used to solve underdetermined, square, or overdetermined sparse tensor least-squares problems, where the mode-$n$ dictionaries $\Phi^{(n)} \in \mathbb{R}^{J_n \times I_n}; \forall n \in \{1,\dots,N\}$ are overcomplete dictionaries ($J_n < I_n$), complete dictionaries ($J_n = I_n$), or undercomplete dictionaries ($J_n > I_n$), respectively.

The complete T-LARS algorithm is summarized in algorithm 2 (using Matlab notation). This T-LARS algorithm solves the $L_p$ sparse separable least-squares problem when the T-LARS_mode is set to $L_p$.

## 4  Algorithm Computational Complexity

In this section, we analyze the computational complexity of our T-LARS algorithm and compare it with the computational complexity of Kronecker-OMP. We show that the computational complexity of T-LARS is significantly lower than that of Kronecker-OMP when solving sparse tensor least-squares problems.

### 4.1  The Computational Complexity of T-LARS

Let $K$ be the number of iterations used in T-LARS, $P = I_1 \times \dots \times I_n \times \dots \times I_N$ be the number of atoms in the Kronecker dictionary $\Phi$, and $Q = J_1 \times \dots \times J_n \times \dots \times J_N$ be the total number of elements in the tensor $\mathcal{Y}$. In a typical sparse solution obtained by T-LARS, $K \ll P$. In the following analysis, we refer to algorithm 2.

Step 1 of the T-LARS algorithm runs only once and has a computational complexity of
$I_1 Q + I_1 I_2 \frac{Q}{J_1} + \dots + \frac{P Q}{J_1 \cdots J_{n-1} I_{n+1} \cdots I_N} + \dots + P J_N.$
Step 10 of the T-LARS algorithm obtains the inverse Gram matrix using Schur complement inversion and has complexity $O(Pk + k^3)$, where $k$ is the iteration number, whose maximum value is $K$. The computational complexity of step 10 for column addition is $PK + 4\sum_{k=1}^{K} k^2$, and the computational complexity of step 10 for column removal is $2\sum_{k=1}^{K} k^2$. Therefore, the maximum computational complexity of step 10 is given by
$PK + 4\sum_{k=1}^{K} k^2.$
Steps 13 and 27 of the T-LARS algorithm involve multilinear transformations. In both steps, $\mathcal{D}_t$ is a sparse tensor with at most $k$ nonzero entries at any iteration $k$. Therefore, for $K$ iterations, the computational complexities of steps 13 and 27 are
$2\sum_{k=1}^{K} \sum_{n=1}^{N} k I_n$
and
$2\sum_{k=1}^{K} \sum_{n=1}^{N} k J_n + 2KQ,$
respectively.

#### 4.1.1  Case of Overcomplete mode-n Dictionaries

For overcomplete mode-$n$ dictionaries $\Phi^{(n)} \in \mathbb{R}^{J_n \times I_n}; J_n < I_n$, $n \in \{1,\dots,N\}$, step 13 of the T-LARS algorithm has a higher computational complexity than step 27. Therefore, the computational complexity of the most computationally intensive steps of the T-LARS algorithm would be less than
$I_1 Q + I_1 I_2 \frac{Q}{J_1} + \dots + \frac{P Q}{J_1 \cdots J_{n-1} I_{n+1} \cdots I_N} + \dots + P J_N + PK + 4\sum_{k=1}^{K} k^2 + 2\sum_{k=1}^{K} \sum_{n=1}^{N} k I_n + 2KQ.$
(4.1)

### 4.2  Comparison of Computational Complexities of Kronecker-OMP and T-LARS

Caiafa and Cichocki (2012) earlier analyzed the computational complexity of Kronecker-OMP for solving problem 2.6 given $\mathcal{Y} \in \mathbb{R}^{J_1 \times \dots \times J_n \times \dots \times J_N}$, $J_n = J; \forall n \in \{1,\dots,N\}$, and mode-$n$ dictionaries $\Phi^{(n)} \in \mathbb{R}^{J \times I}$. From Caiafa and Cichocki (2012), after $K$ iterations, the combined computational complexity of Kronecker-OMP is
$2 I^N J \frac{1 - (J/I)^N}{1 - (J/I)} K + 2\left(I^N K + 7 \sum_{k=1}^{K} k^{2N}\right) + (2NJ + N + 4) \sum_{k=1}^{K} k^N + (N(N-1) + 3) K J^N.$
(4.2)
To obtain the computational complexity of T-LARS for solving problem 2.6 given $\mathcal{Y} \in \mathbb{R}^{J_1 \times \dots \times J_n \times \dots \times J_N}$, $J_n = J; \forall n \in \{1,\dots,N\}$, and mode-$n$ dictionaries $\Phi^{(n)} \in \mathbb{R}^{J \times I}$, we substitute $I_n = I$ and $J_n = J; \forall n \in \{1,\dots,N\}$ in equation 4.1 to obtain
$2 I^N J \frac{1 - (J/I)^N}{1 - (J/I)} + I^N K + 4 \sum_{k=1}^{K} k^2 + 2NI \sum_{k=1}^{K} k + 2 K J^N.$
(4.3)

Table 1 shows a term-by-term comparison of the computational complexity of Kronecker-OMP and T-LARS given in equations 4.2 and 4.3, respectively.

Table 1:
Term-by-Term Comparison of the Computational Complexity of Kronecker-OMP and T-LARS Given in Equations 4.2 and 4.3.

| Term | Kronecker-OMP | T-LARS |
| --- | --- | --- |
| First | $2 I^N J \frac{1-(J/I)^N}{1-(J/I)} K$ | $2 I^N J \frac{1-(J/I)^N}{1-(J/I)}$ |
| Second | $2\left(I^N K + 7\sum_{k=1}^{K} k^{2N}\right)$ | $I^N K + 4\sum_{k=1}^{K} k^2$ |
| Third | $(2NJ + N + 4)\sum_{k=1}^{K} k^N$ | $2NI\sum_{k=1}^{K} k$ |
| Fourth | $(N(N-1) + 3) K J^N$ | $2 K J^N$ |

On comparing equations 4.2 and 4.3, we note that the first term of the computational complexity of T-LARS is $K$ times lower than the first term of the computational complexity of Kronecker-OMP:
$2 I^N J \frac{1-(J/I)^N}{1-(J/I)} < 2 I^N J \frac{1-(J/I)^N}{1-(J/I)} K.$
(4.4)
On comparing equations 4.2 and 4.3, we note that the second term of the computational complexity of T-LARS is $O(I^N K + K^3)$, while the second term of the computational complexity of Kronecker-OMP is $O(I^N K + K^{2N+1})$. Therefore, for $N \ge 2$ and the same number of iterations,
$I^N K + 4\sum_{k=1}^{K} k^2 < 2\left(I^N K + 7\sum_{k=1}^{K} k^{2N}\right).$
(4.5)
On comparing equations 4.2 and 4.3, we note that the third term of the computational complexity of T-LARS is $O(K^2)$, while the third term of the computational complexity of Kronecker-OMP is $O(K^{N+1})$. Therefore, for $N \ge 2$ and the same number of iterations,
$2NI\sum_{k=1}^{K} k < (2NJ + N + 4)\sum_{k=1}^{K} k^N.$
(4.6)
On comparing equations 4.2 and 4.3, we note that both fourth terms of the computational complexities of T-LARS and Kronecker-OMP are $O(J^N)$. Therefore,
$2 K J^N < (N(N-1) + 3) K J^N.$
(4.7)
Therefore, from equations 4.4 to 4.7, we observe that the computational complexity of our T-LARS algorithm is significantly lower than that of Kronecker-OMP when solving sparse tensor least-squares problems with $N \ge 2$ and the same number of iterations.

For multidimensional problems, $N \ge 2$, typically $K \gg I$; therefore, the second terms of the computational complexities of both T-LARS and Kronecker-OMP dominate all other terms. Therefore, for $K$ iterations, the asymptotic computational complexities of T-LARS and Kronecker-OMP are $O(I^N K + K^3)$ and $O(I^N K + K^{2N+1})$, respectively.
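To give a feel for the gap between these asymptotic rates, the following check (our own arithmetic, with illustrative values of $N$, $I$, and $K$ that are not from the letter) evaluates the two dominant-term expressions:

```python
# Dominant terms: O(I^N K + K^3) for T-LARS vs O(I^N K + K^(2N+1))
# for Kronecker-OMP, for illustrative N, I, K (assumed values)
N, I, K = 3, 38, 200
tlars = I**N * K + K**3
komp = I**N * K + K**(2 * N + 1)

# The Kronecker-OMP dominant term is many orders of magnitude larger
assert tlars < komp
assert komp // tlars > 10**6
```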

## 5  Experimental Results

In this section, we present experimental results to compare the performance of Kronecker-OMP and T-LARS when used to obtain sparse representations of 3D brain images using both fixed and learned mode-$n$ overcomplete dictionaries.

### 5.1  Experimental Data Sets

For our computational experiments, we obtained a 3D MRI brain image and a 3D PET-CT brain image from publicly available data sets.

Our 3D MRI brain image consists of $175×150×10$ voxels and was obtained from the OASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset for Normal Aging and Alzheimer's Disease (LaMontagne et al., 2018). This image shows a region in the brain of a 38-year-old male patient with a tumor in his right frontal lobe.

Our 3D PET-CT brain image consists of $180 \times 160 \times 10$ voxels and was obtained from the Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection (Clark et al., 2013). This image shows a region in the brain of a 38-year-old female patient.

### 5.2  Experimental Setup

We compared the performance of T-LARS and Kronecker-OMP when used to obtain different sparse representations of our 3D MRI and PET-CT brain images by solving $L_0$ and $L_1$ constrained sparse multilinear least-squares problems using both fixed and learned overcomplete dictionaries.

Our fixed mode-$n$ overcomplete dictionaries were each a union of a discrete cosine transform (DCT) dictionary and a Symlet wavelet packet dictionary with four vanishing moments. In this case of fixed mode-$n$ dictionaries, we obtained the required 3D sparse representations by solving either the 3D $L_0$ or $L_1$ minimization problem.

Our learned mode-$n$ overcomplete dictionaries were learned using the tensor-method of optimal directions (T-MOD) algorithm (Roemer, Del Galdo, & Haardt, 2014). We used T-MOD to learn three overcomplete mode-$n$ dictionaries—$Φ^{(1)} ∈ \mathbb{R}^{32×38}$, $Φ^{(2)} ∈ \mathbb{R}^{32×38}$, and $Φ^{(3)} ∈ \mathbb{R}^{10×12}$—using random patches of $32×32×10$ voxels, with a 10% overlap, from either one of the 3D brain images that we used: MRI or PET-CT (Elad & Aharon, 2006; Zhai, Zhang, Lv, Fu, & Yu, 2018). When using these learned dictionaries, we obtained the required 3D sparse representations by solving either a 4D (due to the use of image patches) $L0$ or $L1$ minimization problem. For a fair comparison of the performance of T-LARS and Kronecker-OMP, we generated our results using two learned dictionaries, $Φ_{KOMP}$ and $Φ_{TLARS}$, obtained by using Kronecker-OMP and T-LARS, respectively, as the sparse coding algorithm in T-MOD.

To compare the performance of T-LARS and Kronecker-OMP when used to solve $L0$ and $L1$ constrained sparse multilinear least-squares problems, we designed the following experiments to obtain sparse representations of our 3D brain images under different conditions:

1. Experiment 1: Fixed mode-$n$ dictionaries—3D $L0$ minimization problem

2. Experiment 2: Learned mode-$n$ dictionaries $Φ_{KOMP}$—4D $L0$ minimization problem

3. Experiment 3: Learned mode-$n$ dictionaries $Φ_{TLARS}$—4D $L0$ minimization problem

4. Experiment 4: Fixed mode-$n$ dictionaries—3D $L1$ minimization problem

5. Experiment 5: Learned mode-$n$ dictionaries $Φ_{TLARS}$—4D $L1$ minimization problem

All of our experimental results were obtained using a Matlab implementation of T-LARS and Kronecker-OMP on an MS Windows machine: 2 Intel Xeon E5-2637 v4 CPUs at 3.5 GHz, 32 GB RAM, and an NVIDIA Tesla P100 GPU with 12 GB memory.

### 5.3  Experimental Results for 3D MRI Brain Images

In this section, we compare the performance of T-LARS and Kronecker-OMP in obtaining $K$-sparse representations of our 3D MRI brain image, $Y$, of $175×150×10$ voxels, by solving the $L0$ constrained sparse tensor least-squares problem. We also obtained similar $K$-sparse representations using T-LARS by solving the $L1$ optimization problem. Table 2 summarizes our results for the five experiments described in section 5.2. In all experiments, the algorithms were stopped when the number of nonzero coefficients $K$ reached 13,125, which is 5% of the number of elements in $Y$. We note that in Table 2, the number of iterations for the $L1$ optimization problems is larger than $K$ because, as shown in algorithm 2, at each iteration T-LARS could either add nonzero coefficients to or remove them from the solution.

Table 2:
Summary of Experimental Results for Our 3D MRI Brain Image.

| Experiment | Image Size | Optimization Problem | Dictionary Type | Number of Iterations | Computation Time (sec) K-OMP | Computation Time (sec) T-LARS |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 175 × 150 × 10 | $L0$ | Fixed | 13,125 | 20,144 | 434 |
| 2 | 32 × 32 × 10 × 36 | $L0$ | Learned $Φ_{KOMP}$ | 13,125 | 25,002 | 394 |
| 3 | 32 × 32 × 10 × 36 | $L0$ | Learned $Φ_{TLARS}$ | 13,125 | 22,646 | 400 |
| 4 | 175 × 150 × 10 | $L1$ | Fixed | 14,216 | -- | 495 |
| 5 | 32 × 32 × 10 × 36 | $L1$ | Learned $Φ_{TLARS}$ | 14,856 | -- | 490 |
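As a quick arithmetic check of the 5% stopping criterion, the target sparsity $K$ follows directly from the voxel counts (a small Python sketch of ours; the PET-CT value anticipates section 5.4):

```python
# K is fixed at 5% of the number of elements in Y for each image.
K_mri = round(0.05 * 175 * 150 * 10)  # 3D MRI image
K_pet = round(0.05 * 180 * 160 * 10)  # 3D PET-CT image
print(K_mri, K_pet)  # 13125 14400
```
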

#### 5.3.1  Experiment 1

Figures 1 and 2 show the experimental results for representing our 3D MRI brain image using three fixed mode-$n$ overcomplete dictionaries, $Φ^{(1)} ∈ \mathbb{R}^{175×351}$, $Φ^{(2)} ∈ \mathbb{R}^{150×302}$, and $Φ^{(3)} ∈ \mathbb{R}^{10×26}$, by solving the $L0$ minimization problem using both T-LARS and Kronecker-OMP. The residual error of the reconstructed 3D images obtained using T-LARS was $R^2=0.0839$ (8.39%); for Kronecker-OMP, it was $R^2=0.0624$ (6.24%).

Figure 1:

Experiment 1. Original 3D MRI brain image (a), its reconstruction using 5% nonzero coefficients ($K=13,125$) obtained by Kronecker-OMP (b), and T-LARS (c) using fixed mode-$n$ overcomplete dictionaries.

Figure 2:

Experiment 1. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients obtained by applying Kronecker-OMP and T-LARS to our 3D MRI brain image and using fixed mode-$n$ overcomplete dictionaries.

#### 5.3.2  Experiments 2 and 3

Figures 3 and 4 show the experimental results for representing our 3D MRI brain image using our learned overcomplete dictionaries, $Φ_{KOMP}$ and $Φ_{TLARS}$, by solving the $L0$ minimization problem using both T-LARS and Kronecker-OMP. For the $Φ_{KOMP}$ dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was $R^2=0.1368$ (13.68%), and for Kronecker-OMP, it was $R^2=0.1143$ (11.43%). For the $Φ_{TLARS}$ dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was $R^2=0.1127$ (11.27%), and for Kronecker-OMP, it was $R^2=0.0955$ (9.55%).

Figure 3:

Experiments 2 and 3. Original 3D MRI brain image (a), its reconstructions using 5% nonzero coefficients ($K=13,125$), (b–e), the difference images, (f–i) obtained using Kronecker-OMP and T-LARS, using our learned overcomplete dictionaries.

Figure 4:

Experiments 2 and 3. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying Kronecker-OMP and T-LARS to our 3D MRI brain image and using our learned overcomplete dictionaries.

#### 5.3.3  Experiment 4

Figures 5 and 6 show the experimental results for representing our 3D MRI brain image using three fixed mode-$n$ overcomplete dictionaries, $Φ^{(1)} ∈ \mathbb{R}^{175×351}$, $Φ^{(2)} ∈ \mathbb{R}^{150×302}$, and $Φ^{(3)} ∈ \mathbb{R}^{10×26}$, by solving the $L1$ minimization problem using T-LARS. The residual error of the reconstructed 3D image obtained using T-LARS was $R^2=0.121$ (12.1%).

Figure 5:

Experiment 4. Original 3D MRI brain image (a), its reconstruction using 5% nonzero coefficients ($K=13,125$) obtained by T-LARS using fixed mode-$n$ overcomplete dictionaries (b), and the difference image (c).

Figure 6:

Experiment 4. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying T-LARS to our 3D MRI brain image and using fixed mode-$n$ overcomplete dictionaries.

#### 5.3.4  Experiment 5

Figures 7 and 8 show the experimental results for representing our 3D MRI brain image using our learned overcomplete dictionary $Φ_{TLARS}$, by solving the $L1$ minimization problem using T-LARS. The residual error of the reconstructed 3D image was $R^2=0.138$ (13.8%).

Figure 7:

Experiment 5. Original 3D MRI brain image (a) and its reconstruction using 5% nonzero coefficients ($K=13,125$) obtained by T-LARS using our learned overcomplete dictionary (b) and the difference image (c).

Figure 8:

Experiment 5. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients obtained by applying T-LARS to our 3D MRI brain image and using our learned overcomplete dictionary.

### 5.4  Experimental Results for 3D PET-CT Brain Images

In this section, we compare the performance of T-LARS and Kronecker-OMP in obtaining $K$-sparse representations of our 3D PET-CT brain image, $Y$, of $180×160×10$ voxels, by solving the $L0$ constrained sparse tensor least-squares problem. We also obtained similar $K$-sparse representations using T-LARS by solving the $L1$ optimization problem. Table 3 summarizes our results for the five experiments. In all experiments, the algorithms were stopped when the number of nonzero coefficients $K$ reached 14,400, which is 5% of the number of elements in $Y$. We note that in Table 3, the number of iterations for the $L1$ optimization problems is larger than $K$ because, as shown in algorithm 2, at each iteration T-LARS could either add nonzero coefficients to or remove them from the solution.

Table 3:
Summary of Experimental Results for Our 3D PET-CT Brain Image.

| Experiment | Image Size | Optimization Problem | Dictionary Type | Number of Iterations | Computation Time (sec) K-OMP | Computation Time (sec) T-LARS |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 180 × 160 × 10 | $L0$ | Fixed | 14,400 | 29,529 | 505 |
| 2 | 32 × 32 × 10 × 42 | $L0$ | Learned $Φ_{KOMP}$ | 14,400 | 33,453 | 476 |
| 3 | 32 × 32 × 10 × 42 | $L0$ | Learned $Φ_{TLARS}$ | 14,400 | 31,083 | 490 |
| 4 | 180 × 160 × 10 | $L1$ | Fixed | 16,059 | -- | 591 |
| 5 | 32 × 32 × 10 × 42 | $L1$ | Learned $Φ_{TLARS}$ | 18,995 | -- | 744 |

#### 5.4.1  Experiment 1

Figures 9 and 10 show the experimental results for representing our 3D PET-CT brain image using three fixed mode-$n$ overcomplete dictionaries, $Φ^{(1)} ∈ \mathbb{R}^{180×364}$, $Φ^{(2)} ∈ \mathbb{R}^{160×320}$, and $Φ^{(3)} ∈ \mathbb{R}^{10×26}$, by solving the $L0$ minimization problem using both T-LARS and Kronecker-OMP. The residual error of the reconstructed 3D images obtained using T-LARS was $R^2=0.054$ (5.4%), and for Kronecker-OMP, it was $R^2=0.0368$ (3.68%).

Figure 9:

Experiment 1. Original PET-CT brain image (a) and its reconstruction using 5% nonzero coefficients ($K=14,400$) obtained by Kronecker-OMP (b) and T-LARS (c) using fixed mode-$n$ overcomplete dictionaries.

Figure 10:

Experiment 1. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying Kronecker-OMP and T-LARS to our 3D PET-CT brain image and using fixed mode-$n$ overcomplete dictionaries.

#### 5.4.2  Experiments 2 and 3

Figures 11 and 12 show the experimental results for representing our 3D PET-CT brain image using our learned overcomplete dictionaries $Φ_{KOMP}$ and $Φ_{TLARS}$, by solving the $L0$ minimization problem using both T-LARS and Kronecker-OMP. For the $Φ_{KOMP}$ dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was $R^2=0.096$ (9.6%), and for Kronecker-OMP, it was $R^2=0.077$ (7.7%). For the $Φ_{TLARS}$ dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was $R^2=0.0877$ (8.77%), and for Kronecker-OMP, it was $R^2=0.0722$ (7.22%).

Figure 11:

Experiments 2 and 3. Original 3D PET-CT brain image (a), its reconstructions using 5% nonzero coefficients ($K=14,400$) (b–e), the difference images (f–i) obtained using Kronecker-OMP and T-LARS, using our learned overcomplete dictionaries.

Figure 12:

Experiments 2 and 3. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying Kronecker-OMP and T-LARS to our 3D PET-CT brain image and using our learned overcomplete dictionaries.

#### 5.4.3  Experiment 4

Figures 13 and 14 show the experimental results for representing our 3D PET-CT brain image using three fixed mode-$n$ overcomplete dictionaries, $Φ^{(1)} ∈ \mathbb{R}^{180×364}$, $Φ^{(2)} ∈ \mathbb{R}^{160×320}$, and $Φ^{(3)} ∈ \mathbb{R}^{10×26}$, by solving the $L1$ minimization problem using T-LARS. The residual error of the reconstructed 3D PET-CT brain image obtained using T-LARS was $R^2=0.0838$ (8.38%).

Figure 13:

Experiment 4. Original 3D PET-CT brain image (a), its reconstruction using 5% nonzero coefficients ($K=14,400$) obtained by T-LARS using fixed mode-$n$ overcomplete dictionaries (b), and the difference image (c).

Figure 14:

Experiment 4. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying T-LARS to our 3D PET-CT brain image and using fixed mode-$n$ overcomplete dictionaries.

#### 5.4.4  Experiment 5

Figures 15 and 16 show the experimental results for representing our 3D PET-CT brain image using our learned overcomplete dictionary $Φ_{TLARS}$, by solving the $L1$ minimization problem using T-LARS. The residual error of the reconstructed 3D PET-CT brain image obtained using T-LARS was $R^2=0.106$ (10.6%).

Figure 15:

Experiment 5. Original 3D PET-CT brain image (a) and its reconstruction using 5% nonzero coefficients ($K=14,400$) obtained by T-LARS using our learned overcomplete dictionary (b) and the difference image (c).

Figure 16:

Experiment 5. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients obtained by applying T-LARS to our 3D PET-CT brain image and using our learned overcomplete dictionary.

## 6  Conclusion

In this letter, we developed tensor least angle regression (T-LARS), a generalization of least angle regression to efficiently solve large $L0$ or large $L1$ constrained multidimensional (tensor) sparse least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter $λ$. An earlier generalization of OMP, Kronecker-OMP, was developed to solve the $L0$ problem for large multidimensional sparse least-squares problems. To demonstrate the validity and performance of our T-LARS algorithm, we successfully used it to obtain different $K$-sparse signal representations of two 3D brain images, using fixed and learned separable overcomplete dictionaries, by solving 3D and 4D, $L0$ and $L1$ constrained sparse least-squares problems. Our numerical experiments demonstrate that our T-LARS algorithm is significantly faster (46 to 70 times) than Kronecker-OMP in obtaining $K$-sparse solutions for multilinear least-squares problems. However, the $K$-sparse solutions obtained using Kronecker-OMP always have a slightly lower residual error (1.55% to 2.25%) than those obtained by T-LARS. These numerical results confirm our analysis in section 4.2 that the computational complexity of T-LARS is significantly lower than that of Kronecker-OMP. In future work, we plan to exploit this significant computational efficiency of T-LARS to develop more computationally efficient Kronecker dictionary learning methods. A Matlab GPU-based implementation of our tensor least angle regression (T-LARS) algorithm, algorithm 2, is available at https://github.com/SSSherif/Tensor-Kronecker-Least-Angle-Regression.

## Appendix A:  Mapping of Column Indices of Dictionary $Φ$ to Column Indices of Mode-$n$ Dictionaries

T-LARS avoids the construction of large matrices such as the separable dictionary $Φ$ in equation 2.4 and instead uses the mode-$n$ dictionaries in its calculations. The following mapping between column indices of the dictionary $Φ$ and column indices of the mode-$n$ dictionaries $Φ^{(n)}, n∈\{1,…,N\}$, is essential in T-LARS calculations.

The arbitrary column $Φ_k$ is the $k$th column of the separable dictionary $Φ$ in equation 2.4, which is given by the Kronecker product of columns of the mode-$n$ dictionary matrices $Φ^{(1)}, Φ^{(2)}, …, Φ^{(N)}$:
$Φ_k = φ_{i_N}^{(N)} ⊗ φ_{i_{N-1}}^{(N-1)} ⊗ ⋯ ⊗ φ_{i_1}^{(1)}.$
(A.1)
The column indices $(i_N, i_{N-1}, …, i_1)$ are the indices of the columns of the dictionary matrices $Φ^{(1)}, Φ^{(2)}, …, Φ^{(N)}$. The column index $k$ of the separable dictionary $Φ$ is given by (Kolda, 2006)
$k = i_1 + \sum_{n=2}^{N} (i_n - 1)\, I_1 I_2 ⋯ I_{n-1},$
(A.2)
where $I_1, I_2, …, I_N$ are the dimensions of the columns of the dictionary matrices $Φ^{(1)}, Φ^{(2)}, …, Φ^{(N)}$, respectively. The following proposition shows how to obtain the column indices $(i_N, i_{N-1}, …, i_1)$ of the dictionary matrices, which correspond to the column index $k$ of the separable dictionary $Φ$.
Proposition 1.
Let $k$ be the column index of the separable dictionary column vector $Φ_k$, and let $I_n$ be the dimension of the columns of each dictionary matrix $Φ^{(n)}, n∈\{1,…,N\}$. In equation A.1, the corresponding column indices $i_n, n∈\{1,…,N\}$, of the dictionary columns $φ_{i_n}^{(n)}$ are given by
$i_n = \left\lceil \dfrac{k}{I_1 × ⋯ × I_{n-1}} \right\rceil - \sum_{p=n+1}^{N} (i_p - 1) \prod_{q=n}^{p-1} I_q,$
(A.3)
where $\lceil · \rceil$ indicates the ceiling function—for example,
$i_N = \left\lceil \dfrac{k}{I_1 × ⋯ × I_{N-1}} \right\rceil, \quad i_{N-1} = \left\lceil \dfrac{k}{I_1 × ⋯ × I_{N-2}} \right\rceil - (i_N - 1) I_{N-1}, \quad …, \quad i_1 = k - (i_N - 1) I_{N-1} × ⋯ × I_1 - ⋯ - (i_2 - 1) I_1.$
Proof.

We note that $i_n, ∀n∈\{1,2,…,N\}$, are integers and $1 ≤ i_n ≤ I_n$.

From equation A.2,
$k = i_1 + (i_2 - 1) I_1 + ⋯ + (i_N - 1) I_1 × ⋯ × I_{N-1}.$
(A.4)
Therefore,
$i_N - 1 = \underbrace{\dfrac{k}{I_1 × ⋯ × I_{N-1}}}_{S_N} - \underbrace{\dfrac{i_1 + (i_2 - 1) I_1 + ⋯ + (i_{N-1} - 1) I_1 × ⋯ × I_{N-2}}{I_1 × ⋯ × I_{N-1}}}_{f_N}, \qquad i_N = S_N + 1 - f_N.$
(A.5)
Since $i_n ≤ I_n, ∀n∈\{1,2,…,N\}$,
$f_N ≤ \dfrac{I_1 + (I_2 - 1) I_1 + ⋯ + (I_{N-1} - 1) I_1 × ⋯ × I_{N-2}}{I_1 × ⋯ × I_{N-1}} = 1.$
(A.6)
Also, since $1 ≤ i_n, ∀n∈\{1,2,…,N\}$,
$f_N ≥ \dfrac{1 + (1 - 1) I_1 + ⋯ + (1 - 1) I_1 × ⋯ × I_{N-2}}{I_1 × ⋯ × I_{N-1}} > 0.$
(A.7)
Therefore, from equations A.6 and A.7,
$0 < f_N ≤ 1,$
(A.8)
$0 ≤ 1 - f_N < 1.$
(A.9)
Since $i_N ∈ \mathbb{Z}$, from equations A.5 and A.9, we have
$i_N = \lceil i_N - (1 - f_N) \rceil = \lceil S_N \rceil, \qquad i_N = \left\lceil \dfrac{k}{I_1 × ⋯ × I_{N-1}} \right\rceil,$
(A.10)
where $\lceil · \rceil$ indicates the ceiling function.
Similarly,
$i_{N-1} - 1 = \underbrace{\dfrac{k}{I_1 × ⋯ × I_{N-2}} - (i_N - 1) I_{N-1}}_{S_{N-1}} - \underbrace{\dfrac{i_1 + (i_2 - 1) I_1 + ⋯ + (i_{N-2} - 1) I_1 × ⋯ × I_{N-3}}{I_1 × ⋯ × I_{N-2}}}_{f_{N-1}}, \qquad i_{N-1} = S_{N-1} + 1 - f_{N-1}, \qquad 0 ≤ 1 - f_{N-1} < 1.$
Since $i_{N-1} ∈ \mathbb{Z}$ and $0 ≤ 1 - f_{N-1} < 1$,
$i_{N-1} = \lceil i_{N-1} - (1 - f_{N-1}) \rceil = \lceil S_{N-1} \rceil, \qquad i_{N-1} = \left\lceil \dfrac{k}{I_1 × ⋯ × I_{N-2}} \right\rceil - (i_N - 1) I_{N-1}.$
Similarly, for all $n∈\{1,2,…,N\}$,
$0 ≤ 1 - f_n < 1, \qquad i_n = \lceil i_n - (1 - f_n) \rceil = \lceil S_n \rceil, \qquad i_n = \left\lceil \dfrac{k}{I_1 × ⋯ × I_{n-1}} \right\rceil - \sum_{p=n+1}^{N} (i_p - 1) \prod_{q=n}^{p-1} I_q.$
$□$
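The index mapping of proposition 1 can be exercised with a short sketch (ours, in Python); the function names are hypothetical, and the round-trip check covers a small three-mode example:

```python
from math import ceil, prod

def multi_to_linear(idx, dims):
    """Equation A.2: map 1-based mode-n column indices (i_1, ..., i_N)
    to the 1-based column index k of the separable dictionary."""
    k = idx[0]
    for n in range(1, len(dims)):
        k += (idx[n] - 1) * prod(dims[:n])
    return k

def linear_to_multi(k, dims):
    """Equation A.3: recover (i_1, ..., i_N) from k via the ceiling
    formula, working from i_N down to i_1."""
    N = len(dims)
    idx = [0] * N
    for n in range(N - 1, -1, -1):
        s = ceil(k / prod(dims[:n]))  # ceil(k / (I_1 x ... x I_{n-1}))
        for p in range(n + 1, N):
            s -= (idx[p] - 1) * prod(dims[n:p])
        idx[n] = s
    return tuple(idx)

dims = (3, 4, 5)  # I_1, I_2, I_3
# Round trip over every valid multi-index confirms the two formulas agree.
assert all(
    linear_to_multi(multi_to_linear((i1, i2, i3), dims), dims) == (i1, i2, i3)
    for i1 in range(1, 4) for i2 in range(1, 5) for i3 in range(1, 6))
```
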

## Appendix B:  Tensor and Dictionary Normalization

### B.1  Normalization of the Tensor $Y ∈ \mathbb{R}^{J_1 × ⋯ × J_n × ⋯ × J_N}$

Compute
$\hat{Y} = \dfrac{Y}{\|Y\|_2},$
(B.1)
where $\|Y\|_2 = \sqrt{⟨Y, Y⟩} = \left( \sum_{j_1=1}^{J_1} ⋯ \sum_{j_N=1}^{J_N} y_{j_1 j_2 … j_N}^2 \right)^{1/2}$.

### B.2  Normalization of Columns of the Separable Dictionary $Φ$ to Have a Unit $L2$ Norm

The column $Φ_k$ in equation A.1 is the $k$th column of the separable dictionary $Φ$. Each column vector $Φ_k$ of the separable dictionary is normalized as
$\hat{Φ}_k = \dfrac{Φ_k}{\|Φ_k\|_2}.$
(B.2)
Proposition 2.
Normalization of the column $Φ_k$ in equation A.1 is given by the Kronecker product of the normalized dictionary columns $\hat{φ}_{i_N}^{(N)}, \hat{φ}_{i_{N-1}}^{(N-1)}, …, \hat{φ}_{i_1}^{(1)}$:
$\hat{Φ}_k = \hat{φ}_{i_N}^{(N)} ⊗ \hat{φ}_{i_{N-1}}^{(N-1)} ⊗ ⋯ ⊗ \hat{φ}_{i_1}^{(1)}.$
(B.3)
Proof.
The $L2$ norm of a Kronecker product of vectors is the product of the $L2$ norms of these vectors (Lancaster & Farahat, 1972):
$\|Φ_k\|_2 = \|φ_{i_N}^{(N)} ⊗ φ_{i_{N-1}}^{(N-1)} ⊗ ⋯ ⊗ φ_{i_1}^{(1)}\|_2 = \|φ_{i_N}^{(N)}\|_2 × \|φ_{i_{N-1}}^{(N-1)}\|_2 × ⋯ × \|φ_{i_1}^{(1)}\|_2.$
(B.4)
From equations B.2 and B.4,
$\hat{Φ}_k = \dfrac{Φ_k}{\|Φ_k\|_2} = \dfrac{φ_{i_N}^{(N)}}{\|φ_{i_N}^{(N)}\|_2} ⊗ ⋯ ⊗ \dfrac{φ_{i_1}^{(1)}}{\|φ_{i_1}^{(1)}\|_2}.$
(B.5)
Therefore,
$\hat{Φ}_k = \hat{φ}_{i_N}^{(N)} ⊗ \hat{φ}_{i_{N-1}}^{(N-1)} ⊗ ⋯ ⊗ \hat{φ}_{i_1}^{(1)}.$
$□$
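A minimal numerical check of proposition 2 (ours, in NumPy, with arbitrary random vectors standing in for mode-$n$ dictionary columns):

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary factor vectors standing in for mode-n dictionary columns.
a, b, c = (rng.standard_normal(m) for m in (4, 5, 3))

# Equation B.4: the L2 norm of a Kronecker product of vectors is the
# product of the L2 norms of the factors.
kron = np.kron(np.kron(c, b), a)
assert np.isclose(np.linalg.norm(kron),
                  np.linalg.norm(a) * np.linalg.norm(b) * np.linalg.norm(c))

# Equation B.3: normalizing the factors first yields a unit-norm column.
unit = np.kron(np.kron(c / np.linalg.norm(c), b / np.linalg.norm(b)),
               a / np.linalg.norm(a))
assert np.isclose(np.linalg.norm(unit), 1.0)
```
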

## Appendix C:  Obtaining the Initial Correlation Tensor $C_1$

In LARS, the initial correlation vector $c_1$ is obtained by taking the correlation between all columns of $Φ$ and the vectorization of the tensor $Y$:
$c_1 = \left( Φ^{(N)T} ⊗ Φ^{(N-1)T} ⊗ ⋯ ⊗ Φ^{(1)T} \right) \mathrm{vec}(Y).$
(C.1)
We could also represent equation C.1 as a multilinear transformation of the tensor $Y$ (Kolda, 2006):
$C_1 = Y ×_1 Φ^{(1)T} ×_2 ⋯ ×_n Φ^{(n)T} ×_{n+1} ⋯ ×_N Φ^{(N)T}.$
(C.2)
The tensor $C_1$ is the correlation between the tensor $Y$ and the mode-$n$ dictionary matrices $Φ^{(n)}, n∈\{1,…,N\}$, and could be calculated efficiently as $N$ mode-$n$ products.
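A small NumPy sketch (ours; all sizes arbitrary) can confirm that the multilinear form in equation C.2 matches the Kronecker form in equation C.1 under column-major vectorization:

```python
import numpy as np

rng = np.random.default_rng(1)
J, I = (4, 5, 3), (6, 7, 4)  # hypothetical tensor and dictionary sizes
Y = rng.standard_normal(J)
Phi = [rng.standard_normal((J[n], I[n])) for n in range(3)]

# Equation C.2: mode-n product of Y with Phi^(n)T along each mode n.
C = Y
for n in range(3):
    C = np.moveaxis(np.tensordot(Phi[n].T, np.moveaxis(C, n, 0), axes=1), 0, n)

# Equation C.1: equivalent Kronecker form, using column-major (Fortran) vec.
big = np.kron(np.kron(Phi[2].T, Phi[1].T), Phi[0].T)
c1 = big @ Y.reshape(-1, order="F")
assert np.allclose(C.reshape(-1, order="F"), c1)
```

The mode-$n$ product route never forms the large Kronecker matrix, which is the point of computing $C_1$ this way.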

## Appendix D:  Creating a Gram Matrix for Each Mode-$n$ Dictionary $Φ^{(n)}$

Gram matrices are used in several steps of T-LARS. For a large separable dictionary $Φ$, its Gram matrix would be large as well, so explicitly building this Gram matrix and using it in computations could be very inefficient for large problems. Therefore, we developed T-LARS to use the Gram matrices of the mode-$n$ dictionary matrices $Φ^{(1)}, Φ^{(2)}, …, Φ^{(N)}$, defined as $G^{(n)}, n∈\{1,…,N\}$, instead of the Gram matrix $Φ^T Φ$:
$Φ^T Φ = Φ^{(N)T} Φ^{(N)} ⊗ ⋯ ⊗ Φ^{(n)T} Φ^{(n)} ⊗ ⋯ ⊗ Φ^{(1)T} Φ^{(1)}.$
(D.1)
We obtain the Gram matrix $G^{(n)}$ for each mode-$n$ dictionary $Φ^{(n)}$ as
$G^{(n)} = Φ^{(n)T} Φ^{(n)}.$
(D.2)
The total size of the Gram matrices $G^{(n)}, n∈\{1,…,N\}$, is much smaller than that of the Gram matrix $G = Φ^T Φ$, thereby allowing faster calculations and requiring less computer storage.
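Equation D.1 and the resulting storage savings can be illustrated with a small NumPy sketch (ours; the dictionary sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical small mode-n dictionaries Phi^(n) of sizes 4x8, 5x10, 3x6.
Phi = [rng.standard_normal((d, 2 * d)) for d in (4, 5, 3)]
G = [P.T @ P for P in Phi]  # equation D.2: G^(n) = Phi^(n)T Phi^(n)

# Equation D.1: the Kronecker product of the small Gram matrices equals the
# Gram matrix of the full separable dictionary -- without ever forming Phi.
full_Phi = np.kron(np.kron(Phi[2], Phi[1]), Phi[0])
full_G = np.kron(np.kron(G[2], G[1]), G[0])
assert np.allclose(full_G, full_Phi.T @ full_Phi)

small = sum(g.size for g in G)  # 8^2 + 10^2 + 6^2 = 200 entries
large = full_G.size             # (8*10*6)^2 = 230,400 entries
print(small, large)
```

Even at these toy sizes, the mode-$n$ Gram matrices hold three orders of magnitude fewer entries than the full Gram matrix.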

## Acknowledgments

This work was supported by a Discovery Grant (RGPIN-2018-06453) from the Natural Sciences and Engineering Research Council of Canada.

## References

Acar, E., Dunlavy, D. M., & Kolda, T. G. (2011). A scalable optimization approach for fitting canonical tensor decompositions. Journal of Chemometrics, 25, 67–86. https://doi.org/10.1002/cem.1335

Björck, Å. (2015). Numerical methods in matrix computations. Berlin: Springer.

Boyd, S., & Vandenberghe, L. (2010). Convex optimization. Cambridge: Cambridge University Press.

Caiafa, C. F., & Cichocki, A. (2012). Computing sparse representations of multidimensional signals using Kronecker bases. Neural Computation, 25, 1–35. https://doi.org/10.1162/neco_a_00385

Chen, S. S., Donoho, D. L., & Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1), 33–61. https://doi.org/10.1137/S1064827596304010

Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., & Phan, H. A. (2015). Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Processing Magazine, 32, 145–163. https://doi.org/10.1109/MSP.2013.2297439

Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., & Prior, F. (2013). The cancer imaging archive (TCIA): Maintaining and operating a public information repository. Journal of Digital Imaging, 26(6), 1045–1057. https://doi.org/10.1007/s10278-013-9622-7

Coifman, R. R., & Wickerhauser, M. V. (1992). Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, 38(2), 713–718. https://doi.org/10.1109/18.119732

Daubechies, I. (1990). The wavelet transform, time-frequency localization and signal analysis. IEEE Transactions on Information Theory, 36(5), 961–1005. https://doi.org/10.1109/18.57199

Donoho, D. L. (2006). For most large underdetermined systems of linear equations the minimal $ℓ1$-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 59(6), 797–829. https://doi.org/10.1002/cpa.20132

Donoho, D. L., & Elad, M. (2003). Optimally sparse representation in general (nonorthogonal) dictionaries via $ℓ1$ minimization. Proceedings of the National Academy of Sciences, 100(5), 2197–2202. https://doi.org/10.1073/pnas.0437847100

Donoho, D. L., & Tsaig, Y. (2008). Fast solution of $ℓ1$-norm minimization problems when the solution may be sparse. IEEE Transactions on Information Theory, 54(11), 4789–4812. https://doi.org/10.1109/TIT.2008.929958

Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., Ishwaran, H., Knight, K., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32(2), 407–499. https://doi.org/10.1214/009053604000000067

Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745. https://doi.org/10.1109/TIP.2006.881969

Elrewainy, A., & Sherif, S. S. (2019). Kronecker least angle regression for unsupervised unmixing of hyperspectral imaging data. Signal, Image and Video Processing, 14(2), 359–367. https://doi.org/10.1007/s11760-019-01562-w

Goldfarb, D. (1972). Modification methods for inverting matrices and solving systems of linear algebraic equations. Mathematics of Computation, 26(120), 829. https://doi.org/10.1090/S0025-5718-1972-0317527-4

Hager, W. W. (1989). Updating the inverse of a matrix. SIAM Review, 31(2), 221–239. https://doi.org/10.1137/1031049

Hawe, S., Seibert, M., & Kleinsteuber, M. (2013). Separable dictionary learning. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 438–445). Piscataway, NJ: IEEE. https://doi.org/10.1109/CVPR.2013.63

Kolda, T. G. (2006). Multilinear operators for higher-order decompositions (Technical Report SAND2006-2081). Livermore, CA: Sandia National Laboratories. https://doi.org/10.2172/923081

Kolda, T. G., & Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51(3), 455–500. https://doi.org/10.1137/07070111X

Kreutz-Delgado, K., Murray, J. F., Rao, B. D., Engan, K., Lee, T.-W., & Sejnowski, T. J. (2003). Dictionary learning algorithms for sparse representation. Neural Computation, 15(2), 349–396. https://doi.org/10.1162/089976603762552951

LaMontagne, P. J., Keefe, S., Lauren, W., Xiong, C., Grant, E. A., Moulder, K. L., & Marcus, D. S. (2018). OASIS-3: Longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer's disease. Alzheimer's and Dementia, 14(7S, pt. 2), P138. https://doi.org/10.1016/j.jalz.2018.06.2231

Lancaster, P., & Farahat, H. K. (1972). Norms on direct sums and tensor products. Mathematics of Computation, 26(118), 401–414.

Malioutov, D. M., Çetin, M., & Willsky, A. S. (2005). Homotopy continuation for sparse signal representation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 733–736). Piscataway, NJ: IEEE. https://doi.org/10.1109/ICASSP.2005.1416408

Mallat, S. (2009). A wavelet tour of signal processing. Amsterdam: Elsevier. https://doi.org/10.1016/B978-0-12-374370-1.X0001-8

Mallat, S. G., & Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12), 3397–3415. https://doi.org/10.1109/78.258082

Pati, Y. C., Rezaiifar, R., & Krishnaprasad, P. S. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Proceedings of the Asilomar Conference on Signals, Systems & Computers (Vol. 1, pp. 40–44). Piscataway, NJ: IEEE. https://doi.org/10.1109/acssc.1993.342465

Roemer, F., Del Galdo, G., & Haardt, M. (2014). Tensor-based algorithms for learning multidimensional separable dictionaries. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 3963–3967). Piscataway, NJ: IEEE. https://doi.org/10.1109/ICASSP.2014.6854345

Rosário, F., Monteiro, F. A., & Rodrigues, A. (2016). Fast matrix inversion updates for massive MIMO detection and precoding. IEEE Signal Processing Letters, 23(1), 75–79. https://doi.org/10.1109/LSP.2015.2500682

Sidiropoulos, N. D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E. E., & Faloutsos, C. (2017). Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65(13), 3551–3582. https://doi.org/10.1109/TSP.2017.2690524

Sulam, J., Ophir, B., Zibulevsky, M., & Elad, M. (2016). Trainlets: Dictionary learning in high dimensions. arXiv:1602.00212. https://doi.org/10.1109/TSP.2016.2540599

Tosic, I., & Frossard, P. (2011). Dictionary learning. IEEE Signal Processing Magazine, 28(2), 27–38. https://doi.org/10.1109/MSP.2010.939537

Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10), 2231–2242. https://doi.org/10.1109/TIT.2004.834793

van den Berg, E., & Friedlander, M. P. (2009). Probing the Pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing, 31(2), 890–912. https://doi.org/10.1137/080714488

Wickramasingha, I., Sobhy, M., & Sherif, S. S. (2017). Sparsity in Bayesian signal estimation. In J. P. Tejedor (Ed.), Bayesian inference. InTech. https://doi.org/10.5772/intechopen.70529

Yang, J., Peng, Y., Xu, W., & Dai, Q. (2009). Ways to sparse representation: An overview. Science in China, Series F: Information Sciences, 52(4), 695–703. https://doi.org/10.1007/s11432-009-0045-5

Zhai, L., Zhang, Y., Lv, H., Fu, S., & Yu, H. (2018). Multiscale tensor dictionary learning approach for multispectral image denoising. IEEE Access, 6, 51898–51910. https://doi.org/10.1109/ACCESS.2018.2868765