## Abstract

Sparse signal representations have gained much interest recently in both the signal processing and statistics communities. Compared to orthogonal matching pursuit (OMP) and basis pursuit, which solve the $L0$ and $L1$ constrained sparse least-squares problems, respectively, least angle regression (LARS) is a computationally efficient method to solve both problems for all critical values of the regularization parameter $\lambda$. However, none of these methods is suitable for solving large multidimensional sparse least-squares problems, as they would require extensive computational power and memory. An earlier generalization of OMP, known as Kronecker-OMP, was developed to solve the $L0$ problem for large multidimensional sparse least-squares problems. However, its memory usage and computation time increase quickly with the number of problem dimensions and iterations. In this letter, we develop a generalization of LARS, tensor least angle regression (T-LARS), that could efficiently solve either large $L0$ or large $L1$ constrained multidimensional sparse least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter $\lambda$, with lower computational complexity and memory usage than Kronecker-OMP. To demonstrate the validity and performance of our T-LARS algorithm, we used it to successfully obtain different sparse representations of two relatively large 3D brain images, using fixed and learned separable overcomplete dictionaries, by solving both $L0$ and $L1$ constrained sparse least-squares problems. Our numerical experiments demonstrate that our T-LARS algorithm is significantly faster (46 to 70 times) than Kronecker-OMP in obtaining $K$-sparse solutions for multilinear least-squares problems. However, the $K$-sparse solutions obtained using Kronecker-OMP always have a slightly lower residual error (1.55% to 2.25%) than those obtained by T-LARS.
Therefore, T-LARS could be an important tool for numerous multidimensional biomedical signal processing applications.

## 1 Introduction

Sparse signal representations have gained much interest recently in both the signal processing and statistics communities. A sparse signal representation usually results in simpler and faster processing, in addition to lower memory storage requirements because fewer coefficients need to be stored (Mallat, 2009; Wickramasingha, Sobhy, & Sherif, 2017). However, finding an optimal sparse representation for a given signal is not a trivial task (Mallat, 2009). Therefore, redundant signal representations using overcomplete dictionaries have been introduced to facilitate finding sparser representations for different signals (Mallat & Zhang, 1993; Mallat, 2009; Pati, Rezaiifar, & Krishnaprasad, 1993; Wickramasingha et al., 2017). Undercomplete dictionaries could also be used to obtain approximate signal representations (Kreutz-Delgado et al., 2003; Malioutov, Çetin, & Willsky, 2005; Tosic & Frossard, 2011). We note that at their core, such signal representation problems typically involve solving a least-squares problem (Donoho, 2006; Donoho & Elad, 2003).

A number of methods have been proposed to solve the sparse least-squares problem, including the Method of Frames (MOF; Daubechies, 1990), Matching Pursuit (MP; Mallat & Zhang, 1993), Orthogonal Matching Pursuit (OMP; Pati et al., 1993), Best Orthogonal Basis (BOB; Coifman & Wickerhauser, 1992), Lasso (also known as Basis Pursuit, BP; Chen, Donoho, & Saunders, 1998; Efron et al., 2004), and Least Angle Regression (LARS; Efron et al., 2004). MP and OMP obtain sparse signal representations by solving a nonconvex $L0$ constrained least-squares problem (Tropp, 2004). Both are heuristic methods that construct sparse signal representations by sequentially adding atoms from a given dictionary in a greedy (i.e., nonglobally optimal) manner. BP relaxes the nonconvex $L0$ constrained optimization problem to solve a convex $L1$ constrained least-squares problem instead (Chen et al., 1998). In both problem formulations, a regularization parameter $\lambda$ determines the trade-off between the representation error of the signal and its sparsity, as shown in the Pareto curve in van den Berg and Friedlander (2009). A common approach to obtaining a sparse signal representation using BP is to solve the optimization problem multiple times for different values of $\lambda$, before choosing the most suitable solution for the application at hand (van den Berg & Friedlander, 2009).

Compared to the above methods, LARS efficiently solves the $L0$ or, with a slight modification, the $L1$ constrained least-squares problem for all critical values of the regularization parameter $\lambda$ (Efron et al., 2004). However, even LARS is not suitable for large-scale problems, as it would require multiplication and inversion of very large matrices (Yang, Peng, Xu, & Dai, 2009). For example, for $m$ unknown variables and $n$ equations, the LARS algorithm has $O(m^3 + nm^2)$ computational complexity (Efron et al., 2004).

LARS and other algorithms that solve sparse least-squares problems are directly applicable to one-dimensional signals. Therefore, multidimensional signals that are represented by tensors (i.e., multidimensional arrays) would need to be vectorized first to enable the application of these methods (Acar, Dunlavy, & Kolda, 2011; Cichocki et al., 2015; Kolda, 2006). For $I^N$ vectorized variables, a dictionary of size $I^N \times J^N$, where $J > I$ for an overcomplete dictionary, is required to solve the sparse linear least-squares problem using LARS and the other algorithms mentioned above. Here $N$ is the order of the tensor, also known as its number of modes, and $I$ is the dimension of each mode. Therefore, the number of vectorized unknown variables $I^N$ would increase exponentially with the order of the tensor $N$, and such problems would quickly become intractably large. For example, a 3D tensor with 100 unknown variables in each mode has 1 million unknowns, which requires a dictionary with at least 1 trillion $(10^{12})$ elements; a 4D tensor with 100 unknown variables in each mode has 100 million unknowns, which requires a dictionary with at least 10 quadrillion $(10^{16})$ elements.
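The exponential growth described above can be made concrete with a small Python sketch (the helper function is ours, for illustration only):

```python
# Size of the full vectorized (Kronecker) dictionary for an order-N tensor
# with I unknowns per mode and mode-n dictionaries of size I x J each.
def kronecker_dictionary_size(I, J, N):
    """Number of elements in the I^N x J^N vectorized dictionary."""
    return (I ** N) * (J ** N)

# 3D tensor, 100 unknowns per mode, square mode-n dictionaries (J = I):
# 10^6 unknowns and a dictionary with 10^12 elements, as noted above.
print(kronecker_dictionary_size(100, 100, 3))  # 1000000000000
```

Storing the mode-$n$ dictionaries instead requires only $N \cdot I \cdot J$ elements, which is why separable formulations remain tractable.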

Mathematically separable signal representations (i.e., those using separable dictionaries) have typically been used for multidimensional signals because they are simpler and easier to obtain than nonseparable representations (Caiafa & Cichocki, 2012; Sulam, Ophir, Zibulevsky, & Elad, 2016). Caiafa and Cichocki (2012) introduced Kronecker-OMP, a generalization of OMP that could represent multidimensional signals (tensors) using separable dictionaries. They also developed the N-BOMP algorithm to exploit block-sparse structures in multidimensional signals. However, similar to OMP, Kronecker-OMP could obtain only an approximate nonglobally optimal solution of the nonconvex $L0$ constrained sparse least-squares problem (Elrewainy & Sherif, 2019; van den Berg & Friedlander, 2009). Also, there is currently no computationally efficient method to obtain a sparse representation of a multidimensional signal by solving the convex $L1$ constrained sparse least-squares problem for all critical values of the regularization parameter $\lambda$. However, two of us (Elrewainy and Sherif) developed the Kronecker least angle regression (K-LARS) algorithm to efficiently solve either large $L0$ or large $L1$ sparse least-squares problems (overdetermined) with a particular Kronecker form $A \otimes I$, for all critical values of the regularization parameter $\lambda$ (Elrewainy & Sherif, 2019). They used K-LARS to sparsely fit one-dimensional multichannel hyperspectral imaging data to a Kronecker model $A \otimes I$.

In this letter, we develop a generalization of K-LARS, tensor least angle regression (T-LARS), which could efficiently solve either large $L0$ or large $L1$ multidimensional sparse least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter $\lambda$. The remainder of the letter is organized as follows: Section 2 provides a brief introduction to tensors, tensor operations, and multilinear sparse least-squares problems. In section 3, we review LARS and describe our T-LARS algorithm in detail. Section 4 presents the computational complexity of our T-LARS algorithm and compares it with that of Kronecker-OMP. Section 5 provides results of experiments applying both T-LARS and Kronecker-OMP. We present our conclusions in section 6.

## 2 Problem Formulation

### 2.1 Tensors and Multilinear Transformations

The term tensor has a specific mathematical definition in physics, but it has been widely accepted in many disciplines (e.g., signal processing and statistics) to mean a multidimensional array. Therefore, a vector is a first-order tensor, and a matrix is a second-order tensor. An $N$-dimensional array is an $N$th-order tensor whose $N$ dimensions are also known as modes (Hawe, Seibert, & Kleinsteuber, 2013; Sidiropoulos et al., 2017). The $N$th-order tensor $X \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N}$ has $N$ modes, with dimensions $I_1, I_2, \dots, I_N$, where vectors along a specific mode $n$ are called mode-$n$ fibers.
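As a concrete illustration of mode-$n$ fibers, consider the following numpy sketch (ours, not from the letter's Matlab code):

```python
import numpy as np

# Mode-n fibers of a 3rd-order tensor: fix every index except the one
# running along mode n.
X = np.arange(24).reshape(2, 3, 4)  # a 2 x 3 x 4 tensor

mode1_fiber = X[:, 0, 0]  # varies along mode 1, length 2
mode2_fiber = X[0, :, 0]  # varies along mode 2, length 3
mode3_fiber = X[0, 0, :]  # varies along mode 3, length 4
```

Each mode-$n$ fiber has length $I_n$, the dimension of that mode.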

### 2.2 Sparse Multilinear Least-Squares Problem

For the same value of $p$, the $Lp$ minimization problems in equations 2.5 and 2.6 are equivalent, even though they are formulated in vector and tensor forms, respectively. We note that for $0 \le p < 1$, equations 2.5 and 2.6 are nonconvex optimization problems, while for $p \ge 1$, they are convex optimization problems. The sparsest solution of equations 2.5 and 2.6 would be obtained when $p = 0$, that is, for the $L0$ constrained problem, but sparsity would decrease as $p$ increases.

The $L0$ minimization problem ($p=0$) is a nonconvex optimization problem whose vector formulation, problem 2.5, could be solved approximately (i.e., nonglobally) using OMP or LARS (van den Berg & Friedlander, 2009). The $L1$ minimization problem ($p=1$) is a convex optimization problem whose vector formulation, problem 2.5, could be solved exactly (i.e., globally) using BP or LARS (Donoho & Tsaig, 2008). However, OMP, BP, and LARS share a serious drawback: they are not suitable for solving very large sparse least-squares problems because they involve multiplying and inverting very large matrices.

As a way of overcoming this drawback, Caiafa and Cichocki (2012) proposed Kronecker-OMP, a tensor-based generalization of OMP for solving sparse multilinear least-squares problems. However, similar to OMP, Kronecker-OMP could obtain only a nonglobally optimal solution of the nonconvex $L0$ constrained sparse least-squares problem (Donoho & Tsaig, 2008).

## 3 Tensor Least Angle Regression (T-LARS)

Least angle regression (LARS) is a computationally efficient method to solve an $L0$ or $L1$ constrained minimization problem in vector form (problem 2.5) for all critical values of the regularization parameter $\lambda$ (Efron et al., 2004). In this letter, we develop a generalization of LARS, tensor least angle regression (T-LARS), to solve large sparse tensor least-squares problems, for example, to obtain sparse representations of multidimensional signals using a separable dictionary as described by equation 2.3. As shown below, our T-LARS calculations are performed without explicitly generating or inverting large matrices, thereby keeping its computational complexity and memory requirements relatively low. Both the T-LARS and Kronecker-OMP algorithms use the Schur complement inversion formula to invert large matrices without explicitly inverting them (Caiafa & Cichocki, 2012).

### 3.1 Least Angle Regression

LARS solves the $L0$ or $L1$ constrained minimization problem in equation 2.5 for all critical values of the regularization parameter $\lambda$. It starts with a very large value of $\lambda$ that results in an empty active columns matrix, $\Phi_I$, and a solution $\tilde{x}_{t=0} = 0$. The set $I$ denotes an active set of the dictionary $\Phi$, that is, the column indices where the optimal solution $\tilde{x}_t$ at iteration $t$ is nonzero, and $I^c$ denotes its corresponding inactive set. Therefore, $\Phi_I$ contains only the active columns of the dictionary $\Phi$, and $\Phi_{I^c}$ contains only its inactive columns.

At each iteration $t$, a new column is either added to the active set $I$ ($L0$), or added to or removed from the active set $I$ ($L1$), and $\lambda$ is reduced by a calculated value $\delta_t^*$. Such iterations produce new solutions, with an increasing number of coefficients, that follow a piecewise linear path until a predetermined residual error $\epsilon$ is reached. One important characteristic of LARS is that the current solution at each iteration is the optimum sparse solution for the selected active columns.

The condition in equation 3.2 ensures that the magnitude of the correlation between all active columns and the residual is equal to $|\lambda_t|$ at each iteration, and the condition in equation 3.1 ensures that the magnitude of the correlation between the inactive columns and the residual is less than or equal to $\lambda_t$.

For an $L1$ constrained minimization problem, at each iteration, if an inactive column violates condition 3.1, it is added to the active set, and if an active column violates condition 3.2, it is removed from the active set. For an $L0$ constrained minimization problem, only columns that violate condition 3.1 are added to the active set at each iteration.

### 3.2 Tensor Least Angle Regression Algorithm

T-LARS is a generalization of LARS to solve the sparse multilinear least-squares problem in equation 2.6 using tensors and multilinear algebra. Unlike LARS, T-LARS does not calculate large matrices such as the Kronecker dictionary $\Phi$ in equation 2.4, which is required in vectorized sparse multilinear least-squares problems. Instead, T-LARS uses much smaller mode-$n$ dictionaries for its calculations. A mapping between column indices of the dictionary $\Phi$ and column indices of the mode-$n$ dictionaries $\Phi^{(n)}; n \in \{1,\dots,N\}$ is essential in T-LARS calculations (see appendix A).
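Working with mode-$n$ dictionaries instead of the Kronecker dictionary rests on the standard identity $\mathrm{vec}(\Phi^{(1)} X \Phi^{(2)T}) = (\Phi^{(2)} \otimes \Phi^{(1)})\,\mathrm{vec}(X)$ for column-major vectorization. A small numpy check of this equivalence (a sketch with arbitrary dimensions, not the authors' code):

```python
import numpy as np

# Evaluate the separable model with small mode-n products rather than
# forming the large Kronecker dictionary explicitly.
rng = np.random.default_rng(0)
Phi1 = rng.standard_normal((4, 5))   # mode-1 dictionary (J1 x I1)
Phi2 = rng.standard_normal((3, 6))   # mode-2 dictionary (J2 x I2)
X = rng.standard_normal((5, 6))      # coefficient tensor (I1 x I2)

# Mode-n products: Y = X x_1 Phi1 x_2 Phi2
Y = Phi1 @ X @ Phi2.T

# Same result via the explicit Kronecker dictionary acting on vec(X)
# (column-major vectorization to match the Kronecker ordering).
y_vec = np.kron(Phi2, Phi1) @ X.flatten(order="F")
assert np.allclose(Y.flatten(order="F"), y_vec)
```

The mode-$n$ route touches only the small factor matrices, which is the source of T-LARS's memory savings.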

Required inputs to the T-LARS algorithm are the tensor $Y \in \mathbb{R}^{J_1 \times \dots \times J_n \times \dots \times J_N}$, the mode-$n$ dictionary matrices $\Phi^{(n)}; n \in \{1,\dots,N\}$, and the stopping criterion, given either as a residual tolerance $\epsilon$ or as the maximum number of nonzero coefficients $K$ ($K$-sparse representation). The output is the solution tensor $X \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N}$.

First, normalize the tensor $Y$ and the columns of each dictionary $\Phi^{(n)}; n \in \{1,\dots,N\}$ to have a unit $L2$ norm. Note that normalizing the columns of each dictionary $\Phi^{(n)}; n \in \{1,\dots,N\}$ ensures normalization of the separable dictionary $\Phi$ in equation 2.4 (see appendix B). For notational simplicity in the following sections, we will use $Y$ to represent the normalized tensor and $\Phi^{(n)}$ to represent the normalized dictionary matrices.
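A quick numerical check of this normalization property (our own sketch, not part of algorithm 2): the norm of a Kronecker product of vectors is the product of their norms, so unit-norm mode-$n$ columns yield unit-norm columns of $\Phi$.

```python
import numpy as np

# Normalize the columns of two mode-n dictionaries and verify that every
# column of their Kronecker product also has unit L2 norm.
rng = np.random.default_rng(1)
Phi1 = rng.standard_normal((4, 6))
Phi2 = rng.standard_normal((3, 5))
Phi1 /= np.linalg.norm(Phi1, axis=0)  # unit-norm columns
Phi2 /= np.linalg.norm(Phi2, axis=0)

Phi = np.kron(Phi2, Phi1)             # separable dictionary
assert np.allclose(np.linalg.norm(Phi, axis=0), 1.0)
```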

T-LARS requires several parameters to be initialized before starting the iterations. The regularization parameter $\lambda_1$ is initialized to the maximum value of the correlation vector $c_1$, and the corresponding most correlated column $\varphi_{I_1}$ from the separable dictionary $\Phi$ is added to the initial active set $I$. The initial residual tensor $R_0$ is set to $Y$, and the initial solution vector $\tilde{x}_0$ and the initial direction vector $d_0$ are set to **0**. The initial step size $\delta_0^*$ is also set to 0. T-LARS starts the iterations at $t=1$ and runs until a stopping criterion is reached:

Initial residual tensor: $R_0 = Y$

Initial solution vector: $\tilde{x}_0 = 0$

Initial direction vector: $d_0 = 0$

Initial step size: $\delta_0^* = 0$

Initial regularization parameter: $\lambda_1 = \max(c_1)$

Active set: $I = \varphi_{I_1}$

Start iterations at $t = 1$

The following calculations are performed at every iteration $t=1,2,\u2026$ of the T-LARS algorithm until the stopping criterion is reached.

#### 3.2.1 Obtain the Inverse of the Gram Matrix of the Active Columns of the Dictionary

We obtain the Gram matrix of the active columns of the dictionary, $G_t = \Phi_{I_t}^T \Phi_{I_t}$, at each iteration $t$. The size of this Gram matrix either increases (dictionary column addition) or decreases (dictionary column removal) with each iteration $t$. Therefore, for computational efficiency, we use the Schur complement inversion formula to calculate $G_t^{-1}$ from $G_{t-1}^{-1}$, thereby avoiding its full calculation (Caiafa & Cichocki, 2012; Goldfarb, 1972; Rosário, Monteiro, & Rodrigues, 2016).

*Updating the Gram matrix after adding a new column $k_a$ to the active set*. Let the column $k_a \in I$ be the new column added to the active matrix. Given $G_{t-1}^{-1}$, the inverse of the Gram matrix, $G_t^{-1}$, could be calculated using the Schur complement inversion formula for a symmetric block matrix (Björck, 2015; Boyd & Vandenberghe, 2010; Hager, 1989),

*Updating the Gram matrix after removing a column $k_r$ from the active set*. Let the column $k_r \in I$ be the column removed from the active set. We move column $k_r$ and row $k_r$ of $G_{t-1}^{-1}$ to become its last column and last row, respectively. We denote this new matrix as $\hat{G}_{t-1}^{-1}$. By using the Schur complement inversion formula for a symmetric block matrix, the inverse $\hat{G}_{t-1}^{-1}$ could be interpreted as
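The column-addition update can be sketched numerically. The following numpy illustration (variable names are ours, not from algorithm 2) extends the inverse Gram matrix after one column joins the active set, using the Schur complement of the bordered block matrix:

```python
import numpy as np

def add_column_inverse(G_inv, b, c):
    """Inverse of [[G, b], [b.T, c]], given G_inv = inv(G).

    b = Phi_I.T @ new_column, c = new_column.T @ new_column.
    """
    u = G_inv @ b
    s = c - b @ u                  # Schur complement (a scalar)
    top_left = G_inv + np.outer(u, u) / s
    top_right = -u[:, None] / s
    return np.block([[top_left, top_right],
                     [top_right.T, np.array([[1.0 / s]])]])

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 4))    # 4 active columns
G_inv = np.linalg.inv(A.T @ A)     # current inverse Gram matrix

new = rng.standard_normal(8)       # a 5th column joins the active set
A5 = np.column_stack([A, new])
G5_inv = add_column_inverse(G_inv, A.T @ new, new @ new)
assert np.allclose(G5_inv, np.linalg.inv(A5.T @ A5))
```

This update costs $O(k^2)$ per iteration for $k$ active columns, versus $O(k^3)$ for inverting the Gram matrix from scratch.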

#### 3.2.2 Obtain Direction Vector $d_t$

#### 3.2.3 Obtain $v_t$

#### 3.2.4 Obtain the Correlation Vector $c_t$

#### 3.2.5 Calculate Step Size $\delta^*$

#### 3.2.6 Update the Solution $\tilde{x}_t$, Regularization Parameter, and Residual

#### 3.2.7 Check Stopping Criterion

The T-LARS algorithm solves the sparse tensor least-squares problem in equation 2.6 to obtain a sparse solution $X \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N}$ for a tensor $Y \in \mathbb{R}^{J_1 \times \dots \times J_n \times \dots \times J_N}$ using $N$ mode-$n$ dictionaries $\Phi^{(n)} \in \mathbb{R}^{J_n \times I_n}; \forall n \in \{1,\dots,N\}$. The tensor $Y$ and the $N$ mode-$n$ dictionaries $\Phi^{(1)}, \Phi^{(2)}, \dots, \Phi^{(N)}$ are the inputs to the T-LARS algorithm, where $N \ge 1$. The T-LARS algorithm can be used to solve underdetermined, square, or overdetermined sparse tensor least-squares problems, where the mode-$n$ dictionaries $\Phi^{(n)} \in \mathbb{R}^{J_n \times I_n}; \forall n \in \{1,\dots,N\}$ are overcomplete dictionaries ($J_n < I_n$), complete dictionaries ($J_n = I_n$), or undercomplete dictionaries ($J_n > I_n$), respectively.

The complete T-LARS algorithm is summarized in algorithm 2 (using Matlab notation). This T-LARS algorithm solves the $Lp$ sparse separable least-squares problem when T-LARS_mode is set to $Lp$.

## 4 Algorithm Computational Complexity

In this section, we analyze the computational complexity of our T-LARS algorithm and compare it with the computational complexity of Kronecker-OMP. We show that the computational complexity of T-LARS is significantly lower than that of Kronecker-OMP when solving sparse tensor least-squares problems.

### 4.1 The Computational Complexity of T-LARS

Let $K$ be the number of iterations used in T-LARS, $P = I_1 \times \dots \times I_n \times \dots \times I_N$ be the number of atoms in the Kronecker dictionary $\Phi$, and $Q = J_1 \times \dots \times J_n \times \dots \times J_N$ be the total number of elements in tensor $Y$. In a typical sparse solution obtained by T-LARS, $K \ll P$. In the following analysis, we refer to algorithm 2.

#### 4.1.1 Case of Overcomplete mode-n Dictionaries

### 4.2 Comparison of Computational Complexities of Kronecker-OMP and T-LARS

Table 1 shows a term-by-term comparison of the computational complexity of Kronecker-OMP and T-LARS given in equations 4.2 and 4.3, respectively.

| | Kronecker-OMP | T-LARS |
|---|---|---|
| First term | $2I^NJ\frac{1-(J/I)^N}{1-(J/I)}K$ | $2I^NJ\frac{1-(J/I)^N}{1-(J/I)}$ |
| Second term | $2I^NK+7\sum_{k=1}^{K}k^{2N}$ | $I^NK+4\sum_{k=1}^{K}k^2$ |
| Third term | $(2NJ+N+4)\sum_{k=1}^{K}k^N$ | $2NI\sum_{k=1}^{K}k$ |
| Fourth term | $(N(N-1)+3)KJ^N$ | $2KJ^N$ |


For multidimensional problems, $N \ge 2$, typically $K \gg I$; therefore, the second terms of the computational complexities of both T-LARS and Kronecker-OMP dominate all other terms. Therefore, for $K$ iterations, the asymptotic computational complexities of T-LARS and Kronecker-OMP are $O(I^N K + K^3)$ and $O(I^N K + K^{2N+1})$, respectively.
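To see the gap implied by these asymptotics, consider the following Python sketch (constants are omitted; the cost functions below are illustrative stand-ins for the stated asymptotic complexities, not exact operation counts):

```python
# Asymptotic cost models quoted above: T-LARS grows like O(I^N K + K^3),
# while Kronecker-OMP grows like O(I^N K + K^(2N+1)).
def tlars_cost(I, N, K):
    return I**N * K + K**3

def komp_cost(I, N, K):
    return I**N * K + K**(2 * N + 1)

# For a 3D problem (N = 3) with I = 10 and K = 100 iterations, the
# Kronecker-OMP term dwarfs the T-LARS term by orders of magnitude.
ratio = komp_cost(10, 3, 100) / tlars_cost(10, 3, 100)
```

Note that the T-LARS cost model is independent of the tensor order $N$ in its $K$-dependent term, while the Kronecker-OMP exponent $2N+1$ grows with $N$.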

## 5 Experimental Results

In this section, we present experimental results to compare the performance of Kronecker-OMP and T-LARS when used to obtain sparse representations of 3D brain images using both fixed and learned mode-$n$ overcomplete dictionaries.

### 5.1 Experimental Data Sets

For our computational experiments, we obtained a 3D MRI brain image and a 3D PET-CT brain image from publicly available data sets.

Our 3D MRI brain image consists of $175 \times 150 \times 10$ voxels and was obtained from the OASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset for Normal Aging and Alzheimer's Disease (LaMontagne et al., 2018). This image shows a region in the brain of a 38-year-old male patient with a tumor in his right frontal lobe.

Our 3D PET-CT brain image consists of $180 \times 160 \times 10$ voxels and was obtained from the Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection (Clark et al., 2013). This image shows a region in the brain of a 38-year-old female patient.

### 5.2 Experimental Setup

We compared the performance of T-LARS and Kronecker-OMP when used to obtain different sparse representations for our 3D MRI and PET-CT brain images by solving $L0$ and $L1$ constrained sparse multilinear least-squares problems using both fixed and learned overcomplete dictionaries.

Our fixed mode-$n$ overcomplete dictionaries were a union of a discrete cosine transform (DCT) dictionary and a Symlet wavelet packet dictionary with four vanishing moments. When using these fixed mode-$n$ dictionaries, we obtained the required 3D sparse representations by solving either the 3D $L0$ or $L1$ minimization problem.

Our learned mode-$n$ overcomplete dictionaries were obtained using the tensor method of optimal directions (T-MOD) algorithm (Roemer, Del Galdo, & Haardt, 2014). We used T-MOD to learn three overcomplete mode-$n$ dictionaries, $\Phi^{(1)} \in \mathbb{R}^{32 \times 38}$, $\Phi^{(2)} \in \mathbb{R}^{32 \times 38}$, and $\Phi^{(3)} \in \mathbb{R}^{10 \times 12}$, using random patches of $32 \times 32 \times 10$ voxels with a 10% overlap from either one of the 3D brain images that we used, MRI or PET-CT (Elad & Aharon, 2006; Zhai, Zhang, Lv, Fu, & Yu, 2018). When using learned dictionaries, we obtained the required 3D sparse representations by solving either a 4D (due to the use of image patches) $L0$ or $L1$ minimization problem. For a fair comparison of the performance of T-LARS and Kronecker-OMP, we generated our results using two learned dictionaries, $\Phi_{KOMP}$ and $\Phi_{TLARS}$, obtained by using Kronecker-OMP and T-LARS, respectively, as the sparse coding algorithm within T-MOD.

To compare the performance of T-LARS and Kronecker-OMP when used to solve $L0$ and $L1$ constrained sparse multilinear least-squares problems, we designed the following experiments to obtain sparse representations of our 3D brain images under different conditions:

Experiment 1: Fixed mode-$n$ dictionaries—3D $L0$ minimization problem

Experiment 2: Learned mode-$n$ dictionaries $\Phi_{KOMP}$—4D $L0$ minimization problem

Experiment 3: Learned mode-$n$ dictionaries $\Phi_{TLARS}$—4D $L0$ minimization problem

Experiment 4: Fixed mode-$n$ dictionaries—3D $L1$ minimization problem

Experiment 5: Learned mode-$n$ dictionaries $\Phi_{TLARS}$—4D $L1$ minimization problem

All of our experimental results were obtained using a Matlab implementation of T-LARS and Kronecker-OMP on an MS Windows machine with two Intel Xeon E5-2637 v4 CPUs at 3.5 GHz, 32 GB RAM, and an NVIDIA Tesla P100 GPU with 12 GB memory.

### 5.3 Experimental Results for 3D MRI Brain Images

In this section, we compare the performance of T-LARS and Kronecker-OMP in obtaining $K$-sparse representations of our 3D MRI brain image, $Y$, of $175 \times 150 \times 10$ voxels, by solving the $L0$ constrained sparse tensor least-squares problem. We also obtained similar $K$-sparse representations using T-LARS by solving the $L1$ optimization problem. Table 2 summarizes our results for the five experiments described in section 5.2. In all experiments, the algorithms were stopped when the number of nonzero coefficients $K$ reached 13,125, which is 5% of the number of elements in $Y$. We note that in Table 2, the number of iterations for the $L1$ optimization problems is larger than $K$ because, as shown in algorithm 2, at each iteration T-LARS could either add or remove nonzero coefficients to or from the solution.

| Experiment | Image Size | Optimization Problem | Dictionary Type | Number of Iterations | Computation Time (sec) K-OMP | Computation Time (sec) T-LARS |
|---|---|---|---|---|---|---|
| 1 | $175 \times 150 \times 10$ | $L0$ | Fixed | 13,125 | 20,144 | 434 |
| 2 | $32 \times 32 \times 10 \times 36$ | $L0$ | Learned $\Phi_{KOMP}$ | 13,125 | 25,002 | 394 |
| 3 | $32 \times 32 \times 10 \times 36$ | $L0$ | Learned $\Phi_{TLARS}$ | 13,125 | 22,646 | 400 |
| 4 | $175 \times 150 \times 10$ | $L1$ | Fixed | 14,216 | -- | 495 |
| 5 | $32 \times 32 \times 10 \times 36$ | $L1$ | Learned $\Phi_{TLARS}$ | 14,856 | -- | 490 |


#### 5.3.1 Experiment 1

Figures 1 and 2 show the experimental results for representing our 3D MRI brain image using three fixed mode-$n$ overcomplete dictionaries, $\Phi^{(1)} \in \mathbb{R}^{175 \times 351}$, $\Phi^{(2)} \in \mathbb{R}^{150 \times 302}$, and $\Phi^{(3)} \in \mathbb{R}^{10 \times 26}$, by solving the $L0$ minimization problem using both T-LARS and Kronecker-OMP. The residual error of the reconstructed 3D images obtained using T-LARS was $R2=0.0839$ (8.39%); for Kronecker-OMP, it was $R2=0.0624$ (6.24%).

#### 5.3.2 Experiments 2 and 3

Figures 3 and 4 show the experimental results for representing our 3D MRI brain image using our learned overcomplete dictionaries, $\Phi_{KOMP}$ and $\Phi_{TLARS}$, by solving the $L0$ minimization problem using both T-LARS and Kronecker-OMP. For the $\Phi_{KOMP}$ dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was $R2=0.1368$ (13.68%), and for Kronecker-OMP, it was $R2=0.1143$ (11.43%). For the $\Phi_{TLARS}$ dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was $R2=0.1127$ (11.27%), and for Kronecker-OMP, it was $R2=0.0955$ (9.55%).

#### 5.3.3 Experiment 4

Figures 5 and 6 show the experimental results for representing our 3D MRI brain image using three fixed mode-$n$ overcomplete dictionaries, $\Phi^{(1)} \in \mathbb{R}^{175 \times 351}$, $\Phi^{(2)} \in \mathbb{R}^{150 \times 302}$, and $\Phi^{(3)} \in \mathbb{R}^{10 \times 26}$, by solving the $L1$ minimization problem using T-LARS. The residual error of the reconstructed 3D image obtained using T-LARS was $R2=0.121$ (12.1%).

#### 5.3.4 Experiment 5

### 5.4 Experimental Results for 3D PET-CT Brain Images

In this section, we compare the performance of T-LARS and Kronecker-OMP in obtaining $K$-sparse representations of our 3D PET-CT brain image, $Y$, of $180 \times 160 \times 10$ voxels, by solving the $L0$ constrained sparse tensor least-squares problem. We also obtained similar $K$-sparse representations using T-LARS by solving the $L1$ optimization problem. Table 3 summarizes our results for the five experiments. In all experiments, the algorithms were stopped when the number of nonzero coefficients $K$ reached 14,400, which is 5% of the number of elements in $Y$. We note that in Table 3, the number of iterations for the $L1$ optimization problems is larger than $K$ because, as shown in algorithm 2, at each iteration T-LARS could either add or remove nonzero coefficients to or from the solution.

| Experiment | Image Size | Optimization Problem | Dictionary Type | Number of Iterations | Computation Time (sec) K-OMP | Computation Time (sec) T-LARS |
|---|---|---|---|---|---|---|
| 1 | $180 \times 160 \times 10$ | $L0$ | Fixed | 14,400 | 29,529 | 505 |
| 2 | $32 \times 32 \times 10 \times 42$ | $L0$ | Learned $\Phi_{KOMP}$ | 14,400 | 33,453 | 476 |
| 3 | $32 \times 32 \times 10 \times 42$ | $L0$ | Learned $\Phi_{TLARS}$ | 14,400 | 31,083 | 490 |
| 4 | $180 \times 160 \times 10$ | $L1$ | Fixed | 16,059 | -- | 591 |
| 5 | $32 \times 32 \times 10 \times 42$ | $L1$ | Learned $\Phi_{TLARS}$ | 18,995 | -- | 744 |


#### 5.4.1 Experiment 1

Figures 9 and 10 show the experimental results for representing our 3D PET-CT brain image using three fixed mode-$n$ overcomplete dictionaries, $\Phi^{(1)} \in \mathbb{R}^{180 \times 364}$, $\Phi^{(2)} \in \mathbb{R}^{160 \times 320}$, and $\Phi^{(3)} \in \mathbb{R}^{10 \times 26}$, by solving the $L0$ minimization problem using both T-LARS and Kronecker-OMP. The residual error of the reconstructed 3D images obtained using T-LARS was $R2=0.054$ (5.4%), and for Kronecker-OMP, it was $R2=0.0368$ (3.68%).

#### 5.4.2 Experiments 2 and 3

Figures 11 and 12 show the experimental results for representing our 3D PET-CT brain image using our learned overcomplete dictionaries, $\Phi_{KOMP}$ and $\Phi_{TLARS}$, by solving the $L0$ minimization problem using both T-LARS and Kronecker-OMP. For the $\Phi_{KOMP}$ dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was $R2=0.096$ (9.6%), and for Kronecker-OMP, it was $R2=0.077$ (7.7%). For the $\Phi_{TLARS}$ dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was $R2=0.0877$ (8.77%), and for Kronecker-OMP, it was $R2=0.0722$ (7.22%).

#### 5.4.3 Experiment 4

Figures 13 and 14 show the experimental results for representing 3D PET-CT brain images using three fixed overcomplete mode-$n$ dictionaries, $\Phi^{(1)} \in \mathbb{R}^{180 \times 364}$, $\Phi^{(2)} \in \mathbb{R}^{160 \times 320}$, and $\Phi^{(3)} \in \mathbb{R}^{10 \times 26}$, by solving the $L1$ minimization problem using T-LARS. The residual error of the reconstructed 3D PET-CT brain images obtained using T-LARS is $R2=0.0838$ (8.38%).

#### 5.4.4 Experiment 5

Figures 15 and 16 show the experimental results for representing the 3D PET-CT brain images using our learned overcomplete dictionary $\Phi_{TLARS}$, by solving the $L1$ minimization problem using T-LARS. The residual error of the reconstructed 3D PET-CT brain images obtained using T-LARS is $R2=0.106$ (10.6%).

## 6 Conclusion

In this letter, we developed tensor least angle regression (T-LARS), a generalization of least angle regression to efficiently solve large $L0$ or large $L1$ constrained multidimensional (tensor) sparse least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter $\lambda $. An earlier generalization of OMP, Kronecker-OMP was developed to solve the $L0$ problem for large multidimensional sparse least-squares problems. To demonstrate the validity and performance of our T-LARS algorithm, we successfully used it to obtain different $K$-sparse signal representations of two 3D brain images, using fixed and learned separable overcomplete dictionaries, by solving 3D and 4D, $L0$ and $L1$ constrained sparse least-squares problems. Our numerical experiments demonstrate that our T-LARS algorithm is significantly faster (46 to 70 times) than Kronecker-OMP in obtaining $K$-sparse solutions for multilinear least-squares problems. However, the $K$-sparse solutions obtained using Kronecker-OMP always have a slightly lower residual error (1.55% to 2.25%) than those obtained by T-LARS. These numerical results confirm our analysis in section 4.2 that the computational complexity of T-LARS is significantly lower than the computational complexity of Kronecker-OMP. In future work we plan to exploit this significant computational efficiency of T-LARS to develop more computationally efficient Kronecker dictionary learning methods. A Matlab GPU-based implementation of our Tensor Least Angle Regression (T-LARS) algorithm, Algorithm 2, is available at https://github.com/SSSherif/Tensor-Kronecker-Least-Angle-Regression.

## Appendix A: Mapping of Column Indices of Dictionary $\Phi $ to Column Indices of Mode-$n$ Dictionaries

T-LARS avoids the construction of large matrices such as the separable dictionary $\Phi$ in equation 2.4. Instead, T-LARS uses mode-$n$ dictionaries for its calculations. The following mapping between column indices of the dictionary $\Phi$ and column indices of the mode-$n$ dictionaries $\Phi^{(n)}; n \in \{1,\dots,N\}$ is essential in T-LARS calculations.

We note that $i_n; \forall n \in \{1,2,\dots,N\}$ are integers and $1 \le i_n \le I_n$.
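Under the convention that the separable dictionary is $\Phi = \Phi^{(N)} \otimes \dots \otimes \Phi^{(1)}$ (our assumption for this illustration), the mapping is a mixed-radix, column-major index decomposition, which numpy performs directly:

```python
import numpy as np

# Map a (0-based) column index of the separable dictionary Phi to one
# (0-based) column index per mode-n dictionary. With the Kronecker
# ordering assumed above, the decomposition is column-major ("F" order):
# j = i_1 + I_1*i_2 + I_1*I_2*i_3 + ...
def column_to_mode_indices(j, dims):
    """dims = (I_1, ..., I_N); returns (i_1, ..., i_N)."""
    return np.unravel_index(j, dims, order="F")

I_dims = (5, 6, 7)                 # I_1, I_2, I_3 (example sizes)
j = 123
i1, i2, i3 = column_to_mode_indices(j, I_dims)
assert i1 + 5 * i2 + 5 * 6 * i3 == j
```

The inverse mapping (mode indices back to a flat column index) is `np.ravel_multi_index` with the same `order="F"` convention.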

## Appendix B: Tensor and Dictionary Normalization

### B.1 Normalization of the Tensor $Y \in \mathbb{R}^{J_1 \times \dots \times J_n \times \dots \times J_N}$

### B.2 Normalization of Columns of the Separable Dictionary $\Phi $ to Have a Unit $L2$ Norm

## Appendix C: Obtaining the Initial Correlation Tensor $C_1$

## Appendix D: Creating a Gram Matrix for Each Mode-$n$ Dictionary $\Phi^{(n)}$

## Acknowledgments

This work was supported by a Discovery Grant (RGPIN-2018-06453) from the Natural Sciences and Engineering Research Council of Canada.