## Abstract

Recently there has been great interest in sparse representations of signals under the assumption that signals (data sets) can be well approximated by a linear combination of few elements of a known basis (dictionary). Many algorithms have been developed to find such representations for one-dimensional signals (vectors), which requires finding the sparsest solution of an underdetermined linear system of algebraic equations. In this letter, we generalize the theory of sparse representations of vectors to multiway arrays (tensors), that is, signals with a multidimensional structure, by using the Tucker model. Thus, the problem is reduced to solving a large-scale underdetermined linear system of equations possessing a Kronecker structure, for which we have developed a greedy algorithm, Kronecker-OMP, as a generalization of the classical orthogonal matching pursuit (OMP) algorithm for vectors. We also introduce the concept of multiway block-sparse representation of *N*-way arrays and develop a new greedy algorithm that exploits not only the Kronecker structure but also block sparsity. This allows us to derive a very fast and memory-efficient algorithm called N-BOMP (*N*-way block OMP). We theoretically demonstrate that under the block-sparsity assumption, our N-BOMP algorithm not only has a considerably lower complexity but is also more precise than the classic OMP algorithm. Moreover, our algorithms can be used for very large-scale problems, which are intractable using standard approaches. We provide several simulations illustrating our results and comparing our algorithms to classical algorithms such as OMP and basis pursuit (BP). We also apply the N-BOMP algorithm as a fast solution for the compressed sensing (CS) problem with large-scale data sets, in particular for 2D compressive imaging (CI) and 3D hyperspectral CI, and we show examples with real-world multidimensional signals.

## 1. Introduction

A concept that underlies many recent developments in fields such as image and signal processing is sparsity. Signals living in a vector space usually do not cover the entire space uniformly (Lu & Do, 2008). In particular, it was discovered that most signals of interest can be well approximated by a sparse representation over a known dictionary, using a linear combination of few dictionary elements (atoms) (Donoho, 2006; Candès, Romberg, & Tao, 2006). Moreover, it is sometimes useful to consider overcomplete dictionaries, where the number of atoms is larger than the signal size (Elad, Figueiredo, & Ma, 2010). In fact, sparse representations are found in the way the visual cortex codes natural images in the brain (Olshausen & Field, 1996). The implications of the sparsity assumption have recently driven the development of many exciting applications in signal processing. For example, it was proved that signals with a sparse representation can be reconstructed from a reduced number of measurements, the main objective of compressed sensing (CS) (Donoho, 2006; Candès et al., 2006); some algorithms for blind source separation are based on the sparsity assumption (Bobin, Starck, Fadili, & Moudden, 2007; Li et al., 2004; Caiafa & Cichocki, 2009); and techniques for denoising and inpainting of images have been developed using similar ideas (Elad, Starck, Querre, & Donoho, 2005; Elad & Aharon, 2006). In this framework, the main problem is how to solve an underdetermined system of linear algebraic equations, exploiting the fact that among the infinitely many solutions, the sparsest one may be unique.
To this end, many algorithms have been proposed, including greedy methods such as matching pursuit (MP) (Davis, Mallat, & Zhang, 1994), orthogonal matching pursuit (OMP) (Tropp, 2004), and compressive sampling matching pursuit (CoSaMP) (Needell & Tropp, 2009); norm minimization methods such as basis pursuit (Chen & Donoho, 2001) and gradient projection sparse reconstruction (GPSR) (Figueiredo, Nowak, & Wright, 2007); and many others (see Tropp & Wright, 2010, for an up-to-date review of algorithms).

Another characteristic of signals found in modern applications is that they often have a multidimensional structure, where each dimension (mode) has a particular physical meaning (e.g., space, time, frequency, trials). For example, a 3D image produced by a computed tomography (CT) system or a magnetic resonance imaging (MRI) system corresponds to a sampled version of a 3D function *f*(*x*_{1}, *x*_{2}, *x*_{3}). In this case, the multidimensional image is stored in memory as a multiway array (tensor) whose elements are samples taken on a grid, that is, $y_{n_1 n_2 n_3} = f(n_1 h, n_2 h, n_3 h)$, with *h* being the discretization step common to all dimensions. Multiway arrays are the generalization of vectors and matrices to a higher number of dimensions and are usually referred to in the literature as *N*-way arrays or tensors. They are very attractive mathematical tools possessing their own properties (Kolda & Bader, 2009; Cichocki, Zdunek, Phan, & Amari, 2009).

Due to the availability of massive data sets occurring in new applications and requirements in scientific computation, a great deal of effort has been devoted to the development of new algorithms for multiway structured data sets, sometimes with a high number of dimensions (Oseledets, Savostianov, & Tyrtyshnikov, 2008; Caiafa & Cichocki, 2010; Oseledets, 2011). The curse of dimensionality makes the task of finding sparse representations of *N*-way arrays very expensive in terms of memory storage and computation load, since the number of entries grows exponentially with the number of dimensions. To address this problem, one may try to find some structure in data sets and build approximate models using fewer parameters than the number of entries, as is the case with the Tucker model (Lathauwer, Moor, & Vandewalle, 2000b; Kolda & Bader, 2009). In fact, multidimensional signals often reveal a structure in each mode that allows one to adopt good approximate representations based on the Kronecker product of dictionaries associated with each of the modes (Rivenson & Stern, 2009a). For example, the most popular transforms applied to two-dimensional signals (2D images) are based on the application of a transformation of rows followed by a transformation of columns, known as separable transforms. In this case, the dictionary associated with the vectorized image can be written as the Kronecker product of dictionaries associated with rows and columns. The Kronecker structure has been extensively used in the image processing community (Nagy & Kilmer, 2006; Rivenson & Stern, 2009a). The Kronecker structure has also been proposed in the matrix analysis community as a good preconditioner to solve linear systems (Loan & Pitsianis, 1992) and to approximate function-related linear systems (Ford & Tyrtyshnikov, 2003; Tyrtyshnikov, 2004).

This letter is organized as follows. In section 2, the basic notation is introduced, and some important previous results are presented; in section 3, we generalize the theory of sparse representations of vectors to multiway arrays and introduce the concept of block sparsity; in section 4, we introduce the multidimensional compressed sensing problem; in section 5, our new greedy algorithms are developed for the case of multidimensional sparsity (Kronecker-OMP algorithm) and multiway block sparsity (N-BOMP), including a detailed analysis of their computational complexities; in section 6, we present a new theoretical result about the performance guarantee of the N-BOMP algorithm, which we show to be much less restrictive than that of the classical OMP algorithm; in section 7, extensive simulations are presented using synthetically generated *N*-way arrays, as well as the application to CS of real-world multidimensional signals. In section 8, we outline the main conclusions of this work.

## 2. Notation and Preliminaries

In this letter, *N*-way arrays (multidimensional signals) are denoted by underlined boldface capital letters, matrices (two-way arrays) by bold uppercase letters, and vectors by boldface lowercase letters; for example, $\underline{\mathbf{Y}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, $\mathbf{Y} \in \mathbb{R}^{I_1 \times I_2}$, and $\mathbf{y} \in \mathbb{R}^{I_1}$ are examples of an *N*-way array, a matrix, and a vector, respectively. The *i*th entry of a vector **y** is denoted by *y _{i}*, and the element (*i*, *j*) of a matrix **Y** is denoted by $\mathbf{Y}(i, j) = y_{ij}$. The same notation is used for *N*-way arrays by referring to the element $(i_1, i_2, \ldots, i_N)$ as $\underline{\mathbf{Y}}(i_1, i_2, \ldots, i_N) = y_{i_1 i_2 \cdots i_N}$. As a natural generalization, we define the Frobenius norm of an *N*-way array by $\|\underline{\mathbf{Y}}\|_F = \big(\sum_{i_1, \ldots, i_N} y_{i_1 \cdots i_N}^2\big)^{1/2}$. Sometimes we use MATLAB notation to indicate the full range of indices in certain modes; for example, **Y**(:, *j*) means a vector composed of all the entries of column *j*. Additionally, we refer to a subarray (or block) by restricting the indices to certain subsets; for example, given subsets of indices in each mode, $\mathcal{S}_n \subseteq \{1, 2, \ldots, I_n\}$, the subarray $\underline{\mathbf{Y}}(\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_N)$ is obtained by keeping the entries of the original *N*-way array at the selected index subsets. We denote the cardinality of a subset of indices by $|\mathcal{S}_n|$.

### 2.1. Sparse Solutions of Underdetermined Linear Systems.

We say that a one-dimensional signal (vector) has a sparse representation in a known dictionary if it can be recovered exactly by a linear combination of few selected elements of the dictionary (atoms), as the following definition states:

**Definition 1** *(sparse representation of vectors). A signal $\mathbf{y} \in \mathbb{R}^{I}$ has a K-sparse representation with respect to the dictionary $\mathbf{D} \in \mathbb{R}^{I \times M}$ if the following relation holds:*

$$\mathbf{y} = \mathbf{D}\mathbf{x}, \qquad \|\mathbf{x}\|_0 = K, \tag{2.1}$$

where $\|\mathbf{x}\|_0$ is the quasi-norm^{1} of the vector obtained by counting the number of nonzero entries, and typically $K \ll I$, $I < M$. In other words, there is a small set $\Omega$ of *K* indices such that *x _{i}*=0 if $i \notin \Omega$.

When **D** has more columns than rows, the solution **x** of equation 2.1 cannot be uniquely recovered; however, the sparsest solution may be unique. For instance, it is known that if no 2*K* columns of **D** are linearly dependent, then the *K*-sparse solution of equation 2.1, if it exists, is unique (Donoho & Elad, 2003). Many other conditions on matrix **D** have been provided in order to guarantee the uniqueness of a sparse representation (Donoho & Elad, 2003; Tropp, 2004). In particular, for a matrix with unit-norm columns ($\|\mathbf{d}_i\|_2 = 1$, $i = 1, \ldots, M$), its "coherence" $\mu(\mathbf{D})$, which is defined as the maximum absolute value of the correlation between two distinct columns,

$$\mu(\mathbf{D}) = \max_{i \neq j} |\mathbf{d}_i^T \mathbf{d}_j|,$$

gives us a characterization of the system. More specifically, if

$$K < \frac{1}{2}\left(1 + \frac{1}{\mu(\mathbf{D})}\right), \tag{2.3}$$

then the solution of equation 2.1 is unique (Tropp, 2004).

Several algorithms have been proposed to find the sparsest representation of a signal for a given dictionary by solving the corresponding underdetermined linear system of algebraic equations under the sparsity constraint (Tropp & Wright, 2010). These methods can be divided into two main groups: basis pursuit (BP) and matching pursuit (MP).

In BP methods, the norm $\|\mathbf{x}\|_1$ is minimized subject to the constraint **y**=**Dx**, which can be proved to solve our original problem if the vector is sparse enough and the matrix **D** has sufficiently low coherence (Donoho & Elad, 2003; Tropp, 2004). However, it is well known that BP is computationally expensive and unsuitable for large-scale problems (Tropp, 2004). On the other hand, MP methods, which are also known as greedy algorithms (Davis et al., 1994; Tropp, 2004), have a complexity that is significantly smaller than that of BP, especially when the signal sparsity level *K* is low (Tropp & Gilbert, 2007). A powerful standard greedy algorithm is orthogonal matching pursuit (OMP), which was adapted and studied in Tropp (2004), Needell, Tropp, and Vershynin (2008), and Needell and Tropp (2009). The OMP pseudocode is reproduced in algorithm 1. OMP iteratively refines a sparse solution by successively identifying, one component at a time, the atom that yields the greatest improvement in quality, until a desired sparsity level *K* is reached or the approximation error falls below some predetermined level. An optimized implementation of this algorithm is obtained by using the Cholesky factorization for the computation of STEP 5 (Rubinstein, Zibulevsky, & Elad, 2008).
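The greedy loop just described can be made concrete with a minimal sketch in Python/NumPy (an illustrative translation, not the letter's algorithm 1 verbatim; the random dictionary, the chosen support, and the coefficient values are synthetic assumptions):

```python
import numpy as np

def omp(D, y, K, tol=1e-9):
    """Greedy OMP sketch: pick the atom most correlated with the residual,
    refit all selected atoms by least squares, and repeat until K atoms
    are chosen or the residual norm falls below tol."""
    support, residual = [], y.copy()
    x = np.zeros(D.shape[1])
    for _ in range(K):
        i = int(np.argmax(np.abs(D.T @ residual)))   # STEP 3: atom detection
        if i not in support:
            support.append(i)
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)  # STEP 5
        residual = y - D[:, support] @ coeffs                       # STEP 6
        if np.linalg.norm(residual) < tol:
            break
    x[support] = coeffs
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((40, 80))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
x_true = np.zeros(80)
x_true[[5, 17, 42]] = [1.0, -2.0, 1.5]    # a 3-sparse representation
y = D @ x_true
x_hat = omp(D, y, K=3)
print(np.linalg.norm(y - D @ x_hat))      # near zero when the support is found
```

For a random gaussian dictionary of this size, a 3-sparse signal is recovered with overwhelming probability, although the coherence-based guarantee of condition 2.3 need not hold.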

### 2.2. Kronecker Dictionaries.

The Kronecker product of two matrices $\mathbf{A} \in \mathbb{R}^{I \times J}$ and $\mathbf{B} \in \mathbb{R}^{K \times L}$ is defined as follows:

$$\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & \cdots & a_{1J}\mathbf{B} \\ \vdots & \ddots & \vdots \\ a_{I1}\mathbf{B} & \cdots & a_{IJ}\mathbf{B} \end{bmatrix} \in \mathbb{R}^{IK \times JL}.$$

A classical example of the Kronecker structure arises with separable 2D transforms,^{2} where an image $\mathbf{Y} \in \mathbb{R}^{I_1 \times I_2}$ is transformed as $\mathbf{X} = \mathbf{T}_1 \mathbf{Y} \mathbf{T}_2^T$, with $\mathbf{T}_1$ and $\mathbf{T}_2$ being nonsingular matrices associated with the transforms of columns and rows, respectively, and $\mathbf{X}$ is the matrix of coefficients. In other words, the original image **Y** can be recovered by applying the inverse transform,

$$\mathbf{Y} = \mathbf{T}_1^{-1}\mathbf{X}\mathbf{T}_2^{-T}. \tag{2.6}$$

The operator **x**=*vec*(**X**) converts the matrix into a vector by stacking the columns of matrix **X** and satisfies the well-known property

$$vec(\mathbf{A}\mathbf{X}\mathbf{B}^T) = (\mathbf{B} \otimes \mathbf{A})\, vec(\mathbf{X}). \tag{2.7}$$

If we use property 2.7 in equation 2.6 and define $\mathbf{D}_1 = \mathbf{T}_1^{-1}$, $\mathbf{D}_2 = \mathbf{T}_2^{-1}$, **x**=*vec*(**X**), and **y**=*vec*(**Y**), we easily obtain

$$\mathbf{y} = (\mathbf{D}_2 \otimes \mathbf{D}_1)\,\mathbf{x},$$

where we explicitly express the vector **y** as a linear combination of elements of a dictionary with Kronecker structure $\mathbf{D} = \mathbf{D}_2 \otimes \mathbf{D}_1$. If the coefficient matrix **X** has few nonzero entries, then the signal **y** has an approximate sparse representation. We can generalize the definition of a Kronecker dictionary to any arbitrary number *N* of matrices, that is, $\mathbf{D} = \mathbf{D}_N \otimes \cdots \otimes \mathbf{D}_2 \otimes \mathbf{D}_1$. It is known that the coherence of **D** (global coherence) satisfies the following equation (Jokar & Mehrmann, 2009):

$$\mu(\mathbf{D}) = \max_n \mu(\mathbf{D}_n),$$

that is, if one of the dictionaries **D**_{n} has a large coherence, then it will dominate the coherence of **D**.

### 2.3. Multiway Arrays (*N*-way arrays) and the Tucker Model.

Given a multiway array (tensor) , its mode-*n* vectors are obtained by fixing every index but the one in mode *n*. The mode-*n* unfolding matrix is defined by arranging all the mode-*n* vectors as columns of a matrix. Note that for the 2D case, mode-1 and mode-2 vectors are columns and rows, respectively, and the mode-2 unfolding matrix is the transposed matrix.

The Tucker model provides a compressed multilinear representation of an *N*-way array. More specifically, when the ranks of the mode-*n* unfolding matrices are bounded by $R_n$ ($n = 1, 2, \ldots, N$), the following multilinear expression holds:

$$\underline{\mathbf{Y}} = \underline{\mathbf{G}} \times_1 \mathbf{A}_1 \times_2 \mathbf{A}_2 \cdots \times_N \mathbf{A}_N, \tag{2.11}$$

with a core tensor $\underline{\mathbf{G}} \in \mathbb{R}^{R_1 \times \cdots \times R_N}$ and factor matrices $\mathbf{A}_n \in \mathbb{R}^{I_n \times R_n}$, where $\times_n$ denotes the mode-*n* product. It is easy to see that the mode-*n* vectors of an *N*-way array with a Tucker representation belong to the span of the columns of matrix **A**_{n}. In fact, it can be shown that equation 2.11 implies (Kolda & Bader, 2009)

$$\mathbf{Y}_{(n)} = \mathbf{A}_n \mathbf{G}_{(n)} (\mathbf{A}_N \otimes \cdots \otimes \mathbf{A}_{n+1} \otimes \mathbf{A}_{n-1} \otimes \cdots \otimes \mathbf{A}_1)^T. \tag{2.12}$$

^{3} Let us define the vectorization operator on *N*-way arrays as $vec(\underline{\mathbf{Y}}) = vec(\mathbf{Y}_{(1)})$, that is, by stacking all the mode-1 vectors. There is a connection between the Tucker model and a Kronecker representation for multiway arrays, as the following proposition establishes.

**Proposition 1.** *The Tucker model of equation 2.11 is equivalent to the linear representation*

$$vec(\underline{\mathbf{Y}}) = (\mathbf{A}_N \otimes \cdots \otimes \mathbf{A}_2 \otimes \mathbf{A}_1)\, vec(\underline{\mathbf{G}}). \tag{2.13}$$

The result of proposition 1 tells us that the multilinear Tucker model structure for a multidimensional signal is equivalent to a linear representation (matrix equation) of the vectorized signal, where the dictionary has a Kronecker structure; in our setting, with dictionaries as factors,

$$\mathbf{y} = vec(\underline{\mathbf{Y}}) = (\mathbf{D}_N \otimes \cdots \otimes \mathbf{D}_2 \otimes \mathbf{D}_1)\, vec(\underline{\mathbf{X}}) = \mathbf{D}\mathbf{x}. \tag{2.14}$$

In the standard Tucker model, a core tensor usually has a much smaller size than the data array ($R_n \leq I_n$), and the main objective is to find such a compressed decomposition, that is, to compute the core tensor and the factor matrices **A**_{n}, usually with additional constraints. In contrast, in our approach, the data set (measurements) and the dictionaries **D**_{n} are assumed to be known, and our objective is to compute the core tensor, which is assumed to be very sparse and is larger than the data array ($M_n \geq I_n$).
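A quick numerical check of the equivalence stated in proposition 1, on a small synthetic 3-way example (the sizes and the `mode_n_product` helper are illustrative assumptions, not from the letter):

```python
import numpy as np

def mode_n_product(G, A, n):
    """Multiply tensor G by matrix A along mode n (0-based here)."""
    return np.moveaxis(np.tensordot(A, G, axes=(1, n)), 0, n)

rng = np.random.default_rng(2)
dims, ranks = (3, 4, 5), (2, 3, 2)
G = rng.standard_normal(ranks)                        # core tensor
D = [rng.standard_normal((i, r)) for i, r in zip(dims, ranks)]

# Tucker synthesis: Y = G x_1 D1 x_2 D2 x_3 D3
Y = G
for n, Dn in enumerate(D):
    Y = mode_n_product(Y, Dn, n)

# Proposition 1: vec(Y) = (D3 kron D2 kron D1) vec(G), column-major vec
vec = lambda T: T.reshape(-1, order="F")
Dkron = np.kron(np.kron(D[2], D[1]), D[0])
print(np.allclose(vec(Y), Dkron @ vec(G)))   # True
```

The mode-by-mode synthesis never forms the large Kronecker matrix, which is the computational point exploited throughout the letter.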

## 3. Sparse Representations of *N*-Way Arrays

Here we generalize the concept of sparse representations of vectors to *N*-way arrays through the equivalence between the Tucker model and the Kronecker structure. Formally, definition 1 can be applied to a vectorized version of an *N*-way array with respect to a Kronecker dictionary arriving at the following obvious generalization:

**Definition 3** *(multiway sparsity). A multidimensional signal (N-way array) is K-sparse with respect to the factors $\mathbf{D}_n \in \mathbb{R}^{I_n \times M_n}$ ($n = 1, \ldots, N$) if its vectorized version **y** admits a K-sparse representation over the Kronecker dictionary $\mathbf{D} = \mathbf{D}_N \otimes \cdots \otimes \mathbf{D}_1$ according to representation 2.14*.

The columns of factor matrices **D _{n}** are interpreted as dictionary elements or atoms associated with each mode. For many problems, these atoms can be selected to resemble the coherent structures that appear in each mode of the input *N*-way array. In other applications, we may want to design such dictionaries in order to favor a sparse representation. In this work, we assume that the dictionaries **D**_{n} for $n = 1, \ldots, N$ are known (mode-*n* dictionaries).

An *N*-way array having a *K*-sparse representation with respect to a Kronecker dictionary has an equivalent Tucker representation (see equation 2.13) with a sparse core tensor $\underline{\mathbf{X}}$, that is, with only *K* nonzero entries. If we denote the locations of the nonzero entries by $(i_1^k, i_2^k, \ldots, i_N^k)$ with $k = 1, 2, \ldots, K$, then we can express the *N*-way array as a weighted sum of *K* rank-1 *N*-way arrays:

$$\underline{\mathbf{Y}} = \sum_{k=1}^{K} x_{i_1^k i_2^k \cdots i_N^k}\; \mathbf{d}^{(1)}_{i_1^k} \circ \mathbf{d}^{(2)}_{i_2^k} \circ \cdots \circ \mathbf{d}^{(N)}_{i_N^k},$$

where the symbol $\circ$ stands for the outer product of vectors and $\mathbf{d}^{(n)}_{i}$ denotes the *i*th column of $\mathbf{D}_n$.

Without loss of generality, we will assume throughout the letter that mode-*n* dictionaries have unit-norm columns ($\|\mathbf{D}_n(:, m)\|_2 = 1$, $m = 1, \ldots, M_n$, $n = 1, \ldots, N$).

In the multiway sparsity definition, definition 3, the nonzero entries can be located anywhere within the core tensor. Sparsity is a simple assumption that naturally arises in real life, but it leads to mathematical problems that are not easy to solve. Therefore, additional assumptions are needed in order to simplify the problems and develop more efficient algorithms, especially for high-dimensional data sets. Here we propose block sparsity for tensor data sets, motivated by the fact that in the real world, the nonzero coefficients are not evenly distributed and are likely to be grouped in blocks (Figure 2 shows two motivating examples of natural images revealing block-sparse representations). In other words, block sparsity is a natural and realistic assumption that incorporates valuable prior information about signals in nature, because block-sparse signals are more likely to occur than totally random sparse representations.^{4}

**Definition 4** *(multiway block sparsity). An N-way array is $(S_1, S_2, \ldots, S_N)$-block sparse with respect to mode-n dictionaries $\mathbf{D}_n$ if the nonzero entries of its core tensor are located within a subarray (block) defined by index subsets $\mathcal{S}_n$ with $|\mathcal{S}_n| = S_n$*. We typically assume that $S_n \ll M_n$. In Figure 1, a comparison of the sparsity types used in this work is shown.

The following basic results are easily derived from the previous definitions:

**Proposition 2** *(multiway block sparsity implies sparsity of the vectorized version of the signal). If an N-way array is $(S_1, \ldots, S_N)$-block sparse with respect to factor matrices $\mathbf{D}_n$ ($n = 1, \ldots, N$), then its vectorized version is K-sparse (K=S_{1}S_{2}⋅⋅⋅S_{N}) with respect to the Kronecker dictionary $\mathbf{D} = \mathbf{D}_N \otimes \cdots \otimes \mathbf{D}_1$*.

**Proposition 3** *(multiway block sparsity implies sparsity of mode-n vectors). If an N-way array is $(S_1, \ldots, S_N)$-block sparse, then its mode-n vectors have an S_{n}-sparse representation with respect to D_{n} for each $n = 1, \ldots, N$*.

*Proof.* The *N*-way array has a Tucker representation with a core tensor $\underline{\mathbf{X}}$ and factors **D**_{n} ($n = 1, \ldots, N$). By using equation 2.12, we obtain

$$\mathbf{Y}_{(n)} = \mathbf{D}_n \mathbf{X}_{(n)} (\mathbf{D}_N \otimes \cdots \otimes \mathbf{D}_{n+1} \otimes \mathbf{D}_{n-1} \otimes \cdots \otimes \mathbf{D}_1)^T = \mathbf{D}_n \mathbf{Z}_{(n)},$$

where we have written the mode-*n* vectors as linear combinations of elements of the mode-*n* dictionary **D**_{n}, that is, $\mathbf{Y}_{(n)} = \mathbf{D}_n \mathbf{Z}_{(n)}$. Now we note that the mode-*n* unfolding matrix $\mathbf{X}_{(n)}$ has at most $S_n$ nonzero rows, because $\mathbf{X}_{(n)}(i_n, j) = 0$ whenever $i_n \notin \mathcal{S}_n$; therefore $\mathbf{Z}_{(n)}$ also has at most $S_n$ nonzero rows, which lets us conclude that the mode-*n* vectors are $S_n$-*sparse*.

Note that all mode-*n* vectors in each mode *n* belong to the same subspace, with dimension at most *S _{n}*; that is, not only can all these vectors be sparsely represented over **D**_{n}, but they all use the same reduced set of dictionary elements. This means that the converse of proposition 3 is not true in general; that is, sparsity of mode-*n* vectors does not imply multiway block sparsity.
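The statement of proposition 3 can be verified numerically on a randomly generated block-sparse core tensor (the sizes and the `unfold` helper are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
M = (6, 7, 8)          # core tensor size per mode
S = (2, 3, 2)          # block sizes per mode
X = np.zeros(M)
# Choose S_n indices in each mode; the block is their Cartesian product
idx = [rng.choice(m, size=s, replace=False) for m, s in zip(M, S)]
X[np.ix_(*idx)] = rng.standard_normal(S)

def unfold(T, n):
    """Mode-n unfolding: mode-n vectors arranged as columns (0-based n)."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1, order="F")

# Each mode-n unfolding has at most S_n nonzero rows, so every mode-n
# vector is S_n-sparse (proposition 3); the converse need not hold.
for n in range(3):
    nonzero_rows = np.count_nonzero(np.abs(unfold(X, n)).sum(axis=1))
    print(n, nonzero_rows <= S[n])
```

A counterexample to the converse is a core with `S[n]` nonzero rows per unfolding whose nonzeros do not align into a single Cartesian-product block.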

It is important to mention that our definition of multiway block sparsity is different from the common definition of block-sparse representation introduced for 1D signals in Eldar, Kuppinger, and Bolcskei (2010). For 1D signals, block sparsity of order *K* usually refers to the fact that the vector of coefficients **x** is composed of a concatenation of blocks of length *d*, with no more than *K* nonzero blocks.

## 4. Multidimensional Compressed Sensing

Consider a one-dimensional signal $\mathbf{z} \in \mathbb{R}^{J}$ having a *K*-sparse representation on the basis $\mathbf{W} \in \mathbb{R}^{J \times M}$, that is, **z**=**Wx** with $\|\mathbf{x}\|_0 = K$. Suppose that we cannot access the whole signal **z** but instead have available a set of $I \ll J$ linear random projections obtained as follows:

$$\mathbf{y} = \mathbf{\Phi}\mathbf{z} = \mathbf{\Phi}\mathbf{W}\mathbf{x} = \mathbf{D}\mathbf{x}, \tag{4.1}$$

where $\mathbf{\Phi} \in \mathbb{R}^{I \times J}$ is the sensing matrix and we have defined $\mathbf{D} = \mathbf{\Phi}\mathbf{W}$. From equation 4.1, we see that we could reconstruct the signal **z** by identifying the proper sparse vector **x**; in other words, the CS problem is reduced to solving an underdetermined system of linear equations with the sparsity constraint, as discussed in section 2.1.

The generalization of CS to higher dimensions is straightforward by assuming the Kronecker structure for both the dictionary and the sensing matrix. This was recently proposed by Rivenson and Stern (2009a) for 2D signals and by Duarte and Baraniuk (2011) for the general *N*-dimensional case. Formally, the vectorization of an *N*-way array has a *K*-sparse representation on a Kronecker basis $\mathbf{W}_N \otimes \cdots \otimes \mathbf{W}_1$. Then, by using a global sensing matrix with Kronecker structure $\mathbf{\Phi}_N \otimes \cdots \otimes \mathbf{\Phi}_1$, we obtain a large-scale linear system of equations with Kronecker structure as given by equation 2.14, where $\mathbf{D}_n = \mathbf{\Phi}_n \mathbf{W}_n$ ($n = 1, 2, \ldots, N$).

A case of particular interest is 3D hyperspectral compressive imaging (HCI), where compressive measurements are taken only in the spatial modes; the remaining mode is sensed with the identity matrix **I**, and $\mathbf{\Phi}_2 \otimes \mathbf{\Phi}_1$ is the separable sensing matrix for 2D images. Thus, HCI consists of computing the sparsest *N*-way array constrained to the available measurements. In the following section, we develop algorithms to solve this problem.

## 5. Solving Large, Underdetermined Linear Systems with Kronecker Structure

It is clear that if the dictionary does not have a predefined structure, we can use any of the available algorithms such as OMP or BP. With a Kronecker dictionary, we can exploit its structure in order to reduce the computational complexity and memory requirements of a greedy algorithm, as presented in section 5.1. It is worth mentioning that a BP algorithm exploiting the Kronecker structure has been proposed by Rivenson and Stern (2009a, 2009b), which we refer to here as the Kronecker-BP (or Kronecker-SPGL1) algorithm. But we can do better: if the multiway block-sparsity model is valid, we can use this structure to develop a very fast algorithm called N-BOMP, which we introduce in section 5.2. In Table 1 we summarize the algorithms covered in this letter and their applicability conditions.

Table 1: Algorithms covered in this letter and their applicability conditions.

| Dictionary D | Sparsity Type | Algorithms |
|---|---|---|
| Nonstructured | Nonstructured | Classical OMP/BP |
| Kronecker | Nonstructured | Kronecker-OMP/Kronecker-BP |
| Kronecker | Multiway block sparsity | N-BOMP |

In the following sections, we consider algorithms to compute a sparse representation of a multidimensional signal (*N*-way array) given a fixed Kronecker dictionary $\mathbf{D} = \mathbf{D}_N \otimes \cdots \otimes \mathbf{D}_1$, with $\mathbf{D}_n \in \mathbb{R}^{I_n \times M_n}$. To simplify the notation and facilitate the analysis of the associated complexity, we consider *N*-way arrays with the same size in each mode and dictionaries with the same number of atoms in each mode: *I _{n}*=*I* and *M _{n}*=*M* ($n = 1, \ldots, N$).

### 5.1. The Kronecker-OMP Algorithm for Multiway Sparsity.

We exploit here the Kronecker structure of the dictionary to avoid storing it explicitly and thereby save memory. In other words, in STEP 3 of algorithm 1, we can compute the correlations through the equivalent multiway product, which also gives us a much lower complexity. Note that *M* is the number of dictionary elements associated with each mode, which is usually larger than *I* (underdetermined system of equations). For example, if *M*=2*I*, with this simple technique the correlations are obtained by *N* successive mode-*n* products whose total cost is proportional to $I^{N+1}$ (up to factors depending on *N*), which is far smaller than the roughly $(MI)^N$ cost of applying the explicit Kronecker dictionary to the vectorized signal.
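The saving can be seen in a small 2-way sketch: the correlations of STEP 3 computed by a multiway product coincide with those obtained from the explicit (and much larger) Kronecker dictionary (dimensions here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
I, M = 8, 16                            # mode size and atoms per mode
D1, D2 = (rng.standard_normal((I, M)) for _ in range(2))
R = rng.standard_normal((I, I))         # residual kept as a 2-way array

# STEP 3 correlations without ever forming the (M*I)-sized dictionary:
C = D1.T @ R @ D2                       # two mode products, O(M I^2 + M^2 I)

# Same result via the explicit Kronecker dictionary on the vectorized residual
vec = lambda A: A.reshape(-1, order="F")
c = np.kron(D2, D1).T @ vec(R)          # O((M I)^2) operations and memory
print(np.allclose(vec(C), c))           # True
```

The explicit dictionary here is only 128 x 256; for realistic image sizes it would not even fit in memory, while the mode products remain cheap.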

At iteration *k*, the approximation of the *N*-way array can be written in the following convenient vector form:

$$\hat{\mathbf{y}} = (\mathbf{W}_N \odot \cdots \odot \mathbf{W}_1)\,\mathbf{a}, \tag{5.1}$$

where the matrices $\mathbf{W}_n \in \mathbb{R}^{I \times k}$ ($n = 1, \ldots, N$) contain in their columns the corresponding mode atoms selected in iterations $1, \ldots, k$; more precisely, **W**_{n}(:, *m*)=**D**_{n}(:, *i*^{m}_{n}); the symbol $\odot$ stands for the Khatri-Rao (column-wise Kronecker) product;^{5} and the vector **a** contains the nonzero entries of the *N*-way array $\underline{\mathbf{X}}$. Then the vector **a** in equation 5.1 is formally given by

$$\mathbf{a} = (\mathbf{W}_N \odot \cdots \odot \mathbf{W}_1)^{\dagger}\,\mathbf{y},$$

where $\dagger$ stands for the Moore-Penrose pseudoinverse. Since the pseudoinverse is rather expensive to compute, we apply the following property of the Khatri-Rao product (Kolda & Bader, 2009): $(\mathbf{W}_N \odot \cdots \odot \mathbf{W}_1)^T(\mathbf{W}_N \odot \cdots \odot \mathbf{W}_1) = (\mathbf{W}_N^T \mathbf{W}_N) * \cdots * (\mathbf{W}_1^T \mathbf{W}_1)$, with $*$ being the element-wise product of matrices (Hadamard product); we finally obtain

$$\mathbf{a} = \mathbf{Z}^{-1}(\mathbf{W}_N \odot \cdots \odot \mathbf{W}_1)^T\,\mathbf{y}, \tag{5.3}$$

where $\mathbf{Z} = (\mathbf{W}_N^T \mathbf{W}_N) * \cdots * (\mathbf{W}_1^T \mathbf{W}_1)$. If we assume that **Z** is nonsingular, we can compute this inverse adaptively as follows. The matrix at iteration *k*, denoted by **Z**^{(k)}, can be expressed in terms of the matrix used at step (*k*−1):

$$\mathbf{Z}^{(k)} = \begin{bmatrix} \mathbf{Z}^{(k-1)} & \mathbf{b} \\ \mathbf{b}^T & 1 \end{bmatrix},$$

where we define the partition $\mathbf{W}_n = [\mathbf{W}_n^{(k-1)}, \mathbf{w}_n]$, that is, $\mathbf{b} = (\mathbf{W}_N^{(k-1)T}\mathbf{w}_N) * \cdots * (\mathbf{W}_1^{(k-1)T}\mathbf{w}_1)$ and **w**_{n}=**W**_{n}(:, *k*); the bottom-right entry equals 1 because the atoms have unit norm. Then the inverse at step *k* can be quickly computed using the Schur complement inversion formula for a block matrix:

$$(\mathbf{Z}^{(k)})^{-1} = \begin{bmatrix} (\mathbf{Z}^{(k-1)})^{-1} + \mathbf{d}\mathbf{d}^T/c & \mathbf{d}/c \\ \mathbf{d}^T/c & 1/c \end{bmatrix},$$

where $c = 1 + \mathbf{b}^T\mathbf{d}$ and $\mathbf{d} = -(\mathbf{Z}^{(k-1)})^{-1}\mathbf{b}$.

At each iteration *k*, the nonzero coefficients are computed by using equation 5.3. The complexity of computing **a** is dominated by the computation of the correlations $(\mathbf{W}_N \odot \cdots \odot \mathbf{W}_1)^T\mathbf{y}$, and the saving in memory storage is huge compared to the classical OMP with an explicit dictionary. The implementation of OMP for *N*-way arrays using a Kronecker dictionary (Kronecker-OMP) is given in algorithm 2.

### 5.2. N-BOMP Algorithm: An Efficient Algorithm to Find Multiway Block-Sparse Representations.

In this section, we introduce a new algorithm, *N*-way Block OMP (N-BOMP), to find an $(S_1, \ldots, S_N)$-block sparse representation of an *N*-way array (see definition 4) with respect to the factors $\mathbf{D}_n$ ($n = 1, \ldots, N$). We show that since the nonzero entries are restricted to being located within a subarray (block) of size $S_1 \times \cdots \times S_N$, they can be identified very quickly and in many fewer iterations compared to the Kronecker-OMP presented in the previous section (see algorithm 2).

If we restrict the mode-*n* dictionaries to the columns indicated by the index sets $\mathcal{S}_n$, that is, $\mathbf{B}_n = \mathbf{D}_n(:, \mathcal{S}_n)$, then the approximation of the signal is given by a Tucker model using these matrices as factors, which can be written in vector form as

$$\hat{\mathbf{y}} = (\mathbf{B}_N \otimes \cdots \otimes \mathbf{B}_1)\,\mathbf{a},$$

where $\mathbf{a}$ is the vectorized version of the *N*-way array consisting of only the nonzero entries. Accordingly, step 5 in algorithm 1 corresponds to the following general minimization problem:

$$\min_{\mathbf{a}} \|\mathbf{y} - \mathbf{B}\mathbf{a}\|_2, \qquad \mathbf{B} = \mathbf{B}_N \otimes \cdots \otimes \mathbf{B}_1,$$

where $\mathbf{y}$ is the vectorized version of the measured *N*-way array. The solution of this problem is given by **a**=[**B**^{T}**B**]^{−1}**B**^{T}**y**, which means that [**B**^{T}**B**]**a**=**B**^{T}**y**. This system of normal equations can be solved one mode at a time. Writing it in mode-1 unfolded form, we first solve for the mode-1 unfolding of the intermediate array efficiently by using a Cholesky factorization of the Hermitian matrix **B**^{T}_{1}**B**_{1}. Note that this is a relatively small problem because the size of the matrix **B**^{T}_{1}**B**_{1} is only $S_1 \times S_1$. Now, we can use the solution of this first subproblem and write its mode-2 unfolded version, which leads to an analogous simple subproblem, also solved efficiently using the Cholesky factorization of the Hermitian matrix **B**^{T}_{2}**B**_{2}. By subsequently applying this procedure, after *N* steps we finally arrive at the desired matrix **A**_{(N)}, which contains the coefficients in mode-*N* unfolded format for the indices selected in the current iteration.
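The mode-by-mode strategy can be illustrated on a 2-way toy problem: the Kronecker least-squares solution is obtained from two small per-mode solves and matches the direct, far more expensive, solution (sizes are illustrative assumptions; `lstsq` stands in for the Cholesky solves for brevity):

```python
import numpy as np

rng = np.random.default_rng(6)
I, S = 12, 3
B1, B2 = (rng.standard_normal((I, S)) for _ in range(2))
Y = rng.standard_normal((I, I))         # measurements as a 2-way array
vec = lambda M_: M_.reshape(-1, order="F")

# Direct solution of min_a ||y - (B2 kron B1) a||_2: an I^2 x S^2 system
a_direct, *_ = np.linalg.lstsq(np.kron(B2, B1), vec(Y), rcond=None)

# Mode-by-mode: two small I x S least-squares problems, each reducible
# to the S x S Gram matrix B_n^T B_n (Cholesky-friendly)
Z1, *_ = np.linalg.lstsq(B1, Y, rcond=None)        # solve along mode 1
A, *_ = np.linalg.lstsq(B2, Z1.T, rcond=None)      # solve along mode 2
A = A.T                                            # S x S coefficient block
print(np.allclose(vec(A), a_direct))               # True
```

The agreement follows from the identity $(\mathbf{B}_2 \otimes \mathbf{B}_1)^{\dagger} = \mathbf{B}_2^{\dagger} \otimes \mathbf{B}_1^{\dagger}$, so the pseudoinverse of the large Kronecker system factors into per-mode pseudoinverses.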

The N-BOMP algorithm not only optimizes the memory storage but also requires far fewer iterations than the classic OMP algorithm, because the maximum number of iterations is *NS* rather than the number *K* of nonzero entries within the core tensor (see the complexity analysis below).

### 5.3. Complexity Analysis.

Here we analyze the computational complexity associated with each of the algorithms discussed in this letter. For this analysis, we assume an *N*-way array with all mode sizes equal to *I* and mode-*n* dictionaries given by matrices $\mathbf{D}_n \in \mathbb{R}^{I \times M}$. We also assume an $(S, S, \ldots, S)$-block sparse representation of the array with factors **D**_{n} ($n = 1, \ldots, N$), which means that there are *S ^{N}* nonzero coefficients. We consider the arithmetic operations required by step 3 (maximum correlated atom detection), step 5 (least squares estimation of nonzero coefficients), and step 6 (residual update) in all algorithms at iteration number *k*, in terms of *I* (mode size), *M* (number of atoms per mode), and *N* (number of dimensions). The comparative results are summarized in Table 2 (see the details in the appendix), where the advantage of N-BOMP over Kronecker-OMP and OMP is evident. From this table, we observe that for very sparse representations ($k \ll I$), the complexity is dominated by step 3, which is exactly the same for Kronecker-OMP and N-BOMP. The key advantage of N-BOMP over the other algorithms is that it requires many fewer iterations (at most *NS* against *S ^{N}* iterations in OMP/Kronecker-OMP). In addition, in step 5, the N-BOMP algorithm complexity in terms of the number of entries *I ^{N}* is sublinear, compared to the linear dependence of the standard OMP and Kronecker-OMP algorithms. In section 7, we show several numerical results with comparisons of the computation times required by different algorithms applied to multidimensional signals.

| | OMP | Kronecker-OMP | N-BOMP |
|---|---|---|---|
| Number of iterations | $S^N$ | $S^N$ | $\leq NS$ |
| Step 3 | $2(MI)^N + 2M^N$ | | |
| Step 5 | $(kI)^N + 3k^{2N}$ | $2I^N + 2Nk^N + (IN+4)k^N + 7k^{2N}$ | |
| Step 6 | $2(Ik)^N + I^N$ | $N(N-1)I^N + I^N$ | $2kI^N + I^N$ |

Note: Iterations and operations required at iteration *k* for the standard OMP, Kronecker-OMP, and N-BOMP algorithms for an *N*-way array with dictionaries $\mathbf{D}_n \in \mathbb{R}^{I \times M}$ and the block-sparsity parameter *S* (usually $S \ll I$).

## 6. Algorithms Performance Guarantees

Suppose that an *N*-way array has an $(S, S, \ldots, S)$-block sparse representation according to the model of definition 4 and that the coherence of each mode-*n* dictionary is bounded by $\mu(\mathbf{D}_n) \leq \mu$ ($n = 1, \ldots, N$). It is easy to see that if

$$S^N < \frac{1}{2}\left(1 + \frac{1}{\mu}\right), \tag{6.1}$$

then algorithm 1 (standard OMP) applied to the vectorized version of the *N*-way array with the global dictionary $\mathbf{D} = \mathbf{D}_N \otimes \cdots \otimes \mathbf{D}_1$ will obtain the true sparse representation after *S ^{N}* iterations, as predicted by condition 2.3 together with the global coherence result of section 2.2. It is also straightforward to see that this is true for algorithm 2 (Kronecker-OMP) applied to the *N*-way array with mode-*n* dictionaries $\mathbf{D}_n$ ($n = 1, \ldots, N$).

The following theorem shows that for an *N*-way array generated as described before, the N-BOMP algorithm is able to obtain the true expected sparse representation in fewer iterations and under a less restrictive condition compared to equation 6.1.

*Proof (sketch)*. Without loss of generality, assume that the nonzero block occupies the indices $(1, 2, \ldots, S)$ in every mode and that the entry with the largest absolute value is |*x*_{11⋅⋅⋅1}|. At iteration *k*=1, the residual is set to $\mathbf{r} = \mathbf{y}$. Thus, for the first step to choose the atom associated with the largest absolute value |*x*_{11⋅⋅⋅1}|, we must require that

$$\left|\left\langle \mathbf{d}^{(N)}_{1} \otimes \cdots \otimes \mathbf{d}^{(1)}_{1},\, \mathbf{y}\right\rangle\right| > \left|\left\langle \mathbf{d}^{(N)}_{j_N} \otimes \cdots \otimes \mathbf{d}^{(1)}_{j_1},\, \mathbf{y}\right\rangle\right| \tag{6.4}$$

for every index tuple $(j_1, j_2, \ldots, j_N)$ outside the block. Now we construct a lower bound for the left-hand side (LHS) and an upper bound for the right-hand side (RHS), and then impose the above requirement. Expanding both sides of equation 6.4 in terms of the elements *g _{n}*(*i _{n}*, *j _{n}*) of the Gram matrices **G**_{n}=**D**^{T}_{n}**D**_{n}, and using the facts that $g_n(i, i) = 1$ (unit-norm atoms), $|g_n(i, j)| \leq \mu$ for $i \neq j$, and that every nonzero coefficient is bounded in magnitude by |*x*_{11⋅⋅⋅1}|, we obtain a lower bound for the LHS and an upper bound for the RHS of equation 6.4. Requiring the lower bound to exceed the upper bound yields condition 6.2; repeating the argument on the updated residual at subsequent iterations shows that N-BOMP never selects an index outside the true block.

The maximum number of iterations of N-BOMP is *NS*, which corresponds to the worst scenario, in which only one new index, in a single mode, is added at every iteration. The minimum number of iterations is *S*, which corresponds to the best scenario, where a new index is added to every mode at every iteration.

Condition 6.2 establishes an upper bound for the number of nonzero entries *S*^{N} in the multiway block-sparse representation, which is less restrictive than condition 6.1, as illustrated in Figure 3, where this bound was computed numerically. We can see that the N-BOMP bound is always much larger than the classic OMP bound, and the situation improves for a higher number of dimensions *N* (at a fixed global coherence value). It is interesting to note that in practice, even beyond this upper bound, the algorithms are able to recover the correct representation with high probability, and N-BOMP always performs better than OMP for multiway block-sparse representations, as our simulations in section 7 demonstrate.
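As an aside (not part of the letter), the role played by the global coherence of a Kronecker dictionary can be checked numerically: because the Gram matrix of a Kronecker product is the Kronecker product of the mode Gram matrices, the global coherence equals the largest mode coherence. A minimal NumPy sketch, with arbitrary illustrative sizes:

```python
import numpy as np

def coherence(D):
    """Mutual coherence: largest absolute normalized inner product
    between two distinct columns of D."""
    Dn = D / np.linalg.norm(D, axis=0)   # unit-norm columns
    G = np.abs(Dn.T @ Dn)                # Gram matrix
    np.fill_diagonal(G, 0.0)             # ignore self-correlations
    return G.max()

rng = np.random.default_rng(0)
D1 = rng.standard_normal((8, 16))        # mode-1 dictionary (illustrative)
D2 = rng.standard_normal((8, 16))        # mode-2 dictionary (illustrative)

mu1, mu2 = coherence(D1), coherence(D2)
mu_global = coherence(np.kron(D2, D1))   # coherence of the global dictionary

# Gram(D2 (x) D1) = Gram(D2) (x) Gram(D1), and mode coherences are <= 1,
# so the global coherence is the maximum of the mode coherences.
assert np.isclose(mu_global, max(mu1, mu2))
```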

## 7. Experimental Evaluation

In this section, we present several simulation results on synthetically generated signals and real-world signals in order to compare the performance of our algorithms against state-of-the-art methods like the classic OMP and BP. In particular, we evaluated SPGL1, an optimized BP solver for large-scale sparse reconstruction (Berg & Friedlander, 2007); the variant referred to here as Kronecker-SPGL1 takes into account the Kronecker structure of the linear operator, as used in Rivenson and Stern (2009a, 2009b) in the CI context. All the experiments were performed on a 2.65 GHz Intel Core i5 with 8GB RAM with Matlab using the Tensor Toolbox (Bader & Kolda, 2007) for efficient handling of multiway arrays. (A downloadable Matlab Demo package is available at http://web.fi.uba.ar/~ccaiafa/Cesar/N-BOMPdemo.html, which reproduces our results for 2D and 3D CS as reported in this letter.)

### 7.1. Recovery of Exact Multiway Block-Sparse Representations.

Here we analyze the capability of our algorithms to recover the correct set of nonzero coefficients and compare them against the classic OMP and BP (SPGL1). We generated *N*-way arrays having exact multiway block-sparse representations according to definition 4, with *M*_{n}=*M*, *I*_{n}=*I*, and *S*_{n}=*S* for all *n*. The indices that define the subarray of nonzero coefficients were randomly chosen, and the values of the nonzero entries were generated as independent and identically distributed (i.i.d.) gaussian numbers. We consider that a sparse representation was correctly recovered if the relative error based on the Frobenius norm is less than a threshold of 10^{−2}.
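The generation procedure and the success criterion just described can be sketched as follows (a NumPy illustration; the sizes and the helper `recovered` are ours, only the 10^{−2} threshold is from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, S = 3, 10, 2   # illustrative: a 3-way array, mode size M, block size S

# Randomly choose S indices per mode; the nonzero coefficients form the
# S x S x S subarray at the intersection of these index sets.
support = [rng.choice(M, size=S, replace=False) for _ in range(N)]
X = np.zeros((M,) * N)
X[np.ix_(*support)] = rng.standard_normal((S,) * N)  # i.i.d. gaussian entries

assert np.count_nonzero(X) == S ** N   # multiway block sparsity: S^N nonzeros

def recovered(X_hat, X, tol=1e-2):
    """Success criterion: relative Frobenius-norm error below tol."""
    return np.linalg.norm(X_hat - X) / np.linalg.norm(X) < tol

assert recovered(X + 1e-5, X)              # tiny perturbation: recovered
assert not recovered(np.zeros_like(X), X)  # trivial estimate: not recovered
```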

In experiment 1 (deterministic dictionary), we generated matrices by concatenating two orthogonal bases: the discrete cosine transform (DCT) basis and the canonical basis (spikes). More specifically, the dictionaries are generated as **D**_{n}=[**C**|**I**], where **C** is the orthonormal DCT matrix and **I** is the identity matrix. It is known that sines and spikes provide highly incoherent matrices (Tropp, 2008), which is desirable in order to guarantee the recovery of sparse representations. In fact, the coherence of these matrices is approximately (2/*M*)^{1/2}, and the global coherence of the Kronecker dictionary takes the same value. These kinds of dictionaries (i.e., concatenations of orthogonal bases) have already been tested in the compressed sensing literature (Donoho & Elad, 2003; Gribonval, 2003). In Figure 4 we compare the percentage of correctly recovered representations over an ensemble of 100 simulations by using standard OMP/Kronecker-OMP, BP (implemented by SPGL1/Kronecker-SPGL1), and N-BOMP algorithms applied to generated signals for the 1D, 2D, and 3D cases as a function of the total number of nonzero entries (*S*^{N}). It is important to note that the N-BOMP algorithm always performed better than standard OMP and BP. Furthermore, it is interesting to note that the maximum number of nonzero entries allowed by conditions 6.1 and 6.2 to guarantee the success of OMP and N-BOMP, respectively, is much lower than the number of nonzero entries the algorithms actually handle. For example, in Figure 4c, the bounds give maximum numbers of nonzero entries of *S*^{3}_{max}=1.87 (OMP) and *S*^{3}_{max}=4.05 (N-BOMP); nevertheless, our experimental results show that 100% of the cases were correctly recovered even with *S*^{3}=27 (OMP) and *S*^{3}=64 (N-BOMP), which reveals that the theoretical guarantees are very conservative (pessimistic) and the algorithms also perform well beyond those limits.
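The sines-and-spikes mode dictionary used in this experiment can be reproduced as follows (a NumPy sketch; the size *M*=16 is illustrative, and `scipy.fft.dct` would build **C** equally well). The measured coherence comes out just below the (2/*M*)^{1/2} value expected for this pair of bases:

```python
import numpy as np

M = 16  # illustrative mode size

# Orthonormal DCT-II basis: C[k, j] = c_k * cos(pi * (2j+1) * k / (2M))
k, j = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
C = np.sqrt(2.0 / M) * np.cos(np.pi * (2 * j + 1) * k / (2 * M))
C[0, :] /= np.sqrt(2.0)                   # normalize the constant atom

assert np.allclose(C @ C.T, np.eye(M))    # C is orthonormal

D = np.hstack([C, np.eye(M)])             # mode dictionary [C | I], M x 2M

G = np.abs(D.T @ D)
np.fill_diagonal(G, 0.0)
mu = G.max()                              # coherence of sines + spikes

# Bounded by sqrt(2/M), and very close to it for moderate M
assert mu <= np.sqrt(2.0 / M) + 1e-12
assert np.isclose(mu, np.sqrt(2.0 / M), rtol=0.01)
```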

In experiment 2 (random dictionary), we consider that matrices are random, as in the case of CS, because they are the result of multiplying a random-sensing matrix by an orthogonal matrix. Dictionaries have been generated using i.i.d. gaussian variables and applying normalization to the columns. It is well known that gaussian matrices have a high probability of low coherence (Tropp & Gilbert, 2007). In Figure 5 we compare the percentage of correctly recovered representations over an ensemble of 100 simulations by using standard OMP/Kronecker-OMP, BP (implemented by SPGL1/Kronecker-SPGL1), and N-BOMP algorithms applied to generated signals for the 1D, 2D, and 3D cases as a function of the total number of measurements (*I*^{N}). N-BOMP provides a higher percentage of correctly recovered signals when few measurements are available. For example, in the 3D case with approximately 3000 measurements, which represents 3000/24^{3}=21.7% of the whole signal, N-BOMP successfully recovers almost 90% of the representations, against 65% and 35% obtained by BP and Kronecker-OMP, respectively.

In Figure 6, the computation times required by the Kronecker-OMP, Kronecker-BP, and N-BOMP algorithms are shown for experiment 1 (time versus the number of nonzero coefficients *S*^{N}) and experiment 2 (time versus the number of measurements *I*^{N}). It is important to highlight that we used the SPGL1 algorithm for Kronecker-BP, which is optimally implemented by Berg and Friedlander (2007) in Matlab, including fast C routines, whereas our implementation of Kronecker-OMP and N-BOMP does not include any C code. For experiment 1 (see Figures 6a and 6b), the complexity of Kronecker-OMP, as a function of the number of nonzero entries, grows faster than those of Kronecker-BP and N-BOMP. This is because the number of iterations required by Kronecker-OMP is *S*^{N}, while for N-BOMP it is between *S* and *NS* (see section 5.3). On the other hand, the number of iterations of Kronecker-BP depends on the convergence of the algorithm, not directly on the number of nonzero entries. For experiment 2 (see Figures 6c and 6d), we observe that Kronecker-OMP and N-BOMP show approximately flat plots because the number of nonzero entries is kept fixed at *S*^{2}=8^{2}=64 in Figure 6c and *S*^{3}=3^{3}=27 in Figure 6d. On the other hand, as more measurements become available, the Kronecker-BP algorithm runs faster because it requires fewer iterations to converge. All of these plots make clear the advantage of the N-BOMP algorithm over the other algorithms.

### 7.2. Application to Multidimensional CS.

Here we apply our N-BOMP algorithm to CS and compare it to a state-of-the-art method for the case of multidimensional images (large-scale data sets). More specifically, we compare it to the optimized implementation of Kronecker-SPGL1 as proposed in Rivenson and Stern (2009a, 2009b), where the Kronecker structure is considered. We use the peak signal-to-noise ratio (PSNR), in dB, to measure the quality of reconstruction of a multidimensional signal. Our results demonstrate that by taking into account the multiway block sparsity, we are able to obtain much better results in terms of computation time and quality of reconstruction, as the following experiments clearly show.
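For concreteness, one common form of the PSNR (peak squared magnitude over mean squared error, in dB) can be computed as below; the letter's exact normalization may differ slightly, so this is an illustrative helper rather than the definition used in the experiments:

```python
import numpy as np

def psnr_db(x, x_hat):
    """Peak signal-to-noise ratio in dB between a signal and its
    reconstruction: 10 * log10(peak^2 / mean squared error)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    mse = np.mean((x - x_hat) ** 2)
    peak = np.abs(x).max()
    return 10.0 * np.log10(peak ** 2 / mse)

x = np.linspace(0.0, 1.0, 100)
noisy = x + 0.01                 # constant error of 1% of the peak value
assert np.isclose(psnr_db(x, noisy), 40.0)   # 10 * log10(1 / 1e-4) = 40 dB
```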

#### 7.2.1. Compressive Imaging.

Our N-BOMP algorithm is very attractive for CI since the Kronecker structure can be enforced in the implementation of sensors by separable optics, as demonstrated in Rivenson and Stern (2009a, 2009b). In order to put our algorithms in the context of the state-of-the-art methods for CI, in this section we consider the reconstruction of a 1-megapixel 2D image (*M*=1024), a benchmark already used in Candès and Romberg (2006), Candès and Wakin (2008), and Rivenson and Stern (2009a), where the original image was first processed by thresholding the largest 25,000 wavelet coefficients. Here, we kept 99,078 coefficients contained in a 2D block, which concentrates most of the signal energy in the Daubechies-separable wavelet transform domain (this block was selected by using the N-BOMP algorithm applied to the whole image with a target reconstruction relative error of 0.0874, the same error obtained by keeping the 25,000 largest coefficients). We projected the input image by using gaussian sensing matrices (*n*=1, 2) for a wide range of the sampling ratio *I*^{2}/*M*^{2}. In Figure 7a, the obtained PSNRs for the Kronecker-SPGL1 and N-BOMP algorithms as a function of the sampling ratio *I*^{2}/*M*^{2} are shown. We see that for small sampling ratios, N-BOMP outperforms the BP strategy. For example, for a sampling ratio of 15%, we obtained PSNR values of 27 dB (Kronecker-SPGL1) and 35 dB (N-BOMP), as illustrated in Figure 8. In Figure 7b, the computation times required by these algorithms are shown. It is interesting to note that N-BOMP is always faster than the highly optimized Kronecker-SPGL1 algorithm, which is extremely expensive at low sampling ratios due to the number of iterations required to converge to the solution.
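The preprocessing step of keeping only the largest-magnitude transform coefficients is a hard-thresholding operation, which can be sketched as follows (the wavelet transform itself is omitted; `keep_largest` is an illustrative helper, not a routine from the letter):

```python
import numpy as np

def keep_largest(coeffs, k):
    """Hard threshold: zero all but the k largest-magnitude coefficients.
    Ties at the threshold may keep slightly more than k entries."""
    flat = np.abs(coeffs).ravel()
    if k >= flat.size:
        return coeffs.copy()
    thr = np.partition(flat, -k)[-k]     # k-th largest magnitude
    return np.where(np.abs(coeffs) >= thr, coeffs, 0.0)

c = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
c_k = keep_largest(c, 2)                 # keep the two dominant coefficients
assert np.array_equal(c_k, np.array([0.0, -3.0, 0.0, 2.0, 0.0]))
```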

#### 7.2.2. Hyperspectral Compressive Imaging.

Our sparse models for tensors are particularly relevant when dealing with large data sets with high dimensionality () because existing vector techniques such as OMP or SPGL1 algorithms become prohibitively expensive or intractable. For example, large-scale problems with 3D signals cannot be processed with state-of-the-art algorithms (such as the Kronecker-BP based on SPGL1) without the aid of a super-computer. One example of such a problem is hyperspectral compressive imaging (HCI), introduced in section 4, which we illustrate in this section.

We show the results of applying our N-BOMP algorithm to two examples of natural-scene hyperspectral images corresponding to the signals extracted from scenes 7 and 8 in the Foster, Nascimento, and Amano database (Foster, Nascimento, & Amano, 2004, available online at http://personalpages.manchester.ac.uk). No preprocessing steps were applied to these data sets. These hyperspectral images contain scene reflectances measured at 32 different frequency channels, acquired by a low-noise Peltier-cooled digital camera in the wavelength range of 400 to 720 nm (see the details in Foster et al., 2004).

As explained in section 4, for each channel we apply a separable random-sensing matrix built from gaussian random matrices. We also assume that the data set has a multiway block-sparse representation on the separable Daubechies wavelet transform basis. Results are shown in Figures 9 and 10. Quantitatively, the reconstructions perform very well, with a global PSNR of 35.26 dB for scene 7 and 42.52 dB for scene 8. This can be qualitatively verified by visual inspection of the images (the worst and best reconstructed slices are shown in Figures 9 and 10 with their corresponding PSNR values). It is important to highlight that these large data sets required only about 1 hour of computation time with our N-BOMP algorithm, which is very fast compared to the computation required by other state-of-the-art algorithms.
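The per-channel separable sensing step can be sketched in NumPy as follows. Each spectral slice is compressed by two small random matrices acting on the spatial modes, which is equivalent to applying one large Kronecker-structured sensing matrix to the vectorized slice without ever forming it (all sizes are scaled-down illustrations, and `Phi1`, `Phi2` are hypothetical mode sensing matrices):

```python
import numpy as np

rng = np.random.default_rng(2)
M1, M2, channels = 64, 64, 8       # illustrative spatial size and band count
I1, I2 = 20, 20                    # measurements per spatial mode

X = rng.standard_normal((M1, M2, channels))   # hyperspectral cube
Phi1 = rng.standard_normal((I1, M1))          # mode-1 sensing matrix
Phi2 = rng.standard_normal((I2, M2))          # mode-2 sensing matrix

# Separable sensing: Y[:, :, c] = Phi1 @ X[:, :, c] @ Phi2.T for every band c
Y = np.einsum("im,mnc,jn->ijc", Phi1, X, Phi2)

# Equivalent to the Kronecker sensing matrix (Phi2 kron Phi1) acting on each
# column-major vectorized slice; the explicit matrix is only built to check.
c = 0
y_kron = np.kron(Phi2, Phi1) @ X[:, :, c].flatten(order="F")
assert np.allclose(Y[:, :, c].flatten(order="F"), y_kron)
```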

## 8. Conclusion

Sparsity is a simple assumption that naturally arises in real life, but it leads to mathematical problems that are not easy to solve. Therefore, additional assumptions are needed in order to simplify the problems and develop more efficient algorithms, especially for high-dimensional data sets. In this work, we demonstrated the advantage of keeping the multidimensional structure of a data set by applying *N*-way array algorithms instead of classical vector algorithms. Since multidimensional signals usually have a sparse representation over, for example, the cosine, Fourier, or wavelet separable transforms, it is reasonable to consider that dictionaries of multidimensional signals can be modeled by a Kronecker product of mode dictionaries. Additionally, motivated by the fact that in the real world the nonzero coefficients are not evenly distributed and are likely to be grouped in blocks, we have introduced block sparsity, which allowed us to develop N-BOMP, a very efficient algorithm. In other words, block sparsity is a natural and realistic assumption that incorporates valuable prior information about signals, because block-sparse signals are more likely to occur than totally random sparse representations. Moreover, block-sparse signals are better recovered; in fact, the theoretical bound for N-BOMP is better than the theoretical bound corresponding to the vector case (OMP). We have provided extensive experimental results demonstrating the tremendous advantages of using algorithm 3 (N-BOMP), especially for higher-dimensional signals. Applications of the N-BOMP algorithm to compressed sensing (CS), especially to 2D compressive imaging (CI) and 3D hyperspectral CI, are presented, showing the efficiency and usefulness of our algorithm compared to available state-of-the-art algorithms such as classical OMP and Kronecker-BP.
Summarizing, our work showed that block sparsity arises naturally in real-world signals; that incorporating block sparsity allows one to dramatically reduce the complexity of algorithms; and that the block-sparsity assumption also allows one to obtain better-quality reconstructions of signals, as our experimental results demonstrated.

## Appendix: Detailed Complexity Analysis

### A.1. OMP Algorithm (Algorithm 1).

We begin with the analysis of algorithm 1 in its recent optimized version, as explained in Rubinstein et al. (2008). This algorithm takes as input a measurement vector and an explicit dictionary with *M*>*I* (overcomplete case). Assuming that the algorithm is guaranteed to obtain the true sparse representation, it will need exactly *S*^{N} iterations:

*Step 3:* This step includes computing **D**^{T}**r** and taking the absolute value maximum (2(*MI*)^{N}+2*M ^{N}* operations).

*Step 5:* By using the Cholesky factorization method, this step requires computing (*kI*)^{N}+3*k*^{2N} operations (Rubinstein et al., 2008).

*Step 6:* This step includes computing the current approximation and subtracting it from **y** (2(*Ik*)^{N}+*I*^{N} operations).
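For reference, the OMP loop of algorithm 1 can be sketched in a few lines of NumPy. This minimal version re-solves the least-squares problem with `lstsq` at every iteration instead of maintaining a Cholesky factorization, so it is simple but not as efficient as the version analyzed above. It is tested on a sines-and-spikes dictionary, for which coherence-based theory guarantees exact 3-sparse recovery:

```python
import numpy as np

def omp(D, y, n_iter):
    """Minimal orthogonal matching pursuit: greedily pick the atom most
    correlated with the residual, then re-fit on the selected support."""
    support, r = [], y.copy()
    for _ in range(n_iter):
        k = int(np.argmax(np.abs(D.T @ r)))            # step 3: best atom
        if k not in support:
            support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)  # step 5
        r = y - D[:, support] @ coef                   # step 6: residual
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Sines-and-spikes dictionary [C | I] with low coherence (~ sqrt(2/M))
M = 64
k, j = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
C = np.sqrt(2.0 / M) * np.cos(np.pi * (2 * j + 1) * k / (2 * M))
C[0, :] /= np.sqrt(2.0)
D = np.hstack([C, np.eye(M)])

x_true = np.zeros(2 * M)
x_true[[3, 70, 100]] = [1.5, -2.0, 1.0]   # 3 nonzeros: one sine, two spikes
x_hat = omp(D, D @ x_true, n_iter=3)
assert np.allclose(x_hat, x_true)         # exact recovery on this instance
```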

### A.2. Kronecker-OMP Algorithm (Algorithm 2).

As in the previous case, assuming that the algorithm is guaranteed to obtain the true sparse representation, it will need exactly *S*^{N} iterations:

*Step 3:* This step includes computing the correlations through mode products and taking the absolute-value maximum (2*M*^{N} operations).

*Step 5:* This step includes computing the required inner products (2*I*^{N} operations), **d**=(**Z**^{(k−1)})^{−1}**b** (2*k*^{2N} operations), **b**^{T}**d** (2*k*^{N} operations), updating the inverse matrix according to equation 5.4 (2*k*^{N}+3*k*^{2N} operations), and computing the nonzero coefficients according to equation 5.7 (2*k*^{2N} operations), giving a total number of operations equal to 2*I*^{N}+2*Nk*^{N}+(*IN*+4)*k*^{N}+7*k*^{2N}.

*Step 6:* This step includes computing the current approximation and subtracting it from the measurement array (*N*(*N*−1)*I*^{N}+*I*^{N} operations).

### A.3. N-BOMP Algorithm (Algorithm 3).

A distinctive characteristic of this algorithm compared to the previous ones is that, assuming the algorithm is guaranteed to obtain the true sparse representation, it requires far fewer iterations. More specifically, after the maximally correlated atom is detected in step 3, its position within the multiway array determines the indices to be added to the current mode-index subsets. Some of these indices may already be included in the corresponding subsets, but it is guaranteed that at least one new index in one mode will be added. Thus, at every iteration, a situation between the following two extreme cases can happen: case 1: only one subset of indices is incremented by 1, for some mode *n*; case 2: a new index is selected in every mode, so all *N* subsets are incremented by 1. Below, we analyze the complexity for the worst case (case 2). We note that the minimum number of iterations is *S* (case 2) and the maximum number of iterations is *NS* (case 1):

*Step 3:* The same as for the Kronecker-OMP algorithm ( operations).

*Step 5:* The update of the Cholesky factorization for each mode, similar to the case of algorithm 1, requires 2*kI*+*k*^{2}+2*k* operations, and solving the set of *N* equations of type 5.10, again using the Cholesky factorization, requires 2*Nk*^{N+1} operations. In the worst case (case 2), the Cholesky factorization update is needed for every mode.

*Step 6:* This step includes computing the current approximation and subtracting it from the measurement array, giving approximately 2*kI*^{N}+*I*^{N} operations.

## Acknowledgments

We are grateful to the anonymous reviewers for their comments, which helped us to improve this letter. This work has been supported in part by CONICET under grant PIP 2012-2014, number 11420110100021.

## References

## Notes

^{1} Note that the ℓ_{0} "norm" is not formally a norm since it does not satisfy the homogeneity property.

^{2} Classical examples of such separable transforms in the area of image processing are the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the discrete wavelet transform (DWT), and others.

^{3} It is worth mentioning that equation 2.12 is also valid for factor matrices with more columns than rows, that is, with *R*_{n}>*I*_{n}.

^{4} This multiway block-sparsity structure was recently proposed in Caiafa and Cichocki (2012).

^{5} The Khatri-Rao product is obtained by applying the Kronecker product column by column; that is, given matrices **A**=[**a**_{1} **a**_{2} ⋅⋅⋅ **a**_{M}] and **B**=[**b**_{1} **b**_{2} ⋅⋅⋅ **b**_{M}], then **A**⊙**B**=[**a**_{1}⊗**b**_{1} **a**_{2}⊗**b**_{2} ⋅⋅⋅ **a**_{M}⊗**b**_{M}].
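The column-wise definition of the Khatri-Rao product in footnote 5 translates directly to code (a NumPy sketch; the small matrices are illustrative):

```python
import numpy as np

def khatri_rao(A, B):
    """Khatri-Rao product: the Kronecker product applied column by column.
    A is I x M and B is J x M; the result is (I*J) x M."""
    assert A.shape[1] == B.shape[1], "same number of columns required"
    return np.column_stack([np.kron(A[:, m], B[:, m])
                            for m in range(A.shape[1])])

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [5.0, 6.0]])
KR = khatri_rao(A, B)
assert KR.shape == (4, 2)
# First column is kron(a_1, b_1) = [1*0, 1*5, 3*0, 3*5]
assert np.array_equal(KR[:, 0], np.array([0.0, 5.0, 0.0, 15.0]))
```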