## Abstract

In this letter, we consider the nonnegative matrix factorization (NMF) problem and several NMF variants. Two approaches based on DC (difference of convex functions) programming and DCA (DC algorithm) are developed. The first approach follows the alternating framework that requires solving, at each iteration, two nonnegativity-constrained least squares subproblems for which DCA-based schemes are investigated. The convergence property of the proposed algorithm is carefully studied. We show that with suitable DC decompositions, our algorithm generates most of the standard methods for the NMF problem. The second approach directly applies DCA on the whole NMF problem. Two algorithms—one computing all variables and one deploying a variable selection strategy—are proposed. The proposed methods are then adapted to solve various NMF variants, including the nonnegative factorization, the smooth regularization NMF, the sparse regularization NMF, the multilayer NMF, the convex/convex-hull NMF, and the symmetric NMF. We also show that our algorithms include several existing methods for these NMF variants as special versions. The efficiency of the proposed approaches is empirically demonstrated on both real-world and synthetic data sets. It turns out that our algorithms compete favorably with five state-of-the-art alternating nonnegative least squares algorithms.

## 1 Introduction

This problem was introduced in 1994 by Paatero and Tapper (1994) and then received considerable interest after the work of Lee and Seung (1999, 2001). Due to the nonnegativity, equation 1.1 means that the *i*th column of *A* is approximated by an additive linear combination of *r* basis columns in *U*, weighted by the components of the *i*th row of *V*. The NMF has been successfully applied in many applications such as text mining (Shahnaz, Berry, Langville, Pauca, & Plemmons, 2006; Berry, Browne, Langville, Pauca, & Plemmons, 2006), image processing (Lee & Seung, 1999), bioinformatics (Kim & Park, 2007), and spectral data analysis (Pauca, Piper, & Plemmons, 2006; Berry et al., 2006).

### 1.1 Existing Work

Research has been very active on solution methods for the NMF problem and its extensions. The work can be divided into two categories: the first, widely investigated, is based on alternating schemes; the second, less studied, directly tackles the NMF problem by minimizing the function *F* with respect to the pair (*U*, *V*).

The idea of alternating schemes is based on the fact that when one of the factors *U* or *V* is fixed, problem 1.2 reduces to an efficiently solvable convex nonnegativity-constrained least squares problem (NNLS). Exploiting this property, most algorithms proposed for the NMF solve, at each iteration, problem 1.2 alternately with respect to one variable *U* (resp. *V*) while the other variable *V* (resp. *U*) is fixed. Hence the algorithms differ from one another in the way they solve the NNLS subproblems, which follows two directions: some algorithms try to compute an optimal solution, while the others settle for an approximate solution (sometimes a very rough one).
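The alternating framework just described can be sketched in a few lines of NumPy. This is an illustrative skeleton only: `nnls_step` is a plain projected-gradient solver standing in for whichever NNLS method an algorithm of this family uses, and all names here are ours, not from the letter.

```python
import numpy as np

def nnls_step(M, R, X, n_iters=50):
    """Approximately solve min_{X>=0} 0.5*||M X - R||_F^2 by projected gradient."""
    L = np.linalg.norm(M.T @ M, 2)  # Lipschitz constant of the gradient
    for _ in range(n_iters):
        X = np.maximum(0.0, X - (M.T @ (M @ X - R)) / L)
    return X

def alternating_nmf(A, r, n_outer=30, seed=0):
    """Generic alternating scheme: fix U, update V; fix V, update U."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.random((m, r))
    V = rng.random((r, n))
    for _ in range(n_outer):
        V = nnls_step(U, A, V)            # min_{V>=0} ||U V - A||_F^2
        U = nnls_step(V.T, A.T, U.T).T    # min_{U>=0} ||V^T U^T - A^T||_F^2
    return U, V
```

Any concrete algorithm of the first category is obtained by swapping a different NNLS solver into `nnls_step`.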

Algorithms that compute optimal solutions to the NNLS subproblems belong to the class of alternating nonnegative least squares (ANLS). The first algorithms proposed in this class include the projected gradient method (Lin, 2007b), the projected quasi-Newton method (Kim, Sra, & Dhillon, 2007), the active set method (Kim & Park, 2008), and the block principal pivoting method (Kim & Park, 2011). The names of these algorithms indicate the methods they use to solve the NNLS subproblems. However, these methods are quite expensive and complicated to implement. Recently, Guan, Tao, Lou, and Yuan (2012) proposed the so-called NeNMF, which applies Nesterov’s optimal gradient method to solve the NNLS subproblems. Gong and Zhang (2012) solved the NNLS subproblems by a second-order algorithm, the projected Newton method (PNM). In another work (Huang, Liu, & Zhou, 2014), a quadratic regularization projected Barzilai-Borwein (QRPBB) method is used to solve the NNLS subproblems.

Algorithms that compute only an approximate solution to the NNLS subproblems have a cheaper computational cost per iteration than ANLS-based ones. One such procedure is the multiplicative updating (MU) algorithm proposed by Lee and Seung (1999, 2001). This algorithm has been one of the most commonly used for the NMF, but there are some issues related to its performance and convergence (Gonzalez & Zhang, 2005; Lin, 2007a, 2007b; Kim et al., 2007). Lin (2007a) presented a modified version of the MU to fix its convergence issue. However, the modification is more costly and has a slow convergence rate. Berry et al. (2006) provided a simple way to treat the NNLS problem. It first finds the solution to the unconstrained least squares problem and then projects this solution onto the nonnegative orthant. This method is simple, but it is unstable and lacks a convergence guarantee. An algorithm that iteratively updates each column of *V* and *U* per iteration, the hierarchical alternating least squares (HALS) algorithm, also called the rank-one residue iteration (RRI), was introduced in Cichocki, Zdunek, and Amari (2007), Cichocki and Phan (2009), Ho (2008), and Gillis and Glineur (2008). In fact, this method amounts to solving the NNLS subproblems approximately by executing only one step of the sequential coordinate-wise algorithm (Franc, Hlavac, & Navara, 2005). By carefully analyzing the computational cost needed at each iteration, Gillis and Glineur (2012) proposed an accelerated version of the HALS. The idea of this method is to repeat the sequential coordinate-wise algorithm several times at cheap additional cost, exploiting the fact that the computational cost of each sequential coordinate-wise update is much less than that of the preparation step.
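As a concrete instance of the approximate-solution family, the multiplicative updates of Lee and Seung can be written in a few lines. This is a sketch under the convention A ≈ UV; the small `eps` guarding against division by zero is our addition, not part of the original rule.

```python
import numpy as np

def mu_update(A, U, V, eps=1e-9):
    """One Lee-Seung multiplicative update of each factor (A ~ U V).
    Factors stay nonnegative because only nonnegative quantities are multiplied;
    eps guards against division by zero."""
    V = V * (U.T @ A) / (U.T @ U @ V + eps)
    U = U * (A @ V.T) / (U @ V @ V.T + eps)
    return U, V
```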

In the second category of solution methods for the NMF, the algorithms simultaneously compute both factors *U* and *V* at each iteration. This approach seems to be more difficult and is therefore less developed; only a little work has been done, by Chu, Diele, Plemmons, and Ragni (2004), Lin (2007b), and Le Thi, Pham Dinh, and Vo (2014).

### 1.2 Our Contribution

We propose two approaches based on DC (difference of convex functions) programming and DCA (DC algorithms; see Pham Dinh & Le Thi, 1997, 1998; Le Thi & Pham Dinh, 2005), one in each of the two categories noted above. The first approach follows the general alternating framework and applies DCA to the NNLS subproblems; the second approach directly investigates DCA on the NMF problem and simultaneously computes both factors *U* and *V* at each iteration.

Our motivation is based on the fact that DC programming and DCA, powerful tools and innovative approaches in nonconvex optimization, have been successfully applied to many (smooth or nonsmooth) large-scale nonconvex programs in various domains of applied sciences and have proved to be more robust and efficient than standard methods, particularly in data analysis and data mining (see, e.g., Le Thi, Belghiti, & Pham Dinh, 2006; Le Thi, Le Hoai, & Pham Dinh, 2007; Le Thi, Le, & Pham Dinh, 2007; Le Thi, Le, Nguyen, & Pham Dinh, 2008; Le, Le Thi, Pham Dinh, & Huynh, 2013; Le Thi, Le, Pham Dinh, & Huynh, 2013; Le Thi, Le, & Pham Dinh, 2014, 2015; Le Thi, Nguyen, & Pham Dinh, 2014; Le Thi, Vo, & Pham Dinh, 2014; Le Thi & Nguyen, 2014; Le Thi, Pham Dinh, Le, & Vo, 2015; Le, Le Thi, & Nguyen, 2015; see also the references in Le Thi, 2005).

Our contributions are multiple. First, we develop a DCA scheme for solving the NNLS convex subproblem in the alternating method. Although convex programming has been studied for about a century, an increasing amount of effort has been put recently into developing fast and scalable algorithms to deal with large-scale problems. Especially for the NNLS problem, several efficient convex optimization methods have been investigated. Since a convex problem can be formulated as a DC program, DCA can also be applied to convex programs. Exploiting the specific structure of the NNLS problem and using a suitable DC decomposition of its objective function, we propose a DCA scheme whose calculations are all explicit. Furthermore, we improve this DCA scheme by considering a *like DCA algorithm* that requires much less computational effort.

Second, following the general alternating method and using the DCA designed for the NNLS problem, we propose an alternative DCA-based algorithm for solving the NMF problem and prove its convergence properties. We show that with suitable DC decompositions, our method includes many existing algorithms for the NMF as special cases.

Third, based on the conventional DCA, we develop two algorithms for the NMF problem that simultaneously compute the two factors *U* and *V* of equation 1.2 at each iteration. Exploiting the fact that even though equation 1.2 is NP-hard, its structure is quite simple—the objective function *F* is twice differentiable and the feasible set is simply defined by nonnegativity constraints—we propose two versions of DCA where all computations are explicit. The first requires the projection of points on a ball, while the second is based on the projection of points on a box in combination with a variable selection updating strategy. This constitutes an original contribution of the letter, given that most existing work shares the alternating scheme and little previous work has solved the whole NMF problem directly.

Fourth, we adapt the proposed DCA-based methods for several NMF variants. We also show that these algorithms cover several existing methods for these NMF problems as special cases. Preliminary experiments on both synthetic and real data sets show the efficiency of our proposed NMF algorithms.

The rest of the letter is organized as follows. In the next section we briefly review DC programming and DCA for general DC programs. Section 3 deals with the first approach, the alternative DCA-based algorithm, while section 4 concerns the second approach, with two DCA-based algorithms for solving the whole NMF problem. Several NMF variants and their solution methods are considered in section 5. The numerical experiments are presented in section 6. Section 7 concludes the letter.

Before beginning, we introduce some notation. We use uppercase letters to denote matrices and lowercase letters for vectors or scalars. Vectors are also regarded as matrices with one column. denotes the matrix whose entries are all 1. For a matrix , the notation (resp. ) refers to the *i*th row (resp. *j*th column) of *X*. The inequality means that all entries of *X* are nonnegative. Each entry of the absolute matrix is defined as for all . The notation (resp. ) denotes the element-wise product (resp. division) of matrices. The matrix inner product of two matrices is defined by . For two matrices *X* and *Y* of the same size, the inequality means that for all indices , while (resp. ) denotes the matrix *Z* such that (resp. ) for all . For a scalar , is denoted simply as .

For matrices *B*, *C*, and *X* with compatible sizes, we have the relation (Van Loan, 2000) . For a matrix , . The set of nonnegative (resp. positive) real numbers is denoted by (resp. ). The projection of a vector on a subset , denoted by , is the nearest element of to *x* with regard to the Euclidean distance. When , we use the notation instead of . It is clear that .

## 2 Outline of DC Programming and DCA

where *G*, *H* are lower semicontinuous proper convex functions on . Such a function *F* is called a DC function, *G* − *H* is called a DC decomposition of *F*, while *G* and *H* are the DC components of *F*. Note that the closed convex constraint can be incorporated in the objective function of (*P*_{dc}) by using the indicator function on *C*, denoted by , which is defined by if and otherwise.

A point is a local minimizer of *G* − *H* if is finite and there exists a neighborhood of such that . The necessary local optimality condition for the (primal) DC program (*P*_{dc}) is given by . Condition 2.2 is also sufficient (for local optimality) in many important classes of DC programs, for example, for polyhedral DC programs with *H* being polyhedral convex, or when the function *F* is locally convex at (Pham Dinh & Le Thi, 1997; Le Thi & Pham Dinh, 2005).

### 2.1 Philosophy of DCA

DCA is based on local optimality conditions and duality in DC programming (for simplicity, we omit here the presentation of DC duality and the dual part of DCA). The main idea of DCA is simple: each iteration *k* of DCA approximates the concave part −*H* by its affine majorization (which corresponds to taking ) and minimizes the resulting convex function.

The generic DCA scheme can be described as follows:

**Generic DCA Scheme**

**Initialization:** Let be an initial guess, set .

**Repeat**

- Calculate .
- Calculate .
- Set .

**Until** convergence of .
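To make the generic scheme concrete, here is a small one-dimensional illustration of our own (not from the letter): minimize f(x) = x⁴ − x² with DC components G(x) = x⁴ and H(x) = x². Step 1 takes the derivative of H, and step 2 solves the convex subproblem min_x G(x) − y·x in closed form via 4x³ = y.

```python
import numpy as np

def dca_toy(x0=1.0, n_iters=50):
    """Generic DCA on f(x) = x^4 - x^2 with G(x) = x^4, H(x) = x^2.
    Step 1: y^k = H'(x^k) = 2 x^k (linearize the concave part -H).
    Step 2: x^{k+1} = argmin_x G(x) - y^k x, i.e. solve 4 x^3 = y^k."""
    x = x0
    for _ in range(n_iters):
        y = 2.0 * x                                   # subgradient of H at x^k
        x = np.sign(y) * (abs(y) / 4.0) ** (1.0 / 3.0)  # closed-form convex step
    return x
```

Starting from x = 1, the iterates decrease f monotonically and converge to the stationary point x = 1/√2.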

Convergence properties of DCA and its theoretical basis can be found in Pham Dinh and Le Thi (1997) and Le Thi and Pham Dinh (2005). It is worth mentioning that:

DCA is a descent method (i.e., the sequence is decreasing) with global convergence (it converges from any starting point). Moreover, DCA does not require a line search (its step size is always equal to one).

If , then *x*^{k} is a critical point of *G* − *H*. In such a case, DCA terminates at the *k*th iteration.

If the optimal value of problem (*P*_{dc}) is finite and the infinite sequence is bounded, then every limit point of the sequence is a critical point of *G* − *H*.

DCA has a linear convergence rate for general DC programs.

A deeper insight into DCA has been described in Le Thi and Pham Dinh (2005). For instance, it is crucial to note the main feature of DCA: DCA is constructed from the DC components and their conjugates but not from the DC function *f* itself, which has infinitely many DC decompositions, and there are as many DCA versions as there are DC decompositions. Such decompositions play a crucial role in determining the speed of convergence, stability, robustness, and globality of computed solutions. This flexibility of DC programming and DCA is of particular interest from both a theoretical and an algorithmic point of view. Moreover, it is worth pointing out that with suitable DC decompositions, DCA generates most standard methods in convex and nonconvex programming. For instance, four well-known methods in machine learning—EM (expectation-maximization), SLA (successive linear approximation), CCCP (convex-concave procedure), and ISTA (iterative shrinkage-thresholding algorithm)—are particular cases of DCA (see Pham Dinh & Le Thi, 1998).

For a complete study of DC programming and DCA, readers are referred to Pham Dinh and Le Thi (1997, 1998, 2014) and Le Thi and Pham Dinh (2005). The solution of a nonconvex program (*P*_{dc}) by DCA must be composed of two stages: the search for an appropriate DC decomposition of *f* and that for a good initial point.

In the last decade, a variety of work on machine learning based on DCA has been conducted. The efficiency and the scalability of DCA have been demonstrated in many works (see Le et al., 2013; Le Thi, Pham Dinh et al., 2015; and the list of references at http://lita.sciences.univ-metz.fr/lethi/DCA.html). This successful work on DCA motivated us to investigate it for solving the NMF problem.

## 3 The First Approach: Alternative Method Based on DCA

The alternating methods for solving the NMF problem consist of holding *U* fixed and optimizing with respect to *V*, and then holding *V* fixed and optimizing with respect to *U*. More precisely, these methods share the following general scheme:

Initialize with nonnegative elements.

For do

Similarly, one may exchange the order of updating *U* and *V*.

Note that this problem is separable in the rows of *V*. Hence, solving the last problem amounts to solving *n* NNLS problems with a single right-hand side, of the form , where *v* and *a* are the transposes of the rows of *V* and *A*, respectively.

### 3.1 DCA for Solving the NNLS Problem

#### 3.1.1 DCA for Solving the NNLS Problem with a Single Right-Hand-Side Vector

The objective function *f* can be expressed as a DC function, where . Thus, problem 3.2 can be reformulated as a DC program: . Applying DCA to equation 3.5, we have to determine, at each iteration *k*, , and then compute the next iterate by . To simplify the presentation, we drop the index *k*; the update procedure can then be briefly described by . A common choice of is such that (Pham Dinh & Le Thi, 1998). With this choice, the update rule, equation 3.6, can be interpreted as a projected gradient scheme with the constant step size . However, by modifying , we get different DC decompositions of *f* that may influence the efficiency of the corresponding DCA. The following result suggests a more flexible way to choose .

By proposition 1, for any , we can choose , where is a safety parameter that guarantees the positivity of . Then the update rule, equation 3.6, becomes , or, equivalently, . Hence, by varying , we get various vectors with different component values.
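With a scalar curvature bound ρ ≥ λmax(UᵀU), the update rule reduces to the projected gradient interpretation mentioned in the text. A minimal sketch in our own notation, with `rho` playing the role of the DC-decomposition parameter:

```python
import numpy as np

def dca_nnls_update(U, a, v, rho=None):
    """One DCA update for min_{v>=0} 0.5*||U v - a||^2.
    With scalar rho >= lambda_max(U^T U), this is exactly a projected
    gradient step of length 1/rho (the classical DC decomposition)."""
    if rho is None:
        rho = np.linalg.norm(U.T @ U, 2)  # spectral norm = lambda_max here
    grad = U.T @ (U @ v - a)
    return np.maximum(0.0, v - grad / rho)
```

Replacing the scalar `rho` by a componentwise vector, as the flexible choice above allows, changes the DC decomposition and hence the trajectory of the iterates.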

*Improvement of Rule 3.7: A like DCA algorithm*. Despite the condition , the update rule, equation 3.7, is well defined for any . Moreover, we observe that if the index , that is, , then for any , the update rule, equation 3.7, does not affect the value of . Thus, at each iteration, we can discard the variables *v _{i}* with . Consequently, to reduce the computational effort, it is reasonable to consider in equation 3.7 and let when *v _{i}* is not under consideration. The following result proves that this modified DCA scheme is still a descent method, like the standard DCA.

To exploit the effect of the DC decomposition on the resulting DCA, we can further refine the modified DCA by computing the vector *d* dynamically during DCA’s iterations. This procedure can be seen as the use of local DC decompositions, that is, DC decompositions that change from one iteration to the next.

*A dynamic computation of d*. For , define . Then () is the updated value of *v* obtained by applying the update rule 3.7 with , the *t*th canonical basis vector in , and is the corresponding displacement. Define () by and for all . For , if we update only the *t*th component of *v*, the objective function *f* will decrease by an amount of . Let and . Let , where *v*^{0} is the starting point and is a small positive number. Then the vector in the update rule, equation 3.7, is defined by . It is clear that if and , then *d* defined by equation 3.10 satisfies condition 3.8.

In the sequel we will use equation 3.10 to define *d*; the resulting algorithm is named the *like DCA algorithm*.
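The variable-discarding observation can be illustrated as follows. This sketch of ours skips exactly the coordinates with v_i = 0 and a nonnegative partial derivative, which the update could not move anyway; it uses a scalar curvature bound and does not reproduce the dynamic choice of d in equation 3.10.

```python
import numpy as np

def like_dca_update(U, a, v, rho=None):
    """Selective coordinate update for min_{v>=0} 0.5*||U v - a||^2.
    Coordinates with v_i = 0 and a nonnegative partial derivative are
    discarded: updating them would leave v unchanged."""
    if rho is None:
        rho = np.linalg.norm(U.T @ U, 2)
    grad = U.T @ (U @ v - a)
    active = (v > 0) | (grad < 0)          # variables worth updating
    v = v.copy()
    v[active] = np.maximum(0.0, v[active] - grad[active] / rho)
    return v
```

By construction, the selective update produces exactly the same point as the full projected step, only cheaper when many coordinates are inactive.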

#### 3.1.2 DCA for Solving the NNLS Problem with Multiple Right-Hand-Side Vectors

Now let be computed by equation 3.10. Then the *like DCA* for solving equation 3.12 is described in algorithm DCA-NNLS:

DCA-NNLS: DCA for Solving NNLS problem 3.12

The following assertion is a direct consequence of lemma 2:

If *V* is not a critical point of equation 3.12, that is, , we have ; otherwise .

Consequently, let be the input and output of algorithm DCA-NNLS. If *V*^{0} is not a solution of problem 3.12, then ; otherwise .

### 3.2 The Alternative DCA-Based Algorithm and Its Convergence Properties

The alternative DCA-based algorithm for solving the NMF problem, equation 1.2, is described in the ADCA algorithm:

ADCA: Alternative DCA-Based Algorithm for Solving the NMF Problem

Initialize with nonnegative elements, , ,

**repeat**

Compute

Compute

**until** stopping criterion is satisfied.

First, let us prove a general result, which will be used for deriving the convergence of ADCA.

Suppose that *f* is a continuous real-valued function on the closed convex set and that is a sequence satisfying the following conditions:

The sequence is monotonically decreasing,

Then every limit point of is a critical point of the problem (P).

Assume that is a limit point of and is not a critical point. Then there is a subsequence such that . By the assumption, there exist a function and a sub-subsequence that satisfy condition b.

The following lemma is useful to prove that the sequence generated by ADCA satisfies condition b of theorem 6.

There is a finite set of continuous functions from such that for any , , and computed by algorithm DCA-NNLS, we have for a .

The mapping *T*, defined by , is continuous with regard to . Therefore, , where stands for the composite function (*l* times), is a finite set of continuous functions from .

By the construction of DCA-NNLS, if , and are the inputs and output of DCA-NNLS, there is a such that .

Now we are in a position to prove the convergence of ADCA.

Every limit point of the sequence generated by algorithm ADCA is a critical point of the NMF problem, equation 1.2.

According to lemma 4, the sequence generated by algorithm ADCA verifies condition a of theorem 6.

By lemma 7, there are finite families and of continuous functions from and from , respectively, such that, for , for some , . Hence the sequence satisfies condition b of theorem 6. Applying theorem 6, we deduce that every limit point of the sequence generated by algorithm ADCA is a critical point of the NMF problem, equation 1.2.

It is worth noting that this convergence analysis is valid not only for our specific choice of *D* but also for any other *D* satisfying condition 3.14 and the assumption that there is a finite set of continuous functions from such that each *D^l* in algorithm DCA-NNLS can be expressed as for a . The latter assumption simply requires that each element of *D* is computed from *U* and *V* via a finite number of formulations. It is easy to see that this assumption is satisfied when , and *D* is constructed by equation 3.17. The two latter constructions of *D* also verify condition 3.14, so the convergence of the corresponding NMF algorithms is guaranteed.

### 3.3 Link between ADCA and Existing Methods

We will see that the algorithm ADCA includes some well-known alternating methods for the NMF problem. As indicated in section 1, the alternating algorithms differ from one another in how they solve the NNLS subproblems. We show below that the update rules for solving the NNLS problem, equation 3.12, in some existing NMF algorithms are nothing other than our update rule, equation 3.13, with a suitable choice of *D*.

#### 3.3.1 Multiplicative Updating (MU) Algorithm (Lee & Seung, 2001)

#### 3.3.2 Modified MU Algorithm (Lin, 2007a)

#### 3.3.3 Hierarchical Alternating Least Squares (HALS) Algorithm (Ho, 2008; Cichocki & Phan, 2009)
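The equations of these subsections are not reproduced here, but the HALS update they refer to is standard: with A ≈ UV, each row of *V* is the closed-form solution of a one-block NNLS problem. A hedged NumPy sketch in our own notation (V is updated in place, row by row):

```python
import numpy as np

def hals_update_V(A, U, V, eps=1e-12):
    """One HALS sweep over the rows of V for A ~ U V: each row is the
    exact minimizer of the objective with all other rows held fixed."""
    UtU = U.T @ U
    UtA = U.T @ A
    for t in range(V.shape[0]):
        # residual contribution of row t, then closed-form nonnegative solve
        num = UtA[t, :] - UtU[t, :] @ V + UtU[t, t] * V[t, :]
        V[t, :] = np.maximum(0.0, num / max(UtU[t, t], eps))
    return V
```

Because every row update is an exact block minimization, one sweep never increases the objective.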

## 4 The Second Approach: DCA Applied on the Whole NMF Problem

Even though equation 1.2 is NP-hard, its structure is quite suitable for DCA: the objective function *F* is twice differentiable, and the feasible set is simply defined by nonnegativity constraints. For a DC program with a twice differentiable objective function, it is well known that the projection DC decomposition results in very inexpensive DCAs when the feasible set is a ball or a box (see, e.g., Le et al., 2013; Le Thi, Le Hoai et al., 2007; Le Thi, Le et al., 2007; Le Thi, Le et al., 2013; Le Thi & Nguyen, 2014). In this section we propose two versions of such DCAs. The first requires the projection of points on a ball, and the second is based on the projection of points on a box in combination with a variable selection updating strategy similar to the technique developed in section 3.1.

### 4.1 The First DCA for the Whole NMF Problem

Attempting to design efficient DCAs based on the projection DC decomposition, we first construct two balls bounding the variables *U* and *V* (naturally, the smaller the balls are, the better DCA performs) and then restrict the feasible set of the NMF problem to the product of these balls.

(ii) If is a critical point (resp. optimal solution) of problem 4.1, then it is also a critical point (resp. optimal solution) of problem 1.2.

(ii) Following condition i, it is easy to see that if is an optimal solution to problem 4.1, then it is an optimal solution to problem 1.2.

By virtue of lemma 9, instead of solving problem 1.2 on the whole space , we only need to consider the restricted problem 4.1. In the rest of this section, we reformulate problem 4.1 as a DC program and develop DCA for solving it.

DCA1: The First DCA for Solving the Whole NMF Problem

Initialize ,

**repeat**

Compute using equation 4.4

Compute using equation 4.5

**until** stopping criterion is satisfied.

The sequence generated by algorithm DCA1 has at least one limit point, and every limit point of is a stationary point of problem 2.

The first conclusion is trivial due to the fact that the sequence is bounded. The second conclusion follows from lemma 9 and the general convergence properties of DCA.
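Algorithm DCA1 needs the projection of a point onto a set of the form {X ≥ 0 : ‖X‖_F ≤ r}. Since the ball is centered at the origin and the nonnegative orthant is a cone, this projection factors into two cheap steps, sketched below (our helper, not from the letter):

```python
import numpy as np

def project_ball_orthant(X, radius):
    """Projection onto {X >= 0 : ||X||_F <= radius}: first project onto the
    nonnegative orthant, then radially onto the ball. For a ball centered at
    the origin intersected with a cone, these two projections compose exactly."""
    P = np.maximum(X, 0.0)          # clip negatives (projection onto the cone)
    nrm = np.linalg.norm(P)
    if nrm > radius:
        P = P * (radius / nrm)      # radial scaling into the ball
    return P
```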

### 4.2 The Second DCA for the Whole NMF Problem

In the first DCA, we have to compute all the variables *U _{ij}* and *V _{ij}* at each iteration. Although the projection of points on a ball is explicit, DCA1 requires significant computational effort if we want to determine an optimal solution of the convex subproblem at each iteration. In order to reduce this effort, we investigate a variable selection strategy similar to the *like DCA* for the NNLS problem; that is, at each iteration of the DCA scheme, some variables are discarded and the others are updated. Such a procedure is possible if the subproblem at each iteration is separable in its variables. This suggests using the projection DC decomposition on rectangles. For this purpose, we design local DC decompositions of *F* by bounding the variables to be updated on a box and then decomposing *F* on this reduced domain, as we show below.

By proposition 1, we have . Therefore, . Consider now the functions and defined by . From the above estimation of , for any and , we have . Therefore, and are convex functions on . Then the NMF problem, equation 1.2, is a DC program of the form . Hence, each iteration of DCA applied to equation 4.8 consists of the two following steps:

It remains to compute the matrices *D*^{U} and *D*^{V}. For the same reason as in section 3.1.2, we see that *D*^{U} and *D*^{V} in equations 4.9 and 4.10 are not necessarily strictly positive; instead, they must only satisfy , where and . As in algorithm DCA-NNLS, we will compute *D*^{U} and *D*^{V} row by row using equation 3.10. Then only the entries *U _{it}* (resp. *V _{jt}*) with (resp. ) are updated at iteration *k*.

Finally, the second DCA for the NMF problem is described in algorithm DCA2:

DCA2: The Second DCA for Solving the Whole NMF Problem
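The full statement of DCA2 relies on equations not reproduced here, so the following is only a simplified sketch of the second approach's key feature: both factors move at the same iteration, with projections onto the nonnegative orthant and constant curvature bounds of our choosing standing in for the matrices D^U and D^V.

```python
import numpy as np

def dca2_step(A, U, V, rho_u=None, rho_v=None):
    """Simultaneous projected update of both factors for F = 0.5*||U V - A||_F^2.
    A heuristic sketch: per-block curvature bounds rho_u, rho_v replace the
    matrices D^U, D^V, and no variable selection is performed."""
    if rho_v is None:
        rho_v = np.linalg.norm(U.T @ U, 2) + 1e-12
    if rho_u is None:
        rho_u = np.linalg.norm(V @ V.T, 2) + 1e-12
    R = U @ V - A                                  # shared residual
    U_new = np.maximum(0.0, U - (R @ V.T) / rho_u)  # both blocks move
    V_new = np.maximum(0.0, V - (U.T @ R) / rho_v)  # from the same residual
    return U_new, V_new
```

Note that any exact nonnegative factorization is a fixed point of this map: when UV = A, the residual vanishes and both factors are left unchanged.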

With the same arguments as in lemmas 4 and 7 and theorem 8, and the fact that in step 2 of algorithm DCA2 depends on , we have the following convergence result:

The conclusion of theorem 8 still holds.

## 5 Solving NMF Variants

In this section, we adapt our proposed approaches for solving several NMF variants, including the nonnegative factorization, the smooth regularization NMF, the sparse regularization NMF, the multilayer NMF, the convex/convex-hull NMF, and the symmetric NMF. For the first five classes of problems, we use the alternating framework similar to equation 3.1 and the method in section 3.1 for solving subproblems. We also show that some existing algorithms are actually special cases of our method. For solving the symmetric NMF, we develop a DCA-based algorithm similar to the method in section 4.1 for the original NMF.

To simplify the presentation, we do not describe the complete alternating procedure as we did for algorithms DCA-NNLS and ADCA. Instead, we derive only the update rule, which is the essential part of the alternating framework.

### 5.1 Nonnegative Factorization

Gillis and Glineur (2008) introduced the problem of nonnegative factorization (NF), which is the same as NMF except that the matrix *A* to be factorized can be any real matrix. We still consider problem 1.2 but do not assume .

Note that the update rules derived in section 3.1.1 (resp. 3.1.2) do not use the assumption (resp. ). Thus, they can be applied to the NF problem. Below, we propose another update rule for the NF problem.

Note that any real matrix *A* can always be written as the difference of two nonnegative matrices *B* and *C*: . For , deriving from equation 5.1, we get the following update rule for the factor *V*: . In particular, by taking , we have . This multiplicative rule was also introduced in Gillis and Glineur (2008).
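With the splitting A = B − C, B = max(A, 0), C = max(−A, 0), the multiplicative rule for the NF problem can be sketched as follows (our notation; the negative part of A enters the denominator so that V stays nonnegative):

```python
import numpy as np

def nf_mu_update_V(A, U, V, eps=1e-9):
    """Multiplicative update of V for nonnegative factorization, where A
    may have negative entries: split A = B - C with B, C >= 0 and keep
    the negative part C in the denominator."""
    B = np.maximum(A, 0.0)   # nonnegative part of A
    C = np.maximum(-A, 0.0)  # magnitude of the negative part
    return V * (U.T @ B) / (U.T @ U @ V + U.T @ C + eps)
```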

### 5.2 Regularized Nonnegative Matrix Factorization

where *J*_{1} and *J*_{2} are regularization terms enforcing certain application-dependent constraints on the solutions, and are trade-off parameters.

The alternating framework computes *U* and *V* at each iteration by solving two subproblems. For instance, the subproblem to compute *V* when *U* is fixed takes the form . We will derive the update rules for computing a solution to this problem in specific cases.

#### 5.2.1 Smooth Regularization

The objective function *f* of problem 5.3 can be represented as , where is a permutation matrix such that . When *L* is represented as , with being positive semidefinite symmetric matrices, we have . As indicated in remark 3, we can replace *Q* with in the denominator of equation 5.4. Thus, for , an update rule of *V* takes the form .

Below, we will study the update rules, equations 5.4 and 5.5 for some standard smooth regularizations and show that existing methods are special cases of our methods.

*Tikhonov regularization (Berry et al., 2006).* If , then , which is known as the Tikhonov regularization. The update rule, equation 5.4, becomes , or, in matrix form, . When and , the preceding update rule becomes . This is the multiplicative update rule in the constrained NMF algorithm proposed in Berry et al. (2006).
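The Tikhonov-regularized multiplicative rule can be sketched as follows (our notation and scaling conventions; `alpha` is the trade-off parameter):

```python
import numpy as np

def tikhonov_mu_update_V(A, U, V, alpha=0.1, eps=1e-9):
    """Multiplicative update of V for
    0.5*||U V - A||_F^2 + 0.5*alpha*||V||_F^2:
    the Tikhonov term alpha*V simply joins the denominator."""
    return V * (U.T @ A) / (U.T @ U @ V + alpha * V + eps)
```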

*Temporal smoothness regularization (Chen & Cichocki, 2005).* Let , where is a template operator. We can directly apply the update rules, equations 5.4 and 5.5, by representing . In a special case, when is sufficiently small so that , and with and , equation 5.4 becomes , or, equivalently, . We obtain the update rule proposed in Chen and Cichocki (2005).

*Graph Laplacian regularization (Cai, He, Wu, & Han, 2008).* Consider , where *T* is a symmetric edge weight matrix with elements and *M* is the diagonal matrix whose elements are the column sums of *T*. Since *M* and *T* are nonnegative, we can apply the update rule, equation 5.5. Specifically, if we take and , equation 5.5 becomes , or, equivalently, . This is also the update rule proposed in Cai et al. (2008).
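A hedged sketch of the graph-regularized update, with V of size r × n, T the symmetric edge-weight matrix, and M the diagonal matrix of its column sums (our notation and scaling, not necessarily that of the cited work):

```python
import numpy as np

def gnmf_mu_update_V(A, U, V, T, lam=0.1, eps=1e-9):
    """Multiplicative update of V (r x n) for the graph-regularized objective
    0.5*||A - U V||_F^2 + 0.5*lam*tr(V (M - T) V^T), with M = diag of the
    column sums of T: the T part goes to the numerator, the M part to the
    denominator, keeping both nonnegative."""
    M = np.diag(T.sum(axis=0))
    return V * (U.T @ A + lam * (V @ T)) / (U.T @ U @ V + lam * (V @ M) + eps)
```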

#### 5.2.2 Sparse Regularization

Recall that one of the important applications of NMF is to learn parts-based representations of nonnegative data. Although NMF itself usually produces sparse representations of data, it is useful to impose more sparsity (Hoyer, 2002, 2004). Below, we consider some existing sparse regularizations.

*-norm regularization (Hoyer, 2002)*. The regularization term is given by . The objective function of problem 5.3 becomes , where *P* is the permutation matrix defined in section 5.2.1. For , let . According to equation 3.6, the update rule takes the form , or, in matrix form, . In particular, by taking , we recover the multiplicative rule proposed in Hoyer (2002):
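A sketch of the ℓ1-regularized multiplicative rule in our notation (the penalty weight `beta` simply enters the denominator):

```python
import numpy as np

def sparse_mu_update_V(A, U, V, beta=0.1, eps=1e-9):
    """Multiplicative update of V for 0.5*||A - U V||_F^2 + beta*sum(V):
    the constant gradient of the l1 penalty joins the denominator,
    shrinking V toward zero (Hoyer-style rule, up to scaling)."""
    return V * (U.T @ A) / (U.T @ U @ V + beta + eps)
```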

*Squared -norm regularization.* A sparse regularization similar to the -norm regularization has been proposed by Kim and Park (2011):

*The Hoyer regularization (Hoyer, 2004).* Hoyer proposed a sparsity measure of a vector as follows: . To control the sparsity of *V*, Hoyer (2004) imposed the constraints , where is the desired sparsity of *V*. When *V* is represented as a vector, these constraints can be rewritten as . This can be penalized in the objective function, and the regularization term is then , where .

Since *J*_{2} is nonconvex but can be expressed as a DC function, we investigate DCA for solving equation 5.3. For this purpose, we consider the following DC decomposition of *f*, the objective function of equation 5.3: . DCA applied to problem 5.3 consists of, at each iteration, computing and solving the convex subproblem , which has the form 3.11. Performing one iteration of the update rule 3.7 on this subproblem, we derive the update rule for problem 5.3 as follows: , where , , and .

### 5.3 Multilayer Nonnegative Matrix Factorization

The multilayer NMF approximates a nonnegative data matrix *A* as a product of *N* nonnegative factor matrices $X_1X_2\cdots X_N$ with compatible sizes (Cichocki & Zdunek, 2006). Measuring discrepancy by the Euclidean distance, the multilayer nonnegative matrix factorization takes the form (Ho, 2008)
$$\min\Big\{F(X_1,\ldots,X_N)=\frac{1}{2}\|A-X_1X_2\cdots X_N\|_F^2 : X_i\geq 0,\ i=1,\ldots,N\Big\}. \quad (5.7)$$
For $i=1,\ldots,N$, we set $L_i=X_1\cdots X_{i-1}$ and $R_i=X_{i+1}\cdots X_N$, with the conventions $L_1=I$ and $R_N=I$. Then the objective function *F* in equation 5.7 can be expressed as $F=\frac{1}{2}\|A-L_iX_iR_i\|_F^2$. According to the alternating framework, we iteratively update the $X_i$s as follows:

**for** $i=1,\ldots,N$ **do** Fix the $X_j$s ($j\neq i$), update $X_i$ as an approximate solution to the problem $\min\{F(X_1,\ldots,X_N): X_i\geq 0\}$.

**end for**
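The alternating sweep above can be sketched as follows. The inner projected-gradient solver is a simple stand-in for the DCA-NNLS scheme referenced in the text, and the sweep, iteration, and dimension parameters are arbitrary illustrative values:

```python
import numpy as np

def multilayer_nmf(A, dims, sweeps=30, inner=20, seed=0):
    """Alternating scheme for A ≈ X_1 X_2 ... X_N with all X_i >= 0.
    dims lists the inner dimensions; e.g. dims=[r] gives two factors."""
    rng = np.random.default_rng(seed)
    sizes = [A.shape[0]] + list(dims) + [A.shape[1]]
    Xs = [rng.random((sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)]
    for _ in range(sweeps):
        for i in range(len(Xs)):
            # Left product L_i = X_1...X_{i-1}, right product R_i = X_{i+1}...X_N.
            L = np.eye(A.shape[0])
            for Y in Xs[:i]:
                L = L @ Y
            R = np.eye(A.shape[1])
            for Y in reversed(Xs[i + 1:]):
                R = Y @ R
            # Projected-gradient stand-in for the NNLS subproblem
            # min 0.5*||A - L X R||_F^2 subject to X >= 0.
            step = 1.0 / (np.linalg.norm(L, 2) ** 2 * np.linalg.norm(R, 2) ** 2 + 1e-12)
            X = Xs[i]
            for _ in range(inner):
                G = L.T @ (L @ X @ R - A) @ R.T  # gradient of the subproblem
                X = np.maximum(0.0, X - step * G)
            Xs[i] = X
    return Xs
```

With `dims=[r]` (two factors) this reduces to an ordinary alternating NNLS scheme for the plain NMF problem.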

### 5.4 Convex and Convex-Hull Nonnegative Matrix Factorization

#### 5.4.1 Convex Coding

The convex coding problem seeks a factorization $UV^T$ of an input matrix *A* such that $V\geq 0$ and $Ve=e$, that is, each row of *V* lies in the canonical simplex. It means that each column of *A* has to be approximated by a convex combination of the columns of *U*. Most popular algorithms for NMF cannot directly handle the constraint $Ve=e$. Lee and Seung (1997) and Berry et al. (2006) proposed penalizing it in the objective function; the regularization term is then $J(V)=\beta\|Ve-e\|^2$, and the subproblem for computing *V* has the form of equation 3.11. However, it is very difficult to find solutions satisfying the full constraint in this way. We show in this section that DCA can be efficiently applied to this problem without the penalty technique.

The alternating scheme for convex coding differs from the one for the original NMF only in the computation of *V* (*U* is computed by solving an NNLS problem using the algorithm DCA-NNLS). Computing *V* amounts to solving a sequence of separate problems of the form
$$\min\Big\{f(x)=\frac{1}{2}\|a-Ux\|^2 : x\in\Delta_r\Big\},$$
where $a$ is the corresponding column of *A* and $\Delta_r$ is the canonical simplex defined by $\Delta_r=\{x\in\mathbb{R}^r : x\geq 0,\ \sum_{i=1}^r x_i=1\}$. For $\rho\geq\lambda_{\max}(U^TU)$, we consider the following DC decomposition of *f*:
$$f(x)=\frac{\rho}{2}\|x\|^2-\Big(\frac{\rho}{2}\|x\|^2-f(x)\Big).$$
DCA applied on this DC decomposition is iteratively processed as follows:

Compute $y^k = x^k - \frac{1}{\rho}\,U^T(Ux^k-a)$.

Update $x^{k+1} = \operatorname{Proj}_{\Delta_r}(y^k)$.
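The two steps above amount to a gradient move followed by a Euclidean projection onto the simplex. A minimal NumPy sketch, using a standard sort-based simplex projection (the projection routine is a generic addition, not taken from the paper):

```python
import numpy as np

def proj_simplex(y):
    """Euclidean projection of y onto the canonical simplex
    {x : x >= 0, sum(x) = 1}, via the classical sort-based algorithm."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, y.size + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(y - theta, 0.0)

def dca_simplex_ls(U, a, iters=200):
    """DCA sketch for min 0.5*||a - U x||^2 over the simplex, with the
    decomposition f = (rho/2)||x||^2 - ((rho/2)||x||^2 - f(x)),
    rho >= lambda_max(U^T U)."""
    rho = np.linalg.norm(U.T @ U, 2) + 1e-12
    x = np.full(U.shape[1], 1.0 / U.shape[1])  # start at the simplex center
    for _ in range(iters):
        y = x - (U.T @ (U @ x - a)) / rho  # gradient move of length 1/rho
        x = proj_simplex(y)                # project back onto the simplex
    return x
```

Each iteration is cheap (one matrix-vector product and an $O(r\log r)$ projection), which matches the per-column structure of the *V*-subproblem.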

#### 5.4.2 Convex Nonnegative Matrix Factorization

Instead of constraining the second factor *V*, the convex-NMF imposes a constraint on the first factor *U* so that its columns are convex combinations of columns of the data matrix *A*. This yields an interesting interpretation: each data point can be expressed as a weighted sum of given data points. The mathematical formulation of convex-NMF takes the form (here, the variable *W* is used, with $U=AW$)
$$\min\Big\{\frac{1}{2}\|A-AWV^T\|_F^2 : W\geq 0,\ V\geq 0,\ W^Te=e\Big\}. \quad (5.14)$$
Similar to equation 5.11, the alternating scheme applied to equation 5.14 differs from the one for the original NMF only on the step of computation of *W* (*V* is computed by solving an NNLS problem via the algorithm DCA-NNLS). Computing *W* amounts to solving the convex problem
$$\min\Big\{\frac{1}{2}\|A-AWV^T\|_F^2 : W\geq 0,\ W^Te=e\Big\}. \quad (5.15)$$
Since the product $WV^T$ is invariant under diagonal scaling (for a diagonal matrix *D* with positive diagonal elements, $WV^T=(WD^{-1})(VD)^T$), the first approach to compute *W* can be carried out through these two steps:

Ignore the additivity constraint $W^Te=e$ and compute *W* using the method discussed at the end of section 5.3.

Normalize the columns of *W* so that each sums to one, rescaling *V* accordingly to keep $WV^T$ unchanged.

We can also completely ignore the additivity constraint and apply the normalization step only once, when the iteration process of computing *W* and *V* is finished. However, it may occur that some columns of *W* are zero, making the normalization step invalid. In other words, this approach cannot be applied when at least one column of *W* is zero.
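The diagonal-scaling argument suggests the following small helper (an illustrative addition, not from the paper): it normalizes the columns of *W* to sum to one while leaving the product $WV^T$ unchanged, and it fails exactly in the zero-column case discussed above.

```python
import numpy as np

def normalize_columns(W, V, eps=1e-12):
    """Return (W D^{-1}, V D) with D = diag(column sums of W), so that each
    column of the new W sums to 1 while W @ V.T is unchanged."""
    s = W.sum(axis=0)
    if np.any(s < eps):
        raise ValueError("a column of W is (numerically) zero; normalization is invalid")
    return W / s, V * s
```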

The second approach consists of directly solving problem 5.15 by DC programming and DCA: analogous to section 5.4.1, each iteration computes a gradient step on the least squares term and then projects each column of *W* onto the simplex.

#### 5.4.3 Convex-Hull Nonnegative Matrix Factorization

Thurau, Kersting, Wahabzada, and Bauckhage (2011) extended the idea of convex-NMF to the so-called convex-hull NMF. This NMF variant adds the constraint on the second factor *V* from the convex coding problem (i.e., each row of *V* lies in the simplex) to the convex-NMF 5.14. Now each data point can be expressed as a convex combination of convex combinations of specific data points. For this problem, we can alternately compute *W* using equation 5.16 and compute *V* by DCA as discussed in section 5.4.1.

It is worth mentioning that neither Ding et al. (2010) nor Thurau et al. (2011) directly dealt with the additivity constraint on *W*. Ding et al. (2010) simply ignored this constraint and alternately computed *W* and *V* by a majorization-minimization method. Meanwhile, instead of computing *W* and setting $U=AW$, Thurau et al. (2011) determined *U* directly by (approximately) finding *k* appropriate vertices of the convex hull of *A*.

### 5.5 Symmetric Nonnegative Matrix Factorization

In some applications, the data matrix *A* is symmetric (e.g., the adjacency matrix of an undirected graph, correlation matrices), and we want to find a nonnegative factorization of *A* that preserves the symmetry. The SNMF aims at approximating a symmetric data matrix by a matrix of the form $UU^T$ with $U\geq 0$. This means that we desire the two matrix factors in equation 1.1 to be equal. The mathematical formulation of the SNMF then takes the form
$$\min\Big\{F(U)=\frac{1}{2}\|A-UU^T\|_F^2 : U\geq 0\Big\}, \quad (5.17)$$
where *A* is an $n\times n$ symmetric nonnegative matrix and $U\in\mathbb{R}^{n\times r}$.

As with the case of NMF, we can bound the solutions of the SNMF problem as follows:

If *U* is a solution of problem 5.17, then its norm is bounded by a constant depending only on *A*.

This bound allows us to restrict problem 5.17 to a compact feasible set, yielding problem 5.18, on which the gradient of *F* is Lipschitz continuous. Thus, for $\rho$ large enough, the objective function of equation 5.18 admits the following DC decomposition:
$$F(U)=\frac{\rho}{2}\|U\|_F^2-\Big(\frac{\rho}{2}\|U\|_F^2-F(U)\Big),$$
and the resulting DCA can be summarized as shown here:

DCA-sym: DCA for Solving the SNMF

Initialize $U^0$ satisfying $U^0\geq 0$, set $k\leftarrow 0$.

**repeat** Compute $Y^k=U^k-\frac{1}{\rho}\nabla F(U^k)$.

Compute $U^{k+1}=\max(0,\,Y^k)$ and set $k\leftarrow k+1$.

**until** Convergence of $\{U^k\}$.
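A minimal NumPy sketch of a scheme of this type is given below. The gradient $\nabla F(U)=2(UU^T-A)U$ holds for symmetric *A*; the per-iteration choice of $\rho$ is a heuristic curvature bound of our own, not the paper's prescription:

```python
import numpy as np

def dca_sym(A, r, iters=500, seed=0):
    """Gradient-step-plus-projection scheme for the SNMF
    min 0.5*||A - U U^T||_F^2, U >= 0, following the decomposition
    F = (rho/2)||U||^2 - ((rho/2)||U||^2 - F(U))."""
    rng = np.random.default_rng(seed)
    U = rng.random((A.shape[0], r))
    for _ in range(iters):
        G = 2.0 * (U @ U.T - A) @ U  # gradient of F for symmetric A
        # Heuristic curvature bound used as the DCA parameter rho.
        rho = 6.0 * np.linalg.norm(U.T @ U, 2) + 2.0 * np.linalg.norm(A, 2) + 1e-6
        U = np.maximum(0.0, U - G / rho)  # project onto the nonnegative orthant
    return U
```

Each iteration costs two small matrix products plus a spectral-norm evaluation, so the scheme scales with the cost of forming $UU^T$.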

## 6 Numerical Experiments

In this section, we evaluate our proposed algorithms, ADCA, DCA1, and DCA2, and compare them with the following state-of-the-art algorithms:

- BPP: Alternating nonnegative least squares with the block principal pivoting method (Kim & Park, 2011)
- HAac: The accelerated hierarchical alternating least squares (Gillis & Glineur, 2012)
- NeNMF: An NMF solver based on Nesterov's optimal gradient method (Guan et al., 2012)
- PNM: NMF via the projected Newton method (Gong & Zhang, 2012)
- QRPBB: NMF using the quadratic regularization projected Barzilai-Borwein method (Huang et al., 2014)

We use the source codes of these algorithms available on the authors' home pages, except for QRPBB, whose code was provided directly by the authors. For the NNLS subproblems, we used the default parameter settings in their codes, which are the best settings according to the authors' suggestions.

In all experiments, the same parameter settings were used for algorithm DCA-NNLS, and *D* was constructed row by row using equation 3.10.

All of these algorithms were executed in Matlab R2008b on a machine with an Intel Core i5-2500S 2 × 2.7 GHz processor and 4 GB of memory. The multithreading option of Matlab was disabled.

The initial matrices $U^0$ and $V^0$ are picked randomly in