In this letter, we consider the nonnegative matrix factorization (NMF) problem and several NMF variants. Two approaches based on DC (difference of convex functions) programming and DCA (DC algorithm) are developed. The first approach follows the alternating framework that requires solving, at each iteration, two nonnegativity-constrained least squares subproblems for which DCA-based schemes are investigated. The convergence property of the proposed algorithm is carefully studied. We show that with suitable DC decompositions, our algorithm generates most of the standard methods for the NMF problem. The second approach directly applies DCA on the whole NMF problem. Two algorithms—one computing all variables and one deploying a variable selection strategy—are proposed. The proposed methods are then adapted to solve various NMF variants, including the nonnegative factorization, the smooth regularization NMF, the sparse regularization NMF, the multilayer NMF, the convex/convex-hull NMF, and the symmetric NMF. We also show that our algorithms include several existing methods for these NMF variants as special versions. The efficiency of the proposed approaches is empirically demonstrated on both real-world and synthetic data sets. It turns out that our algorithms compete favorably with five state-of-the-art alternating nonnegative least squares algorithms.
This problem was introduced in 1994 by Paatero and Tapper (1994) and then received considerable interest after the work of Lee and Seung (1999, 2001). Due to the nonnegativity, equation 1.1 means that the ith column of A is approximated by an additive linear combination of the r basis columns of U, weighted by the components of the ith row of V. The NMF has been successfully applied in many domains, such as text mining (Shahnaz, Berry, Langville, Pauca, & Plemmons, 2006; Berry, Browne, Langville, Pauca, & Plemmons, 2006), image processing (Lee & Seung, 1999), bioinformatics (Kim & Park, 2007), and spectral data analysis (Pauca, Piper, & Plemmons, 2006; Berry et al., 2006).
1.1 Existing Work
Research has been very active on solution methods for the NMF problem and its extensions. The work can be divided into two categories: the first, widely investigated, is based on alternating schemes, and the second, less studied, directly tackles the NMF problem by minimizing the function F with respect to the pair
The idea of alternating schemes is based on the fact that when one of the factors U or V is fixed, problem 1.2 reduces to an efficiently solvable convex nonnegativity-constrained least squares (NNLS) problem. Exploiting this property, most algorithms proposed for the NMF solve, at each iteration, problem 1.2 alternately with respect to one variable U (resp. V) while the other variable V (resp. U) is fixed. Hence the algorithms differ from one another in the way they solve the NNLS subproblems, which follows two directions: some algorithms try to compute an optimal solution, while others settle for an approximate solution (sometimes a very rough one).
Algorithms that compute optimal solutions to NNLS subproblems belong to the class of alternating nonnegative least squares (ANLS). The first proposed algorithms in this class include the projected gradient method (Lin, 2007b), the projected quasi-Newton method (Kim, Sra, & Dhillon, 2007), the active set method (Kim & Park, 2008), and the block principal pivoting method (Kim & Park, 2011). The name of these algorithms indicates the methods they use to solve the NNLS subproblems. However, these methods are quite expensive and complicated to implement. Recently, Guan, Tao, Lou, and Yuan (2012) proposed the so-called NeNMF, which applies Nesterov’s optimal gradient method to solve the NNLS subproblems. Gong and Zhang (2012) solved the NNLS subproblems by a second-order algorithm, the projected Newton method (PNM). In another work (Huang, Liu, & Zhou, 2014), a quadratic regularization projected Barzilai-Borwein (QRPBB) method is used to solve the NNLS subproblems.
Algorithms that compute only an approximate solution to the NNLS subproblem have a cheaper computational cost at each iteration than ANLS-based ones. One such procedure is the multiplicative updating (MU) algorithm proposed by Lee and Seung (1999, 2001). This algorithm has been one of the most commonly used for the NMF, but there are some issues related to its performance and convergence (Gonzalez & Zhang, 2005; Lin, 2007a, 2007b; Kim et al., 2007). Lin (2007a) presented a modified version of the MU to fix its convergence issue. However, the modification is more costly and has a slow convergence rate. Berry et al. (2006) provided a simple way to treat the NNLS problem. It first finds the solution to the unconstrained least squares problem and then projects this solution onto the nonnegative orthant. This method is simple, but it is unstable and lacks a convergence guarantee. An algorithm that iteratively updates each column of V and U per iteration, the hierarchical alternating least squares (HALS) algorithm, also called the rank-one residue iteration (RRI), was introduced in Cichocki, Zdunek, and Amari (2007), Cichocki and Phan (2009), Ho (2008), and Gillis and Glineur (2008). In fact, this method amounts to solving the NNLS subproblems approximately by executing only one step of the sequential coordinate-wise algorithm (Franc, Hlavac, & Navara, 2005). By carefully analyzing the computational cost needed at each iteration, Gillis and Glineur (2012) proposed an accelerated version of the HALS. The idea of this method is to repeat the sequential coordinate-wise algorithm several times at cheap additional cost. This is possible because the computational cost of each sequential coordinate-wise update is much less than that of the preparation step.
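As a concrete illustration of the MU algorithm just described, here is a minimal numpy sketch. It assumes the factorization convention A ≈ UV with U of size m × r and V of size r × n (the letter's exact convention may differ), and the small eps safeguard against division by zero is our addition, not part of the original update rules:

```python
import numpy as np

def mu_step(A, U, V, eps=1e-12):
    """One multiplicative update (Lee & Seung style) for A ~= U @ V.

    Assumed convention: A is m x n, U is m x r, V is r x n.
    eps is a common numerical safeguard, not part of the original rules.
    """
    V = V * (U.T @ A) / (U.T @ U @ V + eps)
    U = U * (A @ V.T) / (U @ V @ V.T + eps)
    return U, V

# Tiny demonstration on a random nonnegative matrix.
rng = np.random.default_rng(0)
A = rng.random((20, 15))
U = rng.random((20, 4))
V = rng.random((4, 15))

err = [np.linalg.norm(A - U @ V, "fro")]
for _ in range(200):
    U, V = mu_step(A, U, V)
    err.append(np.linalg.norm(A - U @ V, "fro"))

# The MU rules keep the factors nonnegative and do not increase the error.
assert err[-1] <= err[0]
```

Note that this sketch inherits the convergence caveats discussed above; it illustrates the update rules, not a recommended solver.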
In the second category of solution methods for the NMF, the algorithms simultaneously compute both factors U and V at each iteration. This approach seems to be more difficult and is therefore less developed; only a little work has been done, by Chu, Diele, Plemmons, and Ragni (2004), Lin (2007b), and Le Thi, Pham Dinh, and Vo (2014).
1.2 Our Contribution
We propose two approaches based on DC (difference of convex functions) programming and DCA (DC algorithms; see Pham Dinh & Le Thi, 1997, 1998; Le Thi & Pham Dinh, 2005), one in each of the two categories noted above. The first approach follows the general alternating framework and applies DCA on the NNLS subproblems; the second approach directly investigates DCA on the NMF problem and simultaneously computes both factors U and V at each iteration.
Our motivation is based on the fact that DC programming and DCA, powerful tools and innovative approaches in nonconvex optimization, have been successfully applied to many (smooth or nonsmooth) large-scale nonconvex programs in various domains of applied sciences and have proved to be more robust and efficient than standard methods, particularly in data analysis and data mining (see, e.g., Le Thi, Belghiti, & Pham Dinh, 2006; Le Thi, Le Hoai, & Pham Dinh, 2007; Le Thi, Le, & Pham Dinh, 2007; Le Thi, Le, Nguyen, & Pham Dinh, 2008; Le, Le Thi, Pham Dinh, & Huynh, 2013; Le Thi, Le, Pham Dinh, & Huynh, 2013; Le Thi, Le, & Pham Dinh, 2014, 2015; Le Thi, Nguyen, & Pham Dinh, 2014; Le Thi, Vo, & Pham Dinh, 2014; Le Thi & Nguyen, 2014; Le Thi, Pham Dinh, Le, & Vo, 2015; Le, Le Thi, & Nguyen, 2015; see also the references in Le Thi, 2005).
Our contributions are multiple. First, we develop a DCA scheme for solving the NNLS convex subproblem in the alternating method. Although convex programming has been studied for about a century, an increasing amount of effort has recently been put into developing fast and scalable algorithms to deal with large-scale problems. For the NNLS problem in particular, several efficient convex optimization methods have been investigated. Since a convex problem can be formulated as a DC program, DCA can also be applied to convex programs. Exploiting the specific structure of the NNLS problem and using a suitable DC decomposition of its objective function, we propose a DCA scheme whose calculations are all explicit. Furthermore, we improve this DCA scheme by considering a DCA-like algorithm that requires much less computational effort.
Second, following the general alternating method and using the DCA designed for the NNLS problem, we propose an alternative DCA-based algorithm for solving the NMF problem and prove its convergence properties. We show that with a suitable DC decomposition, our method includes many existing algorithms for the NMF as special cases.
Third, based on the conventional DCA, we develop two algorithms for the NMF problem that simultaneously compute the two factors U and V of equation 1.2 at each iteration. Exploiting the fact that even though equation 1.2 is NP-hard, its structure is quite simple, namely, the objective function F is twice differentiable and the feasible set is simply defined by nonnegativity constraints, we propose two versions of DCA where all computations are explicit. The first requires the projection of points onto a ball, while the second is based on the projection of points onto a box, in combination with a variable selection updating strategy. This constitutes an original contribution of the letter, given that most existing work shares the alternating scheme and little previous work has solved the whole NMF problem directly.
Fourth, we adapt the proposed DCA-based methods for several NMF variants. We also show that these algorithms cover several existing methods for these NMF problems as special cases. Preliminary experiments on both synthetic and real data sets show the efficiency of our proposed NMF algorithms.
The rest of the letter is organized as follows. In the next section we briefly present DC programming and DCA for general DC programs. Section 3 deals with the first approach, the alternative DCA-based algorithm, while section 4 concerns the second approach, with two DCA-based algorithms for solving the whole NMF problem. Several NMF variants and their solution methods are considered in section 5. The numerical experiments are presented in section 6. Section 7 concludes the letter.
Before beginning, we mention some notation we use. We use uppercase letters to denote matrices and lowercase letters for vectors or scalars. Vectors are also regarded as matrices with one column. $1_{m\times n}$ denotes the $m\times n$ matrix whose entries are all 1. For a matrix $X$, the notation $X_{i\cdot}$ (resp. $X_{\cdot j}$) refers to the $i$th row (resp. $j$th column) of $X$. The inequality $X \ge 0$ means that all entries of $X$ are nonnegative. Each entry of the absolute matrix $|X|$ is defined as $|X|_{ij} = |X_{ij}|$ for all $i, j$. The notation $\otimes$ (resp. $\oslash$) denotes the element-wise product (resp. division) of matrices. The matrix inner product of two matrices is defined by $\langle X, Y\rangle = \sum_{i,j} X_{ij} Y_{ij}$. For two matrices X and Y of the same size, the inequality $X \ge Y$ means that $X_{ij} \ge Y_{ij}$ for all indices $i, j$, while $\max(X, Y)$ (resp. $\min(X, Y)$) denotes the matrix Z such that $Z_{ij} = \max(X_{ij}, Y_{ij})$ (resp. $Z_{ij} = \min(X_{ij}, Y_{ij})$) for all $i, j$. For a scalar $a$, $\max(X, a 1_{m\times n})$ is shortly denoted as $\max(X, a)$.
For a matrix $X$, $\|X\|$ denotes its Frobenius norm. The set of nonnegative (resp. positive) real numbers is denoted by $\mathbb{R}_+$ (resp. $\mathbb{R}_{++}$). The projection of a vector $x$ on a subset $\Omega \subseteq \mathbb{R}^n$, denoted by $P_\Omega(x)$, is the nearest element of $\Omega$ to $x$ with regard to the Euclidean distance. When $\Omega = \mathbb{R}^n_+$, we use the notation $[x]_+$ instead of $P_{\mathbb{R}^n_+}(x)$. It is clear that $[x]_+ = \max(x, 0)$.
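For readers following along in code, the element-wise notation above maps directly onto numpy operations. The snippet below is only an illustrative dictionary, with example values of our choosing:

```python
import numpy as np

X = np.array([[1.0, -2.0], [3.0, -4.0]])
Y = np.array([[2.0, 2.0], [2.0, 2.0]])

abs_X = np.abs(X)                   # absolute matrix |X|
hadamard = X * Y                    # element-wise product
elementwise_div = X / Y             # element-wise division
inner = np.sum(X * Y)               # matrix inner product <X, Y>
maximum = np.maximum(X, Y)          # entrywise max(X, Y)
positive_part = np.maximum(X, 0.0)  # [X]_+ , projection onto the nonnegative orthant
fro = np.linalg.norm(X, "fro")      # Frobenius norm of X

assert inner == -4.0  # 1*2 + (-2)*2 + 3*2 + (-4)*2
```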
2 Outline of DC Programming and DCA
2.1 Philosophy of DCA
DCA is based on local optimality conditions and duality in DC programming (for simplicity, we omit here the presentation of DC duality and the dual part of DCA). The main idea of DCA is simple: each iteration $k$ of DCA approximates the concave part $-H$ by its affine majorization (which corresponds to taking $y^k \in \partial H(x^k)$) and minimizes the resulting convex function.
The generic DCA scheme can be described as follows:
Generic DCA Scheme
Initialization: Let $x^0 \in \mathbb{R}^n$ be an initial guess, $k \leftarrow 0$.
Repeat: compute $y^k \in \partial H(x^k)$; compute $x^{k+1} \in \arg\min \{ G(x) - \langle x, y^k \rangle : x \in \mathbb{R}^n \}$; $k \leftarrow k + 1$.
Until convergence of $\{x^k\}$.
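To make the generic scheme concrete, the following toy example (ours, not from the letter) applies DCA to the one-dimensional DC program $f(x) = x^4 - 2x^2$ with $G(x) = x^4$ and $H(x) = 2x^2$. Here $y^k = H'(x^k) = 4x^k$, and the convex subproblem $\min_x \{x^4 - y^k x\}$ has the closed-form solution $x^{k+1} = (x^k)^{1/3}$:

```python
import numpy as np

# Toy DC program: f(x) = x**4 - 2*x**2 = G(x) - H(x),
# with G(x) = x**4 (convex) and H(x) = 2*x**2 (convex).
# One DCA iteration: y_k = H'(x_k) = 4*x_k, then
# x_{k+1} = argmin_x { x**4 - y_k * x }  =>  4*x**3 = y_k  =>  x = cbrt(x_k).

def dca_step(x):
    y = 4.0 * x              # linearize the concave part -H at x
    return np.cbrt(y / 4.0)  # explicit minimizer of the convex model

f = lambda x: x**4 - 2 * x**2
x = 2.0
vals = [f(x)]
for _ in range(60):
    x = dca_step(x)
    vals.append(f(x))

# Descent without any line search, converging to the critical point x = 1.
assert all(b <= a + 1e-12 for a, b in zip(vals, vals[1:]))
assert abs(x - 1.0) < 1e-6
```

The run illustrates the two properties stated below: monotone descent with unit step size, and convergence to a critical point of $G - H$.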
DCA is a descent method (i.e., the sequence $\{f(x^k)\}$ is decreasing) with global convergence (it converges from any starting point). Moreover, DCA does not require a line search (its step size is always equal to one).
If $x^{k+1} = x^k$, then $x^k$ is a critical point of $G - H$. In such a case, DCA terminates at the $k$th iteration.
If the optimal value of problem (Pdc) is finite and the infinite sequences $\{x^k\}$ and $\{y^k\}$ are bounded, then every limit point of the sequence $\{x^k\}$ is a critical point of $G - H$.
DCA has a linear convergence for general DC programs.
A deeper insight into DCA has been described in Le Thi and Pham Dinh (2005). For instance, it is crucial to note the main feature of DCA: DCA is constructed from the DC components and their conjugates but not from the DC function f itself, which has infinitely many DC decompositions, and there are as many DCAs as there are DC decompositions. Such decompositions play a crucial role in determining the speed of convergence, stability, robustness, and globality of computed solutions. This flexibility of DC programming and DCA is of particular interest from both a theoretical and an algorithmic point of view. Moreover, it is worth pointing out that with suitable DC decompositions, DCA generates most standard methods in convex and nonconvex programming. For instance, the four well-known methods in machine learning—EM (expectation-maximization), SLA (successive linear approximation), CCCP (convex-concave procedure), and ISTA (iterative shrinkage-thresholding algorithm)—are particular cases of DCA (see Pham Dinh & Le Thi, 1998).
For a complete study of DC programming and DCA, readers are referred to Pham Dinh and Le Thi (1997, 1998, 2014) and Le Thi and Pham Dinh (2005). The solution of a nonconvex program (Pdc) by DCA must be composed of two stages: the search for an appropriate DC decomposition of f and that of a good initial point.
In the last decade, a variety of work on machine learning based on DCA has been conducted. The efficiency and the scalability of DCA have been demonstrated in numerous works (see Le et al., 2013; Le Thi, Pham Dinh et al., 2015; and the list of references at http://lita.sciences.univ-metz.fr/lethi/DCA.html). This successful work on DCA motivated us to investigate it for solving the NMF problem.
3 The First Approach: Alternative Method Based on DCA
The alternating methods for solving the NMF problem consist of holding U fixed and optimizing with respect to V, and then holding V fixed and optimizing with respect to U. More precisely, these methods share the following general scheme:
Initialize with nonnegative elements.
Similarly, one may exchange the order of updating U and V.
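A skeleton of this alternating scheme might look as follows; `solve_nnls` is a placeholder name of our own into which any of the NNLS solvers surveyed in section 1 could be plugged, and the one-step projected-gradient surrogate below is purely illustrative:

```python
import numpy as np

def alternating_nmf(A, r, solve_nnls, iters=200, seed=0):
    """Generic alternating scheme for A ~= U @ V (our sketch of scheme 3.1).

    solve_nnls(M, B, X0) should return an (approximate) solution of
    min_{X >= 0} ||M @ X - B||_F^2 starting from X0.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.random((m, r))
    V = rng.random((r, n))
    for _ in range(iters):
        V = solve_nnls(U, A, V)           # fix U, update V
        U = solve_nnls(V.T, A.T, U.T).T   # fix V, update U (transposed problem)
    return U, V

# A rough NNLS surrogate: a single projected-gradient step (illustrative only).
def pg_nnls(M, B, X0):
    L = np.linalg.norm(M.T @ M, 2) + 1e-12  # Lipschitz constant of the gradient
    G = M.T @ (M @ X0 - B)
    return np.maximum(X0 - G / L, 0.0)

A = np.random.default_rng(1).random((12, 10))
U, V = alternating_nmf(A, 3, pg_nnls)
assert U.min() >= 0 and V.min() >= 0
```

The design point is that the outer loop is solver-agnostic: exact ANLS methods and cheap approximate updates differ only in the `solve_nnls` argument.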
3.1 DCA for Solving the NNLS problem
3.1.1 DCA for Solving the NNLS Problem with a Single Right-Hand-Side Vector
Improvement of Rule 3.7: A DCA-like algorithm. Despite the condition , the update rule, equation 3.7, is well defined for any . Moreover, we observe that if the index , that is, , then for any , the update rule, equation 3.7, does not affect the value of . Thus, at each iteration, we can discard the variables vi with . Consequently, to reduce the computational effort, it is reasonable to consider in equation 3.7 and to let when vi is not under consideration. The following result proves that this modified DCA scheme is still a descent method, as the standard DCA is.
To exploit the effect of the DC decomposition on the resulting DCA, we can further refine the modified DCA by computing the vector d dynamically during DCA's iterations. This procedure can be seen as the use of local DC decompositions; that is, the DC decomposition changes from one iteration to the next.
In the sequel we will use equation 3.10 to define d; the resulting algorithm is called the DCA-like algorithm.
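Since the closed-form rule 3.7 itself is not reproduced here, the following sketch shows one standard way a DCA with fully explicit calculations can be obtained for the single-right-hand-side NNLS problem, using a projection DC decomposition with $\rho \ge \lambda_{\max}(U^\top U)$. This is our reconstruction under stated assumptions; the letter's rule 3.7 and vector d may differ:

```python
import numpy as np

def dca_nnls(U, a, iters=500):
    """DCA sketch for min_{v >= 0} 0.5*||U v - a||^2 via the projection
    DC decomposition G(v) = (rho/2)||v||^2 + indicator(v >= 0) and
    H(v) = (rho/2)||v||^2 - 0.5*||U v - a||^2 (convex when
    rho >= lambda_max(U.T U)). Each iteration reduces to a projection
    onto the nonnegative orthant, i.e., [v - grad/rho]_+ ."""
    rho = np.linalg.norm(U.T @ U, 2) + 1e-12
    v = np.zeros(U.shape[1])
    for _ in range(iters):
        grad = U.T @ (U @ v - a)
        v = np.maximum(v - grad / rho, 0.0)
    return v

rng = np.random.default_rng(2)
U = rng.random((30, 5))
v_true = np.maximum(rng.standard_normal(5), 0.0)
a = U @ v_true                      # a consistent right-hand side
v = dca_nnls(U, a)
assert np.linalg.norm(U @ v - a) < 1e-3
```

Each step costs one matrix-vector pair and a clipping, which is why all calculations are said to be explicit.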
3.1.2 DCA for Solving the NNLS Problem with Multiple Right-Hand-Side Vectors
DCA-NNLS: DCA for Solving NNLS problem 3.12
The following assertion is a direct consequence of lemma 2:
Consequently, let be the input and output of algorithm DCA-NNLS. If V0 is not a solution of problem, equation 3.12, then ; otherwise .
3.2 The Alternative DCA-Based Algorithm and Its Convergence Properties
The alternative DCA-based algorithm for solving the NMF problem, equation 1.2, is described in the ADCA algorithm:
ADCA: Alternative DCA-Based Algorithm for Solving the NMF Problem
Initialize with nonnegative elements, , ,
until Stopping criterion is satisfied.
First, let us prove a general result, which will be used for deriving the convergence of ADCA.
The sequence is monotonically decreasing,
Then, every limit point of is a critical point of the problem (P).
Assume that is a limit point of and is not a critical point. Then there is a sub-sequence such that . By the assumption, there are a function and a sub-sub-sequence that satisfy condition b.
The following lemma is useful to prove that the sequence generated by ADCA satisfies condition b of theorem 6.
There is a finite set of continuous functions from such that for any , , and computed by algorithm DCA-NNLS, we have for a .
By the construction of DCA-NNLS, if , and are the inputs and output of DCA-NNLS, there is a such that .
Now we are in a position to prove the convergence of ADCA.
Every limit point of the sequence generated by algorithm ADCA is a critical point of the NMF problem, equation 1.2.
It is worth noting that this convergence analysis is valid not only for our specific choice of D but also for any other D satisfying condition 3.14 and the assumption that there is a finite set of continuous functions from such that each Dl in algorithm DCA-NNLS can be expressed as for a . The latter assumption simply requires that each element of D be computed from U and V via a finite number of formulas. It is easy to see that this assumption is satisfied when , and D is constructed by equation 3.17. The two latter constructions of D also verify condition 3.14, so the convergence of the corresponding NMF algorithms is guaranteed.
3.3 Link between ADCA and Existing Methods
We will see that the algorithm ADCA includes some well-known alternating methods for the NMF problem. As indicated in section 1, the alternating algorithms differ from one another in how they solve the NNLS subproblems. We show below that the update rules for solving the NNLS problem, equation 3.12, in some existing NMF algorithms are nothing other than our update rule, equation 3.13, with a suitable choice of D.
3.3.1 Multiplicative Updating (MU) Algorithm (Lee & Seung, 2001)
3.3.2 Modified MU Algorithm (Lin, 2007a)
4 The Second Approach: DCA Applied on the Whole NMF Problem
Even if equation 1.2 is NP-hard, its structure is quite suitable for investigating DCA: the objective function F is twice differentiable, and the feasible set is simply defined by nonnegativity constraints. For a DC program with a twice differentiable objective function, it is well known that the projection DC decomposition results in very inexpensive DCAs when the feasible set is a ball or a box (see, e.g., Le et al., 2013; Le Thi, Le Hoai et al., 2007; Le Thi, Le et al., 2007; Le Thi, Le et al., 2013; Le Thi & Nguyen, 2014). In this section we propose two versions of such DCAs. The first requires the projection of points onto a ball, and the second is based on the projection of points onto a box, in combination with a variable selection updating strategy similar to the technique developed in section 3.1.
4.1 The First DCA for the Whole NMF Problem
Attempting to design efficient DCAs based on the projection DC decomposition, we first construct two balls bounding the variables U and V (naturally, the smaller the balls, the better the DCA performs) and then restrict the feasible set of the NMF problem to the product of these balls.
By virtue of lemma 9, instead of solving problem 1.2 on the whole space , we only need to consider the restricted problem, 4.1. In the rest of this section, we reformulate problem 4.1 as a DC program and develop DCA for solving it.
DCA1: The First DCA for Solving the Whole NMF Problem
Compute using equation 4.4
Compute using equation 4.5
until Stopping criterion is satisfied.
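Since equations 4.4 and 4.5 are not reproduced here, the following sketch only illustrates the flavor of such a projection-based DCA that updates U and V simultaneously. The radius R and the parameter rho below are crude bounds of our own choosing; the letter's constants (lemma 9 and equations 4.4 and 4.5) are likely tighter:

```python
import numpy as np

def dca_whole_nmf(A, r, iters=300, seed=0):
    """Sketch of a DCA updating U and V simultaneously for
    min_{U,V >= 0} F(U,V) = 0.5*||A - U @ V||_F^2.

    Projection DC decomposition: G = (rho/2)(||U||^2 + ||V||^2) plus the
    indicator of the feasible set, H = G - F. Each iteration is a gradient
    step projected onto the nonnegative orthant intersected with a
    Frobenius ball of radius R (crude bounds of our own)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U, V = rng.random((m, r)), rng.random((r, n))
    R = np.linalg.norm(A, "fro") + 1.0
    rho = 3.0 * R**2 + np.linalg.norm(A, "fro")  # rough Hessian bound on the ball

    def proj(X):
        # Projection onto {X >= 0, ||X||_F <= R}: clip, then rescale if needed.
        X = np.maximum(X, 0.0)
        nrm = np.linalg.norm(X, "fro")
        return X if nrm <= R else X * (R / nrm)

    for _ in range(iters):
        E = U @ V - A  # residual; gradient of F is (E V^T, U^T E)
        U, V = proj(U - (E @ V.T) / rho), proj(V - (U.T @ E) / rho)
    return U, V

A = np.random.default_rng(4).random((12, 10))
U, V = dca_whole_nmf(A, 3)
assert np.linalg.norm(A - U @ V, "fro") < np.linalg.norm(A, "fro")
```

Note the simultaneous update of both factors in one line, which is what distinguishes this second approach from the alternating scheme.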
The sequence generated by algorithm DCA1 has at least one limit point, and every limit point of is a stationary point of problem 2.
The first conclusion is trivial because the sequence is bounded. The second conclusion follows from lemma 9 and the general convergence properties of DCA.
4.2 The Second DCA for the Whole NMF Problem
In the first DCA, we have to compute all the variables Uij and Vij at each iteration. Although the projection of points onto a ball is explicit, DCA1 requires considerable computational effort if we want to determine an optimal solution of the convex subproblem at each iteration. To reduce this effort, we investigate a variable selection strategy similar to the DCA-like algorithm for the NNLS problem; that is, at each iteration of the DCA scheme, some variables are discarded and the others are updated. Such a procedure is possible if the subproblem at each iteration is separable in its variables. This suggests using the projection DC decomposition on rectangles. For this purpose, we design local DC decompositions of F by bounding the variables to be updated in a box and then decomposing F on this reduced domain, as we show below.
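The effect of such a selection strategy can be illustrated as follows (our sketch of the idea, not the letter's exact rule): entries with $X_{ij} = 0$ and a nonnegative partial derivative would be left at 0 by the projected update anyway, so they can be skipped:

```python
import numpy as np

def selective_update(X, G, rho):
    """Projected step [X - G/rho]_+ that touches only 'active' entries.

    Entries with X_ij = 0 and G_ij >= 0 would stay at 0 after the full
    projected update, so they are skipped; this captures the spirit of
    the variable-selection strategy (our sketch)."""
    mask = (X > 0) | (G < 0)
    X = X.copy()
    X[mask] = np.maximum(X[mask] - G[mask] / rho, 0.0)
    return X

X = np.array([[0.0, 2.0], [0.5, 0.0]])
G = np.array([[1.0, 4.0], [-1.0, -2.0]])  # gradient values at X
out = selective_update(X, G, rho=2.0)
# Entry (0,0) is skipped (X=0, G>=0); the others take a projected step.
assert np.allclose(out, [[0.0, 0.0], [1.0, 1.0]])
```

On sparse iterates, the mask leaves most entries untouched, which is where the computational saving comes from.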
Finally, the second DCA for the NMF problem is described in algorithm DCA2:
DCA2: The Second DCA for Solving the Whole NMF Problem
5 Solving NMF Variants
In this section, we adapt our proposed approaches for solving several NMF variants, including the nonnegative factorization, the smooth regularization NMF, the sparse regularization NMF, the multilayer NMF, the convex/convex-hull NMF, and the symmetric NMF. For the first five classes of problems, we use the alternating framework similar to equation 3.1 and the method in section 3.1 for solving subproblems. We also show that some existing algorithms are actually special cases of our method. For solving the symmetric NMF, we develop a DCA-based algorithm similar to the method in section 4.1 for the original NMF.
To simplify the presentation, we do not describe the complete alternating procedure as algorithms DCA-NNLS and ADCA. Instead, we only derive the update rule that is the essential part in the alternating framework.
5.1 Nonnegative Factorization
Gillis and Glineur (2008) introduced the problem of nonnegative factorization (NF), which is the same as NMF except that the matrix A to be factorized can be any real matrix. We still consider problem 1.2 but do not assume .
Note that the update rules derived in section 3.1.1 (resp. 3.1.2) do not use the assumption (resp. ). Thus, they can be applied to the NF problem. Below, we propose another update rule for the NF problem.
5.2 Regularized Nonnegative Matrix Factorization
5.2.1 Smooth Regularization
5.2.2 Sparse Regularization
Recall that one of the important applications of NMF is to learn parts-based representations of nonnegative data. Although NMF itself usually produces sparse representations of data, it is often useful to impose more sparsity (Hoyer, 2002, 2004). Below, we consider some existing sparse regularizations.
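One common way such a sparsity term enters an update rule can be sketched as follows (our illustration with an L1 penalty on V, not necessarily the regularizations treated in this section): adding $\lambda \sum_{ij} V_{ij}$ to the objective simply enlarges the denominator of the multiplicative V-update, shrinking small entries toward zero:

```python
import numpy as np

def sparse_mu_V(A, U, V, lam, eps=1e-12):
    """Multiplicative V-update for 0.5*||A - U V||_F^2 + lam * sum(V),
    i.e., a standard L1 sparsity term on V (illustrative only; the
    regularizers in the letter may differ). eps guards division by zero."""
    return V * (U.T @ A) / (U.T @ U @ V + lam + eps)

rng = np.random.default_rng(3)
A = rng.random((15, 12))
U = rng.random((15, 4))
V = rng.random((4, 12))

V_plain = sparse_mu_V(A, U, V, lam=0.0)
V_sparse = sparse_mu_V(A, U, V, lam=5.0)
# A larger lam means stronger shrinkage of V's entries.
assert V_sparse.sum() < V_plain.sum()
```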
5.3 Multilayer Nonnegative Matrix Factorization
Fix s (), update Xi as an approximate solution to the problem
5.4 Convex and Convex-Hull Nonnegative Matrix Factorization
5.4.1 Convex Coding
5.4.2 Convex Nonnegative Matrix Factorization
Ignore the additivity constraint and compute W using the method discussed at the end of section 5.3.
We can also ignore the additivity constraint completely and apply the normalization step only once the iterative process of computing W and V is finished. However, it may happen that some columns of W are zero, making the normalization step invalid; in other words, this approach cannot be applied when at least one column of W is zero.
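The zero-column caveat can be handled defensively in code; the following helper (our illustration, using L1 column norms since the columns are nonnegative) normalizes only the nonzero columns and reports which ones were valid:

```python
import numpy as np

def normalize_columns(W, eps=1e-12):
    """Normalize each column of a nonnegative W to unit L1 norm,
    skipping (near-)zero columns for which the normalization step
    described above is invalid. Returns the matrix and a validity mask."""
    norms = W.sum(axis=0)   # L1 norms of nonnegative columns
    safe = norms > eps
    W = W.copy()
    W[:, safe] = W[:, safe] / norms[safe]
    return W, safe

W = np.array([[0.2, 0.0],
              [0.6, 0.0]])
W_n, safe = normalize_columns(W)
assert np.allclose(W_n[:, 0], [0.25, 0.75])  # normalized column
assert not safe[1]                           # zero column left untouched
```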
The second approach consists of directly solving problem 5.15 by DC programming and DCA.
Compute and then
5.4.3 Convex-Hull Nonnegative Matrix Factorization
Thurau, Kersting, Wahabzada, and Bauckhage (2011) extended the idea of convex-NMF to the so-called convex-hull NMF. This NMF variant adds the constraint on the second factor V in the convex coding problem (i.e., ) to the convex-NMF 5.14. Now each data point can be expressed as a convex combination of convex combinations of specific data points. For this problem, we can alternately compute W using equation 5.16 and compute V by DCA as discussed in section 5.4.1.
It is worth mentioning that neither Ding et al. (2010) nor Thurau et al. (2011) directly dealt with the constraint . Ding et al. (2010) simply ignored this constraint and alternately computed W and V by a majorization-minimization method. Meanwhile, instead of computing W and setting , Thurau et al. (2011) directly determined U by (approximately) finding k appropriate vertices of the convex hull of A.
5.5 Symmetric Nonnegative Matrix Factorization
As with the case of NMF, we can bound the solution of the SNMF problem as follows:
If U is a solution of problem 5.17, then , where .
DCA-sym: DCA for Solving the SNMF
Initialize U0 satisfying , .
until Convergence of .
6 Numerical Experiments
In this section, we evaluate our proposed algorithms, ADCA, DCA1, and DCA2, and compare them with the following state-of-the-art algorithms:
QRPBB: NMF using quadratic regularization projected Barzilai-Borwein method (Huang et al., 2014)
We use the source codes of these algorithms available on the authors' home pages, except for QRPBB, which was provided directly by the authors. For the NNLS subproblems, we used the default parameter settings in their codes, which are the best settings according to the authors' suggestions.
In all experiments, the settings , , were used for algorithm DCA-NNLS, and D was constructed row by row using equation 3.10.
All of these algorithms were executed in Matlab R2008b on a machine with an Intel Core i5-2500S 2 × 2.7 GHz processor and 4 GB of memory. The multithreading option of Matlab was disabled.