## Abstract

Robust principal component analysis (PCA) is one of the most important dimension-reduction techniques for handling high-dimensional data with outliers. However, most existing robust PCA methods presuppose that the mean of the data is zero and incorrectly use the average of the data as the optimal mean of robust PCA. In fact, this assumption holds only for traditional PCA based on the squared $\ell_2$-norm. In this letter, we equivalently reformulate the objective of conventional PCA and learn the optimal projection directions by maximizing the sum of the projected differences between each pair of instances based on the $\ell_{2,1}$-norm. The proposed method is robust to outliers and also invariant to rotation. More important, the reformulated objective not only automatically avoids the calculation of the optimal mean and makes the assumption of centered data unnecessary, but also theoretically connects to the minimization of reconstruction error. To solve the proposed nonsmooth problem, we exploit an efficient optimization algorithm that softens the contributions from outliers by reweighting each data point iteratively. We theoretically analyze the convergence and computational complexity of the proposed algorithm. Extensive experimental results on several benchmark data sets illustrate the effectiveness and superiority of the proposed method.

## 1  Introduction

High-dimensional data are frequently generated in many scientific domains, such as image processing, visual description, remote sensing, time series prediction, and gene expression. However, handling high-dimensional data is usually computationally expensive due to the curse of dimensionality (Chang, Nie, Ma, Yang, & Zhou, 2015; Chang, Nie, Wang et al., 2015; Nie, Huang, Cai, & Ding, 2010; Oh & Kwak, 2016; Parsons, Haque, & Liu, 2004; S. Wang et al., 2015). Dimension-reduction techniques are typically used to extract meaningful features from high-dimensional data without degrading performance. In this field, principal component analysis (PCA) is one of the most important unsupervised dimensionality-reduction algorithms and has been widely used in many real-world applications for its simplicity and effectiveness (Candès, Li, Ma, & Wright, 2011; Chang, Nie, Yang, & Huang, 2014; Gottumukkal & Asari, 2004; Jolliffe, 2002; Kim & Choi, 2007). Traditional PCA learns a set of projections (a transformation matrix) by minimizing the reconstruction error of the projected data points based on the squared $\ell_2$-norm. Owing to the theoretical equivalence of the two formulations, the projections can also be learned by maximizing the variance of the data in the projected subspace with the squared $\ell_2$-norm. However, traditional PCA is disproportionately affected by outliers, that is, data points deviating significantly from the rest of the data, because they dominate the sum of reconstruction errors (Ding, Zhou, He, & Zha, 2006).

A number of efforts have been devoted to enhancing the robustness of PCA to outliers (Ke & Kanade, 2005; Kwak, 2008; Markopoulos, Karystinos, & Pados, 2014; Meng, Zhao, & Xu, 2012; Nie, Huang, Ding, Luo, & Wang, 2011; Torre & Black, 2001; Wright, Ganesh, Rao, Peng, & Ma, 2009). For example, Ding et al. (2006) proposed a rotational invariant $\ell_1$-norm-based robust PCA (R1-PCA), which softens the contributions from outliers by reweighting each data point iteratively. This method was extended to its two-dimensional (2D) version in Huang and Ding (2008) and Yang, Zhang, Frangi, and Yang (2004). However, R1-PCA models are solved with a subspace iteration algorithm, which requires a lot of time to converge (Kwak, 2008). Kwak (2008) proposed an intuitive method that maximizes the $\ell_1$-norm of variance with a greedy algorithm. The corresponding 2D and supervised versions can be found in Li et al. (2016), Li, Pang, and Yuan (2010), Liu, Liu, and Chan (2010), and Pang, Li, and Yuan (2010). However, the greedy algorithm optimizes the projection directions one by one, which makes it prone to getting stuck in a local solution. Nie et al. (2011) exploited an efficient nongreedy optimization algorithm for the $\ell_1$-norm maximization problem, which optimizes all projection directions simultaneously. Kwak (2014) extended the nongreedy algorithm to an $\ell_p$-norm-based maximization problem. The corresponding 2D versions of robust PCA can be found in Wang (2016) and R. Wang et al. (2015). However, $\ell_1$-norm maximization-based robust PCA methods are not theoretically connected to the minimization of the reconstruction error, which is the key goal of traditional PCA (Nie & Huang, 2016). To address this issue, Nie and Huang (2016) proposed maximizing an $\ell_{2,1}$-norm-based robust PCA objective, which is theoretically connected to the minimization of reconstruction error.

It is noteworthy that existing robust PCA models usually use the average of the data as the optimal mean and assume that the data are already centered, that is, that the average of the data is zero. However, this assumption ignores the calculation of the optimal mean of robust PCA and incorrectly uses the average of the data in its place. In fact, this assumption holds only for PCA based on the squared $\ell_2$-norm. Additionally, outliers in high-dimensional data often bias the predetermined data mean, which degrades the performance of PCA (He, Hu, Zheng, & Kong, 2011; Oh & Kwak, 2016). Relatively few studies take these important issues into consideration. To the best of our knowledge, He et al. (2011) proposed a robust PCA based on the maximum correntropy criterion and handled noncentered data with an estimation of the optimal mean; Nie, Yuan, and Huang (2014) introduced a mean variable and developed a novel robust PCA objective with optimal mean. Nevertheless, both of these methods integrate the mean calculation into the optimization objective, which leads to expensive computation.

In Luo et al. (2016), we developed a new robust PCA by maximizing the sum of projected differences between each pair of instances based on the $\ell_1$-norm distance. This method automatically avoids calculating the $\ell_1$-norm-based optimal mean and makes the assumption of centered data unnecessary. However, like the previous $\ell_1$-norm maximization-based robust PCA methods, this formulation is still not theoretically connected to the minimization of the reconstruction error. Following the reformulation of traditional PCA, in this letter we learn the optimal projection directions by maximizing the sum of projected differences between each pair of instances based on the $\ell_{2,1}$-norm. The proposed method is robust to data outliers and also invariant to rotation. More important, this approach automatically avoids calculating the optimal mean and makes the assumption of centered data unnecessary. Moreover, the $\ell_{2,1}$-norm-based objective function is theoretically connected to minimizing the reconstruction error. An efficient reweighted algorithm is exploited to solve the $\ell_{2,1}$-norm-based maximization problem. We also analyze the convergence of the proposed algorithm, as well as its computational cost, and we extend the proposed robust PCA to its tensor version. Extensive experimental results on several benchmark data sets illustrate the effectiveness and superiority of the proposed method.

The remainder of this letter is organized as follows. We briefly review conventional PCA and some robust PCA methods related to our work in section 3. In section 4, we propose a novel $\ell_{2,1}$-norm maximization-based robust PCA for reconstruction that automatically avoids the optimal mean calculation. Section 5 presents the reweighted algorithm for solving the proposed nonsmooth optimization problem, together with a theoretical analysis of its convergence and computational complexity. We extend the $\ell_{2,1}$-norm maximization-based robust PCA to its tensor version in section 6. In section 7, we conduct extensive experiments on several real-world data sets to verify the effectiveness and superiority of the proposed method. Conclusions are given in the final section.

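In this letter, vectors are written as boldface lowercase letters, and matrices are written as uppercase letters. For any matrix $M$, its $i$th row and $j$th column are denoted by $\mathbf{m}^{i}$ and $\mathbf{m}_{j}$, respectively. The $\ell_p$-norm of a vector $\mathbf{v} \in \mathbb{R}^{n}$ is defined as $\|\mathbf{v}\|_p = \big(\sum_{i=1}^{n} |v_i|^p\big)^{1/p}$. The Frobenius norm of a matrix $M$ is defined as $\|M\|_F = \sqrt{\operatorname{Tr}(M^\top M)}$, where $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix. We define the $\ell_{2,1}$-norm of a matrix $M \in \mathbb{R}^{d \times n}$ as $\|M\|_{2,1} = \sum_{j=1}^{n} \|\mathbf{m}_j\|_2$, the sum of the $\ell_2$-norms of its columns. Note that the $\ell_{2,1}$-norm is a valid norm since it satisfies the three conditions for a norm (Kong, Ding, & Huang, 2011). Additionally, the $\ell_{2,1}$-norm is rotationally invariant for columns, that is, $\|QM\|_{2,1} = \|M\|_{2,1}$ for any orthogonal (rotational) matrix $Q$ (Ding et al., 2006).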
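
As a quick illustration of the $\ell_{2,1}$-norm definition and its rotational invariance, the minimal NumPy sketch below computes the norm as the sum of column $\ell_2$-norms and checks invariance under a random orthogonal matrix. The function name `l21_norm` is hypothetical, and the random data are arbitrary.

```python
import numpy as np

def l21_norm(M: np.ndarray) -> float:
    """l2,1-norm as defined above: the sum of the l2-norms of the columns of M."""
    return float(np.linalg.norm(M, axis=0).sum())

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 8))                    # 8 columns in R^5
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random 5 x 5 orthogonal matrix

print(np.isclose(l21_norm(Q @ M), l21_norm(M)))    # True: each column norm is preserved
print(np.isclose(l21_norm(np.array([[3.0, 0.0], [4.0, 1.0]])), 6.0))  # True: 5 + 1
```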

## 3  PCA Revisit

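Suppose the data matrix is given as $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \in \mathbb{R}^{d \times n}$, where each instance is represented by a $d$-dimensional vector $\mathbf{x}_i \in \mathbb{R}^{d}$ and $n$ refers to the number of instances. Conventional PCA learns a transformation that maps high-dimensional data to low-dimensional representations. Specifically, let $W \in \mathbb{R}^{d \times k}$ ($k < d$) be a semiorthogonal transformation matrix, that is, $W^\top W = I_k$; traditional PCA is formulated as minimizing the reconstruction error in the projected subspace with the squared $\ell_2$-norm,

$$\min_{W^\top W = I_k,\ \mathbf{m}} \; \sum_{i=1}^{n} \left\| (\mathbf{x}_i - \mathbf{m}) - W W^\top (\mathbf{x}_i - \mathbf{m}) \right\|_2^2, \tag{3.1}$$

where $\mathbf{m} \in \mathbb{R}^{d}$ is the mean of the data. The optimal transformation matrix of conventional PCA in terms of reconstruction error can also be learned by maximizing the variance of the data in the projected subspace. We prove this conclusion with lemma 2 and theorem 2.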
Lemma 2.
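For any matrix $W \in \mathbb{R}^{d \times k}$ satisfying $W^\top W = I_k$, the equation

$$\left\| \mathbf{x} - W W^\top \mathbf{x} \right\|_2^2 = \|\mathbf{x}\|_2^2 - \left\| W^\top \mathbf{x} \right\|_2^2$$

holds for any $\mathbf{x} \in \mathbb{R}^{d}$.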
Theorem 2.
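Let $X = [\mathbf{x}_1, \ldots, \mathbf{x}_n] \in \mathbb{R}^{d \times n}$ be the data matrix and $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i$ be the average of the data. Denote by $W^*$ and $\mathbf{m}^*$ the optimal solution of conventional PCA formulated by the reconstruction error objective function of problem 3.1. Then we have $\mathbf{m}^* = \bar{\mathbf{x}}$, and $W^*$ is also the optimal solution of the following optimization problem:

$$\max_{W^\top W = I_k} \; \sum_{i=1}^{n} \left\| W^\top (\mathbf{x}_i - \bar{\mathbf{x}}) \right\|_2^2. \tag{3.2}$$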
Proof.
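Setting the derivative of the objective function of problem 3.1 with respect to $\mathbf{m}$ to zero, we have $(I_d - W W^\top) \sum_{i=1}^{n} (\mathbf{x}_i - \mathbf{m}) = \mathbf{0}$, which is satisfied by $\mathbf{m} = \bar{\mathbf{x}}$. Substituting $\mathbf{m} = \bar{\mathbf{x}}$ into the objective, conventional PCA can be reformulated as

$$\min_{W^\top W = I_k} \; \sum_{i=1}^{n} \left\| (\mathbf{x}_i - \bar{\mathbf{x}}) - W W^\top (\mathbf{x}_i - \bar{\mathbf{x}}) \right\|_2^2. \tag{3.3}$$

Recalling that $W^\top W = I_k$, we arrive at the following equation according to lemma 2:

$$\sum_{i=1}^{n} \left\| (\mathbf{x}_i - \bar{\mathbf{x}}) - W W^\top (\mathbf{x}_i - \bar{\mathbf{x}}) \right\|_2^2 = \sum_{i=1}^{n} \|\mathbf{x}_i - \bar{\mathbf{x}}\|_2^2 - \sum_{i=1}^{n} \left\| W^\top (\mathbf{x}_i - \bar{\mathbf{x}}) \right\|_2^2. \tag{3.4}$$

Since the first term on the right-hand side does not depend on $W$, the optimal transformation matrix for squared $\ell_2$-norm-based PCA can also be obtained by maximizing the variance objective of problem 3.2.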
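
The identity in equation 3.4 and the variance-maximization view of theorem 2 can be checked numerically. The sketch below is only an illustration with arbitrary random data: it builds a semiorthogonal $W$ from a QR factorization, verifies equation 3.4 on non-centered data, and confirms that the top-$k$ eigenvectors of the centered scatter matrix (the standard PCA solution, which is textbook background rather than something stated above) attain at least the variance of that random $W$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 6, 20, 2
X = rng.standard_normal((d, n)) + 3.0              # non-centered data; columns are x_i
W, _ = np.linalg.qr(rng.standard_normal((d, k)))   # semiorthogonal: W.T @ W = I_k

Xc = X - X.mean(axis=1, keepdims=True)             # columns x_i - x_bar
recon = np.sum((Xc - W @ (W.T @ Xc)) ** 2)         # left-hand side of equation 3.4
total = np.sum(Xc ** 2)                            # sum_i ||x_i - x_bar||^2
variance = np.sum((W.T @ Xc) ** 2)                 # sum_i ||W^T (x_i - x_bar)||^2
print(np.isclose(recon, total - variance))         # True

# Standard PCA: the top-k eigenvectors of Xc Xc^T maximize the variance in problem 3.2.
evals, evecs = np.linalg.eigh(Xc @ Xc.T)           # eigenvalues in ascending order
W_pca = evecs[:, -k:]
variance_pca = np.sum((W_pca.T @ Xc) ** 2)
print(variance_pca >= variance)                    # True
```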

Based on theorem 2, we observe that the optimal mean of squared $\ell_2$-norm-based PCA is indeed the average of the data. As a result, traditional PCA is usually implemented by subtracting the average of the data from each datum. This preprocessing ensures that the data are centered a priori (i.e., the mean of the data turns out to be zero). On one hand, however, the outliers in a data set often bias the average; on the other hand, the high computational complexity and serious sensitivity to outliers induced by the squared $\ell_2$-norm make traditional PCA hard to use in real-world applications (Nie et al., 2011).

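There has been some effort devoted to enhancing the robustness of PCA. From the perspective of reconstruction error, Ding et al. (2006) preserved the rotational invariance of traditional PCA by using the $\ell_{2,1}$-norm (also called the R1-norm) and proposed R1-PCA by replacing the squared $\ell_2$-norm in optimization problem 3.1 with the $\ell_{2,1}$-norm:

$$\min_{W^\top W = I_k} \; \sum_{i=1}^{n} \left\| \mathbf{x}_i - W W^\top \mathbf{x}_i \right\|_2 = \min_{W^\top W = I_k} \; \| X - W W^\top X \|_{2,1}. \tag{3.5}$$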
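From the viewpoint of maximizing the variance, Kwak (2008) developed a robust PCA by employing the $\ell_1$-norm instead of the squared $\ell_2$-norm in optimization problem 3.2:

$$\max_{W^\top W = I_k} \; \| W^\top X \|_1 = \max_{W^\top W = I_k} \; \sum_{i=1}^{n} \sum_{j=1}^{k} \left| \mathbf{w}_j^\top \mathbf{x}_i \right|. \tag{3.6}$$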
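However, Kwak (2008) provides no theoretical analysis ensuring that the objective of problem 3.6 is closely related to a reconstruction error-based objective. Nie and Huang (2016) replaced the squared $\ell_2$-norm of optimization problem 3.2 with the $\ell_{2,1}$-norm and proposed maximizing the variance based on the $\ell_{2,1}$-norm:

$$\max_{W^\top W = I_k} \; \| W^\top X \|_{2,1} = \max_{W^\top W = I_k} \; \sum_{i=1}^{n} \left\| W^\top \mathbf{x}_i \right\|_2. \tag{3.7}$$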
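Theoretical and experimental results suggest that the objective of problem 3.7 is closely related to the reconstruction error-based objective of problem 3.5 (Nie & Huang, 2016), that is,

$$c_1 - \| W^\top X \|_{2,1} \;\le\; \| X - W W^\top X \|_{2,1} \;\le\; c_2 - \| W^\top X \|_{2,1}, \tag{3.8}$$

where $c_1$ and $c_2$ ($c_1 < c_2$) denote two constants independent of the transformation matrix $W$.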

Note that the optimal mean is removed from the objectives of optimization problems 3.5 to 3.7 because Ding et al. (2006), Kwak (2008), and Nie and Huang (2016) suppose that the average of the data is the optimal mean of robust PCA, even though these objectives are no longer based on the squared Euclidean distance. However, the average of the data equals the optimal mean only for squared $\ell_2$-norm-based PCA (Luo et al., 2016; Nie et al., 2014). As a result, the robust PCA methods mentioned above ignore the calculation of the optimal mean.

## 4  The Proposed Methodology

In this section, we consider the general case in which the mean of the data is not zero and propose a novel robust PCA based on the $\ell_{2,1}$-norm. We first introduce lemma 3 and theorem 3 to simplify the presentation.

Lemma 3.
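Let $X = [\mathbf{x}_1, \ldots, \mathbf{x}_n] \in \mathbb{R}^{d \times n}$ be the data matrix and $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i$ be the average of the data. Then we have

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \|\mathbf{x}_i - \mathbf{x}_j\|_2^2 = 2n \sum_{i=1}^{n} \|\mathbf{x}_i - \bar{\mathbf{x}}\|_2^2. \tag{4.1}$$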
Proof.
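On one hand, with $\sum_{i=1}^{n}\mathbf{x}_i = n\bar{\mathbf{x}}$, we have

$$\sum_{i=1}^{n} \|\mathbf{x}_i - \bar{\mathbf{x}}\|_2^2 = \sum_{i=1}^{n} \|\mathbf{x}_i\|_2^2 - n\|\bar{\mathbf{x}}\|_2^2. \tag{4.2}$$

On the other hand, we have

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \|\mathbf{x}_i - \mathbf{x}_j\|_2^2 = 2n \sum_{i=1}^{n} \|\mathbf{x}_i\|_2^2 - 2 \Big\| \sum_{i=1}^{n} \mathbf{x}_i \Big\|_2^2 = 2n \sum_{i=1}^{n} \|\mathbf{x}_i\|_2^2 - 2n^2 \|\bar{\mathbf{x}}\|_2^2. \tag{4.3}$$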
As a result, equation 4.1 holds according to the two equations above. The proof is complete.
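
Lemma 3 can also be sanity-checked numerically. The minimal sketch below, with hypothetical helper names and arbitrary random data, compares both sides of equation 4.1 and of its projected counterpart used for theorem 3. It forms all $n^2$ differences explicitly, so it is intended only for small $n$.

```python
import numpy as np

def pairwise_sq_sum(Y: np.ndarray) -> float:
    """sum_{i,j} ||y_i - y_j||_2^2 over all ordered pairs of columns of Y."""
    diffs = Y[:, :, None] - Y[:, None, :]          # diffs[:, i, j] = y_i - y_j
    return float(np.sum(diffs ** 2))

def centered_sq_sum(Y: np.ndarray) -> float:
    """sum_i ||y_i - y_bar||_2^2, where y_bar is the column average of Y."""
    Yc = Y - Y.mean(axis=1, keepdims=True)
    return float(np.sum(Yc ** 2))

rng = np.random.default_rng(2)
d, n, k = 5, 12, 2
X = rng.standard_normal((d, n)) + 10.0             # deliberately far from zero mean
W, _ = np.linalg.qr(rng.standard_normal((d, k)))

print(np.isclose(pairwise_sq_sum(X), 2 * n * centered_sq_sum(X)))              # True
print(np.isclose(pairwise_sq_sum(W.T @ X), 2 * n * centered_sq_sum(W.T @ X)))  # True
```
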
Theorem 3.
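Let $X$ be the data matrix and $\bar{\mathbf{x}}$ be the average of the data. Denote by $W^*$ the optimal transformation matrix of conventional PCA formulated by the reconstruction error objective function of problem 3.1. Then $W^*$ is also the optimal solution of the following optimization problem:

$$\max_{W^\top W = I_k} \; \sum_{i=1}^{n} \sum_{j=1}^{n} \left\| W^\top (\mathbf{x}_i - \mathbf{x}_j) \right\|_2^2. \tag{4.4}$$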
Proof.
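Applying lemma 3 to the projected data $W^\top \mathbf{x}_i$, whose average is $W^\top \bar{\mathbf{x}}$, it is clear that

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \left\| W^\top (\mathbf{x}_i - \mathbf{x}_j) \right\|_2^2 = 2n \sum_{i=1}^{n} \left\| W^\top (\mathbf{x}_i - \bar{\mathbf{x}}) \right\|_2^2, \tag{4.5}$$

that is, the objective of problem 4.4 equals $2n$ times the objective of problem 3.2. Therefore, according to theorem 2, the optimal solution of conventional PCA is also the optimal solution of problem 4.4.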
Based on theorem 3, the optimal transformation matrix can also be obtained by maximizing the sum of the projected differences between each pair of instances instead of the differences between each instance and the mean of the data. As a result, this alternative objective function estimates the transformation matrix while automatically avoiding the calculation of the optimal mean. Motivated by this alternative formulation, Luo et al. (2016) proposed a robust PCA that maximizes the sum of the projected differences between each pair of instances with the $\ell_1$-norm distance measurement:

$$\max_{W^\top W = I_k} \; \sum_{i=1}^{n} \sum_{j=1}^{n} \left\| W^\top (\mathbf{x}_i - \mathbf{x}_j) \right\|_1. \tag{4.6}$$
Note that the use of the $\ell_1$-norm improves the robustness of conventional PCA to outliers and automatically avoids calculating the $\ell_1$-norm-based optimal mean. However, there is no theoretical result ensuring that the objective of problem 4.6 is closely related to the robust reconstruction error.
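In this letter, we replace the squared $\ell_2$-norm in the objective function of problem 4.4 with the $\ell_{2,1}$-norm, that is, the sum of the unsquared $\ell_2$-norms of the projected pairwise differences, and propose a novel robust PCA by solving the following optimization problem:

$$\max_{W^\top W = I_k} \; \sum_{i=1}^{n} \sum_{j=1}^{n} \left\| W^\top (\mathbf{x}_i - \mathbf{x}_j) \right\|_2. \tag{4.7}$$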
Note that the proposed objective 4.7 automatically avoids calculating the optimal mean and makes the assumption of centered data unnecessary. Next, we give a theoretical analysis showing that the proposed robust PCA with objective 4.7 is closely related to the reconstruction error objective of robust PCA. For better understanding, we introduce theorem 4:
Theorem 4.
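If $a \ge 0$ and $b \ge 0$, then

$$\sqrt{a^2 + b^2} \;\le\; a + b \;\le\; \sqrt{2}\,\sqrt{a^2 + b^2}.$$

Proof.
It is evident that $(a + b)^2 = a^2 + b^2 + 2ab \ge a^2 + b^2$.
On the other hand, using the Cauchy-Schwarz inequality, we have $a + b = [1, 1]\,[a, b]^\top \le \sqrt{2}\,\sqrt{a^2 + b^2}$.
Applying lemma 2 to each pairwise difference gives $\|\mathbf{x}_i - \mathbf{x}_j\|_2^2 = \|W^\top(\mathbf{x}_i - \mathbf{x}_j)\|_2^2 + \|(\mathbf{x}_i - \mathbf{x}_j) - WW^\top(\mathbf{x}_i - \mathbf{x}_j)\|_2^2$. Taking $a$ and $b$ in theorem 4 to be the two norms on the right-hand side and summing over all pairs, the following inequality holds:

$$c_1 - \sum_{i=1}^{n}\sum_{j=1}^{n} \left\| W^\top (\mathbf{x}_i - \mathbf{x}_j) \right\|_2 \;\le\; \sum_{i=1}^{n}\sum_{j=1}^{n} \left\| (\mathbf{x}_i - \mathbf{x}_j) - W W^\top (\mathbf{x}_i - \mathbf{x}_j) \right\|_2 \;\le\; c_2 - \sum_{i=1}^{n}\sum_{j=1}^{n} \left\| W^\top (\mathbf{x}_i - \mathbf{x}_j) \right\|_2, \tag{4.8}$$

where $c_1 = \sum_{i,j} \|\mathbf{x}_i - \mathbf{x}_j\|_2$ and $c_2 = \sqrt{2}\, c_1$ ($c_1 \le c_2$) denote two constants independent of the transformation matrix $W$. Therefore, the proposed objective 4.7 is theoretically connected to minimizing the pairwise reconstruction error in the middle of inequality 4.8. In this way, maximizing objective 4.7 also makes sense for minimizing the reconstruction error, and it is thus suitable for robust PCA. Table 1 summarizes the pairs of PCA models that are closely related to each other in theory.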
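
To make inequality 4.8 concrete, the sketch below evaluates the three pairwise sums for random data and an arbitrary semiorthogonal $W$. It assumes the constants $c_1 = \sum_{i,j}\|\mathbf{x}_i - \mathbf{x}_j\|_2$ and $c_2 = \sqrt{2}\,c_1$, which follow from lemma 2 and the two-term inequality $\sqrt{a^2+b^2} \le a + b \le \sqrt{2}\sqrt{a^2+b^2}$; it is a numerical illustration, not a proof, and the helper name is hypothetical.

```python
import numpy as np

def pairwise_norm_sum(Z: np.ndarray) -> float:
    """Sum over all ordered pairs (i, j) of ||z_i - z_j||_2 for the columns of Z."""
    diffs = Z[:, :, None] - Z[:, None, :]
    return float(np.linalg.norm(diffs, axis=0).sum())

rng = np.random.default_rng(4)
d, n, k = 8, 15, 3
X = rng.standard_normal((d, n))
W, _ = np.linalg.qr(rng.standard_normal((d, k)))

J = pairwise_norm_sum(W.T @ X)                 # objective 4.7
E = pairwise_norm_sum(X - W @ (W.T @ X))       # pairwise reconstruction error
c1 = pairwise_norm_sum(X)                      # assumed constant c_1
c2 = np.sqrt(2.0) * c1                         # assumed constant c_2
print(c1 - J <= E <= c2 - J)                   # True
```

Since $c_1$ and $c_2$ do not depend on $W$, increasing `J` lowers both bounds on `E`.
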
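Table 1:
The Pair of PCA Models That Are Closely Related to Each Other in Theory, where $c_1, c_2$ Are Constants.

| Norm | Maximizing Variance | Minimizing Reconstruction Loss | Reason for Theoretical Connection |
| --- | --- | --- | --- |
| Squared $\ell_2$ | Problem 3.2 | Problem 3.1 (equivalently 3.3) | Equation 3.4 |
| $\ell_{2,1}$ (Nie & Huang, 2016) | Problem 3.7 | Problem 3.5 | Inequality 3.8 |
| $\ell_{2,1}$, pairwise (proposed) | Problem 4.7 | Sum of pairwise reconstruction errors in inequality 4.8 | Inequality 4.8 |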

## 5  Optimization Procedure and Theoretical Analysis

In this section, we exploit a reweighted algorithm to solve the proposed optimization problem 4.7. The corresponding theoretical analyses on the convergence and computational complexity are provided to illustrate the efficiency of the proposed algorithm.

### 5.1.  Optimization Procedure

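Considering the nonsmoothness of optimization problem 4.7 with the $\ell_{2,1}$-norm, we introduce the reweighted algorithm for the general optimization problem

$$\max_{\mathbf{v} \in \mathcal{C}} \; f(\mathbf{v}) + \sum_{i} h_i\big(g_i(\mathbf{v})\big), \tag{5.1}$$

where $\mathcal{C}$ is an arbitrary constraint set; $f$ is an arbitrary scalar-output function defined on $\mathcal{C}$; $g_i$ is an arbitrary scalar-, vector-, or matrix-output function defined on $\mathcal{C}$ for each $i$; and $h_i$ is an arbitrary convex function for each $i$. The objective is supposed to have an upper bound. Following Nie and Huang (2016) and Zhang, Zha, Yang, Yan, and Chua (2014), the optimal solution of optimization problem 5.1 can be achieved by iteratively solving the following optimization problem,

$$\max_{\mathbf{v} \in \mathcal{C}} \; f(\mathbf{v}) + \sum_{i} \operatorname{Tr}\big(D_i^\top g_i(\mathbf{v})\big), \tag{5.2}$$

with reweighted algorithm 1, where the weight $D_i$ is calculated at the current solution $\mathbf{v}$ by

$$D_i = h_i'\big(g_i(\mathbf{v})\big), \tag{5.3}$$

that is, a (sub)gradient of $h_i$ evaluated at $g_i(\mathbf{v})$.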
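For optimization problem 5.1, let $f(W) = 0$ and $h_{ij}\big(g_{ij}(W)\big) = \|W^\top(\mathbf{x}_i - \mathbf{x}_j)\|_2$, with $g_{ij}(W) = W^\top(\mathbf{x}_i - \mathbf{x}_j)$ and $h_{ij}(\mathbf{u}) = \|\mathbf{u}\|_2$. We solve the proposed optimization problem 4.7 by addressing the following optimization problem,

$$\max_{W^\top W = I_k} \; \sum_{i=1}^{n}\sum_{j=1}^{n} \mathbf{d}_{ij}^\top W^\top (\mathbf{x}_i - \mathbf{x}_j), \tag{5.4}$$

where the vectors $\mathbf{d}_{ij} \in \mathbb{R}^{k}$ depend on $W$ and are calculated through

$$\mathbf{d}_{ij} = \begin{cases} \dfrac{W^\top (\mathbf{x}_i - \mathbf{x}_j)}{\|W^\top (\mathbf{x}_i - \mathbf{x}_j)\|_2}, & \text{if } \|W^\top (\mathbf{x}_i - \mathbf{x}_j)\|_2 \neq 0, \\[4pt] \mathbf{0}, & \text{otherwise}. \end{cases} \tag{5.5}$$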
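With the reweighted algorithm for $\ell_{2,1}$-norm maximization, we solve optimization problem 5.4 by updating $\mathbf{d}_{ij}$ with the current solution $W_t$ at the $t$th iteration and then updating the solution $W$ with the updated $\mathbf{d}_{ij}$. The iterative procedure is repeated until the algorithm converges. Note that the key step lies in solving optimization problem 5.4. Letting $M = \sum_{i=1}^{n}\sum_{j=1}^{n} (\mathbf{x}_i - \mathbf{x}_j)\, \mathbf{d}_{ij}^\top \in \mathbb{R}^{d \times k}$, we reformulate the objective of optimization problem 5.4 as

$$\max_{W^\top W = I_k} \; \operatorname{Tr}\big(W^\top M\big). \tag{5.6}$$

This optimization problem can be solved efficiently according to theorem 5.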
Theorem 5.
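Suppose the thin SVD of $M \in \mathbb{R}^{d \times k}$ is $M = U \Sigma V^\top$, where $U \in \mathbb{R}^{d \times k}$, $\Sigma \in \mathbb{R}^{k \times k}$, and $V \in \mathbb{R}^{k \times k}$. The solution of the optimization problem

$$\max_{W^\top W = I_k} \; \operatorname{Tr}\big(W^\top M\big) \tag{5.7}$$

is given by $W = U V^\top$.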
Proof.
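Based on the SVD of $M$, we have

$$\operatorname{Tr}\big(W^\top M\big) = \operatorname{Tr}\big(W^\top U \Sigma V^\top\big) = \operatorname{Tr}\big(\Sigma V^\top W^\top U\big) = \operatorname{Tr}(\Sigma Z) = \sum_{i=1}^{k} \sigma_{ii} z_{ii},$$

where $Z = V^\top W^\top U \in \mathbb{R}^{k \times k}$, and $\sigma_{ij}$ and $z_{ij}$ represent the $(i, j)$th elements of matrices $\Sigma$ and $Z$, respectively. Recall the constraint $W^\top W = I_k$; together with $U^\top U = V^\top V = I_k$, it gives $Z Z^\top = V^\top W^\top U U^\top W V \preceq V^\top W^\top W V = I_k$, where $I_k$ is a $k$ by $k$ identity matrix. Hence the $i$th diagonal element of $ZZ^\top$ satisfies $\sum_{j=1}^{k} z_{ij}^2 \le 1$, so that $z_{ii} \le 1$. Combining this with the fact that $\sigma_{ii} \ge 0$ since $\sigma_{ii}$ is a singular value of $M$, we arrive at the following inequality,

$$\operatorname{Tr}\big(W^\top M\big) = \sum_{i=1}^{k} \sigma_{ii} z_{ii} \le \sum_{i=1}^{k} \sigma_{ii}, \tag{5.8}$$

where the equality in equation 5.8 holds when $z_{ii} = 1$ for any $i$. As a result, the objective function reaches its maximum when $Z = I_k$. Recall that $Z = V^\top W^\top U$; thus, the optimal solution to optimization problem 5.7 is

$$W = U V^\top. \tag{5.9}$$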
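
Theorem 5 translates directly into a few lines of NumPy. The sketch below (the function name is hypothetical) uses the thin SVD returned by `numpy.linalg.svd(..., full_matrices=False)` and compares the attained trace with the bound in inequality 5.8 and with a random feasible $W$.

```python
import numpy as np

def max_trace_orthonormal(M: np.ndarray) -> np.ndarray:
    """Solve max_{W^T W = I_k} Tr(W^T M) for M of shape (d, k), d >= k, as W = U V^T."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)   # U: (d, k), Vt: (k, k)
    return U @ Vt

rng = np.random.default_rng(5)
M = rng.standard_normal((7, 3))
W = max_trace_orthonormal(M)
sigma = np.linalg.svd(M, compute_uv=False)             # singular values of M

print(np.allclose(W.T @ W, np.eye(3)))                 # feasible
print(np.isclose(np.trace(W.T @ M), sigma.sum()))      # attains the bound in 5.8
W_rand, _ = np.linalg.qr(rng.standard_normal((7, 3)))
print(np.trace(W_rand.T @ M) <= np.trace(W.T @ M))     # a random feasible W does no better
```
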

In summary, algorithm 2 describes the reweighted optimization algorithm for problem 4.7; the iteration stops when the difference between the objective values of consecutive iterations, normalized by the current objective value, is smaller than a preset threshold.
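
A minimal NumPy sketch of how the steps described above (equation 5.5, the matrix $M$, and the SVD update of theorem 5) could fit together follows. It is a hypothetical illustration, not the authors' implementation: the random orthonormal initialization, the default tolerance `tol=1e-6`, and the `max_iter` cap are assumptions, and all $n^2$ pairwise differences are materialized, so it suits only small $n$. $M$ is formed as $2XS^\top$ with $\mathbf{s}_i = \sum_j \mathbf{d}_{ij}$, which relies on $\mathbf{d}_{ji} = -\mathbf{d}_{ij}$.

```python
import numpy as np

def pairwise_objective(W: np.ndarray, X: np.ndarray) -> float:
    """Objective 4.7: sum over all ordered pairs (i, j) of ||W^T (x_i - x_j)||_2."""
    Y = W.T @ X                                      # (k, n), columns y_i = W^T x_i
    diffs = Y[:, :, None] - Y[:, None, :]            # diffs[:, i, j] = y_i - y_j
    return float(np.linalg.norm(diffs, axis=0).sum())

def aoml21_pca(X: np.ndarray, k: int, tol: float = 1e-6, max_iter: int = 200, seed: int = 0):
    """Hypothetical sketch of the reweighted algorithm for problem 4.7.

    X holds one instance per column (shape (d, n)); no centering is performed.
    Returns the (d, k) semiorthogonal W and the objective value after each iteration.
    """
    d = X.shape[0]
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.standard_normal((d, k)))  # assumed random initialization
    obj = pairwise_objective(W, X)
    history = [obj]
    for _ in range(max_iter):
        Y = W.T @ X
        diffs = Y[:, :, None] - Y[:, None, :]        # (k, n, n)
        norms = np.linalg.norm(diffs, axis=0)        # (n, n)
        # Equation 5.5: normalized projected differences, zero where the norm vanishes.
        safe = np.where(norms > 0, norms, 1.0)
        D = np.where(norms > 0, diffs / safe, 0.0)   # D[:, i, j] = d_ij
        # M = sum_ij (x_i - x_j) d_ij^T = 2 X S^T, with s_i = sum_j d_ij.
        M = 2.0 * X @ D.sum(axis=2).T                # (d, k)
        # Theorem 5: W = U V^T from the thin SVD of M maximizes Tr(W^T M).
        U, _, Vt = np.linalg.svd(M, full_matrices=False)
        W = U @ Vt
        new_obj = pairwise_objective(W, X)
        history.append(new_obj)
        # Stop when the relative change of the objective falls below tol.
        if abs(new_obj - obj) / max(new_obj, 1e-12) < tol:
            break
        obj = new_obj
    return W, history
```

For example, `W, history = aoml21_pca(X, k=3)` returns a semiorthogonal `W` and a `history` that should be nondecreasing, as theorem 6 states. Because objective 4.7 depends only on differences $\mathbf{x}_i - \mathbf{x}_j$, shifting every column of `X` by the same vector leaves `pairwise_objective` unchanged, which is the sense in which no mean has to be estimated.
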

### 5.2.  Convergence Analysis

In this section, we prove that the proposed reweighted algorithm 2 will monotonically increase the value of the proposed objective function 4.7 at each iteration and will converge to a local solution.

Theorem 6.

The reweighted algorithm 2 monotonically increases the value of objective function 4.7 at each iteration.

Proof.
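Based on the Cauchy-Schwarz inequality, we obtain

$$\big(W_{t+1}^\top (\mathbf{x}_i - \mathbf{x}_j)\big)^\top \big(W_t^\top (\mathbf{x}_i - \mathbf{x}_j)\big) \le \left\|W_{t+1}^\top (\mathbf{x}_i - \mathbf{x}_j)\right\|_2 \left\|W_t^\top (\mathbf{x}_i - \mathbf{x}_j)\right\|_2. \tag{5.10}$$

Combining this with the calculation of $\mathbf{d}_{ij}^{t}$ in equation 5.5 (evaluated at $W_t$), we have

$$(\mathbf{d}_{ij}^{t})^\top W_{t+1}^\top (\mathbf{x}_i - \mathbf{x}_j) \le \left\|W_{t+1}^\top (\mathbf{x}_i - \mathbf{x}_j)\right\|_2 \tag{5.11}$$

when $\|W_t^\top (\mathbf{x}_i - \mathbf{x}_j)\|_2 \neq 0$. In fact, if $\|W_t^\top (\mathbf{x}_i - \mathbf{x}_j)\|_2 = 0$, inequality 5.11 still holds, since $\mathbf{d}_{ij}^{t} = \mathbf{0}$ according to equation 5.5. On the other hand, the calculation of $\mathbf{d}_{ij}^{t}$ in equation 5.5 implies that

$$(\mathbf{d}_{ij}^{t})^\top W_t^\top (\mathbf{x}_i - \mathbf{x}_j) = \left\|W_t^\top (\mathbf{x}_i - \mathbf{x}_j)\right\|_2. \tag{5.12}$$

Therefore, according to inequality 5.11 and equation 5.12, we have

$$\left\|W_{t+1}^\top (\mathbf{x}_i - \mathbf{x}_j)\right\|_2 - (\mathbf{d}_{ij}^{t})^\top W_{t+1}^\top (\mathbf{x}_i - \mathbf{x}_j) \;\ge\; \left\|W_t^\top (\mathbf{x}_i - \mathbf{x}_j)\right\|_2 - (\mathbf{d}_{ij}^{t})^\top W_t^\top (\mathbf{x}_i - \mathbf{x}_j) = 0 \tag{5.13}$$

for each pair $(i, j)$. According to step 3 in reweighted algorithm 2, we have

$$W_{t+1} = \arg\max_{W^\top W = I_k} \; \sum_{i=1}^{n}\sum_{j=1}^{n} (\mathbf{d}_{ij}^{t})^\top W^\top (\mathbf{x}_i - \mathbf{x}_j) \tag{5.14}$$

for each iteration $t$; that is, the inequality

$$\sum_{i=1}^{n}\sum_{j=1}^{n} (\mathbf{d}_{ij}^{t})^\top W_{t+1}^\top (\mathbf{x}_i - \mathbf{x}_j) \;\ge\; \sum_{i=1}^{n}\sum_{j=1}^{n} (\mathbf{d}_{ij}^{t})^\top W_t^\top (\mathbf{x}_i - \mathbf{x}_j) \tag{5.15}$$

holds for any $t$. Summing inequality 5.13 over all pairs and adding inequality 5.15, we arrive at

$$\sum_{i=1}^{n}\sum_{j=1}^{n} \left\|W_{t+1}^\top (\mathbf{x}_i - \mathbf{x}_j)\right\|_2 \;\ge\; \sum_{i=1}^{n}\sum_{j=1}^{n} \left\|W_t^\top (\mathbf{x}_i - \mathbf{x}_j)\right\|_2. \tag{5.16}$$

Thus, the reweighted algorithm 2 monotonically increases the objective function of optimization problem 4.7 at each iteration. Because the objective of optimization problem 4.7 has an upper bound (each term satisfies $\|W^\top (\mathbf{x}_i - \mathbf{x}_j)\|_2 \le \|\mathbf{x}_i - \mathbf{x}_j\|_2$ when $W^\top W = I_k$), theorem 6 indicates that the proposed reweighted algorithm 2 converges.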

### 5.3.  Complexity Analysis

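It is evident that the main computational cost of algorithm 2 lies in calculating $M$ and its SVD in steps 3 and 4. Indeed, the $n^2$-term summation in $M$ involves only the $k$-dimensional vectors $\mathbf{d}_{ij}$, due to the following equivalent transformation, which uses $\mathbf{d}_{ji} = -\mathbf{d}_{ij}$ from equation 5.5:

$$M = \sum_{i=1}^{n}\sum_{j=1}^{n} (\mathbf{x}_i - \mathbf{x}_j)\, \mathbf{d}_{ij}^\top = 2 \sum_{i=1}^{n} \mathbf{x}_i \Big( \sum_{j=1}^{n} \mathbf{d}_{ij} \Big)^\top = 2 X S^\top, \qquad S = \Big[ \sum_{j} \mathbf{d}_{1j}, \ldots, \sum_{j} \mathbf{d}_{nj} \Big] \in \mathbb{R}^{k \times n}.$$

Consequently, the computational cost of $M$ is $O(n^2 k + ndk)$ when $X$ and the vectors $\mathbf{d}_{ij}$ are given. Letting $Y = [\mathbf{y}_1, \ldots, \mathbf{y}_n] = W^\top X$, with computational complexity $O(ndk)$, we have

$$\mathbf{d}_{ij} = \frac{\mathbf{y}_i - \mathbf{y}_j}{\|\mathbf{y}_i - \mathbf{y}_j\|_2} \tag{5.17}$$

for $\|\mathbf{y}_i - \mathbf{y}_j\|_2 \neq 0$ (and $\mathbf{d}_{ij} = \mathbf{0}$ otherwise). Thus, the computational cost of all $\mathbf{d}_{ij}$ is $O(n^2 k)$. This cost is acceptable since it is independent of the high dimension $d$ of the data; meanwhile, the reduced dimension $k$ is usually small. The SVD of matrix $M \in \mathbb{R}^{d \times k}$ has time complexity $O(dk^2)$ for $k \le d$. In summary, the computational complexity of algorithm 2 at each iteration is $O(n^2 k + ndk + dk^2)$.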
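
The transformation $M = 2XS^\top$ can be checked against the direct double sum. In the sketch below (hypothetical helper names, random data), `pairwise_weights` evaluates equation 5.17 with the zero convention of equation 5.5, `m_naive` loops over all pairs in $O(n^2 dk)$ time, and `m_fast` uses the transformed expression.

```python
import numpy as np

def pairwise_weights(Y: np.ndarray) -> np.ndarray:
    """d_ij = (y_i - y_j)/||y_i - y_j||_2, or 0 if the norm vanishes; shape (k, n, n)."""
    diffs = Y[:, :, None] - Y[:, None, :]
    norms = np.linalg.norm(diffs, axis=0)
    safe = np.where(norms > 0, norms, 1.0)       # avoid division by zero on the diagonal
    return np.where(norms > 0, diffs / safe, 0.0)

def m_naive(X: np.ndarray, D: np.ndarray) -> np.ndarray:
    """M = sum_{i,j} (x_i - x_j) d_ij^T by an explicit double loop."""
    d, n = X.shape
    M = np.zeros((d, D.shape[0]))
    for i in range(n):
        for j in range(n):
            M += np.outer(X[:, i] - X[:, j], D[:, i, j])
    return M

def m_fast(X: np.ndarray, D: np.ndarray) -> np.ndarray:
    """M = 2 X S^T with s_i = sum_j d_ij; valid because d_ji = -d_ij."""
    S = D.sum(axis=2)                            # (k, n)
    return 2.0 * X @ S.T

rng = np.random.default_rng(6)
X = rng.standard_normal((9, 11))
W, _ = np.linalg.qr(rng.standard_normal((9, 2)))
D = pairwise_weights(W.T @ X)
print(np.allclose(m_naive(X, D), m_fast(X, D)))  # True
```
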

## 6  Extension to Tensor Version of Robust PCA

Traditional PCA methods are usually based on data points in vector format; thus, high-order tensor data must be vectorized into very high-dimensional vectors before PCA is applied. This strategy destroys the spatial information of tensor data and makes the computation very expensive. To address this issue, tensor versions of PCA have been developed to handle the original tensor data directly. In this way, an image matrix does not need to be transformed into a vector prior to feature extraction (Yang et al., 2004). In this section, we take the two-dimensional (2D) tensor as an example and extend the proposed $\ell_{2,1}$-norm-based robust PCA to its 2D tensor version. Note that higher-order tensor cases can also be handled by replacing the linear operator with the corresponding tensor operator (De Lathauwer, 1997).

We suppose that the given data are denoted by $\{A_1, A_2, \ldots, A_n\}$, where each $A_i \in \mathbb{R}^{r \times c}$ is a 2D matrix and $n$ is the number of data. To handle 2D tensor data, we replace the linear operator $W$ used in traditional PCA with two transformation matrices $L \in \mathbb{R}^{r \times k_1}$ and $R \in \mathbb{R}^{c \times k_2}$, where $k_1$ and $k_2$ are the reduced dimensions of the two projection subspaces. Conventional 2D tensor PCA learns the transformation matrices $L$ and $R$ by solving the following maximization problem with the Frobenius norm,

$$\max_{L^\top L = I_{k_1},\; R^\top R = I_{k_2}} \; \sum_{i=1}^{n} \left\| L^\top (A_i - M) R \right\|_F^2, \tag{6.1}$$

where the unknown variable $M \in \mathbb{R}^{r \times c}$ refers to the mean of the 2D tensor data. Based on the Frobenius norm, the optimal mean in optimization problem 6.1 is calculated as $\bar{M} = \frac{1}{n}\sum_{i=1}^{n} A_i$. Therefore, the tensor data are usually centered to ensure that the mean of the data is zero. Considering the sensitivity of the Frobenius norm to outliers, which exist in many practical situations, some research replaces the Frobenius norm with other distance measurements, such as the $\ell_1$-norm used in Pang et al. (2010) and R. Wang et al. (2015) and the $\ell_{2,1}$-norm used in Nie and Huang (2016). However, all of these methods incorrectly employ $\bar{M}$ as the optimal mean, even though this assumption is valid only for conventional 2D tensor PCA with the Frobenius norm.
Following a result similar to theorem 3, we extend the proposed $\ell_{2,1}$-norm-based robust PCA to its 2D tensor version as the following optimization problem:

$$\max_{L^\top L = I_{k_1},\; R^\top R = I_{k_2}} \; \sum_{i=1}^{n}\sum_{j=1}^{n} \left\| L^\top (A_i - A_j) R \right\|_F. \tag{6.2}$$

Note that this objective function enhances the robustness of 2D tensor PCA with an $\ell_{2,1}$-norm-based error measurement; at the same time, it does not depend on centered data and automatically avoids calculating the optimal mean. Similar to other tensor methods, problem 6.2 can be solved through an alternating optimization algorithm. With $R$ fixed, the optimal $L$ can be obtained by solving a problem similar to optimization problem 4.7 with algorithm 2. In turn, the optimal $R$ can be obtained with the currently updated $L$ through algorithm 2.
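
The alternating scheme for problem 6.2 can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: for brevity each block update performs a single reweighted step (the Frobenius-norm analogue of equation 5.5 followed by the SVD update of theorem 5) instead of running algorithm 2 to convergence, the initialization is random, and the helper names are hypothetical. By the same argument as theorem 6, a single step cannot decrease objective 6.2.

```python
import numpy as np

def reweighted_step(blocks, P):
    """One reweighted update for max_{P^T P = I} sum_b ||P^T B_b||_F over matrices B_b."""
    M = np.zeros((blocks[0].shape[0], P.shape[1]))
    for B in blocks:
        G = P.T @ B                                  # projected block
        nrm = np.linalg.norm(G)                      # Frobenius norm
        if nrm > 0:
            M += B @ (G / nrm).T                     # accumulate B D^T with D = G / ||G||_F
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt                                    # maximizer of Tr(P^T M)

def tensor_objective(As, L, R):
    """Objective 6.2: sum over ordered pairs of ||L^T (A_i - A_j) R||_F."""
    return sum(np.linalg.norm(L.T @ (Ai - Aj) @ R) for Ai in As for Aj in As)

def tensor_aoml21(As, k1, k2, n_outer=20, seed=0):
    """Alternating sketch for problem 6.2; As is a list of (r, c) matrices."""
    rng = np.random.default_rng(seed)
    r, c = As[0].shape
    L, _ = np.linalg.qr(rng.standard_normal((r, k1)))
    R, _ = np.linalg.qr(rng.standard_normal((c, k2)))
    pairs = [(i, j) for i in range(len(As)) for j in range(len(As)) if i != j]
    for _ in range(n_outer):
        # R fixed: ||L^T (A_i - A_j) R||_F = ||L^T B||_F with B = (A_i - A_j) R.
        L = reweighted_step([(As[i] - As[j]) @ R for i, j in pairs], L)
        # L fixed: the same norm equals ||R^T B||_F with B = (A_i - A_j)^T L.
        R = reweighted_step([(As[i] - As[j]).T @ L for i, j in pairs], R)
    return L, R
```
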

## 7  Experiment

To illustrate the effectiveness and superiority of the proposed method, we conduct experimental evaluations on face image reconstruction with occlusions and outliers.

### 7.1.  Competitors and Data Set Descriptions

We compare the proposed avoiding-optimal-mean $\ell_{2,1}$-norm maximization-based PCA (PCAAOML21) with other state-of-the-art PCA models, including traditional squared $\ell_2$-norm-based PCA, rotational invariant $\ell_1$-norm PCA (PCAR1; Ding et al., 2006), nongreedy $\ell_{2,1}$-norm maximization-based PCA (PCAL21; Nie & Huang, 2016), and $\ell_1$-norm maximization-based PCA with the calculation of the optimal mean avoided automatically (PCAAOML1; Luo et al., 2016).

All of the experiments are conducted on six real-world benchmark data sets. Some samples from these data sets are shown in Figure 1. Note that each image of facial expression is size for all data sets except the AR data set, with an image size of .

1. Yale data set (Yale; Cai, He, Hu, Han, & Huang, 2007). It contains 165 gray-scale facial images of 15 individuals. Each individual has 11 images with different facial expressions or configurations: center light, with glasses, happy, left light, without glasses, normal, right light, sad, sleepy, surprised, and wink.

Figure 1:

Some samples from six benchmark face data sets. From the first row to the sixth row, the images are from XM2VTS, UMIST, AR, ORL, JAFFE, and Yale.

2. Japanese Female Facial Expression database (JAFFE; Dailey et al., 2010). This data set consists of 213 images of seven facial expressions (happy, angry, disgust, fear, sad, surprise, and neutral) posed by 10 Japanese female models.

3. ORL faces data set (AT&T database of faces; Samaria & Harter, 1994). This contains 400 face images of 40 distinct subjects. For each subject, 10 images are taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses).

4. AR face database (Martinez, 1998). This face database contains over 4000 images, including frontal views of faces with different facial expressions, lighting conditions, and occlusions. Following Nie and Huang (2016), we select seven face images of 120 people (65 men and 55 women) in the experiment.

5. UMIST face data set (Wechsler, Phillips, Bruce, Soulie, & Huang, 2012). This consists of 1012 face images of 20 individuals with mixed race/gender/appearance. Each individual is shown in a range of poses from profile to frontal views.

6. Extended Multi-Modal Verification for Teleservices and Security applications database (XM2VTS; Messer, Matas, Kittler, Luettin, & Maitre, 1999). This database is collected for multi-modal identification of human faces. It contains four face images of 295 people (1180 images) taken over a period of four months.

### 7.2.  Reconstruction Error with Occlusions

For each data set, we randomly select 10%, 20%, and 30% of the face images to be occluded with a randomly located white square, where each square covers one-fourth of the image. To quantify the reconstruction capability, we use the following evaluation metric,

$$e(k) = \frac{1}{n} \sum_{i=1}^{n} \left\| \mathbf{x}_i^{o} - W W^\top \mathbf{x}_i \right\|_2, \tag{7.1}$$

where $k$ is the reduced dimension; $\mathbf{x}_i^{o}$ and $\mathbf{x}_i$ denote the $i$th original image without occlusion and the $i$th image used in the learning procedure, respectively; $n$ is the number of instances; and $W \in \mathbb{R}^{d \times k}$ is the learned transformation matrix.
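
The occlusion protocol and metric 7.1 can be sketched as below. The helper names are hypothetical, and several details are assumptions not fixed by the text: pixel intensities are taken to lie in $[0, 1]$ so that white is `1.0`, a square covering one-fourth of the image is read as a block of half the height by half the width, and images are flattened into the columns of `X` before the error is computed. The random `faces` array and random `W` are stand-ins, not experimental data.

```python
import numpy as np

def occlude(images, frac, rng, white=1.0):
    """Return a copy of images (n, h, w) in which round(frac * n) randomly chosen images
    get a white block of size (h // 2, w // 2) at a random position, plus their indices."""
    out = images.copy()
    n, h, w = images.shape
    sh, sw = h // 2, w // 2
    idx = rng.choice(n, size=int(round(frac * n)), replace=False)
    for i in idx:
        top = rng.integers(0, h - sh + 1)            # upper bound is exclusive
        left = rng.integers(0, w - sw + 1)
        out[i, top:top + sh, left:left + sw] = white
    return out, idx

def reconstruction_error(X_clean, X_used, W):
    """Assumed form of metric 7.1: mean_i ||x_i^o - W W^T x_i||_2, images as columns."""
    residual = X_clean - W @ (W.T @ X_used)
    return float(np.mean(np.linalg.norm(residual, axis=0)))

rng = np.random.default_rng(7)
faces = rng.random((10, 8, 6))                        # stand-in for real face images
occluded, idx = occlude(faces, 0.3, rng)
X_clean = faces.reshape(len(faces), -1).T             # (h * w, n)
X_used = occluded.reshape(len(occluded), -1).T
W, _ = np.linalg.qr(rng.standard_normal((X_clean.shape[0], 5)))
print(len(idx), reconstruction_error(X_clean, X_used, W))
```
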

Under different reduced dimensions, we compare the reconstruction error of the proposed PCAAOML21 with the competitors mentioned above for varying percentages of occluded images (see Figures 2, 3, and 4). From these figures, we make the following observations. First, with a small number of face images occluded, there is no significant difference in the reconstruction performance of traditional PCA and the robust PCA methods (except on the JAFFE data set). However, as more face images are occluded in each data set, traditional PCA performs worse than the robust PCA methods. This is because traditional PCA uses a squared $\ell_2$-norm-based distance measurement, which is sensitive to samples with large reconstruction errors. Second, since there is no theoretical analysis showing a close connection between $\ell_1$-norm maximization-based PCA models and the minimization of reconstruction error, PCAAOML1 performs worse than the other robust PCA models, and even worse than traditional PCA on some data sets, such as the XM2VTS data set with 20% and 30% occluded images. The same observation for $\ell_1$-norm maximization-based robust PCA (PCAL1; Kwak, 2008; Nie et al., 2011) is also reported in Nie and Huang (2016) with extensive experimental results. As a result, $\ell_1$-norm maximization-based robust PCA is not a good option in some cases. Finally, thanks to their theoretical connection to reconstruction error, PCAAOML21 and PCAL21 achieve better performance than the other state-of-the-art robust PCA models over almost all of the data sets. Note that the PCAL21 model ignores the calculation of the optimal mean and incorrectly uses the average of the data as the optimal mean, whereas the proposed PCAAOML21 does not depend on the zero-mean assumption and avoids the optimal mean calculation automatically.

Figure 2:

Reconstruction error comparison with respect to different reduced dimensions over six benchmark data sets. Ten percent of the images in each data set are randomly occluded with a randomly located white square.

Figure 3:

Reconstruction error comparison with respect to different reduced dimensions over six benchmark data sets. Twenty percent of the images in each data set are randomly occluded with a randomly located white square.

Figure 4:

Reconstruction error comparison with respect to different reduced dimensions over six benchmark data sets. Thirty percent of the images in each data set are randomly occluded with a white square at a random location.
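
The occlusion protocol used for Figures 2 to 4 can be sketched as follows. This is a minimal Python/NumPy illustration under our own assumptions: images are stored as rows of flattened gray-scale pixels on an 8-bit scale, "white" means the value 255, and the square side length (`side`) is a hypothetical parameter, since the captions do not state the block size. The helper name `occlude` is also ours.

```python
import numpy as np


def occlude(images, height, width, fraction, side, seed=0):
    """Hypothetical helper: put one white side x side square at a random
    position into a randomly chosen `fraction` of the images.

    images: n x (height*width) array of flattened gray-scale images.
    Returns the occluded copy and the indices of the occluded images.
    """
    rng = np.random.default_rng(seed)
    out = np.array(images, dtype=float)              # work on a float copy
    n = out.shape[0]
    k = int(round(fraction * n))                     # e.g. 10% of the images
    chosen = rng.choice(n, size=k, replace=False)    # which images get occluded
    for idx in chosen:
        img = out[idx].reshape(height, width)
        top = rng.integers(0, height - side + 1)     # random top-left corner
        left = rng.integers(0, width - side + 1)
        img[top:top + side, left:left + side] = 255.0  # white square (8-bit scale assumed)
        out[idx] = img.reshape(-1)
    return out, chosen


# Usage with random stand-in "faces" of 32 x 32 pixels (all values below 255).
faces = np.random.default_rng(1).uniform(0, 200, size=(40, 32 * 32))
occluded, chosen = occlude(faces, 32, 32, fraction=0.1, side=8)
```

Only the selected images change; a PCA model would then be trained on `occluded`, and the reconstruction error would be measured against the clean images.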

### 7.3.  Reconstruction Error with Noise Images

To illustrate reconstruction performance with noise images, in this experiment we add a varying number of images from the Columbia University Image Library (COIL-20) data set (Nene, Nayar, & Murase, 1996) to the facial image data sets. The COIL-20 database contains 1440 gray-scale images of 20 objects. For each object, 72 images are taken at pose intervals of 5 degrees with respect to a fixed camera. Figure 5 shows some samples from COIL-20.
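
The mixing of COIL-20 images into a face data set can be sketched as below. This is a hypothetical Python/NumPy illustration: it assumes that both collections have already been resized to a common resolution and flattened into rows, that the noise fraction is taken relative to the number of COIL-20 images, and that the evaluation uses only the face rows; the letter does not spell out these preprocessing details, and the helper name `add_noise_images` is ours.

```python
import numpy as np


def add_noise_images(faces, coil, fraction, seed=0):
    """Hypothetical helper: append a random `fraction` of the COIL-20 images
    to a facial image set.

    faces, coil: arrays of flattened images (one per row) with the same
    number of pixels, i.e. already resized to a common resolution.
    Returns the mixed training matrix and a boolean mask of the face rows.
    """
    if faces.shape[1] != coil.shape[1]:
        raise ValueError("face and noise images must have the same number of pixels")
    rng = np.random.default_rng(seed)
    k = int(round(fraction * coil.shape[0]))               # e.g. 10% of 1440 = 144
    picked = rng.choice(coil.shape[0], size=k, replace=False)
    X = np.vstack([faces, coil[picked]])                    # PCA models are trained on X
    is_face = np.concatenate([np.ones(len(faces), dtype=bool),
                              np.zeros(k, dtype=bool)])     # assumed: metric uses these rows
    return X, is_face


# Usage with random stand-ins: 400 "faces" and 1440 "objects" of 32 x 32 pixels.
rng = np.random.default_rng(0)
faces = rng.random((400, 32 * 32))
coil = rng.random((1440, 32 * 32))
X, is_face = add_noise_images(faces, coil, fraction=0.1)
```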

Figure 5:

Some samples from the COIL-20 data set.

In this experiment, we evaluate reconstruction performance with the following metric:

$$e(m) = \frac{1}{n}\sum_{i=1}^{n}\left\| x_i - W W^{T} x_i \right\|_2, \tag{7.2}$$

where m is the number of reduced dimensions, n is the number of facial images, x_i is the feature vector of the ith facial image, and W (with m columns) denotes the transformation matrix learned by each PCA model over the collection of facial images and the noise images added from the COIL-20 data set.
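
For concreteness, the metric can be computed as in the Python/NumPy sketch below. It assumes that the error is the average unsquared ℓ2 residual ‖x_i − WWᵀx_i‖_2 over the facial images only, and that W has orthonormal columns; the function name `reconstruction_error` and the orthonormality check are our own additions.

```python
import numpy as np


def reconstruction_error(X_faces, W):
    """Average unsquared l2 reconstruction residual over the facial images.

    X_faces: n x d matrix, one flattened facial image per row.
    W:       d x m transformation matrix with orthonormal columns, learned on
             faces plus noise images; m is the reduced dimension.
    """
    if not np.allclose(W.T @ W, np.eye(W.shape[1]), atol=1e-8):
        raise ValueError("W is expected to have orthonormal columns")
    residual = X_faces - X_faces @ W @ W.T      # row i is x_i - W W^T x_i
    return float(np.linalg.norm(residual, axis=1).mean())


# Tiny check: projecting onto the first two coordinate axes keeps the first
# image exactly and loses the whole third coordinate of the second one.
X_toy = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 5.0]])
W_toy = np.eye(3)[:, :2]
err = reconstruction_error(X_toy, W_toy)
```

Here the residual norms are 0 and 5, so `err` is their average, 2.5.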

For different numbers of reduced dimensions, we show the reconstruction performance of the PCA models on the ORL and XM2VTS data sets in Figure 6, where 10%, 20%, and 30% of the COIL-20 images are randomly selected as noise images. From Figure 6, we make two observations. First, traditional PCA and PCAAOML1 perform worse than the other robust PCA methods, especially on the ORL data set. Traditional squared ℓ2-norm-based PCA is more sensitive to the noise images from the COIL-20 data set and thus performs worse as the number of noise images increases. PCAAOML1 reconstructs the data sets poorly, which experimentally reconfirms that ℓ1-norm maximization PCA is not theoretically connected with the minimization of reconstruction error. Consequently, its performance is unstable across different reduced dimensionalities. A similar observation for robust PCAL1 is reported, with extensive experimental results, in Nie and Huang (2016). Because PCA-HQ and PCA-GM use more robust measurements of reconstruction loss, they outperform traditional PCA as well as the ℓ1-norm-based robust PCA models. Second, the proposed PCAAOML21 performs comparably to, or even better than, the other ℓ2,1-norm-based PCA methods, including the ℓ2,1-norm minimization-based PCAR1 for reconstruction and the ℓ2,1-norm maximization-based PCAL21. Note that all of the ℓ2,1-norm-based methods mentioned in this letter are theoretically connected with minimizing reconstruction error. Although PCAR1 and PCAL21 are only slightly worse, they incorrectly use the data average as the optimal mean and presuppose that the data are centered a priori. It is noteworthy that the proposed PCAAOML21 is not only theoretically connected to minimizing reconstruction error, but also abandons the centered-data assumption and automatically avoids calculating the optimal mean.
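
Why a pairwise-difference objective needs no mean can be seen from a short, standard calculation, included here only as an illustration (the symbols b for an arbitrary translation vector and x̄ for the sample average are our notation). A common translation cancels in every pairwise difference:

$$(x_i + b) - (x_j + b) = x_i - x_j .$$

For the squared ℓ2-norm, expanding the square and using $\sum_{j} W^{T}x_j = nW^{T}\bar{x}$ gives

$$\sum_{i=1}^{n}\sum_{j=1}^{n}\left\|W^{T}(x_i - x_j)\right\|_2^2 = 2n\sum_{i=1}^{n}\left\|W^{T}(x_i - \bar{x})\right\|_2^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,$$

so with the squared norm the pairwise objective is ordinary PCA on data centered at the sample average. With the unsquared ℓ2-norm over pairs, $\sum_{i,j}\|W^{T}(x_i - x_j)\|_2$, only the first identity is used: the objective is unchanged by any translation of the data, and no mean has to be estimated.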

Figure 6:

Reconstruction error comparison with respect to different reduced dimensions over the ORL and XM2VTS data sets, with 10%, 20%, and 30% of the COIL-20 images randomly selected as noise images.

## 8  Conclusion

In this letter, we rewrite the objective of traditional PCA and formulate a novel robust PCA as the ℓ2,1-norm maximization of the differences between pairs of instances in the projected space. We prove that this formulation is theoretically connected to minimizing reconstruction error. At the same time, this strategy automatically avoids calculating the optimal mean and makes the centered-data assumption on the training data unnecessary. We exploited an efficient nongreedy algorithm to solve the proposed nonsmooth optimization problem, with theoretical analysis of its convergence and computational complexity. Extensive experimental results on several benchmark data sets illustrate the effectiveness and superiority of the proposed method for reconstruction.
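
The nongreedy, reweighted solver is summarized above only in words. The Python/NumPy sketch below shows one generic way to maximize a pairwise objective of the assumed form Σ_{i<j} ‖Wᵀ(x_i − x_j)‖_2 subject to WᵀW = I by iterative reweighting. It is our own illustration, not the authors' implementation: the function name `pairwise_l21_pca`, the random orthonormal initialization, and the stopping rule are hypothetical choices.

```python
import numpy as np


def pairwise_l21_pca(X, m, n_iter=100, tol=1e-10, eps=1e-12, seed=0):
    """Sketch (hypothetical helper): maximize sum_{i<j} ||W^T (x_i - x_j)||_2
    subject to W^T W = I by iterative reweighting.

    X: n x d data matrix, one sample per row; no centering is performed.
    Returns W (d x m) and the list of objective values, one per iterate.
    """
    n, d = X.shape
    i, j = np.triu_indices(n, k=1)
    A = X[i] - X[j]                                  # one pairwise difference per row
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.normal(size=(d, m)))     # random orthonormal start
    history = []
    for _ in range(n_iter):
        norms = np.linalg.norm(A @ W, axis=1)        # ||W^T a_k||_2 for every pair k
        history.append(float(norms.sum()))
        # Reweighting: pairs with a large projected distance (e.g. involving
        # an outlier) get a small weight 1 / ||W^T a_k||_2.
        weights = 1.0 / np.maximum(norms, eps)
        M = A.T @ (weights[:, None] * (A @ W))       # sum_k w_k a_k a_k^T W, shape d x m
        U, _, Vt = np.linalg.svd(M, full_matrices=False)
        W = U @ Vt                                   # maximizes tr(W^T M) with W^T W = I
        if len(history) > 1 and abs(history[-1] - history[-2]) <= tol * history[-1]:
            break
    history.append(float(np.linalg.norm(A @ W, axis=1).sum()))
    return W, history


# Usage on deliberately uncentered data (every feature is offset by 10).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2]) + 10.0
W, hist = pairwise_l21_pca(X, m=2)
W_shift, hist_shift = pairwise_l21_pca(X + 100.0, m=2)
```

Because the objective is convex in W, it is bounded below by its linearization tr(WᵀM) at the current iterate, which equals the current objective there; choosing the next W = UVᵀ from the thin SVD of M maximizes that bound, so the objective values in `hist` cannot decrease. Shifting all samples by the same vector leaves the pairwise differences, and hence the result, unchanged up to rounding. Forming all n(n − 1)/2 differences explicitly costs O(n²d) memory, so this version is only meant for small examples.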

## Acknowledgments

This work was funded by the National Science Foundation of China (Nos. 61502377, 61532015, 61532004), the National Key Research and Development Program of China (No. 2016YFB1000903), the National Science Foundation (NSF) under grant No. IIS-1638429, and the China Postdoctoral Science Foundation (No. 2015M582662).

## References

Cai, D., He, X., Hu, Y., Han, J., & Huang, T. (2007). Learning a spatially smooth subspace for face recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE.

Candès, E. J., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis? Journal of the ACM, 58, 11.

Chang, X., Nie, F., Ma, Z., Yang, Y., & Zhou, X. (2015). A convex formulation for spectral shrunk clustering. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (pp. 2532–2538). Cambridge, MA: MIT Press.

Chang, X., Nie, F., Wang, S., Yang, Y., Zhou, X., & Zhang, C. (2015). Compound rank-k projections for bilinear analysis. IEEE Transactions on Neural Networks and Learning Systems, 27, 1502–1513.

Chang, X., Nie, F., Yang, Y., & Huang, H. (2014). A convex sparse PCA for feature analysis. arXiv:1411.6233.

Dailey, M. N., Joyce, C., Lyons, M. J., Kamachi, M., Ishi, H., Gyoba, J., & Cottrell, G. W. (2010). Evidence and a computational explanation of cultural differences in facial expression recognition. Emotion, 10, 874.

De Lathauwer, L. (1997). Signal processing based on multilinear algebra. Leuven: Katholieke Universiteit Leuven.

Ding, C., Zhou, D., He, X., & Zha, H. (2006). R1-PCA: Rotational invariant ℓ1-norm principal component analysis for robust subspace factorization. In Proceedings of International Conference on Machine Learning. New York: ACM.

Gottumukkal, R., & Asari, V. K. (2004). An improved face recognition technique based on modular PCA approach. Pattern Recognition Letters, 25, 429–436.

He, R., Hu, B., Zheng, W., & Kong, X. (2011). Robust principal component analysis based on maximum correntropy criterion. IEEE Transactions on Image Processing, 20, 1485–1494.

Huang, H., & Ding, C. (2008). Robust tensor factorization using R1 norm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE.

Jolliffe, I. (2002). Principal component analysis. New York: Wiley.

Ke, Q., & Kanade, T. (2005). Robust ℓ1 norm factorization in the presence of outliers and missing data by alternative convex programming. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE.

Kim, C., & Choi, C.-H. (2007). Image covariance-based subspace method for face recognition. Pattern Recognition, 40, 1592–1604.

Kong, D., Ding, C., & Huang, H. (2011). Robust nonnegative matrix factorization using ℓ2,1-norm. In Proceedings of ACM International Conference on Information and Knowledge Management. New York: ACM.

Kwak, N. (2008). Principal component analysis based on ℓ1-norm maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 1672–1680.

Kwak, N. (2014). Principal component analysis by ℓp-norm maximization. IEEE Transactions on Cybernetics, 44, 594–609.

Li, B. N., Yu, Q., Wang, R., Xiang, K., Wang, M., & Li, X. (2016). Block principal component analysis with nongreedy ℓ1-norm maximization. IEEE Transactions on Cybernetics, 46, 2543–2547.

Li, X., Pang, Y., & Yuan, Y. (2010). L1-norm-based 2DPCA. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 40, 1170–1175.

Liu, Y., Liu, Y., & Chan, K. C. C. (2010). Multilinear maximum distance embedding via ℓ1-norm optimization. In Proceedings of Association for the Advancement of Artificial Intelligence.

Luo, M., Nie, F., Chang, X., Yang, Y., Hauptmann, A. G., & Zheng, Q. (2016). Avoiding optimal mean robust PCA/2DPCA with non-greedy ℓ1-norm maximization. In Proceedings of International Joint Conference on Artificial Intelligence. Cambridge, MA: MIT Press.

Markopoulos, P. P., Karystinos, G. N., & Pados, D. A. (2014). Optimal algorithms for ℓ1-subspace signal processing. IEEE Transactions on Signal Processing, 62, 5046–5058.

Martinez, A. M. (1998). The AR face database (CVC Technical Report). West Lafayette, IN: Purdue University.

Meng, D., Zhao, Q., & Xu, Z. (2012). Improve robustness of sparse PCA by ℓ1-norm maximization. Pattern Recognition, 45, 487–497.

Messer, K., Matas, J., Kittler, J., Luettin, J., & Maitre, G. (1999). XM2VTSDB: The extended M2VTS database. In Proceedings of the Second International Conference on Audio and Video-Based Biometric Person Authentication.

Nene, S. A., Nayar, S. K., & Murase, H. (1996). Columbia Object Image Library (COIL-20) (Technical Report CUCS-005-96). New York: Columbia University.

Nie, F., & Huang, H. (2016). Non-greedy ℓ2,1-norm maximization for principal component analysis. arXiv:1603.08293.

Nie, F., Huang, H., Cai, X., & Ding, C. H. Q. (2010). Efficient and robust feature selection via joint ℓ2,1-norms minimization. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in neural information processing systems, 23. Red Hook, NY: Curran.

Nie, F., Huang, H., Ding, C., Luo, D., & Wang, H. (2011). Robust principal component analysis with non-greedy ℓ1-norm maximization. In Proceedings of the International Joint Conference on Artificial Intelligence. Cambridge, MA: MIT Press.

Nie, F., Yuan, J., & Huang, H. (2014). Optimal mean robust principal component analysis. In Proceedings of International Conference on Machine Learning. New York: ACM.

Oh, J., & Kwak, N. (2016). Generalized mean for robust principal component analysis. Pattern Recognition, 54, 116–127.

Pang, Y., Li, X., & Yuan, Y. (2010). Robust tensor analysis with ℓ1-norm. IEEE Transactions on Circuits and Systems for Video Technology, 20, 172–178.

Parsons, L., Haque, E., & Liu, H. (2004). Subspace clustering for high dimensional data: A review. SIGKDD Explorations, 6, 90–105.

Samaria, F. S., & Harter, A. C. (1994). Parameterisation of a stochastic model for human face identification. In Proceedings of the Second IEEE Workshop on Applications of Computer Vision. Piscataway, NJ: IEEE.

Torre, F. D., & Black, M. J. (2001). Robust principal component analysis for computer vision. In Proceedings of International Conference on Computer Vision. Piscataway, NJ: IEEE.

Wang, J. (2016). Generalized 2-D principal component analysis by ℓp-norm for image analysis. IEEE Transactions on Cybernetics, 46, 792–803.

Wang, R., Nie, F., Yang, X., Gao, F., & Yao, M. (2015). Robust 2DPCA with non-greedy ℓ1-norm maximization for image analysis. IEEE Transactions on Cybernetics, 45, 1108–1112.

Wang, S., Nie, F., Chang, X., Yao, L., Li, X., & Sheng, Q. Z. (2015). Unsupervised feature analysis with class margin optimization. In Proceedings of Machine Learning and Knowledge Discovery in Databases (Pt. I, pp. 383–398). New York: ACM.

Wechsler, H., Phillips, J. P., Bruce, V., Soulie, F. F., & Huang, T. S. (2012). Face recognition: From theory to applications. New York: Springer.

Wright, J., Ganesh, A., Rao, S. R., Peng, Y., & Ma, Y. (2009). Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in neural information processing systems, 22. Red Hook, NY: Curran.

Yang, J., Zhang, D., Frangi, A. F., & Yang, J.-Y. (2004). Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 131–137.

Zhang, H., Zha, Z.-J., Yang, Y., Yan, S., & Chua, T.-S. (2014). Robust (semi) nonnegative graph embedding. IEEE Transactions on Image Processing, 23, 2996–3012.