## Abstract

Robust principal component analysis (PCA) is one of the most important dimension-reduction techniques for handling high-dimensional data with outliers. However, most existing robust PCA methods presuppose that the mean of the data is zero and incorrectly utilize the average of the data as the optimal mean of robust PCA. In fact, this assumption holds only for the squared $\ell_2$-norm-based traditional PCA. In this letter, we equivalently reformulate the objective of conventional PCA and learn the optimal projection directions by maximizing the sum of the projected differences between each pair of instances based on the $\ell_{2,1}$-norm. The proposed method is robust to outliers and also invariant to rotation. More important, the reformulated objective not only automatically avoids the calculation of the optimal mean and makes the assumption of centered data unnecessary, but is also theoretically connected to the minimization of the reconstruction error. To solve the proposed nonsmooth problem, we exploit an efficient optimization algorithm that softens the contributions from outliers by reweighting each data point iteratively. We theoretically analyze the convergence and computational complexity of the proposed algorithm. Extensive experimental results on several benchmark data sets illustrate the effectiveness and superiority of the proposed method.

## 1 Introduction

High-dimensional data are frequently generated in many scientific domains, such as image processing, visual description, remote sensing, time series prediction, and gene expression. However, it is usually computationally expensive to handle high-dimensional data due to the curse of dimensionality (Chang, Nie, Ma, Yang, & Zhou, 2015; Chang, Nie, Wang et al., 2015; Nie, Huang, Cai, & Ding, 2010; Oh & Kwak, 2016; Parsons, Haque, & Liu, 2004; S. Wang et al., 2015). Dimension-reduction techniques are typically used to extract meaningful features from high-dimensional data without degrading performance. In this field, principal component analysis (PCA) is one of the most important unsupervised dimensionality-reduction algorithms and has been widely used in many real-world applications for its simplicity and effectiveness (Candès, Li, Ma, & Wright, 2011; Chang, Nie, Yang, & Huang, 2014; Gottumukkal & Asari, 2004; Jolliffe, 2002; Kim & Choi, 2007). Traditional PCA learns a set of projections (a transformation matrix) by minimizing the reconstruction error of the projected data points based on the squared $\ell_2$-norm. By a well-known equivalence, the projections can also be learned by maximizing the variance of the data in the projected subspace under the squared $\ell_2$-norm. However, traditional PCA is disproportionately affected by the presence of outliers, which dominate the sum of reconstruction errors, where the outliers are defined as the data points deviating significantly from the rest of the data (Ding, Zhou, He, & Zha, 2006).
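To make this equivalence concrete, the following sketch (with a hypothetical random data matrix) verifies numerically that the variance captured in the projected subspace and the squared $\ell_2$-norm reconstruction error sum to the constant total variance of the centered data, so maximizing the former is the same as minimizing the latter:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 features (illustrative data)
Xc = X - X.mean(axis=0)                # centering, valid for squared l2-norm PCA

# Top-m principal directions from the eigendecomposition of the scatter matrix.
m = 2
scatter = Xc.T @ Xc
vals, vecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order
W = vecs[:, -m:]                       # d x m projection matrix with W^T W = I

# Objective 1: total variance captured in the projected subspace (to maximize).
variance = np.sum((Xc @ W) ** 2)

# Objective 2: squared l2-norm reconstruction error (to minimize).
recon_err = np.sum((Xc - Xc @ W @ W.T) ** 2)

# Their sum is the (constant) total variance of the data, so the two
# objectives are equivalent up to a constant.
total = np.sum(Xc ** 2)
assert np.isclose(variance + recon_err, total)
```

The identity holds for any orthonormal projection, not only the PCA solution, which is why the two formulations of PCA select the same optimum.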

A number of efforts have been devoted to enhancing the robustness of PCA to outliers (Ke & Kanade, 2005; Kwak, 2008; Markopoulos, Karystinos, & Pados, 2014; Meng, Zhao, & Xu, 2012; Nie, Huang, Ding, Luo, & Wang, 2011; Torre & Black, 2001; Wright, Ganesh, Rao, Peng, & Ma, 2009). For example, Ding et al. (2006) proposed a rotationally invariant $R_1$-norm-based robust PCA ($R_1$-PCA), which softens the contributions from outliers by reweighting each data point iteratively. This method was extended to its two-dimensional (2D) version in Huang and Ding (2008) and Yang, Zhang, Frangi, and Yang (2004). However, $R_1$-PCA models are solved with a subspace iteration algorithm, which requires a lot of time to converge (Kwak, 2008). Kwak (2008) proposed an intuitive method that maximizes the $\ell_1$-norm-based variance with a greedy algorithm. The corresponding 2D and supervised versions can be found in Li et al. (2016), Li, Pang, and Yuan (2010), Liu, Liu, and Chan (2010), and Pang, Li, and Yuan (2010). However, the greedy algorithm optimizes the projection directions one by one, which makes it easy to get stuck in a local solution. Nie et al. (2011) exploited an efficient nongreedy optimization algorithm for the $\ell_1$-norm maximization problem, which optimizes all projection directions simultaneously. Kwak (2014) extended the nongreedy algorithm to an $\ell_p$-norm-based maximization problem. The corresponding 2D versions of robust PCA can be found in Wang (2016) and R. Wang et al. (2015). However, $\ell_1$-norm maximization-based robust PCA methods are not theoretically connected to the minimization of the reconstruction error, the central goal of traditional PCA (Nie & Huang, 2016). To address this issue, Nie and Huang (2016) proposed an $\ell_{2,1}$-norm maximization-based robust PCA objective, which is theoretically connected to the minimization of the reconstruction error.

It is noteworthy that existing robust PCA models usually use the average of the data as the optimal mean and assume that the data are already centered, that is, that the average of the data is zero. However, this assumption in effect ignores the calculation of the data mean in robust PCA and incorrectly uses the average of the data as the optimal mean. In fact, this assumption holds only for the squared $\ell_2$-norm-based PCA. Additionally, the outliers in high-dimensional data often make the predetermined data mean biased, which degrades the performance of PCA (He, Hu, Zheng, & Kong, 2011; Oh & Kwak, 2016). Relatively few studies take these important issues into consideration. To the best of our knowledge, He et al. (2011) proposed a robust PCA based on the maximum correntropy criterion and handled noncentered data with an estimation of the optimal mean, and Nie, Yuan, and Huang (2014) introduced a mean variable and exploited a novel robust PCA objective with optimal mean. Nevertheless, both of these methods integrate the mean calculation into the optimization objective, which leads to expensive computation.

In Luo et al. (2016), we developed a new robust PCA that maximizes the sum of the projected differences between each pair of instances based on the $\ell_1$-norm distance. This method automatically avoids calculating the $\ell_1$-norm-based optimal mean and makes the assumption of centered data unnecessary. However, like the previous $\ell_1$-norm maximization-based robust PCA, this formulation is still not theoretically connected to the minimization of the reconstruction error. Following the reformulation of traditional PCA, in this letter, we learn the optimal projection directions by maximizing the sum of the projected differences between each pair of instances based on the $\ell_{2,1}$-norm. The proposed method is robust to data outliers and also invariant to rotation. More important, this approach automatically avoids calculating the optimal mean and makes the assumption of centered data unnecessary. Moreover, the $\ell_{2,1}$-norm-based objective function is theoretically connected to minimizing the reconstruction error. An efficient reweighted algorithm is exploited to solve the $\ell_{2,1}$-norm-based maximization problem. We also analyze the convergence of the proposed algorithm, as well as its computational cost, and extend the proposed robust PCA to its tensor version. Extensive experimental results on several benchmark data sets illustrate the effectiveness and superiority of the proposed method.

The remainder of this letter is organized as follows. We briefly review conventional PCA in section 2 and some robust PCA models related to our work in section 3. In section 4, we propose a novel $\ell_{2,1}$-norm maximization-based robust PCA for reconstruction that automatically avoids the optimal mean calculation. Section 5 focuses on the reweighted algorithm for the proposed nonsmooth optimization problem, with theoretical analysis of the convergence and computational complexity. We extend the $\ell_{2,1}$-norm maximization-based robust PCA to its tensor version in section 6. In section 7, we conduct extensive experiments on several real-world data sets to verify the effectiveness and superiority of the proposed method. Conclusions are given in section 8.

In this letter, vectors are written as boldface lowercase letters and matrices as uppercase letters. For any matrix $A$, its $i$th row and $j$th column are represented as $a^i$ and $\mathbf{a}_j$, respectively. The $\ell_2$-norm of a vector $\mathbf{v}$ is defined as $\|\mathbf{v}\|_2 = \sqrt{\sum_i v_i^2}$. The Frobenius norm of a matrix $A$ is defined as $\|A\|_F = \sqrt{\operatorname{tr}(A^\top A)}$, where $\operatorname{tr}(\cdot)$ refers to the trace of a matrix. We define the $\ell_{2,1}$-norm of a matrix $A$ as $\|A\|_{2,1} = \sum_j \|\mathbf{a}_j\|_2$, the sum of the $\ell_2$-norms of its columns. Note that the $\ell_{2,1}$-norm is a valid norm since it satisfies the three conditions for a norm (Kong, Ding, & Huang, 2011). Additionally, the $\ell_{2,1}$-norm is rotationally invariant for columns, $\|RA\|_{2,1} = \|A\|_{2,1}$ for any orthogonal (rotational) matrix $R$ (Ding et al., 2006).
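As a small numerical illustration (the matrix below is an arbitrary random example), the column-wise $\ell_{2,1}$-norm and its rotational invariance can be checked directly:

```python
import numpy as np

def l21_norm(A):
    """Column-wise l_{2,1}-norm: sum of the l2-norms of the columns of A."""
    return np.sum(np.linalg.norm(A, axis=0))

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 6))

# Build a random orthogonal (rotation) matrix R via QR decomposition.
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))

# Rotational invariance for columns: ||R A||_{2,1} == ||A||_{2,1},
# because an orthogonal matrix preserves the l2-norm of every column.
assert np.isclose(l21_norm(R @ A), l21_norm(A))
```

By contrast, the entry-wise $\ell_1$-norm does not have this invariance, which is one reason the $\ell_{2,1}$-norm is preferred for rotationally invariant robust PCA.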

## 2 PCA Revisit

We first introduce lemma 3 and theorem 4 for a better representation.

According to lemma 3, the optimal transformation matrix for the squared $\ell_2$-norm-based PCA can also be achieved by maximizing the variance of the data in the projected subspace.

Based on theorem 4, we observe that the optimal mean of the squared $\ell_2$-norm-based PCA is indeed the average of the data. As a result, traditional PCA is usually implemented by subtracting the average of the data from each datum. This preprocessing ensures that the data are centered a priori (i.e., the mean of the data becomes zero). However, on one hand, the outliers involved in a data set often make the average of the data biased; on the other hand, the high computational complexity and serious sensitivity to outliers induced by the squared $\ell_2$-norm make traditional PCA hard to apply in real-world applications (Nie et al., 2011).

## 3 Robust PCA Revisit

Note that the optimal mean is removed from the corresponding objectives of optimization problems 3.5 to 3.7 in Ding et al. (2006), Kwak (2008), and Nie and Huang (2016), since the average of the data is presumed to be the optimal mean of robust PCA, even though these objectives are no longer based on the Euclidean distance. However, the average of the data equals the optimal mean only for the squared $\ell_2$-norm-based PCA (Luo et al., 2016; Nie et al., 2014). As a result, the robust PCA methods mentioned above in fact ignore the calculation of the optimal mean.

## 4 The Proposed Methodology

In this section, we consider the general case in which the mean of the data is not zero and propose a novel robust PCA based on the $\ell_{2,1}$-norm. We first introduce lemma 5 and theorem 6 for a better representation.

According to theorem 6, the optimal transformation matrix can also be achieved by maximizing the sum of the projected differences between each pair of instances instead of the differences between each instance and the mean of the data. As a result, the alternative objective function estimates the transformation matrix with the calculation of the optimal mean avoided automatically. Motivated by this alternative formulation, a robust PCA was proposed in Luo et al. (2016) that maximizes the sum of the projected differences between each pair of instances under the $\ell_1$-norm distance measurement. Note that the utilization of the $\ell_1$-norm improves the robustness of conventional PCA to outliers and automatically avoids calculating the $\ell_1$-norm-based optimal mean. However, there is no theoretical result to ensure that this objective is closely related to the robust reconstruction error.
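For the squared $\ell_2$-norm case underlying theorem 6, the pairwise reformulation rests on a standard identity: the sum of squared projected differences over all ordered pairs equals $2n$ times the projected variance around the mean, so the mean drops out of the objective. A quick numerical check with randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 50, 6, 2
X = rng.normal(size=(n, d))
W, _ = np.linalg.qr(rng.normal(size=(d, m)))   # arbitrary orthonormal projection

Y = X @ W                                      # projected data, one row per instance

# Sum of squared projected differences over all ordered pairs (i, j).
pairwise = sum(np.sum((Y[i] - Y[j]) ** 2) for i in range(n) for j in range(n))

# 2n times the projected variance around the mean.
centered = 2 * n * np.sum((Y - Y.mean(axis=0)) ** 2)

assert np.isclose(pairwise, centered)
```

Because the identity holds for every projection matrix, optimizing the pairwise objective never requires the mean to be computed or the data to be centered in advance.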

Following the same pairwise reformulation, we propose to measure the projected differences with the $\ell_{2,1}$-norm, which yields the proposed objective (problem 4.7): $\max_{W^\top W = I} \sum_{i,j} \|W^\top(\mathbf{x}_i - \mathbf{x}_j)\|_2$. To establish its connection to the reconstruction error, we introduce theorem 7. According to theorem 7, the proposed objective bounds an $\ell_{2,1}$-norm-based reconstruction objective up to two constants that are independent of the transformation matrix. Therefore, the proposed objective is theoretically connected to the minimization of the reconstruction error: maximizing the objective function also serves to minimize the reconstruction error and thus is suitable for robust PCA. Table 1 lists the pairs of PCA models that are closely related to each other in theory.

## 5 Optimization Procedure and Theoretical Analysis

In this section, we exploit a reweighted algorithm to solve the proposed optimization problem 4.7. The corresponding theoretical analyses of the convergence and computational complexity are provided to illustrate the efficiency of the proposed algorithm.

### 5.1. Optimization Procedure


In summary, we describe the reweighted optimization algorithm for problem 4.7 in algorithm 2, where the iteration stops when the difference in objective values between consecutive iterations (normalized by the current objective value) is smaller than a predefined tolerance.
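The letter's algorithm 2 is not reproduced here; the sketch below shows one standard reweighted scheme for this kind of pairwise $\ell_{2,1}$-norm maximization, in which every pair is reweighted by the inverse of its current projected distance (softening outlier pairs) and the projection matrix is updated by an orthogonal Procrustes step. The function name and implementation details are illustrative assumptions rather than the exact updates of algorithm 2:

```python
import numpy as np

def robust_pca_pairwise_l21(X, m, n_iter=50, tol=1e-6, eps=1e-12):
    """Sketch: maximize sum_{i,j} ||W^T (x_i - x_j)||_2 subject to W^T W = I."""
    n, d = X.shape
    # All pairwise differences d_ij as columns of D (d x n^2); for large n,
    # subsampling pairs keeps this tractable.
    D = (X[:, None, :] - X[None, :, :]).reshape(-1, d).T
    rng = np.random.default_rng(0)
    W, _ = np.linalg.qr(rng.normal(size=(d, m)))   # random orthonormal start
    prev = -np.inf
    for _ in range(n_iter):
        P = W.T @ D                                # m x n^2 projected differences
        norms = np.linalg.norm(P, axis=0)
        obj = norms.sum()
        if obj - prev <= tol * max(obj, 1.0):      # relative-change stopping rule
            break
        prev = obj
        # Reweight: each pair contributes with weight 1/||W^T d_ij||_2, which
        # softens the influence of pairs with large (outlier-driven) distances.
        U = P / np.maximum(norms, eps)             # unit projected differences
        M = D @ U.T                                # d x m
        # Orthogonal Procrustes step: the W maximizing tr(W^T M) with W^T W = I.
        A, _, Bt = np.linalg.svd(M, full_matrices=False)
        W = A @ Bt
    return W
```

Each update maximizes a linear minorant of the objective over the orthogonality constraint, so the objective value is nondecreasing across iterations, matching the monotonicity discussed in section 5.2.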

### 5.2. Convergence Analysis

In this section, we prove that the proposed reweighted algorithm 2 will monotonically increase the value of the proposed objective function 4.7 at each iteration and will converge to a local solution.

Theorem 9. The reweighted algorithm 2 monotonically increases the value of objective function 4.7 at each iteration.

Theorem 9 indicates that the proposed reweighted algorithm 2 converges.

### 5.3. Complexity Analysis

## 6 Extension to Tensor Version of Robust PCA

Traditional PCA methods are usually based on data points in vector format; thus, high-order tensor data must be vectorized into very high-dimensional vectors before applying PCA. This strategy destroys the spatial information of tensor data and makes the computation very expensive. To address this issue, tensor versions of PCA have been developed to handle the original tensor data directly. In this way, an image matrix does not need to be transformed into a vector prior to feature extraction (Yang et al., 2004). In this section, we take the two-dimensional (2D) tensor as an example and extend the proposed $\ell_{2,1}$-norm-based robust PCA to its tensor 2D version. Note that higher-order tensor cases can also be handled by replacing the linear operator with the corresponding tensor operator (De Lathauwer, 1997).

Based on theorem 6, we extend the proposed robust PCA with the $\ell_{2,1}$-norm to its tensor 2D version as optimization problem 6.2. Note that this objective function enhances the robustness of tensor 2D PCA with the $\ell_{2,1}$-norm-based error measurement; at the same time, it does not depend on centralized data and avoids calculating the optimal mean automatically. Similar to other tensor methods, problem 6.2 can be solved through an alternating optimization algorithm: with one of the two projection matrices fixed, the other can be obtained by solving a problem similar to optimization problem 4.7 with algorithm 2, and vice versa with the currently updated matrix.

## 7 Experiment

To illustrate the effectiveness and superiority of the proposed method, we conduct experimental evaluations on face image reconstruction with occlusions and outliers.

### 7.1. Competitors and Data Set Descriptions

We compare the proposed avoiding optimal mean $\ell_{2,1}$-norm maximization-based PCA (PCAAOML21) with other state-of-the-art PCA models, including traditional squared $\ell_2$-norm-based PCA, rotationally invariant $R_1$-norm PCA (PCAR1; Ding et al., 2006), nongreedy $\ell_{2,1}$-norm maximization-based PCA (PCAL21; Nie & Huang, 2016), and $\ell_1$-norm maximization-based PCA with the calculation of the optimal mean avoided automatically (PCAAOML1; Luo et al., 2016).

All of the experiments are conducted on six real-world benchmark data sets. Some samples from these data sets are shown in Figure 1. Note that the facial images share a common size across all data sets except the AR data set, whose images have a different size.

- Yale data set (Yale; Cai, He, Hu, Han, & Huang, 2007). It contains 165 gray-scale facial images of 15 individuals. Each subject has 11 images with different facial expressions or configurations: center light, with glasses, happy, left light, without glasses, normal, right light, sad, sleepy, surprised, and wink.
- Japanese Female Facial Expression database (JAFFE; Dailey et al., 2010). This data set consists of 213 images of seven facial expressions (happy, angry, disgust, fear, sad, surprise, and neutral) posed by 10 Japanese female models.
- ORL faces data set (AT&T database of faces; Samaria & Harter, 1994). This contains 400 face images of 40 distinct subjects. For each subject, 10 images are taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses).
- AR face database (Martinez, 1998). This face database contains over 4000 images, including frontal views of faces with different facial expressions, lighting conditions, and occlusions. Following Nie and Huang (2016), we select seven face images of 120 people (65 men and 55 women) in the experiment.
- UMIST face data set (Wechsler, Phillips, Bruce, Soulie, & Huang, 2012). This consists of 1012 face images of 20 individuals with mixed race/gender/appearance. Each individual is shown in a range of poses from profile to frontal views.
- Extended Multi-Modal Verification for Teleservices and Security applications database (XM2VTS; Messer, Matas, Kittler, Luettin, & Maitre, 1999). This database is collected for multimodal identification of human faces. It contains four face images of 295 people (1180 images) taken over a period of four months.

### 7.2. Reconstruction Error with Occlusions

Under different reduced dimensions, we compare the reconstruction error of the proposed PCAAOML21 with the competitors mentioned above with respect to varying percentages of occluded images (see Figures 2, 3, and 4). From these figures, we arrive at the following observations. First, with a small number of face images occluded, there is no significant difference in the reconstruction performance of traditional PCA and the other robust PCA methods (except for the JAFFE data set). However, as more and more face images are occluded in each data set, traditional PCA performs worse than the other robust PCA methods. This is because traditional PCA uses the squared $\ell_2$-norm-based distance measurement, which is sensitive to samples with large reconstruction error. Second, since there is no theoretical analysis showing a close connection between $\ell_1$-norm maximization-based PCA models and the minimization of the reconstruction error, PCAAOML1 performs worse than the other robust PCA models, and even worse than the traditional PCA model on some data sets, such as the XM2VTS data set with 20% and 30% occluded images. The same observation (Kwak, 2008; Nie et al., 2011) is also verified in Nie and Huang (2016) for robust PCAL1 with extensive experimental results. As a result, $\ell_1$-norm maximization-based robust PCA is not a good option in some cases. Finally, thanks to the theoretical connection to the reconstruction error, PCAAOML21 and PCAL21 achieve better performance than the other state-of-the-art robust PCA models over almost all of the data sets. Note that the PCAL21 model ignores the calculation of the optimal mean and incorrectly uses the average of the data as the optimal mean, whereas the proposed PCAAOML21 does not depend on the zero-mean assumption and avoids the optimal mean calculation automatically.

### 7.3. Reconstruction Error with Noise Images

To illustrate the reconstruction performance with noise images, in this experiment we add a varying number of instances from the Columbia University Image Library (COIL-20) data set (Nene, Nayar, & Murase, 1996) to the facial image data sets. The COIL-20 database contains 1440 gray-scale images of 20 objects. For each object, 72 images are taken at pose intervals of 5 degrees with respect to a fixed camera. We show some samples of COIL-20 in Figure 5.

For different numbers of reduced dimensions, we show the reconstruction performance of the different PCA models on the ORL and XM2VTS data sets in Figure 6, where 10%, 20%, and 30% of the images from the COIL-20 data set are randomly selected as the noise images. From Figure 6, we arrive at two observations. First, the traditional PCA and PCAAOML1 methods perform worse than the other robust PCA methods, especially on the ORL data set. Traditional squared $\ell_2$-norm-based PCA is more sensitive to the noise images from the COIL-20 data set and thus performs worse as the number of noise images increases. PCAAOML1 has a poor ability to reconstruct the data sets, which experimentally reconfirms that $\ell_1$-norm maximization PCA is not theoretically connected with the minimization of the reconstruction error; thus, it performs unsteadily with respect to different degrees of reduced dimensionality. A similar observation is also verified in Nie and Huang (2016) for robust PCAL1 with extensive experimental results. Due to the more robust measurements of reconstruction loss used in the PCA-HQ and PCA-GM algorithms, they outperform traditional PCA as well as $\ell_1$-norm-based robust PCA models. Second, the proposed PCAAOML21 performs comparably with, or even better than, the other rotationally invariant norm-based PCA methods, including the $R_1$-norm minimization-based PCAR1 for reconstruction and the $\ell_{2,1}$-norm maximization-based PCAL21. Note that all of these methods are theoretically connected with minimizing the reconstruction error. Although PCAR1 and PCAL21 achieve only slightly worse performance, they incorrectly use the $\ell_2$-norm-based optimal mean and presuppose that the data are centralized a priori. It is noteworthy that the proposed PCAAOML21 is not only theoretically connected to the minimization of the reconstruction error but also abandons the centralized-data assumption and avoids the $\ell_2$-norm-based optimal mean calculation automatically.

## 8 Conclusion

In this letter, we rewrite the objective of traditional PCA and formulate a novel robust PCA as the $\ell_{2,1}$-norm maximization of the differences between pairs of instances in the projected space. We prove that this formulation is theoretically connected to the minimization of the reconstruction error. At the same time, this strategy automatically avoids calculating the optimal mean and makes the centralized-data assumption on training data unnecessary. We exploit an efficient nongreedy algorithm to solve the proposed nonsmooth optimization problem, with theoretical analysis of its convergence and computational complexity. Extensive experimental results on several benchmark data sets illustrate the effectiveness and superiority of the proposed method for reconstruction.

## Acknowledgments

This work was funded by the National Science Foundation of China (Nos 61502377, 61532015, 61532004), the National Key Research and Development Program of China (No. 2016YFB1000903), the National Science Foundation (NSF) under grant No. IIS-1638429, and China Postdoctoral Science Foundation (No. 2015M582662).

## References
