## Abstract

Most existing multiview clustering methods require that the graph matrices of the different views be computed beforehand and that each graph be obtained independently. However, this requirement ignores the correlation between multiple views. In this letter, we tackle the problem of multiview clustering by jointly optimizing the graph matrix to make full use of the data correlation between views. Exploiting the inter-view correlation, we develop a concept factorization–based multiview clustering method for data integration, in which an adaptive scheme correlates the affinity weights of all views. This method differs from nonnegative matrix factorization–based clustering methods in that it is applicable to data sets containing negative values. Experiments demonstrate the effectiveness of the proposed method in comparison with state-of-the-art approaches in terms of accuracy, normalized mutual information, and purity.

## 1 Introduction

In data analysis, instances are often represented in heterogeneous views. For example, an image is represented by various feature extractors; a web page is described by the words on the page and the words in the hyperlinks that point to it; a user's information is fused and analyzed from different social networks (Jia et al., 2016); and a video includes dynamic images, sound, and subtitles (Yang et al., 2012; Yang, Zhang, & Xu, 2015; Yan et al., 2016). Multiview learning uses the correlations between views to obtain higher performance than using any single view's features (Blum & Mitchell, 1998; Bickel & Scheffer, 2004; Kakade & Foster, 2007; Zhan, Zhang, Guan, & Wang, 2017).

Multiview clustering starts with a series of works on cotraining methods. Cotraining methods train models separately on each view and iteratively learn for each model through the exploitation of disagreement between models (Blum & Mitchell, 1998); the reasons for the success of cotraining methods have been investigated by Balcan, Blum, and Yang (2004) and Wang and Zhou (2010). Spectral clustering is one of the most popular clustering approaches. Taking advantage of the well-defined mathematical framework of spectral clustering (Shi & Malik, 2000; Ng, Jordan, & Weiss, 2002; Zelnik-Manor & Perona, 2004; Von Luxburg, 2007; Yang, Xu, Nie, Yan, & Zhuang, 2010), many multiview clustering methods have been proposed (Blaschko & Lampert, 2008; Kumar & Daumé, 2011; Kumar, Rai, & Daume, 2011; Cai, Nie, Huang, & Kamangar, 2011; Xia, Pan, Du, & Yin, 2014; Li, Nie, Huang, & Huang, 2015). However, these spectral clustering methods have drawbacks: their performance depends heavily on the precomputed affinity graph matrix, they involve the time-consuming calculation of eigenvectors of high-dimensional matrices, and the eigenvectors obtained have no direct relationship to the semantic structure of the data sets. Nonnegative matrix factorization (NMF) methods have recently been applied to multiview clustering with impressive results (Liu, Wang, Gao, & Han, 2013; Zhang, Zhao, Zong, Liu, & Yu, 2014) because the results of NMF-based clustering approaches have better semantic interpretation (Xu, Liu, & Gong, 2003; Xu & Gong, 2004; Ding, He, & Simon, 2005) and these NMF-based methods can be implemented by simple multiplicative update rules. However, a limitation of these NMF-based methods is that they are not applicable to data sets containing negative values.

Concept factorization (CF), a variant of NMF, can be used to process arbitrary data sets even though they have negative values, and CF inherits the advantage of the multiplicative update rules of NMF. Using these two advantages of CF, we apply an adaptive CF-based method to multiview clustering in this letter. We use an adaptive graph term to capture the local intrinsic geometrical structure of the data space (Cai, He, & Han, 2011), and the similarity between the data points is measured based on the new representations. We take all the data points in each view into consideration to optimize elements of the graph matrix in a global view by assuming that there is a larger probability that data points with a small distance between them will be neighbors. Our algorithm uses novel update rules to effectively find a solution to a well-designed optimization problem. A convergence analysis is also provided. Extensive empirical results on nine data sets show that the proposed multiview clustering method achieves better clustering results than state-of-the-art approaches.
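To make the CF mechanics concrete, here is a minimal numerical sketch, not the letter's exact algorithm: the data enter only through the Gram matrix K = XᵀX, which is why the factors W and V can be kept nonnegative even when X itself contains negative entries, and the multiplicative updates follow the style of Xu and Gong (2004). The function names and the assumption that K is nonnegative (as holds for nonnegative X or a suitable kernel) are our own illustration.

```python
import numpy as np

def concept_factorization(X, k, n_iter=200, eps=1e-10, seed=0):
    """Sketch of concept factorization: X ≈ X W V^T.

    Only W and V are constrained to be nonnegative; X appears solely
    through the Gram matrix K = X^T X, so the data themselves need not
    be nonnegative.  For these classic multiplicative updates we assume
    K is nonnegative (e.g., nonnegative X or a kernelized K).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X                       # n x n Gram matrix
    W = rng.random((n, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        # multiplicative updates in the style of Xu & Gong (2004)
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        V *= (K @ W) / (V @ (W.T @ (K @ W)) + eps)
    return W, V

def cf_loss(X, W, V):
    """Reconstruction error ||X - X W V^T||_F^2."""
    return np.linalg.norm(X - X @ W @ V.T, "fro") ** 2
```

Because the same seed reproduces the initialization, one can verify that the updates decrease the reconstruction error while keeping W and V nonnegative.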

This letter makes the following contributions. First, the proposed method jointly optimizes the graph matrix to make full use of the data correlation between views for multiview clustering. The novelty lies in learning one affinity graph from multiview data to address the correlation between views and avoid the explicit construction of separate single-view graphs. Second, the proposed method can process arbitrary data sets even if they contain negative values, and the CF-based method has a better semantic interpretation (Xu et al., 2003; Xu & Gong, 2004; Ding et al., 2005) than spectral clustering-based methods. Finally, we propose a multiview clustering algorithm that combines concept factorization and locality-preserving methods in a unified optimization problem and solves this hard optimization problem with alternating optimization. The effectiveness of the algorithm is evaluated on nine data sets for the multiview clustering problem.

The remainder of the letter is organized as follows. In section 2, we propose an adaptive graph-regularized multiview concept factorization algorithm. We incorporate the correlation among multiple views to improve the performance of existing concept factorization clustering algorithms by jointly optimizing the graph matrix. In section 3, we propose a novel algorithm to optimize the objective function designed in section 2. In section 4, we present numerical experiments on nine data sets and compare our method with seven state-of-the-art methods. Section 5 concludes with some discussion.

## 2 Multiview Concept Factorization

As opposed to precomputing the affinity graphs, the graph in equation 2.7 is learned by globally modeling all the features from multiple views, making the multiview learning procedures mutually beneficial and reciprocal. In the following section, we describe a novel solution for obtaining a local optimum of the objective function in equation 2.7. In equation 2.7, the first term is CF, the second term is the manifold regularization, and the third term is a norm regularization on the view weights. The first term is used to learn the low-dimensional data representation, as in most NMF-based data clustering methods. The second term adds a manifold regularization so that the data structure of the original space is preserved in the low-dimensional manifold. The third term is used to avoid a trivial solution for the view weights.
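Since equation 2.7 is not rendered in this excerpt, a generic form consistent with the three-term description above can be sketched as follows; the symbols ($X_v$, $W_v$, $V_v$, $s_{ij}$, $\alpha_v$, $\lambda$, $\beta$) are our own notation, not necessarily the letter's:

```latex
\min_{\{W_v, V_v\},\, S,\, \alpha}\;
\underbrace{\sum_{v} \alpha_v \bigl\| X_v - X_v W_v V_v^{\top} \bigr\|_F^2}_{\text{concept factorization}}
\;+\;
\underbrace{\lambda \sum_{i,j} s_{ij} \bigl\| \mathbf{v}_i - \mathbf{v}_j \bigr\|_2^2}_{\text{manifold regularization}}
\;+\;
\underbrace{\beta \,\| \alpha \|_2^2}_{\text{view-weight regularization}}
```

subject to nonnegativity of the factors, row-stochastic constraints on the affinity matrix $S$, and a simplex constraint on the view weights $\alpha$.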

## 3 Optimization

### 3.1 Algorithm Derivation

We optimize equation 2.7 with the following three steps.

*Step 1*. First, holding the other variables fixed, we update the two factor matrices of each view independently. Then equation 2.7 becomes

The multiplicative update algorithm of equation 3.1 is based on the following theorem proposed by Sha, Lin, Saul, and Lee (2007).

To avoid the scale ambiguity of equation 3.1, we also adopt the normalization strategy of equations 2.2 and 2.3.

The algorithm for solving the problem, equation 2.7, is summarized in algorithm 1.
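Algorithm 1 itself is not reproduced in this excerpt; the following is a structural sketch of its alternating scheme only, under hypothetical simplifications of our own: the per-view factors are refreshed with CF-style multiplicative updates, the shared-graph update is omitted, and the view weights are reset by a simple inverse-error heuristic rather than the authors' actual rule.

```python
import numpy as np

def mvcf_sketch(Xs, k, n_outer=30, eps=1e-10, seed=0):
    """Structural sketch of the alternating optimization loop.

    Xs is a list of per-view data matrices sharing the same n samples.
    Each outer pass performs: (1) per-view factor updates, (2) the
    shared-graph refit (omitted in this sketch), (3) view reweighting.
    """
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[1]
    Ws = [rng.random((n, k)) for _ in Xs]
    Vs = [rng.random((n, k)) for _ in Xs]
    alpha = np.full(len(Xs), 1.0 / len(Xs))
    for _ in range(n_outer):
        # step 1: update the factor matrices of each view independently
        for X, W, V in zip(Xs, Ws, Vs):
            K = X.T @ X
            W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
            V *= (K @ W) / (V @ (W.T @ (K @ W)) + eps)
        # step 2 (omitted here): refit the shared affinity graph
        # step 3: reweight views by their current reconstruction fit
        errs = np.array([np.linalg.norm(X - X @ W @ V.T, "fro")
                         for X, W, V in zip(Xs, Ws, Vs)])
        alpha = 1.0 / (errs + eps)
        alpha /= alpha.sum()          # keep weights on the simplex
    return Ws, Vs, alpha
```

The point of the sketch is the control flow, not the update formulas: each step optimizes one block of variables while the others stay fixed, which is what makes the convergence argument of section 3.2 applicable.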

### 3.2 Convergence Analysis

If we set the parameter to a value larger than 1, the objective function in equation 2.7 is nonincreasing with respect to one variable while holding the others fixed.

To prove theorem 2, and because each subproblem is a convex optimization problem when the other variables are held fixed, we first prove theorem 3:

We use an auxiliary function, as in the expectation-maximization algorithm (Dempster, Laird, & Rubin, 1977; Lee & Seung, 2001), to prove the convergence of the update rules. The definition of the auxiliary function is given by definition 5.
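For reference, the standard auxiliary-function argument (stated here in Lee and Seung's notation, which may differ from the letter's) is: $G(v, v')$ is an auxiliary function for $F(v)$ if it upper-bounds $F$ and touches it on the diagonal, and minimizing $G$ then forces $F$ downward:

```latex
G(v, v') \ge F(v), \qquad G(v, v) = F(v), \qquad
v^{(t+1)} = \arg\min_{v} G\bigl(v, v^{(t)}\bigr)
\;\Longrightarrow\;
F\bigl(v^{(t+1)}\bigr) \le G\bigl(v^{(t+1)}, v^{(t)}\bigr)
\le G\bigl(v^{(t)}, v^{(t)}\bigr) = F\bigl(v^{(t)}\bigr)
```

so the objective is nonincreasing under the update.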


Thus, by iterating the update in equation 3.24, we obtain a sequence of estimates that converges to a local minimum of the objective function.

Note that minimizing the objective function in equation 3.1 yields our update rules in equations 3.9 and 3.12, given theorem 4 and proper auxiliary functions. As the two update rules are based on theorem 1, we need the proof of theorem 1:

By definition 5, we need to prove that the auxiliary function upper-bounds the objective. Comparing equation 3.26 with equation 3.25, this is equivalent to proving the following inequalities:

The minimization of equation 3.25 is performed by setting its derivative with respect to the variable to zero, leading to the update rule in equation 3.6.

As the auxiliary function is a quadratic form, we have proved theorem 3. Again, since both objective functions are convex optimization problems, theorem 2 is also proved.

### 3.3 Computational Complexity Analysis

The overall computational complexity of the proposed algorithm is , where is the number of data points. The complexity of the first step in the algorithm is , where is the number of iterations and is the number of clusters. The second step is , where is the number of iterations and is the number of views. The third step is . Since , and , the overall complexity is .

## 4 Experimental Results

### 4.1 Data Sets

3-Sources is constructed from three well-known online news sources: BBC, Reuters, and the *Guardian*. In total there are 948 news articles covering 416 distinct news stories from February 2009 to April 2009. Of these stories, 169 were reported in all three sources. Each story was manually annotated with one of six topical labels: business, entertainment, health, politics, sport, and technology.

WebKB contains four subsets (Cornell, Texas, Washington, and Wisconsin) of documents and is described by two views (content and citations). Cornell contains 195 documents over five labels (student, project, course, staff, and faculty). The documents are described by 1703 words in the content view and by the 569 links between them in the citations view. Texas, Washington, and Wisconsin have the same structure and contain 187, 230, and 265 documents, respectively.

Animals with Attributes (AwA) is an animal data set. We use four published features for 500 images belonging to five classes. The features are SIFT, Local Self-Similarity, PyramidHOG, and SURF.

Caltech101 is an object recognition data set. We select seven widely used classes: Faces, Motorbikes, Dolla-Bill, Garfield, Snoopy, Stop-Sign, and Windsor-Chair. We sample 441 data points from the data set in our experiment.

Handwritten Numerals (Numerals) consists of 2000 data points for the 10 digit classes 0 to 9. We use the four published visual features extracted from each image: Fourier coefficients of the character shapes, profile correlations, pixel averages in 2 × 3 windows, and Zernike moments.

Outdoor Scene (Scene) is an outdoor scene data set. This data set contains 2150 data points corresponding to 2150 color images, which belong to eight outdoor scene categories: coast, mountain, forest, open country, street, inside city, tall buildings, and highways.

### 4.2 Experimental Setup

We evaluate the performance of the proposed multiview concept factorization (MVCF) method on the nine data sets. MVCF is compared with state-of-the-art multiview clustering methods, including multimodal spectral clustering (MMSC) (Cai, Nie et al., 2011), cotrained spectral clustering (CTSC) (Kumar & Daumé, 2011), coregularized spectral clustering (CRSC) (Kumar et al., 2011), multiview NMF clustering (MultiNMF) (Liu et al., 2013), robust multiview k-means clustering (RMKMC) (Cai, Nie, & Huang, 2013), robust multiview spectral clustering (RMSC) (Xia et al., 2014), and large-scale multiview spectral clustering (MVSC) (Li et al., 2015), to demonstrate its effectiveness.

We compare MVCF with the following methods:

MMSC (Cai, Nie et al., 2011). In the MMSC algorithm, each type of feature is considered as one modal. The MMSC algorithm aims to learn a commonly shared graph Laplacian matrix by unifying different modals. In addition, a nonnegative relaxation is added in this method to improve the robustness and efficiency of clustering.

CTSC (Kumar & Daumé, 2011). This is a multiview spectral clustering approach using the idea of cotraining. Under the assumption that the true underlying clustering would assign a point to the same cluster regardless of the view, it learns the clustering in one view and then uses it to label the data in the other view so as to modify the graph structure (similarity matrix).

CRSC (Kumar et al., 2011). This applies a centroid-based coregularization scheme to multiview spectral clustering. To make the clusterings in different views agree with each other, CRSC enforces view-specific eigenvectors to look similar by regularizing them toward a common consensus and then optimizes individual clusterings as well as the consensus by using a joint cost function.

MultiNMF (Liu et al., 2013). This aims to search for a factorization that gives compatible clustering solutions across multiple views, requiring coefficient matrices learned from factorizations of different views to be regularized toward a common consensus.

RMKMC (Cai et al., 2013). This simultaneously performs clustering using each view of features and unifies their results based on their importance to the clustering task. The ℓ2,1-norm is also employed to improve robustness.

RMSC (Xia et al., 2014). For each view, this constructs a corresponding transition probability matrix, which is then used for recovering a low-rank transition probability matrix. Based on this, the standard Markov chain method is utilized for processing, and then clustering is conducted.

MVSC (Li et al., 2015). This large-scale multiview spectral clustering approach is based on the bipartite graph. MVSC uses local manifold fusion to integrate heterogeneous features and approximates the similarity graphs using bipartite graphs to improve efficiency. Furthermore, this method can be easily extended to handle the out-of-sample problem.

The parameters of the seven baseline algorithms are tuned to obtain the best results, as suggested by the respective authors. Our method has two parameters. The first is empirically fixed at 10 for all experiments and data sets. The second controls the weight distribution of the different views, and we obtain its best value by searching the range [4.8, 2.6] with interval 0.2. We obtain the optimal data representation by adding together the product of the data representation matrix and its weight in each view (Wang et al., 2017). Because each learned representation captures diverse information about the intrinsic data structure, we can integrate them with the weighted sum rule. Following Li et al. (2015), we obtain the clustering labels by running k-means on the optimal data representation. Without loss of generality, we run each method 10 times and report the mean performance as well as the standard deviation. In each experiment, we run k-means 30 times and keep the best result to reduce the randomness of k-means.
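The fuse-then-cluster protocol above can be sketched as follows; `integrate_views` and `kmeans` are our own illustrative names, and the k-means here is a plain numpy Lloyd's algorithm restarted several times with the lowest-inertia result kept, mirroring the best-of-30 protocol.

```python
import numpy as np

def integrate_views(Vs, weights):
    """Weighted-sum fusion of per-view data representations."""
    return sum(w * V for w, V in zip(weights, Vs))

def kmeans(Z, k, n_init=30, n_iter=100, seed=0):
    """Lloyd's k-means with n_init restarts; lowest inertia wins."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = Z[rng.choice(len(Z), k, replace=False)]
        for _ in range(n_iter):
            d = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
            labels = d.argmin(1)
            new = np.array([Z[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = ((Z - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    return best_labels
```

Restarting and keeping the minimum-inertia solution is what reduces the sensitivity of the reported numbers to k-means initialization.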

### 4.3 Evaluation Metric

Three metrics are used to evaluate performance in this work: clustering accuracy (ACC), normalized mutual information (NMI) (Strehl & Ghosh, 2002), and purity (Ievgen & Younes, 2014). These measurements are widely used, and they can be calculated by comparing the obtained label of each sample with the ground-truth label provided by the data set. For each metric, a larger value indicates better clustering performance.
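Concretely: ACC maps clusters to classes with the best one-to-one assignment, purity lets each cluster vote for its majority class, and NMI normalizes the mutual information between the two labelings. A numpy-only sketch (an exhaustive assignment search stands in for the Hungarian algorithm, so it only suits small cluster counts):

```python
from itertools import permutations
import numpy as np

def contingency(y_true, y_pred):
    """Count matrix C[i, j]: samples of class i placed in cluster j."""
    _, ti = np.unique(y_true, return_inverse=True)
    _, pi = np.unique(y_pred, return_inverse=True)
    C = np.zeros((ti.max() + 1, pi.max() + 1), dtype=int)
    np.add.at(C, (ti, pi), 1)
    return C

def acc(y_true, y_pred):
    """Clustering accuracy under the best one-to-one cluster/class map.
    Exhaustive search; use the Hungarian algorithm for many clusters."""
    C = contingency(y_true, y_pred)
    k = max(C.shape)
    P = np.zeros((k, k), dtype=int)
    P[:C.shape[0], :C.shape[1]] = C
    best = max(sum(P[perm[j], j] for j in range(k))
               for perm in permutations(range(k)))
    return best / C.sum()

def purity(y_true, y_pred):
    """Each cluster contributes the size of its majority class."""
    C = contingency(y_true, y_pred)
    return C.max(axis=0).sum() / C.sum()

def nmi(y_true, y_pred):
    """NMI with the sqrt(H_true * H_pred) normalization."""
    C = contingency(y_true, y_pred).astype(float)
    n = C.sum()
    pt, pp, pij = C.sum(1) / n, C.sum(0) / n, C / n
    nz = pij > 0
    mi = (pij[nz] * np.log(pij[nz] / (pt[:, None] * pp[None, :])[nz])).sum()
    ht = -(pt[pt > 0] * np.log(pt[pt > 0])).sum()
    hp = -(pp[pp > 0] * np.log(pp[pp > 0])).sum()
    return mi / np.sqrt(ht * hp) if ht > 0 and hp > 0 else 0.0
```

All three metrics equal 1 for a clustering that matches the ground truth up to a relabeling of clusters, which is exactly the invariance the evaluation requires.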

### 4.4 Performance Comparison

First, we apply the proposed concept factorization framework to each single view of all data sets to obtain the clustering results and then compare these results with MVCF's results, which are shown in Figures 1, 2, and 3. From the three bar graphs, it is obvious that MVCF outperforms every single view's result, which means that the multiview framework can learn and integrate the useful information from complementary views, consequently obtaining better clustering results.

We compare the proposed method with the baseline algorithms and report the clustering results in terms of ACC, NMI, and purity in Tables 1, 2, and 3, respectively. In each row of the tables, the best and second-best results are highlighted in bold. Note that AwA, Caltech101, Numerals, and Scene contain negative data, so the MultiNMF method is not applicable to these data sets.

Table 1: Clustering accuracy (ACC, %, mean ± standard deviation). Best and second-best results per row are in bold.

| Data Set | MMSC | CTSC | CRSC | RMKMC | RMSC | MVSC | MultiNMF | MVCF |
|---|---|---|---|---|---|---|---|---|
| 3-Sources | 52.37 ± 3.02 | 59.97 ± 1.29 | **62.72 ± 0.93** | 45.92 ± 3.57 | 52.78 ± 1.30 | 62.54 ± 3.47 | 54.20 ± 0.75 | **80.24 ± 1.58** |
| Cornell | 46.97 ± 0.26 | 43.71 ± 0.73 | **57.03 ± 1.15** | 48.62 ± 6.42 | 38.62 ± 0.64 | 50.62 ± 3.49 | 42.87 ± 0.50 | **61.74 ± 4.97** |
| Texas | **64.01 ± 2.05** | 56.26 ± 1.10 | 61.23 ± 1.64 | 58.45 ± 2.28 | 42.46 ± 1.19 | 59.63 ± 5.20 | 63.90 ± 1.08 | **68.24 ± 4.34** |
| Washington | 59.00 ± 1.25 | 59.24 ± 0.30 | 59.61 ± 0.25 | 60.61 ± 2.41 | 42.09 ± 1.77 | 59.78 ± 3.18 | **63.09 ± 0.86** | **71.26 ± 1.17** |
| Wisconsin | 51.36 ± 0.38 | 49.43 ± 0.60 | 58.87 ± 0.69 | 57.55 ± 3.12 | 36.11 ± 1.56 | **59.28 ± 2.96** | 45.85 ± 3.47 | **73.47 ± 2.28** |
| AwA | 28.12 ± 0.44 | 28.23 ± 0.37 | 26.92 ± 0.58 | 28.80 ± 0.72 | 28.52 ± 0.76 | **29.72 ± 0.49** | — | **30.20 ± 0.71** |
| Caltech101 | 67.82 ± 0.41 | 71.24 ± 1.29 | **73.83 ± 1.55** | 69.21 ± 2.13 | 71.00 ± 0.80 | 71.07 ± 1.75 | — | **73.11 ± 2.02** |
| Numerals | 77.31 ± 1.60 | 79.08 ± 1.02 | 86.35 ± 3.04 | 70.81 ± 2.68 | 83.52 ± 2.10 | **88.41 ± 1.11** | — | **88.30 ± 1.48** |
| Scene | 43.46 ± 1.83 | 63.80 ± 1.07 | **65.44 ± 1.00** | 55.99 ± 2.65 | 61.71 ± 0.12 | 64.28 ± 1.20 | — | **68.56 ± 3.93** |
| Average | 54.49 | 56.77 | **61.33** | 55.11 | 50.76 | 60.59 | — | **68.35** |


Table 2: Normalized mutual information (NMI, %, mean ± standard deviation). Best and second-best results per row are in bold.

| Data Set | MMSC | CTSC | CRSC | RMKMC | RMSC | MVSC | MultiNMF | MVCF |
|---|---|---|---|---|---|---|---|---|
| 3-Sources | 45.28 ± 5.40 | **54.55 ± 0.59** | 54.09 ± 1.17 | 33.57 ± 8.16 | 43.23 ± 1.28 | 47.31 ± 6.30 | 48.23 ± 0.87 | **70.84 ± 2.04** |
| Cornell | 17.26 ± 1.71 | 22.67 ± 0.48 | **32.54 ± 0.79** | 23.92 ± 9.42 | 14.93 ± 0.36 | 27.48 ± 4.12 | 13.21 ± 0.96 | **31.18 ± 5.89** |
| Texas | **31.95 ± 5.06** | 25.39 ± 0.38 | 28.89 ± 2.02 | 26.08 ± 5.32 | 18.36 ± 0.34 | 26.48 ± 6.03 | 25.37 ± 1.10 | **38.42 ± 4.54** |
| Washington | 22.49 ± 2.20 | **30.89 ± 0.53** | 30.67 ± 0.49 | 29.79 ± 5.09 | 18.79 ± 1.00 | 27.52 ± 6.04 | 24.49 ± 1.25 | **41.99 ± 1.57** |
| Wisconsin | 15.61 ± 1.17 | 23.02 ± 0.95 | **41.81 ± 0.00** | 35.32 ± 6.64 | 14.40 ± 0.33 | 35.45 ± 3.93 | 12.64 ± 3.76 | **46.26 ± 3.40** |
| AwA | 4.65 ± 0.46 | 4.33 ± 0.34 | 3.16 ± 0.28 | 4.37 ± 0.92 | 4.55 ± 0.26 | **5.16 ± 0.63** | — | **6.20 ± 0.57** |
| Caltech101 | 57.27 ± 0.60 | 69.99 ± 0.49 | **70.84 ± 1.78** | 61.82 ± 1.92 | 68.09 ± 1.89 | 64.97 ± 3.89 | — | **74.26 ± 2.97** |
| Numerals | 71.30 ± 0.30 | 76.42 ± 0.31 | 77.65 ± 1.58 | 67.21 ± 3.00 | 75.94 ± 1.07 | **80.52 ± 1.26** | — | **80.53 ± 1.54** |
| Scene | 35.50 ± 2.95 | 50.22 ± 0.46 | 50.11 ± 0.72 | 47.81 ± 2.92 | 47.78 ± 0.27 | **50.45 ± 0.81** | — | **53.37 ± 2.13** |
| Average | 33.48 | 39.72 | **43.31** | 36.65 | 34.01 | 40.59 | — | **49.23** |


Table 3: Purity (%, mean ± standard deviation). Best and second-best results per row are in bold.

| Data Set | MMSC | CTSC | CRSC | RMKMC | RMSC | MVSC | MultiNMF | MVCF |
|---|---|---|---|---|---|---|---|---|
| 3-Sources | 62.25 ± 4.90 | **75.15 ± 3.46** | 70.41 ± 2.33 | 58.22 ± 4.59 | 66.33 ± 0.98 | 68.93 ± 4.87 | 63.91 ± 1.12 | **83.79 ± 1.58** |
| Cornell | 51.54 ± 1.46 | 54.87 ± 0.48 | **62.05 ± 1.67** | 57.13 ± 7.28 | 50.26 ± 1.51 | 59.44 ± 3.56 | 46.51 ± 0.84 | **62.26 ± 4.94** |
| Texas | **69.84 ± 3.23** | 67.01 ± 1.84 | 65.61 ± 2.28 | 65.78 ± 3.86 | 59.47 ± 2.32 | 66.26 ± 3.86 | 64.87 ± 1.43 | **71.34 ± 2.63** |
| Washington | 63.65 ± 1.45 | 66.87 ± 0.67 | **68.04 ± 1.33** | 67.61 ± 2.37 | 62.74 ± 1.79 | 67.43 ± 2.09 | 65.17 ± 0.56 | **72.39 ± 0.83** |
| Wisconsin | 54.15 ± 0.89 | 60.42 ± 1.95 | **74.08 ± 1.82** | 69.32 ± 4.69 | 56.26 ± 1.30 | 70.83 ± 2.47 | 52.91 ± 2.11 | **74.42 ± 2.51** |
| AwA | 28.56 ± 0.60 | 27.52 ± 0.14 | 27.18 ± 0.58 | 29.16 ± 0.83 | 29.14 ± 0.33 | **29.98 ± 0.69** | — | **30.30 ± 0.74** |
| Caltech101 | 69.23 ± 0.40 | **80.63 ± 3.04** | 79.46 ± 2.95 | 71.32 ± 3.02 | 73.38 ± 2.62 | 76.87 ± 2.96 | — | **80.32 ± 2.39** |
| Numerals | 77.56 ± 1.04 | 72.62 ± 0.16 | 86.37 ± 2.98 | 72.65 ± 2.61 | 83.52 ± 2.10 | **88.41 ± 1.11** | — | **88.30 ± 1.48** |
| Scene | 43.62 ± 1.73 | 43.47 ± 2.03 | **65.44 ± 1.00** | 56.35 ± 3.24 | 61.71 ± 0.12 | 64.39 ± 1.10 | — | **68.94 ± 3.10** |
| Average | 57.82 | 60.95 | **66.52** | 60.84 | 60.31 | 65.84 | — | **70.23** |


Clearly, MVCF achieves the best performance in most cases and remains competitive in the remaining ones. Compared with the second-best performance on the 3-Sources data set, the proposed MVCF method improves the clustering performance by more than 10%. In addition, we calculate the mean performance of the different methods on all data sets, shown in the last row of each table. Interestingly, CRSC emerges as the second-best method, while MVCF performs best overall. These quantitative results demonstrate the superiority of the proposed method, which we attribute to MVCF better capturing the geometrical structure of the data space.

## 5 Conclusion and Future Work

In this letter, we have proposed a multiview clustering model that addresses two issues of most existing models: the inability to handle negative data under nonnegativity constraints and the neglect of inter-view correlation. The first issue is tackled by adopting concept factorization, and the second is addressed by learning a single affinity graph from the multiple views. We have proposed a novel CF-based algorithm that not only inherits the strengths of NMF, such as fast multiplicative iteration and parts-based representation in accordance with human intuition (Lee & Seung, 1999), but also is applicable to data sets containing negative values. We have taken the strong influence of the local manifold geometric structure into consideration and extended the proposed algorithm to a multiview clustering setting to effectively use the complementary information of the data. The experiments demonstrate the superiority of our algorithm over other state-of-the-art methods.

MVCF exploits the data structure by using manifold regularization without requiring eigenvalue decomposition, which makes MVCF efficient. However, the time complexity is still high. In the future, we will use the active Riemannian subspace search for maximum margin matrix factorization (Yan et al., 2015) to reduce the complexity and obtain high accuracy on large-scale data sets.

## Acknowledgments

This work has been supported by the National Natural Science Foundation of China under grant 61201422, the Specialized Research Fund for the Doctoral Program of Higher Education under grant 20120211120013, and the Fundamental Research Funds for the Central Universities under grants lzujbky-2017-it73 and lzujbky-2017-it76.