Abstract

Graph-based clustering methods perform clustering on a fixed input data graph. Such clustering results are therefore sensitive to the particular graph construction: if the initial construction is of low quality, the resulting clustering may also be of low quality. We address this drawback by allowing the data graph itself to be adaptively adjusted in the clustering procedure. In particular, our proposed weight-adaptive Laplacian (WAL) method learns a new data similarity matrix that can adaptively adjust the initial graph according to the similarity weights in the input data graph. We develop three versions of this method, based on the L2-norm, a fuzzy entropy regularizer, and another exponential-based weight strategy, which yield three new graph-based clustering objectives. We derive optimization algorithms to solve these objectives. Experimental results on synthetic data sets and real-world benchmark data sets exhibit the effectiveness of these new graph-based clustering methods.

1  Introduction

Clustering is an important task in computer vision and machine learning, with many applications, such as image segmentation (Shi & Malik, 2000), image categorization (Grauman & Darrell, 2006; Chang, Yang, Long, Zhang, & Hauptmann, 2016), scene analysis (Koppal & Narasimhan, 2006; Chang, Nie, Yang, & Huang, 2014), document clustering (Steinbach, Karypis, & Kumar, 2000; Xu, Liu, & Gong, 2003), motion modeling (Ochs & Brox, 2012), and medical image analysis (Brun, Knutsson, Park, Shenton, & Westin, 2004). Many clustering methods have been proposed (Cai, Nie, & Huang, 2013; Chang, Nie, Wang et al., 2015; Hagen & Kahng, 1992; Huang, Nie, & Huang, 2013; Li & Ding, 2006; Ng, Jordan, & Weiss, 2002; Nie, Zeng, Tsang, Xu, & Zhang, 2011; Huang, Nie, & Huang, 2015). Among these methods, graph-based clustering is a popular choice: it is easy to implement efficiently and often outperforms traditional clustering methods such as K-means. Graph-based clustering methods model the data as a weighted, undirected graph based on pairwise similarities (Nie, Wang, Deng et al., 2016). The goal of clustering is to separate data vectors into different clusters according to their similarities. For the similarity graph constructed from the data, we want to find a partition of the graph such that the edges between different clusters have low weights and the edges within the same cluster have high weights. That is, we want a partition in which data vectors within the same cluster are similar to each other and vectors in different clusters are dissimilar from each other.

State-of-the-art clustering methods are often based on a graphical model of the relationships among data points. For instance, nonnegative matrix factorization (Lee & Seung, 2001; Hoyer, 2004; Li & Ding, 2006), spectral clustering (Ng et al., 2002; Von Luxburg, 2007), normalized cut (Shi & Malik, 2000; Dhillon, Guan, & Kulis, 2004), and ratio cut (Hagen & Kahng, 1992; Chan, Schlag, & Zien, 1994) all transform the data into a weighted, undirected graph based on pairwise similarities. Clustering is then accomplished by spectral or graph-theoretic optimization procedures. All of these graph-based methods share two stages: a data graph is formed from the input data, and optimization procedures are then invoked on this fixed input data graph. Because the two stages are independent, the clustering result depends on the quality of the input affinity matrix, which makes it sensitive to the particular graph construction method. If the initially constructed graph is of low quality, the resulting clustering may also be of low quality.

In order to address these drawbacks, we aim to learn another data similarity matrix that can be adaptively adjusted in the spectral clustering procedure. In this letter, we propose a novel adaptive optimization process for the graph-based clustering model that learns a graph by a weight-adaptive Laplacian algorithm. In the new model, instead of fixing the input data graph associated with the affinity matrix, we learn a new data similarity matrix that can adaptively adjust the optimization procedure, and we use this new learned data similarity matrix to guide the optimization process for the spectral clustering task. Based on the weight-adaptive Laplacian algorithm, we also propose three weight strategies, based on the L2-norm (Nie, Wang, Jordan, & Huang, 2016), a fuzzy entropy regularizer (Li, Ng, Cheung, & Huang, 2008), and another exponential-based weight strategy (Cai, Nie, Cai, & Huang, 2013), which yield three new graph-based clustering objectives. Finally, we derive optimization algorithms for the three proposed graph-based objective functions. We conduct empirical studies on simulated data sets and seven real-world benchmark data sets to validate the effectiveness of our proposed methods.

In the rest of the letter, we first introduce our three proposed weight-adaptive Laplacian algorithms for graph-based clustering. Next, we derive the optimization method for each of the three objective functions. After that, we conduct experiments on both synthetic data sets and seven real-world benchmark data sets to illustrate the effectiveness of the proposed clustering methods; we also provide some analysis of the experiments. We conclude with additional observations and future work.

Throughout the letter, all matrices are written in capital letters. For a matrix $M$, the $i$th row and the $(i,j)$th element of $M$ are denoted by $m_i$ and $m_{ij}$, respectively. The L2-norm of a vector $v$ is denoted by $\|v\|_2$, and $\mathrm{Tr}(M)$ denotes the trace of the matrix $M$. An identity matrix is denoted by $I$, and $\mathbf{1}$ denotes a column vector with all elements equal to one. For a vector $v$ and a matrix $M$, $v \geq 0$ and $M \geq 0$ mean that all the elements of $v$ and $M$ are equal to or larger than zero.

2  Weight-Adaptive Laplacian Algorithm for Clustering

Exploring the similarity matrix among data points is an important and basic strategy for the clustering task. Given a data set $X = \{x_1, x_2, \ldots, x_n\}$, we denote by $a_{ij}$ the probability that data points $x_i$ and $x_j$ belong to the same class, that is, their similarity. Thus, we obtain the matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ as a similarity matrix of the graph with $n$ data points. Suppose each node $i$ is assigned a function value $f_i \in \mathbb{R}^{1 \times c}$, where $f_i$ indicates which class node $i$ belongs to and $c$ is the number of classes. Then the clustering problem can be cast as the following problem, as illustrated in Von Luxburg (2007) and Nie, Wang, and Huang (2014). The basic idea in spectral analysis is as follows,
$$\min_{F^\top F = I}\ \sum_{i,j=1}^{n} a_{ij}\,\|f_i - f_j\|_2^2 \;=\; \min_{F^\top F = I}\ 2\,\mathrm{Tr}(F^\top L_A F), \qquad (2.1)$$
where $F \in \mathbb{R}^{n \times c}$, with its $i$th row formed by $f_i$, is the parameter to be optimized in this objective. $L_A = D_A - A$ is called the Laplacian matrix in graph theory; the degree matrix $D_A$ is defined as a diagonal matrix whose $i$th diagonal element is $\sum_j a_{ij}$. In this equation, the similarity matrix $A$, with each element denoted as $a_{ij}$, is first initialized by the graph construction method described in section 4.1. The objective, equation 2.1, can be optimized by iterative methods described in Niyogi (2004).
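For concreteness, the following sketch (our own illustration, not the authors' code) solves this relaxed problem: $F$ is formed by the eigenvectors of $L_A$ associated with its $c$ smallest eigenvalues, and discrete labels are then obtained by running k-means on the rows of $F$.

```python
# Illustrative sketch of the spectral relaxation of equation 2.1 (not the authors' code).
import numpy as np
from sklearn.cluster import KMeans

def spectral_embedding(A, c):
    """Minimize Tr(F^T L_A F) s.t. F^T F = I: take the c smallest eigenvectors of L_A."""
    A = (A + A.T) / 2.0                 # symmetrize the affinity matrix
    L = np.diag(A.sum(axis=1)) - A      # unnormalized Laplacian L_A = D_A - A
    _, eigvecs = np.linalg.eigh(L)      # eigenvectors sorted by ascending eigenvalue
    return eigvecs[:, :c]

def spectral_clustering(A, c, seed=0):
    """Cluster the n graph nodes into c groups by k-means on the spectral embedding."""
    F = spectral_embedding(A, c)
    return KMeans(n_clusters=c, n_init=10, random_state=seed).fit_predict(F)
```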
Based on various graph constructions, we would like to use the constructed affinity matrix $A$ to restrain misconnected graph weights while strengthening correctly connected graph weights in order to guide the optimization process for equation 2.1. Graph-based clustering approaches typically optimize their objectives based on a given data graph associated with an affinity matrix $A \in \mathbb{R}^{n \times n}$, which can be symmetric or nonsymmetric, where $n$ is the number of nodes (data points) in the graph. The performance of these graph-based clustering methods is sensitive to the quality of the constructed affinity matrix $A$. To address this challenge, we aim to learn a new data graph $S$ based on the data affinity matrix $A$ in order to make the new data graph more suitable for the clustering task. We propose to use the weight-adaptive Laplacian algorithm for graph-based clustering, which can adaptively adjust the clustering process. Given an affinity matrix $A$, we learn a new similarity matrix $S$. To avoid the case that some rows of $S$ are all zeros, we further constrain the sum of each row of $S$ to be one. Under this constraint, we learn a new weight-constrained graph that is more suitable for clustering, and we also consider three weight strategies for the proposed weight-adaptive Laplacian algorithm. The three regularization terms aim to make the graph smooth. We define weight-adaptive graph-based clustering as the solution to the following problems:
$$\min_{S,\,F}\ \sum_{i,j=1}^{n} s_{ij}\,a_{ij}\,\|f_i - f_j\|_2^2 + \gamma \sum_{i,j=1}^{n} s_{ij}^2 \quad \text{s.t.}\ \forall i,\ \sum_{j} s_{ij} = 1,\ s_{ij} \geq 0,\ F^\top F = I, \qquad (2.2)$$
$$\min_{S,\,F}\ \sum_{i,j=1}^{n} s_{ij}\,a_{ij}\,\|f_i - f_j\|_2^2 + \gamma \sum_{i,j=1}^{n} s_{ij}\ln s_{ij} \quad \text{s.t.}\ \forall i,\ \sum_{j} s_{ij} = 1,\ s_{ij} \geq 0,\ F^\top F = I, \qquad (2.3)$$
$$\min_{S,\,F}\ \sum_{i,j=1}^{n} s_{ij}^{\,r}\,a_{ij}\,\|f_i - f_j\|_2^2 \quad \text{s.t.}\ \forall i,\ \sum_{j} s_{ij} = 1,\ s_{ij} \geq 0,\ F^\top F = I, \qquad (2.4)$$
where $\gamma > 0$ is a regularization parameter and $r > 1$ is a scalar that controls the distribution of the weights.

As defined in equations 2.2 to 2.4, we denote the objectives as WAL_L2, WAL_Ln, and WAL_R, respectively. The proposed weight-constrained algorithm adaptively optimizes the spectral clustering objective. It can be interpreted as follows. Taking equation 2.2 as an example, the first term in it, $\sum_{i,j} s_{ij}\,a_{ij}\,\|f_i - f_j\|_2^2$, is the weight-adaptive graph term, and the second term, $\gamma \sum_{i,j} s_{ij}^2$, is the regularization term, which aims to make the graph smooth. We mainly focus on the first, weight-adaptive graph term. Denote by $a_{ij}$ the affinity value between the $i$th and $j$th data nodes, which is fixed in the optimization process and can be constructed by other methods from the data points, and by $f_i$ the value assigned to the indicating function of the $i$th node. Denote $v_{ij} = a_{ij}\|f_i - f_j\|_2^2$; for each $i$, the learned weights $s_{ij}$ form a probability distribution over $j$. Thus, we can see that when $v_{ij}$ becomes larger, $s_{ij}$ will become smaller, which means that $s_{ij}$ becomes small only when both $a_{ij}$ and $\|f_i - f_j\|_2^2$ become large. Because $a_{ij}$ is an element of the fixed input affinity matrix and $f_i$ is the learned value of the indicating function, a smaller $\|f_i - f_j\|_2^2$ indicates that nodes $i$ and $j$ are approximately in the same class, and a larger $a_{ij}$ means a closer distance between nodes $i$ and $j$ and thus a higher probability that nodes $i$ and $j$ belong to the same class. Based on this analysis, the proposed weight-adaptive Laplacian algorithms for graph-based clustering have the following two properties.

First, when $\|f_i - f_j\|_2^2$ is small in equation 2.2, that is, the optimized $f_i$ and $f_j$ assign nodes $i$ and $j$ to the same class, there exist the following two cases for $a_{ij}$:

  • Based on the weight-adaptive graph term in equation 2.2, when $a_{ij}$ is very large, the affinity $a_{ij}$ and the optimized $f_i$ and $f_j$ consistently assign nodes $i$ and $j$ to the same class, although their values are inconsistent (one is small and the other is large). Thus, they have no adverse effect on the value of $s_{ij}$ in the optimization process.

  • When $a_{ij}$ is also small, the optimized $f_i$ and $f_j$ are not consistent with $a_{ij}$ in assigning nodes $i$ and $j$ to the same class. Since both $a_{ij}$ and $\|f_i - f_j\|_2^2$ are very small, $v_{ij}$ becomes much smaller, which leads to a larger $s_{ij}$ in the optimization process. Weighting the small $a_{ij}$ by this larger $s_{ij}$ then strengthens the correctly connected weight within the same class in the optimization process.

Second, when $\|f_i - f_j\|_2^2$ is very large, which means examples $i$ and $j$ belong to different classes based on the optimized $f_i$ and $f_j$, there also exist the following two cases for $a_{ij}$, analogous to the previous explanation:

  • When $a_{ij}$ is very small, the affinity $a_{ij}$ and the optimized $f_i$ and $f_j$ consistently assign nodes $i$ and $j$ to different classes, although their values are not consistent (one is small and the other is large). Thus, they have no adverse effect on the value of $s_{ij}$.

  • When $a_{ij}$ is also large, the optimized $f_i$ and $f_j$ are not consistent with $a_{ij}$, which assigns nodes $i$ and $j$ to the same class. Since both $a_{ij}$ and $\|f_i - f_j\|_2^2$ are very large, $v_{ij}$ becomes much larger, which leads to a smaller $s_{ij}$ in the optimization process. Weighting the large $a_{ij}$ by this smaller $s_{ij}$ thus restrains the misconnected weight between different classes in the optimization process.

Based on the above analysis of the weight-adaptive Laplacian algorithm for graph-based clustering according to equation 2.2, we can clearly see that the proposed weight strategy can adaptively adjust the optimization of the clustering process. The same properties can be obtained for the other two weight-adaptive clustering methods, illustrated in equations 2.3 and 2.4.
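As a small numerical illustration of this analysis (our own example; the affinities, distances, and parameter value below are made up), consider one row $i$ of the graph: edges with a small product $v_{ij} = a_{ij}\|f_i - f_j\|_2^2$ receive larger learned weights $s_{ij}$, while edges with a large product are suppressed. Here the weights are computed with an exponential weighting of the kind used by the WAL_Ln update in section 3.2.

```python
# Toy illustration (our own example) of the weight-adaptive behavior described above.
import numpy as np

a_i    = np.array([0.90, 0.80, 0.10, 0.05])   # fixed affinities a_ij for one node i (assumed)
dist_i = np.array([0.10, 2.00, 0.10, 2.00])   # ||f_i - f_j||^2 under the current embedding F
v_i    = a_i * dist_i                          # per-edge contribution to the weight-adaptive term

gamma = 0.5                                    # illustrative regularization value
s_i = np.exp(-v_i / gamma)                     # exponential weighting (WAL_Ln-style rule)
s_i /= s_i.sum()                               # normalize so the row of S sums to one

print(np.round(v_i, 2))   # [0.09 1.6  0.01 0.1 ]
print(np.round(s_i, 2))   # the largest weights go to the edges with the smallest v_ij
```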

3  Optimization Algorithm to Solve the Weight-Adaptive Graph-Based Clustering Problem

According to the three proposed weight-adaptive Laplacian algorithms for graph-based clustering illustrated in equations 2.2 to 2.4, the three objectives can be optimized with an alternating optimization method. We introduce the optimization method for each of the three objectives separately.

3.1  Optimization Algorithm for Solving Problem WAL_L2 in Equation 2.2

Based on the objective function illustrated in equation 2.2, when $S$ is fixed, equation 2.2 becomes
$$\min_{F^\top F = I}\ \sum_{i,j=1}^{n} s_{ij}\,a_{ij}\,\|f_i - f_j\|_2^2. \qquad (3.1)$$
According to He's theorem (Niyogi, 2004), denoting $w_{ij} = s_{ij} a_{ij}$, the minimization problem in equation 3.1 reduces to
$$\min_{F^\top F = I}\ \mathrm{Tr}(F^\top L_W F), \qquad (3.2)$$
where $D_W$ is a diagonal matrix whose entries are the column sums of $(W + W^\top)/2$, and $L_W = D_W - (W + W^\top)/2$ is the Laplacian matrix. The optimal solution $F$ is formed by the eigenvectors of $L_W$ corresponding to the $c$ smallest eigenvalues, where $c$ is the cluster number.
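A minimal sketch of this F-step (our own code, not the authors' implementation; the elementwise product $W = S \circ A$ follows the definition of $w_{ij}$ above):

```python
# Sketch of the F-step: eigenvectors of the weighted Laplacian L_W (illustrative only).
import numpy as np

def update_F(S, A, c):
    """Solve min_{F^T F = I} Tr(F^T L_W F) with w_ij = s_ij * a_ij."""
    W = S * A                            # elementwise product, w_ij = s_ij * a_ij
    W = (W + W.T) / 2.0                  # symmetrize, since W need not be symmetric
    L_W = np.diag(W.sum(axis=1)) - W     # L_W = D_W - (W + W^T)/2
    _, eigvecs = np.linalg.eigh(L_W)     # eigenvalues in ascending order
    return eigvecs[:, :c]                # eigenvectors of the c smallest eigenvalues
```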
When $F$ is fixed, the problem in equation 2.2 becomes
$$\min_{S}\ \sum_{i,j=1}^{n} s_{ij}\,a_{ij}\,\|f_i - f_j\|_2^2 + \gamma \sum_{i,j=1}^{n} s_{ij}^2 \quad \text{s.t.}\ \forall i,\ \sum_j s_{ij} = 1,\ s_{ij} \geq 0. \qquad (3.3)$$
Since the problem in equation 3.3 is independent for different $i$, we can solve the following problem separately for each $i$:
$$\min_{\sum_j s_{ij} = 1,\ s_{ij} \geq 0}\ \sum_{j} s_{ij}\,a_{ij}\,\|f_i - f_j\|_2^2 + \gamma \sum_{j} s_{ij}^2. \qquad (3.4)$$
Denoting $d_{ij} = a_{ij}\|f_i - f_j\|_2^2$ and $d_i$ as a vector with the $j$th element equal to $d_{ij}$, problem 3.4 can be written in vector form as
$$\min_{s_i^\top \mathbf{1} = 1,\ s_i \geq 0}\ \Big\| s_i + \frac{1}{2\gamma}\,d_i \Big\|_2^2. \qquad (3.5)$$
This problem can be solved with an efficient iterative algorithm or by a closed-form solution, as described in Huang, Nie, and Huang (2015). In the algorithm, we update $F$ and $S$ iteratively until they converge. The optimization algorithm is described in algorithm 1.
[Algorithm 1: Alternating optimization of WAL_L2, updating $F$ by equation 3.2 and $S$ by equation 3.5 until convergence.]
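The following sketch (our own illustration, not the authors' implementation) puts algorithm 1 together; the closed-form simplex projection is one standard way to solve problem 3.5, and the initialization, iteration count, and default value of $\gamma$ are assumptions.

```python
# Sketch of algorithm 1 for WAL_L2 (illustrative; not the authors' code).
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto {s : s >= 0, sum(s) = 1} (closed form)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(y)) + 1.0) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(y + theta, 0.0)

def update_F(S, A, c):
    """F-step: eigenvectors of L_W with w_ij = s_ij * a_ij (see the previous sketch)."""
    W = S * A
    W = (W + W.T) / 2.0
    L_W = np.diag(W.sum(axis=1)) - W
    return np.linalg.eigh(L_W)[1][:, :c]

def update_S_l2(A, F, gamma):
    """S-step: row-wise solution of problem 3.5, s_i = Proj_simplex(-d_i / (2*gamma))."""
    sq = np.square(F[:, None, :] - F[None, :, :]).sum(axis=2)   # ||f_i - f_j||^2
    D = A * sq                                                   # d_ij = a_ij * ||f_i - f_j||^2
    return np.array([project_simplex(-d / (2.0 * gamma)) for d in D])

def wal_l2(A, c, gamma=1.0, n_iter=30):
    """Alternate the F-step and the S-step for a fixed number of iterations."""
    S = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)      # row-normalized start
    F = update_F(S, A, c)
    for _ in range(n_iter):
        S = update_S_l2(A, F, gamma)
        F = update_F(S, A, c)
    return S, F
```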

3.2  Optimization Algorithm for Solving Problem WAL_Ln in Equation 2.3

The optimization process for equation 2.3 is analogous to that for equation 2.2. When $S$ is fixed, the optimal $F$ is obtained in the same way as for equation 3.1.

When $F$ is fixed, the problem in equation 2.3 becomes
$$\min_{S}\ \sum_{i,j=1}^{n} s_{ij}\,a_{ij}\,\|f_i - f_j\|_2^2 + \gamma \sum_{i,j=1}^{n} s_{ij}\ln s_{ij} \quad \text{s.t.}\ \forall i,\ \sum_j s_{ij} = 1,\ s_{ij} \geq 0. \qquad (3.6)$$
Denoting $d_{ij} = a_{ij}\|f_i - f_j\|_2^2$, the problem in equation 3.6 can be written in vector form, and since that problem is independent for different $i$, we can solve the following problem separately for each $i$:
$$\min_{\sum_j s_{ij} = 1,\ s_{ij} \geq 0}\ \sum_{j} s_{ij}\,d_{ij} + \gamma \sum_{j} s_{ij}\ln s_{ij}. \qquad (3.7)$$
We use the Lagrangian multiplier technique to obtain the following unconstrained minimization problem,
$$\mathcal{L}(s_i, \lambda) = \sum_{j} s_{ij}\,d_{ij} + \gamma \sum_{j} s_{ij}\ln s_{ij} + \lambda\Big(\sum_{j} s_{ij} - 1\Big), \qquad (3.8)$$
where $\lambda$ is the Lagrangian multiplier. In order to get the optimal solution of the above subproblem, we set the derivatives of equation 3.8 with respect to $s_{ij}$ and $\lambda$ to zero. We have
$$\frac{\partial \mathcal{L}}{\partial s_{ij}} = d_{ij} + \gamma\,(\ln s_{ij} + 1) + \lambda = 0 \qquad (3.9)$$
and
$$\frac{\partial \mathcal{L}}{\partial \lambda} = \sum_{j} s_{ij} - 1 = 0. \qquad (3.10)$$
From equation 3.9, we obtain
$$s_{ij} = \exp\Big(-\frac{d_{ij} + \lambda}{\gamma} - 1\Big). \qquad (3.11)$$
By substituting equation 3.11 into equation 3.10, we have
$$\sum_{j} \exp\Big(-\frac{d_{ij} + \lambda}{\gamma} - 1\Big) = 1. \qquad (3.12)$$
It follows that
$$s_{ij} = \frac{\exp(-d_{ij}/\gamma)}{\sum_{j'} \exp(-d_{ij'}/\gamma)}, \qquad (3.13)$$
and then $s_{ij}$ can be updated by equation 3.13.

The alternating minimization procedure between $F$ and $S$ can be applied to this objective function (Li et al., 2008). Finally, we use the obtained optimal $F$ for clustering. A detailed description of the optimization method is provided in algorithm 2.

[Algorithm 2: Alternating optimization of WAL_Ln, updating $F$ as in equation 3.2 and $S$ by equation 3.13 until convergence.]
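A sketch of the resulting S-step (our own code, implementing the row-normalized exponential update of equation 3.13; $\gamma > 0$ is assumed):

```python
# Sketch of the WAL_Ln S-step (equation 3.13); illustrative, not the authors' code.
import numpy as np

def update_S_entropy(A, F, gamma):
    """s_ij proportional to exp(-a_ij * ||f_i - f_j||^2 / gamma), each row summing to one."""
    sq = np.square(F[:, None, :] - F[None, :, :]).sum(axis=2)   # ||f_i - f_j||^2
    D = A * sq                                                   # d_ij = a_ij * ||f_i - f_j||^2
    S = np.exp(-D / gamma)
    return S / S.sum(axis=1, keepdims=True)
```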

3.3  Optimization Algorithm for Solving Problem WAL_R in Equation 2.4

To optimize the objective function in equation 2.4, we also use the alternating algorithm to obtain the optimal solution. When $S$ is fixed, the optimal $F$ can be obtained just as equation 3.1 illustrates, where the only difference is that $w_{ij} = s_{ij}^{\,r}\,a_{ij}$, and we use the scalar $r$ to control the distribution of the different weights.

When $F$ is fixed, the problem becomes
$$\min_{S}\ \sum_{i,j=1}^{n} s_{ij}^{\,r}\,a_{ij}\,\|f_i - f_j\|_2^2 \quad \text{s.t.}\ \forall i,\ \sum_j s_{ij} = 1,\ s_{ij} \geq 0. \qquad (3.14)$$
Since the problem in equation 3.14 is independent for different $i$, we can solve the following problem separately for each $i$ (with $d_{ij} = a_{ij}\|f_i - f_j\|_2^2$ as before):
$$\min_{\sum_j s_{ij} = 1,\ s_{ij} \geq 0}\ \sum_{j} s_{ij}^{\,r}\,d_{ij}. \qquad (3.15)$$
Thus, the problem in equation 3.15 can be written as the following Lagrange function,
$$\mathcal{L}(s_i, \lambda) = \sum_{j} s_{ij}^{\,r}\,d_{ij} - \lambda\Big(\sum_{j} s_{ij} - 1\Big), \qquad (3.16)$$
where $\lambda$ is the Lagrange multiplier. In order to get the optimal solution of the above subproblem, we set the derivative of equation 3.16 with respect to $s_{ij}$ to zero. We have
$$\frac{\partial \mathcal{L}}{\partial s_{ij}} = r\,s_{ij}^{\,r-1}\,d_{ij} - \lambda = 0 \ \ \Longrightarrow\ \ s_{ij} = \Big(\frac{\lambda}{r\,d_{ij}}\Big)^{\frac{1}{r-1}}. \qquad (3.17)$$
Substituting the resultant $s_{ij}$ in equation 3.17 into the constraint $\sum_j s_{ij} = 1$, we get
$$s_{ij} = \frac{d_{ij}^{\,\frac{1}{1-r}}}{\sum_{j'} d_{ij'}^{\,\frac{1}{1-r}}}. \qquad (3.18)$$

By the above two steps, we alternately update $S$ and $F$ and repeat them iteratively until the objective function converges (Cai, Nie, Cai, & Huang, 2013). Finally, we obtain the optimal $S$ and $F$ for clustering. The optimization algorithm is described in algorithm 3.

[Algorithm 3: Alternating optimization of WAL_R, updating $F$ with the weighted Laplacian built from $w_{ij} = s_{ij}^{\,r} a_{ij}$ and $S$ by equation 3.18 until convergence.]
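A sketch of the WAL_R S-step (our own code, following equation 3.18 as written above; the small epsilon guarding against zero distances is an implementation assumption):

```python
# Sketch of the WAL_R S-step (equation 3.18); illustrative, not the authors' code.
import numpy as np

def update_S_power(A, F, r, eps=1e-12):
    """s_ij proportional to d_ij^(1/(1-r)) with d_ij = a_ij * ||f_i - f_j||^2 and r > 1."""
    sq = np.square(F[:, None, :] - F[None, :, :]).sum(axis=2)   # ||f_i - f_j||^2
    D = A * sq + eps                     # guard against zeros before taking a negative power
    S = D ** (1.0 / (1.0 - r))           # exponent is negative for r > 1
    return S / S.sum(axis=1, keepdims=True)
```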

4  Experiments

In this section, we explore the performance of our clustering methods on synthetic and real-world benchmark data sets. For the synthetic experiments, we use block diagonal synthetic data and two-moon synthetic data to analyze the properties of our proposed weight-adaptive Laplacian algorithm for graph-based clustering. Seven real-world benchmark data sets are also used.

4.1  Initial Graph Affinity Matrix Learning

In the proposed algorithms, an initial graph-based affinity matrix $A$ is required before learning the new normalized similarity matrix $S$. We use the graph construction method proposed in Nie, Wang, Jordan, et al. (2016), in which the learned matrix is naturally sparse and is computationally efficient for graph-based learning tasks such as clustering and semisupervised classification. Given the data points $x_1, \ldots, x_n$, the learned affinity values assign a larger affinity $a_{ij}$ to pairs of data points $x_i$ and $x_j$ with a smaller distance between them.
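The sketch below (our own reading of that construction, not the authors' code) builds a sparse k-nearest-neighbor affinity in this spirit: each point places closed-form simplex weights on its $k$ nearest neighbors, so closer neighbors get larger affinities and all other entries are zero.

```python
# Sketch of a sparse kNN affinity construction in the spirit of Nie, Wang, Jordan, et al. (2016).
import numpy as np

def knn_affinity(X, k=5):
    """Each row i has nonzero weights only on the k nearest neighbors of x_i; rows sum to one."""
    n = X.shape[0]
    sq = np.square(X[:, None, :] - X[None, :, :]).sum(axis=2)   # pairwise squared distances
    A = np.zeros((n, n))
    for i in range(n):
        d = sq[i].copy()
        d[i] = np.inf                                    # exclude the point itself
        idx = np.argsort(d)[:k + 1]                      # k nearest neighbors plus the (k+1)-th
        dk1, dk = d[idx[k]], d[idx[:k]]
        denom = max(k * dk1 - dk.sum(), 1e-12)           # avoid division by zero for ties
        A[i, idx[:k]] = (dk1 - dk) / denom               # closed-form simplex weights
    return A
```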

4.2  Experiments on Synthetic Data Sets

4.2.1  Block Diagonal Synthetic Data

The block diagonal synthetic data set we used is a matrix with four block matrices arranged along the diagonal. The data within each block denote the affinity of the two corresponding points in one cluster, and the data outside all blocks denote noise. The affinity data within each block are randomly generated in the range of 0 to 1, while the noise data are randomly generated in the range of 0 to $c$, where $c$ is set to 0.6 and 0.7, respectively, in the two settings. To make this clustering task more challenging, we randomly pick 25 noise entries and set their value to 1.0.
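The following snippet (our own reconstruction; the matrix size and block sizes are assumptions, since the letter does not state them) generates data of this kind:

```python
# Generate a block diagonal affinity matrix with off-block noise (sizes are assumed).
import numpy as np

rng = np.random.default_rng(0)
block_sizes = [25, 25, 25, 25]     # four diagonal blocks (assumed sizes)
n = sum(block_sizes)
c_noise = 0.6                      # noise upper bound; 0.7 in the second setting

mask = np.zeros((n, n), dtype=bool)             # True inside the diagonal blocks
start = 0
for b in block_sizes:
    mask[start:start + b, start:start + b] = True
    start += b

A = np.where(mask,
             rng.uniform(0.0, 1.0, (n, n)),      # within-block affinities in [0, 1]
             rng.uniform(0.0, c_noise, (n, n)))  # off-block noise in [0, c]

off_block = np.argwhere(~mask)                   # make the task harder: 25 noise entries set to 1.0
picked = off_block[rng.choice(len(off_block), size=25, replace=False)]
A[picked[:, 0], picked[:, 1]] = 1.0
```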

Figure 1 illustrates the original random matrix and the clustering results obtained by the proposed three weight strategies under two settings, and we note that our proposed clustering methods exhibit good performance in this clustering task. We also compared the clustering accuracy with other graph-based clustering methods, as illustrated in Table 1.

Figure 1:

Clustering results on the block diagonal synthetic data by the WAL_L2, WAL_Ln, and WAL_R methods.


Table 1:
Descriptions of the Seven Benchmark Data Sets.
Data Set     Number of Instances    Dimensions    Classes
20news       3970                   8014          -
Umist        575                    644           20
Orl          400                    1024          40
yaleb        2414                   1024          38
Coil20       1440                   1024          20
Jaffe        213                    676           10
Dig0689      713                    64            4

4.2.2  Two-Moon Synthetic Data

The second toy data set is a randomly generated two-moon data set. In this test, there are two clusters of data distributed in a moon shape. Each cluster contains 100 samples, and the noise level is set to 0.12. Our goal is to recompute the similarity matrix such that the number of connected components in the learned similarity matrix is exactly two. We tested our three proposed methods on this data set and obtained good results with all of them, as illustrated in Figure 2, which shows the effectiveness of our proposed methods.
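Data of this kind can be generated, for example, with scikit-learn (our own sketch; the helper functions referenced in the comments are the sketches from earlier sections, not part of any released implementation):

```python
# Generate two-moon data (100 samples per moon, noise 0.12, as described in the text).
from sklearn.datasets import make_moons

X, y_true = make_moons(n_samples=200, noise=0.12, random_state=0)
# A = knn_affinity(X, k=5)   # initial affinity, see the sketch in section 4.1
# S, F = wal_l2(A, c=2)      # then run, e.g., the WAL_L2 sketch from section 3.1
```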

Figure 2:

Clustering results on the two-moon synthetic data.


4.3  Experimental Results on Real Benchmark Data Sets

We also evaluated the proposed clustering methods on seven real-world benchmark data sets: 20news, Umist, Orl, yaleb, Coil20, Jaffe, and Dig0689. All of these data sets come from the UCI Machine Learning Repository (Asuncion & Newman, 2007) or from other publicly available image collections. The descriptions of these seven data sets are summarized in Table 1.

Table 2:
Experimental Results on Real Benchmark Data Sets.
ACC
Algorithm   20news   Umist    Orl      yaleb    Coil20   Jaffe    Dig0689
K-means     0.2599   0.4087   0.6275   0.1127   0.6208   0.6761   0.6297
RCut        0.2554   0.6573   0.7775   0.4279   0.7895   0.8439   0.7943
NCut        0.2554   0.6713   0.7250   0.4275   0.7902   0.8439   0.7943
NMF         0.2572   0.4278   0.7525   0.4693   0.7917   0.8685   0.7770
CLR_L1      0.2641   0.7182   0.7750   0.4675   -        0.8749   0.8743
CLR_L2      0.2632   0.7291   0.7725   0.4703   0.8117   0.8755   0.8770
WAL-R       -        0.7009   0.5350   0.3169   0.7438   0.8404   0.7055
WAL-Ln      0.2678   0.7217   0.7675   -        0.7431   0.8873   0.8857
WAL-L2      0.2677   -        -        0.4544   0.7944   -        -

NMI
Algorithm   20news   Umist    Orl      yaleb    Coil20   Jaffe    Dig0689
K-means     0.1190   0.6510   0.8004   0.1604   0.7773   0.7272   0.7286
RCut        0.0579   0.8387   0.8805   0.6450   0.8721   0.9144   0.7780
NCut        0.0579   0.8426   0.8746   0.6390   0.8721   0.9144   0.7780
NMF         0.088    0.6094   0.8719   0.6745   0.8974   0.8703   0.7668
CLR_L1      0.128    0.8594   0.8729   -        0.8952   0.9213   0.8568
CLR_L2      0.118    0.8532   0.8749   -        0.8934   0.9233   0.8598
WAL-R       -        0.8369   0.7460   0.4953   0.8643   0.8734   0.6844
WAL-Ln      0.0370   0.8697   0.8533   0.6698   -        0.9110   0.8449
WAL-L2      0.0367   -        -        0.6694   0.8776   -        -

Note: The best performances are in bold.

We compared our proposed clustering methods with the K-means, Ratio Cut (RCut), Normalized Cut (NCut), NMF, CLR_L1, and CLR_L2 (Nie, Wang, Jordan et al., 2016) methods in Tables 2 and 3. For all the compared methods, we used the same algorithm to construct the initial affinity matrix (Nie, Wang, Jordan et al., 2016) as the input matrix $A$. To construct the affinity matrix, we set the number of neighbors, $k$, to 5. As for the clustering methods, we determined the values of the regularization parameter $\gamma$ in WAL-L2 and WAL-Ln and of the weight parameter $r$ in WAL-R by a line search to find the optimal parameter settings. Moreover, we set the number of clusters to the ground-truth number in each data set for all the methods. For all of these methods, we record the average performance, and the standard clustering accuracy (ACC), normalized mutual information (NMI), and purity metrics are used to evaluate all of the clustering methods.
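For reference, a standard way to compute these metrics (our own sketch, not the authors' evaluation code) maps predicted cluster labels to ground-truth classes with the Hungarian algorithm for ACC and uses scikit-learn for NMI:

```python
# Standard clustering evaluation metrics (illustrative sketch).
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one matching between predicted clusters and true classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                           # confusion counts: predicted vs. true
    rows, cols = linear_sum_assignment(-count)     # maximize the total matched count
    return count[rows, cols].sum() / y_true.size

def clustering_nmi(y_true, y_pred):
    """NMI, as implemented in scikit-learn."""
    return normalized_mutual_info_score(y_true, y_pred)
```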

Table 3:
Experimental Results on Real Benchmark Data Sets.
Purity
Algorithm   20news   Umist    Orl      yaleb    Coil20   Jaffe    Dig0689
K-means     0.2625   0.4957   0.6725   0.1326   0.6799   0.8685   0.7532
RCut        0.2565   0.7625   0.7900   0.4718   0.8118   0.8873   0.7942
NCut        0.2565   0.7704   0.7625   0.4581   0.8118   0.8873   0.7940
NMF         0.2597   0.4765   0.7650   0.4925   0.8160   0.6901   0.7770
CLR_L1      0.2647   0.8004   0.7822   -        0.8124   0.9013   0.8740
CLR_L2      0.2747   0.8065   0.7853   0.4915   0.8143   0.8991   0.8790
WAL-R       -        0.7339   0.5551   0.3587   0.7777   0.8544   0.7335
WAL-Ln      0.2680   0.8000   -        0.4850   0.8000   0.8873   0.8875
WAL-L2      0.2677   -        0.5550   0.4933   -        -        -

Note: The best performances are in bold.

Since all of these methods involve a K-means step, including K-means itself, RCut, NCut, and the other graph-based baselines, as well as our proposed three methods, WAL-L2, WAL-Ln, and WAL-R, we used the same initialization for the K-means clustering involved in all the methods and report their average results over 10 repetitions in Table 3. From Table 3, we can conclude that our proposed methods outperform the compared methods on most of the benchmark data sets, and WAL-L2 performs best in most cases, which illustrates the effectiveness of our proposed methods for graph-based clustering. The results of our methods are independent of the initialization and remain stable under a given parameter setting.

5  Conclusion

We have proposed a novel graph-based clustering algorithm that learns a graph with a weight-adaptive Laplacian algorithm. In the proposed algorithm, instead of fixing the input data graph associated with the affinity matrix, we learn a new data similarity matrix that can adaptively adjust the optimization procedure; finally, we use this new data similarity matrix for the clustering task. Based on the weight-constrained Laplacian algorithm, we also consider three regularizers on the proposed weight-constrained graph and propose three new clustering objectives, deriving optimization algorithms to solve them. Extensive experiments have been conducted on both synthetic data and seven real-world benchmark data sets to demonstrate the performance of our models. In future work, we will extend the weight-constrained strategy to other graph-based clustering methods.

Acknowledgments

This work was supported by the National Basic Research Program of China (grant 2015CB351705) and the State Key Program of National Natural Science Foundation of China (grant 61332018).

References

Asuncion, A., & Newman, D. (2007). UCI machine learning repository. https://archive.ics.uci.edu/ml/datasets.html

Brun, A., Knutsson, H., Park, H.-J., Shenton, M. E., & Westin, C.-F. (2004). Clustering fiber traces using normalized cuts. In Proceedings of the Seventh International Conference of the MICCAI (pp. 368–375). New York/Berlin: Springer.

Cai, X., Nie, F., Cai, W., & Huang, H. (2013). Heterogeneous image features integration via multi-modal semi-supervised learning model. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1737–1744). Piscataway, NJ: IEEE.

Cai, X., Nie, F., & Huang, H. (2013). Multi-view k-means clustering on big data. In Proceedings of the International Joint Conference on Artificial Intelligence. Cambridge, MA: AAAI Press.

Chan, P. K., Schlag, M. D., & Zien, J. Y. (1994). Spectral k-way ratio-cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 13, 1088–1096.

Chang, X., Nie, F., Wang, S., Yang, Y., Zhou, X., & Zhang, C. (2015). Compound rank-k projections for bilinear analysis. IEEE Transactions on Neural Networks and Learning Systems, 27, 1502–1513.

Chang, X., Nie, F., Yang, Y., & Huang, H. (2014). A convex formulation for semi-supervised multi-label feature selection. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (pp. 1171–1177). Cambridge, MA: AAAI Press.

Chang, X., Yang, Y., Long, G., Zhang, C., & Hauptmann, A. G. (2016). Dynamic concept composition for zero-example event detection. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (pp. 3464–3470). Cambridge, MA: AAAI Press.

Dhillon, I. S., Guan, Y., & Kulis, B. (2004). Kernel k-means: Spectral clustering and normalized cuts. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 551–556). New York: ACM.

Grauman, K., & Darrell, T. (2006). Unsupervised learning of categories from sets of partially matching image features. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 1, pp. 19–25). Piscataway, NJ: IEEE.

Hagen, L., & Kahng, A. B. (1992). New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11, 1074–1085.

Hoyer, P. O. (2004). Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5, 1457–1469.

Huang, J., Nie, F., & Huang, H. (2013). Spectral rotation versus k-means in spectral clustering. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence. Cambridge, MA: AAAI Press.

Huang, J., Nie, F., & Huang, H. (2015). A new simplex sparse learning model to measure data similarity for clustering. In Proceedings of the 24th International Conference on Artificial Intelligence (pp. 3569–3575). Cambridge, MA: AAAI Press.

Koppal, S. J., & Narasimhan, S. G. (2006). Clustering appearance for scene analysis. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 1323–1330). Piscataway, NJ: IEEE.

Lee, D. D., & Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In T. K. Leen, T. G. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 556–562). Cambridge, MA: MIT Press.

Li, M. J., Ng, M. K., Cheung, Y.-M., & Huang, J. Z. (2008). Agglomerative fuzzy k-means clustering algorithm with selection of number of clusters. IEEE Transactions on Knowledge and Data Engineering, 20, 1519–1534.

Li, T., & Ding, C. (2006). The relationships among various nonnegative matrix factorization methods for clustering. In Proceedings of the Sixth International Conference on Data Mining (pp. 362–371). Piscataway, NJ: IEEE.

Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems, 14 (pp. 849–856). Cambridge, MA: MIT Press.

Nie, F., Wang, H., Deng, C., Gao, X., Li, X., & Huang, H. (2016). New l1-norm relaxations and optimizations for graph clustering. In Proceedings of the 30th AAAI Conference on Artificial Intelligence. Cambridge, MA: AAAI Press.

Nie, F., Wang, X., & Huang, H. (2014). Clustering and projected clustering with adaptive neighbors. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 977–986). New York: ACM.

Nie, F., Wang, X., Jordan, M. I., & Huang, H. (2016). The constrained Laplacian rank algorithm for graph-based clustering. In Proceedings of the 30th AAAI Conference on Artificial Intelligence. Cambridge, MA: AAAI Press.

Nie, F., Zeng, Z., Tsang, I. W., Xu, D., & Zhang, C. (2011). Spectral embedded clustering: A framework for in-sample and out-of-sample spectral clustering. IEEE Transactions on Neural Networks, 22, 1796–1808.

Niyogi, X. (2004). Locality preserving projections. In S. Thrun, L. K. Saul, & B. Schölkopf (Eds.), Advances in neural information processing systems, 16. Cambridge, MA: MIT Press.

Ochs, P., & Brox, T. (2012). Higher order motion models and spectral clustering. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (pp. 614–621). Piscataway, NJ: IEEE.

Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 888–905.

Steinbach, M., Karypis, G., & Kumar, V. (2000). A comparison of document clustering techniques. In Proceedings of the KDD Workshop on Text Mining (pp. 525–526).

Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17, 395–416.

Xu, W., Liu, X., & Gong, Y. (2003). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 267–273). New York: ACM.