Abstract

Spectral clustering is a key research topic in machine learning and data mining. Most existing spectral clustering algorithms are built on gaussian Laplacian matrices, which are sensitive to parameters. We propose a novel parameter-free, distance-consistent locally linear embedding (LLE). The proposed distance-consistent LLE promises that edges between closer data points carry heavier weights. We also propose a novel improved spectral clustering algorithm via embedded label propagation. Our algorithm is built on two advancements of the state of the art. The first is label propagation, which propagates a node's labels to neighboring nodes according to their proximity. We perform standard spectral clustering on the original data, assign to each cluster the data points nearest to its center, and then propagate labels through dense unlabeled data regions. The second is manifold learning, which has been widely used for its capacity to leverage the manifold structure of data points. Extensive experiments on various data sets validate the superiority of the proposed algorithm over state-of-the-art spectral clustering algorithms.

1  Introduction

Data clustering is a fundamental research topic that is widely used in many applications in artificial intelligence, statistics, and the social sciences (Jain, Murty, & Flynn, 1999; Jain & Dubes, 1988; Girolami, 2002; Ye, Zhao, & Liu, 2007). The objective of clustering is to partition the original data points into groups so that data points within the same cluster are close to one another while those in different clusters are far from each other (Jain & Dubes, 1988; Filippone, Camastra, Masulli, & Rovetta, 2008).

Among various implementations of clustering, k-means is one of the most popular choices because of its simplicity and effectiveness (Wu, Hoi, Jin, Zhu, & Yu, 2012). The general procedure of traditional k-means (TKM) is to randomly initialize clustering centers, assign each data point to its nearest cluster, and recompute the clustering centers until convergence. Researchers have argued that the curse of dimensionality may deteriorate the performance of TKM (Ding & Li, 2007). A straightforward solution to this problem is to project the original data onto a low-dimensional subspace by dimensionality reduction, such as PCA, before performing TKM. Discriminative analysis has also been shown to be effective in enhancing clustering performance (Ding & Li, 2007; La Torre & Kanade, 2006; Ye, Zhao, & Liu, 2007). Motivated by this fact, discriminative k-means (DKM) (Ye, Zhao, & Wu, 2007) incorporates discriminative analysis and clustering into a single framework and formalizes clustering as a trace maximization problem. However, both TKM and DKM fail to take the low-dimensional manifold structure of the data into consideration.
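
As a point of reference for the procedure just described, the following is a minimal sketch of TKM (Lloyd's iterations) in Python; the function name and the default arguments are illustrative rather than taken from the letter.

```python
import numpy as np

def tkm(X, c, n_iter=100, seed=0):
    """Minimal traditional k-means: random centers, assign, recompute."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)]  # random initialization
    for _ in range(n_iter):
        # assign each data point to its nearest clustering center
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        # recompute each center as the mean of the points assigned to it
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(c)])
        if np.allclose(new_centers, centers):  # stop when the centers no longer move
            break
        centers = new_centers
    return labels, centers
```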

Spectral clustering (SC) (Yu & Shi, 2003; Filippone et al., 2008; Shi & Malik, 2000) has gradually attracted more and more research attention for its capacity to mine intrinsic data geometric structures, which facilitates partitioning data with more complicated structures (Belkin & Niyogi, 2003; Yang, Shen, Nie, Ji, & Zhou, 2011; Nie, Xu, Tsang, & Zhang, 2009; Wu & Schölkopf, 2006; Yang, Xu, Nie, Yan, & Zhuang, 2010). The basic idea of SC is to find a cluster assignment of the data points from the spectrum of a similarity matrix that leverages the nonlinear and low-dimensional manifold structure of the original data. Inspired by these benefits, researchers have proposed different variants of the SC method. For example, local learning-based clustering (LLC) (Wu & Schölkopf, 2006) uses a kernel regression model for label prediction based on the assumption that the class label of a data point can be determined by its neighbors. Self-tuning SC (Zelnik-Manor & Perona, 2004) is able to tune the graph parameters automatically in an unsupervised scenario. The normalized cut is capable of balancing the volume of the clusters by exploiting data density information (Shi & Malik, 2000).

Label propagation has shown its capability for propagating labels through the data set along high-density areas defined by unlabeled data (Zhu & Ghahramani, 2002; Wang & Zhang, 2008). The key to label propagation is the cluster assumption (Chapelle, Weston, & Schölkopf, 2002): nearby data points are likely to belong to the same cluster, and data points on the same structure are likely to have the same label. Motivated by these benefits, we introduce label propagation (Kang, Jin, & Sukthankar, 2006; Cao, Luo, & Huang, 2008; Cheng, Liu, & Yang, 2009) into the field of spectral clustering.

Our proposed spectral clustering algorithm combines the strengths of spectral clustering and label propagation. The main process of our algorithm is shown in Figure 1. We first perform standard spectral clustering on the original data set and obtain the initial clusters. We then pick out the data points that are closest to each cluster center and form a label matrix from them. By means of manifold learning, we propagate these labels through dense unlabeled data regions. We call the proposed method improved spectral clustering via embedded label propagation (SCLP).

Figure 1:

Framework of the proposed algorithm.


The main contributions of this letter can be summarized as follows:

  1. To the best of our knowledge, this is the first time that spectral clustering and embedded label propagation have been incorporated into a single framework. We propagate the labels obtained by spectral clustering to other unlabeled data points.

  2. We integrate the advantage of manifold learning, which is capable of leveraging manifold structure among data points, into the proposed framework.

  3. We propose a novel distance-consistent locally linear embedding. Unlike the traditional gaussian graph approach, the proposed graph is parameter free.

  4. Extensive experiments on seven real-world data sets demonstrate that the proposed SCLP outperforms state-of-the-art clustering algorithms.

The rest of this letter is organized as follows. After revisiting related work on locally linear embedding and spectral clustering in section 2, we detail our SCLP algorithm in section 3. Extensive experiments are given in section 4, and section 5 concludes this letter.

2  Related Work

2.1  Locally Linear Embedding

Locally linear embedding (LLE) (Roweis & Saul, 2000) aims to identify low-dimensional global coordinates that lie on or very near a manifold embedded in a high-dimensional space. The idea is to approximate each data point by a linear combination of its neighbors and then to recover low-dimensional coordinates that preserve these local linear relationships with minimal discrepancy.

LLE has three steps: build a neighborhood for each data point, find the weights that best linearly reconstruct each point from its neighbors, and find the low-dimensional coordinates best reconstructed by those weights.

By way of example, given a data set matrix $X = [x_1, x_2, \dots, x_n]$, the main steps of LLE are as follows (a short code sketch after the list illustrates them):

  1. For each data point $x_i$, find its $k$ nearest neighbors.

  2. Compute the weight matrix $W$ by minimizing the residual sum of squares to reconstruct each $x_i$ from its neighbors:
    $$\min_{W}\ \sum_{i=1}^{n}\Big\|x_i-\sum_{j}W_{ij}x_j\Big\|^2, \qquad (2.1)$$
    where $W_{ij}=0$ if $x_j$ is not one of $x_i$'s $k$-nearest neighbors and, for each data point $x_i$, $\sum_{j}W_{ij}=1$.
  3. Obtain the low-dimensional coordinates $Y=[y_1,\dots,y_n]$ by minimizing the following reconstruction error using the weights:
    $$\min_{Y}\ \sum_{i=1}^{n}\Big\|y_i-\sum_{j}W_{ij}y_j\Big\|^2,$$
    where $y_i$ is the low-dimensional coordinate of the datum $x_i$, subject to the constraints $\sum_i y_i=0$ and $\frac{1}{n}\sum_i y_iy_i^T=I$.
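
The three steps can be sketched in Python as follows. This is a generic LLE implementation rather than the letter's own code; the neighborhood size, output dimension, and the small regularization added to the local Gram matrix are illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import NearestNeighbors

def lle(X, k=5, d_out=2, reg=1e-3):
    """Generic LLE: neighbors -> reconstruction weights -> low-dimensional embedding."""
    n = X.shape[0]
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nbrs.kneighbors(X, return_distance=False)[:, 1:]   # step 1: k neighbors (drop self)
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]                 # differences to the neighbors
        G = Z @ Z.T                          # local Gram matrix
        G += reg * np.trace(G) * np.eye(k)   # regularize for numerical stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, idx[i]] = w / w.sum()           # step 2: weights with sum-to-one constraint
    # step 3: bottom nonconstant eigenvectors of (I - W)^T (I - W) give the coordinates
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = eigh(M)
    return vecs[:, 1:d_out + 1]
```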

2.2  Spectral Clustering

Consider a data set $X = \{x_1, x_2, \dots, x_n\} \subset \mathbb{R}^{d}$, where $d$ is the dimension of each data point and $n$ is the total number of data points. The objective of clustering is to partition $X$ into $c$ clusters so as to keep data points within the same cluster close to one another, while data points from different clusters remain apart. Let us denote $Y = [y_1, y_2, \dots, y_n]^T \in \{0,1\}^{n \times c}$ as the cluster indicator matrix, where $y_i$ is the cluster indicator vector for the datum $x_i$. The $j$th element of $y_i$ is 1 if $x_i$ belongs to the $j$th cluster and 0 otherwise. Following the work in Ye, Zhao, and Wu (2007), we denote the scaled cluster indicator matrix $F$ as
$$F = [F_1, F_2, \dots, F_n]^T = Y (Y^T Y)^{-1/2}, \qquad (2.2)$$
where $F_i$ is the scaled cluster indicator of $x_i$. The $j$th column of $F$ is defined as follows by Ye, Zhao, and Wu (2007),
$$f_j = \big[\,0, \dots, 0, \underbrace{1/\sqrt{n_j}, \dots, 1/\sqrt{n_j}}_{n_j}, 0, \dots, 0\,\big]^T, \qquad (2.3)$$
and indicates which data points are partitioned into the $j$th cluster $\mathcal{C}_j$ (assuming, without loss of generality, that the data points are ordered by cluster). Meanwhile, $n_j$ is the number of data points in cluster $\mathcal{C}_j$.
According to Dhillon, Guan, and Kulis (2004), the overall function of spectral clustering can be defined as
$$\min_{F}\ \mathrm{Tr}(F^T L F), \qquad \text{s.t. } F = Y (Y^T Y)^{-1/2}, \qquad (2.4)$$
where $\mathrm{Tr}(\cdot)$ denotes the trace operator and $L$ is a graph Laplacian matrix computed in accordance with the local data structure. Among different strategies, a common way to compute the weight matrix $W$ is
$$W_{ij} = \begin{cases} \exp\!\Big(-\dfrac{\|x_i - x_j\|^2}{2\sigma^2}\Big), & x_j \in \mathcal{N}_k(x_i) \text{ or } x_i \in \mathcal{N}_k(x_j), \\ 0, & \text{otherwise}, \end{cases} \qquad (2.5)$$
where $\mathcal{N}_k(x_i)$ denotes the $k$ nearest neighbors of $x_i$ and $\sigma$ is utilized to control the spread of neighbors. The graph Laplacian is then computed by $L = D - W$, where $D$ is a diagonal matrix with its diagonal elements $D_{ii} = \sum_j W_{ij}$.
By replacing $L$ in equation 2.4 with the normalized Laplacian matrix $L_n$,
$$L_n = D^{-1/2} L D^{-1/2}, \qquad (2.6)$$
the objective function becomes the well-known SC algorithm normalized cut (Shi & Malik, 2000). In the same manner, if we replace $L$ in equation 2.4 with a Laplacian matrix obtained by local learning (Wu & Schölkopf, 2006; Yang et al., 2010), the objective function is then modified to local learning clustering (LLC).
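
For concreteness, the following sketch builds the gaussian k-NN graph of equation 2.5, forms the normalized Laplacian of equation 2.6, and clusters the relaxed spectral embedding with k-means. The parameters k and sigma are free choices here, which is exactly the sensitivity the method proposed in this letter tries to remove.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def spectral_clustering(X, c, k=5, sigma=1.0, seed=0):
    """Gaussian k-NN affinity, normalized Laplacian, then k-means on the spectrum."""
    conn = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
    conn = np.maximum(conn, conn.T)                    # symmetric k-NN connectivity
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2)) * conn          # equation 2.5
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                               # unnormalized Laplacian
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.clip(deg, 1e-12, None)))
    L_norm = D_inv_sqrt @ L @ D_inv_sqrt               # equation 2.6
    _, vecs = eigh(L_norm)                             # relaxed solution of equation 2.4
    F = vecs[:, :c]                                    # c smallest eigenvectors
    return KMeans(n_clusters=c, n_init=10, random_state=seed).fit_predict(F)
```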

3  The Proposed Framework

In this section, we illustrate the detailed framework of our algorithm. We aim to cluster the data set into $c$ clusters. Suppose $X = \{x_1, x_2, \dots, x_n\} \subset \mathbb{R}^{d}$ denotes the data set; $d$ is the dimension of the data points, and $n$ is the total number of data points.

3.1  Distance-Consistent Similarity Learning

Following the work in Karasuyama and Mamitsuka (2013), we propose leveraging manifold regularization built on the Laplacian graph for label propagation. To begin, we present a novel distance-consistent LLE.

Intuitively, we expect close data points to have similar labels. We create a graph in which all data points are considered nodes. If $x_i$ is a $k$-nearest neighbor of $x_j$ (or vice versa), the two nodes are connected. The edge between them is weighted so that the closer the nodes are in Euclidean distance, the larger the weight is. Because we have $\sum_j W_{ij} = 1$, the objective function of LLE in equation 2.1 can be safely rewritten as
$$\min_{W}\ \sum_{i=1}^{n} \Big\| \sum_{j} W_{ij} (x_i - x_j) \Big\|^2. \qquad (3.1)$$
By simple mathematical deduction, we can rewrite the above objective function as
$$\sum_{i=1}^{n} \sum_{j} \sum_{k} W_{ij} W_{ik} (x_i - x_j)^T (x_i - x_k) \qquad (3.2)$$
$$= \sum_{i=1}^{n} W_i^T G_i W_i, \qquad G_i(j,k) = (x_i - x_j)^T (x_i - x_k), \qquad (3.3)$$
where $W_i$ denotes the vector of weights from $x_i$ to its neighbors and $G_i$ is the local Gram matrix at $x_i$.
For simplicity, we set all the nondiagonal elements of $G_i$ to zero. The above objective function is then equivalent to
$$\min_{W}\ \sum_{i=1}^{n} \sum_{j} W_{ij}^2 \|x_i - x_j\|^2, \qquad \text{s.t. } \sum_{j} W_{ij} = 1. \qquad (3.4)$$

From the above function, we can observe that the proposed distance-consistent LLE suggests that the edge between closer nodes has a greater weight.

The Lagrangian function of problem 3.4 can be written as
$$\mathcal{L}(W_i, \lambda) = \sum_{j} W_{ij}^2 \|x_i - x_j\|^2 - \lambda \Big( \sum_{j} W_{ij} - 1 \Big), \qquad (3.5)$$
where $\lambda$ is a Lagrange multiplier. By setting the derivative of equation 3.5 with regard to $W_{ij}$ to zero, we have
$$W_{ij} = \frac{\lambda}{2 \|x_i - x_j\|^2}. \qquad (3.6)$$
By substituting the resultant $W_{ij}$ in equation 3.6 into the constraint $\sum_j W_{ij} = 1$, we arrive at
$$\lambda = \frac{2}{\sum_{k} 1 / \|x_i - x_k\|^2}. \qquad (3.7)$$

By integrating equations 3.6 and 3.7, we obtain the final solution for $W_{ij}$, namely $W_{ij} = \dfrac{1/\|x_i - x_j\|^2}{\sum_{k} 1/\|x_i - x_k\|^2}$.

By denoting $D$ as a diagonal matrix with its diagonal elements $D_{ii} = \sum_j (W_{ij} + W_{ji})/2$, the graph Laplacian can be calculated as
$$L = D - \frac{W + W^T}{2}. \qquad (3.8)$$
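
A sketch of the resulting parameter-free graph construction follows. Each weight is the inverse squared distance to a neighbor, normalized over the neighborhood as in equations 3.6 and 3.7; the symmetrization used before building the Laplacian of equation 3.8 is our reading of the construction, not spelled out in the letter.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def distance_consistent_graph(X, k=5, eps=1e-12):
    """Distance-consistent weights W_ij proportional to 1 / ||x_i - x_j||^2,
    normalized so that each row sums to one, and the resulting graph Laplacian."""
    n = X.shape[0]
    dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    dist, idx = dist[:, 1:], idx[:, 1:]          # drop the self-neighbor
    W = np.zeros((n, n))
    for i in range(n):
        inv = 1.0 / np.maximum(dist[i] ** 2, eps)
        W[i, idx[i]] = inv / inv.sum()           # closer neighbors receive heavier edges
    W_sym = (W + W.T) / 2.0                      # symmetrize (assumed) before the Laplacian
    L = np.diag(W_sym.sum(axis=1)) - W_sym       # equation 3.8
    return W, L
```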

3.2  Refined Spectral Clustering

After initially clustering the data set into $c$ clusters through traditional spectral clustering, we select the $m$ data points per cluster that are nearest to each clustering center and mark them as labeled data points. The remaining points are marked as unlabeled. Note that we assume these selected data points are grouped into the proper clusters. Hence, we obtain the label matrix $Y \in \{0,1\}^{n \times c}$ and the diagonal selection matrix $U$, where $Y_{ij} = 1$ if $x_i$ is labeled and belongs to the $j$th cluster and $Y_{ij} = 0$ otherwise. $U_{ii}$ takes an infinitely large value if $x_i$ is labeled and a small value otherwise; in the experiment, we use a large constant to approximate the infinite value. We propose propagating the labels of the labeled data points to the unlabeled data points. Moreover, we denote by $F$ the predicted label matrix for the data points in $X$. According to Nie, Xu, Tsang, and Zhang (2010), $F$ should satisfy smoothness with respect to both the obtained label matrix and the manifold structure. Hence, $F$ can be obtained as follows (Zhu, 2006):
$$\min_{F}\ \mathrm{Tr}(F^T L F) + \mathrm{Tr}\big( (F - Y)^T U (F - Y) \big), \qquad (3.9)$$
where $\mathrm{Tr}(\cdot)$ denotes the trace operator. The purpose of the second term is to keep the predicted labels $F$ consistent with the ground truth labels $Y$.
We further incorporate a regularization term into the objective function to correlate the features with the predicted labels. Consequently, the objective function becomes
formula
3.10
where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.
Since the least squares loss function is very sensitive to outliers, we employ the $\ell_{2,1}$-norm on the regularization term to handle this issue. Hence, we can rewrite the objective function as
formula
3.11
It is worth noting that the proposed framework can be readily applied to out-of-sample clustering: by applying the learned transformation to unseen data, we obtain a label predictor for samples outside $X$.
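
To make the construction concrete, here is a sketch of how the label matrix and the diagonal selection matrix can be built from an initial spectral clustering. The number of anchors per cluster and the large constant standing in for the infinite weight are illustrative (the experiments in section 4 use two anchors per cluster).

```python
import numpy as np

def build_label_and_selection(X, init_labels, c, m=2, big=1e6):
    """Pick the m points closest to each initial cluster center as 'labeled',
    then build the one-hot label matrix Y and the diagonal selection matrix U."""
    n = X.shape[0]
    Y = np.zeros((n, c))
    u = np.ones(n)                                 # small weight for unlabeled points (assumed)
    for j in range(c):
        members = np.where(init_labels == j)[0]
        center = X[members].mean(axis=0)
        d = ((X[members] - center) ** 2).sum(axis=1)
        anchors = members[np.argsort(d)[:m]]       # m points nearest to the cluster center
        Y[anchors, j] = 1.0
        u[anchors] = big                           # large constant approximates infinity
    return Y, np.diag(u)
```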

3.3  Optimization

The proposed objective function involves the $\ell_{2,1}$-norm, which is difficult to solve in closed form. We propose to solve this problem in the following steps. By setting the derivative of equation 3.11 with regard to the transformation matrix in the regularization term to zero, we have
formula
3.12
where $I$ is an identity matrix and $\hat{D}$ is a diagonal matrix defined as
formula
3.13
Substituting this expression back into equation 3.11, the objective becomes
formula
3.14
After a further change of variables, the objective becomes
formula
3.15
By setting the derivative of equation 3.15 with regard to $F$ to zero, we have
formula
3.16

Based on the above mathematical deduction, we propose an efficient iterative algorithm to optimize the objective function in equation 3.11, which is summarized in algorithm 1.
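
Since algorithm 1 is not reproduced here, the sketch below shows only the embedded label-propagation step, the closed-form minimizer of equation 3.9, and how it composes with the earlier sketches (sections 2.2, 3.1, and 3.2) into a simplified SCLP pipeline; it omits the feature-regression regularization of equations 3.10 and 3.11 and the iterative reweighting for the $\ell_{2,1}$-norm.

```python
import numpy as np

def propagate_labels(L, U, Y, ridge=1e-9):
    """Closed-form minimizer of Tr(F^T L F) + Tr((F - Y)^T U (F - Y)),
    obtained by solving (L + U) F = U Y for the predicted label matrix F."""
    n = L.shape[0]
    return np.linalg.solve(L + U + ridge * np.eye(n), U @ Y)  # small ridge for stability

def sclp_simplified(X, c, k=5, m=2):
    """Simplified SCLP pipeline: initial spectral clustering, anchor selection,
    distance-consistent graph, and label propagation (uses the sketches above)."""
    init_labels = spectral_clustering(X, c, k=k)              # section 2.2 sketch
    Y, U = build_label_and_selection(X, init_labels, c, m=m)  # section 3.2 sketch
    _, L = distance_consistent_graph(X, k=k)                  # section 3.1 sketch
    F = propagate_labels(L, U, Y)
    return F.argmax(axis=1)                                   # final cluster assignment
```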

4  Experiments

In this section, we conduct extensive experiments to validate the performance of the proposed SCLP and compare it with related state-of-the-art spectral clustering algorithms, followed by a study of parameter sensitivity.

4.1  Data Set Description

We use seven benchmark data sets to validate the performance of the proposed algorithm (see Table 1). The USPS data set has 9298 gray-scale handwritten digit images, each represented by 256 pixel values, scanned from envelopes by the U.S. Postal Service. The Yale-B data set (Georghiades, Belhumeur, & Kriegman, 2001) consists of 2414 near-frontal images of 38 persons under different illuminations. The AR data set (Martinez & Benavente, 1998) has 840 images with a dimension of 768. The FRGC data set (Phillips et al., 2005), collected at the University of Notre Dame, contains 50,000 images taken across 13 different poses, under 43 different illumination conditions, and with 4 different expressions per person. The MSRA50 data set (He, Yan, Hu, Niyogi, & Zhang, 2004) consists of 1799 images from 12 classes. The PALM data set consists of 700 right-hand images, 7 samples per person across 100 users, taken with a digital camera; the images are resized to the same dimension. The human lung carcinomas (LUNG) data set (Singh et al., 2002) contains 203 samples and 3312 genes. Following previous work, we use pixel values as the feature representations of the image data sets.

Table 1:
Data Set Details
Data Set    Matrix Size    Data Set Size    Class Number
LUNG 3312 203 
PALM 256 2000 100 
MSRA50 1024 1799 12 
FRGC 1296 5658 275 
AR 768 840 120 
Yale-B 1024 2414 38 
USPS 256 9298 10 

4.2  Experiment Setup

We compare the proposed SCLP with traditional k-means (TKM) (Wu et al., 2012), discriminative k-means (DKM) (Ye, Zhao, & Wu, 2007), local learning clustering (LLC) (Wu & Schölkopf, 2006), nonnegative normalized cut (NNC) (Shi & Malik, 2000), spectral clustering (SC), CLGR (Wang, Zhang, & Li, 2009), and spectral embedding clustering (SEC) (Nie et al., 2009).

The neighborhood size $k$ is set to 5 for all spectral clustering algorithms. For the parameter $\sigma$ in NNC, we perform a self-tuning algorithm (Zelnik-Manor & Perona, 2004) to determine the best value. For the parameters in DKM, LLC, CLGR, and SEC, we tune them over a range of candidate values and report the best results. Note that the results of all clustering algorithms vary with initialization. To reduce the influence of statistical variation, we repeat each clustering 50 times with random initialization and report the results corresponding to the best objective function values. For SCLP, we select the two data points per cluster that are nearest to the clustering center.

4.3  Evaluation Metrics

Following related clustering studies, we use clustering accuracy (ACC) and normalized mutual information (NMI) as evaluation metrics for our experiments.

Let $p_i$ represent the clustering result label of an arbitrary data point $x_i$ produced by a clustering algorithm and $q_i$ represent its corresponding ground truth label. ACC is then defined as follows:
$$\mathrm{ACC} = \frac{\sum_{i=1}^{n} \delta\big(q_i, \mathrm{map}(p_i)\big)}{n}, \qquad (4.1)$$
where $\delta(a, b) = 1$ if $a = b$ and $\delta(a, b) = 0$ otherwise, and $\mathrm{map}(\cdot)$ is the best mapping function for matching clustering labels to ground truth labels, obtained using the Kuhn-Munkres algorithm. A larger ACC indicates better clustering performance.
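
A sketch of the ACC computation follows, with the best label mapping obtained via the Kuhn-Munkres algorithm as implemented by SciPy's linear_sum_assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC (equation 4.1): find the best one-to-one mapping between predicted
    clusters and ground-truth classes, then count the correctly mapped points."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    size = max(len(classes), len(clusters))
    overlap = np.zeros((size, size))
    for i, p in enumerate(clusters):
        for j, t in enumerate(classes):
            overlap[i, j] = np.sum((y_pred == p) & (y_true == t))
    row, col = linear_sum_assignment(-overlap)   # maximize the total overlap
    return overlap[row, col].sum() / len(y_true)
```
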
For any two arbitrary variables $P$ and $Q$, NMI is defined as follows (Strehl & Ghosh, 2003):
$$\mathrm{NMI}(P, Q) = \frac{I(P, Q)}{\sqrt{H(P) H(Q)}}, \qquad (4.2)$$
where $I(P, Q)$ computes the mutual information between $P$ and $Q$, and $H(P)$ and $H(Q)$ are the entropies of $P$ and $Q$. Let $t_l$ represent the number of data points in the $l$th cluster generated by a clustering algorithm and $\tilde{t}_h$ represent the number of data points from the $h$th ground truth class. The NMI metric is then computed as follows (Strehl & Ghosh, 2003),
$$\mathrm{NMI} = \frac{\sum_{l=1}^{c} \sum_{h=1}^{c} t_{l,h} \log\big( \frac{n\, t_{l,h}}{t_l \tilde{t}_h} \big)}{\sqrt{\big( \sum_{l=1}^{c} t_l \log \frac{t_l}{n} \big) \big( \sum_{h=1}^{c} \tilde{t}_h \log \frac{\tilde{t}_h}{n} \big)}}, \qquad (4.3)$$
where $t_{l,h}$ is the number of data samples lying in the intersection between the $l$th cluster and the $h$th ground truth class. Similarly, a larger NMI indicates better clustering performance.
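
A direct implementation of equation 4.3 might look as follows; in practice, scikit-learn's normalized_mutual_info_score with the geometric averaging option should give the same quantity.

```python
import numpy as np

def nmi(y_true, y_pred, eps=1e-12):
    """NMI (equation 4.3): mutual information between the clustering and the
    ground-truth partition, normalized by the geometric mean of their entropies."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    mi, h_true, h_pred = 0.0, 0.0, 0.0
    for p in np.unique(y_pred):
        n_p = np.sum(y_pred == p)
        h_pred -= (n_p / n) * np.log(n_p / n)
        for t in np.unique(y_true):
            n_pt = np.sum((y_pred == p) & (y_true == t))
            if n_pt > 0:
                mi += (n_pt / n) * np.log(n * n_pt / (n_p * np.sum(y_true == t)))
    for t in np.unique(y_true):
        n_t = np.sum(y_true == t)
        h_true -= (n_t / n) * np.log(n_t / n)
    return mi / np.sqrt(h_true * h_pred + eps)
```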

4.4  Experimental Results

We show the clustering results of different algorithms in terms of ACC and NMI over seven benchmark data sets in Tables 2 and 3. Based on the results of our experiment, we can make the following observations:

  1. When comparing the k-means-based algorithms (i.e., TKM and DKM), DKM generally outperforms TKM because discriminative dimension reduction and clustering are integrated into a single framework. Thus, each cluster is more identifiable, which facilitates clustering performance. We can therefore safely conclude that discriminative information is beneficial for clustering.

  2. SC outperforms LLC on the Yale-B and USPS data sets, while LLC outperforms SC on all the remaining data sets. By combining local and global regularization, CLGR achieves better performance than both algorithms on all data sets.

  3. SEC obtains the second-best performance over the seven data sets, which indicates that linearity regularization can also facilitate clustering performance. Similar to our algorithm, SEC is capable of dealing with out-of-sample data.

  4. The proposed algorithm SCLP generally outperforms the compared clustering algorithms on the seven benchmark data sets, which demonstrates that manifold regularization-based label propagation is beneficial for spectral clustering.

Table 2:
Performance Comparison (ACC %) of KM, DKM, NNC, SC, LLC, CLGR, SEC, and SCLP.
Method    LUNG    PALM    MSRA50    FRGC    AR    Yale-B    USPS
KM        
DKM        
NNC        
SC        
LLC        
CLGR        
SEC        
SCLP        

Note: The proposed algorithm, SCLP, generally outperforms the compared algorithms, which indicates that manifold regularization-based label propagation is beneficial for spectral clustering.

Table 3:
Performance Comparison (NMI %) of KM, DKM, NNC, SC, LLC, CLGR, SEC, and SCLP.
Method    LUNG    PALM    MSRA50    FRGC    AR    Yale-B    USPS
KM        
DKM        
NNC        
SC        
LLC        
CLGR        
SEC        
SCLP        

Note: The proposed algorithm, SCLP, generally outperforms the compared algorithms, which indicates that manifold regularization-based label propagation is beneficial for spectral clustering.

4.5  Parameter Sensitivity

In this section, we study the performance variance with regard to the two regularization parameters on all the data sets used. The performance is reported in Figure 2, which shows how clustering performance varies with different combinations of the two parameters. We can see that better performance occurs when the two parameters are comparable.

Figure 2:

Performance variance with regard to the two regularization parameters.


5  Conclusion

In this letter, we have proposed a novel improved spectral clustering algorithm (SCLP). Most existing spectral clustering algorithms are based on gaussian Laplacian matrices or LLE, both of which are extremely sensitive to parameters, and the parameters are difficult to tune. We have presented a novel distance-consistent LLE that is parameter free and guarantees that the edge between closer data points has a greater weight. Utilizing this distance-consistent LLE, we have proposed an improved spectral clustering method based on label propagation. The proposed algorithm takes advantage of both label propagation and manifold learning. With label propagation, we propagate the labels obtained through spectral clustering to the unlabeled data points. By adopting manifold learning, we leverage the manifold structure among data points. Note that our framework can also be readily applied to out-of-sample data. Finally, we have evaluated the clustering performance of the proposed algorithm on seven data sets. The experimental results demonstrate that the proposed algorithm consistently outperforms the compared algorithms.

Acknowledgments

This research is supported by the Science Foundation of the China (Xi'an) Institute for Silk Road Research (2016SY10).

References

Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373–1396.
Cao, L., Luo, J., & Huang, T. S. (2008). Annotating photo collections by label propagation according to multiple similarity cues. In Proceedings of the ACM Conference on Multimedia (pp. 121–130).
Chapelle, O., Weston, J., & Schölkopf, B. (2002). Cluster kernels for semi-supervised learning. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems (pp. 585–592). Cambridge, MA: MIT Press.
Cheng, H., Liu, Z., & Yang, J. (2009). Sparsity induced similarity measure for label propagation. In Proceedings of the 12th International Conference on Computer Vision (pp. 317–324). Washington, DC: IEEE Computer Society.
Dhillon, I. S., Guan, Y., & Kulis, B. (2004). Kernel k-means: Spectral clustering and normalized cuts. In Proc. 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM.
Ding, C., & Li, T. (2007). Adaptive dimension reduction using discriminant analysis and k-means clustering. In Proc. 24th International Conference on Machine Learning. New York: ACM.
Filippone, M., Camastra, F., Masulli, F., & Rovetta, S. (2008). A survey of kernel and spectral methods for clustering. Pattern Recognition, 41(1), 176–190.
Georghiades, A. S., Belhumeur, P. N., & Kriegman, D. J. (2001). From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. PAMI, 23(6), 643–660.
Girolami, M. (2002). Mercer kernel-based clustering in feature space. IEEE Trans. Neural Networks, 13(3), 780–784.
He, X., Yan, S., Hu, Y., Niyogi, P., & Zhang, H.-J. (2004). Face recognition using Laplacian faces. IEEE Trans. PAMI, 27(3), 328–340.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Upper Saddle River, NJ: Prentice-Hall, Inc.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264–323.
Kang, F., Jin, R., & Sukthankar, R. (2006). Correlated label propagation with application to multi-label learning. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (pp. 1719–1726). Washington, DC: IEEE Computer Society.
Karasuyama, M., & Mamitsuka, H. (2013). Manifold-based similarity adaptation for label propagation. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 26. Red Hook, NY: Curran.
La Torre, F. de, & Kanade, T. (2006). Discriminative cluster analysis. In Proc. of the 23rd International Conference on Machine Learning. New York: ACM.
Martinez, A. M., & Benavente, R. (1998). The AR face database (Tech. Rep.). Barcelona: Centre Vis. Comput., Univ. Autonoma Barcelona.
Nie, F., Xu, D., Tsang, I. W., & Zhang, C. (2009). Spectral embedded clustering. In Proceedings of the 21st International Conference on Artificial Intelligence. Cambridge, MA: Association for the Advancement of Artificial Intelligence.
Nie, F., Xu, D., Tsang, I.-H., & Zhang, C. (2010). Flexible manifold embedding: A framework for semi-supervised and unsupervised dimension reduction. IEEE Trans. Image Process., 19(7), 1921–1932.
Phillips, P. J., Flynn, P. J., Scruggs, T., Bowyer, K. W., Chang, J., Hoffman, K., … Worek, W. (2005). Overview of the face recognition grand challenge. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society.
Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., … Tamayo, P. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2), 203–209.
Strehl, A., & Ghosh, J. (2003). Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3, 583–617.
Wang, F., & Zhang, C. (2008). Label propagation through linear neighborhoods. IEEE Trans. Knowl. Data Engin., 20(1), 55–67.
Wang, F., Zhang, C., & Li, T. (2009). Clustering with local and global regularization. IEEE Trans. Knowl. Data Engin., 21(12), 1665–1678.
Wu, L., Hoi, S. C., Jin, R., Zhu, J., & Yu, N. (2012). Learning Bregman distance functions for semi-supervised clustering. IEEE Trans. Knowl. and Data Eng., 24(3), 478–491.
Wu, M., & Schölkopf, B. (2006). A local learning approach for clustering. In B. Schölkopf, J. C. Platt, & T. Hoffman (Eds.), Advances in neural information processing systems, 19. Cambridge, MA: MIT Press.
Yang, Y., Shen, H. T., Nie, F., Ji, R., & Zhou, X. (2011). Nonnegative spectral clustering with discriminative regularization. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (pp. 555–560). Cambridge, MA: Association for the Advancement of Artificial Intelligence.
Yang, Y., Xu, D., Nie, F., Yan, S., & Zhuang, Y. (2010). Image clustering using local discriminant models and global integration. IEEE Trans. Image Process., 19(10), 2761–2773.
Ye, J., Zhao, Z., & Liu, H. (2007). Adaptive distance metric learning for clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society.
Ye, J., Zhao, Z., & Wu, M. (2007). Discriminative k-means for clustering. In J. C. Platt, D. Koller, Y. Singer, & S. T. Roweis (Eds.), Advances in neural information processing systems (pp. 1647–1656). Cambridge, MA: MIT Press.
Yu, S. X., & Shi, J. (2003). Multiclass spectral clustering. In Proceedings of the IEEE Conference on Computer Vision. Washington, DC: IEEE Computer Society.
Zelnik-Manor, L., & Perona, P. (2004). Self-tuning spectral clustering. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems. Cambridge, MA: MIT Press.
Zhu, X. (2006). Semi-supervised learning literature survey (Tech. Rep. No. 3). Madison: Computer Science, University of Wisconsin-Madison.
Zhu, X., & Ghahramani, Z. (2002). Learning from labeled and unlabeled data with label propagation (Tech. Rep. No. CMU-CALD-02-107). Pittsburgh, PA: Carnegie Mellon University.