In this letter, we consider the density-level detection (DLD) problem via a coefficient-based classification framework with an ℓ1-regularizer and data-dependent hypothesis spaces. Although the data-dependent character of the algorithm provides flexibility and adaptivity for DLD, it complicates the generalization error analysis. To overcome this difficulty, an error decomposition is introduced based on an established classification framework. From this decomposition, an estimate of the learning rate is obtained by means of the Rademacher average and a stepping-stone technique. In particular, the estimate is independent of the capacity assumption used in the previous literature.
The aim of this letter is to provide an error analysis for density-level detection (DLD) by a classification method with ℓ1-coefficient regularization and data-dependent hypothesis spaces. Our study is motivated by the increasing attention being paid to the classification framework for DLD (Steinwart, Hush, & Scovel, 2005; Scovel, Hush, & Steinwart, 2005; Cao, Xing, & Zhao, 2012) and to error analysis with data-dependent hypothesis spaces (Wu, Ying, & Zhou, 2005; Wu & Zhou, 2008; Sun & Wu, 2011; Tong, Chen, & Yang, 2010; Shi, Feng, & Zhou, 2011; Xiao & Zhou, 2011; Feng & Lv, 2011; Song & Zhang, 2011).
Following the exposition in Steinwart et al. (2005), we introduce the preliminary background of the DLD problem in a Hilbert space. Let H be a separable Hilbert space (possibly infinite dimensional), and let X ⊂ H with ‖x‖ ≤ B for all x ∈ X and some positive constant B. Let Q be an unknown data-generating distribution on X. Usually we describe data as anomalous if they are not concentrated (see Ripley, 1996; Schölkopf & Smola, 2002). A reference distribution μ on X is required to describe the concentration of Q. Assume that Q has a density h with respect to μ, that is, dQ = h dμ. Given a ρ > 0, we define the ρ-level set of the density h as {x ∈ X : h(x) > ρ}. This set describes the concentration of Q. To define anomalies in terms of the concentration, one has only to fix the threshold ρ, so that a sample x is considered anomalous whenever h(x) ≤ ρ. We assume that {x ∈ X : h(x) = ρ} is a μ-zero set, a common assumption used in many other papers (see Polonik, 1995; Tsybakov, 1997). The goal of the DLD problem is to recover the ρ-level set {h > ρ} as precisely as possible from empirical data.
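The level-set target above can be made concrete with a simple plug-in estimator: estimate the density h of Q with respect to the reference measure and threshold it at ρ. The sketch below (one-dimensional, Lebesgue reference measure, Gaussian kernel density estimator, all illustrative choices that are not part of the letter's method) recovers the ρ-level set of a standard normal Q; analytically, {h > 0.2} for N(0, 1) is roughly the interval (−1.18, 1.18).

```python
import numpy as np

def plug_in_level_set(sample, rho, bandwidth=0.25, grid=None):
    """Plug-in estimate of the rho-level set {h > rho}: estimate the
    density h of Q (here, w.r.t. Lebesgue reference measure) with a
    Gaussian kernel density estimator, then threshold at rho."""
    sample = np.asarray(sample, dtype=float)
    if grid is None:
        grid = np.linspace(sample.min() - 1.0, sample.max() + 1.0, 400)
    # Gaussian KDE: h_hat(x) = (1/(n*bw)) * sum_i phi((x - x_i)/bw)
    diffs = (grid[:, None] - sample[None, :]) / bandwidth
    h_hat = np.exp(-0.5 * diffs**2).sum(axis=1) / (
        len(sample) * bandwidth * np.sqrt(2.0 * np.pi))
    return grid, h_hat, grid[h_hat > rho]

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=2000)   # Q = N(0, 1)
grid, h_hat, level_set = plug_in_level_set(data, rho=0.2)
print(level_set.min(), level_set.max())
```

Direct plug-in estimation of this kind is exactly what the classification reformulation below avoids: it replaces density estimation by a supervised learning problem.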
Since an empirical comparison in terms of the μ-measure of the symmetric difference between a candidate set and {h > ρ} is difficult, Steinwart et al. (2005) proposed a novel performance measure. Let s ∈ (0, 1), and let P be the probability measure on X × {−1, 1} defined by P(A × {1}) = sQ(A) and P(A × {−1}) = (1 − s)μ(A) for measurable A ⊂ X.
From this definition, we know that the DLD problem can be associated with a binary classification problem in which positive samples are drawn from sQ and negative samples are drawn from (1 − s)μ. On the basis of this interpretation, Steinwart et al. (2005) proposed a kernel method. Recall that K : X × X → ℝ is a Mercer kernel if it is continuous, symmetric, and positive semidefinite. The reproducing kernel Hilbert space (RKHS) H_K associated with a Mercer kernel K is defined (see Aronszajn, 1950) as the closure of the linear span of the set of functions {K_x := K(x, ·) : x ∈ X}, equipped with the inner product ⟨·, ·⟩_K defined by ⟨K_x, K_y⟩_K = K(x, y).
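The reduction can be simulated directly: label draws from Q as +1 and draws from the reference distribution μ as −1, then work with a Mercer kernel on the resulting sample. The sketch below (Q = N(0, 1) and μ = Uniform[−4, 4] are hypothetical choices for illustration) builds such a labeled sample and checks the defining Mercer property numerically: the Gaussian kernel's Gram matrix is symmetric positive semidefinite on any sample.

```python
import numpy as np

rng = np.random.default_rng(1)

# DLD as classification: for a mixture weight s in (0, 1), positive
# examples come from Q and negative examples from the reference
# distribution mu (here Q = N(0, 1), mu = Uniform[-4, 4]; both are
# illustrative choices, not taken from the letter).
s, n = 0.5, 200
n_pos = int(s * n)
x_pos = rng.normal(0.0, 1.0, size=n_pos)        # from Q, label +1
x_neg = rng.uniform(-4.0, 4.0, size=n - n_pos)  # from mu, label -1
x = np.concatenate([x_pos, x_neg])
y = np.concatenate([np.ones(n_pos), -np.ones(n - n_pos)])

# Gaussian Mercer kernel K(u, v) = exp(-|u - v|^2 / (2 sigma^2)):
# continuous, symmetric, and positive semidefinite.
sigma = 1.0
G = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / sigma**2)
eigmin = np.linalg.eigvalsh(G).min()
print(G.shape, eigmin)
```

Any learner trained on (x, y) now approximates the Bayes rule of this synthetic classification problem, which in turn encodes the ρ-level set.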
Our main goal is to establish the generalization error estimate of equation 1.2. Compared with the error analysis of algorithm 1.1 (Scovel et al., 2005; Cao et al., 2012), the following three features highlight the main theoretical contributions of this letter:
Although the ℓ1-regularizer has been successfully used for least squares regression (Wu & Zhou, 2008; Xiao & Zhou, 2011; Shi et al., 2011; Song & Zhang, 2011) and the linear programming SVM (Vapnik, 1998; Wu et al., 2005; Wu & Zhou, 2008), there are no such studies on the DLD problem. Our method enriches the algorithm design for the DLD problem.
The regularization term of equation 1.2 is essentially different from that of equation 1.1 and depends closely on the empirical data. This leads to additional difficulty in the error analysis. A stepping-stone technique for the DLD problem is introduced to overcome this problem. We also note that all previous applications of the stepping-stone technique have been restricted to SVM and least squares regression frameworks (Wu et al., 2005; Feng & Lv, 2011). Hence our hypothesis error estimate is novel and sheds new light on applications of the stepping-stone technique.
It is worth noting that the sample error estimates in Scovel et al. (2005) and Cao et al. (2012) depend on a capacity measure based on covering numbers. However, in some applications, input items are in the form of random functions (speech recordings, spectra, images), which casts the DLD problem into the general class of functional data analysis (Ramsay & Silverman, 1997; Biau, Devroye, & Lugosi, 2008; Chen & Li, 2012). In these cases, the covering number assumption may be invalid, since it usually depends on the dimension of the input data. The sample error estimate here is independent of covering numbers and is suitable for the DLD problem in which the inputs belong to an infinite-dimensional separable Hilbert space.
The rest of this letter is organized as follows. In section 2, we introduce the necessary notations and definitions. The estimates for hypothesis error and sample error are presented in sections 3 and 4, respectively. The main result on learning rate is proved in section 5. We close with a brief conclusion in section 6.
In this section we introduce some definitions and notations used throughout this letter.
It is well known that the Bayes classifier f_c = sign(2P(y = 1|x) − 1) minimizes the misclassification risk R(f) := P({(x, y) : sign(f(x)) ≠ y}). For a real-valued function f, the excess risk R(f) − R(f_c) quantifies the performance of the induced classifier sign(f).
The bounding technique for sample error usually relies on capacity measurement of the hypothesis function space . Note that the generalization performance of regression and clustering algorithms in Hilbert spaces has been investigated based on the Rademacher average (Biau et al., 2008; Chen & Li, 2012). In this letter, we also take Rademacher complexity (Bartlett & Mendelson, 2002) as the measure of capacity.
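For a finite class, the empirical Rademacher average can be approximated by straightforward Monte Carlo over random sign vectors. The sketch below (the two small function classes are hypothetical) estimates R̂_n(F) = E_σ sup_{f∈F} (1/n) Σ_i σ_i f(x_i) and illustrates monotonicity: enlarging the class cannot decrease its Rademacher average.

```python
import numpy as np

def empirical_rademacher(values, n_draws=2000, seed=3):
    """Monte Carlo estimate of the empirical Rademacher average
    R_hat_n(F) = E_sigma sup_{f in F} (1/n) sum_i sigma_i f(x_i),
    where `values` is a (num_functions, n) array of f(x_i)."""
    rng = np.random.default_rng(seed)
    m, n = values.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # Rademacher signs
    # sup over the finite class for each sign pattern, then average
    return np.mean(np.max(sigma @ values.T / n, axis=1))

rng = np.random.default_rng(3)
x = rng.normal(size=50)
small = np.stack([x, -x])                       # the class {f, -f}
large = np.stack([x, -x, x**2 - 1, 1 - x**2])   # a strictly larger class
r_small = empirical_rademacher(small)
r_large = empirical_rademacher(large)
print(r_small, r_large)
```

Because both calls reuse the same sign draws (same seed), the superset's supremum dominates draw by draw, so r_large ≥ r_small holds exactly, not just in expectation.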
We adopt the following condition for the approximation error, which has been extensively used in the literature (Chen, Wu, Ying, & Zhou, 2004; Wu et al., 2005; Cucker & Zhou, 2007; Ying & Campbell, 2010; Cao et al., 2012).
3. Estimate of Hypothesis Error
For every , there holds .
4. Estimate of Sample Error
Now we turn to the estimate of sample error. McDiarmid’s inequality and some properties of Rademacher complexity (see Bartlett & Mendelson, 2002) are necessary in our estimate of .
Let F and F_1, . . . , F_k be classes of real-valued functions. Then:
If φ : ℝ → ℝ is Lipschitz with constant L and satisfies φ(0) = 0, then R_n(φ ∘ F) ≤ 2L R_n(F).
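This contraction property can be checked numerically. The sketch below uses the clipping map φ(t) = min(max(t, 0), 1), which is Lipschitz with L = 1 and satisfies φ(0) = 0, together with a small hypothetical function class; the Monte Carlo estimates comfortably satisfy R_n(φ ∘ F) ≤ 2L R_n(F).

```python
import numpy as np

# Monte Carlo check of the contraction property: for phi Lipschitz with
# constant L and phi(0) = 0, one has R_n(phi o F) <= 2 L R_n(F).
rng = np.random.default_rng(5)
n, n_draws = 100, 4000
x = rng.normal(size=n)
F = np.stack([x, -x, 0.5 * x])           # a small finite class (illustrative)
phi = lambda t: np.clip(t, 0.0, 1.0)     # L = 1 and phi(0) = 0
phiF = phi(F)

sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))   # shared sign draws
r_F = np.mean(np.max(sigma @ F.T / n, axis=1))
r_phiF = np.mean(np.max(sigma @ phiF.T / n, axis=1))
print(r_phiF, 2.0 * r_F)
```

The factor 2L leaves substantial slack here; the inequality is what licenses replacing the loss-composed class by the hypothesis class itself in the sample error bound.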
To derive the upper bound of the sample error, we establish the concentration estimation of on the basis of the Rademacher average. The analysis technique used here is that of lemma 2.1 in Chen and Li (2012).
To bound the sample error, we need upper bounds on the norms of f_T and the associated regularizing function in the RKHS:
Based on lemmas 2 and 3, the following estimate of sample error can be obtained directly:
5. Estimate of Learning Rate
We are now in a position to present the main result on the learning rate:
When , the learning rate of f_T tends to . This polynomial decay is usually fast enough for a practical problem in which only a finite set of samples is available. Although the convergence rate is slower than the result in Cao et al. (2012), our result has two advantages over the previous one: first, the dual data-dependent character (through the data-dependent hypothesis space and the ℓ1-regularizer) gives us more adaptivity and flexibility in searching for a sparse predictive function; second, the estimate is independent of any covering number capacity assumption and thus suits more general input data in separable Hilbert spaces.
As illustrated in Steinwart et al. (2005), the DLD algorithm, 1.1, can be considered a quadratic programming SVM and implemented efficiently by the LIBSVM software (Chang & Lin, 2004). Note that the ℓ1-norm algorithm, equation 1.2, can be transformed into a linear programming SVM formulation (e.g., using the optimization technique in Fung & Mangasarian, 2004). Many experiments demonstrate that the linear programming SVM is capable of solving huge sample-size problems and improving computation speed (see Bradley & Mangasarian, 2000; Pedroso & Murata, 2001). Since our main concern in this letter is to establish the approximation analysis of the coefficient-based DLD method, equation 1.2, we leave the empirical evaluation for future study.
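To make the linear programming reformulation concrete, the sketch below solves an ℓ1-coefficient hinge-loss program of the general form discussed above: min over α of (1/n) Σ_i max(0, 1 − y_i (Kα)_i) + λ‖α‖_1, rewritten as an LP by splitting α = α⁺ − α⁻ and introducing slack variables. The data, kernel width, and λ are hypothetical choices for illustration, and this is a sketch of the reformulation idea, not the letter's exact equation 1.2.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)

# Toy DLD-as-classification data (illustrative): positives from
# Q = N(0, 0.5^2), negatives from the reference mu = Uniform[-3, 3].
n = 60
x = np.concatenate([rng.normal(0.0, 0.5, n // 2),
                    rng.uniform(-3.0, 3.0, n // 2)])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
K = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / 0.5**2)  # Gaussian Gram

# LP reformulation of
#   min_alpha (1/n) sum_i max(0, 1 - y_i (K alpha)_i) + lam * ||alpha||_1
# with alpha = ap - am, slacks xi >= 1 - y_i (K alpha)_i,
# and variable vector z = (ap, am, xi), all components >= 0.
lam = 0.01
c = np.concatenate([lam * np.ones(n), lam * np.ones(n), np.ones(n) / n])
YK = y[:, None] * K
A_ub = np.hstack([-YK, YK, -np.eye(n)])   # encodes -y_i(K alpha)_i - xi_i <= -1
b_ub = -np.ones(n)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))

alpha = res.x[:n] - res.x[n:2 * n]
acc = np.mean(np.sign(K @ alpha) == y)    # training accuracy of sign(f_T)
print(res.status, round(acc, 2))
```

The ℓ1 penalty drives many coefficients to zero at an LP vertex, which is the source of the sparsity advantage mentioned in section 5.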
We considered the DLD problem with the coefficient regularization scheme 1.2. We introduced the stepping-stone and Rademacher average techniques to establish the error analysis, deduced the generalization error bound, and obtained a satisfactory estimate of the learning rate. This provides a mathematical foundation for DLD with the ℓ1-regularizer.
The authors thank the referees for their valuable comments and helpful suggestions. This work was supported partially by the National Natural Science Foundation of China under Grants No. 11001092 and No. 11071058, the Fundamental Research Funds for the Central Universities (Program Nos. 2011PY130, 2011QC022), and by the Start-up Research of University of Macau under Grant No. SRG010-FST11-TYY, Multi-Year Research of University of Macau under Grants No. MYRG187(Y1-L3)-FST11-TYY and No. MYRG205(Y1-L4)-FST11-TYY.