Abstract
Gradient learning (GL), initially proposed by Mukherjee and Zhou (2006), has proved to be a powerful tool for performing variable selection and dimension reduction simultaneously. The approach constructs a nonparametric estimator of the gradient of the target function using positive definite kernels, without estimating the target function itself, so it is widely applicable and accommodates complex interaction effects among predictors. On the theoretical side, however, existing generalization bounds for GL rely on capacity-independent techniques, and thus the capacity of the kernel classes is not fully characterized. This letter therefore studies GL estimators that minimize the empirical convex risk. We prove generalization bounds for such estimators with rates faster than previous results. Moreover, we establish a novel upper bound for the Rademacher chaos complexity of order two, which also plays an important role in general pairwise learning problems, including ranking and scoring.
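As an illustrative sketch (shown with the square loss for concreteness, and up to normalization; the letter itself works with a general convex loss), the gradient learning estimator of Mukherjee and Zhou (2006) takes a pairwise form, where $\vec{f} = (f_1, \dots, f_p)$ collects the RKHS components estimating the partial derivatives, $w_{i,j}$ are assumed locality weights (e.g., Gaussian), and $\lambda > 0$ is the regularization parameter:
\[
  \vec{f}_{\mathbf{z}}
  = \arg\min_{\vec{f} \in \mathcal{H}_K^p}
    \frac{1}{n^2} \sum_{i,j=1}^{n} w_{i,j}
    \bigl( y_i - y_j + \vec{f}(x_i) \cdot (x_j - x_i) \bigr)^2
    + \lambda \|\vec{f}\|_K^2 .
\]
Because the empirical risk is a double sum over pairs of samples, its complexity is naturally controlled by Rademacher chaos of order two rather than by ordinary Rademacher complexity, which is what links the analysis to other pairwise problems such as ranking and scoring.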