1-5 of 5
Yunlong Feng
Journal Articles
Publisher: Journals Gateway
Neural Computation (2021) 33 (6): 1656–1697.
Published: 13 May 2021
Abstract
In this letter, we develop a framework of empirical gain maximization (EGM) to address the robust regression problem in which heavy-tailed noise or outliers may be present in the response variable. The idea of EGM is to approximate the density function of the noise distribution rather than approximating the truth function directly, as is usual. Unlike classical maximum likelihood estimation, which assigns equal importance to all observations and can therefore be problematic in the presence of abnormal observations, EGM schemes can be interpreted from a minimum distance estimation viewpoint and allow such observations to be ignored. Furthermore, we show that several well-known robust nonconvex regression paradigms, such as Tukey regression and truncated least squares regression, can be reformulated within this new framework. We then develop a learning theory for EGM, by means of which a unified analysis can be conducted for these well-established but not fully understood regression approaches. The new framework also leads to a novel interpretation of existing bounded nonconvex loss functions. Within it, two seemingly unrelated notions, the well-known Tukey's biweight loss for robust regression and the triweight kernel for nonparametric smoothing, turn out to be closely related: we show that Tukey's biweight loss can be derived from the triweight kernel. Other bounded nonconvex loss functions frequently employed in machine learning, such as the truncated square loss, the Geman-McClure loss, and the exponential squared loss, can also be reformulated from certain smoothing kernels in statistics. In addition, the new framework enables us to devise new bounded nonconvex loss functions for robust learning.
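As a concrete illustration of the kernel-to-loss correspondence described in this abstract, the sketch below checks numerically that Tukey's biweight loss coincides with "one minus the rescaled triweight kernel," up to the usual c²/6 normalization. The tuning constant c = 4.685 is the conventional choice from robust statistics, not a value taken from the letter, and the normalization is one common convention.

```python
import numpy as np

def triweight(u):
    """Triweight smoothing kernel K(u) = (35/32)(1 - u^2)^3 on [-1, 1]."""
    return np.where(np.abs(u) <= 1, (35.0 / 32.0) * (1 - u**2) ** 3, 0.0)

def tukey_biweight_loss(t, c=4.685):
    """Tukey's biweight loss: rho(t) = (c^2/6)(1 - (1 - (t/c)^2)^3) for |t| <= c,
    and the constant c^2/6 otherwise (so gross outliers stop contributing)."""
    u = t / c
    return np.where(np.abs(t) <= c, (c**2 / 6.0) * (1 - (1 - u**2) ** 3), c**2 / 6.0)

def loss_from_kernel(t, c=4.685):
    """Rebuild the same loss from the triweight kernel: up to the scaling
    constant c^2/6, the loss is 1 - K(t/c)/K(0)."""
    return (c**2 / 6.0) * (1 - triweight(t / c) / triweight(0.0))

t = np.linspace(-8, 8, 1001)
assert np.allclose(tukey_biweight_loss(t), loss_from_kernel(t))
```

Because the triweight kernel vanishes outside [-1, 1], the derived loss is constant outside [-c, c], which is exactly the boundedness that makes the loss robust.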
Neural Computation (2021) 33 (1): 157–173.
Published: 01 January 2021
Abstract
Stemming from information-theoretic learning, the correntropy criterion and its applications to machine learning tasks have been extensively studied and explored. Its application to regression problems leads to a robustness-enhanced regression paradigm: correntropy-based regression. Alongside a great variety of successful real-world applications, its theoretical properties have recently been investigated in a series of studies from a statistical learning viewpoint. The resulting big picture is that correntropy-based regression robustly regresses toward the conditional mode function or the conditional mean function under certain conditions. Continuing this trend and going further, in this study we report some new insights into this problem. First, we show that under the additive noise regression model, such a regression paradigm can be deduced from minimum distance estimation, implying that the resulting estimator is essentially a minimum distance estimator and thus possesses robustness properties. Second, we show that this regression paradigm in fact provides a unified approach to regression problems in that it approaches the conditional mean, the conditional mode, and the conditional median functions under certain conditions. Third, we present some new results on its use in learning the conditional mean function by developing error bounds and exponential convergence rates under conditional (1+ε)-moment assumptions. The saturation effect on the established convergence rates, observed under (1+ε)-moment assumptions, still occurs, indicating the inherent bias of the regression estimator. These novel insights deepen our understanding of correntropy-based regression, help cement the theoretical correntropy framework, and enable us to investigate learning schemes induced by general bounded nonconvex loss functions.
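The robustness of correntropy-based regression can be seen in a few lines. The sketch below maximizes the empirical correntropy of the residuals for a toy linear model using the standard half-quadratic (iteratively reweighted least squares) device; the slope-only model, the data, the bandwidth σ = 5, and the iteration count are illustrative assumptions, not details from the study.

```python
import numpy as np

# Toy data: y = 2x with one gross outlier in the response variable.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 2.0, 4.0, 6.0, 100.0])  # last response is corrupted

sigma = 5.0  # kernel bandwidth of the correntropy criterion

# Maximizing the empirical correntropy (1/n) * sum_i exp(-r_i^2 / (2 sigma^2))
# via half-quadratic optimization: alternate Gaussian weights on the
# residuals with a weighted least squares fit of the slope w.
w = 0.0
weights = np.ones_like(x)
for _ in range(20):
    w = np.sum(weights * x * y) / np.sum(weights * x**2)
    r = y - w * x
    weights = np.exp(-r**2 / (2 * sigma**2))

print(w)  # close to the clean slope 2, despite the outlier
```

The first pass is ordinary least squares (slope badly pulled toward the outlier); the Gaussian weights then shrink the outlier's influence toward zero, and the fit settles on the clean slope. An ordinary least squares fit of the same data gives a slope of about 14.3.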
Neural Computation (2016) 28 (12): 2853–2889.
Published: 01 December 2016
Abstract
This letter investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here, by general we mean that many stationary stochastic processes can be included. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified treatment for analyzing learning schemes with various mixing processes can be conducted and a sharp oracle inequality for generic regularized empirical risk minimization schemes can be established. The obtained oracle inequality is then applied to derive convergence rates for several learning schemes, such as empirical risk minimization (ERM), least squares support vector machines (LS-SVMs) using given generic kernels, and SVMs using gaussian kernels for both least squares and quantile regression. It turns out that for independent and identically distributed (i.i.d.) processes, our learning rates for ERM recover the optimal rates. For non-i.i.d. processes, including geometrically α-mixing Markov processes, geometrically α-mixing processes with restricted decay, φ-mixing processes, and (time-reversed) geometrically C-mixing processes, our learning rates for SVMs with gaussian kernels match, up to some arbitrarily small extra term in the exponent, the optimal rates. For the remaining cases, our rates are at least close to the optimal rates. As a by-product, the assumed generalized Bernstein-type inequality also provides an interpretation of the so-called effective number of observations for various mixing processes.
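For orientation, the classical Bernstein inequality for i.i.d. observations, and a schematic form of the kind of generalization the letter assumes, can be written as follows. The second display is only a sketch of the structure of such assumptions (sample size replaced by an effective number of observations), not the exact inequality used in the letter.

```latex
% Classical Bernstein inequality: Z_1, ..., Z_n i.i.d.,
% |h| \le M and \mathrm{Var}(h) \le \sigma^2:
P\!\left( \left| \frac{1}{n}\sum_{i=1}^{n} h(Z_i) - \mathbb{E}h \right| \ge \varepsilon \right)
  \le 2\exp\!\left( - \frac{n\,\varepsilon^2}{2\sigma^2 + \tfrac{2}{3}M\varepsilon} \right).

% Generalized Bernstein-type inequality (schematic): n is replaced by an
% effective number of observations n_eff <= n reflecting the dependence
% structure of the process; for i.i.d. data, n_eff = n:
P\!\left( \left| \frac{1}{n}\sum_{i=1}^{n} h(Z_i) - \mathbb{E}h \right| \ge \varepsilon \right)
  \le C\exp\!\left( - \frac{n_{\mathrm{eff}}\,\varepsilon^2}{2\sigma^2 + \tfrac{2}{3}M\varepsilon} \right).
```

Read this way, the "effective number of observations" mentioned in the abstract is the sample size an i.i.d. process would need to yield the same concentration behavior as the dependent process at hand.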
Neural Computation (2016) 28 (6): 1217–1247.
Published: 01 June 2016
Abstract
This letter addresses the robustness problem that arises when learning a large margin classifier in the presence of label noise. We do so by proposing robustified large margin support vector machines. The robustness of the proposed robust support vector classifiers (RSVC), which is interpreted from a weighted viewpoint in this work, is due to the use of nonconvex classification losses. Besides being robust, the proposed RSVC is also shown to be smooth, which again benefits from the use of smooth classification losses. The idea of RSVC comes from M-estimation in statistics, since the proposed robust and smooth classification losses can be taken as one-sided cost functions in robust statistics. Its Fisher consistency property and generalization ability are also investigated. Besides robustness and smoothness, another nice property of RSVC lies in the fact that its solution can be obtained by iteratively solving weighted squared hinge loss-based support vector machine problems. We further show that each iteration is a quadratic programming problem in its dual space and can be solved by using state-of-the-art methods. We thus propose an iteratively reweighted algorithm and provide a constructive proof of its convergence to a stationary point. The effectiveness of the proposed classifiers is verified on both artificial and real data sets.
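The reweighting structure described above (repeatedly solving weighted squared hinge problems, with weights shrinking on badly fit points) can be sketched as follows. The concave transform g(t) = 1 - exp(-t) of the squared hinge, the toy data, and the plain gradient-descent inner solver are illustrative assumptions, not the specific losses or the QP solver of the letter; only the iteratively reweighted structure is mirrored.

```python
import numpy as np

# Toy 1-D data with one mislabeled point (x = 4 carries label -1).
X = np.array([2.0, 3.0, -2.0, -3.0, 4.0])
Y = np.array([1.0, 1.0, -1.0, -1.0, -1.0])  # last label is noise

def sq_hinge(margin):
    return np.maximum(0.0, 1.0 - margin) ** 2

def fit_weighted(X, Y, weights, w=0.0, b=0.0, lr=0.01, steps=5000):
    """Gradient descent on the weighted squared hinge loss of a linear model."""
    for _ in range(steps):
        slack = np.maximum(0.0, 1.0 - Y * (w * X + b))
        w -= lr * np.sum(weights * (-2.0) * slack * Y * X)
        b -= lr * np.sum(weights * (-2.0) * slack * Y)
    return w, b

# Outer loop: reweight each point by g'(loss) = exp(-loss), so points with
# large squared-hinge loss (e.g., label noise) are downweighted, which is
# the majorize-minimize scheme for the bounded loss g(sq_hinge(margin)).
weights = np.ones_like(X)
w, b = 0.0, 0.0
for _ in range(5):
    w, b = fit_weighted(X, Y, weights, w, b)
    weights = np.exp(-sq_hinge(Y * (w * X + b)))

clean_correct = np.sign(w * X[:4] + b) == Y[:4]
print(clean_correct.all())  # the four clean points are classified correctly
```

After a few rounds the mislabeled point receives a near-zero weight, and the classifier essentially fits the four clean points alone.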
Neural Computation (2016) 28 (3): 525–562.
Published: 01 March 2016
Abstract
Kernelized elastic net regularization (KENReg) is a kernelization of the well-known elastic net regularization (Zou & Hastie, 2005). The kernel in KENReg is not required to be a Mercer kernel since it learns from a kernelized dictionary in the coefficient space. Feng, Yang, Zhao, Lv, and Suykens (2014) showed that KENReg has some nice properties, including stability, sparseness, and generalization. In this letter, we continue our study of KENReg by conducting a refined learning theory analysis. This letter makes three main contributions. First, we present a refined error analysis of the generalization performance of KENReg. The main difficulty in analyzing the generalization error of KENReg lies in characterizing the population version of its empirical target function. We overcome this by introducing a weighted Banach space associated with the elastic net regularization. We are then able to conduct an elaborated learning theory analysis and obtain fast convergence rates under proper complexity and regularity assumptions. Second, we study the sparse recovery problem in KENReg with fixed design and show that the kernelization may improve the sparse recovery ability compared to the classical elastic net regularization. Finally, we discuss the interplay among the different properties of KENReg, including sparseness, stability, and generalization. We show that the stability of KENReg leads to generalization, and its sparseness confidence can be derived from generalization. Moreover, KENReg is stable and can simultaneously be sparse, which makes it attractive both theoretically and practically.
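A minimal sketch of the coefficient-space formulation described above: fit y ≈ Kα with an elastic net penalty on α, where the dictionary K is built from a sigmoid kernel, which is generally not a Mercer kernel. The data, penalty strengths, and the proximal gradient (ISTA) solver are illustrative assumptions, not the setup of the letter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data.
x = np.linspace(-1, 1, 40)
y = np.sin(np.pi * x) + 0.05 * rng.standard_normal(40)

# Kernelized dictionary from a sigmoid kernel; since KENReg learns in the
# coefficient space, the kernel need not be positive semidefinite.
K = np.tanh(2.0 * np.outer(x, x) + 0.5)

lam1, lam2 = 0.05, 0.01  # ell_1 and ell_2 regularization strengths
n = len(y)

def objective(a):
    return (0.5 / n) * np.sum((y - K @ a) ** 2) + lam1 * np.abs(a).sum() + lam2 * a @ a

# Proximal gradient (ISTA) with step 1/L, L = ||K||_2^2 / n the Lipschitz
# constant of the gradient of the data-fit term.
step = 1.0 / (np.linalg.norm(K, 2) ** 2 / n)
a = np.zeros(n)
obj0 = objective(a)
for _ in range(500):
    grad = -(K.T @ (y - K @ a)) / n
    z = a - step * grad
    # Elastic net proximal map: soft-threshold (ell_1), then shrink (ell_2).
    a = np.sign(z) * np.maximum(np.abs(z) - step * lam1, 0.0) / (1.0 + 2.0 * step * lam2)

print(objective(a) < obj0, int(np.sum(a == 0)))  # objective decreased; zero count
```

The soft-thresholding step sets many coefficients exactly to zero (sparseness), while the quadratic shrinkage stabilizes the solution across correlated dictionary atoms, echoing the sparseness-stability interplay discussed in the abstract.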