Search results for Yiming Ying (1–4 of 4)
Journal Articles
Generalization Guarantees of Gradient Descent for Shallow Neural Networks
Neural Computation (2025) 37 (2): 344–402.
Published: 21 January 2025
Abstract
Significant progress has been made recently in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scalings, where network scaling corresponds to the normalization of the layers. In this article, we greatly extend the previous work (Lei et al., 2022; Richards & Kuzborskij, 2021) by conducting a comprehensive stability and generalization analysis of GD for two-layer and three-layer NNs. For two-layer NNs, our results are established under general network scaling, relaxing previous conditions. In the case of three-layer NNs, our technical contribution lies in demonstrating their nearly co-coercive property by utilizing a novel induction strategy that thoroughly explores the effects of overparameterization. As a direct application of our general findings, we derive the excess risk rate of O(1/√n) for GD in both two-layer and three-layer NNs. This sheds light on sufficient or necessary conditions for underparameterized and overparameterized NNs trained by GD to attain the desired risk rate of O(1/√n). Moreover, we demonstrate that as the scaling factor increases or the network complexity decreases, less overparameterization is required for GD to achieve the desired error rates. Additionally, under a low-noise condition, we obtain a fast risk rate of O(1/n) for GD in both two-layer and three-layer NNs.
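For concreteness, the following is a minimal numerical sketch, not the article's exact setting, of the kind of object being analyzed: a two-layer network whose output is divided by m^c for width m and scaling exponent c, trained by full-batch GD on a least-squares empirical risk. The sigmoid activation and the particular width, step size, and value of c are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: f(x) = (1/m**c) * sum_k a_k * sigma(w_k . x) trained by
# full-batch gradient descent; c plays the role of the network-scaling factor
# (e.g., c = 1/2 for NTK-style scaling, c = 1 for mean-field scaling).
rng = np.random.default_rng(0)
n, d, m = 200, 5, 512          # sample size, input dimension, network width
c, lr, steps = 0.5, 0.5, 200   # scaling exponent, step size, GD iterations

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)       # noisy scalar targets

W = rng.normal(size=(m, d))                           # trainable hidden weights
a = rng.choice([-1.0, 1.0], size=m)                   # fixed output weights

def forward(X, W):
    H = 1.0 / (1.0 + np.exp(-X @ W.T))                # sigmoid hidden activations
    return (H @ a) / m**c                             # scaled network output

for _ in range(steps):
    H = 1.0 / (1.0 + np.exp(-X @ W.T))
    resid = (H @ a) / m**c - y                        # f(x_i) - y_i
    # Gradient of (1/2n) * sum_i (f(x_i) - y_i)^2 with respect to W.
    dH = (resid[:, None] * a[None, :]) / (n * m**c)   # (n, m)
    W -= lr * (dH * H * (1.0 - H)).T @ X              # (m, d) gradient step

print("training MSE:", np.mean((forward(X, W) - y) ** 2))
```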
Journal Articles
Online Pairwise Learning Algorithms
Neural Computation (2016) 28 (4): 743–777.
Published: 01 April 2016
Abstract
Pairwise learning usually refers to a learning task that involves a loss function depending on pairs of examples; the most notable instances are bipartite ranking, metric learning, and AUC maximization. In this letter we study an online algorithm for pairwise learning with a least-square loss function in an unconstrained setting of a reproducing kernel Hilbert space (RKHS), which we refer to as the Online Pairwise lEaRning Algorithm (OPERA). In contrast to existing works (Kar, Sriperumbudur, Jain, & Karnick, 2013; Wang, Khardon, Pechyony, & Jones, 2012), which require that the iterates be restricted to a bounded domain or that the loss function be strongly convex, OPERA is associated with a non-strongly convex objective function and learns the target function in an unconstrained RKHS. Specifically, we establish a general theorem that guarantees the almost sure convergence of the last iterate of OPERA without any assumptions on the underlying distribution. Explicit convergence rates are derived under the condition of polynomially decaying step sizes. We also establish an interesting property of a family of widely used kernels in the setting of pairwise learning and illustrate the convergence results using such kernels. Our methodology mainly depends on the characterization of RKHSs via their associated integral operators and on probability inequalities for random variables with values in a Hilbert space.
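For intuition, the sketch below implements a simplified unconstrained online pairwise update with a least-square loss in an RKHS, in the spirit of the setting described above but not a verbatim reproduction of OPERA: the product-gaussian pairwise kernel, the score-difference pairwise targets, and the particular decaying step sizes are assumptions made only for illustration.

```python
import numpy as np

def gauss(u, v, sigma=1.0):
    """Gaussian kernel on single examples."""
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma**2))

def pair_kernel(p, q):
    """Kernel on pairs of examples, built as a product of single-example kernels."""
    (u1, v1), (u2, v2) = p, q
    return gauss(u1, u2) * gauss(v1, v2)

rng = np.random.default_rng(1)
T, d = 30, 3
X = rng.normal(size=(T, d))
y = X[:, 0] + 0.1 * rng.normal(size=T)                # scores to be compared pairwise

# The iterate is stored as a kernel expansion over the pairs seen so far:
#   f_t(u, v) = sum_p alpha_p * K_pair((u, v), pair_p)
pairs, alphas = [], []

def f(u, v):
    return sum(a * pair_kernel((u, v), p) for a, p in zip(alphas, pairs))

for t in range(1, T):
    gamma = 0.5 / np.sqrt(t)                          # polynomially decaying step size
    xt, yt = X[t], y[t]
    # Least-squares residuals over all pairs (x_t, x_j), j < t, at the current
    # iterate f_t; the pairwise target here is the score difference y_t - y_j.
    resids = [f(xt, X[j]) - (yt - y[j]) for j in range(t)]
    # Unconstrained gradient step in the RKHS, averaged over the t pairs.
    for j, r in enumerate(resids):
        pairs.append((xt, X[j]))
        alphas.append(-gamma * r / t)

print("f(x_0, x_1) =", f(X[0], X[1]), " score gap y_0 - y_1 =", y[0] - y[1])
```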
Journal Articles
Guaranteed Classification via Regularized Similarity Learning
Neural Computation (2014) 26 (3): 497–522.
Published: 01 March 2014
Abstract
Learning an appropriate (dis)similarity function from the available data is a central problem in machine learning, since the success of many machine learning algorithms critically depends on the choice of a similarity function to compare examples. Although many approaches to similarity metric learning have been proposed, there has been little theoretical study of the links between similarity metric learning and the classification performance of the resulting classifier. In this letter, we propose regularized similarity learning formulations associated with general matrix norms and establish their generalization bounds. We show that the generalization error of the resulting linear classifier can be bounded by the derived generalization bound of similarity learning. This shows that good generalization of the learned similarity function guarantees good classification by the resulting linear classifier. Our results extend and improve those obtained by Bellet, Habrard, and Sebban (2012). Because their techniques depend on the notion of uniform stability (Bousquet & Elisseeff, 2002), the bound obtained there holds true only for Frobenius matrix-norm regularization. Our techniques, based on the Rademacher complexity (Bartlett & Mendelson, 2002) and a related Khinchin-type inequality, enable us to establish bounds for regularized similarity learning formulations associated with general matrix norms, including the sparse L1-norm and the mixed (2,1)-norm.
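As a rough illustration of one formulation in this spirit (not necessarily the letter's exact objective), the sketch below learns a bilinear similarity x^T M x' by minimizing an averaged pairwise hinge loss with a sparse L1 matrix-norm penalty via proximal subgradient steps; the toy data, pair labels, regularization weight, and step size are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam, lr, steps = 60, 4, 0.01, 0.1, 300

# Toy two-class data: the class is the sign of the first coordinate.
X = rng.normal(size=(n, d))
labels = np.sign(X[:, 0])
R = np.outer(labels, labels)          # pair label r_ij = +1 same class, -1 otherwise

M = np.zeros((d, d))                  # bilinear similarity s_M(x, x') = x^T M x'

for _ in range(steps):
    S = X @ M @ X.T                                   # s_ij = x_i^T M x_j
    active = (R * S < 1.0).astype(float)              # pairs with positive hinge loss
    np.fill_diagonal(active, 0.0)                     # ignore i == j pairs
    # Subgradient in M of the averaged pairwise hinge loss [1 - r_ij * s_ij]_+.
    G = -(X.T @ (active * R) @ X) / (n * (n - 1))
    M -= lr * G
    # Proximal (soft-thresholding) step for the sparse L1 penalty lam * ||M||_1.
    M = np.sign(M) * np.maximum(np.abs(M) - lr * lam, 0.0)

print("nonzero entries of learned M:", int(np.count_nonzero(M)))
```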
Journal Articles
Rademacher Chaos Complexities for Learning the Kernel Problem
Neural Computation (2010) 22 (11): 2858–2886.
Published: 01 November 2010
Abstract
We develop a novel generalization bound for learning the kernel problem. First, we show that the generalization analysis of the kernel learning problem reduces to investigation of the suprema of the Rademacher chaos process of order 2 over candidate kernels, which we refer to as Rademacher chaos complexity. Next, we show how to estimate the empirical Rademacher chaos complexity by well-established metric entropy integrals and pseudo-dimension of the set of candidate kernels. Our new methodology mainly depends on the principal theory of U-processes and entropy integrals. Finally, we establish satisfactory excess generalization bounds and misclassification error rates for learning gaussian kernels and general radial basis kernels.
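The complexity at the center of this analysis can be estimated numerically for a finite candidate family. The sketch below gives a Monte Carlo estimate of an order-2 Rademacher chaos supremum over a small grid of gaussian kernel bandwidths; the 1/n normalization and the finite candidate grid are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, n_draws = 100, 3, 200
X = rng.normal(size=(n, d))

# Candidate kernels: gaussian kernels on a small grid of bandwidths.
sigmas = [0.5, 1.0, 2.0, 4.0]
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
kernels = [np.exp(-sq_dists / (2.0 * s**2)) for s in sigmas]

def chaos(K, eps):
    """Homogeneous order-2 chaos: sum over i != j of eps_i eps_j K(x_i, x_j)."""
    return eps @ K @ eps - np.trace(K)    # subtract the diagonal (i == j) terms

# Monte Carlo estimate of E_eps sup_K |(1/n) * sum_{i != j} eps_i eps_j K(x_i, x_j)|.
draws = []
for _ in range(n_draws):
    eps = rng.choice([-1.0, 1.0], size=n)              # Rademacher variables
    draws.append(max(abs(chaos(K, eps)) for K in kernels) / n)

print("empirical Rademacher chaos complexity estimate:", np.mean(draws))
```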