Yunwen Lei
1-4 of 4
Journal Articles
Neural Computation (2025) 37 (2): 344–402.
Published: 21 January 2025
Generalization Guarantees of Gradient Descent for Shallow Neural Networks
Abstract
Significant progress has been made recently in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scaling, where network scaling corresponds to the normalization of the layers. In this article, we greatly extend the previous work (Lei et al., 2022; Richards & Kuzborskij, 2021) by conducting a comprehensive stability and generalization analysis of GD for two-layer and three-layer NNs. For two-layer NNs, our results are established under general network scaling, relaxing previous conditions. For three-layer NNs, our technical contribution lies in demonstrating a nearly co-coercive property of GD by utilizing a novel induction strategy that thoroughly explores the effects of overparameterization. As a direct application of our general findings, we derive the excess risk rate of O(1/√n) for GD in both two-layer and three-layer NNs. This sheds light on sufficient or necessary conditions for underparameterized and overparameterized NNs trained by GD to attain the desired risk rate of O(1/√n). Moreover, we demonstrate that as the scaling factor increases or the network complexity decreases, less overparameterization is required for GD to achieve the desired error rates. Additionally, under a low-noise condition, we obtain a fast risk rate of O(1/n) for GD in both two-layer and three-layer NNs.
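As a purely illustrative picture of the setting, the sketch below trains a two-layer ReLU network of width m by full-batch gradient descent, with the output scaled by 1/m^c for a scaling factor c; the architecture, loss, and variable names are assumptions for illustration and not the article's exact formulation.

import numpy as np

def two_layer_nn(W, a, X, c):
    # Two-layer (one-hidden-layer) ReLU network with hidden width m,
    # output scaled by 1/m^c, where c is the network scaling factor.
    m = W.shape[0]
    hidden = np.maximum(X @ W.T, 0.0)          # (n, m) hidden activations
    return hidden @ a / (m ** c)               # (n,) scalar outputs

def gd_step(W, a, X, y, c, lr):
    # One full-batch gradient descent step on the squared loss;
    # only the hidden-layer weights W are trained here for simplicity.
    m, n = W.shape[0], len(y)
    pre = X @ W.T                              # (n, m) pre-activations
    resid = np.maximum(pre, 0.0) @ a / (m ** c) - y
    mask = (pre > 0).astype(float)             # ReLU derivative
    grad_W = ((resid[:, None] * mask) * a).T @ X / (m ** c * n)
    return W - lr * grad_W

rng = np.random.default_rng(0)
n, d, m, c = 100, 5, 200, 0.5                  # c = 0.5 corresponds to 1/sqrt(m) scaling
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = rng.normal(size=n)
W = rng.normal(size=(m, d))
a = rng.choice([-1.0, 1.0], size=m)
for _ in range(100):
    W = gd_step(W, a, X, y, c, lr=0.5)
print(0.5 * np.mean((two_layer_nn(W, a, X, c) - y) ** 2))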
Journal Articles
Neural Computation (2023) 35 (7): 1234–1287.
Published: 12 June 2023
Optimization and Learning With Randomly Compressed Gradient Updates
Abstract
Gradient descent methods are simple and efficient optimization algorithms with widespread applications. To handle high-dimensional problems, we study compressed stochastic gradient descent (SGD) with low-dimensional gradient updates. We provide a detailed analysis in terms of both optimization rates and generalization rates. To this end, we develop uniform stability bounds for compressed SGD (CompSGD) for both smooth and nonsmooth problems, based on which we develop almost optimal population risk bounds. We then extend our analysis to two variants of SGD: batch and mini-batch gradient descent. Furthermore, we show that these variants achieve almost the same optimal rates as their counterparts in the high-dimensional gradient setting. Thus, our results provide a way to reduce the dimension of gradient updates without affecting the convergence rate in the generalization analysis. Moreover, we show that the same result also holds in the differentially private setting, which allows us to reduce the dimension of the added noise at “almost free” cost.
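To make the idea of low-dimensional gradient updates concrete, here is a minimal sketch in which each stochastic gradient is compressed through a fresh random Gaussian projection before the update; the projection used here is one natural instantiation and may differ from the compression operator analyzed in the article, and all names are illustrative.

import numpy as np

def compressed_sgd(grad_fn, w0, k, lr, n_steps, rng):
    # SGD in which each stochastic gradient is replaced by its projection onto
    # a fresh random k-dimensional subspace (k << d), so every update is
    # driven by a low-dimensional compressed gradient.
    w, d = w0.copy(), w0.shape[0]
    for _ in range(n_steps):
        g = grad_fn(w)                              # stochastic gradient in R^d
        A = rng.normal(size=(d, k)) / np.sqrt(k)    # random compression matrix
        w -= lr * (A @ (A.T @ g))                   # compress, decompress, step
    return w

rng = np.random.default_rng(0)
d, n = 50, 200
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad_fn(w):
    i = rng.integers(n)                             # single-sample gradient
    return (X[i] @ w - y[i]) * X[i]

w_hat = compressed_sgd(grad_fn, np.zeros(d), k=10, lr=0.02, n_steps=5000, rng=rng)
print(np.linalg.norm(w_hat - w_true))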
Journal Articles
Neural Computation (2017) 29 (3): 825–860.
Published: 01 March 2017
Analysis of Online Composite Mirror Descent Algorithm
Abstract
We study the convergence of the online composite mirror descent algorithm, which involves a mirror map to reflect the geometry of the data and a convex objective function consisting of a loss and a regularizer possibly inducing sparsity. Our error analysis provides convergence rates in terms of properties of the strongly convex differentiable mirror map and the objective function. For a class of objective functions with Hölder continuous gradients, the convergence rates of the excess (regularized) risk under polynomially decaying step sizes have the order after iterates. Our results improve the existing error analysis for the online composite mirror descent algorithm by avoiding averaging and removing boundedness assumptions, and they sharpen the existing convergence rates of the last iterate for online gradient descent without any boundedness assumptions. Our methodology mainly depends on a novel error decomposition in terms of an excess Bregman distance, refined analysis of self-bounding properties of the objective function, and the resulting one-step progress bounds.
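A minimal sketch of the algorithm, assuming the Euclidean mirror map Psi(w) = 0.5*||w||^2 and an L1 regularizer, in which case the composite mirror descent update reduces to a gradient step followed by soft-thresholding; the article's analysis covers general strongly convex differentiable mirror maps, and the setup below is illustrative only.

import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def online_composite_mirror_descent(grads, w0, lam, steps):
    # With Psi(w) = 0.5*||w||^2, the composite update
    #   w_{t+1} = argmin_w <g_t, w> + lam*||w||_1 + D_Psi(w, w_t)/eta_t
    # is a gradient step followed by soft-thresholding.
    w = w0.copy()
    for grad_t, eta_t in zip(grads, steps):
        w = soft_threshold(w - eta_t * grad_t(w), eta_t * lam)
    return w

rng = np.random.default_rng(0)
d, T = 20, 500
w_star = np.zeros(d); w_star[:3] = [1.0, -2.0, 0.5]       # sparse target
def make_grad(x, y):                                       # squared-loss gradient
    return lambda w: (x @ w - y) * x
xs = rng.normal(size=(T, d))
grads = [make_grad(x, x @ w_star + 0.05 * rng.normal()) for x in xs]
steps = [0.5 / (t + 1) ** 0.5 for t in range(T)]           # polynomially decaying step sizes
w_T = online_composite_mirror_descent(grads, np.zeros(d), lam=0.01, steps=steps)
print(np.round(w_T[:5], 2))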
Journal Articles
Neural Computation (2014) 26 (4): 739–760.
Published: 01 April 2014
Refined Rademacher Chaos Complexity Bounds with Applications to the Multikernel Learning Problem
Abstract
Estimating the Rademacher chaos complexity of order two is important for understanding the performance of multikernel learning (MKL) machines. In this letter, we develop a novel entropy integral for Rademacher chaos complexities. Compared with previous bounds, our result is much improved in that it introduces an adjustable parameter ε to prevent the divergence of the integral involved. Using the iteration technique of Steinwart and Scovel (2007), we also apply our Rademacher chaos complexity bound to MKL problems and improve existing learning rates.
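For intuition about the quantity being bounded, the sketch below gives a Monte Carlo estimate of an empirical order-two Rademacher chaos complexity over a small finite family of Gaussian kernels; the 1/n normalization and the kernel family are assumptions chosen for illustration, not the letter's definitions.

import numpy as np

def empirical_rademacher_chaos(kernel_matrices, n_draws, rng):
    # Monte Carlo estimate of the order-two Rademacher chaos complexity of a
    # finite kernel family: E_eps sup_K |(1/n) * sum_{i != j} eps_i eps_j K(x_i, x_j)|.
    # (The 1/n normalization is one common convention; others use 1/n^2.)
    n = kernel_matrices[0].shape[0]
    sups = []
    for _ in range(n_draws):
        eps = rng.choice([-1.0, 1.0], size=n)
        chaos = [abs(eps @ K @ eps - np.trace(K)) / n for K in kernel_matrices]
        sups.append(max(chaos))                    # supremum over the kernel class
    return float(np.mean(sups))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
kernels = [np.exp(-sq_dists / (2 * s ** 2)) for s in (0.5, 1.0, 2.0)]   # Gaussian bandwidths
print(empirical_rademacher_chaos(kernels, n_draws=200, rng=rng))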