Sayandev Mukherjee
Journal Articles
Publisher: Journals Gateway
Neural Computation (1999) 11 (3): 747–769.
Published: 01 April 1999
We revisit the oft-studied asymptotic (in sample size) behavior of the parameter or weight estimate returned by any member of a large family of neural network training algorithms. By properly accounting for the characteristic property of neural networks that their empirical and generalization errors possess multiple minima, we rigorously establish conditions under which the parameter estimate converges strongly into the set of minima of the generalization error. Convergence of the parameter estimate to a particular value cannot be guaranteed under our assumptions. We then evaluate the asymptotic distribution of the distance between the parameter estimate and its nearest neighbor among the set of minima of the generalization error. Results on this question have appeared numerous times and generally assert asymptotic normality, the conclusion expected from familiar statistical arguments concerned with maximum likelihood estimators. These conclusions are usually reached on the basis of somewhat informal calculations, although we shall see that the situation is somewhat delicate. The preceding results then provide a derivation of learning curves for generalization and empirical errors that leads to bounds on rates of convergence.
Neural Computation (1996) 8 (5): 1075–1084.
Published: 01 July 1996
We study the asymptotic properties of the sequence of iterates of weight-vector estimates obtained by training a feedforward neural network with a basic gradient-descent method using a fixed learning rate and no batch-processing. Earlier results based on stochastic approximation techniques (Kuan and Hornik 1991; Finnoff 1993; Bucklew et al. 1993) have established the existence of a gaussian limiting distribution for the weights, but they apply only in the limiting case of a zero learning rate. We here prove, from an exact analysis of the one-dimensional case and constant learning rate, weak convergence to a distribution that is not gaussian in general. We also run simulations to compare and contrast the results of our analysis with those of stochastic approximation.
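The setting in this abstract (online gradient descent with a constant learning rate, one weight, one sample per update) can be illustrated with a small simulation. The sketch below is not the authors' model or experiment: it assumes a single linear unit with squared-error loss, Gaussian inputs, and Gaussian label noise, and the names `simulate_online_gd` and `excess_kurtosis` are hypothetical helpers introduced only for illustration. It records the weight iterates so that their long-run empirical distribution can be inspected; a sample excess kurtosis far from zero would be one informal sign of a non-gaussian shape.

```python
import random


def simulate_online_gd(eta, n_steps, w_true=1.0, noise_std=0.5, w0=0.0, seed=0):
    """Run online gradient descent on a 1-D linear unit with a fixed learning rate.

    Assumed toy model (not taken from the paper): y = w_true * x + noise,
    loss = 0.5 * (w * x - y) ** 2, one fresh sample per update (no batching).
    Returns the list of weight iterates.
    """
    rng = random.Random(seed)
    w = w0
    trace = []
    for _ in range(n_steps):
        x = rng.gauss(0.0, 1.0)                      # input sample
        y = w_true * x + rng.gauss(0.0, noise_std)   # noisy target
        grad = (w * x - y) * x                       # derivative of the loss w.r.t. w
        w -= eta * grad                              # constant learning rate, never decayed
        trace.append(w)
    return trace


def excess_kurtosis(samples):
    """Sample excess kurtosis; it is zero for an exactly gaussian distribution."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    fourth = sum((s - mean) ** 4 for s in samples) / n
    return fourth / (var ** 2) - 3.0


if __name__ == "__main__":
    trace = simulate_online_gd(eta=0.1, n_steps=50_000)
    stationary = trace[5_000:]  # discard burn-in before looking at the long-run behavior
    print("mean weight:", sum(stationary) / len(stationary))
    print("excess kurtosis:", excess_kurtosis(stationary))
```

Because the learning rate never shrinks, the iterates keep fluctuating around the target weight instead of settling on it. Rerunning with smaller values of `eta` gives a rough feel for how the fluctuation distribution changes as the zero-learning-rate regime treated by stochastic approximation is approached.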