Peter L. Bartlett
1-7 of 7 results
Journal Articles
Publisher: Journals Gateway
Neural Computation (2019) 31 (3): 477–502.
Published: 01 March 2019
Abstract
We analyze algorithms for approximating a function $f(x) = \Phi x$ mapping $\Re^d$ to $\Re^d$ using deep linear neural networks, that is, that learn a function $h$ parameterized by matrices $\Theta_1, \ldots, \Theta_L$ and defined by $h(x) = \Theta_L \Theta_{L-1} \cdots \Theta_1 x$. We focus on algorithms that learn through gradient descent on the population quadratic loss in the case that the distribution over the inputs is isotropic. We provide polynomial bounds on the number of iterations for gradient descent to approximate the least-squares matrix $\Phi$, in the case where the initial hypothesis $\Theta_1 = \cdots = \Theta_L = I$ has excess loss bounded by a small enough constant. We also show that gradient descent fails to converge for $\Phi$ whose distance from the identity is a larger constant, and we show that some forms of regularization toward the identity in each layer do not help. If $\Phi$ is symmetric positive definite, we show that an algorithm that initializes $\Theta_i = I$ learns an $\epsilon$-approximation of $f$ using a number of updates polynomial in $L$, the condition number of $\Phi$, and $\log(d/\epsilon)$. In contrast, we show that if the least-squares matrix $\Phi$ is symmetric and has a negative eigenvalue, then all members of a class of algorithms that perform gradient descent with identity initialization, and optionally regularize toward the identity in each layer, fail to converge. We analyze an algorithm for the case that $\Phi$ satisfies $u^\top \Phi u > 0$ for all $u$ but may not be symmetric. This algorithm uses two regularizers: one that maintains the invariant $u^\top \Theta_L \Theta_{L-1} \cdots \Theta_1 u > 0$ for all $u$, and the other that "balances" $\Theta_1, \ldots, \Theta_L$ so that they have the same singular values.
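As a rough illustration of the setting in this abstract, the sketch below runs plain gradient descent from the identity initialization on $\tfrac{1}{2}\|\Theta_L \cdots \Theta_1 - \Phi\|_F^2$, which is the population quadratic loss (up to constants) when the inputs are isotropic. It is a minimal sketch, not the paper's algorithm or analysis: the target $\Phi$, depth, step size, and iteration count are arbitrary illustrative choices, and the helper names `product` and `gradient_step` are introduced here for the example.

```python
# Minimal sketch (assumptions noted above): gradient descent on
#   L(Θ) = 0.5 * ||Θ_L ··· Θ_1 - Φ||_F^2,
# the population quadratic loss, up to constants, when E[x x^T] = I.
import numpy as np

def product(thetas):
    """End-to-end matrix Θ_L ··· Θ_1 (thetas[0] is Θ_1, applied first)."""
    P = np.eye(thetas[0].shape[1])
    for T in thetas:
        P = T @ P
    return P

def gradient_step(thetas, Phi, eta):
    """One gradient-descent update of every layer."""
    d, L = Phi.shape[0], len(thetas)
    E = product(thetas) - Phi                        # residual Θ_L···Θ_1 - Φ
    updated = []
    for i in range(L):
        left = product(thetas[i + 1:]) if i + 1 < L else np.eye(d)
        right = product(thetas[:i]) if i > 0 else np.eye(d)
        grad = left.T @ E @ right.T                  # gradient w.r.t. layer i
        updated.append(thetas[i] - eta * grad)
    return updated

d, depth, eta = 4, 3, 0.01
rng = np.random.default_rng(0)
B = 0.1 * rng.normal(size=(d, d))
Phi = np.eye(d) + B @ B.T              # symmetric positive definite, near identity
thetas = [np.eye(d) for _ in range(depth)]           # identity initialization
for _ in range(2000):
    thetas = gradient_step(thetas, Phi, eta)
print(np.linalg.norm(product(thetas) - Phi))         # residual shrinks toward 0
```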
Journal Articles
Publisher: Journals Gateway
Neural Computation (2000) 12 (5): 1207–1245.
Published: 01 May 2000
Abstract
We propose a new class of support vector algorithms for regression and classification. In these algorithms, a parameter ν lets one effectively control the number of support vectors. While this can be useful in its own right, the parameterization has the additional benefit of enabling us to eliminate one of the other free parameters of the algorithm: the accuracy parameter ε in the regression case, and the regularization constant C in the classification case. We describe the algorithms, give some theoretical results concerning the meaning and the choice of ν, and report experimental results.
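For readers who want to try the ν-parameterization, scikit-learn ships ν-SVM solvers (`NuSVC`, `NuSVR`) that follow this formulation, with `nu` taking the place of the regularization constant C in classification and of the accuracy parameter ε in regression. The snippet below is only a usage sketch on toy data; the data, kernel, and `nu` values are arbitrary. It prints the fraction of training points that become support vectors, which ν lower-bounds.

```python
# Usage sketch (toy data; assumptions noted above): ν controls the support vectors.
import numpy as np
from sklearn.svm import NuSVC, NuSVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_cls = (X[:, 0] + X[:, 1] > 0).astype(int)           # toy binary labels
y_reg = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)  # toy regression target

for nu in (0.1, 0.3, 0.6):
    clf = NuSVC(nu=nu, kernel="rbf").fit(X, y_cls)
    reg = NuSVR(nu=nu, kernel="rbf").fit(X, y_reg)
    # ν is a lower bound on the fraction of support vectors.
    print(f"nu={nu}: "
          f"SV fraction (classification) = {clf.support_.size / len(X):.2f}, "
          f"SV fraction (regression) = {reg.support_.size / len(X):.2f}")
```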
Journal Articles
Publisher: Journals Gateway
Neural Computation (1998) 10 (8): 2159–2173.
Published: 15 November 1998
Abstract
We compute upper and lower bounds on the VC dimension and pseudodimension of feedforward neural networks composed of piecewise polynomial activation functions. We show that if the number of layers is fixed, then the VC dimension and pseudodimension grow as $W \log W$, where $W$ is the number of parameters in the network. This result stands in opposition to the case where the number of layers is unbounded, in which case the VC dimension and pseudodimension grow as $W^2$. We combine our results with recently established approximation error rates and determine error bounds for the problem of regression estimation by piecewise polynomial networks with unbounded weights.
Journal Articles
Publisher: Journals Gateway
Neural Computation (1997) 9 (4): 765–769.
Published: 15 May 1997
Abstract
The earlier article gives lower bounds on the VC-dimension of various smoothly parameterized function classes. The results were proved by showing a relationship between the uniqueness of decision boundaries and the VC-dimension of smoothly parameterized function classes. The proof is incorrect; there is no such relationship under the conditions stated in the article. For the case of neural networks with tanh activation functions, we give an alternative proof of a lower bound for the VC-dimension proportional to the number of parameters, which holds even when the magnitude of the parameters is restricted to be arbitrarily small.
Journal Articles
Publisher: Journals Gateway
Neural Computation (1996) 8 (3): 625–628.
Published: 01 April 1996
Abstract
We give upper bounds on the Vapnik-Chervonenkis dimension and pseudodimension of two-layer neural networks that use the standard sigmoid function or radial basis function and have inputs from $\{-D, \ldots, D\}^n$. In Valiant's probably approximately correct (PAC) learning framework for pattern classification, and in Haussler's generalization of this framework to nonlinear regression, the results imply that the number of training examples necessary for satisfactory learning performance grows no more rapidly than $W \log(WD)$, where $W$ is the number of weights. The previous best bound for these networks was $O(W^4)$.
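To give a quick sense of scale for the improvement claimed above (from $O(W^4)$ to $W \log(WD)$ training examples), the arithmetic below compares the two growth rates for a few hypothetical weight counts. The constant factors hidden by both bounds are ignored and the value of $D$ is an arbitrary choice, so only the relative growth is meaningful.

```python
# Illustrative growth-rate comparison only; constants in the bounds are ignored.
import math

D = 256                                   # assumed input range {-D, ..., D}^n
for W in (10**2, 10**3, 10**4):
    print(f"W={W}: W*log(W*D) = {W * math.log(W * D):.3g}, W^4 = {W**4:.3g}")
```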
Journal Articles
Publisher: Journals Gateway
Neural Computation (1995) 7 (5): 1040–1053.
Published: 01 September 1995
Abstract
We examine the relationship between the VC dimension and the number of parameters of a threshold smoothly parameterized function class. We show that the VC dimension of such a function class is at least $k$ if there exists a $k$-dimensional differentiable manifold in the parameter space such that each member of the manifold corresponds to a different decision boundary. Using this result, we are able to obtain lower bounds on the VC dimension proportional to the number of parameters for several thresholded function classes, including two-layer neural networks with certain smooth activation functions and radial basis functions with a Gaussian basis. These lower bounds hold even if the magnitudes of the parameters are restricted to be arbitrarily small. In Valiant's probably approximately correct learning framework, this implies that the number of examples necessary for learning these function classes is at least linear in the number of parameters.
Journal Articles
Publisher: Journals Gateway
Neural Computation (1993) 5 (3): 371–373.
Published: 01 May 1993
Abstract
We show that the Vapnik-Chervonenkis dimension of the class of functions that can be computed by arbitrary two-layer or some completely connected three-layer threshold networks with real inputs is at least linear in the number of weights in the network. In Valiant's "probably approximately correct" learning framework, this implies that the number of random training examples necessary for learning in these networks is at least linear in the number of weights.