We show that minimizing the expected error of a feedforward network over a distribution of weights yields an approximation that tends to become independent of network size as the number of hidden units grows. This minimization is easy to perform, and the complexity of the function implemented by the network is regulated by the variance of the weight distribution. For a fixed variance, there is a number of hidden units beyond which the implemented function either does not change or changes only slightly, and that change tends to zero as the network grows. In sum, complexity is controlled only by the variance and not by the architecture, provided the network is large enough.
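The abstract does not state the exact objective, so the following is only a minimal sketch under stated assumptions. It assumes a Gaussian weight distribution centred on trainable means with one shared standard deviation `sigma` (so `sigma**2` plays the role of the variance), a one-hidden-layer tanh network with scalar input and output, and squared error. It estimates the expected error by Monte Carlo sampling. The architecture and the names `network_output` and `expected_error` are hypothetical illustrations, not the paper's method.

```python
import math
import random


def network_output(x, hidden_w, hidden_b, out_w):
    """Hypothetical one-hidden-layer tanh network: scalar input, scalar output."""
    return sum(v * math.tanh(w * x + b) for w, b, v in zip(hidden_w, hidden_b, out_w))


def expected_error(params, sigma, data, n_samples=1000, seed=0):
    """Monte Carlo estimate of E_w[ mean_i (y_i - f(x_i; w))^2 ].

    Assumption: every weight is drawn independently as w ~ N(mean, sigma^2),
    where the means are the trainable parameters in `params`.
    """
    rng = random.Random(seed)
    hidden_w, hidden_b, out_w = params
    total = 0.0
    for _ in range(n_samples):
        # Draw one weight vector around the current means.
        hw = [w + rng.gauss(0.0, sigma) for w in hidden_w]
        hb = [b + rng.gauss(0.0, sigma) for b in hidden_b]
        ow = [v + rng.gauss(0.0, sigma) for v in out_w]
        # Mean squared error of this sampled network on the data set.
        sq = sum((y - network_output(x, hw, hb, ow)) ** 2 for x, y in data)
        total += sq / len(data)
    return total / n_samples


if __name__ == "__main__":
    params = ([1.0], [0.0], [2.0])  # one hidden unit: f(x) = 2 tanh(x)
    data = [(0.0, 0.0), (1.0, 2.0 * math.tanh(1.0))]
    for s in (0.0, 0.1, 0.5):
        print(s, expected_error(params, s, data, n_samples=200))
```

With `sigma = 0` the estimate reduces to the ordinary training error of the mean network. Minimizing this quantity over the means for a fixed `sigma > 0` is one way to read "minimizing the expected error over a distribution of weights"; the sketch does not reproduce the paper's size-independence result.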
© 2001 Massachusetts Institute of Technology