We analyze online gradient descent learning from finite training sets at noninfinitesimal learning rates η. Exact results are obtained for the time-dependent generalization error of a simple model system: a linear network with a large number of weights N, trained on p = αN examples. This allows us to study in detail the effects of finite training set size α on, for example, the optimal choice of learning rate η. We also compare online and offline learning, for respective optimal settings of η at given final learning time. Online learning turns out to be much more robust to input bias and actually outperforms offline learning when such bias is present; for unbiased inputs, online and offline learning perform almost equally well.
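The setup described above can be illustrated with a minimal numerical sketch. It is not the exact analysis of the paper. It assumes a noiseless linear teacher, unbiased Gaussian inputs, a 1/√N output scaling, and a generalization error defined as half the squared teacher-student weight distance per weight; none of these details are given in the abstract. The function name `online_gd_linear` and its parameters are hypothetical.

```python
import numpy as np

def online_gd_linear(N=50, alpha=2.0, eta=0.5, sweeps=20, seed=0):
    """Online gradient descent for a linear student on a finite training set.

    Assumptions (not taken from the abstract): noiseless linear teacher,
    i.i.d. standard normal inputs, outputs scaled by 1/sqrt(N).
    """
    rng = np.random.default_rng(seed)
    p = int(alpha * N)                      # training set size p = alpha * N
    X = rng.standard_normal((p, N))         # fixed, finite set of inputs
    w_teacher = rng.standard_normal(N)      # hypothetical teacher weights
    y = X @ w_teacher / np.sqrt(N)          # noiseless teacher outputs

    w = np.zeros(N)                         # student starts at zero
    errors = [0.5 * np.sum((w - w_teacher) ** 2) / N]
    for _ in range(sweeps * N):
        mu = rng.integers(p)                # draw one example at random
        x = X[mu]
        residual = x @ w / np.sqrt(N) - y[mu]
        # Gradient step on the squared error of this single example.
        w -= eta * residual * x / np.sqrt(N)
        errors.append(0.5 * np.sum((w - w_teacher) ** 2) / N)
    return np.array(errors)

errs = online_gd_linear()
print("initial error:", errs[0], "final error:", errs[-1])
```

Varying `eta` and `alpha` in this sketch gives a rough feel for how the learning rate and the training set size interact. Under the assumed scaling each update shrinks the weight error along the chosen input by a factor of roughly 1 − η, so values of η well below 2 keep the updates stable.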
© 1998 Massachusetts Institute of Technology