Newton methods that find critical points on a linear network fail on a nonlinear network. (A, B) Newton-MR on a linear autoencoder applied to multivariate Gaussian data, as in Frye et al. (2019). (A) Squared gradient norms of the loss, as a function of the parameters, across iterations of Newton-MR, colored by whether the squared gradient norm at the end of the run (early termination or 1000 epochs, whichever comes first) is below 1e-8 (blue) or not (orange). (B) The loss and Morse index of putative and actual critical points, with ground truth. The Morse index is defined here as the fraction of negative eigenvalues of the Hessian. Analytically derived critical points are in gray, points from the end of runs that terminate below a squared gradient norm of 1e-8 are in light blue, and points from trajectories stopped early, once they pass a squared gradient norm of 1e-2, are in dark red. (C, D) As in panels A and B, on the same network architecture and data, but with Swish (Ramachandran, Zoph, & Le, 2017) nonlinear activations instead of identity activations. (D) Loss and Morse index of putative critical points. Points with squared gradient norm above 1e-8 are in orange, those below 1e-8 in blue. Analytical expressions for critical points are not available for this nonlinear network.
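
The two quantities plotted in these panels, the squared gradient norm and the Morse index as defined in the caption, can be illustrated with a minimal NumPy sketch. The function names `squared_gradient_norm` and `morse_index`, the `tol` parameter, and the toy saddle function are assumptions made for illustration; none of them come from the paper, and this is not the authors' code. Only the 1e-8 threshold is taken from the caption.

```python
import numpy as np


def squared_gradient_norm(grad):
    """Return the squared Euclidean norm of a gradient vector."""
    grad = np.asarray(grad, dtype=float).ravel()
    return float(np.dot(grad, grad))


def morse_index(hessian, tol=0.0):
    """Return the fraction of Hessian eigenvalues that are negative.

    `tol` is a hypothetical tolerance: eigenvalues above -tol are not
    counted as negative. The Hessian is assumed to be symmetric.
    """
    hessian = np.asarray(hessian, dtype=float)
    eigvals = np.linalg.eigvalsh(hessian)  # eigenvalues of a symmetric matrix
    return float(np.mean(eigvals < -tol))


# Toy example (an assumption, not from the paper): f(x, y) = x^2 - y^2,
# which has a saddle point at the origin.
def toy_grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])


toy_hessian = np.array([[2.0, 0.0],
                        [0.0, -2.0]])

point = np.array([0.0, 0.0])
sq_norm = squared_gradient_norm(toy_grad(point))

# Threshold used in the caption to separate the blue and orange points.
is_putative_critical = sq_norm < 1e-8

index = morse_index(toy_hessian)
print(sq_norm, is_putative_critical, index)
```

For the toy saddle, one of the two Hessian eigenvalues is negative, so the index is one half. For a network loss, the Hessian would instead be evaluated at the parameters where a Newton-MR run ended.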