Training loss curves. The training loss of the network (mean-squared error for sine wave generation, cross-entropy for the MNIST tasks), averaged over 10 random initializations, is plotted against training epoch. The shading indicates a moving average of the standard deviation across initializations. On average, all network types converge on a solution within the training time on every task except P-MNIST.
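As an illustration of how curves of this kind can be summarized, the following is a minimal sketch, not the authors' plotting code. It assumes the per-run losses are stored as an array of shape (runs, epochs). The function name `summarize_loss_curves`, the window length of 5, the edge padding, and the use of the population standard deviation are all assumptions, since the caption does not specify them. The data are synthetic.

```python
import numpy as np


def summarize_loss_curves(losses, window=5):
    """Return the mean curve and a moving-average-smoothed std curve.

    losses: array-like of shape (n_runs, n_epochs), one row per random
    initialization. `window` is a hypothetical smoothing length; the
    source does not state which value was used.
    """
    losses = np.asarray(losses, dtype=float)
    mean = losses.mean(axis=0)  # average over initializations
    std = losses.std(axis=0)    # spread over initializations (population std)

    # Pad with edge values so the smoothed curve keeps length n_epochs.
    pad_left = window // 2
    pad_right = window - 1 - pad_left
    padded = np.pad(std, (pad_left, pad_right), mode="edge")

    # Uniform moving average of the standard deviation.
    kernel = np.ones(window) / window
    smoothed_std = np.convolve(padded, kernel, mode="valid")
    return mean, smoothed_std


# Synthetic example: 10 runs of a noisy, exponentially decaying loss.
rng = np.random.default_rng(0)
epochs = np.arange(100)
losses = np.exp(-epochs / 20.0)[None, :] + 0.05 * rng.standard_normal((10, 100))
mean, band = summarize_loss_curves(losses, window=5)
```

A shaded band could then be drawn between `mean - band` and `mean + band`, for example with Matplotlib's `fill_between`. Whether the original figure shades one standard deviation or some other multiple is not stated in the caption.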