MIT Press

Figure 6:

Training loss curves. The training loss of each network, averaged over 10 random initializations, is plotted against training epoch: mean-squared error for sine wave generation and cross-entropy loss for the MNIST tasks. The shading indicates a moving average of the standard deviation across initializations. On average, all network types converge on a solution within the training time on every task except P-MNIST.
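
The summary statistics described in the caption can be sketched as follows. This is a minimal illustration and not the authors' plotting code: the function name `summarize_loss_curves`, the `window` length, the edge handling of the moving average, and the synthetic curves are all assumptions, since the caption does not specify them. It assumes the per-run losses are stored as an array of shape (initializations, epochs) and that `window` does not exceed the number of epochs.

```python
import numpy as np

def summarize_loss_curves(losses, window=5):
    """Return the mean loss per epoch and a smoothed standard deviation band.

    losses: array-like of shape (n_initializations, n_epochs).
    window: moving-average length in epochs (hypothetical value; not given in the caption).
    """
    losses = np.asarray(losses, dtype=float)
    mean_curve = losses.mean(axis=0)   # average over random initializations
    std_curve = losses.std(axis=0)     # spread over initializations at each epoch

    # Moving average of the standard deviation. Dividing by the number of
    # samples actually covered by the window avoids shrinking the band at the edges.
    kernel = np.ones(window)
    window_sums = np.convolve(std_curve, kernel, mode="same")
    window_counts = np.convolve(np.ones_like(std_curve), kernel, mode="same")
    smoothed_std = window_sums / window_counts
    return mean_curve, smoothed_std

# Synthetic placeholder data: 10 runs, 50 epochs. These are not results from the paper.
rng = np.random.default_rng(0)
epochs = np.arange(50)
losses = np.exp(-epochs / 10.0)[None, :] + 0.05 * rng.standard_normal((10, 50))
mean_curve, band = summarize_loss_curves(losses)
```

With a plotting library, the shaded region would then span `mean_curve - band` to `mean_curve + band` around the mean line.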
