Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting

Abstract
Deep learning is often criticized for two serious issues that rarely exist in natural nervous systems: overfitting and catastrophic forgetting. A deep network can even memorize randomly labeled data, in which there is little knowledge behind the instance-label pairs, and when it continually learns over time by accommodating new tasks, it usually quickly overwrites the knowledge learned from previous tasks. It is well known in neuroscience that human brain reactions exhibit substantial variability even in response to the same stimulus; this phenomenon, referred to as neural variability, balances accuracy and plasticity/flexibility in the motor learning of natural nervous systems. This motivates us to design a similar mechanism, named artificial neural variability (ANV), that helps artificial neural networks inherit some advantages of “natural” neural networks. We rigorously prove that ANV acts as an implicit regularizer of the mutual information between the training data and the learned model. This result theoretically guarantees that ANV strictly improves generalizability, robustness to label noise, and robustness to catastrophic forgetting. We then devise a neural variable risk minimization (NVRM) framework and neural variable optimizers to achieve ANV for conventional network architectures in practice. Our empirical studies demonstrate that NVRM effectively relieves overfitting, label noise memorization, and catastrophic forgetting at negligible cost.
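The abstract names the NVRM framework and neural variable optimizers without specifying them. As a rough illustration only, a minimal sketch of one way such an optimizer could work is given below, assuming ANV is realized by evaluating gradients at Gaussian-perturbed weights and then updating the unperturbed weights; the class name NVRMSGD and the noise scale sigma are illustrative placeholders, not the authors' implementation or API.

```python
import torch

class NVRMSGD:
    """Hypothetical sketch of a 'neural variable' SGD step: perturb the
    weights with zero-mean Gaussian noise before the forward/backward
    pass, then update the clean (unperturbed) weights."""

    def __init__(self, params, lr=0.01, sigma=0.01):
        self.params = list(params)
        self.lr = lr
        self.sigma = sigma
        self._noise = []

    def perturb(self):
        # Sample weight noise u ~ N(0, sigma^2 I) and add it in place,
        # so the next forward/backward pass sees the perturbed weights.
        self._noise = [torch.randn_like(p) * self.sigma for p in self.params]
        for p, u in zip(self.params, self._noise):
            p.data.add_(u)

    def step(self):
        # Undo the perturbation, then apply a plain SGD update using the
        # gradient that was evaluated at the perturbed weights.
        for p, u in zip(self.params, self._noise):
            p.data.sub_(u)
            if p.grad is not None:
                p.data.add_(p.grad, alpha=-self.lr)

    def zero_grad(self):
        for p in self.params:
            if p.grad is not None:
                p.grad.zero_()

# Usage: perturb before the forward pass, step after backward.
model = torch.nn.Linear(10, 2)
opt = NVRMSGD(model.parameters(), lr=0.1, sigma=0.01)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
opt.zero_grad()
opt.perturb()
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```

Under these assumptions, the noise injection implements a single-sample estimate of the expected risk under weight perturbation, which is the intuition behind "variable" risk minimization; the exact formulation is given in the paper itself.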
Issue date: August 2021
Published online: July 26 2021
In Special Collection: CogNet
Zeke Xie
University of Tokyo, Bunkyo-ku, Tokyo 113-0333, Japan and RIKEN Center for AIP, Chuo-ku, Tokyo 103-0027, Japan [email protected]
Fengxiang He
University of Sydney, Level 1, Chippendale NSW 2008, Australia [email protected]
Shaopeng Fu
University of Sydney, Level 1, Chippendale NSW 2008, Australia [email protected]
Issei Sato
University of Tokyo, Bunkyo-ku, Tokyo 113-0333, Japan, and RIKEN Center for AIP, Chuo-ku, Tokyo 103-0027, Japan [email protected]
Dacheng Tao
University of Sydney, Level 1, Chippendale NSW 2008, Australia [email protected]
Masashi Sugiyama
RIKEN Center for AIP, Chuo-ku, Tokyo 103-0027, Japan, and University of Tokyo, Bunkyo-ku, Tokyo 113-0333, Japan [email protected]
Online ISSN: 1530-888X
Print ISSN: 0899-7667
© 2021 Massachusetts Institute of Technology
Neural Computation (2021) 33 (8): 2163–2192.
Article history
Received: November 13 2020
Accepted: February 22 2021
Citation
Zeke Xie, Fengxiang He, Shaopeng Fu, Issei Sato, Dacheng Tao, Masashi Sugiyama; Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting. Neural Comput 2021; 33 (8): 2163–2192. doi: https://doi.org/10.1162/neco_a_01403