A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample-efficient learning. However, reinforcement learning agents have only recently been endowed with this capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.
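The idea of hindsight described in the abstract can be illustrated with a toy sketch. It is not taken from the letter and does not reproduce the hindsight policy gradient estimators developed there. It only shows how a single trajectory, collected while one goal was intended, can be re-evaluated against other goals under a sparse reward. The bit-flipping environment and the names `step`, `sparse_reward`, `rollout`, and `hindsight_returns` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

State = Tuple[int, ...]


def step(state: State, action: int) -> State:
    """Flip the bit at index `action` (hypothetical toy environment)."""
    flipped = list(state)
    flipped[action] = 1 - flipped[action]
    return tuple(flipped)


def sparse_reward(state: State, goal: State) -> float:
    """Sparse reward: 1 only when the state matches the goal exactly."""
    return 1.0 if state == goal else 0.0


def rollout(start: State, actions: List[int]) -> List[State]:
    """Return the states visited after each action in `actions`."""
    states, state = [], start
    for action in actions:
        state = step(state, action)
        states.append(state)
    return states


def hindsight_returns(states: List[State], goals: List[State]) -> Dict[State, float]:
    """Re-evaluate one trajectory against every candidate goal."""
    return {goal: sum(sparse_reward(s, goal) for s in states) for goal in goals}


start = (0, 0, 0)
intended_goal = (1, 1, 1)

# The agent intended (1, 1, 1) but this trajectory never reaches it.
visited = rollout(start, actions=[0, 1, 1])

# In hindsight, the same trajectory still carries reward information
# about the goals that it did happen to achieve along the way.
returns = hindsight_returns(visited, [intended_goal, (1, 0, 0), (1, 1, 0)])
for goal, value in returns.items():
    print(goal, value)
```

Under the intended goal the trajectory yields no reward at all, which is the typical situation in a sparse-reward environment, while the alternative goals receive nonzero returns. How such information can be used to update a goal-conditional policy without introducing bias is the subject of the letter itself.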
June 2021
May 13 2021
Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients
In Special Collection:
CogNet
Paulo Rauber
Queen Mary University of London, London E1 4FZ, U.K.
Avinash Ummadisingu
Preferred Networks, Tokyo 100-0004, Japan
Filipe Mutz
Instituto Federal do Espírito Santo, Espírito Santo 29056-264, Brazil
Jürgen Schmidhuber
Istituto Dalle Molle di studi sull'intelligenza artificiale, 6962 Viganello, Switzerland; Università della Svizzera Italiana, 6900 Lugano, Switzerland; Scuola universitaria professionale della Svizzera italiana, 6928 Manno, Switzerland; and NNAISENSE, 6900 Lugano, Switzerland
Received: January 31 2020
Accepted: August 17 2020
Online ISSN: 1530-888X
Print ISSN: 0899-7667
© 2021 Massachusetts Institute of Technology
Neural Computation (2021) 33 (6): 1498–1553.
Citation
Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Jürgen Schmidhuber; Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients. Neural Comput 2021; 33 (6): 1498–1553. doi: https://doi.org/10.1162/neco_a_01387