Paulo Rauber
Journal Articles
Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits
Neural Computation (2022) 34 (11): 2232–2272.
Published: 07 October 2022

Abstract
An agent in a nonstationary contextual bandit problem must balance exploration against the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an attractive way to transform a nonstationary problem into a stationary problem that can be solved efficiently. However, even a carefully designed historical context may introduce spurious relationships or lack a convenient representation of crucial information. To address these issues, we propose an approach that learns to represent the relevant context for a decision based solely on the raw history of interactions between the agent and the environment. This approach relies on combining features extracted by recurrent neural networks with a contextual linear bandit algorithm based on posterior sampling. Our experiments on a diverse selection of contextual and noncontextual nonstationary problems show that our recurrent approach consistently outperforms its feedforward counterpart, which requires handcrafted historical contexts, while being more widely applicable than conventional nonstationary bandit algorithms. Although it is very difficult to provide theoretical performance guarantees for our new approach, we also prove a novel regret bound for linear posterior sampling with measurement error that may serve as a foundation for future theoretical work.
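As a rough illustration of the neural-linear idea the abstract describes, the sketch below pairs a recurrent encoder of the raw interaction history with per-action Bayesian linear regression and Thompson (posterior) sampling. All names, dimensions, and hyperparameters are illustrative assumptions, not the authors' implementation; in particular, the paper trains the recurrent network to predict rewards and conditions on the current context, both of which are omitted here for brevity.

```python
# Minimal sketch: recurrent neural-linear posterior sampling (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)
n_actions, d_in, d_h = 3, 4, 16   # actions, raw context size, RNN state size

# Simple (here untrained) RNN that folds the raw history into a feature vector.
W_x = rng.normal(0, 0.3, (d_h, d_in + n_actions + 1))  # input: context, one-hot action, reward
W_h = rng.normal(0, 0.3, (d_h, d_h))

def encode(history):
    """Summarize the raw history of (context, action, reward) tuples as an RNN state."""
    h = np.zeros(d_h)
    for context, action, reward in history:
        x = np.concatenate([context, np.eye(n_actions)[action], [reward]])
        h = np.tanh(W_x @ x + W_h @ h)
    return h

# Per-action Bayesian linear regression posterior, kept in precision form.
precision = [np.eye(d_h) for _ in range(n_actions)]
b = [np.zeros(d_h) for _ in range(n_actions)]
noise_var = 0.25  # assumed known observation noise

def act(h):
    """Thompson sampling: draw weights from each action's posterior, play the argmax."""
    scores = []
    for a in range(n_actions):
        mean = np.linalg.solve(precision[a], b[a])
        cov = noise_var * np.linalg.inv(precision[a])
        w = rng.multivariate_normal(mean, cov)
        scores.append(w @ h)
    return int(np.argmax(scores))

def update(a, h, reward):
    """Conjugate posterior update after observing a reward for action a."""
    precision[a] += np.outer(h, h)
    b[a] += reward * h
```

The division of labor is the point of the design: the recurrent network turns an arbitrarily long raw history into a fixed-length feature vector, while the linear-Gaussian head keeps exact posterior sampling of action values cheap.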
Journal Articles
Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients
Neural Computation (2021) 33 (6): 1498–1553.
Published: 13 May 2021

Abstract
A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such a capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.
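As a rough sketch of how hindsight can enter a policy gradient, the code below relabels a trajectory collected while pursuing one goal with an alternative goal and corrects for the mismatch with per-decision importance weights. The tabular softmax policy, the convention that a goal is a target state, and all names are illustrative assumptions, simplified from the general estimator in the paper.

```python
# Minimal sketch: a hindsight policy gradient for a goal-conditional softmax
# policy in a small discrete environment (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_goals = 5, 3, 5
theta = np.zeros((n_goals, n_states, n_actions))  # one softmax table per goal

def probs(goal, state):
    """Goal-conditioned action probabilities (numerically stable softmax)."""
    z = np.exp(theta[goal, state] - theta[goal, state].max())
    return z / z.sum()

def hindsight_gradient(trajectory, g, g_alt):
    """Policy gradient for goal g_alt, estimated from a trajectory collected
    under goal g, with reward 1 at steps where g_alt's target state is reached."""
    rewards = [1.0 if state == g_alt else 0.0 for state, _ in trajectory]
    G = np.cumsum(rewards[::-1])[::-1]  # returns-to-go
    grad = np.zeros_like(theta)
    ratio = 1.0  # running importance weight: prod_t pi(a|s, g_alt) / pi(a|s, g)
    for t, (state, action) in enumerate(trajectory):
        p_alt, p_beh = probs(g_alt, state), probs(g, state)
        ratio *= p_alt[action] / p_beh[action]
        score = -p_alt.copy()          # gradient of log softmax ...
        score[action] += 1.0           # ... with respect to the logits
        grad[g_alt, state] += ratio * G[t] * score
    return grad

# Relabel one behavior trajectory with every alternative goal.
traj = [(0, 1), (2, 0), (4, 2)]        # (state, action) pairs gathered under goal 4
for g_alt in range(n_goals):
    theta += 0.1 * hindsight_gradient(traj, g=4, g_alt=g_alt)
```

Every trajectory thus contributes gradient information for every goal, not only the one that was pursued, which is where the sample-efficiency gain in sparse-reward settings comes from.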