1-2 of 2 results for Enric Celaya
Journal Articles
Neural Computation (2017) 29 (1): 220–246.
Published: 01 January 2017
Abstract
Function approximation in online, incremental reinforcement learning needs to deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also across different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the updating, rather than forgetting as a function of time alone, thus avoiding undesired distortions of the approximation in less sampled regions.
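For context on how a density estimate can act as a function approximator at all, one standard construction is gaussian mixture regression: fit a mixture to joint (input, target) samples and predict with the conditional mean of the target given the input. The sketch below is illustrative only and is not quoted from the paper; it assumes scalar input and target, and the parameter names (weights, means, covs) are hypothetical.

```python
import numpy as np

def gmm_conditional_mean(x, weights, means, covs):
    """Predict E[y | x] from a gaussian mixture fitted to joint (x, y) samples.

    Illustrative gaussian mixture regression for a scalar input and target.

    weights : (K,)      mixing coefficients
    means   : (K, 2)    per-component means over (x, y)
    covs    : (K, 2, 2) per-component covariances over (x, y)
    """
    mx, my = means[:, 0], means[:, 1]
    vxx, vxy = covs[:, 0, 0], covs[:, 0, 1]
    # Responsibility of each component for the query input, computed from
    # the marginal density of x under that component.
    px = weights * np.exp(-0.5 * (x - mx) ** 2 / vxx) / np.sqrt(2.0 * np.pi * vxx)
    px = np.maximum(px, 1e-300)          # numerical floor
    h = px / px.sum()
    # Conditional mean of y given x within each component (linear in x).
    cond = my + (vxy / vxx) * (x - mx)
    return float(np.dot(h, cond))
```

The same mixture also provides the local sample density at any query point, which is the quantity the abstract uses to set the local forgetting factor; the mapping from density to forgetting factor is specific to the paper and is not reproduced here.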
Journal Articles
Neural Computation (2015) 27 (5): 1142–1157.
Published: 01 May 2015
Abstract
In the online version of the EM algorithm introduced by Sato and Ishii (2000), a time-dependent discount factor is used to forget the effect of the old estimated values obtained with an earlier, inaccurate estimator. In their approach, forgetting is applied uniformly to the estimators of each mixture component, depending exclusively on time and irrespective of the weight attributed to each unit for the observed sample. This causes excessive forgetting in the less frequently sampled regions. To address this problem, we propose a modification of the algorithm that involves a weight-dependent forgetting, different for each mixture component, in which old observations are forgotten according to the actual weight of the new samples used to replace older values. A comparison of the time-dependent versus the weight-dependent approach shows that the latter improves the accuracy of the approximation and exhibits much greater stability.
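The idea of weight-dependent forgetting can be made concrete with a small online update for a one-dimensional mixture. The sketch below is one plausible realization, not the paper's exact update rules: each component's mean and second moment are exponential averages whose step for a new sample is proportional to the responsibility that component takes for it, so a unit that receives almost no weight forgets almost nothing. Keeping the mixing weights as an ordinary discounted average is an additional assumption of this sketch.

```python
import numpy as np

class OnlineGMM1D:
    """Online EM for a 1-D gaussian mixture with weight-dependent forgetting.

    Minimal sketch under stated assumptions: old values of component i are
    replaced only in proportion to the weight gamma[i] that the incoming
    sample carries for that component, rather than being discounted at a
    uniform, purely time-dependent rate.
    """

    def __init__(self, means, variances, weights, eta=0.05):
        self.mu = np.asarray(means, dtype=float)
        self.var = np.asarray(variances, dtype=float)
        self.w = np.asarray(weights, dtype=float)
        self.m2 = self.var + self.mu ** 2   # running second moment per component
        self.eta = eta                      # base learning rate (assumed constant)

    def responsibilities(self, x):
        """E-step: posterior probability of each component given x."""
        p = self.w * np.exp(-0.5 * (x - self.mu) ** 2 / self.var)
        p /= np.sqrt(2.0 * np.pi * self.var)
        p = np.maximum(p, 1e-300)           # numerical floor
        return p / p.sum()

    def update(self, x):
        """One online step on a single sample x."""
        gamma = self.responsibilities(x)
        # Mixing weights: ordinary discounted average (still sums to 1).
        self.w = (1.0 - self.eta) * self.w + self.eta * gamma
        # Weight-dependent forgetting: the discount applied to component i
        # scales with gamma[i], the weight of the new sample for that unit.
        step = self.eta * gamma
        self.mu = (1.0 - step) * self.mu + step * x
        self.m2 = (1.0 - step) * self.m2 + step * x * x
        self.var = np.maximum(self.m2 - self.mu ** 2, 1e-6)

# Example: a stream concentrated near +2 updates the nearby component,
# while the component near -2 retains its old estimate instead of decaying.
gmm = OnlineGMM1D(means=[-2.0, 2.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
for sample in np.random.randn(1000) * 0.5 + 2.0:
    gmm.update(sample)
```

The design point the sketch tries to capture is that forgetting is triggered by the arrival of new information for a unit, not by the mere passage of time, which is what protects rarely sampled regions from erosion.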