Abstract
We consider the problem of designing local reinforcement learning rules for
artificial neural network (ANN) controllers. Motivated by the universal
approximation properties of ANNs, we adopt an ANN representation for the
learning rules, which are optimized using evolutionary algorithms. We evaluate
the ANN rules in partially observable versions of four tasks: mountain car,
acrobot, cart-pole balancing, and nonstationary mountain car. To test whether
such evolved ANN-based learning rules perform satisfactorily, we compare their
performance with that of SARSA(λ) with tile coding, when the latter is provided
with either full or partial state information. The comparison shows that the
evolved rules perform much better than SARSA(λ) with partial state information
and are comparable to SARSA(λ) with full state information, while in the case
of the nonstationary environment,
the evolved rule is much more adaptive. It is therefore clear that the proposed
approach can be particularly effective in both partially observable and
nonstationary environments. Moreover, it could potentially be used to create
more general rules that apply across multiple domains and transfer learning
scenarios.