Abstract
One of the challenges of researching spiking neural networks (SNNs) is the translation from temporal spiking behavior to classic controller output. While many encoding schemes exist to facilitate this translation, there are few benchmarks for neural networks that inherently utilize a temporal controller. In this work, we consider the common reinforcement learning problem of animat locomotion in an environment suited for evaluating SNNs. Using this problem, we explore novel methods of reward distribution as they impact learning. Hebbian learning, in the form of spike-time-dependent plasticity (STDP), is modulated by a dopamine signal and affected by reward-induced neural activity. Different reward strategies are parameterized, and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is used to find the best strategies for fixed animat morphologies. The contribution of this work is twofold: to cast the problem of animat locomotion in a form directly applicable to simple temporal controllers, and to demonstrate novel methods for reward-modulated Hebbian learning.