Abstract

In order to make lifelike, versatile learning adaptive in the artificial domain, one needs a very diverse set of behaviors to learn. We propose a parameterized distribution of classic control-style tasks with minimal information shared between tasks. We discuss what makes a task trivial and offer a basic metric, time in convergence, that measures triviality. We then investigate analytic and empirical approaches to generating reward structures for tasks based on their dynamics in order to minimize triviality. Contrary to our expectations, populations evolved on reward structures that incentivized the most stable locations in state space spend the least time in convergence as we have defined it, because of the outsized importance our metric assigns to behavior fine-tuning in these contexts. This work paves the way towards an understanding of which task distributions enable the development of learning.

This content is only available as a PDF.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.