Unambiguous identification of the rewards driving the behaviours of entities operating in complex, open-ended, real-world environments is typically not possible. Nonetheless, goals and their associated behaviours do emerge and are dynamically updated. Reproducing such dynamics in models would be highly desirable in many domains. The simulation experiments described here assess a candidate mechanism for dynamic reward updating through learning and inheritance, and demonstrate the abandonment of an initially rewarded but ultimately detrimental behaviour.
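The abstract does not specify the model, but the mechanism it names, reward weights that are adjusted by within-lifetime learning and then passed on to offspring, can be sketched in a few lines. The following is a minimal illustrative simulation, not the authors' actual model: every name and parameter here (the reward weight `w`, the payoffs, the learning rate, the mutation scale) is an assumption chosen only to show how learning plus inheritance can drive a population to abandon a behaviour that its internal reward initially favours.

```python
import random

random.seed(42)

# Hypothetical sketch (not the paper's model): each agent carries a heritable
# "reward weight" w in [0, 1] -- its propensity to perform a focal behaviour.
# The environment pays that behaviour poorly (it is detrimental), so
# within-lifetime learning pulls w down, and selection on lifetime payoff
# favours lineages that have abandoned it.

LIFETIME_TRIALS = 20    # behaviour opportunities per lifetime
LEARNING_RATE = 0.1     # within-lifetime reward-update rate
MUTATION_SIGMA = 0.02   # noise on the inherited reward weight
PERFORM_PAYOFF = 0.2    # the behaviour is ultimately detrimental
ABSTAIN_PAYOFF = 1.0

def live(w):
    """Run one lifetime; return (final learned weight, mean payoff)."""
    total = 0.0
    for _ in range(LIFETIME_TRIALS):
        if random.random() < w:
            payoff = PERFORM_PAYOFF
            # Reinforcement-style update: move w toward the payoff received.
            w += LEARNING_RATE * (payoff - w)
        else:
            payoff = ABSTAIN_PAYOFF
        total += payoff
    return w, total / LIFETIME_TRIALS

def next_generation(pop):
    lived = [live(w) for w in pop]
    fitnesses = [f for _, f in lived]
    children = []
    for _ in range(len(pop)):
        # Fitness-proportional choice of a parent; offspring inherit the
        # parent's *learned* reward weight, perturbed by mutation.
        w, _ = random.choices(lived, weights=fitnesses, k=1)[0]
        child = w + random.gauss(0.0, MUTATION_SIGMA)
        children.append(min(max(child, 0.0), 1.0))
    return children

pop = [0.9] * 100            # the behaviour starts strongly rewarded
for _ in range(50):
    pop = next_generation(pop)

mean_w = sum(pop) / len(pop)
print(f"mean reward weight after 50 generations: {mean_w:.3f}")
```

Under these assumed parameters the population's mean reward weight falls well below its initial value of 0.9, i.e. the behaviour is largely abandoned even though it was internally rewarded at the start; the interplay between the learning rate, lifetime length, and mutation scale governs how quickly that happens.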

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.