Abstract
In complex, open-ended, real-world environments, it is typically impossible to identify unambiguously the rewards that drive an entity's behaviour. Nonetheless, goals and their associated behaviours do emerge and are dynamically updated. Reproducing such dynamics in models would be highly desirable in many domains. The simulation experiments described here assess a candidate mechanism for dynamic reward updating through learning and inheritance, and successfully demonstrate the abandonment of an initially rewarded but ultimately detrimental behaviour.
Issue Section:
General Conference
This content is only available as a PDF.
© 2023 Massachusetts Institute of Technology. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.