Djordje Grbic
Proceedings Papers
ALIFE 2021: The 2021 Conference on Artificial Life, 108 (July 19–23, 2021). doi: 10.1162/isal_a_00451
Abstract
Neural Cellular Automata (NCAs) have been proven effective in simulating morphogenetic processes, the continuous construction of complex structures from very few starting cells. Recent developments in NCAs have focused on the 2D domain, namely reconstructing target images from a single pixel or growing infinite 2D textures. In this work, we propose an extension of NCAs to 3D, utilizing 3D convolutions in the proposed neural network architecture. Minecraft is selected as the environment for our automaton since it allows the generation of both static structures and moving machines. We show that despite their simplicity, NCAs are capable of growing complex entities such as castles, apartment blocks, and trees, some of which are composed of over 3,000 blocks. Additionally, when trained for regeneration, the system is able to regrow parts of simple functional machines, significantly expanding the capabilities of simulated morphogenetic systems. The code for the experiments in this paper can be found at https://github.com/real-itu/3d-artefacts-nca.
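The architecture outlined in this abstract, a cellular automaton whose local update rule is a small neural network applied over a 3D voxel grid, can be sketched compactly. Below is a minimal PyTorch illustration; the channel count, 3×3×3 perception kernel, zero-initialized update head, and stochastic fire rate are common NCA design choices assumed here, not the paper's exact architecture.

```python
# Minimal sketch of one 3D Neural Cellular Automaton update step (PyTorch).
# Hyperparameters and layer shapes are illustrative assumptions.
import torch
import torch.nn as nn

class NCA3D(nn.Module):
    def __init__(self, channels=16, hidden=128):
        super().__init__()
        # Perception: each cell gathers its 3x3x3 neighborhood via a 3D conv.
        self.perceive = nn.Conv3d(channels, hidden, kernel_size=3, padding=1)
        # Update rule: a per-cell MLP implemented as a 1x1x1 convolution.
        self.update = nn.Sequential(
            nn.ReLU(),
            nn.Conv3d(hidden, channels, kernel_size=1, bias=False),
        )
        nn.init.zeros_(self.update[1].weight)  # start as the do-nothing CA

    def forward(self, x, fire_rate=0.5):
        dx = self.update(self.perceive(x))
        # Stochastic update: each cell fires independently at each step.
        mask = (torch.rand_like(x[:, :1]) < fire_rate).float()
        return x + dx * mask

# Grow from a single seed cell in a 32^3 grid.
nca = NCA3D()
grid = torch.zeros(1, 16, 32, 32, 32)
grid[:, :, 16, 16, 16] = 1.0  # the seed
for _ in range(64):
    grid = nca(grid)
```

In practice, such a model is trained by comparing the grown voxel grid against a target structure and backpropagating through the unrolled update steps.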
Proceedings Papers
ALIFE 2021: The 2021 Conference on Artificial Life, 107 (July 19–23, 2021). doi: 10.1162/isal_a_00449
Abstract
Random exploration is one of the main mechanisms through which reinforcement learning (RL) finds well-performing policies. However, it can lead to undesirable or catastrophic outcomes when learning online in safety-critical environments. In fact, safe learning is one of the major obstacles towards real-world agents that can learn during deployment. One way of ensuring that agents respect hard limitations is to explicitly configure boundaries within which they can operate. While this might work in some cases, we do not always have clear a priori information about which states and actions can lead dangerously close to hazardous states. Here, we present an approach in which an additional policy can override the main policy and offer a safer alternative action. In our instinct-regulated RL (IR²L) approach, an “instinctual” network is trained to recognize undesirable situations and guard the learning policy against entering them. The instinct network is pre-trained on a single task where it is safe to make mistakes, and transferred to environments in which learning a new task safely is critical. We demonstrate IR²L in the OpenAI Safety Gym domain, where it incurs significantly fewer safety violations during training than a baseline RL approach while reaching similar task performance.
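A rough sketch of the override mechanism described here: a frozen instinct network proposes a safe alternative action together with a suppression gate, and the executed action blends the task policy's output with the safe one. The gating scheme, squashing functions, and network sizes below are assumptions for illustration, not necessarily the paper's exact formulation.

```python
# Sketch of an instinct network overriding a learning policy (PyTorch).
# The blend-by-gate mechanism is an illustrative assumption.
import torch
import torch.nn as nn

class InstinctRegulatedPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        # Task policy: trained online with RL.
        self.policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim)
        )
        # Instinct network: pre-trained on a task where mistakes are safe,
        # then frozen and transferred; outputs a safe action plus a gate.
        self.instinct = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim + 1)
        )
        for p in self.instinct.parameters():
            p.requires_grad_(False)

    def forward(self, obs):
        task_action = self.policy(obs)
        out = self.instinct(obs)
        safe_action = torch.tanh(out[..., :-1])
        gate = torch.sigmoid(out[..., -1:])  # ~1 near hazards, ~0 elsewhere
        # Near recognized hazards, the instinct overrides the task policy.
        return (1 - gate) * task_action + gate * safe_action

agent = InstinctRegulatedPolicy(obs_dim=8, act_dim=2)
action = agent(torch.randn(1, 8))
```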
Proceedings Papers
ALIFE 2020: The 2020 Conference on Artificial Life, 283–291 (July 13–18, 2020). doi: 10.1162/isal_a_00318
Abstract
An important goal in reinforcement learning is to create agents that can quickly adapt to new goals while avoiding situations that might cause damage to themselves or their environments. One way agents learn is through exploration mechanisms, which are needed to discover new policies. However, in deep reinforcement learning, exploration is normally done by injecting noise into the action space. While this setup performs well in many domains, it carries the inherent risk that the noisy actions performed by the agent lead to unsafe states in the environment. Here we introduce a novel approach called Meta-Learned Instinctual Networks (MLIN) that allows agents to safely learn during their lifetime while avoiding potentially hazardous states. At the core of the approach is a plastic network trained through reinforcement learning and an evolved “instinctual” network, which does not change during the agent's lifetime but can modulate the noisy output of the plastic network. We test our idea on a simple 2D navigation task with no-go zones, in which the agent has to learn to approach new targets during deployment. MLIN outperforms standard meta-trained networks and allows agents, after an evolutionary training phase, to learn to navigate to new targets without colliding with any of the no-go zones. These results suggest that meta-learning augmented with an instinctual network is a promising new approach for RL in safety-critical domains.
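The plastic-plus-instinct split can likewise be sketched in code. In the toy version below, a frozen (in the paper, evolved) instinct network emits a per-dimension modulation in [0, 1] that damps the noisy exploratory output of the plastic policy near no-go zones; the exact modulation used in MLIN may differ.

```python
# Simplified sketch of MLIN's action pathway (PyTorch).
# Multiplicative damping of the noisy action is an illustrative assumption.
import torch
import torch.nn as nn

class MLINAgent(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=32, noise_std=0.3):
        super().__init__()
        self.noise_std = noise_std
        # Plastic network: updated by RL during the agent's lifetime.
        self.plastic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim)
        )
        # Instinct network: fixed during the lifetime (evolved in the paper).
        self.instinct = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim)
        )
        for p in self.instinct.parameters():
            p.requires_grad_(False)

    def act(self, obs):
        # Exploratory action: plastic policy output plus Gaussian noise.
        base = self.plastic(obs)
        noisy = base + self.noise_std * torch.randn_like(base)
        # Instinct modulation in [0, 1] damps actions headed for no-go zones.
        return noisy * torch.sigmoid(self.instinct(obs))

agent = MLINAgent(obs_dim=4, act_dim=2)
action = agent.act(torch.randn(1, 4))
```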