The dynamical regime and its importance for evolvability, task performance and generalization

It has long been hypothesized that operating close to the critical state is beneficial for natural and artificial systems. We test this hypothesis by evolving foraging agents controlled by neural networks that can change the system's dynamical regime throughout evolution. Surprisingly, we find that all populations, regardless of their initial regime, evolve to be subcritical in simple tasks and even strongly subcritical populations can reach comparable performance. We hypothesize that the moderately subcritical regime combines the benefits of generalizability and adaptability brought by closeness to criticality with the stability of the dynamics characteristic for subcritical systems. By a resilience analysis, we find that initially critical agents maintain their fitness level even under environmental changes and degrade slowly with increasing perturbation strength. On the other hand, subcritical agents originally evolved to the same fitness, were often rendered utterly inadequate and degraded faster. We conclude that although the subcritical regime is preferable for a simple task, the optimal deviation from criticality depends on the task difficulty: for harder tasks, agents evolve closer to criticality. Furthermore, subcritical populations cannot find the path to decrease their distance to criticality. In summary, our study suggests that initializing models near criticality is important to find an optimal and flexible solution.


Introduction
Operating close to the critical point at a phase transition between order (subcritical) and disorder (supercitical) has long been associated with optimal performance of complex systems. Several biological systems, such as gene regulatory networks (Balleza et al., 2008;Rämö et al., 2006), neural networks (Tkacik et al., 2015;Schneidman et al., 2006), collectively behaving cells (Halley et al., 2009;De Palo et al., 2017) or swarms (Cavagna et al., 2010;Chaté and Muñoz, 2014) have been shown to operate close to a critical point. Criticality has been associated with an ability to solve complex tasks (Villegas et al., 2016), flexibility towards changes in the environment and good evolvability (Aldana et al., 2007) in complex living systems (Kauffman, 1993). All these properties provide an adaptive advantage in natural environments, leading to the assumption that evolutionary dynamics push living systems close to the critical regime.
On the other hand, it has been suggested that the ubiquitous presence of noise in nature pushes living systems into a more robust subcritical regime. For example, in an evolutionary model of Random Boolean Networks (RBNs) decreasing the system size, making the task less complex, or introducing noise to the system pushes the optimal regime further into the subcritical range (Villegas et al., 2016). Similarly Rämö et al. (2007) observed, that information propagation is maximized in critical RBNs, however the optimal regime shifts slightly into the subcritical regime under the presence of noise. A related phenomenon has been observed on neuromorphic chips, where simpler tasks lead to optimal behavior in the subcritical regime, whereas harder tasks require progressively more critical dynamics (Cramer et al., 2020). Finally, for some applications, the combination of systems at different distances to criticality is shown to lead to optimal results (Zierenberg et al., 2020). The supercitical state has been universally observed to perform poorly (Villegas et al., 2016;Kauffman, 1993).
The benefits of criticality for the evolvability of living systems have been associated with the genotype-phenotype coupling. Specifically, it has been shown (De Jong, 2006) that a tight genotype-phenotype coupling leads to optimal evolvability. Due to this coupling, the dynamical regime has an impact on the properties of the fitness landscape. In an RBN model, the super-and subcritical regimes were shown to disturb the genotype-phenotype coupling (Kauffman, 1993) and lead to either very rugged or overly flat fitness landscapes. A rugged fitness landscape means that the evolutionary dynamics are just a random search -inefficient in high dimensions (Kauffman and Levin, 1987), whereas a Figure 1: Snapshot in time of population dynamics and schematic representation of the control network. A: Environment with 50 organisms (red circles with trails) foraging for food (green dots). B: A network with 12 neurons. Four sensory, four hidden and four motor neurons. All potentially allowed edges are displayed. The exact topology and edge weights are subject to evolution by the EA. very flat landscape strongly dampens the optimization process. Both phenomena lead to a complexity catastrophe, where an increase of system size leads to a failure to discover satisfying solutions with evolutionary search. Critical RBNs result in an intermediately rugged fitness landscape which allows for efficient hill climbing search and is less prone to the complexity catastrophe.
We study how the dynamical regime of populations of evolving organisms influences their ability to solve a task. Our investigation is conducted in a simple foraging game of scalable difficulty. This system allows us to analyze the changes in the dynamical regimes during the evolution. In addition, we propose a potential answer to the question of which dynamical regime demonstrates the best performance and stability with respect to changes in the environment.

Methods
We investigate a 2D environment where organisms controlled by individual neural networks forage for food. Each organism gains energy by eating food particles and consumes energy by moving. The organisms eat the food particles as soon as they run over them. We can increase the difficulty of the task by requiring the organism's velocity to be below a certain threshold when running over food in order to be able to consume it. The fitness of an organism is determined by its average energy throughout its lifetime, and an evolutionary algorithm (EA) optimizes the network that controls the organism to maximize its fitness.

Organism
The organisms in our model are controlled by an Ising neural network (INN) that has been previously used by Aguilera and Bedia (2017) as well as Khajehabdollahi and Witkowski (2020). The Ising network consists of N neurons that can be in one of two states s i ∈ {−1, 1}, i = 1, . . . N . There are sensory neurons that only receive input from the sensors and motor neurons that control the agent. All other neurons are hidden units. The connectivity of the network is described by the weight matrix J ∈ [−2, 2] N ×N and the adjacency matrix A ∈ {0, 1} N ×N , as shown in Figure 1B. The adjacency matrix is such that no direct connections between sensor and motor neurons exist. Following the Ising model, each network state has an associated energy where {i, j | A ij = 1} is the set of connections. The network stochastically minimizes the energy by following Glauber dynamics: At each network iteration, all non-sensor neurons are updated in a random order and the state of neuron i changes from s i to −s i with probability: where β is the inverse temperature of the network (β = 1/(T ·k B ), k B is the Boltzmann constant which we set to one and omit for simplicity) and ∆E i is the change in the energy of the network that is caused by the spin-flip of the i th neuron (changing its state s i to −s i ). The energy change ∆E i is determined by the connectivity matrix J and the states of neighboring neurons. A negative energy change leads to a greater likelihood of a flip. The parameter β controls the likelihood of energetically unfavorable flips. A larger β leads to deterministic network behavior dominated by the connectivity, whereas a smaller β leads to more random behavior. In principle, many iterations would be required to converge to equilibrium. For practical reasons, we perform Figure 2: The dynamical regime of a network can be calculated by scaling the inverse temperature with a factor c β to measure its heat capacity and measuring its distance to the corresponding peak. Heat capacity (Eq. 3) of the Ising networks ( Figure 1) for 50 initially subcritical (β init = 10, blue), critical (β init = 1, green), and supercritical (β init = 0.1, red) organisms as a function of c β . For each organism it reaches the maximum (marked by a dot) at individual values c β = c crit β . Dynamical regime δ = log(c crit β ) ≈ − log(β init ). The displayed populations are unevolved and the resulting dynamical regimes closely correspond to their respective β init . a fixed number of iterations (10) to process a new sensor input and create motor commands.
An organism has four input neurons that receive information about the angle θ food and distance d food from the closest food particle as well as its own velocity v and energy E. Moreover, each organism has four output neurons that control linear and rotational acceleration (2 neurons each) and N h hidden neurons ( Figure 1B). For most simulations we take N h = 4, to test scalability of learning we also consider a network with N h = 20. At the beginning of each simulation, an organism is provided with an amount of initial energy E init = 2. Movement reduces energy and consuming food particles increases it. We consider two versions of this environment: In the simple task organisms consume food when passing over it. In the hard task organisms have to slow down and almost stop to be able to consume food. Unless stated differently, a simulation lasts for a lifetime of t = 2000 time steps after which the evolutionary algorithm (EA) is applied, and the task is simple. 50 INN-controlled organisms are placed in a 2D environment with periodic boundaries and ever-respawning food particles, conserved to a value of 100 ( Figure 1A).

Evolutionary algorithm
The evolutionary algorithm applied to the INNs consists of a combination of elitism, mutation, and mating. At the end of the 2D simulation described above, the fitness of each organism is defined as their mean energy throughout their lifetime. Subsequently, the 20 fittest organisms are selected for reproduction.
The next generation of organisms is produced by applying a combination of copying, mutation, and mating procedures to the selected individuals. The copying algorithm transfers some of the fittest individuals into the next generation unchanged. The mutation algorithm adds or deletes edges in Reset 2D environment 17: end for A (connections not present in Figure 1B cannot be added), re-samples a random edge weight in J from a uniform distribution U(−2, 2), and perturbs the inverse temperature with multiplicative noise β = β · ∆β where ∆β ∼ N (1, 0.02). Finally, the mating algorithm takes a weighted average of the connectivity J and inverse temperature β from two parents to produce an offspring. In most of our simulations, the EA iterates for 4000 generations.

Defining the dynamical regime of an organism
We use the heat capacity from statistical physics to derive a measure of an organism's dynamical regime (sub-, super-, critical). In our finite system, we estimate the putative divergence point by changing the inverse temperature β multiplying it with a scaling constant c β . This change of tem- : Critically initialized populations can be successfully evolved in different circumstances, whereas for subcritically initiated populations, a harder task or an increased system size can lead to the breakdown of evolutionary dynamics. For each panel 10 initially critical (β = 1, green, top row) or initially subcritical (β = 0.1, blue, bottom row) populations evolve for 4000 generations. The dashed horizontal line at fitness = 2 in the subcritical panels correspond to the organisms' initial energy. It can be seen in the 12 neuron, hard task panel (bottom right) that the network is unable to achieve a fitness above its starting value.
perature influences how likely the state of the neurons will flip (Eq. 2), and thus change the equilibrium distribution of energies E (Eq. 1). Now we search for a c β that delivers the maximal value of the heat capacity C H (c β ), defined as: We define the c crit . An analogous procedure was used in (Tkacik et al., 2015). We define the distance of the network from the critical point by the logarithm of the scaling factor required to bring the network to criticality, δ = log(c crit β ). For unevolved organisms (first generation, Figure 2), the relationship between β init and δ can be approximated by: On a technical side, for the stability of the numerical procedure for finding the equilibrium distribution to estimate C H (c β ) we initiate the motor and hidden neurons in the state (s 1 , . . . , s N ) delivering minimum of E(s 1 , . . . , s N ), while keeping sensory inputs fixed to the observed values. During the evolution, the distance from the critical point can (and will) change.

Convergence of evolution
Populations of different initial states follow distinct evolutionary strategies but are all able to solve the standard foraging task. We observe evolution for 4000 generations, concentrating on the populations initiated at the supercritical (β = 10, δ ≈ 1), critical (β = 1, δ ≈ 0) and subcritical (β = 0.1, δ ≈ −1) regimes. Critical populations begin to rapidly gain fitness from the first generation in all EA realizations ( Figure 3A). The gradual and stable increase of fitness of the initially critical population suggests that successful hill climbing of the fitness landscape is taking place. In contrast, for subcritical populations, fitness mainly evolves via random jumps, and only half of the simulations reach the same fitness as the critical populations after 4000 generations ( Figure 3A). Such fitness dynamics indicate a random search strategy, which often leads to a population getting trapped in local maxima for extended periods of time. Confirming the previous observations by (Khajehabdollahi and Witkowski, 2020), we see that supercritical populations, after an initial random period, follow the same path as the critical ones.
For successfully evolvable populations, moderate changes in the complexity of the control network should not destroy the ability of the EA to reach a good fitness. Density δ Figure 5: The dynamical regime of the initially critical population in the harder task remains closer to the critical regime, than in the simple task throughout evolution. Main plot: For each condition 10 populations are initiated in the critical regime end evolve for 4000 generations (dark -hard task, light -simple task). Inset: histograms of dynamical regimes at generation 4000 of 75 populations per task, vertical lines denote means, p < 10 −13 . Difference between means in inset and main panel are due to the small sample size for the main panel.
We test the differences in evolvability for initially critical and subcritical populations by changing the size of the network from 12 to 28 neurons. Same as for smaller networks, the initially critical populations rapidly evolve in all realizations. By generation 4000 they even reach a slightly larger fitness than populations of smaller networks that had evolved for the same amount of time ( Figure 3B). At the same time, subcritially-initialized populations do not reach even half of their normal fitness. We observe the same difference between the dynamical regimes when we increase the task's complexity requiring organisms to slow down almost to zero velocity in order to consume food particles ( Figure 3C). In this harder task, the evolved populations' maximal fitness is expected to be lower than for the simple task. For the initially critical populations, we still observe the same hill climbing dynamics. However, the initially subcritical populations stay at an energy level of exactly two. This signifies that they do not use the originally supplied energy for moving and remain static throughout all 4000 generations, trapped in a local optima.
Overall, we see that although in simple tasks all populations can converge to approximately the same fitness, there exists a significant difference between the initially subcritical and initially critical populations. Namely, a convergence of the EA for critical populations resembles hill climbing. It is stable (all populations follow very similar fitness growth) and behaves similarly for larger networks or more complex tasks. At the same time, for subcritical populations, the EA resembles random search, which is stochastic and fails to find solutions in high-dimensional cases or for more complex tasks.

Evolution of the dynamical regime
Next, we investigate how evolution changes the state of the populations. To do so, we select a wide range of initial dynamical regimes (δ ∈ [−1.5, 1]) and examine how the dynamics of populations initialized in each of these regimes change throughout evolution.
Regardless of their initial dynamical regime, all populations converge to the subcritical regime, however, with different distances from the critical point (Figure 4). We also observe that strongly subcritical populations (δ < −1) generally achieve lower fitness levels in the simple task and are unable to solve the hard task.
Both near-critical and supercritical populations rapidly change their dynamical regimes and by generation 4000 reach an intermediately subcritical regime δ * . Strongly subcritical populations with δ δ * remain at their initial regimes, demonstrating a lack of evolutionary mobility, whereas subcritical populations initialized at different Figure 6: Populations initialized at criticality are always able to find solutions whose behaviors generalize in time beyond their training condition, whereas subcritical populations often overfit to the same conditions. A: 54 populations of each type (triangles -initially critical, circles -initially subcritical). After 4000 generations of evolution all critical populations reach a high fitness (indicated by color) and nearly perfect (with one exception) generalizability γ t (Eq. 5). The initially subcritical organisms split into badly generalizing cluster 2 (19 populations), generalizable cluster 3 (28 populations), and 7 populations that we did not assign to any cluster. This split is independent of their attained fitness. B: Energy and velocity as a function of time for representative examples of the organisms from cluster 2 and clusters 1,3 (marked in panel A). Organisms in cluster 2 reach their maximal fitness (sometimes higher than in clusters 1 and 3) at the end of their lives. The dashed orange line denotes the training lifetime of 2000 time steps.
Task complexity changes the dynamical regime found in the evolutionary limit. The hard task requires a smaller distance from criticality. We check the evolution of the dynamical regime in both simple and hard tasks ( Figure 5). We utilize the observation that all populations with an initial regime δ > δ * converge to similar values. Thus, we consider only initially critical populations. We obtain the distribution of dynamical states by considering 75 independent runs of evolution in both tasks after 4000 generations. The harder task results in a smaller distance from the critical regime (large δ), and the difference is significant (p < 10 −13 ). The difference is observed throughout the evolution ( Figure 5, the main panel based on ten populations).
Initiating the evolution close to the critical regime is important for unknown task complexity. We observe that the dynamical regime never changes towards larger values, but the subcritical convergence point can be at different distances from criticality. Thus, only starting at the critical point guarantees that the optimal dynamical state can be reached by evolution.

Generalizability
For successful biological systems robustness towards environmental change is the paramount feature, therefore, it should be used to distinguish the successfully evolved artificial organisms. We propose a simple measure to investigate how the model behaves outside of its explicit training conditions. Specifically, for a population trained for the organism's lifetime t train we define generalizability as the speed of growth of the average fitness if the organism's lifetime is extended to t extend . Formally: The stable generalizability, γ t = 1 corresponds to linear growth whereas sublinear behavior γ t 1 indicates possible overfitting to the particular organism's lifetime t train .
We consider initially critical (δ ≈ 0) and initially subcritical (δ ≈ −1) populations evolved for 4000 generations and then test their performance for an extended lifetime of 50 000 time steps (instead of the 2000 in training). As reported in previous sections, the critical populations converge to δ ≈ −0.55, and they all have a similar fitness after training. Interestingly, when increasing the organisms' lifetime, the fitness of the critical population continues to grow linearly, signifying almost perfect generalizability. About half of the subcritical populations reach the same fitness level. However, the subcritical populations split up into two clusters: one with generalizability close to 0 and another with generalizability close to 1 ( Figure 6A). Surprisingly, there is no difference in fitness between these two clusters. When we look more precisely at the individual organisms from the two clusters, we discover that the strategy of the organisms in the non-generalizable cluster is to increase the velocity permanently until the end of their training lifetime ( Figure 6B). However, moving with such a high velocity is not compatible with the energy influx from feeding, and they break down shortly after the end of their training lifetime, this demonstrates that these organisms overfit the training conditions. For all generalizable populations (clusters 1 and 3), the velocity fluctuates around a stable level, and the energy of the organisms grows linearly.
Overall, the initialization in the critical regime results in almost perfect generalizability of evolved populations, whereas initially strongly subcritical popuations risk overfit their training conditions.

Effect of genetic perturbations on the fitness
Next, we examine the stability of the evolved organisms to genetic perturbations. We apply genetic perturbations of different magnitudes to the evolved organisms of initially critical β init = 1 and subcritical β init populations. We perturb all weights of the connectivity matrix by randomly adding or subtracting a number f pert and then evaluate the fitness of the resulting organism. We find that fitness rapidly declines with perturbation magnitude for both populations, however the subcritical ones decline faster ( Figure 7A). We evaluate the fitness decline by the slope of an exponential function fitted to the fitness. For the hard task it is α = −2.26 and for the simple task it is α = −5.03, which is more than double the decline rate, indicating a much higher sensitivity of subcritical systems to perturbations.
The EA is a source of constant genetic perturbations that are necessary in the beginning of evolution but can become detrimental later. We consider the individual effect of the evolutionary operators (copy, mutate, and mate) on the resulting fitness of the organisms. The variability of fitness for copying simply reflects the natural variability in community fitness rankings and organism behaviour. However, both mating and mutation in fully evolved subcritical populations typically results in a fitness close to 2 -signifying totally unfit organisms ( Figure 7B). At the same time, initially critical organisms retain diverse fitness values after mutation and mating, some being close to the optimum. This indicates that the originally critical populations retain their evolvability as opposed to the rigid search performed by strongly subcritical populations.

Discussion
We demonstrate that in various scenarios evolving populations of agents converge to a moderately subcritical state with the resulting deviation from criticality depending on the tasks difficulty. This might appear to be a contradiction to the previous studies, suggesting that operating close to criticality is optimal for natural systems (Mora and Bialek, 2011;Munoz, 2018;Roli et al., 2018). However, a recent body of research showed that for simple tasks, operating at some distance to criticality might be an optimal solution for the sensitivity/stability tradeoff (Hidalgo et al., 2014;Tomen et al., 2014;Villegas et al., 2016;Cramer et al., 2020).
We observe that the distance from criticality affects an agent's ability to solve complex tasks and to robustly evolve generalizable behavior. Specifically, we observe that slightly subcritical populations are evolvable for different complexities of the control network and task, whereas strongly subcritical populations fail in both cases. Given that these prop-A B Figure 7: Initially critical populations show larger genotypic stability than initially subcritical ones. A: The phenotype (fitness) as a function of the genotypic perturbation (changes in connectivity) for two initial conditions: β init = 1 and β init = 10. Dashed black lines indicate exponential fit with exponent -2.26 (critical) and -5.03 (subcritical). B: The histograms of the fitness values for nearly fully evolve agents (born between generation 3500 and 4000), categorized according to the last evolutionary operator (copy, mutate, or mate) that was applied to them. The β init = 10 agents are much less likely to remain fit when their genotype is changed by either mutation or mating. erties are crucial for adaption in natural environments, we propose that living systems operate in the subcritical regime in close proximity to the critical point. Moreover, we show that the optimal regime moves closer to criticality as we increase the task difficulty, which suggests that the optimal distance from criticality varies. Those findings are confirmed by Cramer et al. (2020) as well as Villegas et al. (2016), who showed that the optimal distance from criticality in the sub-critical regime decreases for higher task complexity or larger system size. We further observe that populations can only become more subcritical during evolution and fail to decrease their distance to criticality even when this would have eventually led to superior behavior. As it is a priori unknown which distance from criticality will be optimal when evolving for a new task, starting at the critical point could be the only way for the evolutionary process to descend to the optimal regime. However, in the long run this would require some sub-populations to always maintain closeness to criticality. How this can be achieved for neuronal networks is a subject of vivid research (for a review, see (Zeraati et al., 2020;Buendía et al., 2020;Kinouchi et al., 2020)) and for the embodied Ising agents it remains open for further investigation. Maintaining evolvability in simpler systems was, for instance, observed by switching between different rough energy landscapes (Wang and Dai, 2019). The inhomogeneity of the environment and coevolution can also contribute to the preservation of the critical regime (Hidalgo et al., 2014). Overall, the maintenance of evolvability throughout evolution is an important question beyond the embodied Ising agents studied here.
Our results extend and partly revise the earlier findings of Khajehabdollahi and Witkowski (2020), that reported a superior evolvability of critical populations and an approximate convergence to criticality during evolution. We confirm that the critical regime allows reliable evolvability, additionally, we extend our understanding by considering a set of tasks and architectures in the model. However, our more precise procedure to infer the dynamical regime and the fine sampling of initial conditions uncover additional intricate dynamics. Namely, that critical populations of Ising-agents converge to the subcritical regime, and the distance to criticality depends on the task complexity. We also propose a new way to investigate capabilities of the resulting organisms by defining generalizability and genetic stability measures. Both measures reveal the benefits of staying close to the critical state beyond a simple fitness comparison.