Many believe that an essential component for the discovery of the tremendous diversity in natural organisms was the evolution of evolvability, whereby evolution speeds up its ability to innovate by generating a more adaptive pool of offspring. One hypothesized mechanism for evolvability is developmental canalization, wherein certain dimensions of variation become more likely to be traversed and others are prevented from being explored (e.g., offspring tend to have similar-size legs, and mutations affect the length of both legs, not each leg individually). While ubiquitous in nature, canalization is rarely reported in computational simulations of evolution, which deprives us of in silico examples of canalization to study and raises the question of which conditions give rise to this form of evolvability. Answering this question would shed light on why such evolvability emerged naturally, and it could accelerate efforts to harness evolution for solving important engineering challenges. In this article, we reveal a unique system in which canalization did emerge in computational evolution. We document that genomes entrench certain dimensions of variation that were frequently explored during their evolutionary history. The genetic representation of these organisms also evolved to be more modular and hierarchical than expected by chance, and we show that these organizational properties correlate with increased fitness. Interestingly, the type of computational evolutionary experiment that produced this evolvability was very different from traditional digital evolution in that there was no objective, suggesting that open-ended, divergent evolutionary processes may be necessary for the evolution of evolvability.
The functional organisms produced by natural evolution are unfathomably diverse, from single-cell bacteria like E. coli to large mammals like elephants. The success of natural evolution is especially remarkable when one considers that it is fueled by mostly random and unrelated changes at the genetic level [57, 58, 79, 80, 96]. Thus, it is believed that natural evolution was aided by the emergence of evolvability [1, 23, 28, 45, 50, 51, 57, 96, 101, 124], that is, the emergence of genetic properties that increase the effectiveness of evolution.
Evolvability in natural systems is facilitated by many different innovations, a few of which are genetic structures like Hox genes [99, 125], sexual reproduction [49, 84, 124], the evolution of mutation rates [7, 97], structural organization in the form of modularity and hierarchy [22, 39, 55, 81, 124] (discussed in greater detail below), and the emergence of standardized body plans [41, 132]. In this article, we will focus on a particularly interesting driver of evolvability known as developmental canalization [16, 31, 117, 122], which ties together many of the aforementioned concepts. Canalization is the process whereby certain phenotypic dimensions of variation become resistant to genetic changes so that other, possibly more adaptive dimensions of variation are more likely to be explored. Here, a dimension of variation refers to a phenotypic trait that can vary individually, or a set of phenotypic traits that vary in concert. For example, change in the length of the right leg of a human would be one dimension of variation, and coordinated change in both legs represents another dimension of variation. As it turns out, in humans it is rare that one leg becomes substantially longer or shorter than the other, but there exists considerable variation in leg length between individuals, indicating that variations in human leg length have been canalized (specifically, the ability to individually vary leg lengths has been reduced, and the ability to vary both at once has been created).
Canalization is ubiquitous in natural systems, and as a result one might expect that forms of canalization would be consistently encountered in models and computational simulations of evolution as well. The opposite appears to be true: Forms of canalization are rarely reported in computational simulations of evolution, despite significant efforts to promote and discover it [3, 6, 17, 33, 61, 105, 109, 119, 123, 126], suggesting that these simulations do not adequately represent the full capacity of natural evolution. In contrast, this article uniquely displays evidence for the spontaneous emergence of canalization in Picbreeder, an interactive and open-ended system of simulated evolution, and we discuss why and how such canalizations may have emerged in this system, but not in others (Figure 1).
It is important to note that the emergence of evolvability, including canalization, does not require the evolutionary process to have knowledge about future environmental changes; that is, evolvability is not a form of “directed evolution”. Instead, it is widely believed that evolvability can emerge based on the evolutionary history of lineages [23, 28, 45, 60, 61, 101, 129]. In short, individuals whose genome is so structured that beneficial mutations are more likely and detrimental mutations are less likely have a better chance of producing viable offspring, meaning such evolvability can be directly selected through the benefits it provides. Provided that some forms of selection are persistent over evolutionary time while others vary, such as mismatching legs always being detrimental while optimal leg length varies over time, genetic structures that increased the probability of beneficial mutations in the past may do so in future environments as well.
Despite the fact that evolvability and canalization are often regarded as essential for the evolution of complex organisms, their origins remain an active topic of research and debate [1, 23, 28, 45, 50, 51, 57, 96, 101, 124]. The main challenge in answering questions regarding the origins of evolvability and canalization is that they are difficult to study in vivo; oftentimes, properties of interest can be difficult to measure, change, or control for [51, 95]. In addition, biological populations evolve slowly; even rapidly reproducing microorganisms take on the order of weeks to experience a few hundred generations of evolution, and the findings from these microorganisms do not necessarily generalize to their more slowly reproducing counterparts.
An alternative is to study these questions in computational simulations of evolution instead. While they may not seem as compelling as in vivo experiments, computational simulations can greatly improve our understanding of evolutionary processes, and have shed light on a variety of complex evolutionary questions, including the evolution of altruism [20, 86]; structural organization such as modularity [22, 39, 55, 56, 127, 128], regularity [52, 114], and hierarchy [24, 81]; mutation rates [7, 19, 21, 131]; sexual reproduction [5, 84]; genomic complexity [73, 74, 124]; gene duplication [63, 98, 125]; and coevolution [42, 94, 135], to name but a few. Computational simulations are particularly attractive because the experimenters have full control over all variables involved in the evolutionary processes and, provided that the fitness function is simple, modern hardware can run thousands of generations of evolution in just a couple of days, allowing for rapid prototyping of hypotheses.
Unfortunately, while canalization is ubiquitous in nature [16, 31, 122], clear examples of canalization in computational simulations are rare [3, 6, 17, 33, 40, 43, 47, 61, 105, 109, 119, 123, 126], meaning that we lack a proper starting point from which to conduct experiments because we cannot study canalization in computational simulations of evolution if we cannot produce it in the first place. The fact that we do not know how to reproduce canalization also means that, when tackling challenging engineering problems with the help of evolutionary algorithms [14, 15, 18, 25, 36, 65, 76–78, 93, 110, 134], those algorithms are missing a key property that made natural evolution successful, possibly explaining why most evolutionary algorithm research restricts itself to fairly simple, unimodal tasks [8, 13, 55].
While such experiments are rare, the following investigations in computational simulation did touch upon the principles of canalization. Draghi and Wagner demonstrated the evolution of evolvability in a model where two vectors were optimized for minimizing the distance from their vector sum to a target point in a two-dimensional space. Vectors were specified by angle and magnitude, but angle mutations were much less common than magnitude mutations. After evolution, the angles between vectors would reflect the evolutionary history; if the target point remained stationary, the angles between vectors were arbitrary, but if the target point changed frequently, the angles between vectors were close to 90°, so that the entire space of possibly fit phenotypes could be quickly reached through magnitude mutations alone. Here, the angle between the two vectors controlled which dimensions of variation were more or less likely to be explored, and the evolutionary history determined which angle became fixed in the population, thus representing a rudimentary form of canalization.
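As a rough illustration of why near-perpendicular angles help, the sketch below holds the two angles fixed and solves for the magnitudes that place the vector sum exactly on a target. It is a simplified stand-in for the reasoning behind Draghi and Wagner's result, not their model: the function name is hypothetical, and magnitudes are allowed to be negative. With perpendicular vectors the required magnitudes are simply the target's coordinates, whereas nearly parallel vectors need very large magnitudes to reach the same point.

```python
import math


def magnitudes_for_target(theta1, theta2, target):
    """Solve m1*u1 + m2*u2 = target for the magnitudes m1 and m2.

    u1 and u2 are unit vectors at angles theta1 and theta2 (radians).
    Negative magnitudes are allowed, a simplifying assumption of this sketch.
    """
    u1 = (math.cos(theta1), math.sin(theta1))
    u2 = (math.cos(theta2), math.sin(theta2))
    det = u1[0] * u2[1] - u1[1] * u2[0]  # equals sin(theta2 - theta1)
    if abs(det) < 1e-12:
        raise ValueError("parallel vectors cannot reach arbitrary targets")
    tx, ty = target
    # Cramer's rule for the 2x2 linear system.
    m1 = (tx * u2[1] - ty * u2[0]) / det
    m2 = (u1[0] * ty - u1[1] * tx) / det
    return m1, m2


# Perpendicular vectors: magnitudes match the target coordinates.
print(magnitudes_for_target(0.0, math.pi / 2, (3.0, 4.0)))
# Nearly parallel vectors: the same target demands huge magnitude changes.
print(magnitudes_for_target(0.0, 0.01, (3.0, 4.0)))
```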
Another form of canalization was demonstrated by Kouvaris et al., who worked with a model where groups of phenotypic traits had to be coexpressed to gain fitness. That is, the front wings, hind wings, and antennae of an abstract insect consisted of several parts, and all parts needed to be expressed simultaneously to form a functional body part. The environment cycled between favoring individuals with front wings and antennae, favoring individuals with both front and hind wings but no antennae, and favoring insects without any of these traits. Provided that the environment changed at the right frequency, individuals evolved such that mutations would either express or repress entire groups (i.e., modules) of phenotypic traits (e.g., complete wings or complete antennae), but never cause partial expression within a group. These groups of phenotypic traits presented a clear example of canalization, although achieving this effect required a fairly strict set of environmental conditions to emerge.
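The coexpression requirement can be made concrete with a small scoring function. In this hypothetical sketch (the names, the number of parts per body part, and the scoring rule are assumptions, not Kouvaris et al.'s exact model), a body part scores only when it is fully expressed in an environment that favors it or fully absent in one that does not, so partial expression is never rewarded.

```python
MODULES = ("front_wings", "hind_wings", "antennae")

# The three environments described in the text, cycled over time.
ENVIRONMENTS = (
    frozenset({"front_wings", "antennae"}),
    frozenset({"front_wings", "hind_wings"}),
    frozenset(),
)


def module_state(traits):
    """Classify a body part's binary traits as 'full', 'absent', or 'partial'."""
    if all(traits):
        return "full"
    if not any(traits):
        return "absent"
    return "partial"


def fitness(phenotype, favored):
    """One point per body part that matches the environment (assumed rule):
    fully expressed when favored, fully absent otherwise."""
    score = 0
    for module in MODULES:
        wanted = "full" if module in favored else "absent"
        score += module_state(phenotype[module]) == wanted
    return score


insect = {"front_wings": [1, 1, 1, 1], "hind_wings": [0, 0, 0, 0], "antennae": [1, 1, 1, 1]}
print([fitness(insect, env) for env in ENVIRONMENTS])
```

Under this rule, a genotype whose mutations toggle whole body parts can move from one environment's optimum to the next in a few steps, whereas mutations that toggle individual parts must pass through unrewarded partial states.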
As we will present in this article, a possible source of canalization is genotypic structural organization in the forms of modularity and hierarchy. Following a conventional definition [22, 39, 55, 59, 82, 111], a genome is considered modular if it consists of groups of genes that have many interactions with genes in the same group, but few interactions with genes in other groups. A genome is considered hierarchical if interactions result in an ordered structure, such that interactions predominantly go from high-level structures, which tend to have global effects, to low-level structures, which are generally associated with local changes. Structural organization in terms of modularity, hierarchy, or both has been found in the gene regulatory networks of many species, including E. coli, sea urchins, yeasts, and Drosophila. Such structural organization can lead to canalization if it changes the likelihood with which phenotypic traits will change. For example, if two phenotypic traits are encoded by a single genotypic module, a single mutation is likely to affect both traits, whereas if the two traits are encoded by separate modules, there is a better chance that only one of those traits is affected. Similarly, if a genotype is hierarchically organized, a single mutation to a high-level component is likely to affect many phenotypic traits simultaneously, whereas a single mutation to a low-level genotypic component will probably only affect a single trait.
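The qualitative definition of modularity above ("many interactions within a group, few between groups") has a widely used quantitative counterpart, Newman's modularity Q. This section does not state which measure the authors used, so the sketch below is only one illustration; it assumes the genome is treated as an undirected, unweighted graph and that a partition into groups is already given. The function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman's modularity Q for an undirected graph and a given partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    Q = sum over communities c of [L_c / m - (d_c / (2 m))**2], where L_c is the
    number of edges inside c, d_c is the summed degree of c's nodes, and m is
    the total number of edges.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge: high Q for the natural split.
triangles = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f")]
groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(triangles + [("c", "d")], groups))
```

Comparing such a score against the scores of randomly rewired networks with the same size is one common way to judge whether a network is "more modular than expected by chance".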
Research regarding structural organization has shown that, when individuals needed to adapt to a modularly changed environment (i.e., the overall goal in the changed environment would differ, but many of the subproblems in the environment would remain unchanged), structurally organized individuals both evolved and had increased evolvability (i.e., adapted faster) compared to unstructured individuals [22, 39, 55, 81]. These experiments implicitly also demonstrated a form of canalization, because structurally organized individuals were much more likely to rewire subproblems than unstructured individuals. In other words, for structurally organized individuals, dimensions of variation related to environmental subproblems were much more likely to be explored, whereas dimensions of variation related to more holistic changes in the behavior of the individual were less likely to be explored. However, most of this research focused on individuals with a direct encoding (i.e., the phenotype and genotype are equivalent), precluding the wide array of genetic interactions present in biological organisms [16, 31, 41, 122]. Research that did examine the effects of structural organization with a developmental encoding did not report on forms of canalization [52, 53].
The above experiments provide some proofs of concept for the evolution of canalization in computational simulations, but their models are simple. The present research shows the evolution of canalization in a more complex, open-ended system: the images evolved on Picbreeder.org, a website for the interactive evolution of pictures (Figure 1, top). We also show that many Picbreeder genomes display structural organization in the forms of modularity and hierarchy, and we present examples where the structural organization directly corresponds to the observed canalizations (Figure 1, middle). In addition, the results suggest that these structurally organized genomes are generally more fit in terms of offspring (Figure 1, bottom). Lastly, we will discuss the differences between Picbreeder and other computational simulations of evolution, and argue that the emergence of canalization may be directly facilitated by the ever-changing, divergent, goalless nature of Picbreeder. The implication is that, as has been recently argued [66, 69, 103, 115], the success of natural evolution may not be due to short-term competition over common resources, but may be enabled instead by the long-term tendency to invade new niches and avoid competition altogether.
Picbreeder.org is a website, first presented by Secretan et al., where users can interactively and collaboratively evolve images. Users visiting the site can “breed” images similarly to how one might breed livestock; the user starts with an initial population of images from which the user can select the images he or she finds most promising. Those will then be mated and mutated to form the next generation of images, and the process repeats. The user can continue this process until satisfied or bored, and can then choose to publish the result to the website, so that the results can serve as a seed for other users. Since its inception, over 10,000 images have been published on Picbreeder.
The evolutionary process is driven by the NeuroEvolution of Augmenting Topologies (NEAT) evolutionary algorithm. NEAT is an algorithm for the evolution of networks. It starts with simple networks, and slowly increases the size of the networks by adding nodes and connections. To evolve images with NEAT, the images are represented through an artificial genetic encoding called compositional pattern-producing networks (CPPNs), as described in Section 2.2. Whenever a CPPN is mutated, every weight in the network has a chance of being changed by replacing it with a random number drawn from a normal distribution with a mean equal to the original weight of the connection and a variance of 1. In addition, there is a small chance of adding a connection between two unconnected nodes, and there is a small chance of adding a new node onto an existing connection. When multiple images are selected, their underlying CPPNs may be combined through crossover. To perform crossover between networks, following the convention of NEAT, nodes and connections in the network are first aligned by matching historical markings: unique identifiers that are assigned to every node and connection the first time they are added to a CPPN. Nodes and connections that are present in both parents will be randomly selected from either parent, whereas nodes and connections only present in one parent will always be added. The original NEAT algorithm also includes fitness sharing through speciation, added to preserve diversity within the population, but fitness sharing is not in effect on Picbreeder, because the individuals that get to reproduce are directly chosen by the user. Further details and parameters are described in  and .
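The weight mutation and crossover rules described above can be sketched for connection genes as follows. This is a simplified illustration, not the Picbreeder code base: `ConnectionGene`, `mutate_weights`, `crossover`, and the per-weight mutation probability `rate` are hypothetical names, node genes and the structural mutations (adding a connection or a node) are omitted, and the standard deviation of 1 follows from the stated variance of 1.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionGene:
    innovation: int  # historical marking, assigned when the connection first appears
    source: int
    target: int
    weight: float


def mutate_weights(genes, rate, rng):
    """With probability `rate`, replace each weight by a draw from N(weight, 1)."""
    return [
        ConnectionGene(g.innovation, g.source, g.target, rng.gauss(g.weight, 1.0))
        if rng.random() < rate
        else g
        for g in genes
    ]


def crossover(parent_a, parent_b, rng):
    """Align genes by historical marking. Genes present in both parents come
    from a randomly chosen parent; genes present in only one are always kept."""
    genes_a = {g.innovation: g for g in parent_a}
    genes_b = {g.innovation: g for g in parent_b}
    child = []
    for innovation in sorted(genes_a.keys() | genes_b.keys()):
        if innovation in genes_a and innovation in genes_b:
            child.append(rng.choice((genes_a[innovation], genes_b[innovation])))
        elif innovation in genes_a:
            child.append(genes_a[innovation])
        else:
            child.append(genes_b[innovation])
    return child


rng = random.Random(42)
parent_a = [ConnectionGene(1, 0, 2, 0.5), ConnectionGene(2, 1, 2, -1.0)]
parent_b = [ConnectionGene(1, 0, 2, 0.7), ConnectionGene(3, 0, 3, 2.0)]
child = mutate_weights(crossover(parent_a, parent_b, rng), rate=0.3, rng=rng)
```

Keeping every unmatched gene follows the rule stated for Picbreeder above; the original NEAT formulation instead inherits disjoint and excess genes from the fitter parent, which is not meaningful here because parents are chosen by the user rather than by a fitness score.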
When starting evolution, the user can choose to start from scratch, or to branch from an existing image. If the user chooses to start from scratch, the initial population of images will consist of simple geometric patterns, as specified by the initial small, randomized genomes (Section 2.2). However, if the user chooses to start from an existing image, the initial population will consist of direct offspring of the selected image. For the purpose of measuring the reproductive success of an image, we define the fitness of an image as the number of direct descendants of that image, where a direct descendant is defined as an image that was branched, evolved, and published directly by a single user from the original image without any of the intermediate forms being published. This measure of fitness encapsulates both the quality of the parent, because interesting images have a higher chance of being selected by a user for further evolution, and the evolvability of the parent, because users are unlikely to publish descendants if they were unable to introduce any interesting new changes in said descendants. This metric is noisy (an image placed on the front page for a long period of time, such as an “editor's pick” or top-rated image, may have many more descendants than a qualitatively similar image that did not make it to the front page), but it is arguably informative when taken in aggregate.
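Under the fitness definition above, computing fitness reduces to counting how many published images name a given image as their branching point. The sketch assumes a hypothetical record format in which each published image id maps to the id of the published image it was branched from (or `None` when evolved from scratch); because unpublished intermediate forms are never recorded, each such link already points to the nearest published ancestor.

```python
from collections import Counter


def direct_descendant_counts(branched_from):
    """Count direct descendants per published image.

    branched_from: published image id -> id of the published image it was
    branched from, or None if it was evolved from scratch (assumed format).
    """
    return Counter(parent for parent in branched_from.values() if parent is not None)


# Hypothetical publication records.
records = {"a": None, "b": "a", "c": "a", "d": "b"}
fitness = direct_descendant_counts(records)
print(fitness["a"], fitness["b"], fitness["c"])
```

A `Counter` returns 0 for images that were never branched, so leaf images need no special handling.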
It is important to note that, in contrast to most classic experiments with evolutionary algorithms, Picbreeder has no overall goal. That is, while any individual user will select images that are aesthetically pleasing or interesting in some way, there is no “target image” that needs to be found. In addition, while users may form goals during a session, long lineages are often evolved by many different users, who may all have different strategies and motivations during image selection.
The genomes of the Picbreeder images, known as compositional pattern-producing networks (CPPNs), are an abstraction of developmental processes. CPPNs have been described at length many times previously [13, 26, 44, 106, 112, 114], so here we only briefly describe them and how they abstract developmental biology. Consider the development of any multicellular organism: The organism will start as a single stem cell, which will multiply over time to form the mass of the organism. To form different functional parts of the organism, stem cells will have to determine what kind of cell to become (muscle, bone, neuronal, etc.), that is, their cell fate. The proper fate of a cell depends on its location in the developing organism; a cranial cell may have to become part of the central nervous system, whereas a distal cell may have to become part of a claw. In developing organisms, a cell can effectively glean its location by measuring the concentrations of different proteins and other chemicals, jointly referred to as morphogens, which form gradients throughout the developing organism. For example, if there exists a morphogen that is only produced at the extreme anterior of the organism, but slowly diffuses throughout the entire organism, the concentration of that morphogen provides location information with respect to the anteroposterior (front to back) axis. If a sufficient number of these morphogens are present over different axes (anteroposterior, dorsoventral, mediolateral, etc.), a cell can determine its location and hence its fate.
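To make the positional-information argument concrete, the sketch below assumes an exponentially decaying morphogen profile, a common idealization of a source whose product diffuses while being degraded; the decay length, default values, and function names are illustrative assumptions. A cell that reads the local concentration can invert the profile to recover its distance from the source, and gradients along two different axes yield a 2D position.

```python
import math


def concentration(distance, c0=1.0, decay_length=0.2):
    """Assumed exponential profile of a morphogen secreted at distance 0."""
    return c0 * math.exp(-distance / decay_length)


def distance_from_concentration(c, c0=1.0, decay_length=0.2):
    """Invert the profile: recover the distance to the source from a reading."""
    return -decay_length * math.log(c / c0)


# A hypothetical cell reads one gradient per axis and recovers its position.
cell = (0.37, 0.81)  # (anteroposterior, dorsoventral), arbitrary units
readings = [concentration(coordinate) for coordinate in cell]
estimate = tuple(distance_from_concentration(c) for c in readings)
print(estimate)
```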
While these morphogens are effective at signaling position information to developing cells, simulating the diffusion of such morphogens is computationally expensive, which is why simulated diffusion-based artificial organisms are often restricted in size. However, in computational simulations of development, global positional information can be relayed directly to a cell, without the need to simulate diffusion. Inspired by this idea, CPPNs are functions from global positional information to cell fate; they take the position of a cell, such as the x and y coordinates of a pixel, and return its fate, in this case the color value (Figure 2).
An arbitrary function from position to cell fate is not sufficient to capture the power of developmental biology. For example, diffusing chemicals can spread smoothly in all directions, giving rise to symmetry. Genes can respond to their own gradients, enabling repeated patterns. Genes can also compose different gradients by responding only when multiple different morphogens are present (or absent) at the same time. To capture these properties, CPPNs are compositions of regular functions with specific behaviors; for example, Gaussian functions can provide symmetry, sine waves can provide repetition, and step functions like sigmoids can confer the ability to respond only when all necessary gradients are present (Figure 2, middle).
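The three properties named above can each be checked on a small function. The definitions below are illustrative forms, not necessarily Picbreeder's exact activation functions, and the gain and threshold in `respond_if_both` are hypothetical values chosen so that the sigmoid acts like a soft AND of two gradients.

```python
import math


def gaussian(z):
    # Symmetric around zero: gaussian(z) == gaussian(-z).
    return math.exp(-z * z)


def repeating(z):
    # Periodic: repeating(z) equals repeating(z + 2 * math.pi) up to rounding.
    return math.sin(z)


def sigmoid(z):
    # Logistic function written via tanh, which avoids overflow for large |z|.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def respond_if_both(g1, g2, gain=10.0, threshold=1.5):
    """High only when both gradients (assumed to lie in [0, 1]) are near 1."""
    return sigmoid(gain * (g1 + g2 - threshold))


for g1, g2 in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
    print(g1, g2, round(respond_if_both(g1, g2), 3))
```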
Each node in a CPPN is associated with one of these functions, and nodes interact with each other through weighted connections. At any node, the incoming values are multiplied by the corresponding connection weights, and the sum is passed to the function of that node. The input to the network is a geometric coordinate, and the output represents the cell fate, which will be the color of a pixel in an image.
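The evaluation rule just described (weighted sum of incoming values, then the node's function) can be sketched as a small feedforward interpreter. This is a minimal illustration under stated assumptions: the genome format, the activation set (with `tanh` standing in as a bounded sigmoid-shaped function), and the names `evaluate_cppn` and `render` are hypothetical, and only x and y are used as inputs.

```python
import math

# Illustrative activation set; Picbreeder's exact definitions may differ.
ACTIVATIONS = {
    "identity": lambda z: z,
    "gaussian": lambda z: math.exp(-z * z),
    "sine": math.sin,
    "tanh": math.tanh,
}


def evaluate_cppn(nodes, connections, inputs, output):
    """Evaluate a feedforward CPPN at one position.

    nodes: (node_id, activation) pairs in topological order, excluding inputs.
    connections: (source, target, weight) triples.
    inputs: input id -> value, e.g. {"x": 0.2, "y": -0.4}.
    """
    values = dict(inputs)
    for node_id, activation in nodes:
        # Multiply incoming values by their weights, sum, apply the node's function.
        total = sum(w * values[src] for src, dst, w in connections if dst == node_id)
        values[node_id] = ACTIVATIONS[activation](total)
    return values[output]


def render(nodes, connections, output, size):
    """Query the CPPN at every pixel of a size x size grid spanning [-1, 1]^2."""
    coords = [-1.0 + 2.0 * i / (size - 1) for i in range(size)]
    return [
        [evaluate_cppn(nodes, connections, {"x": x, "y": y}, output) for x in coords]
        for y in coords
    ]


# Hypothetical two-node genome: a Gaussian of x gives a left-right symmetric band.
NODES = [("h", "gaussian"), ("out", "identity")]
CONNECTIONS = [("x", "h", 2.0), ("h", "out", 1.0)]
image = render(NODES, CONNECTIONS, "out", 5)
```

Because the Gaussian node receives only x, every row of the resulting image is mirror-symmetric, which is the kind of regularity the composition of functions is meant to provide.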
Within a CPPN, every node can be considered a gene that produces a unique morphogen, and the output of the node can be considered the expression pattern of that gene. Because the Picbreeder CPPNs describe 2D images, the expression of an intermediate node can be visualized by creating a 2D image where each pixel is colored according to the output of the node at that location (Figure 2, right). For the purpose of such visualizations, pixels for intermediate nodes are colored from red (−1), to black (0), to white (1).
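A minimal mapping from an intermediate node's output to a display color might look like the following. Only the three anchor colors come from the text; the linear interpolation between them, the clamping to [−1, 1], the 8-bit channel range, and the function name are assumptions.

```python
def node_output_to_rgb(value):
    """Map a node output to an 8-bit RGB triple: -1 -> red, 0 -> black, 1 -> white."""
    v = max(-1.0, min(1.0, value))  # clamp to the displayed range
    if v < 0:
        return (round(-v * 255), 0, 0)  # fade from black toward red
    level = round(v * 255)
    return (level, level, level)  # fade from black toward white


print([node_output_to_rgb(v) for v in (-1.0, -0.5, 0.0, 0.5, 1.0)])
```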
Different paths in a CPPN may result in different intermediate patterns, which may later be combined to form the final output. Thus, CPPNs can model many different interactions, including pleiotropy, redundancy, and different developmental pathways, without sacrificing computational tractability, making the model appropriate for the study of evolvability and canalization.
To analyze the CPPNs produced by Picbreeder, we developed a tool called CPPN-Examiner (CPPN-X). It makes it possible to pick any connection in the network and slowly change its weight while directly observing the effect on the pattern produced. The tool also allows labeling these connections and thereby makes it possible to create a fully annotated version of the network (e.g., Figure 5, right). To ensure that our analysis is relevant to all Picbreeder images, we faithfully modeled the CPPN-X tool after the online code base, written by Secretan et al. [106, 107]. We opted for creating a separate, offline tool to allow for additional computational optimizations of the CPPN, which greatly increased the speed at which the program can render the effect of weight changes, enabling us to view the effect of weight changes smoothly and in real time. The source code is available at www.evolvingai.org/CPPN-X.
Connections in genomes were annotated according to the following procedure. First, a not-yet-labeled connection was selected and we swept across the possible values for that connection from its minimum (−3) to its maximum value (3) at a 0.1 interval, viewing each intermediate image produced. In some rare cases, the 0.1 interval would be too coarse to properly observe the effect of the sweep, and in those cases we decreased the interval to 0.01. Then, we qualitatively classified the resulting change and annotated the connection accordingly. When assigning a label, we ignored background changes that occurred when the weight got far (generally 1 unit or more) from its original value (see SI Figure 46 for examples).¹ We ignored these changes because we are interested in the effect of small genetic mutations, as those are considered to be an effective basis for modeling evolutionary processes. It is likely that every functional connection will cause some background changes if its weight is changed by a sufficiently large number, and so we do not believe that this effect should define the primary function of that connection.
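The sweep itself is simple to express in code. In this sketch, `sweep_values` and `sweep_connection` are hypothetical helpers, and `render` stands for any callback that turns a full weight assignment into an image; it is not a CPPN-X API. With the default interval the sweep covers 61 weights from −3 to 3, and with the finer 0.01 interval it covers 601.

```python
def sweep_values(low=-3.0, high=3.0, step=0.1):
    """Evenly spaced weights from low to high, inclusive.

    Each value is computed as low + i * step (rather than by repeated addition)
    so floating-point error does not accumulate; rounding tidies the remainder.
    """
    count = round((high - low) / step)
    return [round(low + i * step, 10) for i in range(count + 1)]


def sweep_connection(weights, connection_id, render, step=0.1):
    """Yield (weight, image) pairs while sweeping a single connection.

    weights: connection id -> weight; the original dict is left unmodified.
    render: hypothetical callback mapping a weight dict to an image.
    """
    for value in sweep_values(step=step):
        trial = dict(weights)
        trial[connection_id] = value
        yield value, render(trial)


# Toy "renderer" that just sums two weights, standing in for drawing an image.
frames = list(sweep_connection({"c1": 0.5, "c2": 1.0}, "c1", lambda w: w["c1"] + w["c2"]))
```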
After all connections were labeled, we merged classes with similar effects (e.g., merged classes such as “Move Spotlight Left-Right” and “Move Spotlight Up-Down” into a “Spotlight Only” class) to reveal the high-level functional decompositions of the genomes. In effect, this kind of manual experimentation and annotation is like a kind of artificial bioinformatics for Picbreeder CPPNs. Files containing the fine-grained decompositions are available for download at www.evolvingai.org/PicbreederCanalization.
Nodes were assigned labels according to the majority label among their incoming connections. Because we did not vary any attributes of the nodes, this labeling holds no additional information, and only serves to improve visual clarity.
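The node-labeling rule amounts to a majority vote over incoming connection labels. The sketch below is illustrative: the data layout and function name are hypothetical, and because the text does not say how ties were broken, it assumes ties go to the label encountered first, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter


def label_nodes(connection_labels, incoming):
    """Give each node the most common label among its incoming connections.

    connection_labels: connection id -> label.
    incoming: node id -> list of incoming connection ids.
    Ties go to the label encountered first (assumed rule). Nodes without
    incoming connections, such as inputs, receive no label.
    """
    labels = {}
    for node, connections in incoming.items():
        counts = Counter(connection_labels[c] for c in connections)
        if counts:
            labels[node] = counts.most_common(1)[0][0]
    return labels


connection_labels = {"a": "Spotlight Only", "b": "Spotlight Only", "c": "Shadow Only"}
incoming = {"n1": ["a", "b", "c"], "n2": ["c", "a"], "x": []}
print(label_nodes(connection_labels, incoming))
```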
It is important to note that this analysis of canalization is inherently subjective, as it requires a human observer to classify the nature of each change. While an objective measure of canalization would have been preferable, to the best of our knowledge there does not exist an appropriate, objective measure of canalization for images at this time. For example, naive metrics, such as localized change of pixel values in response to mutations, do not cover all relevant forms of canalization. Moving an object from one location in an image to another location in the image without changing the shape of the object is considered an important form of canalization, but such canalization does not result in local changes in pixel intensities. Conversely, a local change in pixel intensities affecting arbitrary parts of different objects in an image is generally not considered a form of canalization, yet such a metric would count it as one. It may seem that such issues could be resolved by employing techniques based on automated object recognition systems (e.g., deep neural networks), as these systems are often associated with a sense of objectivity not attributed to human observers. However, there are still many problems that need to be solved before an automated system can replace a human observer in the current domain, as these systems may see objects that are not actually there, may misclassify objects due to imperceptible changes, and can inherit their own bias from the training data.
Despite the lack of an appropriate method for objectively measuring canalization, we do not believe that studying canalization in images should be avoided just because humans are (currently) the only agents capable of properly interpreting the data. Indeed, many fields, such as those that study animal and human behavior, have to rely on human judgments (e.g., of whether two animals are fighting, cooperating, hugging, etc.) [2, 27, 130]. As in those fields, and as has been argued before specifically in the context of harnessing human judgments in evaluating evolutionary algorithms, more is learned through good, albeit imperfect, measuring devices than by not performing any measurements at all, even though it is important to acknowledge that such judgments are made by humans and are thus subjective. In addition, to facilitate an open discussion regarding our results and interpretations, many examples are included in this article and its supplementary material, and the visualization tool, capable of accessing and analyzing our complete data set of Picbreeder genomes, is freely available so that readers may judge for themselves the extent to which they agree with our subjective interpretations.
To validate that the previously described labeling process was fair and not biased by our knowledge of the hypothesis of canalization, we tested whether independent people would provide similar labels to those presented in this article. To do so, we compared the labeling presented in this article against the labels provided by individuals recruited through Amazon's Mechanical Turk program, a service where one can pay workers to conduct arbitrary tasks online. Because it was too expensive to obtain the number of labels necessary for proper statistical analysis (which was at least 30 labels per connection) for every connection in every genome analyzed in this article and its SI, we were only able to conduct this validation for a single genome. However, because the process used to label all images was the same (and we did not know at that time we would perform extra validation on any genome), if the process is found to be sufficiently accurate for one genome, it is likely that the process was accurate for all genomes. We decided to conduct the test on the central, focal genome-and-image pair, presented in this article, named “Spotlight Casting Shadow” (Figure 5). The results show that the Mechanical Turk workers, who were independent and not informed of our hypothesis, assigned labels to connections similar to those in the presented labeling (SI Figure 45) and that the presented labeling is not an outlier among the Mechanical Turk workers on three different metrics that measure to what degree a labeling matches the aggregate data obtained from the Mechanical Turk experiment (SI Figure 48). Thus, the analysis confirms that our labeling process was indeed fair and consistent with the labelings obtained from independent individuals. The full analysis and experimental details can be found in the SI (SI Section 4).
To examine whether Picbreeder images canalized dimensions of variation, we selected one image that a user titled “Spotlight Casting Shadow” (Figure 4). We selected it because it visually appears to contain a clearly distinct object in the image (the object), a correlated attribute (the shadow), and an independent, but also conceptually distinct entity (the spotlight). We thus wondered how these entities would respond to changes in the genome. One possibility was that this image would behave like a face seen in clouds. To us, human observers, such a shape may appear to consist of various different entities, such as eyes, a nose, and a mouth. However, as the clouds change shape in the wind, one would not expect any of these components to be preserved. For example, it is exceedingly unlikely for the expression on the face to cycle through different expressions, or the eyes to open and shut, or the entire face to expand appropriately, or the like. Instead, most often shapes seen in clouds are ephemeral, and quickly morph back into an amorphous cloud (or perhaps an entirely different shape), without any regard for the meaning once assigned to the shape and its parts. The same could have been true for this Picbreeder image, where the relationships between the different entities within the image would be solely in the eye of the beholder, and where changes to the genotype would simply cause the image to become scrambled in unrecognizable ways. However, as described below, we discovered not only that the genome evolved to enable the different aspects of the image to be independently controlled while preserving their meaning, but also that the dimensions of variation for these objects are sensible in that they change the image in the ways humans might expect such objects to be manipulated.
To test whether such conclusions extended to other images, we then analyzed the 12 most branched images from Picbreeder (recall that being branched can be considered a form of fitness in this system). We specifically tested whether their CPPN genomes contained links that affected a single, qualitative aspect of the image. To make this determination, we annotated the genome as described in Section 2.3. Images of all 13 fully labeled networks, including representative examples of variation, can be found in SI (SI Section 2).
Canalizations of dimensions of variation were found in every Picbreeder image we examined. While the images differed in the quality and quantity of canalizations, even the images that seemingly consist of arbitrary patterns have canalized some interesting dimensions of variation. Two example dimensions of variation for three different images are shown in Figure 3. Full videos of these and other transitions are available at www.evolvingai.org/PicbreederCanalization.
We picked the Spotlight Casting Shadow image to present in detail in this article (Figure 4), though most other images we analyzed have qualitatively similar properties (SI Section 2). While the image itself appears to have separate components (the object, its shadow, and the spotlight), it could have been the case that genetically these features were not decomposed and could not be altered independently. Surprisingly, however, the CPPN genome does contain individual connections specialized to modify only one of these three different entities. We found dimensions of variation corresponding to the size of the object, the size of the shadow, and the size and position of the spotlight (Figure 4). Moreover, we also found connections that change multiple entities in a coordinated fashion; the object and the shadow can be modified together so that both objects can grow or shrink simultaneously, which is the behavior one would expect if the shadow was actually cast by the object. Note that such canalization is not an inherent, inevitable property of CPPNs: The image titled “Dolphin” (Figure 1, right) features several visually distinct components, such as the eye of the dolphin, the snout of the dolphin, the head of the dolphin, and the water in the background, but most genes in the Dolphin CPPN are highly pleiotropic, and affect all of those components simultaneously.
From a visual inspection of the genome, color-coded to show the effect of each connection, it is apparent that the connections that control independent dimensions of variation are not randomly distributed throughout the genome (Figure 5). Instead, the genome exhibits a modular and hierarchical organization whereby different clusters of connections enable the manipulation of different dimensions of variation in the image, and where higher-level modules affect multiple aspects of the image while lower-level modules affect single aspects of the image. The genome starts with two high-level modules, one affecting the object and the shadow, and the other affecting the spotlight. The “Object And Shadow” module feeds into two lower-level modules that affect only the object or only the shadow of the object. Lastly, all information is aggregated into a “Global Lighting” module, which affects the brightness of the entire image without affecting the shapes within the image.
Curiously, the “Object Only” module also feeds into the “Shadow Only” module, which raises the question of why changes to the “Object Only” module do not affect the shadow. This observation is important because, if changes to the “Object Only” module did affect the shadow, there would be no “Object Only” module, and the image would have lost several independent dimensions of variation. It turns out that the final image can be faithfully reconstructed by replacing the connections between the “Object Only” module and the “Shadow Only” module with connections from the bias input to the “Shadow Only” module, demonstrating that the connections from the “Object Only” module to the “Shadow Only” module serve only as a bias (SI Figure 42). Thus, the “Object Only” and “Shadow Only” modules are two low-level modules that are related only because they both receive their information from the higher-level “Object And Shadow” module.
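The bias substitution argument rests on a simple observation: if an upstream node produces the same activation at every pixel, the weighted signal it sends downstream is a constant, which a connection from the bias input (fixed at 1.0) can reproduce exactly. The following minimal Python sketch checks this condition for a single connection. The coordinate convention (x and y scaled to [-1, 1]), the grid size, and the example node functions are assumptions for illustration, not the actual Picbreeder genome.

```python
import math


def pixel_coords(width, height):
    """Yield (x, y) CPPN inputs scaled to [-1, 1] (assumed convention)."""
    for row in range(height):
        for col in range(width):
            yield 2.0 * col / (width - 1) - 1.0, 2.0 * row / (height - 1) - 1.0


def bias_equivalent_weight(source_fn, weight, width=16, height=16, tol=1e-9):
    """Return the bias-connection weight that reproduces weight * source_fn(x, y)
    at every pixel, or None if the source node varies across the image."""
    values = [source_fn(x, y) for x, y in pixel_coords(width, height)]
    if max(values) - min(values) > tol:
        return None  # the source carries spatial information, so it is not just a bias
    return weight * values[0]  # the bias input is fixed at 1.0


def saturated_node(x, y):
    # Hypothetical node whose tanh saturates: tanh(49.9..50.1) is exactly 1.0 in float64.
    return math.tanh(50.0 + 0.1 * x)


def spatial_node(x, y):
    # Hypothetical node whose output depends on the x coordinate.
    return math.sin(3.0 * x)


print(bias_equivalent_weight(saturated_node, 0.8))  # 0.8
print(bias_equivalent_weight(spatial_node, 0.8))    # None
```

This is only a sufficient condition: a constant source can always be replaced by a bias without changing any pixel, whereas the SI reconstruction tests the substitution at the level of the whole image.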
This structural organization directly corresponds to the different dimensions of variation that have been canalized; the object and the shadow can be manipulated both together and separately, because there exist hierarchically organized modules that process those aspects together and separately. This also explains why there are no connections that affect only the object and the spotlight together (but not the shadow); there is simply no location in the genome where only these two properties are processed together. Thus, the Spotlight Casting Shadow genome is a practical example of how structural organization can lead to canalization. Witnessing this structural organization in the Spotlight Casting Shadow genome and many other genomes (SI Section 2) led us to hypothesize that, in general, some canalizations in CPPN genomes are facilitated by a modular and hierarchical structure. Because modularity and hierarchy can be quantitatively measured, we investigated whether these properties exist at elevated levels in Picbreeder genomes.
We first tested the hypothesis that these genomes have evolved to have elevated levels of modularity. If Picbreeder images have indeed canalized dimensions of variation through modularity, and if those canalizations provide an evolutionary advantage, one would expect that, on average, Picbreeder genomes would be more modular than randomly generated null models. To test this, we approximated the maximal modularity Q-score for directed networks for each network in the Picbreeder database (which contained 9585 genomes at the time we were given a copy), and we compared those values against the similarly approximated maximal modularity Q-scores of random null models. Given a particular division of a network into modules, the modularity Q-score indicates the fraction of edges that lie within a module (as opposed to connecting two different modules), minus the expected value of that same fraction for a randomly connected network. The network division that maximizes the modularity Q-score is known as an optimal split (approximated with an efficient, eigenvector-based method [70, 88]), and the Q-score of that optimal split is widely accepted as a measure of network modularity [11, 22, 48, 55, 81, 82].
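To make the definition concrete, the sketch below scores one given partition of a directed, unweighted network. It assumes the standard directed formulation of Leicht and Newman, Q = (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j), which we take to be the directed Q-score referred to above; it does not implement the eigenvector-based search for the optimal split.

```python
from collections import defaultdict


def directed_modularity(edges, module_of):
    """Directed modularity Q of a fixed partition.

    edges:     list of (src, dst) pairs of an unweighted directed network
    module_of: dict mapping every node to a module label
    """
    m = len(edges)
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1

    # Observed fraction of edges that start and end in the same module.
    observed = sum(module_of[s] == module_of[d] for s, d in edges) / m

    # Expected fraction under a random directed network with the same in- and
    # out-degrees: sum over modules of (module out-degree * module in-degree) / m^2.
    out_total = defaultdict(int)
    in_total = defaultdict(int)
    for node, module in module_of.items():
        out_total[module] += out_deg[node]
        in_total[module] += in_deg[node]
    expected = sum(out_total[c] * in_total[c] for c in out_total) / m**2

    return observed - expected


# Two chains joined by a single bridge edge (2 -> 3).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
lumped = {n: "A" for n in range(6)}
print(round(directed_modularity(edges, split), 4))   # 0.32
print(round(directed_modularity(edges, lumped), 4))  # 0.0
```

Placing every node in a single module always gives Q = 0, so a positive Q indicates that a partition keeps more edges within modules than the degree-matched random baseline would.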
To generate fair null models, two factors needed to be controlled for: (1) the Picbreeder genomes were produced under a set of very specific constraints related to the NEAT algorithm (e.g., a fixed number of inputs and outputs, no recurrent connections, no disconnected nodes), and (2) the networks in our data set are not all independent of each other. To control for these factors, 10 null models were generated for each Picbreeder network, where each null model was generated by iteratively applying NEAT “add node” and “add connection” mutations to the parent of the Picbreeder network until the null model had exactly the same numbers of nodes and connections as the Picbreeder network. Here, the parent network refers to the most recent published ancestor of the Picbreeder network if it was branched, or the minimal starting network if it was created from scratch. Because Picbreeder does not feature any deletion mutations, this ensured that the null models underwent the same types of mutations as the actual network. This way, the only explanation for a difference between a network and its null models lies with the selection processes that happened between the parent and the child. Finally, the average modularity Q-score of the null models was subtracted from the modularity Q-score of the real network to arrive at the residual modularity score provided throughout this article. Because the null-model Q-score is subtracted, a residual modularity greater than zero indicates that a network is more modular than expected by chance. We found that Picbreeder genomes are significantly more modular than the random null models (median residual modularity: 0.0039 [0.0034, 0.0045] with 95% bootstrapped confidence intervals, p = 0, Wilcoxon signed-rank test).
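The null-model procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used for the analysis: genomes are reduced to node depths plus a set of edges (edges may only run from lower to higher depth, which enforces the no-recurrence constraint), the split connection of an “add node” mutation is removed outright (NEAT itself disables it, and how disabled connections were counted is an assumption here), and the two mutation types are interleaved at random. The names Net, add_node, null_model, and residual are hypothetical.

```python
import random
from statistics import mean


class Net:
    """Minimal feedforward genome: each node has a depth in [0, 1] (inputs 0.0,
    outputs 1.0) and edges may only go from lower to higher depth."""

    def __init__(self, depth, edges):
        self.depth = dict(depth)  # node id (int) -> depth
        self.edges = set(edges)   # (src, dst) pairs

    def copy(self):
        return Net(self.depth, self.edges)


def add_node(net, rng):
    # Split a random connection src->dst into src->new->dst (+1 node, +1 edge).
    src, dst = rng.choice(sorted(net.edges))
    new = max(net.depth) + 1
    net.depth[new] = (net.depth[src] + net.depth[dst]) / 2
    net.edges.remove((src, dst))
    net.edges.update({(src, new), (new, dst)})


def feedforward_candidates(net):
    # Missing connections that keep the network acyclic.
    return sorted((u, v) for u in net.depth for v in net.depth
                  if net.depth[u] < net.depth[v] and (u, v) not in net.edges)


def null_model(parent, n_nodes, n_edges, rng):
    """Mutate a copy of `parent` until it has exactly n_nodes and n_edges."""
    net = parent.copy()
    if n_nodes < len(net.depth) or n_edges - len(net.edges) < n_nodes - len(net.depth):
        raise ValueError("target is unreachable with add-node/add-connection only")
    while len(net.depth) < n_nodes or len(net.edges) < n_edges:
        nodes_left = n_nodes - len(net.depth)
        edges_left = n_edges - len(net.edges)
        candidates = feedforward_candidates(net)
        # Every add-node also adds one edge, so keep edges_left >= nodes_left.
        must_add_node = nodes_left > 0 and (edges_left == nodes_left or not candidates)
        if must_add_node or (nodes_left > 0 and rng.random() < 0.5):
            add_node(net, rng)
        elif candidates:
            net.edges.add(rng.choice(candidates))
        else:
            raise ValueError("network is saturated; no feedforward connection left")
    return net


def residual(metric, real, parent, n_null=10, seed=0):
    """metric(real) minus the mean metric over n_null size-matched null models."""
    rng = random.Random(seed)
    nulls = [null_model(parent, len(real.depth), len(real.edges), rng)
             for _ in range(n_null)]
    return metric(real) - mean(metric(net) for net in nulls)


# Hypothetical parent: three inputs wired directly to one output.
parent = Net({0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0}, [(0, 3), (1, 3), (2, 3)])
null = null_model(parent, n_nodes=6, n_edges=8, rng=random.Random(1))
print(len(null.depth), len(null.edges))  # 6 8
```

Any metric that depends only on node and edge counts yields a residual of exactly zero under this construction, which is the intended property: only structural differences beyond size survive the subtraction.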
Another way to check whether modular genomes provide an evolutionary advantage in Picbreeder is to see whether there exists a positive correlation between (residual) modularity and fitness, where fitness is expressed as the number of times an image was branched and subsequently saved by a user. There exists a significant relationship between residual modularity and fitness (Pearson's correlation coefficient: 0.026, p = 0.012), suggesting that modularity does indeed have a positive effect on successful reproduction (Figure 6a).
Ideally, we would test whether modularity correlates with canalization, but we currently have no general way of quantifying canalization. Instead, we examined whether the algorithmically detected modules correspond to our manually annotated decompositions of the genomes, which would indicate that the algorithmically detected modules are indeed associated with a particular function. Although coarser than the manual annotations, the automatically detected modules (in terms of connections) do correspond roughly to the manually labeled modules for the Spotlight Casting Shadow image (Figure 7). However, this alignment is less clear for other images, especially when the objects in the image overlap more (SI Section 2). Thus, while we have shown that modularity provides an evolutionary advantage, it is still an open question to what degree modularity leads to canalization in Picbreeder images.
Second, we quantitatively investigated the role of hierarchy within the subject genomes. As with modularity, we first examined whether Picbreeder networks are more hierarchical than randomly generated networks. To do so, we quantified network hierarchy based on a metric described by Mones et al. (2012). The idea of this metric is that, in hierarchical networks, a small number of nodes have a large influence while most nodes have little influence, whereas in non-hierarchical networks nodes tend to have more similar levels of influence. Thus, if a network is hierarchically organized, we expect a greater variance in node influence within the network than in a non-hierarchical network. Mones et al. (2012) quantify influence in terms of local reaching centrality (LRC), originally defined as a function of the number of reachable nodes and the weights along the paths to those nodes. Because we are interested in CPPN structure regardless of weights, we define LRC solely based on the number of reachable nodes, as described in previous work. Given an LRC value for each node, the raw hierarchy can be calculated as the average of the normalized differences between each node's LRC and the maximum LRC in the network. As in previous work that measured the hierarchy of feedforward networks with this metric, we reverse all edges before applying the measure, to avoid certain pathological results. In addition, to control for the effect of evolutionary constraints and interdependence between related networks, the average raw hierarchy of the ten null models described previously was subtracted from the raw hierarchy score of the original network to arrive at the residual hierarchy score reported throughout this article.
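The unweighted hierarchy measure described above can be sketched in a few lines. The sketch assumes the global reaching centrality form of Mones et al. (2012), GRC = Σ_i [LRC_max − LRC(i)] / (N − 1), with LRC(i) taken as the fraction of the other N − 1 nodes reachable from node i; the function names are illustrative.

```python
from collections import defaultdict


def reachable_count(adj, start):
    """Number of nodes reachable from `start` (excluding itself), via iterative DFS."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) - 1


def global_reaching_centrality(nodes, edges, reverse=True):
    """Unweighted hierarchy: mean gap between each node's LRC and the maximum LRC."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes")
    adj = defaultdict(list)
    for src, dst in edges:
        if reverse:
            src, dst = dst, src  # reverse edges, as done for feedforward CPPNs
        adj[src].append(dst)
    n = len(nodes)
    lrc = {v: reachable_count(adj, v) / (n - 1) for v in nodes}
    top = max(lrc.values())
    return sum(top - c for c in lrc.values()) / (n - 1)


star = [(0, 1), (0, 2), (0, 3)]
chain = [(0, 1), (1, 2), (2, 3)]
print(global_reaching_centrality([0, 1, 2, 3], star, reverse=False))              # 1.0
print(round(global_reaching_centrality([0, 1, 2, 3], chain, reverse=False), 4))  # 0.6667

# A tiny CPPN-like network: two inputs feeding one output.
print(global_reaching_centrality([0, 1, 2], [(0, 2), (1, 2)]))                 # 1.0
print(global_reaching_centrality([0, 1, 2], [(0, 2), (1, 2)], reverse=False))  # 0.25
```

The last two lines show why the edge direction matters for feedforward networks: the same structure scores very differently depending on whether influence is measured from the inputs or from the outputs.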
On average, Picbreeder networks are significantly more hierarchical than the randomly generated null models (median residual hierarchy: 0.00018 [1.1 × 10⁻¹⁶, 0.00052] with 95% bootstrapped confidence intervals, p = 8.0 × 10⁻¹¹⁸, Wilcoxon signed-rank test). In addition, there exists a significant and positive correlation between residual hierarchy and fitness (Pearson's correlation coefficient: 0.037, p = 0.00028), where networks with a higher residual hierarchy have an increased fitness (Figure 6b). Thus, it appears that in addition to modularity, hierarchy also has a positive effect on the reproductive success of images, meaning that users unknowingly select for genomes that have these organizational properties. Because the hierarchy measure does not provide a hierarchical decomposition, we cannot check whether the measured hierarchy does indeed correspond to the manually observed hierarchy. Thus, testing to what degree hierarchy leads to canalization in Picbreeder images remains a topic for future work.
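The confidence intervals reported above are bootstrapped intervals around the median residual score. A minimal percentile-bootstrap sketch is shown below; the percentile method, the number of resamples, and the seed are assumptions, since the exact bootstrap variant is not specified in this section.

```python
import random
from statistics import median


def bootstrap_median_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Sample median plus a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the median of each resample.
    boot = sorted(median(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = boot[int(n_boot * alpha / 2)]
    upper = boot[int(n_boot * (1 - alpha / 2)) - 1]
    return median(values), lower, upper


# Hypothetical residual scores (one per network); not data from the study.
residuals = [0.004, -0.001, 0.006, 0.002, 0.0, 0.005, 0.003, -0.002, 0.007]
print(bootstrap_median_ci(residuals))
```

If SciPy is available, applying scipy.stats.wilcoxon to the same residuals gives a signed-rank test of whether they are centered on zero, which is one plausible way to reproduce the kind of test reported above.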
Thus far, we have shown images that have canalized various dimensions of variation, as well as an example of how, in some cases, structural organization can be the root of such canalization. This leaves us with the question of how these canalizations behave over evolutionary time. To answer this question, we examined the role of single connections in all ancestors of an image titled “Man Standing Silhouette,” a descendant of the Spotlight Casting Shadow image. Because the NEAT algorithm in Picbreeder labels connections with historical markings, it is straightforward to track connections over generations. We found that these canalizations and their genetic causes persist across evolutionary time, serving similar functional roles even in very different-looking images (Figure 8). The fact that these innovations are preserved over evolutionary time may explain why canalization appears to be so ubiquitous on Picbreeder; even if the emergence of these innovations is rare, their persistence means that genomes can accumulate them over time.
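Because connections carry historical markings, tracking them over a lineage reduces to set and dictionary operations keyed by innovation number. The sketch below assumes a hypothetical genome format (a dict from innovation number to a (source, target, weight) tuple) and an ordered list of ancestors; it is not Picbreeder's data format.

```python
def persistent_innovations(lineage):
    """Innovation numbers present in every genome of an ordered lineage."""
    common = set(lineage[0])
    for genome in lineage[1:]:
        common.intersection_update(genome)
    return sorted(common)


def weight_history(lineage, innovation):
    """Weight of one historically marked connection in each ancestor (None if absent)."""
    return [genome[innovation][2] if innovation in genome else None
            for genome in lineage]


# Hypothetical three-generation lineage: innovation -> (source, target, weight).
lineage = [
    {1: (0, 3, 0.5), 2: (1, 3, -1.0)},
    {1: (0, 3, 0.7), 2: (1, 3, -1.0), 5: (0, 4, 0.2)},
    {1: (0, 3, 0.9), 5: (0, 4, 0.3)},
]
print(persistent_innovations(lineage))  # [1]
print(weight_history(lineage, 2))       # [-1.0, -1.0, None]
```

The functional role of a tracked connection still has to be determined separately, for example by perturbing its weight and inspecting the resulting image.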
Similarly, once an interesting image structure is discovered, the genetic structures that encode it can be preserved throughout the evolutionary process. One especially striking example of this process comes from an image named “Ghost Face Spooky,” which contains a genetic structure that gives rise to a protoface (Figure 9, left, and SI Figure 41). In its descendants, the underlying image concept (e.g., a face) is still present, but it can be altered in a variety of ways to result in very different face images (Figure 9). In other words, the nodes and connections that produce the face concept are roughly preserved, but the exact weights and connectivity of those structures, and thus the pixel-by-pixel image pattern that results from them, can change dramatically. Thus, the genetic structures that give rise to this face concept functionally act as a “face” module that is preserved yet modified throughout generations. This phenomenon is somewhat reminiscent of adaptive radiations, where once an evolutionary innovation is discovered (e.g., the four-legged body plan), there is a cascade of new species that take advantage of the new innovation, but apply it in very different ways (e.g., elephants, dolphins, crocodiles, kangaroos, various apes).
While we have shown examples of canalization in Picbreeder images, it is still unclear why canalizations have evolved in this system, but not in others. Clearly, the user somehow selects for genomes that have canalized various dimensions of variation. However, the genome is not visible to the user, and canalization, being a property solely describing how an image might change, is not directly discernible from the image itself. In fact, one might have expected the Dolphin (Figure 1, right) to be better canalized, but it is not, potentially explaining why its descendants are generally of poor quality and have a low fitness (in terms of the number of times branched). Thus, to answer why canalizations have evolved on Picbreeder, we will have to examine how a collection of independent users was able to consistently select for canalized genomes.
One concept that has often been presented as a driver for the emergence of structural organization and evolvability, if not canalization directly, is the idea of a changing, rather than a static, environment [33, 56]. While previous research generally alternated between a few fixed environments [33, 56], the Picbreeder system takes such selection to extremes: the multitude of different users, and the ways those users may change their objectives, result in a highly dynamic environment with selection in many different directions that continually change over evolutionary time. Such a selection regime has also been referred to as divergent search. The divergent nature of Picbreeder has been posited as an essential property for its success, as the resulting images often do not resemble the intermediate stepping stones (see also Figure 9). Consider the case of evolving the Spotlight Casting Shadow image (Figure 8, middle). Previous research has shown that direct selection for any particular target image, such as by taking the difference in pixel intensities as a fitness measure, only works for very simple shapes, such as the circle of the spotlight. More complex shapes, like the object with its shadow, are unlikely to ever be discovered this way. However, from its evolutionary history we know that to discover the Spotlight Casting Shadow image, we may first have to select for something that resembles a doorknob with a keyhole (Figure 8, left). While we know the evolutionary history for this particular image, there is no way of knowing the intermediate stepping stones for any yet-to-be-evolved image. Evolving the image of a house might require first selecting images that resemble a fire hose, a teakettle, and a school bus. The many different selection pressures present in a system like Picbreeder circumvent the issue of unknown stepping stones by providing evolutionary advantages for anything that looks interestingly different, thus preserving all potential stepping stones.
To understand how divergent search can increase evolvability, it is helpful to examine what happens with the genome under different selection regimes. When a genome is subject to selection towards a particular goal, the genome tends to expand in size as it collects and preserves small beneficial mutations. For example, in the domain of images, such mutations may cause a small number of pixels to get closer to their desired intensities. This way, the genome incrementally acquires a structure that allows for small, local changes. The downside of such genomic growth is that, in the extreme, every aspect of the phenotype can be changed independently, making coordinated changes much less likely. In the Spotlight Casting Shadow image, changing the size of the spotlight would require hundreds of coordinated mutations if every pixel had to be adjusted independently. Such fine-grained genetic representations are far less likely to evolve in a system with a divergent selection regime, such as Picbreeder. When there is selection for interesting change, rather than selection towards a particular goal, the mutations most likely to be preserved are those that have large, yet coordinated effects (large uncoordinated effects, such as flipping every pixel in the image randomly, are generally not considered interesting). For example, in the Doorknob image (Figure 8, left), a user is much more likely to select an image where the size of the keyhole is changed as a whole than an image where some individual pixels of the keyhole change color, or where every pixel in the image changes color. This way, selection for interesting change is likely to result in genetic structures that favor coordinated changes over both small incremental changes and large uncoordinated changes.
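The contrast between coordinated and pixel-level change can be illustrated with a toy example (not Picbreeder's actual CPPN): under an indirect encoding, a single hypothetical radius gene determines every pixel of a disc, whereas under a direct one-gene-per-pixel encoding each mutation changes exactly one pixel.

```python
def spotlight(radius, size=21):
    """Toy indirect encoding: one 'radius' gene sets every pixel of a centered disc."""
    center = (size - 1) / 2
    return [[1 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 0
             for x in range(size)]
            for y in range(size)]


def pixels_changed(a, b):
    """Number of pixels that differ between two equally sized binary images."""
    return sum(p != q for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


# One mutation of the radius gene (5 -> 7) grows the whole disc coherently.
print(pixels_changed(spotlight(5), spotlight(7)))  # 68
```

Reproducing the same enlargement under a direct encoding would take 68 independent pixel mutations on this small grid, each of which would have to be the right one.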
While single-objective algorithms [29, 35] (or algorithms with a few fixed objectives) are still the norm within the field of evolutionary computation [29, 30, 35], a new family of evolutionary search algorithms explicitly focuses on divergent search [67, 68, 87, 90, 103]. These algorithms, which are often referred to as illumination algorithms or quality diversity algorithms [87, 103], attempt to find the unknown stepping stones towards solutions by selecting for individuals that are interestingly different from anything found before. The main challenge for these algorithms is to quantify “interestingness,” because most problem domains allow individuals to be different in ways that are unlikely to result in stepping stones towards anything (e.g., white noise in image space). The algorithm known as Novelty Search selects individuals with the help of a distance function, such that individuals that are far away from previously discovered individuals have an evolutionary advantage [67, 68]. Essential for the success of Novelty Search is the choice of distance function, because this function determines whether any particular difference is interesting [67, 68, 83]. Another algorithm, known as MAP-Elites, offers a large number of different niches reserved for individuals with particular phenotypic characteristics. Here the success of the algorithm depends on the choice of characteristics, which need to be descriptive enough to preserve potential stepping stones without making the number of niches intractably large [25, 87, 120]. Directly based on MAP-Elites is an algorithm known as the Innovation Engine, where the different niches are not merely reserved for individuals with different phenotypic characteristics, but each niche may have a completely different fitness function. For example, when evolving images, one niche may favor individuals resembling cars, while another niche may favor individuals resembling dolphins. In the basic implementation the niches are determined in advance, and thus the choice of niches will determine the success of the algorithm, but in future implementations, niches may be determined dynamically. While all of these algorithms incorporate the divergent, goalless property of Picbreeder to a greater or lesser extent, none of them have thus far explicitly reported forms of canalization. Examining the genomes produced by these algorithms, and looking into the differences between them and Picbreeder, may shed more light on the origins of canalization and evolvability.
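The two selection mechanisms named above can be sketched compactly. The novelty score below follows the common formulation of Novelty Search as the mean distance to the k nearest previously seen behaviors, and the archive update follows the basic MAP-Elites rule of keeping the fittest individual per niche. The behavior vectors, the value of k, the descriptor range [0, 1], the grid resolution, and the individual names are all illustrative assumptions.

```python
import math


def novelty(behavior, seen, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest neighbors in `seen`."""
    if not seen:
        return math.inf  # nothing to compare against yet
    nearest = sorted(math.dist(behavior, other) for other in seen)[:k]
    return sum(nearest) / len(nearest)


def map_elites_insert(archive, candidate, fitness, descriptor, bins=10):
    """Keep `candidate` if its niche is empty or it beats that niche's elite.

    descriptor: tuple of phenotypic characteristics, each assumed to lie in [0, 1].
    """
    cell = tuple(min(int(d * bins), bins - 1) for d in descriptor)
    if cell not in archive or fitness > archive[cell][1]:
        archive[cell] = (candidate, fitness)
        return True
    return False


print(novelty((0.0, 0.0), [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (10.0, 10.0)], k=2))  # 1.5

archive = {}
map_elites_insert(archive, "img-a", 0.5, (0.12, 0.90))
map_elites_insert(archive, "img-b", 0.4, (0.15, 0.95))  # same niche, lower fitness: rejected
map_elites_insert(archive, "img-c", 0.7, (0.18, 0.91))  # same niche, higher fitness: replaces img-a
print(archive)  # {(1, 9): ('img-c', 0.7)}
```

In both cases the crucial design decision lies outside the code shown here: the behavior space fed to novelty and the descriptors used to define niches determine which differences count as interesting.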
It is debatable to what extent the ever-shifting Picbreeder environment resembles natural evolution. On the one hand, natural evolution involves the presence of many different niches and environments, which may appear, disappear, and change over time [32, 38, 91, 102], thus resulting in divergent selection. On the other hand, natural evolution also includes long periods in which environments remain stable, emphasizing the effects of directional and stabilizing selection towards exploiting established niches [37, 71], selective forces that are mostly absent in the Picbreeder system (there are few evolutionary advantages for an image to remain identical, because users are unlikely to publish an image if they failed to produce at least some kind of change). It is probably a combination of both forces that defines natural evolution, which might explain why natural genomes not only contain potential for variation, but also feature many innovations whose main purpose is to reduce arbitrary variation and mutations in descendants.
Our understanding of the relationship between canalization and genomic structural organization remains incomplete. It is clear that, in some cases, structural organization can directly lead to canalization. However, we have also observed cases where networks with low modularity and hierarchy scores have canalized various dimensions of variation, and it is not hard to imagine a network that features structural organization, but not canalization. For this reason, we expect these properties to be correlated, but not always causally related. Given a good quantification of canalization, it would be straightforward to test this hypothesis, but for now that remains an open challenge for future work.
While ubiquitous in nature, canalization—the propensity of genomic structures to be mutationally robust against changes in some dimensions of variation (ways in which individual or combinations of phenotypic traits can change), whereas other, possibly more adaptive dimensions of variation, are free to vary—rarely emerges in computational simulation. We have shown the emergence of canalization in the goalless and open-ended system of Picbreeder, a website for the interactive evolution of images. An example was investigated where the canalizations are the result of structural organization in the genotype, in the form of modularity and hierarchy, and such genomic structural organization increases the reproductive success of individuals. Lastly, we argued that the divergent, goal-free nature of Picbreeder may be an important driver for the spontaneous evolution of canalization.
Support came from the Santa Fe Institute to K.S. and J.C., a National Science Foundation CAREER award (CAREER: 1453549) to J.C., and a National Science Foundation Robust Intelligence grant no. IIS-1421925 to K.S. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
The supplementary information (SI) can be found at http://www.evolvingai.org/PicbreederCanalization.