I describe the Utrecht Machine (UM), a discrete artificial regulatory network designed for studying how evolution discovers biochemical computation mechanisms. The corresponding binary genome format is compatible with gene deletion, duplication, and recombination. In the simulation presented here, an agent consisting of two UMs, a sender and a receiver, must encode, transmit, and decode a binary word over time using the narrow communication channel between them. This communication problem has chicken-and-egg structure in that a sending mechanism is useless without a corresponding receiving mechanism. An in-depth case study reveals that a coincidence creates a minimal partial solution, from which a sequence of partial sending and receiving mechanisms evolve. Gene duplications contribute by enlarging the regulatory network. Analysis of 60,000 sample runs under a variety of parameter settings confirms that crossover accelerates evolution, that stronger selection tends to find clumsier solutions and finds them more slowly, and that there is implicit selection for robust mechanisms and genomes at the codon level. Typical solutions associate each input bit with an activation speed and combine them almost additively. The parents of breakthrough organisms sometimes have lower fitness scores than others in the population, indicating that populations can cross valleys in the fitness landscape via outlying members. The simulation exhibits back mutations and population-level memory effects not accounted for in traditional population genetics models. Altogether, these phenomena suggest that new evolutionary models are needed that incorporate regulatory network structure.
Recent investigations into the biological evolution of language have proposed a number of fascinating models of exactly what the benefits of communication and shared linguistic abilities are, and how learning, social structure, and other non-selectional forces contribute. Many of these take the form of simulations or formal mathematical models. For example, there are numerous abstract models based on communication game dynamics [11, 43, 44, 53–57, 60, 61, 63–65, 67, 71], iterated learning [8, 9, 23, 37, 62], and neural networks. Language involves sensory, motor, and cognitive machinery that all serves other purposes, such as breathing, mimicking gestures, and producing alarm calls. There is reason to believe that the language faculty arose when selection favored mutants bearing more strongly neurologically connected variants of those components that yielded superior performance on speech production and interpretation tasks [28, 73, 74]. However, there is a gap in the literature between philosophical arguments, neuroscience, and molecular genetics. Until that gap is filled, there can be no complete understanding of how those components were exapted to form the language faculty.
As an example of a problem in this gap, consider how a combinatorial signaling system might evolve. Suppose a simple communication system is in place, where each of a finite list of distinct signals maps to one meaning. In a noisy environment with enough meanings, an information-theoretic constraint limits the fitness of such a system. In contrast, a combinatorial system represents each meaning as a string of sounds from a finite inventory, so it can construct an unlimited lexicon of distinguishable words and bypass this limit. Thus, under appropriate circumstances the combinatorial system can invade. However, it is unclear how a population would develop a combinatorial system in the first place. Zuidema and de Boer partially address this problem, but the ability to produce words over time is built into their simulation rather than discovered.
The simulation explored in this article focuses on a simplified foundational example of how natural computational machinery might evolve: discovery of a mechanism for transmitting a binary word through a narrow signaling channel by making clever use of timing. Combinatorial phonology is a similar task that arises in language. In another context, when an animal performs a memorized gesture, a sequence of muscle movements stored as a complete memory must be replayed in sequence along neural fibers to groups of muscles. How might such a system evolve, given that natural computation uses asynchronous chemical reactions as the fundamental operation? A particularly interesting complication is the chicken-and-egg nature of the problem: A perfect encoding mechanism is useless without a corresponding decoding mechanism.
The heart of the simulation is the Utrecht Machine (UM), an artificial regulatory network (ARN) capable of general computations. Its binary representation is compatible with a variety of mutations, including substitutions, deletions, and duplications, as well as recombination. UM simulations are therefore ideal for studying how evolution might discover mechanisms for performing complex calculations in higher organisms, as well as the interactions between population-level forces such as frequency-dependent fitness, and molecular processes such as gene duplication. Simulating the evolution of solutions to coding and transmission problems is meant to be a first step toward understanding how neural networks evolve, and more broadly, how the computational machinery required for syntactic tree manipulation and other language processing tasks might have evolved [2, 77].
This article focuses on how UM-based agents evolve solutions to a communication problem and how selection strength interacts with certain aspects of molecular genetics. As will be explained in Sections 1.1 and 1.2, simulations based on ARNs produce computational mechanisms that resemble biochemical reaction networks. The selective breeding process discovers these mechanisms with the aid of mutations that resemble those of molecular genetics. Thus, evolutionary dynamics observed in the UM simulation provide stronger support for the significance of corresponding dynamics in biological evolution than can be derived from simulations based on other principles.
Several such dynamics are discussed in a detailed case study of one sample run of the simulation in Section 4. This case study exhibits several notable phenomena related to the UM's resemblance to biochemistry: the use of gene duplication, beneficial back mutations, and assembly of gene networks via recombination.
Successful agents in the UM simulation include several genes that are duplicates of a common ancestor. The evolutionary process takes advantage of gene duplication as a means of adjusting the dosage of simulated gene products, and through neofunctionalization; that is, one copy of a gene maintains its original function, freeing others to mutate so as to restructure the regulatory network [18, 70].
The history of a genome explored in detail in the case study includes a mutation that reverses an earlier mutation. Back mutations are normally assumed to be so unlikely that many models explicitly exclude them, as in, for example, the infinite alleles model. This case study suggests that such assumptions may need to be revisited.
As will be discussed in Section 5.1, recombination significantly reduces the time required for the UM simulation to discover solutions. The bit transmission problem requires a sending mechanism, a compatible receiving mechanism, and a timing mechanism. Small-scale mutations, such as substitutions, refine those mechanisms and build an inventory of variations in the population. However, in the case study, all important punctuation events involve recombination of existing gene networks into a novel, better-functioning whole.
Innovations can arise from outliers, that is, members of the population that are not the most highly rated at the time. It seems that highly rated agents tend to cluster near peaks and ridges in the fitness landscape. Peripheral genomes with lower ratings are more likely to yield offspring that lie on a different hill that may have a higher peak.
In Section 5, I present results from 10,000 samples using each of six different settings for selection strength and variants of the recombination mechanism. These confirm that recombination speeds up evolution by an order of magnitude. Stronger selection delays the discovery of solutions, and steers the population toward atypical solutions. The population also experiences implicit selective pressure favoring solutions that are robust against mutation. Furthermore, the time required for the last major innovation is weighted toward shorter times, indicating a memory effect not accounted for by typical population genetics models [20, 26, 65].
In summary, the detailed and aggregate analyses of sample runs of the UM simulation discussed in this article clearly reveal the significance of features of evolutionary dynamics that are often obscured or omitted in other models. The UM simulation serves as an informative first step toward understanding how biochemical reaction networks evolve the machinery necessary to carry out computations.
1.1 Comparison of Execution Models
In an evolutionary simulation, a virtual machine or other execution model takes the place of biochemical reactions in performing some simplified abstraction of life functions. Some existing execution models are far removed from biochemistry, sometimes even obscuring the distinction between genotype and phenotype. While such a simulation may yield insight into population dynamics or solve an engineering problem, it is unlikely to provide much insight into how living cells function and how those functions developed. Many such simulations use execution models that fall into the following categories:
Numerical parameters: A binary genome encodes a list of flags, integers, and floating-point numbers, which are fed as parameters to an externally specified computation. To give some examples, several projects use such a representation for weights in an artificial neural network [42, 82, 85].
Expression tree: The genome is an S-expression evaluated by a Lisp interpreter. As an example, the ECJ software package includes implementations of several genetic algorithms, including some with expression tree genomes.
Register machine: The genome is a sequence of instructions similar to what might be understood by a conventional CPU, but with modifications to make them more amenable to evolution. Two notable projects using register machines are Avida and Plazzmid.
1.2 Artificial Regulatory Networks
An artificial regulatory network (ARN) is a different kind of execution model designed to more directly reflect biochemistry. The state of a biological computation lies in which molecules exist where and at what concentrations. Calculations are performed by reaction, diffusion, and transport, and many happen simultaneously. In particular, genes are translated into proteins, which interact with the environment and regulate gene translation, resulting in a gene regulatory network (GRN) that is frequently modeled as a dynamical system. Since the goal of this project is to develop simulations of the discovery of biological computational mechanisms, ARNs are a more suitable execution model than the others.
Several ARNs have been described in the literature. A discrete example is the Boolean network, consisting of a directed graph where each vertex is associated with a Boolean state and a function that updates that state based on the states of neighboring vertices [21, 36, 79]. Each vertex represents a gene or operon that is either on or off. Vertex update functions model how products of genes enable or disable other genes.
ARNs can also be built using continuous-time mass-action kinetics. Typically, such a network includes a variable for each gene that represents the extent to which it is active, and one for the amount of each gene product in solution [31, 32]. Activation levels change at a rate that is a potentially complicated function of the amounts of various products, and those in turn determine the production rates of products. ARNs of this type are high-dimensional nonlinear dynamical systems.
Since the evolutionary origins and history of biochemical computing mechanisms are also of interest, it will be necessary to formulate an ARN with an encoding analogous to DNA. For example, the genetic code used in the aevol project [39, 41] includes analogues of promoters, operators, introns, exons, and start and stop codons. These are used to construct a network of virtual proteins and ultimately a fuzzy set or function. However, since the phenotype in aevol is essentially static, it is not appropriate for studying temporal coding problems, including the bit transmission problem explored in this article. Knabe et al. used an ARN with a similarly realistic genome and discrete-time dynamics to evolve oscillators.
2 The Utrecht Machine
This section describes the Utrecht Machine (UM), the evolutionary simulation, and the rationale behind their design. The UM was designed to bridge the subfields of molecular genetics, population genetics, and biocomputation, at a level of detail that is informative but not overwhelming. The history of all agents is recorded in a database, which makes it possible to recover in full detail the intermediate stages of the population. The history of any gene can be traced. Any run can be reproduced by using the same configuration and random seed. The complete inner workings of each agent are fully accessible through both command line programs and a graphical user interface (shown in Figure 1). Further implementation details are included in the online supplement.
2.1 UM State and Instructions
The UM's state is an activation table A mapping integers called chemical patterns p to integers A[p] called activation levels. Patterns are analogous to sequences of promoter, enhancer, and operator binding sites near a protein coding region, and to regulatory proteins with matching DNA binding domains such as transcription factors, activators, and repressors. An activation level A[p] is an abstraction of the count of activating molecules minus inhibiting molecules for a binding site pattern p. If A[p] > 0 then activation exceeds inhibition, and if A[p] < 0 then inhibition exceeds activation.
Because each pattern is associated with an activation level rather than just being on or off, the UM represents chemical memory and time delays more directly than a Boolean network does. Execution of UM programs is also generally faster than the numerical integration required for mass-action kinetics.
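A single UM time step can be sketched in a few lines of Python. This is an illustrative sketch, not the simulation's actual code: the instruction fields (s, theta, p_up, p_down) follow the instruction form A[s] ≥ θ ⇒ inc A[p_up], dec A[p_down] given in Section 2.3, and the assumption that all active instructions update the table based on the pre-step state is mine.

```python
from collections import defaultdict

# Minimal sketch of one UM time step (illustrative names, not the paper's code).
# Each instruction fires when A[s] >= theta, incrementing A[p_up] and
# decrementing A[p_down]. The active set is computed from the pre-step state.

def um_step(A, instructions):
    """Apply every active instruction to the activation table A in place."""
    active = [ins for ins in instructions if A[ins["s"]] >= ins["theta"]]
    for ins in active:
        A[ins["p_up"]] += 1
        A[ins["p_down"]] -= 1

A = defaultdict(int)
prog = [{"s": 4, "theta": 0, "p_up": 7, "p_down": 3}]
um_step(A, prog)          # A[4] = 0 meets theta = 0, so the instruction fires
assert A[7] == 1 and A[3] == -1
```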
2.2 Input and Output, Channels, Synapses, and Colonies
One bit of input per time step is provided by externally increasing the activation level of a particular pattern during each time step at which the input bit is set. Any number of such inputs can be configured. A channel specifies that a particular pattern's activation is read, and an output integer is emitted when the activation is greater than or equal to a given threshold. The output integer could be interpreted as the index of a bit in an output word to be set, or a pattern to activate in another UM. Two UMs can be linked by such a channel, which is called a synapse. A network of UMs linked by synapses is called a colony. A UM may be configured to output any of several integers, and any set of them may be emitted each time step. By convention, output integer −1 is the stop signal, which indicates that the computation is complete.
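A synapse of this kind reduces to a threshold check each time step. The sketch below links two activation tables under the semantics just described; all names are illustrative and the dictionary representation is an assumption.

```python
# Sketch of a synapse linking two UMs (Section 2.2): when the watched pattern
# in the sender meets the channel threshold, the synapse bumps the target
# pattern in the receiver. Names are illustrative, not the simulation's.

def synapse_step(sender_A, receiver_A, read_pattern, threshold, write_pattern):
    """Return True, and increment the receiver pattern, if the synapse fires."""
    fired = sender_A.get(read_pattern, 0) >= threshold
    if fired:
        receiver_A[write_pattern] = receiver_A.get(write_pattern, 0) + 1
    return fired

sender, receiver = {3: 10}, {}
assert synapse_step(sender, receiver, 3, 10, 6) is True
assert receiver[6] == 1
```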
2.3 Efficacious Genes
Generally, only a few genes in any particular genome are part of the regulatory network of interest. Such genes and the instructions they encode will be called efficacious, defined as follows. Given a genome, split each instruction A[s] ≥ θ ⇒ inc A[p_up], dec A[p_down] that is ever active into two arrows (directed edges) s → p_up and s → p_down to form a directed graph. Formally, the efficacious arrows are the directed edges in the maximal directed subgraph thereof from which any pattern node used for output is reachable. An instruction and the gene that encodes it are said to be efficacious if one of the instruction's arrows is efficacious. Algorithmic details are given in Section A of the supplement. Unless otherwise noted, UM diagrams such as Figure 2 show efficacious arrows but no others.
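The definition above amounts to a reverse reachability computation: keep exactly the arrows whose head can reach some output pattern. A sketch under that reading (illustrative names; the supplement's actual algorithm may differ):

```python
from collections import defaultdict, deque

# Sketch of the efficacious-arrow computation (Section 2.3). A reverse BFS
# from the output patterns finds every node that can reach an output; an
# arrow s -> p is efficacious when p is in that set.

def efficacious_arrows(edges, output_patterns):
    rev = defaultdict(list)                 # p -> [s, ...] reversed adjacency
    for s, p in edges:
        rev[p].append(s)
    reach = set(output_patterns)            # nodes that can reach an output
    queue = deque(output_patterns)
    while queue:
        node = queue.popleft()
        for pred in rev[node]:
            if pred not in reach:
                reach.add(pred)
                queue.append(pred)
    return [(s, p) for s, p in edges if p in reach]

edges = [(4, 7), (7, 0), (5, 9)]            # 5 -> 9 leads to no output pattern
assert efficacious_arrows(edges, output_patterns=[0]) == [(4, 7), (7, 0)]
```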
2.4 The nand Example
To give an example, Figure 2 shows a UM that computes the nand operation, that is, q0 = ¬(p0 ∧ p1). The inputs p0 and p1 are fed to the machine through patterns 4 and 5: When p0 is on, 1 is added to A[4] each time step, and when p1 is on, 1 is added to A[5]. Unless both of the inputs are on, instructions 1 and 2 add enough to the activation of the pattern monitored by the output channel that the output integer 0 is eventually emitted, meaning q0 = 1. However, if both inputs are on, instructions 3 and 4 become active, and they keep that activation below 3, so output 0 is never emitted, meaning q0 = 0. After 4 time steps, the activation of the pattern monitored by the stop channel reaches 4, which meets the threshold to emit the stop signal −1. Then the computation is complete: q0 = 1 if the machine is emitting output 0, and q0 = 0 if not.
2.5 Genomic Encoding
For the purposes of evolutionary simulation, a UM program can be encoded as a bit string. The term gene will be used both for a substring that represents exactly one instruction and for the instruction itself. The first layer of encoding is the instruction bit string. Each instruction is represented by concatenating base-two representations of its component integers. The instruction bit string is further encoded as a genome bit string using a majority-of-three code. The genome bit string is interpreted by splitting it into triples of bits, called codons. A codon consisting of two or three set bits in any order translates to an instruction bit of 1. Other codons have two or three zeros and translate to 0. The instruction bit string resulting from the translated codons is broken into substrings of the proper length, which are then interpreted as integers, and from there into a list of UM instructions. Each possible instruction has a unique representation as an instruction bit string, but many possible representations as a genome bit string. See Figures 3 and 4 for a possible genome of the example nand machine.
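The majority-of-three translation is compact enough to state in one line of Python. A sketch (the function name is mine):

```python
# Sketch of the majority-of-three decoding of Section 2.5: each 3-bit codon
# translates to the majority value of its bits.

def decode_genome(genome_bits):
    """Translate a genome bit string (length a multiple of 3) to instruction bits."""
    assert len(genome_bits) % 3 == 0
    return [int(sum(genome_bits[i:i + 3]) >= 2)
            for i in range(0, len(genome_bits), 3)]

# '110' and '111' both decode to 1; '100' and '000' both decode to 0
assert decode_genome([1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]) == [1, 1, 0, 0]
```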
The majority-of-three encoding is based on the biological genetic code, which uses 64 three-letter codons to refer to 20 or so amino acids plus a few control codes. Consequently, some amino acids are represented by several synonymous codons, each of which has a certain robustness: Some single-letter mutations turn it into another synonym, leaving the resulting protein unchanged. The choice of synonyms is thought to indicate how recently a gene has been subject to selective pressure to produce a protein that matches another molecule, in arms races between pathogen and immune system for example [47, 75, 76]. In other circumstances codon choice may be due to neutral drift, founder effects, or biases in the DNA repair mechanism [6, 30]. A gene that is part of an essential mechanism is likely to accumulate robust codons, because offspring of individuals with a fragile codon are more likely to suffer a mutation that breaks the mechanism. Conversely, there are more possible single-letter substitutions that convert a codon for a different amino acid to one of the fragile codons than to a robust one, so a recently acquired beneficial mutation that resulted in an amino acid change is more likely to result in a fragile codon.
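In the majority-of-three code, the robust/fragile distinction can be checked by exhaustive enumeration: a codon is robust exactly when every single-bit substitution leaves its decoded bit unchanged. The following sketch (illustrative names) confirms that only the two unanimous codons are robust.

```python
from itertools import product

# Enumerate which 3-bit codons are robust under the majority-of-three code:
# robust means every single-bit flip preserves the decoded majority bit.

def majority(codon):
    return int(sum(codon) >= 2)

def is_robust(codon):
    return all(
        majority(codon[:i] + (1 - codon[i],) + codon[i + 1:]) == majority(codon)
        for i in range(3))

robust = [c for c in product((0, 1), repeat=3) if is_robust(c)]
assert robust == [(0, 0, 0), (1, 1, 1)]   # only unanimous codons are robust
```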
2.6 Reproduction, Recombination, and Mutation
Child genomes are produced by combining two parent genomes and randomly mutating the result. The crossover process treats the parents asymmetrically, with one designated as the left parent and the other designated as the right parent. They are aligned at their beginnings, and a crossover index is selected uniformly at random. Both parent genomes are split at the crossover index. The part of the left parent from the beginning to the crossover index becomes the left part of the child genome, and the part of the right parent from the crossover index to the end becomes the right part of the child genome (Figure 5). This design combines features of several natural genome architectures. Each agent is effectively haploid like a bacterium, so there are no redundant or competing alleles, but except for certain specially configured sample runs, each reproduction event is sexual. The mutation process modifies the child genome by randomly toggling a few bits, randomly deleting substrings, and randomly duplicating substrings. A substring is duplicated by inserting a second copy immediately following the original. The simulation would more directly represent the biology if it applied mutations to the parent genomes before crossover, but applying mutations to the child is mathematically equivalent and computationally more efficient, because it does not waste random numbers modifying bits in a parent that will be discarded immediately during crossover.
The simulation always cuts genomes between genes, so that duplication, deletion, and crossover never split a gene. The rationale behind this design choice is that DNA in higher organisms seems to be organized into modules suitable for recombination, although the details are complicated and controversial [13, 14, 22, 35]. Recombination in yeast, for example, is reported to be more frequent at hot spots that are generally intergenic rather than intragenic. The biochemical machinery responsible for crossover seems to favor sites with properties typical of intergenic regions, but even if crossover occurred uniformly at random, crossover points would be more likely to fall in an intergenic gap than within a coding region, simply because gaps are very long. Thus crossover typically shuffles the set of coding regions on a chromosome without altering their contents, which allows evolution to explore different combinations of functional modules with less risk of damaging a coding region. Similarly, the substrings representing instructions in a UM genome are effectively separated by infinitely long noncoding regions that are not actually stored.
There are no short indel mutations in the simulation and therefore no frameshifts. Frameshifts normally irreversibly deactivate a gene, but since the simulation allows gene deletion, there is currently no need for this additional mechanism. If a reason arises to include short indels and frameshifts, a future UM simulation could use an alternative representation of genomes as an unstructured bit string that explicitly stores both coding and noncoding regions. Instead of storing one instruction every so many bits, coding regions (exons) would be delimited by special start and end sequences as in DNA, genes could even overlap as in bacteria, and mutations could toggle the status of any substring [39, 41, 78, 82]. However, that representation requires significantly more storage space and processing time and is beyond the scope of the current article.
2.7 Selective Breeding
The selection protocol is straightforward and flexible: The total population size is fixed at some number K, and each generation contains the same number of new agents. Each agent has a unique identification number, assigned sequentially, and a rating based on how well it solves some problem. The current generation is processed as a sequential list of agents, indexed 0 through K − 1, in order of rating from high to low. Ties are broken by ordering younger agents (with higher identification numbers) before older agents, a seemingly small detail that prevents a certain kind of stagnation. Selection is parameterized by a list of integers (k0, h1, k1, h2, k2, …, hL, kL). To form the next generation, the first step is to build an initial list of survivors consisting of agents with indexes 0 through k0 − 1 in the current generation (elitism). Then k1 pairs of parent agents are selected uniformly at random from among agents with indexes 0 through h1 − 1 of the current generation. A random choice determines which parent in each pair is designated as left or right for the crossover process, and one child is built. Then k2 new agents are created by breeding pairs of parents selected randomly from agents indexed 0 through h2 − 1, and so on. Thus agents near the beginning of the current generation may be given more chances to reproduce than agents near the end. A total of k1 + ⋯ + kL new agents are created in this way and concatenated to the list of survivors, resulting in the next generation. The total number of agents remains constant at K = k0 + k1 + ⋯ + kL.
For example, the weak selection protocol used in the case study in Section 4 is (k0 = 200, h1 = 100, k1 = 200, h2 = 500, k2 = 100), meaning the population size is fixed at K = k0 + k1 + k2 = 500, the best k0 = 200 agents survive, the best h1 = 100 agents breed to produce k1 = 200 children, and all h2 = 500 agents breed to produce the remaining k2 = 100 children for a total of 300 children. This gives the 100 highest-rated genomes more opportunities to reproduce while allowing genes in lower-rated genomes a chance to avoid extinction.
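The protocol above can be sketched as follows. The function and parameter names are mine, not the simulation's; `breed` stands in for the crossover-and-mutation step, and `population` is assumed to be already sorted by rating with ties broken youngest-first.

```python
import random

# Sketch of the selection protocol of Section 2.7. Parameter layout follows
# (k0, h1, k1, ..., hL, kL): keep the top k0 agents, then for each tier
# breed k children from pairs drawn from the top h agents.

def next_generation(population, k0, tiers, breed, rng=random):
    """tiers is a list of (h, k) pairs: breed k children from the top h agents."""
    new_pop = population[:k0]                 # elitism: top k0 survive
    for h, k in tiers:
        pool = population[:h]
        for _ in range(k):
            left, right = rng.sample(pool, 2)  # random pair, random left/right roles
            new_pop.append(breed(left, right))
    return new_pop

# Weak selection from Section 4: K = 500, (k0=200, h1=100, k1=200, h2=500, k2=100)
pop = list(range(500))                        # stand-in agents, best first
child = lambda a, b: ("child", a, b)
gen = next_generation(pop, k0=200, tiers=[(100, 200), (500, 100)], breed=child)
assert len(gen) == 500
assert gen[:200] == pop[:200]                 # survivors lead the next generation
```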
2.8 An Important Detail of the Crossover Algorithm
The crossover function for UM genomes must deal with the possibility that the crossover index lies past the end of one of the parent genomes. There are several ways of dealing with this exceptional case.
One approach, called keep-tail crossover, handles this case as shown in Figure 6. In keep-tail crossover, the crossover index is selected uniformly at random between 0 and the length of the left parent. It is allowed to be past the end of the right parent if the right parent is shorter. In this case, the child genome consists of just the beginning part of the left parent. The name “keep-tail” refers to the fact that the resulting child genome keeps part of the “tail” of the longer parent that extends past the end of the shorter parent.
The motivation for keep-tail crossover was to avoid discarding material from the longer genome. However, it turns out to have the side effect of lengthening genomes on average. Half the time, the left parent is shorter and the child is exactly as long as the longer right parent. Half the time, the left parent is longer and the child is at least as long as the shorter parent, with some chance of being longer. Thus, the average length of the children is greater than or equal to the average lengths of the parents, and over time this length accumulates in the population.
Once that side effect was discovered, additional numerical experiments were conducted using a length-neutral crossover variant for comparison. Under length-neutral crossover, the crossover index is restricted to be no more than the length of the shorter parent, so the exceptional case shown in Figure 6 is disallowed. The length of the child genome will be the length of the right parent, so half the time the child is the same length as the shorter parent and half the time it is the same length as the longer parent, as shown in Figure 5. The average genome length remains the same over time. This variant has the side effect that it is more likely to discard information from the ends of longer parents.
As an additional control, the simulation includes a no-crossover variant, in which the left parent alone is copied for the child genome. This algorithm also preserves the average lengths of genomes, but no recombination takes place at all, so reproduction is effectively asexual.
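The three variants can be sketched over genomes represented as lists of genes, since cuts always fall between genes (Section 2.6). The function names are descriptive labels, not the simulation's identifiers.

```python
import random

# Sketches of the three crossover variants of Sections 2.6 and 2.8, operating
# on genomes as lists of genes.

def keep_tail(left, right, rng=random):
    i = rng.randint(0, len(left))       # index may exceed len(right)
    return left[:i] + right[i:]         # right[i:] is empty when i > len(right)

def length_neutral(left, right, rng=random):
    i = rng.randint(0, min(len(left), len(right)))
    return left[:i] + right[i:]         # child is always as long as right

def no_crossover(left, right, rng=random):
    return list(left)                   # asexual: copy the left parent

left, right = ["L"] * 6, ["R"] * 4
assert len(length_neutral(left, right)) == len(right)
assert 4 <= len(keep_tail(left, right)) <= 6   # never shorter than right
assert no_crossover(left, right) == left
```

Note how `keep_tail` reproduces the length bias described above: with the longer genome as left parent, the child is never shorter than the right parent and is sometimes longer.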
Length-neutral and no-crossover simulations sometimes result in populations where deletions outpace duplications to the extent that one agent's genome is reduced to nothing. For the purposes of this article, such sample runs are rejected and replaced by a new run.
Figure 7 shows the result of evolving a population of 500 agents, replacing 300 of them each generation, with mutation and crossover but no selection at all. Each graph shows 100 sample trajectories of the mean genome length in generations 50, 100, …, 500 as thin lines. Dotted lines are drawn for the mean over all trajectories, and for the mean plus and minus twice the standard deviation. The no-crossover variant and the length-neutral variant show no overall trend in genome size, just variability about the initial size of 32 genes. The keep-tail variant shows roughly linear growth over time at a rate of 0.0238 genes per generation in the graph here (slope of a least-squares line through the means over all samples, generations 200, 250, …, 500). However, the growth rate depends in part on the mean genome length, because if all genomes in a population are very long, the crossover events that generate the added length become rarer. Simulations using all three variants are analyzed in Section 5. Fully resolving this growth process is beyond the scope of the current article.
3 The Bit-Transmission Test Problem
Copy(m, n) is a test problem for the UM evolutionary framework. Each agent is a colony of two UMs, a sender and a receiver, connected through n synaptic channels. A word of m bits 〈b0b1 … bm−1〉 is given as input to the sender. When the calculation stops, a word of m bits 〈c0c1 … cm−1〉 is read as output from the receiver. Points are given for each input bit that is successfully copied to the output. The receiver also signals the end of the computation, and points are given for stopping after fewer time steps. In the case of m > n, which will be explored in this article, agents must evolve some sort of temporal coding to transmit all the bits. (If m ≤ n, there are uninteresting solutions in which each input bit is connected to its own synaptic channel and from there directly to an output bit.)
The Copy(2, 1) problem is of primary interest in this article. Specific patterns in the two UMs must be designated for receiving input and producing output, as shown in Figure 8. No instructions from any genome are included in Figure 8, only the following configuration information, which is fixed and supplied by the simulation to all agents: The two bits of the input word 〈b0b1〉 feed into two designated input patterns of the sender. A synaptic channel reads a designated pattern in the sender during each time step, and if that pattern's activation is at least 10, the channel adds 1 to a designated pattern in the receiver. The calculation stops when a designated stop pattern in the receiver reaches activation 10, or after 100 time steps. Then two designated output patterns in the receiver are read for the output word 〈c0c1〉. The sender also receives input to a designated pattern on every time step to indicate its role as sender; similarly, the receiver receives such input via its own designated pattern. For the binary genome encoding, patterns are six bits wide and thresholds are four bits wide, for a total of 22 instruction bits per gene, or 3 × 22 = 66 genome bits per gene.
For the general Copy(m, n) problem, each genome is translated into an agent and rated as follows. The genome is decoded into instructions. Two UMs, a sender and a receiver, are given those instructions and initialized so that A[p] = 0 for all patterns p. They are then configured into a colony as in Figure 8, with m input bits going into the sender, m output bits coming out of the receiver, and n synaptic channels going from the sender to the receiver. Each possible m-bit input word is given to a copy of this initial state, which runs until the stop signal is emitted or 100 time steps have elapsed. The output word is compared with the input word, and each bit correctly transmitted is worth a correctness point, for a total correctness score of up to m · 2^m. The number of steps the computation takes for each input word is recorded, and a time score of 80 less the number of steps past 20 is assigned, for a total time score of up to 80 × 2^m. To prioritize correct transmission, the overall score is 10,000 times the total correctness score plus the total time score. The Copy(2, 1) problem has a maximum score of 80,320. The baseline score is what would be earned by an agent whose output is always 〈00…〉 while always taking maximum time. Such an agent gets half of the available correctness points and a time score of 0. The baseline score for the Copy(2, 1) problem is 40,000.
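The rating rule can be sketched as follows; `rate` and its arguments are illustrative names, and clamping the per-word time score at 80 for runs shorter than 20 steps is my reading, consistent with the stated per-word maximum.

```python
# Sketch of the Copy(m, n) rating of Section 3 (illustrative names).
# `results` maps each m-bit input word to (output_word, steps).
# Clamping the per-word time score at 80 for runs under 20 steps is an
# assumption consistent with the stated maximum of 80 per input word.

def rate(results):
    correctness = sum(sum(b == c for b, c in zip(inp, out))
                      for inp, (out, _) in results.items())
    time_score = sum(80 - max(0, steps - 20) for _, steps in results.values())
    return 10_000 * correctness + time_score

# A perfect Copy(2, 1) agent echoes every word within 20 steps; the baseline
# agent always outputs <00> and takes the full 100 steps.
perfect = {(b0, b1): ((b0, b1), 20) for b0 in (0, 1) for b1 in (0, 1)}
baseline = {(b0, b1): ((0, 0), 100) for b0 in (0, 1) for b1 in (0, 1)}
assert rate(perfect) == 80_320
assert rate(baseline) == 40_000
```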
A genome and the corresponding instruction list from an agent that earns a maximum score on a Copy(m, n) problem will be called a perfect solution, or just a solution. Genomes and instruction lists whose score is greater than the baseline, but less than the maximum, will be called partial solutions.
A mapping from input words to time series of states of synapses will be called a synaptic code. Each genome has an associated synaptic code, which can be determined by running the agent built from that genome on all possible input words and recording whether each synapse is active or not at each time step. Distinct solutions may use the same synaptic code.
Each genome in the initial population consists of 32 genes of 66 bits each, with every bit set to 0 or 1 independently with equal probability. Most of these initial agents always output 〈00〉 and make no attempt to stop early, thereby earning the baseline rating of 40,000. Typically, a few initial members earn a higher rating, up to 40,320, by stopping early.
4 Case Study: The Copy(2, 1) Problem
The evolutionary simulation can discover solutions to more difficult problems, but in the interest of presenting a problem that can be completely understood in a short time, we consider the simplest nontrivial temporal coding problem, Copy(2, 1). This section describes details obtained by dissecting one particular sample run, using a population of 500 agents and the weak selection protocol: Build k1 = 200 children by breeding the top-scoring h1 = 100 agents, and k2 = 100 children by breeding any of the h2 = 500 agents. The top-scoring 200 parents are retained, so that the population holds steady at 500. (Other selection protocols will be defined and analyzed in Section 5.) The keep-tail variant of the crossover algorithm is used. The network notation described in Figures 2 and 8 is used throughout. Only efficacious arrows are drawn (Section 2.3). For the sender, these are only those instructions that affect when the synapse is active. For the receiver, these are only those instructions that affect the output word and stop signal.
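One generation under the weak selection protocol can be sketched as follows. Here `score` and `breed` are stand-ins for the rating and crossover machinery, and sampling parent pairs uniformly without replacement from the eligible pool is an assumption for illustration.

```python
import random

def next_generation(population, score, breed, rng=random):
    """One generation of the weak selection protocol (population of 500).

    Ties in the real simulation are broken by putting newer genomes first;
    this sketch just sorts by descending score.
    """
    ranked = sorted(population, key=score, reverse=True)
    children = []
    # k1 = 200 children bred from the top-scoring h1 = 100 agents ...
    for _ in range(200):
        a, b = rng.sample(ranked[:100], 2)
        children.append(breed(a, b))
    # ... plus k2 = 100 children bred from any of the h2 = 500 agents.
    for _ in range(100):
        a, b = rng.sample(ranked, 2)
        children.append(breed(a, b))
    # The top-scoring 200 parents are retained, holding the population at 500.
    return ranked[:200] + children
```

With 200 survivors and 300 children, the population size holds steady at 500, matching the protocol described above.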
4.1 Overall Trajectory
A major innovation is the discovery of an agent that can correctly transmit at least one more bit in some input word than any agents that have been seen so far. Such an agent will score roughly 10,000 higher than the previous maximum for each additional bit, possibly a bit less because an improvement to the transmission mechanism may damage the timing mechanism and cost it a few points. The population exhibits punctuated equilibrium, experiencing times of drift, interrupted by major innovations, followed by minor innovations in which the timing mechanism is repaired (Figure 9).
4.2 Initial Population
In the case study, agent #79, a member of the initial population, was rated 40,320 (Figure 10). Only one gene in this agent is ever active. Since the activation of pattern 30 starts at 0 and never decreases, that gene's instruction fires on every time step, adding to the activation of pattern 0 and triggering the stop signal at time 10.
4.3 Transmission of the First Bit
The first major innovation was a connection between one input pattern, the synapse, and one output pattern, so that one input bit is correctly transmitted but the other is ignored. This occurred in agent #2244 (Figure 11). Only the two listed genes are ever active, one inherited from each parent. No mutation was involved in the discovery, only recombination. Both genes were present in the original population. They form the foundation of the complete solution. This agent is able to correctly transmit bit 0 in all input words. When input bit 0 is on, it activates pattern 8 in the sender, which activates the synapse, and then pattern 12 in the receiver, which is interpreted as output bit 0 being on. When input bit 0 is off, none of that happens, and output bit 0 stays off. Since no attempt is made in either the sender or receiver to interact with bit 1, that output bit is always off, so the agent earns half the payoff for that bit. No attempt is made to stop before the maximum allowed time. Overall, this agent scores 60,000.
Just transmitting a single bit is a chicken-and-egg problem: The parents of #2244 each have one piece of this mechanism, but get no credit for it because they do not have the other. This single-bit subproblem is simple enough that it is reasonably likely that each component comes into existence by chance, after which recombination stumbles upon the two-gene network, as happened here.
Oddly, the immediate parents of #2244 have ratings of only 40,000, because they do not have timing mechanisms. Thus this first innovation came from suboptimal parents. This phenomenon will be called an innovation through outliers. The population abandoned the efficacious gene in agent #79 at this time (Figure 10).
This major innovation was followed by a long sequence of minor innovations that rebuilt a timing mechanism and restored the 320 timing points. A few of these agents will be described.
First, parents contributed one gene each to form a new timing network in agent #4534 (Figure 12). It only operates when the synapse pattern 4 is activated, and therefore only on input patterns that trigger the synapse.
Its descendant #5456 scores 60,250 (Figure 13). A mutation in an otherwise unused gene connected pattern 2 to pattern 42. Since pattern 2 is always activated in receivers, this new timing mechanism works even for input words that never activate the synapse.
A different descendant of #4534, #6806, only earns a rating of 60,118, but it got a duplicate gene that turned out to be important later (Figure 14).
After further mutations, the first agent to achieve a score of 60,320 for correctly transmitting one bit and always stopping after at most 20 time steps is #11469 (Figure 15), a descendant of both #5456 and #6806. It inherited both the connection from pattern 2 to pattern 42 in the timing mechanism, and the duplicate genes (which have diverged) linking pattern 8 to pattern 3 in the sender. Note that the receiver relies on its role input, that is, the activation automatically added to its role pattern on each time step. The sender ignores its own role input, which is likewise supplied on each time step. The perfect solution eventually discovered in this sample run inherits this configuration. Other sample runs of the simulation found solutions that relied on both role patterns, and still others used neither.
4.4 Partial Transmission of the Second Bit
The next major innovation was the partial transmission of the second input bit. The left parent was #71778 (Figure 16). Its receiver mechanism is potentially capable of correctly reconstructing output words 〈00〉, 〈10〉, and 〈11〉, but not 〈01〉, because it cannot set output bit 1 without also setting output bit 0. Furthermore, its timing mechanism stops at time 17, which is one step before output bit 1 can activate, so it mistransmits 〈11〉 as 〈10〉.
The right parent was #71736 (Figure 17). It lacks any connection to input bit 1 or output bit 1, but its timing mechanism is one step slower because the threshold on the key instruction is 5 instead of 4.
Both parents have ratings of 60,320. Their child #72045, scoring 70,320, was the first to achieve a score over 70,000 (Figure 18). It inherited #71778's sending and receiving network, which are near the beginning of the genome, and #71736's slower timing gene at the very end of the genome. Therefore, it can correctly transmit 〈11〉 as well as 〈00〉 and 〈10〉.
In #72045, the synapse carries sufficient information to fully reconstruct the input pattern (Figure 18). Input bit 0 is associated with fast activation, and bit 1 is associated with slow activation. When both are off (〈00〉), the synapse never activates. When both are on (〈11〉), the synapse activates very early. When only one is on, the synapse activates at an intermediate time, early for 〈10〉 and late for 〈01〉. This kind of code is typical of the solutions found by this simulation.
Some comments are in order about the many connections to pattern 3 that appear in the sender. See Figure 19 for their full history. Many more copies of variations of this same gene are present in this lineage, but only the relevant agents are shown. Agent #72045 includes five such genes inherited from its ancestor #70845, numbered 7, 9, 10, 16, and 17: each is a copy of the same threshold-1 instruction linking input pattern 8 to pattern 3.
The other copies came from misalignment and crossover, as illustrated in Figure 20. This is called an indirect duplication, as it requires one ancestor to have experienced duplication or deletion, elsewhere in the genome, that shifted the position of a gene of interest. If that genome is crossed with one in which the gene of interest is present at its original location, and the crossover index lies between the copies, the child inherits a separate copy of the gene from each parent. This is analogous to unequal crossover in biology.
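The mechanism can be illustrated with a minimal sketch. The gene labels are invented for illustration; single-point crossover between genes, taking the head of one parent and the tail of the other, follows the description above.

```python
def crossover_keep_tail(left, right, index):
    # Single-point crossover between genes: head of the left parent,
    # tail of the right parent.
    return left[:index] + right[index:]

# Illustrative genomes: parent A lost gene 'x' earlier, shifting 'g' left;
# parent B still carries 'g' at its original position.
parent_a = ["a", "g", "b", "c", "d"]   # 'g' shifted to position 1 by a deletion
parent_b = ["a", "x", "g", "b", "c"]   # 'g' at its original position 2

child = crossover_keep_tail(parent_a, parent_b, 2)
# The crossover point falls between the two copies, so the child inherits
# one copy of 'g' from each parent: ['a', 'g', 'g', 'b', 'c']
```

The child now carries two copies of the gene of interest even though neither parent was duplicated at that locus, mirroring unequal crossover in biology.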
Several of these duplicate genes seem to serve dosage purposes. They accelerate activation of the synapse when input bit 0 is set. One experienced neofunctionalization (#70584, gene 17): its input pattern mutated by one bit to 9, forming the crucial connection to input bit 1.
This history also includes an important back mutation. In tracing gene 10 of agent #69062, one finds a sequence of ancestral genes whose thresholds changed over time. The threshold is the component of interest: a series of single-bit mutations switched it from 1 to 3 to 11 and back again. The changes from 1 to 3 and back flipped exactly the same bit in the genome, but the changes from 3 to 11 and back affected two different bits in the same three-bit codon. This is evidence that one must be careful about ignoring back mutations in genetic modeling.
4.5 First Fully Correct Transmission
Finally, with agent #206821, the population found a way to correctly transmit both bits. This innovation was complex.
The timing mechanism came from the right parent #206509 (Figure 21). Until this stage of the sample run, timing had been in its own isolated network (as in #72045, Figure 18), but this parent has a gene with threshold 5 connecting the timing mechanism to pattern 13, which sets output bit 1.
The sending and receiving mechanisms came from the left parent #206439 (Figure 22). In addition, gene 2 mutated to connect to the timing mechanism.
The resulting child #206821 (Figure 23) has suboptimal timing, but with a pre-boost mechanism. Pattern 2 in the receiver is always activated, which activates pattern 42, which activates pattern 13. When the synapse is activated very late to indicate the word 〈01〉, pattern 13 has already been pre-boosted to a fairly high activation level and has just enough time to reach the threshold of 10 by the time the calculation stops, thereby enabling the receiver to output the word 〈01〉. The newly mutated instruction, with threshold 3, keeps pre-boosting from ruining the correct transmission of 〈10〉, which is encoded by early activation of the synapse. The key genes forming the timing mechanism are at opposite ends of the genome.
Oddly, both parents have ratings of 60,320, which was suboptimal compared to the rest of the population, but the combination of the two in conjunction with a substitution mutation took advantage of both disabilities to produce a superior decoding mechanism.
4.6 First Perfect Score
Further corrections were made to the timing, which eventually led to the first agent to achieve a perfect score, #225284 in generation 750 (Figure 24). The synaptic code is essentially the same as that used by its ancestor #72045 (Figure 18). Overall, the network has accumulated considerable complexity, which suggests that repairing the timer without disturbing its contribution to the receiving mechanism was difficult.
4.7 Saturation Dynamics
Because of mutation, no genome can achieve perfect fixation, in which all genomes in the population are identical. Instead, once a major innovation happens, the innovative mechanism spreads and eventually saturates the population with a large number of equally high-scoring variants. However, the time required to reach saturation from a single innovator varied as the simulation progressed, as did the score distribution at saturation. Specifically, each major innovation spread exponentially fast, but at a slower rate for later major innovations (Figure 25). Fitting exponential curves e^(r(t−t0)) to the growth phase for each major innovation (via least-squares linear regression on the natural logarithm of the counts) yields growth rate constants r of 0.596 for the first, 0.325 for the second, and 0.271 for the third. Furthermore, the long-term equilibrium number of high-scoring agents was lower after each major innovation. Specifically, the means of the forty rightmost points for each major innovation in Figure 25 are 382, 297, and 271. The slowdown and lower saturation levels appear to be due to the fact that more complex networks are less likely to survive recombination and mutation intact.
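The curve-fitting step can be reproduced as follows: least-squares linear regression of the natural logarithm of the counts against time, whose slope is the growth rate constant r. The function name and the pure-stdlib implementation are illustrative.

```python
import math

def fit_growth_rate(times, counts):
    """Fit counts ~ exp(r * (t - t0)) by least squares on log(counts).

    Returns the slope r of the regression of ln(count) on t.
    """
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

# Synthetic check: counts that grow exactly like e^(0.3 t) recover r = 0.3.
ts = list(range(10))
cs = [math.exp(0.3 * t) for t in ts]
r = fit_growth_rate(ts, cs)   # -> 0.3 (up to floating-point error)
```

On real innovation-spread counts the fit would be applied only to the growth phase, before the curve levels off at saturation.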
4.8 Implicit Selection
The population continued to evolve even after perfect solutions were discovered. Statistical significance and p-values reported in this section are from Wilcoxon-Mann-Whitney rank sum tests applied to a feature of all agents in generations 800 and 1200, unless otherwise noted.
Recall that the keep-tail crossover procedure causes genomes to grow on average, in this case 10% overall, from a mean of 55.3 genes to 60.9 between generations 800 and 1200 (p = 5 × 10^−83, Figure 26a). The average growth rate is 0.0124 genes per generation (the slope of the least-squares line through the mean genome length at generations 800, 850, …, 1200). This is about half the rate seen in Section 2.8, and the difference may be due to the greater overall genome length in this data set.
Recall that the efficacious genes are the ones that are active during the transmission of at least one word and are connected directly or indirectly to the synapse or output bits. These form the regulatory network. Figure 26b shows that the number of efficacious genes roughly held steady, leveling out at a mean of 13.4, which indicates that the spaces between efficacious genes grew but the functional network did not. Generation 1200 actually has distinctly fewer efficacious genes than generation 800 (p = 0.001).
The network also continued to evolve. For example, agent #369432 (Figure 27) appeared in generation 1230 and has a perfect score, but with a different timer that is no longer entangled with the sending mechanism (Figure 24). The apparent simplification of the mechanism and appearance of another family of duplicate genes suggests that the ongoing changes are not merely neutral drift. There is evidence of implicit selectional forces favoring more robust solutions [19, 59, 83]. Recall that as new agents are added to the population, it is sorted by rating, and ties are broken by putting newer genomes first. Since 200 agents are retained from the old generation, a maximally scoring agent will not die until 200 maximally scoring children have appeared, pushing it far enough down the list that it will not be retained for the next generation. The population stabilizes at around 240–250 perfectly scoring agents, so new agents with less than a perfect score die quickly and are largely irrelevant to the future of the population. Since no agents are immortal, the population can then experience what at first appears to be neutral drift. However, a perfectly scoring genome whose offspring still score perfectly under a larger set of mutations will have more descendants in that surviving core. Thus, the population experiences implicit selective pressure favoring perfect solutions that are more robust to mutations.
One way to measure implicit selection for robustness is to consider each perfectly scoring genome in a generation and score the genomes that result from applying every possible single mutation to it. Each possible mutation is either deleterious (lowers the score) or neutral. The distribution of the number of deleterious mutations is shown in Figure 28, for generations 800, 850, …, 1200. The mean decreases from 325.3 to 281.9, a drop of 13%, indicating an increase in robustness (p = 1.4 × 10^−43).
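This measure can be sketched directly. The genome representation (a flat list of bits) and the toy `score` function are illustrative assumptions.

```python
def count_deleterious(genome, score):
    """Count single-bit mutations of a perfect genome that lower its score.

    Under this scheme every possible point mutation is either deleterious
    (lowers the score) or neutral (leaves it unchanged).
    """
    base = score(genome)
    deleterious = 0
    for i in range(len(genome)):
        mutant = genome.copy()
        mutant[i] ^= 1
        if score(mutant) < base:
            deleterious += 1
    return deleterious

genome = [1, 0, 0, 0]
score = lambda g: 1 if g[0] == 1 else 0   # toy score: only the first bit matters
n = count_deleterious(genome, score)      # -> 1 (only flipping the first bit hurts)
```

A lower count means a more robust genome: fewer of its possible point mutations damage the mechanism.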
A second measure is to check for a trend in the use of robust codons. Recall that the genome is encoded using a majority-of-three code. One-fourth of the eight possible codons, 〈000〉 and 〈111〉, are robust in that every possible point substitution yields a synonymous codon, but all other codons are fragile in that two of the three possible point substitutions yield a nonsynonymous codon. Given a uniformly distributed random string of k codons, the number of robust codons has a binomial distribution with expected value k/4. Since genes in this simulation have 22 codons, those with 6 or more robust codons are said to be robust. Figure 29a shows the distribution of the number of robust genes. The mean increases from 26.5 to 36.1, an increase of 36%, confirming that there is implicit selection for robustness (t-test, p = 7.1 × 10^−66).
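The codon classification can be verified mechanically; the helper names here are illustrative.

```python
from itertools import product

def is_robust(codon):
    """A codon is robust if every single-bit flip leaves its majority vote unchanged."""
    majority = 1 if sum(codon) >= 2 else 0
    for i in range(3):
        flipped = list(codon)
        flipped[i] ^= 1
        if (1 if sum(flipped) >= 2 else 0) != majority:
            return False
    return True

robust = [c for c in product([0, 1], repeat=3) if is_robust(c)]
# Only (0, 0, 0) and (1, 1, 1) survive: one-fourth of the eight codons.

def gene_is_robust(codons):
    # Genes have 22 codons; the expected robust count for a random gene is
    # 22/4 = 5.5, so a gene with 6 or more robust codons counts as robust.
    return sum(map(is_robust, codons)) >= 6
```

Any fragile codon, such as 〈001〉, sits one flip away from two nonsynonymous neighbors (〈101〉 and 〈011〉) and one synonymous neighbor (〈000〉), matching the two-out-of-three figure in the text.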
Figure 29b and c show the number of robust efficacious and inefficacious genes. The number of robust efficacious genes levels out after increasing to a mean of 9.1. However, the number of robust inefficacious genes continues to grow, from a mean of 19.0 to 26.9 (p = 4.5 × 10^−70). The total number of inefficacious genes grows from a mean of 41.5 to 47.6, so most of the increase is in the form of robust inefficacious genes. Therefore, robust codons play a significant role even in inefficacious genes. A possible explanation is that they decrease the probability that a mutation will disrupt the efficacious network by joining in a previously unused instruction.
5 Aggregate Analysis
The next step is to extend the analysis to a larger data set. Three data sets consist of 10,000 sample runs each under the same weak selection protocol as the case study, but with each of the three crossover variants keep-tail, length-neutral, and no-crossover. Three more data sets consist of 10,000 sample runs each, under other selection protocols with the keep-tail crossover variant. Some of the samples run to completion in a few minutes; some take several hours. The largest samples take up tens of gigabytes of storage space. Overall, this section represents an analysis of roughly ten terabytes of simulation data.
All four selection protocols add 300 new agents while keeping the top k0 = 200 from the previous generation.
The very weak protocol (k0 = 200, h1 = 500, k1 = 300) is to build k1 = 300 new agents by breeding any of the h1 = 500 agents, so breeding has no bias toward high-scoring agents, but since only the 200 highest-scoring agents survive to the next generation, there is survival selection for high scores.
The strong protocol (k0 = 200, h1 = 100, k1 = 300) builds k1 = 300 new agents from the top h1 = 100.
The very strong protocol (k0 = 200, h1 = 10, k1 = 300) builds k1 = 300 new agents from the top h1 = 10.
The emphasis of the analysis is on the first agent out of each sample that earns a perfect score. Many of the data sets analyzed in this section have non-normal distributions, some taking small integer values, some with heavy upper tails, and some with outliers spanning many orders of magnitude. These data sets are displayed on a logarithmic scale when appropriate, and in terms of medians and quantiles, as these statistics are less influenced by outliers and heavy tails than are means and variances. Most samples are large enough that any distinction that is clearly visible on any chart is statistically significant. For detailed hypothesis test results, see Section C in the online supplement.
5.1 Typical and Atypical Synaptic Codes, Search Time, Genome Size
Depending on the selection protocol and crossover variant, about 80% to 90% of sample runs of the simulation find perfect solutions to the Copy(2, 1) problem that, as in the case study, transmit the four input words using the synapse in the following straightforward way. A typical synaptic code represents the four possible input words by activating the synapse at four different times: late or never for 〈00〉, very early for 〈11〉, and at intermediate times for 〈10〉 and 〈01〉. This encoding recognizes that 〈11〉 = 〈10〉 ∨ 〈01〉 (bit-wise or), and roughly adds the activation speeds for 〈10〉 and 〈01〉 to get the activation speed for 〈11〉. Solutions that use typical codes and sample runs that find them will also be called typical. Examples of typical synaptic codes are shown in Figures 24 and 30.
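A toy model of a typical synaptic code, with invented activation speeds (nothing here is taken from any evolved agent), shows how additive speeds make the four input words distinguishable on a single synapse:

```python
def activation_time(word, speeds, threshold=10, max_steps=100):
    """Time step at which the synapse first activates, or None if it never does.

    Toy model of a typical synaptic code: each set input bit contributes an
    activation speed, and the speeds combine additively.
    """
    rate = sum(s for bit, s in zip(word, speeds) if bit)
    if rate == 0:
        return None  # <00>: the synapse never activates
    t = -(-threshold // rate)  # ceiling division: first step at/past threshold
    return t if t <= max_steps else None

speeds = (3, 1)  # bit 0 -> fast activation, bit 1 -> slow activation
times = {w: activation_time(w, speeds) for w in [(0, 0), (1, 0), (0, 1), (1, 1)]}
# <11> fires earliest (rate 3 + 1), <10> next, <01> latest, and <00> never
# fires -- four distinguishable codes on one synapse.
```

The receiver can then decode the word from the synapse's activation time alone, which is exactly the structure observed in Figures 24 and 30.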
Other synaptic codes, genomes, and sample runs will be called atypical (Figure 31). Atypical perfect solutions are generally more complex, and the stronger selection protocols are more likely to find them (Figure 32). The crossover variant has little effect on the fraction of atypical codes.
The very strong protocol searches more agents before finding the first perfect solution than do those with other selection strengths, and atypical runs take longer overall than do typical runs under the same selection protocol (Figure 33). Trends among the other protocols are uncertain.
Under weak selection (Figure 33b), the keep-tail and length-neutral crossover variants take about the same amount of time, but the no-crossover variant searches nearly 10 times as many agents to find perfect typical solutions. This is in agreement with an argument from population genetics theory [20, §6.8]: As networks form, it is likely that a mutation will discover a potentially useful allele that only yields a benefit if a complementary allele is present at another locus. Without recombination, such an allele must survive neutral drift and wait for a complementary mutation to occur in one of its carriers before it can be selected. With recombination, it must still survive neutral drift, but if a complementary mutation occurs anywhere in the population, there is a reasonable chance that the two alleles will eventually occur in the same organism, leading to faster discovery of the combination. The strength of this effect depends on details of the population model, but it is clearly significant here.
Oddly, Figure 33 shows that without crossover, the median atypical run searched 435,032 fewer agents than the median typical run. In all other configurations, median typical runs are no longer than median atypical runs, and are clearly shorter than the median typical run with no crossover. This suggests that typical mechanisms are discovered relatively quickly by combining partial mechanisms, but are surprisingly difficult to discover sequentially. It is not clear why this should be the case.
Stronger selection leads to longer typical genomes (Figure 34). Atypical solutions tend to have distinctly longer genomes. As expected, keep-tail crossover leads to longer genomes than length-neutral, and no-crossover leads to distinctly shorter genomes. The efficacious genes show the same trends (Figure 35). This means that there is some implicit pressure favoring shorter genomes when recombination is absent, or longer genomes when recombination is present, even if the recombination process does not itself affect genome length. Without recombination, mechanisms must be built by discovering innovations in sequence, so partial solutions that can be completed with fewer additions are favored. The astonishing range of genome lengths found in nature is perhaps less so in light of the huge range in genome length shown here.
Under keep-tail crossover, the longer a run takes, the longer is the genome of the first perfect solution, no matter what selection strength is used (Figure 36). For all but very strong selection, there seems to be a rough power law relating genome length to the number of agents searched. The data sets from typical solutions are dominated by large central clumps marked by boxes. The clumps come from sample runs that searched 105–106 agents, and found solutions with 30–100 genes. Atypical runs are more spread out.
Compared to keep-tail, length-neutral crossover shows spread in genome length about the initial setting of 32 genes as the search time increases, but favoring longer genomes (Figure 37). Unexpectedly, no-crossover leads to shorter genomes in longer runs (although the effect is broad), which again indicates implicit pressure favoring shorter genomes when there is no recombination. Based on the distributions of the overall number of genes (Figure 34b) and number of efficacious genes (Figure 35b), keep-tail and length-neutral crossover seem to lead to an accumulation of inefficacious genes as was observed in the case study (Section 4.8). A possible explanation is that single-point crossover generates a force called removal bias, as follows. If efficacious genes are grouped into clusters separated by long strings of inefficacious genes, then the crossover point is likely to fall in one of these inefficacious strings, which decreases the probability that misalignment will cause an efficacious gene to be deleted or duplicated during crossover. Thus, removal of inefficacious genes is disfavored, as it exposes the genome to greater risks during crossover. This happens in spite of the fact that crossover indices are restricted to be between genes (Section 2.6).
5.2 Innovations through Outliers
As observed in the case study, innovations can come from agents called outliers whose scores are below the current maximum. Unsurprisingly, such events are more likely under weaker selection protocols, because stronger ones do not give outliers the opportunity to reproduce. The majority of runs under very weak selection have at least one major innovation through outliers, whereas most runs under strong and very strong selection have none at all (Figure 38). No-crossover also yields few major innovations through outliers. A possible explanation is that with crossover, these innovations happen when low-scoring parents happen to have complementary parts that recombine to form the innovative solution. Without recombination, that cannot happen.
The prevalence of major innovations through outliers does not seem to be responsible for the tendency for the weak and very weak selection protocols to yield shorter genomes than the strong and very strong protocols. Among the first perfect solutions found by very weak selection, those with more major innovations through outliers actually have slightly larger genomes and take longer to run (Figure 39). Only the apparent increases between runs with two and three such innovations are not statistically significant (online supplement, Section C.8), and that is probably due to the fact that there are only 143 runs with three, leaving the upper tail of that distribution underrepresented with respect to the others.
5.3 Robustness of Solutions
For each selection protocol, one can examine the first perfect solution, and compute the fraction of robust efficacious and inefficacious genes, that is, those with more than the expected number of robust codons (Figures 40, 41, and 42). Efficacious genes are generally more robust than inefficacious genes. The presence of outliers with many fragile efficacious genes is not surprising, since these numbers are for the first perfect solution out of each run, of which the efficacious genes are likely to include some that have recently experienced beneficial mutations. A mutation that changes the instruction associated with a gene must result in a fragile codon.
Likewise, the ratio of deleterious possible mutations to genome length can be calculated (Figure 43). Among typical solutions, the median ratios are 4.93, 4.78, 4.69, and 5.37 under very weak, weak, strong, and very strong selection, respectively. Among atypical solutions, the median ratios are 4.90, 4.94, 5.07, and 5.74. In either case, under very strong selection, the ratio is distinctly higher than under the other selection protocols, but otherwise there is no simple trend relating selection strength to a higher ratio. (Distinctions among the other selection protocols are statistically significant but small for typical solutions, and not significant for atypical solutions.) For strong and very strong selection, atypical solutions have a distinctly higher ratio than typical. With weak selection, the distinction between typical and atypical solutions is significant for length-neutral and no crossover, and just fails to be significant (p = 0.056) for keep-tail crossover. The keep-tail crossover variant causes genomes to grow, and most of the added length consists of inefficacious genes. It is therefore unsurprising that keep-tail samples have lower ratios overall than those from other crossover variants, as the denominators are larger.
The relationship between the number of deleterious possible mutations and genome length is weak except that with no crossover (Figure 44, right column), solutions found by most sample runs lie in a crescent-shaped clump. That is, the protocol finds typical solutions of all lengths with few deleterious possible mutations. The higher ratios in Figure 43b are likely due to the fact that runs without crossover tend to find short solutions, which makes the denominators small.
5.4 Time for Last Major Innovation
Population genetics models typically include a memoryless mutation process, that is, there is a small fixed probability of mutation in each time step [20, 26, 65]. To sketch such a model, suppose an organism is of type either A or B and is replaced asexually by a child in each time step. If each organism has some fixed probability q of generating a child of the other type, then the time until any A lineage mutates to B has a geometric distribution with parameter q and is independent of how many steps have already passed without mutation. A more complex model might include a directed chain of intermediate types of increasing fitness between A and B, in which case the time until an A eventually becomes a B has a negative binomial distribution.
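The memoryless two-type model can be sketched as a simulation; the parameter values are arbitrary illustrations.

```python
import random

def waiting_time(q, rng, max_steps=10**6):
    """Steps until a type-A lineage first produces a type-B child.

    With a fixed per-step mutation probability q, this waiting time is
    geometric with mean 1/q, and is independent of how many steps have
    already passed without a mutation -- the memorylessness that the
    simulation data fail to exhibit.
    """
    for t in range(1, max_steps + 1):
        if rng.random() < q:
            return t
    return max_steps

rng = random.Random(42)
q = 0.01
times = [waiting_time(q, rng) for _ in range(20_000)]
mean_wait = sum(times) / len(times)   # close to 1/q = 100
```

The log-probability-mass function of this geometric distribution is linear in t, which is the signature the histograms in Figure 45 are tested against.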
However, the data collected from this simulation does not fit a memoryless model. Define the time for the last major innovation as the number of agents between the first agent to achieve a rating of at least 70,000 and the first to achieve a rating of at least 80,000, that is, how many agents must be searched to finish a mechanism that transmits all but one bit correctly, ignoring the timing mechanism. Log-density histograms (Figure 45) do not appear to be consistent with the log-probability-mass function for a geometric distribution, which is linear, or with that of a negative binomial distribution, which is concave down with an asymptotically linear dropoff. Instead, each histogram is concave up and weighted toward short times but with a heavy tail. The last major innovation is biased in favor of happening either shortly after the next to last major innovation, or after an unexpectedly long time. The effect seems to diminish with increasing selection strength.
6 Conclusion and Further Questions
The UM simulation evolves perfect solutions to the Copy(2, 1) problem under a variety of selection protocols and crossover variants. A detailed case study of a single run reveals the mechanisms at work. The evolutionary dynamics overcame the chicken-and-egg nature of the problem by finding a sequence of partial solutions. It started with a simple chain of two genes, then developed a more sophisticated sender, then the associated receiver, and so on. From the histories of certain efficacious genes it is clear that gene duplication assisted in network formation in addition to adjusting the dosage of abstract chemical patterns. A duplicate of one of these genes experienced a substitution, forming a new regulatory connection, thereby expanding the network and improving its function. As the population discovered partial solutions with increasing complexity, further innovations took longer to find, and high-scoring agents made up a smaller portion of the population. All of the major innovations involved crossover. New combinations of variants of existing mechanisms triggered each breakthrough. Several innovations came from crossing parents with distinctly low scores. There was an important back mutation. The first perfect solution represents each input bit as an activation speed and combines them roughly additively.
Evolution continued even after finding a genome with a perfect score, resulting in new networks that are more robust against mutation and crossover. The duplicate genes in #369432 (Figure 27) may be favored because they add robustness to the network in the form of redundancy [34, 66]. Many perfectly scoring agents in this sample run have similar networks with three to five variant copies of these genes. These duplications may, however, be a product of chance, since they lie at the very end of the genome, where they are particularly subject to indirect duplication.
These observations suggest that, when simulating the evolution of biochemical computation, choosing an ARN execution model and genetic encoding compatible with crossover and gene duplication reveals important phenomena that would otherwise be obscured.
Aggregate analysis reveals a complex relationship between selection strength and the evolutionary trajectory. The roughly additive synaptic code seen in the case study turns out to be typical of the synaptic codes found by the simulation. The probability of finding an atypical solution increases with selection strength. Using keep-tail crossover and comparing the results of different selection protocols, we find that sample runs using very strong selection take longer to find perfect solutions. The solutions they find generally contain more efficacious genes and are less robust. A plausible explanation for these effects begins with the observation that since very strong selection only allows 10 agents per generation to breed, such runs contain low genetic diversity, so they take longer to explore the space of possible genomes. Since keep-tail crossover causes genomes to accumulate length over time, runs under very strong selection have longer to accumulate length. Apparently, greater genome length and longer search time increase the chance of stumbling onto larger, clumsier mechanisms that still solve the communication problem but are fragile. Forces other than keep-tail crossover are definitely affecting the genome length and search time, since the point clouds in Figure 36 for very strong selection are more diffuse than those for other selection strengths. A topic for further study is the extent to which genetic diversity might be the cause.
Very weak selection produces smaller, more robust solutions that are discovered more quickly. When using weak and strong selection, runs of the simulation search about as many agents as when using very weak selection, but the resulting solutions have genomes that are generally longer and contain larger networks of efficacious genes. These trends are clearly visible in the means and medians shown in the figures in Section 5. However, the distributions of genome length, number of agents searched, and other features of the sample runs are very spread out, some over several orders of magnitude.
Recombination results in significantly faster discovery of perfect solutions. Overall, the sample runs with no crossover took an order of magnitude longer to find perfect solutions than those with the other two crossover variants. These runs also yielded distinctly shorter genomes. The explanation for the short genomes seems to be that without crossover, partial mechanisms that require fewer added genes to achieve greater functionality are favored (although they take many generations to discover) and there is no removal bias to reward the accumulation of inefficacious spaces between efficacious genes.
The case study showed how key innovations can be discovered through agents that did not hold the highest rating in the population at the time. This suggests that selection-mutation processes sometimes use a kind of tunneling to escape through a fitness barrier or entropy barrier in a fitness landscape [15, 24, 58]. Genomes rated lower than the population's maximum are more likely to lie near the boundary of the hill or ridge in the fitness landscape on which the population is concentrated. They are therefore more likely to produce offspring on an adjoining hill. Once that happens, small mutations can lead the population to climb the hill to a new maximum, and the process repeats. Weak and very weak selection clearly make use of this mechanism (Figure 38). However, it is unlikely to be responsible for the greater speed of evolution under these protocols, since the number of agents searched increases with the number of innovations through outliers (Figure 39).
It may seem obvious that outliers are more likely to produce offspring that cross a fitness valley than genomes at the top of a hill, but this phenomenon is often absent from simulations and omitted from models, such as evolution by neutral drift. The justification for such omissions is the simplifying assumption that selection drives deleterious mutations to extinction so rapidly that they have no permanent impact on the species. However, in light of this study and others [15, 24, 58], it is clear that this assumption is not always appropriate, and that deleterious mutations may contribute indirectly to the evolutionary discovery of beneficial mutations.
Aggregate analysis confirms that there are indirect selective forces. Figures 28 and 29 indicate that there is implicit selection for robustness against mutation, including codon bias, and that it affects both efficacious and inefficacious genes. As genomes grow, they accumulate inefficacious genes (Figure 29c), which suggests that networks experience pressure to move genes toward locations that are less likely to be disrupted by crossover. Further study of these phenomena is needed.
The unexpected distributions of times for the final innovation (Section 5.4) may indicate a complex interaction between population size, genetic diversity, and founder effects. When an innovative genome is discovered after a long equilibrium period, the population has probably accumulated considerable diversity. That diversity diminishes as the innovation spreads, but before it disappears, it may increase the probability of a second innovation. Further studies should investigate this possibility.
The analysis of many sample runs confirms that ARNs evolved by this simulation have a number of properties that mimic natural GRNs. Most genes in the simulated genomes are never active, which agrees with observations that many noncoding regions in higher organisms are never transcribed, produce nonfunctional products, or are otherwise seemingly unnecessary. Inefficacious simulated genes and natural noncoding regions do seem to serve purposes, however, such as facilitating gene duplication events during recombination. Natural introns are known to influence recombination rates [4, 5, 13, 14] and chromosome reorganization. In its current form, the UM simulation does not include chromosome reorganization or nonuniform recombination rates, but future studies could use the UM simulation to investigate a related phenomenon: the extent to which “spacers” consisting of consecutive inefficacious genes organize efficacious genes into functional modules and protect them from breaking up during recombination. Thus the UM simulation framework provides further evidence that noncoding DNA should not be classified as junk, and can test hypotheses about its functions [10, 72]. The computational networks discovered by this simulation can be somewhat tangled, with the timing mechanism intertwined with the decoding mechanism. This reflects the interconnectedness of biochemical reaction networks, in which a single gene product may have several different roles in seemingly unrelated systems.
Certain parameters in the simulation are the result of trial and error and could be tweaked more rigorously. The mutation rate, for example, seems to be high. Also, the number of instruction bits allocated for patterns and thresholds determines their ranges, which places constraints on the network structure of the resulting UM programs. If patterns have many bits, then there are too many possible patterns, most genes in the initial population will not interact, and the simulation makes no progress toward a solution. If patterns have few bits, then too many genes in the initial population will interact, resulting in excessive activation and an onslaught of mutations that interfere with any partial solution that happens to be discovered. These parameters should be tested and modeled to determine appropriate ranges.
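The tradeoff on pattern width can be made concrete with a back-of-the-envelope calculation. In the sketch below, the gene count and bit widths are hypothetical, not the simulation's actual parameters: if two uniformly random b-bit patterns match with probability 2^-b, then among n genes the expected number of regulatory links in a random initial genome is roughly n² · 2^-b.

```python
# Hypothetical illustration: expected number of regulatory links in a
# random initial genome, assuming each gene's output pattern must exactly
# match another gene's switch pattern, both drawn uniformly at random.
def expected_links(n_genes, pattern_bits):
    return n_genes * n_genes * 2.0 ** -pattern_bits

# Too many bits: almost no genes interact, and no partial solution forms.
assert expected_links(32, 24) < 0.001
# Too few bits: nearly everything interacts, causing runaway activation.
assert expected_links(32, 4) > 60
# An intermediate width yields a sparse but nonempty initial network.
assert 0.5 < expected_links(32, 10) < 2
```

This suggests the useful range of pattern widths scales roughly logarithmically with the number of genes, which could guide the more rigorous parameter testing proposed above.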
In the current simulation, agents do not interact at all. An obvious extension to this project is to rate agents on their ability to communicate with each other rather than how well UMs with the same genome communicate within an agent colony. Preliminary experiments suggest that the population then takes much longer to arrive at a perfectly correct coding mechanism, apparently because potentially useful communication features grant no benefit unless others in the population have them as well [3, 16, 17, 55]. Thus, useful genes may not be noticed until they surpass some threshold frequency through neutral drift.
The UM was designed to resemble a gene regulatory network in both its execution model and its binary encoding. However, many details of molecular genetics were omitted that may be worth including in a future project. One such possibility is to add more flexible mechanisms for activating and deactivating instructions. Currently, each UM instruction is regulated by a single switch pattern, and only units of that exact pattern affect it. Transcription of a gene in natural DNA is more complicated. It may be controlled by several binding sites, and there may be many transcription factors and repressors that bind to those sites with variable strengths depending on how closely the site matches their preferred nucleotide sequence [7, 31, 32, 39–41]. Methylation, nucleosomes, and other biochemical structures also affect gene activation. Analogous features could be added to the UM.
Additionally, the simulated reproduction process includes crossover, but not in the usual diploid way. The following thought experiment suggests that this is a potentially crucial detail that could affect the spread of an innovation. Suppose two genes present in the simulated population would grant a fitness benefit if both occurred in the same agent (epistasis), but are otherwise neutral. Suppose they are not yet common, and are located near opposite ends of the chromosome. The population could discover the combination fairly easily. Crossover is likely to happen away from the ends of the chromosomes, so a mating between agents, each with one of these genes, is likely to result in a chromosome with both genes. However, the combination will likely have to be rediscovered several times before it can spread, because crossovers involving the first innovative agent are also likely to happen in mid-chromosome and separate the pair. Without elitism, the combination would be at risk of going extinct.
In contrast, during natural meiosis, matching chromosomes are both duplicated and recombined, yielding haploid cells with a mixture of both parents' chromosomes, but with the possibility of having unchanged copies of one of the parents' chromosomes. Consider a modification of the simulation with a true diploid genome consisting of one pair of chromosomes. Continuing the thought experiment, the beneficial pairing of two genes at distant loci will first occur when each is on a different chromosome in the same agent. A bit less than 1/4 of its gametes will have a mixed chromosome with both genes, and about 1/4 of the child's gametes will have an intact copy of that chromosome. The pair has a greater probability of being transmitted to future generations than under the simulation's haploid genome architecture, which should increase the speed at which it spreads. In a population without elitism, this effect might significantly increase the probability that the combination gets established before accidentally going extinct. This dynamic difference between recombination algorithms could be part of why so many organisms use a diploid genome architecture. Many details of this conjecture remain to be worked out and confirmed. The current simulation could be modified to include full diploid recombination. One of the advantages of the UM over the parametric, register machine, and Lisp execution models is that it is especially compatible with a diploid genome. All chromosomes can be decoded into UM instructions, and since their loci in the genome are irrelevant to its abstract biochemical execution model, the resulting program is perfectly valid. The UM therefore provides an ideal platform for studying different genome architectures and models of recombination.
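The gamete fractions in this thought experiment can be checked with a small Monte Carlo sketch. The locus positions and the single-crossover meiosis model below are simplifying assumptions, not the simulation's mechanics: chromosome A carries gene 1 near one end of a unit-length chromosome, chromosome B carries gene 2 near the other, and meiosis duplicates both chromosomes and recombines one chromatid of each at a uniform crossover point, yielding two parental and two recombinant gametes.

```python
import random

def meiosis(p1=0.05, p2=0.95, rng=random):
    # Each gamete is (has_gene_1, has_gene_2).
    x = rng.random()       # single crossover position on [0, 1]
    return [
        (True, False),     # parental chromatid of A: gene 1 only
        (False, True),     # parental chromatid of B: gene 2 only
        (p1 < x, p2 > x),  # recombinant: A-prefix + B-suffix
        (p1 > x, p2 < x),  # recombinant: B-prefix + A-suffix
    ]

rng = random.Random(0)
n = 100_000
both = sum(g == (True, True)
           for _ in range(n)
           for g in meiosis(rng=rng))
frac = both / (4 * n)

# Expected fraction is (p2 - p1) / 4 = 0.225: "a bit less than 1/4"
# of gametes carry a mixed chromosome with both genes.
assert 0.21 < frac < 0.24
```

The same model applied to the child carrying the mixed chromosome gives the second figure: two of its four chromatids are parental, one of which is the intact mixed chromosome, hence about 1/4 of its gametes.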
Most of the sample runs of the simulation were performed on the Clemson Palmetto computing cluster over the course of several months. This project would not have been possible without it.
The implementation of the Utrecht machine framework was written in a combination of Haskell, Python, C++, C, and assembly language. The author is grateful to the developers of the Glasgow Haskell Compiler, Python, the GNU Compiler Collection, R, and the many open source software packages used during this project.
The name comes from the fact that the author first presented this simulation framework at the Evolution of Language Conference 2010 in the city of Utrecht.
Gray code representations of integers will be incorporated in a future project.
Department of Mathematics, College of Charleston, 175 Calhoun Street, Robert Scott Small Building, Room 339, Charleston, SC 29401. E-mail: email@example.com