## Abstract

Amoeba, a computer platform inspired by the Tierra system, is designed to study the generation of self-replicating sequences of machine operations (opcodes) from a prebiotic world initially populated by randomly selected opcodes. Point mutations drive opcode sequences to become more fit as they compete for memory and CPU time. Significant features of the Amoeba system include the lack of artificial encapsulation (there is no write protection) and a computationally universal opcode basis set. Amoeba now includes two additional features: pattern-based addressing and injecting entropy into the system. It was previously thought such changes would make it highly unlikely that an ancestral replicator could emerge from a fortuitous combination of randomly selected opcodes. Instead, Amoeba shows a far richer emergence, exhibiting a self-organization phase followed by the emergence of self-replicators. First, the opcode basis set becomes biased. Second, short opcode building blocks are propagated throughout memory space. Finally, prebiotic building blocks can combine to form self-replicators. Self-organization is quantified by measuring the evolution of opcode frequencies, the size distribution of sequences, and the mutual information of opcode pairs.

## 1 Introduction

Artificial computer worlds have been designed to study many diverse topics. There are several reviews of the artificial chemistry architectures for these artificial worlds [10, 15, 19, 29]. A number of artificial chemistry architectures are designed to model life [1, 5, 21, 25, 26, 28], its biological complexity [3], and its relationship to information theory [4, 16, 27].

Within this sphere, there has been considerable discussion of how self-replicators can emerge from a primordial soup of initially random computer operations (opcodes) in a system that is sufficiently open-ended to enable organisms of ever-increasing complexity to evolve. Studies have ranged from prebiotic chemistries for the origin of biological life [9, 11] to the guided self-organization of artificial computer systems [16, 29, 30].

Amoeba is an artificial chemistry inspired by the Coreworld [25], Tierra [26], and Avida [1, 20] systems and designed specifically to study the process of self-organization in a prebiotic world that eventually leads to the emergence of self-replicators [14]. Amoeba's memory space is initially loaded with opcodes (primitive machine operations) randomly selected from an alphabet of 25 unique opcodes. Opcode sequences form prebiotic (non-replicating) programs that compete for memory space and CPU time and evolve through point mutations [24].

The original Amoeba version, Amoeba-I, used a set of 16 possible opcodes and a memory topology where virtual CPUs operated on sequences of opcodes situated on a 2D interaction grid [22]. Complements of the opcodes themselves were the addresses, so it was impossible to move to arbitrary positions in memory. Amoeba-I could not simulate an infinite Turing tape, as there were no stacks assigned to the CPUs. Amoeba-II used the same Amoeba-I memory topology but also added two stacks for each CPU and expanded the basis set to 32 opcodes [23]. While these opcodes formed a computationally universal set, it was difficult to navigate throughout memory, because the addressing used {opcode::address} pairs (instead of pattern-based addressing, for example).

Amoeba-III used a new topology for opcode memory space. The 2D interaction grid was replaced by a 2D memory map with toroidal boundary conditions. The memory map consisted of 500 parallel circular bands, each consisting of thousands of opcodes [24]. Instead of encapsulating programs in separate program stacks on an interaction grid, all Amoeba-III programs had access to the same memory space, allowing one to study the parasitism of memory, which exists in the biological world and which is artificially suppressed in some other artificial chemistries, such as Tierra and Avida. Amoeba-III still used {opcode::address} pairing.

Both Amoeba-II and Amoeba-III exhibited some low-level biasing of the opcode basis set, in which the frequencies of the opcodes required for allocating memory, copying opcodes, and initiating child replicators increased at the expense of less critical opcodes. However, no self-organization of opcodes into building-block sequences was observed. As with Amoeba-I, ancestral self-replicators still depended on the spontaneous appearance of a fortuitous, self-replicating sequence of randomly generated opcodes.

We report that the recent version of Amoeba (Amoeba-IV), with addressing that can freely access memory and a modified opcode basis set, not only exhibits emergence but does so through a far richer pathway [14].

## 2 Description of the Amoeba-IV System

The current version of the Amoeba system uses the same 2D memory space topology with periodic toroidal boundary conditions as Amoeba-III [24]. The opcode basis set consists of 25 unique opcodes. The operations of some of these opcodes have changed considerably from earlier Amoeba versions, radically changing the self-ordering of Amoeba's memory space, the emergence of ancestral replicators, and the diversity of those replicators. The main changes are pattern-based Tierra-type addressing [26] and a set of more primitive opcodes, so that longer opcode programs are needed to propagate opcodes to other regions of the memory space. No-operation opcodes (NOPs) are used for pattern-based addressing, as in the Tierra and Avida systems. The Self-Exam process for calculating a program's size is more complex, as is the procedure for copying a program's opcode sequence. Eight opcodes use the Avida methodology, in which a default operation can be modified by a following NOP.

The use of pattern-based addressing is a significant change for Amoeba-IV. Earlier versions of Amoeba did not use pattern-based addressing, because it was believed that it would make the emergence of a replicator from a soup of random opcodes take prohibitively long. For example, an Amoeba-III replicator would require at least five NOPs in addition to a minimum of seven opcodes for even primitive sequences that inefficiently replicate only once (referred to as proto-replicators). The recent changes in Amoeba-IV require a minimum of fourteen opcodes for a self-replicator and eleven opcodes for an inefficient proto-replicator.¹ The probability that a particular Amoeba-IV self-replicator of $L = 14$ opcodes forms from a fortuitous combination of $L$ opcodes randomly selected from an alphabet (basis set) of $D = 25$ unique opcodes scales as $1/D^L = 25^{-14} \approx 2.7 \times 10^{-20}$. Although many alternative replicators of this length exist, this is still a discouragingly tiny number.
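The estimate above is easy to reproduce. The short Python sketch below assumes, as the estimate does, that each position is drawn uniformly and independently from the 25-opcode alphabet; the function and constant names are illustrative, not part of Amoeba.

```python
D = 25               # unique opcodes in the basis set
M = 500 * 2399       # opcodes in memory (bands x positions per band)

def chance_of_specific_sequence(length: int, alphabet: int = D) -> float:
    """Probability 1/D**L that `length` uniformly drawn opcodes
    match one particular sequence."""
    return float(alphabet) ** -length

for L, label in [(14, "self-replicator"), (11, "proto-replicator")]:
    p = chance_of_specific_sequence(L)
    # expected occurrences if every memory position starts one L-opcode window
    print(f"{label}: L={L}, p={p:.2e}, expected in memory={M * p:.1e}")
```

It gives about 2.7 × 10⁻²⁰ for the 14-opcode self-replicator and about 4.2 × 10⁻¹⁶ for the 11-opcode proto-replicator. Even multiplied by the roughly 1.2 × 10⁶ starting positions in memory, the expected count for any one specific 14-opcode sequence in a single random fill remains of order 10⁻¹⁴.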

However, we find that Amoeba-IV exhibits emergence by self-organizing its opcodes through several stages. First, the opcode basis set coalesces into a reduced set. Second, primordial building blocks, consisting of short opcode sequences that later develop into specialized genes necessary for self-replication, are propagated throughout memory space. Finally, we speculate that replicators can emerge if the building-block density is sufficiently high. This self-organization followed by emergence is one of the most significant observations of our current Amoeba research.

### 2.1 Memory Space and Virtual CPUs

Amoeba differs from Avida and Tierra in that there is no write protection in memory and no encapsulation of programs into artificial cells. Any program in Amoeba can potentially write to any location in the entire memory space. Organisms in the biological world encapsulate their genetic code inside membranes, but that information can be accessed by endogenous parasites, such as retroviruses (like HIV) and transposons [7]. The role of such elements in evolution has recently been more fully appreciated, and Amoeba has a unique architecture in which to study it. The ability to incorporate such features into an artificial evolutionary system is a significant feature of Amoeba and the subject of ongoing work. In artificial computer chemistries, one can study the implications of encapsulating a sequence of opcodes [24]. Tierra, Avida, and earlier Amoeba variations used a form of program encapsulation, manifested in different ways. In Tierra, a parent sequence uses the MALL opcode to allocate and write-protect memory for its child [26]. Avida write-protects by embedding opcode sequence loops on a 2D interaction grid [20], similarly to Amoeba-I and -II [22].

The opcode memory space in the Amoeba-III and -IV variations is two-dimensional with periodic toroidal boundary conditions. The memory is organized into 500 parallel circular bands, each with 2399 opcodes,² for a total of ≈1.2 × 10⁶ opcodes.
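One way to picture this topology is a flat array indexed modulo the band count and the band length. The Python sketch below is illustrative only: the row-major layout and the names `wrap` and `flat_index` are assumptions, not Amoeba's source.

```python
N_BANDS = 500      # parallel circular bands
BAND_LEN = 2399    # opcode locations per band (prime)

def wrap(band: int, pos: int) -> tuple:
    """Map any (band, position) pair onto the torus: both coordinates wrap."""
    return band % N_BANDS, pos % BAND_LEN

def flat_index(band: int, pos: int) -> int:
    """Row-major index into a flat array of N_BANDS * BAND_LEN opcodes."""
    b, p = wrap(band, pos)
    return b * BAND_LEN + p

TOTAL = N_BANDS * BAND_LEN   # 1,199,500 opcodes, about 1.2e6
```

Because Python's `%` returns a non-negative result for a positive modulus, a negative offset such as a child placed one band above band 0 wraps to band 499 without special cases.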

A summary of the key parameters mentioned in this study, along with typical values used, is shown in Table 1. Also shown (in parentheses in the “Value” column) are the ranges of values for some of the parameters. These ranges were tested prior to the studies reported here.

Table 1.

Key parameters, each with its value used in this study and range of values tested.

| Parameter | Value (range) |
|---|---|
| Number of (horizontal) Tierra-like bands containing opcodes | 500 |
| Number of opcode locations per band (prime number) | 2399 |
| Number of unique opcodes in basis set | 25 (25–32) |
| Number of virtual CPUs | 2000 |
| Number of virtual CPUs reserved for random sequences per generation | 100 (0–100) |
| Size (number) of randomly selected opcodes per random sequence | 2 (1–3) |
| Typical size of a self-replicator and observed range of sizes | 30 (17–150) |
| Probability of substitution mutation when executing COPY | 0.005 (0.001–0.010) |
| Probability of (insert/delete/substitute) mutation when executing DIVD | 0.10 (0.00–0.20) |

There are 2000 virtual CPUs, allocated in two ways. First, at the start of each new generation, 100 of the CPUs are assigned to short sequences of one to three randomly generated opcodes, averaging 200 randomly generated opcodes per generation. These short sequences are placed at random locations in memory; they jump-start each generation and also inject entropy into the system. Second, a program allocates the next CPU in the queue (using the MALL opcode) before copying opcodes to its child. A new generation starts when the 2000th CPU has been allocated or when 50,000 virtual CPU executions have occurred since the start of the current generation.³ The first 100 CPUs are then reassigned to new random sequences as described above, and the CPU queue is reset to the 101st CPU. Each CPU has four numerical registers (AX, BX, CX, DX), two address registers (EX, FX), two stacks (A, B), and an instruction pointer (IP) that operates on opcodes. Additional parameters include the program's size and its IP location (band and position within the band). CPUs are accessed sequentially. Each CPU is given a slice of CPU time proportional to its program's size; in the studies reported here, the time slices typically ranged from a minimum of 6 to a maximum of 100 operations.
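The CPU bookkeeping described above can be sketched as follows. The register and stack fields mirror the text; the proportionality constant `per_opcode` in `time_slice`, the uniform placement of the random sequences, and all function names are assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field

N_CPUS = 2000          # virtual CPUs (Table 1)
N_RANDOM = 100         # CPUs reserved for random sequences each generation
MAX_EXECS = 50_000     # CPU executions that also end a generation
MIN_SLICE, MAX_SLICE = 6, 100

@dataclass
class VirtualCPU:
    ax: int = 0        # numerical registers
    bx: int = 0
    cx: int = 0
    dx: int = 0
    ex: int = 0        # address registers
    fx: int = 0
    stack_a: list = field(default_factory=list)
    stack_b: list = field(default_factory=list)
    ip: tuple = (0, 0)  # (band, position within band)
    size: int = 0       # program size in opcodes

def time_slice(size: int, per_opcode: float = 1.0) -> int:
    """Slice proportional to program size, clamped to [6, 100] operations.
    per_opcode is a hypothetical proportionality constant."""
    return max(MIN_SLICE, min(MAX_SLICE, round(per_opcode * size)))

def generation_over(next_cpu: int, execs: int) -> bool:
    """True once the 2000th CPU is allocated or 50,000 executions have run."""
    return next_cpu >= N_CPUS or execs >= MAX_EXECS

def start_generation(cpus, rng, n_bands=500, band_len=2399) -> int:
    """Give CPUs 0..99 fresh random sequences of 1-3 opcodes and return the
    0-based index of the next CPU to allocate (the 101st CPU)."""
    for k in range(N_RANDOM):
        # uniform placement simplifies the peaked injection profile of Section 2.3
        start = (rng.randrange(n_bands), rng.randrange(band_len))
        cpus[k] = VirtualCPU(ip=start, size=rng.randint(1, 3))
    return N_RANDOM
```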

The opcode sequence for each of the 2000 programs is confined to a particular band. Parent programs can define a starting location for their children in adjacent bands, thus propagating sequences throughout all of memory space. Multiple programs can operate on the same opcode sequences in memory.

The definition of a program is somewhat nebulous in Amoeba because, as in Coreworld, there is no artificial encapsulation of sequences. Programs are only clearly defined for true self-replicators, where a sequence of opcodes has a beginning NOP address and an ending NOP address, and the IP is retained. But in the prebiotic and protobiotic stages, IPs are readily lost to the surrounding soup of opcodes in memory. These rogue IPs roam throughout the band, executing whatever opcodes they encounter. Rogue IPs are a valuable resource; other programs can capture them and use their virtual CPU along with associated registers and time slices.

Figure 1 is a snapshot of part of Amoeba's memory space along with a schematic of a virtual CPU. Each opcode is color-coded according to the key at the bottom. Bands run horizontally in this figure. An opcode sequence in one of the bands is expanded. This example is from a run after emergence, evident from the repeated sequences of opcodes in adjacent bands.

Figure 1.

Top: Portion of Amoeba memory space. Middle: A virtual CPU with associated registers, its IP executing the INCA opcode in the blown-up opcode segment in the inset. Bottom: Opcode key.

In Figure 1, IPs move from left to right along a band unless an IP jumps to an address (a pattern consisting of NOPs) along the band by means of a JMPB, JMPF, CALL, or RETN opcode. In this particular run, the memory space was dominated by robust self-replicating programs consisting of 18 opcodes. Mutations create variants of the main replicator with varying lengths and replication efficiencies.

We analyzed a total of 54 individual runs using standard values for various environmental variables such as mutation rates (see the following section for more details). We performed eight control runs, discussed in more detail in Section 3.2.3 (Role of Entropy in Emergence). Each run starts by reading in the values for the environment variables. These values, along with the source code, are part of the Online Supplementary Material found at http://www.mitpressjournals.org/doi/suppl/10.1162/ARTL_a_00234. We observed emergence of self-replicators in 28 of the standard runs. Each run was continued either until the system had stabilized into a static pattern of replicators with minimal changes in the opcode sequences or until 400,000 viable mutants had been generated. In no case were runs continued for longer than 20 million (20M) generations.

### 2.2 Opcode Basis Set

There are several important considerations when designing an opcode basis set. Ideally, the set should be computationally universal, and it should be possible to propagate the opcodes throughout memory space [2, 29]. A computer system uses a set of rules that form an instruction set of machine operations for manipulating data. These rules form a basis set that is computationally universal if they can simulate a Turing machine and therefore the computational functionality of any computer. This is desirable because it enables a system to exhibit open-ended evolution where there is ongoing adaptive novelty and ongoing growth of algorithms of arbitrary complexity [8, 31].

Maley showed that the Tierra system could simulate a Turing machine, although some of the fundamental operations are inefficient [17]. However, the choice of basis set is not the only requirement for Turing completeness. Another requirement is the ability to read from and write to any location in memory. Previous versions of Amoeba used a scheme of address labels paired with individual opcodes (referred to as codons in earlier research), greatly limiting an instruction pointer's movement to other locations in memory. Amoeba-IV addresses this defect by using pattern-based addressing consisting of NOPs.

Amoeba-IV uses an opcode basis set that is derived from basis sets used by other Tierra-like worlds (Coreworld, Tierra, Avida, earlier Amoeba versions). The functionalities of some of the Amoeban opcodes differ from similar opcodes in those other worlds. A summary of the opcodes used in Amoeba-IV is shown in Table 2.

Table 2.

Description of opcodes used in Amoeba.

Several opcodes (PSHA, POPA, ADDA, SUBB, BEQA, AEQZ, INCA, DECA) invoke the Avida methodology where a default operation is modified if the following opcode is a NOP [2]. The alternative action is shown in parentheses in Table 2.
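The dispatch rule can be sketched in a few lines of Python. Because Table 2 is not reproduced here, the alternative action used for INCA below (incrementing BX instead of AX) is purely hypothetical, as is the choice to step the IP past a NOP once it has served as a modifier.

```python
NOPS = {"NOP1", "NOP2"}

def inca(cpu: dict, modified: bool) -> None:
    # Default action: AX += 1. Modified action is HYPOTHETICAL: BX += 1.
    cpu["bx" if modified else "ax"] += 1

HANDLERS = {"INCA": inca}   # other modifiable opcodes would be registered here

def step(cpu: dict, band: list) -> None:
    """Execute the opcode under the IP on a circular band. If it is a
    modifiable opcode followed by a NOP, run its alternative action and
    (assumption) skip over the modifying NOP."""
    n = len(band)
    op = band[cpu["ip"] % n]
    modified = op in HANDLERS and band[(cpu["ip"] + 1) % n] in NOPS
    if op in HANDLERS:
        HANDLERS[op](cpu, modified)
    cpu["ip"] = (cpu["ip"] + (2 if modified else 1)) % n
```

A registry of handlers keeps the "default versus modified" decision in one place, so adding another modifiable opcode only means registering one more function.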

Each numerical register has a default role. When a parent executes the MALL opcode, the starting location of its (child) program is set in the CX register. The DX register holds the offset (positive or negative) in number of bands from the parent's band; the MALL opcode uses it to select the band to which the parent's future child's code will be written. The AX register holds the number of opcodes offset from the program's start. When a parent copies one of its opcodes to its child (COPY opcode), the parent's opcode at memory location CX(parent) + AX is copied to location CX(child) + AX in the child's band. The BX register is loaded by one of the numerical opcodes and is used to carry information such as the program's size.
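The COPY addressing rule can be written out directly. In this Python sketch, memory is a list of bands (each a list of opcode names); the parameter names, the wrap-around at band ends, and the omission of copy mutations are illustrative assumptions.

```python
def copy_opcode(memory, parent_band, parent_start, dx, child_start, ax):
    """COPY: write the parent's opcode at CX(parent) + AX to CX(child) + AX
    in the band DX bands away. Positions wrap within a band and bands wrap
    around the torus; the substitution mutation is omitted here."""
    n_bands, band_len = len(memory), len(memory[0])
    child_band = (parent_band + dx) % n_bands
    src = (parent_start + ax) % band_len
    dst = (child_start + ax) % band_len
    memory[child_band][dst] = memory[parent_band][src]
    return child_band, dst
```

A Copy-Loop repeats this step while INCA advances AX, transferring the program one opcode per COPY.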

The JMPB, JMPF, and CALL opcodes move the pointer to the address complement of the following NOP(s). In the absence of such a NOP, the IP will jump to the address stored in the EX address register if an opcode such as ADRB or ADRF had previously loaded the EX register.
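Template addressing of this kind can be sketched as a search along the circular band for the complement of the NOP run that follows the jump opcode. The sketch assumes that NOP1 and NOP2 are each other's complements, that the search covers at most one turn of the band, and that the IP lands on the first opcode of the matching template; Amoeba's exact landing point, search limits, and CALL's search direction are not specified here, and the function names are hypothetical.

```python
COMPLEMENT = {"NOP1": "NOP2", "NOP2": "NOP1"}

def read_template(band, start):
    """Collect the run of NOPs that begins at `start` on a circular band."""
    n, tmpl = len(band), []
    while len(tmpl) < n and band[(start + len(tmpl)) % n] in COMPLEMENT:
        tmpl.append(band[(start + len(tmpl)) % n])
    return tmpl

def jump_target(band, ip, direction, ex=None):
    """Position an IP at `ip` would jump to. direction=-1 searches backward
    (JMPB), +1 forward (JMPF). With no NOP template after the jump opcode,
    fall back to the EX register value (None if EX was never loaded)."""
    n = len(band)
    tmpl = read_template(band, ip + 1)
    if not tmpl:
        return ex
    target = [COMPLEMENT[t] for t in tmpl]
    for step in range(1, n):                 # at most one turn of the band
        pos = (ip + direction * step) % n
        if all(band[(pos + j) % n] == t for j, t in enumerate(target)):
            return pos                       # assumed landing point: template start
    return None                              # no complementary template found
```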

### 2.3 Evolution and Mutations

Replicators evolve in three ways: opcode mutations; randomly generated sequences; and programs overwriting each other's code, potentially parasitizing their unprotected opcodes. Opcodes are mutated in two different ways.

First, each time a program copies an opcode to its child (COPY opcode), the probability that that opcode is replaced by another is $P_{sub} = 0.005$. Larger probabilities make it more difficult for longer sequences to survive: a sequence of $L$ opcodes is copied without any substitution with probability $(1 - P_{sub})^L \approx e^{-L P_{sub}}$, so the upper limit to the length $L_{max}$ of faithfully replicated sequences scales as the reciprocal of the substitution mutation rate, $L_{max} \sim 1/P_{sub} = 200$ opcodes.
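The fidelity argument can be checked numerically. This sketch treats each COPY as an independent trial that substitutes with probability `p_sub`; the function name is illustrative.

```python
def error_free_copy_prob(length: int, p_sub: float = 0.005) -> float:
    """Probability that all `length` COPY operations avoid substitution."""
    return (1.0 - p_sub) ** length

for L in (30, 100, 200, 450):
    print(f"L={L:3d}: {error_free_copy_prob(L):.3f}")
```

At $L = 1/P_{sub} = 200$ opcodes only about $e^{-1} \approx 37\%$ of copies are error-free, and at the 450-opcode size cap the fraction is near 10%.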

Second, each time a program initiates its child (DIVD opcode), the child's opcode sequence can undergo one of three types of mutation, for a total rate of 0.10. Empirically chosen mutation rates when executing the DIVD opcode are insertion (0.02), deletion (0.02), and substitution (0.06).
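A minimal sketch of this mutation step follows, assuming that at most one mutation is applied per DIVD, that the mutated position is chosen uniformly, and that inserted or substituted opcodes are drawn uniformly from the basis set; none of these details is specified above, and `divd_mutate` is a hypothetical name.

```python
import random

P_INSERT, P_DELETE, P_SUBST = 0.02, 0.02, 0.06   # total 0.10 per DIVD

def divd_mutate(child, alphabet, rng):
    """Return (mutated copy of child, kind), where kind is 'insert',
    'delete', 'substitute', or None when no mutation occurs."""
    child = list(child)
    r = rng.random()
    if r < P_INSERT:
        child.insert(rng.randrange(len(child) + 1), rng.choice(alphabet))
        return child, "insert"
    if r < P_INSERT + P_DELETE and child:
        del child[rng.randrange(len(child))]
        return child, "delete"
    if r < P_INSERT + P_DELETE + P_SUBST and child:
        k = rng.randrange(len(child))
        # replace with a different opcode from the basis set
        child[k] = rng.choice([op for op in alphabet if op != child[k]])
        return child, "substitute"
    return child, None
```

Drawing one uniform number and comparing it against cumulative thresholds selects exactly one outcome with the stated probabilities.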

An important issue is what range of hand-selected mutation rates is reasonable. We observe that DIVD mutation rates above about 0.20 tend to "melt" the world: opcodes are randomized faster than the system can self-organize by biasing the opcode basis set and propagating building blocks for the future genes that enable self-replication. COPY mutation rates above 0.01 favor smaller sequences, with sizes less than 100 opcodes, while rates below 0.005 result in a world that evolves more slowly.

At the start of each new generation, 100 sequences, each consisting of one, two, or three randomly selected opcodes, are randomly distributed throughout memory space. The probability of inserting a random sequence is peaked at the middle of the 500 bands. The middle bands tend to be melted, but the edge bands have no random sequences. The effect of this nonuniform distribution of injected entropy on emergence is discussed below in Section 3.2.3 (Role of Entropy in Emergence). In that same subsection, we show through two sets of control experiments the effect of differing amounts of this entropy.
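The injection step can be sketched as below. The shape of Amoeba's peaked profile is not given, so the triangular band distribution (densest at the middle band and vanishing toward the edge bands) is a hypothetical stand-in, as is the function name.

```python
import random

N_BANDS, BAND_LEN = 500, 2399

def inject_random_sequences(memory, alphabet, rng, n_seq=100):
    """Scatter n_seq random sequences of 1-3 opcodes through memory and
    return their (band, start, length). Pass memory=None to sample only.
    Band choice is HYPOTHETICAL: a triangular density peaked at the middle
    band that falls to zero at the edge bands."""
    placed = []
    for _ in range(n_seq):
        band = min(int(rng.triangular(0, N_BANDS, N_BANDS / 2)), N_BANDS - 1)
        start = rng.randrange(BAND_LEN)
        seq = [rng.choice(alphabet) for _ in range(rng.randint(1, 3))]
        if memory is not None:
            for k, op in enumerate(seq):
                memory[band][(start + k) % BAND_LEN] = op
        placed.append((band, start, len(seq)))
    return placed
```

With `n_seq=100` and lengths of one to three opcodes, about 200 opcodes are injected per generation, consistent with the average given in Section 2.1.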

### 2.4 Anatomy of a Typical Self-replicator

Figure 2 shows the anatomy of a typical robust self-replicator from one of the 28 runs, generated from its ancestral proto-replicator after almost 1.5M generations of evolution. The proto-replicator was very inefficient, with 132 opcodes in its sequence, of which three-quarters were nonfunctional. These nonfunctional opcodes were shed in about 47,000 generations, resulting in the more robust replicator of only 26 opcodes shown here.

Figure 2.

Anatomy of a self-replicator. Color bars on the left show regions for four genes used by replicators. Run: 06-08-60140. Generation: 3.499M.

The opcode color coding is as in Figure 1. This replicator has four genes (shown on the left-hand side): Self-Exam (ADRB, NOP1, ADRF, NOP2, SUBB), Copy-Loop (NOP1, COPY, INCA, IFAG, JMPB), Biological (MALL, DIVD), and Reset-Register (INCD, POPA, JMPB). Note that the genes overlap and/or can be split into multiple pieces. For example, the Copy-Loop overlaps with the Self-Exam gene.

## 3 Self-Organization Leads to Emergence

The main result of the Amoeba-IV system is the self-organization of the initially random distribution of opcodes in memory, leading up to the emergence of an ancestral self-replicator. As in Coreworld [25, Figure 6], we observe distinct stages or epochs in the self-organizing process. Previously, we have separated the self-organization into three stages [21]: prebiotic, protobiotic, and biotic. During the prebiotic stage, the original basis set of opcodes condenses into a reduced, biased basis set. This reduced set continues to self-organize, generating short sequences of n opcodes (n-ops) that propagate critical building blocks (primarily copy loops, but also other precursors to future genes) required for replication. The protobiotic stage begins, usually within a million generations, when an inefficient proto-replicator emerges. Many of these proto-replicators die out within a few thousand generations. However, in many cases, mutations drive the proto-replicators to evolve into robust replicators, initiating the final, biotic stage. During this latter phase, a population of robust replicator variants eliminates unneeded opcodes and unrolls the Copy-Loop (multiple {COPY, INCA} sequences per loop).

We use several metrics to track the degree of self-organization prior to emergence and the degree of ongoing evolution after emergence. These metrics include the frequency of individual opcodes, $f(O_i)$; the frequency of pairs of successive opcodes, $f(O_i, O_j)$; and the distribution of sizes (numbers of opcodes in a child) of opcode sequences. We use these metrics to quantify fundamental properties of an evolving Amoeba system.
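The first two metrics reduce to counting. The Python sketch below treats each band as circular, so the last opcode pairs with the first; that wrap-around, normalizing both frequencies by the total opcode count, and the function name are assumptions for illustration.

```python
from collections import Counter

def opcode_frequencies(bands):
    """Return f(O_i) and f(O_i, O_j) over a list of circular bands.
    Pairs are successive opcodes within a band, wrapping at the band end."""
    singles, pairs = Counter(), Counter()
    for band in bands:
        n = len(band)
        singles.update(band)
        pairs.update((band[k], band[(k + 1) % n]) for k in range(n))
    total = sum(singles.values())   # equals the number of pairs on circular bands
    f1 = {op: c / total for op, c in singles.items()}
    f2 = {pr: c / total for pr, c in pairs.items()}
    return f1, f2
```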

### 3.1 Self-Organization

Prior to the emergence of an ancestral replicator, self-organization biases the opcode basis set, increases the mutual information of opcode pairs, and propagates ever growing opcode sequences that are building blocks of a future replicator. The overall self-organizing process of the memory space can be visualized in the screenshots (only a small portion of the total memory space is shown) in Figure 3: (a) the initial random opcode distribution, (b) the prebiotic self-organization, and (c) post-emergence generation dominated by self-replicators.

Figure 3.

Small portion of the self-organizing memory space in Amoeba. (a) Initial random opcodes; (b) prebiotic propagation of short opcode sequences; (c) post-emergence generation of self-replicators. Color coding for opcodes is shown in key at bottom. Run: 07-08-2015_54654.

The initially random opcode distribution of Figure 3a becomes partially ordered during the self-organization phase of Figure 3b. Some individual opcodes are replicated sequentially, shown by the horizontal lines of a single color, similar to the dense areas of identical opcodes observed in Coreworld [25]. This biases the basis set, with some opcode frequencies increasing at the expense of others. Opcodes useful for propagating sequences preferentially grow in frequency. In the left-hand third of Figure 3b, a short sequence was propagated across a dozen or more bands, indicating the existence of copying loops (also seen in Coreworld). Figure 3c shows that after emergence the memory map consists of thousands of replicators across bands and within bands.

#### 3.1.1 Size Distribution of Opcode Sequences

The self-organization of opcodes in Amoeba's memory space increases the frequencies of some individual opcodes and opcode pairs within the first 10,000 generations. This produces a population of propagated sequences with steadily growing sizes. Longer opcode sequences become more prevalent, eventually leading to the emergence of a proto-replicator. About half the proto-replicating sequences contain more than a hundred opcodes.

In the results reported here, a maximum size limit of 450 opcodes was imposed on all sequences. We imposed size limits in order to prevent memory overflows and to avoid recording sequences in the log files that were cumbersome to analyze. Even without this limit, sequence size would still be bounded by 2399 opcodes, the number of opcodes in each circular band.

Figure 4 shows the growth in the sizes of children for one of the runs. Size distributions for sequences up to the maximum size of 450 opcodes are logged, but only children with sizes of 100 or less are shown here. During the first 10–20 thousand generations, most sequences consist of three opcodes or fewer; this is the size range of the (100) randomly generated sequences placed throughout the Amoeba world at the beginning of each new generation.

Figure 4.

Evolution of sizes (number of opcodes) for children. A proto-replicator emerged at 389,000 generations. Run: 08-13-2016_28691.

Within 30,000 generations, some prebiotic propagators are capable of generating populations of child sequences with sizes of 10 to 50 opcodes, consistently smaller than the maximum size limit. These long sequences are shown by the white tails running to the right in Figure 4. Before emergence, the fraction of sequences at the maximum size limit slowly dropped from about 30% near the start of a new run to about 15% immediately prior to emergence. Note the large spread in the sizes, centered around size 25, of the emerged replicator population after 400,000 generations.
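The size statistics discussed here can be summarized with a short function. This sketch assumes child sizes are available as a list of integers for each logging interval; clipping any larger value to the 450-opcode cap and the function name are illustrative assumptions.

```python
from collections import Counter

SIZE_CAP = 450   # maximum sequence size imposed in these runs

def size_summary(child_sizes, shown_max=100, cap=SIZE_CAP):
    """Histogram of child sizes up to shown_max, and the fraction of
    children at the size cap (larger values are clipped to the cap)."""
    sizes = [min(s, cap) for s in child_sizes]
    hist = Counter(s for s in sizes if s <= shown_max)
    at_cap = sum(s == cap for s in sizes) / len(sizes) if sizes else 0.0
    return dict(sorted(hist.items())), at_cap
```

Tracking `at_cap` over successive intervals would show the drift from roughly 30% toward 15% described above.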

#### 3.1.2 Case Example: Self-Organization to Emergence

We discuss the self-organization of opcodes, with subsequent propagation of the building blocks necessary for replication, by analyzing one example of emergence from an initial primordial soup of random opcodes. We first present the anatomy of a CALL-RETN replicator after emergence, followed by data on the self-organization that leads to the propagation of building blocks for genes.

The most complicated building block (also referred to here as a gene) is the Copy-Loop, because it includes machinery for copying opcodes from the parent to its child, a branch opcode to repeat the loop, and a conditional check that breaks out of the loop. This means a typical Copy-Loop consists of some version of {NOP, COPY, INCA, IFAG, JMPB}. As Figure 2 above shows, this can be complicated in cases where parts of other genes, such as the Self-Exam gene, are embedded in the Copy-Loop gene. We chose the CALL-RETN case, rather than one of the more commonly observed replicators using the JMPB opcode to terminate the Copy-Loop, because there are no extraneous opcodes (parts of other genes) embedded in the Copy-Loop.

##### 3.1.2.1 Anatomy of a CALL-RETN Replicator

The anatomy of a CALL-RETN replicator is shown in Figure 5. There are 14 unique opcodes in this replicator that are useful for replication. Nonfunctional opcodes (introns) have been omitted for brevity. The CALL-RETN replication method is rare because it requires two opcodes (CALL and RETN, in that order) for closing the Copy-Loop. Most replicators use just one opcode (JMPB) to close their Copy-Loop and Reset-Register genes.

Figure 5.

Anatomy of a replicator that uses the CALL-RETN combination for the (extended) COPY loop and for retaining the IP. Run: 11-10-2015_25271. Generation: 321,982.

##### 3.1.2.2 Biasing the Opcode Basis Set

Initially, all 25 possible opcodes are equally distributed, so each opcode's frequency (its fraction, $f(O_j) = m_j/M$, of all 1.2 million opcodes) is the same: $f(O_j) = 1/25 = 0.040$ for all $O_j$. However, the frequencies of some opcodes useful for propagating sequences of opcodes preferentially grow at the expense of other opcodes. Figure 6 shows the increase in frequency over time (left-hand scale) for selected single opcodes (1-ops). Emergence occurred at about 292,000 generations (labeled vertical line).

Figure 6.

Growth of 1-ops and commensurate decrease in entropy for the CALL-RETN replicator of Figure 5. Run: 11-10-2015_25271.

We quantify the self-organization of single opcodes by plotting the monomeric opcode entropy, H(O), on the right-hand scale. The monomeric entropy is defined by
$$H(O) = -\sum_{j=1}^{D} \frac{m_j}{M} \ln \frac{m_j}{M}, \qquad (1)$$
where $D = 25$ is the size of the alphabet (number of unique opcodes in the basis set), $M \approx 1.2 \times 10^6$ is the size of the memory space (total number of opcodes), $m_j$ is the number of occurrences (counts) of the $j$th opcode, denoted by the symbol $O_j$, and $\ln$ is the natural logarithm.⁴ An estimate of the effective size of the biased opcode basis set, essentially the weighted number of opcodes available to be chosen from the basis set, is given by the perplexity, $PP(O) = e^{H(O)}$ [18]. For the initial, equally distributed opcodes, $H(O) = \ln 25 \approx 3.219$ and $PP(O) = 25$. By 500,000 generations, the entropy drops to $H(O) = 2.935$, indicating that the effective basis set has shrunk to $PP(O) \approx 19$ opcodes. Most runs eventually shrink the basis set to about 15 opcodes when robust self-replicators dominate the world.
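Equation 1 and the perplexity translate directly into code. This Python sketch takes a flat list of opcode names as the memory contents; the function name is illustrative.

```python
import math
from collections import Counter

def entropy_and_perplexity(opcodes):
    """Monomeric entropy H(O) of Eq. 1 (natural log) and perplexity e^H(O).
    Opcodes with zero count contribute nothing, since x ln x -> 0."""
    counts = Counter(opcodes)
    M = sum(counts.values())
    H = -sum((m / M) * math.log(m / M) for m in counts.values())
    return H, math.exp(H)
```

For a uniform memory it returns $\ln 25$ and 25; evaluating `math.exp(2.935)` gives about 18.8, the effective basis set of roughly 19 opcodes quoted above.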

##### 3.1.2.3 Growing Multi-opcode Frequencies

The co-occurrence of two opcodes, a measure of the degree of opcode self-organization, can be quantified using the mutual information, $I(O;O')$:
$$I(O;O') \equiv H(O) - H(O \mid O') = \sum_{O_i, O_j} p(O_i, O_j) \ln \frac{p(O_i, O_j)}{p(O_i)\, p(O_j)}, \qquad (2)$$
where $O_i \in O$, $O_j \in O'$, and the sum is over all $25^2 = 625$ possible opcode pairs for our basis set of 25 opcodes [18, 27]. The mutual information is the reduction in uncertainty about one opcode gained from knowing the other; $I(O;O') = 0$ when the opcode ensembles $O$ and $O'$ are completely independent. The log term in the sum in Equation 2 is the pointwise mutual information, $I(O_i, O_j)$. It is the log of the odds ratio of opcode pairs, $p(O_i, O_j)/[p(O_i)\, p(O_j)]$, and is zero in the absence of any correlation between opcodes.
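Equation 2 can be evaluated from pair counts. In this Python sketch the pairs are successive opcodes on circular bands (an assumption about how pairs are formed), and the marginals are taken from the first and second members of each pair; the function name is illustrative.

```python
import math
from collections import Counter

def mutual_information(bands):
    """I(O;O') of Eq. 2 over successive opcode pairs in circular bands,
    plus the pointwise terms ln[p(Oi,Oj) / (p(Oi) p(Oj))]."""
    pairs = Counter()
    for band in bands:
        n = len(band)
        pairs.update((band[k], band[(k + 1) % n]) for k in range(n))
    total = sum(pairs.values())
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c     # marginal of the first opcode, p(Oi)
        right[b] += c    # marginal of the second opcode, p(Oj)
    mi, pmi = 0.0, {}
    for (a, b), c in pairs.items():
        p_ab = c / total
        pmi[(a, b)] = math.log(p_ab / ((left[a] / total) * (right[b] / total)))
        mi += p_ab * pmi[(a, b)]
    return mi, pmi
```

Only observed pairs enter the sum, which matches the convention $0 \ln 0 = 0$; `math.exp(pmi[pair])` recovers the odds ratio for any pair.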

We plot $I(O;O')$ versus time as the dashed blue line in Figure 7 (right-hand scale); its rise clearly shows that self-organization has occurred. Critically, the time of emergence we noted previously coincides with the maximum rate of change in the mutual information, a key finding of this study. To show how far the joint probabilities depart from the product of the uncorrelated single-opcode probabilities, we also plot, for a select set of opcode pairs (2-ops), the time evolution of the odds ratio, the argument of the natural logarithm in Equation 2. We used the CMU toolkit to count the number of occurrences of n-op sequences [6].
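One simple way to locate the maximum rate of change in a sampled mutual-information curve is a forward-difference scan. The Python sketch below is illustrative only; the authors' own procedure for identifying this point is not described here, and `steepest_rise` is a hypothetical name.

```python
def steepest_rise(generations, mi_values):
    """Midpoint generation of the interval with the largest forward-difference
    slope of the mutual information, i.e. its maximum rate of change."""
    best_slope, best_gen = float("-inf"), None
    points = list(zip(generations, mi_values))
    for (g0, i0), (g1, i1) in zip(points, points[1:]):
        slope = (i1 - i0) / (g1 - g0)
        if slope > best_slope:
            best_slope, best_gen = slope, (g0 + g1) / 2
    return best_gen
```

Noisy curves would normally be smoothed before differencing; no smoothing is applied here.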

Figure 7.

Mutual Information (right-hand scale, dashed blue line) and odds ratio (left-hand scale, solid lines) for 2-ops that lead to the COPY-RETN building block for the replicator of Figure 5.

A key observation is that the 2-ops critical to the development of the CALL-RETN loop, {CALL, COPY, INCA, RETN}, grow in abundance during the self-organization period, at least 50,000 generations before emergence. The ADRB opcode is replaced by the ADRF opcode once it is no longer useful after emergence, and the frequency of the {RETN, NOP1} pair drops after emergence because introns are inserted between the RETN and NOP1.

#### 3.1.2.4 Development of the Copy-Loop Building Block

The Copy-Loop building block is noteworthy because the primordial COPY-RETN loop, {CALL, COPY, INCA, RETN}, is a prebiotic building block from which an IP cannot escape. This sequence copies opcodes throughout memory, but without a conditional check (IFAG or IFAL) the IP can never break out of the loop and initiate a child. Nevertheless, this copy sequence is a useful building block, capable of propagating itself and other opcodes. Prior to emergence, this building block was modified by the insertion of an IFAG conditional check before the RETN opcode. One possible mechanism for this insertion is an insertion mutation during execution of the DIVD operation.
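The difference between the primordial loop and the modified one can be illustrated with a deliberately simplified Python toy. It does not reproduce Amoeba's instruction semantics: the register model, the `ax >= size` test standing in for the IFAG check, the circular memory, and the `max_steps` cap are all assumptions made for illustration, and `run_copy_loop` is a hypothetical name.

```python
from typing import List, Tuple


def run_copy_loop(memory: List[str], src: int, dst: int, size: int,
                  conditional: bool, max_steps: int = 1_000) -> Tuple[int, bool]:
    """Toy Copy-Loop: COPY, INCA, optional conditional check, RETN.

    Returns (opcodes_copied, exited_via_check). Without the check, the only
    exit is the artificial max_steps cap, standing in for an IP that never
    leaves the loop. Register names and semantics are hypothetical.
    """
    n = len(memory)  # memory is treated as circular
    ax = 0           # offset of the next opcode to copy (stand-in for AX)
    for _ in range(max_steps):
        memory[(dst + ax) % n] = memory[(src + ax) % n]  # COPY
        ax += 1                                          # INCA
        if conditional and ax >= size:                   # IFAG-style check
            return ax, True                              # free to DIVD
        # RETN: control returns to the top of the loop
    return ax, False


soup = [f"op{i}" for i in range(100)]
print(run_copy_loop(list(soup), src=0, dst=50, size=10, conditional=True))
print(run_copy_loop(list(soup), src=0, dst=50, size=10, conditional=False))
```

In the unconditional run the loop keeps writing past the 10-opcode program, so it also copies whatever neighboring opcodes follow it; this is the sense in which such a loop propagates itself and other opcodes without ever reaching the point where a child could be initiated.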

Early prebiotic versions of the Copy-Loop gene are an example of propagator sequences, which copy sequences of opcodes into other memory locations. We observe that the frequency of these propagator sequences increases prior to the emergence of an ancestral replicator, as discussed below for this CALL-RETN replicator. Figure 8 is a schematic showing how the CALL-RETN Copy-Loop propagator evolves over at least 100,000 generations, leading to an ancestral replicator that emerges about 289,000 generations into this run.

Figure 8.

Timeline for the Copy-Loop building block.

The CALL opcode calls the NOP1 address (see Figure 5 for the anatomy of the CALL-RETN replicator). Usually, the CALL opcode is followed by a NOP and the call is to that NOP's complement. In the absence of a subsequent NOP, the CALL will jump the IP to the complementary address template, if it exists, in its address register, EX. In this run, the previous ADRB opcode had loaded the EX register with NOP1, the complement to NOP2. It is interesting to note that the {ADRB, NOP2} combination appeared many opcodes prior to the CALL-RETN loop in prebiotic propagators. There was also a subsequent NOP1 in the primordial form of the CALL-RETN loop. An intron preceded the NOP1 during pre-emergence.

The vertical dashed blue lines show the time range over which each sequence exists. Solid blue diamonds mark when we first observed the sequence in either a log file or a memory snapshot, and solid blue circles mark times of extinction. Once a sequence occurs, it is propagated for many generations until it is replaced by a more viable alternative. For example, the {CALL, COPY, INCA, RETN} sequence persists until the Copy-Loop is unrolled (generation 239,000).

### 3.2 Emergence

The emerging ancestral proto-replicators yield several interesting observations: half the ancestors are longer than 50 opcodes; the probability of emergence drops off after about 1M generations; and ancestors do not emerge uniformly across all bands. The size dependence of the emerging ancestors indicates that half the time the ancestors are large, ungainly sequences of opcodes. The decreasing probability of emergence with time correlates with a shrinking of the effective size (perplexity) of the opcode basis set. This correlation suggests that the basis set may become overly biased, reducing the likelihood of emergence. We speculate that the nonuniform emergence across bands might be due to the nonuniform distribution of entropy injected into the system at the start of each new generation. We discuss the role of entropy injection in more detail in Section 3.2.3 (Role of Entropy in Emergence).

#### 3.2.1 Size Distribution of Emergent Ancestors

Figure 9 shows the (binned) distribution of sizes for 31 emergent replicators in a set of 28 runs exhibiting emergence out of a total of 54 runs (two runs had multiple emergences). These data include the emergence of proto-replicators, all of which survived at least 1000 generations. Note that the last size bin is for all emergent ancestral sequences of size 351 opcodes or larger. The smallest emergent ancestral size observed was 17 opcodes. (The smallest self-replicator ever observed in Amoeba-IV runs to date contains 16 opcodes.)

Figure 9.

Distribution of emergent ancestor sizes.

The size distribution peaks in two main ranges: between 17 and 50 opcodes, and between 100 and 150 opcodes. We speculate that the size of the small group, centered around 25 to 30 opcodes, approximates the number of opcodes separating successive building blocks that code for replication. If so, the probability of emergence should increase as the concentration of propagated building blocks increases; we tentatively identify the critical concentration as about one building block per 30 opcodes. The large-size group, centered at 125 opcodes, is limited to about 200 opcodes. This limit matches the reciprocal of the substitution mutation rate incurred when executing the COPY opcode, $L_{\max} \sim 1/P_{\text{sub}} = 1/0.005 = 200$.
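One way to read the $L_{\max} \sim 1/P_{\text{sub}}$ estimate, offered here as our own illustration and assuming substitutions occur independently at each executed COPY, is that the probability of copying $L$ opcodes without a single substitution is $(1 - P_{\text{sub}})^L \approx e^{-L P_{\text{sub}}}$, which falls to about $1/e$ at $L = 1/P_{\text{sub}}$. A short Python check:

```python
import math

P_SUB = 0.005  # substitution probability per executed COPY (from the text)


def error_free_copy_probability(length: int, p_sub: float = P_SUB) -> float:
    """Chance that `length` COPY operations all avoid a substitution,
    assuming substitutions occur independently at rate p_sub."""
    return (1.0 - p_sub) ** length


for length in (30, 125, 200, 450):
    exact = error_free_copy_probability(length)
    approx = math.exp(-length * P_SUB)
    print(f"L = {length:3d}: (1 - P)^L = {exact:.3f}, exp(-L P) = {approx:.3f}")

print("1 / P_sub =", round(1 / P_SUB))
```

Under this assumption, a 30-opcode program is copied without error most of the time, whereas a 450-opcode program rarely is, which is consistent with a soft upper limit near 200 opcodes.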

Defining an emergent ancestor is nontrivial. In about 15% of the emergences, a self-replicator of about 30 opcodes appeared to emerge, but a subsequent analysis of the log files of unique propagated sequences showed that the actual ancestor was a protobiotic sequence of the maximum size (450 opcodes), containing most of the genes observed in the later, more robust self-replicating ancestor.

A fundamental difference between Amoeba-IV and earlier versions is that the ancestral proto-replicator does not suddenly emerge through some fortuitous combination of opcodes. Removing encapsulation in Amoeba-III and adding NOPs for addressing in Amoeba-IV together make such an ancestor's spontaneous emergence highly improbable, because the minimal size of a self-replicator is at least 14 opcodes. We have hand-crafted two very different ancestral proto-replicators of 14 opcodes and various ancestral self-replicators of 15 opcodes. There are $25^{14} \approx 10^{19.6}$ sequences of length 14, of which a tiny fraction (not easy to estimate) are proto-replicators. We have not found shorter ancestors.
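The combinatorial estimate is easy to check in Python. The second quantity is our own illustration, assuming a memory of $M \approx 1.2 \times 10^6$ opcodes treated as a linear string and filled independently and uniformly at random: it is the expected number of times one particular 14-opcode sequence appears in a single such memory.

```python
import math

ALPHABET = 25      # unique opcodes in the basis set (D)
LENGTH = 14        # smallest hand-crafted ancestor
M = 1_200_000      # approximate memory size in opcodes

n_sequences = ALPHABET ** LENGTH           # exact integer: 25^14
exponent = math.log10(n_sequences)         # about 19.6

# Expected number of copies of one specific 14-opcode sequence in a memory
# filled with independent, uniformly random opcodes (M - LENGTH + 1 windows).
expected_hits = (M - LENGTH + 1) / n_sequences

print(f"{n_sequences:,} sequences = 10^{exponent:.1f}")
print(f"expected occurrences in one random memory: {expected_hits:.1e}")
```

Even allowing for the many distinct 14-opcode replicators that may exist, an expectation of order $10^{-14}$ per memory illustrates why a sudden, fortuitous assembly of an ancestor is not a plausible route in Amoeba-IV.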

Surprisingly, rather than losing the ability to generate self-replicators, the Amoeba memory space self-organizes over a period of several hundred thousand generations by propagating opcode sequences of ever growing length and complexity. Eventually, in about half the runs, a proto-replicator ancestor emerges that quickly evolves into a population of robust replicators. As Figure 9 indicates, many of these early ancestral replicators are extremely long sequences of hundreds of opcodes.

#### 3.2.2 Distribution of Emergence Times

Figure 10 shows the (binned) distribution of emergence times for 31 emergent replicators in a set of 28 runs exhibiting emergence out of a total of 54 runs (two runs had multiple emergences). Note that the bin sizes for emergence times after 5M generations increase from 0.5M to 5M generations. These data include proto-replicators whose children survived at least 1000 generations.

Figure 10.

Distribution of self-replicator emergence times.

Nearly half of the emergences occur within the first million generations, with the emergence probability steadily dropping with time. This correlates with a steady drop in the effective size of the basis set, PP(O), which shrinks to less than 20 opcodes within the first 50,000 generations.

At the start of a new run (immediately before the first generation), each memory location has an equal probability of holding any one of the 25 unique opcodes in the basis set, p(O) = 1/25 = 0.04, giving a perplexity of PP(O) = 25. The subsequent reduction of the basis set is very uneven: after two million generations in runs that did not exhibit emergence, more than 25% of the opcodes are either NOP1 or NOP2. This reduces the frequencies of other opcodes critical for replication, including the conditional opcodes (IFAL and IFAG) and the branching opcodes (JMPB and JMPF). The frequencies of some of these critical opcodes drop below 0.02 after about 5M generations. We observe that the rate at which these frequencies decrease correlates with the amount of entropy injected into memory at the start of each new generation.

#### 3.2.3 Role of Entropy in Emergence

A main feature of the Amoeba platform is the injection of random opcodes (i.e., entropy) at the beginning of each generation [21]. In early Amoeba versions, this was done merely as a means to inject IPs, each associated with a few random opcodes and a virtual CPU, into the memory space of the prebiotic world. Subsequent versions of Amoeba also showed that randomizing opcodes, through mutations, at the start of each generation, or both, prevented the opcode basis set from freezing out [23].

We hypothesize that while the initial biasing of the opcode basis set increases the likelihood of generating propagators, subsequent shrinking of the basis set can reduce the likelihood of self-replicator emergence because some opcodes crucial for self-replication become frozen out. Specifically, the reduced likelihood of emergence appears to be correlated with PP(O) < 20 opcodes. We partially test this hypothesis by carrying out two sets of control experiments. We consider two sets because there are two sources of entropy injection: mutations, and the randomly selected opcode sequences (of lengths ranging from 1 to 3 opcodes) associated with each of the 100 randomly positioned IPs at the start of each generation.

Figure 11 shows the effect of entropy injection on the rate at which the effective size of the opcode basis set (perplexity) shrinks for three cases, each averaged over four runs: (a) the standard case for the current studies, but where no emergence occurred; (b) the first control case, where the mutation rates are as for the standard case but no random opcodes are associated with the 100 randomly positioned IPs; and (c) the second control case, where both the COPY and DIVD mutation rates are reduced to a low level of 0.0001 (see Note 5) and no random opcodes are associated with any of the 100 randomly positioned IPs (as for case b).

Figure 11.

Reduced basis set perplexity versus time (millions of generations). (a) Standard case: normal mutation rate, normal random opcode injection. (b) Control case: standard mutation rate, no random opcode injection. (c) Control case: minimal mutation rate, no random opcode injection.

As Figure 11 shows, successively reducing the rate of entropy injection (cases b and c) drives the perplexity of the opcode basis set below 20 opcodes in less than 100,000 generations. This implies a lower probability of a self-replicator emerging, as some opcodes useful and/or required for self-replication are eliminated from the memory space.

The control cases still exhibit self-organization; we observe some replicating structures similar to those seen in Coreworld. However, these structures become static, with no further evolution of the basis set, within a few hundred thousand generations.

This injection of random sequences at the start of each new generation is a means of introducing entropy (in addition to mutations) into the system. We observe that injecting too much entropy thermalizes the system; that is, the contents of memory space are randomized more rapidly than the basis set becomes biased. On the other hand, the control tests of Figure 11 indicate that with too few random opcodes, the basis set condenses into an effective alphabet of only a dozen or so opcodes. The address opcodes, NOP1 and NOP2, dominate memory space, while the frequencies of some opcodes critical for replication, such as conditional breaks (IFAG or IFAL) or opcodes required for determining the end of a program's sequence (ADRF and SUBB), steadily decrease. Earlier studies with Coreworld evolved a reduced basis set under various conditions [25]. Recent studies with the Avida platform have generated reduced basis sets when no entropy is injected [16]. This suggests there is a critical regime for the amount of entropy injected into the system.

How much injected entropy is optimal? This question is challenging because propagated sequences affect the environment (evidenced by the increasing mutual information in Figure 7). For example, when self-replicators dominate an Amoeba run, a much larger mutation rate is needed to offset the high rate of sequence duplication for these robust replicators. The optimal amount of entropy changes during a run.

Since it is impossible to match the optimal amount of entropy at any given stage during an evolving Amoeba world, Amoeba-IV injects entropy (random sequences of one to three opcodes) with probabilities that have a triangular distribution across the memory bands, shown in Figure 12 by the solid black curve (using the left-hand scale). The amount of entropy is a maximum in the central bands centered at number 250, dropping off to zero probability near the edge bands 1 and 500. The triangular distribution is created by adding two successive random numbers to generate an integral band number between 1 and 500.
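The band-selection rule can be sketched in Python as follows. This is our reading of "adding two successive random numbers," not the Amoeba-IV source code: we assume two uniform numbers on [0, 1) whose sum is scaled onto bands 1 to 500, we assume the one to three injected opcodes per IP have uniformly distributed lengths, and the opcode names OP00 through OP24 are placeholders.

```python
import random
from collections import Counter
from typing import List, Tuple

N_BANDS = 500
N_IPS = 100                                  # randomly positioned IPs per generation
OPCODES = [f"OP{i:02d}" for i in range(25)]  # placeholder opcode names


def triangular_band(rng: random.Random) -> int:
    """Band number in 1..N_BANDS from the sum of two uniform random numbers.

    u1 + u2 has a triangular density on [0, 2) peaked at 1, so bands near the
    middle are chosen most often and the edge bands almost never.
    """
    s = rng.random() + rng.random()                  # in [0, 2)
    return min(int(s * N_BANDS / 2) + 1, N_BANDS)    # 1..500


def inject_entropy(rng: random.Random) -> List[Tuple[int, List[str]]]:
    """One generation of hypothetical injection: each IP brings 1 to 3 random
    opcodes (lengths assumed uniform) into a triangularly chosen band."""
    events = []
    for _ in range(N_IPS):
        band = triangular_band(rng)
        length = rng.randint(1, 3)
        events.append((band, [rng.choice(OPCODES) for _ in range(length)]))
    return events


rng = random.Random(42)
histogram = Counter((triangular_band(rng) - 1) // 50 for _ in range(100_000))
for b in range(N_BANDS // 50):
    print(f"bands {50 * b + 1:3d}-{50 * b + 50:3d}: {histogram[b]:6d}")
print(inject_entropy(rng)[:3])
```

Summing two uniform numbers is a cheap way to obtain a triangular distribution without any table lookup; the histogram of 50-band bins rises linearly toward the center and falls linearly toward the edges.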

Figure 12.

Dependence of propagated-sequence frequencies (solid green diamonds, left-hand scale) and emerging replicator counts (solid red circles, right-hand scale) across bands. Random opcodes (entropy) are injected with the triangular frequency given by the solid black lines (left-hand scale).

The distributions across bands for both ancestral replicator sequences (solid red circles, plotted against the right-hand axis as number of counts) and propagated sequences before emergence (solid green diamonds, plotted against the left-hand axis as frequencies) are also plotted in Figure 12. These data are binned, 50 bands per bin for the replicator counts and 20 bands per bin for the propagator frequencies.

The emergence band data for propagated sequences include data for all logged sequences containing at least ten opcodes that were generated by parents during the first 1M generations of a run, or up until 100,000 generations prior to the emergence of a self-replicator. This was done to identify the emergence band locations of sequences that existed for at least four generations.

The replicator ancestor data include not only robust self-replicators that survive for many millions of generations, but also proto-replicator quasi-species populations that survived at least 1000 generations after the proto-replicator emerged. This raises an important point: it is a challenge to identify the emergence of an ancestor in Amoeba-IV. Ancestors are logged in a file when a parent has faithfully copied itself to children for at least four generations. However, early proto-replicators do not copy themselves faithfully, and it typically takes several thousand generations before a faithful replicator is generated and logged. During this time, millions of programs have been initiated, and some small subset of them will eventually lead to an ancestor that faithfully copies its opcodes to its children. It is possible that an ancestral proto-replicator emerges out of a hypercycle of interacting components [11], but this type of interaction is nontrivial to track. One can examine the “world snapshots” that are periodically saved, but each snapshot is a list of the entire Amoeba memory, and analyzing any interactions within that map is difficult. In several runs we have been able to identify an inefficient proto-replicator precursor as the ancestor. Many of these are very large sequences, as shown by the tail of the size distribution (Figure 9).

We show for a selected run the effect self-organization has on the environment in Figure 13, where we plot the mutual information as a function of (binned) bands and the time (generations) into the run. Emergence occurred at 1.835M generations, only 5000 generations after the green (solid squares) data were taken.

Figure 13.

Evolution of the mutual information across the bands for a single Amoeba run. See the text for details. Run: 02102016_47902.

Random opcode sequences of one to three opcodes were injected with the peaked distribution (black curve in Figure 12). The solid diamonds (red curve) were recorded only 10K generations after the start of the Amoeba run, and there are already some differences across the bands. The localized spikes occur where a single opcode is repeated sequentially for hundreds of opcodes in a single band, visible as the horizontal streaks of a single color (opcode) in Figure 3b. A few thousand generations before emergence, the central bands, which have a high probability of injected entropy, exhibit lower mutual information. About 0.440M generations after emergence (blue crosses in Figure 13), the mutual information increases across all bands, as the replicators are able to propagate children even into the high-entropy central bands.

We also observe that the shape of the mutual information curves changes in two ways: the minimum can shift by at least a hundred bands (compare the solid yellow circles with the solid green squares), and the variation of the mutual information lessens as we approach the time of emergence (solid yellow circles and red diamonds). The variation in the post-emergence data is smaller than before emergence because the robust replicators are able to copy across the central bands faster than random opcodes are added. These data indicate that the optimal amount of injected entropy changes during a run (before emergence).

### 3.3 Diversity of Emergent Ancestral Replicators

The Amoeba-IV system is more open-ended than previous versions during the prebiotic and protobiotic stages. Various novel adaptive functionalities emerge: Copy-Loops propagate sequences throughout memory; programs determine their size; programs create colonies, partially protecting them from lethal mutations; and programs create primitive “cell walls,” affording them some protection against parasites. However, Amoeba has not yet achieved the ultimate goal of true open-endedness where ever increasingly complex functionalities evolve. Once a robust self-replicator emerges, future evolution in its functionality is limited to minor improvements such as unrolling the Copy-Loop or resetting registers and propagating multiple children.

Anatomies for two very different replicators were shown earlier: the “typical” replicator with anatomy of Figure 2, and the CALL-RETN replicator with anatomy of Figure 5. Those two examples, along with the next two below, demonstrate the rich diversity of replicators that is one of the outcomes of the modified Amoeba-IV artificial chemistry.

#### 3.3.1 “Size Guesser” Proto-replicator

An example of a class of robust proto-replicators that has emerged in several runs is shown in Figure 14. This is an inefficient replicator; it does not include the Self-Exam gene, so it “guesses” its size; the parent cannot generate a complete copy of itself until the third child.

Figure 14.

Anatomy of a replicator that guesses its size; the Self-Exam gene is missing. Run: 06-17-2015_54809. Generation: 1.042M.

About 200,000 generations later, this proto-replicator evolved into a robust replicator that used the ADRF and SUBB opcodes to calculate its size properly, thereby avoiding incomplete copies of itself on its first two attempts.

#### 3.3.2 Conditional Ladder Replicator

The anatomy of a conditional ladder replicator is shown in Figure 15. This replicator uses a ladder of IFAL checks to avoid prematurely resetting the program's registers while copying its code to a child, two opcodes per pass through the COPY loop. When copying is done, the replicator's fourth IFAL check fails and the BX register is set by the second SUBB opcode.

Figure 15.

Anatomy of a replicator that uses a conditional ladder of IFAL opcodes to avoid resetting registers (DIVD, POPA, SUBB). Run: 06-20-2015_31615. Generation: 4.315M.

This replicator's COPY loop is inefficient because it spans the entire program's opcode sequence. Once the parent has copied itself, the value stored in the AX register exceeds the program's size, recorded in the BX register, and the IFAL ladder checks fail. The parent initiates its child (DIVD) and resets its AX register (POPA), and the MALL opcode executes again, allocating a new location to which the parent copies its sequence for a new child.

#### 3.3.3 Encapsulation of Replicator Sequences

Amoeba-IV does not artificially write-protect any portion of its memory space. There are no predefined cellular boundaries such as existed in the earlier Amoeba-I and -II topologies. However, programs frequently manage to partially protect their code. They do this in two main ways.

First, most replicators use the opcode combination {ADRB, NOP(s)} to reset the starting location that a virtual CPU uses for its child's code to the replicator's starting NOP template. This also prevents rogue IPs (those of other programs) from using a replicator's code to copy themselves, because the ADRB opcode resets a rogue's starting address (in the CX register) to the host replicator's starting memory address. Similarly, a RETN opcode, usually in a proto-replicator, can return a (potential) parasite's IP to the parasite's own opcode sequence. The parasite is usually a short viral sequence of only a few opcodes, one of which is the CALL opcode. This is a transient situation; once the category of viruses using a CALL to access a proto-replicator's code has vanished, the RETN is no longer needed.

Second, we have observed a primitive repair mechanism: Robust replicators will overwrite less fit sequences that may arise from mutations occurring in their children. Colony members protect their code similarly.

## 4 Discussion and Future Research

Beyond the successful generation of self-replicators through opcode self-organization in a primordial soup of opcodes, there are several other interesting observations.

Many proto-replicators lack some of the genes shown above in Figure 2 for a self-replicator. An example of a proto-replicator without the Self-Exam gene was shown in Figure 14. Additionally, not all replicators retain their IP. Replicators can lose their IP and still function if they are members of a colony of similar replicators that each generate a single child and then lose their IP to a neighboring colony member; the lost IP executes opcodes belonging to other programs further down the band.

We observe runs where replicators cheat and obtain a larger CPU time slice. This issue arises because the time slice is proportional to a program's size. Earlier versions of Amoeba automatically incremented the AX register as part of the COPY opcode, so the size of a program was the number of opcodes copied. Amoeba-IV now requires the INCA opcode to increment the AX register. This raises an interesting question: how should the size of a child be determined when the DIVD opcode initiates it? We cannot increment the size every time a COPY is executed, since prebiotics can copy the same opcode many times (they lack the INCA opcode). Currently, we use the AX register value to determine the size. The cheaters copy their replicator sequence, typically about 30 opcodes, and then, prior to the DIVD opcode, use multiple ADDA opcodes to increase the AX value to many times the program's size. We are still investigating how to prevent this parasitic behavior without decreasing the replicators' genome diversity.

The Amoeba systems have always been dominated by variants of a single species after emergence. We have never observed two radically different species coexisting. Virus-like parasites occur, but are quickly eliminated when the host mutates. For example, the retention of the ADRB opcode, even though not required for a Self-Exam, effectively hijacks viral IPs and their associated CPUs. We believe a variation in the externally imposed fitness landscape may enable alternative species, as well as parasites and hosts, to coexist for extended times. Amoeba-IV partially addresses this by varying the rate at which random sequences are distributed throughout the bands. However, replicators are still able to quickly scatter their children throughout all bands. There are several options to consider: slowing down the rate at which children are spread throughout the bands, imposing write protection when a program allocates memory for a child (similarly to Tierra), or modifying the time slice parameters for different regions in memory so that replicator sequences of different sizes would find some parts of memory inhospitable.

We observe that the probability of a replicator emerging after about one million generations is low (Figure 10). This implies that there are some self-organization processes that can lower the probability of emergence. One observation is that the NOP1 and NOP2 opcodes steadily become more prevalent in runs without emergence. This reduces the likelihood of creating building-block propagators, because other opcodes critical to replication occur less frequently. We used nonuniform entropy injection as a means to minimize the freezing out of critical opcodes. Emergence preferentially occurred in the bands with an intermediate amount of entropy injection (Figure 12). Control experiments showed that the opcode basis set becomes more biased with less entropy injection (Figure 11).

The earlier Amoeba versions I and II demonstrated that the probability of a randomly generated opcode sequence being a replicator increased with size (number of opcodes). It has been argued that computation-universal chemistries such as Avida or Tierra would exhibit a probability that decreased with increasing size [2]. Preliminary observations with Amoeba-IV indicate this is not the case—about half of the ancestral replicators have sizes exceeding 100 opcodes—and this aspect of our findings warrants further exploration. While ancestors in Amoeba-IV are generated from building blocks that are propagated during the self-organization phase, these genes are generated multiple times in huge ungainly proto-replicators with sizes ranging up to the maximum currently allowed (450 opcodes). We would also like to explore removing the maximum size of 450 opcodes.

Another topic of interest is the difference between propagators, proto-replicators, and self-replicators. Amoeba-IV has shown it is very difficult to identify precisely when an ancestral replicator emerges. It may be that groups of propagators form hypercycles that eventually generate a proto-replicator that subsequently leads to robust self-replicators [11,13]. We are currently investigating this by logging sequences that have been generated at least 50 times, regardless of whether or not the sequences are the same as their parental sequence.

We have created ancestral self-replicators with a length of 14 opcodes and an upper limit of 11.5 for their information content, based on one-point substitutions performed on the ancestor in a sterile environment (no opcodes in surrounding memory space) [3]. This needs to be investigated further, using multi-point substitutions and a range of environments [16].
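The one-point-substitution estimate can be sketched in Python in a simplified form, following the approach of [3] as we understand it: at each site, count the opcodes (wild type included) that still allow replication, treat them as equally likely, and subtract the resulting per-site entropy (base 25) from 1. The counts in the example are invented for illustration and are not Amoeba measurements; `information_content` is a hypothetical name.

```python
import math
from typing import Sequence

D = 25  # size of the opcode alphabet


def information_content(neutral_counts: Sequence[int], alphabet: int = D) -> float:
    """Estimate I = L - sum_i log_D(n_i) from one-point substitution data.

    neutral_counts[i] is the number of opcodes (wild type included) that can
    sit at site i without destroying replication, so 1 <= n_i <= alphabet.
    Treating those opcodes as equally likely gives a per-site entropy of
    log_D(n_i), so a fully constrained site contributes 1 and a fully
    unconstrained site contributes 0.
    """
    if any(not 1 <= n <= alphabet for n in neutral_counts):
        raise ValueError("each count must lie between 1 and the alphabet size")
    return sum(1.0 - math.log(n, alphabet) for n in neutral_counts)


# Invented counts for a 14-opcode ancestor: ten rigid sites, four that tolerate 5 opcodes.
example = [1] * 10 + [5] * 4
print(f"I = {information_content(example):.2f} (in units of log_25)")
```

Because single-site tests ignore interactions between sites and are performed in a sterile environment, a value obtained this way is best read as an upper limit, in line with how the 11.5 figure is described above.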

## Acknowledgments

We are grateful for fruitful conversations with Stanislas Leibler, who was also the main inspiration for the development of the Amoeba-IV platform and subsequent research. We owe a great deal to the folks at IAS (Princeton, NJ), especially Bernard Chazelle. We had informative discussions with many people at the Artificial Life XV conference in Cancun, especially Christoph Adami, Mark Bedau, Nicholas Guttenberg, Simon Hickinbotham, Charles Ofria, Steen Rasmussen, and Hiroki Sayama. Finally, we thank the three anonymous reviewers for their excellent comments and suggestions.

Supplementary material, consisting of the source code (written in PowerBasic, an early type of visual Basic) and a sample environment file read by the source code, can be found here: http://www.mitpressjournals.org/doi/suppl/10.1162/ARTL_a_00234.

## Notes

1

These minimum lengths are not the respective information contents. Our initial estimates, based on one-point substitutions [3] in a sterile environment (the surrounding memory is empty, devoid of any opcodes), place an upper limit of 11.5 for the information content of a self-replicator containing 14 opcodes, and 11.0 for an inefficient proto-replicator of 11 opcodes.

2

We chose 2399 opcodes per band because 2399 is a prime number. This prevents programs from fitting an integral number of children along a band, which would otherwise create a barrier to altering program sizes in future generations.

3

This latter criterion was imposed because there were some runs where the 100 randomly injected IPs were unable to allocate the next 1,900 CPUs in a reasonable period of time.

4

An alternative to using the natural logarithm is to take the log of the size of the opcode alphabet, base 25 in the Amoeba results discussed here.

5

We did not observe self-organization when mutation rates were set to zero. Each run settled into a (unique) static state within a few hundred thousand generations.

## References

1
,
C.
(
1995
).
On modeling life
.
Artificial Life
,
1
,
429
438
.
2
,
C.
(
1998
).
Introduction to artificial life
.
New York
:
Springer-Verlag
.
3
,
C.
,
Ofria
,
C.
, &
Collier
,
T. C.
(
2000
).
Evolution of biological complexity
.
Proceedings of the National Academy of Sciences of the U.S.A.
,
97
(
9
),
4463
4468
.
4
,
C.
(
2012
).
The use of information theory in evolutionary biology
.
Annals of the New York Academy of Sciences
,
1256
,
49
65
.
5
Bedau
,
M. A.
(
2003
).
Artificial life: Organization, adaptation and complexity from the bottom up
.
Trends in Cognitive Sciences
,
7
(
11
),
505
512
.
6
Clarkson
,
P. R.
, &
Rosenfeld
,
R.
(
1997
).
Statistical language modeling using the CMU-Cambridge Toolkit
. In
G.
Kokkinakis
et al
(Eds.),
Fifth European Conference on Speech Communication and Technology
(pp.
2707
2710
).
Bonn, Germany
:
ISCA Archive, W. Hess (Ed.)
.
7
Cordaux
,
R.
, &
Batzer
,
M. A.
(
2009
).
The impact of retrotransposons on human genome evolution
.
Nature Reviews Genetics
,
10
(
10
),
691
703
.
8
Cover
,
T. M.
, &
Thomas
,
J. A.
(
1991
).
Elements of information theory
.
New York
:
Wiley
.
9
Cronin
,
L.
, &
Walker
,
S. I.
(
2016
).
Beyond prebiotic chemistry
.
Science
,
352
,
1174
1175
.
10
Dittrich
,
P.
,
Ziegler
,
J.
, &
Banzhaf
,
W.
(
2001
).
Artificial chemistries—a review
.
Artificial Life
,
7
(
3
),
225
275
.
11. Eigen, M. (1971). Selforganization of matter and the evolution of biological macromolecules. Die Naturwissenschaften, 58, 465–532.
12. Eigen, M., Schuster, P., Winkler-Oswatitsch, R., & Gardiner, W. (1981). The origin of genetic information. Scientific American, 244(4), 88–118.
13. Eriksson, A., Görnerup, O., Jacobi, M. N., & Rasmussen, S. (2006). Quasi-species and aggregate dynamics. In L. M. Rocha et al. (Eds.), Artificial life X (pp. 145–151). Cambridge, MA: MIT Press.
14. Greenbaum, B., & Pargellis, A. N. (2016). Digital replicators emerge from a self-organizing prebiotic world. In C. Gershenson, T. Froese, J. M. Siqueiros, W. Aguilar, E. J. Izquierdo, & H. Sayama (Eds.), Proceedings of the Artificial Life Conference 2016 (pp. 60–67). Cambridge, MA: MIT Press.
15. Hickinbotham, S. J., Clark, E., Nellis, A., Stepney, S., Clarke, T., & Young, P. (2016). Maximizing the adjacent possible in automata chemistries. Artificial Life, 22(1), 49–75.
16. LaBar, T., & , C. (2016). From entropy to information: Biased typewriters and the origin of life. In S. I. Walker, P. Davies, & G. Ellis (Eds.), Information and causality: From matter to life (pp. 95–113). Cambridge, U.K.: Cambridge University Press.
17. Maley, C. C. (1994). The computational completeness of Ray's Tierran assembly language. In C. G. Langton (Ed.), Artificial Life III (pp. 503–514). Redwood City, CA.
18. Manning, C. D., & Schütze, H. (2000). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
19. McMullin, B. (2012). Architecture for self-reproduction: Abstractions, realisations and a research program. In C. , D. M. Bryson, C. Ofria, & R. T. Pennock (Eds.), Artificial Life 13, Proceedings of the Thirteenth International Conference on the Simulation and Synthesis of Living Systems (pp. 83–90). Cambridge, MA: MIT Press.
20. Ofria, C., Bryson, D. M., & Wilke, C. O. (2009). Avida: A software platform for research in computational evolutionary biology. In M. Komosinski & A. (Eds.), Artificial life models in software (Chapter 1). London: Springer-Verlag.
21. Pargellis, A. (1996). The spontaneous generation of digital “life.” Physica D, 91, 86–96.
22. Pargellis, A. (1996). The evolution of self-replicating computer organisms. Physica D, 98, 111–127.
23. Pargellis, A. N. (2001). Digital life behavior in the Amoeba world. Artificial Life, 7(1), 63–75.
24. Pargellis, A. (2003). Self-organizing genetic codes and the emergence of digital life. Complexity, 8(4), 69–78.
25. Rasmussen, S., Knudsen, C., Feldberg, R., & Hindsholm, M. (1990). The Coreworld: Emergence and evolution of cooperative structures in a computational chemistry. Physica D, 42, 111–134.
26. Ray, T. (1991). An approach to the synthesis of life. In C. G. Langton et al. (Eds.), Artificial life II: Proceedings of the Second Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems (pp. 371–408). Redwood City, CA.
27. Rivoire, O., & Leibler, S. (2011). The value of information for populations in varying environments. Journal of Statistical Physics, 28, 1124–1166.
28. Sayama, H. (1999). A new structurally dissolvable self-reproducing loop evolving in a simple cellular automata space. Artificial Life, 5(4), 343–365.
29. Suzuki, H. (2011). Artificial chemistry and molecular networks. In H. Sawai (Ed.), Biological functions for information and communication technologies: Studies in computational intelligence (pp. 87–161). Berlin: Springer-Verlag.
30. Tangen, U. (2010). The emergence of replication in a digital evolution system using a secondary structure approach. In H. Fellermann et al. (Eds.), Artificial Life XII (pp. 168–175). Cambridge, MA: MIT Press.
31. Taylor, T., et al. (2016). Open-ended evolution: Perspectives from the OEE Workshop in York. Artificial Life, 22(3), 408–423.

## Author notes

Contact author.

∗∗ Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029. E-mail: benjamin.greenbaum@mssm.edu

7109 Via De La Reina, Bonsall, CA 92003. E-mail: apargellis@yahoo.com