## Abstract

One of the main goals of Artificial Life is to research the conditions for the emergence of life, not necessarily as it is, but as it could be. Artificial chemistries are one of the most important tools for this purpose because they provide us with a basic framework to investigate under which conditions metabolisms capable of reproducing themselves, and ultimately, of evolving, can emerge. While there have been successful attempts at producing examples of emergent self-reproducing metabolisms, the set of rules involved remains too complex to shed much light on the underlying principles at work. In this article, we hypothesize that the key property needed for self-reproducing metabolisms to emerge is the existence of an autocatalyzed subset of Turing-complete reactions. We validate this hypothesis with a minimalistic artificial chemistry with conservation laws, which is based on a Turing-complete rewriting system called combinatory logic. Our experiments show that a single run of this chemistry, starting from a tabula rasa state, discovers, with no external intervention, a wide range of emergent structures including ones that self-reproduce in each cycle. All of these structures take the form of recursive algorithms that acquire basic constituents from the environment and decompose them in a process that is remarkably similar to biological metabolisms.

## 1 Introduction

One central area of focus for Artificial Life is characterizing the conditions that lead to the emergence of living systems. More precisely, the goal is to understand under which conditions metabolisms capable of sustaining themselves in time, reproducing, and ultimately, evolving can emerge. *Artificial chemistries* can be used to reveal this process by simulating the properties of natural chemical systems at different levels of abstraction (see Dittrich et al., 2001, for a thorough review). The driving hypothesis is that complex organizations emerge from the interactions of simpler components thanks to self-organizing attractors in chemical networks (Kauffman, 1993; Walker & Ashby, 1966; Wuensche et al., 1992). While some artificial chemistries model as closely as possible the properties of the chemistry that gave rise to life on Earth (Flamm et al., 2010; Högerl, 2010; Young & Neshatian, 2013), others leave out the particularities of natural chemistry to focus only on their hypothesized core computational properties (Buliga & Kauffman, 2014; di Fenizio and Banzhaf, 2000; Fontana & Buss, 1994; Hutton, 2002; Sayama, 2018; Tominaga et al., 2007). Interestingly, some of these studies have described systems that produce emergent metabolic structures (Bagley & Farmer, 1992), and others feature self-reproducing structures as well (Hutton, 2002; Young & Neshatian, 2013), at times also capable of undergoing evolution (Hutton, 2007). However, it is still not clear which of the properties in these chemical systems are central to the emergence of these structures. Yet, gaining such insights is a crucial step for deriving more general biological theories grounded both in life-as-it-is and life-as-it-might-be (Langton, 1989).

Here, we hypothesize that self-reproducing metabolisms emerge as a subset of autocatalyzed reactions within a Turing-complete set. To validate this idea, we introduce *combinatory chemistry*, an artificial chemistry designed to capture the core computational properties of living systems. It consists of a set of autocatalyzed reactions that are based on the rewriting rules of *combinatory logic* (Curry et al., 1958; Schönfinkel, 1924). These reactions inherit from the original rewriting rules the capacity to perform universal computation. Furthermore, we adapted the rules so that the ensuing reactions would have conservative dynamics. Completing the set of possible reactions, there are random mixing rules that act at a far lower rate.

The resulting chemical system is strongly constructive (Fontana et al., 1993), which means that as the system evolves in time it can create—by chance or through the action of its own components—new components that can in turn modify its global dynamics. Furthermore, thanks to its universal computation capabilities, there is no theoretical limit to the complexity that emergent forms can have. On the other hand, because of the conservation dynamics, the memory cost of the system remains constant without needing to introduce external perturbations (such as randomly removing elements from the system), while possibly also providing a natural source of selective pressure.

In contrast to previous work that explicitly banned expressions that would not reduce to a normal form (di Fenizio and Banzhaf, 2000; Fontana & Buss, 1994), combinatory chemistry can handle them adequately by distributing their (potentially infinite) computation steps over time as individual reactions. Also, with respect to an earlier version of this work (Kruszewski & Mikolov, 2020), here we have further simplified the system, dropping the “reactant assemblage” mechanism through which we used to feed emergent structures with their required resources. This mechanism enabled them to grow their activity and thus be more easily spotted, but it also biased the evolution of the system. Instead, here we have devised a new metric to identify these structures in naturally occurring resource conditions. Moreover, we use Gillespie's algorithm (Gillespie, 1977) to simulate the time evolution of the system to obtain unbiased samples of the proposed process, instead of the more simplistic algorithm that was used before.

Starting from a tabula rasa state consisting of only elementary components, combinatory chemistry produces a diversity explosion, which then develops into a state dominated by self-organized emergent autopoietic structures (Maturana & Varela, 1973), including recursively growing and self-replicating ones. Notably, all these types of structures emerge during a single run of the system without requiring any external intervention. Furthermore, they preserve themselves in time by absorbing compounds from their environment and decomposing them step by step, in a process that has a striking resemblance to the metabolism of biological organisms. These structures take the form of recursive algorithms that never reach a normal form (i.e., halting point) as long as sufficient resources are present in their environment. Notably, we found that Turing-completeness was required in combinatory chemistry for representing self-replicating structures, but not for simple autopoietic or recursively growing ones.

This article is organized as follows. First, we describe earlier work in artificial chemistry that is most related to the approach introduced here. Then, we explain the basic workings of combinatory logic and how we adapted it in an artificial chemistry. Third, following earlier work, we discuss how autocatalytic sets can be used to detect emerging phenomena in this system, and propose a novel measure of emergent complexity, which is well adapted to the introduced system. Finally, we describe our experiments showcasing the emergence of complex structures in combinatory chemistry and discuss their implications.

## 2 Artificial Chemistries

Artificial chemistries are models inspired by natural chemical systems that are usually defined by three different components: a set of possible molecules, a set of reactions, and a reactor algorithm describing the reaction vessel and how molecules interact with each other (Dittrich et al., 2001). In the following discussion, we will focus on the algorithmic chemistries that are the closest to the present work.

AlChemy (Fontana & Buss, 1994) is an artificial chemistry where molecules are given by *λ*-calculus expressions. *λ*-calculus is a mathematical formalism that, like Turing machines, can describe any computable function. In AlChemy, pairs of randomly sampled expressions are joined through function application, evaluated, and the corresponding result is added back to the population. To keep the population size bounded, expressions are randomly discarded. Fontana and Buss showed that expressions that computed themselves quickly emerged in this system, which they called *level 0 organizations*. Furthermore, when these expressions were explicitly prohibited, a more complex organization emerged where every expression in a set was computed by other expressions within the same set (level 1 organizations). Finally, mixing level 1 organizations could lead to higher order interactions between them (level 2 organizations). Yet, this system had some limitations. First, each level of organization was only reached after external interventions. In addition, programs were evaluated using *β* reductions, which require that they reach a normal form, namely, that there are no more *λ*-calculus rules that can be applied. Thus, recursive programs, which never reach a normal form, are banned from the system. Here, we use weak reductions instead, allowing the system to compute the time evolution of programs that never reach a normal form. Interestingly, it is exactly in this way that emergent metabolisms are represented. Furthermore, in AlChemy, two processes were introduced as analogues of food and waste, respectively. First, when expressions are combined, they are not removed from the system, allowing the system to temporarily grow in size. Second, expressions that after being combined with existing expressions do not match any *λ*-calculus reduction rules are removed. Without these processes, complex organizations fail to emerge. Yet, it is not clear under which circumstances these external interventions would not be needed anymore in order for the system to evolve autonomously. Finally, bounding the total number of expressions by randomly removing excess ones creates perturbations to the system that can arbitrarily affect the dynamics. Fontana and Buss (1996) later proposed MC2, a chemistry based on linear logic that addressed some of these limitations (notably, conservation of mass), although we are not aware of empirical work on it.

Here, we propose an artificial chemistry based on combinatory logic. This formalism has been explored before in the context of artificial chemistries by di Fenizio and Banzhaf (2000). This work also introduces conservation laws, even though they rely on decomposing expressions to their individual components, introducing some randomness into the dynamics that here we have avoided. Furthermore, like AlChemy, it reduces expressions until they reach their normal forms, explicitly forbidding recursive and other types of expressions that do not converge.

Other closely related artificial chemistries are based on graph rewriting systems. Squirm3 (Hutton, 2002) is a chemistry in which atoms are placed in a 2D grid where they can react with each other, creating or breaking bonds. Interestingly, Hutton (2002) shows that self-reproducing evolvable chains can emerge in this environment when using the right set of reactions, which, as in the artificial chemistry introduced here, have intrinsic conservation laws. Yet, it remains unclear which characteristics of those reactions make it possible for this emergence to occur. In this work, we study the hypothesis that self-reproducing metabolisms are linked to recursive programs expressed through a network of autocatalyzed reactions endowed with universal computation capabilities. In a different vein, *Chemlambda* (Buliga, 2020; Buliga & Kauffman, 2014) is a Turing-complete graph rewriting artificial chemistry that allows the encoding of *λ*-calculus and combinatory logic operators. As such, it is complementary in many ways to the system proposed here. While the original Chemlambda did not consider conservation laws, an extension called Hapax is currently exploring them. However, emergent phenomena have not yet been explored under this formalism.

## 3 Combinatory Logic

*Language of Thought* (Piantadosi, 2021). One of the main advantages of combinatory logic is its formal simplicity while capturing Turing-complete expressiveness. In contrast to other mathematical formalisms, such as *λ*-calculus, it dispenses with the notion of variables and all the necessary bookkeeping that comes with it. For instance, a function *f*(*x*) = 1 + *x* + *y* would be nonsensical, and a function-generating system based on *λ*-calculus would need to have explicit rules to avoid the formation of such expressions. Instead, combinatory logic expressions are built by composing elementary operators called *combinators*. Here, we restrict combinators to *S*, *K* and *I*, which form a Turing-complete basis.^{1}

A combinatory logic expression is defined either to be a singleton combinator or recursively, given two expressions *x* and *y*, by the application operation (*x y*). It is important to note that, by convention, application is left-associative and thus, (*x y z*) and ((*x y*) *z*) are equivalent. Given an expression *e* of the form *e* = (*αXβ*) where *X* is a well-formed subexpression and *α* and *β* are some arbitrary left and right context, it can be rewritten in combinatory logic, as follows:

$$
\begin{aligned}
\alpha(I\,f)\beta &\rhd \alpha f \beta \\
\alpha(K\,f\,g)\beta &\rhd \alpha f \beta \\
\alpha(S\,f\,g\,x)\beta &\rhd \alpha(f\,x\,(g\,x))\beta
\end{aligned}
$$

When the expression (*αXβ*) matches the left-hand side of any of the rules above, the term *X* is called a *reducible expression* or *redex*. A single expression can contain multiple redexes. If no rule is matched, the expression is said to be in *normal form*. The application of these rules to rewrite any redex is called a *(weak) reduction*. For example, the expression (*SII*(*SII*)) could be reduced as follows (underlining the corresponding redexes being rewritten): $(\underline{SII(SII)}) \rhd (\underline{I(SII)}(I(SII))) \rhd (SII(\underline{I(SII)})) \rhd (SII(SII))$. Thus, this expression, also known as the *omega combinator*, reduces to itself. We will later see that expressions such as this one will be important for the self-organizing behavior of the system introduced here. In contrast, (*SII*) is not reducible because *S* requires three arguments.^{2} Additionally, note that (*I*(*SII*)(*I*(*SII*))) has two redexes that can be rewritten, namely, the outermost or the innermost *I* combinators. Even though many different evaluation order strategies have been defined (Pierce, 2002), here we opt for picking a redex at random, both because this is more natural for a chemical system and to avoid limitations that would come from following a fixed deterministic evaluation order.
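The weak reduction procedure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the experiments: the nested-tuple encoding of expressions and the helper names (`app`, `spine`, `contract`, `redex_paths`, `rewrite_at`, `reduce_once`) are hypothetical choices made here.

```python
import random

# Assumed encoding: an atom is "S", "K" or "I"; the application (f x) is the
# pair (f, x). Application is left-associative, so (S I I) is (("S", "I"), "I").

def app(*terms):
    """Build a left-associative application from two or more terms."""
    expr = terms[0]
    for term in terms[1:]:
        expr = (expr, term)
    return expr

def spine(expr):
    """Split an expression into its head atom and the arguments applied to it."""
    args = []
    while isinstance(expr, tuple):
        expr, arg = expr
        args.append(arg)
    return expr, args[::-1]

def contract(expr):
    """Rewrite expr if it is exactly (I f), (K f g) or (S f g x); else None."""
    head, args = spine(expr)
    if head == "I" and len(args) == 1:
        return args[0]
    if head == "K" and len(args) == 2:
        return args[0]
    if head == "S" and len(args) == 3:
        f, g, x = args
        return ((f, x), (g, x))
    return None

def redex_paths(expr, path=()):
    """Yield the position of every redex as child indices (0 = left, 1 = right)."""
    if contract(expr) is not None:
        yield path
    if isinstance(expr, tuple):
        yield from redex_paths(expr[0], path + (0,))
        yield from redex_paths(expr[1], path + (1,))

def rewrite_at(expr, path):
    """Return expr with the redex located at `path` contracted."""
    if not path:
        return contract(expr)
    left, right = expr
    if path[0] == 0:
        return (rewrite_at(left, path[1:]), right)
    return (left, rewrite_at(right, path[1:]))

def reduce_once(expr, rng=random):
    """Contract one redex chosen uniformly at random; None if in normal form."""
    paths = list(redex_paths(expr))
    return rewrite_at(expr, rng.choice(paths)) if paths else None

A = app("S", "I", "I")
omega = (A, A)                   # (SII(SII))
step1 = reduce_once(omega)       # the outer S is the only redex
step2 = rewrite_at(step1, (0,))  # contract the left I
step3 = rewrite_at(step2, (1,))  # contract the remaining I
```

Following the two *I* contractions explicitly keeps the example deterministic; calling `reduce_once` repeatedly would instead pick among all available redexes, including the outer *S* redex that reappears after the first *I* contraction.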

## 4 Combinatory Chemistry

While the *K* combinator discards a part of the expression (the argument *g*), *S* duplicates its third argument *x*. Thus, to make a chemical system with conservation laws, we posit that, on one hand, reduction operations can generate one or more by-product expressions. On the other hand, the reduction rules can be applied to more than one expression simultaneously. Therefore, using the + symbol to indicate that multiple expressions are being rewritten when it appears on the left-hand side, or more than one expression is being added back to the multiset when it is on the right-hand side, we define reduce reactions for an expression or *substrate* (*αXβ*), as follows:

$$
\begin{aligned}
\alpha(I\,f)\beta &\longrightarrow \alpha f \beta + I \\
\alpha(K\,f\,g)\beta &\longrightarrow \alpha f \beta + g + K \\
\alpha(S\,f\,g\,x)\beta + x &\longrightarrow \alpha(f\,x\,(g\,x))\beta + S
\end{aligned}
$$

An expression is said to be *reducible* if it contains a combinatory chemistry redex (CC-redex). A CC-redex is a plain combinatory logic redex, except when it involves the reduction of an *S* combinator, in which case a copy of its third argument *x* (the *reactant*) must also be present in the multiset $P$ for it to be a redex in combinatory chemistry. For example, the expression *SII*(*SII*) is reducible if and only if the third argument of the combinator *S*, namely (*SII*), is also present in the multiset. When a reduction operation is applied, the redex is rewritten following the rules of combinatory logic, removing any reactant from $P$ and adding back to it the *product* and all *by-products*, as specified on the right-hand side of the reaction. The type of combinator being reduced gives its name to the reaction. For instance, the *S*-reaction operating on *SII*(*SII*) + (*SII*) removes these two elements from $P$, adding back *I*(*SII*)(*I*(*SII*)) and *S* to it. Notably, each of these reduction rules preserves the total number of combinators in the multiset, intrinsically enforcing conservation laws in this chemistry. It is also worth noting that each of these combinators plays a different role in the creation of novel compounds. While *K*-reactions split the expression, decreasing its total size and complexity, *S*-reactions create larger and possibly more complex expressions from smaller parts.

In contrast to previous attempts in which expressions were combined and then reduced to normal form (di Fenizio and Banzhaf, 2000; Fontana & Buss, 1994)—thus being forced to exclude expressions that did not reach a normal form—here each reduce reaction corresponds to a *single* reduction step that can always be computed. For this reason, we do not need to take any precautions to avoid recursive expressions, but instead allow these interesting programs to form part of our system's dynamics.
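As a rough illustration of how single-step reduce reactions act on the multiset while conserving the number of combinators, the sketch below applies them to a `collections.Counter`. The tuple encoding of expressions and the function names (`contract`, `plug`, `reduce_at`, `mass`) are assumptions made for this example rather than part of the original system.

```python
from collections import Counter

# Assumed encoding: atoms are "S", "K", "I"; the application (f x) is the pair (f, x).

def spine(expr):
    """Split an expression into its head atom and its list of arguments."""
    args = []
    while isinstance(expr, tuple):
        expr, arg = expr
        args.append(arg)
    return expr, args[::-1]

def size(expr):
    """Number of combinators in an expression."""
    return size(expr[0]) + size(expr[1]) if isinstance(expr, tuple) else 1

def mass(pool):
    """Total number of combinators in the multiset."""
    return sum(size(expr) * count for expr, count in pool.items())

def contract(expr):
    """Return (rewritten, reactants, by_products) if expr is a redex, else None."""
    head, args = spine(expr)
    if head == "I" and len(args) == 1:
        return args[0], [], ["I"]
    if head == "K" and len(args) == 2:
        return args[0], [], [args[1], "K"]
    if head == "S" and len(args) == 3:
        f, g, x = args
        return ((f, x), (g, x)), [x], ["S"]
    return None

def plug(expr, path, new):
    """Replace the subexpression of expr at `path` (0 = left, 1 = right) by `new`."""
    if not path:
        return new
    left, right = expr
    if path[0] == 0:
        return (plug(left, path[1:], new), right)
    return (left, plug(right, path[1:], new))

def reduce_at(pool, substrate, path=()):
    """Apply one reduce reaction inside `substrate`; return the product or None."""
    target = substrate
    for index in path:
        target = target[index]
    step = contract(target)
    if step is None or pool[substrate] < 1:
        return None
    rewritten, reactants, by_products = step
    if any(pool[r] < 1 for r in reactants):
        return None  # an S-reaction needs a free copy of its third argument
    product = plug(substrate, path, rewritten)
    pool.subtract([substrate] + reactants)
    pool.update([product] + by_products)
    return product

A = (("S", "I"), "I")                 # (SII)
pool = Counter({(A, A): 1, A: 1})     # one (SII(SII)) plus one free reactant
start = mass(pool)
p1 = reduce_at(pool, (A, A))          # S-reaction: consumes A, releases S
p2 = reduce_at(pool, p1, (0,))        # I-reaction on the left argument
p3 = reduce_at(pool, p2, (1,))        # I-reaction on the right argument
```

After the three reactions the pool holds the original expression again together with one *S* and two *I* combinators, and `mass(pool)` is unchanged.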

Condensations join two expressions *x* and *y* through the application operation, whereas cleavages are the inverse. Note that cleaving (*xyz*) can only result in (*xy*) + *z* because, otherwise, the tree structure would not be preserved.

In combinatory chemistry, computation takes precedence. This means that reduction reactions must happen at much higher rates than those of random recombination. The following section details how this is achieved.

### 4.1 Temporal Evolution

The system is initialized with a tabula rasa state containing only expressions with a single combinator *S*, *K*, or *I*. In this way, we can be sure that any emergent diversity is the consequence of the system's dynamics rather than the outcome of an external intervention. Then, it evolves by sampling reactions following Gillespie's algorithm (Gillespie, 1977, 2007). We note that the time evolution algorithm has changed from an earlier version of this work (Kruszewski & Mikolov, 2020) in favor of this more principled approach.

More precisely, we define a propensity function *a*_{j}(**x**) for each reaction *j*, which computes the unnormalized probability that the reaction *j* will occur within an infinitesimal time interval given the system's state vector **x**. The component **x**_{x} indicates the number of instances of an expression *x* that are present in $P$. To define the propensity functions, we make use of reaction rate constants *k*_{X}, *k*_{Π}, $k_{A_S}$, $k_{A_K}$, $k_{A_I}$, for the cleavage, condensation, and *S*, *K*, and *I* reduction reactions, respectively. Importantly, because in combinatory chemistry computation takes precedence, reaction rates *k*_{j} of reduction reactions must be significantly larger than those of random recombinations: *k*_{A∈{S,K,I}} ≫ *k*_{B∈{X,Π}}. The propensity function takes different forms depending on whether the reaction is unimolecular or bimolecular, following the formulation of Gillespie (2007). For unimolecular reactions, such as *I* and *K* reductions, and cleavages, *a*_{j} takes the form $a_j(\mathbf{x}) = c_j \mathbf{x}_{x_1}$ where $\mathbf{x}_{x_1}$ is the number of copies of the reaction's substrate *x*_{1} in **x**. For bimolecular reactions like *S* reductions and condensation with substrate *x*_{1} and reactant *x*_{2}, it takes the form $a_j(\mathbf{x}) = c_j \mathbf{x}_{x_1} \mathbf{x}_{x_2}$ if *x*_{1}≠*x*_{2} and the form $a_j(\mathbf{x}) = c_j \frac{1}{2} \mathbf{x}_{x_1}(\mathbf{x}_{x_1} - 1)$ if *x*_{1} = *x*_{2}, where $\mathbf{x}_{x_1}$ and $\mathbf{x}_{x_2}$ are the number of expressions of *x*_{1} and *x*_{2}, respectively. In the case of unimolecular reactions, *c*_{j} is equal to a reaction rate constant *k*_{J} where *J* ∈{*X*,*A*_{K},*A*_{I}} is the type of reaction *j*, whereas for bimolecular reactions *c*_{j} = *k*_{J}/*Ω* if *x*_{1}≠*x*_{2} and 2*k*_{J}/*Ω* if *x*_{1} = *x*_{2}, where *Ω* is the volume of the simulated container (Gillespie, 2007) and *J* ∈{*Π*,*A*_{S}}.
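The propensity definitions above can be written down directly. The following is a minimal sketch: the function name `propensity`, the string labels for reaction kinds, and the dictionary layout are assumptions made here, while the default rate constants mirror the values reported in Section 6.1.

```python
# Hypothetical labels for the five reaction kinds; values follow Section 6.1.
RATES = {"cleavage": 1.0, "condensation": 1.0, "I": 1e4, "K": 1e4, "S": 1e6}
UNIMOLECULAR = {"cleavage", "I", "K"}

def propensity(kind, counts, x1, x2=None, volume=1e4, rates=RATES):
    """Propensity a_j(x) of one reaction, given expression counts.

    kind   -- reaction type, a key of `rates`
    counts -- mapping from expression to its number of copies
    x1     -- substrate; x2 -- second reactant for bimolecular reactions
    """
    k = rates[kind]
    if kind in UNIMOLECULAR:
        # c_j = k_J and a_j = c_j * x_{x1}
        return k * counts[x1]
    if x1 != x2:
        # c_j = k_J / Omega and a_j = c_j * x_{x1} * x_{x2}
        return (k / volume) * counts[x1] * counts[x2]
    # c_j = 2 k_J / Omega and a_j = c_j * (1/2) * x_{x1} * (x_{x1} - 1)
    return (2 * k / volume) * 0.5 * counts[x1] * (counts[x1] - 1)

counts = {"S": 10, "K": 5}
a_cleave = propensity("cleavage", counts, "S")
a_mixed = propensity("condensation", counts, "S", "K", volume=100)
a_same = propensity("condensation", counts, "S", "S", volume=100)
```

Note that for identical reactants the factor 2 in *c*_{j} cancels the 1/2 in the propensity, so the result is (*k*_{J}/*Ω*) times the number of ordered pairs of distinct copies.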

The probability that the next reaction is *j* is *p*(*j*) = *a*_{j}(**x**)/*a*_{0} where $a_0 = \sum_j a_j(\mathbf{x})$, and the time interval until its occurrence is distributed as an exponential distribution with parameter *a*_{0}. To sample from this process, we follow the direct method (Gillespie, 1977). For efficiency reasons, we factorize the reaction probability by the kind of the next reaction *J* being a condensation (*Π*), a cleavage (*X*), or a reduction (*A* = *A*_{S} ∪ *A*_{K} ∪ *A*_{I}), where {*A*_{Z}}_{Z∈{S,K,I}} stands for the set of all possible *Z* reductions: *p*(*J*) = *a*_{J}(**x**)/*a*_{0}(**x**), $a_0(\mathbf{x}) = \sum_{J \in \{\Pi, X, A\}} a_J(\mathbf{x})$, and $a_J = \sum_{j \in J} a_j(\mathbf{x})$.

For condensations and cleavages, *a*_{J} takes a simpler form that can be efficiently computed by keeping track of the total number of expressions $\sum_x \mathbf{x}_x$, whereas for reduce reactions we must explicitly sum over all reactions, with *j*_{1} denoting the substrate of the reduction reaction *j* and *j*_{2} the reactant in the case of reducing an *S* combinator.

Thus, to sample a reaction, we first sample the reaction type *J*. If it is a cleavage, then we sample one expression according to its concentration, and cleave it into two subexpressions by dividing it at the root. If it is a condensation, then we sample two expressions according to their concentration (the second, after removing one element of the first one), and combine them through the application operator. Finally, if it is a reduction, then we sample one reduce reaction from the space of all possible reduce reactions, with probability proportional to its propensity. In practice, we just need to compute all possible reductions involving the expressions that are present in the system's state.^{3} The complete algorithm describing the temporal evolution of our system is summarized in Algorithm 1.^{4}
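For readers unfamiliar with the direct method, one sampling step can be sketched generically as follows. This is not Algorithm 1 itself: the function name `gillespie_step` and the flat dictionary of propensities are assumptions for illustration, and the two-level factorization by reaction type described above is omitted for brevity.

```python
import math
import random

def gillespie_step(propensities, rng=random):
    """One step of Gillespie's direct method.

    propensities -- mapping from a reaction identifier to its propensity a_j(x)
    Returns (reaction, tau): the sampled reaction and the waiting time until it
    fires, or (None, inf) when no reaction can occur.
    """
    a0 = sum(propensities.values())
    if a0 <= 0:
        return None, math.inf
    tau = rng.expovariate(a0)          # waiting time ~ Exponential(a0)
    threshold = rng.random() * a0      # pick reaction j with probability a_j / a0
    accumulated = 0.0
    for reaction, a_j in propensities.items():
        accumulated += a_j
        if threshold < accumulated:
            return reaction, tau
    return reaction, tau               # guard against floating-point rounding

rng = random.Random(0)
draws = [gillespie_step({"a": 1.0, "b": 3.0}, rng) for _ in range(20000)]
frac_b = sum(reaction == "b" for reaction, _ in draws) / len(draws)
mean_tau = sum(tau for _, tau in draws) / len(draws)
```

The same routine could be called twice to mimic the factorized sampling: once over the aggregated propensities of the reaction types and once over the reactions of the selected type.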

## 5 Emergent Structures

Having described the dynamics of combinatory chemistry, we now turn to discuss how we can characterize emergent structures in this system. For this, we first discuss how autocatalytic sets can be applied for this purpose. Second, we observe that this formalism may not completely account for some emergent structures of interest, and thus, we propose to instead track reactant consumption rates as a proxy metric to uncover the presence of these structures. Finally, we enrich this metric to detect consumption levels that are above chance levels.

### 5.1 Autocatalytic Sets

Self-organized order in complex systems is hypothesized to be driven by the existence of attracting states in the system's dynamics (Kauffman, 1993; Walker & Ashby, 1966; Wuensche et al., 1992). Autocatalytic sets (Kauffman, 1993) were first introduced by Stuart Kauffman in 1971 as one type of such attractors that could help explain the emergence of life in chemical networks. (See Hordijk, 2019, for a comprehensive historical review on the topic.) Related notions are the concept of autopoiesis (Maturana & Varela, 1973), and the hypercycle model (Eigen & Schuster, 1978).

*Autocatalytic sets* are reaction networks that perpetuate in time by relying on a network of catalyzed reactions, where each reactant and catalyst of a reaction is either produced by at least some reaction in the network, or it is freely available in the environment. This notion was later formalized in mathematical form (Hordijk & Steel, 2004; Hordijk et al., 2015) with the name of *reflexively autocatalytic food-generated* sets (RAFs). Particularly, a *chemical reaction system* (CRS) is first defined to denote the set of possible molecules, the set of possible reactions, and a catalysis set indicating which reactions are catalyzed by which molecules. Furthermore, a set of freely available molecules in the environment, called the *food set*, is assumed to exist. Then, an autocatalytic set (or RAF set) $S$ of a CRS with associated food set *F* is a subset of reactions, which is:

1. *reflexively autocatalytic* (RA): Each reaction $r \in S$ is catalyzed by at least one molecule that is either present in *F* or can be formed from *F* by using a series of reactions in $S$ itself.
2. *food-generated* (F): Each reactant of each reaction in $S$ is either present in *F* or can be formed by using a series of reactions from $S$ itself.

### 5.2 Metabolic Structures in Combinatory Chemistry

In combinatory chemistry, all reducing reactions take precedence over random condensations and cleavages, and thus, they proceed at a higher rate than random reactions without requiring the definition of a catalyst. Therefore, we say that they are *autocatalyzed* and note that they all trivially satisfy condition 1. Thus, autocatalytic sets in this system are defined in terms of subsets of reduce reactions in which every reactant is produced by a reduce reaction in the set or is freely available in the environment (condition 2). For example, if we assume that *A* = (*SII*) is in the food set, Figure 1 shows a simple autocatalytic set associated with the expression (*AA*) = (*SII*(*SII*)). As shown, a chain of reduce reactions keeps the expression in a self-sustaining loop: When the formula is first reduced by the reaction *r*_{1}, a reactant *A* is absorbed from the environment and one *S* combinator is released. Over the following steps, two *I* combinators are sequentially applied and released back into the multiset $P$, with the expression returning to its original form. In other words, in one cycle the expression (*AA*) absorbs one copy of *A* from the environment, releasing back into it the elementary components obtained as by-products of the reactions. We refer to this process as a metabolic cycle because of its strong resemblance to its natural counterpart. For convenience, we write the just described cycle as (*AA*) + *A* ⇒^{*}(*AA*) + *ϕ*(*A*), where *ϕ* is a convenience function that allows us to succinctly represent the decomposition of *A* as elementary combinators by mapping its argument into *n*_{S}*S* + *n*_{K}*K* + *n*_{I}*I* where *n*_{S}, *n*_{K}, and *n*_{I} stand for the number of *S*, *K*, and *I* combinators in *A*, and the ⇒^{*} symbol indicates that there exists a pathway of reduction reactions from the reactants on the left-hand side to the products on the right-hand side.

It should be noted that there are (infinitely) many possible pathways, some of them not necessarily closing the loop. For instance, instead of reducing (*SII*(*I*(*SII*))) through the only available *I* reduction (*r*_{4} in Figure 1), it could also be possible to reduce the full expression by applying the *S* reduction rule as long as at least one copy of (*I*(*SII*)) is present in $P$, yielding (*I*(*I*(*SII*))(*I*(*I*(*SII*)))) as a result. We could continue reducing this expression, first through the two outermost *I* combinators, thus obtaining (*SII*(*I*(*I*(*SII*)))), and then through the first *S* combinator, provided that a copy of (*I*(*I*(*SII*))) is present in $P$, obtaining (*I*(*I*(*I*(*SII*)))(*I*(*I*(*I*(*SII*))))) as a result. This outermost-first reduction order could be followed ad infinitum, stacking ever more *I* combinators within the expression. Nonetheless, this also has a cost, as there must be copies of reactants (*I*(*SII*)), (*I*(*I*(*SII*))), …, (*I*(…(*I*(*SII*)))) in the multiset $P$ for these reactions to occur, and longer expressions are normally scarcer. For this reason, when we talk about the metabolic cycle we usually refer to the *least effort* cycle, in which *I* and *K* combinators are reduced before *S* ones, and where *S* combinators with shorter or naturally more frequent expressions in the third argument (the reactant) are reduced before longer ones.

While autocatalytic sets provide a compelling formalism to study emergent organization in artificial chemistry, they also leave some blind spots for detecting emergent structures of interest. Such is the case for *recursively growing* metabolisms. Consider, for instance, *e* = (*S*(*SI*)*I*(*S*(*SI*)*I*)). This expression is composed of two copies of *A* = (*S*(*SI*)*I*) applied to itself (*AA*). As shown in Figure 2, during its metabolic cycle, it will consume two copies of the element *A*, metabolizing one to perform its computation and appending the other one to itself, thus (*AA*) + 2*A* ⇒^{*}(*A*(*AA*)) + *ϕ*(*A*). As time proceeds, the same computation will occur recursively, thus (*A*(*AA*)) + 2*A* ⇒^{*}(*A*(*A*(*AA*))) + *ϕ*(*A*), and so on. While this particular behavior cannot be detected through autocatalytic sets, because the resulting expression is not exactly equal to the original one, it still involves a structure that preserves its functionality over time.

Moreover, while the concept of autocatalytic set captures both patterns that perpetuate themselves in time and patterns that also multiply their numbers, it does not explicitly differentiate between them. A pattern with a metabolic cycle of the form *AA* + *A* ⇒^{*}*AA* + *ϕ*(*A*) (as in Figure 1) keeps its own structure in time by metabolizing one *A* in the food set, but it does not self-reproduce. We call such patterns *simple autopoietic* (Maturana & Varela, 1973). In contrast, for a pattern to be *self-reproducing* it must create copies of itself that are later released as new expressions in the environment. For instance, consider a metabolic cycle in Figure 3 with the form (*AA*) + 3*A* ⇒^{*}2(*AA*) + *ϕ*(*A*). This structure creates a copy of itself from two freely available units of *A* and metabolizes a third one to carry out the process.

### 5.3 Metrics for Detecting Emergence

All structures identified in the previous section have in common the need to absorb reactants from the environment to preserve themselves in homeostasis. Furthermore, because they follow a cyclical process, they will continuously consume the same types of reactants. Thus, we propose tracking *reactant consumption* as a proxy metric for the emergence of structures. In other words, we note that the only operation that allows an expression to incorporate a reactant into its own body is the reduction of the *S* combinator, and thus we count the number of reactants *x* consumed by expressions of the form *α*(*Sf g x*)*β* in the time interval [*t* : *t* + *δ*) normalized by the total number of reactants consumed in the same interval, denoting it as *O*_{t:t +δ}(*x*), or simply *O*(*x*).
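The normalized consumption rate *O*_{t:t+δ}(*x*) amounts to a windowed frequency count. A minimal sketch follows, assuming that each *S*-reduction is logged as a hypothetical `(time, reactant)` pair; the function name `consumption_rates` and the log format are illustrative only.

```python
from collections import Counter

def consumption_rates(events, t_start, delta):
    """Fraction of reactant consumptions per reactant in [t_start, t_start + delta).

    events -- iterable of (time, reactant) pairs, one per S-reduction
    Returns a dict mapping each consumed reactant x to O(x); empty if the
    window contains no S-reduction.
    """
    counts = Counter(
        reactant for time, reactant in events if t_start <= time < t_start + delta
    )
    total = sum(counts.values())
    return {x: n / total for x, n in counts.items()} if total else {}

# Hypothetical log: reactant names are plain strings here for readability.
events = [(0.1, "A"), (0.2, "A"), (0.5, "I"), (1.5, "A")]
rates = consumption_rates(events, 0.0, 1.0)
```

In this toy log, the event at time 1.5 falls outside the window and does not contribute.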

Indeed, this metric was used in a previous version of this work (Kruszewski & Mikolov, 2020) to detect emergent structures. Yet, it has the problem that it is also sensitive to reactants that are consumed at high rates just because they are very frequent, and as a consequence, expressions containing them as a third argument to an *S* combinator are expected to be common as well, inflating this metric for uninteresting reactions.

To address this issue, we propose a metric based on *pointwise information*.^{5} It quantifies the information *I*(*x*) associated with observing a consumption rate *O*(*x*) for a reactant *x* in contrast to its consumption rate if the process were driven by chance only, *R*(*x*).

We take *R*(*x*) to be the relative frequency of an expression on a hypothetical process where no reduce reactions are present, but instead only random mixing given by random collisions and cleavages. To model this process, we follow the formulation of the reaction kinetic equations given by Fellermann et al. (2017), with *c*_{X} = *k*_{X}, *c*_{Π} = *k*_{Π}/*Ω*, and *c*_{0} = *a*_{0}(**x**) as the partition function, and define *R*(*x*) as the normalized equilibrium concentration $\mathbf{x}^*_x$ of expression *x* under these random kinetics:

$$R(x) = \frac{\mathbf{x}^*_x}{\sum_{x'} \mathbf{x}^*_{x'}}, \qquad \mathbf{x}^*_x = \frac{c_X}{c_\Pi} e^{-b|x|},$$

where |*x*| is the length of expression *x* and b is a constant that depends on the boundary conditions. In particular, we ask for the initial mass of the system, represented by the dimensionless constant M, to be conserved in the equilibrium distribution by equating it to the total number of combinators:

$$\sum_x |x|\, \mathbf{x}^*_x = M.$$

Because there are k^{n}*C*_{n−1} expressions of length *n*, where k = 3 is the number of different possible combinators and *C*_{n} is the *n*th Catalan number, this equation becomes:

$$\sum_{n=1}^{\infty} n\, k^n C_{n-1} \frac{c_X}{c_\Pi} e^{-bn} = M.$$

Solving this equation for b determines the equilibrium concentrations **x**^{*} and, through *R*(*x*), the information *I*(*x*). We refer to Appendix 1 for all the computations verifying these claims. In the following section, we show its effects experimentally.

## 6 Experiments and Discussion

### 6.1 Metrics

We began by testing the effect of the proposed information metric on a system of M = 10^{4} uniformly distributed *S*, *K*, and *I* combinators, simulated until it reached time *T* = 1,000. Candidate reactions are sampled by 10 threads working simultaneously. For all our experiments, we used *k*_{Π} = *k*_{X} = 1, *k*_{K} = *k*_{I} = 10^{4}, *k*_{S} = 10^{6} and fixed the dimensionless constant *Ω* = M. We leave a thorough exploration of parameter values for future work. With these parameters, the expressions associated with random kinetic dynamics become $b = \log\left((2+\sqrt{5})\,k\right)$ and $R(x) = \frac{1}{2}(3+\sqrt{5})\left((2+\sqrt{5})\,k\right)^{-|x|}$. All curves are smoothed by locally averaging every data point at time *t* with those in the interval [*t* − 10, *t* + 10].
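The smoothing step can be sketched as a centered moving average. The function below is an illustration only: how the original analysis treats points near the start and end of a run is not stated, so truncating the window at the boundaries is an assumption, as is the name `smooth`.

```python
def smooth(values, times, half_width=10.0):
    """Average each point with all points whose time lies in [t - h, t + h].

    Near the boundaries the window simply contains fewer points
    (an assumption; the source does not specify edge handling).
    """
    smoothed = []
    for t in times:
        window = [v for v, s in zip(values, times)
                  if t - half_width <= s <= t + half_width]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Toy series sampled every 10 time units.
times = [0, 10, 20, 30, 40]
values = [0.0, 10.0, 20.0, 30.0, 40.0]
result = smooth(values, times)
```

This quadratic-time version favors clarity; for long traces a sliding-window implementation would be preferable.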

We first present the raw consumption rates *O*(*x*), which are displayed in Figure 4(a). As shown, some of the most frequently consumed reactants include the atomic combinators *S*, *K*, and *I*, and some of their binary compositions. Binary reactants such as (*KI*) and atomic ones such as *I* do not form part of any stable structure, and the expressions consuming them are produced by chance. Yet, they are used with considerable frequency because *S* combinators are more likely to be applied to shorter arguments than to longer ones. For this reason, the consumption of *I* is considerably higher than the consumption of (*KI*). By the same argument, the consumption rate of *A* = (*SII*) should be below that of binary reactants; however, self-organization into autopoietic patterns drives the usage of this reactant above what would be expected if chance were the only force at play. Indeed, the curve corresponding to the consumption of the reactant *A* = (*SII*) is associated with the autopoietic pattern (*SII*(*SII*)), composed of two copies of this reactant, and a metabolic cycle of the form (*AA*) + *A* ⇒^{*}(*AA*) + *ϕ*(*A*), as shown in Figure 1. Nonetheless, (*S*(*SI*)*I*), used by the structure in Figure 2, is crowded at the bottom of the consumption rate plot together with the binary reactants.

When applying the proposed information metric *I*(*x*), the curves corresponding to emerging structures rise to the top of the graph, whereas all other reactants, which are mostly driven by random generation, are pushed to the bottom, as shown in Figure 4(b). This experiment thus showcases the usefulness of this metric in separating consumption rates propelled by emergent structures from uninteresting fortuitous ones.

### 6.2 Emergent Metabolic Structures

For the next part, we studied some of the emergent structures in a system initialized with a tabula rasa state consisting of M = 10^{6} evenly distributed *S*, *K*, and *I* combinators simulated for 1,000 units of time.

In an earlier version of this work (Kruszewski & Mikolov, 2020), we had used a supplementary mechanism called *reactant assemblage*, through which we “fed” emerging structures with their required reactants to allow them to be spotted through sheer reactant consumption rates in a relatively small system with only 10,000 combinators. Here, we simplified the model and dropped the need for this mechanism, thanks both to simulating a larger system with 1M combinators and to the usage of the re-weighting metric presented in Metrics for Detecting Emergence, above, which allowed us to spot emergent complex structures even at very low levels of reactant consumption rates.

We began by analyzing some general metrics on one given run of the system. First, and in agreement with previous work (Meyer et al., 2009), we find that there is a tendency for the system to create increasingly longer expressions, as shown by the length of the largest expression in the system (Figure 5(a)). We also count the number of distinct expressions present in the system at any given time, which we display in Figure 5(b). As can be seen, diversity explodes at the beginning, driven by the random recombination of the elementary combinators, peaking very early on, then decreasing fast at first, and then slower after about time 200. This behavior is consistent with a system that self-organizes into attracting states dominated by fewer, but more frequent expressions. Again, this result agrees with previous work that has shown that diversity decreases over time on a number of ACs (Banzhaf et al., 1996; Dittrich & Banzhaf, 1998; Dittrich et al., 2001). However, we note that in contrast with many of these systems that were initialized with random elements, thus, maximizing diversity from the start, here the initial diversity is an emergent property of the system dynamics as it is only initialized with three different elements, namely, the singleton expressions *S*, *K*, and *I*. Next, we note that the proportion of reducing vs. random recombination reactions is an emergent property of the system which depends on the number of reducible expressions that are present in its state at any given time. As can be seen in Figure 5(c), this rate, which necessarily starts at 0, increases sharply in the beginning, reaching slightly more than 30%. Then, it starts to slowly decrease, but remains always above 25% during the studied period.
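The three summary statistics discussed above are straightforward to compute from a snapshot of the system. The sketch below assumes a hypothetical snapshot format (a `Counter` from expression strings to copy counts) and a hypothetical reaction log of `"reduce"`, `"condense"`, and `"cleave"` labels; neither is taken from the released code.

```python
from collections import Counter

def summarize(pool, reaction_log):
    """Return (longest expression length, number of distinct expressions,
    share of reduce reactions) for one snapshot.

    pool: Counter mapping expression strings to copy counts (assumed format).
    reaction_log: list of reaction kinds observed in the interval.
    """
    present = [expr for expr, copies in pool.items() if copies > 0]
    # Length is the number of combinators, ignoring parentheses.
    longest = max(sum(ch in "SKI" for ch in expr) for expr in present)
    diversity = len(present)
    reduce_share = reaction_log.count("reduce") / len(reaction_log)
    return longest, diversity, reduce_share

# Toy snapshot (made-up numbers).
pool = Counter({"S": 40, "K": 35, "I": 30, "(SII)": 2, "(SII(SII))": 1})
log = ["condense", "reduce", "cleave", "reduce"]
stats = summarize(pool, log)
```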

However, it is unclear from these results whether there are emergent complex structures that act as attractors, or if a different explanation for these outcomes is at play. To answer this question, we turned to the reactant consumption rates weighted by Equation 9 to detect whether specific reactants were more prominently used by some emergent structures.

Results are shown in Figure 6(a) for a few selected reactants that highlight the emergence of different types of structures, including simple autopoietic, recursively growing, and self-reproducing ones. Interestingly, they can emerge at different points in time, co-exist, or be driven to extinction. In parallel, Figure 6(b) shows the number of copies of these reactants available at each discrete time interval of unit length.

While there are (infinitely) many possible expressions that can consume a given type of reactant, only a few of them will correspond to emergent metabolisms. In general, we observed that expressions that consume any given reactant *A* are typically composed of multiple juxtaposed copies of this reactant in an expression of the form (*AA*). This is linked to the fact that in order to express recursive functions in combinatory logic, the function (in this case denoted by *A*) must take itself as its own argument, but this particularity also confirms the old adage: “Tell me what you eat and I will tell you what you are.”

The first curve in Figure 6(a) (in the order of the legend) corresponds to the reactant *A* = (*SII*), associated with the *simple autopoietic* structure of Figure 1. The consumption rates are much more stable than those reported in Figure 4(b), which belonged to a system that was 100 times smaller. Furthermore, the number of copies of this reactant decreases sharply at the beginning, but then, intriguingly, slowly increases again until reaching a concentration of about 200 units in total.

The second reactant in the plot, (*AA*) (where *A* = (*SII*)) is not actually consumed by a metabolism. Instead, it is consumed by one of the two possible reductions of the expression (*A*(*AA*)) = (*SII*(*SII*(*SII*))), which reduces to (*AA*(*AA*)) = (*SII*(*SII*)(*SII*(*SII*))). This new structure is yet another autopoietic structure that is composed of two copies of (*AA*) = (*SII*(*SII*)) that persist in time by consuming the (*SII*) reactant independently of each other.

The next three reactants in Figure 6(a) correspond to *recursively growing* structures. The first one uses the reactant *A* = (*S*(*SI*)*I*) and follows a right-branching cycle that linearly increases the size of the structure: (*AA*) + 2*A* ⇒^{*}(*A*(*AA*)) + *ϕ*(*A*) (Figure 2). The second one, with reactant *A* = (*S*(*SII*)*I*), also grows recursively, although with a left-branching structure: (*AA*) + 2*A* ⇒^{*}(*AAA*) + *ϕ*(*A*). Third, there is the reactant *A* = (*S*(*SSI*)*K*), which is associated with a binary-branching recursive structure.^{6}

Finally, the curve of the reactant *A* = (*SI*(*S*(*SK*)*I*)) corresponds to the emergence of a *self-reproducing* structure, following a cycle of the form (*AA*) + 3*A* ⇒^{*}2(*AA*) + *ϕ*(*A*), thus duplicating itself after metabolizing one copy of the reactant in the process. Interestingly, this structure emerges not only through the effect of random recombination, but also thanks to self-organization. Instead of being a product of a random combination of two copies of the reactant *A*, it often emerges after the condensation of other reactants, such as (*S*(*SI*(*SI*))(*SI*)) and (*S*(*SK*)*I*), (*SI*(*SI*(*SI*))) and (*S*(*SK*)*I*), or (*SII*) and (*SI*(*S*(*SK*)*I*)), among other possibilities that induce a chain of reduction reactions that result in producing at least one copy of (*AA*).

It is worth noting that one cycle of this structure's metabolism requires three copies of the reactant. When there are none in the environment, the structure cannot proceed with its metabolism and this structure is vulnerable to being cleaved or being condensed with an expression that will cause it to stop functioning normally. Because of the rare supply of its six-combinator-long reactant, plus the fact that all existing structures compete with one another, the structure will fall into extinction at about time *t* = 200, although it will make a short comeback at time *t* = 400, when only two reproduction cycles are completed, before falling back into oblivion. Nonetheless, we speculate that in larger systems the population will recover from periods of resource scarcity by following the periodic dynamics originally proposed by Lotka (1910), thus allowing these structures to perpetuate in time.

Notably, recursive structures also experience low resource conditions starting at time *t* = 200. However, they might be able to cope with conditions of low resources more effectively because their repetitive structure allows them to be cleaved and still conserve their function. For instance, *A*(*AA*) ⇒ *A* + (*AA*) still leaves a functioning (*AA*) structure. When new copies of *A* become available either through the random condensation of combinators released by every computed reduction or by some other process, they can consume them and grow back again.

### 6.3 Other Bases

Thus far, we have explored emergent structures on systems composed of *S*, *K*, and *I* combinators. Because one of the main goals of this work is linking the emergence of metabolic structures to the core computational properties of the underlying chemistry, we explored other possible (smaller) bases.

First of all, we note that using *K* or *I* combinators by themselves would not produce any meaningful structure, nor would a combination thereof. Any expression formed out of these combinators can only decay into binary expressions at most. In the case of the *S* combinator, while it still allows expressions that can be reduced infinitely, such as (*SSS*(*SSS*)(*SSS*)), there are no expressions that do so by continually consuming the same reactant. This is because, as shown by Waldmann (2000), there are no expressions *X* composed only of the *S* combinator such that *X* ⇒^{*}(*αXβ*). Therefore, autocatalytic sets are not possible in this environment. For all of these reasons, it is not meaningful to look at simulations containing only one of the *S*, *K*, or *I* combinators.

The only two possible remaining subsets of combinators are the pairs *S* − *I* and *S* − *K*, with only the latter being Turing-complete. We again ran experiments in which only one pair of combinators was present. We used 10^{4} combinators for the *S* − *I* basis, and 10^{6} for the *S* − *K* one. Figure 7(a) shows the information traces for the reactant (*SII*), consumed by the simple autopoietic structure in Figure 1, and (*S*(*SI*)*I*), consumed by the structure in Figure 2, corroborating that these structures also emerge when only *S* and *I* combinators are present. In Figure 7(b), we also note that a homologous autopoietic structure emerges with the *S* − *K* basis, as shown by the consumption of the reactant *A* = (*S*(*SK*)(*SKK*)). This structure has a metabolic cycle of the form (*AA*) + 3*A* ⇒^{*}(*AA*) + 2*A* + *ϕ*(*A*), thus needing to absorb two extra copies of the reactant to perform the computation, even if they are later released unchanged. Additionally, recursively growing structures can be spotted in the *S* − *K* basis, as shown by the consumption of *A* = (*S*(*SSK*)), which is associated with a metabolic cycle of the form (*AAA*) + 3*A* + *AA* ⇒^{*}(*SSKA*(*AAA*)) + 4*S* + *K* + *AA*, thus growing and incorporating *SSK* as a prefix in the process. Interestingly, *AA* is absorbed and released intact, so it could be construed as an emergent catalyst for the reaction. Even though we can interpret each reduce reaction to be autocatalyzed, reaction chains can have emergent properties, as in this case, where a reactant is only used to complete the metabolic cycle and is then released.

Finally, we note that while we have not yet witnessed an emergent self-reproducing structure using the *S* − *K* basis only, it can in fact be represented with the expression (*AA*) where *A* = (*S*(*SK*)(*S*(*SK*)(*SKK*))), and having as metabolic cycle (*AA*) + 5*A* ⇒^{*}2(*AA*) + *A* + *KA* + 5*S* + 3*K*. However, finding it requires discovering a considerably longer expression than the one in the *SKI* basis, and it consumes much longer reactants, which might explain why we have not yet found it to emerge in our simulations at the current scale. Nonetheless, it is worth noting that in the incomplete *S* − *I* basis it would not be possible to represent a self-reproducing metabolic cycle of the form *X* ⇒^{*}2*X* + … for any *X*, because this would require a reaction capable of producing as a by-product an arbitrarily long expression that reduces to *X* at some point during the cycle. Yet, *S* and *I* combinators have only themselves as by-products of their respective reactions and cannot fulfill this requirement.

## 7 Conclusions

We have introduced combinatory chemistry, an algorithmic artificial chemistry based on combinatory logic. Even though it has relatively simple dynamics, it gives rise to a wide range of autopoietic structures, including recursively growing and self-reproducing ones. These structures feature reaction cycles that bear a striking resemblance to natural metabolisms. All of them take the form of recursive algorithms that continually consume specific resources, incorporating them into their structure and decomposing them to perform their function. Thanks to combinatory logic being Turing-complete, the presented system can theoretically represent patterns of arbitrary complexity. Furthermore, we have argued that in the context of the *SKI* basis, this computational universality property is both necessary and sufficient to represent self-reproducing patterns, while also showing them to emerge at least in the case where all three combinators are present. On the other hand, a non-universal basis consisting only of *S* and *I* combinators can still give rise to simple autopoietic and recursively growing structures.

The proposed system does not need to start from a random set of initial expressions to kick-start diversity. Instead, this initial diversity is the product of the system's own dynamics, as it is only initialized with elementary combinators. In this way, we can expect that this first burst of diversity is not just a one-off event, but it is deeply embedded into the mechanics of the system, possibly allowing it to keep on developing novel structures continually.

To conclude, we have introduced a simple model of emergent complexity in which self-reproduction emerges autonomously from the system's own dynamics. In the future, we will seek to apply it to explain the emergence of evolvability, one of the central questions in Artificial Life. We believe that the simplicity of the model, along with the encouraging results presently obtained and the creativity obtained from balancing computation with random recombination to search for new forms, leaves it in good standing to tackle the many challenges that lie ahead.

## Acknowledgements

We would like to thank the two anonymous reviewers for their thorough and insightful feedback, which significantly improved earlier versions of this work.

## Appendices

## Appendix 1: Random Kinetics Derivations

### A1.1 Equilibrium Distribution

Here we verify that $x^* = \frac{c_X}{c_\Pi} e^{-b|x|}$ is an equilibrium of the random kinetics. Noting that if *z* = (*xy*), then |*z*| = |*x*| + |*y*|, that an expression *x* = (*x*_{l}*x*_{r}) can only be cleaved into *x*_{l} and *x*_{r}, while vice versa, only the condensation of *x*_{l} and *x*_{r} can form *x*, that there are k^{n}*C*_{n−1} possible expressions of length *n* (where *C*_{n} stands for the *n*th Catalan number and k = 3 is the number of combinators), and that we must sum twice the factors corresponding to an expression *z* that can be formed either as (*xy*) or as (*yx*), the condensation and cleavage terms cancel at $x^*$.
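One way to check the equilibrium claim is through detailed balance. The sketch below assumes mass-action kinetics in which the condensation of *x*_{l} and *x*_{r} proceeds at rate $c_\Pi\, x_l\, x_r$ and the cleavage of *x* = (*x*_{l}*x*_{r}) at rate $c_X\, x$, together with an exponential ansatz for the equilibrium. It is an illustration under those assumptions, not necessarily the route taken in the original derivation.

$$
x^* = \frac{c_X}{c_\Pi}\, e^{-b|x|}
\;\Longrightarrow\;
c_\Pi\, x_l^*\, x_r^*
= c_\Pi \left(\frac{c_X}{c_\Pi}\right)^{2} e^{-b\,(|x_l| + |x_r|)}
= c_X \cdot \frac{c_X}{c_\Pi}\, e^{-b|x|}
= c_X\, x^* .
$$

Each condensation is therefore balanced by the corresponding cleavage, for any value of b; the remaining freedom in b is what the boundary conditions fix.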

### A1.2 Boundary Conditions

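Let $a = k e^{-b}$ and *m* = *n* − 1, so that the mass conservation constraint reads

$$\frac{c_X}{c_\Pi} \sum_{n=1}^{\infty} n\, C_{n-1}\, a^{n} = \frac{c_X}{c_\Pi}\, a \sum_{m=0}^{\infty} (m+1)\, C_{m}\, a^{m} = M. \tag{20}$$

Then, using the definition of $C_n = \frac{1}{n+1}\binom{2n}{n}$, first, and the generating function for central binomial coefficients (Lehmer, 1985), second, we obtain:

$$\sum_{m=0}^{\infty} (m+1)\, C_{m}\, a^{m} = \sum_{m=0}^{\infty} \binom{2m}{m}\, a^{m} = \frac{1}{\sqrt{1-4a}} \tag{21}$$

for |*a*| < 1/4. Finally, replacing Equation 21 into 20 and expanding *a*, we have:

$$\frac{c_X}{c_\Pi} \cdot \frac{k e^{-b}}{\sqrt{1 - 4 k e^{-b}}} = M.$$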

### A1.3 Normalizing Constant

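The normalizing constant is the total equilibrium concentration $Z = \sum_{x} x^*$, with $x^* = \frac{c_X}{c_\Pi} e^{-b|x|}$. As before, let $a = k e^{-b}$ and *m* = *n* − 1. Thus, using again that there are k^{n}*C*_{n−1} expressions of length *n*:

$$Z = \frac{c_X}{c_\Pi} \sum_{n=1}^{\infty} k^{n}\, C_{n-1}\, e^{-bn} = \frac{c_X}{c_\Pi}\, a \sum_{m=0}^{\infty} C_{m}\, a^{m} = \frac{c_X}{c_\Pi} \cdot \frac{1 - \sqrt{1-4a}}{2}.$$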
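As a numerical sanity check, the closed-form values quoted in the experiments can be compared against truncated series. The sketch below assumes the parameter choice *k*_{Π} = *k*_{X} = 1 and *Ω* = M, for which the mass constraint reduces to $\sum_n n\, C_{n-1}\, a^n = 1$ with $a = k e^{-b}$. The helper name `series_sums` and the truncation at 5,000 terms are illustrative choices.

```python
import math

def series_sums(a, terms=5000):
    """Return (sum_n C_{n-1} a^n, sum_n n C_{n-1} a^n) for n = 1..terms.

    Terms are built in floating point from the Catalan recurrence
    C_n = C_{n-1} * 2(2n - 1) / (n + 1), avoiding huge integers.
    """
    norm = mass = 0.0
    term = a  # C_0 * a^1
    for n in range(1, terms + 1):
        norm += term
        mass += n * term
        term *= a * 2 * (2 * n - 1) / (n + 1)  # now C_n * a^(n+1)
    return norm, mass

k = 3
b = math.log((2 + math.sqrt(5)) * k)  # closed form quoted in the experiments
a = k * math.exp(-b)                  # equals sqrt(5) - 2, below 1/4
norm, mass = series_sums(a)
```

Under these assumptions `mass` should be close to 1 (mass conservation after dividing by M) and `1 / norm` close to (3 + √5)/2, the prefactor of *R*(*x*).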

## Appendix 2: Metabolic Cycles

The following derivations show one of the possible pathways that each of the described structures can undertake as they develop. Whenever more than one reduction is possible, the “least effort” path is followed, namely, *I* and *K* combinators are reduced first, and then *S* combinators with the shortest reactant (i.e., third argument). Also, note that every expression written as ((*fx*)(*gy*)) can also be written simply as (*fx*(*gy*)), a fact that we often make use of when applying an *S* reduction.
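The least-effort rule can be sketched in a few dozen lines. The following is an illustrative reducer, not the released simulator: the nested-pair representation, the helper names (`parse`, `redexes`, `run_cycle`), and the tie-breaking between equally ranked redexes are assumptions. It follows the conservation-preserving rules as we read them from the main text: reducing *I* releases an *I*, reducing *K* releases a *K* and its discarded argument, and reducing *S* consumes a copy of its third argument from the environment and releases an *S*.

```python
from collections import Counter

ARITY = {"I": 1, "K": 2, "S": 3}

def fold(items):
    """Left-associate a list of terms: [f, x, y] -> ((f, x), y)."""
    expr = items[0]
    for item in items[1:]:
        expr = (expr, item)
    return expr

def parse(text):
    """Parse a string such as 'SII(SII)' into nested pairs."""
    stack = [[]]
    for ch in text:
        if ch == "(":
            stack.append([])
        elif ch == ")":
            group = stack.pop()
            stack[-1].append(fold(group))
        elif ch in ARITY:
            stack[-1].append(ch)
    return fold(stack[0])

def size(expr):
    return 1 if isinstance(expr, str) else size(expr[0]) + size(expr[1])

def spine(expr):
    """Split f a1 ... an into its head combinator and argument list."""
    args = []
    while isinstance(expr, tuple):
        expr, arg = expr
        args.append(arg)
    return expr, args[::-1]

def redexes(expr, path=()):
    """Yield (priority, path): I and K first, then S by reactant size."""
    if isinstance(expr, str):
        return
    head, args = spine(expr)
    if len(args) == ARITY[head]:
        yield ((0, 0) if head in "IK" else (1, size(args[2]))), path
    yield from redexes(expr[0], path + (0,))
    yield from redexes(expr[1], path + (1,))

def contract(expr):
    """Reduce a redex; return (result, consumed, released)."""
    head, args = spine(expr)
    if head == "I":
        return args[0], Counter(), Counter(["I"])
    if head == "K":
        return args[0], Counter(), Counter(["K", args[1]])
    f, g, x = args
    return ((f, x), (g, x)), Counter([x]), Counter(["S"])

def replace(expr, path, new):
    if not path:
        return new
    left, right = expr
    if path[0] == 0:
        return (replace(left, path[1:], new), right)
    return (left, replace(right, path[1:], new))

def run_cycle(start, stop, max_steps=50):
    """Apply least-effort reductions until `stop` is reached."""
    expr, consumed, released = start, Counter(), Counter()
    for _ in range(max_steps):
        found = list(redexes(expr))
        if not found:
            break
        _, path = min(found)
        target = expr
        for i in path:
            target = target[i]
        new, used, freed = contract(target)
        expr = replace(expr, path, new)
        consumed += used
        released += freed
        if expr == stop:
            return consumed, released
    raise RuntimeError("stop expression not reached")

A = parse("SII")                      # simple autopoietic reactant
simple = run_cycle((A, A), (A, A))
B = parse("S(SI)I")                   # right-branching growth
growing = run_cycle((B, B), (B, (B, B)))
C = parse("SI(S(SK)I)")               # self-reproducing reactant
reproducing = run_cycle((C, C), (C, C))
```

Under these assumptions, the three runs reproduce the cycles (*AA*) + *A* ⇒^{*} (*AA*) + *ϕ*(*A*), (*AA*) + 2*A* ⇒^{*} (*A*(*AA*)) + *ϕ*(*A*), and (*AA*) + 3*A* ⇒^{*} 2(*AA*) + *ϕ*(*A*), with *ϕ*(*A*) equal to the atomic combinators that make up *A*. In the last case the second copy of (*AA*) appears among the released expressions, discarded by a *K* reduction.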

### A2.1 Metabolic Cycle of a Simple Autopoietic Pattern

Let *A* = (*SII*). Then, the cycle is (*AA*) + *A* ⇒^{*} (*AA*) + *ϕ*(*A*).

### A2.2 Metabolic Cycle of a Right-Branching Recursively Growing Structure

Let *A* = (*S*(*SI*)*I*). Then, the cycle is (*AA*) + 2*A* ⇒^{*} (*A*(*AA*)) + *ϕ*(*A*).

### A2.3 Metabolic Cycle of a Binary-Branching Structure

Let *A* = (*S*(*SSI*)*K*). Then (*AA*) can follow a metabolic pathway in which (*A*(*KA*)) appears and is reduced in turn. Overall, the cycle is (*AA*) + 2*A* + 5(*KA*) + (*K*(*KA*)) ⇒^{*} *AA*(*AA*) + 4(*K*(*KA*)) + 2*ϕ*(*A*).

### A2.4 Metabolic Cycle of a Self-Reproducing Expression

Let *A* = (*SI*(*S*(*SK*)*I*)). Then, the cycle is (*AA*) + 3*A* ⇒^{*} 2(*AA*) + *ϕ*(*A*).

### A2.5 Arrival of the Self-Reproducing Expression

The self-reproducing expression (*AA*) with *A* = (*SI*(*S*(*SK*)*I*)) often emerged from the condensation of two expressions leading to a chain of reactions that resulted in (*AA*). Here we show one simple path involving the condensation of (*SI*(*SI*)) and (*S*(*SK*)*I*) to produce (*SI*(*SI*)(*S*(*SK*)*I*)). Let us call *B* = (*S*(*SK*)*I*), and note that *A* = (*SIB*). A chain of reductions then leads from this expression to (*AA*).

### A2.6 Metabolic Cycle of a Simple Autopoietic Pattern on the *S* − *K* Basis

Let *A* = (*S*(*SK*)(*SKK*)). Then, the cycle is (*AA*) + 3*A* ⇒^{*} (*AA*) + 2*A* + *ϕ*(*A*).

### A2.7 Metabolic Cycle of a Recursively-Growing Expression on the *S* − *K* Basis

Let *A* = (*S*(*SSK*)). Then, the cycle is (*AAA*) + 3*A* + *AA* ⇒^{*} (*SSKA*(*AAA*)) + 4*S* + *K* + *AA*.

### A2.8 Metabolic Cycle of a Self-Reproducing Expression on the *S* − *K* Basis

Let *A* = (*S*(*SK*)(*S*(*SK*)(*SKK*))). Then, the cycle is (*AA*) + 5*A* ⇒^{*} 2(*AA*) + *A* + *KA* + 5*S* + 3*K*.

## Notes

^{1} As a matter of fact, *S* and *K* suffice because *I* can be written as (*SKK*). The inclusion of *I* simply allows for expressing more complex programs with shorter expressions.

^{2} Also, *I* cannot be reduced with *I* as an argument because (*SII*) = ((*SI*)*I*) and thus, the second *I* is not an argument of the first *I* but of (*SI*).

^{3} Some expressions can participate in a very large number of reductions, considerably slowing the simulation. For this reason, the system is currently limited to computing up to 10 reductions per expression in no special order.

^{4} We make available the code to simulate combinatory chemistry at https://github.com/germank/combinatory-chemistry.

^{5} The name and the formula are related to the pointwise mutual information (PMI) metric (Church & Hanks, 1990) that has been extensively used in computational linguistics. PMI computes the log ratio between the empirical co-occurrence probability of two events *x* and *y* with respect to their expected probability if these two events were independent, as given by the product of the marginals. Like PMI, the proposed metric computes the log ratio between observed odds and chance-driven ones.

^{6} See Appendix 2 for more details on these derivations.