## Abstract

The emergence of self-sustaining autocatalytic networks in chemical reaction systems has been studied as a possible mechanism for modeling how living systems first arose. It has been known for several decades that such networks will form within systems of polymers (under cleavage and ligation reactions) under a simple process of random catalysis, and this process has since been mathematically analyzed. In this paper, we provide an exact expression for the expected number of self-sustaining autocatalytic networks that will form in a general chemical reaction system, and the expected number of these networks that will also be uninhibited (by some molecule produced by the system). Using these equations, we are able to describe the patterns of catalysis and inhibition that maximize or minimize the expected number of such networks. We apply our results to derive a general theorem concerning the trade-off between catalysis and inhibition, and to provide some insight into the extent to which the expected number of self-sustaining autocatalytic networks coincides with the probability that at least one such system is present.

## 1 Introduction

A key step in the origin of life is the formation of a metabolic network that is both self-sustaining and collectively autocatalytic (Arsène et al., 2018; Hayden et al., 2008; Liu et al., 2020; Vaidya et al., 2012; Vasas et al., 2012; Xavier et al., 2020). Systems that combine these two general properties have been studied within a formal framework that is sometimes referred to as RAF theory (Hordijk & Steel, 2017). We give precise definitions shortly but, roughly speaking, a RAF (Reflexively Autocatalytic and F-generated) set is a subset of reactions where the reactants and at least one catalyst of each reaction in the subset can be produced from an available food set by using reactions from within the subset only.

The study of RAFs traces back to pioneering work on collectively autocatalytic sets in polymer models of early life (Kauffman, 1971, 1986, 1993), which was subsequently developed mathematically (see (Hordijk, 2019, and Hordijk & Steel, 2017) and the references therein). RAF algorithms have been applied recently to investigate the traces of earliest metabolism that can be detected in large metabolic databases across bacteria and archaea (Xavier et al., 2020), leading to the development of an open-source program to analyze and visualize RAFs in complex biochemical systems (Huson & Steel, 2020). RAF theory overlaps with other graph-theoretic approaches in which the emergence of directed cycles in reaction graphs plays a key role (Bollobás & Rasmussen, 1989; Jain & Krishna, 1998; Jain & Krishna, 2001), and is also related to (M, R) systems (Cornish-Bowden & Cárdenas, 2007; Jaramillo et al., 2010) and chemical organization theory (Dittrich & Speroni di Fenizio, 2007).

RAF theory has also been applied in other fields, including ecology (Cazzolla Gatti et al., 2018) and cognition (Gabora & Steel, 2017), and the ideas may have application in other contexts. In economics, for instance, the production of consumer items can be viewed as a catalyzed reaction; for example, the production of a wooden table involves nails and wood (reactants) and a hammer (a catalyst, as it is not used up in the reaction but makes the reaction happen much more efficiently), and the output (reaction product) is the table. On a larger scale, a factory is a catalyst for the production of the items produced in it from reactants brought into the factory. In both these examples, notice that each reactant may be either a raw material (i.e., the elements of a “food set”) or a product of other (catalyzed) reactions, whereas the products may, in turn, be reactants, or catalysts, for other catalyzed reactions. Products can sometimes also inhibit reactions; for example, the production of internal combustion engines resulted in processes for building steam engines being abandoned.

In this paper, we extend RAF theory further by investigating the impact of different modes of catalysis and inhibition on the appearance of (uninhibited) RAF subsets. We focus on the expected number of such sets (rather than on the probability that at least one such set exists, which has been the focus of nearly all earlier RAF studies [Filisetti et al., 2014; Mossel & Steel, 2005]). Using a mathematical approach, we derive explicit and exact analytical expressions for the expected number of such uninhibited RAF subsets, as well as providing some insight into the expected population sizes of RAFs for the catalysis rate at which they first appear (Section 4.2). In particular, we show that for simple systems, with an average catalysis rate that is set at the level where RAFs first appear, the expected number of RAFs depends strongly on the variability of catalysis across molecules. At one extreme (uniform catalysis), the expected number of RAFs is small (e.g., one, or a few), while at the other extreme (all-or-nothing catalysis) the expected number of RAFs grows exponentially with the size of the system.

The motivation for looking at the expected number of RAFs (rather than the probability that a RAF exists) is twofold. First, by focusing on expected values it is possible to present certain exact results (see Theorem 1), rather than just inequalities or asymptotic results, while still gaining some information about the probability that a RAF exists. Second, in origin of life studies, it is relevant to consider populations of self-sustaining autocatalytic chemical networks, which may be subject to competition and selection, a topic that has been explored by others (see, e.g., Szathmáry, 2000; Vasas et al., 2012; Virgo & Guttenberg, 2015), and information concerning the likely diversity of RAFs available in a given chemical reaction system is therefore a natural question. In previous analyses where RAFs have been identified, subsequent analysis has revealed a large number of RAFs present within the RAF; for example, for a 7-reaction RAF in a laboratory-based study involving RNA-ribosymes (from Vaidya et al., 2012), more than half of the 27 = 128 subsets of this RAF are also RAFs (cf. Figure 5 of Steel et al., 2008). Simulation studies involving Kauffman's binary polymer model have also identified a large number of RAFs present once catalysis rises above the level at which RAFs first appear (Hordijk & Steel, 2015).

The structure of this paper is as follows. We begin with some formal definitions, and then describe different models for catalysis and inhibition. In Section 3, we present the main mathematical result, along with some remarks, and proof. We then present a number of consequences of our main result, beginning with a generic result concerning the impact of inhibition when catalysis is uniform. We then investigate the impact of different catalysis distributions on the expected number of RAFs arising in “elementary” chemical reaction systems, focusing on the catalysis rate at which RAFs first appear. We end with some brief concluding comments.

### 1.1 Definitions

Let X be a set of molecule types; R a set of reactions, where each reaction consists of a subset of molecule types as input (reactants) and a set of molecule types as outputs (products); and let F be a subset of X (called a food set). We refer to the triple 𝒬 = (X, R, F) as a chemical reaction system with food set and, unless stated otherwise, we impose no further restrictions on 𝒬 (e.g., it need not correspond to a system of polymers, and a reaction can have any positive number of reactants and any positive number of products).

Given a reaction rR, we let ρ(r) ⊆ X denote the set of reactants of r, and π(r) denote the set of products of r. Moreover, given a subset R′ of R, we let π(R′) = ⋃rRπ(r).

A subset R′ of R is said to be F-generated if R′ can be ordered r1, r2, …, r|R′| so that ρ(r1) ⊆ F and for each i ∈ {2, …, |R|}, we have ρ(ri) ⊆ Fπ({r1, …, ri−1}). In other words, R′ is F-generated if the R′ can be built up by starting from one reaction that has all its reactants in the food set, then adding reactions in such a way that each added reaction has each of its reactants present either in the food set or as a product of a reaction in the set generated so far.

Now suppose that certain molecule types in X can catalyze certain reactions in R. A subset R′ of R is said to be “Reflexively Autocatalytic and F-generated” (more briefly, a RAF) if R′ is nonempty and each reaction rR′ is catalyzed by at least one molecule type in Fπ(R′) and R′ is F-generated.

We may also allow certain molecule types to inhibit reactions in R, in which case a subset R′ of R is said to be an “uninhibited RAF” (uRAF) if R′ is a RAF and no reaction in R′ is inhibited by any molecule type in Fπ(R′). The notion of a uRAF was first defined and studied in Mossel and Steel (2005). Notice that inhibition is being applied in a strong sense: a reaction r cannot be part of a uRAF if r is inhibited by at least one molecule type present, regardless of how many molecule types are catalysts for r and present in the uRAF.

Since a union of RAFs is also a RAF, when a RAF exists in a system, there is a unique maximal RAF. However, the same does not apply to uRAFs—in particular, the union of two uRAFs can fail to be a uRAF. These concepts are illustrated in Figure 1.

Figure 1.

A chemical reaction system consisting of the set of molecule types X = {a, b, c, a′, b′, c′, x, x′, w, w′, z, z′}, a food set F = {a, b, c, a′, b′, c′} (each placed inside a green box) and the reaction set R = {r1, r2, r1′, r2′, r3, r4} (bold, beside small white-filled squares). Solid arcs indicate two reactants entering a reaction and a product coming out. Catalysis is indicated by dashed arcs (blue) and inhibition (also called blocking) is indicated by dotted arcs (red). The full set of reactions is not a RAF, but it contains several RAFs that are contained in the unique maximal RAF R′ = {r1, r1′, r2, r2′} (note that r4 is not part of this RAF even though it is catalyzed and the reactants of r4 are present in the food set). The maximal RAF R′ is not a uRAF (e.g., r1′ is inhibited by z which is a product of r2); however, {r1, r2} and {r1′, r2′} are uRAFs, and so are {r1}, {r1′}, and {r1, r1′}.

Figure 1.

A chemical reaction system consisting of the set of molecule types X = {a, b, c, a′, b′, c′, x, x′, w, w′, z, z′}, a food set F = {a, b, c, a′, b′, c′} (each placed inside a green box) and the reaction set R = {r1, r2, r1′, r2′, r3, r4} (bold, beside small white-filled squares). Solid arcs indicate two reactants entering a reaction and a product coming out. Catalysis is indicated by dashed arcs (blue) and inhibition (also called blocking) is indicated by dotted arcs (red). The full set of reactions is not a RAF, but it contains several RAFs that are contained in the unique maximal RAF R′ = {r1, r1′, r2, r2′} (note that r4 is not part of this RAF even though it is catalyzed and the reactants of r4 are present in the food set). The maximal RAF R′ is not a uRAF (e.g., r1′ is inhibited by z which is a product of r2); however, {r1, r2} and {r1′, r2′} are uRAFs, and so are {r1}, {r1′}, and {r1, r1′}.

## 2 Modeling Catalysis and Inhibition

We will model catalysis and also blocking (inhibition) by random processes. To provide for greater generality, we allow the possibility that elements in a subset C (respectively, B) of the food set cannot catalyze (respectively block) any reaction in R. Let c = |F \ C| and b = |F \ B|. Thus c (respectively b) is the number of food elements that are possible catalysts (respectively blockers).

Suppose that each molecule type xX \ C has an associated probability Cx of catalyzing any given reaction in R. The values Cx are sampled independently from a distribution 𝒟, for each xX. This results in a random assignment of catalysis (i.e., a random subset χ of X × 𝓡), where (x, r) ∈ χ if x catalyzes r. Let 𝒞x,r be the event that x catalyzes r.

We assume that:
• (I1) 𝒞 = (Cx, xX \ C) is a collection of independent random variables.

• (I2) Conditional on 𝒞, (𝒞x,r : xX \ C, rR) is a collection of independent events.

Since the distribution of Cx is the same for all xX \ C, we will use C to denote an arbitrary random variable sampled from the distribution 𝒟. Let μC = 𝔼[C] and, for i ≥ 0, let λi be the i–th moment of 1 − C; that is:
$λi=𝔼1−Ci.$
Although our results concern general catalysis distributions, we will pay particular attention to three forms of catalysis that have been considered in previous studies (e.g., Hordijk & Steel, 2016), and that will be compared in our analyses.
• The uniform model: Each xX \ C catalyzes each reaction in 𝓡 with a fixed probability p. Thus, C = p with probability 1, and so μC = p.

• The sparse model: C = u with probability π and C = 0 with probability 1 − π, and so μC = .

• The all-or-nothing model: C = 1 with probability π and C = 0 with probability 1 − π, and so μC = π.

The uniform model is from Kauffman's binary polymer network and has been the default for most recent studies involving polymer models (Hordijk & Steel, 2017). More realistic catalysis scenarios can be modeled by allowing C to take a range of values around μC with different probabilities. The sparse model generalizes the uniform model slightly by allowing a (random) subset of molecule types to be catalysts. In this model, π would typically be very small in applications (i.e., most molecules are not catalysts but those few that are will catalyze a lot of reactions, as in the recent study of metabolic origins, described in Xavier et al., 2020). The all-or-nothing model is a special case of the sparse model. The emergence of RAFs in these models (and others, including a power-law distribution) was investigated in Hordijk and Steel (2016).

For these three models, the associated λi values are given as follows: λ0 = 1, and for all i ≥ 1:
$λi=1−μCi,uniformmodel;1−π+π1−ui,sparsemodel;1−μC,allornothingmodel.$
(1)
In addition to catalysis, we may also allow random blocking (inhibition) of reactions by molecules, formalized as follows. Suppose that each molecule type xX \ B has an associated probability Bx of blocking any given reaction in R. We will treat Bx as a random variable taking values in [0, 1] with a common distribution $𝒟ˆ$. This results in a random assignment of blocking (i.e., a random subset β of X × 𝓡), where (x, r) ∈ β if x blocks reaction r. Let 𝓑x,r be the event that x blocks r. We assume that:
• (I1′) 𝓑 = (Bx, xX \ B) is a collection of independent random variables.

• (I2′) Conditional on 𝓑, (𝓑x,r : xX \ C, rR) is a collection of independent events.

Since the distribution of Bx is the same for all x, we will use B to denote this random variable; let μB = 𝔼[B] and, for i ≥ 0, let:
$λˆi=𝔼1−Bi.$
We also assume that catalysis and inhibition are independent of each other. Formally, this is the following condition:
• (I3) C–random variables in (I1, I2) are independent of the B–random variables in (I1′, I2′).

Note that (I3) allows the possibility that a molecule type x both catalyzes and blocks the same reaction r (the effect of this on uRAFs is the same as if x just blocks r; (i.e., blocking is assumed to trump catalysis)). Notice also that λ0 = $λˆ$0 = 1.

## 3 Generic Results

To state our first result, we require two further definitions. Let μRAF and μuRAF denote the expected number of RAFs and uRAFs (respectively) arising in 𝒬 under the random process of catalysis and inhibition described. For integers k, s ≥ 1 let nk,s be the number of F-generated subsets R′ of R that have size k and for which the total number of non-food products in X produced by reactions in R′ is s. Note that nk,s = 0 for s > min{|X| − F, kM} where M is the maximum number of products of any single reaction.

Part (i) of the following theorem gives an exact expression for μRAF and μuRAF, which we then use in Parts (ii) and (iii) to describe the catalysis and inhibition distributions (having a given mean) that minimize or maximize the expected number of RAFs and uRAFs. We apply this theorem to particular systems in the next section.

Theorem 1: Let 𝒬 be any chemical reaction system with food set, accompanied by catalysis and inhibition distributions 𝒟 and$𝒟ˆ$, respectively.

• (i) The expected number of RAFs and uRAFs for 𝒬 is given as follows:
$μRAF=∑k≥1,s≥0nk,s∑i=0k−1ikiλis+c$
(2)
and
$μuRAF=∑k≥1,s≥0nk,s∑i=0k−1ikiλis+cλˆks+b.$
(3)
• (ii) Among all distributions 𝒟 on catalysis having a given mean μC, the distribution that minimizes the expected number of RAFs and uRAFs (for any inhibition distribution) is the uniform model (i.e., C = μC with probability 1).

• (iii) Among all distributions$𝒟ˆ$on inhibition having a given mean μB, the following hold:

• (a) the distribution that minimizes the expected number of uRAFs (for any catalysis distribution) is the uniform model (B = μB with probability 1).

• (b) the distribution that maximizes the expected number of uRAFs (for any catalysis distribution) is the all-or-nothing inhibition model (i.e., B = 1 with probability μB, and B = 0 with probability 1 − μB).

We give the proof of Theorem 1 shortly, following some brief remarks.

### 3.1 Remarks

• (1) If PRAF and PuRAF are the probability that 𝒬 contains a RAF and a uRAF, respectively, then these quantities are bounded above as follows:
$PRAF≤μRAFandPuRAF≤μuRAF.$
This follows from the well-known inequality ℙ(V > 0) ≤ 𝔼[V] for any non-negative integer-valued random variable V, upon taking V to be the number of RAFs (or the number of uRAFs). We will explore the extent to which PRAF underestimates μRAF in Section 4.2.
• (2) Theorem 1 makes clear that the only relevant aspects of the network (X, R) for μRAF and μuRAF are encoded entirely within the coefficients nk,s (the two stochastic terms depend only on r and s but not on further aspects of the network structure). By contrast, an expression for the probabilities PRAF and PuRAF that a RAF or uRAF exists requires more detailed information concerning the structure of the network. This is due to dependencies that arise in the analysis. Notice also that Theorem 1 allows the computation of μuRAF in O(|R|2 × |X|) steps (assuming that the λi, $λˆ$i, and nk,s values are available).

• (3) Although the computation or estimation of nk,s may be tricky in general systems, Equation 2 can still be useful (even with little or no information about nk,s) for asking comparative types of questions. In particular, Parts (ii) and (iii) provide results that are independent of the details of the network (X, R, F). In particular, Theorem 1(ii) is consistent with simulation results in Hordijk and Steel (2016) for Kauffman's binary polymer model, in which variable catalysis rates (the sparse and all-or-nothing model) led to RAFs appearing at lower average catalysis values (μC) than for uniform catalysis.

• (4) For the uniform model, note that the term ($∑i=0k$(1)i$kiλis+c$) in Equations 2 and 3 simplifies to [1 − (1 − μC)s+c]k.

### 3.2 Proof of Theorem 1

For Part (i), recall that π(R′) denotes the set of products of reactions in R′.

For k, s ≥ 1, let FG(k, s) denote the collection of subsets R′ of R that satisfy all of the following three properties:

• (i) R′ has size k,

• (ii) R′ is F-generated, and

• (iii) the number of non-food molecule types produced by reactions in R′ is s.

Thus,
$nk,s=FGks.$
For R′ ⊆ R, let 𝕀R be the Bernoulli random variable that takes the value 1 if each reaction in R′ is catalyzed by at least one product of a reaction in R′ or by an element of F \ C, and 0 otherwise. Similarly, let $𝕀ˆ$R be the Bernoulli random variable that takes the value 1 if no reaction in R′ is blocked by the product of any reaction in R′ or by an element of F \ B. Then the random variable
$∑k≥1,s≥0∑R′∈FGks𝕀𝓡′⋅𝕀ˆ𝓡′$
counts the number of uRAFs present, so we have:
$μuRAF=𝔼∑k≥1,s≥0∑R′∈FGks𝕀𝓡′⋅𝕀ˆ𝓡′=∑k≥1,s≥0∑R′∈FGks𝔼𝕀𝓡′⋅𝕀ˆ𝓡′=∑k≥1,s≥0∑R′∈FGks𝔼𝕀𝓡′⋅𝔼𝕀ˆ𝓡′,$
(4)
where the second equality is by linearity of expectation, and the third equality is by the independence assumption (I3). Given 𝓡′ ∈ FG(k, s), let C1, C2, …, Cs+c be the random variables (ordered in any way) that correspond to the catalysis probabilities of the s products of 𝓡′ and the c elements of F \ C. We can then write:
$𝔼𝕀𝓡′=ℙ𝕀R′=1=𝔼ℙ𝕀R′=1C1C2…Cs+c,$
(5)
where the second expectation is with respect to the random variables Ci. The event 𝕀R = 1 occurs precisely when each of the r reactions in R′ is catalyzed by at least one of the s + c elements in (μ(R′) \ F) ∪ (F \ C). By the independence assumption (I2),
$ℙ𝕀R′=1C1C2…Cs+c=∏r′∈R′1−∏j=1s+c1−Cj=1−∏j=1s+c1−Cjk.$
(6)
Set V := $∏j=1s+c$(1 − Cj). Equations 5 and 6 then give:
$𝔼𝕀𝓡′=𝔼1−Vk=∑i=0k−1iki𝔼Vi,$
(7)
where the second equality is from the binomial expansion (1 − V)k = $∑i=0k$(−1)$ki$Vi, and linearity of expectation. Moreover, for each i ≥ 0, we have:
$𝔼Vi=𝔼∏j=1s+c1−Cji=𝔼∏j=1s+c1−Cji=∏j=1s+c𝔼1−Cji=∏j=1s+cλi=λis+c,$
(8)
where the first two equalities are trivial algebraic identities, the third is by the independence assumption (I1), the fourth is by definition, and the last is trivial. Substituting Equation 8 into Equation 7 gives:
$𝔼𝕀𝓡′=∑i=0k−1ikiλis+c.$
(9)
Turning to inhibition, a RAF subset R′ of R in FG(k, s) is a uRAF precisely if no reaction in R′ is blocked by any of the s + b elements of (π(R′) \ F ) ∪ (F \ B). By the independence assumption (I2′),
$ℙ𝕀ˆR′=1B1B2…,Bs+b=∏r′∈R′∏j=1s+b1−Bj=∏j=1s+b1−Bjk=∏j=1s+b1−Bjk.$
Applying expectation (using the independence assumption (I1′)), together with the identity 𝔼[(1 − Bj)k] = $λˆ$k gives:
$𝔼𝕀ˆ𝓡′=λˆks+b.$
(10)

Combining Equations 9 and 10 into Equation 4 gives the first equation in Part (i). The second is then obtained by putting $λˆ$i = 1 for all i.

Parts (ii) and (iii): Observe that the function u = (1 − y)k for k ≥ 1 is convex and strictly convex when k > 1. Thus, by Jensen's Inequality, for any random variable Y, we have:
$𝔼1−Yk≥1−𝔼Yk,$
(11)
with a strict inequality when Y is nondegenerate and k > 1.
For Part (ii), let V = $∏j=1s+c$(1 − Cj). Then by the first equality in Equation 7 we have:
$𝔼𝕀𝓡′=𝔼1−Vk,$
and by Inequality (Equation 11) (with Y = V) we have:
$𝔼𝕀𝓡′≥1−𝔼Vk,$
(12)
and the inequality is strict when V is nondegenerate and k > 1. By the independence assumption (I1), and noting that 𝔼[(1 − Cj)] = 1 − μC we have:
$𝔼V=𝔼∏j=1s+c1−Cj=∏j=1s+c𝔼1−Cj=1−μCs+c,$
(13)
and substituting Equation 13 into Inequality (Equation 12) gives:
$𝔼𝕀𝓡′≥1−1−μCs+ck,$
with equality only for the uniform model. This gives Part (ii).
For Part (iii)(a), Inequality (Equation 11) implies that $λˆ$k = 𝔼[(1 − B)k)] ≥ (1 − μB)k. Let H(k, s) := ($∑i=0k$(−1)i$kiλis+c$). By Equation 9, H(k, s) = 𝔼[𝕀𝓡′] for 𝓡′ ∈ FG(k, s) and so H(k, s) ≥ 0. Thus, by Equation 3 we have:
$μuRAF=∑k≥1,s≥0nk,s⋅Hks⋅λˆks+b≥∑k≥1,s≥0nk,s⋅Hks⋅1−μBks+b,$
and the right-hand side of this inequality is the value of μuRAF for the uniform model of inhibition.

For Part (iii)(b), suppose that Y is a random variable taking values in [0, 1] with mean η and let Y0 be the random variable that takes the value 1 with probability η and 0 otherwise. Then 𝔼[$Y0m$] = η for all m ≥ 1, and 𝔼[Ym] ≤ 𝔼[Y2] ≤ η for all m ≥ 1 (since YmY2Y because Y takes values in [0, 1]); moreover, 𝔼[Y2] = η if and only if 𝔼[Y(1 − Y)] = 0, which implies that Y = Y0. Now apply this to Y = (1 − B) and m = k to deduce for the distributions on B that have a given mean μB, $λˆ$k is maximized when the distribution takes the value 1 with probability μB and zero otherwise.

## 4 Applications

### 4.1 Inhibition-Catalysis Trade-Offs Under the Uniform Model

For any model in which catalysis and inhibition are uniform, Theorem 1 provides a simple prediction concerning how the expected number of uRAFs compares with a model with zero inhibition (and a lower catalysis rate). To simplify the statement, we will assume b = c and we will write μuRAF(p, tp) to denote the dependence of μuRAF on μC = p and μB = tp for some value of t. We will also write p = ν/N, where N is the total number of molecule types that are in the food set or can be generated by a sequence of reactions in 𝓡. We assume in the following result that p is small (in particular, < 1/2) and N is large (in particular, (1 − ν/N)N can be approximated by eν).

The following result (which extends Theorem 2 from Hordijk & Steel, 2016) applies to any chemical reaction system and provides a lower bound on the expected number of uRAFs in terms of the expected number of RAFs in the system with no inhibition (and half the catalysis rate); its proof relies on Theorem 1. Roughly speaking, Corollary 1 states that for any chemical reaction system with uniform catalysis, if one introduces a limited degree of inhibition then by doubling the original catalysis rate, the expected number of uninhibited RAFs is at least as large as the original number of expected RAF before inhibition was present (and at the original catalysis rate).

Corollary 1. For all non-negative values of t with t$1ν$ ln(1 + eν), the following inequality holds:
$μuRAF2ptp≥μRAFp0.$

Proof. By Theorem 1, and Remark (4) following this theorem, and noting that μC = p and μB = tp we have:
$μuRAF2ptp=∑k≥1,s≥0nk,s1−1−2ps+c⋅1−tps+ck,$
(14)
which can be re-written as:
$μuRAF2ptp=∑k≥1,s≥cnk,s−c1−1−2ps⋅1−tpsk.$
(15)
Thus (putting t = 0 in this last equation) we obtain:
$μRAFp0=∑k≥1,s≥cnk,s−c1−1−psk.$
(16)
Now, for each x ∈ (0, 0.5), we have:
$1−1−2xs≥1−1−x2s=1−1−xs1+1−xs.$
Thus (with x = p), we see that the term inside the square brackets in Equation 15 exceeds the term in square brackets in Equation 16 by a factor of (1 + (1 − p)s)(1 − tp)s, and this is minimized when s = N (the largest possible value s can take). Setting s = N and writing p = ν/N we have
$1+1−ps1−tps=(1+1−ν/NN1−tν/NN∼1+e−νe−tν$
and the last term on the right is at least 1 when t satisfies the stated inequality (namely, t$1ν$ ln(1 + eν)). Thus (1 + (1 − p)s)(1 − tp)s ≥ 1, for all s between 1 and N and so each term in Equation 15 is greater than or equal to the corresponding term in square brackets in Equation 16, which justifies the inequality in Corollary 1.□

### 4.2 Explicit Calculations for Two Models on a Subclass of Networks

For the remainder of this section, we consider elementary chemical reaction systems (i.e., systems for which each reaction has all its reactants in the food set, as studied in Steel et al., 2018), with the further conditions that: (i) each reaction has exactly one product, (ii) different reactions produce different products, (iii) no reaction is inhibited, and (iv) no food element catalyzes any reaction.

We can associate with each such system a directed graph 𝒢 on the set XF of products of the reactions, with an arc from x to y if x catalyzes the reaction that produces y (this models a setting investigated in Jain & Krishna, 1998, 2001). RAF subsets are then in one-to-one correspondence with the subgraphs of 𝒢 for which each vertex has indegree at least one. In particular, a RAF exists if and only if there is a directed cycle in 𝒢 (which could be an arc from a vertex to itself).1 In this simple setup, if N denotes the number of reactions (= number of non-food molecule types) then:
$nk,s=Nk,ifk=s;0,otherwise.$
Applying Theorem 1(i) gives:
$μRAF=∑j=1NNj∑i=0j−1ijiλij.$
(17)
Regarding catalysis, consider first the all-or-nothing model, for which λi = 1 − π = 1 − μC for i ≥ 1 (and λ0 = 1). Equation 17 simplifies to:
$μRAF=2N−2−μCN,$
(18)
and we provide a proof of this in the  Appendix.

This expression can also be derived by the following direct argument. First, note that a subset S of the N products of reactions does not correspond to a RAF if and only if each of the |S| elements x in S has Cx = 0. The random variable W = |{x : Cx = 1}| follows the binomial distribution Bin(N, μC), and the proportion of sets of size N that avoid a given set S of size m is 2m. Thus the expected proportion of subsets that are not RAFs is the expected value of 2W where W is the binomial distribution above. Applying standard combinatorial identities then leads to Equation 18.

The probability of a RAF for the all-or-nothing models is also easily computed:
$PRAF=1−1−μCN.$
(19)
Notice that one can select μC to tend to 0 in such a way that PRAF converges to 0 exponentially quickly with N while μRAF tends to infinity at an exponential rate with N (this requires μC to decay sufficiently fast with N but not too fast, e.g., μC = Θ(N−1−δ) for δ > 0). Comparing Equations 18 and 19, we also observe the following identity:
$μRAFμC=2NPRAFμC/2.$
By contrast, for the uniform model, applying straightforward algebra to Equation 17 leads to
$μRAF=∑j=1NNj1−1−μCjj.$
(20)

We now use these formulae to investigate the relationship between PRAF and μRAF in elementary chemical reaction systems (satisfying conditions (i)–(iv)) as N becomes large; in particular the impact of the choice of model (all-or-nothing vs. uniform) on this relationship.

### 4.3 Asymptotic Properties of the Two Models at the Catalysis Level Where RAFs Arise

For the all-or-nothing and uniform models, RAFs arise with a given (positive) probability, provided that μC converges to 0 no faster than N−1 as N grows. Thus, it is helpful to write μC = γ/N to compare their behavior as N grows.

For the all-or-nothing model, Equations 18 and 19 reveal that:
$μRAFPRAF=2N1−1−γ2NN1−1−γNN∼2N1−exp−γ/21−exp−γ,$
where ∼ is asymptotic equivalence as N becomes large (with γ being fixed), and so:
$μRAFPRAF∼2N−11+Oγ.$
(21)
Let us compare this with the uniform model with the same μC (and hence γ) value. It can be shown that when γ < e−1, we have:
$limN→∞∑j=1NNj1−1−γ/Njj=γ+oγ,$
(22)
where o(γ) has order γ2 as γ → 0 (a proof is provided in the  Appendix).
By Theorem 1 of Hordijk et al. (2019) (and for any value of N and assuming γ < 1), we have:
$1−exp−γ≤PRAF≤−ln1−γ.$
(23)
In particular, for small γ and the uniform model we have:
$PRAF=γ+oγ.$
(24)
Equations 17, 22, and 24 provide the following result for the uniform model when γ < e−1:
$μRAFPRAF∼1+Oγ,$
(25)
where ∼ again denotes asymptotic equivalence as N becomes large (with γ fixed).

Comparing Equations 21 and 25 reveals a key difference in the ratio μRAF/PRAF between the all-or-nothing and uniform models when N is large and γ is small: the former equation involves an exponential term in N, while the second does not. This can be explained as follows. In the all-or-nothing model, the existence of a RAF comes down to whether or not there is a reaction r that generates a universal catalyst; when there is, then any subset of the N reactions that contains r is a RAF. By contrast, with the uniform model at a low catalysis level where RAFs are improbable, if a RAF exists, there is likely to be only one. Note that the results in this section are particular to chemical reaction systems that are elementary and satisfy properties (i)–(iv) as described at the start of this section.

In this paper, we have focused on the expected number of RAFs and uRAFs (rather than the probability of at least one such set existing), as this quantity can be described explicitly, and generic results described via this expression can be derived (e.g., in Parts (ii) and (iii) of Theorem 1 and Corollary 1). Even so, the expressions in Theorem 1 involve quantities nk,s that may be difficult to quantify exactly; thus in the second part of the paper, we consider more restrictive types of systems.

In our analysis, we have treated inhibition and catalysis as simple and separate processes. However, a more general approach would allow reactions to proceed under rules that are encoded by Boolean expressions. For example, the expression (ab) ∨ c ∨ (d ∧ ¬e) assigned to a reaction r would allow r to proceed if at least one of the following holds: (i) both a and b are present as catalysts, or (ii) c is present as a catalyst, or (iii) d is present as a catalyst and e is not present as an inhibitor. Extending the results in this paper to this more general setting could be an interesting exercise for future work.

## Acknowledgments

We thank the two reviewers for a number of helpful comments on an earlier version of this manuscript.

## Note

1

An asymptotic study of the emergence of first cycles in large random directed graphs was explored in Bollobás and Rasmussen (1989).

## References

References
Arsène
,
S.
,
Ameta
,
S.
,
Lehman
,
N.
,
Griffiths
,
A. D.
, &
Nghe
,
P.
(
2018
).
Coupled catabolism and anabolism in autocatalytic RNA sets
.
Nucleic Acids Research
,
46
(
18
),
9660
9666
.
Bollobás
,
B.
, &
Rasmussen
,
S.
(
1989
).
First cycles in random directed graph processes
.
Discrete Mathematics
,
75
,
55
68
.
Cazzolla Gatti
,
R.
,
Fath
,
B.
,
Hordijk
,
W.
,
Kauffman
,
S.
, &
Ulanowicz
,
R.
(
2018
).
Niche emergence as an autocatalytic process in the evolution of ecosystems
.
Journal of Theoretical Biology
,
454
,
110
117
.
Cornish-Bowden
,
A.
, &
Cárdenas
,
M. L.
(
2007
).
Organizational invariance in (M, R)-systems
.
Chemistry and Biodiversity
,
4
,
2396
2406
.
Dittrich
,
P.
, &
Speroni di Fenizio
,
P.
(
2007
).
Chemical organisation theory
.
Bulletin of Mathematical Biology
,
69
,
1199
1231
.
Filisetti
,
A.
,
Villani
,
M.
,
Damiani
,
C.
,
Graudenzi
,
A.
,
Roli
,
A.
,
Hordijk
,
W.
, &
Serra
,
R.
(
2014
).
On RAF sets and autocatalytic cycles in random reaction networks
. In
C.
Pizzuti
&
G.
Spezzano
(Eds.),
Advances in Artificial Life and Evolutionary Computation (WIVACE 2014). Communications in Computer and Information Science
,
vol. 445
(pp.
113
126
).
Cham, Switzerland
:
Springer
.
Gabora
,
L.
, &
Steel
,
M.
(
2017
).
Autocatalytic networks in cognition and the origin of culture
.
Journal of Theoretical Biology
,
63
,
617
638
.
Hayden
,
E. J.
,
von Kiedrowski
,
G.
, &
Lehman
,
N.
(
2008
).
Systems chemistry on ribozyme self-construction: Evidence for anabolic autocatalysis in a recombination network
.
Angewandte Chemie International Edition
,
47
(
44
),
8424
8428
.
Hordijk
,
W.
(
2019
).
A history of autocatalytic sets
.
Biological Theory
,
14
,
224
246
.
Hordijk
,
W.
, &
Steel
,
M.
(
2015
).
Conditions for evolvability of autocatalytic sets: A formal example and analysis
.
Origins of Life and Evolution of Biospheres
,
44
(
2
),
111
124
.
Hordijk
,
W.
, &
Steel
,
M.
(
2016
).
Autocatalytic sets in polymer networks with variable catalysis distributions
.
Journal of Mathematical Chemistry
,
54
(
10
),
1997
2021
.
Hordijk
,
W.
, &
Steel
,
M.
(
2017
).
Chasing the tail: The emergence of autocatalytic networks
.
Biosystems
,
152
,
1
10
.
Hordijk
,
W.
,
Steel
,
M.
, &
Kauffman
,
S. A.
(
2019
).
Molecular diversity required for the formation of autocatalytic sets
.
Life
,
9
(
1
),
23
.
Huson
,
D.
, &
Steel
,
M.
(
2020
).
CatlyNet
.
https://github.com/husonlab/catlynet (accessed 9 November 2020)
.
Jain
,
S.
, &
Krishna
,
S.
(
1998
).
Autocatalytic sets and the growth of complexity in an evolutionary model
.
Physical Review Letters
,
81
,
5684
5687
.
Jain
,
S.
, &
Krishna
,
S.
(
2001
).
A model for the emergence of cooperation, interdependence, and structure in evolving networks
.
Proceedings of the National Academy of Sciences USA
,
98
,
543
547
.
Jaramillo
,
S.
,
Honorato-Zimmer
,
R.
,
Pereira
,
U.
,
Contreras
,
D.
,
Reynaert
,
B.
,
Hernández
,
V.
,
,
J.
,
Cárdenas
,
M.
,
Cornish-Bowden
,
A.
, &
Letelier
,
J.
(
2010
).
(M, R) systems and RAF sets: Common ideas, tools and projections
. In
M.
Dörr
,
M. M.
Hanczyc
,
L.
Laursen
,
S. E.
Maurer
,
D.
Merkle
,
P.
Monnard
,
K.
Støy
, &
S.
Rasmussen
(Eds.),
Proceedings of the Alife XII Conference
(pp.
94
100
).
Cambridge, MA
:
MIT Press
.
Kauffman
,
S. A.
(
1971
).
Cellular homeostasis, epigenesis and replication in randomly aggregated macromolecular systems
.
Journal of Cybernetics
,
1
,
71
96
.
Kauffman
,
S. A.
(
1986
).
Autocatalytic sets of proteins
.
Journal of Theoretical Biology
,
119
,
1
24
.
Kauffman
,
S. A.
(
1993
).
The origins of order
.
Oxford, UK
:
Oxford University Press
.
Liu
,
B.
,
Pappas
,
C. G.
,
Ottelé
,
J.
,
Schaeffer
,
G.
,
Jurissek
,
C.
,
Pieters
,
P. F.
,
Altay
,
M.
,
Marić
,
I.
,
Stuart
,
M. C. A.
, &
Otto
,
S.
(
2020
).
Spontaneous emergence of self-replicating molecules containing nucleobases and amino acids
.
Journal of the American Chemical Society
,
142
(
9
),
4184
4192
.
Mossel
,
E.
, &
Steel
,
M.
(
2005
).
Random biochemical networks: The probability of self-sustaining autocatalysis
.
Journal of Theoretical Biology
,
233
(
3
),
327
336
.
Steel
,
M.
,
Hordijk
,
W.
, &
Xavier
,
J. C.
(
2018
).
Autocatalytic networks in biology: Structural theory and algorithms
.
Journal of the Royal Society Interface
,
16
,
20180808
.
Szathmáry
,
E.
(
2000
).
The evolution of replicators
.
Philosophical Transactions of Royal Society of London B: Biological Sciences
,
355
(
1403
),
1669
1676
.
Vaidya
,
N.
,
Manapat
,
M. L.
,
Chen
,
I. A.
,
Xulvi-Brunet
,
R.
,
Hayden
,
E. J.
, &
Lehman
,
N.
(
2012
).
Spontaneous network formation among cooperative RNA replicators
.
Nature
,
491
,
72
77
.
Vasas
,
V.
,
Fernando
,
C.
,
Santos
,
M.
,
Kauffman
,
S.
, &
Szathmáry
,
E.
(
2012
).
Evolution before genes
.
Biology Direct
,
7
(
1
),
1
14
.
Virgo
,
N.
, &
Guttenberg
,
N.
(
2015
).
Heredity in messy chemistries
. In
P.
Andrews
,
L.
Caves
,
R.
Doursat
,
S.
Hickinbotham
,
F.
Polack
,
S.
Stepney
,
T.
Taylor
, &
J.
Timmis
(Eds.),
Proceedings of the European Conference on Artificial Life
(pp.
325
332
).
Cambridge, MA
:
MIT Press
.
Xavier
,
J. C.
,
Hordijk
,
W.
,
Kauffman
,
S.
,
Steel
,
M.
, &
Martin
,
W. F.
(
2020
).
Autocatalytic chemical networks at the origin of metabolism
.
Proceedings of the Royal Society B: Biological Sciences
,
287
,
20192377
.

### Appendix: Justification of Equations 18 and 22

#### Equation 18

We use three applications of the standard binomial identity $∑k=0nnk$xk = (1 + x)n. Set λ = 1 − μC. Since λi = λ for i ≥ 1, the binomial identity (with x = −1, n = j, k = i) gives:
$∑i=1j−1ijiλij=λj⋅∑i=1j−1iji=λj⋅∑i=0jji−1i−1=λj0j−1=−λj,$
for each j ≥ 1. Thus, adding in the additional term (for i = 0 where λ0 = 1), we obtain:
$∑i=0j−1ijiλij=1−λj.$
Equation 17 now gives:
$μRAF=∑j=1NNj1−λj=∑j=0NNj1−λj=2N−1+λN,$
where the third equality involves two further applications of the binomial identity (with n = N, k = j, and with one application using x = 1, the other using x = λ).

#### Equation 22

Observe that the j-th term on the LHS of Equation 22 is $Nj$ (1 − (1 − γ/N)j)j. For j = 1 this simplifies to γ. A simple proof by induction shows that for all j ≥ 1, and all x ∈ (0, 1), we have: (1 − x)j ≥ 1 − xj and so (1 − (1 − x)j)j ≤ (xj)j. Applying this with x = γ/N ∈ (0, 1), the LHS of Equation 22 is bounded below by γ (the term where j = 1) and is bounded above by:
$γ+∑j≥2NjγjjjNj≤γ+∑j≥2γjjjj!,$
where the inequality follows from $Nj$/Nj$1j!$. Next, observe that, by Stirling's formula for j!, we have:
$γj⋅jjj!∼γej2πj,$
and this term converges to zero at exponential rate as j increases provided that γ < e−1; in particular, for γ < e−1, the sum ∑j≥2$γjjjj!$ converges to a constant of order γ2 as γ → 0.