## Abstract

The emergence of self-sustaining autocatalytic networks in chemical reaction systems has been studied as a possible mechanism for modeling how living systems first arose. It has been known for several decades that such networks will form within systems of polymers (under cleavage and ligation reactions) under a simple process of random catalysis, and this process has since been mathematically analyzed. In this paper, we provide an exact expression for the expected number of self-sustaining autocatalytic networks that will form in a general chemical reaction system, and the expected number of these networks that will also be uninhibited (by some molecule produced by the system). Using these equations, we are able to describe the patterns of catalysis and inhibition that maximize or minimize the expected number of such networks. We apply our results to derive a general theorem concerning the trade-off between catalysis and inhibition, and to provide some insight into the extent to which the expected number of self-sustaining autocatalytic networks coincides with the probability that at least one such system is present.

## 1 Introduction

A key step in the origin of life is the formation of a metabolic network that is both self-sustaining and collectively autocatalytic (Arsène et al., 2018; Hayden et al., 2008; Liu et al., 2020; Vaidya et al., 2012; Vasas et al., 2012; Xavier et al., 2020). Systems that combine these two general properties have been studied within a formal framework that is sometimes referred to as RAF theory (Hordijk & Steel, 2017). We give precise definitions shortly but, roughly speaking, a *RAF* (*Reflexively Autocatalytic and F-generated*) set is a subset of reactions where the reactants and at least one catalyst of each reaction in the subset can be produced from an available food set by using reactions from within the subset only.

The study of RAFs traces back to pioneering work on *collectively autocatalytic sets* in polymer models of early life (Kauffman, 1971, 1986, 1993), which was subsequently developed mathematically (see (Hordijk, 2019, and Hordijk & Steel, 2017) and the references therein). RAF algorithms have been applied recently to investigate the traces of earliest metabolism that can be detected in large metabolic databases across bacteria and archaea (Xavier et al., 2020), leading to the development of an open-source program to analyze and visualize RAFs in complex biochemical systems (Huson & Steel, 2020). RAF theory overlaps with other graph-theoretic approaches in which the emergence of directed cycles in reaction graphs plays a key role (Bollobás & Rasmussen, 1989; Jain & Krishna, 1998; Jain & Krishna, 2001), and is also related to (M, R) systems (Cornish-Bowden & Cárdenas, 2007; Jaramillo et al., 2010) and chemical organization theory (Dittrich & Speroni di Fenizio, 2007).

RAF theory has also been applied in other fields, including ecology (Cazzolla Gatti et al., 2018) and cognition (Gabora & Steel, 2017), and the ideas may have application in other contexts. In economics, for instance, the production of consumer items can be viewed as a catalyzed reaction; for example, the production of a wooden table involves nails and wood (reactants) and a hammer (a catalyst, as it is not used up in the reaction but makes the reaction happen much more efficiently), and the output (reaction product) is the table. On a larger scale, a factory is a catalyst for the production of the items produced in it from reactants brought into the factory. In both these examples, notice that each reactant may be either a raw material (i.e., the elements of a “food set”) or a product of other (catalyzed) reactions, whereas the products may, in turn, be reactants, or catalysts, for other catalyzed reactions. Products can sometimes also *inhibit* reactions; for example, the production of internal combustion engines resulted in processes for building steam engines being abandoned.

In this paper, we extend RAF theory further by investigating the impact of different modes of catalysis and inhibition on the appearance of (uninhibited) RAF subsets. We focus on the expected number of such sets (rather than on the probability that at least one such set exists, which has been the focus of nearly all earlier RAF studies [Filisetti et al., 2014; Mossel & Steel, 2005]). Using a mathematical approach, we derive explicit and exact analytical expressions for the expected number of such uninhibited RAF subsets, as well as providing some insight into the expected population sizes of RAFs for the catalysis rate at which they first appear (Section 4.2). In particular, we show that for simple systems, with an average catalysis rate that is set at the level where RAFs first appear, the expected number of RAFs depends strongly on the variability of catalysis across molecules. At one extreme (uniform catalysis), the expected number of RAFs is small (e.g., one, or a few), while at the other extreme (all-or-nothing catalysis) the expected number of RAFs grows exponentially with the size of the system.

The motivation for looking at the expected number of RAFs (rather than the probability that a RAF exists) is twofold. First, by focusing on expected values it is possible to present certain exact results (see Theorem 1), rather than just inequalities or asymptotic results, while still gaining some information about the probability that a RAF exists. Second, in origin of life studies, it is relevant to consider populations of self-sustaining autocatalytic chemical networks, which may be subject to competition and selection, a topic that has been explored by others (see, e.g., Szathmáry, 2000; Vasas et al., 2012; Virgo & Guttenberg, 2015), and information concerning the likely diversity of RAFs available in a given chemical reaction system is therefore a natural question. In previous analyses where RAFs have been identified, subsequent analysis has revealed a large number of RAFs present within the RAF; for example, for a 7-reaction RAF in a laboratory-based study involving RNA-ribosymes (from Vaidya et al., 2012), more than half of the 2^{7} = 128 subsets of this RAF are also RAFs (cf. Figure 5 of Steel et al., 2008). Simulation studies involving Kauffman's binary polymer model have also identified a large number of RAFs present once catalysis rises above the level at which RAFs first appear (Hordijk & Steel, 2015).

The structure of this paper is as follows. We begin with some formal definitions, and then describe different models for catalysis and inhibition. In Section 3, we present the main mathematical result, along with some remarks, and proof. We then present a number of consequences of our main result, beginning with a generic result concerning the impact of inhibition when catalysis is uniform. We then investigate the impact of different catalysis distributions on the expected number of RAFs arising in “elementary” chemical reaction systems, focusing on the catalysis rate at which RAFs first appear. We end with some brief concluding comments.

### 1.1 Definitions

Let *X* be a set of molecule types; *R* a set of reactions, where each reaction consists of a subset of molecule types as input (*reactants*) and a set of molecule types as outputs (*products*); and let *F* be a subset of *X* (called a *food set*). We refer to the triple 𝒬 = (*X*, *R*, *F*) as a *chemical reaction system with food set* and, unless stated otherwise, we impose no further restrictions on 𝒬 (e.g., it need not correspond to a system of polymers, and a reaction can have any positive number of reactants and any positive number of products).

Given a reaction *r* ∈ *R*, we let *ρ*(*r*) ⊆ *X* denote the set of reactants of *r*, and *π*(*r*) denote the set of products of *r*. Moreover, given a subset *R*′ of *R*, we let *π*(*R*′) = ⋃_{r∈R′}*π*(*r*).

A subset *R*′ of *R* is said to be *F-*generated if *R*′ can be ordered *r*_{1}, *r*_{2}, …, *r*_{|R′|} so that *ρ*(*r*_{1}) ⊆ *F* and for each *i* ∈ {2, …, |*R*|}, we have *ρ*(*r*_{i}) ⊆ *F* ∪ *π*({*r*_{1}, …, *r*_{i−1}}). In other words, *R*′ is *F*-generated if the *R*′ can be built up by starting from one reaction that has all its reactants in the food set, then adding reactions in such a way that each added reaction has each of its reactants present either in the food set or as a product of a reaction in the set generated so far.

Now suppose that certain molecule types in *X* can catalyze certain reactions in *R*. A subset *R*′ of *R* is said to be “Reflexively Autocatalytic and *F*-generated” (more briefly, a *RAF*) if *R*′ is nonempty and each reaction *r* ∈ *R*′ is catalyzed by at least one molecule type in *F* ∪ *π*(*R*′) and *R*′ is *F*-generated.

We may also allow certain molecule types to inhibit reactions in *R*, in which case a subset *R*′ of *R* is said to be an “uninhibited RAF” (uRAF) if *R*′ is a RAF and no reaction in *R*′ is inhibited by any molecule type in *F* ∪ *π*(*R*′). The notion of a uRAF was first defined and studied in Mossel and Steel (2005). Notice that inhibition is being applied in a strong sense: a reaction *r* cannot be part of a uRAF if *r* is inhibited by at least one molecule type present, regardless of how many molecule types are catalysts for *r* and present in the uRAF.

Since a union of RAFs is also a RAF, when a RAF exists in a system, there is a unique maximal RAF. However, the same does not apply to uRAFs—in particular, the union of two uRAFs can fail to be a uRAF. These concepts are illustrated in Figure 1.

## 2 Modeling Catalysis and Inhibition

We will model catalysis and also blocking (inhibition) by random processes. To provide for greater generality, we allow the possibility that elements in a subset *C*^{−} (respectively, *B*^{−}) of the food set cannot catalyze (respectively block) any reaction in *R*. Let *c* = |*F* \ *C*^{−}| and *b* = |*F* \ *B*^{−}|. Thus *c* (respectively *b*) is the number of food elements that are possible catalysts (respectively blockers).

Suppose that each molecule type *x* ∈ *X* \ *C*^{−} has an associated probability *C*_{x} of catalyzing any given reaction in *R*. The values *C*_{x} are sampled independently from a distribution 𝒟, for each *x* ∈ *X*. This results in a random assignment of catalysis (i.e., a random subset *χ* of *X* × 𝓡), where (*x*, *r*) ∈ *χ* if *x* catalyzes *r*. Let 𝒞_{x,r} be the event that *x* catalyzes *r*.

(

*I*_{1}) 𝒞 = (*C*_{x},*x*∈*X*\*C*^{−}) is a collection of independent random variables.(

*I*_{2}) Conditional on 𝒞, (𝒞_{x,r}:*x*∈*X*\*C*^{−},*r*∈*R*) is a collection of independent events.

*C*

_{x}is the same for all

*x*∈

*X*\

*C*

^{−}, we will use

*C*to denote an arbitrary random variable sampled from the distribution 𝒟. Let

*μ*

_{C}= 𝔼[

*C*] and, for

*i*≥ 0, let

*λ*

_{i}be the

*i*–th moment of 1 −

*C*; that is:

The

*uniform model*: Each*x*∈*X*\*C*^{−}catalyzes each reaction in 𝓡 with a fixed probability*p*. Thus,*C*=*p*with probability 1, and so*μ*_{C}=*p*.The

*sparse model*:*C*=*u*with probability*π*and*C*= 0 with probability 1 −*π*, and so*μ*_{C}=*uπ*.The

*all-or-nothing model*:*C*= 1 with probability*π*and*C*= 0 with probability 1 −*π*, and so*μ*_{C}=*π*.

The uniform model is from Kauffman's binary polymer network and has been the default for most recent studies involving polymer models (Hordijk & Steel, 2017). More realistic catalysis scenarios can be modeled by allowing *C* to take a range of values around *μ*_{C} with different probabilities. The sparse model generalizes the uniform model slightly by allowing a (random) subset of molecule types to be catalysts. In this model, *π* would typically be very small in applications (i.e., most molecules are not catalysts but those few that are will catalyze a lot of reactions, as in the recent study of metabolic origins, described in Xavier et al., 2020). The all-or-nothing model is a special case of the sparse model. The emergence of RAFs in these models (and others, including a power-law distribution) was investigated in Hordijk and Steel (2016).

*λ*

_{i}values are given as follows:

*λ*

_{0}= 1, and for all

*i*≥ 1:

*x*∈

*X*\

*B*

^{−}has an associated probability

*B*

_{x}of blocking any given reaction in

*R*. We will treat

*B*

_{x}as a random variable taking values in [0, 1] with a common distribution $\mathcal{D}\u02c6$. This results in a random assignment of blocking (i.e., a random subset

*β*of

*X*× 𝓡), where (

*x*,

*r*) ∈

*β*if

*x*blocks reaction

*r*. Let 𝓑

_{x,r}be the event that

*x*blocks

*r*. We assume that:

(

*I*_{1}′) 𝓑 = (*B*_{x},*x*∈*X*\*B*^{−}) is a collection of independent random variables.(

*I*_{2}′) Conditional on 𝓑, (𝓑_{x,r}:*x*∈*X*\*C*^{−},*r*∈*R*) is a collection of independent events.

*B*

_{x}is the same for all

*x*, we will use

*B*to denote this random variable; let

*μ*

_{B}= 𝔼[

*B*] and, for

*i*≥ 0, let:

(

*I*_{3})*C*–random variables in (*I*_{1},*I*_{2}) are independent of the*B*–random variables in (*I*_{1}′,*I*_{2}′).

Note that (*I*_{3}) allows the possibility that a molecule type *x* both catalyzes and blocks the same reaction *r* (the effect of this on uRAFs is the same as if *x* just blocks *r*; (i.e., blocking is assumed to trump catalysis)). Notice also that *λ*_{0} = $\lambda \u02c6$_{0} = 1.

## 3 Generic Results

To state our first result, we require two further definitions. Let *μ*_{RAF} and *μ*_{uRAF} denote the expected number of RAFs and uRAFs (respectively) arising in 𝒬 under the random process of catalysis and inhibition described. For integers *k*, *s* ≥ 1 let *n*_{k,s} be the number of *F*-generated subsets *R*′ of *R* that have size *k* and for which the total number of non-food products in *X* produced by reactions in *R*′ is *s*. Note that *n*_{k,s} = 0 for *s* > min{|*X*| − *F*, *kM*} where *M* is the maximum number of products of any single reaction.

Part (i) of the following theorem gives an exact expression for *μ*_{RAF} and *μ*_{uRAF}, which we then use in Parts (ii) and (iii) to describe the catalysis and inhibition distributions (having a given mean) that minimize or maximize the expected number of RAFs and uRAFs. We apply this theorem to particular systems in the next section.

Theorem 1: *Let* 𝒬 *be any chemical reaction system with food set, accompanied by catalysis and inhibition distributions* 𝒟 *and*$\mathcal{D}\u02c6$*, respectively.*

*(i) The expected number of RAFs and uRAFs for*𝒬*is given as follows:*$\mu RAF=\u2211k\u22651,s\u22650nk,s\u2211i=0k\u22121iki\lambda is+c$(2)*and*$\mu uRAF=\u2211k\u22651,s\u22650nk,s\u2211i=0k\u22121iki\lambda is+c\lambda \u02c6ks+b.$(3)*(ii) Among all distributions*𝒟*on catalysis having a given mean μ*_{C}, the distribution that minimizes the expected number of RAFs and uRAFs (for any inhibition distribution) is the uniform model (i.e., C = μ_{C}with probability 1).*(iii) Among all distributions*$\mathcal{D}\u02c6$*on inhibition having a given mean μ*_{B}, the following hold:*(a) the distribution that minimizes the expected number of uRAFs (for any catalysis distribution) is the uniform model (B = μ*_{B}with probability 1).*(b) the distribution that maximizes the expected number of uRAFs (for any catalysis distribution) is the all-or-nothing inhibition model (i.e., B = 1 with probability μ*_{B}, and B = 0 with probability 1 − μ_{B}).

We give the proof of Theorem 1 shortly, following some brief remarks.

### 3.1 Remarks

- (1) If
*P*_{RAF}and*P*_{uRAF}are the probability that 𝒬 contains a RAF and a uRAF, respectively, then these quantities are bounded above as follows:This follows from the well-known inequality ℙ($PRAF\u2264\mu RAFandPuRAF\u2264\mu uRAF.$*V*> 0) ≤ 𝔼[*V*] for any non-negative integer-valued random variable*V*, upon taking*V*to be the number of RAFs (or the number of uRAFs). We will explore the extent to which*P*_{RAF}underestimates*μ*_{RAF}in Section 4.2. (2) Theorem 1 makes clear that the only relevant aspects of the network (

*X*,*R*) for*μ*_{RAF}and*μ*_{uRAF}are encoded entirely within the coefficients*n*_{k,s}(the two stochastic terms depend only on*r*and*s*but not on further aspects of the network structure). By contrast, an expression for the probabilities*P*_{RAF}and*P*_{uRAF}that a RAF or uRAF exists requires more detailed information concerning the structure of the network. This is due to dependencies that arise in the analysis. Notice also that Theorem 1 allows the computation of*μ*_{uRAF}in*O*(|*R*|^{2}× |*X*|) steps (assuming that the*λ*_{i}, $\lambda \u02c6$_{i}, and*n*_{k,s}values are available).(3) Although the computation or estimation of

*n*_{k,s}may be tricky in general systems, Equation 2 can still be useful (even with little or no information about*n*_{k,s}) for asking comparative types of questions. In particular, Parts (ii) and (iii) provide results that are independent of the details of the network (*X*,*R*,*F*). In particular, Theorem 1(ii) is consistent with simulation results in Hordijk and Steel (2016) for Kauffman's binary polymer model, in which variable catalysis rates (the sparse and all-or-nothing model) led to RAFs appearing at lower average catalysis values (*μ*_{C}) than for uniform catalysis.(4) For the uniform model, note that the term ($\u2211i=0k$(

*−*1)^{i}$ki\lambda is+c$) in Equations 2 and 3 simplifies to [1 − (1 −*μ*_{C})^{s+c}]^{k}.

### 3.2 Proof of Theorem 1

For Part (i), recall that *π*(*R*′) denotes the set of products of reactions in *R*′.

For *k*, *s* ≥ 1, let FG(*k*, *s*) denote the collection of subsets *R*′ of *R* that satisfy all of the following three properties:

(i)

*R*′ has size*k*,(ii)

*R*′ is*F*-generated, and(iii) the number of non-food molecule types produced by reactions in

*R*′ is*s*.

*R*′ ⊆

*R*, let 𝕀

_{R′}be the Bernoulli random variable that takes the value 1 if each reaction in

*R*′ is catalyzed by at least one product of a reaction in

*R*′ or by an element of

*F*\

*C*

^{−}, and 0 otherwise. Similarly, let $\mathbb{I}\u02c6$

_{R′}be the Bernoulli random variable that takes the value 1 if no reaction in

*R*′ is blocked by the product of any reaction in

*R*′ or by an element of

*F*\

*B*

^{−}. Then the random variable

*I*

_{3}). Given 𝓡′ ∈ FG(

*k*,

*s*), let

*C*

_{1},

*C*

_{2}, …,

*C*

_{s+c}be the random variables (ordered in any way) that correspond to the catalysis probabilities of the

*s*products of 𝓡′ and the

*c*elements of

*F*\

*C*

^{−}. We can then write:

*C*

_{i}. The event 𝕀

_{R′}= 1 occurs precisely when each of the

*r*reactions in

*R*′ is catalyzed by at least one of the

*s*+

*c*elements in (

*μ*(

*R*′) \

*F*) ∪ (

*F*\

*C*

^{−}). By the independence assumption (

*I*

_{2}),

*V*:= $\u220fj=1s+c$(1 −

*C*

_{j}). Equations 5 and 6 then give:

*V*)

^{k}= $\u2211i=0k$(−1)$ki$

*V*

^{i}, and linearity of expectation. Moreover, for each

*i*≥ 0, we have:

*I*

_{1}), the fourth is by definition, and the last is trivial. Substituting Equation 8 into Equation 7 gives:

*R*′ of

*R*in FG(

*k*,

*s*) is a uRAF precisely if no reaction in

*R*′ is blocked by any of the

*s*+

*b*elements of (

*π*(

*R*′) \

*F*) ∪ (

*F*\

*B*

^{−}). By the independence assumption (

*I*

_{2}′),

*I*

_{1}′)), together with the identity 𝔼[(1 −

*B*

_{j})

^{k}] = $\lambda \u02c6$

_{k}gives:

Combining Equations 9 and 10 into Equation 4 gives the first equation in Part (i). The second is then obtained by putting $\lambda \u02c6$_{i} = 1 for all *i*.

*Parts (ii) and (iii)*: Observe that the function

*u*= (1 −

*y*)

^{k}for

*k*≥ 1 is convex and strictly convex when

*k*> 1. Thus, by Jensen's Inequality, for any random variable

*Y*, we have:

*Y*is nondegenerate and

*k*> 1.

*V*= $\u220fj=1s+c$(1 −

*C*

_{j}). Then by the first equality in Equation 7 we have:

*Y*=

*V*) we have:

*V*is nondegenerate and

*k*> 1. By the independence assumption (

*I*

_{1}), and noting that 𝔼[(1 −

*C*

_{j})] = 1 −

*μ*

_{C}we have:

_{k}= 𝔼[(1 −

*B*)

^{k})] ≥ (1 −

*μ*

_{B})

^{k}. Let

*H*(

*k*,

*s*) := ($\u2211i=0k$(−1)

^{i}$ki\lambda is+c$). By Equation 9,

*H*(

*k*,

*s*) = 𝔼[𝕀

_{𝓡′}] for 𝓡′ ∈ FG(

*k*,

*s*) and so

*H*(

*k*,

*s*) ≥ 0. Thus, by Equation 3 we have:

*μ*

_{uRAF}for the uniform model of inhibition.

For Part (iii)(b), suppose that *Y* is a random variable taking values in [0, 1] with mean *η* and let *Y*_{0} be the random variable that takes the value 1 with probability *η* and 0 otherwise. Then 𝔼[$Y0m$] = *η* for all *m* ≥ 1, and 𝔼[*Y*^{m}] ≤ 𝔼[*Y*^{2}] ≤ *η* for all *m* ≥ 1 (since *Y*^{m} ≤ *Y*^{2} ≤ *Y* because *Y* takes values in [0, 1]); moreover, 𝔼[*Y*^{2}] = *η* if and only if 𝔼[*Y*(1 − *Y*)] = 0, which implies that *Y* = *Y*_{0}. Now apply this to *Y* = (1 − *B*) and *m* = *k* to deduce for the distributions on *B* that have a given mean *μ*_{B}, $\lambda \u02c6$_{k} is maximized when the distribution takes the value 1 with probability *μ*_{B} and zero otherwise.

## 4 Applications

### 4.1 Inhibition-Catalysis Trade-Offs Under the Uniform Model

For any model in which catalysis and inhibition are uniform, Theorem 1 provides a simple prediction concerning how the expected number of uRAFs compares with a model with zero inhibition (and a lower catalysis rate). To simplify the statement, we will assume *b* = *c* and we will write *μ*_{uRAF}(*p*, *tp*) to denote the dependence of *μ*_{uRAF} on *μ*_{C} = *p* and *μ*_{B} = *tp* for some value of *t*. We will also write *p* = *ν*/*N*, where *N* is the total number of molecule types that are in the food set or can be generated by a sequence of reactions in 𝓡. We assume in the following result that *p* is small (in particular, < 1/2) and *N* is large (in particular, (1 − *ν*/*N*)^{N} can be approximated by *e*^{−ν}).

The following result (which extends Theorem 2 from Hordijk & Steel, 2016) applies to any chemical reaction system and provides a lower bound on the expected number of uRAFs in terms of the expected number of RAFs in the system with no inhibition (and half the catalysis rate); its proof relies on Theorem 1. Roughly speaking, Corollary 1 states that for any chemical reaction system with uniform catalysis, if one introduces a limited degree of inhibition then by doubling the original catalysis rate, the expected number of uninhibited RAFs is at least as large as the original number of expected RAF before inhibition was present (and at the original catalysis rate).

**Corollary 1**.

*For all non-negative values of t with t*≤ $1\nu $ ln(1 +

*e*

^{−ν})

*, the following inequality holds:*

*Proof*. By Theorem 1, and Remark (4) following this theorem, and noting that

*μ*

_{C}=

*p*and

*μ*

_{B}=

*tp*we have:

*t*= 0 in this last equation) we obtain:

*x*∈ (0, 0.5), we have:

*x*=

*p*), we see that the term inside the square brackets in Equation 15 exceeds the term in square brackets in Equation 16 by a factor of (1 + (1 −

*p*)

^{s})(1 −

*tp*)

^{s}, and this is minimized when

*s*=

*N*(the largest possible value

*s*can take). Setting

*s*=

*N*and writing

*p*=

*ν*/

*N*we have

*t*satisfies the stated inequality (namely,

*t*≤ $1\nu $ ln(1 +

*e*

^{−ν})). Thus (1 + (1 −

*p*)

^{s})(1 −

*tp*)

^{s}≥ 1, for all

*s*between 1 and

*N*and so each term in Equation 15 is greater than or equal to the corresponding term in square brackets in Equation 16, which justifies the inequality in Corollary 1.□

### 4.2 Explicit Calculations for Two Models on a Subclass of Networks

For the remainder of this section, we consider *elementary* chemical reaction systems (i.e., systems for which each reaction has all its reactants in the food set, as studied in Steel et al., 2018), with the further conditions that: (i) each reaction has exactly one product, (ii) different reactions produce different products, (iii) no reaction is inhibited, and (iv) no food element catalyzes any reaction.

*X*−

*F*of products of the reactions, with an arc from

*x*to

*y*if

*x*catalyzes the reaction that produces

*y*(this models a setting investigated in Jain & Krishna, 1998, 2001). RAF subsets are then in one-to-one correspondence with the subgraphs of 𝒢 for which each vertex has indegree at least one. In particular, a RAF exists if and only if there is a directed cycle in 𝒢 (which could be an arc from a vertex to itself).

^{1}In this simple setup, if

*N*denotes the number of reactions (= number of non-food molecule types) then:

*all-or-nothing model*, for which

*λ*

_{i}= 1 −

*π*= 1 −

*μ*

_{C}for

*i*≥ 1 (and

*λ*

_{0}= 1). Equation 17 simplifies to:

This expression can also be derived by the following direct argument. First, note that a subset *S* of the *N* products of reactions does not correspond to a RAF if and only if each of the |*S*| elements *x* in *S* has *C*_{x} = 0. The random variable *W* = |{*x* : *C*_{x} = 1}| follows the binomial distribution *Bin*(*N*, *μ*_{C}), and the proportion of sets of size *N* that avoid a given set *S* of size *m* is 2^{−m}. Thus the expected proportion of subsets that are not RAFs is the expected value of 2^{−W} where *W* is the binomial distribution above. Applying standard combinatorial identities then leads to Equation 18.

*μ*

_{C}to tend to 0 in such a way that

*P*

_{RAF}converges to 0 exponentially quickly with

*N*while

*μ*

_{RAF}tends to infinity at an exponential rate with

*N*(this requires

*μ*

_{C}to decay sufficiently fast with

*N*but not too fast, e.g.,

*μ*

_{C}= Θ(

*N*

^{−1−δ}) for

*δ*> 0). Comparing Equations 18 and 19, we also observe the following identity:

*uniform model*, applying straightforward algebra to Equation 17 leads to

We now use these formulae to investigate the relationship between *P*_{RAF} and *μ*_{RAF} in elementary chemical reaction systems (satisfying conditions (i)–(iv)) as *N* becomes large; in particular the impact of the choice of model (all-or-nothing vs. uniform) on this relationship.

### 4.3 Asymptotic Properties of the Two Models at the Catalysis Level Where RAFs Arise

For the all-or-nothing and uniform models, RAFs arise with a given (positive) probability, provided that *μ*_{C} converges to 0 no faster than *N*^{−1} as *N* grows. Thus, it is helpful to write *μ*_{C} = *γ*/*N* to compare their behavior as *N* grows.

*N*becomes large (with

*γ*being fixed), and so:

*μ*

_{C}(and hence

*γ*) value. It can be shown that when

*γ*<

*e*

^{−1}, we have:

*o*(

*γ*) has order

*γ*

^{2}as

*γ*→ 0 (a proof is provided in the Appendix).

*N*and assuming

*γ*< 1), we have:

*γ*and the uniform model we have:

*γ*<

*e*

^{−1}:

*N*becomes large (with

*γ*fixed).

Comparing Equations 21 and 25 reveals a key difference in the ratio *μ*_{RAF}/*P*_{RAF} between the all-or-nothing and uniform models when *N* is large and *γ* is small: the former equation involves an exponential term in *N*, while the second does not. This can be explained as follows. In the all-or-nothing model, the existence of a RAF comes down to whether or not there is a reaction *r* that generates a universal catalyst; when there is, then any subset of the *N* reactions that contains *r* is a RAF. By contrast, with the uniform model at a low catalysis level where RAFs are improbable, if a RAF exists, there is likely to be only one. Note that the results in this section are particular to chemical reaction systems that are elementary and satisfy properties (i)–(iv) as described at the start of this section.

## 5 Concluding Comments

In this paper, we have focused on the expected number of RAFs and uRAFs (rather than the probability of at least one such set existing), as this quantity can be described explicitly, and generic results described via this expression can be derived (e.g., in Parts (ii) and (iii) of Theorem 1 and Corollary 1). Even so, the expressions in Theorem 1 involve quantities *n*_{k,s} that may be difficult to quantify exactly; thus in the second part of the paper, we consider more restrictive types of systems.

In our analysis, we have treated inhibition and catalysis as simple and separate processes. However, a more general approach would allow reactions to proceed under rules that are encoded by Boolean expressions. For example, the expression (*a* ∧ *b*) ∨ *c* ∨ (*d* ∧ ¬*e*) assigned to a reaction *r* would allow *r* to proceed if at least one of the following holds: (i) both *a* and *b* are present as catalysts, or (ii) *c* is present as a catalyst, or (iii) *d* is present as a catalyst and *e* is not present as an inhibitor. Extending the results in this paper to this more general setting could be an interesting exercise for future work.

## Acknowledgments

We thank the two reviewers for a number of helpful comments on an earlier version of this manuscript.

## Note

An asymptotic study of the emergence of first cycles in large random directed graphs was explored in Bollobás and Rasmussen (1989).

## References

### Appendix: Justification of Equations 18 and 22

#### Equation 18

*x*

^{k}= (1 +

*x*)

^{n}. Set

*λ*= 1 −

*μ*

_{C}. Since

*λ*

_{i}=

*λ*for

*i*≥ 1, the binomial identity (with

*x*= −1,

*n*=

*j*,

*k*=

*i*) gives:

*j ≥*1. Thus, adding in the additional term (for

*i*= 0 where

*λ*

_{0}= 1), we obtain:

*n*=

*N*,

*k*=

*j*, and with one application using

*x*= 1, the other using

*x*=

*λ*).

#### Equation 22

*j*-th term on the LHS of Equation 22 is $Nj$ (1 − (1 −

*γ*/

*N*)

^{j})

^{j}. For

*j*= 1 this simplifies to

*γ*. A simple proof by induction shows that for all

*j*≥ 1, and all

*x*∈ (0, 1), we have: (1 −

*x*)

^{j}≥ 1 −

*xj*and so (1 − (1 −

*x*)

^{j})

^{j}≤ (

*xj*)

^{j}. Applying this with

*x*=

*γ*/

*N*∈ (0, 1), the LHS of Equation 22 is bounded below by

*γ*(the term where

*j*= 1) and is bounded above by:

*N*

^{j}≤ $1j!$. Next, observe that, by Stirling's formula for

*j*!, we have:

*j*increases provided that

*γ*<

*e*

^{−1}; in particular, for

*γ*<

*e*

^{−1}, the sum ∑

_{j≥2}$\gamma jjjj!$ converges to a constant of order

*γ*

^{2}as

*γ*→ 0.