## Abstract

We show that self-replication of a chemical system encapsulated within a membrane growing from within is possible without any explicit feature such as autocatalysis or metabolic closure, and without the need for their emergence through complexity. We use a protocell model relying upon random conservative chemical reaction networks with arbitrary stoichiometry, and we investigate the protocell's capability for self-replication, for various numbers of reactions in the network. We elucidate the underlying mechanisms in terms of simple minimal conditions pertaining only to the topology of the embedded chemical reaction network. A necessary condition is that each moiety must be fed, and a sufficient condition is that each siphon is fed. Although these minimal conditions are purely topological, by further endowing conservative chemical reaction networks with thermodynamically consistent kinetics, we show that the growth rate tends to increase on increasing the Gibbs energy per unit molecular weight of the nutrient and on decreasing that of the membrane precursor.

## 1 Introduction

At the basis of life, chemical self-replication is believed to require specific features such as autocatalysis or metabolic closure. In most previous theoretical work, such features were either made explicit in the chemical reaction network or were emergent as the result of complexity. Examples of the former are Ganti's chemoton, where molecules and chemical pathways are assigned specific roles and functions [9], and Cornish-Bowden's instantiation of Rosen metabolism-repair (*M,R*) systems, where explicit pathways account for the synthesis of all enzymes [20, 19]. Examples of the latter are Kauffman's autocatalytic protein sets, in which autocatalysis emerges as the result of complexity in random Boolean networks [12], and Kaneko's protocell toy model in which some molecules are assigned specific roles, but all other cell constituents interact and are produced through a large random catalytic chemical reaction network [11, 13]. Our own previous work is also an example of the latter: Building upon Kaneko's work, two molecules (nutrient, membrane precursor) were assigned specific roles, and all other constituents were produced by a large random chemical reaction network [2].

Yet, experimental attempts towards artificial life and accompanying modeling work rely upon much simpler chemistries, typically consisting of only a couple of constituents and reactions [14, 6, 15]. This raises the question of what the minimal conditions for a self-replicating chemical system truly are.

In this work we show that self-replication of a chemical system encapsulated within a membrane growing from within is possible without any explicit feature such as autocatalysis or metabolic closure, and without the need for their emergence through complexity. We use a protocell model relying upon random conservative chemical reaction networks (CRNs) with arbitrary stoichiometry, and we investigate the protocell's behavior, varying the number of reactions. We elucidate the underlying mechanisms in terms of simple minimal conditions pertaining only to the topology of the embedded CRN. A necessary condition for protocell stationary growth is that each moiety must be fed, and a sufficient condition is that each siphon is fed. These conditions are consistent with previously reported mathematical proofs [3].

Further endowing the CRN with thermodynamically consistent kinetics, we show that the growth rate increases on increasing the Gibbs energy per molecular weight of the nutrient and on decreasing that of the membrane precursor. We propose an approximate thermodynamical model to account for this behavior. This approximate model requires knowledge of the CRN topology, the Gibbs energies of chemical constituents, and the nutrient diffusion and membrane parameters, but does not require knowledge of the CRN kinetics.

## 2 Definitions and Methods

In this text, a vector ** v** is written in boldface. Its

*i*th coordinate is denoted

*v*

_{i}. The

*i*th coordinate of a vector

*v*_{k}, carrying its own index

*k*is denoted

*v*

_{k,i}.

### 2.1 Definitions

** Moieties** Following standard practice in CRN theory [8], a stoichiometry matrix

**may be associated to any CRN. Only conservative CRNs are considered in this work, with all material fluxes (nutrient, membrane precursor incorporation or dilution) made explicit in the protocell model. There thus exists a strictly positive mass vector**

*S***such that**

*m*

*m*^{T}

**= 0 (**

*S**m*

_{i}is the molecular mass of species

*A*

_{i}). There may exist more than one solution to mass conservation. The set of mass vectors compatible with mass conservation is {

**|**

*m**m*

_{i}> 0,

*i*= 0,…,

*N*− 1, and

*m*^{T}

**= 0}. It is a pointed convex cone having**

*S**p*linearly independent generating vectors {

*b*_{k}}

_{k=0,…, p−1}, which constitute a basis of Ker(

*S*^{T}) [21]. Moieties are the elements of such a set of generating vectors and correspond to positive linear combinations of concentrations that are left invariant by the chemical reactions. The

*support*of a moiety

**is the subset of chemical species**

*b**A*

_{i}along which

**has nonzero components.**

*b*** Siphons** The concept of siphon was first applied to CRNs to study persistence [1]. A

*siphon Z*is a subset of species such that for each species

*A*

_{i}in

*Z*and every reaction where

*A*

_{i}appears as product, at least one of the reactant species also belongs to

*Z*. A siphon is minimal if it does not contain strictly any other siphon. Siphons are essentially the sets of chemical species whose absence cannot be compensated by the chemical reactions.

### 2.2 Random Network Generation

**Any reaction with arbitrary stoichiometry may be decomposed into elementary monomolecular or bimolecular steps, at the expense of increasing the number of chemical intermediate species (even if short-lived). These elementary reactions typically represent elementary physico-chemical events, most of which only involve one or two reactants. This is the reason why we only consider reactions of the kindwhere {**

*Topology**A*

_{i}}

_{i=0,…, N−1}is the set of all

*N*reacting chemical species intervening in the network.

By convention, whenever we refer to a total number of reactions given by *R*, we choose to count forward and reverse reactions as a single reaction (all chemical reactions are assumed reversible for consistency with thermodynamics [7]). The pool of elementary reactions contains transformation reactions plus association–dissociation reactions (allowing 2*A*_{i} ⇄ *A*_{k}, and forbidding the non-conservative *A*_{i} + *A*_{k} ⇄ *A*_{i}), which makes a total of elementary reactions. To generate a random CRN, an initial reaction is first randomly picked in this pool, followed by additional random choices. Compatibility with mass conservation is checked upon each addition by running a previously published standard algorithm [21], which relies upon computing the set of generating vectors for the pointed convex cone of admissible mass vectors (moieties). If a pulled reaction is not compatible with the existing set, it is rejected from this and any subsequent choice. As the number of reactions in the network under construction increases, the number of moieties is reduced, and eventually reaches unity, at which point the rank of the network is *N* − 1. We denote by *R*_{th1} the number of reactions for which this first threshold is reached, and call the corresponding CRN the threshold-1 network. By construction, *R*_{th1} ≥ *N* − 1, because the rank of an *N* × *R* stoichiometry matrix ** S** is at most

*R*. Beyond that threshold, it suffices to check that each additional reaction satisfies mass conservation for the single admissible mass vector (single, that is, up to a multiplying factor). At some point, a maximum size is reached when none of the remaining reactions is compatible with mass conservation. We denote by

*R*

_{max}the corresponding number of reactions, and call the corresponding CRN the maximum-size network.

Minimal siphons are also monitored during this construction process. Increasing the number of reactions increases the number of species participating in minimal siphons, and eventually the only siphon is the full set of species. We denote by *R*_{th2} the number of reactions for which this second threshold is reached, and call the corresponding CRN the threshold-2 network. Consistent with the proof given in Appendix 1, we observe that *R*_{th2} ≥ *R*_{th1}.

By this random generation process we find that statistical distributions are (i) narrow for *R*_{th1}, (ii) broader for *R*_{th2}, and (iii) even broader for *R*_{max}. Generating a sample of 1000 networks with *N* = 10 chemical species, and pulling any elementary reaction with equal probability, we found 9 ≤ *R*_{th1} = 10 ± 1.6 ≤ 21; 9 ≤ *R*_{th2} = 13 ± 4 ≤ 43; and 14 ≤ *R*_{max} = 66 ± 22 ≤ 108. The boundary numbers are the minimum and maximum values in the sample, and the number after ± designates the standard deviation. These numbers are to be compared with a total of 495 elementary reactions in the entire pool of elementary reactions. Figure 1 shows these statistical distributions. Although *R*_{th2} = *R*_{th1} for 393 out of the 1000 pulled networks, there were 12 networks for which the single siphon threshold could not be reached even for *R* = *R*_{max} and for which no *R*_{th2} could be defined.

**Once the network topology has been defined, mass-action kinetics are then assigned. A first step consists in assigning random standard chemical potentials {μ**

*Kinetics*_{0, i}}

_{i=0,…, N−1}(also known as standard Gibbs free energies of formation) to the species {

*A*

_{i}}

_{i=0,…, N−1}. The second step consists in using these to determine the forward versus reverse directions (under standard conditions) for each reaction by comparing the sum of reference chemical potentials for products minus that for reactants, Δμ. The third and final step consists in assigning random kinetic coefficients for forward reactions, and computing kinetic coefficients for reverse reactions, consistently with thermodynamics, usingwhere c

^{⦵}is the standard concentration for which standard chemical potentials are defined, δ is the difference in molecularity between forward and reverse, R is the ideal gas constant, and

*T*is the absolute temperature. It is numerically verified that such random CRNs endowed with thermodynamically consistent kinetics verify detailed balance and minimize their free energy, for a closed system at equilibrium.

*x*,

*y*) designating a random real number with equal probability between

*x*and

*y*. Taking the reference concentration c

^{⦵}= 1 M, random reference chemical potentials were chosen as μ

_{0, i}/R

*T*= U(−7.5, 7.5) so that the maximum variation in chemical potential approaches that for ATP hydrolysis. Random forward kinetic coefficients were taken spanning two orders of magnitude through

*k*

^{→}=

*k*

_{avg}× 10

^{U(−1,1)}with

*k*

_{avg}= 1 × 10

^{2}s

^{−1}for monomolecular forward reactions and

*k*

_{avg}= 1 × 10

^{4}M

^{−1}·s

^{−1}for bimolecular forward reactions, consistent with typical

*k*

_{cat}and

*k*

_{cat}/

*K*

_{m}for enzymatic processes. The choice of typical

*k*

_{cat}and

*k*

_{cat}/

*K*

_{m}as average kinetic coefficients is justified as follows. Actual metabolic networks are made of enzymatic reactions, which may be decomposed as an association elementary step followed by a dissociation elementary step:Michaelis-Menten kinetics are derived by approximating the net balance across these two reactions (each considered with mass-action kinetics) under the following assumptions: The second step should be strongly forward biased with

*E*:

*S*→

*E*+

*P*(dissociation) having a significantly large kinetic coefficient

*k*

_{cat}; and the sum of kinetic coefficients for

*E*:

*S*→

*E*+

*S*and

*E*:

*S*→

*E*+

*P*(dissociations) divided by the kinetic coefficient for

*E*+

*S*→

*E*:

*S*(association) should be a sufficiently low concentration

*K*

_{m}. This corresponds to monomolecular (to bimolecular) forward elementary reactions having

*k*

_{cat}(

*k*

_{cat}/

*K*

_{m}) as typical kinetic coefficient.

When generating multiple CRNs, a new random set of standard chemical potentials and a new random set of forward kinetic coefficients are generated for each CRN. Figure 2 gives an example random CRN.

### 2.3 Protocell Model

In this section, we present a protocell model that embeds the previously considered CRN within a membrane synthesized from within. Two species participating in the CRN play special roles: the nutrient *A*_{nu} flowing by diffusion across the membrane, and the membrane precursor *A*_{me}. Figure 3 gives a schematic representation of the protocell model.

*Assumptions*

- 1.
Constant surface-area-to-volume ratio : An example of cellular shape compatible with this assumption is that of a thin filament. This is a simplifying assumption, as in the most general case this ratio should only be constrained to be periodic, ρ(

*t*+*T*) = ρ(*t*), where*T*is the cell cycle period, with the additional constraint so that at the end of a cell cycle, one ends up with exactly the material required for two new identical cells. - 2.
Self-assembly of a precursor

*A*_{me}in a structured membrane: Incorporation of*A*_{me}from inside the protocell into the growing membrane is kinetically controlled, with kinetic coefficient*K*_{me}per unit area, or*k*_{me}= ρ_{0}*K*_{me}per unit volume, owing to assumption 1. Denoting by*N*_{me}the number of molecules per unit area in the self-assembled membrane,*C*_{me}= ρ_{0}*N*_{me}is the effective membrane concentration as if all self-assembled molecules were dissolved in the cell volume. - 3.
Semipermeability of self-assembled membrane to a nutrient

*A*_{nu}flowing by passive diffusion: The resulting nutrient influx is*F*_{input,nu}per unit area, or*f*_{input,nu}= ρ_{0}*F*_{input,nu}per unit volume, owing to assumption 1. - 4.
Homogeneous concentrations: All chemical species are assumed to be homogeneously distributed, and all intracellular diffusion effects are neglected. This is a simplifying assumption, as one should expect the concentration of the nutrient

*A*_{nu}(respectively, the membrane precursor*A*_{me}) to be highest (lowest) near the membrane that acts as an effective source (sink) for such chemical species. - 5.
Constant outside nutrient concentration, even in the presence of a growing protocell: This would typically be the case with an outside growth medium whose volume is large compared to that of the protocell.

Cell division is not taken into account in this work. We shall consider the protocell as compatible with self-replication if the concentration vector trajectory converges asymptotically either towards constant concentrations (protocell stationary growth) or towards periodic concentrations with periodicity *T*. In the latter case, an additional requirement is that the cell surface area (and volume ) be doubled every *T*, , where is the instantaneous growth rate.

**With the above assumptions, the ODE system governing the time evolution of concentrations inside the protocell is given bywhere**

*Ordinary differential equation (ODE) system***is the concentration vector inside the protocell,**

*c***is the stoichiometry matrix,**

*S***=**

*f***(**

*f***) is the reaction flux vector giving the reaction rates as a function of the concentration vector**

*c***,**

*c*

*f*_{input}is the nutrient flux vector with sole nonzero component

*f*

_{input,nu}along

*A*

_{nu},

*f*_{output}is the membrane precursor incorporation flux vector with sole nonzero component

*f*

_{output,me}along

*A*

_{me}, and λ

**represents the dilution factor with growth rate λ given bywhere is the protocell volume (surface area). The latter equality in the above equation results from assumption 1.**

*c**f*

_{output,me}and

*C*

_{me}as follows. As

*A*

_{me}gets incorporated into the growing membrane with rate (per unit area)

*F*

_{me}=

*K*

_{me}

*c*

_{me}, the membrane area grows asEquations 2–6 thus give the ODE system governing the time evolution of concentrations for the growing protocell. The CRN is characterized by its stoichiometry matrix

**and by the reaction kinetics represented by the dependence of the reaction flux vector on the concentration vector,**

*S***=**

*f***(**

*f***). The membrane is characterized by**

*c**k*

_{me}and

*C*

_{me}accounting for precursor incorporation, as well as by accounting for nutrient diffusion. And the environment is characterized by the outside nutrient concentration

*c*

_{out,nu}.

** Minimum conditions for stationary growth** We have previously proved necessary and sufficient conditions for protocell stationary growth [3]:

- 1.
A necessary condition is that each moiety

must be fed. Otherwise, all species in the support of*b*go extinct by dilution, as there is no material influx into this support.*b* - 2.
A sufficient condition for the existence of a fixed point is that each siphon contains the support of a moiety (that is fed). Extending the results in [3], we have recently proved that a weaker sufficient condition for the existence of a fixed point is simply that each siphon is fed [4].

These conditions hold for any conservative CRN endowed with any kinetics and for any membrane parameters. Although this is not considered in the present numerical analyses, the proofs in [3] hold even if some species leak out through the membrane.

### 2.4 Numerical Methods

In order to determine the protocell system trajectory for various combinations of membrane (, *k*_{me}, *C*_{me}) and environmental (*c*_{out,nu}) parameters, a set of nominal parameters is first chosen, and the parameter space is then explored by varying nominal parameters over several orders of magnitude.

** Nominal parameters** Nominal (,

*c*

_{out,nu},

*k*

_{me},

*C*

_{me}) values were selected that we believed to make biological sense for

*E. coli*, chosen as an arbitrary reference organism. These values are ,

*c*

_{out,nu}= 1 × 10

^{−3}M,

*k*

_{me}= 1 × 10

^{−3}s

^{−1}, and

*C*

_{me}= 5 × 10

^{−2}M. The chosen

*c*

_{out,nu}is a typical glucose concentration used in

*E. coli*growth medium. The rationale for the chosen

*C*

_{me}value is the following: The total dry weight in the exponential growth phase is estimated to be 500 fg, out of which 8% corresponds to the membrane [17]. This gives a total dry weight of 40 fg for the membrane, corresponding to a density of 40 g·l

^{−1}for an estimated 1-μm

^{3}total cell volume. The membrane is primarily constituted of phosphatidylethanolamine with molecular weight 744 g·mol

^{−1}[10]. This gives . As there is no equivalent direct way of estimating a biologically sensible

*k*

_{me}, we proceeded indirectly. The total metabolite concentration in

*E. coli*is 300 mM [17]. We arbitrarily assumed that membrane precursors constitute 10% of this pool, that is,

*c*

_{me}≈ 30 mM. The cell division time is typically 40 min, corresponding to a growth rate of 4 × 10

^{−4}s

^{−1}. This is given for our model by Equation 6, from which we extract

*k*

_{me}= λ

*C*

_{me}/

*c*

_{me}≈ 1 × 10

^{−3}s

^{−1}. Last, was chosen so that the asymptotic value of

*c*

_{me}approximately matches the above expected

*c*

_{me}.

*Approach for exploring the parameter space*

- 1.
The protocell trajectory was first integrated for the

*N*×*N*= 100 possible (*A*_{nu},*A*_{me}) combinations using the above nominal parameters (,*c*_{out,nu},*k*_{me},*C*_{me}). - 2.
*A*_{nu}(*A*_{me}) was then fixed as the species that maximizes (minimizes) μ_{0, i}/*m*_{i}. As explained in Appendix 3, this (*A*_{nu},*A*_{me}) combination tends to maximize the growth rate. Then the parameters (,*c*_{out,nu},*k*_{me},*C*_{me}) were varied around their nominal values the following way:- (a)
First, (

*k*_{me},*C*_{me}) were kept constant at their nominal values, and the ODE system was integrated for various (*x*,*y*) = (,*c*_{out,nu}) values. - (b)
Second, (,

*c*_{out,nu}) were kept constant at their nominal values, and the ODE system was integrated for various (*x*,*y*) = (*k*_{me},*C*_{me}) values.In each case, three values of

*x*(generated in three-order-of-magnitude variation around the nominal value), for each of which nine values of*y*(generated in one-order-of-magnitude variation around the nominal value) were tested.

- (a)

Each of the 154 trajectories (100 trajectories for the (*A*_{nu}, *A*_{me}) span + 27 trajectories for the (, *c*_{out,nu}) span + 27 trajectories for the (*k*_{me}, *C*_{me}) span) was integrated using three different choices of initial conditions: all *c*_{i} equal to 1 × 10^{−9} M; all *c*_{i} equal to 1 M; and all *c*_{i} equal to their equilibrium concentrations *c*_{equ,i} (*c*_{equ} as defined in Appendix 2). These various initial conditions are referred to as *low*, *high*, and *medium*, respectively.

This characterization process was repeated for the maximum-size, the threshold-2, and the threshold-1 (if different from threshold-2) networks. Additional characterizations were also carried out after removing the last-added reaction from the threshold-1 network.

ODEs were handled in vectorial form using the odeint native ODE solver from the scientific Python software package (Python version 2.7.3) with dynamical adjustment of the time step.

** Investigated CRNs** In addition to the reference CRN example shown in Figure 2, a consecutive series of ten CRNs out of the sample of 1000 networks with

*N*= 10 chemical species was also fully characterized. This series was chosen so as to include one of the peculiar CRNs exhibiting shorter siphons even at maximum size. A table summarizing qualitative results for all 1 + 10 tested CRNs is given in the following section. Figures showing quantitative results are only given for the reference example. The corresponding figures for the other 10 CRNs are given in Supplementary Material.

## 3 Results

### 3.1 Qualitative

Qualitative findings for the 1 + 10 tested CRNs are given in Table 1. CRNs are indexed by their original numbering when generating the sample of 1000 random CRNs, but have been grouped by similarity in features and behavior.

CRN index . | 1 . | 236 . | 239 . | 230 . | 231 . | 233 . | 234 . | 235 . | 238 . | 232 . | 237 . |
---|---|---|---|---|---|---|---|---|---|---|---|

R_{th1} | 9 | 13 | 13 | 11 | 11 | 9 | 10 | 9 | 9 | 11 | 9 |

R_{th2} | 9 | 13 | 13 | 12 | n. | 13 | 11 | 11 | 15 | 15 | 15 |

R_{max} | 71 | 85 | 87 | 75 | 25 | 53 | 67 | 61 | 56 | 108 | 75 |

A_{i} outside minimal siphon | n. | n. | n. | A_{9} | A_{4} | A_{4}, | A_{2} | A_{5}, | A_{3} | A_{0} | A_{3}, |

A_{7} | A_{6} | A_{5}, | |||||||||

A_{6}, | |||||||||||

A_{7} | |||||||||||

Failed (nu,me) for R = R_{th1} | n. | n. | n. | (9,1), | (4,0), | (4,1), | (2,1), | (5,5), | (3,3), | (0,1), | (3,1), |

(9,6), | (4,1), | (4,2), | (2,2) | (5,6), | (3,8) | (0,3), | (3,6), | ||||

(9,9) | (4,4), | (4,3), | (5,7), | (0,6) | (3,7), | ||||||

(4,6), | (4,4), | (5,8), | (3,8), | ||||||||

(4,8), | (4,5), | (5,9), | (5,1), | ||||||||

(4,9) | (4,6), | (6,3), | (5,5), | ||||||||

(4,7), | (6,5), | (5,8), | |||||||||

(4,8), | (6,6), | (6,1), | |||||||||

(4,9) | (6,7), | (6,6), | |||||||||

(6,8) | (6,7), | ||||||||||

(6,8), | |||||||||||

(7,1), | |||||||||||

(7,6), | |||||||||||

(7,7), | |||||||||||

(7,8) | |||||||||||

A_{nu} for max. | A_{0} | A_{3} | A_{8} | A_{3} | A_{8} | A_{9} | A_{6} | A_{3} | A_{5} | A_{0} | A_{3} |

Failed (, c_{out,nu}) | n. | n. | n. | n. | n. | n. | n. | n. | n. | (1,1–5), | All |

(2,1–5), | |||||||||||

(3,1–5) | |||||||||||

Failed (k_{me,}C_{me}) | n. | n. | n. | n. | n. | n. | n. | n. | n. | (2,1–9), | (2,1–9), |

(3,1–9) | (3,1–9) |

CRN index . | 1 . | 236 . | 239 . | 230 . | 231 . | 233 . | 234 . | 235 . | 238 . | 232 . | 237 . |
---|---|---|---|---|---|---|---|---|---|---|---|

R_{th1} | 9 | 13 | 13 | 11 | 11 | 9 | 10 | 9 | 9 | 11 | 9 |

R_{th2} | 9 | 13 | 13 | 12 | n. | 13 | 11 | 11 | 15 | 15 | 15 |

R_{max} | 71 | 85 | 87 | 75 | 25 | 53 | 67 | 61 | 56 | 108 | 75 |

A_{i} outside minimal siphon | n. | n. | n. | A_{9} | A_{4} | A_{4}, | A_{2} | A_{5}, | A_{3} | A_{0} | A_{3}, |

A_{7} | A_{6} | A_{5}, | |||||||||

A_{6}, | |||||||||||

A_{7} | |||||||||||

Failed (nu,me) for R = R_{th1} | n. | n. | n. | (9,1), | (4,0), | (4,1), | (2,1), | (5,5), | (3,3), | (0,1), | (3,1), |

(9,6), | (4,1), | (4,2), | (2,2) | (5,6), | (3,8) | (0,3), | (3,6), | ||||

(9,9) | (4,4), | (4,3), | (5,7), | (0,6) | (3,7), | ||||||

(4,6), | (4,4), | (5,8), | (3,8), | ||||||||

(4,8), | (4,5), | (5,9), | (5,1), | ||||||||

(4,9) | (4,6), | (6,3), | (5,5), | ||||||||

(4,7), | (6,5), | (5,8), | |||||||||

(4,8), | (6,6), | (6,1), | |||||||||

(4,9) | (6,7), | (6,6), | |||||||||

(6,8) | (6,7), | ||||||||||

(6,8), | |||||||||||

(7,1), | |||||||||||

(7,6), | |||||||||||

(7,7), | |||||||||||

(7,8) | |||||||||||

A_{nu} for max. | A_{0} | A_{3} | A_{8} | A_{3} | A_{8} | A_{9} | A_{6} | A_{3} | A_{5} | A_{0} | A_{3} |

Failed (, c_{out,nu}) | n. | n. | n. | n. | n. | n. | n. | n. | n. | (1,1–5), | All |

(2,1–5), | |||||||||||

(3,1–5) | |||||||||||

Failed (k_{me,}C_{me}) | n. | n. | n. | n. | n. | n. | n. | n. | n. | (2,1–9), | (2,1–9), |

(3,1–9) | (3,1–9) |

CRNs in the first vertical block (indices 1, 236, and 239) are such that *R*_{th1} = *R*_{th2}, while all others had a single shorter minimal siphon at *R* = *R*_{th1} (having a single minimal siphon at *R* = *R*_{th1} is not a general property: It is possible to build counterexamples having several minimal siphons and a single moiety). CRNs in the second vertical block (indices 230, 231, 233, 234, 235, and 238) are such that the *A*_{i} that maximizes μ_{0, i}/*m*_{i} belongs to the shorter minimal siphon at *R* = *R*_{th1}. CRNs in the last vertical block (indices 232 and 237) are such that the *A*_{i} that maximizes μ_{0, i}/*m*_{i} is outside the shorter minimal siphon at *R* = *R*_{th1}, which affects the protocell stationary growth capability when varying (, *c*_{out,nu}) or (*k*_{me}, *C*_{me}).

Their features have been grouped by horizontal blocks. The first block gives topological features. The second block gives protocell characteristics when varying (*A*_{nu}, *A*_{me}) with fixed nominal (, *c*_{out,nu}, *k*_{me}, *C*_{me}) parameters. The last block gives protocell characteristics when varying (, *c*_{out,nu}) or (*k*_{me}, *C*_{me}) with fixed (*A*_{nu}, *A*_{me}) such that *A*_{nu} (*A*_{me}) maximizes (minimizes) μ_{0, i}/*m*_{i}.

Findings are summarized below:

- 1.
For maximum-size or threshold-2 networks (i.e., for CRNs having the full set of species as single siphon), protocell trajectories asymptotically converge towards constant strictly positive concentrations (stationary growth regime) for any (

*A*_{nu},*A*_{me}) and (,*c*_{out,nu},*k*_{me},*C*_{me}) combination. The same stationary concentrations are reached for any of the three choices of initial conditions. - 2.
For threshold-1 networks such that

*R*_{th1}<*R*_{th2}:- (a)
If

*A*_{nu}belongs to the minimal siphon, protocell trajectories converge towards constant strictly positive concentrations (stationary growth regime) for any (*A*_{nu},*A*_{me}) and (,*c*_{out,nu},*k*_{me},*C*_{me}) combination. The same stationary concentrations are reached for any of the three choices of initial conditions. - (b)
If

*A*_{nu}lies outside the minimal siphon, stationary growth may be reached for some*A*_{me}and (,*c*_{out,nu},*k*_{me},*C*_{me}) choices, but not for others. The same behavior (stationary growth or not) and the same stationary concentrations (in case of stationary growth) are reached for the medium and high initial conditions. In some cases, stationary growth is reached from low but not from medium or high initial conditions, or stationary growth is reached from all initial conditions but different concentrations are reached for the low initial conditions. Multistationarity is thus in principle possible, although in these cases stationary concentrations reached starting from low initial conditions are so low for species in the unfed siphon that they are not physically sensible and are not considered further in this work.

- (a)
- 3.
Removing any reaction from threshold-1 networks (in which case CRNs have two moieties) typically results in no stationary protocell growth for any (

*A*_{nu},*A*_{me}) and (,*c*_{out,nu},*k*_{me},*C*_{me}) combination or for any choice of initial conditions. At least one of the chemical species asymptotically disappears. However, exceptions are encountered with possible stationary growth when the supports of the two moieties intersect for some species and when*A*_{nu}is chosen in this intersection.

Finding 3 is consistent with the necessary condition given in Section 2.3 (each moiety must be fed). At threshold 1, rank(** S**) =

*N*− 1 and the only moiety is the mass vector

**having as support the full set of species. The necessary condition is thus automatically satisfied for any (**

*m**A*

_{nu},

*A*

_{me}) choice. But removing any reaction below threshold 1 typically results in rank(

**) =**

*S**N*− 2, and there are two moieties, the support of which is most often disjoint. This explains why in general such a CRN cannot grow with a single nutrient flux (finding 3 above), with possible exceptions when the supports of the two moieties intersect for some species and

*A*

_{nu}is chosen in this intersection (in the most general case where the supports do not intersect, we have verified numerically that feeding the network with two nutrient fluxes such that each moiety is fed also typically results in stationary growth).

Finding 1 is consistent with the sufficient condition given in Section 2.3 (each siphon contains the support of a moiety that is fed). If *R* ≥ *R*_{th2} (threshold-2 or maximum-size networks), then the only moiety is the mass vector ** m** having as support the full set of species that is also the only siphon. The sufficient condition is thus satisfied for any (

*A*

_{nu},

*A*

_{me}) choice. The proof in [3] is only partial, because it only ensures existence, not uniqueness or stability. Finding 1 suggests that this fixed point might be stable and unique, that is, a global attractor. Further work is required either to confirm this hypothesis with an extended proof, or to invalidate it with a counterexample.

Finding 2a is consistent with the weaker sufficient condition mentioned in Section 2.3: Having each siphon fed (and not necessarily having each siphon containing the support of a moiety that is itself fed) suffices to ensure stationary growth, also with the corresponding fixed point apparently being a global attractor.

Besides the stationary regimes observed in this work, an oscillatory behavior such that ** c**(

*t*+

*T*) =

**(**

*c**t*) (

*T*being the cell cycle period) would also be compatible with cell replication provided the cell surface area (or volume ) is doubled every . Although we did not observe such behavior in our random CRNs, we note that the proved necessary and sufficient conditions in [3] also hold for both stationary and oscillatory cases.

### 3.2 Quantitative

Protocell stationary growth characteristics are given below for the reference example shown in Figure 2 (maximum-size and threshold networks; threshold 1 and threshold 2 coincide for this example). Characteristics for the 10 other random CRNs (threshold-1, threshold-2, and maximum-size networks) are given in the Supplementary Material.

Appendix 2 presents an approximate thermodynamic model that gives an estimate of stationary concentrations assuming the out-of-equilibrium protocell is close to thermodynamic equilibrium. This approximate model will prove useful when analyzing the dependence of the stationary out-of-equilibrium state on protocell parameters.

** Growth rate as function of (A_{nu}, A_{me})** Figure 4 gives the growth rate λ and the ratio λ/λ

_{equ}(where λ

_{equ}is the estimated growth rate using the approximate thermodynamic equilibrium model) as a function of the (

*A*

_{nu},

*A*

_{me}) combination for both threshold and maximum-size networks. Chemical species indices are ranked by decreasing reference chemical potential per unit molecular weight, μ

_{0,i}/

*m*

_{i}. Carrying out a two-dimensional linear regression, this is the ranking that tends to give the highest correlation coefficient (averaging this correlation coefficient over all tested 1 + 10 CRNs) compared to a ranking by decreasing μ

_{0, i}or to the unranked indices: λ tends to increase with increasing μ

_{0, nu}/

*m*

_{nu}and with decreasing μ

_{0, me}/

*m*

_{me}. Appendix 3 explains why such a trend is also to be expected from the approximate thermodynamic equilibrium model. Although all (

*A*

_{nu},

*A*

_{me}) combinations theoretically result in stationary growth, only those combinations that tend to have a high μ

_{0, nu}/

*m*

_{nu}(or a low μ

_{0, me}/

*m*

_{me}) result in a significant growth rate.

Both threshold and maximum-size networks exhibit similar λ patterns (left plots), whereas λ/λ_{equ} (right plots) exhibit significant differences. This is because λ spans about 20 orders of magnitude whereas λ/λ_{equ} only spans about 3 orders of magnitude. λ_{equ} is thus a fair predictor of the λ-versus-(*A*_{nu}, *A*_{me}) pattern. This also shows that features in these patterns originate mostly from the characteristics of chemical species {(*m*_{i}, μ_{0, i})}_{i=0,…,N−1} as well as from the protocell parameters (, *c*_{out,nu}, *k*_{me}, *C*_{me}), but not from the CRN kinetics, because the approximate model is independent of these (and is the same for both threshold and maximum-size networks). The maximum-size network is closer to equilibrium than the threshold network, which is consistent with the larger number of chemical pathways through which the system can relax.

*Growth rate as function of (*** , c_{out,nu}) and of (k_{me}, C_{me})** We here keep (

*A*

_{nu},

*A*

_{me}) = (

*A*

_{0},

*A*

_{7}), which maximizes μ

_{0, nu}/

*m*

_{nu}and minimizes μ

_{0, me}/

*m*

_{me}. Figure 5 [Figure 6] gives the growth rate λ and the ratio λ/λ

_{equ}for varying (,

*c*

_{out,nu}) [for varying (

*k*

_{me},

*C*

_{me})] for both threshold and maximum-size networks. As intuitively expected, the growth rate increases with increasing effective diffusion constant , nutrient concentration,

*c*

_{out,nu}in the outside growth medium or kinetic coefficient

*k*

_{me}for membrane precursor incorporation; and it decreases with increasing effective membrane concentration

*C*

_{me}. Appendix 3 explains why the trends in Figures 5 and 6 (left) for the dependence of λ on (,

*c*

_{out,nu}) or (

*k*

_{me},

*C*

_{me}) are to be expected from the approximate thermodynamic equilibrium model.

Figures 5 and 6 (right) show that λ remains within a factor of two from λ_{equ}, except at the highest value for the threshold network. The membrane precursor stationary concentration *c*_{me} (proportional to λ through Equation 6) is thus fairly close to its equilibrium value. But this is not true of all chemical species. This is illustrated in Figure 7, giving the stationary concentrations {*c*_{i}}_{i=0,…, N−1} and the ratios {*c*_{i}/*c*_{equ,i}}_{i=0,…, N−1} (where *c*_{equ} is the concentration vector estimated using the approximate thermodynamic equilibrium model given in Appendix 2) as a function of the outside nutrient concentration *c*_{out,nu} with all other parameters (, *k*_{me}, *C*_{me}) fixed to their nominal values and (*A*_{nu}, *A*_{me}) = (*A*_{0}, *A*_{7}). Although *c*_{me} = *c*_{7} remains fairly close to equilibrium, other concentrations may be several orders of magnitude away from equilibrium. Here again, the maximum-size network is significantly closer to equilibrium than the threshold network, which is consistent with the larger number of chemical pathways through which the system can relax.

## 4 Discussion

### 4.1 Chemical Interpretation of Necessary and Sufficient Conditions

The necessary condition is consistent with common intuition: Either species in the support of an unfed moiety go extinct by dilution, or the growth rate tends towards zero. Models of specific metabolic pathways often make directly apparent the existence of certain moieties. Taking glycolysis as an example, all reactions that involve ATP (involve ADP) result in it being downconverted to ADP (upconverted to ATP). As a result, if such a metabolic pathway were to be considered in a closed environment, the sum of concentrations for ATP and ADP would remain constant. And if glycolysis were to be considered in a cellular model with a finite growth rate (and thus dilution), the concentrations of ATP and ADP would progressively be reduced towards zero by dilution. Full cell models must therefore also explicitly take into account additional metabolic pathways that synthesize ATP (or ADP). In essence, this is what the necessary condition says.

*pass*reactions of the kindwhere

*A*

_{j}and

*A*

_{k}belong to the siphon and

*A*

_{i}does not. Whether such a pass reaction is capable of transferring mass into the siphon at a sufficient rate to compensate dilution depends not only on the way the concentrations

*c*

_{j}and

*c*

_{k}may tend towards zero relative to

*c*

_{me}(which drives the growth rate and thus the dilution), but also on the value of

*c*

_{i}(

*A*

_{i}outside the siphon belongs to the support of a fed moiety) as well as on the kinetic coefficients of the pass reaction. This is consistent with finding 2b in Section 3.1: Stationary growth may or may not be reached, depending on the chosen parameters (

*A*

_{nu},

*A*

_{me}) and (,

*c*

_{out,nu},

*k*

_{me},

*C*

_{me}).

### 4.2 Growth Drives the System Away from Thermodynamic Equilibrium

With nutrient flowing across the membrane by bidirectional diffusion, one could expect the system to relax towards thermodynamic equilibrium, with the same chemical state inside and outside the protocell. That would be the case if the only contributions to the ODE system given by Equation 2 were the contributions from chemical reactions given by ** Sf** and from the nutrient flux given by Equation 4. But there are two additional contributions that drive the system away from thermodynamic equilibrium: the membrane precursor incorporation rate −

*f*_{output}(with

*f*

_{output,me}given by Equation 5) and the dilution factor −λ

**(with λ given by Equation 6). These additional contributions are generally only negligible if the membrane precursor concentration**

*c**c*

_{me}is very low, which results in a negligible growth rate. This is typically the case when the Gibbs energy per unit molecular weight is high for

*A*

_{me}or low for

*A*

_{nu}. In such cases, not only is the nutrient concentration inside the protocell very similar to that in the outside growth medium, but so are also all other concentrations (unless there is a continuous flow of

*A*

_{nu}through the outside growth medium as hypothesized in [22]).

If on the contrary the Gibbs energy per unit molecular weight is high for the nutrient *A*_{nu} and low for the membrane precursor *A*_{me}, then the growth rate tends to be significant. With such an energetic configuration, the rate of transformation of *A*_{nu} into *A*_{me} (as well as into all other cell constituents) tends to be large, the nutrient concentration inside the protocell is significantly below that in the outside growth medium, and the membrane precursor concentration tends to be high. This results not only in a large growth rate, but also in a protocell that is significantly driven away from equilibrium.

### 4.3 Osmotic Pressure and Cell Geometry

Our assumption of a constant surface-area-to-volume ratio ρ could typically correspond to a situation where the cell membrane is be so rigid that it is constrained to take a filament shape regardless of any difference in osmotic pressure across the membrane.

**in the ODE system. Assuming the dynamics of such a balancing to be very fast, this would correspond to a constraint of the kind**

*c*

*u*^{T}

**=**

*c*

*u*^{T}

*c*_{out}, where

**is the osmolar vector (such that**

*u**u*

_{i}is the number of particles obtained by dissolving one molecule of

*A*

_{i}in water) and

*c*_{out}is the constant outside concentration vector. Taking the derivative with given by Equation 2 would relate λ = to (

**, ρ) throughwhere**

*c*

*F*_{input}(

*F*_{output}) is the nutrient (membrane precursor incorporation) flux vector per unit area. The missing additional differential equation for ρ could be simply obtained fromThis would make the set of ODEs fully self-consistent, with the trajectories of both the concentration vector

**and the surface-area-to-volume ratio ρ being solutions of this extended set of ODEs, similarly to the approach taken in [16]. It should be noted that the proofs in [3] do not hold in such a case. Under which minimal conditions would such a scheme result in protocell stationary growth or in a periodic regime compatible with self-replication remains an open question.**

*c*### 4.4 Comparison with Previously Reported Necessary and Sufficient Conditions

Within the framework of the proposed protocell model, a necessary condition for stationary growth is that each moiety must be fed, and a sufficient condition is that each siphon is fed. In this section, we compare these conditions with previously reported minimal conditions.

An osmotic synchronization condition for stationary reproduction of a protocell has previously been reported in [15]. This condition was derived by combining two conditions:

- 1.
From purely geometric considerations, the growth control coefficient, defined as the relative rate of increase of the cell membrane volume divided by the relative rate of increase of the cell membrane area must be equal to unity for stationary cell replication to occur.

- 2.
Water flows instantaneously across the membrane to balance any difference in osmotic pressure that may result from a difference in total concentration of chemical species between the inside of the cell and the outside growth medium.

Combining these two conditions results in the osmotic synchronization condition, which is not obviously met by any CRN, even those satisfying our sufficient condition. Condition 1 is equivalent to our assumption 2 of having a constant ratio. We believe that, in the most general sense, this ratio need not necessarily be constant, but should only be periodic, , with *T* being the cell cycle period (just as the concentration vector ** c** need not necessarily be constant but should be periodic,

**(**

*c**t*+

*T*) =

**(**

*c**t*)). Whether such a relaxation of condition 1 would lead to an extension of the class of CRNs satisfying the osmotic synchronization condition remains an open question (see the above discussion in Section 4.3). In any case, the simple necessary and sufficient conditions introduced in the present article correspond to a simplified model that focuses on the CRN properties, and that does not take into account other physical constraints such as the balance of osmotic pressure or mechanical properties of the membrane. Future work is required to build a generic model including such additional features, along the lines presented in Section 4.3.

Different sufficient conditions for emergent synchronization in protocells have also been proved in [6], *synchronization* referring in this work to the property of having the rate of synthesis for the membrane material matching that for the inner cell constituents. Such an emergent synchronization is quite distinct from earlier proposals [9] where synchronization was achieved by some ad hoc mechanism. Our results could also be qualified as emergent synchronization, because such a synchronization is automatically achieved once our sufficient condition is met. However, it is difficult to compare the earlier work with ours, because of the significant differences in model assumptions: In [6] self-replication of protocell constituents occurs without explicit nutrient flux or mass conservation, and the membrane precursor concentration is assumed constant (corresponding to a membrane growing from the outside). On the one hand, the protocell models for which the sufficient conditions in [6] were proved are a closer match to experimental approaches towards artificial life than ours. On the other hand, our approach may prove more general with respect to the metabolic network represented by the CRN.

A different line of work [13] has established different growth phases for a generic protocell model embedding a CRN of arbitrary complexity. In addition to various growth phases characterized by high or low growth rate and high or low nutrient influx, a death phase was also identified, corresponding to a negative growth rate. This death phase occurs if the nutrient concentration in the outside growth medium, *c*_{out,nu}, is too low. We see such behavior only for CRNs that do not comply with the sufficient condition. But for those CRNs that are compliant with that condition, stationary growth is guaranteed for any *c*_{out,nu} value. Here again, this may arise from different model assumptions: In [13] the moiety constituted by the energy currency molecules (*A* and *B*, using the notation of [13]) is kept at constant total concentration, and the membrane precursor concentration is also kept constant (which could correspond to a membrane growing from the outside, as in [6]). In our model, every moiety must be explicitly fed, and the membrane precursor concentration is a variable, which corresponds to the case of a membrane growing from within. Regarding the random CRN topology, all reactions in [13] are of the autocatalytic type. Excluding the specific moiety or siphon constituted by energy currency molecules, this choice is likely to result in a single moiety with all species having the same mass, and the large number of reactions considered (*R*/*N* ≥ 8) is likely to result in a single siphon. This should make the CRNs considered in [13] a particular case of the more general class of CRNs considered in the present article, and as such cannot account for the difference in behavior.

### 4.5 Relevance of Random CRNs

We do not make any claim regarding the chemical relevance of random CRNs, nor of the particular choice of *N* = 10 used in numerical analyses. Our goal in using such an approach is to understand the properties of any conservative chemical reaction networks (including real ones), and to unveil the underlying mechanisms behind such properties. Derived laws should then apply to any reaction network, including real ones. The necessary and sufficient conditions proved in [3] are a first step in this direction. More work remains to be done towards a finer delineation of the class of CRNs compatible with self-reproduction. Likewise, the approximate model approach is applicable to any thermodynamically consistent conservative CRN, and not just to the random ones generated in this work.

### 4.6 Robustness

This discussion addresses two aspects: (i) robustness against by-products, side reactions, or leakage through the membrane; and (ii) robustness against removing chemical reactions or changing kinetics.

Regarding (i), any conservative CRN also encompasses situations where elementary building blocks may be degraded or produced from simpler constituents inside the protocell. The proved necessary and sufficient conditions in [3] even hold in the case where some species may leak out of the membrane by passive diffusion. Considering all species that may diffuse through the membrane as a single set including nutrients and leaking species, the sufficient condition grants the existence of a fixed point, but does not predict which species concentrations settle at a higher or lower value inside versus outside concentration. In such a stationary growth regime, we can simply observe that concentrations settle to values such that as a whole there is a net positive material influx to ensure a positive growth rate, that is, the rate of mass nutrient influx exceeds the rate of mass leakage.

Regarding (ii), we have already mentioned that removing any reaction from a threshold-1 network typically results in no stationary growth with a single nutrient, unless the supports of the two moieties intersect for some species and the nutrient belongs to that intersection. Removing any reaction from a threshold-2 network typically results in this CRN having a shorter siphon. If *A*_{nu} is chosen within this siphon, then protocell stationary growth is still obtained. Else, protocell stationary growth may or may not be possible, depending on the particular CRN, its kinetics, or protocell membrane parameters. Removing any reaction from a maximum-size network typically results in this CRN being still above threshold 2, which automatically assures protocell stationary growth regardless of the chosen (*A*_{nu}, *A*_{me}) and of the protocell parameters. The above statements are consistent with common intuition: A larger number of reactions tends to increase the robustness with respect to reaction removal (robustness in terms of qualitative ability to grow). From a quantitative standpoint, the out-of-equilibrium stationary growth state gets closer to the approximate thermodynamic equilibrium model on increasing the number of reactions. This approximate model only depends on the individual characteristics of chemical species (masses and reference chemical potentials) and is independent of the CRN topology or reaction kinetics. This shows that increasing the number of reactions increases the quantitative phenotype stability upon reaction removal or upon changes in reaction kinetics.

### 4.7 Minimal Requirements for Self-Replication

Within the framework of our protocell model, there is no requirement for any specific features nor for their emergence through complexity. Any CRN such that the only siphon is the full set of species results in stationary protocell growth for any (*A*_{nu}, *A*_{me}) choice and for any (, *c*_{out,nu}, *k*_{me}, *C*_{me}) parameter set. With our random network generation process, Figure 1 shows that this is reached on average for a number of reactions of *R* = *R*_{th2} = 13 corresponding to a fairly low complexity ratio *R*/*N* = 1.3. Even smaller networks such that *R*_{th1} ≤ *R* < *R*_{th2} (corresponding to a complexity ratio 1 ≤ *R*/*N* < 1.3) result in protocell stationary growth for any *A*_{me} choice and for any (, *c*_{out,nu}, *k*_{me}, *C*_{me}) parameter set, provided *A*_{nu} belongs to some minimal siphon. And if there exists some siphon that does not contain *A*_{nu}, our numerical results show that there exists some (, *c*_{out,nu}, *k*_{me}, *C*_{me}) parameter range for which stationary growth can also be achieved. It thus appears that the class of CRNs compatible with protocell stationary growth is very broad.

## 5 Conclusion

Using a generic protocell model, we have shown that self-replication does not require any specific features such as autocatalysis or metabolic closure. Nor does it require the emergence of such features through complexity. Most random networks lead to protocells capable of stationary growth over a very broad parameter range, even at very low complexity with a number of reactions that is comparable to the number of chemical species. We have illustrated simple necessary and sufficient conditions for protocell stationary growth. A necessary condition is that each moiety must be fed. And a sufficient condition is that each siphon is fed. These conditions are consistent with previously reported mathematical proofs and hold for any conservative CRN, kinetics, membrane parameters, or growth medium nutrient concentration.

Although the theoretical capability for stationary growth only depends on the CRN topology, we have found that the nutrient (respectively, the membrane precursor) should have a high (a low) Gibbs energy per unit molecular weight for the growth rate to be significant. This was explained using an approximate thermodynamic equilibrium model. This model only requires knowledge of the chemical species characteristics (reference chemical potential and molecular weight), of the kinetics for nutrient influx and membrane precursor incorporation, and of membrane structural properties (number of molecules per unit membrane area). But it does not require knowledge of the reaction network topology or of the CRN kinetics.

## Acknowledgments

The authors would like to thank Pierre Legrain, Loïc Paulevé, and Laurent Schwartz for stimulating discussions, as well as anonymous reviewers for their useful comments.

## References

### Appendix 1 Proof for *R*_{th2} ≥ *R*_{th1}

The proof proceeds in two steps. First, we shall prove that the support of any moiety is a siphon. Second, we shall prove that if there is more than one moiety (i.e., if *R* < *R*_{th1}), then at least one of them has its support shorter than the full set of species. This support is thus a siphon that is shorter than the full set of species (i.e., *R* < *R*_{th2}). This proves that *R*_{th2} ≥ *R*_{th1}.

The proof of the two steps is given below.

- 1.
*First step:*The following definition will prove useful for this first step. For any moietyand any reaction in a CRN, we define the product*b*-weight (the reactant*b*-weight) as the sum of the components of*b*along product species (reactant species) weighted by the corresponding stoichiometric coefficients. Conservation ensures that the*b*-weight is conserved on either side of the reaction, that is, the product*b*-weight equals the reactant*b*-weight.*b*The support supp(

) of any moiety*b*is a siphon. This is because if any species*b**A*_{i}in supp() appears on the product side of a reaction, then the corresponding product*b*-weight is strictly positive, and so is the corresponding reactant*b*-weight. A strictly positive reactant*b*-weight implies that at least one of the reactant species belongs to supp(*b*), which matches the definition of a siphon.*b* - 2.
*Second step:*If a CRN has more than one moiety (i.e., rank() =*S**N*−*p*with*p*> 1), then at least one of the moieties has its support shorter than the full set of species. If that were not the case, one could find two moieties*b*_{1}and*b*_{2}having as support the full set of species. Defining*i*_{0}as the species index for which*b*_{1, i}/*b*_{2, i}is minimized, and*x*=*b*_{1, i0}/*b*_{2, i0}, one can define a new moiety=*b**b*_{1}−*x**b*_{2}. Thisis a moiety because (i) it is a linear combination of two elements of Ker(*b**S*^{T}) and as such also belongs to Ker(*S*^{T}), and (ii) all its components are positive or null. In particular,*b*_{i0}= 0, which results in supp() being shorter than the full set of species.*b*

### Appendix 2 Approximate Thermodynamic Model

*m*^{T}and denoting the density

*D*=

*m*^{T}

**givesWe have in the stationary state. Feeding Equations 4 and 5 into Equation 9 and rearranging terms givesDefining**

*c**D*

_{me}=

*m*

_{me}

*C*

_{me}, Equation 10 can be rewritten as

The equilibrium chemical potential vector **μ**_{equ} and the mass vector ** m** both belong to Ker(

*S*^{T}). Considering only CRNs at or above threshold 1, the mass vector is the only moiety, and dim(Ker(

*S*^{T})) = 1, which forces

**μ**

_{equ}and

**to be collinear,**

*m***μ**

_{equ}= α

**.**

*m**c*

_{equ,i}are uniquely determined by α throughFeeding Equation 12 into Equation 10 gives a polynomial equation in where the left-hand side (LHS) is a constant and the right-hand side (RHS) is a polynomial expression containing only positive terms, which uniquely defines α and the equilibrium state

*c*_{equ}.

*c*_{equ} only depends on {*m*_{i}}_{i=0,…,N−1} and {μ_{0,i}}_{i=0,…, N−1} and not on kinetics, nor on topology. In particular, it is the same for maximum-size, threshold-1 or threshold-2 CRNs, as well as for any other subset of the maximum-size network having rank *N* − 1. Further, *c*_{equ} is an approximation of the stationary concentration vector denoted ** c**, assuming the latter to be close to thermodynamic equilibrium. Although

**may be far away from thermodynamic equilibrium, we shall see that many of the features in the dependence of**

*c***on (,**

*c**c*

_{out,nu},

*k*

_{me},

*C*

_{me}) and on the (

*A*

_{nu},

*A*

_{me}) combination are actually also exhibited by

*c*_{equ}. This approximate model holds for any conservative and thermodynamically consistent CRN and does not presume specific kinetics (mass-action or other).

By a thermodynamically consistent CRN [7], we refer to any CRN such that every reaction is bidirectional (quasi-irreversible reactions are simply reversible with one direction having very low rate) and such that it is endowed with kinetics ensuring detailed balance at equilibrium for a closed system, that is, for every reaction the forward and reverse rates are exactly balanced [18]. Detailed balance ensures that the free energy is minimized at equilibrium.

This approximate model can be extended to the general case where there are several moieties {*b*_{k}}_{k=0,…,p−1} that constitute a basis of Ker(*S*^{T}). Decomposing **μ**_{equ} on this basis (**μ**_{equ} = ∑_{k=0,…,p−1} α_{k}*b*_{k}) and multiplying both sides of Equation 2 by , gives a set of *p* coupled polynomial equations in {exp(α_{k}/R*T*)}_{k=0,…, p−1}, where the *p* LHSs are constant and strictly positive (because each moiety must be fed according to the necessary condition) and the RHSs are polynomial expressions with only positive terms. This uniquely defines {α_{k}}_{k=0,…,p−1} and the equilibrium state *c*_{equ}.

### Appendix 3 Trend Analysis for the Approximate Thermodynamic Equilibrium Model

This appendix analyzes the way the approximate thermodynamic equilibrium model (as defined in Appendix 2) depends on various model parameters, and how this is in agreement with numerical observations.

**μ**

_{equ}= α

**:withWe wish to determine how this equilibrium state varies with varying the (**

*m**A*

_{nu},

*A*

_{me}) combination and with varying the (,

*c*

_{out,nu}) and (

*k*

_{me},

*C*

_{me}) parameters. We shall follow the approach presented in the main text for the exploration of the parameter space. First, we will keep (,

*c*

_{out,nu}) and (

*k*

_{me},

*C*

_{me}) constant and analyze how the growth rate λ depends on the choice of the (

*A*

_{nu},

*A*

_{me}) combination ( Appendix A3.1). Second, we will then fix the choice of the (

*A*

_{nu},

*A*

_{me}) combination, keep (

*k*

_{me},

*C*

_{me}) constant, and analyze how the concentrations {

*c*

_{i}}

_{i=0,…,N−1}and the growth rate λ vary with varying (,

*c*

_{out,nu}) ( Appendix A3.2). Third, we will also fix the choice of the (

*A*

_{nu},

*A*

_{me}) combination, keep (,

*c*

_{out,nu}) constant, and analyze how the concentrations {

*c*

_{i}}

_{i=0,…,N−1}and the growth rate λ vary with varying (

*k*

_{me},

*C*

_{me}) ( Appendix A3.3).

#### A3.1 Vary the (*A*_{nu}, *A*_{me}) Combination; Keep (, *c*_{out,nu}) and (*k*_{me}, *C*_{me}) Constant

We shall proceed with a crude analysis of the solutions to Equation 11, assuming that its right-hand side is dominated by either one or the other contribution, or λ(*D*_{me} + *D*).

##### A3.1.1 Assume , or Equivalently, *c*_{nu} ≈ *c*_{out,nu}

*RHS*

_{1}≪ 1 and

*RHS*

_{2}≪ 1. α having been determined in Equation 16 above, one can calculate

*RHS*

_{1}and

*RHS*

_{2}:Noting thatgivesSimilarly,Noting thatgivesNoting thatshows that for

*RHS*

_{1}≪ 1 and

*RHS*

_{2}≪ 1 to hold, one should try to choose

*A*

_{nu}with a low μ

_{0,nu}/

*m*

_{nu}(or

*A*

_{me}with a high μ

_{0,me}/

*m*

_{me}).

##### A3.1.2 Assume , or Equivalently, *c*_{nu} ≪ *c*_{out,nu}

We shall further assume that either *D*_{me} or *D* dominates the other.

- 1. Starting with the further assumption that
*D*≪*D*_{me}, we then haveand the unknown α is approximately given byα being thus determined, one can verify the conditions under which the main assumptionis actually verified. Noting thatwe haveAnd for the main assumption*c*_{nu}≪*c*_{out,nu}to be verified, one should try to choose*A*_{nu}with a high μ_{0,nu}/*m*_{nu}(or*A*_{me}with a low μ_{0,me}/*m*_{me}). - 2. Continuing with the further alternative assumption that
*D*_{me}≪*D*, we then havewhere*A*_{i0}is the most abundant species, the contribution of which dominates*D*. The unknown α is then approximately given byα being thus determined, one can verify the conditions under which the main assumption as in step 1 is actually verified. Noting thatwe haveAnd for the main assumption*c*_{nu}≪*c*_{out,nu}to be verified, one should try to choose*A*_{nu}with a high μ_{0,nu}/*m*_{nu}(or*A*_{me}with a low μ_{0,me}/*m*_{me}).

The above is is just a trend analysis and does not guarantee that a suitable choice of (*A*_{nu}, *A*_{me}) suffices to make one or the other contribution on the right-hand side of Equation 11 dominant, because the actual values of these contributions also depend on the particular numerical values used in the model. It illustrates the rationale for ordering the different species according to their reference chemical potential per unit molecular weight, μ_{0, i}/*m*_{i}, when plotting the protocell growth rate λ as a function of the (*A*_{nu}, *A*_{me}) combination in the main text. And it explains why one should try to choose nutrient (respectively, membrane precursor) with a high (low) reference chemical potential per unit molecular weight for the growth rate to be significant, as seen in Figure 4.

#### A3.2 Vary (, *c*_{out,nu}); Keep the (*A*_{nu}, *A*_{me}) Combination and (*k*_{me}, *C*_{me}) Constant

*c*

_{out,nu}). Equation 11 may be rewritten aswhere

*f*and

*g*are monotonically increasing functions of α defined by

We will consider both variables and *c*_{out,nu} in turn.

##### A3.2.1 Vary ; Keep *c*_{out,nu} Constant

*f*′ and

*g*′ designate the derivatives of the functions

*f*(·) and

*g*(·). We thus obtain :

The equilibrium state solution of Equation 11 is such that *c*_{out,nu} > *c*_{nu} = *f*(α) (i.e., *c*_{out,nu} − *f*(α) > 0), and *f*′(α) + *g*′(α) > 0 because *f* and *g* are monotonically increasing functions of α. This shows that . As α increases with increasing , all concentrations *c*_{i} as well as the growth rate λ (which is an increasing function of *c*_{me}) also increase. This explains the pattern observed in Figure 5.

##### A3.2.2 Vary *c*_{out,nu}; Keep Constant

*c*

_{out,nu}. Differentiating this equation with respect to the variable

*c*

_{out,nu}givesWe thus obtain dα/d

*c*

_{out,nu}:

#### A3.3 Vary (*k*_{me}, *C*_{me}); Keep the (*A*_{nu}, *A*_{me}) Combination and (, *c*_{out,nu}) Constant

*k*

_{me},

*C*

_{me}). Equation 11 may be rewritten aswhere

*F*is a monotonically increasing function of α defined byWe will consider both variables

*k*

_{me}and

*C*

_{me}in turn.

##### A3.3.1 Vary *k*_{me}; Keep *C*_{me} Constant

*k*

_{me}. Differentiating this equation with respect to the variable

*k*

_{me}giveswhere

*F*′ designates the derivative of the function

*F*(.).

Feeding this information into Equation 45 also shows that dλ/d*k*_{me} > 0, because *F* and *D* are monotonically increasing functions of α.

As a result, increasing *k*_{me} results in a decrease in all concentrations but in an increase in the growth rate λ. This is consistent with the pattern observed in Figure 6.

##### A3.3.2 Vary *C*_{me}; Keep *k*_{me} Constant

*C*

_{me}. Differentiating this equation with respect to the variable

*C*

_{me}givesFurther noting thatand thatresults inThis shows that , because

*F*,

*c*

_{me}, and

*D*are monotonically increasing functions of α.

*C*

_{me}< 0, because

*F*,

*c*

_{me}, and

*D*are monotonically increasing functions of α.

As a result, increasing *C*_{me} results in an increase in all concentrations and in a decrease in the growth rate λ. This is consistent with the pattern observed in Figure 6.

## Author notes

Contact author.

Laboratoire d'Informatique (LIX), École Polytechnique, 91128 Palaiseau Cedex, France. E-mail: bigan.erwan@orange.fr (E.B.); steyaert@lix.polytechnique.fr (J.-M.S.)

Laboratoire Matière et Systèmes Complexes, UMR7057 CNRS, Université Paris Diderot, 75205 Paris Cedex 13, France. E-mail: stephane.douady@univ-paris-diderot.fr (S.D.)