## Abstract

It is widely accepted that autocatalysis constitutes a crucial facet of effective replication and evolution (e.g., in Eigen's hypercycle model). Other models for early evolution (e.g., by Dyson, Gánti, Varela, and Kauffman) invoke catalytic networks, where cross-catalysis is more apparent. A key question is how the balance between auto- (self-) and cross- (mutual) catalysis shapes the behavior of model evolving systems. This is investigated using the graded autocatalysis replication domain (GARD) model, previously shown to capture essential features of reproduction, mutation, and evolution in compositional molecular assemblies. We have performed numerical simulations of an ensemble of GARD networks, each with a different set of lognormally distributed catalytic values. We asked what is the influence of the catalytic content of such networks on beneficial evolution. Importantly, a clear trend was observed, wherein only networks with high mutual catalysis
propensity (*p*_{mc}) allowed for an augmented diversity of composomes, quasi-stationary compositions that exhibit high replication fidelity. We have reexamined a recent analysis that showed meager selection in a single GARD instance and for a few nonstationary target compositions. In contrast, when we focused here on compotypes (clusters of composomes) as targets for selection in populations of compositional assemblies, appreciable selection response was observed for a large portion of the networks simulated. Further, stronger selection response was seen for high *p*_{mc} values. Our simulations thus demonstrate that GARD can help analyze important facets of evolving systems, and indicate that excess mutual catalysis over self-catalysis is likely to be important for the emergence of molecular systems capable of evolutionlike behavior.

## 1 Introduction

The fundamental question of how primitive life emerged on the prebiotic Earth has drawn considerable scientific attention throughout the centuries [2, 5, 14, 15, 22, 42, 59, 64]. The path from organic mixtures (i.e., the primeval soup) to reproducing lifelike protocells has traditionally been dominated by two different views: the genetic, or *replicator-first*, approach, and the *metabolism-first* approach [2, 42]. Both acknowledge the need for reliable information storage and transfer, assisted by self-replication. The replicator-first approach suggests that life began with a single self-perpetuating biopolymer (e.g., RNA) [14, 15, 18, 19, 22, 37, 42, 64], which later evolved into multimolecular networks under the replicator's control. Orgel [41] has highlighted the relationship between molecular replication and the concept of autocatalysis or self-catalysis. The metabolism-first approach suggests that the very first life precursors must have been relatively complex molecular networks arising via spontaneous accretion of simpler organic molecules [3, 9, 24, 25, 34, 48, 51, 53, 60]. In this scenario, it is further proposed that faithful reproduction directly stems from certain network attributes. Therefore, one should better understand the network properties of the implicated molecular assemblies [1, 47, 57, 66] if one can merge the two seemingly conflicting scenarios for prebiotic evolution.

One embodiment of the metabolism-first view is the lipid world scenario, which considers non-covalent assemblies of amphiphiles, such as micelles and vesicles formed by lipids [8, 39, 48, 50, 53, 69]. These are assumed to store information in the form of nonrandom molecular compositions, and pass it to progeny via homeostatic growth accompanied by fission [49]. The graded autocatalysis replication domain (GARD) kinetic model for prebiotic evolution quantitatively describes the details of such a process. It elaborates some of its evolution-related attributes [10–12, 27, 30, 44, 50, 58, 62, 67, 68], with an implied route to minimal protocells [8, 45, 49, 63, 65]. The model is based on a catalytic network, usually presented in the form of a matrix β with autocatalysis (self-catalysis) and cross (mutual) catalysis terms. Importantly, the system is kept away from thermodynamic equilibrium by assembly fission [49]. Key in GARD dynamics are compotypes—clusters of replication-prone quasi-stationary states (composomes, a term derived from the notion of compositional genomes [49]), proposed to play a crucial role in the GARD's evolutionary behavior. Introducing substantial inhibition in β is expected to result in net catalysis because an inhibitor of an inhibitor is an activator [20].

Catalysis, the enhancement of reaction rate by an external chemical component, was recognized as early as 1836 by Berzelius, and Ostwald applied the term *autocatalysis* in 1890 to reactions that gain speed as they proceed [26, 44]. In the genetic approach to life's origin, researchers invoke one or several autocatalytic molecules as the core of a prebiotic entity. This is exemplified by the *hypercycle*, a set of self-replicating polynucleotides, coding for and acted upon by enzymes [10, 30, 58]. In the metabolism-first domain, autopoiesis [67] and the chemoton model [12] are examples of collective autocatalysis [25].

Collectively autocatalytic systems feature a central role not only for self-catalysis, but also for mutual catalysis. In this, they arguably resemble present-day living cells, which harbor self-catalytic polynucleotides as well as a plethora of mutual catalysts that constitute metabolic pathways. Here we utilize a metabolism-first simulator to examine the relative importance of the two catalytic modes (self- and mutual catalysis). Previously [11], an abstract chemistry model has been used to demonstrate that self-maintaining organizations arise only once self-catalysis is completely inhibited [11, 62]. We attempt to extend such results in the realm of the GARD kinetic model, asking what features of the β network contribute to the evolution of the ensuing compositional assemblies. It is shown that excess mutual catalysis is a necessary, though not sufficient, condition for displaying several evolutionlike characteristics, including a high number of composome types, higher evolvability scores, and a significant response to selection.

Recently, it has been argued that collectively autocatalytic metabolic networks, such as the GARD, do not allow for fitter compositional genomes to be maintained by selection. Vasas et al. [68] compared the frequency ranking of random GARD compositional assemblies before and after selection, and found that the relative ranks changed only slightly. This was taken as evidence for an inherent evolutionary limitation of metabolism-first scenarios. Here it is demonstrated, based on a large number of simulations, that when quasi-stationary composomes rather than arbitrary compositions serve as selection targets, GARD networks *are* capable of a significant response to selection. Importantly, this can happen chiefly when a high proportion of mutual catalysis is present in a GARD network. The results highlight the potentially important role of mutual catalysis, as compared to self-catalysis, in the emergence of early lifelike
systems.

## 2 Model and Methods

### 2.1 GARD Formalism

*n*

_{i}within the assembly:The vector dynamics is governed by mutually catalytic interactions among the invariable number of constituent molecule types,

*N*

_{G}. The assembly grows by accretion of environmental molecules, and once a limiting size

*N*

_{max}is attained, random fission is applied, producing two progeny of the same size,

*N*

_{min}=

*N*

_{max}/2, one of which grows again, generating growth-fission cycles of consecutive generations. GARD dynamics is described by a set of ordinary differential equationswhere

*dn*

_{i}/

*dt*is in units of the individual reaction rates at which the counts of elements are changing [49], and

*k*

_{f}and

*k*

_{b}are respectively the basal forward and backward rate constants (joining and leaving the assembly). Typically , reflecting a high equilibrium constant

*k*

_{f}/

*k*

_{b}for spontaneous amphiphile accretion (Table 1). Here ρ

_{i}is the buffered concentration of molecule type

*i*in the environment (assumed here to be equal for all

*i*values),

*N*is the assembly current size, and β

_{ij}is the non-negative matrix element signifying the rate enhancement exerted by an assembly molecule of type

*j*on an incoming or outgoing molecule of type

*i*The chemical reaction in Equation 3 embodies the notion that molecular catalysis equally affects the forward and the backward rates, obeying the constraint that a catalyst may not change the equilibrium constant of the reaction it affects. This means that even under catalytic action, the relationship prevails.

The matrix β represents a network of self-catalytic (diagonal elements) and mutually catalytic (off-diagonal elements) catalytic interactions (Figure 1), with self-catalysis represented by the case *j* = *i* ( Appendix A.1, Equation 13). The matrix elements are randomly drawn from a lognormal distribution ( Appendix A.1 and Equation 12) [49].

### 2.2 GARD Simulations

The model is subjected to a kinetic Monte Carlo simulation based on Gillespie's algorithm [16, 17, 51] using parameter values similar to those employed in previous studies (Table 1). Simulations are run using MATLAB versions 7.6–7.10 (the GARD10 code is available upon request). A set of 10,000 GARD simulations is generated, all with the same parameters, and each with a different matrix β generated by the MATLAB pseudorandom number generator with seeds 1–10,000. The validity of the conclusions drawn here is ascertained by repeating the simulations with smaller data sets, with seeds 1–2,000 and 2,001–4,000, striving to verify that the entire 10,000-strong data set adequately represents the GARD simulation space. The random sampling of β values may be perceived as representing different possible GARD environmental chemistries.

*N*

_{G}diagonal elements and the total number of elements is

*N*

_{G}

^{2}, an appropriate correction is introduced. Thus, the excess of mutual catalysis is represented by

*p*

_{mc}> 1, while the excess of self-catalysis (or autocatalysis) is portrayed by

*p*

_{mc}< 1.

### 2.3 Compositional Similarity and Compotypes

^{χ}and ν

^{δ}of the respective assemblies at generations χ and δ is defined as the dot product

*H*(see Equation 5) of their composition vectors [49], typically calculated at assembly size

*N*

_{max}(end of the growth cycle).GARD dynamics is visually portrayed by a

*similarity carpet*, showing

*H*between any pair of parent assemblies during a simulation (e.g., Figure 10 in Appendix A.4). Composomes, appearing as dense areas with high similarity near the main diagonal, are defined as any two consecutive generations where

*H*(χ, χ + 1) ≥ 0.9 [56]. Inter-composome similarity is viewed by off-diagonal examination. The time duration of different generations (Equation 2) is different due to different growth pathways; hence a certain level of selection is already achieved by the matrix β causing composomes to appear more frequently than random compositions [49].

All the compositions belonging to composomes in the entire simulation undergo *k*-means clustering [56, 61], and the centers of mass of the resulting clusters are defined as compotypes.

### 2.4 Similarity Autocorrelation

*c*(Δ

*t*), akin to a Fourier transform of the compositional similarity time series, is defined bywhere 〈…〉 denotes averaging over all generation pairs fulfilling δ − χ = Δ

*t*. This function is history independent, that is, no conditions are imposed on the events occurring between generations χ and δ.

*c*(Δ

*t*) is fitted with a single exponential with parameters τ and

*H*

_{0}using a least squares fit (see Appendix A.2, and Figure 12 in Appendix A.4):The parameters τ and

*H*

_{0}are used to define a measure of evolvability (Section 3).

### 2.5 Selection in GARD

*T*. A selection-GARD simulation is then run, whereby the growth of an assembly at generation χ is biased toward

*T*via a growth bonus parametermanifested as a temporary enhancement of the corresponding β

_{ij}values, as suggested [68], where

*s*> 1 is a fitness gain, embodying a selective advantage, and for consistency with previous work [68]

*H*(ν

^{χ},

*T*) is calculated at assembly size

*N*

_{min}, that is, the beginning of the growth cycle.

_{ij}− is obtained at each generation according towhere

*i*and

*j*are molecular type indices, and β

_{ij}modification is effected for all

*i,j*(and

*j,i*) pairs contained within the current assembly. Thus, the network will be perturbed only at edges present within the current assembly according to how similar the current assembly is to the target. In the selection-GARD simulation, a compotype

*T*− is identified as having the highest

*H*value with respect to

*T*. An unambiguous identification of

*T*− is afforded by the fact that the mean similarity between

*T*and

*T*− in the entire data set is

*H*= 0.9933 ± 0.0217. The selection excess is subsequently defined aswhere

*f*

_{T−}and

*f*

_{T}are the fractions of generations belonging the respective compotype (before and after selection). Selection excesses ≥1.05 and ≤0.95 are respectively taken to represent positive and negative target selection; the rest are taken to signify no selection.

### 2.6 Selection Dynamics in a Population of Compositional Assemblies

An initially random population of a fixed number of assemblies, *L*_{pop}, is allowed to simultaneously grow according to Equations 1 and 2 and its idiosyncratic composition. When one of the assemblies reaches the limiting size *N*_{max}, it divides by random fission, and a randomly chosen assembly from among the other *L*_{pop} − 1 assemblies is removed, thus keeping the population size constant. This is repeated for GEN splits (Table 1). This protocol is based on the classical Moran process [36, 68, 70], and to some degree reflects an earlier attempt to simulate GARD populations [38].

The frequency of the target in each population is defined as the number of assemblies that are highly similar (*H* ≥ 0.9) to the target compotype taken from regular GARD for the same β network (Figure 13 in Appendix A.4). Selection is exerted by performing a simulation with the same parameters, biasing the growth of assemblies toward a target compotype as for regular GARD (Equations 8 and 9). The selection excess is defined as in Equation 10, where *f*_{T−} and *f*_{T} are respectively the fractions of assemblies within the population belonging to the target compotype before and after selection.

## 3 Results

### 3.1 Selection in GARD

We used GARD simulations to ask what is the selection response of compositional assemblies. A value for the selection excess was obtained for each of 10,000 simulations, using a modest value of the fitness gain, *s* = 1.1, in line with previous work [68]. Figure 2a shows the correlation between the frequencies of the target compotype with and without selection (examples of regular GARD carpets before and after selection are given in Figure 14 in Appendix A.4). An overall skew is seen here toward positive selection. The figure also demonstrates that significant positive selection, as well as negative, occurs over most of the range of *f*_{T}.

Figure 2b shows the distribution of selection excess values for the entire data set (Equation 10). Importantly, a considerable percentage of the simulations (33%) show positive selection, with a mean selection excess of 1.38 for selection excess >1.05, and as much as 10% shows selection excess >1.5. Interestingly, 31% of the cases showed negative selection, with a mean selection excess of 0.775 for selection excess <0.95, and about 36% were neutral to the selection pressure. Similar to the skewness in Figure 2a, there is a slight bias in favor of positive selection, as indicated by an overall mean selection excess equal to 1.05. Notably, higher mean selection values positively correlate with the number of other compotypes coexisting with the target compotype in a given system (Figure 15 in Appendix A.5).

GARD simulations are used to see how attributes of the catalytic network embodied in the matrix β govern the evolution-related dynamics of compositional assemblies. It is asked how the mutually catalytic power *p*_{mc} (Equation 4) influences the selection response. A clear trend appears here, whereby strong positive or negative selection is found almost entirely for *p*_{mc} higher than 1 (Figure 3b).

The main trends appear also at lower simulation counts, barring small-number fluctuations at high *p*_{mc} (Figure 3a). For example, for the range of *p*_{mc} > 100, a meaningful *p*-value with 5% significance level is achieved only after performing more than 2,500 simulations (Table 3 in Appendix A.6). The other two evolution-related parameters withstand similar scrutiny (below).

### 3.2 Populations of GARD Assemblies

The foregoing simulations of the regular GARD model addressed the case in which at each time point only one GARD assembly is considered. To enhance the capacity to draw conclusions about selection in GARD, 1,000 simulations were performed, each for a population of 1,000 assemblies, under the constant population conditions. Figure 4 shows an example of the dynamics for one of the networks. Starting from a population of random assemblies, the population frequency of the target compotype gradually grows over the first 10,000 split events, reaching a plateau with fluctuations, signifying the compositional preference imposed by the matrix β towards this compotype. When selection toward this compotype is applied (Equation 9), this general behavior is retained, with a faster growth and a higher plateau, that is, showing positive response to selection. Similar to Figure 3b, strong positive or negative selection is much more prevalent for *p*_{mc} values higher than 1 (Figure 5b).

The effect of selection pressure on the frequency of the target compotype for all 1,000 networks is presented in Figure 5a. Similarly to Figure 2a, an overall skew toward positive selection is seen (about 50% of cases), with some cases of negative (about 15% of cases) or no response to selection, and with a mean selection excess of 1.254 ± 0.804. Significantly, the ratio of the number of simulations showing positive selection to that showing negative selection increased more than threefold, from 1.06 in the regular GARD to 3.33 in the population-GARD. In line with previous work [68], the growth bonus was calculated when the assembly size was *N*_{min} (Equation 8). When the bonus was calculated for all time points between *N*_{min} and *N*_{max} (for a smaller set of 100 population-GARD simulations), the overall selection response seems to become even more positive (70% of cases), with a higher selection excess value of 1.399 ± 0.997.

### 3.3 Compotype Diversity

The influence of *p*_{mc} on one of the attributes of GARD diversity, the mean number of different compotypes appearing in a simulation, is now analyzed. It is found that as *p*_{mc} increases, so does the mean number of compotypes, reaching a maximal value of nearly 3 at *p*_{mc} = 100 (Figure 6a). Furthermore, in the realm of excess self-catalysis (*p*_{mc} < 0.5), one compotype appears in an overwhelming majority of the cases (91%) (Figure 6b). In contrast, compotype counts between 2 and 6 are almost entirely confined to the domain of excess mutual catalysis (*p*_{mc} > 2). Curiously, even among the ∼5,000 simulations that show only one compotype, a large majority have *p*_{mc} > 2, suggesting that high mutual
catalysis is a necessary but not sufficient condition for a high number of compotypes.

### 3.4 GARD Evolvability

The similarity autocorrelation function (Equation 6) and its derived parameters (Equation 7) are employed to obtain information on the evolutionlike dynamics of GARD assemblies. One possible interpretation of the value of τ is a depiction of the whole-simulation average of the assembly compositional lifetime. Longer τ may be taken to represent better average maintenance of compositional similarity between consecutive GARD generations, symbolizing better reproduction fidelity. Likewise, 1/τ may be thought of as related to the compositional mutation rate. Indeed, effective compositional preservation is implicated by the most frequent number of generations, τ ≈ 3, with a non-negligible probability for τ ≥ 10 (Figure 7a). Note that τ does not represent the composomal lifetime. In fact, the
most probable target compotype lifetime (taking for simplicity the maximal time from each simulation) is 30, and the average is 434 generations (Figure 7c). The other similarity autocorrelation parameter, *H*_{0}, is interpreted here as showing the residual compositional similarity among assemblies along many generations in the entire simulation. Thus, 1 − *H*_{0} is taken as proportional to the overall compositional diversity of assemblies across the entire simulation. Note that *H*_{0} is not strongly correlated with the compotype count (Figure 16 in Appendix A.5, correlation coefficient −0.049, *r*^{2} = 0.89) and therefore constitutes a rather independent diversity assessment attribute. The most probable *H*_{0} value is ∼0.5, with a smaller probability peak at *H*_{0} ≈ 1. The latter stems from simulations in which a single compotype tends to dominate.

*p*

_{mc}values higher than 1 (Figure 8).

### 3.5 The Effect of Assembly Size

The effect of the assembly pre-fission size *N*_{max} on GARD's evolutionlike behavior was studied by performing two additional sets of 10,000 simulations, each with the same parameters as in Table 1 except for *N*_{max} = *N*_{G}/2 and 2*N*_{G} (Figure 9; Figure 17 and Table 3 in Appendix A.6). While for the smaller *N*_{max} value, the percentage of beneficial outcomes seems to be even higher than for the nominal *N*_{max} = *N*_{G}, a larger *N*_{max} value appears to have a disruptive effect because the system is nearing the
equilibrium steady state [23]. This is especially seen for the compotype count and the evolvability score.

## 4 Discussion

### 4.1 The Significance of Mutual Catalysis

One of the dominant concepts in prebiotic evolution research is the replicator-first scenario [10, 32, 40]. Based on the concept that molecular replication is related to self-catalysis [41], such views may be perceived as related to the RNA-first scenario, positing that life began with a unique self-replicating polyribonucleotide. In this realm, it is argued that more complex interaction networks have arisen only at later stages, as when precursors for the autocatalytic molecule have been exhausted [31]. Our simulation results demonstrate an advantage for a network-first scenario, in which a large number of molecular components mutually interact. While arising from a metabolism-related framework, such results may be taken as relevant to the question of whether life's early precursors were a set of replicators or a metabolic network. Note that the present work makes a direct comparison between a metabolic network with frequent self-catalytic interactions and a metabolic network with frequent mutually catalytic interactions, and therefore has only indirect relevance to the question of the validity of replicator models. It is conceivable that future work incorporating templating biopolymers together with mutually catalytic networks will better resolve this issue.

A widespread argument against metabolism-like entities being the first seed of life is the assertion that metabolic networks cannot store and propagate information. The GARD model may be viewed as a counterexample, as it is endowed with a (limited) capacity to store and propagate compositional information. This has implications for a set of previously proposed models involving networks of molecular interactions. Two of the earliest relevant concepts are Gánti's chemoton [12, 13, 63] and Maturana and Varella's autopoietic systems [35, 67]. Autopoiesis characterizes a spatially confined network of molecular components, whose mutual interactions continuously regenerate the network itself. The chemoton is described as a system of three subnetworks: metabolite generation, template copying, and membrane synthesis. We prudently suggest that GARD may be viewed as a special case of autopoietic-chemoton-like models, where template copying and compartmentation are embodied in one entity, and a continuous supply of metabolites is afforded by the spontaneous accretion of lipids from the buffered environment.

### 4.2 The Effect of Mutual Catalysis on GARD Diversity and Evolvability

An important result of this work is that networks within a certain range of kinetic parameters, namely those that exhibit excess mutual catalysis, lead to enhanced diversity and evolvability of GARD compotypes. The compotype count is a direct indication of the degree of composomal diversity. This result is related to an important aspect of early evolution: Self-catalysts tend to propagate their own identity and suppress processes essential for the increasing complexity necessary for transitions from early seeds of life toward systems resembling present-day life. The presently demonstrated importance of mutual catalysis echoes the notion of systems prebiology [21, 57], whereby it is suggested that life began its trajectory from complex chemical mixtures obeying network behavior similar to that of metabolism in present-day cells.

### 4.3 Compotypes as Selection Targets

One of the unique corollaries of the GARD model is the emergence of composomes, dynamic states of compositional assemblies that embody both metabolism-like characteristics and a rudimentary capacity to store and propagate molecular information [49]. Composomes may be considered as forming bridges between seemingly disparate views of the early seeds of life: metabolism first and replicators first. Compotypes are further defined as centers of mass of composome clusters, which may be regarded as analogous to species or quasi species [6]. This is due to the fact that a compotype is a distinct entity, with distinct physical properties and hence fitness encoded in its compositional information, different from those of other compotypes but still harboring considerable internal variability of constituents. Therefore, compotypes are considered as natural targets of selection, as compared to randomly chosen compositions, as previously pursued [68]. Note that here we have a measure of selection inherently present in the GARD model even in the absence of external selective pressure, due to the fact that different composomes have different average growth rates. This is seen in the present population GARD simulations, which are seeded with a random population, but show a gradual increase of the population frequency of a specific compotype even in the absence of externally imposed selection. This increase comes at the expense of other compositions because of the constant population condition.

### 4.4 Selection in a GARD

The present results show that GARD assemblies can exhibit positive or negative selection toward a compotype target, as well as no selection at all. While in regular GARD the overall average selection excess is merely 1.05, it is noteworthy that as many as 10% of the simulations show high selection excess, >1.5. Importantly, these general results are borne out both in simulations of the regular model and in simulations involving populations of assemblies. Previously, GARD population dynamics has been studied by addressing various emergent properties, including a comparison of finite and infinite chemical environments [38]. Another study [70] showed that compositional inheritance also emerges in the GARD model variants involving assembly populations and spatial proximity interaction effects, and that it emerges in both a thermodynamic and a kinetic interaction regimen.

Analyzing GARD, both positive and negative selections can be observed in practice only when the underlying network exhibits mutual catalysis excess. This conclusion is strengthened by its demonstration in two different simulation modes: in the regular model and in populations. Notably, positive selection is observed appreciably more often in population GARD simulations, perhaps reflecting the advantage of addressing populations of competing entities with different reproductive rates. Furthermore, this selection response tends to be augmented as the number of coexisting compotypes increases in a given simulation, which may indicate a capacity of selective forces to provide an edge to the target compotype in inter-compotype competition. Further in-depth analyses (currently underway) of the ultrastructure of the β network, as well as subnetworks (quasi compartments [68]), could lead to a better understanding of the influence
of *p*_{mc} and the compotype count on selection.

The present method for biasing the growth rate of a GARD target composition is in principle similar to that used previously [68]. In both cases, modifications are in effect introduced to β matrix elements. However, the previous analysis utilizes an interim formalism, the Eigen equation, for replication-mutation dynamics [10], and the selection-related modification is exerted by multiplying the growth rate by *fH*, defined in the same way as in Equation 8. The method utilized here involves direct modification (Equation 9), a possible explanation for the discrepant results obtained by the two reports. There are, however, additional significant differences between the two studies: (a) a pre-fission value *N*_{max} = 100 used here, as compared to *N*_{max} = 6 used previously, an obligatory small value required for the realistic application of Eigen's formalism with the available computing power; (b) a large difference in repertoire size (*N*_{G} = 100 here versus *N*_{G} = 10 in the earlier study); (c) the performance here of 10,000 random simulations, considered essential for proper statistical rigor, as compared to only a single simulation done previously. Both points (b) and (c) provide a significant edge to the present simulations in sampling the β interaction space, which allows drawing conclusions with higher certainty. In the future it will be interesting to consider additional methodologies to exert external selection. One could be a variant of the presently used method, whereby the β network will be biased by a constant factor and not employing target
similarity-oriented bias. Another could be biasing the environmental concentration ρ_{i} (Equation 2) by a constant factor based on the molecules that are contained in the target compotype.

## 5 Conclusion

The GARD model embodies the inheritance of compositional information in the realm of a lipid world scenario for early evolution [20, 21, 23, 27, 48, 49, 51, 55–57]. The GARD has recently been pursued in several additional publications [20, 39, 68, 70] and has been chosen as an archetypal metabolism-first realization [68]. This suggests that despite being a simulated toy model, the GARD has sufficient complexity to shed light on some important questions in the field of prebiotic origins. In the present work an attempt is made to shed further light on some of the GARD's evolutionary features. It is expected that the present insights will become instrumental in further efforts to extend the GARD beyond the monomer world [54], as has been preliminarily explored [55]. This might be necessary to reveal the capacity of the GARD model to capture the much-needed open-ended attributes of natural selection and evolution.

## Acknowledgments

We thank Raphael Zidovetzki and Tsviya Olender for enlightening comments, Aron Inger for assistance in coding, and Ágnes Tóth Petróczy, Azhar Ali Shah, Hugues Bersini, Leong Ting Lui, and Natalio Krasnogor for discussions. This work is partly supported by EU-FP7 project MATCHIT and by the Crown Human Genetics Center at the Weizmann Institute of Science.

*K*-means clustering: A half-century synthesis.

### Appendix

#### A.1 Distribution and Sampling of the GARD Matrix β

_{ij}values are randomly generated based on a lognormal distributionwhere μ and σ are the mean and standard deviation, respectively, which can be considered as a “natural” distribution [33], in accordance with the receptor affinity distribution formalism [28, 29, 46], and it was also shown that a lognormal β increases the reproduction fidelity over the normal β in GARD [52]. Each randomization of the β network may be thought of as representing the relative rates of the

*N*

_{G}molecules as they might ensue from different possible GARD environments.

_{ii}and β

_{XY}is clarified on noting that in the GARD, molecules have two states,

*in*and

*out*, which behave as distinct chemical species. While it is possible that more complex pathways would also be autocatalytic [44], this work refers to self-catalysis as the simplest closed subnetwork of the β network, containing one element (Figure 1).

#### A.2 Fitting the Similarity Autocorrelation Function

The fitting procedure is as follows: (1) Calculate *H*_{0} as the mean of *c*(Δ*t*) in the interval [*GEN*/4, *GEN*/2]. (2) Guess τ^{‡} as the first instance *c*(Δ*t*) drops below *H*_{0}. (3) Smooth the *c*(Δ*t*) tail by forcing: *c*(Δ*t* > τ^{‡}) = *H*_{0}. (4) Fit an exponential (Equation 7) to the smoothed *c*(Δ*t*), using nonlinear least squares with a tolerance of 10^{-5}.

Examples are given in Figure 12 in Appendix A.4.

#### A.3 *p*-Values

See Table 2.

Random-seed range . | p_{mc} > 100∗. | Selection excess^{†}. | p-Value. |
---|---|---|---|

50–300 | 3 | 0.973 ± 0.0395 | 3.57 × 10^{-1} |

300–800 | 5 | 1.311 ± 0.389 | 1.49 × 10^{-1} |

1,000–3,500 | 40 | 1.100 ± 0.325 | 5.95 × 10^{-2} |

5,000–10,000 | 70 | 1.119 ± 0.248 | 1.45 × 10^{-4} |

1–10,000 | 143 | 1.105 ± 0.272 | 8.03 × 10^{-6} |

Random-seed range . | p_{mc} > 100∗. | Selection excess^{†}. | p-Value. |
---|---|---|---|

50–300 | 3 | 0.973 ± 0.0395 | 3.57 × 10^{-1} |

300–800 | 5 | 1.311 ± 0.389 | 1.49 × 10^{-1} |

1,000–3,500 | 40 | 1.100 ± 0.325 | 5.95 × 10^{-2} |

5,000–10,000 | 70 | 1.119 ± 0.248 | 1.45 × 10^{-4} |

1–10,000 | 143 | 1.105 ± 0.272 | 8.03 × 10^{-6} |

∗The number of networks exhibiting high *p*_{mc} value.

^{†}Mean and standard deviation of the selection excess of these networks (under regular GARD simulations).

#### A.4 Examples

See Figures 10,^{11}^{12}^{13}–14.

#### A.5 Selection Excess and the Number of Compotypes

See Figures 15–16.

#### A.6 Assembly Size

See Figures 17 and Table 3.

Number . | Mean value . | ||
---|---|---|---|

N_{max} = 2N_{G}. | N_{max} = N_{G}
. | N_{max} = N_{G}/2. | |

NC | 1.20 | 2.03 | 3.38 |

EV | 0.72 | 1.11 | 1.35 |

SE | 1.01 | 1.05 | 1.04 |

SE > 1.05 | 1.36 (8%) | 1.38 (33%) | 1.28 (48%) |

SE < 0.95 | 0.85 (13%) | 0.77 (31%) | 0.71 (30%) |

Number . | Mean value . | ||
---|---|---|---|

N_{max} = 2N_{G}. | N_{max} = N_{G}
. | N_{max} = N_{G}/2. | |

NC | 1.20 | 2.03 | 3.38 |

EV | 0.72 | 1.11 | 1.35 |

SE | 1.01 | 1.05 | 1.04 |

SE > 1.05 | 1.36 (8%) | 1.38 (33%) | 1.28 (48%) |

SE < 0.95 | 0.85 (13%) | 0.77 (31%) | 0.71 (30%) |

## Author notes

Contact author.

Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel. E-mail: [email protected] (O.M.); [email protected] (D.L.)