## Abstract

Understanding the dynamics of biodiversity has become an important line of research in theoretical ecology and, in particular, conservation biology. However, studying the evolution of ecological communities under traditional modeling approaches based on differential calculus requires species' characteristics to be predefined, which limits the generality of the results. An alternative but less standardized methodology relies on intensive computer simulation of evolving communities made of simple, explicitly described individuals. We study here the formation, evolution, and diversity dynamics of a community of virtual plants with a novel individual-centered model involving three different scales: the genetic, the developmental, and the physiological scales. It constitutes an original attempt at combining development, evolution, and population dynamics (based on multi-agent interactions) into one comprehensive, yet simple model. In this world, we observe that our simulated plants evolve increasingly elaborate canopies, which are capable of intercepting ever greater amounts of light. Generated morphologies vary from the simplest one-branch structure of promoter plants to a complex arborization of several hundred thousand branches in highly evolved variants. On the population scale, the heterogeneous spatial structuration of the plant community at each generation depends solely on the evolution of its component plants. Using this virtual data, the morphologies and the dynamics of diversity production were analyzed by various statistical methods, based on genotypic and phenotypic distance metrics. The results demonstrate that diversity can spontaneously emerge in a community of mutually interacting individuals under the influence of specific environmental conditions.

## 1 Introduction

### 1.1 The Rise of Virtual Biology

Understanding the detailed mechanisms and reconstructing the precise lineage of evolutionary biological processes constitute difficult problems for at least two reasons. First, available data (fossils, DNA, extant species) is still scarce compared to the whole history of life on earth, since obviously only one instance of this history exists and the great majority of its content has been lost. Second, most significant evolutionary processes occur on the scale of millions of years, whereas modern experimental evolution in biology is limited to simple and fast-reproducing organisms that are restricted to adaptation in artificially altered environmental conditions. In fact, it can be asked whether biological experimental evolution is inherently more valuable than computer modeling and simulation, considering that the former is also significantly more costly than the latter [34]. Thus, evolution is being increasingly studied through alternative approaches involving the theoretical reconstruction and simulation of living systems at multiple scales: genetic, cellular, developmental, population, and ecosystem. In particular, computational models have the great benefit of potentially producing a large number of experiments that can condense long evolutionary periods into short computing time frames, while making vast collections of data available for the analysis and extraction of relevant properties [26, 28]. Moreover, increasing computer power and storage capacity provide the means to actually simulate explicative models of biological evolution.

Today, numerical experiments are commonplace in biological research. In the last 20 years, thefield has moved from studying the evolution and population dynamics of simplified individuals [12, 38] to multi-species interactions and diversification [8, 19]. A great number of evolutionary computational models have been recently proposed, whether at the abstract level of large interacting populations [10, 48] or at the fine-grained level of multicellular growth [16]. Traditionally, models of population dynamics have been formulated as systems of differential equations [32], whether well mixed (ODEs) or spatially extended (PDEs), in which the established goal was to find analytical solutions. These approaches present a number of drawbacks, however, since their formulation and execution demand expert mathematical knowledge, yet they also need to remain simple enough to be mathematically tractable. For all these reasons, mathematical descriptions at the macroscopic level are generally unrealistic and difficult to extend with new features. On the other hand, *agent-based* models (ABMs) propose to describe communities in a bottom-up fashion through the properties of their individual members, the developmental and behavioral rules they obey, and their interactions with one another and the environment [20]. The collective behavior of the system, originating from complex multi-agent dynamics, is then obtained by computer simulation. Compared to analytical frameworks, two major properties are distinctive of the agent-based computational approach: *emergence* and *adaptation*. New properties at higher scales, which were not explicitly included in the model, appear in the system as the individuals interact. This makes it possible to describe and explain how certain properties observed in real communities relate to their lower-level features. Simultaneously, the community can also evolve and adapt to new environmental conditions, therefore broadening the scope of possible experiments and testable hypotheses.

### 1.2 A Generative Approach to Diversity

In this work, we focus on an especially puzzling aspect of evolution: *diversification*. We simulate a virtual community of plants to study the emergence and dynamics of genomic and phenotypic variation during evolution. To model plants, we use *rewriting systems*, a family of formal methods widely investigated in theoretical computer science and capable of encoding and generating complex structures. In this formal framework, parts of an initial object, generally strings of symbols, are iteratively replaced by other parts, generally longer string segments, following *rewrite rules*. Among the most popular and best-studied rewriting systems are *L-systems*. First proposed by Lindenmayer [27] (hence its name), L-systems are a class of formal grammars that proceed by parallel application of the rewrite rules to the current string. Parallel transformation (context-free or context-sensitive) makes L-systems especially suitable for the computational modeling of the development of unicellular colonies and multicellular organisms [21]. In particular, they have been extensively used in plant simulation [37], due to the ease of converting generated strings into 2D or 3D treelike structures by means of turtle geometry (e.g., in the Logo computer language) [29, 36]. Through the years, L-systems have found many applications in plant anatomy, physiology, and morphogenesis [2, 11, 18, 40]. Remarkably few of these efforts, however, have examined plant generation on the *evolutionary* scale, nor do they use genetic algorithms. Yet L-systems provide a rather straightforward model suitable for evolutionary exploration: Just as in biological development, where complex organisms can be considered as emerging from the “interpretation” of one-dimensional strings (genomes), L-systems can encode complex structures through fairly simple sets of rules. Encoding large phenotypes with small genotypes via an intermediate developmental stage has been termed *indirect* or *implicit* encoding in the evolutionary computation community [41].

A first attempt at evolving L-systems was proposed by Koza [25]; it used genetic programming and represented the rewrite rules as labeled tree graphs. The goal of the search was to find specific rules that could generate a structure identical to a predefined one. Since then, other works based on evolutionary complex structures encoded by L-systems have been proposed. Jacob [21, 22] presented a variant of genetic programming to evolve context-free and context-sensitive L-systems that could generate plants similar to real ones. The fitness of individuals was based on the volume and number of blooms and leaves of the virtual plant during each iteration of the system. He also experimented with an ecosystem of different coevolving plant species [23]. Ochoa [35] designed a model for generating 2D plant morphologies from D0L systems (the simplest class of L-systems, where D stands for deterministic and 0 for context-free). It resembles the one that we propose here, although her work was more focused on exploring the model itself than on using it to investigate evolutionary dynamics. Another similar, albeit only briefly sketched, framework was proposed by Mock [31]. Ebner et al. [13, 15] presented a light-seeking 3D plant model based on L-systems. They also observed [14] the *Red Queen effect* under competitive simulations, that is, the fitness either remained constant or decreased while evolutionary progress was still going on. However, their simulation evolved to a dominant stable strategy with no further progress. Toussaint [43] defined a very similar model to study the role of neutral mutations in the evolvability of genetic representations, but it had additional layers of complexity in the genetic representation. Finally, Bornhofen and Lattaud [6, 7] evolved generic virtual plants including their full life cycle. Their simulations ended in ecosystems of tall plants with no phenotypic diversification. In [8], they concluded that the genetic search space of the D0L system was too limited, constraining evolutionary dynamics too severely and preventing it from evolving a wide variety of plant morphologies.

Here, in contrast to Bornhofen and Lattaud's conclusions, we present a morphological model based on a D0L system that shows a wide range of concurrent diversification in an open-ended evolutionary process (i.e., evolution without a definite maximum fitness, hence no optimal goal or solution to reach). Regarding the issue of genetic search space, an important aspect when designing evolutionary models of complex structures (and especially so when using an indirect encoding) concerns the proper choice of *genetic operators*. For its part, the present work largely relies on *gene duplication*. In biology, gene duplication (leading to the creation of paralogous genes) is a form of *silent* mutation in the sense that it is generally neutral with respect to selective pressure. Although in most cases copied genes could be considered “junk DNA,” since they simply seem to increment redundancy, they provide in fact a fertile coding substrate for the advent of new proteins and new functions. In some models, gene duplication has been combined with traditional evolutionary operators, as in [44], which concluded that creating copies of key portions of the genome was useful because it could both retain their function and leave the copies open to incremental improvement. In the same spirit, we show here the performance of various gene duplication operators combined with usual point mutation operators (alteration, deletion, insertion), and how this can create initially non-disruptive, but later highly productive, genetic modifications.

The remainder of the article is organized as follows. Section 2 introduces our evolutionary model of virtual population, specifying mechanisms at the genetic level, the laws of interaction with the environment, and the mutation and selection rules. We then present the detailed results of a full set of numerical experiments in Section 3, followed by a discussion and our conclusions in Section 4.

## 2 An Evolutionary Model of Virtual Plant Population

Following standard practice in evolutionary computation studies, we present our model in four subsections: the genome and development of plant phenotypes (Section 2.1), the genetic operators (Section 2.2), the environment and associated fitness function (Section 2.3), and the evolutionary scheme (Section 2.4). We sometimes use the term *individual* to refer to a virtual plant.

### 2.1 Genome and Development

As in other works in virtual botany, we choose a grammar-based indirect encoding and define plant growth as a rewriting process. More precisely, a plant develops according to a deterministic, context-free L-system, also called *bracketed D0L system*. In this type of abstract framework, gene regulation is greatly simplified, since all symbols are rewritten in parallel and the only remaining control parameter is the length of the derivation process, that is, the number of global rewriting stages, denoted *n*. Since our study is mainly concerned with the structural evolution of genomes, this parameter will be constant for all practical purposes and set to *n* = 3 in all individuals.

#### 2.1.1 Genotype Representation

The D0L systems used in this article are defined as a triplet ({G,+,−,[,]}, G, {G → *T*}), where there is only one nonterminal symbol, denoted G. Accordingly, there is a single production rule G → *T*, whose right-hand side is a string *T* representing the *genome* of the plant, since it is the only part of the D0L system that changes from plant to plant. The genome *T* is composed of symbols from the alphabet {G,+,−,[,]} and includes at least one instance of the symbol G, with the additional syntactic restriction that brackets must be balanced. Genetic expression and organism development in the proposed model consist of rewriting the axiom G by applying the rule G → *T* exactly *n* = 3 times, using the production rule encoded by the genome. In this way, the encoding method is rather straightforward, as first remarked in [35]: Since only one rewrite rule applies, its right-hand side codes fully, albeit indirectly, for the plant's structure.

The final string generated after three applications of the rule G → *T* is referred to as the *developed string*, and it is denoted *D*(*T*) to highlight the deterministic nature of the system. The top row of Figure 1 illustrates this derivation process. Despite its simplicity compared to most works in L-system evolution, our encoding allows for the definition of robust genetic mutation operators (see Section 2.2).

#### 2.1.2 Phenotype Development

The phenotype of an individual, denoted *P*(*D*(*T*)), is the complete matrix of pixels (bottom of Figure 1; see also a close-up view in Figure 4) produced from the developed string *D*(*T*), also in deterministic fashion. To create the phenotype, the developed string is interpreted graphically through a Logo-style geometrical routine [1]. Initially, a graphic turtle pen points upwards and its memory stack is empty; then each symbol of *D*(*T*) is executed as follows:

+ (−): The turtle changes its current direction by adding (subtracting) an angle of 22°.

[: The turtle pushes (saves) onto the stack its current position and direction.

]: The turtle pops (restores) from the stack the last saved position and direction.

G: The turtle advances 10 units in its current direction, marking the pixels on the way in black (part of the plant) and the last pixel in red (representing a

*leaf*; see Figure 4 and an explanation of its role in Section 2.3).

#### 2.1.3 Genotype-Phenotype Mapping

At this point, it is important to remark that closely similar genotypes can map to very different phenotypes (divergence) and, conversely, the exact same phenotype can be produced by very different genotypes (convergence). Divergence is illustrated in Figure 3, where genotypes G[G+[G][−G]] and G[G+[G][−−G]] differ only by one symbol, yet they produce widely dissimilar graphical forms. Convergence is due mostly (but not exclusively) to *redundancy* in genotypes and consequently in developed strings: For example, the sequence [G][G]G has the effect of drawing the same branch three times, as every closed bracket followed by an open bracket takes the graphical turtle back toits previous bifurcation point. This also means that the total number of branches of the graphical phenotype *P*(*D*(*T*)) is not equal to the number of instances of G in *D*(*T*) (which is the cube ofthe number of instances of G in *T*). For example, for *T* = G[−−GG][G] (left individual in Figure4), the corresponding developed string *D*(*T*) contains 4^{3} = 64 instances of G, whereas the corresponding phenotype *P*(*D*(*T*)) only has 33 distinct branches. We discuss again this issue of relative uncorrelatedness between genomic similarity and phenotypic similarity when defining distance measures in Section 3.2.3. The specific case of genotypic redundancy is addressed in the next section.

#### 2.1.4 Reduction of Genotypes

The mapping from genotypes *T* to phenotypes *P*(*D*(*T*)) is many-to-one: there exist different genotypes that can be graphically translated into the exact same phenotype. Without loss of generality, since the mapping from *T* to *D*(*T*) is one-to-one, let us consider only the transformation of *D*(*T*) into *P*(*D*(*T*)).

For example, the string *D*(*T*) = G creates the simplest phenotype of all, consisting of only one vertical branch. This would also be the case of strings *D*(*T*) = [[G]] (the additional brackets have no effect), *D*(*T*) = [G][G]G (the same branch is drawn three times by the turtle), and *D*(*T*) = [][+−G++] (no effect is produced by empty brackets, +− pairs, or any combination of + and − symbols at the end of a bracket). This raises interesting questions, in particular: Given a phenotype *P*(*D*(*T*)), what is the minimal developed string *D*(*T*), and hence the minimal genotype *T*, that can generate it?

While this question cannot be easily answered for every case, simple heuristics can be applied to a genotype *T* to remove the most blatant redundancies in order to substantially reduce the computational cost of the simulations, producing what can then be called the *reduced* genotype *R*(*T*). It is important to note, however, that the reduced genotype is not part of the biological model, but only a tool to analyze this model (see Section 3.2). The phenotype is invariant under reduction of the genotype: *P*(*D*(*T*)) = *P*(*D*(*R*(*T*))); while *T*'s length increases at a significantly higher rate than *R*(*T*)'s length, as will be later explained in Section 3.2.

The algorithmic reduction that we use here simply applies the following four transformation rules iteratively until no further change is made to the genotype:

- 1.
Review all bracketed segments and remove any trailing + or − symbols inside them. Also remove all +− and −+ pairs wherever they appear. For example: G+−+[G+−+] becomes G+[G].

- 2.
Remove empty brackets and extra layers of nested brackets, except the last pair. For example: [[[G[]][]]] becomes [G].

- 3.
Reduce expressions of the form [

*X*]*X*to*X*and [*X*][*X*] to [*X*], where*X*does not contain any brackets (to accomplish this, we use regular pattern matching for performance reasons). - 4.
Unnest expressions of the form [[

*X*_{1}][*X*_{2}]…[*X*_{n}]] to produce [*X*_{1}][*X*_{2}]…[*X*_{n}], and expressions of the form [[*X*_{1}][*X*_{2}]…[*X*_{n}]*X*_{n+1}] to produce [*X*_{1}][*X*_{2}]…[*X*_{n}][*X*_{n+1}],*unless*the outer pair of brackets encloses the whole genotype*T*. In the last case, removing it would dramatically change the phenotype, since the rule G →*T*is applied three times to generate*D*(*T*) (whereas it would not change anything if removed from around*D*(*T*)).

### 2.2 Mutation Operators

While clonal reproduction ideally creates perfect copies of successful genomes, other genetic mechanisms alter the genomic information passed on to descendants, ultimately causing evolution. We denote with *M* a mutation operator transforming one genotype into another: *T*′ = *M*(*T*). In the present model, mutations are constrained to preserve well-balanced brackets in the symbolic expressions. Two types of mutations are modeled here: *point mutations* and *duplications*. Point mutation operators affect the genome at the level of a single symbol or a bracketed expression in three different ways:

*M*_{A}:*alteration*, replacing a symbol (other than brackets) by another symbol (other than brackets);*M*_{D}:*deletion*, removing a symbol (other than brackets) or a whole bracketed expression of the genotype (under the constraint that the resulting genotype still includes at least one G symbol);*M*_{I}:*insertion*, adding a symbol or an empty bracketed expression [] (itself susceptible to being filled later on) at a random position in the genome.

*M*_{R}:*random duplication*, inserting the copy of the bracketed expression at any position in the genome (including possibly inside itself);*M*_{L}:*level duplication*, inserting the copy of the bracketed expression at any position in the genome that is located at the same bracket-nesting level as the original (where the nesting level of a position can be computed as the number of open brackets minus the number of closed brackets leading to that position);*M*_{T}:*tandem duplication*, inserting the copy of the bracketed expression immediately after the original (note that this kind of mutation is always silent according to the genotype-to-phenotype mapping—e.g.,*P*(*D*([G])) =*P*(*D*([G][G]))—until further mutations change either the original or the copy).

### 2.3 Environment and Fitness Function

The plants' environment is simply the discretized 2D plane containing the pixel matrix occupied by the plant phenotypes. Accordingly, a pixel of the environment can be either empty or occupied by one or several overlapping plant bits (branches or leaves). This 2D world is bounded only downwards by the horizontal ground. It extends without bounds left, right, and up. Plants are modeled solely by their aerial part, ignoring the roots, and can grow in width or height without restriction. This means that a plant can cover an unlimited number of pixels, and the length of occupied ground can increase as far as plants can reach.

Plants do not interfere with one another during growth; they develop independently as if in isolation. When immersed in the environment, the base of each plant's stem (i.e., where the turtle starts drawing) is assigned a horizontal coordinate on the ground; then the plant's branches and leaves freely expand in the open environment. As explained above, the exact phenotype *P*(*D*(*T*)) is a deterministic product of the turtle-graphic interpretation of the thrice written string *D*(*T*). If any branch of the plant hits the ground (i.e., if any pixel is going to be drawn below ground level), then the plant is marked as unfit and removed from the environment and the evolving population.

The environment offers a single source of energy: light, distributed homogeneously and uniformly over space and constantly over time. Light projects from the top, vertically, one beam per pixel column, and is captured exclusively by the plants' leaves (the red terminal pixels), not by their branches. In other terms, only leaves are opaque: A unit of light hitting a leaf is fully absorbed and contributes to the energy captured by the plant. It does not propagate further down, nor is it reflected in any direction. The only two other possible events involving light are falling through a branch or vanishing into the ground. If a unit of light hits a pixel that is occupied by several overlapping leaves from different plants, then the energy contribution will be captured by only one of these, chosen at random.

*a*in the environment has a replicating potential, or

*fitness*, determined in part by the total amount of light, denoted

*l*

_{a}, that it is able to capture. The other part contributing to the fitness, but in an inverse way, is the size of the foliage,

*f*

_{a}, measured as the number of branches. This number corresponds to the subset of those instances of G in

*D*(

*T*) that caused the turtle to switch a white pixel to black (i.e., it excludes other instances of G that only caused the turtle to redraw an existing branch; see previous discussion in Section 2.1.3). In summary, the fitness

*F*

_{a}of each individual is based on the ratio of gained energy

*l*

_{a}to spent energy

*f*

_{a}, thus it represents the ease of building its structure (the higher

*F*

_{a}, the better):in which α ∈ [0, 1] is a constant exponent. The parameter α tunes the environmental cost of growing and branching, and for this reason we refer to α as the

*harshness*of the environment: The higher α, the more difficult it is for the plant to increase its surface exposure to light without compromising its overall fitness. Typical values of α will be chosen in the interval [0.5, 1] (see Section 3), which corresponds to fitness values

*F*

_{a}roughly of the order of 1.

### 2.4 Evolutionary Schedule

Numerical simulations of plant populations are organized in synchronous cycles, or *generations*. At each time step, the previous generation of plants is entirely removed from the environment and a new generation is calculated by applying the following four transformations (Figure 4):

(A)

*Development*: Each plant of the*i*th generation (offspring of another plant from the (*i*− 1)th generation) is fully developed from the ground up by calculating its phenotype*P*(*D*(*T*)) from its genotype*T*. Phenotypes with branches projecting below the ground are discarded from the population.(B)

*Competition*: Each plant captures the light arriving to its leaves and calculates its fitness value*F*. This is done by positioning the phenotype's stem in the environment at a location that was randomly chosen when the (*i*− 1)th generation was spawned. As explained in the previous section,*F*can be affected by the overlap with neighboring plants.(C)

*Reproduction*: Each plant then spawns a certain number of descendants (a function of its fitness), and each descendant is assigned a random position in the environment near its parent (see details below).(D)

*Mutation*: Finally, each descendant is subject to the mutation operators explained in Section 2.2, potentially modifying its genotype. After mutation, the set of all resulting genotypes make the (*i*+ 1)th generation, which is ready to be processed in the same way as the previous generation, starting again from step (A).

The initial population is composed of a single plant, whose genome codes for the simplest L-system able to reproduce: the single-character string G. This single-individual, single-symbol initialization scenario is at the core of the main motivation in this work, which is to study the evolution of diversity in purely emergent communities. It is thus best to start from the simplest forms of organization, both at the organism and at the population level. If we started from a population of already developed plant shapes, this would bias the results of the study.

The (A)…(D) loop is run for a preset amount of generations, 500 in this work. It can also be stopped earlier if it becomes too computationally costly due to occasional explosions in the size of *T* and hence *D*(*T*) and *P*(*D*(*T*)).

## 3 Experimental Evolution of Virtual Plant Communities

We present here the results of a number of experiments in detail. Our main observation is that evolution drives a single individual containing the simplest possible genome *T* = G toward an increasingly large population that expands along the horizontal ground in both directions and grows vertically. Our simulations of virtual plant communities illustrate how organisms proliferate, diversify, and eventually colonize their world. With each individual able to grow, absorb energy, replicate, and mutate, experiments show that the size of the population first increases, then starts to diversify. It can then take different evolutionary paths. Final distributions of phenotypes range from a spread-out collection of simple individuals to winner-take-all situations where a few individuals have developed dramatically large morphologies, extinguishing all simpler types—after which they start a Red Queen race with each other toward even greater heights and widths.

Three values of the exponent α were tried (see definition above in Section 2.3). We summarize the resulting three different worlds here, then describe their dynamics in greater detail in Section 3.1. We also present *distance* measures in Section 3.2 for a more rigorous quantitative assessment of the visible qualitative differences between these three dynamic regimes:

α = 0.5 corresponds to a

*mild*environment: The cost of growing is low, yet the population collapses rapidly because a handful of individuals manage to evolve toward very wide, tall, and branchy forms, blocking most of the light and swiftly eliminating smaller competitors.α = 0.75 corresponds to a

*harsh*environment: The cost of growing is higher than in the previous case; thus simple individuals can proliferate while larger individuals develop, compete, and eventually become extinct, resulting in a slowly declining population size.α = 1 corresponds to a

*very harsh*environment: In this case, the cost of growing is so high that all individuals remain simple and proliferate near the ground.

### 3.1 Overview of the Population Dynamics

Figures 5, 6, and 7 display a full example of simulated population in each type of environment. In the electronic version, individuals are colored randomly for visualization purposes. The first population (Figure 5) is typical of a mild environment where plants grow to a great magnitude, both in height and in breadth, and are intensely competing with neighboring individuals. The second population (Figure 6) shows a moderately harsh environment filled with a higher diversity of individuals presenting a more limited growth area and interfering less with other individuals. Simpler plants grow at the two ends of the population's domain. Finally, the last population (Figure 7) is characteristic of a very harsh environment, where plant forms remain simple and short and colonize the ground efficiently.

Generally, the type of environment is reflected in the evolution of two global variables: the *population size*, equal to the number of plants, and the global *biomass*, defined here as the total number of branches of all plants (Figure 8). In a mild environment (black line), a fast spike in the number of plants is followed by a great extinction event, as tall individuals eliminate smaller ones by overshadowing their canopies. Even in this scenario of dwindling population size, however, the few remaining plants evolve toward such gigantic forms, and do this so rapidly that the total biomass increases exponentially. By contrast, in a harsh environment (dark gray), the population size fluctuates as colonization of new territories is punctuated by local extinctions due to tall plants. Biomass also fluctuates, but tends to grow over larger time periods, as competition triggers the evolution of larger plants. Finally, in a very harsh environment (light gray), the population usually increases faster than in the previous case, but fluctuations remain present, since a limited degree of complexification still takes place while plants compete for light. Biomass grows very slowly, however, mostly as a consequence of the small size of the plants.

Mean fitness values () in each kind of environment are also revealing (Figure 9). In very harsh simulations, where α = 1, is always less than or equal to 1 because by definition the number of light beams, *l*_{a}, captured by an individual *a* cannot be higher than its number of leaves and hence branches, *f*_{a}. In milder simulations, where α < 1, some individuals can reach fitness values much larger than 1. However, the results show that the milder the environment, the lower the mean fitness level. This is due to the fact that milder environments are more crowded because plants are more prolific than in harsher environments and offspring's positions are statistically closer to their parents. Crowded conditions then create more intense competition, resulting in a significant decrease of the mean fitness. In fact, Figure 9 clearly shows that the mean fitness never rises above 1 in any environment, which indicates that high-fitness individuals constitute a minority whose sum never outweighs the remaining low-fitness population (with respect to reference level 1). Note also an interesting phenomenon of episodic *collapse* of the fitness distribution during the early generations in all environments: The mean reaches exactly 1, while the standard deviation drops to 0. These particular generations correspond to populations composed exclusively of elementary stick plants of length 1, whose fitness is exactly 1 by definition (1 unit of light divided by 1 branch, for any exponent α). At these moments, more complex individuals have all died out, albeit temporarily.

Note that the mild environmental condition α = 0.5 is also characterized by an exploding growth in computational cost, both in memory and in time. This is why we decided to stop the simulation after 151 generations, while the time to generate the phenotypes and evaluate the fitness of the individuals was still within reasonable limits. In order to estimate the computational cost of pursuing the simulation toward later generations, an exponential curve was fitted to a 2D cloud of experimental (*i*, *t*_{i}) points, where *i* is the generation number, and *t*_{i} the CPU time required to calculate the plants up to generation *i*. Extrapolating from this curve, the CPU time to reach generation 200 increased 100-fold, while for generation 300 it increased 70,000-fold.

### 3.2 A Distance-Based Measure of Diversity

To quantitatively assess the amount of diversity of the plant population in each environment, we need a measure of distance between individuals. We propose here two definitions of distance: a genetic distance and a phenotypic distance; then we show a comparison of their abilities to differentiate among the three types of environment.

#### 3.2.1 Definition of a Genetic Distance

We use here the *edit distance*(*T*_{a}, *T*_{b}) between two strings of symbols *T*_{a} and *T*_{b} (the genomes of two individuals *a* and *b*), defined as the minimal number of insertions, deletions, and alterations of symbols in *T*_{a} needed to transform it into *T*_{b}, or vice versa (it is indeed symmetrical, as every insertion can be reverted by a deletion and every alteration by the opposite alteration). In real-world genomics, the genetic distance between nucleotide sequences is measured in many cases by a *weighted edit distance* [4], using one of several possible weighting schemes [3] specifically fitted for each specific task. In many cases, however, genomic metrics take into account some statistical model of mutation [17]. In the present study, we measure genetic diversity using a raw edit distance (without weighting). For example, if *T*_{a} = G+[G[−G]G]+ and *T*_{b} = G[G[−−G]G]G, then (*T*_{a}, *T*_{b}) = 3, as we can transform *T*_{a} into *T*_{b} by removing the second symbol from *T*_{a}, adding a symbol − before the third G, and replacing the last symbol + by G.

#### 3.2.2 Definition of a Phenotypic Distance

*Jaccard distance*, denoted . This distance equals the ratio of overlapping pixels to the total number of pixels occupied by both plants, which are aligned and superimposed to make their stem bases match. Thus, if

*P*

_{a}and

*P*

_{b}are the pixel sets of two aligned individuals

*a*and

*b*, the Jaccard distance reads

Figure 10 shows three distinct phenotypes *P*_{a}, *P*_{b}, and *P*_{c}, illustrating the Jaccard distance between two similar ones ((*P*_{a}, *P*_{b}) = 0.31) and two dissimilar ones ((*P*_{a}, *P*_{c}) = 0.92).

#### 3.2.3 Comparing Genetic and Phenotypic Diversity

*N*is the size of the population, and or (similar notions, such as “discrepancy,” have been used elsewhere [42]). Figure 11 shows the evolution of and over time, using each of the two distances above. Note that the edit distance was appliedto reduced genotypes

*R*(

*T*) only; interestingly, the results obtained with original genotypes

*T*turned out to be very similar, essentially differing in scale, so we omitted them for clarity.

In milder environments, the diversity based on the Jaccard distance increases faster toward its highest value 1 (in particular, the growth is considerably slower for the very harsh simulation). The evolution of diversity () based on the edit distance between reduced genomes, for its part, is rather similar in the harsh and very harsh environments, while the mild conditions clearly stick out through high values. This process of accelerated diversification is driven by a high growth rate in the length of the reduced genotypes.

The observed differences between the genetically based and phenotypically based notions of diversity and their evolution over generations can be ascribed to the decorrelation introduced by the indirect developmental mapping. As discussed in Section 2.1.3, a consequence of this mapping is that changes in the genotype may have wildly different consequences on the phenotype: Some mutations may cause only slight modifications in the phenotype, or none at all, while others can cause major transformations. This is famously the case of *homeobox* genes, which play a critical role in the establishment of the overall body plan of metazoan organisms [33], and whose mutation can give rise to ectopic structures, such as the development of legs in the place of antennae in *Drosophila*. Conversely, the phenomenon of *convergent evolution* also hints at the possibility of very different genomes associated with very similar phenotypes. Thus genetic diversity and phenotypic diversity are not necessarily correlated. Figure 3 shows a case where similar genotypes give rise to clearly different phenotypes, while, on the contrary, related phenotypes encoded by very different genotypes can be seen in Figure 12.

### 3.3 Robustness to Mutations

Evolutionary search relies on the application of genetic operators to the individuals of a population (i.e., in this study, to their genomic strings *T*, *not* their developed strings *D*(*T*)) in order to create variability. Since reproduction is asexual here, these operators intervene only in the form of mutations. By mutating, individuals explore the fitness landscape in search of solutions adapted to the problem imposed by the environment via the fitness function. Mutation operators introduce alterations in the genome, which provoke changes in the phenotype through a developmental process. One could say that biology makes use of “indirect strategies” to encode organisms, in the sense that the genotype-to-phenotype mapping is generally highly complex. In real-world multicellular organisms, it is the result of a gigantic self-generating and self-assembling process involving between thousands and trillions of cells. In our abstract virtual model, it is produced by the linear expansion of a string that can typically contain between a dozen and a few thousand characters, giving rise to a geometric morphology covering hundreds to millions of pixels.

In this section, we use the Jaccard distance (see definition above, Section 3.2.2) to study the extent of the disruption provoked by the different types of mutation operators in morphological phenotypes (only single mutations this time, not in sequence). The first five types of mutation operators described in Section 2.2 are considered: alteration (*M*_{A}), deletion (*M*_{D}), insertion (*M*_{I}), random duplication (*M*_{R}), and level duplication (*M*_{L}), leaving out tandem duplication. We use the notation *M* ∈ {*M*_{A}, *M*_{D}, *M*_{I}, *M*_{R}, *M*_{L}} to refer to any one of these mutation operators.

Our motivation is to test the hypothesis that evolution encourages the selection of genomes that are more tolerant to mutation-induced disruption [46]. The concept of *mutational robustness* has been correlated with the evolutionary advantage of lower-than-optimal fitness plateaus under high mutation rates [47], the existence of large networks of neutral mutations [45], and the variability of the environment [30]. In our model, an evolved genome might show a more robust underlying structure than a random genome, in the sense that it would be able to minimize the effects of various mutations, especially if the environmental conditions remain stable during evolution. Thus, for each mutation operator *M*, we are interested in measuring its effect on the phenotype of an individual *a*. To this aim, if *b* is the mutated individual such that *T*_{b} = *M*(*T*_{a}), we define the *quantity of disruption* as the Jaccard distance between *a* and *b*, (*P*_{a}, *P*_{b}).

Three pools

*E*_{m},*E*_{h},*E*_{vh}of evolved genotypes, one for each kind of environment—respectively mild, harsh, and very harsh—composed of all the distinct genotypes produced by the last generation of the corresponding simulation (for*E*_{m}the 150th generation, due to the high computational cost discussed at the end of Section 3.1; for*E*_{h}and*E*_{vh}, the 500th generation). These pools contain 64, 245, and 593 genotypes, respectively.Three other pools

*R*_{m},*R*_{h},*R*_{vh}of randomly generated genotypes. Each pool*R*_{x}matches the corresponding evolved pool*E*_{x}as follows: It contains the same number of genotypes, and its random strings were generated in such a way that they have the same statistical properties as the strings of*E*_{x}(average length and probability of appearance of each symbol, under the constraint of well-balanced brackets).

*T*

_{a}in each pool, 100 different mutated strings

*T*

_{b}=

*M*(

*T*

_{a}) were generated per mutation operator

*M*(thus totaling 500 mutated strings per

*T*

_{a}), and the corresponding disruption values (

*P*

_{a},

*P*

_{b}) were evaluated. Finally, for each mutation operator

*M*and each pool

*E*∈ {

*E*

_{m},

*E*

_{h},

*E*

_{vh}}, was defined as the mean of all the distances (

*P*

_{a},

*P*

_{b}) between each string

*P*

_{a}in the pool

*E*and the corresponding strings

*P*

_{b}generated through mutation operator

*M*:Mutatis mutandis, we also defined the corresponding for pools

*R*∈ {

*R*

_{m},

*R*

_{h},

*R*

_{vh}}. We refer to and as the

*mean disruption*from the original to the mutated phenotypes for each pool and operator. Therefore, since we considered five mutation operators and six pools, a total of 5 × 6 = 30 mean disruption values were generated. These values can also be considered as a special type of diversity measure or “diversity jump,” applied here between two different worlds (based on before-mutation–after-mutation pairs of individuals) instead of the same world (based on all possible pairs of individuals).

The final results are displayed in Figure 13. As the results for harsh and very harsh environments were nearly identical, only the results for the mild and very harsh environments are presented here (20 values). The 2D coordinates of each one of the 10 points represent the pairs of mean disruption quantities of a given mutation type over the evolved and random pools, that is, . The main observations suggested by this chart can be summarized as follows:

In all cases, point mutations (

*M*_{A},*M*_{D},*M*_{I}) are far more disruptive than duplication mutations (*M*_{R},*M*_{L}).In the very harsh condition,

*point mutations are significantly less disruptive in evolved genomes than in random ones*, while duplication mutations are about the same (at a low level).In the mild condition, disruption caused by point mutations, while still lower for evolved genomes, is shifted toward significantly higher values than in the harsh condition.

*more*sensitive to all types of mutations, especially point mutations, in order to keep an edge over the ever larger competitors. In summary, these results reinforce the view that the environmental conditions must remain relatively constant for mutational robustness to emerge [30].

## 4 Discussion and Conclusions

This work combines advanced (yet minimal) models based on L-systems, genetic expression, biologically inspired mutations, and open-ended evolution in a population of interacting individuals to create a framework of evolving virtual plants. In order to characterize and understand the dynamics of diversification, data produced by the simulations was processed by simple statistical analysis based on measures of distance. While we kept the overall model as simple as possible (through a reduced number of rules and parameters), the results show the evolution of plantlike organisms capable of increasing their degree of complexity and diversification at the same time that they are competing for common resources.

The biological value and relevance of the model is twofold. On the one hand, experiments can be fully recorded at the genomic level, since organisms are represented by one-dimensional chains of symbols, which deterministically develop into plantlike structures. Because mutations act upon genotypes, the genomic dynamics can be characterized by a statistical analysis of the pool of character strings. On the other hand, this simple genotypic model maps to a highly complex phenotypic space [9] allowing the study of rich evolutionary processes oriented toward greater organism complexity and population diversity.

We wish to stress the simplicity of our model. Other models of evolving plant communities based on L-systems are generally more complicated, making use of a great number of rules and symbols to encode the various parts of the plants. In some of these works, the conclusion was that the genetic search space of D0L systems was too limited [8]. However, this might be precisely a consequence of an overly complicated framework, since the high-dimensional parameter space that these models generate can be particularly difficult to explore and analyze. In contrast, our minimal plant system facilitates the identification of relevant biological phenomena during the simulations, in which the great volume of generated data helps, rather than hinders, the understanding of finer causality links among interacting individuals.

It is important to stress here that the present model is not based on data or parameters derived from real-world ecosystems, such as densities of species or rates of interactions between individuals. It is an abstract, artificial-life model that is not committed to any particular plant community found in nature. Yet, we believe that general parallels with biological ecology can still be drawn. More specifically, the model shows the emergence of at least two different evolutionary strategies: (1) developing a simple morphology while yielding large offspring to escape competition and colonize new areas; (2) developing a large and branchy morphology to better compete in overcrowded environments, but at the cost of reduced offspring. These two dynamics closely resemble the ecological transition from newly colonized areas to old-growth forest. In fact, the results presented here show that very harsh environments are characterized by the first strategy and mild environments by the second, reflecting their differences in developmental costs.

In summary, the biological implications of our results are the following:

- 1.
Population diversity and individuals' complexity strongly depend on environmental conditions, confirming experimental [39] and theoretical results [24].

- 2.
Population dynamics can be extremely varied, ranging from low-diversity populations that tend to increase in size, to very small, exclusive communities of very large individuals.

- 3.
Environmental conditions must remain relatively constant for mutational robustness to emerge.

- 4.
The evolution of phenotypic complexity is based upon the dynamics of genetic mutations. As these are quick to produce larger genomes, the evolutionary exploration is biased toward more complicated structures.

- 5.
Eventually, one individual undergoes a fast excursion toward significantly more complex morphologies, extinguishing the intermediate individuals from which it evolved. These complex structures then dominate during a certain period of time, eventually themselves disappearing due to the combined effect of high internal competition and excessive enlargement.

Greater computer power available at a lower cost provides new means to simulate complex worlds of evolving and interacting organism populations. Today, large computational resources are now accessible to conventional research groups, and more projects are undertaken in the field of artificial life, which is potentially open to any degree of model sophistication and realism. Theoretical ecology, in particular, can greatly benefit from this approach, as several open problems of population genetics can be formulated and studied in virtual communities.

## Acknowledgments

The authors wish to thank Andrew Hendry and Claire de Mazancourt for their insightful comments on a preliminary version of the manuscript, and Vicente Canteli for technical support. This research was partially supported by a grant for the GENEX project (P09-TIC-5123) from the Consejería de Innovación y Ciencia de Andalucía. J.D.F. was supported by a FPU grant from the Spanish Ministerio de Educación. R.D. wishes to thank the Région Ile-de-France for supporting his research position at the Complex Systems Institute, Paris Ile-de-France.

## References

*Bostrychia radicans*(Ceramiales, Rhodophyta) using Lindenmayer systems.

*Trifolium repens L.*) using L-systems.

## Author notes

Contact author.

Research Group in Biomimetics, Department of Computer Science and Languages, University of Málaga, Severo Ochoa 4, 29590 Málaga, Spain. E-mail: josedavid@geb.uma.es (J.D.F.); dlobo@geb.uma.es (D.L.); gema@geb.uma.es (G.M.M.); fjvico@uma.es (F.J.V.)

Complex Systems Institute Paris Ile-de-France (ISC-PIF), CREA, Ecole Polytechnique and CNRS, 57-59, rue Lhomond, 75005 Paris, France. E-mail: rene.doursat@polytechnique.edu