## Abstract

In gene regulatory circuits, the expression of individual genes is commonly modulated by a set of regulating gene products, which bind to a gene's *cis*-regulatory region. This region encodes an input-output function, referred to as signal-integration logic, that maps a specific combination of regulatory signals (inputs) to a particular expression state (output) of a gene. The space of all possible signal-integration functions is vast and the mapping from input to output is many-to-one: For the same set of inputs, many functions (genotypes) yield the same expression output (phenotype). Here, we exhaustively enumerate the set of signal-integration functions that yield identical gene expression patterns within a computational model of gene regulatory circuits. Our goal is to characterize the relationship between robustness and evolvability in the signal-integration space of regulatory circuits, and to understand how these properties vary between the genotypic and phenotypic scales. Among other results, we find that the distributions of genotypic robustness are skewed, so that the majority of signal-integration functions are robust to perturbation. We show that the connected set of genotypes that make up a given phenotype are constrained to specific regions of the space of all possible signal-integration functions, but that as the distance between genotypes increases, so does their capacity for unique innovations. In addition, we find that robust phenotypes are (i) evolvable, (ii) easily identified by random mutation, and (iii) mutationally biased toward other robust phenotypes. We explore the implications of these latter observations for mutation-based evolution by conducting random walks between randomly chosen source and target phenotypes. We demonstrate that the time required to identify the target phenotype is independent of the properties of the source phenotype.

## 1 Introduction

Living organisms exhibit two seemingly paradoxical properties: They are robust to genetic change, yet highly evolvable [35]. These properties appear contradictory, because the former requires that genetic alterations leave the phenotype intact, while the latter requires that these alterations explore new phenotypes. Despite this apparent contradiction, several empirical analyses of living systems, particularly at the molecular scale, have revealed that robustness often facilitates evolvability [5, 13, 16, 19]. A case in point is cytochrome P450 BM3. In this protein, both thermodynamic stability and mutational robustness—defined as the tendency of a protein to adopt its native structure in the face of mutation—increase the protein's propensity to evolve novel catalytic functions [5]. In other, unrelated enzymes, the ability to evolve new functions is facilitated by the ability of the enzyme's native functions to tolerate mutations [1]. More generally, the enzymatic diversity of proteins increases with their mutational robustness. This indicates that robust proteins have acquired functional diversity in their evolutionary history—they are more evolvable [13].

To clarify the relationship between robustness and evolvability, several theoretical models have been proposed (e.g., [11, 28, 36]). A common feature of these models is the concept of a genotype network (a.k.a. neutral network). In such a network, each node represents a genotype. Edges connect genotypes that share the same phenotype and can be interconverted via single mutational events (Figure 1A). In the case of RNA secondary-structure phenotypes, for example, nodes represent RNA sequences and two nodes are connected if their corresponding sequences confer the same minimum-free-energy secondary structure, yet differ by a single nucleotide [33]. Robust phenotypes have large genotype networks [36]. Phenotypic robustness confers evolvability because a population can diffuse neutrally throughout the genotype network [16, 18] and build up genetic diversity, which allows access to novel phenotypes through non-neutral point mutations into adjacent genotype networks [36].

Genotype networks have been used to explore the relationship between robustness and evolvability in a variety of biological systems, ranging from the molecular [8, 13, 14, 33, 37] to the cellular level [3, 6, 7, 27]. At the level of proteins and their three-dimensional structure phenotypes, for example, a genotype network is made up of all amino acid sequences that fold into the same structure and that can be reached via single amino acid substitutions [13]. Robust proteins, which are commonly referred to as designable [24], have large genotype networks and exhibit an enhanced capacity for functional innovation [13]. This is a consequence of the arrangement of protein functions in sequence space, where distant regions of a genotype network provide access to different protein functions [14].

At higher levels of organization, the phenotype of interest is often a gene expression pattern. Such a pattern is produced by a gene regulatory network, which consists of gene products that activate and inhibit one another's expression. Gene expression is controlled by a gene's *cis*-regulatory region on DNA (Figure 2A), which can be thought of as performing a computation (Figure 2B), using the regulating gene products as inputs. The regulatory program that encodes this computation is referred to as signal-integration logic. It is specified in the network's (regulatory) genotype, which comprises the coding regions and the *cis*-regulatory regions of the network's genes.

Previous studies of the robustness and evolvability of gene regulatory networks have focused on genetic perturbations that alter network structure by adding or deleting regulatory interactions [3, 6, 7, 27]. In this case, two gene regulatory networks are connected in a genotype network if they confer the same gene expression pattern, yet differ in a single regulatory interaction (Figure 1B). The corresponding genotype network is therefore a “network of networks” [7]. These analyses have revealed several general properties of the genotypes and phenotypes of gene regulatory networks. First, a genotype's capacity to bring forth new gene expression phenotypes through genetic change is dependent upon its position in a genotype network [6]. Second, the robustness of a genotype to mutation varies across a genotype network, so that some genotypes are vastly more robust than others [7], implying that genotypic robustness is itself an evolvable property [27]. Third, phenotypes have vast genotype networks that extend throughout the space of all possible genotypes [6, 27]; and fourth, highly robust phenotypes are often highly evolvable [3, 6].

While these studies have helped to elucidate the relationship between robustness and evolvability in gene regulatory networks, they are limited by their assumption that genetic perturbations primarily affect network structure. It is well known that the presence or absence of regulatory interactions is not the only determining factor of gene expression patterns [17, 21, 26, 34]. Specifically, by altering the arrangement of promoters and transcription factor binding sites (Figure 2A, shaded boxes) in a gene's *cis*-regulatory region, the signal-integration logic of gene regulation can be dramatically influenced. For example, by rearranging the location of transcription start sites in the promoter region of a reporter gene in the galactose network of *Escherichia coli*, 12 of the 16 possible Boolean input-output mappings can be realized [17]. Thus, it is not only the structure of regulatory interactions that affects robustness and evolvability, but also the logic of signal integration used in the *cis*-regulatory region of each gene. When genetic perturbations correspond to changes in the signal-integration logic, two gene regulatory networks are connected in the genotype network if they are topologically identical and confer the same gene expression pattern, yet differ in a single element of their signal-integration logic (Figure 1C). The extent to which genetic perturbations in the signal-integration logic of gene regulatory networks affect robustness and evolvability remains largely unexplored.

We have recently begun to address this open question using computational models of small gene regulatory circuits [30]. These circuits are ideal for such an investigation, because their genotype networks are exhaustively enumerable, which allows for a full characterization of the relationship between robustness and evolvability, at both the genotypic and phenotypic scales. In our previous analysis, we focused on the relationship between these properties at the level of the phenotype only, and investigated their influence on a simple, mutation-limited evolutionary process. Here, we extend our previous analysis substantially and provide a detailed characterization of robustness and evolvability at the level of the genotype. We address several fundamental questions concerning the structure of genotype networks in signal-integration space. For instance, what is the distribution of genotypic robustness on a genotype network? Does this distribution differ from the case where genetic perturbations affect circuit structure? Does the position of a genotype in genotype space affect its capacity for evolutionary innovation (i.e., its ability to acquire novel phenotypes via genetic change)? How does the robustness of a phenotype influence the robustness and evolvability of its underlying genotypes? Is there a tradeoff between robustness and evolvability at the level of the phenotype? Are robust phenotypes mutationally biased toward one another? If so, how does this influence mutation-based evolution? After addressing these and other questions, we discuss the implications of our results and present directions for future work.

## 2 Methods

### 2.1 Random Boolean Circuits

We use random Boolean circuits (RBCs) to model genetic regulation [22]. RBCs are composed of nodes and directed edges (Figure 2C). Nodes represent gene products, and edges represent regulatory interactions. Two nodes *a* and *c* are connected by a directed edge *a* → *c* if the expression of gene *c* is regulated by gene product *a*. Node states are binary, reflecting the presence (1) or absence (0) of a gene product, and dynamic, so that the state of a node at time *t* + 1 is dependent upon the states of its regulating nodes at time *t*. This dependence is captured by a lookup table associated with each node, which explicitly maps all possible combinations of regulatory input states to an output expression state. This lookup table is analogous to the signal-integration logic encoded in *cis*-regulatory regions. The signal-integration logic of all of the nodes in the network can be simultaneously represented using a single *rule vector* (Figure 2D).

The dynamics of RBCs occur in discrete time with synchronous updating of node states (Figure 2E). The dynamics begin at a prespecified initial state, which can be thought of as representing regulatory factors upstream of the circuit [6, 25]. The dynamics then unfold according to the circuit's structure and signal-integration logic. Since the system is both finite and deterministic, its dynamics eventually settle into an attractor [22], which represents the gene expression pattern, and is referred to as the *phenotype*. We refer to the combination of circuit structure, rule vector, and initial state as an *instance* of a RBC.

While simple, the Boolean abstraction has proven capable of precisely replicating specific properties of genetic regulation in natural systems. For example, variants of the model have emulated the expression patterns of the fruit fly *Drosophila melanogaster* [2], the plant *Arabidopsis thaliana* [12], and the yeast *Saccharomyces pombe* [9]. Due to their accuracy in capturing the dynamics of genetic regulation, and because the signal-integration logic of each gene is explicitly represented, RBCs are ideal synthetic systems for investigating the relationship between robustness and evolvability when genetic perturbations correspond to changes in signal-integration logic.

### 2.2 Dynamical Regimes of RBCs

An important feature of RBCs is that they exhibit three dynamical regimes: ordered, critical, and chaotic [22]. In the ordered regime, gene expression patterns (i.e., attractors) are relatively insensitive to perturbations, while in the chaotic regime they are highly sensitive. The critical regime delineates these two extremes. For randomly constructed circuits, as considered herein, the transitions between regimes are controlled by two parameters: the average in-degree *z* and the probability ρ of gene expression (i.e., the probability of observing a 1 in the rule vector). Letting *s* = 2ρ(1 − ρ)*z*, the RBC lies in the ordered regime when *s* < 1, the critical regime when *s* = 1, and the chaotic regime when *s* > 1 [4, 32]. When there is an equal probability of observing a 0 or a 1 in the rule vector (ρ = 0.5), the dynamical regime is determined solely by the average in-degree, with *z* < 2 yielding the ordered regime, *z* = 2 the critical regime, and *z* > 2 the chaotic regime. In this study, we use ρ = 0.5 and a fixed number of inputs per node, *z* ∈ {1, 2, 3}.

### 2.3 Genotype Networks

We refer to the signal-integration logic of a RBC, as represented by its rule vector (Figure 2D), as the circuit's *genotype*. There are a total of 2^{L} unique genotypes for a given combination of circuit structure and initial state, where *L* = *N*2^{z}. We refer to this set of genotypes as the genotype space, or equivalently, as the signal-integration space. For the RBCs considered here, the size of the genotype space ranges from 2^{6} for the ordered regime to 2^{24} for the chaotic regime.

These genotypes map to a significantly smaller set of phenotypes. This high level of redundancy is a general feature of RBCs, and can be formalized using a genotype network, in which rule vectors are represented as nodes, and edges connect rule vectors that differ by a single bit, yet yield the same gene expression pattern (i.e., phenotype). Thus, we define a neutral point mutation as a single change to an element of the genotype that does not lead to a change in phenotype. Such a mutation can be thought of as a change in the affinity or position of a transcription factor binding site in the *cis*-regulatory region that leaves the gene expression pattern unchanged. We characterize genotype networks using an exhaustive breadth-first search, thus discovering all genotypes that yield the same phenotype and are accessible via neutral point mutations, starting from the original genotype of an RBC instance. While it may be possible that a phenotype consists of multiple disconnected genotype networks, our analysis is focused on the connected component of the genotype network in which the RBC instance resides, which we refer to as the focal genotype network.

### 2.4 Observable Quantities

Several definitions of robustness and evolvability have been proposed, at both the genotypic and phenotypic scales [3, 11, 27, 37]. Here, we define these terms as they are used in this study. Where appropriate, we use superscripts to indicate whether a variable or function pertains to the genotypic (g) or phenotypic (p) scale.

The quantity *v*_{xpyp} captures the number of unique non-neutral point mutations to genotypes in the focal genotype network of phenotype *x*^{p} that lead to genotypes in a genotype network of phenotype *y*^{p}. We call phenotypes *x*^{p} and *y*^{p} adjacent if *v*_{xpyp} > 0. By enumerating all of the genotype networks that are adjacent to the focal genotype network of phenotype *x*^{p}, we capture the mutational biases between adjacent phenotypes; that is, we capture the propensity of the genotypes on one genotype network to mutate into the genotypes on adjacent genotype networks. Our analysis therefore completely characterizes both the focal genotype network and its adjacent genotype networks.

#### 2.4.1 Robustness

We define the genotypic robustness *R*^{g}(*x*^{g}) of a genotype *x*^{g} as the proportion of its *L* possible point mutations that do not lead to a change in phenotype. This quantity is normalized by *L* to provide a measure that is independent of rule vector length. We use as a proxy for phenotypic robustness *R*^{p}(*x*^{p}) the proportion of signal-integration space that is occupied by the focal genotype network of phenotype *x*^{p}. This proxy captures the fraction of all genotypes that yield the same phenotype and that can also be accessed via neutral point mutation from the rule vector of an RBC instance. This quantity is normalized by 2^{L} to provide a measure that is independent of the size of genotype space. Thus, we consider phenotypic robustness to be the fractional size of the phenotype's underlying focal genotype network.

#### 2.4.2 Evolvability

*E*

^{g}(

*x*

^{g}) is defined as the number of unique phenotypes that can be reached through individual non-neutral point mutations to genotype

*x*

^{g}. We define phenotypic evolvability using two metrics. The first,

*E*

^{p}(

*x*

^{p}), is simply the number of phenotypes that can be accessed through non-neutral point mutations from the focal genotype network of phenotype

*x*

^{p}[37]. The second, ξ

^{p}(

*x*

^{p}), captures the mutational biases that exist between the focal genotype network of phenotype

*x*

^{p}and its adjacent genotype networks [8]. Lettingdenote the fraction of non-neutral point mutations to genotypes on the focal genotype network of phenotype

*x*

^{p}that result in genotypes of phenotype

*y*

^{p}, we define the evolvability ξ

^{p}(

*x*

^{p}) of phenotype

*x*

^{p}asSince captures the probability that two randomly chosen non-neutral point mutations to genotypes on the focal genotype network of phenotype

*x*

^{p}result in genotypes with identical phenotypes, its complement ξ

^{p}(

*x*

^{p}) captures the probability that these same mutations result in genotypes with distinct phenotypes. This metric takes on high values when a phenotype is adjacent to many other phenotypes and its non-neutral point mutations are uniformly divided among these phenotypes. The metric takes on low values when a phenotype is adjacent to only a few other phenotypes and its non-neutral point mutations are biased toward a subset of these phenotypes.

#### 2.4.3 Accessibility

*x*

^{p}[8]. This metric takes on high values if the phenotypes adjacent to phenotype

*x*

^{p}are mutationally biased toward

*x*

^{p}, and low values otherwise.

#### 2.4.4 Adjacent Robustness

*x*

^{p}, in proportion to the probability that these phenotypes are encountered through a randomly chosen, non-neutral point mutation from the focal genotype network of phenotype

*x*

^{p}[8]. We refer to this quantity as adjacent robustness,This metric takes on high values when a phenotype is mutationally biased toward robust phenotypes, and low values otherwise.

#### 2.4.5 Distance and Diversity

*D*

^{g}between two genotypes

*x*

^{g}and

*y*

^{g}is given by the normalized Hamming distancewhere δ(

*x*

_{i}

^{g},

*y*

_{i}

^{g}) = 1 if genotypes

*x*

^{g}and

*y*

^{g}differ at location

*i*, and δ(

*x*

_{i}

^{g},

*y*

_{i}

^{g}) = 0 otherwise. This quantity is normalized by

*L*to provide a measure that is independent of rule vector length.

*F*

^{p}of the sets of phenotypes

*P*(

*x*

^{g}) and

*P*(

*y*

^{g}) that are accessible through individual non-neutral point mutations to genotypes

*x*

^{g}and

*y*

^{g}of the same genotype network, is calculated asThis measure captures the proportion of all phenotypes in

*P*(

*x*

^{g}) and

*P*(

*y*

^{g}) that are unique to either

*P*(

*x*

^{g}) or

*P*(

*y*

^{g}). We refer to this measure as the diversity of adjacent phenotypes and consider it in the context of genotypic distance. For example, if the genotypic distance

*D*

^{g}(

*x*

^{g},

*y*

^{g}) between genotypes

*x*

^{g}and

*y*

^{g}is large, but the diversity

*F*

^{p}(

*P*(

*x*

^{g}),

*P*(

*y*

^{g})) of adjacent phenotypes is small, then the position of a genotype in genotype space has little influence on which phenotypes are accessible via non-neutral point mutation. In contrast, if both

*D*

^{g}(

*x*

^{g},

*y*

^{g}) and

*F*

^{p}(

*P*(

*x*

^{g}),

*P*(

*y*

^{g})) are large, then the position of a genotype in genotype space has a strong influence on which phenotypes are accessible via non-neutral point mutation. Note that

*x*

^{g}and

*y*

^{g}can be separated by any distance, so long as they reside on the same genotype network. Note also that the phenotypes in

*P*(

*x*

^{g}) need not be adjacent to the phenotypes in

*P*(

*y*

^{g}).

### 2.5 Simulation Details and Data Analysis

For all RBC instances, the rule vector and initial state are generated at random with ρ = 0.5. The circuit structure is also generated at random, but subject to the constraint that each node has exactly *z* inputs. Self-loops are permitted, mimicking autoregulation. We separately consider RBCs in the ordered, critical, and chaotic regimes by setting *z* = 1, 2, 3, respectively. The initial state and circuit structure are held fixed for each RBC instance. To ensure that all of the genotype networks considered in this study are amenable to exhaustive enumeration, we restrict our attention to RBCs with *N* = 3 nodes. While these RBCs are small, sensitivity analysis [10] confirms that they exhibit the same dynamical regimes as larger networks, albeit with shorter attractors. To assess the strength and significance of the trends in our data, we employ Pearson's correlation coefficient.

## 3 Results

### 3.1 Characteristics of Genotype Networks

To describe the structure of signal-integration space in RBCs, we randomly generate 2,500 RBC instances for each dynamical regime and exhaustively characterize the focal genotype networks of their corresponding phenotypes, and the genotype networks of all adjacent phenotypes.

As the dynamical regime shifts from ordered to chaotic, the mean, variance, and maximum of the distributions of genotypic robustness (Figure 3A–C) and genotypic evolvability (Figure 3D–F) increase. For example, mean genotypic robustness increases by 72% and mean genotypic evolvability increases by 114% as the dynamical regime transitions from ordered to chaotic. Genotypic evolvability and genotypic robustness are inversely correlated (Figure 3G), highlighting the fundamental tradeoff between these two quantities. The strength of correlation is virtually indistinguishable between dynamical regimes (*z* = 1: *r* = −0.97, *p* ≪ 0.001; *z* = 2: *r* = −0.98, *p* ≪ 0.001; *z* = 3: *r* = −0.98, *p* ≪ 0.001), but the slopes of these inverse relationships vary significantly. For instance, increasing the genotypic robustness from 0.4 to 0.5 results in a 28% reduction in genotypic evolvability for chaotic RBCs, but only an 18% reduction for ordered RBCs.

The range of phenotypic robustness *R*^{p} varies with dynamical regime: ordered RBCs span the smallest range (3.12 × 10^{−2} ≤ *R*^{p} ≤ 1.25 × 10^{−1}), critical RBCs span an intermediate range (4.88 × 10^{−4} ≤ *R*^{p} ≤ 1.25 × 10^{−1}), and chaotic RBCs span the largest range (4.77 × 10^{−7} ≤ *R*^{p} ≤ 1.25 × 10^{−1}). The maximum value of phenotypic robustness is independent of dynamical regime, and corresponds to the case where the initial state is identical to the attractor, which thus comprises only a single state (i.e., it is a fixed-point attractor). In this case, there are no transient dynamics and each vertex is exposed to only a single input value during the attractor, which means that only a single entry of each of the *N* lookup tables is used. This implies that only *N* bits of the rule vector are accessed during the RBC's dynamics, leaving *L* − *N* bits unused. Thus, the corresponding genotype network is of size 2^{L−N}, with phenotypic robustness *R*_{max}^{p} = 2^{L−N}/2^{L} = 2^{−N} = 1.25 × 10^{−1}. The average phenotypic robustness decreases from the ordered (*R*^{p} = 9.57 × 10^{−2}) to the critical (*R*^{p} = 4.12 × 10^{−2}) to the chaotic (*R*^{p} = 3.02 × 10^{−2}) regime.

Average genotypic robustness is positively correlated (Figure 4A; *z* = 1: *r* = 0.82, *p* ≪ 0.01; *z* = 2: *r* = 0.88, *p* ≪ 0.01; *z* = 3: *r* = 0.81, *p* ≪ 0.01), and average genotypic evolvability negatively correlated (Figure 4B; *z* = 1: *r* = −0.83, *p* ≪ 0.01; *z* = 2: *r* = −0.88, *p* ≪ 0.01; *z* = 3: *r* = −0.81, *p* ≪ 0.01), with phenotypic robustness. The genotypes that map to a given phenotype therefore become more robust and less evolvable as that phenotype becomes more robust. Averaging genotypic robustness and genotypic evolvability across all 2,500 RBC instances per dynamical regime reveals a linear increase in both quantities as *z* increases (Figure 4, insets). Thus, the genotypes of chaotic RBCs are simultaneously more robust and more evolvable than the genotypes of critical or ordered RBCs, on average.

To determine the distributions of genotypic distance between pairs of genotypes, we randomly sample 10,000 pairs of genotypes from each of the 2,500 genotype networks per dynamical regime (Figure 5A–C). We then compare these distributions with their corresponding null distributions, which are computed by randomly sampling 25 million pairs of genotypes from the entirety of genotype space (i.e., without regard to phenotype) (Figure 5A–C, vertical lines). The average genotypic distance between randomly sampled pairs of genotypes from focal genotype networks increases from the ordered () to the critical () to the chaotic () regime. However, these averages are always significantly less than the averages of the corresponding null distributions (*p* ≪ 0.001 for all *z* , Kolmogorov-Smirnov test), indicating that the connected components of genotype networks of signal-integration logic are constrained to specific regions of genotype space.

To understand how the position of a genotype in genotype space affects the variety of phenotypes it may access via non-neutral mutation, we consider the diversity of adjacent phenotypes *F*^{p} as a function of genotypic distance *D*^{g} (Figure 5D). For all three dynamical regimes, the diversity of adjacent phenotypes increases as the distance between two genotypes on the same connected components of a genotype network increases. In the critical regime, for example, 43% of the phenotypes found within the 1-neighborhoods of genotypes separated by a distance of *D* = 0.08 are unique; for *D* = 0.75, 95% are unique. Thus, the greater the distance between two genotypes, the greater the difference between the two sets of phenotypes that may be encountered via non-neutral mutation. Strikingly, 100% of the phenotypes found within the 1-neighborhoods of genotypes separated by a distance *D* > 0.8 are unique in the chaotic regime. As expected, the average genotypic distance between randomly sampled genotypes increases as phenotypic robustness *R*^{p} increases (Figure 5E), supporting the intuitive notion that larger genotype networks extend farther throughout genotype space than smaller genotype networks.

Phenotypic evolvability *E*^{p} and phenotypic robustness *R*^{p} are positively correlated (Figure 6A), and the strength of correlation increases from the ordered (*r* = 0.65, *p* ≪ 0.01) to the critical (*r* = 0.89, *p* ≪ 0.01) to the chaotic (*r* = 0.98, *p* ≪ 0.01) regime. This indicates that, in this system, no tradeoff exists between phenotypic robustness and the number of phenotypes accessible via non-neutral point mutations; the more robust the phenotype, the higher its evolvability *E*^{p}. The average phenotypic evolvability increases faster than linearly with increasing *z*, indicating a rapid increase in the number of adjacent phenotypes as the dynamical regime shifts from ordered to chaotic (Figure 6A, inset).

When mutational biases between adjacent phenotypes are taken into account using ξ^{p}, a slightly different relationship is observed between phenotypic evolvability and phenotypic robustness (Figure 6B). RBCs in the ordered regime exhibit a weak and insignificant correlation between ξ^{p} and *R*^{p} (*r* = 0.02, *p* = 0.41). In contrast, RBCs in the critical and chaotic regimes exhibit weak, but significant correlations, with the strength of correlation increasing from the critical (*r* = 0.09, *p* ≪ 0.01) to the chaotic regime (*r* = 0.42, *p* ≪ 0.01). The average evolvability increases approximately linearly as *z* increases (Figure 6B, inset). Thus, the average probability that two randomly chosen, non-neutral point mutations lead to distinct phenotypes is only ≈15% higher in chaotic RBCs than in ordered RBCs, despite the four-order-of-magnitude difference in the absolute number of adjacent phenotypes (Figure 6A, inset).

Phenotypic accessibility *A*^{p} and phenotypic robustness *R*^{p} are positively correlated (Figure 6C), with the strength of correlation again increasing from the ordered (*r* = 0.81, *p* ≪ 0.01) to the critical (*r* = 0.93, *p* ≪ 0.01) to the chaotic (*r* = 0.98, *p* ≪ 0.01) regime. This implies that, for all three dynamical regimes, random point mutations are more likely to lead to robust phenotypes than to non-robust phenotypes. The average accessibility increases faster than linearly as *z* increases (Figure 6C, inset), indicating a rapid increase in the relative ease with which phenotypes are found by random mutation as the dynamical regime shifts from ordered to chaotic.

Adjacent robustness *B*^{p} and phenotypic robustness *R*^{p} are positively correlated, with the strength of correlation decreasing from the ordered (*r* = 0.89, *p* ≪ 0.01) to the critical (*r* = 0.64, *p* ≪ 0.01) to the chaotic regime (*r* = 0.35, *p* ≪ 0.01). This implies that non-neutral point mutations to genotypes within robust phenotypes often lead to other robust phenotypes, but that the strength of this tendency weakens as RBCs approach the chaotic regime. The average adjacent robustness decreases approximately linearly as *z* increases (Figure 6D, inset), indicating that the expected robustness of a phenotype encountered via non-neutral point mutation decreases as the dynamical regime shifts from ordered to chaotic.

Taken together, these results suggest that a series of random point mutations will tend toward phenotypes of increased robustness (Figure 6D) and correspondingly increased evolvability (Figure 6A,B). Further, the ease with which such a blind evolutionary process identifies an arbitrary phenotype should increase with that phenotype's robustness (Figure 6C) and as the dynamical regime shifts from ordered to critical to chaotic (Figure 6C, inset).

### 3.2 Random Walks through Signal-Integration Space

To investigate how phenotypic robustness, evolvability, and accessibility influence blind, mutation-based evolution, we conduct an ensemble of random walks. For each dynamical regime, we randomly generate 1,000 RBC instances and identify the phenotype of each instance as a source phenotype. For each instance, we then sample the genotype space at random until we discover a genotype that yields a different phenotype from the source phenotype, and we identify this as the target phenotype. For each pair of source and target phenotypes, we then perform a random walk, starting from the instance's genotype and ending when the random walk reaches any genotype in the target phenotype. Each step in the random walk corresponds to a single point mutation to the genotype. We record the number of steps, *S*, required to reach the target phenotype, which we normalize by the size of the signal-integration space 2^{L}, and refer to as the waiting time *T* = *S*/2^{L}.

The waiting time *T* decreases faster than linearly as *z* increases (Figure 7A, inset). For all three dynamical regimes, *T* is strongly negatively correlated with the accessibility *A* of the target phenotype (Figure 7A), and the strength of correlation increases from the ordered (*r* = −0.41, *p* ≪ 0.01) to the critical (*r* = −0.67, *p* ≪ 0.01) to the chaotic (*r* = −0.82, *p* ≪ 0.01) regime. In contrast, the correlation between the waiting time *T* and the evolvability *E*^{p} of the source phenotype is weak and insignificant (*z* = 1: *r* = −0.03, *p* = 0.38; *z* = 2: *r* = 0.01, *p* = 0.82; *z* = 3: *r* = −0.02, *p* = 0.56) (Figure 7B). We observed similarly weak and insignificant correlations between *T* and other characteristics of the source phenotype, such as ξ^{p}, *A*^{p}, and *B*^{p} (data not shown). These results indicate that the time required for a blind evolutionary search to identify a target phenotype is independent of the phenotypic properties of the starting point and solely dependent upon the phenotypic properties of the target.

## 4 Discussion

In this study, we have extended our previous analysis [30] of genotype networks in the signal-integration space of random Boolean circuits (RBCs). While our earlier work was focused exclusively on the properties of phenotypes, we now additionally provide a detailed description of the properties of genotypes. Specifically, we have characterized the distributions of genotypic robustness *R*^{g} (Figure 3A–C) and genotypic evolvability *E*^{g} (Figure 3D–F), revealing distributions that are skewed toward genotypes of high robustness and low evolvability, which increase in both mean and variance as the dynamical regime shifts toward chaos. This variability implies that genotypic robustness and genotypic evolvability are themselves evolvable properties in RBCs—that by gradually altering signal-integration logic via point mutation, an evolutionary process can navigate through signal-integration space toward areas of either high genotypic robustness or high genotypic evolvability, without phenotypic modification.

When genetic perturbations correspond to changes in circuit structure, rather than to changes in signal-integration logic, the distribution of genotypic robustness is skewed toward genotypes of low robustness [7]. This result contrasts with our observations, and underscores the sensitivity of the structure of genotype networks to the form of genetic perturbation under consideration. Regardless of the form of genetic perturbation, it is not possible to simultaneously maximize genotypic robustness and genotypic evolvability (Figure 3G), due to the inherent tradeoff between these properties [37].

We found a positive correlation between average genotypic robustness and phenotypic robustness *R*^{p} (Figure 4A), an intuitive observation given that the latter quantity is commonly defined via the former [37]. In contrast, average genotypic evolvability was negatively correlated with phenotypic robustness *R*^{p} (Figure 4B), indicating that the individual genotypes that make up robust phenotypes have a reduced capacity for innovation, relative to the genotypes of less robust phenotypes. Both the average genotypic robustness and the average genotypic evolvability increased as the dynamical regime shifted from ordered to chaotic (Figure 4, insets).

To understand how a genotype's position in signal-integration space influences its capacity for innovation, we analyzed the relationship between the genotypic distance *D*^{g} of randomly sampled pairs of genotypes on the same genotype network, and the diversity of phenotypes *F*^{p} accessible via non-neutral point mutation from these genotypes. We found a monotonic increase in *F*^{p} across the full range of genotypic distances observed for each dynamical regime. This result contrasts again with observations made in [6], where a strong statistical association was only found between these quantities for low genotypic distance (*D*^{g} ≲ 0.2). Thus, large distances between genotypes in signal-integration space may facilitate access to a greater diversity of phenotypes than the same genotypic distances in the space of circuit structures.

We found a positive correlation between phenotypic robustness *R*^{p} and phenotypic evolvability, as measured either by the absolute number of adjacent phenotypes, *E*^{p} (Figure 6A), or by the probability that two non-neutral point mutations lead to distinct phenotypes, ξ^{p} (Figure 6B). Our results corroborate the observation made in previous studies that the phenotypes of gene regulatory networks can simultaneously exhibit robustness and evolvability [3, 6, 7]. Further, our analyses extend these previous studies by providing an explicit description of this relationship and by considering genetic perturbations that alter the signal-integration logic encoded in *cis*-regulatory regions, instead of genetic perturbations that alter circuit structure.

We also found a positive correlation between phenotypic robustness *R*^{p} and phenotypic accessibility *A*^{p} (Figure 6C), a measure that captures the relative ease with which a phenotype can be identified by mutation-based evolution. This result supports the intuitive notion that phenotypes formed by many genotypes are easier to find in blind evolutionary searches than phenotypes formed by few genotypes. In addition, robust phenotypes are mutationally biased toward other robust phenotypes (Figure 6D), indicating that blind evolutionary searches should encounter, at least on average, highly robust phenotypes. This observation corroborates recent observations made in RNA systems [20], where biological RNA structures were shown to have significantly larger genotype networks than random RNA structures.

To understand how phenotypic robustness, evolvability, and accessibility in signal-integration space influence mutation-based evolution, we considered an ensemble of random walks between pairs of source and target phenotypes. We found that the number of random mutations required to reach the target phenotype was entirely dependent upon its accessibility *A*_{targ}^{p} (Figure 7A) and independent of any properties of the source phenotype (e.g., Figure 7B). This suggests that a random walk through signal-integration space quickly loses any memory of its starting location. Consequently, existing evolvability metrics cannot be expected to predict the duration of a random walk between phenotypes. To overcome this problem, future work will seek to develop new phenotypic evolvability metrics that take into consideration the global structure of genotype space, as opposed to only considering the immediate adjacency of genotype networks.

Most of our results are consistent with those found in RNA systems [8, 37]. However, there is one difference worth emphasizing: The correlation between phenotypic robustness *R*^{p} and phenotypic evolvability ξ^{p} is negative in RNA [8]. Since the relationship between *R*^{p} and adjacent robustness *B*^{p} is positive in RNA, it has been suggested that robust phenotypes act as “evolutionary traps” [8] (but see [37]). That is, random mutation may tend toward phenotypes of higher robustness, which in turn may be less evolvable by this criterion, and therefore slow down evolutionary search. Since we observed a positive correlation between (i) *R*^{p} and ξ^{p} and between (ii) *R*^{p} and *B*^{p}, we conclude that robust phenotypes in the signal-integration space of RBCs are not evolutionary traps, but instead may facilitate the discovery of novel phenotypes. Such contrasts between model systems underscore the fact that the relationships between robustness, evolvability, and accessibility are system dependent.

Phenotypic evolvability increased monotonically as *z* increased (Figure 6A,B, insets) and the maximum achievable robustness was independent of *z* (*R*_{max} = 2^{−N}). Taken together, these results indicate that robustness and evolvability can be simultaneously maximized in chaotic RBCs. This result contrasts with previous analysis [3], which found robustness and evolvability to be simultaneously maximized in critical RBCs. This discrepancy can be understood by considering the two primary differences between the analyses. First, the analysis in [3] focused on genetic perturbations that altered circuit structure (and consequently, in some cases, signal-integration logic), while we focused solely on genetic perturbations that altered signal-integration logic. Second, and of greater importance, the measures of robustness and evolvability considered in [3] were not based on genotype networks. Instead, robustness was defined as the ability of a single mutated genotype to maintain the phenotypic landscape (i.e., the set of all phenotypes observed across all possible initial states), and evolvability was defined as the capacity of the mutated genotype to expand the phenotypic landscape (i.e., add new phenotypes to the set of existing phenotypes). Thus, the definitions of robustness and evolvability used in [3] differed from ours, and were focused solely on the level of individual genotypes and their immediate mutational neighbors. While these definitions are reasonable and insightful, our departure from their use precludes any direct comparison between the two studies. That said, our observation that the robustness and evolvability of chaotic RBCs, at both the genotypic and phenotypic scales, are higher than those of critical or ordered RBCs should be interpreted with caution. For all dynamical regimes, robustness is maximal for fixed point attractors, and these occur with decreasing frequency as the dynamical regime transitions from order to chaos (i.e., as *z* increases). Thus, while it is only possible to simultaneously observe maximal robustness and maximal evolvability in chaotic RBCs, this case represents the exception rather than the rule.

Future work will seek to understand how evolution navigates signal-integration space. Is it possible for mutation and selection to identify the high-robustness, high-evolvability phenotypes of chaotic RBCs? If so, can they outcompete critical and ordered RBCs in static [29] or dynamic [15] environments? How are these evolutionary outcomes affected by mutation rate [38] or recombination [25]? Future research will also focus on larger systems, moving from an analysis of small circuits to large networks. To accomplish this, Monte Carlo sampling methods will be required [20], as the increased size of the signal-integration space will prohibit the exhaustive enumeration of genotype networks. In addition, future work will seek to understand the influence of both canalyzing functions [23] and the probability of gene expression on the size and structure of genotype networks. These directions, among others (e.g., [31]), will lead to a more thorough theoretical understanding of how the genetic malleability of *cis*-regulatory DNA can influence evolutionary processes.

## Acknowledgments

J.L.P. was supported by a Collaborative Research Travel Grant from the Burroughs Wellcome Fund and an International Research Fellowship from the National Science Foundation. J.H.M. was supported by NIH grants LM010098, LM009012, and AI59694. A.W. acknowledges support through Swiss National Science Foundation grants 315230–129708, as well as through the YeastX project of SystemsX.ch, and the University Priority Research Program in Systems Biology at the University of Zurich.

## References

## Author notes

Contact author.

University of Zurich, Institute of Evolutionary Biology and Environmental Studies, Building Y27-J-48, Winterhurerstrasse 190, CH-8057 Zurich, Switzerland. E-mail: joshua.payne@ieu.uzh.ch

Dartmouth College, Computational Genetics Laboratory, HB 7937, One Medical Center Drive, Dartmouth Hitchcock Medical Center, Lebanon, NH 03756. E-mail: jason.h.moore@dartmouth.edu

University of Zurich, Institute of Evolutionary Biology and Environmental Studies, Building Y27-J-54, Winterhurerstrasse 190, CH-8057 Zurich, Switzerland; The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501. E-mail: andreas.wagner@ieu.uzh.ch