This article presents an abstract, tunable model containing two of the principal information-processing features of cells and explores its use with simulated evolution. The random Boolean model of genetic regulatory networks is extended to include a protein interaction network. The underlying behavior of the resulting two coupled dynamical networks is investigated before their evolvability is explored using a version of the NK model of fitness landscapes.
The genetic regulatory networks (GRNs) within cells synthesize proteins, which both directly and indirectly affect behavior through an interaction network (PIN). That is, within a given GRN, some genes encode for proteins that have regulatory functions, some for proteins that have intracellular function, and some for proteins that pass through the cell membrane. A growing body of work with artificial systems incorporates increasing levels of detail from the natural phenomena, but very few have considered PINs explicitly, and only one is known to do so while also exploiting an underlying GRN. This article begins by presenting a generic representation for an abstract cell and explores effects on behavior caused by properties of its GRNs and PINs. In particular, the model is an extension of a well-known Boolean GRN formalism. The article then uses the presented generic scheme to examine how such GRN and PIN properties affect the evolvability of abstract cells, using a recently presented extension to a well-known tunable fitness landscape model.
Random Boolean networks (RBNs)  were introduced as an abstract model by which to explore aspects of genetic regulatory networks (see also ). An RBN consists of R nodes, each connected to B other randomly chosen nodes, with each performing a randomly assigned Boolean update function based upon the current state of those nodes. The emergent behavior of these discrete dynamical systems has been explored widely (e.g., see  and  for overviews). Some of this work has included the use of simulated evolution to induce specific network-wide behavior (e.g., ). With the aim of enabling exploration of the interactions between gene regulation and behavioral trait interdependence, RBNs have recently been combined with the NK model of fitness landscapes. The NK model  considers sets of N (binary) genes or traits, each of which depends upon K others within the set. As the degree of epistasis, K, is varied, features of the landscapes are affected. This abstract, tunable model has been used widely to explore aspects of evolution (e.g., see  and  for overviews). Some work in simulated evolution has also utilized the model (e.g., ). In the combined form—termed the RBNK model —a simple relationship between the states of N randomly assigned nodes within an RBN was assumed such that their value is used within a given NK fitness landscape of trait dependences. That is, as in the RBN model, the existence of proteins within their own network of interactions was implicit within the model. This article adds an explicit PIN to the GRN, thereby presenting a tunable, generic cell representation scheme with the expectation that both dynamical behavior and evolvability can be significantly changed by such additional considerations.
2 Extending the RBN Model to Consider Proteins Explicitly
As noted above, within the traditional form of an RBN, a network of R nodes, each with B directed connections randomly assigned from other nodes in the network, all update synchronously, based upon the current state of those B nodes. Hence those B nodes are seen to have a regulatory effect upon the given node, specified by the given Boolean function randomly attributed to it. Since they have a finite number of possible states and they are deterministic, such networks eventually fall into an attractor. It is well established that the value of B affects the emergent behavior of an RBN in that attractors typically contain an increasing number of states with increasing B. Three phases of behavior were originally suggested through observation: ordered when B = 1, with attractors consisting of one or a few states; chaotic when B ≥ 3, with a very large number of states per attractor; and a critical regime around B = 2, where similar states lie on trajectories that tend to neither diverge nor converge (see  for discussions of this critical regime, e.g., with respect to perturbations). Subsequent formal analysis using an annealed approximation of behavior identified B = 2 as the critical value of connectivity for behavior change .
Hence within the original model, the existence of proteins and RNA is implicit in the gene-to-gene connectivity graph. Essentially, only the role of regulatory proteins and RNA is considered, as shown schematically in Figure 1 (left). However, as also shown schematically in Figure 1 (right), and as mentioned above, many different types of protein can exist within a cell. Proteins and protein complexes are created by genes and by the interactions between proteins: Proteins and complexes exist in the cytoplasm that are the product of protein-protein interactions and not (directly) transcription. Therefore, proteins can also be seen to exist within a dynamical Boolean logic network of R nodes, each with B directed connections from other nodes in the network, all updated synchronously based upon the current state of those B nodes. Again, those B nodes can be seen to have an effect upon the given node, specified by a given Boolean function attributed to it. Here the logical state of a node represents the presence or absence of a protein, as opposed to the activation or inhibition of a gene. Nodes can also be self-connected—for example, representing catalysis. In contrast to the GRN case, the logic of nodes in a Boolean PIN must consider both spontaneous emergence and transformation. The former case simply implies that the transition table entry for all B inputs being at logic 0 must also be logic 0. In the latter case, since protein-protein interactions can cause things such as the formation of protein complexes or shape change in those involved, both typically giving new functionality, the state of a node must be set to 0 if any of the nodes that use it as an input transit to a logical 1 on the following update cycle (Figure 2).
Figure 3 shows the typical (dynamical) behavior of a traditional RBN as a simple GRN representation; although self-connection is also allowed here for generality, no change in typical behavior is seen. As in Figure 1 (left), a regulatory protein exists per gene here, and the presence or absence of that protein matches the activation or inhibition of the gene, respectively. In the case shown there were 1,500 genes and proteins (i.e., RG = 1,500 and RP = RG), and various degrees of gene-to-gene connectivity via the proteins were considered (0 < B < 6). It can be noted that this explicit inclusion of the regulatory proteins does not appear to affect the emergent behavior discussed above; see  for equivalent examples.
Figure 4 shows the typical (dynamical) behavior after 100 update cycles from coupling the two random Boolean networks together in the way described, that is, where one RBN is used to represent the GRN with the directly associated proteins, and one for the wider PIN of a cell. RG = 1,000 here, but now there are 1,000 extra proteins (i.e., RP = 2000), where the total number of nodes remains at 3,000 for comparison with Figure 3. As can be seen, for low connectivity within the PIN (BP), the presence of the protein network significantly (t-test, p < 0.05) reduces the number of nodes changing state per update cycle in the GRN for all BG > 2, that is, the typical attractors are smaller in size. For increasing BP (e.g., BP = 5), regardless of BG, the behavior is more like that seen in the traditional model shown in Figure 2, but typically still reduced. As expected from the traditional model, the density—that is, the number of genes at logical 1 per update cycle—is around 0.5 in all cases (not shown).
Figure 4 also shows the typical behavior of the PIN. The overall dynamical behavior is very similar to that in the corresponding GRN, although attractors are typically statistically significantly smaller (t-test, p < 0.05) in the PIN for BG > 3. It can also be noted that the typical number of proteins present within a given PIN (i.e., the density) decreases (t-test, p < 0.05) as the degree of connectivity BG within the GRN increases, regardless of BP.
The main result here indicates that when an RBN is coupled to one with lower internal coupling, it can experience a reduction in the expected length of its attractors, with an increase occurring in the RBN with low internal coupling. For example, as shown at the top of Figure 4, when BP = 1, the fraction of nodes changing state for BG = 5 in the GRN is less than in the corresponding traditional behavior shown in Figure 3 (B = 5), whereas the fraction of nodes changing state in the coupled PIN is more than in Figure 3 (B = 1). That is, the influence of each network upon its coupled partner is mutual. The effect decreases as the degree of connectivity increases in the RBN with lower internal coupling, at a rate inversely proportional to BG.
Considering the global view of the cell (i.e., GRN + PIN) as essentially a heterogeneously connected single RBN, it is perhaps not so surprising that attractors are smaller than the traditional homogeneous case when a significant proportion of the network has lower connectivity. Moreover, given the two aforementioned modifications to the updating in the PIN form of the RBN over the traditional GRN form, there is a bias within the state transition space of the PIN toward 0. It has long been noted (e.g., see ) that a bias in the transition space of the traditional RBN—that is, a deviation from the expected average of 0.5 for either state—reduces the number of attractors and their size for a given size and connectivity. This explains why the attractors are typically smaller, particularly within the PIN, when point attractors do not dominate and connectivity homogeneity exists, that is, when BP = BG.
Further, Figure 5 shows the typical behavior of both the GRN and the PIN for BG = 5 (i.e., high BG), to briefly examine how the aforementioned dynamic is influenced by the relative size of the two networks. The cases of tripling and halving the number of proteins not connected directly to genes (i.e., RP = 4000 and 1500, respectively) are shown. As can be seen, the degree of influence of the lower internal coupling network does not vary linearly with its relative size, perhaps as might be expected, but it can change the dynamics significantly. This is particularly true for low BP. Again, a network with expected small attractors has influence upon and is influenced by a network with expected large attractors, with the degree of the influence being related to the relative difference both in internal connectivity and in size between the two networks.
It should be noted that there is a small amount of related work exploring multiple, coupled RBNs: Morelli and Zanette  used probabilistic connections between two RBNs, wherein they explored synchronization between the networks with K = 3; and similarly, Villani et al. (e.g., ) have explored the emergent dynamics of multiple RBNs coupled within a two-dimensional grid (see also  and ). Most related to the result above, Hung et al.  examined two coupled disordered cellular automata with probabilistic connectivity of varying K. They report a change in the synchronization dynamic when the internal coupling K is different in each network.
Hence the presence of an explicit PIN can affect the underlying dynamical behavior of a GRN, particularly by reducing the number and length of attractors. The next section considers the evolvability of the simple cell representation formed by their coupling, using a variant of a well-known tunable fitness landscape model.
3 Evolving a Simple Cell
3.1 Related Research
There is a small amount of prior work exploring the use of simulated evolution to design RBNs. Van den Broeck and Kawai  explored the use of a simulated-annealing approach to design a feedforward RBN for the four-bit parity problem. Kauffman [22, p. 211] evolved an RBN to match a given attractor. Lemke et al.  did the same (see  for an overview of work evolving the related threshold networks to an attractor). The same approach has been used to explore attractor stability  and to model real regulatory network data; see  for an example using a probabilistic RBN. Sipper and Ruppin  evolved an RBN for the well-known density task. Bull  has evolved RBN ensembles to solve simple machine learning problems. Quayle and Bullock  used simulated evolution in conjunction with another Boolean GRN model  to design phenotypic behavior. They explored evolving networks exhibiting defined cycles in an attractor and short patterns of individual gene expression. In both cases it was shown how the longer the attractor or pattern, the longer the evolution needed, with a complexity barrier apparently existing.
Other work has moved beyond the Boolean formalism and explored the evolution of GRNs containing continuous-valued levels of protein expression and other details such as protein binding—but not protein-protein interactions. Examples include robot controllers (e.g., [3, 36]) and hardware implementations (e.g., [26, 47]). As with the Boolean models, some work has sought to model real GRN data (e.g., ) or to evolve defined global behavior, such as regular oscillations (e.g., [19, 24, 28]). See  for an overview.
The idea to consider cytoplasm proteins as a significant component in cellular information processing is not new [5, 32], but little work in the area of generic computer modeling appears to exist. This is, of course, not the same as the work on protein structure prediction (e.g., using computational intelligence) or the more specific modeling work of many aspects of cells typically undertaken in systems biology. Fisher et al.  viewed proteins in signaling pathways as agents capable of rudimentary pattern recognition, memory, signal integration, and so on. More recently, Qadir et al.  have used a protein metaphor to realize relatively fault-tolerant associative memory in evolvable hardware.
Most similar to the work presented here, and the only other known abstract model of genome and proteome interaction that may potentially be used as a general representation of a simple artificial cell, is that by Knibbe et al. . They used a fuzzy representation wherein proteins of overlapping membership functions interact in a simple PIN: Proteins are produced by an underlying Gray-encoded GRN and contribute fractionally to phenotypic traits based upon the degree of overlap. The PIN is shown to influence the underlying structure of the GRN when heavily interconnected.
3.2 The RBNK Model
As noted above, Bull  has recently combined the RBN and NK models—in the RBNK model—to explore GRN and phenotype dependence. As shown in Figure 6, N nodes in the RBN are chosen as outputs, that is, after running the RBN for T cycles, their state determines fitness using the NK model. Kauffman and Levin  introduced the NK model to allow the systematic study of various aspects of evolution (see ). In the standard model an individual is represented by a set of N (binary) genes or traits, each of which depends upon its own value and that of K randomly chosen others in the individual. Thus increasing K, with respect to N, increases the epistatic linkage. This increases the ruggedness of the fitness landscapes by increasing the number of fitness peaks.
The NK model assumes all epistatic interactions are so complex that it is only appropriate to assign (uniform) random values to their effects on fitness. Therefore, for each of the possible K interactions, a table of 2K+1 fitnesses is created, with all entries in the range 0.0 to 1.0, such that there is one fitness value for each combination of traits. The fitness contribution of each trait is found from its individual table. These fitnesses are then summed and normalized by N to give the selective fitness of the individual. Exhaustive search of NK landscapes  suggests three general classes exist: unimodal when K = 0; uncorrelated, multi-peaked when K > 3; and a critical regime around 0 < K < 4, where multiple peaks are correlated. This differs slightly from Kauffman's  analysis, wherein it was suggested that correlation decreases as K → N—a difference perhaps caused by his use of non-exhaustive search to explore landscapes of increasing N. An element of correspondence between NK and RBN models with respect to the degree of connectivity can therefore be noted.
The combination of the RBN and NK models enables a systematic exploration of the relationship between phenotypic traits and the genetic regulatory network by which they are produced. In this article, following , a simple scheme is adopted: N phenotypic traits are attributed to randomly chosen nodes within the network of RP proteins, which are not directly connected to a gene (Figure 6). Thereafter all aspects of the two models remain as described above, with simulated evolution used to evolve the coupled RBN on NK landscapes. Hence the NK element creates a tunable component to the overall artificial cell fitness landscape.
Following , the simple case of a greedy, genetic hillclimber is considered here. Mutation can either alter the Boolean function of a randomly chosen node or alter a randomly chosen connection for that node (equal probability) with the restrictions detailed above. Whether the GRN or the PIN RBN is mutated is determined at random, with equal probability, with the aforementioned considerations: Genes can only connect to the first RG proteins, and only proteins in the range [RG, RP] can be mutated; their connections can be altered to any protein, but the state transition for the first entry of their Boolean function cannot be altered from 0.
A single fitness evaluation of a given cell is ascertained by first assigning each node to a randomly chosen start state and updating each node synchronously for T cycles. Here T is chosen such that the two networks have typically reached an attractor. At update cycle T, the value of each of the N trait nodes in the PIN is then used to calculate fitness on the given NK landscape. This process is performed 10 times per evaluation, the fitness assigned to the cell being the average. Then a mutated cell becomes the parent for the next generation if its fitness is higher than that of the original (ties are broken at random).
In the following, RG = 100, RP = 200, N = 20, and results are the averages of 100 runs (10 runs on each of 10 landscapes per parameter configuration). A value of T = 50 is used, by which time, as shown in Figure 3 (left), a randomly configured RBN will have long reached an attractor for the ordered or critical regime. Evolution is run for 5,000 generations—when fitness stasis is typically seen.
In Figure 7 (left column) it can be seen that, regardless of the phenotypic trait epistasis K, cells with low PIN connectivity BP typically perform better (t-test, p < 0.05) than those with higher BP—where BG < 4—which roughly correlates with the typical attractor sizes shown in Figure 4. Hence evolution appears better able to manipulate cell GRN and PIN with attractors containing one or a few states to find high-fitness solutions. It can be noted that other values of N have been used with the same general findings (not shown); see  for discussion.
Figure 7 (right column) also shows the results for the equivalent traditional RBN model described above. As reported in , regardless of K, a sharp transition occurs in the fitness level reached when B > 2. Again, this can be explained by the typical attractor sizes experienced, as shown in Figure 3. Therefore, in correlation with the cases where a reduction in typical attractor size was seen in Figure 4 compared to the traditional model, the addition of the PIN can reduce the effects on fitness from increasing connectivity within the underlying GRN: a PIN can improve the evolvability of a cell by smoothing the effects of exploring GRN connectivity, for low PIN connectivity. That is, fitnesses reached for the coupled GRN and PIN, with low BP, are typically significantly (t-test, p < 0.05) higher than the equivalent traditional GRN-only model for high gene connectivity, but the same (t-test, p ≥ 0.05) for low gene connectivity. It can be noted that within natural GRNs it appears genes have low connectivity on average (e.g., see  for discussion), but a number of high-connectivity “hub” genes perform significant roles (e.g., see ). Hence explicit consideration of the PIN makes the natural phenomenon less surprising than with a GRN-only model; as before, the overall dynamical system of the cell appears to benefit from connectivity heterogeneity.
In the above, fitness is calculated from the state of the N trait nodes on the step after T update cycles, that is, from within an attractor (e.g., see ). A number of examples of related previous work discussed in Section 3.1 have sought to evolve temporal behavior, that is, particular sequences of gene activity wherein the action of the GRN is sampled on every update cycle, up to and potentially including states within an attractor. This is particularly true in the case of robotics studies, for example (e.g., ). Similar explorations are possible within the RBNK model. Here total fitness can be calculated as the average of the fitness of each successive state of the N nodes for T cycles. Thus cells must evolve temporal behavior that keeps them consistently within the high optimal region(s) of the fitness landscape. The results (not shown) indicate that the change makes no statistical difference (t-test, p ≥ 0.05) in fitness achieved for any combination of parameters used. This is in contrast to the same comparison using the traditional RBN model explored in , where it was found that the evolutionary process finds it harder to produce consistently high fitness with networks in the highly connected chaotic regime, corroborating and generalizing previous results with another GRN-only model . This difference can again be explained by the reduced attractor sizes due to the coupling and state bias in the PIN. It is interesting to note how this general result is in contrast to the high overall connectivity case in the single-evaluation version (Figure 7), where no consistent benefit was found for high BP. That is, it appears the reduction in typical attractor size seen in Figure 4, primarily due to the update bias in the PIN even when BP = BG, gives an evolutionary advantage when consistent temporal behavior is required.
5 An Extension: Considering Processing Speed
Information processing within cells occurs on multiple time scales. For example, action potentials occur in milliseconds, whereas transcription takes minutes. In all of the aforementioned examples of artificial GRN modeling, the network is essentially seen to respond almost instantaneously to inputs, much like an artificial neural network. That is, inputs are applied and a response is taken from the GRN one or a few update cycles later, as above. Given the time lag in transcription, it can be noted that this cannot be the primary response mechanism by cells for many classes of environmental input. Rather, a cell's PIN typically determines its immediate response. Information may then trickle down to the GRN and cause appropriate gene excitation/inhibition. Thus GRNs, together with their coupled PINs, must anticipate the environment to some extent, since the delay in transcription could otherwise mean low-fitness behavior. It can also be noted that a “scatter gun” response by a GRN in the over production of proteins would be energetically inefficient compared to more (accurate) predictive behavior.
Hence PINs typically operate over a much smaller updating time scale than GRNs. To begin to capture and explore this basic aspect of cells, a new element can be introduced into the above model, which specifies the number of update cycles a given network undertakes before another performs one update cycle. Thus the updates of the PIN and GRN become deterministically asynchronous, somewhat akin to deterministic asynchronous single-network RBNs . In these RBNs, nodes update if a defined number of cycles have passed since their last update. It is known that within single-network deterministic asynchronous RBNs the number of nodes updated per cycle and the frequency of such updates can significantly change the overall behavior, despite the connectivity and update rules remaining constant . Figure 8 shows examples of how this can also be true within the GRNs and PINs used above. With low connectivity within the PINs (e.g., BP = 1), there is no change in behavior for any of the results above, regardless of the PIN's relative update rate to that of the GRN. For high connectivity within the PINs (e.g., BP = 5) and low connectivity within the GRNs (e.g., BG = 1), no change is seen in the GRNs, but the PINs experience more, larger (t-test, p < 0.05) attractors as a result of faster updating within themselves. Density does not appear to be affected. Conversely, for high connectivity within the GRN (e.g., BG = 5), the PINs experience fewer, smaller (t-test, p < 0.05) attractors as a result of faster updating within themselves. The switchover point is at BG = 2 for BP = 5, that is, where there is no significant change in either network regardless of the relative updating rate of the highly connected PIN (not shown).
Based on the general results in previous sections, where the typical size of attractors was roughly inversely correlated to evolvability, it may be predicted that for the high-BP case evolvability will change; from Figure 8 the achievable fitness can be expected to decrease for BG = 1 and increase for BG = 5 when BP = 5. Figure 9 shows examples of the evolutionary performance from using faster-updating PINs within the cells—here, 25 times faster than the GRN on K = 0 phenotype component fitness landscapes. As can be seen, fitnesses are generally lower on average; however, the difference typically is not statistically significantly different. That is, fitness levels do not decrease for BG = 1 and do not increase for BG = 5.
Thus it appears that the change in dynamics seen in Figure 8 is not sufficient for evolution to either exploit or be disturbed by. Analysis of the rate of acceptance of the four different mutations (not shown) indicates no significant variation between the operators for any parameter settings. For example, the average success rate of connection mutations is roughly the same in the GRN as in the PIN. That is, the effects of increasing the update rate of the PIN are not confined to that network but affect the whole cell's fitness landscape seemingly equally. The same was found for other relative update rates and for the continuous evaluation scheme (not shown).
Hence, inclusion of a more realistic PIN with respect to relative updating rate gives the potential for significant changes in dynamical behavior, but in a way evolution finds difficult to exploit. This is also surprising, since  shows how using asynchronous updating, through random node selection, does not significantly alter evolvability in a traditional RBN.
There is a growing body of work exploring generic representations more closely analogous to those seen in natural systems—artificial genetic regulatory networks. This article has extended one of the simplest and earliest-presented schemes to consider one of the other fundamental information-processing components in cells and placed it within a tunable fitness landscape model to begin to systematically explore both the behavioral and the evolutionary dynamics that can emerge in such systems. In the only known closely related prior work, it was shown that a high degree of connectivity within the PIN could affect the structures that emerged within the associated GRN under evolution. This article has shown that the degree of connectivity within the PIN can be significant with respect to both dynamical behavior and evolvability. More specifically, it has been found that coupling GRNs and PINs of differing internal connectivity can alter the dynamical behavior of each in comparison with the single-network case. In particular, typical attractors are reduced (increased) in size when coupled to a network of lower (higher) connectivity. It was also shown that the relative size of the PIN to that of the GRN can alter the dynamics. Moreover, evolution has been found better able to design coupled networks when the typical attractors are smaller, regardless of the degree of interaction between the phenotypic traits; having fewer, smaller attractors appears to give smoother fitness landscapes.
The relative rate of updating—seen as the information-processing rate—within the PIN was also considered. Here, despite the potential for a relative reduction in the size and number of attractors, the overall (simple) evolutionary process remains unaltered. Of course, this initial version of the model only considered the underlying dynamical behavior of the artificial cell's GRN and PIN and did not place it within an environment containing a variety of stimuli over a variety of time scales. Future work should explore enabling the information-processing speed of subnetworks within the PIN to vary against that of the GRN for tasks with different temporal dynamics and the scope for emergent anticipatory behavior.
Many other extensions to this work can also be envisaged. For example, following the developments of Boolean GRN models discussed in Section 3.1, more realistic PIN components can be introduced. Indeed, protein shape is extremely important, and the details of the folding process after gene expression are potentially significant, to evolvability (e.g., see  and ). Other aspects, such as how proteins degrade at different rates, should also be explored (e.g., see ). The aforementioned related work by Knibbe et al. , while not dynamical in nature and without feedback between the PIN and GRN during evaluations, might form the basis for one such route to more realistic extensions, for example.
And there are many other aspects of cells that could be included, both within the model presented and within more complex variants (see  for an overview of other models). For example, the inclusion of RNA seems to be a clear next step, as is consideration of previous work on cytoplasm components, particularly that using discrete dynamical systems to model elements such as microtubules (e.g., ) and cell-to-cell transduction (e.g., ).
Department of Computer Science & Creative Technologies, University of the West of England, Bristol BS16 1QY, UK. E-mail: Larry.Bull@uwe.ac.uk