Abstract
In nature, gene regulatory networks are a key mediator between the information stored in the DNA of living organisms (their genotype) and the structural and behavioral expression this finds in their bodies, surviving in the world (their phenotype). They integrate environmental signals, steer development, buffer stochasticity, and allow evolution to proceed. In engineering, modeling and implementations of artificial gene regulatory networks have been an expanding field of research and development over the past few decades. This review discusses the concept of gene regulation, describes the current state of the art in gene regulatory networks, including modeling and simulation, and reviews their use in artificial evolutionary settings. We provide evidence for the benefits of this concept in both the natural and the engineering domains.
1 Introduction
Ever since the seminal 1975 article by King and Wilson [88], the biological community has been aware that there is more to the genome than nucleic acid sequences translated into amino acid sequences. However, the apparatus for transcription and translation of DNA information into protein function has been studied since the 1950s, leading to the “Central Dogma” of Molecular Biology [25, 26]. By that time Schroedinger in his famous 1944 work on life [128] had already alluded to the possibility of an executing role (i.e., more than an information template role) for his “aperiodic crystal” at the foundation of life. With the development of the operon model in 1961, Jacob and Monod [71] firmly established the idea of regulation in our understanding of the life-organizing functions of DNA.
The central challenge that gene regulatory networks address is the translation between spatial patterns of information, as provided by different types of macromolecules such as oligomers and polymers—that is, the configuration of matter in space—and the dynamical processes in time necessarily underlying any type of behavior. With regulation, the study of objects in biology is enlarged by the study of processes. Spatial arrangements of objects (the material of life) are replaced by temporal arrangements of processes as the core principles of living systems. While space remains an important aspect of life (natural life without spatial embodiment is unthinkable), it is the dynamic aspects of entities that play the dominant role. With regulation, the notion of time, dynamics, transients, and steady states becomes of utmost importance in our understanding of organisms.
The simplest self-contained genetic regulatory element (or motif) is a feedback loop. A completely new phenomenon (oscillations in time) can emerge in such an otherwise dull behavioral landscape; for as soon as there is a mixture of positive and negative feedback connections in the loop, the tiniest amount of noise can cause oscillations to emerge, bringing about a new behavior of this system with an intrinsic time scale. Thus, it is to be expected that larger gene regulatory networks, consisting of many genes, contain a variety of network motifs with both positive and negative feedback connections [111]. Between these motifs, weaker connections can accumulate causes and distribute effects.
Additionally, we can immediately see that many different behaviors should be expected to emerge from such networks. While the details of these behaviors are the result of possibly delicate network interactions and therefore difficult to predict, the fact that networks allow a richness of behaviors to emerge is important for adaptivity of living organisms. For example, the growth of the body of an organism and the behavior of its parts in different phases of its development do not obey strict rules directly able to generate these elements. Rather, we have to assume a set of processes that obey their own internal logic of regulation and develop in phases, influenced by the environment and controlled by different subsets of the genetic regulatory network. In multicellular organisms, there are many precursor cell phases until a final cell state is reached, and development can be seen as a process of gradual approach to the state of maturity of a body, rather than its immediate instantiation.
Finally, noise and stochastic events are likely to play a key role in promoting variety [122]. First, regulation is sensitive to the activity of single molecules, which opens the possibility that stochastic quantum effects influence the outcome. In addition, the Brownian motion underlying diffusion, which we are used to averaging out of cause-effect relationships, brings another type of stochasticity into these systems, providing additional time scales that correspond to the spatial organization of the organisms and their environment.
This article is aimed at providing a brief overview of the phenomena and models of gene regulatory networks and delving into the applicability of these concepts in man-made artefacts. Thus, Section 2 discusses some of the most important phenomena in biological gene regulatory systems, and Section 3 provides an overview of the modeling efforts that have been made over the years to describe and understand these phenomena. Section 4 is then dedicated to a discussion of the internal dynamics of artificial regulatory network models, and Section 5 reviews the current applications of these approaches. The article concludes in Section 6 with a discussion of the future of artificial gene regulatory network research.
2 Gene Regulation in Nature
Regulation in general refers to the control of the flow of certain quantities by signals from another entity. If we consider the multitude of flows that have to be arranged for a living organism to function properly, it becomes clear that the first target of gene regulation has to be the control of metabolic fluxes. From the intake of nutrients to the expulsion of waste, the energy household of cells needs to be organized and controlled. Enzymes and their expression levels are the material patterns that have to be arranged in time to make this possible.
However, this is just the most basic regulatory need of organisms. The sophisticated weaving together of behaviors to produce the life cycle of a cell or organism, or to allow it to survive under adverse circumstances, stands to gain as much from regulation as energy fluxes do [57].
All of this leads to the conclusion that the control of gene expression levels, which in turn control the interaction of the organism with its environment, is the most basic function one can imagine for a gene. But in addition, gene products can interact with other genes and their products, thus forming a network that allows intrinsic time scales and autonomy (self-regulation) to emerge [32]. The following subsections discuss these topics in more detail.
2.1 Gene Expression, Cell Function, and Differentiation
The behavior and type of a cell are characterized by its gene expression patterns. This is because the gene expression patterns describe the components of the cell that have been constructed. By construction we mean the transcription of DNA into RNA, and its possible subsequent translation into protein. RNA expression levels therefore can be used to identify specific cell types. One speaks of the fingerprint of a cell, revealed in its expression pattern [2, 53] through RNA sequencing techniques.
However, as always in biology, individual cells vary a lot. Therefore, while the characterization of cell types is a convenient way of clustering cell behaviors into classes, there is nevertheless substantial variation between different cells even in the same cell class (or type). Historically, this was difficult to examine in the laboratory, since most techniques could only be used to analyze cell mixtures. With the advent of single-cell transcriptomics, however, the situation has changed and differences down to the individual cell level can be resolved [45, 136].
It is also becoming clear that not only do individual cells of the same type have differences in their expression profile, but cells vary their expression levels depending on circumstances and age [87]. No wonder: Cells are open systems, best characterized at any time as in steady state or in transition. This confirms the intuitive ideas first proposed by Waddington in 1957 [144], as depicted in Figure 1.
2.2 Genomic and Protein Aspects
Gene expression happens through a process of gene activation and subsequent generation of protein and/or RNA products. The activation and control of gene expression is the focus of genetic regulation, where multiple mechanisms influence the rate of gene expression. While the machinery of gene expression is complex and varies across organisms, some features are consistent. Gene expression requires the recruitment of an RNA polymerase to a gene's region of the DNA sequence, which will then transcribe the gene into mRNA by forming RNA polymers from nucleotides. Promoter sequences are used to initiate transcription of a gene by recruiting the polymerase. Transcription factors modulate the rate of expression by inhibiting or enhancing the rate of transcription of genes. Transcription factors are themselves the products of gene expression, serving as a source of feedback for the regulation of genetic networks. While the transcription of genes yields mRNA, in the case of protein products an additional step of translation must be performed to produce a protein product from a given mRNA molecule. However, mRNA itself can spawn other types of RNA (iRNA, etc.), which take on their own regulatory or interaction role in the processes leading to protein production.
As mentioned above, the products of gene expression are protein and RNA molecules, some of which are transcription factors while others directly contribute to the metabolism and behavior of cells. The behavior and function of a cell are the result of the gene expression that has contributed to the current state of the cell. That is to say, the size, membrane composition, and structure of a cell are defined by, among other things, the gene products that reside within the cell. As a consequence, the state and function of a cell are determined by the composition of the cell, which is the result of gene expression, degradation, and so on. Readers interested in these mechanisms can find an extended review in [67].
2.3 Significance of GRNs in Cellular Physiology
The cell cycle is fundamental to biological organisms, as it governs the process of cell division and therefore of replication. One of the classic model organisms for studying the cell cycle is fission yeast. Regulation of the cell cycle in fission yeast has been well characterized [140]. The cell cycle is controlled by three modules of its gene regulatory network that operate at different phases of the cycle: G1/S, G2/M, and mitosis. In the G1/S phase, cells grow and replicate DNA. There are four key control elements involved in this process (see Figure 2): cdc2 and cdc13, which are two proteins shared throughout all three phases of the cell cycle, pair to form complexes that activate key pathways; and ste9 and rum1, which maintain bottlenecks via degradation and inhibition of cdc13 and the cdc2-cdc13 complex. In turn, cdc2-cdc13 acts to reduce the activity of ste9 and rum1. As the cycle proceeds through G2/M and mitosis, other players take part in the control of the cell cycle through similar feedback mechanisms. In the cell cycle of yeast and mammalian cells, a number of these interactions involve the explicit control of transcription factors, such as e2f and p53 [46], where p53 is well known for its role as a tumor suppressor [120]. The interested reader will find ample literature to review on the intricate details of the regulation of the cell cycle, starting from [140]. The key feature that we hope to convey to the reader is that the dynamic feedback between the activity of regulatory elements (as manifested by their concentration and localization within the cell) allows the gene regulatory network to transition between different modes of activity. In the case of fission yeast, this interplay manifests itself as distinct modules that operate at different phases of the cell cycle.
2.4 Significance of GRNs in Developmental Biology
We now consider the evolutionarily conserved Delta-Notch signaling pathway present in metazoans [3]. Delta-Notch signaling was initially studied because of its role in neurogenesis (growth of the nervous system). The intercellular signaling of this system exhibits a lateral inhibition dynamic, where a cell that commits to neural differentiation inhibits its neighbors from doing so. This increases the sparseness of cells that commit to a neural fate. The Delta-Notch signaling pathway is regulated by a suite of achaete-scute genes that produce transcription factors that both control and are themselves controlled by lateral inhibition [98]. Again, we see that feedback loops are involved in the control of genetically regulated networks. The Delta-Notch pathway not only is evolutionarily conserved, but plays many roles beyond neurogenesis in development, such as embryonic segmentation in Drosophila [99], wing patterning in Drosophila [70], and blood vessel formation in mice [12] and zebrafish [142].
While the Delta-Notch pathway is a well-studied case, genetic regulation is fundamental throughout developmental biology. The patterning of positional cues in Drosophila development, such as Bcd, a key determinant of anterior-posterior polarity, is regulated by multiple transcription factor binding sites with various binding strengths [114]. Eight key transcription factors are involved in providing positional information during Drosophila development (including Bcd), and recent work has shown that it is possible to predict the resultant patterning by modeling the interactions of regulatory elements on the basis of the underlying regulatory sequences [129]. The role of transcription factors in the patterning of Drosophila even transcends the individual organism itself, where maternal inclusion of localized transcription factors in the embryo expedites segmental patterning.
2.5 Significance of GRNs in Evolution
The two primary forms of genetic regulation that have evolved are transcription factors and microRNAs (miRNAs). While transcription factors are proteins with variable binding affinities to particular regulatory sequences, miRNAs are simpler. They are short RNA sequences that can bind to regulatory sequences and repress gene products [9]. Both miRNAs and transcription factors are known to be highly conserved throughout evolution; however, a key difference between the two regulatory mechanisms is their binding affinity. Transcription factors can generally bind to a range of sequence patterns with variable binding strength, while miRNAs have almost exact binding specificity due to nucleotide complementarity. As a result, the rates of evolution of miRNA (slow due to exact binding) and transcription factors (fast due to reduced specificity) have been predicted to differ by approximately 4 orders of magnitude [21].
A fascinating evolutionary mechanism that has played a significant role in the evolution of GRNs is the occurrence of transposable elements. Transposable elements (TEs) are sequences that move and replicate throughout the genome, and are commonly described as genomic parasites [17]. TEs have been clearly shown to be beneficial to hosts in some cases, such as the upregulation of factors leading to pesticide resistance [23]. The dynamics of replicating TEs contributes to the establishment of sequence motifs that have related sequence patterns. TEs serve as a source of novel and derivative genetic material that can be recycled into regulatory sequences and binding elements [50]. The molecular mechanisms underlying the evolution of gene regulatory networks are diverse, ranging from proteins to RNA-binding elements, and effectively form a genomic ecosystem.
2.6 Significance of GRNs in Epigenetics
The expression of genes relies not only upon genetic sequences, but also upon the accessibility of the genes. Epigenetics is concerned with heritable traits that are not encoded within the sequence of the genome. Most forms of epigenetics involve altering the physical structure of the sequence, such as wrapping DNA around histones and methylation of nucleotides. Chromatin is a collection of DNA, RNA, and protein that condenses the structure of these molecules, which allows for increased stability, density, and organization. Within chromatin, DNA is wrapped around histones to form nucleosomes. The accessibility of DNA sequences in these nucleosomes has a significant influence on the expression of genes located within the nucleosome [62]. Furthermore, modification of histones alters gene expression to the extent that gene expression can be predicted from the known modifications [82].
DNA methylation is another form of epigenetic regulation that involves structural modification of nucleotides in the sequence itself. One of the key mechanisms of DNA methylation is physically blocking enhancer and/or promoter regions, thus altering the expression of a gene [80]. Methylation itself is the addition of one carbon with three hydrogens to an existing structure; in the case of DNA methylation the existing structures are nucleotides. This additional structure is sufficient to modify binding sites to the point of prohibiting interactions with regular binders of a regulatory sequence. Methylation can be induced by environmental factors [72]. The methylated state of a DNA sequence can be transmitted through multiple cell divisions, as well as across generations [56]. In this way the evolution of GRNs can be affected by epigenetic modifications, which may be derived from environmental factors that were experienced as a result of genetically regulated behaviors.
In summary, gene regulation has emerged as a key player in translating the information provided by an organism's inherited DNA into the structure (via growth and development) and behavior of that organism. Time scales range from seconds (in the case of the regulation of metabolism in neurons [102]) to thousands of years (in the case of evolutionary processes). Gene regulatory networks have been compared to the compilers of computer languages that translate code into behavior of the underlying machine. However, there is much more to the computational modeling of gene regulation, and this brings us to our next topic.
3 Computational Gene Regulatory Networks
Artificial gene regulatory networks are a complex example of systems biology [89]. Comprehensive models of the gene regulatory process would require a large range of complexity, from molecular dynamics to morphogenetic coupling, making complete and exact models prohibitively expensive. As a result, GRN models generally focus on particular aspects of genetic regulation; for example, the Gillespie algorithm [58] attempts to capture the stochasticity of genetic regulation without modeling stochastic molecular dynamics. While we primarily focus on computational and evolutionary models of gene regulatory networks in this review, we will also touch on mathematical studies and analyses of the dynamics of GRNs.
3.1 Biological Models
There are a number of approaches that are used for modeling gene regulatory networks [16, 35]. These approaches include differential equations, stochastic simulations, Petri nets, flux balance analysis, graphical models, and more. A number of reviews modeling genetic regulation have been written, many of which primarily focus on biological modeling [81]. We begin our discussion of biological models by introducing the steps of gene regulation and how they are modeled, but ultimately focus on how these models are subjected to evolution.
Gene regulatory networks are commonly modeled with Hill kinetics, which models the cooperative binding of two or more proteins to promoters, enhancers, silencers, and other regulatory regions of a gene. These kinetics are formulated using reaction rules that describe the rates of association/dissociation of a gene and regulatory proteins to form a complex as functions of the cooperativity of multiple binding proteins. The complexes formed by the binding of regulatory proteins are used in conjunction with additional reactions to either directly produce protein products or, more realistically, model the transcription of mRNA, which is later translated into protein products.
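As an illustration of the Hill-type rate laws described above, the following is a minimal sketch rather than any specific model from the cited literature. The function names, the parameter names (`k` for the half-saturation concentration, `n` for the cooperativity, `beta` for the maximal production rate, `gamma` for the degradation rate), and the explicit Euler integration are all assumptions made for this example. It integrates a single self-repressing gene, dx/dt = beta * k^n / (k^n + x^n) - gamma * x.

```python
def hill_activation(x, k, n):
    """Fraction of maximal expression driven by an activator at concentration x."""
    return x**n / (k**n + x**n)


def hill_repression(x, k, n):
    """Fraction of maximal expression remaining under a repressor at concentration x."""
    return k**n / (k**n + x**n)


def simulate_autorepressor(beta=10.0, gamma=1.0, k=1.0, n=2,
                           dt=0.01, steps=5000, x0=0.0):
    """Explicit Euler integration of a single self-repressing gene (illustrative only).

    dx/dt = beta * hill_repression(x, k, n) - gamma * x
    """
    x = x0
    for _ in range(steps):
        x += dt * (beta * hill_repression(x, k, n) - gamma * x)
    return x


if __name__ == "__main__":
    # With the default parameters the steady state solves 10 / (1 + x^2) = x,
    # that is x^3 + x - 10 = 0, whose real root is x = 2.
    print(simulate_autorepressor())
```

A larger cooperativity `n` makes the response more switch-like, which is one reason Hill-type terms are a convenient bridge between continuous rate equations and the Boolean abstractions discussed later.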
While the initial approaches to analyzing gene regulatory networks focused on deterministic models, such as ordinary-differential-equation-based mean-field approximations, the use of stochastic models has been increasing in recent research. Heightened attention to stochastic gene expression has been strongly supported by observations of stochasticity-driven differences between cells with identical genetic background, and by single-molecule experiments [49, 123, 124]. The cornerstone of stochastic simulation of chemical kinetics is the Gillespie algorithm [58] and its extensions for adaptive time steps [59].
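To make the idea of stochastic simulation concrete, the sketch below applies the direct method of the Gillespie algorithm to the simplest possible expression model: a single molecular species produced at a constant rate and degraded at a rate proportional to its copy number. The function name, the parameter names, and the choice of a birth-death process are assumptions for illustration; the adaptive-step extensions mentioned above are not covered.

```python
import random


def gillespie_birth_death(k_prod, k_deg, t_end, n0=0, seed=None):
    """Direct-method Gillespie simulation of a birth-death process (illustrative).

    Reactions:  0 -> X  with propensity k_prod
                X -> 0  with propensity k_deg * n
    Returns the list of event times and the copy number after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod
        a_deg = k_deg * n
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


if __name__ == "__main__":
    times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, t_end=50.0, seed=1)
    print(len(times), "events; final copy number:", counts[-1])
```

For this process the copy number fluctuates around `k_prod / k_deg`, and the relative size of those fluctuations grows as that ratio shrinks, which is the small-copy-number regime where deterministic mean-field models are least reliable.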
It is well known that GRNs are stochastic systems where noise can have a significant effect on resulting cellular behaviors [49]. The occurrence of noise is generally discussed in conjunction with the common observation of small numbers of some molecular species acting in a GRN, where fluctuations in the concentration of such species can lead to significant changes in network dynamics. However, biological mechanisms have evolved not only to control and eliminate such effects of molecular noise, but also to amplify and exploit it [116]. In a more detailed study of noise cascading through a GRN, the authors of [115] show that even in networks that do not involve small concentrations of some molecular species it is possible for noise to have significant effects due to global network modulation and upstream effects.
Biophysical models focus on capturing aspects of genetic regulation that extend beyond the scope of simulating chemical kinetics. This has been heavily utilized in computational models of Drosophila segmentation [69, 104], neural development of zebrafish [150], and vascular biology [12], where GRNs tuned to biological networks interact with cytoskeletal and chemotactic behaviors to predict the temporal behavior of vascular dynamics [94]. The same model has also been utilized to show how the bistable dynamics of a GRN can be used to regulate pattern formation for healthy angiogenesis [11, 141].
3.2 Dynamics and Analysis
Mathematical studies of genetic regulation have been driven primarily by the dynamical systems community. As a result, the majority of analytical work has focused on the stability and attractor dynamics of gene regulatory networks. It is important to address these mathematical foundations of gene regulatory dynamics, as they inform researchers of the capabilities of GRN representations. For example, in the Delta-Notch signaling system discussed in Section 2.4 the previous belief had been that cells would decide on an environmental preference and then move to favor that preference. Through a dynamical systems study focusing on the bistable nature of a genetically regulated model, it was shown that the decisions for motion and for location preference operate in parallel [11]. This conclusion was reached by evaluating the dynamic stability of cellular behaviors as a function of cellular phenotype and environmental inputs to the cell's gene regulatory network. Similarly, the synthetic biology community has begun to use models of GRNs to develop hypotheses for experimental validation [67].
The theoretical biology community has been studying the dynamics of GRNs for many years, beginning with Boolean networks, where genes are defined by on-off states [83, 85]. Mathematical methods have extended this class of models with tools for the analytical discovery of steady states, attractors that the GRNs will tend towards without external stimuli [110]. Such analysis led to the prediction that evolution will drive single-cell genetic network dynamics toward greater dynamic stability [84]. In a detailed analysis of a Boolean GRN derived from the biology of the cell cycle, Deritei et al. [37] show that this ubiquitous GRN is inherently modular with a switch that triggers the completion of the cell cycle after passing a restriction point. These analytical and dynamical studies of Boolean GRNs have led to concrete predictions that can be experimentally verified.
Information-theoretic approaches to the analysis of gene regulatory networks often draw upon techniques from statistical mechanics that describe the state space for molecular arrangements, and allow one to relate distributions of biophysical states to information in terms of the entropic cost of molecular configurations. Tkačik et al. [135] use information theory to show that, given biologically observed noise levels in gene expression, it is possible for genes to encode more than one bit of information (“on” or “off”), a point that becomes particularly pertinent when addressing the representation of GRNs. Through a combined experimental and analytical study Cheong et al. [22] show how negative feedback of signaling and transcription can suppress noise, thus facilitating communication between collections of cells. By drawing upon methodologies from information theory, these studies have shown how cells can regulate the otherwise confounding noise of the biological environment to store and transmit meaningful information.
3.3 Representations
Boolean models of GRNs capture the on-off nature of genes, switching when concentration of proteins crosses threshold values that represent the transition between on and off states [83]. Early studies of this Boolean model investigate the stability and oscillatory dynamics of random Boolean GRNs [85], and later studies show support for the validity of the Boolean GRN model [131]. Arguments for the use of Boolean GRN models often draw upon the quantity of information available from biological experiments, suggesting that the amount of information made available by gene expression profiling is only adequate for training Boolean GRN models [95]. Furthermore, the simplicity of Boolean models makes it easier to map out the state space of a GRN, thus facilitating analysis.
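The ease of mapping the state space of a Boolean GRN can be shown with a short sketch. The code below assumes synchronous updating and random lookup tables in the spirit of random Boolean networks; the function names and the data layout (one list of input indices and one truth table per gene) are choices made for this example and are not taken from the cited works.

```python
import random


def random_boolean_network(n_genes, k_inputs, seed=None):
    """Draw random wiring and random truth tables (hypothetical helper)."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n_genes), k_inputs) for _ in range(n_genes)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k_inputs)]
              for _ in range(n_genes)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronously update every gene from the current state."""
    new_state = []
    for gene_inputs, table in zip(inputs, tables):
        index = 0
        for i in gene_inputs:
            index = (index << 1) | state[i]  # pack input bits into a table index
        new_state.append(table[index])
    return tuple(new_state)


def find_attractor(state, inputs, tables):
    """Iterate until a state repeats; return (transient length, attractor period)."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    transient = seen[state]
    return transient, t - transient


if __name__ == "__main__":
    # Two-gene negative feedback loop: gene 0 = NOT gene 1, gene 1 = gene 0.
    loop_inputs = [[1], [0]]
    loop_tables = [[1, 0], [0, 1]]
    print(find_attractor((0, 0), loop_inputs, loop_tables))

    inputs, tables = random_boolean_network(n_genes=8, k_inputs=2, seed=1)
    print(find_attractor((0,) * 8, inputs, tables))
```

Because the state space is finite and the update is deterministic, every trajectory must end in a fixed point (period 1) or a cycle, which is what makes exhaustive attractor analysis feasible for small networks. The two-gene loop in the usage example cycles through all four states.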
Petri nets (PNs) are a formalism commonly used in modeling distributed systems [118]. PNs are graphical models that use nodes and edges to represent places and transitions, where places can contain variable numbers of tokens. When used to model GRNs, places represent molecules (proteins, mRNA, complexes), transitions represent possible reactions (excitation and inhibition, including reversibility), and tokens represent concentrations. There is a wide range of PN models that incorporate features such as stochasticity; as a result, PNs have direct relations to Boolean GRNs and the Gillespie algorithm. Matsuno et al. [108] present an application of PNs to modeling λ-phage gene expression that readily incorporates key relevant molecular species and transition types. The interested reader is directed to [19] for an extensive review of PNs for biochemical models.
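The mapping from molecules and reactions to places, transitions, and tokens can be sketched in a few lines. The representation below (a marking as a dictionary of token counts, a transition as a pair of `consume` and `produce` dictionaries) and the toy transcription and translation transitions are assumptions for illustration; they do not reproduce the model of [108].

```python
def enabled(marking, transition):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(place, 0) >= weight
               for place, weight in transition["consume"].items())


def fire(marking, transition):
    """Return the marking obtained by firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, weight in transition["consume"].items():
        new_marking[place] = new_marking.get(place, 0) - weight
    for place, weight in transition["produce"].items():
        new_marking[place] = new_marking.get(place, 0) + weight
    return new_marking


# Hypothetical toy network: the gene acts as a catalyst for transcription,
# and the mRNA acts as a catalyst for translation.
transcribe = {"consume": {"gene": 1}, "produce": {"gene": 1, "mRNA": 1}}
translate = {"consume": {"mRNA": 1}, "produce": {"mRNA": 1, "protein": 1}}

if __name__ == "__main__":
    marking = {"gene": 1, "mRNA": 0, "protein": 0}
    marking = fire(marking, transcribe)
    marking = fire(marking, translate)
    print(marking)
```

Choosing among the enabled transitions at random, with probabilities weighted by rate constants and token counts, would turn this discrete sketch into a stochastic Petri net, which is where the connection to the Gillespie algorithm arises.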
To study the network structure effects of a GRN, Banzhaf (2003) developed in [5] a model that uses a sequence of bits to represent a genome, with mobile protein elements that bind according to bit patterns, similar to transcriptional footprints [14]. The features of the model are demonstrated by showing a wide variety of dynamic characteristics, the effects of genetic perturbations, and the evolution of the model with an evolutionary strategy. This bit-string matching representation was simplified to an integer-based representation [30], which has been shown to be effective in numerous applications discussed in Section 5.
3.4 Evolution
Although we have focused our discussion on features of gene regulatory networks that are being modeled and analytical approaches to modeling, one of the most powerful approaches to understanding gene regulatory networks is the use of evolutionary methods. Gene regulatory networks are evolved reaction networks, and the existence of a naturally observed GRN is preceded by an evolutionary process that transformed the GRN into its observed state. An understanding of the evolvability of GRNs serves as a basis for understanding why particular gene regulatory mechanisms have emerged.
The ability to evolve the simplified bit-string-based GRN model of [5] to fit multiple mathematical expressions was explored in [93], where oscillatory, sigmoid, and exponential decay functions were successfully matched.
In a stochastic simulation model using the Gillespie algorithm, Leier et al. [96] show that a comprehensive GRN model with first-order and second-order reactions and homodimer formations can be evolved to obtain oscillatory dynamics. A particular challenge of this evolutionary problem is compensating for the noisy dynamics, which can shift the period and amplitude of observed oscillations; this leads to a need for evaluating each simulation trajectory and aggregating over the results, as opposed to observations.
It has been shown that in Boolean GRNs evolving under a gene duplication/divergence model, functionality can be well conserved even under extreme evolutionary conditions [1]. Long term evolution of biologically plausible GRNs using agent-based models shows the emergence of evolutionary sensors—genes that sense evolutionary pressures—that allow for rapid evolutionary change in response to environmental variation [27]. In a related study it was shown that under variations in fitness, such as environmental variation, the ability of GRNs to adapt to new environments can be enhanced [44]. In Section 5 we discuss a number of evolutionary methods and genetic representations that have been used in applications of GRNs.
4 Internal Dynamics of GRNs
Before applying gene regulation to agent control, researchers focused on understanding internal dynamics of gene regulation. In 1999, Reil was one of the first to present a biologically plausible model used in an artificial life context [117]. In his work, he randomly generated a set of variable-size binary genomes in which each gene started with the particular sequence 0101, named the promoter. Promoters exist in living systems: A very specific sequence of nucleotides, the TATA box, is known to identify a gene's starting position. As presented in Figure 3, Reil used a simple visualization technique to observe gene activation and inhibition over time with randomly generated networks. He obtained several activation patterns such as stable, chaotic, or oscillatory patterns. Reil also pointed out that after random genome deteriorations, the system was able to rebuild the same pattern through an oscillation period.
The artificial gene regulatory network proposed in [5] is strongly inspired by real gene regulation. In this work, the genome is coded as a set of 32-bit integers (in other words, a bit string). Each gene of a genome starts with a promoter coded by the sequence “XYZ01010101,” where XYZ is any bit sequence to complete a 32-bit integer. The combination “01010101” has a probability of 2^-8 in a bit string, equivalent to a TATA box from a real DNA sequence. The gene coded after a promoter has a fixed size of five integers (160 bits, each integer having 32 bits). Upstream from the promoter, two integers code for an enhancing site and for an inhibiting site, thus regulating gene expression activity. In this model, all DNA transcription mechanisms are omitted to focus on gene regulation dynamics itself. This kind of genome can produce various activation dynamics, as presented in Figure 4. Randomly generated genomes were used in these experiments.
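The genome layout described above can be sketched as a simple scan over a list of 32-bit integers. This is an illustrative reading of the description, not the implementation of [5]: the function name, the order of the two upstream integers (enhancer first, inhibitor immediately before the promoter), and the decision to skip past a gene once it has been found (so that genes do not overlap) are all assumptions.

```python
PROMOTER_PATTERN = 0b01010101  # the low-order 8 bits that mark a promoter
PROMOTER_MASK = 0xFF
GENE_LENGTH = 5                # integers of coding region after the promoter


def find_genes(genome):
    """Scan a list of 32-bit integers for promoters and extract gene records.

    Assumed layout (hypothetical): [enhancer, inhibitor, promoter, g0..g4].
    """
    genes = []
    i = 2  # a promoter needs two regulatory integers upstream
    while i + GENE_LENGTH < len(genome):
        if (genome[i] & PROMOTER_MASK) == PROMOTER_PATTERN:
            genes.append({
                "enhancer": genome[i - 2],
                "inhibitor": genome[i - 1],
                "coding": genome[i + 1:i + 1 + GENE_LENGTH],
            })
            i += GENE_LENGTH + 1  # assumption: genes do not overlap
        else:
            i += 1
    return genes


if __name__ == "__main__":
    genome = [7, 11, 13, 0xDEAD0055, 1, 2, 3, 4, 5, 9]
    print(find_genes(genome))
```

Since any of the 256 possible low-order bytes is equally likely in a random integer, a random genome of N integers is expected to contain roughly N / 256 promoters, which is consistent with the 2^-8 probability stated above.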
A key property observed in these networks is heterochrony [6]. As depicted by Figure 5, small changes in the network structure or concentration only imply small changes in network dynamics. This behavior is crucial when the network is evolved for artificial life applications: Heterochrony smooths the fitness landscape, making it more evolvable. This model has also been used in [18] to analyze the inner temporal dynamics of gene regulatory networks using pole-balancing and signal-processing benchmarks and its capacity to reproduce input signals within a delayed time frame.
Pictures and videos have been employed to visualize the dynamics of GRNs and observe the complexity of the behaviors generated. In research by Cussat-Blanc and Pollack (2012), the GRN, cloned to every pixel of the picture, computed the RGB components of each pixel of the picture [31]. Each GRN used the pixel coordinates (input proteins) to compute the color component (output proteins). GRNs were evolved using interactive evolution: Users were tasked with selecting the most beautiful images, and GRNs were mutated and recombined based on this selection mechanism. Some of the generated pictures, representing snapshots of the network dynamics, are presented in Figure 6, while Figure 7 presents screenshots of videos. This work allowed the exploration of possible behavioral structures generated by gene regulatory networks. For example, in picture (b) of Figure 6, repetitive patterns can be observed with some modifications. That shows the capacity of gene regulatory networks to produce modular patterns. Pictures (b–d) also show the ability of the GRN to produce both smooth and abrupt variations, as codified here in color transitions. Finally, picture (e) depicts the GRN's capacity to produce extremely complex behavior, with very different outputs for close input values. More details about properties highlighted in these pictures can be found in [31].
Videos can capture the temporal aspect of gene regulation. As presented in Figure 7, oscillatory behaviors can be easily visualized. Other videos show chaotic and steady-state behaviors, which are the two other main behaviors of gene regulatory networks in addition to oscillations.
The computability of gene regulatory networks has been studied extensively over the past years. A full review of computability in GRNs can be found in [103]. Artificial GRNs have also been used to investigate a number of questions in the context of evolution. Using Reil's DNA-like model, Rohlf and Winkler studied the evolvability of GRNs and showed a strong relationship between their robustness against noise and their robustness against genetic material deletions due to the evolutionary process [119], which are key properties in real-world applications. Schramm et al. investigated the role of redundancy in artificial gene regulatory networks [127], showing that genetic redundancy can enhance evolvability up to a point, after which greater redundancy becomes deleterious. Genetic networks were able to evolve modular genotypes when subjected to dynamic fitness landscapes [100]. Recent work has shown that evolved GRNs can achieve greater hierarchical modularity than neural networks [107].
Numerous studies remain to be performed regarding the evolution of GRNs, and some of those can be approached from the perspective of artificial life, which is our next topic.
5 Gene Regulatory Networks and Artificial Life
As in nature, the role of (artificial) gene regulatory networks in artificial life and evolutionary computing systems is manifold. In general terms, GRNs promote our understanding and implementation of the genotype-phenotype map in those systems, notably the nonlinearity between those representations. In this section, we shall discuss some of the most important applications.
The phenomena produced by GRNs can be classified into (i) interactions between genes through their expression products, (ii) spatial patterning of expression, and (iii) temporal structuring of expression. There is a close connection between those phenomena and applications, where spatial structuring enables embryogenesis and design, temporal structuring allows for development and dynamic control of agents, and interaction among genes (and their products) allows for neuromodulation and indirect encoding of various structures.
This section presents the main application work using artificial gene regulatory networks. Their use started in artificial embryogenesis, described in Section 5.1, which aims at reproducing the development of multicellular organisms. These models are perfect theoretical frameworks to develop, improve, and understand gene regulation in a setup comparable to or inspired by biological examples. Following this period, researchers started to use these models to solve real-world problems. Due to their intrinsic ability to control behaviors of cells, artificial gene regulatory networks were first transferred to agent-based systems, discussed in Section 5.2 in more detail. In this case artificial GRNs are used to produce agent behavior by using an agent's sensors as input proteins to an AGRN and using an agent's actuators to be controlled by the AGRN's output proteins. First real-world applications were taken from evolutionary robotics. Successful applications in virtual agents then led to more recent work using AGRNs as neuromodulators in neural-network-based learning systems (see Section 5.3) or as indirect encoders for neural networks or genetic programming trees (see Section 5.4).
5.1 Artificial GRNs in Artificial Embryogenesis
The previous section provided an explanation of the dynamics and properties emerging from gene regulation. In this section we discuss applications of artificial gene regulatory networks. One of the most obvious ones is artificial embryogenesis. Artificial embryogenesis draws inspiration from biological mechanisms involved in the growth of a living organism, from the initial single-cell zygote to a whole mature organism. In this process, gene regulation is a central mechanism that controls a wide range of interactions between the cells and their micro environment. Therefore, artificial gene regulatory networks have frequently been used in artificial embryogenesis. They have been employed to control cells, their cell cycle regulation (when to divide), their migration strategy, and their specialization (color or function). This section provides an overview of existing models of embryogenesis that are based on artificial gene regulatory networks.
5.1.1 Background on Pattern Generation
Turing produced the first work on modeling morphogenetic development in 1952. He suggested that a reaction-diffusion model could capture cell development from a chemical point of view [138]. In his model, a set of differential equations governs the dynamics of morphogen concentrations in an environment. Even though this seminal work did not include any discrete form of cells or genetic regulation, Turing's model has served as the basis for most of the past and current models of artificial embryogenesis and patterning [109].
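For orientation, a reaction-diffusion system for two morphogens is commonly written in the generic form below. The symbols are notation chosen here rather than taken from [138]: $a$ and $b$ are morphogen concentrations, $f$ and $g$ are their local reaction kinetics, and $D_a$ and $D_b$ are their diffusion coefficients.

```latex
\begin{aligned}
\frac{\partial a}{\partial t} &= f(a, b) + D_a \nabla^2 a, \\
\frac{\partial b}{\partial t} &= g(a, b) + D_b \nabla^2 b.
\end{aligned}
```

Spatial patterns can arise when diffusion destabilizes a homogeneous state that would be stable under the reaction terms alone, which requires the two diffusion coefficients to differ.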
Turing's model was mainly used to describe lower-level phenomena such as chemical substance diffusion in the environment (see Figure 8). Bio-inspired mechanisms have been used to add cells interacting with this environment, mainly for the diffusion of morphogens produced by organisms, as in [5, 30, 47, 52, 75, 90, 92, 117, 133].
At the cellular level, cellular automata, proposed by von Neumann in the 1950s, are considered a key contribution. At the time, von Neumann was working on self-replicating machines. He stated the hypothesis that a system able to manipulate elementary components could be capable of constructing a copy of itself. Due to the technical complexity of building such a machine, he imagined a universal automaton able to pick up and to assemble arbitrary components. During the same period of time, the mathematician Ulam was working on recursively defined objects. Recursive objects are cells in an infinite matrix, which can have two states: active or passive. Cells evolve over time following rules based on their neighborhood. Ulam suggested von Neumann use this kind of environment to avoid technical problems with his universal constructors. Von Neumann was successful with a proof for such a constructor, and in 1966 the theory of von Neumann's self-replicating machines, implemented as cellular automata, was posthumously published [143].
In 1970, Conway defined the famous Game of Life cellular automaton [55], in which cells have only two states (dead or alive) and two simple rules using a Moore neighborhood. Depending on the initial environment configuration (the distribution of living and dead cells at the beginning of the simulation), several shapes emerged and were able to move, to reproduce, or to merge (see Figure 9). In 1999, de Garis encoded the transition rules of a cellular automaton into a genome evolved with a genetic algorithm [34]. He observed that, using a von Neumann neighborhood, only 14 states were possible for a cell in a 2D environment at each time step. This allowed a simple coding of the rules in a genome, enabling evolution with a standard genetic algorithm. With this setup, de Garis was able to produce several simple shapes (e.g., triangles and squares) or more complex shapes (e.g., letters, turtles, and snowmen), as illustrated in Figure 10.
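The Game of Life rules are short enough to state in code: a dead cell with exactly three live Moore neighbors becomes alive, and a live cell with two or three live neighbors survives. The Python sketch below applies one generation on an unbounded grid. The set-of-coordinates representation and the name `life_step` are choices made for this illustration.

```python
from collections import Counter


def life_step(alive):
    """Return the next generation; `alive` is a set of (x, y) live cells."""
    # Count, for every cell, how many live cells lie in its Moore neighborhood.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in alive
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return {
        cell
        for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in alive)
    }
```

Starting from the five-cell glider `{(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}`, four calls to `life_step` reproduce the same shape shifted by one cell along each axis, a simple instance of the moving shapes mentioned above.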
5.1.2 Simulating Cell Differentiation
Whereas the previous work was focused on developing shapes, cell differentiation is one of the key aspects for simulating artificial embryogenesis. In a natural developmental process, cell differentiation dictates the specialization of a cell type. Starting from a unique cell, it allows the creation of various cell types, such as neurons, muscle cells and liver cells, which will have very different functions in organisms. In 1969, Kauffman introduced random Boolean networks (RBNs) [85] to simulate this feature. The interpretation of this regulatory network was simple: Each node controls a gene, and the node state represents the gene activity (activated or inhibited). The genome transcription produced the cell's final function.
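A random Boolean network can be sketched in a few lines of Python. In the illustration below, each node reads `k` randomly chosen nodes and applies a random truth table, and all nodes update synchronously. The function names, the synchronous update, and the attractor search are assumptions of this sketch rather than details given in [85].

```python
import random


def make_rbn(n, k, rng):
    """Build a network of n nodes, each with k random inputs and a random truth table."""
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [[rng.randrange(2) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def rbn_step(state, inputs, tables):
    """Synchronously update every node; `state` is a tuple of 0/1 gene activities."""
    nxt = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for j in node_inputs:
            index = (index << 1) | state[j]   # pack the input bits into a table index
        nxt.append(table[index])
    return tuple(nxt)


def find_attractor(state, inputs, tables):
    """Iterate until a state repeats; return (transient_length, cycle_length)."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = rbn_step(state, inputs, tables)
        t += 1
    return seen[state], t - seen[state]
```

Since the state space is finite and the update is deterministic, every trajectory ends in a fixed point or a cycle. Such attractors are commonly interpreted as the distinct cell types a network can express.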
In 1994, Dellaert and Beer proposed a developmental model using this network [36]. In their model, a Boolean network represented an artificial regulatory network for cell differentiation control. Boolean networks were at the time (and still are) a classical approach in computational biology to simulate gene regulatory networks of real living systems. In this work, the authors used a 2D matrix that allowed simple cell divisions: The initial organism was made of only one cell that was covering the entire grid; during the first division, the grid was split into two, horizontally or vertically (the division plane orientation was controlled by the genome), and the new cell could differentiate. The aim was to visually observe cell differentiation by a modification of the cell color and explore the capacity of Boolean networks to produce various shapes with the help of a genetic algorithm. Dellaert obtained several shapes, such as those presented in Figure 11.
In 1997, Eggenberger Hotz explored asymmetric division and division plane control with a model able to produce a simple creature with a user-defined shape. This creature was able to move in the environment, actuated by an artificial gene regulatory network [47]. With this design, the model was able to simulate natural mechanisms of asymmetric cell division [68]. This kind of division allowed cell differentiation by producing daughter cells with different proteins. The regulatory network produced a specific protein used to adjust the orientation of the cell division plane and the division timing. The regulatory network also controlled cell physical dynamics and its own gene regulation, which corresponded to adhesion coefficients between cells. Cells periodically emitted molecules that modified adhesion parameters between cells and the environment. With this setup, Eggenberger Hotz produced a growing creature with a T shape. It was able to move in the environment by modifying its morphology [48].
In 2004, Bentley used fractal AGRNs, in which genes are expressed as fractal protein subsets of the Mandelbrot set, with the proteins interacting through a fractal chemistry, to show that this structure can produce complex growing shapes with a very small amount of genetic material [13]. The system demonstrated the capacity of GRNs—when associated with a developmental process—to compress the data necessary to generate shapes and behaviors. Krohn studied the dynamics involved in this process and applied a fractal AGRN to classical control problems such as mountain car driving and pole balancing [91].
In 2005, Flann et al. used a graph implementation of an artificial regulatory network to develop pictures composed of differentiated cells (illustrated by Figure 12 [51]). Similarly to Dellaert, the aim was to explore differentiation mechanisms in cells, but here with an increased level of complexity in terms of shapes produced. In the graph in the lowest panel of the figure, each node represents the expression level of a distinct protein, and each edge represents the interaction between proteins. In this model, cell coloration (see upper panels of the figure) revealed the cellular differentiation. Whereas simple shapes were easily produced with this kind of network, the use of multiple networks in parallel was necessary to produce more complex shapes. In this case, protein concentration levels had to be combined to determine the global gene activity.
Chavoya and Duthen developed an artificial gene regulatory network model in 2008 to solve the French flag problem [20]. It was inspired by the model in [5]. The goal was to explore the coevolution of shape and color (i.e., cell differentiation), both controlled by the same AGRN. Illustrative results are presented in Figure 13. The authors used a cellular automaton to generate the shape, based on de Garis' work, where rules were enabled or disabled by the artificial gene regulatory network. Moreover, morphogen gradients, pre-positioned in the environment, gave localization information to cells and generated further information for the artificial gene regulatory network. The authors obtained perfectly scalable flags and furthermore several shapes such as multicolor squares, triangles, and polyhedrons (3D).
In his study of complex systems, Doursat used a model based on gene expression levels to simulate the developmental process of complex shapes [41, 42]. An artificial regulatory network composed of three layers was used:
- A first layer used positioning data given by morphogens.
- A central layer contained boundary nodes, which allowed horizontal and vertical segmentation of the embryo. Gene regulation was also managed with this layer, thanks to the production of activator and inhibitor proteins.
- A third layer determined the regulatory protein production thanks to the concentrations of activator and inhibitor proteins produced by the second layer.
1. Cell division allowed each cell of the organism to divide with a particular probability.
2. Intercellular adhesion forces, based on a mass and spring, kept the global consistency of the organism.
Also in 2008, Joachimczak and Wróbel proposed stepping up to the third dimension and into continuous space, substantially increasing the complexity of the models [75]. Every cell had an artificial gene regulatory network and regulated a quantity of morphogens produced on its own. These morphogens guided the development of daughter cells in the environment. Cells had various sizes, depending on their stage of evolution. The organism's genome was one of the strengths of this model because of its capacity for complexification over generations. It was composed of a list of genetic elements where each element has a specific type with different functions during the genome parsing stage. The main types were:
- Regulatory elements (also called promoters): regulated the activation of genes.
- Genes: products or substrates produced by the cell that were used to give pieces of information to the regulatory elements. They could be internal (intracellular), external (extracellular; also called morphogens), or receptors (interacted with external products and influenced the cell division axis).
- Special elements: coded the outputs of the regulatory network.
In 2011, Cussat-Blanc et al. used a discrete developmental model in which cells were controlled by an artificial gene regulatory network in order to produce 2D colored shapes [28]. In contrast to the previous model, the growing organism was required to develop a metabolism based on an artificial chemistry defined in the environment. The model was based on the coevolution of a bit-string artificial gene regulatory network that controlled the cell specialization into different colors, and a rule set that controlled the proliferation of the cells. Both were evolved using an evolutionary algorithm: the rule set to produce a shape and the necessary metabolism to survive, and the artificial gene regulatory network to specialize the cells as targeted by the user. The authors showed how morphogen gradients guided the AGRN to the regionalization of the cells. Figure 16 illustrates this model.
The models described in this section inspired many researchers to develop their own artificial regulatory networks or to apply such systems to specific problems. As one of many possible examples, Bongard and Pfeifer used a model close to Reil's model to develop modular robots [15]. These robots had a neural network that controlled each module (for rotation, elongation, etc.). The genetic expression of the artificial gene regulatory network allowed the activation or the inhibition of 23 phenotypically predefined transformations, such as module size growth, module division, parameter modifications, and neural network topology.
5.1.3 Morphology and Controllers
Based on their previous work, Schramm et al. used an artificial gene regulatory network to evolve both the morphology and the controller of virtual animats [73, 74, 78, 126]. The morphology was grown from a single cell using a developmental model comparable to the one presented above. The artificial gene regulatory network was then used to control the cell behavior (cell division, division-plane orientation, etc.). After a given developmental time, the morphology was frozen. The cells were transferred to a simulated aquatic environment, and the cell aggregate was transformed using Delaunay triangulation to a set of masses (centers of the cells) and springs (cell connections). The artificial gene regulatory network previously used to control the growth of cells was now used to control the spring stiffness in order to move the animat in the environment. AGRNs were evolved to reach the maximum distance. Figure 17 presents one of the animats obtained with such a method.
Such approaches brought artificial creatures closer to resembling living systems. Living systems learn locomotion and everything else during their entire life, from the very few first cells to the end of their lives; however, most previous work did not take this into consideration and used a fixed morphology. In this domain, AGRNs paired with a developmental model could exhibit powerful solutions, given their capacity of adaptation to changing environments. The oscillatory behavior of AGRNs was used to control the ATRON modular robot [149]: The robot, possessing a snakelike structure, was successfully controlled by an artificial gene regulatory network with the fractal representation [13].
In all previous work, artificial gene regulatory networks and developmental models were used with a specific objective described by a fitness function of the evolutionary algorithm. However, living beings do not act in the environment with an explicit lifelong objective function. While using a fitness function helps to quickly obtain interesting results and simplify the analysis of the creatures obtained, it can trap the system in a local optimum due to the engineering of the fitness function itself.
In 2014, Doursat and Sánchez gave an overview on how coevolution of morphology and controller using evolution and development can help to generate multicellular soft robots [43]. After classifying existing approaches in modular/soft robotics into four different categories of morphogenetic engineering, from human assembly (constructing) to rewriting/inserting (generating), passing through syncing/swarming (coalescing), and growing/aggregating (developing), they presented the three key components of their system, MapDevo3D, a multicellular soft-robot-growing platform:
- Cells (represented as a swarm) can adhere through elastic forces.
- Positional information (morphogens) can diffuse in the environment for cell-cell communication.
- An artificial gene regulatory network can control cells.
Disset et al. (2014) proposed simplifying this fitness function and building a virtual environment complex enough to allow creatures with complex behaviors to emerge [38]. The fitness function was extremely simple: The artificial creature (and thus the artificial gene regulatory network that controls its cells) was evaluated based on its survival duration. The evolutionary algorithm, which evolves the AGRN, could explore all the possible strategies to reach this aim. In these first experiments, the authors proposed evolving 2D creatures fighting against harmful particles. Cells could, in addition to classical division and communication with morphogens, specialize into two different types: nutritive (capacity to extract energy from the environment) and protective (capacity to resist the particles). Cells had to self-organize in order to survive as long as possible.
In 2016, this work was extended to the third dimension and explored the developmental strategies in a more realistic environment in which cells must proliferate both in soil, which contains nutritive resources, and in the air, where sunlight transforms the nutrients into energy [39]. Once again, the fitness function consisted of only the survival duration of the organism. In this experiment, the results were at first deceptive: Evolution was stuck in a local optimum. The authors explored novelty search strategies in order to surpass this deceptive result. (Novelty search had already been used in previous work on artificial embryogenesis [79].) The authors obtained complex growth strategies such as, for instance, the one presented in Figure 19, with simple diversity measures balancing the survival duration objective.
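As a rough illustration of how a diversity measure can balance a survival objective, the Python sketch below computes the usual novelty score, the mean distance from an individual's behavior descriptor to its `k` nearest neighbors, and blends it with survival duration. The behavior descriptors, the Euclidean distance, the linear blend, and the `weight` parameter are all hypothetical. The measure actually used in [39] is not specified here.

```python
import math


def novelty(behavior, others, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest neighbors in `others`."""
    nearest = sorted(math.dist(behavior, o) for o in others)[:k]
    if not nearest:
        return 0.0   # no reference behaviors yet, so nothing to be novel against
    return sum(nearest) / len(nearest)


def combined_score(survival, behavior, others, weight=0.5, k=3):
    """Hypothetical blend of the survival objective with a novelty bonus."""
    return (1 - weight) * survival + weight * novelty(behavior, others, k)
```

Rewarding individuals that behave differently from those already seen gives evolution a way out of a local optimum in which survival duration alone no longer improves.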
5.1.4 Bridging the Gap to Reality
While up to this point computational simulations of GRNs have advanced beyond the reach of synthetic biology, biological engineering has been rapidly catching up, with applications from recording images [97] and basic image processing [134] using engineered microbes, to quorum-sensing regulated drug delivery [151]. A number of computational elements have been developed using synthetic biology [66], and the development of BioBricks [130] has stimulated a movement in synthetic biology akin to the early days of electronic engineering. The limits of phenomena that can be understood from the perspective of biological circuit engineering have yet to be discovered [65]. The field and future of synthetic biology are well beyond the scope of this review, and so we direct the interested reader to some of the extensive reviews of the field [24, 54, 86].
5.2 Agent Control
Artificial embryogenesis was one of the first applications of artificial gene regulatory networks. Because AGRNs serve as the central controller of living cells, artificial embryogenesis was a perfect and natural framework to develop the models and explore their capabilities. Once they reached a satisfactory level of quality, researchers started to evaluate possible use of these controllers in different types of applications, more oriented towards real-world problems. This subsection presents the use of artificial gene regulatory networks of different kinds in the control of virtual agents.
One of the first agent-control applications was to use an artificial gene regulatory network to control a pole cart [91, 113]. This experiment, a typical benchmark in the evolutionary computation community, consists of balancing a pole on top of a cart. The cart moves left or right (with no possibility to stay put) in a one-dimensional continuous environment of limited space. The controller senses the position of the cart, the pole angle, the cart velocity, and the angular velocity of the pole. This benchmark has been solved with multiple machine learning approaches [10, 146].
In 2010, Nicolau et al. used a bit-string artificial gene regulatory network based on the model in [6]. The model included input and output proteins connected to the sensors and the effectors of the cart. These proteins had specific hand-designed signatures, and their concentrations were updated differently than regulatory proteins: Input protein concentrations were not regulated by the network but fed by the cart sensors, and output proteins were regulated but did not regulate other proteins. In order to evolve the GRN to find the optimal controller, a (250+250) evolution strategy (ES) with up to 50 generations was used with a mutation consisting of mutating 1% of the genome bits with a 1/5 adaptation rule. The authors showed rapid convergence of the ES-evolved GRN with very good generalization capability of the network. Using the generalization test from [146], the evolved AGRNs showed close to optimum generalization behavior. With the same model, Nicolau et al. demonstrated later a decision-making system for index trading. The system decides to buy, sell, or do nothing according to the fluctuations of trading indices provided [112].
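The evolutionary loop described above can be sketched as a (mu + lambda) evolution strategy on bit strings with a 1/5 success rule. The defaults mirror the figures quoted in the text (250 parents, 250 offspring, 50 generations, 1% initial bit-flip rate). Everything else is an assumption of this sketch: uniform parent selection, the adaptation factor of 1.1, the bounds on the mutation rate, and the `fitness` callable standing in for the pole-cart evaluation.

```python
import random


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ (rng.random() < rate) for bit in genome]


def evolve(fitness, n_bits, mu=250, lam=250, generations=50, rate=0.01, seed=0):
    """(mu + lam) ES; returns (best_fitness, best_genome, best_fitness_per_generation)."""
    rng = random.Random(seed)
    pop = [[rng.randrange(2) for _ in range(n_bits)] for _ in range(mu)]
    scored = sorted(((fitness(g), g) for g in pop), key=lambda p: p[0], reverse=True)
    history = [scored[0][0]]
    for _ in range(generations):
        children, successes = [], 0
        for _ in range(lam):
            parent_fit, parent = rng.choice(scored)   # assumption: uniform parent choice
            child = mutate(parent, rate, rng)
            child_fit = fitness(child)
            successes += child_fit > parent_fit
            children.append((child_fit, child))
        # 1/5 rule: raise the rate if more than one child in five improves, else lower it.
        if successes > lam / 5:
            rate = min(0.5, rate * 1.1)
        else:
            rate = max(1e-4, rate / 1.1)
        # Plus selection: parents compete with children, so the best never gets worse.
        scored = sorted(scored + children, key=lambda p: p[0], reverse=True)[:mu]
        history.append(scored[0][0])
    best_fit, best = scored[0]
    return best_fit, best, history
```

A quick check with a toy objective is `evolve(sum, n_bits=32, mu=10, lam=10, generations=30)`, which maximizes the number of ones in the genome. In the setting of the text, `fitness` would instead decode the genome into a GRN and score it on the pole-cart task.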
A similar experiment was developed by Trefzer et al. in 2010, in which an artificial gene regulatory network was used to solve various obstacle-avoidance tasks (cave, maze, distributed obstacles) [137]. Interestingly, the problem was implemented on an E-Puck robot, which shows the capacity of AGRNs to bridge the gap to reality. This work was recently extended with artificial epigenetic networks, in which artificial gene regulatory networks are used as the central system to dynamically distribute control tasks [139].
AGRN models have been employed to control agents in ecosystems for foraging tasks. Joachimczak and Wróbel [77] presented AGRNs expressing rich dynamics and motion patterns. Harrington and Magbunduku (2017) presented evidence suggesting that competitive dynamics in an ecology of genetically regulated agents can stimulate the evolution of complex behavior [63].
In 2014, in a complex environment, Sanchez and Cussat-Blanc used an artificial gene regulatory network to control a virtual car in TORCS, a simulated car environment [125]. In this work, an artificial gene regulatory network's input proteins were connected to the car sensors (distance to the track border and longitudinal and lateral speeds; see Figure 20), and the output proteins to the actuators (wheel, accelerator, and brake). After evolution on asphalt tracks only, the best network obtained was able to drive on any kind of track (turn shapes, etc.) and on other surfaces (asphalt, ice, rocks, etc.). Figure 20 shows the best AGRN obtained after evolution. This approach was extremely competitive with other approaches from the literature: It was able to win the Simulated Car Racing Championship in 2015 against eight other competitors of various kinds (neural networks, optimized scripts, etc.).
5.3 Neuromodulation
Recent work has begun to make a connection between learning and genetic regulation. In 2013, Harrington et al. studied a robot navigation problem with a robot controlled by a temporal-difference reinforcement learning agent [64]. By introducing a neuromodulatory system governed by a GRN to control the agent's learning and memory, the robot was able to outperform traditional reinforcement learning. In a follow-up study, the ability of the genetically regulated neuromodulation system was utilized in a multi-task setting, where agents had to solve an array of different problems with both discrete and continuous state spaces, as well as one-shot and continuous rewards [29]. Agents were required to learn to solve a series of problems, while the same AGRN was used to regulate the learning parameters for each problem. It was shown that an evolved GRN could accelerate the learning of multiple tasks, and general problem-solving GRNs could improve learning beyond traditional reinforcement learning.
5.4 AGRN as Indirect Encoders
By exploiting previously developed algorithms for the genetic programming of register machines, Banzhaf and Lasarczyk (2005) realized an artificial-chemistry implementation of genetic programming [7]. The asynchronous evaluation of the evolved program represented an alternative approach to genetic programming, based on the parallel nature of chemical systems.
In 2012, Lopes and Costa implemented a similar idea of using a bit-string AGRN to indirectly encode a GP tree [105]. The AGRN was evolved using an evolution strategy. Before evaluation, genes following promoter sites expressed operators of the GP tree, and the inhibitor and enhancer sites were used to connect the operators. Lopes (2015) used this indirect encoding to evolve programs in symbolic regression problems (n-bit parity, squares, Fibonacci series, etc.), with promising results with respect to the quality of solutions and the very small programs generated [106].
In the same period, Wróbel and colleagues proposed using artificial gene regulatory networks to encode artificial neural networks [147, 148]. They used a leaky integrate-and-fire model of a spiking neural network (SNN) [33], in which regulatory units (equivalent to regulatory proteins) encoded the neurons of the regulatory network, connections between neurons were given by protein connections through inhibiting and enhancing sites, and the protein concentrations provided the potential of each neuronal membrane. With this encoding, the authors showed the capacity of the produced SNN, after evolution of the AGRN, to reproduce spikes when input stimuli were of high enough frequencies. Recently, indirect encoding of recurrent neural networks into genetic networks was used to solve dynamic problems such as state space targeting in a numerical dynamical system, the inverted pendulum, and orbit transfer control in a gravitational system [139].
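For readers unfamiliar with the neuron model, the Python sketch below integrates a single leaky integrate-and-fire unit with a forward Euler step. It illustrates only the neuron dynamics, not the AGRN encoding of [147, 148]. The parameter names and default values (`tau`, `v_thresh`, `v_reset`, and so on) are arbitrary choices made here.

```python
def simulate_lif(input_current, dt=1.0, tau=10.0, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Euler-integrate a leaky integrate-and-fire neuron; return the spike time steps."""
    v = v_rest
    spikes = []
    for t, current in enumerate(input_current):
        # Leak toward the resting potential while integrating the input.
        v += dt * (-(v - v_rest) + current) / tau
        if v >= v_thresh:
            spikes.append(t)
            v = v_reset   # reset the membrane potential after a spike
    return spikes
```

With these defaults, a constant input of 0.5 never reaches threshold, whereas a constant input of 2.0 yields a regular spike train (`simulate_lif([2.0] * 21)` returns `[6, 13, 20]`). The neuron fires only when its drive is strong enough, which is the kind of thresholded response described above.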
The approaches introduced here are still in the early stages of their development, yet they are growing in complexity. Their main objective is to use an AGRN's very compact encoding to evolve large networks: An AGRN possesses a small genome in comparison to the millions or billions of parameters needed to optimize a deep neural network, for example. This compact encoding is expected to reduce the computation cost of optimization, as in nature.
5.5 Summary of the Applications
Artificial gene regulatory networks have been studied in a variety of contexts and applications, from dynamical systems to circuit design. In Table 1 we have listed key references for applications of AGRNs of which we are aware. Early research on AGRNs focused on the study of dynamic behaviors of networks using a wide range of representations. Recently, advanced applications of AGRNs have favored the use of bit strings and ODE-based models. As research on AGRNs continues, we expect that applications will continue to utilize and extend such models.
Table 1. Key references for applications of AGRNs, by AGRN representation.
Problem | ODEs | RBN | Bit strings |
Dynamics analysis | [30] [31] [91] [100] [107] [127] | [1] [27] [37] [44] [83] [84] [85] [110] [131] | [5] [6] [18] [93] [96] [117] [119] |
Morphogenesis | [13] [15] [28] [38] [39] [41] [42] [47] [48] [51] [73] [74] [75] [76] [126] [138] | [20] [36] | [90] |
Agent control | [4] [63] [73] [74] [77] [91] [125] [149] | [112] [113] [137] [139] | |
Neuromodulation | [29] [64] | | |
Indirect encoding | [147] [148] | [7] [105] [139] | |
Figure 21 presents the references from Table 1 over the years, partitioned into the categories of dynamics study, evo-devo (i.e., artificial embryogenesis), agent control, neuromodulation, and indirect encoding. We take this as an indicator of research interest in each topic. Publication years are grouped into 5-year bins counted backward from 2017. Interestingly, we can observe a massive increase in publications at the beginning of the 21st century, along with the appearance of the first applications. Evo-devo was a dominant benchmark supporting the theoretical development of artificial gene regulatory networks (certainly because it is a very natural use of these networks) up until very recently (2012). Since 2012, however, evo-devo topics seem to be decreasing in frequency,⁵ and the field appears to be transitioning to more real-world applications of gene regulation (agent control, neuromodulation, and indirect encoding). This seems to be due to a certain maturity of the technology and a better understanding of AGRN functioning. It should also be noted that theoretical publications are still necessary to further develop artificial gene regulatory networks, since knowledge of their biological counterpart is still expanding very quickly and transfer between the two communities (ALife and genetics/bioinformatics) is recurrent, owing to the strong links between the models.
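The binning of publication years can be sketched as follows. This is one plausible reading of "5-year bins counted backward from 2017" (bins 2013 to 2017, 2008 to 2012, and so on); the function name `bin_label` and the sample years are hypothetical and are not the data behind Figure 21.

```python
from collections import Counter


def bin_label(year, end_year=2017, width=5):
    """Return the (first, last) year of the bin containing `year`.

    Bins are counted backward from `end_year`, so the most recent bin
    is (end_year - width + 1, end_year). This layout is an assumption.
    """
    offset = (end_year - year) // width     # 0 for the most recent bin
    last = end_year - offset * width
    return (last - width + 1, last)


# Hypothetical publication years, for illustration only.
years = [1999, 2004, 2009, 2012, 2013, 2016, 2017]
histogram = Counter(bin_label(y) for y in years)
for (first, last), count in sorted(histogram.items()):
    print(f"{first}-{last}: {count}")
```

Under this reading, 2012 falls in the last year of the 2008 to 2012 bin, which matches the boundary used in the discussion above.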
6 Conclusion
In this article we have reviewed existing work on the use of gene regulatory networks for computational purposes. We first introduced how genetic regulation works in living systems, followed by a discussion of existing computational models. That discussion shows the diversity of encodings and dynamics currently in use. A rigorous comparison of models has yet to be performed, although an initial study comparing various encodings and dynamics has recently been conducted [40]. Without doubt, the community would benefit from standardized benchmarks to facilitate the comparison of various models of gene regulation as well as other optimizable models, such as artificial neural networks, genetic programming, and handwritten scripts. The increasing frequency of competitions organized at conferences is one step in this direction, and serves as a good basis of comparison.
When presenting artificial gene regulatory networks and their applications to various real-world control problems, we are often asked about the difference between artificial gene regulatory networks and artificial neural networks (ANNs). While the two can be used for similar purposes, AGRNs utilize a compact genetic representation: For instance, instead of encoding connection weights between neurons, which can mean optimizing millions of variables in recent deep neural networks, artificial gene regulatory networks encode only the 3D structure of proteins, which determines the dynamic interactions between them. This drastically reduces the number of variables, to a few hundred. This has widespread consequences, especially in the age of deep learning (DL) [60]. While applications of evolutionary algorithms to DL have just started to appear [101, 121], we expect that evolving DL neural networks with AGRNs will be a major application area in the future.
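The order-of-magnitude gap can be checked with a back-of-the-envelope count. The sketch below assumes a tag-based genome in which each protein carries three tags plus a type field, with two global dynamics coefficients; the protein count and the layer sizes of the dense network are arbitrary illustrative values, and the function names are hypothetical.

```python
def agrn_genome_size(n_proteins, tags_per_protein=3, global_params=2):
    """Number of evolvable values in an assumed tag-based AGRN genome."""
    # Each protein: its tags plus one type field (input, output, or regulatory).
    return n_proteins * (tags_per_protein + 1) + global_params


def dense_ann_params(layer_sizes):
    """Weights and biases of a fully connected feed-forward network."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


print(agrn_genome_size(50))                     # 202
print(dense_ann_params([784, 512, 512, 10]))    # 669706
```

The comparison only concerns the size of the search space; it says nothing about whether the two models reach comparable performance on a given task.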
For a direct comparison between AGRNs and ANNs, however, the recurrent connectivity of AGRNs makes recurrent neural networks the most appropriate counterpart. While there is significant work to be done in relating artificial gene regulatory networks to artificial neural networks, initial steps have been taken in [145]. There, Watson et al. show that evolving simple artificial gene regulatory networks is equivalent to the associative learning of weights in a Hopfield network. However, this observation has not been extended to artificial gene regulatory networks with more complex genetic representations. Also, Baran et al. recently proposed the use of AGRNs to study the evolution of social behavior and, more precisely, the underlying development of the brain's neural circuitry [8]. This opens new perspectives for studies of the connection between artificial neural networks and artificial gene regulatory networks.
Two other properties of AGRNs that are, in our opinion, particularly interesting and not yet fully used and understood are temporal dynamics and heterochrony. The first, temporal dynamics, allows a certain memory to emerge in the network: Concentrations can be updated constantly, at every time step of the simulation or problem resolution, while actions are executed once in a while. This provides the network with all the history of a given state of the environment, which is naturally kept by protein concentrations and provides a memory system of the GRN. Not yet mathematically studied or fully understood, these dynamics could be beneficial for long-term decision making.
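The memory effect can be made concrete with a deliberately simple stand-in. The sketch below does not implement any AGRN dynamics from the literature: a single concentration is modeled as a leaky average of its input (an assumption), updated at every step and read out as an action only every few steps. The function name and all parameter values are hypothetical.

```python
def run(inputs, decay=0.9, action_every=5):
    """Update one concentration at every step; emit an action occasionally."""
    concentration = 0.0
    actions = []
    for t, signal in enumerate(inputs, start=1):
        # Updated at every time step, so past inputs leave a decaying trace.
        concentration = decay * concentration + (1.0 - decay) * signal
        if t % action_every == 0:
            # The action is taken rarely, but it reflects the whole history.
            actions.append(concentration)
    return actions


# Both runs see the same input (0) at the moment of the readout,
# but the first one was stimulated during the four preceding steps.
with_history = run([1, 1, 1, 1, 0])
without_history = run([0, 0, 0, 0, 0])
print(with_history[0] > without_history[0])
```

The readout differs between the two runs even though the instantaneous input is identical, which is the sense in which concentrations act as a memory of the environment.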
The second, heterochrony, is a crucial property of these networks. As described previously in this article, this mechanism allows a slow modification of the network dynamics when mutation occurs. This mechanism is not yet sufficiently employed in current mutation operators of genetic algorithms. While crossover operators have been recently improved in [29], mutation is still crucial in AGRN optimization, since most approaches use very high mutation rates (∼75%), if not exclusively mutation. Given how strongly the NEAT algorithm has impacted the evolution of neural networks [132], improving the evolutionary algorithm is a central question in order to find the best possible network for a given problem. More work is necessary in this domain in order to generate better results with artificial gene regulatory networks.
Finally, one domain in which artificial gene regulatory networks could excel but have not been well tested is online learning. Thanks to their easy-to-modify structure based on protein affinities, slight changes to the proteins' tags while the agent is acting in the environment should be possible. A mechanism comparable to backpropagation in artificial neural networks will have to be designed in order to intelligently change these values according to the rewards obtained by the agent. The architecture of artificial gene regulatory networks should be helpful here, due to the small number of parameters one needs to modify in order to change entire networks. One could easily imagine particle-swarm-optimization-like motion, in which the AGRN's proteins would move in a 3D space (for a model based on three tags such as Cussat-Blanc et al.'s [39, 40, 125] model), attracted and repelled by other proteins according to the efficacy of the networks for a given task.
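To give a flavor of what such motion could look like, the sketch below applies the textbook particle swarm update to candidate tag triples in a unit cube. It is purely hypothetical: no such mechanism has been proposed or tested in the works cited here, the reward function is a toy stand-in for an agent's task reward, and the names (`pso_tags`, `toy_reward`) and coefficients are assumptions.

```python
import random


def pso_tags(reward, n_particles=10, steps=50, seed=0, w=0.7, c1=1.5, c2=1.5):
    """Standard particle swarm search over a 3D tag vector in [0, 1]^3."""
    rng = random.Random(seed)
    dim = 3  # one coordinate per tag (three-tag model assumed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [reward(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(steps):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Keep tags inside the unit cube.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = reward(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_reward(tags, target=(0.2, 0.5, 0.8)):
    """Hypothetical reward: negative squared distance to an arbitrary target."""
    return -sum((t - g) ** 2 for t, g in zip(tags, target))


tags, value = pso_tags(toy_reward)
print(len(tags), value <= 0.0)
```

In an actual online-learning setting the reward would come from the agent's interaction with its environment rather than from a fixed target, and the attraction and repulsion between proteins would have to be defined over the whole network, which this sketch does not attempt.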
Possibilities opened by gene regulatory networks are numerous. Whereas biologists have made significant progress in understanding the inner mechanisms of gene regulation in living systems, much remains to be discovered and understood. These mechanisms produce extremely complex behaviors in living organisms, from embryogenesis to the regulation of everyday life. Computer science and more specifically artificial intelligence will benefit from these discoveries and, with gene regulatory networks, could produce more intelligent behaviors for artificial agents in the near future.
Acknowledgments
K.H. thanks the NSF Idaho EPSCoR Program and the National Science Foundation under award number IIA-1301792. W.B. thanks the Canadian NSERC for funding under discovery grant numbers RGPIN 283304-2012 and 2018-05365, and Michigan State University for providing JR Koza Endowment funding.
Notes
T = thymine and A = adenine.
More examples and the software to generate pictures and videos can be found online: http://www.irit.fr/~Sylvain.Cussat-Blanc/ColorfulRegulation/index_en.php.
The science of engineering growing systems.
The evo-devo community is now also moving to application domains such as computational biology and soft robotics.