Plants are frequently wounded by mechanical impact or by insects, and their ability to adequately respond to wounding is essential for their survival and reproductive success. The wound response is mediated by a signal transduction and regulatory network. Molecular studies in Arabidopsis have identified the COI1 gene as a central component of this network. Current models of these networks qualitatively describe the wound response, but they are not directly assessed using quantitative gene expression data. We built a model comprising the key components of the Arabidopsis wound response using the transsys framework. For comparison, we constructed a null model that is devoid of any regulatory interactions, and various alternative models by rewiring the wound response model. All models were parametrized by computational optimization to generate synthetic gene expression profiles that approximate the empirical data set. We scored the fit of the synthetic to the empirical data with various distance measures, and used the median distance after optimization to directly and quantitatively assess the wound response model and its alternatives. Discrimination of candidate models depends substantially on the measure of gene expression profile distance. Using the null model to assess quality of the distance measures for discrimination, we identify correlation of log-ratio profiles as the most suitable distance. Our wound response model fits the empirical data significantly better than the alternative models. Gradual perturbation of the wound response model results in a corresponding gradual decline in fit. The optimization approach provides insights into biologically relevant features, such as robustness. It is a step toward enabling integrative studies of multiple cross-talking pathways, and thus may help to develop our understanding how the genome informs the mapping of environmental signals to phenotypic traits.
The ability to adequately respond to environmental challenges and pathogens is essential for a plant's evolutionary fitness. Responses are triggered by external signals, such as temperature, light, or pathogen components. These are sensed by receptors, which feed into signal transduction cascades that eventually lead to the generation of the response [12, 44]. The pathways leading from the perception of the signals to responses are, to a large extent, genetically predetermined. Regulatory gene networks (RGNs) and signal transduction cascades mediate the decoding of genetic information and ultimately the production of the response to environmental signals and challenges.
RGNs and signal transduction systems are composed of multiple genes and the products encoded by the genes. Gene products include RNAs and proteins, synthesized via the classical central dogma processes of transcription and translation. In a wider sense, molecules synthesized by proteins acting as enzymes may be considered to be more indirect products of a gene. Activity, or expression, of each gene is regulated by a specific set of products. Thus, RGNs are complex systems, and analysis of their constituent components in isolation is not sufficient to understand their dynamics and their system-level properties.
As central mechanisms for the realization of phenotypic properties, RGNs continue to attract interest in ALife. They have been the subject of a wide range of computational models [2, 27, 31, 36]. More recently, computational systems biology  aims to construct computer models of regulatory networks, and to use them to advance system-level understanding. Such models must constitute an explicit and detailed representation of a regulatory network, and they must be able to approximate the empirically observed gene expression dynamics quantitatively. RGNs are also important as an evolvable mechanism shaping the fitness landscape of natural evolution , and their complexity has been utilized, for example, to evolve robot controllers [4, 38].
RGN modeling can be carried out in a forward and a reverse direction. In forward modeling, network models are obtained by assembling the model directly. This can be based on generic properties, such as network connectivity , or activation, inhibition, and diffusion . Alternatively, specific regulatory networks can be modeled, such as those that organize circadian clocks  or flower morphogenesis [16, 22]. Forward models are succinct, and they are direct formalizations of biological hypotheses, but their validation against gene expression data is often difficult and sometimes purely qualitative. Complementary to the forward approach, reverse engineering of RGNs uses empirical gene expression measurements to construct a model in a data-driven manner. Examples of reverse engineering techniques include genome-wide disruption networks  or the ARACNE method . Reversely engineered models quantitatively fit the data they are based on by construction. However, identifying which biological hypotheses are supported by a reverse-engineered model is typically difficult, and it is often risky, since reverse engineering is almost always underdetermined  unless the network is very small.
Two aspects can be discerned for most regulatory network models. Firstly, the topology specifies the network elements in qualitative terms by describing elements such as genes and gene products, along with the regulatory interactions among these elements. Secondly, parameters, such as the decay rate of a gene product or the maximal strength of the regulatory effects that a gene product exerts on a target gene, describe the quantitative aspects of a regulatory network. We summarily refer to such parameters as dynamical parameters. Given a gene expression data set and a topology, the dynamical parameters can be estimated or reversely engineered (see, e.g., [28, 35, 45]). However, reconstructing both topology and dynamical parameters is not possible to date, mainly because such an approach would require data sets of a size that cannot be provided even by the most powerful high-throughput methods.
In the approach we present here, we combine forward engineering of topologies with reverse engineering of parameters. In this way, we aim to provide a top-down view of biological networks that captures gene expression in quantitative detail at the same time , and to quantitatively assess models of the regulatory networks underpinning the wound response in Arabidopsis based on gene expression measurements.
We use the transsys modeling framework , which provides a computer language that facilitates description of RGNs in a form that renders them amenable to various computational approaches and analyses. A key feature of transsys RGN models is their usability as components in integrated models (e.g., of plant development) as shown in [22, 23]. transsys has also been used to study methods for the analysis of gene expression data [23, 41]. It provides facilities for quantitatively describing regulatory effects in a programmatic way, and for simulating continuous-valued gene expression levels that can readily be compared with microarray measurements. The application programming interfaces include facilities for parameter optimization , which are useful for the approach presented here.
Arabidopsis is frequently used in experimental plant biology due to its compactness in physical size, length of life cycle, and genome size.1 Devoto et al.  have conducted gene expression measurements to study the signal transduction network mediated by the plant hormone methyl jasmonate (MeJA, subsequently referred to as jasmonate) and by wounding in Arabidopsis, and they have assembled a network model based on their data. This Boolean network model provides a qualitative account of the expression of 33 target genes. Based on this Boolean model, we have developed a transsys model of the wound response network. This enables a direct and quantitative computational evaluation of the model.
We describe here a novel reverse engineering approach to validate a computational model of a regulatory gene network against empirical data. We provide a novel method to evaluate such models when the kinetic parameters of the regulatory interactions involved are unknown. This method allows direct use of the empirical data to compare a candidate topology to a number of alternative topologies, in order to establish that the candidate is better capable of explaining the data than the alternatives. We introduce a model that is composed of key components of the network organizing the wound response in Arabidopsis, and apply our method to evaluate this model by comparing it with alternative models constructed by various randomization procedures. This approach will contribute to strengthening the link between computational systems biology and experimental molecular biology.
2.1 The Wound Response in Arabidopsis thaliana
Devoto et al.  identified the COI1 gene as a key regulator mediating the response to wounding and to MeJA. Using Affymetrix microarrays, the authors measured expression of more than 8,000 genes in wild-type Arabidopsis Col-gl1 and coi1-16 plants, untreated and following wounding and MeJA treatment.
Measurements were obtained for wild-type plants before treatment (five replicates), 30 min after wounding, and 6 hr after wounding (two replicates), and with treatment with jasmonate after 1 hr, after 6 hr (three replicates), and after 48 hrs. For coi1-16, measurements were made before treatment (three replicates), 30 min after wounding, and 6 hr after wounding (two replicates), and with treatment with jasmonate after 1 hr and after 6 hr (two replicates). The authors identified 513 differentially expressed genes and proposed a Boolean model of the transcriptional network regulated by wounding and MeJA. Following the approach used by Genoud et al., inputs that are continuous in nature, such as jasmonate concentrations, as well gene expression levels, have to be discretized (e.g., by defining a threshold value for each such quantity). The Boolean model [11, Figure 4] is composed of the COI1 gene and several unlabeled gates, and connects the inputs jasmonate and wounding to the expression of 33 target genes, the outputs. Figure 1 shows a modified and extended version of this Boolean model.
The set of expression profiles of the 33 target genes provide the basis for evaluating our computational RGN models. From this set, we computed gene expression profiles with exactly one measurement for each plant line, treatment, and time point by averaging data where replicated measurements were carried out. The set of gene expression profiles thus obtained is called the empirical data set henceforth.
The gene expression profiles in the empirical data set consist of 11 measurements for each gene (subsets of six measurements for wild type and five for coi1-16, consisting of reference measurements before treatment and measurements at two or three points in time after each of the two treatments). Mathematically, the expression profile of a target gene is thus an 11-dimensional vector.
transsys  is a framework for computational RGN modeling, comparable in some regards to systems such as SynTReN  or Gepasi . The transsys computer language formally represents RGNs as transsys programs. An example is shown in Figure 2. A transsys program is composed of components of two types, genes and factors. Factors can be used to model transcription factors as well as any other effector molecules that impinge on gene expression. Each component is specified by an individual declaration. factor declarations contain the gene product's decay rate. gene declarations contain a promoter block and a product block. The promoter block represents the cis-regulatory information of a gene and is composed of promoter elements. It determines the gene's rate of expression as a function of the expression levels of the regulators of the gene. Each promoter element specifies a rate of expression. constitutive promoter elements specify a fixed expression rate. activate and repress elements state a trans-acting factor that activates or represses the gene, respectively. The strength of activation or repression is determined by a Michaelis-Menten function (see Electronic Supplement, http://www.mitpressjournals.org/doi/suppl/10.1162/ARTL_a_00076, Section C.1), which is parametrized by vmax, the maximal regulatory effect, and KM, the concentration of the trans-acting factor at which it exerts half its maximal effect. An individual promoter element models a transcription factor's binding site and the effect that binding of the factor has on the gene's expression. The product block states which factor is the gene's product. The effects of the individual elements of a gene's promoter are accumulated additively. Further details of the transsys language are provided in [22, 23] and also on the transsys Web site.2
transsys simulates gene expression dynamics indiscrete time steps. Given a transsys program and the current expression levels of factors, a single update is the computation of the expression levels in the next time step. Repeated updates computed in this way generate synthetic expression profiles of the factors. In this work, we compare these synthetic expression profiles with empirical measurements in order to evaluate transsys programs for their ability to explain empirical data.
transsys programs can be separated into a program structure, which determines what the regulators of each gene are and whether the effect of a regulator is activation or repression, and a program parametrization, which comprises the numerical parameters contained in the program. Figure 2 illustrates this by displaying two transsys programs with identical structures but different parametrizations. transsys program structures are the forward-engineered models that we evaluate in this study. The potential of each model to explain the empirical gene expression profiles is quantified by computing parametrizations that are optimized to fit the synthetic expression profiles to the empirically observed ones.
2.3 A transsys Implementation of the Jasmonate- and COI-Mediated Network
The Boolean network model in  is composed of two input signals (jasmonate and wounding), the COI1 gene, and the 33 target genes. Seven categories of genes are defined and labeled a through g. The information of this Boolean network, along with further information provided in , has been used to develop a transsys model. Figure 3 illustrates the network structure and its implementation in transsys.
The genes are straightforwardly implemented in transsys as genes. All other entities are represented as transsys factors; the set of factors consists of jasmonate, the conceptual signal wounding, the COI1 gene product representing the COI1 protein,3 and the products of the 33 target genes. jasmonate and wounding are not encoded by any genes, so that these products cannot be generated by the network itself. These two factors are input signals, and therefore their expression levels are externally controlled, as described in further detail below.
The regulatory circuitry is implemented in the promoter blocks of the transsys program. The coi1 gene is activated by the input signals jasmonate and wounding. Figure 3 thus shows arrows leading from these factors to the coi1 gene in the top part, and the underlying activate elements in the promoter of coi1 in the bottom part. The COI1 product activates target genes in groups a and f, and represses target genes in groups b, c, and g. Independently of COI1, jasmonate activates target genes in group b and represses those in group e. wounding activates genes in groups b, d, and f; and it represses genes in groups e and g, independently of COI1. These activatory and repressive effects are straightforwardly implemented by activate and repress promoter elements, respectively. In addition to these regulatory promoter elements, a constitutive element is added to the promoters of all target genes. The transsys program thus constructed is henceforth called the wound response model or program.
2.4 Generating Synthetic Expression Profiles
Simulated gene expression data sets were generated to have the same structure as profiles in the empirical data set. They consist of reference measurements for wild-type and coi1-16 mutant plants, and measurements after treatment with jasmonate and by wounding. The coi1-16 mutation was simulated by replacing the product of the coi1 gene with COI1_nonfunc. This product does not appear as a trans-acting factor in any promoter element and thus simulates a mutation that renders the gene's product entirely nonfunctional. In terms of molecular genetics, this corresponds to a mutation leading to a complete loss of function in the gene's coding region.
Gene expression dynamics starting with arbitrary initial gene expression levels typically go through a transient phase before gene expression levels equilibrate into a stable pattern. The initial transients are largely an artifact of simulation, while the stable pattern is determined by the regulatory network and is thus a significant aspect of the object of modeling. We therefore followed the common practice of discarding such initial transients. For both the wild-type network and the network with the COI1 loss of function mutation, the untreated state is simulated by starting with expression levels of all factors set to 0 and performing 100 updates. This equilibrated state is used as the reference measurement for computing log ratios, as described in Section 2.5.
Wounding and jasmonate treatment were simulated by taking the untreated state and setting the expression level of the wounding factor and of the jasmonate factor to 1, respectively. The levels remain at 1 once set, because the wounding and jasmonate factors have decay rates of 0, so the signal remains “on” once it is “switched on.” Following these input settings, expression profiles consisting of measurements corresponding to those provided by the empirical data set were simulated. For this purpose, one transsys update was defined to correspond to 5 min (i.e., synthetic measurements after 6, 12, 72, and 576 updates correspond to empirical measurements after 30 min, 1 hr, 6 hr, and 48 hr, respectively). The synthetic reference measurements and synthetic measurements after simulated treatments are then assembled into a synthetic data set, which is subsequently compared with the empirical data set. Synthetic data generation is described in further detail in the supplementary materials (Electronic Supplement, Section C.3).
2.5 Comparing Expression Profiles
The simulation procedure to generate synthetic data sets from transsys programs is designed so that the synthetic expression profiles have the same 11-dimensional format as empirical gene expression profiles, thus rendering the synthetic profiles directly comparable to the empirical profiles. Quantitatively comparing gene expression profiles requires a measure of distance between profiles. Distances between data sets are defined as the sum of distances between each empirical profile and the corresponding synthetic profile.
We used squared Euclidean and correlation distances between profiles. The distance based on squared Euclidean distance is denoted by Dssq (for “sum of differences squared”). This distance evaluates to 0 only if two profiles are identical. We furthermore used the uncentered Pearson correlation, following Eisen et al. . As correlation S is a similarity indicator taking on values within [−1, 1], we transformed correlation coefficients to the dissimilarity semimetric S′ = 1 − S. Values of S′ range from 0 to 2, and smaller values indicate greater similarity. A value of S′ = 0 indicates maximal correlation, which occurs if one profile is equal to the other when all its expression levels are multiplied by a scaling constant greater than 0. In this case, the profiles can be considered to have the same shape on a different scale.
The regulatory response to a treatment (i.e., whether the gene is upregulated or downregulated) is in many cases more important than the absolute levels of gene expression. This is quantitatively captured by the ratio of a gene's expression level after treatment to its expression level in the reference state before treatment. A disadvantage of these expression ratios is that upregulation by a factor u yields a ratio of u while downregulation by the same factor u yields a ratio of 1/u, thus making upregulation and downregulation difficult to compare. This is overcome by using logarithmized expression ratios (log ratios ). The log-ratio transformation is routinely applied to gene expression data [15, 29], and we have therefore included it in our study. Both sum of squared differences and correlation distance can be applied to log-ratio-transformed data sets. We denote them by Dssq(l) and Dcorr(l), respectively. Further details about the distance measures are given in the supplementary materials (Electronic Supplement, Sections C.4 and C.5).
2.6 transsys Parameter Optimization
The parameters of the target genes and their products were subjected to parameter optimization to minimize distance (and thus maximize similarity) between the empirical data set and the synthetic data within the constraints given by the candidate model structures. For this purpose, the distance measures Dssq, Dcorr, Dssq(l), and Dcorr(l) were employed as objective functions, and a gradient search strategy was applied to minimize the objective function. Note that attaining an objective-function value of 0 would correspond to a transsys program parametrized so that it generates target gene profiles that are identical, or completely correlated, to the profiles in the empirical data set.
The optimization procedure is:
Choose an initial current parametrization at random.
Determine the gradient at the current point by computing the objective for parametrizations offset by small negative and positive values for each parameter.
Determine the new parametrization by conducting an interval search for the nearest local minimum in the direction of the gradient.
If the improvement achieved by the new parametrization is less than 1%, or if the distance between the current and the new parametrization is less than 0.001, output the current parametrization as a local optimum and terminate.
Use the new parametrization as the current parametrization, and continue with step 2.
The distance measures as a function of the transsys program parametrization generally have multiple local optima, and as a consequence, results obtained by the local gradient search used here strongly depend on the initial parametrization. Therefore, we conducted multiple searches starting from different initial parametrizations for each transsys program.
Some extreme parametrizations of a transsys program (e.g., where parameters are very close to 0 or very large relative to other parameters) result in artifacts that are caused by the limits of numerical precision. Such problems can be avoided by restricting the values that are explored by the optimizer. Further to avoiding numerical artifacts, this approach may also help to improve optimization. Therefore, dynamical parameter values were constrained by the following conditions:
Decay rates are within the range [0, 1].
Parameters to constitutive promoter elements are in the range [0, 10].
The Michaelis constants (KM) for activate and repress promoter elements are in the range [0.01, 10].
The vmax values for activate and repress statements are in the range [0, 100].
For each parameter type, the constraints were chosen to allow generation of expression levels on the order of magnitude found in the empirical data; for example, the maximal values of the constitutive parameters, KM and vmax, combined with a low decay rate, would result in synthetic gene expression levels in excess of the highest expression levels in the empirical data set. Further details of the optimization procedure are given in the supplementary materials (Electronic Supplement, Section B).
2.7 Comparative Assessment of the transsys Program Model
We assessed the consistency of the wound response model by comparing it with alternative transsys program models—called null models, randomly regrouped models, and perturbed models—described in further detail below. For all models, we optimized parameters in multiple, independent optimization runs and compared the final objective-function values attained. Comparisons are summarized using boxplots as shown in Figures 4 and 5, and also based on t-tests and U-tests, as shown in the Electronic Supplement (Section D).
The null model is constructed by removing all activate and repress elements from the target gene promoters, leaving only the constitutive element intact. Thus, in the null model, all target genes have a constant equilibrium expression level that is determined by the constitutive element and unaffected by the input signals wounding and jasmonate, or by the COI1 factor. The null model serves as a minimal baseline, all other models are expected to perform better than the null model.
Randomly regrouped models are generated by assigning each gene to a group chosen at random. The number of genes in each group is preserved in this process, which thus corresponds to drawing random promoters without replacement. After optimization, randomly regrouped models are expected to perform better than the null model on average, as part of their regulatory structure will typically be capable of mimicking some characteristics of the empirical data by chance. On the other hand, if the wound response model captures the structure of the biological network well, it should perform significantly better than the randomly regrouped models.
Finally, perturbed models are versions of the wound response program generated by choosing n target genes at random and assigning them to a random gene group different from their original group. This technique facilitates more gradual changes of the original transsys program. With n = 0, the wound response model is not changed, while with n = 33, no target gene remains in its original group. Assignment of genes to their original group is excluded to ensure that the number of perturbations indicates the divergence of a perturbed model from the wound response model with minimal random variation.
The null model is derived from the wound response model by removal of components only. Therefore, each model can be parametrized so that it becomes equivalent to a null model—for example, by setting all vmax parameters in the promoter elements of the target genes to 0. As a result, the global minimum of the objective function attainable with the null model is at least equal to the global minimum that can be attained with the other models. Consequently, the scores obtained with the wound response model must be expected to be significantly smaller than those obtained with the null model. Deviations from this inequality indicate that the scores are not valid as a basis for assessing correspondence between models and the regulatory network underpinning the empirical measurements. Such cases may occur, for example, with very rugged objective functions, or by an unsuitable choice of parameter value ranges made accessible to the optimizer. It is therefore important to include a null model as a control that indicates such problems.
Computer programs to generate randomly regrouped models and perturbed models were written in the Python programming language , using of the Python application programming interface provided by the transsys framework. Parameter optimization was implemented in Python, partially based on the transsys optimization code developed for . The R programming language  was used for data analysis and visualization.
We carried out 50 optimizations for the wound response model, the null model, and 20 randomly regrouped models. The results obtained for the distance measures Dssq, Dcorr, Dssq(l), and Dcorr(l) are shown in Figure 4.
For optimization using the sum of squared distances to the untransformed empirical profiles (Dssq, Figure 4a) the wound response model and the null model achieved scores that, given the overlap of the interquartile ranges and the notches,4 are not significantly different. The scores attained by the randomly regrouped models are subject to substantial variation that exceeds the interquartile ranges and the notches, indicating that the randomly regrouped models vary significantly with respect to their fit to the empirical data set. The scores of the randomly regrouped models are higher than those of the wound response model, that is, they fit the empirical data set less closely than the wound response model, and the notch criterion indicates that this difference is significant. t-tests and U-tests (data shown in the Electronic Supplement, Section D) confirm that there is no significant difference between the wound response model and the null model in Figure 4a, and show not all randomly regrouped models yield optimized scores that are significantly higher than those obtained with the wound response model.
Optimization using the correlation distance applied to the untransformed profiles (Dcorr, Figure 4b) results in a clear distinction of the wound response model from the alternative models. The interquartile range of the scores attained with the wound response program does not overlap with the interquartile ranges of the scores obtained with the null or randomly regrouped models. The fact that the notch range of the score with the wound response program score boxplot does not overlap with the notch ranges of the other boxplots further supports the conclusion that the fit to the empirical data achieved with the wound response program is significantly better than that achieved with the alternative models. The randomly regrouped models do not fit to the empirical data set significantly better than the null model, indicating that the better fit of the wound response model can be ascribed specifically to the designation of the gene groups, rather than to the presence of regulatory connectivity between the input signals and the target genes in general.
Optimization using distances of the the log-ratio-transformed profiles (Dssq(l), Figure 4c, and Dcorr(l), Figure 4d) result in an even clearer indication of the better fit of the wound response model than that of the alternative models. In contrast to optimization of Dcorr, the randomly regrouped models with optimized parameter fit the empirical log-ratio-transformed profiles better than the null model does. With the sum of squared differences, the optimization results show much more variation with the individual randomly regrouped model than with the correlation-based distance, indicating that Dssq(l) is more sensitive to differences in models than Dcorr(l).
The results of parameter optimization on perturbed variations of the wound response model are shown in Figure 5. The values of all four objective functions increases the number of perturbation increases. This correlation is more pronounced with the distance measures that operate on the log-ratio-transformed data (Dssq(l) and Dcorr(l)).
The transsys wound response model captures a part of the regulatory and signal transduction network that mediates responses to wounding in Arabidopsis thaliana. Specifically, it includes explicit representations of the molecules jasmonate and COI1, enabling explicit modeling of jasmonate treatment and of loss of COI1 function. As an advantage over the previous Boolean model by Devoto et al. , the transsys model individually represents each target gene. Therefore, it can generate synthetic gene expression profiles for each target gene. These can be directly compared with microarray gene expression measurements, thus establishing a more direct validation of the model against empirical data.
Arabidopsis has been the object of a number of computational modeling approaches. Mintz-Oron et al. presented highly detailed metabolic network models , which, however, do not cover regulation. The most advanced models of regulation and signaling in Arabidopsis pertain to auxin signaling [14, 17]. A recent model of jasmonate signaling  focuses on transient pulses that result from jasmonate activating its own synthesis. While our model does not include this aspect, it is the first model that explicitly includes target genes and wounding as an environmental input signal, and that is quantitatively validated against gene expression data.
Trans-acting factors and other components of the gene expression machinery interact in various ways, resulting in complex functions that determine gene expression. The transsys model of gene regulatory effects, based on additive contributions of individual promoter elements, is a simplification of these complex functions. It is therefore important to note that the numerical parameters of a promoter element do not directly represent biochemical properties.
Promoter elements in a transsys program should be considered to formally express the hypothesis that a trans-acting factor exerts a regulatory effect on the respective gene. While the detailed quantitative nature of regulatory effects may be unknown and different for each gene, it is plausible (and commonly assumed) that the effect of trans-acting factors on their target genes is monotonic and saturable. The transsys model of gene regulation, outlined in the Electronic Supplement (Section C.1), incorporates these two key properties, and the functions underlying the activate and repress elements can be parametrized to approximate any monotonic and saturable function fairly closely. Promoter elements in a transsys program represent semiquantitative hypotheses in this sense.
On the molecular level, binding sites commonly realize regulatory effects, and transsys promoter elements represent binding sites in the RGN. In contrast to this, Boolean functions that are used in Boolean networks do not directly model individual binding site effects. Therefore, a transsys model is a more realistic computational model with respect to capturing monotonic and saturable regulation, and with respect to modeling individual binding sites.
The empirical data set contains measurements for a small number of irregularly spaced time points for each condition. This precludes the use of parameter estimators described in  and , which require estimation of slopes along time series. The optimizer we have used does not have that requirement. However, it is not deterministic, and its results are subject to noise. Furthermore, results of individual optimizations do not provide a basis for reliably comparing networks. We have therefore performed sets of multiple optimizations for each candidate network structure and used boxplots to verify that these sets provide sufficient power to discriminate the structures compared in this study.
Robustness against changes or random fluctuations is an important feature of biological networks [16, 20, 48]. A network is robust in this sense if changes in its parametrization result in small changes in the objective score. Therefore, robust optima can be expected to be surrounded by larger basins of attraction. From a perspective of evolutionary developmental biology, it is interesting to observe that gradient search is more likely to find optima that have a large basin of attraction. More specifically, network structures with objective functions that have the same set of optima receive different median scores if the basins of attraction surrounding the optima differ in size. The median optimized objective scores can therefore be interpreted to indicate robustness of the wound response model, in addition to its consistency with the empirical data. Thus, in contrast to the Boolean model, the transsys wound response model implicitly takes robustness into account. It would be interesting to further investigate this important biological property, perhaps by computationally exploring the “epigenetic landscape”  of the wound response program and its alternatives. This would provide a point of departure for finding out whether this landscape is less rugged with biologically evolved topologies than with average random topologies, or whether the optima in biological topologies are flatter  or less deceptive  than in the average case.
The Arabidopsis wound response model connects the input signals of wounding and jasmonate to regulatory responses in 33 target genes. Its implementation in transsys enables comparative evaluation of alternative models on the basis of empirical gene expression measurements. This evaluation is more direct and objective than previous approaches, which involve more human interpretation, such as discretizing continuous variables (e.g., by means of thresholds).
The purpose of our method is to determine whether the structure of the wound response model (i.e., its topology) enables it to explain the empirical data better than the alternative models do. We use optimization as a tool to characterize the potential of a network structure to approximate the observed gene expression profiles. The parametrizations produced in this process should not be interpreted to be approximations to the true parameters in the biological system. Our results show that collections of networks with the same structure and independently optimized dynamical parameters provide a statistical basis for discriminating structures.
Several distances or metrics can be used to quantitatively score the match between empirical and synthetic gene expression profiles. Following , we have have included Euclidean and correlation measures and applied them to untransformed and log-ratio-transformed expression data. The choice of the distance measure may have a profound effect on the characteristics of the objective function (e.g., with regard to ruggedness). Also, the network model structure may impinge on the objective function's shape. The performance of the parameter optimization technique may be subject to variation depending on such characteristics of the objective function, and this may result in spurious discrimination. Therefore we have included a null model, which has no regulatory connections. If the scores of a candidate model are not significantly better than those of the null model, the results cannot be used for model discrimination, due to problems in the design of the objective function or the optimization procedure.
This control indicates that optimization of Euclidean distance between profiles of absolute gene expression values (i.e., with the Dssq objective function) is not suitable for model discrimination and that correlations of relative change (i.e., the Dcorr(l) objective function) yields best results. This shows that choice of an adequate objective function is important for our method, and it is consistent with the view that the shape of an expression profile carries more information about the gene's function than absolute gene expression values, as suggested by . It also indicates that the log-ratio transform, which is now widely adopted, is suitable for the method we introduce here.
The implementation of the wound response model in transsys enables further extensions and analyses. We currently work to extend the wound response transsys model by including further regulatory factors in order to explore further features of the internal regulatory circuitry that results in the different target gene groups. The empirical data set used for scoring will be extended in order to complement the growth in regulatory complexity captured by the candidate models.
The approach introduced here is applicable to a wide spectrum of computational models. For wound response modeling, it would be interesting to include aspects of crosstalk with other pathways and its relationship to protein degradation , as well as to the spatial organization of the wound response . It would also be worthwhile to incorporate components of the jasmonate signaling and regulatory network that have been discovered more recently, most notably the JAZ proteins  and possibly some of their complex interactions . Having established the computational systems biology framework for modeling the jasmonate signaling network, we now plan on extending the model to reflect the emerging integrated view of jasmonate signaling [21, 43], and to assess extended models against further empirical data as these become available. Finally, the biological context of the wound response network is the activation of target genes to implement an appropriate response, based on input that originates from sensory mechanisms that detect wounding. From this perspective, the wound response network can be considered as a component in a sensor-actuator system, and it would therefore be interesting to systematically study the properties of alternative RGN models within a sensor-actuator framework, such as .
This work was supported by the Biotechnology and Biological Sciences Research Council (grant BB/F009437/1).
See The Arabidopsis Information Resource, http://www.arabidopsis.org/.
Identifiers printed in the monospace font, such as COI1, refer to elements in transsys program code, while identifiers printed in proportional fonts, such as COI1, refer to molecular entities.
R documentation for the boxplot function states on the notches (i.e., the indentations around the median in the boxplots that are used in Figures 4 and 5) that “if the notches of two plots do not overlap this is ‘strong evidence’ that the two medians differ,” and the documentation for boxplot.stats adds that this is “said to be rather insensitive to the underlying distributions of the samples.”
Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Gogerddan, Aberystwyth, Ceredigion, SY23 3EB, United Kingdom. E-mail: email@example.com
School of Biological Sciences, Royal Holloway University of London, Egham, Surrey, TW20 0EX, United Kingdom. E-mail: Alessandra.Devoto@rhul.ac.uk
School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ, United Kingdom. E-mail: J.G.Turner@uea.ac.uk