## Abstract

We describe a method to identify *relevant subsets* of variables, useful to understand the organization of a dynamical system. The variables belonging to a relevant subset should have a strong integration with the other variables of the same relevant subset, and a much weaker interaction with the other system variables. On this basis, extending previous work on neural networks, an information-theoretic measure, the dynamical cluster index, is introduced in order to identify good candidate relevant subsets. The method does not require any previous knowledge of the relationships among the system variables, but relies on observations of their values over time. We show its usefulness in several application domains, including: (i) random Boolean networks, where the whole network is made of different subnetworks with different topological relationships (independent or interacting subnetworks); (ii) leader-follower dynamics, subject to noise and fluctuations; (iii) catalytic reaction networks in a flow reactor; (iv) the MAPK signaling pathway in eukaryotes. The validity of the method has been tested in cases where the data are generated by a known dynamical model and the dynamical cluster index is applied in order to uncover significant aspects of its organization; however, it is important that it can also be applied to time series coming from field data without any reference to a model. Given that it is based on relative frequencies of sets of values, the method could be applied also to cases where the data are not ordered in time. Several indications to improve the scope and effectiveness of the dynamical cluster index to analyze the organization of complex systems are finally given.

## 1 Introduction

In artificial as well as in natural life one often encounters systems that show some forms of organization but cannot be understood by referring to simple hierarchical models (e.g., a tree). In most interesting cases one is faced with complicated situations, sometimes referred to as “tangled hierarchies,” where a clear-cut hierarchy of levels, with a unique well-defined direction of information flow, cannot be found.

Moreover, in several cases the interactions among the interesting variables are largely, or at least partly, unknown; therefore it is necessary to infer some hypotheses about the organization by observing its behavior “from outside,” that is, as a black box or a gray box.

In this article we will address the issue of identifying sets of variables that are good candidates as *relevant subsets* for describing the organization of a dynamical system. We will assume that some variables can be observed and that they change in time (possibly reaching an attractor state), and we will look for groups of variables that can represent the required relevant subsets for describing the system organization.

Note that the very notion of a relevant subset is somewhat ill defined. However, this difficulty is shared by several other interesting concepts, such as that of a cluster. In general, we stress that a good candidate relevant subset (CRS for short) should provide important indications about some key features of the system organization. While that might seem abstract at this stage, we will clarify how it works in a number of examples, and then we will come back to the issue of the definition.

The main features of good CRSs can be tentatively identified as follows:

- 1.
The variables that belong to a relevant subset show strong interaction and integration with the other variables of the same relevant subset.

- 2.
They show weaker integration with the other system variables or relevant subsets.

The outcome of the analysis we propose here is essentially a list of subsets, ranked according to the above criteria, that provide clues to understand the system organization. The list cannot be used with brute force methods: Its application to a particular case requires some care, but it can lead one to discover very interesting inobvious relationships.

A peculiar feature of the method that will be described and applied below is that it does not require knowledge of the system structure and rules. To clarify this statement, consider for example the case of a network composed of *N* nodes; to each node *i* a dynamical variable *x*_{i} is associated, which changes at discrete times (*t* = 1, 2, …) according to a precise first-order differential (or difference) equation. As usual, we will say that there is a directed link from node *i* to *k* if *x*_{i} appears on the right-hand side of the dynamical equation of *x*_{k}. The set of dynamical equations and the set of links will be collectively referred to as the dynamical rules and the topology of the network, respectively. It is important to stress that the application of the method only requires knowledge of the behavior in time of the variables *x*_{1},…, *x*_{N}, without any prior knowledge of the topology or of the dynamical rules. All that is needed is a set of numerical values of the relevant variables over time; therefore the method can be applied to models as well as to real-world data. If further information concerning the system (e.g., its topology) is available, it can be profitably used, as will be discussed in the final section; but it is not necessary.

An interesting feature of our approach is that it relies upon information theory. While the idea of applying concepts and measures of information theory to the study of complex systems is certainly not new, we found that a measure of this kind, the *cluster index* that was introduced by Tononi and Edelman [28] in the study of neural networks close to a stationary state, can be profitably generalized to study dynamical systems. This generalization is called here the *dynamical cluster index* (DCI). This measure will be described in detail in Section 2, but we want to remark here that it can be applied both to nonlinear systems with different attractors, and to transients; the choice will be dictated by the available information, and when both are available, both can be used.

The DCI can be applied to any subset of the system variables; its evaluation allows us to identify those subsets that are strongly integrated among themselves, and loosely interacting with the rest of the system. While it cannot be claimed that these subsets always correspond to “important” intermediate levels, they are good candidates for that role, so the application of the DCI is a good method to draw attention to hierarchical relationships, when there are any.

Note that the dynamical cluster index has actually been introduced in [32]; the present article describes further developments of the method (see Sections 2 and 7), and it also shows new interesting applications (see Sections 4, 5, and 6) that highlight its effectiveness and help uncover some of its features. In order to make the article self-contained, the DCI is also fully reviewed (Section 2), and an application to a model of random Boolean networks, which represents a basic test bed for the method itself, is given in Section 3.

A methodological aspect concerns the way in which this approach can be tested. As has already been stressed, it can be applied to data as well as to models, but in order to ascertain its properties it is useful to focus on the latter. We will therefore consider in this work dynamical models, whose role will be twofold: On the one hand, they will generate data concerning the behavior in time of the system variables, which will be used to feed the method that will yield a list of candidate relevant subsets. On the other hand, the structure of the model is perfectly well known to us, so we will compare the proposed organizational hints with the true organization of the model. In this way we will get information about the reliability of the method, as well as clues concerning the way in which its outcomes should be used to make proper inferences.

In the following, after presenting and discussing the method in Section 2, we will show the results of its application to various models. In Section 3 the random Boolean network framework will be considered, in order to verify that the method is able to identify subsets that make sense. In Section 4 a peculiar (artificial) model made of leaders and followers will be introduced, and it will be used to test the response of the system to external changing stimuli and to increasing noise levels.

While the systems considered in Sections 3 and 4 are Boolean, in the following sections the variables to be analyzed are continuous. Yet, in order to apply the proposed method, a peculiar discrete coding will be used, which does not describe the instantaneous values of the state variables, but rather the signs of their first-order time derivatives. This three-level coding (increasing, constant, decreasing) has proven very effective in analyzing a simulated chemical system in a continuously stirred tank reactor, individuating the relevant chemical subnetworks (described in Section 5), as well as in interpreting the results of a model, based on experimental data, of the dynamics of protein phosphorylation-dephosphorylation in the MAPK system, whose details are given in Section 6. This is so far the most complex case that has been analyzed with the DCI method.

Finally, in Section 7 we will revisit the method, taking into account the lessons learnt from the case studies. In particular, we will show that the DCI is a useful technique, but its application requires some ingenuity, and it should not be considered a brute force method. Some of the problems, to be discussed also in Section 7, are related to the fact that good clusters of *k* variables usually include good clusters of *k* − 1 variables, so identifying the “more relevant” subsets may be an ill-defined and controversial problem.

The DCI is only one (actually, the simplest) measure out of a number of alternatives based on information theory that can be applied to identifying CRSs. Indications for further studies will also be given in the final section.

## 2 The Dynamical Cluster Index

Let us consider a system *U*, composed of *N* elements assuming finite and discrete values. The cluster index, as defined by Edelman and Tononi [28], is an information-theoretical measure based on the Shannon entropy of both the single elements and sets of elements in *U*.

*x*

_{i}is defined aswhere

*V*

_{i}is the set of the possible values of

*x*

_{i}, and

*p*(

*v*) the probability of occurrence of the symbol

*v*. The entropy of a pair of elements

*x*

_{i}and

*x*

_{j}is defined by means of their joint probabilities:Equation 2 can be extended to sets of

*k*elements by considering the probability of occurrence of vectors of

*k*components.

In this work, we deal with observational data; therefore probabilities will be estimated by means of relative frequencies.

*C*(

*S*) of a set

*S*of

*k*elements is defined as the ratio between two other measures, namely, the

*integration I*(

*S*) of

*S*, and the

*mutual information*between

*S*and the rest of the system,

*U*−

*S*. Let us define the integration

^{1}asHere

*I*(

*S*) represents the deviation from statistical independence of the

*k*elements in

*S*. Its maximum value is (

*k*− 1)log(

*b*), where

*b*is the number of levels of the variable; so the ratio

*I*/(

*k*− 1) provides useful information to compare the integration of subsets of different size. This ratio can in some cases directly provide useful indications to identify candidate relevant subsets; however, it has been observed that the dynamical cluster index is effective even in cases where the normalized integration is not sufficient. Then, let us define (as usual) the mutual information

*M*(

*S;U*−

*S*) aswhere

*H*(

*A*|

*B*) is the conditional entropy and

*H*(

*A,B*) the joint entropy. Finally, the cluster index

*C*(

*S*) is defined as

Since *C* is defined as a ratio, it is undefined in all those cases where *M*(*S;U* − *S*) vanishes. In this case, however, the subset *S* is statistically independent from the rest of the system, and it therefore has to be analyzed separately. These cases can be screened in advance. In particular, if *I* is finite while *M* = 0, then *C* → +∞ and *S* is to some extent integrated, but also segregated from the rest of the system. It can be an interesting subset, but not of the kind we are looking for.

Note that *C*(*S*) scales with the size of *S*; therefore the cluster index values of subsystems of different size cannot be in principle compared. To overcome this limitation, cluster index values of systems of different size need to be normalized. To this aim, let us define a reference system: the homogeneous system *U*_{h}, randomly generated in accordance with the probability of each single state measured in the original system *U*, along all its series of states. Then, in order to have a reference value for the cluster index for each size, for each subsystem size of *U*_{h} the average integration 〈*I*_{h}〉 and the average mutual information 〈*M*_{h}〉 are computed.

*S*can be normalized by means of the appropriate normalization constant based on the size of

*S*:Furthermore, to assess the significance of the differences observed in the cluster index values, a statistical index

*T*

_{c}is computed:where 〈

*C*

_{h}′〉 and σ(

*C*

_{h}′) are the average and the standard deviation of the population of normalized cluster indices with the same size of

*S*from the homogeneous system, and ν = 〈

*M*

_{h}〉/〈

*I*

_{h}〉 is the normalization constant. Note that from Equation 7 we conclude that, in order to compute the statistical significance, the normalization with respect to ν is not a necessary step.

The definitions (5)–(7) are made without any reference to a particular type of system. In their original articles, Edelman and Tononi considered fluctuations around a stationary state of a neural system. Here this measure will be applied to time series^{2} of data generated by a dynamical model. In general, these data lack the stationarity properties of fluctuations around a fixed point. Moreover, depending upon the case at hand, either transients from arbitrary initial states to a final attractor, or collections of attractor states, can be considered (see Sections 3 and 4). For reasons that will be discussed below, in some cases responses to perturbations of attractor states will also be considered (Sections 5 and 6). In all these cases we will use Equation 5, which will therefore be called the *dynamical cluster index* (DCI).

The search for candidate relevant subsets of a dynamical system by means of the DCI requires first the collection of observations of the values of some variables at different instants. It is important to emphasize that no other information is needed; therefore, the method can be applied even if nothing is known about the structure of the system (such as functional relations among its variables). It is not even necessary to know the values of all the important variables, although of course the quality of the results can be lessened by lack of information. In any case, the information provided by this analysis can be complemented with other kinds of knowledge about the system, as discussed in Section 7.

In order to find candidate RSs, in principle all the possible subsets of system variables should be considered and their DCIs computed. In practice, this procedure is feasible in a reasonable amount of time only for small subsystems. Due to this combinatorial problem, some simplifications of the method are required in order to address the study of large systems. A simple approximation consists in sampling the configurations instead of exhaustively analyzing them all. In the absence of specific heuristics guiding the sampling, uniform sampling can be chosen. More precisely, a maximum fraction of samples to be evaluated in each subsystem is chosen. In this way, for subsets of small size the error introduced by sampling is limited. Indeed, the information provided by RSs of small size is usually more significant than that by those of large size, because large candidate RSs can be composed on the basis of smaller ones.

Nevertheless, a uniform sampling might miss subsets with high DCI value. To overcome this problem, random sampling is complemented with a simple heuristic search. Once all the samples of size *k* are evaluated, the samples of size *k* + 1 are composed of random samples plus all the size-(*k* + 1) neighbors of the size-*k* subset with the highest DCI value. This heuristic has proven to be quite effective, because usually the subsets with highest DCI value are composed of subsets that in turn have a high DCI value, compared to the subsets of the same cardinality. Once all the samples of size up to *N* − 1 are evaluated, the *T*_{c} is computed so as to rank the candidate RSs.

In our experiments, we always relied on the procedure described above. However, other search procedures can be adopted. As an ongoing work, we are experimenting with a genetic algorithm for searching the candidate RS with highest DCI value for each size.

In the following we will show the result of the application of this ranking method to some interesting classes of systems. The method draws our attention to the subsets that are highly functionally correlated and that represent possible candidate relevant subsets.

The data will be presented in a way that is clearly exemplified by the first case studied (see Figure 1). In each table, a horizontal line is associated to a subset *S* of the whole system: The variables are ordered from left to right; a black square is associated to those variables that are present in the subset considered, and a white square to those variables that are not members of that subset. So, for example, the first row of Figure 1d describes a subset composed of the last three variables only (N4, N5, and N6), while the third row refers to a subset composed of N2, N4, N5, and N6. For obvious reasons, we refer to these rows as *masks*: Each mask is a graphical representation of a subset. In each row of Figure 1d one also finds a number, which is the *T*_{c} of the corresponding subset. The subsets are listed from top to bottom according to their *T*_{c}'s, in decreasing order.

The DCIs (which are at the basis of the computation of the *T*_{c}'s) are computed from dynamical data whose kind may depend upon the specific case that is considered. We will state which types of data are used in the various applications, and will finally comment on their properties in Section 7.

## 3 Results on Boolean Networks

The case study presented here consists of three synchronous deterministic Boolean networks (BNs), described in Figure 1. BNs are an important framework frequently used to model genetic regulatory networks [18, 19], also applied to relevant biological data [23, 26, 29] and processes [24, 30]. The aim of this case study is to check whether DCI analysis is capable of recognizing special topological cases, such as causally independent subnetworks. In particular, BN1 is made of two independent components, while in BN2 the same components do interact. Finally, BN3 is the union of the two previous cases.

This system has already been described in [32]; however, the main results are reviewed in this section, as they illustrate the method and provide information about its performance. Note that the behavior of each node is affected by the values of more than one other node, so traditional analyses based on correlation between pairs of variables might fail. For example, the computation of Pearson correlation coefficients of the networks of this section does not lead to identifying related variables, given that only diagonal elements take non-negligible values.

The data used in order to determine the relative frequencies that are necessary to compute the entropies required to apply the DCI method are obtained from the states of the various attractors, each one weighted according to its basin of attraction.^{3} In order to visualize the procedure, imagine listing all the states of all the attractors in horizontal lines, one below another; each horizontal line contains the values, ordered from left to right. If all the basins of attraction have the same size, each attractor is listed only once; otherwise, an attractor is listed a number of times proportional to its relative frequency. Suppose that this procedure gives rise to a series of *T* horizontal stripes. Then the relative frequency of, say, the value 1 of (e.g.) the third variable is simply given by the number of 1's in the third vertical stripe divided by *T* (of course, the relative frequencies of combinations of values, which are needed to compute the joint entropies, are computed in much the same way).

The rationale for using the attractor states for this analysis is, intuitively, that attractors should be able to capture the important functional relationships in a system. This holds true when the attractor landscape is rich enough, but there are cases where a different approach is needed, as will be seen in Section 5. Further comments on this issue are deferred to Section 7.

The DCI analysis is able to correctly identify the two separated subnetworks of BN1 (see the masks of the first and second rows of Figure 1d). The same type of analysis clusters together five of six nodes of BN2: those already clustered in BN1, plus node 2 and the node that computes the xor of the signals coming from the two just-mentioned groups. Indeed, all these nodes are needed in order to correctly reconstruct the BN2 series. The analysis is able to identify all MDSs also when all the series are merged together (Figure 1f, where the top two clusters correspond respectively to the five nodes already recognized in BN2 and to the whole BN2 system, while the third and fourth rows correspond to the independent subgraphs of BN1—see [32] for details. Experiments performed using asynchronous update yielded essentially the same results. Therefore we come to the conclusion that the DCI provides useful information for identifying the relevant subsets in this case (which is admittedly somewhat artificial, but this case was planned in order to test the method itself).

## 4 Results on Leader-Follower Dynamics

To assess the applicability of the method, we studied a simple model in which the integration among variables in a subsystem under observation and its mutual information can be tuned by acting on two parameters. The model abstracts from specific functional relations among elements of the system and could resemble a basic leader-followers model (see Figure 2). The system is composed of a vector of *n* binary variables *x*_{1},*x*_{2},…, *x*_{n}—representing, for example, the opinion in favor of or against a given proposal. The model generates independent observations of the system state, that is, each observation is a binary *n*-vector generated independently of the others, on the basis of the following rules:

- •
Variables are divided into two groups, G1 = [

*x*_{1},…,*x*_{k}] and G2 = [*x*_{k+1},…,*x*_{n}]. - •
*x*_{1}is called the*leader*, and it is assigned a random value in {0,1}. - •
The values of the

*followers x*_{2},…,*x*_{k}are set as a copy of*x*_{1}with probability 1 −*p*_{noise}and randomly with probability*p*_{noise}. - •
The values of the elements of G2 are assigned as a copy of a random element in G1 with probability

*p*_{copy}, or a random value with probability 1 −*p*_{copy}.

It is possible to tune the integration among elements in G1 and the mutual information between G1 and G2 by changing *p*_{noise} and *p*_{copy}. Note that, given a significant level of integration, we have two notable cases:

- 1.
*M*≈ 0 → isolated (possibly integrated) relevant set; - 2.
*M*≫ 0 → integrated and segregated cluster.

The possible scenarios that can be obtained by tuning *p*_{noise} and *p*_{copy} can be conveniently illustrated by a three-dimensional plot. Figure 3a shows the behavior of the integration of G1 as a function of *p*_{noise} and *p*_{copy}. We can observe that it is a decreasing function of *p*_{noise}, while it is independent of *p*_{copy} (by definition, indeed). The behavior of the mutual information between G1 and G2 is depicted in Figure 3b; as we can observe, *M* increases fast with *p*_{copy}, as this parameter increases the correlation between variables in G2 and G1. Moreover, it also increases with *p*_{noise}, but the reason is that the correlation among variables in G1 increases the randomness of variables in G1, which then behave similarly to the variables in G2. This last observation deserves to be discussed in more detail, because it concerns one of the main characteristics of the computation of DCI. By definition, the entropy values are computed on the basis of the occurrence of combinations of symbols; hence the information conveyed relates to frequencies and not to causal relations. For this reason, integration and mutual information, and consequently DCI, capture relations among the distributions of the values assumed by state variables. This property has the advantage of making it possible to find relevant sets even in the absence of any knowledge of the relations between the system's variables. Yet, it has the drawback that it might erroneously detect relations among independent variables drawn from the same distribution or it might miss identifying a relevant set because all the combinations of its variables appear with the same frequency as in a random sequence. However, since dynamical systems of interest are usually dissipative, this circumstance is not frequent.

The special case of *M* ≈ 0 corresponds to the situation in which G1 is almost completely independent of G2 and can be easily detected by observing only *M*. However, the general case of interest is that of discovering G1 to be a significant RS; then we would consider cases in which both *I* and *M* are significantly high. In Figure 4a the matrix representing the cases in which the method correctly detected G1 is depicted. For each value of *p*_{noise} and *p*_{copy} in {0.0,0.1,…,1.0} we ran the analysis and checked if the RS with highest *T*_{c} value corresponded to G1. Dark cells denote the successful cases. As can be observed, G1 is identified up to *p*_{noise} equal to 0.8, at decreasing values of *p*_{copy}. The method is quite robust, as it is able to distinguish the group of (even highly) noisy followers from the group G2, even at non-negligible *p*_{copy} values.

Furthermore, it should be observed that the method is superior to hierarchical clustering based on pair correlations. As an example, in Figure 4b we show the result of the application of a hierarchical clustering algorithm for the case *p*_{noise} = 0.8, *p*_{copy} = 0.2; we can note that G1 (i.e., the cluster composed of the variables *V*_{1}, *V*_{2}, and *V*_{3}) is missed. The reason for this difference is to be found in the fact that the method based on DCI takes into account multiple and not just pair correlations.

## 5 Results on Catalytic Reactions

The formation of sets of molecules able to collectively self-replicate is thought to be fundamental both for the origin of life [2, 4, 5, 6, 8, 16, 18, 21] and for new foreseeable biotechnological systems [27]. Accordingly, several attempts have been made to identify dynamical cores of self-replicating structures [7, 14, 17] that can be regarded as relevant sets according to our terminology. All these methods rely on the topological and structural properties of the reaction networks they deal with; hence dynamical information is not taken into account. In this section we investigate the capability of the DCI method to identify CRSs in systems where several reactions take place simultaneously, using only data concerning the values of the system variables (i.e., the concentrations of the various species) without any prior knowledge of the reaction graph. As discussed in Section 1, we will actually use a dynamical model, well known to us but totally unknown to the analyst, in order to generate the data.

In particular, we consider here a relatively simple reaction system with a clear organization (see Figure 5). It is composed of two distinct reaction pathways: a reaction chain (chain from now on) and an autocatalytic set (acs from now on) that take place in the same vessel, without however directly interfering with each other. The goal of this test will be achieved if the DCI method correctly identifies these subsets. Note that a simplified study of this system has already been presented in [32], but the analysis presented here is more complete in that it takes into account all the system variables.

The main entities of the model are *polymers* composed of symbols from a binary alphabet {A,B}. The only reactions allowed are condensation, where two polymers are linked to form a longer one, and cleavage, where a polymer is broken in two fragments. Moreover, the only reactions that are assumed to take place at a significant rate are those that are catalyzed. Therefore it turns out that condensations are three-molecule reactions (two substrates and a catalyst): In order for them to happen at an appreciable rate, it is supposed that a temporary complex with a finite lifetime is formed between the catalyst and one of the substrates—the complex can then give rise to the final product if it meets the other substrate. See [9, 10] for further details.

The two reaction pathways are composed of six different condensation reactions, three for the chain and three for the acs reactions (see Figure 5). Moreover, a control non-reacting species BB has been added to the system.

The model considered here describes the dynamics of the reaction system of Figure 5 in a well-stirred tank reactor (CSTR) with a constant influx of food molecules and a constant outflow rate, like the one studied by Filisetti et al. [9,–11]. Whereas that model was truly stochastic, for the purpose of testing the DCI method we resort here to a simplified deterministic approach, where the reaction scheme is translated, by using the law of mass action, into a set of ordinary differential equations. Simulations are performed using the CellDesigner software [12, 13].

It turns out that the asymptotic behavior of the system is a stable fixed point. When this is the case, the DCI method cannot be applied to the system variables, since there would be no sequence of states to consider, but only one. However, in order to display the main features of the system organization, we can perturb the stationary state and study its relaxation behavior. The particular kind of perturbation that has been studied is described in the caption of Figure 6, where also the time behaviors are shown.

While the dynamical cluster index method might be generalized to continuous-valued variables, in this article we concentrate on discrete values; therefore, continuous concentration values are coded according to a three-level code related to the sign of the time derivatives at time *t* (decreasing concentration, no significant change, increasing concentration). The dynamical cluster index has been applied both on the reduced system composed only of the molecules that are not found in the inflow (i.e., those of Figure 6b) and on the complete set of molecules (Figure 6a). Results concerning the former case are shown in Figure 7.

Without any information about the topology of the system, the dynamical cluster index is able to identify the two sets acs and chain as the top-ranking ones. In addition, hybrid sets containing molecules from both sets with a relatively high *T*_{c} are also found.

In Figure 8 the results concerning the overall system with all 26 variables are shown, and it can be seen that also in this case the two reaction pathways are clearly identified. With regard to that case, since the initial dynamical transient is not taken into account, the coding procedure returns molecules with a derivative equal to 0 along the whole observed time span. Since the entropy of a single fixed node is equal to 0, the mutual information of the node with respect to the rest of the system is 0 too. Again, the integration of sets composed only of fixed nodes is 0. Thus, the overall system is analyzed by removing those fixed nodes. In particular, the molecules with fixed concentrations are those involved in the first reaction of the chain (BAB, BBA, and ABB) and the control molecule (BB). With regard to the former, since the perturbations occur in the next steps of the chain, the upstream molecules are not affected. In contrast, the explanation of the latter is straightforward: Once the control molecule reaches its equilibrium concentration (during the initial transient), its concentration remains fixed in time.

## 6 Results on the Mitogen-Activated Protein Kinase (MAPK) Cascade

In this section we show, and discuss in some depth, the results of the application of the DCI to models of one of the major cellular signal transduction pathways, known as the mitogen-activated protein kinase (MAPK) cascade. This pathway responds to a wide range of external stimuli, triggering growth, cell division, and proliferation, and its biological significance is highlighted by the fact that it is evolutionarily conserved, from yeasts to humans. We will not provide here more details on the biological activity of this pathway, referring the interested reader to Sarma and Gosh [22].

Those authors also introduce the three models that will be considered in the following analysis. The basic model is composed of reactions that are quite well established from an experimental viewpoint, and it has the hierarchical structure shown in Figure 9. The three groups of chemicals that have been identified as the core of this three-layer system are the MAPKKK, MAPKK, and MAPK kinases (respectively M3K, M2K, and MK for short) [33]. M3K is activated by means of single phosphorylation, whereas M2K and MK are both activated by double phosphorylation [1, 3, 15]. Parallel to the phosphorylation by kinases, phosphatases present in the cellular volume can dephosphorylate the phosphorylated kinases (Figure 9 shows the schema of the MAPK cascade where each layer is dephosphorylated by a specific phosphatase). Note that superimposed on the three-layer structure of substrate-product reactions there is the so-called MAPK signaling cascade, a linear chain of catalysis (dashed lines in Figure 9) that transmit the external signal from M3K∗ to MK∗∗.

When the external signal and the concentrations of the phosphatases are kept constant, a top-down reaction scheme such as the one described in Figure 9 leads to fixed-point solutions. On the other hand, oscillations have been reported in the MAPK cascade [25] (as well as in many other biological circuits), and in order to account for them, Sarma and Gosh [22] make use of two models with feedback, described in Figure 10. The two variants (called S1 and S2 in what follows) are characterized by a different disposition of the activation of inhibiting interactions among layers—see Figure 9 for the details. Note that the insertion of these new catalytic actions alters the layered structure of the basic model, which is no longer strictly hierarchical. The presence of feedback from the bottom to the top layer gives rise to what is sometimes called a “tangled hierarchy” [20].

These alternative models are hypothetical but grounded on experimental data as well as on knowledge about chemical interactions; we will not enter here a discussion about the merits and limits of the two models, again referring the interested reader to the original article; but we will take them for granted and we will apply our method to test whether it can discover significant CRSs without any prior knowledge of the interactions, but on the sole basis of the dynamics of concentrations.

We have simulated Sarma and Gosh's models with the CellDesigner package [12, 13], keeping the P1, P2, and P3 phosphatases constant (as they do), and we obtained the same asymptotic states shown by those authors.

The time data could be obtained in different ways. However, the basic case leads to a fixed point, so it would be useful to use perturbations, as has been done in the case of the catalytic reaction system. We therefore decided to use the perturbation of asymptotic states for all the three models: In particular, we focused our analysis on the kinases and perturbed^{4} each kind of kinase 10 times, by allowing time for the system to find a new stable situation after the change. The stable situations that are reached can show both oscillating (S1 and S2 systems) or constant concentrations (basic system). Here again, as in Section 5, the three-level coding (concentration decrease, no change, concentration increase) is used.

As in the previous examples, the distribution of the *T*_{c} values shows that one or a few CRSs outperform the other ones: The highest-ranking ones are interesting and are shown in Figure 11. The three models of the MAPK cascade show both similarities and dissimilarities, so it is interesting to describe them in detail.

Remarkably, our method detects the presence of the starting layers (in the following indicated as Layer1, Layer2, and Layer3, ordered from the top to the bottom) among the first highlighted masks; interestingly, at the same time it proposes other groups that well describe the peculiarities of the interactions among the kinases.

Let us first consider the basic model whose analysis is described in Figure 11a. The top-ranking CRS is indeed layer 1 of the original model, while the third one is the set union of layers 2 and 3. This points to a peculiar feature of the concept of relevant subsets, namely, that good RSs can be (and are often) made out of smaller subsets that are also relevant. Anyway, layer 2 alone ranks fourth, immediately after the union of layers 3 and 4. Therefore a careful consideration of masks 1, 3, and 4 would by itself draw our attention to all the relevant layers.

We have so far neglected the candidate relevant subset that ranks second in Figure 11a where it is called a *hole*: It is essentially the third layer without the central molecule. The reason why this subset ranks so high is essentially a dynamical one, shown in Figure 12a where it can be seen that there is a strong dynamical correlation between the non-phosphorylated and the fully phosphorylated kinases of the third layer (MK and MK∗∗) in the transient shown (where one grows, the other decreases). On the contrary, MK∗ has a different behavior, and its derivative changes sign when the other two curves intersect each other. Our three-level coding cannot capture this aspect; therefore, the behavior of MK∗ appears less correlated than that of the other two. Similar remarks apply to the second layer, thus explaining the presence of other high-ranking subsets with holes.

Summarizing, an analysis of Figure 11a would highlight Layer1, Layer2+Layer3, and Layer2, therefore drawing attention to Layer3. The fact that Layer3 with a hole ranks higher than Layer3 itself requires a more detailed understanding of the system dynamics, but the DCI method would anyway induce the analyst to understand the dynamical organization of the system.

When we move to models S1 and S2, the layered structure is affected by the feedbacks. Nonetheless, by looking at Figure 11b referring to model S2, one can see that the basic level structure is still clearly seen, the three top rankings being Layer2, Layer1+Layer3, and Layer1 alone. The fact that Layer2 is now in a privileged position can be understood in that it is now the layer subject to negative feedbacks and it has therefore a strong influence on the system dynamics. We can note the appearance in second position of the association between Layer1 and Layer3—related to the positive feedback that was added to the basic model. Here we find a new arrangement, called “first two,” due to the interactions among the new dynamical structure and a subset of the perturbations: The non-phosphorylated form M2K monotonically increases, whereas the other two forms show the sequence decrease-increase-decrease (Figure 12b).

On the contrary, in model S1 the negative feedback acts on Layer1, and indeed in this case it is the corresponding mask that has the highest *T*_{c}, as shown in Figure 11c. The further analysis of the other CRSs is more cumbersome due to the presence of various holes; we merely observe that this modification has a stronger influence than S1 on the organization of the reaction network.

## 7 Conclusions

In this article we have introduced a method to identify candidate relevant subsets of variables in dynamical systems. The method does not require any knowledge of the relationships among the system variables, but relies on observations of their values in time. We tested its usefulness in a number of different application domains, where the data are generated by a model and the DCI method is applied in order to uncover significant aspects of its organization. In order of increasing difficulty, we considered models of

- (a)
random Boolean networks, where the whole network is made of different subnetworks with different topological relationships (independent or interacting subnetworks),

- (b)
leader-follower dynamics, subject to noise and fluctuations,

- (c)
catalytic reaction networks in a flow reactor,

- (d)
the MAPK signaling pathway in eukaryotes.

We refer to the previous sections for all the details; here we want to synthesize some broad observations and comments that come from those experiences, and point to further research directions.

The main novelty of the present work, in comparison with previous applications of the cluster index and of similar measures [28], is that we use it to consider truly dynamical systems, and not only fluctuations around stable asymptotic states. In principle, different kinds of data can be considered. In the case of a deterministic dynamical system, attractors are the main candidates to provide the required time series, and we have shown in case (a) that they can work effectively. Note, however, that, for reasons discussed in Section 5, the method is ineffective in a situation that is sometimes encountered, namely, if there is just a single fixed point. Therefore we conclude that the use of attractor states is effective only if the attractor landscape is rich enough to show the main features of the system organization. This has usually to be evaluated a posteriori, with the exception of trivial cases like that of a single fixed point.

A possible alternative, or perhaps complementary, approach is based on using transient data from arbitrary initial conditions or, perhaps even better, from perturbations of attractor states. This can be very effective, as shown in Sections 5 and 6. Note that the three-level coding used there regards the similarities of the derivatives rather than those of the values of the variables, and that the models that were used in this article to generate the time series are all based on first-order ordinary differential or difference equations; it should be verified whether the approach is valid also when higher-order dynamical systems are considered. Of course, high-order ODE systems can be transformed into first-order systems by adding variables, but the auxiliary variables that are required might turn out to be unobservable.

When a system is subject to continuous external disturbances, the time series directly provides the required data, and the results of Section 2 show that our treatment can reveal its organizational features even when a high noise level is present.

Actually, the range of applicability of this method is quite broad, and it need not be limited to dynamical systems. Indeed, the method needs just a set of frequencies of co-occurrences of the values of the system variables. We have always considered here time series data, but that may not always be the case. Note for example that, if we were to scramble the order in which the attractor states of the RBNs of Section 3 are reported, this would not change the relative frequencies or therefore the ranking of the relevant subsets. So the method can be applied also to many other systems, since all that is required is a series of cases associated with vectors of numerical variables that are not necessarily ordered in time (think for example of different patients, each one described by a vector of values of various symptoms).

A final comment is that the method is not a brute-force one: When there is a clear organization in the system, (as in case (a) when the networks are disjoint, and in cases (b) and (c)), the organization can be read out directly from the ordered list of candidate relevant sets; but this does not always happen. The study of case (d) shows that even in entangled real systems the method provides useful clues to uncover the system organization, but that their interpretation may sometimes require considerable ingenuity.

As far as future directions of research are concerned, let us only briefly mention some major ones. Several aspects have been mentioned above, and will be subject to further analysis, including the effective analysis of high-order dynamical systems with a simplified discrete coding and the comparison among different kinds of time series data: attractors, transients from arbitrary initial conditions, and perturbations.

In its present version, the method needs discrete variables, so when dealing with continuous data we have introduced a three-level coding that necessarily misses some information (as documented, e.g., by the discussion of Section 6). Different coding schemes could be compared; moreover one might also consider the continuous generalization of the proposed method.

Other research will be devoted to improvements of the method: It is apparent that it faces a huge computational problem for large systems, since the number of possible subsets increases extremely fast with the number of variables. Exhaustive search is impossible, and effective heuristics need to be tested and developed to limit the number of candidate subsets to be screened.

Several other specific cases need to be studied to confirm the validity of the approach, mainly from real-world data. Let us mention that, among others, we are considering applications to the study of innovation processes, where data come from the real world, and not from models. In this respect, it is important to realize that the DCI method may be integrated with other approaches: When dealing with real-world problems, and not with data generated by a perfectly known model, a typical situation that is often encountered is that one knows some relationships between variables, but not all of them. In this case, the most promising approach would be based on a combination of the a priori knowledge with the DCI, in ways that still need to be tested.

Moreover, it should be recalled that the dynamical cluster index is just one of several information-theoretic measures that might be applied to analyze dynamical systems. It is worth noticing that the integration and the mutual information can be useful even if used in isolation, and not combined in the DCI. But other entropies (e.g., those that refer to joint distributions at different times) might prove particularly useful for the study of dynamical systems.

But there is also another line of development of this type of research that may be worth pursuing, concerning the very notion of the organization of a dynamical system. So far, these analyses have largely relied on relatively simple concepts like that of a chain of influences, perhaps with feedback, and that of a layered level structure, perhaps perturbed by some feedback from low levels to higher ones. The case we studied in Section 6 shows that there may be some inobvious combinations of variables that seem to play a role, and the DCI could be a way to explore this new territory.

## Acknowledgments

This article has been partially funded by the UE projects “MD—Emergence by Design” (Pr.ref. 284625) and “INSITE—The Innovation Society, Sustainability, and ICT” (Pr.ref. 271574), under the 7th FWP-FET program. Useful discussions with David Lane and Stefano Benedettini are gratefully acknowledged.

## Notes

The integration is also known as intrinsic information or multi-information.

Indeed, the method is even more general; see Section 7 for a comment.

In the case of nondeterministic or noisy systems a possibility for estimating the attractors' weights is that of using the persistence times of the systems in each of them [31].

More precisely, we made a perturbation each 1000 simulated seconds (out of 100,000 simulated seconds of a very long run, and avoiding its initial transient phases; samples for the analysis are recorded every 10 simulated seconds). For each chemical species we chose a random value among the admissible ones, and cyclically imposed it in such a way that each avalanche of changes starts with only one perturbed kinase.

## References

## Author notes

Contact author.

European Centre for Living Technology, Ca' Minich, S. Marco 2940, 30124 Venezia, Italy. E-mail: marco.villani@unimore.it (M.V.); alessandro.filisetti@unibo.it (A.F.); irenpoli@unive.it (I.P.)

Department of Physics, Informatics and Mathematics, University of Modena e Reggio Emilia, v. Campi 213b, 41125 Modena, Italy. E-mail: rserra@unimore.it

DISI Alma Mater Studiorum, University of Bologna, Campus of Cesena, Via Venezia 52, I-47521 Cesena, Italy. E-mail: andrea.roli@unibo.it

Ca' Foscari University of Venice, Department of Environmental Sciences (DAIS). E-mail: marco.fiorucci@unive.it