In order to develop systems capable of artificial evolution, we need to identify which systems can produce complex behavior. We present a novel classification method applicable to any class of deterministic discrete space and time dynamical systems. The method is based on classifying the asymptotic behavior of the average computation time in a given system before entering a loop. We were able to identify a critical region of behavior that corresponds to a phase transition from ordered behavior to chaos across various classes of dynamical systems. To show that our approach can be applied to many different computational systems, we demonstrate the results of classifying cellular automata, Turing machines, and random Boolean networks. Further, we use this method to classify 2D cellular automata to automatically find those with interesting, complex dynamics. We believe that our work can be used to design systems in which complex structures emerge. Also, it can be used to compare various versions of existing attempts to model open-ended evolution (Channon, 2006; Ofria & Wilke, 2004; Ray, 1991).

## 1 Introduction

There are many approaches to searching for systems capable of open-ended evolution. One option is to carefully design a model and observe its dynamics. Iconic examples were designed by Ray (1991), Ofria and Wilke (2004), Channon (2006), and Soros and Stanley (2012). However, as we lack any universally accepted formal definition of open-endedness or complexity, there is no formal method of proving the system is indeed “interesting.” Conversely, lacking definitions of such key terms, it seems extremely difficult to design such models systematically.

Approaching the problem of searching for open-endedness bottom up, we can define a suitable classification of dynamical systems that would help us identify a region of complexity. An ideal classification would be based on a formally defined property, be effectively computable, and help us automatically search for complex systems possibly capable of modeling artificial evolution.

Over the years, many different metrics have been introduced to study a system's dynamics. As an example, cellular automata (CAs) were studied in terms of their space-time dynamics observations (Wolfram, 1984), their space-time compression sizes (Zenil, 2010), their actions on probability measures (Gutowitz, 1990), the *Z*-parameter (Wuensche & Lesser, 2001), and the lambda parameter (Langton, 1986). Most such approaches show that the complex region of systems lies somewhere “in between” the ordered and the chaotic phase.

In this article, we introduce a novel method of classifying complex systems based on estimating their asymptotic average computation time with increasing space size. The key result is that the classification identifies a region of systems that seem to be at a phase transition between ordered and chaotic behavior. Across various classes of discrete systems, we demonstrate that complex systems such as CAs computing nontrivial tasks, universal Turing machines (TMs), and random Boolean networks (RBNs) at a critical phase belong to this region. Even though we are far from characterizing complexity, we hope this method helps us understand which formally defined properties correlate with it.

This article is an extension of Hudcová and Mikolov (2020). Compared to our previous work, where we concentrated solely on CAs, this article includes additional results of applying the classification method to TMs and RBNs, as well as general conclusions about the average computation time at a phase transition.

## 2 Transient Classification: A General Method

We first introduce the basic principle of classification based on transients, which can be applied to any deterministic discrete space and time dynamical system (DDDS). In subsequent sections, we describe the results of the classification applied to CAs, TMs, and RBNs to demonstrate its use across different classes of discrete dynamical systems.

### 2.1 Basic Notions

Let us consider a DDDS *D* operating on a finite space, characterized by a tuple *D* = (*S*, *F*), where *S* is a finite set of configurations and *F* : *S* → *S* is a global transition function governing the dynamics of the system. We define the *trajectory of a configuration u* ∈ *S* as the sequence (*u*, *F*(*u*), *F*^{2}(*u*), …). Since *S* is finite, every trajectory eventually becomes periodic. We call the preperiod of this sequence the *transient of initial configuration u* and denote its length by *t*_{u}. More formally, we define *t*_{u} to be the smallest positive integer *i* for which there exists *j* ∈ ℕ, *j* > *i*, such that *F*^{i}(*u*) = *F*^{j}(*u*). The periodic part of the sequence is called an *attractor*.

The *phase space* of *D* = (*S*, *F*) is an oriented graph with vertices *V* = *S* and edges *E* = {(*u*, *F*(*u*)) : *u* ∈ *S*}. Such a graph is composed of components, each containing one attractor and multiple transient paths leading to the attractor. The phase space completely characterizes the dynamics of the system; however, it is infeasible to describe when the configuration space *S* is large. Given a DDDS *D*, we will focus on studying its *average transient length*

*T*(*D*) = (1/|*S*|) Σ_{*u* ∈ *S*} *t*_{u}.
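The transient and attractor lengths of a single trajectory can be computed by direct simulation: iterate *F*, record the time step at which each configuration first appears, and stop at the first repeat. A minimal Python sketch (the transition function `F` and the hashable configuration encoding are illustrative placeholders, not the article's implementation):

```python
def transient_and_attractor(u, F):
    """Return (t_u, a_u): the transient (preperiod) and attractor
    (period) lengths of the trajectory of configuration u under F.
    Configurations must be hashable (e.g., tuples of cell states)."""
    first_seen = {}  # configuration -> time step of its first occurrence
    step = 0
    while u not in first_seen:
        first_seen[u] = step
        u = F(u)
        step += 1
    # u is the first configuration to repeat; it first occurred at time
    # first_seen[u] and repeats at time `step`.
    return first_seen[u], step - first_seen[u]
```

The dictionary stores the whole trajectory, so memory grows with *t*_{u} + *a*_{u}; for systems with very long transients, a constant-memory cycle-detection algorithm (Floyd's or Brent's) is the usual alternative.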

### 2.2 The Main Principle

Consider a sequence of DDDSs (*D*_{i})_{i=1}^{∞}, *D*_{i} = (*S*_{i}, *F*_{i}), with *F*_{i} : *S*_{i} → *S*_{i} and |*S*_{i}| < |*S*_{i+1}| for each *i*. For instance, the sequence can be given by a cellular automaton with a fixed local rule, operating on a finite cyclic grid of growing size.

Our goal is to estimate the asymptotic growth of the system's average transient lengths, as shown in Figure 1.

In practice, we generate a finite part of the sequence (|*S*_{i}|, *T*(*D*_{i}))_{i=1}^{B}, where *B* is an upper bound imposed by our computational limitations, and examine different regression fits of the data. Specifically, we evaluate the fit to constant, logarithmic, linear, polynomial, and exponential functions. We pick the best fit with respect to the *R*^{2} score and obtain the classes: Bounded, Log, Lin, Poly, and Exp. If the score of the fit to all such functions is low (i.e., *R*^{2} < 85%), we say the system is Unclassified. Surprisingly, we found a very good fit to one of the classes with *R*^{2} > 90% for most DDDSs we examined. The trend we have observed, which seems to hold across various families of DDDSs, is shown in Figure 2. We describe it in more detail for each family in the subsequent sections.
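The model-selection step can be sketched as follows. The candidate families and the 85% threshold follow the text; the particular least-squares parametrization, the quadratic as the polynomial representative, and the rescaled exponential are our own illustrative assumptions:

```python
import numpy as np

# Candidate growth models T(n) ≈ a·g(n) + b, one per transient class.
MODELS = {
    "Bounded": lambda n: np.ones(len(n)),
    "Log":     lambda n: np.log(n),
    "Lin":     lambda n: n.astype(float),
    "Poly":    lambda n: n.astype(float) ** 2,   # quadratic representative
    "Exp":     lambda n: np.exp(n / n.max()),    # rescaled to avoid overflow
}

def classify_growth(sizes, avg_transients, threshold=0.85):
    """Fit average transient lengths against each model and return
    (class name, R^2) of the best fit, or "Unclassified" if no fit
    reaches the threshold."""
    n = np.asarray(sizes)
    y = np.asarray(avg_transients, dtype=float)
    best_name, best_r2 = "Unclassified", -np.inf
    for name, g in MODELS.items():
        X = np.column_stack([g(n), np.ones(len(y))])       # a·g(n) + b
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        r2 = 1.0 - ((y - X @ coef) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        if r2 > best_r2:
            best_name, best_r2 = name, r2
    return (best_name, best_r2) if best_r2 >= threshold else ("Unclassified", best_r2)
```

Under the constant model the design matrix is rank deficient; `lstsq` still returns the best constant fit (the sample mean), yielding *R*^{2} = 0 for any non-constant data.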

We do not claim our method determines the true asymptotic behavior of a system; the resulting classes are merely one possible interpretation of the measured data. For some systems, the transient growth might correspond to more complicated functions, but we have deliberately chosen the classes to be quite robust and coarse so that the boundaries between them are clear.

The uncertainty of the true asymptotic growth is especially relevant for the Lin and Poly Classes which identify the critical phase transition region. Such systems might turn out to be logarithmic or exponential, and it might be the case that we have not detected this due to our limited data. However, in such a case, such systems would exhibit significantly slower convergence to their asymptotic behavior than systems in other classes, which is a typical property of a system at a phase transition.

### 2.3 Computational Interpretation

In non-classical models of computation (Stepney, 2012), the process of traversing a discrete system's transients can be perceived as the process of self-organization, in which information can be aggregated in an irreversible manner. The attractors are then viewed as memory storage units, from which the information about the output can be extracted. For CAs this is explored in Kaneko (1986). The average transient growth then corresponds to the average computation time of the system^{1}. Therefore, we can interpret our goal as trying to estimate a system's *asymptotic average computation time*. DDDSs with bounded transient lengths can only perform trivial computation. On the other hand, DDDSs with exponential transient growth can be interpreted as inefficient computation models.

In the context of artificial evolution, we can view the global transition rule of a DDDS as the physical rule of the system, whereas the initial configuration is the particular “setting of the universe,” which is then subject to evolution. If we are interested in finding DDDSs capable of complex behavior automatically, it would be beneficial for us if such behavior occurred on average, rather than having to select the initial configurations carefully from some narrow region. The probability of finding such special initial configurations would be extremely low as the configuration space tends to be very large. This motivates our study of the growth of average transient lengths rather than the maximum ones.

### 2.4 Average Transients: Error Estimate

Let us fix a DDDS *D* = (*S*, *F*) operating on a large configuration space, e.g., |*S*| ≫ 2^{100}. In such a case, computing the average transient length *μ* is infeasible. Thus, we sample initial configurations *u*_{1}, *u*_{2}, …, *u*_{m} uniformly at random and estimate *μ* by the sample mean (1/*m*) Σ_{i=1}^{m} *t*_{u_i}. It remains to estimate the number of samples *m* so that the error |(1/*m*) Σ_{i=1}^{m} *t*_{u_i} − *μ*| is reasonably small.

More formally, for *D* = (*S*, *F*), let (*S*, *P*) be a discrete probability space where *S* is the set of all configurations and *P* is the uniform distribution. Let *X* : *S* → ℕ be a random variable which sends each *u* to its transient length *t*_{u}. This gives rise to a probability distribution of transient lengths on ℕ with mean *E*(*X*) and variance *var*(*X*). It can be easily shown that *E*(*X*) = *μ*. Our goal is to obtain a good estimate of *E*(*X*) by the Monte Carlo method (Owen, 2013).

Let (*X*_{1}, *X*_{2}, …, *X*_{m}) be a random sample of iid random variables with *X*_{i} =_{d} *X* for all *i*. Let *μ*^{(m)} = (1/*m*) Σ_{i=1}^{m} *X*_{i} be the sample mean and *s*^{(m)} the sample standard deviation. Since *X* is a mapping from a finite set, *var*(*X*) < ∞, and thus by the central limit theorem the normalized sample mean converges to a normal distribution. The interval

[*μ*^{(m)} − *u*_{1−α/2} · *s*^{(m)}/√*m*, *μ*^{(m)} + *u*_{1−α/2} · *s*^{(m)}/√*m*],

where *u*_{β} is the *β* quantile of the standard normal distribution, covers *μ* for *m* large with probability approximately 1 − *α*. We will take *α* = 0.05. Hence, with probability approximately 95%, the estimate *μ*^{(m)} differs from *μ* by at most *u*_{0.975} · *s*^{(m)}/√*m*.

Both *E*(*X*) = *μ* and *var*(*X*) tend to grow with increasing size of the configuration space. Therefore, to employ a general method of estimating the number of samples, we normalize the error by the sample mean and consider *m* sufficiently large such that

*u*_{0.975} · *s*^{(m)}/(√*m* · *μ*^{(m)}) ≤ *ϵ*,     (1)

in which case *μ*^{(m)} differs from *μ* by at most *ϵ* · 100% with probability approximately 95%.

In practice, we put *ϵ* = 0.1 and produce the observations in batches of size 20 until the condition in Equation 1 is met (for most elementary CAs, this was typically satisfied after 400 data points were sampled). We approximate the uniform random sampling of initial configurations using Python's numpy.random library (https://numpy.org/doc/1.16/reference/routines.random.html).
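The batched stopping rule can be sketched as follows; the sampler `sample_transient` is a placeholder for simulating one uniformly random initial configuration of a concrete system, and the `max_samples` cap is our own safety guard:

```python
import statistics

def estimate_avg_transient(sample_transient, eps=0.1, batch=20,
                           z=1.96, max_samples=10_000):
    """Grow the sample in batches until the 95% confidence half-width
    is at most eps times the sample mean (relative-error stopping rule).

    sample_transient: zero-argument callable returning the transient
    length of one uniformly random initial configuration.
    """
    data = []
    mean = 0.0
    while len(data) < max_samples:
        data.extend(sample_transient() for _ in range(batch))
        mean = statistics.fmean(data)
        sd = statistics.stdev(data)
        # Stop once  z * s / sqrt(m)  <=  eps * sample mean.
        if mean > 0 and z * sd / len(data) ** 0.5 <= eps * mean:
            break
    return mean, len(data)
```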

## 3 Cellular Automata

### 3.1 Introducing Cellular Automata

Informally, a cellular automaton can be perceived as a *k*-dimensional grid of identical finite state automata. They are all updated synchronously in discrete time steps by an identical local update function depending only on the states of the automata in their local neighborhood. A formal definition can be found in Kari (2005).

CAs were first studied as models of self-replicating structures (Langton, 1984; Neumann & Burks, 1966; Reggia et al., 1993). Subsequently, they were examined as dynamical systems (Hedlund, 1969; Gutowitz et al., 1987; Vichniac, 1984), or as models of computation (Mitchell, 1998; Toffoli, 1977). Being so simple to simulate, yet capable of complex behavior and emergent phenomena (Crutchfield & Hanson, 1993; Hanson, 2009), CAs provide a convenient tool to examine the key, yet undefined notions of complexity and emergence.

#### 3.1.1 Basic Notions

We study the simple class of *elementary cellular automata* (ECAs), which are one-dimensional CAs with two states {0, 1} and a neighborhood of size 3. Each ECA is given by a local transition function *f* : {0, 1}^{3} → {0, 1}. Hence, there are only 256 of them. The small size of this CA class is the reason for making it our first case study: one can explore it exhaustively by studying the dynamics of every single ECA.

Each local rule *f* determining an ECA can be encoded by the *Wolfram number* of the ECA, defined as

Σ_{(x_1, x_2, x_3) ∈ {0,1}^3} *f*(*x*_{1}, *x*_{2}, *x*_{3}) · 2^{4x_1 + 2x_2 + x_3}.

We refer to each ECA as “rule *k*,” where *k* is the corresponding Wolfram number of its underlying local rule.

We will consider the CA to operate on finite grids with periodic boundary conditions. Hence, given a local rule *f* and a grid size *n*, we obtain a configuration space {0, 1}^{n} and a global update function *F* : {0, 1}^{n} → {0, 1}^{n}.
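With this encoding, the global update function on a cyclic grid can be realized directly by decoding the Wolfram number, whose bit at position 4*x*_{1} + 2*x*_{2} + *x*_{3} stores *f*(*x*_{1}, *x*_{2}, *x*_{3}). A minimal sketch:

```python
def eca_step(u, rule):
    """One synchronous update of an ECA on a cyclic grid.

    u: tuple of 0/1 cell states; rule: Wolfram number in 0..255.
    The new state of cell i is the bit of `rule` at position
    4*u[i-1] + 2*u[i] + u[i+1] (indices taken modulo the grid size).
    """
    n = len(u)
    return tuple(
        (rule >> (4 * u[(i - 1) % n] + 2 * u[i] + u[(i + 1) % n])) & 1
        for i in range(n)
    )
```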

^{n}Let ({0, 1}* ^{n}*,

*F*) be an ECA operating on a grid of size

*n*and (

*u*,

*F*(

*u*),

*F*

^{2}(

*u*), …) a trajectory of a configuration

*u*∈ {0, 1}

*. The space-time diagram of such a simulation is obtained by plotting the configurations as horizontal rows of black and white squares (corresponding to states 1 and 0) with a vertical axis determining the time, which is progressing downwards.*

^{n}### 3.2 History of CA Classifications

An ideal classification would be based on a rigorously defined and efficiently measurable property, identifying a region of systems with interesting behavior. In this section, we describe three qualitatively different classifications of ECAs, and subsequently, we will compare our results to them.

#### 3.2.1 Wolfram's Classification

The most intuitive and simple approach to examining the dynamics of CAs is to observe their space-time diagrams. This approach was notably championed by Wolfram (2002), who established an informal classification of CA dynamics based on such diagrams. Wolfram (2002, pp. 231–249) distinguishes the following classes, which are shown in Figure 3.

Class 1 … quickly resolves to a homogeneous state

Class 2 … exhibits simple periodic behavior

Class 3 … exhibits chaotic or random behavior

Class 4 … produces localized structures that interact with each other in complicated ways

#### 3.2.2 Zenil's Classification

In the first part of his article, Zenil (2010) studied the compression sizes of the space-time diagrams of each ECA simulated for a fixed number of steps. For the classification, he examines the simulations from a particular initial configuration (a single 1 surrounded by 0s). Using a clustering technique, Zenil obtained two classes distinguishing between Wolfram's simple Classes 1 and 2 and complex Classes 3 and 4. We show our reproduction of Zenil's results in Figure 5.

Zenil's method nicely formalizes Wolfram's observations of the space-time diagrams. However, the results depend on the choice of initial conditions as well as the grid size, data representation, and the compression algorithm. We conducted multiple experiments presented in Figure 6, which suggest that Zenil's results might be sensitive to the choice of such parameters. We note that he addresses the sensitivity to the choice of the initial configurations in the second part of his article (Zenil, 2010).

In vast CA spaces where it is not feasible to examine every CA and assign it to one of Wolfram's classes by hand, it is not clear how the parameter values should be chosen. Moreover, the data representation makes it problematic to extend this method to more general dynamical systems; for example, using gzip (https://www.gnu.org/software/gzip/) to compress space-time diagrams of a 2D CA is suboptimal.

#### 3.2.3 Wuensche's *Z*-parameter

Wuensche and Lesser (2001) chose an interesting approach, studying ECA behavior by reversing the simulations and computing the preimages of each configuration. They introduce the *Z*-parameter, representing the probability that a partial preimage can be uniquely prolonged by one symbol, and suggest that Class 4 CAs typically occur at *Z* ≈ 0.75. However, no clear classification is formed. The crucial advantage is that the *Z*-parameter depends only on the CA's local rule and can be computed efficiently. It is, however, questionable whether studying only the local rule can describe the overall dynamics of a system sufficiently well.

### 3.3 Transient Classification of ECAs

Given an ECA with a local rule *f*, we consider the sequence of systems ({0, 1}^{n}, *F*_{n})_{n=1}^{∞}, where *F*_{n} is the global update function induced by *f* on the cyclic grid of size *n*.

We consider all 256 ECAs up to equivalence classes obtained by changing the role of left and right neighbor, the role of 0 and 1 state, or both. It can be easily shown that automata in the same equivalence class have isomorphic phase spaces for any grid size. Thus, they perform the same computation. This yields 88 effectively different ECAs, each being a representative with the minimum Wolfram number from its corresponding equivalence class. In this section, we present the classification of the 88 unique ECAs based on their asymptotic transient growth.
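The equivalence classes can be enumerated mechanically: mirror the neighborhood, complement the states, or do both, and take the minimal Wolfram number in each class as its representative. A sketch under these conventions:

```python
def eca_equivalence_class(rule):
    """Wolfram numbers reachable from `rule` by mirroring the
    neighborhood (swapping left and right neighbors), complementing
    the states 0 and 1, or both."""
    def bit(r, a, b, c):
        # f(a, b, c) for the rule with Wolfram number r
        return (r >> (4 * a + 2 * b + c)) & 1

    def number(f):
        # Wolfram number of the local rule given as a function
        return sum(f(a, b, c) << (4 * a + 2 * b + c)
                   for a in (0, 1) for b in (0, 1) for c in (0, 1))

    mirror = number(lambda a, b, c: bit(rule, c, b, a))
    complement = number(lambda a, b, c: 1 - bit(rule, 1 - a, 1 - b, 1 - c))
    both = number(lambda a, b, c: 1 - bit(mirror, 1 - a, 1 - b, 1 - c))
    return {rule, mirror, complement, both}
```

Taking `min(eca_equivalence_class(r))` over all 256 rules yields exactly the 88 representatives mentioned above.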

#### 3.3.1 Results

We obtained a surprisingly clear classification of all 88 unique ECAs with four major classes corresponding to the bounded, logarithmic, linear, and exponential growth of average transients. Below, we give a more detailed description of each class.

**Bounded Class**: 27/88 rules (30.68%) The average transient lengths were bounded by a constant independent of the grid size. This suggests that the long-term dynamics of such automata can be predicted efficiently. See Figure 7.

**Log Class**: 39/88 rules (44.32%) The largest ECA class exhibits logarithmic average transient growth. The event of two cells at a large distance “communicating” is improbable for this class. See Figure 8.

**Lin Class**: 8/88 rules (9.09%) On average, information can be aggregated from cells at an arbitrary distance. This class contains automata whose space-time diagrams resemble some sort of computation. This is supported by the fact that this class contains two rules known to have a nontrivial computational capacity: rule 184, which computes the majority of black and white cells, and rule 110, which is the only ECA so far proven to be Turing complete (Cook, 2004).

We note that rules in this class are not necessarily complex, as the interesting behavior seems to correlate with the slope of the linear growth. Most Lin Class rules had only a very gradual incline. In fact, the only two rules with a slope greater than 1, rules 110 and 62, seem to be the ones with the most interesting space-time diagrams. See Figure 9.

We are aware that average transients of rules in Lin Class might turn out to grow logarithmically or exponentially given enough data samples. In such a case, the rules in Lin Class show a significantly slower convergence to their asymptotic behavior, which supports the hypothesis that they belong to a phase transition region.

**Exp Class**: 6/88 rules (6.82%) This class has a striking correspondence to automata with chaotic behavior. Visually, there seem to be no persistent patterns in the configurations. Not only the transients but also the attractor lengths are significantly larger than for other rules. The rules with the fastest growing transients are 45, 30, and 106. See Figure 10.

**Affine Class**: 4/88 rules (4.55%) This class contains rules 60, 90, 105, and 150 whose local rules are affine Boolean functions. Such automata can be studied algebraically and predicted efficiently. It was shown in Martin et al. (1984) that the transient lengths of rule 90 depend on the largest power of 2, which divides the grid size. Therefore, the measured data did not fit any of the functions above but formed a rather specific pattern. See Figure 11.

**Fractal Class**: 4/88 rules (4.55%) This class contains rules 18, 122, 126, and 146 which are sensitive to initial conditions. Their evolution either produces a fractal structure resembling a Sierpinski triangle or a space-time diagram with no apparent structures. We could say such rules oscillate between easily predictable behavior and chaotic behavior. Their average transients and periods grow quite fast, making it difficult to gather data for larger grid sizes. See Figure 12.

### 3.4 Discussion

We have also tried to measure the asymptotic growth of the average attractor size *a*_{u}, *u* ∈ {0, 1}^{n}, as well as the average rho value defined as *ρ*_{u} = *t*_{u} + *a*_{u}. This, however, produced data points that could not be fitted to simple functions well. This is due to the fact that many automata have attractors consisting of a configuration which is shifted by one bit to the left, resp. right, at every time step. The size of such an attractor then depends on the greatest common divisor of the period of the attractor and the grid size, and this causes oscillations. We conclude that such phase-space properties are not suitable for this classification method.

Below, we compare our results to other classifications described earlier. An exhaustive comparison for each ECA is presented in Figure 13.

**Wolfram's Classification**. The significance of our results for ECA stems precisely from the fact that the transient classification corresponds to Wolfram's (2002) so well. As it is not clear for many rules which Wolfram class they belong to, the main advantage is that we provide a formal criterion upon which this could be decided.

In particular, rules in Bounded Class and Log Class correspond to rules in either Class 1 or Class 2. Exp Class corresponds to the chaotic Class 3, and Lin Class contains Class 4 together with some Class 2 rules. We note an interesting discrepancy: Rule 54, which Wolfram considers possibly Turing complete, belongs to Exp Class. This might suggest that computations performed by this rule are on average quite inefficient.

**Zenil's Classification**. Zenil's (2010) classification of ECAs offers a great formalization of Wolfram's and seems to roughly correspond to it. Compared to the transient classification, it is, however, less fine-grained. Moreover, it contains some arbitrary parameters, such as the data representation and compression algorithm used. In addition, it uses a clustering technique, which requires data of multiple automata to be mutually compared in order to give rise to different classes. In contrast, the transient class can be determined for a single automaton without any context.

Another important difference is that Zenil observed the simulations from a fixed initial configuration; that is, he examined the local dynamics of ECAs. In contrast, the transient classification studies their global dynamics.

**Wuensche's Z-parameter**. Wuensche and Lesser (2001) suggest that complex behavior occurs around *Z* = 0.75, which agrees with the fact that the Lin Class rules with a steep slope (rules 110, 62, and 25) have precisely this *Z* value. However, the value *Z* = 0.75 is in fact quite frequent among ECAs. This suggests that, thanks to its simplicity, the *Z*-parameter can be used to narrow down a vast space of CA rules when searching for complexity; more refined methods then have to be applied to find concrete CAs with interesting behavior.

### 3.5 Transient Classification of 2D CAs

So far, we have examined the toy model of ECAs. The transient classification's true usefulness would stem from its application to more complex CAs, where it could be used to discover automata with interesting behavior. Therefore, we applied the classification to a subset of two-dimensional CAs with a 3 × 3 neighborhood and three states to see whether 2D automata would still exhibit such clear transient growths.

We work with 2D CAs operating on a finite square grid of size *n* × *n*. We consider the topology of the grid to be that of a torus for each cell to have a uniform neighborhood. We estimated the average transient length and measured the asymptotic growth with respect to *n* (i.e., the size of the square grid's side). This is motivated by the fact that in an *n* × *n* grid, the greatest distance between two cells depends linearly on *n* rather than quadratically.
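Under these conventions, one synchronous update on the *n* × *n* torus can be sketched with `np.roll` providing the periodic boundary; the `local_rule` callback is an illustrative placeholder for a concrete 3-state rule table:

```python
import numpy as np

def ca2d_step(grid, local_rule):
    """One synchronous update of a 2D CA on an n×n torus.

    grid: np.ndarray of cell states; local_rule: function mapping the
    3×3 neighborhood (flattened row by row into 9 values) to a new
    state. np.roll implements the periodic (toroidal) boundary.
    """
    shifts = [(dr, dc) for dr in (1, 0, -1) for dc in (1, 0, -1)]
    # neighborhoods[:, i, j] lists the 3×3 neighborhood of cell (i, j),
    # row by row: roll by (dr, dc) places grid[i-dr, j-dc] at (i, j).
    neighborhoods = np.stack([np.roll(grid, s, axis=(0, 1)) for s in shifts])
    out = np.empty_like(grid)
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            out[i, j] = local_rule(neighborhoods[:, i, j])
    return out
```

A rule invariant under all symmetries of the square, as required below, is any `local_rule` whose value is unchanged by the eight rotations and reflections of its 3×3 input (e.g., any function of the multiset of neighbor states).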

To reduce the vast automaton space, we considered only automata whose local rules are invariant under all the symmetries of a square. As there are still 3^{2861} such symmetrical 2D CAs, we randomly sampled 10,000 of them.

For such a large space, we could not examine each CA individually. Therefore, we fit the average transient growth to bounded, logarithmic, linear, polynomial, and exponential functions to obtain the Bounded, Log, Lin, Poly, and Exp classes. If none of the fits gave a good enough score (i.e., *R*^{2} > 85%), we marked the corresponding CA as unclassified. We were able to classify 93.03% of the 10,000 sampled automata, with a time bound of 40 s for the computation of one transient length value on a single CPU. We estimate that most CAs remained unclassified due to this restriction on computation resources and the rather strict conditions we imposed on a good regression fit. In this large space of 2D CAs, Exp Class seems to dominate the rule space. Another interesting aspect in which 2D CAs differ from ECAs is the emergence of rules in Poly Class; the transients of such rules grow approximately quadratically. Moreover, our results suggest that Bounded Class CAs are much scarcer in 2D, as we found none in our sample. See Table 1.

| Transient Class | Percentage of CAs |
|---|---|
| Bounded | 0% |
| Log | 18.21% |
| Lin | 1.17% |
| Poly | 1.03% |
| Exp | 72.62% |
| Unclassified | 6.97% |


We observed the space-time diagrams of randomly sampled automata from each class to infer the class's typical behavior. On average, Log Class automata quickly enter attractors of small size. Lin Class exhibits the emergence of various local structures; for automata with a more gradual incline, such structures seem to die out quite fast, whereas automata with steeper slopes exhibit complex interactions of such structures. Poly Class automata with a steep slope seem to produce spatially separated regions of chaotic behavior against a static background; in the case of more gradual slopes, some local structures emerge. Finally, Exp Class automata seem to evolve chaotically with no apparent local structures. We present various examples of CA evolution dynamics in the form of GIF animations here^{2}.

This suggests that the regions of Lin Class with a steep slope and of Poly Class with a more gradual incline seem to contain a nontrivial ratio of automata with complex behavior. In this sense, the transient classification can assist us in automatically searching for complex automata, similarly to the method designed by Cisneros et al. (2019), where interesting novel automata were discovered by measuring the growth of structured complexity using a data compression approach.

### 3.6 Transient Classification of Other Well-Known CAs

We were interested in whether some well-known complex automata from larger CA spaces would conform to the transient classification as well. As we show in this section, the result is positive.

**Game of Life**. As the left plot in Figure 14 suggests, the Turing complete Game of Life (Gardner, 1970) seems to fit Lin Class. This is confirmed by the linear regression fit with *R*^{2} ≈ 98.4%.

**Genetically Evolved Majority CAs**. Mitchell et al. (2000) studied how genetic algorithms can evolve CAs capable of global coordination. The authors were able to find a 1D CA denoted as *ϕ*_{par} with two states and radius *r* = 3, which is quite successful at computing the majority task, with the output required to be a homogeneous state of either all 0s or all 1s. This CA seems to belong to Lin Class, which is confirmed by the linear regression fit with *R*^{2} ≈ 99.2%. See Figure 15.

**Totalistic 1D 3-state CAs**. A totalistic CA is any CA whose local rule depends only on the number of cells in each state and not on their particular positions. Wolfram studied various CA classes, one of them being totalistic 1D CAs with radius *r* = 1 and 3 states *S* = {0, 1, 2}.

Wolfram (2002) presents a list of possibly complex CAs from this class. We applied the transient classification to such CAs and learned that most of them were classified as logarithmic. This agrees with our space-time diagram observations that the local structures in such CAs “die out” quite quickly. Nonetheless, some of the CAs were classified as linear. An example of such a CA is in Figure 16 where the linear regression fit has *R*^{2} ≈ 97.63%.

## 4 Turing Machines

In order to demonstrate the generality of the transient classification method, we further used it to examine the dynamics of TMs. In this section, we present the classification results.

### 4.1 Introducing Turing Machines

Informally, a TM consists of an infinite tape divided into cells and a movable reading head that scans one cell of the tape at a time. Every cell contains a symbol from some finite alphabet *A*, and the TM is at an internal state from a finite set *S*. Depending on the symbol the head is reading and on its internal state, the TM changes its internal state, rewrites the symbol on the tape, and either moves one cell to the left or right, or stays in place. TMs represent the most classical model of computation; the Church–Turing thesis states that “effectively calculable functions” are exactly those that can be realized by a TM (Turing, 1937). For a formal definition of TMs as well as a great introduction to computability theory, see Soare (2016).

Throughout this section, *S* will denote the finite set of internal states and *A* the finite set of tape symbols. To ensure the TM operates on a finite grid, as in the case of CAs, we will consider a tape of finite size with periodic boundary conditions. Therefore, each TM with states *S* and alphabet *A* operating on a cyclic tape of size *n* gives rise to a global update function on a finite configuration space, each configuration consisting of the tape content, the head position, and the internal state. Instead of regarding the TM as halting only when it reaches a distinguished halting state, we regard it as halting whenever it enters *any* attractor. This is a much weaker notion of halting.

We will depict the space-time diagrams of TM computation as a matrix: each row corresponds to the content of the tape at subsequent time steps, and time progresses downwards. As opposed to CAs with their inherently parallel nature, TMs are sequential computational models; thus, at each time step, only one symbol on the tape is changed. To produce space-time diagrams comparable to those of CAs, we depict the tape contents only at every *n*th step, where *n* is the size of the tape. This helps us intuitively recognize the chaotic dynamics of TMs; see Figure 17. To the best of our knowledge, no prior work examines the transients of TMs operating on cyclic tapes.
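A single step of a TM on a cyclic tape can be sketched as follows; the configuration encoding (state, head position, tape) and the transition-table format are our own illustrative choices, not the article's implementation:

```python
def tm_step(config, delta):
    """One step of a Turing machine on a cyclic tape.

    config: (state, head, tape) where tape is a tuple of symbols;
    delta: dict mapping (state, symbol) -> (new_state, written_symbol,
    move) with move in {-1, 0, +1}. The head wraps around the tape, so
    the configuration space is finite and every trajectory is
    eventually periodic.
    """
    state, head, tape = config
    new_state, written, move = delta[(state, tape[head])]
    new_tape = tape[:head] + (written,) + tape[head + 1:]
    return new_state, (head + move) % len(new_tape), new_tape
```

Iterating `tm_step` and detecting the first repeated configuration then yields the transient length under the weak notion of halting described above.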

### 4.2 Transient Classification of Turing Machines

We have studied “small” TMs with the number of states |*S*| ranging from 4 to 8 and the number of alphabet symbols |*A*| ranging from 2 to 5. For every such combination of values |*S*| and |*A*|, we have randomly generated 100 transition functions of TMs and computed each TM's average transient length estimate for cyclic tapes of sizes ranging from 20 to 400.

#### 4.2.1 Results

For all the considered values of |*S*| and |*A*|, more than 90% of the TMs were successfully classified using the transient classification method. We discuss a particular example in more detail below.

**Turing Machines with 7 states and 4 symbols**. As an example, we present classification results of 100 TMs with 7 states and 4 tape symbols. The results are summarized in Table 2.

| Transient Class | Percentage of TMs |
|---|---|
| Bounded | 41% |
| Log | 2% |
| Lin | 28% |
| Poly | 13% |
| Exp | 15% |
| Unclassified | 1% |


**Bounded Class**: (41/100 TMs) Bounded Class seems to dominate this space of rather small TMs. TMs in Bounded Class halt in time independent of the tape size; therefore, it seems improbable that such TMs perform any nontrivial computation on either a finite or an infinite tape.

**Log Class**: (2/100 TMs) The Log Class seems to be relatively small across all the TM classes we have examined. Here, for large tape sizes, it is improbable that the TM head reads the whole input on the tape.

**Lin Class**: (28/100 TMs) Consider a TM with trivial dynamics that, given any input configuration, traverses the cells one by one and rewrites each cell's symbol to 0 ∈ *A*. After all cells contain this symbol, the computation halts. Such trivial behavior could be realized in constant time by a CA, but a TM needs at least *n* steps, where *n* is the size of the tape. Hence, some TMs in Lin Class exhibit periodic or simple dynamics. This emphasizes that membership in Lin Class or Poly Class seems to be a necessary condition for complexity, not a sufficient one. See Figure 18. Nevertheless, we have observed TMs in Lin Class whose space-time diagrams seem to contain some higher-level structures.

**Poly Class**: (13/100 TMs) In Poly Class we have also observed TMs producing some higher-order structures. See Figure 19.

**Exp Class**: (15/100 TMs) We find it interesting that once only every *n*th row of the space-time diagram is depicted (*n* being the tape size), the space-time diagrams of TMs in Exp Class resemble those of chaotic CAs. See Figure 20.

As in the case of CAs, we are aware that the true asymptotic behavior of TMs in Lin Class or Poly Class might turn out to be logarithmic or exponential. In such a case, the systems in these classes would need significantly longer time to converge to their typical long-term behavior, which is a typical property of systems at a phase transition.

### 4.3 Transient Classification of Universal TM

Without much doubt, universal TMs are considered complex. We have estimated the asymptotic average computation time of 7 universal TMs with a small number of states and symbols constructed by Rogozhin (1996). All were successfully classified, 6 belonging to Poly Class and 1 to Lin Class. An example is shown in Figure 21.

Such results agree with the ones obtained for CAs and support the hypothesis that complex dynamical systems belong to Lin Class or Poly Class.

## 5 Random Boolean Networks

Random Boolean networks form a very wide class of discrete dynamical systems that contains both CAs and TMs. In this section, we show that dynamical systems from this general class also conform to our classification method.

### 5.1 Introducing Random Boolean Networks

The classical *N*−*K* RBN is given by an oriented graph with *N* nodes, each having exactly *K* edges pointing toward it. In addition, each node is equipped with a Boolean function of *K* variables. Every node can have the value of either 0 or 1; therefore, the configuration space is exactly {0, 1}^{*N*}. To update a particular configuration of the network, the values of all nodes are changed in parallel, according to the outputs of their corresponding Boolean functions. This gives rise to a global update rule *F*: {0, 1}^{*N*} → {0, 1}^{*N*}. For a concise introduction to RBNs see Gershenson (2004).

RBNs were first introduced by Stuart Kauffman (1969) as models of gene regulatory networks. Classically, the nodes are interpreted as genes of a particular organism; their value represents whether the gene is “turned on” or “off.” In this setting, the attractors of the network represent different cell types of the organism. Over the years, the networks have been widely studied as models of cell differentiation (Huang & Ingber, 2000), immune response (Kauffman & Weinberger, 1989), and neural networks (Wuensche, 1996). The measures of criticality in RBNs have been studied by Luque and Solé (2000), Wang et al. (2011), and Pineda et al. (2019). For a great overview of work done on RBNs see Aldana et al. (2003).
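The synchronous update rule can be sketched compactly. This is our own illustrative representation (truth tables indexed by the concatenated input bits), not code from the article; for brevity, inputs are sampled with possible repetition:

```python
import random

def random_nk_rbn(n, k, p, rng=random):
    """Sample an N-K RBN: node i gets k random input nodes and a random
    Boolean function of k variables with bias p, i.e., P[f_i(...) = 1] = p."""
    inputs = [tuple(rng.randrange(n) for _ in range(k)) for _ in range(n)]
    # Each Boolean function is stored as a truth table with 2^k entries.
    tables = [tuple(1 if rng.random() < p else 0 for _ in range(2 ** k))
              for _ in range(n)]
    return inputs, tables

def rbn_step(state, inputs, tables):
    """One synchronous update F: {0,1}^N -> {0,1}^N."""
    out = []
    for inp, table in zip(inputs, tables):
        idx = 0
        for j in inp:                      # index the truth table by the
            idx = (idx << 1) | state[j]    # concatenated input bits
        out.append(table[idx])
    return tuple(out)
```

Representing configurations as tuples makes them hashable, which is convenient when detecting repeated configurations during transient measurement.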

#### 5.1.1 Critical Behavior in RBNs

RBNs are generic in the sense that both the connections of nodes and the Boolean functions are chosen uniformly at random. This makes it possible to analytically study the properties of a typical *N*−*K* network. Indeed, different approaches (Derrida & Pomeau, 1986; Luque & Solé, 1997) lead to the same description of phase transitions in RBNs. We describe the results briefly below, as we will use them in our experiments.

Given a network of size *N*, we will assign to each node *i* ∈ {1, …, *N*} the connectivity *K _{i}* ∈ $\mathbb{N}$ and a Boolean function *f _{i}* of arity *K _{i}*. Such a network is parametrized by the mean connectivity 〈*K*〉 = $\frac{1}{N}\sum_{i=1}^{N} K_i$. We also introduce the Boolean function sampling bias *p* ∈ (0, 1). That is, we will sample the Boolean functions so that for all *i* the probability that *f _{i}*(*x*_{1}, …, *x*_{*K _{i}*}) = 1 is *p*. Derrida and Pomeau (1986) have analytically determined the edge of chaos for RBNs depending on the mean connectivity parameter 〈*K*〉 and the Boolean function bias *p*. By studying the evolution of the distance between two randomly generated initial configurations over time, they have shown that the critical values of 〈*K*〉 and *p* are exactly those satisfying

$$2\,p\,(1 - p)\,\langle K \rangle = 1. \tag{1}$$

In section 5.2, we support the analytical results by showing that the transient classification clearly distinguished between the ordered, critical, and chaotic regions.
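For reference, the critical condition of Derrida and Pomeau (1986), 2*p*(1 − *p*)〈*K*〉 = 1, directly yields the critical connectivity for a given bias. A one-line sketch:

```python
def critical_connectivity(p):
    """Critical mean connectivity <K_c> solving 2*p*(1 - p)*<K_c> = 1."""
    return 1.0 / (2.0 * p * (1.0 - p))
```

For the unbiased case *p* = 0.5 this gives the classical value 〈*K _{c}*〉 = 2; the critical connectivity grows as *p* moves away from 0.5.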

#### 5.1.2 Phase-Space Properties of RBNs

The generic nature of *N*−*K* RBNs makes it possible to analytically study their global dynamics. This is a key difference between CAs and RBNs: CAs have a very particular architecture with only local connections and a uniform local transition rule. Therefore, the mean-field approximation methods of phase-space properties used for a “typical” *N*−*K* RBN would not be as easy to apply to CAs.

With the classical interpretation of RBNs as gene regulatory networks where attractors represent different cell types, most of the focus has been on analyzing the number and size of the attractors for different values of *N* and *K*. We briefly summarize some results related to our experiments below. For a more detailed discussion see Aldana et al. (2003).

**Ordered Phase**. In the ordered phase, when *K* = 1, it has been shown that the probability of having an attractor of size *l* falls off exponentially with *l* (Flyvbjerg & Kjaer, 1988). For a subset of *K* = 2 RBNs with ordered behavior, Lynch (1986) has shown that their average transient time grows at most logarithmically with the network size *N*.

**Chaotic Phase**. In the case when *p* = $12$ and *K* ≥ *N*, the RBN is essentially a random mapping whose phase-space properties have been studied extensively. It has been shown that both the average attractor and transient lengths of such RBNs grow exponentially with increasing *N* (Harris, 1960; Derrida & Flyvbjerg, 1987).

### 5.2 Transient Classification of RBNs

We have sampled RBNs parametrized by the mean connectivity 〈*K*〉 and the Boolean function bias *p*. In this section we show that the results of transient classification clearly distinguish the ordered, critical, and chaotic phases of RBNs.

#### 5.2.1 Details of the Experiment

Our goal is to estimate the average transient length of a “typical” RBN of size *N*, with mean connectivity 〈*K*〉 and Boolean function bias *p*. We do so for increasing *N* to observe the asymptotic behavior. We proceed as follows:

1. Given *N*, 〈*K*〉, and *p*, we generate a RBN *R*(*N*, 〈*K*〉, *p*) with the corresponding parameters. We estimate the average transient length *T*(*R*(*N*, 〈*K*〉, *p*)) of *R*(*N*, 〈*K*〉, *p*) using the approach described in section 2.4.
2. We repeat step 1 and generate a sequence of RBNs $R_1(N, \langle K \rangle, p), \ldots, R_m(N, \langle K \rangle, p)$ and their average transient lengths $T(R_1(N, \langle K \rangle, p)), \ldots, T(R_m(N, \langle K \rangle, p))$ to ensure that we are close to the average transient length of an average RBN with parameters *N*, 〈*K*〉, and *p*. We determine the number of sampled RBNs needed to get sufficiently close to the true average behavior by a method analogous to the one described in section 2.4. Finally, we obtain the typical average transient length as $T(N, \langle K \rangle, p) = \frac{1}{m}\sum_{i=1}^{m} T(R_i(N, \langle K \rangle, p))$.
3. We approximate the sequence $(T(N, \langle K \rangle, p))_{N=1}^{\infty}$ by generating a finite part of it. We typically compute $(T(N, \langle K \rangle, p))_{N=5}^{200}$, the upper bound being either *N* = 200 or the limit imposed by the computation time of the transient lengths.
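The transient length of a single run can be measured by iterating the map until a configuration repeats. A straightforward sketch using a hash map of visited configurations (one of several possible cycle-detection strategies; the article's estimation procedure is described in its section 2.4):

```python
import random

def transient_length(step, state):
    """Iterate a deterministic map from `state` until a configuration
    repeats; return the number of steps taken before entering the loop."""
    seen = {state: 0}
    t = 0
    while True:
        state = step(state)
        t += 1
        if state in seen:
            return seen[state]  # index of the first state on the cycle
        seen[state] = t

def average_transient(step, sample_state, n_samples, rng=None):
    """Average transient length over randomly sampled initial states."""
    rng = rng or random.Random()
    total = sum(transient_length(step, sample_state(rng))
                for _ in range(n_samples))
    return total / n_samples
```

Here `step` is the system's global update rule and `sample_state` draws a random initial configuration; both names are illustrative placeholders.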

### 5.3 Results

**Ordered Phase**. We have computed the critical values 〈*K _{c}*〉, *p* along the curve given by (1) for *p* = 0.1, 0.2, …, 0.9 and sampled RBNs with parameters 〈*K _{c}*〉 − 1, *p* to ensure we are in the ordered region. See Figure 23.

For all such ensembles of RBNs the best fit for the typical average transient asymptotic growth was logarithmic. This supports the analytical results proven for special cases of *K* and *p* values.

**Critical Phase**. We have sampled RBNs with parameters 〈*K _{c}*〉, *p* along the curve given by (1) for *p* = 0.1, 0.2, 0.3, …, 0.9. In all the sampled cases, the best fit for the typical average transient growth was linear. As in the case of CAs, we are aware that the asymptotic behavior of such RBNs can turn out to be logarithmic or exponential and that we might simply not have sampled large enough networks. In such a case, we can interpret Lin Class as a region of RBNs that take significantly longer to converge to their asymptotic behavior. See Figure 24.

**Chaotic Phase**. We have computed the critical values 〈*K _{c}*〉, *p* along the curve given by (1) for values *p* = 0.2, 0.3, …, 0.7, 0.8 and sampled RBNs with parameters 〈*K _{c}*〉 + 2, *p* to ensure we are in the chaotic region. For all such ensembles of RBNs, the best fit for the typical average transient asymptotic growth was exponential, which again agrees with the analytic results. See Figure 25.

These experiments support the results obtained for CAs and TMs indicating that ordered discrete systems belong to Bounded Class and Log Class, chaotic systems correspond to Exp Class, and complex systems lie in the region in between, corresponding to Lin Class and Poly Class.

## 6 Conclusion

We presented a classification method based on the asymptotic growth of average computation time. It is applicable to any deterministic discrete space and time dynamical system. We presented a good correspondence between transient classification and Wolfram's (2002) classification in the case of ECAs. Further, we showed that the classification works for 2D CAs, TMs, and RBNs, and we used it to discover 2D CAs capable of emergent phenomena. By demonstrating that complex discrete systems (e.g., Game of Life, rule 110, and several universal TMs and RBNs with critical parameter values) belong to Lin Class or Poly Class, we have reason to believe that linear and polynomial transient growth navigates us toward a region of complex discrete systems.

Another elegant alternative would be to merge Bounded Class and Log Class representing the ordered phase, and Lin Class and Poly Class corresponding to the critical phase. In this way, we would obtain the traditional three phases of dynamics. This is entirely possible; we have kept the five classes to respect our initial experiments on elementary CAs where we obtained a much finer classification scheme.

The classification is based on a very simple idea and can be implemented with a few lines of code. In the field of ALife where novel discrete dynamical systems are designed as possible models of artificial evolution, the method we presented can be used to check whether such systems belong to Lin Class or Poly Class. This might support the claim that such systems are capable of complex dynamics and emergent phenomena.

## 7 Future Work

We are interested in examining the transient growth of recurrent neural networks (RNNs). In the simplest case, we can add a “rounding off” output layer to discretize the configuration space. Then, the transient classification could be used to study the dynamics of RNNs, possibly guiding us toward appropriate network initializations and overall architectures yielding complex dynamics.

It would also be interesting to examine Busy Beaver TMs. These TMs take the longest time to halt (in the classical sense) among all TMs with the same number of states and tape symbols when run from an empty tape. It would be interesting to observe how such machines behave when run from a randomly sampled initial configuration and whether they would exhibit complex dynamics, possibly being computationally universal (Zenil, 2012).

Lastly, we could examine the dynamics of systems when simulated from a special region of their configuration space. The initial configurations could be generated by a designated algorithm, possibly discovering completely different dynamics of a system, as opposed to its average behavior.

## Funding Information

Our work was supported by Grant Schemes at CU, reg. no. CZ.02.2.69/0.0/0.0/19_073/0016935, the Ministry of Education, Youth and Sports within the dedicated program ERC CZ under the project POSTMAN with reference LL1902, by the Czech project AI&Reasoning CZ.02.1.01/0.0/0.0/15_003/0000466, European Regional Development Fund, by SVV-2020-260589, and is part of the RICAIP project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 857306.

## Notes

Here, the computation time is understood in the abstract sense, as the number of iterations of the transition function.