Computational Hierarchy of Elementary Cellular Automata

The complexity of cellular automata is traditionally measured by their computational capacity. However, it is difficult to choose a challenging set of computational tasks suitable for the parallel nature of such systems. We study the ability of automata to emulate one another, and we use this notion to define such a set of naturally emerging tasks. We present the results for elementary cellular automata, although the core ideas can be extended to other computational systems. We compute a graph showing which elementary cellular automata can be emulated by which and show that certain chaotic automata are the only ones that cannot emulate any automata non-trivially. Finally, we use the emulation notion to suggest a novel definition of chaos that we believe is suitable for discrete computational systems. We believe our work can help design parallel computational systems that are Turing-complete and also computationally efficient.


Introduction
Discrete systems exhibit a wide variety of dynamical behavior ranging from ordered and easily predictable to a very complex, disordered one. In this paper, we study cellular automata (CA) that have intriguing visualizations of their space-time dynamics. Observing them helps us build intuition for distinguishing the different types of dynamics, as studied by Wolfram (2002). Informally, complex CA produce higher-order structures, whereas the space-time diagrams of chaotic CA are seemingly random. Though CA dynamics has been studied extensively (Wolfram (1984), Kůrka (2009), Wuensche and Lesser (2001), Gutowitz (1990), Zenil (2009)), it is still a very difficult problem to formally define the notions of complexity and chaos in CA.
Traditionally, the complexity of CA is studied through their computational capacity (Toffoli (1977), Cook (2004)), whereas chaos is studied via topological dynamics (Devaney (1989)). As the two approaches are not connected in any obvious way, it is an interesting open problem whether chaotic CA can compute non-trivial tasks.
In this paper, we study the complexity of CA through their computational capacity; namely, we study their ability to emulate one another; this notion was first introduced by Mazoyer and Rapaport (1999). We present the emulation relation between a pair of CA, and we demonstrate the results on a toy class of elementary CA; namely, we present their computational hierarchy. The results inspired us to define a new notion of chaos for discrete dynamical systems connected to their computational capacity.

Introducing Cellular Automata
A cellular automaton can be perceived as a k-dimensional grid consisting of identical finite state automata with the same local neighborhood. They are all updated synchronously in discrete time steps according to a fixed local update rule which is a function of the neighbors' states. A formal definition can be found in Kari (2005).

Basic Notions
We say that $\mathbb{Z}$ is a one-dimensional cellular grid and we call its elements the cells. Let $S$ be a finite set. An $S$-configuration of the grid is a mapping $c : \mathbb{Z} \to S$; we write $c_i = c(i)$ for each $i \in \mathbb{Z}$. We define the nearest-neighbors relative neighborhood of each cell $i \in \mathbb{Z}$ to be the triple $(i - 1, i, i + 1)$. In this paper, we study 1-dimensional CA with nearest neighbors. Each such CA is characterized by a tuple $(S, f)$ where $S$ is a finite set of states and $f : S^3 \to S$ is the local transition rule of the CA. The global rule of the CA $(S, f)$ operating on an infinite grid is a mapping $F : S^{\mathbb{Z}} \to S^{\mathbb{Z}}$ defined as:
$$F(c)_i = f(c_{i-1}, c_i, c_{i+1}) \quad \text{for each } i \in \mathbb{Z}.$$
For practical purposes, when observing the CA simulations, we consider the grid to be of finite size with a periodic boundary condition. In such a case, we compute the cells in the relative neighborhood modulo the size of the grid.
Elementary CA
Elementary cellular automata (ECA) are 1D nearest-neighbors CA with states $S = \{0, 1\}$. We identify each local rule $f$ determining an ECA with the Wolfram number of $f$ defined as:
$$W(f) = \sum_{a, b, c \in \{0, 1\}} f(a, b, c) \cdot 2^{4a + 2b + c}.$$
We will refer to each ECA as a "rule $k$" where $k$ is the corresponding Wolfram number of its underlying local rule. The class of ECA is a frequently used toy model for studying different CA properties due to its relatively small size; there are only 256 of them.
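To make the numbering concrete, the following Python sketch decodes a Wolfram number into a local rule and packs a local rule back into its number. The helper names `local_rule` and `wolfram_number` are ours and purely illustrative; the sketch assumes the usual convention in which the neighborhood $(a, b, c)$ selects bit $4a + 2b + c$ of the number.

```python
from itertools import product

def local_rule(number):
    """Return the local rule f(a, b, c) of the ECA with the given Wolfram number."""
    def f(a, b, c):
        # The neighborhood (a, b, c), read as a binary number, selects one bit.
        return (number >> (4 * a + 2 * b + c)) & 1
    return f

def wolfram_number(f):
    """Inverse of local_rule: pack the eight outputs of f into an integer."""
    return sum(f(a, b, c) << (4 * a + 2 * b + c)
               for a, b, c in product((0, 1), repeat=3))

identity = local_rule(204)
# Rule 204 copies the middle cell of every neighborhood.
print([identity(a, b, c) for a, b, c in product((0, 1), repeat=3)])
# The rule "left XOR right" is rule 90.
print(wolfram_number(lambda a, b, c: a ^ c))
```

Decoding and re-encoding any number between 0 and 255 returns the same number, which is a quick sanity check of the bit order.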
For an ECA operating on a cyclic grid of size $n$ with global rule $F$, we define the trajectory of a configuration $u \in \{0, 1\}^n$ to be $(u, F(u), F^2(u), \ldots)$. The space-time diagram of such a simulation is obtained by plotting the configurations as horizontal rows of black and white squares (corresponding to states 1 and 0) with time progressing downwards.
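A minimal sketch of such a simulation on a cyclic grid is given below. The function names `step`, `trajectory`, and `render` are illustrative assumptions, and the text rendering (`#` for state 1, `.` for state 0) merely stands in for the black and white squares.

```python
def step(number, u):
    """Apply the global rule of ECA `number` once on a cyclic grid."""
    n = len(u)
    # Neighbors are taken modulo the grid size (periodic boundary condition).
    return [(number >> (4 * u[(i - 1) % n] + 2 * u[i] + u[(i + 1) % n])) & 1
            for i in range(n)]

def trajectory(number, u, steps):
    """Return the rows u, F(u), ..., F^steps(u)."""
    rows = [list(u)]
    for _ in range(steps):
        rows.append(step(number, rows[-1]))
    return rows

def render(rows):
    """Text version of a space-time diagram, time progressing downwards."""
    return "\n".join("".join("#" if x else "." for x in row) for row in rows)

start = [0] * 15
start[7] = 1
print(render(trajectory(90, start, 7)))
```

Running it for rule 90 from a single live cell prints the beginning of the familiar nested triangle pattern.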

CA Complexity via Computational Capacity
Classically, the complexity of a CA is demonstrated by its computational capacity. Intuitively, we believe that CA capable of computing non-trivial tasks should be more complex than those that are not. In the past, many different computational problems were considered, such as the majority computation task (Mitchell et al. (2000)) or the firing squad synchronization (Mazoyer (1986)); a detailed overview was written by Mitchell (1998). Nevertheless, the most classical task is the simulation of a computationally universal system (a Turing machine, a tag system, etc.). Over the years, many different CA were designed or shown to be Turing complete. However, it seems unnatural to demonstrate the complexity of an inherently parallel system by embedding a sequential computational model into it. In our opinion, an ideal set of benchmark tasks helping us determine the computational capacity of CA should satisfy the following:
• it should consist of tasks suitable for the parallel computational environment;
• it should be challenging enough;
• for a given CA and a task $T$, it should be effectively verifiable whether the CA can compute $T$.
For a fixed class C of CA, a natural task is the following: Given a CA in C, how many other CA in C can it simulate?
The key problem is finding a suitable definition of CA "simulating" one another. Various approaches have been suggested: simulations can be interpreted via CA coarse-grainings (Israeli and Goldenfeld (2006)), or through embedding a local rule into larger cell-blocks (Mazoyer and Rapaport (1999), Ollinger (2001)). Many interesting theoretical results stem from such definitions. For instance, Ollinger (2001) defined a CA simulation notion which admits a universal CA, that is, one able to simulate all other CA of the same dimensionality; such a CA was designed by Culik II and Albert (1987). In this paper, we consider a notion very similar to the one suggested by Mazoyer and Rapaport (1999). Compared to their definition, ours is stricter and, thus, slightly easier to verify. We call it CA emulation and introduce it in the subsequent section.
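Definition 1 (CA Subautomaton). Let $ca_1 = (S, f)$ and $ca_2 = (T, g)$ be two 1D nearest-neighbors CA. We say that $ca_1$ is a subautomaton of $ca_2$ if there exists a one-to-one mapping $enc : S \to T$ such that for all $s_1, s_2, s_3 \in S$:
$$enc(f(s_1, s_2, s_3)) = g(enc(s_1), enc(s_2), enc(s_3)).$$
This simply means that we can embed the rule table of $ca_1$ into the rule table of $ca_2$; the definition is equivalent to saying that $(S, f)$ is isomorphic to a subalgebra of $(T, g)$.
Example 2. For an ECA $ca = (\{0, 1\}, f)$, we define the dual ECA $\overline{ca} = (\{0, 1\}, \bar{f})$ as
$$\bar{f}(b_1, b_2, b_3) = 1 - f(1 - b_1, 1 - b_2, 1 - b_3).$$
Then $ca$ and $\overline{ca}$ are subautomata of each other, with the negation $enc(b) = 1 - b$ as the encoding.
Each ECA contains at most two subautomata from the ECA class: itself and its dual ECA. Hence, the subautomaton relation would not produce a rich hierarchy. It is, however, a key concept for defining the CA emulation. Below, we restrict the terminology to ECA, but the theory can be generalized to any 1D CA in a straightforward way.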
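The dual rule can be computed directly on Wolfram numbers. The sketch below is illustrative only: the names `dual` and `is_subautomaton_via_negation` are ours, and it assumes that the dual is obtained by negating all inputs and the output of the local rule, so that negation serves as the encoding.

```python
from itertools import product

def dual(number):
    """Wolfram number of the rule (a, b, c) -> 1 - f(1 - a, 1 - b, 1 - c)."""
    result = 0
    for idx in range(8):
        # Negating all three inputs maps neighborhood index idx to 7 - idx.
        bit = 1 - ((number >> (7 - idx)) & 1)
        result |= bit << idx
    return result

def is_subautomaton_via_negation(f_num, g_num):
    """Check enc(f(a, b, c)) == g(enc(a), enc(b), enc(c)) for enc(x) = 1 - x."""
    for a, b, c in product((0, 1), repeat=3):
        f_out = (f_num >> (4 * a + 2 * b + c)) & 1
        g_out = (g_num >> (4 * (1 - a) + 2 * (1 - b) + (1 - c))) & 1
        if 1 - f_out != g_out:
            return False
    return True

print(dual(30), dual(45), dual(204))
print(is_subautomaton_via_negation(30, dual(30)))
```

For instance, the dual of rule 30 is rule 135, the dual of rule 45 is rule 75, and the identity rule 204 is its own dual.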
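We extend $f$ to finite sequences: for $c = c_1 c_2 \ldots c_l \in 2^l$ with $l \geq 3$, we put $f(c) = f(c_1, c_2, c_3)\, f(c_2, c_3, c_4) \ldots f(c_{l-2}, c_{l-1}, c_l) \in 2^{l-2}$. By $f^k$ we simply mean the composition
$$f^k = \underbrace{f \circ f \circ \cdots \circ f}_{k\text{-times}}.$$
Each iteration of $f$ shortens the input size by 2; therefore, when we restrict the domain of $f^k$ to sequences of length $3k$, we get
$$f^k : 2^{3k} \to 2^k, \quad \text{i.e.,} \quad f^k : (2^k)^3 \to 2^k.$$
Hence, $f^k$ can be interpreted as a ternary function operating on supercells of $k$ bits, and we obtain a CA $(2^k, f^k)$ whose dynamics is completely governed by the simple local rule $f$. Therefore, each ECA $(2, f)$ gives rise to a series of CA
$$(2, f),\ (2^2, f^2),\ (2^3, f^3),\ \ldots$$
In Figure 2 we show the diagram of the $f^2$ computation.
Let $e : S \to T$ be a mapping between some finite sets. We extend it to $e : S^+ \to T^+$ simply as $e(s_1, s_2, \ldots, s_n) = e(s_1), e(s_2), \ldots, e(s_n)$ for each $s_1, \ldots, s_n \in S$, $n \in \mathbb{N}$.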
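The supercell rule $f^k$ can be sketched as follows. The names `apply_once` and `power_rule` are illustrative assumptions; supercells are represented as tuples of $k$ bits, and the rule is simply applied $k$ times to the concatenation of three supercells.

```python
def apply_once(number, bits):
    """Apply the local rule to every window of three cells.

    The output is two cells shorter than the input.
    """
    return tuple((number >> (4 * bits[i] + 2 * bits[i + 1] + bits[i + 2])) & 1
                 for i in range(len(bits) - 2))

def power_rule(number, k):
    """Return f^k as a ternary function on supercells of k bits."""
    def fk(left, mid, right):
        bits = tuple(left) + tuple(mid) + tuple(right)  # 3k bits
        for _ in range(k):
            bits = apply_once(number, bits)             # loses 2 bits per step
        return bits                                     # k bits remain
    return fk

f2 = power_rule(90, 2)
print(f2((1, 0), (1, 1), (0, 1)))
```

For rule 90 with $k = 2$, the result is the bitwise XOR of the left and right supercells, so the call above prints `(1, 1)`.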
Definition 5 (ECA Emulation). We say that $ca_1 = (2, f)$ can be emulated by $ca_2 = (2, g)$ with a supercell size $k$ if $ca_1$ is a subautomaton of $(2^k, g^k)$. In such a case, we write $ca_1 \leq_k ca_2$.
Hence, $(2, f) \leq_k (2, g)$ holds if and only if there exists a one-to-one encoding $enc : 2 \to 2^k$ such that for any initial configuration $c \in 2^3$ it holds that $enc(f(c)) = g^k(enc(c))$, where $enc$ is applied to $c$ cell by cell.
We call such an encoding the witnessing encoding. In other words, the following diagram commutes.
$$\begin{array}{ccc} 2^3 & \xrightarrow{\;\; f \;\;} & 2 \\ {\scriptstyle enc}\downarrow & & \downarrow{\scriptstyle enc} \\ (2^k)^3 & \xrightarrow{\;\; g^k \;\;} & 2^k \end{array} \qquad (1)$$
We say that $ca_1$ can be emulated by $ca_2$ if there exists some $k$ for which $ca_1 \leq_k ca_2$, and we write $ca_1 \leq ca_2$. If $ca_1 \leq_1 ca_2$, we say that $ca_2$ emulates $ca_1$ trivially. If $ca \leq_k ca$ for some $k > 1$, we say that $ca$ is self-similar.
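Whether a given encoding witnesses an emulation can be checked by testing all eight neighborhoods. The sketch below is a direct, unoptimized reading of the definition; the names `power_rule` and `emulates_with` and the representation of the encoding as a dictionary from bits to $k$-tuples are our assumptions.

```python
from itertools import product

def power_rule(number, k):
    """Return g^k as a ternary function on supercells of k bits."""
    def gk(left, mid, right):
        bits = tuple(left) + tuple(mid) + tuple(right)
        for _ in range(k):
            bits = tuple((number >> (4 * bits[i] + 2 * bits[i + 1] + bits[i + 2])) & 1
                         for i in range(len(bits) - 2))
        return bits
    return gk

def emulates_with(f_num, g_num, k, enc):
    """Return True if enc witnesses that rule f_num is emulated by rule g_num
    with supercell size k, i.e. enc(f(a, b, c)) == g^k(enc(a), enc(b), enc(c))."""
    if enc[0] == enc[1] or len(enc[0]) != k or len(enc[1]) != k:
        return False  # the encoding must be one-to-one into k-bit supercells
    gk = power_rule(g_num, k)
    for a, b, c in product((0, 1), repeat=3):
        f_out = (f_num >> (4 * a + 2 * b + c)) & 1
        if enc[f_out] != gk(enc[a], enc[b], enc[c]):
            return False
    return True

# Rule 90 is self-similar: it emulates itself with supercells of size 2.
print(emulates_with(90, 90, 2, {0: (0, 0), 1: (1, 1)}))
# Rule 30 trivially emulates its dual, rule 135, via negation.
print(emulates_with(135, 30, 1, {0: (1,), 1: (0,)}))
```

Both calls print `True`; replacing the emulated rule in the first call by rule 30 makes the check fail.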
In the next section, we give some simple proofs of the key properties of the $\leq$ relation. Namely, we prove that $\leq$ is a preorder and that whenever $ca_1 \leq ca_2$ and $ca_1$ is Turing complete, then $ca_2$ is also Turing complete.
Proof. Suppose that $(2, f) \leq_k (2, g)$ with a witnessing encoding $enc$, and let $c \in 2^l$ for some $l \geq 3$. Then, $f(c) \in 2^{l-2}$, and $enc(f(c))$ is a sequence of length $l - 2$, each of its elements being a binary $k$-tuple. We show that for each $i \in \{1, \ldots, l - 2\}$ the $i$-th supercells of $enc(f(c))$ and $g^k(enc(c))$ agree:
$$enc(f(c))_i = enc(f(c_i, c_{i+1}, c_{i+2})) = g^k(enc(c_i), enc(c_{i+1}), enc(c_{i+2})) = g^k(enc(c))_i.$$
Using the previous result, it can easily be shown by induction on $t$ that for each $t \in \mathbb{N}$ and for each sufficiently long $c \in 2^+$ it holds that:
$$enc(f^t(c)) = (g^k)^t(enc(c)).$$
Therefore, if $(2, f) \leq_k (2, g)$ then the diagram (1) commutes for arbitrarily long configurations and for arbitrarily many iterations of the functions $f$ and $g^k$. Consequently, any space-time diagram produced by $ca_1 = (2, f)$ can be efficiently encoded by $enc$ into a space-time diagram of $ca_2 = (2, g)$. Hence, if $ca_1$ is Turing complete, $ca_2$ must be as well.
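The commuting property for long configurations and several iterations can also be checked numerically. The following sketch does so on finite (non-cyclic) sequences; the helper names `apply_rule`, `encode`, and `diagram_commutes` are illustrative, and the seeded random configuration is only an example input.

```python
import random

def apply_rule(number, bits, times=1):
    """Apply the ECA local rule `times` times; each pass drops two cells."""
    bits = tuple(bits)
    for _ in range(times):
        bits = tuple((number >> (4 * bits[i] + 2 * bits[i + 1] + bits[i + 2])) & 1
                     for i in range(len(bits) - 2))
    return bits

def encode(enc, bits):
    """Replace every cell by its k-bit supercell and flatten the result."""
    return tuple(x for b in bits for x in enc[b])

def diagram_commutes(f_num, g_num, k, enc, c, t):
    """Compare enc(f^t(c)) with (g^k)^t(enc(c)), i.e. g applied k * t times."""
    left = encode(enc, apply_rule(f_num, c, t))
    right = apply_rule(g_num, encode(enc, c), k * t)
    return left == right

rng = random.Random(0)
c = [rng.randint(0, 1) for _ in range(20)]
print(diagram_commutes(90, 90, 2, {0: (0, 0), 1: (1, 1)}, c, 3))
```

With the self-similar rule 90 and the encoding $0 \mapsto 00$, $1 \mapsto 11$, the check succeeds for every configuration, as the proof predicts.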
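Emulation Relation Is a Preorder
Clearly, for each $ca$ it holds that $ca \leq_1 ca$. Hence, $\leq$ is reflexive.
Suppose that $ca_1 \leq_k ca_2$ with a witnessing encoding $enc_1$ and $ca_2 \leq_l ca_3$ with a witnessing encoding $enc_2$. Applying $enc_2$ to each bit of $enc_1$ yields a one-to-one encoding $2 \to 2^{kl}$. Since the diagram (1) commutes for arbitrarily long configurations and arbitrarily many iterations, this composed encoding witnesses $ca_1 \leq_{kl} ca_3$. Hence, $\leq$ is transitive and therefore a preorder.
Mazoyer and Rapaport (1999) use a more general definition: $(2, f)$ can be emulated by $(2, g)$ if there exist $k, l \in \mathbb{N}$ such that $(2^k, f^k)$ is a subautomaton of $(2^l, g^l)$. They present much deeper theoretical results about the notion in their paper. We, on the other hand, concentrate on explicitly computing which ECA can emulate which, in order to form their computational hierarchy, and we discuss the results.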

It holds that $(2, f) \leq_k (2, g)$ if and only if $(2^k, g^k)$ contains a two-element subalgebra isomorphic to $(2, f)$, the witnessing encoding being the corresponding algebra homomorphism. Israeli and Goldenfeld (2006) have defined a different notion of CA simulation called CA coarse-graining. It is interesting to notice that their notion is dual to the CA emulation we have defined in this paper. More specifically, CA coarse-graining directly corresponds to the congruences of algebras. Namely, $(2, g)$ can be coarse-grained into $(2, f)$ if and only if there exists some $k \in \mathbb{N}$ such that $(2, f)$ is isomorphic to some quotient algebra of $(2^k, g^k)$.
The interpretation of CA emulation via subalgebras comes in handy when developing an effective algorithm for computing the relation $\leq_k$ for a given supercell size $k$.

Emulation Computing Algorithm
Checking whether $ca_1 \leq_k ca_2$ becomes infeasible for large $k$. In this section, we present an algorithm computing the $\leq_k$ relation for a given $k$ and compare it to the "naive" algorithm (Algorithm 1) in terms of efficiency.
Algorithm 2: Subalgebra algorithm computing all $ca_1$ for which $ca_1 \leq_k ca_2$.
Input: $ca_2 = (2, g)$, supercell size $k$
Output: all $ca_1 = (2, f)$ together with a witnessing $enc$ such that $ca_1 \leq_k ca_2$
for $u, v \in 2^k$ do
    if $\{u, v\}$ is a valid subalgebra of $(2^k, g^k)$ then
        output the ECA $(2, f)$ induced on $\{u, v\}$ together with $enc : 0 \mapsto u,\ 1 \mapsto v$
The time complexity with respect to the supercell size $k$ is in $O(k \cdot 2^{2k})$ for both algorithms. However, Algorithm 1 needs to iterate over all ECA to compute the same result as Algorithm 2. Experimentally, we have indeed observed that Algorithm 1 takes approximately 250 times longer than Algorithm 2 to compute the emulated automata.
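The subalgebra search can be sketched in Python as follows. This is an illustrative, unoptimized reading of the idea rather than the implementation used for the experiments: the names `power_rule` and `emulated_rules` are ours, ordered pairs $(u, v)$ are enumerated so that both a rule and its dual are reported, and each evaluation of $g^k$ is recomputed from scratch.

```python
from itertools import product

def power_rule(number, k):
    """Return g^k as a ternary function on supercells of k bits."""
    def gk(left, mid, right):
        bits = left + mid + right
        for _ in range(k):
            bits = tuple((number >> (4 * bits[i] + 2 * bits[i + 1] + bits[i + 2])) & 1
                         for i in range(len(bits) - 2))
        return bits
    return gk

def emulated_rules(g_num, k):
    """Map each Wolfram number emulated by rule g_num with supercell size k
    to one witnessing encoding {0: u, 1: v}."""
    gk = power_rule(g_num, k)
    supercells = list(product((0, 1), repeat=k))
    found = {}
    for u, v in product(supercells, repeat=2):
        if u == v:
            continue
        enc, dec = {0: u, 1: v}, {u: 0, v: 1}
        number, closed = 0, True
        for a, b, c in product((0, 1), repeat=3):
            out = gk(enc[a], enc[b], enc[c])
            if out not in dec:          # {u, v} is not closed under g^k
                closed = False
                break
            number |= dec[out] << (4 * a + 2 * b + c)
        if closed:
            found.setdefault(number, enc)
    return found

print(sorted(emulated_rules(204, 2)))
print(sorted(emulated_rules(0, 2)))
```

The identity rule 204 emulates only itself, while rule 0 emulates itself and its dual, rule 255; the two calls therefore print `[204]` and `[0, 255]`.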

Emulation Hierarchy of ECA
In this section, we present a computational hierarchy based on the emulation relation. Due to computational limitations, we were able to compute $\leq_k$ only for supercell sizes $k$ ranging from 1 to 11.
In Example 2 we have seen that for $ca = (2, f)$ and its dual $\overline{ca} = (2, \bar{f})$ it holds that $ca \leq_1 \overline{ca}$ and $\overline{ca} \leq_1 ca$. Thus, $\leq_1$ is an equivalence relation on the set of ECA, each class containing exactly $f$ and its dual $\bar{f}$ (those might coincide for some rules). From the transitivity of $\leq$ we know that $ca$ can emulate (and be emulated by) exactly the same rules as $\overline{ca}$. Using a simple program, we obtained 136 different ECA equivalence classes given by $\leq_1$. In the following text, we identify each equivalence class with the ECA it contains that has the smaller Wolfram number. We show the hierarchy for such representatives and for supercells of size $k$ ranging from 2 to 11. Specifically, whenever we found that $ca_1 \leq_k ca_2$ for some $k \in \{2, 3, \ldots, 11\}$ we represent it by the following diagram.
Many ECA are capable of emulating simple rules, such as rule 0 or the identity rule 204. Adding so many edges to the diagram would make it unreadable. Hence, we present the hierarchy in three parts.
Figure 3: Emulation Hierarchy of ECA computed for supercell sizes ranging from 2 to 11; main part of the diagram. Some edges are depicted in light gray purely for better understandability. A looped arrow marks self-similar rules.
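The number of equivalence classes can be reproduced with a few lines of Python. The sketch assumes that the dual of a rule is obtained by negating all inputs and the output of its local rule; the names `dual`, `representatives`, and `self_dual` are illustrative.

```python
def dual(number):
    """Wolfram number of the rule (a, b, c) -> 1 - f(1 - a, 1 - b, 1 - c)."""
    result = 0
    for idx in range(8):
        # Negating all inputs maps neighborhood index idx to 7 - idx.
        result |= (1 - ((number >> (7 - idx)) & 1)) << idx
    return result

# Each class {f, dual(f)} is represented by its smaller Wolfram number.
representatives = sorted({min(n, dual(n)) for n in range(256)})
self_dual = [n for n in range(256) if dual(n) == n]

print(len(representatives))  # 136
print(len(self_dual))        # 16
```

The count agrees with a simple argument: 16 rules coincide with their dual, so there are $(256 + 16) / 2 = 136$ classes.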
Main Part
For compactness, we often show multiple rules in the same node in Figure 3. However, that does not imply that such rules can emulate one another. Rules in one node are emulated by, and can emulate, exactly the same rules contained in the main part in Figure 3. However, they do not necessarily emulate the same trivial or linear rules in Figures 4 and 5. Therefore, rules in the same node do not necessarily have identical computational capacity.
A task frequently studied in the CA environment is to determine the majority of 0's and 1's in the input configuration. The strict version of the task requires the CA to enter a homogeneous state of all 0's or all 1's to indicate the result. It has been shown that no ECA can solve this strict version. If we relax the requirements on the form of the output, Capcarrere et al. (1996) have shown that rule 184 solves the majority task exactly. From the main part of the hierarchy, we can see that by encoding the input configuration in a simple way, at least nine more ECA and their dual rules can solve the task.
Frequently Emulated Rules
The CA emulation offers a natural definition of a CA supporting memory: we say that a CA has a capability of perfect memory if it can emulate the identity rule 204 (alternatively, we could also tolerate the emulation of shifting rules or the negation). Hence, we have a practical criterion for checking whether a given CA can support memory effectively, i.e., with a small supercell size. Rules that cannot emulate any of the rules 51, 204, 170, and 240 with the examined supercell sizes are: 0, 3, 7, 8, 17, 21, 22, 30, 32, 45, 60, 64, 86, 89, 90, 102, 105, 128, 136, 150, 160, 192. They include trivial rules (0, 3, 8, 128), linear rules (60, 90, 105, 150), as well as chaotic rules (22, 30, 45, 86, 89).

Emulating Linear Rules
We say an ECA is linear if its local rule $f$ is a linear Boolean function; i.e., it satisfies $f(x + y) = f(x) + f(y)$ for all $x, y \in 2^3$, with addition taken modulo 2. Such ECA can be studied algebraically, which leads to exact results describing, e.g., their transients or the structure of their attractors (Martin et al. (1984), Jen (1988)). Jen (1999) has shown that the nonlinear rules 18, 126, and 146 can be mapped onto the linear rule 90. Figure 5 agrees with these results and further shows other nonlinear rules with this property, which makes them interesting candidates for further algebraic studies.

Rules | Number of emulated ECA
…, 20, 54, 57, 134, 148 | 7
14, 37, 56, 84, 94, 98, 156 | 6
0, 3, 8, 17, 60, 64, 90, 102, 105, 106, 120, 150, 170, 204, 240 | 1
Bottom: 30, 45, 86, 89 | 0

There are exactly four rules (and their duals) that seem to be unable to emulate any ECA non-trivially (i.e., with a supercell size larger than one). Rule 86 is obtained from rule 30 by exchanging the roles of the left and right neighbors. Rule 89 is obtained in the same way from the dual of rule 45. All four rules seem to belong to the most disordered ECA according to metrics such as the compression size of space-time diagrams (Zenil (2009)) or the transient classification (Hudcová and Mikolov (2020)). Due to the seemingly random patterns it produces, rule 30 was implemented as a pseudorandom number generator in Mathematica.
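The relationship between the four rules can be verified mechanically. The sketch below uses two illustrative helpers, `mirror` (exchange the left and right neighbors) and `dual` (negate all inputs and the output); both names and the bit-order convention, in which neighborhood $(a, b, c)$ selects bit $4a + 2b + c$, are our assumptions.

```python
from itertools import product

def mirror(number):
    """Wolfram number of the rule (a, b, c) -> f(c, b, a)."""
    result = 0
    for a, b, c in product((0, 1), repeat=3):
        bit = (number >> (4 * c + 2 * b + a)) & 1
        result |= bit << (4 * a + 2 * b + c)
    return result

def dual(number):
    """Wolfram number of the rule (a, b, c) -> 1 - f(1 - a, 1 - b, 1 - c)."""
    result = 0
    for idx in range(8):
        result |= (1 - ((number >> (7 - idx)) & 1)) << idx
    return result

print(mirror(30))        # 86
print(mirror(dual(45)))  # 89
```

Mirroring twice returns the original rule, so the correspondence between rules 30 and 86, and between the dual of rule 45 and rule 89, works in both directions.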
It is interesting that the most chaotic ECA seem to be exactly those unable to emulate any ECA non-trivially. To further explore this, we asked whether the four chaotic CA can emulate any 1D nearest-neighbors CA non-trivially, not just the ECA.
Let $(S, f)$ be a CA. Since $S$ is finite, there must exist some $s \in S$ and $k \in \mathbb{N}$ such that $f^k$ maps the constant sequence $(s, s, \ldots, s)$ of length $3k$ to the constant sequence $(s, s, \ldots, s)$ of length $k$. Hence, for any CA with one state $(\{q\}, g)$, the encoding $enc : q \mapsto (\underbrace{s, s, \ldots, s}_{k\text{-times}})$ witnesses that $(S, f)$ can emulate $(\{q\}, g)$. Therefore, any CA can emulate every CA with one state. The question is whether the four CA can emulate any CA with more than one state non-trivially.
We found that, for supercell sizes 2 to 11, the CA $(2^k, g^k)$ arising from each of the four rules contain only one-element subalgebras; hence, these rules cannot emulate any CA with more than one state for these supercell sizes.
It remains an open problem whether this would hold for supercells of arbitrary size. However, we can conclude that if any of the four chaotic ECA can emulate any non-trivial automaton, it cannot do so effectively, i.e., with a supercell of small size.
We note that analogous experiments were conducted for the coarse-graining relation. Dzwinel and Magiera (2015) showed that the rules 30 and 45, together with their negations, seem to be exactly those that cannot be coarse-grained non-trivially. These results motivate us to study chaos from a computational perspective in the next section.

Chaos In Cellular Automata
Intuitively, a chaotic CA has unpredictable dynamics and seemingly random space-time diagrams with no apparent higher-order structures forming. Below, we discuss examples of formal definitions of chaos suggested in the past and propose a new one based on the CA emulation notion.
For discrete systems, chaos is classically studied through topological dynamics, and it is typically connected to the system's sensitivity to initial conditions (Devaney (1989), Cattaneo et al. (1999)). Other studies of chaos in CA examine e.g., their input entropy (Wuensche (1999)) or properties of the local rule itself (Wuensche (2009)); a comprehensive study of ECA, including their chaotic behavior, was introduced by Chua et al. (2007).
From a great review by Martinez (2013) one can see that some definitions of chaos admit either the shift CA (Devaney (1989)) or linear CA (Cattaneo et al. (1999)) to be chaotic. In the first case, shifting a configuration by one bit does not intuitively feel very unpredictable. For linear automata, it is known that they can be simulated on a computer significantly faster than general CA (Robinson (1987)). Hence, their dynamics can be predicted quite efficiently. Below, we propose a new, much stricter definition of chaotic behavior.
Definition 8. We call a 1D nearest-neighbors CA computationally chaotic if it cannot emulate any 1D nearest-neighbors CA with more than one state non-trivially.
Proving that a particular CA is computationally chaotic would, in principle, require verifying infinitely many conditions and might require deep theoretical insight into the dynamics of the CA. However, the notion is useful theoretically, as it formalizes the unpredictability of a CA. Indeed, if $ca_1 \leq_k ca_2$ then, for a subset of initial conditions, running $ca_2$ for $k$ time-steps is equivalent to running $ca_1$ for one time-step. Hence, at least some part of the dynamics of $ca_2$ can be predicted more effectively. In contrast, a computationally chaotic CA cannot contain any simpler CA as a substructure in its space-time dynamics. This suggests that no part of its dynamics can be predicted efficiently. Hence, the definition seems to agree with the intuitive understanding of unpredictability.
In CA with trivial dynamics, structures tend to die out quite quickly. This enables such rules to emulate either the rule 0 non-trivially (this can be seen from Figure 4) or to be self-similar (e.g., rules 0, 240, or the shift rule). Thus, they are not computationally chaotic. As linear CA are self-similar, it follows that they are not computationally chaotic either.
From the results we presented, it follows that the only ECA that could be computationally chaotic are rules 30 and 45, together with their symmetrical rules. Hence, we can form the following hypothesis.
Hypothesis 9. ECA rules 30 and 45 are computationally chaotic.
Proving this result would imply that such rules cannot compute any task in the sense of CA emulation.
We note that the definition can be extended to CA in arbitrary dimensions with various neighborhood shapes in a straightforward way. We also note that the definition does not depend on whether the CA operates on a finite or infinite grid and is purely a property of its local rule.
Most definitions of chaos in discrete systems are not related to their computational capacity. Hence, there is an interesting ongoing debate whether chaotic CA can compute any non-trivial tasks. In contrast, the notion of computational chaos is strongly connected to the computational capacity of the CA. If a CA is computationally chaotic, it cannot compute the dynamics of any other CA. Hence, we might conclude that being computationally chaotic directly implies the inability to compute a non-trivial task.

Conclusion
We studied the notion of CA emulation as a method of determining the computational capacity of CA. We showed that the CA emulation relation forms a preorder on the ECA class and presented an approximation of the emulation hierarchy produced by the preorder. We noticed that the most chaotic CA seem to be the minimal elements of the hierarchy. This inspired us to introduce a new definition of chaos in the CA environment. Our notion of chaos is novel because it is connected to the computational capacity of a system. In contrast to previous concepts of chaos, our definition does not regard linear and shifting automata as chaotic. This agrees with the results showing that the dynamics of such CA can be predicted more efficiently than for general CA.
The emulation relation can be defined for CA in any dimension and with an arbitrary neighborhood. Though its computation requires verifying infinitely many conditions, we can compute it just for supercells of small size. Verifying such small supercell values helps us determine whether a CA can emulate any others effectively.
A pair of ECA obtained by changing the role of their "left and right neighbor" has equivalent dynamics; however, the emulation relation we presented is unable to discover such an equivalence. For this reason, it is meaningful to study other possible definitions of a CA subautomaton. It would be very interesting to examine whether the different variants of the definition would change the computational hierarchy results significantly.
As a possible application, we can use the method of CA emulation to construct Turing-complete CA without having to give elaborate proofs of this fact. We would simply embed a well-known Turing-complete CA into a newly constructed CA with possibly much richer dynamics. It could be interesting to design a CA capable of emulating many different CA and study the dynamics of such a rich system.