In the past decades many definitions of complexity have been proposed. Most of these definitions are based either on Shannon's information theory or on Kolmogorov complexity; these two are often compared, but very few studies integrate the two ideas. In this article we introduce a new measure of complexity that builds on both of these theories. As a demonstration of the concept, the technique is applied to elementary cellular automata and simulations of the self-organization of porphyrin molecules.
The notion of complexity has been used in various disciplines. However, the very definition of complexity is still without consensus. One of the reasons for this is that there are many intuitive interpretations of the concept of complexity that can be applied in different domains of science. For instance, the Lorenz attractor can be simply described by three coupled differential equations, but it is often seen as producing complex chaotic dynamics. The iconic fractal image generated by the Mandelbrot set can be seen as a simple mathematical algorithm having a complex phase diagram. Yet another characteristic of complex systems is that they consist of many interacting components such as genetic regulatory networks, electronic components, and ecological food chains. Complexity is also associated with processes that generate complex outcomes, evolution being the best example. These wildly different perceptions and contextual dependences make it difficult to come up with a unifying definition of complexity that captures all these different aspects.
Of the many definitions of complexity [1, 15] that have been proposed over the years, most are, to a large extent, derived from one or the other of two powerful mathematical theories: Shannon information theory or Kolmogorov complexity [10, 2].
The complexity of a system is often expressed in terms of “the amount of information produced by the system.” If we measure the outputs of a system and encode them into a series of strings, then the complexity of the system can be assessed by the information contained within any such string; Kolmogorov complexity measures the amount of information required to generate this string. Another way of evaluating the complexity of a system is by the diversity of its outputs, that is, the probability distributions of the different symbols in the string; this approach is related to Shannon's information theory. The two approaches are often compared. It has been shown that for a recursive probability distribution, the average value of Kolmogorov complexity of the distribution is similar to the value of the corresponding Shannon entropy. However, Teixeira et al. proved that the relationship does not hold for the two generalizations of Shannon entropy, Rényi entropy and Tsallis entropy, except for the special case of α = 1.
Several studies combine both Kolmogorov complexity and Shannon information theory to derive a more general definition of complexity. Let us consider a system that can be in an ensemble E of states, with each state e having a corresponding probability P_e. Zurek [26, 27] proposed the concept of physical entropy. It has two components: (i) K(d), the algorithmic randomness (Kolmogorov complexity) of the known data d about the system, and (ii) the missing information about the system, measured by Shannon entropy H(d) = −∑_e P_{e|d} log_2 P_{e|d}, where P_{e|d} is the conditional probability of state e given data d. The physical entropy S_d of the system is the sum of the two terms, S_d = H(d) + K(d).
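The two terms of the physical entropy can be illustrated with a minimal sketch. The function names and the example numbers below are hypothetical, and K(d) is simply passed in as a number of bits, because it has to be estimated separately.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def physical_entropy(cond_probs, k_data_bits):
    """S_d = H(d) + K(d), with K(d) supplied by the caller in bits."""
    return shannon_entropy(cond_probs) + k_data_bits


# Hypothetical conditional distributions P_{e|d} over four states.
uniform4 = [0.25, 0.25, 0.25, 0.25]   # the data d leave the state fully open
certain = [1.0, 0.0, 0.0, 0.0]        # the data d pin the state down

print(shannon_entropy(uniform4))                   # 2.0 bits of missing information
print(physical_entropy(certain, k_data_bits=16))   # 16.0: only K(d) contributes
```

When the data determine the state, the missing-information term vanishes and the physical entropy reduces to the algorithmic term alone.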
Gell-Mann and Lloyd introduced effective complexity. Similarly to physical entropy, they define the total information Σ of the system to be the sum of two terms, Σ = Y + I. The effective complexity Y is the algorithmic information content (AIC, or Kolmogorov complexity) of the regularities among all the possible states of the system, Y = K(E). The term I represents the random aspect of the states, I ≈ ∑_e P_e K(e|E), where K(e|E) is the contingent AIC of each member e given E. Then Y can be considered as the most concise theory that describes the system, whereas I captures the accuracy of the theory. For instance, a simple system can be described by a simple theory (which has a low value of K) with high certainty (low I), and therefore has a low overall total information; on the other hand, a complex system would require either a complex theory (high K), or one with a low predictability (high value of I), or both.
Different from, and complementary to, the above approaches, our definition of complexity is based on the relative difference between the Shannon information in the ensemble of possible input (initial) states of a system and the entropy of the ensemble of the resulting output states for such a system. A system is considered to be less complex if it preserves less information while mapping input ensembles to output ensembles. A more complex system would generate more information in the resulting output ensembles. The output states are categorized by their estimated Kolmogorov complexity (the method is described in the following sections).
In this article we refine our recently proposed definition of complexity, which is based on a few intuitive but less studied concepts of complexity. The new definition also takes into account studies that used Kolmogorov complexity [25, 21, 12] to characterize and classify images, which essentially are 2D strings.
2 Background Theory
2.1 Kolmogorov Complexity
One should note that Kolmogorov complexity is not computable: no algorithm can calculate the shortest description of an arbitrary string. Lossless compression algorithms can be used instead to calculate an upper bound on the Kolmogorov complexity. In this study we take such an approximation as the estimate of K. Another concern regarding Kolmogorov complexity is that it assigns long programs to both random strings and genuinely complex ones.
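As a rough illustration of the compression-based upper bound, the sketch below uses Python's standard zlib module as a stand-in compressor. This is an assumption made for the example only; the function name estimate_k and the test strings are hypothetical.

```python
import os
import zlib


def estimate_k(data: bytes) -> int:
    """Upper-bound estimate of Kolmogorov complexity, in bytes.

    Uses the size of the zlib (DEFLATE) output at the highest compression level.
    """
    return len(zlib.compress(data, 9))


structured = b"01" * 5000          # highly regular 10,000-byte string
random_like = os.urandom(10000)    # 10,000 bytes from the OS random source

k_structured = estimate_k(structured)
k_random = estimate_k(random_like)
print(k_structured, k_random)
```

The regular string compresses to a small fraction of its length, while the random bytes barely compress at all, which is the behaviour behind the second concern raised above: a high estimate of K alone does not separate random outputs from genuinely complex ones.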
2.2 Shannon Information Theory
3 Controllability Complexity
Korb and Dorin argued that when analyzing the emergence of complex events one should consider not just the complexity of the event itself, but also the complexity of the system that generated it. They suggested that a minimum-message-length theory (MML) would provide the most suitable basis for investigating the emergence of complexity, especially in the biological context. MML divides messages into two components, of which one describes the hypothesis under consideration and the other describes the sample statistics available. Inspired by their insight, we try to relate the probabilities of the input states of the system (U) to those of the corresponding output states (E). We define an index of emergent, or excess, complexity (EC) of output states in relation to their corresponding input states, that is, the amount of additional complexity a system adds to the initial states measured. This index relates the probability of observing particular outputs, p(E|U), with the probability of finding the system in some initial state, p(U).
Our definition is based on the following six boundary conditions that a useful complexity measure should satisfy:
1. p(E|U) and p(U) are normalized and proper probabilities such that 0 ≤ p(E|U), p(U) ≤ 1.
2. The parameter EC is an unbounded positive number.
3. lim_{p(E|U)→1, p(U)→1} EC(p(E|U), p(U)) = 0, that is, probable initial conditions that lead to probable events receive the lowest rank: no surprises can be expected from this universe under the given (highly probable) initial conditions. This is marked as scenario A in Figure 1.
4. lim_{p(E|U)→1, p(U)→0} EC(p(E|U), p(U)) = K with K > 0, that is, improbable initial conditions that lead to probable events are ranked slightly higher than 0 (the previous case). Intuitively this is a “needle in a haystack” universe with highly predictable dynamics (scenario B).
5. lim_{p(E|U)→0, p(U)→0} EC(p(E|U), p(U)) = L with L > K, that is, an improbable initial state that leads to improbable events ranks even higher than scenario B, as it clearly represents an unexpected observation emerging from an unexpected initial condition (“Garden of Eden,” scenario C).
6. lim_{p(E|U)→0, p(U)→1} EC(p(E|U), p(U)) = M with M > L: a probable set of initial conditions throws out surprising outputs, a simplicity-begets-complexity situation, thus ranking at the top of the scale (“elegant Garden of Eden,” scenario D).
Figure 1 shows the four EC values of the four extreme scenarios. As one moves from A to B to C and finally D, EC increases.
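The ordering of the four extreme scenarios can be checked numerically for any candidate function. The sketch below is only an illustration: toy_ec is a hypothetical bilinear stand-in chosen so that the corner values come out as 0, 1, 2, and 3. It is not the EC defined in this article, and, being bounded, it does not satisfy the unboundedness condition.

```python
def toy_ec(p_e_given_u: float, p_u: float) -> float:
    """Hypothetical bilinear stand-in, NOT the measure defined in the article."""
    return 1.0 * (1.0 - p_u) + (1.0 - p_e_given_u) * (1.0 + 2.0 * p_u)


def corner_values(ec, eps: float = 1e-9):
    """Evaluate a candidate EC function close to the four extreme scenarios."""
    lo, hi = eps, 1.0 - eps
    return {
        "A": ec(hi, hi),  # probable input, probable output
        "B": ec(hi, lo),  # improbable input, probable output
        "C": ec(lo, lo),  # improbable input, improbable output
        "D": ec(lo, hi),  # probable input, improbable output
    }


def satisfies_corner_ordering(ec) -> bool:
    """True if the candidate is ~0 at A and strictly increases A < B < C < D."""
    v = corner_values(ec)
    return abs(v["A"]) < 1e-6 and v["A"] < v["B"] < v["C"] < v["D"]


print(corner_values(toy_ec))
print(satisfies_corner_ordering(toy_ec))
```

A candidate that ignores p(U), for example 1 − p(E|U), fails the check because it cannot separate scenario A from scenario B.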
Whereas probabilistic information is fundamentally related to Shannon's information theory, we are also interested in how much Kolmogorov complexity is contained in the outputs. We define the output states E not as the precise final states of the system, but as the amount of Kolmogorov complexity they contain. Therefore, p(E|U) should be understood as the probability of a system generating outcomes with a specific value (or within a certain range) of Kolmogorov complexity, given the initial state U.
The procedure for applying the measure is as follows:
The input states U for the complex system under study (cellular automata or porphyrin self-assembly) are defined as the different possible input parameters (initial conditions) of the system. This is straightforward for systems with discrete input parameters; for systems that take continuous parameters, the input parameter space is discretized and each resultant unit is considered as one input state Ui.
Once the input states are identified, their corresponding probabilities p(Ui) can then be calculated.
Next we want to specify the output states E. For each input state U, a number of results Rij (for 1 ≤ j ≤ N, where N is the number of results generated) are, in general, produced by a combination of system stochasticity and nonlinear behavior. As mentioned in the previous section, the Kolmogorov complexity of a string can be estimated by its compressed size. We follow the approach we presented in our previous works [25, 20]. Each generated Rij is converted into an image. These images are then compressed into PNG format and further optimized using pngcrush, which has been shown to achieve good compression ratios. The Kolmogorov approximations K of the results are taken to be the final sizes of the images after compression. Thus, for each input state we have N K-values from the corresponding Rij.
The range of measured K is divided into a series of discrete intervals, whose size is given by ΔK. The output state E is defined as a resultant image having a specific Kolmogorov complexity K, i.e., between K_E and K_E + ΔK. Therefore, p(E|U) is the probability of the system generating an image that has a K value within a specific range given initial state U. The size of the intervals, ΔK, indicates the sensitivity in measuring the K value of the different outputs. As it is intuitive for us to consider a system that gives rich outputs as “complex,” the calculated value of OEC is affected by the size of the intervals we can reliably measure.
After we have defined E, we can then calculate p(E|U) for all combinations of E and U, using the distribution of K of the images generated by each input state U.
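The binning step above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name conditional_distribution, the bin origin k_min, and the compressed sizes in k_by_input are all hypothetical.

```python
from collections import Counter


def conditional_distribution(k_values, k_min, delta_k):
    """Return {bin_index: probability} for one input state U.

    Bin i covers K in [k_min + i*delta_k, k_min + (i+1)*delta_k).
    """
    counts = Counter(int((k - k_min) // delta_k) for k in k_values)
    total = len(k_values)
    return {b: c / total for b, c in sorted(counts.items())}


# Hypothetical compressed image sizes (bytes) for two input states.
k_by_input = {
    "U1": [510, 515, 520, 905],
    "U2": [700, 710, 1190, 1195],
}
k_min = min(min(v) for v in k_by_input.values())
p_e_given_u = {
    u: conditional_distribution(ks, k_min, delta_k=100)
    for u, ks in k_by_input.items()
}
print(p_e_given_u)
```

Each bin index plays the role of one output state E, so a smaller delta_k yields more output states and a finer-grained p(E|U).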
4 Analysis of Cellular Automata
4.1 Experiment Setting and Procedures
Having discussed the concept and implementation method of our complexity measurement, we proceed to analyze the complexity of the different rules of elementary cellular automata. Numerous studies have focused on the classification of cellular automata. One of the most notable investigations was reported by Wolfram in A New Kind of Science (NKS). He classified the rules of cellular automata into four classes, namely:
Class 1: The differences between the different initial states quickly diminish and result in homogeneous states.
Class 2: The output states evolve into periodic states.
Class 3: The outputs have random patterns.
Class 4: The outputs exhibit both ordered and random patterns.
The procedure is the following:
For each selected rule, 10% of the possible initial states are randomly chosen and simulated, which generates over 20,000 images. The images are then converted into PNG format and further optimized.
The input states U are denoted by the number of 1's in the initial condition; for example, for cellular automata of width 6, initial conditions 000001 and 100000 both belong to U1, and 000011 and 110000 belong to U2. For a system with fixed width m the probability of each Un can be easily calculated using the following equation:
The probability distributions of K for each of the input states U, which are given by p(E|U), are built using the corresponding images. Figure 2 shows the probability distribution of p(E); each bar in the histogram is considered to be an output state E.
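The simulation and input-state bookkeeping can be sketched in a few lines. Two assumptions are made here that the text does not state: the lattice has periodic boundaries, and all 2^m initial rows of width m are equally likely, in which case p(U_n) is the binomial coefficient C(m, n) divided by 2^m. All function names are hypothetical.

```python
from math import comb


def step(cells, rule):
    """Apply one update of an elementary CA with periodic boundaries (assumed)."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def run(initial, rule, steps):
    """Return the space-time diagram as a list of rows (row 0 is the input)."""
    rows = [list(initial)]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


def input_state(initial):
    """Index n of the input state U_n: the number of 1s in the initial row."""
    return sum(initial)


def p_input_state(n, width):
    """p(U_n) if all 2**width initial rows are equally likely (assumption)."""
    return comb(width, n) / 2 ** width


print(run([0, 0, 1, 0, 0], rule=90, steps=1))
print(input_state([0, 0, 0, 0, 0, 1]), p_input_state(1, 6))
```

Each space-time diagram returned by run would then be rendered as an image and compressed to obtain its K estimate.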
The 32 rules are ranked according to their calculated OEC values and are presented in Table 1. First of all we should note that, as stated above, the width of the intervals (ΔK) affects the values of the OEC; in order to show our classification to be consistent, we have calculated the OEC values using different sizes of ΔK. The results show that there is little effect on the comparison between the rules, that is, if, by using a particular width, rule A is considered to be more complex than rule B, then it is highly likely that we will see the same relationship when a different width is used. Thus, while the absolute OEC values of the rules depend on the bin width, the relative ranking of the rules is largely unaffected.
| Rank | Rule number | OEC | Rank | Rule number | OEC |
We plot the value of the OEC against the corresponding rank among the examined rules (see Figure 3). From the figure we can roughly identify four regions according to the OEC values. However, it is not straightforward to justify the calculated values. We start with two of the simplest rules, rules 0 and 204. Rule 0 simply turns any state into 0; rule 204, on the other hand, maintains the previous row. Three example images produced using the two rules are shown in Figure 4. In the context of complexity theories, these images have very low algorithmic complexity (they can be described using short strings); therefore they occupy a very small region in the K space. This is indeed what we see from the probability distributions in Figure 5. As the images are concentrated in a small region in the K space, in comparison with other rules there are more pairs of (U, E) belonging to scenario A, which results in low OEC values. Similar K probability distributions are observed for the rules that have low OEC values.
Next we look at the higher end of the OEC spectrum. Here we classify rules with OEC higher than 2400 to be in class 4. Wolfram considers class 4 cellular automata to produce both random and structured patterns. Examples of the images from class 4 are displayed in Figure 6. Figure 7 shows the K probability distributions for rules 18, 110, and 146. All three of the distributions spread over a large range of K, from structured to random, which fits Wolfram's intuition. The differences between the rules that have high OEC values (class 4) and low ones (class 1) are apparent—the profiles of the simple rules occupy only a small region in the K space, whereas the ones with high values occupy a much larger region. This is a promising result, in that if one took an intuitive approach and evaluated the complexity of the rules using the average of the Kolmogorov complexity of the generated images, then one would not be able to correctly identify the class 4 rules as complex.
Now, let us consider rules that produce regular simple structures and were identified as class 2 by our measure. Among the 32 rules, rules 50 and 178 are examples of class 2: both of them have midrange OEC values. Three images for the two rules are presented in Figure 8. In addition to the two sets of images being very similar, it is not difficult to see that there are obvious simple patterns in them. Therefore, the images are considered to have low Kolmogorov complexity; in other words, they are simple structured objects. Their K distributions, as expected, are shifted to the lower end (Figure 9).
On closer inspection we can see differences between our OEC evaluation and Wolfram's classification. For instance, rule 150, which in general is considered to be a class 3 automaton, has a calculated OEC value lower than some of the class 2 rules such as rule 94. Figure 10a shows that the images generated in fact have complex patterns and are measured to have high Kolmogorov complexity. Again, we refer to the corresponding K probability profile (Figure 11) to see if the calculated OEC can be explained. The profiles agree with what we would have expected from the two rules. The probability profile of rule 150 shows that the majority of its images have high values of K. The reason that a low OEC value was assigned to this rule is that its distribution does not cover a wide range of K-values and hence the space of possible outputs is less surprising.
4.3 Modified OEC
A possible weakness of our measure is that the calculation does not properly weight the actual complexity of the output state E (given by the corresponding Kolmogorov complexity). In other words, two rules may end up having the same OEC as long as they have similar probability profiles p(E|U), regardless of the information content in the images that they generate.
The weighted OEC values for the rules are listed in Table 2. The modified function addresses some of the differences between the two classifications. As complex images are weighted higher than simple ones, the calculated OEC values for class 2 and class 3 rules (such as rules 150, 30, and 90) are now in appropriate positions in the ranking, while the rankings for the rules that were identified as class 1 and class 4 are not affected. We would also like to point out that rule 110, the only rule proven to be Turing complete, was calculated to have one of the highest OEC values among the tested rules by our proposed complexity measure.
| Rank | Rule number | Weighted OEC | Rank | Rule number | Weighted OEC |
5 Porphyrin-Tile Kinetic Monte Carlo System
In this section we apply our complexity measure to a novel porphyrin-mediated molecular self-assembly simulation model. We first describe the components of a multi-agent system for the simulation of porphyrin molecule self-assembly. Porphyrins are planar molecules with fourfold symmetry and a chemical structure comprising four structural units that can be synthesized with substituent functional groups. Intermolecular binding, such as hydrogen bonding and van der Waals forces among such substituents, allows diverse self-assembly complexity together with a high degree of reversibility and highly dynamic pattern formation. We choose the Wang tile model as an idealized model of porphyrins, since Wang tiles are square, with labeled edges, and undergo tile-to-tile interactions. Thus, there is not only a morphological correspondence to functionalized porphyrin molecules, but also a functional mapping to the intermolecular interactions between them. We refer to such models as porphyrin-tile models. A tile can be classified as isofunctionalized when its four sides are set with the same functional group, and as heterofunctionalized when its four sides are set with different functional groups. Figure 12 depicts the model correspondence and structural parts of a heterofunctionalized porphyrin molecule.
The substrate where molecules are deposited and interact with one another, forming aggregates, is modeled as a two-dimensional square-site lattice. Such a lattice is subjected to periodic boundary conditions where each position is occupied by only one porphyrin tile at a time. The neighborhood of a porphyrin tile is of von Neumann type, and energy interactions among neighboring molecules are at the core of the system dynamics for capturing deposition, motion, and rotation of a molecule on the substrate. In particular, deposition models the arrival of a molecule at an empty position of the substrate, that is, the placement of a porphyrin tile in an unoccupied position (i, j) of the lattice. Motion models the translation of a molecule to one of its four neighboring empty positions of the substrate, that is, the movement of a porphyrin tile located at position (i, j) to one of its four unoccupied nearest neighboring positions (i + 1, j), (i, j + 1), (i − 1, j), or (i, j − 1). For this, we consider three cases: the diffusion of a molecule across the lattice without interacting with neighboring molecules as shown in Figure 13b, diffusion along an aggregate as shown in Figure 13c, and departure of a molecule from an aggregate as shown in Figure 13d. Rotation models spinning of a molecule about its center of mass; it consists of a 90-degree gyration of a porphyrin tile about its geometrical midpoint.
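The geometric part of these three moves can be sketched without any energetics. The example below is an illustration only: it omits the energy terms and rates that drive the kinetic Monte Carlo dynamics, it stores the lattice as a dictionary from positions to tiles, and it represents a tile as a tuple of four edge labels. All of these choices, and all function names, are assumptions made for the sketch.

```python
SIZE = 64  # lattice side length, as in the simulations described in the text


def von_neumann_neighbours(i, j, size=SIZE):
    """Four nearest neighbours of (i, j) on a periodic square lattice."""
    return [
        ((i + 1) % size, j),
        (i, (j + 1) % size),
        ((i - 1) % size, j),
        (i, (j - 1) % size),
    ]


def rotate(tile):
    """Rotate a tile by 90 degrees; `tile` lists its four edge labels in order."""
    return tile[-1:] + tile[:-1]


def deposit(lattice, i, j, tile):
    """Place `tile` at (i, j) if that position is empty; report success."""
    if (i, j) in lattice:
        return False
    lattice[(i, j)] = tile
    return True


def move(lattice, src, dst):
    """Move the tile at `src` to an empty nearest-neighbour position `dst`."""
    if src not in lattice or dst in lattice:
        return False
    if dst not in von_neumann_neighbours(*src):
        return False
    lattice[dst] = lattice.pop(src)
    return True


lattice = {}
deposit(lattice, 0, 0, ("a", "b", "c", "d"))   # hypothetical edge labels
move(lattice, (0, 0), (63, 0))                 # wraps around the periodic edge
print(lattice)
```

In a full simulation, whether a proposed move or rotation is accepted would depend on the interaction energies with the neighbouring tiles, which this sketch leaves out.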
We use this complex system model to run simulations that can be analyzed with our proposed complexity measure. Across this set of simulations, the porphyrin tile kMC system was configured with a lattice of 64 × 64 positions, Er = 1.3 eV, TT0 = 28 × 10^−3, RDep = 5 × 10^−5, a maximum coverage of 50%, a given binding energy between molecule and substrate (Es), and six species comprising two isofunctionalized and four heterofunctionalized porphyrin tiles as shown in Figure 14.
The simulations of this complex system involving these six porphyrin-tile species consider functional groups for which the intermolecular bindings are always positive. Therefore, all the possible combinations of E11, E22, E12, and Es were explored systematically, the first three taking values 0.1, 0.2, … , 0.5 eV, and the last taking values 0.5, 0.6, 0.7 eV.
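The size of this parameter sweep can be made explicit with a short enumeration. The sketch assumes that the elided values in the series 0.1, 0.2, … , 0.5 eV are spaced by 0.1 eV; the dictionary keys are simply the symbols used in the text.

```python
from itertools import product

# Assumption: the binding energies step by 0.1 eV between 0.1 and 0.5 eV.
binding_energies = [0.1, 0.2, 0.3, 0.4, 0.5]   # eV, for E11, E22 and E12
substrate_energies = [0.5, 0.6, 0.7]           # eV, for Es

grid = [
    {"E11": e11, "E22": e22, "E12": e12, "Es": es}
    for e11, e22, e12, es in product(
        binding_energies, binding_energies, binding_energies, substrate_energies
    )
]
print(len(grid))  # 5 * 5 * 5 * 3 = 375 parameter combinations
```

Under that spacing assumption the sweep covers 375 distinct parameter combinations.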
5.1 Complexity Analysis
Similarly to our analysis on cellular automata, the porphyrin simulation results are converted into PNG images and optimally compressed using pngcrush as before. The Kolmogorov complexities of the images are measured. Our initial set of simulations has explored a large area in the parameter space. Analyzing the images using the method described in the previous section gives us a rough estimation of the diversity of possible outputs of the system.
Let us assume the system has equal probability of taking each of 10 different sets of input parameters. For each of these sets of input parameters, we perform 100 independent runs and generate their associated images; sample images are shown in Figure 15. The values of these 10 sets of parameters (ordered by increasing average K) are:
U1 = (0.5, 0.3, 0.3, 0.2),
U2 = (0.5, 0.3, 0.5, 0.2),
U3 = (0.5, 0.5, 0.5, 0.2),
U4 = (0.5, 0.1, 0.3, 0.2),
U5 = (0.5, 0.1, 0.4, 0.2),
U6 = (0.5, 0.1, 0.5, 0.5),
U7 = (0.7, 0.2, 0.3, 0.3),
U8 = (0.5, 0.1, 0.2, 0.1),
U9 = (0.5, 0.1, 0.5, 0.1),
U10 = (0.7, 0.5, 0.5, 0.4).
Using the distributions, we can calculate the p(E|U) for each pair of E and U. Since we know the probabilities of the input states, we can calculate how much each U contributes to the overall OEC value for the system (using Equation 4). This is shown in Table 3.
| A | B | C |
| Input state Ui | | Average K |
By carefully analyzing (i) the ranking obtained in Table 3, (ii) the probability distribution in Figure 16, and (iii) the actual numerical values used to calculate column B in Table 3, one can see that the OEC (Equation 4) correctly distinguishes four cases, namely:
Input-to-output mappings with relatively low complexity (K) and a low range of K-values (most morphologies are very similar to each other), sets U1, U2 and U3.
Input-to-output mappings with relatively low complexity but with varied (hence less predictable) morphologies, sets U7 and, notably, U5. The latter has lower average K than U7, but it has a wider distribution, that is, it produces a larger variety of outputs.
Input-to-output mappings with relatively high algorithmic complexity but with a narrow variety of morphologies (and hence more predictable), sets U4 and, notably, U10. In the latter case, although U10 has the highest average algorithmic complexity, it produces the smallest range of possible values, thus making it highly predictable.
Input-to-output mappings with relatively high algorithmic complexity and a wide range of morphologies (thus less predictable), sets U8 and U9.
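The two axes behind these four cases, the average K and the spread of K, can be summarized per input state with a short sketch. The median split used below is a hypothetical simplification introduced for illustration; it is not how the OEC itself separates the cases, and the sample K-values are invented.

```python
from statistics import mean, median


def summarize(k_by_input):
    """Return {input: (average K, range of K)} for each input state."""
    return {u: (mean(ks), max(ks) - min(ks)) for u, ks in k_by_input.items()}


def four_cases(k_by_input):
    """Label each input state by average K and spread, split at the medians."""
    stats = summarize(k_by_input)
    avg_cut = median(a for a, _ in stats.values())
    rng_cut = median(r for _, r in stats.values())
    labels = {}
    for u, (avg, rng) in stats.items():
        level = "high K" if avg > avg_cut else "low K"
        spread = "wide range" if rng > rng_cut else "narrow range"
        labels[u] = f"{level}, {spread}"
    return labels


# Hypothetical K-values (compressed sizes) for four input states.
sample = {
    "Ua": [100, 102, 104],
    "Ub": [100, 150, 200],
    "Uc": [300, 302, 304],
    "Ud": [300, 400, 500],
}
print(four_cases(sample))
```

A summary of this kind only separates inputs by the location and width of their K distributions; the OEC additionally accounts for the probabilities of the input states.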
6 Conclusion and Discussion
One of the key concepts of this work is the merging of Shannon information and Kolmogorov complexity into a new measure of emergent complexity, EC (Equations 1, 2, 3, and 4). We use the concept of Kolmogorov complexity to classify the outputs of a system according to their algorithmic complexity, then use MML-inspired equations to measure the amount of information contained in the K distribution of outputs. The proposed method has the advantage of taking into account both the information in the generated messages and the information given by the diversity of the messages. Apart from the information contained, our complexity measure also considers the correlation between input states and output states.
The method was applied to elementary cellular automata, where good agreement has been found between our formalized complexity classification and Wolfram's intuitive classification. We then introduced and demonstrated how the proposed measurement can be applied to a novel porphyrin self-assembly model, thus demonstrating its general utility in complex systems other than cellular automata.
The approximations we made for Equations 1–4 would benefit from better compression algorithms (more accurate estimates of Kolmogorov complexity, for example approximations using the coding theorem method). The derived equations (1–4) were simply designed to serve our basic intuitions of what a good measurement of complexity should do (i.e., conditions 1–6 in Section 3.1). Further investigations can aim to strike a balance between the different components of information content, that is, between the algorithmic information content in the objects produced and that in the distribution of those objects. The results we obtained with our new measure of overall complexity (Equation 4) are very intriguing indeed; notably, our measure ranks cellular automata rule 110, the only one known to be Turing complete, as one of the most complex among all the tested rules.
We thank the UK EPSRC (grants EP/H010432/1, EP/G042462/1, and EP/J004111/1). We thank Prof. P. Moriarty and Prof. N. Champness for constructive discussions on molecular self-assembly.
School of Computer Science, University of Nottingham, Nottingham NG8 1BB, United Kingdom. E-mail: Leong.Lui@nottingham.ac.uk
Institute for Advanced Manufacturing, Faculty of Engineering, University of Nottingham, Nottingham NG7 2RD, United Kingdom.
Behavioural and Evolutionary Theory Lab, Department of Computer Science, University of Sheffield, Sheffield S10 1TN, United Kingdom.
School of Pharmacy, University of Nottingham, NG7 2RD United Kingdom.
Interdisciplinary Computing and Complex BioSystems (ICOS) Research Group, School of Computing Science, Newcastle University, Newcastle NE1 7RU, United Kingdom. E-mail: Natalio.Krasnogor@Newcastle.ac.uk