Abstract
Camouflage in nature seems to arise from competition between predator and prey. To survive, predators must find prey, while prey must avoid being found. A simulation model of that adversarial relationship is presented here. Camouflage patterns of prey coevolve in competition with visual perception of predators. During their lifetimes, predators learn to better locate the camouflaged prey they encounter. The environment for this 2-D simulation is provided by photographs of natural scenes. The model consists of two evolving populations, one of prey and another of predators. Conflict between these populations produces both effective prey camouflage and predators able to “break” camouflage. The resulting open-source Artificial Life model can help the study of camouflage in nature and the perceptual phenomenon of camouflage more generally.
1 Introduction
This work aims to create a simple, abstract, 2-D simulation model of camouflage evolution in nature. These simulated camouflage patterns (Figure 1) emerge from the interaction, the coevolution, of a population of simulated prey, each with a candidate texture, and a population of simulated predators, each with a learning visual detector. The simulation’s main input is a set of photos of a background environment. Prey evolve to be cryptic (hard to find) against the background (Figure 2). Evolving predators learn to hunt prey by locating their positions within that 2-D environment.
Photographs of natural textures, each overlaid with three camouflaged prey. The prey are randomly placed 2-D disks, each with its own evolved camouflage texture. Background photos of: plum leaf litter, tree and sky, gravel, and oxalis sprouts. Zoom in for detail. Disk diameter is 20% of image width.
Prey camouflage evolving over simulation time to become more effective in a given environment.
Computational models of complex biological systems have several benefits. Constructing them, making them work as observed in nature, helps crystallize our thinking about natural phenomena. Computational models also allow experimentation in silico to help in understanding these complex natural systems.
This work follows the approach of an earlier simulation (Reynolds, 2011) in which a population of camouflaged prey evolved in response to negative selection from a predator seeking conspicuous prey. In that earlier, interactive, gamelike simulation, the predator was a human “player.” That simulation displayed a photographic background image overlaid with 10 camouflaged prey. With five mouse clicks, the human predator selected the most conspicuous prey. Those selected prey, “eaten” by the predator, were removed from the population. Then they were replaced by offspring created with genetic crossover between surviving prey (the parents) followed by mutation. That simulation step was repeated approximately 2,000 times.
Here evolution of prey camouflage closely follows that earlier work, while the human-in-the-loop is replaced with a population of predators, each based on a deep neural network. These convolutional neural networks (CNNs) take an image as input and produce a prediction, an estimate, of where in the image the most conspicuous prey is located. See Figure 2 for an example of how this process unfolds over time.
In abstract Artificial Life models, it is common to focus in detail on one aspect of a natural system. In the current model, that aspect is the coevolutionary dynamics between prey camouflage and predator vision. To make this feasible, other levels of organization are ignored, or assumed, or represented by a simple computational stand-in. So, for example, the entire living organism that embodies these predators and prey is simply assumed to exist, behaving as animals do, and is otherwise ignored. This model has a simple abstract representation of biological morphogenesis as programs (nested expressions) in TexSyn’s domain-specific language for procedural texture synthesis (Reynolds, 2019). Genetic programming (GP) provides a simple model of evolution that acts on this “genetic” representation, creating new offspring textures through crossover and mutation. On the predator side, all of the animal’s existence is ignored, except for the key aspect of hunting behavior: looking at a scene and forming an opinion about where in the scene a prey is likely located. Learning from experience, these predators adapt to the appearance of an environment and the prey found there. Predators compete with each other on the basis of their ability to find prey, and so to eat, and so to survive. These details of simulated evolution, morphogenesis, vision, and genetic representation are all quite unlike the natural world. But they appear to be sufficiently similar in their effect to allow a plausible simulation of the natural system, producing analogous results, and so may provide insights about the natural system.
To help ground this abstract simulation, consider a bird (predator) hunting for tasty but camouflaged beetles (prey), seen against the bark of a tree trunk (background image).
2 Related Work
This work builds on Reynolds (2011) by replacing the human predator there with an evolving population of procedural predators that hunt using a learning vision model. Harrington et al. (2014) also used coevolution between prey and predators to create camouflage. Their predators detected prey with a multiscale convolution filter whose weights were evolved with a genetic algorithm.
There has been closely related work on learning surface textures to camouflage 3-D objects within real 3-D scenes. A technique for cubes is described by Owens et al. (2014). Then, Guo et al. (2022) described an approach for arbitrarily shaped 3-D objects. In both cases, the 3-D scene is described by a set of photos from various viewpoints. The camouflage textures mapped onto these objects must trade off being inconspicuous from all viewpoints in the scene. Recent work in field biology uses synthetic prey—3-D printed then painted—offered to wild predators to measure responses to specific aspects of camouflage (Kelley et al., 2023).
Other computer graphics work related to camouflage includes meticulously detailed reproduction of coloration patterns on real animals (De Gomensoro Malheiros et al., 2020), generation of visual puzzles incorporating camouflaged images (Chu et al., 2010; Zhang et al., 2020), and a generative, real-time art exhibit based on mimicry (Wu & Huang, 2021). CamoEvo (Hancock & Troscianko, 2022) is a toolbox for authoring online camouflage evolution games to understand evolution in real biological species. Such web-based simulations can be powered by very large numbers of human volunteers (“citizen scientists”) engaged in human-based computation.
This adversarial coevolutionary simulation has clear similarities to generative adversarial networks (GANs), originally described by Goodfellow et al. (2014). Observer-driven optimization of camouflage patterns is an ideal application of GANs, as in CamoGAN (Talas et al., 2020). However, the goal of the current work (as suggested in the Future Work section of Reynolds, 2011) is not to learn camouflage but to produce a simulation of a biological evolutionary system, suitable for “what-if” experiments that cannot be performed in the natural world, for example, how does evolved camouflage change as the ratio of predators to prey changes? Some conjectures about natural camouflage might be better tested using A/B comparisons, for example, the relative value of specialist versus generalist camouflage (e.g., Hughes et al., 2019) or of background matching versus disruptive edge coloration (e.g., Price et al., 2019). The simulation described here allows for constructing carefully controlled experiments on these questions, without the difficulty of identifying comparable species in nature. Finally, this model demonstrates that, in the abstract, camouflage can arise in “small” populations over a “short” amount of time.
The procedural texture synthesis used here to generate camouflage patterns has a long history. This work is perhaps most directly inspired by Perlin (1985), in which images are rendered from purely procedural representations of 3-D textures. A recent example of this approach is found in Guerrero et al. (2022). Using texture synthesis under the control of a genetic algorithm goes back to Sims (1991), which in turn was inspired by the interactive biomorph evolution demo by Dawkins (1986). FormSynth (Latham, 1989) inspired Mutator (Todd & Latham, 1994) and other tools. Troscianko et al. (2017) attempted camouflage synthesis, then evaluated effectiveness. Two other recent approaches to generating camouflage patterns are described by Xiuxia et al. (2023) and Zhang et al. (2013). Indirectly related to camouflage, adversarial images (which fool image classifiers into choosing the wrong category) have been created using genetic algorithms (Bradley & Blossom, 2023).
Evolution is represented in this model using GP, a population-based evolutionary optimization algorithm. It was first described by Cramer (1985) and popularized by Koza (1992). GP is a variation of genetic algorithms (GAs) (Holland, 1984). GAs traditionally use a fixed-length bit string as their genetic representation, whereas GP uses an arbitrarily sized, tree-shaped representation. GP trees conveniently map onto nested expressions in a domain-specific language. Texture synthesis in this work is based on nested expressions of texture operators from the TexSyn library (see Figure 4). A prey population of these textures is optimized for camouflage effectiveness by GP using the selection pressure from a population of predators that determine fitness. TexSyn is used with the strongly typed variant of GP known as STGP (Montana, 1995), one of several grammar-based GP variants (McKay et al., 2010). GP has recently been used in the FunSearch work (Romera-Paredes et al., 2023). The GP implementation used here for camouflage evolution is called LazyPredator (Reynolds, 2022b).
The biological literature on camouflage and related topics is vast. A few starting points include an authoritative modern survey (Cuthill, 2019), a comprehensive early survey (Thayer, 1909), an influential book by Cott (1940), pioneering work on mathematical models of biological patterns by Turing (1952), a revisiting of Turing’s reaction-diffusion model with modern computation (Murray, 1988), and a contemporary perspective on how life evolves and learns (Valiant, 2013). Also noteworthy are Endler’s studies of camouflage, both experimental (Endler, 1980) and analytic (Endler, 1978, 2012). Like Endler, Brichard et al. (2023) looked at the interaction of camouflage and sexual signaling. A recent survey of how camouflage increases survival (de Alcantara Viana et al., 2022) has emphasized how it need not be perfect: that a small change—increasing a predator’s search time or reducing its attack rate—provides enough survival advantage to drive camouflage evolution.
2.1 Camouflaged Object Detection
The last several years have seen a surge of computer vision research on camouflaged object detection (COD), which simulates an aspect of predator behavior, specifically the breaking of camouflage. (See the well-curated bibliography in visionxiang, 2022.) COD systems seek to segment camouflaged objects in images: identifying the pixels they cover. A recent example surveyed this topic and has presented a strong solution (Zhang et al., 2022). Other research on COD has included some based on boundaries (Chen et al., 2022; Sun et al., 2022), a mixed-scale approach (Pang et al., 2022), one using transformer architecture (Yin et al., 2022), and attempts to rank camouflaged objects by “conspicuousness” (Lv et al., 2022; Volonakis et al., 2018).
COD attempts a priori camouflage “breaking”—detecting the presence of well-camouflaged objects—without learning either the background or the typical appearance of prey camouflage found in a given environment; that is, COD is a generalist predator, effectively using a form of salience. As summarized by Zhang et al. (2022), COD is based on several labeled data sets—CHAMELEON, CAMO, COD10K, CAMO-FS (Nguyen et al., 2023), and ACOD2K (Song et al., 2023)—carefully annotated by hand at the pixel level. Newer work has looked at using generative techniques to create synthetic training data for COD (Zhang et al., 2023), at basing COD directly on diffusion models (Chen, Gao, et al., 2023; Chen, Sun, et al., 2023), and at approaches using CLIP (Vu et al., 2023). Hu et al. (2024) took a novel approach, using virtual shadows as a form of cosupervision.
In contrast, the goal of the simulation reported in this article is to pit camouflage evolution against vision-based hunting. So, determining the exact (pixel level) shape of the prey is not a requirement. This simulation ignores segmentation, abstracting prey as a disk of constant size and so sufficiently characterized by its center position. A real-world predator does not require an exact segmentation to aim its attack at a prey’s center. This work needs to find the most conspicuous prey, not all prey. This work simulates predators learning to find prey despite evolving camouflage patterns. This model adapts to dynamic camouflage rather than approaching COD as a static task of generalist detection. Significantly, this work requires no hand-labeled training data sets, because it uses a form of self-supervision.
3 Components of the Simulation
This section provides overviews of the various components that interact to form this coevolutionary camouflage model based on the interaction of predator and prey.
3.1 Coevolution, Populations, and Fitness
This camouflage simulation is based on two adversarial populations: one of predators and one of prey. Individual prey compete for survival within their own population, and similarly for predators. Predators must hunt successfully to “eat” and so to survive. Prey survive if they are inconspicuous (cryptic) enough to avoid being found and eaten. A prey eaten by a predator, or a predator that has perished from hunger, is removed from its population. It is replaced by an offspring of parents from the surviving population.
Predators define the fitness of prey: Being easy to spot is bad; blending in is good. Similarly, prey define the fitness of predators: Being fooled by camouflage is bad; spotting cryptic prey is good. From this adversarial interaction, the two populations coevolve. If one side has some sort of flaw, the other side is motivated to exploit it. As a result, both sides tend to improve over simulated time (see Figures 2, 16, and 17).
This notion of dynamic equilibria between coevolving species is often called the Red Queen hypothesis, after Van Valen (1973), who colorfully explained it with a quote from Lewis Carroll’s Through the Looking Glass: “Now here, you see, it takes all the running you can do, to keep in the same place.”
Initial random prey have coloration likely to contrast with the background. Initial predators have a pretrained (“innate”) ability to find conspicuous (salient) objects, which often allows them to hunt these initially uncamouflaged prey. They could be called generalist predators. As coevolution proceeds, prey become better camouflaged against the given background images. In response, predators learn to better hunt these prey on those backgrounds. They become specialist predators.
3.2 Tournaments, Competition, and Relative Fitness
It is common in evolutionary computation to define fitness as a function that maps an individual (a member of the evolving population) into a number; that is, the function somehow evaluates the individual and assigns it a numeric score. Typically, this fitness function is deterministic: Its value depends only on static properties of the given individual. This can be seen as absolute fitness.
In contrast, this simulation uses relative fitness, determined by competition between individuals, in tournaments. A tournament is a contest between multiple individuals (Angeline & Pollack, 1993).
A simple example of relative fitness is a footrace. The winner of the race is the first to cross the finish line. The order of finishing sorts the runners by speed. Races have been run this way since ancient times. Today it is simple to precisely measure each runner’s absolute speed, but that is not required to determine who won the race. Now, consider two people playing chess. We do not know how to measure or predict a player’s skill. But by pitting the players against each other, having them play a game (or a series of games), the results provide a useful measurement of relative fitness. For a related study of fitnessless coevolution, see Jaśkowski et al. (2008).
Throughout this model, tournaments involve three individuals. (Reynolds, 2011, used tournaments of size 10.) One simulation step (see Figure 3) consists of randomly selecting three prey individuals out of their population to compete in a tournament. Like a footrace or a chess game, a tournament serves to sort the individuals according to relative fitness. That relative fitness results from the behavior of adversarial predators. During the same simulation step, three predators are randomly selected from their population. Each predator looks at the same input: an image with three camouflaged prey overlaid on a background image. Predators compete with each other by most accurately targeting prey. Prey compete with each other by hiding from (not being found by) the predators.
Overview of one step of the coevolutionary simulation of camouflage. Three prey are selected at random from their population of 400, and similarly for three predators from their population of 40. A random background image is selected from the given set, and a random 512×512 pixel crop is made. The three prey are rendered over the background at random, nonoverlapping locations. This composite tournament image is given to each predator, which estimates a position (circled crosshairs in tournament image; see also Figure 5) predicting the center point of the most conspicuous prey. The predators are scored by “aim error”—the distance from their estimate to the ground truth center of the nearest prey. If the best predator’s estimate is inside a prey’s disk, that prey is eaten and replaced by a new offspring of the other two prey. If all predators fail, all prey survive. If the worst-scoring predator’s estimate is outside all prey, it may die of starvation, to be replaced by a new offspring predator.
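To make the control flow of a single step concrete, the following is a minimal Python sketch of the logic just described. It is not the actual TexSyn/PredatorEye code (which is split across C++ and Python); make_tournament_image, make_offspring, and the predator methods predict, record_hunt, starved, and spawn_offspring are assumed interfaces standing in for the real components.

import math
import random

def simulation_step(prey_pop, predator_pop, make_tournament_image, prey_radius=50):
    # Select three prey and three predators at random from their populations.
    prey = random.sample(prey_pop, 3)
    predators = random.sample(predator_pop, 3)
    # Assumed helper: render the three prey over a random background crop,
    # returning the composite image and the ground truth prey centers.
    image, centers = make_tournament_image(prey)
    guesses = {p: p.predict(image) for p in predators}
    # "Aim error": distance from a predator's estimate to the nearest prey center.
    def aim_error(p):
        return min(math.dist(guesses[p], c) for c in centers)
    ranked = sorted(predators, key=aim_error)
    best, worst = ranked[0], ranked[-1]
    for p in predators:
        p.record_hunt(aim_error(p) < prey_radius)  # update recent hunting success
    # If the best estimate is inside a prey disk, that prey is eaten and replaced
    # by an offspring of the other two prey; otherwise all prey survive.
    if aim_error(best) < prey_radius:
        eaten = min(range(3), key=lambda i: math.dist(guesses[best], centers[i]))
        parents = [q for i, q in enumerate(prey) if i != eaten]
        prey_pop[prey_pop.index(prey[eaten])] = make_offspring(parents[0], parents[1])
    # If the worst estimate missed all prey, that predator may die of starvation
    # and be replaced by an offspring (mutated copy) of a surviving predator.
    if aim_error(worst) >= prey_radius and worst.starved():
        predator_pop[predator_pop.index(worst)] = best.spawn_offspring()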
3.3 Negative Selection, Drift, and Mixability
A modern synthesis (Livnat & Papadimitriou, 2016) of evolution theory, game theory, and machine learning (the multiplicative weights update algorithm: “no-regret learning”) suggests that evolution may act to optimize mixability, a type of modularity in the genetic representation of phenotypes. Chastain et al. (2013) showed that it focuses on the “special case of weak selection in which all fitness values are assumed to be close to one another … hypothesizing that evolution proceeds for the most part not by substantial increases in fitness but by essentially random drift” (p. 5). This concept goes back to the “neutral theory” of Kimura (1968).
LazyPredator, the evolutionary model used here, operates in this “neutral” mode (as did, among others, Reynolds, 2011; Harrington et al., 2014). Genetic algorithms are often designed to promote the population’s “best” individuals. They allow these high-fitness individuals to survive longer and reproduce more often. This elitism can skew the evolutionary search: too much exploiting without enough exploring. In contrast, LazyPredator uses negative selection to encourage drift. All high-performing individuals are assumed to have similar fitness. It is the lowest-fitness individuals that get culled by predation. (LazyPredator gets its name from this effect in nature: A lioness chases an antelope herd, causing its members to sort themselves by speed. The lioness usually attacks the slowest individual at the back of the herd.) In each tournament (see section 3.2), the goal is to sort the three individuals by relative fitness. In fact, the only requirement is to identify the least fit of the three individuals.
For example, a predator looks at a scene, then predicts the location of the most conspicuous of three prey. If this location is within the disk-shaped “body” of a prey, it is the least fit of the prey tournament and gets “eaten.” The relative fitness ordering of the other two prey is not significant and is ignored. If all predators fail and predict positions outside all three prey, then the entire simulation step is abandoned, no prey is eaten, and the prey population is unchanged.
In the predator population, a tournament is ranked by the distance from a predator’s prediction to the center of the nearest prey—essentially the predator’s “aiming error.” The worst predator might then die from starvation based on its recent history of hunting success: Has it eaten enough to survive? (Currently this threshold is 40% hunting success over its last 20 attempts; see Appendix 1.) No specific limit on a predator’s lifespan is imposed. Yet predators tend to eventually die off, likely because the prey camouflage gets “too good” for them or they are outcompeted by younger, better predators. This can be seen as a microscopic perspective on the Red Queen hypothesis (Van Valen, 1973).
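As a concrete illustration of this starvation rule, here is a minimal Python sketch that tracks a predator’s recent hunting history. The 20-attempt window and 40% threshold come from Appendix 1; the grace period before the window fills is an assumption.

from collections import deque

class HuntingRecord:
    def __init__(self, window=20, threshold=0.4):
        self.history = deque(maxlen=window)  # True/False for each recent hunt
        self.threshold = threshold
    def record(self, caught_prey):
        self.history.append(bool(caught_prey))
    def starved(self):
        # Assumption: a predator cannot starve until its history window is full.
        if len(self.history) < self.history.maxlen:
            return False
        return sum(self.history) / len(self.history) < self.threshold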
3.4 Offspring, Crossover, and Mutation
In the prey population, a tournament is used to find the lowest relative fitness (see section 3.2). This corresponds to the worst camouflage, the most conspicuous of three prey in a tournament. If a predator successfully locates a prey, it is captured and “eaten.” The object representing that prey is removed from its population and replaced with a new individual (see Figure 3). This is the population update stage of a steady state GP system (Syswerda, 1991).
The new prey, replacing the one eaten by the predator, is the offspring of two other prey. This is the motivation for using tournaments of size 3: The least-fit prey dies and is replaced by the offspring of the tournament’s two surviving prey. This is why their relative fitness is irrelevant: They both become the parents given as input to the crossover operation.
In GP (and LazyPredator), individuals are represented as tree structures. In this simulation, those trees are interpreted as TexSyn programs as described in section 3.5. The GP crossover operation is defined on two abstract parent trees. First, they are copied to preserve the originals. One copy is chosen as recipient and one as donor. In each, a random subtree is selected. The recipient’s random subtree is replaced by the donor’s random subtree by effectively splicing the pointers between tree nodes (Figure 4).
TexSyn expression trees and crossover between them, illustrated here with a simplified version of TexSyn with just three texture operators (spots, stripes, and warp) plus four named, solid-color textures. (a, b) Minimal operator trees. (c, d) Crossover between panels (a) and (b). (c) Spots where blue is replaced with stripes. (d) Stripes where gray is replaced with spots. (e, f) Panels (b) and (d) under a warp operator. (See Appendix 4 for the actual TexSyn c++ code used to create these examples.)
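The following minimal Python sketch shows subtree crossover on trees represented as nested lists of the form [operator, argument, ...]. The real LazyPredator implementation is in C++, is strongly typed (STGP), and enforces tree size limits; none of that is modeled here, and the toy operator names are not TexSyn calls.

import copy
import random

def all_subtrees(tree, path=()):
    # Yield (path, subtree) for every node of a nested-list GP tree.
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_subtrees(child, path + (i,))

def replace_at(tree, path, new_subtree):
    # Splice new_subtree into tree at the node addressed by path.
    if not path:
        return new_subtree
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = new_subtree
    return tree

def crossover(recipient, donor, rng=random):
    # Copy both parents, pick a random subtree in each, then replace the
    # recipient's chosen subtree with the donor's chosen subtree.
    recipient, donor = copy.deepcopy(recipient), copy.deepcopy(donor)
    recipient_path, _ = rng.choice(list(all_subtrees(recipient)))
    _, donor_subtree = rng.choice(list(all_subtrees(donor)))
    return replace_at(recipient, recipient_path, donor_subtree)

# Toy trees, loosely in the spirit of Figure 4:
spots = ["Spots", 0.9, "blue", "white"]
stripes = ["Stripes", 0.1, "green", "gray"]
print(crossover(spots, stripes, random.Random(1)))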
After crossover, a mutation operator further modifies the offspring. It traverses the tree, finding all the leaf nodes, which here correspond to numerical constants in the texture programs. Because LazyPredator uses STGP (Montana, 1995), each leaf belongs to a specific application-defined type (e.g., a floating point number between 0 and 1). A method of the type’s class can mutate (“jiggle”) the constant value of a leaf node (e.g., add a small, signed, random offset, then clip back into the type’s range). After crossover and mutation, the offspring is put into the population, replacing the dead prey.
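A correspondingly minimal sketch of the “jiggle” mutation, using the same nested-list trees as the crossover sketch above. The actual per-type mutation logic lives in LazyPredator’s strongly typed GP classes; the [0, 1] clipping range below is just one example type.

import random

def jiggle_mutation(tree, rng=random, amount=0.05):
    # Traverse the offspring tree; add a small signed offset to each numeric
    # leaf constant, then clip it back into its (assumed [0, 1]) range.
    if isinstance(tree, list):
        return [tree[0]] + [jiggle_mutation(child, rng, amount) for child in tree[1:]]
    if isinstance(tree, (int, float)):
        return min(1.0, max(0.0, tree + rng.uniform(-amount, amount)))
    return tree  # non-numeric leaves (e.g., color names) are left unchanged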
Predators are represented as CNNs (see section 3.6). For predator offspring, a surviving parent is selected from the tournament. All its parameters are directly copied, then mutated by adding signed noise to each model parameter. This is a population-based mutation-only evolutionary strategy, as opposed to a genetic algorithm or GP. Predators then learn (fine-tune) during their lifetimes.
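A sketch of the predator offspring operation, assuming the parent’s CNN parameters are available as a list of NumPy arrays (as Keras get_weights() would return); the ±0.003 noise bound is the value listed in Appendix 1.

import numpy as np

def spawn_predator_weights(parent_weights, rng, max_jiggle=0.003):
    # Copy every parameter array of the surviving parent and add small,
    # signed, zero-mean noise to each parameter.
    return [w + rng.uniform(-max_jiggle, max_jiggle, size=w.shape)
            for w in parent_weights]

# For example, with Keras models (assumed usage):
#   child_model.set_weights(spawn_predator_weights(parent_model.get_weights(),
#                                                  np.random.default_rng()))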
3.5 Texture Synthesis
Camouflage textures are represented in this simulation as trees of procedural texture operators. These correspond directly to nested expressions in a typical programming language. TexSyn is a simple, domain-specific language for describing textures (Reynolds, 2019).
The details of TexSyn are not central to understanding this camouflage model. A quick overview is given here. TexSyn is a (C++) library, an API, with various operators, each of which returns a texture. Most of them also take textures as input parameters, along with simple values like colors, 2-D vectors, and scalars. Nested expressions of TexSyn operators (“source code”) are compiled into trees of operator instances.
These TexSyn textures are represented by operators, trees, and parameters—but not by pixel data. Instead, the texture class has a function to sample its color at any floating point xy location. That color is computed on the fly, similar to GPU fragment shaders or the Pixel Stream Editor functions in Perlin (1985). Figure 4 shows some simple examples and how these textures can be recombined with tree crossover.
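To illustrate the idea of textures as procedures rather than pixel arrays, here is a minimal Python sketch; the class and operator names are invented for illustration and are not TexSyn’s actual (C++) API.

import math

class Texture:
    # A texture is a procedure that returns a color for any floating point (x, y).
    def color_at(self, x, y):
        raise NotImplementedError

class Uniform(Texture):
    def __init__(self, rgb):
        self.rgb = rgb
    def color_at(self, x, y):
        return self.rgb

class SoftStripes(Texture):
    # Blends between two input textures along x, with the given period.
    def __init__(self, period, texture_a, texture_b):
        self.period, self.a, self.b = period, texture_a, texture_b
    def color_at(self, x, y):
        t = 0.5 + 0.5 * math.sin(2 * math.pi * x / self.period)
        ca, cb = self.a.color_at(x, y), self.b.color_at(x, y)
        return tuple(t * p + (1 - t) * q for p, q in zip(ca, cb))

# A nested expression of such operators is a prey's texture "genome":
example = SoftStripes(0.2, Uniform((1, 1, 1)),
                      SoftStripes(0.05, Uniform((0, 0, 1)), Uniform((0, 1, 0))))
print(example.color_at(0.3, 0.7))  # color computed on the fly, no pixel storage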
Note that the images in this article are rendered at 512×512 pixels and that the prey disks have a diameter of 100 pixels. These are downsampled to 128×128 for use by predators’ vision CNNs.
3.6 Predator Vision
It is the predator’s job to look at a tournament image and “hunt” for prey. These images are built from a portion of a background photo and overlaid with three randomly placed, camouflaged prey. The camouflage texture for each disk-shaped prey is rendered on the TexSyn side. Because all images in this simulation are synthetic, they are labeled with the random ground truth position data for each prey. This allows predators to learn in a self-supervised manner.
3.6.1 Pretraining Predator Vision
The basis for a predator’s visual system is a pretrained deep neural net model for a “find conspicuous disk” task (Figure 6). The goal of this task is to look at an arbitrary image and locate the center point of the most conspicuous (salient) region, assumed to be a prey-sized disk. This pretraining is done once, then reused for each subsequent camouflage evolution run. TexSyn was used to generate a data set of 20,000 labeled training examples called FCD6 (Reynolds, 2022a; Figure 7). Using augmentation by random variation, the effective training set size was 500,000. Each example was an RGB image, a 128×128×3 tensor, with an associated label: an xy coordinate pair indicating a location in the image.
Each training image starts with a random texture or a random crop of a photo (from a library of background images, see section 3.7) over which are one or three prey disks with random texture. Although “random texture” is a slippery concept, the meaning here is the kind of prey texture used to initialize the prey population before an evolution run. The LazyPredator GP engine has the ability to create random trees of a given size from a user-defined function set, such as one for TexSyn. A random tree is interpreted as a random nested expression of TexSyn operators with randomly chosen leaf constants. As such, these prey textures are quite varied, but they have a fair amount of structure; they are not uncorrelated noise. See the leftmost image in Figure 2.
Each image in the “find conspicuous disk” training data set is generated in one of three styles chosen with equal probability. Style 1 has a single prey disk; the label is its center point. Style 2 has three different prey, whereas Style 3 has three copies of one prey disk. The latter two cases reduce the visibility of two disks by blending or dithering pixels of the prey disk into the background. The label corresponds to the unchanged disk, presumed to be more conspicuous.
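The following NumPy sketch shows how one labeled “find conspicuous disk” example might be assembled in the three styles described above. The array shapes, the blending amounts, and the assumption that disk placement ignores overlap are illustrative choices, not the actual FCD6 generator.

import numpy as np

def draw_disk(image, center, patch, alpha=1.0):
    # Composite a square texture patch, masked to a disk, onto image at center.
    r = patch.shape[0] // 2
    y, x = center
    yy, xx = np.ogrid[-r:r, -r:r]
    mask = (yy ** 2 + xx ** 2) < r ** 2
    region = image[y - r:y + r, x - r:x + r]
    region[mask] = alpha * patch[mask] + (1 - alpha) * region[mask]

def make_fcd_example(rng, background, prey_patches):
    # background: HxWx3 float array; prey_patches: at least three 2r x 2r x 3 patches.
    image = background.copy()
    r = prey_patches[0].shape[0] // 2
    def random_center():
        return (int(rng.integers(r, image.shape[0] - r)),
                int(rng.integers(r, image.shape[1] - r)))
    style = int(rng.integers(1, 4))
    if style == 1:
        label = random_center()                     # single prey disk
        draw_disk(image, label, prey_patches[0])
    else:
        # Style 2: three different prey; style 3: three copies of one prey.
        patches = prey_patches[:3] if style == 2 else [prey_patches[0]] * 3
        centers = [random_center() for _ in range(3)]
        label = centers[0]                          # the unaltered, most conspicuous disk
        for center, patch, alpha in zip(centers, patches, (1.0, 0.5, 0.25)):
            draw_disk(image, center, patch, alpha)  # two disks blended into background
    return image, label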
3.6.2 Fine-Tuning Predator Vision
During simulations of camouflage evolution, each predator is initialized as a copy of the pretrained “find conspicuous disk” model to which noise is added. (Zero mean noise of less than ±0.003 is added to each parameter of the deep neural net.) In each simulation step (see Figure 3), the three predators chosen to participate in the tournament predict a prey position. Then, each predator is fine-tuned based on a data set of labeled images collected during this simulation run. Each predator collects its own data set consisting of tournament images in which it participated—essentially its memory of prey coloration in this environment. The fine-tuning data set starts empty, then collects each tournament image until the data set holds 500 images, then replaces one of the 500 chosen at random. Labels for a predator’s data set are the predictions it made in each tournament (see Figure 5). The predator’s CNN is fine-tuned (trained) on this set of (up to) 500 images. The samples are initially in temporal order, then become randomized as the predator survives more than 500 tournaments. This approach is analogous to the use of batching in deep Q-learning (see, e.g., Casgrain et al., 2022, Algorithm 5.1).
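A minimal sketch of a predator’s fine-tuning memory as just described; the 500-image capacity is from the text, while the storage format and label contents are assumptions.

import random

class PredatorMemory:
    # Keep the first 500 tournament images seen, then overwrite a randomly
    # chosen slot for each new image (so the set slowly forgets old prey).
    def __init__(self, capacity=500, rng=random):
        self.capacity, self.rng = capacity, rng
        self.images, self.labels = [], []
    def add(self, image, label):
        if len(self.images) < self.capacity:
            self.images.append(image)
            self.labels.append(label)
        else:
            i = self.rng.randrange(self.capacity)
            self.images[i], self.labels[i] = image, label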
Tournament image after simulation step: three camouflaged prey on a random background crop. Three crosshair marks show the responses of three predators, ranked by minimum distance to a prey center. Details are in Appendix 6.
3.7 Background Sets
Each simulation run is based on a background set of images, usually photographs of natural scenes. These images provide the background of tournament images, over which camouflaged prey are drawn. (See Figures 11–18 and Appendix 2.) The images play the role of an environment in which prey must hide to avoid being found and eaten by predators. Because the model is purely 2-D, photographs offer an easy way to provide varied environments of plausible natural complexity in which to test camouflage evolution.
Background sets used in this work consist of from 1 to 14 photographs, 4 or 5 being typical. Almost all are casual snapshots taken with a mobile phone. The images within a given background set are all similar. For example, a set called oak_leaf_litter has six images, all of slightly different portions of fallen leaves piled along a roadside. Each image in this set is taken with the camera pointing straight down, all from about the same height. As a result, the images have features (e.g., leaves) of about the same size. This “similarity” helps the camouflage evolution process by providing many unique yet analogous backgrounds in which to try hiding.
As used here, “similarity” is meant to suggest that image features have a stationary distribution, that different patches (say, of a size comparable to prey) have the same statistical distribution of color and spatial frequency. When background sets are “less stationary” (e.g., large areas of uniform color), it becomes harder for evolution to find good camouflage. (See the faster progress for an easy background in Figure 16—the “hard” background set shown there is the yellow_flower_on_green set in Figure 12, which includes large regions of mostly yellow, mostly green, or mostly black.)
3.8 Simulation Runs
To run this coevolutionary simulation model, two processes are launched. One is the evolutionary texture synthesis system that models a population of camouflaged prey. It is C++ code based on TexSyn and LazyPredator. The other is a “predator server” that manages a population of visual hunters and fine-tunes their visual perception. This PredatorEye (Reynolds, 2021) is Python code using Keras (Chollet et al., 2015) and TensorFlow (Abadi et al., 2016). The prey side produces a labeled tournament image, which is inspected by the predator side, which then sends back a target location within the image, indicating its estimate of where the most conspicuous prey is located. The labeled tournament image is used to fine-tune the predators. The target location drives the fitness function for prey evolution.
Main parameters for a simulation run are a choice of background set (see section 3.7), a scale factor for the background images, and a random number seed. The full set of run parameters is described in Appendices 1 and 5.
A typical run consists of 12,000 steps with a prey population of 400 and a predator population of 40. A reimplementation of the interactive approach of Reynolds (2011) used a prey population of 120 and typical runs of 2,000–3,000 steps, as did the first version with a neural net predator. Introducing a population of such predators reduced the rate of fine-tuning of predators’ neural net models, leading to simulations of 6,000–12,000 steps.
During a simulation run, various data are collected in log files. The most important output is a “visual log”—periodically saving tournament images (a crop of the given backgrounds overlaid with camouflaged prey, as in Figure 1) along with the predator response data. These images are saved “occasionally.” Originally it was every 20 steps but was changed to every 19 steps to be relatively prime and so cycle through the 6, 10, or 20 subpopulations (demes or islands) of the prey population. Eventually, hand selection from this periodic collection was replaced by autocuration (see section 3.10).
3.9 Static Quality Metric
To track the objective progress of camouflage coevolution, the pretrained FCD neural net model can serve as a standard predator to provide an unchanging static quality metric (SQM). Each prey is scored by the fraction of 10 trials over which this standard predator is “fooled”—or fails to find the prey on a random background. Figure 16 uses the SQM to visualize the difference between an easy, “stationary” background set and a hard, “nonstationary” background containing large features. Figure 17 illustrates the relatively small variance between two runs that were identical except for their random number seeds.
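A sketch of the static quality metric computation; standard_predator and make_test_image are assumed callables standing in for the pretrained FCD model and the rendering of a prey on a random background crop, and the 50-pixel radius corresponds to the 100-pixel prey diameter.

def inside_prey_disk(guess, center, radius=50):
    return (guess[0] - center[0]) ** 2 + (guess[1] - center[1]) ** 2 < radius ** 2

def static_quality_metric(prey, standard_predator, make_test_image, trials=10):
    # SQM: fraction of trials in which the fixed, pretrained predator fails to
    # find this prey; 1.0 means the standard predator was fooled every time.
    fooled = 0
    for _ in range(trials):
        image, center = make_test_image(prey)
        if not inside_prey_disk(standard_predator(image), center):
            fooled += 1
    return fooled / trials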
3.10 Automatic Selection of Results From a Run
A typical simulation run produces up to 600 files containing tournament images. That is a lot to store. Worse, it is a lot to sort through by hand to find a few good-quality, “representative” images, say, for publication. This was the procedure used for most of the camouflage images in this article.
This project has progressed from a human-in-the-loop model to less and less human intervention. In Reynolds (2011), the predator was a human. Then, a self-organizing predator model was incorporated (Reynolds, 2023a). Finally, an autocuration facility was added to automatically select results from a run. This is similar in spirit to an issue in multiobjective evolutionary algorithms: selecting the best from the many solutions clustered along a Pareto optimal boundary (see, e.g., Ishibuchi et al., 2022).
This autocuration selects candidate images by an objective method, based on the static quality metric (section 3.9), intended to collect most of the high-quality images showing effective camouflage. Nonetheless, the final culling (for publication or application) is likely done by a human.
The current autocuration filter combines two factors. First, selected images must have a “perfect” SQM score, meaning the pretrained predator fails to find them in any of 10 trials. Second, images selected by autocuration must have “fooled” (gone undetected by) all three predators in a tournament. The first component is based on 10 trials against the pretrained predator. The second component is based on three trials against fine-tuned predators evolved for this specific background and these specific camouflaged prey. (More details are in the September 30, 2023, post in the project blog; Reynolds, 2023b.)
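The combined filter reduces to a small predicate, sketched here; sqm_score is the fraction of SQM trials in which the pretrained predator was fooled, and predators_found_prey holds one boolean per tournament predator (both names are invented for illustration).

def autocurate(sqm_score, predators_found_prey):
    # Keep an image only if (1) the SQM is "perfect" (pretrained predator fooled
    # in all 10 trials) and (2) none of the three fine-tuned predators in this
    # tournament found any prey.
    return sqm_score >= 1.0 and not any(predators_found_prey)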
4 Discussion
This simulation model was applied to a variety of background sets to produce prey camouflage patterns suited to those environments. See examples in Figures 1, 2, and 18; in other figures in this article; and on the TexSyn blog (Reynolds, 2023b).
These experiments were run on a 2021 Apple MacBook Pro with an M1 Max chip. Typical simulation runs show the predator vision/learning process taking approximately 600% of the processing power (i.e., approximately 6 of 10 cores) and the prey texture render/evolve process taking approximately 10% of a core (many steps reuse cached prey renderings). The time to complete a single simulation step is approximately 1 second. Typical simulation runs are 12,000 steps and so take approximately 3.5 hours. When using the static quality metric to evaluate progress (Figure 16), total simulation run time is approximately 10 hours.
A key observation from these experiments is that background sets present varying levels of difficulty (Figure 16). For some background sets, a simulated evolution run easily produces effective camouflage. Occasionally it is good enough to momentarily fool a human observer: “Wait, where is the third prey?” Sometimes runs fail to produce effective results. (That is, none of the results appear well camouflaged to a human observer, or the average SQM for a run never gets above, say, 0.6.) Rerunning the simulation with a new random seed will often find a solution. This suggests that a “hard” background set may have a likelihood of success during a run of, say, 1/2 or 1/3, while “easy” backgrounds have a likelihood close to 1.
Figure 17 shows that these simulations produce camouflage patterns of similar quality for similar runs. Keeping all simulation parameters constant, except for the initial random number seed, the plot of static quality metric over time is very similar between the two runs. Images of these runs can be seen in the project blog (Reynolds, 2023b) entry for March 19, 2023. Note that although those runs have similar SQM scores, visually, they are quite dissimilar.
5 Limitations
This work has the nature of an “existence proof.” The model of camouflage coevolution presented here is a starting point. Its architecture and parameters (Appendix 1) are unlikely to be optimal. It was tuned well enough to allow camouflage evolution to emerge but surely can be improved through further systematic experimentation.
Another limitation is that all final results are to some extent hand selected: “cherry-picked” (see, e.g., Figures 1, 2, and 18). After a simulation run, the resulting images are examined by a human, who selects some as representative of the results by making a mental judgment about their effectiveness.
This cherry-picking is a problem for evaluating the scientific validity of the model: Does this technique work well, or is the author just good at cherry-picking effective results? As a practical matter, the autocuration described in section 3.10 allows for using this simulation to generate camouflage without human input and so can avoid potential experimenter bias that might skew results. That “hands-free” operation enables running large, automated processes composed of many camouflage coevolution simulations, for example, making several runs that are identical except for settings of a single parameter.
Other limitations include the inherently 2-D nature of the simulation, that simulated time is discrete, and that the model of texture synthesis lacks genetic or biological plausibility. These are among the simplifying abstractions intended to make this initial model of camouflage coevolution tractable. Still, it may be fair to complain that as a result, the model is too abstract to provide much biological insight.
6 Future Work
The goal of this work has been to build a computational model of the coevolution of camouflage from the interaction of predator and prey. It demonstrates that camouflage can in fact arise from such a system. It also provides a simple process to generate camouflage for a given environment from photos. The most important contribution of the work has been its creation of an open source model allowing future experiments to study camouflage in nature and the perceptual phenomenon of camouflage more generally.
A static quality metric for camouflage (discussed in section 3.9 and Figures 16 and 17) is based on using the pretrained predator model (see section 3.6.1) as a standard. This is not ideal. It seems likely that this metric will saturate (reach the top of its range, failing to find a prey in all 10 trials) while coevolution continues to improve camouflage quality. Other approaches should be investigated. For example, Lv et al. (2022) and Volonakis et al. (2018) suggested potential ways to rank “conspicuousness.” The model presented here would be a useful test bed for evaluating candidate camouflage metrics. It should also be possible to validate these candidate metrics with crowd-sourced ratings of camouflage quality, as in Sensory Ecology and Evolution: Games (Stevens et al., 2022) and CamoEvo (Hancock & Troscianko, 2022).
As mentioned in section 3.7 and Figure 16, the “difficulty” of evolving an effective camouflage pattern for a given background set seems to be strongly related to the stationarity of the images. Applying information theory metrics for stationarity (Conni et al., 2021) to background images beforehand might provide useful predictions of the difficulty of a given background set. This could in turn be used to automatically set hyperparameters for the run, such as evolutionary population and the number of evolution steps to take.
All experiments described here use a fixed background environment, uniformly sampled from a prespecified set of background images. This model could easily be extended to schedule changes of the background images over evolutionary time. These might reflect seasonal changes, longer-term climate changes, or animals moving from one ecosystem to another. Not only would this open up many new experimental directions but, as described by Kashtan et al. (2007), “varying environments can speed up evolution” (p. 13711).
All of the simulation parameters in Appendix 1 should be reexamined using a static camouflage metric. Similarly, the neural net architecture in Figure 6 is simply the first one that worked. Other designs should be constructed and evaluated.
Architecture of a predator’s neural net, which maps a 128×128 RGB image onto an xy location where it estimates that the most conspicuous prey is centered. All convolution layers after the first use strides of (2,2), while doubling the number of learned filters. Then, fully connected layers reduce the flattened features, by a factor of 4, down to an output layer of two values. This model contains 3.2 million parameters. (More details are provided in Appendix 3.)
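For orientation, here is a hedged Keras sketch in the spirit of Figure 6. It follows the stride-(2,2), filter-doubling convolution pattern and ends in a two-value xy output, but the number of layers, the filter counts, the dense-layer widths, and the loss are guesses; it will not reproduce the actual 3.2-million-parameter model (see Appendix 3).

from tensorflow.keras import layers, models

def build_predator_cnn():
    # Illustrative only: approximate shape of the predator vision model.
    model = models.Sequential([
        layers.Input(shape=(128, 128, 3)),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, strides=(2, 2), padding="same", activation="relu"),
        layers.Conv2D(64, 3, strides=(2, 2), padding="same", activation="relu"),
        layers.Conv2D(128, 3, strides=(2, 2), padding="same", activation="relu"),
        layers.Conv2D(256, 3, strides=(2, 2), padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(2),  # predicted xy center of the most conspicuous prey
    ])
    model.compile(optimizer="adam", loss="mse")
    return model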
This work used a fixed predator model as a static quality metric to evaluate and track changes in prey camouflage quality over time (see section 3.9). Perhaps it would be useful to do effectively the opposite and so provide a way to track predator quality over time. Perhaps a subset of FCD6 (Reynolds, 2022a) training images (see Figure 7), or a random sampling of tournament images from the current run could be selected as a fixed reference for predator quality. The metric might be the fraction of such test images a predator locates successfully. This sort of metric might help, for example, to visualize the coevolutionary dynamics at work in this simulation.
Examples of the three types of labeled examples in the training data set for the “find conspicuous disk” task for pretrained CNN models FCD5 and FCD6 (Reynolds, 2022a). (top) Type 1: single random prey over a random background. (middle) Type 2: three different prey. (bottom) Type 3: three copies of one prey. For Types 2 and 3, one prey is unaltered, whereas the other two are blended into the background, by differing amounts, to make them more muted and so perhaps less conspicuous. See section 3.6.1.
Four tournament images from run yellow_flower_on_green_20221217_1826. This background set was a notoriously “hard” non-stationary test case.
Four tournament images from run redwood_leaf_litter_20230115_1730.
Four tournament images from run tree_leaf_blossom_sky_20221108_2018.
Static quality metric (see section 3.9) versus simulation time. Metric is based on the failure of the pretrained FCD model to find prey. It compares an “easy” background (oak_leaf_litter) to a “hard” background (yellow_flower_on_green).
Static quality metric comparing two runs that differ only by the random number seed used. This illustrates the typical variance between simulation runs. Run A (redwood_leaf_litter_20230319_1128) and Run B (redwood_leaf_litter_20230322_2254) track along a similar curve during the 12,000 simulation steps. In both cases, the population average SQM improves strongly until approximately Step 5,000, then continues to improve at a much slower rate.
In early versions, simulated predators often incorrectly predicted prey to be at the tournament image’s center. Perhaps this center preference is a “lazy” default strategy, because that is the mean over all prey positions. To work around this, TexSyn’s random placement was constrained to avoid prey center positions within one prey diameter of the image center, thus forcing predators to hunt prey away from the center. This seems like a bug that should be better understood and fixed in a more principled way.
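A sketch of that workaround; the 512-pixel image size and 100-pixel prey diameter are the values used elsewhere in the article, while the exact sampling scheme (simple rejection) is an assumption.

import math
import random

def random_prey_center(image_size=512, prey_diameter=100, rng=random):
    # Re-sample until the prey center is at least one prey diameter away from
    # the image center (and the disk fits fully inside the image).
    margin = prey_diameter // 2
    middle = image_size / 2
    while True:
        x = rng.uniform(margin, image_size - margin)
        y = rng.uniform(margin, image_size - margin)
        if math.hypot(x - middle, y - middle) >= prey_diameter:
            return x, y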
This simulation is currently a mixed paradigm model using both evolution and learning. While these typically co-occur in nature (Valiant, 2013), what about an all-evolution model with evolved detectors for predators, perhaps along the lines of Harrington et al. (2014) and Bi et al. (2022)? Recent work on evolutionary algorithms for merging deep neural net models (Akiba et al., 2024) might provide a useful way to evolve a population of CNN predators. Or conversely, an all-learning model, something similar to CamoGAN (Talas et al., 2020), may provide new research directions.
An obvious next step is to apply this model to 3-D environments—perhaps as described by Miller et al. (2022). One key simplifying assumption of the current, purely 2-D model is that plausible, naturally complex environments can be provided by simple photographs from the phone in one’s pocket. Providing plausibly complex 3-D environments is much harder. Perhaps neural techniques, such as NeRF (Gao et al., 2022) or NIP (Sharp & Jacobson, 2022), would meet that need.
Acknowledgments
I deeply appreciate everyone who helped me with this work: my family for loving support, Ken Perlin for (well, lots, but especially) PSE (Perlin, 1985), Andrew Glassner for teaching me everything I know about deep learning (Glassner, 2021), and Pat Hanrahan for some key career advice (“just do the research”). I’ve been working on this project on and off since 2007, based on inspiration by papers in the early 1990s by Witkin and Kass (1991), Turk (1991), Angeline and Pollack (1993), and Sims (1991, 1994), and also one paper published the year before I was born: Turing (1952). I also thank three sets of reviewers for many helpful suggestions. Thanks for additional help from Bilal Abbasi, Jan Allbeck, Rebecca Allen, Richard Dawkins, Steve DiPaola, Aaron Hertzmann, Bjoern Knafla, John Koza, Dominic Mallinson, Nick Porcino, Michael Wahrman, and Lance Williams. Thanks also to my neighbors, whose landscaping provided many of the background images used here, collected on daily walks during COVID-19 lockdown.
References
Appendix 1: Key Simulation Parameters
Parameter | Value
Predator population | 40
Prey population | 400
Prey subpopulations (demes) | 20
Prey max. initial tree size | 100
Prey min. tree size after crossover | 50
Prey max. tree size after crossover | 150
Prey render diameter (pixels) | 100
Tournament output image size | 512×512
Predator input image size | 128×128
Simulation steps per run (typical) | 12,000
Prey generations equiv. (steps/pop.) | 30
Predator fail rate (typical; %) | 15–30
Predator starvation threshold (success in previous 20 attempts; %) | <40
Predator “FCD” pretraining: synthetic data set size | 20,000
Predator “FCD” pretraining: effective size with augmentation | 500,000
Max. signed “jiggle” noise added to all parameters of new predator CNN | ±0.003
Static quality metric: trials per prey | 10
Note. Details in source code.
Appendix 2: Background Image Sets
Following are the names and descriptions for the sets of background images used in this article (see section 3.7). Each set comprises several photographs of a similar natural scene.
Name | Description | Photos | Figures
backyard_oak | under canopy of California live oak (Quercus agrifolia) | 12 | 8 |
bean_soup_mix | mixture of dried beans from grocery store | 4 | 19 |
jans_oak_leaves | white oak leaf litter (Jan Allbeck, Fairfax, VA) | 6 | 20 |
kitchen_granite | polished granite countertop in our kitchen | 6 | 11 |
mbta_flowers | flowers (impatiens?) near MBTA Northeastern stop | 4 | 5, 9 |
michaels_gravel | gravel bed in neighbor’s front yard | 4 | 1, 7, 10 |
oak_leaf_litter | fallen oak leaves on edge of road | 6 | 2 |
oxalis_sprouts | sprouts of oxalis push through leaf litter after first rain | 5 | 1 |
plum_leaf_litter | fallen leaves from plum and other trees, near sunset | 5 | 1, 15 |
redwood_leaf_litter | dried redwood leaf litter collected in a roadside gutter | 4 | 13 |
rock_wall | “dry stack” retaining wall in a neighbor’s front yard | 14 | 18 |
tiger_eye_beans | dried heirloom “tiger eye” beans from farmer’s market | 5 | 3 |
tree_leaf_blossom_sky | small trees (branch, leaf, and blossom), sky background | 5 | 1, 14 |
yellow_flower_on_green | “Scot’s broom” (or “French broom”?) in neighbor’s yard | 6 | 12 |
Appendix 3: Details of Pretrained Predator Model
The pretrained predator visual system, shown in Figure 6, is a Keras TensorFlow CNN model with approximately 3.2 million parameters. Its input is a 128×128 pixel RGB image (128×128×3 scalar values), and its output is an xy location where the model estimates that the most conspicuous prey is centered. Following is its Keras model.summary().
Appendix 4: TexSyn C++ Code for Figure 4
// Four named, solid-color textures.
Uniform white(1);
Uniform gray(0.1);
Uniform blue(0, 0, 1);
Uniform green(0, 1, 0);
// (a) Spots of blue on white.
LotsOfSpots spots(0.9, 0.05, 0.3, 0.02, 0.02, blue, white);
// (b) Stripes of green and gray.
Grating stripes(Vec2(), green, Vec2(0.1, 0.2), gray, 0.3, 0.5);
// (e) Panel (b) under a warp operator.
NoiseWarp warp_stripes(1, 0.1, 0.7, stripes);
// (c) Crossover result: spots where blue is replaced with stripes.
LotsOfSpots spots2(0.9, 0.05, 0.3, 0.02, 0.02, stripes, white);
// (d) Crossover result: stripes where gray is replaced with spots.
Grating stripes2(Vec2(), green, Vec2(0.1, 0.2), spots, 0.3, 0.5);
// (f) Panel (d) under a warp operator.
NoiseWarp warp_all(1, 0.1, 0.7, stripes2);
Appendix 5: Additional Command Line Arguments to TexSyn for Simulation Runs
background image directory (required)
output directory (defaults to .)
background scale (defaults to 0.5)
random seed (else: default seed)
window width (defaults to 1200)
window height (defaults to 800)
individuals (defaults to 120)
subpopulations (defaults to 6)
max init tree size (defaults to 100)
min crossover tree size (default max init tree size * 0.5)
max crossover tree size (default max init tree size * 1.5)
Appendix 6: Additional Samples of Predator Responses
Similar to Figure 5, the examples in Figure A1 show the prediction output of all three predators in each tournament. The crosshairs are ordered by least “aim error,” with the best drawn in black and white, the second best in black and green, and the worst in black and red. In Figure A1(a), all three predators miss all three prey. In Figure A1(b), the best is near the center of one prey, the second is off-center but still inside another prey, and the third fails to find any prey. In Figure A1(c), all three succeed; the best is inside one prey, while the other two are inside another prey. In Figure A1(d), all three predators select the same prey, which has especially conspicuous coloration.
Four tournament images, plus predator responses as crosshairs, from run tiger_eye_beans_20220903_1401.