## Abstract

Thirty years, 1993–2023, is a huge time frame in science. We address some major developments in the field of evolutionary algorithms, with applications in parameter optimization, over these 30 years. These include the covariance matrix adaptation evolution strategy and some fast-growing fields such as multimodal optimization, surrogate-assisted optimization, multiobjective optimization, and automated algorithm design. Moreover, we also discuss particle swarm optimization and differential evolution, which did not exist 30 years ago. One of the key arguments made in the paper is that we need fewer algorithms, not more; the current trend, however, is the opposite, with paradigms from nature continuously being claimed as the basis for supposedly new optimization algorithms. Moreover, we argue that we need proper benchmarking procedures to sort out whether a newly proposed algorithm is useful or not. We also briefly discuss automated algorithm design approaches, including configurable algorithm design frameworks, as the proposed next step toward designing optimization algorithms automatically, rather than by hand.

## 1 Introduction

When asked to write a paper 30 years after our original work, “An Overview of
Evolutionary Algorithms for Parameter Optimization” (Bäck and Schwefel, 1993),
was published, I (Thomas Bäck) immediately thought this would be too big
a challenge. In the early 1990s, the “big unification” was achieved in
the field that formerly consisted of relatively strictly separated algorithmic
branches: Genetic Algorithms, Genetic Programming, Evolution Strategies, and
Evolutionary Programming, which were then united into what we today call *Evolutionary Computation*. Due to Evolutionary
Computation's success in a huge range of application domains and its
importance as a foundation for many subfields of computer science, the development
of the field over the past 30 years has been tremendous, going far beyond what we
can report in this paper. In the late 1990s, the *Handbook of Evolutionary
Computation* (Bäck et al., 1997) represented an attempt towards summarizing the state of the art
based on a unified perspective of the field, and the editors were taking up this
challenge since there was at least some hope to make it happen.

Throughout this paper, we consider *continuous parameter optimization* problems of the form:

$$\min \left\{ f(\boldsymbol{x}) \mid \boldsymbol{x} \in M \subseteq \mathbb{R}^n \right\} \tag{1}$$

where $f: M \to \mathbb{R}$ denotes the objective function defined on the feasible region $M$.

In the following, we provide a list of *what we subjectively consider* important developments over the past 30 years, and we elaborate on some of those in
detail across the paper.

- The development of the *Covariance Matrix Adaptation Evolution Strategy*, in short CMA-ES (Hansen and Ostermeier, 1996), has defined a new state of the art and a whole branch of algorithms derived from the CMA-ES (see Bäck et al., 2013, for an overview). Section 2.1 provides more details about this development.
- Genetic Algorithms have been specialized into many powerful variants for combinatorial optimization problems, which are beyond the scope of this paper. When it comes to our problem definition (see Equation 1), the Genetic Algorithms often still exhibit strong similarities with the canonical Genetic Algorithm described by Bäck and Schwefel (1993), as discussed in Section 2.2.
- Parameter control methods, dating back to the early days of Evolution Strategies with Rechenberg's 1/5-success rule (Rechenberg, 1973) and Schwefel's self-adaptation methods (Schwefel, 1981), have now become a widespread technique in Evolutionary Computation. Section 2.2 discusses this in the context of Genetic Algorithms, and Section 3.1 provides a more general perspective.
- When it comes to identifying multiple local optima at the same time in so-called *multimodal optimization*, and its generalization to finding solutions that exhibit behavioral properties as diverse as possible, new techniques in *Novelty Search* and *Quality-Diversity Search* have been developed. These algorithms are briefly discussed in Section 3.2.
- *Constraint handling methods* have been developed well beyond the penalty-function and re-evaluation-based approaches that were popular in the early 1990s; see Section 3.3 for details.
- For expensive objective function evaluations, *Surrogate-Assisted Evolutionary Algorithms* use nonlinear regression methods from the field of machine learning to learn fast-to-evaluate proxy functions for objective(s) and constraints (Jin, 2011). Section 3.4 provides a few of the corresponding developments in this area of research.
- Multiobjective optimization (Deb, 2009; Coello Coello, 2006) has developed from only a few publications into a huge field within Evolutionary Computation due to the powerful ability of a population to approximate a Pareto-optimal frontier in its entirety by using concepts such as Pareto dominance or performance indicators. Section 3.5 explains some of the key developments and state of the art.
- Automated algorithm design using configurable frameworks, discussed in Section 3.6, is a promising new approach towards automatically creating new algorithms for parameter optimization and dynamically configuring algorithms.
- Particle swarm optimization and differential evolution, two relevant classes of evolutionary algorithms that did not exist yet in 1993, are discussed briefly in Sections 3.7 and 3.8.

It is impossible to discuss the wealth of new theoretical results in the field, so we can only give a very brief list of relevant works in Section 4.

In Section 5, we briefly discuss the flood of “new” paradigms from nature that are proposed as optimization algorithms, as well as the urgent need for appropriate benchmarking procedures to qualify new methods; in Section 6, a brief outlook is provided.

Most recently, we have seen many tremendously important developments in the field, for instance, new benchmarking standards and test function sets, statistical analysis methods, and software tools for experimentation, visualization, and analysis of experimental results (Bäck et al., 2022). In Section 5.2, we address some key developments in benchmarking and argue that proper benchmarking and empirical analysis of optimization algorithms is critical to assessing their performance and their value to the community and, more importantly, to the domain expert who wants to solve an optimization problem but is not an expert in optimization algorithms. Related to this, we also need to stress that it is of paramount importance to properly benchmark any proposed “natural metaphor-based” optimization algorithm against established state-of-the-art methods and to refrain from an unjustified use of non-mathematical terminology to describe such algorithms. As outlined in Section 5.1, we fully concur with critical voices such as Sörensen (2015) and Campelo and Aranha (2018, 2021), and would like to encourage the research community to support the publication of new algorithms only if they are adequately defined using mathematical notation and appropriately benchmarked against well-known state-of-the-art algorithms.

Going beyond proposing expert-designed optimization algorithms, a fascinating idea is
to automatically design optimization algorithms that are best-in-class for a given
(set of) optimization problems. Approaches such as *hyper-heuristics* (Pillay and Qu, 2018; Drake et al., 2020), which can be based on genetic
programming (Burke et al., 2009), have been
proposed for such automated algorithm design. More recently, algorithm configuration
methods have also been applied very successfully to genetic algorithms (Ye et al., 2022), modern CMA-ES-based evolution
strategies (van Rijn et al., 2016), and
differential evolution and particle swarm optimization (Boks et al., 2020; Camacho-Villalón et al., 2022b). Although these ideas are not completely
new in the domain of genetic algorithms (see Grefenstette, 1986, for hyperparameter optimization; Bäck, 1994, for algorithm configuration and
hyperparameter optimization), they can now be applied at a massive scale, with
proper benchmarking procedures for comparing the algorithmic variants, and also
facilitating both configuration and hyperparameter optimization as a systematic next
step.

There are many important aspects of evolutionary computation that are not covered in this paper simply because we had to make choices.

## 2 Evolutionary Computation for Parameter Optimization—30 Years Later

We assume familiarity with the underlying ideas of evolutionary computation,
including a set of candidate solutions (the *population*), a given
optimization problem (as in Equation 1), *variation operators* such as *mutation* and *recombination*, and one or more *selection* operators—and an iteration of the operator loop until a given termination
criterion is satisfied. In Bäck and Schwefel (1993), we presented the general algorithmic loop of evolutionary
algorithms and showed how it can be used to describe the various instances. In this
section, we briefly summarize some of the key developments since 1993 concerning
parameter optimization variants.

### 2.1 Evolution Strategies

Evolution strategies, like all other EAs, have seen tremendous development over
more than 60 years, starting from the early work at the Technical University of
Berlin (Schwefel, 1965; Rechenberg, 1965, 1973). The first algorithm, a (1+1)-ES, was inspired by
the process of biological evolution and used a simple iterative procedure that
mutates the search point of a single individual $x \in \mathbb{R}^n$ by a multivariate normal distribution, that is, $x' = x + \sigma \cdot N(0, I)$,
and assigns $x \leftarrow x'$ iff $f(x') \leq f(x)$.
Experimenting with engineering optimization problems, Schwefel and Rechenberg
quickly discovered the need for adaptively learning the mutation step size $\sigma $,
a concept that then became central to evolution strategies: The *self-adaptation* of strategy parameters.

For the (1+1)-ES, it started with the so-called 1/5-success rule for updating $\sigma$, which was derived from theoretical results on the sphere and corridor models (Schwefel, 1977). With multimembered evolution strategies, for example, the $(\mu,\lambda)$-ES and the $(\mu+\lambda)$-ES with $\mu$ parents and $\lambda$ offspring, mutative self-adaptation was introduced. With these methods, individuals are represented as $(x, \theta)$, where in addition to the search point $x$, a set of endogenous parameters $\theta$ is evolved by the ES, based on the intuition that good search points sprout from good strategy parameters (Beyer and Schwefel, 2002). Schwefel's strategy extensions additionally allowed for the self-adaptation of coordinate-specific step sizes $\sigma_i$ and covariances $c_{ij}$ for mutations, using a lognormal distribution for their variation (Schwefel, 1981). In 1993, this $(\mu,\lambda)$-mutational step size control (MSC)-ES (Schwefel, 1981) was considered state of the art and was the first evolution strategy able to adapt the parameters of the mutation distribution to an arbitrary covariance matrix and generate correlated mutations.

However, as pointed out by Hansen et al. (1995), this strategy strongly depends on the chosen coordinate
system. It is not invariant to search space rotations—an insight that led
to a development that again, after the discovery of self-adaptation, has
significantly advanced state of the art through *derandomization* (Section 2.1.1) and the *Covariance Matrix Adaptation ES* and its derivates (Section 2.1.2), which are now state of the
art in the field.

#### 2.1.1 Derandomized (DR) Evolution Strategies

Derandomized evolution strategies moved away from mutative self-adaptation and used global strategy parameters for all the individuals in the population. This was motivated by the observation that mutative self-adaptation of individual step sizes is unsatisfactory for small populations (Schwefel, 1987), which is attributed to two key reasons: (i) good strategy parameters do not necessarily cause a successful mutation of the search points, and (ii) there is a conflict between maintaining a diverse set of strategy parameters within the population and the stability of the search behavior (Ostermeier et al., 1994a). Derandomized evolution strategies aim to achieve self-adaptation without any independent stochastic variation of the strategy parameters. This is addressed in DR1 by using the length of the most successful mutation vector $z \sim N(0, I)$ to update $\sigma$ and by applying a dampening factor $\beta$ in order to reduce strong oscillations of $\sigma$ over time (Ostermeier et al., 1994a).
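The derandomization idea can be illustrated schematically. The sketch below follows the verbal description above, where the length of the selected mutation vector drives the $\sigma$ update, damped by $\beta$, rather than the exact DR1 formulas of Ostermeier et al. (1994a); all constants are assumptions chosen for illustration.

```python
import numpy as np

def dr1_sketch(f, x0, sigma=1.0, lam=10, gens=200, beta=0.5, rng=None):
    """Schematic (1, lambda)-ES with a DR1-style derandomized sigma update.

    Illustrative only: the selected mutation length updates sigma with
    damping beta, mirroring the idea (not the exact formulas) of DR1.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    n = x.size
    chi_n = np.sqrt(n) * (1.0 - 1.0 / (4 * n))      # approx. E||N(0, I_n)||
    for _ in range(gens):
        z = rng.standard_normal((lam, n))            # mutation vectors
        offspring = x + sigma * z
        k = int(np.argmin([f(o) for o in offspring]))  # comma-selection
        x = offspring[k]
        # longer-than-average selected step -> increase sigma, damped by beta
        sigma *= np.exp(beta * (np.linalg.norm(z[k]) / chi_n - 1.0))
    return x, sigma

sphere = lambda v: float(np.sum(v ** 2))
x_end, sigma_end = dr1_sketch(sphere, np.ones(5), rng=np.random.default_rng(1))
```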

With DR2 (Ostermeier et al., 1994b), in addition to a global step size $\sigma$, local step sizes $\boldsymbol{\sigma} \in \mathbb{R}_+^n$ are included, which control the variability of each component of the decision variables. The update of both the local and global step sizes is not only based on the best mutation vector $z$ in the current generation, but also on the best moves of previous generations. This information is collected in a vector $\zeta$, which is updated based on a global learning rate $c$.

DR3 (Hansen et al., 1995), or *Generating Set Adaptation Evolution Strategy* ($(1,\lambda )$-GSA-ES),
includes a representation of the covariance matrix as a global strategy
parameter and can produce mutations from an arbitrary multivariate normal
distribution, similar to the $(\mu ,\lambda )$-MSC-ES.
Moreover, the GSA-ES is able to learn problem scaling and is invariant to
search space rotations. Instead of using a direct representation for the
covariance matrix, a matrix $B \in \mathbb{R}^{n \times m}$ is used, of which the column vectors form a so-called generating set.
Correlated mutation vectors are generated by matrix multiplication of a
random normal vector $z \sim N(0, I)$ with $B$. The number of columns $m$, with $n^2 < m < 2n^2$, can be used to control the speed of adaptation, where a
smaller value of $m$ yields faster adaptation and a larger value increases the accuracy.
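A minimal sketch of generating a correlated mutation from a generating set might look as follows; the $1/\sqrt{m}$ normalization is one common choice assumed here, and the adaptation of $B$ over time (the core of the GSA-ES) is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 20                      # n^2 < m < 2n^2 controls adaptation speed
B = rng.standard_normal((n, m))   # columns of B form the generating set

z = rng.standard_normal(m)        # z ~ N(0, I_m)
y = B @ z / np.sqrt(m)            # correlated mutation vector in R^n

# The induced mutation distribution is N(0, C) with C = B B^T / m:
C = B @ B.T / m
```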

#### 2.1.2 Completely Derandomized Self-Adaptation in Evolution Strategies

The algorithms described in the previous section, DR1-3, introduce the first level of derandomization of parameter control in evolution strategies, which reduces selection disturbance on the strategy level. The second level of derandomization, or complete derandomization, completely removes this disruption and aims to explicitly realize the original objective of mutative strategy parameter control: to favor strategy parameters that produce selected mutation steps with high probability (Hansen and Ostermeier, 2001). In addition, the second level of derandomization also aims to provide a direct control mechanism for the rate of change of strategy parameters and requires that strategy parameters are left unchanged in the case of random selection.

The *Covariance Matrix Adaptation Evolution Strategy
(CMA-ES)*, introduced in Hansen and Ostermeier (1996), satisfies these conditions
through two key techniques, Covariance Matrix Adaptation (CMA) and
Cumulative Step length Adaptation (CSA). A series of papers (Hansen and
Ostermeier, 1996, 1997, 2001) refined the CMA-ES to use weighted recombination for $\mu >1$.
In this section, we discuss the $(\mu_W, \lambda)$-CMA-ES
as introduced in Hansen and Ostermeier (2001), focusing on CMA and CSA. For a complete tutorial on the
CMA-ES, we refer the interested reader to Hansen (2016).

**The $(\mu_W, \lambda)$-CMA-ES.** As previously mentioned, evolution strategies (commonly) generate search points $x_i$ for $i = 1, \ldots, \lambda$ by sampling from a multivariate normal distribution, which for the CMA-ES, at generation $g$, reads as:

$$x_i^{(g+1)} \sim m^{(g)} + \sigma^{(g)} N\left(0, C^{(g)}\right), \quad i = 1, \ldots, \lambda,$$

where $m^{(g)}$ is the weighted mean of the selected parents, $\sigma^{(g)}$ the global step size, and $C^{(g)}$ the covariance matrix.

**Covariance Matrix Adaptation.** The covariance matrix is updated such that the likelihood of successful search steps is increased; this effectively involves an iterative principal component analysis of selected search steps. Since, in order to guarantee fast search on simple functions, the population size $\lambda$ is required to be small, it is not feasible to produce a reliable estimator of the covariance matrix using only the information available in the current population at generation $g$. The CMA-ES thus uses the selected steps from previous generations to yield a good covariance matrix estimator. To put more weight on the information obtained in recent generations, exponential smoothing with learning rate $c_\mu$ is used in computing the so-called rank-$\mu$ update:

$$C^{(g+1)} = (1 - c_\mu)\, C^{(g)} + c_\mu \sum_{i=1}^{\mu} w_i \, y_{i:\lambda}^{(g+1)} \left(y_{i:\lambda}^{(g+1)}\right)^{T},$$

where $y_{i:\lambda}^{(g+1)} = \left(x_{i:\lambda}^{(g+1)} - m^{(g)}\right)/\sigma^{(g)}$ denotes the $i$-th best selected step and $w_i$ the recombination weights.

**Cumulative Step Length Adaptation.** The update of the global step size $\sigma^{(g+1)}$ is not addressed explicitly via CMA, which only computes the changes in scale for the respective directions. The reasoning for maintaining a global step size in the CMA-ES algorithm is twofold. First, the optimal step size cannot be well approximated by the CMA update alone, and second, the required rate of change for the global step length is much higher than the maximum rate of change of the covariance matrix (Hansen, 2016). Similar to CMA, the update of $\sigma^{(g+1)}$ utilizes an evolution path $p_\sigma^{(g+1)}$, which represents the sum of successive steps over a given backward time horizon. Since the evolution path $p_c^{(g+1)}$ depends on its direction, that is, covariance, a conjugate evolution path $p_\sigma^{(g+1)}$ is constructed via:

$$p_\sigma^{(g+1)} = (1 - c_\sigma)\, p_\sigma^{(g)} + \sqrt{c_\sigma (2 - c_\sigma)\, \mu_{\mathrm{eff}}} \; {C^{(g)}}^{-\frac{1}{2}} \, \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}},$$

where $c_\sigma$ is the learning rate of the path and $\mu_{\mathrm{eff}}$ the variance effective selection mass.
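Putting CMA and CSA together, a compact NumPy sketch of one possible $(\mu_W, \lambda)$-CMA-ES loop, following the default parameter settings of Hansen's (2016) tutorial but omitting details such as the $h_\sigma$ stall of the evolution path, might look as follows. It is illustrative, not a production implementation.

```python
import numpy as np

def cma_es_sketch(f, x0, sigma=0.5, gens=400, rng=None):
    """Compact (mu_W, lambda)-CMA-ES sketch (default settings per Hansen's
    2016 tutorial; rank-one + rank-mu updates and CSA; h_sigma omitted)."""
    rng = np.random.default_rng() if rng is None else rng
    m = np.asarray(x0, dtype=float)
    n = m.size
    lam = 4 + int(3 * np.log(n))                     # population size
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                     # recombination weights
    mu_eff = 1.0 / np.sum(w ** 2)
    c_sigma = (mu_eff + 2) / (n + mu_eff + 5)
    d_sigma = 1 + 2 * max(0.0, np.sqrt((mu_eff - 1) / (n + 1)) - 1) + c_sigma
    c_c = (4 + mu_eff / n) / (n + 4 + 2 * mu_eff / n)
    c_1 = 2 / ((n + 1.3) ** 2 + mu_eff)
    c_mu = min(1 - c_1, 2 * (mu_eff - 2 + 1 / mu_eff) / ((n + 2) ** 2 + mu_eff))
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))
    p_sigma, p_c, C = np.zeros(n), np.zeros(n), np.eye(n)
    for _ in range(gens):
        D2, Bmat = np.linalg.eigh(C)                 # C = B diag(D^2) B^T
        D = np.sqrt(np.maximum(D2, 1e-20))
        inv_sqrt_C = Bmat @ np.diag(1 / D) @ Bmat.T  # C^{-1/2}
        z = rng.standard_normal((lam, n))
        y = z @ np.diag(D) @ Bmat.T                  # rows y_k ~ N(0, C)
        x = m + sigma * y
        order = np.argsort([f(xi) for xi in x])
        y_sel = y[order[:mu]]
        y_w = w @ y_sel                              # weighted recombination
        m = m + sigma * y_w
        # CSA: conjugate evolution path and step-size update
        p_sigma = (1 - c_sigma) * p_sigma + np.sqrt(
            c_sigma * (2 - c_sigma) * mu_eff) * (inv_sqrt_C @ y_w)
        sigma *= np.exp((c_sigma / d_sigma) * (np.linalg.norm(p_sigma) / chi_n - 1))
        # CMA: rank-one + rank-mu covariance update
        p_c = (1 - c_c) * p_c + np.sqrt(c_c * (2 - c_c) * mu_eff) * y_w
        rank_mu = sum(wi * np.outer(yi, yi) for wi, yi in zip(w, y_sel))
        C = (1 - c_1 - c_mu) * C + c_1 * np.outer(p_c, p_c) + c_mu * rank_mu
    return m, f(m)

# Example: a moderately ill-conditioned 5-D ellipsoid (condition number 10^3)
elli = lambda v: float(sum((10 ** (3 * i / (len(v) - 1))) * vi ** 2
                           for i, vi in enumerate(v)))
x_best, f_best = cma_es_sketch(elli, np.ones(5), rng=np.random.default_rng(3))
```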

**Extensions.** Several modifications to the CMA-ES have been
proposed over the years, but the core algorithm remains the current state of
the art in ES, ranking high in benchmarks for numerical optimization (Hansen
et al., 2010), with many successful
applications in real-world optimization problems (Hansen, Niederberger et
al., 2009; Bredèche, 2008; Winter et al., 2008). Early extensions to the CMA-ES
involved the introduction of local restart strategies (Auger et al., 2004), which use multiple criteria to
determine whether to restart the algorithm at certain points during the
optimization run. This gave rise to the introduction of the IPOP-CMA-ES
(Auger and Hansen, 2005), which upon
a restart increases the size of the population, and the BIPOP-CMA-ES (Hansen
et al., 2010), which balances
between larger and smaller population sizes during restarts. The
(1+1)-Cholesky CMA-ES (Igel et al., 2006) uses an implicit method for adapting the covariance
matrix, without using an eigendecomposition but by leveraging the so-called
Cholesky decomposition, which has the effect of reducing the runtime
complexity in each generation from $O(n^3)$ to $O(n^2)$.
This was later extended to the $(\mu ,\lambda )$-Cholesky
CMA-ES (Krause et al., 2016), which
uses triangular Cholesky factorization instead of the inverse square root of
the covariance matrix. Since, in general, the CMA-ES scales poorly with
increasing $n$,
several variants have been proposed for large-scale optimization such as the
sep-CMA-ES (Ros and Hansen, 2008)
and dd-CMA-ES (Akimoto and Hansen, 2020); see Varelas et al. (2018) for a review. The $(\mu_W, \lambda)$-CMA-ES
uses the information of the $\mu$ best individuals in the population to compute the update of $C^{(g+1)}$.
The Active-CMA-ES improves upon this idea by also including the information
from the $\lambda -\mu $ worst individuals and scaling them with negative weights. Alternative
strategies for updating the global step size have been proposed, such as the
two-point step-size adaptation (Hansen, 2008) and the median success rule (ElHara et al., 2013). Mirrored sampling with pairwise
selection (Auger et al., 2011) seeks
to improve the mutation operator by generating pairs of mirrored mutation
steps and selecting only the better of each mirrored pair for updating the
strategy parameters. This was later extended to use orthogonal sampling
(Wang et al., 2014). Many of the
extensions mentioned in this section can be combined arbitrarily to produce
novel CMA-ES variants (van Rijn et al., 2016; de Nobel et al., 2021a), even ones that are not originally proposed for use with
the CMA-ES, such as the usage of quasi-random mutations (Auger et al., 2005). Hybrid CMA-ES algorithms, such
as the HCMA, BIPOP-Active-CMA-STEP (Loshchilov et al., 2013), and IPOP-Active-CMA-ES (Faury et al., 2019), currently rank among the top performers on the
well-known BBOB single-objective noiseless benchmark (Hansen et al., 2021).

This brief overview clarifies how much the subfield of evolution strategies has diversified after the introduction of the CMA-ES in the second half of the 1990s. All variants based on the CMA-ES use historical information about successful mutations to support continual learning of the mutation distribution. Even for this branch of evolutionary computation, gaining a complete overview of the state of the art in the field becomes very difficult, as exemplified by the subset of algorithm variants that have been collected and empirically compared by Bäck et al. (2013).

### 2.2 Genetic Algorithms

The genetic algorithm (GA) is likely still the most widely known variant of
evolutionary algorithms. It was initially conceived in Holland's work on
general adaptive processes (Holland, 1975). The commonly known GA, referred to as the *canonical* GA or *simple* GA (SGA), was
the first example of applying genetic algorithms to parameter optimization (De
Jong, 1975). The SGA applies a binary
representation and proportional selection, which is based on the fitness values
of individuals. It emphasizes the use of recombination, that is, crossover, as
the essential variator for generating offspring, while preferring a low mutation
probability. After the *initialization* step, the procedure of an
SGA can be briefly described as an optimization loop that successively applies
the operators *selection*, *crossover*, *mutation*, and *evaluation* until a stopping
criterion is met. Following years of development, the GA variants that have been
proposed for applications in different domains still primarily follow this
algorithm skeleton, but depending on the application domain (and,
correspondingly, the search space), numerous variations have been proposed. A
complete overview is beyond the scope of this paper, which focuses on continuous
parameter optimization.

#### 2.2.1 Decoding Is Not a Unique Option

A GA used to be characterized by its *encoding* and *decoding* system. The search operators operate on
a *genotype* representation of a solution, while the problem-specific solutions
are represented as the *phenotype*. The SGA applies a binary
genotype representation, that is, $x \in \{0,1\}^n$,
where $n$ is the length of the bit string. When dealing with continuous and discrete
parameter optimization problems, encoding and decoding methods are applied
to transfer solutions between genotype and phenotype.

Nowadays, GAs support representations of arbitrary variable types, such as real, integer, and categorical values, so that an explicit encoding and decoding system is unnecessary for some variants, for example, in complex GA applications in engineering (Dasgupta and Michalewicz, 2013), neural networks (Leung et al., 2003), and permutation-based problems (Whitley, 2019; Larrañaga et al., 1999; Reeves and Yamada, 1998; Costa et al., 2010). Also, novel operators have been proposed to deal with non-binary representations.

#### 2.2.2 Real-Coded Operators

For continuous optimization, the GA variants with non-binary representation usually apply the following collection of real-coded crossover operators:

Blend crossover (BLX) (Eshelman and Schaffer, 1992) samples values of offspring from a uniform distribution, of which the range depends on the values of the two parents.

Simulated binary crossover (SBX) (Deb et al., 1995) simulates the one-point crossover of the binary representation. It creates two offspring based on the linear combination of two parents, and the scale of each parent's value depends on a distribution with a given spread factor.

Unimodal normal distribution crossover (UNDX) (Ono et al., 1999) samples offspring values as linear combinations of realizations of two normal distributions. One distribution's mean is the average of the two parents' values, and its standard deviation is proportional to their distance. The other distribution's mean is 0, and its variance depends on the distance from a third parent to the line connecting the two parents.

Recent works mostly focus on the modification and self-adaptivity of the distributions that are used to sample offspring. For example, the simplex crossover (SPX) (Tsutsui et al., 1999) samples offspring values as combinations of values sampled uniformly at random from pairs formed by multiple parents, which can be seen as an extension of BLX to multiple parents. Moreover, self-adaptive and dynamic models have been proposed for SBX (Deb et al., 2007; Chacón and Segura, 2018; Pan et al., 2021). The parent-centric crossover (PCX) (Deb, Anand, et al., 2002) modifies UNDX by introducing a parent-centric concept, and PNX (Ballester and Carter, 2004) follows the idea of PCX and simplifies the normal distribution used to sample offspring based on the deviation of the two parents.
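As an illustration of a real-coded operator, a per-variable sketch of SBX might look as follows; the distribution index $\eta = 15$ is a commonly used default assumed here.

```python
import numpy as np

def sbx_crossover(p1, p2, eta=15.0, rng=None):
    """Simulated binary crossover (SBX) sketch, per-variable form.

    eta is the distribution index: larger values keep offspring closer
    to the parents (eta = 15 is an assumed common default)."""
    rng = np.random.default_rng() if rng is None else rng
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    u = rng.random(p1.shape)
    # spread factor beta sampled from the SBX probability distribution
    beta = np.where(u <= 0.5,
                    (2 * u) ** (1 / (eta + 1)),
                    (1 / (2 * (1 - u))) ** (1 / (eta + 1)))
    c1 = 0.5 * ((1 + beta) * p1 + (1 - beta) * p2)
    c2 = 0.5 * ((1 - beta) * p1 + (1 + beta) * p2)
    return c1, c2

c1, c2 = sbx_crossover([0.0, 1.0], [1.0, 0.0], rng=np.random.default_rng(4))
```

Note that SBX is mean-preserving by construction: the two offspring always satisfy $c_1 + c_2 = p_1 + p_2$.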

For real-coded mutation operators, one of the earliest related works (Janikow et al., 1991; Neubauer, 1997), known as non-uniform mutation, increases and decreases variable values with equal probability. The mutation distance, that is, the deviation between parent and offspring values, is proportional to the distance between the parent value and the boundary, and this distance approaches 0 after a large number of generations. The more recently proposed wavelet mutation (Ling and Leung, 2007) samples the mutation distance using the Morlet wavelet, which is distributed symmetrically with an expected value of zero. Moreover, novel mutation operators (Korejo et al., 2010; Temby et al., 2005) search towards promising regions of the space based on progress feedback from previous updates of the population.

#### 2.2.3 Developments in Binary Representation

At the same time, GAs with the binary representation are still applied in many domains, for example, pseudo-Boolean optimization. For the binary representation, mutation flips the chosen bits of a parent, and crossover swaps the chosen bits of the two parents.

The classic *standard bit mutation* flips each bit of an
individual with probability $p_m \in [0,1]$.
It was seen as a “background operator” with a small $p_m \approx 10^{-3}$,
later $p_m \approx 1/n$,
in early GAs (Holland, 1975;
Bäck and Schwefel, 1993). We
can rephrase the mutation procedure as flipping $\ell \in \{0, \ldots, n\}$ bits chosen uniformly at random, where standard bit mutation samples $\ell$ from a binomial distribution $\mathrm{Bin}(n, p_m)$.
Recent work on mutation-only EAs studies the choice of $\ell$.
For example, the so-called fast GA applies a heavy-tailed power-law
distribution to sample $\ell$ (Doerr et al., 2017), and a
normalized bit mutation samples $\ell$ using a normal distribution in Ye et al. (2019). Moreover, it has been shown that the optimal value of $\ell$ depends on the problem dimensionality $n$ (Witt, 2013; Doerr et al., 2019).
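The view of mutation as “sample $\ell$, then flip $\ell$ uniformly chosen bits” can be sketched directly. The truncation of the power law at $n/2$ and the exponent $\beta = 1.5$ follow common choices for the fast GA and are assumed here for illustration.

```python
import numpy as np

def sample_flip_count(n, dist="binomial", beta=1.5, rng=None):
    """Sample the number of bits l to flip.

    'binomial' is standard bit mutation with p_m = 1/n; 'power-law'
    sketches the heavy-tailed choice of the fast GA, with
    l in {1, ..., n//2} and P(l) proportional to l^(-beta)."""
    rng = np.random.default_rng() if rng is None else rng
    if dist == "binomial":
        return int(rng.binomial(n, 1.0 / n))
    ls = np.arange(1, n // 2 + 1)
    p = ls ** (-beta)
    return int(rng.choice(ls, p=p / p.sum()))

def mutate(x, ell, rng):
    """Flip ell distinct, uniformly chosen bits of x."""
    y = x.copy()
    idx = rng.choice(x.size, size=ell, replace=False)
    y[idx] ^= 1
    return y

rng = np.random.default_rng(5)
x = np.zeros(100, dtype=int)
ell = sample_flip_count(100, dist="power-law", rng=rng)
y = mutate(x, ell, rng)
```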

Crossover, another key operator of the GA, has important characteristics that mutation lacks (Spears, 1993). Theoretical studies have tried to explain how crossover works in a GA and indicate that crossover can capitalize on the population and on mutation (Doerr et al., 2015; Dang et al., 2017; Corus and Oliveto, 2017; Sudholt, 2017; Pinto and Doerr, 2018; Antipov et al., 2020). The value $p_c \in [0.6, 1.0]$ was mentioned in Bäck and Schwefel (1993) for the crossover probability, based on earlier empirical studies. However, a recent study shows that this suggested value is not optimal in some cases; for example, the optimal value changes with population size and dimensionality for the LeadingOnes problem (Ye et al., 2020).

Note that many theoretical studies exist about GAs for discrete optimization (Doerr and Neumann, 2020). Also, much evidence indicates that the optimal parameters (i.e., mutation rate, crossover probability, and population size) are dynamic, which relates to the topic of parameter control (Karafotias et al., 2014; Aleti and Moser, 2016).

#### 2.2.4 Learning in Genetic Algorithms

The *schema theorem* plays an important role in the early
theoretical study of genetic algorithms. It considers the hypothesis that
good GAs combine *building blocks*, which are short,
low-order, and above-average schemata, to form better solutions (Goldberg, 1989; White, 2014). Though the schema theorem has
some drawbacks (Eiben and Rudolph, 1999)---for example, the building-block hypothesis implicitly
assumes that the problems are separable, and the theorem cannot explain the
dynamic behavior and limits of GAs---it reveals that one key issue for GAs is the
destruction of building blocks. Furthermore, to take “correct”
actions, contemporary methods extract and apply useful information learned
from the population of the GA, though this information may not necessarily be
about building blocks.

*Linkage Learning* was introduced to form a novel crossover
operator (Harik and Goldberg, 1996)
that avoids destroying building blocks. Later on, hierarchical clustering
was applied to learn a linkage tree, creating the building blocks for
the crossover of the linkage tree GA (Thierens, 2010), which belongs to the family of gene-pool
optimal mixing EAs (GOMEAs) (Thierens and Bosman, 2011). The GOMEA learns a general linkage model of the maintained
population, called the family of subsets (FOS). Each element of a FOS
is a set of variables, and for each such element, GOMEA varies
the values of the corresponding variables. Many GOMEA variants exist,
and the algorithms have been applied in both discrete and continuous
optimization (Bosman and Thierens, 2013; Bouter et al., 2017).

The compact GA (CGA) (Harik et al., 1999) is another GA extension inspired by the idea of building blocks. However, the CGA represents the population as a probability distribution over the set of solutions. New solutions are sampled from the distribution, and the distribution is updated in each generation. Nowadays, this technique is seen as part of the family of estimation of distribution algorithms (EDAs) (Hauschild and Pelikan, 2011). EDAs maintain a stochastic model of the search space, and this model can be built for discrete, continuous, and permutation-based solutions.
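A minimal sketch of the CGA on a pseudo-Boolean maximization problem can make this concrete; OneMax serves as the example, and the clipping of the probability vector to $[1/n, 1 - 1/n]$ is a common safeguard against premature fixation assumed here.

```python
import numpy as np

def compact_ga(f, n, pop_size=50, gens=2000, rng=None):
    """Compact GA sketch for pseudo-Boolean maximization: the population
    is replaced by a probability vector p, two competitors are sampled per
    step, and p moves toward the winner by 1/pop_size in every position
    where the two competitors differ."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.full(n, 0.5)                        # model: P(bit i = 1)
    for _ in range(gens):
        a = (rng.random(n) < p).astype(int)    # sample two competitors
        b = (rng.random(n) < p).astype(int)
        winner, loser = (a, b) if f(a) >= f(b) else (b, a)
        step = (winner - loser) / pop_size     # +-1/pop_size where they differ
        p = np.clip(p + step, 1.0 / n, 1.0 - 1.0 / n)  # margins avoid fixation
    return p

onemax = lambda x: int(x.sum())
p = compact_ga(onemax, n=20, rng=np.random.default_rng(6))
```

After enough iterations on OneMax, the model probabilities drift toward the all-ones optimum.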

#### 2.2.5 Outlook

As before, the GA can still be represented by the iterative sequence of *selection*, *variation*, and *evaluation*. A significant
difference from the SGA introduced in Bäck and Schwefel (1993) is that contemporary GAs do
not necessarily follow the generational model or maintain a constant
population size. Moreover, more techniques are available for the variators
to create new solutions. Novel operators have been proposed for mutation and
crossover. In addition, contemporary variators intelligently learn to create new solutions
using linkage information. GAs have been applied in various
domains, such as
permutation-based problems (Oliver et al., 1987; Mathias and Whitley, 1992; Nagata and Kobayashi, 1997; Whitley et al., 2010; Goldberg and Lingle, 1985), although we cannot list all the applications. While more and more techniques appear in various
problem-specific domains, theoretical work is developing to help us
understand how the GA works. However, providing guidelines for selecting the
optimal operators for a given problem remains challenging, and the GA
community is putting effort towards this goal (Bartz-Beielstein et al., 2020; Ye, 2022).

### 2.3 Evolutionary Programming

The development of evolutionary programming (EP) also dates back to the mid-1960s. Initial work was presented by Fogel et al. (1965, 1966) for evolving finite state machines (FSMs) with the aim of solving prediction tasks generated from Markov processes and non-stationary time series. EP was originally proposed as an alternative method for generating machine intelligence (Fogel, 1994). Genetic variation was performed by mutating the corresponding discrete, finite-alphabet representation with uniform random permutations. Each candidate machine in the parent population produced one offspring by mutation. The best-performing half of the parents and offspring was then selected for the following generation. This selection method is analogous to the ($\mu$ + $\lambda$) strategy commonly used in ES. In addition, the original work on evolving finite state machines also offered the potential for exchanging parts of finite state machines, which one might consider a proposal for a “crossover” operator (Fogel et al., 1966).

D. B. Fogel later enhanced EP by adapting genetic variation on real-valued
variables with normally distributed mutations (see e.g., Fogel et al., 1990; Fogel, 1992, 1993). EP
traditionally did not use any form of crossover; therefore, the evolutionary
process typically relied on a set of mutation techniques that have been proposed
over the last decades. In general, EP viewed each candidate solution as a *reproducing population*, whereby the mutation operator used
simulates all changes that occur between the current and the subsequent
generation. Like evolutionary strategies, EP stressed the role of mutation in
the evolutionary process. Traditionally, EP used normally distributed mutation
as the primary search operator. During the course of the 1990s, the annual
conference on EP was held until 1998. Several studies in the mid- and late-1990s
showed that using a Cauchy random variable can enhance performance for certain
parameter optimization problems (Yao and Liu, 1996). Moreover, a multi-operator approach to EP was proposed in
this period. It was shown that this multi-mutational self-adaptive EP strategy
was able to adjust the use of both the Gaussian and the Cauchy mutations, and was
thereby able to provide greater rates of optimization when compared to the sole
use of Cauchy mutations (Saravanan and Fogel, 1997). A runtime analysis of EP using Cauchy mutations has been
presented by Huang et al. (2010).

EP has been very successful in evolving strategies for certain games (see, e.g.,
work by Chellapilla and Fogel, 1999a).
Most remarkable is the use of EP to evolve neural networks that were able to
play checkers without relying on expert knowledge (Chellapilla and Fogel, 1999b; Fogel, 2000). In this approach, EP created a population of 15
artificial neural networks. Each candidate network created an offspring with
Gaussian mutation that was controlled with a self-adaptive step size. To
validate the strength of the best-evolved networks, the created artificial
checkers players operated within a computer program called *Blondie24* on the online board gaming site Zone.com. Over a
two-month period, 165 games against human opponents were played. The estimated
rating of Blondie24 had a mean of 2045.85 with a standard error of 0.48,
where master level starts at a rating of 2200. The best result from the
165 games was seen when the network defeated a player with a rating of 2173,
which is only 27 points away from the master level. The story of Blondie24 is
described in David B. Fogel's book *Blondie 24—Playing at
the Edge of AI* (Fogel, 2002). Later on, Fogel applied the approach to chess and created
Blondie25, a chess program that was able to win over a nationally ranked human
chess player and also showed some wins (one out of twelve games for black, two
of twelve for white) over Fritz 8.0, which was the number five ranked program in
the world at that time (Fogel et al., 2006; see also Fogel, 2023).

When considering parameter optimization problems as defined in Equation (1), early versions of ES and EP
shared quite a few similarities, as we already outlined in Bäck and
Schwefel (1993). Since about the year 2000,
algorithms based on the concept of the CMA-ES have been more prominent for this type of
application than EP-based approaches, and today one finds that the term *evolutionary programming* is sometimes used to denote
algorithms for continuous parameter optimization that are solely based on
mutation (Abido and Elazouni, 2021), or the
term is confused with *genetic programming* (Jamei
et al., 2022).

## 3 Other Important Developments

In this section, we discuss a few developments that we consider of strong importance for the field. Again, this is a subjective selection, and a wide range of topics cannot be covered in this paper.

### 3.1 Parameter Control

It is truly amazing to see that early parameter control techniques have now turned into a standard approach in evolutionary computation. These include the 1/5-th success rule (Rechenberg, 1973) and Hans-Paul Schwefel's early approaches for adapting variance(s) and covariances of an $n$-dimensional Gaussian distribution (Schwefel, 1981), later proposed in a similar way for genetic algorithms and binary search spaces $\{0,1\}^n$ (Bäck and Schütz, 1996; Kruisselbrink et al., 2011), as well as for mixed search spaces (Li et al., 2013). Thanks to the wonderful progress that was achieved in the theoretical analysis of evolutionary algorithms in the past 30 years, we now also have a fairly good understanding of why such approaches are helpful, how to analyze them, and what the time complexity and optimal parameter settings for some (classes of) test functions look like, both for specific pseudo-Boolean optimization problems and for specific continuous parameter optimization problems. This field of research is huge now, and giving an overview here is impossible; any selection of just a few research works would be incomplete and subjective, and therefore we abstain from trying to do so.
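
To make the idea concrete, the 1/5-th success rule can be sketched for a (1+1)-ES in a few lines. The following Python sketch is illustrative only: the function names, the adaptation factor `a`, and the choice to adapt every $n$ mutations are common textbook choices, not a canonical implementation.

```python
import random

def one_plus_one_es(f, x, sigma=1.0, budget=5000, a=0.817):
    """(1+1)-ES with the 1/5-th success rule (after Rechenberg, 1973).

    Every n mutations, sigma is increased if more than 1/5 of them
    succeeded and decreased otherwise; `a` is the adaptation factor.
    """
    n = len(x)
    fx = f(x)
    successes, trials = 0, 0
    for _ in range(budget):
        y = [xi + sigma * random.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        trials += 1
        if fy <= fx:               # offspring replaces parent on success
            x, fx = y, fy
            successes += 1
        if trials == n:            # adapt the step size every n mutations
            sigma = sigma / a if successes / n > 0.2 else sigma * a
            successes, trials = 0, 0
    return x, fx

sphere = lambda v: sum(c * c for c in v)
best, fbest = one_plus_one_es(sphere, [5.0] * 10)
```

On the sphere function, the rule keeps the success rate near 1/5 and yields the well-known log-linear convergence of the (1+1)-ES.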

### 3.2 Multimodal Optimization, Novelty Search, and Quality Diversity Algorithms

In this subsection, we consider the development of multimodal optimization over the
last 30 years and a relatively new generalization of such optimization referred
to as *Novelty Search* and *Quality Diversity* algorithms. First, we introduce and motivate multimodal optimization,
highlight the development of niching as one of the most important concepts in
multimodal optimization, and describe several algorithms
that use this concept, including the most promising state-of-the-art algorithm.
Second, we show how the idea of multimodal optimization developed into a more
general setting by introducing and motivating Novelty Search and Quality
Diversity. Third, we introduce the Multidimensional Archive of Phenotypic Elites
(MAP-Elites) algorithm as one of the historically first and most influential
methods in quality diversity, and discuss its limitations and how they can be
overcome. We finish with a discussion of state-of-the-art quality diversity
algorithms and their application scenarios.

#### 3.2.1 Multimodal Optimization

Traditional single-objective optimization focuses on obtaining a solution of the best possible quality with regard to the given objective function in the smallest possible time. However, this exact setup may not be what engineers and other practitioners need when they apply optimization at the beginning of a construction process. For this kind of application, it is more relevant to explore possible trade-offs that give sufficiently good performance in acceptable time (Bradner et al., 2014). Such exploration may be seen as the task of obtaining a diverse set of solutions with a fitness value above a given threshold. This task is called multimodal optimization (MMO), since it is common for the considered objective functions to have many local optima, or modes.

A number of algorithms were developed to approach MMO. The first algorithms
of this type were proposed back in the 1970s (De Jong, 1975). They were extensions of genetic algorithms by *niching* methods that aim to maintain diversity in
the population (Preuss, 2015).
Work in the mid-1990s improved these niching methods. A *clearing* method was proposed (Pétrowski, 1996), in which similar individuals are
penalized when their number exceeds a given threshold. Restricted
tournament selection (Harik, 1995) modifies the selection procedure for creating the new
population by restricting the competition to take place only between similar offspring.
This ensures that the population contains a diverse set of individuals, even if
the quality of some of them is worse than that of others. Both these niching methods
use distance in the search space as the criterion of similarity. The
threshold (or niche radius) for this criterion in particular, and the other
parameters of these niching algorithms in general, cannot be efficiently
determined a priori without assumptions about the optimized function. A
series of works followed to address this problem; in particular, Shir and
Bäck (2006) and Shir et al.
(2007) proposed to adaptively
control this threshold during the optimization process. The latter work also
introduced niching into the CMA-ES for the first time. A comprehensive
survey of niching variants can be found in Das et al. (2011).
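
As an illustration, the core of the clearing procedure of Pétrowski (1996) can be sketched in a few lines. The details below (maximization, and assigning each individual to the first niche center found within the radius) are our simplifying choices for the sketch.

```python
import math

def clearing(population, fitness, radius, kappa=1):
    """Clearing sketch (after Pétrowski, 1996), for maximization.

    Within each niche of the given radius, only the best `kappa`
    individuals keep their fitness; the rest are "cleared" to 0.
    Returns the cleared fitness values.
    """
    order = sorted(range(len(population)), key=lambda i: -fitness[i])
    cleared = list(fitness)
    niches = []  # (index of niche center, mutable winner count)
    for i in order:                          # process from best to worst
        for center, count in niches:
            if math.dist(population[i], population[center]) < radius:
                if count[0] < kappa:         # still a winner in this niche
                    count[0] += 1
                else:                        # niche full: clear fitness
                    cleared[i] = 0.0
                break
        else:                                # no nearby niche: new center
            niches.append((i, [1]))
    return cleared
```

With `kappa=1`, only the single best individual of each niche keeps its fitness, which implements the penalization of similar individuals described above.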

Apart from niching in evolutionary algorithms, the same idea of diversity in the population can be applied to other metaheuristics. Particle swarm optimization was extended with niching by Qu et al. (2012); their method uses the Euclidean distance-based neighborhood of every particle and picks several locally best-performing particles to guide the search of each particle. Thomsen (2004) proposed to extend differential evolution with niching, using the classical idea of penalizing individuals that are close to each other. Surveys by Das et al. (2011) and Li et al. (2016) thoroughly describe other important methods and ideas for extending existing algorithms to MMO settings. According to Preuss et al. (2021), one of the leaders in the MMO competitions held at CEC/GECCO from 2016 to 2019, and the winner among niching algorithms when a larger budget is given, is HillVallEA (Maree et al., 2018). This algorithm performs extra evaluations of the objective function to check for the existence of worse solutions (hills) between a given pair of solutions, in order to decide whether they belong to the same niche (valley).

#### 3.2.2 Novelty Search

In the optimization community, mainly in evolutionary robotics, diversity
with regard to the behavior of the solutions started to be considered around
2010 (Lehman and Stanley, 2008). In
such domains, the individuals do not describe the complete solution; the
complete description is only obtained during the evaluation of the objective
function, for example, by simulating robot movements. The set of all
possible descriptions is called *behavior space* (BS). The
first algorithms that approached this modified goal aimed to explore this
space, because the connection with genotype space is not known for the
optimizer. Such approaches, known as novelty search algorithms (Lehman and
Stanley, 2011a), changed the goal
from obtaining high-quality solutions to the exploration of BS. Instead of
optimizing the single given value of the objective function, they optimize
the distance in BS between the sampled solution and solutions in the
previously collected archive. Illuminating the landscape of the BS produces a
diverse set of solutions of varying quality. However, only solutions
of sufficiently good quality are interesting for practical
applications.

#### 3.2.3 Quality-Diversity

The family of algorithms that search for a distributed set of sufficiently
good solutions in BS is known as *Quality-Diversity* (QD)
algorithms (Pugh et al., 2015).
Following Chatzilygeroudis et al. (2021), the goal of the QD algorithms may be seen as a
generalization of MMO, such that the objective function is maximized as in
the MMO case but the diversity is targeted with regard to the given behavior
descriptor function. One of the first QD algorithms, proposed in 2011, is an
extension of novelty search that optimizes two objectives: a novelty objective
(the same as in the original novelty search) and a local competition objective
(the number of nearby solutions with lower objective value) (Lehman and
Stanley, 2011b). Optimizing
these objectives generates a set of diverse solutions with
good local quality. At approximately the same time, the MAP-Elites (Mouret
and Clune, 2015) algorithm was
proposed. Thanks to the simplicity of its idea and its efficiency in practice, this
algorithm gained a lot of attention in the optimization community and became a
classical QD optimizer. Given its great influence on QD
optimization, we describe it in more detail in the next
paragraph.

MAP-Elites maintains an archive of previously obtained solutions in a special container. The container discretizes the BS and splits it into cells. Every solution in the container belongs to the cell that corresponds to its behavior descriptor, and it eliminates from the container any worse solution (with respect to the objective function) that shares the same cell. To update the container, MAP-Elites follows the general template of evolutionary algorithms. In the original version, random selection is applied to choose a number of solutions from the container; mutation and crossover are then applied to the selected individuals to create a new generation of individuals; the objective function is evaluated for each of them to obtain their behavior descriptors and performance; and finally the container is updated.
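
The loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the behavior space is reduced to one dimension, crossover is omitted, and all parameter values are arbitrary illustrative choices.

```python
import random

def map_elites(evaluate, dim, n_cells, bounds, iterations=5000, sigma=0.1):
    """Minimal MAP-Elites sketch (after Mouret and Clune, 2015).

    `evaluate` maps a genome to (fitness, behavior descriptor in [0, 1]).
    The container is a 1-D grid over the behavior descriptor; each cell
    keeps the best solution whose descriptor falls into it.
    """
    lo, hi = bounds
    archive = {}  # cell index -> (fitness, genome)

    def insert(genome):
        fitness, behavior = evaluate(genome)
        cell = min(int(behavior * n_cells), n_cells - 1)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, genome)   # elite replacement

    for _ in range(100):                         # random initialization
        insert([random.uniform(lo, hi) for _ in range(dim)])
    for _ in range(iterations):                  # select, mutate, insert
        _, parent = random.choice(list(archive.values()))
        child = [min(hi, max(lo, g + random.gauss(0.0, sigma)))
                 for g in parent]
        insert(child)
    return archive
```

The returned archive is exactly the "illuminated map": one elite per behavior cell, jointly covering the behavior space.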

The standard version of MAP-Elites was applied in a number of domains (see Section 3.1 of Chatzilygeroudis et al., 2021). However, it exhibits two limitations: (i) a high-dimensional BS does not allow for efficient storage of the container; and (ii) an expensive objective function restricts the number of evaluations that the QD algorithm can afford to make in reasonable time. Different modifications of this algorithm were proposed to overcome these problems; here we mention only the first ones. Vassiliades et al. (2017) addressed the problem of a high-dimensional BS by using Voronoi diagrams to store the individuals in the container instead of the traditional grid, whose memory requirement grows exponentially with the number of dimensions. Gaier et al. (2017) addressed the problem of optimizing an expensive objective function by building a surrogate of the function and applying EGO for optimization.

QD algorithms are actively developed to satisfy the needs of modern applications. For the case in which the BS is not easy to define before the optimization, Cully and Demiris (2018) proposed to apply autoencoders to automatically learn the latent space of digit images and to use it as the BS of a robotic hand that learns to write digits. For the case of a noisy function, Flageat and Cully (2020) proposed to store multiple individuals in every cell of the MAP-Elites container, to select only the individuals with the highest quality, and to replace a randomly selected individual in the cell with the freshly added individual. This procedure ensures that the search is not misled by the deceptiveness of a noisy environment. Pierrot et al. (2022) considered the case of multiobjective functions and proposed to maintain a Pareto front in every cell of MAP-Elites' grid. A benchmarking study of this algorithm showed that it maintains a diverse set of individuals and at the same time manages to outperform (with respect to the global hypervolume) traditional multiobjective algorithms in many cases, while not losing significantly on the rest of the considered functions. Urquhart and Hart (2018) first proposed a MAP-Elites-based QD algorithm for discrete settings and applied it to workforce scheduling and routing problems, obtaining new best-known solutions for several problem instances.

All of these areas are strongly related, as analyzed and compared also in Hagg et al. (2020). Their use to support engineers in finding diverse solutions to engineering design problems (Hagg et al., 2018) and in involving the user experience in the ideation process (Hagg et al., 2019) are highly relevant to works in real-world optimization.

### 3.3 Constrained Optimization

Another highly relevant topic for parameter optimization is the handling of constraints, as these typically occur in real-world optimization settings. Consequently, the past 30 years have seen tremendous development in constraint-handling approaches (Coello, 2022), which, 30 years ago, were often limited to the simple “death penalty” approach of discarding infeasible offspring. Ordered from the most straightforward to the more complex ones, we categorize contemporary approaches into six groups.

**(i) Feasible solution preference** is the most straightforward
strategy. It works by always preferring feasible solutions over infeasible
solutions (also known as the death penalty; Kramer, 2010), and by preferring solutions with a small constraint
violation over solutions with large constraint violations (Deb, 2000; Mezura-Montes and Coello, 2005). A downside of this method is that
little information from infeasible solutions is preserved in the population during
the optimization process. Consequently, algorithms using this method can get
stuck in local optima.
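
This preference order is often implemented as a pairwise comparison, as in Deb's (2000) feasibility rules. A minimal sketch, assuming minimization and a precomputed total constraint violation per solution (the function name is our illustrative choice):

```python
def feasibility_compare(f_a, viol_a, f_b, viol_b):
    """Pairwise feasibility rules for minimization (after Deb, 2000).

    Returns True if solution a is preferred over solution b:
    (1) a feasible solution beats an infeasible one;
    (2) between two feasible solutions, the lower objective wins;
    (3) between two infeasible solutions, the smaller total
        constraint violation wins.
    """
    if viol_a == 0.0 and viol_b == 0.0:   # both feasible: compare objective
        return f_a < f_b
    if viol_a == 0.0 or viol_b == 0.0:    # exactly one feasible
        return viol_a == 0.0
    return viol_a < viol_b                # both infeasible: compare violation
```

Such a comparator can be dropped into any selection operator, which is precisely why the approach is so straightforward, and why it discards the information carried by infeasible solutions.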

**(ii) Penalized unconstrained optimization** uses a so-called penalty
function that assigns a cost to constraint violations (Mezura-Montes and Coello, 2011). Penalty functions can be
static, dynamic, adaptive, or even based on self-adaptive fitness formulations
and stochastic ranking approaches. Penalty functions for constraints, however,
lead to a strong bias toward feasible solutions. This is not always ideal: for
example, a high cost assigned by a bad penalty function could cause a
selection operator to prefer a feasible solution far away from the optimum over
a solution very close to the optimum that is only slightly infeasible. All the
information from this promising solution would be lost and not maintained in the
population. Unfortunately, the best penalty function cannot be known upfront,
and tuning the hyper-parameters of the penalty function for each constraint
requires a lot of additional function evaluations (Richardson et al., 1989).
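
A static penalty, the simplest of these variants, can be sketched as a wrapper around the objective function. The penalty weight below is an arbitrary illustrative choice; as noted above, a badly chosen weight biases selection.

```python
def penalized(f, constraints, weight=100.0):
    """Static penalty sketch: minimize f(x) + weight * total violation.

    `constraints` are functions g with the convention g(x) <= 0 when
    feasible; only positive values (violations) are penalized.
    """
    def wrapped(x):
        violation = sum(max(0.0, g(x)) for g in constraints)
        return f(x) + weight * violation
    return wrapped

# Example: minimize x^2 subject to x >= 1, written as g(x) = 1 - x <= 0.
pf = penalized(lambda x: x * x, [lambda x: 1.0 - x])
```

Dynamic and adaptive schemes replace the fixed `weight` by a schedule or by feedback from the population, which is exactly the hyperparameter tuning burden mentioned above.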

**(iii) Repair mechanisms** aim to repair solutions that violate the
constraints, and are often based on constraint-specific characteristics. Since
they are often problem-dependent, a range of different repair mechanisms has
been proposed (Salcedo-Sanz, 2009), for
example, gradient-based repair mechanisms (Chootinan and Chen, 2006; Zahara and Kao, 2009) and strategies that reduce the probability to
generate infeasible offspring (Liepins and Vose, 1990; Arnold and Hansen, 2012).

**(iv) Constraint separation** considers both the constraint and the
original problem as separate optimization problems and aims to solve each of
these independently. This can, for example, be achieved via coevolution, where
two populations interact with each other (Paredis, 1994), where one population is focused on the fitness of
solutions, and the other on constraint satisfaction. Another frequently used
approach considers the constraints as additional objectives (Surry et al., 1997; Coello and Montes, 2002). After optimizing the multiobjective
(see Section 3.5) problem, the solution
with the best true objective function value without any constraint violation can
be selected. A downside of this method is an unnecessarily more complex
objective space and a strong bias towards the regions where the constraints are
violated in the optimization process.

**(v) Model-assisted constraint optimization** uses machine learning to
identify feasible solutions, by training a classification model or fitting a
surrogate model with the already evaluated solutions of the constraint function.
The trained models can then be used to classify or predict which solutions are
feasible and which are not. Simple classification algorithms (such as SVMs; see
Poloczek and Kramer, 2013) can be used
in this approach, but more advanced regression or surrogate models (see Section 3.4) can also be used. A benefit of
model-assisted constraint optimization is that no information is lost from
evaluated solutions and that a lot of function evaluations can be saved.
However, this often comes at the cost of a substantial increase in computational
resources.

**(vi) Hybrid methods** exist that make use of a combination of the
above-mentioned techniques. Noteworthy mentions are, for example, SACOBRA, which
uses a repair mechanism in combination with model-assisted constraint
optimization techniques (Bagheri et al., 2017) and automatic constraint-handling technique selection based on
problem characteristics (Mallipeddi and Suganthan, 2010).

These constraint-handling strategies are not necessarily limited to being used solely in single-objective optimization, but can also be applied to find feasible solutions for multiobjective problems. The only clear difference between them is a different way of handling the objectives (see Section 3.5 for more information).

### 3.4 Surrogate-Assisted Approaches

Optimization based on computer simulations or real-world experiments can be very
expensive in terms of time and money. Therefore, the automotive, marine,
aviation, and other engineering fields that rely on computer simulations
often employ *surrogate-assisted* optimization algorithms.

Surrogate-assisted evolutionary computation is a variant of EC that utilizes a
surrogate model, also known as a *metamodel* or *response
surface model*, to guide the search process. In traditional EC, the
objective function, which maps a set of input parameters to a single output
value, is evaluated directly to determine the fitness of candidate solutions. In
surrogate-assisted EC, the objective function is approximated using a surrogate
model, which is trained on a subset of the evaluated points and can be used to
predict the objective function value at any point in the search space. The use
of a surrogate model allows the search process to be more efficient by reducing
the number of function evaluations required to find good solutions. This is
particularly useful when the objective function is computationally expensive to
evaluate or when the search space is large and complex.

The original idea of using surrogate models in optimization comes from the *Efficient Global Optimization* (EGO) framework
(*Bayesian optimization*) by Jones et al. (1998). In EGO, a Gaussian Process
Regression (GPR) model is used as a surrogate model where the GPR is exploited
to select a promising candidate solution. At each iteration, the most promising
(yet unseen) candidate solution is evaluated on the real objective function. In
surrogate-assisted EC, similar techniques are used to reduce the number of real
objective evaluations by using the surrogate models as a filter to remove the
most likely bad individuals from the population before the real evaluation.
Surrogate-assisted EC was mainly motivated by reducing the time-complexity of
expensive problems such as aerodynamic design optimization (e.g., Jin and
Sendhoff, 2009) or drug design (e.g.,
Douguet, 2010).
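
The candidate-selection step of EGO relies on an acquisition function; the classical choice in Jones et al. (1998) is *expected improvement*, which has a closed form in terms of the GPR's predictive mean and standard deviation. A minimal sketch for minimization, using only the standard normal pdf and cdf:

```python
import math

def expected_improvement(mu, sigma, f_min):
    """Expected improvement acquisition (after Jones et al., 1998).

    `mu` and `sigma` are the surrogate's predictive mean and standard
    deviation at a candidate point; `f_min` is the best value observed
    so far (minimization). Candidates maximizing EI are evaluated next.
    """
    if sigma <= 0.0:
        return 0.0                      # no predictive uncertainty left
    z = (f_min - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # N(0,1) pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # N(0,1) cdf
    return (f_min - mu) * cdf + sigma * pdf
```

EI balances exploitation (low predicted mean) against exploration (high predictive uncertainty), which is why one expensive true evaluation per iteration can suffice.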

Different strategies for using surrogate models are often known as model management or evolution control. This is specifically challenging when the problems are of high dimension. The first works in this field date from the mid-1990s (e.g., Ratle, 1998; Yang and Flockton, 1995). Surrogate models can be applied to almost all operations of EAs, such as population initialization, mutation (Abboud and Schoenauer, 2001), crossover (Anderson and Hsu, 1999), and fitness evaluations (Rasheed and Hirsh, 2000).

#### 3.4.1 Advances in Model Management Strategies

Model management strategies for fitness evaluations can generally be divided into population-based, individual-based, and generation-based strategies (Jin, 2005). Population-based means that more than one subpopulation coevolves. Each subpopulation uses its own surrogate model for fitness evaluations. Individual-based methods use a surrogate for some of the individuals in a generation and use the real function on the rest of the individuals (Jin et al., 2000). Generation-based means that the surrogate is used for fitness evaluations in some of the generations, and the real objective function in the rest (Ratle, 1998; Jin et al., 2002).
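
A generation-based strategy, for instance, can be sketched as follows. All names here (`ask`, `tell`, `surrogate_fit`, `surrogate_predict`) are illustrative placeholders for the EA and surrogate interfaces, not a real library API.

```python
def generation_based_control(ask, tell, f_true, surrogate_fit,
                             surrogate_predict, generations=50, period=5):
    """Generation-based evolution control sketch (cf. Ratle, 1998;
    Jin et al., 2002).

    Every `period`-th generation is evaluated with the true objective
    `f_true` (and those points are used to refit the surrogate); all
    other generations are evaluated with the surrogate's predictions.
    """
    archive = []                              # truly evaluated points
    for gen in range(generations):
        population = ask()                    # EA proposes a generation
        if gen % period == 0:                 # expensive true evaluations
            fitnesses = [f_true(x) for x in population]
            archive += list(zip(population, fitnesses))
            surrogate_fit(archive)            # refit on all true data
        else:                                 # cheap surrogate evaluations
            fitnesses = [surrogate_predict(x) for x in population]
        tell(population, fitnesses)           # EA updates its state
    return archive
```

Individual-based control follows the same pattern but splits each generation between true and surrogate evaluations instead of alternating whole generations.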

#### 3.4.2 Advances in Surrogate Models

The original EGO framework uses Gaussian Process Regression, or Kriging (Rasmussen and Williams, 2006), as the default surrogate model due to its capability of providing uncertainty quantification. However, for surrogate-assisted EC this uncertainty is not always required and other machine-learning models can be used as well. There are also many advances in EGO surrogate models that use Kriging approximation methods to deal with larger numbers of dimensions or samples (Van Stein et al., 2020; Raponi et al., 2020; Antonov et al., 2022). Next to Kriging(-like) methods, popular choices are Radial Basis Functions (Sun et al., 2018), artificial neural networks (Jin et al., 2002), polynomial regression (Zhou et al., 2005), and support vector regression (Wang et al., 2015). Recent advances also include the use of surrogate ensembles and incorporate automated hyperparameter optimization methods (AutoML) for selecting a well-fitting surrogate model (Yu et al., 2020, 2022; Huang et al., 2022; de Winter et al., 2022).

### 3.5 Multiobjective Optimization

While multiobjective optimization was still in its infancy 30 years ago, nowadays it is almost impossible to imagine evolutionary computation without this important subfield. Evolutionary multiobjective optimization now defines its own research field and requires approaches different from single-objective optimization. The main reason for this is that in multiobjective optimization, we are often not interested in a single solution, but in finding the set of Pareto-optimal solutions (Konak et al., 2006). The first multiobjective optimization algorithm (VEGA; see Schaffer, 1985) splits the population into multiple subpopulations, each of which is optimized for a different objective. This strategy, however, tends to converge only to the extremes of each objective. In the 30 years following, researchers realized that diversity mechanisms, elitism in the population, and other fitness functions were needed to find well-spread solutions along the entire Pareto frontier. Three evolutionary multiobjective optimization algorithms that had these characteristics quickly started to define the state of the art (Deb, 2007):

- *Nondominated Sorting Genetic Algorithm II (NSGA-II)* by Deb, Agrawal et al. (2002) became popular because it uses an elite-preservation mechanism, enhances a diversity mechanism, and emphasizes nondominated solutions. The emphasis on the nondominated solutions is achieved with a selection operator that selects only the top $N$ nondominated solutions for crossover to generate new individuals.
- *Strength Pareto Evolutionary Algorithm 2 (SPEA2)* by Zitzler et al. (2001) keeps a fixed archive of nondominated solutions, in which nondominated solutions are replaced with solutions that dominate more individuals. The search for new solutions is guided by a $k$-th nearest neighbor density estimation technique.
- *Pareto Envelope-based Selection Algorithm (PESA2)* by Corne et al. (2001) is an evolutionary strategy that compares a parent selected from a hyperbox with its offspring. If the child dominates the parent, the child is accepted as the next parent. All nondominated solutions are put in an archive, and to keep the archive small, only the solutions in the least crowded regions are kept.
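
The ranking step shared by these algorithms rests on Pareto dominance and nondominated sorting. A minimal Python sketch, assuming minimization and using the simple $O(n^2)$-per-front formulation rather than NSGA-II's optimized bookkeeping:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def nondominated_fronts(points):
    """Sort objective vectors into successive nondominated fronts,
    as in NSGA-II's ranking step. Returns a list of fronts, each a
    sorted list of indices into `points`."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)   # peel off the front, repeat
    return fronts
```

The first front is the current Pareto-front approximation; elitist selection then fills the next population front by front, breaking ties with a diversity measure such as NSGA-II's crowding distance.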

In the years following, the multiobjective optimization research field was enhanced with strategies for many-objective optimization (e.g., Wagner et al., 2007; Deb and Jain, 2013; Ishibuchi et al., 2008), for problems with expensive function evaluations (e.g., Ponweiser et al., 2008; Knowles, 2006; Santana-Quintero et al., 2010), interactive multiobjective optimization (e.g., Miettinen, 2014; Xin et al., 2018), and constraint multiobjective optimization (e.g., Qu and Suganthan, 2011; Blank and Deb, 2021; de Winter et al., 2022).

### 3.6 Automated Algorithm Design

As the number of available algorithms to solve parameter optimization problems
keeps growing, as evidenced by the many developments in the algorithms described
in Section 2, the question of how to choose
one of them is gaining considerable importance. While the problem of *algorithm selection* (see Kerschke et al., 2019; Muñoz et al., 2015 for surveys), where we want to pick
one algorithm to solve a particular problem, has been around for decades (Rice, 1976), extensions on this original
framework have gained more traction in the last decade.

The question of *algorithm configuration* or meta-optimization
(see Huang et al., 2019 for a survey),
where we want to tune parameters for a specific algorithm, has become more
feasible as computational resources are more readily available. Simultaneously,
the number of parameters of most methods in evolutionary computation is growing
as more modifications are proposed. As such, tuning of parameters is becoming
more prevalent (Eiben and Smit, 2011),
and tools for algorithm configuration are gaining popularity (e.g., see
López-Ibáñez et al., 2016; Lindauer et al., 2022).

While tuning parameters for an overall performance gain on a specific function or
set of functions is natural, further benefits are expected when the choice of
algorithms or their parameters can be changed during the optimization process.
In contrast to self-adaptation, this notion of *Dynamic Algorithm
Configuration* (DAC; see Biedenkapp et al., 2020; Adriaensen et al., 2022) often relies on methods like reinforcement learning, which
enable more general adaptation of not just a single parameter, but potentially
switching multiple operators at once (see Eimer et al., 2021; Sun et al., 2021). Results with DAC have been promising, showing the potential
to mimic complex traditional self-adaptation schemes (Shala et al., 2020).

These online configuration tasks are feasible only if we consider our algorithms to be modular in nature, so changes between operators are as simple as swapping a module, as we have recently shown (de Nobel et al., 2021a; Vermetten et al., 2019). However, even when this modularity is not present, advantages could be gained by switching between algorithms to exploit differences in their behavior (Vermetten et al., 2020). While this notion of switching between algorithms has shown promise in the context of hybrid algorithm design, online policies to switch between algorithms on a per-run basis, for example, based on the internal state and the landscape as seen by the algorithm, are promising future directions (e.g., Kostovska, Jankovic et al., 2022).

### 3.7 Particle Swarm Optimization

*Particle Swarm Optimization (PSO)* (Kennedy and Eberhart, 1995) maintains a swarm of particles, each with a position $x_i$ and a velocity $v_i$, where the velocity is updated towards the particle's best previous position and the best position found by its neighborhood. A key parameter of the velocity update is the *inertia weight* $\omega $, which dampens the rate of change of $v_i$. Large values of $\omega $ result in more exploratory behavior, while a small value results in exploitative behavior. Shi and Eberhart (1998) proposed to reduce the value of $\omega $ adaptively from 0.9 to 0.4 during the algorithm run. Notable variants of PSO are:

- *Fully Informed Particle Swarm (FIPS)* (see Kennedy and Mendes, 2002), which uses the best previous positions of all of a particle's neighbors, and
- *Bare-Bones PSO* (see Kennedy, 2003), which does not use velocity at all but updates the position using a Gaussian probability distribution.
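
A minimal global-best PSO with a fixed inertia weight can be sketched as follows; the parameter values are common illustrative choices, not the ones from any particular paper.

```python
import random

def pso(f, dim, n_particles=20, iterations=200, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5):
    """Minimal global-best PSO sketch with fixed inertia weight w.

    Each particle keeps a position x, a velocity v, and its personal
    best p; g is the best position found by the swarm (minimization).
    """
    lo, hi = bounds
    xs = [[random.uniform(lo, hi) for _ in range(dim)]
          for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    ps = [list(x) for x in xs]                 # personal best positions
    pf = [f(x) for x in xs]                    # personal best values
    gf = min(pf)
    g = list(ps[pf.index(gf)])                 # global best
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (ps[i][d] - xs[i][d])    # cognitive
                            + c2 * r2 * (g[d] - xs[i][d]))       # social
                xs[i][d] += vs[i][d]
            fx = f(xs[i])
            if fx < pf[i]:
                ps[i], pf[i] = list(xs[i]), fx
                if fx < gf:
                    g, gf = list(xs[i]), fx
    return g, gf
```

Replacing the fixed `w` with a value decreasing from 0.9 to 0.4 over the run gives the adaptive scheme of Shi and Eberhart (1998).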

### 3.8 Differential Evolution

Originally proposed in 1995 for parameter optimization problems, *Differential Evolution (DE)* (Storn and Price, 1995) has rapidly gained popularity over
the subsequent years due to the simplicity of its implementation, low number of
parameters, low space complexity, and competitive performance on a wide range of
problems (Das and Suganthan, 2011).

The general framework of DE is as follows: the algorithm starts with a population
of $N$ solutions, typically initialized at random as real-valued vectors inside the problem
boundaries. Then, within the generational loop of DE, each solution in the
population (called the "target") in turn undergoes a multi-stage perturbation made up
of: (i) *mutation*: first, a number of "donor"
solutions is selected uniformly at random from the population, then a new
"mutant" solution is generated as the sum of a "base" solution, chosen
according to the mutation strategy, and a
scaled difference between pairs of the selected donor solutions; (ii) *crossover/recombination*: components of a new
"trial" solution are produced by exchanging components, with some
probability, between the mutant and target solutions; (iii) *selection*: trial and target solutions deterministically
compete for a spot in the next generation based on their values of the
objective function. As in other EC algorithms, the termination criterion
can be based either on a target precision or on a budget of function evaluations.
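
The three-stage loop described above can be summarized in a minimal Python sketch of the classic *DE/rand/1/bin* variant (the clipping-based boundary repair and the settings $N_p = 20$, $F = 0.5$, $Cr = 0.9$ are illustrative choices for this sketch, not part of any canonical specification):

```python
import numpy as np

def de_rand_1_bin(f, bounds, pop_size=20, F=0.5, Cr=0.9, budget=2000, seed=0):
    """Minimal DE/rand/1/bin: rand/1 mutation, binomial crossover,
    one-to-one deterministic survivor selection (minimization)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    d = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, d))
    fit = np.array([f(x) for x in pop])
    evals = pop_size
    while evals < budget:
        for i in range(pop_size):
            # (i) mutation: random base plus scaled difference of two donors
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i],
                                    size=3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])
            # (ii) binomial crossover with rate Cr (one component forced)
            mask = rng.random(d) < Cr
            mask[rng.integers(d)] = True
            trial = np.clip(np.where(mask, mutant, pop[i]), lo, hi)
            # (iii) one-to-one competition between trial and target
            ft = f(trial)
            evals += 1
            if ft <= fit[i]:
                pop[i], fit[i] = trial, ft
            if evals >= budget:
                break
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```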

All classic DE variants can be described via a general notation *DE/a/b/c*, where *a* stands for the mutation
strategy, *b* denotes the number of difference vectors employed
in the mutation strategy (e.g., *rand/1*, Storn and Price, 1995; *rand/2*, *best/1*, *best/2*, originally proposed in
Price et al., 2005; and more advanced
versions like *target-to-pbest/1*, Tanabe and Fukunaga, 2013 and Zhang and Sanderson, 2007; *target-to-best/2*,
Gong and Cai, 2013; *target-to-rand/1*, Qijang, 2014, Sharma et al., 2020,
Qin et al., 2009; *2-Opt/1*, Chiang et al., 2010), and *c* specifies the crossover strategy
(e.g., the originally proposed *bin*, Storn and Price, 1995; *exp*, Price et al., 2005; or more advanced options from
Das et al., 2016). Initialization
(uniform, Halton, Gaussian, etc.) and feasibility-preserving (Boks et al., 2021) operators are traditionally not
included in this notation, typically leaving the algorithm's
specification somewhat ambiguous (Kononova et al., 2022).

#### 3.8.1 Differential Evolution in the Context of EC

DE is *closely related* to a number of other heuristic
algorithms, namely: (i) the standard EA, due to the same computational steps
involved in both algorithms; (ii) annealing-inspired algorithms^{1} (Kirkpatrick et al., 1983), due to the built-in scaling down of
the "step size" in both algorithms; and (iii) the Nelder-Mead
algorithm (Nelder and Mead, 1965)
and controlled random search (CRS) (Price, 1977), both of which also rely operationally on
difference vectors to perturb the current trial solutions.

At the same time, DE is *remarkably different* from
traditional EC algorithms (Das and Suganthan, 2011): (i) the population-derived differences used for
perturbation in DE automatically adapt to the natural scaling of the
problem, unlike, for example, in an ES, which requires explicit rules for
adapting the step size per variable over time; (ii) perturbations to
solutions are generated from evolving nonzero population difference vectors
rather than from a predefined density function as in, for example, Estimation of
Distribution Algorithms (EDAs): the so-called *contour
matching* property of DE (Price et al., 2005), which allows both a mutation step's size
and its orientation to adapt automatically to the landscape of the objective
function; (iii) donor solutions in the mutation step of DE are selected
without any reference to their fitness values, unlike in a GA, where typically
some sort of fitness-based parent selection takes place; (iv) while both
algorithms ensure *elitism*, the one-to-one competition for
survival in DE between parent and child mostly resembles that of a
$(\mu+\lambda)$-ES,
where surviving solutions are selected from the joined pool of parents and
children; (v) survivor selection in DE is local, as each solution is compared
only against a (relatively local) mutated version of itself and not against
solutions that have evolved in distant parts of the solution space; apart
from multi-population versions, no standard EC algorithm behaves in a
similar fashion (Price et al., 2005). Finally, the latter property helps prevent premature convergence
and makes DE well suited for parallelization.

#### 3.8.2 Parameter Control

**Population size.** To exploit the full potential of the chosen
mutation operator, the population size $N_p$ should allow choices of non-repeating indices of donor solutions (i.e., $N_p \geq 4$ for *rand/1*). At the same time, overly large populations should
be avoided, as they can be deleterious for the convergence process and leave
insufficient budget for an adequate exploitation phase to refine promising solutions
locally.

**Control parameters.** DE mutation is parameterized via the *scale factor* $F \in (0, 2]$,
whose value in practice should be sufficiently high to counteract selection
pressure and generate sufficiently diverse trial solutions to avoid
premature convergence (Zaharie, 2002). While values $1 < F < 2$ are generally less reliable, their occasional utilization can be beneficial
within the adaptive schemes described below. DE crossover is parameterized via the *crossover rate* $Cr \in [0, 1]$,
which controls the proportion of components of the mutant solution copied
into the trial solution and, counterintuitively at first, defines a mutation
rate: the probability that a component is inherited from the mutant
(Price et al., 2005).

**(Self-)adaptation.** Population size clearly influences population
diversity: a larger population size means higher diversity and therefore
better exploration of the search space (Zaharie and Micota, 2017). This reasoning gave rise to
algorithms with adaptive schemes that control a gradual reduction of the *population size* over the optimization run, to lower the
chances of stagnation (Brest et al., 2008; Zamuda and Brest, 2012; Tanabe and Fukunaga, 2013), following similar developments in other subfields of
EC.
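
As an illustration, a linear population-size reduction schedule in the spirit of the adaptive schemes cited above can be written as follows (a sketch with illustrative parameter values, not the schedule of any specific published algorithm):

```python
def lpsr_size(evals_used, budget, np_init=18, np_min=4):
    """Linear population-size reduction: shrink Np from np_init down to
    np_min (the smallest size admitting rand/1 mutation) as the
    evaluation budget is consumed."""
    frac = evals_used / budget
    return round(np_init + (np_min - np_init) * frac)
```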

## 4 Theoretical Aspects

Huge efforts have been devoted in the past 30 years to developing a rigorous theoretical analysis of evolutionary computation, in particular for genetic algorithms and evolution strategies. Following the pioneering work of John Holland's schema theorem (Holland, 1975) for genetic algorithms, Wegener (2001) used a fitness-level-based method to analyze the expected running time of the $(1+1)$ EA. This work presented the first proof that crossover can reduce the running time of EAs on the so-called royal road functions (Garnier et al., 1999). The analysis of the $(1+1)$ EA on further theoretical problems (e.g., Linear and LeadingOnes) was presented in the work of Droste et al. (2002). More recently, attention has also been drawn to the impact of hyperparameters on the running time, for instance, for EAs with parameterized population sizes (Jansen et al., 2005; Witt, 2006; Rowe and Sudholt, 2014; Antipov et al., 2019) and mutation rates (Böttcher et al., 2010; Doerr et al., 2013; Gießen and Witt, 2017). Furthermore, extensive work on the analysis of the crossover operator has been done in Doerr and Doerr (2015); Doerr et al. (2015); Dang et al. (2017); Corus and Oliveto (2017); Sudholt (2017); Pinto and Doerr (2018); and Antipov et al. (2020).
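
The $(1+1)$ EA analyzed in these works is simple enough to state in a few lines. The following Python sketch runs it on the OneMax function (counting ones in a bit string), for which the theory cited above predicts an expected runtime of $\Theta(n \log n)$:

```python
import random

def one_plus_one_ea_onemax(n, seed=0):
    """(1+1) EA with standard bit mutation (rate 1/n) on OneMax.
    Returns the number of iterations until the optimum (all ones)
    is reached; elitist acceptance keeps the offspring if not worse."""
    rng = random.Random(seed)
    x = [rng.randrange(2) for _ in range(n)]
    fx = sum(x)
    t = 0
    while fx < n:
        # flip each bit independently with probability 1/n
        y = [1 - b if rng.random() < 1.0 / n else b for b in x]
        fy = sum(y)
        t += 1
        if fy >= fx:  # accept if not worse
            x, fx = y, fy
    return t
```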

For continuous optimization, many contributions focus on the asymptotic analysis of evolution strategies, such as the inspiring and fundamental analyses of Beyer (2001) and Auger and Hansen (2011). Moreover, the well-known drift analysis was introduced by He and Yao (2001, 2003), serving as a general theoretical framework. It has been employed for several challenging theoretical questions. For instance, Doerr et al. (2012) provides an elegant proof of the $O(n \log n)$ running time of the $(1+1)$ EA on any linear function. In Witt (2013), tight runtime bounds for the $(1+1)$ EA are presented, and Akimoto et al. (2018) uses drift analysis to derive a runtime bound for the $(1+1)$ evolution strategy with the 1/5-success rule. Lengler (2020) has summarized several examples of applying drift analysis to EAs.

For multiobjective evolutionary algorithms (MOEAs), theoretical results are mainly derived for simple MOEAs on discrete problems. Examples include runtime analyses of (global) simple evolutionary multiobjective optimizers on classic bi-objective pseudo-Boolean optimization problems (Laumanns et al., 2004; Giel and Lehre, 2006; Osuna et al., 2020). Very recently, the first theoretical results were presented for the famous NSGA-II algorithm (Zheng et al., 2022; Doerr and Qu, 2022a, b).

## 5 The More the Better? Not Always

In this section, we briefly address the excessive number of optimization algorithms with potentially similar performance in the field, referring to the ever-growing number of paradigms gleaned from nature that are claimed to be distinct models for optimization algorithms (Section 5.1). This problem can be investigated by proper benchmarking, as we argue in Section 5.2.

### 5.1 The “Bestiary” and the Problem of Too Many Algorithms

Although the rising popularity of evolutionary computation has led to a vast number of positive contributions to the optimization domain at large, there have also been some rather negative developments over the past 30 years. In particular, the trend of devising "novel" nature-inspired optimization heuristics has been a concern in recent decades, and it shows no sign of slowing down (Hussain et al., 2019).

While early evolutionary computation techniques effectively use metaphors to motivate the algorithmic concepts, the use of metaphor has expanded to the point where it is (un)intentionally obfuscating the underlying algorithmic components (Sörensen, 2015). There is now a veritable zoo of metaphor-based optimization heuristics (Campelo and Aranha, 2021, 2018), each with its own terminology, which is hard to decipher for those with a more standard background in optimization heuristics, and for those who are looking for a well-defined formal algorithmic or mathematical description. These "novel" metaphor-based methods still gain attention, and even though many of them have been exposed as providing no novelty, new ones are still devised and promoted (Weyland, 2010; Camacho Villalón et al., 2020; Camacho-Villalón et al., 2022a). For practitioners who are not familiar with the field but are looking for the best possible optimization algorithm for a real-world problem, the ever-growing zoo of metaphor-based heuristics is often confusing and makes a sensible choice for the specific problem at hand impossible.

We therefore strongly advocate (i) the use of mathematical language and formal pseudocode to describe new algorithms; (ii) avoiding the excessive use of analogies with nature and the corresponding language; and (iii) proper benchmarking comparisons (see Section 5.2) against state-of-the-art optimization algorithms to justify any proposal for new ones.

### 5.2 Benchmarking for Algorithm Performance Comparison

With an ever-increasing variety of optimization algorithms, the question of when a particular algorithm is useful becomes difficult to answer. Despite numerous advances in the theoretical analysis of optimization algorithms, it is infeasible to answer this question with theory alone, due to the sheer number of variations of evolutionary algorithms and the fact that modern runtime analyses apply only to a limited number of theoretical problems and algorithms. There is, however, still a need to gain insight into the behavior of algorithms to decide when they might apply to a particular real-world scenario. As such, benchmarking has grown from an ad-hoc procedure into a field with many toolboxes that help researchers perform their experiments (Bartz-Beielstein et al., 2020).

The standards for experimentation have improved a lot in the last 30 years, and
the importance of not just benchmarking, but standardized and reproducible
benchmarking, has become increasingly apparent to the community
(López-Ibáñez et al., 2021). Various benchmarking frameworks have been developed,
providing access to curated sets of problems (Bennet et al., 2021; Hansen et al., 2021; de Nobel et al., 2021b). This way, performance data can be shared, promoting data
reuse and enabling straightforward comparisons to state-of-the-art algorithms.
One particular example of benchmarking tools promoting this idea is the
IOHprofiler (Doerr et al., 2020), a
collaboration between Leiden University, Sorbonne Université, Tel-Hai
College, and others, which collects performance data from a wide range of
sources and makes it directly accessible via a web service,^{2} such that anyone can analyze the data (Wang et
al., 2022).

As noted in Section 3, there is a great variety in the types of optimization problems that can be formulated, each of which can be applicable to a specific real-world use case (van der Blom et al., 2020). As such, benchmark problems have to be developed for each of these domains. The most widely used suites in the context of parameter optimization are for single-objective, noiseless optimization without additional constraints, where benchmark collections have largely become standardized in the last decade (Hansen, Finck et al., 2009; Suganthan et al., 2005), but other areas (e.g., constrained optimization) have also seen significant efforts to ensure reliable benchmark setups (see e.g., Hellwig and Beyer, 2019). Some tools, such as the COCO framework, have expanded on a single suite of commonly used problems (Hansen et al., 2010) to include variations with constraints, noise, discrete variables, and multiple objectives (Hansen et al., 2021).

While the specific demands of different communities within EC lead to variations in benchmarking problems, the overall procedures have much in common with each other (Beiranvand et al., 2017). To ensure fair comparisons, evaluation counts provide a hardware-independent measure of computational effort, which allows data generated decades ago to still be used today. While performance data formats have not yet been standardized, moves towards interoperability of benchmark tools are gaining steam, with, for example, analysis pipelines that inherently support data from many different sources (Wang et al., 2022). Further developments in terms of data representation and reuse are promising future directions (Kostovska et al., 2021).
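
The principle of counting function evaluations as a hardware-independent cost measure can be illustrated with a small wrapper that records the best-so-far objective value after every evaluation (a generic sketch of the idea, not the interface of any specific benchmarking tool):

```python
class CountingProblem:
    """Wraps an objective function and records a (evaluations,
    best-so-far) trace, giving a hardware-independent performance
    record that can be compared across machines and decades."""

    def __init__(self, f):
        self.f = f
        self.evals = 0
        self.trace = []  # list of (evaluation count, best-so-far) pairs

    def __call__(self, x):
        y = self.f(x)
        self.evals += 1
        best = y if not self.trace else min(y, self.trace[-1][1])
        self.trace.append((self.evals, best))
        return y
```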

While benchmarking plays a key role in understanding the strengths of
optimization algorithms, benchmarking data is useful only when the problem being
benchmarked can be clearly understood. While many benchmark problems are built
from the ground up to have specific high-level properties (e.g., strong global
structure), the way in which the problem and algorithm interact in a lower-level
context is often not as easily described. To aid with this process, the field of *Exploratory Landscape Analysis* (ELA) has been developed
(Mersmann et al., 2011). This field aims
to characterize problems based on sets of low-level features, such as
information content of a set of samples. This type of analysis (Kerschke and
Trautmann, 2019) has proven useful in
areas like algorithm selection (Bischl et al., 2012), but also for benchmarking and understanding the impact of
algorithmic components of modular algorithms (Kostovska, Vermetten, et al., 2022).
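
As a concrete example of a sample-based low-level feature, fitness-distance correlation (a classic landscape measure predating the ELA feature sets) can be computed from a set of samples as follows; note that using the best sampled point as a proxy for the optimum is an assumption of this sketch, and the code is not part of any specific ELA toolbox:

```python
import numpy as np

def fitness_distance_correlation(X, y):
    """Fitness-distance correlation from a sample (X, y), assuming
    minimization: the Pearson correlation between fitness values and
    distances to the best sampled point. Values near 1 indicate a
    globally structured (easy) landscape for distance-based search."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_best = X[np.argmin(y)]                      # proxy for the optimum
    d = np.linalg.norm(X - x_best, axis=1)        # distance to best sample
    return np.corrcoef(y, d)[0, 1]
```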

## 6 A Subjective Outlook (Not for 30 Years)

In the domain of continuous parameter optimization problems, as defined in Equation
(1), it is extremely difficult
today for practitioners and researchers to pick the best possible optimization
algorithm for a given problem and to parameterize it correctly. First steps in the
direction of problem characterization, for example, by means of *Exploratory
Landscape Analysis*, and problem-specific *Combined Algorithm
Selection and Hyperparameter Optimization* have been made recently, with
promising results. Methods such as dynamic algorithm configuration, metaheuristics,
and modular algorithm frameworks, as discussed in Section 3.6, clearly indicate one path towards the future of the
field—namely, automatically designing the optimal optimization algorithm for
a given problem or problem class.

Related to this, we would like to emphasize that there are too many nature-inspired optimization algorithms around, with new ones being proposed continuously. For the future, we strongly advocate consolidating the field, understanding which algorithms are good for which optimization problem characteristics, and developing configurable algorithm frameworks that allow for the generation, experimentation, and proper benchmarking of existing and new algorithms. A few more topics we propose are the following:

- In addition to the common "competitive" form of benchmarking, a focus should also be placed on the explainability of achieved results, by better understanding the interaction between algorithmic components and problem properties.

- Where possible, reproducibility should become a requirement for all evolutionary computation publications. We should, however, also leave possibilities to publish about real-world applications, which often use proprietary data or simulators but facilitate a better understanding of the requirements of real-world applications.

- It would be good to use configurable algorithm frameworks with straightforward interfaces to allow easy use of state-of-the-art algorithms.

- We need to get rid of "novel" nature-inspired algorithms and use a standardized description and formalization approach for new algorithms, so that it is easy to see whether they are similar or identical to existing methods.

## Acknowledgments

It would be a paper on its own to acknowledge all the wonderful people we have had the pleasure to work with during the last 30 years, at the Technical University of Dortmund, at Leiden University, and within our worldwide research network. I (Thomas Bäck) would like to mention one person, Hans-Paul Schwefel, who was the reason that I joined this exciting field of research, and I have always enjoyed, and continue to enjoy, this journey. Thank you very much for this, Hans-Paul, and for your inspiration and continuous support, especially of course during the first years of my journey as a young PhD student under your guidance. Our special thanks also go to David B. Fogel for his very useful input on Section 2.3, and to Emma Hart for encouraging this project and providing suggestions that helped improve the paper.

## Notes

^{1} In fact, the first version of DE originated as a floating-point-arithmetic modification of a genetic annealing algorithm (Price et al., 2005) proposed in 1994 in a popular programmer's magazine, *Dr. Dobb's Journal*.
