## Abstract

Substitution Boxes (S-Boxes) play an important role in many modern-day cryptographic algorithms, more commonly known as ciphers. Without carefully chosen S-Boxes, such ciphers would be easier to break. Therefore, it is not surprising that the design of suitable S-Boxes attracts a lot of attention in the cryptography community. The evolutionary computation (EC) community also had several attempts using evolutionary paradigms to evolve S-Boxes with good cryptographic properties. This article focuses on a fitness function one should use when evolving highly nonlinear S-Boxes. After an extensive experimental analysis of the current state-of-the-art fitness functions, we present a new one that offers higher speed and better results when compared with the aforementioned fitness functions.

## 1  Introduction

Today, modern communications would not be possible without cryptography. One can consider online shopping as an example; there, although the end users do not notice it (when all is going well), they use cryptography extensively. Credit card details are transferred through the Internet in an encrypted form, which means no one except the seller can read the data. Naturally, there are a multitude of ciphers that can be used. However, for transmitting protected data most often symmetric-key cryptography, and more precisely block ciphers, are used (Schneier, 1995).

All block ciphers have in common that they need some nonlinear operation/element to be secure. The choice of the nonlinear element depends on the design strategy one follows. A well-known and explored element of this kind is the Substitution Box (S-Box). When explaining S-Boxes, the easiest way is to consider them as a set of Boolean functions; hence S-Boxes are also known as the vectorial Boolean functions.

The input size and output size of an S-Box or -function does not need to be the same. For instance, several S-Boxes are used in the DES cipher which has 8 S-Boxes of size  (FIPS, 1999). However, there are many ciphers that have the same input and output sizes and today two of the most popular sizes for S-Boxes are and . The first S-Box size is usually used in lightweight cryptography that is primarily intended for the constrained environments; an example of such a lightweight cipher is the PRESENT cipher (Bogdanov et al., 2007). On the other side, size S-Boxes are used when the security of a cipher is of primary importance; as an example we mention the AES cipher (Daemen and Rijmen, 2002).

As already said, ciphers that utilize S-Boxes could be easier to break if S-Boxes are not carefully chosen. Therefore, the design of S-Boxes with good cryptographic properties is a very active research area. However, it is impossible to design an S-Box that has all the cryptographic properties optimal and therefore a certain trade-off is necessary (Carlet, 2005).

In the process of the design of S-Boxes (similarly as in the design of Boolean functions), one can roughly follow three directions, namely, algebraic constructions, random search, and heuristics (Picek, Marchiori et al., 2014). Naturally, it is also possible to employ design strategies that represent various combinations of the aforementioned techniques. For the S-Boxes, algebraic constructions (for example, the finite field inversion method (Nyberg, 1991)) give unsurpassed results with regards to the most of the cryptographic properties (Carlet, 2010b).

The random search method has an advantage that it is relatively easy to produce a large number of S-Boxes with it. However, when discussing the quality of such solutions, they are far from those created by algebraic constructions.

The third direction utilizes various heuristics to find S-Boxes with good properties. This direction resulted in a large body of research over the years. If we consider the quality of solutions produced by such methods, they could fit somewhere between the random search and algebraic constructions. Since we stated that the quality of solutions obtained with heuristics is most often inferior to those obtained with algebraic constructions, the question one could ask is why such methods are of interest. Indeed, if we consider the finite field inversion method and S-Box size, the properties of S-Boxes obtained in such a way are superior to any other known method (Carlet, 2010b). However, there are scenarios where new S-Boxes would be beneficial. The first example is when the primary goal is to evolve S-Boxes with properties having values that cannot be obtained with algebraic constructions (for example, when optimizing properties which are opposing). The second example is when the goal is to evolve S-Boxes that have values comparable to those of S-Boxes created with algebraic constructions (i.e., to create proprietary S-Boxes). The third scenario does not reflect so much the quality of properties (although that is still of high importance), but rather it deals with solutions that are desirable from the implementation perspective. One example is when the designer requires a solution which, when implemented in hardware, offers the best performance (for instance, with regards to the power, area, or latency).

Improving the quality of heuristics (and consequently the results obtained by it) represents a valuable goal since it would help raise the confidence in such methods. In accordance with that, devising heuristics that could reach the solutions obtained with algebraic constructions would have a significant impact for both cryptography and EC community. Therefore, the central question of this research is how to improve the quality of results obtained by heuristics. One quite common way is to explore different heuristics in a hope to find the most appropriate one. However, such a research avenue cannot help to strengthen the foundations of one’s knowledge nor would it allow more general answers about the problem difficulty. Before going further into details, we briefly discuss the number of possible S-Boxes for different input/output sizes to see how difficult this problem really is. The number of functions equals and for an case, we show several search space sizes in Table 1. Seeing the exponential growth of the search space with the number of variables, it is obvious that the exhaustive search does not present a viable option for sizes larger than four.

Table 1:
Search space size for functions.
n45678

n45678

In this article, we focus on the questions of representation and fitness function instead of the choice of heuristics. Indeed, our research indicates that the varying success of different evolutionary methods stems from the choice of fitness function and not so much from the choice of the algorithm itself. To this end, we work with several evolutionary algorithms (EAs) where we investigate the choices behind the representation perspective and fitness functions. We consider the choice of the fitness function as of the utmost importance and therefore we examine it thoroughly in order to find fitness functions that have sound bases and good performance regardless of the size of S-Boxes.

The remainder of this article is organized as follows. In Section 2, we present the necessary background information on S-Boxes and their properties, as well as some bounds on the values that are possible to obtain. Next, in Section 3, we give a survey of related work. In Section 4, we discuss the motivation behind this research as well as our contributions. Section 5 gives details about the algorithms we use and details about the fitness functions from related work. In Section 6, we give the results for five different sizes of bijective S-Boxes when using state-of-the-art fitness functions. Furthermore, we present the details about our new cost function that offers greater speed and higher nonlinearity values and we support that with detailed experimental results and a short discussion. Finally, in Section 7, we give a short conclusion and offer several potential research directions.

## 2  Preliminaries

Let be positive integers, that is, . The set of all -tuples of the elements in the field is denoted as where is the Galois field with two elements. The inner product of two vectors and is denoted as and equals . Here, “” represents addition modulo two (bitwise XOR). The Hamming weight (HW) of a vector , where , is the number of non-zero positions in the vector.

An -function is any mapping F from to . If equals 1, then the function is called a Boolean function, and when the function F is called an S-Box or a vectorial Boolean function. An -function F can be defined as a vector , where the Boolean functions for are called the coordinate functions of F. The component functions of an -function are all the linear combinations of coordinate functions with non-all-zero coefficients.

### 2.1  Representations

In this section, we present several unique S-Box representations that are of relevance for this work. We note that these representations are not the only unique ones. An -function F can be represented as a list of values (lookup table, LUT), with each value ranging from 0 to . Alternatively, an -function can be implemented as a lookup table with words of bits each. When it is usual that the S-Box is bijective, that is, that each value in the output appears exactly once.

A Boolean function on is represented by a truth table (TT), which is a vector that contains the function values of , ordered lexicographically, i.e., , where and are two input entries for the truth table (Carlet, 2010a). An S-Box can be represented in the truth table form as a matrix of dimension where each column represents one Boolean function (i.e., one coordinate function).

The Walsh--Hadamard transform of an -function F equals (Carlet, 2010b):
1
where and .

### 2.2  Properties

Since we are interested in evolving S-Boxes with good cryptographic properties, it is necessary first to discuss which S-Boxes are suitable in practice. When considering S-Boxes, there exist in total 16! bijective S-Boxes, which is approximately options to search from. Leander and Poschmann (2007) define 4-bit optimal S-Boxes as those that are bijective (balanced), have linearity equal to 8, and -uniformity equal to 4. The linearity property that equals 8 is the same as the nonlinearity property that equals 4 and we continue using the nonlinearity property value as an indicator. The optimal S-Boxes possess the best possible values of the aforementioned properties. For the S-Box size, we concentrate only on optimal S-Boxes as these are of practical interest. Indeed, as far as the authors know, all ciphers that have S-Boxes use optimal S-Boxes (Bogdanov et al., 2007; Borghoff et al., 2012; Daemen et al., 2000). For all other S-Box sizes that we consider in this work, we restrict our attention to the same properties as for the size.

An -function is called balanced if it takes every value of the same number of times. Balanced -functions are permutations on  (Carlet, 2010b).

The nonlinearity of an -function F equals the minimum nonlinearity of all nonzero linear combinations of its coordinate functions , where  (Carlet, 2010b; Nyberg, 1993):
2
Let be a function from into and . We denote:
3
The entry at the position corresponds to the cardinality of and is denoted as . The -uniformity is then defined as (Biham and Shamir, 1991; Nyberg, 1991):
4

Here, we give a small S-Box example in Tables 2 and 3. First, in Table 2, we give the truth table and lookup table representations. In Table 3, we present all linear nonzero combinations (component functions) and the Walsh--Hadamard spectrum. By simply setting the Walsh--Hadamard values from Table 3 to Equation (2), we see that the nonlinearity of this S-Box equals 0.

Table 2:
S-Box example, part 1.
Truth table inputTruth table outputLUT inputLUT output
Truth table inputTruth table outputLUT inputLUT output
Table 3:
S-Box example, part 2.
−4
−4
−4
−4

### 2.3  Bounds

The maximal nonlinearity of any S-Box is upper bounded by the following inequality:
5
In the case of equality, we talk about Almost Bent (AB) functions, where examples are power functions like Gold and Kasami. Note that AB functions exist only for odd dimensions (Carlet, 2010b). When is even, the best known nonlinearity is obtained for the inverse function and it equals (Budaghyan et al., 2006):
6

For the -uniformity property the best possible value is 2 (the smaller the better), which is a value obtained for Almost Perfect Nonlinear (APN) functions. When talking about bijective functions, this bound is achieved for any odd and also for . For even and larger than 6, it is an open question whether such functions exist. The best known -uniformity value for the bijective S-Box equals 4 and is obtained with the inverse function (Carlet, 2010b). Note that every AB function is also APN function, but the converse is not true. For further information about S-Boxes, their properties, and role in cryptography, we refer interested readers to Carlet (2010b).

## 3  Related Work

There are several successful applications of Monte Carlo algorithms when creating S-Boxes and we mention here only a representative subset. Clark et al. (2005) use the principles from the evolutionary design of Boolean functions to evolve S-Boxes with desired cryptographic properties. They use simulated annealing (SA) coupled with the hill climbing algorithm to evolve bijective S-Boxes of sizes up to that have high nonlinearity values.

Millan et al. (1999) work with genetic algorithms (GAs) to evolve S-Boxes with high nonlinearity and low autocorrelation. Furthermore, the authors discuss the selection of the appropriate genetic algorithm (GA) parameters.

On the other hand, Burnett et al. (2001) use a heuristic method to generate MARS-like S-Boxes. They are able to generate a number of S-Boxes of appropriate size that satisfy all the requirements placed on the MARS S-Box (Burwick et al., 1999). With a combination of several techniques, they even manage to find S-Boxes with improved nonlinearity values.

Fuller et al. (2004) use a heuristic method to evolve bijective S-Boxes from the power mappings. They use only iterated mutation operators and report to generate S-Boxes with the best known trade-offs among the criteria they consider.

Burnett (2005) in her thesis experiments with a GA, hill climbing or with a combination of those two methods in order to evolve S-Boxes where .

Tesař (2010) uses a combination of a special GA with a total tree searching to generate S-Boxes with nonlinearity equal up to 104.

Kazymyrov et al. (2013) use an improved gradient descent method to find S-Boxes that have high nonlinearity values.

Ivanov et al. (2016) experiment with a GA that works in a reverse way in order to generate bijective S-Boxes of dimensions from to . They seed the initial S-Boxes population with solutions based on the finite field inversion method and then evolve them to find new solutions. The same authors use a modified immune algorithm to generate S-Boxes that are balanced with high nonlinearity and low -uniformity (Ivanov et al., 2015).

Picek, Ege, Batina et al. (2014) explore how to generate S-Boxes of size with a better resistance against side-channel attacks (SCA) as measured with the transparency order property. Next, Picek, Ege, Papagiannopoulos et al. (2014) and Picek, Papagiannopoulos, Ege et al. (2014) investigate side-channel resilience of S-Boxes as well as when considering the confusion coefficient property. Ege et al. (2015) use genetic algorithms to evolve S-Boxes with better SCA resilience and they implement such S-Boxes in both software and hardware settings in order to properly evaluate them. Finally, Picek, Mazumdar et al. (2015) experiment with the modified transparency order property in order to achieve S-Boxes with better SCA resistivity.

Picek, Miller et al. (2015) use Cartesian genetic programming (CGP) and genetic programming (GP) to evolve S-Boxes where they discuss how to obtain permutation-based encoding with those algorithms.

## 4  Motivation and Contributions

As already mentioned, the quality of solutions achieved with heuristics lies somewhere between the random search and algebraic constructions. To clarify this, we consider here only the two (since we assume S-Boxes are always bijective) most important properties (nonlinearity and -uniformity) and observe the following. When using random search for generating S-Boxes of size , the nonlinearity goes approximately up to 98 (the higher the better) and the -uniformity is in the range of 10 to 18 (the smaller the better). More generally, S-Boxes created with random search usually have good nonlinearity values, but poor -uniformity values (Dib, 2014). However, when using heuristics, nonlinearity is usually in range between 98 and 104. The -uniformity property is somewhat less considered, but again we can observe a slight improvement with values around 10 when compared with the random search. Lastly, when using the finite field inversion construction, the nonlinearity equals 112 and the -uniformity is equal to 4 for size.

Next, we discuss various options regarding the representations of Boolean functions and the influence of a fitness function and then we enumerate our contributions.

### 4.1  On the Representation Options

We start from the truth table representation that consists of rows and columns. The rows represent all possible combinations of the input bits while the columns represent all possible combinations of the output bits. We first investigate a single Boolean function scenario (i.e., when ) and then extend it to Boolean functions.

For a single Boolean function, the most natural way of representation is the truth table form. There, to obtain a balanced solution, it is enough to set exactly half of the zeros and half of the ones in the output vector. However, when talking about an S-Box, it is not enough that each column (a Boolean function) is balanced, but also all linear combinations need to be balanced. If we work with the bitstring representation, this immediately starts to present a problem. For instance, if we use a GA, then the resulting chromosome has dimensions, or when using GP, independent trees. The situation is slightly less complicated with CGP where we can have a graph with outputs.

For smaller dimensions (up to 4), our experiments show it is possible to obtain balanced solutions with the truth table representation. However, even that represents a moderately difficult problem. Furthermore, if we add just one more property, for example the nonlinearity, then we again need to check not only the coordinate functions (columns), but also all linear combinations of columns. Therefore, although a valid representation, it presents difficulties that are hard for EC to circumvent.

Next, we concentrate on the Walsh--Hadamard representation and a single Boolean function scenario. In this representation, each coefficient is a number in . However, there is no straightforward way to set those values. Only if we want to create a bent (where bent functions exist only for even) Boolean function, we know that the Walsh--Hadamard spectrum is flat, that is, . However, bent functions are not balanced and therefore not appropriate for most usages in cryptography (Carlet, 2010a). Besides that information, there is also the Parseval theorem:
7

Unfortunately, this theorem only gives a necessary condition on a Boolean function and not a sufficient one. Therefore, to evolve a Boolean function represented with the Walsh--Hadamard spectrum, it would be necessary to constantly check whether the solution is indeed a Boolean function and to implement an operator that would fix those solutions that are not valid (alternatively, such solutions would need to be discarded). Going to the S-Box case would be even harder since there are a number of Boolean functions we need to check.

Finally, we are left with the lookup table representation where in the case that , we can represent an S-Box as a permutation (if the resulting S-Box is bijective, which is the usual design choice). In such a case, an S-Box is always balanced and we do not need to verify that property. Therefore, it is easy to conclude that the permutation encoding represents the most natural one for S-Boxes and is consequently the encoding we use in the rest of this article. As already stated, there are other unique representations of Boolean functions (and consequently S-Boxes), but they provide even less information in the evolution process, so we do not consider them here.

### 4.2  On the Fitness Functions

Since we decide to work with the permutation encoding, we do not need to consider the balancedness property because it is satisfied automatically. Therefore, we consider now only the nonlinearity and the -uniformity properties. We start with the analysis of the nonlinearity property first. By checking related work we can observe that a significant part of it uses only the nonlinearity value itself (i.e., the extreme value of the Walsh--Hadamard spectrum) as a part of the fitness function. Therefore, the fitness function is of the form:
8
where the goal is maximization. Often the fitness function has other parts in accordance with the properties one wants to obtain. In common for all works where the fitness function is of the previous form is that the final nonlinearity value usually does not reach over 100 for the size. A possible explanation is that such a fitness function actually loses a significant part of the information since it works with only one, extreme value and not with all the values in the Walsh--Hadamard spectrum. On the basis of Equation (2), it is evident that by lowering the values of the whole Walsh--Hadamard spectrum (i.e., making the spectrum more flat), the final nonlinearity value will be higher. This also stems from the fact that the maximal nonlinearity is obtained for bent functions where the Walsh--Hadamard spectrum is flat and equals . In accordance with that, Clark et al. (2004) introduced a cost function that considers the whole Walsh--Hadamard spectrum for a Boolean function and equals:
9
where and are real-valued parameters. One can immediately observe a significant downside of this cost function: it uses additional parameters where the only way to adjust them is to conduct the parameter tuning phase. Furthermore, such a procedure needs to be done for each Boolean function size where the authors experiment with different parameter values for up to 9. To elaborate more on the downside of such fitness function, we cite Clark et al. (2004, page 2): “The parameters and provide freedom to experiment. It is difficult to predict what the best parameter values should be: it is far from clear what is the effect of imposing a balance requirement, and what is the effect of an odd .”
The fitness from Equation (9) can be naturally extended to a multioutput case where it equals (Clark et al., 2005):
10
Here, was set to 3.0 and was in set . The best nonlinearity obtained with this fitness function and S-Box size equals 102. The authors also try to give at least a notion of the reasoning behind the choice of parameter values, but cannot give definitive guidelines. Note that the multioutput case cost function suffers from the same downside as for the single output case, namely, the need for tuning and parameters.

Tesař (2010) conducted an extensive parameter tuning where both parameters are integers and goes in range while goes in range . The author found that the best set of parameters has values and , but offers no reasoning why this set of values would perform the best. Furthermore, with his algorithm that consists of a combination of the genetic algorithm and hill climbing the author states to regularly find S-Boxes of nonlinearity 104 for the size.

Besides the aforesaid results, there exist a small number of research papers where authors use heuristics and obtain nonlinearities higher than 104. However, we note that the comparison between them is not entirely fair since the authors do not start from a random initial population, but from a population (or a single individual) that is highly competitive. Ivanov et al. (2016) start from the population of AES affine equivalent S-Boxes and then use the “reverse genetic algorithm” to evolve solutions up to a certain level (i.e, a nonlinearity value). The best S-Box the authors report has the nonlinearity of 112, which is the same as in the case of the AES S-Box. Next, Fuller and Millan (2003) explore the linear redundancy in S-Boxes and experiment with the AES S-Box where they conduct swaps of 8 value pairs. In such a way, the best obtained S-Box has the nonlinearity of 106. Finally, Kazymyrov et al. (2013) use a gradient descent method where they start with an S-Box that has good -uniformity value and they conduct a number of steps until they find an S-Box with good nonlinearity, which in their work reaches 104 for the size. To conclude, if disregarding the last three mentioned papers, since they use extra knowledge in the form of special initial population, the best nonlinearity equals 104 where it is obtained by a combination or customization of algorithms. However, a single algorithm that uses Equation (9) goes up to only 102 where the best choice of parameters is unclear (considering only a single S-Box size, let alone a number of sizes).

The situation with the -uniformity is much simpler since many of the related works do not consider that property. Usually when this property is of interest then it is evaluated only a posteriori. The exceptions are works by Picek, Papagiannopoulos et al. (2015) and Picek, Mazumdar, Ege et al. (2014) where the authors consider the -uniformity value in the fitness function. However, this is somewhat simplistic in a sense that only the smallest (extreme) value is considered. On the other hand, Ivanov et al. (2015) evolve bijective S-Boxes where they use a more complex expression in the fitness function to calculate the -uniformity property.

### 4.3  Our Contributions

The main contribution of this article is the definition of a new fitness function that is able to produce highly competitive results with greater speed. Furthermore, the nonlinearity property easily reaches the value of 104 without using combinations of algorithms as used in related work. Finally, our new fitness function does not use parameters that are necessary to tune or difficult to justify theoretically. Rather, it uses only a single parameter that enables the researcher to select how fine-grained a procedure is wanted.

Besides that contribution, we give a number of smaller ones in forms of the analysis of the existing fitness functions as well as the extensive analysis of the performance of several evolutionary algorithms and fitness functions on five sizes of bijective S-Boxes.

## 5  Experimental Setup

In our experiments, we use three different algorithms, namely, Genetic Algorithm (GA) (Eiben and Smith, 2003), Genetic and Tree Algorithm (GaT) (Tesař, 2010), and our Local Search Algorithm (LSA) with two different cost functions. The experiments are first executed using the Clark’s cost function (Clark, 1998) with two sets of parameters and again with our newly developed cost function that finds cryptographically strong S-Boxes with greater speed. The details about our new cost function are presented in Section 6.

A number of experiments are performed using the Evolution Strategy (ES) algorithm, the Artificial Immune System (AIS) algorithm, and the Particle Swarm Optimization (PSO) algorithm. However, the obtained results show that they do not outperform those reached with the GA. In all our experiments, we use a solution representation as explained in Sections 2.1 and 4.1. All experiments are conducted on bijective S-Boxes of dimensions , , , , and .

### 5.1  Common Parameters

The number of independent runs for each experiment is 50. The stopping conditions for every algorithm we use are the number of evaluations or the target nonlinearity value as given in Table 4. The values presented in the table are determined after performing a small set of tuning experiments. For an S-Box of dimensions , the highest known nonlinearity value was found, but for larger sizes the experiments showed us that, by using any of the presented algorithms, S-Boxes of higher nonlinearity values cannot be found in a reasonable amount of time.

Table 4:
Common parameters. NoE—maximal number of solution evaluations, NL—target nonlinearity, NLH—nonlinearity highest known value.
S-Box sizeNoENLNLH

10 12
22 24
48 56
104 112
S-Box sizeNoENLNLH

10 12
22 24
48 56
104 112

### 5.2  Genetic Algorithm

The GA starts with an initial population created by randomly setting each value from 0 to as outputs of a lookup table, where is the dimension of an S-Box ().

After the initial population is generated, the genetic algorithm starts with the evolution process. In each iteration it randomly chooses 3 possible solutions (the tournament of size equal to 3) and selects the worst solution among those (this selection method also ensures elitism, i.e., the best solutions are always propagated to the next iteration). The remaining two solutions are used as parents, which create one offspring via variation operators and the offspring is then evaluated. If it is better than the worst solution in the tournament and its genome is different from all other solutions in the population, then it replaces the worst solution; otherwise, the process is repeated.

For variation operators, we use three mutation operators and three crossover operators where we chose among the most common ones in use today. The mutation operators used are insert mutation (Eiben and Smith, 2003), inversion mutation (Eiben and Smith, 2003), and swap mutation (Ryan et al., 2004). As for the crossover, we used partially mapped crossover (PMX) (Goldberg and Lingle, 1985), position-based crossover (PBX) (Syswerda, 1985), and order crossover (OX) (Davis, 1985). For each offspring, an operator is selected uniformly at random between all operators within a class (mutation and crossover).

The GA parameters are the following ones: the size of the population is 100. The mutation probability per individual is set to 0.13 for the insert and inversion mutation and 0.25 for the swap mutation. These parameters were chosen on a basis of a small set of tuning experiments in which they showed the best results on average.

### 5.3  Genetic and Tree Algorithm

As mentioned before, this algorithm consists of two parts: a special case of the GA and the total tree search. This algorithm’s parameters are set in accordance with the previous results (Tesař, 2010). We emphasize the two most important parameters of the algorithm: the criterion to enter the tree part of the algorithm and the stopping criterion. Here, the criterion to enter the tree part is the S-Box nonlinearity value as given in Table 5.

Table 5:
Parameters for GaT algorithm: NT—Nonlinearity value to enter the tree part of the algorithm.
S-Box size
20 46 102
S-Box size
20 46 102

The first part of this algorithm is a population algorithm where its initial population is generated in the same way as it is done for the GA. The algorithm runs the GA part until it finds the solution that satisfies the criterion to enter the tree part. With that solution the total tree search algorithm starts and runs until any of the stopping criteria are met. For more details on this algorithm, we refer interested readers to Tesař (2010).

### 5.4  Local Search Algorithm

Here, we illustrate how the local search algorithm works where we use it for a comparison with the other evolutionary algorithms we investigate in this article. The algorithm starts with a single randomly generated solution which is set as algorithm’s current solution. In each iteration, the algorithm produces new solutions generated with given mutation operators. The mutation operator randomly chooses k different positions in the solution representation and then randomly permutes elements on chosen positions. To achieve the best results, we provide mutation operators with . By using a small set of tuning experiments, we determine that larger values of are destructive, that is, they are more likely to harm than to improve the solution they are applied on. Due to that reason, we opted to use mutation operators with small values of .

As an input to the LS algorithm, a set of mutation operations is provided. Each mutation operator is defined with two parameters, and , where is a number of positions whose elements are to be permuted and defines how many times that mutation operator will be applied on the current solution in each iteration. The mutation operators used in our experiments are presented in Table 6. Here, the total number of solutions generated in each iteration is 97, where that value is easily obtained by summation of all values in a single row of Table 6. In each iteration, new solutions are generated by applying given mutation operators. From produced solutions, the best one is selected and set as the current solution.

Table 6:
Mutation operators used in LS algorithm.
k234567
l 50 25 12
k234567
l 50 25 12

### 5.5  Cost Functions Parameters

We use two different cost functions in our experiments: the Clark’s cost function as given in Equation (9) and our newly developed cost function as given in Equation (12). We investigate the Clark’s fitness function with parameters and on bijective S-Boxes of dimensions  (Clark et al., 2005). Following that, we set parameters and to 21 and 7, respectively. Those values are presented as the best performing in the experiments for S-Boxes of dimension  (Tesař, 2010).

Contrasting that, the cost function we develop uses only a single parameter which defines how many values () of the Walsh--Hadamard spectrum will be used in the solution evaluation. The results presented are obtained with the parameter values and . With the cost function reduces to a base case, which is the initial form of our cost function. Later, we generalize it to use an arbitrary number of values. The value is chosen based on a small set of tuning experiments in which it showed the best results on average for all S-Box sizes we investigated.

In all algorithms used, the fitness function is represented with a two-component vector , where represents the solutions’ nonlinearity and is a value obtained by the Clark’s or our cost function. The Clark’s cost function is presented in Section 4.2 and ours is described in detail in Section 6.2. A solution with a fitness is better than the solution with a fitness if , or but (which, in effect, represents a hierarchical vector comparison).

## 6  Results and Discussion

In this section, we present the results obtained in our experiments. First, we give the results acquired when using the Clark’s cost function and continue with the development of our cost function and consequently the results obtained with it. Note that if not stated otherwise, we evaluate the -uniformity property a posteriori.

### 6.1  Results with the Clark’s Cost Function

Recall from Section 4.2 that the Clark’s cost function is calculated on the basis of values in the Walsh--Hadamard spectrum and its value depends on two parameters, namely, and . Clark et al. (2005) run the experiments with the parameter values and . However, values and were determined as the best choice for an S-Box of size after performing the exhaustive grid search (Tesař, 2010). Furthermore, there is no intuition provided why those exact values would give the best results. For S-Boxes of smaller dimensions no such exhaustive search was performed and based on the values of their Walsh--Hadamard spectra and the Clark’s cost function definition one can expect those parameters not to work well. Therefore, we perform two sets of experiments with the Clark’s cost function: one with the parameters and and the other with and .

The results obtained using the Clark’s cost function and different and parameter values are presented in Tables 7 and 8 in which we give the best and the average results obtained by using all three algorithms.

Table 7:
Results obtained using the Clark’s cost function, and . Presented values are best and average solutions found in format nonlinearity/-uniformity.
GAGaTLSA
S-Box size/AlgorithmBestAvgBestAvgBestAvg

GAGaTLSA
S-Box size/AlgorithmBestAvgBestAvgBestAvg

Table 8:
Results obtained using the Clark’s cost function, and . Presented values are best and average solutions found in format nonlinearity/-uniformity.
GAGaTLSA
S-Box size/AlgorithmBestAvgBestAvgBestAvg

GAGaTLSA
S-Box size/AlgorithmBestAvgBestAvgBestAvg

Next, in Tables 9 and 10, we give the average number of solution evaluations before a certain nonlinearity value was found while optimizing S-Boxes of dimension . In the case that an algorithm is not able to obtain a certain solution repeatedly, we note that with the acronym NPR (Not able to Produce Repeatedly). From the presented results, we can conclude that the performance of the Clark’s cost function is highly dependent on the parameters and and without exhaustive search it is difficult to say which ones would be the optimal for an arbitrary S-Box size.

Table 9:
Average number of evaluations needed with the Clark’s cost function, and , S-Box size .
Algorithm/Nonlinearity98100102104
GA 457 9194 119229 NPR
GaT 677 3693 90059 NPR
LSA 623 2814 69420 NPR
Algorithm/Nonlinearity98100102104
GA 457 9194 119229 NPR
GaT 677 3693 90059 NPR
LSA 623 2814 69420 NPR
Table 10:
Average number of evaluations needed with the Clark’s cost function, and , S-Box size .
Algorithm/Nonlinearity98100102104
GA 416 6492 28200 NPR
GaT 171 1055 9024
LSA 371 1095 6701 NPR
Algorithm/Nonlinearity98100102104
GA 416 6492 28200 NPR
GaT 171 1055 9024
LSA 371 1095 6701 NPR

### 6.2  Results with the New Cost Function

In this section, we present the results obtained with our cost function. The motivation for our function comes from the promising results of the Clark’s cost function which is based on the Walsh--Hadamard spectrum. More precisely, from Equation (2) it can be seen that the nonlinearity is directly defined with the maximum absolute value of the Walsh--Hadamard coefficient in the Walsh--Hadamard spectrum. Note that there can be a variety of different coefficient values with different multiplicities in the spectrum, in accordance with Equation (1).

With that in mind, the definition of our cost function stems quite naturally. We want to decrease the number of coefficients that have the maximum absolute value. Once the number of the maximum absolute value coefficients is reduced to zero, the nonlinearity of an S-Box is increased. As stated in Section 5.5, S-Boxes are first compared on their nonlinearity value (the larger the better) and if the nonlinearity value is equal, then we see that the better S-Box is the one with a smaller value of the cost function. The value of the initial version of our cost function when applied on an S-Box is evaluated as a number of the Walsh--Hadamard coefficients of the maximum absolute value (denoted here as ):
11

In Figure 1, we display the Walsh--Hadamard spectrum of an S-Box with the nonlinearity value of 102. Its maximum absolute coefficient value equals 52. Only when the number of coefficients with the value is reduced to 0, the nonlinearity of that S-Box would increase to 104, with coefficients remaining the new largest ones. The Walsh--Hadamard spectrum of an S-Box and the nonlinearity 104 is given in Figure 3, where we can see that there are no coefficients with the absolute value 52. To illustrate it better, consider a case where there are two S-Boxes with the nonlinearity 102: S-Box and S-Box with their respective Walsh--Hadamard spectra given in Figures 1 and 2. When using our cost function, is considered better since it has fewer coefficients with the absolute value of 52.

Figure 1:

The Walsh--Hadamard spectrum of an S-Box , size , .

Figure 1:

The Walsh--Hadamard spectrum of an S-Box , size , .

Figure 2:

The Walsh--Hadamard spectrum of an S-Box , size , .

Figure 2:

The Walsh--Hadamard spectrum of an S-Box , size , .

Figure 3:

The Walsh--Hadamard spectrum of an S-Box, size , .

Figure 3:

The Walsh--Hadamard spectrum of an S-Box, size , .

The results obtained with our initial version of cost function are presented in Tables 11 and 12. In the latter table, we show the speed of convergence for our cost function for different nonlinearity values and S-Box size.

Table 11:
Results obtained using the initial version of our cost function. Presented values are best and average solutions found in format nonlinearity/-uniformity.
GAGaTLSA
S-Box size/AlgorithmBestAvgBestAvgBestAvg

GAGaTLSA
S-Box size/AlgorithmBestAvgBestAvgBestAvg

Table 12:
Average number of evaluations needed with the initial version of our cost function, S-Box size .
Algorithm/Nonlinearity98100102104
GA 416   NPR
GaT 453   NPR
LSA 468   NPR
Algorithm/Nonlinearity98100102104
GA 416   NPR
GaT 453   NPR
LSA 468   NPR

The initial version of our cost function gives promising results, but not as good as those acquired with the Clark’s cost function. To improve our cost function, we consider now up to coefficients in the Walsh--Hadamard spectrum. By doing so, we aim to decrease the number of any of those absolute coefficients.

The “importance” of reducing the number of the absolute value coefficients is proportional with the coefficients' absolute values. To state it differently, the most important thing is to reduce the number of maximum absolute value coefficients and then in the decreasing order the number of all absolute value coefficients. Therefore, one can expect good results if one reduces only the biggest absolute coefficient values. However, to improve the obtained results, one should also consider smaller values of coefficients.

Let be a histogram of absolute values of the Walsh--Hadamard coefficients for an S-Box . It is a vector having on position the number of coefficients equal to in the Walsh--Hadamard spectrum of an S-Box . Let denote the maximal (last) position in this vector with the nonzero value. Then the maximum absolute value of the Walsh--Hadamard coefficients for an S-Box is and this coefficient determines the S-Box nonlinearity. Now, our cost function is given as:
12
where is a cost function applied to an S-Box , is the -th component of a zero-indexed vector , and we set . Note when , this expression reduces to our initial cost function.

In Equation (12), by using the terms we set the aforementioned “importance” of reducing the number of absolute value coefficients in the vector . We see from the expression that the number of coefficients with the maximum absolute value will be multiplied with the largest value of the term (which is ) making it the most influential while computing the value of the cost function.

As stated in Section 5.5, in our experiments we set to the value of 10. Note that if an S-Box does not have 10 coefficient levels (i.e., ), we then take all the possible coefficients into account when calculating the value of the cost function. For a concrete example we take S-Box with the nonlinearity 104 with its Walsh--Hadamard spectrum shown in Figure 3. Here, we take the frequency of coefficients with the value , then we add the number of coefficients with the value divided by and continue, as defined in Equation (12), all the way until the number of coefficients with the value divided by is added. The results obtained with the final version of our cost function are presented in Tables 13 and 14.

Table 13:
Results obtained using the final version of our cost function. Presented values are best and average solutions found in format nonlinearity/-uniformity.
GAGaTLSA
S-Box size/AlgorithmBestAvgBestAvgBestAvg

GAGaTLSA
S-Box size/AlgorithmBestAvgBestAvgBestAvg

Table 14:
The average number of evaluations needed with the final version of our cost function, S-Box size .
Algorithm/Nonlinearity98100102104
GA 473 6184
GaT 172 751 4519
LSA 387 1003 4362
Algorithm/Nonlinearity98100102104
GA 473 6184
GaT 172 751 4519
LSA 387 1003 4362

### 6.3  Result Comparison

Here, we provide a comparison of results acquired with the Clark’s and our cost functions. In Table 15 are given the maximum nonlinearity values acquired with the aforesaid cost functions. Naturally, since the problem of nonlinearity is a discrete one, where we have only a small number of possible values, it is hard to conclude which of the cost functions performs the best.

Table 15:
The maximum nonlinearity value found for each cost function.
Clark’s cost functionOur cost function
Cost function

10 10 10 10
22 22 22 22
46 48 48 48
104 102 104 104
Clark’s cost functionOur cost function
Cost function

10 10 10 10
22 22 22 22
46 48 48 48
104 102 104 104

Next, we present algorithms’ convergence for an S-Box of size. In Figure 4, we compare convergence of the Clark’s function with the parameters  (Clark et al., 2005) and  (Tesař, 2010). Based on the results presented, we conclude that the parameter pair is indeed better than . The convergence speed of the pair is much better and with the GaT algorithm it repeatedly manages to find S-Boxes with the nonlinearity value 104. On the other hand, with the parameters none of the three algorithms used could repeatedly produce S-Boxes with the nonlinearity 104. However, as mentioned in Section 6.1, for smaller S-Box sizes, parameter pair gives worse results than the parameter pair . In order to achieve the best results using the Clark’s function we would have to perform a more exhaustive parameter tuning for every S-Box dimension, which is of course possible, but is a time-consuming process.

Figure 4:

Convergence of algorithms GA, GaT, and LSA using the Clark’s cost function with two sets of parameter pairs: and , S-Box size .

Figure 4:

Convergence of algorithms GA, GaT, and LSA using the Clark’s cost function with two sets of parameter pairs: and