The performance of image classification is highly dependent on the quality of the extracted features that are used to build a model. Designing such features usually requires prior knowledge of the domain and is often undertaken by a domain expert who, if available, is very costly to employ. Automating the process of designing such features can largely reduce the cost and efforts associated with this task. Image descriptors, such as local binary patterns, have emerged in computer vision, and aim at detecting keypoints, for example, corners, line-segments, and shapes, in an image and extracting features from those keypoints. In this article, genetic programming (GP) is used to automatically evolve an image descriptor using only two instances per class by utilising a multitree program representation. The automatically evolved descriptor operates directly on the raw pixel values of an image and generates the corresponding feature vector. Seven well-known datasets were adapted to the few-shot setting and used to assess the performance of the proposed method and compared against six handcrafted and one evolutionary computation-based image descriptor as well as three convolutional neural network (CNN) based methods. The experimental results show that the new method has significantly outperformed the competitor image descriptors and CNN-based methods. Furthermore, different patterns have been identified from analysing the evolved programs.

Image classification is concerned with categorising images into a predetermined set of classes based on the visual content of those images, and represents an essential task in computer vision that has received increasing attention over the past few decades. Human beings have a very advanced and complicated visual system that allows the visual content of images to be analysed and understood efficiently. However, trying to mimic this ability with machines is very difficult and many researchers have investigated some small components, for example, region of interest detection, classification, and feature extraction, of the whole human visual system puzzle. The majority of machine learning algorithms were designed to deal with features rather than raw data. To extract good features from images, it is important to find a reliable set of image keypoints such as corners, vertical lines, horizontal lines, and shapes. Conventionally, such keypoints are manually designed by a domain expert, and detecting them in an image may also require human intervention to manually annotate each keypoint or develop a method to automatically search for them. Performing such operations (labelling and designing) is a time-consuming task. Hence, several image descriptors have emerged that aim to automatically detect one or more image keypoints and generate the corresponding feature vector. In other words, the ultimate goal of an image descriptor is to extract features from an image. The Harris corner detector (Harris and Stephens, 1988), scale-invariant feature transform (SIFT) (Lowe, 1999), local binary patterns (LBP) (Ojala et al., 1994), and their recent variants (Wang et al., 2012; Satpathy et al., 2014) are some typical image descriptors that are widely used in computer vision and pattern recognition domains. These image descriptors can be divided into sparse and dense categories based on the mechanism by which they generate a feature vector for an image. 
The methods of the former group extract features from some parts of the image that mostly are small cut-outs, also known as patches, of the original image. The methods of the latter group operate in a pixel-by-pixel fashion, where each pixel in the image contributes towards generating the feature vector. Although these image descriptors have facilitated the process of generating feature vectors for an image, they have three main issues. First, designing such descriptors still requires domain-expert intervention to develop different components. Second, some of them are not robust to various deformations such as rotation, illumination and scale, and extending them to handle such deformations may require major changes. Third, some descriptors are designed to detect only a group of predetermined keypoints that are expected to produce good results.

Convolutional neural networks (CNNs) can operate directly on the pixel values and have been utilised to effectively tackle various real-world problems such as object detection (Galvez et al., 2018), texture image classification (Hafemann et al., 2015), and image descriptors (Bello-Cerezo et al., 2019). However, CNN-based approaches have three main limitations: (1) Human experts are needed to design the architecture of these networks and specify the hyper-parameters; (2) the resulting model is a blackbox model that, if feasible, is very difficult to interpret; and (3) a large number of examples is needed in order to train such models. Although several evolutionary methods have been proposed to tackle the first limitation (automatically evolving the network structure and/or tuning the hyper-parameters), these approaches typically are very expensive and have their limitations (Sun et al., 2019). Regarding the second limitation, some attempts to interpret the trained model have been made (Simonyan et al., 2014); however, they are still far from producing insightful/meaningful interpretations of the different components, for example, weights and feature maps. Regarding the third limitation (number of training examples), labelled data are not always available and can be very costly to obtain in some domains, for example, the medical domain, due to the requirements of specialised devices and domain experts. The proposed method in this study has been specifically designed to automatically evolve an image descriptor using few instances per class. Furthermore, the same instances that were used during the evolutionary process are also used to train a classifier. Hence, the classifiers are trained using only a few images/instances per class. It is well-known (also shown in our experiments in this study) that NN-based methods cannot cope with such a small number of training examples to build an effective model (Bartlett and Maass, 2003). 
Transfer learning and data augmentation (Napoletano, 2017) are two approaches that have been adopted in the literature to address this limitation in NN-based and other methods. However, the focus of this work is to automatically evolve an image descriptor that can lead to good performance when there are only a few training instances.

It is worth mentioning that domain knowledge is often required to adopt transfer learning, particularly to address the three main questions (Pan and Yang, 2010): (1) what to transfer, (2) how to transfer, and (3) when to transfer. The data augmentation approach, on the other hand, can help with the number of training examples but does not solve the training efficiency problem. For example, in Napoletano (2017) over 600, 1,000, 4,500, and 750,000 instances were used to train the model on four datasets (that are also used in this study), whereas only two instances per class are used with our method.

One mechanism to perform data augmentation is to extract multiple patches (small image regions) from each of the original images. For example, many 100$×$100 pixels patches can be generated from an image with size 1000$×$1000 pixels. However, this is infeasible in this study as the largest instance size is 128$×$128 pixels (more details in Section 4.1).

Few-shot learning is important for texture classification and other domains. Reducing the number of instances in the training set largely affects the accuracy and effectiveness of the trained model. Skin or breast cancer (or medical imaging in general) can benefit from having a method that can cope with a small number of training instances as (1) doctors are needed to label instances (very costly), (2) some hospitals/clinics do not have many instances (as the number of patients is small in some cities/countries), and (3) the variation in the texture between benign and malignant regions is an important feature that can largely help cancer detection in medical images.

Genetic programming (GP) is a broadly used evolutionary computation (EC) technique that is designed based on the principles of natural selection and survival of the fittest (Poli et al., 2008). Although GP does not differ largely from other EC techniques where the aim is to improve a population of randomly generated candidate solutions, GP has a flexible, typically tree-like, representation that makes it more desirable in many cases over other EC techniques (Poli et al., 2008).

Although the tree structure is commonly used in GP, it does not necessarily mean that other representations are not allowed or that GP is limited to only this type of representation. Many other individual representations, for example, linear GP (Oltean et al., 2009), multitree GP (Lee et al., 2015), and Cartesian GP (Miller and Smith, 2006), have been utilised over the past 30 years (Poli et al., 2008).

GP has been shown to effectively tackle issues in various domains and applications (Bhanu et al., 2005; Poli et al., 2008; Espejo et al., 2010; Durasevic and Jakobovic, 2018).

Perez and her colleagues have proposed various GP methods for object detection by utilising GP to automatically combine different operators to synthesise a local image descriptor (Perez and Olague, 2009, 2013).

For real-world scene recognition, Liu et al. (2013) proposed a multiobjective GP-based method that aims to automatically synthesise domain-adaptive holistic image descriptors. The task has been addressed by adopting a multiobjective approach.

Liu et al. (2012) utilised GP for automatically generating low-level spatio-temporal image descriptors for human-action recognition. In this method, GP aims to automatically combine a set of primitive 3D operators; this is handled as an optimisation task. The generalisability of this method has been empirically demonstrated and the results show that this method has better performance than several existing handcrafted descriptors.

Similarly, in Liu et al. (2016) GP is applied to automatically evolve spatio-temporal descriptors that are scale- and shift-invariant to human-action recognition. The method aims to adaptively learn descriptors for various datasets by employing the multilayer structure to effectively use the full knowledge as a way to mimic the human visual cortex physical structure.

GP has been utilised and combined with support vector machines (SVMs) in Bhatt and Patalia (2015) to automatically learn spatial descriptors to help classify images of the Indian monuments that were captured and uploaded by various tourists. The overall algorithm comprises preprocessing, spatial descriptors evolving, and classification phases. The experiments show that promising results have been achieved by the proposed method.

Price and Anderson (2017) extended the improved evolutionary-constructed features proposed in Lillywhite et al. (2012). In Price and Anderson (2017), GP is utilised to build a richer and more powerful class of features via arithmetic combinations and compositions. This method has been extensively experimented, and the results show its superiority compared to the baseline method.

However, these methods have utilised the typical single-tree GP representation instead of having multiple trees to form the evolved solution. Utilising the multitree individual representation in GP has been shown to be very effective in tackling different problems in various domains. A preliminary work of this study is presented in Al-Sahaf et al. (2017a) and Al-Sahaf et al. (2017b). However, the analyses in both studies were very limited; therefore, this article aims to provide more insight into the algorithm and the evolved programs by analysing the evolutionary and evaluation times, convergence, and program size, and to thoroughly investigate two best evolved programs to identify some potential patterns.

### Goals

The overall goal of this article is to significantly extend the proposed method in Al-Sahaf et al. (2017b) by describing the different components of the method, by assessing and comparing its performance using 7 benchmark texture image classification datasets and 7 state-of-the-art image descriptors, and by analysing the convergence behaviour and solution complexity. Specifically, this article is concerned with the following objectives:

• Compare the performance of utilising $k$-NN and the features extracted by automatically evolved image descriptors to that of three CNN-based methods.

• Investigate the convergence of the method to find good image descriptors and to determine whether the crossover and mutation operators can effectively work when multitree representation is used.

• Analyse the required time to evolve (training) and evaluate (testing) a descriptor, and assess whether and how the instance size and the number of classes will influence these times.

• Analyse the size of the evolved descriptors and how the behaviour of the program varies during the evolutionary process.

• Reveal/discover patterns that can be drawn from the evolved programs on different datasets, and determine the interpretability of the evolved programs.

This section provides a survey on the closely related work, and briefly introduces the baseline methods.

### 2.1  Related Work

The main focus of this section is on the directly related work of using GP to evolve image descriptors, that is, keypoint detectors and feature extractors in images. Furthermore, some of the GP and multitree GP work is also briefly reviewed in this section.

Detecting interest points (keypoints) in an image is important, and Ebner and Zell (1999) utilised GP to automatically evolve an individual that detects interest points in an image. A similar work is carried out and extended in Olague and Trujillo (2011).

Detecting edges in an image represents an important preprocessing task that helps in identifying the boundaries and shapes of the different objects and regions. GP has been successfully utilised in Fu et al. (2015) to automatically evolve a model that constructs invariant features for edge detection. Furthermore, the method considers how the observations from different GP programs are distributed in order to enhance the extracted features from raw pixel values. Several experiments have been conducted, and the experimental results show that the automatically constructed features by the evolved GP individuals combined with the distribution estimation have a positive impact on the detection performance. The method outperforms the combination of linear SVMs and a Bayesian model on the tested datasets.

A method for automatically generating prototypes in classification by utilising the multitree representation in GP has been proposed (Cordella et al., 2005). Furthermore, the number of trees in this method is dynamically determined, which allows having individuals with a varying number of trees. The main motivation behind this is to handle the situation in which different subclasses are combined into a single large class. The experiments conducted in Cordella et al. (2005) on three benchmark datasets show that the evolved programs by this dynamic multitree GP representation have significantly outperformed the competitive methods.

The GP multitree representation is utilised in conjunction with information theory by Boric and Estevez (2007) for clustering. Information theory is used to develop a measure, that is, fitness function, that reflects the goodness of an evolved program to perform the clustering task. In this method, no conflict resolution phase is needed to interpret the outputs obtained from the different trees of the same individual, which is achieved by adopting a probabilistic approach. The experimental results on 10 benchmark datasets reveal the superiority of their method, which outperformed the widely used $k$-means clustering method.

The structural genes in a living organism were the main source of inspiration for Benbassat and Sipper (2010) and they have utilised strongly typed GP (STGP) (Montana, 1995) to the zero-sum, deterministic, full-knowledge board game of Lose Checkers by adopting the multitree GP representation. Their method aims to enable GP to automatically discover effective strategies for the Lose Checkers game. In their method, STGP is employed with explicitly defined local mutation and introns. Their experiments reveal the applicability of the method to automatically evolve effective strategies to a full and nontrivial board game that can potentially achieve comparable performance to handcrafted machine players.

A method to discover an efficient set of patterns by utilising multitree GP to the task for self-assembling swarm robots has been proposed by Lee et al. (2013). Those automatically discovered patterns are then incorporated into the modules. The effectiveness of those discovered patterns has been revealed by the results of the conducted experiments in Lee et al. (2013).

The problem of building rotation-invariant image descriptors is approached by Al-Sahaf, Al-Sahaf et al. (2017) who proposed an STGP-based method that automatically combines simple arithmetic operators and first-order statistics. The evolved descriptors are rotation-invariant and their experiments on six texture classification datasets show the potential of the method to outperform handcrafted descriptors.

Multitree GP representation has not yet been utilised to evolve texture image descriptors. Such a representation allows the program to perform different tasks or explore different regions of the solution space by each tree.

### 2.2  Background

The proposed method in this study is motivated by a widely used and powerful hand-crafted image descriptor known as local binary pattern (Ojala et al., 1994), and it extends a recently proposed genetic programming descriptor (GP-criptor) method in Al-Sahaf, Al-Sahaf et al. (2017). Hence, this section briefly introduces the two methods, that is, LBP and GP-criptor.

#### 2.2.1  Local Binary Pattern

Local binary pattern (LBP), originally proposed by Ojala et al. (1994), represents one of the most frequently used and studied image descriptors in computer vision and pattern recognition. Similar to other dense image descriptors, LBP generates a feature vector for an image by scanning the image pixels using a sliding window. At each position of the sliding window, LBP relies on the values of the neighbouring pixels to calculate the value of the central pixel. Subsequently, a histogram based on the frequencies of the newly calculated values will be generated. Each bin in the LBP histogram represents a single feature and corresponds to a single image keypoint. Hence, the process in LBP at each position of the sliding window comprises three steps as shown in Figure 1.

Figure 1: Illustration of the LBP main steps.

The size of the sliding window and the number of pixels within that window are the two parameters that LBP relies on to specify the length of the generated feature vector. In LBP, the radius ($r$) is used to determine the size of the sliding window, where only those pixels equidistant from the central pixel are considered. However, only $p$ pixels (specified by the user) from the considered pixels are used to generate the feature vector. Figure 2 presents six examples, each of which uses different combinations of $r$ and $p$. In fact, to find which bin to increment at each position of the sliding window, LBP uses the following equations:
$LBP_{p,r}=\sum_{i=0}^{p-1}s(g_i-\hat{g})\,2^{i},\quad s(\gamma)=\begin{cases}0, & \gamma<0\\ 1, & \text{otherwise}\end{cases}$
(1)
$g_i=I(x_i,y_i)$
(2)
$x_i=\hat{x}+r\cos(2\pi i/p)$
(3)
$y_i=\hat{y}-r\sin(2\pi i/p)$
(4)
where the intensity of the central pixel of the current window and its coordinates are denoted as $\hat{g}$ and $(\hat{x},\hat{y})$, respectively, and the intensity of the $i$th neighbouring pixel and its coordinates are denoted as $g_i$ and $(x_i,y_i)$, respectively. The $s(·)$ function returns 0 if its argument is negative and 1 otherwise (including zero). Here, $I$ denotes the image being evaluated.

Figure 2: Illustration of the LBP$_{p,r}$ parameters.

The good performance of LBP has inspired many researchers to extend this image descriptor to address various limitations of the original algorithm such as illumination- and rotation-invariant issues. Interested readers can refer to Zhao et al. (2012), Kylberg and Sintorn (2013), and Yang and Chen (2013), which provide good reviews of LBP variants.
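
To make Equations (1) to (4) concrete, the following is a minimal Python sketch of the LBP$_{p,r}$ code at one window position and of the resulting histogram. It is an illustration only, not the reference implementation: the function names `lbp_code` and `lbp_histogram` are hypothetical, neighbour coordinates are rounded to the nearest pixel (an assumption; interpolation is a common alternative), and border pixels where the window does not fit are skipped (also an assumption).

```python
import math


def lbp_code(image, x_c, y_c, p=8, r=1):
    """Return the LBP code of the pixel at column x_c, row y_c (Eqs. 1-4)."""
    g_c = image[y_c][x_c]
    code = 0
    for i in range(p):
        # Eqs. (3) and (4): position of the i-th neighbour on a circle of radius r.
        x_i = x_c + r * math.cos(2 * math.pi * i / p)
        y_i = y_c - r * math.sin(2 * math.pi * i / p)
        # Eq. (2), assuming nearest-pixel rounding rather than interpolation.
        g_i = image[round(y_i)][round(x_i)]
        # Eq. (1): s(.) is 1 for a non-negative difference and 0 otherwise.
        if g_i - g_c >= 0:
            code += 2 ** i
    return code


def lbp_histogram(image, p=8, r=1):
    """Count how often each of the 2**p codes occurs (border pixels skipped)."""
    hist = [0] * (2 ** p)
    for y in range(r, len(image) - r):
        for x in range(r, len(image[0]) - r):
            hist[lbp_code(image, x, y, p, r)] += 1
    return hist
```

Each histogram bin corresponds to one binary pattern, so the feature vector length is $2^p$ under these assumptions.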

#### 2.2.2  GP Image Descriptor (GP-criptor$^{ri}$)

The proposed method in this article is based on the recently introduced rotation-invariant GP descriptor (GP-criptor$^{ri}$) (Al-Sahaf, Al-Sahaf et al., 2017). GP-criptor$^{ri}$ has been designed to tackle two main limitations of the existing methods. First, it aims to automatically evolve rotation-invariant image descriptors, without requiring human intervention, by utilising GP and a set of simple arithmetic operators and first-order statistics. Second, the evolved descriptors can cope with the limitation of having few training instances. GP-criptor$^{ri}$ uses the conventional single-tree representation to represent an individual. In other words, each program evolved by this method comprises a single root node that returns the output of the tree for the instance being evaluated. Furthermore, STGP is adopted to impose restrictions on the inputs and outputs of the different node types.

In GP-criptor$^{ri}$, the function set comprises the typically used arithmetic operators including $+$, $-$, $×$, and protected $/$. The protected division operator checks its second child and returns zero if that value is 0, which prevents division by zero. In addition, GP-criptor$^{ri}$ uses a special node type, code, which takes a user-defined number of children. Each individual must have exactly one node of this type, and it must be the root of the tree. This node applies a threshold (set to 0 in Al-Sahaf, Al-Sahaf et al., 2017) to its input values and returns a binary code, which is similar to $s(·)$ in LBP.

To evolve rotation-invariant descriptors, GP-criptor$^{ri}$ uses a set of first-order statistics that are order-invariant. Specifically, the terminal set in GP-criptor$^{ri}$ comprises four node types that are $min(·)$, $max(·)$, $mean(·)$, and $stdev(·)$. The first two types return the minimum and maximum values of the pixels under the current position of the sliding window. Furthermore, $mean(·)$ and $stdev(·)$ return the average and standard deviation of the pixel values of the window, respectively.
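
The order-invariance of the four terminals can be illustrated with a short Python sketch. The helper names `terminals` and `rotate90` are hypothetical, and the population standard deviation is used here as an assumption, since the text does not state which estimator is intended.

```python
import statistics


def terminals(window):
    """Return (min, max, mean, stdev) of all pixel values in a 2D window."""
    values = [v for row in window for v in row]
    # Assumption: population standard deviation (pstdev), not the sample one.
    return (min(values), max(values),
            statistics.mean(values), statistics.pstdev(values))


def rotate90(window):
    """Rotate a 2D window by 90 degrees clockwise."""
    return [list(row) for row in zip(*window[::-1])]
```

Because a 90-degree rotation only reorders the pixels inside the window, `terminals(window)` and `terminals(rotate90(window))` agree, which is the property the descriptor relies on.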

Tackling the limitation of having a very small number of training instances was accomplished in Al-Sahaf, Al-Sahaf et al. (2017) by relying on a fitness function that considers the distances between the instances. The fitness function is defined as
$fitness'=1-\frac{1}{1+e^{-5(D'_b-D'_w)}},$
(5)
where the average within-class distance is indicated by $D'_w$ and the average between-class distance is indicated by $D'_b$. These two components are calculated as
$D'_b=\frac{1}{z(z-n)}\sum_{\mathbf{u}\in R}\sum_{\mathbf{v}\in R\setminus\mathbf{u}}\chi^2(u,v),\quad\{u\in\mathbf{u},v\in\mathbf{v}\}$
(6)
$D'_w=\frac{1}{z(n-1)}\sum_{\mathbf{u}\in R}\sum_{u,v\in\mathbf{u}}\chi^2(u,v),\quad\{u\neq v\}$
(7)
where the number of instances per class and the number of classes are indicated by $n$ and $c$, respectively. The total number of instances in the training set is $z$. In Equations (6) and (7), $\mathbf{u}$ and $\mathbf{v}$ denote the groups of feature vectors in $R$ that share a class label, and $u$ and $v$ denote individual feature vectors within those groups. The training set ($R$) is represented as a set of feature vectors ($v_i$) and the corresponding class label ($\ell_i$) pairs. Therefore, $R=\{(v_i,\ell_i)\}$, where $i\in\{1,2,\ldots,z\}$, $v_i\in\mathbb{Z}^{+}$ (non-negative integers), and $\ell_i\in\{1,2,\ldots,c\}$. Furthermore, the widely used Chi-square ($\chi^2$) measure (Cha, 2007) is utilised to measure the distance between two normalised vectors ($u$ and $v$) of the same length; that is, both have the same number of elements ($E$), which is defined as
$\chi^2(u,v)=\frac{1}{2}\sum_{i=1}^{E}\frac{(u_i-v_i)^2}{(u_i+v_i)}$
(8)
where the corresponding $i$th element in the two vectors, that is, $u$ and $v$, is indicated as $u_i$ and $v_i$.

It is important to notice that the exponent in Equation (5) is multiplied by 5 to ensure the output of the fitness function is in the range [0,1].
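
As a worked illustration of Equations (5) to (8), the sketch below computes the baseline fitness for a training set grouped by class. The names `chi_square` and `gp_criptor_fitness` are hypothetical, every class is assumed to hold the same number $n$ of normalised histograms, and bins where both histograms are zero are skipped to avoid a 0/0 term (an assumption not stated in the text).

```python
import math


def chi_square(u, v):
    """Eq. (8): chi-square distance between two normalised histograms."""
    # Assumption: bins that are empty in both histograms contribute nothing.
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(u, v) if a + b > 0)


def gp_criptor_fitness(classes):
    """classes: list of classes, each a list of n normalised feature vectors."""
    n = len(classes[0])          # instances per class
    z = n * len(classes)         # total number of training instances
    # Eq. (6): average distance over all ordered pairs from different classes.
    d_b = sum(chi_square(u, v)
              for i, cu in enumerate(classes)
              for j, cv in enumerate(classes) if i != j
              for u in cu for v in cv) / (z * (z - n))
    # Eq. (7): average distance over all ordered pairs within the same class.
    d_w = sum(chi_square(u, v)
              for cu in classes
              for a, u in enumerate(cu)
              for b, v in enumerate(cu) if a != b) / (z * (n - 1))
    # Eq. (5): a larger gap (d_b - d_w) yields a smaller fitness value.
    return 1 - 1 / (1 + math.exp(-5 * (d_b - d_w)))
```

For two perfectly separated classes (identical histograms within a class, disjoint histograms between classes), $D'_b=1$ and $D'_w=0$, so the value is $1-1/(1+e^{-5})$, which is close to 0.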

This section discusses the proposed method and its key components. From now on, the acronym MGPD$_{t,w}^{ri}$, which stands for multitree GP rotation-invariant image descriptor, will be used to refer to this method, where $t$ and $w$ are the number of trees and the window size, respectively (more details are provided in Section 3.4).

### 3.1  Overall Algorithm

MGPD$_{t,w}^{ri}$ has a large overlap with the baseline GP-criptor$^{ri}$, and the overall evolutionary and evaluation processes are similar. These processes are discussed here in order to make this article self-contained.

As presented in Figure 3, the overall process comprises three phases: (1) preparing data, (2) evolving a descriptor, and (3) evaluating the evolved descriptor.

Figure 3: The overall algorithm of MGPD$_{t,w}^{ri}$.

In the data preparation phase, the instances of the dataset are divided into two sets: training and test. The former will be used in both subsequent phases, whereas the latter will be used only in the third phase. It is important to notice that the instances of each class are equally divided into two halves (50%:50%). Only two instances from each class are randomly selected from the first 50% to form the training set $S_{tr}$. The second 50% of the total instances will be used to form the test set $S_{ts}$. The data in both $S_{tr}$ and $S_{ts}$ are images (the raw pixel values).
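
The data preparation phase can be sketched in a few lines of Python. The function name `few_shot_split` and its parameters are hypothetical, and the two halves are taken in listing order, which is an assumption because the text does not say how the 50%:50% division is made.

```python
import random


def few_shot_split(instances_by_class, shots=2, seed=0):
    """Return (train, test) lists of (instance, label) pairs.

    instances_by_class: dict mapping a class label to a list of instances.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, instances in instances_by_class.items():
        half = len(instances) // 2
        # Assumption: the first half is simply the first 50% in listing order.
        first_half, second_half = instances[:half], instances[half:]
        # Only `shots` randomly chosen instances of the first half are used.
        train += [(x, label) for x in rng.sample(first_half, shots)]
        # The whole second half forms the test set.
        test += [(x, label) for x in second_half]
    return train, test
```

With `shots=2`, the training set holds twice as many instances as there are classes, and the remaining instances of the first half are left unused.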

The evolutionary process phase represents the main part of the overall algorithm in which an image descriptor is automatically evolved by GP. The training set ($S_{tr}$) is fed into GP, and the best evolved individual is returned at the end of the process after going through a number of generations.

To measure the effectiveness of an evolved descriptor, the evaluation phase needs to evaluate the accuracy of a classifier on the unseen test set. This phase comprises three tasks. The first task is to transform the images of both the training ($S_{tr}$) and test ($S_{ts}$) sets into the corresponding feature vectors by feeding them into the best evolved program at the end of the evolutionary process (more details will be provided in Section 3.4). The output of this process is the transformed training set ($R$) and the transformed test set ($S$). The aim of the second task is to train a classifier using one of the widely used classification algorithms. The third task is calculating the accuracy of the classifier on the transformed test set ($S$). In this study, $k$-NN is used to perform the classification task. Furthermore, the number of neighbours ($k$) is set to 1 mainly because there are only 2 instances of each class in the training set. Although only 2 instances per class are used in this study, the proposed method can cope with a larger number. However, 2 is the smallest number as the intra-class similarity is a main part of the fitness function (more details below).
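
The second and third tasks of the evaluation phase can be illustrated with a minimal 1-NN sketch. The names `predict_1nn` and `accuracy` are hypothetical, and the Chi-square distance is used as the neighbour metric purely as an assumption for this example; the text above does not state which distance $k$-NN uses.

```python
def chi_square(u, v):
    """Chi-square distance between two normalised histograms."""
    # Assumption: bins that are empty in both histograms contribute nothing.
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(u, v) if a + b > 0)


def predict_1nn(train, x):
    """Return the label of the training vector closest to x (k = 1)."""
    nearest_vector, nearest_label = min(train, key=lambda item: chi_square(item[0], x))
    return nearest_label


def accuracy(train, test):
    """Fraction of test pairs (vector, label) classified correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)
```

Here `train` plays the role of the transformed training set $R$ and `test` the role of the transformed test set $S$.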

### 3.2  Program Representation

Unlike typical GP, in which an individual is often represented as a single tree as depicted in Figure 4(a), a multitree program representation is used in this article as shown in Figure 4(b). Hence, each individual comprises a user-defined number of trees ($t$). A major difference between the two representations is the number of output values. Because an individual with a multitree representation returns multiple values for each instance, the way those outputs are interpreted and utilised can vary. The output of each tree in MGPD$_{t,w}^{ri}$ is fed into $s(·)$, which returns 0 if the input value is negative and 1 otherwise.

Figure 4: GP individual representations (a) single tree and (b) multitree.

#### 3.2.1  Terminal Set

The terminal set in MGPD$_{t,w}^{ri}$ is identical to that of the baseline method, that is, GP-criptor$^{ri}$, which consists of the $min(·)$, $max(·)$, $mean(·)$, and $stdev(·)$ node types. However, the restriction of preventing terminal nodes from being the root node is removed. In GP-criptor$^{ri}$, a terminal node cannot be the root due to the restriction that the root of the tree must be a code node. Although having a terminal node to be the root of a tree means that such a tree has only one node, this can be helpful during the evolutionary process when such a node is needed to improve the performance of another individual and can be swapped through the crossover operator.

#### 3.2.2  Function Set

The function set in MGPD$_{t,w}^{ri}$ comprises only the four arithmetic operators $+$, $-$, $×$, and protected $/$. Similar to GP-criptor$^{ri}$, each of these four operators takes two children, performs the corresponding operation, and returns a single value. The code node type represents a major difference between the two methods; the MGPD$_{t,w}^{ri}$ method does not have a code node type in the function set. Removing this type of node has a direct effect on the individual representation; in GP-criptor$^{ri}$, each individual must have a code node as the root of the tree. This restriction requires STGP to be utilised to define such structural rules between the nodes. As all nodes in MGPD$_{t,w}^{ri}$ have the same input and output types, STGP is not required. It is important to notice that utilising STGP also needs careful implementation of the crossover and mutation operators in order to preserve the closure property of the nodes. This means that the implementation of MGPD$_{t,w}^{ri}$ is simpler than that of GP-criptor$^{ri}$.
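
A multitree individual built from this terminal and function set can be sketched as follows. The encoding is hypothetical: a tree is either a terminal name (a string) or a tuple `(operator, left, right)`, and an individual is a list of such trees. Only the protected division and the thresholding by $s(·)$ follow the description in the text.

```python
def protected_div(a, b):
    """Return 0 when the second argument is 0, otherwise a / b."""
    return 0.0 if b == 0 else a / b


OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": protected_div,
}


def eval_tree(tree, terms):
    """Evaluate one tree given the terminal values of the current window."""
    if isinstance(tree, str):          # a terminal such as "min" or "mean"
        return terms[tree]
    op, left, right = tree             # a function node with two children
    return OPS[op](eval_tree(left, terms), eval_tree(right, terms))


def eval_individual(trees, terms):
    """Return one bit per tree: 0 if the tree output is negative, else 1."""
    return [0 if eval_tree(tree, terms) < 0 else 1 for tree in trees]
```

Since every node consumes and produces a single number, any subtree can replace any other, which is why no strongly typed constraints are needed in this sketch. A single terminal such as `"mean"` is also a valid tree.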

### 3.3  Fitness Function

In this article, a similar strategy to that of GP-criptor$^{ri}$ is adopted to overcome the problem of having a small number of instances in the training set. The distance between instances belonging to the same class (within-class), and the distance between instances belonging to different classes (between-class) are considered by MGPD$_{t,w}^{ri}$. Hence, the fitness function is defined as
$fitness=\alpha\times D_w+(1-\alpha)\times(1-D_b),$
(9)
where $D_w$ is the within-class distance component, and $D_b$ is the between-class distance component. To give different importance to these two components, a scale factor $\alpha\in[0,1]$ is used. These two distance components are calculated as
$D_w=\frac{1}{z}\sum_{\mathbf{u}\in R}\sum_{u\in\mathbf{u}}\max_{v}\chi^2(u,v),\quad\{v\in\mathbf{u},u\neq v\},$
(10)
$D_b=\frac{1}{z(c-1)}\sum_{\mathbf{u}\in R}\sum_{\mathbf{v}\in R\setminus\mathbf{u}}\min_{u,v}\chi^2(u,v),\quad\{u\in\mathbf{u},v\in\mathbf{v}\},$
(11)
where $c$ represents the total number of classes, $z$ is the number of instances in the training set (which is $2×c$ in this study as only two instances of each class are used for evolving a model), $R$ is the training set, and $\mathbf{u}$ and $\mathbf{v}$ denote the groups of feature vectors in $R$ that share a class label.

The underlying principle of this fitness function has some similarity to the Triplet loss function (Chechik et al., 2010), where the distance between instances of the different classes is an important aspect.

Although MGPD$_{t,w}^{ri}$ relies on distances to measure the quality of an individual as in GP-criptor$^{ri}$, the methods used to calculate those components are different in order to mitigate the effect of the outlier instances. In GP-criptor$^{ri}$, the average distance between each instance and all instances of the same class is considered as shown in Equation (7), whereas only the distance of the farthest (most dissimilar) instance is considered in MGPD$_{t,w}^{ri}$, as shown in Equation (10). Similarly, the average distance between each instance and all other instances belonging to different classes is used in GP-criptor$^{ri}$, as shown in Equation (6), while only the distance of the closest instance (most similar) is considered in MGPD$_{t,w}^{ri}$ as shown in Equation (11).
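
One possible reading of Equations (9) to (11) is sketched below in Python. Several points are assumptions rather than statements from the text: the names `chi_square` and `mgpd_fitness` are hypothetical, the default `alpha=0.5` is a placeholder value, bins that are empty in both histograms are skipped, and the minimum in Equation (11) is taken over all instance pairs drawn from each ordered pair of classes.

```python
def chi_square(u, v):
    """Chi-square distance between two normalised histograms (Eq. 8)."""
    # Assumption: bins that are empty in both histograms contribute nothing.
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(u, v) if a + b > 0)


def mgpd_fitness(classes, alpha=0.5):
    """classes: list of classes, each a list of >= 2 normalised vectors."""
    c = len(classes)
    z = sum(len(cls) for cls in classes)
    # Eq. (10): for every instance, keep only its farthest same-class instance.
    d_w = sum(max(chi_square(u, v) for b, v in enumerate(cls) if b != a)
              for cls in classes
              for a, u in enumerate(cls)) / z
    # Eq. (11), as read here: for every ordered pair of different classes,
    # keep only the smallest distance between their instances.
    d_b = sum(min(chi_square(u, v) for u in cu for v in cv)
              for i, cu in enumerate(classes)
              for j, cv in enumerate(classes) if i != j) / (z * (c - 1))
    # Eq. (9): small within-class and large between-class distances lower the value.
    return alpha * d_w + (1 - alpha) * (1 - d_b)
```

Using the farthest same-class instance and the closest other-class instance makes the value sensitive to the worst cases rather than to the averages used in Equations (6) and (7).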

### 3.4  Feature Vector Extraction

To generate the feature vector for an image, the evolved individual scans the image in a pixel-by-pixel fashion (as in dense image descriptors) starting from the top-left corner and ending at the bottom-right corner. As demonstrated in Figure 5, at each pixel, that is, position of the sliding window, the terminals, that is, minimum, maximum, mean, and standard deviation, are calculated based on the values of the neighbouring (including the current) pixels. The terminals represent the inputs of the individual's trees. The individual applies a threshold on the value returned by each tree, which is set to 0 in this study as in LBP. Therefore, the output of each tree will either be 0 (if the returned value was negative) or 1 (otherwise). These binary values are then concatenated to form a binary code that is converted into a decimal number, and the corresponding bin of the histogram is incremented (+1). The histogram represents the frequency of each keypoint present in the image.

Figure 5: An example demonstrating the steps to generate the feature vector.

There are two main parameters that must be defined before the evolutionary process starts. The first parameter is $t$, which represents the number of trees in each individual. This parameter directly affects the length of the feature vector (histogram) as each tree represents a bit in the generated binary code at each position of the sliding window. The second parameter is the size of the sliding window ($w$), that is, the number of neighbouring pixels that contribute to the terminal values at each position of the sliding window. As only a square-shaped window is used in this study, the window size is set to $w×w$.
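The extraction procedure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the trees are represented by hypothetical callables that take the four terminals, window positions that do not fit inside the image are skipped (the border handling is not specified here), and the population standard deviation is used.

```python
from statistics import mean, pstdev


def window_terminals(image, row, col, w):
    """Minimum, maximum, mean, and standard deviation of the w x w window at (row, col)."""
    half = w // 2
    values = [image[r][c]
              for r in range(row - half, row + half + 1)
              for c in range(col - half, col + half + 1)]
    return min(values), max(values), mean(values), pstdev(values)


def extract_histogram(image, trees, w=5):
    """Build the 2**t-bin histogram produced by an individual with t trees."""
    hist = [0] * (2 ** len(trees))
    half = w // 2
    # Assumption: only positions where the whole window fits are visited.
    for row in range(half, len(image) - half):
        for col in range(half, len(image[0]) - half):
            terminals = window_terminals(image, row, col, w)
            code = 0
            for tree in trees:
                bit = 0 if tree(*terminals) < 0 else 1  # threshold at 0
                code = (code << 1) | bit                # concatenate the bits
            hist[code] += 1
    return hist


# Toy usage: a constant 4 x 4 image and two hypothetical trees.
image = [[7] * 4 for _ in range(4)]
trees = [
    lambda mn, mx, mu, sd: mx - mn - 1,  # negative on flat regions -> bit 0
    lambda mn, mx, mu, sd: mu - 5,       # non-negative here -> bit 1
]
print(extract_histogram(image, trees, w=3))
```

With $t$ trees the histogram has $2^t$ bins, so the setting $t=9$ used later in this study corresponds to a 512-dimensional feature vector.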

A number of experiments have been conducted in this study to assess the performance of MGPD$t,wri$. This section describes the design of those experiments, discusses the parameter settings, and describes the benchmark datasets and methods.

### 4.1  Benchmark Datasets

Seven datasets are used in this study to investigate the quality of MGPD$t,wri$, which have been formed using four well-known and widely used benchmark datasets for texture classification. Here, these datasets are adapted to the few-shot setting. The instances of all seven datasets are grey-scale. These datasets vary in the number of classes, number of instances per class, number of rotation angles, illumination (lighting conditions), dimensions, and photographed materials.

#### 4.1.1  Brodatz Texture

Brodatz Texture$^{1}$ (Brodatz, 1999) is likely one of the most widely used texture datasets and comprises 112 classes in total. Each class consists of a single 640$×$640 pixel image. To generate the instances of each class, this image is divided into 84 nonoverlapping sub-images of 64$×$64 pixels each. Although the large image could be divided into a grid of 10$×$10 nonoverlapping tiles of 64$×$64 pixels, rotating it by an arbitrary angle around its centre causes some of those tiles to fall outside the boundaries of the image. Rotating the image by $45^{\circ}$ leaves only 85 complete tiles, and rotating it by $30^{\circ}$ leaves only 84 complete tiles enclosed within the boundaries of the image.

The first (BrNoRo) and second (BrWiRo) datasets in this study are formed from the Brodatz Texture dataset. Only 20 classes were randomly selected from Brodatz Texture to form these two datasets. The instances of BrNoRo are not rotated, giving 1680 [$=$ 20 (classes) $×$ 84 (instances)] instances in total; Figure 6 shows an example of each class. The instances of BrWiRo are rotated to 12 angles at successive $30^{\circ}$ increments between $0^{\circ}$ and $330^{\circ}$ (inclusive), that is, $\{0^{\circ}, 30^{\circ}, \ldots, 330^{\circ}\}$. The total number of instances in BrWiRo is 20160 [$=$ 20 (classes) $×$ 84 (instances) $×$ 12 (angles)]. Figure 7 shows the 12 rotation angles of an instance of BrWiRo.

Figure 6: Samples from the BrNoRo dataset.

Figure 7: A sample from the BrWiRo dataset presented in 12 rotation angles.

#### 4.1.2  Outex Texture Classification

Another widely used benchmark dataset is Outex Texture Classification$^{2}$ (OTC) (Ojala et al., 2002). OTC comprises 16 test suites for texture classification of varying difficulty. Some test suites are rotation-free, whereas others include rotation. Moreover, the illumination is not controlled/fixed across all instances in each test suite.

The third (OutexTC00) and fourth (OutexTC10) datasets in this study are formed using the instances of Outex_TC_00000 and Outex_TC_00010, respectively. These two datasets contain the same type of objects/materials; however, the latter is the rotated version of the former. Each of these two datasets has 24 classes, and the instances are 128$×$128 pixels in size (see Table 1). Each class of OutexTC00 consists of 20 instances. Those instances are rotated around the centre at 9 angles ($0^{\circ}$, $5^{\circ}$, $10^{\circ}$, $15^{\circ}$, $30^{\circ}$, $45^{\circ}$, $60^{\circ}$, $75^{\circ}$, and $90^{\circ}$) to form the instances of the OutexTC10 dataset. Hence, these two datasets comprise 480 [$=$ 24 (classes) $×$ 20 (instances)] and 4320 [$=$ 24 (classes) $×$ 20 (instances) $×$ 9 (angles)] instances, respectively. Samples from OutexTC00 and OutexTC10 are presented in Figures 8 and 9, respectively.

Figure 8: Samples from the OutexTC00 dataset.

Figure 9: A sample from the OutexTC10 dataset presented in 9 rotation angles.

#### 4.1.3  Kylberg Sintorn Rotation

The Kylberg Sintorn Rotation$^{3}$ (Kylberg, 2014) benchmark dataset was introduced in 2014 to investigate robustness with respect to rotation. The dataset comprises texture instances falling into 25 classes, each of which consists of 900 instances. Figure 10 presents an example from each class of this dataset, and Figure 11 presents a sample in 9 rotation angles. The instances of each class appear in 9 rotation angles at successive $40^{\circ}$ increments ($\{0^{\circ}, 40^{\circ}, \ldots, 320^{\circ}\}$). Therefore, there are 100 instances per rotation angle in each class. The total number of instances is 22500 [$=$ 25 (classes) $×$ 100 (instances) $×$ 9 (angles)]. The instances are 122$×$122 pixels each and are normalised to have a mean of 127 and a standard deviation of 40 to eliminate illumination changes between instances. Notably, six methods have been used to rotate the instances of this dataset: one hardware method and five interpolation methods (linear, nearest neighbour, B-spline, 3rd-order cubic, and Lanczos 3-kernel).

Figure 10: Samples from the KySinHw dataset.

Figure 11: A sample from the KySinHw dataset presented in 9 rotation angles.

In this study, the fifth dataset (KySinHw) is formed using the instances of the hardware rotation method as it is more realistic than the interpolation methods.

#### 4.1.4  Kylberg Texture

To form the sixth (KyNoRo) and seventh (KyWiRo) datasets of this study, the widely used Kylberg Texture$^{4}$ (Kylberg, 2011) benchmark dataset is used. This dataset is divided into two groups: without rotation (forming KyNoRo) and with rotation (forming KyWiRo). Both groups consist of the same number of classes (28). The major difference between the two groups is that the instances in the former are captured at a single, fixed rotation angle, whereas the instances of the latter are captured at 12 rotation angles between $0^{\circ}$ and $330^{\circ}$ (inclusive) at successive $30^{\circ}$ increments. The number of instances in each class therefore also differs between the two groups: there are 160 instances per class in the without-rotation group and 1920 instances [$=$ 160 (instances) $×$ 12 (angles)] per class in the with-rotation group. The instances are relatively large (576$×$576 pixels), and handling such instances can drastically increase the computational cost. Therefore, owing to resource limitations, the instances were re-sampled to 115$×$115 pixels via mean-pooling (see Table 1). Some samples of KyNoRo and KyWiRo are shown in Figures 12 and 13, respectively.

Figure 12: Samples from the KyNoRo dataset.

Figure 13: A sample from the KyWiRo dataset presented in 12 rotation angles.

To summarise the seven texture datasets used in this study, Table 1 lists the number of classes, total number of instances, number of rotation angles, and instance dimensions for each dataset on a single row. The datasets are listed in ascending order based on the number of classes and total number of instances.

Table 1: A summary of the datasets.

| Dataset | Classes | Instances | Rotations | Dimensions |
|---|---|---|---|---|
| BrNoRo | 20 | 1680 | | 64$×$64 |
| BrWiRo | 20 | 20160 | 12 | 64$×$64 |
| OutexTC00 | 24 | 480 | | 128$×$128 |
| OutexTC10 | 24 | 4320 | | 128$×$128 |
| KySinHw | 25 | 22500 | | 122$×$122 |
| KyNoRo | 28 | 4480 | | 115$×$115 |
| KyWiRo | 28 | 53760 | 12 | 115$×$115 |
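
The instance totals in Table 1 follow from the per-class counts and rotation angles given in Sections 4.1.1 to 4.1.4. The short check below recomputes them; the dictionary layout is an illustrative choice, and a single angle is assumed for the datasets without rotation.

```python
# (classes, instances per class per angle, rotation angles)
# Assumption: unrotated datasets are treated as having one angle.
datasets = {
    "BrNoRo": (20, 84, 1),
    "BrWiRo": (20, 84, 12),
    "OutexTC00": (24, 20, 1),
    "OutexTC10": (24, 20, 9),
    "KySinHw": (25, 100, 9),
    "KyNoRo": (28, 160, 1),
    "KyWiRo": (28, 160, 12),
}

totals = {name: c * n * a for name, (c, n, a) in datasets.items()}
for name, total in totals.items():
    print(f"{name:10s} {total}")
```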

### 4.2  Methods for Comparison

Various image descriptors currently exist, and studying the differences among all of them is beyond the scope of this study. To keep the comparison focused on the methods most closely related to MGPD$t,wri$, seven benchmark methods, including the baseline, are used here. Six of those methods are handcrafted and have been shown to achieve state-of-the-art performance for texture image classification: uniform local binary pattern (LBP$p,ru2$) (Ojala et al., 1996), uniform and rotation-invariant LBP (LBP$p,ru2ri$) (Ojala et al., 2000), completed LBP (CLBP$p,r$) (Guo et al., 2010), local binary count (LBC$p,r$) and completed LBC (CLBC$p,r$) (Zhao et al., 2012), and dominant rotation LBP (DRLBP$p,r$) (Mehta and Egiazarian, 2016). The baseline method, that is, GP-criptor$ri$, is also used to show whether the new representation and fitness function have any major impact on the performance.

As CNN methods are capable of operating directly on the raw pixel values and automatically perform feature extraction during the training process, three CNN-based methods with varying architectures have been included in the experiments of this study. These methods are the original implementation of LeNet (LeCun et al., 1998), a five-layer CNN (CNN-5) (Shao et al., 2014), and an eight-layer CNN (CNN-8) (Chollet et al., 2015).

### 4.3  Parameter Settings

Both the proposed and the benchmark methods have parameters that need to be set. Performing a sensitivity analysis on every parameter of those methods to find the optimal settings would be very expensive and time consuming. Therefore, only a few parameters are set based on experiments conducted in this study, and the vast majority are set based on the literature. The parameters of the two GP-based methods, that is, MGPD$t,wri$ and GP-criptor$ri$, are discussed first, followed by the parameters of the other methods.

#### 4.3.1  GP Methods

The evolutionary parameters of MGPD$t,wri$ and GP-criptor$ri$ have been kept identical for a fair comparison. The population size is set to 300, mainly because of the computational cost associated with handling a large number of individuals. Dealing with images is very expensive, especially when each image needs to be scanned pixel by pixel. The other evolutionary parameters are set based on Al-Sahaf, Al-Sahaf et al. (2017). The evolutionary process terminates when either an ideal solution with a fitness value of 0 is found or the 50th generation is reached. The crossover and mutation rates are 0.8 and 0.2, respectively, and the best 10 individuals in the current generation are copied directly to the population of the next generation to prevent degrading the best performance found so far. The minimum and maximum depths of the program tree are 2 and 10, respectively. The individuals are generated using ramped half-and-half, and tournament selection of size 7 is used.

MGPD$t,wri$ and GP-criptor$ri$ also have non-evolutionary parameters. The number of children under the code node in GP-criptor$ri$ is set to 9 based on the observation of Al-Sahaf, Al-Sahaf et al. (2017). Similarly, the number of trees ($t$) in MGPD$t,wri$ is set to 9, as this parameter specifies the length of the generated feature vector in the same way as the code node does in GP-criptor$ri$. The size of the sliding window ($w$) is set to 5, that is, 5$×$5 pixels, in both methods, as it has been observed to give the best results in Al-Sahaf, Al-Sahaf et al. (2017).

The $α$ parameter is a scaling factor that is introduced into the fitness function of MGPD$t,wri$ to balance the between-class and within-class distance components. The value of this parameter ranges between 0.0 and 1.0; therefore, 11 values at successive steps of 0.1, that is, $\{0.0, 0.1, 0.2, \ldots, 1.0\}$, are used to investigate the effect of this parameter on the performance. Figure 14 shows the average performance of MGPD$t,wri$ over 30 independent runs on KyNoRo for those 11 values of $α$. It can be observed that the system was able to evolve better solutions when the between-class distance was given more importance than the within-class distance. This was expected, as it is very easy to find a model that generates identical, that is, constant, feature vectors regardless of the input images; such a model trivially minimises the within-class distance. This is reflected by the very poor performance when $α$ is set to 1.0, in which case the system relies completely on, and tries to minimise, the within-class distance. In this study, $α$ has been set to 0.1 because it shows good performance, as depicted in Figure 14.

Figure 14: The sensitivity of the $α$ parameter on KyNoRo.

#### 4.3.2  Non-GP Methods

The non-GP benchmark methods also comprise a number of parameters. The majority of those parameters were set based on the corresponding original papers of those methods. Here, these settings are kept similar to those found in Zhao et al. (2012), Rassem and Khoo (2014), and Al-Sahaf, Al-Sahaf et al. (2017), which are $p=24$ (number of neighbouring pixels) and $r=3$ (the radius) in LBP$p,ru2ri$, CLBP$p,r$, LBC$p,r$, and CLBC$p,r$; these two parameters are set to 8 and 1, respectively, in LBP$p,ru2$ and DRLBP$p,r$. In other words, the methods LBP$24,3u2ri$, CLBP$24,3$, LBC$24,3$, CLBC$24,3$, LBP$8,1u2$, and DRLBP$8,1$ are used in this study as non-GP benchmark methods.

The widely used rectified linear unit (ReLU) (Nair and Hinton, 2010) activation function and softmax classification function are utilised in this study for LeNet, CNN-5 and CNN-8 (Chollet et al., 2015).

### 4.4  Performance Measure

The quality of the studied image descriptors is measured by the accuracy of 1-NN ($k$-NN with $k=1$) on the test set, using only the two randomly selected instances from each class as the training set (strictly speaking, a knowledge base, as $k$-NN does not have a training phase).
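
As a concrete illustration of this measure, the sketch below computes 1-NN accuracy from feature vectors. It is a minimal sketch: the squared Euclidean distance, the function names, and the toy data are assumptions and are not taken from the experimental setup.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance (an assumed metric for this illustration)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def one_nn_accuracy(train_x, train_y, test_x, test_y, dist=squared_euclidean):
    """Percentage of test instances whose nearest stored instance has the same label."""
    correct = 0
    for x, y in zip(test_x, test_y):
        nearest = min(range(len(train_x)), key=lambda i: dist(x, train_x[i]))
        correct += train_y[nearest] == y
    return 100.0 * correct / len(test_x)


# Toy usage: two classes, two stored instances per class (hypothetical features).
train_x = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
train_y = ["a", "a", "b", "b"]
test_x = [[0.2, 0.2], [0.8, 0.7], [0.6, 0.6], [0.1, 0.3]]
test_y = ["a", "b", "a", "a"]
print(one_nn_accuracy(train_x, train_y, test_x, test_y))
```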

It is important to note that all non-GP image descriptors are handcrafted and deterministic, whereas both GP-criptor$ri$ and MGPD$t,wri$ are stochastic methods. To support more reliable conclusions, the two stochastic methods have been executed 30 times using different seed values, and the average accuracy ($\bar{x}$) and standard deviation ($s$) over these 30 independent runs are reported. Therefore, in total, this experiment is executed 66 times [$=$ 30 (runs) $×$ 2 (methods) $+$ 1 (run) $×$ 6 (methods)] on each dataset.

The other important point is the randomness in the process of selecting the training set ($S_{tr}$), as described in Section 3.1. Because the selected instances can affect the results, the described experiment needs to be repeated multiple times to reduce this effect. Hence, the experiment is further repeated 10 times using different instances in $S_{tr}$ each time. The total number of runs on all 7 datasets is 4620 [$=$ 66 (runs) $×$ 10 (repeats) $×$ 7 (datasets)].
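
The run counts stated above can be recomputed directly; the variable names below are illustrative only.

```python
# Runs per dataset for one choice of training instances.
stochastic_runs = 30 * 2      # 30 seeds for each of the two GP-based methods
deterministic_runs = 1 * 6    # one run for each of the six handcrafted descriptors
runs_per_dataset = stochastic_runs + deterministic_runs

# Repeats with different training instances, over all datasets.
repeats = 10
num_datasets = 7
total_runs = runs_per_dataset * repeats * num_datasets

print(runs_per_dataset, total_runs)
```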

### 4.5  Implementation

The Java-based evolutionary computation package ECJ, version 24 (Luke, 2013), is used to implement MGPD$t,wri$ and GP-criptor$ri$. The implementations of uniform and rotation-invariant LBP$^{5}$, CLBP$^{6}$, and DRLBP$^{7}$ are freely available online. The implementations of both LBC and CLBC were obtained from the corresponding author (Zhao et al., 2012). Furthermore, all experiments have been carried out on machines running Linux version 3.7.5-1-ARCH, each with an Intel® Core™ i7-3770 CPU @ 3.40 GHz and 8 GB of memory, and Java version 1.8.0_144.

This section presents and discusses the results obtained from the experiments using eight image descriptors on seven benchmark texture classification datasets. The results of applying three CNN-based methods to those benchmark datasets are also presented and discussed in this section.

### 5.1  Image Descriptors

The experimental results are presented in Table 2; each column, apart from the first, corresponds to one dataset. The first row lists the names of the datasets, whereas the first column lists the names of the image descriptors. To determine whether the results achieved by MGPD$9,5ri$ are significantly different from those of the other methods, the non-parametric Wilcoxon signed-rank test (Demšar, 2006; Derrac et al., 2011) is used at the 5% significance level (95% confidence). The symbols $+$ and $-$ are used in Table 2 to indicate that MGPD$9,5ri$ has significantly better or significantly worse performance, respectively, than the corresponding method. Furthermore, the $=$ symbol is used to indicate that the difference in performance between MGPD$9,5ri$ and the corresponding method is not significant.

Table 2: Average accuracy (%) of 1-NN using eight image descriptors on the seven texture image datasets ($\bar{x} \pm s$).

| Method | BrNoRo | BrWiRo | OutexTC00 | OutexTC10 | KySinHw | KyNoRo | KyWiRo |
|---|---|---|---|---|---|---|---|
| LBP$8,1u2$ | 83.99 $±$ 1.95 $+$ | 42.29 $±$ 1.66 $+$ | 87.88 $±$ 1.23 $=$ | 34.45 $±$ 1.79 $+$ | 54.82 $±$ 2.18 $+$ | 75.45 $±$ 1.96 $+$ | 42.61 $±$ 1.84 $+$ |
| LBP$24,3u2ri$ | 68.49 $±$ 3.18 $+$ | 67.63 $±$ 2.62 $+$ | 69.50 $±$ 2.88 $+$ | 64.50 $±$ 1.05 $+$ | 81.76 $±$ 1.46 $+$ | 67.41 $±$ 2.94 $+$ | 69.46 $±$ 3.33 $+$ |
| CLBP$24,3$ | 82.37 $±$ 4.88 $+$ | 85.78 $±$ 2.70 $+$ | 81.00 $±$ 2.95 $+$ | 86.10 $±$ 2.39 $=$ | **97.31 $±$ 0.73 $-$** | **90.56 $±$ 1.16 $=$** | 88.97 $±$ 2.82 $=$ |
| LBC$24,3$ | 66.26 $±$ 2.80 $+$ | 64.49 $±$ 2.95 $+$ | 68.25 $±$ 3.81 $+$ | 60.50 $±$ 1.82 $+$ | 80.72 $±$ 1.51 $+$ | 66.09 $±$ 2.76 $+$ | 68.26 $±$ 3.59 $+$ |
| CLBC$24,3$ | 63.62 $±$ 2.52 $+$ | 70.84 $±$ 3.19 $+$ | 72.79 $±$ 2.87 $+$ | 75.53 $±$ 2.24 $+$ | 89.03 $±$ 1.86 $+$ | 76.67 $±$ 4.05 $+$ | 76.58 $±$ 3.77 $+$ |
| DRLBP$8,1$ | 83.17 $±$ 2.49 $+$ | 69.65 $±$ 2.41 $+$ | 84.96 $±$ 1.79 $+$ | 63.97 $±$ 2.57 $+$ | 85.30 $±$ 2.32 $+$ | 86.26 $±$ 1.14 $+$ | 74.05 $±$ 2.07 $+$ |
| GP-criptor$ri$ | 90.92 $±$ 1.94 $+$ | 92.49 $±$ 1.14 $=$ | 87.68 $±$ 1.87 $=$ | 86.82 $±$ 1.93 $=$ | 94.06 $±$ 1.63 $+$ | 86.66 $±$ 1.79 $+$ | 88.51 $±$ 1.39 $+$ |
| MGPD$9,5ri$ | **93.00 $±$ 1.82** | **92.94 $±$ 1.73** | **89.75 $±$ 3.12** | **87.82 $±$ 2.73** | 95.88 $±$ 1.23 | 90.53 $±$ 2.09 | **90.91 $±$ 1.51** |

On the first benchmark dataset (BrNoRo), MGPD$9,5ri$ has achieved an average 93.00% accuracy, which represents the best achieved performance on this dataset as presented in the second column of Table 2. The statistical significance test shows that MGPD$9,5ri$ has significantly outperformed the competitor methods. Moreover, the gap between the performance of MGPD$9,5ri$ and that of GP-criptor$ri$ is over 2%.

The proposed method shows the best performance compared with the other competitive methods on the second dataset (BrWiRo) with an average 92.94% accuracy. The results of the significance test show that, apart from the baseline method, MGPD$9,5ri$ has significantly better performance than that of the other methods. The average accuracy for MGPD$9,5ri$ over the 30 independent runs was greater than that for GP-criptor$ri$; however, the difference was not statistically significant.

The results of OutexTC00 are presented in the fourth column of Table 2. Similar to the previous two datasets, MGPD$9,5ri$ shows the best performance compared with all methods on this dataset, with an average 89.75% accuracy. Apart from LBP$8,1u2$ and GP-criptor$ri$, MGPD$9,5ri$ significantly outperformed the other benchmark methods with a minimum gap of 4.79% accuracy. The proposed method shows better performance than that of the LBP$8,1u2$ and GP-criptor$ri$ methods with over 1.80% and 2.0% average accuracy, respectively.

On the rotated version of OutexTC00 (OutexTC10), MGPD$9,5ri$ has achieved 87.82% average accuracy and is the best performing method on this dataset. The statistical significance test shows that MGPD$9,5ri$ has significantly better performance than the other benchmark methods, apart from CLBP$24,3$ (86.10%) and GP-criptor$ri$ (86.82%). Note that the performances of all methods, apart from CLBP$24,3$ and CLBC$24,3$, have dropped compared to their performances on the rotation-free dataset.

The sixth column of Table 2 presents the experimental results on the KySinHw dataset. The results show that MGPD$9,5ri$ has achieved 95.88% average accuracy on this dataset. Although MGPD$9,5ri$ was not the best performing method on this dataset (CLBP$24,3$ was), it was the second best. The significance test shows that although MGPD$9,5ri$ has significantly worse performance than CLBP$24,3$, it has significantly outperformed all the other benchmark methods.

The experimental results of the eight image descriptors on KyNoRo are presented in the seventh column of Table 2. The proposed method has achieved 90.53% average accuracy, which is the second best performance on this dataset with a difference of only 0.03% from the best performing method (CLBP$24,3$). The statistical significance test shows that MGPD$9,5ri$ has significantly outperformed the other benchmark methods apart from CLBP$24,3$ where the difference is not significant.

The last column of Table 2 lists the results on the rotated version of Kylberg (KyWiRo). Unlike KyNoRo, MGPD$9,5ri$ has achieved the best performance on this dataset with 90.91% average accuracy. Apart from CLBP$24,3$, the significance test shows that MGPD$9,5ri$ has significantly better performance than all the other methods.

### 5.2  Convolutional Neural Networks

CNN-based methods can operate directly on the raw pixel values and automatically perform feature extraction during the learning process. Hence, three CNN methods have been used with the same settings as the previous set of experiments; that is, only two instances per class are randomly drawn from the first half of the instances and used to train the model, whereas the instances of the second half are used to evaluate the trained model.

The average performance and standard deviation of 30 independent runs (using different seed values) for LeNet, CNN-5, and CNN-8 are presented in Table 3 and the best performing method on each dataset is presented in boldface font. Apart from BrNoRo, LeNet has achieved the best performance on the other six datasets compared with CNN-5 and CNN-8.

Table 3: Average accuracy (%) of three CNN methods on the seven texture image datasets ($\bar{x} \pm s$).

| Method | BrNoRo | BrWiRo | OutexTC00 | OutexTC10 | KySinHw | KyNoRo | KyWiRo |
|---|---|---|---|---|---|---|---|
| LeNet | 19.64 $±$ 6.56 $+$ | **12.03 $±$ 2.38 $+$** | **12.50 $±$ 2.33 $+$** | **7.49 $±$ 1.35 $+$** | **6.36 $±$ 1.78 $+$** | **8.79 $±$ 3.12 $+$** | **6.31 $±$ 2.00 $+$** |
| CNN-5 | **21.36 $±$ 6.56 $+$** | 12.01 $±$ 2.38 $+$ | 5.03 $±$ 2.33 $+$ | 4.81 $±$ 1.35 $+$ | 6.09 $±$ 1.78 $+$ | 5.39 $±$ 3.12 $+$ | 4.80 $±$ 2.00 $+$ |
| CNN-8 | 16.10 $±$ 3.97 $+$ | 9.60 $±$ 2.64 $+$ | 7.01 $±$ 3.39 $+$ | 5.82 $±$ 1.78 $+$ | 6.22 $±$ 1.95 $+$ | 6.29 $±$ 2.39 $+$ | 5.23 $±$ 1.81 $+$ |

The results clearly show that the CNN-based methods were, as expected, unable to cope well with the small number of instances per class. The three methods have produced very poor performance on the unseen data, which is significantly worse than the results of $k$-NN operating on the features extracted by any of the eight image descriptors in the previous experiment (see Table 2). It is important to note that no data augmentation was used here, although it can help to improve the performance of CNN-based methods. However, data augmentation is outside the scope of this study.

### 5.3  Results Summary

To summarise the experimental results, MGPD$9,5ri$ has significantly outperformed the other image descriptor methods in the majority of the cases, and it is the second best, if not the best, performing method on each of the experimented datasets. This method shows significantly better performance on 41 comparisons, 7 better or comparable (not significant) performances, and only 1 significantly worse performance of the 49 [=7 (benchmark methods) $×$ 7 (benchmark datasets)] comparisons.
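
The counts in this summary can be reproduced by tallying the significance symbols in Table 2. In the sketch below, each string lists one method's symbols in the column order of Table 2; the plain-text method labels are an illustrative encoding.

```python
from collections import Counter

# Symbols per method, in the dataset order:
# BrNoRo, BrWiRo, OutexTC00, OutexTC10, KySinHw, KyNoRo, KyWiRo
outcomes = {
    "LBP(8,1,u2)": "++=++++",
    "LBP(24,3,u2ri)": "+++++++",
    "CLBP(24,3)": "+++=-==",
    "LBC(24,3)": "+++++++",
    "CLBC(24,3)": "+++++++",
    "DRLBP(8,1)": "+++++++",
    "GP-criptor": "+===+++",
}

tally = Counter("".join(outcomes.values()))
print(tally["+"], tally["="], tally["-"], sum(tally.values()))
```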

Using $k$-NN with the features extracted by the image descriptors that the proposed method automatically evolved gave significantly better performance than the CNN-based methods on all seven datasets in this study.

This section aims to provide further analysis and discussion on different aspects of MGPD$t,wri$ in this study. The overall analysis is discussed first, which includes the convergence of the evolutionary process, program size, and time needed to evolve an image descriptor. The second part of this section focuses on analysing the automatically evolved image descriptors by MGPD$9,5ri$ on different datasets in order to provide insight into how such descriptors can perform well. Moreover, the analysis of such programs, that is, descriptors, can help to identify/learn significant patterns that may help in designing more powerful image descriptors.

### 6.1  Overall Analysis

During the evolutionary process, different factors of the best program at each generation are measured. Here, the convergence, program size, evolution time, and evaluation time of MGPD$t,wri$ are discussed.

#### 6.1.1  Convergence

The convergence of MGPD$9,5ri$ is measured by calculating the mean and standard deviation of the fitness value of the best program at each generation over 300 [$=$ 30 (runs) $×$ 10 (repeats)] independent runs for each dataset. Figures 15 to 21 each include two plots: the convergence is presented in the first, and the gap in fitness value between two subsequent generations is presented in the second. In other words, the $i$th bar ($b_i$) in the second plot of those figures is calculated as
$$b_i = f_{i-1} - f_i, \tag{12}$$
where $f_i$ is the average fitness of the $i$th generation, and $i \in \{1, 2, \ldots, 50\}$. The whiskers on the first plot of those figures represent the standard deviations. The plots show a similar pattern across the datasets: MGPD$9,5ri$ has made faster progress over the first 20 generations than over the subsequent 30 generations. For example, on the BrNoRo dataset, the system has reduced the fitness from 0.763 to 0.489 over the first 20 generations and then to 0.434 at the last generation, which means that over 83% [$=(0.763-0.489)/(0.763-0.434)$] of the fitness improvement happened early in the evolutionary process. This is also reflected by the high values of the left-most bars in the second plots of Figures 15 to 21. The system has made considerable jumps, that is, large improvements, in the first generation on some datasets. For example, on OutexTC00, OutexTC10, and KySinHw, the improvement in the first generation accounted for 15.1%, 14.8%, and 14.1%, respectively, of the improvement over the entire evolutionary process. Although each individual comprises multiple trees, using the conventional crossover and mutation operators, which modify a single tree at a time, did not prevent the system from finding better individuals throughout the evolutionary process, as shown in Figures 15 to 21. Furthermore, the standard deviation bars show that the system has evolved individuals that differ from each other across runs, albeit not greatly. This small variation shows the consistent ability of the system to evolve good solutions.
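
The gap between subsequent generations and the share of the improvement achieved early on can be computed as sketched below. The function names are illustrative, and the short fitness curve used for the gaps is hypothetical; only the three BrNoRo fitness values are taken from the text.

```python
def generation_gaps(fitness):
    """Gap between subsequent generations: fitness[i - 1] - fitness[i] for i >= 1."""
    return [fitness[i - 1] - fitness[i] for i in range(1, len(fitness))]


def early_share(f_start, f_mid, f_end):
    """Fraction of the total fitness improvement achieved between f_start and f_mid."""
    return (f_start - f_mid) / (f_start - f_end)


# Hypothetical fitness curve (lower is better), used only to show the gaps.
print(generation_gaps([3.0, 2.0, 1.5]))

# BrNoRo values quoted in the text: initial, generation 20, final generation.
share = early_share(0.763, 0.489, 0.434)
print(f"{share:.1%}")
```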

Figure 15: Analysis of the evolution progress of MGPD$9,5ri$ on BrNoRo.

Figure 16: Analysis of the evolution progress of MGPD$9,5ri$ on BrWiRo.

Figure 17: Analysis of the evolution progress of MGPD$9,5ri$ on OutexTC00.

Figure 18: Analysis of the evolution progress of MGPD$9,5ri$ on OutexTC10.

Figure 19: Analysis of the evolution progress of MGPD$9,5ri$ on KySinHw.

Figure 20: Analysis of the evolution progress of MGPD$9,5ri$ on KyNoRo.

Figure 21: Analysis of the evolution progress of MGPD$9,5ri$ on KyWiRo.

#### 6.1.2  Program Size

To understand the trend of growth in the program size during the evolutionary process, the size (number of nodes) of the best program at each generation is reported. Figure 22 shows the average program size measured over the 300 [=30 (runs) $×$ 10 (repeats)] independent runs for each of the seven datasets. The whiskers present the standard deviation statistics measured over the 300 runs for each generation.

Figure 22: The average size (number of nodes) of the best program per generation.

Generally, the system starts from a relatively small program that continues to grow during the evolutionary process, as presented in Figure 22. Having small/shallow individuals in the early generations is expected, as the maximum depth is 10 (Section 4.3.1). The plots in Figure 22 show that the increase in program size is smooth, which is also expected, as the conventional crossover and mutation operators used here affect the size of only a single sub-tree at a time. However, the variation in program size at later generations is higher than at earlier generations, as presented in Figure 22. This shows the system's ability to evolve compact/small individuals with good performance. This is an interesting aspect of the proposed system, as it increases the interpretability of the evolved individuals.

Although the program size increases by between 78% (KyWiRo) and 80% (KySinHw) from the initial generation to the last generation, the average total number of nodes remains below 600. In other words, each of the 9 trees has approximately 67 nodes. This means that such programs can be evaluated very quickly, as only simple arithmetic operations are performed to calculate the output of each tree [the evolutionary (training) and evaluation (testing) times are discussed in the next subsection].

#### 6.1.3  Evolutionary and Evaluation Times

Two important questions arise: (1) How long does it take to evolve a model/solution? (2) How fast is an evolved model at performing the feature extraction task? To answer these questions, the CPU time is measured separately for the evolutionary (training) and evaluation (testing) phases.

Regarding the evolution time, the average CPU time from generating the initial population to the last generation is measured. Figure 23a shows the average time in hours of 300 independent runs of MGPD$_{9,5}^{ri}$ on each of the seven datasets. The system takes on average between 2 hours (BrNoRo) and 11 hours (OutexTC10) to automatically evolve an image descriptor. Figure 23a also shows that the times differ considerably between the image groups (Brodatz, OutexTC, and Kylberg). These differences were expected, and two main factors directly affect the evolution time. The first factor is the size, that is, the dimensions, of each instance. MGPD$_{t,w}^{ri}$ is designed to evolve dense image descriptors, which means that each individual needs to scan each image pixel by pixel. Hence, larger images require more time to process. This can be observed by comparing the times of BrNoRo and OutexTC00: the former comprises instances of 64$×$64 pixels, whereas each instance of the latter is 128$×$128 pixels. The second factor is the total number of classes. Because the training set contains two instances of each class, it grows with the number of classes. Although the second factor has an impact on the evolution time, its impact is smaller than that of the first factor. This can be seen by comparing the resulting times of KySinHw and KyNoRo: although KyNoRo has more classes (28) than KySinHw (25), the average time for KySinHw is longer than that of KyNoRo.
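The relative weight of the two factors can be sketched with a simple count of tree evaluations. The function `window_evaluations` below is a hypothetical cost model, not the authors' measurement: it assumes one window per pixel (border handling is ignored), nine trees, and two training instances per class, and it ignores tree size and all other overheads.

```python
def window_evaluations(width, height, n_classes, n_trees=9, instances_per_class=2):
    """Tree evaluations needed for one individual to scan the training set once."""
    windows_per_image = width * height                 # assumed: one window per pixel
    training_images = n_classes * instances_per_class  # two instances per class
    return windows_per_image * n_trees * training_images


# Doubling each image dimension (64x64 -> 128x128) quadruples the work.
size_ratio = window_evaluations(128, 128, 28) / window_evaluations(64, 64, 28)

# Going from 25 to 28 classes at a fixed image size adds only 12%.
class_ratio = window_evaluations(115, 115, 28) / window_evaluations(115, 115, 25)

print(size_ratio)             # 4.0
print(round(class_ratio, 2))  # 1.12
```

Under these assumptions, a change in image dimensions moves the count far more than the difference between 25 and 28 classes, which is consistent with the observation that the first factor dominates.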

Figure 23: The average evolution time (hours) and evaluation time (milliseconds) for MGPD$_{9,5}^{ri}$ on the seven datasets.

To measure the evaluation time, only the best evolved program at the end of each run is used. The time to convert the entire test set from images to their corresponding feature vectors is measured, and the average time per instance is calculated. Figure 23b shows the average time per instance for the best evolved programs from 300 independent runs on each of the seven datasets. The results show that an image descriptor evolved by MGPD$_{9,5}^{ri}$ takes a few milliseconds to produce the feature vector for an image. On the Brodatz datasets (BrNoRo and BrWiRo), the average time does not exceed 15 milliseconds. The evaluation time, that is, the time for an individual to generate the feature vector for an image, varies among the datasets, which indicates that the size, that is, the dimensions, of the instance has a direct impact on the time required to generate the feature vector. In fact, the dimensions of the instance are the only factor, apart from the number of operators (terminal and function nodes), that can slow down feature vector extraction.
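The per-instance measurement described above can be sketched as follows. This is a minimal illustration using Python's `time.process_time` for CPU time; `average_time_per_instance_ms` and `toy_descriptor` are hypothetical names, and the toy descriptor is only a stand-in for an evolved program.

```python
import time


def average_time_per_instance_ms(descriptor, images):
    """Average CPU time (milliseconds) to turn one image into a feature vector."""
    start = time.process_time()
    for image in images:
        descriptor(image)
    elapsed = time.process_time() - start
    return 1000.0 * elapsed / len(images)


def toy_descriptor(image):
    """Stand-in for an evolved descriptor: three global statistics of the image."""
    pixels = [p for row in image for p in row]
    return [min(pixels), max(pixels), sum(pixels) / len(pixels)]


# Twenty synthetic 64x64 grey-scale images (values in 0..255).
images = [[[(x * y + k) % 256 for x in range(64)] for y in range(64)]
          for k in range(20)]

ms = average_time_per_instance_ms(toy_descriptor, images)
print(f"{ms:.3f} ms per instance")
```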

It is important to note that MGPD$_{t,w}^{ri}$ is not optimised, and different parts can be optimised to improve its efficiency (for both the evolutionary and evaluation procedures). For example, an evolved individual consists of multiple trees that can be evaluated in parallel. Moreover, different individuals can be evaluated in parallel, as there is no interaction among individuals during the fitness calculation. Using optimised platforms or programming languages that perform image processing operations more efficiently than ECJ and Java is another mechanism that can potentially improve MGPD$_{t,w}^{ri}$. Most likely, the simplest optimisation is precalculating the mean, standard deviation, minimum, and maximum statistics for each position of the sliding window instead of recalculating those values for each individual in every generation. This was not done in this study due to resource limitations, specifically physical memory.
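The precalculation idea can be sketched as a per-image cache of window statistics. Everything below is an assumption made for illustration: the names `window_statistics` and `cached_statistics`, the use of the population standard deviation, and the choice to skip windows that would extend past the image border. The study itself did not implement this optimisation.

```python
import statistics


def window_statistics(image, w=5):
    """Map (row, col) -> (min, max, mean, stdev) for every full w x w window."""
    half = w // 2
    height, width = len(image), len(image[0])
    stats = {}
    for r in range(half, height - half):
        for c in range(half, width - half):
            values = [image[rr][cc]
                      for rr in range(r - half, r + half + 1)
                      for cc in range(c - half, c + half + 1)]
            stats[(r, c)] = (min(values), max(values),
                             statistics.fmean(values),
                             statistics.pstdev(values))  # population stdev (assumed)
    return stats


_cache = {}


def cached_statistics(image_id, image, w=5):
    """Compute the statistics once per image and reuse them for every individual."""
    key = (image_id, w)
    if key not in _cache:
        _cache[key] = window_statistics(image, w)
    return _cache[key]


image = [[7] * 8 for _ in range(8)]       # constant 8x8 image
stats = cached_statistics("img0", image)
print(len(stats))      # 16 full 5x5 windows fit in an 8x8 image
print(stats[(2, 2)])   # (7, 7, 7.0, 0.0)
```

The cache stores four numbers per window position per image, so it trades memory for time, which matches the memory constraint mentioned above.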

Figure 24: Nine trees of an evolved program by MGPD$_{9,5}^{ri}$ on the KyWiRo dataset.

### 6.2  Evolved Image Descriptors

To gain deeper insight into the key factors of MGPD$_{9,5}^{ri}$, two of the automatically evolved, well-performing descriptors are analysed thoroughly in this section. Furthermore, the subtrees of each individual are simplified and presented as a list of formulae, each of which is represented as $v_i = \cdots$, where $i$ indicates the index of the corresponding subtree.

#### 6.2.1  An Evolved Descriptor on KyWiRo

A well-performing and relatively small image descriptor evolved by MGPD$_{9,5}^{ri}$ on the KyWiRo dataset is depicted in Figure 24. This program achieved 88.74% accuracy on the unseen data and was selected because it is easy to visualise in tree form. As shown in Figure 24, the program comprises 125 nodes in total, including 67 terminals and 58 functions. On average, each tree in this program consists of 14 nodes. Some of the trees, such as the second, third, fifth, and eighth, are very easy to interpret, while the others are more complicated. However, the trees can be simplified further to the following nine equations:

• $v_1=2×max×mean2-stdev2$

• $v_2 = 2 \times stdev - max + mean$

• $v_3 = stdev + min - mean$

• $v_4=max×stdev2-stdev-max×min+max×stdev+(stdev-min)(mean+min)$

• $v_5 = 3 \times stdev - min$

• $v_6=max×mean(-max+mean+stdev)min+max+mean×stdev$

• $v_7=stdev-max+mean+minstdev2mean$

• $v_8 = min - 2 \times stdev$

• $v_9=stdev×(max+min)max+min-mean$

An interesting point about these simplified equations is that they use the same components in different ways. For example, the fifth ($v_5$) and eighth ($v_8$) trees use only the $min$ and $stdev$ terminals. In $v_5$, the minimum value of the window is subtracted from the scaled standard deviation (multiplied by 3), whereas in $v_8$ the scaled standard deviation is subtracted from the minimum value. Moreover, the scaling factor of $stdev$ differs between $v_5$ (3) and $v_8$ (2).
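To make the role of each terminal concrete, the sketch below evaluates the four trees whose simplified forms are purely linear ($v_2$, $v_3$, $v_5$, and $v_8$) for a single window. The function name `linear_tree_outputs` and the sample statistics are assumptions chosen for illustration; how the tree outputs are turned into the final feature vector is not shown here.

```python
def linear_tree_outputs(mn, mx, mean, stdev):
    """Outputs of the second, third, fifth, and eighth trees for one window."""
    return {
        "v2": 2 * stdev - mx + mean,
        "v3": stdev + mn - mean,
        "v5": 3 * stdev - mn,   # minimum subtracted from the scaled stdev
        "v8": mn - 2 * stdev,   # scaled stdev subtracted from the minimum
    }


# Hypothetical window statistics: min=10, max=50, mean=30, stdev=12.
outputs = linear_tree_outputs(mn=10, mx=50, mean=30, stdev=12)
print(outputs)  # {'v2': 4, 'v3': -8, 'v5': 26, 'v8': -14}
```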

#### 6.2.2  An Evolved Descriptor on KyNoRo

The nine trees of the best image descriptor evolved by MGPD$_{9,5}^{ri}$ on the KyNoRo dataset are presented in Figure 25. In the simplified equations below, $min$, $max$, $mean$, and $stdev$ are substituted by $a$, $b$, $c$, and $d$, respectively, to make the equations more readable:

• $v_1=cb(b-2a)(b+a+d)(b+a)(2d2+3ad+bd-cd-cb+ba+a2)$

• $v_2=a+d22d2c+d+c-b$

• $v_3=d2cd+ddac+d(b+a)b+c-b+ab$

• $v_4=d2ac-dbad+2db-b2b+c2d2c+b-cdb-c-d$

• $v_5=ba+d-c-ac-dbbd+a-c$

• $v_6=a2-acd2+2ad-bdc-2d3+2c2b$

• $v_7 = a + 2d - c$

• $v_8=2d-a+ac$

• $v_9 = a + d - c$

Figure 25: Nine trees of an evolved program by MGPD$_{9,5}^{ri}$ on the KyNoRo dataset.

Before simplification, this program comprises 219 nodes, including 114 terminals and 105 functions. Evaluating such a program should not take long, as only simple arithmetic operations are performed. The program takes an average of 23 milliseconds to generate the feature vector for an image of size 115$×$115 pixels, and it took only 3 hours and 45 minutes to evolve. Inspecting the simplified trees reveals some interesting patterns. For example, the seventh ($v_7$) and ninth ($v_9$) trees have similar structures and use the same terminals ($min$, $mean$, and $stdev$); however, the standard deviation is scaled (multiplied by 2) in $v_7$.
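The relationship between the seventh and ninth trees can be made explicit with a one-line subtraction that uses only their two simplified formulae, with $a$, $c$, and $d$ denoting the minimum, mean, and standard deviation:

$$v_7 - v_9 = (a + 2d - c) - (a + d - c) = d$$

That is, for every window the two outputs differ by exactly the standard deviation $d$.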

This program achieved 91.12% accuracy on the test set, which is the best performance among the 300 independent runs. Table 4 shows the confusion matrix for this program on the test set. The program classified the instances of 10 classes perfectly, scored at least 80% on a further 16 classes, and performed poorly on only two classes (rug1 and wall1), as shown in Table 4. It struggled most to generate a good feature vector for the rug1 class, on which it achieved only 46% accuracy. The majority of the misclassified rug1 instances are confused with grass1. A closer inspection of a sample from each of these two classes, that is, rug1 and grass1, shows that they share similar characteristics, such as the structure of the grass leaves, as presented in Figure 26.

Figure 26: Samples taken from the KyNoRo dataset.

Table 4: Confusion matrix for the best evolved program on the KyNoRo dataset. The first row lists the labels predicted by 1-NN from the feature vectors generated by this program, and the first column lists the actual class labels.

blanket1 blanket2 canvas1 ceiling1 ceiling2 cushion1 floor1 floor2 grass1 lentils1 linseeds1 oatmeal1 pearlsugar1 rice1 rice2 rug1 sand1 scarf1 scarf2 screen1 seat1 seat2 sesameseeds1 stone1 stone2 stone3 stoneslab1 wall1 Accuracy
blanket1 69 11 0.86
blanket2 74 0.93
canvas1 80 1.00
ceiling1 66 14 0.83
ceiling2 80 1.00
cushion1 80 1.00
floor1 80 1.00
floor2 80 1.00
grass1 11 64 0.80
lentils1 80 1.00
linseeds1 80 1.00
oatmeal1 74 0.93
pearlsugar1 71 0.89
rice1 79 0.99
rice2 80 1.00
rug1 23 37 0.46
sand1 74 0.93
scarf1 80 1.00
scarf2 80 1.00
screen1 74 0.93
seat1 79 0.99
seat2 72 0.90
sesameseeds1 71 0.89
stone1 79 0.99
stone2 69 0.86
stone3 65 0.81
stoneslab1 74 0.93
wall1 17 11 50 0.63
Average 0.91
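The overall accuracy reported for this program can be re-derived from the diagonal counts in Table 4. The sketch below assumes 80 test instances per class, which is inferred from the rows where 80 correct predictions correspond to an accuracy of 1.00; for each class, the count used is the entry in its row that matches the reported accuracy.

```python
TEST_INSTANCES_PER_CLASS = 80  # assumption inferred from the table rows

# Correctly classified instances per actual class (diagonal of Table 4).
correct = {
    "blanket1": 69, "blanket2": 74, "canvas1": 80, "ceiling1": 66,
    "ceiling2": 80, "cushion1": 80, "floor1": 80, "floor2": 80,
    "grass1": 64, "lentils1": 80, "linseeds1": 80, "oatmeal1": 74,
    "pearlsugar1": 71, "rice1": 79, "rice2": 80, "rug1": 37,
    "sand1": 74, "scarf1": 80, "scarf2": 80, "screen1": 74,
    "seat1": 79, "seat2": 72, "sesameseeds1": 71, "stone1": 79,
    "stone2": 69, "stone3": 65, "stoneslab1": 74, "wall1": 50,
}

per_class = {name: n / TEST_INSTANCES_PER_CLASS for name, n in correct.items()}
overall = sum(correct.values()) / (TEST_INSTANCES_PER_CLASS * len(correct))
perfect = sum(1 for n in correct.values() if n == TEST_INSTANCES_PER_CLASS)

print(len(correct))        # 28 classes
print(f"{overall:.4f}")    # 0.9112, i.e. 91.12%
print(perfect)             # 10 perfectly classified classes
```

Under this assumption, 2041 of 2240 test instances are classified correctly, which agrees with the 91.12% accuracy stated in the text.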

### 6.3  Analysis Summary

The analysis shows that MGPD$_{t,w}^{ri}$ makes large improvements over the first few generations and continues to improve throughout the evolutionary process. The system tends to start from relatively small programs that grow larger during the evolutionary process in order to build better solutions. The size of the evolved descriptors varies largely between different runs of MGPD$_{9,5}^{ri}$, which shows that MGPD$_{9,5}^{ri}$ is flexible enough to produce both compact and large individuals for different problems. Although evolution is not fast, that is, it takes hours, the method is still faster than a domain expert manually designing an image descriptor, which may take several days. The analysis also shows that an evolved descriptor takes a few milliseconds to generate the feature vector for an image, which makes such models suitable for online applications where a fast response is needed. The analysis of two individuals reveals the ability of the system to combine the terminal and function components efficiently to build image descriptors that can potentially outperform descriptors designed by domain experts. Furthermore, the evolved models are interpretable and can be simplified and converted into human-readable equations.

In this article, a multitree GP representation has been successfully utilised for the task of automatically evolving image descriptors where only a few instances from each class are used to provide feedback on the quality of those descriptors. An evolved descriptor comprises a number of trees, and simple arithmetic operators and first-order statistics are automatically combined to form each tree. The experimental results on seven texture datasets show the superior performance of MGPD$_{9,5}^{ri}$ compared with that of seven handcrafted, expert-designed descriptors and those automatically evolved by the baseline method. This study has also thoroughly analysed some key factors of the evolutionary process and the evolved descriptors. The analysis shows that the method finds good candidate solutions over the first few generations and iteratively improves those solutions over the later generations. Furthermore, these descriptors do not require a long time to evolve, and they are very fast to evaluate as they comprise only a number of simple arithmetic operators. The interpretability of these descriptors is a very important factor that can help in optimising or simplifying them. Analysing some of the image descriptors automatically evolved by MGPD$_{9,5}^{ri}$ revealed that different patterns can be identified from them, for example, how function and terminal node types have been combined differently to form the subtrees of an individual.

### 7.1  Major Contributions

The study has made the following contributions:

• This is the first study that utilises multitree GP to automatically evolve image descriptors using two instances per class.

• The analysis reveals that the evolved descriptors are relatively fast to evolve and very fast to evaluate.

• The evolved descriptors are interpretable and can be simplified.

• The evolved descriptors have significantly outperformed both those manually crafted by domain experts and those automatically evolved by the state-of-the-art evolutionary method.

### 7.2  Future Work

Different directions can be investigated in the future, either to further improve the performance of MGPD$_{t,w}^{ri}$ or to elucidate the semantics of its different components. Apart from the minimum and maximum tree depth, there is no restriction on the program size in the current implementation. Different mechanisms, for example, multiobjective approaches, can be explored to reduce the program size, which can largely reduce the complexity of the evolved programs. We aim to investigate this in the future. We would also like to study the effect of crossover and mutation operators that apply multiple changes, instead of the conventional single change, to the selected program(s). Although this is believed to speed up the exploration of the search space and may lead to the identification of better individuals, it requires substantial changes. Moreover, the impact of the number of trees in an individual on the performance of MGPD$_{t,w}^{ri}$ can be studied in the future. The main goal of this article is to use few-shot learning in GP for evolving texture image descriptors. In the literature, transfer learning and data augmentation are also used to cope with a small number of instances. In the future, we will investigate the pros and cons of these methods for further improvements. Although the current design considers only grey-scale images, it can be extended to handle colour images, which is ongoing work.

This work was supported in part by the MBIE Data Science SSIF Fund under the contract RTVU1914, the Marsden Fund of New Zealand Government under Contracts VUW1615, VUW1913 and VUW1914, and the Science for Technological Innovation Challenge (SfTI) fund under grant E3603/2903.
