Abstract
The performance of image classification is highly dependent on the quality of the extracted features used to build a model. Designing such features usually requires prior knowledge of the domain and is often undertaken by a domain expert who, if available, is very costly to employ. Automating the process of designing such features can greatly reduce the cost and effort associated with this task. Image descriptors, such as local binary patterns, have emerged in computer vision; they aim to detect keypoints, for example, corners, line segments, and shapes, in an image and to extract features from those keypoints. In this article, genetic programming (GP) is used to automatically evolve an image descriptor using only two instances per class by utilising a multitree program representation. The automatically evolved descriptor operates directly on the raw pixel values of an image and generates the corresponding feature vector. Seven well-known datasets were adapted to the few-shot setting and used to assess the performance of the proposed method, which was compared against six handcrafted and one evolutionary computation-based image descriptor as well as three convolutional neural network (CNN) based methods. The experimental results show that the new method significantly outperforms the competitor image descriptors and CNN-based methods. Furthermore, different patterns have been identified from analysing the evolved programs.
1 Introduction
Image classification is concerned with categorising images into a predetermined set of classes based on the visual content of those images, and represents an essential task in computer vision that has received increasing attention over the past few decades. Human beings have a very advanced and complicated visual system that allows the analysis and understanding of the visual content of images efficiently. However, trying to mimic this ability with machines is very difficult, and many researchers have investigated small components, for example, region of interest detection, classification, and feature extraction, of the whole human visual system puzzle. The majority of machine learning algorithms were designed to deal with features rather than raw data. To extract good features from images, it is important to find a reliable set of image keypoints such as corners, vertical lines, horizontal lines, and shapes. Conventionally, such keypoints are manually designed by a domain expert, and detecting them in an image may also require human intervention to manually annotate each keypoint or to develop a method to automatically search for them. Performing such operations (labelling and designing) is time consuming. Hence, several image descriptors have emerged that aim to automatically detect one or more image keypoints and generate the corresponding feature vector. In other words, the ultimate goal of an image descriptor is to extract features from an image. The Harris corner detector (Harris and Stephens, 1988), scale-invariant feature transform (SIFT) (Lowe, 1999), local binary patterns (LBP) (Ojala et al., 1994), and their recent variants (Wang et al., 2012; Satpathy et al., 2014) are some typical image descriptors that are widely used in the computer vision and pattern recognition domains. These image descriptors can be divided into sparse and dense categories based on the mechanism by which they generate a feature vector for an image. The methods of the former group extract features from some parts of the image, mostly small cut-outs, also known as patches, of the original image. The methods of the latter group operate in a pixel-by-pixel fashion, where each pixel in the image contributes towards generating the feature vector. Although these image descriptors have facilitated the process of generating feature vectors for an image, they have three main issues. First, designing such descriptors still requires domain-expert intervention to develop the different components. Second, some of them are not robust to various deformations such as rotation, illumination, and scale, and extending them to handle such deformations may require major changes. Third, some descriptors are designed to detect only a group of predetermined keypoints that are expected to produce good results.
Convolutional neural networks (CNNs) can operate directly on the pixel values and have been utilised to effectively tackle various real-world problems such as object detection (Galvez et al., 2018), texture image classification (Hafemann et al., 2015), and image descriptors (Bello-Cerezo et al., 2019). However, CNN-based approaches have three main limitations: (1) human experts are needed to design the architecture of these networks and specify the hyper-parameters; (2) the resulting model is a black-box model that is very difficult, if at all feasible, to interpret; and (3) a large number of examples is needed to train such models. Although several evolutionary methods have been proposed to tackle the first limitation (automatically evolving the network structure and/or tuning the hyper-parameters), these approaches are typically very expensive and have their own limitations (Sun et al., 2019). Regarding the second limitation, some attempts to interpret the trained model have been made (Simonyan et al., 2014); however, they are still far from producing insightful/meaningful interpretations of the different components, for example, weights and feature maps. Regarding the third limitation (number of training examples), labelled data are not always available and can be very costly to obtain in some domains, for example, the medical domain, due to the requirements of specialised devices and domain experts. The proposed method in this study has been specifically designed to automatically evolve an image descriptor using few instances per class. Furthermore, the same instances that were used during the evolutionary process are also used to train a classifier. Hence, the classifiers are trained using only a few images/instances per class. It is well known (and also shown in our experiments in this study) that NN-based methods cannot cope with such a small number of training examples when building an effective model (Bartlett and Maass, 2003). Transfer learning and data augmentation (Napoletano, 2017) are two approaches that have been adopted in the literature to address this limitation in NN-based and other methods. However, the focus of this work is to automatically evolve an image descriptor that can lead to good performance when there are only a few training instances.
It is worth mentioning that domain knowledge is often required to adopt transfer learning, particularly to address its three main questions (Pan and Yang, 2010): (1) what to transfer, (2) how to transfer, and (3) when to transfer. The data augmentation approach, on the other hand, can help with the number of training examples but does not solve the training efficiency problem. For example, in Napoletano (2017) over 600, 1,000, 4,500, and 750,000 instances were used to train the model on four datasets (also used in this study), whereas only two instances per class are used with our method.
One mechanism to perform data augmentation is to extract multiple patches (small image regions) from each of the original images. For example, many 100×100-pixel patches can be generated from an image of size 1000×1000 pixels. However, this is infeasible in this study, as the largest instance size is 128×128 pixels (more details in Section 4.1).
Few-shot learning is important for texture classification and other domains, since reducing the number of instances in the training set largely affects the effectiveness of the trained model. Skin or breast cancer detection (or medical imaging in general) can benefit from a method that copes with a small number of training instances because (1) doctors are needed to label instances (very costly), (2) some hospitals/clinics do not have many instances (as the number of patients is small in some cities/countries), and (3) the variation in texture between benign and malignant regions is an important feature that can greatly help cancer detection in medical images.
Genetic programming (GP) is a broadly used evolutionary computation (EC) technique that is designed based on the principles of natural selection and survival of the fittest (Poli et al., 2008). Although GP does not differ largely from other EC techniques where the aim is to improve a population of randomly generated candidate solutions, GP has a flexible, typically tree-like, representation that makes it more desirable in many cases over other EC techniques (Poli et al., 2008).
Although the tree structure is commonly used in GP, it does not necessarily mean that other representations are not allowed or that GP is limited to only this type of representation. Many other individual representations, for example, linear GP (Oltean et al., 2009), multitree GP (Lee et al., 2015), and Cartesian GP (Miller and Smith, 2006), have been utilised over the past 30 years (Poli et al., 2008).
GP has been shown to effectively tackle issues in various domains and applications (Bhanu et al., 2005; Poli et al., 2008; Espejo et al., 2010; Durasevic and Jakobovic, 2018).
Perez and her colleagues proposed various GP methods for object detection, utilising GP to automatically combine different operators to synthesise a local image descriptor (Perez and Olague, 2009, 2013).
For real-world scene recognition, Liu et al. (2013) proposed a multiobjective GP-based method that aims to automatically synthesise domain-adaptive holistic image descriptors.
Liu et al. (2012) utilised GP for automatically generating low-level spatio-temporal image descriptors for human-action recognition. In this method, GP aims to automatically combine a set of primitive 3D operators; this is handled as an optimisation task. The generalisability of this method has been empirically demonstrated and the results show that this method has better performance than several existing handcrafted descriptors.
Similarly, in Liu et al. (2016) GP is applied to automatically evolve scale- and shift-invariant spatio-temporal descriptors for human-action recognition. The method aims to adaptively learn descriptors for various datasets by employing a multilayer structure to make full use of the available knowledge, as a way to mimic the physical structure of the human visual cortex.
GP has been utilised and combined with support vector machines (SVMs) in Bhatt and Patalia (2015) to automatically learn spatial descriptors to help classify images of the Indian monuments that were captured and uploaded by various tourists. The overall algorithm comprises preprocessing, spatial descriptors evolving, and classification phases. The experiments show that promising results have been achieved by the proposed method.
Price and Anderson (2017) extended the improved evolutionary-constructed features proposed in Lillywhite et al. (2012). In Price and Anderson (2017), GP is utilised to build a richer and more powerful class of features via arithmetic combinations and compositions. This method has been extensively experimented, and the results show its superiority compared to the baseline method.
However, these methods utilised the typical single-tree GP representation rather than multiple trees to form the evolved solution. Utilising the multitree individual representation in GP has been shown to be very effective in tackling different problems in various domains. Preliminary work for this study is presented in Al-Sahaf et al. (2017a) and Al-Sahaf et al. (2017b); however, the analyses in both studies were very limited. Therefore, this article aims to provide more insight into the algorithm and the evolved programs by analysing the evolutionary and evaluation times, convergence, and program size, and to thoroughly investigate two of the best evolved programs to identify potential patterns.
Goals
The overall goal of this article is to significantly extend the proposed method in Al-Sahaf et al. (2017b) by describing the different components of the method, by assessing and comparing its performance using 7 benchmark texture image classification datasets and 7 state-of-the-art image descriptors, and by analysing the convergence behaviour and solution complexity. Specifically, this article is concerned with the following objectives:
Compare the performance of utilising k-NN and the features extracted by automatically evolved image descriptors to that of three CNN-based methods.
Investigate the convergence of the method to find good image descriptors and to determine whether the crossover and mutation operators can effectively work when multitree representation is used.
Analyse the required time to evolve (training) and evaluate (testing) a descriptor, and assess whether and how the instance size and the number of classes will influence these times.
Analyse the size of the evolved descriptors and how the programs' size and behaviour vary during the evolutionary process.
Reveal/discover patterns that can be drawn from the evolved programs on different datasets, and determine the interpretability of the evolved programs.
2 Literature Survey
This section provides a survey on the closely related work, and briefly introduces the baseline methods.
2.1 Related Work
The main focus of this section is on the directly related work of using GP to evolve image descriptors, that is, keypoint detectors and feature extractors in images. Furthermore, some of the GP and multitree GP work is also briefly reviewed in this section.
Detecting interest points (keypoints) in an image is important, and Ebner and Zell (1999) utilised GP to automatically evolve an individual that detects interest points in an image. Similar work was carried out and extended in Olague and Trujillo (2011).
Detecting edges in an image represents an important preprocessing task that helps in identifying the boundaries and shapes of the different objects and regions. GP has been successfully utilised in Fu et al. (2015) to automatically evolve a model that constructs invariant features for edge detection. Furthermore, the method considers how the observations from different GP programs are distributed in order to enhance the extracted features from raw pixel values. Several experiments have been conducted, and the experimental results show that the automatically constructed features by the evolved GP individuals combined with the distribution estimation have a positive impact on the detection performance. The method outperforms the combination of linear SVMs and a Bayesian model on the tested datasets.
A method for automatically generating prototypes in classification by utilising the multitree representation in GP has been proposed (Cordella et al., 2005). Furthermore, the number of trees in this method is dynamically determined, which allows having individuals with a varying number of trees. The main motivation behind this is to handle the situation in which different subclasses are combined into a single large class. The experiments conducted in Cordella et al. (2005) on three benchmark datasets show that the evolved programs by this dynamic multitree GP representation have significantly outperformed the competitive methods.
The GP multitree representation is utilised in conjunction with information theory by Boric and Estevez (2007) for clustering. Information theory is used to develop a measure, that is, a fitness function, that reflects the goodness of an evolved program at performing the clustering task. In this method, no conflict-resolution phase is needed to interpret the outputs obtained from the different trees of the same individual, which is achieved by adopting a probabilistic approach. The experimental results on 10 benchmark datasets reveal the superiority of their method, which outperformed the widely used k-means clustering method.
The structural genes in a living organism were the main source of inspiration for Benbassat and Sipper (2010), who applied strongly typed GP (STGP) (Montana, 1995) with a multitree representation to the zero-sum, deterministic, full-knowledge board game of Lose Checkers. Their method aims to enable GP to automatically discover effective strategies for the game. In their method, STGP is employed with explicitly defined local mutation and introns. Their experiments reveal the applicability of the method to automatically evolving effective strategies for a full and nontrivial board game, potentially achieving performance comparable to handcrafted machine players.
A method to discover an efficient set of patterns by utilising multitree GP for the task of self-assembling swarm robots has been proposed by Lee et al. (2013). The automatically discovered patterns are then incorporated into the modules. The effectiveness of those patterns is demonstrated by the results of the experiments conducted in Lee et al. (2013).
The problem of building rotation-invariant image descriptors is approached by Al-Sahaf, Al-Sahaf et al. (2017) who proposed an STGP-based method that automatically combines simple arithmetic operators and first-order statistics. The evolved descriptors are rotation-invariant and their experiments on six texture classification datasets show the potential of the method to outperform handcrafted descriptors.
Multitree GP representation has not yet been utilised to evolve texture image descriptors. Such a representation allows a program to perform different tasks, or explore different regions of the solution space, with each of its trees.
2.2 Background
The proposed method in this study is motivated by a widely used and powerful handcrafted image descriptor known as local binary patterns (Ojala et al., 1994), and represents an extension of the recently proposed genetic programming descriptor (GP-criptor) method of Al-Sahaf, Al-Sahaf et al. (2017). Hence, this section briefly introduces these two methods, that is, LBP and GP-criptor.
2.2.1 Local Binary Pattern
The good performance of LBP has inspired many researchers to extend this image descriptor to address various limitations of the original algorithm such as illumination- and rotation-invariant issues. Interested readers can refer to Zhao et al. (2012), Kylberg and Sintorn (2013), and Yang and Chen (2013), which provide good reviews of LBP variants.
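To make the LBP mechanism concrete, the following is a minimal Python sketch of the basic operator (8 neighbours at radius 1, as in the original formulation), not a reproduction of any specific variant discussed above; the function names are illustrative.

```python
import numpy as np

def lbp_code(image, r, c):
    """8-neighbour LBP code of the pixel at (r, c): each neighbour is
    thresholded against the centre pixel and the bits are packed into
    an integer in [0, 255]."""
    centre = image[r, c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise ring
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr, c + dc] >= centre:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """Dense LBP descriptor: a normalised 256-bin histogram of the
    codes computed at every interior pixel."""
    h, w = image.shape
    hist = np.zeros(256)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist / hist.sum()
```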
2.2.2 GP Image Descriptor (GP-criptor)
The proposed method in this article is based on the recently introduced rotation-invariant GP descriptor (GP-criptor) (Al-Sahaf, Al-Sahaf et al., 2017). GP-criptor was designed to tackle two main limitations of existing methods. First, it aims to automatically evolve rotation-invariant image descriptors, without requiring human intervention, by utilising GP and a set of simple arithmetic operators and first-order statistics. Second, the evolved descriptors can cope with the limitation of having few training instances. GP-criptor uses the conventional single-tree representation: each program evolved by this method comprises a single root node that returns the output of the tree for the instance being evaluated. Furthermore, STGP is adopted to impose restrictions on the inputs and outputs of the different node types.
In GP-criptor, the function set comprises the typically used arithmetic operators +, −, ×, and protected ÷, together with a special node type, code, that must be the root of any evolved program. The division operator checks its second child and returns zero if that child's value is 0, preventing division by zero from occurring. The code node type takes a user-defined number of children, and each individual must have a single node of this type as the root of the tree. This node applies a threshold (set to 0 in Al-Sahaf, Al-Sahaf et al., 2017) to its input values and returns a binary code, similar to the binary code produced by thresholding in LBP.
To evolve rotation-invariant descriptors, GP-criptor uses a set of first-order statistics that are order-invariant. Specifically, the terminal set in GP-criptor comprises four node types: min(x), max(x), mean(x), and stdev(x). The first two return the minimum and maximum values of the pixels under the current position of the sliding window, whereas mean(x) and stdev(x) return the average and standard deviation of those pixel values, respectively.
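A sketch of these node semantics is given below, assuming NumPy; the identifiers (protected_div, code_node) and the direction of the threshold comparison in code_node are our assumptions rather than details taken from the original paper.

```python
import numpy as np

# Terminal set: order-invariant first-order statistics of the pixels
# under the current position of the sliding window.
TERMINALS = {
    'min':   np.min,
    'max':   np.max,
    'mean':  np.mean,
    'stdev': np.std,
}

def protected_div(a, b):
    # Protected division: returns 0 when the denominator is 0.
    return 0.0 if b == 0 else a / b

def code_node(child_outputs, threshold=0.0):
    """GP-criptor's root node: threshold each child's output and pack
    the resulting bits into a binary code (an integer), analogous to
    the thresholding step in LBP."""
    code = 0
    for bit, value in enumerate(child_outputs):
        if value > threshold:
            code |= 1 << bit
    return code
```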
It is important to notice that the exponent in Equation (5) is multiplied by 5 to ensure the output of the fitness function is in the range [0,1].
3 The Proposed Method
This section discusses the proposed method and its key components. From now on, the acronym MGPD, standing for multitree GP rotation-invariant image descriptor, will be used to refer to this method; when subscripted, the two subscripts denote the number of trees and the window size, respectively (more details are provided in Section 3.4).
3.1 Overall Algorithm
MGPD has a large overlap with the baseline GP-criptor, and the overall evolutionary and evaluation processes are similar. These processes are discussed here in order to make this article self-contained.
In the data preparation phase, the instances of the dataset are divided into two sets: training and test. The former is used in both subsequent phases, whereas the latter is used only in the third phase. It is important to notice that the instances of each class are divided equally into two halves (50%:50%). Only two instances from each class are randomly selected from the first half to form the training set, while the second half of the instances forms the test set. The data in both sets are raw images (pixel values).
The evolutionary process phase represents the main part of the overall algorithm, in which an image descriptor is automatically evolved by GP. The training set is fed into GP, and the best evolved individual is returned at the end of the process after a number of generations.
To measure the effectiveness of an evolved descriptor, the evaluation phase evaluates the accuracy of a classifier on the unseen test set. This phase comprises three tasks. The first task is to transform the images of both the training and test sets into the corresponding feature vectors by feeding them into the best program evolved at the end of the evolutionary process (more details are provided in Section 3.4); the output is a transformed training set and a transformed test set. The aim of the second task is to train a classifier using one of the widely used classification algorithms. The third task is calculating the accuracy of the classifier on the transformed test set. In this study, k-NN is used to perform the classification task, and the number of neighbours (k) is set to 1, mainly because there are only 2 instances of each class in the training set. Although only 2 instances per class are used in this study, the proposed method can cope with a larger number; however, 2 is the smallest possible number, as intra-class similarity is a main part of the fitness function (more details below).
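The three phases can be summarised by the following hedged sketch; the helper names (split_and_sample, evaluate) are ours, and squared Euclidean distance is used purely as a placeholder for whatever distance the 1-NN classifier actually uses.

```python
import random
import numpy as np
from collections import defaultdict

def split_and_sample(images, labels, per_class=2, seed=0):
    """Split each class 50:50; draw `per_class` training instances
    from the first half and use the whole second half for testing."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for img, lab in zip(images, labels):
        by_class[lab].append(img)
    train, test = [], []
    for lab, insts in by_class.items():
        half = len(insts) // 2
        for img in rng.sample(insts[:half], per_class):
            train.append((img, lab))
        for img in insts[half:]:
            test.append((img, lab))
    return train, test

def evaluate(descriptor, train, test):
    """Transform both sets with the evolved descriptor, then measure
    1-NN accuracy on the transformed test set."""
    tr = [(descriptor(img), lab) for img, lab in train]
    correct = 0
    for img, lab in test:
        vec = descriptor(img)
        # 1-NN: take the label of the closest training feature vector.
        pred = min(tr, key=lambda p: np.sum((p[0] - vec) ** 2))[1]
        correct += pred == lab
    return correct / len(test)
```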
3.2 Program Representation
3.2.1 Terminal Set
The terminal set in MGPD is identical to that of the baseline method, GP-criptor, consisting of the min(x), max(x), mean(x), and stdev(x) node types. However, the restriction preventing terminal nodes from being the root node is removed; in GP-criptor, a terminal node cannot be the root because the root of the tree must be a code node. Although having a terminal node as the root of a tree means that such a tree has only one node, this can be helpful during the evolutionary process when such a node is needed to improve the performance of another individual and can be swapped in through the crossover operator.
3.2.2 Function Set
The function set in MGPD comprises only the four arithmetic operators +, −, ×, and protected ÷. As in GP-criptor, each of these four operators takes two children, performs the corresponding operation, and returns a single value. The code node type represents a major difference between the two methods: MGPD does not have a code node type in its function set. Removing this node type has a direct effect on the individual representation; in GP-criptor, each individual must have a code node as the root of the tree, a restriction that requires STGP to define such structural rules between the nodes. As all nodes in MGPD have the same input and output types, STGP is not required. It is important to notice that utilising STGP also needs careful implementation of the crossover and mutation operators in order to preserve the closure property of the nodes; hence, the implementation of MGPD is simpler than that of GP-criptor.
3.3 Fitness Function
The underlying principle of this fitness function has some similarity to the Triplet loss function (Chechik et al., 2010), where the distance between instances of the different classes is an important aspect.
Although MGPD relies on distances to measure the quality of an individual as in GP-criptor, the methods used to calculate those components are different in order to mitigate the effect of the outlier instances. In GP-criptor, the average distance between each instance and all instances of the same class is considered as shown in Equation (7), whereas only the distance of the farthest (most dissimilar) instance is considered in MGPD, as shown in Equation (10). Similarly, the average distance between each instance and all other instances belonging to different classes is used in GP-criptor, as shown in Equation (6), while only the distance of the closest instance (most similar) is considered in MGPD as shown in Equation (11).
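Since Equations (5)-(11) are not reproduced here, the sketch below computes only the two MGPD distance components described in this paragraph; the dist function and the way the components are later combined into a single fitness value are placeholders, not the paper's exact formulation.

```python
import numpy as np

def dist(u, v):
    # Placeholder distance between two feature vectors; the paper's
    # own distance measure is not reproduced here.
    return np.sum((u - v) ** 2)

def fitness_components(vectors, labels):
    """For each instance, compute the two MGPD distance components:
    the farthest same-class instance (cf. Equation 10) and the
    closest different-class instance (cf. Equation 11)."""
    d_within, d_between = [], []
    for i, (u, lu) in enumerate(zip(vectors, labels)):
        same = [dist(u, v) for j, (v, lv) in enumerate(zip(vectors, labels))
                if j != i and lv == lu]
        diff = [dist(u, v) for v, lv in zip(vectors, labels) if lv != lu]
        d_within.append(max(same))   # most dissimilar same-class instance
        d_between.append(min(diff))  # most similar different-class instance
    return d_within, d_between
```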
3.4 Feature Vector Extraction
There are two main parameters that must be defined before the evolutionary process starts. The first parameter is the number of trees in each individual, which directly affects the length of the feature vector (histogram), as each tree contributes one bit to the binary code generated at each position of the sliding window. The second parameter is the size of the sliding window, that is, the number of neighbouring pixels that contribute to the terminal values at each position of the sliding window. As only a square-shaped window is used in this study, the window size is specified by a single side length.
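A sketch of the extraction procedure under these two parameters follows (our construction, consistent with the description above and the threshold-at-zero code generation of Section 2.2.2): with, say, 9 trees, each window position yields a 9-bit code, and the feature vector is a normalised histogram with 2^9 = 512 bins.

```python
import numpy as np

def extract_features(image, trees, window=5):
    """Slide a window x window region over the image; each tree
    contributes one bit per position, and the resulting binary codes
    are accumulated into a 2**len(trees)-bin histogram."""
    h, w = image.shape
    hist = np.zeros(2 ** len(trees))
    for r in range(h - window + 1):
        for c in range(w - window + 1):
            patch = image[r:r + window, c:c + window]
            code = 0
            for bit, tree in enumerate(trees):
                # Each tree maps the window statistics to a real value,
                # thresholded at 0 to give one bit of the code.
                if tree(patch) > 0:
                    code |= 1 << bit
            hist[code] += 1
    return hist / hist.sum()

# Illustrative stand-ins for evolved trees (MGPD would use nine):
example_trees = [lambda p: np.mean(p) - np.min(p),
                 lambda p: 3 * np.std(p) - np.min(p)]
```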
4 Experiment Design
A number of experiments have been conducted in this study to assess the performance of MGPD. This section describes the design of those experiments, discusses the parameter settings, and describes the benchmark datasets and methods.
4.1 Benchmark Datasets
Seven datasets are used in this study to investigate the quality of MGPD, which have been formed using four well-known and widely used benchmark datasets for texture classification. Here, these datasets are adapted to the few-shot setting. The instances of all seven datasets are grey-scale. These datasets vary in the number of classes, number of instances per class, number of rotation angles, illumination (lighting conditions), dimensions, and photographed materials.
4.1.1 Brodatz Texture
Brodatz Texture (Brodatz, 1999) is likely one of the most used texture datasets, comprising 112 classes in total. Each class consists of a single 640×640-pixel image. To generate the instances of each class, this single image is divided into 84 nonoverlapping subimages, each of size 64×64 pixels. Although it is possible to divide the large image into a grid of 10×10 nonoverlapping 64×64-pixel tiles, rotating the image to any angle around its centre causes some of those tiles to fall outside the boundaries of the image. At some rotation angles only 85 complete tiles remain, and at others only 84 complete tiles enclosed within the boundaries of the image can be extracted; hence, 84 tiles per class are used.
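The tile counts can be checked with a small geometric sketch (our construction, not the authors' code): a 64×64 tile survives rotation about the image centre if all four of its corners, mapped back by the inverse rotation, still fall inside the original 640×640 image. Since the valid region is convex, corner containment implies the whole tile is covered.

```python
import numpy as np

def complete_tiles(side=640, tile=64, angle_deg=30):
    """Count tile positions whose four corners still fall inside the
    original image after it is rotated about its centre."""
    theta = np.radians(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    centre = side / 2.0

    def valid(x, y):
        # Inverse-rotate the point; it is valid if it lands inside
        # the original (unrotated) image bounds.
        dx, dy = x - centre, y - centre
        ox = cos_t * dx + sin_t * dy + centre
        oy = -sin_t * dx + cos_t * dy + centre
        return 0 <= ox <= side and 0 <= oy <= side

    count = 0
    for r in range(0, side, tile):
        for c in range(0, side, tile):
            corners = [(c, r), (c + tile, r),
                       (c, r + tile), (c + tile, r + tile)]
            if all(valid(x, y) for x, y in corners):
                count += 1
    return count
```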
4.1.2 Outex Texture Classification
Another widely used benchmark dataset is Outex Texture Classification (OTC) (Ojala et al., 2002). OTC comprises 16 test suites for texture classification of varying difficulty. Some test suites are rotation-free, whereas others include rotation. Moreover, the illumination is not controlled/fixed across all instances in each test suite.
A sample from the OutexTC10 dataset presented in 9 rotation angles.
4.1.3 Kylberg Sintorn Rotation
In this study, the fifth dataset (KySinHw) is formed using the instances of the hardware rotation method as it is more realistic than the interpolation methods.
4.1.4 Kylberg Texture
To summarise the seven texture datasets used in this study, Table 1 lists the number of classes, total number of instances, number of rotation angles, and instance dimensions for each dataset on a single row. The datasets are listed in ascending order based on the number of classes and total number of instances.
A summary of the datasets.
Data set | Classes | Instances | Rotations | Dimensions |
---|---|---|---|---|
BrNoRo | 20 | 1680 | 1 | 64×64 |
BrWiRo | 20 | 20160 | 12 | 64×64 |
OutexTC00 | 24 | 480 | 1 | 128×128 |
OutexTC10 | 24 | 4320 | 9 | 128×128 |
KySinHw | 25 | 22500 | 9 | 122×122 |
KyNoRo | 28 | 4480 | 1 | 115×115 |
KyWiRo | 28 | 53760 | 12 | 115×115 |
4.2 Methods for Comparison
Various image descriptors currently exist, and studying the differences among all of them is beyond the scope of this study. To keep the comparison focused on methods directly related to MGPD, seven benchmark methods, including the baseline, are used here. Six of these methods are handcrafted and have been shown to achieve state-of-the-art performance for texture image classification: uniform local binary patterns (LBP^u2) (Ojala et al., 1996), uniform and rotation-invariant LBP (LBP^riu2) (Ojala et al., 2000), completed LBP (CLBP) (Guo et al., 2010), local binary count (LBC) and completed LBC (CLBC) (Zhao et al., 2012), and dominant rotated LBP (DRLBP) (Mehta and Egiazarian, 2016). The baseline method, GP-criptor, is also used to show whether the new representation and fitness function have any major impact on the performance.
As CNN methods are capable of operating directly on the raw pixel values and automatically perform feature extraction during the training process, three CNN-based methods with varying architectures have been included in the experiments of this study. These methods are the original implementation of LeNet (LeCun et al., 1998), a five-layer CNN (CNN-5) (Shao et al., 2014), and an eight-layer CNN (CNN-8) (Chollet et al., 2015).
4.3 Parameter Settings
Both the proposed and benchmark methods comprise parameters that need to be set. Performing a sensitivity analysis on each parameter of those methods to find the optimal settings is very expensive and time consuming. Therefore, very few parameters are set based on experiments conducted in this study, and the vast majority are based on the literature. The parameters of the two GP-based methods, MGPD and GP-criptor, are discussed first, followed by the parameters of the other methods.
4.3.1 GP Methods
The evolutionary parameters for both the MGPD and GP-criptor methods have been kept identical for fair comparison. The population size is set to 300, mainly due to the computational costs associated with handling a large number of individuals; dealing with images is very expensive, especially when each image needs to be scanned in a pixel-by-pixel manner. The other evolutionary parameters are set based on Al-Sahaf, Al-Sahaf et al. (2017). The termination criterion is either finding an ideal solution (a fitness value of 0) or reaching the 50th generation. The crossover and mutation rates are 0.8 and 0.2, respectively, and the best 10 individuals in the current generation are copied directly into the population of the next generation to prevent degrading the best performance found so far. The minimum and maximum depths of the program trees are 2 and 10, respectively. The individuals are generated using ramped half-and-half, and tournament selection of size 7 is used.
MGPD and GP-criptor also comprise other, non-evolutionary parameters. The number of children under the code node in GP-criptor is set to 9 based on the observations of Al-Sahaf, Al-Sahaf et al. (2017). Similarly, the number of trees in MGPD is set to 9, as this parameter specifies the length of the generated feature vector in the same way the code node does in GP-criptor. The size of the sliding window is set to 5, that is, 5×5 pixels, in both methods, as this has been observed to give the best results in Al-Sahaf, Al-Sahaf et al. (2017).
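For reference, the settings described above can be collected in one place; the key names below are illustrative and do not correspond to ECJ's parameter-file syntax.

```python
# Settings described in Sections 4.3.1, gathered as a plain mapping
# (illustrative names only, not ECJ parameters):
GP_PARAMS = {
    'population_size': 300,
    'generations': 50,                # or stop early at fitness 0
    'crossover_rate': 0.8,
    'mutation_rate': 0.2,
    'elitism': 10,                    # best individuals copied unchanged
    'min_depth': 2,
    'max_depth': 10,
    'initialisation': 'ramped half-and-half',
    'tournament_size': 7,
    'num_trees': 9,                   # bits per binary code
    'window_size': 5,                 # 5x5 sliding window
}
```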
4.3.2 Non-GP Methods
The non-GP benchmark methods also comprise a number of parameters, the majority of which were set based on the corresponding original papers. These settings are kept the same as those in Zhao et al. (2012), Rassem and Khoo (2014), and Al-Sahaf, Al-Sahaf et al. (2017), which concern p (the number of neighbouring pixels) and r (the radius) in LBP^riu2, CLBP, LBC, and CLBC; these two parameters are set to 8 and 1, respectively, in LBP^u2 and DRLBP. In other words, the methods LBP^u2, CLBP, LBC, CLBC, LBP^riu2, and DRLBP are used in this study as non-GP benchmark methods.
4.4 Performance Measure
The accuracy of 1-NN (k-NN with k = 1) on the test set, using only the two randomly selected instances from each class for training (as a knowledge base, since k-NN does not have a training phase), is used here to measure the quality of the studied image descriptors.
It is important to notice that all non-GP image descriptors are handcrafted and deterministic. However, both GP-criptor and MGPD are stochastic methods; to provide more concrete conclusions, these methods have been executed 30 times using different seed values, and the average accuracy and standard deviation over these 30 independent runs are reported. Therefore, in total, this experiment is executed 66 times [= 30 (runs) × 2 (methods) + 1 (run) × 6 (methods)] on each dataset.
The other important point is the randomness in the process of selecting the training set, as described in Section 3.1. This randomness imposes the requirement of repeating the described experiment multiple times to eliminate the effect of the selected instances. Hence, the experiment is further repeated 10 times using different instances each time. The total number of runs on all 7 datasets is 4,620 [= 66 (runs) × 10 (repeats) × 7 (datasets)].
4.5 Implementation
The Evolutionary Computation in Java (ECJ) package version 24 (Luke, 2013) is used to implement MGPD and GP-criptor. The implementations of uniform and rotation-invariant LBP, CLBP, and DRLBP are freely available online, and the implementations of both LBC and CLBC were obtained from the corresponding author (Zhao et al., 2012). Furthermore, all experiments were carried out on machines running Linux kernel 3.7.5-1-ARCH with an Intel® Core™ i7-3770 CPU @ 3.40 GHz and 8 GB of memory each, using Java version 1.8.0_144.
5 Results and Discussion
This section presents and discusses the results obtained from the experiments using eight image descriptors on seven benchmark texture classification datasets. The results of applying three CNN-based methods to those benchmark datasets are also presented and discussed in this section.
5.1 Image Descriptors
The experimental results are presented in Table 2; each column, apart from the first, corresponds to one dataset. The first row lists the names of the datasets, whereas the first column lists the names of the image descriptors. To measure whether the results achieved by MGPD are significantly different from those of the other methods, the non-parametric Wilcoxon signed-rank test (Demšar, 2006; Derrac et al., 2011) is used with a significance level of 5%. For each comparison, the test indicates whether MGPD has significantly better, significantly worse, or not significantly different performance than the corresponding method; these outcomes are reported in the discussion below, and the best result on each dataset is shown in boldface in Table 2.
Average accuracy (%) of 1-NN using eight image descriptors on the seven texture image datasets (mean ± standard deviation); the best result on each dataset is in boldface.
Method | BrNoRo | BrWiRo | OutexTC00 | OutexTC10 | KySinHw | KyNoRo | KyWiRo |
---|---|---|---|---|---|---|---|
LBP^u2 | 83.99 ± 1.95 | 42.29 ± 1.66 | 87.88 ± 1.23 | 34.45 ± 1.79 | 54.82 ± 2.18 | 75.45 ± 1.96 | 42.61 ± 1.84 |
LBP^riu2 | 68.49 ± 3.18 | 67.63 ± 2.62 | 69.50 ± 2.88 | 64.50 ± 1.05 | 81.76 ± 1.46 | 67.41 ± 2.94 | 69.46 ± 3.33 |
CLBP | 82.37 ± 4.88 | 85.78 ± 2.70 | 81.00 ± 2.95 | 86.10 ± 2.39 | **97.31 ± 0.73** | **90.56 ± 1.16** | 88.97 ± 2.82 |
LBC | 66.26 ± 2.80 | 64.49 ± 2.95 | 68.25 ± 3.81 | 60.50 ± 1.82 | 80.72 ± 1.51 | 66.09 ± 2.76 | 68.26 ± 3.59 |
CLBC | 63.62 ± 2.52 | 70.84 ± 3.19 | 72.79 ± 2.87 | 75.53 ± 2.24 | 89.03 ± 1.86 | 76.67 ± 4.05 | 76.58 ± 3.77 |
DRLBP | 83.17 ± 2.49 | 69.65 ± 2.41 | 84.96 ± 1.79 | 63.97 ± 2.57 | 85.30 ± 2.32 | 86.26 ± 1.14 | 74.05 ± 2.07 |
GP-criptor | 90.92 ± 1.94 | 92.49 ± 1.14 | 87.68 ± 1.87 | 86.82 ± 1.93 | 94.06 ± 1.63 | 86.66 ± 1.79 | 88.51 ± 1.39 |
MGPD | **93.00 ± 1.82** | **92.94 ± 1.73** | **89.75 ± 3.12** | **87.82 ± 2.73** | 95.88 ± 1.23 | 90.53 ± 2.09 | **90.91 ± 1.51** |
On the first benchmark dataset (BrNoRo), MGPD has achieved an average 93.00% accuracy, which represents the best achieved performance on this dataset as presented in the second column of Table 2. The statistical significance test shows that MGPD has significantly outperformed the competitor methods. Moreover, the gap between the performance of MGPD and that of GP-criptor is over 2%.
The proposed method shows the best performance compared with the other competitive methods on the second dataset (BrWiRo) with an average 92.94% accuracy. The results of the significance test show that, apart from the baseline method, MGPD has significantly better performance than that of the other methods. The average accuracy for MGPD over the 30 independent runs was greater than that for GP-criptor; however, the difference was not statistically significant.
The results on OutexTC00 are presented in the fourth column of Table 2. Similar to the previous two datasets, MGPD shows the best performance of all methods on this dataset, with an average accuracy of 89.75%. Apart from LBP^u2 and GP-criptor, MGPD significantly outperformed the other benchmark methods with a minimum gap of 4.79% accuracy. The proposed method performs better than LBP^u2 and GP-criptor by over 1.80% and 2.0% average accuracy, respectively.
On the rotated version of OutexTC00 (OutexTC10), MGPD has achieved 87.82% average accuracy and is the best performing method on this dataset. The statistical significance test shows that MGPD has significantly better performance than the other benchmark methods, apart from CLBP (86.10%) and GP-criptor (86.82%). Note that the performances of all methods, apart from CLBP and CLBC, have dropped compared to their performances on the rotation-free dataset.
The sixth column of Table 2 presents the experimental results on the KySinHw dataset. The results show that MGPD has achieved 95.88% average accuracy on this dataset. Although MGPD was not the best performing method on this dataset (CLBP was, with 97.31%), it was the second best. The significance test shows that although MGPD has significantly worse performance than CLBP, it significantly outperformed all the other benchmark methods.
The experimental results of the eight image descriptors on KyNoRo are presented in the seventh column of Table 2. The proposed method has achieved 90.53% average accuracy, which is the second best performance on this dataset with a difference of only 0.03% from the best performing method (CLBP). The statistical significance test shows that MGPD has significantly outperformed the other benchmark methods apart from CLBP where the difference is not significant.
The last column of Table 2 lists the results on the rotated version of Kylberg (KyWiRo). Unlike KyNoRo, MGPD has achieved the best performance on this dataset with 90.91% average accuracy. Apart from CLBP, the significance test shows that MGPD has significantly better performance than all the other methods.
5.2 Convolutional Neural Networks
CNN-based methods can operate directly on the raw pixel values and automatically perform feature extraction during the learning process. Hence, three CNN methods have been used with the same settings as the previous set of experiments; that is, only two instances per class are randomly drawn from the first half of the instances and used to train the model, whereas the instances of the second half are used to evaluate the trained model.
The average performance and standard deviation of 30 independent runs (using different seed values) for LeNet, CNN-5, and CNN-8 are presented in Table 3 and the best performing method on each dataset is presented in boldface font. Apart from BrNoRo, LeNet has achieved the best performance on the other six datasets compared with CNN-5 and CNN-8.
Average accuracy (%) of three CNN methods on the seven texture image datasets (mean ± standard deviation); the best result on each dataset is in boldface.
Method | BrNoRo | BrWiRo | OutexTC00 | OutexTC10 | KySinHw | KyNoRo | KyWiRo |
---|---|---|---|---|---|---|---|
LeNet | 19.64 ± 6.56 | **12.03 ± 2.38** | **12.50 ± 2.33** | **7.49 ± 1.35** | **6.36 ± 1.78** | **8.79 ± 3.12** | **6.31 ± 2.00** |
CNN-5 | **21.36 ± 6.56** | 12.01 ± 2.38 | 5.03 ± 2.33 | 4.81 ± 1.35 | 6.09 ± 1.78 | 5.39 ± 3.12 | 4.80 ± 2.00 |
CNN-8 | 16.10 ± 3.97 | 9.60 ± 2.64 | 7.01 ± 3.39 | 5.82 ± 1.78 | 6.22 ± 1.95 | 6.29 ± 2.39 | 5.23 ± 1.81 |
The results clearly show that the CNN-based methods were, as expected, unable to cope well with the small number of instances per class. The three methods produced very poor performance on the unseen data, which is significantly worse than the results of utilising k-NN operating on the features extracted by any of the eight image descriptor methods in the previous experiment (see Table 2). It is important to notice that no data augmentation was used here, which could help to improve the performance of the CNN-based methods. However, data augmentation is outside the scope of this study.
5.3 Results Summary
To summarise the experimental results, MGPD significantly outperformed the other image descriptor methods in the majority of cases, and it is the second best, if not the best, performing method on each of the tested datasets. The method shows significantly better performance in 41 comparisons, better or comparable (not significant) performance in 7, and significantly worse performance in only 1 of the 49 [= 7 (benchmark methods) × 7 (benchmark datasets)] comparisons.
With the use of k-NN and the features extracted by the image descriptors automatically evolved by the proposed method, the performance was significantly better than that of the CNN-based methods on all seven datasets in this study.
6 Further Analysis
This section aims to provide further analysis and discussion on different aspects of MGPD in this study. The overall analysis is discussed first, which includes the convergence of the evolutionary process, program size, and time needed to evolve an image descriptor. The second part of this section focuses on analysing the automatically evolved image descriptors by MGPD on different datasets in order to provide insight into how such descriptors can perform well. Moreover, the analysis of such programs, that is, descriptors, can help to identify/learn significant patterns that may help in designing more powerful image descriptors.
6.1 Overall Analysis
During the evolutionary process, different factors of the best program at each generation are measured. Here, the convergence, program size, evolution time, and evaluation time of MGPD are discussed.
6.1.1 Convergence
6.1.2 Program Size
The average size (number of nodes) of the best program per generation.
Generally, the system starts from relatively small programs that continue to grow during the evolutionary process, as presented in Figure 22. Having small/shallow individuals in the early generations is expected, as the maximum depth of the newly created trees in the initial population is 5. The plots presented in Figure 22 show that the increase in program size is smooth, which is also expected, as the conventional crossover and mutation operators that affect only a single subtree at a time were used. However, the variation in program size in later generations is higher than in earlier generations, as presented in Figure 22, which shows the system's ability to evolve compact/small individuals with good performance. This is an interesting aspect of the proposed system, as it increases the interpretability of the evolved individuals.
Although the increase in program size from the initial generation to the last ranges between 78% (KyWiRo) and 80% (KySinHw), the average total number of nodes stays below 600. In other words, each tree has approximately 67 nodes on average. This means that such programs can be evaluated very quickly, as only simple arithmetic operations are performed to calculate the output of each tree [the evolutionary (training) and evaluation (testing) times are discussed in the next subsection].
6.1.3 Evolutionary and Evaluation Times
Two important questions are as follows: (1) How long does it take to evolve a model/solution? (2) How fast is an evolved model at performing the feature extraction task? To answer these two questions, some in-depth analysis is performed here. The CPU time has been measured independently for the evolutionary and evaluation times.
The average evolution time (hours) and evaluation time (milliseconds) for MGPD on the seven datasets.
To measure the evaluation time, only the best evolved program at the end of each run is used. The time to convert the entire test set from images to the corresponding feature vectors is measured, and the average time per instance is calculated. Figure 23b shows the average time per instance for the best evolved programs from 300 independent runs on each of the seven datasets. The results show that an image descriptor evolved by MGPD takes a few milliseconds to produce the feature vector for an image; on the Brodatz datasets (BrNoRo and BrWiRo), the average time does not exceed 15 milliseconds. As the evaluation time, that is, the time for an individual to generate the feature vector for an image, varies among the datasets, the size (dimensions) of the instance clearly has a direct impact on the time required to generate the feature vector. In fact, the dimensions of the instance are the only factor, apart from the number of operators (terminal and function nodes), that can slow down the feature vector extraction operation.
It is important to notice that MGPD is not optimised, and different parts could be optimised to improve its efficiency (for both the evolutionary and evaluation procedures). For example, an evolved individual consists of multiple trees that can be evaluated in parallel. Moreover, different individuals can be evaluated in parallel, as there is no interaction among individuals during the fitness calculation. Using optimised platforms or programming languages capable of performing image processing operations more efficiently than ECJ and Java is another mechanism that could potentially improve MGPD. Probably the simplest optimisation is precalculating the minimum, maximum, mean, and standard deviation statistics for each position of the sliding window, instead of recalculating those values for each individual in every generation. This was not performed in this study due to resource limitations, specifically physical memory.
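A sketch of this precomputation is shown below, assuming NumPy 1.20+ for sliding_window_view: the four statistics are computed once per image and can then be looked up by every terminal node of every individual in every generation. The cost, as noted above, is memory: four float arrays of roughly the image size are kept per training image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def precompute_window_stats(image, window=5):
    """Compute min, max, mean, and stdev once for every sliding-window
    position, so all individuals can reuse them instead of rescanning
    the image at every fitness evaluation."""
    views = sliding_window_view(image, (window, window))
    # Flatten each window so the statistics reduce over one axis.
    flat = views.reshape(views.shape[0], views.shape[1], -1)
    return {
        'min':   flat.min(axis=2),
        'max':   flat.max(axis=2),
        'mean':  flat.mean(axis=2),
        'stdev': flat.std(axis=2),
    }
```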
6.2 Evolved Image Descriptors
To dig deeply into the key factors of MGPD, two of the automatically evolved good-performing descriptors are thoroughly analysed in this section. Furthermore, the subtrees of each individual are simplified and presented as a list of formulae, each denoted Ti, where i indicates the index of the corresponding subtree.
6.2.1 An Evolved Descriptor on KyWiRo
A good-performing and relatively small image descriptor evolved by MGPD on the KyWiRo dataset is depicted in Figure 24. This program achieved 88.74% accuracy on the unseen data, and was selected because it is easier to visualise in tree form. As shown in Figure 24, the program comprises 125 nodes in total, including 67 terminals and 58 functions; on average, each tree consists of 14 nodes. Some of the trees are very easy to interpret, such as the second, third, fifth, and eighth, while the others are more complicated. However, the trees can be further simplified to the following nine equations:
An interesting point regarding these simplified equations is that they use the same components in different ways. For example, the fifth (T5) and eighth (T8) trees use only the min(x) and stdev(x) terminals: in T5, the minimum value of the window is subtracted from the scaled (multiplied by 3) standard deviation, whereas in T8, the scaled standard deviation is subtracted from the minimum value. Moreover, the scaling factor of stdev(x) differs between T5 and T8.
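From this description, the two trees can be written as follows, where $w$ denotes the pixels under the current window and $c$ stands for the scaling factor in $T_8$, whose exact value is not given here:

```latex
\begin{align}
T_5(w) &= 3\,\mathrm{stdev}(w) - \min(w),\\
T_8(w) &= \min(w) - c\,\mathrm{stdev}(w), \qquad c \neq 3.
\end{align}
```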
6.2.2 An Evolved Descriptor on KyNoRo
The nine trees of the best evolved image descriptor by MGPD on the KyNoRo dataset are presented in Figure 25; the min(x), max(x), mean(x), and stdev(x) terminals are substituted by shorter symbols to make the equations more readable:
This program originally comprises 219 nodes, including 114 terminals and 105 functions. Clearly, the evaluation of such a program should not take long, as only a few simple arithmetic operations must be performed. The program takes an average of 23 milliseconds to generate the feature vector for an image of size 115×115 pixels. Furthermore, it took only 3 hours and 45 minutes to evolve this program. Checking these simplified trees reveals some interesting patterns. For example, the seventh (T7) and ninth (T9) trees have similar structures and use similar terminals; however, the standard deviation is scaled (multiplied by 2) in T9.
Confusion matrix for the best evolved program on the KyNoRo dataset, where the first row lists the predicted labels based on the generated feature vectors by this program and using 1-NN, and the actual class labels are listed in the first column.
Actual \ Predicted | blanket1 | blanket2 | canvas1 | ceiling1 | ceiling2 | cushion1 | floor1 | floor2 | grass1 | lentils1 | linseeds1 | oatmeal1 | pearlsugar1 | rice1 | rice2 | rug1 | sand1 | scarf1 | scarf2 | screen1 | seat1 | seat2 | sesameseeds1 | stone1 | stone2 | stone3 | stoneslab1 | wall1 | Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
blanket1 | 69 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 0 | 0 | 0.86 |
blanket2 | 0 | 74 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.93 |
canvas1 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
ceiling1 | 0 | 0 | 0 | 66 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.83 |
ceiling2 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
cushion1 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
floor1 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
floor2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
grass1 | 0 | 0 | 0 | 11 | 0 | 0 | 0 | 0 | 64 | 0 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0.80 |
lentils1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
linseeds1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
oatmeal1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 74 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.93 |
pearlsugar1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 71 | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.89 |
rice1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.99 |
rice2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
rug1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 7 | 7 | 2 | 4 | 0 | 0 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.46 |
sand1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 74 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 4 | 0.93 |
scarf1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
scarf2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
screen1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 74 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.93 |
seat1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.99 |
seat2 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 72 | 0 | 0 | 0 | 0 | 0 | 3 | 0.90 |
sesameseeds1 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 71 | 3 | 0 | 0 | 0 | 0 | 0.89 |
stone1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 1 | 0 | 0 | 0 | 0.99 |
stone2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 69 | 3 | 0 | 0 | 0.86 |
stone3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 65 | 0 | 4 | 0.81 |
stoneslab1 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 74 | 0 | 0.93 |
wall1 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 50 | 0.63 |
Average | 0.91 |
. | blanket1 . | blanket2 . | canvas1 . | ceiling1 . | ceiling2 . | cushion1 . | floor1 . | floor2 . | grass1 . | lentils1 . | linseeds1 . | oatmeal1 . | pearlsugar1 . | rice1 . | rice2 . | rug1 . | sand1 . | scarf1 . | scarf2 . | screen1 . | seat1 . | seat2 . | sesameseeds1 . | stone1 . | stone2 . | stone3 . | stoneslab1 . | wall1 . | Accuracy . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
blanket1 | 69 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 0 | 0 | 0.86 |
blanket2 | 0 | 74 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.93 |
canvas1 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
ceiling1 | 0 | 0 | 0 | 66 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.83 |
ceiling2 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
cushion1 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
floor1 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
floor2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
grass1 | 0 | 0 | 0 | 11 | 0 | 0 | 0 | 0 | 64 | 0 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0.80 |
lentils1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
linseeds1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
oatmeal1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 74 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.93 |
pearlsugar1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 71 | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.89 |
rice1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.99 |
rice2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
rug1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 7 | 7 | 2 | 4 | 0 | 0 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.46 |
sand1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 74 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 4 | 0.93 |
scarf1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
scarf2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
screen1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 74 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.93 |
seat1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.99 |
seat2 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 72 | 0 | 0 | 0 | 0 | 0 | 3 | 0.90 |
sesameseeds1 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 71 | 3 | 0 | 0 | 0 | 0 | 0.89 |
stone1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 1 | 0 | 0 | 0 | 0.99 |
stone2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 69 | 3 | 0 | 0 | 0.86 |
stone3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 65 | 0 | 4 | 0.81 |
stoneslab1 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 74 | 0 | 0.93 |
wall1 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 50 | 0.63 |
Average accuracy: 0.91
6.3 Analysis Summary
The analysis shows that MGPD converges and makes large improvements over the first few generations, but continues to improve throughout the evolutionary process. Programs tend to start relatively small and grow larger over the generations in order to build better solutions. The size of the evolved descriptors varies considerably between different executions of MGPD, which shows that MGPD is flexible enough to produce both compact and large individuals for different problems. Although evolution is not fast, that is, it takes hours, the method is still faster than a domain expert manually designing an image descriptor, which may take several days. The analysis also shows that an evolved descriptor takes only a few milliseconds to generate the feature vector for an image, which makes such models suitable for online applications where a fast response is needed. The analysis of two individuals reveals the ability of the system to combine the terminal and function components efficiently to build image descriptors that can potentially outperform domain-expert designed descriptors. Furthermore, the evolved models are interpretable and can be simplified and converted into human-readable equations.
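To make the evaluation cost concrete, the sketch below illustrates how a descriptor of this kind can generate a feature vector from raw pixel values. It is a minimal reconstruction under stated assumptions, not the actual MGPD implementation: it assumes each evolved tree maps the first-order statistics (minimum, maximum, mean, and standard deviation) of a sliding window to a single value, that this value is binarised by its sign in an LBP-like fashion, and that the resulting binary code indexes a histogram bin. All names and the two example trees are illustrative.

```python
import numpy as np

def evolved_descriptor(image, trees, window=3):
    """Generate a feature vector from raw pixel values.

    `trees` is a list of callables, each standing in for one evolved
    GP tree; each maps the window statistics (min, max, mean, std)
    to a single value. With m trees, the histogram has 2**m bins.
    """
    m = len(trees)
    hist = np.zeros(2 ** m)
    h, w = image.shape
    r = window // 2
    for i in range(r, h - r):
        for j in range(r, w - r):
            win = image[i - r:i + r + 1, j - r:j + r + 1]
            stats = (win.min(), win.max(), win.mean(), win.std())
            # Binarise each tree's output by sign (LBP-style code).
            code = 0
            for t, tree in enumerate(trees):
                if tree(*stats) > 0:
                    code |= 1 << t
            hist[code] += 1
    return hist / hist.sum()  # normalised feature vector

# A hypothetical evolved individual with two simple arithmetic trees.
trees = [
    lambda mn, mx, mean, std: (mean - mn) - std,
    lambda mn, mx, mean, std: (mx - mean) * std - mean,
]
image = np.random.randint(0, 256, (64, 64)).astype(float)
print(evolved_descriptor(image, trees))  # 4-bin histogram for m = 2
```

The per-image cost is a single pass over the pixels with constant work per window, which is consistent with the millisecond-scale evaluation times reported above.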
7 Conclusions
In this article, a multitree GP representation has been successfully utilised for the task of automatically evolving image descriptors, where only a few instances from each class are used to provide feedback on the quality of those descriptors. An evolved descriptor comprises a number of trees, and simple arithmetic operators and first-order statistics are automatically combined to form each tree. The experimental results on seven texture datasets show the superior performance of MGPD compared with six handcrafted descriptors designed by domain experts, a descriptor automatically evolved by the baseline method, and three CNN-based methods. This study has also thoroughly analysed some key factors of the evolutionary process and the evolved descriptors. The analysis shows that the method finds good candidate solutions within the first few generations and iteratively improves them over the later generations of the evolutionary process. Furthermore, these descriptors do not require a long time to evolve and are very fast to evaluate, as they comprise only a number of simple arithmetic operators. Their interpretability is an important property that can help in optimising and simplifying such descriptors. Analysing some of the image descriptors automatically evolved by MGPD revealed that different patterns can be identified and learned from them, for example, how function and terminal node types have been combined differently to form the subtrees of an individual.
7.1 Major Contributions
The study has made the following contributions:
- This is the first study that utilises multitree GP to automatically evolve image descriptors using only two instances per class.
- The analysis reveals that the evolved descriptors are relatively fast to evolve and very fast to evaluate.
- The evolved descriptors are interpretable and can be simplified (see the sketch after this list).
- The evolved descriptors have significantly outperformed both those manually crafted by domain experts and those automatically evolved by the state-of-the-art baseline method.
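Because the evolved trees consist solely of arithmetic operators applied to window statistics, they can be exported as symbolic expressions and simplified with an off-the-shelf computer algebra system. The fragment below is a hypothetical illustration using SymPy; the expression shown does not come from an actual MGPD run.

```python
import sympy as sp

# Symbols standing for the first-order window statistics.
mn, mx, mean, std = sp.symbols('min max mean std')

# A hypothetical evolved tree, written out as nested arithmetic.
tree = (mean - mn) + (std - mean) + (mx - std)

# SymPy cancels the redundant sub-expressions automatically.
print(sp.simplify(tree))  # -> max - min
```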
7.2 Future Work
Different directions can be investigated in the future, either to further improve the performance of MGPD or to elucidate the semantics of its different components. Apart from the minimum and maximum tree depth, there is no restriction on program size in the current implementation. Different mechanisms and approaches, for example, multiobjective optimisation, can be explored to reduce program size, which can largely reduce the complexity of the evolved programs; we aim to investigate this in the future, as sketched below. We would also like to study the effect of using crossover and mutation operators that apply multiple changes, instead of the conventional single change, to the selected program(s). Although this is believed to speed up the exploration of the search space and may lead to the identification of better individuals, it requires substantial changes. Moreover, the impact of the number of trees per individual on the performance of MGPD can be studied in the future. The main goal of this article is to use few-shot learning in GP for evolving texture image descriptors. In the literature, transfer learning and data augmentation are also used to cope with a small number of instances; in the future, we will investigate the pros and cons of these methods for further improvements. Although the current design considers only grey-scale images, it can be extended to handle coloured images, which is ongoing work.
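As an indication of how little plumbing the multiobjective extension would require, the fragment below sets up a two-objective fitness (maximise accuracy, minimise program size) with the DEAP library, which is commonly used for GP in Python. This is a sketch of the general idea rather than MGPD's actual code; `compute_accuracy` is a hypothetical helper, and NSGA-II selection would then trade the two objectives off during evolution.

```python
from deap import base, creator, tools

# Maximise classification accuracy, minimise total program size.
creator.create("FitnessAccSize", base.Fitness, weights=(1.0, -1.0))

def evaluate(individual):
    # Hypothetical helper: accuracy on the two-instances-per-class
    # training set; size is the node count summed over all trees.
    accuracy = compute_accuracy(individual)
    size = sum(len(tree) for tree in individual)
    return accuracy, size

# NSGA-II selection keeps the accuracy/size Pareto front.
toolbox = base.Toolbox()
toolbox.register("select", tools.selNSGA2)
```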
Acknowledgments
This work was supported in part by the MBIE Data Science SSIF Fund under the contract RTVU1914, the Marsden Fund of New Zealand Government under Contracts VUW1615, VUW1913 and VUW1914, and the Science for Technological Innovation Challenge (SfTI) fund under grant E3603/2903.
Notes
Available at: http://multibandtexture.recherche.usherbrooke.ca
Available at: http://www.outex.oulu.fi/index.php?page=classification
Available at: http://www.cb.uu.se/~gustaf/KylbergSintornRotation/
Available at: http://www.cb.uu.se/~gustaf/texture/
Available at: http://www.cse.oulu.fi/CMV/Downloads/LBPMatlab
Available at: http://www.comp.polyu.edu.hk/~cslzhang/code/CLBP.rar
Available at: http://www.cs.tut.fi/~mehta/drlbp