Mengjie Zhang
Evolutionary Computation 1–28.
Published: 27 November 2024
Abstract
Performing classification on high-dimensional data poses a significant challenge due to the huge search space. Moreover, complex feature interactions introduce an additional obstacle. These problems can be addressed by using feature selection to select relevant features or feature construction to construct a small set of high-level features. However, performing feature selection or feature construction alone may still leave the feature set suboptimal. To remedy this, this study investigates the use of genetic programming for simultaneous feature selection and feature construction in addressing different classification tasks. The proposed approach is tested on 16 datasets and compared with seven methods, including both feature selection and feature construction techniques. The results show that the obtained feature sets, with the constructed and/or selected features, can significantly increase the classification accuracy and reduce the dimensionality of the datasets. Further analysis reveals the complementarity of the obtained features, which leads to the promising classification performance of the proposed method.
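To make the idea concrete, here is a minimal sketch (not the paper's implementation) of how a single GP tree performs feature construction and feature selection at once: evaluating the tree yields one constructed high-level feature, while the original features appearing as its leaves form the selected subset. The tuple-based tree encoding and all names are illustrative assumptions.

```python
import operator, random

# Illustrative sketch: a GP tree is a nested tuple (op, left, right) whose
# leaves are original feature indices. Evaluating it constructs one
# high-level feature; the leaves it touches are the selected features.
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def random_tree(n_features, depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.randrange(n_features)      # leaf: an original feature
    return (random.choice(list(OPS)),
            random_tree(n_features, depth - 1),
            random_tree(n_features, depth - 1))

def construct(tree, x):
    if isinstance(tree, int):
        return x[tree]
    op, left, right = tree
    return OPS[op](construct(left, x), construct(right, x))

def selected(tree):
    if isinstance(tree, int):
        return {tree}
    return selected(tree[1]) | selected(tree[2])

random.seed(1)
tree = random_tree(n_features=10)
print(construct(tree, x=list(range(10))))  # value of the constructed feature
print(sorted(selected(tree)))              # features implicitly selected
```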
Evolutionary Computation 1–27.
Published: 21 November 2024
Abstract
High dimensionality is one of the serious real-world data challenges in symbolic regression, and it becomes even more challenging when the data are incomplete. Genetic programming has been successfully utilised for high-dimensional tasks due to its natural feature selection ability, but it is not directly applicable to incomplete data. Commonly, the missing values must be imputed first, and genetic programming is then performed on the imputed, complete data. However, when many of the incomplete features are irrelevant, it is intuitively unnecessary to perform costly imputations on such features. For this purpose, this work proposes a genetic programming-based approach to select features directly from incomplete high-dimensional data to improve symbolic regression performance. We extend the concept of identity/neutral elements from mathematics into the function operators of genetic programming so that they can handle the missing values in incomplete data. Experiments have been conducted on a number of data sets considering different missingness ratios in high-dimensional symbolic regression tasks. The results show that the proposed method leads to better symbolic regression results when compared with state-of-the-art methods that can select features directly from incomplete data. Further results show that our approach not only leads to better symbolic regression accuracy but also selects a smaller number of relevant features, consequently improving both the effectiveness and the efficiency of the learning process.
Includes: Supplementary data
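The identity/neutral-element idea lends itself to a short sketch. The reading below is one plausible interpretation, not the authors' code: when an operand of an operator is missing, it is replaced by that operator's identity element (0 for addition and subtraction, 1 for multiplication and division), so evaluation proceeds without imputation.

```python
import math

# Sketch of the identity/neutral-element idea (an interpretation, not the
# authors' code): if an operand is missing, substitute the value that leaves
# the other operand unchanged, so the GP tree still returns a number.
def _fill(v, identity):
    return identity if (v is None or (isinstance(v, float) and math.isnan(v))) else v

def add(a, b): return _fill(a, 0.0) + _fill(b, 0.0)
def sub(a, b): return _fill(a, 0.0) - _fill(b, 0.0)
def mul(a, b): return _fill(a, 1.0) * _fill(b, 1.0)
def div(a, b):
    b = _fill(b, 1.0)
    return _fill(a, 1.0) / b if b != 0 else 1.0   # protected division

# A GP expression such as x0 * x2 + x5 keeps working on incomplete rows:
row = {'x0': 2.0, 'x2': float('nan'), 'x5': 4.0}
print(add(mul(row['x0'], row['x2']), row['x5']))  # x2 treated as 1 -> 6.0
```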
Evolutionary Computation 1–34.
Published: 05 November 2024
Abstract
In classification, feature selection is an essential preprocessing step that selects a small subset of features to improve classification performance. Existing feature selection approaches can be divided into three main categories: wrapper, filter, and embedded approaches. Compared with the other two categories, embedded approaches usually achieve a better trade-off between classification performance and computation time. One of the most well-known embedded approaches is sparsity regularisation-based feature selection, which generates sparse solutions for feature selection. Despite its good performance, sparsity regularisation-based feature selection outputs only a feature ranking, which requires the number of selected features to be predefined. More importantly, the ranking mechanism introduces a risk of ignoring feature interactions, so that many top-ranked but redundant features are selected. This work addresses the above problems by proposing a new representation that considers the interactions between features and can automatically determine an appropriate number of selected features. The proposed representation is used in a differential evolution (DE) algorithm to optimise the feature subset. In addition, a novel initialisation mechanism is proposed so that DE considers various numbers of selected features from the outset. The proposed algorithm is examined on both synthetic and real-world datasets. The results on the synthetic dataset show that the proposed algorithm can select complementary features, whereas existing sparsity regularisation-based feature selection algorithms are at risk of selecting redundant features. The results on real-world datasets show that the proposed algorithm achieves better classification performance than well-known wrapper, filter, and embedded approaches, while being as efficient as filter feature selection approaches.
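A schematic of such a representation (the 0.6 threshold, population size, and initialisation scheme are assumptions for illustration): each DE individual is a real-valued vector, a feature is selected when its entry exceeds the threshold, so the subset size emerges from the vector itself, and initialisation deliberately spreads the initial subset sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, POP_SIZE, THRESHOLD = 50, 20, 0.6   # threshold is an assumption

def decode(individual):
    """A feature is selected iff its entry exceeds the threshold, so the
    subset size is determined by the vector itself, not fixed in advance."""
    return np.flatnonzero(individual > THRESHOLD)

def diverse_init():
    """Seed the population with widely varying subset sizes, so DE starts
    by considering many different numbers of selected features."""
    pop = rng.uniform(0.0, THRESHOLD, size=(POP_SIZE, N_FEATURES))
    for individual in pop:
        k = rng.integers(1, N_FEATURES + 1)       # target subset size
        chosen = rng.choice(N_FEATURES, size=k, replace=False)
        individual[chosen] = rng.uniform(THRESHOLD, 1.0, size=k)
    return pop

pop = diverse_init()
print([len(decode(ind)) for ind in pop])          # a spread of subset sizes
```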
Evolutionary Computation (2024) 32 (3): 217–248.
Published: 03 September 2024
Abstract
Minimizing the number of selected features and maximizing the classification performance are two main objectives in feature selection, which can be formulated as a bi-objective optimization problem. Due to the complex interactions between features, a solution (i.e., feature subset) with poor objective values does not mean that all the features it selects are useless, as some of them combined with other complementary features can greatly improve the classification performance. Thus, it is necessary to consider not only the performance of feature subsets in the objective space, but also their differences in the search space, to explore more promising feature combinations. To this end, this paper proposes a tri-objective method for bi-objective feature selection in classification, which solves a bi-objective feature selection problem as a tri-objective problem by considering the diversity (differences) between feature subsets in the search space as the third objective. The selection based on the converted tri-objective method can maintain a balance between minimizing the number of selected features, maximizing the classification performance, and exploring more promising feature subsets. Furthermore, a novel initialization strategy and an offspring reproduction operator are proposed to promote the diversity of feature subsets in the objective space and improve the search ability, respectively. The proposed algorithm is compared with five multiobjective-based feature selection methods, six typical feature selection methods, and two peer methods with diversity as a helper objective. Experimental results on 20 real-world classification datasets suggest that the proposed method outperforms the compared methods in most scenarios.
Includes: Supplementary data
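One way to phrase the converted problem is sketched below. The diversity measure used here, minimum Hamming distance to the rest of the population, is an assumption for illustration; the paper defines its own third objective.

```python
import numpy as np

def tri_objectives(mask, error_rate, population):
    """Evaluate one feature subset (a boolean mask) on three objectives,
    all to be minimised. Illustrative only: the diversity measure (min
    Hamming distance to the rest of the population) is an assumption."""
    n_selected = int(mask.sum())
    others = [p for p in population if p is not mask]
    diversity = min(int(np.sum(mask != p)) for p in others)
    return (n_selected, error_rate, -diversity)   # negated: maximise diversity

rng = np.random.default_rng(1)
population = [rng.random(30) > 0.5 for _ in range(10)]
print(tri_objectives(population[0], error_rate=0.12, population=population))
```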
Evolutionary Computation 1–26.
Published: 05 August 2024
Abstract
Evolutionary Computation (EC) often throws away learned knowledge as it is reset for each new problem addressed. Conversely, humans can learn from small-scale problems, retain this knowledge (plus functionality), and then successfully reuse it in larger-scale and/or related problems. Linking solutions to problems has been achieved through layered learning, where an experimenter sets a series of simpler related problems to solve a more complex task. Recent work on Learning Classifier Systems (LCSs) has shown that knowledge reuse through the adoption of Code Fragments (GP-like tree-based programs) is plausible. However, random reuse is inefficient. Thus, the research question is how an LCS can adopt a layered-learning framework such that increasingly complex problems can be solved efficiently. An LCS (named XCSCF*) has been developed to include the base axioms necessary for learning, refined methods for transfer learning, and learning recast as a decomposition into a series of subordinate problems. These subordinate problems can be set as a curriculum by a teacher, but this does not mean that an agent can learn from it, especially if it only extracts over-fitted knowledge of each problem rather than the underlying scalable patterns and functions. Results show that, from a conventional tabula rasa with only a vague notion of which subordinate problems might be relevant, XCSCF* captures the general logic behind the tested domains and can therefore solve any n-bit Multiplexer, n-bit Carry-one, n-bit Majority-on, and n-bit Even-parity problem. This work demonstrates a step towards continual learning, as learned knowledge is effectively reused in subsequent problems.
Evolutionary Computation (2022) 30 (1): 99–129.
Published: 01 March 2022
Abstract
High-dimensional unbalanced classification is challenging because of the joint effects of high dimensionality and class imbalance. Genetic programming (GP) has potential benefits for high-dimensional classification due to its built-in capability to select informative features. However, when data are not evenly distributed, GP tends to develop biased classifiers which achieve a high accuracy on the majority class but a low accuracy on the minority class. Unfortunately, the minority class is often at least as important as the majority class. It is therefore important to investigate how GP can be effectively utilized for high-dimensional unbalanced classification. In this article, to address the performance bias issue of GP, a new two-criterion fitness function is developed that considers the approximation of the area under the curve (AUC) and the classification clarity (i.e., how well a program can separate the two classes). The obtained values on the two criteria are combined in pairs instead of being summed together. Furthermore, this article designs a three-criterion tournament selection to effectively identify and select good programs to be used by genetic operators for generating offspring during the evolutionary learning process. The experimental results show that the proposed method achieves better classification performance than the compared methods.
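A sketch of what such a paired fitness could look like (the AUC approximation via the Mann-Whitney statistic is standard; the particular "clarity" measure below is an assumption, not the paper's exact formula):

```python
import numpy as np

def auc_approx(outputs, labels):
    """Mann-Whitney estimate of AUC: the probability that a random
    minority-class output exceeds a random majority-class output."""
    pos, neg = outputs[labels == 1], outputs[labels == 0]
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (len(pos) * len(neg))

def clarity(outputs, labels):
    """One plausible 'clarity' measure (an assumption): how far apart the
    two classes' mean program outputs are."""
    return float(outputs[labels == 1].mean() - outputs[labels == 0].mean())

def fitness(outputs, labels):
    # Keep the two criteria as a pair rather than collapsing them into one
    # weighted sum; selection can then compare programs criterion by criterion.
    return (auc_approx(outputs, labels), clarity(outputs, labels))

rng = np.random.default_rng(2)
labels = np.array([1] * 10 + [0] * 90)            # unbalanced classes
outputs = rng.normal(loc=labels.astype(float), scale=1.0)
print(fitness(outputs, labels))
```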
Evolutionary Computation (2021) 29 (3): 331–366.
Published: 01 September 2021
Abstract
The performance of image classification is highly dependent on the quality of the extracted features that are used to build a model. Designing such features usually requires prior knowledge of the domain and is often undertaken by a domain expert who, if available, is very costly to employ. Automating the process of designing such features can largely reduce the cost and effort associated with this task. Image descriptors, such as local binary patterns, have emerged in computer vision and aim at detecting keypoints, for example, corners, line segments, and shapes, in an image and extracting features from those keypoints. In this article, genetic programming (GP) is used to automatically evolve an image descriptor using only two instances per class, utilising a multitree program representation. The automatically evolved descriptor operates directly on the raw pixel values of an image and generates the corresponding feature vector. Seven well-known datasets were adapted to the few-shot setting and used to assess the performance of the proposed method, which was compared against six handcrafted image descriptors, one evolutionary computation-based image descriptor, and three convolutional neural network (CNN)-based methods. The experimental results show that the new method significantly outperforms the competitor image descriptors and CNN-based methods. Furthermore, different patterns have been identified from analysing the evolved programs.
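A loose sketch of the multitree descriptor idea, with hand-written lambdas standing in for evolved trees: each tree maps a raw-pixel window to one bit, the bits form a code per window, and the histogram of codes over the image is the feature vector, much like an evolved local binary pattern. All sizes and functions here are illustrative assumptions.

```python
import numpy as np

# Hand-written stand-ins for evolved trees: each maps a 3x3 window to a value.
trees = [
    lambda w: w[0, 0] - w[2, 2],
    lambda w: w[1, 1] - w.mean(),
    lambda w: w[0, 2] - w[2, 0],
]

def describe(image):
    codes = []
    for r in range(image.shape[0] - 2):
        for c in range(image.shape[1] - 2):
            window = image[r:r + 3, c:c + 3]
            bits = [int(t(window) > 0) for t in trees]   # one bit per tree
            codes.append(int(''.join(map(str, bits)), 2))
    hist = np.bincount(codes, minlength=2 ** len(trees))
    return hist / hist.sum()                  # normalised feature vector

image = np.random.default_rng(3).random((16, 16))
print(describe(image))                        # 8-dimensional descriptor
```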
Evolutionary Computation (2021) 29 (1): 75–105.
Published: 01 March 2021
Abstract
Dynamic Flexible Job Shop Scheduling (DFJSS) is an important and challenging problem that can have multiple conflicting objectives. Genetic Programming Hyper-Heuristic (GPHH) is a promising approach for responding quickly to the dynamic and unpredictable events in DFJSS. A GPHH algorithm evolves dispatching rules (DRs) that are used to make decisions during the scheduling process according to a so-called heuristic template. In DFJSS, there are two kinds of scheduling decisions: the routing decision, which allocates each operation to a machine to process it, and the sequencing decision, which selects the next job to be processed by each idle machine. The traditional heuristic template makes both routing and sequencing decisions in a non-delay manner, which may have limitations in handling the dynamic environment. In this article, we propose a novel heuristic template that delays the routing decisions rather than making them immediately. This way, all the decisions can be made under the latest and most accurate information. We propose three different delayed routing strategies and automatically evolve the rules in the heuristic template by GPHH. We evaluate the newly proposed GPHH with Delayed Routing (GPHH-DR) on a multiobjective DFJSS problem that optimises energy efficiency and mean tardiness. The experimental results show that GPHH-DR significantly outperforms the state-of-the-art GPHH methods. We further demonstrate the efficacy of the proposed heuristic template with delayed routing, which suggests the importance of delaying the routing decisions.
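A heavily simplified sketch of the delayed-routing idea (the rule below is a hand-written stand-in for a GP-evolved rule, and all data structures are illustrative): rather than fixing an operation's machine the moment it becomes ready, the operation waits in a pool and is routed only when a machine actually falls idle, so the decision sees the latest shop state.

```python
def routing_rule(op, machine, state):
    # hand-written stand-in for a GP-evolved rule: prefers short processing
    # time, a short queue, and an early due date (lower score = preferred)
    return op['proc_time'][machine] + state['queue_len'][machine] + 0.1 * op['due']

def on_machine_idle(machine, pool, state):
    """Delayed routing: the machine-to-operation assignment is made only
    when the machine is actually idle, using the latest shop state."""
    candidates = [op for op in pool if machine in op['proc_time']]
    if not candidates:
        return None
    op = min(candidates, key=lambda o: routing_rule(o, machine, state))
    pool.remove(op)
    return op

pool = [{'proc_time': {'M1': 5, 'M2': 7}, 'due': 20},
        {'proc_time': {'M1': 3}, 'due': 15}]
state = {'queue_len': {'M1': 2, 'M2': 0}}
print(on_machine_idle('M1', pool, state))
```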
Evolutionary Computation (2020) 28 (4): 531–561.
Published: 01 December 2020
Abstract
Clustering is a difficult and widely studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g., Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally predefined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this article, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics.
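The graph-based use of an evolved similarity function can be sketched compactly (the similarity function below is a hand-written stand-in for an evolved GP tree, and the thresholding scheme is an assumption): connect every pair whose similarity clears a threshold, then read the clusters off as connected components.

```python
def similarity(a, b):                     # stand-in for an evolved GP tree
    return -abs(a[0] - b[0]) - 0.5 * abs(a[1] - b[1])

def cluster(points, threshold=-1.0):
    n = len(points)
    adjacency = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if similarity(points[i], points[j]) > threshold:
                adjacency[i].append(j)
                adjacency[j].append(i)
    labels, seen = {}, set()
    for start in range(n):                # connected components = clusters
        if start in seen:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                labels[node] = start
                stack.extend(adjacency[node])
    return [labels[i] for i in range(n)]

points = [(0, 0), (0.2, 0.1), (5, 5), (5.1, 4.9)]
print(cluster(points))                    # e.g. [0, 0, 2, 2]
```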
Evolutionary Computation (2020) 28 (4): 563–593.
Published: 01 December 2020
Abstract
Due to its direct relevance to post-disaster operations, meter reading, and civil refuse collection, the Uncertain Capacitated Arc Routing Problem (UCARP) is an important optimisation problem. Stochastic models are critical to study, as they represent the real world more accurately than their deterministic counterparts. Although there have been extensive studies in solving routing problems under uncertainty, very few have considered UCARP, and none consider collaboration between vehicles to handle the negative effects of uncertainty. This article proposes a novel Solution Construction Procedure (SCP) that generates solutions to UCARP within a collaborative, multi-vehicle framework. It consists of two types of collaborative activities: one when a vehicle unexpectedly expends its capacity (route failure), and the other during the refill process. Then, we propose a Genetic Programming Hyper-Heuristic (GPHH) algorithm to evolve the routing policy used within the collaborative framework. The experimental studies show that the new heuristic with vehicle collaboration and a GP-evolved routing policy significantly outperforms the compared state-of-the-art algorithms on commonly studied test problems. This is shown to be especially true on instances with larger numbers of tasks and vehicles, which clearly shows the advantage of vehicle collaboration in handling the uncertain environment and the effectiveness of the newly proposed algorithm.
Evolutionary Computation (2020) 28 (2): 289–316.
Published: 01 June 2020
Abstract
The uncertain capacitated arc routing problem is of great significance for its wide applications in the real world. In this problem, variables such as task demands and travel costs are realised in real time, which may cause a predefined solution to become ineffective and/or infeasible. There are two main challenges in solving the problem. One is to obtain a high-quality and robust baseline task sequence, and the other is to design an effective recourse policy to adjust the baseline task sequence when it becomes infeasible and/or ineffective during execution. Existing studies typically tackle only one challenge (the other being addressed using a naive strategy), and no existing work optimises the baseline task sequence and the recourse policy simultaneously. To fill this gap, we propose a novel proactive-reactive approach, which represents a solution as a baseline task sequence and a recourse policy. The two components are optimised under a cooperative coevolution framework, in which the baseline task sequence is evolved by an estimation of distribution algorithm and the recourse policy is evolved by genetic programming. The experimental results show that the proposed algorithm, called Solution-Policy Coevolver, significantly outperforms the state-of-the-art algorithms for the uncertain capacitated arc routing problem on the ugdb and uval benchmark instances. Through further analysis, we discovered that route failure is not always detrimental; in certain cases (e.g., when the vehicle is on the way back to the depot), allowing route failure can lead to better solutions.
Includes: Supplementary data
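The bare-bones shape of such a cooperative coevolution loop is sketched below, with the EDA, the GP, the variation operators, and the UCARP simulation all stubbed out; only the pairing scheme, each subpopulation judged against the best individual of the other, reflects the described framework.

```python
import random

def evaluate(sequence, policy):
    # stub for simulating a UCARP solution (sequence + policy) under
    # sampled uncertain demands/costs; lower is better
    return random.random()

def coevolve(sequences, policies, generations=10):
    """Cooperative coevolution skeleton: alternate between the two
    subpopulations, each evaluated with the other's best fixed."""
    best_seq, best_pol = sequences[0], policies[0]
    for _ in range(generations):
        # baseline task sequences (an EDA in the paper); variation omitted
        sequences.sort(key=lambda s: evaluate(s, best_pol))
        best_seq = sequences[0]
        # recourse policies (GP in the paper); variation omitted
        policies.sort(key=lambda p: evaluate(best_seq, p))
        best_pol = policies[0]
    return best_seq, best_pol

random.seed(0)
print(coevolve([[1, 2, 3], [3, 1, 2]], ['policy-A', 'policy-B']))
```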
Evolutionary Computation (2019) 27 (3): 467–496.
Published: 01 September 2019
Abstract
Designing effective dispatching rules for production systems is a difficult and time-consuming task when done manually. In the last decade, the growth of computing power, advanced machine learning, and optimisation techniques has made the automated design of dispatching rules possible, and automatically discovered rules are competitive with, or outperform, existing rules developed by researchers. Genetic programming is one of the most popular approaches to discovering dispatching rules in the literature, especially for complex production systems. However, the large heuristic search space may prevent genetic programming from finding near-optimal dispatching rules. This article develops a new hybrid genetic programming algorithm for dynamic job shop scheduling based on a new representation, a new local search heuristic, and efficient fitness evaluators. Experiments show that the new method is effective regarding the quality of the evolved rules. Moreover, the evolved rules are also significantly smaller and contain more relevant attributes.
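In practice, a dispatching rule is simply a priority function over job and shop attributes, applied whenever a machine becomes free. The rule below is a made-up example of that form, not one evolved in the article:

```python
# A dispatching rule as a priority function; smaller = more urgent.
def priority(job, now):
    slack = job['due'] - now - job['remaining_work']
    return job['proc_time'] + 0.5 * max(slack, 0)

def next_job(queue, now):
    """Pick the queued job with the best (lowest) priority value."""
    return min(queue, key=lambda job: priority(job, now))

queue = [
    {'proc_time': 4, 'remaining_work': 10, 'due': 30},
    {'proc_time': 6, 'remaining_work': 6,  'due': 18},
]
print(next_job(queue, now=12))   # the tighter-slack job is chosen
```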
Evolutionary Computation (2017) 25 (2): 173–204.
Published: 01 June 2017
Abstract
A main research direction in the field of evolutionary machine learning is to develop a scalable classifier system to solve high-dimensional problems. Recently, work has begun on autonomously reusing learned building blocks of knowledge to scale from low-dimensional problems to high-dimensional ones. An XCS-based classifier system, known as XCSCFC, has been shown to be scalable, through the addition of expression tree-like code fragments, to a limit beyond standard learning classifier systems. XCSCFC is especially beneficial if the target problem can be divided into a hierarchy of subproblems, each of which is solvable in a bottom-up fashion. However, if the hierarchy of subproblems is too deep, then XCSCFC becomes impractical because of the computational time needed and thus eventually hits a limit in problem size. A limitation of this technique is the lack of a cyclic representation, which is inherent in finite state machines (FSMs). However, the evolution of FSMs is a hard task owing to the combinatorially large number of possible states, connections, and interactions. Usually this requires supervised learning to minimize inappropriate FSMs, which for high-dimensional problems necessitates subsampling or incremental testing. To avoid these constraints, this work introduces a state-machine-based encoding scheme into XCS for the first time, termed XCSSMA. The proposed system has been tested on six complex Boolean problem domains: multiplexer, majority-on, carry, even-parity, count ones, and digital design verification problems. The proposed approach outperforms XCSCFA (an XCS that computes actions) and XCSF (an XCS that computes predictions) in three of the six problem domains, while its performance in the others is similar. In addition, XCSSMA evolved, for the first time, compact and human-readable general classifiers (i.e., ones solving any n-bit problem) for the even-parity and carry problem domains, demonstrating its ability to produce scalable solutions using a cyclic representation.
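The appeal of a cyclic representation is easy to demonstrate: the two-state machine below classifies even-parity for any input length, whereas acyclic tree-based rules must grow with n. It illustrates the kind of general solution described, not XCSSMA's actual encoding.

```python
# A two-state FSM: the state tracks whether the ones seen so far are even.
TRANSITIONS = {('even', 0): 'even', ('even', 1): 'odd',
               ('odd', 0): 'odd',  ('odd', 1): 'even'}

def even_parity(bits):
    state = 'even'
    for bit in bits:                      # same machine for any n
        state = TRANSITIONS[(state, bit)]
    return state == 'even'

print(even_parity([1, 0, 1]))             # True: an even number of ones
print(even_parity([1, 1, 1, 0]))          # False
```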
Evolutionary Computation (2016) 24 (1): 143–182.
Published: 01 March 2016
Abstract
In the computer vision and pattern recognition fields, image classification represents an important yet difficult task. It is a challenge to build effective computer models that replicate the remarkable ability of the human visual system, which relies on only one or a few instances to learn a completely new class or an object of a class. Recently, we proposed two genetic programming (GP) methods, one-shot GP and compound-GP, that aim to evolve a program for the task of binary classification in images. The two methods are designed to use only one or a few instances per class to evolve the model. In this study, we investigate these two methods in terms of performance, robustness, and complexity of the evolved programs. We use ten data sets that vary in difficulty to evaluate these two methods, and also compare them with two other GP and six non-GP methods. The results show that one-shot GP and compound-GP outperform or achieve results comparable to those of competitor methods. Moreover, in most cases the features extracted by these two methods improve the performance of other classifiers relative to handcrafted features and those extracted by a recently developed GP-based method.
Evolutionary Computation (2014) 22 (4): 629–650.
Published: 01 December 2014
Abstract
Image pattern classification is a challenging task due to the large search space of pixel data. Supervised and subsymbolic approaches have proven accurate in learning a problem’s classes. However, in the complex image recognition domain, there is a need to investigate learning techniques that allow humans to interpret the learned rules in order to gain insight into the problem. Learning classifier systems (LCSs) are a machine learning technique that has been minimally explored for image classification. This work has developed the feature pattern classification system (FPCS) framework by adopting Haar-like features from the image recognition domain for feature extraction. The FPCS integrates Haar-like features with XCS, which is an accuracy-based LCS. A major contribution of this work is that the developed framework is capable of producing human-interpretable rules. The FPCS system achieved 91.1% accuracy on the unseen test set of the MNIST dataset. In addition, the FPCS is capable of autonomously adjusting the rotation angle in unaligned images; this rotation adjustment raised the accuracy of FPCS to 95%. Although this performance is competitive with equivalent approaches, it was not as accurate as subsymbolic approaches on this dataset. However, the interpretability of the rules produced by FPCS enabled us to identify the distribution of the learned angles—a normal distribution around —which would have been very difficult to obtain in subsymbolic approaches. The analyzable nature of FPCS is anticipated to be beneficial in domains such as speed sign recognition, where the underlying reasoning and confidence of recognition need to be human interpretable.
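Haar-like feature extraction itself is a standard technique and can be shown briefly: an integral image makes any rectangle sum an O(1) lookup, and a two-rectangle feature is the difference between adjacent rectangle sums. The window placement and sizes below are arbitrary examples.

```python
import numpy as np

def integral_image(img):
    # ii[r, c] = sum of all pixels in img[:r+1, :c+1]
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r, c, h, w):
    """Sum of the h x w rectangle with top-left corner (r, c), in O(1)."""
    total = ii[r + h - 1, c + w - 1]
    if r > 0: total -= ii[r - 1, c + w - 1]
    if c > 0: total -= ii[r + h - 1, c - 1]
    if r > 0 and c > 0: total += ii[r - 1, c - 1]
    return total

def haar_two_rect(ii, r, c, h, w):
    """Left-vs-right two-rectangle Haar-like feature of size h x 2w."""
    return rect_sum(ii, r, c, h, w) - rect_sum(ii, r, c + w, h, w)

img = np.random.default_rng(4).random((28, 28))
ii = integral_image(img)
print(haar_two_rect(ii, r=4, c=4, h=8, w=6))
```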
Evolutionary Computation (2014) 22 (1): 105–138.
Published: 01 March 2014
Abstract
Due-date assignment plays an important role in scheduling systems and strongly influences the delivery performance of job shops. Because of the stochastic and dynamic nature of job shops, the development of general due-date assignment models (DDAMs) is complicated. In this study, two genetic programming (GP) methods are proposed to evolve DDAMs for job shop environments. The experimental results show that the evolved DDAMs can make more accurate estimates than existing dynamic DDAMs, with promising reusability. In addition, the evolved operation-based DDAMs show better performance than the evolved DDAMs that employ aggregate information about jobs and machines.
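To illustrate the two model families being compared, here are two made-up DDAMs of the corresponding forms (not the evolved models): one assigns a due date from aggregate job and shop information, while the operation-based one accumulates a separate allowance per operation.

```python
def ddam_aggregate(job, shop):
    # one allowance from aggregate job/shop information
    return job['arrival'] + 2.0 * job['total_work'] + 0.5 * shop['avg_queue_work']

def ddam_operation_based(job, shop):
    # per-operation allowances: each operation's expected wait depends on
    # the workload of the machine that will process it
    flow = sum(p + 0.8 * shop['queue_work'][m] for m, p in job['operations'])
    return job['arrival'] + flow

job = {'arrival': 100.0, 'total_work': 17.0,
       'operations': [('M1', 5.0), ('M2', 12.0)]}
shop = {'avg_queue_work': 30.0, 'queue_work': {'M1': 20.0, 'M2': 40.0}}
print(ddam_aggregate(job, shop), ddam_operation_based(job, shop))
```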