Learning with imbalanced data is one of the recent challenges in machine learning. Various solutions have been proposed in order to find a treatment for this problem, such as modifying methods or the application of a preprocessing stage. Within the preprocessing focused on balancing data, two tendencies exist: reduce the set of examples (undersampling) or replicate minority class examples (oversampling). Undersampling with imbalanced datasets could be considered as a prototype selection procedure with the purpose of balancing datasets to achieve a high classification rate, avoiding the bias toward majority class examples. Evolutionary algorithms have been used for classical prototype selection showing good results, where the fitness function is associated to the classification and reduction rates. In this paper, we propose a set of methods called evolutionary undersampling that take into consideration the nature of the problem and use different fitness functions for getting a good trade-off between balance of distribution of classes and performance. The study includes a taxonomy of the approaches and an overall comparison among our models and state of the art undersampling methods. The results have been contrasted by using nonparametric statistical procedures and show that evolutionary undersampling outperforms the nonevolutionary models when the degree of imbalance is increased.

This content is only available as a PDF.