Abstract
Complementary to machine learning, controllers for swarm robotics can be evolved using methods of evolutionary computation. Approaches such as novelty search and MAP-Elites go beyond mere fitness-based optimization by increasing the time spent on exploration. Instead of optimizing a fitness function, selective pressure towards unexplored regions of behavior space is generated by rewarding behavioral distance to previously seen behaviors. Ideally, we would like to define a generic behavioral distance function; in practice, however, effective distance functions are usually domain specific.
Our minimize surprise approach concurrently evolves two artificial neural networks: one for action selection and one as a world model. Selective pressure is implemented by rewarding accurate predictions of the world model. As a result, the evolutionary dynamics push towards swarm behaviors that are easy to predict; in effect, the robots try to minimize surprise in their environment. Here, we compare minimize surprise to novelty search and, as a baseline, a standard genetic algorithm in simulations of swarm robots. We observe a diversity of emergent collective behaviors, such as aggregation, dispersion, clustering, and line formation. We find that minimize surprise is competitive with novelty search in the investigated swarm scenario, even though it does not require a cleverly crafted domain-specific behavioral distance function.
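To make the selection mechanism concrete, the following is a minimal sketch, not the implementation used in the paper: it assumes both networks are small feedforward networks, that sensor values are thresholded to binary for the accuracy count, and that env_step is a toy stand-in for the simulated swarm dynamics; all function names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def init_params(n_in, n_hidden, n_out):
    """Random weights for a single-hidden-layer feedforward network."""
    return [rng.normal(0.0, 0.5, (n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(0.0, 0.5, (n_hidden, n_out)), np.zeros(n_out)]

def forward(params, x):
    """Evaluate the network with tanh activations."""
    W1, b1, W2, b2 = params
    return np.tanh(np.tanh(x @ W1 + b1) @ W2 + b2)

def minimize_surprise_fitness(action_net, world_model, env_step, s0, T=100):
    """Fitness of one genome pair: mean per-sensor prediction accuracy
    of the world model over T time steps. No task-specific reward is
    used; only prediction quality is selected for."""
    s, correct = s0, 0.0
    for _ in range(T):
        a = forward(action_net, s)                           # action selection
        pred = forward(world_model, np.concatenate([s, a]))  # predicted next sensors
        s = env_step(s, a)                                   # actual next sensors
        correct += np.mean((pred > 0) == (s > 0))            # binary sensors assumed
    return correct / T

# Toy stand-in for a robot's sensor dynamics (purely illustrative).
def env_step(s, a):
    return np.tanh(0.5 * s + a.mean())

n_sensors, n_actions = 4, 2  # hypothetical sizes
action_net = init_params(n_sensors, 8, n_actions)
world_model = init_params(n_sensors + n_actions, 8, n_sensors)
print(minimize_surprise_fitness(action_net, world_model, env_step,
                                rng.uniform(-1, 1, n_sensors)))
```

In an evolutionary run, a fitness of this form would score each pair of action network and world model, so that genomes driving the swarm into easily predictable behaviors are selected for.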