In many problems of interest, performance can be evaluated using tests, such as examples in concept learning, test points in function approximation, and opponents in game-playing. Evaluation on all tests is often infeasible. Identification of an accurate evaluation or fitness function is a difficult problem in itself, and approximations are likely to introduce human biases into the search process. Coevolution evolves the set of tests used for evaluation, but has so far often led to inaccurate evaluation.
We show that, for any set of learners, one can determine a Complete Evaluation Set that provides ideal evaluation as specified by Evolutionary Multi-Objective Optimization. This provides a principled approach to evaluation in coevolution and thereby brings automatic ideal evaluation within reach. The Complete Evaluation Set is of manageable size, and progress towards it can be accurately measured. Based on this observation, we develop an algorithm named DELPHI. The algorithm is tested on problems likely to permit progress on only a subset of the underlying objectives. Whereas all comparison methods result in overspecialization, the proposed method and a variant of it achieve sustained progress in all underlying objectives. These findings demonstrate that ideal evaluation may be approximated by practical algorithms, and that accurate evaluation for test-based problems is possible even when the underlying objectives of a problem are unknown.
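The abstract does not spell out how a Complete Evaluation Set is constructed. The following is a minimal sketch, not DELPHI itself. It makes three assumptions: outcomes are binary pass/fail, each test is treated as a separate objective for Pareto dominance, and "complete" means that every distinction the full test set draws between two learners (some test passed by one learner and failed by the other) is retained. Under that reading, keeping one distinguishing test per ordered pair of learners preserves every dominance relation and bounds the set at n(n-1) tests for n learners. The function names `dominates`, `distinctions`, and `complete_evaluation_set`, together with the greedy selection, are hypothetical illustrations.

```python
from itertools import permutations
from typing import List, Sequence, Set, Tuple

# outcomes[learner][test] is True when the learner passes (e.g. beats) the test.
Outcomes = Sequence[Sequence[bool]]


def dominates(outcomes: Outcomes, a: int, b: int, tests: Sequence[int]) -> bool:
    """Pareto dominance with each test as an objective: learner a is at least
    as good as b on every test and strictly better on at least one."""
    at_least = all(outcomes[a][t] >= outcomes[b][t] for t in tests)
    strictly = any(outcomes[a][t] > outcomes[b][t] for t in tests)
    return at_least and strictly


def distinctions(outcomes: Outcomes, tests: Sequence[int]) -> Set[Tuple[int, int]]:
    """Ordered pairs (a, b) for which some test in `tests` is passed by a and failed by b."""
    return {
        (a, b)
        for a, b in permutations(range(len(outcomes)), 2)
        if any(outcomes[a][t] and not outcomes[b][t] for t in tests)
    }


def complete_evaluation_set(outcomes: Outcomes) -> List[int]:
    """Hypothetical greedy construction: keep one test for every distinction the
    full test set makes. This preserves all Pareto dominance relations among the
    learners and uses at most n * (n - 1) tests. It is not guaranteed to be minimal."""
    all_tests = range(len(outcomes[0]))
    chosen: List[int] = []
    covered: Set[Tuple[int, int]] = set()
    for a, b in sorted(distinctions(outcomes, all_tests)):
        if (a, b) in covered:
            continue
        # First test that learner a passes and learner b fails.
        t = next(t for t in all_tests if outcomes[a][t] and not outcomes[b][t])
        chosen.append(t)
        covered |= distinctions(outcomes, [t])  # all pairs this test distinguishes
    return sorted(chosen)


# Illustrative data: 3 learners, 5 tests (test 3 is failed by all, test 4 passed by all).
example = [
    [True, True, False, False, True],    # learner 0
    [True, False, True, False, True],    # learner 1
    [False, False, False, False, True],  # learner 2
]
print(complete_evaluation_set(example))  # [1, 2]
```

In this toy case, tests 0, 3, and 4 are redundant. Test 3 and test 4 distinguish no learners, and test 0 only draws distinctions that tests 1 and 2 already make. Learners 0 and 1 remain mutually non-dominated, and both still dominate learner 2 when only tests 1 and 2 are used.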