Research on multi-objective evolutionary algorithms (MOEAs) has produced over the past decades a large number of algorithms and a rich literature on performance assessment tools to evaluate and compare them. Yet, newly proposed MOEAs are typically compared against very few, often a decade older MOEAs. One reason for this apparent contradiction is the lack of a common baseline for comparison, with each subsequent study often devising its own experimental scenario, slightly different from other studies. As a result, the state of the art in MOEAs is a disputed topic. This article reports a systematic, comprehensive evaluation of a large number of MOEAs that covers a wide range of experimental scenarios. A novelty of this study is the separation between the higher-level algorithmic components related to multi-objective optimization (MO), which characterize each particular MOEA, and the underlying parameters—such as evolutionary operators, population size, etc.—whose configuration may be tuned for each scenario. Instead of relying on a common or “default” parameter configuration that may be low-performing for particular MOEAs or scenarios and unintentionally biased, we tune the parameters of each MOEA for each scenario using automatic algorithm configuration methods. Our results confirm some of the assumed knowledge in the field, while at the same time they provide new insights on the relative performance of MOEAs for many-objective problems. For example, under certain conditions, indicator-based MOEAs are more competitive for such problems than previously assumed. We also analyze problem-specific features affecting performance, the agreement between performance metrics, and the improvement of tuned configurations over the default configurations used in the literature. Finally, the data produced is made publicly available to motivate further analysis and a baseline for future comparisons.