Although the use of ensemble methods in machine-learning is ubiquitous due to their proven ability to outperform their constituent algorithms, ensembles of optimisation algorithms have received relatively little attention. Existing approaches lag behind machine-learning in both theory and practice, with no principled design guidelines available. In this article, we address fundamental questions regarding ensemble composition in optimisation using the domain of bin-packing as an example. In particular, we investigate the trade-off between accuracy and diversity, and whether diversity metrics can be used as a proxy for constructing an ensemble, proposing a number of novel metrics for comparing algorithm diversity. We find that randomly composed ensembles can outperform ensembles of high-performing algorithms under certain conditions and that judicious choice of diversity metric is required to construct good ensembles. The method and findings can be generalised to any metaheuristic ensemble, and lead to better understanding of how to undertake principled ensemble design.