We examine the learnability of emergent flocking behavior in boid simulations. To this end, we present (1) a detailed definition of the boid model, (2) a formulation that allows model instances to be simulated efficiently, (3) metrics for training surrogate models, and (4) an evaluation of early training results. For this proof of concept, we focus on simple architectures such as multi-layer perceptrons and graph neural networks. Across scenarios with varying interaction patterns, these models perform comparably to simulations whose boid states carry an absolute error of 5%, and they even outperform these erroneous simulations when predicting the flocks that form. By splitting the prediction task into a boid adjacency detection task and a rule-application task, we observe that incorrectly detected interactions between boids have only a minor impact on the prediction results. Beyond evaluating more complex models, we suggest that future work focus either on detecting stable emergent states so that they can be predicted separately, or on understanding the dynamic transitions of groups that exhibit emergent behavior.
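To make the split between adjacency detection and rule application concrete, the following is a minimal sketch. It assumes the classic Reynolds rules (separation, alignment, cohesion) with a fixed perception radius in 2D. The model defined in this work may use different rules, weights, or neighbourhood criteria, and the function names (`detect_adjacency`, `apply_rules`, `step`) and all parameter values are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def detect_adjacency(positions: List[Vec], radius: float) -> List[List[int]]:
    """Stage 1 (hypothetical): for each boid, list the indices of boids within `radius`."""
    n = len(positions)
    neighbours: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(positions[i], positions[j]) < radius:
                neighbours[i].append(j)
    return neighbours


def apply_rules(positions: List[Vec], velocities: List[Vec], neighbours: List[List[int]],
                w_sep: float = 1.0, w_ali: float = 0.5, w_coh: float = 0.1,
                dt: float = 1.0) -> Tuple[List[Vec], List[Vec]]:
    """Stage 2 (hypothetical): apply Reynolds-style rules given a fixed adjacency."""
    new_pos: List[Vec] = []
    new_vel: List[Vec] = []
    for i, (p, v) in enumerate(zip(positions, velocities)):
        ax = ay = 0.0
        nbrs = neighbours[i]
        if nbrs:
            k = len(nbrs)
            # Cohesion: steer toward the neighbours' centre of mass.
            cx = sum(positions[j][0] for j in nbrs) / k
            cy = sum(positions[j][1] for j in nbrs) / k
            ax += w_coh * (cx - p[0])
            ay += w_coh * (cy - p[1])
            # Alignment: steer toward the neighbours' mean velocity.
            mvx = sum(velocities[j][0] for j in nbrs) / k
            mvy = sum(velocities[j][1] for j in nbrs) / k
            ax += w_ali * (mvx - v[0])
            ay += w_ali * (mvy - v[1])
            # Separation: push away from each neighbour with magnitude 1/distance.
            for j in nbrs:
                dx = p[0] - positions[j][0]
                dy = p[1] - positions[j][1]
                d2 = dx * dx + dy * dy
                if d2 > 0.0:
                    ax += w_sep * dx / d2
                    ay += w_sep * dy / d2
        # Explicit Euler integration of velocity, then position.
        vx, vy = v[0] + ax * dt, v[1] + ay * dt
        new_vel.append((vx, vy))
        new_pos.append((p[0] + vx * dt, p[1] + vy * dt))
    return new_pos, new_vel


def step(positions: List[Vec], velocities: List[Vec],
         radius: float = 2.0) -> Tuple[List[Vec], List[Vec]]:
    """One simulation step: detect adjacency, then apply the rules."""
    neighbours = detect_adjacency(positions, radius)
    return apply_rules(positions, velocities, neighbours)
```

Because `apply_rules` receives the neighbour lists as an explicit argument, a surrogate for each stage can be trained or evaluated in isolation; for example, feeding `apply_rules` an adjacency with edges deliberately added or dropped is one way to probe how wrong interactions propagate into the predicted boid states.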