The Gestalt principle of collinearity (and curvilinearity) is widely thought to be mediated by the long-range connection structure in primary visual cortex. We review the neurophysiological and psychophysical literature to argue that these connections develop from visual experience after birth and rely on coherent object motion. We then present a neural network model that learns these connections in an unsupervised Hebbian fashion from real camera sequences. The model uses spatiotemporal retinal filtering, which is highly sensitive to changes in the visual input. We show that successful learning depends crucially on using the correlation of the transient responses rather than that of the sustained ones. As a consequence, learning works best with video sequences of moving objects. The model addresses a special case of a fundamental question: what a priori knowledge must the brain be equipped with at birth for the self-organized process of structuring by experience to succeed?
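
To make the distinction between transient and sustained responses concrete, the following toy sketch may help. It is not the model presented here: the one-dimensional "video" of a single moving dot, the rectified frame difference used as a stand-in for transient retinal filtering, and all function names and the learning rate `eta` are assumptions introduced purely for illustration.

```python
import numpy as np

def moving_dot_sequence(n_frames=20, n_units=8):
    """Toy 1-D 'video' (assumed stimulus): one bright dot stepping one unit per frame."""
    frames = np.zeros((n_frames, n_units))
    for t in range(n_frames):
        frames[t, t % n_units] = 1.0
    return frames

def transient_response(frames):
    """Rectified temporal difference: a crude, assumed proxy for transient filtering."""
    return np.abs(np.diff(frames, axis=0))

def hebbian_lateral_weights(responses, eta=0.01):
    """Accumulate correlation-based (Hebbian) weights between all pairs of units."""
    n_units = responses.shape[1]
    w = np.zeros((n_units, n_units))
    for r in responses:
        w += eta * np.outer(r, r)  # co-active units strengthen their connection
    np.fill_diagonal(w, 0.0)       # ignore self-connections
    return w

frames = moving_dot_sequence()
w_sustained = hebbian_lateral_weights(frames)                      # raw frames
w_transient = hebbian_lateral_weights(transient_response(frames))  # change signal

print("sustained lateral weights nonzero:", bool(np.any(w_sustained)))
print("transient lateral weights nonzero:", bool(np.any(w_transient)))
```

In this toy setting the sustained signal activates only one unit per frame, so no lateral correlations accumulate, whereas the transient signal co-activates the positions the dot leaves and enters, linking neighbors along the motion path. Real image sequences and the spatiotemporal filtering used in the model are considerably richer than this caricature.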