In order to interact intelligently with objects in the world, animals must first transform neural population responses into estimates of the dynamic, unknown stimuli that caused them. The Bayesian solution to this problem is known as a Bayes filter, which applies Bayes' rule to combine population responses with the predictions of an internal model. The internal model of the Bayes filter is based on the true stimulus dynamics, and in this note, we present a method for training a theoretical neural circuit to approximately implement a Bayes filter when the stimulus dynamics are unknown. To do this we use the inferential properties of linear probabilistic population codes to compute Bayes' rule and train a neural network to compute approximate predictions by the method of maximum likelihood. In particular, we perform stochastic gradient descent on the negative log-likelihood of the neural network parameters with a novel approximation of the gradient. We demonstrate our methods on a finite-state, a linear, and a nonlinear filtering problem and show how the hidden layer of the neural network develops tuning curves consistent with findings in experimental neuroscience.