We propose a method for estimating probability density functions and conditional density functions by training on data produced by such distributions. The algorithm employs new stochastic variables that amount to coding of the input, using a principle of entropy maximization. It is shown to be closely related to the maximum likelihood approach. The encoding step of the algorithm provides an estimate of the probability distribution. The decoding step serves as a generative mode, producing an ensemble of data with the desired distribution. The algorithm is readily implemented by neural networks, using stochastic gradient ascent to achieve entropy maximization.