This paper considers a phenomenon in Estimation of Distribution Algorithms (EDA) analogous to drift in population genetic dynamics. Finite population sampling in selection results in fluctuations which get reinforced when the probability model is updated. As a consequence, any probability model which can generate only a single set of values with probability 1 can be an attractive fixed point of the algorithm. To avoid this, parameters of the algorithm must scale with the system size in strongly problem-dependent ways, or the algorithm must be modified. This phenomenon is shown to hold for general EDAs as a consequence of the lack of ergodicity and irreducibility of the Markov chain on the state of probability models. It is illustrated in the case of UMDA, in which it is shown that the global optimum is only found if the population size is sufficiently large. For the needle-in-a haystack problem, the population size must scale as the square-root of the size of the search space. For the one-max problem, the population size must scale as the square-root of the problem size.