We address unsupervised learning subject to structural constraints, with particular emphasis placed on clustering with an imposed decision tree structure. Most known methods are greedy, optimizing one node of the tree at a time to minimize a local cost. By constrast, we develop a joint optimization method, derived based on information-theoretic principles and closely related to known methods in statistical physics. The approach is inspired by the deterministic annealing algorithm for unstructured data clustering, which was based on maximum entropy inference. The new approach is founded on the principle of minimum cross-entropy, using informative priors to approximate the unstructured clustering solution while imposing the structural constraint. The resulting method incorporates supervised learning principles applied in an unsupervised problem setting. In our approach, the tree “grows” by a sequence of bifurcations that occur while optimizing an effective free energy cost at decreasing temperature scales. Thus, estimates of the tree size and structure are naturally obtained at each temperature in the process. Examples demonstrate considerable improvement over known methods.