We propose an approach for learning latent directed polytrees, provided that an appropriately defined discrepancy measure exists between the observed nodes. Specifically, we apply our approach to learning directed information polytrees when samples are available from only a subset of the processes. Directed information trees are a new type of probabilistic graphical model that represents the causal dynamics among a set of random processes in a stochastic system. We prove that the approach is consistent for learning minimal latent directed trees. We also analyze the sample complexity of the learning task when the empirical estimator of mutual information is used as the discrepancy measure.
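To make the phrase "empirical estimator of mutual information" concrete, the sketch below computes a plug-in (frequency-count) estimate of I(X;Y) in nats from paired discrete samples. This is an illustration under stated assumptions: discrete-valued samples, natural logarithms, and a plain plug-in estimator. It is not necessarily the exact estimator or discrepancy measure analyzed here, and the function name `empirical_mutual_information` is hypothetical.

```python
import math
from collections import Counter


def empirical_mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples.

    Hypothetical helper for illustration: empirical frequencies replace
    the true probabilities in I(X;Y) = sum p(x,y) log p(x,y)/(p(x)p(y)).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need equal-length, non-empty sample lists")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each observed (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # (c/n) / ((px/n)(py/n)) simplifies to c*n / (px*py)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi
```

For the perfectly dependent binary samples `[0, 0, 1, 1]` and `[0, 0, 1, 1]` this returns log 2 ≈ 0.693 nats, and for the independent pattern `[0, 0, 1, 1]` and `[0, 1, 0, 1]` it returns 0. Plug-in estimates like this are biased upward for small sample sizes, which is one reason the sample complexity of the learning task matters.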