Incrementally constructed cascade architectures are a promising alternative to networks of predefined size. This paper compares the direct cascade architecture (DCA) proposed in Littmann and Ritter (1992) with the cascade-correlation approach of Fahlman and Lebiere (1990) and with related approaches, and discusses their properties on the basis of various benchmark results. One important virtue of DCA is that it allows entire subnetworks to be cascaded, even if these do not admit error backpropagation. Exploiting this flexibility and using local linear map (LLM) networks as cascaded elements, we show that the resulting network cascades can greatly outperform a single network. Our results for the Mackey-Glass time series prediction task indicate that such deeply cascaded architectures generalize well even on small data sets, where shallow, broad architectures of comparable size suffer from overfitting. We conclude that the DCA approach offers a powerful and flexible alternative to existing schemes, such as the mixtures-of-experts approach, for constructing modular systems from a wide range of subnetwork types.
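The idea of cascading modules that need no error backpropagation can be illustrated with a minimal sketch. It rests on assumptions not stated above: each stage is assumed to receive the original input together with the previous stage's output, and to be fitted directly to the target. The stage model used here (a closed-form ridge fit on four hand-picked features) is a hypothetical stand-in chosen for brevity; it is not the LLM network used in the paper, and the toy target is not the Mackey-Glass series. All function names are illustrative.

```python
import math

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w

def features(x, prev):
    # Assumed stage input: the original input x plus the previous stage's output.
    return [1.0, x, prev, x * prev]

def fit_stage(xs, prevs, ys, lam=1e-6):
    """Fit one stage by closed-form ridge regression (no backpropagation)."""
    F = [features(x, p) for x, p in zip(xs, prevs)]
    d = len(F[0])
    A = [[sum(f[i] * f[j] for f in F) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(f[i] * y for f, y in zip(F, ys)) for i in range(d)]
    return solve(A, b)

def predict_stage(w, x, prev):
    return sum(wi * fi for wi, fi in zip(w, features(x, prev)))

def train_cascade(xs, ys, n_stages):
    """Add stages one at a time; earlier stages stay frozen once fitted."""
    stages, errors = [], []
    prevs = [0.0] * len(xs)  # the first stage has no predecessor output
    for _ in range(n_stages):
        w = fit_stage(xs, prevs, ys)
        prevs = [predict_stage(w, x, p) for x, p in zip(xs, prevs)]
        stages.append(w)
        errors.append(sum((p - y) ** 2 for p, y in zip(prevs, ys)) / len(ys))
    return stages, errors

def predict_cascade(stages, x):
    """The cascade output is the output of the last stage."""
    out = 0.0
    for w in stages:
        out = predict_stage(w, x, out)
    return out

# Toy regression target, for illustration only.
xs = [2.5 * i / 49 for i in range(50)]
ys = [math.sin(2.0 * x) for x in xs]
stages, errors = train_cascade(xs, ys, n_stages=4)
```

In this sketch `errors` records the training mean squared error after each added stage. Because every stage can reproduce its predecessor's output through the `prev` feature, the training error should not increase as the cascade deepens, up to the small ridge penalty. Generalization to unseen data is a separate question that the sketch does not address.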