Statistical mechanics is used to study generalization in a tree committee machine with K hidden units and continuous weights, trained on examples generated by a teacher of the same structure but corrupted by noise. The corruption is due to additive Gaussian noise applied at the input layer or at the hidden layer of the teacher. In the large-K limit the generalization error εg, as a function of α, the number of patterns per adjustable parameter, shows qualitatively similar behavior in the two cases: it does not approach its optimal value and is nonmonotonic if training is done at zero temperature. This remains true even when replica symmetry breaking is taken into account. Training at a fixed positive temperature leads, within the replica-symmetric theory, to an α^{-k} decay of εg toward its optimal value. The value of k is calculated and found to depend on the model of noise. By scaling the temperature with α, the exponent k can be increased to an optimal value k_opt. However, at one step of replica symmetry breaking εg already decays as α^{-k_opt} at a fixed positive temperature. Thus, although εg approaches its optimal value with increasing sample size for any fixed K, the convergence is uniform in K only when training is done at a positive temperature.
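The power-law approach εg = εg_opt + c·α^{-k} stated above can be illustrated numerically. The sketch below is not from the paper: the constants eps_opt, c, and k are hypothetical placeholders, chosen only to show how the exponent k can be read off from the residual error as α doubles.

```python
def eps_g(alpha, eps_opt=0.05, c=0.5, k=1.0):
    """Hypothetical power-law form eps_g = eps_opt + c * alpha**(-k).

    eps_opt, c, and k are illustrative values, not results of the paper.
    """
    return eps_opt + c * alpha ** (-k)

# The residual eps_g - eps_opt shrinks by a factor 2**k each time alpha
# doubles; this ratio is how the decay exponent k would be estimated
# from data on a log-log plot.
residuals = [eps_g(2 ** n) - 0.05 for n in range(1, 6)]
ratios = [residuals[i] / residuals[i + 1] for i in range(len(residuals) - 1)]
print(ratios)  # ratios cluster at 2**k, i.e. 2.0 for k = 1
```

For k = 1 each doubling of α halves the distance to the optimal error, whereas the nonmonotonic zero-temperature behavior described above has no such systematic approach.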