Supervised learning of perceptron networks is investigated as an optimization problem. It is shown that both the theoretical and the empirical error functionals achieve minima over sets of functions computable by networks with a given number n of perceptrons. Upper bounds on rates of convergence of these minima with n increasing are derived. The bounds depend on a certain regularity of training data expressed in terms of variational norms of functions interpolating the data (in the case of the empirical error) and the regression function (in the case of the expected error). Dependence of this type of regularity on dimensionality and on magnitudes of partial derivatives is investigated. Conditions on the data, which guarantee that a good approximation of global minima of error functionals can be achieved using networks with a limited complexity, are derived. The conditions are in terms of oscillatory behavior of the data measured by the product of a function of the number of variables d, which is decreasing exponentially fast, and the maximum of the magnitudes of the squares of the L1-norms of the iterated partial derivatives of the order d of the regression function or some function, which interpolates the sample of the data. The results are illustrated by examples of data with small and high regularity constructed using Boolean functions and the gaussian function.