Abstract
Real classification problems involve structured data that can be essentially grouped into a relatively small number of clusters. It is shown that, under a local clustering condition, a set of points of a given class, embedded in binary space by a set of randomly parameterized surfaces, is linearly separable from other classes, with arbitrarily high probability. We call such a data set a local relative cluster. The size of the embedding set is shown to be inversely proportional to the squared local clustering degree. A simple parameterization by embedding hyperplanes, implementing a voting system, results in a random reduction of the nearest-neighbor method and leads to the separation of multicluster data by a network with two internal layers. This represents a considerable reduction of the learning problem with respect to known techniques, resolving a long-standing question on the complexity of random embedding. Numerical tests show that the proposed method performs as well as state-of the-art methods and in a small fraction of the time.