Bernhard Schölkopf
1–7 of 7 results
Neural Computation (2014) 26 (7): 1484–1517.
Published: 01 July 2014
Abstract
Causal discovery via the asymmetry between cause and effect has proved to be a promising way to infer the causal direction from observations. The basic idea is to assume that the mechanism generating the cause distribution p(x) and the mechanism generating the conditional distribution p(y|x) correspond to two independent natural processes, so that p(x) and p(y|x) fulfill some sort of independence condition. In many situations, however, the independence condition does not hold for the anticausal direction: if we consider p(x, y) as generated via p(y)p(x|y), there are usually contrived mutual adjustments between p(y) and p(x|y). This kind of asymmetry can be exploited to identify the causal direction. Based on this postulate, in this letter, we define an uncorrelatedness criterion between p(x) and p(y|x) and, based on this uncorrelatedness, show an asymmetry between cause and effect in the sense that a certain complexity metric on p(x) and p(y|x) is smaller than the corresponding metric on p(y) and p(x|y). We propose a Hilbert space embedding-based method, EMD (an abbreviation for EMbeDding), to calculate the complexity metric and show that this method preserves the relative magnitude of the metric. Based on the complexity metric, we propose an efficient kernel-based algorithm for causal discovery. The contribution of this letter is threefold: the method allows a general transformation from the cause to the effect that incorporates the noise, it is applicable to both one-dimensional and high-dimensional data, and it can be used to infer the causal ordering for multiple variables. Extensive experiments on simulated and real-world data show the effectiveness of the proposed method.
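A minimal Python sketch of the direction-decision rule described above: score both candidate directions and prefer the one with the smaller score. The score used here is a crude stand-in (a biased HSIC estimate of the dependence between kernel-regression residuals and the putative cause), not the EMD complexity metric of the letter; the data and helper names are hypothetical.

import numpy as np
from sklearn.kernel_ridge import KernelRidge

def rbf_gram(v, gamma=1.0):
    # Gram matrix of a Gaussian kernel on a one-dimensional sample
    d = (v[:, None] - v[None, :]) ** 2
    return np.exp(-gamma * d)

def hsic(a, b):
    # biased HSIC estimate of the dependence between two 1-D samples
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(H @ rbf_gram(a) @ H @ rbf_gram(b)) / (n - 1) ** 2

def direction_score(cause, effect):
    # fit effect = f(cause) + residual; a small dependence between the
    # putative cause and the residual favors this direction
    fit = KernelRidge(kernel="rbf", alpha=0.1).fit(cause[:, None], effect)
    residual = effect - fit.predict(cause[:, None])
    return hsic(cause, residual)

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = np.tanh(x) + 0.1 * rng.normal(size=300)   # ground truth: x causes y
print("x -> y" if direction_score(x, y) < direction_score(y, x) else "y -> x")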
Neural Computation (2009) 21 (1): 272–300.
Published: 01 January 2009
Abstract
We shed light on the discrimination between patterns belonging to two different classes by casting this decoding problem into a generalized prototype framework. The discrimination process is then separated into two stages: a projection stage that reduces the dimensionality of the data by projecting it onto a line, and a threshold stage in which the distributions of the projected patterns of both classes are separated. For this, we extend the popular mean-of-class prototype classification using algorithms from machine learning that satisfy a set of invariance properties. We report a simple yet general approach to express different types of linear classification algorithms in an identical and easy-to-visualize formal framework using generalized prototypes, which express the normal vector and offset of the separating hyperplane. We investigate non-margin classifiers such as the classical prototype classifier, the Fisher classifier, and the relevance vector machine. We then study hard and soft margin classifiers such as the support vector machine and a boosted version of the prototype classifier. Subsequently, we relate mean-of-class prototype classification to other classification algorithms by showing that the prototype classifier is a limit of any soft margin classifier and that boosting a prototype classifier yields the support vector machine. While giving novel insights into classification per se by presenting a common and unified formalism, our generalized prototype framework also provides an efficient visualization and a principled comparison of machine learning classification algorithms.
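A minimal sketch, on toy synthetic data, of the mean-of-class prototype classifier expressed as a hyperplane defined by two prototypes, with a linear SVM on the same data for comparison; this is only an illustration, not the paper's full generalized-prototype framework.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+1.0, size=(100, 2))
X_neg = rng.normal(loc=-1.0, size=(100, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 100 + [-1] * 100)

# mean-of-class prototype classifier: the normal vector joins the two class
# means and the offset places the hyperplane at their midpoint
mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
w = mu_pos - mu_neg
b = -w @ (mu_pos + mu_neg) / 2
proto_pred = np.sign(X @ w + b)

# a soft margin classifier on the same data: here the hyperplane is built
# from the support vectors, i.e. points close to the class boundary
svm = SVC(kernel="linear").fit(X, y)
print("prototype accuracy:", (proto_pred == y).mean())
print("SVM accuracy:", svm.score(X, y))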
Neural Computation (2006) 18 (12): 3097–3118.
Published: 01 December 2006
Abstract
Volterra and Wiener series are perhaps the best-understood nonlinear system representations in signal processing. Although both approaches have enjoyed a certain popularity in the past, their application has been limited to rather low-dimensional and weakly nonlinear systems due to the exponential growth of the number of terms that have to be estimated. We show that Volterra and Wiener series can be represented implicitly as elements of a reproducing kernel Hilbert space by using polynomial kernels. The estimation complexity of the implicit representation is linear in the input dimensionality and independent of the degree of nonlinearity. Experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.
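A minimal sketch of the implicit-representation idea, assuming polynomial kernel ridge regression as the estimator and a toy quadratic system (not the paper's experiments): a degree-p polynomial kernel implicitly spans all Volterra monomials up to order p, so the exponentially many expansion coefficients are never enumerated.

import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                # 10-dimensional input vectors
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] ** 2    # a second-order (quadratic) system

# an inhomogeneous polynomial kernel of degree 2 implicitly represents all
# Volterra terms up to order 2; cost is linear in the input dimensionality
model = KernelRidge(kernel="poly", degree=2, coef0=1.0, alpha=1e-3).fit(X, y)
print("training MSE:", np.mean((model.predict(X) - y) ** 2))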
Neural Computation (2006) 18 (1): 143–165.
Published: 01 January 2006
Abstract
We attempt to shed light on the algorithms humans use to classify images of human faces according to their gender. For this, a novel methodology combining human psychophysics and machine learning is introduced. We proceed as follows. First, we apply principal component analysis (PCA) to the pixel information of the face stimuli. We then obtain a data set composed of these PCA eigenvectors combined with the subjects' gender estimates of the corresponding stimuli. Second, we model the gender classification process on this data set using a separating hyperplane (SH) between both classes. This SH is computed using algorithms from machine learning: the support vector machine (SVM), the relevance vector machine, the prototype classifier, and the K-means classifier. The classification behavior of humans and machines is then analyzed in three steps. First, the classification errors of humans and machines are compared for the various classifiers, and we also assess how well machines can recreate the subjects' internal decision boundary by studying the training errors of the machines. Second, we study the correlations between the rank-order of the subjects' responses to each stimulus—the gender estimate with its reaction time and confidence rating—and the rank-order of the distance of these stimuli to the SH. Finally, we attempt to compare the metric of the representations used by humans and machines for classification by relating the subjects' gender estimate of each stimulus to the distance of this stimulus from the SH. While we show that the classification error alone is not a sufficient selection criterion between the different algorithms humans might use to classify face stimuli, the distance of these stimuli to the SH is shown to capture essentials of the internal decision space of humans. Furthermore, algorithms such as the prototype classifier, which uses stimuli in the center of the classes, are shown to be less suited to modeling human classification behavior than algorithms such as the SVM, which is based on stimuli close to the boundary between the classes.
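A minimal sketch of the pipeline described above on synthetic stand-in data (random pixel vectors and labels; the face stimuli and the human responses are not reproduced): PCA on the pixels, a linear SVM hyperplane in the PCA space, and the signed distance of each stimulus to that hyperplane, which is the quantity related to the subjects' responses.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
pixels = rng.random(size=(200, 64 * 64))      # stand-in "face" images, one per row
gender = rng.integers(0, 2, size=200)         # stand-in gender estimates (0/1)

coords = PCA(n_components=20).fit_transform(pixels)   # step 1: PCA eigenvector space
svm = SVC(kernel="linear").fit(coords, gender)        # step 2: separating hyperplane (SH)
distance = svm.decision_function(coords)              # signed distance of each stimulus to the SH
print(distance[:5])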
Neural Computation (2001) 13 (7): 1443–1471.
Published: 01 July 2001
Abstract
Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
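A minimal sketch, assuming scikit-learn's OneClassSVM as an off-the-shelf implementation of this single-class support vector algorithm; the parameter nu plays the role of the a priori specified fraction of probability mass allowed to fall outside the estimated set S.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 2))                          # sample drawn from P
test = np.vstack([rng.normal(size=(10, 2)),                # points that should fall in S
                  rng.uniform(-6.0, 6.0, size=(10, 2))])   # points likely outside S

# nu upper-bounds the fraction of training points treated as lying outside S
f = OneClassSVM(kernel="rbf", nu=0.05).fit(train)
print(f.predict(test))                                     # +1: inside S, -1: outside S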
Neural Computation (2000) 12 (5): 1207–1245.
Published: 01 May 2000
Abstract
We propose a new class of support vector algorithms for regression and classification. In these algorithms, a parameter ν lets one effectively control the number of support vectors. While this can be useful in its own right, the parameterization has the additional benefit of enabling us to eliminate one of the other free parameters of the algorithm: the accuracy parameter ε in the regression case, and the regularization constant C in the classification case. We describe the algorithms, give some theoretical results concerning the meaning and the choice of ν, and report experimental results.
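A minimal sketch, assuming scikit-learn's NuSVC and NuSVR as implementations of the nu-parameterized algorithms on toy data: in classification nu takes the place of the regularization constant C, in regression it removes the need to choose the accuracy parameter epsilon by hand, and in both cases it controls the number of support vectors.

import numpy as np
from sklearn.svm import NuSVC, NuSVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
y_reg = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

clf = NuSVC(nu=0.2).fit(X, y_class)   # nu replaces the regularization constant C
reg = NuSVR(nu=0.2).fit(X, y_reg)     # nu makes the epsilon-tube width adaptive
print("support vectors (classification):", len(clf.support_))
print("support vectors (regression):", len(reg.support_))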
Neural Computation (1998) 10 (5): 1299–1319.
Published: 01 July 1998
Abstract
A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map—for instance, the space of all possible five-pixel products in 16 × 16 images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
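A minimal sketch, assuming scikit-learn's KernelPCA as an implementation of the method on stand-in image data: a homogeneous degree-5 polynomial kernel corresponds to the feature space of five-pixel products mentioned above, which is never constructed explicitly.

import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
images = rng.random(size=(300, 16 * 16))   # stand-in 16 x 16 images, one per row

# coef0=0 gives the homogeneous degree-5 polynomial kernel, i.e. the implicit
# feature space of all five-pixel products
kpca = KernelPCA(n_components=10, kernel="poly", degree=5, coef0=0.0)
components = kpca.fit_transform(images)    # nonlinear principal components
print(components.shape)                    # (300, 10)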