Publications by Klaus-Robert Müller in Neural Computation: 1–8 of 8
Neural Computation (2014) 26 (2): 349–376.
Published: 01 February 2014
Abstract
Electroencephalographic signals are known to be nonstationary and easily affected by artifacts; therefore, their analysis requires methods that can deal with noise. In this work, we present a way to robustify the popular common spatial patterns (CSP) algorithm under a maxmin approach. In contrast to standard CSP, which maximizes the variance ratio between two conditions based on a single estimate of the class covariance matrices, we propose to compute spatial filters robustly by maximizing the minimum variance ratio within a prespecified set of covariance matrices called the tolerance set. We show that this kind of maxmin optimization makes CSP robust to outliers and reduces its tendency to overfit. We also present a data-driven approach to constructing a tolerance set that captures the variability of the covariance matrices over time, and we show its ability to reduce the nonstationarity of the extracted features and significantly improve classification accuracy. We test the spatial filters derived with this approach and compare them to standard CSP and a state-of-the-art method on a real-world brain-computer interface (BCI) data set in which we expect substantial fluctuations caused by environmental differences. Finally, we investigate the advantages and limitations of the maxmin approach with simulations.
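For orientation, the sketch below spells out the objective the abstract refers to: standard CSP solves a generalized eigenproblem on a single pair of class covariance estimates, while a maxmin variant scores each filter by its worst-case variance ratio over a tolerance set. This is a minimal illustration under assumed names (csp_filters, worst_case_ratio), not the paper's actual optimization procedure.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(C1, C2):
    # Standard CSP: solve C1 w = lam (C1 + C2) w; eigenvectors with the
    # largest eigenvalues maximize var(class 1) / var(class 2).
    evals, W = eigh(C1, C1 + C2)
    return W[:, np.argsort(evals)[::-1]]

def worst_case_ratio(w, tolerance_set):
    # The quantity a maxmin approach maximizes: the minimum variance ratio
    # of filter w over a set of (C1, C2) covariance estimates, e.g. computed
    # from different time segments of the recording.
    return min((w @ C1 @ w) / (w @ C2 @ w) for C1, C2 in tolerance_set)
```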
Neural Computation (2011) 23 (3): 791–816.
Published: 01 March 2011
Abstract
Brain-computer interfaces (BCIs) allow users to control a computer application by brain activity alone, as acquired, for example, by EEG. In our classic machine learning approach to BCIs, the participants undertake a calibration measurement without feedback to acquire data to train the BCI system. After the training, the user can control the BCI and improve the operation through some type of feedback. However, not all BCI users are able to perform sufficiently well during feedback operation. In fact, a nonnegligible portion of participants (an estimated 15%–30%) cannot control the system (the BCI illiteracy problem, generic to all motor-imagery-based BCIs). We hypothesize that one main difficulty for a BCI user is the transition from offline calibration to online feedback. In this work, we investigate adaptive machine learning methods to eliminate offline calibration and analyze the performance of 11 volunteers in a BCI based on the modulation of sensorimotor rhythms. We present an adaptation scheme that guides each user individually. It starts with a subject-independent classifier that evolves to a subject-optimized state-of-the-art classifier within one session while the user interacts continuously. These initial runs use supervised techniques for robust coadaptive learning of user and machine. Subsequent runs use unsupervised adaptation to track the drift of the features during the session and provide an unbiased measure of BCI performance. Using this approach, without any offline calibration, six users, including one novice, obtained good performance after 3 to 6 minutes of adaptation. More importantly, this novel guided learning also allows participants suffering from BCI illiteracy to gain significant control of the BCI in less than 60 minutes. In addition, one volunteer who showed no sensorimotor idle rhythm peak at the beginning of the BCI experiment developed one during the course of the session and used voluntary modulation of its amplitude to control the feedback application.
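One concrete (and deliberately simple) instance of the unsupervised adaptation mentioned above is to keep a fixed linear classifier but let its bias follow the running mean of the projected features, so that a global drift of the feature distribution does not skew the decisions. The class name, learning rate, and update rule below are assumptions for this sketch, not the authors' exact scheme.

```python
import numpy as np

class AdaptiveLinearClassifier:
    """Fixed weight vector w; the bias b tracks the unsupervised running
    mean of w @ x to compensate for slow drifts of the features."""

    def __init__(self, w, b, eta=0.05):
        self.w, self.b, self.eta = np.asarray(w, float), float(b), eta

    def predict(self, x):
        return np.sign(self.w @ x - self.b)

    def adapt_unsupervised(self, x):
        # Exponential moving average of the projection; no labels needed.
        self.b = (1.0 - self.eta) * self.b + self.eta * (self.w @ x)
```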
Neural Computation (2004) 16 (5): 1077–1104.
Published: 01 May 2004
Abstract
A well-known result by Stein (1956) shows that in particular situations, biased estimators can yield better parameter estimates than their generally preferred unbiased counterparts. This letter follows the same spirit: we stabilize unbiased generalization error estimates by regularization and thereby obtain more robust model selection criteria for learning. We trade a small bias against a larger variance reduction, which has the beneficial effect of being more precise on a single training set. We focus on the subspace information criterion (SIC), an unbiased estimator of the expected generalization error measured by the reproducing kernel Hilbert space norm. SIC can be applied to kernel regression, and earlier experiments showed that a small regularization of SIC has a stabilizing effect. However, it remained open how to determine the degree of regularization in SIC appropriately. In this article, we derive an unbiased estimator of the expected squared error between SIC and the expected generalization error, and we propose determining the degree of regularization of SIC such that this estimator is minimized. Computer simulations with artificial and real data sets illustrate that the proposed method effectively improves the precision of SIC, especially in high-noise cases. We furthermore compare the proposed method to the original SIC, cross-validation, and an empirical Bayesian method in ridge parameter selection, with good results.
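The rationale for trading bias against variance can be stated in one line; the notation below (G for the expected generalization error, \widehat{G}_{\lambda} for the regularized SIC value, \widehat{M} for the paper's estimator of the squared error) is ours, added for illustration:

```latex
% MSE of the regularized criterion splits into squared bias plus variance;
% a small bias pays off whenever it buys a larger variance reduction.
\mathbb{E}\bigl[(\widehat{G}_{\lambda} - G)^{2}\bigr]
  = \bigl(\mathbb{E}[\widehat{G}_{\lambda}] - G\bigr)^{2}
  + \operatorname{Var}\bigl(\widehat{G}_{\lambda}\bigr),
\qquad
\lambda^{\star} = \arg\min_{\lambda} \widehat{M}(\lambda).
```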
Neural Computation (2004) 16 (1): 115–137.
Published: 01 January 2004
Abstract
This letter analyzes the Fisher kernel from a statistical point of view. The Fisher kernel is a particularly interesting method for constructing a model of the posterior probability that makes intelligent use of unlabeled data (i.e., of the underlying data density). It is important to analyze and ultimately understand the statistical properties of the Fisher kernel. To this end, we first establish sufficient conditions under which the constructed posterior model is realizable (i.e., it contains the true distribution). Realizability immediately leads to consistency results. Subsequently, we focus on an asymptotic analysis of the generalization error, which elucidates the learning curves of the Fisher kernel and how unlabeled data contribute to learning. We also point out that when a linear classifier is used together with the Fisher kernel, the squared and log losses are theoretically preferable to other losses such as the exponential loss, because both yield consistent estimators. Therefore, this letter underlines that the Fisher kernel should be viewed not as a heuristic but as a powerful statistical tool with well-controlled statistical properties.
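For reference, the kernel analyzed in this letter is the standard construction of Jaakkola and Haussler: the score function of a generative model p(x | theta), whitened by the Fisher information (this is the usual textbook notation, not quoted from the letter).

```latex
% Fisher score, Fisher information, and the resulting Fisher kernel
s_{\theta}(x) = \nabla_{\theta} \log p(x \mid \theta), \qquad
I(\theta) = \mathbb{E}_{p(x \mid \theta)}\!\bigl[ s_{\theta}(x)\, s_{\theta}(x)^{\top} \bigr], \qquad
K(x, x') = s_{\theta}(x)^{\top}\, I(\theta)^{-1}\, s_{\theta}(x').
```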
Neural Computation (2003) 15 (5): 1089–1124.
Published: 01 May 2003
Abstract
We propose kTDSEP, a kernel-based algorithm for nonlinear blind source separation (BSS). It combines complementary research fields: kernel feature spaces and BSS using temporal information. This yields an efficient algorithm for nonlinear BSS with invertible nonlinearity. Key assumptions are that the kernel feature space is chosen rich enough to approximate the nonlinearity and that the signals of interest contain temporal information. Both assumptions are fulfilled for a wide range of real-world applications. The algorithm works as follows: First, the data are (implicitly) mapped to a high- (possibly infinite-)dimensional kernel feature space. In practice, however, the data form a submanifold of much smaller dimension in feature space (smaller even than the number of training data points), a fact that has already been exploited by, for example, reduced set techniques for support vector machines. We propose to adapt to this effective dimension as a preprocessing step and to construct an orthonormal basis of this submanifold. This dimension-reduction step is essential for making the subsequent application of BSS methods computationally and numerically tractable. In the reduced space, we use a BSS algorithm based on second-order temporal decorrelation. Finally, we propose a selection procedure to automatically obtain the original sources from the extracted nonlinear components. Experiments demonstrate the excellent performance and efficiency of our kTDSEP algorithm for several problems of nonlinear BSS and for more than two sources.
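A drastically reduced sketch of the pipeline may help: kernel PCA supplies an orthonormal basis of the effective subspace, and a second-order method then decorrelates one time-lagged covariance (AMUSE-style) in that reduced space. The paper's algorithm jointly diagonalizes several lags and adds an automatic source-selection step; the function name and parameters here are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def ktdsep_sketch(x, kernel, d, tau=1):
    """x: (channels, T) mixed signals; kernel: a positive-definite k(a, b);
    d: assumed effective dimension; tau: the single time lag used here."""
    T = x.shape[1]
    K = np.array([[kernel(x[:, i], x[:, j]) for j in range(T)] for i in range(T)])
    J = np.eye(T) - np.ones((T, T)) / T
    K = J @ K @ J                                # center in feature space
    evals, A = eigh(K)                           # kernel PCA basis
    A = A[:, -d:] / np.sqrt(np.maximum(evals[-d:], 1e-12))
    Y = (K @ A).T                                # data in the reduced basis, (d, T)
    ew, E = eigh(Y @ Y.T / T)                    # whiten the reduced signals
    Z = (E / np.sqrt(np.maximum(ew, 1e-12))).T @ Y
    Ct = Z[:, :-tau] @ Z[:, tau:].T / (T - tau)  # one time-lagged covariance
    _, U = eigh((Ct + Ct.T) / 2)                 # rotate to decorrelate it
    return U.T @ Z                               # candidate nonlinear components
```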
Neural Computation (2002) 14 (10): 2397–2414.
Published: 01 October 2002
Abstract
Recently, Jaakkola and Haussler (1999) proposed a method for constructing kernel functions from probabilistic models. Their so-called Fisher kernel has been combined with discriminative classifiers such as support vector machines and applied successfully in, for example, DNA and protein analysis. Whereas the Fisher kernel is calculated from the marginal log-likelihood, we propose the TOP kernel derived from tangent vectors of posterior log-odds. Furthermore, we develop a theoretical framework on feature extractors from probabilistic models and use it for analyzing the TOP kernel. In experiments, our new discriminative TOP kernel compares favorably to the Fisher kernel.
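Compactly, and in our own notation rather than the paper's: with posterior log-odds v and model parameters theta in R^p, the TOP feature map stacks v together with its tangent vectors, and the kernel is the plain inner product of these features.

```latex
v(x, \theta) = \log P(y = +1 \mid x, \theta) - \log P(y = -1 \mid x, \theta), \qquad
f_{\theta}(x) = \bigl( v(x, \theta),\ \partial_{\theta_{1}} v(x, \theta),\ \ldots,\
                \partial_{\theta_{p}} v(x, \theta) \bigr)^{\top}, \qquad
K(x, x') = f_{\theta}(x)^{\top} f_{\theta}(x').
```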
Neural Computation (1998) 10 (5): 1299–1319.
Published: 01 July 1998
Abstract
A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map—for instance, the space of all possible five-pixel products in 16 × 16 images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
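A minimal sketch of the procedure, assuming a generic kernel function supplied by the caller: build the kernel matrix, center it in feature space, eigendecompose, and project. The degree-5 polynomial kernel at the end corresponds to the five-pixel-product feature space mentioned in the abstract.

```python
import numpy as np

def kernel_pca(X, kernel, n_components):
    # X: (n_samples, n_features). Kernel matrix over all pairs of samples.
    n = X.shape[0]
    K = np.array([[kernel(a, b) for b in X] for a in X])
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                               # center in feature space
    evals, evecs = np.linalg.eigh(Kc)            # ascending eigenvalues
    idx = np.argsort(evals)[::-1][:n_components]
    # Normalize so each feature-space eigenvector has unit length.
    alphas = evecs[:, idx] / np.sqrt(np.maximum(evals[idx], 1e-12))
    return Kc @ alphas                           # component scores, (n, n_components)

# Degree-5 polynomial kernel: implicit features are all five-pixel products.
poly5 = lambda a, b: (a @ b) ** 5
```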
Neural Computation (1996) 8 (2): 340–356.
Published: 15 February 1996
Abstract
We present a method for the unsupervised segmentation of data streams originating from different unknown sources that alternate in time. We use an architecture consisting of competing neural networks. Memory is included to resolve ambiguities of input-output relations. To obtain maximal specialization, the competition is adiabatically increased during training. Our method achieves almost perfect identification and segmentation in the case of switching chaotic dynamics where input manifolds overlap and input-output relations are ambiguous. Only a small dataset is needed for the training procedure. Applications to time series from complex systems demonstrate the potential relevance of our approach for time series analysis and short-term prediction.
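A toy rendition of the annealed competition may make the mechanism concrete: linear predictors compete for every sample through a softmax over their prediction errors, and the inverse temperature beta grows slowly so the soft assignment hardens into a segmentation. All hyperparameters and the linear form of the experts are illustrative choices, not the paper's setup.

```python
import numpy as np

def competing_predictors(x_in, x_out, n_experts=2, epochs=200, lr=0.5):
    """x_in: (N, d) inputs; x_out: (N,) targets from alternating sources."""
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((n_experts, x_in.shape[1]))
    for epoch in range(epochs):
        beta = 0.1 * (epoch + 1)                     # adiabatically increased
        err = x_out[None, :] - W @ x_in.T            # (experts, N) residuals
        e2 = err ** 2
        e2 -= e2.min(axis=0, keepdims=True)          # stabilize the softmax
        resp = np.exp(-beta * e2)
        resp /= resp.sum(axis=0, keepdims=True)      # soft sample assignments
        for k in range(n_experts):                   # responsibility-weighted fit
            W[k] += lr * (resp[k] * err[k]) @ x_in / len(x_out)
    return W, resp.argmax(axis=0)                    # experts and segmentation
```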