Motoaki Kawanabe
1-7 of 7
Neural Computation (2016) 28 (3): 445–484.
Published: 01 March 2016
Abstract
In many multivariate time series, the correlation structure is nonstationary, that is, it changes over time. The correlation structure may also change as a function of other cofactors, for example, the identity of the subject in biomedical data. A fundamental approach for the analysis of such data is to estimate the correlation structure (connectivities) separately in short time windows or for different subjects and use existing machine learning methods, such as principal component analysis (PCA), to summarize or visualize the changes in connectivity. However, the visualization of such a straightforward PCA is problematic because the ensuing connectivity patterns are much more complex objects than, say, spatial patterns. Here, we develop a new framework for analyzing variability in connectivities using the PCA approach as the starting point. First, we show how to analyze and visualize the principal components of connectivity matrices by a tailor-made rank-two matrix approximation in which we use the outer product of two orthogonal vectors. This leads to a new kind of transformation of eigenvectors that is particularly suited for this purpose and often enables interpretation of the principal component as connectivity between two groups of variables. Second, we show how to incorporate the orthogonality and the rank-two constraint in the estimation of PCA itself to improve the results. We further provide an interpretation of these methods in terms of estimation of a probabilistic generative model related to blind separation of dependent sources. Experiments on brain imaging data give very promising results.
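To make the pipeline described in the abstract concrete, here is a minimal Python sketch of the baseline approach: correlation matrices in short windows, PCA over their vectorized upper triangles, and a simple symmetric rank-two approximation of a principal component matrix built from its two extreme eigencomponents. The function names, the window scheme, and the eigenvector-based construction are illustrative assumptions; the paper's tailor-made transformation, which enforces orthogonality and interpretability, is not reproduced here.

```python
# Minimal sketch (not the authors' exact algorithm): PCA over windowed
# connectivity matrices, followed by a symmetric rank-two approximation
# M ~ u v^T + v u^T of a principal component matrix M. All names and the
# simple eigenvector-based construction are illustrative assumptions.
import numpy as np

def windowed_connectivity(X, win):
    """Correlation matrices of a (T, d) time series in non-overlapping windows."""
    T, _ = X.shape
    return np.array([np.corrcoef(X[s:s + win].T)
                     for s in range(0, T - win + 1, win)])

def connectivity_pca(C, n_components=1):
    """PCA on the vectorized upper triangles of the connectivity matrices."""
    _, d, _ = C.shape
    iu = np.triu_indices(d, k=1)
    V = C[:, iu[0], iu[1]]                   # (n_windows, d*(d-1)/2)
    V = V - V.mean(axis=0)
    _, _, Vt = np.linalg.svd(V, full_matrices=False)
    comps = []
    for k in range(n_components):
        M = np.zeros((d, d))
        M[iu] = Vt[k]
        comps.append(M + M.T)                # back to a symmetric matrix
    return comps

def rank_two_outer(M):
    """Approximate a symmetric M by u v^T + v u^T using its largest positive
    and most negative eigencomponents. This crude construction does not
    enforce exact orthogonality of u and v, unlike the paper's method."""
    w, E = np.linalg.eigh(M)
    a = np.sqrt(max(w[-1], 0.0)) * E[:, -1]  # leading positive direction
    b = np.sqrt(max(-w[0], 0.0)) * E[:, 0]   # leading negative direction
    u, v = (a + b) / np.sqrt(2), (a - b) / np.sqrt(2)
    return u, v                              # u v^T + v u^T == a a^T - b b^T
```

Visualizing u and v then suggests reading the component as connectivity between the two groups of variables they weight, which is the interpretation the abstract aims for.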
Neural Computation (2014) 26 (2): 349–376.
Published: 01 February 2014
Abstract
Electroencephalographic signals are known to be nonstationary and easily affected by artifacts; therefore, their analysis requires methods that can deal with noise. In this work, we present a way to robustify the popular common spatial patterns (CSP) algorithm under a maxmin approach. In contrast to standard CSP, which maximizes the variance ratio between two conditions based on a single estimate of the class covariance matrices, we propose to robustly compute spatial filters by maximizing the minimum variance ratio within a prefixed set of covariance matrices called the tolerance set. We show that this kind of maxmin optimization makes CSP robust to outliers and reduces its tendency to overfit. We also present a data-driven approach to construct a tolerance set that captures the variability of the covariance matrices over time and show its ability to reduce the nonstationarity of the extracted features and to significantly improve classification accuracy. We test the spatial filters derived with this approach and compare them to standard CSP and a state-of-the-art method on a real-world brain-computer interface (BCI) data set in which we expect substantial fluctuations caused by environmental differences. Finally, we investigate the advantages and limitations of the maxmin approach with simulations.
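As a reference point for the abstract, the following sketch shows standard CSP computed from two class covariance matrices and a naive worst-case (maxmin) scoring of candidate filters over a finite tolerance set. The brute-force selection is only an illustration of the maxmin idea, not the paper's optimization procedure; all function names are assumptions.

```python
# Minimal sketch (function names and the brute-force selection are
# illustrative assumptions, not the paper's optimization procedure):
# standard CSP from two class covariances, plus a naive worst-case
# (maxmin) scoring of candidate filters over a finite tolerance set.
import numpy as np
from scipy.linalg import eigh

def csp_filters(C1, C2):
    """Standard CSP: generalized eigenvectors of (C1, C1 + C2),
    sorted by decreasing variance ratio for class 1."""
    w, W = eigh(C1, C1 + C2)
    order = np.argsort(w)[::-1]
    return W[:, order], w[order]

def maxmin_score(w_vec, tolerance_set):
    """Worst-case variance ratio of a single filter over (C1, C2) pairs."""
    return min((w_vec @ C1 @ w_vec) / (w_vec @ (C1 + C2) @ w_vec)
               for C1, C2 in tolerance_set)

def pick_robust_filter(candidates, tolerance_set):
    """Among candidate filters (columns), keep the one maximizing the
    minimum variance ratio, a crude stand-in for maxmin CSP."""
    scores = [maxmin_score(candidates[:, i], tolerance_set)
              for i in range(candidates.shape[1])]
    return candidates[:, int(np.argmax(scores))]
```

A tolerance set could, for example, collect class covariances estimated on different recording blocks; the paper describes a data-driven construction of such a set.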
Neural Computation (2004) 16 (5): 1077–1104.
Published: 01 May 2004
Abstract
A well-known result by Stein (1956) shows that in particular situations, biased estimators can yield better parameter estimates than their generally preferred unbiased counterparts. This letter follows the same spirit: we stabilize unbiased generalization error estimates by regularization and thereby obtain more robust model selection criteria for learning. We trade a small bias against a larger variance reduction, which has the beneficial effect of being more precise on a single training set. We focus on the subspace information criterion (SIC), an unbiased estimator of the expected generalization error measured by the reproducing kernel Hilbert space norm. SIC can be applied to kernel regression, and earlier experiments showed that a small regularization of SIC has a stabilizing effect. However, it remained open how to appropriately determine the degree of regularization in SIC. In this article, we derive an unbiased estimator of the expected squared error between SIC and the expected generalization error, and we propose determining the degree of regularization of SIC such that this estimator is minimized. Computer simulations with artificial and real data sets illustrate that the proposed method effectively improves the precision of SIC, especially in high-noise cases. We furthermore compare the proposed method to the original SIC, cross-validation, and an empirical Bayesian method for ridge parameter selection, with good results.
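In symbols (notation mine, following the abstract), the proposed selection rule picks the degree of regularization that minimizes the estimated squared deviation of regularized SIC from the expected generalization error:

```latex
% G: expected generalization error; SIC_\lambda: SIC with regularization
% degree \lambda; the hat denotes the unbiased estimator derived in the
% letter. Notation is illustrative, not taken from the paper.
\hat{\lambda} \;=\; \arg\min_{\lambda}\;
  \widehat{\mathbb{E}\!\left[\bigl(\mathrm{SIC}_{\lambda} - G\bigr)^{2}\right]}
```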
Neural Computation (2004) 16 (1): 115–137.
Published: 01 January 2004
Abstract
This letter analyzes the Fisher kernel from a statistical point of view. The Fisher kernel is a particularly interesting method for constructing a model of the posterior probability that makes intelligent use of unlabeled data (i.e., of the underlying data density). It is important to analyze and ultimately understand the statistical properties of the Fisher kernel. To this end, we first establish sufficient conditions under which the constructed posterior model is realizable (i.e., it contains the true distribution). Realizability immediately leads to consistency results. Subsequently, we focus on an asymptotic analysis of the generalization error, which elucidates the learning curves of the Fisher kernel and how unlabeled data contribute to learning. We also point out that the squared and log losses are theoretically preferable to other losses, such as the exponential loss, because both yield consistent estimators when a linear classifier is used together with the Fisher kernel. This letter therefore underlines that the Fisher kernel should be viewed not as a heuristic but as a powerful statistical tool with well-controlled statistical properties.
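For reference, the Fisher kernel analyzed here is usually written as follows (the standard definition of Jaakkola and Haussler, 1999; in practice the Fisher information matrix is often replaced by the identity):

```latex
% \hat\theta: parameter estimate of the generative model, typically fitted
% on unlabeled data; I(\hat\theta): Fisher information matrix.
K(x, x') \;=\;
  \nabla_{\theta}\log p(x \mid \hat{\theta})^{\top}\,
  I(\hat{\theta})^{-1}\,
  \nabla_{\theta}\log p(x' \mid \hat{\theta})
```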
Neural Computation (2003) 15 (5): 1089–1124.
Published: 01 May 2003
Abstract
We propose kTDSEP, a kernel-based algorithm for nonlinear blind source separation (BSS). It combines two complementary research fields: kernel feature spaces and BSS using temporal information. This yields an efficient algorithm for nonlinear BSS with invertible nonlinearity. The key assumptions are that the kernel feature space is chosen rich enough to approximate the nonlinearity and that the signals of interest contain temporal information. Both assumptions are fulfilled in a wide range of real-world applications. The algorithm works as follows. First, the data are (implicitly) mapped to a high- (possibly infinite-)dimensional kernel feature space. In practice, however, the data form a smaller submanifold in feature space, whose dimension is even smaller than the number of training data points, a fact that has already been exploited by, for example, reduced set techniques for support vector machines. We propose to adapt to this effective dimension in a preprocessing step and to construct an orthonormal basis of this submanifold. This dimension-reduction step is essential for making the subsequent application of BSS methods computationally and numerically tractable. In the reduced space, we use a BSS algorithm based on second-order temporal decorrelation. Finally, we propose a selection procedure to automatically obtain the original sources from the extracted nonlinear components. Experiments demonstrate the excellent performance and efficiency of our kTDSEP algorithm for several nonlinear BSS problems, including cases with more than two sources.
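A minimal sketch of the two-stage idea described in the abstract, under simplifying assumptions: the orthonormal basis of the submanifold is built here with kernel PCA (the paper constructs it differently), and the second-order temporal decorrelation uses a single time lag (TDSEP jointly diagonalizes several lagged covariances). The automatic source-selection step is omitted, and all function names are mine.

```python
# Minimal sketch under simplifying assumptions (all names are mine):
# 1) build an orthonormal, low-dimensional coordinate system of the data's
#    submanifold in kernel feature space (here via kernel PCA; the paper
#    constructs the basis differently), and
# 2) run second-order temporal decorrelation in the reduced space
#    (here a single time lag; TDSEP jointly diagonalizes several lags).
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_reduced_features(X, n_dim=20, gamma=1.0):
    """Orthonormal coordinates of the samples in kernel feature space."""
    K = rbf_kernel(X, X, gamma)
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                              # centered kernel matrix
    w, V = np.linalg.eigh(Kc)
    idx = np.argsort(w)[::-1][:n_dim]
    return Kc @ V[:, idx] / np.sqrt(np.maximum(w[idx], 1e-12))

def second_order_decorrelation(Y, lag=1):
    """Whiten, then diagonalize one symmetrized lagged covariance (AMUSE-style)."""
    Yc = Y - Y.mean(axis=0)
    C0 = Yc.T @ Yc / len(Yc)
    d, E = np.linalg.eigh(C0)
    W = E / np.sqrt(np.maximum(d, 1e-12))       # whitening transform
    Z = Yc @ W
    Ctau = Z[:-lag].T @ Z[lag:] / (len(Z) - lag)
    Ctau = (Ctau + Ctau.T) / 2
    _, R = np.linalg.eigh(Ctau)
    return Z @ R                                # estimated source components
```

The extracted components still have to be matched to the underlying sources; the paper proposes an automatic selection procedure for that last step, which is not sketched above.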
Neural Computation (2002) 14 (10): 2397–2414.
Published: 01 October 2002
Abstract
Recently, Jaakkola and Haussler (1999) proposed a method for constructing kernel functions from probabilistic models. Their so-called Fisher kernel has been combined with discriminative classifiers such as support vector machines and applied successfully in, for example, DNA and protein analysis. Whereas the Fisher kernel is calculated from the marginal log-likelihood, we propose the TOP kernel derived from tangent vectors of posterior log-odds. Furthermore, we develop a theoretical framework on feature extractors from probabilistic models and use it for analyzing the TOP kernel. In experiments, our new discriminative TOP kernel compares favorably to the Fisher kernel.
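A hedged reconstruction of the feature map behind the TOP kernel, based only on the abstract's description (tangent vectors of the posterior log-odds); the notation is mine and details may differ from the paper:

```latex
% v: posterior log-odds of the class label under the probabilistic model;
% f_\theta: TOP feature map; notation is illustrative, reconstructed from
% the abstract rather than copied from the paper.
v(x, \theta) = \log P(y = +1 \mid x, \theta) - \log P(y = -1 \mid x, \theta),
\qquad
f_{\theta}(x) = \bigl(v(x,\theta),\,
  \partial_{\theta_1} v(x,\theta), \dots, \partial_{\theta_p} v(x,\theta)\bigr)^{\top},
\qquad
K_{\mathrm{TOP}}(x, x') = f_{\theta}(x)^{\top} f_{\theta}(x')
```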
Neural Computation (1994) 6 (6): 1244–1261.
Published: 01 November 1994
Abstract
It was reported (Kabashima and Shinomoto 1992) that estimators of a binary decision boundary show asymptotically strange behaviors when the probability model is ill-posed or semiparametric. We give a rigorous analysis of this phenomenon in a stochastic perceptron by using the estimating function method. A stochastic perceptron consists of a neuron that is excited depending on the weighted sum of its inputs, but whose probability distribution is of unknown form here. It is shown that there exists no √n-consistent estimator of the threshold value h, that is, no estimator ĥ that converges to h at the rate of 1/√n as the number n of observations increases. Therefore, the accuracy of estimation is much worse in this semiparametric case with an unspecified probability function than in the ordinary case. On the other hand, it is shown that there is a √n-consistent estimator ŵ of the synaptic weight vector. These results elucidate the strange behaviors of learning curves in a semiparametric statistical model.
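For readers unfamiliar with the terminology, √n-consistency as used in this abstract has its standard meaning:

```latex
% Standard definition: an estimator \hat{h}_n of h is \sqrt{n}-consistent
% if the rescaled estimation error remains bounded in probability as the
% sample size n grows.
\sqrt{n}\,\bigl(\hat{h}_n - h\bigr) = O_{p}(1) \qquad (n \to \infty)
```

The abstract's negative result states that no estimator of the threshold h attains this rate in the semiparametric setting, whereas the weight vector estimator ŵ does.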