Aapo Hyvärinen
1-15 of 15 journal articles
Neural Computation (2021) 33 (8): 2128–2162.
Published: 26 July 2021
Abstract
Summarizing large-scale directed graphs into small-scale representations is a useful but less-studied problem setting. Conventional clustering approaches, based on Min-Cut-style criteria, compress both the vertices and edges of the graph into the communities, which leads to a loss of directed-edge information. On the other hand, compressing the vertices while preserving the directed-edge information provides a way to learn the small-scale representation of a directed graph. The reconstruction error, which measures the edge information preserved by the summarized graph, can be used to learn such a representation. Compared to the original graphs, the summarized graphs are easier to analyze and are capable of extracting group-level features, useful for efficient interventions in population behavior. In this letter, we present a model, based on minimizing reconstruction error with nonnegativity constraints, which relates to a Max-Cut criterion that simultaneously identifies the compressed nodes and the directed compressed relations between these nodes. A multiplicative update algorithm with column-wise normalization is proposed. We further provide theoretical results on the identifiability of the model and the convergence of the proposed algorithms. Experiments are conducted to demonstrate the accuracy and robustness of the proposed method.
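A hedged sketch of this kind of summarization, assuming the model minimizes a reconstruction error of the form ||A - W S W^T||_F^2 with a nonnegative assignment matrix W and directed compressed relations S; the multiplicative and least-squares steps below are generic, not the exact rules derived in the letter:

    import numpy as np

    def summarize_digraph(A, k, n_iter=200, eps=1e-9):
        """Compress an n x n adjacency matrix A into k nodes (illustrative)."""
        n = A.shape[0]
        rng = np.random.default_rng(0)
        W = rng.random((n, k))          # nonnegative vertex-to-node assignments
        S = rng.random((k, k))          # directed relations between compressed nodes
        for _ in range(n_iter):
            WS, WSt = W @ S, W @ S.T
            # Multiplicative step from the gradient of ||A - W S W^T||^2; keeps W >= 0.
            num = A @ WSt + A.T @ WS
            den = WS @ (W.T @ WSt) + WSt @ (W.T @ WS) + eps
            W *= num / den
            # Column-wise normalization, as in the proposed algorithm.
            W /= np.linalg.norm(W, axis=0, keepdims=True) + eps
            # Least-squares refit of the compressed relations.
            P = np.linalg.pinv(W)
            S = P @ A @ P.T
        return W, S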
Neural Computation (2017) 29 (11): 2887–2924.
Published: 01 November 2017
Abstract
The statistical dependencies that independent component analysis (ICA) cannot remove often provide rich information beyond the linear independent components. It would thus be very useful to estimate the dependency structure from data. While such models have been proposed, they have usually concentrated on higher-order correlations such as energy (square) correlations. Yet linear correlations are a fundamental and informative form of dependency in many real data sets. Linear correlations are usually completely removed by ICA and related methods, so they can only be analyzed by developing new methods that explicitly allow for linearly correlated components. In this article, we propose a probabilistic model of linear nongaussian components that are allowed to have both linear and energy correlations. The precision matrix of the linear components is assumed to be randomly generated by a higher-order process and explicitly parameterized by a parameter matrix. The estimation of the parameter matrix is shown to be particularly simple because, using score matching (Hyvärinen, 2005), the objective function is a quadratic form. Using simulations with artificial data, we demonstrate that the proposed method improves the identifiability of nongaussian components by simultaneously learning their correlation structure. Applications on simulated complex cells with natural image input, as well as spectrograms of natural audio data, show that the method finds new kinds of dependencies between the components.
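As a minimal illustration, the two kinds of dependency contrasted here can be measured directly on estimated components (rows of s); plain ICA drives the first to zero, while the model above retains and parameterizes it:

    import numpy as np

    def linear_and_energy_correlations(s):
        """s: components x samples. Returns linear and energy correlation matrices."""
        lin = np.corrcoef(s)           # linear correlations between components
        energy = np.corrcoef(s ** 2)   # energy (square) correlations
        return lin, energy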
Neural Computation (2016) 28 (7): 1249–1264.
Published: 01 July 2016
Abstract
In visual modeling, invariance properties of visual cells are often explained by a pooling mechanism, in which outputs of neurons with similar selectivities to some stimulus parameters are integrated so as to gain some degree of invariance to other parameters. For example, the classical energy model of phase-invariant V1 complex cells pools model simple cells preferring similar orientation but different phases. Prior studies, such as independent subspace analysis, have shown that phase-invariance properties of V1 complex cells can be learned from spatial statistics of natural inputs. However, those previous approaches assumed a squaring nonlinearity on the neural outputs to capture energy correlation; such a nonlinearity is arguably unnatural from a neurobiological viewpoint but hard to change due to its tight integration into their formalisms. Moreover, they used somewhat complicated objective functions requiring expensive computations for optimization. In this study, we show that visual spatial pooling can be learned in a much simpler way using strong dimension reduction based on principal component analysis. This approach learns to ignore a large part of detailed spatial structure of the input and thereby estimates a linear pooling matrix. Using this framework, we demonstrate that pooling of model V1 simple cells learned in this way, even with nonlinearities other than squaring, can reproduce standard tuning properties of V1 complex cells. For further understanding, we analyze several variants of the pooling model and argue that a reasonable pooling can generally be obtained from any kind of linear transformation that retains several of the first principal components and suppresses the remaining ones. In particular, we show how the classic Wiener filtering theory leads to one such variant.
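A hedged sketch of the pooling-by-dimension-reduction idea, assuming simple-cell outputs are passed through a pointwise nonlinearity f (the choice f = |.| here is illustrative; the study considers several nonlinearities other than squaring) and only the leading principal components are retained:

    import numpy as np

    def learn_pooling(simple_cell_outputs, n_components, f=np.abs):
        """simple_cell_outputs: cells x samples. Returns an n_components x cells pooling matrix."""
        Y = f(simple_cell_outputs)
        Y = Y - Y.mean(axis=1, keepdims=True)
        # Strong dimension reduction: keep only the first principal components.
        eigvals, eigvecs = np.linalg.eigh(np.cov(Y))
        return eigvecs[:, ::-1][:, :n_components].T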
Neural Computation (2016) 28 (3): 445–484.
Published: 01 March 2016
Abstract
In many multivariate time series, the correlation structure is nonstationary, that is, it changes over time. The correlation structure may also change as a function of other cofactors, for example, the identity of the subject in biomedical data. A fundamental approach for the analysis of such data is to estimate the correlation structure (connectivities) separately in short time windows or for different subjects and use existing machine learning methods, such as principal component analysis (PCA), to summarize or visualize the changes in connectivity. However, the visualization of such a straightforward PCA is problematic because the ensuing connectivity patterns are much more complex objects than, say, spatial patterns. Here, we develop a new framework for analyzing variability in connectivities using the PCA approach as the starting point. First, we show how to analyze and visualize the principal components of connectivity matrices by a tailor-made rank-two matrix approximation in which we use the outer product of two orthogonal vectors. This leads to a new kind of transformation of eigenvectors that is particularly suited for this purpose and often enables interpretation of the principal component as connectivity between two groups of variables. Second, we show how to incorporate the orthogonality and the rank-two constraint in the estimation of PCA itself to improve the results. We further provide an interpretation of these methods in terms of estimation of a probabilistic generative model related to blind separation of dependent sources. Experiments on brain imaging data give very promising results.
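A schematic sketch of the first stage described here: PCA over vectorized connectivity matrices, then a rank-two approximation of a principal component as u v^T + v u^T with orthogonal u and v. The construction of u and v from the leading eigenvectors below is illustrative; the paper derives a tailor-made transformation:

    import numpy as np

    def connectivity_pca_rank2(conn_mats):
        """conn_mats: (windows, d, d) symmetric connectivity matrices."""
        n, d, _ = conn_mats.shape
        X = conn_mats.reshape(n, d * d)
        X = X - X.mean(axis=0)
        # Leading principal component of the connectivity patterns.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        M = Vt[0].reshape(d, d)
        M = (M + M.T) / 2
        # Rank-two approximation M ~ u v^T + v u^T with orthogonal u, v,
        # interpretable as connectivity between two groups of variables.
        w, E = np.linalg.eigh(M)
        e_pos, e_neg = E[:, np.argmax(w)], E[:, np.argmin(w)]
        u = (e_pos + e_neg) / np.sqrt(2)
        v = (e_pos - e_neg) / np.sqrt(2)
        return M, u, v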
Neural Computation (2015) 27 (7): 1373–1404.
Published: 01 July 2015
Abstract
Unsupervised analysis of the dynamics (nonstationarity) of functional brain connectivity during rest has recently received a lot of attention in the neuroimaging and neuroengineering communities. Most studies have used functional magnetic resonance imaging, but electroencephalography (EEG) and magnetoencephalography (MEG) also hold great promise for analyzing nonstationary functional connectivity with high temporal resolution. Previous EEG/MEG analyses divided the problem into two consecutive stages: the separation of neural sources and then the connectivity analysis of the separated sources. Such nonoptimal division into two stages may bias the result because of the different prior assumptions made about the data in the two stages. We propose a unified method for separating EEG/MEG sources and learning their functional connectivity (coactivation) patterns. We combine blind source separation (BSS) with unsupervised clustering of the activity levels of the sources in a single probabilistic model. A BSS is performed on the Hilbert transforms of band-limited EEG/MEG signals, and coactivation patterns are learned by a mixture model of source envelopes. Simulation studies show that the unified approach often outperforms conventional two-stage methods, further indicating the benefit of using Hilbert transforms to deal with oscillatory sources. Experiments on resting-state EEG data, acquired in conjunction with a cued motor imagery or nonimagery task, also show that the states (clusters) obtained by the proposed method often correlate better with physiologically meaningful quantities than those obtained by a two-stage method.
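A minimal sketch of the signal flow, with the two stages shown separately for clarity (the paper fits them jointly in a single probabilistic model); real-valued FastICA is used here as a stand-in for the BSS on Hilbert transforms:

    import numpy as np
    from scipy.signal import hilbert
    from sklearn.decomposition import FastICA
    from sklearn.mixture import GaussianMixture

    def envelope_states(eeg, n_sources, n_states):
        """eeg: channels x samples, already band-pass filtered."""
        sources = FastICA(n_components=n_sources, random_state=0).fit_transform(eeg.T).T
        envelopes = np.abs(hilbert(sources, axis=1))   # source amplitude envelopes
        # Coactivation states from a mixture model of the envelopes.
        states = GaussianMixture(n_components=n_states, random_state=0).fit_predict(envelopes.T)
        return sources, envelopes, states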
Neural Computation (2014) 26 (1): 57–83.
Published: 01 January 2014
Abstract
We consider learning a causal ordering of variables in a linear nongaussian acyclic model called LiNGAM. Several methods have been shown to consistently estimate a causal ordering assuming that all the model assumptions are correct. But the estimation results could be distorted if some assumptions are violated. In this letter, we propose a new algorithm for learning causal orders that is robust against one typical violation of the model assumptions: latent confounders. The key idea is to detect latent confounders by testing independence between estimated external influences and to find subsets (parcels) that include variables unaffected by latent confounders. We demonstrate the effectiveness of our method using artificial data and simulated brain imaging data.
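A hedged sketch of the key test: under the model assumptions, the estimated external influences are mutually independent, so residual dependence between them signals a latent confounder. A kernel independence test would normally be used in practice; the nonlinear correlations below are a simple stand-in:

    import numpy as np

    def confounder_score(e1, e2):
        """e1, e2: estimated external influences (residuals) of two variables."""
        z1 = (e1 - e1.mean()) / e1.std()
        z2 = (e2 - e2.mean()) / e2.std()
        # Independence implies both plain and nonlinear correlations vanish.
        return max(abs(np.corrcoef(z1, z2)[0, 1]),
                   abs(np.corrcoef(np.tanh(z1), z2)[0, 1]),
                   abs(np.corrcoef(z1, np.tanh(z2))[0, 1]))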
Neural Computation (2010) 22 (9): 2308–2333.
Published: 01 September 2010
Abstract
We consider a hierarchical two-layer model of natural signals in which both layers are learned from the data. Estimation is accomplished by score matching, a recently proposed estimation principle for energy-based models. If the first-layer outputs are squared and the second-layer weights are constrained to be nonnegative, the model learns responses similar to complex cells in primary visual cortex from natural images. The second layer pools a small number of features with similar orientation and frequency, but differing in spatial phase. For speech data, we obtain analogous results. The model unifies previous extensions to independent component analysis such as subspace and topographic models and provides new evidence that localized, oriented, phase-invariant features reflect the statistical properties of natural image patches.
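The response of the two-layer model, in a minimal sketch (in the paper, both weight layers are learned from data by score matching; here they are given):

    import numpy as np

    def two_layer_response(x, W1, W2):
        """x: input patch; W1: first-layer filters; W2: nonnegative pooling weights."""
        assert np.all(W2 >= 0), "second-layer weights are constrained nonnegative"
        return W2 @ (W1 @ x) ** 2      # square, then pool: complex-cell-like output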
Neural Computation (2008) 20 (12): 3087–3110.
Published: 01 December 2008
Abstract
In signal restoration by Bayesian inference, one typically uses a parametric model of the prior distribution of the signal. Here, we consider how the parameters of a prior model should be estimated from observations of uncorrupted signals. A lot of recent work has implicitly assumed that maximum likelihood estimation is the optimal estimation method. Our results imply that this is not the case. We first obtain an objective function that approximates the error incurred in signal restoration due to an imperfect prior model. Next, we show that in an important special case (small gaussian noise), the error is the same as the score-matching objective function, which was previously proposed as an alternative to likelihood based on purely computational considerations. Our analysis thus shows that score matching combines computational simplicity with statistical optimality in signal restoration, providing a viable alternative to maximum likelihood methods. We also show how the method leads to a new intuitive and geometric interpretation of structure inherent in probability distributions.
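For reference, the score-matching objective referred to here (Hyvärinen, 2005) is, up to an additive constant that does not depend on the parameters,

    J(\theta) = \mathbb{E}_x\left[ \sum_{i=1}^{n} \left( \partial_i \psi_i(x;\theta)
                + \tfrac{1}{2}\,\psi_i(x;\theta)^2 \right) \right],
    \qquad \psi(x;\theta) = \nabla_x \log p(x;\theta),

where \psi is the model score function and the expectation over the data distribution is replaced by a sample average in practice.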
Neural Computation (2006) 18 (10): 2283–2292.
Published: 01 October 2006
Abstract
A Boltzmann machine is a classic model of neural computation, and a number of methods have been proposed for its estimation. Most methods are plagued by either very slow convergence or asymptotic bias in the resulting estimates. Here we consider estimation in the basic case of fully visible Boltzmann machines. We show that the old principle of pseudolikelihood estimation provides an estimator that is computationally very simple yet statistically consistent.
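A minimal sketch for units in {-1, +1}: each conditional of one unit given the rest is logistic, so the log-pseudolikelihood below can be maximized directly (for example, by gradient ascent), with no partition function involved:

    import numpy as np

    def log_pseudolikelihood(X, W, b):
        """X: samples x units in {-1, +1}; W: symmetric weights, zero diagonal; b: biases."""
        A = X @ W + b                        # local field of every unit
        # log p(x_i | x_rest) = log sigmoid(2 * x_i * local_field_i)
        return -np.sum(np.log1p(np.exp(-2 * X * A)))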
Neural Computation (2003) 15 (3): 663–691.
Published: 01 March 2003
Abstract
Recently, statistical models of natural images have shown the emergence of several properties of the visual cortex. Most models have considered the nongaussian properties of static image patches, leading to sparse coding or independent component analysis. Here we consider the basic time dependencies of image sequences instead of their nongaussianity. We show that simple-cell-type receptive fields emerge when temporal response strength correlation is maximized for natural image sequences. Thus, temporal response strength correlation, which is a nonlinear measure of temporal coherence, provides an alternative to sparseness in modeling simple-cell receptive field properties. Our results also suggest an interpretation of simple cells in terms of invariant coding principles, which have previously been used to explain complex-cell receptive fields.
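A minimal sketch of the objective, assuming squaring as the measure of response strength; maximizing it over unit-norm w on whitened natural image sequences is what yields the simple-cell-like receptive fields:

    import numpy as np

    def temporal_coherence(w, X):
        """w: unit-norm filter; X: pixels x frames, a whitened image sequence."""
        energy = (w @ X) ** 2
        return np.mean(energy[1:] * energy[:-1])   # coherence of consecutive energies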
Neural Computation (2001) 13 (7): 1527–1558.
Published: 01 July 2001
Abstract
In ordinary independent component analysis, the components are assumed to be completely independent, and they do not necessarily have any meaningful order relationships. In practice, however, the estimated “independent” components are often not at all independent. We propose that this residual dependence structure could be used to define a topographic order for the components. In particular, a distance between two components could be defined using their higher-order correlations, and this distance could be used to create a topographic representation. Thus, we obtain a linear decomposition into approximately independent components, where the dependence of two components is approximated by the proximity of the components in the topographic representation.
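A minimal sketch of such a distance, using energy correlations as the higher-order dependency measure; components at small distance would be placed close together in the topographic representation:

    import numpy as np

    def topographic_distance(s):
        """s: components x samples, estimated by ICA. Small distance = strong dependence."""
        return 1.0 - np.corrcoef(s ** 2)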
Neural Computation (2001) 13 (4): 883–898.
Published: 01 April 2001
Abstract
A generalization of projection pursuit for time series, that is, signals with time structure, is introduced. The goal is to find projections of time series that have interesting structure, defined using criteria related to Kolmogoroff complexity or coding length. Interesting signals are those that can be coded with a short code length. We derive a simple approximation of coding length that takes into account both the nongaussianity and the autocorrelations of the time series. Also, we derive a simple algorithm for its approximate optimization. The resulting method is closely related to blind separation of nongaussian, time-dependent source signals.
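A hedged sketch of the criterion, assuming coding length is approximated by the nongaussianity of the residual left after a simple autoregressive prediction; the AR(1) predictor and log-cosh contrast below are illustrative stand-ins for the approximation derived in the article:

    import numpy as np

    def coding_length_proxy(w, X):
        """w: unit-norm projection; X: variables x time, zero mean."""
        y = w @ X
        r = np.corrcoef(y[1:], y[:-1])[0, 1]
        resid = y[1:] - r * y[:-1]                 # AR(1) prediction residual
        resid = resid / (resid.std() + 1e-12)
        return np.mean(np.log(np.cosh(resid)))     # nongaussianity contrast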
Neural Computation (2000) 12 (7): 1705–1720.
Published: 01 July 2000
Abstract
Olshausen and Field (1996) applied the principle of independence maximization by sparse coding to extract features from natural images. This leads to the emergence of oriented linear filters that have simultaneous localization in space and in frequency, thus resembling Gabor functions and simple cell receptive fields. In this article, we show that the same principle of independence maximization can explain the emergence of phase- and shift-invariant features, similar to those found in complex cells. This new kind of emergence is obtained by maximizing the independence between norms of projections on linear subspaces (instead of the independence of simple linear filter outputs). The norms of the projections on such “independent feature subspaces” then indicate the values of invariant features.
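The invariant feature value itself is simple to state, as in this minimal sketch:

    import numpy as np

    def subspace_feature(x, B_k):
        """x: image patch (vector); B_k: orthonormal filters spanning one feature subspace."""
        return np.linalg.norm(B_k @ x)   # norm of the projection on the subspace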
Neural Computation (1999) 11 (7): 1739–1768.
Published: 01 October 1999
Abstract
Sparse coding is a method for finding a representation of data in which each of the components of the representation is only rarely significantly active. Such a representation is closely related to redundancy reduction and independent component analysis, and has some neurophysiological plausibility. In this article, we show how sparse coding can be used for denoising. Using maximum likelihood estimation of nongaussian variables corrupted by gaussian noise, we show how to apply a soft-thresholding (shrinkage) operator on the components of sparse coding so as to reduce noise. Our method is closely related to the method of wavelet shrinkage, but it has the important benefit over wavelet methods that the representation is determined solely by the statistical properties of the data. The wavelet representation, on the other hand, relies heavily on certain mathematical properties (like self-similarity) that may be only weakly related to the properties of natural data.
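A minimal sketch of the shrinkage step, assuming an orthogonal sparse coding transform W; in the article, the threshold follows from the noise level and each component's sparse density, so the fixed t here is purely illustrative:

    import numpy as np

    def sparse_code_shrinkage(x_noisy, W, t):
        """x_noisy: observed signal; W: orthogonal sparse coding transform; t: threshold."""
        s = W @ x_noisy
        s_hat = np.sign(s) * np.maximum(np.abs(s) - t, 0.0)   # soft thresholding
        return W.T @ s_hat                                    # back to signal space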
Neural Computation (1997) 9 (7): 1483–1492.
Published: 10 July 1997
Abstract
We introduce a novel fast algorithm for independent component analysis, which can be used for blind source separation and feature extraction. We show how a neural network learning rule can be transformed into a fixed-point iteration, which provides an algorithm that is very simple, does not depend on any user-defined parameters, and is fast to converge to the most accurate solution allowed by the data. The algorithm finds, one at a time, all nongaussian independent components, regardless of their probability distributions. The computations can be performed in either batch mode or a semiadaptive manner. The convergence of the algorithm is rigorously proved, and the convergence speed is shown to be cubic. Some comparisons to gradient-based algorithms are made, showing that the new algorithm is usually 10 to 100 times faster, sometimes giving the solution in just a few iterations.
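A minimal sketch of the fixed-point iteration for one component on whitened data, using a kurtosis-based update of the kind analyzed in the paper:

    import numpy as np

    def fastica_one_unit(X, n_iter=100, tol=1e-8, seed=0):
        """X: whitened data, dimensions x samples. Returns one unmixing vector."""
        rng = np.random.default_rng(seed)
        w = rng.standard_normal(X.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ X
            w_new = (X * y ** 3).mean(axis=1) - 3.0 * w   # fixed-point step
            w_new /= np.linalg.norm(w_new)
            if abs(abs(w_new @ w) - 1.0) < tol:           # converged up to sign
                return w_new
            w = w_new
        return w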