Laurenz Wiskott
1-13 of 13
Neural Computation (2024) 36 (9): 1669–1712.
Published: 19 August 2024
Abstract
This study discusses the negative impact of the derivative of the activation functions in the output layer of artificial neural networks, in particular in continual learning. We propose Hebbian descent as a theoretical framework to overcome this limitation, which is implemented through an alternative loss function for gradient descent we refer to as Hebbian descent loss. This loss is effectively the generalized log-likelihood loss and corresponds to an alternative weight update rule for the output layer wherein the derivative of the activation function is disregarded. We show how this update avoids vanishing error signals during backpropagation in saturated regions of the activation functions, which is particularly helpful in training shallow neural networks and deep neural networks where saturating activation functions are only used in the output layer. In combination with centering, Hebbian descent leads to better continual learning capabilities. It provides a unifying perspective on Hebbian learning, gradient descent, and generalized linear models, for all of which we discuss the advantages and disadvantages. Given activation functions with strictly positive derivative (as is often the case in practice), Hebbian descent inherits the convergence properties of regular gradient descent. While established pairings of loss and output layer activation function (e.g., mean squared error with linear or cross-entropy with sigmoid/softmax) are subsumed by Hebbian descent, we provide general insights for designing arbitrary combinations of loss and activation function that benefit from Hebbian descent. For shallow networks, we show that Hebbian descent outperforms Hebbian learning, has a performance similar to that of regular gradient descent, and performs much better than all other tested update rules in continual learning. In combination with centering, Hebbian descent implements a forgetting mechanism that prevents catastrophic interference notably better than the other tested update rules. When training deep neural networks, our experimental results suggest that Hebbian descent performs better than or similarly to gradient descent.
Includes: Supplementary data
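To make the update rule concrete, here is a minimal NumPy sketch contrasting a standard gradient step with a Hebbian-descent-style step for a single sigmoid output layer. The function and variable names are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def output_layer_updates(h, t, W, b, mu_h, lr=0.1):
    """Compare a standard gradient step with a Hebbian-descent-style step
    for one sigmoid output layer (illustrative sketch, not the paper's code).

    h    : (n_hidden,) hidden/input activities
    t    : (n_out,)    targets
    W, b : weights (n_out, n_hidden) and biases (n_out,)
    mu_h : running mean of h, used for centering
    """
    hc = h - mu_h                      # centering: subtract the mean input activity
    a = W @ hc + b                     # pre-activation
    y = sigmoid(a)                     # output
    err = y - t                        # error signal

    # Plain gradient descent on the squared error: the factor sigmoid'(a)
    # shrinks the update wherever the sigmoid saturates.
    grad_step = -lr * np.outer(err * y * (1.0 - y), hc)

    # Hebbian descent: the derivative of the activation function is dropped,
    # so the update stays proportional to the raw error even in saturation.
    hebbian_step = -lr * np.outer(err, hc)

    return grad_step, hebbian_step

# Tiny usage example with made-up numbers.
rng = np.random.default_rng(0)
h, t = rng.random(5), np.array([1.0, 0.0])
W, b = rng.normal(size=(2, 5)), np.zeros(2)
mu_h = np.full(5, 0.5)                 # assumed running mean of the hidden activities
g_step, hd_step = output_layer_updates(h, t, W, b, mu_h)
```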
Neural Computation (2023) 35 (11): 1713–1796.
Published: 10 October 2023
Abstract
Markov chains are a class of probabilistic models that have achieved widespread application in the quantitative sciences. This is in part due to their versatility, but is compounded by the ease with which they can be probed analytically. This tutorial provides an in-depth introduction to Markov chains and explores their connection to graphs and random walks. We use tools from linear algebra and graph theory to describe the transition matrices of different types of Markov chains, with a particular focus on exploring properties of the eigenvalues and eigenvectors corresponding to these matrices. The results presented are relevant to a number of methods in machine learning and data mining, which we describe at various stages. Rather than being a novel academic study in its own right, this text presents a collection of known results, together with some new concepts. Moreover, the tutorial focuses on offering intuition to readers rather than formal understanding and only assumes basic exposure to concepts from linear algebra and probability theory. It is therefore accessible to students and researchers from a wide variety of disciplines.
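As a small illustration of the kind of eigen-analysis the tutorial covers, the following NumPy sketch builds a toy row-stochastic transition matrix and reads off the stationary distribution from the eigenvalue-1 left eigenvector; the matrix values are made up for illustration.

```python
import numpy as np

# A small row-stochastic transition matrix (illustrative 3-state chain).
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

# Left eigenvectors of P: the eigenvalue-1 eigenvector, normalized to sum
# to one, is the stationary distribution of the chain.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()

print(pi)        # stationary distribution
print(pi @ P)    # equals pi: the distribution is invariant under the dynamics

# The second-largest eigenvalue magnitude controls how quickly the chain mixes.
print(sorted(np.abs(eigvals))[-2])
```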
Neural Computation (2022) 34 (9): 1841–1870.
Published: 16 August 2022
Abstract
Many studies have suggested that episodic memory is a generative process, but most computational models adopt a storage view. In this article, we present a model of the generative aspects of episodic memory. It is based on the central hypothesis that the hippocampus stores and retrieves selected aspects of an episode as a memory trace, which is necessarily incomplete. At recall, the neocortex reasonably fills in the missing parts based on general semantic information in a process we call semantic completion. The model combines two neural network architectures known from machine learning, the vector-quantized variational autoencoder (VQ-VAE) and the pixel convolutional neural network (PixelCNN). As episodes, we use images of digits and fashion items (MNIST and Fashion-MNIST) augmented by different backgrounds representing context. The model is able to complete missing parts of a memory trace in a semantically plausible way, up to the point where it can generate plausible images from scratch, and it generalizes well to images it was not trained on. Compression as well as semantic completion contribute to a strong reduction in memory requirements and to robustness to noise. Finally, we also model an episodic memory experiment and can reproduce the findings that semantically congruent contexts are always recalled better than incongruent ones, that high attention levels improve memory accuracy in both cases, and that contexts that are not remembered correctly are more often remembered in a semantically congruent way than completely wrongly. This model contributes to a deeper understanding of the interplay between episodic memory and semantic information in the generative process of recalling the past.
Includes: Supplementary data
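A rough sketch of the vector-quantization step that turns an encoding into a compact, discrete memory trace, in the spirit of the VQ-VAE component described above. The codebook, shapes, and masking are invented for illustration, and the PixelCNN prior that performs semantic completion is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical codebook of K embedding vectors (learned in a trained VQ-VAE).
K, D = 16, 8
codebook = rng.normal(size=(K, D))

def quantize(z):
    """Map each D-dimensional encoder output to the index of its nearest
    codebook vector; the resulting index grid is the compressed memory trace."""
    dists = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)
    return np.argmin(dists, axis=1)

# A toy 'encoding' of an episode: a 4x4 grid of latent vectors, flattened.
z = rng.normal(size=(16, D))
trace = quantize(z)                    # 16 integers instead of 16*D floats

# Recall: entries missing from the trace (here masked at random) would be
# filled in by sampling from an autoregressive prior such as a PixelCNN,
# which is the semantic-completion step and is omitted in this sketch.
mask = rng.random(trace.shape) < 0.25
incomplete_trace = np.where(mask, -1, trace)
print(incomplete_trace)
```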
Neural Computation (2018) 30 (5): 1151–1179.
Published: 01 May 2018
Abstract
The computational principles of slowness and predictability have been proposed to describe aspects of information processing in the visual system. Taking the perspective that slowness is a limited special case of predictability, we investigate the relationship between these two principles empirically. On a collection of real-world data sets, we compare the features extracted by slow feature analysis (SFA) to the features of three recently proposed methods for predictable feature extraction: forecastable component analysis, predictable feature analysis, and graph-based predictable feature analysis. Our experiments show that the predictability of the learned features is highly correlated across methods, and thus SFA appears to effectively implement a method for extracting predictable features according to different measures of predictability.
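For reference, a minimal sketch of the slowness measure that SFA optimizes and that is being related to the predictability measures above; the toy signals are synthetic and not from the study's data sets.

```python
import numpy as np

def slowness(y):
    """Delta value of a feature time series: the mean squared temporal
    derivative of the unit-variance signal. Smaller values mean slower
    (and, per the study, typically more predictable) features."""
    y = (y - y.mean()) / y.std()
    return np.mean(np.diff(y) ** 2)

t = np.linspace(0, 10, 1000)
slow_feature = np.sin(t)                                   # slowly varying
fast_feature = np.sin(50 * t)                              # quickly varying
print(slowness(slow_feature), slowness(fast_feature))      # small vs. large
```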
Neural Computation (2018) 30 (2): 293–332.
Published: 01 February 2018
Abstract
The experimental evidence on the interrelation between episodic memory and semantic memory is inconclusive. Are they independent systems, different aspects of a single system, or separate but strongly interacting systems? Here, we propose a computational role for the interaction between the semantic and episodic systems that might help resolve this debate. We hypothesize that episodic memories are represented as sequences of activation patterns. These patterns are the output of a semantic representational network that compresses the high-dimensional sensory input. We show quantitatively that the accuracy of episodic memory crucially depends on the quality of the semantic representation. We compare two types of semantic representations: appropriate representations, which means that the representation is used to store input sequences that are of the same type as those that it was trained on, and inappropriate representations, which means that stored inputs differ from the training data. Retrieval accuracy is higher for appropriate representations because the encoded sequences are less divergent than those encoded with inappropriate representations. Consistent with our model prediction, we found that human subjects remember some aspects of episodes significantly more accurately if they had previously been familiarized with the objects occurring in the episode, as compared to episodes involving unfamiliar objects. We thus conclude that the interaction with the semantic system plays an important role for episodic memory.
Neural Computation (2011) 23 (9): 2289–2323.
Published: 01 September 2011
Abstract
Primates are very good at recognizing objects independent of viewing angle or retinal position, and they outperform existing computer vision systems by far. But invariant object recognition is only one prerequisite for successful interaction with the environment. An animal also needs to assess an object's position and relative rotational angle. We propose here a model that is able to extract object identity, position, and rotation angles. We demonstrate the model behavior on complex three-dimensional objects under translation and rotation in depth on a homogeneous background. A similar model has previously been shown to extract hippocampal spatial codes from quasi-natural videos. The framework for mathematical analysis of this earlier application carries over to the scenario of invariant object recognition. Thus, the simulation results can be explained analytically even for the complex high-dimensional data we employed.
Neural Computation (2011) 23 (2): 303–335.
Published: 01 February 2011
Abstract
We develop a group-theoretical analysis of slow feature analysis for the case where the input data are generated by applying a set of continuous transformations to static templates. As an application of the theory, we analytically derive nonlinear visual receptive fields and show that their optimal stimuli, as well as the orientation and frequency tuning, are in good agreement with previous simulations of complex cells in primary visual cortex (Berkes and Wiskott, 2005). The theory suggests that side and end stopping can be interpreted as a weak breaking of translation invariance. Direction selectivity is also discussed.
Neural Computation (2007) 19 (4): 994–1021.
Published: 01 April 2007
Abstract
In the linear case, statistical independence is a sufficient criterion for performing blind source separation. In the nonlinear case, however, it leaves an ambiguity in the solutions that has to be resolved by additional criteria. Here we argue that temporal slowness complements statistical independence well and that a combination of the two leads to unique solutions of the nonlinear blind source separation problem. The algorithm we present is a combination of second-order independent component analysis and slow feature analysis and is referred to as independent slow feature analysis. Its performance is demonstrated on nonlinearly mixed music data. We conclude that slowness is indeed a useful complement to statistical independence but that time-delayed second-order moments are only a weak measure of statistical independence.
Neural Computation (2006) 18 (10): 2495–2508.
Published: 01 October 2006
Abstract
We present an analytical comparison between linear slow feature analysis and second-order independent component analysis, and show that in the case of one time delay, the two approaches are equivalent. We also consider the case of several time delays and discuss two possible extensions of slow feature analysis.
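A short sketch of why the two objectives coincide for a single time delay (a standard identity in my own notation, not quoted from the article):

```latex
% Standard identity (notation mine): for a zero-mean, unit-variance feature y_t
% and a single unit time delay, the SFA slowness objective satisfies
\[
\Delta(y) = \big\langle (y_{t+1}-y_t)^2 \big\rangle
          = \langle y_{t+1}^2 \rangle + \langle y_t^2 \rangle - 2\,\langle y_{t+1} y_t \rangle
          = 2\big(1 - \langle y_{t+1} y_t \rangle\big),
\]
% so minimizing the slowness is equivalent to maximizing the lag-one
% autocovariance, i.e., the time-delayed second-order moment that
% second-order ICA with a single delay diagonalizes.
```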
Neural Computation (2006) 18 (8): 1868–1895.
Published: 01 August 2006
Abstract
In this letter, we introduce some mathematical and numerical tools to analyze and interpret inhomogeneous quadratic forms. The resulting characterization is in some aspects similar to that given by experimental studies of cortical cells, making it particularly suitable for application to second-order approximations and theoretical models of physiological receptive fields. We first discuss two ways of analyzing a quadratic form by visualizing the coefficients of its quadratic and linear term directly and by considering the eigenvectors of its quadratic term. We then present an algorithm to compute the optimal excitatory and inhibitory stimuli—those that maximize and minimize the considered quadratic form, respectively, given a fixed energy constraint. The analysis of the optimal stimuli is completed by considering their invariances, which are the transformations to which the quadratic form is most insensitive, and by introducing a test to determine which of these are statistically significant. Next we propose a way to measure the relative contribution of the quadratic and linear term to the total output of the quadratic form. Furthermore, we derive simpler versions of the above techniques in the special case of a quadratic form without linear term. In the final part of the letter, we show that for each quadratic form, it is possible to build an equivalent two-layer neural network, which is compatible with (but more general than) related networks used in some recent articles and with the energy model of complex cells. We show that the neural network is unique only up to an arbitrary orthogonal transformation of the excitatory and inhibitory subunits in the first layer.
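As an illustration of the special case mentioned above (a quadratic form without linear term), the following NumPy sketch computes the optimal excitatory and inhibitory stimuli under a fixed-energy constraint as the extreme eigenvectors of the quadratic term; the matrix and the energy value are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative symmetric quadratic term H; q(x) = x^T H x with energy ||x|| = r.
n, r = 10, 1.0
A = rng.normal(size=(n, n))
H = 0.5 * (A + A.T)

# For a quadratic form without linear term, the optimal excitatory stimulus
# under the energy constraint is the eigenvector of H with the largest
# eigenvalue (scaled to norm r); the optimal inhibitory stimulus is the
# eigenvector with the smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(H)
x_plus = r * eigvecs[:, -1]     # maximizes q(x)
x_minus = r * eigvecs[:, 0]     # minimizes q(x)

q = lambda x: x @ H @ x
print(q(x_plus), q(x_minus))    # largest / smallest eigenvalue times r^2
```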
Neural Computation (2003) 15 (9): 2147–2177.
Published: 01 September 2003
Abstract
Temporal slowness is a learning principle that allows learning of invariant representations by extracting slowly varying features from quickly varying input signals. Slow feature analysis (SFA) is an efficient algorithm based on this principle and has been applied to the learning of translation, scale, and other invariances in a simple model of the visual system. Here, a theoretical analysis of the optimization problem solved by SFA is presented, which provides a deeper understanding of the simulation results obtained in previous studies.
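For reference, the optimization problem analyzed in the article can be stated in the standard SFA form (notation mine):

```latex
% The SFA optimization problem in its standard form (notation mine):
% find input-output functions g_j such that y_j(t) = g_j(x(t)) minimizes
\[
\Delta(y_j) = \big\langle \dot{y}_j^{\,2} \big\rangle_t
\quad \text{subject to} \quad
\langle y_j \rangle_t = 0, \qquad
\langle y_j^2 \rangle_t = 1, \qquad
\langle y_i y_j \rangle_t = 0 \;\; \text{for all } i < j,
\]
% i.e., zero mean, unit variance, and decorrelation with all previously
% extracted (slower) features.
```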
Neural Computation (2002) 14 (4): 715–770.
Published: 01 April 2002
Abstract
Invariant features of temporally varying signals are useful for analysis and classification. Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal. It is based on a nonlinear expansion of the input signal and application of principal component analysis to this expanded signal and its time derivative. It is guaranteed to find the optimal solution within a family of functions directly and can learn to extract a large number of decorrelated features, which are ordered by their degree of invariance. SFA can be applied hierarchically to process high-dimensional input signals and extract complex features. SFA is applied first to complex cell tuning properties based on simple cell output, including disparity and motion. Then more complicated input-output functions are learned by repeated application of SFA. Finally, a hierarchical network of SFA modules is presented as a simple model of the visual system. The same unstructured network can learn translation, size, rotation, contrast, or, to a lesser degree, illumination invariance for one-dimensional objects, depending only on the training stimulus. Surprisingly, only a few training objects suffice to achieve good generalization to new objects. The generated representation is suitable for object recognition. Performance degrades if the network is trained to learn multiple invariances simultaneously.
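The algorithmic recipe in the abstract (nonlinear expansion followed by principal component analysis of the expanded signal and its time derivative) can be sketched in a few lines of NumPy; the toy signal and variable names are illustrative and this is not the original implementation.

```python
import numpy as np

def sfa(x, n_out=2):
    """Minimal sketch of SFA on a time series x of shape (T, n): quadratic
    expansion, whitening, then the slowest directions are the minor
    components of the time-derivative covariance."""
    # 1. Nonlinear (quadratic) expansion of the input signal.
    n = x.shape[1]
    monomials = [x]
    monomials += [x[:, i:i+1] * x[:, j:j+1] for i in range(n) for j in range(i, n)]
    z = np.hstack(monomials)

    # 2. Whiten the expanded signal (zero mean, identity covariance).
    z = z - z.mean(axis=0)
    d, E = np.linalg.eigh(np.cov(z, rowvar=False))
    keep = d > 1e-10
    W_white = E[:, keep] / np.sqrt(d[keep])
    zw = z @ W_white

    # 3. PCA on the time derivative: the directions of *smallest* derivative
    #    variance are the slowest features.
    dz = np.diff(zw, axis=0)
    d2, E2 = np.linalg.eigh(np.cov(dz, rowvar=False))
    return zw @ E2[:, :n_out]          # features ordered from slowest upward

# Toy input: a slow signal hidden in quickly varying components.
t = np.linspace(0, 4 * np.pi, 2000)[:, None]
x = np.hstack([np.sin(t) + 0.1 * np.sin(37 * t), np.cos(23 * t)])
slow_features = sfa(x, n_out=1)
```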
Neural Computation (1998) 10 (3): 671–716.
Published: 01 April 1998
Abstract
Computational models of neural map formation can be considered on at least three different levels of abstraction: detailed models including neural activity dynamics, weight dynamics that abstract from the neural activity dynamics by an adiabatic approximation, and constrained optimization from which equations governing weight dynamics can be derived. Constrained optimization uses an objective function, from which a weight growth rule can be derived as a gradient flow, and some constraints, from which normalization rules are derived. In this article, we present an example of how an optimization problem can be derived from detailed nonlinear neural dynamics. A systematic investigation reveals how different weight dynamics introduced previously can be derived from two types of objective function terms and two types of constraints. This includes dynamic link matching as a special case of neural map formation. We focus in particular on the role of coordinate transformations to derive different weight dynamics from the same optimization problem. Several examples illustrate how the constrained optimization framework can help in understanding, generating, and comparing different models of neural map formation. The techniques used in this analysis may also be useful in investigating other types of neural dynamics.
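Schematically, the constrained-optimization level described here takes the following generic form (notation mine, not a specific model from the article):

```latex
% Generic shape of the constrained-optimization level (notation mine):
% the weight dynamics are a gradient flow of an objective H(W),
\[
\dot{w}_{ij} \propto \frac{\partial H(W)}{\partial w_{ij}},
\qquad \text{subject to constraints such as} \qquad
\sum_i w_{ij} = \text{const},
\]
% from which normalization rules follow, for example by projecting each
% growth step back onto the constraint surface.
```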