1-3 of 3
Javier R. Movellan
Journal Articles
Publisher: Journals Gateway
Neural Computation (2008) 20 (9): 2238–2252.
Published: 01 September 2008
Abstract
This letter presents an analysis of the contrastive divergence (CD) learning algorithm when applied to continuous-time linear stochastic neural networks. For this case, powerful techniques exist that allow a detailed analysis of the behavior of CD. The analysis shows that CD converges to maximum likelihood solutions only when the network structure is such that it can match the first moments of the desired distribution. Otherwise, CD can converge to solutions arbitrarily different from the maximum likelihood solutions, or it can even diverge. This result suggests the need to improve our theoretical understanding of the conditions under which CD is expected to be well behaved and of the conditions under which it may fail. In addition, the results point to practical ideas on how to improve the performance of CD.
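The abstract's central claim — that CD recovers the maximum likelihood solution exactly when the model can match the data's first moments — can be illustrated with a minimal, hypothetical sketch (a toy assumed here for illustration, not the paper's continuous-time networks): CD-1 fitting the mean of a one-dimensional Gaussian, where the learnable statistic is precisely the first moment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (assumed for illustration): energy E(x) = (x - mu)^2 / 2,
# i.e. p(x) = N(mu, 1). The CD update compares the sufficient statistic
# (here the first moment) under the data vs. a one-step reconstruction.
data = rng.normal(loc=3.0, scale=1.0, size=1000)

mu, lr = 0.0, 0.1
for _ in range(200):
    # "Reconstruction": a Gibbs step from the current model; for this
    # fully observed Gaussian that is simply a fresh draw from N(mu, 1).
    recon = mu + rng.normal(size=data.shape)
    # CD update: difference of first moments (data minus reconstruction).
    mu += lr * (data.mean() - recon.mean())

print(mu)  # converges toward the empirical mean of the data (about 3.0)
```

Because this model's only degree of freedom is the first moment, the fixed point of the CD update coincides with the maximum likelihood estimate — the benign case the abstract identifies.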
Neural Computation (2002) 14 (7): 1507–1544.
Published: 01 July 2002
Abstract
We present a Monte Carlo approach for training partially observable diffusion processes. We apply the approach to diffusion networks, a stochastic version of continuous recurrent neural networks. The approach is aimed at learning probability distributions of continuous paths, not just expected values. Interestingly, the relevant activation statistics used by the learning rule presented here are inner products in the Hilbert space of square integrable functions. These inner products can be computed using Hebbian operations and do not require backpropagation of error signals. Moreover, standard kernel methods could potentially be applied to compute such inner products. We propose that the main reason that recurrent neural networks have not worked well in engineering applications (e.g., speech recognition) is that they implicitly rely on a very simplistic likelihood model. The diffusion network approach proposed here is much richer and may open new avenues for applications of recurrent neural networks. We present some analysis and simulations to support this view. Very encouraging results were obtained on a visual speech recognition task in which neural networks outperformed hidden Markov models.
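The kind of statistic the abstract describes can be sketched in a minimal assumed example (not the paper's actual learning rule): on a discrete time grid, the inner product of two activation paths in the Hilbert space of square integrable functions reduces to a Hebbian accumulation of products of activations.

```python
import numpy as np

# Assumed toy activation paths on [0, 1); any sampled unit activations
# would do. The L^2 inner product <x, y> = integral of x(t) y(t) dt is
# approximated by a Riemann sum of pointwise (Hebbian) products.
dt = 0.001
t = np.arange(0.0, 1.0, dt)
x = np.sin(2 * np.pi * t)   # one unit's activation path
y = np.cos(2 * np.pi * t)   # another unit's activation path

inner_xy = dt * np.sum(x * y)   # local product of activations, accumulated
inner_xx = dt * np.sum(x * x)

print(inner_xy, inner_xx)  # approximately 0.0 and 0.5, respectively
```

The point of the sketch is locality: each term of the sum involves only the two activations at one instant, which is why such statistics can be computed with Hebbian operations and without backpropagating error signals.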
Neural Computation (1998) 10 (5): 1157–1178.
Published: 01 July 1998
Abstract
This article analyzes learning in continuous stochastic neural networks defined by stochastic differential equations (SDEs). In particular, it studies gradient descent learning rules to train the equilibrium solutions of these networks. A theorem is given that specifies sufficient conditions for the gradient descent learning rules to be local covariance statistics between two random variables: (1) an evaluator that is the same for all the network parameters and (2) a system variable that is independent of the learning objective. While this article focuses on continuous stochastic neural networks, the theorem applies to any other system with Boltzmann-like equilibrium distributions. The generality of the theorem suggests that instead of suppressing noise present in physical devices, a natural alternative is to use it to simplify the credit assignment problem. In deterministic networks, credit assignment requires an evaluation signal that is different for each node in the network. Surprisingly, when noise is not suppressed, all that is needed is an evaluator that is the same for the entire network and a local Hebbian signal. This modularization of signals greatly simplifies hardware and software implementations. The article shows how the theorem applies to four different learning objectives that span supervised, reinforcement, and unsupervised problems: (1) regression, (2) density estimation, (3) risk minimization, and (4) information maximization. Simulations, implementation issues, and implications for computational neuroscience are discussed.
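The covariance form of the gradient can be sketched in a toy setting (an assumed example, not the paper's stochastic networks): for x ~ N(θ, 1), the score function is ∇θ log p(x) = x − θ, so the derivative of E[f(x)] with respect to θ equals Cov(f(x), x − θ). Here f plays the role of the global evaluator, identical for every parameter, and (x − θ) the role of the local system variable that does not depend on the objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy: x ~ N(theta, 1), objective f(x) = x^2.
# Then d/dtheta E[f(x)] = Cov(f(x), x - theta) = 2 * theta analytically.
theta = 1.5
x = theta + rng.normal(size=200_000)

f = x**2           # global evaluator (would be swapped per objective)
score = x - theta  # local system variable, independent of the objective

# Sample covariance between evaluator and system variable.
grad_est = np.mean(f * score) - f.mean() * score.mean()
print(grad_est)  # analytic value is 2 * theta = 3.0
```

Swapping in a different evaluator f changes the objective without touching the system-variable side of the statistic, which is the modularization of signals the abstract highlights.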