Michael I. Jordan
Neural Computation (2000) 12 (12): 2881–2907.
Published: 01 December 2000
Abstract
It is well known that the convergence rate of the expectation-maximization (EM) algorithm can be faster than those of conventional first-order iterative algorithms when the overlap in the given mixture is small, but this argument had not previously been proved mathematically. This article studies the problem asymptotically in the setting of gaussian mixtures under the theoretical framework of Xu and Jordan (1996). It is proved that the asymptotic convergence rate of the EM algorithm for gaussian mixtures locally around the true solution Θ* is o(e^{0.5−ε}(Θ*)), where ε > 0 is an arbitrarily small number, o(x) denotes a higher-order infinitesimal as x → 0, and e(Θ*) is a measure of the average overlap of the gaussians in the mixture. In other words, the large-sample local convergence rate of the EM algorithm tends to be asymptotically superlinear as e(Θ*) tends to zero.
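A rough numerical illustration of this result (a sketch, not code from the paper): running EM on a synthetic, well-separated two-component one-dimensional gaussian mixture and printing the ratio of successive update sizes, which approximates the local linear convergence rate. With small overlap the ratio should be near zero, consistent with the near-superlinear behaviour described above; all data and parameter values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
# Ground truth with small overlap (illustrative values).
w_true, mu_true = np.array([0.4, 0.6]), np.array([-5.0, 5.0])
z = rng.random(n) < w_true[1]
x = np.where(z, rng.normal(mu_true[1], 1.0, n), rng.normal(mu_true[0], 1.0, n))

def em_step(x, w, mu, sd):
    # E-step: responsibilities under the current parameters.
    dens = np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    resp = w * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood updates.
    nk = resp.sum(axis=0)
    mu_new = (resp * x[:, None]).sum(axis=0) / nk
    sd_new = np.sqrt((resp * (x[:, None] - mu_new) ** 2).sum(axis=0) / nk)
    return nk / len(x), mu_new, sd_new

w, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([2.0, 2.0])
prev_theta, prev_step = np.concatenate([w, mu, sd]), None
for t in range(30):
    w, mu, sd = em_step(x, w, mu, sd)
    theta = np.concatenate([w, mu, sd])
    step = np.linalg.norm(theta - prev_theta)
    if prev_step is not None:
        # Ratio of successive update sizes ~ local linear convergence rate.
        print(f"iter {t}: step = {step:.3e}, rate ~ {step / prev_step:.4f}")
    if step < 1e-12:
        break
    prev_theta, prev_step = theta, step
```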
Neural Computation (2000) 12 (6): 1313–1335.
Published: 01 June 2000
Abstract
We study the probabilistic generative models parameterized by feedforward neural networks. An attractor dynamics for probabilistic inference in these models is derived from a mean field approximation for large, layered sigmoidal networks. Fixed points of the dynamics correspond to solutions of the mean field equations, which relate the statistics of each unit to those of its Markov blanket. We establish global convergence of the dynamics by providing a Lyapunov function and show that the dynamics generate the signals required for unsupervised learning. Our results for feedforward networks provide a counterpart to those of Cohen-Grossberg and Hopfield for symmetric networks.
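The paper's dynamics are derived for layered sigmoidal (directed) networks; the minimal sketch below instead applies naive mean-field coordinate updates to a small symmetric pairwise binary network, purely to illustrate the two ingredients highlighted in the abstract: each unit's mean is recomputed from the statistics of its Markov blanket, and a free-energy quantity plays the role of a Lyapunov function, decreasing monotonically under the updates. The couplings and biases are random illustrative values, and this is not the paper's model or derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
W = rng.normal(0, 1.0, (n, n))
W = np.triu(W, 1)
W = W + W.T                      # symmetric couplings, zero diagonal
b = rng.normal(0, 0.5, n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def free_energy(m, W, b):
    # Mean-field (variational) free energy of the binary pairwise model.
    eps = 1e-12
    energy = -0.5 * m @ W @ m - b @ m
    entropy = -(m * np.log(m + eps) + (1 - m) * np.log(1 - m + eps)).sum()
    return energy - entropy

m = np.full(n, 0.5)
for sweep in range(15):
    for i in range(n):
        # Each unit's mean is recomputed from the statistics of its Markov
        # blanket (here, the units it is coupled to), as in the mean-field
        # fixed-point equations; each update cannot increase the free energy.
        m[i] = sigmoid(W[i] @ m + b[i])
    print(f"sweep {sweep:2d}: free energy = {free_energy(m, W, b):.6f}")
```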
Neural Computation (1997) 9 (2): 227–269.
Published: 15 February 1997
Abstract
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas, including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper presents a self-contained review of the basic principles of PINs. It is shown that the well-known forward-backward (F-B) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
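For reference, here is a minimal sketch of the forward-backward algorithm on a toy discrete HMM (illustrative transition, emission, and initial probabilities, not from the paper). In the PIN view described above, the two passes correspond to message-passing sweeps over the chain-structured graph.

```python
import numpy as np

# A small HMM (illustrative parameters): 2 hidden states, 3 output symbols.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # transition probabilities
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])     # emission probabilities
pi = np.array([0.6, 0.4])           # initial state distribution
obs = [0, 2, 1, 2]                  # observed symbol indices

def forward_backward(A, B, pi, obs):
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)   # smoothed state posteriors
    return gamma, alpha[-1].sum()               # posteriors and likelihood

gamma, likelihood = forward_backward(A, B, pi, obs)
print("P(observations) =", likelihood)
print("state posteriors:\n", gamma)
```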
Neural Computation (1996) 8 (1): 129–151.
Published: 01 January 1996
Abstract
We build up the mathematical connection between the “Expectation-Maximization” (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix P, and we provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of P and provide new results analyzing the effect that P has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of gaussian mixture models.
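A small numerical check in the spirit of this result, under simplifying assumptions (one-dimensional data, two components, and only the mean parameters): the EM update of the means equals the current means plus a diagonal factor P_j = σ_j²/n_j times the log-likelihood gradient, where n_j is the expected count for component j. The data and starting parameters below are illustrative; the paper gives the explicit projection matrix for the full parameter vector.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative 1-D data and current (non-converged) mixture parameters.
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1.5, 700)])
w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 4.0])

# E-step: responsibilities h[i, j] = P(component j | x_i).
dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
h = w * dens
h /= h.sum(axis=1, keepdims=True)
nj = h.sum(axis=0)

# EM (M-step) update of the means.
mu_em = (h * x[:, None]).sum(axis=0) / nj

# Gradient of the log-likelihood w.r.t. the means, and the diagonal
# factor P_j = var_j / n_j for the mean block in this 1-D case.
grad_mu = (h * (x[:, None] - mu)).sum(axis=0) / var
P = var / nj

print("EM update:        ", mu_em)
print("mu + P * gradient:", mu + P * grad_mu)   # matches the EM update
```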
Neural Computation (1994) 6 (6): 1174–1184.
Published: 01 November 1994
Abstract
We introduce a large family of Boltzmann machines that can be trained by standard gradient descent. The networks can have one or more layers of hidden units, with tree-like connectivity. We show how to implement the supervised learning algorithm for these Boltzmann machines exactly, without resort to simulated or mean-field annealing. The stochastic averages that yield the gradients in weight space are computed by the technique of decimation. We present results on the problems of N-bit parity and the detection of hidden symmetries.
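Decimation is easy to demonstrate in the smallest possible case. The sketch below uses ±1 spins, no bias on the unit being summed out, and illustrative couplings (the paper treats more general tree-like architectures); it checks that summing out the middle unit of a three-spin chain, which yields an effective coupling with tanh(J_eff) = tanh(J01)·tanh(J12), reproduces the correlation obtained by exact enumeration.

```python
import numpy as np
from itertools import product

# Decimation check on a 3-spin chain s0 - s1 - s2 with +/-1 spins,
# couplings J01 and J12, and no biases (illustrative simplification).
J01, J12 = 0.8, 1.3

# Exact correlation <s0 s2> by brute-force enumeration.
num = den = 0.0
for s0, s1, s2 in product([-1, 1], repeat=3):
    wgt = np.exp(J01 * s0 * s1 + J12 * s1 * s2)
    num += s0 * s2 * wgt
    den += wgt
print("enumeration:", num / den)

# Decimating the middle spin gives an effective coupling J_eff with
# tanh(J_eff) = tanh(J01) * tanh(J12), hence <s0 s2> = tanh(J_eff).
print("decimation: ", np.tanh(J01) * np.tanh(J12))
```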
Neural Computation (1994) 6 (6): 1185–1201.
Published: 01 November 1994
Abstract
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
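A minimal tabular Q-learning sketch on a toy deterministic chain MDP (everything here is illustrative, not from the paper), using step sizes 1/n(s, a) that satisfy the Robbins-Monro conditions central to the stochastic approximation view; the learned state values approach the known optimal values for the chain.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma = 5, 0.9          # chain MDP: states 0..4, state 4 is the goal
Q = np.zeros((n_states, 2))       # actions: 0 = left, 1 = right
visits = np.zeros((n_states, 2))

def step(s, a):
    # Deterministic chain: "right" moves toward the goal, "left" moves back.
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == n_states - 1 else 0.0
    return s2, reward, s2 == n_states - 1

for episode in range(2000):
    s, done = 0, False
    while not done:
        a = int(rng.integers(2))                # uniform random behaviour policy (off-policy)
        s2, r, done = step(s, a)
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]              # step sizes satisfying the Robbins-Monro conditions
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])   # the Q-learning update
        s = s2

# Optimal value of state s on this chain is gamma ** (steps-to-goal - 1).
print("learned V:", np.round(Q.max(axis=1)[:-1], 3))
print("optimal V:", [round(gamma ** (n_states - 2 - s), 3) for s in range(n_states - 1)])
```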
Neural Computation (1994) 6 (2): 181–214.
Published: 01 March 1994
Abstract
We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
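As a heavily simplified illustration of the EM formulation (a sketch, not the paper's algorithm): a single-level mixture of two linear experts on synthetic one-dimensional data, with a softmax gate updated by a few gradient steps in place of the IRLS fit used for the GLIM's, an asymmetric start to break symmetry, and a shared output variance. The E-step computes posterior responsibilities; the M-step solves responsibility-weighted least-squares problems.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic piecewise-linear data: two regimes, one per expert (illustrative).
n = 1000
x = rng.uniform(-3, 3, n)
y = np.where(x < 0, 2.0 * x + 1.0, -1.5 * x + 0.5) + rng.normal(0, 0.2, n)
X = np.column_stack([x, np.ones(n)])          # input with a bias column
K, d = 2, X.shape[1]

W = np.array([[1.0, 0.0], [-1.0, 0.0]])       # expert weights (asymmetric start)
V = np.zeros((K, d))                          # gating (softmax) weights
sigma2 = 1.0

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for it in range(50):
    # E-step: posterior responsibility of each expert for each point.
    g = softmax(X @ V.T)
    pred = X @ W.T
    lik = np.exp(-0.5 * (y[:, None] - pred) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    h = g * lik
    h /= h.sum(axis=1, keepdims=True)
    # M-step: weighted least squares for each expert ...
    for k in range(K):
        Xw = X * h[:, k:k + 1]
        W[k] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(d), Xw.T @ y)
    sigma2 = (h * (y[:, None] - X @ W.T) ** 2).sum() / n
    # ... and a few gradient steps on the gating weights (a generalized-EM
    # shortcut standing in for the IRLS fit of the multinomial-logit gate).
    for _ in range(10):
        g = softmax(X @ V.T)
        V += 0.5 * ((h - g).T @ X) / n

print("expert weights (slope, intercept):")
print(np.round(W, 2))
```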
Neural Computation (1991) 3 (1): 79–87.
Published: 01 March 1991
Abstract
We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link between these two apparently different approaches. We demonstrate that the learning procedure divides up a vowel discrimination task into appropriate subtasks, each of which can be solved by a very simple expert network.
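A minimal gradient-based sketch of the modular idea (a stand-in, not the paper's exact procedure or error function): two linear "experts" and a softmax gating network are trained jointly by gradient ascent on the log-likelihood of a gaussian mixture of the experts' predictions, on synthetic piecewise-linear data standing in for the vowel task. Each expert is pulled toward the targets in proportion to its posterior responsibility, which is the soft competitive-learning reading mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data with two regimes (an illustrative stand-in for the vowel task).
n = 2000
x = rng.uniform(-2, 2, n)
y = np.where(x < 0, 2.0 * x + 1.0, -x - 1.0) + rng.normal(0, 0.1, n)
X = np.column_stack([x, np.ones(n)])

W = np.array([[1.0, 0.0], [-1.0, 0.0]])       # two linear experts (asymmetric start)
V = np.zeros((2, 2))                          # gating network (softmax over experts)
lr = 0.1

for epoch in range(1000):
    logits = X @ V.T
    g = np.exp(logits - logits.max(axis=1, keepdims=True))
    g /= g.sum(axis=1, keepdims=True)                     # gating probabilities
    pred = X @ W.T                                        # each expert's prediction
    lik = np.exp(-0.5 * (y[:, None] - pred) ** 2)
    h = g * lik
    h /= h.sum(axis=1, keepdims=True)                     # posterior responsibilities
    # Gradient ascent on log sum_j g_j exp(-0.5 (y - pred_j)^2): each expert is
    # pulled toward the targets in proportion to its responsibility (soft
    # competition), and the gate learns to select the responsible expert.
    W += lr / n * ((h * (y[:, None] - pred)).T @ X)
    V += lr / n * ((h - g).T @ X)

print("expert weights (slope, intercept):")
print(np.round(W, 2))
```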