1-2 of 2
Howard Hua Yang
Journal Articles
Publisher: Journals Gateway
Neural Computation (1998) 10 (8): 2137–2157.
Published: 15 November 1998
Complexity Issues in Natural Gradient Descent Method for Training Multilayer Perceptrons
Abstract
The natural gradient descent method is applied to train an n-m-1 multilayer perceptron. Based on an efficient scheme to represent the Fisher information matrix for an n-m-1 stochastic multilayer perceptron, a new algorithm is proposed to calculate the natural gradient without inverting the Fisher information matrix explicitly. When the input dimension n is much larger than the number of hidden neurons m, the time complexity of computing the natural gradient is O(n).
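To make the contrast with explicit Fisher-matrix inversion concrete, here is a minimal sketch of a naive natural-gradient step for an n-m-1 perceptron. It forms and solves the empirical Fisher system directly, which costs O(p^3) in the parameter count p; that is precisely the cost the paper's representation of the Fisher matrix avoids. The function names, the Gaussian output model, and the damping term are illustrative assumptions, not the paper's notation or algorithm.

```python
# Minimal sketch (NOT the paper's O(n) scheme): a damped natural-gradient step
# for a tiny n-m-1 MLP, estimating the Fisher matrix from per-sample score
# vectors and solving F * delta = grad instead of forming F^{-1} explicitly.
import numpy as np

def mlp_forward(x, W, v):
    # n-m-1 network: tanh hidden layer (W: m x n) and linear output (v: m)
    h = np.tanh(W @ x)
    return v @ h, h

def natural_gradient_step(X, y, W, v, lr=0.1, sigma2=1.0, damping=1e-3):
    """One damped natural-gradient update under an assumed Gaussian output model.

    X: (N, n) inputs, y: (N,) targets. Parameters are flattened into
    theta = [vec(W), v]; this naive version costs O(p^3) in the parameter
    count p = m*n + m, which is what the paper's scheme avoids.
    """
    m, n = W.shape
    p = m * n + m
    grads = np.zeros((len(X), p))
    for i, (x, t) in enumerate(zip(X, y)):
        out, h = mlp_forward(x, W, v)
        err = (out - t) / sigma2
        dh = (1.0 - h ** 2) * v            # backprop through tanh hidden layer
        gW = err * np.outer(dh, x)         # gradient w.r.t. W, shape (m, n)
        gv = err * h                       # gradient w.r.t. v, shape (m,)
        grads[i] = np.concatenate([gW.ravel(), gv])
    g = grads.mean(axis=0)                               # ordinary gradient
    F = grads.T @ grads / len(X) + damping * np.eye(p)   # empirical Fisher matrix
    delta = np.linalg.solve(F, g)                        # natural-gradient direction
    W -= lr * delta[:m * n].reshape(m, n)
    v -= lr * delta[m * n:]
    return W, v
```

The point of the sketch is the bottleneck it exhibits: building and solving the p x p Fisher system dominates the cost, whereas the paper's block representation brings the per-step cost down to O(n) when n is much larger than m.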
Journal Articles
Publisher: Journals Gateway
Neural Computation (1997) 9 (7): 1457–1482.
Published: 10 July 1997
Adaptive Online Learning Algorithms for Blind Separation: Maximum Entropy and Minimum Mutual Information
Abstract
There are two major approaches to blind separation: maximum entropy (ME) and minimum mutual information (MMI). Both can be implemented by stochastic gradient descent to obtain the demixing matrix. The mutual information is the contrast function for blind separation; the entropy is not. To justify the ME approach, the relation between ME and MMI is first elucidated by calculating the first derivative of the entropy and proving that mean subtraction is necessary in applying the ME and that, at the solution points determined by the MMI, the ME will not update the demixing matrix in directions that increase the cross-talk. Second, the natural gradient is introduced in place of the ordinary gradient to obtain efficient algorithms, because the parameter space is a Riemannian space consisting of matrices. The mutual information is calculated by applying the Gram-Charlier expansion to approximate the probability density functions of the outputs. Finally, we propose an efficient learning algorithm that incorporates an adaptive method for estimating the unknown cumulants. Computer simulations show that the convergence of the stochastic descent algorithms is improved by using the natural gradient and the adaptively estimated cumulants.
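As an illustration of the update the abstract describes, here is a minimal sketch of the natural-gradient blind-separation rule W <- W + lr * (I - phi(y) y^T) W. The fixed cubic nonlinearity, the learning rate, and all names are assumptions for illustration; in the paper the score function phi comes from a Gram-Charlier approximation with adaptively estimated cumulants rather than a fixed form.

```python
# Minimal sketch of the natural-gradient blind-separation update.
# A fixed cubic nonlinearity phi(y) = y**3 (appropriate for sub-Gaussian
# sources) stands in for the paper's adaptively estimated cumulant-based score.
import numpy as np

def natural_gradient_separation(X, lr=0.002, n_epochs=50, seed=0):
    """Estimate a demixing matrix W for instantaneous mixtures X of shape (N, k)."""
    rng = np.random.default_rng(seed)
    N, k = X.shape
    X = X - X.mean(axis=0)                # mean subtraction, which the paper argues is necessary
    W = np.eye(k) + 0.01 * rng.standard_normal((k, k))
    I = np.eye(k)
    for _ in range(n_epochs):
        for x in X:
            y = W @ x
            phi = y ** 3                  # fixed stand-in for the adaptive score function
            W += lr * (I - np.outer(phi, y)) @ W   # natural-gradient update
    return W

if __name__ == "__main__":
    # Usage: recover two sub-Gaussian sources from a fixed instantaneous mixture.
    rng = np.random.default_rng(1)
    s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(5000, 2))  # unit-variance uniform sources
    A = np.array([[1.0, 0.6], [0.5, 1.0]])                        # mixing matrix
    X = s @ A.T
    W = natural_gradient_separation(X)
    print(W @ A)   # should approach a scaled permutation matrix if separation succeeded
```

Because the update multiplies the correction by W on the right, it follows the Riemannian (natural-gradient) geometry of the matrix parameter space mentioned in the abstract, rather than the ordinary Euclidean gradient.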