Danilo P. Mandic: 1-8 of 8 journal articles
Neural Computation (2024) 36 (9): 1912–1938.
Published: 19 August 2024
Abstract
Adam-type algorithms have become a preferred choice for optimization in the deep learning setting; however, despite their success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type algorithms, termed UAdam. It is equipped with a general form of the second-order moment, which makes it possible to include Adam and its existing and future variants as special cases, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan. The approach is supported by a rigorous convergence analysis of UAdam in the general nonconvex stochastic setting, showing that UAdam converges to the neighborhood of stationary points with a rate of O(1/T). Furthermore, the size of the neighborhood decreases as the parameter β₁ increases. Importantly, our analysis only requires the first-order momentum factor to be close enough to 1, without any restrictions on the second-order momentum factor. Theoretical results also reveal the convergence conditions of vanilla Adam, together with the selection of appropriate hyperparameters. This provides a theoretical guarantee for the analysis, applications, and further developments of the whole general class of Adam-type algorithms. Finally, several numerical experiments are provided to support our theoretical findings.
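To connect the abstract above to code, the following is a minimal sketch of a generic Adam-type update in which the second-order moment is left as a pluggable function; the function and parameter names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def uadam_step(w, grad, m, v, t, second_moment_fn,
               lr=1e-3, beta1=0.9, eps=1e-8):
    """One generic Adam-type update (illustrative sketch only).

    second_moment_fn(v, grad) returns the updated second-order moment;
    different choices recover Adam, AMSGrad, AdaBound, and related variants.
    """
    m = beta1 * m + (1.0 - beta1) * grad       # first-order momentum
    v = second_moment_fn(v, grad)              # general second-order moment
    m_hat = m / (1.0 - beta1 ** t)             # Adam-style bias correction
    w = w - lr * m_hat / (np.sqrt(v) + eps)    # parameter update
    return w, m, v

# Adam-like choice of the second-order moment (beta2 = 0.999 assumed)
adam_moment = lambda v, g: 0.999 * v + 0.001 * g ** 2
```

In this family, the convergence result summarized above asks only that the first-order momentum factor beta1 be close enough to 1, while the choice of second_moment_fn is what distinguishes the individual variants.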
Neural Computation (2023) 35 (8): 1404–1429.
Published: 12 July 2023
Abstract
Modern data analytics applications are increasingly characterized by exceedingly large and multidimensional data sources. This represents a challenge for traditional machine learning models, as the number of model parameters needed to process such data grows exponentially with the data dimensions, an effect known as the curse of dimensionality. Recently, tensor decomposition (TD) techniques have shown promising results in reducing the computational costs associated with large-dimensional models while achieving comparable performance. However, such tensor models are often unable to incorporate the underlying domain knowledge when compressing high-dimensional models. To this end, we introduce a novel graph-regularized tensor regression (GRTR) framework, whereby domain knowledge about intramodal relations is incorporated into the model in the form of a graph Laplacian matrix. This is then used as a regularization tool to promote a physically meaningful structure within the model parameters. By virtue of tensor algebra, the proposed framework is shown to be fully interpretable, both coefficient-wise and dimension-wise. The GRTR model is validated in a multiway regression setting, compared against competing models, and shown to achieve improved performance at reduced computational costs. Detailed visualizations are provided to help readers gain an intuitive understanding of the employed tensor operations.
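As a rough, non-tensor illustration of the graph-Laplacian regularization idea described above, the sketch below adds a penalty w^T L w to an ordinary least-squares problem; the variable names and the toy chain graph are assumptions for illustration only.

```python
import numpy as np

def laplacian_regularized_ls(X, y, L, lam=1.0):
    """Solve  min_w ||y - X w||^2 + lam * w^T L w  in closed form,
    where L is a graph Laplacian encoding known relations among features
    (a simplified, matrix-valued analogue of graph-regularized regression)."""
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Toy example: a chain graph over three features
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # adjacency matrix
L = np.diag(A.sum(axis=1)) - A           # graph Laplacian, L = D - A
X = np.random.randn(50, 3)
y = X @ np.array([1.0, 1.0, 0.0]) + 0.01 * np.random.randn(50)
w_hat = laplacian_regularized_ls(X, y, L)
```

For a binary adjacency, w^T L w equals the sum of (w_i - w_j)^2 over connected pairs, so coefficients linked in the graph are encouraged to take similar values, which is one way a graph can promote physically meaningful structure in the parameters.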
Neural Computation (2008) 20 (4): 1042–1064.
Published: 01 April 2008
Abstract
A homomorphic feedforward network (HFFN) for nonlinear adaptive filtering is introduced. This is achieved by a two-layer feedforward architecture with an exponential hidden layer and a logarithmic preprocessing step. This way, the overall input-output relationship can be seen as a generalized Volterra model, or as a bank of homomorphic filters. Gradient-based learning for this architecture is introduced, together with some practical issues related to the choice of optimal learning parameters and weight initialization. The performance and convergence speed are verified by analysis and extensive simulations. For rigor, the simulations are conducted on artificial and real-life data, and the performances are compared against those obtained by a sigmoidal feedforward network (FFN) with identical topology. The proposed HFFN is shown to be a viable alternative to FFNs, especially in the critical case of online learning on small- and medium-scale data sets.
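A minimal sketch of the forward pass described above (logarithmic preprocessing followed by an exponential hidden layer), assuming positive-valued inputs; the names and shapes are illustrative.

```python
import numpy as np

def hffn_forward(x, W1, w2, eps=1e-12):
    """Homomorphic feedforward network forward pass (sketch):
    log preprocessing -> linear hidden weights -> exponential activation
    -> linear output.  Each hidden unit computes a product of powers of the
    inputs, so the output is a generalized Volterra-like expansion."""
    z = np.log(np.maximum(x, eps))   # logarithmic preprocessing (inputs > 0)
    h = np.exp(W1 @ z)               # exponential hidden layer
    return w2 @ h                    # linear output neuron

x = np.array([0.5, 1.2, 0.8])
W1 = 0.1 * np.random.randn(4, 3)
w2 = 0.1 * np.random.randn(4)
y = hffn_forward(x, W1, w2)
```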
Neural Computation (2007) 19 (4): 1039–1055.
Published: 01 April 2007
Abstract
An augmented complex-valued extended Kalman filter (ACEKF) algorithm for the class of nonlinear adaptive filters realized as fully connected recurrent neural networks is introduced. This is achieved based on some recent developments in the so-called augmented complex statistics and the use of general fully complex nonlinear activation functions within the neurons. This makes the ACEKF suitable for processing general complex-valued nonlinear and nonstationary signals and also bivariate signals with strong component correlations. Simulations on benchmark and real-world complex-valued signals support the approach.
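The augmented complex statistics mentioned above can be sketched as follows: the signal vector is stacked with its conjugate, so that both the covariance and the pseudocovariance enter the processing. This illustrates only the statistics, not the full EKF recursion, and the names are assumptions.

```python
import numpy as np

def augment(z):
    """Augmented complex vector z_a = [z; conj(z)]."""
    return np.concatenate([z, np.conj(z)])

def augmented_covariance(Z):
    """Augmented covariance of zero-mean complex samples (columns of Z):
        C_a = [[C, P], [conj(P), conj(C)]]
    with covariance C = E[z z^H] and pseudocovariance P = E[z z^T].
    P is nonzero for noncircular (improper) signals, e.g. bivariate signals
    with strongly correlated components."""
    n = Z.shape[1]
    C = Z @ Z.conj().T / n
    P = Z @ Z.T / n
    return np.block([[C, P], [P.conj(), C.conj()]])
```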
Neural Computation (2004) 16 (12): 2699–2713.
Published: 01 December 2004
Abstract
A complex-valued real-time recurrent learning (CRTRL) algorithm for the class of nonlinear adaptive filters realized as fully connected recurrent neural networks is introduced. The proposed CRTRL is derived for a general complex activation function of a neuron, which makes it suitable for nonlinear adaptive filtering of complex-valued nonlinear and nonstationary signals and complex signals with strong component correlations. In addition, this algorithm is generic and represents a natural extension of the real-valued RTRL. Simulations on benchmark and real-world complex-valued signals support the approach.
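A minimal sketch of the filter structure that CRTRL trains: a fully connected recurrent layer with complex weights and a fully complex activation function. The RTRL weight-update recursion itself is omitted, and the names and dimensions are illustrative.

```python
import numpy as np

def frnn_step(x_ext, y_prev, W, act=np.tanh):
    """One forward step of a fully connected recurrent neural network with
    complex weights: every neuron sees a bias, the external complex inputs,
    and the one-step-delayed outputs of all neurons.  np.tanh applied to a
    complex array acts as a fully complex activation function."""
    u = np.concatenate([np.array([1.0 + 0.0j]), x_ext, y_prev])
    return act(W @ u)

# Toy dimensions: 2 external inputs, 3 neurons (input vector length 1 + 2 + 3)
x_ext = np.array([0.3 + 0.1j, -0.2 + 0.4j])
y_prev = np.zeros(3, dtype=complex)
W = 0.1 * (np.random.randn(3, 6) + 1j * np.random.randn(3, 6))
y = frnn_step(x_ext, y_prev, W)
```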
Neural Computation (2002) 14 (11): 2693–2707.
Published: 01 November 2002
Abstract
A class of data-reusing learning algorithms for real-time recurrent neural networks (RNNs) is analyzed. The analysis is undertaken for a general sigmoid nonlinear activation function of a neuron within the real-time recurrent learning (RTRL) training algorithm. Error bounds and convergence conditions for such data-reusing algorithms are provided for both contractive and expansive activation functions. The analysis covers various configurations that are generalizations of a linear-structure infinite impulse response (IIR) adaptive filter.
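The data-reusing idea analyzed above can be sketched for a single sigmoid neuron: the same input-target pair is reused for L gradient updates within one sampling interval. The article treats recurrent and IIR-like configurations; this fragment only illustrates the reuse loop, with assumed names and parameters.

```python
import numpy as np

def sigmoid(v, beta=1.0):
    return 1.0 / (1.0 + np.exp(-beta * v))

def data_reusing_update(w, x, d, eta=0.1, beta=1.0, L=3):
    """Reuse the same (x, d) pair for L gradient updates before the next
    sample arrives.  For a contractive activation the successive a posteriori
    errors shrink towards a fixed point (sketch, not the paper's analysis)."""
    for _ in range(L):
        y = sigmoid(w @ x, beta)
        e = d - y                                     # instantaneous output error
        w = w + eta * e * beta * y * (1.0 - y) * x    # gradient step on reused data
    return w
```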
Neural Computation (2000) 12 (6): 1285–1292.
Published: 01 June 2000
Abstract
The lower bounds for the a posteriori prediction error of a nonlinear predictor realized as a neural network are provided. These are obtained for a priori adaptation and a posteriori error networks with sigmoid nonlinearities trained by gradient-descent learning algorithms. A contractivity condition is imposed on a nonlinear activation function of a neuron so that the a posteriori prediction error is smaller in magnitude than the corresponding a priori one. Furthermore, an upper bound is imposed on the learning rate η so that the approach is feasible. The analysis is undertaken for both feedforward and recurrent nonlinear predictors realized as neural networks.
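For orientation, the two error quantities and the role of the contractivity condition can be written out as follows; the notation is assumed for illustration, and the precise bounds on η are given in the article.

```latex
% a priori and a posteriori prediction errors of a nonlinear neuron
% (illustrative notation, not copied from the paper)
e(k)       = d(k) - \Phi\big(\mathbf{x}^{T}(k)\,\mathbf{w}(k)\big)     % before the weight update
\bar{e}(k) = d(k) - \Phi\big(\mathbf{x}^{T}(k)\,\mathbf{w}(k+1)\big)   % after the weight update
% If \Phi is a contraction, |\Phi(a)-\Phi(b)| \le \gamma\,|a-b| with \gamma < 1,
% and the learning rate \eta stays below the derived upper bound, then
|\bar{e}(k)| < |e(k)|
```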
Neural Computation (1999) 11 (5): 1069–1077.
Published: 01 July 1999
Abstract
A relationship between the learning rate η in the learning algorithm, and the slope β in the nonlinear activation function, for a class of recurrent neural networks (RNNs) trained by the real-time recurrent learning algorithm is provided. It is shown that an arbitrary RNN can be obtained via the referent RNN, with some deterministic rules imposed on its weights and the learning rate. Such relationships reduce the number of degrees of freedom when solving the nonlinear optimization task of finding the optimal RNN parameters.
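The flavor of such a relationship can be sketched as follows; this is an illustrative statement of the slope/weight/learning-rate trade-off under assumed notation, and the exact deterministic rules for the recurrent case are those derived in the article.

```latex
% An RNN with activation slope \beta, weights W, and learning rate \eta
% can be traded for a unit-slope "referent" RNN with rescaled quantities:
\Phi_{\beta}(v) = \Phi_{1}(\beta v)
\quad\Longrightarrow\quad
\mathbf{W}_{R} = \beta\,\mathbf{W},
\qquad
\eta_{R} = \beta^{2}\,\eta
% so one of \beta, \eta can be fixed without loss of generality, reducing the
% number of free parameters in the nonlinear optimization.
```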