S. Sundararajan
1–5 of 5 results
Journal Articles
Publisher: Journals Gateway
Neural Computation (2018) 30 (6): 1673–1724.
Published: 01 June 2018
Abstract
Deep learning involves a difficult nonconvex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian is expensive. In particular, the communication and the synchronization cost may become a bottleneck. In this letter, we focus on situations where the model is distributedly stored and propose a novel distributed Newton method for training deep neural networks. By variable and feature-wise data partitions and some careful designs, we are able to explicitly use the Jacobian matrix for matrix-vector products in the Newton method. Some techniques are incorporated to reduce the running time as well as memory consumption. First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, we consider subsampled Gauss-Newton matrices for reducing the running time as well as the communication cost. Third, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even though some nodes have not finished their tasks. Details of some implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks. Compared with stochastic gradient methods, it is more robust and may give better test accuracy.
Includes: Supplementary data
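The central computational step described in this abstract is obtaining an approximate Newton direction from a subsampled Gauss-Newton matrix without ever forming that matrix, relying only on Jacobian-style matrix-vector products. Below is a minimal sketch of that idea under strong simplifying assumptions: a plain linear least-squares model stands in for the deep network, a single machine replaces the distributed setting, and names such as gauss_newton_matvec are hypothetical.

```python
# Minimal sketch (not the paper's implementation): an approximate Newton step
# using a *subsampled* Gauss-Newton matrix-vector product, solved with
# conjugate gradient so G = J_S^T J_S / |S| + lam*I is never formed.
# A linear least-squares model is used purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lam = 1e-3                                   # damping / regularization
S = rng.choice(n, size=200, replace=False)   # subsample used for the GN matrix
XS = X[S]

def gradient(w):
    # full gradient of 0.5/n * ||Xw - y||^2
    return X.T @ (X @ w - y) / n

def gauss_newton_matvec(v):
    # G v = J_S^T (J_S v) / |S| + lam * v, with J_S = XS for this toy model
    return XS.T @ (XS @ v) / len(S) + lam * v

def conjugate_gradient(matvec, b, iters=50, tol=1e-8):
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

direction = conjugate_gradient(gauss_newton_matvec, -gradient(w))
w = w + direction            # in practice a line search would choose the step
print("residual norm:", np.linalg.norm(X @ w - y) / np.sqrt(n))
```

In the letter's setting, the Jacobian-vector products come from a deep network whose variables and features are partitioned across machines, which is where the proposed diagonalization and early-termination techniques matter.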
Journal Articles
Publisher: Journals Gateway
Neural Computation (2015) 27 (10): 2183–2206.
Published: 01 October 2015
Abstract
In structured output learning, obtaining labeled data for real-world applications is usually costly, while unlabeled examples are available in abundance. Semisupervised structured classification deals with a small number of labeled examples and a large amount of unlabeled structured data. In this work, we consider semisupervised structural support vector machines with domain constraints. The optimization problem, which in general is not convex, contains the loss terms associated with the labeled and unlabeled examples, along with the domain constraints. We propose a simple optimization approach that alternates between solving a supervised learning problem and a constraint matching problem. Solving the constraint matching problem is difficult for structured prediction, and we propose an efficient and effective label switching method to solve it. The alternating optimization is carried out within a deterministic annealing framework, which helps in effective constraint matching and in avoiding poor local minima. The algorithm is simple and easy to implement. Further, it is suitable for any structured output learning problem where exact inference is available. Experiments on benchmark sequence labeling data sets and a natural language parsing data set show that the proposed approach, though simple, achieves comparable generalization performance.
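As a rough illustration of the alternating scheme in this abstract, the sketch below alternates a supervised fit with a re-labeling of the unlabeled points while a deterministic-annealing temperature is lowered so assignments harden gradually. It is not the paper's algorithm: a plain binary linear classifier stands in for the structural SVM, the label-switching constraint-matching step is reduced to temperature-scaled soft assignments, and all function names are hypothetical.

```python
# Schematic sketch only: alternate (i) a supervised fit on labeled data plus
# current soft labels for unlabeled data with (ii) re-assignment of those soft
# labels, while the annealing temperature T is lowered.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_weighted(X, y, sample_weight, lam=1e-2, lr=0.1, epochs=200):
    # weighted logistic regression by gradient descent (soft targets allowed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        grad = X.T @ (sample_weight * (p - y)) / len(y) + lam * w
        w -= lr * grad
    return w

def assign_soft_labels(X_unlab, w, T):
    # soft labels at temperature T; T -> 0 recovers hard 0/1 assignments
    return sigmoid((X_unlab @ w) / T)

rng = np.random.default_rng(1)
X_lab = rng.normal(size=(40, 5));  y_lab = (X_lab[:, 0] > 0).astype(float)
X_unl = rng.normal(size=(400, 5))

w = fit_weighted(X_lab, y_lab, np.ones(len(y_lab)))
for T in [4.0, 2.0, 1.0, 0.5, 0.25]:          # annealing schedule
    y_soft = assign_soft_labels(X_unl, w, T)  # simplified "constraint matching"
    X_all = np.vstack([X_lab, X_unl])
    y_all = np.concatenate([y_lab, y_soft])
    weight = np.concatenate([np.ones(len(y_lab)), 0.3 * np.ones(len(y_soft))])
    w = fit_weighted(X_all, y_all, weight)    # supervised step
print("learned weights:", np.round(w, 2))
```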
Journal Articles
Publisher: Journals Gateway
Neural Computation (2009) 21 (7): 2082–2103.
Published: 01 July 2009
Abstract
Gaussian processes (GPs) are promising Bayesian methods for classification and regression problems. Designing a GP classifier and making predictions with it are, however, computationally demanding, especially when the training set size is large. Sparse GP classifiers are known to overcome this limitation. In this letter, we propose and study a validation-based method for sparse GP classifier design. The proposed method uses a negative log predictive (NLP) loss measure, which is easy to compute for GP models. We use this measure for both basis vector selection and hyperparameter adaptation. The experimental results on several real-world benchmark data sets show better or comparable generalization performance relative to existing methods.
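The negative log predictive (NLP) loss mentioned above scores how well the model's predictive distribution explains held-out labels. A hedged sketch of such a measure is given below for GP regression with an RBF kernel, where the predictive density is available in closed form; the letter itself applies the idea to GP classification and to basis vector selection, which involves approximate inference not shown here.

```python
# Rough sketch: validation NLP loss, -1/m * sum log p(y_* | x_*, data), for a
# GP with an RBF kernel and Gaussian likelihood (closed-form predictive).
import numpy as np
from scipy.spatial.distance import cdist

def rbf(A, B, lengthscale=1.0, variance=1.0):
    return variance * np.exp(-0.5 * cdist(A, B, "sqeuclidean") / lengthscale**2)

def nlp_loss(X_tr, y_tr, X_val, y_val, noise=0.1):
    K = rbf(X_tr, X_tr) + noise**2 * np.eye(len(X_tr))
    Ks = rbf(X_tr, X_val)
    Kss = rbf(X_val, X_val)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    mean = Ks.T @ alpha
    V = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(V**2, axis=0) + noise**2
    # negative log predictive density, averaged over the validation points
    return np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (y_val - mean)**2 / var)

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=120)
print("validation NLP:", nlp_loss(X[:80], y[:80], X[80:], y[80:]))
```

In a validation-based design loop, a candidate basis vector or hyperparameter setting would be kept only if it lowers this kind of score.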
Journal Articles
Publisher: Journals Gateway
Neural Computation (2007) 19 (1): 283–301.
Published: 01 January 2007
Abstract
We propose a fast, incremental algorithm for designing linear regression models. The proposed algorithm generates a sparse model by optimizing multiple smoothing parameters using the generalized cross-validation approach. Its performance on synthetic and real-world data sets is compared with that of other incremental algorithms, such as Tipping and Faul's fast relevance vector machine, Chen et al.'s orthogonal least squares, and Orr's regularized forward selection. The results demonstrate that the proposed algorithm is competitive.
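For context, the generalized cross-validation criterion used by this family of incremental methods can be written GCV = n · RSS / (n − df)^2, where df is the trace of the hat matrix. The sketch below is a small, hypothetical forward-selection loop scored by that criterion, not the paper's algorithm, to show how the criterion drives incremental model building.

```python
# Illustrative sketch: greedy forward selection of regressors scored by
# generalized cross-validation, GCV = n * RSS / (n - df)^2.
import numpy as np

def gcv_score(X, y, cols, lam=1e-6):
    A = X[:, cols]
    n = len(y)
    G = A.T @ A + lam * np.eye(len(cols))
    w = np.linalg.solve(G, A.T @ y)
    resid = y - A @ w
    df = np.trace(A @ np.linalg.solve(G, A.T))   # effective degrees of freedom
    return n * (resid @ resid) / (n - df) ** 2

def forward_select(X, y, max_terms=10):
    chosen, best = [], np.inf
    for _ in range(max_terms):
        scores = {j: gcv_score(X, y, chosen + [j])
                  for j in range(X.shape[1]) if j not in chosen}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:        # stop when GCV no longer improves
            break
        best = scores[j_best]
        chosen.append(j_best)
    return chosen

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 30))
y = 2 * X[:, 3] - X[:, 17] + 0.1 * rng.normal(size=200)
print("selected columns:", forward_select(X, y))
```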
Journal Articles
Publisher: Journals Gateway
Neural Computation (2001) 13 (5): 1103–1118.
Published: 01 May 2001
Abstract
Gaussian processes are powerful regression models specified by parameterized mean and covariance functions. Standard approaches to choosing these parameters (known as hyperparameters) are maximum likelihood and maximum a posteriori. In this article, we propose and investigate predictive approaches based on Geisser's predictive sample reuse (PSR) methodology and Stone's related cross-validation (CV) methodology. More specifically, we derive results for Geisser's surrogate predictive probability (GPP), Geisser's predictive mean square error (GPE), and the standard CV error and make a comparative study. Within an approximation, we arrive at generalized cross-validation (GCV) and establish its relationship with the GPP and GPE approaches. These approaches are tested on a number of problems. Experimental results show that they are strongly competitive with existing methods.
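The predictive criteria discussed here rest on leave-one-out quantities that are available in closed form for GP regression: with C = (K + σ²I)⁻¹, the held-out predictive mean and variance for point i are μᵢ = yᵢ − (Cy)ᵢ/Cᵢᵢ and σᵢ² = 1/Cᵢᵢ. Below is a hedged sketch, under an assumed RBF kernel and a simple grid search, of choosing a lengthscale by such a predictive criterion; it is meant to illustrate the idea, not to reproduce the article's derivations.

```python
# Sketch: closed-form leave-one-out predictive quantities for GP regression,
# used to score hyperparameters (a GPP-style log predictive criterion and a
# CV mean-square error).
import numpy as np
from scipy.spatial.distance import cdist

def rbf(A, B, lengthscale=1.0, variance=1.0):
    return variance * np.exp(-0.5 * cdist(A, B, "sqeuclidean") / lengthscale**2)

def loo_criteria(X, y, lengthscale, variance, noise):
    C = np.linalg.inv(rbf(X, X, lengthscale, variance) + noise**2 * np.eye(len(y)))
    Cy, Cii = C @ y, np.diag(C)
    mu = y - Cy / Cii                      # leave-one-out predictive means
    var = 1.0 / Cii                        # leave-one-out predictive variances
    gpp = np.mean(-0.5 * np.log(2 * np.pi * var) - 0.5 * (y - mu) ** 2 / var)
    cv_mse = np.mean((y - mu) ** 2)        # CV / predictive mean-square error
    return gpp, cv_mse

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
# pick the lengthscale that maximizes the log predictive criterion
best = max([0.2, 0.5, 1.0, 2.0], key=lambda ls: loo_criteria(X, y, ls, 1.0, 0.1)[0])
print("lengthscale chosen by the predictive criterion:", best)
```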