Skip Nav Destination
Close Modal
Update search
NARROW
Format
Journal
TocHeadingTitle
Date
Availability
1-5 of 5
S. Sathiya Keerthi
Close
Follow your search
Access your saved searches in your account
Would you like to receive an alert when new items match your search?
Sort by
Journal Articles
Publisher: Journals Gateway
Neural Computation (2018) 30 (6): 1673–1724.
Published: 01 June 2018
FIGURES
| View All (5)
Abstract
View article
PDF
Deep learning involves a difficult nonconvex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian is expensive. In particular, the communication and the synchronization cost may become a bottleneck. In this letter, we focus on situations where the model is distributedly stored and propose a novel distributed Newton method for training deep neural networks. By variable and feature-wise data partitions and some careful designs, we are able to explicitly use the Jacobian matrix for matrix-vector products in the Newton method. Some techniques are incorporated to reduce the running time as well as memory consumption. First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, we consider subsampled Gauss-Newton matrices for reducing the running time as well as the communication cost. Third, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even though some nodes have not finished their tasks. Details of some implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks. Compared with stochastic gradient methods, it is more robust and may give better test accuracy.
Includes: Supplementary data
Journal Articles
Publisher: Journals Gateway
Neural Computation (2007) 19 (3): 792–815.
Published: 01 March 2007
Abstract
View article
PDF
In this letter, we propose two new support vector approaches for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. The size of these optimization problems is linear in the number of training samples. The sequential minimal optimization algorithm is adapted for the resulting optimization problems; it is extremely easy to implement and scales efficiently as a quadratic function of the number of examples. The results of numerical experiments on some benchmark and real-world data sets, including applications of ordinal regression to information retrieval, verify the usefulness of these approaches.
Journal Articles
Publisher: Journals Gateway
Neural Computation (2007) 19 (1): 283–301.
Published: 01 January 2007
Abstract
View article
PDF
We propose a fast, incremental algorithm for designing linear regression models. The proposed algorithm generates a sparse model by optimizing multiple smoothing parameters using the generalized cross-validation approach. The performances on synthetic and real-world data sets are compared with other incremental algorithms such as Tipping and Faul's fast relevance vector machine, Chen et al.'s orthogonal least squares, and Orr's regularized forward selection. The results demonstrate that the proposed algorithm is competitive.
Journal Articles
Publisher: Journals Gateway
Neural Computation (2003) 15 (9): 2227–2254.
Published: 01 September 2003
Abstract
View article
PDF
This letter describes Bayesian techniques for support vector classification. In particular, we propose a novel differentiable loss function, called the trigonometric loss function, which has the desirable characteristic of natural normalization in the likelihood function, and then follow standard gaussian processes techniques to set up a Bayesian framework. In this framework, Bayesian inference is used to implement model adaptation, while keeping the merits of support vector classifier, such as sparseness and convex programming. This differs from standard gaussian processes for classification. Moreover, we put forward class probability in making predictions. Experimental results on benchmark data sets indicate the usefulness of this approach.
Journal Articles
Publisher: Journals Gateway
Neural Computation (2003) 15 (7): 1667–1689.
Published: 01 July 2003
Abstract
View article
PDF
Support vector machines (SVMs) with the gaussian (RBF) kernel have been popular for practical use. Model selection in this class of SVMs involves two hyper parameters: the penalty parameter C and the kernel width σ. This letter analyzes the behavior of the SVM classifier when these hyper parameters take very small or very large values. Our results help in understanding the hyperparameter space that leads to an efficient heuristic method of searching for hyperparameter values with small generalization errors. The analysis also indicates that if complete model selection using the gaussian kernel has been conducted, there is no need to consider linear SVM.