1-3 of 3
Heng Huang
Journal Articles
Publisher: Journals Gateway
Neural Computation (2024) 36 (5): 897–935.
Published: 23 April 2024
Abstract
Zeroth-order (ZO) optimization is one key technique for machine learning problems where gradient calculation is expensive or impossible. Several variance-reduced ZO proximal algorithms have been proposed to speed up ZO optimization for nonsmooth problems, and all of them opted for the coordinated ZO estimator over the random ZO estimator when approximating the true gradient, since the former is more accurate. While the random ZO estimator introduces a larger error and makes convergence analysis more challenging compared to the coordinated ZO estimator, it requires only $O(1)$ computation, which is significantly less than the $O(d)$ computation of the coordinated ZO estimator, with $d$ being the dimension of the problem space. To take advantage of the computationally efficient nature of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of the convergence rate. Next, we propose two generic reduction frameworks for ZO optimization, which can automatically derive the convergence results for convex and nonconvex problems, respectively, as long as the convergence rate for the inner solver satisfies the ZOOD property. By applying the two reduction frameworks to our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from $O\left(\min\left\{\frac{d n^{1/2}}{\varepsilon^{2}}, \frac{d}{\varepsilon^{3}}\right\}\right)$ to $\tilde{O}\left(\frac{n+d}{\varepsilon^{2}}\right)$ under $d > n^{1/2}$ for nonconvex problems, and from $O\left(\frac{d}{\varepsilon^{2}}\right)$ to $\tilde{O}\left(n \log\frac{1}{\varepsilon} + \frac{d}{\varepsilon}\right)$ for convex problems. Finally, we conduct experiments to verify the superiority of our proposed methods.
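To make the cost gap between the two estimators concrete, here is a minimal sketch in standard-library Python. It uses the textbook forms of the two estimators (central differences per coordinate, and a forward difference along one random unit direction scaled by the dimension); these are assumptions for illustration and may differ in detail from the definitions used in the article. The names `coordinate_zo_gradient`, `random_zo_gradient`, `quadratic`, and the smoothing parameter `mu` are hypothetical.

```python
import math
import random


def coordinate_zo_gradient(f, x, mu=1e-4):
    """Central difference along each coordinate: 2 * d queries of f."""
    d = len(x)
    grad = [0.0] * d
    for i in range(d):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += mu
        x_minus[i] -= mu
        grad[i] = (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return grad


def random_zo_gradient(f, x, mu=1e-4, rng=random):
    """Forward difference along one random unit direction: 2 queries of f."""
    d = len(x)
    # A normalized Gaussian vector is uniformly distributed on the unit sphere.
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]
    x_shift = [xi + mu * ui for xi, ui in zip(x, u)]
    # The factor d compensates for E[u u^T] = I / d on the unit sphere.
    scale = d * (f(x_shift) - f(x)) / mu
    return [scale * ui for ui in u]


def quadratic(x):
    """Toy objective (an assumption for the demo); its gradient is 2 * x."""
    return sum(xi * xi for xi in x)
```

A single call to `random_zo_gradient` is noisy, so in practice many such estimates are averaged or combined with variance reduction; the number of function queries per call is independent of `d`, whereas the coordinate version grows linearly with `d`.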
Journal Articles
Publisher: Journals Gateway
Neural Computation (2017) 29 (12): 3381–3396.
Published: 01 December 2017
Abstract
Spectral clustering is a key research topic in the field of machine learning and data mining. Most of the existing spectral clustering algorithms are built on Gaussian Laplacian matrices, which are sensitive to parameters. We propose a novel parameter-free distance-consistent locally linear embedding (LLE). The proposed distance-consistent LLE can promise that edges between closer data points are heavier. We also propose a novel improved spectral clustering via embedded label propagation. Our algorithm is built on two advancements of the state of the art. First is label propagation, which propagates a node's labels to neighboring nodes according to their proximity. We perform standard spectral clustering on the original data, assign each cluster with $k$-nearest data points, and then propagate labels through dense unlabeled data regions. Second is manifold learning, which has been widely used for its capacity to leverage the manifold structure of data points. Extensive experiments on various data sets validate the superiority of the proposed algorithm compared to state-of-the-art spectral algorithms.
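The label propagation step described above can be illustrated with a generic sketch. This is the classic clamped-seed propagation scheme on a weighted graph, not the embedded formulation proposed in the article; the function name `propagate_labels`, the dense `weights` matrix, and the `seeds` mapping are all hypothetical choices made for the example.

```python
def propagate_labels(weights, seeds, num_classes, iters=100):
    """Spread seed labels over a weighted graph.

    weights: symmetric n x n list of non-negative edge weights.
    seeds:   dict mapping node index -> known class label (kept fixed).
    Returns one predicted class label per node.
    """
    n = len(weights)
    scores = [[0.0] * num_classes for _ in range(n)]
    for node, label in seeds.items():
        scores[node][label] = 1.0

    for _ in range(iters):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if i in seeds or total == 0.0:
                # Seeds stay clamped; isolated nodes keep their scores.
                updated.append(scores[i][:])
                continue
            # Each unlabeled node takes the weighted average of its
            # neighbours' class scores, so heavier edges count for more.
            updated.append([
                sum(weights[i][j] * scores[j][c] for j in range(n)) / total
                for c in range(num_classes)
            ])
        scores = updated

    return [max(range(num_classes), key=lambda c: row[c]) for row in scores]
```

On a graph with two tightly connected groups joined by a weak edge, one seed per group is typically enough for every node to inherit the label of its own group, which is the behavior the abstract relies on when propagating through dense unlabeled regions.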
Journal Articles
Publisher: Journals Gateway
Neural Computation (2015) 27 (8): 1766–1795.
Published: 01 August 2015
Abstract
Newton methods can be applied in many supervised learning approaches. However, for large-scale data, the use of the whole Hessian matrix can be time-consuming. Recently, subsampled Newton methods have been proposed to reduce the computational time by using only a subset of data for calculating an approximation of the Hessian matrix. Unfortunately, we find that in some situations, the running speed is worse than the standard Newton method because cheaper but less accurate search directions are used. In this work, we propose some novel techniques to improve the existing subsampled Hessian Newton method. The main idea is to solve a two-dimensional subproblem per iteration to adjust the search direction to better minimize the second-order approximation of the function value. We prove the theoretical convergence of the proposed method. Experiments on logistic regression, linear SVM, maximum entropy, and deep networks indicate that our techniques significantly reduce the running time of the subsampled Hessian Newton method. The resulting algorithm becomes a compelling alternative to the standard Newton method for large-scale data classification.
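A rough sketch of what a two-dimensional subproblem of this kind can look like follows. The abstract does not specify which two directions are combined, so the sketch assumes they are the subsampled Newton direction `d1` and a second direction `d2` (for example, the previous step); that choice, the dense Hessian `H`, and the name `combine_directions` are assumptions made for illustration. The code minimizes the quadratic model g·s + ½ sᵀHs over s = β₁d₁ + β₂d₂, which reduces to a 2×2 linear system.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def combine_directions(grad, H, d1, d2):
    """Return (beta1, beta2) minimizing the quadratic model along span{d1, d2}.

    Assumes H is symmetric positive definite and d1 is nonzero.
    """
    Hd1 = matvec(H, d1)
    Hd2 = matvec(H, d2)
    a11, a12, a22 = dot(d1, Hd1), dot(d1, Hd2), dot(d2, Hd2)
    b1, b2 = -dot(grad, d1), -dot(grad, d2)

    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12 * max(1.0, a11 * a22):
        # Directions are (nearly) collinear: fall back to a 1-D step along d1.
        return b1 / a11, 0.0

    # Cramer's rule for [[a11, a12], [a12, a22]] @ [beta1, beta2] = [b1, b2].
    beta1 = (b1 * a22 - a12 * b2) / det
    beta2 = (a11 * b2 - a12 * b1) / det
    return beta1, beta2
```

The extra cost per iteration is two Hessian-vector products plus a handful of inner products, which is small compared with forming or inverting the full Hessian. When `d1` and `d2` span the whole space, the combined step coincides with the exact Newton step.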
Includes: Supplementary data