Alessandro Verri
1-3 of 3 results
Journal Articles
Publisher: Journals Gateway
Neural Computation (2014) 26 (12): 2855–2895.
Published: 01 December 2014
Abstract
We present an algorithm for dictionary learning based on the alternating proximal algorithm studied by Attouch, Bolte, Redont, and Soubeyran (2010), coupled with a reliable and efficient dual algorithm for the computation of the related proximity operators. The algorithm is suitable for a general dictionary learning model composed of a Bregman-type data fit term, which accounts for the goodness of the representation, and several convex penalization terms on the coefficients and atoms, which encode the prior knowledge at hand. As Attouch et al. recently proved, an alternating proximal scheme ensures better convergence properties than simple alternating minimization. We address the issue of inexactness in the computation of the involved proximity operators, giving a sound stopping criterion for the dual inner algorithm that keeps the related errors, unavoidable for such complex penalty terms, under control and ultimately yields an overall effective procedure. Thanks to the generality of the proposed framework, we give an application in the context of genome-wide data understanding, revising the model proposed by Nowak, Hastie, Pollack, and Tibshirani (2011). The aim is to extract latent features (atoms) and perform segmentation on array-based comparative genomic hybridization (aCGH) data. We improve several important aspects that increase the quality and interpretability of the results, and we show the effectiveness of the proposed model with two experiments on synthetic data, which highlight the enhancements over the original model.
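The abstract describes alternating proximal-type updates on the codes and on the atoms. Below is a minimal NumPy sketch of that idea for a simplified instance (squared-error fit, l1 penalty on the coefficients, unit-norm atoms) using proximal-gradient block updates; it is not the paper's algorithm, which handles a Bregman data term and computes general proximity operators inexactly with a dual inner solver. All function names, step sizes, and parameter choices are assumptions made for illustration.

```python
# Simplified sketch of alternating proximal-style dictionary learning:
# squared-error fit, l1 penalty on the codes, unit-ball constraint on atoms.
# Not the paper's exact (inexact dual) algorithm; illustrative only.
import numpy as np

def soft_threshold(Z, tau):
    """Proximity operator of tau * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def project_unit_columns(D):
    """Project each atom (column of D) onto the unit Euclidean ball."""
    norms = np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D / norms

def dictionary_learning(X, n_atoms, lam=0.1, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = project_unit_columns(rng.standard_normal((d, n_atoms)))
    C = np.zeros((n_atoms, n))
    for _ in range(n_iter):
        # Proximal-gradient step on the codes (l1 prox = soft-thresholding).
        step_c = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)
        grad_c = D.T @ (D @ C - X)
        C = soft_threshold(C - step_c * grad_c, step_c * lam)
        # Proximal-gradient step on the atoms (prox = projection onto unit balls).
        step_d = 1.0 / (np.linalg.norm(C, 2) ** 2 + 1e-12)
        grad_d = (D @ C - X) @ C.T
        D = project_unit_columns(D - step_d * grad_d)
    return D, C
```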
Journal Articles
Publisher: Journals Gateway
Neural Computation (2004) 16 (5): 1063–1076.
Published: 01 May 2004
Abstract
In this letter, we investigate the impact of choosing different loss functions from the viewpoint of statistical learning theory. We introduce a convexity assumption, which is met by all loss functions commonly used in the literature, and study how the bound on the estimation error changes with the loss. We also derive a general result on the minimizer of the expected risk for a convex loss function in the case of classification. The main outcome of our analysis is that for classification, the hinge loss appears to be the loss of choice. Other things being equal, the hinge loss leads to a convergence rate practically indistinguishable from the logistic loss rate and much better than the square loss rate. Furthermore, if the hypothesis space is sufficiently rich, the bounds obtained for the hinge loss are not loosened by the thresholding stage.
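As a quick illustration of the comparison above, the following snippet evaluates the hinge, logistic, and square losses as functions of the margin y f(x) for labels in {-1, +1}; the particular logistic-loss scaling and the margin grid are assumptions made for this sketch, not choices taken from the letter.

```python
# Compare three convex classification losses as functions of the margin y*f(x).
import numpy as np

def hinge_loss(margin):
    return np.maximum(0.0, 1.0 - margin)

def logistic_loss(margin):
    return np.log1p(np.exp(-margin))

def square_loss(margin):
    # For y in {-1, +1}, (y - f(x))^2 = (1 - y*f(x))^2.
    return (1.0 - margin) ** 2

for m in np.linspace(-2.0, 3.0, 11):
    print(f"margin={m:+.1f}  hinge={hinge_loss(m):.3f}  "
          f"logistic={logistic_loss(m):.3f}  square={square_loss(m):.3f}")
```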
Journal Articles
Publisher: Journals Gateway
Neural Computation (1998) 10 (4): 955–974.
Published: 15 May 1998
Abstract
Support vector machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed support vectors (SVs). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a quadratic programming problem that depends on a regularization parameter. In this article, we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small changes of the parameter. In the special but important case of a feature space of finite dimension m, we also show that there are at most m + 1 margin vectors and observe that m + 1 SVs are usually sufficient to determine the decision surface fully. For relatively small m, this latter result leads to a considerable reduction in the number of SVs.
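The following scikit-learn sketch fits a linear soft-margin SVM and splits its support vectors into margin vectors (dual coefficient strictly below the regularization parameter C, hence lying exactly on the margin) and bound vectors. The synthetic data and the tolerance `eps` are illustrative assumptions, and the bookkeeping shown here only identifies margin vectors; it does not reproduce the paper's orthogonal-decomposition result.

```python
# Identify margin vectors versus bound support vectors of a linear SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# dual_coef_ holds y_i * alpha_i for each support vector.
alpha = np.abs(clf.dual_coef_.ravel())
eps = 1e-6
margin_mask = alpha < C - eps  # strictly inside the box: lies on the margin
print("support vectors:", len(clf.support_))
print("margin vectors:", int(margin_mask.sum()))
print("bound vectors:", int((~margin_mask).sum()))
```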