Sepp Hochreiter
Neural Computation (2008) 20 (1): 271–287.
Published: 01 January 2008
Abstract
We describe a fast sequential minimal optimization (SMO) procedure for solving the dual optimization problem of the recently proposed potential support vector machine (P-SVM). The new SMO consists of a sequence of iteration steps in which the Lagrangian is optimized with respect to either one (single SMO) or two (dual SMO) of the Lagrange multipliers while keeping the other variables fixed. An efficient selection procedure for Lagrange multipliers is given, and two heuristics for improving the SMO procedure are described: block optimization and annealing of the regularization parameter ε. A comparison of the variants shows that the dual SMO, including block optimization and annealing, performs efficiently in terms of computation time. In contrast to standard support vector machines (SVMs), the P-SVM is applicable to arbitrary dyadic data sets, but benchmarks are provided against libSVM's ε-SVR and C-SVC implementations for problems that are also solvable by standard SVM methods. For those problems, computation time of the P-SVM is comparable to or somewhat higher than the standard SVM. The number of support vectors found by the P-SVM is usually much smaller for the same generalization performance.
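The core of the single-variable SMO idea can be sketched as a coordinate-wise update on a generic quadratic dual. The snippet below is a minimal illustration, assuming an objective of the form f(α) = ½ αᵀQα − cᵀα with box constraints 0 ≤ αᵢ ≤ C; the actual P-SVM dual, the two-variable (dual SMO) updates, block optimization, and the ε-annealing heuristic are not reproduced here.

```python
import numpy as np

def single_smo_step(Q, c, alpha, i, C):
    """One SMO-style coordinate step: minimize the quadratic dual
    f(alpha) = 0.5 * alpha @ Q @ alpha - c @ alpha
    with respect to alpha[i] alone, then clip to the box [0, C].
    Illustrative only; the true P-SVM dual has its own objective
    and an epsilon term handled by the annealing heuristic."""
    grad_i = Q[i] @ alpha - c[i]               # partial derivative w.r.t. alpha[i]
    if Q[i, i] > 0:
        new_ai = alpha[i] - grad_i / Q[i, i]   # unconstrained minimizer along coordinate i
    else:
        new_ai = alpha[i]
    alpha[i] = np.clip(new_ai, 0.0, C)         # project back onto the box constraint
    return alpha

# Hypothetical usage on a tiny random problem
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q = A.T @ A                                    # positive semidefinite, as for K^T K
c = rng.standard_normal(5)
alpha = np.zeros(5)
for _ in range(50):
    i = int(np.argmax(np.abs(Q @ alpha - c)))  # pick the coordinate with the largest gradient magnitude
    alpha = single_smo_step(Q, c, alpha, i, C=10.0)
```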
Neural Computation (2006) 18 (6): 1472–1510.
Published: 01 June 2006
Abstract
We describe a new technique for the analysis of dyadic data, where two sets of objects (row and column objects) are characterized by a matrix of numerical values that describe their mutual relationships. The new technique, called potential support vector machine (P-SVM), is a large-margin method for the construction of classifiers and regression functions for the column objects. Contrary to standard support vector machine approaches, the P-SVM minimizes a scale-invariant capacity measure and requires a new set of constraints. As a result, the P-SVM method leads to a usually sparse expansion of the classification and regression functions in terms of the row rather than the column objects and can handle data and kernel matrices that are neither positive definite nor square. We then describe two complementary regularization schemes. The first scheme improves generalization performance for classification and regression tasks; the second scheme leads to the selection of a small, informative set of row support objects and can be applied to feature selection. Benchmarks for classification, regression, and feature selection tasks are performed with toy data as well as with several real-world data sets. The results show that the new method is at least competitive with but often performs better than the benchmarked standard methods for standard vectorial as well as true dyadic data sets. In addition, a theoretical justification is provided for the new approach.
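To make the dyadic setting concrete, the sketch below shows how a predictor expanded over row objects would be evaluated on the column objects of a rectangular relation matrix. The matrix entries, coefficients, and offset are made-up values; this illustrates only the form of the expansion, not the P-SVM training procedure.

```python
import numpy as np

# Dyadic data: a (possibly rectangular) matrix K whose entry K[i, j]
# describes the relation between row object i and column object j.
# In the P-SVM picture, the predictor for a column object is expanded
# over ROW objects; this sketch only shows that expansion with
# hypothetical, hand-picked numbers.
K = np.array([[ 1.0, 0.2, -0.5,  0.0],
              [ 0.3, 1.0,  0.1, -0.2],
              [-0.4, 0.0,  1.0,  0.8]])   # 3 row objects, 4 column objects

alpha = np.array([0.7, 0.0, -0.3])        # sparse coefficients on row objects (illustrative)
b = 0.1                                    # offset (illustrative)

def predict(k_col):
    """Score a column object from its relation vector to the row objects."""
    return float(alpha @ k_col + b)

scores = [predict(K[:, j]) for j in range(K.shape[1])]
labels = [1 if s >= 0 else -1 for s in scores]   # sign for classification
```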
Neural Computation (1999) 11 (3): 679–714.
Published: 01 April 1999
Abstract
Low-complexity coding and decoding (LOCOCODE) is a novel approach to sensory coding and unsupervised learning. Unlike previous methods, it explicitly takes into account the information-theoretic complexity of the code generator. It computes lococodes that convey information about the input data and can be computed and decoded by low-complexity mappings. We implement LOCOCODE by training autoassociators with flat minimum search, a recent, general method for discovering low-complexity neural nets. It turns out that this approach can unmix an unknown number of independent data sources by extracting a minimal number of low-complexity features necessary for representing the data. Experiments show that unlike codes obtained with standard autoencoders, lococodes are based on feature detectors, never unstructured, usually sparse, and sometimes factorial or local (depending on statistical properties of the data). Although LOCOCODE is not explicitly designed to enforce sparse or factorial codes, it extracts optimal codes for difficult versions of the “bars” benchmark problem, whereas independent component analysis (ICA) and principal component analysis (PCA) do not. It produces familiar, biologically plausible feature detectors when applied to real-world images and codes with fewer bits per pixel than ICA and PCA. Unlike ICA, it does not need to know the number of independent sources. As a preprocessor for a vowel recognition benchmark problem, it sets the stage for excellent classification performance. Our results reveal an interesting, previously ignored connection between two important fields: regularizer research and ICA-related research. They may represent a first step toward unification of regularization and unsupervised learning.
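As a rough illustration of the autoassociator component, here is a plain NumPy autoencoder trained by squared-error backpropagation. LOCOCODE itself trains such a network with flat minimum search; that second-order complexity penalty is only noted in a comment, not implemented.

```python
import numpy as np

# Minimal autoassociator (autoencoder). LOCOCODE trains such a network
# with flat minimum search (FMS); here only plain reconstruction-error
# training is shown, and the FMS flatness term is left as a remark.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))          # toy data: 200 samples, 8 inputs
n_hidden = 3

W1 = rng.standard_normal((8, n_hidden)) * 0.1   # encoder weights
W2 = rng.standard_normal((n_hidden, 8)) * 0.1   # decoder weights
lr = 0.01

for epoch in range(200):
    H = np.tanh(X @ W1)                    # code layer (a "lococode" would live here)
    X_hat = H @ W2                         # linear reconstruction
    err = X_hat - X
    loss = np.mean(err ** 2)               # reconstruction error; FMS would add a flatness penalty
    # Backpropagation of the reconstruction error only:
    dW2 = H.T @ err / len(X)
    dH = err @ W2.T
    dW1 = X.T @ (dH * (1 - H ** 2)) / len(X)
    W2 -= lr * dW2
    W1 -= lr * dW1
```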
Neural Computation (1997) 9 (8): 1735–1780.
Published: 15 November 1997
Abstract
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
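A minimal sketch of a single cell step in the spirit of the 1997 formulation (no forget gate) is given below; the weight names and the tanh squashing functions are illustrative choices, not taken from the paper.

```python
import numpy as np

def lstm_1997_step(x, h_prev, c_prev, params):
    """One step of an LSTM cell in the spirit of the 1997 formulation:
    input and output gates multiply access to the cell state, whose
    unit-weight self-connection forms the constant error carousel.
    The later forget gate is deliberately absent; weight names and
    squashing functions are illustrative assumptions."""
    Wi, Ui, Wo, Uo, Wc, Uc = (params[k] for k in ("Wi", "Ui", "Wo", "Uo", "Wc", "Uc"))
    sigm = lambda z: 1.0 / (1.0 + np.exp(-z))

    i = sigm(Wi @ x + Ui @ h_prev)         # input gate
    o = sigm(Wo @ x + Uo @ h_prev)         # output gate
    g = np.tanh(Wc @ x + Uc @ h_prev)      # candidate cell input
    c = c_prev + i * g                     # constant error carousel: unit self-recurrence
    h = o * np.tanh(c)                     # gated cell output
    return h, c

# Hypothetical usage on random data
rng = np.random.default_rng(0)
n_in, n_cell = 4, 3
params = {k: rng.standard_normal((n_cell, n_in if k[0] == "W" else n_cell)) * 0.1
          for k in ("Wi", "Ui", "Wo", "Uo", "Wc", "Uc")}
h, c = np.zeros(n_cell), np.zeros(n_cell)
for t in range(10):
    h, c = lstm_1997_step(rng.standard_normal(n_in), h, c, params)
```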
Neural Computation (1997) 9 (1): 1–42.
Published: 01 January 1997
Abstract
We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a “flat” minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based, Bayesian argument suggests that flat minima correspond to “simple” networks and low expected overfitting. The argument is based on a Gibbs algorithm variant and a novel way of splitting generalization error into underfitting and overfitting error. Unlike many previous approaches, ours does not require gaussian assumptions and does not depend on a “good” weight prior. Instead we have a prior over input-output functions, thus taking into account net architecture and training set. Although our algorithm requires the computation of second-order derivatives, it has backpropagation's order of complexity. It automatically and effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms conventional backprop, weight decay, and “optimal brain surgeon/optimal brain damage.”
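The notion of flatness can be illustrated with a crude, sampling-based proxy: estimate how little the error changes inside a box around the current weights. The real algorithm uses an analytic second-order penalty rather than sampling; the helper below is purely conceptual and all names are hypothetical.

```python
import numpy as np

def flatness_estimate(loss_fn, weights, radius=0.05, n_samples=20, rng=None):
    """Sampling-based proxy for flatness: average change of the error
    under random perturbations within a box of the given radius.
    Flat minimum search itself uses an analytic second-order penalty,
    not sampling; this is only a conceptual illustration."""
    rng = rng or np.random.default_rng(0)
    base = loss_fn(weights)
    deltas = []
    for _ in range(n_samples):
        perturbation = rng.uniform(-radius, radius, size=weights.shape)
        deltas.append(abs(loss_fn(weights + perturbation) - base))
    return float(np.mean(deltas))   # small value -> large connected low-error region

# Hypothetical usage with toy quadratic "error functions"
w = np.array([0.5, -0.2])
sharp = lambda w: 50.0 * np.sum(w ** 2)     # narrow valley: error changes quickly
flat = lambda w: 0.5 * np.sum(w ** 2)       # wide valley: error changes slowly
print(flatness_estimate(sharp, w), flatness_estimate(flat, w))
```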