Barbara Hammer
1–7 of 7 results
Journal Articles
Publisher: Journals Gateway
Neural Computation (2012) 24 (3): 771–804.
Published: 01 March 2012
A General Framework for Dimensionality-Reducing Data Visualization Mapping
Abstract
In recent years, a wealth of dimension-reduction techniques for data visualization and preprocessing has been established. Nonparametric methods require additional effort for out-of-sample extensions, because they provide only a mapping of a given finite set of points. In this letter, we propose a general view on nonparametric dimension reduction based on the concept of cost functions and properties of the data. Based on this general principle, we transfer nonparametric dimension reduction to explicit mappings of the data manifold such that direct out-of-sample extensions become possible. Furthermore, this concept offers the possibility of investigating the generalization ability of data visualization to new data points. We demonstrate the approach based on a simple global linear mapping, as well as prototype-based local linear mappings. In addition, we can bias the functional form according to given auxiliary information. This leads to explicit supervised visualization mappings with discriminative properties comparable to state-of-the-art approaches.
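A minimal sketch of the central idea, an explicit mapping whose parameters are fitted to a dimension-reduction cost so that out-of-sample extension is just function evaluation. The stress-type cost, the function name, and the hyperparameters below are illustrative assumptions, not taken from the article:

```python
import numpy as np

def fit_linear_dr(X, dim=2, epochs=200, lr=1e-3, seed=0):
    """Fit an explicit linear map y = W x by gradient descent on a raw
    stress cost sum_ij (||y_i - y_j|| - ||x_i - x_j||)^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 0.01 * rng.standard_normal((dim, d))
    Dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise input distances
    for _ in range(epochs):
        Y = X @ W.T                                              # project all points
        diff = Y[:, None, :] - Y[None, :, :]
        Dy = np.linalg.norm(diff, axis=-1)
        factor = np.divide(Dy - Dx, Dy, out=np.zeros_like(Dy), where=Dy > 0)
        grad_Y = 4.0 * np.einsum("ij,ijk->ik", factor, diff)     # d(stress)/dY
        W -= (lr / n) * grad_Y.T @ X                             # chain rule through Y = X W^T
    return W

# Out-of-sample extension is then immediate: project any new point as x_new @ W.T.
```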
Journal Articles
Publisher: Journals Gateway
Neural Computation (2010) 22 (9): 2229–2284.
Published: 01 September 2010
Topographic Mapping of Large Dissimilarity Data Sets
Abstract
Topographic maps such as the self-organizing map (SOM) or neural gas (NG) constitute powerful data mining techniques that allow simultaneously clustering data and inferring their topological structure, such that additional features, for example, browsing, become available. Both methods have been introduced for vectorial data sets; they require a classical feature encoding of information. Often data are available in the form of pairwise distances only, such as arise from a kernel matrix, a graph, or some general dissimilarity measure. In such cases, NG and SOM cannot be applied directly. In this article, we introduce relational topographic maps as an extension of relational clustering algorithms, which offer prototype-based representations of dissimilarity data, to incorporate neighborhood structure. These methods are equivalent to the standard (vectorial) techniques if a Euclidean embedding exists, while preventing the need to explicitly compute such an embedding. Extending these techniques for the general case of non-Euclidean dissimilarities makes possible an interpretation of relational clustering as clustering in pseudo-Euclidean space. We compare the methods to well-known clustering methods for proximity data based on deterministic annealing and discuss how far convergence can be guaranteed in the general case. Relational clustering is quadratic in the number of data points, which makes the algorithms infeasible for huge data sets. We propose an approximate patch version of relational clustering that runs in linear time. The effectiveness of the methods is demonstrated in a number of examples.
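The relational trick that makes this possible can be stated compactly; the following is a sketch in notation introduced here, not quoted from the article:

```latex
% Relational trick (notation introduced here): let D collect the pairwise
% dissimilarities (squared distances in the Euclidean case) and restrict every
% prototype to a convex combination of data points,
% w_j = \sum_i \alpha_{ji} x_i with \sum_i \alpha_{ji} = 1. Then
\[
  d(x_i, w_j)^2 \;=\; \bigl[D\,\alpha_j\bigr]_i \;-\; \tfrac{1}{2}\,\alpha_j^{\top} D\,\alpha_j ,
\]
% so NG/SOM prototype updates can be expressed purely in terms of the coefficient
% vectors \alpha_j, without ever computing an explicit (pseudo-)Euclidean embedding.
```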
Journal Articles
Publisher: Journals Gateway
Neural Computation (2009) 21 (12): 3532–3561.
Published: 01 December 2009
Adaptive Relevance Matrices in Learning Vector Quantization
Abstract
We propose a new matrix learning scheme to extend relevance learning vector quantization (RLVQ), an efficient prototype-based classification algorithm, toward a general adaptive metric. By introducing a full matrix of relevance factors in the distance measure, correlations between different features and their importance for the classification scheme can be taken into account, and general metric adaptation takes place automatically during training. In comparison to the weighted Euclidean metric used in RLVQ and its variations, a full matrix is more powerful in representing the internal structure of the data appropriately. Large-margin generalization bounds can be transferred to this case, leading to bounds that are independent of the input dimensionality. This also holds for local metrics attached to each prototype, which correspond to piecewise quadratic decision boundaries. The algorithm is tested in comparison to alternative learning vector quantization schemes using an artificial data set, a benchmark multiclass problem from the UCI repository, and a problem from bioinformatics, the recognition of splice sites for C. elegans.
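A rough sketch of the full-matrix distance described here; the function names and the trace-normalization comment are assumptions of this sketch, not code from the article:

```python
import numpy as np

def matrix_distance(x, w, omega):
    """Adaptive quadratic distance d_Lambda(x, w) = (x - w)^T Lambda (x - w),
    with Lambda = Omega^T Omega guaranteed positive semidefinite."""
    diff = omega @ (x - w)
    return float(diff @ diff)

def classify(x, prototypes, labels, omega):
    """Nearest-prototype classification under the learned matrix metric."""
    dists = [matrix_distance(x, w, omega) for w in prototypes]
    return labels[int(np.argmin(dists))]

# During training, the prototypes and Omega are adapted jointly (e.g. by stochastic
# gradient descent on the GLVQ cost), and Lambda is typically renormalized so that
# trace(Lambda) = 1 to fix the overall scale of the metric.
```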
Journal Articles
Publisher: Journals Gateway
Neural Computation (2009) 21 (10): 2942–2969.
Published: 01 October 2009
Distance Learning in Discriminative Vector Quantization
Abstract
Discriminative vector quantization schemes such as learning vector quantization (LVQ) and extensions thereof offer efficient and intuitive classifiers based on the representation of classes by prototypes. The original methods, however, rely on the Euclidean distance corresponding to the assumption that the data can be represented by isotropic clusters. For this reason, extensions of the methods to more general metric structures have been proposed, such as relevance adaptation in generalized LVQ (GLVQ) and matrix learning in GLVQ. In these approaches, metric parameters are learned based on the given classification task such that a data-driven distance measure is found. In this letter, we consider full matrix adaptation in advanced LVQ schemes. In particular, we introduce matrix learning to a recent statistical formalization of LVQ, robust soft LVQ, and we compare the results on several artificial and real-life data sets to matrix learning in GLVQ, a derivation of LVQ-like learning based on a (heuristic) cost function. In all cases, matrix adaptation allows a significant improvement of the classification accuracy. Interestingly, however, the principled behavior of the models with respect to prototype locations and extracted matrix dimensions shows several characteristic differences depending on the data sets.
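As a hedged sketch of the statistical formalization referred to here (robust soft LVQ with a full-matrix metric), in notation of my own:

```latex
% Sketch of the RSLVQ likelihood-ratio cost with a full-matrix metric (notation mine).
% Each prototype w_j defines a mixture component
\[
  p(x \mid j) \;\propto\; \exp\!\Bigl(-\tfrac{1}{2\sigma^{2}}\,(x - w_j)^{\top}\Lambda\,(x - w_j)\Bigr),
  \qquad \Lambda = \Omega^{\top}\Omega .
\]
% Prototypes and \Omega are adapted by gradient ascent on the log-likelihood ratio
\[
  \sum_i \log \frac{p(x_i, y_i \mid W)}{p(x_i \mid W)}
  \;=\; \sum_i \log
  \frac{\sum_{j:\,c(w_j)=y_i} P(j)\, p(x_i \mid j)}{\sum_{j} P(j)\, p(x_i \mid j)} .
\]
```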
Journal Articles
Publisher: Journals Gateway
Neural Computation (2005) 17 (5): 1109–1159.
Published: 01 May 2005
Universal Approximation Capability of Cascade Correlation for Structures
Abstract
Cascade correlation (CC) constitutes a training method for neural networks that determines the weights as well as the neural architecture during training. Various extensions of CC to structured data have been proposed: recurrent cascade correlation (RCC) for sequences, recursive cascade correlation (RecCC) for tree structures with limited fan-out, and contextual recursive cascade correlation (CRecCC) for rooted directed positional acyclic graphs (DPAGs) with limited fan-in and fan-out. We show that these models possess the universal approximation property in the following sense: given a probability measure P on the input set, every measurable function from sequences into a real vector space can be approximated by a sigmoidal RCC up to any desired degree of accuracy, up to inputs of arbitrarily small probability. Every measurable function from tree structures with limited fan-out into a real vector space can be approximated by a sigmoidal RecCC with multiplicative neurons up to any desired degree of accuracy, up to inputs of arbitrarily small probability. For sigmoidal CRecCC networks with multiplicative neurons, we show the universal approximation capability for functions on an important subset of all DPAGs with limited fan-in and fan-out for which a specific linear representation yields unique codes. We give one sufficient structural condition for the latter property, which can easily be tested: the enumeration of ingoing and outgoing edges should be compatible. This property can be fulfilled for every DPAG with fan-in and fan-out two via reenumeration of children and parents, and for larger fan-in and fan-out via an expansion of the fan-in and fan-out and reenumeration of children and parents. In addition, the result can be generalized to the case of input-output isomorphic transductions of structures. Thus, CRecCC networks constitute the first neural models for which the universal approximation capability of functions involving fairly general acyclic graph structures is proved.
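To make the recursive processing concrete, the following minimal sketch shows how a single unit can encode a tree with limited fan-out by recursion over children; it illustrates only the state computation, not the cascade-correlation training procedure, and all names, shapes, and the tanh squashing are assumptions:

```python
import numpy as np

def encode_tree(node, W_label, W_children, bias):
    """Recursively encode a tree with fan-out at most k = len(W_children):
    the state of a node is a squashed function of its label and the states of
    its children; missing children are represented by zero states.
    node = (label_vector, [child_node, ...])."""
    label, children = node
    state = W_label @ label + bias
    for slot, W_c in enumerate(W_children):
        child_state = (encode_tree(children[slot], W_label, W_children, bias)
                       if slot < len(children) else np.zeros_like(bias))
        state = state + W_c @ child_state
    return np.tanh(state)  # the root's state feeds the output (or the next cascaded unit)
```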
Journal Articles
Publisher: Journals Gateway
Neural Computation (2003) 15 (8): 1931–1957.
Published: 01 August 2003
Architectural Bias in Recurrent Neural Networks: Fractal Analysis
Abstract
We have recently shown that when initialized with “small” weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased toward Markov models; even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tiňo, 2002; Tiňo, Čerňanský, & Beňušková, 2002a, 2002b). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this article, we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has been frequently considered in the past, for example, when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box-counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be exploited to build Markovian predictive models, but also the activations form fractal clusters, the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
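The analyzed setting is easy to reproduce: drive a randomly initialized, small-weight recurrent map with symbols and collect the recurrent activations whose fractal dimension the article bounds. The sketch below uses an i.i.d. driving source instead of a finite-state transition diagram, and the weight scale and sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_symbols, scale = 2, 4, 0.3          # small weights => contractive dynamics

W_rec = scale * rng.standard_normal((n_hidden, n_hidden))
W_in = scale * rng.standard_normal((n_hidden, n_symbols))

def drive(symbols):
    """Iterate the recurrent map s_{t+1} = tanh(W_rec s_t + W_in e_{a_t}) and
    return all visited states; with small weights each input symbol acts as a
    contraction, so the activations settle onto a fractal-like attractor set."""
    s = np.zeros(n_hidden)
    states = []
    for a in symbols:
        s = np.tanh(W_rec @ s + W_in[:, a])
        states.append(s.copy())
    return np.array(states)

activations = drive(rng.integers(0, n_symbols, size=20000))  # i.i.d. driving source
```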
Journal Articles
Publisher: Journals Gateway
Neural Computation (2003) 15 (8): 1897–1929.
Published: 01 August 2003
Recurrent Neural Networks with Small Weights Implement Definite Memory Machines
Abstract
Recent experimental studies indicate that recurrent neural networks initialized with “small” weights are inherently biased toward definite memory machines (Tiňo, Čerňanský, & Beňušková, 2002a, 2002b). This article establishes a theoretical counterpart: the transition function of a recurrent network with small weights and a squashing activation function is a contraction. We prove that recurrent networks with a contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite memory machine. Conversely, every definite memory machine can be simulated by a recurrent network with a contractive transition function. Hence, initialization with small weights induces an architectural bias into learning with recurrent neural networks. This bias might have benefits from the point of view of statistical learning theory: it emphasizes one possible region of the weight space where generalization ability can be formally proved. It is well known that standard recurrent neural networks are not distribution-independent learnable in the probably approximately correct (PAC) sense if arbitrary precision and inputs are considered. We prove that recurrent networks with a contractive transition function with a fixed contraction parameter fulfill the so-called distribution-independent uniform convergence of empirical distances property and hence, unlike general recurrent networks, are distribution-independent PAC learnable.
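The contraction argument can be summarized in a single estimate; the notation and constants below are mine, introduced only to illustrate the reasoning:

```latex
% Sketch of the contraction argument (notation mine). Let s_t = f(s_{t-1}, a_t) with
% \|f(s, a) - f(s', a)\| \le \rho \, \|s - s'\| for some \rho < 1, and let the reachable
% states lie in a set of diameter C. If two input sequences agree on their last k symbols,
% the corresponding states s_t, s'_t satisfy
\[
  \|s_t - s'_t\| \;\le\; \rho^{k}\, C ,
\]
% so a definite memory machine that only inspects the last
% k \approx \log_{\rho}(\varepsilon / C) symbols reproduces the network's state
% (and hence its output) up to accuracy \varepsilon.
```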