Todd K. Leen
Neural Computation (2012) 24 (5): 1109–1146.
Published: 01 May 2012
Abstract
Online machine learning rules and many biological spike-timing-dependent plasticity (STDP) learning rules generate jump process Markov chains for the synaptic weights. We give a perturbation expansion for the dynamics that, unlike the usual approximation by a Fokker-Planck equation (FPE), is well justified. Our approach extends the related system size expansion by giving an expansion for the probability density as well as its moments. We apply the approach to two observed STDP learning rules and show that in regimes where the FPE breaks down, the new perturbation expansion agrees well with Monte Carlo simulations. The methods are also applicable to the dynamics of stochastic neural activity. Like previous ensemble analyses of STDP, we focus on equilibrium solutions, although the methods can in principle be applied to transients as well.
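As a rough illustration of the setting in this abstract, the sketch below simulates the jump-process Markov chain that a simple online learning rule induces on a scalar weight, and compares the Monte Carlo equilibrium moments with the Gaussian that a small-learning-rate (Fokker-Planck-style) argument predicts. The learning rule, input distribution, and all parameter values are invented for illustration; the paper's perturbation expansion itself is not reproduced here.

```python
# Minimal sketch (not the paper's expansion): an online learning rule
# w <- w + eta * f(w, x) with i.i.d. samples x defines a jump-process
# Markov chain for the weight.  We Monte Carlo the equilibrium weight
# distribution and compare it with the Gaussian that a small-noise
# (Fokker-Planck / linear-noise) argument predicts.
import numpy as np

rng = np.random.default_rng(0)
eta = 0.05                       # learning rate (jump-size scale)

def f(w, x):
    """Online LMS-style update direction for a toy 1-d problem."""
    return x * (1.0 - w * x)     # drift pulls w toward E[x] / E[x^2]

# Monte Carlo equilibrium: run many independent chains to stationarity.
n_chains, n_steps = 5000, 4000
w = np.zeros(n_chains)
for _ in range(n_steps):
    x = rng.normal(1.0, 0.5, size=n_chains)   # input samples
    w = w + eta * f(w, x)

# Small-noise Gaussian prediction: mean at the deterministic fixed point
# w* of the drift E_x[f(w, x)] = E[x] - w E[x^2], and variance
# eta * Var_x[f(w*, x)] / (2 a) with a = E[x^2] (the drift's decay rate).
xs = rng.normal(1.0, 0.5, size=200_000)
w_star = xs.mean() / np.mean(xs ** 2)
a = np.mean(xs ** 2)
var_pred = eta * np.var(f(w_star, xs)) / (2.0 * a)

print(f"Monte Carlo: mean={w.mean():.4f}, var={w.var():.5f}")
print(f"Gaussian (small-noise) prediction: mean={w_star:.4f}, var={var_pred:.5f}")
```

For small eta the two sets of moments roughly agree; the abstract's point is that the perturbation expansion remains justified in regimes where this kind of Gaussian approximation breaks down.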
Neural Computation (2011) 23 (9): 2390–2420.
Published: 01 September 2011
Abstract
We develop several kernel methods for classification of longitudinal data and apply them to detect cognitive decline in the elderly. We first develop mixed-effects models, a type of hierarchical empirical Bayes generative model, for the time series. After demonstrating their utility in likelihood ratio classifiers (and the improvement over standard regression models for such classifiers), we develop novel Fisher kernels based on mixtures of mixed-effects models and use them in support vector machine classifiers. The hierarchical generative model allows us to handle variations in sequence length and sampling interval gracefully. We also give nonparametric kernels not based on generative models, but rather on the reproducing kernel Hilbert space. We apply the methods to detecting cognitive decline from longitudinal clinical data on motor and neuropsychological tests. The likelihood ratio classifiers based on the neuropsychological tests perform better than classifiers based on the motor behavior. Discriminant classifiers performed better than likelihood ratio classifiers for the motor behavior tests.
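A toy sketch of one ingredient described here: a Fisher kernel computed from a single linear mixed-effects model (random intercept and slope) and plugged into a support vector machine, showing how variable-length, irregularly sampled sequences all map to fixed-length score vectors. The model parameters, simulated data, and helper names (simulate_subject, fisher_score) are invented for illustration; the paper uses mixtures of mixed-effects models fit to clinical data.

```python
# Toy sketch of a Fisher kernel built from a linear mixed-effects model
# (random intercept and slope), used with an SVM on variable-length
# longitudinal sequences.  Parameter values and data are made up.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Assumed model: y = X @ beta + X @ b + eps, with b ~ N(0, D), eps ~ N(0, s2 I).
beta = np.array([0.0, -0.5])        # fixed intercept and slope (illustrative)
D = np.diag([0.3, 0.1])             # random-effects covariance (illustrative)
s2 = 0.2                            # observation noise variance

def simulate_subject(decline, n_obs):
    """Simulate one subject's irregularly sampled test scores."""
    t = np.sort(rng.uniform(0, 5, n_obs))
    X = np.column_stack([np.ones(n_obs), t])
    slope = -0.5 + (-0.8 if decline else 0.0) + rng.normal(0, 0.3)
    y = rng.normal(0, 0.5) + slope * t + rng.normal(0, np.sqrt(s2), n_obs)
    return X, y

def fisher_score(X, y):
    """Gradient of the marginal log-likelihood w.r.t. the fixed effects beta.
    Sigma = X D X^T + s2 I is the marginal covariance of the sequence."""
    Sigma = X @ D @ X.T + s2 * np.eye(len(y))
    return X.T @ np.linalg.solve(Sigma, y - X @ beta)

# Build score vectors: sequence length varies, score dimension does not.
subjects = [simulate_subject(decline=i % 2 == 1, n_obs=rng.integers(3, 9))
            for i in range(200)]
labels = np.array([i % 2 for i in range(200)])
U = np.array([fisher_score(X, y) for X, y in subjects])

# Fisher kernel = inner products of score vectors (identity used in place
# of the inverse Fisher information, a common simplification).
K = U @ U.T
clf = SVC(kernel="precomputed").fit(K, labels)
print("training accuracy:", clf.score(K, labels))
```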
Neural Computation (2007) 19 (6): 1528–1567.
Published: 01 June 2007
Abstract
While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on gaussian mixture models (GMM) of the data distribution. We express clustering preferences in a prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. The model parameters are fit with the expectation-maximization (EM) algorithm. Our model provides a flexible framework that encompasses several other semisupervised clustering models as special cases. Experiments on artificial and real-world problems show that our model can consistently improve clustering results when pairwise relations are incorporated. The experiments also demonstrate the superiority of our model over other semisupervised clustering methods in handling noisy pairwise relations.
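A simplified sketch of the kind of model described in this abstract: EM for a gaussian mixture in which the E-step responsibilities are reweighted by a penalty on violating must-link and cannot-link preferences (a mean-field-style approximation of a pairwise assignment prior). The data, constraint pairs, and penalty strength lam are invented; this illustrates the idea rather than the paper's exact prior or update equations.

```python
# Simplified sketch of GMM clustering with pairwise preferences.  The prior
# over assignments penalizes must-link pairs put in different clusters and
# cannot-link pairs put in the same one; the E-step uses each point's
# partners' current soft assignments (a mean-field-style approximation).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)

# Toy data: two overlapping 2-d blobs.
X = np.vstack([rng.normal([0, 0], 1.0, (100, 2)),
               rng.normal([2.5, 0], 1.0, (100, 2))])
n, K, lam = len(X), 2, 3.0          # lam = strength of the pairwise prior
must = [(0, 1), (5, 120)]           # pairs believed to share a cluster
cannot = [(0, 150), (30, 160)]      # pairs believed to be in different clusters

# Initialize means, covariances, mixing weights, responsibilities.
mu = X[rng.choice(n, K, replace=False)]
cov = [np.eye(2) for _ in range(K)]
pi = np.full(K, 1.0 / K)
R = np.full((n, K), 1.0 / K)

for _ in range(50):
    # E-step: mixing weight times likelihood, reweighted by
    # exp(-lam * expected constraint violation) from the pairwise prior.
    logp = np.column_stack([np.log(pi[k]) +
                            multivariate_normal.logpdf(X, mu[k], cov[k])
                            for k in range(K)])
    penalty = np.zeros((n, K))
    for i, j in must:
        penalty[i] += lam * (1.0 - R[j])     # cost of disagreeing with partner
        penalty[j] += lam * (1.0 - R[i])
    for i, j in cannot:
        penalty[i] += lam * R[j]             # cost of agreeing with partner
        penalty[j] += lam * R[i]
    logp -= penalty
    logp -= logp.max(axis=1, keepdims=True)
    R = np.exp(logp)
    R /= R.sum(axis=1, keepdims=True)

    # M-step: standard GMM updates.
    Nk = R.sum(axis=0)
    pi = Nk / n
    mu = (R.T @ X) / Nk[:, None]
    for k in range(K):
        d = X - mu[k]
        cov[k] = (R[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(2)

print("cluster sizes:", np.bincount(R.argmax(axis=1), minlength=K))
```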
Neural Computation (1999) 11 (5): 1235–1248.
Published: 01 July 1999
Abstract
Although the outputs of neural network classifiers are often considered to be estimates of posterior class probabilities, the literature that assesses the calibration accuracy of these estimates illustrates that practical networks often fall far short of being ideal estimators. The theorems used to justify treating network outputs as good posterior estimates are based on several assumptions: that the network is sufficiently complex to model the posterior distribution accurately, that there are sufficient training data to specify the network, and that the optimization routine is capable of finding the global minimum of the cost function. Any or all of these assumptions may be violated in practice. This article does three things. First, we apply a simple, previously used histogram technique to assess graphically the accuracy of posterior estimates with respect to individual classes. Second, we introduce a simple and fast remapping procedure that transforms network outputs to provide better estimates of posteriors. Third, we use the remapping in a real-world telephone speech recognition system. The remapping results in a 10% reduction of both word-level error rates (from 4.53% to 4.06%) and sentence-level error rates (from 16.38% to 14.69%) on one corpus, and a 29% reduction in sentence-level error (from 6.3% to 4.5%) on another. The remapping required negligible additional overhead (in terms of both parameters and calculations). McNemar's test shows that these levels of improvement are statistically significant.
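A generic sketch in the spirit of the remapping described here: bin the raw network outputs on a calibration set, replace each output with the empirical class-1 frequency of its bin, and check that a simple per-bin calibration error shrinks. The synthetic "network outputs", the bin count, and the calibration_error helper are invented for illustration; the paper's speech system and exact procedure are not reproduced.

```python
# Minimal sketch of histogram-based remapping of classifier outputs:
# bin the raw outputs on a calibration set and replace each output with
# the empirical fraction of positives in its bin (histogram binning).
import numpy as np

rng = np.random.default_rng(3)

# Fake "network outputs" that are overconfident relative to the true posterior.
true_p = rng.uniform(0, 1, 20_000)                 # true posterior of class 1
labels = (rng.uniform(0, 1, true_p.size) < true_p).astype(int)
raw = np.clip(0.5 + 1.6 * (true_p - 0.5), 0, 1)    # miscalibrated outputs

# Split into a calibration set and a test set.
cal, test = slice(0, 10_000), slice(10_000, None)

# Build the remapping table: per-bin empirical P(class 1 | output in bin).
n_bins = 20
edges = np.linspace(0, 1, n_bins + 1)
bin_of = np.clip(np.digitize(raw, edges) - 1, 0, n_bins - 1)
table = np.array([labels[cal][bin_of[cal] == b].mean() if np.any(bin_of[cal] == b)
                  else (edges[b] + edges[b + 1]) / 2 for b in range(n_bins)])

remapped = table[bin_of[test]]

def calibration_error(p, y, n_bins=20):
    """Mean absolute gap between predicted and empirical probability per bin."""
    b = np.clip(np.digitize(p, np.linspace(0, 1, n_bins + 1)) - 1, 0, n_bins - 1)
    gaps = [abs(p[b == k].mean() - y[b == k].mean())
            for k in range(n_bins) if np.any(b == k)]
    return float(np.mean(gaps))

print("calibration error, raw:     ", calibration_error(raw[test], labels[test]))
print("calibration error, remapped:", calibration_error(remapped, labels[test]))
```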
Neural Computation (1997) 9 (7): 1493–1516.
Published: 10 July 1997
Abstract
Reducing or eliminating statistical redundancy between the components of high-dimensional vector data enables a lower-dimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural network communities have developed nonlinear extensions of PCA. This article develops a local linear approach to dimension reduction that provides accurate representations and is fast to compute. We exercise the algorithms on speech and image data, and compare performance with PCA and with neural network implementations of nonlinear PCA. We find that both nonlinear techniques can provide more accurate representations than PCA and show that the local linear techniques outperform neural network implementations.
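A minimal sketch of the local linear idea in this abstract: partition the data (here with k-means), fit a separate principal subspace in each cell, and reconstruct each point from its local subspace, comparing reconstruction error with global PCA. The synthetic manifold data, cell count, and pca_reconstruct helper are illustrative; the paper's specific partitioning and the neural-network comparisons are not reproduced.

```python
# Local linear dimension reduction sketch: k-means cells, one PCA per cell,
# reconstruction from the local subspace, compared against global PCA.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

# Data near a curved 1-d manifold embedded in 3-d.
t = rng.uniform(0, 3 * np.pi, 2000)
X = np.column_stack([np.cos(t), np.sin(t), 0.3 * t]) + rng.normal(0, 0.05, (2000, 3))
d = 1                                   # target dimension

def pca_reconstruct(Y, d):
    """Project Y onto its top-d principal subspace and reconstruct."""
    mu = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mu, full_matrices=False)
    P = Vt[:d]
    return mu + (Y - mu) @ P.T @ P

# Global PCA reconstruction error.
err_global = np.mean(np.sum((X - pca_reconstruct(X, d)) ** 2, axis=1))

# Local PCA: k-means cells, one PCA per cell.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
X_hat = np.empty_like(X)
for k in np.unique(labels):
    idx = labels == k
    X_hat[idx] = pca_reconstruct(X[idx], d)
err_local = np.mean(np.sum((X - X_hat) ** 2, axis=1))

print(f"global PCA error: {err_global:.4f}")
print(f"local  PCA error: {err_local:.4f}")
```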
Neural Computation (1995) 7 (5): 974–981.
Published: 01 September 1995
Abstract
Ideally, pattern recognition machines provide constant output when the inputs are transformed under a group G of desired invariances. These invariances can be achieved by enhancing the training data to include examples of inputs transformed by elements of G, while leaving the corresponding targets unchanged. Alternatively, the cost function for training can include a regularization term that penalizes changes in the output when the input is transformed under the group. This paper relates the two approaches, showing precisely the sense in which the regularized cost function approximates the result of adding transformed examples to the training data. We introduce the notion of a probability distribution over the group transformations, and use this to rewrite the cost function for the enhanced training data. Under certain conditions, the new cost function is equivalent to the sum of the original cost function plus a regularizer. For unbiased models, the regularizer reduces to the intuitively obvious choice—a term that penalizes changes in the output when the inputs are transformed under the group. For infinitesimal transformations, the coefficient of the regularization term reduces to the variance of the distortions introduced into the training data. This correspondence provides a simple bridge between the two approaches.
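A toy numerical check of the correspondence sketched in this abstract: train a small model either on rotation-distorted copies of the inputs (targets unchanged) or on the original data with a regularizer that penalizes the output's sensitivity to infinitesimal rotations, weighted by the distortion variance. The quadratic-feature model, rotation group, and all parameter values are invented for illustration; the paper's general derivation is not reproduced.

```python
# Enhanced training data vs. invariance regularizer, on a toy problem where
# the target is rotation invariant and realizable by the model.
import numpy as np

rng = np.random.default_rng(5)

def rotate(X, theta):
    c, s = np.cos(theta), np.sin(theta)
    return X @ np.array([[c, -s], [s, c]]).T

def features(X):
    """Quadratic features, so the rotation-invariant target is realizable."""
    return np.column_stack([X[:, 0] ** 2, X[:, 1] ** 2, X[:, 0] * X[:, 1]])

# Rotation-invariant target with noise.
X = rng.normal(0, 1, (400, 2))
y = np.sum(X ** 2, axis=1) + rng.normal(0, 0.1, 400)
sigma = 0.2                     # std. dev. of the rotation-angle distortions

# 1) Enhanced training set: rotated copies of the inputs, targets unchanged.
copies = [rotate(X, rng.normal(0, sigma)) for _ in range(40)]
Phi_aug = np.vstack([features(Xc) for Xc in copies])
y_aug = np.tile(y, len(copies))
w_aug = np.linalg.lstsq(Phi_aug, y_aug, rcond=None)[0]

# 2) Regularized cost: squared error plus sigma^2 * sum_i (w . tangent_i)^2,
#    where tangent_i = d features(rotate(x_i, theta)) / d theta at theta = 0
#    (finite-difference approximation of the generator's action).
Phi = features(X)
eps = 1e-5
Tang = (features(rotate(X, eps)) - Phi) / eps
A = Phi.T @ Phi + sigma ** 2 * (Tang.T @ Tang)
w_reg = np.linalg.solve(A, Phi.T @ y)

print("augmented-data weights :", np.round(w_aug, 3))
print("regularized weights    :", np.round(w_reg, 3))
```

With the penalty coefficient set to the distortion variance, and the model essentially unbiased on this target, the two weight vectors come out close, mirroring the infinitesimal-transformation correspondence stated in the abstract.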