Lawrence K. Saul
Journal Articles
Publisher: Journals Gateway
Neural Computation (2021) 33 (1): 194–226.
Published: 01 January 2021
We investigate a latent variable model for multinomial classification inspired by recent capsule architectures for visual object recognition (Sabour, Frosst, & Hinton, 2017). Capsule architectures use vectors of hidden unit activities to encode the pose of visual objects in an image, and they use the lengths of these vectors to encode the probabilities that objects are present. Probabilities from different capsules can also be propagated through deep multilayer networks to model the part-whole relationships of more complex objects. Notwithstanding the promise of these networks, there still remains much to understand about capsules as primitive computing elements in their own right. In this letter, we study the problem of capsule regression, a higher-dimensional analog of logistic, probit, and softmax regression in which class probabilities are derived from vectors of competing magnitude. To start, we propose a simple capsule architecture for multinomial classification: the architecture has one capsule per class, and each capsule uses a weight matrix to compute the vector of hidden unit activities for patterns it seeks to recognize. Next, we show how to model these hidden unit activities as latent variables, and we use a squashing nonlinearity to convert their magnitudes as vectors into normalized probabilities for multinomial classification. When different capsules compete to recognize the same pattern, the squashing nonlinearity induces nongaussian terms in the posterior distribution over their latent variables. Nevertheless, we show that exact inference remains tractable and use an expectation-maximization procedure to derive least-squares updates for each capsule's weight matrix. We also present experimental results to demonstrate how these ideas work in practice.
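The forward pass described in this abstract (one weight matrix per class, vector magnitudes converted to normalized class probabilities) might look roughly like the sketch below. The abstract does not state the form of the squashing nonlinearity, so the squared-length normalization used here is an assumption made purely for illustration, and the function name `capsule_probabilities` is hypothetical.

```python
import numpy as np

def capsule_probabilities(weights, x):
    """Class probabilities from competing capsule magnitudes.

    weights: array of shape (n_classes, d_hidden, d_input), one matrix per capsule.
    x:       input pattern of shape (d_input,).

    ASSUMPTION: probabilities are squared vector lengths divided by their sum.
    The abstract only says magnitudes are squashed into normalized probabilities.
    """
    hidden = weights @ x                      # (n_classes, d_hidden): one vector per capsule
    sq_len = np.sum(hidden ** 2, axis=1)      # squared length of each capsule's vector
    return sq_len / np.sum(sq_len)            # normalize so the probabilities sum to 1

# Two capsules with 2-dimensional hidden vectors on a 2-dimensional input.
W = np.stack([np.eye(2), 2.0 * np.eye(2)])
p = capsule_probabilities(W, np.array([1.0, 0.0]))
```

With a one-dimensional hidden vector per capsule this reduces to a scalar competition between classes, which is the sense in which the abstract calls capsule regression a higher-dimensional analog of softmax-style models.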
Neural Computation (2010) 22 (10): 2678–2697.
Published: 01 October 2010
We introduce a new family of positive-definite kernels for large margin classification in support vector machines (SVMs). These kernels mimic the computation in large neural networks with one layer of hidden units. We also show how to derive new kernels, by recursive composition, that may be viewed as mapping their inputs through a series of nonlinear feature spaces. These recursively derived kernels mimic the computation in deep networks with multiple hidden layers. We evaluate SVMs with these kernels on problems designed to illustrate the advantages of deep architectures. Compared to previous benchmarks, we find that on some problems, these SVMs yield state-of-the-art results, beating not only other SVMs but also deep belief nets.
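As an illustration of recursive kernel composition, the sketch below uses the degree-1 arc-cosine kernel, k(x, y) = (1/π) ‖x‖ ‖y‖ (sin θ + (π − θ) cos θ), where θ is the angle between x and y. The abstract itself does not name or define its kernels, so this choice of kernel is an assumption, and the function names are hypothetical. Each layer reapplies the same formula to the angle induced by the previous layer's kernel.

```python
import numpy as np

def arccos_kernel_step(kxy, kxx, kyy):
    """One layer of the degree-1 arc-cosine kernel (assumed form, see lead-in).

    Takes the previous layer's kernel values k(x,y), k(x,x), k(y,y) and returns
    the next layer's k(x,y). Inputs with zero norm are not handled.
    """
    norm = np.sqrt(kxx * kyy)
    cos_theta = np.clip(kxy / norm, -1.0, 1.0)   # guard against rounding outside [-1, 1]
    theta = np.arccos(cos_theta)
    return norm * (np.sin(theta) + (np.pi - theta) * cos_theta) / np.pi

def deep_arccos_kernel(x, y, n_layers=1):
    """Compose the kernel n_layers times, starting from the linear kernel."""
    kxy, kxx, kyy = x @ y, x @ x, y @ y
    for _ in range(n_layers):
        kxy = arccos_kernel_step(kxy, kxx, kyy)
        # For degree 1, k(x,x) and k(y,y) are unchanged by the recursion
        # (theta = 0 gives k(x,x) back), so kxx and kyy need no update.
    return kxy

x = np.array([1.0, 0.0])
y = np.array([0.0, 2.0])
k1 = deep_arccos_kernel(x, y, n_layers=1)   # orthogonal inputs: theta = pi/2
k3 = deep_arccos_kernel(x, y, n_layers=3)
```

A Gram matrix built from such a function could be passed to any SVM solver that accepts precomputed kernels.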
Neural Computation (2007) 19 (8): 2004–2031.
Published: 01 August 2007
Many problems in neural computation and statistical learning involve optimizations with nonnegativity constraints. In this article, we study convex problems in quadratic programming where the optimization is confined to an axis-aligned region in the nonnegative orthant. For these problems, we derive multiplicative updates that improve the value of the objective function at each iteration and converge monotonically to the global minimum. The updates have a simple closed form and do not involve any heuristics or free parameters that must be tuned to ensure convergence. Despite their simplicity, they differ strikingly in form from other multiplicative updates used in machine learning. We provide complete proofs of convergence for these updates and describe their application to problems in signal processing and pattern recognition.
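A minimal sketch of a multiplicative update of this kind is given below for the simplest case: minimizing F(v) = ½ vᵀAv + bᵀv subject only to v ≥ 0. The abstract does not state the update rule, so the specific form used here (splitting A into its positive and negative parts A⁺ and A⁻) is an assumption taken from the commonly cited version of this method, and the function name is hypothetical. The more general axis-aligned regions mentioned in the abstract are not covered.

```python
import numpy as np

def nqp_multiplicative(A, b, v0, n_iter=500):
    """Multiplicative updates for min 0.5*v'Av + b'v subject to v >= 0.

    ASSUMED update (not given in the abstract):
        v_i <- v_i * (-b_i + sqrt(b_i^2 + 4*(A+ v)_i*(A- v)_i)) / (2*(A+ v)_i)
    where A+ = max(A, 0) and A- = max(-A, 0) elementwise.
    Requires a positive starting point and (A+ v)_i > 0 throughout.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    A_pos = np.maximum(A, 0.0)    # positive part of A
    A_neg = np.maximum(-A, 0.0)   # magnitude of the negative part of A
    v = np.array(v0, dtype=float)
    for _ in range(n_iter):
        a = A_pos @ v
        c = A_neg @ v
        v = v * (-b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a)
    return v

# Unconstrained minimizer is (1, -1), so the constraint v2 >= 0 is active.
v_active = nqp_multiplicative([[2.0, 1.0], [1.0, 2.0]], [-1.0, 1.0], [1.0, 1.0])

# Unconstrained minimizer (1, 1) is already feasible.
v_inner = nqp_multiplicative([[2.0, -1.0], [-1.0, 2.0]], [-1.0, -1.0], [0.5, 2.0])
```

The update has no step size to tune, and a component that reaches zero stays at zero, which is how active constraints are represented.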
Neural Computation (2000) 12 (6): 1313–1335.
Published: 01 June 2000
We study probabilistic generative models parameterized by feedforward neural networks. An attractor dynamics for probabilistic inference in these models is derived from a mean field approximation for large, layered sigmoidal networks. Fixed points of the dynamics correspond to solutions of the mean field equations, which relate the statistics of each unit to those of its Markov blanket. We establish global convergence of the dynamics by providing a Lyapunov function and show that the dynamics generate the signals required for unsupervised learning. Our results for feedforward networks provide a counterpart to those of Cohen-Grossberg and Hopfield for symmetric networks.