David J. C. MacKay
Journal Articles
Publisher: Journals Gateway
Neural Computation (1999) 11 (5): 1035–1068.
Published: 01 July 1999
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models that include unknown hyperparameters such as regularization constants and noise levels. In the evidence framework, the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized hyperparameters are used to define a gaussian approximation to the posterior distribution. In the alternative MAP method, the true posterior probability is found by integrating over the hyperparameters. The true posterior is then maximized over the model parameters, and a gaussian approximation is made. The similarities of the two approaches and their relative merits are discussed, and comparisons are made with the ideal hierarchical Bayesian solution. In moderately ill-posed problems, integration over hyperparameters yields a probability distribution with a skew peak, which causes significant biases to arise in the MAP method. In contrast, the evidence framework is shown to introduce negligible predictive error under straightforward conditions. General lessons are drawn concerning inference in many dimensions.
Neural Computation (1996) 8 (1): 178–181.
Published: 01 January 1996
Several authors have studied the relationship between hidden Markov models and “Boltzmann chains” with a linear or “time-sliced” architecture. Boltzmann chains model sequences of states by defining state-state transition energies instead of probabilities. In this note I demonstrate that under the simple condition that the state sequence has a mandatory end state, the probability distribution assigned by a strictly linear Boltzmann chain is identical to that assigned by a hidden Markov model.
Neural Computation (1994) 6 (1): 100–126.
Published: 01 January 1994
Models of unsupervised, correlation-based (Hebbian) synaptic plasticity are typically unstable: either all synapses grow until each reaches the maximum allowed strength, or all synapses decay to zero strength. A common method of avoiding these outcomes is to use a constraint that conserves or limits the total synaptic strength over a cell. We study the dynamic effects of such constraints. Two methods of enforcing a constraint are distinguished, multiplicative and subtractive. For otherwise linear learning rules, multiplicative enforcement of a constraint results in dynamics that converge to the principal eigenvector of the operator determining unconstrained synaptic development. Subtractive enforcement, in contrast, typically leads to a final state in which almost all synaptic strengths reach either the maximum or minimum allowed value. This final state is often dominated by weight configurations other than the principal eigenvector of the unconstrained operator. Multiplicative enforcement yields a “graded” receptive field in which most mutually correlated inputs are represented, whereas subtractive enforcement yields a receptive field that is “sharpened” to a subset of maximally correlated inputs. If two equivalent input populations (e.g., two eyes) innervate a common target, multiplicative enforcement prevents their segregation (ocular dominance segregation) when the two populations are weakly correlated; whereas subtractive enforcement allows segregation under these circumstances. These results may be used to understand constraints both over output cells and over input cells. A variety of rules that can implement constrained dynamics are discussed.
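The contrast between the two enforcement methods can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: two inputs standing in for two eyes, a correlation matrix `C` with cross-correlation 0.2, Euler integration, hard clipping to `[0, w_max]`, and the function names `matvec` and `simulate`. The multiplicative rule subtracts a multiple of the weight vector, the subtractive rule subtracts the same amount from every weight, and both choose that amount so the summed weight change is zero.

```python
def matvec(C, w):
    """Return the matrix-vector product C w for a list-of-lists matrix."""
    return [sum(c * x for c, x in zip(row, w)) for row in C]


def simulate(C, w0, mode, steps=2000, dt=0.01, w_max=1.0):
    """Euler-integrate dw/dt = C w under a constraint on the total weight.

    mode="multiplicative": dw/dt = C w - gamma * w
    mode="subtractive":    dw/dt = C w - eps * (1, ..., 1)
    gamma and eps are chosen so the summed weight change is zero.
    Weights are clipped to [0, w_max] after every step (an assumption
    of this sketch).
    """
    w = list(w0)
    for _ in range(steps):
        growth = matvec(C, w)  # unconstrained Hebbian growth
        if mode == "multiplicative":
            gamma = sum(growth) / sum(w)
            dw = [g - gamma * x for g, x in zip(growth, w)]
        elif mode == "subtractive":
            eps = sum(growth) / len(w)
            dw = [g - eps for g in growth]
        else:
            raise ValueError("mode must be 'multiplicative' or 'subtractive'")
        w = [min(w_max, max(0.0, x + dt * d)) for x, d in zip(w, dw)]
    return w


# Two equivalent, weakly correlated inputs with a slight initial imbalance.
C = [[1.0, 0.2], [0.2, 1.0]]
w0 = [0.55, 0.45]

graded = simulate(C, w0, "multiplicative")
segregated = simulate(C, w0, "subtractive")
print("multiplicative:", graded)
print("subtractive:   ", segregated)
```

In this toy setting the multiplicative run relaxes toward equal weights, the direction of the principal eigenvector (1, 1) of `C`, while the subtractive run amplifies the initial imbalance until one weight sits at the maximum and the other at zero.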
Neural Computation (1992) 4 (5): 720–736.
Published: 01 September 1992
Three Bayesian ideas are presented for supervised adaptive classifiers. First, it is argued that the output of a classifier should be obtained by marginalizing over the posterior distribution of the parameters; a simple approximation to this integral is proposed and demonstrated. This involves a "moderation" of the most probable classifier's outputs, and yields improved performance. Second, it is demonstrated that the Bayesian framework for model comparison described for regression models in MacKay (1992a,b) can also be applied to classification problems. This framework successfully chooses the magnitude of weight decay terms, and ranks solutions found using different numbers of hidden units. Third, an information-based data selection criterion is derived and demonstrated within this framework.
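A minimal sketch of what "moderation" can look like follows. The abstract does not state the approximation, so the specific form used here is an assumption: the posterior over the output activation `a` is taken to be gaussian with mean `a_mp` and variance `s2`, and the marginalized output is approximated by a sigmoid whose argument is shrunk by `kappa = 1 / sqrt(1 + pi * s2 / 8)`. The function names are hypothetical, and `marginalized_output` is only a brute-force numerical check.

```python
import math


def sigmoid(a):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-a))


def moderated_output(a_mp, s2):
    """Approximate the posterior-averaged sigmoid output.

    a_mp: most probable activation; s2: assumed gaussian variance of the
    activation under the parameter posterior.
    """
    kappa = 1.0 / math.sqrt(1.0 + math.pi * s2 / 8.0)
    return sigmoid(kappa * a_mp)


def marginalized_output(a_mp, s2, n=4001, width=8.0):
    """Numerically integrate sigmoid(a) against N(a; a_mp, s2); needs s2 > 0."""
    s = math.sqrt(s2)
    lo = a_mp - width * s
    da = 2.0 * width * s / (n - 1)
    total = 0.0
    for i in range(n):
        a = lo + i * da
        density = math.exp(-0.5 * ((a - a_mp) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        total += sigmoid(a) * density * da
    return total


a_mp, s2 = 2.0, 4.0
print("most probable output:", sigmoid(a_mp))
print("moderated output:    ", moderated_output(a_mp, s2))
print("numerical integral:  ", marginalized_output(a_mp, s2))
```

With nonzero variance the moderated output lies closer to 0.5 than the most probable classifier's output, which is the sense in which uncertain predictions are made less extreme.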
Neural Computation (1992) 4 (4): 590–604.
Published: 01 July 1992
Learning can be made more efficient if we can actively select particularly salient data points. Within a Bayesian learning framework, objective functions are discussed that measure the expected informativeness of candidate measurements. Three alternative specifications of what we want to gain information about lead to three different criteria for data selection. All these criteria depend on the assumption that the hypothesis space is correct, which may prove to be their main weakness.
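As a rough illustration of one such criterion, the sketch below scores candidate measurements by the information they are expected to provide about the parameters. It assumes a model that is linear in its parameters with gaussian noise, for which one measurement with predictive variance `sigma_y2` and noise variance `sigma_nu2` yields a gain of `0.5 * log(1 + sigma_y2 / sigma_nu2)`. The straight-line model, the posterior covariance values, and all function names are hypothetical and not taken from the article.

```python
import math


def predictive_variance(cov, g):
    """Return g^T cov g: variance of the model output due to parameter uncertainty."""
    n = len(g)
    return sum(g[i] * cov[i][j] * g[j] for i in range(n) for j in range(n))


def information_gain(sigma_y2, sigma_nu2):
    """Expected gain (in nats) about the parameters from one noisy measurement."""
    return 0.5 * math.log(1.0 + sigma_y2 / sigma_nu2)


def best_candidate(cov, noise_var, candidates):
    """Pick the input x with the largest expected gain for y = w0 + w1 * x."""
    def gain(x):
        g = [1.0, x]  # sensitivity of the output to (w0, w1)
        return information_gain(predictive_variance(cov, g), noise_var)
    return max(candidates, key=gain)


# Hypothetical posterior covariance over (w0, w1) and noise variance.
cov = [[0.1, 0.0], [0.0, 0.4]]
noise_var = 0.5
print(best_candidate(cov, noise_var, [-2.0, 0.0, 1.0]))
```

Under these assumptions the criterion favours the input where the current model is least certain of its prediction. As the abstract cautions, such a ranking is only as trustworthy as the assumed hypothesis space.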
Neural Computation (1992) 4 (3): 448–472.
Published: 01 May 1992
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained.
Neural Computation (1992) 4 (3): 415–447.
Published: 01 May 1992
Although Bayesian analysis has been in use since Laplace, the Bayesian method of model-comparison has only recently been developed in depth. In this paper, the Bayesian approach to regularization and model-comparison is demonstrated by studying the inference problem of interpolating noisy data. The concepts and methods described are quite general and can be applied to many other data modeling problems. Regularizing constants are set by examining their posterior probability distribution. Alternative regularizers (priors) and alternative basis sets are objectively compared by evaluating the evidence for them. “Occam's razor” is automatically embodied by this process. The way in which Bayes infers the values of regularizing constants and noise levels has an elegant interpretation in terms of the effective number of parameters determined by the data set. This framework is due to Gull and Skilling.
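The "effective number of parameters" can be made concrete with a short sketch. It assumes the usual quadratic setting, which the abstract does not spell out: a gaussian prior with regularizing constant `alpha`, a data-misfit Hessian (already scaled by the noise level `beta`) with eigenvalues `eigenvalues`, weight energy `e_w` equal to half the sum of squared weights, and data misfit `e_d` equal to half the sum of squared residuals. The function names and the numbers are illustrative only.

```python
def effective_parameters(eigenvalues, alpha):
    """gamma = sum_i lambda_i / (lambda_i + alpha).

    A direction with lambda_i >> alpha is well determined by the data and
    counts as about 1; a direction with lambda_i << alpha counts as about 0.
    """
    return sum(lam / (lam + alpha) for lam in eigenvalues)


def reestimate(gamma, e_w, e_d, n_data):
    """Re-estimate the regularizing constant and noise level from gamma."""
    alpha = gamma / (2.0 * e_w)
    beta = (n_data - gamma) / (2.0 * e_d)
    return alpha, beta


# Three directions: well determined, borderline, and poorly determined.
gamma = effective_parameters([100.0, 1.0, 0.01], alpha=1.0)
print("effective number of parameters:", gamma)
print("re-estimated (alpha, beta):", reestimate(gamma, e_w=0.75, e_d=2.125, n_data=10))
```

In practice the two steps are iterated, since the eigenvalues and energies change whenever `alpha` and `beta` are updated. This sketch shows a single pass.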
Neural Computation (1990) 2 (2): 173–187.
Published: 01 June 1990
Linsker has reported the development of center-surround receptive fields and oriented receptive fields in simulations of a Hebb-type equation in a linear network. The dynamics of the learning rule are analyzed in terms of the eigenvectors of the covariance matrix of cell activities. Analytic and computational results for Linsker's covariance matrices, and some general theorems, lead to an explanation of the emergence of center-surround and certain oriented structures. We estimate criteria for the parameter regime in which center-surround structures emerge.