Search results 1-20 of 94
Neural Computation (2017) 29 (9): 2450–2490.
Published: 01 September 2017
Abstract
In order to interact intelligently with objects in the world, animals must first transform neural population responses into estimates of the dynamic, unknown stimuli that caused them. The Bayesian solution to this problem is known as a Bayes filter, which applies Bayes' rule to combine population responses with the predictions of an internal model. The internal model of the Bayes filter is based on the true stimulus dynamics, and in this note, we present a method for training a theoretical neural circuit to approximately implement a Bayes filter when the stimulus dynamics are unknown. To do this we use the inferential properties of linear probabilistic population codes to compute Bayes' rule and train a neural network to compute approximate predictions by the method of maximum likelihood. In particular, we perform stochastic gradient descent on the negative log-likelihood of the neural network parameters with a novel approximation of the gradient. We demonstrate our methods on a finite-state, a linear, and a nonlinear filtering problem and show how the hidden layer of the neural network develops tuning curves consistent with findings in experimental neuroscience.
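For context, here is a minimal sketch of the generic discrete-state Bayes filter recursion (predict with an internal model, then apply Bayes' rule to the new observation) that such a circuit approximates. This is the standard textbook recursion, not the authors' neural-network implementation, and the transition matrix and likelihood below are made-up examples.
```python
import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    """One step of a discrete-state Bayes filter.

    belief:      p(z_{t-1} | r_{1:t-1}), shape (K,)
    transition:  p(z_t | z_{t-1}), shape (K, K), rows sum to 1
    likelihood:  p(r_t | z_t) for the observed response, shape (K,)
    """
    prediction = transition.T @ belief      # prediction of the internal model
    posterior = likelihood * prediction     # Bayes' rule (unnormalized)
    return posterior / posterior.sum()

# Toy example: 3 hidden states, sticky random-walk dynamics, one observation.
transition = np.array([[0.8, 0.2, 0.0],
                       [0.1, 0.8, 0.1],
                       [0.0, 0.2, 0.8]])
belief = np.array([1/3, 1/3, 1/3])
likelihood = np.array([0.05, 0.30, 0.65])   # p(r_t | z_t) for each state
print(bayes_filter_step(belief, transition, likelihood))
```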
Neural Computation (2016) 28 (12): 2585–2593.
Published: 01 December 2016
Abstract
The mixture-of-experts (MoE) model is a popular neural network architecture for nonlinear regression and classification. The class of MoE mean functions is known to be uniformly convergent to any unknown target function, assuming that the target function is from a Sobolev space that is sufficiently differentiable and that the domain of estimation is a compact unit hypercube. We provide an alternative result, which shows that the class of MoE mean functions is dense in the class of all continuous functions over arbitrary compact domains of estimation. Our result can be viewed as a universal approximation theorem for MoE models. The theorem we present allows MoE users to be confident in applying such models for estimation when data arise from nonlinear and nondifferentiable generative processes.
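As a point of reference, a sketch of an MoE mean function in its standard form (softmax gating over linear experts); the parameter values below are arbitrary placeholders, not anything from the note.
```python
import numpy as np

def moe_mean(x, gate_w, gate_b, expert_w, expert_b):
    """Mean function of a mixture of experts with softmax gating.

    x:        input vector, shape (d,)
    gate_w:   gating weights, shape (m, d); gate_b: shape (m,)
    expert_w: expert weights, shape (m, d); expert_b: shape (m,)
    """
    logits = gate_w @ x + gate_b
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()                     # softmax gating probabilities
    expert_means = expert_w @ x + expert_b   # one linear expert per row
    return gates @ expert_means              # gate-weighted combination

x = np.array([0.5, -1.0])
rng = np.random.default_rng(0)
print(moe_mean(x, rng.normal(size=(3, 2)), np.zeros(3),
               rng.normal(size=(3, 2)), np.zeros(3)))
```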
Neural Computation (2016) 28 (10): 2045–2062.
Published: 01 October 2016
Abstract
In many areas of neural computation, like learning, optimization, estimation, and inference, suitable divergences play a key role. In this note, we study the conjecture presented by Amari (2009) and find a counterexample showing that the conjecture does not hold generally. Moreover, we investigate two classes of divergence introduced by Zhang (2004), the weighted f-divergence and a second weighted divergence class, and prove that if a divergence is both a weighted f-divergence and a Bregman divergence, then it belongs to the second weighted class. This result reduces in form to the main theorem established by Amari (2009) in the corresponding special case.
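For reference, the two standard divergence classes involved are defined below; these are textbook definitions added for context, and the weighted variants studied in the note are not reproduced here.
```latex
% f-divergence, for convex f with f(1) = 0:
D_f(p \| q) = \sum_i q_i \, f\!\left(\frac{p_i}{q_i}\right)

% Bregman divergence, for a strictly convex, differentiable \phi:
D_\phi(p \| q) = \phi(p) - \phi(q) - \langle \nabla\phi(q),\, p - q \rangle
```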
Neural Computation (2016) 28 (8): 1498–1502.
Published: 01 August 2016
Abstract
The leaky integrator, the basis for many neuronal models, possesses a negative group delay when a time-delayed recurrent inhibition is added to it. By means of this delay, the leaky integrator becomes a predictor for some frequency components of the input signal. The prediction properties are derived analytically, and an application to a local field potential is provided.
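A simulation sketch of a leaky integrator with time-delayed recurrent inhibition, under an assumed form of the dynamics; the equation, gain, and delay below are illustrative assumptions, not the paper's parameters or derivation.
```python
import numpy as np

# Assumed dynamics: tau * dv/dt = -v(t) - g * v(t - delta) + s(t)
tau, g, delta, dt = 0.02, 0.8, 0.01, 0.001   # seconds; illustrative values
steps, lag = 2000, int(delta / dt)
t = np.arange(steps) * dt
s = np.sin(2 * np.pi * 5 * t)                # 5 Hz input component

v = np.zeros(steps)
for k in range(steps - 1):
    v_delayed = v[k - lag] if k >= lag else 0.0
    dv = (-v[k] - g * v_delayed + s[k]) / tau
    v[k + 1] = v[k] + dt * dv                # forward Euler step

# The note's point: with delayed recurrent inhibition, some frequency
# components of the output can lead the input (negative group delay).
print(v[-5:])
```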
Neural Computation (2016) 28 (7): 1289–1304.
Published: 01 July 2016
Abstract
The possibility of approximating a continuous function on a compact subset of the real line by a feedforward single hidden layer neural network with a sigmoidal activation function has been studied in many papers. Such networks can approximate an arbitrary continuous function provided that an unlimited number of neurons in a hidden layer is permitted. In this note, we consider constructive approximation on any finite interval of the real line by neural networks with only one neuron in the hidden layer. We construct algorithmically a smooth, sigmoidal, almost monotone activation function providing approximation to an arbitrary continuous function within any degree of accuracy. This algorithm is implemented in a computer program, which computes the value of the constructed activation function at any reasonable point of the real axis.
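For orientation, the general form computed by a network with a single hidden neuron is shown below; this is a standard expression added for context, and the specific algorithmic construction of the activation function in the note is not reproduced.
```latex
N(x) = c_1 \, \sigma(w x + \theta) + c_0 ,
\qquad x \in [a, b] \subset \mathbb{R}
```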
Neural Computation (2016) 28 (6): 1042–1050.
Published: 01 June 2016
Abstract
In a pioneering classic, Warren McCulloch and Walter Pitts proposed a model of the central nervous system. Motivated by EEG recordings of normal brain activity, Chvátal and Goldsmith asked whether these dynamical systems can be engineered to produce trajectories that are irregular, disorderly, and apparently unpredictable. We show that they cannot build weak pseudorandom functions.
Neural Computation (2016) 28 (5): 815–825.
Published: 01 May 2016
Abstract
A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors, followed by (2) taking the absolute value of every entry of the resulting vectors, followed by (3) local averaging. For processing real-valued random vectors, complex-valued convnets can be viewed as data-driven multiscale windowed power spectra, data-driven multiscale windowed absolute spectra, data-driven multiwavelet absolute values, or (in their most general configuration) data-driven nonlinear multiwavelet packets. Indeed, complex-valued convnets can calculate multiscale windowed spectra when the convnet filters are windowed complex-valued exponentials. Standard real-valued convnets, using rectified linear units (ReLUs), sigmoidal (e.g., logistic or tanh) nonlinearities, or max pooling, for example, do not obviously exhibit the same exact correspondence with data-driven wavelets (whereas for complex-valued convnets, the correspondence is much more than just a vague analogy). Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to (complex-valued) convnets.
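A numpy sketch of the correspondence described above: convolving with windowed complex exponentials, taking absolute values, and locally averaging yields a data-independent special case of such a network, namely a windowed absolute spectrum. The window length, frequencies, and pooling width are illustrative assumptions, not values from the article.
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=512)                      # real-valued input signal

n, freqs, pool = 32, [2, 4, 8], 16            # window length, cycles/window, pooling
window = np.hanning(n)

features = []
for f in freqs:
    # (1) convolution with a windowed complex exponential (Gabor-like filter)
    filt = window * np.exp(2j * np.pi * f * np.arange(n) / n)
    y = np.convolve(x, filt, mode='valid')
    # (2) absolute value of every entry
    y = np.abs(y)
    # (3) local averaging (here: non-overlapping mean pooling)
    m = len(y) // pool
    features.append(y[:m * pool].reshape(m, pool).mean(axis=1))

print(np.stack(features).shape)               # (len(freqs), pooled length)
```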
Neural Computation (2016) 28 (4): 629–651.
Published: 01 April 2016
Abstract
We introduce a neural model capable of feature selectiveness by spike-mediated synchronization through lateral synaptic couplings. For a stimulus containing two features, the attended one elicits a higher response. In the case of sequential single-feature stimuli, repetition of the attended feature also results in an enhanced response, exhibited by greater synchrony and higher spiking rates.
Neural Computation (2016) 28 (3): 485–492.
Published: 01 March 2016
Abstract
Maximum pseudo-likelihood estimation (MPLE) is an attractive method for training fully visible Boltzmann machines (FVBMs) due to its computational scalability and the desirable statistical properties of the MPLE. No published algorithms for MPLE have been proven to be convergent or monotonic. In this note, we present an algorithm for the MPLE of FVBMs based on the block successive lower-bound maximization (BSLM) principle. We show that the BSLM algorithm monotonically increases the pseudo-likelihood values and that the sequence of BSLM estimates converges to the unique global maximizer of the pseudo-likelihood function. The relationship between the BSLM algorithm and the gradient ascent (GA) algorithm for MPLE of FVBMs is also discussed, and a convergence criterion for the GA algorithm is given.
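A sketch of the log pseudo-likelihood of an FVBM with {0,1} units and a plain gradient-ascent step. This illustrates only the MPLE objective and the GA baseline mentioned above; the BSLM updates of the note are not reproduced, and the symmetric-weight, zero-diagonal parameterization is an assumption.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_pseudo_likelihood(X, W, b):
    """X: (N, n) binary data; W: symmetric (n, n), zero diagonal; b: (n,)."""
    P = sigmoid(X @ W + b)            # P[k, i] = p(x_i = 1 | x_-i) for sample k
    return np.sum(X * np.log(P) + (1 - X) * np.log(1 - P))

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 5)).astype(float)
W, b = np.zeros((5, 5)), np.zeros(5)

# One gradient-ascent step on the log pseudo-likelihood.
P = sigmoid(X @ W + b)
grad_b = np.sum(X - P, axis=0)
G = (X - P).T @ X
grad_W = G + G.T                      # respect the symmetric parameterization
np.fill_diagonal(grad_W, 0.0)         # keep the zero diagonal
lr = 0.01 / len(X)
W += lr * grad_W
b += lr * grad_b
print(log_pseudo_likelihood(X, W, b))
```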
Neural Computation (2016) 28 (1): 71–88.
Published: 01 January 2016
Abstract
We present a better theoretical foundation of support vector machines with polynomial kernels. The sample error is estimated under Tsybakov’s noise assumption. In bounding the approximation error, we take advantage of a geometric noise assumption that was introduced to analyze gaussian kernels. Compared with the previous literature, the error analysis in this note does not require any regularity of the marginal distribution or smoothness of Bayes’ rule. We thus establish the learning rates for polynomial kernels for a wide class of distributions.
Neural Computation (2015) 27 (10): 2097–2106.
Published: 01 October 2015
Abstract
We compare an entropy estimator recently discussed by Zhang (2012) with two estimators introduced by Grassberger (2003) and Schürmann (2004). We prove an identity relating these estimators that was not taken into account by Zhang (2012). We then prove that the systematic error (bias) of one of them is less than or equal to the bias of the ordinary likelihood (or plug-in) estimator of entropy. Finally, by numerical simulation, we verify that for the most interesting regime of small-sample estimation and large event spaces, this estimator has a significantly smaller statistical error than the other.
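For reference, a sketch of the ordinary likelihood (plug-in) entropy estimator mentioned above, with a small simulation illustrating its bias in the small-sample, large-event-space regime; the Zhang, Grassberger, and Schürmann estimators themselves are not reproduced here.
```python
import numpy as np

def plugin_entropy(samples, k):
    """Plug-in (maximum likelihood) entropy estimate in nats over k categories."""
    counts = np.bincount(samples, minlength=k)
    p_hat = counts / counts.sum()
    nz = p_hat > 0
    return -np.sum(p_hat[nz] * np.log(p_hat[nz]))

rng = np.random.default_rng(0)
k, n, trials = 100, 50, 2000                  # large event space, small sample
p = np.full(k, 1.0 / k)                       # uniform source: H = ln(k)
estimates = [plugin_entropy(rng.choice(k, size=n, p=p), k) for _ in range(trials)]
print("true H:", np.log(k), " mean plug-in estimate:", np.mean(estimates))
# The plug-in estimate is systematically too small (negative bias).
```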
Neural Computation (2015) 27 (9): 1857–1871.
Published: 01 September 2015
Abstract
We consider the problem of selecting the optimal orders of vector autoregressive (VAR) models for fMRI data. Many previous studies used a model order of one and ignored that it may vary considerably across data sets depending on different data dimensions, subjects, tasks, and experimental designs. In addition, the classical information criteria (IC) used (e.g., the Akaike IC (AIC)) are biased and inappropriate for high-dimensional fMRI data, which typically have a small sample size. We examine the mixed results on the optimal VAR orders for fMRI, especially the validity of the order-one hypothesis, by a comprehensive evaluation using different model selection criteria over three typical data types (a resting-state, an event-related design, and a block design data set) with varying time series dimensions obtained from distinct functional brain networks. We use a more balanced criterion, Kullback’s IC (KIC), based on Kullback’s symmetric divergence, which combines two directed divergences. We also consider the bias-corrected versions (AICc and KICc) to improve VAR model selection in small samples. Simulation results show better small-sample selection performance of the proposed criteria over the classical ones. Both bias-corrected ICs provide more accurate and consistent model order choices than their biased counterparts, which suffer from overfitting, with KICc performing the best. Results on real data show that orders greater than one were selected by all criteria across all data sets for the small to moderate dimensions, particularly from small, specific networks such as the resting-state default mode network and the task-related motor networks, whereas low orders close to one, but not necessarily one, were chosen for the large dimensions of full-brain networks.
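A sketch of least-squares VAR(p) fitting and order selection with the standard multivariate AIC, on synthetic data; the KIC, AICc, and KICc criteria evaluated in the article are not reproduced here, and the exact parameter-count convention is an assumption.
```python
import numpy as np

def fit_var(Y, p):
    """Least-squares VAR(p) fit. Y: (T, d). Returns residual covariance, #coeffs."""
    T, d = Y.shape
    rows = [np.concatenate([[1.0]] + [Y[t - l] for l in range(1, p + 1)])
            for t in range(p, T)]
    Z = np.array(rows)                         # (T - p, 1 + p*d) regressors
    B, *_ = np.linalg.lstsq(Z, Y[p:], rcond=None)
    resid = Y[p:] - Z @ B
    return resid.T @ resid / (T - p), B.size

def aic(Sigma, n_params, n_obs):
    """Multivariate AIC: log|Sigma| + 2 * (#parameters) / (effective sample size)."""
    _, logdet = np.linalg.slogdet(Sigma)
    return logdet + 2.0 * n_params / n_obs

# Synthetic stable VAR(2) data, then score candidate orders 1..5.
rng = np.random.default_rng(0)
d, T = 3, 300
A1, A2 = 0.5 * np.eye(d), -0.3 * np.eye(d)
Y = np.zeros((T, d))
for t in range(2, T):
    Y[t] = A1 @ Y[t - 1] + A2 @ Y[t - 2] + rng.normal(scale=0.5, size=d)

scores = {p: aic(*fit_var(Y, p), n_obs=T - p) for p in range(1, 6)}
print(min(scores, key=scores.get), scores)     # order with the smallest AIC
```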
Neural Computation (2015) 27 (1): 32–41.
Published: 01 January 2015
Abstract
The colorful representation of orientation preference maps in primary visual cortex has become iconic. However, the standard representation is misleading because it uses a color mapping to indicate orientations based on the HSV (hue, saturation, value) color space, for which important perceptual features such as brightness, and not just hue, vary among orientations. This means that some orientations stand out more than others, conveying a distorted visual impression. This is particularly problematic for visualizing subtle biases caused by slight overrepresentation of some orientations due to, for example, stripe rearing. We show that displaying orientation maps with a color mapping based on a slightly modified version of the HCL (hue, chroma, lightness) color space, so that primarily only hue varies between orientations, leads to a more balanced visual impression. This makes it easier to perceive the true structure of this seminal example of functional brain architecture.
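An illustration of the visualization issue using matplotlib's built-in colormaps: 'hsv' is the conventional HSV-based cyclic map, while 'twilight' is a perceptually more uniform cyclic alternative standing in for the modified HCL mapping proposed in the article (which is not reproduced here); the synthetic orientation map is made up purely for display.
```python
import numpy as np
import matplotlib.pyplot as plt

# A made-up pinwheel-like "orientation map" in [0, pi), for display only.
rng = np.random.default_rng(1)
y, x = np.mgrid[0:128, 0:128]
z = np.zeros((128, 128), dtype=complex)
for _ in range(12):
    kx, ky = 0.2 * rng.normal(size=2)
    z += np.exp(1j * (kx * x + ky * y + rng.uniform(0, 2 * np.pi)))
orientation = (np.angle(z) / 2) % np.pi

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, cmap in zip(axes, ['hsv', 'twilight']):
    ax.imshow(orientation, cmap=cmap, vmin=0, vmax=np.pi)
    ax.set_title(f'{cmap} colormap')
    ax.set_axis_off()
plt.show()
```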
Neural Computation (2014) 26 (3): 467–471.
Published: 01 March 2014
Abstract
Temporal difference learning models of dopamine assert that phasic levels of dopamine encode a reward prediction error. However, this hypothesis has been challenged by recent observations of gradually ramping striatal dopamine levels as a goal is approached. This note describes conditions under which temporal difference learning models predict dopamine ramping. The key idea is representational: a quadratic transformation of proximity to the goal implies approximately linear ramping, as observed experimentally.
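To make the representational point concrete, here is a brief worked version of the argument under simple assumptions (proximity grows linearly in time, discount factor near one); the notation is ours, not the note's.
```latex
% Proximity to the goal grows linearly in time: x_t = t/T, so \Delta x = 1/T.
% With the quadratic value representation V(x) = x^2 and \gamma \approx 1,
% the TD error before reward delivery is
\delta_t = \gamma V(x_{t+1}) - V(x_t)
         \approx (x_t + \Delta x)^2 - x_t^2
         = 2\,\Delta x\, x_t + \Delta x^2
         \approx \frac{2t}{T^2},
% i.e., the prediction error (and hence phasic dopamine) ramps approximately
% linearly as the goal is approached.
```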
Neural Computation (2012) 24 (6): 1391–1407.
Published: 01 June 2012
Abstract
The concave-convex procedure (CCCP) is an iterative algorithm that solves d.c. (difference of convex functions) programs as a sequence of convex programs. In machine learning, CCCP is extensively used in many learning algorithms, including sparse support vector machines (SVMs), transductive SVMs, and sparse principal component analysis. Although CCCP is widely used in many applications, its convergence behavior has not received much specific attention. Yuille and Rangarajan analyzed its convergence in their original paper; however, we believe the analysis is not complete. The convergence of CCCP can be derived from the convergence of the d.c. algorithm (DCA), proposed in the global optimization literature to solve general d.c. programs, whose proof relies on d.c. duality. In this note, we follow a different reasoning and show how Zangwill's global convergence theory of iterative algorithms provides a natural framework to prove the convergence of CCCP. This underlines Zangwill's theory as a powerful and general framework for dealing with the convergence issues of iterative algorithms, having also been used to prove the convergence of algorithms such as expectation-maximization and generalized alternating minimization. We provide a rigorous analysis of the convergence of CCCP by addressing two questions: When does CCCP find a local minimum or a stationary point of the d.c. program under consideration, and when does the sequence generated by CCCP converge? We also present an open problem on the issue of local convergence of CCCP.
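A toy illustration of the CCCP iteration on a one-dimensional d.c. program; the decomposition and the closed-form inner minimization below are our own illustrative choices, not taken from the note.
```python
import numpy as np

# Minimize f(x) = u(x) - v(x) with u(x) = x**4 (convex) and v(x) = x**2 (convex).
# CCCP linearizes v around the current iterate and solves the convex subproblem
#   x_{t+1} = argmin_x  u(x) - v'(x_t) * x  =  argmin_x  x**4 - 2*x_t*x,
# whose stationarity condition 4*x**3 = 2*x_t gives the closed form below.
x = 2.0                                        # initial point
for t in range(30):
    x = np.cbrt(x / 2.0)                       # convex subproblem in closed form
print(x, 1 / np.sqrt(2))                       # approaches a stationary point of f
```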
Neural Computation (2012) 24 (3): 607–610.
Published: 01 March 2012
Abstract
The Levenberg-Marquardt (LM) learning algorithm is a popular algorithm for training neural networks; however, for large neural networks, it becomes prohibitively expensive in terms of running time and memory requirements. The most time-critical step of the algorithm is the calculation of the Gauss-Newton matrix, which is formed by multiplying two large Jacobian matrices together. We propose a method that uses backpropagation to reduce the time of this matrix-matrix multiplication. This reduces the overall asymptotic running time of the LM algorithm by a factor of the order of the number of output nodes in the neural network.
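For context, a numpy sketch of a single Levenberg-Marquardt step once the Jacobian is available; the backpropagation-based speedup described in the note for forming the Gauss-Newton matrix is not reproduced, and the toy problem below is our own.
```python
import numpy as np

def lm_step(J, r, params, damping):
    """One LM update: solve (J^T J + damping * I) delta = J^T r, then step downhill.

    J: (n_residuals, n_params) Jacobian of the residuals w.r.t. the parameters
    r: (n_residuals,) residual vector
    """
    gauss_newton = J.T @ J                     # the expensive matrix-matrix product
    gradient = J.T @ r
    delta = np.linalg.solve(gauss_newton + damping * np.eye(J.shape[1]), gradient)
    return params - delta

# Toy linear least-squares problem: residuals r(p) = A p - y, so J = A.
rng = np.random.default_rng(0)
A, y = rng.normal(size=(20, 3)), rng.normal(size=20)
p = np.zeros(3)
for _ in range(5):
    r = A @ p - y
    p = lm_step(A, r, p, damping=1e-3)
print(p)
```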
Neural Computation (2012) 24 (1): 25–31.
Published: 01 January 2012
Abstract
We demonstrate the mathematical equivalence of two commonly used forms of firing rate model equations for neural networks. In addition, we show that what is commonly interpreted as the firing rate in one form of model may be better interpreted as a low-pass-filtered firing rate, and we point out a conductance-based firing rate model.
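For orientation, the two commonly used forms referred to above are typically written as follows (standard textbook notation added for context; the precise change of variables and the conductance-based variant established in the note are not reproduced).
```latex
% "input-state" (voltage-like) form, with rates read out as r = f(v):
\tau \, \dot{v} = -v + W f(v) + I(t)

% "rate" form:
\tau \, \dot{r} = -r + f\!\big(W r + I(t)\big)
```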
Neural Computation (2011) 23 (10): 2498–2510.
Published: 01 October 2011
Abstract
Robust coding has been proposed as a solution to the problem of minimizing decoding error in the presence of neural noise. Many real-world problems, however, have degradation in the input signal, not just in neural representations. This generalized problem is more relevant to biological sensory coding, where internal noise arises from limited neural precision and external noise from distortion of the sensory signal, such as blurring and phototransduction noise. In this note, we show that the optimal linear encoder for this problem can be decomposed exactly into two serial processes that can be optimized separately. One is Wiener filtering, which optimally compensates for input degradation. The other is robust coding, which best uses the available representational capacity for signal transmission with a noisy population of linear neurons. We also present a spectral analysis of the decomposition that characterizes how the reconstruction error is minimized under different input signal spectra, types and amounts of degradation, degrees of neural precision, and neural population sizes.
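For reference, in the purely additive-noise case the standard (noncausal) Wiener filter for the input-compensation stage takes the frequency-domain form below; this is a textbook expression added for context, where S_x and S_n denote the power spectra of the clean signal and of the independent input noise, and the more general degradation handled in the note is not reproduced.
```latex
H(\omega) = \frac{S_x(\omega)}{S_x(\omega) + S_n(\omega)}
```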
Neural Computation (2011) 23 (8): 1935–1943.
Published: 01 August 2011
Abstract
The perceptron is a simple supervised algorithm to train a linear classifier that has been analyzed and used extensively. The classifier separates the data into two groups using a decision hyperplane, with the margin between the data and the hyperplane determining the classifier's ability to generalize and its robustness to input noise. Exact results for the maximal size of the separating margin are known for specific input distributions, and bounds exist for arbitrary distributions, but both rely on lengthy statistical mechanics calculations carried out in the limit of infinite input size. Here we present a short analysis of perceptron classification using singular value decomposition. We provide a simple derivation of a lower bound on the margin and an explicit formula for the perceptron weights that converges to the optimal result for large separating margins.
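A sketch of the classical perceptron learning rule on linearly separable synthetic data, including a simple empirical margin computation; the SVD-based margin analysis of the article is not reproduced here.
```python
import numpy as np

def train_perceptron(X, y, epochs=100):
    """Classical perceptron: update w on every misclassified example.

    X: (N, d) inputs, y: (N,) labels in {-1, +1}. Returns the weight vector.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:             # misclassified (or on the boundary)
                w += yi * xi
                errors += 1
        if errors == 0:                        # all points on the correct side
            break
    return w

rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
X = rng.normal(size=(200, 5))
y = np.sign(X @ w_true)
w = train_perceptron(X, y)
margin = np.min(y * (X @ w)) / np.linalg.norm(w)   # geometric margin of the learned plane
print(margin)
```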
Neural Computation (2011) 23 (7): 1661–1674.
Published: 01 July 2011
Abstract
Denoising autoencoders have been previously shown to be competitive alternatives to restricted Boltzmann machines for unsupervised pretraining of each layer of a deep architecture. We show that a simple denoising autoencoder training criterion is equivalent to matching the score (with respect to the data) of a specific energy-based model to that of a nonparametric Parzen density estimator of the data. This yields several useful insights. It defines a proper probabilistic model for the denoising autoencoder technique, which makes it in principle possible to sample from them or rank examples by their energy. It suggests a different way to apply score matching that is related to learning to denoise and does not require computing second derivatives. It justifies the use of tied weights between the encoder and decoder and suggests ways to extend the success of denoising autoencoders to a larger family of energy-based models.
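A sketch of the denoising autoencoder training criterion with Gaussian corruption and tied weights (forward pass and reconstruction loss only); the energy-based model and score-matching correspondence established in the note is not coded here, and the layer sizes and noise level are arbitrary.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def denoising_criterion(X, W, b, c, sigma, rng):
    """Average squared reconstruction error of a tied-weight denoising autoencoder.

    X: (N, d) data; W: (h, d) encoder weights (decoder uses W.T); b: (h,); c: (d,).
    """
    X_noisy = X + sigma * rng.normal(size=X.shape)    # corrupt the input
    H = sigmoid(X_noisy @ W.T + b)                    # encoder
    X_rec = H @ W + c                                 # decoder with tied weights
    return np.mean(np.sum((X_rec - X) ** 2, axis=1))  # reconstruct the clean input

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))
W = 0.1 * rng.normal(size=(4, 10))
print(denoising_criterion(X, W, np.zeros(4), np.zeros(10), sigma=0.3, rng=rng))
```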