Neural Computation Current Issue
https://direct.mit.edu/neco
Language: en-us
Published: Mon, 26 Jul 2021 00:00:00 GMT
Last build: Mon, 26 Jul 2021 22:45:34 GMT
Generator: Silverchair
Editor: editor@direct.mit.edu/neco
Webmaster: webmaster@direct.mit.edu/neco

Simulating and Predicting Dynamical Systems With Spatial Semantic Pointers
https://direct.mit.edu/neco/article/33/8/2033/102625/Simulating-and-Predicting-Dynamical-Systems-With
Mon, 26 Jul 2021 00:00:00 GMT
Voelker AR, Blouw P, Choo X, et al. <span class="paragraphSection">While neural networks are highly effective at learning task-relevant representations from data, they typically do not learn representations with the kind of symbolic structure that is hypothesized to support high-level cognitive processes, nor do they naturally model such structures within problem domains that are continuous in space and time. To fill these gaps, this work exploits a method for defining vector representations that bind discrete (symbol-like) entities to points in continuous topological spaces in order to simulate and predict the behavior of a range of dynamical systems. These vector representations are spatial semantic pointers (SSPs), and we demonstrate that they can (1) be used to model dynamical systems involving multiple objects represented in a symbol-like manner and (2) be integrated with deep neural networks to predict the future of physical trajectories. These results help unify what have traditionally appeared to be disparate approaches in machine learning.</span>
Neural Computation 33(8): 2033–2067. doi:10.1162/neco_a_01410

Pathological Spectra of the Fisher Information Metric and Its Variants in Deep Neural Networks
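The fractional-binding idea behind the SSPs of Voelker et al. can be sketched in a few lines: a base vector with unit-magnitude Fourier coefficients is raised to a real-valued power, so nearby points in the continuous space get similar vectors. This is a minimal sketch assuming a plain conjugate-symmetric random-phase construction; the paper's full SSP machinery (binding multiple symbol-like objects, integration with deep networks) is richer.

```python
import numpy as np

def make_axis_vector(d, rng):
    """Random unitary base vector: unit-magnitude Fourier coefficients.

    Phases are conjugate-symmetric so the inverse FFT is real.
    (Illustrative construction; d must be even here.)
    """
    phases = rng.uniform(-np.pi, np.pi, d // 2 - 1)
    return np.concatenate(([1.0], np.exp(1j * phases), [1.0],
                           np.exp(-1j * phases[::-1])))

def ssp(fc, x):
    """Encode continuous value x by fractional binding: raise the
    base vector to the power x in the Fourier domain."""
    return np.fft.ifft(fc ** x).real

rng = np.random.default_rng(0)
fc = make_axis_vector(256, rng)
a, b = ssp(fc, 1.0), ssp(fc, 1.3)
# Similarity decays smoothly with the distance between encoded points.
print(np.dot(a, a), np.dot(a, b))
```

Because the Fourier coefficients stay on the unit circle, `ssp(fc, x)` keeps unit norm for every real `x`, and the dot product between two encodings falls off with the distance between the encoded values.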
https://direct.mit.edu/neco/article/33/8/2274/102619/Pathological-Spectra-of-the-Fisher-Information
Mon, 26 Jul 2021 00:00:00 GMT
Karakida R, Akaho S, Amari S. <span class="paragraphSection">The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions. Focusing on the FIM and its variants in deep neural networks (DNNs), we reveal their characteristic scale dependence on the network width, depth, and sample size when the network has random weights and is sufficiently wide. This study covers two widely used FIMs for regression with linear output and for classification with softmax output. Both FIMs asymptotically show pathological eigenvalue spectra in the sense that a small number of eigenvalues become large outliers depending on the width or sample size, while the others are much smaller. This implies that the local shape of the parameter space or loss landscape is very sharp in a few specific directions while almost flat in the other directions. In particular, the softmax output disperses the outliers and makes a tail of the eigenvalue density spread from the bulk. We also show that pathological spectra appear in other variants of FIMs: one is the neural tangent kernel; another is a metric for the input signal and feature space that arises from feedforward signal propagation. Thus, we provide a unified perspective on the FIM and its variants that will lead to a more quantitative understanding of learning in large-scale DNNs.</span>
Neural Computation 33(8): 2274–2307. doi:10.1162/neco_a_01411

Direction Matters: On Influence-Preserving Graph Summarization and Max-Cut Principle for Directed Graphs
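The regression-case FIM that Karakida et al. analyze can be probed empirically on a toy random-weight network: for a scalar linear output, the FIM is the average outer product of per-sample gradients, and its nonzero eigenvalues equal those of the sample-by-sample Gram matrix. The network below (a two-layer tanh net with mean-field weight scaling) is an illustrative stand-in, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, width, n_samples = 50, 500, 200

# Random-weight two-layer net with linear scalar output.
W1 = rng.normal(0, 1 / np.sqrt(n_in), (width, n_in))
w2 = rng.normal(0, 1 / np.sqrt(width), width)
X = rng.normal(size=(n_samples, n_in))

def grad(x):
    """Per-sample gradient of the scalar output w.r.t. all parameters."""
    h = np.tanh(W1 @ x)
    dh = 1 - h ** 2                      # tanh'
    dW1 = np.outer(w2 * dh, x).ravel()   # d f / d W1, flattened
    return np.concatenate([dW1, h])      # [d f/d W1, d f/d w2]

G = np.stack([grad(x) for x in X])
# Empirical FIM for regression: F = (1/N) sum_n g_n g_n^T.
# Its nonzero eigenvalues equal those of the N x N dual Gram matrix.
eigs = np.linalg.eigvalsh(G @ G.T / n_samples)
print(eigs[-1] / np.median(eigs))  # outlier-to-bulk ratio
```

Even at this small scale, the largest eigenvalue sits well above the bulk of the spectrum, the qualitative picture the abstract describes.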
https://direct.mit.edu/neco/article/33/8/2128/101868/Direction-Matters-On-Influence-Preserving-Graph
Mon, 26 Jul 2021 00:00:00 GMT
Xu W, Niu G, Hyvärinen A, et al. <span class="paragraphSection">Summarizing large-scale directed graphs into small-scale representations is a useful but less-studied problem setting. Conventional clustering approaches, based on Min-Cut-style criteria, compress both the vertices and edges of the graph into the communities, which leads to a loss of directed-edge information. On the other hand, compressing the vertices while preserving the directed-edge information provides a way to learn the small-scale representation of a directed graph. The reconstruction error, which measures the edge information preserved by the summarized graph, can be used to learn such a representation. Compared to the original graphs, the summarized graphs are easier to analyze and are capable of extracting group-level features, useful for efficient interventions on population behavior. In this letter, we present a model, based on minimizing reconstruction error with nonnegative constraints, which relates to a Max-Cut criterion that simultaneously identifies the compressed nodes and the directed compressed relations between these nodes. A multiplicative update algorithm with column-wise normalization is proposed. We further provide theoretical results on the identifiability of the model and the convergence of the proposed algorithm. Experiments are conducted to demonstrate the accuracy and robustness of the proposed method.</span>
Neural Computation 33(8): 2128–2162. doi:10.1162/neco_a_01402

Learning Brain Dynamics With Coupled Low-Dimensional Nonlinear Oscillators and Deep Recurrent Networks
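The flavor of reconstruction-error graph summarization in Xu et al. can be illustrated with the classic Lee–Seung multiplicative updates, the family their algorithm builds on: factor an asymmetric adjacency matrix A into nonnegative W (node-to-group assignments) and H (directed group-level relations), with a column-wise normalization. The letter's own updates and Max-Cut connection differ in detail; this is only a simpler relative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 4

# Random directed graph: an asymmetric 0/1 adjacency matrix.
A = (rng.random((n, n)) < 0.2).astype(float)

# Nonnegative factorization A ~ W H keeps directed-edge information:
# W assigns nodes to groups, H holds directed relations between groups.
W = rng.random((n, k))
H = rng.random((k, n))
eps = 1e-12
err = [np.linalg.norm(A - W @ H)]
for _ in range(200):
    # Lee-Seung multiplicative updates: Frobenius error is non-increasing.
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)
    # Column-wise normalization of W (absorbed into H, so W @ H is unchanged).
    s = W.sum(axis=0, keepdims=True) + eps
    W, H = W / s, H * s.T
    err.append(np.linalg.norm(A - W @ H))
print(err[0], err[-1])
```

The multiplicative form preserves nonnegativity automatically, which is why this update style pairs naturally with nonnegative-constrained summarization models.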
https://direct.mit.edu/neco/article/33/8/2087/101867/Learning-Brain-Dynamics-With-Coupled-Low
Mon, 26 Jul 2021 00:00:00 GMT
Abrevaya G, Dumas G, Aravkin AY, et al. <span class="paragraphSection">Many natural systems, especially biological ones, exhibit complex multivariate nonlinear dynamical behaviors that can be hard to capture by linear autoregressive models. On the other hand, generic nonlinear models such as deep recurrent neural networks often require large amounts of training data, not always available in domains such as brain imaging; they also often lack interpretability. Domain knowledge about the types of dynamics typically observed in such systems, such as a certain class of dynamical systems models, could complement purely data-driven techniques by providing a good prior. In this work, we consider a class of ordinary differential equation (ODE) models known as van der Pol (VDP) oscillators and evaluate their ability to capture a low-dimensional representation of neural activity measured by different brain imaging modalities, such as calcium imaging (CaI) and fMRI, in different living organisms: larval zebrafish, rat, and human. We develop a novel and efficient approach to the nontrivial problem of parameter estimation for a network of coupled dynamical systems from multivariate data and demonstrate that the resulting VDP models are both accurate and interpretable, as VDP's coupling matrix reveals anatomically meaningful excitatory and inhibitory interactions across different brain subsystems. VDP outperforms linear autoregressive models (VAR) in terms of both data-fit accuracy and the quality of insight provided by the coupling matrices, and it often generalizes better to unseen data when predicting future brain activity, being comparable to, and sometimes better than, recurrent neural networks (LSTMs). Finally, we demonstrate that our (generative) VDP model can also serve as a data-augmentation tool, leading to marked improvements in the predictive accuracy of recurrent neural networks. Thus, our work contributes to both basic and applied dimensions of neuroimaging: gaining scientific insights and improving brain-based predictive models, an area of potentially high practical importance in clinical diagnosis and neurotechnology.</span>
Neural Computation 33(8): 2087–2127. doi:10.1162/neco_a_01401

Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting
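A network of coupled van der Pol oscillators, the model class Abrevaya et al. fit to brain data, is easy to simulate. The sketch below uses one plausible additive coupling through a matrix W (with excitatory and inhibitory signs) and a plain Euler integrator; the paper's exact parameterization and its parameter-estimation method are not reproduced here.

```python
import numpy as np

def coupled_vdp(W, mu=1.0, dt=1e-3, steps=20000, x0=None):
    """Euler simulation of n coupled van der Pol oscillators:
        x_i'' = mu * (1 - x_i^2) * x_i' - x_i + sum_j W_ij x_j
    (One plausible coupling form; illustrative only.)"""
    n = W.shape[0]
    x = np.full(n, 0.5) if x0 is None else x0.copy()
    v = np.zeros(n)
    traj = np.empty((steps, n))
    for t in range(steps):
        a = mu * (1 - x ** 2) * v - x + W @ x
        x = x + dt * v
        v = v + dt * a
        traj[t] = x
    return traj

# Weak excitatory (0.2) and inhibitory (-0.2) coupling between two units.
W = np.array([[0.0, 0.2], [-0.2, 0.0]])
traj = coupled_vdp(W)
print(traj.min(), traj.max())  # bounded limit-cycle oscillations
```

The nonlinear damping term mu * (1 - x^2) * x' pumps energy in at small amplitude and dissipates it at large amplitude, so each unit settles onto a bounded limit cycle rather than decaying or blowing up.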
https://direct.mit.edu/neco/article/33/8/2163/101866/Artificial-Neural-Variability-for-Deep-Learning-On
Mon, 26 Jul 2021 00:00:00 GMT
Xie Z, He F, Fu S, et al. <span class="paragraphSection">Deep learning is often criticized for two serious issues that rarely exist in natural nervous systems: overfitting and catastrophic forgetting. A deep network can even memorize randomly labeled data, where there is little knowledge behind the instance-label pairs, and when it continually learns over time by accommodating new tasks, it usually quickly overwrites the knowledge learned from previous tasks. It is well known in neuroscience that human brain reactions exhibit substantial variability even in response to the same stimulus; this phenomenon, referred to as <span style="font-style:italic;">neural variability</span>, balances accuracy and plasticity/flexibility in the motor learning of natural nervous systems. This motivates us to design a similar mechanism, named <span style="font-style:italic;">artificial neural variability</span> (ANV), that helps artificial neural networks inherit some advantages of “natural” neural networks. We rigorously prove that ANV acts as an implicit regularizer of the mutual information between the training data and the learned model. This result theoretically guarantees that ANV strictly improves generalizability, robustness to label noise, and robustness to catastrophic forgetting. We then devise a <span style="font-style:italic;">neural variable risk minimization</span> (NVRM) framework and <span style="font-style:italic;">neural variable optimizers</span> to achieve ANV for conventional network architectures in practice. The empirical studies demonstrate that NVRM can effectively relieve overfitting, label noise memorization, and catastrophic forgetting at negligible costs.</span>
Neural Computation 33(8): 2163–2192. doi:10.1162/neco_a_01403

Storage Capacity of Quaternion-Valued Hopfield Neural Networks With Dual Connections
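The core mechanism behind NVRM-style optimizers in Xie et al. can be sketched as weight-noise injection: evaluate the gradient at a Gaussian-perturbed copy of the weights, then update the clean weights. The linear-regression setting below is a stand-in for illustration, not the authors' exact neural variable optimizer.

```python
import numpy as np

def nvrm_sgd(X, y, sigma=0.01, lr=0.1, epochs=200, seed=0):
    """Sketch of a neural-variable optimizer: perturb weights with
    Gaussian noise before each gradient evaluation, then update the
    clean weights. (Illustrative, on linear least squares.)"""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w_noisy = w + rng.normal(0, sigma, w.shape)   # neural variability
        grad = 2 * X.T @ (X @ w_noisy - y) / len(y)   # gradient at noisy point
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.1 * rng.normal(size=100)
w = nvrm_sgd(X, y)
print(np.abs(w - w_true).max())
```

Despite the injected variability, the clean weights converge close to the generating parameters; the noise acts as the implicit information regularizer the abstract describes rather than preventing learning.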
https://direct.mit.edu/neco/article/33/8/2226/101864/Storage-Capacity-of-Quaternion-Valued-Hopfield
Mon, 26 Jul 2021 00:00:00 GMT
Kobayashi M. <span class="paragraphSection">A complex-valued Hopfield neural network (CHNN) is a multistate Hopfield model. A quaternion-valued Hopfield neural network (QHNN) with a twin-multistate activation function was proposed to reduce the number of weight parameters of a CHNN. Dual connections (DCs) are introduced to QHNNs to improve noise tolerance. The DCs take advantage of the noncommutativity of quaternions and consist of two weights between neurons. A QHNN with DCs provides much better noise tolerance than a CHNN. Although a CHNN and a QHNN with DCs have the same number of weight parameters, the storage capacity of the projection rule for QHNNs with DCs is half of that for CHNNs and equals that of conventional QHNNs. The small storage capacity of QHNNs with DCs is caused by the projection rule, not the architecture. In this work, the Hebbian rule is introduced, and it is proved by stochastic analysis that the storage capacity of a QHNN with DCs is 0.8 times that of a CHNN.</span>
Neural Computation 33(8): 2226–2240. doi:10.1162/neco_a_01405

Power Function Error Initialization Can Improve Convergence of Backpropagation Learning in Neural Networks for Classification
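For readers unfamiliar with Hebbian storage in Hopfield-type networks, here is the real-valued bipolar analogue of the rule Kobayashi analyzes: weights are the outer product of the stored pattern (diagonal zeroed), and a corrupted probe is cleaned up in one synchronous update. The quaternionic twin-multistate version with dual connections is considerably more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = rng.choice([-1.0, 1.0], n)   # one bipolar pattern to store

# Hebbian rule (real-valued analogue of the letter's setting):
W = np.outer(x, x)
np.fill_diagonal(W, 0)           # no self-connections

# Recall from a corrupted probe: flip 5 of the 64 entries.
probe = x.copy()
probe[:5] *= -1
recalled = np.sign(W @ probe)
print((recalled == x).all())
```

With a single stored pattern the overlap term dominates every unit's input, so one update restores the pattern exactly; capacity questions like the 0.8 ratio in the abstract arise when many patterns compete for the same weights.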
https://direct.mit.edu/neco/article/33/8/2193/101861/Power-Function-Error-Initialization-Can-Improve
Mon, 26 Jul 2021 00:00:00 GMT
Knoblauch A. <span class="paragraphSection">Supervised learning corresponds to minimizing a loss or cost function expressing the differences between model predictions y<sub>n</sub> and the target values t<sub>n</sub> given by the training data. In neural networks, this means backpropagating error signals through the transposed weight matrices from the output layer toward the input layer. For this, error signals in the output layer are typically initialized by the difference y<sub>n</sub> − t<sub>n</sub>, which is optimal for several commonly used loss functions like cross-entropy or sum of squared errors. Here I evaluate a more general error initialization method using power functions |y<sub>n</sub> − t<sub>n</sub>|<sup>q</sup> for q &gt; 0, corresponding to a new family of loss functions that generalize cross-entropy. Surprisingly, experiments on various learning tasks reveal that a proper choice of q can significantly improve the speed and convergence of backpropagation learning, in particular in deep and recurrent neural networks. The results suggest two main reasons for the observed improvements. First, compared to cross-entropy, the new loss functions provide better fits to the distribution of error signals in the output layer and therefore maximize the model's likelihood more efficiently. Second, the new error initialization procedure may often provide a better gradient-to-loss ratio over a broad range of neural output activity, thereby avoiding flat loss landscapes with vanishing gradients.</span>
Neural Computation 33(8): 2193–2225. doi:10.1162/neco_a_01407

Randomized Self-Organizing Map
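Knoblauch's error initialization is concrete enough to state directly: the output-layer error signal becomes a signed power of the prediction-target difference, with q = 1 recovering the familiar cross-entropy/softmax case. A minimal sketch (the choice of sign convention and where the signal enters backprop follow standard practice, not a verified detail of the paper):

```python
import numpy as np

def power_error(y, t, q):
    """Generalized output-layer error signal |y - t|^q * sign(y - t).
    q = 1 recovers the usual cross-entropy/softmax initialization y - t."""
    return np.sign(y - t) * np.abs(y - t) ** q

y = np.array([0.9, 0.2, 0.6])   # predictions
t = np.array([1.0, 0.0, 1.0])   # targets

# q < 1 amplifies small errors (keeping gradients alive on flat parts
# of the loss landscape); q > 1 suppresses them.
print(power_error(y, t, 1.0))
print(power_error(y, t, 0.5))
print(power_error(y, t, 2.0))
```

Since output errors satisfy |y − t| ≤ 1 for probabilistic outputs, raising to a power below one enlarges every error signal, which is one way to read the abstract's "better gradient-to-loss ratio" argument.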
https://direct.mit.edu/neco/article/33/8/2241/101860/Randomized-Self-Organizing-Map
Mon, 26 Jul 2021 00:00:00 GMT
Rougier NP, Detorakis GS. <span class="paragraphSection">We propose a variation of the self-organizing map algorithm by considering the random placement of neurons on a two-dimensional manifold, following a blue noise distribution from which various topologies can be derived. These topologies possess random (but controllable) discontinuities that allow for a more flexible self-organization, especially with high-dimensional data. The proposed algorithm is tested on one-, two-, and three-dimensional tasks, as well as on the MNIST handwritten digits data set, and validated using spectral analysis and topological data analysis tools. We also demonstrate the ability of the randomized self-organizing map to gracefully reorganize itself in case of neural lesion and/or neurogenesis.</span>
Neural Computation 33(8): 2241–2273. doi:10.1162/neco_a_01406

Frequency Selectivity of Neural Circuits With Heterogeneous Discrete Transmission Delays
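The one change the randomized SOM of Rougier and Detorakis makes to the classic algorithm is that the neighborhood function runs over distances between randomly placed neuron positions instead of grid coordinates. The sketch below uses uniform random placement as a simplification (the paper uses a blue noise distribution) and standard decaying learning-rate and neighborhood schedules.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, dim = 100, 3

# Random neuron placement on the unit square (paper: blue noise;
# uniform placement here is a simplification).
pos = rng.random((n_neurons, 2))
codebook = np.zeros((n_neurons, dim))

data = 1.0 + rng.random((500, dim))   # training data in [1, 2]^3

def quantization_error(codebook, data):
    d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

qe0 = quantization_error(codebook, data)
for step, x in enumerate(data):
    lr = 0.5 * (1 - step / len(data))          # decaying learning rate
    sig = 0.5 * (1 - step / len(data)) + 0.05  # shrinking neighborhood
    bmu = np.argmin(np.linalg.norm(codebook - x, axis=1))
    # Gaussian neighborhood over the random 2D positions, not a grid.
    h = np.exp(-np.sum((pos - pos[bmu]) ** 2, axis=1) / (2 * sig ** 2))
    codebook += lr * h[:, None] * (x - codebook)
qe1 = quantization_error(codebook, data)
print(qe0, qe1)
```

Because neighborhoods are defined by spatial proximity of the random positions, deleting neurons (lesion) or adding new positions (neurogenesis) changes the topology gracefully, with no grid to repair.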
https://direct.mit.edu/neco/article/33/8/2068/101859/Frequency-Selectivity-of-Neural-Circuits-With
Mon, 26 Jul 2021 00:00:00 GMT
Houben A. <span class="paragraphSection">Neurons are connected to other neurons by axons and dendrites that conduct signals with finite velocities, resulting in delays between the firing of a neuron and the arrival of the resultant impulse at other neurons. Since delays greatly complicate the analytical treatment and interpretation of models, they are usually neglected or taken to be uniform, leaving the effects of heterogeneous delays in neural systems poorly understood. This letter shows that heterogeneous transmission delays make small groups of neurons respond selectively to inputs with differing frequency spectra. By studying a single integrate-and-fire neuron receiving correlated time-shifted inputs, it is shown how the frequency response is linked to both the strengths and delay times of the afferent connections. The results show that incorporating delays alters the functioning of neural networks and changes the effect that neural connections and synaptic strengths have.</span>
Neural Computation 33(8): 2068–2086. doi:10.1162/neco_a_01404
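The frequency selectivity Houben describes is easy to see in a linear caricature of the setting: a neuron summing weighted, time-shifted copies of one input has gain |H(f)| = |Σ w_k · exp(−2πi·f·τ_k)|, which depends on both the strengths and the delays. This linear sketch stands in for the letter's integrate-and-fire analysis.

```python
import numpy as np

def afferent_gain(weights, delays, f):
    """Gain |H(f)| = |sum_k w_k e^{-2 pi i f tau_k}| of a unit summing
    time-shifted copies of a common input (linear sketch)."""
    w, tau = np.asarray(weights), np.asarray(delays)
    return np.abs(np.sum(w * np.exp(-2j * np.pi * f * tau)))

# Two equal-strength afferents, one delayed by 10 ms: the pair passes
# low frequencies but cancels completely at 50 Hz, where the delay
# equals half the input period.
w, tau = [1.0, 1.0], [0.0, 0.010]
print(afferent_gain(w, tau, 0.0), afferent_gain(w, tau, 50.0))
```

Spreading delays across afferents therefore turns a small group of neurons into a comb-like frequency filter, which is the selectivity mechanism the abstract points to.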