Neural Computation Open Issues
https://direct.mit.edu/neco
en-us
Wed, 01 Dec 2021 00:00:00 GMT
Tue, 09 Feb 2021 22:45:58 GMT
Silverchair
editor@direct.mit.edu/neco
webmaster@direct.mit.edu/neco

Passive Nonlinear Dendritic Interactions as a Computational Resource in Spiking Neural Networks
https://direct.mit.edu/neco/article/33/1/96/95671/Passive-Nonlinear-Dendritic-Interactions-as-a
Wed, 01 Dec 2021 00:00:00 GMT
Stöckel A, Eliasmith C. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Nonlinear interactions in the dendritic tree play a key role in neural computation. Nevertheless, modeling frameworks aimed at the construction of large-scale, functional spiking neural networks, such as the Neural Engineering Framework, tend to assume a linear superposition of postsynaptic currents. In this letter, we present a series of extensions to the Neural Engineering Framework that facilitate the construction of networks incorporating Dale's principle and nonlinear conductance-based synapses. We apply these extensions to a two-compartment LIF neuron that can be seen as a simple model of passive dendritic computation. We show that it is possible to incorporate neuron models with input-dependent nonlinearities into the Neural Engineering Framework without compromising high-level function and that nonlinear postsynaptic currents can be systematically exploited to compute a wide variety of multivariate, band-limited functions, including the Euclidean norm, controlled shunting, and nonnegative multiplication. By avoiding an additional source of spike noise, the function approximation accuracy of a single layer of two-compartment LIF neurons is on a par with or even surpasses that of two-layer spiking neural networks up to a certain target function bandwidth.</span> Neural Computation 33(1):96–128. doi:10.1162/neco_a_01338
Flexible Working Memory Through Selective Gating and Attentional Tagging
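The passive dendritic nonlinearity exploited in the Stöckel and Eliasmith letter above can be illustrated with a minimal steady-state sketch of a two-compartment circuit. All constants, names, and the specific circuit below are illustrative assumptions for this sketch, not the paper's actual model:

```python
import math  # not strictly needed here; kept for easy extension

# Assumed reversal potentials (mV) and conductances (arbitrary units).
E_E, E_I, E_L = 0.0, -75.0, -65.0
g_L = 0.05   # dendritic leak conductance
g_C = 0.1    # coupling conductance between dendrite and soma

def dendritic_current(g_E, g_I, v_som=-65.0):
    """Steady-state current flowing from a passive dendritic compartment
    into the soma, given excitatory/inhibitory synaptic conductances."""
    # Equilibrium potential of the dendritic compartment.
    v_den = (g_E * E_E + g_I * E_I + g_L * E_L + g_C * v_som) / (
        g_E + g_I + g_L + g_C
    )
    # Current through the coupling conductance into the soma.
    return g_C * (v_den - v_som)

# Conductances appear in the denominator, so the input-output relation is
# divisive rather than additive: superposition fails.
i_both = dendritic_current(0.2, 0.1)
i_sum = dendritic_current(0.2, 0.0) + dendritic_current(0.0, 0.1)
```

Because the somatic current depends divisively on the total conductance, the response to combined excitation and inhibition differs from the sum of the individual responses, which is the kind of nonlinear interaction the letter harnesses for functions such as shunting and multiplication.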
https://direct.mit.edu/neco/article/33/1/1/95670/Flexible-Working-Memory-Through-Selective-Gating
Wed, 01 Dec 2021 00:00:00 GMT
Kruijne W, Bohte SM, Roelfsema PR, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Working memory is essential: it serves to guide intelligent behavior of humans and nonhuman primates when task-relevant stimuli are no longer present to the senses. Moreover, complex tasks often require that multiple working memory representations can be flexibly and independently maintained, prioritized, and updated according to changing task demands. Thus far, neural network models of working memory have been unable to offer an integrative account of how such control mechanisms can be acquired in a biologically plausible manner. Here, we present WorkMATe, a neural network architecture that models cognitive control over working memory content and learns the appropriate control operations needed to solve complex working memory tasks. Key components of the model include a gated memory circuit that is controlled by internal actions, encoding sensory information through untrained connections, and a neural circuit that matches sensory inputs to memory content. The network is trained by means of a biologically plausible reinforcement learning rule that relies on attentional feedback and reward prediction errors to guide synaptic updates. We demonstrate that the model successfully acquires policies to solve classical working memory tasks, such as delayed recognition and delayed pro-saccade/anti-saccade tasks. In addition, the model solves much more complex tasks, including the hierarchical 12-AX task and the ABAB ordered recognition task, both of which demand that an agent independently store and update multiple items in memory. Furthermore, the control strategies that the model acquires for these tasks subsequently generalize to new task contexts with novel stimuli, thus bringing symbolic production rule qualities to a neural network architecture. As such, WorkMATe provides a new solution for the neural implementation of flexible memory control.</span> Neural Computation 33(1):1–40. doi:10.1162/neco_a_01339
Learning in Volatile Environments With the Bayes Factor Surprise
https://direct.mit.edu/neco/article/33/2/269/95646/Learning-in-Volatile-Environments-With-the-Bayes
Mon, 01 Feb 2021 00:00:00 GMT
Liakoni V, Modirshanechi A, Gerstner W, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Surprise-based learning allows agents to rapidly adapt to nonstationary stochastic environments characterized by sudden changes. We show that exact Bayesian inference in a hierarchical model gives rise to a surprise-modulated trade-off between forgetting old observations and integrating them with the new ones. The modulation depends on a probability ratio, which we call the Bayes Factor Surprise, that tests the prior belief against the current belief. We demonstrate that in several existing approximate algorithms, the Bayes Factor Surprise modulates the rate of adaptation to new observations. We derive three novel surprise-based algorithms, one in the family of particle filters, one in the family of variational learning, and one in the family of message passing, that have constant scaling in observation sequence length and particularly simple update dynamics for any distribution in the exponential family. Empirical results show that these surprise-based algorithms estimate parameters better than alternative approximate approaches and reach levels of performance comparable to computationally more expensive algorithms. The Bayes Factor Surprise is related to but different from the Shannon Surprise. In two hypothetical experiments, we make testable predictions for physiological indicators that dissociate the Bayes Factor Surprise from the Shannon Surprise. The theoretical insight of casting various approaches as surprise-based learning, as well as the proposed online algorithms, may be applied to the analysis of animal and human behavior and to reinforcement learning in nonstationary environments.</span> Neural Computation 33(2):269–340. doi:10.1162/neco_a_01352
Whence the Expected Free Energy?
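The surprise-modulated trade-off described in the Liakoni et al. abstract above can be sketched for a one-dimensional Gaussian estimation problem. The saturating form of the adaptation rate and all constants are illustrative assumptions of this sketch, not the paper's exact derivation:

```python
import math

def gaussian_pdf(x, mu, var=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bayes_factor_surprise(x, mu_prior, mu_belief, var=1.0):
    # Probability of the observation under the naive prior belief,
    # divided by its probability under the current belief.
    return gaussian_pdf(x, mu_prior, var) / gaussian_pdf(x, mu_belief, var)

def surprise_modulated_update(mu_belief, x, mu_prior=0.0, var=1.0, p_change=0.1):
    """One online update of a mean estimate with a surprise-modulated rate."""
    s = bayes_factor_surprise(x, mu_prior, mu_belief, var)
    # m encodes the assumed hazard of an environmental change.
    m = p_change / (1.0 - p_change)
    # Adaptation rate grows with surprise and saturates at 1.
    gamma = m * s / (1.0 + m * s)
    # High surprise: forget the old estimate and restart near the new data.
    # Low surprise: integrate the observation with a small weight.
    return (1.0 - gamma) * mu_belief + gamma * x
```

An unsurprising observation (likely under the current belief) leaves the estimate almost untouched, while an observation that is much more plausible under the prior than under the belief triggers a near-complete reset, mirroring the forgetting/integration trade-off the abstract describes.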
https://direct.mit.edu/neco/article/33/2/447/95645/Whence-the-Expected-Free-Energy
Mon, 01 Feb 2021 00:00:00 GMT
Millidge B, Tschantz A, Buckley CL. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The expected free energy (EFE) is a central quantity in the theory of active inference. It is the quantity that all active inference agents are mandated to minimize through action, and its decomposition into extrinsic and intrinsic value terms is key to the balance of exploration and exploitation that active inference agents evince. Despite its importance, the mathematical origins of this quantity and its relation to the variational free energy (VFE) remain unclear. In this letter, we investigate the origins of the EFE in detail and show that it is not simply “the free energy in the future.” We present a functional that we argue is the natural extension of the VFE but actively discourages exploratory behavior, thus demonstrating that exploration does not directly follow from free energy minimization into the future. We then develop a novel objective, the free energy of the expected future (FEEF), which possesses both the epistemic component of the EFE and an intuitive mathematical grounding as the divergence between predicted and desired futures.</span> Neural Computation 33(2):447–482. doi:10.1162/neco_a_01354
Enhanced Equivalence Projective Simulation: A Framework for Modeling Formation of Stimulus Equivalence Classes
https://direct.mit.edu/neco/article/33/2/483/95644/Enhanced-Equivalence-Projective-Simulation-A
Mon, 01 Feb 2021 00:00:00 GMT
Mofrad A, Yazidi A, Mofrad S, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Formation of stimulus equivalence classes has been recently modeled through equivalence projective simulation (EPS), a modified version of a projective simulation (PS) learning agent. PS is endowed with an episodic memory that resembles the internal representation in the brain and the concept of cognitive maps. PS flexibility and interpretability enable the EPS model and, consequently, the model we explore in this letter, to simulate a broad range of behaviors in matching-to-sample experiments. The episodic memory, the basis for agent decision making, is formed during the training phase. Derived relations in the EPS model that are not trained directly but can be established via the network's connections are computed on demand during the test phase trials by likelihood reasoning. In this letter, we investigate the formation of derived relations in the EPS model using network enhancement (NE), an iterative diffusion process, that yields an offline approach to the agent decision making at the testing phase. The NE process is applied after the training phase to denoise the memory network so that derived relations are formed in the memory network and retrieved during the testing phase. During the NE phase, indirect relations are enhanced, and the structure of episodic memory changes. This approach can also be interpreted as the agent's replay after the training phase, which is in line with recent findings in behavioral and neuroscience studies. In comparison with EPS, our model is able to model the formation of derived relations and other features such as the nodal effect in a more intrinsic manner. Decision making in the test phase is not an ad hoc computational method, but rather a retrieval and update process of the cached relations from the memory network based on the test trial. To study the role of parameters in agent performance, the proposed model is simulated and the results are discussed across various experimental settings.</span> Neural Computation 33(2):483–527. doi:10.1162/neco_a_01346
Stability Conditions of Bicomplex-Valued Hopfield Neural Networks
https://direct.mit.edu/neco/article/33/2/552/95643/Stability-Conditions-of-Bicomplex-Valued-Hopfield
Mon, 01 Feb 2021 00:00:00 GMT
Kobayashi M. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Hopfield neural networks have been extended using hypercomplex numbers. The algebra of bicomplex numbers, also referred to as commutative quaternions, is a number system of dimension 4. Since the multiplication is commutative, many notions and theories of linear algebra, such as the determinant, are available, unlike for quaternions. A bicomplex-valued Hopfield neural network (BHNN) has been proposed as a multistate neural associative memory. However, the stability conditions have been insufficient for the projection rule. In this work, the stability conditions are extended and applied to the improvement of the projection rule. The computer simulations suggest improved noise tolerance.</span> Neural Computation 33(2):552–562. doi:10.1162/neco_a_01350
Deeply Felt Affect: The Emergence of Valence in Deep Active Inference
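The commutativity that distinguishes bicomplex numbers from quaternions in the Kobayashi abstract above can be checked directly. The component formulas below follow from the defining relations i² = j² = −1 and ij = ji = k (so k² = +1); this is a generic sketch of the algebra, not of the BHNN itself:

```python
from dataclasses import dataclass

@dataclass
class Bicomplex:
    # z = a + b*i + c*j + d*k, with i*i = j*j = -1, i*j = j*i = k, k*k = +1.
    a: float
    b: float
    c: float
    d: float

    def __mul__(self, o):
        return Bicomplex(
            self.a * o.a - self.b * o.b - self.c * o.c + self.d * o.d,
            self.a * o.b + self.b * o.a - self.c * o.d - self.d * o.c,
            self.a * o.c + self.c * o.a - self.b * o.d - self.d * o.b,
            self.a * o.d + self.d * o.a + self.b * o.c + self.c * o.b,
        )

z1 = Bicomplex(1.0, 2.0, 3.0, 4.0)
z2 = Bicomplex(0.5, -1.0, 2.5, 0.0)
```

Unlike quaternion multiplication, `z1 * z2 == z2 * z1` holds for any pair, which is what makes determinant-based tools, and hence sharper stability conditions for the projection rule, available.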
https://direct.mit.edu/neco/article/33/2/398/95642/Deeply-Felt-Affect-The-Emergence-of-Valence-in
Mon, 01 Feb 2021 00:00:00 GMT
Hesp C, Smith R, Parr T, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The positive-negative axis of emotional valence has long been recognized as fundamental to adaptive behavior, but its origin and underlying function have largely eluded formal theorizing and computational modeling. Using deep active inference, a hierarchical inference scheme that rests on inverting a model of how sensory data are generated, we develop a principled Bayesian model of emotional valence. This formulation asserts that agents infer their valence state based on the expected precision of their action model—an internal estimate of overall model fitness (“subjective fitness”). This index of subjective fitness can be estimated within any environment and exploits the domain generality of second-order beliefs (beliefs about beliefs). We show how maintaining internal valence representations allows the ensuing affective agent to optimize confidence in action selection preemptively. Valence representations can in turn be optimized by leveraging the (Bayes-optimal) updating term for subjective fitness, which we label affective charge (AC). AC tracks changes in fitness estimates and lends a sign to otherwise unsigned divergences between predictions and outcomes. We simulate the resulting affective inference by subjecting an in silico affective agent to a T-maze paradigm requiring context learning, followed by context reversal. This formulation of affective inference offers a principled account of the link between affect, (mental) action, and implicit metacognition. It characterizes how a deep biological system can infer its affective state and reduce uncertainty about such inferences through internal action (i.e., top-down modulation of priors that underwrite confidence). Thus, we demonstrate the potential of active inference to provide a formal and computationally tractable account of affect. Our demonstration of the face validity and potential utility of this formulation represents the first step within a larger research program. Next, this model can be leveraged to test the hypothesized role of valence by fitting the model to behavioral and neuronal responses.</span> Neural Computation 33(2):398–446. doi:10.1162/neco_a_01341
A Novel Neural Model With Lateral Interaction for Learning Tasks
https://direct.mit.edu/neco/article/33/2/528/95641/A-Novel-Neural-Model-With-Lateral-Interaction-for
Mon, 01 Feb 2021 00:00:00 GMT
Jin D, Qin Z, Yang M, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We propose a novel neural model with lateral interaction for learning tasks. The model consists of two functional fields: an elementary field to extract features and a high-level field to store and recognize patterns. Each field is composed of some neurons with lateral interaction, and the neurons in different fields are connected by the rules of synaptic plasticity. The model is grounded in current research in cognition and neuroscience, making it more transparent and biologically explainable. Our proposed model is applied to data classification and clustering. The corresponding algorithms share similar processes without requiring any parameter tuning and optimization processes. Numerical experiments validate that the proposed model is feasible in different learning tasks and superior to some state-of-the-art methods, especially in small sample learning, one-shot learning, and clustering.</span> Neural Computation 33(2):528–551. doi:10.1162/neco_a_01345
Enhanced Signal Detection by Adaptive Decorrelation of Interspike Intervals
https://direct.mit.edu/neco/article/33/2/341/95640/Enhanced-Signal-Detection-by-Adaptive
Mon, 01 Feb 2021 00:00:00 GMT
Nesse WH, Maler L, Longtin A. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Spike trains with negative interspike interval (ISI) correlations, in which long/short ISIs are more likely followed by short/long ISIs, are common in many neurons. They can be described by stochastic models with a spike-triggered adaptation variable. We analyze a phenomenon in these models where such statistically dependent ISI sequences arise in tandem with quasi-statistically independent and identically distributed (quasi-IID) adaptation variable sequences. The sequences of adaptation states and resulting ISIs are linked by a nonlinear decorrelating transformation. We establish general conditions on a family of stochastic spiking models that guarantee this quasi-IID property and establish bounds on the resulting baseline ISI correlations. Inputs that elicit weak firing rate changes in samples with many spikes are known to be more detectable when negative ISI correlations are present because they reduce spike count variance; this defines a variance-reduced firing rate coding benchmark. We performed a Fisher information analysis on these adapting models exhibiting ISI correlations to show that a spike pattern code based on the quasi-IID property achieves the upper bound of detection performance, surpassing rate codes with the same mean rate—including the variance-reduced rate code benchmark—by 20% to 30%. The information loss in rate codes arises because the benefits of reduced spike count variance cannot compensate for the lower firing rate gain due to adaptation. Since adaptation states have similar dynamics to synaptic responses, the quasi-IID decorrelation transformation of the spike train is plausibly implemented by downstream neurons through matched postsynaptic kinetics. This provides an explanation for observed coding performance in sensory systems that cannot be accounted for by rate coding, for example, at the detection threshold where rate changes can be insignificant.</span> Neural Computation 33(2):341–375. doi:10.1162/neco_a_01347
Predicting the Ease of Human Category Learning Using Radial Basis Function Networks
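The negative ISI correlations produced by spike-triggered adaptation, the starting point of the Nesse, Maler, and Longtin abstract above, can be reproduced with a minimal stochastic simulation. The rate function, constants, and discretization below are illustrative choices for this sketch, not the paper's model family:

```python
import math
import random

random.seed(0)

def simulate_isis(n_spikes=4000, dt=0.01, tau_a=1.0, delta_a=1.0, base=2.0):
    """Spike train whose instantaneous rate is suppressed by a
    spike-triggered adaptation variable a (incremented at each spike,
    decaying exponentially between spikes)."""
    isis, a, t_last, t = [], 0.0, 0.0, 0.0
    while len(isis) < n_spikes:
        t += dt
        a *= math.exp(-dt / tau_a)
        rate = base * math.exp(-a)      # adaptation lowers the firing rate
        if random.random() < rate * dt:
            isis.append(t - t_last)
            t_last = t
            a += delta_a                # spike-triggered increment
    return isis

def serial_correlation(x):
    """Lag-1 serial correlation coefficient of a sequence."""
    mx = sum(x) / len(x)
    num = sum((x[i] - mx) * (x[i + 1] - mx) for i in range(len(x) - 1))
    den = sum((v - mx) ** 2 for v in x)
    return num / den
```

Because adaptation carries over from one interval to the next, a short ISI leaves the neuron more adapted and so lengthens the next ISI, yielding the negative lag-1 correlation the abstract describes.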
https://direct.mit.edu/neco/article/33/2/376/95638/Predicting-the-Ease-of-Human-Category-Learning
Mon, 01 Feb 2021 00:00:00 GMT
Roads BD, Mozer MC. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Our goal is to understand and optimize human concept learning by predicting the ease of learning of a particular exemplar or category. We propose a method for estimating <span style="font-style:italic;">ease values</span>, quantitative measures of ease of learning, as an alternative to conducting costly empirical training studies. Our method combines a psychological embedding of domain exemplars with a pragmatic categorization model. The two components are integrated using a radial basis function network (RBFN) that predicts ease values. The free parameters of the RBFN are fit using human similarity judgments, circumventing the need to collect human training data to fit more complex models of human categorization. We conduct two category-training experiments to validate predictions of the RBFN. We demonstrate that an instance-based RBFN outperforms both a prototype-based RBFN and an empirical approach using the raw data. Although the human data were collected across diverse experimental conditions, the predicted ease values strongly correlate with human learning performance. Training can be sequenced by (predicted) ease, achieving what is known as <span style="font-style:italic;">fading</span> in the psychology literature and <span style="font-style:italic;">curriculum learning</span> in the machine-learning literature, both of which have been shown to facilitate learning.</span> Neural Computation 33(2):376–397. doi:10.1162/neco_a_01349
Conductance-Based Adaptive Exponential Integrate-and-Fire Model
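The RBFN at the core of the Roads and Mozer abstract above has a simple functional form: a prediction is a weighted sum of Gaussian bumps placed at embedded exemplars. The centers, weights, and scale below are illustrative stand-ins for quantities the paper fits from human similarity judgments:

```python
import math

def rbf_ease(x, centers, weights, scale=1.0):
    """Predicted ease value for exemplar x as a weighted sum of radial
    basis functions centered on embedded exemplars."""
    total = 0.0
    for c, w in zip(centers, weights):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        total += w * math.exp(-d2 / (2 * scale ** 2))
    return total

# Two hypothetical embedded exemplars: one "easy" (high weight), one not.
centers = [[0.0, 0.0], [3.0, 3.0]]
weights = [1.0, 0.2]
```

An exemplar embedded near a high-weight center receives a high predicted ease value, so items can be ranked and sequenced from easy to hard, the fading/curriculum use the abstract mentions.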
https://direct.mit.edu/neco/article/33/1/41/95662/Conductance-Based-Adaptive-Exponential-Integrate
Fri, 01 Jan 2021 00:00:00 GMT
Górski T, Depannemaecker D, Destexhe A. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The intrinsic electrophysiological properties of single neurons can be described by a broad spectrum of models, from realistic Hodgkin-Huxley-type models with numerous detailed mechanisms to phenomenological models. The adaptive exponential integrate-and-fire (AdEx) model has emerged as a convenient middle-ground model. With a low computational cost while keeping a biophysical interpretation of the parameters, it has been extensively used for simulations of large neural networks. However, because of its current-based adaptation, it can generate unrealistic behaviors. We show the limitations of the AdEx model, and to avoid them, we introduce the conductance-based adaptive exponential integrate-and-fire model (CAdEx). We give an analysis of the dynamics of the CAdEx model and show the variety of firing patterns it can produce. We propose the CAdEx model as a richer alternative to perform network simulations with simplified models reproducing neuronal intrinsic properties.</span> Neural Computation 33(1):41–66. doi:10.1162/neco_a_01342
NMDA Receptor Alterations After Mild Traumatic Brain Injury Induce Deficits in Memory Acquisition and Recall
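The key change the Górski, Depannemaecker, and Destexhe abstract above describes, replacing AdEx's current-based adaptation with a conductance, can be sketched with a simple Euler integration. Units are dimensionless and every parameter value is an illustrative assumption, not one of the paper's fitted constants:

```python
import math

# Assumed, dimensionless parameters for this sketch.
E_L, V_T, D_T = -70.0, -50.0, 2.0    # leak reversal, threshold, slope factor
E_A, V_A, D_A = -70.0, -50.0, 5.0    # adaptation reversal, activation curve
G_A_MAX, TAU_A = 2.0, 100.0          # max adaptation conductance, time const
V_SPIKE, V_RESET = -30.0, -55.0

def simulate(i_ext, t_max=300.0, dt=0.01):
    """Integrate a conductance-based adaptive exponential IF neuron."""
    v, g_a, spikes = E_L, 0.0, 0
    for _ in range(int(t_max / dt)):
        # Leak + exponential spike-initiation current + adaptation + input.
        dv = -(v - E_L) + D_T * math.exp((v - V_T) / D_T) \
             - g_a * (v - E_A) + i_ext
        # Sigmoidal steady-state adaptation conductance: always nonnegative,
        # so the adaptation current cannot take unphysical values.
        g_inf = G_A_MAX / (1.0 + math.exp((V_A - v) / D_A))
        g_a += dt * (g_inf - g_a) / TAU_A
        v += dt * dv
        if v >= V_SPIKE:
            v = V_RESET
            spikes += 1
    return spikes, g_a
```

The design point this illustrates: because the adaptation variable is a conductance multiplying a driving force `(v - E_A)`, its effect vanishes as the membrane approaches the adaptation reversal potential, avoiding the unbounded hyperpolarization a fixed adaptation current can produce.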
https://direct.mit.edu/neco/article/33/1/67/95661/NMDA-Receptor-Alterations-After-Mild-Traumatic
Fri, 01 Jan 2021 00:00:00 GMT
Gabrieli D, Schumm SN, Vigilante NF, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Mild traumatic brain injury (mTBI) presents a significant health concern with potential persisting deficits that can last decades. Although a growing body of literature improves our understanding of the brain network response and corresponding underlying cellular alterations after injury, the effects of cellular disruptions on local circuitry after mTBI are poorly understood. Our group recently reported how mTBI in neuronal networks affects the functional wiring of neural circuits and how neuronal inactivation influences the synchrony of coupled microcircuits. Here, we utilized a computational neural network model to investigate the circuit-level effects of N-methyl D-aspartate receptor dysfunction. The initial increase in activity in injured neurons spreads to downstream neurons, but this increase was partially reduced by restructuring the network with spike-timing-dependent plasticity. As a model of network-based learning, we also investigated how injury alters pattern acquisition, recall, and maintenance of a conditioned response to stimulus. Although pattern acquisition and maintenance were impaired in injured networks, the greatest deficits arose in recall of previously trained patterns. These results demonstrate how one specific mechanism of cellular-level damage in mTBI affects the overall function of a neural network and point to the importance of reversing cellular-level changes to recover important properties of learning and memory in a microcircuit.</span> Neural Computation 33(1):67–95. doi:10.1162/neco_a_01343
Robust Stability Analysis of Delayed Stochastic Neural Networks via Wirtinger-Based Integral Inequality
https://direct.mit.edu/neco/article/33/1/227/95660/Robust-Stability-Analysis-of-Delayed-Stochastic
Fri, 01 Jan 2021 00:00:00 GMT
Suresh RR, Manivannan AA. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We discuss stability analysis for uncertain stochastic neural networks (SNNs) with time delay in this letter. By constructing a suitable Lyapunov-Krasovskii functional (LKF) and utilizing the Wirtinger-based integral inequality to estimate the integral terms, the delay-dependent stochastic stability conditions are derived in terms of linear matrix inequalities (LMIs). We discuss the parameter uncertainties in terms of norm-bounded conditions in the given interval with constant delay. The derived conditions ensure the global asymptotic stability of the states of the proposed SNNs. We verify the effectiveness and applicability of the proposed criteria with numerical examples.</span> Neural Computation 33(1):227–243. doi:10.1162/neco_a_01344
Information-Theoretic Representation Learning for Positive-Unlabeled Classification
https://direct.mit.edu/neco/article/33/1/244/95659/Information-Theoretic-Representation-Learning-for
Fri, 01 Jan 2021 00:00:00 GMT
Sakai T, Niu G, Sugiyama M. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Recent advances in weakly supervised classification allow us to train a classifier from only positive and unlabeled (PU) data. However, existing PU classification methods typically require an accurate estimate of the class-prior probability, a critical bottleneck particularly for high-dimensional data. This problem has been commonly addressed by applying principal component analysis in advance, but such unsupervised dimension reduction can collapse the underlying class structure. In this letter, we propose a novel representation learning method from PU data based on the information-maximization principle. Our method does not require class-prior estimation and thus can be used as a preprocessing method for PU classification. Through experiments, we demonstrate that our method, combined with deep neural networks, substantially improves the accuracy of PU class-prior estimation, leading to state-of-the-art PU classification performance.</span> Neural Computation 33(1):244–268. doi:10.1162/neco_a_01337
An EM Algorithm for Capsule Regression
https://direct.mit.edu/neco/article/33/1/194/95658/An-EM-Algorithm-for-Capsule-Regression
Fri, 01 Jan 2021 00:00:00 GMT
Saul LK. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We investigate a latent variable model for multinomial classification inspired by recent capsule architectures for visual object recognition (Sabour, Frosst, & Hinton, <a href="#B32" class="reflinks">2017</a>). Capsule architectures use vectors of hidden unit activities to encode the pose of visual objects in an image, and they use the lengths of these vectors to encode the probabilities that objects are present. Probabilities from different capsules can also be propagated through deep multilayer networks to model the part-whole relationships of more complex objects. Notwithstanding the promise of these networks, there still remains much to understand about capsules as primitive computing elements in their own right. In this letter, we study the problem of capsule regression—a higher-dimensional analog of logistic, probit, and softmax regression in which class probabilities are derived from vectors of competing magnitude. To start, we propose a simple capsule architecture for multinomial classification: the architecture has one capsule per class, and each capsule uses a weight matrix to compute the vector of hidden unit activities for patterns it seeks to recognize. Next, we show how to model these hidden unit activities as latent variables, and we use a squashing nonlinearity to convert their magnitudes as vectors into normalized probabilities for multinomial classification. When different capsules compete to recognize the same pattern, the squashing nonlinearity induces nongaussian terms in the posterior distribution over their latent variables. Nevertheless, we show that exact inference remains tractable and use an expectation-maximization procedure to derive least-squares updates for each capsule's weight matrix. We also present experimental results to demonstrate how these ideas work in practice.</span> Neural Computation 33(1):194–226. doi:10.1162/neco_a_01336
Efficient Actor-Critic Reinforcement Learning With Embodiment of Muscle Tone for Posture Stabilization of the Human Arm
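The forward pass described in Saul's abstract above, one weight matrix per class and a squashing nonlinearity on vector magnitudes, can be sketched in a few lines. The final normalization across capsules is a simplification assumed here for illustration; the paper derives probabilities within a latent variable model rather than by this direct renormalization:

```python
def squash_prob(v):
    """Squashing nonlinearity: maps a vector's squared magnitude into (0, 1)."""
    n2 = sum(x * x for x in v)
    return n2 / (1.0 + n2)

def capsule_class_probs(x, weight_matrices):
    """One capsule (weight matrix) per class; class scores come from the
    squashed magnitudes of each capsule's hidden activity vector."""
    scores = []
    for W in weight_matrices:
        h = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
        scores.append(squash_prob(h))
    z = sum(scores)  # simple renormalization across competing capsules
    return [s / z for s in scores]

# Hypothetical 2-class example: identity-like capsule vs. a weak one.
probs = capsule_class_probs(
    [1.0, 2.0],
    [[[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.0], [0.0, 0.1]]],
)
```

The capsule whose weight matrix produces the longer hidden vector wins, which is the "competing magnitudes" mechanism the abstract contrasts with logistic and softmax regression.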
https://direct.mit.edu/neco/article/33/1/129/95657/Efficient-Actor-Critic-Reinforcement-Learning-With
Fri, 01 Jan 2021 00:00:00 GMT
Iwamoto M, Kato D. <span class="paragraphSection"><div class="boxTitle">Abstract</div>This letter proposes a new idea to improve learning efficiency in reinforcement learning (RL) with the actor-critic method used as a muscle controller for posture stabilization of the human arm. Actor-critic RL (ACRL) is used for simulations to realize posture controls in humans or robots using muscle tension control. However, it requires very high computational costs to acquire a better muscle control policy for desirable postures. For efficient ACRL, we focused on embodiment, which is thought to enable efficient control in artificial intelligence and robotics. According to the neurophysiology of motion control obtained from experimental studies using animals or humans, the pedunculopontine tegmental nucleus (PPTn) induces muscle tone suppression, and the midbrain locomotor region (MLR) induces muscle tone promotion. PPTn and MLR modulate the activation levels of mutually antagonizing muscles such as flexors and extensors in a process through which control signals are translated from the substantia nigra reticulata to the brain stem. Therefore, we hypothesized that the PPTn and MLR could control muscle tone, that is, the maximum values of activation levels of mutually antagonizing muscles using different sigmoidal functions for each muscle; then we introduced antagonism function models (AFMs) of PPTn and MLR for individual muscles, incorporating the hypothesis into the process to determine the activation level of each muscle based on the output of the actor in ACRL. ACRL with AFMs representing the embodiment of muscle tone successfully achieved posture stabilization in five joint motions of the right arm of a human adult male under gravity in predetermined target angles at an earlier period of learning than the learning methods without AFMs. The results obtained from this study suggest that the introduction of embodiment of muscle tone can enhance learning efficiency in posture stabilization disorders of humans or humanoid robots.</span> Neural Computation 33(1):129–156. doi:10.1162/neco_a_01333
New Insights Into Learning With Correntropy-Based Regression
https://direct.mit.edu/neco/article/33/1/157/95656/New-Insights-Into-Learning-With-Correntropy-Based
Fri, 01 Jan 2021 00:00:00 GMT
Feng Y. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Stemming from information-theoretic learning, the correntropy criterion and its applications to machine learning tasks have been extensively studied and explored. Its application to regression problems leads to the robustness-enhanced regression paradigm: correntropy-based regression. Alongside a great variety of successful real-world applications, its theoretical properties have also been investigated recently in a series of studies from a statistical learning viewpoint. The resulting big picture is that correntropy-based regression regresses toward the conditional mode function or the conditional mean function robustly under certain conditions. Continuing this trend and going further, in this study, we report some new insights into this problem. First, we show that under the additive noise regression model, such a regression paradigm can be deduced from minimum distance estimation, implying that the resulting estimator is essentially a minimum distance estimator and thus possesses robustness properties. Second, we show that the regression paradigm in fact provides a unified approach to regression problems in that it approaches the conditional mean, the conditional mode, and the conditional median functions under certain conditions. Third, we present some new results when it is used to learn the conditional mean function by developing its error bounds and exponential convergence rates under conditional (1+ε)-moment assumptions. The saturation effect on the established convergence rates, which was observed under (1+ε)-moment assumptions, still occurs, indicating the inherent bias of the regression estimator. These novel insights deepen our understanding of correntropy-based regression, help cement the theoretic correntropy framework, and enable us to investigate learning schemes induced by general bounded nonconvex loss functions.</span> Neural Computation 33(1):157–173. doi:10.1162/neco_a_01334
Associated Learning: Decomposing End-to-End Backpropagation Based on Autoencoders and Target Propagation
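The robustness property at the center of the Feng abstract above comes from the bounded, nonconvex correntropy-induced loss: residuals far from zero contribute almost nothing to the gradient. A minimal sketch for a one-parameter location estimate, with all constants (bandwidth, learning rate, starting point) chosen for illustration:

```python
import math

def correntropy_loss_grad(theta, ys, sigma=1.0):
    """Gradient of sum_i sigma^2 * (1 - exp(-(y_i - theta)^2 / (2 sigma^2)))
    with respect to theta; large residuals are exponentially down-weighted."""
    return -sum(
        (y - theta) * math.exp(-((y - theta) ** 2) / (2 * sigma ** 2))
        for y in ys
    )

def robust_location(ys, sigma=1.0, lr=0.1, iters=300):
    """Gradient descent on the correntropy-induced loss. We start from the
    median so the inliers carry nonvanishing weight at initialization."""
    theta = sorted(ys)[len(ys) // 2]
    for _ in range(iters):
        theta -= lr * correntropy_loss_grad(theta, ys, sigma)
    return theta

data = [4.8, 4.9, 5.0, 5.1, 5.2, 100.0]  # five inliers and one gross outlier
```

Unlike the least-squares mean, which the outlier drags far from the inliers, the correntropy estimate stays near the bulk of the data, illustrating the minimum-distance-estimator robustness the abstract establishes.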
https://direct.mit.edu/neco/article/33/1/174/95655/Associated-Learning-Decomposing-End-to-End
Fri, 01 Jan 2021 00:00:00 GMTKao Y, Chen H. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Backpropagation (BP) is the cornerstone of today's deep learning algorithms, but it is inefficient partly because of backward locking, which means updating the weights of one layer locks the weight updates in the other layers. Consequently, it is challenging to apply parallel computing or a pipeline structure to update the weights in different layers simultaneously. In this letter, we introduce a novel learning structure, associated learning (AL), that modularizes the network into smaller components, each of which has a local objective. Because the objectives are mutually independent, AL can learn the parameters in different layers independently and simultaneously, so it is feasible to apply a pipeline structure to improve the training throughput. Specifically, this pipeline structure reduces the training time complexity from O(nℓ), which is the time complexity when using BP and stochastic gradient descent (SGD) for training, to O(n+ℓ), where n is the number of training instances and ℓ is the number of hidden layers. Surprisingly, even though most of the parameters in AL do not directly interact with the target variable, training deep models by this method yields accuracies comparable to those from models trained using typical BP methods, in which all parameters are used to predict the target variable. Consequently, because of the scalability and the predictive power demonstrated in the experiments, AL deserves further study to determine good hyperparameter settings, such as activation function selection, learning rate scheduling, and weight initialization, to accumulate experience, as we have done over the years with the typical BP method. In addition, perhaps our design can also inspire new network designs for deep learning. 
Our implementation is available at <a href="https://github.com/SamYWK/Associated_Learning">https://github.com/SamYWK/Associated_Learning</a>.</span>33117419310.1162/neco_a_01335https://direct.mit.edu/neco/article/33/1/174/95655/Associated-Learning-Decomposing-End-to-EndActive Learning for Level Set Estimation Under Input Uncertainty and Its Extensions
https://direct.mit.edu/neco/article/32/12/2486/95654/Active-Learning-for-Level-Set-Estimation-Under
Tue, 01 Dec 2020 00:00:00 GMTInatsu Y, Karasuyama M, Inoue K, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Testing under what conditions a product satisfies the desired properties is a fundamental problem in the manufacturing industry. If the condition and the property are respectively regarded as the input and the output of a black-box function, this task can be interpreted as level set estimation (LSE): the problem of identifying input regions such that the function value is above (or below) a threshold. Although various methods for LSE problems have been developed, many issues remain to be solved for their practical use. As one such issue, we consider the case where the input conditions cannot be controlled precisely—LSE problems under input uncertainty. We introduce a basic framework for handling input uncertainty in LSE problems and then propose efficient methods with proper theoretical guarantees. The proposed methods and theories can be generally applied to a variety of challenges related to LSE under input uncertainty such as cost-dependent input uncertainties and unknown input uncertainties. We apply the proposed methods to artificial and real data to demonstrate their applicability and effectiveness.</span>32122486253110.1162/neco_a_01332https://direct.mit.edu/neco/article/32/12/2486/95654/Active-Learning-for-Level-Set-Estimation-UnderResonator Networks, 2: Factorization Performance and Capacity Compared to Optimization-Based Methods
https://direct.mit.edu/neco/article/32/12/2332/95653/Resonator-Networks-2-Factorization-Performance-and
Tue, 01 Dec 2020 00:00:00 GMTKent SJ, Frady E, Sommer FT, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We develop theoretical foundations of resonator networks, a new type of recurrent neural network introduced in Frady, Kent, Olshausen, and Sommer (<a href="#B22" class="reflinks">2020</a>), a companion article in this issue, to solve a high-dimensional vector factorization problem arising in Vector Symbolic Architectures. Given a composite vector formed by the Hadamard product between a discrete set of high-dimensional vectors, a resonator network can efficiently decompose the composite into these factors. We compare the performance of resonator networks against optimization-based methods, including Alternating Least Squares and several gradient-based algorithms, showing that resonator networks are superior in several important ways. This advantage is achieved by leveraging a combination of nonlinear dynamics and searching in superposition, by which estimates of the correct solution are formed from a weighted superposition of all possible solutions. While the alternative methods also search in superposition, the dynamics of resonator networks allow them to strike a more effective balance between exploring the solution space and exploiting local information to drive the network toward probable solutions. Resonator networks are not guaranteed to converge, but within a particular regime they almost always do. In exchange for relaxing the guarantee of global convergence, resonator networks are dramatically more effective at finding factorizations than all alternative approaches considered.</span>32122332238810.1162/neco_a_01329https://direct.mit.edu/neco/article/32/12/2332/95653/Resonator-Networks-2-Factorization-Performance-andRedundancy-Aware Pruning of Convolutional Neural Networks
https://direct.mit.edu/neco/article/32/12/2532/95652/Redundancy-Aware-Pruning-of-Convolutional-Neural
Tue, 01 Dec 2020 00:00:00 GMTXie G. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Pruning is an effective way to slim and speed up convolutional neural networks. Generally, previous work directly pruned neural networks in the original feature space without considering the correlation of neurons. We argue that such pruning still leaves some redundancy in the pruned networks. In this letter, we propose to prune in the intermediate space in which the correlation of neurons is eliminated. To achieve this goal, the input and output of a convolutional layer are first mapped to an intermediate space by an orthogonal transformation. Then neurons are evaluated and pruned in the intermediate space. Extensive experiments have shown that our redundancy-aware pruning method surpasses state-of-the-art pruning methods in both efficiency and accuracy. Notably, using our redundancy-aware pruning method, ResNet models with a threefold speed-up achieve competitive performance with fewer floating point operations even compared to DenseNet.</span>32122532255610.1162/neco_a_01330https://direct.mit.edu/neco/article/32/12/2532/95652/Redundancy-Aware-Pruning-of-Convolutional-NeuralResonator Networks, 1: An Efficient Solution for Factoring High-Dimensional, Distributed Representations of Data Structures
https://direct.mit.edu/neco/article/32/12/2311/95651/Resonator-Networks-1-An-Efficient-Solution-for
Tue, 01 Dec 2020 00:00:00 GMTFrady E, Kent SJ, Olshausen BA, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The ability to encode and manipulate data structures with distributed neural representations could qualitatively enhance the capabilities of traditional neural networks by supporting rule-based symbolic reasoning, a central property of cognition. Here we show how this may be accomplished within the framework of Vector Symbolic Architectures (VSAs) (Plate, <a href="#B38" class="reflinks">1991</a>; Gayler, <a href="#B15" class="reflinks">1998</a>; Kanerva, <a href="#B23" class="reflinks">1996</a>), whereby data structures are encoded by combining high-dimensional vectors with operations that together form an algebra on the space of distributed representations. In particular, we propose an efficient solution to a hard combinatorial search problem that arises when decoding elements of a VSA data structure: the factorization of products of multiple codevectors. Our proposed algorithm, called a resonator network, is a new type of recurrent neural network that interleaves VSA multiplication operations and pattern completion. We show in two examples—parsing of a tree-like data structure and parsing of a visual scene—how the factorization problem arises and how the resonator network can solve it. More broadly, resonator networks open the possibility of applying VSAs to myriad artificial intelligence problems in real-world domains. The companion article in this issue (Kent, Frady, Sommer, & Olshausen, <a href="#B27" class="reflinks">2020</a>) presents a rigorous analysis and evaluation of the performance of resonator networks, showing it outperforms alternative approaches.</span>32122311233110.1162/neco_a_01331https://direct.mit.edu/neco/article/32/12/2311/95651/Resonator-Networks-1-An-Efficient-Solution-forDifferential Covariance: A New Method to Estimate Functional Connectivity in fMRI
https://direct.mit.edu/neco/article/32/12/2389/95650/Differential-Covariance-A-New-Method-to-Estimate
Tue, 01 Dec 2020 00:00:00 GMTLin TW, Chen Y, Bukhari Q, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Measuring functional connectivity from fMRI recordings is important in understanding processing in cortical networks. However, because the brain's connection pattern is complex, currently used methods are prone to producing false functional connections. We introduce differential covariance analysis, a new method that uses derivatives of the signal for estimating functional connectivity. We generated neural activities from dynamical causal modeling and a neural network of Hodgkin-Huxley neurons and then converted them to hemodynamic signals using the forward balloon model. The simulated fMRI signals, together with the ground-truth connectivity pattern, were used to benchmark our method with other commonly used methods. Differential covariance achieved better results in complex network simulations. This new method opens an alternative way to estimate functional connectivity.</span>32122389242110.1162/neco_a_01323https://direct.mit.edu/neco/article/32/12/2389/95650/Differential-Covariance-A-New-Method-to-EstimateSynchrony and Complexity in State-Related EEG Networks: An Application of Spectral Graph Theory
https://direct.mit.edu/neco/article/32/12/2422/95649/Synchrony-and-Complexity-in-State-Related-EEG
Tue, 01 Dec 2020 00:00:00 GMTGhaderi A, Baltaretu BR, Andevari M, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The brain may be considered as a synchronized dynamic network with several coherent dynamical units. However, concerns remain as to whether synchronizability is a stable state in the brain networks. If so, which index can best reveal the synchronizability in brain networks? To answer these questions, we tested the application of the spectral graph theory and the Shannon entropy as alternative approaches in neuroimaging. We specifically tested the alpha rhythm in the resting-state eye closed (rsEC) and the resting-state eye open (rsEO) conditions, a well-studied classical example of synchrony in neuroimaging EEG. Since the synchronizability of alpha rhythm is more stable during the rsEC than the rsEO, we hypothesized that our suggested spectral graph theory indices (as reliable measures to interpret the synchronizability of brain signals) should exhibit higher values in the rsEC than the rsEO condition. We performed two separate analyses of two different datasets (as elementary and confirmatory studies). Based on the results of both studies and in agreement with our hypothesis, the spectral graph indices revealed higher stability of synchronizability in the rsEC condition. The k-means analysis indicated that the spectral graph indices can distinguish the rsEC and rsEO conditions by considering the synchronizability of brain networks. We also computed correlations among the spectral indices, the Shannon entropy, and the topological indices of brain networks, as well as random networks. Correlation analysis indicated that although the spectral and the topological properties of random networks are completely independent, these features are significantly correlated with each other in brain networks. Furthermore, we found that complexity in the investigated brain networks is inversely related to the stability of synchronizability. 
In conclusion, we revealed that the spectral graph theory approach can be reliably applied to study the stability of synchronizability of state-related brain networks.</span>32122422245410.1162/neco_a_01327https://direct.mit.edu/neco/article/32/12/2422/95649/Synchrony-and-Complexity-in-State-Related-EEGAnalyzing and Accelerating the Bottlenecks of Training Deep SNNs With Backpropagation
https://direct.mit.edu/neco/article/32/12/2557/95648/Analyzing-and-Accelerating-the-Bottlenecks-of
Tue, 01 Dec 2020 00:00:00 GMTChen R, Li L. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Spiking neural networks (SNNs) with the event-driven manner of transmitting spikes consume ultra-low power on neuromorphic chips. However, training deep SNNs is still challenging compared to convolutional neural networks (CNNs). Existing SNN training algorithms have not achieved the same performance as CNNs. In this letter, we aim to understand the intrinsic limitations of SNN training to design better algorithms. First, the pros and cons of typical SNN training algorithms are analyzed. Then it is found that the spatiotemporal backpropagation algorithm (STBP) has potential in training deep SNNs due to its simplicity and fast convergence. Later, the main bottlenecks of the STBP algorithm are analyzed, and three conditions for training deep SNNs with the STBP algorithm are derived. By analyzing the connection between CNNs and SNNs, we propose a weight initialization algorithm to satisfy the three conditions. Moreover, we propose an error minimization method and a modified loss function to further improve the training performance. Experimental results show that the proposed method achieves 91.53% accuracy on the CIFAR10 data set with a 1% accuracy increase over the STBP algorithm and decreases the training epochs on the MNIST data set to 15 epochs (a more than 13-fold speed-up compared to the STBP algorithm). The proposed method also decreases classification latency by a factor of more than 25 compared to the CNN-SNN conversion algorithms. In addition, the proposed method works robustly for very deep SNNs, while the STBP algorithm fails in a 19-layer SNN.</span>32122557260010.1162/neco_a_01319https://direct.mit.edu/neco/article/32/12/2557/95648/Analyzing-and-Accelerating-the-Bottlenecks-ofToward a Unified Framework for Cognitive Maps
https://direct.mit.edu/neco/article/32/12/2455/95647/Toward-a-Unified-Framework-for-Cognitive-Maps
Tue, 01 Dec 2020 00:00:00 GMTKim W, Yoo Y. <span class="paragraphSection"><div class="boxTitle">Abstract</div>In this study, we integrated neural encoding and decoding into a unified framework for spatial information processing in the brain. Specifically, the neural representations of self-location in the hippocampus (HPC) and entorhinal cortex (EC) play crucial roles in spatial navigation. Intriguingly, these neural representations in these neighboring brain areas show stark differences. Whereas the place cells in the HPC fire as a unimodal function of spatial location, the grid cells in the EC show periodic tuning curves with different periods for different subpopulations (called modules). By combining an encoding model for this modular neural representation and a realistic decoding model based on belief propagation, we investigated the manner in which self-location is encoded by neurons in the EC and then decoded by downstream neurons in the HPC. Through the results of numerical simulations, we first show the positive synergy effects of the modular structure in the EC. The modular structure introduces more coupling between heterogeneous modules with different periodicities, which provides increased error-correcting capabilities. This is also demonstrated through a comparison of the beliefs produced for decoding two- and four-module codes. Whereas the former resulted in a complete decoding failure, the latter correctly recovered the self-location even from the same inputs. Further analysis of belief propagation during decoding revealed complex dynamics in information updates due to interactions among multiple modules having diverse scales. 
Therefore, the proposed unified framework allows one to investigate the overall flow of spatial information, closing the loop of encoding and decoding self-location in the brain.</span>32122455248510.1162/neco_a_01326https://direct.mit.edu/neco/article/32/12/2455/95647/Toward-a-Unified-Framework-for-Cognitive-MapsEffect of Top-Down Connections in Hierarchical Sparse Coding
https://direct.mit.edu/neco/article/32/11/2279/95639/Effect-of-Top-Down-Connections-in-Hierarchical
Sun, 01 Nov 2020 00:00:00 GMTBoutin V, Franciosini A, Ruffier F, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Hierarchical sparse coding (HSC) is a powerful model to efficiently represent multidimensional, structured data such as images. The simplest solution to solve this computationally hard problem is to decompose it into independent layer-wise subproblems. However, neuroscientific evidence would suggest interconnecting these subproblems as in predictive coding (PC) theory, which adds top-down connections between consecutive layers. In this study, we introduce a new model, 2-layer sparse predictive coding (2L-SPC), to assess the impact of this interlayer feedback connection. In particular, the 2L-SPC is compared with a hierarchical Lasso (Hi-La) network made out of a sequence of independent Lasso layers. The 2L-SPC and the 2-layer Hi-La networks are trained on four different databases and with different sparsity parameters on each layer. First, we show that the overall prediction error generated by 2L-SPC is lower, thanks to the feedback mechanism that transfers prediction error between layers. Second, we demonstrate that the inference stage of the 2L-SPC is faster to converge and generates a refined representation in the second layer compared to the Hi-La model. Third, we show that the 2L-SPC top-down connection accelerates the learning process of the HSC problem. Finally, the analysis of the emerging dictionaries shows that the 2L-SPC features are more generic and present a larger spatial extension.</span>32112279230910.1162/neco_a_01325https://direct.mit.edu/neco/article/32/11/2279/95639/Effect-of-Top-Down-Connections-in-HierarchicalAssessing Goodness-of-Fit in Marked Point Process Models of Neural Population Coding via Time and Rate Rescaling
https://direct.mit.edu/neco/article/32/11/2145/95637/Assessing-Goodness-of-Fit-in-Marked-Point-Process
Sun, 01 Nov 2020 00:00:00 GMTYousefi A, Amidi Y, Nazari B, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Marked point process models have recently been used to capture the coding properties of neural populations from multiunit electrophysiological recordings without spike sorting. These clusterless models have been shown in some instances to better describe the firing properties of neural populations than collections of receptive field models for sorted neurons and to lead to better decoding results. To assess their quality, we previously proposed a goodness-of-fit technique for marked point process models based on time rescaling, which for a correct model produces a set of uniform samples over a random region of space. However, assessing uniformity over such a region can be challenging, especially in high dimensions. Here, we propose a set of new transformations in both time and the space of spike waveform features, which generate events that are uniformly distributed in the new mark and time spaces. These transformations are scalable to multidimensional mark spaces and provide uniformly distributed samples in hypercubes, which are well suited for uniformity tests. We discuss the properties of these transformations and demonstrate aspects of model fit captured by each transformation. We also compare multiple uniformity tests to determine their power to identify lack-of-fit in the rescaled data. We demonstrate an application of these transformations and uniformity tests in a simulation study. Proofs for each transformation are provided in the appendix.</span>32112145218610.1162/neco_a_01321https://direct.mit.edu/neco/article/32/11/2145/95637/Assessing-Goodness-of-Fit-in-Marked-Point-ProcessInferring Neuronal Couplings From Spiking Data Using a Systematic Procedure With a Statistical Criterion
https://direct.mit.edu/neco/article/32/11/2187/95636/Inferring-Neuronal-Couplings-From-Spiking-Data
Sun, 01 Nov 2020 00:00:00 GMTTerada Y, Obuchi T, Isomura T, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Recent remarkable advances in experimental techniques have provided a background for inferring neuronal couplings from point process data that include a great number of neurons. Here, we propose a systematic procedure for pre- and postprocessing generic point process data in an objective manner to handle data in the framework of a simple binary statistical model, the Ising or generalized McCulloch–Pitts model. The procedure has two steps: (1) determining time bin size for transforming the point process data into discrete-time binary data and (2) screening relevant couplings from the estimated couplings. For the first step, we decide the optimal time bin size by introducing the null hypothesis that all neurons would fire independently, then choosing a time bin size so that the null hypothesis is rejected under strict criteria. The likelihood associated with the null hypothesis is analytically evaluated and used for the rejection process. For the second postprocessing step, after a certain estimator of coupling is obtained based on the preprocessed data set (any estimator can be used with the proposed procedure), the estimate is compared with many other estimates derived from data sets obtained by randomizing the original data set in the time direction. We accept the original estimate as relevant only if its absolute value is sufficiently larger than those of randomized data sets. These manipulations suppress false positive couplings induced by statistical noise. We apply this inference procedure to spiking data from synthetic and in vitro neuronal networks. The results show that the proposed procedure identifies the presence or absence of synaptic couplings fairly well, including their signs, for the synthetic and experimental data. 
In particular, the results support that we can infer the physical connections of underlying systems in favorable situations, even when using a simple statistical model.</span>32112187221110.1162/neco_a_01324https://direct.mit.edu/neco/article/32/11/2187/95636/Inferring-Neuronal-Couplings-From-Spiking-DataRepetitive Control for Multi-Joint Arm Movements Based on Virtual Trajectories
https://direct.mit.edu/neco/article/32/11/2212/95635/Repetitive-Control-for-Multi-Joint-Arm-Movements
Sun, 01 Nov 2020 00:00:00 GMTUno Y, Suzuki T, Kagawa T. <span class="paragraphSection"><div class="boxTitle">Abstract</div>According to the neuromuscular model of virtual trajectory control, the postures and movements of limbs are performed by shifting the equilibrium positions determined by agonist and antagonist muscle activities. In this study, we develop virtual trajectory control for the reaching movements of a multi-joint arm, introducing a proportional-derivative feedback control scheme. In virtual trajectory control, it is crucial to design a suitable virtual trajectory such that the desired trajectory can be realized. To this end, we propose an algorithm for updating virtual trajectories in repetitive control, which can be regarded as a Newton-like method in a function space. In our repetitive control, the virtual trajectory is corrected without explicit calculation of the arm dynamics, and the actual trajectory converges to the desired trajectory. Using computer simulations, we assessed the proposed repetitive control for the trajectory tracking of a two-link arm. Our results confirmed that when the feedback gains were reasonably high and the sampling time was sufficiently small, the virtual trajectory was adequately updated, and the desired trajectory was almost achieved within approximately 10 iterative trials. We also propose a method for modifying the virtual trajectory to ensure that the formation of the actual trajectory is identical even when the feedback gains are changed. This modification method makes it possible to execute flexible control, in which the feedback gains are effectively altered according to motion tasks.</span>32112212223610.1162/neco_a_01322https://direct.mit.edu/neco/article/32/11/2212/95635/Repetitive-Control-for-Multi-Joint-Arm-MovementsReverse-Engineering Neural Networks to Characterize Their Cost Functions
https://direct.mit.edu/neco/article/32/11/2085/95634/Reverse-Engineering-Neural-Networks-to
Sun, 01 Nov 2020 00:00:00 GMTIsomura T, Friston K. <span class="paragraphSection"><div class="boxTitle">Abstract</div>This letter considers a class of biologically plausible cost functions for neural networks, where the same cost function is minimized by both neural activity and plasticity. We show that such cost functions can be cast as a variational bound on model evidence under an implicit generative model. Using generative models based on partially observed Markov decision processes (POMDP), we show that neural activity and plasticity perform Bayesian inference and learning, respectively, by maximizing model evidence. Using mathematical and numerical analyses, we establish the formal equivalence between neural network cost functions and variational free energy under some prior beliefs about latent states that generate inputs. These prior beliefs are determined by particular constants (e.g., thresholds) that define the cost function. This means that the Bayes optimal encoding of latent or hidden states is achieved when the network's implicit priors match the process that generates its inputs. This equivalence is potentially important because it suggests that any hyperparameter of a neural network can itself be optimized—by minimization with respect to variational free energy. Furthermore, it enables one to characterize a neural network formally, in terms of its prior beliefs.</span>32112085212110.1162/neco_a_01315https://direct.mit.edu/neco/article/32/11/2085/95634/Reverse-Engineering-Neural-Networks-toReLU Networks Are Universal Approximators via Piecewise Linear or Constant Functions
https://direct.mit.edu/neco/article/32/11/2249/95633/ReLU-Networks-Are-Universal-Approximators-via
Sun, 01 Nov 2020 00:00:00 GMTHuang C. <span class="paragraphSection"><div class="boxTitle">Abstract</div>This letter proves that a ReLU network can approximate any continuous function with arbitrary precision by means of piecewise linear or constant approximations. For univariate function f(x), we use the composite of ReLUs to produce a line segment; all of the subnetworks of line segments comprise a ReLU network, which is a piecewise linear approximation to f(x). For multivariate function f(x), ReLU networks are constructed to approximate a piecewise linear function derived from triangulation methods approximating f(x). A neural unit called TRLU is constructed from a ReLU network; the piecewise constant approximation, such as Haar wavelets, is implemented by rectifying the linear output of a ReLU network via TRLUs. New interpretations of deep layers, as well as some other results, are also presented.</span>32112249227810.1162/neco_a_01316https://direct.mit.edu/neco/article/32/11/2249/95633/ReLU-Networks-Are-Universal-Approximators-viaClosed-Loop Deep Learning: Generating Forward Models With Backpropagation
https://direct.mit.edu/neco/article/32/11/2122/95632/Closed-Loop-Deep-Learning-Generating-Forward
Sun, 01 Nov 2020 00:00:00 GMTDaryanavard S, Porr B. <span class="paragraphSection"><div class="boxTitle">Abstract</div>A reflex is a simple closed-loop control approach that tries to minimize an error but fails to do so because it will always react too late. An adaptive algorithm can use this error to learn a forward model with the help of predictive cues. For example, a driver learns to improve steering by looking ahead to avoid steering at the last minute. In order to process complex cues such as the road ahead, deep learning is a natural choice. However, this is usually achieved only indirectly by employing deep reinforcement learning with a discrete state space. Here, we show how this can be directly achieved by embedding deep learning into a closed-loop system and preserving its continuous processing. We show in z-space specifically how error backpropagation can be achieved and in general how gradient-based approaches can be analyzed in such closed-loop scenarios. The performance of this learning paradigm is demonstrated using a line follower in simulation and on a real robot that shows very fast and continuous learning.</span>32112122214410.1162/neco_a_01317https://direct.mit.edu/neco/article/32/11/2122/95632/Closed-Loop-Deep-Learning-Generating-ForwardBicomplex Projection Rule for Complex-Valued Hopfield Neural Networks
https://direct.mit.edu/neco/article/32/11/2237/95631/Bicomplex-Projection-Rule-for-Complex-Valued
Sun, 01 Nov 2020 00:00:00 GMTKobayashi M. <span class="paragraphSection"><div class="boxTitle">Abstract</div>A complex-valued Hopfield neural network (CHNN) with a multistate activation function is a multistate model of neural associative memory. The weight parameters require a large amount of memory. Twin-multistate activation functions were introduced to quaternion- and bicomplex-valued Hopfield neural networks. Since their architectures are much more complicated than that of a CHNN, the architecture should be simplified. In this work, the number of weight parameters is reduced by a bicomplex projection rule for CHNNs, which is given by the decomposition of bicomplex-valued Hopfield neural networks. Computer simulations show that the noise tolerance of a CHNN with the bicomplex projection rule is equal to or even better than that of quaternion- and bicomplex-valued Hopfield neural networks. By computer simulations, we find that the projection rule for hyperbolic-valued Hopfield neural networks in synchronous mode maintains a high noise tolerance.</span>32112237224810.1162/neco_a_01320https://direct.mit.edu/neco/article/32/11/2237/95631/Bicomplex-Projection-Rule-for-Complex-ValuedA Cerebellar Computational Mechanism for Delay Conditioning at Precise Time Intervals
https://direct.mit.edu/neco/article/32/11/2069/95630/A-Cerebellar-Computational-Mechanism-for-Delay
Sun, 01 Nov 2020 00:00:00 GMTSanger TD, Kawato M. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The cerebellum is known to have an important role in sensing and execution of precise time intervals, but the mechanism by which arbitrary time intervals can be recognized and replicated with high precision is unknown. We propose a computational model in which precise time intervals can be identified from the pattern of individual spike activity in a population of parallel fibers in the cerebellar cortex. The model depends on the presence of repeatable sequences of spikes in response to conditioned stimulus input. We emulate granule cells using a population of Izhikevich neuron approximations driven by random but repeatable mossy fiber input. We emulate long-term depression (LTD) and long-term potentiation (LTP) synaptic plasticity at the parallel fiber to Purkinje cell synapse. We simulate a delay conditioning paradigm with a conditioned stimulus (CS) presented to the mossy fibers and an unconditioned stimulus (US) some time later issued to the Purkinje cells as a teaching signal. We show that Purkinje cells rapidly adapt to decrease firing probability following onset of the CS only at the interval for which the US had occurred. We suggest that detection of replicable spike patterns provides an accurate and easily learned timing structure that could be an important mechanism for behaviors that require identification and production of precise time intervals.</span>32112069208410.1162/neco_a_01318https://direct.mit.edu/neco/article/32/11/2069/95630/A-Cerebellar-Computational-Mechanism-for-DelayFast and Accurate Langevin Simulations of Stochastic Hodgkin-Huxley Dynamics
https://direct.mit.edu/neco/article/32/10/1775/95623/Fast-and-Accurate-Langevin-Simulations-of
Thu, 01 Oct 2020 00:00:00 GMTPu S, Thomas PJ. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Fox and Lu introduced a Langevin framework for discrete-time stochastic models of randomly gated ion channels such as the Hodgkin-Huxley (HH) system. They derived a Fokker-Planck equation with state-dependent diffusion tensor D and suggested a Langevin formulation with noise coefficient matrix S such that <span style="font-style:italic;">SS</span>⊤=D. Subsequently, several authors introduced a variety of Langevin equations for the HH system. In this article, we present a natural 14-dimensional dynamics for the HH system in which each directed edge in the ion channel state transition graph acts as an independent noise source, leading to a 14 × 28 noise coefficient matrix S. We show that (1) the corresponding 14D system of ordinary differential equations is consistent with the classical 4D representation of the HH system; (2) the 14D representation leads to a noise coefficient matrix S that can be obtained cheaply on each time step, without requiring a matrix decomposition; (3) sample trajectories of the 14D representation are pathwise equivalent to trajectories of Fox and Lu's system, as well as trajectories of several existing Langevin models; (4) our 14D representation (and those equivalent to it) gives the most accurate interspike interval distribution, not only with respect to moments but under both the L1 and L∞ metric-space norms; and (5) the 14D representation gives an approximation to exact Markov chain simulations that are as fast and as efficient as all equivalent models. 
Our approach goes beyond existing models, in that it supports a stochastic shielding decomposition that dramatically simplifies S with minimal loss of accuracy under both voltage- and current-clamp conditions.</span>32101775183510.1162/neco_a_01312https://direct.mit.edu/neco/article/32/10/1775/95623/Fast-and-Accurate-Langevin-Simulations-ofActive Learning of Bayesian Linear Models with High-Dimensional Binary Features by Parameter Confidence-Region Estimation
https://direct.mit.edu/neco/article/32/10/1998/95622/Active-Learning-of-Bayesian-Linear-Models-with
Thu, 01 Oct 2020 00:00:00 GMTInatsu Y, Karasuyama M, Inoue K, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>In this letter, we study an active learning problem for maximizing an unknown linear function with high-dimensional binary features. This problem is notoriously complex but arises in many important contexts. When the sampling budget, that is, the number of possible function evaluations, is smaller than the number of dimensions, it tends to be impossible to identify all of the optimal binary features. Therefore, in practice, only a small number of such features are considered, with the majority kept fixed at certain default values, which we call the <span style="font-style:italic;">working set heuristic</span>. The main contribution of this letter is to formally study the working set heuristic and present a suite of theoretically robust algorithms for more efficient use of the sampling budget. Technically, we introduce a novel method for estimating the confidence regions of model parameters that is tailored to active learning with high-dimensional binary features. We provide a rigorous theoretical analysis of these algorithms and prove that a commonly used working set heuristic can identify optimal binary features with favorable sample complexity. We explore the performance of the proposed approach through numerical simulations and an application to a functional protein design problem.</span>32101998203110.1162/neco_a_01310https://direct.mit.edu/neco/article/32/10/1998/95622/Active-Learning-of-Bayesian-Linear-Models-withA Predictive-Coding Network That Is Both Discriminative and Generative
https://direct.mit.edu/neco/article/32/10/1836/95621/A-Predictive-Coding-Network-That-Is-Both
Thu, 01 Oct 2020 00:00:00 GMTSun W, Orchard J. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Predictive coding (PC) networks are a biologically interesting class of neural networks. Their layered hierarchy mimics the reciprocal connectivity pattern observed in the mammalian cortex, and they can be trained using local learning rules that approximate backpropagation (Bogacz, <a href="#B3" class="reflinks">2017</a>). However, despite having feedback connections that enable information to flow down the network hierarchy, discriminative PC networks are not typically generative. Clamping the output class and running the network to equilibrium yields an input sample that usually does not resemble the training input. This letter studies this phenomenon and proposes a simple solution that promotes the generation of input samples that resemble the training inputs. Simple decay, a technique already in wide use in neural networks, pushes the PC network toward a unique minimum two-norm solution, and that unique solution provably (for linear networks) matches the training inputs. The method also vastly improves the samples generated for nonlinear networks, as we demonstrate on MNIST.</span>32101836186210.1162/neco_a_01311https://direct.mit.edu/neco/article/32/10/1836/95621/A-Predictive-Coding-Network-That-Is-BothAnalysis of Regression Algorithms with Unbounded Sampling
https://direct.mit.edu/neco/article/32/10/1980/95620/Analysis-of-Regression-Algorithms-with-Unbounded
Thu, 01 Oct 2020 00:00:00 GMTTong H, Gao J. <span class="paragraphSection"><div class="boxTitle">Abstract</div>In this letter, we study a class of the regularized regression algorithms when the sampling process is unbounded. By choosing different loss functions, the learning algorithms can include a wide range of commonly used algorithms for regression. Unlike the prior work on theoretical analysis of unbounded sampling, no constraint on the output variables is specified in our setting. By an elegant error analysis, we prove consistency and finite sample bounds on the excess risk of the proposed algorithms under regular conditions.</span>32101980199710.1162/neco_a_01313https://direct.mit.edu/neco/article/32/10/1980/95620/Analysis-of-Regression-Algorithms-with-UnboundedActive Learning for Enumerating Local Minima Based on Gaussian Process Derivatives
https://direct.mit.edu/neco/article/32/10/2032/95619/Active-Learning-for-Enumerating-Local-Minima-Based
Thu, 01 Oct 2020 00:00:00 GMTInatsu Y, Sugita D, Toyoura K, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We study active learning (AL) based on gaussian processes (GPs) for efficiently enumerating all of the local minimum solutions of a black-box function. This problem is challenging because local solutions are characterized by their zero gradient and positive-definite Hessian properties, but those derivatives cannot be directly observed. We propose a new AL method in which the input points are sequentially selected such that the confidence intervals of the GP derivatives are effectively updated for enumerating local minimum solutions. We theoretically analyze the proposed method and demonstrate its usefulness through numerical experiments.</span>32102032206810.1162/neco_a_01307https://direct.mit.edu/neco/article/32/10/2032/95619/Active-Learning-for-Enumerating-Local-Minima-BasedMultiview Alignment and Generation in CCA via Consistent Latent Encoding
https://direct.mit.edu/neco/article/32/10/1936/95618/Multiview-Alignment-and-Generation-in-CCA-via
Thu, 01 Oct 2020 00:00:00 GMTShi Y, Pan Y, Xu D, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Multiview alignment, achieving one-to-one correspondence of multiview inputs, is critical in many real-world multiview applications, especially for cross-view data analysis problems. An increasing amount of work has studied this alignment problem with canonical correlation analysis (CCA). However, existing CCA models are prone to misalign the multiple views due to either the neglect of uncertainty or the inconsistent encoding of the multiple views. To tackle these two issues, this letter studies multiview alignment from a Bayesian perspective. Delving into the impairments of inconsistent encodings, we propose to recover correspondence of the multiview inputs by matching the marginalization of the joint distribution of multiview random variables under different forms of factorization. To realize our design, we present adversarial CCA (ACCA), which achieves consistent latent encodings by matching the marginalized latent encodings through the adversarial training paradigm. Our analysis, based on conditional mutual information, reveals that ACCA is flexible for handling implicit distributions. Extensive experiments on correlation analysis and cross-view generation under noisy input settings demonstrate the superiority of our model.</span>32101936197910.1162/neco_a_01309https://direct.mit.edu/neco/article/32/10/1936/95618/Multiview-Alignment-and-Generation-in-CCA-viaBinless Kernel Machine: Modeling Spike Train Transformation for Cognitive Neural Prostheses
https://direct.mit.edu/neco/article/32/10/1863/95616/Binless-Kernel-Machine-Modeling-Spike-Train
Thu, 01 Oct 2020 00:00:00 GMTQian C, Sun X, Wang Y, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Modeling spike train transformation among brain regions helps in designing a cognitive neural prosthesis that restores lost cognitive functions. Various methods analyze the nonlinear dynamic spike train transformation between two cortical areas with low computational efficiency. The application of a real-time neural prosthesis requires computational efficiency, performance stability, and better interpretation of the neural firing patterns that modulate target spike generation. We propose the binless kernel machine in the point-process framework to describe nonlinear dynamic spike train transformations. Our approach embeds the binless kernel to efficiently capture the feedforward dynamics of spike trains and maps the input spike timings into reproducing kernel Hilbert space (RKHS). An inhomogeneous Bernoulli process is designed to combine with a kernel logistic regression that operates on the binless kernel to generate an output spike train as a point process. Weights of the proposed model are estimated by maximizing the log likelihood of output spike trains in RKHS, which allows a global-optimal solution. To reduce computational complexity, we design a streaming-based clustering algorithm to extract typical and important spike train features. The cluster centers and their weights enable the visualization of the important input spike train patterns that motivate or inhibit output neuron firing. We test the proposed model on both synthetic data and real spike train data recorded from the dorsal premotor cortex and the primary motor cortex of a monkey performing a center-out task. Performances are evaluated by discrete-time rescaling Kolmogorov-Smirnov tests. Our model outperforms the existing methods with higher stability regardless of weight initialization and demonstrates higher efficiency in analyzing neural patterns from spike timing with less historical input (50%). Meanwhile, the typical spike train patterns selected according to the weights are validated to encode the output spikes from both the spike train of a single input neuron and the interaction of two input neurons.</span>32101863190010.1162/neco_a_01306https://direct.mit.edu/neco/article/32/10/1863/95616/Binless-Kernel-Machine-Modeling-Spike-TrainModal Principal Component Analysis
https://direct.mit.edu/neco/article/32/10/1901/95614/Modal-Principal-Component-Analysis
Thu, 01 Oct 2020 00:00:00 GMTSando K, Hino H. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Principal component analysis (PCA) is a widely used method for data processing, such as for dimension reduction and visualization. Standard PCA is known to be sensitive to outliers, and various robust PCA methods have been proposed. It has been shown that the robustness of many statistical methods can be improved using mode estimation instead of mean estimation, because mode estimation is not significantly affected by the presence of outliers. Thus, this study proposes a modal principal component analysis (MPCA), which is a robust PCA method based on mode estimation. The proposed method finds the minor component by estimating the mode of the projected data points. As a theoretical contribution, probabilistic convergence property, influence function, finite-sample breakdown point, and its lower bound for the proposed MPCA are derived. The experimental results show that the proposed method has advantages over conventional methods.</span>32101901193510.1162/neco_a_01308https://direct.mit.edu/neco/article/32/10/1901/95614/Modal-Principal-Component-AnalysisTensor Least Angle Regression for Sparse Representations of Multidimensional Signals
https://direct.mit.edu/neco/article/32/9/1697/95606/Tensor-Least-Angle-Regression-for-Sparse
Tue, 01 Sep 2020 00:00:00 GMTWickramasingha I, Elrewainy A, Sobhy M, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Sparse signal representations have gained much interest recently in both the signal processing and statistical communities. Compared to orthogonal matching pursuit (OMP) and basis pursuit, which solve the L0 and L1 constrained sparse least-squares problems, respectively, least angle regression (LARS) is a computationally efficient method to solve both problems for all critical values of the regularization parameter λ. However, none of these methods is suitable for solving large multidimensional sparse least-squares problems, as they would require extensive computational power and memory. An earlier generalization of OMP, known as Kronecker-OMP, was developed to solve the L0 problem for large multidimensional sparse least-squares problems. However, its memory usage and computation time increase quickly with the number of problem dimensions and iterations. In this letter, we develop a generalization of LARS, tensor least angle regression (T-LARS), that can efficiently solve either large L0 or large L1 constrained multidimensional, sparse, least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter λ, with lower computational complexity and memory usage than Kronecker-OMP. To demonstrate the validity and performance of our T-LARS algorithm, we used it to successfully obtain different sparse representations of two relatively large 3D brain images, using fixed and learned separable overcomplete dictionaries, by solving both L0 and L1 constrained sparse least-squares problems. Our numerical experiments demonstrate that our T-LARS algorithm is significantly faster (46 to 70 times) than Kronecker-OMP in obtaining K-sparse solutions for multilinear least-squares problems. 
However, the K-sparse solutions obtained using Kronecker-OMP always have a slightly lower residual error (1.55% to 2.25%) than ones obtained by T-LARS. Therefore, T-LARS could be an important tool for numerous multidimensional biomedical signal processing applications.</span>3291697173210.1162/neco_a_01304https://direct.mit.edu/neco/article/32/9/1697/95606/Tensor-Least-Angle-Regression-for-SparsePolynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback
https://direct.mit.edu/neco/article/32/9/1733/95605/Polynomial-Time-Algorithms-for-Multiple-Arm
Tue, 01 Sep 2020 00:00:00 GMTKuroki Y, Xu L, Miyauchi A, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We study the problem of stochastic multiple-arm identification, where an agent sequentially explores a size-k subset of arms (also known as a <span style="font-style:italic;">super arm</span>) from given n arms and tries to identify the best super arm. Most work so far has considered the semi-bandit setting, where the agent can observe the reward of each pulled arm or assumed each arm can be queried at each round. However, in real-world applications, it is costly or sometimes impossible to observe a reward of individual arms. In this study, we tackle the full-bandit setting, where only a noisy observation of the total sum of a super arm is given at each pull. Although our problem can be regarded as an instance of the best arm identification in linear bandits, a naive approach based on linear bandits is computationally infeasible since the number of super arms K is exponential. To cope with this problem, we first design a polynomial-time approximation algorithm for a 0-1 quadratic programming problem arising in confidence ellipsoid maximization. Based on our approximation algorithm, we propose a bandit algorithm whose computation time is O(log K), thereby achieving an exponential speedup over linear bandit algorithms. We provide a sample complexity upper bound that is still worst-case optimal. Finally, we conduct experiments on large-scale data sets with more than 10<sup>10</sup> super arms, demonstrating the superiority of our algorithms in terms of both the computation time and the sample complexity.</span>3291733177310.1162/neco_a_01299https://direct.mit.edu/neco/article/32/9/1733/95605/Polynomial-Time-Algorithms-for-Multiple-ArmParallel Neural Multiprocessing with Gamma Frequency Latencies
https://direct.mit.edu/neco/article/32/9/1635/95604/Parallel-Neural-Multiprocessing-with-Gamma
Tue, 01 Sep 2020 00:00:00 GMTZhang R, Ballard DH. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The Poisson variability in cortical neural responses has been typically modeled using spike averaging techniques, such as trial averaging and rate coding, since such methods can produce reliable correlates of behavior. However, mechanisms that rely on counting spikes could be slow and inefficient and thus might not be useful in the brain for computations at timescales in the 10 millisecond range. This issue has motivated a search for alternative spike codes that take advantage of spike timing and has resulted in many studies that use synchronized neural networks for communication. Here we focus on recent studies that suggest that the gamma frequency may provide a reference that allows local spike phase representations that could result in much faster information transmission. We have developed a unified model (gamma spike multiplexing) that takes advantage of a single cycle of a cell's somatic gamma frequency to modulate the generation of its action potentials. An important consequence of this coding mechanism is that it allows multiple independent neural processes to run in parallel, thereby greatly increasing the processing capability of the cortex. System-level simulations and preliminary analysis of mouse cortical cell data are presented as support for the proposed theoretical model.</span>3291635166310.1162/neco_a_01301https://direct.mit.edu/neco/article/32/9/1635/95604/Parallel-Neural-Multiprocessing-with-GammaA Mean-Field Description of Bursting Dynamics in Spiking Neural Networks with Short-Term Adaptation
https://direct.mit.edu/neco/article/32/9/1615/95603/A-Mean-Field-Description-of-Bursting-Dynamics-in
Tue, 01 Sep 2020 00:00:00 GMTGast R, Schmidt H, Knösche TR. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Bursting plays an important role in neural communication. At the population level, macroscopic bursting has been identified in populations of neurons that do not express intrinsic bursting mechanisms. For the analysis of phase transitions between bursting and non-bursting states, mean-field descriptions of macroscopic bursting behavior are a valuable tool. In this article, we derive mean-field descriptions of populations of spiking neurons and examine whether states of collective bursting behavior can arise from short-term adaptation mechanisms. Specifically, we consider synaptic depression and spike-frequency adaptation in networks of quadratic integrate-and-fire neurons. Analyzing the mean-field model via bifurcation analysis, we find that bursting behavior emerges for both types of short-term adaptation. This bursting behavior can coexist with steady-state behavior, providing a bistable regime that allows for transient switches between synchronized and nonsynchronized states of population dynamics. For all of these findings, we demonstrate a close correspondence between the spiking neural network and the mean-field model. Although the mean-field model has been derived under the assumptions of an infinite population size and all-to-all coupling inside the population, we show that this correspondence holds even for small, sparsely coupled networks. In summary, we provide mechanistic descriptions of phase transitions between bursting and steady-state population dynamics, which play important roles in both healthy neural communication and neurological disorders.</span>3291615163410.1162/neco_a_01300https://direct.mit.edu/neco/article/32/9/1615/95603/A-Mean-Field-Description-of-Bursting-Dynamics-inHyperbolic-Valued Hopfield Neural Networks in Synchronous Mode
https://direct.mit.edu/neco/article/32/9/1685/95602/Hyperbolic-Valued-Hopfield-Neural-Networks-in
Tue, 01 Sep 2020 00:00:00 GMTKobayashi M. <span class="paragraphSection"><div class="boxTitle">Abstract</div>For most multistate Hopfield neural networks, the stability conditions in asynchronous mode are known, whereas those in synchronous mode are not. If they were to converge in synchronous mode, recall would be accelerated by parallel processing. Complex-valued Hopfield neural networks (CHNNs) with a projection rule do not converge in synchronous mode. In this work, we provide stability conditions for hyperbolic Hopfield neural networks (HHNNs) in synchronous mode instead of CHNNs. HHNNs provide better noise tolerance than CHNNs. In addition, the stability conditions are applied to the projection rule, and HHNNs with a projection rule converge in synchronous mode. By computer simulations, we find that the projection rule for HHNNs in synchronous mode maintains a high noise tolerance.</span>3291685169610.1162/neco_a_01303https://direct.mit.edu/neco/article/32/9/1685/95602/Hyperbolic-Valued-Hopfield-Neural-Networks-inFine-Grained 3D-Attention Prototypes for Few-Shot Learning
https://direct.mit.edu/neco/article/32/9/1664/95601/Fine-Grained-3D-Attention-Prototypes-for-Few-Shot
Tue, 01 Sep 2020 00:00:00 GMTHu X, Liu J, Ma J, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>In the real world, a limited number of labeled fine-grained images per class can hardly represent the class distribution effectively. Moreover, fine-grained images exhibit more subtle visual differences than simple images with obvious objects; that is, they have smaller interclass and larger intraclass variations. To solve these issues, we propose an end-to-end attention-based model for fine-grained few-shot image classification (AFG) with the recent episode training strategy. It is composed mainly of a feature learning module, an image reconstruction module, and a label distribution module. The feature learning module mainly devises a 3D-Attention mechanism, which considers both the spatial positions and different channel attentions of the image features, in order to learn more discriminative local features to better represent the class distribution. The image reconstruction module calculates the mappings between local features and the original images. It is constrained by a designed loss function as auxiliary supervised information, so that the learning of each local feature does not need extra annotations. The label distribution module is used to predict the label distribution of a given unlabeled sample, and we use the local features to represent the image features for classification. By conducting comprehensive experiments on Mini-ImageNet and three fine-grained data sets, we demonstrate that the proposed model achieves superior performance over the competitors.</span>3291664168410.1162/neco_a_01302https://direct.mit.edu/neco/article/32/9/1664/95601/Fine-Grained-3D-Attention-Prototypes-for-Few-ShotInference of a Mesoscopic Population Model from Population Spike Trains
https://direct.mit.edu/neco/article/32/8/1448/95629/Inference-of-a-Mesoscopic-Population-Model-from
Sat, 01 Aug 2020 00:00:00 GMTRené A, Longtin A, Macke JH. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Understanding how rich dynamics emerge in neural populations requires models exhibiting a wide range of behaviors while remaining interpretable in terms of connectivity and single-neuron dynamics. However, it has been challenging to fit such mechanistic spiking networks at the single-neuron scale to empirical population data. To close this gap, we propose to fit such data at a mesoscale, using a mechanistic but low-dimensional and, hence, statistically tractable model. The mesoscopic representation is obtained by approximating a population of neurons as multiple homogeneous pools of neurons and modeling the dynamics of the aggregate population activity within each pool. We derive the likelihood of both single-neuron and connectivity parameters given this activity, which can then be used to optimize parameters by gradient ascent on the log likelihood or perform Bayesian inference using Markov chain Monte Carlo (MCMC) sampling. We illustrate this approach using a model of generalized integrate-and-fire neurons for which mesoscopic dynamics have been previously derived and show that both single-neuron and connectivity parameters can be recovered from simulated data. In particular, our inference method extracts posterior correlations between model parameters, which define parameter subsets able to reproduce the data. We compute the Bayesian posterior for combinations of parameters using MCMC sampling and investigate how the approximations inherent in a mesoscopic population model affect the accuracy of the inferred single-neuron parameters.</span>3281448149810.1162/neco_a_01292https://direct.mit.edu/neco/article/32/8/1448/95629/Inference-of-a-Mesoscopic-Population-Model-fromTheory and Algorithms for Shapelet-Based Multiple-Instance Learning
https://direct.mit.edu/neco/article/32/8/1580/95628/Theory-and-Algorithms-for-Shapelet-Based-Multiple
Sat, 01 Aug 2020 00:00:00 GMTSuehiro D, Hatano K, Takimoto E, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We propose a new formulation of multiple-instance learning (MIL), in which a unit of data consists of a set of instances called a bag. The goal is to find a good classifier of bags based on the similarity with a “shapelet” (or pattern), where the similarity of a bag with a shapelet is the maximum similarity of instances in the bag. In previous work, some of the training instances have been chosen as shapelets with no theoretical justification. In our formulation, we use all possible, and thus infinitely many, shapelets, resulting in a richer class of classifiers. We show that the formulation is tractable, that is, it can be reduced through linear programming boosting (LPBoost) to difference of convex (DC) programs of finite (actually polynomial) size. Our theoretical result also gives justification to the heuristics of some previous work. The time complexity of the proposed algorithm highly depends on the size of the set of all instances in the training sample. To apply to the data containing a large number of instances, we also propose a heuristic option of the algorithm without the loss of the theoretical guarantee. Our empirical study demonstrates that our algorithm uniformly works for shapelet learning tasks on time-series classification and various MIL tasks with comparable accuracy to the existing methods. Moreover, we show that the proposed heuristics allow us to achieve the result in reasonable computational time.</span>3281580161310.1162/neco_a_01297https://direct.mit.edu/neco/article/32/8/1580/95628/Theory-and-Algorithms-for-Shapelet-Based-MultipleA Discrete-Time Neurodynamic Approach to Sparsity-Constrained Nonnegative Matrix Factorization
https://direct.mit.edu/neco/article/32/8/1531/95627/A-Discrete-Time-Neurodynamic-Approach-to-Sparsity
Sat, 01 Aug 2020 00:00:00 GMTLi X, Wang J, Kwong S. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Sparsity is a desirable property in many nonnegative matrix factorization (NMF) applications. Although some level of sparseness of NMF solutions can be achieved by using regularization, the resulting sparsity depends highly on a regularization parameter that must be set in an ad hoc way. In this letter we formulate sparse NMF as a mixed-integer optimization problem with sparsity as binary constraints. A discrete-time projection neural network is developed for solving the formulated problem. Sufficient conditions for its stability and convergence are analytically characterized by using Lyapunov's method. Experimental results on sparse feature extraction are discussed to substantiate the superiority of this approach to extracting highly sparse features.</span>3281531156210.1162/neco_a_01294https://direct.mit.edu/neco/article/32/8/1531/95627/A-Discrete-Time-Neurodynamic-Approach-to-SparsityStochastic Multichannel Ranking with Brain Dynamics Preferences
https://direct.mit.edu/neco/article/32/8/1499/95626/Stochastic-Multichannel-Ranking-with-Brain
Sat, 01 Aug 2020 00:00:00 GMTPan Y, Tsang IW, Singh AK, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>A driver's cognitive state of mental fatigue significantly affects his or her driving performance and, more important, public safety. Previous studies have leveraged reaction time (RT) as the metric for mental fatigue and aim at estimating the exact value of RT using electroencephalogram (EEG) signals within a regression model. However, due to the easily corrupted and also nonsmooth properties of RTs during data collection, methods focusing on predicting the exact value of a noisy measurement, RT, generally suffer from poor generalization performance. Considering that human RT is the reflection of brain dynamics preference (BDP) rather than a single regression output of EEG signals, we propose a novel channel-reliability-aware ranking (CArank) model for the multichannel ranking problem. CArank learns from BDPs using EEG data robustly and aims at preserving the ordering corresponding to RTs. In particular, we introduce a transition matrix to characterize the reliability of each channel used in the EEG data, which helps in learning with BDPs only from informative EEG channels. To handle large-scale EEG signals, we propose a stochastic generalized expectation maximization (SGEM) algorithm to update CArank in an online fashion. Comprehensive empirical analysis on EEG signals from 40 participants shows that our CArank achieves substantial improvements in reliability while simultaneously detecting noisy or less informative EEG channels.</span>3281499153010.1162/neco_a_01293https://direct.mit.edu/neco/article/32/8/1499/95626/Stochastic-Multichannel-Ranking-with-BrainOn a Scalable Entropic Breaching of the Overfitting Barrier for Small Data Problems in Machine Learning
https://direct.mit.edu/neco/article/32/8/1563/95625/On-a-Scalable-Entropic-Breaching-of-the
Sat, 01 Aug 2020 00:00:00 GMTHorenko I. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Overfitting and treatment of small data are among the most challenging problems in machine learning (ML), when a relatively small data statistics size T is not enough to provide a robust ML fit for a relatively large data feature dimension D. Deploying a massively parallel ML analysis of generic classification problems for different D and T, we demonstrate the existence of statistically significant linear overfitting barriers for common ML methods. The results reveal that for a robust classification of bioinformatics-motivated generic problems with the long short-term memory deep learning classifier (LSTM), one needs in the best case a statistics T that is at least 13.8 times larger than the feature dimension D. We show that this overfitting barrier can be breached at a 10<sup>-12</sup> fraction of the computational cost by means of the entropy-optimal scalable probabilistic approximations algorithm (eSPA), performing a joint solution of the entropy-optimal Bayesian network inference and feature space segmentation problems. Application of eSPA to experimental single cell RNA sequencing data exhibits a 30-fold classification performance boost when compared to standard bioinformatics tools and a 7-fold boost when compared to the deep learning LSTM classifier.</span>3281563157910.1162/neco_a_01296https://direct.mit.edu/neco/article/32/8/1563/95625/On-a-Scalable-Entropic-Breaching-of-theAny Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective
https://direct.mit.edu/neco/article/32/8/1431/95624/Any-Target-Function-Exists-in-a-Neighborhood-of
Sat, 01 Aug 2020 00:00:00 GMTAmari S. <span class="paragraphSection"><div class="boxTitle">Abstract</div>It is known that any target function is realized in a sufficiently small neighborhood of any randomly connected deep network, provided the width (the number of neurons in a layer) is sufficiently large. There are sophisticated analytical theories and discussions concerning this striking fact, but rigorous theories are very complicated. We give an elementary geometrical proof by using a simple model for the purpose of elucidating its structure. We show that high-dimensional geometry plays a magical role. When we project a high-dimensional sphere of radius 1 to a low-dimensional subspace, the uniform distribution over the sphere shrinks to a gaussian distribution with negligibly small variances and covariances.</span>3281431144710.1162/neco_a_01295https://direct.mit.edu/neco/article/32/8/1431/95624/Any-Target-Function-Exists-in-a-Neighborhood-ofMinimal Spiking Neuron for Solving Multilabel Classification Tasks
https://direct.mit.edu/neco/article/32/7/1408/95600/Minimal-Spiking-Neuron-for-Solving-Multilabel
Wed, 01 Jul 2020 00:00:00 GMTFil J, Chu D. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The multispike tempotron (MST) is a powerful, single spiking neuron model that can solve complex supervised classification tasks. It is also internally complex, computationally expensive to evaluate, and unsuitable for neuromorphic hardware. Here we aim to understand whether it is possible to simplify the MST model while retaining its ability to learn and process information. To this end, we introduce a family of generalized neuron models (GNMs) that are a special case of the spike response model and much simpler and cheaper to simulate than the MST. We find that over a wide range of parameters, the GNM can learn at least as well as the MST does. We identify the temporal autocorrelation of the membrane potential as the most important ingredient of the GNM that enables it to classify multiple spatiotemporal patterns. We also interpret the GNM as a chemical system, thus conceptually bridging computation by neural networks with molecular information processing. We conclude the letter by proposing alternative training approaches for the GNM, including error trace learning and error backpropagation.</span>3271408142910.1162/neco_a_01290https://direct.mit.edu/neco/article/32/7/1408/95600/Minimal-Spiking-Neuron-for-Solving-MultilabelShapley Homology: Topological Analysis of Sample Influence for Neural Networks
https://direct.mit.edu/neco/article/32/7/1355/95599/Shapley-Homology-Topological-Analysis-of-Sample
Wed, 01 Jul 2020 00:00:00 GMTZhang K, Wang Q, Liu X, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Data samples collected for training machine learning models are typically assumed to be independent and identically distributed (i.i.d.). Recent research has demonstrated that this assumption can be problematic as it simplifies the manifold of structured data. This has motivated different research areas such as data poisoning, model improvement, and explanation of machine learning models. In this work, we study the influence of a sample on determining the intrinsic topological features of its underlying manifold. We propose the Shapley homology framework, which provides a quantitative metric for the influence of a sample on the homology of a simplicial complex. Our proposed framework consists of two main parts: homology analysis, where we compute the Betti number of the target topological space, and Shapley value calculation, where we decompose the topological features of a complex built from data points to individual points. By interpreting the influence as a probability measure, we further define an entropy that reflects the complexity of the data manifold. Furthermore, we provide a preliminary discussion of the connection of the Shapley homology to the Vapnik-Chervonenkis dimension. Empirical studies show that when the zero-dimensional Shapley homology is used on neighboring graphs, samples with higher influence scores have a greater impact on the accuracy of neural networks that determine graph connectivity and on several regular grammars whose higher entropy values imply greater difficulty in being learned.</span>3271355137810.1162/neco_a_01289https://direct.mit.edu/neco/article/32/7/1355/95599/Shapley-Homology-Topological-Analysis-of-SampleA Mathematical Analysis of Memory Lifetime in a Simple Network Model of Memory
https://direct.mit.edu/neco/article/32/7/1322/95598/A-Mathematical-Analysis-of-Memory-Lifetime-in-a
Wed, 01 Jul 2020 00:00:00 GMTHelson P. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We study the learning of an external signal by a neural network and the time to forget it when this network is submitted to noise. The presentation of an external stimulus to the recurrent network of binary neurons may change the state of the synapses. Multiple presentations of a unique signal lead to its learning. Then, during the forgetting time, the presentation of other signals (noise) may also modify the synaptic weights. We construct an estimator of the initial signal using the synaptic currents and in this way define a probability of error. In our model, these synaptic currents evolve as Markov chains. We study the dynamics of these Markov chains and obtain a lower bound on the number of external stimuli that the network can receive before the initial signal is considered forgotten (probability of error above a given threshold). Our results are based on a finite-time analysis rather than large-time asymptotics. We finally present numerical illustrations of our results.</span>3271322135410.1162/neco_a_01286https://direct.mit.edu/neco/article/32/7/1322/95598/A-Mathematical-Analysis-of-Memory-Lifetime-in-aA Model for the Study of the Increase in Stimulus and Change Point Detection with Small and Variable Spiking Delays
https://direct.mit.edu/neco/article/32/7/1277/95597/A-Model-for-the-Study-of-the-Increase-in-Stimulus
Wed, 01 Jul 2020 00:00:00 GMTStraub B, Schneider G. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Precise timing of spikes between different neurons has been found to convey reliable information beyond the spike count. In contrast, the role of small and variable spiking delays, as reported, for example, in the visual cortex, remains largely unclear. This issue becomes particularly important considering the high speed of neuronal information processing, which is assumed to be based on only a few milliseconds within each processing step. We investigate the role of small and variable spiking delays with a parsimonious stochastic spiking model that is strongly motivated by experimental observations. The model contains only two parameters for the response of a neuron to one stimulus, describing directly the rate and the delay, or phase. Within the theoretical model, we specifically investigate two quantities, the probability of correct stimulus detection and the probability of correct change point detection, as a function of these parameters and within short periods of time. Optimal combinations of the two parameters across stimuli are derived that maximize these probabilities and enable comparison of pure rate, pure phase, and combined codes. In particular, the gain in correct detection probability when adding small and variable spiking delays to pure rate coding increases with the number of stimuli. More interesting, small and variable spiking delays can considerably improve the process of detecting changes in the stimulus, while also decreasing the probability of false alarms and thus increasing robustness and speed of change point detection. The results are compared to empirical spike train recordings of neurons in the visual cortex reported earlier in response to a number of visual stimuli. 
The results suggest that near-optimal combinations of rate and phase parameters may be implemented in the brain and that adding phase information could particularly increase the quality of change point detection in cases of highly similar stimuli.</span>3271277132110.1162/neco_a_01285https://direct.mit.edu/neco/article/32/7/1277/95597/A-Model-for-the-Study-of-the-Increase-in-StimulusHeterogeneous Synaptic Weighting Improves Neural Coding in the Presence of Common Noise
https://direct.mit.edu/neco/article/32/7/1239/95596/Heterogeneous-Synaptic-Weighting-Improves-Neural
Wed, 01 Jul 2020 00:00:00 GMTSachdeva PS, Livezey JA, DeWeese MR. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Simultaneous recordings from the cortex have revealed that neural activity is highly variable and that some variability is shared across neurons in a population. Further experimental work has demonstrated that the shared component of a neuronal population's variability is typically comparable to or larger than its private component. Meanwhile, an abundance of theoretical work has assessed the impact that shared variability has on a population code. For example, shared input noise is understood to have a detrimental impact on a neural population's coding fidelity. However, other contributions to variability, such as common noise, can also play a role in shaping correlated variability. We present a network of linear-nonlinear neurons in which we introduce a common noise input to model—for instance, variability resulting from upstream action potentials that are irrelevant to the task at hand. We show that by applying a heterogeneous set of synaptic weights to the neural inputs carrying the common noise, the network can improve its coding ability as measured by both Fisher information and Shannon mutual information, even in cases where this results in amplification of the common noise. With a broad and heterogeneous distribution of synaptic weights, a population of neurons can remove the harmful effects imposed by afferents that are uninformative about a stimulus. We demonstrate that some nonlinear networks benefit from weight diversification up to a certain population size, above which the drawbacks from amplified noise dominate over the benefits of diversification. We further characterize these benefits in terms of the relative strength of shared and private variability sources. 
Finally, we study the asymptotic behavior of the mutual information and Fisher information analytically in our various networks as a function of population size. We find some surprising qualitative changes in the asymptotic behavior as we make seemingly minor changes in the synaptic weight distributions.</span>3271239127610.1162/neco_a_01287https://direct.mit.edu/neco/article/32/7/1239/95596/Heterogeneous-Synaptic-Weighting-Improves-NeuralGeneration of Scale-Invariant Sequential Activity in Linear Recurrent Networks
https://direct.mit.edu/neco/article/32/7/1379/95594/Generation-of-Scale-Invariant-Sequential-Activity
Wed, 01 Jul 2020 00:00:00 GMTLiu Y, Howard MW. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Sequential neural activity has been observed in many parts of the brain and has been proposed as a neural mechanism for memory. The natural world expresses temporal relationships at a wide range of scales. Because we cannot know the relevant scales a priori, it is desirable that memory, and thus the generated sequences, is scale invariant. Although recurrent neural network models have been proposed as a mechanism for generating sequences, the requirements for scale-invariant sequences are not known. This letter reports the constraints that enable a linear recurrent neural network model to generate scale-invariant sequential activity. A straightforward eigendecomposition analysis results in two independent conditions that are required for scale invariance for connectivity matrices with real, distinct eigenvalues. First, the eigenvalues of the network must be geometrically spaced. Second, the eigenvectors must be related to one another via translation. These constraints are easily generalizable for matrices that have complex and distinct eigenvalues. Analogous albeit less compact constraints hold for matrices with degenerate eigenvalues. These constraints, along with considerations on initial conditions, provide a general recipe to build linear recurrent neural networks that support scale-invariant sequential activity.</span>3271379140710.1162/neco_a_01288https://direct.mit.edu/neco/article/32/7/1379/95594/Generation-of-Scale-Invariant-Sequential-ActivityFirst Passage Time Memory Lifetimes for Multistate, Filter-Based Synapses
https://direct.mit.edu/neco/article/32/6/1069/95588/First-Passage-Time-Memory-Lifetimes-for-Multistate
Mon, 01 Jun 2020 00:00:00 GMTElliott T. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Models of associative memory with discrete state synapses learn new memories by forgetting old ones. In contrast to non-integrative models of synaptic plasticity, models with integrative, filter-based synapses exhibit an initial rise in the fidelity of recall of stored memories. This rise to a peak is driven by a transient process and is then followed by a return to equilibrium. In a series of papers, we have employed a first passage time (FPT) approach to define and study memory lifetimes, incrementally developing our methods, from both simple and complex binary-strength synapses to simple multistate synapses. Here, we complete this work by analyzing FPT memory lifetimes in multistate, filter-based synapses. To achieve this, we integrate out the internal filter states so that we can work with transitions only in synaptic strength. We then generalize results on polysynaptic generating functions from binary strength to multistate synapses, allowing us to examine the dynamics of synaptic strength changes in an ensemble of synapses rather than just a single synapse. To derive analytical results for FPT memory lifetimes, we partition the synaptic dynamics into two distinct phases: the first, pre-peak phase studied with a drift-only approximation, and the second, post-peak phase studied with approximations to the full strength transition probabilities. These approximations capture the underlying dynamics very well, as demonstrated by the extremely good agreement between results obtained by simulating our model and results obtained from the Fokker-Planck or integral equation approaches to FPT processes.</span>3261069114310.1162/neco_a_01283https://direct.mit.edu/neco/article/32/6/1069/95588/First-Passage-Time-Memory-Lifetimes-for-MultistateIndependently Interpretable Lasso for Generalized Linear Models
https://direct.mit.edu/neco/article/32/6/1168/95587/Independently-Interpretable-Lasso-for-Generalized
Mon, 01 Jun 2020 00:00:00 GMTTakada M, Suzuki T, Fujisawa H. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Sparse regularization such as ℓ<sub>1</sub> regularization is a quite powerful and widely used strategy for high-dimensional learning problems. The effectiveness of sparse regularization has been supported practically and theoretically by several studies. However, one of the biggest issues in sparse regularization is that its performance is quite sensitive to correlations between features. Ordinary ℓ<sub>1</sub> regularization selects variables correlated with each other under weak regularizations, which results in deterioration of not only its estimation error but also interpretability. In this letter, we propose a new regularization method, independently interpretable lasso (IILasso), for generalized linear models. Our proposed regularizer suppresses selecting correlated variables, so that each active variable affects the response independently in the model. Hence, we can interpret regression coefficients intuitively, and the performance is also improved by avoiding overfitting. We analyze the theoretical property of the IILasso and show that the proposed method is advantageous for its sign recovery and achieves almost minimax optimal convergence rate. Synthetic and real data analyses also indicate the effectiveness of the IILasso.</span>3261168122110.1162/neco_a_01279https://direct.mit.edu/neco/article/32/6/1168/95587/Independently-Interpretable-Lasso-for-GeneralizedNonequilibrium Statistical Mechanics of Continuous Attractors
https://direct.mit.edu/neco/article/32/6/1033/95586/Nonequilibrium-Statistical-Mechanics-of-Continuous
Mon, 01 Jun 2020 00:00:00 GMTZhong W, Lu Z, Schwab DJ, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Continuous attractors have been used to understand recent neuroscience experiments where persistent activity patterns encode internal representations of external attributes like head direction or spatial location. However, the conditions under which the emergent bump of neural activity in such networks can be manipulated by space and time-dependent external sensory or motor signals are not understood. Here, we find fundamental limits on how rapidly internal representations encoded along continuous attractors can be updated by an external signal. We apply these results to place cell networks to derive a velocity-dependent nonequilibrium memory capacity in neural networks.</span>3261033106810.1162/neco_a_01280https://direct.mit.edu/neco/article/32/6/1033/95586/Nonequilibrium-Statistical-Mechanics-of-ContinuousEfficient Position Decoding Methods Based on Fluorescence Calcium Imaging in the Mouse Hippocampus
https://direct.mit.edu/neco/article/32/6/1144/95585/Efficient-Position-Decoding-Methods-Based-on
Mon, 01 Jun 2020 00:00:00 GMTTu M, Zhao R, Adler A, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Large-scale fluorescence calcium imaging methods have become widely adopted for studies of long-term hippocampal and cortical neuronal dynamics. Pyramidal neurons of the rodent hippocampus show spatial tuning in freely foraging or head-fixed navigation tasks. Development of efficient neural decoding methods for reconstructing the animal's position in real or virtual environments can provide a fast readout of spatial representations in closed-loop neuroscience experiments. Here, we develop an efficient strategy to extract features from fluorescence calcium imaging traces and further decode the animal's position. We validate our spike inference-free decoding methods in multiple in vivo calcium imaging recordings of the mouse hippocampus based on both supervised and unsupervised decoding analyses. We systematically investigate the decoding performance of our proposed methods with respect to the number of neurons, imaging frame rate, and signal-to-noise ratio. Our proposed supervised decoding analysis is ultrafast and robust, and thereby appealing for real-time position decoding applications based on calcium imaging.</span>3261144116710.1162/neco_a_01281https://direct.mit.edu/neco/article/32/6/1144/95585/Efficient-Position-Decoding-Methods-Based-onSalient Slices: Improved Neural Network Training and Performance with Image Entropy
https://direct.mit.edu/neco/article/32/6/1222/95584/Salient-Slices-Improved-Neural-Network-Training
Mon, 01 Jun 2020 00:00:00 GMTFrank SJ, Frank AM. <span class="paragraphSection"><div class="boxTitle">Abstract</div>As a training and analysis strategy for convolutional neural networks (CNNs), we slice images into tiled segments and use, for training and prediction, segments that both satisfy an information criterion and contain sufficient content to support classification. In particular, we use image entropy as the information criterion. This ensures that each tile carries as much information diversity as the original image and, for many applications, serves as an indicator of usefulness in classification. To make predictions, a probability aggregation framework is applied to probabilities assigned by the CNN to the input image tiles. This technique, which we call Salient Slices, facilitates the use of large, high-resolution images that would be impractical to analyze unmodified; provides data augmentation for training, which is particularly valuable when image availability is limited; and enhances prediction accuracy through the ensemble nature of the tiled input.</span>3261222123710.1162/neco_a_01282https://direct.mit.edu/neco/article/32/6/1222/95584/Salient-Slices-Improved-Neural-Network-TrainingFeature Extraction of Surface Electromyography Based on Improved Small-World Leaky Echo State Network
https://direct.mit.edu/neco/article/32/4/741/95578/Feature-Extraction-of-Surface-Electromyography
Wed, 01 Apr 2020 00:00:00 GMTXi X, Jiang W, Miran SM, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Surface electromyography (sEMG) is an electrophysiological reflection of skeletal muscle contractile activity that can directly reflect neuromuscular activity. It has been a matter of research to investigate feature extraction methods of sEMG signals. In this letter, we propose a feature extraction method of sEMG signals based on the improved small-world leaky echo state network (ISWLESN). The reservoir of the leaky echo state network (LESN) is connected as a random network. First, we improved the reservoir of the echo state network (ESN) with small-world networks and used edge-added probability to improve these networks. This enhances the adaptability of the reservoir, the generalization ability, and the stability of the ESN. Then we obtained the output weight of the network through training and used it as features. We recorded the sEMG signals during different activities: falling, walking, sitting, squatting, going upstairs, and going downstairs. Afterward, we extracted corresponding features by ISWLESN and used principal component analysis for dimension reduction. At the end, scatter plots, the class separability index, and the Davies-Bouldin index were used to assess the performance of features. The results showed that the ISWLESN clustering performance was better than those of LESN and ESN. By support vector machine, it was also revealed that the performance of ISWLESN for classifying the activities was better than those of ESN and LESN.</span>32474175810.1162/neco_a_01270https://direct.mit.edu/neco/article/32/4/741/95578/Feature-Extraction-of-Surface-ElectromyographyOnline Learning Based on Online DCA and Application to Online Classification
https://direct.mit.edu/neco/article/32/4/759/95577/Online-Learning-Based-on-Online-DCA-and
Wed, 01 Apr 2020 00:00:00 GMTLe Thi H, Ho V. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We investigate an approach based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) for online learning techniques. The prediction problem of an online learner can be formulated as a DC program for which online DCA is applied. We propose the two so-called complete/approximate versions of online DCA scheme and prove their logarithmic/sublinear regrets. Six online DCA-based algorithms are developed for online binary linear classification. Numerical experiments on a variety of benchmark classification data sets show the efficiency of our proposed algorithms in comparison with the state-of-the-art online classification algorithms.</span>32475979310.1162/neco_a_01266https://direct.mit.edu/neco/article/32/4/759/95577/Online-Learning-Based-on-Online-DCA-andCenter Manifold Analysis of Plateau Phenomena Caused by Degeneration of Three-Layer Perceptron
https://direct.mit.edu/neco/article/32/4/683/95576/Center-Manifold-Analysis-of-Plateau-Phenomena
Wed, 01 Apr 2020 00:00:00 GMTTsutsui D. <span class="paragraphSection"><div class="boxTitle">Abstract</div>A hierarchical neural network usually has many singular regions in the parameter space due to the degeneration of hidden units. Here, we focus on a three-layer perceptron, which has one-dimensional singular regions comprising both attractive and repulsive parts. Such a singular region is often called a Milnor-like attractor. It is empirically known that in the vicinity of a Milnor-like attractor, several parameters converge much faster than the rest and that the dynamics can be reduced to smaller-dimensional ones. Here we give a rigorous proof for this phenomenon based on a center manifold theory. As an application, we analyze the reduced dynamics near the Milnor-like attractor and study the stochastic effects of the online learning.</span>32468371010.1162/neco_a_01268https://direct.mit.edu/neco/article/32/4/683/95576/Center-Manifold-Analysis-of-Plateau-PhenomenaNeural Model of Coding Stimulus Orientation and Adaptation
https://direct.mit.edu/neco/article/32/4/711/95575/Neural-Model-of-Coding-Stimulus-Orientation-and
Wed, 01 Apr 2020 00:00:00 GMTVaitkevičius H, Švegžda A, Stanikūnas R, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The coding of line orientation in the visual system has been investigated extensively. During the prolonged viewing of a stimulus, the perceived orientation continuously changes (normalization effect). Also, the orientation of the adapting stimulus and the background stimuli influence the perceived orientation of the subsequently displayed stimulus: tilt after-effect (TAE) or tilt illusion (TI). The neural mechanisms of these effects are not fully understood. The proposed model includes many local analyzers, each consisting of two sets of neurons. The first set has two independent cardinal detectors (CDs), whose responses depend on stimulus orientation. The second set has many orientation detectors (OD) tuned to different orientations of the stimulus. The ODs sum up the responses of the two CDs with respective weightings and output a preferred orientation depending on the ratio of CD responses. It is suggested that during prolonged viewing, the responses of the CDs decrease: the greater the excitation of the detector, the more rapid the decrease in its response. Thereby, the ratio of CD responses changes during the adaptation, causing the normalization effect and the TAE. The CDs of the different local analyzers laterally inhibit each other and cause the TI. We show that the properties of this model are consistent with both psychophysical and neurophysiological findings related to the properties of orientation perception, and we investigate how these mechanisms can affect the orientation's sensitivity.</span>32471174010.1162/neco_a_01269https://direct.mit.edu/neco/article/32/4/711/95575/Neural-Model-of-Coding-Stimulus-Orientation-andOptimal Multivariate Tuning with Neuron-Level and Population-Level Energy Constraints
https://direct.mit.edu/neco/article/32/4/794/95574/Optimal-Multivariate-Tuning-with-Neuron-Level-and
Wed, 01 Apr 2020 00:00:00 GMTHarel Y, Meir R. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Optimality principles have been useful in explaining many aspects of biological systems. In the context of neural encoding in sensory areas, optimality is naturally formulated in a Bayesian setting as neural tuning which minimizes mean decoding error. Many works optimize Fisher information, which approximates the minimum mean square error (MMSE) of the optimal decoder for long encoding time but may be misleading for short encoding times. We study MMSE-optimal neural encoding of a multivariate stimulus by uniform populations of spiking neurons, under firing rate constraints for each neuron as well as for the entire population. We show that the population-level constraint is essential for the formulation of a well-posed problem having finite optimal tuning widths and that optimal tuning aligns with the principal components of the prior distribution. Numerical evaluation of the two-dimensional case shows that encoding only the dimension with higher variance is optimal for short encoding times. We also compare direct MMSE optimization to optimization of several proxies to MMSE: Fisher information, maximum likelihood estimation error, and the Bayesian Cramér-Rao bound. We find that optimization of these measures yields qualitatively misleading results regarding MMSE-optimal tuning and its dependence on encoding time and energy constraints.</span>32479482810.1162/neco_a_01267https://direct.mit.edu/neco/article/32/4/794/95574/Optimal-Multivariate-Tuning-with-Neuron-Level-andSwitching in Cerebellar Stellate Cell Excitability in Response to a Pair of Inhibitory/Excitatory Presynaptic Inputs: A Dynamical System Perspective
https://direct.mit.edu/neco/article/32/3/626/95583/Switching-in-Cerebellar-Stellate-Cell-Excitability
Sun, 01 Mar 2020 00:00:00 GMTFarjami S, Alexander RD, Bowie D, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Cerebellar stellate cells form inhibitory synapses with Purkinje cells, the sole output of the cerebellum. Upon stimulation by a pair of varying inhibitory and fixed excitatory presynaptic inputs, these cells do not respond to excitation (i.e., do not generate an action potential) when the magnitude of the inhibition is within a given range, but they do respond outside this range. We previously used a revised Hodgkin–Huxley type of model to study the nonmonotonic first-spike latency of these cells and their temporal increase in excitability in whole cell configuration (termed run-up). Here, we recompute these latency profiles using the same model by adapting an efficient computational technique, the two-point boundary value problem, that is combined with the continuation method. We then extend the study to investigate how switching in responsiveness, upon stimulation with presynaptic inputs, manifests itself in the context of run-up. A three-dimensional reduced model is initially derived from the original six-dimensional model and then analyzed to demonstrate that both models exhibit type 1 excitability possessing a saddle-node on an invariant cycle (SNIC) bifurcation when varying the amplitude of I<sub>app</sub>. Using slow-fast analysis, we show that the original model possesses three equilibria lying at the intersection of the critical manifold of the fast subsystem and the nullcline of the slow variable h<sub>A</sub> (the inactivation of the A-type K<sup>+</sup> channel), that the middle equilibrium is of saddle type with a two-dimensional stable manifold (computed from the reduced model) acting as a boundary between the responsive and non-responsive regimes, and that the (ghost of) SNIC is formed when the h<sub>A</sub>-nullcline is (nearly) tangential to the critical manifold. 
We also show that the slow dynamics associated with (the ghost of) the SNIC and the lower stable branch of the critical manifold are responsible for generating the nonmonotonic first-spike latency. These results thus provide important insight into the complex dynamics of stellate cells.</span>32362665810.1162/neco_a_01261https://direct.mit.edu/neco/article/32/3/626/95583/Switching-in-Cerebellar-Stellate-Cell-ExcitabilityModel-Free Robust Optimal Feedback Mechanisms of Biological Motor Control
https://direct.mit.edu/neco/article/32/3/562/95582/Model-Free-Robust-Optimal-Feedback-Mechanisms-of
Sun, 01 Mar 2020 00:00:00 GMTBian T, Wolpert DM, Jiang Z. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Sensorimotor tasks that humans perform are often affected by different sources of uncertainty. Nevertheless, the central nervous system (CNS) can gracefully coordinate our movements. Most learning frameworks rely on the internal model principle, which requires a precise internal representation in the CNS to predict the outcomes of our motor commands. However, learning a perfect internal model in a complex environment over a short period of time is a nontrivial problem. Indeed, achieving proficient motor skills may require years of training for some difficult tasks. Internal models alone may not be adequate to explain the motor adaptation behavior during the early phase of learning. Recent studies investigating the active regulation of motor variability, the presence of suboptimal inference, and model-free learning have challenged some of the traditional viewpoints on the sensorimotor learning mechanism. As a result, it may be necessary to develop a computational framework that can account for these new phenomena. Here, we develop a novel theory of motor learning, based on model-free adaptive optimal control, which can bypass some of the difficulties in existing theories. This new theory is based on our recently developed adaptive dynamic programming (ADP) and robust ADP (RADP) methods and is especially useful for accounting for motor learning behavior when an internal model is inaccurate or unavailable. Our preliminary computational results are in line with experimental observations reported in the literature and can account for some phenomena that are inexplicable using existing models.</span>32356259510.1162/neco_a_01260https://direct.mit.edu/neco/article/32/3/562/95582/Model-Free-Robust-Optimal-Feedback-Mechanisms-ofHidden Aspects of the Research ADOS Are Bound to Affect Autism Science
https://direct.mit.edu/neco/article/32/3/515/95581/Hidden-Aspects-of-the-Research-ADOS-Are-Bound-to
Sun, 01 Mar 2020 00:00:00 GMT Torres EB, Rai R, Mistry S, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The research-grade Autism Diagnostic Observational Schedule (ADOS) is a broadly used instrument that informs and steers much of the science of autism. Despite its broad use, little is known about the empirical variability inherently present in the scores of the ADOS scale or their appropriateness to define change and its rate, to repeatedly use this test to characterize neurodevelopmental trajectories. Here we examine the empirical distributions of research-grade ADOS scores from 1324 records in a cross-section of the population comprising participants with autism between five and 65 years of age. We find that these empirical distributions violate the theoretical requirements of normality and homogeneous variance, essential for independence between bias and sensitivity. Further, we assess a subset of 52 typical controls versus those with autism and find a lack of proper elements to characterize neurodevelopmental trajectories in a coping nervous system changing at nonuniform, nonlinear rates. Repeating the assessments over four visits in a subset of the participants with autism for whom verbal criteria retained the same appropriate ADOS modules over the time span of the four visits reveals that switching the clinician changes the cutoff scores and consequently influences the diagnosis, despite maintaining fidelity in the same test's modules, room conditions, and tasks' fluidity per visit. 
Given the changes in probability distribution shape and dispersion of these ADOS scores, the lack of appropriate metric spaces to define similarity measures to characterize change, and the impact that these elements have on sensitivity-bias codependencies and on longitudinal tracking of autism, we invite a discussion on readjusting the use of this test for scientific purposes.</span> 32(3), 515–561. doi:10.1162/neco_a_01263. https://direct.mit.edu/neco/article/32/3/515/95581/Hidden-Aspects-of-the-Research-ADOS-Are-Bound-to
Evaluating the Potential Gain of Auditory and Audiovisual Speech-Predictive Coding Using Deep Learning
https://direct.mit.edu/neco/article/32/3/596/95580/Evaluating-the-Potential-Gain-of-Auditory-and
Sun, 01 Mar 2020 00:00:00 GMT Hueber T, Tatulli E, Girin L, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Sensory processing is increasingly conceived in a predictive framework in which neurons would constantly process the error signal resulting from the comparison of expected and observed stimuli. Surprisingly, few data exist on the accuracy of predictions that can be computed in real sensory scenes. Here, we focus on the sensory processing of auditory and audiovisual speech. We propose a set of computational models based on artificial neural networks (mixing deep feedforward and convolutional networks), which are trained to predict future audio observations from present and past audio or audiovisual observations (i.e., including lip movements). Those predictions exploit purely local phonetic regularities with no explicit call to higher linguistic levels. Experiments are conducted on the multispeaker LibriSpeech audio speech database (around 100 hours) and on the NTCD-TIMIT audiovisual speech database (around 7 hours). They appear to be efficient in a short temporal range (25–50 ms), predicting 50% to 75% of the variance of the incoming stimulus, which could result in potentially saving up to three-quarters of the processing power. Then they quickly decrease and almost vanish after 250 ms. Adding information on the lips slightly improves predictions, with a 5% to 10% increase in explained variance. Interestingly, the visual gain vanishes more slowly, and the gain is maximum for a delay of 75 ms between image and predicted sound.</span> 32(3), 596–625. doi:10.1162/neco_a_01264. https://direct.mit.edu/neco/article/32/3/596/95580/Evaluating-the-Potential-Gain-of-Auditory-and
Classification from Triplet Comparison Data
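As a rough illustration of the prediction-gain measurement in the abstract above, here is a minimal sketch (a linear predictor on a synthetic AR(1) signal with made-up parameters, not the paper's deep networks or speech data) of how explained variance decays with the prediction horizon:

```python
import numpy as np

# Toy stand-in for short-horizon sensory prediction: an AR(1) signal
# x[t+1] = a*x[t] + noise, whose best linear h-step-ahead predictor is
# a**h * x[t]. Explained variance then decays as a**(2*h), mimicking the
# rapid drop-off of predictive gain with temporal horizon.
rng = np.random.default_rng(0)
a = 0.95
x = np.zeros(50_000)
for t in range(len(x) - 1):
    x[t + 1] = a * x[t] + rng.normal()

def explained_variance(h):
    pred = (a ** h) * x[:-h]      # h-step-ahead linear prediction
    err = x[h:] - pred
    return 1.0 - err.var() / x.var()

ev_short, ev_long = explained_variance(1), explained_variance(50)
```

With these illustrative parameters, `ev_short` sits near a² ≈ 0.90, while `ev_long` is close to zero, echoing the paper's finding that predictions are informative only over a short temporal range.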
https://direct.mit.edu/neco/article/32/3/659/95579/Classification-from-Triplet-Comparison-Data
Sun, 01 Mar 2020 00:00:00 GMT Cui Z, Charoenphakdee N, Sato I, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Learning from triplet comparison data has been extensively studied in the context of metric learning, where we want to learn a distance metric between two instances, and ordinal embedding, where we want to learn an embedding in a Euclidean space of the given instances that preserves the comparison order as much as possible. Unlike fully labeled data, triplet comparison data can be collected in a more accurate and human-friendly way. Although learning from triplet comparison data has been considered in many applications, an important fundamental question of whether we can learn a classifier only from triplet comparison data without all the labels has remained unanswered. In this letter, we give a positive answer to this important question by proposing an unbiased estimator for the classification risk under the empirical risk minimization framework. Since the proposed method is based on the empirical risk minimization framework, it inherently has the advantage that any surrogate loss function and any model, including neural networks, can be easily applied. Furthermore, we theoretically establish an estimation error bound for the proposed empirical risk minimizer. Finally, we provide experimental results to show that our method empirically works well and outperforms various baseline methods.</span> 32(3), 659–681. doi:10.1162/neco_a_01262. https://direct.mit.edu/neco/article/32/3/659/95579/Classification-from-Triplet-Comparison-Data
Face Representations via Tensorfaces of Various Complexities
https://direct.mit.edu/neco/article/32/2/281/95573/Face-Representations-via-Tensorfaces-of-Various
Sat, 01 Feb 2020 00:00:00 GMT Lehky SR, Phan A, Cichocki A, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Neurons selective for faces exist in humans and monkeys. However, characteristics of face cell receptive fields are poorly understood. In this theoretical study, we explore the effects of complexity, defined as algorithmic information (Kolmogorov complexity) and logical depth, on possible ways that face cells may be organized. We use tensor decompositions to decompose faces into a set of components, called tensorfaces, and their associated weights, which can be interpreted as model face cells and their firing rates. These tensorfaces form a high-dimensional representation space in which each tensorface forms an axis of the space. A distinctive feature of the decomposition algorithm is the ability to specify tensorface complexity. We found that low-complexity tensorfaces have blob-like appearances crudely approximating faces, while high-complexity tensorfaces appear clearly face-like. Low-complexity tensorfaces require a larger population to reach a criterion face reconstruction error than medium- or high-complexity tensorfaces, and thus are inefficient by that criterion. Low-complexity tensorfaces, however, generalize better when representing statistically novel faces, which are faces falling beyond the distribution of face description parameters found in the tensorface training set. The degree to which face representations are parts based or global forms a continuum as a function of tensorface complexity, with low- and medium-complexity tensorfaces being more parts based. 
Given the computational load imposed in creating high-complexity face cells (in the form of algorithmic information and logical depth) and in the absence of a compelling advantage to using high-complexity cells, we suggest face representations consist of a mixture of low- and medium-complexity face cells.</span> 32(2), 281–329. doi:10.1162/neco_a_01258. https://direct.mit.edu/neco/article/32/2/281/95573/Face-Representations-via-Tensorfaces-of-Various
Synaptic Scaling Improves the Stability of Neural Mass Models Capable of Simulating Brain Plasticity
https://direct.mit.edu/neco/article/32/2/424/95572/Synaptic-Scaling-Improves-the-Stability-of-Neural
Sat, 01 Feb 2020 00:00:00 GMT Demšar J, Forsyth R. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Neural mass models offer a way of studying the development and behavior of large-scale brain networks through computer simulations. Such simulations are currently mainly research tools, but as they improve, they could soon play a role in understanding, predicting, and optimizing patient treatments, particularly in relation to effects and outcomes of brain injury. To bring us closer to this goal, we took an existing state-of-the-art neural mass model capable of simulating connection growth through simulated plasticity processes. We identified and addressed some of the model's limitations by implementing biologically plausible mechanisms. The main limitation of the original model was its instability, which we addressed by incorporating a representation of the mechanism of synaptic scaling and examining the effects of optimizing parameters in the model. We show that the updated model retains all the merits of the original model, while being more stable and capable of generating networks that are in several aspects similar to those found in real brains.</span> 32(2), 424–446. doi:10.1162/neco_a_01257. https://direct.mit.edu/neco/article/32/2/424/95572/Synaptic-Scaling-Improves-the-Stability-of-Neural
Transition Scale-Spaces: A Computational Theory for the Discretized Entorhinal Cortex
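The synaptic scaling mechanism invoked in the abstract above can be caricatured in a few lines. This is a generic multiplicative homeostasis rule with invented parameters, not the letter's neural mass model: each neuron rescales all of its incoming weights by a common factor so that its total synaptic drive drifts toward a homeostatic set point.

```python
import numpy as np

# Minimal sketch of synaptic scaling (a generic rule, not the paper's model):
# every neuron multiplicatively rescales its incoming weights so that its
# total synaptic drive converges to a fixed target, stabilizing the network.
rng = np.random.default_rng(1)
W = rng.uniform(0.0, 1.0, size=(20, 20))   # recurrent weights (row = inputs to one neuron)
rates = rng.uniform(0.1, 2.0, size=20)     # presynaptic firing rates
target = 1.0                               # homeostatic set point

for _ in range(200):
    drive = W @ rates                      # total synaptic drive per neuron
    # scale each neuron's incoming weights by one common factor
    W *= (1.0 + 0.1 * (target - drive) / np.maximum(drive, 1e-9))[:, None]

final_drive = W @ rates
```

Because the update moves each drive a fixed fraction of the way toward the target, `final_drive` converges to the set point regardless of the initial weights.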
https://direct.mit.edu/neco/article/32/2/330/95569/Transition-Scale-Spaces-A-Computational-Theory-for
Sat, 01 Feb 2020 00:00:00 GMT Waniek N. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Although hippocampal grid cells are thought to be crucial for spatial navigation, their computational purpose remains disputed. Recently, they were proposed to represent spatial transitions and convey this knowledge downstream to place cells. However, a single scale of transitions is insufficient to plan long goal-directed sequences in behaviorally acceptable time. Here, a scale-space data structure is suggested to optimally accelerate retrievals from transition systems, called transition scale-space (TSS). Remaining exclusively on an algorithmic level, the scale increment is proved to be ideally 2 for biologically plausible receptive fields. It is then argued that temporal buffering is necessary to learn the scale-space online. Next, two modes for retrieval of sequences from the TSS are presented: top down and bottom up. The two modes are evaluated in symbolic simulations (i.e., without biologically plausible spiking neurons). Additionally, a TSS is used for short-cut discovery in a simulated Morris water maze. Finally, the results are discussed in depth with respect to biological plausibility, and several testable predictions are derived. Moreover, relations to other grid cell models, multiresolution path planning, and scale-space theory are highlighted. Summarized, reward-free transition encoding is shown here, in a theoretical model, to be compatible with the observed discretization along the dorso-ventral axis of the medial entorhinal cortex. 
Because the theoretical model generalizes beyond navigation, the TSS is suggested to be a general-purpose cortical data structure for fast retrieval of sequences and relational knowledge. Source code for all simulations presented in this paper can be found at <a href="https://github.com/rochus/transitionscalespace">https://github.com/rochus/transitionscalespace</a>.</span> 32(2), 330–394. doi:10.1162/neco_a_01255. https://direct.mit.edu/neco/article/32/2/330/95569/Transition-Scale-Spaces-A-Computational-Theory-for
Scaled Coupled Norms and Coupled Higher-Order Tensor Completion
https://direct.mit.edu/neco/article/32/2/447/95567/Scaled-Coupled-Norms-and-Coupled-Higher-Order
Sat, 01 Feb 2020 00:00:00 GMT Wimalawarne K, Yamada M, Mamitsuka H. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Recently, a set of tensor norms known as <span style="font-style:italic;">coupled norms</span> has been proposed as a convex solution to coupled tensor completion. Coupled norms have been designed by combining low-rank inducing tensor norms with the matrix trace norm. Though coupled norms have shown good performances, they have two major limitations: they do not have a method to control the regularization of coupled modes and uncoupled modes, and they are not optimal for couplings among higher-order tensors. In this letter, we propose a method that scales the regularization of coupled components against uncoupled components to properly induce the low-rankness on the coupled mode. We also propose coupled norms for higher-order tensors by combining the square norm with coupled norms. Using the excess risk-bound analysis, we demonstrate that our proposed methods lead to lower risk bounds compared to existing coupled norms. We demonstrate the robustness of our methods through simulation and real-data experiments.</span> 32(2), 447–484. doi:10.1162/neco_a_01254. https://direct.mit.edu/neco/article/32/2/447/95567/Scaled-Coupled-Norms-and-Coupled-Higher-Order
Improving Generalization via Attribute Selection on Out-of-the-Box Data
https://direct.mit.edu/neco/article/32/2/485/95565/Improving-Generalization-via-Attribute-Selection
Sat, 01 Feb 2020 00:00:00 GMT Xu X, Tsang IW, Liu C. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Zero-shot learning (ZSL) aims to recognize unseen objects (test classes) given some other seen objects (training classes) by sharing information of attributes between different objects. Attributes are artificially annotated for objects and treated equally in recent ZSL tasks. However, some inferior attributes with poor predictability or poor discriminability may have negative impacts on the ZSL system performance. This letter first derives a generalization error bound for ZSL tasks. Our theoretical analysis verifies that selecting the subset of key attributes can improve the generalization performance of the original ZSL model, which uses all the attributes. Unfortunately, previous attribute selection methods have been conducted based on the seen data, and their selected attributes have poor generalization capability to the unseen data, which is unavailable in the training stage of ZSL tasks. Inspired by learning from pseudo-relevance feedback, this letter introduces out-of-the-box data—pseudo-data generated by an attribute-guided generative model—to mimic the unseen data. We then present an iterative attribute selection (IAS) strategy that iteratively selects key attributes based on the out-of-the-box data. Since the distribution of the generated out-of-the-box data is similar to that of the test data, the key attributes selected by IAS can be effectively generalized to test data. Extensive experiments demonstrate that IAS can significantly improve existing attribute-based ZSL methods and achieve state-of-the-art performance.</span> 32(2), 485–514. doi:10.1162/neco_a_01256. https://direct.mit.edu/neco/article/32/2/485/95565/Improving-Generalization-via-Attribute-Selection
From Synaptic Interactions to Collective Dynamics in Random Neuronal Networks Models: Critical Role of Eigenvectors and Transient Behavior
https://direct.mit.edu/neco/article/32/2/395/95563/From-Synaptic-Interactions-to-Collective-Dynamics
Sat, 01 Feb 2020 00:00:00 GMT Gudowska-Nowak EE, Nowak MA, Chialvo DR, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The study of neuronal interactions is at the center of several big collaborative neuroscience projects (including the Human Connectome Project, the Blue Brain Project, and the Brainome) that attempt to obtain a detailed map of the entire brain. Under certain constraints, mathematical theory can advance predictions of the expected neural dynamics based solely on the statistical properties of the synaptic interaction matrix. This work explores the application of free random variables to the study of large synaptic interaction matrices. Besides recovering in a straightforward way known results on eigenspectra in types of models of neural networks proposed by Rajan and Abbott (<a href="#B57" class="reflinks">2006</a>), we extend them to heavy-tailed distributions of interactions. More important, we analytically derive the behavior of eigenvector overlaps, which determine the stability of the spectra. We observe that on imposing the neuronal excitation/inhibition balance, despite the eigenvalues remaining unchanged, their stability dramatically decreases due to the strong nonorthogonality of associated eigenvectors. This leads us to the conclusion that understanding the temporal evolution of asymmetric neural networks requires considering the entangled dynamics of both eigenvectors and eigenvalues, which might bear consequences for learning and memory processes in these models. Considering the success of free random variables theory in a wide variety of disciplines, we hope that the results presented here foster the additional application of these ideas in the area of brain sciences.</span> 32(2), 395–423. doi:10.1162/neco_a_01253. https://direct.mit.edu/neco/article/32/2/395/95563/From-Synaptic-Interactions-to-Collective-Dynamics
A Continuous-Time Analysis of Distributed Stochastic Gradient
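The central role of eigenvector nonorthogonality in the abstract above can be seen in a small numerical demo. This is a generic random-matrix illustration, not the letter's free-probability calculation: a normal (here, symmetric) matrix has perfectly conditioned eigenvectors, while a nonnormal random connectivity matrix of the same size has strongly nonorthogonal ones, which is what amplifies transient dynamics.

```python
import numpy as np

# Compare eigenvector conditioning of a normal vs. a nonnormal random matrix.
# For a symmetric matrix the eigenvector matrix is (numerically) orthogonal,
# so its condition number is ~1; a generic asymmetric random matrix has a
# much larger eigenvector condition number, signaling nonorthogonality.
rng = np.random.default_rng(2)
n = 100
J = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))   # random connectivity

sym = (J + J.T) / np.sqrt(2)          # normal comparison matrix
_, V_sym = np.linalg.eig(sym)
_, V_asym = np.linalg.eig(J)

kappa_sym = np.linalg.cond(V_sym)     # ~1: orthogonal eigenvectors
kappa_asym = np.linalg.cond(V_asym)   # large: entangled, nonorthogonal
```

The same eigenvalue spectrum can thus hide very different transient behavior, depending on how nonorthogonal the eigenvectors are.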
https://direct.mit.edu/neco/article/32/1/36/95571/A-Continuous-Time-Analysis-of-Distributed
Wed, 01 Jan 2020 00:00:00 GMT Boffi NM, Slotine JE. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We analyze the effect of synchronization on distributed stochastic gradient algorithms. By exploiting an analogy with dynamical models of biological quorum sensing, where synchronization between agents is induced through communication with a common signal, we quantify how synchronization can significantly reduce the magnitude of the noise felt by the individual distributed agents and their spatial mean. This noise reduction is in turn associated with a reduction in the smoothing of the loss function imposed by the stochastic gradient approximation. Through simulations on model nonconvex objectives, we demonstrate that coupling can stabilize higher noise levels and improve convergence. We provide a convergence analysis for strongly convex functions by deriving a bound on the expected deviation of the spatial mean of the agents from the global minimizer for an algorithm based on quorum sensing, the same algorithm with momentum, and the elastic averaging SGD (EASGD) algorithm. We discuss extensions to new algorithms that allow each agent to broadcast its current measure of success and shape the collective computation accordingly. We supplement our theoretical analysis with numerical experiments on convolutional neural networks trained on the CIFAR-10 data set, where we note a surprising regularizing property of EASGD even when applied to the nondistributed case. This observation suggests alternative second-order in time algorithms for nondistributed optimization that are competitive with momentum methods.</span> 32(1), 36–96. doi:10.1162/neco_a_01248. https://direct.mit.edu/neco/article/32/1/36/95571/A-Continuous-Time-Analysis-of-Distributed
On Kernel Method–Based Connectionist Models and Supervised Deep Learning Without Backpropagation
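The noise-reduction effect described in the abstract above can be sketched with a toy experiment. The dynamics and parameters here are invented for illustration, not the letter's quorum-sensing or EASGD algorithms: N agents run noisy gradient descent on a quadratic while being pulled toward their spatial mean, and the mean trajectory sees roughly N-fold less noise than a single uncoupled agent.

```python
import numpy as np

# Toy coupled stochastic gradient descent on f(x) = x**2 / 2.
# Each agent feels independent gradient noise; coupling to the spatial mean
# cancels in the mean itself, whose noise variance shrinks roughly as 1/N.
rng = np.random.default_rng(3)
N, lr, k, sigma, steps = 50, 0.1, 0.5, 1.0, 2000
x = rng.normal(size=N)          # coupled agents
single = rng.normal()           # uncoupled reference agent
means, singles = [], []
for _ in range(steps):
    grad = x + sigma * rng.normal(size=N)        # noisy gradients
    x += -lr * grad + lr * k * (x.mean() - x)    # coupling to the mean
    single += -lr * (single + sigma * rng.normal())
    means.append(x.mean())
    singles.append(single)

var_mean = np.var(means[500:])      # stationary variance of the spatial mean
var_single = np.var(singles[500:])  # stationary variance of a lone agent
```

With these parameters the spatial mean fluctuates around the minimizer with far smaller variance than the single agent, the qualitative effect the analysis quantifies.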
https://direct.mit.edu/neco/article/32/1/97/95570/On-Kernel-Method-Based-Connectionist-Models-and
Wed, 01 Jan 2020 00:00:00 GMTDuan S, Yu S, Chen Y, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We propose a novel family of connectionist models based on kernel machines and consider the problem of learning layer by layer a compositional hypothesis class (i.e., a feedforward, multilayer architecture) in a supervised setting. In terms of the models, we present a principled method to “kernelize” (partly or completely) any neural network (NN). With this method, we obtain a counterpart of any given NN that is powered by kernel machines instead of neurons. In terms of learning, when learning a feedforward deep architecture in a supervised setting, one needs to train all the components simultaneously using backpropagation (BP) since there are no explicit targets for the hidden layers (Rumelhart, Hinton, & Williams, <a href="#B40" class="reflinks">1986</a>). We consider without loss of generality the two-layer case and present a general framework that explicitly characterizes a target for the hidden layer that is optimal for minimizing the objective function of the network. This characterization then makes possible a purely greedy training scheme that learns one layer at a time, starting from the input layer. We provide instantiations of the abstract framework under certain architectures and objective functions. Based on these instantiations, we present a layer-wise training algorithm for an l-layer feedforward network for classification, where l≥2 can be arbitrary. This algorithm can be given an intuitive geometric interpretation that makes the learning dynamics transparent. Empirical results are provided to complement our theory. We show that the kernelized networks, trained layer-wise, compare favorably with classical kernel machines as well as other connectionist models trained by BP. 
We also visualize the inner workings of the greedy kernelized models to validate our claim on the transparency of the layer-wise algorithm.</span> 32(1), 97–135. doi:10.1162/neco_a_01250. https://direct.mit.edu/neco/article/32/1/97/95570/On-Kernel-Method-Based-Connectionist-Models-and
A Robust Model of Gated Working Memory
https://direct.mit.edu/neco/article/32/1/153/95568/A-Robust-Model-of-Gated-Working-Memory
Wed, 01 Jan 2020 00:00:00 GMT Strock A, Hinaut X, Rougier NP. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Gated working memory is defined as the capacity of holding arbitrary information at any time in order to be used at a later time. Based on electrophysiological recordings, several computational models have tackled the problem using dedicated and explicit mechanisms. We propose instead to consider an implicit mechanism based on a random recurrent neural network. We introduce a robust yet simple reservoir model of gated working memory with instantaneous updates. The model is able to store an arbitrary real value at a random time over an extended period of time. The dynamics of the model is a line attractor that learns to exploit reentry and a nonlinearity during the training phase using only a few representative values. A deeper study of the model shows that there is actually a large range of hyperparameters for which the results hold (e.g., number of neurons, sparsity, global weight scaling) such that any large enough population, mixing excitatory and inhibitory neurons, can quickly learn to realize such gated working memory. In a nutshell, with a minimal set of hypotheses, we show that we can have a robust model of working memory. This suggests this property could be an implicit property of any random population that can be acquired through learning. Furthermore, considering working memory to be a physically open but functionally closed system, we account for some counterintuitive electrophysiological recordings.</span> 32(1), 153–181. doi:10.1162/neco_a_01249. https://direct.mit.edu/neco/article/32/1/153/95568/A-Robust-Model-of-Gated-Working-Memory
Optimal Sampling of Parametric Families: Implications for Machine Learning
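The gated working-memory computation the reservoir in the abstract above learns to approximate can be written down directly. This is a sketch of the task itself, not of the reservoir model: when the gate is open the memory copies the input value, and when it is closed the memory holds its stored value indefinitely, which is exactly a line-attractor update.

```python
# Minimal sketch of the gated working-memory task (not the reservoir model):
# m_t = g_t * v_t + (1 - g_t) * m_{t-1}
# gate g=1 overwrites the memory with the input v; gate g=0 holds it.
def gated_memory(values, gates, m0=0.0):
    m, trace = m0, []
    for v, g in zip(values, gates):
        m = g * v + (1.0 - g) * m   # line-attractor update: hold or overwrite
        trace.append(m)
    return trace

trace = gated_memory([0.3, 0.9, -0.5, 0.7], [1, 0, 1, 0])
# trace is [0.3, 0.3, -0.5, -0.5]: copies 0.3, holds it, copies -0.5, holds it
```

The letter's contribution is showing that a random recurrent population can learn this hold-or-overwrite behavior implicitly, from only a few representative values.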
https://direct.mit.edu/neco/article/32/1/261/95566/Optimal-Sampling-of-Parametric-Families
Wed, 01 Jan 2020 00:00:00 GMT Huber AG, Anumula J, Liu S. <span class="paragraphSection"><div class="boxTitle">Abstract</div>It is well known in machine learning that models trained on a training set generated by a probability distribution function perform far worse on test sets generated by a different probability distribution function. In the limit, it is feasible that a continuum of probability distribution functions might have generated the observed test set data; a desirable property of a learned model in that case is its ability to describe most of the probability distribution functions from the continuum equally well. This requirement naturally leads to sampling methods from the continuum of probability distribution functions that lead to the construction of optimal training sets. We study the sequential prediction of Ornstein-Uhlenbeck processes that form a parametric family. We find empirically that a simple deep network trained on optimally constructed training sets using the methods described in this letter can be robust to changes in the test set distribution.</span> 32(1), 261–279. doi:10.1162/neco_a_01251. https://direct.mit.edu/neco/article/32/1/261/95566/Optimal-Sampling-of-Parametric-Families
Iterative Retrieval and Block Coding in Autoassociative and Heteroassociative Memory
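The setting in the abstract above, a parametric family of Ornstein-Uhlenbeck processes, is easy to reproduce. This sketch uses Euler-Maruyama simulation with illustrative parameter ranges, not the letter's optimal sampling scheme: each trajectory follows dx = -theta*x*dt + sigma*dW, with theta drawn from a continuum so the training set spans many distributions.

```python
import numpy as np

# Build a training set from a parametric family of OU processes.
# theta (mean-reversion rate) is drawn from a continuum; each draw yields a
# trajectory from a different member of the family.
rng = np.random.default_rng(4)

def ou_trajectory(theta, sigma=1.0, dt=0.01, n=500):
    x = np.zeros(n)
    for t in range(n - 1):
        # Euler-Maruyama step for dx = -theta*x*dt + sigma*dW
        x[t + 1] = x[t] - theta * x[t] * dt + sigma * np.sqrt(dt) * rng.normal()
    return x

thetas = rng.uniform(0.5, 5.0, size=32)   # the parameter continuum (illustrative)
train_set = np.stack([ou_trajectory(th) for th in thetas])
```

How the thetas are drawn is exactly the design question the letter addresses; uniform sampling here is just a placeholder.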
https://direct.mit.edu/neco/article/32/1/205/95564/Iterative-Retrieval-and-Block-Coding-in
Wed, 01 Jan 2020 00:00:00 GMT Knoblauch A, Palm G. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Neural associative memories (NAM) are perceptron-like single-layer networks with fast synaptic learning typically storing discrete associations between pairs of neural activity patterns. Gripon and Berrou (<a href="#B21" class="reflinks">2011</a>) investigated NAM employing block coding, a particular sparse coding method, and reported a significant increase in storage capacity. Here we verify and extend their results for both heteroassociative and recurrent autoassociative networks. For this we provide a new analysis of iterative retrieval in finite autoassociative and heteroassociative networks that allows estimating storage capacity for random and block patterns. Furthermore, we have implemented various retrieval algorithms for block coding and compared them in simulations to our theoretical results and previous simulation data. In good agreement between theory and experiments, we find that finite networks employing block coding can store significantly more memory patterns. However, due to the reduced information per block pattern, it is not possible to significantly increase stored information per synapse. Asymptotically, the information retrieval capacity converges to the known limits C=ln2≈0.69 and C=(ln2)/4≈0.17 also for block coding. We have also implemented very large recurrent networks up to n=2·10^6 neurons, showing that maximal capacity C≈0.2 bit per synapse occurs for finite networks having a size n≈10^5 similar to cortical macrocolumns.</span> 32(1), 205–260. doi:10.1162/neco_a_01247. https://direct.mit.edu/neco/article/32/1/205/95564/Iterative-Retrieval-and-Block-Coding-in
Toward Training Recurrent Neural Networks for Lifelong Learning
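The combination of a Willshaw memory with block coding discussed in the abstract above can be sketched compactly. This is the classic binary-weight Hebbian rule with a per-block winner-take-all readout and toy sizes, not the letter's full iterative-retrieval analysis: patterns have exactly one active unit per block, are stored by a binary OR of outer products, and are retrieved by taking the most-driven unit in each block.

```python
import numpy as np

# Willshaw autoassociative memory with block-coded patterns (toy sketch).
rng = np.random.default_rng(5)
blocks, block_size = 8, 16
n = blocks * block_size

def random_block_pattern():
    p = np.zeros(n, dtype=bool)
    for b in range(blocks):                      # one active unit per block
        p[b * block_size + rng.integers(block_size)] = True
    return p

patterns = [random_block_pattern() for _ in range(20)]
W = np.zeros((n, n), dtype=bool)
for p in patterns:
    W |= np.outer(p, p)                          # binary Hebbian (OR) storage

def retrieve(cue):
    dendritic = W.astype(int) @ cue.astype(int)  # active-input counts per unit
    out = np.zeros(n, dtype=bool)
    for b in range(blocks):                      # winner-take-all per block
        seg = dendritic[b * block_size:(b + 1) * block_size]
        out[b * block_size + np.argmax(seg)] = True
    return out

recalled = retrieve(patterns[0])
```

The per-block winner-take-all is what block coding buys: retrieval only has to pick one unit per block rather than threshold the whole vector.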
https://direct.mit.edu/neco/article/32/1/1/95562/Toward-Training-Recurrent-Neural-Networks-for
Wed, 01 Jan 2020 00:00:00 GMT Sodhani S, Chandar S, Bengio Y. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Catastrophic forgetting and capacity saturation are the central challenges of any parametric lifelong learning system. In this work, we study these challenges in the context of sequential supervised learning with an emphasis on recurrent neural networks. To evaluate the models in the lifelong learning setting, we propose a curriculum-based, simple, and intuitive benchmark where the models are trained on tasks with increasing levels of difficulty. To measure the impact of catastrophic forgetting, the model is tested on all the previous tasks as it completes any task. As a step toward developing true lifelong learning systems, we unify gradient episodic memory (a catastrophic forgetting alleviation approach) and Net2Net (a capacity expansion approach). Both models are proposed in the context of feedforward networks, and we evaluate the feasibility of using them for recurrent networks. Evaluation on the proposed benchmark shows that the unified model is more suitable than the constituent models for the lifelong learning setting.</span> 32(1), 1–35. doi:10.1162/neco_a_01246. https://direct.mit.edu/neco/article/32/1/1/95562/Toward-Training-Recurrent-Neural-Networks-for
An FPGA Implementation of Deep Spiking Neural Networks for Low-Power and Fast Classification
https://direct.mit.edu/neco/article/32/1/182/95561/An-FPGA-Implementation-of-Deep-Spiking-Neural
Wed, 01 Jan 2020 00:00:00 GMT Ju X, Fang B, Yan R, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>A spiking neural network (SNN) is a biologically plausible model that performs information processing based on spikes. Training a deep SNN effectively is challenging due to the nondifferentiability of spike signals. Recent advances have shown that high-performance SNNs can be obtained by converting convolutional neural networks (CNNs). However, large-scale SNNs are poorly served by conventional architectures due to the dynamic nature of spiking neurons. In this letter, we propose a hardware architecture to enable efficient implementation of SNNs. All layers in the network are mapped on one chip so that the computation of different time steps can be done in parallel to reduce latency. We propose a new spiking max-pooling method to reduce computation complexity. In addition, we apply approaches based on shift registers and coarse-grained parallelism to accelerate the convolution operation. We also investigate the effect of different encoding methods on SNN accuracy. Finally, we validate the hardware architecture on the Xilinx Zynq ZCU102. The experimental results on the MNIST data set show that it can achieve an accuracy of 98.94% with eight-bit quantized weights. Furthermore, it achieves 164 frames per second (FPS) under a 150 MHz clock frequency, obtaining a 41× speed-up over a CPU implementation and 22× lower power than a GPU implementation.</span> 32(1), 182–204. doi:10.1162/neco_a_01245. https://direct.mit.edu/neco/article/32/1/182/95561/An-FPGA-Implementation-of-Deep-Spiking-Neural
Storing Object-Dependent Sparse Codes in a Willshaw Associative Network
https://direct.mit.edu/neco/article/32/1/136/95560/Storing-Object-Dependent-Sparse-Codes-in-a
Wed, 01 Jan 2020 00:00:00 GMT Sa-Couto L, Wichert A. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Willshaw networks are single-layered neural networks that store associations between binary vectors. Using only binary weights, these networks can be implemented efficiently to store large numbers of patterns and allow for fault-tolerant recovery of those patterns from noisy cues. However, this is only the case when the involved codes are sparse and randomly generated. In this letter, we use a recently proposed approach that maps visual patterns into informative binary features. By doing so, we manage to transform MNIST handwritten digits into well-distributed codes that we then store in a Willshaw network in autoassociation. We perform experiments with both noisy and noiseless cues and verify a tenuous impact on the recovered pattern's relevant information. More specifically, we were able to perform retrieval after filling the memory to several factors of its number of units while preserving the information of the class to which the pattern belongs.</span> 32(1), 136–152. doi:10.1162/neco_a_01243. https://direct.mit.edu/neco/article/32/1/136/95560/Storing-Object-Dependent-Sparse-Codes-in-a
Spike-Based Winner-Take-All Computation: Fundamental Limits and Order-Optimal Circuits
https://direct.mit.edu/neco/article/31/12/2523/95617/Spike-Based-Winner-Take-All-Computation
Sun, 01 Dec 2019 00:00:00 GMT Su L, Chang C, Lynch N. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Winner-take-all (WTA) refers to the neural operation that selects a (typically small) group of neurons from a large neuron pool. It is conjectured to underlie many of the brain's fundamental computational abilities. However, not much is known about the robustness of a spike-based WTA network to the inherent randomness of the input spike trains. In this work, we consider a spike-based k–WTA model wherein n randomly generated input spike trains compete with each other based on their underlying firing rates and k winners are supposed to be selected. We slot the time evenly with each time slot of length 1 ms and model the n input spike trains as n independent Bernoulli processes. We analytically characterize the minimum waiting time needed so that a target minimax decision accuracy (success probability) can be reached. We first derive an information-theoretic lower bound on the waiting time. We show that to guarantee a (minimax) decision error ≤δ (where δ∈(0,1)), the waiting time of any WTA circuit is at least ((1-δ)log(k(n-k)+1)-1)·T_R, where R⊆(0,1) is a finite set of rates and T_R is a difficulty parameter of a WTA task with respect to set R for independent input spike trains. Additionally, T_R is independent of δ, n, and k. We then design a simple WTA circuit whose waiting time is O((log(1/δ)+log(k(n-k)))·T_R), provided that the local memory of each output neuron is sufficiently long. It turns out that for any fixed δ, this decision time is order-optimal (i.e., it matches the above lower bound up to a multiplicative constant factor) in terms of its scaling in n, k, and T_R.</span> 31(12), 2523–2561. doi:10.1162/neco_a_01242. https://direct.mit.edu/neco/article/31/12/2523/95617/Spike-Based-Winner-Take-All-Computation
Safe Triplet Screening for Distance Metric Learning
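The k-WTA setting analyzed in the abstract above is easy to simulate. The rates, waiting time, and decision rule below are illustrative choices, not the paper's circuit or bounds: n Bernoulli spike trains compete over T one-millisecond slots, the k largest spike counts are declared winners, and the empirical success probability is estimated over repeated trials.

```python
import numpy as np

# Toy k-WTA by spike counting: with a long enough waiting time T, the
# highest-rate inputs win with probability close to 1.
rng = np.random.default_rng(6)
n, k, T, trials = 10, 2, 400, 200
rates = np.array([0.5, 0.5] + [0.3] * (n - 2))    # inputs 0 and 1 should win

successes = 0
for _ in range(trials):
    spikes = rng.random((n, T)) < rates[:, None]  # Bernoulli spike trains
    counts = spikes.sum(axis=1)                   # spikes observed in T slots
    winners = set(np.argsort(counts)[-k:])        # top-k spike counts
    successes += winners == {0, 1}

success_rate = successes / trials
```

Shrinking T (or the gap between the rates, which is what the difficulty parameter T_R captures) drives the success rate down, which is the trade-off the paper's lower and upper bounds quantify.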
https://direct.mit.edu/neco/article/31/12/2432/95615/Safe-Triplet-Screening-for-Distance-Metric
Sun, 01 Dec 2019 00:00:00 GMTYoshida T, Takeuchi I, Karasuyama M. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Distance metric learning has been widely used to obtain the optimal distance function based on the given training data. We focus on a triplet-based loss function, which imposes a penalty such that a pair of instances in the same class is closer than a pair in different classes. However, the number of possible triplets can be quite large even for a small data set, and this considerably increases the computational cost for metric optimization. In this letter, we propose safe triplet screening that identifies triplets that can be safely removed from the optimization problem without losing the optimality. In comparison with existing safe screening studies, triplet screening is particularly significant because of the huge number of possible triplets and the semidefinite constraint in the optimization problem. We demonstrate and verify the effectiveness of our screening rules by using several benchmark data sets.</span>31122432249110.1162/neco_a_01240https://direct.mit.edu/neco/article/31/12/2432/95615/Safe-Triplet-Screening-for-Distance-MetricBayesian Filtering with Multiple Internal Models: Toward a Theory of Social Intelligence
https://direct.mit.edu/neco/article/31/12/2390/95613/Bayesian-Filtering-with-Multiple-Internal-Models
Sun, 01 Dec 2019 00:00:00 GMTIsomura T, Parr T, Friston K. <span class="paragraphSection"><div class="boxTitle">Abstract</div>To exhibit social intelligence, animals have to recognize whom they are communicating with. One way to make this inference is to select among internal generative models of each conspecific who may be encountered. However, these models also have to be learned via some form of Bayesian belief updating. This induces an interesting problem: When receiving sensory input generated by a particular conspecific, how does an animal know which internal model to update? We consider a theoretical and neurobiologically plausible solution that enables inference and learning of the processes that generate sensory inputs (e.g., listening and understanding) and reproduction of those inputs (e.g., talking or singing), under multiple generative models. This is based on recent advances in theoretical neurobiology—namely, active inference and post hoc (online) Bayesian model selection. In brief, this scheme fits sensory inputs under each generative model. Model parameters are then updated in proportion to the probability that each model could have generated the input (i.e., model evidence). The proposed scheme is demonstrated using a series of (real zebra finch) birdsongs, where each song is generated by several different birds. The scheme is implemented using physiologically plausible models of birdsong production. We show that generalized Bayesian filtering, combined with model selection, leads to successful learning across generative models, each possessing different parameters. 
These results highlight the utility of having multiple internal models when making inferences in social environments with multiple sources of sensory information.</span>31122390243110.1162/neco_a_01239https://direct.mit.edu/neco/article/31/12/2390/95613/Bayesian-Filtering-with-Multiple-Internal-ModelsThe Effect of Signaling Latencies and Node Refractory States on the Dynamics of Networks
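The core update in the scheme above, scaling each internal model's parameter updates by the probability that it generated the current input, can be illustrated with a toy example. Everything here is an invented stand-in for the full active-inference scheme: the "models" are gaussians with different means, and the learning rate and data are arbitrary.

```python
import numpy as np

def responsibility(log_evidence):
    """Posterior probability of each generative model given the current input."""
    z = log_evidence - log_evidence.max()   # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Toy setup: three candidate "conspecifics," each a gaussian with its own mean;
# the sensory input is actually generated by model 0.
means = np.array([0.0, 5.0, -5.0])
rng = np.random.default_rng(2)
x = rng.normal(means[0], 1.0, size=50)

log_ev = np.array([-0.5 * ((x - m) ** 2).sum() for m in means])
w = responsibility(log_ev)

# Updates are weighted by w: the matching model learns, the others barely move.
lr = 0.1
means_updated = means + lr * w * (x.mean() - means)
```

Because the evidence for the matching model dominates, nearly all the learning is routed to it, which is how updating "in proportion to model evidence" solves the which-model-to-update problem.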
https://direct.mit.edu/neco/article/31/12/2492/95612/The-Effect-of-Signaling-Latencies-and-Node
Sun, 01 Dec 2019 00:00:00 GMTSilva GA. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We describe the construction and theoretical analysis of a framework derived from canonical neurophysiological principles that model the competing dynamics of incident signals into nodes along directed edges in a network. The framework describes the dynamics between the offset in the latencies of propagating signals, which reflect the geometry of the edges and conduction velocities, and the internal refractory dynamics and processing times of the downstream node receiving the signals. This framework naturally extends to the construction of a perceptron model that takes into account such dynamic geometric considerations. We first describe the model in detail, culminating with the model of a geometric dynamic perceptron. We then derive upper and lower bounds for a notion of optimal efficient signaling between vertex pairs based on the structure of the framework. Efficient signaling in the context of the framework we develop here means that there needs to be a temporal match between the arrival time of the signals relative to how quickly nodes can internally process signals. These bounds reflect numerical constraints on the compensation of the timing of signaling events of upstream nodes attempting to activate downstream nodes they connect into that preserve this notion of efficiency. When a mismatch between signal arrival times and the internal states of activated nodes occurs, it can cause a breakdown in the signaling dynamics of the network. In contrast to essentially all of the current state of the art in machine learning, this work provides a theoretical foundation for machine learning and intelligence architectures based on the timing of node activations and their abilities to respond rather than necessary changes in synaptic weights. 
At the same time, the theoretical ideas we developed are guiding the discovery of experimentally testable new structure-function principles in the biological brain.</span>31122492252210.1162/neco_a_01241https://direct.mit.edu/neco/article/31/12/2492/95612/The-Effect-of-Signaling-Latencies-and-NodeReinforcement Learning in Spiking Neural Networks with Stochastic and Deterministic Synapses
https://direct.mit.edu/neco/article/31/12/2368/95611/Reinforcement-Learning-in-Spiking-Neural-Networks
Sun, 01 Dec 2019 00:00:00 GMTYuan M, Wu X, Yan R, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Though succeeding in solving various learning tasks, most existing reinforcement learning (RL) models have failed to take into account the complexity of synaptic plasticity in the neural system. Models implementing reinforcement learning with spiking neurons involve only a single plasticity mechanism. Here, we propose a neurally realistic reinforcement learning model that coordinates the plasticities of two types of synapses: stochastic and deterministic. The plasticity of the stochastic synapse is achieved by the hedonistic rule through modulating the release probability of synaptic neurotransmitter, while the plasticity of the deterministic synapse is achieved by a variant of a reward-modulated spike-timing-dependent plasticity rule through modulating the synaptic strengths. We evaluate the proposed learning model on two benchmark tasks: learning a logic gate function and the 19-state random walk problem. Experimental results show that the coordination of diverse synaptic plasticities allows the RL model to learn rapidly and stably.</span>31122368238910.1162/neco_a_01238https://direct.mit.edu/neco/article/31/12/2368/95611/Reinforcement-Learning-in-Spiking-Neural-NetworksEvery Local Minimum Value Is the Global Minimum Value of Induced Model in Nonconvex Machine Learning
https://direct.mit.edu/neco/article/31/12/2293/95610/Every-Local-Minimum-Value-Is-the-Global-Minimum
Sun, 01 Dec 2019 00:00:00 GMTKawaguchi K, Huang J, Kaelbling L. <span class="paragraphSection"><div class="boxTitle">Abstract</div>For nonconvex optimization in machine learning, this article proves that every local minimum achieves the globally optimal value of the perturbable gradient basis model at any differentiable point. As a result, nonconvex machine learning is theoretically as supported as convex machine learning with a handcrafted basis in terms of the loss at differentiable local minima, except in the case when a preference is given to the handcrafted basis over the perturbable gradient basis. The proofs of these results are derived under mild assumptions. Accordingly, the proven results are directly applicable to many machine learning models, including practical deep neural networks, without any modification of practical methods. Furthermore, as special cases of our general results, this article improves or complements several state-of-the-art theoretical results on deep neural networks, deep residual networks, and overparameterized deep neural networks with a unified proof technique and novel geometric insights. A special case of our results also contributes to the theoretical foundation of representation learning.</span>31122293232310.1162/neco_a_01234https://direct.mit.edu/neco/article/31/12/2293/95610/Every-Local-Minimum-Value-Is-the-Global-MinimumReplicating Neuroscience Observations on ML/MF and AM Face Patches by Deep Generative Model
https://direct.mit.edu/neco/article/31/12/2348/95609/Replicating-Neuroscience-Observations-on-ML-MF-and
Sun, 01 Dec 2019 00:00:00 GMTHan T, Xing X, Wu J, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>A recent <span style="font-style:italic;">Cell</span> paper (Chang & Tsao, <a href="#B2" class="reflinks">2017</a>) reports an interesting discovery. For the face stimuli generated by a pretrained active appearance model (AAM), the responses of neurons in the areas of the primate brain that are responsible for face recognition exhibit a strong linear relationship with the shape variables and appearance variables of the AAM that generates the face stimuli. In this letter, we show that this behavior can be replicated by a deep generative model, the generator network, that assumes that the observed signals are generated by latent random variables via a top-down convolutional neural network. Specifically, we learn the generator network from the face images generated by a pretrained AAM model using a variational autoencoder, and we show that the inferred latent variables of the learned generator network have a strong linear relationship with the shape and appearance variables of the AAM model that generates the face images. Unlike the AAM model, which has an explicit shape model where the shape variables generate the control points or landmarks, the generator network has no such shape model and shape variables. Yet it can learn the shape knowledge in the sense that some of the latent variables of the learned generator network capture the shape variations in the face images generated by AAM.</span>31122348236710.1162/neco_a_01236https://direct.mit.edu/neco/article/31/12/2348/95609/Replicating-Neuroscience-Observations-on-ML-MF-andCan Grid Cell Ensembles Represent Multiple Spaces?
https://direct.mit.edu/neco/article/31/12/2324/95608/Can-Grid-Cell-Ensembles-Represent-Multiple-Spaces
Sun, 01 Dec 2019 00:00:00 GMTSpalla D, Dubreuil A, Rosay S, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The way grid cells represent space in the rodent brain has been a striking discovery, with theoretical implications still unclear. Unlike hippocampal place cells, which are known to encode multiple, environment-dependent spatial maps, grid cells have been widely believed to encode space through a single low-dimensional manifold, in which coactivity relations between different neurons are preserved when the environment is changed. Does it have to be so? Here, we compute, using two alternative mathematical models, the storage capacity of a population of grid-like units, embedded in a continuous attractor neural network, for multiple spatial maps. We show that distinct representations of multiple environments can coexist, as existing models for grid cells have the potential to express several sets of hexagonal grid patterns, challenging the view of a universal grid map. This suggests that a population of grid cells can encode multiple noncongruent metric relationships, a feature that could in principle allow a grid-like code to represent environments with a variety of different geometries and possibly conceptual and cognitive spaces, which may be expected to entail such context-dependent metric relationships.</span>31122324234710.1162/neco_a_01237https://direct.mit.edu/neco/article/31/12/2324/95608/Can-Grid-Cell-Ensembles-Represent-Multiple-SpacesOn the Effect of the Activation Function on the Distribution of Hidden Nodes in a Deep Network
https://direct.mit.edu/neco/article/31/12/2562/95607/On-the-Effect-of-the-Activation-Function-on-the
Sun, 01 Dec 2019 00:00:00 GMTLong PM, Sedghi H. <span class="paragraphSection"><div class="boxTitle">Abstract</div>We analyze the joint probability distribution on the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to gaussian distributions. We show that if the activation function φ satisfies a minimal set of assumptions, satisfied by all activation functions that we know to be used in practice, then, as the width of the network gets large, the “length process” converges in probability to a length map that is determined as a simple function of the variances of the random weights and biases and the activation function φ. We also show that this convergence may fail for φ that violate our assumptions. We show how to use this analysis to choose the variance of weight initialization, depending on the activation function, so that hidden variables maintain a consistent scale throughout the network.</span>31122562258010.1162/neco_a_01235https://direct.mit.edu/neco/article/31/12/2562/95607/On-the-Effect-of-the-Activation-Function-on-theOn Functions Computed on Trees
https://direct.mit.edu/neco/article/31/11/2075/95669/On-Functions-Computed-on-Trees
Fri, 01 Nov 2019 00:00:00 GMTFarhoodi R, Filom K, Jones I, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Any function can be constructed using a hierarchy of simpler functions through compositions. Such a hierarchy can be characterized by a binary rooted tree. Each node of this tree is associated with a function that takes as inputs two numbers from its children and produces one output. Since thinking about functions in terms of computation graphs is becoming popular, we may want to know which functions can be implemented on a given tree. Here, we describe a set of necessary constraints in the form of a system of nonlinear partial differential equations that must be satisfied. Moreover, we prove that these conditions are sufficient in contexts of analytic and bit-valued functions. In the latter case, we explicitly enumerate discrete functions and observe that there are relatively few. Our point of view allows us to compare different neural network architectures in regard to their function spaces. Our work connects the structure of computation graphs with the functions they can implement and has potential applications to neuroscience and computer science.</span>31112075213710.1162/neco_a_01231https://direct.mit.edu/neco/article/31/11/2075/95669/On-Functions-Computed-on-TreesAdversarial Feature Alignment: Avoid Catastrophic Forgetting in Incremental Task Lifelong Learning
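The hierarchy described in the abstract above, a binary rooted tree whose internal nodes each combine two child values, has a direct computational reading. The sketch below is a hypothetical encoding (nested tuples, with leaves as input indices), not the paper's formalism, but it shows how a tree fixes which compositions of two-input functions are expressible.

```python
import operator

def eval_tree(tree, x):
    """Evaluate a function given by a binary rooted tree.
    Leaves are input indices; internal nodes are (f, left, right),
    where f takes the two children's outputs and produces one output."""
    if isinstance(tree, int):
        return x[tree]
    f, left, right = tree
    return f(eval_tree(left, x), eval_tree(right, x))

# (x0 + x1) * x2 realized on the tree mul(add(leaf0, leaf1), leaf2)
tree = (operator.mul, (operator.add, 0, 1), 2)
y = eval_tree(tree, [2.0, 3.0, 4.0])
```

A function like x0*x2 + x1*x2 is the same computation, but whether it can be implemented on this particular tree (with x2 appearing at a single leaf) is exactly the kind of question the letter's PDE constraints address.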
https://direct.mit.edu/neco/article/31/11/2266/95668/Adversarial-Feature-Alignment-Avoid-Catastrophic
Fri, 01 Nov 2019 00:00:00 GMTYao X, Huang T, Wu C, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Humans are able to master a variety of knowledge and skills with ongoing learning. By contrast, dramatic performance degradation is observed when new tasks are added to an existing neural network model. This phenomenon, termed <span style="font-style:italic;">catastrophic forgetting</span>, is one of the major roadblocks that prevent deep neural networks from achieving human-level artificial intelligence. Several research efforts (e.g., lifelong or continual learning algorithms) have been made to tackle this problem. However, they either suffer from an accumulating drop in performance as the task sequence grows longer, or require storing an excessive number of model parameters for historical memory, or cannot obtain competitive performance on the new tasks. In this letter, we focus on the incremental multitask image classification scenario. Inspired by the learning process of students, who usually decompose complex tasks into easier goals, we propose an adversarial feature alignment method to avoid catastrophic forgetting. In our design, both the low-level visual features and high-level semantic features serve as soft targets and guide the training process in multiple stages, which provide sufficient supervised information of the old tasks and help to reduce forgetting. Due to the knowledge distillation and regularization phenomena, the proposed method gains even better performance than fine-tuning on the new tasks, which makes it stand out from other methods. 
Extensive experiments in several typical lifelong learning scenarios demonstrate that our method outperforms the state-of-the-art methods in both accuracy on new tasks and performance preservation on old tasks.</span>31112266229110.1162/neco_a_01232https://direct.mit.edu/neco/article/31/11/2266/95668/Adversarial-Feature-Alignment-Avoid-CatastrophicA Novel Predictive-Coding-Inspired Variational RNN Model for Online Prediction and Recognition
https://direct.mit.edu/neco/article/31/11/2025/95667/A-Novel-Predictive-Coding-Inspired-Variational-RNN
Fri, 01 Nov 2019 00:00:00 GMTAhmadi A, Tani J. <span class="paragraphSection"><div class="boxTitle">Abstract</div>This study introduces PV-RNN, a novel variational RNN inspired by predictive-coding ideas. The model learns to extract the probabilistic structures hidden in fluctuating temporal patterns by dynamically changing the stochasticity of its latent states. Its architecture attempts to address two major concerns of variational Bayes RNNs: how latent variables can learn meaningful representations and how the inference model can transfer future observations to the latent variables. PV-RNN does both by introducing adaptive vectors mirroring the training data, whose values can then be adapted differently during evaluation. Moreover, prediction errors during backpropagation—rather than external inputs during the forward computation—are used to convey information to the network about the external data. For testing, we introduce error regression for predicting unseen sequences as inspired by predictive coding that leverages those mechanisms. As in other variational Bayes RNNs, our model learns by maximizing a lower bound on the marginal likelihood of the sequential data, which is composed of two terms: the negative of the expectation of prediction errors and the negative of the Kullback-Leibler divergence between the prior and the approximate posterior distributions. The model introduces a weighting parameter, the meta-prior, to balance the optimization pressure placed on those two terms. We test the model on two data sets with probabilistic structures and show that with high values of the meta-prior, the network develops deterministic chaos through which the randomness of the data is imitated. For low values, the model behaves as a random process. The network performs best on intermediate values and is able to capture the latent probabilistic structure with good generalization. 
Analyzing the meta-prior's impact on the network allows us to precisely study the theoretical value and practical benefits of incorporating stochastic dynamics in our model. We demonstrate better prediction performance on a robot imitation task with our model using error regression compared to a standard variational Bayes model lacking such a procedure.</span>31112025207410.1162/neco_a_01228https://direct.mit.edu/neco/article/31/11/2025/95667/A-Novel-Predictive-Coding-Inspired-Variational-RNNIntegrating Flexible Normalization into Midlevel Representations of Deep Convolutional Neural Networks
https://direct.mit.edu/neco/article/31/11/2138/95666/Integrating-Flexible-Normalization-into-Midlevel
Fri, 01 Nov 2019 00:00:00 GMTGiraldo L, Schwartz O. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Deep convolutional neural networks (CNNs) are becoming increasingly popular models to predict neural responses in visual cortex. However, contextual effects, which are prevalent in neural processing and in perception, are not explicitly handled by current CNNs, including those used for neural prediction. In primary visual cortex, neural responses are modulated by stimuli spatially surrounding the classical receptive field in rich ways. These effects have been modeled with divisive normalization approaches, including flexible models, where spatial normalization is recruited only to the degree that responses from center and surround locations are deemed statistically dependent. We propose a flexible normalization model applied to midlevel representations of deep CNNs as a tractable way to study contextual normalization mechanisms in midlevel cortical areas. This approach captures nontrivial spatial dependencies among midlevel features in CNNs, such as those present in textures and other visual stimuli, that arise from tiling high-order features geometrically. We expect that the proposed approach can make predictions about when spatial normalization might be recruited in midlevel cortical areas. We also expect this approach to be useful as part of the CNN tool kit, therefore going beyond more restrictive fixed forms of normalization.</span>31112138217610.1162/neco_a_01226https://direct.mit.edu/neco/article/31/11/2138/95666/Integrating-Flexible-Normalization-into-MidlevelCapturing the Forest but Missing the Trees: Microstates Inadequate for Characterizing Shorter-Scale EEG Dynamics
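The "recruited only to the degree deemed dependent" idea in the abstract above can be caricatured as divisive normalization whose surround pool is gated by a dependence probability. This is a deliberately simplified stand-in: in the actual flexible model that probability is inferred from a statistical model of center-surround responses, and the functional form, `sigma`, and inputs below are invented for illustration.

```python
import numpy as np

def flexible_norm(center, surround, p_dep, sigma=0.5):
    """Divisively normalize a center response, recruiting the surround pool
    only to the degree p_dep that center and surround are deemed dependent."""
    pool = np.sqrt(sigma**2 + center**2 + p_dep * np.mean(np.square(surround)))
    return center / pool

center = 2.0
surround = np.array([1.5, 2.0, 1.0, 2.5])
weak = flexible_norm(center, surround, p_dep=0.0)    # surround ignored
strong = flexible_norm(center, surround, p_dep=1.0)  # surround fully recruited
```

When the surround is judged statistically dependent on the center (p_dep near 1), it joins the normalization pool and suppresses the response; when independent, the center is left nearly untouched.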
https://direct.mit.edu/neco/article/31/11/2177/95665/Capturing-the-Forest-but-Missing-the-Trees
Fri, 01 Nov 2019 00:00:00 GMTShaw S, Dhindsa K, Reilly JP, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>The brain is known to be active even when not performing any overt cognitive tasks, and often it engages in involuntary mind wandering. This resting state has been extensively characterized in terms of fMRI-derived brain networks. However, an alternate method has recently gained popularity: EEG microstate analysis. Proponents of microstates postulate that the brain discontinuously switches between four quasi-stable states defined by specific EEG scalp topologies at peaks in the global field potential (GFP). These microstates are thought to be “atoms of thought,” involved with visual, auditory, salience, and attention processing. However, this method makes some major assumptions by excluding EEG data outside the GFP peaks and then clustering the EEG scalp topologies at the GFP peaks, assuming that only one microstate is active at any given time. This study explores the evidence surrounding these assumptions by studying the temporal dynamics of microstates and its clustering space using tools from dynamical systems analysis, fractal, and chaos theory to highlight the shortcomings in microstate analysis. The results show evidence of complex and chaotic EEG dynamics outside the GFP peaks, which is being missed by microstate analysis. Furthermore, the winner-takes-all approach of only one microstate being active at a time is found to be inadequate since the dynamic EEG scalp topology does not always resemble that of the assigned microstate, and there is competition among the different microstate classes. Finally, clustering space analysis shows that the four microstates do not cluster into four distinct and separable clusters. 
Taken collectively, these results show that the discontinuous description of EEG microstates is inadequate when looking at nonstationary short-scale EEG dynamics.</span>31112177221110.1162/neco_a_01229https://direct.mit.edu/neco/article/31/11/2177/95665/Capturing-the-Forest-but-Missing-the-TreesDynamic Integrative Synaptic Plasticity Explains the Spacing Effect in the Transition from Short- to Long-Term Memory
https://direct.mit.edu/neco/article/31/11/2212/95664/Dynamic-Integrative-Synaptic-Plasticity-Explains
Fri, 01 Nov 2019 00:00:00 GMTElliott T. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Repeated stimuli that are spaced apart in time promote the transition from short- to long-term memory, while massing repetitions together does not. Previously, we showed that a model of integrative synaptic plasticity, in which plasticity induction signals are integrated by a low-pass filter before plasticity is expressed, gives rise to a natural timescale at which to repeat stimuli, hinting at a partial account of this spacing effect. The account was only partial because the important role of neuromodulation was not considered. We now show that by extending the model to allow dynamic integrative synaptic plasticity, the model permits synapses to robustly discriminate between spaced and massed repetition protocols, suppressing the response to massed stimuli while maintaining that to spaced stimuli. This is achieved by dynamically coupling the filter decay rate to neuromodulatory signaling in a very simple model of the signaling cascades downstream from cAMP production. In particular, the model's parameters may be interpreted as corresponding to the duration and amplitude of the waves of activity in the MAPK pathway. We identify choices of parameters and repetition times for stimuli in this model that optimize the ability of synapses to discriminate between spaced and massed repetition protocols. The model is very robust to reasonable changes around these optimal parameters and times, but for large changes in parameters, the model predicts that massed and spaced stimuli cannot be distinguished or that the responses to both patterns are suppressed. 
A model of dynamic integrative synaptic plasticity therefore explains the spacing effect under normal conditions and also predicts its breakdown under abnormal conditions.</span>31112212225110.1162/neco_a_01227https://direct.mit.edu/neco/article/31/11/2212/95664/Dynamic-Integrative-Synaptic-Plasticity-ExplainsMutual Inhibition with Few Inhibitory Cells via Nonlinear Inhibitory Synaptic Interaction
https://direct.mit.edu/neco/article/31/11/2252/95663/Mutual-Inhibition-with-Few-Inhibitory-Cells-via
Fri, 01 Nov 2019 00:00:00 GMTWeissenberger F, Gauy M, Zou X, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>In computational neural network models, neurons are usually allowed to excite some and inhibit other neurons, depending on the weight of their synaptic connections. The traditional way to transform such networks into networks that obey Dale's law (i.e., a neuron can either excite or inhibit) is to accompany each excitatory neuron with an inhibitory one through which inhibitory signals are mediated. However, this requires an equal number of excitatory and inhibitory neurons, whereas a realistic number of inhibitory neurons is much smaller. In this letter, we propose a model of nonlinear interaction of inhibitory synapses on dendritic compartments of excitatory neurons that allows the excitatory neurons to mediate inhibitory signals through a subset of the inhibitory population. With this construction, the number of required inhibitory neurons can be reduced tremendously.</span>31112252226510.1162/neco_a_01230https://direct.mit.edu/neco/article/31/11/2252/95663/Mutual-Inhibition-with-Few-Inhibitory-Cells-viaMachine Learning of Time Series Using Time-Delay Embedding and Precision Annealing
https://direct.mit.edu/neco/article/31/10/2004/95559/Machine-Learning-of-Time-Series-Using-Time-Delay
Tue, 01 Oct 2019 00:00:00 GMTTy AA, Fang Z, Gonzalez RA, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Tasking machine learning to predict segments of a time series requires estimating the parameters of an ML model with input/output pairs from the time series. We borrow two techniques used in statistical data assimilation in order to accomplish this task: time-delay embedding to prepare our input data and precision annealing as a training method. The precision annealing approach identifies the global minimum of the action (-log[P]). In this way, we are able to identify the number of training pairs required to produce good generalizations (predictions) for the time series. We proceed from a scalar time series s(t_n), t_n = t_0 + nΔt, and, using methods of nonlinear time series analysis, show how to produce a time-delay embedding space of dimension D_E > 1 in which the time series has no false neighbors, unlike the observed s(t_n) series. In that D_E-dimensional space, we explore the use of feedforward multilayer perceptrons as network models operating on D_E-dimensional input and producing D_E-dimensional outputs.</span>31102004202410.1162/neco_a_01224https://direct.mit.edu/neco/article/31/10/2004/95559/Machine-Learning-of-Time-Series-Using-Time-DelayA Real-Time Health 4.0 Framework with Novel Feature Extraction and Classification for Brain-Controlled IoT-Enabled Environments
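The time-delay embedding step described in the abstract above takes only a few lines: stack delayed copies of the scalar series into D_E-dimensional vectors. The signal, delay, and dimension below are illustrative choices, not the paper's settings (which are picked so the embedding has no false neighbors).

```python
import numpy as np

def delay_embed(s, dim, tau):
    """Stack delayed copies of a scalar series: row n of the result is
    [s(t_n), s(t_{n+tau}), ..., s(t_{n+(dim-1)tau})]."""
    N = len(s) - (dim - 1) * tau
    return np.stack([s[i * tau : i * tau + N] for i in range(dim)], axis=1)

t = np.arange(0.0, 20.0, 0.05)
s = np.sin(t)                      # stand-in for an observed scalar series s(t_n)
X = delay_embed(s, dim=3, tau=4)   # a D_E = 3 embedding with delay 4Δt
```

Rows of `X` are then the D_E-dimensional input (and target) vectors for the multilayer perceptron described in the abstract.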
https://direct.mit.edu/neco/article/31/10/1915/95558/A-Real-Time-Health-4-0-Framework-with-Novel
Tue, 01 Oct 2019 00:00:00 GMTJagadish BB, Mishra PK, Kiran MS, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>In this letter, we propose two novel methods for four-class motor imagery (MI) classification using electroencephalography (EEG). Also, we developed a real-time health 4.0 (H4.0) architecture for brain-controlled internet of things (IoT) enabled environments (BCE), which uses the classified MI task to assist disabled persons in controlling IoT-enabled environments such as lighting and heating, ventilation, and air-conditioning (HVAC). The first method for classification involves a simple and low-complex classification framework using a combination of regularized Riemannian mean (RRM) and linear SVM. Although this method performs better compared to state-of-the-art techniques, it still suffers from a nonnegligible misclassification rate. Hence, to overcome this, the second method offers a persistent decision engine (PDE) for the MI classification, which improves classification accuracy (CA) significantly. The proposed methods are validated using an in-house recorded four-class MI data set (data set I, collected over 14 subjects), and a four-class MI data set 2a of BCI competition IV (data set II, collected over 9 subjects). The proposed RRM architecture obtained average CAs of 74.30% and 67.60% when validated using datasets I and II, respectively. When analyzed along with the proposed PDE classification framework, an average CA of 92.25% on 12 subjects of data set I and 82.54% on 7 subjects of data set II is obtained. The results show that the PDE algorithm is more reliable for the classification of four-class MI and is also feasible for BCE applications. The proposed low-complex BCE architecture is implemented in real time using Raspberry Pi 3 Model B+ along with the Virgo EEG data acquisition system. 
The hardware implementation results show that the proposed system architecture is well suited for body-wearable devices in the scenario of Health 4.0. We strongly feel that this study can aid in driving the future scope of BCE research.</span>31101915194410.1162/neco_a_01223https://direct.mit.edu/neco/article/31/10/1915/95558/A-Real-Time-Health-4-0-Framework-with-NovelA Mechanism for Synaptic Copy Between Neural Circuits
https://direct.mit.edu/neco/article/31/10/1964/95557/A-Mechanism-for-Synaptic-Copy-Between-Neural
Tue, 01 Oct 2019 00:00:00 GMTShao Y, Wang B, Sornborger AT, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Cortical oscillations are central to information transfer in neural systems. Significant evidence supports the idea that coincident spike input can allow the neural threshold to be overcome and spikes to be propagated downstream in a circuit. Thus, an observation of oscillations in neural circuits would be an indication that repeated synchronous spiking may be enabling information transfer. However, for memory transfer, in which synaptic weights must be being transferred from one neural circuit (region) to another, what is the mechanism? Here, we present a synaptic transfer mechanism whose structure provides some understanding of the phenomena that have been implicated in memory transfer, including nested oscillations at various frequencies. The circuit is based on the principle of pulse-gated, graded information transfer between neural populations.</span>31101964198410.1162/neco_a_01221https://direct.mit.edu/neco/article/31/10/1964/95557/A-Mechanism-for-Synaptic-Copy-Between-NeuralOne Step Back, Two Steps Forward: Interference and Learning in Recurrent Neural Networks
https://direct.mit.edu/neco/article/31/10/1985/95556/One-Step-Back-Two-Steps-Forward-Interference-and
Tue, 01 Oct 2019 00:00:00 GMTBeer C, Barak O. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Artificial neural networks, trained to perform cognitive tasks, have recently been used as models for neural recordings from animals performing these tasks. While some progress has been made in performing such comparisons, the evolution of network dynamics throughout learning remains unexplored. This is paralleled by an experimental focus on recording from trained animals, with few studies following neural activity throughout training. In this work, we address this gap in the realm of artificial networks by analyzing networks that are trained to perform memory and pattern generation tasks. The functional aspect of these tasks corresponds to dynamical objects in the fully trained network—a line attractor or a set of limit cycles for the two respective tasks. We use these dynamical objects as anchors to study the effect of learning on their emergence. We find that the sequential nature of learning—one trial at a time—has major consequences for the learning trajectory and its final outcome. Specifically, we show that least mean squares (LMS), a simple gradient descent rule suggested as a biologically plausible version of the FORCE algorithm, is constantly obstructed by forgetting, which is manifested as the destruction of dynamical objects from previous trials. The degree of interference is determined by the correlation between different trials. We show which specific ingredients of FORCE avoid this phenomenon. Overall, this difference results in convergence that is orders of magnitude slower for LMS. Learning implies accumulating information across multiple trials to form the overall concept of the task. Our results show that interference between trials can greatly affect learning in a learning-rule-dependent manner. 
These insights can help design experimental protocols that minimize such interference, and possibly infer underlying learning rules by observing behavior and neural activity throughout learning.</span>31101985200310.1162/neco_a_01222https://direct.mit.edu/neco/article/31/10/1985/95556/One-Step-Back-Two-Steps-Forward-Interference-andA Minimum Free Energy Model of Motor Learning
https://direct.mit.edu/neco/article/31/10/1945/95555/A-Minimum-Free-Energy-Model-of-Motor-Learning
Tue, 01 Oct 2019 00:00:00 GMTMitchell BA, Lauharatanahirun NN, Garcia JO, et al. <span class="paragraphSection"><div class="boxTitle">Abstract</div>Even highly trained behaviors demonstrate variability, which is correlated with performance on current and future tasks. An objective of motor learning that is general enough to explain these phenomena has not been precisely formulated. In this six-week longitudinal learning study, participants practiced a set of motor sequences each day, and neuroimaging data were collected on days 1, 14, 28, and 42 to capture the neural correlates of the learning process. In our analysis, we first modeled the underlying neural and behavioral dynamics during learning. Our results demonstrate that the densities of whole-brain response, task-active regional response, and behavioral performance evolve according to a Fokker-Planck equation during the acquisition of a motor skill. We show that this implies that the brain concurrently optimizes the entropy of a joint density over neural response and behavior (as measured by sampling over multiple trials and subjects) and the expected performance under this density; we call this formulation of learning minimum free energy learning (MFEL). This model provides an explanation as to how behavioral variability can be tuned while simultaneously improving performance during learning. We then develop a novel variant of inverse reinforcement learning to retrieve the cost function optimized by the brain during the learning process, as well as the parameter used to tune variability. We show that this population-level analysis can be used to derive a learning objective that each subject optimizes during his or her study. In this way, MFEL effectively acts as a unifying principle, allowing users to precisely formulate learning objectives and infer their structure.</span>31101945196310.1162/neco_a_01219https://direct.mit.edu/neco/article/31/10/1945/95555/A-Minimum-Free-Energy-Model-of-Motor-Learning