Neural Computation Advance Access
https://direct.mit.edu/neco
Language: en-us
Published: Tue, 08 Oct 2024 00:00:00 GMT
Last build: Wed, 09 Oct 2024 22:46:12 GMT
Generator: Silverchair
Editor: editor@direct.mit.edu
Webmaster: webmaster@direct.mit.edu

Bounded Rational Decision Networks with Belief Propagation
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01719/124827/Bounded-Rational-Decision-Networks-with-Belief
Tue, 08 Oct 2024 00:00:00 GMT

Abstract: Complex information processing systems that are capable of a wide variety of tasks, such as the human brain, are composed of specialized units that collaborate and communicate with each other. An important property of such information processing networks is locality: there is no single global unit controlling the modules; instead, information is exchanged locally. Here, we consider a decision-theoretic approach to study networks of bounded rational decision makers that are allowed to specialize and communicate with each other. In contrast to previous work that has focused on feedforward communication between decision-making agents, we consider cyclical information processing paths allowing for back-and-forth communication. We adapt message-passing algorithms to suit this purpose, essentially allowing for local information flow between units and thus enabling circular dependency structures. We provide examples showing that repeated communication can increase performance when each unit's information processing capability is limited, and that decision-making systems with too few or too many connections and feedback loops achieve suboptimal utility.

DOI: 10.1162/neco_a_01719

Optimizing Attention and Cognitive Control Costs Using Temporally Layered Architectures
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01718/124826/Optimizing-Attention-and-Cognitive-Control-Costs
Tue, 08 Oct 2024 00:00:00 GMT

Abstract: The current reinforcement learning framework focuses exclusively on performance, often at the expense of efficiency. In contrast, biological control achieves remarkable performance while also optimizing computational energy expenditure and decision frequency. We propose a decision-bounded Markov decision process (DB-MDP) that constrains the number of decisions and the computational energy available to agents in reinforcement learning environments. Our experiments demonstrate that existing reinforcement learning algorithms struggle within this framework, leading to either failure or suboptimal performance. To address this, we introduce a biologically inspired, temporally layered architecture (TLA), enabling agents to manage computational costs through two layers with distinct timescales and energy requirements. TLA achieves optimal performance in decision-bounded environments and, in continuous control environments, matches state-of-the-art performance while using a fraction of the computing cost. Compared to current reinforcement learning algorithms that solely prioritize performance, our approach significantly lowers computational energy expenditure while maintaining performance. These findings establish a benchmark and pave the way for future research on energy- and time-aware control.

DOI: 10.1162/neco_a_01718

A Fast Algorithm for All-Pairs-Shortest-Paths Suitable for Neural Networks
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01716/124825/A-Fast-Algorithm-for-All-Pairs-Shortest-Paths
Tue, 08 Oct 2024 00:00:00 GMT

Abstract: Given a directed graph of nodes and edges connecting them, a common problem is to find the shortest path between any two nodes. Here we show that the shortest path distances can be found by a simple matrix inversion: if the edges are given by the adjacency matrix A_ij, then with a suitably small value of γ, the shortest path distances are D_ij = ceil(log_γ [(I − γA)^(−1)]_ij). We derive several graph-theoretic bounds on the value of γ and explore its useful range with numerics on different graph types. Even when the distance function is not globally accurate across the entire graph, it still works locally to instruct pursuit of the shortest path. In this mode, it also extends to weighted graphs with positive edge weights. For a wide range of dense graphs, this distance function is computationally faster than the best available alternative. Finally, we show that this method leads naturally to a neural network solution of the all-pairs-shortest-path problem.

DOI: 10.1162/neco_a_01716

Relating Human Error–Based Learning to Modern Deep RL Algorithms
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01721/124824/Relating-Human-Error-Based-Learning-to-Modern-Deep
Tue, 08 Oct 2024 00:00:00 GMT

Abstract: In human error–based learning, the size and direction of a scalar error (i.e., the "directed error") are used to update future actions. Modern deep reinforcement learning (RL) methods perform a similar operation but in terms of scalar rewards. Despite this similarity, the relationship between the action updates of deep RL and human error–based learning has yet to be investigated. Here, we systematically compare the three major families of deep RL algorithms to human error–based learning. We show that all three deep RL approaches are qualitatively different from human error–based learning, as assessed by a mirror-reversal perturbation experiment. To bridge this gap, we developed an alternative deep RL algorithm inspired by human error–based learning, model-based deterministic policy gradients (MB-DPG). We show that MB-DPG captures human error–based learning under mirror-reversal and rotational perturbations and that it learns faster than canonical model-free algorithms on complex arm-based reaching tasks, while being more robust to (forward) model misspecification than model-based RL.

DOI: 10.1162/neco_a_01721

Fine Granularity Is Critical for Intelligent Neural Network Pruning
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01717/124823/Fine-Granularity-Is-Critical-for-Intelligent
Tue, 08 Oct 2024 00:00:00 GMT

Abstract: Neural network pruning is a popular approach to reducing the computational costs of training and/or deploying a network and aims to do so while minimizing accuracy loss. Pruning methods that remove individual weights (fine granularity) can remove more total network parameters before reaching a given degree of accuracy loss, while methods that preserve some or all of a network's structure (coarser granularity, such as pruning channels from a CNN) take better advantage of hardware and software optimized for dense matrix computations. We compare intelligent iterative pruning using several different criteria sampled from the literature against random pruning at initialization across multiple granularities on two different architectures and three image classification tasks. Our work is the first direct and comprehensive investigation of the relationship between granularity and the efficacy of intelligent pruning relative to a random-pruning baseline. We find that the accuracy advantage of intelligent over random pruning decreases dramatically as granularity becomes coarser, with minimal advantage for intelligent pruning at granularity coarse enough to fully preserve network structure. For instance, at pruning rates where random pruning leaves ResNet-20 at 85.0% test accuracy on CIFAR-10 after 30,000 training iterations, intelligent weight pruning with the best-in-context criterion leaves it at about 90.0% accuracy (on par with the unpruned network), kernel pruning leaves it at about 86.5%, and channel pruning leaves it at about 85.5%. Our results suggest that compared to coarse pruning, fine pruning combined with efficient implementation of the resulting networks is a more promising direction for easing the trade-off between high accuracy and low computational cost.

DOI: 10.1162/neco_a_01717

Computation with Sequences of Assemblies in a Model of the Brain
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01720/124822/Computation-with-Sequences-of-Assemblies-in-a
Tue, 08 Oct 2024 00:00:00 GMT

Abstract: Even as machine learning exceeds human-level performance on many applications, the generality, robustness, and rapidity of the brain's learning capabilities remain unmatched. How cognition arises from neural activity is the central open question in neuroscience, inextricable from the study of intelligence itself. A simple formal model of neural activity was proposed in Papadimitriou et al. (2020) and has subsequently been shown, through both mathematical proofs and simulations, to be capable of implementing certain simple cognitive operations via the creation and manipulation of assemblies of neurons. However, many intelligent behaviors rely on the ability to recognize, store, and manipulate temporal sequences of stimuli (planning, language, and navigation, to list a few). Here we show that in the same model, sequential precedence can be captured naturally through synaptic weights and plasticity and, as a result, a range of computations on sequences of assemblies can be carried out. In particular, repeated presentation of a sequence of stimuli leads to the memorization of the sequence through corresponding neural assemblies: upon future presentation of any stimulus in the sequence, the corresponding assembly and its subsequent ones will be activated, one after the other, until the end of the sequence. If the stimulus sequence is presented to two brain areas simultaneously, a scaffolded representation is created, resulting in more efficient memorization and recall, in agreement with cognitive experiments. Finally, we show that any finite state machine can be learned in a similar way, through the presentation of appropriate patterns of sequences. Through an extension of this mechanism, the model can be shown to be capable of universal computation. Taken together, these results provide a concrete hypothesis for the basis of the brain's remarkable abilities to compute and learn, with sequences playing a vital role.

DOI: 10.1162/neco_a_01720

Sparse-Coding Variational Autoencoders
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01715/124821/Sparse-Coding-Variational-Autoencoders
Tue, 08 Oct 2024 00:00:00 GMT

Abstract: The sparse coding model posits that the visual system has evolved to efficiently code natural stimuli using a sparse set of features from an overcomplete dictionary. The original sparse coding model suffered from two key limitations, however: (1) computing the neural response to an image patch required minimizing a nonlinear objective function via recurrent dynamics and (2) fitting relied on approximate inference methods that ignored uncertainty. Although subsequent work has developed several methods to overcome these obstacles, we propose a novel solution inspired by the variational autoencoder (VAE) framework. We introduce the sparse coding variational autoencoder (SVAE), which augments the sparse coding model with a probabilistic recognition model parameterized by a deep neural network. This recognition model provides a neurally plausible feedforward implementation for the mapping from image patches to neural activities and enables a principled method for fitting the sparse coding model to data via maximization of the evidence lower bound (ELBO). The SVAE differs from standard VAEs in three key respects: the latent representation is overcomplete (there are more latent dimensions than image pixels), the prior is sparse or heavy-tailed instead of gaussian, and the decoder network is a linear projection instead of a deep network. We fit the SVAE to natural image data under different assumed prior distributions and show that it obtains higher test performance than previous fitting methods. Finally, we examine the response properties of the recognition network and show that it captures important nonlinear properties of neurons in the early visual pathway.

DOI: 10.1162/neco_a_01715

Orthogonal Gated Recurrent Unit With Neumann-Cayley Transformation
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01710/124541/Orthogonal-Gated-Recurrent-Unit-With-Neumann
Mon, 23 Sep 2024 00:00:00 GMT

Abstract: In recent years, orthogonal matrices have been shown to be a promising approach to improving the training, stability, and convergence of recurrent neural networks (RNNs), particularly for controlling gradients. While gated recurrent unit (GRU) and long short-term memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the use of orthogonal matrices to prevent exploding gradient problems and enhance long-term memory. We study where to use orthogonal matrices and propose a Neumann series–based scaled Cayley transformation for training orthogonal matrices in GRU, which we call the Neumann-Cayley orthogonal GRU (NC-GRU). We present detailed experiments of our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU and several other RNNs.

DOI: 10.1162/neco_a_01710

Realizing Synthetic Active Inference Agents, Part II: Variational Message Updates
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01713/124539/Realizing-Synthetic-Active-Inference-Agents-Part
Mon, 23 Sep 2024 00:00:00 GMT

Abstract: The free energy principle (FEP) describes (biological) agents as minimizing a variational free energy (FE) with respect to a generative model of their environment. Active inference (AIF) is a corollary of the FEP that describes how agents explore and exploit their environment by minimizing an expected FE objective. In two related papers, we describe a scalable, epistemic approach to synthetic AIF by message passing on free-form Forney-style factor graphs (FFGs). A companion paper (part I of this article; Koudahl et al., 2023) introduces a constrained FFG (CFFG) notation that visually represents (generalized) FE objectives for AIF. This article (part II) derives message-passing algorithms that minimize (generalized) FE objectives on a CFFG by variational calculus. A comparison between simulated Bethe and generalized FE agents illustrates how the message-passing approach to synthetic AIF induces epistemic behavior on a T-maze navigation task. Extending the T-maze simulation to learning goal statistics and to a multiagent bargaining setting illustrates how this approach encourages reuse of nodes and updates in alternative settings. With a full message-passing account of synthetic AIF agents, it becomes possible to derive and reuse message updates across models and move closer to industrial applications of synthetic AIF.

DOI: 10.1162/neco_a_01713

Associative Learning and Active Inference
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01711/124536/Associative-Learning-and-Active-Inference
Mon, 23 Sep 2024 00:00:00 GMT

Abstract: Associative learning is a behavioral phenomenon in which individuals develop connections between stimuli or events based on their co-occurrence. Initially studied by Pavlov in his conditioning experiments, the fundamental principles of learning have been expanded on through the discovery of a wide range of learning phenomena. Computational models have been developed based on the concept of minimizing reward prediction errors. The Rescorla-Wagner model, in particular, is a well-known model that has greatly influenced the field of reinforcement learning. However, the simplicity of these models restricts their ability to fully explain the diverse range of behavioral phenomena associated with learning. In this study, we adopt the free energy principle, which suggests that living systems strive to minimize surprise or uncertainty under their internal models of the world. We treat the learning process as the minimization of free energy and investigate its relationship with the Rescorla-Wagner model, focusing on the informational aspects of learning, different types of surprise, and prediction errors based on beliefs and values. Furthermore, we explore how well-known behavioral phenomena such as blocking, overshadowing, and latent inhibition can be modeled within the active inference framework. We accomplish this by using the informational and novelty aspects of attention, which echo ideas proposed by seemingly contradictory models such as the Mackintosh and Pearce-Hall models. Thus, we demonstrate that the free energy principle, as a theoretical framework derived from first principles, can integrate the ideas and models of associative learning proposed on the basis of empirical experiments and serve as a framework for a better understanding of the computational processes behind associative learning in the brain.

DOI: 10.1162/neco_a_01711

KLIF: An Optimized Spiking Neuron Unit for Tuning Surrogate Gradient Function
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01712/124535/KLIF-An-Optimized-Spiking-Neuron-Unit-for-Tuning
Mon, 23 Sep 2024 00:00:00 GMT

Abstract: Spiking neural networks (SNNs) have garnered significant attention owing to their adeptness in processing temporal information, low power consumption, and enhanced biological plausibility. Despite these advantages, the development of efficient and high-performing learning algorithms for SNNs remains a formidable challenge. Techniques such as artificial neural network (ANN)-to-SNN conversion can convert ANNs to SNNs with minimal performance loss, but they necessitate prolonged simulations to approximate rate coding accurately. Conversely, the direct training of SNNs using spike-based backpropagation (BP), such as surrogate gradient approximation, is more flexible and widely adopted. Nevertheless, our research revealed that the shape of the surrogate gradient function profoundly influences the training and inference accuracy of SNNs, yet this shape is typically selected manually before training and remains static throughout the training process. In this article, we introduce a novel k-based leaky integrate-and-fire (KLIF) spiking neural model. KLIF, featuring a learnable parameter, enables the dynamic adjustment of the height and width of the effective surrogate gradient near the threshold during training. Our proposed model is evaluated on the static CIFAR-10 and CIFAR-100 data sets, as well as the neuromorphic CIFAR10-DVS and DVS128-Gesture data sets. Experimental results demonstrate that KLIF outperforms the leaky integrate-and-fire (LIF) model across multiple data sets and network architectures. The superior performance of KLIF positions it as a viable replacement for the essential role of LIF in SNNs across diverse tasks.

DOI: 10.1162/neco_a_01712
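KLIF's central idea, a learnable parameter that reshapes the height and width of the effective surrogate gradient near the firing threshold, can be illustrated with a minimal Python sketch. The sigmoid-derivative form and the name `surrogate_grad` below are illustrative assumptions chosen for exposition, not the paper's exact formulation:

```python
import math

def surrogate_grad(v, v_th=1.0, k=2.0):
    """Approximate d(spike)/d(v) with a scaled sigmoid derivative.

    Larger k makes the surrogate taller and narrower around the
    threshold v_th, mimicking a learnable sharpness parameter.
    (Illustrative form; KLIF's exact function may differ.)
    """
    s = 1.0 / (1.0 + math.exp(-k * (v - v_th)))
    return k * s * (1.0 - s)

# Sweep membrane potentials from 0 to 2: the surrogate peaks at the
# threshold, and the peak height grows linearly with k (it equals k/4).
vs = [i * 0.01 for i in range(201)]
for k in (1.0, 4.0):
    peak = max(surrogate_grad(v, k=k) for v in vs)
    print(f"k={k}: peak gradient {peak:.3f}")
```

During training, k would be updated by gradient descent alongside the synaptic weights, so the surrogate can sharpen or flatten as learning progresses rather than staying fixed.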